Text-based open standard designed for human-readable data interchange
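For readers new to the format, here is a tiny, invented example of that interchange in Python: a structure is serialized to human-readable JSON text by one program and parsed back by another.

```python
import json

# An in-memory structure on the producing side...
episode = {"show": "Example Podcast", "number": 325, "topics": ["JSON", "APIs"]}

# ...becomes plain, human-readable text on the wire...
payload = json.dumps(episode, indent=2)
print(payload)

# ...and any other program, in any language, can parse it back.
assert json.loads(payload)["number"] == 325
```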
How it all comes back to the why column, dark patterns, privacy and tracking, getting emails forever from one purchase, how to be bold with communication while still being respectful, HTMHell, CSS mistakes, are we anti-JSON, and the state of FitVid in 2025. Links: Markup from hell - HTMHell; Incomplete List of Mistakes in the Design of CSS [CSS Working Group Wiki]; JSON Editing; Douglas Crockford on JSON; Fluid Video Plugin
Midjourney Fast Hours, Episode 40. After a short hiatus (blame conferences and caffeine dependency), Rory Flynn and Drew Brucker break down Google's shiny new Flow suite — with its Veo 3 video model, sound + dialogue generation, and confusing-as-hell product naming. They talk strategy, cost, coherence, and why it still feels like Midjourney has that “magic dust” no one else can replicate. Along the way: Runway love, layering hacks, JSON secrets, interior design with arrows, and 3D dogs with job titles. It's fun. It's weird. It's chaotic. But you'll probably walk away with 3 ideas you want to try right away. Also, someone paid $125 just to tell you whether it's worth it. (You're welcome.) --- Midjourney Fast Hour 0:00 – When did this madness begin? 2:19 – AI video is finally getting spicy 3:29 – Google's Flow Suite: Veo 3, sound, and coherence 5:02 – Google's confusing product soup: Flow, Gemini, Imagen, Whisk 10:45 – Pricing pain: Is Veo 3 worth the $125? 13:09 – Veo 2 vs Veo 3: Best value tips and tradeoffs 15:08 – Prompt accuracy and physics: Is Google really listening? 17:53 – Why less prompt effort = better results now 19:40 – Veo 3 vs Kling vs Midjourney: Prompting philosophies 20:52 – Scene builder: Longer takes and smart extension workflows 22:34 – The catch: extending drops quality and loses sound 24:17 – New image-to-video support + third-party images 25:41 – Ingredients-based generation and persistent characters 27:10 – Frame extraction: finally, a feature we all needed 28:08 – Timeline editing, upscaling, and staying inside the tool 29:48 – Sora vs Veo 3 vs Runway: usability and consistency 31:43 – Canva, Figma, Framer: Tools are becoming monsters 35:33 – Figma's new AI website builder is wild 36:40 – Prompting sneaker ads and JSON-based design 37:09 – Why training teams on AI is almost impossible 38:07 – Hedra who? Veo 3 makes fast pivots a must 39:55 – Midjourney's next move: what video could look like 41:11 – Runway's underrated features and clever reference hacks 44:26 – Scene sketching and layout prompting: mind blown 47:25 – Interior design from mood board to layout to render 49:45 – Lighting direction via floorplans = next-gen hack 52:53 – Try-on tech and Chrome extensions 54:22 – Style consistency with JSON + ChatGPT 58:23 – Mass-generating stylized icons and dogs with jobs 1:02:36 – Midjourney updates: V7.1, personalization, and video 1:05:01 – What Midjourney must get right with video 1:07:18 – The one-shot window to impress 1:09:23 – Bring back the Midjourney magic 1:11:14 – Wrap-up: chaotic times, coherent thoughts, caffeinated takes
Jason Martin is an AI Security Researcher at HiddenLayer. This episode explores “policy puppetry,” a universal attack technique bypassing safety features in all major language models using structured formats like XML or JSON.Subscribe to the Gradient Flow Newsletter
In this potluck episode of Syntax, Wes and CJ answer your questions about OpenAI's $3B Windsurf acquisition, the evolving role of UI in an AI-driven world, why good design still matters, React vs. Svelte, and more! Show Notes 00:00 Welcome to Syntax! Devs Night Out 02:35 OpenAI acquires Windsurf for $3B Windsurf Ep 870: Windsurf forked VS Code to compete with Cursor. Talking the future of AI + Coding 05:20 What is the future of UI now that AI is such a heavy hitter? 08:45 Handling spam submissions on websites Cloudflare Turnstile 14:18 Duplicating HTML for desktop and mobile websites? 17:03 Is it okay to use a JSON file for simple website data? 19:04 How to handle anonymous and duplicate users Better-Auth 21:55 Working with TypeScript Object.keys() and “any” vs “@ts-ignore” 25:51 Brought to you by Sentry.io 26:38 What is the difference between React and Svelte? 30:24 How should you name your readme file? 31:55 How do you find time to refactor code? 35:20 Best practices for testing responsiveness Polypane 39:19 Avoiding layout shift with progressive enhancement 46:56 Sick Picks + Shameless Plugs Sick Picks CJ: Portable Chainsaw Wes: White Lotus Shameless Plugs CJ: Nuxt Wes: Full Stack App Build | Travel Log w/ Nuxt, Vue, Better Auth, Drizzle, Tailwind, DaisyUI, MapLibre Hit us up on Socials! Syntax: X Instagram Tiktok LinkedIn Threads Wes: X Instagram Tiktok LinkedIn Threads Scott: X Instagram Tiktok LinkedIn Threads Randy: X Instagram YouTube Threads
A big episode covering a wide range of topics: Java, Scala, Micronaut, NodeJS, AI and developer skills, sampling in LLMs, DTOs, vibe coding, the changes at Broadcom and Red Hat, and several news items about open source licenses. Recorded on May 7, 2025. Download the episode LesCastCodeurs-Episode-325.mp3 or watch it on YouTube.

News. Languages. To mark JavaOne and the launch of Java 24, Oracle has launched a new site with video resources for learning the language https://learn.java/ The site is aimed mostly at beginners and teachers, covers the syntax including recent additions such as records and pattern matching, and is not the trendiest site in the world.

Martin Odersky shares a long article on the state of the Scala ecosystem and the evolution of the language https://www.scala-lang.org/blog/2025/03/24/evolving-scala.html Stability and the need to evolve: Scala holds its position (around 14th worldwide) with solid technical foundations, but must evolve against the competition to stay relevant. Priorities: the evolution focuses on improving the safety/usability pairing, polishing the language (removing the "rough edges"), and simplifying things for beginners. Continuous innovation: freezing features is ruled out; innovation is key to Scala's value, and the language must remain general-purpose and not tie itself to a specific framework. Challenges and progress: tooling (IDEs, build tools such as sbt, scala-cli, Mill) and the learnability of the ecosystem remain areas of attention, with improvements under way (a teaching partnership, simpler platforms).

Even faster strings! https://inside.java/2025/05/01/strings-just-got-faster/ In JDK 25, the performance of String::hashCode has been improved to be mostly constant-foldable. This means that if strings are used as keys in a static, immutable Map, significant performance gains are likely. The improvement relies on the internal @Stable annotation applied to the private String.hash field. This annotation lets the virtual machine read the hash value once and treat it as a constant as long as it is not the default value (zero). As a result, String::hashCode calls can be replaced by the known hash value, optimizing lookups in immutable Maps. One edge case is a string whose hash code is zero, for which the optimization does not work (for example the empty string ""). Although the @Stable annotation is internal to the JDK, a new JEP (JEP 502: Stable Values (Preview)) is being developed to let users benefit indirectly from similar functionality.

AtomicHash, a Java implementation of a HashMap that is thread-safe, atomic and non-blocking https://github.com/arxila/atomichash implemented as an immutable version of a Concurrent Hash Trie.

Libraries. Micronaut 4.8.0 has been released https://micronaut.io/2025/04/01/micronaut-framework-4-8-0-released/ BOM (Bill of Materials) update: version 4.8.0 updates the Micronaut platform BOM. Micronaut Core improvements: integration of Micronaut SourceGen for the internal generation of metadata and bytecode expressions, and many improvements in Micronaut SourceGen itself.
Dependency injection tracing has been added to make debugging at startup and during bean creation easier. A new definitionType member in the @Client annotation makes it easier to share interfaces between client and server. Support for merging in Bean Mappers via the @Mapping annotation. A new liveness probe detects deadlocked threads via ThreadMXBean. Improved Kubernetes integration: the Kubernetes Java client is updated to version 22.0.1, and a new Micronaut Kubernetes Client OpenAPI module offers an alternative to the official client with fewer dependencies, unified configuration, filter support and Native Image compatibility. A new server runtime based on Java's built-in HTTP server lets you build applications without any external server dependency. Micronaut Micrometer gains a module to instrument data sources (traces and metrics), and a condition member in the @MetricOptions annotation controls whether metrics are enabled via an expression. Support for Consul watches in Micronaut Discovery Client to detect distributed configuration changes. Source code can now be generated from a JSON schema via the build plugins (Gradle and Maven).

Web. Node v24.0.0 becomes the Current release: https://nodejs.org/en/blog/release/v24.0.0 V8 engine updated to version 13.6: brings new JavaScript features such as Float16Array, explicit resource management (using), RegExp.escape, WebAssembly Memory64 and Error.isError. npm 11 is included, with improvements in performance, security and compatibility with modern JavaScript packages. Compiler change on Windows: MSVC is dropped in favor of ClangCL for building Node.js on Windows. AsyncLocalStorage now uses AsyncContextFrame by default, for more efficient handling of asynchronous context. URLPattern is available globally: no explicit import is needed anymore to do URL matching. Permission model improvements: the experimental --experimental-permission flag becomes --permission, signaling increased stability of the feature. Test runner improvements: subtests are now awaited automatically, simplifying test writing and reducing errors from unhandled promises. Undici 7 integration: better HTTP client performance and extended support for modern HTTP features. Deprecations and removals: url.parse() is deprecated in favor of the WHATWG URL API; tls.createSecurePair is removed; SlowBuffer is deprecated; instantiating REPL without new is deprecated; using the Zlib classes without new is deprecated; passing args to spawn and execFile in child_process is deprecated. Node.js 24 is currently the "Current" release and will become an LTS release in October 2025. It is recommended to test this version to evaluate its impact on your applications.

Data and Artificial Intelligence. Learning to code remains crucial, and AI is there to help: https://kyrylo.org/software/2025/03/27/learn-to-code-ignore-ai-then-use-ai-to-code-even-better.html Learning to code remains essential despite AI. AI can assist with programming. A solid foundation is crucial for understanding and controlling the code. This helps you avoid becoming dependent on AI.
It also reduces the risk of being replaced by AI tools that are accessible to everyone. AI is a tool, not a substitute for mastering the fundamentals.

A great article from Anthropic that tries to understand how the "thinking" of LLMs works https://www.anthropic.com/research/tracing-thoughts-language-model Black-box effect: the internal strategies of AI models (Claude) are opaque to developers and users. Goal: understand the internal "reasoning" to verify capabilities and intentions. Method: inspired by neuroscience, they built an "AI microscope" (looking at which internal circuits activate). Technique: identification of concepts ("features") and internal "circuits". Multilingualism: evidence of a conceptual "language of thought" shared across languages before translating into a particular one. Planning: the model can plan ahead (for example rhymes in poetry), not just generate word by word (token by token). Unfaithful reasoning: it can fabricate plausible-sounding arguments ("bullshitting") for a given conclusion. Multi-step logic: it combines distinct facts rather than simply memorizing. Hallucinations: refusal is the default; the model answers when "knowledge" is active, otherwise there is a risk of hallucination when that check goes wrong. "Jailbreaks": tension between grammatical coherence (which pushes the model to keep going) and safety (which should make it refuse). Bottom line: the methods are limited but promising for the transparency and reliability of AI.

The "S" in MCP stands for Security (or not!) https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands-for-security-91407b33ed6b The MCP specification, which gives LLMs access to various tools and functions, may have been adopted a bit too quickly, before it was really ready on the security front. The article lists four possible types of attack: command injection vulnerabilities, tool poisoning attacks, silent redefinition of a tool, and cross-server tool shadowing. For now, MCP is not secure: no authentication standard, no context encryption, no tool integrity verification. Based on the InvariantLabs article https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks

Infinispan 15.2 released - pre rolling upgrades 16.0 https://infinispan.org/blog/2025/03/27/infinispan-15-2 Support for Redis JSON + Lua scripts, JVM metrics can be disabled, new console (PatternFly 6), improved docs (metrics + logs), JDK 17 minimum with JDK 24 supported, end of the native server (for performance reasons).

Guillaume shows how to build an MCP HTTP Server-Sent Events server with the Java reference implementation and LangChain4j https://glaforge.dev/posts/2025/04/04/mcp-client-and-server-with-java-mcp-sdk-and-langchain4j/ Written in Java, with the reference implementation that also underpins the Spring Boot implementation (but is independent of Spring). The MCP server is exposed as a servlet in Jetty. The MCP client is built with LangChain4j's MCP module; it is only semi-independent of Spring in the sense that it depends on Reactor and its interfaces. There is a conversation on Anthropic's GitHub to find a solution, but it does not look simple.

The fallacies behind the quote "AI won't replace you, but humans using AI will" https://platforms.substack.com/cp/161356485
The automation vs. augmentation fallacy: it focuses on improving existing tasks with AI instead of considering how the value of those tasks changes in a new system. The productivity-gains fallacy: higher productivity does not always translate into more value for workers, because the value created can be captured elsewhere in the system. The static-jobs fallacy: jobs are organizational constructs that AI can redefine, making traditional roles obsolete. The "me vs. someone using AI" competition fallacy: competition changes when AI alters the fundamental constraints of an industry, making existing skills less relevant. The workflow-continuity fallacy: AI can lead to a complete reimagining of workflows, eliminating the need for certain skills. The neutral-tools fallacy: AI tools are not neutral and can redistribute organizational power by changing how decisions are made and executed. The stable-salary fallacy: keeping a job does not guarantee a stable salary, because the value of the work can fall as AI capabilities grow. The stable-company fallacy: integrating AI requires restructuring the company and does not happen in an organizational vacuum.

Understanding "sampling" in LLMs https://rentry.co/samplers The article explains why LLMs use tokens, the different sampling methods (that is, how the next token is chosen), hyperparameters such as temperature and top-p and how they influence each other, and tokenization algorithms such as Byte Pair Encoding and SentencePiece.
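To make the temperature and top-p knobs mentioned in those notes concrete, here is a minimal, illustrative Python sketch of nucleus sampling over a toy logit vector (assuming NumPy; the vocabulary and numbers are invented, and production inference engines implement this far more efficiently):

```python
import numpy as np

def sample_token(logits, temperature=0.8, top_p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    # Temperature rescales the logits: lower values sharpen the distribution,
    # higher values flatten it toward uniform.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Top-p (nucleus) sampling: keep the smallest set of most-likely tokens
    # whose cumulative probability reaches top_p, then renormalize.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))

# Toy vocabulary and logits, invented purely for illustration.
vocab = ["the", "a", "JSON", "token", "cat"]
logits = [2.0, 1.5, 1.0, 0.3, -1.0]
print(vocab[sample_token(logits)])
```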
One less independent player… OpenAI is set to buy Windsurf for 3 billion dollars. https://www.bloomberg.com/news/articles/2025-05-06/openai-reaches-agreement-to-buy-startup-windsurf-for-3-billion The deal is not yet finalized. Windsurf was valued at 1.25 billion last year, while OpenAI recently raised 40 billion, bringing its valuation to 300 billion. The goal for OpenAI is to enter the coding-assistant market, where it is currently absent.

Docker Desktop gets into AI…? A new feature in Docker Desktop 4.4 on macOS: Docker Model Runner https://dev.to/docker/run-genai-models-locally-with-docker-model-runner-5elb It lets you run models natively on your local machine (https://docs.docker.com/model-runner/) and also MCP servers (https://docs.docker.com/ai/mcp-catalog-and-toolkit/).

Tooling. JetBrains defends removing negative reviews of its unpopular AI assistant https://devclass.com/2025/04/30/jetbrains-defends-removal-of-negative-reviews-for-unpopular-ai-assistant/?td=rt-3a JetBrains' AI Assistant, launched in July 2023, has been downloaded more than 22 million times but is rated only 2.3 out of 5. Users noticed that some negative reviews were being removed, which triggered a backlash on social media. A JetBrains employee explained that reviews were removed either because they mentioned problems that had already been fixed or because they violated the policy on "profanity, etc." The company acknowledged it could have handled the situation better, with a representative saying: "Removing several reviews at once without warning looked suspicious. We should at least have posted a notice and given the authors more detail." Problems with the AI Assistant reported by users include limited support for third-party model providers, noticeable latency, frequent slowdowns, core features locked to JetBrains' cloud services, an inconsistent user experience and insufficient documentation. A common complaint is that the AI Assistant installs itself without permission; one Reddit user called it an "annoying plugin that repairs/reinstalls itself like a phoenix." JetBrains recently introduced a free tier and a new AI agent called Junie, meant to work alongside the AI Assistant, probably in response to competition between vendors, but it is more expensive to run. The company has committed to exploring new approaches for handling major updates differently, and is considering per-version reviews or marking reviews as "Resolved" with links to the corresponding issues instead of deleting them. Unlike competitors such as Microsoft, AWS or Google, JetBrains only sells developer tools and services and has no separate cloud business to fall back on.

Make the images in your READMEs and Markdown files work with GitHub's dark mode: https://github.blog/developer-skills/github/how-to-make-your-images-in-markdown-on-github-adjust-for-dark-mode-and-light-mode/ Only a few lines of pure HTML are needed.

Architecture. So, DTOs: good or bad? https://codeopinion.com/dtos-mapping-the-good-the-bad-and-the-excessive/ What DTOs are for: DTOs transfer data between the layers of an application, often mapping data between different representations (for example between the database and the user interface). Frequent overuse: the article stresses that DTOs are often used excessively, notably to build HTTP APIs that merely mirror database entities, missing the opportunity to compose richer data. Real value: the real value of DTOs lies in managing coupling between layers and composing data from multiple sources into shapes optimized for specific use cases. Decoupling: DTOs are suggested as a way to decouple internal data models from external contracts (such as APIs), allowing independent evolution and versioning. Example with CQRS: in CQRS (Command Query Responsibility Segregation), query responses act as DTOs tailored specifically to the needs of the UI, possibly including data from several sources. Protecting internal data: DTOs help distinguish and protect internal (private) data models from external (public) changes. Avoiding excess: the author warns against excessive mapping layers (mapping one DTO to another DTO) that add no value. Targeted creation: create DTOs only when they solve a concrete problem, such as managing coupling or enabling data composition.
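As a small illustration of the decoupling argument above, here is a hedged Python sketch (all names and fields are invented for the example): the internal entity never crosses the service boundary, and the DTO for one screen is composed from two internal sources.

```python
from dataclasses import dataclass

# Internal persistence model: mirrors the database row, including fields that
# should never leave the service boundary.
@dataclass
class CustomerEntity:
    id: int
    email: str
    password_hash: str
    loyalty_points: int

# A second internal source the screen needs data from.
@dataclass
class OrderSummary:
    open_orders: int
    last_order_total: float

# The DTO: a public contract shaped for one specific screen, composed from
# more than one internal model and free to evolve separately from both.
@dataclass
class CustomerDashboardDTO:
    email: str
    loyalty_points: int
    open_orders: int
    last_order_total: float

def to_dashboard_dto(customer: CustomerEntity, orders: OrderSummary) -> CustomerDashboardDTO:
    return CustomerDashboardDTO(
        email=customer.email,
        loyalty_points=customer.loyalty_points,
        open_orders=orders.open_orders,
        last_order_total=orders.last_order_total,
    )

dto = to_dashboard_dto(
    CustomerEntity(id=7, email="ada@example.com", password_hash="redacted", loyalty_points=1200),
    OrderSummary(open_orders=2, last_order_total=59.90),
)
print(dto)
```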
Methodologies. Even Guillaume is getting into "vibe coding" https://glaforge.dev/posts/2025/05/02/vibe-coding-an-mcp-server-with-micronaut-and-gemini/ According to Andrej Karpathy, vibe coding means POC-ing a prototype, a throwaway weekend app https://x.com/karpathy/status/1886192184808149383 But Simon Willison objects that some people confuse AI-assisted coding with vibe coding https://simonwillison.net/2025/May/1/not-vibe-coding/ Here Guillaume had fun building an MCP server with Micronaut, using Gemini, Google's AI. Unlike Quarkus or Spring Boot, Micronaut does not yet have a module or specific support to make building an MCP server easier.

Security. A 10/10 security flaw in Tomcat https://www.it-connect.fr/apache-tomcat-cette-faille-activement-exploitee-seulement-30-heures-apres-sa-divulgation-patchez/ A critical vulnerability (CVE-2025-24813) affects Apache Tomcat and allows remote code execution. It was actively exploited only 30 hours after its disclosure on March 10, 2025. The attack requires no authentication and is particularly simple to carry out. It uses a PUT request with a base64-encoded serialized Java payload, followed by a GET request. The base64 encoding bypasses most security filters. Vulnerable servers use file-based session storage (a widespread configuration). Affected versions are 11.0.0-M1 to 11.0.2, 10.1.0-M1 to 10.1.34, and 9.0.0.M1 to 9.0.98. Recommended updates: 11.0.3+, 10.1.35+ and 9.0.99+. Experts expect more sophisticated attacks in the next phases of exploitation (uploading configs or JSPs).

Hardening an SSH server https://ittavern.com/ssh-server-hardening/ An article listing the key configuration settings for securing an SSH server: for example, disabling password authentication, changing the port, disabling root login, forcing SSH protocol 2, and some settings I did not know, such as MaxStartups, which limits the number of concurrent unauthenticated connections. Port knocking is a useful technique, but it requires clients that are aware of the protocol.

Oracle admits that its customers' IAM identities leaked https://www.theregister.com/2025/04/08/oracle_cloud_compromised/ Oracle has confirmed to some customers that its public cloud was compromised, after previously denying any intrusion. A hacker claimed to have breached two Oracle authentication servers and stolen around six million records, including private security keys, encrypted credentials and LDAP entries. The exploited flaw is believed to be CVE-2021-35587 in Oracle Access Manager, which Oracle had not patched on its own systems. In early March the attacker created a text file on login.us2.oraclecloud.com containing their email address to prove the access. According to Oracle, an old server containing eight-year-old data was compromised, but one customer says login data as recent as 2024 was stolen. Oracle faces a lawsuit in Texas over this data breach. The intrusion is distinct from another attack against Oracle Health, on which the company declines to comment. Oracle could face penalties under the European GDPR, which requires notifying affected parties within 72 hours of discovering a data leak.
Oracle's pattern of denying and then quietly admitting the intrusion is unusual in 2025 and could lead to further class-action suits.

A very popular GitHub Action compromised https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised Compromise of the tj-actions/changed-files action: in March 2025, this widely used GitHub Action was compromised, and modified versions of it exposed CI/CD secrets in build logs. Attack method: a compromised PAT was used to repoint several version tags to a commit containing malicious code. Details of the malicious code: the injected code executed a base64-encoded Node.js function that downloaded a Python script. That script scanned the GitHub runner's memory looking for secrets (tokens, keys…) and dumped them into the logs; in some cases the data was also sent out via a network request. Exposure window: the compromised versions were active between March 12 and 15, 2025. Any repository, particularly a public one, that used the action during this window should be considered potentially exposed. Detection: the malicious activity was spotted by analyzing unusual behavior during workflow runs, such as unexpected network connections. Response: GitHub removed the compromised action, which was later cleaned up. Potential impact: any secret appearing in the logs must be considered compromised, even in private repositories, and regenerated without delay.

Law, society and organizations. Y Combinator startups are showing the fastest growth in the fund's history https://www.cnbc.com/2025/03/15/y-combinator-startups-are-fastest-growing-in-fund-history-because-of-ai.html Early-stage companies in Silicon Valley are seeing significant growth thanks to artificial intelligence. Y Combinator CEO Garry Tan says the latest cohort as a whole grew 10% per week for nine months. AI lets developers automate repetitive tasks and generate code with large language models. For about 25% of the current YC startups, 95% of their code was written by AI. This shift lets companies grow with fewer staff, some reaching 10 million dollars in revenue with fewer than 10 employees. The "growth at all costs" mindset has been replaced by renewed interest in profitability. Around 80% of the companies presented at demo day were AI-focused, with a few robotics and semiconductor startups. Y Combinator invests 500,000 dollars in startups in exchange for equity, followed by a three-month program.

Red Hat middleware (ex-JBoss) joins IBM https://markclittle.blogspot.com/2025/03/red-hat-middleware-moving-to-ibm.html Red Hat's middleware activities (including JBoss, Quarkus, etc.) are being transferred to IBM, into the unit dedicated to data security, IAM and runtimes. The change stems from a strategic decision by Red Hat to focus more on hybrid cloud and artificial intelligence. Mark Little explains that the transfer had become inevitable, Red Hat having reduced its middleware investments in recent years.
The integration aims to strengthen innovation around Java by bringing Red Hat's and IBM's efforts together. The middleware products will remain open source and customers will continue to get the usual support, unchanged. Mark Little states that projects such as Quarkus will continue to be supported and that this evolution is good for the Java community.

One year of Commonhaus https://www.commonhaus.org/activity/253.html One year in, having started with the communities they knew well, the foundation now hosts 14 projects and can accept more. Its pillars are trust, lightweight governance and protecting the future of the projects: automating the administrative work, stability without complexity, and keeping developers at the center of the decision process. They need members and (financial) supporters, and they want to welcome projects beyond the circle of the Java Champions.

Spring Cloud Data Flow becomes a commercial product and will no longer be maintained as open source https://spring.io/blog/2025/04/21/spring-cloud-data-flow-commercial Perhaps under Broadcom's influence, Spring is starting to move components of the Spring portfolio to a proprietary model. They say few people used it in OSS mode, that most usage came through the Tanzu platform, and that maintaining it in open source costs time they no longer want to put into these projects.

The CNCF protects the NATS project, in the foundation since 2018, after Synadia, the company that contributes to it, sought to take back control of the project https://www.cncf.io/blog/2025/04/24/protecting-nats-and-the-integrity-of-open-source-cncfs-commitment-to-the-community/ CNCF: protects open source projects under neutral governance. Synadia vs CNCF: Synadia wants to pull NATS out and move it to a non-open-source license (BUSL). CNCF: accuses Synadia of an illegitimate "claw back". Synadia's claims: the nats.io domain and the GitHub organization. NATS trademark: Synadia never transferred it (a broken promise, despite the CNCF's help). Synadia's objection: it deems the CNCF's rules "too vague". Internal vote: Synadia's maintainers voted to leave the CNCF (without the community). CNCF support: major investment ($ audits, legal work) and community success (>700 organizations). Future of NATS (per the CNCF): staying under Apache 2.0 with open governance. CNCF actions: health check, call for maintainers, cancellation of Synadia's trademark application, rejection of its demands. In the end there seems to be a good outcome: https://www.cncf.io/announcements/2025/05/01/cncf-and-synadia-align-on-securing-the-future-of-the-nats-io-project/ Agreement on the future of NATS.io: the Cloud Native Computing Foundation (CNCF) and Synadia have reached an agreement to secure the future of the NATS.io project. Transfer of the NATS trademarks: Synadia will hand over its two NATS trademark registrations to the Linux Foundation to reinforce the project's open governance. Staying within the CNCF: the project's infrastructure and assets will remain under the CNCF umbrella, guaranteeing its long-term stability and its open source development under the Apache-2.0 license. Recognition and commitment: the Linux Foundation, through Todd Moore, recognizes Synadia's contributions and continued support; Derek Collison, Synadia's CEO, reaffirms his company's commitment to NATS and to collaborating with the Linux Foundation and the CNCF. Adoption and community support: NATS is widely adopted, considered critical infrastructure, and enjoys strong community support for its open source nature and Synadia's ongoing involvement.
Finally, Redis returns to an OSI-approved open source license, the AGPL https://foojay.io/today/redis-is-now-available-under-the-agplv3-open-source-license/ Redis moves to the AGPLv3 open source license to counter exploitation by cloud providers that do not contribute back. The earlier switch to the SSPL license had damaged the relationship with the open source community. Salvatore Sanfilippo (antirez) has returned to Redis. Redis 8 adopts the AGPL, integrates the Redis Stack features (JSON, Time Series, etc.) and introduces "vector sets" (the vector computation support developed by Salvatore). These changes aim to strengthen Redis as a platform developers love, in line with Salvatore's original vision.

Conferences. The list of conferences comes from Developers Conferences Agenda/List by Aurélie Vache and contributors: May 6-7, 2025: GOSIM AI Paris - Paris (France); May 7-9, 2025: Devoxx UK - London (UK); May 15, 2025: Cloud Toulouse - Toulouse (France); May 16, 2025: AFUP Day 2025 Lille - Lille (France); May 16, 2025: AFUP Day 2025 Lyon - Lyon (France); May 16, 2025: AFUP Day 2025 Poitiers - Poitiers (France); May 22-23, 2025: Flupa UX Days 2025 - Paris (France); May 24, 2025: Polycloud - Montpellier (France); May 24, 2025: NG Baguette Conf 2025 - Nantes (France); June 3, 2025: TechReady - Nantes (France); June 5-6, 2025: AlpesCraft - Grenoble (France); June 5-6, 2025: Devquest 2025 - Niort (France); June 10-11, 2025: Modern Workplace Conference Paris 2025 - Paris (France); June 11-13, 2025: Devoxx Poland - Krakow (Poland); June 12, 2025: Positive Design Days - Strasbourg (France); June 12-13, 2025: Agile Tour Toulouse - Toulouse (France); June 12-13, 2025: DevLille - Lille (France); June 13, 2025: Tech F'Est 2025 - Nancy (France); June 17, 2025: Mobilis In Mobile - Nantes (France); June 19-21, 2025: Drupal Barcamp Perpignan 2025 - Perpignan (France); June 24, 2025: WAX 2025 - Aix-en-Provence (France); June 25-26, 2025: Agi'Lille 2025 - Lille (France); June 25-27, 2025: BreizhCamp 2025 - Rennes (France); June 26-27, 2025: Sunny Tech - Montpellier (France); July 1-4, 2025: Open edX Conference 2025 - Palaiseau (France); July 7-9, 2025: Riviera DEV 2025 - Sophia Antipolis (France); September 5, 2025: JUG Summer Camp 2025 - La Rochelle (France); September 12, 2025: Agile Pays Basque 2025 - Bidart (France); September 18-19, 2025: API Platform Conference - Lille (France) & Online; September 23, 2025: OWASP AppSec France 2025 - Paris (France); September 25-26, 2025: Paris Web 2025 - Paris (France); October 2-3, 2025: Volcamp - Clermont-Ferrand (France); October 3, 2025: DevFest Perros-Guirec 2025 - Perros-Guirec (France); October 6-10, 2025: Devoxx Belgium - Antwerp (Belgium); October 7, 2025: BSides Mulhouse - Mulhouse (France); October 9-10, 2025: Forum PHP 2025 - Marne-la-Vallée (France); October 9-10, 2025: EuroRust 2025 - Paris (France); October 16, 2025: PlatformCon25 Live Day Paris - Paris (France); October 16-17, 2025: DevFest Nantes - Nantes (France); October 30-31, 2025: Agile Tour Bordeaux 2025 - Bordeaux (France); October 30-31, 2025: Agile Tour Nantais 2025 - Nantes (France); October 30 - November 2, 2025: PyConFR 2025 - Lyon (France); November 4-7, 2025: NewCrafts 2025 - Paris (France); November 6, 2025: dotAI 2025 - Paris (France); November 7, 2025: BDX I/O - Bordeaux (France); November 12-14, 2025: Devoxx Morocco - Marrakech (Morocco); November 13, 2025: DevFest Toulouse - Toulouse (France); November 15-16, 2025: Capitole du Libre - Toulouse (France);
November 20, 2025: OVHcloud Summit - Paris (France); November 21, 2025: DevFest Paris 2025 - Paris (France); November 27, 2025: DevFest Strasbourg 2025 - Strasbourg (France); November 28, 2025: DevFest Lyon - Lyon (France); December 5, 2025: DevFest Dijon 2025 - Dijon (France); December 10-11, 2025: Devops REX - Paris (France); December 10-11, 2025: Open Source Experience - Paris (France); January 28-31, 2026: SnowCamp 2026 - Grenoble (France); February 2-6, 2026: Web Days Convention - Aix-en-Provence (France); April 23-25, 2026: Devoxx Greece - Athens (Greece); June 17, 2026: Devoxx Poland - Krakow (Poland).

Get in touch. To react to this episode, come and discuss it on the Google group https://groups.google.com/group/lescastcodeurs Contact us via X/Twitter https://twitter.com/lescastcodeurs or Bluesky https://bsky.app/profile/lescastcodeurs.com Submit a crowdcast or a crowdquestion. Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs All the episodes and all the info are at https://lescastcodeurs.com/
Daisy Hollman joins Phil and Anastasia. Daisy talks to us about the current state of the art in using LLM-based AI agents to help with software development, as well as where that is going in the future, and what impacts it is having (good and bad). Show Notes News Clang 20 released Boost 1.88 released JSON for Modern C++ 3.12.0 Conferences: Pure Virtual C++ 2025 Full schedule C++ Now 2025 C++ on Sea 2025 - speakers C++ under the Sea 2025 Links "Not your Grandparent's C++" - Phil's talk "Robots Are After Your Job: Exploring Generative AI for C++" - Andrei Alexandrescu's closing CppCon 2023 keynote
Cloud Connections 2025 | St. Petersburg, FL “If you think you're moving fast, you're probably not moving fast enough.” That was the core message from Mike Tessler, managing partner at True North Advisory, in his opening keynote at the Cloud Connections 2025 conference. In a session titled “Don't Stop Believin': AI's Journey in Enterprise Transformation,” Tessler shifted the AI conversation from capabilities to strategy. Instead of showcasing the latest contact center tricks or flashy generative features, he dove deep into how enterprises should approach AI adoption—with urgency, realism, and a clear plan. Tessler framed the moment as a once-in-a-generation inflection point. Just 866 days since ChatGPT launched, enterprises have been flooded with AI solutions, but many are still struggling with actual implementation. “The field is exploding, but there's friction,” said Tessler, noting that while consumers quickly embraced AI tools, corporate environments remain slow to adapt. Three Big Takeaways from Tessler's Talk AI Is Only as Good as Your Data Enterprises must start by understanding their own data. “Almost every company says, ‘We don't have data,'” Tessler observed, “but they do. They just don't know how to surface and structure it.” He suggested simple tools like JSON to codify marketing guidelines or operational principles and inject consistency into AI-generated content. Enterprise Strategy Starts with Personal Productivity Tessler outlined a three-layer AI roadmap used at Boldyn Networks, where he serves on the board: Layer 1: Personal Productivity (e.g., Copilot, Gemini) Layer 2: Team & Process-Level AI (e.g., AI in network design/deployment) Layer 3: New Services & Capabilities enabled by proprietary data This layered model helps unify enterprise goals and align AI projects with tangible outcomes. Start Small, Move Fast, Stay Agile Forget long IT rollouts, said Tessler. AI adoption demands an agile, iterative approach. Small proofs of concept are key. “Something that wasn't possible last week might be today,” he warned. “So get started now.” Real-World Use Cases: Where AI Is Delivering Value Today Tessler concluded with four examples of AI being used to solve real business problems: Spinoco – Helps micro-businesses manage customer interactions by turning every message, call, or DM into actionable tasks, no CRM needed. Kiwi Data – Uses AI to extract key terms and obligations from decades of contracts and NDAs, helping enterprises get a grip on what they've signed. Tato – Leverages the “exhaust” of UCaaS platforms (transcripts, messages) to identify project risks and drive smarter project management. Intent HQ – Delivers hyper-personalized marketing using behavioral data harvested via mobile SDKs. A Call to Action for the Telecom Community Tessler left the audience with a challenge: "We have to change the way we do things—or get wiped out by those who do." He encouraged every organization to return home with at least one AI use case to explore. “Try something. Test. Learn. Iterate.” To request the slides from the keynote, contact: info@truenorthadvisory.com
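As a concrete (and entirely invented) illustration of Tessler's suggestion to codify guidelines as JSON, here is a small Python sketch; the guideline names and values are hypothetical, and the prompt-building function stands in for whatever LLM call an organization actually uses.

```python
import json

# Hypothetical brand guidelines captured once as structured data rather than
# living only in a slide deck; the field names are illustrative, not a standard.
brand_guidelines = {
    "voice": "plain-spoken, optimistic, no jargon",
    "banned_phrases": ["industry-leading", "synergy"],
    "reading_level": "grade 8",
    "call_to_action": "Book a 20-minute demo",
}

def build_prompt(task: str) -> str:
    # Injecting the same JSON block into every request is what gives the
    # generated copy its consistency across campaigns and authors.
    return (
        "Follow these brand guidelines exactly:\n"
        f"{json.dumps(brand_guidelines, indent=2)}\n\n"
        f"Task: {task}"
    )

print(build_prompt("Write a 50-word product announcement for the spring release."))
```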
Get featured on the show by leaving us a Voice Mail: https://bit.ly/MIPVM FULL SHOW NOTES https://www.microsoftinnovationpodcast.com/680 Microsoft's AI landscape has evolved into three distinct categories: Copilot for Microsoft 365 (M365) applications, Copilot Studio for low-code chatbot development, and Azure AI Foundry (formerly AI Studio) for pro-code flexibility with AI models. Join Nanddeep Nachan on today's Power Platform Show to learn more. Takeaways: • Declarative agents provide the simplest approach to extending Copilot functionality without complex licensing • Teams Toolkit in Visual Studio Code offers an easy way to create declarative agents using simple JSON configurations • Copilot Studio gives business users a drag-and-drop interface for creating virtual assistants quickly • Azure AI Foundry provides comprehensive tools for developers and data scientists building advanced AI solutions • Retrieval Augmented Generation (RAG) pattern bridges the gap between LLMs and organization-specific data (a minimal sketch of the pattern follows these notes) • Contract management use cases demonstrate how AI can extract insights from millions of documents • Graph RAG pattern enables "global queries" that deliver insights across entire document collections • AI Foundry solutions can be deployed directly to websites, Teams apps, or Microsoft 365 Copilot • Despite impressive personal productivity gains, many organizations still struggle to find compelling enterprise-level use cases for Copilot. This year we're adding a new show to our line-up - The AI Advantage. We'll discuss the skills you need to thrive in an AI-enabled world. DynamicsMinds is a world-class event in Slovenia that brings together Microsoft product managers, industry leaders, and dedicated users to explore the latest in Microsoft Dynamics 365, the Power Platform, and Copilot. Early bird tickets are on sale now and listeners of the Microsoft Innovation Podcast get 10% off with the code MIPVIP144bff https://www.dynamicsminds.com/register/?voucher=MIPVIP144bff Accelerate your Microsoft career with the 90 Day Mentoring Challenge. We've helped 1,300+ people across 70+ countries establish successful careers in the Microsoft Power Platform and Dynamics 365 ecosystem. Benefit from expert guidance, a supportive community, and a clear career roadmap. A lot can change in 90 days, get started today! Support the show. If you want to get in touch with me, you can message me here on LinkedIn. Thanks for listening!
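Since the notes above lean on the Retrieval Augmented Generation (RAG) pattern, here is a deliberately tiny Python sketch of the idea, with an invented three-document corpus and a toy embedding function standing in for a real embedding model and vector store:

```python
import numpy as np

# Toy corpus of organization-specific snippets; embed() is a stand-in for a
# real embedding model and just hashes characters into a vector.
documents = [
    "Contract 1142 renews on 2026-01-31 with a 60-day notice period.",
    "The travel policy caps hotel rates at 180 EUR per night in Europe.",
    "Support tickets must receive a first response within 4 business hours.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for i, ch in enumerate(text.lower()):
        vec[(i * 31 + ord(ch)) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(question: str, k: int = 2) -> list:
    # Rank documents by similarity to the question and keep the top k.
    scores = doc_vectors @ embed(question)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How quickly do we have to answer a new support ticket?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # This prompt would then be sent to whichever LLM you use.
```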
In this episode Michele shares leaked new Figma features: Figma Sites and grid. We talk about Mozilla changing the default Heading 1 styling and WebKit taking a different route for its implementation of text-wrap: pretty. We also discuss using JSON to generate images with ChatGPT (a small sketch of the idea follows after the timestamps). 0:00 - Intro 0:56 - UX Survey update 1:36 - Font Awesome 7 explained 7:05 - Firefox changes the user agent heading 1 styling - https://developer.mozilla.org/en-US/blog/h1-element-styles/ 11:36 - Figma grid - https://x.com/wongmjane/status/1914034569143337390 and Figma Sites - https://x.com/wongmjane/status/1913640426801865082 leaked 21:00 - WebKit implements text-wrap: pretty slightly differently - https://webkit.org/blog/16547/better-typography-with-text-wrap-pretty/ 32:28 - Consistent ChatGPT results with JSON - https://x.com/d4m1n/status/1914618354859384860 43:05 - The White Lotus on Netflix - https://www.imdb.com/title/tt13406094/ 46:03 - Yellowstone on Netflix - https://www.imdb.com/title/tt4236770/
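The "consistent ChatGPT results with JSON" trick discussed in the episode boils down to reusing one structured style specification across prompts. Here is a small, illustrative Python sketch; the style keys are invented for the example, not a schema that ChatGPT defines.

```python
import json

# One reusable style template: keeping every visual choice in a single JSON
# object is what makes repeated generations come out looking like a series.
style = {
    "medium": "flat vector illustration",
    "palette": ["#1d3557", "#f1faee", "#e63946"],
    "lighting": "soft, even, no shadows",
    "aspect_ratio": "1:1",
}

def image_prompt(subject: str) -> str:
    return (
        f"Generate an image of {subject}. "
        f"Apply this style specification strictly: {json.dumps(style)}"
    )

for subject in ["a lighthouse", "a tram in Amsterdam", "a stack of books"]:
    print(image_prompt(subject))
```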
This week, we discuss Google being found to be a monopoly, OpenAI's “offer” to buy Chrome, and some hot takes on JSON. Plus, is it better to wait on hold or ask for a callback? Watch the YouTube Live Recording of Episode (https://www.youtube.com/watch?v=EhUxUPJv5g4) 516 (https://www.youtube.com/watch?v=EhUxUPJv5g4) Runner-up Titles Just Fine The SDT “Fine” Scale Callback Asynchronous Friendship I would love to get to know you better…over text Send you Jams to the dry cleaners. JSON Take it xslt-easy! Rundown OpenAI OpenAI in talks to pay about $3 billion to acquire AI coding startup Windsurf (https://www.cnbc.com/2025/04/16/openai-in-talks-to-pay-about-3-billion-to-acquire-startup-windsurf.html) The Cursor Mirage (https://artificialintelligencemadesimple.substack.com/p/the-cursor-mirage) AI is for Tinkerers (https://redmonk.com/kholterhoff/2023/06/27/ai-is-for-tinkerers/) Vibe Coding is for PMs (https://redmonk.com/rstephens/2025/04/18/vibe-coding-is-for-pms/) OpenAI releases new simulated reasoning models with full tool access (https://arstechnica.com/ai/2025/04/openai-releases-new-simulated-reasoning-models-with-full-tool-access/) Clouded Judgement 4.18.25 - The Hidden Value in the AI Application Layer (https://cloudedjudgement.substack.com/p/clouded-judgement-41825-the-hidden?utm_source=post-email-title&publication_id=56878&post_id=161562220&utm_campaign=email-post-title&isFreemail=true&r=2l9&triedRedirect=true&utm_medium=email) OpenAI tells judge it would buy Chrome from Google (https://www.theverge.com/news/653882/openai-chrome-google-us-judge) The Creators of Model Context Protocol (https://www.latent.space/p/mcp?utm_source=substack&utm_medium=email) Judge finds Google holds illegal online ad tech monopolies (https://www.cnbc.com/2025/04/17/judge-finds-google-holds-illegal-online-ad-tech-monopolies.html) Intuit, Owner of TurboTax, Wins Battle Against America's Taxpayers (https://prospect.org/power/2025-04-17-intuit-turbotax-wins-battle-against-taxpayers-irs-direct-file/) Relevant to your Interests Switch 2 Carts Still Taste Bad, Designed Purposefully To Be Spat Out (https://www.gamespot.com/articles/switch-2-carts-still-taste-bad-designed-purposefully-to-be-spat-out/1100-6530649/) CEO Andy Jassy's 2024 Letter to Shareholders (https://www.aboutamazon.com/news/company-news/amazon-ceo-andy-jassy-2024-letter-to-shareholders) Amazon CEO Andy Jassy says AI costs will come down (https://www.cnbc.com/2025/04/10/amazon-ceo-andy-jassys-2025-shareholder-letter.html) Happy 18th Birthday CUDA! (https://www.aboutamazon.com/news/company-news/amazon-ceo-andy-jassy-2024-letter-to-shareholders) Honeycomb Acquires Grit: A Strategic Investment in Pragmatic AI and Customer Value (https://www.honeycomb.io/blog/honeycomb-acquires-grit) Everything Announced at Google Cloud Next in 12 Minutes (https://www.youtube.com/watch?v=2OpHbyN4vEM) GitLab vs GitHub : Key Differences in 2025 (https://spacelift.io/blog/gitlab-vs-github) Old Fashioned Function Keys (https://economistwritingeveryday.com/2025/04/11/old-fashioned-function-keys/) Fake job seekers are flooding U.S. 
companies that are hiring for remote positions, (https://www.cnbc.com/2025/04/08/fake-job-seekers-use-ai-to-interview-for-remote-jobs-tech-ceos-say.html) NetRise raises $10M to expand software supply chain security platform (https://siliconangle.com/2025/04/15/netrise-raises-10-million-expand-software-supply-chain-security-platform/) Mark Zuckerberg's antitrust testimony aired his wildest ideas from Meta's history (https://www.theverge.com/policy/649520/zuckerberg-meta-ftc-antitrust-testimony-facebook-history) How Much Should I Be Spending On Observability? (https://www.honeycomb.io/blog/how-much-should-i-spend-on-observability-pt1) Did we just make platform engineering much easier by shipping a cloud IDP? (https://seroter.com/2025/04/16/did-we-just-make-platform-engineering-much-easier-by-shipping-a-cloud-idp/) Google Cloud Next 2025: Agentic AI Stack, Multimodality, And Sovereignty (https://www.forrester.com/blogs/google-next-2025-agentic-ai-stack-multimodality-and-sovereignty/) iPhone Shipments Down 9% in China's Q1 Smartphone Boom (https://www.macrumors.com/2025/04/18/iphone-shipments-down-in-china-q1/) Exclusive: Anthropic warns fully AI employees are a year away (https://www.axios.com/2025/04/22/ai-anthropic-virtual-employees-security) Synology requires self-branded drives for some consumer NAS systems, drops full functionality and support for third-party HDDs (https://www.tomshardware.com/pc-components/nas/synology-requires-self-branded-drives-for-some-consumer-nas-systems-drops-full-functionality-and-support-for-third-party-hdds) Porting Tailscale to Plan 9 (https://tailscale.com/blog/plan9-port?ck_subscriber_id=512840665&utm_source=convertkit&utm_medium=email&utm_campaign=[Last%20Week%20in%20AWS]%20Issue%20#418:%20Another%20New%20Capacity%20Dingus%20-%2017270009) CVE Foundation (https://www.thecvefoundation.org/) The Cursor Mirage (https://artificialintelligencemadesimple.substack.com/p/the-cursor-mirage) There's a Lot of Bad Telemetry Out There (https://blog.olly.garden/theres-a-lot-of-bad-telemetry-out-there) Gee Wiz (https://redmonk.com/rstephens/2025/04/04/gee-wiz/?ck_subscriber_id=512840665&utm_source=convertkit&utm_medium=email&utm_campaign=[Last%20Week%20in%20AWS]%20Issue%20#418:%20Another%20New%20Capacity%20Dingus%20-%2017270009) Nonsense Silicon Valley crosswalk buttons hacked to imitate Musk, Zuckerberg's voices (https://techcrunch.com/2025/04/14/silicon-valley-crosswalk-buttons-hacked-to-imitate-musk-zuckerberg-voices/) A Visit to Costco in France (https://davidlebovitz.substack.com/p/a-visit-to-costco-in-france) No sweat: Humanoid robots run a Chinese half-marathon (https://apnews.com/article/china-robot-half-marathon-153c6823bd628625106ed26267874d21) Metre, a consistent measurement of the world (https://mappingignorance.org/2025/04/23/150-years-ago-the-metre-convention-determined-how-we-measure-the-world/) Conferences DevOps Days Atlanta (https://devopsdays.org/events/2025-atlanta/welcome/), April 29th-30th. KCD Texas Austin 2025 (https://community.cncf.io/events/details/cncf-kcd-texas-presents-kcd-texas-austin-2025/), May 15th, Whitney Lee speaking. Cloud Foundry Day US (https://events.linuxfoundation.org/cloud-foundry-day-north-america/), May 14th, Palo Alto, CA, Coté speaking. Free AI workshop (https://vmwarereg.fig-street.com/051325-tanzu-workshop/), May 13th,
the day before Cloud Foundry Day (https://events.linuxfoundation.org/cloud-foundry-day-north-america/). NDC Oslo (https://ndcoslo.com/), May 21st-23rd, Coté speaking. SDT News & Community Join our Slack community (https://softwaredefinedtalk.slack.com/join/shared_invite/zt-1hn55iv5d-UTfN7mVX1D9D5ExRt3ZJYQ#/shared-invite/email) Email the show: questions@softwaredefinedtalk.com (mailto:questions@softwaredefinedtalk.com) Free stickers: Email your address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) Follow us on social media: Twitter (https://twitter.com/softwaredeftalk), Threads (https://www.threads.net/@softwaredefinedtalk), Mastodon (https://hachyderm.io/@softwaredefinedtalk), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com) Watch us on: Twitch (https://www.twitch.tv/sdtpodcast), YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured), Instagram (https://www.instagram.com/softwaredefinedtalk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk) Book offer: Use code SDT for $20 off "Digital WTF" by Coté (https://leanpub.com/digitalwtf/c/sdt) Sponsor the show (https://www.softwaredefinedtalk.com/ads): ads@softwaredefinedtalk.com (mailto:ads@softwaredefinedtalk.com) Recommendations Brandon: Dope Thief (https://www.rottentomatoes.com/tv/dope_thief) on Apple TV (https://www.rottentomatoes.com/tv/dope_thief) Coté: Check out the recording of the Tanzu Annual update (https://www.youtube.com/watch?v=c1QZXzJcAfQ), all about Tanzu's private AI platform. Next, watch Coté's new MCP for D&D video (#4), which figures out something cool to do with MCP Prompts (https://www.youtube.com/watch?v=xEtYBznneFg), they make sense now. And, a regret-a-mmendation: Field Notes annual subscription (https://fieldnotesbrand.com/limited-editions). Photo Credits Header (https://unsplash.com/photos/a-telephone-sitting-on-top-of-a-wooden-shelf-2XnGRN_caHc)
In this episode, Pallavi Koppol, Research Scientist at Databricks, explores the importance of domain-specific intelligence in large language models (LLMs). She discusses how enterprises need models tailored to their unique jargon, data, and tasks rather than relying solely on general benchmarks.Highlights include:- Why benchmarking LLMs for domain-specific tasks is critical for enterprise AI.- An introduction to the Databricks Intelligence Benchmarking Suite (DIBS).- Evaluating models on real-world applications like RAG, text-to-JSON, and function calling.- The evolving landscape of open-source vs. closed-source LLMs.- How industry and academia can collaborate to improve AI benchmarking.
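For the text-to-JSON style of evaluation mentioned above, a benchmark ultimately needs a scoring function. Here is a minimal, illustrative Python sketch (the required field names are invented) that checks whether a model's output parses as JSON and contains the expected keys:

```python
import json

# A minimal text-to-JSON check of the kind a benchmarking suite might run:
# did the model return syntactically valid JSON, and does it carry the
# fields the task asked for?
REQUIRED_FIELDS = {"customer", "order_id", "total"}

def score_output(model_output: str) -> dict:
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return {"valid_json": False, "schema_match": False}
    missing = REQUIRED_FIELDS - set(parsed)
    return {"valid_json": True, "schema_match": not missing, "missing": sorted(missing)}

good = '{"customer": "Acme", "order_id": 981, "total": 41.50}'
bad = 'Sure! Here is the JSON you asked for: {"customer": "Acme"}'
print(score_output(good))  # valid and complete
print(score_output(bad))   # a chatty preamble breaks json.loads entirely
```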
Discover how Oracle APEX leverages OCI AI services to build smarter, more efficient applications. Hosts Lois Houston and Nikita Abraham interview APEX experts Chaitanya Koratamaddi, Apoorva Srinivas, and Toufiq Mohammed about how key services like OCI Vision, Oracle Digital Assistant, and Document Understanding integrate with Oracle APEX. Packed with real-world examples, this episode highlights all the ways you can enhance your APEX apps. Oracle APEX: Empowering Low Code Apps with AI: https://mylearn.oracle.com/ou/course/oracle-apex-empowering-low-code-apps-with-ai/146047/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Last week, we looked at how generative AI powers Oracle APEX and in today's episode, we're going to focus on integrating APEX with OCI AI Services. Lois: That's right, Niki. We're going to look at how you can use Oracle AI services like OCI Vision, Oracle Digital Assistant, Document Understanding, OCI Generative AI, and more to enhance your APEX apps. 01:03 Nikita: And to help us with it all, we've got three amazing experts with us, Chaitanya Koratamaddi, Director of Product Management at Oracle, and senior product managers, Apoorva Srinivas and Toufiq Mohammed. In today's episode, we'll go through each Oracle AI service and look at how it interacts with APEX. Apoorva, let's start with you. Can you explain what the OCI Vision service is? Apoorva: Oracle Cloud Infrastructure Vision is a serverless multi-tenant service accessible using the console or REST APIs. You can upload images to detect and classify objects in them. With prebuilt models available, developers can quickly build image recognition into their applications without machine learning expertise. OCI Vision service provides a fully managed model infrastructure. With complete integration with OCI Data Labeling, you can build custom models easily. OCI Vision service provides pretrained models-- Image Classification, Object Detection, Face Detection, and Text Recognition. You can build custom models for Image Classification and Object Detection. 02:24 Lois: Ok. What about its use cases? How can OCI Vision make APEX apps more powerful? Apoorva: Using OCI Vision, you can make images and videos discoverable and searchable in your APEX app. You can use OCI Vision to detect and classify objects in the images. OCI Vision also highlights the objects using a red rectangular box. This comes in handy in use cases such as detecting vehicles that have violated the rules in traffic images. You can use OCI Vision to identify visual anomalies in your data. This is a very popular use case where you can detect anomalies in cancer X-ray images to detect cancer. These are some of the most popular use cases of using OCI Vision with your APEX app. 
But the possibilities are endless and you can use OCI Vision for any of your image analysis. 03:29 Nikita: Let's shift gears to Oracle Digital Assistant. Chaitanya, can you tell us what it's all about? Chaitanya: Oracle Digital Assistant is a low-code conversational AI platform that allows businesses to build and deploy AI assistants. It provides natural language understanding, automatic speech recognition, and text-to-speech capabilities to enable human-like interactions with customers and employees. Oracle Digital Assistant comes with prebuilt templates for you to get started. 04:00 Lois: What are its key features and benefits, Chaitanya? How does it enhance the user experience? Chaitanya: Oracle Digital Assistant provides conversational AI capabilities that include generative AI features, natural language understanding and ML, AI-powered voice, and analytics and insights. Integration with enterprise applications becomes easier with a unified conversational experience, prebuilt chatbots for Oracle Cloud applications, and chatbot architecture frameworks. Oracle Digital Assistant provides advanced conversational design tools, conversational designer, dialogue and domain trainer, and native multilingual support. Oracle Digital Assistant is open, scalable, and secure. It provides multi-channel support, automated bot-to-agent transfer, and integrated authentication profile. 04:56 Nikita: And what about the architecture? What happens at the back end? Chaitanya: Developers assemble digital assistants from one or more skills. Skills can be based on prebuilt skills provided by Oracle or third parties, custom developed, or based on one of the many skill templates available. 05:16 Lois: Chaitanya, what exactly are “skills” within the Oracle Digital Assistant framework? Chaitanya: Skills are individual chatbots that are designed to interact with users and fulfill specific types of tasks. Each skill helps a user complete a task through a combination of text messages and simple UI elements like select lists. When a user request is submitted through a channel, the Digital Assistant routes the user's request to the most appropriate skill to satisfy the user's request. Skills can combine a multilingual NLP deep learning engine, a powerful dialogflow engine, and integration components to connect to back-end systems. Skills provide a modular way to build your chatbot functionality. Now users connect with a chatbot through channels such as Facebook, Microsoft Teams, or, in our case, an Oracle APEX chatbot, which is embedded into an APEX application. 06:21 Nikita: That's fascinating. So, what are some use cases of Oracle Digital Assistant in APEX apps? Chaitanya: Digital assistants streamline approval processes by collecting information, routing requests, and providing status updates. Digital assistants offer instant access to information and documentation, answering common questions and guiding users. Digital assistants assist sales teams by automating tasks, responding to inquiries, and guiding prospects through the sales funnel. Digital assistants facilitate procurement by managing orders, tracking deliveries, and handling supplier communication. Digital assistants simplify expense approvals by collecting reports, validating receipts, and routing them for managerial approval. Digital assistants manage inventory by tracking stock levels, reordering supplies, and providing real-time inventory updates. Digital assistants have become a common UX feature in any enterprise application.
07:28 Want to learn how to design stunning, responsive enterprise applications directly from your browser with minimal coding? The new Oracle APEX Developer Professional learning path and certification enables you to leverage AI-assisted development, including generative AI and Database 23ai, to build secure, scalable web and mobile applications with advanced AI-powered features. From now through May 15, 2025, we're waiving the certification exam fee (valued at $245). So, what are you waiting for? Visit mylearn.oracle.com to get started today. 08:09 Nikita: Welcome back! Thanks for that, Chaitanya. Toufiq, let's talk about the OCI Document Understanding service. What is it? Toufiq: Using this service, you can upload documents to extract text, tables, and other key data. This means the service can automatically identify and extract relevant information from various types of documents, such as invoices, receipts, contracts, etc. The service is serverless and multitenant, which means you don't need to manage any servers or infrastructure. You can access this service using the console, REST APIs, SDK, or CLI, giving you multiple ways to integrate. 08:55 Nikita: What do we use for APEX apps? Toufiq: For APEX applications, we will be using REST APIs to integrate the service. Additionally, you can process individual files or batches of documents using the ProcessorJob API endpoint. This flexibility allows you to handle different volumes of documents efficiently, whether you need to process a single document or thousands at once. With these capabilities, the OCI Document Understanding service can significantly streamline your document processing tasks, saving time and reducing the potential for manual errors. 09:36 Lois: Ok. What are the different types of models available? How do they cater to various business needs? Toufiq: Let us start with pre-trained models. These are ready-to-use models that come right out of the box, offering a range of functionalities. The available models are: Optical Character Recognition (OCR), which enables the service to extract text from documents, allowing you to digitize scanned documents effortlessly and precisely extract text content. Key-value extraction, useful in streamlining tasks like invoice processing. Table extraction can intelligently extract tabular data from documents. Document classification automatically categorizes documents based on their content. OCR PDF enables seamless extraction of text from PDF files. Now, what if your business needs go beyond these pre-trained models? That's where custom models come into play. You have the flexibility to train and build your own models on top of these foundational pre-trained models. Models available for training are key-value extraction and document classification. 10:50 Nikita: What does the architecture look like for OCI Document Understanding? Toufiq: You can ingest or supply the input file in two different ways. You can upload the file to an OCI Object Storage location. And in your request, you can point the Document Understanding service to pick the file from this Object Storage location. Alternatively, you can upload a file directly from your computer. Once the file is uploaded, the Document Understanding service can process the file and extract key information using the pre-trained models. You can also customize models to tailor the extraction to your data or use case.
After processing the file, the Document Understanding service stores the results in JSON format in the Object Storage output bucket. Your Oracle APEX application can then read the JSON file from the Object Storage output location, parse the JSON, and store useful information in a local table or display it on the screen to the end user. 11:52 Lois: And what about use cases? How are various industries using this service? Toufiq: In financial services, you can utilize Document Understanding to extract data from financial statements, classify and categorize transactions, identify and extract payment details, and streamline tax document management. In manufacturing, you can perform text extraction from shipping labels and bill of lading documents, extract data from production reports, and identify and extract vendor details. In the healthcare industry, you can automatically process medical claims, extract patient information from forms, classify and categorize medical records, and identify and extract diagnostic codes. This is not an exhaustive list, but provides insights into some industry-specific use cases for Document Understanding. 12:50 Nikita: Toufiq, let's switch to the big topic everyone's excited about—the OCI Generative AI Service. What exactly is it? Toufiq: OCI Generative AI is a fully managed service that provides a set of state-of-the-art, customizable large language models that cover a wide range of use cases. It provides enterprise-grade generative AI with data governance and security, which means only you have access to your data and custom-trained models. OCI Generative AI provides pre-trained out-of-the-box LLMs for text generation, summarization, and text embedding. OCI Generative AI also provides the necessary tools and infrastructure to define models with your own business knowledge. 13:37 Lois: Generally speaking, how is OCI Generative AI useful? Toufiq: It supports various large language models. New models available from Meta and Cohere include Llama2, developed by Meta, and Cohere's Command model, their flagship text generation model. Additionally, Cohere offers the Summarize model, which provides high-quality summaries, accurately capturing essential information from documents, and the Embed model, converting text to vector embedding representations. OCI Generative AI also offers dedicated AI clusters, enabling you to host foundational models on private GPUs. It integrates LangChain, an open-source framework for developing new interfaces for generative AI applications powered by language models. Moreover, OCI Generative AI facilitates generative AI operations, providing content moderation controls, zero-downtime endpoint model swaps, and endpoint deactivation and activation capabilities. For each model endpoint, OCI Generative AI captures a series of analytics, including call statistics, tokens processed, and error counts. 14:58 Nikita: What about the architecture? How does it handle user input? Toufiq: Users can input natural language, input/output examples, and instructions. The LLM analyzes the text and can generate, summarize, transform, extract information, or classify text according to the user's request. The response is sent back to the user in the specified format, which can include raw text or formatting like bullets and numbering, etc. 15:30 Lois: Can you share some practical use cases for generative AI in APEX apps? Toufiq: Some of the OCI generative AI use cases for your Oracle APEX apps include text summarization.
Generative AI can quickly summarize lengthy documents such as articles, transcripts, doctor's notes, and internal documents. Businesses can utilize generative AI to draft marketing copy, emails, blog posts, and product descriptions efficiently. Generative AI-powered chatbots are capable of brainstorming, problem solving, and answering questions. With generative AI, content can be rewritten in different styles or languages. This is particularly useful for localization efforts and catering to a diverse audience. Generative AI can classify intent in customer chat logs, support tickets, and more. This helps businesses understand customer needs better and provide tailored responses and solutions. By searching call transcripts and internal knowledge sources, generative AI enables businesses to efficiently answer user queries. This enhances information retrieval and decision-making processes. 16:47 Lois: Before we let you go, can you explain what Select AI is? How is it different from the other AI services? Toufiq: Select AI is a feature of Autonomous Database. This is where Select AI differs from the other AI services. Be it OCI Vision, Document Understanding, or OCI Generative AI, these are all fully managed standalone services on Oracle Cloud, accessible via REST APIs. Whereas Select AI is a feature available in Autonomous Database. That means to use Select AI, you need Autonomous Database. 17:26 Nikita: And what can developers do with Select AI? Toufiq: Traditionally, SQL is the language used to query the data in the database. With Select AI, you can talk to the database and get insights from the data in the database using human language. At its most basic, what Select AI does is generate SQL queries using natural language, like an NL2SQL capability. 17:52 Nikita: How does it actually do that? Toufiq: When a user asks a question, the first step Select AI does is look into the AI profile, which you, as a developer, define. The AI profile holds crucial information, such as table names, the LLM provider, and the credentials needed to authenticate with the LLM service. Next, Select AI constructs a prompt. This prompt includes information from the AI profile and the user's question. Essentially, it's a packet of information containing everything the LLM service needs to generate SQL. The next step is generating SQL using the LLM. The prompt prepared by Select AI is sent to the available LLM services via REST. Which LLM to use is configured in the AI profile. The supported providers are OpenAI, Cohere, Azure OpenAI, and OCI Generative AI. Once the SQL is generated by the LLM service, it is returned to the application. The app can then handle the SQL query in various ways, such as displaying the SQL results in a report format or as charts, etc. 19:05 Lois: This has been an incredible discussion! Thank you, Chaitanya, Apoorva, and Toufiq, for walking us through all of these amazing AI tools. If you're ready to dive deeper, visit mylearn.oracle.com and search for the Oracle APEX: Empowering Low Code Apps with AI course. You'll find step-by-step guides and demos for everything we covered today. Nikita: Until next week, this is Nikita Abraham… Lois: And Lois Houston signing off! 19:31 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
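To ground the Select AI flow Toufiq walks through (profile, prompt, LLM, SQL), here is a stripped-down conceptual sketch in Python; the profile contents, prompt wording, and call_llm helper are invented for illustration, and Select AI itself performs these steps inside Autonomous Database rather than in application code:

```python
# Conceptual sketch of an NL2SQL flow like the one Select AI performs.
# Everything here (profile, prompt wording, call_llm) is illustrative, not Oracle's implementation.

ai_profile = {
    "tables": ["CUSTOMERS(id, name, region)", "ORDERS(id, customer_id, total, order_date)"],
    "provider": "oci_generative_ai",   # could also be OpenAI, Cohere, or Azure OpenAI
}

def build_prompt(question: str) -> str:
    schema = "\n".join(ai_profile["tables"])
    return (
        "You translate questions into Oracle SQL.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "Return only the SQL."
    )

def call_llm(prompt: str) -> str:
    # Placeholder for the REST call to whichever provider the profile names; canned answer here.
    return ("SELECT c.region, SUM(o.total) FROM orders o "
            "JOIN customers c ON c.id = o.customer_id GROUP BY c.region")

sql = call_llm(build_prompt("Total order value per region last quarter?"))
print(sql)   # the application would run this SQL and render it as a report or chart
```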
Allen Wyma talks with WindSoilder, a contributor to Nushell, a shell that treats data as structured tables. WindSoilder shares his journey into programming, his work on Nushell, and how Rust has shaped his development experience. Contributing to Rustacean Station Rustacean Station is a community project; get in touch with us if you'd like to suggest an idea for an episode or offer your services as a host or audio editor! Twitter: @rustaceanfm Discord: Rustacean Station Github: @rustacean-station Email: hello@rustacean-station.org Timestamps [@00:00] - Meet WindSoilder: Python developer and Rust enthusiast [@04:15] - Discovering Rust and starting with Nushell [@09:30] - Structured data pipelines in Nushell [@15:20] - Using Nushell for CSV, JSON, and HTTP tasks [@20:45] - Integrating Nushell with external commands and plugins [@27:35] - From contributor to core team member [@33:10] - Learning Rust through Nushell: Challenges and rewards [@38:50] - Upcoming features and improvements in Nushell [@44:25] - Advice for new contributors and Rust beginners [@47:40] - Final thoughts and community resources Credits Intro Theme: Aerocity Audio Editing: Plangora Hosting Infrastructure: Jon Gjengset Show Notes: Plangora Hosts: Allen Wyma
RJJ Software's Software Development Service This episode of The Modern .NET Show is supported, in part, by RJJ Software's Software Development Services, whether your company is looking to elevate its UK operations or reshape its US strategy, we can provide tailored solutions that exceed expectations. Show Notes "So on my side it was actually, the interesting experience was that I kind of used it one way, because it was mainly about reading the Python code, the JavaScript code, and, let's say like, the Go implementations, trying to understand what are the concepts, what are the ways about how it has been implemented by the different teams. And then, you know, switching mentally into the other direction of writing than the code in C#."— Jochen Kirstaetter Welcome friends to The Modern .NET Show; the premier .NET podcast, focusing entirely on the knowledge, tools, and frameworks that all .NET developers should have in their toolbox. We are the go-to podcast for .NET developers worldwide, and I am your host: Jamie “GaProgMan” Taylor. In this episode, Jochen Kirstaetter joined us to talk about his .NET SDK for interacting with Google's Gemini suite of LLMs. Jochen tells us that he started his journey by looking at the existing .NET SDK, which didn't seem right to him, and wrote his own using the HttpClient and HttpClientFactory classes and REST. "I provide a test project with a lot of tests. And when you look at the simplest one, is that you get your instance of the Generative AI type, which you pass in either your API key, if you want to use it against Google AI, or you pass in your project ID and location if you want to use it against Vertex AI. Then you specify which model that you like to use, and you specify the prompt, and the method that you call is then GenerateContent and you get the response back. So effectively with four lines of code you have a full integration of Gemini into your .NET application."— Jochen Kirstaetter Along the way, we discuss the fact that Jochen had to look into the Python, JavaScript, and even Go SDKs to get a better understanding of how his .NET SDK should work. We discuss the “Pythonistic .NET” and “.NETy Python” code that developers can accidentally end up writing, if they're not careful when moving from .NET to Python and back. And we also talk about Jochen's use of tests as documentation for his SDK. Anyway, without further ado, let's sit back, open up a terminal, type in `dotnet new podcast` and we'll dive into the core of Modern .NET. Supporting the Show If you find this episode useful in any way, please consider supporting the show by either leaving a review (check our review page for ways to do that), sharing the episode with a friend or colleague, buying the host a coffee, or considering becoming a Patron of the show. 
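The four-line flow Jochen describes has a close analogue in Google's own Python SDK. A sketch using the google-generativeai package is below; it is shown only to make the shape of the API concrete, not as Jochen's Mscc.GenerativeAI .NET SDK, and the API key is a placeholder:

```python
# Rough Python analogue of the "four lines of code" flow described above,
# via the google-generativeai package. The API key is a placeholder; Vertex AI
# would instead take a project ID and location.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")   # pick the Gemini model you want to use
response = model.generate_content("Explain JSON in one sentence.")
print(response.text)
```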
Full Show Notes The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at: https://dotnetcore.show/season-7/google-gemini-in-net-the-ultimate-guide-with-jochen-kirstaetter/ Jochen's Links: JoKi's MVP Profile JoKi's Google Developer Expert Profile JoKi's website Other Links: Generative AI for .NET Developers with Amit Bahree curl Noda Time with Jon Skeet Google Cloud samples repo on GitHub Google's Gemini SDK for Python Google's Gemini SDK for JavaScript Google's Gemini SDK for Go Vertex AI JoKi's base NuGet package: Mscc.GenerativeAI JoKi's NuGet package: Mscc.GenerativeAI.Google System.Text.Json gcloud CLI .NET Preprocessor directives .NET Target Framework Monikers QUIC protocol IAsyncEnumerable Microsoft.Extensions.AI Supporting the show: Leave a rating or review Buy the show a coffee Become a patron Getting in Touch: Via the contact page Joining the Discord Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts; this will help the show's audience grow. Or you can just share the show with a friend. And don't forget to reach out via our Contact page. We're very interested in your opinion of the show, so please get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast. Music created by Mono Memory Music, licensed to RJJ Software for use in The Modern .NET Show
Bytes and Strings 18 April 2025, Jochen In this episode we take a look at the next chapter of "Fluent Python", on "Bytes and Strings". Johannes explains the most important concepts and why UTF-8 is almost always the right choice.
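A minimal Python sketch of the str/bytes distinction the chapter covers, and why an explicit UTF-8 encoding is usually the right default:

```python
text = "Grüße, café ☕"           # str: a sequence of Unicode code points
data = text.encode("utf-8")       # bytes: the UTF-8 encoding of that text

print(len(text))                  # 13 code points
print(len(data))                  # more bytes, because ü, ß, é and ☕ need multi-byte sequences

# Decoding with the wrong codec either fails or silently corrupts the text,
# which is why pinning UTF-8 explicitly matters.
print(data.decode("utf-8"))       # round-trips cleanly
print(data.decode("latin-1"))     # "decodes" without error, but produces mojibake
```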
In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris discuss MCP (Model Context Protocol) and agentic marketing. You’ll learn how MCP connects AI tools to automate tasks—but also why technical expertise is essential to use it effectively. You’ll discover the three layers of AI adoption, from manual prompts to fully autonomous agents, and why skipping foundational steps leads to costly mistakes. You’ll see why workflow automation (like N8N) is the bridge to agentic AI, and how to avoid falling for social media hype. Finally, you’ll get practical advice on staying ahead without drowning in tech overwhelm. Watch now to demystify AI's next big thing! Watch the video here: Can’t see anything? Watch it on YouTube here. Listen to the audio here: https://traffic.libsyn.com/inearinsights/tipodcast-what-is-mcp-agentic-ai-generative-ai.mp3 Download the MP3 audio here. Need help with your company’s data and analytics? Let us know! Join our free Slack group for marketers interested in analytics! [podcastsponsor] Machine-Generated Transcript What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode. Christopher S. Penn – 00:00 In this week’s In-Ear Insights, let’s talk about MCP—Model Context Protocol—and its applications for marketing and what it means. Katie, you said you have questions. Katie Robbert – 00:13 I do. I saw you posted in our free Slack group, Analytics for Marketers, towards the end of last week that one of the models had MCP available. When I see notifications like that, my first thought is: Is this something I need to pay attention to? Usually, you’re really good about letting me know, but I am a fully grown human who needs to be responsible for what I should be paying attention to and not just relying on the data scientist on my team. That was my first gut reaction—which is fair, because you’re a busy person. I like to keep you very busy, and you don’t always have time to let me know what I should be paying attention to. So that was problem one. Problem number two is, yes, you post things typically ahead of when they become more commonplace announcements. I saw a post this morning that I shared with you about MCP and agentic marketing processes, and how it’s going to replace your SEO if you’re doing traditional SEO. For some reason, that raised all of my insecurities and anxieties. Oh my gosh, I really am falling behind because I like to tell people about getting their foundation squared away. If I’m being really honest with myself, I think I focus on that because I feel so lost when I think about AI, agentic processes, MCP, N8N, and all these other things. So I’m like, let me focus on what I know best. But I am now in the boat where I feel like my boat is trailing behind the giant AI yacht. I’m dog-paddling to try to keep up, and I’m just not there. So help me understand a couple of things. One, what is MCP? Two, we’ve talked about agentic AI, but let’s talk about agentic marketing processes. And three, how is someone who isn’t in the weeds with AI every day supposed to not sit at their desk and cry over all of this? Those are big questions, so maybe let’s take them one at a time. All right, let’s start with: What is MCP? Christopher S. Penn – 02:36 Okay, MCP stands for Model Context Protocol. This is something initially advanced by Anthropic, the makers of Claude. It has since been adopted as a standard by OpenAI and now by Google. 
Sundar Pichai announced at Google Cloud Next last week that the Gemini family will adopt MCP. So what is this? It’s a way for a generative AI model to interface with other systems—a process called tool handling. MCP is a specific kind of tool. You create an MCP server that does stuff behind the scenes. It can be as simple as reading files from your disk or as complicated as using a bunch of SEO tools to optimize a page. It makes that keyword tool available in a tool like Claude Desktop. You could call the tool something like “Make a Katie Joke.” That would be the tool name. You would build an MCP server that talks to an LLM to do all these things behind the scenes. But in Claude, it would just appear as a little tool icon. You’d say, “Hey, Claude, use the Make a Katie Joke tool to make a joke that Katie would make,” and it would talk to that MCP server and kick off all these processes behind the scenes. So think of MCP as a kind of natural language API where, in a conversation with ChatGPT or Claude, you’d say, “Hey, write me some Google Ads with the Google Ads tool.” If you’ve built this tool for yourself or use one of the many free, open MCP servers available (which have data privacy issues), you can add new capabilities to generative AI that the tools don’t have on their own. The thing is, you still have to know what the tool does. You have to build it if it doesn’t exist, integrate it, and know when you should and shouldn’t use it. So as much as it may feel like you’re falling behind, believe it or not, your expertise is actually more important than ever for this. Even though we have MCP, N8N, and workflow automation, all that is software development. It still has to conform to the SDLC. You may not write code, but you better know the SDLC, or you’re going to waste a lot of time. Katie Robbert – 05:19 That’s helpful to know because, again, this may be—let me back up for a second. The information people share on social media is what they want you to see about them. They’re presenting their best selves. I understand that. I do that too as a representative of the company. That’s my job—to represent the best parts of what we do. And yet, my non-professional persona looks at what everyone else is sharing and thinks, Oh my gosh, I really am falling behind. And yet, here I am. My posts are right up there with the others. It’s like, no, I’m not. So I think that’s a little bit of the psychology of where a lot of people are feeling right now. We’re trying our best to look like we know what we’re talking about, but on a daily basis, we’re like, I have no idea what’s happening. So that’s part one. Now I understand what an MCP is. In a nutshell, it’s kind of like a connector between two systems. The commercial version is Zapier—a lot of marketers use Zapier. It’s like, how do I get my data from this place to that place? It transfers information from one system to another. Interestingly enough, I was at the animal shelter we work with yesterday, talking with the executive director. One of the problems she’s trying to solve is that she has literally hundreds of tabs in different spreadsheets of inventory at the shelter. They’re moving to a new shelter, and she’s trying to figure out where everything goes. I was describing to her a system—which doesn’t exist yet—that could include what you’re telling me is an MCP. 
In a very short version, I explained: We could take all your spreadsheets with all your tabs (which are basically your categories), put those into a database, and then layer generative AI on top of it with some system instructions. Your staff takes a picture of whatever’s been donated. Generative AI recognizes, Okay, that’s two bags of dog food, one thing of wet food, and some pee pads. It looks like those go in Room 121 with the other things, and it adds to the database. I was explaining this process without knowing what that connector was going to be. I said, Let me go back and talk to Chris about it. But I’m fairly certain that’s a thing that can exist. So it sounds like I was describing something I didn’t have the terminology for. Christopher S. Penn – 08:12 Exactly. Right now, here’s the thing—and this is something the LinkedIn hype crowd won’t tell you. As the average user, let me show you what the “getting started with MCP” quick start for non-technical users is. This is from Claude’s website: For Claude Desktop users, get started using MCP in Claude Desktop. First, install Claude Desktop. Second, go into the config file and edit this JSON. Katie Robbert – 08:41 You’ve already lost me. Christopher S. Penn – 08:42 Exactly. Oh, by the way, you also need Node.js on your computer for this to run properly. So when someone says MCP is the future and it’s so easy—well, yes, if you’re a technical person, that’s true. If you’re a non-technical person, this is useless because you’re not going to sit there and install Node.js just to configure a pre-built MCP server. You and your company—if you want to use these capabilities—need to have some IT resources because this is just straight-up IT. This isn’t even AI. This is just, Hey, you need these components in your kitchen before you can cook anything. As cool as MCP is (and believe me, it is very cool), it also has a very high technical bar of entry. So when you see somebody saying, Hey, this is the new AI-enabled MCP SEO, well, yes, that’s true. But what they’re not saying is, you’re probably not going to do this on your own if you’re a non-technical marketer. It’s a business ploy to say, You should hire us as your SEO firm because we’re AI-enabled and we know how to install MCP services. Like, yeah, I can do that too. I just don’t advertise it because it’s kind of a jerk move. Katie Robbert – 10:13 But I think that’s an important point to raise—not that you’re a jerk, but that a lot of us struggle with feeling like we’re not keeping up with AI because of these individuals—professionals, thought leaders, futurists, content creators—who put out this information: This is the future, this is how you’re going to do it. I can probably accurately describe agentic AI, but I couldn’t build it for you. And I think that’s where everyday marketers are struggling. Yeah, I think now I finally understand the concept, but I have no idea how to get started with the thing because there’s nothing out of the box for non-technical people. It’s all still, to your point, a lot of software development, a lot of IT. Even if it’s just installing things so you can get to the drag-and-drop, asking people to suddenly update their config file is maybe one step beyond their technical comfort zone. I just—I know the purpose of this episode is to understand more about MCP and agentic marketing, but I’m struggling to feel like I’m keeping up with being able to execute on all these things that are happening. Because every day, it’s something new, right? Christopher S. 
Penn – 11:54 So here’s how you get to MCP usage. First, you have to have the basics. Remember, we have the three layers we’ve talked about in the past: Done by you—You’re copy-pasting prompts. There’s nothing wrong with that, but it’s labor-intensive. If you’ve got a great prompt and a way of doing things that works, you’re already ahead of 95% of the crowd who’s still typing one-sentence prompts into ChatGPT. That’s step one. Done with you—How can you put that in some form of automation? We’ve talked about N8N in the past. I’ll give you an example: I put together a workflow for my newsletter where I say, Here’s my newsletter post. I want you to translate it into these four languages. It sends it to Google Gemini, then writes the updated versions back to my hard drive. This saves me about 20 minutes a week because I don’t have to copy-paste each prompt anymore. This is workflow automation. Done for you (Agentic)—To turn this into an MCP server (which makes it an agent, where I’m not part of the process at all), I’d add the MCP server node. Instead of saying, When manual start (when Chris clicks go), you’d have an MCP server that says, When a generative AI tool like Claude requests this, run the process. So, Claude would say, Hey, here’s this week’s newsletter—go make it. Claude Desktop would recognize there’s an Almost Timely Newsletter tool (an MCP server), send the request, the software would run, and when it’s done, it would send a message back to Claude saying, We’re done. That’s how MCP fits in. It takes the whole automation, puts it in a black box, and now it’s an agent. But you cannot build the agent without the workflow automation, and you cannot build the workflow automation without the standard operating procedure. If you don’t have that fundamental in place, you’re going to create garbage. Katie Robbert – 15:59 I think that’s also helpful because even just thinking about the step of translation—I’m assuming you didn’t just say, Hey, Gemini, translate this and accept whatever it gave back. You likely had to build system instructions that included, Translate it this way, then here’s how you’re going to double-check it, then here’s how you’re going to triple-check it. That to me is very helpful because you’re giving me confirmation that the foundational pieces still have to happen. And I think that’s where a lot of these content creators on social platforms talking about MCP and agentic AI are skipping that part of the conversation. Because, as we’ve said before, it’s not the fun stuff—it’s not the push-the-buttons, twist-the-knob, get-the-shiny-object part. It’s how you actually get things to work correctly. And that’s where, as a regular human, I get caught up in the hype: Oh, but they’re making it look so easy. You just do the thing. It’s like the people on social who post, Look how perfect my sourdough bread came out, but they’re not showing you the 17 loaves and five years of trial and error before this perfect loaf. Or they’re faking it with a mock background. I’m saying all this because I need that reminder—it’s all smoke and mirrors. There’s no shortcut for getting it done correctly. So when I see posts about agentic marketing systems and SEO and email marketing—You’re not even going to have to participate, and it’s going to get it right—I need that reminder that it’s all smoke and mirrors. That’s my therapy session for the morning. Christopher S. 
Penn – 18:33 And here’s the thing: If you have well-written standard operating procedures (SOPs) that are step-by-step, you can hand that to someone skilled at N8N to turn it into a workflow automation. But it has to be granular—Click here, then click here. That level of detail is so important. Once you have an SOP (your process), you turn it into workflow automation. Once the workflow automation works, you bolt on the MCP pieces, and now you have an agent. But here’s the danger: All these things use APIs, and APIs cost either time, money, or resources. I’m using Gemini’s free version, which Google trains on. If I was doing this for a client, I’d use the paid version (which doesn’t train), and the bills start coming in. Every API call costs money. If you don’t know what you’re doing and you haven’t perfected the process, you might end up with a five-figure server bill and wonder, What happened? Part of MCP construction and agentic AI is great development practices to make your code as efficient as possible. Otherwise, you’re going to burn a lot of money—and you may not even be cash-positive. Katie Robbert – 21:27 But look how fast it is! Look how cool it is! Christopher S. Penn – 21:36 It is cool. Katie Robbert – 21:38 Going back to the original question about MCP—I read a post this morning about agentic marketing systems using MCP and how it’s going to change the way you do SEO. It said it’s going to optimize your content, optimize for competitors, find keywords—all of which sounds really cool. But the way it was presented was like, Oh, duh, why am I not already doing this? I’m falling behind if I’m not letting the machines do my SEO for me and building these systems for my clients. This conversation has already made me feel better about where I am in terms of understanding and execution. Going back to—you still have to have those foundational pieces. Because agentic AI, MCPs, generative AI, shiny objects—it’s all just software development. Christopher S. Penn – 22:59 Exactly. It’s all software development. We’ve just gotten used to writing in natural language instead of code. The challenge with shiny objects is that the people promoting them correctly say, This is what’s possible. But at a certain point, even with agentic AI and MCP automations, it’s more efficient to go back to classical programming. N8N doesn’t scale as well as Python code. In the same way, a 3D printer is cool for making one thing at home, but if you want to make 10,000, classical injection molding is the way to go. New technology doesn’t solve old problems. Katie Robbert – 23:47 And yet, it’s going to happen. Well, I know we’re wrapping up this episode. This has been incredibly helpful and educational for me because every week there’s a new term, a new thing we’re being asked to wrap our heads around. As long as we can keep going back to It’s just software development, you still need the foundation, then I think myself and a lot of other people at my skill level are going to be like, Whew, okay, I can still breathe this week. I don’t have to panic just yet. Christopher S. Penn – 24:23 That said, at some point, we are going to have to make a training course on a system like N8N and workflow automation because it’s so valuable for the boring stuff—like keyword selection in SEO. Stay tuned for that. 
The best place to stay tuned for announcements from us is our free Slack group, Trust Insights AI Analytics for Marketers, where you and nearly 5,000 marketers are asking and answering each other’s questions every day about data science, analytics, and AI. Wherever you watch or listen to the show, if there’s a channel you’d rather have it on, go to trustinsights.ai/tipodcast to find us at all the places fine podcasts are served. Thanks for tuning in—I’ll talk to you on the next one! (Transcript ends with AI training permission notice.) Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
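To make the episode's idea of "an automation in a black box, exposed as a tool" concrete, here is a minimal MCP server sketch in Python. It assumes the official MCP Python SDK (the mcp package) and its FastMCP helper; the newsletter-translation tool is invented for illustration and the LLM call is a stub, so this is a sketch of the pattern rather than Trust Insights' actual workflow:

```python
# Minimal MCP server sketch. Assumes the official MCP Python SDK (pip install "mcp[cli]")
# and its FastMCP helper; the tool below is a made-up stand-in for a newsletter workflow.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("newsletter-tools")

def call_llm(prompt: str) -> str:
    # Stub: a real workflow would call Gemini or another provider's API here.
    return "[translated newsletter would be returned by the LLM]"

@mcp.tool()
def translate_newsletter(text: str, language: str) -> str:
    """Translate this week's newsletter into the requested language."""
    return call_llm(f"Translate the following newsletter into {language}:\n\n{text}")

if __name__ == "__main__":
    mcp.run()  # a client such as Claude Desktop can then discover and call the tool
```

A client like Claude Desktop still has to be pointed at the server in its JSON config, which is exactly the Node-and-config setup step Chris flags as the real barrier for non-technical users.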
We'll keep this brief because we're on a tight turnaround: GPT 4.1, previously known as the Quasar and Optimus models, is now live as the natural update for 4o/4o-mini (and the research preview of GPT 4.5). Though it is a general purpose model family, the headline features are: Coding abilities (o1-level SWEBench and SWELancer, but ok Aider) Instruction Following (with a very notable prompting guide) Long Context up to 1m tokens (with new MRCR and Graphwalk benchmarks) Vision (simply o1 level) Cheaper Pricing (cheaper than 4o, greatly improved prompt caching savings) We caught up with returning guest Michelle Pokrass and Josh McGrath to get more detail on each! Chapters 00:00:00 Introduction and Guest Welcome 00:00:57 GPT 4.1 Launch Overview 00:01:54 Developer Feedback and Model Names 00:02:53 Model Naming and Starry Themes 00:03:49 Confusion Over GPT 4.1 vs 4.5 00:04:47 Distillation and Model Improvements 00:05:45 Omnimodel Architecture and Future Plans 00:06:43 Core Capabilities of GPT 4.1 00:07:40 Training Techniques and Long Context 00:08:37 Challenges in Long Context Reasoning 00:09:34 Context Utilization in Models 00:10:31 Graph Walks and Model Evaluation 00:11:31 Real Life Applications of Graph Tasks 00:12:30 Multi-Hop Reasoning Benchmarks 00:13:30 Agentic Workflows and Backtracking 00:14:28 Graph Traversals for Agent Planning 00:15:24 Context Usage in API and Memory Systems 00:16:21 Model Performance in Long Context Tasks 00:17:17 Instruction Following and Real World Data 00:18:12 Challenges in Grading Instructions 00:19:09 Instruction Following Techniques 00:20:09 Prompting Techniques and Model Responses 00:21:05 Agentic Workflows and Model Persistence 00:22:01 Balancing Persistence and User Control 00:22:56 Evaluations on Model Edits and Persistence 00:23:55 XML vs JSON in Prompting 00:24:50 Instruction Placement in Context 00:25:49 Optimizing for Prompt Caching 00:26:49 Chain of Thought and Reasoning Models 00:27:46 Choosing the Right Model for Your Task 00:28:46 Coding Capabilities of GPT 4.1 00:29:41 Model Performance in Coding Tasks 00:30:39 Understanding Coding Model Differences 00:31:36 Using Smaller Models for Coding 00:32:33 Future of Coding in OpenAI 00:33:28 Internal Use and Success Stories 00:34:26 Vision and Multi-Modal Capabilities 00:35:25 Screen vs Embodied Vision 00:36:22 Vision Benchmarks and Model Improvements 00:37:19 Model Deprecation and GPU Usage 00:38:13 Fine-Tuning and Preference Steering 00:39:12 Upcoming Reasoning Models 00:40:10 Creative Writing and Model Humor 00:41:07 Feedback and Developer Community 00:42:03 Pricing and Blended Model Costs 00:44:02 Conclusion and Wrap-Up
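For anyone who wants to try the new family, a minimal sketch of a call through the OpenAI Python SDK (the prompt is arbitrary; the model IDs shown are the ones announced with the launch):

```python
# Sketch: calling GPT 4.1 with the OpenAI Python SDK. Prompt contents are arbitrary.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1",          # "gpt-4.1-mini" or "gpt-4.1-nano" for cheaper calls
    messages=[
        {"role": "system", "content": "You follow instructions literally."},
        {"role": "user", "content": "Summarize this changelog in three bullets: ..."},
    ],
)
print(response.choices[0].message.content)
```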
Topics covered in this episode: How to Write a Git Commit Message Caddy Web Server Some new PEPs approved juv Extras Joke Watch on YouTube About the show Sponsored by Posit Connect: pythonbytes.fm/connect Connect with the hosts Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky) Brian: @brianokken@fosstodon.org / @brianokken.bsky.social Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky) Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: How to Write a Git Commit Message Chris Beams 7 rules of a great commit message Separate subject from body with a blank line Limit the subject line to 50 characters Capitalize the subject line Do not end the subject line with a period Use the imperative mood in the subject line Wrap the body at 72 characters Use the body to explain what and why vs. how Article also includes Why a good commit message matters Discussion about each of the 7 rules Cool hat tips to other articles on the subject “Keep in mind: This has all been said before.” Each word is a different link. Michael #2: Caddy Web Server via Fredrik Mellström Like a more modern NGINX Caddy automatically obtains and renews TLS certificates for all your sites. Caddy's native configuration is a JSON document. Even localhost and internal IPs are served with TLS using the intermediate of a fully-automated, self-managed CA that is automatically installed into most local trust stores. Configure multiple Caddy instances with the same storage, and they will automatically coordinate certificate management as a fleet. Production-grade static file server. Brian #3: Some new PEPs approved PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials Accepted for packaging Author: Seth Larson, Sponsor Brett Cannon “This PEP proposes using SBOM documents included in Python packages as a means to improve automated software measurability for Python packages.” PEP 750 – Template Strings Accepted for Python 3.14 Author: Jim Baker, Guido van Rossum, Paul Everitt, Kaudai Aono, Lysandros Nikolaou, Dave Peck “Templates provide developers with access to the string and its interpolated values before they are combined. This brings native flexible string processing to the Python language and enables safety checks, web templating, domain-specific languages, and more.” Michael #4: juv A toolkit for reproducible Jupyter notebooks, powered by uv. Create, manage, and run Jupyter notebooks with their dependencies Pin dependencies with PEP 723 - inline script metadata Launch ephemeral sessions for multiple front ends (e.g., JupyterLab, Notebook, NbClassic) Powered by uv for fast dependency management Use uvx to run jupyterlab with ephemeral virtual environments and tracked dependencies. Extras Brian: Status of Python versions new-ish format Use this all the time. Can't remember if we've covered the new format yet. See also Python endoflife.date Same dates, very visible encouragement to move on to Python 3.13 if you haven't already. Michael: Python 3.13.3 is out. .git-blame-ignore-revs follow up Joke: BGPT (thanks Doug Farrell)
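For reference, the PEP 723 inline script metadata that juv uses to pin notebook dependencies looks like this when applied to an ordinary script (requests is just an example dependency):

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "requests",
# ]
# ///
import requests

print(requests.get("https://example.com").status_code)
```

Tools such as uv read that comment block and build a matching ephemeral environment before running the file; juv applies the same idea to Jupyter notebooks.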
Adobe on Bluesky doesn't end well. The convergence of online services. LLMs and content discrimination. Microsoft relaunches Recall. These and many other tech news stories are discussed in this week's episode.
From digitalia's distributed studio: Franco Solerio, Michele Di Maio, Francesco Facconi
Executive producers: Christophe Sollami, Raffaele Marco Della Monica, Andrea Sinigaglia, Alessandro Lazzarini, Giovanni Priolo, Nicola Gabriele Del Popolo, Alessandro Morgantini, Antonio Taurisano, Giorgio Sidari, @Akagrinta, Alessio Conforto, Mario Cervai, Mario Giammona, Calogero Augusta, Simone Andreozzi, Claudio Schifanella, Matteo Tarabini, Consultech Srl, Giuliano Arcinotti, Ivan, Renato Battistin, Diego Arati, Alessandro Plicato, Alessandro Stevanin, Arzigogolo, Michelangelo Rocchetti, Davide Capra, Matteo De Lucia, Massimiliano Casamento, Maurizio Galluzzo, Paolo Tegoni, Stefano Minardi, Enrico, Manuel Zavatta, Idle Fellow, Alessio Ferrara, Raffaele Viero, Luca Ubiali, Davide Tinti, Edoardo Zini, Andrea Picotti, Carlo Tomas, Stefano Cutellè, @Jh4Ckal, Simone Podico, Fiorenzo Pilla, Vincenzo Ingenito
Sponsor:
Links:
Meta got caught gaming AI benchmarks
Exec denies Meta artificially boosted Llama 4's benchmark scores
Adobe Deletes Bluesky Posts After Furious Backlash
But what if I really want a faster horse?
I haven't seen A Minecraft Movie and I judge anyone who has
An archeologist adventurer who wears a hat and uses a bullwhip
Meta trains its AI on social media posts
Can Tim Cook Save Apple From Being Crushed by Trump?
New Tariff Rule Exempts Smartphones and Other Electronics
Trump exempts smartphones and computers from the China super-tariffs
iPhones will likely be subject to more Trump tariffs
Chairman Brendan Carr and the FCC's news distortion policy
Path of Exile 2 players bully Elon Musk during eerie airplane stream
White House: iOS helpfully added Atlantic editor to Signal chat
Microsoft is about to launch Recall for real this time
The AI That Calls Your Elderly Parents If You Can't Be Bothered
Anthropic Education Report: How University Students Use Claude
The State of AI 2025: 12 Eye-Opening Graphs - IEEE Spectrum
Humanity's Last Exam
Shopify CEO says staffers need to prove jobs can't be done by AI
Google launches Agent2Agent protocol to connect AI agents
Writing laws in a language that artificial intelligence can understand
UK: here comes Minority Report-style "precrime": reality beyond dystopia
EU: These are scary times let's backdoor encryption!
Kawasaki CORLEO: the hydrogen-powered rideable robot
Brazil's government-run payments system has become dominant
Almanacco Digitaliano 2024 on Ledizioni
Almanacco Digitaliano on Amazon
Gadgets of the day:
Opodsync - minimalist self-hosted GPodder server
Pale Moon - open source browser
Paramita DG08 - good, affordable bone-conduction headphones
Support Digitalia, become an executive producer.
Lois Houston and Nikita Abraham kick off a new season of the podcast, exploring how Oracle APEX integrates with AI to build smarter low-code applications. They are joined by Chaitanya Koratamaddi, Director of Product Management at Oracle, who explains the basics of Oracle APEX, its global adoption, and the challenges it addresses for businesses managing and integrating data. They also explore real-world use cases of AI within the Oracle APEX ecosystem Oracle APEX: Empowering Low Code Apps with AI: https://mylearn.oracle.com/ou/course/oracle-apex-empowering-low-code-apps-with-ai/146047/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ----------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Thank you for joining us as we begin a new season of the podcast, this time focused on Oracle APEX and how it integrates with AI to help you create powerful applications. This season is for everyone—from beginners and SQL developers to DBA data scientists and low-code enthusiasts. So, if you're interested in using Oracle APEX to build low-code applications that have custom generative AI features, you'll want to stay tuned in. 01:07 Lois: That's right, Niki. Today, we're going to discuss Oracle APEX at a high level, starting with what it is. Then, we'll cover a few business challenges related to data and AI innovation that organizations face, and learn how the powerful combination of APEX and AI can help overcome these challenges. 01:27 Nikita: To take us through it all, we've got Chaitanya Koratamaddi with us. Chaitanya is Director of Product Management for Oracle APEX. Hi Chaitanya! For anyone new to Oracle APEX, can you explain what it is and why it's so widely used? Chaitanya: Oracle APEX is the world's most popular enterprise low code application platform. APEX enables you to build secure and scalable enterprise-scale applications with world class features that can be deployed anywhere, cloud or on-premises. And with APEX, you can build applications 20 times faster with 100 times less code. APEX delivers the most productive way to develop and deploy mobile and web applications everywhere. 02:18 Lois: That's impressive. So, what's the adoption rate like for Oracle APEX? Chaitanya: As of today, there are 19 million plus APEX applications created globally. 5,000 plus APEX applications are created on a daily basis and there are 800,000 plus APEX developers worldwide. 60,000 plus customers in 150 countries across various industry verticals. And 75% of Fortune 500 companies use Oracle APEX. 02:56 Nikita: Wow, the numbers really speak for themselves, right? But Chaitanya, why are organizations adopting Oracle APEX at this scale? Or to put it differently, what's the core business challenge that Oracle APEX is addressing? 
Chaitanya: From databases to all data, you know that the world is more connected and automated than ever. To drive new business value, organizations need to explore and exploit new sources of data that are generated from this connected world. That can be sounds, feeds, sensors, videos, images, and more. Businesses need to be able to work with all types of data and also make sure that it is available to be used together. Typically, businesses need to work on all data at a massive scale. For example, supply chains are no longer dependent just on inventory, demand, and order management signals. A manufacturer should be able to understand data describing global weather patterns and how it impacts their supply chains. Businesses need to pull in data from as many social sources as possible to understand how customer sentiment impacts product sales and corporate brands. Our customers need a data platform that ensures all this data works together seamlessly and easily. 04:38 Lois: So, you're saying Oracle APEX is the platform that helps businesses manage and integrate data seamlessly. But data is just one part of the equation, right? Then there's AI. How are the two related? Chaitanya: Before we start talking about Oracle AI, let's first talk about what customers are looking for and where they are struggling within their AI innovation. It all starts with data. For decades, working with data has largely involved dealing with structured data, whether it is your customer records in your CRM application and orders from your ERP database. Data was organized into database and tables, and when you needed to find some insights in your data, all you need to do is just use stored procedures and SQL queries to deliver the answers. But today, the expectations are higher. You want to use AI to construct sophisticated predictions, find anomalies, make decisions, and even take actions autonomously. And the data is far more complicated. It is in an endless variety of formats scattered all over your business. You need tools to find this data, consume it, and easily make sense of it all. And now capabilities like natural language processing, computer vision, and anomaly detection are becoming very essential just like how SQL queries used to be. You need to use AI to analyze phone call transcripts, support tickets, or email complaints so you can understand what customers need and how they feel about your products, customer service, and brand. You may want to use a data source as noisy and unstructured as social media data to detect trends and identify issues in real time. Today, AI capabilities are very essential to accelerate innovation, assess what's happening in your business, and most importantly, exceed the expectations of your customers. So, connecting your application, data, and infrastructure allows everyone in your business to benefit from data. 07:32 Raise your game with the Oracle Cloud Applications skills challenge. Get free training on Oracle Fusion Cloud Applications, Oracle Modern Best Practice, and Oracle Cloud Success Navigator. Pass the free Oracle Fusion Cloud Foundations Associate exam to earn a Foundations Associate certification. Plus, there's a chance to win awards and prizes throughout the challenge! What are you waiting for? Join the challenge today by visiting oracle.com/education. 08:06 Nikita: Welcome back! So, let's focus on AI across the Oracle Cloud ecosystem. How does Oracle bring AI into the mix to connect applications, data, and infrastructure for businesses? 
Chaitanya: By embedding AI throughout the entire technology stack from the infrastructure that businesses run on through the applications for every line of business, from finance to supply chain and HR, Oracle is helping organizations pragmatically use AI to improve performance while saving time, energy, and resources. Our core cloud infrastructure includes a unique AI infrastructure layer based on our supercluster technology, leveraging the latest and greatest hardware and uniquely able to get the maximum out of the AI infrastructure technology for scenarios such as large language processing. Then there is generative AI and ML for data platforms. On top of the AI infrastructure, our database layer embeds AI in our products such as autonomous database. With autonomous database, you can leverage large language models to use natural language queries rather than writing a SQL when interacting with the autonomous database. This enables you to achieve faster adoption in your application development. Businesses and their customers can use the Select AI natural language interface combined with Oracle Database AI Vector Search to obtain quicker, more intuitive insights into their own data. Then we have AI services. AI services are a collection of offerings, including generative AI with pre-built machine learning models that make it easier for developers to apply AI to applications and business operations. The models can be custom-trained for more accurate business results. 10:17 Nikita: And what specific AI services do we have at Oracle, Chaitanya? Chaitanya: We have Oracle Digital Assistant Speech, Language, Vision, and Document Understanding. Then we have Oracle AI for Applications. Oracle delivers AI built for business, helping you make better decisions faster and empowering your workforce to work more effectively. By embedding classic and generative AI into its applications, Fusion Apps customers can instantly access AI outcomes wherever they are needed without leaving the software environment they use every day to power their business. 11:02 Lois: Let's talk specifically about APEX. How does APEX use the Gen AI and machine learning models in the stack to empower developers. How does it help them boost productivity? Chaitanya: Starting APEX 24.1, you can choose your preferred large language models and leverage native generative AI capabilities of APEX for AI assistants, prompt-based application creation, and more. Using native OCI capabilities, you can leverage native platform capabilities from OCI, like AI infrastructure and object storage, etc. Oracle APEX running on autonomous infrastructure in Oracle Cloud leverages its unique native generative AI capabilities tuned specifically on your data. These language models are schema aware, data aware, and take into account the shape of information, enabling your applications to take advantage of large language models pre-trained on your unique data. You can give your users greater insights by leveraging native capabilities, including vector-based similarity search, content summary, and predictions. You can also incorporate powerful AI features to deliver personalized experiences and recommendations, process natural language prompts, and more by integrating directly with a suite of OCI AI services. 12:38 Nikita: Can you give us some examples of this? Chaitanya: You can leverage OCI Vision to interpret visual and text inputs, including image recognition and classification. 
Or you can use OCI Speech to transcribe and understand spoken language, making both image and audio content accessible and actionable. You can work with disparate data sources like JSON, spatial, graphs, vectors, and build AI capabilities around your own business data. So, low-code application development with APEX along with AI is a very powerful combination. 13:22 Nikita: What are some use cases of AI-powered Oracle APEX applications? Chaitanya: You can build APEX applications to include conversational chatbots. Your APEX applications can include image and object detection capability. Your APEX applications can include speech transcription capability. And in your applications, you can include code generation that is natural language to SQL conversion capability. Your applications can be powered by semantic search capability. Your APEX applications can include text generation capability. 14:00 Lois: So, there's really a lot we can do! Thank you, Chaitanya, for joining us today. With that, we're wrapping up this episode. We covered Oracle APEX, the key challenges businesses face when it comes to AI innovation, and how APEX and AI work together to give businesses an AI edge. Nikita: Yeah, and if you want to know more about Oracle APEX, visit mylearn.oracle.com and search for the Oracle APEX: Empowering Low Code Apps with AI course. Join us next week for a discussion on AI-assisted development in Oracle APEX. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 14:39 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
The tRPC team declares v11 officially production-ready. tRPC allows devs to build typesafe APIs with types that can be shared on the client and server, and now it has support for TanStack Query v5, the ability to send and receive non-JSON data content types, improved support for RSCs, and the ability to stream responses.
After the Next.js security incident a few weeks back, Netlify writes an open letter around the challenges Next.js poses when not hosted on Vercel. It raises valid points like a lack of adapters, no production grade documentation for serverless deployments, no visible roadmap or release schedule, and a disregard for open web standards, among others.
Firefox is finally adding support for progressive web apps (PWAs), but its web app support will intentionally not look, feel, or behave the same way similar features do in other browsers.
News:
Paige - tRPC v11
Jack - Firefox will support PWAs (finally)
TJ - Next.js Netlify deployment drama
Bonus News:
Styled-components enter maintenance mode
New Bare JS runtime
Windsurf and Netlify partnership (and docs on the feature)
What Makes Us Happy this Week:
Paige - Squeeze Me novel
Jack - Pickup Music site
TJ - Mario Kart World
Thanks as always to our sponsor, the Blue Collar Coder channel on YouTube. You can join us in our Discord channel, explore our website and reach us via email, or talk to us on X, Bluesky, or YouTube.
Front-end Fire website
Blue Collar Coder on YouTube
Blue Collar Coder on Discord
Reach out via email
Tweet at us on X @front_end_fire
Follow us on Bluesky @front-end-fire.com
Subscribe to our YouTube channel @Front-EndFirePodcast
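To make the tRPC item above concrete, here is a minimal sketch of the "typesafe API with types shared between client and server" idea. It is not from the episode: the procedure name, endpoint URL, and port are invented, and helper names can vary slightly between tRPC releases, so treat it as an outline rather than copy-paste code.

```ts
// server.ts — define a router; only its *type* is shared with the client.
import { initTRPC } from "@trpc/server";
import { z } from "zod";

const t = initTRPC.create();

export const appRouter = t.router({
  // A hypothetical procedure: validated input, typed output.
  greet: t.procedure
    .input(z.object({ name: z.string() }))
    .query(({ input }) => ({ message: `Hello, ${input.name}!` })),
});

export type AppRouter = typeof appRouter;

// client.ts — importing only the type gives end-to-end checking with no codegen.
// (In some releases this helper is named createTRPCProxyClient instead.)
import { createTRPCClient, httpBatchLink } from "@trpc/client";
import type { AppRouter } from "./server";

const client = createTRPCClient<AppRouter>({
  links: [httpBatchLink({ url: "http://localhost:3000/trpc" })], // hypothetical endpoint
});

const { message } = await client.greet.query({ name: "Ada" }); // message is typed as string
```

The key detail is that only the router's type crosses the boundary, so the client gets autocomplete and compile-time errors for inputs and outputs without a separate code generation step.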
Brandon Liu is an open source developer and creator of the Protomaps basemap project. We talk about how static maps help developers build sites that last, the PMTiles file format, the role of OpenStreetMap, and his experience funding and running an open source project full time.
Protomaps
Protomaps
PMTiles (File format used by Protomaps)
Self-hosted slippy maps, for novices (like me)
Why Deploy Protomaps on a CDN
User examples
Flickr
Pinball Map
Toilet Map
Related projects
OpenStreetMap (Dataset Protomaps is based on)
Mapzen (Former company that released details on what to display based on zoom levels)
Mapbox GL JS (Mapbox-developed source-available map rendering library)
MapLibre GL JS (Open source fork of Mapbox GL JS)
Other links
HTTP range requests (MDN)
Hilbert curve
Transcript
You can help correct transcripts on GitHub.
Intro [00:00:00] Jeremy: I'm talking to Brandon Liu. He's the creator of Protomaps, which is a way to easily create and host your own maps. Let's get into it. [00:00:09] Brandon: Hey, so thanks for having me on the podcast. So I'm Brandon. I work on an open source project called Protomaps. What it really is, is if you're a front end developer and you ever wanted to put maps on a website or on a mobile app, then Protomaps is sort of an open source solution for doing that, that I hope is something that's way easier to use than, um, a lot of other open source projects. Why not just use Google Maps? [00:00:36] Jeremy: A lot of people are gonna be familiar with Google Maps. Why should they worry about whether something's open source? Why shouldn't they just go and use the Google Maps API? [00:00:47] Brandon: So Google Maps is like an awesome thing, it's an awesome product. Probably one of the best tech products ever, right? And just to have a map that tells you what restaurants are open, and something that I use like all the time, especially like when you're traveling, it has all that data. And the most amazing part is that it's free for consumers, but it's not necessarily free for developers. Like if you wanted to embed that map onto your website or app, that usually has an API cost, which still has a free tier and is affordable. But one motivation, one basic reason to use open source is if you have some project that doesn't really fit into that pricing model. You know, like where you have to pay the cost of Google Maps, you have a side project, a nonprofit, that's one reason. But there's lots of other reasons related to flexibility or customization where you might want to use open source instead. Protomaps examples [00:01:49] Jeremy: Can you give some examples where people have used Protomaps and where that made sense for them? [00:01:56] Brandon: I follow a lot of the use cases and I also don't know about a lot of them because I don't have an API where I can track a hundred percent of the users. Some of them use the hosted version, but I would say most of them probably use it on their own infrastructure. One of the cool projects I've been seeing is called Toilet Map. And what Toilet Map is, is if you're in the UK and you want to find a public restroom, then it maps out, sort of crowdsourced, all of the public restrooms. And that's important for like a lot of people: if they have health issues, they need to find that information. And just a lot of different projects in the same vein. There's another one called Pinball Map, which is sort of a hobby project to find all the pinball machines in the world. And they wanted to have a customized map that fit in with their theme of pinball.
So these sorts of really cool indie projects are the ones I'm most excited about. Basemaps vs Overlays [00:02:57] Jeremy: And if we talk about, like the pinball map as an example, there's this concept of a basemap and then there's the things that you lay on top of it. What is a basemap and then is the pinball locations is that part of it or is that something separate? [00:03:12] Brandon: It's usually something separate. The example I usually use is if you go to a real estate site, like Zillow, you'll open up the map of Seattle and it has a bunch of pins showing all the houses, and then it has some information beneath it. That information beneath it is like labels telling, this neighborhood is Capitol Hill, or there is a park here. But all that information is common to a lot of use cases and it's not specific to real estate. So I think usually that's the distinction people use in the industry between like a base map versus your overlay. The overlay is like the data for your product or your company while the base map is something you could get from Google or from Protomaps or from Apple or from Mapbox that kind of thing. PMTiles for hosting the basemap and overlays [00:03:58] Jeremy: And so Protomaps in particular is responsible for the base map, and that information includes things like the streets and the locations of landmarks and things like that. Where is all that information coming from? [00:04:12] Brandon: So the base map information comes from a project called OpenStreetMap. And I would also, point out that for Protomaps as sort of an ecosystem. You can also put your overlay data into a format called PMTiles, which is sort of the core of what Protomaps is. So it can really do both. It can transform your data into the PMTiles format which you can host and you can also host the base map. So you kind of have both of those sides of the product in one solution. [00:04:43] Jeremy: And so when you say you have both are you saying that the PMTiles file can have, the base map in one file and then you would have the data you're laying on top in another file? Or what are you describing there? [00:04:57] Brandon: That's usually how I recommend to do it. Oftentimes there'll be sort of like, a really big basemap 'cause it has all of that data about like where the rivers are. Or while, if you want to put your map of toilets or park benches or pickleball courts on top, that's another file. But those are all just like assets you can move around like JSON or CSV files. Statically Hosted [00:05:19] Jeremy: And I think one of the things you mentioned was that your goal was to make Protomaps or the, the use of these PMTiles files easy to use. What does that look like for, for a developer? I wanna host a map. What do I actually need to, to put on my servers? [00:05:38] Brandon: So my usual pitch is that basically if you know how to use S3 or cloud storage, that you know how to deploy a map. And that, I think is the main sort of differentiation from most open source projects. Like a lot of them, they call themselves like, like some sort of self-hosted solution. But I've actually avoided using the term self-hosted because I think in most cases that implies a lot of complexity. Like you have to log into a Linux server or you have to use Kubernetes or some sort of Docker thing. What I really want to emphasize is the idea that, for Protomaps, it's self-hosted in the same way like CSS is self-hosted. So you don't really need a service from Amazon to host the JSON files or CSV files. 
It's really just a static file. [00:06:32] Jeremy: When you say static file that means you could use any static web host to host your HTML file, your JavaScript that actually renders the map. And then you have your PMTiles files, and you're not running a process or anything, you're just putting your files on a static file host. [00:06:50] Brandon: Right. So I think if you're a developer, you can also argue like a static file server is a server. It's you know, it's the cloud, it's just someone else's computer. It's really just nginx under the hood. But I think static storage is sort of special. If you look at things like static site generators, like Jekyll or Hugo, they're really popular because they're a commodity or like the storage is a commodity. And you can take your blog, make it a Jekyll blog, hosted on S3. One day, Amazon's like, we're charging three times as much so you can move it to a different cloud provider. And that's all vendor neutral. So I think that's really the special thing about static storage as a primitive on the web. Why running servers is a problem for resilience [00:07:36] Jeremy: Was there a prior experience you had? Like you've worked with maps for a very long time. Were there particular difficulties you had where you said I just gotta have something that can be statically hosted? [00:07:50] Brandon: That's sort of exactly why I got into this. I've been working sort of in and around the map space for over a decade, and Protomaps is really like me trying to solve the same problem I've had over and over again in the past, just like once and forever right? Because like once this problem is solved, like I don't need to deal with it again in the future. So I've worked at a couple of different companies before, mostly as a contractor, for like a humanitarian nonprofit for a design company doing things like, web applications to visualize climate change. Or for even like museums, like digital signage for museums. And oftentimes they had some sort of data visualization component, but always sort of the challenge of how to like, store and also distribute like that data was something that there wasn't really great open source solutions. So just for map data, that's really what motivated that design for Protomaps. [00:08:55] Jeremy: And in those, those projects in the past, were those things where you had to run your own server, run your own database, things like that? [00:09:04] Brandon: Yeah. And oftentimes we did, we would spin up an EC2 instance, for maybe one client and then we would have to host this server serving map data forever. Maybe the client goes away, or I guess it's good for business if you can sign some sort of like long-term support for that client saying, Hey, you know, like we're done with a project, but you can pay us to maintain the EC2 server for the next 10 years. And that's attractive. but it's also sort of a pain, because usually what happens is if people are given the choice, like a developer between like either I can manage the server on EC2 or on Rackspace or Hetzner or whatever, or I can go pay a SaaS to do it. In most cases, businesses will choose to pay the SaaS. So that's really like what creates a sort of lock-in is this preference for like, so I have this choice between like running the server or paying the SaaS. Like businesses will almost always go and pay the SaaS. [00:10:05] Jeremy: Yeah. 
And in this case, you either find some kind of free hosting or low-cost hosting just to host your files and you upload the files and then you're good from there. You don't need to maintain anything. [00:10:18] Brandon: Exactly, and that's really the ideal use case. so I have some users these, climate science consulting agencies, and then they might have like a one-off project where they have to generate the data once, but instead of having to maintain this server for the lifetime of that project, they just have a file on S3 and like, who cares? If that costs a couple dollars a month to run, that's fine, but it's not like S3 is gonna be deprecated, like it's gonna be on an insecure version of Ubuntu or something. So that's really the ideal, set of constraints for using Protomaps. [00:10:58] Jeremy: Yeah. Something this also makes me think about is, is like the resilience of sites like remaining online, because I, interviewed, Kyle Drake, he runs Neocities, which is like a modern version of GeoCities. And if I remember correctly, he was mentioning how a lot of old websites from that time, if they were running a server backend, like they were running PHP or something like that, if you were to try to go to those sites, now they're like pretty much all dead because there needed to be someone dedicated to running a Linux server, making sure things were patched and so on and so forth. But for static sites, like the ones that used to be hosted on GeoCities, you can go to the internet archive or other websites and they were just files, right? You can bring 'em right back up, and if anybody just puts 'em on a web server, then you're good. They're still alive. Case study of news room preferring static hosting [00:11:53] Brandon: Yeah, exactly. One place that's kind of surprising but makes sense where this comes up, is for newspapers actually. Some of the users using Protomaps are the Washington Post. And the reason they use it, is not necessarily because they don't want to pay for a SaaS like Google, but because if they make an interactive story, they have to guarantee that it still works in a couple of years. And that's like a policy decision from like the editorial board, which is like, so you can't write an article if people can't view it in five years. But if your like interactive data story is reliant on a third party, API and that third party API becomes deprecated, or it changes the pricing or it, you know, it gets acquired, then your journalism story is not gonna work anymore. So I have seen really good uptake among local news rooms and even big ones to use things like Protomaps just because it makes sense for the requirements. Working on Protomaps as an open source project for five years [00:12:49] Jeremy: How long have you been working on Protomaps and the parts that it's made up of such as PMTiles? [00:12:58] Brandon: I've been working on it for about five years, maybe a little more than that. It's sort of my pandemic era project. But the PMTiles part, which is really the heart of it only came in about halfway. Why not make a SaaS? [00:13:13] Brandon: So honestly, like when I first started it, I thought it was gonna be another SaaS and then I looked at it and looked at what the environment was around it. And I'm like, uh, so I don't really think I wanna do that. [00:13:24] Jeremy: When, when you say you looked at the environment around it what do you mean? Why did you decide not to make it a SaaS? [00:13:31] Brandon: Because there already is a lot of SaaS out there. 
And I think the opportunity of making something that is unique in terms of those use cases, like I mentioned like newsrooms, was clear. Like it was clear that there was some other solution, that could be built that would fit these needs better while if it was a SaaS, there are plenty of those out there. And I don't necessarily think that they're well differentiated. A lot of them all use OpenStreetMap data. And it seems like they mainly compete on price. It's like who can build the best three column pricing model. And then once you do that, you need to build like billing and metrics and authentication and like those problems don't really interest me. So I think, although I acknowledge sort of the indie hacker ethos now is to build a SaaS product with a monthly subscription, that's something I very much chose not to do, even though it is for sure like the best way to build a business. [00:14:29] Jeremy: Yeah, I mean, I think a lot of people can appreciate that perspective because it's, it's almost like we have SaaS overload, right? Where you have so many little bills for your project where you're like, another $5 a month, another $10 a month, or if you're a business, right? Those, you add a bunch of zeros and at some point it's just how many of these are we gonna stack on here? [00:14:53] Brandon: Yeah. And honestly. So I really think like as programmers, we're not really like great at choosing how to spend money like a $10 SaaS. That's like nothing. You know? So I can go to Starbucks and I can buy a pumpkin spice latte, and that's like $10 basically now, right? And it's like I'm able to make that consumer choice in like an instant just to spend money on that. But then if you're like, oh, like spend $10 on a SaaS that somebody put a lot of work into, then you're like, oh, that's too expensive. I could just do it myself. So I'm someone that also subscribes to a lot of SaaS products. and I think for a lot of things it's a great fit. Many open source SaaS projects are not easy to self host [00:15:37] Brandon: But there's always this tension between an open source project that you might be able to run yourself and a SaaS. And I think a lot of projects are at different parts of the spectrum. But for Protomaps, it's very much like I'm trying to move maps to being it is something that is so easy to run yourself that anyone can do it. [00:16:00] Jeremy: Yeah, and I think you can really see it with, there's a few SaaS projects that are successful and they're open source, but then you go to look at the self-hosting instructions and it's either really difficult to find and you find it, and then the instructions maybe don't work, or it's really complicated. So I think doing the opposite with Protomaps. As a user, I'm sure we're all appreciative, but I wonder in terms of trying to make money, if that's difficult. [00:16:30] Brandon: No, for sure. It is not like a good way to make money because I think like the ideal situation for an open source project that is open that wants to make money is the product itself is fundamentally complicated to where people are scared to run it themselves. Like a good example I can think of is like Supabase. Supabase is sort of like a platform as a service based on Postgres. And if you wanted to run it yourself, well you need to run Postgres and you need to handle backups and authentication and logging, and that stuff all needs to work and be production ready. So I think a lot of people, like they don't trust themselves to run database backups correctly. 
'cause if you get it wrong once, then you're kind of screwed. So I think that fundamental aspect of the product, like a database is something that is very, very ripe for being a SaaS while still being open source because it's fundamentally hard to run. Another one I can think of is like tailscale, which is, like a VPN that works end to end. That's something where, you know, it has this networking complexity where a lot of developers don't wanna deal with that. So they'd happily pay, for tailscale as a service. There is a lot of products or open source projects that eventually end up just changing to becoming like a hosted service. Businesses going from open source to closed or restricted licenses [00:17:58] Brandon: But then in that situation why would they keep it open source, right? Like, if it's easy to run yourself well, doesn't that sort of cannibalize their business model? And I think that's really the tension overall in these open source companies. So you saw it happen to things like Elasticsearch to things like Terraform where they eventually change the license to one that makes it difficult for other companies to compete with them. [00:18:23] Jeremy: Yeah, I mean there's been a number of cases like that. I mean, specifically within the mapping community, one I can think of was Mapbox's. They have Mapbox gl. Which was a JavaScript client to visualize maps and they moved from, I forget which license they picked, but they moved to a much more restrictive license. I wonder what your thoughts are on something that releases as open source, but then becomes something maybe a little more muddy. [00:18:55] Brandon: Yeah, I think it totally makes sense because if you look at their business and their funding, it seems like for Mapbox, I haven't used it in a while, but my understanding is like a lot of their business now is car companies and doing in dash navigation. And that is probably way better of a business than trying to serve like people making maps of toilets. And I think sort of the beauty of it is that, so Mapbox, the story is they had a JavaScript renderer called Mapbox GL JS. And they changed that to a source available license a couple years ago. And there's a fork of it that I'm sort of involved in called MapLibre GL. But I think the cool part is Mapbox paid employees for years, probably millions of dollars in total to work on this thing and just gave it away for free. Right? So everyone can benefit from that work they did. It's not like that code went away, like once they changed the license. Well, the old version has been forked. It's going its own way now. It's quite different than the new version of Mapbox, but I think it's extremely generous that they're able to pay people for years, you know, like a competitive salary and just give that away. [00:20:10] Jeremy: Yeah, so we should maybe look at it as, it was a gift while it was open source, and they've given it to the community and they're on continuing on their own path, but at least the community running Map Libre, they can run with it, right? It's not like it just disappeared. [00:20:29] Brandon: Yeah, exactly. And that is something that I use for Protomaps quite extensively. Like it's the primary way of showing maps on the web and I've been trying to like work on some enhancements to it to have like better internationalization for if you are in like South Asia like not show languages correctly. So I think it is being taken in a new direction. 
And I think like sort of the combination of Protomaps and MapLibre, it addresses a lot of use cases, like I mentioned earlier with like these like hobby projects, indie projects that are almost certainly not interesting to someone like Mapbox or Google as a business. But I'm happy to support as a small business myself. Financially supporting open source work (GitHub sponsors, closed source, contracts) [00:21:12] Jeremy: In my previous interview with Tom, one of the main things he mentioned was that creating a mapping business is incredibly difficult, and he said he probably wouldn't do it again. So in your case, you're building Protomaps, which you've admitted is easy to self-host. So there's not a whole lot of incentive for people to pay you. How is that working out for you? How are you supporting yourself? [00:21:40] Brandon: There's a couple of strategies that I've tried and oftentimes failed at. Just to go down the list, so I do have GitHub sponsors so I do have a hosted version of Protomaps you can use if you don't want to bother copying a big file around. But the way I do the billing for that is through GitHub sponsors. If you wanted to use this thing I provide, then just be a sponsor. And that definitely pays for itself, like the cost of running it. And that's great. GitHub sponsors is so easy to set up. It just removes you having to deal with Stripe or something. 'cause a lot of people, their credit card information is already in GitHub. GitHub sponsors I think is awesome if you want to like cover costs for a project. But I think very few people are able to make that work. A thing that's like a salary job level. It's sort of like Twitch streaming, you know, there's a handful of people that are full-time streamers and then you look down the list on Twitch and it's like a lot of people that have like 10 viewers. But some of the other things I've tried, I actually started out, publishing the base map as a closed source thing, where I would sell sort of like a data package instead of being a SaaS, I'd be like, here's a one-time download, of the premium data and you can buy it. And quite a few people bought it I just priced it at like $500 for this thing. And I thought that was an interesting experiment. The main reason it's interesting is because the people that it attracts to you in terms of like, they're curious about your products, are all people willing to pay money. While if you start out everything being open source, then the people that are gonna be try to do it are only the people that want to get something for free. So what I discovered is actually like once you transition that thing from closed source to open source, a lot of the people that used to pay you money will still keep paying you money because like, it wasn't necessarily that that closed source thing was why they wanted to pay. They just valued that thought you've put into it your expertise, for example. So I think that is one thing, that I tried at the beginning was just start out, closed source proprietary, then make it open source. That's interesting to people. Like if you release something as open source, if you go the other way, like people are really mad if you start out with something open source and then later on you're like, oh, it's some other license. Then people are like that's so rotten. But I think doing it the other way, I think is quite valuable in terms of being able to find an audience. 
[00:24:29] Jeremy: And when you said it was closed source and paid to open source, do you still sell those map exports? [00:24:39] Brandon: I don't right now. It's something that I might do in the future, you know, like have small customizations of the data that are available, uh, for a fee. still like the core OpenStreetMap based map that's like a hundred gigs you can just download. And that'll always just be like a free download just because that's already out there. All the source code to build it is open source. So even if I said, oh, you have to pay for it, then someone else can just do it right? So there's no real reason like to make that like some sort of like paywall thing. But I think like overall if the project is gonna survive in the long term it's important that I'd ideally like to be able to like grow like a team like have a small group of people that can dedicate the time to growing the project in the long term. But I'm still like trying to figure that out right now. [00:25:34] Jeremy: And when you mentioned that when you went from closed to open and people were still paying you, you don't sell a product anymore. What were they paying for? [00:25:45] Brandon: So I have some contracts with companies basically, like if they need a feature or they need a customization in this way then I am very open to those. And I sort of set it up to make it clear from the beginning that this is not just a free thing on GitHub, this is something that you could pay for if you need help with it, if you need support, if you wanted it. I'm also a little cagey about the word support because I think like it sounds a little bit too wishy-washy. Pretty much like if you need access to the developers of an open source project, I think that's something that businesses are willing to pay for. And I think like making that clear to potential users is a challenge. But I think that is one way that you might be able to make like a living out of open source. [00:26:35] Jeremy: And I think you said you'd been working on it for about five years. Has that mostly been full time? [00:26:42] Brandon: It's been on and off. it's sort of my pandemic era project. But I've spent a lot of time, most of my time working on the open source project at this point. So I have done some things that were more just like I'm doing a customization or like a private deployment for some client. But that's been a minority of the time. Yeah. [00:27:03] Jeremy: It's still impressive to have an open source project that is easy to self-host and yet is still able to support you working on it full time. I think a lot of people might make the assumption that there's nothing to sell if something is, is easy to use. But this sort of sounds like a counterpoint to that. [00:27:25] Brandon: I think I'd like it to be. So when you come back to the point of like, it being easy to self-host. Well, so again, like I think about it as like a primitive of the web. Like for example, if you wanted to start a business today as like hosted CSS files, you know, like where you upload your CSS and then you get developers to pay you a monthly subscription for how many times they fetched a CSS file. Well, I think most developers would be like, that's stupid because it's just an open specification, you just upload a static file. And really my goal is to make Protomaps the same way where it's obvious that there's not really some sort of lock-in or some sort of secret sauce in the server that does this thing. 
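As a reader aside, the "it's just a static file, like CSS" framing Brandon uses above can be sketched in a few lines of front-end code. This is a hedged example, not his exact setup: it assumes the pmtiles npm package and MapLibre GL JS, the archive URL, source name, and "water" source-layer are made up, and the exact protocol wiring can differ between library versions.

```ts
import maplibregl from "maplibre-gl";
import { Protocol } from "pmtiles";

// Register a "pmtiles://" protocol so MapLibre can read tiles as byte ranges
// straight out of a single archive sitting on ordinary static storage.
const protocol = new Protocol();
maplibregl.addProtocol("pmtiles", protocol.tile);

const map = new maplibregl.Map({
  container: "map", // a <div id="map"> element on the page
  style: {
    version: 8,
    sources: {
      basemap: {
        type: "vector",
        // Any static host or bucket works; this URL is hypothetical.
        url: "pmtiles://https://example.com/tiles/basemap.pmtiles",
      },
    },
    layers: [
      {
        id: "water",
        type: "fill",
        source: "basemap",
        "source-layer": "water", // assumes the basemap exposes a "water" layer
        paint: { "fill-color": "#a0c8f0" },
      },
    ],
  },
  center: [-122.33, 47.61],
  zoom: 10,
});
```

Everything map-related here is a range read from one file on static storage, so moving hosts is just changing the URL, which is the "no secret sauce in the server" point.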
How PMTiles works and building a primitive of the web [00:28:16] Brandon: If you look at video for example, like a lot of the tech for how Protomaps and PMTiles works is based on parts of the HTTP spec that were made for video. And 20 years ago, if you wanted to host a video on the web, you had to have like a RealPlayer license or Flash. So you had to go license some server software from Real Media or from Macromedia so you could stream video to a browser plugin. But now in HTML you can just embed a video file. And no one's like, oh well I need to go pay for my video serving license. I mean, there is such a thing, like YouTube doesn't really use that for DRM reasons, but people just have the assumption that video is like a primitive on the web. So if we're able to make maps sort of that same way like a primitive on the web then there isn't really some obvious business or licensing model behind how that works. Just because it's a thing and it helps a lot of people do their jobs and people are happy using it. So why bother? [00:29:26] Jeremy: You mentioned that it's a tech that was used for streaming video. What tech specifically is it? [00:29:34] Brandon: So it is byte range serving. So when you open a video file on the web, let's say it's like a 100 megabyte video. You don't have to download the entire video before it starts playing. It streams parts out of the file based on like what frames... I mean, it's based on the frames in the video. So it can start streaming immediately because it's organized in a way to where the first few frames are at the beginning. And what PMTiles really is, is it's just like a video but in space instead of time. So it's organized in a way where these zoomed out views are at the beginning and the most zoomed in views are at the end. So when you're like panning or zooming in the map all you're really doing is fetching byte ranges out of that file the same way as a video. But it's organized in this tiled way on a space filling curve. It's a little bit complicated how it works internally and I think it's kind of cool but that's sort of like an implementation detail. [00:30:35] Jeremy: And to the person deploying it, it just looks like a single file. [00:30:40] Brandon: Exactly, in the same way like an mp3 audio file is or like a JSON file is. [00:30:47] Jeremy: So with a video, I can sort of see how as someone seeks through the video, they start at the beginning and then they go to the middle if they wanna see the middle. For a map, as somebody scrolls around the map, are you seeking all over the file or does the way it's structured have a little less chaos? [00:31:09] Brandon: It's structured. And that's kind of the main technical challenge behind building PMTiles is you have to be sort of clever so you're not spraying the reads everywhere. So it uses something called a Hilbert curve, which is a mathematical concept of a space filling curve. Where it's one continuous curve that essentially lets you break 2D space into 1D space. So if you've seen some maps of IP space, it uses this crazy looking curve that hits all the points in one continuous line. And that's the same concept behind PMTiles is if you're looking at one part of the world, you're sort of guaranteed that all of those parts you're looking at are quite close to each other and the data you have to transfer is quite minimal, compared to if you just had it at random. [00:32:02] Jeremy: How big do the files get? If I have a PMTiles of the entire world, what kind of size am I looking at?
[00:32:10] Brandon: Right now, the default one I distribute is 128 gigabytes, so it's quite sizable, although you can slice parts out of it remotely. So if you just wanted California or just wanted LA or just wanted only a couple of zoom levels, like from zero to 10 instead of zero to 15, there is a command line tool that's also called PMTiles that lets you do that. Issues with CDNs and range queries [00:32:35] Jeremy: And when you're working with files of this size, I mean, let's say I am working with a CDN in front of my application. I'm not typically accustomed to hosting something that's that large and something where you're seeking all over the file. Is that ever an issue, or is that something that's just taken care of by the browser and by the hosts? [00:32:58] Brandon: That is an issue actually, so a lot of CDNs don't deal with it correctly. And my recommendation is there is a kind of proxy server or like a serverless proxy thing that I wrote that runs on Cloudflare Workers or on Docker and lets you proxy those range requests into a normal URL, and then that is like a hundred percent CDN compatible. So I would say like a lot of the big commercial installations of this thing, they use that because it makes more practical sense. It's also faster. But the idea is that this solution sort of scales up and scales down. If you wanted to host just your city in like a 10 megabyte file, well you can just put that into GitHub Pages and you don't have to worry about it. If you want to have a global map for your website that serves a ton of traffic then you probably want a little bit more sophisticated of a solution. It still does not require you to run a Linux server, but it might require you to use Lambda, or Lambda in conjunction with a CDN. [00:34:09] Jeremy: Yeah. And that sort of ties into what you were saying at the beginning where if you can host on something like Cloudflare Workers or Lambda, there's less time you have to spend keeping these things running. [00:34:26] Brandon: Yeah, exactly. And I think also the Lambda or Cloudflare Workers solution is not perfect. It's not as perfect as S3 or as just static files, but in my experience, it still is better at building something that lasts on the time span of years than being like, I have a server that is on this Ubuntu version and in four years there's all these security patches that are not being applied. So it's still sort of serverless, although not totally vendor neutral like S3. Customizing the map [00:35:03] Jeremy: We've mostly been talking about how you host the map itself, but for someone who's not familiar with these kinds of tools, how would they be customizing the map? [00:35:15] Brandon: For customizing the map there is front end style customization and there's also data customization. So for the front end, if you wanted to change the water from the shade of blue to another shade of blue, there is a TypeScript API where you can customize it almost like a text editor color scheme. So if you're able to name a bunch of colors, well you can customize the map in that way, you can change the fonts. And that's all done using MapLibre GL, using a TypeScript API on top of that. For customizing the data, all the pipeline to generate this data from OpenStreetMap is open source. There is a Java program using a library called Planetiler which is awesome, which is this super fast multi-core way of building map tiles.
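For readers who want to see what the byte-range serving described a little earlier looks like on the wire (and the kind of request a CDN has to pass through correctly), here is a rough sketch of a single HTTP Range request. It illustrates the mechanism only; it is not the pmtiles library's internal code, and the URL is hypothetical.

```ts
// Fetch only the first 16 KiB of a large archive with an HTTP Range request.
const url = "https://example.com/tiles/world.pmtiles"; // hypothetical

const res = await fetch(url, { headers: { Range: "bytes=0-16383" } });

if (res.status === 206) {
  // 206 Partial Content: the server (and any CDN in front of it) honored the range.
  const bytes = new Uint8Array(await res.arrayBuffer());
  console.log(`got ${bytes.byteLength} bytes of`, res.headers.get("Content-Range"));
} else {
  // A host or CDN that ignores Range headers returns 200 and the whole file,
  // which is the misbehavior the proxy mentioned above works around.
  console.warn("Range request not honored; status", res.status);
}
```

A map client issues many small reads like this as you pan and zoom, which is why a CDN that strips or mishandles Range headers matters so much here.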
And right now there aren't really great hooks to customize what data goes into that. But that's something that I do wanna work on. And finally, because the data comes from OpenStreetMap, if you notice data that's missing or you wanted to correct data in OSM, then you can go into osm.org. You can get involved in contributing the data to OSM, and the Protomaps build is daily. So if you make a change, then within 24 hours you should see the new base map have that change. And of course for OSM your improvements would go into every OSM based project that is ingesting that data. So it's not a Protomaps-specific thing. It's like this big shared data source, almost like Wikipedia. OpenStreetMap is a dataset and not a map [00:37:01] Jeremy: I think you were involved with OpenStreetMap to some extent. Can you speak a little bit to that for people who aren't familiar, what OpenStreetMap is? [00:37:11] Brandon: Right. So I've been using OSM as sort of like a tools developer for over a decade now. And one of the number one questions I get from developers about what is Protomaps is why wouldn't I just use OpenStreetMap? What's the distinction between Protomaps and OpenStreetMap? And it's sort of like this funny thing because even though OSM has map in the name it's not really a map in that you can't... In that it's mostly a data set and not a map. It does have a map that you can see that you can pan around to when you go to the website, but the way that thing they show you on the website is built is not really that easily reproducible. It involves a lot of C++ software you have to run. But OpenStreetMap itself, the heart of it is almost like a big XML file that has all the data in the map, globally. And it has tagged features, for example. So you can go in and edit that. It has a web front end to change the data. It does not directly translate into making a map actually. Protomaps decides what shows at each zoom level [00:38:24] Brandon: So a lot of the pipeline, that Java program I mentioned for building this basemap for Protomaps, is doing things like you have to choose what data you show when you zoom out. You can't show all the data. For example, when you're zoomed out and you're looking at all of a state like Colorado, you don't see all the Chipotles when you're zoomed all the way out. That'd be weird, right? So you have to make some sort of decision in logic that says this data only shows up at this zoom level. And that's really what is the challenge in optimizing the size of that for the Protomaps map project. [00:39:03] Jeremy: Oh, so those decisions of what to show at different zoom levels, those are decisions made by you when you're creating the PMTiles file with Protomaps. [00:39:14] Brandon: Exactly. It's part of the base map's build pipeline. And those are honestly very subjective decisions. Who really decides when you're zoomed out should this hospital show up or should this museum show up? Nowadays in Google, I think it shows you ads. Like if someone pays for their car repair shop to show up when you're zoomed out, like that, that gets surfaced. But because there is no advertising auction in Protomaps, that doesn't happen, obviously. So we have to sort of make some reasonable choice. A lot of that right now in Protomaps actually comes from another open source project called Mapzen. So Mapzen was a company that went outta business a couple years ago. They did a lot of this work in designing which data shows up at which zoom level and open sourced it.
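To give a flavor of the front-end style customization mentioned just above, here is a small runtime sketch using MapLibre GL JS. The style URL and the layer ids ("water", "poi") are assumptions about a particular basemap rather than guaranteed names, and the zoom gating shown is only a client-side filter; the real Protomaps zoom-level decisions Brandon describes are made earlier, in the tile build pipeline.

```ts
import maplibregl from "maplibre-gl";

const map = new maplibregl.Map({
  container: "map",
  style: "https://example.com/styles/light.json", // hypothetical style document
  center: [-122.33, 47.61],
  zoom: 11,
});

map.on("load", () => {
  // The "text editor color scheme" idea: recolor an existing water layer at runtime.
  map.setPaintProperty("water", "fill-color", "#1e5a8a");

  // Client-side zoom gating: only draw the assumed POI layer from zoom 14 upward.
  map.setLayerZoomRange("poi", 14, 22);
});
```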
And then when they shut down, they transferred that code into the Linux Foundation. So it's this totally open source project, that like, again, sort of like Mapbox gl has this awesome legacy in that this company funded it for years for smart people to work on it and now it's just like a free thing you can use. So the logic in Protomaps is really based on mapzen. [00:40:33] Jeremy: And so the visualization of all this... I think I understand what you mean when people say oh, why not use OpenStreetMaps because it's not really clear it's hard to tell is this the tool that's visualizing the data? Is it the data itself? So in the case of using Protomaps, it sounds like Protomaps itself has all of the data from OpenStreetMap and then it has made all the decisions for you in terms of what to show at different Zoom levels and what things to have on the map at all. And then finally, you have to have a separate, UI layer and in this case, it sounds like the one that you recommend is the Map Libre library. [00:41:18] Brandon: Yeah, that's exactly right. For Protomaps, it has a portion or a subset of OSM data. It doesn't have all of it just because there's too much, like there's data in there. people have mapped out different bushes and I don't include that in Protomaps if you wanted to go in and edit like the Java code to add that you can. But really what Protomaps is positioned at is sort of a solution for developers that want to use OSM data to make a map on their app or their website. because OpenStreetMap itself is mostly a data set, it does not really go all the way to having an end-to-end solution. Financials and the idea of a project being complete [00:41:59] Jeremy: So I think it's great that somebody who wants to make a map, they have these tools available, whether it's from what was originally built by Mapbox, what's built by Open StreetMap now, the work you're doing with Protomaps. But I wonder one of the things that I talked about with Tom was he was saying he was trying to build this mapping business and based on the financials of what was coming in he was stressed, right? He was struggling a bit. And I wonder for you, you've been working on this open source project for five years. Do you have similar stressors or do you feel like I could keep going how things are now and I feel comfortable? [00:42:46] Brandon: So I wouldn't say I'm a hundred percent in one bucket or the other. I'm still seeing it play out. One thing, that I really respect in a lot of open source projects, which I'm not saying I'm gonna do for Protomaps is the idea that a project is like finished. I think that is amazing. If a software project can just be done it's sort of like a painting or a novel once you write, finish the last page, have it seen by the editor. I send it off to the press is you're done with a book. And I think one of the pains of software is so few of us can actually do that. And I don't know obviously people will say oh the map is never finished. That's more true of OSM, but I think like for Protomaps. One thing I'm thinking about is how to limit the scope to something that's quite narrow to where we could be feature complete on the core things in the near term timeframe. That means that it does not address a lot of things that people want. Like search, like if you go to Google Maps and you search for a restaurant, you will get some hits. that's like a geocoding issue. And I've already decided that's totally outta scope for Protomaps. 
So, in terms of trying to think about the future of this, I'm mostly looking for ways to cut scope if possible. There are some things like better tooling around being able to work with PMTiles that are on the roadmap. but for me, I am still enjoying working on the project. It's definitely growing. So I can see on NPM downloads I can see the growth curve of people using it and that's really cool. So I like hearing about when people are using it for cool projects. So it seems to still be going okay for now. [00:44:44] Jeremy: Yeah, that's an interesting perspective about how you were talking about projects being done. Because I think when people look at GitHub projects and they go like, oh, the last commit was X months ago. They go oh well this is dead right? But maybe that's the wrong framing. Maybe you can get a project to a point where it's like, oh, it's because it doesn't need to be updated. [00:45:07] Brandon: Exactly, yeah. Like I used to do a lot of c++ programming and the best part is when you see some LAPACK matrix math library from like 1995 that still works perfectly in c++ and you're like, this is awesome. This is the one I have to use. But if you're like trying to use some like React component library and it hasn't been updated in like a year, you're like, oh, that's a problem. So again, I think there's some middle ground between those that I'm trying to find. I do like for Protomaps, it's quite dependency light in terms of the number of hard dependencies I have in software. but I do still feel like there is a lot of work to be done in terms of project scope that needs to have stuff added. You mostly only hear about problems instead of people's wins [00:45:54] Jeremy: Having run it for this long. Do you have any thoughts on running an open source project in general? On dealing with issues or managing what to work on things like that? [00:46:07] Brandon: Yeah. So I have a lot. I think one thing people point out a lot is that especially because I don't have a direct relationship with a lot of the people using it a lot of times I don't even know that they're using it. Someone sent me a message saying hey, have you seen flickr.com, like the photo site? And I'm like, no. And I went to flickr.com/map and it has Protomaps for it. And I'm like, I had no idea. But that's cool, if they're able to use Protomaps for this giant photo sharing site that's awesome. But that also means I don't really hear about when people use it successfully because you just don't know, I guess they, NPM installed it and it works perfectly and you never hear about it. You only hear about people's negative experiences. You only hear about people that come and open GitHub issues saying this is totally broken, and why doesn't this thing exist? And I'm like, well, it's because there's an infinite amount of things that I want to do, but I have a finite amount of time and I just haven't gone into that yet. And that's honestly a lot of the things and people are like when is this thing gonna be done? So that's, that's honestly part of why I don't have a public roadmap because I want to avoid that sort of bickering about it. I would say that's one of my biggest frustrations with running an open source project is how it's self-selected to only hear the negative experiences with it. Be careful what PRs you accept [00:47:32] Brandon: 'cause you don't hear about those times where it works. 
I'd say another thing is it's changed my perspective on contributing to open source because I think when I was younger or before I had become a maintainer I would open a pull request on a project unprompted that has a hundred lines and I'd be like, Hey, just merge this thing. But I didn't realize when I was younger well if I just merge it and I disappear, then the maintainer is stuck with what I did forever. You know if I add some feature then that person that maintains the project has to do that indefinitely. And I think that's very asymmetrical and it's changed my perspective a lot on accepting open source contributions. I wanna have it be open to anyone to contribute. But there is some amount of back and forth where it's almost like the default answer for should I accept a PR is no by default because you're the one maintaining it. And do you understand the shape of that solution completely to where you're going to support it for years because the person that's contributing it is not bound to those same obligations that you are. And I think that's also one of the things where I have a lot of trepidation around open source is I used to think of it as a lot more bazaar-like in terms of anyone can just throw their thing in. But then that creates a lot of problems for the people who are expected out of social obligation to continue this thing indefinitely. [00:49:23] Jeremy: Yeah, I can totally see why that causes burnout with a lot of open source maintainers, because you probably to some extent maybe even feel some guilt right? You're like, well, somebody took the time to make this. But then like you said you have to spend a lot of time trying to figure out is this something I wanna maintain long term? And one wrong move and it's like, well, it's in here now. [00:49:53] Brandon: Exactly. To me, I think that is a very common failure mode for open source projects is they're too liberal in the things they accept. And that's a lot of why I was talking about how that choice of what features show up on the map was inherited from the MapZen projects. If I didn't have that then somebody could come in and say hey, you know, I want to show power lines on the map. And they open a PR for power lines and now everybody who's using Protomaps when they're like zoomed out they see power lines are like I didn't want that. So I think that's part of why a lot of open source projects eventually evolve into a plugin system is because there is this demand as the project grows for more and more features. But there is a limit in the maintainers. It's like the demand for features is exponential while the maintainer amount of time and effort is linear. Plugin systems might reduce need for PRs [00:50:56] Brandon: So maybe the solution to smash that exponential down to quadratic maybe is to add a plugin system. But I think that is one of the biggest tensions that only became obvious to me after working on this for a couple of years. [00:51:14] Jeremy: Is that something you're considering doing now? [00:51:18] Brandon: Is the plugin system? Yeah. I think for the data customization, I eventually wanted to have some sort of programmatic API to where you could declare a config file that says I want ski routes. It totally makes sense. 
The power lines example is maybe a little bit obscure but for example like a skiing app and you want to be able to show ski slopes when you're zoomed out well you're not gonna be able to get that from Mapbox or from Google because they have a one size fits all map that's not specialized to skiing or to golfing or to outdoors. But if you like, in theory, you could do this with Protomaps if you changed the Java code to show data at different zoom levels. And that is to me what makes the most sense for a plugin system and also makes the most product sense because it enables a lot of things you cannot do with the one size fits all map. [00:52:20] Jeremy: It might also increase the complexity of the implementation though, right? [00:52:25] Brandon: Yeah, exactly. So that's like. That's really where a lot of the terrifying thoughts come in, which is like once you create this like config file surface area, well what does that look like? Is that JSON? Is that TOML, is that some weird like everything eventually evolves into some scripting language right? Where you have logic inside of your templates and I honestly do not really know what that looks like right now. That feels like something in the medium term roadmap. [00:52:58] Jeremy: Yeah and then in terms of bug reports or issues, now it's not just your code it's this exponential combination of whatever people put into these config files. [00:53:09] Brandon: Exactly. Yeah. so again, like I really respect the projects that have done this well or that have done plugins well. I'm trying to think of some, I think obsidian has plugins, for example. And that seems to be one of the few solutions to try and satisfy the infinite desire for features with the limited amount of maintainer time. Time split between code vs triage vs talking to users [00:53:36] Jeremy: How would you say your time is split between working on the code versus issue and PR triage? [00:53:43] Brandon: Oh, it varies really. I think working on the code is like a minority of it. I think something that I actually enjoy is talking to people, talking to users, getting feedback on it. I go to quite a few conferences to talk to developers or people that are interested and figure out how to refine the message, how to make it clearer to people, like what this is for. And I would say maybe a plurality of my time is spent dealing with non-technical things that are neither code or GitHub issues. One thing I've been trying to do recently is talk to people that are not really in the mapping space. For example, people that work for newspapers like a lot of them are front end developers and if you ask them to run a Linux server they're like I have no idea. But that really is like one of the best target audiences for Protomaps. So I'd say a lot of the reality of running an open source project is a lot like a business is it has all the same challenges as a business in terms of you have to figure out what is the thing you're offering. You have to deal with people using it. You have to deal with feedback, you have to deal with managing emails and stuff. I don't think the payoff is anywhere near running a business or a startup that's backed by VC money is but it's definitely not the case that if you just want to code, you should start an open source project because I think a lot of the work for an opensource project has nothing to do with just writing the code. 
It is in my opinion as someone having done a VC backed business before, it is a lot more similar to running, a tech company than just putting some code on GitHub. Running a startup vs open source project [00:55:43] Jeremy: Well, since you've done both at a high level what did you like about running the company versus maintaining the open source project? [00:55:52] Brandon: So I have done some venture capital accelerator programs before and I think there is an element of hype and energy that you get from that that is self perpetuating. Your co-founder is gungho on like, yeah, we're gonna do this thing. And your investors are like, you guys are geniuses. You guys are gonna make a killing doing this thing. And the way it's framed is sort of obvious to everyone that it's like there's a much more traditional set of motivations behind that, that people understand while it's definitely not the case for running an open source project. Sometimes you just wake up and you're like what the hell is this thing for, it is this thing you spend a lot of time on. You don't even know who's using it. The people that use it and make a bunch of money off of it they know nothing about it. And you know, it's just like cool. And then you only hear from people that are complaining about it. And I think like that's honestly discouraging compared to the more clear energy and clearer motivation and vision behind how most people think about a company. But what I like about the open source project is just the lack of those constraints you know? Where you have a mandate that you need to have this many customers that are paying by this amount of time. There's that sort of pressure on delivering a business result instead of just making something that you're proud of that's simple to use and has like an elegant design. I think that's really a difference in motivation as well. Having control [00:57:50] Jeremy: Do you feel like you have more control? Like you mentioned how you've decided I'm not gonna make a public roadmap. I'm the sole developer. I get to decide what goes in. What doesn't. Do you feel like you have more control in your current position than you did running the startup? [00:58:10] Brandon: Definitely for sure. Like that agency is what I value the most. It is possible to go too far. Like, so I'm very wary of the BDFL title, which I think is how a lot of open source projects succeed. But I think there is some element of for a project to succeed there has to be somebody that makes those decisions. Sometimes those decisions will be wrong and then hopefully they can be rectified. But I think going back to what I was talking about with scope, I think the overall vision and the scope of the project is something that I am very opinionated about in that it should do these things. It shouldn't do these things. It should be easy to use for this audience. Is it gonna be appealing to this other audience? I don't know. And I think that is really one of the most important parts of that leadership role, is having the power to decide we're doing this, we're not doing this. I would hope other developers would be able to get on board if they're able to make good use of the project, if they use it for their company, if they use it for their business, if they just think the project is cool. So there are other contributors at this point and I want to get more involved. 
But I think being able to make those decisions toward what I believe is going to be the best project is something that is very special about open source, and that isn't necessarily true about running, like, a SaaS business. [00:59:50] Jeremy: I think that's a good spot to end it on, so if people want to learn more about Protomaps or they wanna see what you're up to, where should they head? [01:00:00] Brandon: So you can go to protomaps.com, GitHub, or you can find me or Protomaps on Bluesky or Mastodon. [01:00:09] Jeremy: All right, Brandon, thank you so much for chatting today. [01:00:12] Brandon: Great. Thank you very much.
What can GitHub Copilot do for SysAdmins in 2025? Richard talks to Jessica Deen from GitHub about her experiences using Copilot for her work. Jessica talks about Copilot being the first stop for most tasks - describing the task to Copilot helps you think through the problem, and often the tool can generate code or information to get that task done fast. Today's GitHub Copilot can handle everything from explaining existing code to writing something new, debugging a problem, or even writing documentation! Links: GitHub Copilot, Changing the AI Model for Copilot Chat, Visual Studio Code Insiders, Azure Extensions, GitHub Spark, LaunchDarkly. Recorded March 13, 2025
Fredrik and Kristoffer talk about Coolify and Hetzner. Kristoffer helps Fredrik understand what you would want Coolify for and what it actually is. You need a certain amount of knowledge, or at least an interest in reading up and acquiring it; Coolify is not secure enough straight out of the box. Fredrik is considering switching email providers. The consensus seems to be that email is a service you might not want to host at Hetzner anyway. We also discuss modern C++ and its future for a bit, and then talk about Coolify's weaker sides: there is a lot to get into, and the documentation is not always super strong. Last but not least, a small aside about Roq and a promise of a future topic. A big thank you to Cloudnet, which sponsors our VPS! Do you have comments, questions, or tips? We are @kodsnack, @thieta, @krig, and @bjoreman on Mastodon, have a page on Facebook, and can be emailed at info@kodsnack.se if you want to write something longer. We read everything that gets sent. If you like Kodsnack, we would love it if you reviewed us in iTunes! You can also support the podcast by buying us a coffee (or two!) on Ko-fi, or by buying something in our shop. Links: Join the twelfth game jam (spelsylt)! - April 7-20, 2025 Coolify Kuzzle Dozzle - monitors and logs containers Odoo More services you can run in Coolify ERP - enterprise resource planning CRM - customer relationship management Coolest Cooler Space Monkey - roughly, shared cloud storage on NAS disks. The Kickstarter campaign has nice pictures Syncthing How Dropbox started Boring cash cow S3 SQLite CRDT Kristoffer's company page Reverse proxy Heroku Coolify's guide to setting up load balancing on Hetzner Hetzner's Coolify documentation cloud-init Fastmail Glesys and email Support us on Ko-fi! Zig Bjarne Stroustrup The article about problems with C++ by Izzy Muerte Herb Sutter Profiles in C++ Nginx Caddy Traefik Coolify Cloud Infisical - secrets management Beszel - monitoring Ghost WriteFreely Nixpacks uWSGI Roq Software Unscripted Feldman gives a presentation where he writes a strongly typed backend with JSON support Durable execution Titles: The -zzle suffix Enterprise software on my Coolify A dark forest out there full of monsters Filled with monsters You handle the DNS yourself The terminal button Infarfarerad A useful monster C++ the good parts A room full of guns Vibe-deploying Coolify A German techno band from the nineties This is sunk Monster email on Hetzner Zig is cooler than Rust, after all The inference is complete
If you're in SF: Join us for the Claude Plays Pokemon hackathon this Sunday! If you're not: Fill out the 2025 State of AI Eng survey for $250 in Amazon cards! We are SO excited to share our conversation with Dharmesh Shah, co-founder of HubSpot and creator of Agent.ai. A particularly compelling concept we discussed is the idea of "hybrid teams" - the next evolution in workplace organization where human workers collaborate with AI agents as team members. Just as we previously saw hybrid teams emerge in terms of full-time vs. contract workers, or in-office vs. remote workers, Dharmesh predicts that the next frontier will be teams composed of both human and AI members. This raises interesting questions about team dynamics, trust, and how to effectively delegate tasks between human and AI team members. The discussion of business models in AI reveals an important distinction between Work as a Service (WaaS) and Results as a Service (RaaS), something Dharmesh has written extensively about. While RaaS has gained popularity, particularly in customer support applications where outcomes are easily measurable, Dharmesh argues that this model may be over-indexed. Not all AI applications have clearly definable outcomes or consistent economic value per transaction, making WaaS more appropriate in many cases. This insight is particularly relevant for businesses considering how to monetize AI capabilities. The technical challenges of implementing effective agent systems are also explored, particularly around memory and authentication. Shah emphasizes the importance of cross-agent memory sharing and the need for more granular control over data access. He envisions a future where users can selectively share parts of their data with different agents, similar to how OAuth works but with much finer control.
This points to significant opportunities in developing infrastructure for secure and efficient agent-to-agent communication and data sharing.Other highlights from our conversation* The Evolution of AI-Powered Agents – Exploring how AI agents have evolved from simple chatbots to sophisticated multi-agent systems, and the role of MCPs in enabling that.* Hybrid Digital Teams and the Future of Work – How AI agents are becoming teammates rather than just tools, and what this means for business operations and knowledge work.* Memory in AI Agents – The importance of persistent memory in AI systems and how shared memory across agents could enhance collaboration and efficiency.* Business Models for AI Agents – Exploring the shift from software as a service (SaaS) to work as a service (WaaS) and results as a service (RaaS), and what this means for monetization.* The Role of Standards Like MCP – Why MCP has been widely adopted and how it enables agent collaboration, tool use, and discovery.* The Future of AI Code Generation and Software Engineering – How AI-assisted coding is changing the role of software engineers and what skills will matter most in the future.* Domain Investing and Efficient Markets – Dharmesh's approach to domain investing and how inefficiencies in digital asset markets create business opportunities.* The Philosophy of Saying No – Lessons from "Sorry, You Must Pass" and how prioritization leads to greater productivity and focus.Timestamps* 00:00 Introduction and Guest Welcome* 02:29 Dharmesh Shah's Journey into AI* 05:22 Defining AI Agents* 06:45 The Evolution and Future of AI Agents* 13:53 Graph Theory and Knowledge Representation* 20:02 Engineering Practices and Overengineering* 25:57 The Role of Junior Engineers in the AI Era* 28:20 Multi-Agent Systems and MCP Standards* 35:55 LinkedIn's Legal Battles and Data Scraping* 37:32 The Future of AI and Hybrid Teams* 39:19 Building Agent AI: A Professional Network for Agents* 40:43 Challenges and Innovations in Agent AI* 45:02 The Evolution of UI in AI Systems* 01:00:25 Business Models: Work as a Service vs. Results as a Service* 01:09:17 The Future Value of Engineers* 01:09:51 Exploring the Role of Agents* 01:10:28 The Importance of Memory in AI* 01:11:02 Challenges and Opportunities in AI Memory* 01:12:41 Selective Memory and Privacy Concerns* 01:13:27 The Evolution of AI Tools and Platforms* 01:18:23 Domain Names and AI Projects* 01:32:08 Balancing Work and Personal Life* 01:35:52 Final Thoughts and ReflectionsTranscriptAlessio [00:00:04]: Hey everyone, welcome back to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.swyx [00:00:12]: Hello, and today we're super excited to have Dharmesh Shah to join us. I guess your relevant title here is founder of Agent AI.Dharmesh [00:00:20]: Yeah, that's true for this. Yeah, creator of Agent.ai and co-founder of HubSpot.swyx [00:00:25]: Co-founder of HubSpot, which I followed for many years, I think 18 years now, gonna be 19 soon. And you caught, you know, people can catch up on your HubSpot story elsewhere. I should also thank Sean Puri, who I've chatted with back and forth, who's been, I guess, getting me in touch with your people. But also, I think like, just giving us a lot of context, because obviously, My First Million joined you guys, and they've been chatting with you guys a lot. So for the business side, we can talk about that, but I kind of wanted to engage your CTO, agent, engineer side of things. 
So how did you get agent religion?Dharmesh [00:01:00]: Let's see. So I've been working, I'll take like a half step back, a decade or so ago, even though actually more than that. So even before HubSpot, the company I was contemplating that I had named for was called Ingenisoft. And the idea behind Ingenisoft was a natural language interface to business software. Now realize this is 20 years ago, so that was a hard thing to do. But the actual use case that I had in mind was, you know, we had data sitting in business systems like a CRM or something like that. And my kind of what I thought clever at the time. Oh, what if we used email as the kind of interface to get to business software? And the motivation for using email is that it automatically works when you're offline. So imagine I'm getting on a plane or I'm on a plane. There was no internet on planes back then. It's like, oh, I'm going through business cards from an event I went to. I can just type things into an email just to have them all in the backlog. When it reconnects, it sends those emails to a processor that basically kind of parses effectively the commands and updates the software, sends you the file, whatever it is. And there was a handful of commands. I was a little bit ahead of the times in terms of what was actually possible. And I reattempted this natural language thing with a product called ChatSpot that I did back 20...swyx [00:02:12]: Yeah, this is your first post-ChatGPT project.Dharmesh [00:02:14]: I saw it come out. Yeah. And so I've always been kind of fascinated by this natural language interface to software. Because, you know, as software developers, myself included, we've always said, oh, we build intuitive, easy-to-use applications. And it's not intuitive at all, right? Because what we're doing is... We're taking the mental model that's in our head of what we're trying to accomplish with said piece of software and translating that into a series of touches and swipes and clicks and things like that. And there's nothing natural or intuitive about it. And so natural language interfaces, for the first time, you know, whatever the thought is you have in your head and expressed in whatever language that you normally use to talk to yourself in your head, you can just sort of emit that and have software do something. And I thought that was kind of a breakthrough, which it has been. And it's gone. So that's where I first started getting into the journey. I started because now it actually works, right? So once we got ChatGPT and you can take, even with a few-shot example, convert something into structured, even back in the ChatGP 3.5 days, it did a decent job in a few-shot example, convert something to structured text if you knew what kinds of intents you were going to have. And so that happened. And that ultimately became a HubSpot project. But then agents intrigued me because I'm like, okay, well, that's the next step here. So chat's great. Love Chat UX. But if we want to do something even more meaningful, it felt like the next kind of advancement is not this kind of, I'm chatting with some software in a kind of a synchronous back and forth model, is that software is going to do things for me in kind of a multi-step way to try and accomplish some goals. So, yeah, that's when I first got started. It's like, okay, what would that look like? Yeah. And I've been obsessed ever since, by the way.Alessio [00:03:55]: Which goes back to your first experience with it, which is like you're offline. Yeah. And you want to do a task. 
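For readers who want to see the shape of the "few-shot natural language to structured command" idea described above, here is a minimal sketch under invented assumptions: the command schema, the example messages, and all field names are hypothetical, and the actual model call is left out since any chat-completion API that returns text would do.

```ts
// Sketch of converting a free-text message (e.g. a queued email body) into a
// structured JSON command via a few-shot prompt. Schema and examples are invented.
type CrmCommand =
  | { intent: "add_contact"; name: string; company?: string; email?: string }
  | { intent: "log_note"; contact: string; note: string };

const fewShotPrompt = `
Convert the user's message into a JSON command.

Message: "Met Priya Sharma from Acme at the booth, priya@acme.test"
JSON: {"intent":"add_contact","name":"Priya Sharma","company":"Acme","email":"priya@acme.test"}

Message: "Note for Bob: wants a demo next week"
JSON: {"intent":"log_note","contact":"Bob","note":"wants a demo next week"}

Message: "<queued email body goes here>"
JSON:`;

// The model call itself is omitted; this just parses whatever text comes back.
function parseCommand(modelOutput: string): CrmCommand {
  return JSON.parse(modelOutput) as CrmCommand; // in practice, validate before trusting it
}
```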
You don't need to do it right now. You just want to queue it up for somebody to do it for you. Yes. As you think about agents, like, let's start at the easy question, which is like, how do you define an agent? Maybe. You mean the hardest question in the universe? Is that what you mean?Dharmesh [00:04:12]: You said you have an irritating take. I do have an irritating take. I think, well, some number of people have been irritated, including within my own team. So I have a very broad definition for agents, which is it's AI-powered software that accomplishes a goal. Period. That's it. And what irritates people about it is like, well, that's so broad as to be completely non-useful. And I understand that. I understand the criticism. But in my mind, if you kind of fast forward months, I guess, in AI years, the implementation of it, and we're already starting to see this, and we'll talk about this, different kinds of agents, right? So I think in addition to having a usable definition, and I like yours, by the way, and we should talk more about that, that you just came out with, the classification of agents actually is also useful, which is, is it autonomous or non-autonomous? Does it have a deterministic workflow? Does it have a non-deterministic workflow? Is it working synchronously? Is it working asynchronously? Then you have the different kind of interaction modes. Is it a chat agent, kind of like a customer support agent would be? You're having this kind of back and forth. Is it a workflow agent that just does a discrete number of steps? So there's all these different flavors of agents. So if I were to draw it in a Venn diagram, I would draw a big circle that says, this is agents, and then I have a bunch of circles, some overlapping, because they're not mutually exclusive. And so I think that's what's interesting, and we're seeing development along a bunch of different paths, right? So if you look at the first implementation of agent frameworks, you look at BabyAGI and AutoGPT, I think it was, not Autogen, that's the Microsoft one. They were way ahead of their time because they assumed this level of reasoning and execution and planning capability that just did not exist, right? So it was an interesting thought experiment, which is what it was. Even the guy that, I'm an investor in Yohei's fund that did BabyAGI. It wasn't ready, but it was a sign of what was to come. And so the question then is, when is it ready? And so lots of people talk about the state of the art when it comes to agents. I'm a pragmatist, so I think of the state of the practical. It's like, okay, well, what can I actually build that has commercial value or solves actually some discrete problem with some baseline of repeatability or verifiability?swyx [00:06:22]: There was a lot, and very, very interesting. I'm not irritated by it at all. Okay. As you know, I take a... There's a lot of anthropological view or linguistics view. And in linguistics, you don't want to be prescriptive. You want to be descriptive. Yeah. So you're a goals guy. That's the key word in your thing. And other people have other definitions that might involve like delegated trust or non-deterministic work, LLM in the loop, all that stuff. The other thing I was thinking about, just the comment on BabyAGI and AutoGPT. Yeah. In that piece that you just read, I was able to go through our backlog and just kind of track the winter of agents and then the summer now. Yeah. And it's... We can tell the whole story as an oral history, just following that thread.
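A rough encoding of the classification axes just listed, written as a type. The axis names and the example values are illustrative only; this is not a standard taxonomy from the episode or from agent.ai.

```ts
// Illustrative only: one way to encode the "flavors of agents" axes as data.
interface AgentProfile {
  goal: string;                                  // "AI-powered software that accomplishes a goal"
  autonomous: boolean;
  workflow: "deterministic" | "non-deterministic";
  execution: "synchronous" | "asynchronous";
  interaction: "chat" | "workflow" | "background";
}

const supportAgent: AgentProfile = {
  goal: "resolve customer tickets",
  autonomous: false,
  workflow: "non-deterministic",
  execution: "synchronous",
  interaction: "chat",
};
```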
And it's really just like, I think, I tried to explain the why now, right? Like I had, there's better models, of course. There's better tool use with like, they're just more reliable. Yep. Better tools with MCP and all that stuff. And I'm sure you have opinions on that too. Business model shift, which you like a lot. I just heard you talk about RaaS with the MFM guys. Yep. Cost is dropping a lot. Yep. Inference is getting faster. There's more model diversity. Yep. Yep. I think it's a subtle point. It means that like, you have different models with different perspectives. You don't get stuck in the basin of performance of a single model. Sure. You can just get out of it by just switching models. Yep. Multi-agent research and RL fine tuning. So I just wanted to let you respond to like any of that.Dharmesh [00:07:44]: Yeah. A couple of things. Connecting the dots on the kind of the definition side of it. So we'll get the irritation out of the way completely. I have one more, even more irritating leap on the agent definition thing. So here's the way I think about it. By the way, the kind of word agent, I looked it up, like the English dictionary definition. The old school agent, yeah. Is when you have someone or something that does something on your behalf, like a travel agent or a real estate agent acts on your behalf. It's like proxy, which is a nice kind of general definition. So the other direction I'm sort of headed, and it's going to tie back to tool calling and MCP and things like that, is if you, and I'm not a biologist by any stretch of the imagination, but we have these single-celled organisms, right? Like the simplest possible form of what one would call life. But it's still life. It just happens to be single-celled. And then you can combine cells and then cells become specialized over time. And you have much more sophisticated organisms, you know, kind of further down the spectrum. In my mind, at the most fundamental level, you can almost think of having atomic agents. What is the simplest possible thing that's an agent that can still be called an agent? What is the equivalent of a kind of single-celled organism? And the reason I think that's useful is right now we're headed down the road, which I think is very exciting around tool use, right? That says, okay, the LLMs now can be provided a set of tools that it calls to accomplish whatever it needs to accomplish in the kind of furtherance of whatever goal it's trying to get done. And I'm not overly bothered by it, but if you think about it, if you just squint a little bit and say, well, what if everything was an agent? And what if tools were actually just atomic agents? Because then it's turtles all the way down, right? Then it's like, oh, well, all that's really happening with tool use is that we have a network of agents that know about each other through something like MCP and can kind of decompose a particular problem and say, oh, I'm going to delegate this to this set of agents. And why do we need to draw this distinction between tools, which are functions most of the time, and an actual agent? And so I'm going to write this irritating LinkedIn post, you know, proposing this. It's like, okay. And I'm not suggesting we should call even functions, you know, call them agents. But there is a certain amount of elegance that happens when you say, oh, we can just reduce it down to one primitive, which is an agent that you can combine in complicated ways to kind of raise the level of abstraction and accomplish higher order goals.
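To make the "tools are just atomic agents" framing concrete, here is a small hypothetical sketch. The interface, the names, and the fan-out delegation strategy are invented for illustration and are not an MCP or agent.ai API.

```ts
// Illustrative sketch of the "everything is an agent" framing: a tool is an
// atomic agent, and a composite agent delegates to other agents it knows about.
interface Agent {
  name: string;
  describe(): string;                   // how peers discover what it can do
  run(input: string): Promise<string>;  // accomplish (part of) a goal
}

// An "atomic" agent: what we would normally call a tool.
const domainLookup: Agent = {
  name: "domain-lookup",
  describe: () => "Returns whether a domain name appears to be available.",
  run: async (domain) => `${domain}: availability unknown (stub)`,
};

// A composite agent that decomposes a goal by delegating to atomic agents.
function composite(name: string, children: Agent[]): Agent {
  return {
    name,
    describe: () => `Delegates to: ${children.map(c => c.name).join(", ")}`,
    run: async (goal) => {
      const results = await Promise.all(children.map(c => c.run(goal)));
      return results.join("\n"); // a real orchestrator would plan, not fan out blindly
    },
  };
}
```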
Anyway, that's my answer. I'd say that's a success. Thank you for coming to my TED Talk on agent definitions.Alessio [00:09:54]: How do you define the minimum viable agent? Do you already have a definition for, like, where you draw the line between a cell and an atom? Yeah.Dharmesh [00:10:02]: So in my mind, it has to, at some level, use AI in order for it to—otherwise, it's just software. It's like, you know, we don't need another word for that. And so that's probably where I draw the line. So then the question, you know, the counterargument would be, well, if that's true, then lots of tools themselves are actually not agents because they're just doing a database call or a REST API call or whatever it is they're doing. And that does not necessarily qualify them, which is a fair counterargument. And I accept that. It's like a good argument. I still like to think about—because we'll talk about multi-agent systems, because I think—so we've accepted, which I think is true, lots of people have said it, and you've hopefully combined some of those clips of really smart people saying this is the year of agents, and I completely agree, it is the year of agents. But then shortly after that, it's going to be the year of multi-agent systems or multi-agent networks. I think that's where it's going to be headed next year. Yeah.swyx [00:10:54]: Opening eyes already on that. Yeah. My quick philosophical engagement with you on this. I often think about kind of the other spectrum, the other end of the cell spectrum. So single cell is life, multi-cell is life, and you clump a bunch of cells together in a more complex organism, they become organs, like an eye and a liver or whatever. And then obviously we consider ourselves one life form. There's not like a lot of lives within me. I'm just one life. And now, obviously, I don't think people don't really like to anthropomorphize agents and AI. Yeah. But we are extending our consciousness and our brain and our functionality out into machines. I just saw you were a Bee. Yeah. Which is, you know, it's nice. I have a limitless pendant in my pocket.Dharmesh [00:11:37]: I got one of these boys. Yeah.swyx [00:11:39]: I'm testing it all out. You know, got to be early adopters. But like, we want to extend our personal memory into these things so that we can be good at the things that we're good at. And, you know, machines are good at it. Machines are there. So like, my definition of life is kind of like going outside of my own body now. I don't know if you've ever had like reflections on that. Like how yours. How our self is like actually being distributed outside of you. Yeah.Dharmesh [00:12:01]: I don't fancy myself a philosopher. But you went there. So yeah, I did go there. I'm fascinated by kind of graphs and graph theory and networks and have been for a long, long time. And to me, we're sort of all nodes in this kind of larger thing. It just so happens that we're looking at individual kind of life forms as they exist right now. But so the idea is when you put a podcast out there, there's these little kind of nodes you're putting out there of like, you know, conceptual ideas. Once again, you have varying kind of forms of those little nodes that are up there and are connected in varying and sundry ways. And so I just think of myself as being a node in a massive, massive network. And I'm producing more nodes as I put content or ideas. 
And, you know, you spend some portion of your life collecting dots, experiences, people, and some portion of your life then connecting dots from the ones that you've collected over time. And I found that really interesting things happen and you really can't know in advance how those dots are necessarily going to connect in the future. And that's, yeah. So that's my philosophical take. That's the, yes, exactly. Coming back.Alessio [00:13:04]: Yep. Do you like graph as an agent? Abstraction? That's been one of the hot topics with LandGraph and Pydantic and all that.Dharmesh [00:13:11]: I do. The thing I'm more interested in terms of use of graphs, and there's lots of work happening on that now, is graph data stores as an alternative in terms of knowledge stores and knowledge graphs. Yeah. Because, you know, so I've been in software now 30 plus years, right? So it's not 10,000 hours. It's like 100,000 hours that I've spent doing this stuff. And so I've grew up with, so back in the day, you know, I started on mainframes. There was a product called IMS from IBM, which is basically an index database, what we'd call like a key value store today. Then we've had relational databases, right? We have tables and columns and foreign key relationships. We all know that. We have document databases like MongoDB, which is sort of a nested structure keyed by a specific index. We have vector stores, vector embedding database. And graphs are interesting for a couple of reasons. One is, so it's not classically structured in a relational way. When you say structured database, to most people, they're thinking tables and columns and in relational database and set theory and all that. Graphs still have structure, but it's not the tables and columns structure. And you could wonder, and people have made this case, that they are a better representation of knowledge for LLMs and for AI generally than other things. So that's kind of thing number one conceptually, and that might be true, I think is possibly true. And the other thing that I really like about that in the context of, you know, I've been in the context of data stores for RAG is, you know, RAG, you say, oh, I have a million documents, I'm going to build the vector embeddings, I'm going to come back with the top X based on the semantic match, and that's fine. All that's very, very useful. But the reality is something gets lost in the chunking process and the, okay, well, those tend, you know, like, you don't really get the whole picture, so to speak, and maybe not even the right set of dimensions on the kind of broader picture. And it makes intuitive sense to me that if we did capture it properly in a graph form, that maybe that feeding into a RAG pipeline will actually yield better results for some use cases, I don't know, but yeah.Alessio [00:15:03]: And do you feel like at the core of it, there's this difference between imperative and declarative programs? Because if you think about HubSpot, it's like, you know, people and graph kind of goes hand in hand, you know, but I think maybe the software before was more like primary foreign key based relationship, versus now the models can traverse through the graph more easily.Dharmesh [00:15:22]: Yes. So I like that representation. There's something. It's just conceptually elegant about graphs and just from the representation of it, they're much more discoverable, you can kind of see it, there's observability to it, versus kind of embeddings, which you can't really do much with as a human. 
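A toy sketch of the graph-as-knowledge-store idea for RAG discussed here: instead of relying on top-k chunks from a vector match alone, start from a matched node and pull in its neighborhood so related context comes along with it. The schema and the traversal are invented for illustration, not a description of any real product.

```ts
// Tiny graph knowledge store: nodes carry text, edges carry relations.
interface GraphNode { id: string; text: string; }
interface GraphEdge { from: string; to: string; relation: string; }

// Breadth-first collection of node ids within `depth` hops (edges treated as undirected).
function neighborhood(startId: string, edges: GraphEdge[], depth: number): Set<string> {
  const seen = new Set([startId]);
  let frontier = [startId];
  for (let d = 0; d < depth; d++) {
    const next: string[] = [];
    for (const e of edges) {
      for (const [a, b] of [[e.from, e.to], [e.to, e.from]]) {
        if (frontier.includes(a) && !seen.has(b)) { seen.add(b); next.push(b); }
      }
    }
    frontier = next;
  }
  return seen;
}

// Assemble prompt context from the matched node plus its neighbors.
function buildContext(matchedId: string, nodes: GraphNode[], edges: GraphEdge[]): string {
  const ids = neighborhood(matchedId, edges, 2);
  return nodes.filter(n => ids.has(n.id)).map(n => n.text).join("\n---\n");
}
```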
You know, once they're in there, you can't pull stuff back out. But yeah, I like that kind of idea of it. And the other thing that's kind of, because I love graphs, I've been long obsessed with PageRank from back in the early days. And, you know, one of the kind of simplest algorithms in terms of coming up, you know, with a phone, everyone's been exposed to PageRank. And the idea is that, and so I had this other idea for a project, not a company, and I have hundreds of these, called NodeRank, is to be able to take the idea of PageRank and apply it to an arbitrary graph that says, okay, I'm going to define what authority looks like and say, okay, well, that's interesting to me, because then if you say, I'm going to take my knowledge store, and maybe this person that contributed some number of chunks to the graph data store has more authority on this particular use case or prompt that's being submitted than this other one that may, or maybe this one was more. popular, or maybe this one has, whatever it is, there should be a way for us to kind of rank nodes in a graph and sort them in some, some useful way. Yeah.swyx [00:16:34]: So I think that's generally useful for, for anything. I think the, the problem, like, so even though at my conferences, GraphRag is super popular and people are getting knowledge, graph religion, and I will say like, it's getting space, getting traction in two areas, conversation memory, and then also just rag in general, like the, the, the document data. Yeah. It's like a source. Most ML practitioners would say that knowledge graph is kind of like a dirty word. The graph database, people get graph religion, everything's a graph, and then they, they go really hard into it and then they get a, they get a graph that is too complex to navigate. Yes. And so like the, the, the simple way to put it is like you at running HubSpot, you know, the power of graphs, the way that Google has pitched them for many years, but I don't suspect that HubSpot itself uses a knowledge graph. No. Yeah.Dharmesh [00:17:26]: So when is it over engineering? Basically? It's a great question. I don't know. So the question now, like in AI land, right, is the, do we necessarily need to understand? So right now, LLMs for, for the most part are somewhat black boxes, right? We sort of understand how the, you know, the algorithm itself works, but we really don't know what's going on in there and, and how things come out. So if a graph data store is able to produce the outcomes we want, it's like, here's a set of queries I want to be able to submit and then it comes out with useful content. Maybe the underlying data store is as opaque as a vector embeddings or something like that, but maybe it's fine. Maybe we don't necessarily need to understand it to get utility out of it. And so maybe if it's messy, that's okay. Um, that's, it's just another form of lossy compression. Uh, it's just lossy in a way that we just don't completely understand in terms of, because it's going to grow organically. Uh, and it's not structured. It's like, ah, we're just gonna throw a bunch of stuff in there. Let the, the equivalent of the embedding algorithm, whatever they called in graph land. Um, so the one with the best results wins. I think so. Yeah.swyx [00:18:26]: Or is this the practical side of me is like, yeah, it's, if it's useful, we don't necessarilyDharmesh [00:18:30]: need to understand it.swyx [00:18:30]: I have, I mean, I'm happy to push back as long as you want. 
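The NodeRank idea described above is essentially PageRank applied to an arbitrary graph. Here is a textbook power-iteration sketch with the usual damping factor; the example edges and the simplified handling of dangling nodes are my own, not anything from the episode.

```ts
// Toy power-iteration PageRank over an adjacency list, in the spirit of "NodeRank":
// rank nodes in a knowledge graph by a chosen notion of authority.
function pageRank(
  graph: Record<string, string[]>,
  iterations = 20,
  damping = 0.85
): Record<string, number> {
  const nodes = Object.keys(graph);
  let rank: Record<string, number> = Object.fromEntries(
    nodes.map(n => [n, 1 / nodes.length] as [string, number])
  );

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = Object.fromEntries(
      nodes.map(n => [n, (1 - damping) / nodes.length] as [string, number])
    );
    for (const n of nodes) {
      const out = graph[n];
      if (out.length === 0) continue; // dangling nodes simply leak rank in this toy version
      for (const target of out) {
        next[target] += damping * (rank[n] / out.length);
      }
    }
    rank = next;
  }
  return rank;
}

// Example: chunks in a knowledge store linked by "cites" / "derived from" edges.
const ranks = pageRank({ a: ["b"], b: ["c"], c: ["a", "b"] });
console.log(ranks);
```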
Uh, it's not practical to evaluate like the 10 different options out there because it takes time. It takes people, it takes, you know, resources, right? Set. That's the first thing. Second thing is your evals are typically on small things and some things only work at scale. Yup. Like graphs. Yup.Dharmesh [00:18:46]: Yup. That's, yeah, no, that's fair. And I think this is one of the challenges in terms of implementation of graph databases is that the most common approach that I've seen developers do, I've done it myself, is that, oh, I've got a Postgres database or a MySQL or whatever. I can represent a graph with a very set of tables with a parent child thing or whatever. And that sort of gives me the ability, uh, why would I need anything more than that? And the answer is, well, if you don't need anything more than that, you don't need anything more than that. But there's a high chance that you're sort of missing out on the actual value that, uh, the graph representation gives you. Which is the ability to traverse the graph, uh, efficiently in ways that kind of going through the, uh, traversal in a relational database form, even though structurally you have the data, practically you're not gonna be able to pull it out in, in useful ways. Uh, so you wouldn't like represent a social graph, uh, in, in using that kind of relational table model. It just wouldn't scale. It wouldn't work.swyx [00:19:36]: Uh, yeah. Uh, I think we want to move on to MCP. Yeah. But I just want to, like, just engineering advice. Yeah. Uh, obviously you've, you've, you've run, uh, you've, you've had to do a lot of projects and run a lot of teams. Do you have a general rule for over-engineering or, you know, engineering ahead of time? You know, like, because people, we know premature engineering is the root of all evil. Yep. But also sometimes you just have to. Yep. When do you do it? Yes.Dharmesh [00:19:59]: It's a great question. This is, uh, a question as old as time almost, which is what's the right and wrong levels of abstraction. That's effectively what, uh, we're answering when we're trying to do engineering. I tend to be a pragmatist, right? So here's the thing. Um, lots of times doing something the right way. Yeah. It's like a marginal increased cost in those cases. Just do it the right way. And this is what makes a, uh, a great engineer or a good engineer better than, uh, a not so great one. It's like, okay, all things being equal. If it's going to take you, you know, roughly close to constant time anyway, might as well do it the right way. Like, so do things well, then the question is, okay, well, am I building a framework as the reusable library? To what degree, uh, what am I anticipating in terms of what's going to need to change in this thing? Uh, you know, along what dimension? And then I think like a business person in some ways, like what's the return on calories, right? So, uh, and you look at, um, energy, the expected value of it's like, okay, here are the five possible things that could happen, uh, try to assign probabilities like, okay, well, if there's a 50% chance that we're going to go down this particular path at some day, like, or one of these five things is going to happen and it costs you 10% more to engineer for that. It's basically, it's something that yields a kind of interest compounding value. Um, as you get closer to the time of, of needing that versus having to take on debt, which is when you under engineer it, you're taking on debt. 
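A toy version of the "return on calories" calculus described here: compare paying a small cost up front against paying rework "interest" later, which only comes due if the anticipated need actually transpires. All numbers are made up.

```ts
// Compare engineering for a future need now vs. taking on tech debt and paying later.
interface Option {
  upfrontExtra: number;     // extra cost (e.g. weeks) to engineer for it today
  probNeeded: number;       // probability the anticipated need ever transpires
  reworkCostLater: number;  // cost to retrofit it later if it does
}

function expectedCost(o: Option): { engineerNow: number; underEngineer: number } {
  return {
    engineerNow: o.upfrontExtra,                      // paid regardless of the future
    underEngineer: o.probNeeded * o.reworkCostLater,  // "interest" paid only if the need shows up
  };
}

// 50% chance the abstraction is ever needed; 0.5 extra weeks now vs. 3 weeks of rework later.
console.log(expectedCost({ upfrontExtra: 0.5, probNeeded: 0.5, reworkCostLater: 3 }));
// -> { engineerNow: 0.5, underEngineer: 1.5 }; at probNeeded = 0.1 under-engineering would win.
```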
You're going to have to pay off when you do get to that eventuality where something happens. One thing as a pragmatist, uh, so I would rather under engineer something than over engineer it. If I were going to err on the side of something, and here's the reason is that when you under engineer it, uh, yes, you take on tech debt, uh, but the interest rate is relatively known and payoff is very, very possible, right? Which is, oh, I took a shortcut here as a result of which now this thing that should have taken me a week is now going to take me four weeks. Fine. But if that particular thing that you thought might happen, never actually, you never have that use case transpire or just doesn't, it's like, well, you just save yourself time, right? And that has value because you were able to do other things instead of, uh, kind of slightly over-engineering it away, over-engineering it. But there's no perfect answers in art form in terms of, uh, and yeah, we'll, we'll bring kind of this layers of abstraction back on the code generation conversation, which we'll, uh, I think I have later on, butAlessio [00:22:05]: I was going to ask, we can just jump ahead quickly. Yeah. Like, as you think about vibe coding and all that, how does the. Yeah. Percentage of potential usefulness change when I feel like we over-engineering a lot of times it's like the investment in syntax, it's less about the investment in like arc exacting. Yep. Yeah. How does that change your calculus?Dharmesh [00:22:22]: A couple of things, right? One is, um, so, you know, going back to that kind of ROI or a return on calories, kind of calculus or heuristic you think through, it's like, okay, well, what is it going to cost me to put this layer of abstraction above the code that I'm writing now, uh, in anticipating kind of future needs. If the cost of fixing, uh, or doing under engineering right now. Uh, we'll trend towards zero that says, okay, well, I don't have to get it right right now because even if I get it wrong, I'll run the thing for six hours instead of 60 minutes or whatever. It doesn't really matter, right? Like, because that's going to trend towards zero to be able, the ability to refactor a code. Um, and because we're going to not that long from now, we're going to have, you know, large code bases be able to exist, uh, you know, as, as context, uh, for a code generation or a code refactoring, uh, model. So I think it's going to make it, uh, make the case for under engineering, uh, even stronger. Which is why I take on that cost. You just pay the interest when you get there, it's not, um, just go on with your life vibe coded and, uh, come back when you need to. Yeah.Alessio [00:23:18]: Sometimes I feel like there's no decision-making in some things like, uh, today I built a autosave for like our internal notes platform and I literally just ask them cursor. Can you add autosave? Yeah. I don't know if it's over under engineer. Yep. I just vibe coded it. Yep. And I feel like at some point we're going to get to the point where the models kindDharmesh [00:23:36]: of decide where the right line is, but this is where the, like the, in my mind, the danger is, right? So there's two sides to this. One is the cost of kind of development and coding and things like that stuff that, you know, we talk about. 
But then like in your example, you know, one of the risks that we have is that because adding a feature, uh, like a save or whatever the feature might be to a product, as that price tends towards zero, are we going to be less discriminant about what features we add, as a result making products more complicated, which has a negative impact on the user and a negative impact on the business. Um, and so that's the thing I worry about if it starts to become too easy, are we going to be too promiscuous in our, uh, kind of extension, adding product extensions and things like that. It's like, ah, why not add X, Y, Z or whatever. Back then it was like, oh, we only have so many engineering hours or story points or however you measure things. Uh, that at least kept us in check a little bit. Yeah.Alessio [00:24:22]: And then over engineering, you're like, yeah, it's kind of like you're putting that on yourself. Yeah. Like now it's like the models don't understand that if they add too much complexity, it's going to come back to bite them later. Yep. So they just do whatever they want to do. Yeah. And I'm curious where in the workflow that's going to be, where it's like, Hey, this is like the amount of complexity and over-engineering you can do before you got to ask me if we should actually do it versus like do something else.Dharmesh [00:24:45]: So you know, we've already, let's like, we're leaving this, uh, in the code generation world, this kind of compressed, um, cycle time. Right. It's like, okay, we went from auto-complete, uh, in GitHub Copilot to like, oh, finish this particular thing and hit tab, to a, oh, I sort of know your file or whatever, I can write out a full function for you, to now I can like hold a bunch of the context in my head. Uh, so we can do app generation, which we have now with Lovable and Bolt and Replit Agent. Yeah. And other things. So then the question is, okay, well, where does it naturally go from here? So we're going to generate products. Makes sense. We might be able to generate platforms, as in, I want a platform for ERP that does this, whatever. And that includes the APIs, includes the product and the UI, and all the things that make for a platform. There's nothing that says we would stop there, like, okay, can you generate an entire software company someday? Right. Uh, with the platform and the monetization and the go-to-market and the whatever. And you know, that that's interesting to me in terms of, uh, you know, what, when you take it to almost ludicrous levels of abstraction.swyx [00:25:39]: It's like, okay, turn it to 11. You mentioned vibe coding, so I have to, this is a blog post I haven't written, but I'm kind of exploring it. Is the junior engineer dead?Dharmesh [00:25:49]: I don't think so. I think what will happen is that the junior engineer will be able to, if all they're bringing to the table is the fact that they are a junior engineer, then yes, they're likely dead. But hopefully if they can communicate with carbon-based life forms, they can interact with product, if they're willing to talk to customers, they can take their kind of basic understanding of engineering and how kind of software works. I think that has value. So I have a 14-year-old right now who's taking a Python programming class, and some people ask me, it's like, why is he learning coding? And my answer is, it's because it's not about the syntax, it's not about the coding. What he's learning is like the fundamental thing of how things work.
And there's value in that. I think there's going to be timeless value in systems thinking and abstractions and what that means. And whether functions manifested as math, which he's going to get exposed to regardless, or there are some core primitives to the universe, I think, that the more you understand them, those are what I would kind of think of as like really large dots in your life that will have a higher gravitational pull and value to them that you'll then be able to. So I want him to collect those dots, and he's not resisting. So it's like, okay, while he's still listening to me, I'm going to have him do things that I think will be useful.swyx [00:26:59]: You know, part of one of the pitches that I evaluated for AI engineer is a term. And the term is that maybe the traditional interview path or career path of software engineer goes away, which is because what's the point of LeetCode? Yeah. And, you know, it actually matters more that you know how to work with AI and to implement the things that you want. Yep.Dharmesh [00:27:16]: That's one of the like interesting things that's happened with generative AI. You know, you go from machine learning and the models and just that underlying form, which is like true engineering, right? Like the actual, what I call real engineering. I don't think of myself as a real engineer, actually. I'm a developer. But now with generative AI. We call it AI and it's obviously got its roots in machine learning, but it just feels like fundamentally different to me. Like you have the vibe. It's like, okay, well, this is just a whole different approach to software development, to so many different things. And so I'm wondering now, it's like an AI engineer is like, if you were like to draw the Venn diagram, it's interesting because the cross between like AI things, generative AI and what the tools are capable of, what the models do, and this whole new kind of body of knowledge that we're still building out, it's still very young, intersected with kind of classic engineering, software engineering. Yeah.swyx [00:28:04]: I just described the overlap as it separates out eventually until it's its own thing, but it's starting out as software. Yeah.Alessio [00:28:11]: That makes sense. So to close the vibe coding loop, the other big hype now is MCPs. Obviously, I would say Claude Desktop and Cursor are like the two main drivers of MCP usage. I would say my favorite is the Sentry MCP. I can pull in errors and then you can just put the context in Cursor. How do you think about that abstraction layer? Does it feel... Does it feel almost too magical in a way? Do you think it's like you get enough? Because you don't really see how the server itself is then kind of like repackaging theDharmesh [00:28:41]: information for you? I think MCP as a standard is one of the better things that's happened in the world of AI because a standard needed to exist and absent a standard, there was a set of things that just weren't possible. Now, we can argue whether it's the best possible manifestation of a standard or not. Does it do too much? Does it do too little? I get that, but it's just simple enough to both be useful and unobtrusive. It's understandable and adoptable by mere mortals, right? It's not overly complicated. You know, a reasonable engineer can stand up an MCP server relatively easily. The thing that has me excited about it is like, so I'm a big believer in multi-agent systems. And so that's going back to our kind of this idea of an atomic agent.
So imagine the MCP server, like obviously it calls tools, but the way I think about it, so my current passion project that I'm working on is agent.ai. And we'll talk more about that in a little bit. More about the, I think we should, because I think it's interesting not to promote the project at all, but there's some interesting ideas in there. One of which is around, we're going to need a mechanism for, if agents are going to collaborate and be able to delegate, there's going to need to be some form of discovery and we're going to need some standard way. It's like, okay, well, I just need to know what this thing over here is capable of. We're going to need a registry, which Anthropic's working on. I'm sure others will and have been doing directories of, and there's going to be a standard around that too. How do you build out a directory of MCP servers? I think that's going to unlock so many things just because, and we're already starting to see it. So I think MCP or something like it is going to be the next major unlock because it allows systems that don't know about each other, don't need to, it's that kind of decoupling of like Sentry and whatever tools someone else was building. And it's not just about, you know, Claude Desktop or things like, even on the client side, I think we're going to see very interesting consumers of MCP, MCP clients versus just the chatbot-y kind of things. Like, you know, Claude Desktop and Cursor and things like that. But yeah, I'm very excited about MCP in that general direction.swyx [00:30:39]: I think the typical cynical developer take, it's like, we have OpenAPI. Yeah. What's the new thing? I don't know if you have a, do you have a quick MCP versus everything else? Yeah.Dharmesh [00:30:49]: So it's, so I like OpenAPI, right? So just a descriptive thing. It's OpenAPI. OpenAPI. Yes, that's what I meant. So it's basically a self-documenting thing. We can do machine-generated, lots of things from that output. It's a structured definition of an API. I get that, love it. But MCPs sort of are kind of use case specific. They're perfect for exactly what we're trying to use them for around LLMs in terms of discovery. It's like, okay, I don't necessarily need to know kind of all this detail. And so right now we have, we'll talk more about like MCP server implementations, but We will? I think, I don't know. Maybe we won't. At least it's in my head. It's like a back processor. But I do think MCP adds value above OpenAPI. It's, yeah, just because it solves this particular thing. And if we had come to the world, which we have, like, it's like, hey, we already have OpenAPI. It's like, if that were good enough for the universe, the universe would have adopted it already. There's a reason why MCP is taking off, because it marginally adds something that was missing before and doesn't go too far. And so that's why the kind of rate of adoption, you folks have written about this and talked about it. Yeah, why MCP won. Yeah. And it won because the universe decided that this was useful and maybe it gets supplanted by something else. Yeah. And maybe we discover, oh, maybe OpenAPI was good enough the whole time. I doubt that.swyx [00:32:09]: The meta lesson, this is, I mean, he's an investor in DevTools companies. I work in developer experience and DevRel in DevTools companies. Yep. Everyone wants to own the standard. Yeah. I'm sure you guys have tried to launch your own standards. Actually, is HubSpot known for a standard? You know, obviously inbound marketing.
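For reference, this is roughly the kind of information an MCP-style tool listing carries: a name, a human-readable description, and a JSON Schema for inputs. The field names here are approximate and simplified; consult the MCP specification rather than this sketch.

```ts
// Rough sketch of an MCP-style tool descriptor; field names are approximate.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the tool's arguments
}

const listToolsResponse: { tools: ToolDescriptor[] } = {
  tools: [
    {
      name: "lookup_error",
      description: "Fetch details for a recent error by id.",
      inputSchema: {
        type: "object",
        properties: { errorId: { type: "string" } },
        required: ["errorId"],
      },
    },
  ],
};
// An LLM client reads these descriptors to decide which tool to call and with what JSON arguments.
```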
But is there a standard or protocol that you ever tried to push? No.Dharmesh [00:32:30]: And there's a reason for this. Yeah. Is that? And I don't mean, need to mean, speak for the people of HubSpot, but I personally. You kind of do. I'm not smart enough. That's not the, like, I think I have a. You're smart. Not enough for that. I'm much better off understanding the standards that are out there. And I'm more on the composability side. Let's, like, take the pieces of technology that exist out there, combine them in creative, unique ways. And I like to consume standards. I don't like to, and that's not that I don't like to create them. I just don't think I have the, both the raw wattage or the credibility. It's like, okay, well, who the heck is Dharmesh, and why should we adopt a standard he created?swyx [00:33:07]: Yeah, I mean, there are people who don't monetize standards, like OpenTelemetry is a big standard, and LightStep never capitalized on that.Dharmesh [00:33:15]: So, okay, so if I were to do a standard, there's two things that have been in my head in the past. One was around, a very, very basic one around, I don't even have the domain, I have a domain for everything, for open marketing. Because of the issue we had: HubSpot grew up in the marketing space. There we go. There was no standard around data formats and things like that. It doesn't go anywhere. But the other one, and I did not mean to go here, but I'm going to go here. It's called OpenGraph. I know the term was already taken, but it hasn't been used for like 15 years now for its original purpose. But what I think should exist in the world is right now, our information, all of us, nodes are in the social graph at Meta or the professional graph at LinkedIn. Both of which are actually relatively closed in actually very annoying ways. Like very, very closed, right? Especially LinkedIn. Especially LinkedIn. I personally believe that if it's my data, and if I would get utility out of it being open, I should be able to make my data open or publish it in whatever forms that I choose, as long as I have control over it as opt-in. So the idea is around OpenGraph that says, here's a standard, here's a way to publish it. I should be able to go to OpenGraph.org slash Dharmesh dot JSON and get it back. And it's like, here's your stuff, right? And I can choose along the way and people can write to it and I can approve. And there can be an entire system. And if I were to do that, I would do it as a... Like a public benefit, non-profit-y kind of thing, as this is a contribution to society. I wouldn't try to commercialize that. Have you looked at ATProto? What's that? ATProto.swyx [00:34:43]: It's the protocol behind Bluesky. Okay. My good friend, Dan Abramov, who was the face of React for many, many years, now works there. And he actually did a talk that I can send you, which basically kind of tries to articulate what you just said. But he does, he loves doing these like really great analogies, which I think you'll like. Like, you know, a lot of our data is behind a handle, behind a domain. Yep. So he's like, all right, what if we flip that? What if it was like our handle and then the domain? Yep. So, and that's really like your data should belong to you. Yep. And I should not have to wait 30 days for my Twitter data to export. Yep.Dharmesh [00:35:19]: You should be able to at least be able to automate it or do like, yes, I should be able to plug it into an agentic thing. Yeah. Yes. I think we're... Because so much of our data is...
Locked up. I think the trick here isn't that standard. It is getting the normies to care.swyx [00:35:37]: Yeah. Because normies don't care.Dharmesh [00:35:38]: That's true. But building on that, normies don't care. So, you know, privacy is a really hot topic and an easy word to use, but it's not a binary thing. Like there are use cases where, and we make these choices all the time, that I will trade, not all privacy, but I will trade some privacy for some productivity gain or some benefit to me that says, oh, I don't care about that particular data being online if it gives me this in return, or I don't mind sharing this information with this company.Alessio [00:36:02]: If I'm getting, you know, this in return, but that sort of should be my option. I think now with computer use, you can actually automate some of the exports. Yes. Like something we've been doing internally is like everybody exports their LinkedIn connections. Yep. And then internally, we kind of merge them together to see how we can connect our companies to customers or things like that.Dharmesh [00:36:21]: And not to pick on LinkedIn, but since we're talking about it, but they feel strongly enough on the, you know, do not take LinkedIn data that they will block even browser use kind of things or whatever. They go to great, great lengths, even to see patterns of usage. And it says, oh, there's no way you could have, you know, gotten that particular thing or whatever without, and it's, so it's, there's...swyx [00:36:42]: Wasn't there a Supreme Court case that they lost? Yeah.Dharmesh [00:36:45]: So the one they lost was around someone that was scraping public data that was on the public internet. And that particular company had not signed any terms of service or whatever. It's like, oh, I'm just taking data that's on, there was no, and so that's why they won. But now, you know, the question is around, can LinkedIn... I think they can. Like, when you use, as a user, you use LinkedIn, you are signing up for their terms of service. And if they say, well, this kind of use of your LinkedIn account that violates our terms of service, they can shut your account down, right? They can. And they, yeah, so, you know, we don't need to make this a discussion. By the way, I love the company, don't get me wrong. I'm an avid user of the product. You know, I've got... Yeah, I mean, you've got over a million followers on LinkedIn, I think. Yeah, I do. And I've known people there for a long, long time, right? And I have lots of respect. And I understand even where the mindset originally came from of this kind of members-first approach to, you know, a privacy-first. I sort of get that. But sometimes you sort of have to wonder, it's like, okay, well, that was 15, 20 years ago. There's likely some controlled ways to expose some data on some member's behalf and not just completely be a binary. It's like, no, thou shalt not have the data.swyx [00:37:54]: Well, just pay for sales navigator.Alessio [00:37:57]: Before we move to the next layer of instruction, anything else on MCP you mentioned? Let's move back and then I'll tie it back to MCPs.Dharmesh [00:38:05]: So I think the... Open this with agent. Okay, so I'll start with... Here's my kind of running thesis, is that as AI and agents evolve, which they're doing very, very quickly, we're going to look at them more and more. I don't like to anthropomorphize. We'll talk about why this is not that. Less as just like raw tools and more like teammates. They'll still be software. 
They should self-disclose as being software. I'm totally cool with that. But I think what's going to happen is that in the same way you might collaborate with a team member on Slack or Teams or whatever you use, you can imagine a series of agents that do specific things just like a team member might do, that you can delegate things to. You can collaborate. You can say, hey, can you take a look at this? Can you proofread that? Can you try this? You can... Whatever it happens to be. So I think it is... I will go so far as to say it's inevitable that we're going to have hybrid teams someday. And what I mean by hybrid teams... So back in the day, hybrid teams were, oh, well, you have some full-time employees and some contractors. Then it was like hybrid teams are some people that are in the office and some that are remote. That's the kind of form of hybrid. The next form of hybrid is like the carbon-based life forms and agents and AI and some form of software. So let's say we temporarily stipulate that I'm right about that over some time horizon that eventually we're going to have these kind of digitally hybrid teams. So if that's true, then the question you sort of ask yourself is that then what needs to exist in order for us to get the full value of that new model? It's like, okay, well... You sort of need to... It's like, okay, well, how do I... If I'm building a digital team, like, how do I... Just in the same way, if I'm interviewing for an engineer or a designer or a PM, whatever, it's like, well, that's why we have professional networks, right? It's like, oh, they have a presence on likely LinkedIn. I can go through that semi-structured, structured form, and I can see the experience of whatever, you know, self-disclosed. But, okay, well, agents are going to need that someday. And so I'm like, okay, well, this seems like a thread that's worth pulling on. That says, okay. So I... So agent.ai is out there. And it's LinkedIn for agents. It's LinkedIn for agents. It's a professional network for agents. And the more I pull on that thread, it's like, okay, well, if that's true, like, what happens, right? It's like, oh, well, they have a profile just like anyone else, just like a human would. It's going to be a graph underneath, just like a professional network would be. It's just that... And you can have its, you know, connections and follows, and agents should be able to post. That's maybe how they do release notes. Like, oh, I have this new version. Whatever they decide to post, it should just be able to... Behave as a node on the network of a professional network. As it turns out, the more I think about that and pull on that thread, the more and more things, like, start to make sense to me. So it may be more than just a pure professional network. So my original thought was, okay, well, it's a professional network and agents as they exist out there, which I think there's going to be more and more of, will kind of exist on this network and have the profile. But then, and this is always dangerous, I'm like, okay, I want to see a world where thousands of agents are out there in order for the... Because those digital employees, the digital workers don't exist yet in any meaningful way. And so then I'm like, oh, can I make that easier for, like... And so I have, as one does, it's like, oh, I'll build a low-code platform for building agents. How hard could that be, right? Like, very hard, as it turns out. But it's been fun. So now, agent.ai has 1.3 million users. 
3,000 people have actually, you know, built some variation of an agent, sometimes just for their own personal productivity. About 1,000 of which have been published. And the reason this comes back to MCP for me, so imagine that and other networks, since I know agent.ai. So right now, we have an MCP server for agent.ai that exposes all the internally built agents that we have that do, like, super useful things. Like, you know, I have access to a Twitter API that I can subsidize the cost. And I can say, you know, if you're looking to build something for social media, these kinds of things, with a single API key, and it's all completely free right now, I'm funding it. That's a useful way for it to work. And then we have a developer to say, oh, I have this idea. I don't have to worry about open AI. I don't have to worry about, now, you know, this particular model is better. It has access to all the models with one key. And we proxy it kind of behind the scenes. And then expose it. So then we get this kind of community effect, right? That says, oh, well, someone else may have built an agent to do X. Like, I have an agent right now that I built for myself to do domain valuation for website domains because I'm obsessed with domains, right? And, like, there's no efficient market for domains. There's no Zillow for domains right now that tells you, oh, here are what houses in your neighborhood sold for. It's like, well, why doesn't that exist? We should be able to solve that problem. And, yes, you're still guessing. Fine. There should be some simple heuristic. So I built that. It's like, okay, well, let me go look for past transactions. You say, okay, I'm going to type in agent.ai, agent.com, whatever domain. What's it actually worth? I'm looking at buying it. It can go and say, oh, which is what it does. It's like, I'm going to go look at are there any published domain transactions recently that are similar, either use the same word, same top-level domain, whatever it is. And it comes back with an approximate value, and it comes back with its kind of rationale for why it picked the value and comparable transactions. Oh, by the way, this domain sold for published. Okay. So that agent now, let's say, existed on the web, on agent.ai. Then imagine someone else says, oh, you know, I want to build a brand-building agent for startups and entrepreneurs to come up with names for their startup. Like a common problem, every startup is like, ah, I don't know what to call it. And so they type in five random words that kind of define whatever their startup is. And you can do all manner of things, one of which is like, oh, well, I need to find the domain for it. What are possible choices? Now it's like, okay, well, it would be nice to know if there's an aftermarket price for it, if it's listed for sale. Awesome. Then imagine calling this valuation agent. It's like, okay, well, I want to find where the arbitrage is, where the agent valuation tool says this thing is worth $25,000. It's listed on GoDaddy for $5,000. It's close enough. Let's go do that. Right? And that's a kind of composition use case that in my future state. Thousands of agents on the network, all discoverable through something like MCP. And then you as a developer of agents have access to all these kind of Lego building blocks based on what you're trying to solve. Then you blend in orchestration, which is getting better and better with the reasoning models now. Just describe the problem that you have. 
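The domain-valuation agent Dharmesh describes is essentially a comparables heuristic: find past sales that share a keyword or top-level domain with the target and summarize their prices. A rough sketch of that idea in TypeScript; the data shape, matching rule, and prices are invented for illustration, not how agent.ai actually implements it:

```ts
// Hypothetical comparables-based domain valuation heuristic.
interface DomainSale {
  domain: string; // e.g. "agents.com"
  price: number;  // sale price in USD
}

function estimateDomainValue(domain: string, pastSales: DomainSale[]): number | null {
  const [name, tld] = domain.toLowerCase().split(/\.(.+)/); // "agent.ai" -> ["agent", "ai"]

  // A "comparable" here shares the keyword or the TLD with the target domain.
  const comparables = pastSales.filter((sale) => {
    const [saleName, saleTld] = sale.domain.toLowerCase().split(/\.(.+)/);
    return saleName.includes(name) || name.includes(saleName) || saleTld === tld;
  });
  if (comparables.length === 0) return null; // no signal; still guessing

  // Use the median of comparable sales as the estimate.
  const prices = comparables.map((c) => c.price).sort((a, b) => a - b);
  const mid = Math.floor(prices.length / 2);
  return prices.length % 2 ? prices[mid] : (prices[mid - 1] + prices[mid]) / 2;
}

// Usage with made-up transactions; the real agent would also return its rationale and comps.
const estimate = estimateDomainValue("agent.ai", [
  { domain: "agents.com", price: 40000 },
  { domain: "bot.ai", price: 18000 },
  { domain: "travel.io", price: 9000 },
]);
console.log(estimate);
```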
Now, the next layer that we're all contending with is that how many tools can you actually give an LLM before the LLM breaks? That number used to be like 15 or 20 before you kind of started to vary dramatically. And so that's the thing I'm thinking about now. It's like, okay, if I want to... If I want to expose 1,000 of these agents to a given LLM, obviously I can't give it all 1,000. Is there some intermediate layer that says, based on your prompt, I'm going to make a best guess at which agents might be able to be helpful for this particular thing? Yeah.Alessio [00:44:37]: Yeah, like RAG for tools. Yep. I did build the Latent Space Researcher on agent.ai. Okay. Nice. Yeah, that seems like, you know, then there's going to be a Latent Space Scheduler. And then once I schedule a research, you know, and you build all of these things. By the way, my apologies for the user experience. You realize I'm an engineer. It's pretty good.swyx [00:44:56]: I think it's a normie-friendly thing. Yeah. That's your magic. HubSpot does the same thing.Alessio [00:45:01]: Yeah, just to like quickly run through it. You can basically create all these different steps. And these steps are like, you know, static versus like variable-driven things. How did you decide between this kind of like low-code-ish versus doing, you know, low-code with code backend versus like not exposing that at all? Any fun design decisions? Yeah. And this is, I think...Dharmesh [00:45:22]: I think lots of people are likely sitting in exactly my position right now, coming through the choosing between deterministic. Like if you're like in a business or building, you know, some sort of agentic thing, do you decide to do a deterministic thing? Or do you go non-deterministic and just let the alum handle it, right, with the reasoning models? The original idea and the reason I took the low-code stepwise, a very deterministic approach. A, the reasoning models did not exist at that time. That's thing number one. Thing number two is if you can get... If you know in your head... If you know in your head what the actual steps are to accomplish whatever goal, why would you leave that to chance? There's no upside. There's literally no upside. Just tell me, like, what steps do you need executed? So right now what I'm playing with... So one thing we haven't talked about yet, and people don't talk about UI and agents. Right now, the primary interaction model... Or they don't talk enough about it. I know some people have. But it's like, okay, so we're used to the chatbot back and forth. Fine. I get that. But I think we're going to move to a blend of... Some of those things are going to be synchronous as they are now. But some are going to be... Some are going to be async. It's just going to put it in a queue, just like... And this goes back to my... Man, I talk fast. But I have this... I only have one other speed. It's even faster. So imagine it's like if you're working... So back to my, oh, we're going to have these hybrid digital teams. Like, you would not go to a co-worker and say, I'm going to ask you to do this thing, and then sit there and wait for them to go do it. Like, that's not how the world works. So it's nice to be able to just, like, hand something off to someone. It's like, okay, well, maybe I expect a response in an hour or a day or something like that.Dharmesh [00:46:52]: In terms of when things need to happen. So the UI around agents. 
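Alessio's "RAG for tools" framing from the exchange above, i.e. don't hand the LLM all 1,000 agents, pre-filter by relevance to the prompt, can be sketched like this. The embeddings are assumed to be precomputed elsewhere; only the selection step is shown:

```ts
// Rough sketch: pick the few tools/agents worth exposing to the model for a given prompt.
interface ToolEntry {
  name: string;
  description: string;
  embedding: number[]; // precomputed embedding of the description (assumed available)
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function selectTools(promptEmbedding: number[], registry: ToolEntry[], limit = 15): ToolEntry[] {
  return registry
    .map((tool) => ({ tool, score: cosine(promptEmbedding, tool.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit) // stay under the 15-20 tools where quality starts to degrade
    .map(({ tool }) => tool);
}
```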
So if you look at the output of agent.ai agents right now, they are the simplest possible manifestation of a UI, right? That says, oh, we have inputs of, like, four different types. Like, we've got a dropdown, we've got multi-select, all the things. It's like back in HTML, the original HTML 1.0 days, right? Like, you're the smallest possible set of primitives for a UI. And it just says, okay, because we need to collect some information from the user, and then we go do steps and do things. And generate some output in HTML or markup are the two primary examples. So the thing I've been asking myself, if I keep going down that path. So people ask me, I get requests all the time. It's like, oh, can you make the UI sort of boring? I need to be able to do this, right? And if I keep pulling on that, it's like, okay, well, now I've built an entire UI builder thing. Where does this end? And so I think the right answer, and this is what I'm going to be backcoding once I get done here, is around injecting a code generation UI generation into, the agent.ai flow, right? As a builder, you're like, okay, I'm going to describe the thing that I want, much like you would do in a vibe coding world. But instead of generating the entire app, it's going to generate the UI that exists at some point in either that deterministic flow or something like that. It says, oh, here's the thing I'm trying to do. Go generate the UI for me. And I can go through some iterations. And what I think of it as a, so it's like, I'm going to generate the code, generate the code, tweak it, go through this kind of prompt style, like we do with vibe coding now. And at some point, I'm going to be happy with it. And I'm going to hit save. And that's going to become the action in that particular step. It's like a caching of the generated code that I can then, like incur any inference time costs. It's just the actual code at that point.Alessio [00:48:29]: Yeah, I invested in a company called E2B, which does code sandbox. And they powered the LM arena web arena. So it's basically the, just like you do LMS, like text to text, they do the same for like UI generation. So if you're asking a model, how do you do it? But yeah, I think that's kind of where.Dharmesh [00:48:45]: That's the thing I'm really fascinated by. So the early LLM, you know, we're understandably, but laughably bad at simple arithmetic, right? That's the thing like my wife, Normies would ask us, like, you call this AI, like it can't, my son would be like, it's just stupid. It can't even do like simple arithmetic. And then like we've discovered over time that, and there's a reason for this, right? It's like, it's a large, there's, you know, the word language is in there for a reason in terms of what it's been trained on. It's not meant to do math, but now it's like, okay, well, the fact that it has access to a Python interpreter that I can actually call at runtime, that solves an entire body of problems that it wasn't trained to do. And it's basically a form of delegation. And so the thought that's kind of rattling around in my head is that that's great. So it's, it's like took the arithmetic problem and took it first. Now, like anything that's solvable through a relatively concrete Python program, it's able to do a bunch of things that I couldn't do before. Can we get to the same place with UI? I don't know what the future of UI looks like in a agentic AI world, but maybe let the LLM handle it, but not in the classic sense. 
Maybe it generates it on the fly, or maybe we go through some iterations and hit cache or something like that. So it's a little bit more predictable. Uh, I don't know, but yeah.Alessio [00:49:48]: And especially when is the human supposed to intervene? So, especially if you're composing them, most of them should not have a UI because then they're just web hooking to somewhere else. I just want to touch back. I don't know if you have more comments on this.swyx [00:50:01]: I was just going to ask when you, you said you got, you're going to go back to code. What
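One way to read the "generate the UI, iterate, then hit save" idea above is generate-once-then-cache: pay the inference cost while the builder is iterating, then serve the frozen code as the step's action. A minimal sketch, with the model call stubbed out as a placeholder:

```ts
import { createHash } from "node:crypto";

// Placeholder for the real LLM call that would return UI code for a described step.
async function generateUiWithLlm(description: string): Promise<string> {
  return `<form><!-- generated UI for: ${description} --></form>`;
}

const uiCache = new Map<string, string>(); // in a real builder this would be persisted with the flow

async function uiForStep(description: string): Promise<string> {
  const key = createHash("sha256").update(description).digest("hex");
  const cached = uiCache.get(key);
  if (cached) return cached; // no inference cost once the builder has hit "save"

  const code = await generateUiWithLlm(description); // iterate in the builder until happy...
  uiCache.set(key, code);                             // ...then the frozen code becomes the step's action
  return code;
}

console.log(await uiForStep("collect a domain name and a budget from the user"));
```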
In this episode, we provide an overview of AWS Step Functions and dive deep into the powerful new JSONata and variables features. We explain how JSONata allows complex JSON transformations without custom Lambda functions, enabling more serverless workflows. The variables feature also helps avoid the previous 256KB state size limit. We share examples from real projects showing how these features simplify workflows, reduce costs and enable new use cases.AWS Bites is brought to you in association with fourTheorem. If you need a friendly partner to support you and work with you to de-risk any AWS migration or development project, check them out at fourtheorem.comIn this episode, we mentioned the following resources:JSONata and variables official launch post: https://aws.amazon.com/blogs/compute/simplifying-developer-experience-with-variables-and-jsonata-in-aws-step-functions/JSONata exerciser: https://try.jsonata.org/Stedi JSONata playground: https://www.stedi.com/jsonata/playgroundEpisode 103: Building GenAI Features with Bedrock https://awsbites.com/103-building-genai-features-with-bedrock/Episode 63: How to automate transcripts with Amazon Transcribe and OpenAI Whisper https://awsbites.com/63-how-to-automate-transcripts-with-amazon-transcribe-and-openai-whisper/ Do you have any AWS questions you would like us to address?Leave a comment here or connect with us on X/Twitter, BlueSky or LinkedIn:- https://twitter.com/eoins | https://bsky.app/profile/eoin.sh | https://www.linkedin.com/in/eoins/- https://twitter.com/loige | https://bsky.app/profile/loige.co | https://www.linkedin.com/in/lucianomammino/
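To get a feel for the kind of transformation JSONata expresses, the same expression language Step Functions now evaluates natively can be tried locally with the jsonata npm package. The payload and expression here are invented for illustration; inside a state machine you would put the expression in the state definition rather than calling a library:

```ts
import jsonata from "jsonata";

// Invented payload: the kind of intermediate state you would previously reshape with a Lambda.
const input = {
  order: {
    id: "o-123",
    items: [
      { sku: "a", qty: 2, price: 10 },
      { sku: "b", qty: 1, price: 5 },
    ],
  },
};

// One JSONata expression does the mapping and aggregation that used to need custom code.
const expression = jsonata(`{
  "orderId": order.id,
  "total": $sum(order.items.(qty * price)),
  "skus": order.items.sku
}`);

const result = await expression.evaluate(input); // evaluate() is async in recent jsonata versions
console.log(result); // { orderId: 'o-123', total: 25, skus: [ 'a', 'b' ] }
```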
The March 2025 Core Update continues to roll out, though early indicators are already showing in Search Console. Google Business had a reverification bug that caused chaos for a few days and appears to be fixed. The White House announced it was going to order a dramatic downsizing of the Department of Education, and a French scientist was denied entry to the US after messages critical of Trump were found on his phone, which has terrifying implications for the future of American education and training. The US Court of Appeals rejected copyright protection for AI-generated works without a human author, while Spain announced it would impose massive fines for not labeling AI-generated content as such. The EU has again charged Google with violating EU antitrust rules, setting the stage for another series of rulings against the search giant. OpenAI released o1-Pro, which will be the company's most expensive model yet. Facebook's efforts to suppress Sarah Wynn-Williams' book, Careless People: A Cautionary Tale of Power, Greed, and Lost Idealism, have created a Streisand effect, generating ten times the publicity the book previously had. In other news: Danny Sullivan talks about Optimizing for Google AIO, links in AIOs don't necessarily lead to the same sites on Google search, Google is testing a radically expanded number of search options beyond AI Mode, Google Assistant is replaced by Gemini, and Google is crawling JSON files daily. All this and much more in a long but interesting episode.Support this podcast at — https://redcircle.com/webcology/donationsAdvertising Inquiries: https://redcircle.com/brandsPrivacy & Opt-Out: https://redcircle.com/privacy
Guy Royse, dev advocate at Redis, discusses going beyond the cache with Redis and Node.js. He explores its capabilities as a memory-first database, session management, and even fun use cases like the Bigfoot Tracker API. He also shares insights on Redis OM for object mapping and its future in the JavaScript ecosystem. Links http://guyroyse.com http://github.com/guyroyse https://www.twitch.tv/guyroyse https://www.youtube.com/channel/UCNt5SDc6LosO41E77jr59cQ https://x.com/guyroyse https://www.linkedin.com/in/groyse https://2024.connect.tech/session/693665 We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Emily, at emily.kochanekketner@logrocket.com (mailto:emily.kochanekketner@logrocket.com), or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod). Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers! What does LogRocket do? LogRocket provides AI-first session replay and analytics that surfaces the UX and technical issues impacting user experiences. Start understand where your users are struggling by trying it for free at [LogRocket.com]. Try LogRocket for free today.(https://logrocket.com/signup/?pdr) Special Guest: Guy Royse.
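As a flavor of the memory-first and JSON angle discussed in the episode, a small node-redis sketch follows. It assumes a Redis Stack instance with the JSON module loaded, and the keys and fields are made up rather than taken from the Bigfoot Tracker API:

```ts
import { createClient } from "redis";

const client = createClient({ url: "redis://localhost:6379" });
await client.connect();

// Store a JSON document natively (requires the RedisJSON module, e.g. Redis Stack).
await client.json.set("sighting:42", "$", {
  location: "Ohio",
  classification: "Class A",
  reportedAt: "2024-06-01",
});

// Read part of it back with a JSONPath.
const classification = await client.json.get("sighting:42", { path: "$.classification" });
console.log(classification);

// Plain key with a TTL: the classic session/cache use.
await client.set("session:abc", "user-1", { EX: 60 * 30 });

await client.quit();
```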
While everyone is now repeating that 2025 is the “Year of the Agent”, OpenAI is heads down building towards it. In the first 2 months of the year they released Operator and Deep Research (arguably the most successful agent archetype so far), and today they are bringing a lot of those capabilities to the API:* Responses API* Web Search Tool* Computer Use Tool* File Search Tool* A new open source Agents SDK with integrated Observability Tools. We cover all this and more in today's lightning pod on YouTube! More details here: Responses API: In our Michelle Pokrass episode we talked about the Assistants API needing a redesign. Today OpenAI is launching the Responses API, “a more flexible foundation for developers building agentic applications”. It's a superset of the chat completions API, and the suggested starting point for developers working with OpenAI models. One of the big upgrades is the new set of built-in tools for the Responses API: Web Search, Computer Use, and Files. Web Search Tool: We previously had Exa AI on the podcast to talk about web search for AI. OpenAI is also now joining the race; the Web Search API is actually a new “model” that exposes two 4o fine-tunes: gpt-4o-search-preview and gpt-4o-mini-search-preview. These are the same models that power ChatGPT Search, and are priced at $30/1000 queries and $25/1000 queries respectively. The killer feature is inline citations: you not only get a link to a page, but also a deep link to exactly where your query was answered in the result page. Computer Use Tool: The model that powers Operator, called Computer-Using-Agent (CUA), is also now available in the API. The computer-use-preview model is SOTA on most benchmarks, achieving 38.1% success on OSWorld for full computer use tasks, 58.1% on WebArena, and 87% on WebVoyager for web-based interactions. As you will notice in the docs, `computer-use-preview` is both a model and a tool through which you can specify the environment. Usage is priced at $3/1M input tokens and $12/1M output tokens, and it's currently only available to users in tiers 3-5. File Search Tool: File Search was also available in the Assistants API, and it's now coming to Responses too. OpenAI is bringing search + RAG all under one umbrella, and we'll definitely see more people trying to find new ways to build all-in-one apps on OpenAI. Usage is priced at $2.50 per thousand queries and file storage at $0.10/GB/day, with the first GB free. Agent SDK: Swarms++! https://github.com/openai/openai-agents-python To bring it all together, after the viral reception to Swarm, OpenAI is releasing an officially supported agents framework (which was previewed at our AI Engineer Summit) with 4 core pieces:* Agents: Easily configurable LLMs with clear instructions and built-in tools.* Handoffs: Intelligently transfer control between agents.* Guardrails: Configurable safety checks for input and output validation.* Tracing & Observability: Visualize agent execution traces to debug and optimize performance. Multi-agent workflows are here to stay! OpenAI now explicitly designs for a set of common agentic patterns: Workflows, Handoffs, Agents-as-Tools, LLM-as-a-Judge, Parallelization, and Guardrails. 
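For orientation, a minimal Responses API call with the built-in web search tool might look roughly like this using the openai Node SDK. Treat the model name, the tool type string, and the store option as assumptions to check against the current docs rather than a definitive recipe:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await client.responses.create({
  model: "gpt-4o",
  tools: [{ type: "web_search_preview" }], // built-in tool; no function-calling plumbing needed
  input: "What did OpenAI announce about the Responses API?",
  // store: false, // opt out of the default (free, 30-day) response storage if you want statelessness
});

console.log(response.output_text); // convenience accessor for the final text output
```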
OpenAI previewed this in part 2 of their talk at NYC:Further coverage of the launch from Kevin Weil, WSJ, and OpenAIDevs, AMA here.Show Notes* Assistants API* Swarm (OpenAI)* Fine-Tuning in AI* 2024 OpenAI DevDay Recap with Romain* Michelle Pokrass episode (API lead)Timestamps* 00:00 Intros* 02:31 Responses API * 08:34 Web Search API * 17:14 Files Search API * 18:46 Files API vs RAG * 20:06 Computer Use / Operator API * 22:30 Agents SDKAnd of course you can catch up with the full livestream here:TranscriptAlessio [00:00:03]: Hey, everyone. Welcome back to another Latent Space Lightning episode. This is Alessio, partner and CTO at Decibel, and I'm joined by Swyx, founder of Small AI.swyx [00:00:11]: Hi, and today we have a super special episode because we're talking with our old friend Roman. Hi, welcome.Romain [00:00:19]: Thank you. Thank you for having me.swyx [00:00:20]: And Nikunj, who is most famously, if anyone has ever tried to get any access to anything on the API, Nikunj is the guy. So I know your emails because I look forward to them.Nikunj [00:00:30]: Yeah, nice to meet all of you.swyx [00:00:32]: I think that we're basically convening today to talk about the new API. So perhaps you guys want to just kick off. What is OpenAI launching today?Nikunj [00:00:40]: Yeah, so I can kick it off. We're launching a bunch of new things today. We're going to do three new built-in tools. So we're launching the web search tool. This is basically chat GPD for search, but available in the API. We're launching an improved file search tool. So this is you bringing your data to OpenAI. You upload it. We, you know, take care of parsing it, chunking it. We're embedding it, making it searchable, give you this like ready vector store that you can use. So that's the file search tool. And then we're also launching our computer use tool. So this is the tool behind the operator product in chat GPD. So that's coming to developers today. And to support all of these tools, we're going to have a new API. So, you know, we launched chat completions, like I think March 2023 or so. It's been a while. So we're looking for an update over here to support all the new things that the models can do. And so we're launching this new API. It is, you know, it works with tools. We think it'll be like a great option for all the future agentic products that we build. And so that is also launching today. Actually, the last thing we're launching is the agents SDK. We launched this thing called Swarm last year where, you know, it was an experimental SDK for people to do multi-agent orchestration and stuff like that. It was supposed to be like educational experimental, but like people, people really loved it. They like ate it up. And so we are like, all right, let's, let's upgrade this thing. Let's give it a new name. And so we're calling it the agents SDK. It's going to have built-in tracing in the OpenAI dashboard. So lots of cool stuff going out. So, yeah.Romain [00:02:14]: That's a lot, but we said 2025 was the year of agents. So there you have it, like a lot of new tools to build these agents for developers.swyx [00:02:20]: Okay. I guess, I guess we'll just kind of go one by one and we'll leave the agents SDK towards the end. So responses API, I think the sort of primary concern that people have and something I think I've voiced to you guys when, when, when I was talking with you in the, in the planning process was, is chat completions going away? 
So I just wanted to let it, let you guys respond to the concerns that people might have.Romain [00:02:41]: Chat completion is definitely like here to stay, you know, it's a bare metal API we've had for quite some time. Lots of tools built around it. So we want to make sure that it's maintained and people can confidently keep on building on it. At the same time, it was kind of optimized for a different world, right? It was optimized for a pre-multi-modality world. We also optimized for kind of single turn: it takes a prompt in, it takes a response out. And now with these agentic workflows, we, we noticed that like developers and companies want to build longer horizon tasks, you know, like things that require multiple turns to get the task accomplished. And computer use is one of those, for instance. And so that's why the responses API came to life to kind of support these new agentic workflows. But chat completion is definitely here to stay.swyx [00:03:27]: And the Assistants API, uh, has a target sunset date of the first half of 2026. So this is kind of like, in my mind, there was a kind of very poetic mirroring of the API with the models. This, I kind of view this as like kind of the merging of the Assistants API and chat completions, right. Into one unified responses. So it's kind of like how GPT and the o-series models are also unifying.Romain [00:03:48]: Yeah, that's exactly the right, uh, that's the right framing, right? Like, I think we took the best of what we learned from the Assistants API, especially like being able to access tools very, uh, very like conveniently, but at the same time, like simplifying the way you have to integrate, like, you no longer have to think about six different objects to kind of get access to these tools with the responses API. You just get one API request and suddenly you can weave in those tools, right?Nikunj [00:04:12]: Yeah, absolutely. And I think we're going to make it really easy and straightforward for Assistants API users to migrate over to Responses, right, to the API without any loss of functionality or data. So our plan is absolutely to add, you know, assistant-like objects and thread-like objects to that, that work really well with the responses API. We'll also add like the code interpreter tool, which is not launching today, but it'll come soon. And, uh, we'll add async mode to responses API, because that's another difference with, with, uh, Assistants. It will have webhooks and stuff like that, but I think it's going to be like a pretty smooth transition. Uh, once we have all of that in place. And we'll have, like, a full year to migrate and, and help them through any issues they, they face. So overall, I feel like Assistants users are really going to benefit from this longer term, uh, with this more flexible primitive.Alessio [00:05:01]: How should people think about when to use each type of API? So I know that in the past, the Assistants was maybe more stateful, kind of like long running, many tool use kind of like file based things. And the chat completions is more stateless, you know, kind of like traditional completion API. Is that still the mental model that people should have? Or like, should you buy the.Nikunj [00:05:20]: So the responses API is going to, at launch, support everything that chat completion supports, and then over time, it's going to support everything that Assistants supports. So it's going to be a pretty good fit for anyone starting out with OpenAI. 
Uh, they should be able to like go to responses responses, by the way, also has a stateless mode, so you can pass in store false and they'll make the whole API stateless, just like chat completions. You're really trying to like get this unification. A story in so that people don't have to juggle multiple endpoints. That being said, like chat completions, just like the most widely adopted API, it's it's so popular. So we're still going to like support it for years with like new models and features. But if you're a new user, you want to or if you want to like existing, you want to tap into some of these like built in tools or something, you should feel feel totally fine migrating to responses and you'll have more capabilities and performance than the tech completions.swyx [00:06:16]: I think the messaging that I agree that I think resonated the most. When I talked to you was that it is a strict superset, right? Like you should be able to do everything that you could do in chat completions and with assistants. And the thing that I just assumed that because you're you're now, you know, by default is stateful, you're actually storing the chat logs or the chat state. I thought you'd be charging me for it. So, you know, to me, it was very surprising that you figured out how to make it free.Nikunj [00:06:43]: Yeah, it's free. We store your state for 30 days. You can turn it off. But yeah, it's it's free. And the interesting thing on state is that it just like makes particularly for me, it makes like debugging things and building things so much simpler, where I can like create a responses object that's like pretty complicated and part of this more complex application that I've built, I can just go into my dashboard and see exactly what happened that mess up my prompt that is like not called one of these tools that misconfigure one of the tools like the visual observability of everything that you're doing is so, so helpful. So I'm excited, like about people trying that out and getting benefits from it, too.swyx [00:07:19]: Yeah, it's a it's really, I think, a really nice to have. But all I'll say is that my friend Corey Quinn says that anything that can be used as a database will be used as a database. So be prepared for some abuse.Romain [00:07:34]: All right. Yeah, that's a good one. Some of that I've tried with the metadata. That's some people are very, very creative at stuffing data into an object. Yeah.Nikunj [00:07:44]: And we do have metadata with responses. Exactly. Yeah.Alessio [00:07:48]: Let's get through it. All of these. So web search. I think the when I first said web search, I thought you were going to just expose a API that then return kind of like a nice list of thing. But the way it's name is like GPD for all search preview. So I'm guessing you have you're using basically the same model that is in the chat GPD search, which is fine tune for search. I'm guessing it's a different model than the base one. And it's impressive the jump in performance. So just to give an example, in simple QA, GPD for all is 38% accuracy for all search is 90%. But we always talk about. How tools are like models is not everything you need, like tools around it are just as important. So, yeah, maybe give people a quick review on like the work that went into making this special.Nikunj [00:08:29]: Should I take that?Alessio [00:08:29]: Yeah, go for it.Nikunj [00:08:30]: So firstly, we're launching web search in two ways. One in responses API, which is our API for tools. 
It's going to be available as a web search tool itself. So you'll be able to go tools, turn on web search and you're ready to go. We still wanted to give chat completions people access to real time information. So in that. Chat completions API, which does not support built in tools. We're launching the direct access to the fine tuned model that chat GPD for search uses, and we call it GPD for search preview. And how is this model built? Basically, we have our search research team has been working on this for a while. Their main goal is to, like, get information, like get a bunch of information from all of our data sources that we use to gather information for search and then pick the right things and then cite them. As accurately as possible. And that's what the search team has really focused on. They've done some pretty cool stuff. They use like synthetic data techniques. They've done like all series model distillation to, like, make these four or fine tunes really good. But yeah, the main thing is, like, can it remain factual? Can it answer questions based on what it retrieves and get cited accurately? And that's what this like fine tune model really excels at. And so, yeah, so we're excited that, like, it's going to be directly available in chat completions along with being available as a tool. Yeah.Alessio [00:09:49]: Just to clarify, if I'm using the responses API, this is a tool. But if I'm using chat completions, I have to switch model. I cannot use 01 and call search as a tool. Yeah, that's right. Exactly.Romain [00:09:58]: I think what's really compelling, at least for me and my own uses of it so far, is that when you use, like, web search as a tool, it combines nicely with every other tool and every other feature of the platform. So think about this for a second. For instance, imagine you have, like, a responses API call with the web search tool, but suddenly you turn on function calling. You also turn on, let's say, structure. So you can have, like, the ability to structure any data from the web in real time in the JSON schema that you need for your application. So it's quite powerful when you start combining those features and tools together. It's kind of like an API for the Internet almost, you know, like you get, like, access to the precise schema you need for your app. Yeah.Alessio [00:10:39]: And then just to wrap up on the infrastructure side of it, I read on the post that people, publisher can choose to appear in the web search. So are people by default in it? Like, how can we get Latent Space in the web search API?Nikunj [00:10:53]: Yeah. Yeah. I think we have some documentation around how websites, publishers can control, like, what shows up in a web search tool. And I think you should be able to, like, read that. I think we should be able to get Latent Space in for sure. Yeah.swyx [00:11:10]: You know, I think so. I compare this to a broader trend that I started covering last year of online LLMs. Actually, Perplexity, I think, was the first. It was the first to say, to offer an API that is connected to search, and then Gemini had the sort of search grounding API. And I think you guys, I actually didn't, I missed this in the original reading of the docs, but you even give like citations with like the exact sub paragraph that is matching, which I think is the standard nowadays. I think my question is, how do we take what a knowledge cutoff is for something like this, right? 
Because like now, basically there's no knowledge cutoff is always live, but then there's a difference between what the model has sort of internalized in its back propagation and what is searching up its rag.Romain [00:11:53]: I think it kind of depends on the use case, right? And what you want to showcase as the source. Like, for instance, you take a company like Hebbia that has used this like web search tool. They can combine like for credit firms or law firms, they can find like, you know, public information from the internet with the live sources and citation that sometimes you do want to have access to, as opposed to like the internal knowledge. But if you're building something different, well, like, you just want to have the information. If you want to have an assistant that relies on the deep knowledge that the model has, you may not need to have these like direct citations. So I think it kind of depends on the use case a little bit, but there are many, uh, many companies like Hebbia that will need that access to these citations to precisely know where the information comes from.swyx [00:12:34]: Yeah, yeah, uh, for sure. And then one thing on the, on like the breadth, you know, I think a lot of the deep research, open deep research implementations have this sort of hyper parameter about, you know, how deep they're searching and how wide they're searching. I don't see that in the docs. But is that something that we can tune? Is that something you recommend thinking about?Nikunj [00:12:53]: Super interesting. It's definitely not a parameter today, but we should explore that. It's very interesting. I imagine like how you would do it with the web search tool and responsive API is you would have some form of like, you know, agent orchestration over here where you have a planning step and then each like web search call that you do like explicitly goes a layer deeper and deeper and deeper. But it's not a parameter that's available out of the box. But it's a cool. It's a cool thing to think about. Yeah.swyx [00:13:19]: The only guidance I'll offer there is a lot of these implementations offer top K, which is like, you know, top 10, top 20, but actually don't really want that. You want like sort of some kind of similarity cutoff, right? Like some matching score cuts cutoff, because if there's only five things, five documents that match fine, if there's 500 that match, maybe that's what I want. Right. Yeah. But also that might, that might make my costs very unpredictable because the costs are something like $30 per a thousand queries, right? So yeah. Yeah.Nikunj [00:13:49]: I guess you could, you could have some form of like a context budget and then you're like, go as deep as you can and pick the best stuff and put it into like X number of tokens. There could be some creative ways of, of managing cost, but yeah, that's a super interesting thing to explore.Alessio [00:14:05]: Do you see people using the files and the search API together where you can kind of search and then store everything in the file so the next time I'm not paying for the search again and like, yeah, how should people balance that?Nikunj [00:14:17]: That's actually a very interesting question. 
And let me first tell you about how I've seen a really cool way I've seen people use files and search together is they put their user preferences or memories in the vector store and so a query comes in, you use the file search tool to like get someone's like reading preferences or like fashion preferences and stuff like that, and then you search the web for information or products that they can buy related to those preferences and you then render something beautiful to show them, like, here are five things that you might be interested in. So that's how I've seen like file search, web search work together. And by the way, that's like a single responses API call, which is really cool. So you just like configure these things, go boom, and like everything just happens. But yeah, that's how I've seen like files and web work together.Romain [00:15:01]: But I think that what you're pointing out is like interesting, and I'm sure developers will surprise us as they always do in terms of how they combine these tools and how they might use file search as a way to have memory and preferences, like Nikum says. But I think like zooming out, what I find very compelling and powerful here is like when you have these like neural networks. That have like all of the knowledge that they have today, plus real time access to the Internet for like any kind of real time information that you might need for your app and file search, where you can have a lot of company, private documents, private details, you combine those three, and you have like very, very compelling and precise answers for any kind of use case that your company or your product might want to enable.swyx [00:15:41]: It's a difference between sort of internal documents versus the open web, right? Like you're going to need both. Exactly, exactly. I never thought about it doing memory as well. I guess, again, you know, anything that's a database, you can store it and you will use it as a database. That sounds awesome. But I think also you've been, you know, expanding the file search. You have more file types. You have query optimization, custom re-ranking. So it really seems like, you know, it's been fleshed out. Obviously, I haven't been paying a ton of attention to the file search capability, but it sounds like your team has added a lot of features.Nikunj [00:16:14]: Yeah, metadata filtering was like the main thing people were asking us for for a while. And I'm super excited about it. I mean, it's just so critical once your, like, web store size goes over, you know, more than like, you know, 5,000, 10,000 records, you kind of need that. So, yeah, metadata filtering is coming, too.Romain [00:16:31]: And for most companies, it's also not like a competency that you want to rebuild in-house necessarily, you know, like, you know, thinking about embeddings and chunking and, you know, how of that, like, it sounds like very complex for something very, like, obvious to ship for your users. Like companies like Navant, for instance. They were able to build with the file search, like, you know, take all of the FAQ and travel policies, for instance, that you have, you, you put that in file search tool, and then you don't have to think about anything. Now your assistant becomes naturally much more aware of all of these policies from the files.swyx [00:17:03]: The question is, like, there's a very, very vibrant RAG industry already, as you well know. So there's many other vector databases, many other frameworks. 
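Nikunj's preferences-plus-web example is a single Responses API call with two built-in tools. A hedged sketch of what that might look like; the vector store ID is a placeholder, and the tool shapes are assumptions based on the launch description:

```ts
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4o",
  tools: [
    { type: "file_search", vector_store_ids: ["vs_user_prefs_123"] }, // placeholder vector store ID
    { type: "web_search_preview" },
  ],
  input: "Using what you know about my reading preferences, find five new books I might like.",
});

console.log(response.output_text);
```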
Probably if it's an open source stack, I would say like a lot of the AI engineers that I talk to want to own this part of the stack. And it feels like, you know, like, when should we DIY and when should we just use whatever OpenAI offers?Nikunj [00:17:24]: Yeah. I mean, like, if you're doing something completely from scratch, you're going to have more control, right? Like, so super supportive of, you know, people trying to, like, roll up their sleeves, build their, like, super custom chunking strategy and super custom retrieval strategy and all of that. And those are things that, like, will be harder to do with OpenAI tools. OpenAI tool has, like, we have an out-of-the-box solution. We give you the tools. We use some knobs to customize things, but it's more of, like, a managed RAG service. So my recommendation would be, like, start with the OpenAI thing, see if it, like, meets your needs. And over time, we're going to be adding more and more knobs to make it even more customizable. But, you know, if you want, like, the completely custom thing, you want control over every single thing, then you'd probably want to go and hand roll it using other solutions. So we're supportive of both, like, engineers should pick. Yeah.Alessio [00:18:16]: And then we got computer use. Which I think Operator was obviously one of the hot releases of the year. And we're only two months in. Let's talk about that. And that's also, it seems like a separate model that has been fine-tuned for Operator that has browser access.Nikunj [00:18:31]: Yeah, absolutely. I mean, the computer use models are exciting. The cool thing about computer use is that we're just so, so early. It's like the GPT-2 of computer use or maybe GPT-1 of computer use right now. But it is a separate model that has been, you know, the computer. The computer use team has been working on, you send it screenshots and it tells you what action to take. So the outputs of it are almost always tool calls and you're inputting screenshots based on whatever computer you're trying to operate.Romain [00:19:01]: Maybe zooming out for a second, because like, I'm sure your audience is like super, super like AI native, obviously. But like, what is computer use as a tool, right? And what's operator? So the idea for computer use is like, how do we let developers also build agents that can complete tasks for the users, but using a computer? Okay. Or a browser instead. And so how do you get that done? And so that's why we have this custom model, like optimized for computer use that we use like for operator ourselves. But the idea behind like putting it as an API is that imagine like now you want to, you want to automate some tasks for your product or your own customers. Then now you can, you can have like the ability to spin up one of these agents that will look at the screen and act on the screen. So that means able, the ability to click, the ability to scroll. The ability to type and to report back on the action. So that's what we mean by computer use and wrapping it as a tool also in the responses API. So now like that gives a hint also at the multi-turned thing that we were hinting at earlier, the idea that like, yeah, maybe one of these actions can take a couple of minutes to complete because there's maybe like 20 steps to complete that task. But now you can.swyx [00:20:08]: Do you think a computer use can play Pokemon?Romain [00:20:11]: Oh, interesting. I guess we tried it. I guess we should try it. You know?swyx [00:20:17]: Yeah. There's a lot of interest. 
I think Pokemon really is a good agent benchmark, to be honest. Like it seems like Claude is, Claude is running into a lot of trouble.Romain [00:20:25]: Sounds like we should make that a new eval, it looks like.swyx [00:20:28]: Yeah. Yeah. Oh, and then one more, one more thing before we move on to agents SDK. I know you have a hard stop. There's all these, you know, blah, blah, dash preview, right? Like search preview, computer use preview, right? And you see them all like fine tunes of 4.0. I think the question is, are we, are they all going to be merged into the main branch or are we basically always going to have subsets? Of these models?Nikunj [00:20:49]: Yeah, I think in the early days, research teams at OpenAI like operate with like fine tune models. And then once the thing gets like more stable, we sort of merge it into the main line. So that's definitely the vision, like going out of preview as we get more comfortable with and learn about all the developer use cases and we're doing a good job at them. We'll sort of like make them part of like the core models so that you don't have to like deal with the bifurcation.Romain [00:21:12]: You should think of it this way as exactly what happened last year when we introduced vision capabilities, you know. Yes. Vision capabilities were in like a vision preview model based off of GPT-4 and then vision capabilities now are like obviously built into GPT-4.0. You can think about it the same way for like the other modalities like audio and those kind of like models, like optimized for search and computer use.swyx [00:21:34]: Agents SDK, we have a few minutes left. So let's just assume that everyone has looked at Swarm. Sure. I think that Swarm has really popularized the handoff technique, which I thought was like, you know, really, really interesting for sort of a multi-agent. What is new with the SDK?Nikunj [00:21:50]: Yeah. Do you want to start? Yeah, for sure. So we've basically added support for types. We've made this like a lot. Yeah. Like we've added support for types. We've added support for guard railing, which is a very common pattern. So in the guardrail example, you basically have two things happen in parallel. The guardrail can sort of block the execution. It's a type of like optimistic generation that happens. And I think we've added support for tracing. So I think that's really cool. So you can basically look at the traces that the Agents SDK creates in the OpenAI dashboard. We also like made this pretty flexible. So you can pick any API from any provider that supports the ChatCompletions API format. So it supports responses by default, but you can like easily plug it in to anyone that uses the ChatCompletions API. And similarly, on the tracing side, you can support like multiple tracing providers. By default, it sort of points to the OpenAI dashboard. But, you know, there's like so many tracing providers. There's so many tracing companies out there. And we'll announce some partnerships on that front, too. So just like, you know, adding lots of core features and making it more usable, but still centered around like handoffs is like the main, main concept.Romain [00:22:59]: And by the way, it's interesting, right? Because Swarm just came to life out of like learning from customers directly that like orchestrating agents in production was pretty hard. You know, simple ideas could quickly turn very complex. Like what are those guardrails? What are those handoffs, et cetera? So that came out of like learning from customers. 
And it was initially shipped. It was not as a like low-key experiment, I'd say. But we were kind of like taken by surprise at how much momentum there was around this concept. And so we decided to learn from that and embrace it. To be like, okay, maybe we should just embrace that as a core primitive of the OpenAI platform. And that's kind of what led to the Agents SDK. And I think now, as Nikuj mentioned, it's like adding all of these new capabilities to it, like leveraging the handoffs that we had, but tracing also. And I think what's very compelling for developers is like instead of having one agent to rule them all and you stuff like a lot of tool calls in there that can be hard to monitor, now you have the tools you need to kind of like separate the logic, right? And you can have a triage agent that based on an intent goes to different kind of agents. And then on the OpenAI dashboard, we're releasing a lot of new user interface logs as well. So you can see all of the tracing UIs. Essentially, you'll be able to troubleshoot like what exactly happened. In that workflow, when the triage agent did a handoff to a secondary agent and the third and see the tool calls, et cetera. So we think that the Agents SDK combined with the tracing UIs will definitely help users and developers build better agentic workflows.Alessio [00:24:28]: And just before we wrap, are you thinking of connecting this with also the RFT API? Because I know you already have, you kind of store my text completions and then I can do fine tuning of that. Is that going to be similar for agents where you're storing kind of like my traces? And then help me improve the agents?Nikunj [00:24:43]: Yeah, absolutely. Like you got to tie the traces to the evals product so that you can generate good evals. Once you have good evals and graders and tasks, you can use that to do reinforcement fine tuning. And, you know, lots of details to be figured out over here. But that's the vision. And I think we're going to go after it like pretty hard and hope we can like make this whole workflow a lot easier for developers.Alessio [00:25:05]: Awesome. Thank you so much for the time. I'm sure you'll be busy on Twitter tomorrow with all the developer feedback. Yeah.Romain [00:25:12]: Thank you so much for having us. And as always, we can't wait to see what developers will build with these tools and how we can like learn as quickly as we can from them to make them even better over time.Nikunj [00:25:21]: Yeah.Romain [00:25:22]: Thank you, guys.Nikunj [00:25:23]: Thank you.Romain [00:25:23]: Thank you both. Awesome. Get full access to Latent.Space at www.latent.space/subscribe
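The triage-and-handoff shape Romain describes is easy to picture even without the SDK. The sketch below is not the Agents SDK API, just the bare pattern in TypeScript with the routing and guardrail stubbed out:

```ts
// Minimal illustration of the handoff pattern: a triage agent routes to a specialist agent.
type Agent = (input: string) => Promise<string>;

const refundsAgent: Agent = async (input) => `refunds agent handling: ${input}`;
const salesAgent: Agent = async (input) => `sales agent handling: ${input}`;

// In the real pattern the triage step is itself an LLM call that picks a handoff target;
// here the intent classification is stubbed with a keyword check.
async function triage(input: string): Promise<Agent> {
  return /refund|return/i.test(input) ? refundsAgent : salesAgent;
}

async function run(input: string): Promise<string> {
  // Optional guardrail: block obviously bad input before any agent sees it.
  if (input.trim().length === 0) throw new Error("guardrail: empty input");
  const agent = await triage(input); // handoff decision
  return agent(input);               // the chosen agent takes over from here
}

console.log(await run("I want to return my order"));
```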
SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
Romanian Distillery Scanning for SMTP Credentials A particular attacker expanded the scope of their leaked credential file scans. In addition to the usual ".env" style files, it is now looking for specific SMTP-related credential files. https://isc.sans.edu/diary/Romanian%20Distillery%20Scanning%20for%20SMTP%20Credentials/31736 Tool Updates: mac-robber.py This update of mac-robber.py fixes issues with symlinks. https://isc.sans.edu/diary/Tool%20update%3A%20mac-robber.py/31738 CVE-2025-1723 Account takeover vulnerability in ADSelfService Plus CVE-2025-1723 describes a vulnerability caused by session mishandling in ADSelfService Plus that could allow unauthorized access to user enrollment data when MFA was not enabled for ADSelfService Plus login. https://www.manageengine.com/products/self-service-password/advisory/CVE-2025-1723.html Android March Update Google released an update for Android addressing two already exploited vulnerabilities and several critical issues. https://source.android.com/docs/security/bulletin/2025-03-01 PayPal's no-code-checkout Abuse PayPal's no-code checkout feature is being abused by scammers to host PayPal tech support scam pages right within the PayPal.com domain. https://www.malwarebytes.com/blog/scams/2025/02/paypals-no-code-checkout-abused-by-scammers Broadcom Fixes Three VMware vCenter Vulnerabilities https://github.com/vmware/vcf-security-and-compliance-guidelines/tree/main/security-advisories/vmsa-2025-0004
Today's episode is with Paul Klein, founder of Browserbase. We talked about building browser infrastructure for AI agents, the future of agent authentication, and their open source framework Stagehand.* [00:00:00] Introductions* [00:04:46] AI-specific challenges in browser infrastructure* [00:07:05] Multimodality in AI-Powered Browsing* [00:12:26] Running headless browsers at scale* [00:18:46] Geolocation when proxying* [00:21:25] CAPTCHAs and Agent Auth* [00:28:21] Building “User take over” functionality* [00:33:43] Stagehand: AI web browsing framework* [00:38:58] OpenAI's Operator and computer use agents* [00:44:44] Surprising use cases of Browserbase* [00:47:18] Future of browser automation and market competition* [00:53:11] Being a solo founderTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.swyx [00:00:12]: Hey, and today we are very blessed to have our friends, Paul Klein, for the fourth, the fourth, CEO of Browserbase. Welcome.Paul [00:00:21]: Thanks guys. Yeah, I'm happy to be here. I've been lucky to know both of you for like a couple of years now, I think. So it's just like we're hanging out, you know, with three ginormous microphones in front of our face. It's totally normal hangout.swyx [00:00:34]: Yeah. We've actually mentioned you on the podcast, I think, more often than any other Solaris tenant. Just because like you're one of the, you know, best performing, I think, LLM tool companies that have started up in the last couple of years.Paul [00:00:50]: Yeah, I mean, it's been a whirlwind of a year, like Browserbase is actually pretty close to our first birthday. So we are one years old. And going from, you know, starting a company as a solo founder to... To, you know, having a team of 20 people, you know, a series A, but also being able to support hundreds of AI companies that are building AI applications that go out and automate the web. It's just been like, really cool. It's been happening a little too fast. I think like collectively as an AI industry, let's just take a week off together. I took my first vacation actually two weeks ago, and Operator came out on the first day, and then a week later, DeepSeat came out. And I'm like on vacation trying to chill. I'm like, we got to build with this stuff, right? So it's been a breakneck year. But I'm super happy to be here and like talk more about all the stuff we're seeing. And I'd love to hear kind of what you guys are excited about too, and share with it, you know?swyx [00:01:39]: Where to start? So people, you've done a bunch of podcasts. I think I strongly recommend Jack Bridger's Scaling DevTools, as well as Turner Novak's The Peel. And, you know, I'm sure there's others. So you covered your Twilio story in the past, talked about StreamClub, you got acquired to Mux, and then you left to start Browserbase. So maybe we just start with what is Browserbase? Yeah.Paul [00:02:02]: Browserbase is the web browser for your AI. We're building headless browser infrastructure, which are browsers that run in a server environment that's accessible to developers via APIs and SDKs. It's really hard to run a web browser in the cloud. You guys are probably running Chrome on your computers, and that's using a lot of resources, right? So if you want to run a web browser or thousands of web browsers, you can't just spin up a bunch of lambdas. You actually need to use a secure containerized environment. 
You have to scale it up and down. It's a stateful system. And that infrastructure is, like, super painful. And I know that firsthand, because at my last company, StreamClub, I was CTO, and I was building our own internal headless browser infrastructure. That's actually why we sold the company, is because Mux really wanted to buy our headless browser infrastructure that we'd built. And it's just a super hard problem. And I actually told my co-founders, I would never start another company unless it was a browser infrastructure company. And it turns out that's really necessary in the age of AI, when AI can actually go out and interact with websites, click on buttons, fill in forms. You need AI to do all of that work in an actual browser running somewhere on a server. And BrowserBase powers that.swyx [00:03:08]: While you're talking about it, it occurred to me, not that you're going to be acquired or anything, but it occurred to me that it would be really funny if you became the Nikita Beer of headless browser companies. You just have one trick, and you make browser companies that get acquired.Paul [00:03:23]: I truly do only have one trick. I'm screwed if it's not for headless browsers. I'm not a Go programmer. You know, I'm in AI grant. You know, browsers is an AI grant. But we were the only company in that AI grant batch that used zero dollars on AI spend. You know, we're purely an infrastructure company. So as much as people want to ask me about reinforcement learning, I might not be the best guy to talk about that. But if you want to ask about headless browser infrastructure at scale, I can talk your ear off. So that's really my area of expertise. And it's a pretty niche thing. Like, nobody has done what we're doing at scale before. So we're happy to be the experts.swyx [00:03:59]: You do have an AI thing, stagehand. We can talk about the sort of core of browser-based first, and then maybe stagehand. Yeah, stagehand is kind of the web browsing framework. Yeah.What is Browserbase? Headless Browser Infrastructure ExplainedAlessio [00:04:10]: Yeah. Yeah. And maybe how you got to browser-based and what problems you saw. So one of the first things I worked on as a software engineer was integration testing. Sauce Labs was kind of like the main thing at the time. And then we had Selenium, we had Playbrite, we had all these different browser things. But it's always been super hard to do. So obviously you've worked on this before. When you started browser-based, what were the challenges? What were the AI-specific challenges that you saw versus, there's kind of like all the usual running browser at scale in the cloud, which has been a problem for years. What are like the AI unique things that you saw that like traditional purchase just didn't cover? Yeah.AI-specific challenges in browser infrastructurePaul [00:04:46]: First and foremost, I think back to like the first thing I did as a developer, like as a kid when I was writing code, I wanted to write code that did stuff for me. You know, I wanted to write code to automate my life. And I do that probably by using curl or beautiful soup to fetch data from a web browser. And I think I still do that now that I'm in the cloud. And the other thing that I think is a huge challenge for me is that you can't just create a web site and parse that data. And we all know that now like, you know, taking HTML and plugging that into an LLM, you can extract insights, you can summarize. 
So it was very clear that now like dynamic web scraping became very possible with the rise of large language models or a lot easier. And that was like a clear reason why there's been more usage of headless browsers, which are necessary because a lot of modern websites don't expose all of their page content via a simple HTTP request. You know, they actually do require you to run this type of code for a specific time. JavaScript on the page to hydrate this. Airbnb is a great example. You go to airbnb.com. A lot of that content on the page isn't there until after they run the initial hydration. So you can't just scrape it with a curl. You need to have some JavaScript run. And a browser is that JavaScript engine that's going to actually run all those requests on the page. So web data retrieval was definitely one driver of starting BrowserBase and the rise of being able to summarize that within LLM. Also, I was familiar with if I wanted to automate a website, I could write one script and that would work for one website. It was very static and deterministic. But the web is non-deterministic. The web is always changing. And until we had LLMs, there was no way to write scripts that you could write once that would run on any website. That would change with the structure of the website. Click the login button. It could mean something different on many different websites. And LLMs allow us to generate code on the fly to actually control that. So I think that rise of writing the generic automation scripts that can work on many different websites, to me, made it clear that browsers are going to be a lot more useful because now you can automate a lot more things without writing. If you wanted to write a script to book a demo call on 100 websites, previously, you had to write 100 scripts. Now you write one script that uses LLMs to generate that script. That's why we built our web browsing framework, StageHand, which does a lot of that work for you. But those two things, web data collection and then enhanced automation of many different websites, it just felt like big drivers for more browser infrastructure that would be required to power these kinds of features.Alessio [00:07:05]: And was multimodality also a big thing?Paul [00:07:08]: Now you can use the LLMs to look, even though the text in the dome might not be as friendly. Maybe my hot take is I was always kind of like, I didn't think vision would be as big of a driver. For UI automation, I felt like, you know, HTML is structured text and large language models are good with structured text. But it's clear that these computer use models are often vision driven, and they've been really pushing things forward. So definitely being multimodal, like rendering the page is required to take a screenshot to give that to a computer use model to take actions on a website. And it's just another win for browser. But I'll be honest, that wasn't what I was thinking early on. I didn't even think that we'd get here so fast with multimodality. I think we're going to have to get back to multimodal and vision models.swyx [00:07:50]: This is one of those things where I forgot to mention in my intro that I'm an investor in Browserbase. And I remember that when you pitched to me, like a lot of the stuff that we have today, we like wasn't on the original conversation. 
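Tying back to Paul's Airbnb example above: the difference between a plain HTTP fetch and a real browser is whether the page's JavaScript gets to run and hydrate the content. A minimal sketch with plain Playwright, run locally here; no Browserbase-specific API is assumed.

```ts
// Sketch of the hydration point: load the page in a real browser so its
// JavaScript runs, then read the fully rendered DOM. Plain local Playwright;
// nothing Browserbase-specific is assumed.
import { chromium } from "playwright";

async function renderedHtml(url: string): Promise<string> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" }); // wait for the page to hydrate
  const html = await page.content();                  // post-hydration HTML
  await browser.close();
  return html;
}
```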
But my original thesis was something that we've talked about on the podcast before, which is take the GPT store, the custom GPT store: every single checkbox and plugin is effectively a startup. And this was the browser one. I think the main hesitation, I think I actually took a while to get back to you. The main hesitation was that there were others. Like you're not the first headless browser startup. It's not even your first headless browser startup. There's always a question of like, will you be the category winner in a place where there's a bunch of incumbents, to be honest, that are bigger than you? They're just not targeted at the AI space. They don't have the backing of Nat Friedman. And there's a bunch of like, you're here in Silicon Valley. They're not. I don't know.Paul [00:08:47]: I don't know if that's, that was it, but like, there was a, yeah, I mean, like, I think I tried all the other ones and I was like, really disappointed. Like my background is from working at great developer tools companies, and nothing had like the Vercel-like experience. Um, like our biggest competitor actually is partly owned by private equity and they just jacked up their prices quite a bit. And the dashboard hasn't changed in five years. And I actually used them at my last company and tried them and I was like, oh man, like there really just needs to be something that's like the experience of these great infrastructure companies, like Stripe, like Clerk, like Vercel, that I use and love, but oriented towards this kind of like more specific category, which is browser infrastructure, which is really technically complex. Like a lot of stuff can go wrong on the internet when you're running a browser. The internet is very vast. There's a lot of different configurations. Like there's still websites that only work with Internet Explorer out there. How do you handle that when you're running your own browser infrastructure? These are the problems that we have to think about and solve at Browserbase. And it's, it's certainly a labor of love, but I built this for me, first and foremost. I know it's super cheesy and everyone says that for like their startups, but it really, truly was for me. If you look at like the talks I've done even before Browserbase, I'm just like really excited to try and build a category-defining infrastructure company. And it's, it's rare to have a new category of infrastructure exist. We're here in the Chroma offices and like, you know, vector databases is a new category of infrastructure. Is it, is it, I mean, we can, we're in their office, so, you know, we can, we can debate that one later. That is one.Multimodality in AI-Powered Browsingswyx [00:10:16]: That's one of the industry debates.Paul [00:10:17]: I guess we go back to the LLMOS talk that Karpathy gave way long ago. And like the browser box was very clearly there and it seemed like the people who were building in this space also agreed that browsers are a core primitive of infrastructure for the LLMOS that's going to exist in the future. And nobody was building something there that I wanted to use. So I had to go build it myself.swyx [00:10:38]: Yeah. I mean, exactly that talk that, that honestly, that diagram, every box is a startup and there's the code box and then there's the browser box. I think at some point they will start clashing there. There's always the question of the, are you a point solution or are you the sort of all-in-one?
And I think the point solutions tend to win quickly, but then the only ones have a very tight cohesive experience. Yeah. Let's talk about just the hard problems of browser base you have on your website, which is beautiful. Thank you. Was there an agency that you used for that? Yeah. Herb.paris.Paul [00:11:11]: They're amazing. Herb.paris. Yeah. It's H-E-R-V-E. I highly recommend for developers. Developer tools, founders to work with consumer agencies because they end up building beautiful things and the Parisians know how to build beautiful interfaces. So I got to give prep.swyx [00:11:24]: And chat apps, apparently are, they are very fast. Oh yeah. The Mistral chat. Yeah. Mistral. Yeah.Paul [00:11:31]: Late chat.swyx [00:11:31]: Late chat. And then your videos as well, it was professionally shot, right? The series A video. Yeah.Alessio [00:11:36]: Nico did the videos. He's amazing. Not the initial video that you shot at the new one. First one was Austin.Paul [00:11:41]: Another, another video pretty surprised. But yeah, I mean, like, I think when you think about how you talk about your company. You have to think about the way you present yourself. It's, you know, as a developer, you think you evaluate a company based on like the API reliability and the P 95, but a lot of developers say, is the website good? Is the message clear? Do I like trust this founder? I'm building my whole feature on. So I've tried to nail that as well as like the reliability of the infrastructure. You're right. It's very hard. And there's a lot of kind of foot guns that you run into when running headless browsers at scale. Right.Competing with Existing Headless Browser Solutionsswyx [00:12:10]: So let's pick one. You have eight features here. Seamless integration. Scalability. Fast or speed. Secure. Observable. Stealth. That's interesting. Extensible and developer first. What comes to your mind as like the top two, three hardest ones? Yeah.Running headless browsers at scalePaul [00:12:26]: I think just running headless browsers at scale is like the hardest one. And maybe can I nerd out for a second? Is that okay? I heard this is a technical audience, so I'll talk to the other nerds. Whoa. They were listening. Yeah. They're upset. They're ready. The AGI is angry. Okay. So. So how do you run a browser in the cloud? Let's start with that, right? So let's say you're using a popular browser automation framework like Puppeteer, Playwright, and Selenium. Maybe you've written a code, some code locally on your computer that opens up Google. It finds the search bar and then types in, you know, search for Latent Space and hits the search button. That script works great locally. You can see the little browser open up. You want to take that to production. You want to run the script in a cloud environment. So when your laptop is closed, your browser is doing something. The browser is doing something. Well, I, we use Amazon. You can see the little browser open up. You know, the first thing I'd reach for is probably like some sort of serverless infrastructure. I would probably try and deploy on a Lambda. But Chrome itself is too big to run on a Lambda. It's over 250 megabytes. So you can't easily start it on a Lambda. So you maybe have to use something like Lambda layers to squeeze it in there. Maybe use a different Chromium build that's lighter. And you get it on the Lambda. Great. It works. But it runs super slowly. It's because Lambdas are very like resource limited. They only run like with one vCPU. 
You can run one process at a time. Remember, Chromium is super beefy. It's barely running on my MacBook Air. I'm still downloading it from a pre-run. Yeah, from the test earlier, right? I'm joking. But it's big, you know? So like Lambda, it just won't work really well. Maybe it'll work, but you need something faster. Your users want something faster. Okay. Well, let's put it on a beefier instance. Let's get an EC2 server running. Let's throw Chromium on there. Great. Okay. That works well with one user. But what if I want to run like 10 Chromium instances, one for each of my users? Okay. Well, I might need two EC2 instances. Maybe 10. All of a sudden, you have multiple EC2 instances. This sounds like a problem for Kubernetes and Docker, right? Now, all of a sudden, you're using ECS or EKS, the Kubernetes or container solutions by Amazon. You're spinning up and down containers, and you're spending a whole engineer's time on kind of maintaining this stateful distributed system. Those are some of the worst systems to run because when it's a stateful distributed system, it means that you are bound by the connections to that thing. You have to keep the browser open while someone is working with it, right? That's just a painful architecture to run. And there are all these other little gotchas with Chromium, like Chromium, which is the open source version of Chrome, by the way. You have to install all these fonts. You want emojis working in your browsers because your vision model is looking for the emoji. You need to make sure you have the emoji fonts. You need to make sure you have all the right extensions configured, like, oh, do you want ad blocking? How do you configure that? How do you actually record all these browser sessions? Like it's a headless browser. You can't look at it. So you need to have some sort of observability. Maybe you're recording videos and storing those somewhere. It all kind of adds up to be this just giant monster piece of your project when all you wanted to do was run a lot of browsers in production for this little script to go to google.com and search. And when I see a complex distributed system, I see an opportunity to build a great infrastructure company. And we really abstract that away with Browserbase, where our customers can use these existing frameworks, Playwright, Puppeteer, Selenium, or our own Stagehand, and connect to our browsers in a serverless-like way. And control them, and then just disconnect when they're done. And they don't have to think about the complex distributed system behind all of that. They just get a browser running anywhere, anytime. Really easy to connect to.swyx [00:15:55]: I'm sure you have questions. My standard question with anything, so essentially you're a serverless browser company, and there's been other serverless things that I'm familiar with in the past, serverless GPUs, serverless website hosting. That's where I come from with Netlify. One question is just like, you promised to spin up thousands of servers. You promised to spin up thousands of browsers in milliseconds. I feel like there's no real solution that does that yet. And I'm just kind of curious how. The only solution I know, which is to kind of keep a kind of warm pool of servers around, which is expensive, but maybe not so expensive because it's just CPUs. So I'm just like, you know. Yeah.Browsers as a Core Primitive in AI InfrastructurePaul [00:16:36]: You nailed it, right?
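The "connect, don't manage" model Paul describes, where your existing automation code drives a browser that is already running somewhere else, looks roughly like this with Playwright's CDP connection. The WebSocket endpoint below is a placeholder, not an actual provider connection string.

```ts
// Sketch: instead of launching and babysitting Chrome yourself, connect an
// existing framework (Playwright here) to a browser already running remotely.
// The endpoint is a placeholder; a real provider gives you its own URL.
import { chromium } from "playwright";

async function searchOnRemoteBrowser() {
  const browser = await chromium.connectOverCDP("wss://browser-host.example/session/abc123");
  const page = await browser.newPage();
  await page.goto("https://www.google.com");
  await page.fill('textarea[name="q"]', "Latent Space"); // selector may vary by page version
  await page.keyboard.press("Enter");
  console.log(await page.title());
  await browser.close(); // disconnects; the remote infrastructure cleans up the browser
}
```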
I mean, how do you offer a serverless-like experience with something that is clearly not serverless, right? And the answer is, you need to be able to run... We run many browsers on single nodes. We use Kubernetes at browser base. So we have many pods that are being scheduled. We have to predictably schedule them up or down. Yes, thousands of browsers in milliseconds is the best case scenario. If you hit us with 10,000 requests, you may hit a slower cold start, right? So we've done a lot of work on predictive scaling and being able to kind of route stuff to different regions where we have multiple regions of browser base where we have different pools available. You can also pick the region you want to go to based on like lower latency, round trip, time latency. It's very important with these types of things. There's a lot of requests going over the wire. So for us, like having a VM like Firecracker powering everything under the hood allows us to be super nimble and spin things up or down really quickly with strong multi-tenancy. But in the end, this is like the complex infrastructural challenges that we have to kind of deal with at browser base. And we have a lot more stuff on our roadmap to allow customers to have more levers to pull to exchange, do you want really fast browser startup times or do you want really low costs? And if you're willing to be more flexible on that, we may be able to kind of like work better for your use cases.swyx [00:17:44]: Since you used Firecracker, shouldn't Fargate do that for you or did you have to go lower level than that? We had to go lower level than that.Paul [00:17:51]: I find this a lot with Fargate customers, which is alarming for Fargate. We used to be a giant Fargate customer. Actually, the first version of browser base was ECS and Fargate. And unfortunately, it's a great product. I think we were actually the largest Fargate customer in our region for a little while. No, what? Yeah, seriously. And unfortunately, it's a great product, but I think if you're an infrastructure company, you actually have to have a deeper level of control over these primitives. I think it's the same thing is true with databases. We've used other database providers and I think-swyx [00:18:21]: Yeah, serverless Postgres.Paul [00:18:23]: Shocker. When you're an infrastructure company, you're on the hook if any provider has an outage. And I can't tell my customers like, hey, we went down because so-and-so went down. That's not acceptable. So for us, we've really moved to bringing things internally. It's kind of opposite of what we preach. We tell our customers, don't build this in-house, but then we're like, we build a lot of stuff in-house. But I think it just really depends on what is in the critical path. We try and have deep ownership of that.Alessio [00:18:46]: On the distributed location side, how does that work for the web where you might get sort of different content in different locations, but the customer is expecting, you know, if you're in the US, I'm expecting the US version. But if you're spinning up my browser in France, I might get the French version. Yeah.Paul [00:19:02]: Yeah. That's a good question. Well, generally, like on the localization, there is a thing called locale in the browser. You can set like what your locale is. If you're like in the ENUS browser or not, but some things do IP, IP based routing. And in that case, you may want to have a proxy. 
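The two levers Paul mentions here, the browser's locale and the IP the traffic comes from, map onto ordinary browser-automation settings. A sketch with plain Playwright; the proxy address and credentials are placeholders.

```ts
// Sketch: control what locale the page sees (navigator.language, timezone) and
// where the traffic appears to originate (via a proxy). Plain local Playwright;
// the proxy server and credentials below are placeholders.
import { chromium } from "playwright";

async function usLookingSession() {
  const browser = await chromium.launch({
    proxy: {
      server: "http://proxy.example.com:8000", // exit IP in the region you want
      username: "user",
      password: "secret",
    },
  });
  const context = await browser.newContext({
    locale: "en-US",                 // what sites read from the browser itself
    timezoneId: "America/New_York",  // another common locale signal
  });
  const page = await context.newPage();
  await page.goto("https://www.airbnb.com");
  await browser.close();
}
```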
Like let's say you're running something in the, in Europe, but you want to make sure you're showing up from the US. You may want to use one of our proxy features so you can turn on proxies to say like, make sure these connections always come from the United States, which is necessary too, because when you're browsing the web, you're coming from like a, you know, data center IP, and that can make things a lot harder to browse web. So we do have kind of like this proxy super network. Yeah. We have a proxy for you based on where you're going, so you can reliably automate the web. But if you get scheduled in Europe, that doesn't happen as much. We try and schedule you as close to, you know, your origin that you're trying to go to. But generally you have control over the regions you can put your browsers in. So you can specify West one or East one or Europe. We only have one region of Europe right now, actually. Yeah.Alessio [00:19:55]: What's harder, the browser or the proxy? I feel like to me, it feels like actually proxying reliably at scale. It's much harder than spending up browsers at scale. I'm curious. It's all hard.Paul [00:20:06]: It's layers of hard, right? Yeah. I think it's different levels of hard. I think the thing with the proxy infrastructure is that we work with many different web proxy providers and some are better than others. Some have good days, some have bad days. And our customers who've built browser infrastructure on their own, they have to go and deal with sketchy actors. Like first they figure out their own browser infrastructure and then they got to go buy a proxy. And it's like you can pay in Bitcoin and it just kind of feels a little sus, right? It's like you're buying drugs when you're trying to get a proxy online. We have like deep relationships with these counterparties. We're able to audit them and say, is this proxy being sourced ethically? Like it's not running on someone's TV somewhere. Is it free range? Yeah. Free range organic proxies, right? Right. We do a level of diligence. We're SOC 2. So we have to understand what is going on here. But then we're able to make sure that like we route around proxy providers not working. There's proxy providers who will just, the proxy will stop working all of a sudden. And then if you don't have redundant proxying on your own browsers, that's hard down for you or you may get some serious impacts there. With us, like we intelligently know, hey, this proxy is not working. Let's go to this one. And you can kind of build a network of multiple providers to really guarantee the best uptime for our customers. Yeah. So you don't own any proxies? We don't own any proxies. You're right. The team has been saying who wants to like take home a little proxy server, but not yet. We're not there yet. You know?swyx [00:21:25]: It's a very mature market. I don't think you should build that yourself. Like you should just be a super customer of them. Yeah. Scraping, I think, is the main use case for that. I guess. Well, that leads us into CAPTCHAs and also off, but let's talk about CAPTCHAs. You had a little spiel that you wanted to talk about CAPTCHA stuff.Challenges of Scaling Browser InfrastructurePaul [00:21:43]: Oh, yeah. I was just, I think a lot of people ask, if you're thinking about proxies, you're thinking about CAPTCHAs too. I think it's the same thing. You can go buy CAPTCHA solvers online, but it's the same buying experience. It's some sketchy website, you have to integrate it. 
It's not fun to buy these things and you can't really trust that the docs are bad. What Browserbase does is we integrate a bunch of different CAPTCHAs. We do some stuff in-house, but generally we just integrate with a bunch of known vendors and continually monitor and maintain these things and say, is this working or not? Can we route around it or not? These are CAPTCHA solvers. CAPTCHA solvers, yeah. Not CAPTCHA providers, CAPTCHA solvers. Yeah, sorry. CAPTCHA solvers. We really try and make sure all of that works for you. I think as a dev, if I'm buying infrastructure, I want it all to work all the time and it's important for us to provide that experience by making sure everything does work and monitoring it on our own. Yeah. Right now, the world of CAPTCHAs is tricky. I think AI agents in particular are very much ahead of the internet infrastructure. CAPTCHAs are designed to block all types of bots, but there are now good bots and bad bots. I think in the future, CAPTCHAs will be able to identify who a good bot is, hopefully via some sort of KYC. For us, we've been very lucky. We have very little to no known abuse of Browserbase because we really look into who we work with. And for certain types of CAPTCHA solving, we only allow them on certain types of plans because we want to make sure that we can know what people are doing, what their use cases are. And that's really allowed us to try and be an arbiter of good bots, which is our long term goal. I want to build great relationships with people like Cloudflare so we can agree, hey, here are these acceptable bots. We'll identify them for you and make sure we flag when they come to your website. This is a good bot, you know?Alessio [00:23:23]: I see. And Cloudflare said they want to do more of this. So they're going to set by default, if they think you're an AI bot, they're going to reject. I'm curious if you think this is something that is going to be at the browser level or I mean, the DNS level with Cloudflare seems more where it should belong. But I'm curious how you think about it.Paul [00:23:40]: I think the web's going to change. You know, I think that the Internet as we have it right now is going to change. And we all need to just accept that the cat is out of the bag. And instead of kind of like wishing the Internet was like it was in the 2000s, we can have free content line that wouldn't be scraped. It's just it's not going to happen. And instead, we should think about like, one, how can we change? How can we change the models of, you know, information being published online so people can adequately commercialize it? But two, how do we rebuild applications that expect that AI agents are going to log in on their behalf? Those are the things that are going to allow us to kind of like identify good and bad bots. And I think the team at Clerk has been doing a really good job with this on the authentication side. I actually think that auth is the biggest thing that will prevent agents from accessing stuff, not captchas. And I think there will be agent auth in the future. I don't know if it's going to happen from an individual company, but actually authentication providers that have a, you know, hidden login as agent feature, which will then you put in your email, you'll get a push notification, say like, hey, your browser-based agent wants to log into your Airbnb. You can approve that and then the agent can proceed. That really circumvents the need for captchas or logging in as you and sharing your password. 
I think agent auth is going to be one way we identify good bots going forward. And I think a lot of this captcha solving stuff is really short-term problems as the internet kind of reorients itself around how it's going to work with agents browsing the web, just like people do. Yeah.Managing Distributed Browser Locations and Proxiesswyx [00:24:59]: Stitch recently was on Hacker News for talking about agent experience, AX, which is a thing that Netlify is also trying to clone and coin and talk about. And we've talked about this on our previous episodes before in a sense that I actually think that's like maybe the only part of the tech stack that needs to be kind of reinvented for agents. Everything else can stay the same, CLIs, APIs, whatever. But auth, yeah, we need agent auth. And it's mostly like short-lived, like it should not, it should be a distinct, identity from the human, but paired. I almost think like in the same way that every social network should have your main profile and then your alt accounts or your Finsta, it's almost like, you know, every, every human token should be paired with the agent token and the agent token can go and do stuff on behalf of the human token, but not be presumed to be the human. Yeah.Paul [00:25:48]: It's like, it's, it's actually very similar to OAuth is what I'm thinking. And, you know, Thread from Stitch is an investor, Colin from Clerk, Octaventures, all investors in browser-based because like, I hope they solve this because they'll make browser-based submission more possible. So we don't have to overcome all these hurdles, but I think it will be an OAuth-like flow where an agent will ask to log in as you, you'll approve the scopes. Like it can book an apartment on Airbnb, but it can't like message anybody. And then, you know, the agent will have some sort of like role-based access control within an application. Yeah. I'm excited for that.swyx [00:26:16]: The tricky part is just, there's one, one layer of delegation here, which is like, you're authoring my user's user or something like that. I don't know if that's tricky or not. Does that make sense? Yeah.Paul [00:26:25]: You know, actually at Twilio, I worked on the login identity and access. Management teams, right? So like I built Twilio's login page.swyx [00:26:31]: You were an intern on that team and then you became the lead in two years? Yeah.Paul [00:26:34]: Yeah. I started as an intern in 2016 and then I was the tech lead of that team. How? That's not normal. I didn't have a life. He's not normal. Look at this guy. I didn't have a girlfriend. I just loved my job. I don't know. I applied to 500 internships for my first job and I got rejected from every single one of them except for Twilio and then eventually Amazon. And they took a shot on me and like, I was getting paid money to write code, which was my dream. Yeah. Yeah. I'm very lucky that like this coding thing worked out because I was going to be doing it regardless. And yeah, I was able to kind of spend a lot of time on a team that was growing at a company that was growing. So it informed a lot of this stuff here. I think these are problems that have been solved with like the SAML protocol with SSO. I think it's a really interesting stuff with like WebAuthn, like these different types of authentication, like schemes that you can use to authenticate people. The tooling is all there. It just needs to be tweaked a little bit to work for agents. And I think the fact that there are companies that are already. 
Providing authentication as a service really sets it up. Well, the thing that's hard is like reinventing the internet for agents. We don't want to rebuild the internet. That's an impossible task. And I think people often say like, well, we'll have this second layer of APIs built for agents. I'm like, we will for the top use cases, but instead of we can just tweak the internet as is, which is on the authentication side, I think we're going to be the dumb ones going forward. Unfortunately, I think AI is going to be able to do a lot of the tasks that we do online, which means that it will be able to go to websites, click buttons on our behalf and log in on our behalf too. So with this kind of like web agent future happening, I think with some small structural changes, like you said, it feels like it could all slot in really nicely with the existing internet.Handling CAPTCHAs and Agent Authenticationswyx [00:28:08]: There's one more thing, which is the, your live view iframe, which lets you take, take control. Yeah. Obviously very key for operator now, but like, was, is there anything interesting technically there or that the people like, well, people always want this.Paul [00:28:21]: It was really hard to build, you know, like, so, okay. Headless browsers, you don't see them, right. They're running. They're running in a cloud somewhere. You can't like look at them. And I just want to really make, it's a weird name. I wish we came up with a better name for this thing, but you can't see them. Right. But customers don't trust AI agents, right. At least the first pass. So what we do with our live view is that, you know, when you use browser base, you can actually embed a live view of the browser running in the cloud for your customer to see it working. And that's what the first reason is the build trust, like, okay, so I have this script. That's going to go automate a website. I can embed it into my web application via an iframe and my customer can watch. I think. And then we added two way communication. So now not only can you watch the browser kind of being operated by AI, if you want to pause and actually click around type within this iframe that's controlling a browser, that's also possible. And this is all thanks to some of the lower level protocol, which is called the Chrome DevTools protocol. It has a API called start screencast, and you can also send mouse clicks and button clicks to a remote browser. And this is all embeddable within iframes. You have a browser within a browser, yo. And then you simulate the screen, the click on the other side. Exactly. And this is really nice often for, like, let's say, a capture that can't be solved. You saw this with Operator, you know, Operator actually uses a different approach. They use VNC. So, you know, you're able to see, like, you're seeing the whole window here. What we're doing is something a little lower level with the Chrome DevTools protocol. It's just PNGs being streamed over the wire. But the same thing is true, right? Like, hey, I'm running a window. Pause. Can you do something in this window? Human. Okay, great. Resume. Like sometimes 2FA tokens. Like if you get that text message, you might need a person to type that in. Web agents need human-in-the-loop type workflows still. You still need a person to interact with the browser. And building a UI to proxy that is kind of hard. You may as well just show them the whole browser and say, hey, can you finish this up for me? And then let the AI proceed on afterwards. 
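The mechanism Paul is describing, streaming frames out of a headless browser and pushing input back in, is exposed by the Chrome DevTools Protocol. A minimal sketch using Playwright only to obtain a raw CDP session; a real live view would forward frames to the client over something like a WebSocket.

```ts
// Sketch of a live view over the Chrome DevTools Protocol: Page.startScreencast
// streams frames (each one must be acknowledged), and Input.dispatchMouseEvent
// replays clicks coming back from the viewer.
import { chromium } from "playwright";

async function liveView() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  const cdp = await page.context().newCDPSession(page);

  cdp.on("Page.screencastFrame", async ({ data, sessionId }) => {
    // `data` is a base64-encoded frame; forward it to your UI in a real app.
    console.log(`frame received: ${data.length} base64 chars`);
    await cdp.send("Page.screencastFrameAck", { sessionId });
  });
  await cdp.send("Page.startScreencast", { format: "png", maxWidth: 1280, maxHeight: 720 });

  // A click relayed from the human viewer becomes two dispatched mouse events.
  const click = { x: 200, y: 150, button: "left" as const, clickCount: 1 };
  await cdp.send("Input.dispatchMouseEvent", { type: "mousePressed", ...click });
  await cdp.send("Input.dispatchMouseEvent", { type: "mouseReleased", ...click });

  await browser.close();
}
```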
Is there a future where I stream my current desktop to Browserbase? I don't think so. I think we're very much cloud infrastructure. Yeah. You know, but I think a lot of the stuff we're doing, we do want to, like, build tools. Like, you know, we'll talk about the Stagehand, you know, web agent framework in a second. But, like, there's a case where a lot of people are going desktop first for, you know, consumer use. And I think Claude is doing a lot of this, where I expect to see, you know, MCPs really oriented around the Claude desktop app for a reason, right? Like, I think a lot of these tools are going to run on your computer because it makes... I think it's breaking out. People are putting it on a server. Oh, really? Okay. Well, sweet. We'll see. We'll see that. I was surprised, though, wasn't I? I think that the Browser Company, too, with Dia Browser, it runs on your machine. You know, it's going to be...swyx [00:30:50]: What is it?Paul [00:30:51]: So, Dia Browser, as far as I understand... I used to use Arc. Yeah. I haven't used Arc. But I'm a big fan of the Browser Company. I think they're doing a lot of cool stuff in consumer. As far as I understand, it's a browser where you have a sidebar where you can, like, chat with it and it can control the local browser on your machine. So, if you imagine, like, what a consumer web agent is, which lives alongside your browser, I think Google Chrome has Project Mariner, I think. I almost call it Project Marinara for some reason. I don't know why. It's...swyx [00:31:17]: No, I think it's someone really likes Waterworld. Oh, I see. The classic Kevin Costner. Yeah.Paul [00:31:22]: Okay. Project Marinara is a similar thing to the Dia Browser, in my mind, as far as I understand it. You have a browser that has an AI interface that will take over your mouse and keyboard and control the browser for you. Great for consumer use cases. But if you're building applications that rely on a browser and it's more part of a greater, like, AI app experience, you probably need something that's more like infrastructure, not a consumer app.swyx [00:31:44]: Just because I have explored a little bit in this area, do people want branching? So, I have the state of whatever my browser's in. And then I want, like, 100 clones of this state. Do people do that? Or...Paul [00:31:56]: People don't do it currently. Yeah. But it's definitely something we're thinking about. I think the idea of forking a browser is really cool. Technically, kind of hard. We're starting to see this in code execution, where people are, like, forking some, like, code execution, like, processes or forking some tool calls or branching tool calls. Haven't seen it at the browser level yet. But it makes sense. Like, if an AI agent is, like, using a website and it's not sure what path it wants to take to crawl this website to find the information it's looking for, it would make sense for it to explore both paths in parallel. And that'd be a very, like... A road not taken. Yeah. And hopefully find the right answer. And then say, okay, this was actually the right one. And memorize that. And go there in the future. On the roadmap. For sure. Don't make my roadmap, please. You know?Alessio [00:32:37]: How do you actually do that? Yeah. How do you fork? I feel like the browser is so stateful for so many things.swyx [00:32:42]: Serialize the state. Restore the state. I don't know.Paul [00:32:44]: So, it's one of the reasons why we haven't done it yet. It's hard. You know?
Like, to truly fork, it's actually quite difficult. The naive way is to open the same page in a new tab and then, like, hope that it's at the same thing. But if you have a form halfway filled, you may have to, like, take the whole, you know, container. Pause it. All the memory. Duplicate it. Restart it from there. It could be very slow. So, we haven't found a thing. Like, the easy thing to fork is just, like, copy the page object. You know? But I think there needs to be something a little bit more robust there. Yeah.swyx [00:33:12]: So, MorphLabs has this infinite branch thing. Like, wrote a custom fork of Linux or something that let them save the system state and clone it. MorphLabs, hit me up. I'll be a customer. Yeah. That's the only. I think that's the only way to do it. Yeah. Like, unless Chrome has some special API for you. Yeah.Paul [00:33:29]: There's probably something we'll reverse engineer one day. I don't know. Yeah.Alessio [00:33:32]: Let's talk about StageHand, the AI web browsing framework. You have three core components, Observe, Extract, and Act. Pretty clean landing page. What was the idea behind making a framework? Yeah.Stagehand: AI web browsing frameworkPaul [00:33:43]: So, there's three frameworks that are very popular or already exist, right? Puppeteer, Playwright, Selenium. Those are for building hard-coded scripts to control websites. And as soon as I started to play with LLMs plus browsing, I caught myself, you know, code-genning Playwright code to control a website. I would, like, take the DOM. I'd pass it to an LLM. I'd say, can you generate the Playwright code to click the appropriate button here? And it would do that. And I was like, this really should be part of the frameworks themselves. And I became really obsessed with SDKs that take natural language as part of, like, the API input. And that's what StageHand is. StageHand exposes three APIs, and it's a super set of Playwright. So, if you go to a page, you may want to take an action, click on the button, fill in the form, etc. That's what the act command is for. You may want to extract some data. This one takes a natural language, like, extract the winner of the Super Bowl from this page. You can give it a Zod schema, so it returns a structured output. And then maybe you're building an API. You can do an agent loop, and you want to kind of see what actions are possible on this page before taking one. You can do observe. So, you can observe the actions on the page, and it will generate a list of actions. You can guide it, like, give me actions on this page related to buying an item. And you can, like, buy it now, add to cart, view shipping options, and pass that to an LLM, an agent loop, to say, what's the appropriate action given this high-level goal? So, StageHand isn't a web agent. It's a framework for building web agents. And we think that agent loops are actually pretty close to the application layer because every application probably has different goals or different ways it wants to take steps. I don't think I've seen a generic. Maybe you guys are the experts here. I haven't seen, like, a really good AI agent framework here. Everyone kind of has their own special sauce, right? I see a lot of developers building their own agent loops, and they're using tools. And I view StageHand as the browser tool. So, we expose act, extract, observe. Your agent can call these tools. And from that, you don't have to worry about it. You don't have to worry about generating playwright code performantly. 
You don't have to worry about running it. You can kind of just integrate these three tool calls into your agent loop and reliably automate the web.swyx [00:35:48]: A special shout-out to Anirudh, who I met at your dinner, who I think listens to the pod. Yeah. Hey, Anirudh.Paul [00:35:54]: Anirudh's the man. He's a StageHand guy.swyx [00:35:56]: I mean, the interesting thing about each of these APIs is they're kind of each startup. Like, specifically extract, you know, Firecrawl is extract. There's, like, Expand AI. There's a whole bunch of, like, extract companies. They just focus on extract. I'm curious. Like, I feel like you guys are going to collide at some point. Like, right now, it's friendly. Everyone's in a blue ocean. At some point, it's going to be valuable enough that there's some turf battle here. I don't think you have a dog in this fight. I think you can mock extract to use an external service if they're better at it than you. But it's just an observation that, like, in the same way that I see each option, each checkbox in the side of custom GPTs becoming a startup or each box in the Karpathy chart being a startup. Like, this is also becoming a thing. Yeah.Paul [00:36:41]: I mean, like, so the way StageHand works is that it's MIT-licensed, completely open source. You bring your own API key to your LLM of choice. You could choose your LLM. We don't make any money off of the extract, really. We only really make money if you choose to run it with our browser. You don't have to. You can actually use your own browser, a local browser. You know, StageHand is completely open source for that reason. And, yeah, like, I think if you're building really complex web scraping workflows, I don't know if StageHand is the tool for you. I think it's really more if you're building an AI agent that needs a few general tools or if it's doing a lot of, like, web automation-intensive work. But if you're building a scraping company, StageHand is not your thing. You probably want something that's going to, like, get HTML content, you know, convert that to Markdown, query it. That's not what StageHand does. StageHand is more about reliability. I think we focus a lot on reliability and less so on cost optimization and speed at this point.swyx [00:37:33]: I actually feel like StageHand, so the way that StageHand works, it's like, you know, page.act, click on the quick start. Yeah. It's kind of the integration test for the code that you would have to write anyway, like the Puppeteer code that you have to write anyway. And when the page structure changes, because it always does, then this is still the test. This is still the test that I would have to write. Yeah. So it's kind of like a testing framework that doesn't need implementation detail.Paul [00:37:56]: Well, yeah. I mean, Puppeteer, Playwright, and Selenium were all designed as testing frameworks, right? Yeah. And now people are, like, hacking them together to automate the web. I would say, and, like, maybe this is, like, me being too specific. But, like, when I write tests, if the page structure changes without me knowing, I want that test to fail. So I don't know if, like, AI, like, regenerating that. Like, people are using StageHand for testing. But it's more for, like, usability testing, not, like, testing of, like, does the front end, like, has it changed or not. Okay. But generally where we've seen people, like, really, like, take off is, like, if they're using, you know, something.
If they want to build a feature in their application that's kind of like Operator or Deep Research, they're using StageHand to kind of power that tool calling in their own agent loop. Okay. Cool.swyx [00:38:37]: So let's go into Operator, the first big agent launch of the year from OpenAI. Seems like they have a whole bunch scheduled. You were on break and your phone blew up. What's your just general view of computer use agents, as they're calling it? The overall category before we go into Open Operator, just the overall promise of Operator. I will observe that I tried it once. It was okay. And I never tried it again.OpenAI's Operator and computer use agentsPaul [00:38:58]: That tracks with my experience, too. Like, I'm a huge fan of the OpenAI team. Like, I do not view Operator as a company killer for Browserbase at all. I think it actually shows people what's possible. I think, like, computer use models make a lot of sense. And what I'm actually most excited about with computer use models is, like, their ability to really take screenshots and reason and output steps. I think that using mouse clicks or mouse coordinates, I've seen that prove to be less reliable than I would like. And I just wonder if that's the right form factor. What we've done with our framework is anchor it to the DOM itself, anchor it to the actual item. So, like, if it's clicking on something, it's clicking on that thing, you know? Like, it's more accurate. No matter where it is. Yeah, exactly. Because it really ties in nicely. And it can handle, like, the whole viewport in one go, whereas, like, Operator can only handle what it sees. Can you hover? Is hovering a thing that you can do? I don't know if we expose it as a tool directly, but I'm sure there's, like, an API for hovering. Like, move mouse to this position. Yeah, yeah, yeah. I think you can trigger hover, like, via, like, the JavaScript on the DOM itself. But, no, I think, like, when we saw computer use, everyone's eyes lit up because they realized, like, wow, like, AI is going to actually automate work for people. And I think seeing that kind of happen from both of the labs, and I'm sure we're going to see more labs launch computer use models, I'm excited to see all the stuff that people build with it. I think that I'd love to see computer use power, like, controlling a browser on Browserbase. And I think, like, Open Operator, which was, like, our open source version of OpenAI's Operator, was our first take on, like, how can we integrate these models into Browserbase? And we handle the infrastructure and let the labs do the models. I don't have a sense that Operator will be released as an API. I don't know. Maybe it will. I'm curious to see how well that works because I think it's going to be really hard for a company like OpenAI to do things like support CAPTCHA solving or, like, have proxies. Like, I think it's hard for them structurally. Imagine this New York Times headline: OpenAI solves CAPTCHAs. Like, that would be a pretty bad headline. This New York Times headline: Browserbase solves CAPTCHAs? No one cares. No one cares. And, like, our investors are bored. Like, we're all okay with this, you know? We're building this company knowing that the CAPTCHA solving is short-lived until we figure out how to authenticate good bots. I think it's really hard for a company like OpenAI, who has this brand that's so, so good, to balance with, like, the icky parts of web automation, which can be kind of complex to solve.
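Backing up to the three Stagehand primitives Paul described a moment ago (act, extract, observe), here is a rough sketch of how they read in code. Method names follow the public Stagehand docs, but exact signatures and options may differ between releases, so treat this as illustrative rather than canonical; the product URL is a placeholder.

```ts
// Illustrative sketch of Stagehand's act / extract / observe primitives.
// Signatures are approximate and may differ from the current release; the
// product URL is a placeholder.
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

async function main() {
  const stagehand = new Stagehand({ env: "LOCAL" }); // or point it at a hosted browser
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://shop.example/product/123");

  // act: describe the action in natural language instead of hard-coding a selector
  await page.act("add the item to the cart");

  // extract: natural-language instruction plus a Zod schema for structured output
  const item = await page.extract({
    instruction: "extract the product name and price",
    schema: z.object({ name: z.string(), price: z.string() }),
  });

  // observe: list candidate actions on the page, e.g. to feed an agent loop
  const actions = await page.observe("actions related to checking out");

  console.log(item, actions);
  await stagehand.close();
}
```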
I'm sure OpenAI knows who to call whenever they need you. Yeah, right. I'm sure they'll have a great partnership.Alessio [00:41:23]: And is Open Operator just, like, a marketing thing for you? Like, how do you think about resource allocation? So, you can spin this up very quickly. And now there's all this, like, open deep research, just open all these things that people are building. We started it, you know. You're the original Open. We're the original Open operator, you know? Is it just, hey, look, this is a demo, but, like, we'll help you build out an actual product for yourself? Like, are you interested in going more of a product route? That's kind of the OpenAI way, right? They started as a model provider and then…Paul [00:41:53]: Yeah, we're not interested in going the product route yet. I view Open Operator as a model provider. It's a reference project, you know? Let's show people how to build these things using the infrastructure and models that are out there. And that's what it is. It's, like, Open Operator is very simple. It's an agent loop. It says, like, take a high-level goal, break it down into steps, use tool calling to accomplish those steps. It takes screenshots and feeds those screenshots into an LLM with the step to generate the right action. It uses stagehand under the hood to actually execute this action. It doesn't use a computer use model. And it, like, has a nice interface using the live view that we talked about, the iframe, to embed that into an application. So I felt like people on launch day wanted to figure out how to build their own version of this. And we turned that around really quickly to show them. And I hope we do that with other things like deep research. We don't have a deep research launch yet. I think David from AOMNI actually has an amazing open deep research that he launched. It has, like, 10K GitHub stars now. So he's crushing that. But I think if people want to build these features natively into their application, they need good reference projects. And I think Open Operator is a good example of that.swyx [00:42:52]: I don't know. Actually, I'm actually pretty bullish on API-driven operator. Because that's the only way that you can sort of, like, once it's reliable enough, obviously. And now we're nowhere near. But, like, give it five years. It'll happen, you know. And then you can sort of spin this up and browsers are working in the background and you don't necessarily have to know. And it just is booking restaurants for you, whatever. I can definitely see that future happening. I had this on the landing page here. This might be a slightly out of order. But, you know, you have, like, sort of three use cases for browser base. Open Operator. Or this is the operator sort of use case. It's kind of like the workflow automation use case. And it completes with UiPath in the sort of RPA category. Would you agree with that? Yeah, I would agree with that. And then there's Agents we talked about already. And web scraping, which I imagine would be the bulk of your workload right now, right?Paul [00:43:40]: No, not at all. I'd say actually, like, the majority is browser automation. We're kind of expensive for web scraping. Like, I think that if you're building a web scraping product, if you need to do occasional web scraping or you have to do web scraping that works every single time, you want to use browser automation. Yeah. You want to use browser-based. But if you're building web scraping workflows, what you should do is have a waterfall. 
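A rough sketch of that waterfall as Paul lays it out next: plain HTTP first, then a scraping-specific API, and only then a real browser. `scrapingApiFetch` is a hypothetical stand-in for any third-party scraping API.

```ts
// Sketch of a scraping waterfall: cheapest option first, real browser last.
// `scrapingApiFetch` is a hypothetical stand-in for a third-party scraping API.
import { chromium } from "playwright";

async function getPageHtml(url: string): Promise<string> {
  // 1. Plain HTTP request: works for static pages and costs almost nothing.
  try {
    const res = await fetch(url);
    const html = await res.text();
    if (res.ok && html.length > 0) return html;
  } catch {
    /* fall through to the next tier */
  }

  // 2. A scraping-specific API (hypothetical helper).
  try {
    return await scrapingApiFetch(url);
  } catch {
    /* fall through to the next tier */
  }

  // 3. Heavy hitter: load the page in a real browser so JavaScript can hydrate it.
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle" });
    return await page.content();
  } finally {
    await browser.close();
  }
}

declare function scrapingApiFetch(url: string): Promise<string>;
```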
You should have the first request be a curl to the website. See if you can get it without even using a browser. And then the second request may be, like, a scraping-specific API. There's, like, a thousand scraping APIs out there that you can use to try and get data. ScrapingBee. ScrapingBee is a great example, right? Yeah. And then, like, if those two don't work, bring out the heavy hitter. Like, Browserbase will 100% work, right? It will load the page in a real browser, hydrate it. I see.swyx [00:44:21]: Because a lot of pages don't render without JS.swyx [00:44:25]: Yeah, exactly.Paul [00:44:26]: So, I mean, the three big use cases, right? Like, you know, automation, web data collection, and then, you know, if you're building anything agentic that needs, like, a browser tool, you want to use Browserbase.Alessio [00:44:35]: Is there any use case that, like, you were super surprised by that people might not even think about? Oh, yeah. Or is it, yeah, anything that you can share? The long tail is crazy. Yeah.Surprising use cases of BrowserbasePaul [00:44:44]: One of the case studies on our website that I think is the most interesting is this company called Benny. So, the way that it works is if you're on food stamps in the United States, you can actually get rebates if you buy certain things. Yeah. You buy some vegetables. You submit your receipt to the government. They'll give you a little rebate back. Say, hey, thanks for buying vegetables. It's good for you. That process of submitting that receipt is very painful. And the way Benny works is you use their app to take a photo of your receipt, and then Benny will go submit that receipt for you and then deposit the money into your account. That's actually using no AI at all. It's all, like, hard-coded scripts. They maintain the scripts. They've been doing a great job. And they build this amazing consumer app. But it's an example of, like, all these, like, tedious workflows that people have to do to kind of go about their business. And they're doing it for the sake of their day-to-day lives. And I had never known about, like, food stamp rebates or the complex forms you have to fill to get them. But the world is powered by millions and millions of tedious forms, visas. You know, Emirate Lighthouse is a customer, right? You know, they do the O1 visa. Millions and millions of forms are taking away humans' time. And I hope that Browserbase can help power software that automates away the web forms that we don't need anymore. Yeah.swyx [00:45:49]: I mean, I'm very supportive of that. I mean, forms. I do think, like, government itself is a big part of it. I think the government itself should embrace AI more to do more sort of human-friendly form filling. Mm-hmm. But I'm not optimistic. I'm not holding my breath. Yeah. We'll see. Okay. I think I'm about to zoom out. I have a little brief thing on computer use, and then we can talk about founder stuff, which is, I tend to think of developer tooling markets in impossible triangles, where everyone starts in a niche, and then they start to branch out. So I already hinted at a little bit of this, right? We mentioned Morph. We mentioned E2B. We mentioned Firecrawl. And then there's Browserbase. So there's, like, all this stuff of, like, have a serverless virtual computer that you give to an agent and let them do stuff with it. And there's various ways of connecting it to the internet. You can just connect to a search API, like SERP API, whatever other, like, EXA is another one. That's what you're searching.
You can also have a JSON markdown extractor, which is Firecrawl. Or you can have a virtual browser like Browserbase, or you can have a virtual machine like Morph. And then there's also maybe, like, a virtual sort of code environment, like Code Interpreter. So, like, there's just, like, a bunch of different ways to tackle the problem of give a computer to an agent. And I'm just kind of wondering if you see, like, everyone's just, like, happily coexisting in their respective niches. And as a developer, I just go and pick, like, a shopping basket of one of each. Or do you think that you eventually, people will collide?Future of browser automation and market competitionPaul [00:47:18]: I think that currently it's not a zero-sum market. Like, I think we're talking about... I think we're talking about all of knowledge work that people do that can be automated online. All of these, like, trillions of hours that happen online where people are working. And I think that there's so much software to be built that, like, I tend not to think about how these companies will collide. I just try to solve the problem as best as I can and make this specific piece of infrastructure, which I think is an important primitive, the best I possibly can. And yeah. I think there's players that are actually going to like it. I think there's players that are going to launch, like, over-the-top, you know, platforms, like agent platforms that have all these tools built in, right? Like, who's building the rippling for agent tools that has the search tool, the browser tool, the operating system tool, right? There are some. There are some. There are some, right? And I think in the end, what I have seen as my time as a developer, and I look at all the favorite tools that I have, is that, like, for tools and primitives with sufficient levels of complexity, you need to have a solution that's really bespoke to that primitive, you know? And I am sufficiently convinced that the browser is complex enough to deserve a primitive. Obviously, I have to. I'm the founder of BrowserBase, right? I'm talking my book. But, like, I think maybe I can give you one spicy take against, like, maybe just whole OS running. I think that when I look at computer use when it first came out, I saw that the majority of use cases for computer use were controlling a browser. And do we really need to run an entire operating system just to control a browser? I don't think so. I don't think that's necessary. You know, BrowserBase can run browsers for way cheaper than you can if you're running a full-fledged OS with a GUI, you know, operating system. And I think that's just an advantage of the browser. It is, like, browsers are little OSs, and you can run them very efficiently if you orchestrate it well. And I think that allows us to offer 90% of the, you know, functionality in the platform needed at 10% of the cost of running a full OS. Yeah.Open Operator: Browserbase's Open-Source Alternativeswyx [00:49:16]: I definitely see the logic in that. There's a Mark Andreessen quote. I don't know if you know this one. Where he basically observed that the browser is turning the operating system into a poorly debugged set of device drivers, because most of the apps are moved from the OS to the browser. So you can just run browsers.Paul [00:49:31]: There's a place for OSs, too. Like, I think that there are some applications that only run on Windows operating systems. 
And Eric from pig.dev in this upcoming YC batch, or last YC batch, like, he's building all run tons of Windows operating systems for you to control with your agent. And like, there's some legacy EHR systems that only run on Internet-controlled systems. Yeah.Paul [00:49:54]: I think that's it. I think, like, there are use cases for specific operating systems for specific legacy software. And like, I'm excited to see what he does with that. I just wanted to give a shout out to the pig.dev website.swyx [00:50:06]: The pigs jump when you click on them. Yeah. That's great.Paul [00:50:08]: Eric, he's the former co-founder of banana.dev, too.swyx [00:50:11]: Oh, that Eric. Yeah. That Eric. Okay. Well, he abandoned bananas for pigs. I hope he doesn't start going around with pigs now.Alessio [00:50:18]: Like he was going around with bananas. A little toy pig. Yeah. Yeah. I love that. What else are we missing? I think we covered a lot of, like, the browser-based product history, but. What do you wish people asked you? Yeah.Paul [00:50:29]: I wish people asked me more about, like, what will the future of software look like? Because I think that's really where I've spent a lot of time about why do browser-based. Like, for me, starting a company is like a means of last resort. Like, you shouldn't start a company unless you absolutely have to. And I remain convinced that the future of software is software that you're going to click a button and it's going to do stuff on your behalf. Right now, software. You click a button and it maybe, like, calls it back an API and, like, computes some numbers. It, like, modifies some text, whatever. But the future of software is software using software. So, I may log into my accounting website for my business, click a button, and it's going to go load up my Gmail, search my emails, find the thing, upload the receipt, and then comment it for me. Right? And it may use it using APIs, maybe a browser. I don't know. I think it's a little bit of both. But that's completely different from how we've built software so far. And that's. I think that future of software has different infrastructure requirements. It's going to require different UIs. It's going to require different pieces of infrastructure. I think the browser infrastructure is one piece that fits into that, along with all the other categories you mentioned. So, I think that it's going to require developers to think differently about how they've built software for, you know
Hong Minhee is an open source developer and the creator of the Fedify ActivityPub server framework. We talk about how applications like Mastodon and Misskey communicate with one another using ActivityPub. This includes discussions on built-in activities, extending the specification in a backwards compatible way, difficulties implementing JSON-LD, the inbox model, and his experience implementing the specification. Hong Minhee: activitypub profile fedify hollo Specifications: ActivityPub W3C specification JSON Linked Data Resource Description Framework W3C Semantic Web Standards ActivityPub and WebFinger ActivityPub and HTTP Signatures ActivityPub implementations: Mastodon Misskey Akkoma Pleroma Pixelfed Lemmy Loops GoToSocial ActivityPub support in Ghost Threads has entered the Fediverse ActivityPub tools: ActivityPub Academy BrowserPub fedify CLI -- Transcript You can help correct transcripts on GitHub. What's ActivityPub? [00:00:00] Jeremy: Today, I'm talking to Hong Minhee. He is the developer of Fedify, a TypeScript library for building ActivityPub server applications. The first thing I think we should start with is defining ActivityPub. What is ActivityPub? [00:00:16] Hong: ActivityPub is the protocol that lets social networks talk to each other and it's officially recommended by W3C. It's what powers this thing we call the Fediverse, which is basically a way for different social media platforms to work together. Users of ActivityPub [00:00:39] Jeremy: Can you give some examples that people might have heard of -- either users of ActivityPub or things that are a part of this fediverse? [00:00:50] Hong: Mastodon is probably the biggest one out there. And you know what's interesting? Meta's Threads has actually started implementing ActivityPub this summer. So it's still pretty much a one way street right now. In East Asia, especially Japan, there's this really popular microblogging platform called Misskey. It's got so many forks that people actually joke around and call them forkeys. But it's not just about Twitter style microblogging; there's Pixelfed, which is kind of like Instagram, but for the fediverse. And those same folks recently launched Loops, which is basically doing what TikTok does, but in the Fediverse. Then you've got stuff like Lemmy, which is doing the Reddit thing in the Fediverse. [00:02:00] Jeremy: Oh like Reddit. [00:02:01] Hong: Yeah. There's so much more out there that I haven't even mentioned. Um, most of it is open source, which is pretty cool. [00:02:13] Jeremy: So the first few examples you gave, Mastodon and Meta's Threads, they're very similar to Twitter, right? So that's what you were calling the microblogging applications. And I think what you had said, which is a little bit interesting, is you had said Meta's Threads is only one way. So could you kind of describe what you mean by that? [00:02:37] Hong: Currently Meta's Threads can only be followed by other ActivityPub applications, but you cannot follow other people in the fediverse from Threads. [00:02:55] Jeremy: People who are using another microblogging platform like Mastodon can follow someone on Meta's Threads platform. But the other way is not true. If you're on Threads, you can't follow someone on Mastodon. [00:03:07] Hong: Yes, that's right. [00:03:09] Jeremy: And that's not a limitation of the protocol itself. That's a design decision or a decision made by Meta. [00:03:17] Hong: Yeah. They are slowly implementing ActivityPub and I hope they will implement complete ActivityPub in the future. 
Interoperability through Activities [00:03:27] Jeremy: And then the other examples you gave, one is I believe it was Pixelfed, which is very similar to Instagram. And then the last example you gave was, I think, Lemmy, which you said is similar to Reddit. Because you mentioned the term Fediverse before and you mentioned that these all use ActivityPub, and since these seem like different kinds of applications, what does it mean for them to interact? Because with Mastodon and Threads I can kind of understand, because they're both similar to Twitter. So you're posting messages and replying, but what does it mean, for example, for someone on Mastodon to interact with someone on Lemmy, which is like Reddit, because they seem very different. [00:04:16] Hong: People in Lemmy and Mastodon are called actors and can follow each other. They have interactions between them called activities. And there are several types of activities like create and follow and undo, like, and so on. So, ActivityPub applications tend to use this vocabulary to implement their features. So, for example, Lemmy uses like activities for upvoting and dislike activities for downvoting, and it's translated to likes in Mastodon. So if you submit a post on Lemmy and it shows up on your Mastodon timeline, liking that post upvotes it in Lemmy. [00:05:36] Jeremy: And probably similarly with Pixelfed, which you said is like Instagram, if you follow someone's Pixelfed account in Mastodon and they post a photo in Pixelfed, you would see it as a post in Mastodon natively and you could give it a like there. Adding activities or properties [00:05:56] Jeremy: And these activities that you mentioned -- so the like and the dislike, are those part of ActivityPub itself? [00:06:05] Hong: Yes, and this vocabulary can be extended. [00:06:10] Jeremy: So you can add additional actions (activities), or are you adding information (properties) to the existing actions? [00:06:37] Hong: It is called the activity vocabulary, and there are things like accept, add, arrive, block, create, delete, dislike, flag, follow, ignore, invite, join, and so on. So, basically, almost everything you need to build social media is already there in the vocabulary, but if you want to extend some more, you can define your own vocabulary. [00:06:56] Jeremy: Most of the things that an Instagram or a Twitter, or a Reddit would need is already there. But you're saying that you can have your own vocabulary. So if there's an action or an activity that is not covered by the specification, you can create one yourself. [00:07:13] Hong: Yes. For example, Misskey and Pleroma defined an emoji react activity to represent emoji reactions. [00:07:25] Jeremy: Because the systems can extend the vocabulary. What are some other examples of cases where Mastodon or any other of these systems has found that the existing vocabulary is not enough. What are some other examples of applications extending it? [00:07:45] Hong: For example, Mastodon defined the suspended property. These are not activities, but they are properties. ActivityPub consists of several types of objects; there are activities and normal objects like article. They can have properties, and there are several existing properties, but they can also be extended. So Mastodon extended some properties they need. So for example, they defined suspended or discoverable. [00:08:44] Hong: Suspended tells whether an actor is suspended by moderators. 
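To make the earlier Lemmy-to-Mastodon example concrete, here is a rough sketch of what such a Like activity looks like on the wire. This is a hedged illustration rather than a payload captured from any real server; the field names follow the ActivityStreams vocabulary, and the actor names and URLs are invented.

```ts
// A Like activity as a TypeScript object literal (what an upvote from Lemmy
// might look like when it reaches a Mastodon server). URLs are made up.
const like = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Like",                                       // Lemmy sends Like for an upvote
  id: "https://lemmy.example/activities/like/123",    // every activity has a dereferenceable id
  actor: "https://lemmy.example/u/john",               // who upvoted/liked
  object: "https://mastodon.example/@alice/11122233",  // the post being liked
};
```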
Discoverable, another property Mastodon extended, tells whether an actor wants to be searched and indexed, and there are many more properties Mastodon extended. Actors [00:09:12] Jeremy: And these are properties of the actor. These are properties of the user? [00:09:19] Hong: Yes. Actors. [00:09:21] Jeremy: Cause I think earlier you mentioned that the concept of a user is an actor, and it sounds like what you're saying is an actor can have all these properties. There's probably a username and things like that, but Mastodon has extended the properties so that you can have a property on whether you wanna be searched or indexed, and you can have a property that says you're suspended. So I guess your account is still there, but can't be used anymore. Something we should probably talk about then is, so you have these actors, you have these activities that I'm assuming the actors are performing on one another. What does that data look like and what does the communication look like? [00:10:09] Hong: Actors have their own dereferenceable URI, and when you look up that URI you get all the info about the actor in JSON-LD format. [00:10:22] Jeremy: JSON-LD? [00:10:23] Hong: Yeah. JSON-LD, linked data. The actor has all the stuff you expect to find on a social account: name, bio, URL to the profile page, profile picture, header image and more. And there are five main types of actors: application, group, organization, person and service. And you know how sometimes on Mastodon you will see an account marked as a bot? [00:10:58] Jeremy: A bot? [00:10:59] Hong: Yeah. Bot, and that's what an actor of type service looks like. And the ActivityPub spec actually lets you create other types beyond these five. But I haven't seen anyone actually do that yet. JSON-LD [00:11:15] Jeremy: And you mentioned that these are all JSON objects, but the LD part, the linked data part, I'm not familiar with. So what's different about the linked data part of the JSON? [00:11:31] Hong: So JSON-LD is a special way of writing RDF, which was originally used in the semantic web. Usually RDF uses a format that is called triples. [00:11:48] Jeremy: Triples? [00:11:49] Hong: Yeah, subject and predicate and object. [00:11:55] Jeremy: Subject, predicate, object. Can you give an example of what those three would be? [00:12:00] Hong: For example, "John is a person" -- that's a triple. John is the subject and "is" is the predicate. [00:12:11] Jeremy: "Is" is the predicate. [00:12:12] Hong: Okay. And person is the object. That's great for showing how things are connected, but it is pretty different from how we usually handle data in REST APIs and stuff. Like normally we say a person object has properties like name, DOB, bio, and so on, rather than a bunch of subject-predicate-object triples. That's where JSON-LD comes in -- it is designed to look more like the JSON we are used to working with, while still being able to represent RDF graphs. RDF graphs are ontologies. It's a way to represent factual data, but it is quite different from how we represent data in a relational database. It's a bunch of triples: each subject and object is a node, and predicates connect these nodes. Semantic Web [00:13:30] Jeremy: You mentioned the semantic web, what does that mean? What is the semantic web? [00:13:35] Hong: It's a way to represent the web in a structured way that is machine readable, so that you can scan the data on the web using scrapers or crawlers. [00:13:52] Jeremy: Scrapers -- or what was the second one? Crawling. [00:13:59] Hong: Yeah. 
Then you can have graph data of the web and you can query information about things from that data. [00:14:14] Jeremy: So is the web as it exists now, is that the semantic web or is it something different? [00:14:24] Hong: I think it is partially the semantic web; you have some metadata in your HTML. For example, there are several specifications for the semantic web, like OpenGraph metadata. [00:14:32] Jeremy: Cause when I think about OpenGraph, I think about the metadata on a webpage that tells other applications or websites that if you link to this page: show this image or show this title and description. You're saying that specifically you consider part of the semantic web? [00:15:05] Hong: That's the semantic web. To make your website part of the semantic web, your website should be able to provide structured data. And other people can make scrapers to scan structured data from your website. There are a bunch of attributes and tags for HTML to represent metadata. For example, you have the relation attribute rel, so if you have a link with rel=me to another social profile of yours, then other people can tell the two web pages represent the same person. [00:16:10] Jeremy: Oh, I see. So you could have more than one website. Maybe one is your blog and maybe one is about your favorite birds or something like that. But you could put a rel tag with information about you as a person so that someone who scrapes both websites could look at that tag and see that both of these websites are by Hong, by this person. JSON-LD is difficult to implement and not used as intended [00:16:43] Hong: Yeah. I think JSON-LD is designed for the semantic web, but in reality, ActivityPub implementations, most of them are not aware of the semantic web. [00:17:01] Jeremy: The choice of JSON Linked Data, the JSON-LD, by the people who made the specification -- they had this idea that things that implemented ActivityPub would be a part of this semantic web, but the actual implementations like Mastodon or Pixelfed, they use JSON-LD because it's part of the specification, but the way they use it, it ends up not really being a part of this semantic web. [00:17:34] Hong: Yeah, that's exactly it. [00:17:37] Jeremy: You've mentioned that implementing it is difficult. What makes implementing JSON-LD particularly hard? [00:17:48] Hong: JSON-LD is quite complex, which is why a lot of programming languages don't even have JSON-LD implementations, and it's pretty slow compared to just working with regular JSON. So, what happens is a lot of ActivityPub implementations just treat JSON-LD like it is regular JSON without using a proper JSON-LD processor. You can do that, but it creates a source of headaches. In JSON-LD there are weird equivalences, like if a property is missing or if it's an empty array, that means the same thing. Or if a property has one value versus an array with just that one value in it, same thing. So when you are writing code to parse JSON-LD, you've got to keep checking if something's an array, how long it is, and all that is super easy to mess up. It's not just reading JSON-LD that's tricky. Creating it is just as bad. Like you might forget to include the right context metadata for a vocabulary and end up with a JSON-LD document that's either invalid or means something totally different from what you wanted. Even the big ActivityPub implementations mess this up pretty often. With Fedify we've got a JSON-LD processor built in and we keep running into issues where major ActivityPub implementations create invalid JSON-LD. 
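To give a feel for the quirks Hong is describing, here is a minimal sketch of the kind of value normalization a hand-rolled parser ends up doing when it treats JSON-LD as plain JSON. It is an illustration only, not code from Fedify or any real implementation, and the example property name is invented.

```ts
// In JSON-LD, a missing property, null, an empty array, a single value, and a
// one-element array often have to be handled as "a list of values".
function asArray(value: unknown): unknown[] {
  if (value === undefined || value === null) return []; // missing property
  if (Array.isArray(value)) return value;               // already a list (possibly empty)
  return [value];                                        // single value -> one-element list
}

// Both shapes of the same logical document come out identical:
console.log(asArray({ to: "https://example.com/alice" }.to));   // ["https://example.com/alice"]
console.log(asArray({ to: ["https://example.com/alice"] }.to)); // ["https://example.com/alice"]
```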
We've had to create workarounds for all of those implementations, but it's not pretty and it causes kind of a mess. [00:19:52] Jeremy: Even though there is a specification for JSON-LD, it sounds like the implementers don't necessarily follow it. So you are kind of parsing JSON-LD, but not really. You're parsing something that looks like JSON-LD, but isn't quite it. [00:20:12] Hong: Yes, that's right. [00:20:14] Jeremy: And is that true in the biggest implementations, Mastodon, for example, are there things that it sends in its activities that aren't valid JSON-LD? [00:20:26] Hong: Those implementations that had bad JSON-LD tend to fix it as soon as possible. But regressions are made quite often. Yeah. [00:20:45] Jeremy: Even within Mastodon, which is probably one of the largest implementers of ActivityPub, there are cases where it's not valid JSON-LD and somebody fixes it. But then later on there are other messages or other activities that were valid, but aren't valid anymore. And so it's this back and forth of fixing them and causing new issues it sounds ... [00:21:15] Hong: Yeah. Yeah. Right. [00:21:17] Jeremy: Yeah. That sounds very difficult to deal with. How instances communicate (Inbox) [00:21:20] Jeremy: We've been talking about the messages themselves are this special format of JSON that's very particular. But how do these instances communicate with one another? [00:21:32] Hong: Most of the time, it all starts with a follow. Like when John follows Alice, Alice adds both John and John's inbox URI to her followers list, and after John follows Alice, whenever Alice posts something new, that activity gets sent to John's inbox behind the scenes. This is just one HTTP POST request. Even though ActivityPub is built on HTTP, it doesn't really care about the HTTP response beyond did it work or not. If you want to reply to an activity, you need to figure out the sender's inbox URI and send your reply activity there. [00:22:27] Jeremy: If we define all the terms, there's the actor, which is the person; each actor can send different activities; those activities are in the form of JSON linked data. [00:22:40] Hong: Yeah. [00:22:42] Jeremy: And everybody has an inbox. And an inbox is an HTTP URL that people post to. [00:22:50] Hong: Right. [00:22:52] Jeremy: And so when you think about that, you had mentioned that if you have a list of followers, let's say you have a hundred followers, would that mean that you have the URLs to all hundred of those followers' inboxes and that you would send one HTTP POST to each inbox every time you had a new message? [00:23:16] Hong: Pretty much all ActivityPub implementations have a thing called a shared inbox; it's exactly what it sounds like: one inbox that all actors on a server share. Private stuff like DMs doesn't go there; it is just for public posts and things like that. [00:23:36] Jeremy: I think we haven't really talked about the fact that, when you have multiple users, usually they're on a server, right? That somebody chooses. So you could have tens of thousands, I don't know how many people can fit on the same server. But, rather than you having to post to each user individually, you can post to the shared inbox on this server. So let's say, of your 100 followers, 50 of them are on the same server, and you have a new post, you only need to post to the shared inbox once. [00:24:16] Hong: Yes, that's right. [00:24:18] Jeremy: And in that message you would, I assume, have links to each of the profiles or actors that you wanted to send that message to. 
[00:24:30] Hong: Yeah. Scaling challenges [00:24:31] Jeremy: Something that I've seen in the past is there are people who have challenges with scaling their Mastodon instance or their implementations of ActivityPub as the number of followers grows. I've seen a post from Ghost, one of the companies you work with, mentioning that they've had challenges there. What are the challenges there and how do you think those can be resolved? [00:25:04] Hong: To put this in context, when Ghost mentioned the scaling, they were not using a message queue yet. I'm pretty sure using a message queue would help with a lot of their scaling problems. That said, it is definitely true that a lot of ActivityPub software has trouble with scaling right now. I think part of the problem is that everyone's using this purely event driven approach to sending activities around. One of the big issues is that when delivery fails it's the sender who has to retry, and not the receiver. Plus there's all this overhead because the sender has to authenticate itself with HTTP signatures every time. Actually the ActivityPub spec suggests using polling too, so I'd love to see more ActivityPub software try using both approaches together. [00:26:16] Jeremy: You mean the followers would poll who they're following instead of the person posting the messages having to send their posts to everyone's inboxes. [00:26:29] Hong: Yeah. [00:26:29] Jeremy: I see. So that's a part of the ActivityPub specification, but not implemented in a lot of ActivityPub implementations. And so it sounds like maybe that puts a lot of burden on the servers that have people with a lot of followers, because they have to post to every single follower's server and maybe the server is slow or they can't reach it. And like you said, they have to just keep trying and trying. There could be a lot of challenges there. [00:27:09] Hong: Right. Account migration [00:27:10] Jeremy: We've talked a little bit about the fact that each person, each actor, is hosted by a server and those servers can host multiple actors. But if you want to move to another server, either because your server is shutting down or you just would like to change servers, what are some of the challenges there? [00:27:38] Hong: ActivityPub and the Fediverse already have a specification for an account move. It's called FEP-7628, Move Actor. The first thing you need to do when moving an account is prove that both the old and new accounts belong to the same person. You do this by adding the old account's URI to the new account's alsoKnownAs property. And then the old account tells all the other instances that it's moving by sending out a Move activity. When a server gets this Move activity, it checks that both accounts really do belong to the same person, and then it makes all the accounts that were following the old account start following the new one instead. That's how the new account gets to keep all the old account's followers. Pretty much all the major ActivityPub software has this feature built in, for example Mastodon, Misskey, you name it. [00:29:04] Jeremy: This is very similar to the posts, where when you execute a move, the server that originally hosted that actor needs to somehow tell every single other server that was following that account that you've moved. And so if there are any issues communicating with one of those servers, or you miss one, then it just won't recognize that you've moved. You have to make sure that you talk to every single server. 
[00:29:36] Hong: That's right. [00:29:38] Jeremy: I could see how that could be a difficult problem sometimes if you have a lot of followers. [00:29:45] Hong: Yeah. Fedify [00:29:46] Jeremy: You've created a TypeScript library, Fedify, for building ActivityPub powered applications. What was the reason you decided to create Fedify? [00:29:58] Hong: Fedify is an ActivityPub server framework I built for TypeScript. It basically takes away a lot of headaches you'd get trying to implement an ActivityPub server from scratch. The whole thing started because I wanted to build Hollo, a single user microblogging platform I built. But when I tried to implement ActivityPub from the ground up it was kind of a nightmare. Imagine trying to write a CGI program in Perl or C back in the late nineties, where you are manually printing HTTP headers and HTML. There just wasn't any good abstraction layer to go with. There were already some libraries and frameworks for ActivityPub out there, but none of them really hit the sweet spot I was looking for. They were either too high level and rigid, like you could only build a Mastodon clone, or they barely did anything at all. Or they were written in languages I didn't really know. Ghost and Fedify [00:31:24] Jeremy: I saw that you are doing some work with Ghost. How is Ghost using Fedify? [00:31:30] Hong: Ghost is an open source publishing platform. They have put some money into Fedify, which is why I get to work on it full time now. Their ActivityPub feature is still in private beta but it should be available to everyone pretty soon. We work together to improve Fedify. Basically they are a user of Fedify. They report bugs and request new features for Fedify, then I fix or implement them first. [00:32:16] Jeremy: Ghost, to my understanding, is a blogging platform and a newsletter platform. So what does it mean for them to implement ActivityPub? What would somebody using Mastodon, for example, get when they follow somebody using Ghost? [00:32:38] Hong: Ghost will have a fediverse handle for each blog. If you follow them in your Mastodon or something similar, then when a new post is published, those posts will show up in your timeline in Mastodon and you can like them or share them. And in the dashboard of Ghost you can see who liked your posts or shared your posts and so on. It is like how Mastodon works, but in Ghost. [00:33:26] Jeremy: I see. So if you are writing a Ghost blog and somebody follows your blog from Mastodon, sort of like we were talking about earlier, they can like your post, and on the blog itself you could show, oh, I have 200 likes. And those aren't necessarily people who were on your Ghost website, they could be people that were liking your post from Mastodon. [00:33:58] Hong: Yes. Misskey / Forkey development in Asia [00:34:00] Jeremy: Something you mentioned at the beginning was there is a community of developers in Asia making forks of, I believe, Mastodon, right? [00:34:13] Hong: Yeah. [00:34:14] Jeremy: Do you have experience working in that development community? What's different about it compared to the more Western centric community? [00:34:24] Hong: They are very similar in most ways. The key difference is language, of course. They communicate in Japanese primarily. They also accept pull requests in English. But there are tons of comments in Japanese in their code. So you need to translate them into English or your first language to understand what the code does. So I think that makes a barrier for Western developers. 
In fact, many Western developers that contribute to Misskey or forkeys are able to speak a little Japanese. And many of the developers of Misskey and forkeys are kind of otaku. [00:35:31] Jeremy: Oh, otaku, okay. [00:35:33] Hong: It's not a big deal, but you can see the difference at a glance. [00:35:41] Jeremy: Yeah. You mentioned one of the things that I believe Misskey implemented was the emoji reactions, and maybe one of the reasons they wanted that was so that they could react to each other's posts with, you know, anime pictures or things like that. [00:35:58] Hong: Yeah, that's right. [00:36:01] Jeremy: You've mentioned Misskey and forkey. So is Misskey a fork of Mastodon and then is forkey a fork of Misskey? [00:36:10] Hong: No, Misskey is not a fork of Mastodon. It is built from scratch. It's its own implementation. And forkeys are forks of Misskey. [00:36:22] Jeremy: Oh, I see. But both of those are primarily built by Japanese developers. [00:36:30] Hong: Yes. Whereas Mastodon is written in Ruby, Ruby on Rails, Misskey is built in TypeScript. [00:36:40] Jeremy: And because of ActivityPub -- they all implement it. So you can communicate with people between Mastodon and Misskey because they all understand the same activities. [00:36:56] Hong: Yes. Backwards compatible activity implementations [00:36:57] Jeremy: You did mention, since there are extensions, like Misskey has the emoji reactions. When there is an activity that an implementation doesn't support, what happens between the two servers? Do you send it to a server's inbox and then the server just doesn't do anything with it? [00:37:16] Hong: Some implementers consider backwards compatibility. So they design it to work with other implementations that don't support that activity. For example, Misskey uses the like activity for emoji reactions. So if you put an emoji reaction on a Mastodon post, then in Mastodon you get one like. So it's intended behavior by the Misskey developers that they fall back to normal likes. But sometimes ActivityPub implementers introduce entirely new activity types. For example, Pleroma introduced the emoji react activity. And if you put an emoji reaction on a Mastodon post from Pleroma, in Mastodon there is nothing to see because Mastodon just ignores it. [00:38:37] Jeremy: If I understand correctly, both Misskey and Pleroma are independent implementations of ActivityPub, but with Misskey, their message is backwards compatible, where if you don't understand the emoji reaction, it'll be embedded inside of a like message. Whereas with Pleroma they send an activity that Mastodon can't understand at all. So it just doesn't do anything. [00:39:11] Hong: Yes, right. But Misskey also understands the emoji react activity. So between Pleroma and Misskey they can exchange emoji reactions with no problem. [00:39:27] Jeremy: Oh, I see. So they both understand that activity. They both implement it the same way, but then when Misskey communicates with Mastodon or with an instance that it knows doesn't understand it, it sends something different. [00:39:45] Hong: Yeah, that's right. [00:39:47] Jeremy: The servers -- can they query one another to know which activities they support? [00:39:53] Hong: Usually ActivityPub implementations also implement the NodeInfo specification. It's like a user agent-like thing in the Fediverse. Implementations tell the other instance if it is Mastodon or something else. You can query the type of server. 
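As a rough sketch of the NodeInfo lookup Hong mentions: the well-known path and document shape below follow the NodeInfo specification, while the example domain is invented and error handling is left out for brevity.

```ts
// Ask a fediverse server what software it runs, via NodeInfo.
async function getServerSoftware(domain: string): Promise<string> {
  // Step 1: the well-known endpoint lists links to NodeInfo documents.
  const discovery = await fetch(`https://${domain}/.well-known/nodeinfo`);
  const { links } = (await discovery.json()) as { links: { rel: string; href: string }[] };

  // Step 2: fetch one of the linked documents and read the software name and version.
  const nodeinfo = await fetch(links[0].href);
  const { software } = (await nodeinfo.json()) as { software: { name: string; version: string } };
  return `${software.name} ${software.version}`; // e.g. "mastodon 4.3.1"
}

// getServerSoftware("mastodon.example").then(console.log);
```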
[00:40:20] Jeremy: Okay, so within ActivityPub are each of the servers -- is the term node is that the word they use for each server? [00:40:31] Hong: Yes. Right. [00:40:32] Jeremy: You have the nodes, which can have any number of actors and the servers send activities to one another, to each other's inboxes. And so those are the way they all communicate. [00:40:49] Hong: Yeah. Building an ActivityPub implementation [00:40:50] Jeremy: You've implemented ActivityPub with Fedify because you found like there weren't good enough implementations or resources already. Did you implement it based off of the specification or did you look at existing implementations while you were building your implementation? [00:41:12] Hong: To be honest, instead of just, diving into the spec. I usually start by looking at actually ActivityPub software code first. The ActivityPub spec is so vague that you can't really build something just from reading it. So when we talk about ActivityPub, we are actually talking about a whole bunch of other technical standards too, WebFinger, HTTP signatures and more. So you need to understand all of these as well. [00:41:47] Jeremy: With the specification alone, you were saying it's too vague and so what ends up being -- I'm not sure if it's right to call it a spec, but looking at the implementations that people have already made that collectively becomes the spec because trying to follow the spec just by itself is maybe too difficult. [00:42:12] Hong: Yes. [00:42:14] Jeremy: Maybe that brings up the issues you were talking about before where you have specifications like JSON-LD where they're so complicated that even the biggest implementations aren't quite following it exactly. [00:42:28] Hong: Yeah. [00:42:29] Jeremy: If somebody wanted to, to get started with understanding a little bit more about ActivityPub or building something with it where would you recommend they start? [00:42:44] Hong: I recommend to dig into a lot of code from actual implementations. First, Mastodon, Misskey, Akkoma and so on. There are are some really cool tools that have been so helpful. For example, ActivityPub Academy is this awesome mastodon server for debugging ActivityPub. It makes it super easy to create a temporary account and see what activities are going back and forth. There is also BrowserPub. BrowserPub is this neat tool for looking up and browsing ActivityPub objects. It's really handy when you want to see how different ActivityPub software handles various features. I also recommend to use Fedify. I've got to mention the Fedify CLI, which comes with some really useful tools. [00:43:46] Jeremy: So if someone uses Fedify they're writing an application in TypeScript, then it sounds like they have to know the high level concepts. They have to know what are the different activities, what is inside of an actor. But the actual implementation of how do I create and parse JSON linked data, those kinds of things are taken care of by the library. [00:44:13] Hong: Yes, right. [00:44:16] Jeremy: So in some ways it seems like it might be good to, like you were saying, use the tools you mentioned to create a test Mastodon account, look at the messages being sent back and forth, and then when you're trying to implement it, starting with something like Fedify might be good because then you can really just focus on the concepts and not worry so much about the, the implementation details. [00:44:43] Hong: Yes, that's right. [00:44:45] Jeremy: Is there anything else you. 
Wanted to mention or thought we should have talked about? [00:44:52] Hong: Mm. I want to, talk about, a lot of stuff about ActivityPub but it's difficult to speak in English for me, so, it's a shame to talk about it very little. [00:45:15] Jeremy: We need everybody to learn Korean right? [00:45:23] Hong: Yes, please. (laughs) [00:45:23] Jeremy: Yeah. Well, I wanna thank you for taking the time. I know it must have been really challenging to give an interview in, you know, a language that's not your native one. So thank you for spending the time to talk with me. [00:45:38] Hong: Thank you for having me.
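A thread running through this episode is that, underneath everything, ActivityPub delivery is one HTTP POST of a JSON document to another actor's inbox. Below is a minimal sketch of that request; it omits the HTTP Signatures a real server would require before accepting it, and the URLs and payload are invented for illustration.

```ts
// Minimal sketch of delivering an activity to an inbox, per the model described
// in the episode. Real implementations must sign this request and handle retries.
const createActivity = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Create",
  id: "https://alice.example/activities/1",
  actor: "https://alice.example/users/alice",
  to: ["https://www.w3.org/ns/activitystreams#Public"],
  object: {
    type: "Note",
    id: "https://alice.example/notes/1",
    attributedTo: "https://alice.example/users/alice",
    content: "Hello, fediverse!",
  },
};

await fetch("https://john.example/users/john/inbox", {
  method: "POST",
  headers: { "Content-Type": "application/activity+json" },
  body: JSON.stringify(createActivity),
});
```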
In this episode of Quality Matters, host Andy Reynolds is joined by Ed Yurcisin, Chief Technology Officer at NCQA, to break down the complexities of digital transformation in health care quality. Ed explains how NCQA's push for digital measurement cuts through inefficiencies and inconsistencies in assessing quality. Traditionally, HEDIS® quality measures have existed as large, text-heavy PDFs, leaving room for misinterpretation. By digitalizing these measures into computer code—Clinical Quality Language (CQL)—NCQA removes ambiguity and standardizes interpretation. That makes it easier for health care organizations to implement and use quality measures. This shift reduces administrative burden and helps ensure that quality assessments are more accurate and actionable. The conversation then shifts to FHIR® (Fast Healthcare Interoperability Resources), a standard designed to streamline health care data exchange. Ed explains that while FHIR might sound intimidating, it's built on the basic web technologies that power everyday internet browsing. FHIR brings five essential components to the table—JSON files, REST APIs, standardized value sets, a common data model and government-mandated data exchange. While the government requires organizations to “pitch” data (make data available), there's no mandate to “catch” data (actually use the data). That means organizations that choose to use the data gain a competitive advantage. The discussion ends by focusing on data quality, an issue that looms large over digital transformation efforts. Ed introduces the Bulk FHIR Quality Coalition, a collaborative initiative aimed at improving the reliability of data exchanged between health care providers and insurers. Using the analogy of water through pipes, Ed explains that current data-sharing efforts help ensure flow, but don't always guarantee that data are “clean” enough to be useful. The coalition enhances existing provider–insurer relationships to test and improve large-scale data exchange methods. Ultimately, Ed underscores that digital transformation in health care is only as strong as the quality of the data being exchanged. Standardization, accessibility and interoperability are the foundations of progress, ensuring that technology-driven solutions enhance outcomes. Digital HEDIS, FHIR and the Bulk FHIR Quality Coalition are examples of how NCQA is reducing measurement burden to streamline measurement and improve quality. Key Quote: “The digital transformation of health care is necessary to deliver higher quality care. But that is dependent on high-quality data and the ability to exchange this data. It starts with high-quality data–making it accessible, interoperable, exchangeable. That is the foundation for being able to deliver digital health care transformation. Nothing in digital transformation in health care makes sense without high-quality data exchange.” -Ed Yurcisin Time Stamps: (1:03) The How and Why of Digital Measurement (04:14) Understanding FHIR (08:32) From Data Exchange to Competitive Advantage (10:42) The Bulk FHIR Quality Coalition Links: Connect with Edward Yurcisin NCQA Digital Hub Bulk FHIR Quality Coalition
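As a concrete illustration of the basic web technologies Ed describes, JSON over a REST API, here is a hedged sketch of fetching a single FHIR resource. The base URL and patient ID are placeholders, and a real server would also require authorization (for example via SMART on FHIR).

```ts
// FHIR resources are plain JSON retrieved over a REST API.
const base = "https://fhir.example.org/r4";         // hypothetical FHIR server
const response = await fetch(`${base}/Patient/12345`, {
  headers: { Accept: "application/fhir+json" },     // standard FHIR JSON media type
});
const patient = await response.json();
console.log(patient.resourceType);                  // "Patient"
console.log(patient.name?.[0]?.family);             // the patient's family name, if present
```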
In this episode of the PowerShell Podcast, we welcome Greg Martin, a longtime developer and PowerShell enthusiast, who has taken PowerShell beyond system administration and into the realm of game development. Greg shares his journey of building Eldoria, a terminal adventure game written entirely in PowerShell, and how his experience across multiple programming languages influenced his approach. Key topics in this episode include: Building a game in PowerShell – How Greg used PowerShell to create a rich text-based adventure. The power of terminal-based gaming – A brief look into ANSI escape sequences, JSON asset management, and the REPL loop. Greg's programming journey – From C and C++ to PowerShell, game development, and enterprise automation. Lessons in curiosity and career growth – How following your interests can lead to unexpected and rewarding opportunities. Greg also discusses the challenges of structuring a large-scale PowerShell project, how PowerShell's object-oriented features made development easier, and how anyone can start exploring creative projects with PowerShell.Join the conversation: Bio and links: Gregory Martin is a Senior Linux Engineer, formerly an industrial network designer, IT manager, sysadmin, and may have given a lecture or two at tech conferences. He's an avid programmer with over 20 years of experience, ranging from Windows/Linux Desktop, Web, Android/iOS, Industrial IoT, Linux CLI, and Automation Orchestration. In his spare time, he writes computer games and dabbles with AI technologies. He writes at his blog (themartinmethod.com). Check out Eldoria on Greg's GitHub and explore the game in your own terminal. Read Greg's blog at TheMartinMethod.com for updates on Eldoria and other projects. Join PowerShell Wednesdays every Wednesday at 2 PM EST in the PDQ Discord community (discord.gg/pdq) for live discussions. https://github.com/gregoryfmartin/Eldoria https://github.com/gregoryfmartin/Burnt-Latte https://www.linkedin.com/in/andrewplatech/ The PowerShell Podcast: https://pdq.com/the-powershell-podcast The PowerShell Podcast on YouTube: https://youtu.be/0kBrtPsD2EE
This week the Rust controversy continues, and a kernel maintainer stirs up some political drama on the way out the door. NTSYNC and Wayland HDR finally land... and you can't use them yet. KDE Plasma pushes 6.3 out the door, OBS threatens to sue Fedora, and OpenSUSE surprises us all by moving to SELinux. For tips we have etckeeper for versioning your /etc files, pw-config for querying your Pipewire config, and a more detailed guide to using jq to manipulate JSON data. You can find the show notes at https://bit.ly/4gHNvng and enjoy! Host: Jonathan Bennett Co-Hosts: Rob Campbell and Ken McDonald Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
Show NotesMike Bowers, Chief Architect at FairCom, has spent decades navigating the evolution of database technology. In this conversation, he and Robby explore the challenges of maintaining a 40+ year-old codebase, balancing legacy constraints with forward-thinking design, and the realities of technical debt.Mike shares how FairCom transitioned from ISAM-based databases to modern JSON-driven APIs, the trade-offs between strict schemas and flexible document stores, and how software architecture plays a critical role in long-term maintainability. He also explains why human-readable JSON simplifies debugging, how documentation-driven development improves API usability, and why many software teams struggle with refactoring at the right time.Topics covered[00:05:32] The role of software architecture in long-term maintainability[00:10:45] Why FairCom's legacy ISAM technology still matters today[00:14:20] Transitioning to a JSON-based API for modern developers[00:19:40] The challenges of maintaining 40+ years of C code[00:24:10] Technical debt: What it really means and how to manage it[00:28:50] The trade-offs between strict schemas and flexible NoSQL approaches[00:34:00] When to refactor vs. when to start over from scratch[00:38:15] The influence of product management thinking on software architecture[00:42:30] Advice for engineers considering a shift into architecture rolesResources mentionedFairComMike Bowers on LinkedInFairCom on Twitter/XBook Recommendation: The Influential Product Manager by MSc BuceroThanks to Our Sponsor!Need a smoother way to share your team's inbox? Jelly's got you covered!
If you're in SF, join us tomorrow for a fun meetup at CodeGen Night!If you're in NYC, join us for AI Engineer Summit! The Agent Engineering track is now sold out, but 25 tickets remain for AI Leadership and 5 tickets for the workshops. You can see the full schedule of speakers and workshops at https://ai.engineer!It's exceedingly hard to introduce someone like Bret Taylor. We could recite his Wikipedia page, or his extensive work history through Silicon Valley's greatest companies, but everyone else already does that.As a podcast by AI engineers for AI engineers, we had the opportunity to do something a little different. We wanted to dig into what Bret sees from his vantage point at the top of our industry for the last 2 decades, and how that explains the rise of the AI Architect at Sierra, the leading conversational AI/CX platform.“Across our customer base, we are seeing a new role emerge - the role of the AI architect. These leaders are responsible for helping define, manage and evolve their company's AI agent over time. They come from a variety of both technical and business backgrounds, and we think that every company will have one or many AI architects managing their AI agent and related experience.”In our conversation, Bret Taylor confirms the Paul Buchheit legend that he rewrote Google Maps in a weekend, armed with only the help of a then-nascent Google Closure Compiler and no other modern tooling. But what we find remarkable is that he was the PM of Maps, not an engineer, though of course he still identifies as one. We find this theme recurring throughout Bret's career and worldview. We think it is plain as day that AI leadership will have to be hands-on and technical, especially when the ground is shifting as quickly as it is today:“There's a lot of power in combining product and engineering into as few people as possible… few great things have been created by committee.”“If engineering is an order taking organization for product you can sometimes make meaningful things, but rarely will you create extremely well crafted breakthrough products. Those tend to be small teams who deeply understand the customer need that they're solving, who have a maniacal focus on outcomes.”“And I think the reason why is if you look at like software as a service five years ago, maybe you can have a separation of product and engineering because most software as a service created five years ago. I wouldn't say there's like a lot of technological breakthroughs required for most business applications. And if you're making expense reporting software or whatever, it's useful… You kind of know how databases work, how to build auto scaling with your AWS cluster, whatever, you know, it's just, you're just applying best practices to yet another problem. "When you have areas like the early days of mobile development or the early days of interactive web applications, which I think Google Maps and Gmail represent, or now AI agents, you're in this constant conversation with what the requirements of your customers and stakeholders are and all the different people interacting with it and the capabilities of the technology. And it's almost impossible to specify the requirements of a product when you're not sure of the limitations of the technology itself.”This is the first time the difference between technical leadership for “normal” software and for “AI” software was articulated this clearly for us, and we'll be thinking a lot about this going forward. 
We left a lot of nuggets in the conversation, so we hope you'll just dive in with us (and thank Bret for joining the pod!)Timestamps* 00:00:02 Introductions and Bret Taylor's background* 00:01:23 Bret's experience at Stanford and the dot-com era* 00:04:04 The story of rewriting Google Maps backend* 00:11:06 Early days of interactive web applications at Google* 00:15:26 Discussion on product management and engineering roles* 00:21:00 AI and the future of software development* 00:26:42 Bret's approach to identifying customer needs and building AI companies* 00:32:09 The evolution of business models in the AI era* 00:41:00 The future of programming languages and software development* 00:49:38 Challenges in precisely communicating human intent to machines* 00:56:44 Discussion on Artificial General Intelligence (AGI) and its impact* 01:08:51 The future of agent-to-agent communication* 01:14:03 Bret's involvement in the OpenAI leadership crisis* 01:22:11 OpenAI's relationship with Microsoft* 01:23:23 OpenAI's mission and priorities* 01:27:40 Bret's guiding principles for career choices* 01:29:12 Brief discussion on pasta-making* 01:30:47 How Bret keeps up with AI developments* 01:32:15 Exciting research directions in AI* 01:35:19 Closing remarks and hiring at Sierra Transcript[00:02:05] Introduction and Guest Welcome[00:02:05] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host swyx, founder of smol.ai.[00:02:17] swyx: Hey, and today we're super excited to have Bret Taylor join us. Welcome. Thanks for having me. It's a little unreal to have you in the studio.[00:02:25] swyx: I've read about you so much over the years, like even before. Open AI effectively. I mean, I use Google Maps to get here. So like, thank you for everything that you've done. Like, like your story history, like, you know, I think people can find out what your greatest hits have been.[00:02:40] Bret Taylor's Early Career and Education[00:02:40] swyx: How do you usually like to introduce yourself when, you know, you talk about, you summarize your career, like, how do you look at yourself?[00:02:47] Bret: Yeah, it's a great question. You know, we, before we went on the mics here, we're talking about the audience for this podcast being more engineering. And I do think depending on the audience, I'll introduce myself differently because I've had a lot of [00:03:00] corporate and board roles. I probably self identify as an engineer more than anything else though.[00:03:04] Bret: So even when I was. Salesforce, I was coding on the weekends. So I think of myself as an engineer and then all the roles that I do in my career sort of start with that just because I do feel like engineering is sort of a mindset and how I approach most of my life. So I'm an engineer first and that's how I describe myself.[00:03:24] Bret: You majored in computer[00:03:25] swyx: science, like 1998. And, and I was high[00:03:28] Bret: school, actually my, my college degree was Oh, two undergrad. Oh, three masters. Right. That old.[00:03:33] swyx: Yeah. I mean, no, I was going, I was going like 1998 to 2003, but like engineering wasn't as, wasn't a thing back then. Like we didn't have the title of senior engineer, you know, kind of like, it was just.[00:03:44] swyx: You were a programmer, you were a developer, maybe. What was it like in Stanford? Like, what was that feeling like? You know, was it, were you feeling like on the cusp of a great computer revolution? 
Or was it just like a niche, you know, interest at the time?[00:03:57] Stanford and the Dot-Com Bubble[00:03:57] Bret: Well, I was at Stanford, as you said, from 1998 to [00:04:00] 2002.[00:04:02] Bret: 1998 was near the peak of the dot com bubble. So. This is back in the day where most people that they're coding in the computer lab, just because there was these sun microsystems, Unix boxes there that most of us had to do our assignments on. And every single day there was a. com like buying pizza for everybody.[00:04:20] Bret: I didn't have to like, I got. Free food, like my first two years of university and then the dot com bubble burst in the middle of my college career. And so by the end there was like tumbleweed going to the job fair, you know, it was like, cause it was hard to describe unless you were there at the time, the like level of hype and being a computer science major at Stanford was like, A thousand opportunities.[00:04:45] Bret: And then, and then when I left, it was like Microsoft, IBM.[00:04:49] Joining Google and Early Projects[00:04:49] Bret: And then the two startups that I applied to were VMware and Google. And I ended up going to Google in large part because a woman named Marissa Meyer, who had been a teaching [00:05:00] assistant when I was, what was called a section leader, which was like a junior teaching assistant kind of for one of the big interest.[00:05:05] Bret: Yes. Classes. She had gone there. And she was recruiting me and I knew her and it was sort of felt safe, you know, like, I don't know. I thought about it much, but it turned out to be a real blessing. I realized like, you know, you always want to think you'd pick Google if given the option, but no one knew at the time.[00:05:20] Bret: And I wonder if I'd graduated in like 1999 where I've been like, mom, I just got a job at pets. com. It's good. But you know, at the end I just didn't have any options. So I was like, do I want to go like make kernel software at VMware? Do I want to go build search at Google? And I chose Google. 50, 50 ball.[00:05:36] Bret: I'm not really a 50, 50 ball. So I feel very fortunate in retrospect that the economy collapsed because in some ways it forced me into like one of the greatest companies of all time, but I kind of lucked into it, I think.[00:05:47] The Google Maps Rewrite Story[00:05:47] Alessio: So the famous story about Google is that you rewrote the Google maps back in, in one week after the map quest quest maps acquisition, what was the story there?[00:05:57] Alessio: Is it. Actually true. Is it [00:06:00] being glorified? Like how, how did that come to be? And is there any detail that maybe Paul hasn't shared before?[00:06:06] Bret: It's largely true, but I'll give the color commentary. So it was actually the front end, not the back end, but it turns out for Google maps, the front end was sort of the hard part just because Google maps was.[00:06:17] Bret: Largely the first ish kind of really interactive web application, say first ish. I think Gmail certainly was though Gmail, probably a lot of people then who weren't engineers probably didn't appreciate its level of interactivity. It was just fast, but. Google maps, because you could drag the map and it was sort of graphical.[00:06:38] Bret: My, it really in the mainstream, I think, was it a map[00:06:41] swyx: quest back then that was, you had the arrows up and down, it[00:06:44] Bret: was up and down arrows. 
Each map was a single image and you just click left and then wait for a few seconds to the new map to let it was really small too, because generating a big image was kind of expensive on computers that day.[00:06:57] Bret: So Google maps was truly innovative in that [00:07:00] regard. The story on it. There was a small company called where two technologies started by two Danish brothers, Lars and Jens Rasmussen, who are two of my closest friends now. They had made a windows app called expedition, which had beautiful maps. Even in 2000.[00:07:18] Bret: For whenever we acquired or sort of acquired their company, Windows software was not particularly fashionable, but they were really passionate about mapping and we had made a local search product that was kind of middling in terms of popularity, sort of like a yellow page of search product. So we wanted to really go into mapping.[00:07:36] Bret: We'd started working on it. Their small team seemed passionate about it. So we're like, come join us. We can build this together.[00:07:42] Technical Challenges and Innovations[00:07:42] Bret: It turned out to be a great blessing that they had built a windows app because you're less technically constrained when you're doing native code than you are building a web browser, particularly back then when there weren't really interactive web apps and it ended up.[00:07:56] Bret: Changing the level of quality that we [00:08:00] wanted to hit with the app because we were shooting for something that felt like a native windows application. So it was a really good fortune that we sort of, you know, their unusual technical choices turned out to be the greatest blessing. So we spent a lot of time basically saying, how can you make a interactive draggable map in a web browser?[00:08:18] Bret: How do you progressively load, you know, new map tiles, you know, as you're dragging even things like down in the weeds of the browser at the time, most browsers like Internet Explorer, which was dominant at the time would only load two images at a time from the same domain. So we ended up making our map tile servers have like.[00:08:37] Bret: Forty different subdomains so we could load maps and parallels like lots of hacks. I'm happy to go into as much as like[00:08:44] swyx: HTTP connections and stuff.[00:08:46] Bret: They just like, there was just maximum parallelism of two. And so if you had a map, set of map tiles, like eight of them, so So we just, we were down in the weeds of the browser anyway.[00:08:56] Bret: So it was lots of plumbing. I can, I know a lot more about browsers than [00:09:00] most people, but then by the end of it, it was fairly, it was a lot of duct tape on that code. If you've ever done an engineering project where you're not really sure the path from point A to point B, it's almost like. Building a house by building one room at a time.[00:09:14] Bret: The, there's not a lot of architectural cohesion at the end. And then we acquired a company called Keyhole, which became Google earth, which was like that three, it was a native windows app as well, separate app, great app, but with that, we got licenses to all this satellite imagery. And so in August of 2005, we added.[00:09:33] Bret: Satellite imagery to Google Maps, which added even more complexity in the code base. And then we decided we wanted to support Safari. There was no mobile phones yet. So Safari was this like nascent browser on, on the Mac. 
And it turns out there's like a lot of decisions behind the scenes, sort of inspired by this windows app, like heavy use of XML and XSLT and all these like.[00:09:54] Bret: Technologies that were like briefly fashionable in the early two thousands and everyone hates now for good [00:10:00] reason. And it turns out that all of the XML functionality and Internet Explorer wasn't supporting Safari. So people are like re implementing like XML parsers. And it was just like this like pile of s**t.[00:10:11] Bret: And I had to say a s**t on your part. Yeah, of[00:10:12] Alessio: course.[00:10:13] Bret: So. It went from this like beautifully elegant application that everyone was proud of to something that probably had hundreds of K of JavaScript, which sounds like nothing. Now we're talking like people have modems, you know, not all modems, but it was a big deal.[00:10:29] Bret: So it was like slow. It took a while to load and just, it wasn't like a great code base. Like everything was fragile. So I just got. Super frustrated by it. And then one weekend I did rewrite all of it. And at the time the word JSON hadn't been coined yet too, just to give you a sense. So it's all XML.[00:10:47] swyx: Yeah.[00:10:47] Bret: So we used what is now you would call JSON, but I just said like, let's use eval so that we can parse the data fast. And, and again, that's, it would literally as JSON, but at the time there was no name for it. So we [00:11:00] just said, let's. Pass on JavaScript from the server and eval it. And then somebody just refactored the whole thing.[00:11:05] Bret: And, and it wasn't like I was some genius. It was just like, you know, if you knew everything you wished you had known at the beginning and I knew all the functionality, cause I was the primary, one of the primary authors of the JavaScript. And I just like, I just drank a lot of coffee and just stayed up all weekend.[00:11:22] Bret: And then I, I guess I developed a bit of reputation and no one knew about this for a long time. And then Paul who created Gmail and I ended up starting a company with him too, after all of this told this on a podcast and now it's large, but it's largely true. I did rewrite it and it, my proudest thing.[00:11:38] Bret: And I think JavaScript people appreciate this. Like the un G zipped bundle size for all of Google maps. When I rewrote, it was 20 K G zipped. It was like much smaller for the entire application. It went down by like 10 X. So. What happened on Google? Google is a pretty mainstream company. And so like our usage is shot up because it turns out like it's faster.[00:11:57] Bret: Just being faster is worth a lot of [00:12:00] percentage points of growth at a scale of Google. So how[00:12:03] swyx: much modern tooling did you have? Like test suites no compilers.[00:12:07] Bret: Actually, that's not true. We did it one thing. So I actually think Google, I, you can. Download it. There's a, Google has a closure compiler, a closure compiler.[00:12:15] Bret: I don't know if anyone still uses it. It's gone. Yeah. Yeah. It's sort of gone out of favor. Yeah. Well, even until recently it was better than most JavaScript minifiers because it was more like it did a lot more renaming of variables and things. Most people use ES build now just cause it's fast and closure compilers built on Java and super slow and stuff like that.[00:12:37] Bret: But, so we did have that, that was it. 
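For readers who weren't writing JavaScript in 2005: the eval trick Bret describes, shipping what we would now call JSON and evaluating it on the client, looked roughly like the sketch below. This is an illustration rather than the actual Google Maps code, and today you would use JSON.parse, since eval on untrusted input is a security risk.

```ts
// Roughly how pre-JSON.parse web apps turned a server response into an object.
// Illustrative only; never eval untrusted input today.
const responseText = '{"lat": 37.42, "lng": -122.08, "zoom": 12}';

// 2005-era approach: wrap in parentheses so the braces parse as an expression, then eval.
const legacy = eval("(" + responseText + ")");

// Modern approach: a real parser, no code execution.
const modern = JSON.parse(responseText);

console.log(legacy.lat === modern.lat); // true
```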
Okay.[00:12:39] The Evolution of Web Applications[00:12:39] Bret: So and that was treated internally, you know, it was a really interesting time at Google at the time because there's a lot of teams working on fairly advanced JavaScript when no one was. So Google suggest, which Kevin Gibbs was the tech lead for, was the first kind of type ahead, autocomplete, I believe in a web browser, and now it's just pervasive in search boxes that you sort of [00:13:00] see a type ahead there.[00:13:01] Bret: I mean, chat, dbt[00:13:01] swyx: just added it. It's kind of like a round trip.[00:13:03] Bret: Totally. No, it's now pervasive as a UI affordance, but that was like Kevin's 20 percent project. And then Gmail, Paul you know, he tells the story better than anyone, but he's like, you know, basically was scratching his own itch, but what was really neat about it is email, because it's such a productivity tool, just needed to be faster.[00:13:21] Bret: So, you know, he was scratching his own itch of just making more stuff work on the client side. And then we, because of Lars and Yen sort of like setting the bar of this windows app or like we need our maps to be draggable. So we ended up. Not only innovate in terms of having a big sync, what would be called a single page application today, but also all the graphical stuff you know, we were crashing Firefox, like it was going out of style because, you know, when you make a document object model with the idea that it's a document and then you layer on some JavaScript and then we're essentially abusing all of this, it just was running into code paths that were not.[00:13:56] Bret: Well, it's rotten, you know, at this time. And so it was [00:14:00] super fun. And, and, you know, in the building you had, so you had compilers, people helping minify JavaScript just practically, but there is a great engineering team. So they were like, that's why Closure Compiler is so good. It was like a. Person who actually knew about programming languages doing it, not just, you know, writing regular expressions.[00:14:17] Bret: And then the team that is now the Chrome team believe, and I, I don't know this for a fact, but I'm pretty sure Google is the main contributor to Firefox for a long time in terms of code. And a lot of browser people were there. So every time we would crash Firefox, we'd like walk up two floors and say like, what the hell is going on here?[00:14:35] Bret: And they would load their browser, like in a debugger. And we could like figure out exactly what was breaking. And you can't change the code, right? Cause it's the browser. It's like slow, right? I mean, slow to update. So, but we could figure out exactly where the bug was and then work around it in our JavaScript.[00:14:52] Bret: So it was just like new territory. Like so super, super fun time, just like a lot of, a lot of great engineers figuring out [00:15:00] new things. And And now, you know, the word, this term is no longer in fashion, but the word Ajax, which was asynchronous JavaScript and XML cause I'm telling you XML, but see the word XML there, to be fair, the way you made HTTP requests from a client to server was this.[00:15:18] Bret: Object called XML HTTP request because Microsoft and making Outlook web access back in the day made this and it turns out to have nothing to do with XML. It's just a way of making HTTP requests because XML was like the fashionable thing. It was like that was the way you, you know, you did it. 
But the JSON came out of that, you know, and then a lot of the best practices around building JavaScript applications is pre React.[00:15:44] Bret: I think React was probably the big conceptual step forward that we needed. Even my first social network after Google, we used a lot of like HTML injection and. Making real time updates was still very hand coded and it's really neat when you [00:16:00] see conceptual breakthroughs like react because it's, I just love those things where it's like obvious once you see it, but it's so not obvious until you do.[00:16:07] Bret: And actually, well, I'm sure we'll get into AI, but I, I sort of feel like we'll go through that evolution with AI agents as well that I feel like we're missing a lot of the core abstractions that I think in 10 years we'll be like, gosh, how'd you make agents? Before that, you know, but it was kind of that early days of web applications.[00:16:22] swyx: There's a lot of contenders for the reactive jobs of of AI, but no clear winner yet. I would say one thing I was there for, I mean, there's so much we can go into there. You just covered so much.[00:16:32] Product Management and Engineering Synergy[00:16:32] swyx: One thing I just, I just observe is that I think the early Google days had this interesting mix of PM and engineer, which I think you are, you didn't, you didn't wait for PM to tell you these are my, this is my PRD.[00:16:42] swyx: This is my requirements.[00:16:44] mix: Oh,[00:16:44] Bret: okay.[00:16:45] swyx: I wasn't technically a software engineer. I mean,[00:16:48] Bret: by title, obviously. Right, right, right.[00:16:51] swyx: It's like a blend. And I feel like these days, product is its own discipline and its own lore and own industry and engineering is its own thing. And there's this process [00:17:00] that happens and they're kind of separated, but you don't produce as good of a product as if they were the same person.[00:17:06] swyx: And I'm curious, you know, if, if that, if that sort of resonates in, in, in terms of like comparing early Google versus modern startups that you see out there,[00:17:16] Bret: I certainly like wear a lot of hats. So, you know, sort of biased in this, but I really agree that there's a lot of power and combining product design engineering into as few people as possible because, you know few great things have been created by committee, you know, and so.[00:17:33] Bret: If engineering is an order taking organization for product you can sometimes make meaningful things, but rarely will you create extremely well crafted breakthrough products. Those tend to be small teams who deeply understand the customer need that they're solving, who have a. Maniacal focus on outcomes.[00:17:53] Bret: And I think the reason why it's, I think for some areas, if you look at like software as a service five years ago, maybe you can have a [00:18:00] separation of product and engineering because most software as a service created five years ago. I wouldn't say there's like a lot of like. Technological breakthroughs required for most, you know, business applications.[00:18:11] Bret: And if you're making expense reporting software or whatever, it's useful. I don't mean to be dismissive of expense reporting software, but you probably just want to understand like, what are the requirements of the finance department? What are the requirements of an individual file expense report? Okay.[00:18:25] Bret: Go implement that. And you kind of know how web applications are implemented. You kind of know how to. 
How databases work, how to build auto scaling with your AWS cluster, whatever, you know, it's just, you're just applying best practices to yet another problem when you have areas like the early days of mobile development or the early days of interactive web applications, which I think Google Maps and Gmail represent, or now AI agents, you're in this constant conversation with what the requirements of your customers and stakeholders are and all the different people interacting with it.[00:18:58] Bret: And the capabilities of the [00:19:00] technology. And it's almost impossible to specify the requirements of a product when you're not sure of the limitations of the technology itself. And that's why I use the word conversation. It's not literal. That's sort of funny to use that word in the age of conversational AI.[00:19:15] Bret: You're constantly sort of saying, like, ideally, you could sprinkle some magic AI pixie dust and solve all the world's problems, but it's not the way it works. And it turns out that actually, I'll just give an interesting example.[00:19:26] AI Agents and Modern Tooling[00:19:26] Bret: I think most people listening probably use co pilots to code like Cursor or Devon or Microsoft Copilot or whatever.[00:19:34] Bret: Most of those tools are, they're remarkable. I'm, I couldn't, you know, imagine development without them now, but they're not autonomous yet. Like I wouldn't let it just write most code without my interactively inspecting it. We just are somewhere between it's an amazing co pilot and it's an autonomous software engineer.[00:19:53] Bret: As a product manager, like your aspirations for what the product is are like kind of meaningful. But [00:20:00] if you're a product person, yeah, of course you'd say it should be autonomous. You should click a button and program should come out the other side. The requirements meaningless. Like what matters is like, what is based on the like very nuanced limitations of the technology.[00:20:14] Bret: What is it capable of? And then how do you maximize the leverage? It gives a software engineering team, given those very nuanced trade offs. Coupled with the fact that those nuanced trade offs are changing more rapidly than any technology in my memory, meaning every few months you'll have new models with new capabilities.[00:20:34] Bret: So how do you construct a product that can absorb those new capabilities as rapidly as possible as well? That requires such a combination of technical depth and understanding the customer that you really need more integration. Of product design and engineering. And so I think it's why with these big technology waves, I think startups have a bit of a leg up relative to incumbents because they [00:21:00] tend to be sort of more self actualized in terms of just like bringing those disciplines closer together.[00:21:06] Bret: And in particular, I think entrepreneurs, the proverbial full stack engineers, you know, have a leg up as well because. I think most breakthroughs happen when you have someone who can understand those extremely nuanced technical trade offs, have a vision for a product. And then in the process of building it, have that, as I said, like metaphorical conversation with the technology, right?[00:21:30] Bret: Gosh, I ran into a technical limit that I didn't expect. It's not just like changing that feature. You might need to refactor the whole product based on that. And I think that's, that it's particularly important right now. 
So I don't, you know, if you, if you're building a big ERP system, probably there's a great reason to have product and engineering.[00:21:51] Bret: I think in general, the disciplines are there for a reason. I think when you're dealing with something as nuanced as the like technologies, like large language models today, there's a ton of [00:22:00] advantage of having. Individuals or organizations that integrate the disciplines more formally.[00:22:05] Alessio: That makes a lot of sense.[00:22:06] Alessio: I've run a lot of engineering teams in the past, and I think the product versus engineering tension has always been more about effort than like whether or not the feature is buildable. But I think, yeah, today you see a lot more of like. Models actually cannot do that. And I think the most interesting thing is on the startup side, people don't yet know where a lot of the AI value is going to accrue.[00:22:26] Alessio: So you have this rush of people building frameworks, building infrastructure, layered things, but we don't really know the shape of the compute. I'm curious that Sierra, like how you thought about building an house, a lot of the tooling for evals or like just, you know, building the agents and all of that.[00:22:41] Alessio: Versus how you see some of the startup opportunities that is maybe still out there.[00:22:46] Bret: We build most of our tooling in house at Sierra, not all. It's, we don't, it's not like not invented here syndrome necessarily, though, maybe slightly guilty of that in some ways, but because we're trying to build a platform [00:23:00] that's in Dorian, you know, we really want to have control over our own destiny.[00:23:03] Bret: And you had made a comment earlier that like. We're still trying to figure out who like the reactive agents are and the jury is still out. I would argue it hasn't been created yet. I don't think the jury is still out to go use that metaphor. We're sort of in the jQuery era of agents, not the react era.[00:23:19] Bret: And, and that's like a throwback for people listening,[00:23:22] swyx: we shouldn't rush it. You know?[00:23:23] Bret: No, yeah, that's my point is. And so. Because we're trying to create an enduring company at Sierra that outlives us, you know, I'm not sure we want to like attach our cart to some like to a horse where it's not clear that like we've figured out and I actually want as a company, we're trying to enable just at a high level and I'll, I'll quickly go back to tech at Sierra, we help consumer brands build customer facing AI agents.[00:23:48] Bret: So. Everyone from Sonos to ADT home security to Sirius XM, you know, if you call them on the phone and AI will pick up with you, you know, chat with them on the Sirius XM homepage. It's an AI agent called Harmony [00:24:00] that they've built on our platform. We're what are the contours of what it means for someone to build an end to end complete customer experience with AI with conversational AI.[00:24:09] Bret: You know, we really want to dive into the deep end of, of all the trade offs to do it. You know, where do you use fine tuning? Where do you string models together? You know, where do you use reasoning? Where do you use generation? How do you use reasoning? How do you express the guardrails of an agentic process?[00:24:25] Bret: How do you impose determinism on a fundamentally non deterministic technology? There's just a lot of really like as an important design space. And I could sit here and tell you, we have the best approach. 
Every entrepreneur will, you know. But I hope that in two years, we look back at our platform and laugh at how naive we were, because that's the pace of change broadly.[00:24:45] Bret: If you talk about like the startup opportunities, I'm not wholly skeptical of tools companies, but I'm fairly skeptical. There's always an exception for every role, but I believe that certainly there's a big market for [00:25:00] frontier models, but largely for companies with huge CapEx budgets. So. Open AI and Microsoft's Anthropic and Amazon Web Services, Google Cloud XAI, which is very well capitalized now, but I think the, the idea that a company can make money sort of pre training a foundation model is probably not true.[00:25:20] Bret: It's hard to, you're competing with just, you know, unreasonably large CapEx budgets. And I just like the cloud infrastructure market, I think will be largely there. I also really believe in the applications of AI. And I define that not as like building agents or things like that. I define it much more as like, you're actually solving a problem for a business.[00:25:40] Bret: So it's what Harvey is doing in legal profession or what cursor is doing for software engineering or what we're doing for customer experience and customer service. The reason I believe in that is I do think that in the age of AI, what's really interesting about software is it can actually complete a task.[00:25:56] Bret: It can actually do a job, which is very different than the value proposition of [00:26:00] software was to ancient history two years ago. And as a consequence, I think the way you build a solution and For a domain is very different than you would have before, which means that it's not obvious, like the incumbent incumbents have like a leg up, you know, necessarily, they certainly have some advantages, but there's just such a different form factor, you know, for providing a solution and it's just really valuable.[00:26:23] Bret: You know, it's. Like just think of how much money cursor is saving software engineering teams or the alternative, how much revenue it can produce tool making is really challenging. If you look at the cloud market, just as a analog, there are a lot of like interesting tools, companies, you know, Confluent, Monetized Kafka, Snowflake, Hortonworks, you know, there's a, there's a bunch of them.[00:26:48] Bret: A lot of them, you know, have that mix of sort of like like confluence or have the open source or open core or whatever you call it. I, I, I'm not an expert in this area. You know, I do think [00:27:00] that developers are fickle. I think that in the tool space, I probably like. Default towards open source being like the area that will win.[00:27:09] Bret: It's hard to build a company around this and then you end up with companies sort of built around open source to that can work. Don't get me wrong, but I just think that it's nowadays the tools are changing so rapidly that I'm like, not totally skeptical of tool makers, but I just think that open source will broadly win, but I think that the CapEx required for building frontier models is such that it will go to a handful of big companies.[00:27:33] Bret: And then I really believe in agents for specific domains which I think will, it's sort of the analog to software as a service in this new era. You know, it's like, if you just think of the cloud. You can lease a server. 
It's just a low level primitive, or you can buy an app like you know, Shopify or whatever.[00:27:51] Bret: And most people building a storefront would prefer Shopify over hand rolling their e commerce storefront. I think the same thing will be true of AI. So [00:28:00] I've. I tend to like, if I have a, like an entrepreneur asked me for advice, I'm like, you know, move up the stack as far as you can towards a customer need.[00:28:09] Bret: Broadly, but I, but it doesn't reduce my excitement about what is the reactive building agents kind of thing, just because it is, it is the right question to ask, but I think we'll probably play out probably an open source space more than anything else.[00:28:21] swyx: Yeah, and it's not a priority for you. There's a lot in there.[00:28:24] swyx: I'm kind of curious about your idea maze towards, there are many customer needs. You happen to identify customer experience as yours, but it could equally have been coding assistance or whatever. I think for some, I'm just kind of curious at the top down, how do you look at the world in terms of the potential problem space?[00:28:44] swyx: Because there are many people out there who are very smart and pick the wrong problem.[00:28:47] Bret: Yeah, that's a great question.[00:28:48] Future of Software Development[00:28:48] Bret: By the way, I would love to talk about the future of software, too, because despite the fact it didn't pick coding, I have a lot of that, but I can talk to I can answer your question, though, you know I think when a technology is as [00:29:00] cool as large language models.[00:29:02] Bret: You just see a lot of people starting from the technology and searching for a problem to solve. And I think it's why you see a lot of tools companies, because as a software engineer, you start building an app or a demo and you, you encounter some pain points. You're like,[00:29:17] swyx: a lot of[00:29:17] Bret: people are experiencing the same pain point.[00:29:19] Bret: What if I make it? That it's just very incremental. And you know, I always like to use the metaphor, like you can sell coffee beans, roasted coffee beans. You can add some value. You took coffee beans and you roasted them and roasted coffee beans largely, you know, are priced relative to the cost of the beans.[00:29:39] Bret: Or you can sell a latte and a latte. Is rarely priced directly like as a percentage of coffee bean prices. In fact, if you buy a latte at the airport, it's a captive audience. So it's a really expensive latte. And there's just a lot that goes into like. How much does a latte cost? And I bring it up because there's a supply chain from growing [00:30:00] coffee beans to roasting coffee beans to like, you know, you could make one at home or you could be in the airport and buy one and the margins of the company selling lattes in the airport is a lot higher than the, you know, people roasting the coffee beans and it's because you've actually solved a much more acute human problem in the airport.[00:30:19] Bret: And, and it's just worth a lot more to that person in that moment. It's kind of the way I think about technology too. 
It sounds funny to liken it to coffee beans, but you're selling tools on top of a large language model yet in some ways your market is big, but you're probably going to like be price compressed just because you're sort of a piece of infrastructure and then you have open source and all these other things competing with you naturally.[00:30:43] Bret: If you go and solve a really big business problem for somebody, that's actually like a meaningful business problem that AI facilitates, they will value it according to the value of that business problem. And so I actually feel like people should just stop. You're like, no, that's, that's [00:31:00] unfair. If you're searching for an idea of people, I, I love people trying things, even if, I mean, most of the, a lot of the greatest ideas have been things no one believed in.[00:31:07] Bret: So I like, if you're passionate about something, go do it. Like who am I to say, yeah, a hundred percent. Or Gmail, like Paul as far, I mean I, some of it's Laura at this point, but like Gmail is Paul's own email for a long time. , and then I amusingly and Paul can't correct me, I'm pretty sure he sent her in a link and like the first comment was like, this is really neat.[00:31:26] Bret: It would be great. It was not your email, but my own . I don't know if it's a true story. I'm pretty sure it's, yeah, I've read that before. So scratch your own niche. Fine. Like it depends on what your goal is. If you wanna do like a venture backed company, if its a. Passion project, f*****g passion, do it like don't listen to anybody.[00:31:41] Bret: In fact, but if you're trying to start, you know an enduring company, solve an important business problem. And I, and I do think that in the world of agents, the software industries has shifted where you're not just helping people more. People be more productive, but you're actually accomplishing tasks autonomously.[00:31:58] Bret: And as a consequence, I think the [00:32:00] addressable market has just greatly expanded just because software can actually do things now and actually accomplish tasks and how much is coding autocomplete worth. A fair amount. How much is the eventual, I'm certain we'll have it, the software agent that actually writes the code and delivers it to you, that's worth a lot.[00:32:20] Bret: And so, you know, I would just maybe look up from the large language models and start thinking about the economy and, you know, think from first principles. I don't wanna get too far afield, but just think about which parts of the economy. We'll benefit most from this intelligence and which parts can absorb it most easily.[00:32:38] Bret: And what would an agent in this space look like? Who's the customer of it is the technology feasible. And I would just start with these business problems more. And I think, you know, the best companies tend to have great engineers who happen to have great insight into a market. And it's that last part that I think some people.[00:32:56] Bret: Whether or not they have, it's like people start so much in the technology, they [00:33:00] lose the forest for the trees a little bit.[00:33:02] Alessio: How do you think about the model of still selling some sort of software versus selling more package labor? 
I feel like when people are selling the package labor, it's almost more stateless, you know, like it's easier to swap out if you're just putting an input and getting an output.[00:33:16] Alessio: If you think about coding, if there's no ID, you're just putting a prompt and getting back an app. It doesn't really matter. Who generates the app, you know, you have less of a buy in versus the platform you're building, I'm sure on the backend customers have to like put on their documentation and they have, you know, different workflows that they can tie in what's kind of like the line to draw there versus like going full where you're managed customer support team as a service outsource versus.[00:33:40] Alessio: This is the Sierra platform that you can build on. What was that decision? I'll sort of[00:33:44] Bret: like decouple the question in some ways, which is when you have something that's an agent, who is the person using it and what do they want to do with it? So let's just take your coding agent for a second. I will talk about Sierra as well.[00:33:59] Bret: Who's the [00:34:00] customer of a, an agent that actually produces software? Is it a software engineering manager? Is it a software engineer? And it's there, you know, intern so to speak. I don't know. I mean, we'll figure this out over the next few years. Like what is that? And is it generating code that you then review?[00:34:16] Bret: Is it generating code with a set of unit tests that pass, what is the actual. For lack of a better word contract, like, how do you know that it did what you wanted it to do? And then I would say like the product and the pricing, the packaging model sort of emerged from that. And I don't think the world's figured out.[00:34:33] Bret: I think it'll be different for every agent. You know, in our customer base, we do what's called outcome based pricing. So essentially every time the AI agent. Solves the problem or saves a customer or whatever it might be. There's a pre negotiated rate for that. We do that. Cause it's, we think that that's sort of the correct way agents, you know, should be packaged.[00:34:53] Bret: I look back at the history of like cloud software and notably the introduction of the browser, which led to [00:35:00] software being delivered in a browser, like Salesforce to. Famously invented sort of software as a service, which is both a technical delivery model through the browser, but also a business model, which is you subscribe to it rather than pay for a perpetual license.[00:35:13] Bret: Those two things are somewhat orthogonal, but not really. If you think about the idea of software running in a browser, that's hosted. Data center that you don't own, you sort of needed to change the business model because you don't, you can't really buy a perpetual license or something otherwise like, how do you afford making changes to it?[00:35:31] Bret: So it only worked when you were buying like a new version every year or whatever. So to some degree, but then the business model shift actually changed business as we know it, because now like. Things like Adobe Photoshop. Now you subscribe to rather than purchase. 
So it ended up where you had a technical shift and a business model shift that were very logically intertwined that actually the business model shift was turned out to be as significant as the technical as the shift.[00:35:59] Bret: And I think with [00:36:00] agents, because they actually accomplish a job, I do think that it doesn't make sense to me that you'd pay for the privilege of like. Using the software like that coding agent, like if it writes really bad code, like fire it, you know, I don't know what the right metaphor is like you should pay for a job.[00:36:17] Bret: Well done in my opinion. I mean, that's how you pay your software engineers, right? And[00:36:20] swyx: and well, not really. We paid to put them on salary and give them options and they vest over time. That's fair.[00:36:26] Bret: But my point is that you don't pay them for how many characters they write, which is sort of the token based, you know, whatever, like, There's a, that famous Apple story where we're like asking for a report of how many lines of code you wrote.[00:36:40] Bret: And one of the engineers showed up with like a negative number cause he had just like done a big refactoring. There was like a big F you to management who didn't understand how software is written. You know, my sense is like the traditional usage based or seat based thing. It's just going to look really antiquated.[00:36:55] Bret: Cause it's like asking your software engineer, how many lines of code did you write today? Like who cares? Like, cause [00:37:00] absolutely no correlation. So my old view is I don't think it's be different in every category, but I do think that that is the, if an agent is doing a job, you should, I think it properly incentivizes the maker of that agent and the customer of, of your pain for the job well done.[00:37:16] Bret: It's not always perfect to measure. It's hard to measure engineering productivity, but you can, you should do something other than how many keys you typed, you know Talk about perverse incentives for AI, right? Like I can write really long functions to do the same thing, right? So broadly speaking, you know, I do think that we're going to see a change in business models of software towards outcomes.[00:37:36] Bret: And I think you'll see a change in delivery models too. And, and, you know, in our customer base you know, we empower our customers to really have their hands on the steering wheel of what the agent does they, they want and need that. But the role is different. You know, at a lot of our customers, the customer experience operations folks have renamed themselves the AI architects, which I think is really cool.[00:37:55] Bret: And, you know, it's like in the early days of the Internet, there's the role of the webmaster. [00:38:00] And I don't know whether your webmaster is not a fashionable, you know, Term, nor is it a job anymore? I just, I don't know. Will they, our tech stand the test of time? Maybe, maybe not. But I do think that again, I like, you know, because everyone listening right now is a software engineer.[00:38:14] Bret: Like what is the form factor of a coding agent? And actually I'll, I'll take a breath. Cause actually I have a bunch of pins on them. Like I wrote a blog post right before Christmas, just on the future of software development. And one of the things that's interesting is like, if you look at the way I use cursor today, as an example, it's inside of.[00:38:31] Bret: A repackaged visual studio code environment. 
I sometimes use the sort of agentic parts of it, but it's largely, you know, I've sort of gotten a good routine of making it auto complete code in the way I want through tuning it properly when it actually can write. I do wonder what like the future of development environments will look like.[00:38:55] Bret: And to your point on what is a software product, I think it's going to change a lot in [00:39:00] ways that will surprise us. But I always use, I use the metaphor in my blog post of, have you all driven around in a way, Mo around here? Yeah, everyone has. And there are these Jaguars, the really nice cars, but it's funny because it still has a steering wheel, even though there's no one sitting there and the steering wheels like turning and stuff clearly in the future.[00:39:16] Bret: If once we get to that, be more ubiquitous, like why have the steering wheel and also why have all the seats facing forward? Maybe just for car sickness. I don't know, but you could totally rearrange the car. I mean, so much of the car is oriented around the driver, so. It stands to reason to me that like, well, autonomous agents for software engineering run through visual studio code.[00:39:37] Bret: That seems a little bit silly because having a single source code file open one at a time is kind of a goofy form factor for when like the code isn't being written primarily by you, but it begs the question of what's your relationship with that agent. And I think the same is true in our industry of customer experience, which is like.[00:39:55] Bret: Who are the people managing this agent? What are the tools do they need? And they definitely need [00:40:00] tools, but it's probably pretty different than the tools we had before. It's certainly different than training a contact center team. And as software engineers, I think that I would like to see particularly like on the passion project side or research side.[00:40:14] Bret: More innovation in programming languages. I think that we're bringing the cost of writing code down to zero. So the fact that we're still writing Python with AI cracks me up just cause it's like literally was designed to be ergonomic to write, not safe to run or fast to run. I would love to see more innovation and how we verify program correctness.[00:40:37] Bret: I studied for formal verification in college a little bit and. It's not very fashionable because it's really like tedious and slow and doesn't work very well. If a lot of code is being written by a machine, you know, one of the primary values we can provide is verifying that it actually does what we intend that it does.[00:40:56] Bret: I think there should be lots of interesting things in the software development life cycle, like how [00:41:00] we think of testing and everything else, because. If you think about if we have to manually read every line of code that's coming out as machines, it will just rate limit how much the machines can do. The alternative is totally unsafe.[00:41:13] Bret: So I wouldn't want to put code in production that didn't go through proper code review and inspection. So my whole view is like, I actually think there's like an AI native I don't think the coding agents don't work well enough to do this yet, but once they do, what is sort of an AI native software development life cycle and how do you actually.[00:41:31] Bret: Enable the creators of software to produce the highest quality, most robust, fastest software and know that it's correct. And I think that's an incredible opportunity. 
I mean, how much C code can we rewrite and rust and make it safe so that there's fewer security vulnerabilities. Can we like have more efficient, safer code than ever before?[00:41:53] Bret: And can you have someone who's like that guy in the matrix, you know, like staring at the little green things, like where could you have an operator [00:42:00] of a code generating machine be like superhuman? I think that's a cool vision. And I think too many people are focused on like. Autocomplete, you know, right now, I'm not, I'm not even, I'm guilty as charged.[00:42:10] Bret: I guess in some ways, but I just like, I'd like to see some bolder ideas. And that's why when you were joking, you know, talking about what's the react of whatever, I think we're clearly in a local maximum, you know, metaphor, like sort of conceptual local maximum, obviously it's moving really fast. I think we're moving out of it.[00:42:26] Alessio: Yeah. At the end of 23, I've read this blog post from syntax to semantics. Like if you think about Python. It's taking C and making it more semantic and LLMs are like the ultimate semantic program, right? You can just talk to them and they can generate any type of syntax from your language. But again, the languages that they have to use were made for us, not for them.[00:42:46] Alessio: But the problem is like, as long as you will ever need a human to intervene, you cannot change the language under it. You know what I mean? So I'm curious at what point of automation we'll need to get, we're going to be okay making changes. To the underlying languages, [00:43:00] like the programming languages versus just saying, Hey, you just got to write Python because I understand Python and I'm more important at the end of the day than the model.[00:43:08] Alessio: But I think that will change, but I don't know if it's like two years or five years. I think it's more nuanced actually.[00:43:13] Bret: So I think there's a, some of the more interesting programming languages bring semantics into syntax. So let me, that's a little reductive, but like Rust as an example, Rust is memory safe.[00:43:25] Bret: Statically, and that was a really interesting conceptual, but it's why it's hard to write rust. It's why most people write python instead of rust. I think rust programs are safer and faster than python, probably slower to compile. But like broadly speaking, like given the option, if you didn't have to care about the labor that went into it.[00:43:45] Bret: You should prefer a program written in Rust over a program written in Python, just because it will run more efficiently. It's almost certainly safer, et cetera, et cetera, depending on how you define safe, but most people don't write Rust because it's kind of a pain in the ass. And [00:44:00] the audience of people who can is smaller, but it's sort of better in most, most ways.[00:44:05] Bret: And again, let's say you're making a web service and you didn't have to care about how hard it was to write. If you just got the output of the web service, the rest one would be cheaper to operate. It's certainly cheaper and probably more correct just because there's so much in the static analysis implied by the rest programming language that it probably will have fewer runtime errors and things like that as well.[00:44:25] Bret: So I just give that as an example, because so rust, at least my understanding that came out of the Mozilla team, because. There's lots of security vulnerabilities in the browser and it needs to be really fast. 
They said, okay, we want to put more of a burden at the authorship time to have fewer issues at runtime.[00:44:43] Bret: And we need the constraint that it has to be done statically because browsers need to be really fast. My sense is if you just think about like the, the needs of a programming language today, where the role of a software engineer is [00:45:00] to use an AI to generate functionality and audit that it does in fact work as intended, maybe functionally, maybe from like a correctness standpoint, some combination thereof, how would you create a programming system that facilitated that?[00:45:15] Bret: And, you know, I bring up Rust because I think it's a good example of like, I think given a choice of writing in C or Rust, you should choose Rust today. I think most people would say that, even C aficionados, just because C is largely less safe for very similar, you know, trade offs, you know, for the, the system and now with AI, it's like, okay, well, that just changes the game on writing these things.[00:45:36] Bret: And so like, I just wonder if a combination of programming languages that are more structurally oriented towards the values that we need from an AI generated program, verifiable correctness and all of that. If it's tedious to produce for a person, that maybe doesn't matter. But one thing, like if I asked you, is this Rust program memory safe?[00:45:58] Bret: You wouldn't have to read it, you just have [00:46:00] to compile it. So that's interesting. I mean, that's like an, that's one example of a very modest form of formal verification. So I bring that up because I do think you can have AI inspect AI, you can have AI do code reviews. It would disappoint me if the best we could get was AI reviewing Python, and having scaled a few very large [00:46:21] websites that were written in Python, it's just like, you know, expensive and it's like every, trust me, every team who's written a big web service in Python has experimented with like PyPy and all these things just to make it slightly more efficient than it naturally is. You don't really have true multi threading anyway.[00:46:36] Bret: It's just like clearly that you do it just because it's convenient to write. And I just feel like we're, I don't want to say it's insane. I just mean. I do think we're at a local maximum. And I would hope that we create a programming system, a combination of programming languages, formal verification, testing, automated code reviews, where you can use AI to generate software in a high scale way and trust it.[00:46:59] Bret: And you're [00:47:00] not limited by your ability to read it necessarily. I don't know exactly what form that would take, but I feel like that would be a pretty cool world to live in.[00:47:08] Alessio: Yeah. We had Chris Lattner on the podcast. He's doing great work with Modular. I mean, I love LLVM. Yeah. Basically merging Rust and Python.[00:47:15] Alessio: That's kind of the idea. Should be, but I'm curious is like, for them a big use case was like making it compatible with Python, same APIs so that Python developers could use it. Yeah. And so I, I wonder at what point, well, yeah.[00:47:26] Bret: At least my understanding is they're targeting the data science Yeah. Machine learning crowd, which is all written in Python, so still feels like a local maximum.[00:47:34] Bret: Yeah.[00:47:34] swyx: Yeah, exactly. I'll force you to make a prediction. You know, Python's roughly 30 years old. 
In 30 years from now, is Rust going to be bigger than Python?[00:47:42] Bret: I don't know this, but just, I don't even know this is a prediction. I just am sort of like saying stuff I hope is true. I would like to see an AI native programming language and programming system, and I use language because I'm not sure language is even the right thing, but I hope in 30 years, there's an AI native way we make [00:48:00] software that is wholly uncorrelated with the current set of programming languages.[00:48:04] Bret: or not uncorrelated, but I think most programming languages today were designed to be efficiently authored by people and some have different trade offs.[00:48:15] Evolution of Programming Languages[00:48:15] Bret: You know, you have Haskell and others that were designed for abstractions for parallelism and things like that. You have programming languages like Python, which are designed to be very easily written, sort of like Perl and Python lineage, which is why data scientists use it.[00:48:31] Bret: It's it can, it has a. Interactive mode, things like that. And I love, I'm a huge Python fan. So despite all my Python trash talk, a huge Python fan wrote at least two of my three companies were exclusively written in Python and then C came out of the birth of Unix and it wasn't the first, but certainly the most prominent first step after assembly language, right?[00:48:54] Bret: Where you had higher level abstractions rather than and going beyond go to, to like abstractions, [00:49:00] like the for loop and the while loop.[00:49:01] The Future of Software Engineering[00:49:01] Bret: So I just think that if the act of writing code is no longer a meaningful human exercise, maybe it will be, I don't know. I'm just saying it sort of feels like maybe it's one of those parts of history that just will sort of like go away, but there's still the role of this offer engineer, like the person actually building the system.[00:49:20] Bret: Right. And. What does a programming system for that form factor look like?[00:49:25] React and Front-End Development[00:49:25] Bret: And I, I just have a, I hope to be just like I mentioned, I remember I was at Facebook in the very early days when, when, what is now react was being created. And I remember when the, it was like released open source I had left by that time and I was just like, this is so f*****g cool.[00:49:42] Bret: Like, you know, to basically model your app independent of the data flowing through it, just made everything easier. And then now. You know, I can create, like there's a lot of the front end software gym play is like a little chaotic for me, to be honest with you. It is like, it's sort of like [00:50:00] abstraction soup right now for me, but like some of those core ideas felt really ergonomic.[00:50:04] Bret: I just wanna, I'm just looking forward to the day when someone comes up with a programming system that feels both really like an aha moment, but completely foreign to me at the same time. Because they created it with sort of like from first principles recognizing that like. Authoring code in an editor is maybe not like the primary like reason why a programming system exists anymore.[00:50:26] Bret: And I think that's like, that would be a very exciting day for me.[00:50:28] The Role of AI in Programming[00:50:28] swyx: Yeah, I would say like the various versions of this discussion have happened at the end of the day, you still need to precisely communicate what you want. 
As a manager of people, as someone who has done many, many legal contracts, you know how hard that is.[00:50:42] swyx: And then now we have to talk to machines doing that and AIs interpreting what we mean and reading our minds effectively. I don't know how to get across that barrier of translating human intent to instructions. And yes, it can be more declarative, but I don't know if it'll ever Crossover from being [00:51:00] a programming language to something more than that.[00:51:02] Bret: I agree with you. And I actually do think if you look at like a legal contract, you know, the imprecision of the English language, it's like a flaw in the system. How many[00:51:12] swyx: holes there are.[00:51:13] Bret: And I do think that when you're making a mission critical software system, I don't think it should be English language prompts.[00:51:19] Bret: I think that is silly because you want the precision of a a programming language. My point was less about that and more about if the actual act of authoring it, like if you.[00:51:32] Formal Verification in Software[00:51:32] Bret: I'll think of some embedded systems do use formal verification. I know it's very common in like security protocols now so that you can, because the importance of correctness is so great.[00:51:41] Bret: My intellectual exercise is like, why not do that for all software? I mean, probably that's silly just literally to do what we literally do for. These low level security protocols, but the only reason we don't is because it's hard and tedious and hard and tedious are no longer factors. So, like, if I could, I mean, [00:52:00] just think of, like, the silliest app on your phone right now, the idea that that app should be, like, formally verified for its correctness feels laughable right now because, like, God, why would you spend the time on it?[00:52:10] Bret: But if it's zero costs, like, yeah, I guess so. I mean, it never crashed. That's probably good. You know, why not? I just want to, like, set our bars really high. Like. We should make, software has been amazing. Like there's a Mark Andreessen blog post, software is eating the world. And you know, our whole life is, is mediated digitally.[00:52:26] Bret: And that's just increasing with AI. And now we'll have our personal agents talking to the agents on the CRO platform and it's agents all the way down, you know, our core infrastructure is running on these digital systems. We now have like, and we've had a shortage of software developers for my entire life.[00:52:45] Bret: And as a consequence, you know if you look, remember like health care, got healthcare. gov that fiasco security vulnerabilities leading to state actors getting access to critical infrastructure. I'm like. We now have like created this like amazing system that can [00:53:00] like, we can fix this, you know, and I, I just want to, I'm both excited about the productivity gains in the economy, but I just think as software engineers, we should be bolder.[00:53:08] Bret: Like we should have aspirations to fix these systems so that like in general, as you said, as precise as we want to be in the specification of the system. We can make it work correctly now, and I'm being a little bit hand wavy, and I think we need some systems. I think that's where we should set the bar, especially when so much of our life depends on this critical digital infrastructure.[00:53:28] Bret: So I'm I'm just like super optimistic about it. But actually, let's go to w
This show has been flagged as Clean by the host. I have set up some LoRaWAN temperature and humidity sensors, and am using the Things Stack to collect the data. This gets processed via a web-hook and rendered as a graph.

The LoRaWAN Alliance - https://lora-alliance.org
Mastering LoRaWAN - https://www.amazon.com/Mastering-LoRaWAN-Comprehensive-Communication-Connectivity-ebook/dp/B0CTRH6MV6
The Things Industries - https://thethingsindustries.com

server.py

import json
import sqlite3
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

rooms = {
    'eui-24e12*********07': 'living-room',
    'eui-24e12*********54': 'hall',
    'eui-24e12*********42': 'downstairs-office',
    'eui-24e12*********35': 'kitchen',
    'eui-24e12*********29': 'conservatory',
    'eui-24e12*********87': 'landing',
    'eui-24e12*********45': 'main-bedroom',
    'eui-24e12*********89': 'upstairs-office',
    'eui-24e12*********38': 'spare-bedroom',
    'eui-24e12*********37': 'playroom'
}

# Configure logging
logging.basicConfig(filename="server_log.txt", level=logging.INFO,
                    format="%(asctime)s - %(message)s")

# Define the web server handler
class MyServerHandler(BaseHTTPRequestHandler):
    # Handle POST requests
    def do_POST(self):
        length = int(self.headers.get('Content-Length'))
        data = self.rfile.read(length).decode('utf-8')
        try:
            # Validate and parse JSON data
            json_data = json.loads(data)
            logging.info(f"Received valid JSON data: {json_data}")

            # Write the data to database
            id = json_data["end_device_ids"]["device_id"]
            room = rooms.get(id)
            readat = json_data["uplink_message"]["rx_metadata"][0]["time"]
            temp = json_data["uplink_message"]["decoded_payload"]["temperature"]
            hum = json_data["uplink_message"]["decoded_payload"]["humidity"]

            conn = sqlite3.connect('data.db')
            sql = """CREATE TABLE IF NOT EXISTS data (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                room TEXT,
                readat DATETIME,
                temp DECIMAL(4,1),
                hum DECIMAL(4,1)
            );"""
            conn.execute(sql)
            sql = "INSERT INTO data (room, readat, temp, hum) VALUES (?, ?, ?, ?)"
            conn.execute(sql, (room, readat, temp, hum))
            conn.commit()
            conn.close()

            self.send_response(200)
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(bytes("Data received and logged!", "utf-8"))
        except json.JSONDecodeError:
            logging.error("Invalid JSON data received.")
            self.send_response(400)  # Bad Request
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(bytes("Invalid JSON format.", "utf-8"))
        except PermissionError:
            logging.error("File write permission denied.")
            self.send_response(500)  # Internal Server Error
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(bytes("Server error: Unable to write data to file.", "utf-8"))

# Start the server
server_address = ('0.0.0.0', 12345)  # Customize host and port if needed
httpd = HTTPServer(server_address, MyServerHandler)
print("Server started on http://localhost:12345")
httpd.serve_forever()

process.php

const data1 = { datasets: [], labels: [] };
const ctx1 = document.getElementById("temp").getContext("2d");
const options1 = {
    type: "line",
    data: data1,
    options: {
        elements: {
            point: { radius: 0 }
        }
    }
};
const chart1 = new Chart(ctx1, options1);

const data2 = { datasets: [], labels: [] };
const ctx2 = document.getElementById("hum").getContext("2d");
const options2 = {
    type: "line",
    data: data2,
    options: {
        elements: {
            point: { radius: 0 }
        }
    }
};
const chart2 = new Chart(ctx2, options2);

Temperature Chart
Humidity Chart

Provide feedback on this episode.
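A quick way to exercise the web-hook handler in server.py above without waiting for a real uplink is to post a hand-built payload shaped like the fields the handler reads. The device EUI and readings below are invented, and the requests package is assumed to be installed:

import requests

sample_uplink = {
    "end_device_ids": {"device_id": "eui-24e12*********07"},
    "uplink_message": {
        "rx_metadata": [{"time": "2025-01-01T12:00:00Z"}],
        "decoded_payload": {"temperature": 21.4, "humidity": 48.0},
    },
}

# Expect "200 Data received and logged!" and a new row in data.db.
resp = requests.post("http://localhost:12345", json=sample_uplink, timeout=5)
print(resp.status_code, resp.text)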
Did you know that adding a simple Code Interpreter took o3 from 9.2% to 32% on FrontierMath? The Latent Space crew is hosting a hack night Feb 11th in San Francisco focused on CodeGen use cases, co-hosted with E2B and Edge AGI; watch E2B's new workshop and RSVP here!We're happy to announce that today's guest Samuel Colvin will be teaching his very first Pydantic AI workshop at the newly announced AI Engineer NYC Workshops day on Feb 22! 25 tickets left.If you're a Python developer, it's very likely that you've heard of Pydantic. Every month, it's downloaded >300,000,000 times, making it one of the top 25 PyPi packages. OpenAI uses it in its SDK for structured outputs, it's at the core of FastAPI, and if you've followed our AI Engineer Summit conference, Jason Liu of Instructor has given two great talks about it: “Pydantic is all you need” and “Pydantic is STILL all you need”. Now, Samuel Colvin has raised $17M from Sequoia to turn Pydantic from an open source project to a full stack AI engineer platform with Logfire, their observability platform, and PydanticAI, their new agent framework.Logfire: bringing OTEL to AIOpenTelemetry recently merged Semantic Conventions for LLM workloads which provides standard definitions to track performance like gen_ai.server.time_per_output_token. In Sam's view at least 80% of new apps being built today have some sort of LLM usage in them, and just like web observability platform got replaced by cloud-first ones in the 2010s, Logfire wants to do the same for AI-first apps. If you're interested in the technical details, Logfire migrated away from Clickhouse to Datafusion for their backend. We spent some time on the importance of picking open source tools you understand and that you can actually contribute to upstream, rather than the more popular ones; listen in ~43:19 for that part.Agents are the killer app for graphsPydantic AI is their attempt at taking a lot of the learnings that LangChain and the other early LLM frameworks had, and putting Python best practices into it. At an API level, it's very similar to the other libraries: you can call LLMs, create agents, do function calling, do evals, etc.They define an “Agent” as a container with a system prompt, tools, structured result, and an LLM. Under the hood, each Agent is now a graph of function calls that can orchestrate multi-step LLM interactions. You can start simple, then move toward fully dynamic graph-based control flow if needed.“We were compelled enough by graphs once we got them right that our agent implementation [...] 
is now actually a graph under the hood.”Why Graphs?* More natural for complex or multi-step AI workflows.* Easy to visualize and debug with mermaid diagrams.* Potential for distributed runs, or “waiting days” between steps in certain flows.In parallel, you see folks like Emil Eifrem of Neo4j talk about GraphRAG as another place where graphs fit really well in the AI stack, so it might be time for more people to take them seriously.Full Video EpisodeLike and subscribe!Chapters* 00:00:00 Introductions* 00:00:24 Origins of Pydantic* 00:05:28 Pydantic's AI moment * 00:08:05 Why build a new agents framework?* 00:10:17 Overview of Pydantic AI* 00:12:33 Becoming a believer in graphs* 00:24:02 God Model vs Compound AI Systems* 00:28:13 Why not build an LLM gateway?* 00:31:39 Programmatic testing vs live evals* 00:35:51 Using OpenTelemetry for AI traces* 00:43:19 Why they don't use Clickhouse* 00:48:34 Competing in the observability space* 00:50:41 Licensing decisions for Pydantic and LogFire* 00:51:48 Building Pydantic.run* 00:55:24 Marimo and the future of Jupyter notebooks* 00:57:44 London's AI sceneShow Notes* Sam Colvin* Pydantic* Pydantic AI* Logfire* Pydantic.run* Zod* E2B* Arize* Langsmith* Marimo* Prefect* GLA (Google Generative Language API)* OpenTelemetry* Jason Liu* Sebastian Ramirez* Bogomil Balkansky* Hood Chatham* Jeremy Howard* Andrew LambTranscriptAlessio [00:00:03]: Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:12]: Good morning. And today we're very excited to have Sam Colvin join us from Pydantic AI. Welcome. Sam, I heard that Pydantic is all we need. Is that true?Samuel [00:00:24]: I would say you might need Pydantic AI and Logfire as well, but it gets you a long way, that's for sure.Swyx [00:00:29]: Pydantic almost basically needs no introduction. It's almost 300 million downloads in December. And obviously, in the previous podcasts and discussions we've had with Jason Liu, he's been a big fan and promoter of Pydantic and AI.Samuel [00:00:45]: Yeah, it's weird because obviously I didn't create Pydantic originally for uses in AI, it predates LLMs. But it's like we've been lucky that it's been picked up by that community and used so widely.Swyx [00:00:58]: Actually, maybe we'll hear it. Right from you, what is Pydantic and maybe a little bit of the origin story?Samuel [00:01:04]: The best name for it, which is not quite right, is a validation library. And we get some tension around that name because it doesn't just do validation, it will do coercion by default. We now have strict mode, so you can disable that coercion. But by default, if you say you want an integer field and you get in a string of 1, 2, 3, it will convert it to 123 and a bunch of other sensible conversions. And as you can imagine, the semantics around it. Exactly when you convert and when you don't, it's complicated, but because of that, it's more than just validation. Back in 2017, when I first started it, the different thing it was doing was using type hints to define your schema. That was controversial at the time. It was genuinely disapproved of by some people. I think the success of Pydantic and libraries like FastAPI that build on top of it means that today that's no longer controversial in Python. And indeed, lots of other people have copied that route, but yeah, it's a data validation library. 
It uses type hints for the for the most part and obviously does all the other stuff you want, like serialization on top of that. But yeah, that's the core.Alessio [00:02:06]: Do you have any fun stories on how JSON schemas ended up being kind of like the structure output standard for LLMs? And were you involved in any of these discussions? Because I know OpenAI was, you know, one of the early adopters. So did they reach out to you? Was there kind of like a structure output console in open source that people were talking about or was it just a random?Samuel [00:02:26]: No, very much not. So I originally. Didn't implement JSON schema inside Pydantic and then Sebastian, Sebastian Ramirez, FastAPI came along and like the first I ever heard of him was over a weekend. I got like 50 emails from him or 50 like emails as he was committing to Pydantic, adding JSON schema long pre version one. So the reason it was added was for OpenAPI, which is obviously closely akin to JSON schema. And then, yeah, I don't know why it was JSON that got picked up and used by OpenAI. It was obviously very convenient for us. That's because it meant that not only can you do the validation, but because Pydantic will generate you the JSON schema, it will it kind of can be one source of source of truth for structured outputs and tools.Swyx [00:03:09]: Before we dive in further on the on the AI side of things, something I'm mildly curious about, obviously, there's Zod in JavaScript land. Every now and then there is a new sort of in vogue validation library that that takes over for quite a few years and then maybe like some something else comes along. Is Pydantic? Is it done like the core Pydantic?Samuel [00:03:30]: I've just come off a call where we were redesigning some of the internal bits. There will be a v3 at some point, which will not break people's code half as much as v2 as in v2 was the was the massive rewrite into Rust, but also fixing all the stuff that was broken back from like version zero point something that we didn't fix in v1 because it was a side project. We have plans to move some of the basically store the data in Rust types after validation. Not completely. So we're still working to design the Pythonic version of it, in order for it to be able to convert into Python types. So then if you were doing like validation and then serialization, you would never have to go via a Python type we reckon that can give us somewhere between three and five times another three to five times speed up. That's probably the biggest thing. Also, like changing how easy it is to basically extend Pydantic and define how particular types, like for example, NumPy arrays are validated and serialized. But there's also stuff going on. And for example, Jitter, the JSON library in Rust that does the JSON parsing, has SIMD implementation at the moment only for AMD64. So we can add that. We need to go and add SIMD for other instruction sets. So there's a bunch more we can do on performance. I don't think we're going to go and revolutionize Pydantic, but it's going to continue to get faster, continue, hopefully, to allow people to do more advanced things. We might add a binary format like CBOR for serialization for when you'll just want to put the data into a database and probably load it again from Pydantic. So there are some things that will come along, but for the most part, it should just get faster and cleaner.Alessio [00:05:04]: From a focus perspective, I guess, as a founder too, how did you think about the AI interest rising? 
And then how do you kind of prioritize, okay, this is worth going into more, and we'll talk about Pydantic AI and all of that. What was maybe your early experience with LLMs, and when did you figure out, okay, this is something we should take seriously and focus more resources on it?Samuel [00:05:28]: I'll answer that, but I'll answer what I think is a kind of parallel question, which is Pydantic's weird, because Pydantic existed, obviously, before I was starting a company. I was working on it in my spare time, and then beginning of '22, I started working on the rewrite in Rust. And I worked on it full-time for a year and a half, and then once we started the company, people came and joined. And it was a weird project, because that would never get signed off inside a startup. Like, we're going to go off and three engineers are going to work full-on for a year in Python and Rust, writing like 30,000 lines of Rust, just to release a free, open source Python library. The result of that has been excellent for us as a company, right? As in, it's made us remain entirely relevant. And it's like, Pydantic is not just used in the SDKs of all of the AI libraries, but, and I can't say which one, one of the big foundational model companies, when they upgraded from Pydantic v1 to v2, their number one internal performance metric, time to first token, went down by 20%. So you think about all of the actual AI going on inside, and yet at least 20% of the CPU, or at least the latency inside requests, was actually Pydantic, which shows like how widely it's used. So we've benefited from doing that work, although it would never have made financial sense in most companies. In answer to your question about like, how do we prioritize AI, I mean, the honest truth is we've spent a lot of the last year and a half building good general purpose observability inside LogFire and making Pydantic good for general purpose use cases. And the AI has kind of come to us. Like, not that we want to get away from it, but the appetite, both in Pydantic and in LogFire, to go and build with AI is enormous, because it kind of makes sense, right? Like if you're starting a new greenfield project in Python today, what's the chance that you're using GenAI? 80%, let's say, globally; obviously it's like a hundred percent in California, but even worldwide, it's probably 80%. Yeah. And so everyone needs that stuff. And there's so much yet to be figured out, so much space to do things better in the ecosystem, in a way that like to go and implement a database that's better than Postgres is a Sisyphean task. Whereas building tools that are better for GenAI than some of the stuff that's about now is not very difficult. Putting the actual models themselves to one side.Alessio [00:07:40]: And then at the same time, you released Pydantic AI recently, which is, you know, an agent framework, and early on, I would say everybody, you know, LangChain and a lot of these frameworks, gave Pydantic kind of first class support; they were trying to use you to be better. What was the decision behind "we should do our own framework"? Were there any design decisions that you disagreed with, any workloads that you think people didn't support well?Samuel [00:08:05]: It wasn't so much like design and workflow, although I think there were some things we've done differently. Yeah.
I think looking in general at the ecosystem of agent frameworks, the engineering quality is far below that of the rest of the Python ecosystem. There's a bunch of stuff that we have learned how to do over the last 20 years of building Python libraries and writing Python code that seems to be abandoned by people when they build agent frameworks. Now I can kind of respect that, particularly in the very first agent frameworks, like Langchain, where they were literally figuring out how to go and do this stuff. It's completely understandable that you would basically skip some stuff.Samuel [00:08:42]: I'm shocked by the quality of some of the agent frameworks that have come out recently from well-respected names, which just seems to be opportunism, and I have little time for that. But the early ones, I think they were just figuring out how to do stuff, and just as lots of people have learned from Pydantic, we were able to learn a bit from them. I think the gap we saw and the thing we were frustrated by was the production readiness. And that means things like type checking, even if type checking makes it hard. Like Pydantic AI, I will put my hand up now and say it has a lot of generics, and it's probably easier to use it if you've written a bit of Rust and you really understand generics. We're not claiming that that makes it the easiest thing to use in all cases; we think it makes it good for production applications in big systems where type checking is a no-brainer in Python. But there are also a bunch of things we've learned from maintaining Pydantic over the years that we've gone and done. So every single example in Pydantic AI's documentation is run in Python as part of the tests, and every single print output within an example is checked during tests. So it will always be up to date. And then a bunch of things that, like I say, are standard best practice within the rest of the Python ecosystem but are, surprisingly, not followed by some AI libraries: coverage, linting, type checking, et cetera, et cetera, where I think these are no-brainers, but weirdly they're not followed by some of the other libraries.Alessio [00:10:04]: And can you just give an overview of the framework itself? I think there's kind of like the LLM calling frameworks, there are the multi-agent frameworks, there's the workflow frameworks, like what does Pydantic AI do?Samuel [00:10:17]: I glaze over a bit when I hear all of the different sorts of frameworks, but I will tell you, when I built Pydantic, when I built Logfire and when I built Pydantic AI, my methodology is not to go and research and review all of the other things. I kind of work out what I want and I go and build it, and then feedback comes and we adjust. So the fundamental building block of Pydantic AI is agents. The exact definition of agents and how you want to define them is obviously ambiguous, and our things are probably sort of agent-lite, not that we would want to go and rename them to agent-lite, but the point is you probably build them together to build something that most people will call an agent. So an agent in our case has, you know, things like a prompt, like a system prompt, and some tools, and a structured return type if you want it. That covers the vast majority of cases. There are situations where you want to go further, and the most complex workflows are where you want graphs, and I resisted graphs for quite a while.
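Before the graph discussion continues, here is a rough sketch of the agent shape Samuel just described: a model, a system prompt, a tool, and a structured return type. It is loosely based on the Pydantic AI documentation; the model string, parameter names, and result attribute may differ between versions, so treat it as illustrative rather than canonical:

```python
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext


class Forecast(BaseModel):
    city: str
    temperature_c: float
    summary: str


# An agent here is just: an LLM, a system prompt, some tools, and a structured result type.
agent = Agent(
    "openai:gpt-4o",  # illustrative model identifier
    system_prompt="You are a concise weather assistant.",
    result_type=Forecast,  # the validated, structured return type
)


@agent.tool
async def current_temperature(ctx: RunContext[None], city: str) -> float:
    """Stand-in tool; a real one would call a weather API."""
    return 21.5


result = agent.run_sync("What's the weather like in London?")
print(result.data)  # a validated Forecast instance
```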
I was sort of of the opinion you didn't need them and you could use standard Python flow control to do all of that stuff. I had a few arguments with people, but I basically came around to, yeah, I can totally see why graphs are useful. But then we have the problem that by default, they're not type safe, because if you have a like add edge method where you give the names of two different nodes, there's no type checking, right? Even if you go and do some runtime checking, and not all the graph libraries are AI specific. So there's a graph library that does like a basic runtime type checking, ironically using Pydantic to try and make up for the fact that fundamentally graphs are not type safe. Well, I like Pydantic, but that's not a real solution, to have to go and run the code to see if it's safe. There's a reason that static type checking is so powerful. And so, from a lot of iteration, we eventually came up with a system of using normally data classes to define nodes, where you return the next node you want to call, and where we're able to go and introspect the return type of a node to basically build the graph. And so the graph is, yeah, inherently type safe. And once we got that right, I'm incredibly excited about graphs. I think there's like masses of use cases for them, both in gen AI and other development, but also software is all going to have to interact with gen AI, right? It's going to be like web. There's no longer like a web department in a company; it's just that all the developers are building for the web, building with databases. The same is going to be true for gen AI.Alessio [00:12:33]: Yeah. I see on your docs, you call an agent a container that contains a system prompt, function tools, structured result, dependency type, model, and then model settings. Are the graphs in your mind different agents? Are they different prompts for the same agent? What are the structures in your mind?Samuel [00:12:52]: So we were compelled enough by graphs once we got them right that we actually merged the PR this morning. That means our agent implementation, without changing its API at all, is now actually a graph under the hood, as it is built using our graph library. So graphs are basically a lower level tool that allow you to build these complex workflows. Our agents are technically one of the many graphs you could go and build. And we just happened to build that one for you because it's a very common, commonplace one. But obviously there are cases where you need more complex workflows where the current agent assumptions don't work. And that's where you can then go and use graphs to build more complex things.Swyx [00:13:29]: You said you were cynical about graphs. What changed your mind specifically?Samuel [00:13:33]: I guess people kept giving me examples of things that they wanted to use graphs for. And my "yeah, but you could do that in standard flow control in Python" became a less and less compelling argument to me, because I've maintained those systems that end up with spaghetti code. And I could see the appeal of this structured way of defining the workflow of my code. And it's really neat that just from your code, just from your type hints, you can get out a mermaid diagram that defines exactly what can go and happen.Swyx [00:14:00]: Right. Yeah. You do have a very neat implementation of sort of inferring the graph from type hints, I guess, is what I would call it. Yeah.
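A stripped-down illustration of the pattern Samuel describes: nodes are plain data classes, the return type annotation of each node's run method is effectively the edge list (so a type checker, or introspection, can build and verify the graph), and execution is just "call a node, get the next node back, stop at End". This is a generic sketch of the idea, not the actual pydantic_graph API:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class End:
    """Terminal marker carrying the final result."""
    result: str


@dataclass
class Draft:
    topic: str

    def run(self) -> Review:  # the annotation is the edge: Draft -> Review
        return Review(text=f"A short draft about {self.topic}")


@dataclass
class Review:
    text: str

    def run(self) -> Union[Draft, End]:  # Review -> Draft (retry) or End
        if "draft" in self.text:
            return End(result=self.text.upper())
        return Draft(topic="something else")


def run_graph(node, max_steps: int = 10) -> str:
    """Call a node, get a node back, repeat; stop when we hit End."""
    for _ in range(max_steps):
        if isinstance(node, End):
            return node.result
        node = node.run()
    raise RuntimeError("graph did not terminate within the step budget")


print(run_graph(Draft(topic="type-safe graphs")))
```

Introspecting each run method's return annotation (for example via typing.get_type_hints) is what would let a library recover the edges, check them statically, and render the kind of mermaid diagram mentioned just above.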
I think the question always is, I have gone back and forth. I used to work at Temporal, where we would actually spend a lot of time complaining about graph based workflow solutions like AWS Step Functions. And we would actually say that we were better because you could use normal control flow that you already knew and worked with. Yours, I guess, is like a little bit of a nice compromise. Like it looks like normal Pythonic code. But you just have to keep in mind what the type hints actually mean. And that's what we do with the quote unquote magic that the graph construction does.Samuel [00:14:42]: Yeah, exactly. And if you look at the internal logic of actually running a graph, it's incredibly simple. It's basically call a node, get a node back, call that node, get a node back, call that node. If you get an end, you're done. We will add in soon support for, well, basically storage, so that you can store the state between each node that's run. And then the idea is you can then distribute the graph and run it across computers. And also, I mean, the other bit that's really valuable is across time. Because it's all very well if you look at lots of the graph examples that like Claude will give you. If it gives you an example, it gives you this lovely enormous mermaid chart of like the workflow, for example, managing returns if you're an e-commerce company. But what you realize is some of those lines are literally one function calls another function. And some of those lines are wait six days for the customer to print their like piece of paper and put it in the post. And if you're writing like your demo project or your like proof of concept, that's fine, because you can just say, and now we call this function. But when you're building in real life, that doesn't work. And now how do we manage that concept, to basically be able to start somewhere else in our code? Well, this graph implementation makes it incredibly easy, because you just pass the node that is the start point for carrying on the graph and it continues to run. So it's things like that where I was like, yeah, I can just imagine how things I've done in the past would be fundamentally easier to understand if we had done them with graphs.Swyx [00:16:07]: You say imagine, but like right now, can Pydantic AI actually resume, you know, six days later, like you said, or is this just like a theoretical thing we'll get someday?Samuel [00:16:16]: I think it's basically Q&A. So there's an AI that's asking the user a question, and effectively you then call the CLI again to continue the conversation. And it basically instantiates the node and calls the graph with that node again. Now, we don't have the logic yet for effectively storing state in the database between individual nodes; that we're going to add soon. But like the rest of it is basically there.Swyx [00:16:37]: It does make me think that not only are you competing with Langchain now and obviously Instructor, but now you're going into sort of the more orchestrated things like Airflow, Prefect, Dagster, those guys.Samuel [00:16:52]: Yeah, I mean, we're good friends with the Prefect guys, and Temporal have the same investors as us. And I'm sure that my investor Bogomil would not be too happy if I was like, oh, yeah, by the way, as well as trying to take on Datadog, we're also going off and trying to take on Temporal and everyone else doing that. Obviously, we're not doing all of the infrastructure of deploying that right yet, at least.
We're, you know, we're just building a Python library. And like what's crazy about our graph implementation is, sure, there's a bit of magic in like introspecting the return type, you know, extracting things from unions, stuff like that. But like the actual calls, as I say, is literally call a function and get back a thing and call that. It's like incredibly simple and therefore easy to maintain. The question is, how useful is it? Well, I don't know yet. I think we have to go and find out. We have a whole. We've had a slew of people joining our Slack over the last few days and saying, tell me how good Pydantic AI is. How good is Pydantic AI versus Langchain? And I refuse to answer. That's your job to go and find that out. Not mine. We built a thing. I'm compelled by it, but I'm obviously biased. The ecosystem will work out what the useful tools are.Swyx [00:17:52]: Bogomol was my board member when I was at Temporal. And I think I think just generally also having been a workflow engine investor and participant in this space, it's a big space. Like everyone needs different functions. I think the one thing that I would say like yours, you know, as a library, you don't have that much control of it over the infrastructure. I do like the idea that each new agents or whatever or unit of work, whatever you call that should spin up in this sort of isolated boundaries. Whereas yours, I think around everything runs in the same process. But you ideally want to sort of spin out its own little container of things.Samuel [00:18:30]: I agree with you a hundred percent. And we will. It would work now. Right. As in theory, you're just like as long as you can serialize the calls to the next node, you just have to all of the different containers basically have to have the same the same code. I mean, I'm super excited about Cloudflare workers running Python and being able to install dependencies. And if Cloudflare could only give me my invitation to the private beta of that, we would be exploring that right now because I'm super excited about that as a like compute level for some of this stuff where exactly what you're saying, basically. You can run everything as an individual. Like worker function and distribute it. And it's resilient to failure, et cetera, et cetera.Swyx [00:19:08]: And it spins up like a thousand instances simultaneously. You know, you want it to be sort of truly serverless at once. Actually, I know we have some Cloudflare friends who are listening, so hopefully they'll get in front of the line. Especially.Samuel [00:19:19]: I was in Cloudflare's office last week shouting at them about other things that frustrate me. I have a love-hate relationship with Cloudflare. Their tech is awesome. But because I use it the whole time, I then get frustrated. So, yeah, I'm sure I will. I will. I will get there soon.Swyx [00:19:32]: There's a side tangent on Cloudflare. Is Python supported at full? I actually wasn't fully aware of what the status of that thing is.Samuel [00:19:39]: Yeah. So Pyodide, which is Python running inside the browser in scripting, is supported now by Cloudflare. They basically, they're having some struggles working out how to manage, ironically, dependencies that have binaries, in particular, Pydantic. Because these workers where you can have thousands of them on a given metal machine, you don't want to have a difference. You basically want to be able to have a share. Shared memory for all the different Pydantic installations, effectively. That's the thing they work out. 
They're working out. But Hood, who's my friend, who is the primary maintainer of Pyodide, works for Cloudflare. And that's basically what he's doing, is working out how to get Python running on Cloudflare's network.Swyx [00:20:19]: I mean, the nice thing is that your binary is really written in Rust, right? Yeah. Which also compiles the WebAssembly. Yeah. So maybe there's a way that you'd build... You have just a different build of Pydantic and that ships with whatever your distro for Cloudflare workers is.Samuel [00:20:36]: Yes, that's exactly what... So Pyodide has builds for Pydantic Core and for things like NumPy and basically all of the popular binary libraries. Yeah. It's just basic. And you're doing exactly that, right? You're using Rust to compile the WebAssembly and then you're calling that shared library from Python. And it's unbelievably complicated, but it works. Okay.Swyx [00:20:57]: Staying on graphs a little bit more, and then I wanted to go to some of the other features that you have in Pydantic AI. I see in your docs, there are sort of four levels of agents. There's single agents, there's agent delegation, programmatic agent handoff. That seems to be what OpenAI swarms would be like. And then the last one, graph-based control flow. Would you say that those are sort of the mental hierarchy of how these things go?Samuel [00:21:21]: Yeah, roughly. Okay.Swyx [00:21:22]: You had some expression around OpenAI swarms. Well.Samuel [00:21:25]: And indeed, OpenAI have got in touch with me and basically, maybe I'm not supposed to say this, but basically said that Pydantic AI looks like what swarms would become if it was production ready. So, yeah. I mean, like, yeah, which makes sense. Awesome. Yeah. I mean, in fact, it was specifically saying, how can we give people the same feeling that they were getting from swarms that led us to go and implement graphs? Because my, like, just call the next agent with Python code was not a satisfactory answer to people. So it was like, okay, we've got to go and have a better answer for that. It's not like, let us to get to graphs. Yeah.Swyx [00:21:56]: I mean, it's a minimal viable graph in some sense. What are the shapes of graphs that people should know? So the way that I would phrase this is I think Anthropic did a very good public service and also kind of surprisingly influential blog post, I would say, when they wrote Building Effective Agents. We actually have the authors coming to speak at my conference in New York, which I think you're giving a workshop at. Yeah.Samuel [00:22:24]: I'm trying to work it out. But yes, I think so.Swyx [00:22:26]: Tell me if you're not. yeah, I mean, like, that was the first, I think, authoritative view of, like, what kinds of graphs exist in agents and let's give each of them a name so that everyone is on the same page. So I'm just kind of curious if you have community names or top five patterns of graphs.Samuel [00:22:44]: I don't have top five patterns of graphs. I would love to see what people are building with them. But like, it's been it's only been a couple of weeks. And of course, there's a point is that. Because they're relatively unopinionated about what you can go and do with them. They don't suit them. Like, you can go and do lots of lots of things with them, but they don't have the structure to go and have like specific names as much as perhaps like some other systems do. 
I think what our agents are, which have a name and I can't remember what it is, but this basically system of like, decide what tool to call, go back to the center, decide what tool to call, go back to the center and then exit. One form of graph, which, as I say, like our agents are effectively one implementation of a graph, which is why under the hood they are now using graphs. And it'll be interesting to see over the next few years whether we end up with these like predefined graph names or graph structures or whether it's just like, yep, I built a graph or whether graphs just turn out not to match people's mental image of what they want and die away. We'll see.Swyx [00:23:38]: I think there is always appeal. Every developer eventually gets graph religion and goes, oh, yeah, everything's a graph. And then they probably over rotate and go go too far into graphs. And then they have to learn a whole bunch of DSLs. And then they're like, actually, I didn't need that. I need this. And they scale back a little bit.Samuel [00:23:55]: I'm at the beginning of that process. I'm currently a graph maximalist, although I haven't actually put any into production yet. But yeah.Swyx [00:24:02]: This has a lot of philosophical connections with other work coming out of UC Berkeley on compounding AI systems. I don't know if you know of or care. This is the Gartner world of things where they need some kind of industry terminology to sell it to enterprises. I don't know if you know about any of that.Samuel [00:24:24]: I haven't. I probably should. I should probably do it because I should probably get better at selling to enterprises. But no, no, I don't. Not right now.Swyx [00:24:29]: This is really the argument is that instead of putting everything in one model, you have more control and more maybe observability to if you break everything out into composing little models and changing them together. And obviously, then you need an orchestration framework to do that. Yeah.Samuel [00:24:47]: And it makes complete sense. And one of the things we've seen with agents is they work well when they work well. But when they. Even if you have the observability through log five that you can see what was going on, if you don't have a nice hook point to say, hang on, this is all gone wrong. You have a relatively blunt instrument of basically erroring when you exceed some kind of limit. But like what you need to be able to do is effectively iterate through these runs so that you can have your own control flow where you're like, OK, we've gone too far. And that's where one of the neat things about our graph implementation is you can basically call next in a loop rather than just running the full graph. And therefore, you have this opportunity to to break out of it. But yeah, basically, it's the same point, which is like if you have two bigger unit of work to some extent, whether or not it involves gen AI. But obviously, it's particularly problematic in gen AI. You only find out afterwards when you've spent quite a lot of time and or money when it's gone off and done done the wrong thing.Swyx [00:25:39]: Oh, drop on this. We're not going to resolve this here, but I'll drop this and then we can move on to the next thing. This is the common way that we we developers talk about this. And then the machine learning researchers look at us. And laugh and say, that's cute. And then they just train a bigger model and they wipe us out in the next training run. 
So I think there's a certain amount of, we are fighting the bitter lesson here. We're fighting AGI. And, you know, when AGI arrives, this will all go away. Obviously, on Latent Space, we don't really discuss that, because I think AGI is kind of this hand wavy concept that isn't super relevant. But I think we have to respect that. For example, you could do a chain of thought with graphs, and you could manually orchestrate a nice little graph that does like: reflect, think about whether you need more inference-time compute, you know, that's the hot term now, and then think again and, you know, scale that up. Or you could train Strawberry or DeepSeek R1. Right.Samuel [00:26:32]: I saw someone saying recently, oh, they were really optimistic about agents because models are getting faster exponentially. And it took a certain amount of self-control not to point out that it wasn't exponential. But my main point was: if models are getting faster as quickly as you say they are, then we don't need agents and we don't really need any of these abstraction layers. We can just give our model, you know, access to the Internet, cross our fingers and hope for the best. Agents, agent frameworks, graphs, all of this stuff is basically making up for the fact that right now the models are not that clever. In the same way that if you're running a customer service business and you have loads of people sitting answering telephones, the less well trained they are, the less that you trust them, the more that you need to give them a script to go through. Whereas, you know, if you're running a bank and you have lots of customer service people who you don't trust that much, then you tell them exactly what to say. If you're doing high net worth banking, you just employ people who you think are going to be charming to other rich people and set them off to go and have coffee with people. Right. And the same is true of models. The more intelligent they are, the less we need to, like, structure what they go and do and constrain the routes that they take.Swyx [00:27:42]: Yeah. Yeah. Agree with that. So I'm happy to move on. So the other parts of Pydantic AI that are worth commenting on, and this is like my last rant, I promise. So obviously, every framework needs to do its sort of model adapter layer, which is, oh, you can easily swap from OpenAI to Claude to Grok. You also have, which I didn't know about, Google GLA, which I didn't really know about until I saw this in your docs, which is the Generative Language API. I assume that's AI Studio? Yes.Samuel [00:28:13]: Google don't have good names for it. So Vertex is very clear. That seems to be the API that like some of the things use, although it returns 503 about 20% of the time. So... Vertex? No. Vertex, fine. But the... Oh, oh. GLA. Yeah. Yeah.Swyx [00:28:28]: I agree with that.Samuel [00:28:29]: So we have, again, another example of where I think we go the extra mile in terms of engineering: on every commit, at least every commit to main, we run tests against the live models. Not lots of tests, but like a handful of them. Oh, okay. And we had a point last week where, yeah, GLA was failing every single run; one of their tests would fail. And I think we might even have commented that one out at the moment. So like all of the models fail more often than you might expect, but that one seems to be particularly likely to fail.
But Vertex is the same API, but much more reliable.Swyx [00:29:01]: My rant here is that, you know, versions of this appear in Langchain, and every single framework has to have its own little thing, a version of that. I would put to you, and then, you know, this can be agree-to-disagree: this is not needed in Pydantic AI. I would much rather you adopt a layer like LiteLLM, or what's the other one in JavaScript, Portkey. And that's their job. They focus on that one thing and they normalize APIs for you. All new models are automatically added and you don't have to duplicate this inside of your framework. So for example, if I wanted to use DeepSeek, I'm out of luck because Pydantic AI doesn't have DeepSeek yet.Samuel [00:29:38]: Yeah, it does.Swyx [00:29:39]: Oh, it does. Okay. I'm sorry. But you know what I mean? Should this live in your code or should it live in a layer that's kind of your API gateway, that's a defined piece of infrastructure that people have?Samuel [00:29:49]: And I think if a company who are well known, who are respected by everyone, had come along and done this at the right time, maybe we should have done it a year and a half ago and said, we're going to be the universal AI layer. That would have been a credible thing to do. The truth is I've heard varying reports of LiteLLM. And it didn't seem to have exactly the type safety that we needed. Also, as I understand it, and again, I haven't looked into it in great detail, part of their business model is proxying the request through their own system to do the generalization. That would be an enormous put-off to an awful lot of people. Honestly, the truth is I don't think it is that much work unifying the models. I get where you're coming from. I kind of see your point. I think the truth is that everyone is centralizing around OpenAI's API; OpenAI's API is the one to do. So DeepSeek supports that, Grok supports that, Ollama also does it. I mean, if there is that library right now, it's more or less the OpenAI SDK. And it's very high quality. It's well type checked. It uses Pydantic. So I'm biased. But I mean, I think it's pretty well respected anyway.Swyx [00:30:57]: There's different ways to do this. Because also, it's not just about normalizing the APIs. You have to do secret management and all that stuff.Samuel [00:31:05]: Yeah. And there's also Vertex and Bedrock, which to one extent or another effectively host multiple models, but they don't unify the API. But they do unify the auth, as I understand it. Although we're halfway through doing Bedrock, so I don't know about it that well. But they're kind of weird hybrids, because they support multiple models, but like I say, the auth is centralized.Swyx [00:31:28]: Yeah, I'm surprised they don't unify the API. That seems like something that I would do. You know, we can discuss all this all day. There's a lot of APIs. I agree.Samuel [00:31:36]: It would be nice if there was a universal one that we didn't have to go and build.Alessio [00:31:39]: And I guess the other side of, you know, routing models and picking models is evals. How do you actually figure out which one you should be using? I know you have one. First of all, you have very good support for mocking in unit tests, which is something that a lot of other frameworks don't do. So, you know, my favorite Ruby library is VCR, because it just lets me store the HTTP requests and replay them. That part I'll kind of skip.
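As a small illustration of the centralization Samuel describes above, many providers expose OpenAI-compatible endpoints, so the official openai Python SDK can often be pointed at a different backend just by swapping base_url and key. The URL, key, and model name below are assumptions for illustration; check the specific provider's docs:

```python
from openai import OpenAI

# Same client, different backend: an assumed local Ollama endpoint here.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # illustrative OpenAI-compatible endpoint
    api_key="not-needed-locally",          # placeholder; local servers often ignore it
)

response = client.chat.completions.create(
    model="llama3.2",  # whatever model the backend actually serves
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```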
I think you have this TestModel, where, like, just through Python, you try and figure out what the model might respond without actually calling the model. And then you have the FunctionModel where people can kind of customize outputs. Any other fun stories maybe from there? Or is it just what you see is what you get, so to speak?Samuel [00:32:18]: On those two, I think what you see is what you get. On the evals, I think watch this space. I think it's something that, again, I was somewhat cynical about for some time, and I still have my cynicism about some of it. Well, it's unfortunate that so many different things are called evals. It would be nice if we could agree what they are and what they're not. But look, I think it's a really important space. I think it's something that we're going to be working on soon, both in Pydantic AI and in LogFire, to try and support better, because it's an unsolved problem.Alessio [00:32:45]: Yeah, you do say in your doc that anyone who claims to know for sure exactly how your evals should be defined can safely be ignored.Samuel [00:32:52]: We'll delete that sentence when we tell people how to do their evals.Alessio [00:32:56]: Exactly. I was like, we need a snapshot of this today. And so let's talk about evals. So there's kind of like the vibe. Yeah. So you have evals, which is what you do when you're building, right? Because you cannot really test it that many times to get statistical significance. And then there's the production eval. So you also have LogFire, which is kind of like your observability product, which I tried before. It's very nice. What are some of the learnings you've had from building an observability tool for LLMs? And yeah, as people think about evals, even, what are the right things to measure? What are the right number of samples that you need to actually start making decisions?Samuel [00:33:33]: I'm not the best person to answer that, is the truth. So I'm not going to come in here and tell you that I think I know the answer on the exact number. I mean, we can do some back of the envelope statistics calculations to work out that having 30 probably gets you most of the statistical value of having 200 for, you know, by definition, 15% of the work. But the exact, how many examples do you need, for example, that's a much harder question to answer, because it's deep within how the models operate. In terms of LogFire, one of the reasons we built LogFire the way we have, and we allow you to write SQL directly against your data, and we're trying to build the powerful fundamentals of observability, is precisely because we know we don't know the answers. And so allowing people to go and innovate on how they're going to consume that stuff and how they're going to process it is, we think, valuable. Because even if we come along and offer you an evals framework on top of LogFire, it won't be right in all regards. And we want people to be able to go and innovate, and being able to write their own SQL connected to the API and effectively query the data like it's a database with SQL allows people to innovate on that stuff. And that's what allows us to do it as well. I mean, we do a bunch of testing what's possible by basically writing SQL directly against LogFire, as any user could. I think the other really interesting bit that's going on in observability is OpenTelemetry is centralizing around semantic attributes for GenAI. So it's a relatively new project.
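Returning to the testing point from the start of this exchange, the idea is to exercise agent code without ever hitting a real LLM. The sketch below is loosely based on Pydantic AI's TestModel as described in its docs; the import path and override mechanics are approximate, so treat it as a sketch rather than a recipe:

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel  # generates output without any API call

agent = Agent("openai:gpt-4o", system_prompt="Reply with a short greeting.")


def test_agent_runs_without_network() -> None:
    # Swap the real model for TestModel inside the block, so the test
    # is fast, free, and needs no API key.
    with agent.override(model=TestModel()):
        result = agent.run_sync("hello")
    assert result.data is not None
```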
A lot of it's still being added at the moment. But basically the idea is that they unify how both SDKs and/or agent frameworks send observability data to any OpenTelemetry endpoint. And so, again, having that unification allows us to go and basically compare different libraries, compare different models, much better. That stuff's in a very early stage of development. One of the things we're going to be working on pretty soon is basically, I suspect, Pydantic AI will be the first agent framework that implements those semantic attributes properly. Because, again, we control it and we can say this is important for observability, whereas most of the other agent frameworks are not maintained by people who are trying to do observability. With the exception of Langchain, where they have the observability platform, but they chose not to go down the OpenTelemetry route. So they're plowing their own furrow. And, you know, they're even further away from standardization.Alessio [00:35:51]: Can you maybe just give a quick overview of how OTEL ties into the AI workflows? There's kind of like the question of: is, you know, a trace and a span like an LLM call? Is it the agent? Is it kind of like the broader thing you're tracking? How should people think about it?Samuel [00:36:06]: Yeah, so they have a PR, that I think may have now been merged, from someone at IBM talking about remote agents and trying to support this concept of remote agents within GenAI. I'm not particularly compelled by that, because I don't think that's actually by any means the common use case. But I suppose it's fine for it to be there. The majority of the stuff in OTEL is basically defining how you would instrument a given call to an LLM. So basically the actual LLM call, what data you would send to your telemetry provider, how you would structure that. Apart from this slightly odd stuff on remote agents, most of the agent-level consideration is not yet implemented, is not yet decided, effectively. And so there's a bit of ambiguity. Obviously, what's good about OTEL is you can in the end send whatever attributes you like. But yeah, there's quite a lot of churn in that space and exactly how we store the data. I think that one of the most interesting things, though, is that if you think about observability traditionally, sure, everyone would say our observability data is very important, we must keep it safe. But actually, companies work very hard to basically not have anything that sensitive in their observability data. So if you're a doctor in a hospital and you search for a drug for an STI, the SQL might be sent to the observability provider, but none of the parameters would. It wouldn't have the patient number or their name or the drug. With GenAI, that distinction doesn't exist, because it's all just messed up in the text. If you have that same patient asking an LLM what drug they should take or how to stop smoking, you can't extract the PII and not send it to the observability platform. So the sensitivity of the data that's going to end up in observability platforms is going to be basically a different order of magnitude to what you would normally send to Datadog. Of course, you can make a mistake and send someone's password or their card number to Datadog. But that would be seen as a mistake. Whereas in GenAI, a lot of data is going to be sent.
And I think that's why companies like Langsmith are trying hard to offer observability on-prem, because there's a bunch of companies who are happy for Datadog to be cloud hosted, but want self-hosting for this observability stuff with GenAI.Alessio [00:38:09]: And are you doing any of that today? Because I know in each of the spans you have like the number of tokens, you have the context, you're just storing everything. And then you're going to offer kind of like a self-hosting for the platform, basically. Yeah. Yeah.Samuel [00:38:23]: So we have scrubbing roughly equivalent to what the other observability platforms have. So if we see password as the key, we won't send the value. But like I said, that doesn't really work in GenAI. So we're accepting we're going to have to store a lot of data, and then we'll offer self-hosting for those people who can afford it and who need it.Alessio [00:38:42]: And then this is, I think, the first time that most of the workload's performance depends on a third party. You know, like if you're looking at Datadog data, usually it's your app that is driving the latency and the memory usage and all of that. Here you're going to have spans that maybe take a long time to perform because the GLA API is not working or because OpenAI is kind of overwhelmed. Do you do anything there, since the provider is almost the same across customers? You know, are you trying to surface these things for people and say, hey, this was a very slow span, but actually all customers using OpenAI right now are seeing the same thing, so maybe don't worry about it?Samuel [00:39:20]: Not yet. We do a few things that people don't generally do in OTel. So we send information at the beginning of a trace, as well as, sorry, at the beginning of a span, as well as when it finishes. By default, OTel only sends you data when the span finishes. So if you think about a request which might take like 20 seconds, even if some of the intermediate spans finished earlier, you can't basically place them on the page until you get the top level span. And so if you're using standard OTel, you can't show anything until those requests are finished. When those requests are taking a few hundred milliseconds, it doesn't really matter. But when you're doing GenAI calls, or when you're running a batch job that might take 30 minutes, that latency of not being able to see the span is crippling to understanding your application. And so we do a bunch of slightly complex stuff to basically send data about a span as it starts, which is closely related. Yeah.Alessio [00:40:09]: Any thoughts on all the other people trying to build on top of OpenTelemetry in different languages, too? There's the OpenLLMetry project, which doesn't really roll off the tongue. But how do you see the future of these kinds of tools? Is everybody going to have to build their own? Why does everybody want to build their own open source observability thing to then sell?Samuel [00:40:29]: I mean, we are not going off and trying to instrument the likes of the OpenAI SDK with the new semantic attributes, because at some point that's going to happen, and it's going to live inside OTEL, and we might help with it. But we're a tiny team. We don't have time to go and do all of that work. So OpenLLMetry, like, interesting project.
But I suspect eventually most of that instrumentation of the big SDKs will live, like I say, inside the main OpenTelemetry repo. I suppose what happens to the agent frameworks, what data you basically need at the framework level to get the context, is kind of unclear. I don't think we know the answer yet. But I mean, I was on the OpenTelemetry call last week talking about GenAI, I guess this is kind of semi-public. And there was someone from Arize talking about the challenges they have trying to get OpenTelemetry data out of Langchain, where it's not natively implemented. And obviously they're having quite a tough time. And I was realizing, hadn't really realized this before, how lucky we are to primarily be talking about our own agent framework, where we have the control, rather than trying to go and instrument other people's.Swyx [00:41:36]: Sorry, I actually didn't know about this semantic conventions thing. It looks like, yeah, it's merged into main OTel. What should people know about this? I had never heard of it before.Samuel [00:41:45]: Yeah, I think it looks like a great start. I think there's some unknowns around how you send the messages that go back and forth, which is kind of the most important part. It's the most important thing of all. And that is moved out of attributes and into OTel events. OTel events in turn are moving from being on a span to being their own top-level API where you send data. So there's a bunch of churn still going on. I'm impressed by how fast the OTel community is moving on this project. I guess they, like everyone else, get that this is important, and it's something that people are crying out to get instrumentation of. So I'm kind of pleasantly surprised at how fast they're moving, but it makes sense.Swyx [00:42:25]: I'm just kind of browsing through the specification. I can already see that this basically bakes in whatever the previous paradigm was. So now they have gen_ai.usage.prompt_tokens and gen_ai.usage.completion_tokens. And obviously now we have reasoning tokens as well. And then only one form of sampling, which is top-p. You're basically baking in, or sort of reifying, things that you think are important today, but it's not a super foolproof way of doing this for the future. Yeah.Samuel [00:42:54]: I mean, that's what's neat about OTel is you can always go and send another attribute and that's fine. It's just there are a bunch that are agreed on. But I would say, you know, to come back to your previous point about whether or not we should be relying on one centralized abstraction layer, this stuff is moving so fast that if you start relying on someone else's standard, you risk basically falling behind, because you're relying on someone else to keep things up to date.Swyx [00:43:14]: Or you fall behind because you've got other things going on.Samuel [00:43:17]: Yeah, yeah. That's fair. That's fair.Swyx [00:43:19]: Any other observations just about building LogFire, actually? Let's just talk about this. So you announced LogFire. I was kind of only familiar with LogFire because of your Series A announcement. I actually thought you were making a separate company. I remember some amount of confusion with you when that came out. So to be clear, it's Pydantic LogFire and the company is one company that has kind of two products, an open source thing and an observability thing, correct? Yeah. I was just kind of curious, like any learnings building LogFire?
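As an aside on the attribute names just mentioned: with plain OpenTelemetry you can attach the GenAI semantic-convention attributes to a span yourself, and apply the kind of key-based scrubbing Samuel described earlier before anything leaves the process. The attribute names follow the (still-churning) conventions cited in the conversation, and the scrubbing rule is a deliberately simplified illustration:

```python
from opentelemetry import trace

tracer = trace.get_tracer("demo.llm")

SENSITIVE_KEYS = {"password", "api_key", "authorization"}


def scrub(attributes: dict) -> dict:
    """Key-based scrubbing: redact values whose key looks sensitive."""
    return {
        key: ("[scrubbed]" if key.lower() in SENSITIVE_KEYS else value)
        for key, value in attributes.items()
    }


attrs = scrub({
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.prompt_tokens": 42,        # names as discussed; newer spec revisions may differ
    "gen_ai.usage.completion_tokens": 128,
    "password": "hunter2",                   # would be redacted before export
})

with tracer.start_as_current_span("chat gpt-4o", attributes=attrs):
    pass  # the real LLM call would happen here
```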
So classic question is, do you use ClickHouse? Is this like the standard persistence layer? Any learnings doing that?Samuel [00:43:54]: We don't use ClickHouse. We started building our database with ClickHouse, moved off ClickHouse onto Timescale, which is a Postgres extension to do analytical databases. Wow. And then moved off Timescale onto DataFusion. And we're basically now building, it's DataFusion, but it's kind of our own database. Bogomil is not entirely happy that we went through three databases before we chose one. I'll say that. But like, we've got to the right one in the end. I think we could have realized that Timescale wasn't right. I think ClickHouse. They both taught us a lot and we're in a great place now. But like, yeah, it's been a real journey on the database in particular.Swyx [00:44:28]: Okay. So, you know, as a database nerd, I have to like double click on this, right? So ClickHouse is supposed to be the ideal backend for anything like this. And then moving from ClickHouse to Timescale is another counterintuitive move that I didn't expect because, you know, Timescale is like an extension on top of Postgres. Not super meant for like high volume logging. But like, yeah, tell us those decisions.Samuel [00:44:50]: So at the time, ClickHouse did not have good support for JSON. I was speaking to someone yesterday and said ClickHouse doesn't have good support for JSON and got roundly stepped on because apparently it does now. So they've obviously gone and built their proper JSON support. But like back when we were trying to use it, I guess a year ago or a bit more than a year ago, everything happened to be a map and maps are a pain to try and do like looking up JSON type data. And obviously all these attributes, everything you're talking about there in terms of the GenAI stuff. You can choose to make them top level columns if you want. But the simplest thing is just to put them all into a big JSON pile. And that was a problem with ClickHouse. Also, ClickHouse had some really ugly edge cases like by default, or at least until I complained about it a lot, ClickHouse thought that two nanoseconds was longer than one second because they compared intervals just by the number, not the unit. And I complained about that a lot. And then they caused it to raise an error and just say you have to have the same unit. Then I complained a bit more. And I think as I understand it now, they have some. They convert between units. But like stuff like that, when all you're looking at is when a lot of what you're doing is comparing the duration of spans was really painful. Also things like you can't subtract two date times to get an interval. You have to use the date sub function. But like the fundamental thing is because we want our end users to write SQL, the like quality of the SQL, how easy it is to write, matters way more to us than if you're building like a platform on top where your developers are going to write the SQL. And once it's written and it's working, you don't mind too much. So I think that's like one of the fundamental differences. The other problem that I have with the ClickHouse and Impact Timescale is that like the ultimate architecture, the like snowflake architecture of binary data in object store queried with some kind of cache from nearby. They both have it, but it's closed sourced and you only get it if you go and use their hosted versions. 
And so even if we had got through all the problems with Timescale or ClickHouse, we would end up like, you know, they would want to be taking their 80% margin. And then we would be wanting to take that would basically leave us less space for margin. Whereas data fusion. Properly open source, all of that same tooling is open source. And for us as a team of people with a lot of Rust expertise, data fusion, which is implemented in Rust, we can literally dive into it and go and change it. So, for example, I found that there were some slowdowns in data fusion's string comparison kernel for doing like string contains. And it's just Rust code. And I could go and rewrite the string comparison kernel to be faster. Or, for example, data fusion, when we started using it, didn't have JSON support. Obviously, as I've said, it's something we can do. It's something we needed. I was able to go and implement that in a weekend using our JSON parser that we built for Pydantic Core. So it's the fact that like data fusion is like for us the perfect mixture of a toolbox to build a database with, not a database. And we can go and implement stuff on top of it in a way that like if you were trying to do that in Postgres or in ClickHouse. I mean, ClickHouse would be easier because it's C++, relatively modern C++. But like as a team of people who are not C++ experts, that's much scarier than data fusion for us.Swyx [00:47:47]: Yeah, that's a beautiful rant.Alessio [00:47:49]: That's funny. Most people don't think they have agency on these projects. They're kind of like, oh, I should use this or I should use that. They're not really like, what should I pick so that I contribute the most back to it? You know, so but I think you obviously have an open source first mindset. So that makes a lot of sense.Samuel [00:48:05]: I think if we were probably better as a startup, a better startup and faster moving and just like headlong determined to get in front of customers as fast as possible, we should have just started with ClickHouse. I hope that long term we're in a better place for having worked with data fusion. We like we're quite engaged now with the data fusion community. Andrew Lam, who maintains data fusion, is an advisor to us. We're in a really good place now. But yeah, it's definitely slowed us down relative to just like building on ClickHouse and moving as fast as we can.Swyx [00:48:34]: OK, we're about to zoom out and do Pydantic run and all the other stuff. But, you know, my last question on LogFire is really, you know, at some point you run out sort of community goodwill just because like, oh, I use Pydantic. I love Pydantic. I'm going to use LogFire. OK, then you start entering the territory of the Datadogs, the Sentrys and the honeycombs. Yeah. So where are you going to really spike here? What differentiator here?Samuel [00:48:59]: I wasn't writing code in 2001, but I'm assuming that there were people talking about like web observability and then web observability stopped being a thing, not because the web stopped being a thing, but because all observability had to do web. If you were talking to people in 2010 or 2012, they would have talked about cloud observability. Now that's not a term because all observability is cloud first. The same is going to happen to gen AI. And so whether or not you're trying to compete with Datadog or with Arise and Langsmith, you've got to do first class. You've got to do general purpose observability with first class support for AI. 
And as far as I know, we're the only people really trying to do that. I mean, I think Datadog is starting in that direction. And to be honest, I think Datadog is a much scarier company to compete with than the AI specific observability platforms. Because in my opinion, and I've also heard this from lots of customers, AI specific observability where you don't see everything else going on in your app is not actually that useful. Our hope is that we can build the first general purpose observability platform with first class support for AI, and that we have this open source heritage of putting developer experience first that other companies haven't done. For all that I'm a fan of Datadog and what they've done, if you search "Datadog logging Python" and you just try, as a non-observability expert, to get something up and running with Datadog and Python, it's not trivial, right? That's something Sentry have done amazingly well. But there's enormous space in most of observability to do DX better.Alessio [00:50:27]: Since you mentioned Sentry, I'm curious how you thought about licensing and all of that. Obviously, you're MIT licensed, you don't have any rolling license like Sentry has, where only the, like, one-year-old version of it is open source. Was that a hard decision?Samuel [00:50:41]: So to be clear, LogFire is closed source. So Pydantic and Pydantic AI are MIT licensed and properly open source, and then LogFire, for now, is completely closed source. And in fact, the struggles that Sentry have had with licensing, and the weird pushback the community gives when they take something that's closed source and make it source available, just meant that we avoided that whole subject matter. I think the other way to look at it is, in terms of either headcount or revenue or dollars in the bank, the amount of open source we do as a company, we've got to be up there with the most prolific open source companies, like I say, per head. And so we didn't feel like we were morally obligated to make LogFire open source. We have Pydantic. Pydantic is a foundational library in Python. That, and now Pydantic AI, are our contribution to open source. And then LogFire is openly for profit, right? As in, we're not claiming otherwise. We're not sort of trying to walk a line of, it's open source, but really we want to make it hard to deploy so you probably want to pay us. We're trying to be straight that it's to pay for. We could change that at some point in the future, but it's not an immediate plan.Alessio [00:51:48]: All right. So the first one: I saw this new, I don't know if it's like a product you're building, Pydantic.run, which is a Python browser sandbox. What was the inspiration behind that? We talk a lot about code interpreters for LLMs. I'm an investor in a company called E2B, which is a code sandbox as a service for remote execution. Yeah. What's the Pydantic.run story?Samuel [00:52:09]: So Pydantic.run is again completely open source. I have no interest in making it into a product. We just needed a sandbox to be able to demo LogFire in particular, but also Pydantic AI. So it doesn't have it yet, but I'm going to add basically a proxy to OpenAI and the other models, so that you can run Pydantic AI in the browser, see how it works, tweak the prompt, et cetera, et cetera. And we'll have some kind of limit per day of what you can spend on it, or like what the spend is. The other thing we wanted to b
In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University, will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications. We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets. This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs, and the advantages of analyzing connections between entities (like clients, domains, etc.). Our guest shows that while other graph methods (such as GNNs or Label Propagation) lack scalability or have trouble with heterogeneous graphs, his method can tackle them because of the "locality assumption" – fraud will be a local phenomenon in the graph – and by relying on this assumption, we can get faster and more accurate results.
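To make the locality assumption concrete, here is a toy sketch (not Šimon's actual system; the records, the seed list of bad domains, and the scoring rule are all invented): build a small client-domain graph from JSON-like network records and score each domain purely from its two-hop neighbourhood rather than propagating labels across the whole graph.

```python
# Toy illustration of locality-based scoring on a client-domain graph.
# All data, the known_bad seed set and the scoring rule are hypothetical.
import networkx as nx

records = [  # JSON-like network activity: which client contacted which domain
    {"client": "c1", "domain": "update.example.com"},
    {"client": "c1", "domain": "evil.example.net"},
    {"client": "c2", "domain": "evil.example.net"},
    {"client": "c2", "domain": "tracker.example.org"},
]
known_bad = {"evil.example.net"}

G = nx.Graph()
for r in records:
    G.add_edge(("client", r["client"]), ("domain", r["domain"]))

def local_score(domain: str) -> float:
    """Fraction of a domain's two-hop domain neighbours that are already known bad."""
    node = ("domain", domain)
    two_hop = {d for c in G.neighbors(node) for d in G.neighbors(c)} - {node}
    bad = sum(1 for _, name in two_hop if name in known_bad)
    return bad / len(two_hop) if two_hop else 0.0

for kind, name in G.nodes:
    if kind == "domain":
        print(name, round(local_score(name), 2))
```

Because each score only inspects a small neighbourhood, the work parallelises over nodes, which is the intuition behind the speed and scalability claims made in the episode.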
Welcome to the first episode of the new year where Chris and Andrew discuss their holiday activities and recent breaks from work, including travel experiences and Christmas celebrations. They delve into updates on Ruby and Bundler enhancements, and they emphasize the importance of Ruby Central's role in maintaining Ruby's security. The conversation also touches on various tech and entertainment topics including movie reviews, gaming experiences, and smart home projects with Raspberry Pi. The hosts share insights on JSON gem performance improvements and considerations for Ruby's frozen string literals. The episode concludes with discussions on practical applications for Home Assistant and reminiscing about their experiences with different programming languages. Hit download to hear more! Honeybadger: Honeybadger is an application health monitoring tool built by developers for developers. Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you. Jason Charnes X/Twitter Chris Oliver X/Twitter Andrew Mason X/Twitter
Today Laura and Kevin sit down with Mike Bowers, an expert in IIoT platforms and legacy system modernization. Mike shares invaluable insights on how to innovate in manufacturing and bridge the gap between technology and leadership. We explore Mike's personal and professional journey into software development and architecture and then deep dive into the Industrial Internet of Things (IIoT), explaining its role in transforming industries such as automotive, water treatment, and smart city development. Mike highlights how smart factories are revolutionizing manufacturing by improving efficiency, reducing costs, and minimizing environmental impacts—illustrating concepts with real-world examples like Amazon's use of graph databases to optimize delivery logistics. Mike explains the technologies driving Industry 4.0, including the MQTT protocol, the importance of mastering JSON, and the critical role of AI and machine learning in enhancing IIoT capabilities. Mike also addresses practical advice for aspiring professionals: prioritize hands-on experience by working directly in modern factory environments to bridge theoretical knowledge with real-world applications. Mike also tackles key issues such as cybersecurity risks of legacy connected devices and the skills gap in the workforce. Along the way, he touches on futuristic topics like Elon Musk's AI innovations and the impact of robotics on improving worker conditions, even addressing whether Amazon workers might finally avoid "bottle-breaks." Whether you're a technologist or simply curious about how factories are evolving beyond the 1980s, this episode offers a fascinating look at the technologies shaping modern industry and the professionals driving these changes. Mike Bowers is the Chief Architect at FairCom Corporation. Mike brings decades of experience in software development and architecture, and specializes in high-performance NoSQL/SQL databases, IIoT platforms, and legacy system modernization solutions. His insights will help CEOs, IT Managers, software architects/engineers, and control engineers to reduce cost in manufacturing, deliver agility by adopting Industry 4.0, and bridge between technologists and executives, to mention a few.
Jake and Michael discuss all the latest Laravel releases, tutorials, and happenings in the community. This episode is sponsored by Honeybadger - move fast and fix things with application monitoring that helps developers get it done. Show links: URI Parsing and Mutation in Laravel 11.35 Set Data on a Fluent Instance in Laravel 11.36 New Eloquent Relation Existence Methods in Laravel 11.37 Laravel VS Code Extension Public Beta Aaron Francis: Laravel Solo, Courses, Screencasting, and more Ghostty Is a Fast, Feature-Rich, Cross-Platform Terminal Laravel News 2024 Recap Wirechat - Laravel Livewire chat package Automated API documentation of Laravel API resources Log Alarm Package for Laravel Token Forge - API Token Management with Laravel Breeze Get a Server's Public IP Address With PHP One-time Password Manager for Laravel A Laravel Package for the Quickpay API Microsoft Teams Notifications Package Laravel Microsoft Graph Using AI to Manage Translations in Laravel Dummy - Generate PHP class instances populated with dummy data using Faker Tutorials: Managing concurrent requests with Laravel session blocking; Using Fluent to work with HTTP client responses in Laravel; Dynamic page updates with Laravel Blade fragments; Converting collections to queries in Laravel using toQuery(); Laravel whenLoaded; Customize the truncation of HTTP client request exceptions; Using withoutWrapping to flatten API responses; Customizing data transformations with Laravel casts; Preserving collection keys in Laravel API resources; Working with URIs in Laravel; Discover file downloads in Laravel with Storage::download; Working with JSON attributes using Laravel's array casts; Adding request context in Laravel applications; Extracting sequential data with Laravel's takeWhile; Deep array manipulation with Laravel's replaceRecursive method; Filtering collection objects by type with whereInstanceOf; Converting Laravel models to JSON for API responses; Accessing raw model data with Laravel's attributesToArray method; Optimizing factory data creation with Laravel's recycle method; Early view data preparation with Laravel view creators; Managing proxy trust in Laravel applications; Customizing model date formats in Laravel; Optimizing large data delivery with Laravel streaming responses
Edge of the Web - An SEO Podcast for Today's Digital Marketer
Mark Williams-Cook continues on the EDGE with the last show of 2024 - discussing his recent findings of a Google bug exploit that shows us a HUGE insight into ranking factors. Learning from the endpoints that send data to micro-services that influence features on Google, Mark was able to discover a way to modify the requests that output JSON data. From that - they saw a slew of “properties that Google was collecting from their ranking systems and query classifiers as to certain scores they were giving websites.” Yup - that's bigger than the DOJ Google leaks earlier this year, folks! At the end of the day, it's a consensus score. Are we in alignment? Learn about AlsoAsked.com, the software that Mark has been building for the last few years. We're huge fans of what this tool can be used for: Mark dissects the importance of understanding SERP dynamics beyond just the "10 blue links." He emphasizes the need to delve deeper into "People Also Ask" panels to gain insights into user intent and search behavior. AlsoAsked.com not only offers a visual map of intent but also supports both synchronous and asynchronous API requests. This functionality allows programmatic querying of PAA data, essential for responding to real-time changes in user queries and optimizing content planning effectively. A huge tool - and a huge insight into Google's consensus data analysis of a site. Great lessons learned from this last show of 2024. Dig in! [00:03:39] Introduction to Mark Williams-Cook [00:05:02] Mark Reported a Google Exploit - a Big One [00:10:11] Metrics Called “Site Quality” from Google [00:14:43] EDGE of the Web Title Sponsor: Site Strategics [00:18:29] It's About Consensus Factors [00:23:38] AlsoAsked.com: Understanding the Most Important Feature on the SERP [00:25:00] The Intent Research Tool [00:31:04] Screenshare of the Tool: Check out our YouTube video [00:35:58] A Slice in Time isn't Enough [00:36:18] EDGE of The Web Sponsor: Wix Studio [00:39:11] Using it with Screaming Frog & ChatGPT - an SEO Superpower [00:42:17] A Recent Challenge from John Mueller & Mark's Response [00:46:14] It's Not About Repeating Answers, It's About NEW Answers Thanks to Our Sponsors! Site Strategics: http://edgeofthewebradio.com/site Wix: http://edgeofthewebradio.com/wixstudio Follow Our Guest Bluesky: https://bsky.app/profile/markwilliamscook.com LinkedIn: https://www.linkedin.com/in/markseo/ YouTube: https://www.youtube.com/@candouragency
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver. Today, we're proud to share Loubna's highly anticipated talk (slides here)!Synthetic DataWe called out the Synthetic Data debate at last year's NeurIPS, and no surprise that 2024 was dominated by the rise of synthetic data everywhere:* Apple's Rephrasing the Web, Microsoft's Phi 2-4 and Orca/AgentInstruct, Tencent's Billion Persona dataset, DCLM, and HuggingFace's FineWeb-Edu, and Loubna's own Cosmopedia extended the ideas of synthetic textbook and agent generation to improve raw web scrape dataset quality* This year we also talked to the IDEFICS/OBELICS team at HuggingFace who released WebSight this year, the first work on code-vs-images synthetic data.* We called Llama 3.1 the Synthetic Data Model for its extensive use (and documentation!) of synthetic data in its pipeline, as well as its permissive license. * Nemotron CC and Nemotron-4-340B also made a big splash this year for how they used 20k items of human data to synthesize over 98% of the data used for SFT/PFT.* Cohere introduced Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress observing gains of up to 56.5% improvement in win rates comparing multiple teachers vs the single best teacher model* In post training, AI2's Tülu3 (discussed by Luca in our Open Models talk) and Loubna's Smol Talk were also notable open releases this year.This comes in the face of a lot of scrutiny and criticism, with Scale AI as one of the leading voices publishing AI models collapse when trained on recursively generated data in Nature magazine bringing mainstream concerns to the potential downsides of poor quality syndata:Part of the concerns we highlighted last year on low-background tokens are coming to bear: ChatGPT contaminated data is spiking in every possible metric:But perhaps, if Sakana's AI Scientist pans out this year, we will have mostly-AI AI researchers publishing AI research anyway so do we really care as long as the ideas can be verified to be correct?Smol ModelsMeta surprised many folks this year by not just aggressively updating Llama 3 and adding multimodality, but also adding a new series of “small” 1B and 3B “on device” models this year, even working on quantized numerics collaborations with Qualcomm, Mediatek, and Arm. It is near unbelievable that a 1B model today can qualitatively match a 13B model of last year:and the minimum size to hit a given MMLU bar has come down roughly 10x in the last year. 
We have been tracking this proxied by LMSYS Elo and inference price: The key reads this year are:* MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases* Apple Intelligence Foundation Language Models* Hymba: A Hybrid-head Architecture for Small Language Models* Loubna's SmolLM and SmolLM2: a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters on the Pareto efficiency frontier.* and Moondream, which we already covered in the 2024 in Vision talk. Full Talk on YouTube: please like and subscribe! Timestamps:* [00:00:05] Loubna Intro* [00:00:33] The Rise of Synthetic Data Everywhere* [00:02:57] Model Collapse* [00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks* [00:12:36] DCLM, Nemotron-CC* [00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage* [00:16:17] Smol Models* [00:18:24] On Device Models* [00:22:45] Smol Vision Models* [00:25:14] What's Next. Transcript: 2024 in Synthetic Data and Smol Models[00:00:00] [00:00:05] Loubna Intro[00:00:05] Speaker: I'm very happy to be here. Thank you for the invitation. So I'm going to be talking about synthetic data in 2024. And then I'm going to be talking about small on-device models. So I think the most interesting thing about synthetic data this year is that like now we have it everywhere in the large language models pipeline.[00:00:33] The Rise of Synthetic Data Everywhere[00:00:33] Speaker: I think initially, synthetic data was mainly used just for post training, because naturally that's the part where we needed human annotators. And then after that, we realized that we don't really have good benchmarks to [00:01:00] measure if models follow instructions well, if they are creative enough, or if they are chatty enough, so we also started using LLMs as judges.[00:01:08] Speaker: Thank you. And I think this year and towards the end of last year, we also went to the pre-training parts and we started generating synthetic data for pre-training to kind of replace some parts of the web. And the motivation behind that is that you have a lot of control over synthetic data. You can control your prompt and basically also the kind of data that you generate.[00:01:28] Speaker: So instead of just trying to filter the web, you could try to get the LLM to generate what you think the best web pages could look like and then train your models on that. So this is how we went from not having synthetic data at all in the LLM pipeline to having it everywhere. And so the cool thing is like today you can train an LLM with like an entirely synthetic pipeline.[00:01:49] Speaker: For example, you can use our Cosmopedia datasets and you can train a 1B model on like 150 billion tokens that are 100 percent synthetic. And those are also of good quality. And then you can [00:02:00] instruction tune the model on a synthetic SFT dataset. You can also do DPO on a synthetic dataset. And then to evaluate if the model is good, you can use [00:02:07] a benchmark that uses LLMs as a judge, for example, MTBench or AlpacaEval. So I think this is really mind-blowing, because like just a few years ago, we wouldn't think this is possible. And I think there's a lot of concerns about model collapse, and I'm going to talk about that later. But we'll see that like, if we use synthetic data properly and we curate it carefully, that shouldn't happen.[00:02:29] Speaker: And the reason synthetic data is very popular right now is that we have really strong models, both open and closed.
It is really cheap and fast to use compared to human annotations, which cost a lot and take a lot of time. And also for open models right now, we have some really good inference frameworks.[00:02:47] Speaker: So if you have enough GPUs, it's really easy to spawn these GPUs and generate like a lot of synthetic data. Some examples are vLLM, TGI, and TensorRT.[00:02:57] Model Collapse[00:02:57] Speaker: Now let's talk about the elephant in the room, model [00:03:00] collapse. Is this the end? If you look at the media and, for example, some papers in Nature, it's really scary because there's a lot of synthetic data out there in the web.[00:03:09] Speaker: And naturally we train on the web. So we're going to be training on a lot of synthetic data. And if model collapse is going to happen, we should really try to take that seriously. And the other issue is that, as I said, a lot of people think the web is polluted because there's a lot of synthetic data.[00:03:24] Speaker: And for example, when we were building the FineWeb datasets here with Guilherme and Hynek, we were interested in, like, how much synthetic data is there in the web? There isn't really a method to properly measure the amount of synthetic data, or to say whether a webpage is synthetic or not. But one thing we can do is to try to look for proxy words, for example, expressions like "as a large language model" or words like "delve" that we know are actually generated by ChatGPT.[00:03:49] Speaker: We could try to measure the amount of these words in our datasets and compare them to the previous years. For example, here, we measured the ratio of these words in different dumps of Common Crawl. [00:04:00] And we can see that the ratio really increased after ChatGPT's release. So if we were to say that the amount of synthetic data didn't change, you would expect this ratio to stay constant, which is not the case.[00:04:11] Speaker: So there's probably a lot of synthetic data on the web, but does this really make models worse? So what we did is we trained different models on these different dumps. And we then computed their performance on popular, like, NLP benchmarks, and then we computed the aggregated score. And surprisingly, you can see that the latest dumps are actually even better than the dumps that came before.[00:04:31] Speaker: So if there's some synthetic data there, at least it did not make the models worse. Yeah, which is really encouraging. So personally, I wouldn't say the web is polluted with synthetic data. Maybe it's even making it more rich. And the issue with like model collapse is that, for example, those studies, they were done at like a small scale, and you would ask the model to complete, for example, a Wikipedia paragraph, and then you would train it on these new generations, and you would do that [00:04:56] iteratively. I think if you do that approach, it's normal to [00:05:00] observe this kind of behavior, because the quality is going to be worse because the model is already small. And then if you train it just on its generations, you shouldn't expect it to become better. But what we're really doing here is that we take a model that is very large and we try to distill its knowledge into a model that is smaller. And in this way, you can expect to get a better performance for your small model.
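As an aside, the proxy-word check described in this section is simple enough to sketch in a few lines; the marker list and the toy per-year samples below are illustrative stand-ins for the real Common Crawl dumps, not the actual FineWeb analysis.

```python
# Toy version of the proxy-word check: compare how often ChatGPT-style markers
# ("delve", "as a large language model", ...) appear in text from different dumps.
import re

MARKERS = ["delve", "as a large language model", "i cannot fulfill"]  # illustrative

def marker_ratio(docs: list[str]) -> float:
    """Marker hits per million words across a set of documents."""
    words = 0
    hits = 0
    for doc in docs:
        text = doc.lower()
        words += len(text.split())
        hits += sum(len(re.findall(re.escape(m), text)) for m in MARKERS)
    return 1e6 * hits / max(words, 1)

dumps = {  # stand-ins for per-year Common Crawl samples
    "2021": ["we explore the topic in depth and cover the basics"],
    "2024": ["as a large language model, let us delve into the topic"],
}
for year, docs in dumps.items():
    print(year, round(marker_ratio(docs), 1))
```

The claim in the talk is then read off the trend: if the ratio jumps after ChatGPT's release while earlier dumps stay flat, some of the web is probably synthetic.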
[00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks[00:05:14] Speaker: And using synthetic data for pre-training has become really popular after the Textbooks Are All You Need papers, where Microsoft basically trained a series of small models on textbooks that were generated using a large LLM.[00:05:32] Speaker: And then they found that these models were actually better than models that are much larger. So this was really interesting. It was like the first of its kind, but it was also met with a lot of skepticism, which is a good thing in research. It pushes you to question things, because the dataset that they trained on was not public, so people were not really sure if these models are really good or maybe there's just some data contamination.[00:05:55] Speaker: So it was really hard to check if you just have the weights of the models. [00:06:00] And as Hugging Face, because we like open source, we tried to reproduce what they did. So this is our Cosmopedia dataset. We basically tried to follow a similar approach to what they documented in the paper. And we created a synthetic dataset of textbooks and blog posts and stories that had almost 30 billion tokens.[00:06:16] Speaker: And we tried to train some models on that. And we found that the key ingredient to getting a good dataset that is synthetic is trying as much as possible to keep it diverse. Because if you just throw the same prompts at your model, like "generate a textbook about linear algebra", even if you change the temperature, the textbooks are going to look alike.[00:06:35] Speaker: So there's no way you could scale to like millions of samples. And the way you do that is by creating prompts that have some seeds that make them diverse. In our case, in the prompt we would ask the model to generate a textbook, but make it related to an extract from a webpage. And also we try to frame it to stay within topic.[00:06:55] Speaker: For example, here, we put an extract about cardiovascular bioimaging, [00:07:00] and then we ask the model to generate a textbook related to medicine that is also related to this webpage. And this is a really nice approach because there are so many webpages out there. So you can be sure that your generations are going to be diverse when you change the seed example.[00:07:16] Speaker: One thing that's challenging with this is that you want the seed samples to be related to your topics. So we use a search tool to go over all of the FineWeb dataset. And then we also do a lot of experiments with the type of generations we want the model to generate. For example, we ask it for textbooks for middle school students or textbooks for college.[00:07:40] Speaker: And we found that some generation styles help on some specific benchmarks, while others help on other benchmarks. For example, college textbooks are really good for MMLU, while middle school textbooks are good for benchmarks like OpenBookQA and PIQA. This is a sample from our search tool.[00:07:56] Speaker: For example, you have a top category, which is a topic, and then you have some [00:08:00] subtopics, and then you have the topic hits, which are basically the web pages in FineWeb that belong to these topics. And here you can see the comparison between Cosmopedia, we had two versions, V1 and V2, in blue and red, and you can see the comparison to FineWeb; and as you can see, throughout the training, training on Cosmopedia was consistently better.[00:08:20] Speaker: So we managed to get a dataset that was actually good to train these models on. It's of course so much smaller than FineWeb, it's only 30 billion tokens, but that's the scale that Microsoft's datasets were, so we kind of managed to reproduce a bit what they did. And the dataset is public, so everyone can go there, check if everything is all right.
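The seeding trick described here is easy to picture in code. A minimal sketch, assuming invented extracts, audiences and prompt wording (these are not the actual Cosmopedia prompts): every prompt is anchored to a different web extract, topic and target audience, which is what keeps millions of generations from collapsing onto the same textbook.

```python
# Build diverse "generate a textbook" prompts by varying the seed extract,
# the topic constraint and the target audience (all example strings are invented).
import random

seed_extracts = [
    ("medicine", "An extract about cardiovascular bioimaging ..."),
    ("mathematics", "An extract about sparse matrix factorisation ..."),
]
audiences = ["middle school students", "college students", "professionals"]

def build_prompt(topic: str, extract: str, audience: str) -> str:
    return (
        f"Write a textbook chapter for {audience} about {topic}. "
        f"Ground the chapter in ideas related to this web extract:\n{extract}\n"
        "Stay on topic and do not copy the extract verbatim."
    )

prompts = [
    build_prompt(topic, extract, random.choice(audiences))
    for topic, extract in seed_extracts
]
for p in prompts:
    print(p, end="\n\n")
```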
[00:08:38] Speaker: And now this is a recent paper from NVIDIA, Nemotron-CC. They took things a bit further, and they generated not a few billion tokens, but 1.9 trillion tokens, which is huge. And we can see later how they did that. It's more of, like, rephrasing the web. So we can see today that there are some really huge synthetic datasets out there, and they're public, so, [00:09:00] like, you can try to filter them even further if you want to get more high-quality corpora.[00:09:04] Speaker: So for this, rephrasing the web, this approach was suggested in this paper by Pratyush, where basically in this paper, they take some samples from the C4 dataset, and then they use an LLM to rewrite these samples into a better format. For example, they ask an LLM to rewrite the sample into a Wikipedia passage or into a Q&A page.[00:09:25] Speaker: And the interesting thing in this approach is that you can use a model that is small, because rewriting doesn't require knowledge. It's just rewriting a page into a different style. So the model doesn't need to have extensive knowledge of what it is rewriting, compared to just asking a model to generate a new textbook without giving it ground truth.[00:09:45] Speaker: So here they rewrite some samples from C4 into Q&A, into Wikipedia, and they find that doing this works better than training just on C4. And what they did in Nemotron-CC is a similar approach. [00:10:00] They rewrite some pages from Common Crawl for two reasons. One is to improve pages that are low quality, so they rewrite them into, for example, a Wikipedia page, so they look better.[00:10:11] Speaker: And another reason is to create more diverse datasets. So they have a dataset that they already heavily filtered, and then they take these pages that are already high quality, and they ask the model to rewrite them in question-and-answer format, into open-ended questions or multiple-choice questions.[00:10:27] Speaker: So this way they can reuse the same page multiple times without fearing having multiple duplicates, because it's the same information, but it's going to be written differently. So I think that's also a really interesting approach for generating synthetic data just by rephrasing the pages that you already have.[00:10:44] Speaker: There's also this approach called Prox, where they try to start from a web page and then they generate a program which finds how to rewrite that page to make it better and less noisy. For example, here you can see that there's some leftover metadata in the web page and you don't necessarily want to keep that for training [00:11:00] your model.[00:11:00] Speaker: So they train a model that can generate programs that can normalize and remove lines that are extra. So I think this approach is also interesting, but it's maybe less scalable than the approaches that I presented before. So that was it for rephrasing and generating new textbooks.
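The rephrasing recipe is essentially a rewrite loop over raw pages. A schematic version, where the style instructions are illustrative and the rewrite function is a stub standing in for a call to a small instruction-tuned model (not the actual Rephrasing-the-Web or Nemotron-CC pipeline):

```python
# Rephrasing-the-web sketch: rewrite each raw page into cleaner styles
# (encyclopedia-like prose, Q&A) so the same information can be reused in
# several forms. The rewrite() stub echoes text; in practice it would call a model.
STYLES = {
    "wikipedia": "Rewrite the following page as a clear encyclopedia passage.",
    "qa": "Rewrite the following page as a list of questions and answers.",
}

def rewrite(instruction: str, page: str) -> str:
    # Stand-in for an inference call to a small instruction-tuned model.
    return f"[{instruction.split()[-1]}] {page[:80]}"

def rephrase_corpus(pages: list[str]) -> list[dict]:
    out = []
    for page in pages:
        for style, instruction in STYLES.items():
            out.append({"style": style, "text": rewrite(instruction, page)})
    return out

pages = ["Raw page text with leftover nav menus and a paragraph about photosynthesis."]
for item in rephrase_corpus(pages):
    print(item["style"], "->", item["text"])
```

The design point made in the talk is that this task is cheap: the rewriter only needs to restyle text it is given, not recall facts, so a small model is enough.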
[00:11:17] Speaker: Another approach that I think is really good and becoming really popular for using synthetic data for pre-training is basically building better classifiers for filtering the web. For example, here we released the dataset called FineWeb-Edu. And the way we built it is by taking Llama 3 and asking it to rate the educational content of web pages from zero to five.[00:11:39] Speaker: So for example, if a page is like a really good textbook that could be useful in a school setting, it would get a really high score. And if a page is just like an advertisement or promotional material, it would get a lower score. And then after that, we take these synthetic annotations and we train a classifier on them.[00:11:57] Speaker: It's a classifier like a BERT model. [00:12:00] And then we run this classifier on all of FineWeb, which is a 15 trillion token dataset. And then we only keep the pages that have a score that's higher than 3. So for example, in our case, we went from 15 trillion tokens to just 1.5 trillion tokens that are really highly educational.[00:12:16] Speaker: And as you can see here, FineWeb-Edu outperforms all the other public web datasets by a large margin on a couple of benchmarks. Here, I show the aggregated score, and you can see that this approach is really effective for filtering web datasets to get better corpora for training your LLMs.[00:12:36] DCLM, Nemotron-CC[00:12:36] Speaker: Others also tried this approach. There's, for example, the DCLM dataset, where they also trained a classifier, but not to detect educational content. Instead, they trained it on the OpenHermes dataset, which is a dataset for instruction tuning, and also on the ELI5 subreddit, and then they also get a really high-quality dataset which is very information dense and can help [00:13:00] you train some really good LLMs.[00:13:01] Speaker: And then Nemotron Common Crawl, they also did this approach, but instead of using one classifier, they used an ensemble of classifiers. So they used, for example, the DCLM classifier, and also classifiers like the ones we used in FineWeb-Edu, and then they combined these scores with an ensemble method to only retain the best high-quality pages, and they get a dataset that works even better than the ones we developed.[00:13:25] Speaker: So that was it for synthetic data for pre-training.
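Before moving on, here is a compressed sketch of that filtering recipe: collect LLM-assigned 0-5 educational scores, fit a lightweight classifier on them, then keep only pages predicted above the threshold. A tiny TF-IDF plus logistic-regression stand-in replaces the Llama 3 annotator and BERT-style classifier used in practice, and the example pages are made up.

```python
# FineWeb-Edu-style filtering sketch: train a small classifier on LLM score
# annotations, then keep pages whose predicted label is "educational" (score >= 3).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

annotated = [  # (page text, LLM-assigned educational score 0-5) - toy examples
    ("buy cheap watches now limited offer", 0),
    ("introduction to photosynthesis for middle school biology", 5),
    ("celebrity gossip roundup of the week", 1),
    ("step by step derivation of the quadratic formula", 4),
]
texts, scores = zip(*annotated)
labels = [int(s >= 3) for s in scores]  # binarise: educational vs not

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

web_pages = ["photosynthesis explained with diagrams", "flash sale on watches"]
keep = [p for p in web_pages if clf.predict(vec.transform([p]))[0] == 1]
print(keep)
```

The expensive LLM only labels a sample; the cheap classifier then scales that judgment to trillions of tokens, which is why the approach works at FineWeb size.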
And for example, in the tool mixture to generate like a new code snippet, they would give like the model persona, for example, a machine learning researcher interested in neural networks, and then ask it to generate like a coding problem.[00:14:49] Speaker: This way you make sure that your data set is really diverse, and then you can further filter the data sets, for example, using the reward models. We also released a dataset called Smalltalk, [00:15:00] and we also tried to cover the wide range of tasks, and as you can see here, for example, when fine tuning Mistral 7b on the dataset, we also outperformed the original Mistral instructs on a number of benchmarks, notably on mathematics and instruction following with ifevil.[00:15:18] Speaker: Another paper that's really interesting I wanted to mention is this one called Multilingual Data Arbitrage by Cohere. And basically they want to generate a data set for post training that is multilingual. And they have a really interesting problem. It's the fact that there isn't like one model that's really good at all the languages they wanted.[00:15:36] Speaker: So what they do is that like they use not just one teacher model, but multiple teachers. And then they have a router which basically sends the prompts they have to all these models. And then they get the completions and they have a reward model that traces all these generations and only keeps the best one.[00:15:52] Speaker: And this is like arbitrage and finance. So well, I think what's interesting in this, it shows that like synthetic data, it doesn't have to come from a single model. [00:16:00] And because we have so many good models now, you could like pull these models together and get like a dataset that's really high quality and that's diverse and that's covers all your needs.[00:16:12] Speaker: I was supposed to put a meme there, but. Yeah, so that was it for like a synthetic data.[00:16:17] Smol Models[00:16:17] Speaker: Now we can go to see what's happening in the small models field in 2024. I don't know if you know, but like now we have some really good small models. For example, Lama 3. 2 1B is. It matches Lama 2. 13b from, that was released last year on the LMSYS arena, which is basically the default go to leaderboard for evaluating models using human evaluation.[00:16:39] Speaker: And as you can see here, the scores of the models are really close. So I think we've made like hugely forward in terms of small models. Of course, that's one, just one data point, but there's more. For example, if you look at this chart from the Quint 2. 5 blog post, it shows that today we have some really good models that are only like 3 billion parameters [00:17:00] and 4 billion that score really high on MMLU.[00:17:03] Speaker: Which is a really popular benchmark for evaluating models. And you can see here that the red, the blue dots have more than 65 on MMLU. And the grey ones have less. And for example, Llama33b had less. So now we have a 3b model that outperforms a 33b model that was released earlier. So I think now people are starting to realize that like, we shouldn't just scale and scale models, but we should try to make them more efficient.[00:17:33] Speaker: I don't know if you knew, but you can also chat with a 3B plus model on your iPhone. For example, here, this is an app called PocketPal, where you can go and select a model from Hugging Face. It has a large choice. For example, here we loaded the 5. 3. 5, which is 3. 8 billion parameters on this iPhone. 
[00:16:17] Smol Models[00:16:17] Speaker: Now we can go see what's happening in the small models field in 2024. I don't know if you know, but like now we have some really good small models. For example, Llama 3.2 1B matches Llama 2 13B, which was released last year, on the LMSYS arena, which is basically the default go-to leaderboard for evaluating models using human evaluation.[00:16:39] Speaker: And as you can see here, the scores of the models are really close. So I think we've made huge progress in terms of small models. Of course, that's just one data point, but there's more. For example, if you look at this chart from the Qwen 2.5 blog post, it shows that today we have some really good models that are only like 3 billion parameters [00:17:00] and 4 billion that score really high on MMLU,[00:17:03] Speaker: which is a really popular benchmark for evaluating models. And you can see here that the blue dots have more than 65 on MMLU, and the grey ones have less. And for example, Llama 33B had less. So now we have a 3B model that outperforms a 33B model that was released earlier. So I think now people are starting to realize that we shouldn't just scale and scale models, but we should try to make them more efficient.[00:17:33] Speaker: I don't know if you knew, but you can also chat with a 3B-plus model on your iPhone. For example, here, this is an app called PocketPal, where you can go and select a model from Hugging Face. It has a large choice. For example, here we loaded Phi-3.5, which is 3.8 billion parameters, on this iPhone. And we can chat with this and you can see that even the latency is acceptable.[00:17:57] Speaker: For example, here, I asked it to give me a joke about [00:18:00] NeurIPS. So let's see what it has to say.[00:18:06] Speaker: Okay: why did the neural network attend NeurIPS? Because it heard there would be a lot of layers and fun and it wanted to train its sense of humor. So not very funny, but at least it can run on device. Yeah, so I think now we have good small models, but we also have good frameworks and tools to use these small models.[00:18:24] On Device Models[00:18:24] Speaker: So I think we're really close to having really good on-edge and on-device models. And I think for a while we've had this narrative that just training larger models is better. Of course, this is supported by the scaling laws. As you can see here, for example, when we scale the model size, the loss is lower and obviously you get a better model.[00:18:46] Speaker: And we can see this, for example, in the GPT family of models, how we went from just a hundred million parameters to more than a trillion parameters. And of course, we all observed the performance improvement when using the latest model. But [00:19:00] one thing that we shouldn't forget is that when we scale the model, we also scale the inference cost and time.[00:19:05] Speaker: And so the largest models are going to cost so much more. So I think now, instead of just building larger models, we should be focusing on building more efficient models. It's no longer a race for the largest models, since these models are really expensive to run and they require a really good infrastructure to do that and they cannot run on, for example, consumer hardware.[00:19:27] Speaker: And when you try to build more efficient models that match larger models, that's when you can really unlock some really interesting on-device use cases. And I think a trend that we're noticing now is the trend of training smaller models longer. For example, if you compare how long LLaMA was trained compared to Llama 3, there is a huge increase in the pre-training length.[00:19:50] Speaker: LLaMA was trained on 1 trillion tokens, but Llama 3 8B was trained on 15 trillion tokens. So Meta managed to get a model that's the same size, but it performs so much [00:20:00] better, by choosing to spend more during training, because as we know, training is a one-time cost, but inference is something that's ongoing.[00:20:08] Speaker: If we want to see what the small model reads are in 2024, I think this MobileLLM paper by Meta is interesting. They try to study different models that have less than 1 billion parameters and find which architecture makes most sense for these models. For example, they find that depth is more important than width.[00:20:29] Speaker: So it's more important to have models that have more layers than to make them wider. They also find that GQA helps, and that tying the embeddings helps. So I think it's a nice study overall for models that are just a few hundred million parameters. There's also the Apple Intelligence tech report, which is interesting.[00:20:48] Speaker: So for Apple Intelligence, they had two models, one that was on server and another model that was on device. It had 3 billion parameters. And I think the interesting part is that they trained this model using [00:21:00] pruning and then distillation.
And for example, they have this table where they show that, like, using pruning and distillation works much better than training from scratch.[00:21:08] Speaker: And they also have some interesting insights about, like, how they specialize their models on specific tasks, like, for example, summarization and rewriting. There's also this paper by NVIDIA that was released recently. I think you've already had a talk about, like, hybrid models; that was also interesting.[00:21:23] Speaker: And for this model, they used a hybrid architecture between state space models and transformers. And they managed to train a 1B model that's really performant without needing to train it on a lot of tokens. And regarding our work, we just recently released SmolLM2, so it's a series of three models, which are the best in class in each model size.[00:21:46] Speaker: For example, our 1.7B model outperforms Llama 1B and also Qwen 2.5. And how we managed to train this model is the following: we spent a lot of time trying to curate the pre-training datasets. We did a lot of [00:22:00] ablations, trying to find which datasets are good and also how to mix them. We also created some new math and code datasets that we're releasing soon.[00:22:08] Speaker: But we basically really spent a lot of time trying to find what's the best mixture that you can train these models on. And then we also trained these models for very long. For example, the first SmolLM was trained only on 1 trillion tokens, but this model is trained on 11 trillion tokens.[00:22:24] Speaker: And we saw that the performance kept improving. The models didn't really plateau mid-training, which I think is really interesting. It shows that you can train such small models for very long and keep getting performance gains. What's interesting about SmolLM2 is that it's fully open. We also released the pre-training code base, the fine-tuning code, the datasets, and also the evaluation, in this repository.[00:22:45] Smol Vision Models[00:22:45] Speaker: Also, there are really interesting small models not just for text, but also for vision. For example, here you can see SmolVLM, which is a 2B model that's really efficient. It doesn't consume a lot of RAM, and it also has a good performance. There's also Moondream [00:23:00] 0.5B, which was released recently. It's like the smallest visual language model.[00:23:04] Speaker: And as you can see, there isn't a big trade-off compared to Moondream 2B. So now I've shown you that we have some really good small models, and we also have the tools to use them, but why should you consider using small models, and when? I think small models are really interesting because of the on-device feature.[00:23:23] Speaker: Because these models are small and they can run fast, you can basically run them on your laptop, but also on your mobile phone. And this means that your dataset stays local. You don't have to send your queries to third parties. And this really enhances privacy. That was, for example, one of the big selling points for Apple Intelligence.[00:23:42] Speaker: Also, right now we really have a lot of tooling for this: so many frameworks to do on-device inference. For example, there's MLX, MLC, llama.cpp, Transformers.js. So we have a lot of options and each of them has great features. So you have so many options for doing that.
Small models are also really powerful if you choose to specialize them.[00:24:00] Speaker: For example, here there's a startup called NuMind, which took SmolLM and then fine-tuned it on text extraction datasets. And they managed to get a model that's not very far from models that are much larger. So I think text extraction is one use case where small models can be really performant and it makes sense to use them instead of just using larger models.[00:24:19] Speaker: You can also chat with these models in the browser. For example, here, you can go there, you can load the model, you can even turn off your internet and just start chatting with the model locally. Speaking of text extraction, if you don't want to fine-tune the models, there's a really good method called structured generation.[00:24:36] Speaker: You can basically force the models to follow a JSON schema that you defined. For example, here, we try to force the model to follow a schema for extracting key information from GitHub issues. So you can input free text, which is a complaint about a GitHub repository, something not working. And then you can run it there and the model can extract anything that is relevant for your GitHub issue creation.[00:24:58] Speaker: For example, the [00:25:00] priority; for example, here, the priority is high, the type of the issue is bug, and then a title and an estimation of how long this will take to fix. And you can just do this in the browser; you can transform your text into a GitHub issue that's properly formatted.
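The GitHub-issue demo boils down to constraining the model's output to a schema. One common way to express such a schema in Python is a Pydantic model; the run_extraction call below is a placeholder for whichever structured-output or constrained-decoding tool you use (for example a library like outlines, or an API's JSON mode), and the field names are illustrative rather than the ones used in the talk.

```python
# Structured extraction sketch: define the schema the model must follow, then
# validate whatever the (schema-constrained) model returns against it.
from typing import Literal
from pydantic import BaseModel

class GitHubIssue(BaseModel):
    title: str
    type: Literal["bug", "feature", "question"]
    priority: Literal["low", "medium", "high"]
    estimate_hours: float

def run_extraction(free_text: str) -> dict:
    # Placeholder: in practice a model constrained to GitHubIssue's JSON schema
    # would produce this dict from the user's free-text complaint.
    return {"title": "App crashes on save", "type": "bug",
            "priority": "high", "estimate_hours": 4}

issue = GitHubIssue.model_validate(run_extraction("It crashes whenever I hit save!"))
print(issue.model_dump_json(indent=2))
```

Validation at the end is the point: even a small model can fill a form reliably when decoding is constrained to the schema, and anything malformed is rejected before it reaches your issue tracker.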
[00:25:14] What's Next[00:25:14] Speaker: So what's next for synthetic data and small models?[00:25:18] Speaker: I think that domain-specific synthetic data is going to be, it's already important, and it's going to be even more important. For example, generating synthetic data for math. I think this would really help improve the reasoning of a lot of models. And a lot of people are doing it, for example Qwen 2.5 Math, and everyone's trying to reproduce o1.[00:25:37] Speaker: And so I think for synthetic data, trying to specialize it on some domains is going to be really important. And then for small models, I think specializing them through fine-tuning is also going to be really important, because I think a lot of companies are just trying to use these large models because they are better.[00:25:53] Speaker: But on some tasks, I think you can already get decent performance with small models. So you don't need to pay a [00:26:00] cost that's much larger just to make your model better at your task by a few percent. And this is not just for text; I think it also applies to other modalities like vision and audio.[00:26:11] Speaker: And I think you should also watch out for on-device frameworks and applications. For example, like the app I showed, or Ollama; all these frameworks are becoming really popular and I'm pretty sure that we're gonna get more of them in 2025. And users really like that. Maybe one other thing: I should also say a hot take.[00:26:28] Speaker: I think that in AI, we just started with fine-tuning, for example trying to make BERT work on some specific use cases, and really struggling to do that. And then we had some models that are much larger, so we just switched to prompt engineering to get the models to do what we want. And I think we're going back to fine-tuning, where we realize these models are really costly.[00:26:47] Speaker: It's better to use just a small model or try to specialize it. So I think it's a little bit of a cycle, and we're going to start to see more fine-tuning and less of just prompt engineering the models. So that was my talk. Thank you for following. And if you have [00:27:00] any questions, we can take them now. Get full access to Latent Space at www.latent.space/subscribe
News includes the long-awaited release of Phoenix LiveView 1.0, exciting enhancements in Elixir 1.18 such as built-in JSON support and improved ExUnit testing capabilities, and the unveiling of AWS Aurora DSQL, a serverless distributed PostgreSQL-compatible database service. Lars Wikman joins us to share updates about Nerves, including the latest on Nerves Hub, Nerves Cloud, and his project oswag.org where you can find official Elixir and Nerves T-shirts. All this and more! Show Notes online - http://podcast.thinkingelixir.com/233 (http://podcast.thinkingelixir.com/233) Elixir Community News https://www.phoenixframework.org/blog/phoenix-liveview-1.0-released (https://www.phoenixframework.org/blog/phoenix-liveview-1.0-released?utm_source=thinkingelixir&utm_medium=shownotes) – Phoenix LiveView 1.0 was officially released! https://github.com/phoenixframework/phoenixliveview (https://github.com/phoenixframework/phoenix_live_view?utm_source=thinkingelixir&utm_medium=shownotes) – Access the Phoenix LiveView 1.0 source code on GitHub. https://github.com/phoenixframework/phoenixliveview/blob/main/CHANGELOG.md (https://github.com/phoenixframework/phoenix_live_view/blob/main/CHANGELOG.md?utm_source=thinkingelixir&utm_medium=shownotes) – Check out the changelog for Phoenix LiveView 1.0. https://dockyard.com/blog/2024/12/03/phoenix-liveview-goes-1-0 (https://dockyard.com/blog/2024/12/03/phoenix-liveview-goes-1-0?utm_source=thinkingelixir&utm_medium=shownotes) – Dockyard blog discussing Phoenix LiveView 1.0. The 1.0 release was announced the day after our last episode was recorded. https://elixirforum.com/t/phoenix-liveview-1-0-is-out/67863 (https://elixirforum.com/t/phoenix-liveview-1-0-is-out/67863?utm_source=thinkingelixir&utm_medium=shownotes) – ElixirForum discussion on the release of Phoenix LiveView 1.0. https://x.com/chris_mccord/status/1864067247255306332 (https://x.com/chris_mccord/status/1864067247255306332?utm_source=thinkingelixir&utm_medium=shownotes) – Chris McCord's announcement of the Phoenix LiveView 1.0 release on Twitter/X. You can now quickly get started with Elixir and Phoenix using a single command line installer. http://elixir-install.org/ (http://elixir-install.org/?utm_source=thinkingelixir&utm_medium=shownotes) – Wojtek Mach's work on a one-line Elixir installer made getting started with Phoenix easier. https://x.com/chris_mccord/status/1864067249960558617 (https://x.com/chris_mccord/status/1864067249960558617?utm_source=thinkingelixir&utm_medium=shownotes) – Chris McCord credits Wojtek Mach for his work on the Elixir installer. https://x.com/liveviewnative/status/1864088172570857691 (https://x.com/liveviewnative/status/1864088172570857691?utm_source=thinkingelixir&utm_medium=shownotes) – LiveView Native updated to be based on LiveView 1.0. https://github.com/liveview-native/liveviewnative/commit/5077bda7bf999311bee467828390912e03e74467 (https://github.com/liveview-native/live_view_native/commit/5077bda7bf999311bee467828390912e03e74467?utm_source=thinkingelixir&utm_medium=shownotes) – GitHub commit showing updates on LiveView Native for LiveView 1.0 compatibility. Elixir 1.18 is confirmed to be released soon, bringing significant improvements. https://github.com/elixir-lang/elixir (https://github.com/elixir-lang/elixir?utm_source=thinkingelixir&utm_medium=shownotes) – Elixir's GitHub repository where you can find version 1.18. 
https://github.com/elixir-lang/elixir/blob/v1.18/CHANGELOG.md (https://github.com/elixir-lang/elixir/blob/v1.18/CHANGELOG.md?utm_source=thinkingelixir&utm_medium=shownotes) – The changelog details for Elixir 1.18, featuring many new enhancements. New built-in JSON support and upgrades to the testing library in Elixir 1.18. Type system in Elixir 1.18 now supports type checking of function calls. ExUnit in Elixir 1.18 supports parameterized tests and better concurrency handling. mix format --migrate in Elixir 1.18 helps to update deprecated constructs. Official JSON module in Elixir provides standards-compliant encoding and decoding. Language server improvements in Elixir 1.18 enhance development experience. Potential minor incompatibilities in Elixir 1.18, but mostly due to better error detection. https://github.com/nerves-hub/nerveshubweb/releases/tag/v2.1.0 (https://github.com/nerves-hub/nerves_hub_web/releases/tag/v2.1.0?utm_source=thinkingelixir&utm_medium=shownotes) – Nerves Hub 2.1.0 has been released with various updates. https://github.com/nerves-hub/nerveshublink/releases/tag/v2.6.0 (https://github.com/nerves-hub/nerves_hub_link/releases/tag/v2.6.0?utm_source=thinkingelixir&utm_medium=shownotes) – Version 2.6.0 of Nerves Hub Link is now available with new features. Nerves Hub now supports extensions and improved functionality. https://blog.swmansion.com/elixir-stream-week-how-not-to-load-test-during-a-live-elixir-run-broadcast-watched-by-hundreds-of-217d8f4b957a (https://blog.swmansion.com/elixir-stream-week-how-not-to-load-test-during-a-live-elixir-run-broadcast-watched-by-hundreds-of-217d8f4b957a?utm_source=thinkingelixir&utm_medium=shownotes) – Membrane's write-up on Elixir Stream Week and related technical challenges. https://x.com/astuyve/status/1863992458637680935 (https://x.com/astuyve/status/1863992458637680935?utm_source=thinkingelixir&utm_medium=shownotes) – Discussion on Twitter/X about AWS Aurora DSQL, a new distributed SQL service. https://aws.amazon.com/rds/aurora/dsql/ (https://aws.amazon.com/rds/aurora/dsql/?utm_source=thinkingelixir&utm_medium=shownotes) – AWS Aurora DSQL is a new serverless, distributed PostgreSQL-compatible database service. Do you have some Elixir news to share? Tell us at @ThinkingElixir (https://twitter.com/ThinkingElixir) or email at show@thinkingelixir.com (mailto:show@thinkingelixir.com) Discussion Resources - https://docs.nerves-hub.org/ (https://docs.nerves-hub.org/?utm_source=thinkingelixir&utm_medium=shownotes) - https://nerves-project.org/ (https://nerves-project.org/?utm_source=thinkingelixir&utm_medium=shownotes) - https://github.com/nerves-project (https://github.com/nerves-project?utm_source=thinkingelixir&utm_medium=shownotes) - https://nervescloud.com/ (https://nervescloud.com/?utm_source=thinkingelixir&utm_medium=shownotes) - https://www.yoctoproject.org/ (https://www.yoctoproject.org/?utm_source=thinkingelixir&utm_medium=shownotes) - https://oswag.org/ (https://oswag.org/?utm_source=thinkingelixir&utm_medium=shownotes) – Buy an official Elixir t-shirt! 
Guest Information - https://bsky.app/profile/lawik.bsky.social (https://bsky.app/profile/lawik.bsky.social?utm_source=thinkingelixir&utm_medium=shownotes) – on Bluesky - https://github.com/lawik (https://github.com/lawik?utm_source=thinkingelixir&utm_medium=shownotes) – on Github - https://fosstodon.org/@lawik (https://fosstodon.org/@lawik?utm_source=thinkingelixir&utm_medium=shownotes) – on Fediverse - https://underjord.io/ (https://underjord.io/?utm_source=thinkingelixir&utm_medium=shownotes) – Blog Find us online - Message the show - Bluesky (https://bsky.app/profile/thinkingelixir.com) - Message the show - X (https://x.com/ThinkingElixir) - Message the show on Fediverse - @ThinkingElixir@genserver.social (https://genserver.social/ThinkingElixir) - Email the show - show@thinkingelixir.com (mailto:show@thinkingelixir.com) - Mark Ericksen on X - @brainlid (https://x.com/brainlid) - Mark Ericksen on Bluesky - @brainlid.bsky.social (https://bsky.app/profile/brainlid.bsky.social) - Mark Ericksen on Fediverse - @brainlid@genserver.social (https://genserver.social/brainlid)
Topics covered in this episode: jiter A new home for python-build-standalone moka-py uv: An In-Depth Guide Extras Joke Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training The Complete pytest Course Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky) Brian: @brianokken@fosstodon.org / @brianokken.bsky.social Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky) Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Michael #1: jiter Fast iterable JSON parser. About to be the backend for Pydantic and Logfire. Currently powers OpenAI / ChatGPT (along with Pydantic itself), at least their Python library, maybe more. jiter has three interfaces: JsonValue an enum representing JSON data Jiter an iterator over JSON data PythonParse which parses a JSON string into a Python object jiter-python - This is a standalone version of the JSON parser used in pydantic-core. The recommendation is to only use this package directly if you do not use pydantic Brian #2: A new home for python-build-standalone Charlie Marsh See also Transferring Python Build Standalone Stewardship to Astral from Gregory Szorc python-build-standalone is the project that has prebuilt binaries for different architectures. used by uv python install 3.12 and uv venv .venv --python 3.12 and uv sync This is good stability news for everyone. Interesting discussion of prebuilt Python from Charlie Michael #3: moka-py A high performance caching library for Python written in Rust moka-py is a Python binding for the highly efficient Moka caching library written in Rust. This library allows you to leverage the power of Moka's high-performance, feature-rich cache in your Python projects. Features Synchronous Cache: Supports thread-safe, in-memory caching for Python applications. TTL Support: Automatically evicts entries after a configurable time-to-live (TTL). TTI Support: Automatically evicts entries after a configurable time-to-idle (TTI). Size-based Eviction: Automatically removes items when the cache exceeds its size limit using the TinyLFU policy. Concurrency: Optimized for high-performance, concurrent access in multi-threaded environments. Brian #4: uv: An In-Depth Guide On SaaS Pegasus blog, so presumably by Cory Zue Good intro to uv Also a nice list of everyday commands Install python: uv python install 3.12 I don't really use this anymore, as uv venv .venv --python 3.12 or uv sync install if necessary create a virtual env: uv venv .venv --python 3.12 install stuff: uv pip install django add project dependencies build pinned dependencies Also discussion about adopting the new workflow Extras Brian: PydanticAI - not sure why I didn't see that coming In the “good to know” and “commentary on society” area: Anti-Toxicity Features on Bluesky The WIRED Guide to Protecting Yourself From Government Surveillance Michael: Go sponsor a bunch of projects on GitHub Registration is open for PyCon Joke: Inf
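For context on jiter's Python-facing interface mentioned above, a minimal hedged example; the from_json call reflects the jiter-python README at the time of writing, so verify it against the version you install.

```python
# Minimal jiter sketch: parse JSON bytes straight into Python objects.
# (Function name per the jiter-python docs; treat it as an assumption and
# check your installed version.)
import jiter

payload = b'{"model": "gpt-4o", "usage": {"total_tokens": 42}}'
data = jiter.from_json(payload)
print(data["usage"]["total_tokens"])  # -> 42
```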
On this episode of Hands-On Mac, Mikah answers questions about what to do with old electronics, extracting photos out of downloaded files and maintaining the original photo names, how to properly manage your phone's battery, and a question from a listener about the best way to manage their router setup! Kelli came across some old electronics her father had years ago! She thinks they could be worth something but wonders if she should recycle them instead. Ray has their iPhone wirelessly charging at night while on Do Not Disturb. However, it randomly makes a sound at night! How can he get his phone to stop making that noise? Sean just downloaded their Google photos to back them up on an external drive. How can he keep the photos' original names when he extracts the files? Dan wonders how to manage their phone's battery life best when they keep it plugged in most of the time. Alan has three routers throughout his household in his home network. When he goes to different points in the house, he has to change the SSID to the router closest to where he is. Is there a simpler way of doing this before going to a Mesh router setup? Don't forget to send in your questions for Mikah to answer during the show! hot@twit.tv Host: Mikah Sargent Download or subscribe to Hands-On Tech at https://twit.tv/shows/hands-on-tech Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: cachefly.com/twit