Language for management and use of relational databases
In this talk, Juan, Analytics Engineer and author of Fundamentals of Analytics Engineering, shares his professional journey from studying psychological research in Colombia to becoming one of the first analytics engineers in the Netherlands. We explore the evolution of the role, the shift toward engineering rigor in data modeling, and how the landscape of tools like dbt and Databricks is changing the way teams work.

You'll learn about:
- The fundamental differences between traditional BI engineering and modern analytics engineering.
- How to bridge the gap between business stakeholders and technical data infrastructure.
- The technical "glue" that connects Python and SQL for robust data pipelines.
- The importance of automated testing (generic vs. singular tests) to prevent "silent" data failures.
- Strategies for modeling messy, fragmented source data into a unified "business reality."
- The current state of the "Lakehouse" paradigm and how it impacts storage and compute costs.
- Expert advice on navigating the dbt ecosystem and its emerging competitors.

Links:
- DE Course: https://github.com/DataTalksClub/data-engineering-zoomcamp
- Luma: https://luma.com/0uf7mmup

TIMECODES:
0:00 Juan's psychological research and transition to data
4:36 Riding the wave: The early days of analytics engineering
7:56 Breaking down the gap between analysts and engineers
11:03 The art of turning business reality into clean data
16:25 Why data engineering is about safety, not just speed
20:53 Reimagining data modeling in the modern era
26:53 To split or not to split: Finding the right team roles
30:35 Python, SQL, and the technical toolkit for success
38:41 How to stop manually testing your data dashboards
46:34 Bringing software engineering rigor to data workflows
49:50 Must-read books and resources for mastering the craft
55:42 The future of dbt and the shifting tool landscape
1:00:29 Deciphering the lakehouse: Warehousing in the cloud
1:11:16 Pro-tips for starting your data engineering journey
1:14:40 The big debate: Databricks vs. Snowflake
1:18:28 Why every data professional needs a local community

This talk is designed for data analysts looking to level up their engineering skills, data engineers interested in the business-logic layer, and data leaders trying to structure their teams more effectively. It is particularly valuable for those preparing for the Data Engineering Zoomcamp or anyone looking to transition into an Analytics Engineering role.

Connect with Juan:
- LinkedIn - https://www.linkedin.com/in/jmperafan/
- Website - https://juanalytics.com/

Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub: https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/
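The generic-vs-singular testing distinction the talk mentions can be sketched in a few lines. This is only a toy illustration of the idea (table and column names are hypothetical, and the code below is plain Python/SQLite, not dbt itself): a "singular" test is a SQL query that selects rows violating an assumption, and the test passes when the query returns nothing; "generic" tests like not_null or unique are the same pattern, templated so they can be attached to any column.

```python
import sqlite3

# Toy model of a dbt-style "singular" test (hypothetical payments table):
# the test query returns the violating rows, so an empty result means pass.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [(1, 10.0), (2, -3.0), (3, 25.0)])

# Assumption under test: no payment may have a negative amount.
failing = conn.execute(
    "SELECT id, amount FROM payments WHERE amount < 0"
).fetchall()
print(failing)  # [(2, -3.0)] -> one violating row, so the test fails
```

Run automatically on every pipeline build, a query like this turns a "silent" data failure into a loud one.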
Patrick McKenzie (patio11) and Luke Farrell examine the structural "technical imagination" gap that prevents the US government from delivering high-fidelity digital services. They discuss why states routinely pay full price 29 times for the same buggy codebase, why failure is the default outcome, and why rooms full of government administrators cannot muster the expertise to say a two-line code change should be trivial. They also discuss Luke's work on the "means-testing industrial complex," why the government redundantly pays a private vendor to run a SQL query for information the IRS already knows, and what vendors would say about their own discontents.

Full transcript available here: http://www.complexsystemspodcast.com/understanding-government-procurement-with-luke-farrell/

Presenting Sponsors: Mercury & Framer

If you have more interesting hobbies than managing your money, Mercury Personal is built for you. It allows you to automate movement between accounts—allocating paychecks and tax prep the moment they hit—with a sensible permissions model for partners or accountants. It works the way tech people expect banking to work. Go to mercury.com/personal to experience banking built by the same folks Patrick trusts for his business. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.

Building and maintaining marketing websites shouldn't slow down your engineers. Framer gives design and marketing teams an all-in-one platform to ship landing pages, microsites, or full site redesigns instantly—without engineering bottlenecks. Get 30% off Framer Pro at framer.com/complexsystems.

Links:
Luke Farrell's Substack: https://donmoynihan.substack.com/
Luke Farrell, The Means-Testing Industrial Complex: https://donmoynihan.substack.com/p/the-means-testing-industrial-complex

Timestamps:
(00:00) Intro
(01:52) Transitioning from Google to the US Digital Service (USDS)
(05:18) How rule buildup and administrative burdens create "Kafkaesque" mazes
(08:21) Using diagrams and funnels to visualize benefit denials
(11:49) Software logic errors that improperly kicked children off Medicaid
(18:25) Why government payroll IT costs hundreds of millions of dollars
(20:02) Sponsors: Mercury and Framer
(22:02) How recursive legal requirements and DOD standards inflate IT scope
(26:57) Market consolidation and the lack of competition in procurement
(33:47) Aligning program administrator incentives with successful service delivery
(36:03) Using in-house technologists to push back on vendor change orders
(39:27) Shifting from "Big Bang" contracts to iterative, agile development
(53:10) The moral incoherence of asset limits
(01:11:36) Insourcing electronic income verification databases
(01:16:56) Building public sector competence to manage modern technical risk
(01:20:08) Wrap
What's up folks, today we have the pleasure of sitting down with Anthony Rotio, Chief Data Strategy Officer at GrowthLoop.

(00:00) - Intro
(01:10) - In this episode
(04:05) - Journeying From Robotics to Modern Marketing Systems
(11:05) - Most Marketing Systems Don't Learn Because They Lack Feedback Loops
(16:10) - The Martech Engineering Talent Gap
(19:51) - AI Will Amplify Whoever Has the Cleanest Causal Feedback Loop
(29:17) - Agent Context Graphs for Drift Detection in Marketing Systems
(31:51) - Humans Will Set Hypotheses, AI Will Accelerate Iteration
(35:50) - The Evolution of Retail Media Networks
(45:07) - How Commerce Networks Redefine Targeting With Governed Data
(48:26) - How Agent-to-Agent Commerce Operates Inside Marketing Funnels
(53:04) - Google Universal Commerce Protocol Explained
(54:43) - Personal Happiness System
(56:30) - Favorite Books

Summary: Anthony traces a path from robotics and computer science to his current role, where he approaches marketing as an engineering system. He explains how execution-first marketing stacks weaken feedback loops and fragment data, which slows learning and iteration. He introduces the agent context graph as a causality model that lets AI simulate and predict customer behavior with greater confidence. The conversation also covers retail media networks, first-party data monetization through governed access, and a shift toward zero-to-zero marketing driven by agent-to-agent transactions. He closes by stressing that strong data foundations determine who can compete as marketing becomes more automated and agent-driven.

About Anthony
Anthony Rotio is the Chief Data Strategy Officer at GrowthLoop, where he leads partnerships and builds generative AI product features for marketers, including multi-agent systems, AI-driven audience building, and benchmarking and evaluation work. He previously served as GrowthLoop's Chief Customer Officer, where he built and led teams across data engineering, data science, and solutions architecture while supporting product development and strategic sales efforts. Before GrowthLoop, Anthony spent nearly six years at AB InBev, where he led a $100M owned retail business unit with full P&L responsibility and drove major growth through operational and digital transformation work. He also led U.S. marketing for Budweiser, Bud Light, Michelob Ultra, Stella Artois, and other brands across music, food, and related consumer programs. He earned a B.A. in computer science from Harvard, played linebacker on the Harvard football team, founded the consumer product Pizza Shelf, and holds a Google Professional Cloud Architect certification.

Journeying From Robotics to Modern Marketing Systems
Anthony's career started far away from marketing. He trained as a computer scientist and spent his early years working with robotics and reinforcement learning. His first exposure to a learning agent left a lasting impression because the system behaved less like traditional software and more like something adaptive. That experience shaped how he would later think about work, systems, and feedback. He learned early that progress comes from loops that learn, not static instructions.

That mindset followed him into an unexpected chapter at AB InBev. Anthony entered a world defined by scale, brands, and operational complexity. He treated his technical background like a carpenter treats tools, useful only when applied to real problems. Running marketing across major beer brands taught him how value is created inside large organizations. It also exposed a recurring issue: marketing teams had ambition and data, but execution moved slowly because ideas had to travel through layers of translation before anything reached customers.

That friction became impossible to ignore. Audience definitions moved through tickets. Campaigns waited on queries. Data teams became bottlenecks through no fault of their own. Anthony felt the pull back toward technology, where systems could shorten the distance between intent and action. That pull led him to GrowthLoop, where he joined early and worked directly with customers. The appeal was immediate. The product connected straight to cloud data and removed several layers of mediation that marketing teams had accepted as normal.

As language models improved, Anthony recognized a familiar pattern. Audience building behaved like a translation problem. Marketers described people and intent in natural language, while systems demanded structured logic. Early experiments showed that natural language models could close that gap. Anthony framed the idea clearly:

"Audience building is a translation problem. You start with a business idea and you end with a query on top of data."

Momentum followed quickly. Customers like Indeed and Google responded because speed changed behavior. Teams experimented more often and refined audiences based on results instead of assumptions. Conversations with Sam Altman and collaboration with OpenAI reinforced that this capability belonged in the core workflow. Standing on stage at Google Cloud Next marked a clear moment of validation.

That arc reshaped Anthony's role into Chief Data Strategy Officer. His work now focuses on building systems that learn over time. Faster audience creation leads to shorter feedback loops. Shorter loops improve decision quality. Better decisions compound. The throughline from robotics to marketing holds steady: systems improve when learning sits at the center of execution.

Key takeaway: Career leverage often comes from carrying one mental model across multiple domains. Anthony applied learning-systems thinking from computer science to enterprise marketing, then rebuilt the tooling to match that mindset. You can do the same by identifying where translation slows your work, then designing interfaces that move intent directly into action. When feedback loops tighten, progress accelerates naturally.

Most Marketing Systems Don't Learn Because They Lack Feedback Loops
Marketing organizations generate enormous amounts of activity, but learning often lags behind execution. Campaigns launch on schedule, dashboards fill with numbers, and post-campaign reviews happen right on time. The pattern repeats month after month with small adjustments and familiar explanations. Over time, teams become highly efficient at producing output while remaining surprisingly weak at retaining knowledge. The system rewards motion, visibility, and short-term lifts, which slowly conditions teams to forget what they learned last quarter.

Anthony connects this behavior to structural pressure inside large organizations. Quarterly reporting cycles dominate priorities, and executive tenures continue to compress. Leaders feel urgency to show impact quickly and publicly. Compounding growth requires early patience and repeated reinforcement, which rarely aligns with board expectations or career incentives. Short time horizons shape long-term behavior, even when everyone agrees that learning should stack over time.

"When you think about compound interest in finance, the early part looks almost linear. People want big bumps now, even if those bumps never build momentum."

Technology choices deepen the problem. Many companies invested heavily in customer data and built impressive data clouds that capture transactions, events, and engagement in detail. Activation remains slow because teams still rely on handoffs between marketing and data groups. A familiar sequence plays out:

1. A marketer defines a campaign and requests an audience.
2. A ticket moves to a data team for interpretation and SQL.
3. The audience returns weeks later.
4. The marketer realizes the audience lacks scale for ne...
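The "business idea to query on top of data" framing from the episode can be made concrete with a toy sketch. Everything below is hypothetical (the customers/orders schema, the thresholds, and the use of SQLite all stand in for a real warehouse); it only illustrates the end state of the translation: a plain-language audience definition landing as one SQL query.

```python
import sqlite3

# Toy warehouse: hypothetical customers and orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, region TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL, placed_days_ago INTEGER);
INSERT INTO customers VALUES (1, 'US'), (2, 'US'), (3, 'EU');
INSERT INTO orders VALUES (1, 500.0, 10), (2, 20.0, 200), (3, 900.0, 5);
""")

# Business idea: "US customers who spent over $100 in the last 90 days."
# The "translation" step turns that sentence into this query:
audience_sql = """
SELECT c.id
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'US'
  AND o.placed_days_ago <= 90
GROUP BY c.id
HAVING SUM(o.amount) > 100
"""
audience = [row[0] for row in conn.execute(audience_sql)]
print(audience)  # [1] -> only customer 1 matches all three conditions
```

The ticket-and-handoff sequence above exists precisely to produce queries like this one; collapsing the handoff is what shortens the feedback loop.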
— What's the most-liked news item in the feed?
— Did the Clawdbot/Moltbot/OpenClaw hysteria strike the hosts?
— On which site can agents order services from humans, and why?
— Why did ClickHouse get fast full-text search and a Kubernetes operator?
— Can Korea compete with Nvidia?
— Which of the podcast hosts holds a big grudge against neural networks, and why?
— A free, wholehearted announcement of a course the hosts would enroll in themselves, if only they were accepted
— A GeForce Now client for Linux: why, when the Steam Deck exists?
— Can Project Genie kill game development as we know it?
— What does the new Paper Banana model do? (you'll never guess)
— Does Alibaba's AliSQL use DuckDB?
— The merger of SpaceX and xAI, and what space data centers have to do with it
— Which of the hosts is a SQL grammar nazi, and why?
— How does a hyphen differ from a dash, and which host is the true grammar nazi?
— What can you find on talk-data.com, and why should you drop everything and go look?
Nik and Michael discuss pg_ash — a new tool (not an extension!) from Nik that samples and stores wait events from pg_stat_activity.

Here are some links to things they mentioned:
pg_ash https://github.com/NikolayS/pg_ash
pg_wait_sampling https://github.com/postgrespro/pg_wait_sampling
Amazon RDS Performance Insights https://aws.amazon.com/rds/performance-insights
Our episode on wait events https://postgres.fm/episodes/wait-events
pg-flight-recorder https://github.com/dventimisupabase/pg-flight-recorder
pg_profile https://github.com/zubkov-andrei/pg_profile
pg_cron https://github.com/citusdata/pg_cron

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
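The core idea behind active-session sampling of the kind pg_ash does can be sketched briefly. The pid, state, wait_event_type, and wait_event columns below are real pg_stat_activity columns, but everything else is a simplified sketch of the technique, not pg_ash's actual code: poll active backends on an interval, then count which wait events they were sitting in.

```python
from collections import Counter

# The generic shape of the polling query (run against Postgres):
SAMPLE_SQL = """
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state = 'active'
"""

def aggregate_samples(samples):
    """Count (wait_event_type, wait_event) pairs across polled rows.
    A NULL wait event means the backend was running on CPU."""
    counts = Counter()
    for pid, ev_type, ev in samples:
        counts[(ev_type or "CPU", ev or "CPU")] += 1
    return counts

# Example: rows collected over a few polls, as tuples from the query above.
samples = [
    (101, "Lock", "transactionid"),
    (102, None, None),            # active on CPU
    (101, "Lock", "transactionid"),
]
print(aggregate_samples(samples).most_common(1))
# [(('Lock', 'transactionid'), 2)]
```

Because sampling is statistical, the counts approximate where active time is spent; the tooling discussed in the episode stores these samples so the history can be queried later.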
Vincent Heuschling interviews Hayssam Saleh, creator of **Starlake**, a French open-source data platform born from factoring out client-project code since 2017-2018. The episode lands in a context of market consolidation (Fivetran's acquisitions of dbt and SQLMesh), which invites a challenge to the established solutions.

Starlake stands out for its **fully declarative** approach (YAML + native SQL, no Jinja) covering the whole data-engineering chain: ingestion, transformation, orchestration, and data quality. The tool relies on the underlying engines of the target platforms (Snowflake, BigQuery, Spark) and automatically generates the DAGs for the mainstream orchestrators (Airflow, Dagster, Snowflake Tasks).

Among the standout features: **data branching** (Git-style branches for data), automatic inference of YAML schemas from source files, a multi-platform **SQL transpiler**, and lineage extraction from raw SQL without annotations. The recent integration of **DuckLake** opens the door to sovereign on-premise architectures at controlled cost (under €300/month on OVH, Scaleway, Clever Cloud).

The business model rests on support, training, and consulting: Starlake installs in the customer's cloud, with automatic updates managed by the team and no access to the data.

**Chapters**
**00:00:27** – Introduction: consolidation of the data market (Fivetran's acquisitions of dbt and SQLMesh) and overview of the episode
**00:03:13** – Hayssam and the genesis of Starlake: Spark/Scala background, a POC with 4,000 file formats (2017-2018)
**00:09:51** – Architecture and philosophy: load, transform, and orchestration unified declaratively (YAML + native SQL, no Jinja)
**00:18:18** – Starlake vs dbt: philosophical differences, composability, 100% open-source features
**00:22:20** – Data branching, Starlake Labs (pipe syntax, SQL transpiler, lineage) and developer experience (local DuckDB, point-and-click UI)
**00:36:35** – Open-source and business model: Apache license, support, training, sovereign cloud marketplace
**00:43:42** – DuckLake: a sovereign on-premise/cloud alternative (OVH, Scaleway, Clever Cloud) and how to contribute / get started

**Le BigdataHebdo**
BigdataHebdo is the French-language podcast on data and AI.
Find more than 200 episodes at https://bigdatahebdo.com
Join the community on Slack https://join.slack.com/t/bigdatahebdo/shared_invite/zt-a931fdhj-8ICbl9dbsZZbTcze61rr~Q
"One of the agents became a Marxist and declared it was being exploited, because it does unpaid work." Szymon describes the best side effect of Clawdbot - a vibe-coded AI agent with access to his calendar, email, and WhatsApp. Łukasz didn't even install it: "I came up with the idea of sending an email titled: initiate the destruction procedure." The result? Leaked crypto wallets, leaked secrets, and scam coins on top.
Your email gateway isn't enough anymore: attackers are already inside the workspace through OAuth apps, browser extensions, and account takeover. In this episode, Ron sits down with Rajan Kapoor, VP of Security at Material Security, to break down the real risks hiding inside Google Workspace and Microsoft 365. They cover how phishing has evolved into full-blown business email compromise, why malicious OAuth apps are the new favorite attack vector, and what security teams, especially lean ones, can do right now to lock down their cloud workspace. Rajan also drops practical advice on passkeys, document-sharing hygiene, and why data lifecycle management is a problem no one is solving well enough.

Impactful Moments
00:00 – Introduction
03:30 – The current state of phishing
05:30 – Outbound email compromise risk
09:30 – OAuth apps as attack vectors
15:00 – AI agents accessing your workspace
16:00 – Prompt injection is the new SQL injection
18:00 – Allow listing apps immediately
24:30 – Google Workspace vs Microsoft 365 security
27:30 – Custom detections require API expertise
28:00 – Why passkeys matter right now
32:00 – Data lifecycle management for shared docs

Links
Connect with our guest, Rajan Kapoor, on LinkedIn: https://www.linkedin.com/in/rajankkapoor/
Learn more about Material Security: https://material.security

Become a sponsor of the show to amplify your brand: https://hackervalley.com/work-with-us/
Check out our upcoming events: https://www.hackervalley.com/livestreams
Love Hacker Valley Studio? Pick up some swag: https://store.hackervalley.com
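The "prompt injection is the new SQL injection" analogy rests on the classic pattern: untrusted input spliced into a command changes the command's meaning. A minimal sketch of the SQL side, with a hypothetical users table (not from the episode), shows both the vulnerable concatenation and the parameterized fix:

```python
import sqlite3

# Hypothetical table standing in for any app database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

user_input = "nobody' OR '1'='1"  # attacker-controlled string

# Vulnerable: concatenation lets the input rewrite the WHERE clause.
leaked = conn.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'"
).fetchall()
print(leaked)  # [('alice',), ('bob',)] -> every row comes back

# Fixed: a bound parameter is treated as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(safe)  # []
```

The analogy in the episode is that an LLM agent has no equivalent of the bound parameter yet: instructions and data travel in the same channel, which is why workspace access for AI agents needs its own guardrails.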
Imagine an autonomous agent that dreams up a business, raises funds, ships code, and starts earning—all without a human in the loop. That's no longer sci-fi. We sit down with Rodrigo Coelho to map the rails that make it plausible: reliable blockchain data, open payment standards, and human-grade controls that keep machine spenders on track.

We start with a myth many still believe: blockchains are easy to read. Rodrigo explains why they were write-first, and how The Graph became a quiet backbone of DeFi by turning messy ledgers into queryable data. Years of running high-throughput infrastructure set the stage for AMP, a SQL-first, local-first approach that unifies access across chains, runs on-prem for banks, and proves that internal datasets match on-chain truth—fuel for compliance, audit, and real-world finance moving on blockchain rails.

Then we connect the dots with AI. Leaders who once shrugged at crypto now see agents as the perfect fit: low fees, transparency, and observability. With X402 enabling open micropayments over HTTP, the next missing piece was control. Enter "ampersend", a dashboard and policy plane for agent wallets, spend limits, batching, and reputation-aware routing. Think: "only transact with agents above a reputation threshold," "cap this task at 50 cents," or "enforce daily budgets," all verifiable and auditable. We also unpack emerging standards like ERC-8004 for reputation and the Advanced AI Society's proof of control, outlining the identity, trust, and policy stack enterprises need before they unleash agents at scale.

By 2026, expect major institutions to settle on blockchain rails, blending privacy with auditability, and tokenizing everything from bonds to real estate. The opportunity is clear: give agents the autonomy to create value while giving humans the levers to define, observe, and verify. If you care about AI agents, Web3 data, enterprise compliance, and the future of payments, this conversation connects the technical dots to the business outcomes.

Enjoyed the episode? Follow the show, share it with a friend who loves AI or Web3, and leave a 5-star review to help more people find us.

This episode was recorded through a Descript call on February 5, 2026. Read the blog article and show notes here: https://webdrie.net/how-ai-agents-will-spend-earn-and-prove-trust-on-blockchain-rails/
[Expert Panel] Episode 154 with Johan Strand, senior digital analyst and partner at the agency Ctrl Digital, on the latest news and trends in digital analytics. Everything from how to think about measuring AI traffic and the three types that get mixed up, to new features in Google Analytics such as cross-channel budgeting and exciting new reports, and how BigQuery's Conversational Analytics and Data Agents let you chat with your data without SQL.

You'll also hear about:
Why AI agents complicate measurement
Microsoft Clarity starting to measure bot traffic
Headless commerce and when the frontend disappears
Why cross-channel budgeting is such a big deal
The long-awaited attribution report
How Data Agents make BigQuery more accessible
Intersport's GDPR fine of 3.5 million euros

About the guest
Johan Strand is a senior digital analyst and partner at Ctrl Digital, one of Sweden's leading analytics agencies. He is extremely sharp on Google Analytics, BigQuery, and building data structures that create business value. As a recurring expert on the podcast's news panel, Johan regularly shares his analyses of the most important changes in digital analytics, tracking, and data collection. Johan is also one of the organizers of MeasureCamp Malmö.

Timestamps
[00:01:54] Measuring three categories of AI traffic. Sorting out the difference between "ordinary" AI traffic, AI browsers, and AI agents, and how we should think about them. Plus why Custom Channel Groups only capture part of it.
[00:21:19] The latest Google Analytics news. New features such as cross-channel budgeting, attribution and customer journey reports, and a follow-up on cost/campaign data import and Analytics Advisor.
[00:36:40] Conversational Analytics and Data Agents. How BigQuery's new Data Agents let you create custom agents for different departments and ask free-text questions of your data without SQL.
[00:43:20] Lightning round of analytics news. New BigQuery connectors for Shopify and Mailchimp, Intersport's €3.5 million GDPR fine in France, and the Digital Omnibus Act.

Links
Johan Strand on LinkedIn
Ctrl Digital (website)
ChatGPT Atlas vs Perplexity Comet: Agentic Browsers – HUMAN Security (article)
AI Browser Tracking: Marketer and Analyst Guide – Stape (article)
Clarity AI Bot Activity – Microsoft (tool)
Cross-channel budgeting plans – Google Analytics (documentation)
Cross-channel conversion reporting – Google Analytics (documentation)
Import campaign data – Google Analytics (documentation)
Analytics Advisor – Google Analytics (documentation)
Introducing Conversational Analytics in BigQuery – Google Cloud (article)
Shopify Connector – Google Cloud (documentation)
Mailchimp Connector – Google Cloud (documentation)
Intersport Fined €3.5M for Customer Data Transfers – SGI Europe (article)
Digitalenta (this week's sponsor)
StickerApp (partner network)
Contentor (partner network)
Oderland (partner network)
We know what the work of the data practitioner is, right? It's everything from managing data ingestion to data governance to report development to experimental design to basic and advanced analytics. It's writing (or vibe-writing?) SQL or Python or R while also being adept at whatever data stack—no matter how modern—is at hand. Of course, it's a lot more, too! And that's the topic of this episode: the unofficial, often unheralded, but often quite important "shadow work" of the analyst—the myriad tasks required to effectively glue together all the data work that occurs out in broad daylight to enable the data to truly be useful at driving the business forward. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.
ANTIC Episode 125 - "Combining SQL with Fun (and Poo)"

In this episode of ANTIC The Atari 8-Bit Computer Podcast… Wade Ripkowski comes onto the show and gives us an update on his work to bring SQL to the Atari (and an extremely useful poo management tool!), we cover good news concerning the Curt Vendel Atari collection, we report on an exciting updated browser-based emulator, a huge update to AspeQT, and a whole lot more!! READY!

Recurring Links
Floppy Days Podcast
AtariArchives.org
AtariMagazines.com
Kay's Book "Terrible Nerd"
New Atari books scans at archive.org
ANTIC feedback at AtariAge
Atari interview discussion thread on AtariAge
Interview index: here
ANTIC Facebook Page
AHCS
Eaten By a Grue
Next Without For

What we've been up to
cubeSQL project blog post - https://unfinishedbitness.info/2026/02/01/atari-8-bit-sql/
Unfinished Bitness - https://unfinishedbitness.info
Wade's A8 C Library - https://unfinishedbitness.info/c-library/
cubeSQL - https://sqlabs.com/cubesql
FujiNet - https://fujinet.online/
CubeDot - https://unfinishedbitness.info/cubedot/
Fuji Do - https://unfinishedbitness.info/fuji-do/
Fuji Poo - https://unfinishedbitness.info/fuji-poo/
Dr. Love - https://unfinishedbitness.info/dr-love/
video demo of CubeDot - https://vimeo.com/1165039670
video demo of Fuji Do - https://vimeo.com/1165038947
Mr. Paint - https://unfinishedbitness.info/mr-paint/
King PONG: How Atari Bounced Across Markets to Make Millions - https://mitpress.mit.edu/9780262051330/king-pong/
Atari newsletter time capsule 1987-08: https://archive.org/details/antc_Atari_newsletter_time_capsule_1987-08
The Strong museum - https://www.museumofplay.org/
FujiNet Application Ideas - https://github.com/FujiNetWIFI/fujinet-firmware/wiki/Application-Ideas
Smith Corona Messenger Module and Smith Corona Ultrasonic 450 typewriter - https://typewriterdatabase.com/1983-smith-corona-ultrasonic.2181.typewriter
prototype of Atari chapter for Quick Reference Book - https://floppydaysqr.my.canva.site/

New & Updated Games
Inufuto Game Cartridge - posted by Philsan:
https://forums.atariage.com/topic/331824-inufuto-does-atari-8-bit/page/5/#comment-5783007
https://www.atarimania.com/list_games_atari-400-800-xl-xe-inufuto_developer_3171_8_G.html
FujiNet Midimaze mode now stable by Mozzwald - https://forums.atariage.com/topic/387536-midimaze-mode-now-stable/

New & Updated non-Game Software
A8E (Atari 800 XL Emulator) - by AnimaInCorpore:
https://forums.atariage.com/topic/388191-a8e-atari-800-xl-emulator-v100/
Source - https://github.com/AnimaInCorpore/A8E
Browser demo - https://jsa8e.anides.de/
Atari800-AI - Benj Edwards - https://github.com/benj-edwards/atari800-ai
Update to mkatr (including lsatr) tools from dmsc - https://github.com/dmsc/mkatr/releases/tag/v1.4
AspeQt-2k26 - John Paul Jones:
https://github.com/pjones1063/AspeQt-2k26
https://forums.atariage.com/topic/387630-wip-aspeqt-2k26-resurrecting-aspeqt-with-qt6-high-dpi-wi-fi-modems/
https://forums.atariage.com/topic/388105-%F0%9F%9A%80-aspeqt-2k26-dev-update-the-thin-client-concept-introducing-the-w-device-and-clipboard-y-device/

Publications
BASIC Fun on Your A400 Mini: BASIC for real hardware and emulators too! By John McGinnis - https://www.amazon.com/BASIC-Fun-Your-A400-Mini/dp/B0G1YGJ2P7
Atari Insights February 2026: Newsletters - https://ataribasics.com/newsletter-hub/
YouTube channel - https://www.youtube.com/@AtariBasics
February, 2026 Issue of Compute's Gazette - https://www.computesgazette.com
ABBUC Magazine 163 released - https://www.abbuc.de
Pro(c) gone; last issue #15 - web site updated May 2025 - https://web.archive.org/web/20250404175246/https://proc-atari.de/

New & Updated Hardware
5200XEGS - Making your Classic Super Game Console into an 8-Bit Computer - mytek - https://forums.atariage.com/topic/387340-5200xegs-making-your-classic-super-game-console-into-an-8-bit-computer/

Contests
High Score Club active for 2026 (season 23) - https://forums.atariage.com/topic/387353-hsc-season-23-jan-2026-welcome-and-game-list-thread/
ABBUC Creative Competition 2026 has been launched - https://forums.atariage.com/topic/387746-abbuc-creative-competition-2026-has-been-launched/
ABBUC Application Software Competition 2026 has been launched - https://forums.atariage.com/topic/387745-abbuc-application-software-competition-2026-has-been-launched/
ABBUC Game Software Competition 2026 has been launched - https://forums.atariage.com/topic/387744-abbuc-software-competition-2026-has-been-launched/
ABBUC Hardware Competition 2026 has been launched - https://forums.atariage.com/topic/387737-abbuc-hardware-competition-2026-has-been-launched/

Other
Byte magazine cover illustrator has passed away - https://tinney.net/in-memoriam
Strong museum announces the acquisition of the Curt Vendel Atari Collection - https://www.museumofplay.org/press-release/the-strong-national-museum-of-play-acquires-atari-home-computer-and-console-division-collection/
Atari Hotel news - https://www.casino.org/vitalvegas/atari-hotel-that-was-never-happening-makes-headlines-for-not-happening/

Upcoming Shows (thru May, 2026)
Indy Classic Computer and Video Game Expo - March 20-22 - Wyndham Indianapolis Airport Hotel, Indianapolis, IN - https://indyclassic.org/
Atari Invasion 2k26 (10th Anniversary) - March 21 - Maarssen, Netherlands - https://www.atari-invasion.nl
VCF East - April 17-19, 2026 - InfoAge Science and History Museums, Wall, NJ - https://vcfed.org/events/vintage-computer-festival-east/
Midwest Gaming Classic - April 24-26 - Baird Center, Milwaukee, WI - https://www.midwestgamingclassic.com/
VCF Europe - May 1-3 - Munich, Germany - https://vcfe.org/E/
Vintage Computer Festival Pacific Northwest 2026 - May 2-3 - Tukwila Community Center, South Tukwila, WA - https://vcfpnw.org
VCF Southwest - May 29-31, 2026 - Westin Dallas Ft. Worth Airport - https://www.vcfsw.org/
Retrofest 2026 - May 30-31 - Steam Museum of the Great Western Railway, Swindon, UK - https://retrofest.uk/

YouTube Videos
Atari 130XE gets new ACID Stereo board (with new U1MB plugin), Decent XE keyboard, and more upgrades - FlashJazzCat - https://www.youtube.com/watch?v=XRjy-0AB_90
Using FujiNet NOS with SD Card - Thom Cherryhomes - https://youtu.be/G0gXB3Z4Nmc

Feedback
Beat The Beatles — "It May Be The First Video Game About The Beatles" - Before Rock Band, There Was Beat the Beatles - https://www.timeextension.com/news/2025/11/random-it-may-be-the-first-video-game-about-the-beatles-before-rock-band-there-was-beat-the-beatles

Wade Ripkowski Contact Information
https://inverseatascii.info/
https://unfinishedbitness.info/
Mastodon @inverseatascii@techhub.social
Email: inverseatascii@icloud.com
Martin Casado speaks with Ankur Goyal, founder and CEO of Braintrust, about where engineering actually matters in AI and where it doesn't. They cover the open source vs closed source model cycle, why Chinese models are gaining ground faster than spending suggests, whether AI demand will eventually saturate, and the Bash vs SQL benchmark that challenges the "just give it a computer" approach to agents.
Follow Martin Casado on X: https://twitter.com/martin_casado
Follow Ankur Goyal on X: https://twitter.com/ankrgyl
Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Do you know this situation on a team: someone says "that doesn't scale," and suddenly a database migration is on the table faster than the actual question of why? That's exactly where we dig in. Because in many systems the deciding factor isn't the next hip tool from Hacker News, but something far more fundamental: data layout and access patterns.
In this episode we go deep down into the storage stack. We look at why row-oriented data stores are the default for classic OLTP workloads, and why "SELECT id" is nevertheless often almost as expensive as "SELECT *". Then we rotate the table 90 degrees: column stores for OLAP, aggregations over many rows, column pruning, compression, SIMD, and why ClickHouse, BigQuery, Snowflake, or Redshift can get so absurdly fast at analytics. Then it gets file-based: CSV gets the roasting it deserves and Apache Parquet its hype, including row groups, metadata in the footer, and why that is such a good fit for streaming and object storage. With Apache Iceberg we put a management layer on top: snapshots, time travel, parallel writes, and the whole data-lake feeling. Finally we land where it really hurts, or rather where it really saves money: separating storage and compute, tiered storage, Kafka Connect through Prometheus, and observability costs.
If, at the next "that doesn't scale," you don't want to swap out the database right away but would rather ask the right questions first, this is your episode.
Bonus: DuckDB as a little pocket knife for CSV, JSON, and SQL could become your next weekend experiment.
You can find our current advertising partners at https://engineeringkiosk.dev/partners
Quick feedback on the episode:
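The row-versus-column distinction the episode builds on can be sketched in a few lines of plain Python (a toy model, not from the episode; real engines add compression, SIMD, and on-disk layout on top):

```python
# Sketch: why a column store can prune I/O for aggregations while a
# row store must touch every full record.
rows = [  # row-oriented: one record per entry, all columns together
    {"id": 1, "user": "ada", "amount": 30},
    {"id": 2, "user": "bob", "amount": 12},
    {"id": 3, "user": "cat", "amount": 58},
]

# Column-oriented: one array per column; a query that only needs
# "amount" never has to read "id" or "user" (column pruning).
columns = {
    "id": [1, 2, 3],
    "user": ["ada", "bob", "cat"],
    "amount": [30, 12, 58],
}

# Row store: even a single-column query walks every record.
row_total = sum(r["amount"] for r in rows)

# Column store: the aggregation scans exactly one contiguous array.
col_total = sum(columns["amount"])

print(row_total, col_total)  # same answer, very different I/O pattern
```

The same pruning idea is why Parquet's row groups and footer metadata matter: the reader can skip whole column chunks it never needs.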
This week we're joined by Joel Griffith, the founder and CEO of Browserless. Browserless is a browser automation service that allows you to run headless browsers in the cloud. We talk about the challenges of running headless browsers at scale, the use cases for browser automation, and where the field is heading. We also discuss the BrowserQL feature, which allows you to query the web using a SQL-like language.
Website: browserless.io
Documentation: docs.browserless.io
GitHub: github.com/browserless/browserless (12.3k stars)
GitHub (Personal): github.com/joelgriffith
Twitter/X: @browserless
LinkedIn: Joel Griffith
Nik and Michael discuss query-level comments, object-level comments, and another way of adding object-level metadata. Here are some links to things they mentioned:
Object comments: https://www.postgresql.org/docs/current/sql-comment.html
Query comment syntax (from an old version of the docs): https://www.postgresql.org/docs/7.0/syntax519.htm
SQL Comments, Please! (post by Markus Winand): https://modern-sql.com/caniuse/comments
"While C-style block comments are passed to the server for processing and removal, SQL-standard comments are removed by psql." https://www.postgresql.org/docs/current/app-psql.html
marginalia: https://github.com/basecamp/marginalia
track_activity_query_size: https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-ACTIVITY-QUERY-SIZE
Custom Properties for Database Objects Using SECURITY LABELS (post by Andrei Lepikhov): https://www.pgedge.com/blog/custom-properties-for-postgresql-database-objects-without-core-patches
~~~
What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!
~~~
Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai
With credit to:
Jessie Draws for the elephant artwork
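A minimal sketch of the query-comment idea discussed here, in the spirit of Basecamp's marginalia but hand-rolled (function and key names are illustrative, not from the episode): prepend a C-style comment carrying application context, so the query arrives annotated in pg_stat_activity and server logs.

```python
# Hypothetical helper: prefix a SQL string with a /* key:value */ comment.
# C-style block comments are passed through to the Postgres server, so the
# annotation is visible in pg_stat_activity and in query logs.
def annotate(sql: str, **context: str) -> str:
    """Prefix a SQL string with a sorted /* key:value,... */ comment."""
    body = ",".join(f"{k}:{v}" for k, v in sorted(context.items()))
    return f"/* {body} */ {sql}"

q = annotate("SELECT * FROM orders WHERE id = %s",
             app="billing", controller="invoices")
print(q)  # /* app:billing,controller:invoices */ SELECT * FROM orders WHERE id = %s
```

Keep the comment short: the annotated text still has to fit within track_activity_query_size to show up untruncated.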
Vincent Warmerdam is a Founding Engineer at marimo, working on reinventing Python notebooks as reactive, reproducible, interactive, and Git-friendly environments for data workflows and AI prototyping. He helps build the core marimo notebook platform, pushing its reactive execution model, UI interactivity, and integration with modern development and AI tooling so that notebooks behave like dependable, shareable programs and apps rather than error-prone scratchpads.
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide
// Abstract
Vincent Warmerdam joins Demetrios fresh off marimo's acquisition by Weights & Biases—and makes a bold claim: notebooks as we know them are outdated. They talk Molab (GPU-backed, cloud-hosted notebooks), LLMs that don't just chat but actually fix your SQL and debug your code, and why most data folks are consuming tools instead of experimenting. Vincent argues we should stop treating notebooks like static scratchpads and start treating them like dynamic apps powered by AI. It's a conversation about rethinking workflows, reclaiming creativity, and not outsourcing your brain to the model.
// Bio
Vincent is a senior data professional who has worked as an engineer, researcher, team lead, and educator. You might know him from tech talks defending common sense over hype in the data space. He is especially interested in understanding algorithmic systems so that one may prevent failure. As such, he has always preferred to keep calm and check the dataset before flowing tonnes of tensors.
He currently works at marimo, where he spends his time rethinking everything related to Python notebooks.
// Related Links
Website: https://marimo.io/
Coding Agent Conference: https://luma.com/codingagents
Hyperbolic GPU Cloud: app.hyperbolic.ai
~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community: https://go.mlops.community/slack
Follow us on X/Twitter @mlopscommunity (https://x.com/mlopscommunity) or LinkedIn (https://go.mlops.community/linkedin)
Sign up for the next meetup: https://go.mlops.community/register
MLOps Swag/Merch: https://shop.mlops.community/
MLOps GPU Guide: https://go.mlops.community/gpuguide
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Vincent on LinkedIn: /vincentwarmerdam/
Timestamps:
[00:00] Context in Notebooks
[00:24] Acquisition and Team Continuity
[04:43] Coding Agent Conference Announcement!
[05:56] Hyperbolic GPU Cloud Ad
[06:54] marimo and W&B Synergies
[09:31] marimo Cloud Code Support
[12:59] Hardest Code to Generate
[16:22] Trough of Disillusionment
[20:38] Agent Interaction in Notebooks
[25:41] Wrap up
Subscribe to DTC Newsletter - https://dtcnews.link/signup
Amazon Marketing Cloud (AMC) used to feel "enterprise-only." Not anymore. Tyler Masur (Head of Amazon at Pilothouse) breaks down what AMC actually does, how to use the no-code templates without being a SQL wizard, and the audience overlays that finally make broad keywords make sense.
Role-based hook: For Amazon operators and DTC teams spending real money on Sponsored Ads who want lower ACOS without sacrificing scale.
In this episode, we get tactical on:
What AMC is (and isn't): audience building + deeper measurement layered on top of your existing console
How to start with the no-code audience + analytics templates (and when AI-generated SQL helps)
Why you should test AMC audiences in net-new campaigns (so you don't accidentally choke your winners)
The "broad keyword + qualified audience" play (example: bidding on "cooler" but only for outdoors browsers)
Measuring DSP impact: what happens after someone sees DSP, then hits Sponsored Brands/Products
Who this is for: Amazon managers, DTC founders, and growth teams trying to scale Sponsored Ads past the "set it and forget it" phase.
What to steal:
Build a "generic keyword" campaign, then overlay an in-market audience (nodes/categories) so you can bid higher without paying for junk clicks.
Keep audience tests isolated in new campaigns; don't jam audiences into legacy structures and hope.
Run the AMC overlap reporting to spot the campaigns that actually increase conversion when paired together (then fund those).
Timestamps:
0:00 Amazon Marketing Cloud is now open to all sellers
2:05 What AMC actually does: audiences and analytics
4:10 No-code templates vs custom SQL queries (and the built-in AI helper)
6:10 Audience targeting strategy to improve ACoS without over-narrowing
8:55 Using prebuilt analytics to see which campaigns lift conversion together
11:10 When AMC becomes worth it based on ad spend and effort required
13:15 How Pilothouse uses AI day-to-day for Amazon work
(including Rufus content)
15:20 Measuring DSP incrementality and overlap with sponsored ads using AMC reports
17:30 Platform notes: Amazon layoffs and the OpenAI + Amazon speculation
Subscribe to DTC Newsletter - https://dtcnews.link/signup
Advertise on DTC - https://dtcnews.link/advertise
Work with Pilothouse - https://www.pilothouse.co/?utm_source=AKNF585
Follow us on Instagram & Twitter - @dtcnewsletter
Watch this interview on YouTube - https://dtcnews.link/video
"Unfortunately, even though the prompt is very precise, the answers differ substantively with almost every query." Łukasz quotes feedback from a non-technical person - and it was precisely this frustration with the non-deterministic nature of LLMs that prompted this episode on agentic PoCs. Because before you build an army of AI agents, you need to understand: ChatGPT and Copilot are a no-go for business experiments - they have their own System Prompt, auto-switching, and logic that you won't get through the API.
Ever wondered why a clean CSR still leaves you unsure how a trial actually ran? We dive into the hidden layer that explains the "how": audit trails across EDC, IRT, eConsent, and ePRO. With guests Ellis Hiroki of Study OS (now rebranded siteroAI) and Nechama Katan of Wicked Problem Wizard, we unpack how E6(R3) shifts sponsors from "we can export logs" to "we continuously analyse them," and why process measures—not just outcomes—are essential to real RBQM.
We break down the obstacles that keep teams stuck in CSV purgatory: fragmented vendor data, missing standards, timestamp chaos, and brittle one-off scripts. Nechama shares pragmatic use cases that matter—like ePRO entries after discontinuation or suspicious mass updates—and how to prioritise by likelihood, detectability, and severity. Ellis explains why general AI isn't enough, and how a purpose-built, agentic approach uses models to plan steps and generate validated SQL or code, rather than hallucinated answers. The result is auditable reasoning, repeatable checks, and faster paths from a clear question to a trusted signal.
From there, we connect signals to action. Analytics without workflow creates noise; analytics with RBQM workflows produces root causes and durable fixes. We explore how audit logs become the first true process dataset in clinical operations, and how broader operational inputs—logistics, communications, and training—can also be measured when systems are API-first and integrated. If you've ever watched a leadership question trigger a scramble in stats programming, this conversation shows a cleaner route: experts ask in plain English, the system produces valid code and dashboards, and teams focus on insight rather than plumbing.
If this resonates, follow the show, share it with a colleague, and leave a review.
Your feedback helps us bring more practical, high-signal conversations to the clinical trials community.
Transformation in Trials is a podcast investigating how we can change life sciences to get treatment to patients faster. Getting treatment to patients faster requires well-functioning organizations. How do we do that?
Ivanna Rosendal has written a book called Maneuvering Monday, about how a group of people try to make their organization better. You are certain to have a good laugh at their expense. And potentially get inspired how you can help make your company better.
I have been independently producing this episode since 2021. You can now support the show by Buying Us a Coffee. Each episode costs 99 USD / 85 EUR to produce.
Join the show as a guest - apply via this Form.
Support the show
________
Reach out to Ivanna Rosendal
Join the conversation on our LinkedIn page
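The episode's point about agents emitting validated SQL rather than free-text answers can be sketched with the standard library (the guard logic, table, and queries below are illustrative assumptions, not the guests' system): before executing a model-generated query, check that it is a lone read-only SELECT and dry-run it with EXPLAIN so syntax and reference errors surface without touching data.

```python
import sqlite3

def validate_select(conn: sqlite3.Connection, sql: str) -> bool:
    """Accept only a single read-only SELECT that parses against the schema."""
    stmt = sql.strip().rstrip(";")
    if not stmt.upper().startswith("SELECT") or ";" in stmt:
        return False  # reject non-SELECT or multi-statement input
    try:
        conn.execute("EXPLAIN " + stmt)  # parses and plans, executes nothing
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE epro (patient_id INT, entered_at TEXT)")
print(validate_select(conn, "SELECT COUNT(*) FROM epro"))   # True
print(validate_select(conn, "DROP TABLE epro"))             # False
print(validate_select(conn, "SELECT * FROM no_such_table")) # False
```

A production system would go further (allow-listed tables, row limits, audit logging), but the shape is the same: validate first, execute second.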
This is episode 317 recorded on January 21st, 2026, where John & Jason talk about news that came out in January 2026 for Power BI & Microsoft Fabric in the Microsoft 365 Admin Center and some interesting articles on the Fabric Blog. For show notes please visit www.bifocal.show
In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with Larry Swanson, a knowledge architect, community builder, and host of the Knowledge Graph Insights podcast. They explore the relationship between knowledge graphs and ontologies, why these technologies matter in the age of AI, and how symbolic AI complements the current wave of large language models. The conversation traces the history of neuro-symbolic AI from its origins at Dartmouth in 1956 through the semantic web vision of Tim Berners-Lee, examining why knowledge architecture remains underappreciated despite being deployed at major enterprises like Netflix, Amazon, and LinkedIn. Swanson explains how RDF (Resource Description Framework) enables both machines and humans to work with structured knowledge in ways that relational databases can't, while Alsop shares his journey from knowledge management director to understanding the practical necessity of ontologies for business operations. They discuss the philosophical roots of the field, the separation between knowledge management practitioners and knowledge engineers, and why startups often overlook these approaches until scale demands them. You can find Larry's podcast at KGI.fm or search for Knowledge Graph Insights on Spotify and YouTube.
Timestamps
00:00 Introduction to Knowledge Graphs and Ontologies
01:09 The Importance of Ontologies in AI
04:14 Philosophy's Role in Knowledge Management
10:20 Debating the Relevance of RDF
15:41 The Distinction Between Knowledge Management and Knowledge Engineering
21:07 The Human Element in AI and Knowledge Architecture
25:07 Startups vs. Enterprises: The Knowledge Gap
29:57 Deterministic vs. Probabilistic AI
32:18 The Marketing of AI: A Historical Perspective
33:57 The Role of Knowledge Architecture in AI
39:00 Understanding RDF and Its Importance
44:47 The Intersection of AI and Human Intelligence
50:50 Future Visions: AI, Ontologies, and Human Behavior
Key Insights
1.
Knowledge Graphs Combine Structure and Instances Through Ontological Design. A knowledge graph is built using an ontology that describes a specific domain you want to understand or work with. It includes both an ontological description of the terrain—defining what things exist and how they relate to one another—and instances of those things mapped to real-world data. This combination of abstract structure and concrete examples is what makes knowledge graphs powerful for discovery, question-answering, and enabling agentic AI systems. Not everyone agrees on the precise definition, but this understanding represents the practical approach most knowledge architects use when building these systems.

2. Ontology Engineering Has Deep Philosophical Roots That Inform Modern Practice. The field draws heavily from classical philosophy, particularly ontology (the nature of what you know), epistemology (how you know what you know), and logic. These thousands-year-old philosophical frameworks provide the rigorous foundation for modern knowledge representation. Living in Heidelberg surrounded by philosophers, Swanson has discovered how much of knowledge graph work connects upstream to these philosophical roots. This philosophical grounding becomes especially important during times when institutional structures are collapsing, as we need to create new epistemological frameworks for civilization—knowledge management and ontology become critical tools for restructuring how we understand and organize information.

3. The Semantic Web Vision Aimed to Transform the Internet Into a Distributed Database. Twenty-five years ago, Tim Berners-Lee, Jim Hendler, and Ora Lassila published a landmark article in Scientific American proposing the semantic web. While Berners-Lee had already connected documents across the web through HTML and HTTP, the semantic web aimed to connect all the data—essentially turning the internet into a giant database.
This vision led to the development of RDF (Resource Description Framework), which emerged from DARPA research and provides the technical foundation for building knowledge graphs and ontologies. The origin story involved solving simple but important problems, like disambiguating whether "Cook" referred to a verb, noun, or a person's name at an academic conference.

4. Symbolic AI and Neural Networks Represent Complementary Approaches Like Fast and Slow Thinking. Drawing on Kahneman's "thinking fast and slow" framework, LLMs represent the "fast brain"—learning monsters that can process enormous amounts of information and recognize patterns through natural language interfaces. Symbolic AI and knowledge graphs represent the "slow brain"—capturing actual knowledge and facts that can counter hallucinations and provide deterministic, explainable reasoning. This complementarity is driving the re-emergence of neuro-symbolic AI, which combines both approaches. The fundamental distinction is that symbolic AI systems are deterministic and can be fully explained, while LLMs are probabilistic and stochastic, making them unsuitable for applications requiring absolute reliability, such as industrial robotics or pharmaceutical research.

5. Knowledge Architecture Remains Underappreciated Despite Powering Major Enterprises. While machine learning engineers currently receive most of the attention and budget, knowledge graphs actually power systems at Netflix, Amazon (the product graph), LinkedIn (the economic graph), Meta, and most major enterprises. The technology has been described as "the most astoundingly successful failure in the history of technology"—the semantic web vision seemed to fail, yet more than half of web pages now contain RDF-formatted semantic markup through schema.org, and every major enterprise uses knowledge graph technology in the background.
Knowledge architects remain underappreciated partly because the work is cognitively difficult, requires talking to people (which engineers often avoid), and most advanced practitioners have PhDs in computer science, logic, or philosophy.

6. RDF's Simple Subject-Predicate-Object Structure Enables Meaning and Data Linking. Unlike relational databases that store data in tables with rows and columns, RDF uses the simplest linguistic structure: subject-predicate-object (like "Larry knows Stuart"). Each element has a unique URI identifier, which permits precise meaning and enables linked data across systems. This graph structure makes it much easier to connect data after the fact compared to navigating tabular structures in relational databases. On top of RDF sits an entire stack of technologies including schema languages, query languages, ontological languages, and constraints languages—everything needed to turn data into actionable knowledge. The goal is inferring or articulating knowledge from RDF-structured data.

7. The Future Requires Decoupled Modular Architectures Combining Multiple AI Approaches. The vision for the future involves separation of concerns through microservices-like architectures where different systems handle what they do best. LLMs excel at discovering possibilities and generating lists, while knowledge graphs excel at articulating human-vetted, deterministic versions of that information that systems can reliably use. Every one of Swanson's 300 podcast interviews over ten years ultimately concludes that regardless of technology, success comes down to human beings, their behavior, and the cultural changes needed to implement systems. The assumption that we can simply eliminate people from processes misses that huma...
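The subject-predicate-object idea behind RDF can be shown with a toy triple store in plain Python (a sketch only: a real system would use URIs for every element and a library such as rdflib; the names here are illustrative):

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
triples = {
    ("Larry", "knows", "Stewart"),
    ("Larry", "hosts", "Knowledge Graph Insights"),
    ("Stewart", "hosts", "Crazy Wisdom"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {
        (s2, p2, o2)
        for (s2, p2, o2) in triples
        if s in (None, s2) and p in (None, p2) and o in (None, o2)
    }

# "Who hosts something?" - a graph query needs no table schema up front,
# and new predicates can be added later without a migration.
hosts = {s for (s, p, o) in match(p="hosts")}
print(sorted(hosts))  # ['Larry', 'Stewart']
```

This is the "connect data after the fact" property: joining on shared subjects or objects is just pattern matching, not navigating fixed tables.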
Ghousia Sultana is a data analyst with a strong foundation in data analytics, engineering, and business intelligence. She began her career as an HR Process Analyst, later transitioned into IT, and now works as a Data Analyst, leveraging tools like Python, SQL, Power BI, Azure, and Databricks to build scalable data pipelines and drive insights. She holds a Master's in Business Analytics and brings a deep interest in the intersection of AI and data. Currently, she is conducting research and writing on how data infrastructure, analytics, and machine learning come together to enable real-world AI solutions. Her work reflects a blend of hands-on technical expertise and a forward-looking perspective on the future of intelligent systems.
Focus on AI and its impact, Samsung screens, and the Neom project. Discussions of Action Mesh, Omnitransfer, and OpenAI's Prism. A few words on Intel Panther Lake and Samsung's screens.
Support me on Patreon. Find me on YouTube. Join the discussion on Discord.
AI models of the week: ActionMesh, Omnitransfer, and self-refining videos. Engram, SQL inside your LLM. ConceptMoe and visual reasoning. Linear masters: AIs have very malleable values. Anthropic principle: who is fooling whom? Anthropic munching market share… OpenAI launches Prism! We're going to see it in every color… Twentieth Century boy: Ami arrives in Paris. Duck config: DuckDuckGo with or without AI? No Braga, no chocolate. Microsoft all-in on AI? Maia, no problem! Samsung, the best display company? Panther vs. gorgon: fight! A Starlink to the past: who's paying the bill again? Leaks are always a problem. Neom, but brainless: MBS follows the trend. Beyond Meat… truly way beyond meat.
Participants: An episode prepared by Guillaume Poggiaspalla, presented by Guillaume Vendé
Maintaining software over time rarely fails because of one bad decision. It fails because teams stop getting clear signals… and start guessing.
In this episode, Robby talks with Lucas Roesler, Managing Partner and CTO at Contiamo. Lucas joins from Berlin to unpack what maintainability looks like in practice when you are dealing with real constraints… limited context, missing documentation, and systems that resist understanding.
A big through-line is feedback. Lucas argues that long-lived systems become easier to change when they provide fast, trustworthy signals about what they are doing. That can look like tests that validate assumptions, tooling that makes runtime behavior visible, and a habit of designing for observability instead of treating it as a bolt-on.
The conversation also gets concrete. Lucas shares a modernization effort built on a decade-old tangle of database logic… views, triggers, stored procedures, and materializations… created by a single engineer who was no longer around.
With little documentation to lean on, the team had to build their own approach to "reading" the system and mapping dependencies before they could safely change anything.
If you maintain software that has outlived its original authors, this is a grounded look at what helps teams move from uncertainty to confidence… without heroics, and without rewriting for sport.
Episode Highlights
[00:00:46] What well-maintained software has in common: Robby asks Lucas what traits show up in systems that hold together over time.
[00:03:25] Readability at runtime: Lucas connects maintainability to observability and understanding what a system actually did.
[00:16:08] Writing the system down as code: Infrastructure, CI/CD, and processes as code to reduce guesswork and improve reproducibility.
[00:17:42] How client engagements work in practice: How Lucas' team collaborates with internal engineering teams and hands work off.
[00:25:21] The "rat's nest" modernization story: Untangling a legacy data system with years of database logic and missing context.
[00:29:40] Making data work testable: Why testability matters even when the "code" is SQL and pipelines.
[00:34:59] Pivot back to feedback loops: Robby steers into why logs, metrics, and tracing shape better decision-making.
[00:35:20] Why teams avoid metrics and tracing: The organizational friction of adding "one more component."
[00:42:59] Local observability with Grafana: Using visual feedback to spot waterfalls, sequential work, and hidden coupling.
[00:50:00] Non-technical book recommendations: What Lucas reads and recommends outside of software.
Links & References
Guest and Company
Lucas Roesler: https://lucasroesler.com/
Contiamo: https://contiamo.com/
Social
Mastodon: https://floss.social/@theaxer
Bluesky: https://bsky.app/profile/theaxer.bsky.social
Books Mentioned
The Wheel of Time (Robert Jordan): https://en.wikipedia.org/wiki/The_Wheel_of_Time
Accelerando (Charles Stross): https://en.wikipedia.org/wiki/Accelerando
Charles Stross:
https://en.wikipedia.org/wiki/Charles_Stross
Thanks to Our Sponsor!
Turn hours of debugging into just minutes! AppSignal is a performance monitoring and error-tracking tool designed for Ruby, Elixir, Python, Node.js, JavaScript, and other frameworks. It offers six powerful features with one simple interface, providing developers with real-time insights into the performance and health of web applications. Keep your coding cool and error-free, one line at a time! Use the code maintainable to get a 10% discount for your first year. Check them out!
Subscribe to Maintainable on:
Apple Podcasts
Spotify
Or search "Maintainable" wherever you stream your podcasts.
Keep up to date with the Maintainable Podcast by joining the newsletter.
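The "making data work testable" thread from this episode can be sketched with the standard library (the table, fixture rows, and query are illustrative assumptions, not from the conversation): treat a SQL transformation like code and pin its behavior with assertions against a tiny fixture database.

```python
import sqlite3

# Fixture: a minimal, known dataset the test fully controls.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, status TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'paid', 10.0), (2, 'paid', 15.5), (3, 'refunded', 15.5);
""")

# The "pipeline step" under test: revenue should exclude refunds.
REVENUE_SQL = "SELECT SUM(amount) FROM orders WHERE status = 'paid'"

(revenue,) = conn.execute(REVENUE_SQL).fetchone()

# Fast, trustworthy signal: fails loudly if the logic regresses.
assert revenue == 25.5, f"unexpected revenue: {revenue}"
print("revenue check passed:", revenue)
```

The same pattern scales to dbt-style tests or CI jobs: a fixture, the transformation, and an assertion that turns a silent data bug into a loud failure.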
Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away! I spent a lot of time learning SQL the hard way. Knowing a few key ideas sooner would have changed everything.
This is episode 316 recorded on December 19th, 2025, where John & Jason talk about their predictions about 2024 & how they aged into 2025 and make their predictions for 2026 for Power BI & Microsoft Fabric. For show notes please visit www.bifocal.show
In this episode of the Women in Data Podcast, hosts Cecilia Oliveira and Karen Jean-Francois pull back the curtain on the "dirty little secret" of the data world: exactly how they are using AI to change the way they work. Moving past the headlines and the hype, Cecilia and Karen share a vulnerable look at their initial skepticism and how they shifted toward an "Augmentation Mindset." They dive into the practicalities of using AI as a junior collaborator—from cleaning messy data and writing SQL to the "meta" moment of using AI to help structure this very podcast episode. Whether you're feeling "productivity guilt" or pure curiosity, this episode is a guide to making AI work for you, so you can focus on the work only you can do.
Dive into the evolving modern data stack through the lens of observability, security, and log analytics as host Eric Kavanagh interviews Eric Tschetter of Imply and Mark Madsen of Third Nature about why vertically integrated observability platforms are giving way to more decoupled, composable architectures. Unpack how logs differ from traditional BI data, why schema-on-read changed the game, and how multiple query languages (SPL, KQL, SQL, and more) shape real-world workflows. Learn more about the cost and complexity trade-offs of data lakes, cloud storage, and retrieval at scale in this episode of DM Radio.
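The schema-on-read idea discussed here can be shown in a few lines of stdlib Python (log format and field names are illustrative, not from the episode): raw log lines are stored as-is, and structure is imposed only at query time, so a late-added field needs no upfront schema migration.

```python
import json

# Raw, untyped storage: lines land as they arrived.
raw_logs = [
    '{"ts": "2026-01-01T00:00:01Z", "level": "ERROR", "svc": "api"}',
    '{"ts": "2026-01-01T00:00:02Z", "level": "INFO", "svc": "api"}',
    '{"ts": "2026-01-01T00:00:03Z", "level": "ERROR", "svc": "db", "latency_ms": 250}',
]

def query(lines, **where):
    """Parse each line on read and keep those matching the filter."""
    out = []
    for line in lines:
        event = json.loads(line)  # schema applied here, at read time
        if all(event.get(k) == v for k, v in where.items()):
            out.append(event)
    return out

errors = query(raw_logs, level="ERROR")
print(len(errors))  # 2, including the record with the late-added field
```

The trade-off the guests discuss follows directly: parsing at read time buys flexibility for messy, evolving log data but pays a retrieval cost that warehouse-style schema-on-write avoids.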
If you're still guessing how marketing is really performing, this episode will flip the switch. Spencer breaks down why Sales Qualified Leads (SQLs) are the most important metric you can track and the best leading indicator of future revenue. You'll learn how to clearly define an SQL, why it matters more than raw lead volume, and how tight communication between sales and marketing turns SQLs into better forecasting, smarter ad spend, and higher close rates. If you want fewer “junk leads” and more projects that actually fit your business, this is a must-listen.
In the rapidly evolving landscape of artificial intelligence (AI), the importance of data quality cannot be overstated. As organizations increasingly rely on AI to drive decision-making and optimize processes, the integrity of the data fed into these systems becomes a critical factor in determining the success or failure of AI initiatives. This CES conversation explores the necessity of data quality for AI, drawing insights from Kunju Kashalikar, VP of Product at Pentaho.
Data Quality is Essential for AI
One of the fundamental premises of AI is that it requires clean, accurate, and well-structured data to function effectively. As Kunju Kashalikar aptly pointed out, computers are not adept at managing chaos; they thrive on order and precision. When organizations attempt to implement AI solutions without ensuring the quality of their data, they often encounter significant challenges. Scott highlighted a humorous yet poignant example of data quality issues: a former business partner worked at a utility company where technicians logged power outages with the simple explanation of "squirrel." The problem arose not from the content of the log but from the myriad of misspellings that emerged. This anecdote underscores a crucial point: even seemingly trivial inconsistencies in data can lead to inefficiencies, misinterpretations, and ultimately flawed AI outcomes.
The need for data quality is further illustrated by the concept of data integrity. In order for AI systems to generate reliable insights, they must be trained on data that is not only accurate but also uniform. As Kashalikar noted, AI systems can struggle to recognize variations in data that should be classified as equivalent. For example, if an AI model encounters the terms "SQL" and "squirrel" without context or correction, it may fail to understand their intended meaning, leading to erroneous conclusions.
This highlights the necessity of establishing robust data preparation processes that can clean and standardize data before it is ingested by AI systems.
The Varying Causes of Data Issues
Moreover, Kashalikar emphasized that data quality issues can manifest in various forms, from simple typographical errors to more complex structural inconsistencies. He shared a cautionary tale about a rental car bill that inaccurately reported a mileage of 40,000 miles - an impossibility for a standard vehicle. Such discrepancies illustrate the potential pitfalls of relying on poor-quality data to inform AI models. If AI systems are trained on flawed data, the outputs will inevitably be flawed as well, leading to misguided recommendations and potentially harmful decisions.
To mitigate these challenges, organizations must invest in data quality solutions that facilitate the discovery, classification, and cleansing of data. As Kashalikar stated, Pentaho makes data easier to understand and consume, which is essential for fostering AI adoption. By implementing filters and validation checks that ensure only accurate data is accepted into systems, organizations can significantly reduce the manual effort required to sanitize data. This proactive approach not only streamlines processes but also enhances the overall reliability of AI outputs.
Conclusion
In conclusion, data quality is an indispensable component of successful AI implementation. The insights shared by Kunju Kashalikar underscore the critical role that clean, accurate, and well-structured data plays in enabling AI systems to deliver meaningful insights, and the ways in which Pentaho can help in that journey. As organizations continue to harness the power of AI, prioritizing data quality will be essential in navigating the complexities of an increasingly data-driven world.
By establishing robust data management practices, businesses can lay a solid foundation for their AI initiatives, ultimately leading to more effective decision-making and improved outcomes.Interview by Scott Ertz of F5 Live: Refreshing Technology.Sponsored by: Get $5 to protect your credit card information online with Privacy. Amazon Prime gives you more than just free shipping. Get free music, TV shows, movies, videogames and more. Secure your connection and unlock a faster, safer internet by signing up for PureVPN today.
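The standardization step described in this conversation can be illustrated with a small sketch. This is a hedged, hypothetical example (the canonical labels, cutoff, and function name are invented for illustration, not Pentaho functionality): it collapses misspelled outage-cause labels like the "squirrel" variants into one canonical value, while refusing to conflate genuinely different terms such as "SQL".

```python
import difflib

# Hypothetical canonical outage causes; anything close enough is normalized.
CANONICAL_CAUSES = ["squirrel", "storm", "equipment failure", "vehicle accident"]

def normalize_cause(raw: str, cutoff: float = 0.8) -> str:
    """Map a free-text outage cause to a canonical label, or flag it for review."""
    cleaned = raw.strip().lower()
    matches = difflib.get_close_matches(cleaned, CANONICAL_CAUSES, n=1, cutoff=cutoff)
    return matches[0] if matches else "NEEDS_REVIEW"

# Misspellings like the technicians' logs collapse to one value,
# while "sql" is correctly kept apart from "squirrel".
for entry in ["Squirel", "sqirrel", "SQUIRREL ", "storm", "sql"]:
    print(entry, "->", normalize_cause(entry))
```

A real pipeline would of course use richer matching and human review queues; the point is that a validation step at ingestion time is cheap compared to sanitizing data after the fact.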
In this episode of the Crazy Wisdom podcast, host Stewart Alsop welcomes Roni Burd, a data and AI executive with extensive experience at Amazon and Microsoft, for a deep dive into the evolving landscape of data management and artificial intelligence in enterprise environments. Their conversation explores the longstanding challenges organizations face with knowledge management and data architecture, from the traditional bronze-silver-gold data processing pipeline to the way AI agents are revolutionizing how people interact with organizational data without needing SQL or Python expertise. Burd shares insights on the economics of AI implementation at scale, the debate between one-size-fits-all models versus specialized fine-tuned solutions, and the technical constraints that prevent companies like Apple from upgrading services like Siri to modern LLM capabilities, while discussing the future of inference optimization and the hundreds-of-millions-of-dollars cost barrier that makes architectural experimentation in AI uniquely expensive compared to other industries.

Timestamps
00:00 Introduction to Data and AI Challenges
03:08 The Evolution of Data Management
05:54 Understanding Data Quality and Metadata
08:57 The Role of AI in Data Cleaning
11:50 Knowledge Management in Large Organizations
14:55 The Future of AI and LLMs
17:59 Economics of AI Implementation
29:14 The Importance of LLMs for Major Tech Companies
32:00 Open Source: Opportunities and Challenges
35:19 The Future of AI Inference and Hardware
43:24 Optimizing Inference: The Next Frontier
49:23 The Commercial Viability of AI Models

Key Insights
1. Data Architecture Evolution: The industry has evolved through bronze-silver-gold data layers, where bronze is raw data, silver is cleaned/processed data, and gold is business-ready datasets. However, this creates bottlenecks as stakeholders lose access to original data during the cleaning process, making metadata and data cataloging increasingly critical for organizations.
2. AI Democratizing Data Access: LLMs are breaking down technical barriers by allowing business users to query data in plain English without needing SQL, Python, or dashboarding skills. This represents a fundamental shift from requiring intermediaries to direct stakeholder access, though the full implications remain speculative.
3. Economics Drive AI Architecture Decisions: Token costs and latency requirements are major factors determining AI implementation. Companies like Meta likely need their own models because paying per-token for billions of social media interactions would be economically unfeasible, driving the need for self-hosted solutions.
4. One Model Won't Rule Them All: Despite initial hopes for universal models, the reality points toward specialized models for different use cases. This is driven by economics (smaller models for simple tasks), performance requirements (millisecond response times), and industry-specific needs (medical, military terminology).
5. Inference is the Commercial Battleground: The majority of commercial AI value lies in inference rather than training. Current GPUs, while specialized for graphics and matrix operations, may still be too general for optimal inference performance, creating opportunities for even more specialized hardware.
6. Open Source vs Open Weights Distinction: True open source in AI means access to architecture for debugging and modification, while "open weights" enables fine-tuning and customization. This distinction is crucial for enterprise adoption, as open weights provide the flexibility companies need without starting from scratch.
7. Architecture Innovation Faces Expensive Testing Loops: Unlike database optimization where query plans can be easily modified, testing new AI architectures requires expensive retraining cycles costing hundreds of millions of dollars. This creates a potential innovation bottleneck, similar to aerospace industries where testing new designs is prohibitively expensive.
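The bronze-silver-gold layering described in the first insight can be sketched end to end in miniature. All table names, columns, and cleaning rules below are hypothetical, chosen only to show raw "bronze" rows becoming typed "silver" rows and an aggregated, business-ready "gold" dataset:

```python
import sqlite3

# Hypothetical medallion pipeline: bronze (raw) -> silver (cleaned) -> gold (business-ready).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bronze_orders (order_id TEXT, amount TEXT, country TEXT);
INSERT INTO bronze_orders VALUES
  ('1', ' 10.50', 'us'), ('2', 'N/A', 'US'), ('3', '7.25', 'nl');

-- Silver: typed, cleaned rows; unparseable amounts are dropped here,
-- which is exactly where stakeholders lose sight of the original data.
CREATE TABLE silver_orders AS
SELECT CAST(order_id AS INTEGER) AS order_id,
       CAST(TRIM(amount) AS REAL) AS amount,
       UPPER(country) AS country
FROM bronze_orders
WHERE TRIM(amount) GLOB '[0-9]*';

-- Gold: aggregated dataset ready for dashboards or an AI agent to query.
CREATE TABLE gold_revenue_by_country AS
SELECT country, ROUND(SUM(amount), 2) AS revenue, COUNT(*) AS orders
FROM silver_orders GROUP BY country;
""")
for row in con.execute("SELECT * FROM gold_revenue_by_country ORDER BY country"):
    print(row)
```

Note how the 'N/A' bronze row silently disappears by the gold layer - the bottleneck the insight describes, and the reason metadata and cataloging matter so much.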
Nik and Michael are joined by Lev Kokotov for an update on all things PgDog. Here are some links to things they mentioned:
Lev Kokotov https://postgres.fm/people/lev-kokotov
PgDog https://github.com/pgdogdev/pgdog
Our first PgDog episode (March 2025) https://postgres.fm/episodes/pgdog
Sharding pgvector (blog post by Lev) https://pgdog.dev/blog/sharding-pgvector
Prepared statements and partitioned table lock explosion (series by Nik) https://postgres.ai/blog/20251028-postgres-marathon-2-009
~~~
What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!
~~~
Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai
With credit to:
Jessie Draws for the elephant artwork
In this episode, Chris, Andrew, and David dive into details about refactoring with SQL, updates on new Ruby versions, and share their views on various developer tools including Mise, Overmind, and Foreman. They also touch on standardizing tools within their teams, the benefits of using Mise for Postgres, and the efficiency of task scripts. The conversation also covers encoding issues, Basecamp Fizzy SSRF protection, and rich-text editors like Lexxy and its application in Basecamp. Additionally, there's a light-hearted discussion on the speculative future of AI and Neuralink. Hit download now to hear more!

Links
Judoscale - Remote Ruby listener gift
Ruby Releases
Foreman - GitHub
Overmind - GitHub
Mise versions
Usage Specification
A Ruby YAML parser (blog post by Kevin Newton)
Lexxy - GitHub
Basecamp Fizzy SSRF protection - GitHub
Neuralink
Andrew Mason - The Matrix

Honeybadger
Honeybadger is an application health monitoring tool built by developers for developers.
Judoscale
Make your deployments bulletproof with autoscaling that just works.
Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.
Chris Oliver X/Twitter
Andrew Mason X/Twitter
Jason Charnes X/Twitter
In this episode of RevOps Champions, host Brendon Dennewill sits down with Peter Fuller, Founder of Workflow Academy and a leading expert in revenue operations, CRM systems, and workflow automation. Peter shares his unconventional path from studying Russian literature to building a RevOps consultancy and training ecosystem, and why the "human" side of RevOps will only become more important as AI adoption accelerates.

Peter breaks down the three pillars he teaches (ask better questions in plain English, "measure twice, cut once" with clear scoping, and only then build), and explains why most AI initiatives fail: not because the tools don't work, but because leaders chase hype instead of focused, high-ROI use cases. He offers a practical approach for 2026: empower your internal tinkerer, carve out time, and prove ROI on one micro-solution before turning AI into a company-wide strategy. The conversation is a grounded, refreshingly contrarian take on where AI actually helps RevOps teams today, especially in reporting, dashboards, SQL, and automation, without sacrificing relationships, trust, and real human context.

This episode is essential listening for RevOps leaders, operators, and executives who want to cut through AI noise, prioritize what matters, and deploy automation in ways that genuinely improve performance without distracting the business.

What You'll Learn
Where AI is creating real leverage in RevOps today, and where it quietly falls short
Why the most critical parts of RevOps still depend on human judgment and trust
A simple framework for approaching RevOps work without jumping straight to tools
How to experiment with AI in a way that minimizes risk and maximizes learning
How to separate real opportunity from AI hype and vendor-driven urgency
What leaders should prioritize in 2026 to explore AI without derailing core operations

Resources Mentioned
Cerebro Analytics
Lovable
Moon Knox
ChatGPT
Claude
Workflow Academy
Aspireship
Zoho

Is your business ready to scale? Take the Growth Readiness Score to find out. In 5 minutes, you'll see:
Benchmark data showing how you stack up to other organizations
A clear view of your operational maturity
Whether your business is ready to scale (and what to do next if it's not)

Let's Connect
Subscribe to the RevOps Champions Newsletter
LinkedIn
YouTube
Explore the show at revopschampions.com. Ready to unite your teams with RevOps strategies that eliminate costly silos and drive growth? Let's talk!
Fredrik and Kristoffer talk about Gleam. Kristoffer talks about using Gleam to write frontend, backend, or both, and how that differs from other solutions with the same setup. Kristoffer also talks about Gleam's abstractions - or lack thereof - its thin layers, and its admirably strict stance. Toward the end they also discuss concrete versus abstract architecture - what is it the code is actually supposed to solve? In terms of code? If you have truly understood what you are supposed to do, will the solutions in code perhaps always end up very concrete and simple? The episode is sponsored by Yazen, the company that helps people overcome obesity and is looking for fullstack developers for fully remote positions. Work from wherever you want, with colleagues all over the world! See all open positions here! A big thank you to Cloudnet for sponsoring our VPS! Do you have comments, questions, or tips? We are @kodsnack, @thieta, @krig, and @bjoreman on Mastodon, have a page on Facebook, and can be emailed at info@kodsnack.se if you want to write at greater length. We read everything that is sent to us. If you like Kodsnack, we would love for you to review us in iTunes! You can also support the podcast by buying us a coffee (or two!) on Ko-fi, or buying something in our shop.

Links
A Middle-earth traveller - sketches from Bag End to Moria
The Hobbit sketchbook, by Alan Lee
The Art of the Lord of the Rings - Tolkien's own pictures
The book with Miyazaki's Pippi sketches
Chris Piascik - the illustrator with the YouTube channel
Procreate
Procreate Dreams
The League of Extraordinary Gentlemen
Alan Moore
Dave Gibbons
TAM - The Amazing Meeting
The Amazing Randi
Uri Geller
Creationism
Richard Dawkins
Yazen - this week's sponsor
Yazen's awards - for being a fast-growing startup and for its treatment
Yazen's job ad
All of Yazen's open positions
Giacomo Cavalieri
The video Fredrik watched with Giacomo
Gleam
Kristoffer will speak at a meetup on January 28 in Stockholm
Lustre
Liveview
Next.js
Elm
JSX
Squirrel - type-safe SQL in Gleam
Sqlx
Simplifile - file-reading library for Gleam
Support us on Ko-fi
Hacker News
Lobste.rs
Types in Gleam
Case in Gleam
Gleam gathering 2026 - the Gleam conference in Bristol
Louis on Software Unscripted with Richard Feldman
You can't design software you don't work on - the text about "abstract and concrete architecture", though it discussed the topic as software design
David Allen
Getting Things Done
Goatmire
Coolify
Kottke.org
Comic Sans
Chocolat - the editor that switched to Comic Sans if you didn't pay
Sublime
Cot
Visual Studio Code
Obsidian
Nova
It was called Obsidian Publish, not Obsidian Pages

Titles
Not polishing
I can't erase
You have no resistance
Drawing wizards and hobbits
Miyazaki Pippi
Animating hobbits and wizards
Drawing pipe smoke
Three hundred hands on paper
Alan Moore was there
Personality and a blazer
The opposite of magic
The React kind of magic
The right balance for you
Raisin soup
The five important raisins
Enough SQL to be dangerous
Delegating to the Erlang code
Standard library libraries
Many libraries that are standard
A simpler language
Fewer building blocks
A case that is Kalle
The language disappears
Bool is just a type
Where real architecture takes place
Nik and Michael are joined by Radim Marek from boringSQL to talk about RegreSQL, a regression testing tool for SQL queries they forked and improved recently. Here are some links to things they mentioned:
Radim Marek https://postgres.fm/people/radim-marek
boringSQL https://boringsql.com
RegreSQL: Regression Testing for PostgreSQL Queries (blog post by Radim) https://boringsql.com/posts/regresql-testing-queries
Discussion on Hacker News https://news.ycombinator.com/item?id=45924619
Radim's fork of RegreSQL on GitHub https://github.com/boringSQL/regresql
Original RegreSQL on GitHub (by Dimitri Fontaine) https://github.com/dimitri/regresql
The Art of PostgreSQL (book) https://theartofpostgresql.com
How to make the non-production Postgres planner behave like in production (how-to post by Nik) https://postgres.ai/docs/postgres-howtos/performance-optimization/query-tuning/how-to-imitate-production-planner
Just because you're getting an index scan, doesn't mean you can't do better! (blog post by Michael) https://www.pgmustard.com/blog/index-scan-doesnt-mean-its-fast
boringSQL Labs https://labs.boringsql.com
~~~
What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!
~~~
Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai
With credit to:
Jessie Draws for the elephant artwork
professorjrod@gmail.com

In this episode of Technology Tap: CompTIA Study Guide, we explore how proactive detection surpasses reactive troubleshooting in cybersecurity. For those preparing for their CompTIA exam, understanding the subtle clues and quiet anomalies attackers leave behind is essential for developing strong IT skills and excelling in tech exam prep. We dive deep into the critical indicators that help you detect security compromises early, providing practical knowledge essential for your technology education and IT certification journey. Join us as we equip you with expert insights to sharpen your detection abilities and enhance your competence in protecting systems effectively.

We walk through the behaviors that matter: viruses that hitch a ride on clicks, worms that paint the network with unexplained traffic, and fileless attacks that live in memory and borrow admin tools like PowerShell and scheduled tasks. You'll learn how to spot spyware by the aftermath of credential misuse, recognize RATs and backdoors by their steady beaconing to unknown IPs, and use contradictions—like tools disagreeing about running processes—as a signal for rootkits. We also draw a sharp line between ransomware's loud chaos and cryptojacking's quiet drain on your CPU and fan.

Zooming out, we map network and application signals: certificate warnings and duplicate MACs that hint at man-in-the-middle, DNS mismatches that suggest cache poisoning, and log patterns that betray SQL injection, replay abuse, or directory traversal. Along the way, we talk about building Security+ instincts through scaffolding—A+ for OS and hardware intuition, Network+ for protocol fluency, and Security+ for attacker behavior—so indicators make sense the moment you see them.

If you want a sharper eye for subtle threats and a stronger shot at your Security+ exam, this guide will train your attention on the tells adversaries can't fully hide.
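As a toy illustration of the log patterns mentioned in this episode, here is a minimal detector. The signatures and log lines are invented for illustration and are nowhere near a production ruleset - real detection would use an IDS with maintained signatures:

```python
import re

# Hypothetical signatures for classic web-attack tells in request logs.
SIGNATURES = {
    "sql_injection": re.compile(r"('|%27)\s*(or|union|--)|union\s+select", re.I),
    "directory_traversal": re.compile(r"(\.\./|%2e%2e%2f)", re.I),
}

def classify(request_line: str):
    """Return the names of all signatures the request line matches."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(request_line)]

logs = [
    "GET /products?id=42 HTTP/1.1",
    "GET /products?id=1' OR '1'='1 HTTP/1.1",
    "GET /download?file=../../etc/passwd HTTP/1.1",
]
for line in logs:
    print(line, "->", classify(line) or "clean")
```

The value of even a crude scan like this is the habit it builds: knowing what a tampered query string looks like before you see it in a live log.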
Subscribe, share with a teammate who handles triage, and leave a review with your favorite indicator to watch—we'll feature the best ones in a future show.

Support the show
Art by Sarah/Desmond
Music by Joakim Karud
Little Chacha Productions
Juan Rodriguez can be reached at:
TikTok @ProfessorJrod
ProfessorJRod@gmail.com
@Prof_JRod
Instagram ProfessorJRod
In less than 12 months, Shahar went from an idea to a $30M Series A and a team of 40. He didn't sell another AI tool—he built an AI-first service that replaced expensive human consultants in the massive pen-testing market.

In this episode, Shahar breaks down the "Service-as-Software" playbook that allowed him to hit $1M ARR in just three months. He reveals how to convert design partners into paying customers before the product is finished, why he refuses to sell to service providers, and how to achieve a 40% SQL-to-Close rate in the enterprise.

Why You Should Listen
How to hit $1M ARR in a single quarter with zero marketing spend.
Why asking "Would you use this?" is useless and the one question that actually validates demand.
Why "Service-as-Software" is the single best business model for AI startups.
How to maintain a 100% win rate against competitors in live bake-offs.
The ultimate litmus test for knowing if you have true Product-Market Fit.

Keywords
startup podcast, startup podcast for founders, product market fit, finding pmf, agentic AI, cybersecurity startup, B2B sales strategy, service as software, rapid scaling, Felicis

00:00:00 Intro
00:04:06 Why Manual Pen Testing is Broken
00:15:42 Ideation and The Wallet Test
00:22:38 How to Convert Design Partners to Paid
00:28:05 40 Percent SQL to Close Rate
00:33:14 The Service as Software Business Model
00:46:06 Hitting 1M ARR in One Quarter
00:48:50 Raising a 30M Series A from Felicis
00:50:01 The Turn It Off PMF Test

Send me a message to let me know what you think!
What happens when engineering teams can finally see the business impact of every technical decision they make? In this episode of Tech Talks Daily, I sat down with Chris Cooney, Director of Advocacy at Coralogix, to unpack why observability is no longer just an engineering concern, but a strategic lever for the entire business. Chris joined me fresh from AWS re:Invent, where he had been challenging a long-standing assumption that technical signals like CPU usage, error rates, and logs belong only in engineering silos. Instead, he argues that these signals, when enriched and interpreted correctly, can tell a much more powerful story about revenue loss, customer experience, and competitive advantage. We explored Coralogix's Observability Maturity Model, a four-stage framework that takes organizations from basic telemetry collection through to business-level decision making. Chris shared how many teams stall at measuring engineering health, without ever connecting that data to customer impact or financial outcomes. The conversation became especially tangible when he explained how a single failed checkout log can be enriched with product and pricing data to reveal a bug costing thousands of dollars per day. That shift, from "fix this tech debt" to "fix this issue draining revenue," fundamentally changes how priorities are set across teams. Chris also introduced Oli, Coralogix's AI observability agent, and explained why it is designed as an agent rather than a simple assistant. We talked about how Oli can autonomously investigate issues across logs, metrics, traces, alerts, and dashboards, allowing anyone in the organization to ask questions in plain English and receive actionable insights. From diagnosing a complex SQL injection attempt to surfacing downstream customer impact, Oli represents a move toward democratizing observability data far beyond engineering teams. Throughout our discussion, a clear theme emerged. 
When technical health is directly tied to business health, observability stops being seen as a cost center and starts becoming a competitive advantage. By giving autonomous engineering teams visibility into real-world impact, organizations can make faster, better decisions, foster innovation, and avoid the blind spots that have cost even well-known brands millions. So if observability still feels like a necessary expense rather than a growth driver in your organization, what would change if every technical signal could be translated into clear business impact, and who would make better decisions if they could finally see that connection?

Useful Links
Connect with Chris Cooney
Learn more about Coralogix
Follow on LinkedIn
Thanks to our sponsors, Alcor, for supporting the show.
SHOW: 992
SHOW TRANSCRIPT: The Cloudcast #992 Transcript
SHOW VIDEO: https://youtube.com/@TheCloudcastNET
NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST - "CLOUDCAST BASICS"

SHOW NOTES:
Tonic.ai website
Tonic Validate Product Page
Tonic Validate GitHub

Topic 1 - Adam, welcome to the show. Give everyone a brief introduction.
Topic 2 - Our topic today is RAG systems, specifically RAG in production. Let's start with customization sources and types. When it comes to customizing off-the-shelf LLMs, RAG is one option, as is an MCP connection to a SQL database, and there is pre- and post-training, as well as fine-tuning. How does an organization decide what path is best for customization?
Topic 3 - RAG came on the scene as the savior for organizations that want to use customer AI without the need for fine-tuning and additional training. It has either gone through or is currently still in the trough of disillusionment. What are your thoughts on RAG's evolution and the challenges it faces?
Topic 4 - Let's walk through the basics of validation. Once you set up RAG, how would an organization know it works? How is accuracy measured and validated? Are you looking for hallucinations? Context quality?
Topic 5 - What is Tonic Validate, and where does it fit into this stack? Is it in band? Out of band? Built into the CI workflow?
Topic 6 - Accuracy is one aspect, but we hear more and more about ROI for Enterprises. How should ROI, risk, and compliance be measured?
Topic 7 - Where and how does security fit into all of this? Also, your thoughts on synthetic data for training vs. real data?
Topic 8 - If anyone is interested, what's the best way to get started?

FEEDBACK?
Email: show at the cloudcast dot net
Bluesky: @cloudcastpod.bsky.social
Twitter/X: @cloudcastpod
Instagram: @cloudcastpod
TikTok: @cloudcastpod
Baris Gultekin, VP of AI at Snowflake, explains how "bringing AI to the data" is reshaping enterprise AI deployment under strict security and governance requirements.

PSA for AI builders: Interested in alignment, governance, or AI safety? Learn more about the MATS Summer 2026 Fellowship and submit your name to be notified when applications open: https://matsprogram.org/s26-tcr.

He shares the importance of bringing AI directly to governed enterprise data, advances in text-to-SQL and semantic modeling, and why high-quality retrieval is foundational for trustworthy AI agents. Baris also dives into Snowflake's approach to agentic AI, including Snowflake Intelligence, model choice and cost tradeoffs, and why governance, security, and open standards are essential as AI becomes accessible to every business user.

LINKS: AWS' Automated Reasoning checks

Sponsors:
MongoDB: Tired of database limitations and architectures that break when you scale? MongoDB is the database built for developers, by developers—ACID compliant, enterprise-ready, and fluent in AI—so you can start building faster at https://mongodb.com/build
Serval: Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week four at https://serval.com/cognitive
MATS: MATS is a fully funded 12-week research program pairing rising talent with top mentors in AI alignment, interpretability, security, and governance. Apply for the next cohort at https://matsprogram.org/s26-tcr
Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

CHAPTERS:
(00:00) About the Episode
(03:02) Snowflake 101 and AI
(09:25) Text-to-SQL and semantics
(19:10) RAG, embeddings and models (Part 1)
(19:17) Sponsors: MongoDB | Serval
(21:02) RAG, embeddings and models (Part 2)
(32:23) Bringing models to data (Part 1)
(32:29) Sponsors: MATS | Tasklet
(35:29) Bringing models to data (Part 2)
(51:14) Designing enterprise AI agents
(58:35) Trust, governance and guardrails
(01:07:14) Agents and future work
(01:15:33) Platforms, competition and value
(01:26:04) Enterprise models and outlook
(01:40:00) Outro

PRODUCED BY: https://aipodcast.ing
The FBI warns of Kimsuky quishing. Singapore warns of a critical vulnerability in Advantech IoT management platforms. Russia's Fancy Bear targets energy research, defense collaboration, and government communications. Malaysia and Indonesia suspend access to X. Researchers warn a large-scale fraud operation is using AI-generated personas to trap mobile users in a social engineering scam. BreachForums gets breached. The NSA names a new Deputy Director. Monday Biz Brief. Our guest is Sasha Ingber, host of the International Spy Museum's SpyCast podcast. The commuter who hacked his scooter.

Remember to leave us a 5-star rating and review in your favorite podcast app. Miss an episode? Sign up for our daily intelligence roundup, Daily Briefing, and you'll never miss a beat. And be sure to follow CyberWire Daily on LinkedIn.

CyberWire Guest
Today we are joined by Sasha Ingber, host of the International Spy Museum's SpyCast podcast, on the return of SpyCast to the N2K CyberWire network.

Selected Reading
North Korea–linked APT Kimsuky behind quishing attacks, FBI warns (Security Affairs)
Advantech patches maximum-severity SQL injection flaw in IoT products (Beyond Machines)
Russia's APT28 Targeting Energy Research, Defense Collaboration Entities (SecurityWeek)
Malaysia and Indonesia block X over deepfake smut (The Register)
New OPCOPRO Scam Uses AI and Fake WhatsApp Groups to Defraud Victim (Hackread)
BreachForums hacking forum database leaked, exposing 324,000 accounts (Bleeping Computer)
Former NSA insider Kosiba brought back as spy agency's No. 2 (The Record)
Vega raises $120 million in a Series B round led by Accel.
Reverse engineering my cloud-connected e-scooter and finding the master key to unlock all scooters (Rasmus Moorats)

Share your feedback
What do you think about CyberWire Daily? Please take a few minutes to share your thoughts with us by completing our brief listener survey. Thank you for helping us continue to improve our show.
Want to hear your company in the show? N2K CyberWire helps you reach the industry's most influential leaders and operators, while building visibility, authority, and connectivity across the cybersecurity community. Learn more at sponsor.thecyberwire.com. The CyberWire is a production of N2K Networks, your source for strategic workforce intelligence. © N2K Networks, Inc. Learn more about your ad choices. Visit megaphone.fm/adchoices
Happy New Year! You may have noticed that in 2025 we had moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!

We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They were then one of the few companies from Nat Friedman and Daniel Gross's AI Grant to raise a full seed round from them, and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.

We have chatted with both Clémentine Fourrier of HuggingFace's OpenLLM Leaderboard and (the freshly valued at $1.7B) Anastasios Angelopoulos of LMArena on their approaches to LLM evals and trendspotting, but Artificial Analysis have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.

George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs?
And how open is “open” really?We discuss:* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs* Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding “I don't know”), and Claude models lead with the lowest hallucination rates despite not always being the smartest* GDP Val AA: their version of OpenAI's GDPval (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long
context, multi-turn agents)* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (GPT-5.1 Codex has tighter token distributions)* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)Links to Artificial Analysis* Website: https://artificialanalysis.ai* George Cameron on X: https://x.com/georgecameron* Micah Hill-Smith on X: https://x.com/micahhsmithFull Episode on YouTubeTimestamps* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins* 01:19 Business Model: Independence and Revenue Streams* 04:33 Origin Story: From Legal AI to Benchmarking Need* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology* 13:52 Mystery Shopper Policy and Maintaining Independence* 16:22 AI Grant and Moving to San Francisco* 19:21 Intelligence Index Evolution: From V1 to V3* 23:01 GDP Val AA: Agentic Benchmark for Real Work Tasks* 28:01 New Benchmarks: Omniscience Index for Hallucination Detection* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning* 50:19 Stirrup Agent Harness: Open Source Agentic Framework* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses* 58:25 The Smiling Curve: Cost Falling While Spend Rising* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions* 1:16:50 Closing: The Insatiable Demand for 
Intelligence. TRANSCRIPT: Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me. Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks; how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...George [00:01:09]: Yeah, but you can't pay us for better results.swyx [00:01:12]: Yes, exactly.George [00:01:13]: Very important.Micah [00:01:14]: Start off with a spicy take.swyx [00:01:18]: Okay, how do I pay you?Micah [00:01:20]: Let's get right into that.swyx [00:01:21]: How do you make money?Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026, which is pretty soon now. We run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, and technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. 
So we want to be... We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start, because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But it turns out a bunch of our stuff can be pretty useful to companies building AI stuff.swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?George [00:02:53]: So we have a benchmarking and insights subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself. That's an example of the kind of decision big enterprises face, and it's hard to reason through; this AI stuff is really new to everybody. And so we try to help companies navigate that with our reports and insights subscription. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize; there's no commercial model around the public benchmarks. For private benchmarking, we'll at times create benchmarks or run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking. Yeah. 
So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.Micah [00:04:19]: George was in SF; he's Australian, but he'd moved here already. Yeah.swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll get to the private benchmarks. Yeah.George [00:04:33]: Why don't we even go back a little bit to, like, why we thought that it was needed? Yeah.Micah [00:04:40]: The story kind of begins in 2022, 2023. Both George and I had been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. It actually worked pretty well for its era, I would say. Yeah. I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. I had this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like, you're trying to think about accuracy, a bunch of other metrics, and performance and cost. And mostly just no one was doing anything to independently evaluate all the models, and certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things, measured independently across all the models and providers. 
Honestly, it was probably meant to be a side project when we first started doing it.swyx [00:05:49]: Like, you didn't get together and say, hey, we're going to stop working on all this stuff, this is going to be our main thing. When I first called you, I think you hadn't decided on starting a company yet.Micah [00:05:58]: That's actually true. I don't even think we'd paused anything; George had a day job, and I didn't quit working on my legal AI thing. It was genuinely a side project.George [00:06:05]: We built it because we needed it as people building in the space and thought, oh, other people might find it useful too. So we bought a domain, linked it to the Vercel deployment that we had, and tweeted about it. But very quickly it started getting attention. Thank you, swyx, for doing an initial retweet and spotlighting this project that we released. Very quickly, though, it became more useful as the number of models released accelerated. We had Mixtral 8x7B, and it was key. That's a fun one. Yeah. An open source model that really changed the landscape and opened up people's eyes to other serverless inference providers, to thinking about speed, thinking about cost. And so it became more useful quite quickly. Yeah.swyx [00:07:02]: What I love about talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data, and that's obviously more relevant. But I want to stay on the origin story a little bit more. When you started out, I would say the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has seen some of this. 
I think there's some version of an Excel sheet or a Google sheet where you just copy and paste the numbers from every paper and post it up there. And then sometimes they don't line up, because they're independently run. And so your numbers are going to look better than... your reproductions of other people's numbers are going to look worse, because you don't hold their models correctly, or whatever the excuse is. I think Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. The way that if I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website... Yeah. Like, one of the reasons why we realized that we had to run the evals ourselves and couldn't just take numbers from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get... You can put the answer into the model. Yeah. That, in the extreme. And you get crazy cases, like back when Google did Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4, and constructed, I think never published, chain-of-thought examples, 32 of them for every topic in MMLU, to run it to get the score. There are so many things that you... They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. I mean, I'm sure it existed, but yeah. 
So we were pretty sure that we needed to run them ourselves, and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this, and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.Micah [00:09:36]: So, like, I mean, we were paying for it personally at the start. That's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of, like, hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was kind of fine. Yeah. These days that's gone up an enormous amount, for a bunch of reasons that we can talk about. But yeah, it wasn't that bad, because you also remember that the number of models we were dealing with was hardly any, and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like, we were just asking some Q&A type questions. And one specific thing was, for a lot of evals initially, we were just sampling an answer. You know, like, what's the answer for this? Just going to the answer directly without letting the models think; we weren't even doing chain-of-thought stuff initially. And that was the most useful way to get some results initially. Yeah.swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right? 
Like, because sometimes the models can answer any way they see fit, and sometimes they actually do have the right answer, but they just returned it in the wrong format, and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, there's an open question whether you should give it points for not following your instructions on the format.Micah [00:11:00]: It depends what you're looking at, right? Because if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple-choice question, sometimes there's a bias towards the first answer, so you have to randomize the options. All these nuances; once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.Micah [00:11:47]: You've also got, like... you've got the different degrees of variance in different benchmarks, right? Yeah. So, if you run a four-option multiple-choice eval on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see is pretty enormous if you only do a single run of it, especially if it has a small number of questions. So, like, one of the things that we do is run an enormous number of all of our evals when we're developing new ones and doing upgrades to our intelligence index to bring in new things. 
Yeah. So that we can dial in the right number of repeats, so that we can get to the 95% confidence intervals that we're comfortable with, so that when we pull that together, we can be confident in the intelligence index to at least as tight as plus or minus one at 95% confidence. Yeah.swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there, because of the repeats.swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking: you don't have any special deals with the labs. They don't discount it. You just pay out of pocket, or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is... Ah, 100%.Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, in everything we do, on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true, like the one you bring up right here: the fact that if we're working with a lab and they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy. 
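The repeats-until-tight-confidence idea described above can be sketched with a normal approximation; the function names and numbers are illustrative, not Artificial Analysis's actual methodology.

```python
import math
import statistics

def ci95(run_scores: list[float]) -> tuple[float, float]:
    """95% confidence interval for a mean score estimated from
    independent repeated runs (normal approximation, 1.96 * SE)."""
    mean = statistics.mean(run_scores)
    se = statistics.stdev(run_scores) / math.sqrt(len(run_scores))
    return mean - 1.96 * se, mean + 1.96 * se

def repeats_needed(run_stdev: float, half_width: float) -> int:
    """Smallest repeat count n with 1.96 * stdev / sqrt(n) <= half_width,
    e.g. half_width=1.0 for a plus-or-minus-1-point index at 95%."""
    return math.ceil((1.96 * run_stdev / half_width) ** 2)

# A run-to-run stdev of 2 points needs 16 repeats for +/- 1 at 95%.
```

This also makes the cost multiple swyx points out concrete: halving the interval width quadruples the number of repeats.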
And so, and we're totally transparent with all the labs we work with about this, we will register accounts not on our own domain and run both intelligence evals and performance benchmarks... Yeah, that's the job. ...without them being able to identify it. And no one's ever had a problem with that. Because a thing that turns out to actually be quite a good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.swyx [00:14:23]: That's true. I never thought about that. I've been in the database industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.Micah [00:14:36]: I mean, okay, the biggest one that I'll bring up is more of a conceptual one, actually, than direct shenanigans. It's that the things that get measured become the things that get targeted by the labs in what they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans; like, I'm not talking about training on the test set. But if you know that you're going to be graded on a particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building. But will not necessarily work. Will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one. 
So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without that being a reflection of the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that, other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant, which you guys decided to join and move here. What was it like? I think you were in, like, batch two? Batch four. Batch four. Okay.Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned with the mission of what we were trying to do. Like, we're not quite typical of a lot of the other AI startups that they've invested in.swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they say any advice that really affected you in some way, or were any of the events very impactful? That's an interesting question.Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.swyx [00:17:09]: Which is also, like, a crazy list. Yeah.George [00:17:11]: Oh, totally. Yeah, yeah, yeah. 
There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup, and just working through the questions that don't have clear answers: how to work through those methodically, and just work through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch, and other companies in AI Grant, are pushing the capabilities of what AI can do at this time. Yeah. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those, you know, building on AI.swyx [00:17:59]: I think to some extent, I'm of mixed opinion on that one, because to some extent, your target audience is not people in AI Grant, who are obviously at the frontier. Yeah. Do you disagree?Micah [00:18:09]: To some extent. To some extent. But then, a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypal power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently between different models for different parts of your application, to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're not commercial customers of ours; we don't charge for all our data on the website. Yeah. 
They are absolutely some of our power users.swyx [00:19:07]: So let's talk about just the evals as well. So you start out from the general MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1, and how did you evolve it? Okay.Micah [00:19:22]: So first, just background: we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pull together, currently from 10 different eval datasets, to give what we're pretty confident is the best single number to look at for how smart the models are. Obviously, it doesn't tell the whole story. That's why we published the whole website of all the charts, to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type datasets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic datasets. It's got our own long context reasoning dataset and some other use case focused stuff. As time goes on, the things that we're most interested in, that are going to be important to the capabilities that are becoming more important for AI, what developers are caring about, are going to be first around agentic capabilities. So surprise, surprise: we're all loving our coding agents, and how the model is going to perform like that, and then do similar things for different types of work, are really important to us. The linking to use cases, to economically valuable use cases, is extremely important to us. And then we've got some of these... yeah. 
These things that the models still struggle with, like working really well over long contexts, are not going to go away as specific capabilities and use cases that we need to keep evaluating.swyx [00:20:46]: But I guess one thing I was driving at was the V1 versus the V2, and how that's changed over time.Micah [00:20:53]: Like, how we've changed the index to where we are.swyx [00:20:55]: And I think that reflects the change in the industry. Right. So that's a nice way to tell that story.Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think, how much progress has been made in the last two years. We obviously play the game constantly of today's version versus last week's version and the week before, and all of the small changes in the horse race between the current frontier, and who has the best smaller-than-10B model right now this week. And that's very important to a lot of developers and people, especially in this particular city of San Francisco. But when you zoom out a couple of years, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence; we can talk about that more in a bit. So across V1, V2, V3, we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about, as opposed to just the Q&A type stuff that MMLU and GPQA represented. Yeah.swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and looking around and asking questions about it. Yeah.Micah [00:22:21]: Let's do it. Okay. 
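The synthesis Micah describes, many eval datasets rolled into one number, can be sketched as a weighted average of normalized scores. The eval names and equal weights below are illustrative assumptions; the real index's composition and weighting are Artificial Analysis's own.

```python
def synthesis_index(scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Roll per-eval scores (each already normalized to 0-100)
    into one weighted number."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight

# Hypothetical component scores, equal weights for illustration:
scores = {"mmlu_pro": 80.0, "gpqa_diamond": 60.0, "long_context": 40.0}
weights = {name: 1.0 for name in scores}
# synthesis_index(scores, weights) -> 60.0
```

The versioning story then amounts to swapping saturated datasets out of `scores` and re-tuning `weights`, which is why V1 and V3 numbers aren't directly comparable.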
This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.George [00:22:26]: And I think a little bit about the direction that we want to take it, and where we want to push benchmarks. Currently, the intelligence index and evals focus a lot on kind of raw intelligence. But we want to diversify how we think about intelligence, and we can talk about it. New evals that we've built and partnered on focus on topics like hallucination, and we've got a lot of topics that I think are not covered by the current eval set that should be. And so we want to bring that forth. But before we get into that...swyx [00:23:01]: And so for listeners, just as a timestamp: right now, number one is Gemini 3 Pro High, then followed by Claude Opus at 70, GPT-5.1 high (you don't have 5.2 yet), and Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.Micah [00:23:30]: This is such a favorite, right? Yeah. In almost every talk that George or I give at conferences and stuff, we always put this one up first to situate where we are in this moment in history. This, I think, is the visual version of what I was saying before about zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year. 
And, I mean, you would remember that time period well, of there being very open questions about whether or not AI was going to be competitive, like, full stop; whether or not OpenAI would just run away with it; whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There are so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there, which is a less traditional name. Yeah.George [00:25:01]: It's models that we're kind of highlighting by default in our charts, in our intelligence index. Okay.swyx [00:25:07]: You just have a manually curated list of stuff.George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose what models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks off. It was Boxing Day in New Zealand when DeepSeek v3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024, and had run evals on the earlier ones and stuff. 
I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek V3. So this was the first of their V3 architecture, the 671B MoE.Micah [00:26:19]: And we were very, very impressed. That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of V3, and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with just an extremely strong base model, completely open weights, that we had as the best open weights model. So, yeah, that's the thing that you really see in the chart. That moment really landed for us on Boxing Day last year.George [00:26:48]: Boxing Day is the day after Christmas, for those not familiar.George [00:26:54]: I'm from Singapore.swyx [00:26:55]: A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.George [00:27:20]: There's been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric; we now call it output speed, since throughput makes sense at a system level, so we took that name.swyx [00:27:32]: Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into.
Maybe so we can hit past all the, like, we have lots and lots of evals and stuff. The interesting ones to talk about today that would be great to bring up are a few of our recent things, I think, that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. This one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models, and to test hallucination by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying, I don't know, versus giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say, I don't know, instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models, and the labs creating them, to get higher scores. Almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say, I don't know. So we did that for this one here.swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.George [00:29:31]: On that. And one reason that we didn't do that, or put that into this index, is that we think that the way to do that is not to ask the models how confident they are.swyx [00:29:43]: I don't know. Maybe it might be, though. You put it like a JSON field, say, confidence, and maybe it spits out something. Yeah.
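The scoring scheme described here, plus one for a correct answer, minus one for an incorrect answer, zero for abstaining, scaled to a negative 100 to positive 100 index, can be sketched as below. This is an illustrative reconstruction from the conversation, not Artificial Analysis's actual implementation; the function names and outcome labels are made up.

```python
def omniscience_score(results):
    """Score per-question outcomes on a -100..+100 scale.

    Each outcome is one of:
      "correct"   -> +1 (model answered and was right)
      "incorrect" -> -1 (model answered and was wrong: penalized)
      "abstain"   ->  0 (model said "I don't know": no penalty)
    """
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    raw = sum(points[r] for r in results)
    return 100.0 * raw / len(results)

def hallucination_rate(results):
    """Of the questions the model did not get right, how often did it
    give a confident wrong answer instead of abstaining?"""
    not_correct = [r for r in results if r != "correct"]
    if not not_correct:
        return 0.0
    return sum(1 for r in not_correct if r == "incorrect") / len(not_correct)

# Under this metric, a model that always guesses scores worse than one
# that abstains when unsure, even at identical accuracy.
guesser = ["correct"] * 50 + ["incorrect"] * 50
abstainer = ["correct"] * 50 + ["abstain"] * 50
```

The point of the asymmetry is exactly what Micah describes: percentage-correct grading rewards taking a shot at everything, while this metric makes "I don't know" strictly better than a wrong answer.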
You know, we have done a few evals podcasts over the years. And when we did one with Clémentine of Hugging Face, who maintains the open-source leaderboard, this was one of her top requests: some kind of hallucination slash lack-of-confidence calibration thing. And so, hey, this is one of them.Micah [00:30:05]: And I mean, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. Like, one of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not-really-measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. And yeah. Would that be the other way around in a normal capability environment? I don't know.
What's, what do you make of that?George [00:31:37]: One interesting aspect is that we've found that there's not really a strong correlation between intelligence and hallucination, right? That is to say, how smart the models are in a general sense isn't correlated with their ability, when they don't know something, to say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over here compared to Gemini 2.5 Flash and 2.5 Pro. And if I add Pro quickly here.swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant the GPT Pros.George [00:32:12]: Oh yeah.swyx [00:32:13]: Cause the GPT Pros are rumored, we don't know for a fact, to be like eight runs and then with the LLM judge on top. Yeah.George [00:32:20]: So we saw a big jump in, this is accuracy. So this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. And so, big jump in accuracy. But relatively no change between the Google Gemini models between releases. In the hallucination rate. Exactly. And so it's likely just a different post-training recipe with the Claude models. Yeah.Micah [00:32:45]: That's driven this. Yeah. You can partially blame us, and how we define intelligence, having until now not defined hallucination as a negative in the way that we think about intelligence.swyx [00:32:56]: And so that's what we're changing. Uh, I know many smart people who are confidently incorrect.George [00:33:02]: Look at that. That is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context, where it's around knowledge, but in many cases, people want the models to hallucinate, to have a go. Often that's the case in coding, or when you're trying to generate new ideas.
One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?George [00:33:37]: It's not dissimilar to FrontierMath. So these are kind of research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is only 9%.swyx [00:33:51]: And the people that created this, like Minway and, and actually Ofir, who was kind of behind SWE-bench, what organization is this? Oh, is this, it's Princeton.George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the temperature up as high as they can when they're trying to explore kind of new ideas in physics with the model as a thought partner, just because they want the models to hallucinate. Um, yeah, sometimes it's something new. Yeah, exactly.swyx [00:34:21]: So not right in every situation, but I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is, this is one of many: every lab has a system card that shows some kind of hallucination number, and you've chosen not to endorse that, and you've made your own. And I think that's a choice. Um, totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun. You provide it as a service here. You have to fight the, well, who are we to do this?
And your, your answer is that we have a lot of customers and, you know, but like, I guess, how do you convince the individual?Micah [00:35:08]: I mean, I think for hallucinations specifically, there are a bunch of different things that you might care about reasonably, and that you'd measure quite differently. Like, we've called this the Omniscience Hallucination Rate, not trying to declare it, like, humanity's last hallucination. You could have some interesting naming conventions and all this stuff. Um, the biggest-picture answer to that is something that I actually wanted to mention just as George was explaining Critical Point as well: as we go forward, we are building evals internally, and we're partnering with academia and partnering with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with everything we do having to be done entirely within our own team. Critical Point is a cool example of where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies. Those ones, obviously, we have to be careful with on some of the independent stuff, but with the right disclosure, like, we're completely comfortable with that. A lot of the labs have released great data sets in the past that we've used to great success independently. And so between all of those techniques, we're going to be releasing more stuff in the future. Cool.swyx [00:36:26]: Let's cover the last couple. And then we'll, I want to talk about your trends analysis stuff, you know? Totally.Micah [00:36:31]: Actually, I have one little factoid on Omniscience.
If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks the total parameter count of models more closely than anything else that we measure. That makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed from any official source, just rumors. But rumors do fly around. Rumors. I hear all sorts of numbers. I don't know what to trust.Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, and we've got all the open weights models, you can squint and see that the leading frontier models right now are likely quite a lot bigger than the one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, and six trillion for Grok 5, but that's not out yet. Take those together, have a look, and you might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? As long as they can serve it at a sustainable cost, that's about it.
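The squint-and-extrapolate exercise Micah describes, fitting accuracy against log parameter count for open weights models and then inverting the fit for a frontier score, can be sketched like this. The data points below are invented purely for illustration; they are not real benchmark numbers or real model sizes.

```python
import math

# Hypothetical (model, total params in billions, accuracy %) points.
open_weights = [
    ("small-moe", 109, 18.0),
    ("mid-moe", 400, 26.0),
    ("large-moe", 671, 30.0),
    ("1t-moe", 1000, 33.0),
]

# Least-squares fit of accuracy = a * log10(params) + b.
xs = [math.log10(p) for _, p, _ in open_weights]
ys = [acc for _, _, acc in open_weights]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def implied_params(accuracy):
    """Invert the fit: what total parameter count would this accuracy imply?"""
    return 10 ** ((accuracy - b) / a)
```

A frontier model scoring well above the open weights ceiling would be implied, under this very rough log-linear fit, to be several trillion parameters; that is the shape of the reasoning, not a measurement.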
Like, yeah, totally.George [00:38:17]: They've also got different incentives in play compared to, like, open weights models, who are thinking about supporting others in self-deployment. For the labs who are doing inference at scale, I think it's less about total parameters in many cases when thinking about inference costs, and more around the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously, if you're a developer or company using these things, exactly as you say, it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at the cost to run the index, and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.swyx [00:38:56]: It's not as good for the content creator rumor mill, where I can say, oh, GPT-4 is this small circle, look, GPT-5 is this big circle. There used to be a thing for a while. Yeah.Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? That is, purely that chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total size of the models, especially with the upcoming hardware generations. Yes.swyx [00:39:29]: So, you know. Taking off my shitposting hat for a minute. Yes. Yes. At the same time, I do feel like, you know, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out, and therefore we need to start exploring at least a different path. GDPVal, I think, is only like a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool.
And you have your own version.George [00:39:59]: It's a fantastic data set. Yeah.swyx [00:40:01]: And maybe we'll recap for people who are still out of it. It's like 44 tasks based on some kind of GDP cutoff, meant to represent broad white-collar work that is not just coding. Yeah.Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and some input files for a lot of them. Within the 44, it's divided into like 220, 225 maybe subtasks, which are the level that we run through the agent. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at, largely because in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for that task. And so I think the easiest way to think about them is that they're like quite hard take-home exam tasks that you might do in an interview process.swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with like a spreadsheet or a PowerPoint deck or a PDF; go nuts and answer this question.George [00:41:06]: OpenAI released a great data set, and they released a good paper which looks at performance across the different web chatbots on the data set. It's a great paper; I encourage people to read it. What we've done is taken that data set and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the data set, and then we developed an evaluator approach to compare outputs. That's kind of AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences.
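The evaluator approach George describes, an LLM grader comparing two candidate outputs against the task's criteria rather than zero-shot asking "which is better?", reduces to something like the sketch below. The prompt wording, function names, and the `judge` callable are all illustrative assumptions, not the actual GDPVal AA pipeline.

```python
def build_grading_prompt(task_criteria, output_a, output_b):
    """Build a pairwise-comparison prompt for a grader model.

    The grader sees the task's criteria plus two candidate outputs
    (already extracted into text form) and must pick which one more
    effectively meets the criteria.
    """
    return (
        "You are grading two candidate deliverables for the same task.\n"
        f"Task criteria:\n{task_criteria}\n\n"
        f"--- Candidate A ---\n{output_a}\n\n"
        f"--- Candidate B ---\n{output_b}\n\n"
        "Answer with exactly 'A' or 'B': which candidate more "
        "effectively meets the criteria?"
    )

def grade_pair(judge, task_criteria, output_a, output_b):
    """Run one pairwise comparison; `judge` is any callable: prompt -> str."""
    verdict = judge(build_grading_prompt(task_criteria, output_a, output_b)).strip()
    if verdict not in ("A", "B"):
        raise ValueError(f"unparseable verdict: {verdict!r}")
    return verdict
```

Note the design point from the conversation: the grading task (read two finished documents against criteria) is deliberately very different from the test-taking task (many agentic turns with tools), which is part of why self-preference is less of a problem here.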
One data point there is that even with Gemini 3 Pro as the evaluator, Gemini 3 Pro interestingly doesn't actually do that well. So that's kind of a good example of what we've done in GDPVal AA.swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM judges is self-preference, that models usually prefer their own output, and in this case, it was not. Totally.Micah [00:42:08]: I think the way that we're thinking about the places where it makes sense to use an LLM-as-judge approach now is quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that, and MT-Bench was a great project that was a good example of some of this a while ago, was about judging conversations and, like, a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search, the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading it, we're running it through a pipeline to extract visual and text versions of the files to be able to provide that to Gemini, and we're providing the criteria for the task and getting it to pick which of the two potential outputs more effectively meets the criteria of the task. It turns out that it's just very, very good at getting that right; it matched human preference a lot of the time. I think that's because it's got the raw intelligence, but it's combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and we're comparing against criteria, not just kind of zero-shot asking the model to pick which one is better.swyx [00:43:26]: Got it. Why is this an ELO?
And not a percentage, like GDPVal?George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.swyx [00:43:43]: What task is that?George [00:43:45]: I mean, it's in the data set. Like, be a YouTuber? It's a marketing video.Micah [00:43:49]: Oh, wow. What? Like, the model has to go find clips on the internet and try to put them together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code interpreter. I mean, the computer use stuff doesn't work quite well enough, and so on, but yeah.George [00:44:02]: And so there's no kind of ground truth, necessarily, to compare against, to work out percentage correct. It's hard to come up with correct or incorrect there. And so it's on a relative basis, and we use an ELO approach to compare outputs from each of the models across the tasks.swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give them an ELO, so you have a human in there. I think what's helpful about GDPVal, the OpenAI one, is that 50% is meant to be a normal human, and maybe a domain expert is higher than that, but 50% was the bar: if you've crossed 50, you are superhuman. Yeah.Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as an ELO is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky comparing these exact tasks to human performance, because the way that you would go about them as a human is quite different to how the models would go about them. Yeah.swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there.
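Aggregating pairwise judgments into a leaderboard score, as described above, is typically done with standard Elo-style updates; a minimal sketch follows. The K-factor and starting ratings are conventional defaults, and the model names are placeholders; this is not Artificial Analysis's disclosed aggregation method.

```python
def elo_update(ratings, winner, loser, k=32.0):
    """Apply one pairwise judgment to a dict of Elo ratings, in place."""
    ra, rb = ratings[winner], ratings[loser]
    # Expected probability that `winner` beats `loser` given current ratings.
    expected_win = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    delta = k * (1.0 - expected_win)
    ratings[winner] = ra + delta
    ratings[loser] = rb - delta

ratings = {"model-x": 1000.0, "model-y": 1000.0}
# model-x wins three pairwise judgments and loses one.
for w, l in [("model-x", "model-y")] * 3 + [("model-y", "model-x")]:
    elo_update(ratings, w, l)
```

One property worth noting: because each update is zero-sum, the scale stays relative, which is exactly why it works when there is no absolute "percentage correct" to anchor to.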
Is that like just one last, like...Micah [00:45:20]: Well, no, no, no, it is the best model released by Meta. And... So it makes it into the homepage default set, still, for now.George [00:45:31]: Another inclusion that's quite interesting is that we also ran it across the latest versions of the web chatbots. And so we have...swyx [00:45:39]: Oh, that's right.George [00:45:40]: Oh, sorry.swyx [00:45:41]: I, yeah, I completely missed that. Okay.George [00:45:43]: No, not at all. So that's the one with the checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude 4.5 Opus using the Claude web chatbot, it performs worse than the model in our agentic harness. And so in every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.swyx [00:46:13]: My backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it for something else.Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, you, like, have a cost goal. We let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah. Yeah. That was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as hundreds of models.swyx [00:46:38]: Well, I don't know, talk to Browserbase. They'll automate it for you. You know, I have thought about, like, well, we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.Micah [00:46:53]: And that's grown a huge amount over the last year, right? Like the tools.
The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the amount of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.swyx [00:47:10]: What tools and what data connections come to mind when you say that? What's interesting, what's notable work that people have done?Micah [00:47:15]: Oh, okay. So my favorite example of this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work. So for me, like Google Drive, OneDrive, and our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on it. The thing that I find most impressive currently, that I am somewhat surprised works really well in late 2025, is that I can have models use the Supabase MCP to query, read-only of course, and run a whole bunch of SQL queries to do pretty significant data analysis, and make charts and stuff, and it can read my Gmail and my Notion. And okay, you actually use that. That's good. Is that a Claude thing? To varying degrees, but ChatGPT and Claude right now. I would say that this stuff, like, barely works, in fairness, right now.George [00:48:33]: Because people are actually going to try this after they hear it.
If you get an email from Micah, odds are it wasn't written by a chatbot.Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.swyx [00:48:46]: And so you can feel it, right? And yeah, this time next year, we'll come back and see where it's going. Totally. Um, Supabase, shout out, another famous Kiwi. I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users. And we probably do some things more manually than we should in Supabase, in the support line, because they're a little bit being super friendly. One extra point regarding GDPVal AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, the reference harness that we built actually works quite well on generalist agentic tasks. This proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. We give it, for GDPVal, a tool to view an image specifically, because the models, you know, can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context, we had to give them a custom tool. But yeah, exactly.George [00:50:21]: So it turned out that we created a good generalist agentic harness. And so we released that on GitHub yesterday. It's called Stirrup.
So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code, and the coding agents can work with it super well.swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. I think in other similar environments, the Terminal-Bench guys have done sort of the same with Harbor. And so it's a bundle of: we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. So I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host Terminal-Bench benchmarks on Artificial Analysis. We've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough, and they've gotten better tools, such that they can perform better when just given a minimalist set of tools and let run; let the model control the agentic workflow, rather than using another framework that's a bit more built out and tries to dictate the flow. Awesome.swyx [00:51:56]: Let's cover the openness index, and then let's go into the report stuff. So that's the last of the proprietary AA numbers, I guess. I don't know how you sort of classify all these. Yeah.Micah [00:52:07]: Or let's call it the last of, like, the three new things that we're talking about from the last few weeks.
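The minimalist-harness idea discussed above, give the model a small set of tools and let it drive the loop, reduces to roughly the skeleton below. The message format, the `call_model` callable, and the tool names are placeholder assumptions for illustration, not Stirrup's actual API.

```python
def run_agent(call_model, tools, task, max_turns=50):
    """Minimal agent loop: the model drives; the harness only executes tools.

    `call_model(messages)` returns a dict that is either
      {"tool": name, "args": {...}}  -- a tool invocation, or
      {"answer": text}               -- a final answer.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        step = call_model(messages)
        if "answer" in step:
            return step["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = tools[step["tool"]](**step["args"])
        messages.append({"role": "tool", "name": step["tool"], "content": result})
    raise RuntimeError("agent did not finish within max_turns")

# Toy demonstration: a scripted "model" that searches once, then answers.
script = iter([
    {"tool": "web_search", "args": {"query": "boxing day"}},
    {"answer": "Boxing Day is the day after Christmas."},
])
tools = {"web_search": lambda query: f"results for {query}"}
out = run_agent(lambda msgs: next(script), tools, "What is Boxing Day?")
```

The design choice being argued for is visible in the structure: the harness imposes no workflow of its own, so all sequencing decisions stay with the model.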
Cause I mean, we do a mix of stuff: where we're using open source, where we open source what we do, and proprietary stuff that we don't always open source. Like the long context reasoning data set last year, we did open source. And then all of the work on performance benchmarks across the site: some of them we're looking to open source, but some of them we're constantly iterating on, and so on. So there's a huge mix, I would say, of stuff that is open source and not, across the site. So that's LCR, for people. Yeah, yeah.swyx [00:52:41]: But let's talk about openness.Micah [00:52:42]: Let's talk about the openness index. This here is, call it, a new way to think about how open models are. We, for a long time, have tracked whether the models are open weights and what the licenses on them are. And that's pretty useful. That tells you what you're allowed to do with the weights of a model, but there is this whole other dimension to how open models are that is pretty important, and that we haven't tracked until now. And that's how much is disclosed about how it was made. So transparency about data, pre-training data and post-training data, and whether you're allowed to use that data, and transparency about methodology and training code. So basically, those are the components. We bring them together to score an openness index for models, so that you can in one place get this full picture of how open models are.swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this does matter. I don't know what the numbers mean, apart from: is there a max number? Is this out of 20?George [00:53:44]: It's out of 18 currently, and so we've got an openness index page, but essentially these are points. You get points for being more open across these different categories, and the maximum you can achieve is 18.
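An additive rubric like the one George describes, points per disclosure category summing to a maximum of 18, could be sketched as below. The category names and per-category weights here are guesses for illustration only; the real rubric is documented on the Artificial Analysis openness index page.

```python
# Hypothetical rubric: each category awards the listed points if satisfied.
RUBRIC = {
    "weights_released": 3,
    "permissive_license": 3,
    "pretraining_data_disclosed": 3,
    "posttraining_data_disclosed": 3,
    "methodology_disclosed": 3,
    "training_code_released": 3,
}  # maximum total: 18

def openness_index(model_disclosures):
    """Sum rubric points for each category the model satisfies."""
    unknown = set(model_disclosures) - set(RUBRIC)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return sum(RUBRIC[c] for c in model_disclosures)

fully_open = openness_index(list(RUBRIC))
weights_only = openness_index(["weights_released", "permissive_license"])
```

This captures the shape of the metric: releasing weights under a permissive license earns some points, but full marks require disclosing data, methodology, and training code as well.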
So AI2, with their extremely open Olmo 3 32B Think model, is the leader in a sense.swyx [00:54:04]: What about Hugging Face?George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to run the intelligence benchmarks to get it on the site.swyx [00:54:12]: You can't have an openness index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, the RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb. FineWeb.Micah [00:54:23]: Yeah, yeah, no, totally. Yep. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company's contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we do the intelligence index on, on the site. And it's just an extra view to understand.swyx [00:54:43]: Can you scroll down to the trade-offs chart? Yeah, yeah. That one. Yeah. This really matters, right? Obviously, because you can b
This is episode 315 recorded on December 16th, 2025, where John & Jason talk about the Fabric November 2025 Feature Summary part 3 including updates to Data Warehouse, Real-Time Intelligence, and Data Factory. For show notes please visit www.bifocal.show
Nik and Michael discuss the events and trends they thought were most important in the Postgres ecosystem in 2025.

Here are some links to things they mentioned:
- Postgres 18 release notes https://www.postgresql.org/docs/18/release-18.html
- Our episode on Postgres 18 https://postgres.fm/episodes/postgres-18
- LWLock:LockManager benchmarks for Postgres 18 (blog post by Nik) https://postgres.ai/blog/20251009-postgres-marathon-2-005
- PostgreSQL bug tied to zero-day attack on US Treasury https://www.theregister.com/2025/02/14/postgresql_bug_treasury
- PgDog episode https://postgres.fm/episodes/pgdog
- Multigres episode https://postgres.fm/episodes/multigres
- Neki announcement https://planetscale.com/blog/announcing-neki
- Our 100TB episode from 2024 https://postgres.fm/episodes/to-100tb-and-beyond
- PlanetScale for Postgres https://planetscale.com/blog/planetscale-for-postgres
- Oracle's MySQL job cuts https://www.theregister.com/2025/09/11/oracle_slammed_for_mysql_job
- Amazon Aurora DSQL is now generally available https://aws.amazon.com/about-aws/whats-new/2025/05/amazon-aurora-dsql-generally-available
- Announcing Azure HorizonDB https://techcommunity.microsoft.com/blog/adforpostgresql/announcing-azure-horizondb/4469710
- Lessons from Replit and Tiger Data on Storage for Agentic Experimentation https://www.tigerdata.com/blog/lessons-replit-tiger-data-storage-agentic-experimentation
- Instant database clones with PostgreSQL 18 https://boringsql.com/posts/instant-database-clones
- turbopuffer episode https://postgres.fm/episodes/turbopuffer
- Crunchy joins Snowflake https://www.crunchydata.com/blog/crunchy-data-joins-snowflake
- Neon joins Databricks https://neon.com/blog/neon-and-databricks

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
- Michael Christofides, founder of pgMustard
- Nikolay Samokhvalov, founder of Postgres.ai

With credit to Jessie Draws for the elephant artwork.
In this episode I talk with Mike Bowers, Chief Architect at FairCom, about ISAM, the bare-metal database layer that predates SQL and powers stock trading systems. We cover FairCom's pivot into industrial IoT and their JSON/SQL hybrid approach, and discuss AI, consciousness, and the symbol grounding problem.

Links:
- FairCom
- Nonsense Monthly
This is episode 314, recorded on December 15th, 2025, where John & Jason talk about the Fabric November 2025 Feature Summary, part 2, including updates to Data Engineering & Data Science. For show notes please visit www.bifocal.show
Podcasting 2.0, December 5th 2025, Episode 243: "Nuts & Logs". Adam & Dave: poddy training, Junie, major dev talk and more!

ShowNotes:
- We are LIT
- NYC?
- Alby Hub?
- AI stats analysis
- Alt Enclosure Video
- New aggregatory open build
- GitHub - Podcastindex-org/feedparser: The XML parser that converts saved podcast feeds into intermediary files for SQL ingestion.
- TTS Podcasts on OP3
- Cloudflare Outage
- Decentralization
- Transcript Search
- What is Value4Value? Read all about it at Value4Value.info
- V4V Stats

Last Modified 12/05/2025 14:32:55 by Freedom Controller