Podcasts about V4

  • 382 PODCASTS
  • 958 EPISODES
  • 43m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Jun 9, 2026 LATEST

POPULARITY (chart, 2019 to 2026)


Best podcasts about V4


Latest podcast episodes about V4

Hacker Public Radio
HPR4657: UNIX Curio #8 - Comparing Files

Jun 9, 2026


This show has been flagged as Clean by the host.

This series is dedicated to exploring little-known (and occasionally useful) trinkets lurking in the dusty corners of UNIX-like operating systems.

Most users of UNIX-like systems are probably familiar with the diff utility. It is widely used with source code to compare two files and see what the differences are between them. Non-programmers, like me, also use it to examine what has changed in different versions of scripts or configuration files. Quite a few pieces of newer software can compare different versions of data and express changes in a format either identical to or similar to diff output. However, there are two other long-standing tools for this purpose that are far less known and deserve, in my view, to be termed UNIX Curios.

The first of these is cmp [1]. While diff is primarily intended to be used on text files and compares them line by line, cmp compares files byte by byte. In my experience, its main use is to see whether two binary files are in fact identical: if they are, cmp outputs nothing and returns an exit status of 0. Back when methods of transferring files were not as reliable as they are today, this was a tool I would reach for sometimes. For example, you could use it to confirm that the data on a CD-ROM you burned was the same as the original.

If there is a difference between the files, cmp will return an exit status of 1. By default, it will also print the location (byte and line number) of the first differing byte. When used with the -l option, it will print the location and value of every byte that differs. There is one exception to these: if the files are the same except that one is shorter than the other, it will print a message to that effect. The exit status will still be 1 in that case.

Using the -s option with cmp will cause it to be totally silent and output nothing.
Only the exit status will indicate whether the files are the same or different or, if the exit status is greater than 1, that an error occurred. This makes it useful for scripting, for example if you wanted to confirm that a file copied to another location arrived fully intact.

It is worth noting that diff is also capable of comparing binary files; however, it is not required by POSIX to report what is actually different or where differences occur. The same exit status as in cmp is returned: 0 if the files are the same, 1 if they are different, or greater than 1 if an error occurred. While many implementations offer an option to suppress the output, this is not in the standard [2], so the most portable method would be to instead redirect output to /dev/null. On my system the diff utility is three times the size of cmp, so if you don't need its extra capabilities, it is a less efficient way of doing the job.

The other UNIX Curio for today is comm, and this utility [3] is also intended to compare two files to see what is common between them. Ken Fallon briefly talked about it a few years ago in HPR episode 3889. Compared to the others, it has a much more specific use case. The two files are expected to be text files that are already sorted. What comm will do is print a tab-separated list of all the lines appearing in either or both files. Lines only in the first file will appear in the first column, lines only in the second file will be in the second column, and lines in both files will be in the third column.

Any combination of the options -1, -2, and -3 can be used with comm to suppress printing of the first, second, or third column respectively. Using all three options at the same time is supported but it results in no output, so that isn't very useful. Unlike the other utilities, the exit status of comm doesn't tell you anything about the two files. It will be 0 if the program ran successfully, and greater than 0 if it didn't.
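The scripting use of cmp -s described in these notes can be sketched roughly as follows. This is only an illustration under stated assumptions: the file names and contents are invented, compare_files is a hypothetical helper, and mktemp is assumed to be installed (it is common but not part of POSIX). The last step shows the portable way of silencing diff by redirecting its output.

```sh
#!/bin/sh
# Sketch only: file names and contents are invented for illustration.
# mktemp is assumed to be available; it is widespread but not in POSIX.
workdir=$(mktemp -d) || exit 2

printf 'some payload\n' > "$workdir/original"
cp "$workdir/original" "$workdir/copy"
printf 'some paylOad\n' > "$workdir/damaged"

# Hypothetical helper: prints "same", "different", or "error" based purely
# on the exit status of cmp -s (0, 1, or greater than 1).
compare_files() {
    status=0
    cmp -s "$1" "$2" 2>/dev/null || status=$?
    case $status in
        0) echo same ;;
        1) echo different ;;
        *) echo error ;;
    esac
}

intact=$(compare_files "$workdir/original" "$workdir/copy")
broken=$(compare_files "$workdir/original" "$workdir/damaged")
missing=$(compare_files "$workdir/original" "$workdir/no-such-file")

# diff returns the same 0/1 answer; redirecting its output is the
# portable way to keep it quiet.
if diff "$workdir/original" "$workdir/copy" > /dev/null 2>&1; then
    diff_says=same
else
    diff_says=different
fi

echo "copy: $intact"
echo "damaged: $broken"
echo "missing: $missing"
echo "diff: $diff_says"

rm -rf "$workdir"
```

Capturing the status with `|| status=$?` keeps the helper usable even in scripts that run under `set -e`, where a bare non-zero exit from cmp would otherwise stop the script.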
I'm not sure if I have ever actually used comm for anything practical. I find its default output a bit difficult to meaningfully interpret, plus you need to ensure the two files are already sorted. It seems to be best suited to comparing lists, and one use case that Ken Fallon mentioned would be comparing two lists of files to see if any are missing. The command comm -3 listA listB would print files that only appear in listA in the first column and those only in listB in the second column. This would let you ignore all the filenames that appear in both and focus on those that were absent from one or the other. If on the other hand you only wanted to see the filenames that are on both lists, comm -12 listA listB would give you that.

Some more frivolous potential uses also come to mind. If for some reason the cat utility is broken on your system, you could use comm listA /dev/null to print the file listA instead. If you want to insert tab characters before every line of a file but have an aversion to using sed or awk, then comm /dev/null listA would output listA with one tab before each line, and comm listA listA would insert two tabs. A bit silly, but it would work. The GNU implementation of comm even lets you choose something other than a tab to separate the columns [4], so you could go wild with that.

According to the POSIX specifications for cmp and comm, one of the two filenames given as arguments, but not both, can be a "-", in which case standard input will be used for that "file" in the comparison. Also, the results are undefined if both arguments are the same FIFO special, character special, or block special file. Some implementations might not have these limitations, but you shouldn't rely on that everywhere.

All three of these were developed quite early. The cmp utility appeared in 1971's First Edition UNIX [5], while comm and diff seem to have made their debut in Fourth Edition UNIX [6,7] from 1973.
The original versions might not have behaved exactly like their modern counterparts, and newer implementations (especially of the diff utility) have acquired additional options and capabilities, but the basic operation of each has stayed the same.

The next time you need to compare files against each other, consider whether cmp or comm might be appropriate before automatically reaching for diff. They all have their uses in different situations.

References:
[1] Cmp specification: https://pubs.opengroup.org/onlinepubs/009695399/utilities/cmp.html
[2] Diff specification: https://pubs.opengroup.org/onlinepubs/009695399/utilities/diff.html
[3] Comm specification: https://pubs.opengroup.org/onlinepubs/009695399/utilities/comm.html
[4] GNU coreutils manual, comm: https://www.gnu.org/software/coreutils/manual/html_node/comm-invocation.html
[5] First Edition UNIX cmp manual page: http://man.cat-v.org/unix-1st/1/cmp
[6] Fourth Edition UNIX comm manual page: https://www.tuhs.org/cgi-bin/utree.pl?file=V4/usr/man/man1/comm.1
[7] Fourth Edition UNIX diff source: https://www.tuhs.org/cgi-bin/utree.pl?file=V4/usr/source/s1/diff1.c
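As a hedged illustration of the comm invocations mentioned in the Hacker Public Radio notes above, the sketch below builds two small sorted lists and runs the default three-column form, comm -3, comm -12, the "-" operand for standard input, and the tongue-in-cheek cat replacement. The list contents are invented, listA and listB are simply the names the notes use, and mktemp is assumed to be available.

```sh
#!/bin/sh
# Sketch only: the list contents are invented; listA and listB are the
# names used in the show notes. mktemp is assumed to be available.
workdir=$(mktemp -d) || exit 2
listA=$workdir/listA
listB=$workdir/listB

# Use the C locale so that sort and comm agree on the ordering.
LC_ALL=C
export LC_ALL

printf '%s\n' cherry.txt apple.txt banana.txt | sort > "$listA"
printf '%s\n' damson.txt banana.txt cherry.txt | sort > "$listB"

# Default output: three tab-separated columns.
all_columns=$(comm "$listA" "$listB")

# -3 drops the "in both" column, leaving names missing from one list.
one_side_only=$(comm -3 "$listA" "$listB")

# -12 drops both "only in" columns, leaving names common to both lists.
in_both=$(comm -12 "$listA" "$listB")

# One operand (not both) may be "-" to read standard input.
in_both_stdin=$(sort "$listB" | comm -12 "$listA" -)

# The frivolous cat replacement: every line lands in column one.
like_cat=$(comm "$listA" /dev/null)

printf '%s\n' "--- all columns ---" "$all_columns"
printf '%s\n' "--- only in one list (-3) ---" "$one_side_only"
printf '%s\n' "--- in both lists (-12) ---" "$in_both"

rm -rf "$workdir"
```

Sorting both inputs under the same locale that comm runs in matters here: comm expects its inputs in the current collating order, and a mismatch can produce misleading columns.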

ROI Hunters - Podcast de Marketing do Infomoney
He Buried His Own Agency and Now Bills R$ 1.2 Million a Month | ROI Hunters 348

Jun 6, 2026 64:07


이진우의 손에 잡히는 경제
[손경제] 5/26 (Tue) Hyundai Motor cost cutting | Leveraged ETFs | Fallout from Samsung bonuses

May 25, 2026


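[In-depth economic news]
1) Hyundai Motor tells suppliers "let's cut costs by 20%" in response to China's low-price offensive
2) DeepSeek V4 Pro pricing is about one-thirtieth of ChatGPT's
3) Samsung and Hynix leveraged ETFs launch tomorrow: what will the market impact be?
4) Fallout from the bonus negotiations: other divisions file for an injunction, and TSMC is stirring too
- Kim Chi-hyung, economic news curator
- Jeong Ji-seo, Yonhap Infomax reporter
- Son Hee-ae, economic news curator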

The Thrill of Driving Podcast
How TVS Saved Norton — Manx R Review and Deep Dive

May 25, 2026 40:35


Norton is back. And the company that saved it isn't British: it's Indian.

In this episode, evo India Editor Sirish Chandran and Executive Editor Aatish Mishra go deep on one of motorcycling's most unlikely comeback stories: how TVS Motor Company acquired a bankrupt Norton, poured serious money into it, rebuilt the factory, the workforce, and the engineering, and then produced the Manx R Superbike as proof that it worked.

They cover TVS's investment and what Norton actually did with it, how the brand was upgraded from the ground up, which Indian components are on the Manx R, styling and first ride impressions, the V4 engine history and what makes it special, performance impressions from Spain, India launch plans, motorsport ambitions, how former JLR design boss Gerry McGovern is involved, and why Norton's revival matters to India.

This is just the beginning.

evo India is India's leading automotive enthusiast magazine and podcast, covering cars, motorcycles, motorsport, and the culture around them.

Tech Gumbo
Claude's Mythos Too Powerful, DeepSeek, OpenAI Misses Targets, Anthropic Hits $1 Trillion, and Taylor Swift TMs Her Voice

May 7, 2026 22:20


News and Updates: Claude Mythos Finds Thousands of Flaws: Anthropic's unreleased Claude Mythos AI autonomously discovered thousands of zero-day vulnerabilities, including a 27-year-old OpenBSD flaw and critical Linux kernel exploits, without human assistance. DeepSeek V4 Goes Public: Chinese AI firm DeepSeek released V4, an open-source model matching top closed-source competitors at a fraction of the cost, with a 1-million-token context window and dramatic memory efficiency gains. OpenAI Misses Growth Targets: OpenAI reportedly fell short of internal ChatGPT user and revenue goals, rattling investors and sending Nvidia, AMD, Oracle, and CoreWeave shares lower in pre-market trading. Anthropic Surpasses $1 Trillion Valuation: Secondary market demand for Anthropic shares has pushed its valuation to $1 trillion on Forge Global, now trading above OpenAI despite OpenAI's larger official valuation of $852 billion. Taylor Swift Trademarks Her Voice and Likeness: Swift filed three U.S. trademark applications covering specific spoken phrases and a stage performance image, aiming to establish legal protection against unauthorized AI-generated deepfakes. Legal experts note Swift's approach of trademarking her voice is unprecedented in court, but could set a new legal standard for how public figures protect their identities in the AI era.

Magyar Közgazdasági Társaság
The V4 countries: a new era of industrial policy

May 6, 2026 95:38


A new era of industrial policy: what can we learn from the experience of the V4 countries? The Industry and Entrepreneurship Section of the Magyar Közgazdasági Társaság (MKT, the Hungarian Economic Association) held a roundtable discussion under this title on Tuesday, April 14, 2026.

The participants were Szabó Dorottya, researcher at the Vállalkozásélénkítő Egyesület; Csontos Tamás Tibor, junior research fellow at the KRTK Institute of World Economics and the University of Szeged; and Puzder Filip, senior manager for state aid at EY Hungary. The discussion is moderated by Kállay László, associate professor at the Department of Enterprise Development and Management of the Institute of Entrepreneurship and Innovation at Corvinus University of Budapest and a board member of the MKT Industry and Entrepreneurship Section. At the start of the event the participants were welcomed by Kozma Miklós, MBA programme director at Corvinus University of Budapest and chair of the section.

NAHLAS |aktuality.sk
After Orbán's defeat, Robert Fico is isolated. Péter Magyar plans a new Central European cooperation

May 5, 2026 17:23


Péter Magyar's rise to power in Hungary raises the question of a new geopolitical arrangement in Central Europe. His ambition to strengthen ties with Vienna is prompting discussion abroad about a project for a new "Austria-Hungary", built on a strong economy and shared history. For Slovakia, however, this change could mean a weakening of the V4 format and a gradual move to the sidelines. In the podcast we discuss whether Magyar is a pragmatic European leader or rather a threat to our national interests. With Pavol Štrba, head of the foreign desk at Aktuality.sk, we also discussed whether Robert Fico has been left without allies after Orbán's defeat and how he is trying to fix that. Recorded by Marek Biró.

Podcasty Aktuality.sk
After Orbán's defeat, Robert Fico is isolated. Péter Magyar plans a new Central European cooperation

May 5, 2026 17:23


Péter Magyar's rise to power in Hungary raises the question of a new geopolitical arrangement in Central Europe. His ambition to strengthen ties with Vienna is prompting discussion abroad about a project for a new "Austria-Hungary", built on a strong economy and shared history. For Slovakia, however, this change could mean a weakening of the V4 format and a gradual move to the sidelines. In the podcast we discuss whether Magyar is a pragmatic European leader or rather a threat to our national interests. With Pavol Štrba, head of the foreign desk at Aktuality.sk, we also discussed whether Robert Fico has been left without allies after Orbán's defeat and how he is trying to fix that. Recorded by Marek Biró.

ROI Hunters - Podcast de Marketing do Infomoney
Paid and Organic Traffic: Updated Strategies to Sell More in 2026 | ROI Hunters 344

May 5, 2026 89:39


Let's Talk AI
#243 - GPT 5.5, DeepSeek V4, AI safety sabotage

May 3, 2026 112:22


Our 243rd episode with a summary and discussion of last week's big AI news!
Recorded on 04/29/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
  • OpenAI released GPT-5.5 with strong coding-oriented improvements, a system card discussing chain-of-thought monitorability and misalignment testing, higher pricing than GPT-5.4, and notable quirks like a system-prompt warning about “goblins.”
  • xAI launched Grok Voice Think Fast 1.0, claiming large benchmark leads for real-time voice agents and reporting major Starlink customer-support automation and sales conversion impact.
  • DeepSeek open-sourced DeepSeek V4 (Pro and Flash) featuring MoE scaling and 1M-token context via hybrid/compressed attention changes, while Tencent released Hunyuan 3 preview with weaker benchmark performance; a new long-horizon agent benchmark (ClawMark) shows low task success rates.
  • Major business, legal, and policy updates include Google's planned up-to-$40B investment and 5GW compute commitment to Anthropic, Meta's AWS Graviton deal and China blocking Meta's Manus acquisition, a revamped OpenAI–Microsoft agreement, ongoing Musk–OpenAI trial developments, and new safety/security research on sabotage, document degradation under delegation, and bit-flip attacks.

Timestamps:
(00:00:10) Intro / Banter
(00:02:00) News Preview
(00:02:26) Response to listener comments
(00:02:55) Sponsors

Tools & Apps
(00:05:55) OpenAI Unveils Its New, More Powerful GPT-5.5 Model - The New York Times
(00:23:33) xAI Launches grok-voice-think-fast-1.0: Topping τ-voice Bench at 67.3%, Outperforming Gemini, GPT Realtime, and More - MarkTechPost
(00:29:00) Claude can now plug directly into Photoshop, Blender, and Ableton | The Verge

Projects & Open Source
(00:29:38) China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies
(00:47:05) Tencent Unveils Hy3 preview; Model Enhances Agent Capabilities and Real-World Usability - Tencent 腾讯
(00:50:14) ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Applications & Business
(00:53:03) Google Plans to Invest Up to $40 Billion in Anthropic
(00:56:26) Meta will use hundreds of thousands of AWS Graviton chips
(00:59:51) China blocks Meta's $2 billion takeover of AI startup Manus
(01:01:45) OpenAI shakes up partnership with Microsoft, capping revenue share payments
(01:07:13) Elon Musk Testifies of AI Risk at Trial, Says OpenAI Tried to ‘Steal' a Charity - WSJ
(01:11:50) Judge rejects DOJ bid to delay Anthropic appeal in Pentagon dispute
(01:14:42) Google's Gemini can now run on a single air-gapped server - and vanish when you pull the plug
(01:19:07) DeepMind's David Silver just raised $1.1B to build an AI that learns without human data | TechCrunch

Policy & Safety
(01:22:47) Evaluating whether AI models would sabotage AI safety research
(01:28:59) LLMs Corrupt Your Documents When You Delegate
(01:32:50) Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
(01:39:53) Memorandum on Adversarial Distillation of American AI Models
(01:41:41) Teen boys are dating their AI chatbots - and experts warn it could kill their careers | Fortune
(01:43:57) Announcing the Anthropic Economic Index Survey
(01:45:21) Scoop: CISA lacks access to Anthropic's Mythos

Synthetic Media & Art
(01:48:03) Taylor Swift Files to Trademark Voice and Likeness to Protect Against AI Misuse

Research & Advancements
(01:49:15) Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Niptech: tech & startups
494 - BabaSeek - Apple CEO, DeepSeek, Big Tech Results

May 1, 2026 64:35


Syde tests -> CLAUDE vs Microsoft cowork overview (Frontier) vs copilot Claude integration
https://learn.microsoft.com/en-us/microsoft-365/copilot/cowork/

Apple Just Positioned Itself for the Next Trillion Dollars
https://www.youtube.com/watch?v=RaAFquzj5B8

DeepSeek released V4 on April 24: V4-Pro (1.6T parameters / 49B active) and V4-Flash (284B / 13B active), open source, with a context window of 1 million tokens. The major twist: V4 runs on Chinese Huawei Ascend and Cambricon chips, not on Nvidia. Huawei provides its "Supernode" technology with its Ascend 950 chips. Is it REALLY a breakthrough?
https://www.bloomberg.com/news/articles/2026-04-24/deepseek-unveils-newest-flagship-a-year-after-ai-breakthrough
https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/

Big Tech Earnings
Alphabet stock rises on Q1 earnings beat, cloud growth
https://finance.yahoo.com/sectors/technology/article/alphabet-stock-rises-on-q1-earnings-beat-cloud-growth-212244059.html
Microsoft earnings top Q3 estimates, says AI business up 123% year over year / Copilot: Now exceeds 20 million paid seats (continued strong enterprise traction).
https://finance.yahoo.com/sectors/technology/article/microsoft-earnings-top-q3-estimates-says-ai-business-up-123-year-over-year-211358994.html
Spotify introduces verified artist badges to help distinguish humans from AI
https://techcrunch.com/2026/04/30/spotify-introduces-verified-artist-badges-to-help-distinguish-humans-from-ai/

Inspiration
#NUMEROLOGY :: NUMSTRAT https://portail.numstrat.fr/login/ https://fr.wikipedia.org/wiki/Effet_Barnum
#PERSON :: Lokenath Brahmachari (Lokenath Brahmachari - Wikipedia)
#BOOK :: The Incredible Life of a Himalayan Yogi https://www.amazon.com/Incredible-Life-Himalayan-Yogi-Brahmachari/dp/8187207078
#QUOTE :: "Perform all your actions consciously. Do whatever you like but do it consciously and with a sense of awareness." Baba Lokenath

Hosted by Acast. Visit acast.com/privacy for more information.

The AI Breakdown: Daily Artificial Intelligence News and Discussions
How DeepSeek V4 Connects to the US Power Grid

Apr 27, 2026 24:56


Today's episode connects two stories that look unrelated on the surface: the White House invoking the Defense Production Act around US grid infrastructure, and DeepSeek's long-awaited V4 release. Together they point to a single conclusion, that energy has become the real frontline of the US-China AI competition. In the headlines: Google commits up to $40 billion to Anthropic, the AI trade roars back as hyperscalers push markets to new highs, and Nvidia becomes the first $5 trillion company.

SIGN UP FOR OUR NEW FREE PROGRAM: AGENTOS https://aidbagentos.ai/

Brought to you by:
KPMG - Agentic AI is powering a potential $3 trillion productivity shift, and KPMG's new paper, Agentic AI Untangled, gives leaders a clear framework to decide whether to build, buy, or borrow. Download it at www.kpmg.us/Navigate
Granola - The AI notepad for people in back-to-back meetings. 100% off your first 3 months with code AIDAILY at http://granola.ai/aidaily
Mercury - Modern banking for business and now personal accounts. Learn more at https://mercury.com/personal-banking
Zenflow Work - Agents for knowledge work - https://zenflow.free/
Drata - The agentic trust management platform - https://drata.com/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai

Vida Digital
Ford Explorer: 35 Years of Technology, Safety and Innovation

Apr 27, 2026 23:31


In this episode of Vida Digital on Radio Ancón, Alex Neuman talks with Rodrigo Hernández, Ford Product Manager for Central America and the Caribbean, to mark 35 years of the Ford Explorer, the best-selling SUV of all time in the United States.

Rodrigo explains that the Explorer was born in 1991 as the spiritual successor to the Ford Bronco, but with different DNA: "Ford wanted a more exciting vehicle, more of a comfortable SUV, but above all a family one. And that is one of the pillars that has been maintained to this day."

As for the EcoBoost engines, he puts it in concrete terms: combining direct injection and turbocharging allows "small engines to produce the power of a V8, but with the consumption of a V4 or V6", with a calibration that adjusts fuel delivery thousands of times so that every drop yields as much as possible.

One of the most illustrative moments was his explanation of the Terrain Management System: "I always tell customers: this system is like having six vehicles in one." The Normal, Eco, Sport, Tow, Slippery and Trail modes cover everything from city traffic to unpaved roads, adjusting traction and stability automatically.

On the IIHS Top Safety Pick+ award, the industry's highest distinction, Rodrigo was direct: "For Ford, safety is not optional. It is priority number one." He clarified that the recognition rewards not only the vehicle's structure but also its active technology: "It's not about the vehicle being the heaviest, but about what you need in order to prevent."

The Ford Co-Pilot360 platform has a clear purpose: "It aims to turn the vehicle into an intelligent co-pilot that looks after not only me, but my family." In showrooms, the feature that most surprises buyers is the 13.2-inch touchscreen with SYNC 5.4: "The customer says: wait, I can already do that? Asking the system to open Spotify and find a song all by itself is like a boom for customers."

Looking to the future, Rodrigo outlined three pillars for the region: electrification with hybrid engines, total connectivity via Ford Pass, and assisted autonomy with Ford Blue Cruise. His vision: "The vehicle will increasingly be software on wheels, capable of learning from your routes and your preferences."

INSIDE FINANCE
It's no longer technology. It's power | Intelligenze Emergenti #7 | International review, April 17-24, 2026

Apr 26, 2026 9:05


Artificial intelligence has crossed a new threshold: it is no longer just technology, but strategic infrastructure on a national scale. It is no longer only the model that counts. Capital, energy, hardware and governance count.

In this episode we analyse what really happened in the fourth week of April 2026:
  • AI's entry into a phase of systemic autonomy, with dynamics increasingly close to the logic of states
  • The sector's financial gigantism: Amazon up to 25 billion in Anthropic, and OpenAI over 20 billion with Cerebras
  • The emergence of a new balance between capital and infrastructure, with ever more integrated partnerships between cloud and AI
  • The global competition between the USA and China, with DeepSeek accelerating on Huawei chips for technological sovereignty
  • The physical limits of AI: energy consumption, emissions and tensions over water and resources in data centres
  • The transformation of work: layoffs, automation and company reorganisation
  • The risk of new inequalities: a wide gap in AI adoption between high-income and low-income workers
  • The arrival of autonomous agents: models increasingly capable of managing complex processes on their own
  • The expansion of AI into the physical world, with systems able to compete with humans even in real-world activities
  • Security as a new critical front, between discovered vulnerabilities and the involvement of government agencies
  • The European regulatory challenge between technological sovereignty and global competitiveness

In collaboration with Claudio Ricci, sole director of Recomb, a company specialised in providing innovation-oriented organisations with personalised updates on developments in artificial intelligence, as well as offering professional training courses.
For more information: info@recomb.ai

Main sources:
CNBC https://www.cnbc.com/2026/04/20/amazon-invest-up-to-25-billion-in-anthropic-part-of-ai-infrastructure.html
Amazon's investment of up to 25 billion in Anthropic and the strengthening of AI infrastructure.
The Information https://www.theinformation.com/articles/openai-spend-20-billion-cerebras-chips-receive-equity-stake
OpenAI-Cerebras agreement to reduce dependence on Nvidia with new chips.
The Information https://www.theinformation.com/articles/openai-tpg-bain-capital-negotiate-10-billion-deploy-co-venture
10 billion joint venture to accelerate AI adoption in businesses.
Reuters https://www.reuters.com/technology/chinas-deepseek-returns-with-new-model-year-after-viral-rise-2026-04-24/
DeepSeek launches the V4 model optimised for Huawei chips.
Wired https://archive.is/KVNsB
Environmental impact of gas-powered AI data centres in the United States.
Wired https://archive.is/uo7oe
TikTok data centre project in Brazil and tensions over water and energy.
Reuters https://www.reuters.com/world/meta-targets-may-20-first-wave-layoffs-additional-cuts-later-2026-04-17/
Meta begins layoffs to reallocate resources towards artificial intelligence.
CNBC https://www.cnbc.com/2026/04/23/microsoft-plans-first-voluntary-retirement-program-for-us-employees.html
Microsoft introduces voluntary departures for an AI-related reorganisation.
Financial Times https://archive.is/hRtEj
Growing gap in AI use between high-income and low-income workers.
American Psychological Association https://psycnet.apa.org/doiLanding
Cognitive risks linked to the use of artificial intelligence and the delegation of thinking.
Business Insider https://archive.is/U1w99
Launch of GPT-5.5 and growth in Anthropic's valuation.
VentureBeat https://venturebeat.com/orchestration/openai-unveils-workspace-agents-a-successor-to-custom-gpts-for-enterprises-that-can-plug-directly-into-slack-salesforce-and-more
Introduction of Workspace Agents for automating business processes.
Ars Technica https://arstechnica.com/ai/2026/04/mozilla-anthropics-mythos-found-271-zero-day-vulnerabilities-in-firefox-150/
Anthropic discovers more than 270 vulnerabilities in the Firefox browser.
Reuters https://archive.is/skG1a
OpenAI presents advanced models to the Five Eyes security agencies.
Financial Times https://archive.is/AYpqS
US accusations against China over intellectual property theft via AI.
Reuters https://www.reuters.com/business/media-telecom/eu-commission-awards-180-million-euro-cloud-contract-four-european-providers-2026-04-17/
EU invests 180 million to develop a European sovereign cloud.
Silicon Republic https://www.siliconrepublic.com/machines/merz-siemens-call-for-easing-of-eu-regulations-on-industrial-ai
German pressure to reduce regulation of industrial AI.
The Information https://www.theinformation.com/articles/berkshire-hathaway-chubb-win-approval-drop-ai-insurance-coverage
Insurers exclude risks linked to artificial intelligence from their policies.

Gregario Cycling
Episode 306 - Brazilian Power [U2e]

Apr 24, 2026 67:49


If the Italians at Favero unseated Americans, Germans and even the Swiss in the power meter market, why not the Brazilians?

Hermano Guimarães and Guilherme Oliveira spent three years in development to take the U2e from the crank to the pedal. 100% Brazilian technology, the same technical characteristics as the imported competitors, factory and support in Brazil, and a price that changes the conversation.

In this episode of Gregário Cycling, we sit down with the two of them to understand how a power meter made in Brazil comes to life, what separates a PRX from a V4, and why the world market had not yet looked this way.

Headline News
China's DeepSeek previews new model adapted for Huawei chip technology

Apr 24, 2026 4:45


Chinese AI startup DeepSeek has unveiled a preview of its next-generation V4 model, optimized for Huawei's chip technology and featuring an ultra-long context of one million words.

Podcasty Aktuality.sk
NA ROVINU with Eduard Heger: Robert Fico is a problem for the V4, Péter Magyar shows him no mercy (27/26)

Podcasty Aktuality.sk

Apr 24, 2026 45:45


KDH and SaS behaved rudely and slammed the door in our faces during talks about merging, says Eduard Heger. The former prime minister and member of the Demokrati party says the referendum petition committee will complain to the Constitutional Court about the conduct of President Pellegrini, who refused to call a vote on one of their questions. According to Heger, Slovakia will now have no relations with Hungary, and Robert Fico will also be an obstacle to the functioning of the whole V4.
In the podcast with Eduard Heger you will learn:
– from minute 1 – that Péter Magyar remembers how Robert Fico behaved before the elections;
– after 2:30 – why Robert Fico is a problem for the functioning of the V4;
– from 5:00 – where Robert Fico's Vietnamese inspiration can be felt;
– after 6:00 – whether we should have 5-year terms for both parliament and government;
– from 8:00 – that Fico did not do what he promised, but did what he did not promise;
– after 9:20 – whether Slovaks living abroad should decide elections in Slovakia;
– from 11:00 – that Robert Fico thinks only of himself and not of Slovakia;
– after 12:00 – why he thinks the president's decision on the referendum was underhanded;
– from 14:30 – that the Demokrati are turning to the Constitutional Court of the Slovak Republic with a complaint against the president;
– after 16:00 – what they will achieve with such a complaint;
– from 17:00 – whether the referendum question could have been worded differently;
– after 19:00 – whether the Demokrati could have planned the submission of the petition better with regard to the referendum date;
– from 21:30 – that they consider the president a party man;
– after 24:00 – whether the other question, about ÚŠP and NAKA, is also unenforceable;
– after 28:00 – whether the July referendum is a match lost in advance;
– from 32:00 – which pro-growth measures he expects from the government;
– after 33:00 – that the Demokrati propose a 17 percent flat income tax;
– from 34:00 – where the seven billion from the consolidation packages went;
– after 35:30 – whether the Demokrati assume the government has planned to steal in the next term as well;
– after 36:30 – why he introduced untargeted energy aid, which everyone now criticizes and wants to change;
– after 38:00 – that the Demokrati want to merge, but nobody wants to merge with them;
– from 40:00 – how SaS and KDH gave the Demokrati a rude slap in the face.

Black Box
Israel-Lebanon truce extended. Asia mixed, TSMC at a record. Intel soars after earnings, Deepseek V4 | Morning Finance

Black Box

Apr 24, 2026 22:48


24/4 Israel and Lebanon have extended their ceasefire for three weeks. Trump: We will work with Lebanon to defend it from Hezbollah. The aircraft carrier Bush has arrived in the Indian Ocean. Brent at $105; the dollar and Treasury yields rise. Gold, silver, and Bitcoin fall. US futures are mixed, with the Nasdaq in the green and Intel up 20% pre-market after earnings. Meta cuts 10% of its workforce to improve efficiency given high capex; Microsoft offers voluntary departures to 7% of its workforce. KPMG is moving toward cutting 10% of its audit partners in the US. Google is in the sights of EU antitrust; Avis falls 50% after a short squeeze. Polymarket: a US soldier was arrested over a bet tied to the capture of Maduro that earned him $400,000. *** This episode is sponsored by Scalable Capital. Investing involves risks. Variable gross interest p.a. on unlimited cash. Terms and cash distribution at scalable.capital/conto-deposito-non-vincolato *** Asia has a mixed session, with the Nikkei in the green: Japanese inflation came in below expectations at 1.5%. The Kospi is down; the Taiwan stock exchange rises 3% with a record for TSMC after authorities raised to 25% the limit on how much of a single stock a fund may hold. Deepseek has released the V4 preview. Dollar/yen nears 160: Katayama says they are ready to intervene, and analysts are looking to Golden Week, when markets will be closed. EU futures are in the red; the European Council meets today. Green light for the 90 billion loan to Ukraine and preliminary steps toward EU accession. Meloni asks Europe for courage and does not rule out a deficit deviation. Unicredit is the third-largest shareholder of the Leone; what should we expect? Arnault: we risk a global catastrophe. Learn more about your ad choices. Visit megaphone.fm/adchoices

Z prvej ruky
European Security (24.4.2026 12:30)

Z prvej ruky

Apr 24, 2026 27:13


Guests: Katarína Roth Neveďalová (MEP; Smer-SD) and Miriam Lexmann (MEP; KDH). Meeting in Cyprus: leaders are to discuss strengthening European security, the future of the EU and of Slovakia in NATO, the defense capability of European countries, the Article 42.7 defense clause, the Union's capacities, its trade position in today's world, geopolitical changes and armed conflicts not only in Ukraine, the situation in Ukraine and the EU's role in it, approval of the 90 billion loan, and the 20th package of anti-Russian sanctions. Several countries are calling for the right of veto to be abolished and could imagine voting by simple majority; what would that bring, and what power would smaller countries such as Slovakia have? Slovakia's security environment: the war in a neighboring country, how Russia's aggression exposed shortcomings in diversifying sources, why energy is not the only key area, the state of Slovakia's defense capability, help from allies if needed, the USA as the main player in the North Atlantic Alliance, the Poles acting on their own and spending billions a year on defense, cooperation at the V4 level, and more. | European Security. | Moderator: Soňa Mačor Otajovičová; | The Z prvej ruky discussion is prepared by Slovenský rozhlas, Rádio Slovensko, SRo1. We broadcast every working day at 12:30 on Rádio Slovensko.

Studio N
Presidential adviser Žantovský: The dispute over the NATO summit is Macinka's vendetta, Babiš is neglecting his constitutional duties

Studio N

Apr 17, 2026 30:27


ALL EPISODES IN FULL LENGTH CAN BE FOUND AT HEROHERO.CO/STUDION. THANK YOU FOR SUPPORTING INDEPENDENT JOURNALISM. "We have a prime minister whose political success partly consists of avoiding situations in which his popularity could suffer," says diplomat and presidential adviser Michael Žantovský about the dispute over who will represent Czechia at the NATO summit in Ankara. In the Studio N podcast, he criticizes Andrej Babiš for avoiding not only politically unpleasant situations but also his constitutional role. According to Žantovský, the prime minister leaves matters "to smaller players who gladly seize the opportunity and then make the atmosphere in Czech politics stifling." In doing so, he says, Babiš contributes to the escalation of the dispute. In his view, the whole conflict was caused by Foreign Minister Petr Macinka. "I see it as a problem that one minister caused and keeps stoking, for personal and domestic political reasons that have nothing to do with NATO or with who will represent Czechia there," he says. "It is a political vendetta, which Mr. Macinka openly owned up to in messages he sent to one of my colleagues, a fellow adviser to the president. He is now openly continuing it and openly harming the interests of this country, although as foreign minister he should first and foremost keep the interests of the Czech Republic in mind. It is a very strange way of understanding one's role," Žantovský says. "I am a psychologist by profession and spent eight years of my life in a psychiatric hospital. I could look at Mr. Macinka's behavior from that angle too, but I think it will be better if I don't do so for now," he says on Studio N. Who is holding the shorter end of the constitution in the dispute between Babiš's government and the president? Why, in his view, would a competence lawsuit not resolve the situation? How would Václav Havel have dealt with Petr Macinka? Does it still make sense to maintain the V4 format? And does NATO remain our security guarantee, or is it now just an alliance on paper? Watch the full interview at herohero.co/studion

Kecy a politika
Kecy a politika 260: Fico and Babiš are at war with Europe

Kecy a politika

Apr 6, 2026 47:32


The joint session of the Czech and Slovak governments was symbolic for two reasons. The closing interview that both prime ministers gave to XTV outlined the errant path of Czech foreign policy. Andrej Babiš again insisted that he wants to build the V4 community. That community, however, no longer exists after Poland's de facto departure in protest against Hungary's pro-Russian policy. It is dragging us ever deeper into the undeclared war of Viktor Orbán and Robert Fico against Europe.
This was plain to see in the interview, where Fico spoke of people from the European Commission as the greatest enemies. When we put that in the context of how Hungary is holding up the EU loan to invaded Ukraine, only one conclusion comes to mind: these people want to cause its bankruptcy and capitulation. What Czechia and Babiš are doing on this galley, however, is hard to say.
The mood among government politicians in Hungary and Slovakia is ripe for the governments of the remaining EU countries to tell them: ladies and gentlemen, would you not like to call a referendum on leaving the Union? The alternative would be to expel the country in question from the EU, but that has not happened, and there is no provision for it. There is Article 7 of the Treaty on EU, which provides for stripping a member state of its voting rights under certain circumstances. The remaining 26 countries, however, must agree to that. Given the ties between Hungary and Slovakia, something like that apparently could not happen.
But why should the Czech Republic play an extra in this gang? Where is the national interest? The result is a foreign policy that is spineless, non-transparent, erodes post-revolution ideas about Czechia's place on the map of Europe, and plays into Putin's hands.

DeFi Slate
Stani Kulechov on Why Aave V4 Is The Most Resilient DeFi In The World

DeFi Slate

Mar 30, 2026 26:43


Stani Kulechov joins The Rollup live from DeFi Day to reflect on the V4 announcement and break down the hub-and-spoke architecture, how Aave is positioning for RWAs and tokenized equities, the Whop integration bringing 21 million fintech users into DeFi, his vision for abundance, and more.
Stani Kulechov is the founder of Aave, one of the largest and most resilient DeFi lending protocols in the world. He has been building at the forefront of DeFi since 2017 and is a leading voice on the future of onchain credit, stablecoins, and real-world assets onchain.
The Rollup is where the leaders of digital assets and finance converge. Live from the financial capital of the world.
Timestamps:
00:00 Intro
00:14 V4 Launch Reaction
01:03 Risk Architecture Explained
03:05 Hub & Spoke Liquidity Model
04:50 Three Risk Tiers Breakdown
05:44 Bootstrapping New Use Cases
06:59 Aave V4 vs. V2 & V3
08:35 Institutional Capital Coming Onchain
12:23 RWA Pools & Collateral Strategy
13:04 GHO's Role in Credit Markets
14:53 Quarterly Call Highlights
16:59 Whop Integration Breakdown
17:39 Aave App & Consumer Abstraction
19:56 Chainlink SVR Announcement
21:01 What Aave Users Need To Know
23:19 Permissioned vs. Permissionless
25:19 Future of Public Chain RWAs
26:16 Who Wins the Tokenization Race?
Website: https://therollup.co/
Spotify: https://open.spotify.com/show/1P6ZeYd...
Podcast: https://therollup.co/category/podcast
Follow us on X: https://www.x.com/therollupco
Follow Rob on X: https://www.x.com/robbiek__
Follow Andy on X: https://www.x.com/ayyyeandy
Join our TG group: https://t.me/+TsM1CRpWFgk1NGZh
The Rollup Disclosures: https://goodidea.ventures

This Week in Google (MP3)
IM 862: Ménage à Claude - AI, Human Agency, and Economic Value

This Week in Google (MP3)

Mar 19, 2026 180:43


Who gets to define what intelligence means in the age of AI, and why are tech companies so keen to shift blame onto their creations? This episode digs into moral outsourcing, agency, and the urgent need for independent oversight in the world of artificial intelligence.
Nvidia Unveils NemoClaw Agent Software
Nvidia's NemoClaw is OpenClaw with guardrails
Jensen just put Nvidia's Blackwell and Vera Rubin sales projections into the $1 trillion stratosphere
Nvidia Unveils Groq-Based Chip System to Speed Up AI Tasks Like Coding
Nvidia's DLSS 5 is like motion smoothing for video games, but worse
Zuckerberg has "finished" with Alexandr Wang, worth US$14 billion
Meta didn't buy Moltbook for bots - it bought into the agentic web
Meta's Manus AI agent arrives on your desktop to take on OpenClaw
Introducing GPT-5.4 mini and nano | OpenAI
Sources: OpenAI signed a deal with AWS to sell its AI services to US government agencies for both classified and unclassified work, amid the Anthropic-DOD spat
Inside OpenAI's Race to Catch Up to Claude Code
OpenAI, Musk and Focus
A mystery 1T-parameter AI model called Hunter Alpha, which appeared on OpenRouter on March 11, sparks speculation that DeepSeek is quietly testing its V4 model
Hustlers are cashing in on China's OpenClaw AI craze
Baidu is integrating OpenClaw with its Xiaodu devices to work as voice-controlled remotes, as it seeks to catch up with Tencent and Alibaba in the AI race
Tennessee grandmother jailed after AI facial recognition error links her to fraud
Judges Find AI Doesn't Have Human Intelligence in Two New Court Cases - Slashdot
AI Agent Hacks McKinsey
A study of ~1,500 US workers finds AI use can reduce burnout but also cause "AI brain fry", a mental fatigue from using AI tools beyond one's cognitive capacity
AI companies want to harvest improv actors' skills to train AI on human emotion
A Reddit Post, An AI Hallucination, And Two Lawyers Who Never Checked Citations Walk Into A Dog Custody Case
Digg's open beta shuts down after just two months, blaming AI bot spam
EchoPrime – Cedars-Sinai's AI system can read echocardiograms and write the report
Robotic Surgery Performed Remotely on Patient 1,500 Miles Away - Slashdot
Ex-Uber CEO Kalanick Debuts Plan for 'Gainfully Employed Robots'
German philosopher Jürgen Habermas dies at 96
CanIRun.ai - Can your machine run AI models?
We tried White Castle from an airport vending machine. It was bleak.
I tried BigArch. A big mess.
Hosts: Leo Laporte, Jeff Jarvis, and Fr. Robert Ballecer, SJ
Guest: Rumman Chowdhury
Download or subscribe to Intelligent Machines at https://twit.tv/shows/intelligent-machines.
Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free audio and video feeds, a members-only Discord, and exclusive content. Join today: https://twit.tv/clubtwit
Sponsors: spaceship.com/twit outsystems.com/twit zscaler.com/security preview.modulate.ai

Ethereum Daily - Crypto News Briefing
Ethereum Foundation Mandate

Ethereum Daily - Crypto News Briefing

Mar 14, 2026 2:57


The Ethereum Foundation publishes a 38-page mandate. Spire launches full synchronous composability. POAP enters maintenance mode. And Aave Labs publishes its V4 activation proposal. Read more: https://ethdaily.io/903 Earn 10% real yield on your dollars, fully onchain. Hold $BOLD, the only decentralized stablecoin rated A- by stablecoin agency, Bluechip. No vaults, no middlemen, no RWAs. Learn more on liquity.org/earn Content is for informational purposes only, not endorsement or investment advice. The accuracy of information is not guaranteed.

AI For Humans
AI Can Improve Itself Now. We're Sure That's Fine.

Mar 13, 2026 · 59:21


AI just learned how to make itself smarter. That's not a hypothetical anymore. Recursive self-learning is here, and it's changing everything about how AI develops.
This week on AI For Humans, we break down Andrej Karpathy's new AutoResearch project and what recursive self-improvement actually means for the rest of us. Plus, Anthropic's massive Time magazine profile reveals just how fast Claude is writing its own code, Meta quietly acquired an AI agent social network called MoltBook, Replit drops V4, Perplexity launches computer use, Gemini finally shows up in Google Docs and Maps, Cloudflare does a full 180 on web scraping, Figure's robot cleans an entire living room, and there's a robot horse.
We're sure that's fine.
AI IS IMPROVING ITSELF AND WE'RE JUST SITTING HERE WATCHING.
#ai #artificialintelligence #aiforhumans
Come to our Discord: https://discord.gg/muD2TYgC8f
Join our Patreon: https://www.patreon.com/AIForHumansShow
AI For Humans Newsletter: https://aiforhumans.beehiiv.com/
Follow us for more on X @AIForHumansShow
Join our TikTok @aiforhumansshow
To book us for speaking, please visit our website: https://www.aiforhumans.show/
// Show Links //
Karpathy's AutoResearch: Recursive Self-Learning https://x.com/karpathy/status/2031135152349524125?s=20
AutoResearch GitHub Repository https://github.com/karpathy/autoresearch
Sam Altman on Multi-Day and Multi-Week AI Agent Work https://youtu.be/sTnl8O_BuuE?si=xaWYyqYbVJYzOvYZ
HBR: When Using AI Leads to Brain Fry https://hbr.org/2026/03/when-using-ai-leads-to-brain-fry
Anthropic's Big Time Magazine Profile: Claude, the Pentagon, and Disruption https://time.com/article/2026/03/11/anthropic-claude-disruptive-company-pentagon/
Claude's Rapid Shipping Pace https://x.com/claudeai/status/2032124273587077133?s=20
Paperclip Open Sourced: AI-Powered Company Management https://x.com/dotta/status/2029239759428780116?s=20
Meta Acquires MoltBook AI Agent Social Network https://www.axios.com/2026/03/10/meta-facebook-moltbook-agent-social-network
Replit V4 Launch https://x.com/amasad/status/2031755113694679094?s=20
Perplexity Computer Use https://x.com/perplexity_ai/status/2031790180521427166?s=20
Claude Code Makes Videos Now https://x.com/josephdviviano/status/2031196768424132881?s=20
Gavin's Claude Code Video Experiment https://x.com/gavinpurcell/status/2031487595717226955?s=20
Gavin's Claude Code Bio Video https://x.com/gavinpurcell/status/2031620238689898770?s=20
Gemini Comes to Google Docs and More https://x.com/OfficialLoganK/status/2031374503599567113?s=20
Gemini in Google Maps: Ask Maps with Immersive Navigation https://blog.google/products-and-platforms/products/maps/ask-maps-immersive-navigation/
Gemini Embeddings https://x.com/OfficialLoganK/status/2031411916489298156?s=20
Runway Characters https://x.com/runwayml/status/2031028120971571687?s=20
Cloudflare Launches /Crawl So All Sites Can Be Scraped https://x.com/CloudflareDev/status/2031488099725754821?s=20
Figure Robot Does Full Autonomous Living Room Cleanup https://x.com/Figure_robot/status/2031038981333565949?s=20
Deep Robotics Robot Horse https://x.com/DeepRobotics_CN/status/2031910951465992535?s=20
Real-Time Skeletal Visualization with Three.js https://x.com/nick_bisesi/status/2031728629592289591?s=20
Taking Halo ISO and Getting It to Play on Mac https://x.com/JasonBotterill/status/2031855986303254926?s=20
AI Tennis Prediction https://x.com/phosphenq/status/2031400355167117498
Green Code YouTube Channel: AI Explainers https://www.youtube.com/@Green-Code
LotR x Pawn Stars AI Video Mashup https://www.reddit.com/r/aivideo/comments/1rqgolw/wrong_universe_lotr_vs_pawn_stars_ai_mashup/

HetiVálasz
[Zöld Válasz] Biomass: is it really green to burn a ton of firewood?

Mar 13, 2026 · 74:32


Biomass accounts for almost 60 percent of the EU's renewable energy use, and at home in Hungary the situation is even better: the share is around 80 percent. Of course, renewables make up only 5-6 percent of the energy mix, but they matter because their output can be planned. At the same time, renewable energy largely means burning firewood. This is a serious ecological contradiction, because it releases an amount of carbon dioxide that a new tree takes a hundred years to offset. How can energy and ecological considerations be balanced nonetheless? Litkai Gergely discusses this with his guests: Szajkó Gabriella, research fellow at the Regionális Energiagazdasági Kutatóközpont (REK), who studies energy markets and decarbonisation strategies, and Harmat Ádám, climate protection expert at WWF Magyarország. Do we even know what the real emissions are in the V4 countries? Zöld Válasz on biomass.

DopoGP MotoGP - Moto.it
DopoGP Sepang test - Ducati and Aprilia, Alex and Bez

Feb 25, 2026 · 65:28


Alex Marquez and Marco Bezzecchi, Ducati and Aprilia ahead of everyone after the final time attacks and the various race simulations. The balance of power does not look very different from last year, but the new features seen on the 2026 MotoGP bikes are many and significant. We will go through them together. Four Italian riders in the top six. Among the rookies, Toprak was eighteenth, ahead of Moreira. There seems to be general satisfaction among riders and engineers. We will analyse things manufacturer by manufacturer, with a particular eye on the one that appears to be struggling most: Yamaha. Already without Quartararo from the first day after a bad crash, on day 2 the new V4s were stopped because of a technical problem that needed checking and was later resolved. Become a supporter of this podcast: https://www.spreaker.com/podcast/dopogp-motogp-moto-it--4070022/support.

Denník N podcast
Ekonomický newsfilter: When will it be time for a comparison with Albania

Feb 17, 2026 · 12:28


1. In the V4 tournament, Slovakia has ended up at the bottom of the table 2. How Slovakia is doing on inflation 3. Artificial intelligence helps journalists find trends they would otherwise have trouble spotting 4. Briefly: the bond auction, the tax forecast, and deliberations about the euro in Sweden

Cleveland Moto
Clevelandmoto 536 Nobody died shovelin' sunshine.

Feb 7, 2026 · 142:36


ClevelandMoto Podcast 536 Show Notes
8 cylinders, 8 speed DCT, 1000 lbs. and sure, it's coming to America....yeah, right. Call me when it's sitting at a dealership. They tried this crap at AIM expo 2 years ago, this year they trotted it out at CES. https://www.advrider.com/the-gold-wing-dwarfing-gwm-souo-s2000-is-coming-to-america-allegedly/
Oooh, let's all make fun of the silly Chinese company and their silly motorcycle! Right? Be careful tho' they sold 1.2 Million cars and trucks last year. That's just behind HONDA. https://www.gwm-global.com/news/3403831.html
$60,000 is a lot of dosh. https://www.ft.com/content/d65acba7-33ca-4e43-8581-d71061543dd0?shareType=nongift
Remember this guy, we loved his 250cc 76 HP v-twin 2-strokes at $40,000 do we feel the same about his 185 HP Buell-Powered cafe racer?
This is rough, probably rougher than you want, but it's a real 400cc V4 and I think it's gonna go cheap. https://iconicmotorbikeauctions.com/auction/1989-honda-vfr400-nc30-6/
Triumph announces a "Limited Edition Cafe Racer 1200" Kind of their regular 1200 Bonneville with a $19,000 price tag. $5000 and 25 more HP than the standard model. $4400 more and the same HP as the Speed Twin 1200. https://www.triumphmotorcycles.com/motorcycles/classic/speed/speed-twin-cafe-racer-edition
Since we're talking about high-dollar Retro bikes: coming in at a hair under $20,000 (if you want it in Black) is the newest Retro from Indian. 1890cc and about 120 torques. I actually love the look of this machine, but of course I do, it's a copy of a Kawasaki Drifter. https://www.indianmotorcycle.com/en-us/chief-motorcycles/chief-vintage/
Or Save $16,000 and have the same experience. https://atvhunt.com/l/10728353/2003-Kawasaki-Vulcan-800-Drifter
or save $15,000 and get the 1500cc model: https://motohunt.com/l/4190659/1999-Kawasaki-Drifter-VN1500
Support the show
Remember folks...Ride Fast and Take Chances! check out our Youtube channel at https://www.youtube.com/c/ClevelandMoto

SynGAP10 weekly 10 minute updates on SYNGAP1 (video)
All #SYNGAP1 Families need to take part in our Natural History Studies: ProMMiS & Citizen #S10e198

Feb 6, 2026 · 9:58


Thursday, February 5, 2026 - Week 6
Happy #RareDisease & #BlackHistory Month!
#NaturalHistory means how this disease progresses. Reminder: We have only been at this for 17 years, first patients were identified via Hamdan, 2009. https://pubmed.ncbi.nlm.nih.gov/19196676/
Retrospective Digital NHS: cureSYNGAP1.org/Citizen (Growing list of tools available to families, for free)
Prospective Multi-disciplinary Multi-site NHS: ProMMiS cureSYNGAP1.org/ProMMiS
Reminder, only possible by CS1 support for non-CHOP sites and travel plus huge gift to Penn. https://www.chop.edu/news/25-million-gift-penn-medicine-and-children-s-hospital-philadelphia-establishes-center-epilepsy
Potential for being a control arm in the future.
Protocol: https://www.linkedin.com/posts/curesyngap1_syngap1-stxbp1-dee-activity-7425223573134327808-SVEQ & early data: https://pubmed.ncbi.nlm.nih.gov/40119723/
Join the ~160 families who have enjoyed excellent clinical care and contributed to the future of SYNGAP1. Today, a 4 month old is going!
CHOP: 119 new, V2- 67, V3- 32, V4- 10, V5- 4
CHCO: 37 new, V2- 7
Stanford: 8 new, V2- 2
Total: 164 (double counting one family who goes to multiple sites)
Survey
English: https://curesyngap1.org/SurveyProMMiS
Spanish: https://curesyngap1.org/encuestaProMMiS
94 Responses to survey, so far:
Why not? Did not receive an invitation, Too far to travel, Too expensive
Barriers: Logistics, Cost, Time off, Behaviors, Insurance
ETC.
Pubmed 2026 is at 6! But will soon be 7 with the McKee paper! https://pubmed.ncbi.nlm.nih.gov/?term=syngap1&filter=years.2026-2026&sort=date
Biorepository needs more samples. Check out the list and map here https://docs.google.com/presentation/d/1IjaHILXj7AlBDlbTJgvYrkBS_0bnI8VCnTIiPXJ7JGM/edit?usp=sharing and contribute blood. The data and research we do with these samples is invaluable.
May 28, San Francisco, CA: cureSYNGAP1.org/SF26
SOCIAL MATTERS
4,668 LinkedIn. https://www.linkedin.com/company/curesyngap1/
1,520 YouTube. https://www.youtube.com/@CureSYNGAP1
11.2k Twitter https://twitter.com/cureSYNGAP1
45k Insta https://www.instagram.com/curesyngap1/
$CAMP stock is at $3.59 on 5 Feb. '26 https://www.google.com/finance/beta/quote/CAMP:NASDAQ
Like and subscribe to this podcast wherever you listen. https://curesyngap1.org/podcasts/syngap10/ Episode 198 of #Syngap10 #CureSYNGAP1 #Podcast

Focus Check
ep102 - YouTube Paid $100 Billion to Creators | Hohem iSteady MT3 | DJI RS 5 | Astera QuikBeam – CineD Focus Check

Feb 5, 2026 · 95:45


Imagine this: YouTube has paid over $100 billion to creators in the last four years. That's an enormous sum, far more than most TV networks spend on content today. In this episode, we take a closer look at what that means for creators and the platform as a whole. We also cover plenty of other highlights, including two brand-new Hohem gimbals (which Nino reviewed), a range of innovative new lights from several manufacturers, and exciting new lenses. So, as always, hit that play button and enjoy.
Chapters and Articles in This Episode
(00:00) – Intro
(07:44) – Hohem iSteady MT3 & MT3 Pro Gimbal Review – Built-In AI Tracking and Pro Performance Without the Pro Price https://www.cined.com/hohem-isteady-mt3-mt3-pro-review-built-in-ai-tracking-and-pro-performance-without-the-pro-price/
(15:15) – DJI RS 5 Announced – Enhanced Intelligent Tracking Module, 5th-Gen Stabilization, and One-Hour Fast Charging https://www.cined.com/dji-rs-5-announced-enhanced-intelligent-tracking-module-5th-gen-stabilization-and-one-hour-fast-charging/
(25:31) – YouTube Paid Creators $100 Billion in Four Years – Here Are Their Priorities for 2026 https://www.cined.com/youtube-paid-creators-100-billion-in-four-years-here-are-their-priorities-for-2026/
(37:25) – Sony a7S III Firmware Update Version 5.00 Released with Expanded Autofocus Customization https://www.cined.com/sony-a7s-iii-firmware-update-version-5-00-released-with-expanded-autofocus-customization/
(40:50) – Sony Firmware Updates for VENICE 2 V4.1, BURANO V3.0, FX6 V6.0, and FR7 V4.0 Firmware at BSC Expo 2026 – New Anamorphic Modes, BIG6, Blackmagic RAW https://www.cined.com/sony-firmware-updates-for-venice-2-v4-1-burano-v3-0-fx6-v6-0-and-fr7-v4-0-firmware-at-bsc-expo-2026-new-anamorphic-modes-big6-blackmagic-raw/
(48:20) – Sony VENICE 2 Extension System Mini – Now Shipping https://www.cined.com/sony-venice-2-extension-system-mini-now-shipping/
(54:43) – Astera QuikBeam Announced – Ultra-Compact Fresnel with Multiple Powering Options https://www.cined.com/astera-quikbeam-announced-ultra-compact-fresnel-with-multiple-powering-options/
(58:17) – NANLITE FC-720B and FC-720C Announced – 750W Additions to the FC Series https://www.cined.com/nanlite-fc-720b-and-fc-720c-announced-750w-additions-to-the-fc-series/
(01:00:49) – Aputure NOVA 9° 2×1 and NOVA II 1×1 Announced – A 9° Long-Throw Panel and a High-Output 1×1 https://www.cined.com/aputure-nova-9-2x1-and-nova-ii-1x1-announced-a-9-long-throw-panel-and-a-high-output-1x1/
(01:05:51) – Leica Noctilux-M 35mm f/1.2 ASPH – Innovative and Exorbitantly Priced https://www.cined.com/leica-noctilux-m-35mm-f-1-2-asph-innovative-and-exorbitantly-priced/
(01:10:13) – Pixboom Spark First Production Unit Rolls Off Assembly Line – Mid-March Shipping Confirmed https://www.cined.com/pixboom-spark-first-production-unit-rolls-off-assembly-line-mid-march-shipping-confirmed/
(01:16:49) – Blazar Talon 1.5X AF Anamorphic Lenses Announced – World's First 1.5x Squeeze Autofocus System https://www.cined.com/blazar-talon-1-5x-af-anamorphic-lenses-announced-worlds-first-1-5x-squeeze-autofocus-system/
(01:23:00) – Eddie AI Adds Multi-Track Audio Support for Professional Editing Workflows https://www.cined.com/eddie-ai-adds-multi-track-audio-support-for-professional-editing-workflows/
(01:25:13) – FUJIFILM GFX ETERNA 55 Firmware Version 1.04 Released – Display Delay of External Monitor Improved https://www.cined.com/fujifilm-gfx-eterna-55-firmware-version-1-04-released-display-delay-of-external-monitor-improved/
(01:27:04) – Brightin Star 60mm f/2.8 II Macro Lens Announced https://www.cined.com/brightin-star-60mm-f-2-8-ii-macro-lens-announced/
(01:29:08) – Hollyland Solidcom M1 Pro Released – A Scalable 1.9 GHz Intercom for Medium-Scale Productions https://www.cined.com/hollyland-solidcom-m1-pro-released-a-scalable-1-9-ghz-intercom-for-medium-scale-productions/
(01:32:20) – Freeze, Move, Immerse – Bullet Time and Volumetric Capture in Action https://www.cined.com/freeze-move-immerse-bullet-time-and-volumetric-capture-in-action/
We hope you enjoyed this episode! Have feedback, comments, or suggestions? Write to us at podcast@cined.com

Loud Pipes!
241: Riding Update and Predictions for 2026

Feb 3, 2026 · 109:52


Predictions:
Chad: Increase in V4 engines; Honda announces a new VFR; Increase in Chinese motorcycles to the US; More manufacturers enter three-wheel market; Every major manufacturer will have double stack 1911 by end of the year (except Glock); More DA/SA handguns in the market
Rich: Yamaha announces a street bike with V4 inspired by change in MotoGP; More lighter ADV or dual sport motorcycles over (like IBEX 450, KLX300, etc); More double stack 1911 style handguns on the way; More non-striker-fired carry options

Cleveland Moto
ClevelandMoto 535 Moto Morini Vettore 450 braves the blizzard

Jan 29, 2026 · 157:45


SHOW NOTES 535
MotoMorini Vettore 450 will be in attendance
Triumph Triple Updates: The Trident 660 and Tiger Sport 660 received significant updates for 2026, featuring "punchier" engine layouts and revised chassis. They are expected in dealerships by March 2026.
Honda Entry-Level Changes: The 2026 Rebel 300 has been updated with Honda's E-Clutch technology
Yamaha's New Era: Yamaha has officially debuted its first full-factory V4-powered M1 for the 2026 MotoGP season, signaling a major technical shift from its traditional inline-four engine.
Dakar Victory: Luciano Benavides and Red Bull KTM have bee
KTM/Bajaj Restructuring: Following a major post-insolvency reset, Bajaj Auto has officially taken controlling interest of KTM AG. The company has announced plans to cut approximately 500 jobs as part of a global cost-reduction strategy.
Market Trends: Reports indicate a challenging start to 2026, with nearly 120 U.S. dealerships closing due to sagging sales and rising costs. Conversely, BMW Motorrad reported record success, clearing 200,000 worldwide sales for the fourth consecutive year.
New Brake Systems: WP Suspension (owned by KTM/Bajaj) has launched a new performance braking division, moving toward more in-house component integration for future models like the 390 Duke
Support the show
Remember folks...Ride Fast and Take Chances! check out our Youtube channel at https://www.youtube.com/c/ClevelandMoto

Radio Prague - English
Czech-Hungarian talks, engraving from Ice Age in Moravia, Lesser Town

Jan 20, 2026 · 27:38


Czech–Hungarian alignment on migration, war and V4, Ice Age horse engraving found in Moravian Karst cave, history of license plates in Czech lands, Lesser Town 

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith

Jan 9, 2026 · 78:14


Don't miss George's AIE talk: https://www.youtube.com/watch?v=sRpqPgKeXNk
From launching a side project in a Sydney basement to becoming the independent gold standard for AI benchmarking, trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities, George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?
We discuss:
The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest
GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)
Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)
V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)
Artificial Analysis Website: https://artificialanalysis.ai
George Cameron on X: https://x.com/grmcameron
Micah Hill-Smith on X: https://x.com/_micah_h
Chapters
00:00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
00:01:08 Business Model: Independence and Revenue Streams
00:04:00 The Origin Story: From Legal AI to Benchmarking
00:07:00 Early Challenges: Cost, Methodology, and Independence
00:16:13 AI Grant and Moving to San Francisco
00:18:58 Evolution of the Intelligence Index: V1 to V3
00:27:55 New Benchmarks: Hallucination Rate and Omniscience Index
00:33:19 Critical Point and Frontier Physics Problems
00:35:56 GDPVAL AA: Agentic Evaluation and Stirrup Harness
00:51:47 The Openness Index: Measuring Model Transparency
00:57:57 The Smiling Curve: Cost of Intelligence Paradox
01:04:00 Hardware Efficiency and Sparsity Trends
01:07:43 Reasoning vs Non-Reasoning: Token Efficiency Matters
01:10:47 Multimodal Benchmarking and Community Requests
01:14:50 Looking Ahead: V4 Intelligence Index and Beyond
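The token-efficiency versus turn-efficiency point in these notes comes down to simple arithmetic: the cost of a task is roughly price per token times tokens per turn times number of turns. The sketch below illustrates that with two hypothetical model profiles. Every number, the `RunProfile` type, and the `task_cost` helper are invented for illustration and are not Artificial Analysis data or methodology.

```python
from dataclasses import dataclass


@dataclass
class RunProfile:
    """Hypothetical usage profile for one model on one agentic task."""
    price_per_million_tokens: float  # USD, assumed blended input/output price
    tokens_per_turn: int             # assumed average tokens consumed per turn
    turns_to_solve: int              # assumed turns needed to finish the task


def task_cost(profile: RunProfile) -> float:
    """Total task cost = price per token * tokens per turn * number of turns."""
    total_tokens = profile.tokens_per_turn * profile.turns_to_solve
    return profile.price_per_million_tokens * total_tokens / 1_000_000


# Invented figures: one model is pricier per token but needs fewer turns.
pricier_per_token = RunProfile(price_per_million_tokens=10.0,
                               tokens_per_turn=2_000, turns_to_solve=5)
cheaper_per_token = RunProfile(price_per_million_tokens=4.0,
                               tokens_per_turn=2_000, turns_to_solve=20)

print(f"pricier per token: ${task_cost(pricier_per_token):.2f}")
print(f"cheaper per token: ${task_cost(cheaper_per_token):.2f}")
```

With these made-up inputs, the model with the higher per-token price finishes the task for less overall because it needs a quarter of the turns, which is the shape of the claim made about Tau-bench in the notes.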

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Artificial Analysis: Independent LLM Evals as a Service, with George Cameron and Micah Hill-Smith

Jan 8, 2026 · 78:24


Happy New Year! You may have noticed that in 2025 we moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!
We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They then were one of the few Nat Friedman and Daniel Gross' AIGrant companies to raise a full seed round from them and have now become the independent gold standard for AI benchmarking, trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.
We have chatted with both Clementine Fourrier of HuggingFace's OpenLLM Leaderboard and (the freshly valued at $1.7B) Anastasios Angelopoulos of LMArena on their approaches to LLM evals and trendspotting, but Artificial Analysis have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.
George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?
We discuss:
* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
* Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest
* GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)
* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)
* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)
Links to Artificial Analysis
* Website: https://artificialanalysis.ai
* George Cameron on X: https://x.com/georgecameron
* Micah Hill-Smith on X: https://x.com/micahhsmith
Full Episode on YouTube
Timestamps
* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
* 01:19 Business Model: Independence and Revenue Streams
* 04:33 Origin Story: From Legal AI to Benchmarking Need
* 16:22 AI Grant and Moving to San Francisco
* 19:21 Intelligence Index Evolution: From V1 to V3
* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology
* 13:52 Mystery Shopper Policy and Maintaining Independence
* 28:01 New Benchmarks: Omniscience Index for Hallucination Detection
* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning
* 23:01 GDP Val AA: Agentic Benchmark for Real Work Tasks
* 50:19 Stirrup Agent Harness: Open Source Agentic Framework
* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses
* 58:25 The Smiling Curve: Cost Falling While Spend Rising
* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits
* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges
* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas
* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions
* 1:16:50 Closing: The Insatiable Demand for Intelligence
Transcript
Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.
swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me. Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks, and how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...
George [00:01:09]: Yeah, but you can't pay us for better results.
swyx [00:01:12]: Yes, exactly.
George [00:01:13]: Very important.
Micah [00:01:14]: Start off with a spicy take.
swyx [00:01:18]: Okay, how do I pay you?
Micah [00:01:20]: Let's get right into that.
swyx [00:01:21]: How do you make money?
Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026. Which is pretty soon now. First, we run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups.
So we want to be... We want to be who enterprise look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But turns out a bunch of our stuff can be pretty useful to companies building AI stuff.swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?George [00:02:53]: So we have a benchmarking and insight subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the report is a model deployment report, how to think about choosing between serverless inference, managed deployment solutions, or leasing chips. And running inference yourself is an example kind of decision that big enterprises face, and it's hard to reason through, like this AI stuff is really new to everybody. And so we try and help with our reports and insight subscription. Companies navigate that. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking. Yeah. 
So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.swyx [00:04:09]: Let's talk about TechStack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.Micah [00:04:19]: George was an SF, but he's Australian, but he moved here already. Yeah.swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting artificial analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll go to the private benchmark. Yeah.George [00:04:33]: Why don't we even go back a little bit to like why we, you know, thought that it was needed? Yeah.Micah [00:04:40]: The story kind of begins like in 2022, 2023, like both George and I have been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. So it actually worked pretty well for its era, I would say. Yeah. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So had like this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like you're trying to think about accuracy, a bunch of other metrics and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things measured independently across all the models and providers. 
Honestly, it was probably meant to be a side project when we first started doing it. Like, we didn't get together and say, hey, we're going to stop working on all this other stuff and this is going to be our main thing.

swyx [00:05:49]: When I first called you, I think you hadn't decided on starting a company yet.

Micah [00:05:58]: That's actually true. I don't even think we'd paused anything. Like, George hadn't quit his job, and I hadn't quit working on my legal AI thing. It was genuinely a side project.

George [00:06:05]: We built it because we needed it as people building in the space and thought, oh, other people might find it useful too. So we'll buy a domain and link it to the Vercel deployment that we had and tweet about it. But very quickly it started getting attention. Thank you, swyx, for, I think, doing an initial retweet and spotlighting this project that we released. It was useful to others, and very quickly it became more useful as the number of models released accelerated. We had Mixtral 8x7B, and it was a key... That's a fun one. Yeah. Like, an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers and thinking about speed, thinking about cost. And so that was key. And so it became more useful quite quickly. Yeah.

swyx [00:07:02]: What I love about talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data, and that's obviously more relevant. But I want to stay on the origin story a little bit more. When you started out, I would say, I think the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has some knowledge.
I think there's some version of an Excel sheet or a Google Sheet where you just copy and paste the numbers from every paper and post them up there. And then sometimes they don't line up because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you don't hold their models correctly or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. If I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.

Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website... Yeah. Yeah. Like, one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get... You can put the answer into the model. Yeah. That in the extreme. And you get crazy cases, like back when Google had Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4 and constructed, I think never published, chain of thought examples, 32 of them in every topic in MMLU, to run it to get the score. There are so many things that you... They never shipped Ultra, right? That's the one that never made it up. Not widely. Yeah. Yeah. Yeah. I mean, I'm sure it existed, but yeah.
So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.

swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.

Micah [00:09:36]: So, I mean, we were paying for it personally at the start. There's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was kind of fine. Yeah. Yeah. These days that's gone up an enormous amount for a bunch of reasons that we can talk about. But yeah, it wasn't that bad, because you've also got to remember that the number of models we were dealing with was hardly any and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like, we were just asking some Q&A type questions, and then one specific thing was that for a lot of evals initially, we were just sampling an answer. You know, like, what's the answer for this? We would go to the answer directly without letting the models think. We weren't even doing chain of thought stuff initially. And that was the most useful way to get some results initially. Yeah.

swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right?
Because sometimes the models can answer any way they see fit, and sometimes they actually do have the right answer, but they just returned the wrong format and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, there's an open question whether you should give it points for not following your instructions on the format.

Micah [00:11:00]: It depends what you're looking at, right? Because if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.

swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple choice question, sometimes there's a bias towards the first answer, so you have to randomize the responses. All these nuances, like, once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.

Micah [00:11:47]: You've also got, like, the different degrees of variance in different benchmarks, right? Yeah. So, if you run four-question multi-choice on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see on a four-question multi-choice eval is pretty enormous if you only do a single run of it, and especially if it has a small number of questions. So, like, one of the things that we do is run an enormous number of repeats of all of our evals when we're developing new ones and doing upgrades to our Intelligence Index to bring in new things.
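A minimal sketch of the two parsing steps mentioned above: instructing a fixed answer format and pulling it out with a simple regex, plus shuffling the options to counter first-answer bias. The `Answer: <letter>` convention and the function names are illustrative assumptions, not Artificial Analysis's actual harness.

```python
import random
import re

# Assumed convention: the prompt tells the model to end with "Answer: <letter>".
ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\b\)?", re.IGNORECASE)


def shuffle_options(options, correct_index, rng):
    """Shuffle answer options and return (shuffled, new index of the correct one)."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(correct_index)


def extract_letter(response):
    """Return the last 'Answer: X' letter in the response, or None if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None


def grade(response, correct_position):
    """True only if a well-formed answer letter points at the correct option."""
    letter = extract_letter(response)
    if letter is None:
        return False  # wrong format scores zero unless a fallback extractor is added
    return "ABCD".index(letter) == correct_position
```

Taking the last match rather than the first is a design choice for models that reason before answering; a response with no parsable letter is where an LLM-based extractor could be slotted in as a fallback.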
Yeah. So that we can dial in the right number of repeats to get to the 95% confidence intervals that we're comfortable with, so that when we pull that together, we can be confident in the Intelligence Index to at least as tight as, like, plus or minus one at 95% confidence. Yeah.

swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.

George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.

swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking, you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.

Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, in everything we do, on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true. Like the one you bring up right here: if we're working with a lab, if they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as what they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy.
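To make the repeats discussion above concrete, here is a rough sketch of how one might estimate a 95% confidence interval from repeated eval runs and the number of repeats needed for a target width. It assumes a normal approximation (the 1.96 multiplier) over independent runs; the function names are hypothetical and this is not Artificial Analysis's actual procedure.

```python
import math
import statistics

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval (an assumption)


def ci95_half_width(run_scores):
    """Half-width of the 95% CI on the mean score across repeated runs."""
    if len(run_scores) < 2:
        raise ValueError("need at least two runs to estimate variance")
    sd = statistics.stdev(run_scores)  # sample standard deviation across runs
    return Z_95 * sd / math.sqrt(len(run_scores))


def repeats_needed(pilot_scores, target_half_width):
    """Repeats needed so the CI half-width is at most target_half_width,
    using the run-to-run spread observed in a pilot set of runs."""
    sd = statistics.stdev(pilot_scores)
    return max(2, math.ceil((Z_95 * sd / target_half_width) ** 2))
```

For example, five pilot runs scoring 70, 72, 68, 74 and 66 have a sample standard deviation of about 3.16, so a plus or minus one interval would need roughly 39 repeats under these assumptions, which is why repeats multiply the cost so quickly.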
And so, and we're totally transparent with all the labs we work with about this, that we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because, like, a thing that turns out to actually be quite a good… …good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.swyx [00:14:23]: That's true. I never thought about that. I've been in the database data industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.Micah [00:14:36]: I mean, okay, the biggest one, like, that I'll bring up, like, is more of a conceptual one, actually, than, like, direct shenanigans. It's that the things that get measured become things that get targeted by labs that they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. Like, I'm not talking about training on test set. But if you know that you're going to be great at another particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building. But will not necessarily work. Will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one. 
So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without there being a reflection of overall generalized intelligence of these models. Getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant that you guys decided to join and move here. What was it like? I think you were in, like, batch two? Batch four. Batch four. Okay.Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned. With the mission of what we were trying to do. Like, we're not quite typical of, like, a lot of the other AI startups that they've invested in.swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they say any advice that really affected you in some way or, like, were one of the events very impactful? That's an interesting question.Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.swyx [00:17:09]: Which is also, like, a crazy list. Yeah.George [00:17:11]: Oh, totally. Yeah, yeah, yeah. 
There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup and just working through the questions that don't have, like, clear answers and how to work through those kind of methodically and just, like, work through the hard decisions. And they've been great mentors to us as we've built artificial analysis. Another benefit for us was that other companies in the batch and other companies in AI Grant are pushing the capabilities. Yeah. And I think that's a big part of what AI can do at this time. And so being in contact with them, making sure that artificial analysis is useful to them has been fantastic for supporting us in working out how should we build out artificial analysis to continue to being useful to those, like, you know, building on AI.swyx [00:17:59]: I think to some extent, I'm mixed opinion on that one because to some extent, your target audience is not people in AI Grants who are obviously at the frontier. Yeah. Do you disagree?Micah [00:18:09]: To some extent. To some extent. But then, so a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypical power users of artificial analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently for different models and different parts of your application to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're like not commercial customers of ours, like we don't charge for all our data on the website. Yeah. 
They are absolutely some of our power users.

swyx [00:19:07]: So let's talk about just the evals as well. So you start out from the general, like, MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1 and how did you evolve it? Okay.

Micah [00:19:22]: So first, just as background, we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pull together currently from 10 different eval data sets to give what we're pretty confident is the best single number to look at for how smart the models are. Obviously, it doesn't tell the whole story. That's why we publish the whole website of all the charts, to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type data sets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic data sets. It's got our own long context reasoning data set and some other use case focused stuff. As time goes on, the things that we're most interested in, the capabilities that are becoming more important for AI and that developers care about, are going to be first around agentic capabilities. So surprise, surprise. We're all loving our coding agents, and how the models perform in that setting, and then on similar things for different types of work, is really important to us. The linking to use cases, to economically valuable use cases, is extremely important to us. And then we've got some of the... Yeah.
These things that the models still struggle with, like working really well over long contexts, that are not going to go away as specific capabilities and use cases that we need to keep evaluating.

swyx [00:20:46]: But I guess one thing I was driving at was, like, the V1 versus the V2 and how bad it was over time.

Micah [00:20:53]: Like how we've changed the index to where we are.

swyx [00:20:55]: And I think that reflects on the change in the industry. Right. So that's a nice way to tell that story.

Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think, how much progress has been made in the last two years. Like, we obviously play the game constantly of today's version versus last week's version and the week before, and all of the small changes in the horse race between the current frontier and who has the best smaller-than-10B model right now, this week. Right. And that's very important to a lot of developers and people, and especially in this particular city of San Francisco. But when you zoom out a couple of years, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence. We can talk about that more in a bit. So V1, V2, V3, we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about, as opposed to just the Q&A type stuff that MMLU and GPQA represented. Yeah.

swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and, like, looking around and asking questions about it. Yeah.

Micah [00:22:21]: Let's do it. Okay.
This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.

George [00:22:26]: And I think a little bit about the direction that we want to take it. And we want to push benchmarks. Currently, the Intelligence Index and evals focus a lot on kind of raw intelligence. But we kind of want to diversify how we think about intelligence. And we can talk about it. But new evals that we've built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be. And so we want to bring that forth. But before we get into that.

swyx [00:23:01]: And so for listeners, just as a timestamp, right now, number one is Gemini 3 Pro High. Then followed by Claude Opus at 70. GPT-5.1 High. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.

George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.

Micah [00:23:30]: This is such a favorite, right? Yeah. And in almost every talk that George or I give at conferences and stuff, we always put this one up first to just talk about situating where we are in this moment in history. This, I think, is the visual version of what I was saying before about zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year.
And, I mean, you would remember that time period well, of there being very open questions about whether or not AI was going to be competitive, like full stop, whether or not OpenAI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.

George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There's so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.

swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there, which is a less traditional name. Yeah.

George [00:25:01]: It's models that we're kind of highlighting by default in our charts, in our Intelligence Index. Okay.

swyx [00:25:07]: You just have a manually curated list of stuff.

George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose what models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.

swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.

George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.

Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks. It was Boxing Day in New Zealand when DeepSeek v3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024 and had run evals on the earlier ones and stuff.
I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek v3. So this was the first of their v3 architecture, the 671b MOE.Micah [00:26:19]: And we were very, very impressed. That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of v3 and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with just extremely strong base model, completely open weights that we had as the best open weights model. So, yeah, that's the thing that you really see in the game. But I think that we got a lot of good feedback on Boxing Day. us on Boxing Day last year.George [00:26:48]: Boxing Day is the day after Christmas for those not familiar.George [00:26:54]: I'm from Singapore.swyx [00:26:55]: A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.George [00:27:20]: There's been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric to, we now call it output speed, and thenswyx [00:27:32]: throughput makes sense at a system level, so we took that name. Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into. 
Maybe so we can skip past all the... like, we have lots and lots of evals and stuff. The interesting ones to talk about today, that would be great to bring up, are a few of our recent things, I think, that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. So this one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models and to test hallucination by looking at, when the model doesn't know the answer, so is not able to get it correct, what's its probability of saying, I don't know, or giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say, I don't know, instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models and the labs creating them to get higher scores. And almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say, I don't know. So we did that for this one here.

swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.

George [00:29:31]: On that. And one reason that we didn't do that, or put that into this index, is that we think that the way to do that is not to ask the models how confident they are.

swyx [00:29:43]: I don't know. Maybe it might be though. You put in, like, a JSON field, say, confidence, and maybe it spits out something. Yeah.
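As a rough illustration of the scoring Micah describes above (plus one for a correct answer, minus one for an incorrect one, nothing for "I don't know", scaled to the range negative 100 to positive 100), here is a small sketch. The grade labels, function names, and the exact hallucination-rate formula are assumptions read off the conversation, not the published Artificial Analysis definition.

```python
from collections import Counter

# Assumed grade labels for each question: "correct", "incorrect", "abstained".


def omniscience_style_score(grades):
    """Score in [-100, 100]: +1 per correct, -1 per incorrect, 0 per abstention."""
    if not grades:
        raise ValueError("no graded questions")
    counts = Counter(grades)
    return 100.0 * (counts["correct"] - counts["incorrect"]) / len(grades)


def hallucination_rate(grades):
    """Of the questions the model did not get right, the share where it gave
    a wrong answer instead of saying it did not know."""
    counts = Counter(grades)
    not_correct = counts["incorrect"] + counts["abstained"]
    if not_correct == 0:
        return 0.0
    return counts["incorrect"] / not_correct
```

Under this scheme a model that guesses on everything and is right half the time scores 0, while one that answers the same questions correctly and abstains on the rest scores 50, which is the incentive shift described in the conversation.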
You know, we have done a few evals podcasts over the years. And when we did one with Clémentine of Hugging Face, who maintains the open source leaderboard, this was one of her top requests, which is some kind of hallucination slash lack of confidence calibration thing. And so, hey, this is one of them.

Micah [00:30:05]: And I mean, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. Like, one of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not really measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one here specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking it down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.

swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. And yeah. Would that be the other way around in a normal capability environment? I don't know.
What's, what do you make of that?George [00:31:37]: One interesting aspect is that we've found that there's not really a, not a strong correlation between intelligence and hallucination, right? That's to say that the smarter the models are in a general sense, isn't correlated with their ability to, when they don't know something, say that they don't know. It's interesting that Gemini three pro preview was a big leap over here. Gemini 2.5. Flash and, and, and 2.5 pro, but, and if I add pro quickly here.swyx [00:32:07]: I bet pro's really good. Uh, actually no, I meant, I meant, uh, the GPT pros.George [00:32:12]: Oh yeah.swyx [00:32:13]: Cause GPT pros are rumored. We don't know for a fact that it's like eight runs and then with the LM judge on top. Yeah.George [00:32:20]: So we saw a big jump in, this is accuracy. So this is just percent that they get, uh, correct and Gemini three pro knew a lot more than the other models. And so big jump in accuracy. But relatively no change between the Google Gemini models, between releases. And the hallucination rate. Exactly. And so it's likely due to just kind of different post-training recipe, between the, the Claude models. Yeah.Micah [00:32:45]: Um, there's, there's driven this. Yeah. You can, uh, you can partially blame us and how we define intelligence having until now not defined hallucination as a negative in the way that we think about intelligence.swyx [00:32:56]: And so that's what we're changing. Uh, I know many smart people who are confidently incorrect.George [00:33:02]: Uh, look, look at that. That, that, that is very humans. Very true. And there's times and a place for that. I think our view is that hallucination rate makes sense in this context where it's around knowledge, but in many cases, people want the models to hallucinate, to have a go. Often that's the case in coding or when you're trying to generate newer ideas. 
One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.
swyx [00:33:32]: And is it sort of like a HumanEval type, or something different, like a FrontierMath type?
George [00:33:37]: It's not dissimilar to FrontierMath. These are research questions that academics in the physics world would be able to answer, but that models really struggle to answer. So the top score here is only 9%.
swyx [00:33:51]: And the people that created this, like Minway and, actually, Ofir, who was behind SWE-bench. And what organization is this? Oh, it's Princeton.
George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the models up in terms of temperature, as high a temperature as they can, where they're trying to explore new ideas in physics with the model as a thought partner, just because they want the models to hallucinate. Yeah, sometimes it's something new. Yeah, exactly.
swyx [00:34:21]: So not right in every situation, but I think it makes sense to test hallucination in scenarios where it makes sense. Also, the obvious question is, this is one of many. Every lab has a system card that shows some kind of hallucination number, and you've chosen to not endorse that, and you've made your own. And I think that's a choice. Totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun. You provide it as a service. Here, you have to fight the, well, who are we to do this?
And your answer is that we have a lot of customers and, you know, but like, I guess, how do you converge the individual?
Micah [00:35:08]: I mean, I think for hallucination specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. We've called this the Omniscience hallucination rate; we're not trying to declare that it's, like, humanity's last hallucination. You could have some interesting naming conventions and all this stuff. The bigger-picture answer to that, and it's something that I actually wanted to mention just as George was explaining Critical Point as well, is that as we go forward, we are building evals internally, and we're partnering with academia and with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with the idea that everything we do has to be done entirely within our own team. Critical Point is a cool example, where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies. Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great datasets in the past that we've used to great success independently. And so between all of those techniques, we're going to be releasing more stuff in the future. Cool.
swyx [00:36:26]: Let's cover the last couple. And then I want to talk about your trends analysis stuff, you know? Totally.
Micah [00:36:31]: So actually, I have one little factoid on Omniscience.
If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks the total parameter count of models more closely than anything else that we measure. That makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.
swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. I hear all sorts of numbers. I don't know what to trust.
Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, where we've got all the open-weights models, you can squint and see that the leading frontier models right now are likely quite a lot bigger than the one trillion parameters that the open-weights models cap out at, the ones that we're looking at here. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, and 6 trillion for Grok 5, but that's not out yet. Take those together, have a look. You might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.
swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? As long as they can serve it at a sustainable cost, that's about it.
Like, yeah, totally.
George [00:38:17]: They've also got different incentives in play compared to open-weights model makers, who are thinking about supporting others in self-deployment. For the labs who are doing inference at scale, I think it's less about total parameters in many cases when thinking about inference costs, and more about the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.
Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously, if you're a developer or company using these things, exactly as you say, it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at the cost to run the index and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.
swyx [00:38:56]: It's not as good for the content-creator rumor mill, where I can say: oh, GPT-4 is this small circle, look, GPT-5 is this big circle. That used to be a thing for a while. Yeah.
Micah [00:39:07]: But that is on its own actually a very interesting one, right? It's just that chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in the total size of the models, especially with the upcoming hardware generations. Yes.
swyx [00:39:29]: So, you know, taking off my shitposting face for a minute. Yes. Yes. At the same time, I do feel like, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out. And therefore we need to start exploring at least a different path. GDPval, I think it's only like a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool.
And you have your own version.
George [00:39:59]: It's a fantastic dataset. Yeah.
swyx [00:40:01]: And maybe we'll recap for people who are still out of it. It's like 44 tasks based on some kind of GDP cutoff that's meant to represent broad white-collar work that is not just coding. Yeah.
Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions and, for a lot of them, some input files. Within the 44, it's divided into like 220, 225 maybe, subtasks, which are the level that we run through the agent. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at, largely because in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for that task. And so I think the easiest way to think about them is that they're like quite hard take-home exam tasks that you might do in an interview process.
swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF, go nuts and answer this question.
George [00:41:06]: OpenAI released a great dataset, and they released a good paper which looks at performance across the different web chatbots on the dataset. It's a great paper; I encourage people to read it. What we've done is taken that dataset and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the dataset, and then we developed an evaluator approach to compare outputs. That's AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences.
One data point there is that even though it's the evaluator, Gemini 3 Pro, interestingly, doesn't actually do that well. So that's kind of a good example of what we've done in GDPval-AA.
swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM-as-judge is self-preference, that models usually prefer their own output, and in this case, it did not. Totally.
Micah [00:42:08]: I think the places where it makes sense to use an LLM-as-judge approach now are quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that (and MT-Bench was a great project that was a good example of some of this a while ago) was about judging conversations and a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search and the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading it, we're running it through a pipeline to extract visual and text versions of the files and provide them to Gemini, and we're providing the criteria for the task and getting it to pick which one, out of two potential outcomes, more effectively meets the criteria of the task. Yeah. It turns out that we proved that it's just very, very good at getting that right, matching human preference a lot of the time, because I think it's got the raw intelligence, but that's combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and the fact that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.
swyx [00:43:26]: Got it. Why is this an Elo?
And not a percentage, like GDPval?
George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks.
swyx [00:43:43]: What task is that?
George [00:43:45]: I mean, it's in the dataset. Like be a YouTuber? It's a marketing video.
Micah [00:43:49]: Oh, wow. What? Like the model has to go find clips on the internet and try to put them together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code editor. I mean, the computer stuff doesn't work quite well enough and so on and so on, but yeah.
George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out percentage correct. It's hard to come up with correct or incorrect there. And so it's on a relative basis. We use an Elo approach to compare outputs from each of the models across the tasks.
swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give it an Elo, so you have a human in there. I think what's helpful about GDPval, the OpenAI one, is that 50% is meant to be a normal human, and maybe a domain expert is higher than that, but 50% was the bar for, like, well, if you've crossed 50, you are superhuman. Yeah.
Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as Elo is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky looking at these exact tasks compared to human performance, because the way that you would go about it as a human is quite different to how the models would go about it. Yeah.
swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there.
Is that like just one last, like...
Micah [00:45:20]: Well, no, no, no, it is the best model released by Meta. And so it makes it into the homepage default set, still, for now.
George [00:45:31]: The other inclusion that's quite interesting is that we also ran it across the latest versions of the web chatbots. And so we have...
swyx [00:45:39]: Oh, that's right.
George [00:45:40]: Oh, sorry.
swyx [00:45:41]: Yeah, I completely missed that. Okay.
George [00:45:43]: No, not at all. So those are the ones with a checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude 4.5 Opus using the Claude web chatbot, it performs worse than the model in our agentic harness. And in every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.
swyx [00:46:13]: Oh, my backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it for something.
Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, you have a cost goal. We let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah. Yeah. That was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as hundreds of models.
swyx [00:46:38]: Well, I don't know, talk to Browserbase. They'll automate it for you. You know, I have thought about, like, well, we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.
Micah [00:46:53]: And that's grown a huge amount over the last year, right? Like the tools.
The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the number of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.
swyx [00:47:10]: What tools and what data connections come to mind when you say that? What's interesting, what's notable work that people have done?
Micah [00:47:15]: Oh, okay. So my favorite example on this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work. So for me, that's Google Drive, OneDrive, our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on it. The things that I find most impressive currently, that I am somewhat surprised work really well in late 2025, are that I can have models use the Supabase MCP to query (read-only, of course), run a whole bunch of SQL queries to do pretty significant data analysis and make charts and stuff, and that they can read my Gmail and my Notion. And okay. You actually use that. That's good. Is that a Claude thing? To various degrees, both ChatGPT and Claude right now. I would say that this stuff barely works, in fairness, right now.
George [00:48:33]: Because people are actually going to try this after they hear it.
If you get an email from Micah, odds are it wasn't written by a chatbot.
Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.
swyx [00:48:46]: And so you can feel it, right? And yeah, this time next year, we'll come back and see where it's going. Totally. Supabase, shout out, another famous Kiwi. I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.
George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users. And we probably do some things more manually than we should in Supabase, and the support line has been super friendly. One extra point regarding GDPval-AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, our reference harness that we built actually works quite well on generalist agentic tasks. This proves it, in a sense. And so the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?
Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDPval, we give it a tool to view an image specifically, because the models can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context, we had to give them a custom tool. But yeah, exactly. You can explain an expert. No.
George [00:50:21]: So it turned out that we created a good generalist agentic harness. And so we released that on GitHub yesterday. It's called Stirrup.
So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.
Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code and the coding agents can work with it super well.
swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. I think in other similar environments, the Terminal-Bench guys have done sort of the Harbor thing. And so it's a bundle of, well, we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. So I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?
George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host benchmarks of Terminal-Bench on Artificial Analysis. We've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough, and they've gotten good enough tools, that they can perform better when just given a minimalist set of tools and left to run, letting the model control the agentic workflow, rather than using another framework that's a bit more built out and tries to dictate the flow. Awesome.
swyx [00:51:56]: Let's cover the Openness Index and then let's go into the report stuff. So that's the last of the proprietary AA numbers, I guess. I don't know how you sort of classify all these. Yeah.
Micah [00:52:07]: Or let's call it the last of the three new things that we're talking about from the last few weeks.
Because, I mean, we do a mix of stuff: places where we're using open source, where we open source what we do, and proprietary stuff that we don't always open source. The long-context reasoning dataset last year, we did open source. And then with all of the work on performance benchmarks across the site, some of them we're looking to open source, but some of them we're constantly iterating on, and so on. So there's a huge mix, I would say, of stuff that is open source and not across the site. So that's LCR, for people. Yeah, yeah.
swyx [00:52:41]: But let's talk about open...
Micah [00:52:42]: Let's talk about the Openness Index. This here is, call it, a new way to think about how open models are. We have for a long time tracked whether the models are open weights and what the licenses on them are. And that's pretty useful. That tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important and that we haven't tracked until now, and that's how much is disclosed about how the model was made. So transparency about data, pre-training data and post-training data, and whether you're allowed to use that data, and transparency about methodology and training code. So basically, those are the components. We bring them together to score an Openness Index for models so that you can, in one place, get this full picture of how open models are.
swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this does matter. I don't know what the numbers mean. Is there a max number? Is this out of 20?
George [00:53:44]: It's out of 18 currently. We've got an Openness Index page, but essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18.
So AI2, with their extremely open OLMo 3 32B Think model, is the leader in a sense.
swyx [00:54:04]: It's Hugging Face.
George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to run, we need to get the intelligence benchmarks right to get it on the site.
swyx [00:54:12]: You can't have an openness index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, the RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb. FineWeb.
Micah [00:54:23]: Yeah, yeah, no, totally. Yep. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company's contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we do the Intelligence Index on, on the site. And it's just an extra view to understand.
swyx [00:54:43]: Can you scroll down to this? The trade-offs chart. Yeah, yeah. That one. Yeah. This really matters, right? Obviously, because you can b