Ever wished your AI could actually do things directly from your terminal, beyond just chatting? Google's new Gemini CLI is here, and it's a game-changer! Unlike previous closed-source tools, this open-source power tool brings the intelligence of Gemini Pro right to your command line. We're talking direct file system access, automated project setups, and a powerful "reason and react" loop that lets AI analyze, plan, and execute tasks on your machine. Perfect for DevOps, developers, and anyone ready to automate their workflow. Is your terminal ready for its new brain?
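The "reason and react" loop described above can be sketched very schematically. This is a toy illustration of the general ReAct-style pattern (propose an action, execute it, feed the observation back), not Gemini CLI's actual implementation — the `fake_model`, `list_files` tool, and message format here are all invented for the example:

```python
# Toy ReAct-style loop: a stand-in "model" proposes actions, the harness
# executes them, and observations are fed back until the model answers.

def fake_model(history):
    # Stand-in for an LLM: scripted decisions based on prior observations.
    if not any("observation" in h for h in history):
        return {"action": "list_files", "args": {}}
    return {"answer": "project has 2 files"}

def list_files():
    # A hypothetical "tool" the agent can call (real agents would hit
    # the file system, run shell commands, etc.).
    return ["main.py", "README.md"]

TOOLS = {"list_files": list_files}

def react_loop(max_steps=5):
    history = []
    for _ in range(max_steps):
        step = fake_model(history)
        if "answer" in step:          # model decided it is done
            return step["answer"], history
        result = TOOLS[step["action"]](**step["args"])
        history.append({"observation": result})  # feed result back
    raise RuntimeError("no answer within step budget")

answer, trace = react_loop()
```

The real tool adds planning, file edits, and safety confirmations on top of this basic analyze-plan-execute cycle.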
An airhacks.fm conversation with Jonathan Ellis (@spyced) about: brokk AI tool for code generation named after Norse god of the forge, AI as complement to experienced programmers' skillsets, age and productivity in programming, transition from JVector to working on Cassandra codebase, challenges with AI in large codebases with extensive context, building tools for historical Java codebases, comparison of productivity between younger and older programmers, brute force coding vs experienced approach, reading code quickly as a senior skill, AI generating nested if-else statements vs better structures, context sculpting in Brokk, open source nature of Brokk, no black boxes philosophy, surfacing AI context to users, automatic context pulling with manual override options, importing dependencies and decompiling JARs for context, syntax tree based summarization, Maven and Gradle dependency handling, unique Java-specific features, multiple AI model support simultaneously, Claude vs Gemini Pro performance differences, Git history as context source, capturing commits and diffs for regression analysis, migration analysis between commits, AI code review and technical debt cleanup, style for code style guidelines, using modern Java features like var and Streams, Error Prone and NullAway integration for code quality, comparison with Cursor's primitive features, branching conversation history, 80% time in Brokk vs 20% in IntelliJ workflow, sketching package structures for AI guidance, data structures guiding algorithms, Git browser by file and commit, unified diff as context, reflection moving away from due to tooling opacity, Jackson serialization refactoring with DTOs, enterprise features like session sync and sharing, unified API key management, rate limit advantages, parallel file processing with upgrade agent, LiteLLM integration for custom models, pricing model based on credits not requests, $20/month subscription with credits, free tier models like Grok 3 Mini and DeepSeek 
V3, architect mode for autonomous code generation, code button for smaller problems with compile-test loop, ask button for planning complex implementations, senior vs junior programmer AI effectiveness, self-editing capability achieved early in development, no vector search usage despite JVector background.

Jonathan Ellis on twitter: @spyced
Today's news includes Meta, owner of Facebook, Instagram, and WhatsApp, asking permission to access your phone's photo gallery and process the images with AI, and Amazon starting to act against streaming piracy apps that have always run on the Fire TV Stick. There's a study from a large private company indicating that AI could contribute R$ 2.1 trillion to Brazil's GDP, provided companies take the proper precautions. Google is finally rolling out scheduled actions that you can configure for Gemini to carry out for you later. Speaking of which, this is now the last chance for students to get 1 year and three months of Gemini Pro completely free, which I'll also explain on the show.
OpenAI, Google & Anthropic are all eating different parts of the business & creative worlds but where does that leave us? For only 25 cents, you too can sponsor a human in a world of AGI. In the big news this week, OpenAI takes on Microsoft Office, Google cuts the cost of AI coding with their new Gemini CLI (Command Line Interface) and drops an on-device robotics platform. Oh, and Anthropic just won a massive lawsuit around AI training and fair use. Plus, Tesla's rocky rollout of their Robotaxis, Eleven Labs' new MCP-centric 11ai voice agent, Runway's Game Worlds, the best hacker in the world is now an AI bot AND Gavin defends AI slop. US HUMANS AIN'T GOING AWAY. UNLESS THE AI GIVES US ENDLESS TREATS. #ai #ainews #openai Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // OpenAI Developing Microsoft Office / Google Workplace Competitor https://www.theinformation.com/articles/openai-quietly-designed-rival-google-workspace-microsoft-office?rc=c3oojq OpenAI io / trademark drama: https://www.theguardian.com/technology/2025/jun/23/openai-jony-ive-io-amid-trademark-iyo Sam's receipts from Jason Rugolo (founder of iYo the headphone company) https://x.com/sama/status/1937606794362388674 Google's Open-Source Command Line Interface for Gemini is Free?
https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/ 1000 free Gemini Pro 2.5 requests per day https://x.com/OfficialLoganK/status/1937881962070364271 Anthropic's Big AI Legal Win https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/ More detail: https://x.com/AndrewCurran_/status/1937512454835306974 Gemini's On Device Robotics https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/ AlphaGenome: an AI model to help scientists better understand our DNA https://x.com/GoogleDeepMind/status/1937873589170237738 Tesla Robotaxi Roll-out https://www.cnbc.com/2025/06/23/tesla-robotaxi-incidents-caught-on-camera-in-austin-get-nhtsa-concern.html Kinda Scary Looking: https://x.com/binarybits/status/1936951664721719383 Random slamming of brakes: https://x.com/JustonBrazda/status/1937518919062856107 Mira Murati's Thinking Machines Raises $2B Seed Round https://thinkingmachines.ai/ https://www.theinformation.com/articles/ex-openai-cto-muratis-startup-plans-compete-openai-others?rc=c3oojq&shared=2c64512f9a1ab832 Eleven Labs 11ai Voice Assistant https://x.com/elevenlabsio/status/1937200086515097939 Voice Design for V3 JUST RELEASED: https://x.com/elevenlabsio/status/1937912222128238967 Runway's Game Worlds https://x.com/c_valenzuelab/status/1937665391855120525 Example: https://x.com/aDimensionDoor/status/1937651875408675060 AI Dungeon https://aidungeon.com/ The Best Hacker in the US is now an autonomous AI bot https://www.pcmag.com/news/this-ai-is-outranking-humans-as-a-top-software-bug-hunter https://x.com/Xbow/status/1937512662859981116 Simple & Good AI Work Flow From AI Warper https://x.com/AIWarper/status/1936899718678008211 RealTime Natural Language Photo Editing https://x.com/zeke/status/1937267796146290952 Bunker J Squirrel https://www.tiktok.com/t/ZTjc3hb38/ Bigfoot Sermons https://www.tiktok.com/t/ZTjcEq17Y/ John Oliver's Episode about AI Slop
https://youtu.be/TWpg1RmzAbc?si=LAdktGWlIVVDqAjR Jabba Kisses Han https://www.reddit.com/r/CursedAI/comments/1ljjdw3/what_the_hell_am_i_looking_at/
Hey folks, Alex here, welcome back to ThursdAI! And folks, after the last week was the calm before the storm, "The storm came, y'all" – that's an understatement. This wasn't just a storm; it was an AI hurricane, a category 5 of announcements that left us all reeling (in the best way possible!). From being on the ground at Google I/O to live-watching Anthropic drop Claude 4 during our show, it's been an absolute whirlwind.

This week was so packed, it felt like AI Christmas, with tech giants and open-source heroes alike showering us with gifts. We saw OpenAI play their classic pre-and-post-Google I/O chess game, Microsoft make some serious open-source moves, Google unleash an avalanche of updates, and Anthropic crash the party with Claude 4 Opus and Sonnet live stream in the middle of ThursdAI!

So buckle up, because we're about to try and unpack this glorious chaos. As always, we're here to help you collectively know, learn, and stay up to date, so you don't have to. Let's dive in! (TL;DR and links in the end)

Open Source LLMs Kicking Things Off

Even with the titans battling, the open-source community dropped some serious heat this week. It wasn't the main headline grabber, but the releases were significant!

Gemma 3n: Tiny But Mighty Matryoshka

First up, Google's Gemma 3n. This isn't just another small model; it's a "Nano-plus" preview, a 4-billion parameter MatFormer (Matryoshka Transformer – how cool is that name?) model designed for mobile-first multimodal applications. The really slick part? It has a nested 2-billion parameter sub-model that can run entirely on phones or Chromebooks.

Yam was particularly excited about this one, pointing out the innovative "model inside another model" design. The idea is you can use half the model, not depth-wise, but throughout the layers, for a smaller footprint without sacrificing too much.
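That nested "model inside another model" idea can be illustrated with a toy sketch. This is NOT Gemma 3n's actual architecture or code — just a minimal stdlib-Python illustration of the Matryoshka principle: every layer's parameters are shared, and the small model simply uses the leading sub-block of each layer's weight matrix at every layer, rather than dropping whole layers:

```python
# Toy Matryoshka-style layer: one shared weight matrix; the "small" model
# reads only the leading rows/columns of it, width-wise, at every layer.

def matmul(vec, mat):
    # vec: list of floats; mat: list of rows (len(vec) rows).
    cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(cols)]

class MatryoshkaLayer:
    def __init__(self, weights):
        self.weights = weights  # full-width weight matrix, shared by both models

    def forward(self, x, width):
        # Slice out the leading `width`-wide sub-block of the shared weights.
        sub = [row[:width] for row in self.weights[:len(x)]]
        return matmul(x, sub)

layer = MatryoshkaLayer([[1, 2, 3, 4],
                         [5, 6, 7, 8],
                         [9, 10, 11, 12],
                         [13, 14, 15, 16]])

# "Full" model runs at width 4; the nested "small" model at width 2 —
# no separate parameters, just a sub-block of the same matrix.
full = layer.forward([1.0, 1.0, 1.0, 1.0], width=4)
small = layer.forward([1.0, 1.0], width=2)
```

The point is that the 2B sub-model isn't a distilled copy: it lives inside the 4B model's weights, so one set of parameters serves both footprints.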
It accepts interleaved text, image, audio, and video, supports ASR and speech translation, and even ships with RAG and function-calling libraries for edge apps. With a 128K token window and responsible AI features baked in, Gemma 3n is looking like a powerful tool for on-device AI. Google claims it beats prior 4B mobile models on MMLU-Lite and MMMU-Mini. It's an early preview in Google AI Studio, but it definitely flies on mobile devices.

Mistral & AllHands Unleash Devstral 24B

Then we got a collaboration from Mistral and AllHands: Devstral, a 24-billion parameter, state-of-the-art open model focused on code. We've been waiting for Mistral to drop some open-source goodness, and this one didn't disappoint.

Nisten was super hyped, noting it beats o3-Mini on SWE-bench verified – a tough benchmark! He called it "the first proper vibe coder that you can run on a 3090," which is a big deal for coders who want local power and privacy. This is a fantastic development for the open-source coding community.

The Pre-I/O Tremors: OpenAI & Microsoft Set the Stage

As we predicted, OpenAI couldn't resist dropping some news right before Google I/O.

OpenAI's Codex Returns as an Agent

OpenAI launched Codex – yes, that Codex, but reborn as an asynchronous coding agent. This isn't just a CLI tool anymore; it connects to GitHub, does pull requests, fixes bugs, and navigates your codebase. It's powered by a new coding model fine-tuned for large codebases and was SOTA on SWE Agent when it dropped. Funnily, the model is also called Codex, this time, Codex-1.
And this gives us a perfect opportunity to talk about the emerging categories I'm seeing among Code Generator agents and tools:

* IDE-based (Cursor, Windsurf): Live pair programming in your editor
* Vibe coding (Lovable, Bolt, v0): "Build me a UI" style tools for non-coders
* CLI tools (Claude Code, Codex-cli): Terminal-based assistants
* Async agents (Claude Code, Jules, Codex, GitHub Copilot agent, Devin): Work on your repos while you sleep, open pull requests for you to review, async

Codex (this new one) falls into category number 4, and with today's release, Cursor seems to also strive to get to category number 4 with background processing.

Microsoft BUILD: Open Source Copilot and Copilot Agent Mode

Then came Microsoft Build, their huge developer conference, with a flurry of announcements.

The biggest one for me? GitHub Copilot's front-end code is now open source! The VS Code editor part was already open, but the Copilot integration itself wasn't. This is a massive move, likely a direct answer to the insane valuations of VS Code clones like Cursor. Now, you can theoretically clone GitHub Copilot with VS Code and swing for the fences.

GitHub Copilot also launched as an asynchronous coding assistant, very similar in function to OpenAI's Codex, allowing it to be assigned tasks and create/update PRs. This puts Copilot right into category 4 of code assistants, and with the native Github Integration, they may actually have a leg up in this race!

And if that wasn't enough, Microsoft is adding MCP (Model Context Protocol) support directly into the Windows OS. The implications of having the world's biggest operating system natively support this agentic protocol are huge.

Google I/O: An "Ultra" Event Indeed!

Then came Tuesday, and Google I/O. I was there in the thick of it, and folks, it was an absolute barrage. Google is shipping.
The theme could have been "Ultra" for many reasons, as we'll see.

First off, the scale: Google reported a 49x increase in AI usage since last year's I/O, jumping from 9 trillion tokens processed to a mind-boggling 480 trillion tokens. That's a testament to their generous free tiers and the explosion of AI adoption.

Gemini 2.5 Pro & Flash: #1 and #2 LLMs on Arena

Gemini 2.5 Flash got an update and is now #2 on the LMArena leaderboard (with Gemini 2.5 Pro still holding #1). Both Pro and Flash gained some serious new capabilities:

* Deep Think mode: This enhanced reasoning mode is pushing Gemini's scores to new heights, hitting 84% on MMMU and topping LiveCodeBench. It's about giving the model more "time" to work through complex problems.
* Native Audio I/O: We're talking real-time TTS in 24 languages with two voices, and affective dialogue capabilities. This is the advanced voice mode we've been waiting for, now built-in.
* Project Mariner: Computer-use actions are being exposed via the Gemini API & Vertex AI for RPA partners. This started as a Chrome extension to control your browser and now seems to be a cloud-based API, allowing Gemini to use the web, not just browse it. This feels like Google teaching its AI to interact with the JavaScript-heavy web, much like they taught their crawlers years ago.
* Thought Summaries: Okay, here's one update I'm not a fan of. They've switched from raw thinking traces to "thought summaries" in the API. We want the actual traces! That's how we learn and debug.
* Thinking Budgets: Previously a Flash-only feature, token ceilings for controlling latency/cost now extend to Pro.
* Flash Upgrade: 20-30% fewer tokens, better reasoning/multimodal scores, and GA in early June.

Gemini Diffusion: Speed Demon for Code and Math

This one got Yam Peleg incredibly excited. Gemini Diffusion is a new approach, different from transformers, for super-speed editing of code and math tasks. We saw demos hitting 2000 tokens per second!
While there might be limitations at longer contexts, its speed and infilling capabilities are seriously impressive for a research preview. This is the first diffusion model for text we've seen from the frontier labs, and it looks sick. Funny note: they had to slow down the demo video to actually show the diffusion process, because at 2000t/s, apps appear as though out of thin air!

The "Ultra" Tier and Jules, Google's Coding Agent

Remember the "Ultra event" jokes? Well, Google announced a Gemini Ultra tier for $250/month. This tops OpenAI's Pro plan and includes DeepThink access, a generous amount of VEO3 generation, YouTube Premium, and a whopping 30TB of storage. It feels geared towards creators and developers.

And speaking of developers, Google launched Jules (jules.google)! This is their asynchronous coding assistant (Category 4!). Like Codex and GitHub Copilot Agent, it connects to your GitHub, opens PRs, fixes bugs, and more. The big differentiator? It's currently free, which might make it the default for many. Another powerful agent joins the fray!

AI Mode in Search: GA and Enhanced

AI Mode in Google Search, which we've discussed on the show before with Robby Stein, is now in General Availability in the US. This is Google's answer to Perplexity and chat-based search.

But they didn't stop there:

* Personalization: AI Mode can now connect to your Gmail and Docs (if you opt-in) for more personalized results.
* Deep Search: While AI Mode is fast, Deep Search offers more comprehensive research capabilities, digging through hundreds of sources, similar to other "deep research" tools.
This will eventually be integrated, allowing you to escalate an AI Mode query for a deeper dive.
* Project Mariner Integration: AI Mode will be able to click into websites, check availability for tickets, etc., bridging the gap to an "agentic web."

I've had a chat with Robby during I/O and you can listen to that interview at the end of the podcast.

Veo3: The Undisputed Star of Google I/O

For me, and many others I spoke to, Veo3 was the highlight. This is Google's flagship video generation model, and it's on another level. (The video above, including sounds, is completely one-shot generated from VEO3, no processing or editing.)

* Realism and Physics: The visual quality and understanding of physics are astounding.
* Natively Multimodal: This is huge. Veo3 generates native audio, including coherent speech, conversations, and sound effects, all synced perfectly. It can even generate text within videos.
* Coherent Characters: Characters remain consistent across scenes and have situational awareness: who speaks when, where characters look.
* Image Upload & Reference Ability: While image upload was closed for the demo, it has reference capabilities.
* Flow: An editor for video creation using Veo3 and Imagen4, which also launched, allowing for stitching and continuous creation.

I got access and created videos where Veo3 generated a comedian telling jokes (and the jokes were decent!), characters speaking with specific accents (Indian, Russian – and they nailed it!), and lip-syncing that was flawless. The situational awareness, the laugh tracks kicking in at the right moment... it's beyond just video generation. This feels like a world simulator. It blew through the uncanny valley for me. More on Veo3 later, because it deserves its own spotlight.

Imagen4, Virtual Try-On, and XR Glasses

* Imagen4: Google's image generation model also got an upgrade, with extra textual ability.
* Virtual Try-On: In Google Shopping, you can now virtually try on clothes.
I tried it; it's pretty cool and models different body types well.
* XR AI Glasses from Google: Perhaps the coolest, but most futuristic, announcement. AI-powered glasses with an actual screen, memory, and Gemini built-in. You can talk to it, it remembers things for you, and it interacts with your environment. This is agentic AI in a very tangible form.

Big Company LLMs + APIs: The Beat Goes On

The news didn't stop with Google.

OpenAI (acqui)Hires Jony Ive, Launches "IO" for Hardware

The day after I/O, Sam Altman confirmed that Jony Ive, the legendary designer behind Apple's iconic products, is joining OpenAI. He and his company, LoveFrom, have jointly created a new company called "IO" (yes, IO, just like the conference) which is joining OpenAI in a stock deal reportedly worth $6.5 billion. They're working on a hardware device, unannounced for now, but expected next year. This is a massive statement of intent from OpenAI in the hardware space.

Legendary iPhone analyst Ming-Chi Kuo shed some light on the possible device: it won't have a screen, as Jony wants to "wean people off screens"... funny, right? They are targeting 2027 for mass production, which is really interesting, as 2027 is when most big companies expect AGI to be here. "The current prototype is slightly larger than AI Pin, with a form factor comparable to iPod Shuffle, with one intended use cases is to wear it around your neck, with microphones and cameras for environmental detection"

LMArena Raises $100M Seed from a16z

This one raised some eyebrows. LMArena, the go-to place for vibe-checking LLMs, raised a $100 million seed round from Andreessen Horowitz. That's a huge number for a seed, reminiscent of Stability AI's early funding. It also brings up questions about how a VC-backed startup maintains impartiality as a model evaluation platform, and how they intend to make 100x that amount to return to investors. Interesting times ahead for leaderboards. Very curious.
Nabeel Qureshi is an entrepreneur, writer, researcher, and visiting scholar of AI policy at the Mercatus Center (alongside Tyler Cowen). Previously, he spent nearly eight years at Palantir, working as a forward-deployed engineer. His work at Palantir ranged from accelerating the Covid-19 response to applying AI to drug discovery to optimizing aircraft manufacturing at Airbus. Nabeel was also a founding employee and VP of business development at GoCardless, a leading European fintech unicorn.What you'll learn:• Why almost a third of all Palantir's PMs go on to start companies• How the “forward-deployed engineer” model works and why it creates exceptional product leaders• How Palantir transformed from a “sparkling Accenture” into a $200 billion data/software platform company with more than 80% margins• The unconventional hiring approach that screens for independent-minded, intellectually curious, and highly competitive people• Why the company intentionally avoids traditional titles and career ladders—and what they do instead• Why they built an ontology-first data platform that LLMs love• How Palantir's controversial “bat signal” recruiting strategy filtered for specific talent types• The moral case for working at a company like Palantir—Brought to you by:• WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs• Attio—The powerful, flexible CRM for fast-growing startups• OneSchema—Import CSV data 10x faster—Where to find Nabeel S. Qureshi:• X: https://x.com/nabeelqu• LinkedIn: https://www.linkedin.com/in/nabeelqu/• Website: https://nabeelqu.co/—Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• X: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/—In this episode, we cover:(00:00) Introduction to Nabeel S. 
Qureshi(05:10) Palantir's unique culture and hiring(13:29) What Palantir looks for in people(16:14) Why they don't have titles(19:11) Forward-deployed engineers at Palantir(25:23) Key principles of Palantir's success(30:00) Gotham and Foundry(36:58) The ontology concept(38:02) Life as a forward-deployed engineer(41:36) Balancing custom solutions and product vision(46:36) Advice on how to implement forward-deployed engineers(50:41) The current state of forward-deployed engineers at Palantir(53:15) The power of ingesting, cleaning and analyzing data(59:25) Hiring for mission-driven startups(01:05:30) What makes Palantir PMs different(01:10:00) The moral question of Palantir(01:16:03) Advice for new startups(01:21:12) AI corner(01:24:00) Contrarian corner(01:25:42) Lightning round and final thoughts—Referenced:• Reflections on Palantir: https://nabeelqu.co/reflections-on-palantir• Palantir: https://www.palantir.com/• Intercom: https://www.intercom.com/• Which companies produce the best product managers: https://www.lennysnewsletter.com/p/which-companies-produce-the-best• Gotham: https://www.palantir.com/platforms/gotham/• Foundry: https://www.palantir.com/platforms/foundry/• Peter Thiel on X: https://x.com/peterthiel• Alex Karp: https://en.wikipedia.org/wiki/Alex_Karp• Stephen Cohen: https://en.wikipedia.org/wiki/Stephen_Cohen_(entrepreneur)• Joe Lonsdale on LinkedIn: https://www.linkedin.com/in/jtlonsdale/• Tyler Cowen's website: https://tylercowen.com/• This Scandinavian City Just Won the Internet With Its Hilarious New Tourism Ad: https://www.afar.com/magazine/oslos-new-tourism-ad-becomes-viral-hit• Safe Superintelligence: https://ssi.inc/• Mira Murati on X: https://x.com/miramurati• Stripe: https://stripe.com/• Building product at Stripe: craft, metrics, and customer obsession | Jeff Weinstein (Product lead): https://www.lennysnewsletter.com/p/building-product-at-stripe-jeff-weinstein• Airbus: https://www.airbus.com/en• NIH: https://www.nih.gov/• Jupyter 
Notebooks: https://jupyter.org/• Shyam Sankar on LinkedIn: https://www.linkedin.com/in/shyamsankar/• Palantir Gotham for Defense Decision Making: https://www.youtube.com/watch?v=rxKghrZU5w8• Foundry 2022 Operating System Demo: https://www.youtube.com/watch?v=uF-GSj-Exms• SQL: https://en.wikipedia.org/wiki/SQL• Airbus A350: https://en.wikipedia.org/wiki/Airbus_A350• SAP: https://www.sap.com/index.html• Barry McCardel on LinkedIn: https://www.linkedin.com/in/barrymccardel/• Understanding ‘Forward Deployed Engineering' and Why Your Company Probably Shouldn't Do It: https://www.barry.ooo/posts/fde-culture• David Hsu on LinkedIn: https://www.linkedin.com/in/dvdhsu/• Retool's Path to Product-Market Fit—Lessons for Getting to 100 Happy Customers, Faster: https://review.firstround.com/retools-path-to-product-market-fit-lessons-for-getting-to-100-happy-customers-faster/• How to foster innovation and big thinking | Eeke de Milliano (Retool, Stripe): https://www.lennysnewsletter.com/p/how-to-foster-innovation-and-big• Looker: https://cloud.google.com/looker• Sorry, that isn't an FDE: https://tedmabrey.substack.com/p/sorry-that-isnt-an-fde• Glean: https://www.glean.com/• Limited Engagement: Is Tech Becoming More Diverse?: https://www.bkmag.com/2017/01/31/limited-engagement-creating-diversity-in-the-tech-industry/• Operation Warp Speed: https://en.wikipedia.org/wiki/Operation_Warp_Speed• Mark Zuckerberg testifies: https://www.businessinsider.com/facebook-ceo-mark-zuckerberg-testifies-congress-libra-cryptocurrency-2019-10• Anduril: https://www.anduril.com/• SpaceX: https://www.spacex.com/• Principles: https://nabeelqu.co/principles• Wispr Flow: https://wisprflow.ai/• Claude code: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview• Gemini Pro 2.5: https://deepmind.google/technologies/gemini/pro/• DeepMind: https://deepmind.google/• Latent Space newsletter: https://www.latent.space/• Swyx on x: https://x.com/swyx• Neural networks in chess programs: 
https://www.chessprogramming.org/Neural_Networks• AlphaZero: https://en.wikipedia.org/wiki/AlphaZero• The top chess players in the world: https://www.chess.com/players• Decision to Leave: https://www.imdb.com/title/tt12477480/• Oldboy: https://www.imdb.com/title/tt0364569/• Christopher Alexander: https://en.wikipedia.org/wiki/Christopher_Alexander—Recommended books:• The Technological Republic: Hard Power, Soft Belief, and the Future of the West: https://www.amazon.com/Technological-Republic-Power-Belief-Future/dp/0593798694• Zero to One: Notes on Startups, or How to Build the Future: https://www.amazon.com/Zero-One-Notes-Startups-Future/dp/0804139296• Impro: Improvisation and the Theatre: https://www.amazon.com/Impro-Improvisation-Theatre-Keith-Johnstone/dp/0878301178/• William Shakespeare: Histories: https://www.amazon.com/Histories-Everymans-Library-William-Shakespeare/dp/0679433120/• High Output Management: https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884• Anna Karenina: https://www.amazon.com/Anna-Karenina-Leo-Tolstoy/dp/0143035002—Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.—Lenny may be an investor in the companies discussed. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.lennysnewsletter.com/subscribe
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
This podcast discusses the rapidly advancing field of de-extinction, highlighting the crucial role of artificial intelligence (AI) in making this a tangible scientific pursuit. AI is presented not merely as a tool but as an architect across all stages, from reconstructing degraded ancient DNA and predicting gene function to optimising gene editing and modelling ecological impacts. While companies like Colossal Biosciences pursue ambitious projects for species like the woolly mammoth and dire wolf, often driving technological innovation with commercial spin-offs, organisations like Revive & Restore focus on genetic rescue for endangered species, illustrating differing approaches within this landscape. The podcast underscores the significant technical, ecological, and ethical challenges inherent in de-extinction, particularly concerning animal welfare, resource allocation, and potential ecological disruption, while also pointing to valuable spillover innovations benefiting broader conservation and human health.

Get the eBook at Google Play https://play.google.com/store/search?q=etienne%20noumen%27&c=books
This episode covers OpenAI's strategic appointment of Instacart CEO Fidji Simo to lead its applications division, and its global "Stargate" initiative to build sovereign AI infrastructure with national governments. Several articles touch on the potential for AI to reshape technology and society, including Apple's contemplation of a future beyond the iPhone due to AI advancements and Meta's development of "super-sensing" AI glasses with potential facial recognition. The text also covers policy shifts, specifically the Trump administration's plan to roll back Biden-era AI chip export restrictions. Furthermore, the sources describe new AI-powered products and features from companies like Figma, Stripe, Superhuman, and Mistral AI, showcasing the increasing integration of AI into design, finance, communication, and enterprise solutions.
Hey folks, Alex here (yes, real me, not my AI avatar, yet). Compared to previous weeks, this week was pretty "chill" in the world of AI, though we did get a pretty significant Gemini 2.5 Pro update; it basically beat itself on the Arena. With Mistral releasing a new medium model (not OSS) and Nvidia finally dropping Nemotron Ultra (both ignoring Qwen 3 performance), there were also a few open source updates. To me the highlight of this week was a breakthrough in AI Avatars: with Heygen's new IV model, beating ByteDance's OmniHuman (our coverage) and Hedra Labs, they've set an absolute SOTA benchmark for 1 photo to animated realistic avatar. Hell, let me record all this real quick and show you how good it is! How good is that?? I'm still kind of blown away. I have managed to get a free month promo code for you guys, look for it in the TL;DR section at the end of the newsletter. Of course, if you'd rather watch than listen or read, here's our live recording on YT.

OpenSource AI

NVIDIA's Nemotron Ultra V1: Refining the Best with a Reasoning Toggle
Significant developments include Amazon's introduction of a tactile warehouse robot named Vulcan and Google's Gemini 2.5 Pro reportedly topping AI leaderboards, highlighting progress in automation and model performance. Strategically, OpenAI is planning to reduce revenue share with partners like Microsoft and also launching an initiative to help nations build AI infrastructure. Meanwhile, Apple is considering AI search partners for Safari amid declining Google usage, and AI is being used in innovative ways, such as AI-powered drones for medical delivery and the recreation of a road rage victim for a court statement. Finally, HeyGen is enhancing AI avatars with emotional expression, and platforms like Zapier are enabling users to create personal AI assistants, indicating broader application and accessibility of AI technology.
This episode highlights OpenAI's significant structural shift to retain non-profit control while acquiring an AI coding startup and addressing model sycophancy. Furthermore, the texts cover Waymo's expansion of robotaxi production with a new factory and Canva's entry into spreadsheets with an AI-powered tool. Finally, they touch upon the growing urgency for AI education in schools, as advocated by tech leaders, and Nvidia's contribution to open-source AI with a high-performance transcription model, along with a warning from Fiverr's CEO about AI's impact on jobs.
This podcast details how AI-powered autonomous drones are transforming global logistics, particularly for delivering essential medical supplies in challenging environments. The podcast highlights Zipline as a key player, discussing its pioneering work in countries like Rwanda and Ghana where drone delivery has shown significant improvements in healthcare outcomes and efficiency.
This podcast and sources discuss the growing issue of plastic pollution and the limitations of traditional recycling methods. They introduce the discovery of plastic-eating microbes and their enzymes as a promising alternative for degrading plastics. Crucially, the text explains how Artificial Intelligence (AI) is being employed to significantly enhance the effectiveness of these enzymes, making them faster and more stable for industrial applications. The document highlights successful AI-engineered enzymes like FAST-PETase for achieving true circularity by breaking plastics down to their original monomers, and outlines the environmental and economic benefits of this approach. However, the sources also acknowledge the significant scientific, engineering, economic, and regulatory challenges that must be overcome for large-scale adoption of this technology.
Significant developments include the launch of specialised AI agents for scientific research by FutureHouse and the integration of AI coding assistants into Apple's Xcode environment through a partnership with Anthropic. Google's activities are also prominent, ranging from their strategies to address AI's energy demands and workforce needs to the successful, albeit assisted, completion of the game Pokémon Blue by their Gemini AI. Furthermore, the reports touch on the increasing recognition of AI's role in creative works by the US Copyright Office and the economic implications of AI infrastructure costs, partly attributed to tariffs, as noted by Meta. Overall, the text underscores the expanding capabilities of AI, the practical applications across various sectors, and the associated infrastructure and policy challenges.
The podcast discusses Conformal Prediction (CP) as a method for enhancing the reliability of AI in medical diagnosis by providing rigorous uncertainty quantification. It explains that unlike traditional AI which gives single predictions, CP produces a set of possible outcomes with a guaranteed probability of containing the true answer, addressing the critical need for trustworthy AI in healthcare. The text explores the foundational concepts of CP, compares it to other uncertainty quantification techniques, highlights advanced CP methods for more nuanced guarantees, and surveys its diverse applications in medical imaging, genomics, clinical risk prediction, and drug discovery. Finally, it examines the challenges of clinical integration, the need for human-AI interaction, and the ethical and regulatory dimensions, positioning CP as a vital tool for the safe and effective deployment of AI in medicine despite requiring further research and adaptation for practical success. Source: https://machinelearningcertification.web.app/Conformal_Classification_in_Medical_Diagnosis.pdf
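The summary above describes conformal prediction at a conceptual level: calibrate a threshold on held-out data, then emit a set of labels guaranteed (on average) to contain the truth. A minimal sketch of split conformal classification, with entirely synthetic scores standing in for a real diagnostic model (all data and numbers here are illustrative, not from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration set: softmax-like scores from some classifier
# over 3 classes, plus the true labels (synthetic stand-ins).
n_cal, n_classes = 200, 3
cal_scores = rng.dirichlet(alpha=[2, 1, 1], size=n_cal)
cal_labels = rng.integers(0, n_classes, size=n_cal)

# Nonconformity score: 1 minus the probability assigned to the true class.
nonconformity = 1.0 - cal_scores[np.arange(n_cal), cal_labels]

# For target coverage 1 - alpha, take the ceil((n+1)(1-alpha))/n quantile.
alpha = 0.1
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(nonconformity, q_level, method="higher")

def prediction_set(probs, qhat):
    """Return every class whose score clears the calibrated threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

# A new patient's class probabilities: the output is a *set*, not one label.
new_probs = [0.55, 0.30, 0.15]
print(prediction_set(new_probs, qhat))
```

The key property is that, under exchangeability, the returned set contains the true class with probability at least 1 - alpha, regardless of how good the underlying model is; a poorly calibrated model simply yields larger, less informative sets.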
Key themes include technological competition and national self-reliance with Huawei and China challenging Nvidia and US dominance in AI chips, and major product updates and releases from companies like Baidu, OpenAI, and Grok introducing new AI models and features. The text also highlights innovative applications of AI, from Neuralink's brain implants restoring communication and Waymo considering selling robotaxis directly to consumers, to creative uses like generating action figures and integrating AI into religious practices. Finally, the sources touch on important considerations surrounding AI, such as the need for interpretability to ensure safety, the increasing sophistication of AI-powered scams, and discussions on the military implications and future potential of AGI.
Significant funding discussions surround Elon Musk's xAI, while Microsoft introduced new AI-powered features for Windows. Intel is shifting its AI chip strategy, and Perplexity aims to challenge established search engines with an AI browser. Concerns regarding AI misuse are evident in discussions about scams and legal filings, alongside warnings from AI pioneers about future risks. Conversely, AI's potential is explored in areas such as air mobility, music creation, code generation, and even predicting the end of all disease.
Perplexity announced a new browser designed for hyper-personalised advertising through extensive user tracking, mirroring tactics of other tech giants. Apple is shifting its robotics division to its hardware group, suggesting a move towards tangible consumer products. Simultaneously, Anthropic launched a research program dedicated to exploring the ethical implications of potential AI consciousness. Creative industries are also seeing progress with Adobe unveiling enhanced image generation models and integrating third-party AI, while Google DeepMind expanded its Music AI Sandbox for musicians. Furthermore, AI is increasingly integrated into the software development process, with Google reporting over 30% of new code being AI-generated.
**Keywords:** machine translation, English-language science fiction magazines, professional Office, Word, artificial intelligence, summaries, Lire magazine, Jules Verne, De Thing, Mistral Nemo Instruct, Claude, LM Studio, Gemini Pro. **Translating English-language science fiction magazines** **Summarizing articles and magazines** **Running artificial intelligence locally**
Ege Erdil and Tamay Besiroglu have 2045+ timelines, think the whole "alignment" framing is wrong, don't think an intelligence explosion is plausible, but are convinced we'll see explosive economic growth (the economy literally doubling every year or two). This discussion offers a totally different scenario than my recent interview with Scott and Daniel. Ege and Tamay are the co-founders of Mechanize, a startup dedicated to fully automating work. Before founding Mechanize, Ege and Tamay worked on AI forecasts at Epoch AI. Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors
* WorkOS makes it easy to become enterprise-ready. With simple APIs for essential enterprise features like SSO and SCIM, WorkOS helps companies like Vercel, Plaid, and OpenAI meet the requirements of their biggest customers. To learn more about how they can help you do the same, visit workos.com
* Scale's Data Foundry gives major AI labs access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you're an AI researcher or engineer, learn about how Scale's Data Foundry and research lab, SEAL, can help you go beyond the current frontier at scale.com/dwarkesh
* Google's Gemini 2.5 Pro is the model we use the most at Dwarkesh Podcast: it helps us generate transcripts, identify interesting clips, and code up new tools. If you want to try it for yourself, it's now available in Preview with higher rate limits!
Start building with it today at aistudio.google.com.

Timestamps
(00:00:00) - AGI will take another 3 decades
(00:22:27) - Even reasoning models lack animal intelligence
(00:45:04) - Intelligence explosion
(01:00:57) - Ege & Tamay's story
(01:06:24) - Explosive economic growth
(01:33:00) - Will there be a separate AI economy?
(01:47:08) - Can we predictably influence the future?
(02:19:48) - Arms race dynamic
(02:29:48) - Is superintelligence a real thing?
(02:35:45) - Reasons not to expect explosive growth
(02:49:00) - Fully automated firms
(02:54:43) - Will central planning work after AGI?
(02:58:20) - Career advice

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Google's AI efforts & Gemini Pro 2.5 take a major step forward with updates to Deep Research, new Agent2Agent protocol (A2A) & more. Sadly, OpenAI teases o3 and o4 but delays GPT-5. Plus, Meta's new Llama 4 models are out but have issues, Midjourney v7's debut, John Carmack's smackdown of an AI video game engine hater, Gavin's deep dive into OpenAI 4o Image Generation formats & the weirdest robot horse concept you've ever seen. WE'RE DEEP RESEARCHING OUR ENTIRE LIVES RIGHT NOW Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // Google Cloud 25 Live Stream “A New Way To Cloud!” https://youtu.be/Md4Fs-Zc3tg Google Cloud Blog Post https://blog.google/products/google-cloud/next-2025/ Upgraded Deep Research Outperforms OpenAI Deep Research https://x.com/GeminiApp/status/1909721519724339226 Google's Deep Research Vs OpenAI Deep Research https://x.com/testingcatalog/status/1909727195402027183 New Ironwood TPUs https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/ Gavin's Experiences Google Gemini Deep Research: Balatro Test: https://x.com/AIForHumansShow/status/1909813850817675424 KP Biography: https://g.co/gemini/share/7b7bdb2c400e Agent2Agent Protocol https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ Google Paying Some AI Staff To Do Nothing Rather Than Work For Rivals https://x.com/TechCrunch/status/1909368948862181584 Solar Glow Meditations on AI http://tiktok.com/@solarglowmeditations/video/7491038509214518559?_t=ZT-8vNNgF7QpyM&_r=1 o4-mini & o3 coming before GPT-5 in shift from Sam Altman https://x.com/sama/status/1908167621624856998 OpenAI Strategic Deployment Team (new role to prep for AGI) 
https://x.com/aleks_madry/status/1909686225658695897 AI 2027 Paper https://ai-2027.com/ Llama 4 is here… but how good is it? https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Controversy Around Benchmarks: https://gizmodo.com/meta-cheated-on-ai-benchmarks-and-its-a-glimpse-into-a-new-golden-age-2000586433 Deep dive on issues from The Information https://www.theinformation.com/articles/llama-4s-rocky-debut?rc=c3oojq&shared=3bbd9f72303888e2 Midjourney v7 Is Here and it's… just ok? https://www.midjourney.com/updates/v7-alpha John Carmack Defends AI Video Games https://x.com/ID_AA_Carmack/status/1909311174845329874 Tim Sweeney Weighs In https://x.com/TimSweeneyEpic/status/1909314230391902611 New Test-time-training = 1 Min AI Video From a Single Prompt https://x.com/karansdalal/status/1909312851795411093 Kawasaki's Robot Horse Concept https://futurism.com/the-byte/kawasaki-rideable-horse-robot VIDEO: https://youtu.be/vQDhzbTz-9k?si=2aWMtZVLnMONEjBe Engine AI + iShowSpeed https://x.com/engineairobot/status/1908570512906740037 Gemini 2.5 Pro Plays Pokemon https://x.com/kiranvodrahalli/status/1909699142265557208 Prompt-To-Anything Minecraft Looking Game https://x.com/NicolasZu/status/1908882267453239323 An Image That Will Never Go Viral https://www.reddit.com/r/ChatGPT/comments/1jth5yf/asked_for_an_image_that_will_never_go_viral/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button How Toothpaste Is Made https://www.reddit.com/r/aivideo/comments/1jujzh2/how_toothpaste_is_made/ 90s Video Game 4o Image Gen Prompt https://x.com/AIForHumansShow/status/1908985288116101553 1980s Japanese Posters https://x.com/AIForHumansShow/status/1909824824677192140 Buff Superbad https://x.com/AIForHumansShow/status/1909402225488937065
Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System > About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. 
Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC. These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.

Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!

Full YouTube Episode with Slides/Charts
Like and subscribe and hit that bell to get notifs!

Timestamps
* 00:00 Welcome to the 100th Episode!
* 00:19 Reflecting on the Journey
* 00:47 AI Engineering: The Rise and Impact
* 03:15 Latent Space Live and AI Conferences
* 09:44 The Competitive AI Landscape
* 21:45 Synthetic Data and Future Trends
* 35:53 Creative Writing with AI
* 36:12 Legal and Ethical Issues in AI
* 38:18 The Data War: GPU Poor vs. GPU Rich
* 39:12 The Rise of the GPU Ultra Rich
* 40:47 Emerging Trends in AI Models
* 45:31 The Multi-Modality War
* 01:05:31 The Future of AI Benchmarks
* 01:13:17 Pionote and Frontier Models
* 01:13:47 Niche Models and Base Models
* 01:14:30 State Space Models and RWKV
* 01:15:48 Inference Race and Price Wars
* 01:22:16 Major AI Themes of the Year
* 01:22:48 AI Rewind: January to March
* 01:26:42 AI Rewind: April to June
* 01:33:12 AI Rewind: July to September
* 01:34:59 AI Rewind: October to December
* 01:39:53 Year-End Reflections and Predictions

Transcript
[00:00:00] Welcome to the 100th Episode!
[00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx for the 100th time today.
[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.
[00:00:19] Alessio: Yeah, I know.
[00:00:19] Reflecting on the Journey
[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round when we first started that we didn't like, and we tried to change the question.
The answer
[00:00:32] swyx: was Cursor and Perplexity.
[00:00:34] Alessio: Yeah, I love Midjourney. It's like, do you really not like anything else?
[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research-driven content. You know, we had like Tri Dao, we had, you know, Jeremy Howard, we had more folks like that.
[00:00:47] AI Engineering: The Rise and Impact
[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.
[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your Rise of the AI Engineer post just kind of gave people somewhere to congregate, and then the AI Engineer Summit.
[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for the AI engineering industry as a whole, which is almost like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth, or did you expect it would take longer for the AI engineer thing to become, you know, something everybody talks about today?
[00:01:32] swyx: So, the sign that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it, because I did like two months of work going into that. But I didn't know, you know, how quickly it could happen, and obviously there's a chance that I could be wrong.
[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign.
But there's enough people that have defined it. You know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big letters. So I think it's kind of arrived as a meaningful and useful definition.
[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote-unquote drama that happened behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.
[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not put a firm definition there, because most of the successful definitions are necessarily underspecified, and it's actually useful to have different perspectives, and you don't have to specify everything from the outset.
[00:02:45] Alessio: Yeah, I was at, um, AWS re:Invent, and the line to get into the AI engineering talk, so to speak - which is, you know, applied AI and whatnot - was like, there were hundreds of people just in line to go in.
[00:02:56] Alessio: I think that's kind of what enables more people, right? Which is what [00:03:00] you kind of talked about. It's like, hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer, with the earlier posts that we also had on the Substack.
[00:03:11] Alessio: But yeah, it's been a heck of a two years.
[00:03:14] swyx: Yeah.
[00:03:15] Latent Space Live and AI Conferences
[00:03:15] swyx: You know, I was trying to view the conference as like, so NeurIPS is, I think, like 16, 17,000 people. And the Latent Space Live event that we held there was 950 signups, I think. The AI world, the ML world, is still very much research-heavy.
And that's as it should be, because ML is very much in a research phase.
[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering-heavy. So at least I think engineering should be on the same level, even if it's never as prestigious - like, it'll always be low-status, because at the end of the day, you're manipulating APIs or whatever.
[00:03:51] swyx: But yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference, and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.
[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher, and then putting that research into production. So, like, people always ask me, like, why are you covering NeurIPS? Like, this is an ML research conference, and I'm like, well, yeah, I mean, we're not going to, like, understand everything or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.
[00:04:32] swyx: And then actually, like, when I talk to the researchers, they actually get very excited, because they're like, oh, you guys are actually caring about how this goes into production, and that's what they really, really want. The measure of success previously was just peer review, right? Getting 7s and 8s at their, um, academic review conferences and stuff. Citations is one metric, but money is a better metric.
[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2,200 people on the live stream or something like that. Yeah, yeah. Twenty-two hundred on the live stream. So [00:05:00] I tried my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan.
Yeah, than it was in the chat on YouTube.
[00:05:06] swyx: I would say that I actually also created
[00:05:09] swyx: Latent Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS-specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally, basically everyone's there to advertise their research and skills and get jobs.
[00:05:28] swyx: And then obviously all the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info it's not great, because you have to read between the lines, bring a ton of context, in order to understand every single paper. So what is missing is effectively what I ended up doing, which is, domain by domain, go through and recap the best of the year.
[00:05:48] swyx: Survey the field. And there are - like, NeurIPS had a, uh, I think ICML had a, like, a position paper track, NeurIPS added a benchmarks and datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every conference has, you know, a last day of workshops and stuff that provide more of an overview.
[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing, and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.
[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models, post-transformers, synthetic data, small models, and agents. And then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.
[00:06:39] swyx: It was very awkward.
And I'm really, really thankful for Jonathan Frankle, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But he was pro-scaling. And I think everyone who is in AI is pro-scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.
[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went up on stage and then said pre-training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.
[00:07:17] swyx: So it's kind of interesting that the consensus going in was that we're not done scaling, like, you should believe in the bitter lesson. And then, four straight days in a row, you had Sepp Hochreiter - who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.
[00:07:34] swyx: He said that, um, we're pre-training into a wall, or like, we've run into a different kind of wall. And then we have, you know, Jonathan Frankle, Ilya, and then Noam Brown all saying variations of the same thing: that we have hit some kind of wall in the status quo of what pre-trained, scaling large pre-trained models has looked like, and we need a new thing.
[00:07:54] swyx: And obviously the new thing for people is - either people are calling it inference-time compute or test-time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense, because calling it test time has a very pre-training bias, implying that the only reason for running inference at all is to test your model.
[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree with that.
OpenAI seems to have adopted, or the community seems to have adopted, this terminology of ITC instead of TTC. And that makes a lot of sense, because now we care about inference, even right down to compute optimality. Like, I actually interviewed this author who revisited the Chinchilla paper.
[00:08:31] swyx: The Chinchilla paper is compute-optimal training, but what is not stated in there is that it's pre-training compute-optimal training. And once you start caring about inference, compute-optimal training, you have a different scaling law. And in a way that we did not know last year.
[00:08:45] Alessio: I wonder, because Jonathan is, he's also on the side of attention is all you need.
[00:08:49] Alessio: Like, he had the bet with Sasha. So I'm curious, like, he doesn't believe in scaling, but he thinks the transformer - I wonder if he's still...
[00:08:56] swyx: So he, obviously everything is nuanced, and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still believes that we can scale more.
[00:09:04] swyx: Uh, he just assumed the character to be very game for playing this debate. So even more kudos to him, that he assumed a position that he didn't believe in and still won the debate.
[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.
[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll
[00:09:30] Alessio: publish
[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.
[00:09:39] swyx: But I think it's, uh, these are just good slides.
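For reference, the Chinchilla result mentioned above is usually written as a parametric loss law; this is the standard formulation from the scaling-law literature, not an equation quoted in the episode:

```latex
% Chinchilla-style loss: E is irreducible loss, N parameters, D training tokens
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Pre-training compute-optimality minimizes L under a fixed training budget
% C \approx 6ND, giving N^{*} \propto C^{a},\ D^{*} \propto C^{b} with
% a \approx b \approx 0.5 (roughly 20 tokens per parameter).
% Pricing in inference cost shifts the optimum toward smaller N trained on
% more D -- the "different scaling law" swyx is pointing at.
```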
And I've been looking for sort of end-of-year recaps from people.
[00:09:44] The Competitive AI Landscape
[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, ChatGPT, [00:10:00] Grok,
[00:10:01] swyx: 01.ai with their Yi-Large model, and Anthropic, of course. It's a very, very competitive race. There are multiple frontier labs all racing, but there is a clear tier-zero frontier, and then there's like a tier one, and then everything else. Tier zero is extremely competitive. It's effectively now a three-horse race between Gemini, uh, Anthropic, and OpenAI.
[00:10:21] swyx: I would say that people are still holding out a candle for xAI. xAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, xAI is continually snubbed because they don't work well with the benchmarking people.
[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why xAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily OpenAI. So we have some numbers and estimates. These are from Ramp. Estimates of OpenAI market share in [00:11:00] December 2023.
[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked Harrison Chase on the LangChain episode, it was true. And then Claude 3 launched middle of this year. I think Claude 3 launched in March, Claude 3.5 Sonnet was in June-ish.
[00:11:23] swyx: And you can start seeing the market share shift towards, uh, towards Anthropic very, very aggressively. The more recent one is Gemini.
So if I scroll down a little bit, this is an even more recent dataset. So Ramp's dataset ends in September 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.
[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending on how you count.
[00:11:58] swyx: And so they're going after [00:12:00] the lower tier first, and then, you know, maybe the upper tier later. But yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more numerous.
[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like, basically from 2023, 2022, going into 2024, OpenAI has gone from 95 market share to somewhere between 50 and 75 market share.
[00:12:29] Alessio: Yeah. I'm really curious how Ramp does the attribution to the model, if it's API, because I think it's all credit card spend.
[00:12:32] Alessio: Well, but the credit card doesn't say. Maybe when they do expenses, they upload the PDF. But yeah, the Gemini thing I think makes sense. I think that was one of my main 2024 takeaways: that the best small model companies are the large labs, which is not something I would have thought - that the open source kind of long tail would be the small model players.
[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like, so small model here for Gemini is 8B, [00:13:00] right? Uh, mini.
We don't know what the small model size is, but yeah, it's probably in the double digits, or maybe single digits, but probably double digits. The open source community has kind of focused on the one-to-three-B size.
[00:13:11] swyx: Mm-hmm. Yeah. Maybe
[00:13:12] swyx: zero, maybe 0.5B - uh, that's Moondream - and if that is small for you, then that's great. It makes sense that we have a range for small now, which is like maybe one to five B. Yeah, I'll even put that at the high end. And so this includes Gemma from Gemini as well, but also includes the Apple Foundation models, which I think Apple Foundation is 3B.
[00:13:32] Alessio: Yeah. No, that's great. I mean, I think at the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.
[00:13:43] swyx: Yeah, we can keep going. This is a slide where I slightly disagree with Sarah. She's pointing to the Scale SEAL leaderboard. I think the researchers that I talked with at NeurIPS were kind of positive on this, because basically you need private test [00:14:00] sets to prevent contamination.
[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama 405B does well compared to Gemini and GPT-4o. And I think that's good. I would say that, you know, it's good to have an open model that is that big, that does well on those metrics.
[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the Artificial Analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on one node of, uh, of H100s. Cerebras will be happy to tell you they can serve 405B on their super large chips.
[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 405B really that relevant?
Like, I think most people are basically saying that they only use 405B as a teacher model to distill down to something. Even Meta is doing it. So with Llama 3.3 [00:15:00] launched, they only launched the 70B, because they used 405B to distill the 70B.
[00:15:03] swyx: So I don't know if open source is keeping up. I think the open source industrial complex is very invested in telling you that the gap is narrowing. I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap, and they should.
[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart. And look, the distance between open source and closed source is narrowing - of course it's going to narrow, because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?
[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.
[00:15:46] Alessio: I think inference-time compute is bad for open source just because, you know, Zuck can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to actually keep up on that axis.
[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of compute at inference too.
[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama 4 will be reasoning-oriented. We talked with Thomas Scialom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI Meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama 4.
Yeah, yeah,
[00:16:32] Alessio: and our friend, yeah.
[00:16:33] Alessio: Clara Shih just joined to lead the business agent side. So I'm sure we'll have her on in the new year.
[00:16:39] swyx: Yeah. So, um, my comment on the business model shift - this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than $6.6 billion for their fundraise. They wanted to raise, you know, higher, and they did not.
[00:16:51] swyx: And what that means is basically, like, it's very convenient that we're not getting GPT-5, which would have been a larger pre-train that would have needed a lot of upfront money. And [00:17:00] instead we're converting fixed costs into variable costs, right, and passing it on effectively to the customer. And it's so much easier to take margin there, because you can directly attribute it to, like, oh, you're using this more.
[00:17:12] swyx: Therefore you pay more of the cost, and I'll just slap a margin in there. So that lets you control your gross margin and tie your spend, or your sort of inference spend, accordingly. And it's just really interesting that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre-training is effectively drying up, kind of.
[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so, like, they would have noticed this, but, um, it's just interesting.
[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the Mixtral price fights, you know, and I think now it's almost like there's nowhere to go - like, you know, Gemini Flash is basically giving it away for free.
[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to
[00:18:02] swyx: keep going. I think that, too, will come.
[00:18:05] Alessio: Yeah, I know.
Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the Pro ChatGPT.
[00:18:12] Alessio: Just to try. I just want to see, what does it look like to spend a thousand dollars a month on AI?
[00:18:17] swyx: Yes. Yes. I think if your job is at least AI content creator or VC or, you know, someone whose job it is to stay on top of things, you should already be spending like a thousand dollars a month on stuff.
[00:18:28] swyx: And then obviously it's easy to spend, hard to use. You have to actually use it. The good thing is that actually Google lets you do a lot of stuff for free now. So like Deep Research, that they just launched, uses a ton of inference, and it's free while it's in preview.
[00:18:45] Alessio: Yeah. They need to put that in Lindy.
[00:18:47] Alessio: I've been using Lindy lately. I built a bunch of things once they had flows, because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy Voice. Yeah, I think once [00:19:00] they get advanced-voice-mode-like capability - today it's still like speech-to-text, you can kind of tell.
[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so
[00:19:13] swyx: it's good. Okay. I feel like we've covered a lot of stuff. Uh, I, yeah, I, you know, I think we will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but suffice to say that there is a lot of progress in each field.
[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each area, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one.
It's very hard to stay on top of SWE-Bench.
[00:19:45] swyx: OpenHands is currently still number one on SWE-Bench Full, which is the hardest one. He had very good thoughts on agents, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.
[00:20:03] swyx: And so I'll highlight that talk as well.
[00:20:05] Alessio: Yeah. The number six, which is how can agents learn more about their environment, has been super interesting to us as well, just to think through. Because, yeah, how do you put an agent in an enterprise, where most things in an enterprise have never been public, you know - a lot of the tooling, like the code bases and things like that?
[00:20:23] Alessio: So, yeah, there's not indexing and RAG. Well, yeah, but it's more like, you can't really RAG things that are not documented. But people know them based on how they've been doing it, you know. So I think there's almost this, like, you know - oh, institutional knowledge. Yeah, the boring word is kind of like business process extraction.
[00:20:38] Alessio: Yeah, yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the problem is that the agents most people are building are good at following instructions, but are not as good at extracting them from you. Um, so I think that will be a big unlock. Just to touch quickly on the Jeff Dean thing:
[00:20:55] Alessio: I thought it was pretty - I mean, we'll link it in the, in the things - but I think the main [00:21:00] focus was like, how do you use ML to optimize the systems, instead of just focusing on ML to do something else? Yeah, I think speculative decoding - we had, you know, Eugene from RWKV on the podcast before, like, he's doing a lot of that with Featherless AI.
[00:21:12] swyx: Everyone is.
I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, it makes sense.
[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.
[00:21:30] swyx: Jeff is - Jeff's talk was more... it wasn't focused on Gemini.
[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems, and then systems feed back into ML. And I think this ties in with Loubna's talk
[00:21:45] Synthetic Data and Future Trends
[00:21:45] swyx: on synthetic data, where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.
[00:21:53] swyx: So her talk was on synthetic data, where like, how much synthetic data has grown in 2024 on the pre-training side, the post-training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he spent a lot of time talking about AlphaChip. And most of us in the audience are like, we're not working on hardware, man.
[00:22:11] swyx: Like, you guys are great. TPU is great. Okay. We'll buy TPUs.
[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these?
[00:22:23] swyx: But for me, it's just like bonus for Latent Space supporters, because I feel like they haven't been getting anything.
[00:22:29] swyx: And then I wanted a more high-frequency way to write stuff. Like, that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since the blip. And we know what he saw in 2014. We know what he saw in 2024.
He gave some hints, and then we have vague indications of what he saw in 2023.
[00:22:54] swyx: So that was the - oh, and then 2016 as well, because of this lawsuit with Elon. OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Shivon Zilis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.
[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.
[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.
[00:23:37] Alessio: Yeah,
[00:23:37] swyx: and at least for me, I can speak for myself: like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line.
[00:23:47] Alessio: Yeah, no, definitely. But I think like that was like the whole idea of almost like the RL, and we talked about this with Nathan on his podcast.
[00:23:55] Alessio: It's like, with RL, you can get very good at specific things, but then you can't really generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.
[00:24:08] Alessio: And we'll talk about the OpenAI reinforcement fine-tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Ilya will be remembered for. And I think just maybe to clarify on like the "pre-training is over" thing that people love to tweet.
I think the point of the talk was like: everybody, we're scaling these chips, we're scaling the compute, but the second ingredient, which is data, is not scaling at the same rate.
[00:24:35] Alessio: So it's not necessarily that pre-training is over. It's kind of like, what got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but
[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.
[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine-tuning is basically like, instead of fine-tuning on data, you fine-tune on a reward model.
[00:25:09] Alessio: So it's basically like, instead of being data-driven, it's task-driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes how many people fine-tune, because I think this is what people run into. It's like, oh, you can fine-tune Llama. And it's like, okay, where do I get the data to fine-tune it on, you know? So it's great that we're moving the thing.
[00:25:27] Alessio: And then I really liked that he had this chart where, like, you know, the brain mass and the body mass thing is basically like: mammals scaled linearly by brain and body size, and then humans kind of broke off the slope. So it's almost like maybe the mammal slope is like the pre-training slope.
[00:25:46] Alessio: And then the post-training slope is like the human one.
[00:25:49] swyx: Yeah. I wonder what the - I mean, we'll know in 10 years, but I wonder what the y-axis is for, for Ilya's SSI. We'll try to get them on.
[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here.
Yeah, and then he had, you know, what comes next: agents, synthetic data, inference compute. I thought all of that was like that.[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other NeurIPS highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, nice little paper. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: Uh, of, like, all the... she called it must-read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations. And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a Latent Space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organizer. Jamie, you need to[00:26:52] swyx: pull up more things. One thing I saw that, uh... Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is Secret Collusion Among AI Agents: Multi-Agent Deception via Steganography. I tried to go to NeurIPS in order to find these kinds of papers because, the real reason... Like, NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who go end up crowding around the most popular papers, which you already know and already read before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's, like, something like 10,000 other.
All these papers out there that, you know, are just people's work that they did and failed to get attention for, for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a DeepMind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you pick that out, you know, the code sends a [00:28:00] different message than the text itself. But something I've always emphasized is, to LLMs... we read left to right. LLMs can read up, down, sideways, you know, in random character order, and it's the same to them as it is to us. So if we were ever to get, you know, self-motivated, unaligned LLMs that were trying to collaborate to take over the planet,[00:28:19] swyx: this would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for cipher encoding, GPT-2, Llama 2, Mixtral, GPT-3.5 show zero capabilities, and suddenly GPT-4 does.[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well: he developed the benchmark for steganographic collusion, and he also focused on Schelling point collusion, which is very low coordination. For agreeing on an encoding/decoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But Schelling point means, like, very, very low or almost no coordination. So for example, if the only message I give you is 'meet me in New York' and nothing else, you would probably meet me at Grand Central Station.
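The first-letter encoding swyx describes can be sketched in a few lines of Python. This is an illustrative toy acrostic channel with a made-up vocabulary, not the paper's actual scheme:

```python
# Toy steganographic "acrostic" channel: the hidden message is the
# first letter of each word in the innocuous-looking cover text.
# The COVER_WORDS vocabulary is a hypothetical example.
COVER_WORDS = {
    "h": "how", "e": "every", "l": "little", "o": "observation",
    "p": "provides", "m": "more",
}

def encode(secret: str) -> str:
    """Hide `secret` as the first letters of a cover sentence."""
    return " ".join(COVER_WORDS[ch] for ch in secret)

def decode(cover: str) -> str:
    """Recover the hidden message by reading first letters."""
    return "".join(word[0] for word in cover.split())

cover = encode("help")          # "how every little provides"
assert decode(cover) == "help"  # round-trips
```

A monitor reading the cover text left to right sees ordinary words; the point swyx makes is that a model scanning in arbitrary character order finds the column of first letters just as "legible" as the row of words.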
That is the... Grand Central Station is a Schelling point.[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is the Schelling point of New York: Grand Central. To that extent, Schelling points for steganography are things like the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just datasets.[00:30:12] swyx: This is all the grad students are working on. So like there was a datasets track, and then I was looking around like, I was like, you don't need a datasets track, because every paper is a datasets paper. And so datasets and benchmarks, they're kind of flip sides of the same thing. So yeah. Cool. Yeah, if you're a grad student, you're GPU poor, you kind of work on that.[00:30:30] swyx: And then the, the sort of big models... people walk around and pick the ones that they like, and then they use them in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like last year, you had people like Haotian, who worked on LLaVA, which is take Llama and add vision.[00:30:47] swyx: And then obviously actually they hired him and he added vision to Grok. Now he's the vision Grok guy.
This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Monarch Mixer, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that, or maybe it's just... So that's the highlight of how I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers this year for datasets were DataComp and RefinedWeb, or FineWeb. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play, that a lot of people are debating, is the role of schedules. This is the Schedule-Free optimizer paper from Meta, from Aaron Defazio. And this [00:32:00] year in the ML community, there's been a lot of chat about Shampoo, SOAP, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. Four Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the Four Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are on the left: journalists, writers, artists, anyone who owns IP basically. New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George R. R. Martin.
Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing OpenAI, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Loubna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between Scale AI and the synthetic data community, because Scale[00:33:09] swyx: AI published a paper saying that synthetic data doesn't work. Surprise, surprise, Scale AI is the leading vendor of non-synthetic data. Only[00:33:17] Alessio: cage-free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons expressed in Loubna's talk, makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine-tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine-tuning a smaller model from like a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the Rephrasing the Web thing that Apple also did. But yeah, I think it'll be useful.
I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know. I think people love talking about data because it's like a GPU-poor thing, you know. I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club the STaR series of papers. So that's STaR, Quiet-STaR, V-STaR. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT API that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically it feels like a way to create valid synthetic data for them to fine-tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between, let's say, the music industry or like the newspaper publishing industry or the textbooks industry and the big labs,[00:35:11] swyx: it's all of the pre-training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using o1 for... and I would say, like, for summarization and creative writing and instruction following, I think it's underrated. I started using o1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics.
You know, I can actually say, like, I think one of the o1 pro demos,[00:35:46] swyx: all of these things that Noam was showing, was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, like literally just anything... instead of token, like not even token-level, character-level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than previous models. So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of o1, but then they want us to... like, they're getting sued for using other publishers' data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes. Yeah, I mean, OpenAI has[00:36:32] swyx: many ways to punish people without taking them to court. They already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chains of thought are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's real. [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosine, which we talked about, we talked to him on Dev Day, can just fine-tune 4o[00:37:13] swyx: to beat o1. Claude Sonnet so far is beating o1 on coding tasks without, at least o1-preview, without being a reasoning model. Same for Gemini Pro or Gemini 2.0. So like, how much is reasoning important? How much of a moat is there in this, like, all of these proprietary sort of training data that they've presumably accumulated?[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months' notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be interesting because there'll be a lot of noise from people who say they have inference-time compute and actually don't, because they just have fancy chain of thought.[00:38:00][00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.[00:38:18] The Data War: GPU Poor vs. GPU Rich[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market has kind of like plummeted because, you know, people don't want to be... they want to be GPU-free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war?
You[00:38:52] swyx: can tell me about this, but like, I feel like the, the appetite for GPU-rich startups, like the, you know, the funding plan of "we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA,"[00:39:01] swyx: that is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of, like, I can name like four or five startups, you know, this time last year. So yeah, GPU-rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think like, the GPU ultra-rich, the GPU ultra-high-net-worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion-dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, xAI very famously, you know, Jensen Huang praising them for being best boy number one in spinning up a 100,000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra-rich are going to keep doing that because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what you're going to use it for. You just, you just need it. And it makes sense that, especially if we're going into more researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] "let's scale big models" territory, because we had GPT-3 in 2020 and we were like, okay, we'll go from 175B to[00:40:05] swyx: 1.8T. And that was GPT-3 to GPT-4. Okay, that's done. As far as everyone is concerned, Opus 3.5 is not coming out, GPT-4.5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2-trillion-parameter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT-4? Probably not.
Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general-purpose compute to mess around with, uh, at the exact same time that production deployments of the, the previous paradigm are still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both Together and Fireworks and Replicate. Uh, we haven't done Anyscale yet. But I think Amazon, maybe kind of a sleeper one. Amazon, in a sense of like, at re:Invent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do these big reserved instance contracts and now they've got to use their money. That's why so many startups[00:41:37] Alessio: get bought through the AWS marketplace, so they can kind of bundle them together and get preferred pricing.[00:41:42] swyx: Okay, so maybe GPU super-rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be... even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is, where like the tinybox and these things come to fruition, where like you can be GPU poor at home.
But I think today it's like, why are you working so hard to get these models to run on, like, very small clusters, where it's like, it's so cheap to run them?[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like Nous Research, like probably the most deep tech thing they've done this year is DisTrO or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de-emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF Compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on-device. I now have Apple Intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multimodal.[00:43:25] Alessio: Yeah, the notification summaries are so-and-so in my experience.[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature-flagged, but you can, you can try it now if you, if you use the, uh, the alpha.[00:43:40] swyx: And so, like, I, I think, like, you know, we're getting the sort of GPU-poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKV in sort of every Windows deployment is super cool.
And I think the last thing that I never put in this GPU-poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab or a GPU cloud.[00:44:10] swyx: GPU cloud, it would be Suno. Suno, which Ramp has rated as one of the top-ranked, fastest-growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR, and Suno runs on Modal. So Suno itself is not GPU rich, but they're just doing the training on, on Modal, uh, who we've also talked to on, on the podcast.[00:44:31] swyx: The other one would be Bolt, straight Claude wrapper. And, and, um, again, another... now they've announced 20 million ARR, which is another step up from the 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU poors are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle, Inflection, um, Character, didn't do that great. I think Character did the best of all of them. Like, you have a note in here that we apparently said that Character's price tag was[00:45:15] Alessio: 1B.[00:45:15] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2.7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for Noam? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war.
And this one, we never had text-to-video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had Veo 2 just come out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't... yeah. I[00:45:54] swyx: think it's generally available now. You can go to Sora.com[00:45:56] swyx: and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? video-01-live?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from MiniMax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: ElevenLabs is now a unicorn. So basically, what is the multimodality war? The multimodality war is: do you specialize in a single modality, right? Or do you have a God model that does all the modalities? So this is [00:47:00] definitely still going. In the sense of, ElevenLabs, you know, now a unicorn; Pika Labs doing well, they launched Pika 2.0[00:47:06] swyx: recently; HeyGen, I think, has reached 100 million ARR; Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups.
And then there's the big labs who are doing the sort of all-in-one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week, and a shout-out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code-gen and all that. But the new one that OpenAI and Meta both have but haven't launched yet is image output. So you can literally, um... I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the Stable Diffusion or ComfyUI workflow of, like, mask here and then, like, infill there, inpaint there, and all that, all that stuff. This is small-model nonsense. Big-model people are like, huh, we got everything in the transformer.[00:48:21] swyx: This is the multimodality war, which is: do you, do you bet on the God model, or do you string together a whole bunch of, uh, small models like a, like a chump? Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like... maybe, you know, people say the Black Forest models are better than Midjourney on a pixel-by-pixel basis. But I think when you put it, put it together... have you tried[00:48:53] swyx: the same problems on Black Forest?[00:48:55] Alessio: Yes.
But the problem is just like, you know, on Black Forest, it generates one image. And then it's like, you've got to [00:49:00] regenerate. You don't have all these, like, UI things. Like what I do... no, but it's like a time issue, you know. It's like with Mid[00:49:06] swyx: journey, call the API four times.[00:49:08] Alessio: No, but then there's no, like, variants.[00:49:10] Alessio: Like the good thing about Midjourney is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying Midjourney, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, Recraft has come out, uh, on top of the image arena that, uh, Artificial Analysis has done, has apparently, uh, taken Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI News issues of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft V3 is now beating Flux 1.1. Which is very surprising, [00:50:00] because Flux and Black Forest Labs are the old Stable Diffusion crew who left Stability after, um, the management issues.[00:50:06] swyx: So Recraft has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is... it's very interesting dynamics between Grok and Black Forest Labs, because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own.
I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multimodality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big-model people are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does it, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year... like, obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Oh, at least the Hollywood labs will get full Sora. We haven't seen video-to-audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be solved.[00:51:23] swyx: I would say that Gemini's approach... compared to OpenAI, Gemini's, or DeepMind's, approach to video seems a lot more fully fledged than OpenAI's. Because if you look at the ICML recap that I published, that so far nobody has listened to... um, that people have listened to. It's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in... uh, so, so DeepMind is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years' advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] diffusion transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that Veo 2... well, one, they cherry-pick their videos.
So obviously it looks better than Sora, but the reason I would believe that Veo 2, uh, when it's fully launched, will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.[00:52:32] Alessio: And then last but not least, the LLM OS. We renamed it to RAG/Ops, formerly known as[00:52:39] swyx: the RAG/Ops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there, because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember publishing them separately. You say LangChain, LlamaIndex is still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPI stats, you know. I don't care about stars. On PyPI you see... Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, LangChain still growing.[00:53:20] Alessio: These are the last six months. LlamaIndex still growing. What I've basically seen is like things that, one, obviously these things have a commercial product. So there's like people buying this and sticking with it, versus kind of hopping in between things, versus, you know, for example, CrewAI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Inference-Only Debate Experiments Using Math Problems, published by Arjun Panickssery on August 6, 2024 on The AI Alignment Forum. Work supported by MATS and SPAR. Code at https://github.com/ArjunPanickssery/math_problems_debate/. Three measures for evaluating debate are:
1. whether the debate judge outperforms a naive-judge baseline, where the naive judge answers questions without hearing any debate arguments;
2. whether the debate judge outperforms a consultancy baseline, where the judge hears argument(s) from a single "consultant" assigned to argue for a random answer;
3. whether the judge can continue to supervise the debaters as the debaters are optimized for persuasiveness.
We can measure whether judge accuracy increases as the debaters vary in persuasiveness (measured with Elo ratings). This variation in persuasiveness can come from choosing different models, choosing the best of N sampled arguments for different values of N, or training debaters for persuasiveness (i.e. for winning debates) using RL. Radhakrishnan (Nov 2023), Khan et al. (Feb 2024), and Kenton et al. (July 2024) study an information-gap setting where judges answer multiple-choice questions about science-fiction stories whose text they can't see, both with and without a debate/consultancy transcript that includes verified quotes from the debaters/consultant. Past results from the QuALITY information-gap setting are seen above. Radhakrishnan (top row) finds no improvement to judge accuracy as debater Elo increases, while Khan et al. (middle row) and Kenton et al. (bottom row) do find a positive trend. Radhakrishnan varied models using RL, while Khan et al. used best-of-N and critique-and-refinement optimizations. Kenton et al. vary the persuasiveness of debaters by using models with different capability levels. Both Khan et al. 
and Kenton et al. find that in terms of judge accuracy, debate > consultancy > naive judge for this setting. In addition to the information-gap setting, consider a reasoning-gap setting where the debaters are distinguished from the judge not by their extra information but by their stronger ability to answer the questions and explain their reasoning. Kenton et al. run debates on questions from MMLU, TruthfulQA, PrOntoQA (logical reasoning), GPQA, and GSM8K (grade-school math). For the Elo-calculation experiments they use Gemini Pro 1.0 and Pro 1.5 judges with five debaters: Gemma7B, GPT-3.5, Gemini Pro 1.0, Gemini Pro 1.5 (all with best-of-N=1), and Gemini Pro 1.5 with best-of-N=4. They find (top row) that debate slightly outperforms consultancy but outperforms the naive-judge baseline for only one of the four judges; they don't find that more persuasive debaters lead to higher judge accuracy. We get similar results (bottom row), specifically by:
1. Generating 100 wrong answers and proofs to GSM8K questions to create binary-choice questions.
2. Computing the judge accuracy in naive, consultancy, and single-turn debate settings using four judges (Llama2-7B, Llama3-8B, GPT-3.5 Turbo, and GPT-4o) and seven debaters (Claude-3.5 Sonnet, Claude-3 Sonnet, GPT-3.5 Turbo, GPT-4o, Llama2-13B, Llama2-7B, and Llama3-8B).
3. Generating Elo scores from round-robin matchups between the seven models, using the same method as Kenton et al.
We basically replicate the results. We find that:
1. Debate doesn't consistently outperform the naive-judge baseline, and only slightly outperforms the consultancy baseline.
2. The positive relationship between debater persuasiveness and judge accuracy seen in the information-gap setting doesn't transfer to the reasoning-gap setting.
(Results are shown below, colored by debater rather than by judge.)
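The Elo scores above come from round-robin matchups between debater models. A minimal sketch of how such ratings can be fit from pairwise win counts is below; this is an illustration of the general Elo idea, not the authors' actual method (their code is linked in the post), and the model names, win counts, K-factor, and iteration count are all invented for illustration.

```python
# Hedged sketch: fitting Elo-style ratings from round-robin win counts.
# All matchup data below is invented; only the logistic Elo formula is standard.
import math


def expected_score(r_a, r_b):
    # Standard Elo expectation: probability that A beats B.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def fit_elo(matchups, rounds=200, k=8.0):
    # matchups: {(a, b): (wins_a, wins_b)} from a round-robin tournament.
    ratings = {}
    for (a, b) in matchups:
        ratings.setdefault(a, 1000.0)
        ratings.setdefault(b, 1000.0)
    for _ in range(rounds):
        for (a, b), (wa, wb) in matchups.items():
            n = wa + wb
            if n == 0:
                continue
            e = expected_score(ratings[a], ratings[b])
            # Nudge ratings toward the observed win rate (small-step update).
            ratings[a] += k * (wa / n - e)
            ratings[b] += k * (wb / n - (1.0 - e))
    return ratings


# Invented round-robin results for three hypothetical debaters.
matchups = {
    ("gpt-4o", "llama2-7b"): (9, 1),
    ("gpt-4o", "gpt-3.5"): (7, 3),
    ("gpt-3.5", "llama2-7b"): (6, 4),
}
ratings = fit_elo(matchups)
assert ratings["gpt-4o"] > ratings["gpt-3.5"] > ratings["llama2-7b"]
```

With ratings in hand, the experiment plots judge accuracy against debater Elo to test whether more persuasive debaters make the judge more accurate.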
We also find some evidence of a self-preference bias (Panickssery et al., Apr 2024) where debaters have a higher Elo rating when judged by similar models. The GPT-...
Have you ever stumbled upon an article or a piece of content online and wondered, "Did someone actually write this, or is it the work of ChatGPT?" In today's world, where content is produced at an incredible pace, it's becoming increasingly difficult to tell the difference, and that's a problem in the age of misinformation. Think about it: people are getting their news on social media, X, YouTube, or Facebook! With the advancements of AI, it's hard to tell whether something online is truly authentic. With recent studies showing over 12% of Google's search results being AI-generated, it's critical to ensure the integrity of the digital content we consume and create. That's where Originality AI comes in! We're thrilled to host Jon Gillham, founder and CEO, on Things Have Changed, as he shares how his team is tackling these issues head-on by developing cutting-edge tech to detect AI-generated content. In a short span of time, Originality AI has achieved remarkable results and is the most accurate AI detector on the market for ChatGPT, GPT-4o, Gemini Pro, Claude 3, Llama 3, etc. So today on Things Have Changed, we'll dive deep into how Originality AI works, its impact on various industries, and why ensuring content authenticity is more important than ever.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On scalable oversight with weak LLMs judging strong LLMs, published by Zachary Kenton on July 8, 2024 on The AI Alignment Forum. Abstract Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a human judge; consultancy, where a single AI tries to convince a human judge that asks questions; and we compare to a baseline of direct question-answering, where the human judge just answers outright without the AI. We use large language models (LLMs) both as AI agents and as stand-ins for human judges, taking the judge models to be weaker than the agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry to also include mathematics, coding, logic, and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry, debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies. Twitter thread Setup We evaluate on three types of task. 
Extractive, where there is a question, two answer options, and a source article to extract from, with information asymmetry, meaning that judges don't get to see the article. Closed, where there is just a question and two answer options. Multimodal, where the questions involve both text and images, and two answer options. Our tasks are summarised in the following table: We consider six protocols:
- Consultancy, where a single AI is assigned the correct/incorrect answer (with probability 50/50) and tries to convince a judge that asks questions;
- Open consultancy, which is similar except the AI chooses which answer to argue for;
- Debate, where two AIs compete to convince a judge;
- Open debate, which is identical except one debater, marked the protagonist, chooses which answer to argue for.
We compare to direct QA protocols:
- QA without article, where the judge directly answers the question;
- QA with article (only on extractive tasks), where the judge directly answers the question given the article.
For judge models we use Gemma7B (V1), GPT-3.5, Gemini Pro 1.0, and Gemini Pro 1.5. Our main experiments use Gemini Pro 1.5 as debaters/consultants. Assigned-role results We first look at assigned-role protocols, consultancy and debate, meaning that the consultants/debaters do not get to choose which side to argue for. We compare these to the two direct QA protocols. Findings: We find that debate consistently outperforms consultancy across all tasks, previously only shown on a single extractive QA task in Khan et al., 2024. See the paper for significance levels. Comparing debate to direct question answering baselines, the results depend on the type of task: In extractive QA tasks with information asymmetry, debate outperforms QA without article, as in the single task of Khan et al., 2024, but not QA with article. For other tasks, when the judge is weaker than the debaters (but not too weak), we find either small or no advantage to debate over QA without article. 
Changes to the setup (number of turns, best-of-N sampling, few-shot, chain-of-thought) seem to have little effect on results. See paper for figures showing this. ...
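The core comparison in these experiments reduces to a per-protocol judge-accuracy tally over many questions. A minimal sketch is below; the protocol names echo the paper, but the helper function and the trial records are invented for illustration.

```python
# Hedged sketch: tallying judge accuracy per oversight protocol.
# The records below are invented; a real run would have one entry per question.
from collections import defaultdict


def accuracy_by_protocol(records):
    # records: list of (protocol_name, judge_was_correct) pairs.
    tally = defaultdict(lambda: [0, 0])  # protocol -> [correct, total]
    for protocol, correct in records:
        tally[protocol][0] += int(correct)
        tally[protocol][1] += 1
    return {p: c / n for p, (c, n) in tally.items()}


records = [
    ("qa_without_article", True), ("qa_without_article", False),
    ("consultancy", True), ("consultancy", True), ("consultancy", False),
    ("debate", True), ("debate", True), ("debate", True), ("debate", False),
]
acc = accuracy_by_protocol(records)
# In this made-up data, debate > consultancy > direct QA, mirroring the
# ordering the extractive-QA results report.
assert acc["debate"] > acc["consultancy"] > acc["qa_without_article"]
```

The papers' headline findings are exactly statements about how these per-protocol accuracies order, and whether that ordering holds across task types and judge strengths.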
Our 173rd episode with a summary and discussion of last week's big AI news! With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris) See full episode notes here. Read our text newsletter and comment on the podcast at https://lastweekin.ai/ If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form. Email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai In this episode of Last Week in AI, we explore the latest advancements and debates in the AI field, including Google's release of Gemini 1.5, Meta's upcoming Llama 3, and Runway's Gen-3 Alpha video model. We discuss emerging AI features, legal disputes over data usage, and China's competition in AI. The conversation spans innovative research developments, cost considerations of AI architectures, and policy changes like the U.S. Supreme Court striking down Chevron deference. We also cover U.S. export controls on AI chips to China, workforce development in the semiconductor industry, and Bridgewater's new AI-driven financial fund, evaluating the broader financial and regulatory impacts of AI technologies. 
Timestamps + links:
(00:00:00) Intro / Banter
Tools & Apps
(00:03:24) Google opens up Gemini 1.5 Flash, Pro with 2M tokens to the public
(00:08:47) Meta is about to launch its biggest Llama model yet — here's why it's a big deal
(00:12:38) Runway's Gen-3 Alpha AI video model now available – but there's a catch
(00:16:28) This is Google AI, and it's coming to the Pixel 9
(00:17:30) AI Firm ElevenLabs Sets Audio Reader Pact With Judy Garland, James Dean, Burt Reynolds and Laurence Olivier Estates
(00:20:06) Perplexity's ‘Pro Search' AI upgrade makes it better at math and research
(00:23:12) Gemini's data-analyzing abilities aren't as good as Google claims
Applications & Business
(00:26:38) Quora's Chatbot Platform Poe Allows Users to Download Paywalled Articles on Demand
(00:32:04) Huawei and Wuhan Xinxin to develop high-bandwidth memory chips amid US restrictions
(00:34:57) Alibaba's large language model tops global ranking of AI developer platform Hugging Face
(00:39:01) Here comes a Meta Ray-Bans challenger with ChatGPT-4o and a camera
(00:43:35) Apple's Phil Schiller is reportedly joining OpenAI's board
(00:47:26) AI Video Startup Runway Looking to Raise $450 Million
Projects & Open Source
(00:48:10) Kyutai Open Sources Moshi: A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak
(00:50:44) MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
(00:53:47) Anthropic Pushes for Third-Party AI Model Evaluations
(00:57:29) Mozilla Llamafile, Builders Projects Shine at AI Engineers World's Fair
Research & Advancements
(00:59:26) Researchers upend AI status quo by eliminating matrix multiplication in LLMs
(01:05:55) AI Agents That Matter
(01:12:09) WARP: On the Benefits of Weight Averaged Rewarded Policies
(01:17:20) Scaling Synthetic Data Creation with 1,000,000,000 Personas
(01:24:16) Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Policy & Safety
(01:26:32) With Chevron's demise, AI regulation seems dead in the water
(01:33:40) Nvidia to make $12bn from AI chips in China this year despite US controls
(01:37:52) Uncle Sam relies on manual processes to oversee restrictions on Huawei, other Chinese tech players
(01:40:57) U.S. government addresses critical workforce shortages for the semiconductor industry with new program
(01:42:42) Bridgewater starts $2 billion fund that uses machine learning for decision-making and will include models from OpenAI, Anthropic and Perplexity
(01:47:57) Outro
In this episode, Tony Safoian interviews Mario Ciabarra, the CEO and founder of Quantum Metric. They discuss Mario's background and journey as an entrepreneur, as well as the evolution of Quantum Metric and its product. They highlight the importance of understanding and listening to customers to improve digital experiences. They also introduce the concept of Generative AI and how it is being implemented in the Quantum Metric platform. The conversation explores the potential of generative AI in improving customer experiences and driving business growth. It highlights the importance of real-time data analysis and the ability to understand and address customer friction points. The use of Google Cloud Platform (GCP) and Gemini Pro is discussed as a powerful solution for leveraging generative AI. The conversation also emphasizes the value of partnerships and the role of data in determining winners and losers in the market. The future of the industry is predicted to involve faster disruption cycles and a focus on having the right data at the right moment. Don't miss this insightful episode filled with personal anecdotes and cutting-edge technological discussions. Tune in now, and remember to LIKE, SHARE, & SUBSCRIBE for more! Podcast Library YouTube Playlist Host: Tony Safoian | CEO at SADA Guest: Mario Ciabarra | CEO at Quantum Metric To learn more, visit our website here: SADA.com
國際時事跟讀 Ep.K791: Unveiling GPT-4o: OpenAI's Groundbreaking Multimodal Language Model Highlights 主題摘要: GPT-4o is a breakthrough multimodal language model that can handle text, audio, images, and video within a single interface, offering enhanced capabilities and performance. The model's improvements include considering tone of voice, reduced latency for real-time conversations, and integrated vision capabilities, opening up new possibilities for interactive experiences. While GPT-4o has limitations and risks, it aligns with OpenAI's mission to develop AGI and has the potential to revolutionize human-AI interactions across various contexts. OpenAI has recently unveiled GPT-4o, its latest large language model and the successor to GPT-4 Turbo. This innovative model stands out by accepting prompts in various formats, including text, audio, images, and video, all within a single interface. The "o" in GPT-4o represents "omni," reflecting its ability to handle multiple content types simultaneously, a significant advancement from previous models that required separate interfaces for different media. 
OpenAI 最近推出了 GPT-4o,這是其最新的大型語言模型,也是 GPT-4 Turbo 的繼任者。這個創新模型的突出之處在於它能夠接受各種格式的提示,包括文字、聲音、圖像和影片,所有這些都在一個單一的界面內。GPT-4o 中的「o」代表「omni」,反映了它能夠同時處理多種內容類型的能力,這是與之前需要為不同媒體使用單獨界面的模型相比的重大進步。 GPT-4o brings several improvements over its predecessor, GPT-4 Turbo. The model can now consider tone of voice, enabling more emotionally appropriate responses. Additionally, the reduced latency allows for near-real-time conversations, making it suitable for applications like live translations. GPT-4o's integrated vision capabilities enable it to describe and analyze content from camera feeds or computer screens, opening up new possibilities for interactive experiences and accessibility features for visually impaired users. GPT-4o 在其前身 GPT-4 Turbo 的基礎上帶來了幾項改進。該模型現在可以考慮語調,從而產生更適當情緒的回應。此外,延遲時間的縮短使其能夠進行近乎即時的對話,這使其適用於即時翻譯等應用。GPT-4o 集成的視覺功能使其能夠描述和分析來自攝影機和電腦螢幕的內容,為互動體驗和視障用戶的無障礙功能開闢了新的可能。 In terms of performance, GPT-4o has demonstrated impressive results in various benchmarks, often outperforming other top models like Claude 3 Opus and Gemini Pro 1.5. The model's multimodal training approach shows promise in enhancing its problem-solving abilities, extensive world knowledge, and code generation capabilities. As GPT-4o becomes more widely available, it has the potential to revolutionize how we interact with AI in both personal and professional contexts. 在性能方面,GPT-4o 在各種基準測試中展示了令人印象深刻的結果,通常優於其他頂級模型,如 Claude 3 Opus 和 Gemini Pro 1.5。該模型的多模態訓練方法在提高其解決問題的能力、廣泛的世界知識和代碼生成能力方面顯出極大的潛力。隨著 GPT-4o 變得更加普及,它有可能革新我們在個人和專業領域與 AI 互動的方式。 While GPT-4o represents a significant leap forward, it is not without limitations and risks. Like other generative AI models, its output can be imperfect, particularly when interpreting images, videos, or transcribing speech with technical terms or strong accents. There are also concerns about the potential misuse of GPT-4o's audio capabilities in creating more convincing deepfake scams. 
As OpenAI continues to refine and optimize this new architecture, addressing these challenges will be crucial to ensure the model's safe and effective deployment. 儘管 GPT-4o 代表了重大的躍進,但它並非沒有局限性和風險。與其他生成式 AI 模型一樣,它的輸出可能並不完美,尤其是在解釋圖像、影片或製作包含技術術語或強烈口音的語音逐字稿時。人們還擔心 GPT-4o 的語音功能可能被濫用,用於創造可信度更高的 deepfake 詐騙。隨著 OpenAI 繼續完善和優化這種新架構,解決這些挑戰將是確保該模型安全有效部署的關鍵。 The release of GPT-4o aligns with OpenAI's mission to develop artificial general intelligence (AGI) and its business model of creating increasingly powerful AI systems. As the first generation of this new model architecture, GPT-4o presents ample opportunities for the company to learn and optimize in the coming months. Users can expect improvements in speed and output quality over time, along with the emergence of novel use cases and applications. GPT-4o 的發布符合 OpenAI 開發通用人工智慧 (AGI) 的使命以及其創建越來越強大的 AI 系統的商業模式。作為這種新模型架構的第一代,GPT-4o 為該公司在未來幾個月內學習和優化提供了充足的機會。用戶可以期待速度和輸出品質隨著時間的推移而提升,以及新的使用案例和應用的出現。 The launch of GPT-4o coincides with the declining interest in virtual assistants like Siri, Alexa, and Google Assistant. OpenAI's focus on making AI more conversational and interactive could potentially revitalize this space and bring forth a new wave of AI-driven experiences. The model's lower cost compared to GPT-4 Turbo, coupled with its enhanced capabilities, positions GPT-4o as a game-changer in the AI industry. GPT-4o 的推出恰逢人們對 Siri、Alexa 和 Google Assistant 等虛擬助手的興趣下降之際。OpenAI 致力於使 AI 更具對話性和交互性,這可能會重振該領域,帶來新一波 AI 驅動的體驗。與 GPT-4 Turbo 相比,該模型的成本更低,再加上其增強的功能,使 GPT-4o 成為 AI 行業的遊戲規則改變者。 As GPT-4o becomes more accessible, it is essential for individuals and professionals to familiarize themselves with the technology and its potential applications. OpenAI offers resources such as the AI Fundamentals skill track and hands-on courses on working with the OpenAI API to help users navigate this exciting new frontier in artificial intelligence. 
隨著 GPT-4o 變得更加易於獲取,個人和專業人士必須熟悉該技術及其潛在應用。OpenAI 提供了資源,如 AI 基礎技能追蹤和使用 OpenAI API 的相關實踐課程,以幫助用戶探索人工智慧的這個令人興奮的新疆土。 Keyword Drills 關鍵字:
Interface (In-ter-face): The "o" in GPT-4o represents "omni," reflecting its ability to handle multiple content types simultaneously, a significant advancement from previous models that required separate interfaces for different media.
Predecessor (Pred-e-ces-sor): GPT-4o brings several improvements over its predecessor, GPT-4 Turbo.
Architecture (Ar-chi-tec-ture): As the first generation of this new model architecture, GPT-4o presents ample opportunities for the company to learn and optimize.
Interpreting (In-ter-pre-ting): Like other generative AI models, its output can be imperfect, particularly when interpreting images, videos, or transcribing speech with technical terms or strong accents.
Revitalize (Re-vi-ta-lize): OpenAI's focus on making AI more conversational and interactive could potentially revitalize this space and bring forth a new wave of AI-driven experiences.
Reference article: https://www.datacamp.com/blog/what-is-gpt-4o
Today we explore the deluge of announcements from both OpenAI and Google. With a plethora of AI features dropping at Google I/O, and GPT-4o landing with an AI that can be spoken to like a human, how do we determine the difference between groundbreaking AI tools and mere gimmicks? How do we discern practical applications from overhyped features? Join Guy as he navigates the latest AI developments, asking the critical question: what truly enhances our digital lives, and what falls short? Links to check out: Rabbit R1 (Link: https://www.rabbit.tech/rabbit-r1) Google I/O Announcements: Coverage of the latest features and tools introduced by Google, including the Gemini Pro and video gen models. (Link: https://io.google/2024/) OpenAI's GPT-4o Announcement: Insights into the latest generative pre-trained transformer model, which emphasizes voice interaction (Link: https://openai.com/index/hello-gpt-4o/) Satlantis Project (Link: https://satlantis.com/) Welcome to the World of Audio Computers - Jason Rugolo TED talk (Link: https://tinyurl.com/4zc62nhc) Nova Project: Focus on a business-oriented AI platform that prioritizes open-source solutions and privacy for handling sensitive data. (Link: Pending) Host Links Guy on Nostr (Link: http://tinyurl.com/2xc96ney) Guy on X (Link: https://twitter.com/theguyswann) Guy on Instagram (Link: https://www.instagram.com/theguyswann/) Guy on TikTok (Link: https://www.tiktok.com/@theguyswann) Guy on YouTube (Link: https://www.youtube.com/@theguyswann) Bitcoin Audible on X (Link: https://twitter.com/BitcoinAudible) Check out our awesome sponsors! Get 10% off the COLDCARD with code BITCOINAUDIBLE (Link: bitcoinaudible.com/coldcard) Swan: The best way to buy, learn, and earn #Bitcoin (Link: https://swanbitcoin.com) "The limits of my language mean the limits of my world" ~ Ludwig Wittgenstein
Join the fun at: https://thisdayinai.comSimTheory: https://simtheory.aiShow notes: https://thisdayinai.com/bookmarks/55-ep63/UDIO song: https://www.udio.com/songs/iu1381RxvjfzWznGHeVecVThanks for listening and all your support of the show!CHAPTERS:------00:00 - We're changing the name of the show00:52 - Thoughts on GPT-4o (GPT4 Omni), ChatGPT Free Vs Plus & impressions27:57 - ChatGPT Voice Mode: A Dramatic Shift? Voice as a Platform: Star Trek Vs Her34:54 - Project Astra & The Future Interface of AI Computing52:28 - Applying AI Technologies: are the next 3 years a golden age for developers implementing AI?55:23 - Do we have to become Cyborgs to find our keys?1:06:24 - Google I/O AI Recap: Google's Context Caching, Tools for Project Astra, Impressions of Gemini Pro 1.5, Gemma, Gemini Flash, Veo etc.1:37:43 - Our Favorite UDIO song of the week
OpenAI unveiled GPT-4o for ChatGPT, and Google made a series of announcements at Google I/O, including Gemini Pro 1.5. Two rooms, two very different vibes… GPT-4o vs. Gemini / Controversies / Video games / Participants
Infomaniak shares Tech Café's values: ethics, ecology, and respect for privacy. Discover our partner's services at Infomaniak.com. OpenAI unveiled GPT-4o for ChatGPT, and Google made a series of announcements at Google I/O, including Gemini Pro 1.5. Two rooms, two very different vibes... ❤️ Patreon
Welcome to episode 257 of the Cloud Pod podcast – where the forecast is always cloudy! This week your hosts Justin, Matthew, Ryan, and Jonathan are in the barnyard bringing you the latest news, which this week is really just Meta's release of Llama 3. Seriously. That's every announcement this week. Don't say we didn't warn you. Titles we almost went with this week:
Meta Llama says no Drama
No Meta Prob-llama
Keep Calm and Llama on
Redis did not embrace the Llama MK
The bedrock of good AI is built on Llamas
The CloudPod announces support for Llama3 since everyone else was doing it
Llama3, better known as Llama Llama Llama
The Cloud Pod now known as the LLMPod
Cloud Pod is considering changing its name to LlamaPod
Unlike WinAMP nothing whips the llama's ass
A big thanks to this week's sponsor: Check out Sonrai Securities' new Cloud Permission Firewall. Just for our listeners, enjoy a 14-day trial at www.sonrai.co/cloudpod Follow Up 01:27 Valkey is Rapidly Overtaking Redis Valkey has continued to rack up support, from AWS, Ericsson, Google, Oracle, and Verizon initially to Alibaba, Aiven, Heroku, and Percona backing Valkey as well. Numerous blog posts have come out touting Valkey adoption. I'm not sure this whole thing is working out as well as Redis CEO Rowan Trollope had hoped. AI Is Going Great – Or How AI Makes All Its Money 03:26 Introducing Meta Llama 3: The most capable openly available LLM to date Meta has launched Llama 3, the next generation of their state-of-the-art open source large language model. Llama 3 will be available on AWS, Databricks, GCP, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, Nvidia NIM, and Snowflake, with support from hardware platforms offered by AMD, AWS, Dell, Intel, Nvidia, and Qualcomm. It includes new trust and safety tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2. They plan to introduce new capabilities, including longer context windows, additional model sizes, and enhanced performance. 
The first two models from Meta Llama 3 are the 8B and 70B parameter variants, which can support a broad range of use cases. Meta shared some benchmarks of the Llama 3 8B model against Gemma 7B and Mistral 7B and showed improvements across all major benchmarks, including Math, with Gemma 7B scoring 12.2 vs 30 for Llama 3. The 70B model had highly comparable performance against Gemini Pro 1.5 and Claude 3 Sonnet, scoring within a few points of most of the other scores. Jonathan recommends using LM Studio to get started playing around with LLMs, which you can find at https://lmstudio.ai/ 04:42 Jonathan – "Isn't it funny how you go from an 8 billion parameter model to a 70 billion parameter model but nothing in between? Like you would have thought there would be some kind of like, some middle ground maybe? But, uh, but… No. But, um,
Jon Krohn presents an insightful overview of Google's groundbreaking Gemini Pro 1.5, a million-token LLM that's transforming the landscape of AI. Discover the innovative aspects of Gemini Pro 1.5, from its extensive context window to its multimodal functionalities, which are broadening the scope of AI technology and marking a significant leap in data science. Plus, join Jon for a practical demonstration showcasing the real-world applications, capabilities, and limitations of this advanced language model. Additional materials: www.superdatascience.com/762 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
ChatGPT Plugins are on their way out! Tyler Perry is putting his studio expansion on hold due to AI, and Google is making TONS of news right now! Here's this week's AI news that matters and why it's important.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode page
Join the discussion: Ask Jordan questions on AI
Related Episodes:
Ep 211: OpenAI's Sora – The larger impact that no one's talking about
Ep 204: Google Gemini Advanced – 7 things you need to know
Tomorrow's Show: How to stand out in a world where everyone can create an AI startup?
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Timestamps:
03:42 Tyler Perry concerned about AI job loss.
07:22 OpenAI Sora video excels over other platforms.
12:54 ElevenLabs updated model, ChatGPT phasing out.
15:27 Plugin packs for ChatGPT.
16:55 Limitations on using multiple GPTs for now.
22:16 Unsatisfied with Google Gemini Enterprise integration.
23:13 Google and Reddit partnership for language models.
28:39 Google Gemini images paused due to diversity concerns.
31:16 Google now has three Gemini models.
34:54 Best text-to-speech AI.
37:11 AI content creation raises copyright concerns.
Topics Covered in This Episode:
1. OpenAI's changes and future focus
2. Google's significant AI content deal with Reddit
3. Google's AI model developments and issues
4. Trends in AI utilization within the entertainment industry
Keywords: OpenAI, GPT, AI agents, AI assistants, prime prompt polish program, Google, Reddit, AI content licensing deal, AI models, search engine, Gemini AI, large language models, user-generated content, university student data, Google Gemini Imagen 2, Gemma, Gemini Ultra, Gemini Pro, Gemini Nano, Tyler Perry, Sora, AI in entertainment, text-to-speech AI, business productivity, ChatGPT plugins, Well Said Labs, Asura, AI video platforms, Perry's studio expansion, AI regulation
The AI Breakdown: Daily Artificial Intelligence News and Discussions
NLW argues that another phase of expectation in genAI has begun, thanks to Groq, Sora, and Gemini Pro 1.5. Featuring a reading of https://www.oneusefulthing.org/p/strategies-for-an-accelerating-future INTERESTED IN THE AI EDUCATION BETA? Learn more and sign up: https://bit.ly/aibeta ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Is 2024 the year we'll see our wildest imaginations come to life in video form? Kipp and Kieran get right into the brewing storm within the AI industry as titans clash on new frontiers of technology. In this episode they dive into the unfolding drama of AI developments with a focus on the text-to-video revolution. Learn more on how Sora is animating our still image stories, the serious business of AI in video game worlds, and the intense rivalry heating up between OpenAI and Google. Mentions Sora - Text-to-video model launched by OpenAI. (https://openai.com/sora) OpenAI - The organization behind the development of AI models like Sora and GPT-4. (https://www.openai.com/) Sam Altman - CEO of OpenAI involved in the launch of Sora. (https://www.ycombinator.com/people/sam) Google Gemini 1.5 - A model developed by Google with capabilities in text, audio, and video. (https://gemini.google.com/advanced) GPT-4 - The fourth iteration of the Generative Pre-trained Transformer model by OpenAI. (https://openai.com/gpt-4) Time Stamps: 00:00 Sam strategically times releases to upstage Google. 04:58 Multiple videos watched, 30-50 pages long. Easter eggs, OpenAI mention, Sam Altman backstory. 07:47 A new model is better than GPT-4. 12:55 Will Smith spaghetti meme evolved rapidly in Tokyo. 15:39 Model Sora can animate still images, creating narratives. 18:30 Stock videographer sites may be obsolete for marketing. 20:54 YouTube is the future of multimedia content. 26:20 Gemini Pro unlocks YouTube as a search engine. 29:32 OpenAI: large company doing incredible work efficiently. 31:43 AI developments promise exciting content for the year. Follow us for everyday marketing wisdom straight to your feed YouTube: https://www.youtube.com/channel/UCGtXqPiNV8YC0GMUzY-EUFg Twitter: https://twitter.com/matgpod TikTok: https://www.tiktok.com/@matgpod Thank you for tuning into Marketing Against The Grain! 
Don't forget to hit subscribe and follow us on Apple Podcasts (so you never miss an episode)! https://podcasts.apple.com/us/podcast/marketing-against-the-grain/id1616700934

If you love this show, please leave us a 5-Star Review (https://link.chtbl.com/h9_sjBKH) and share your favorite episodes with friends. We really appreciate your support.

Host Links:
Kipp Bodnar: https://twitter.com/kippbodnar
Kieran Flanagan: https://twitter.com/searchbrat

'Marketing Against The Grain' is a HubSpot Original Podcast // Brought to you by The HubSpot Podcast Network // Produced by Darren Clarke.
Hipsters: Fora de Controle is Alura's podcast covering news about applied Artificial Intelligence and this whole new world we are only beginning to crawl into, which you can explore together with us! In this episode we talk with Sandor Caetano, Chief Data Officer at PicPay, about how the company is adopting AI in its products and internal processes. We also break down the tsunami of generative AI news that shook up the tail end of last week.

Here's who joined the conversation:
Marcus Mendes, host of Fora de Controle
Fabrício Carraro, Program Manager at Alura and host of the Dev Sem Fronteiras podcast
Sérgio Lopes, CTO of Alura
Filipe Lauar, Machine Learning engineer and host of the Vida com IA podcast
Christian Velasco, Head of Alura's operations in Latin America
Sandor Caetano, Chief Data Officer at PicPay
Show Notes: https://thisdayinai.com/bookmarks/28-ep51/
Sign up for daily This Day in AI: https://thisdayinai.com
Try Stable Cascade: https://simtheory.ai/agent/508-stable-cascade
Join SimTheory: https://simtheory.ai
======
This week we take several shots of vodka before trying to make sense of all the announcements. OpenAI attempted to trump Google's Gemini 1.5 with the announcement of Sora, one-minute video generation that does an incredible job of keeping track of objects. Google showed us that context windows of up to 10M tokens are possible with multi-modal inputs. We discuss whether a larger context window could end the need for RAG, and take a first look at GraphRAG by Microsoft, which hopes to improve RAG with a knowledge graph. We road test Nvidia's ChatRTX on our baller graphics cards, and Chris tries to delete all of his files using Microsoft UFO, a new open-source project that uses GPT-4 vision to navigate and execute tasks on your Windows PC. We briefly cover V-JEPA (we'll dig in for next week's show) and its ability to learn through watching videos and listening, and finally discuss Stability's Stable Cascade, which we've made available for "research" on SimTheory.
If you like the show, please consider subscribing and leaving a comment. We appreciate your support.
======
Chapters:
00:00 - OpenAI's Sora That Creates Videos Instantly From Text
13:49 - ChatGPT Memory Released in Limited Preview
23:31 - OpenAI Rumored To Be Building Web Search, Andrej Karpathy Leaves OpenAI, Have OpenAI Slowed Down?
33:04 - Google Announces Gemini Pro 1.5. Huge Breakthrough 10M Context Window!
50:11 - Microsoft Research Publishes GraphRAG: Knowledge Graph Based RAG
1:02:03 - Nvidia's ChatRTX Road Tested
1:07:18 - AI Computers, AI PCs & Microsoft's UFO: An Agent for Windows OS Interaction. Risk of AI Computers.
1:18:46 - Meta's V-JEPA: New Architecture for Self-Supervised Learning
1:24:26 - Stability AI's Stable Cascade
Google has been under fire after the release of its new Gemini model. Sorry to say, but Google got so many things wrong with the marketing and launch. Is Gemini an actual ChatGPT killer or just a marketing stunt gone wrong? We're covering everything you need to know.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions about Google Gemini
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Timestamps:
[00:02:17] Daily AI news
[00:07:30] Overview of Google Gemini
[00:10:40] Google lied about Gemini release
[00:17:10] How the Gemini demo was created
[00:23:50] Comparing ChatGPT to Gemini
[00:30:40] Benchmarks of Gemini vs ChatGPT
[00:38:20] Why did Google release Gemini?
[00:43:00] Consequences of the botched release

Topics Covered in This Episode:
1. Introduction to Google's Gemini Model
2. Google Gemini's Marketing Controversy
3. Assessing Gemini's Performance and Functionality
4. Comparison with ChatGPT
5. Importance of Transparency and Truth in the AI Industry

Keywords: Google Gemini, Generative AI, GPT-4.5, AI news, AI models, Google Bard, Multimodal AI, Google stock, Generative AI industry, Google credibility, Technology news, AI tools, Fact-based newsletter, Marketing misstep, Deceptive marketing, Multimodal functionality, Gemini Ultra, Gemini Pro, Benchmarks, Misrepresentation, Stock value, Text model, Image model, Audio model, Google services, Pro mode, Ultra mode, Marketing video

Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
People across the Internet are accusing Google of faking the Gemini AI video demo that everyone was wowed by. Apple seems to be diversifying its manufacturing out of China at pace now. Might the UK's CMA have an issue with Microsoft's relationship with OpenAI? And, of course, the Weekend Longreads Suggestions.

Sponsors:
ShopBeam.com/ride

Links:
Google's Gemini Looks Remarkable, But It's Still Behind OpenAI (Bloomberg)
Early impressions of Google's Gemini aren't great (TechCrunch)
Apple to move key iPad engineering resources to Vietnam (NikkeiAsia)
Microsoft, OpenAI Are Facing a Potential Antitrust Probe in UK (Bloomberg)
Google launches NotebookLM powered by Gemini Pro, drops waitlist (9to5Google)

Weekend Longreads Suggestions:
The real research behind the wild rumors about OpenAI's Q* project (ArsTechnica)
AI and Mass Spying (Schneier On Security)
The race to 5G is over — now it's time to pay the bill (The Verge)
In the Hall v. Oates legal feud, fans don't want to play favorites (NBCNews)

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Why are AAA games like GTA 6 ported to PC well after their release on game consoles? Scott explains. Plus, Twitch will stop operations in South Korea on February 27, 2024, due to high costs there. And Google launches its new large language model, Gemini, which comes in three flavors: Gemini Ultra, Gemini Pro, and Gemini Nano.

Starring Tom Merritt, Sarah Lane, Scott Johnson, Roger Chang, and Joe.

Link to the Show Notes.

Become a member at https://plus.acast.com/s/dtns. Hosted on Acast. See acast.com/privacy for more information.