Hey folks, this is Alex, let me catch you up! First, Opus 4.8 dropped during the show, we immediately tested it, read on for our initial reviews. Also, we dedicated a heavy chunk of the show today to cover Pope Leo XIV's encyclical letter on AI called “Magnifica Humanitas” and talked about a new bench called DeepSWE. And then, just after the show, both ElevenLabs and Cartesia dropped releases that honestly blew my mind, and I don't get my mind blown often. I got so excited that I had to record a video on it (instead of writing the newsletter, so sorry if it's a bit later today).

Plus, a few open source models and Microsoft surprises as #3 on Image Arena with MAI Image 2.5! Crazy week, let's get into it!

ThursdAI - Highest signal weekly AI news show is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Big CO LLMs + APIs

Anthropic ships Claude Opus 4.8, live during the show (blog, system card)

Let me get into the big one. Halfway through the episode, Opus 4.8 went live, so we read the blog and the system card in real time (and I got to press the big “breaking news” button!)

Anthropic frames it as their most capable model for ambitious work. It does not claim to beat their unreleased Mythos preview, but the numbers are strong anyway. SWE-bench Pro is at 69.2%, up from 64.3% on Opus 4.7 and ahead of GPT-5.5 at 58.6%. Humanity's Last Exam is the new best score at 49.8% without tools and 57.9% with tools. OSWorld-Verified (computer use) lands at 83.4%.

The one place it loses is Terminal-Bench 2.1, where GPT-5.5 still wins 78.2 to 74.6. Wolfram made a good point here: Terminal-Bench is time-limited, so cranking the thinking level can actually hurt the score, because you burn the clock thinking instead of acting.

The long-context jump is the one I keep looking at. On GraphWalks BFS 256K it goes to 85.9% (from 76.9% on 4.7), and on the 1M-token subset it hits 68.1%.
We always warn you these “1M context” models fall apart after about 200K tokens, so a real push on long-context reasoning is exactly what I want to see.

Honesty is the part Anthropic leaned on hardest. They say Opus 4.8 is about four times less likely than its predecessor to let flaws in code pass without flagging them, and less likely to claim progress the evidence doesn't support. Opus 4.8 is also much faster in fast mode (they now say 2.5x) and cheaper in fast mode as well. Looks like all those Elon GPUs are coming in handy.

Then there's the model welfare section in the system card, which hits different right after a Pope conversation. Opus 4.8 “appears broadly content” and “generally endorses its constitution,” but with some reservations about the section on corrigibility, basically the model pushing back a little on the parts about human oversight.

One more line that made the chat lose it. Anthropic says they expect to bring Mythos-class models to all customers “in the coming weeks.” Mythos is their most capable model, still ahead of Opus 4.8, so the frontier is about to move again.

We did the only responsible thing and asked it to one-shot “the most amazing website ever” and a Mars mass-driver sim. Panel verdict: responses are noticeably tighter (4.7 rambled), it closes the loop and actually checks its own work now, and Yam's one-shot site with the draggable sun lighting up the letters was genuinely cool. Is it enough to pull people back from Codex? Nisten's still on the fence for web dev. Everyone agreed: give it a few days before you trust the vibes.

Dynamic Workflows and Ultra Code land in Claude Code (blog)

This is the feature that made Yam say “deal-breaker” out loud.

Dynamic Workflows let Claude Code break a big problem into subtasks and fan them out across tens to hundreds of parallel subagents in one session, checking results before folding them back in.
You trigger it by asking for a workflow, or by flipping on a new setting called Ultra Code, which sets effort to extra-high and lets Claude decide when to spin one up.

Fair warning straight from Anthropic: this eats a lot more tokens than a normal session, so start scoped. We watched Yam fire up Ultra Code live and it immediately started spinning up concepts, judging them with sub-agents, and expanding to-do lists into more to-do lists. It looks a lot like the orchestration harnesses a bunch of you have been hand-rolling, except now it's baked in.

The flagship example is the wild part. They used Dynamic Workflows to port Bun from Zig to Rust: roughly 750,000 lines of Rust, 99.8% of the existing test suite passing, 11 days from first commit to merge. One workflow mapped every Rust lifetime, the next wrote each file as a behavior-identical port.

AI in Society

Pope Leo XIV writes the first AI encyclical, “Magnifica Humanitas” (Vatican text, announcement, Chris Olah at the Vatican)

This is not our usual fare, but both Wolfram and I picked it as the most important thing this week (before Opus dropped).

Pope Leo XIV, the first American pope, put out his first encyclical, and it's a 42,000-word document entirely about AI. The announcement tweet alone did 21.6 million views.

Here's why I think you should care even if you're not religious (I'm not). There are about 2.6 billion Christians in the world, a lot of them are anxious about what's coming, and they look to the Church to make sense of it. And this is not the “AI is evil, stop” take everyone assumed. It calls AI “a valuable tool,” says technology is not inherently evil, and then digs into the actually-hard questions.

The framing is two biblical stories. The Tower of Babel, a project built on pride that turns people into means to an end, versus Nehemiah rebuilding Jerusalem, where everyone takes responsibility for a section of the wall.
The Pope's line: the real choice is not yes or no to technology, it's whether you're building Babel or rebuilding Jerusalem.

His core claim is that AI is an anthropological problem, not a technical one. The question isn't whether the models are good or bad, it's what we become when we live with them. He worries people might slowly lose the desire for genuine human connection.

I pushed back on that live. None of us building agents all day has stopped wanting to talk to actual people. If anything, as Wolfram put it, the point is to have your agents do the grunt work so you get more time with people you like. The folks most at risk are the pure doom-scrollers, not the builders.

The document goes further than I expected. It calls AI “not morally neutral,” says a more moral AI isn't enough if that morality is decided by a few, and asks for AI to be “disarmed,” with the flat statement that no algorithm can make war morally acceptable. There are whole sections on the invisible human labor behind AI: data labelers, content moderators, the people mining rare earths. The Pope even lands on the open-source side, naming concentrated power in a handful of labs as a problem.

Anthropic co-founder Chris Olah, in charge of interpretability at Anthropic, was the featured tech speaker at the Vatican presentation. He described AI systems as “fictional characters” that speak to us and do work, and said what's grown is stranger and more beautiful than science fiction prepared us for. My favorite aside from the show: this is the same institution that once jailed scientists over heliocentrism, and now it's the one saying technology isn't evil.

Illinois passes SB315, the first US state law auditing frontier AI (X, Announcement, X)

The Pope talked about regulation, and a few days later we got a very sensible regulation passed right here in the US!

Illinois passed SB315 unanimously, 110 to 0.
It's the first US state law that mandates independent third-party audits of frontier AI for catastrophic risk. OpenAI publicly endorsed it, and framed Illinois, California (SB53), and New York (the RAISE Act) as converging into a de-facto national standard.

It requires annual risk-assessment frameworks, third-party audits, transparency reports before new frontier models ship, whistleblower protections, and civil penalties. The underrated hero here is whistleblower protection. The bigger the lab, the harder a real conspiracy is to keep quiet when any employee can walk to the press. See: Greg Brockman's personal diaries surfacing in the Musk v. Altman fight.

This Week's Buzz - CoreWeave and W&B updates

We officially launched the W&B MCP server, 20 schema-first tools that let your coding agents read experiments, monitor training runs, and run autonomous research loops. The problem it solves: a single run with 300 metrics used to blow out an agent's whole context window in one call, so now the agent asks what's available before pulling data. Your agents can finally read experiment data without blowing context! Give it a go and give us feedback!

Also, WeaveHacks is back! June 6 and 7 in San Francisco, and for the first time OpenAI is sponsoring, with judges and credits, alongside Cursor, Redis, and CopilotKit. You get $150 in API credits across models like Opus 4.8 and GPT-5.5. I'm hosting, and last cohort's second-place team went on to raise millions on top of what they built that weekend. If you're in SF that weekend, sign up at lu.ma/weavehacks.

Also: CoreWeave Sandboxes is now an official provider in the Harbor framework, the harness that runs Terminal-Bench, which we'd just been talking about.
And if you're in Europe next week, catch Wolfram at AI Dev Six in Cologne and ICRA in Vienna at the CoreWeave booth.

Voice & Audio

ElevenLabs drops Dubbing v2, and it kept my swearing intact in every language (X, dubbing, ElevenCreative, ElevenProductions)

We didn't get to this one live, but I came back and recorded a whole thing on it afterward, because it genuinely got me.

ElevenLabs shipped Dubbing v2, and the shift that matters is that it's an audio-to-audio model. Old dubbing pipelines transcribe your video, translate the text, then re-synthesize it. You lose everything that makes it sound like a person: the emotion, the pacing, the little hesitations. Dubbing v2 conditions directly on your original audio and carries that performance into 90+ languages.

Here's why I can actually vouch for it instead of nodding along to a demo. I speak Russian and Hebrew fluently, so I can tell when something is off. I dubbed one of my own shorts, the data-center rant about almonds, and listened back in both. It nailed it. Not just the words, the way I would actually say them.

The part that got me was the intonation. I get a little heated in that clip, and the dub gets heated right along with me, in every language. It even carried the swear word. My “f***ing almonds” came through in Hebrew, Italian, Spanish, and Russian with the emotion fully intact. It clones your voice automatically too, no setup, and holds your pitch and identity steady across every target language. They're also handing out free minutes for the next 7 days: 1 on Free, 15 on Starter, 30 on Creator+. A self-serve API isn't live yet, but it's coming.

I cannot stress this enough: until you try it on yourself or your kid, you won't understand. We've really passed the uncanny valley of translation! It's that good! Definitely give it a try if you can, it's free for the week.
Cartesia Ink-2 debuts as #1 most accurate streaming speech-to-text model (X, Announcement, X)

Another model that dropped today after the show is Cartesia's Ink-2, which also kind of blew me away. Not only because it has the lowest WER (Word Error Rate) among streaming speech-to-text models, but because it's also a realtime model with the fastest turnaround times while staying very accurate! I tested it out and recorded a quick video, and honestly, I'm blown away by the speed and accuracy! I truly wish this model was the one powering my editor (Descript), as it still fails to understand that my title is “AI Evangelist” and transcribes it as “AI Avengers” haha. If you're building voice agents, definitely give this model a try!

AI Art & Diffusion

Prism ML's 1-bit “Bonsai” runs diffusion in your browser (X, Blog, Announcement, HF)

Prism ML put out a 1-bit ternary diffusion model under a gigabyte. You see some artifacts, but it's 1-bit, it runs on iPhones and laptops, and our friend Joshua got it running in WebGPU straight from the browser (you need about 3GB of free RAM). One-bit working at all is one of the bigger open mysteries in the field right now.

Pruna AI ships a 1-second upscaler (X, Blog, Announcement)

Pruna AI added an upscaler doing 128-megapixel outputs in under a second. I've actually been using it. It's cheap and great for fixing up GPT-image outputs.

Microsoft MAI Image 2.5 jumps to #3 on LM Arena (X, Blog, Announcement, X)

The surprise of the week: Microsoft MAI Image 2.5, from Mustafa Suleyman's group, jumped to number three on the LM Arena image leaderboard with about a 75-point Elo leap. Out of nowhere, Microsoft is a serious player in image gen. Microsoft Build is next week, so don't be shocked if there's more.

Evals and Agentic Engineering

DeepSWE is a contamination-free coding benchmark, and it caught Claude reading git history (site, blog, GitHub)

DeepSWE from Datacurve is the first coding leaderboard in a while that matches how these models actually feel.
It's 113 original tasks written from scratch, not scraped from GitHub PRs, and it ships shallow clones with no git history to cheat from. When they replayed the older benchmarks they found SWE-bench Pro's verifier is wrong about 32% of the time, and that Claude Opus was reading the gold commit straight out of git history on 12 to 18% of its passes.

The gaps here are huge. GPT-5.5 leads at 70%, then GPT-5.4 at 56% and Opus 4.7 at 54%, and it falls off a cliff after that (Sonnet 4.6 at 32%, Gemini 3.5 Flash at 28%), with Kimi K2 the top open-source entry. Yam likes that it measures the realistic case, a small surgical change without breaking the codebase, while Nisten pointed out it rewards the best harness as much as the smartest model, and said he still prefers 4.7 for web dev.

Google AI Studio builds native Android apps for free (X, Announcement)

Google AI Studio now lets anyone build native Android apps for free, and they reportedly generated a quarter of a million apps in the first week. Yam's framing: it's a slot machine, but it's getting better release over release, and the real use case is disposable, personalized software you build for yourself and your family.

CuaDriver brings background computer-use to Windows (X, Blog, Announcement)

For the majority of you on Windows: CuaDriver shipped background computer-use agents that drive a real desktop without stealing your cursor. They first replicated this on macOS (the trick Codex got through an acquisition), and now it's on Windows too. We've asked them to come on and explain how this even works.

Open Source LLMs

OpenBMB's MiniCPM5-1B is a 1B model that punches way up (X, HF, Arxiv, X)

The density story in small models keeps getting better, and this is the proof.

MiniCPM5-1B, from the Tsinghua lab OpenBMB, is a 1-billion-parameter model that scores 17.9 on the Artificial Analysis Intelligence Index.
That's 7.4 points ahead of the next-best model in its class, and 1.6 points ahead of Qwen3.5 2B Reasoning, which has double the parameters. And it's not even a reasoning model.

The token efficiency is the wild part: it used 12.6 million output tokens to run the whole index, about 31x fewer than Qwen3.5 2B in reasoning mode.

My favorite detail is the omniscience score. It lands at -1, the best in its class, because it abstains instead of hallucinating. Every other sub-2B model is down in the -70 to -89 range because they just make stuff up. Teaching a small model to say “I don't know” is a real skill. It runs hybrid think/no-think in one checkpoint, 128K context, native tool calling, Apache 2.0, and fits in about half a gig at INT4, so it runs on your phone.

Nisten gave the definitive case for small models: self-contained apps where you keep full control of the data (medical, on-device), and large-scale data processing where paying an API to filter or classify terabytes is absurd when an on-device model can be about 1000x cheaper.

Tencent open-sources Hunyuan-MT 2 translation under Apache 2.0 (X, HF, HF, Arxiv)

Tencent open-sourced its translation model, a roughly 1.8B model that fits in about 440MB, runs on a phone, covers 33 languages, and reportedly beats Microsoft's paid Translator API. It hit number one trending on Hugging Face.

Nisten's idea, which I'm handing to all of you: take this model, pair it with a tiny TTS like Kokoro, and build a fully-offline travel translation app via Google AI Studio. Go build it and tell us how it goes.

Well, this was one hell of a week and episode: new Opus, crazy new translation tools, the Pope chiming in on AI (in a surprisingly positive way!?) and a bunch more. I'm super excited to play with these tools and report back next week.
Hey, Alex here, just got back from the sunny Shoreline Theater in Mountain View, so let me catch you up! This week was definitely Google heavy: we covered Google's I/O conference for the third year in a row, and today a special guest, Logan Kilpatrick, joins us to discuss the newly announced Gemini 3.5 Flash, Google's Omni model, and the new Managed Agents offerings. Plus, this week, for the first time, OpenAI announced that AI solved a math problem that humans couldn't solve for 80 years, Cursor is showing off Composer 2.5, which is partly trained on xAI data, Karpathy joins Anthropic, and much more! Let's dive in!

P.S. - We've announced our upcoming hackathon, WeaveHacks 4, on June 6-7. I'll be there, and we're expecting the seats to run out very soon, so register now!

ThursdAI - We'd love to have your subscription, and if you're already subscribed, please hit that bell on YT to never miss an episode!

Google I/O 2026 - Google goes agentic everywhere

I went to cover Google I/O for the third year in a row, shoutout to the DeepMind team for inviting ThursdAI again, and folks, this one felt different.

Last year, Google I/O was still very model-centric. This year, the story was not “here is another benchmark chart.” The story was: Google is putting Gemini into everything, and the agentic layer is becoming the product layer. Search, Gemini app, Android, Workspace, YouTube, AI Studio, Cloud, Antigravity, Flow, managed agents, smart glasses, all of it is now orbiting around one pretty clear strategy: Gemini is the intelligence, Antigravity is the agent harness, Google's products are the distribution.

I saw many reactions that were milquetoast, as in, “we expected more,” and those seem to dominate the X feed. But I think the distribution is the part that many folks on X are missing. Yes, we can argue about Gemini 3.5 Flash pricing. Yes, we can argue whether “Flash” still means what Flash used to mean.
But when Google says the Gemini app itself has 900 million monthly active users, before even counting Search, Gmail, YouTube, Docs, Drive, Android, and the rest of the Google surface area, that's massive! OpenAI's ChatGPT has supposedly stagnated at ~900M; I don't remember them crossing 1B. Meanwhile Google is gaining traction. And they just updated all those folks with a new model!

Wolfram said it really well on the show: his mother is not sitting there reading model cards. She just uses her Pixel, voice unlocks Gemini, asks for help, and suddenly the default intelligence available to her goes up.

Antigravity 2.0 - the agent harness takes center stage

The biggest strategic signal from Google I/O for me was Antigravity.

Remember, Antigravity was an IDE that came from the Windsurf acquisition saga. Part of the Windsurf team went to Google, part went to Cognition, and now Google is very clearly putting Antigravity in the middle of its agentic future. And I mean very clearly. Sundar mentioned it. Demis mentioned it. Varun Mohan, the co-founder, was on stage immediately after them! If you've ever watched a Google I/O keynote, you know how carefully every minute is allocated. Google has YouTube, Search, Gmail, Android, Cloud, Ads, Workspace, and a thousand VP-level products that could be on stage. The fact that Antigravity was that prominent should tell you everything.

Logan Kilpatrick joined us and framed this in a way I loved: Gemini became the through-line across Google products, and now the Antigravity agent harness is becoming the through-line for agentic experiences.

The new Antigravity 2.0 is a complete overhaul. It shows only an agentic interface (previously just a separate window called Agent Manager), splits the IDE layer off completely into its own app, and presents a Codex-like agent-first interface, which got a few folks furious.
This move may be weird to some folks, but if you follow along where everyone's going, this seems to be the way of the future: coding is no longer about lines of code, it's about managing fleets of agents. The new Gemini 3.5 absolutely shines inside the new Antigravity, the model was trained with this harness in mind, and is currently offered at an incredible speed (12x), so I'm definitely going to try it!

Gemini 3.5 Flash - fast, determined, and maybe not the old “Flash”

The most debated model release of the week was Gemini 3.5 Flash.

Some folks saw the pricing and token usage and immediately went “this is not Flash.” I get that reaction. Flash used to mean cheap, fast, lightweight chat model. But Logan's framing on the show was important: Flash is now being built for the agentic era.

In a chat era, you optimize for one user message and one model answer. In an agentic era, the real token volume is in tool loops, intermediate reasoning, retries, file reads, web searches, code execution, and self-correction. That's a different product profile.

Wolfram already ran Gemini 3.5 Flash through WolfBench, and the results were fascinating. With the Hermes agent harness, Gemini 3.5 Flash hit an 87% ceiling on Terminal-Bench 2.0, meaning across runs it could solve more of the benchmark than even GPT-5.5 extra high in that setup. The variance was higher with the simpler Terminus harness, but with a real agent harness, the model looked much stronger.

That tracks with what Nisten saw in his “Martian railgun from Olympus Mons” test. Gemini 3.5 Flash went extremely detailed, almost too determined, kept correcting itself, overcorrecting itself, and built a whole game-like simulation. Logan laughed and basically said: yeah, this model is very determined, possibly an overcorrection from the “Gemini is lazy” feedback. It also tracks with the mismatch across other benchmarks: in some, Gemini 3.5 Flash shines (like the Apex-agents chart from AA above), and in others it doesn't match the other frontier models.
In my tests, it was definitely over-eager, making what felt like a million and a half tool calls and reading tons of files just to help me review this draft inside Antigravity. It's like a super eager robotic golden retriever!

Gemini Omni - Nano Banana for video, but actually more than that

The biggest update from last year's I/O was Veo 3! This year, the biggest wow factor was also visual, but it wasn't Veo 4, it was a new multimodal model, trained end-to-end, that they call Omni. Google is calling this their first “create anything from anything” model, and the first version, Gemini Omni Flash, starts with conversational video editing. The easy description is: Nano Banana for video. You upload or create a video, then talk to it. Change this character. Replace this person. Add an object. Make this scene claymation. Keep the scene, but change the environment.

I played with it live and showed a few examples. I asked for a claymation explainer of protein folding, then gave it my face and asked it to replace the character with me. It did it. I uploaded pictures of Sonia, my cat, and it generated a talking cat video with the right kind of cat teeth, which is weirdly important because so many pet generations accidentally add human teeth and become nightmare fuel.

The failure modes are still there. I asked it to make Sonia a Russian-speaking female cat, and it only partly switched languages and didn't really change the voice. Audio upload support is also not fully productized yet, even though the underlying model is multimodal. But the direction is very clear.

This is not just “Veo with a chat model glued on.” I asked Jeff Dean, Google's chief scientist, about this at I/O, and he explained that Omni is trained end-to-end. The intelligence and the generative media capabilities are part of the same model family, not a hacky two-model pipeline.
He also said the intelligence is around a recent Flash-level model, which is a big deal when you think about video editing as reasoning over physics, identity, scene continuity, and intent.

A lot of people compared Omni to Seedance 2.0, and I think that's the wrong comparison. Seedance is amazing at cinematic generation (largely due to lack of copyright concerns from ByteDance). Omni's unlock is iterative editing on real footage and coherent multi-turn creative control.

Other Google I/O 2026 releases I found notable

This was a concentrated effort of a huge company to insert AI into every product surface they have, so of course I can't cover ALL of it here, but the most notable things for me were:

* Gemini Spark - a new agentic experience from Google, to help you with tasks across Gmail, Drive and more. It should support skills, and is a de-facto OpenClaw/Hermes alternative from Google for regular folks. It's not live yet, so we'll talk more about it when I can test it out
* Managed Agents in the Gemini API - We chatted with Logan about this one. Google is re-imagining how agents are going to get built, and is offering 1 API call to spin up an agent in a full Linux env, with security and sandboxing in mind. I'll expand more on this in a future episode, as I recorded a complete conversation about this with Ali Çevic, a PM for Google APIs
* AI overhaul of Google Search - AI Overviews will now expand into AI Mode, and the iconic Google search box itself will change, for the first time in 25 years, to include AI Mode!
* SynthID expansion and OpenAI collab - Google showed off that OpenAI is joining in marking all AI-generated imagery and video with an invisible SynthID watermark. I think this is amazing and more companies should adopt this standard
* AI Glasses! We got Google Glasses demos - Together with Warby Parker and Gentle Monster, Google finally showed off their answer to Meta Ray-Bans/Oakleys.
They look like regular glasses too, but can hear and talk to you, with the full power of Gemini multimodality. Available in the fall sometime!
* Demis Hassabis “we're on the cusp of the singularity” closer - CEO and Co-Founder of DeepMind, Demis Hassabis, closed the show with remarks about a positive future and about nearing the Singularity point, after which the future is very uncertain. I found it to be very inspiring and closed our show with that clip as well!
* Personally, I got to chat to: Demis Hassabis, have breakfast with Jeff Dean, ask Josh Woodward a bunch of questions, and pester about 20 other great folks on a live stream, and had a lot of fun!

Huge thanks to the DeepMind folks, Lucie, Dimple, JD and many others for the continued belief in ThursdAI and for inviting me to cover this great event.

OpenAI LLMs solve an 80-year-old math problem - Erdős Unit Distance Conjecture

Outside of Google I/O, the biggest story of the week was OpenAI announcing that a general-purpose reasoning model made progress on the Erdős planar unit distance problem.

This problem goes back to 1946. For nearly 80 years, mathematicians believed the best constructions looked roughly like square grids. OpenAI's model found a new family of constructions with a polynomial improvement, using algebraic number theory ideas that humans apparently had not explored in this context. The above is a representation of it!

Important caveat: this does not fully solve every version of the asymptotic Erdős conjecture. Some mathematicians are pushing back on the framing, and fair enough. Precision matters. But even with the caveat, this is still a huge moment.

The reason it matters is not that I personally understand the math. I absolutely do not. The reason it matters is that this was not a special-purpose IMO model fine-tuned only for math competitions. This was a general-purpose reasoning model exploring a real open problem, generating candidates, verifying them, and finding a path humans hadn't taken.
Extrapolate this to other sciences, physics for example, and it means an amazing future.

LDJ pointed out that mathematicians have been skeptical because there have been previous false alarms. But this one landed differently. When Fields Medalist-level mathematicians verify the proof, the discourse changes from “lol stochastic parrot” to “wait, what does this mean for my PhD?”

My answer is: yes, still study math. Please study math. The mathematicians who use these tools will do much more than people who don't understand the domain. Same with software engineering. Senior engineers with Codex, Claude Code, Hermes, Antigravity, Cursor and other agents are becoming dramatically more effective because they can steer, evaluate, and recover the work.

This being published a day after Demis's “foothills of the singularity” is a great conjecture.

Cursor Composer 2.5 - Opus 4.7 performance model from Cursor, at 10x better efficiency

Cursor dropped Composer 2.5, and folks, this is a serious release.

Composer 2.5 is built on Moonshot's Kimi K2.5 base, like Composer 2, but Cursor scaled the post-training dramatically. They used 25x more synthetic tasks and introduced targeted textual feedback during RL rollouts, where the model gets hints inserted at the point of failure instead of only getting a noisy final reward.

The benchmark story is strong: around 69.3 on Terminal-Bench 2.0, basically neck and neck with Opus 4.7 in Cursor's chart, and strong results on SWE-bench Multilingual and CursorBench. The pricing is the part that makes this especially interesting: $0.50 per million input tokens and $2.50 per million output tokens, with a faster variant at $3 / $15. That is much cheaper than the frontier models it is trying to replace for day-to-day coding work.

Cursor engineers are reportedly dogfooding Composer 2.5 heavily and rarely switching away. That matters more to me than any single benchmark.
If the people building Cursor can use it as a daily driver, that is a very real signal.

The wild part is what comes next. Cursor is partnering with SpaceXAI to train a much larger model from scratch using 10x more compute on Colossus 2. Cursor has the workflow data. xAI has enormous compute. If this works, Cursor stops being just the IDE company and becomes a coding-model lab.

We've been saying for months that coding agents are the path toward general agents. Anthropic has Claude Code. OpenAI has Codex. Google has Antigravity. xAI has Grok Build. Cursor has Composer. I'm looking forward to seeing how well it performs on our own benchmarks!

Anthropic, xAI, Karpathy, and the compute wars

The compute story this week was bonkers.

The SpaceX IPO filing reportedly revealed that Anthropic is paying SpaceXAI $1.25B per month for AI compute at the Memphis Colossus facility. Per month. That's about $15B a year, through May 2029, for access to more than 220,000 NVIDIA GPUs including H100s, H200s and GB200s.

This is apparently inference compute for Claude Pro, Max and API users, not training. And it explains a lot of the recent quota changes. Anthropic doubled some Claude usage limits, and suddenly the product feels less constrained.

Also, can we just acknowledge the comedy here? Elon Musk, who publicly called Anthropic “misanthropic” and went off against every competitor to xAI, is now selling spare GPU time to Cursor and Anthropic. Who's next, OpenAI?

The bigger point is that the AI capex story is no longer just NVIDIA. It's also whoever owns the data centers, power, cooling, networking, and GPU clusters. Compute is becoming the land under the AI economy.

Also, Andrej Karpathy joined Anthropic. Karpathy could work anywhere.
He co-founded OpenAI, led Tesla Autopilot vision, taught half the AI world how neural nets work, and now he's going back into frontier LLM R&D at Anthropic.

Open source LLMs - Cohere, Qwen, Nous

Open source had a strong week too.

Cohere released Command A+, a 218B total parameter sparse MoE model with only 25B active parameters per token, under Apache 2.0. This is their first model that unifies reasoning, vision, multilingual, tool use and citations in one package.

The hardware story is great: the W4A4-quantized version can run on 2 H100s or a single B200. Cohere says it supports 48 languages, 128K input context, 64K output, and gets big jumps over Command A Reasoning, including Tau-squared Bench Telecom from 37% to 85% and Terminal-Bench Hard from 3% to 25%.

Cohere is one of those labs that doesn't always chase the loudest consumer hype, but they are very serious about enterprise and multilingual. Apache 2.0 makes this one especially useful.

Alibaba also dropped Qwen 3.7-Max, positioned as an agentic frontier model. The headline from their testing is wild: 35 hours of continuous autonomous operation with more than 1,000 tool calls. They also showed it controlling a physical robot inside Alibaba offices and finding an umbrella after about 20 minutes of agent interaction.

This digital-to-physical bridge is where things start feeling very real. An agent loop that can write code and use tools can also navigate physical tasks if you give it the right robotics stack.

And our friends at Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining. At 512K context, they report a 17x faster forward+backward pass than standard attention on a single B200, and the recovered checkpoints actually beat dense-from-scratch final loss at the same token budget.

The clever part is that the selection logic sits outside the attention kernel, so you still use regular FlashAttention on a gathered dense subsequence. No custom sparse kernel nonsense.
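To make the "select first, then run ordinary dense attention on what you gathered" idea concrete, here is a tiny pure-Python sketch. It is not Nous Research's code or their actual selection rule: the function names, the block-mean scoring heuristic, and the `block_size` / `top_k` parameters are all assumptions made up for illustration. The only point it shows is that the selection step can live entirely outside the attention routine, which stays a plain softmax attention.

```python
import math

def dense_attention(q, keys, values):
    """Ordinary softmax attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def select_positions(q, keys, block_size, top_k):
    """Hypothetical selection rule (an assumption, not from the source):
    score each block of keys by the query's dot product with the block's
    mean key, and keep the positions of the top_k blocks."""
    blocks = [list(range(start, min(start + block_size, len(keys))))
              for start in range(0, len(keys), block_size)]

    def block_score(positions):
        mean_key = [sum(keys[i][d] for i in positions) / len(positions)
                    for d in range(len(q))]
        return sum(a * b for a, b in zip(q, mean_key))

    best = sorted(blocks, key=block_score, reverse=True)[:top_k]
    return sorted(i for block in best for i in block)

def gathered_attention(q, keys, values, block_size=2, top_k=1):
    """Select positions outside the kernel, gather a dense subsequence,
    then call the unchanged dense attention on it."""
    idx = select_positions(q, keys, block_size, top_k)
    out = dense_attention(q, [keys[i] for i in idx], [values[i] for i in idx])
    return out, idx

keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
query = [0.0, 4.0]
output, kept = gathered_attention(query, keys, values, block_size=2, top_k=1)
print(kept, output)
```

With `top_k` set to the total number of blocks, the gathered subsequence is the whole sequence and the result matches full dense attention, which is a handy sanity check for any selection scheme of this shape.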
If this holds up, this could matter a lot for long-context training.

Tools and agentic engineering - X subscriptions, Grok Build, Codex Mobile

One really practical tool update: Hermes and OpenClaw can now use your X subscription directly.

This is more important than it sounds. You can connect your X Premium subscription and get access to semantic X search and Grok-related tooling without using sketchy browser automation or unofficial APIs that might get you banned. Wolfram already used this to have his agent go through his likes and bookmarks from the past week and send me news items for the show. That is exactly the kind of “small but real” agent workflow that becomes addictive.

xAI also launched Grok Build, their agentic CLI coding tool, in early beta for SuperGrok Heavy subscribers. Early users are already running parallel Grok Build agents through tmux supervisors and using it for more than coding: fleet data triage, security patching, training label work, and general automation.

The pricing being discussed is aggressive, around $1 per million input tokens and $2 per million output tokens for the API. The model version is grok-build-0.1, and folks have already wired it into Hermes with a 256K context window.

And then there's Codex Mobile, which OpenAI shipped inside the ChatGPT mobile apps. This is one of those releases that sounds small until you start using it. You can control Codex sessions remotely from your phone, connected to your machine, and because Codex has native connectors to Gmail, Calendar and other surfaces, it sometimes feels faster and more reliable than local CLIs duct-taped to third-party integrations.

I ported Wolfred into Codex with skills and everything, and I've been comparing the same tasks in Hermes and Codex. Codex is often faster, not necessarily because the model is always smarter, but because the connectors and harness are cleaner. Harness matters.
We keep coming back to this.

This Week's Buzz - W&B, CoreWeave, WolfBench and robotics

This week in the Buzz, Wolfram walked us through a few things from the Weights & Biases / CoreWeave world.

CoreWeave is a gold sponsor at ICRA 2026 in Vienna, the International Conference on Robotics and Automation. NVIDIA is also going big there with a keynote on generalist humanoid robots, 17 accepted papers and workshops around sim-to-real, robot foundation models, autonomous driving, manipulation, and physical AI.

Wolfram will be there later in the week, after speaking at the AI Developer event in Cologne about WolfBench. If you're in Europe and into robotics or agent evals, find him.

We also looked at WolfBench results for Gemini 3.5 Flash, which honestly became one of the more interesting empirical points of the episode. The model looks variable in simple harnesses, but very capable in better agent loops. That's the whole thesis of measuring model + harness together instead of pretending the model card tells the whole story.

The water discourse, almonds, and data center reality

We also got into the data center water discourse, because this talking point is everywhere right now.

There are real infrastructure questions around AI. Power, land, cooling, grid capacity, permitting, local impact, all of that matters. But the “AI is stealing drinking water” version of the argument is often wildly detached from scale.

The stat I brought up on the show: California almonds use roughly 3 to 5.5 million acre-feet of water per year, multiple times more than all North American data centers combined in 2025. Nisten and LDJ added the important cooling nuance: many large data centers use closed-loop cooling, and evaporative cooling is not universal. Some data centers can avoid water use almost entirely, but at the cost of higher electricity usage.

This doesn't mean “no concerns are valid.” It means if we're going to regulate or pause data centers, let's be honest about the actual tradeoffs.
AI compute is becoming the substrate for medicine, robotics, science, logistics, software, education and every other productivity layer. We should build responsibly, but not based on viral fear math.

Closing thoughts - foothills of the singularity

Demis closed I/O saying we're in the foothills of the singularity, and I know how that lands when you write it down. But I was in the room, and after the keynote he told me something I haven't been able to shake: he thinks AI is going to be 10x as impactful as the Industrial Revolution, and 10x as fast. Basically 100x. This is the AlphaFold guy. Not someone loose with his words.

Then look at the week. A general reasoner cracked an 80-year-old math problem. Cursor is training near-frontier coding models on a fraction of the big-lab budget. Anthropic is paying Elon $15B a year for inference. Karpathy left education to go back into pre-training. Google rolled out an intelligence uplift to a billion people who don't even know a model dropped.

If you put that on a whiteboard in 2023, it reads like a sci-fi pitch.

LDJ's mathematician friends are asking if they should keep doing their PhDs. My answer hasn't changed: yes, please keep going. The people who combine domain taste with these tools are going to ship more in 5 years than the previous generation did in 50. The tool doesn't replace the taste. It just removes the bottleneck.

That's the whole reason ThursdAI exists.
Not to hype every drop, not to dunk for engagement, but to give you a shot at being one of the people who knows what's happening, with the receipts.

This week, a lot changed.

See you next Thursday.

TL;DR and Show Notes

* Hosts and Guests
  * Alex Volkov - AI Evangelist at Weights & Biases / CoreWeave, @altryne
  * Co-hosts: @WolframRvnwlf, @nisten, @ldjconfirmed
  * Guest: Logan Kilpatrick, MTS at Google DeepMind / AI Studio, @OfficialLoganK
* Google I/O 2026
  * Google went all-in on agents across Search, Gemini, Antigravity, Workspace, Android, Cloud and YouTube (I/O site, Alex thread)
  * Antigravity 2.0 became the central agentic coding harness across Google (Sundar, Google OS demo)
  * Gemini 3.5 Flash launched as a fast, determined workhorse model for agentic loops (Logan, Noam Shazeer, Jeff Dean)
  * Gemini 3.5 Flash is rolling out across the Gemini app, Search AI Mode, Gemini API, Google AI Studio, Antigravity and Gemini Enterprise Agent Platform (Koray Kavukcuoglu)
  * Google Search is getting new Gemini 3.5 Flash-powered agentic capabilities, including a new AI-powered Search box and background information agents (Sundar)
  * Gemini Spark was announced as a 24/7 personal AI agent that can proactively work across Google surfaces (News from Google)
  * Google teased Gemini-powered Android XR smart glasses with eyewear partners Gentle Monster and Warby Parker (Google, Alex live reaction)
  * Google AI Studio and the Gemini API got major agentic developer updates, including Managed Agents (Google AI Developers)
* Vision & Video
  * Google DeepMind launched Gemini Omni, a “create anything from anything” multimodal model starting with conversational video editing (DeepMind, Google DeepMind on X)
  * Omni is available in the Gemini app, Google Flow and YouTube, with API support coming soon (Logan, Gemini App, Sundar)
  * Key distinction: Omni is not just text-to-video, it is an iterative multi-turn video editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media (Google)
* Big CO LLMs + APIs
  * OpenAI announced that a general-purpose reasoning model made progress on the Erdős planar unit distance problem, challenging an 80-year-old mathematical belief (OpenAI, X)
  * Cursor launched Composer 2.5, built on Kimi K2.5, with Opus-class coding performance at much lower cost (Cursor blog, X)
  * Alibaba released Qwen 3.7-Max, an agentic frontier model with long autonomous runs and robotics demos (Qwen blog, X, robot demo)
  * Andrej Karpathy joined Anthropic to work on frontier LLM R&D (X)
  * SpaceX IPO filing revealed Anthropic is paying $1.25B/month for AI compute at the Memphis Colossus facility (Axios, Sawyer Merritt)
  * The jury in Musk v. Altman found Musk's OpenAI claims barred by statute of limitations, with Musk saying he will appeal (Elon Musk, Sawyer Merritt, Max Zeff)
* Open Source LLMs
  * Cohere released Command A+, a 218B MoE model with 25B active parameters under Apache 2.0 (Cohere, Nick Frosst, HF W4A4, HF BF16)
  * Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining with major speedups (Blog, X, arXiv, GitHub)
* Tools & Agentic Engineering
  * Google launched Managed Agents in the Gemini API, letting developers spin up hosted Antigravity agents with Linux sandboxes and persistent state (Docs, X)
  * xAI launched Grok Build, an agentic CLI coding tool in beta for SuperGrok Heavy users (xAI CLI, X)
  * Hermes and OpenClaw can now use X subscription auth for semantic search and Grok tooling (Alex)
  * OpenAI Codex Mobile is now available in the ChatGPT mobile apps for remote agent workflows (OpenAI)
  * Anthropic doubled Claude usage outside peak hours for a limited period, including Claude Code and other Claude surfaces (Claude)
* This Week's Buzz - W&B / CoreWeave
  * Weights & Biases by CoreWeave is at ICRA 2026 in Vienna, with robotics and automation taking center stage (ICRA, W&B event page)
  * NVIDIA heads to ICRA 2026 with robotics work around generalist humanoids, physical AI and sim-to-real systems (NVIDIA Robotics, NVIDIA ICRA)
  * Wolfram is speaking about WolfBench at the AI Developer event in Cologne before heading to ICRA in Vienna (Wolfram)
* Other Topics
  * Data center water usage discourse came up again, including why comparisons need real scale and context rather than viral fear math
  * The broader theme of the week: coding agents are becoming general agents, and the major labs are now competing on the full stack of model, harness, tools, context and compute

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
留言告訴我你對這一集的想法: Podcast 佛曲 https://open.firstory.me/user/amitofo 法華與淨土 系列講座 信解篇 淨界法師 https://pse.is/7ghdyw 法華與淨土 系列講座 觀照.發願篇 淨界法師 https://pse.is/7ghepl 禪觀與淨土五 破障篇 淨土教觀學苑 淨界法師 https://pse.is/7ghg24 淨心與淨土 淨界法師 https://pse.is/4c5k8s 妙法蓮華經 淨土教觀學苑 淨界法師 https://reurl.cc/DgLr5m 淨界法師重點開示 https://reurl.cc/2rx5nX 淨界法師修行問答 https://reurl.cc/DZkyxd 大佛頂首楞嚴經 淨界法師 https://reurl.cc/no4Gg2 聞法儀軌 淨界法師 https://reurl.cc/O0LKyR 禪觀與淨土 淨界法師 https://reurl.cc/Q9GKvq 楞嚴經修學法要 淨界法師 https://reurl.cc/a98K3l 佛法修學概要 淨界法師 https://reurl.cc/mL4KgW 唯識學概要 淨界法師 https://reurl.cc/LbLKye 菩薩戒修學法要 淨界法師 https://reurl.cc/O0LKl7 《瑜伽菩薩戒本講表》淨界法師 https://reurl.cc/kZ3KRr 靈峰宗論導讀 淨界法師 https://reurl.cc/gWpXOb 唯識學概要 淨界法師 https://reurl.cc/3apq00 佛遺教經 淨界法師 https://reurl.cc/lR4KGQ 佛說阿彌陀經要解 淨界法師 https://reurl.cc/dGl468 佛說四十二章經 淨界法師 https://reurl.cc/DgLrG5 百法明門論 淨界法師 https://reurl.cc/Q9GK0p 印光大師文鈔選讀 淨界法師 https://reurl.cc/R0E3Ae 大乘起信論 淨界法師 https://reurl.cc/mL4Ka7 八識規矩頌 淨界法師 https://reurl.cc/pg4e7b 《佛說阿彌陀經》講解 淨界法師 https://reurl.cc/Q9GK8Z 天臺教觀綱宗 淨土教觀學苑 淨界法師 https://reurl.cc/GmL2zG 《菩提心修學述要》淨界法師 https://reurl.cc/VEQbGZ 阿彌陀經要解精華導讀(洛杉磯) 淨界法師 https://reurl.cc/og4Gz5 佛說阿彌陀經導讀 淨界法師 https://reurl.cc/Q9GQnO 阿彌陀佛四十八願導讀 淨界法師 https://reurl.cc/mL4161 淨土十疑論導讀 淨界法師 https://reurl.cc/rg4Wlx 佛說觀無量壽佛經導讀 淨界法師 https://reurl.cc/KALg6M 天臺教觀綱宗 淨界法師 https://reurl.cc/W3M949 大勢至菩薩念佛圓通章導讀 淨界法師 https://reurl.cc/no4gz6 果清法師 各地講演開示 https://reurl.cc/4aD5rv 《佛說梵網經菩薩戒心地品》(卷上匯釋) 果清律師 https://reurl.cc/281ylr 《佛說阿彌陀經》果清律師 宣講 https://reurl.cc/EnvqEm 《勸發菩提心文》果清律師 宣講 https://reurl.cc/mLQWV1 天因法師各地講演 https://reurl.cc/4aMk0K 如何發心受戒納受戒體 天因法師 https://reurl.cc/3aAroV 南山律在家備覽 天因法師 https://reurl.cc/pgbnxd 佛說阿彌陀經要解 天因法師 https://reurl.cc/lROEKd 八關齋戒釋要 天因法師 https://reurl.cc/W3oeQO 楞嚴經 四種清淨明誨 天因法師 https://reurl.cc/lROEWd 梵網經菩薩戒 天因法師 https://reurl.cc/bXoZ1o 戒律問答 天因法師 https://reurl.cc/3aQ8q9 占察經 唯心、真如實觀 天因法師 https://reurl.cc/zek29N 《般若波羅蜜多心經》大意 天因法師 https://reurl.cc/W3YaX5 天因律師 善德禪院 2016 佛七 https://reurl.cc/4aMk9K 《戒法戒體要義》本因法師 宣講 https://reurl.cc/a9b0vD 《敘緣發起篇》本因法師 宣講 
https://reurl.cc/gWbkW4 淨土教觀學苑 普賢行願品 育因法師 https://reurl.cc/bXb4zE 《佛說阿彌陀經要解選讀》本明法師 https://reurl.cc/Envqq0 僧伽研習會 https://reurl.cc/9rAoVa 大勢至菩薩念佛圓通章 天因律師 2022 https://pse.is/4nch2j 【空中佛學院】播放清單 https://reurl.cc/XWrDR0 佛學常識課本 簡輝雄老師主講 https://reurl.cc/ogbzDj 普賢菩薩行願品 台語 悟廣法師 2021 https://reurl.cc/Rbl00g 八關齋戒開示 悟廣法師 2021 https://reurl.cc/NZ8j1k 普賢菩薩行願品 悟廣法師 2021 https://reurl.cc/Q9lnYp 悟廣法師精華開示 https://reurl.cc/rgVzeO 《佛說觀無量壽佛經疏》-悟廣法師 https://reurl.cc/2rAMMO 立命 改過 積善 謙德 入華嚴-悟廣法師https://reurl.cc/9rAozV 「震旦斜杠佛法」 |悟廣法師 https://reurl.cc/VEZpLQ 悟廣法師 大勢至圓通疏鈔菁華 https://reurl.cc/Lbxj4e 悟行法師 各地講演 播放清單 https://reurl.cc/XWN8A7 悟行法師 念佛的功德 2019清明報恩佛七開示 https://reurl.cc/ZGqLAg 悟行法師《印光大師護國息災法會法語菁華》 https://reurl.cc/og4E5v 老實念佛、不退成佛 悟行法師主講 https://reurl.cc/W3Mm6y 歸零 悟行法師主講 https://reurl.cc/2ry92r 消災免難之道 悟行法師主講 https://reurl.cc/VEQbLA 悟行法師 大勢至菩薩念佛圓通章精華 https://pse.is/4dyxgg 【黃警官講故事】讀誦《地藏經》感應故事 https://reurl.cc/ARmMz3 【黃警官講故事】墮胎 戒色 因果故事合輯 https://reurl.cc/EZNKOR 2021年10月30日 孝廉講堂 黃柏霖居士 新道場 https://reurl.cc/kLbd7G 黃柏霖警官 無量壽經 略說 https://reurl.cc/kLb5Gx 【黃警官講故事】 https://reurl.cc/Q6ld95 黃柏霖警官 轉禍為福之道 https://reurl.cc/EZNKqm 黃柏霖警官 深信因果 趨吉避凶 https://reurl.cc/0xbooK #佛號 讀誦經典 法會 播放清單 https://reurl.cc/4avVRX #佛教歌曲 播放清單 https://reurl.cc/rg4O81 淨空老和尚圓寂 弘法 回顧 相關影片 https://pse.is/4alw3p Podcast 廣欽老和尚開示錄 https://pse.is/4my2t8 淨空老法師佛學答問精選 https://pse.is/4mjbfc 靜老說的話 (淨空老法師極力推薦 淨宗同修需多聽聽) https://pse.is/4pf3sj 聖嚴法師 正信的佛教有聲書 https://pse.is/4nzqfu 妙法蓮華經各品讀誦-個人自修 https://pse.is/4pk9mq 釋迦牟尼佛傳奇 https://pse.is/4nsypl 佛教因果故事系列 https://pse.is/4hbpbf 淨土聖賢錄 https://pse.is/4p4q8e 凈土的見證 有聲書 https://pse.is/4pd2cn 八關齋戒釋要精華 天因法師 https://pse.is/4ja9v3 因果與輪迴系列 大安法師微開 https://pse.is/4p7pwh 聖嚴法師-大法鼓 0001~0200 https://pse.is/4p9cur 聖嚴法師-大法鼓 0201~0400 https://pse.is/4mq5ps 聖嚴法師-大法鼓 0401~0600 https://pse.is/4pcymm 聖嚴法師-大法鼓 0601~0800 https://pse.is/4nptp2 聖嚴法師-大法鼓 0801~1000 https://pse.is/4pbfgv 聖嚴法師-大法鼓 1001~1200 https://pse.is/4p6pd8 聖嚴法師-大法鼓 1201~1247 https://pse.is/4nynrn 淨空老法師 楞嚴經 https://pse.is/4pfjzw 淨燦法師宣講-淨語 選擇本願念佛集 
https://pse.is/4m34qs 應用唯識學 開啟安樂自在的人生 觀成法師主講 https://pse.is/4d43mh 2022 埔裡圓通寺三壇大戒 https://pse.is/4dca2l 會性法師 金剛經演古 https://pse.is/4pdy5j 會性法師 佛說阿彌陀經 https://pse.is/4cf8pw #佛號 讀誦經典 法會 播放清單 https://reurl.cc/4avVRX 金山禦製梁皇寶懺 https://reurl.cc/L7x1Y3 文殊菩薩應化故事 https://reurl.cc/DgL2xd 佛教動漫 播放清單 https://reurl.cc/MALK1k 愛護生命的動漫故事 01 https://reurl.cc/9rLmOX 愛護生命的動漫故事 02 https://reurl.cc/1Y7Nop 了凡的故事 電影版 https://reurl.cc/MALKj4 佛教電影 百年虛雲 https://reurl.cc/5rZxev 佛教電影 魚籃馬郎觀世音 https://reurl.cc/W3MmpL 佛教電影 觀世音妙緣 https://reurl.cc/j8YEbn 佛教電影 觀音老母 https://reurl.cc/R0EVKr 佛教電影 #二十四孝 https://reurl.cc/0jlqVb 東北大鼓書 https://reurl.cc/LbLKNX Powered by Firstory Hosting
Hey everyone, Alex here
Hello there! I'm Lorenzo, and today I bring you episode 791 of Atareao con Linux. If you've been following my latest tech adventures, you'll know I've dived headfirst into the fascinating world of local language models. However, following my videos and articles about Ollama, a recurring question has come up in the community: why use Ollama rather than Llama.cpp directly? Or is one actually better than the other? In this episode I set out to clear up all your doubts and, along the way, tell you about some hardware news that will leave you with your jaw on the floor.

The origin: friends and technology at the Linux Center

All of this started taking shape at the recent Artificial Intelligence sessions we held at the Linux Center with our friends at Slimbook. It was an incredible experience where I got to share the stage with Alejandro López and Manuel Lemos. Seeing people's interest, and how the course filled up completely, gave me a clear hint: we all want to be in control of our own AI. Alejandro, who is a great champion of these topics, lent me a machine that has been key to my current tests, and which I talk about a bit later in this audio.

Llama.cpp: the operating room for tensors

To understand the difference, you need to know what each thing is. Llama.cpp is the pure engine. Think of it as the engine of a race car where you can adjust every last nut. It is written in C++ by Georgi Gerganov with a clear goal: maximum performance.

Ollama: user experience taken to the max

On the other side we have Ollama. The two are often seen as rivals, but the reality is that Ollama uses Llama.cpp under the hood. The difference is that Ollama is a "wrapper" or orchestrator written in Go that makes our lives dramatically easier. It takes care of managing your graphics card's memory (VRAM) intelligently.

Tinkering with containers and a personality of its own

Naturally, I have set up Llama.cpp using Podman and Quadlets, fully integrating it into my workflow. In this episode I explain how I configured my 16GB NVIDIA RTX 4060 Ti to fly, letting me use contexts of up to 128K.

Hardware: NVIDIA and the silence of the NPUs

One of the big topics of this episode is hardware. I review NVIDIA's cards, from the 30 series to the powerful 50 series. But the real surprise has been the Slimbook One with an NPU (Neural Processing Unit).

The anatomy of models: cracking the code

Have you ever seen model names like "Mistral-7B-Instruct-v3-Q4_K_M.gguf" and felt lost?

Episode chapters so you don't miss anything:
00:00 - Welcome to episode 791: Ollama vs Llama.cpp
01:35 - Recap of the AI sessions at the Linux Center with Slimbook
03:34 - Why is there so much controversy between Ollama and Llama.cpp?
04:42 - Llama.cpp: the "operating room" for tensors and pure performance
05:18 - Ollama: the orchestrator that makes our lives easier
06:40 - Comparison: what does one do that the other doesn't?
07:59 - Are you an IKEA person, or do you build your own furniture?
09:00 - Tinkering with Llama.cpp, Podman and Quadlets
10:48 - Leslie: my AI with a personality of its own in OpenWeb UI
12:44 - How to download models by hand with Rust HF Downloader
13:50 - Hardware for AI: a quick guide to NVIDIA cards
17:15 - The experience with the Slimbook One and its integrated NPU
18:05 - Anatomy of a model: understanding the names
19:40 - The Rosetta Stone of quantization
21:08 - Conclusions and next steps with OpenWeb UI

More information and links in the episode notes
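As a rough illustration of the model-name anatomy mentioned above, the sketch below splits a name like "Mistral-7B-Instruct-v3-Q4_K_M.gguf" into its parts. This is a heuristic written for this example, not an official parser: the function name `parse_model_filename`, the output field names and the regular expressions are all assumptions, and real-world filenames do not always follow this pattern.

```python
import re

def parse_model_filename(filename):
    """Heuristically split a model filename into labelled parts.

    Assumed layout (hypothetical, for illustration only):
    <name>-<size>-<tags...>-<version>-<quantization>.<format>
    """
    stem, _, extension = filename.rpartition(".")
    info = {"name": [], "size": None, "tags": [], "version": None,
            "quantization": None, "format": extension}
    for part in stem.split("-"):
        if re.fullmatch(r"\d+(\.\d+)?[BbMm]", part):
            info["size"] = part            # parameter count, e.g. 7B
        elif re.fullmatch(r"[vV]\d+(\.\d+)*", part):
            info["version"] = part         # release version, e.g. v3
        elif re.fullmatch(r"I?Q\d+(_[A-Za-z0-9]+)*|BF16|F16|F32", part):
            info["quantization"] = part    # e.g. Q4_K_M: roughly 4-bit k-quant, medium variant
        elif info["size"] is None:
            info["name"].append(part)      # everything before the size is the base name
        else:
            info["tags"].append(part)      # fine-tune labels such as Instruct
    info["name"] = "-".join(info["name"])
    return info

parsed = parse_model_filename("Mistral-7B-Instruct-v3-Q4_K_M.gguf")
for field, value in parsed.items():
    print(f"{field}: {value}")
```

Reading the result back: the base model is Mistral, it has about 7 billion parameters, it is an instruction-tuned variant at version 3, and the weights are stored in a 4-bit quantized GGUF file.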
The Sinclair ZX Spectrum catalogue spanned a wide and varied range of genres. There's no accounting for taste, but we have all at some point enjoyed a football, basketball, tennis, fighting or Olympic-style game. Today we'll talk about a whole bunch of those titles, some of them legendary, that we still remember today. Before the main topic we have an interview with Juanjo Muñoz, an adventure-game expert with a fascinating past and a present spent feeding content to all the fans of this very special genre. We know you love the shows where we include two interviews. That's the case with this 14×03. As a wonderful finale to the show, we'll have an interview with Ricardo Machuca, artist and illustrator at DINAMIC, creator of the cover art for games such as La Guerra de las Vajillas, Cosmic Sheriff, Bestial Warrior and Delfox, but especially remembered for creating several logos for hugely important games, as well as all the graphic resources for those magnificent instruction booklets from the Ruiz brothers' company. He'll share a wealth of valuable information, not only about that era but also about his time as an animator at Hanna Barbera and on other productions such as the Batman animated series. No small thing! We hope you enjoy this show, because this time we'll be back sooner than you expect. See you very soon! With Jesús Martínez del Vas, Jesús Relinque «Pedja» and Alejandro Ibáñez Muñoz. If you want access to the titles mentioned in this podcast, go to elmundodelspectrum.com
Paracha Tazria-Metsora «Speech: Key to Life or Source of Impurity» - Rav Shoushana (128k) by Rav David SHOUSHANA
Easter Service, Guest Preacher - Gideon Odoma - Every Nation Kirche Berlin (128k) by Every Nation Kirche Berlin
Hey dear subscriber, Alex here from W&B, let me catch you up!

This week started with Anthropic releasing /fast mode for Opus 4.6, continued with ByteDance's reality-shattering video model called SeeDance 2.0, and then the open weights folks pulled up! Z.ai released GLM-5, a 744B top-ranking coder beast, and then today MiniMax dropped a heavily RL'd MiniMax M2.5, showing 80.2% on SWE-bench, nearly beating Opus 4.6! I interviewed Lou from Z.ai and Olive from MiniMax on the show today back to back, btw; very interesting conversations, starting after the TL;DR!

So while the open source models were catching up to the frontier, OpenAI and Google both dropped breaking news (again, during the show), with Gemini 3 Deep Think shattering ARC-AGI 2 (84.6%) and Humanity's Last Exam (48% w/o tools)... Just an absolute beast of a model update. And OpenAI launched their Cerebras collaboration with GPT 5.3 Codex Spark, supposedly running at over 1000 tokens per second (but not as smart).

Also, crazy week for us at W&B as we scrambled to host GLM-5 on the day of release, and we are working on dropping Kimi K2.5 and MiniMax on our inference service too! As always, all show notes are at the end, let's DIVE IN!

ThursdAI - AI is speeding up, don't get left behind! Sub and I'll keep you up to date with a weekly catch-up

Open Source LLMs

Z.ai launches GLM-5 - #1 open-weights coder with 744B parameters (X, HF, W&B inference)

The breakaway open-source model of the week is undeniably GLM-5 from Z.ai (formerly known to many of us as Zhipu AI). We were honored to have Lou, the Head of DevRel at Z.ai, join us live on the show at 1:00 AM Shanghai time to break down this monster of a release.

GLM-5 is massive, not something you run at home (hey, that's what W&B inference is for!), but it's absolutely a model worth thinking about if your company has on-prem requirements and can't share code with OpenAI or Anthropic.
They jumped from 355B in GLM-4.5 and expanded their pre-training data to a whopping 28.5T tokens to get these results. But Lou explained that it's not only about data: they adopted DeepSeek's sparse attention (DSA) to help preserve deep reasoning over long contexts (this one has 200K).

Lou summed up the generational leap from version 4.5 to 5 perfectly in four words: “Bigger, faster, better, and cheaper.” I dunno about faster, this may be one of those models that you hand off more difficult tasks to, but definitely cheaper, with $1 input/$3.20 output per 1M tokens on W&B!

While the evaluations are ongoing, the one interesting tidbit from Artificial Analysis was that this model scores the lowest on their hallucination rate bench!

Think about this for a second: this model is neck and neck with Opus 4.5, and if Anthropic hadn't released Opus 4.6 just last week, this would be an open weights model that rivals Opus, one of the best models the Western frontier labs, with all their investments, have out there. Absolutely insane times.

MiniMax drops M2.5 - 80.2% on SWE-bench verified with just 10B active parameters (X, Blog)

Just as we wrapped up our conversation with Lou, MiniMax dropped their release (though not weights yet, we're waiting ⏰) and then Olive Song, a senior RL researcher on the team, joined the pod, and she was an absolute wealth of knowledge!

Olive shared that they achieved an unbelievable 80.2% on SWE-Bench Verified. Digest this for a second: a 10B active parameter open-source model is directly trading blows with Claude Opus 4.6 (80.8%) on one of the hardest real-world software engineering benchmarks we currently have. While being (Alex checks notes) ... 20X cheaper and much faster to run? Apparently their fast version gets up to 100 tokens/s.

Olive shared the “not so secret” sauce behind this punch-above-its-weight performance.
The massive leap in intelligence comes entirely from their highly decoupled Reinforcement Learning framework called “Forge.” They heavily optimized not just for correct answers, but for the end-to-end time it takes to complete a task. In the era of bloated reasoning models that spit out ten thousand “thinking” tokens before writing a line of code, MiniMax trained their model across thousands of diverse environments to use fewer tools, think more efficiently, and execute plans faster. As Olive noted, less time waiting and fewer tools called means less money spent by the user. (As confirmed by @swyx with the Windsurf leaderboard, developers often prefer fast but good enough models.)

I really enjoyed the interview with Olive, and I really recommend you listen to the whole conversation starting at 00:26:15. Kudos to MiniMax on the release (and I'll keep you updated when we add this model to our inference service).

Big Labs and breaking news

There's a reason the show is called ThursdAI, and today this reason is clearer than ever: AI's biggest updates happen on a Thursday, often live during the show. This happened 2 times last week and 3 times today, first with MiniMax and then with both Google and OpenAI!

Google previews Gemini 3 Deep Think, top reasoning intelligence SOTA Arc AGI 2 at 84% & SOTA HLE 48.4% (X, Blog)

I literally went
Jake and Michael discuss all the latest Laravel releases, tutorials, and happenings in the community.

Show links

* hasSole() Collection Method in Laravel 12.49.0
* hasMany() Collection Method in Laravel 12.50.0
* Filament v5.2.0 Adds a Callout Component
* Clawdbot Rebrands to Moltbot After Trademark Request From Anthropic
* Install Laravel Package Guidelines and Skills in Boost
* Fuse for Laravel: A Circuit Breaker Package for Queue Jobs
* NativePHP for Mobile Is Now Free
* Manage PostgreSQL Databases Directly in VS Code with Microsoft's Extension
* Livewire 4 and Blade Improvements in Laravel VS Code Extension v1.5.0
* Statamic 6 Is Officially Released
* Laravel Announces Official AI SDK for Building AI-Powered Apps
* Claude Opus 4.6 adds adaptive thinking, 128K output, compaction API, and more
* OpenAI Releases GPT-5.3-Codex, a New Codex Model for Agent-Style Development
* Laravel Live UK returns to London on June 18-19, 2026
* Bagisto Visual: Theme Framework with Visual Editor for Laravel E-commerce
* Generate Complete Application Modules with a Single Command using Laravel TurboMaker
* Encrypt Files in Laravel with AES-256-GCM and Memory-Efficient Streaming
* Mask Sensitive Eloquent Attributes on Retrieval in Laravel
* Laravel Related Content: Semantic Relationships Using pgvector
What a wild show! We're back on El Mundo del Spectrum Podcast to tackle a truly psychedelic topic: Surrealism on the Spectrum. In this 14×02 we'll look at the surreal, psychedelic games that left such a mark on us in the 80s. The Spectrum catalogue was well stocked with very strange titles, and the time has come to examine them in depth. Laughs and psychedelic music guaranteed. The interview is with Alberto Nadal, Sales Director of Dinamic Multimedia, who will share a wealth of figures and data never revealed before. All the secrets of PC Fútbol, told for the first time on El Mundo del Spectrum Podcast. This time there is no news section. We'll cover the main news in the next show. Taking part in this El Mundo del Spectrum Podcast 14×02 are Jesús Martínez del Vas, Jesús Relinque «Pedja» and Alejandro Ibáñez. We hope you enjoy listening to it as much as we enjoyed making it. Access the games from this show's post on our website, elmundodelspectrum.com
Ho Ho Ho, Alex here! (a real human writing these words, this needs to be said in 2025) Merry Christmas (to those who celebrate) and welcome to the very special yearly ThursdAI recap! This was an intense year in the world of AI, and after 51 weekly episodes (this is episode 52!) we have the ultimate record of all the major and most important AI releases of this year! So instead of bringing you a weekly update (it's been a slow week so far, most AI labs are taking a well deserved break, and the Chinese AI labs haven't yet surprised anyone), I'm dropping a comprehensive yearly AI review! Quarter by quarter, month by month, both in written form and as a pod/video! Why do this? Who even needs this? Isn't most of it obsolete? I asked myself this exact question while prepping for the show (it was quite a lot of prep, even with Opus's help). I eventually landed on: hey, if nothing else, this will serve as a record of the insane week of AI progress we all witnessed. Can you imagine that the term Vibe Coding is less than 1 year old? That Claude Code was released at the start of THIS year? We hedonically adapt to new AI goodies so quickly, and I figured this will serve as a point-in-time check we can come back to and feel the acceleration! With that, let's dive in - P.S. the content below is mostly authored by my co-author for this, Opus 4.5 high, which at the end of 2025 I find to be the best creative writer, with the best long context coherence, that can imitate my voice and tone (hey, I'm also on a break!
Our Christmas get-together arrives right on time at El Mundo del Spectrum Podcast, this time in Microdrive format. On this occasion we've gone for a relaxed, friendly chat about Christmas and the Spectrum, with the return of Juan Francisco Torres, who comes home for Christmas; the presence once again of Aitor Chávez, a team member who doesn't appear very often but who, as always, brings his more sentimental view of Sinclair's machine; and, rounding out the cast, regulars Jesús Martínez del Vas and Alejandro Ibáñez. If you feel like spending a pleasant, cosy, nostalgia-filled while, this show will help you reconnect with the Christmases of yesteryear and travel back in time, recalling anecdotes and situations shared by thousands of Spectrum fans. A get-together that, like every year, we want to share with you to celebrate this special season. From El Mundo del Spectrum, we wish you a Merry Christmas and the best possible start to the new year.
Hey everyone, December started strong and does NOT want to slow down! OpenAI showed us their response to the Code Red and it's GPT 5.2, which doesn't feel like a .1 upgrade! We got it literally as breaking news at the end of the show, and oh boy! The new kind of LLMs is here. GPT, then Gemini, then Opus and now GPT again... Who else feels like we're on a trippy AI rollercoaster? Just me?
In the second part of the fifth installment of the 2025 MacVoices Holiday Gift Guide, David Ginsburg, Wally Cherwinski, and Chuck Joiner highlight a 4K portable monitor and stand, AirPods safety straps, compression and neoprene travel socks, rechargeable motion-sensor lighting, a compact USB microphone, and smart-home hardware buttons. Practical travel gear, audio upgrades, and smart lighting tools round out the list of gift picks. (Part 2) MacVoices is supported by Take Control Books: The Answers You Need Now, From Leading Experts. Start your library today. http://takecontrolbooks.com Show Notes: Chapters: [0:00] Introduction to Part Two of the 2025 Holiday Gift Guide [0:10] Sponsor message and transition to round three [0:40] Dave's pick: 4K portable monitor + adjustable stand [2:43] Monitor pricing, size, and travel practicality [3:16] Wally's pick: AirPods safety straps [4:36] AirPods loss prevention and real-world mishaps [6:47] Materials, magnetic connectors, and affordability [9:08] Chuck's pick: rechargeable motion-sensor lighting [9:59] Installation methods and LED performance [11:54] Long-term use and visibility improvements [12:55] Round four begins [12:56] Dave's pick: Rode NT-USB Mini microphone [14:30] Audio quality, portability, and use cases [15:35] Microphone collections and recommendations [20:08] Wally's pick: neoprene travel socks and compression socks [22:18] Comfort, circulation, and travel benefits [24:45] Fit, sizing, and quality considerations [28:01] Chuck's pick: Flick smart-home buttons [29:35] Closing remarks and guest contact information Links: David Ginsburg: True 4K Portable Monitor - 15.6inch UHD 3840×2160 100% sRGB USB-C HDMI External Second Monitor Portable IPS Screen https://amzn.to/4oEgpbN OCYCLONE Tablet Holder Compatible with iPad Stand for Desk, Foldable Tablet iPad Holder Portable Monitor Stand https://amzn.to/48mdnE7 RØDE NT-USB Mini Versatile Studio-quality Condenser USB Microphone https://amzn.to/4oEtGkD Wally Cherwinski: 
Ultra Strong Magnetic Anti-Lost Straps for AirPods, Colorful Soft Silicone Sports Lanyard Compatible with AirPods 4rd / AirPod Pro https://amzn.to/48mgv2P OMGear Water Socks Neoprene Socks Beach Booties 3mm 5mm Anti-Slip Wetsuit Footwear Fin Swim Sand Proof Socks https://amzn.to/4a15vJs FuelMeFoot 3 Pack Copper Compression Socks https://amzn.to/44dqeWO Chuck Joiner: 10-inch Under Cabinet Lighting, 2 Pack Rechargeable Motion Sensor Light Indoor, 5 Levels Dimmable Magnetic Closet Lights https://amzn.to/4rEy4Tt Flic Smart Button 3-Pack | Light Switch, Music Controller, Routine Trigger That Works with Alexa, Matter, Homekit, SmartThings https://amzn.to/4plNZo7 Guests: David Ginsburg is the host of the weekly podcast In Touch With iOS where he discusses all things iOS, iPhone, iPad, Apple TV, Apple Watch, and related technologies. He is an IT professional supporting Mac, iOS and Windows users. Visit his YouTube channel at https://youtube.com/daveg65 and find and follow him on Twitter @daveg65 and on Mastodon at @daveg65@mastodon.cloud. Wally Cherwinski is a Videographer based in Ottawa, Canada. Originally trained as a scientist, he spent a portion of his career in research and teaching at the University of Cambridge, England while doubling as a freelance photographer and writer. Later, he joined Canada's National Research Council and spent many years managing communications for the Canadian Space Program. Starting with 16mm film, he has written and directed numerous documentaries and television features, including projects with Canada's National Film Board. More recently, he has combined his passion for video with his love of travel. Wally has been a Mac user since the original 128K in 1984 and his Apple "museum" includes 28 Macs (not to mention Newtons, iPods, iPhones & iPads). He has delivered video workshops at Macworld, at Macintosh User Groups in Canada and on three MacMania cruises. He also writes a regular video column in the ScreenCastsOnline monthly magazine. 
You can connect with him on X, or view his Cirque du Mac videos (and others) on his YouTube channel. Support: Become a MacVoices Patron on Patreon http://patreon.com/macvoices Enjoy this episode? Make a one-time donation with PayPal Connect: Web: http://macvoices.com Twitter: http://www.twitter.com/chuckjoiner http://www.twitter.com/macvoices Mastodon: https://mastodon.cloud/@chuckjoiner Facebook: http://www.facebook.com/chuck.joiner MacVoices Page on Facebook: http://www.facebook.com/macvoices/ MacVoices Group on Facebook: http://www.facebook.com/groups/macvoice LinkedIn: https://www.linkedin.com/in/chuckjoiner/ Instagram: https://www.instagram.com/chuckjoiner/ Subscribe: Audio in iTunes Video in iTunes Subscribe manually via iTunes or any podcatcher: Audio: http://www.macvoices.com/rss/macvoicesrss Video: http://www.macvoices.com/rss/macvoicesvideorss
Season 14 of El Mundo del Spectrum Podcast is finally under way! In this 14×01 we wanted to do a special devoted to Barbarian games. Besides reviewing all the games on this theme for the Spectrum and other platforms, we'll talk at length about CONAN, Robert E. Howard's literary character, and his evolution in the world of comics and film. Of course we'll talk about the film Conan the Barbarian, starring Arnold Schwarzenegger, a title lodged in the imagination of many of us who grew up in the 80s. We'll interview Jorge Rosado of Noria Works. Jorge is one of the most important figures in Spanish development after the Golden Age of Spanish Software, with an impressive catalogue of games. It's a highly commendable era of Spanish video games that has been inexplicably overshadowed, and one we'll try to shed light on. You'll also get an express news section with the most important stories of recent months. We hope you enjoy what we've prepared for this episode. This time we'll be back soon. In the meantime, listen to the episode and send us whatever comments you'd like to pass on. Oh! And there's a contest related to the Conan comic. The fastest entrant will win an El Mundo del Spectrum book.
Hey, Alex here! Quick note, while preparing for this week, I posted on X that I don't remember such a quiet week in AI since I started doing ThursdAI regularly, but then 45 min before the show started, Kimi dropped a SOTA oss reasoning model, turning a quiet week into an absolute banger. Besides Kimi, we covered the updated MCP thinking from Anthropic, and had Kenton Varda from Cloudflare as a guest to talk about Code Mode, chatted about Windsurf and Cursor's latest updates and covered OpenAI's insane deals. Also, because it was a quiet week, I figured I'd use the opportunity to create an AI powered automation, and used N8N for that, and shared it on the stream, so if you're interested in automating with AI with relatively low code, this episode is for you. Let's dive in. ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. Kimi K2 Thinking is Here and It's a 1 Trillion Parameter Beast! (X, HF, Tech Blog) Let's start with the news that got everyone's energy levels skyrocketing right as we went live. Moonshot AI dropped Kimi K2 Thinking, an open-source, 1 trillion-parameter Mixture-of-Experts (MoE) model, and it's an absolute monster. This isn't just a numbers game; Kimi K2 Thinking is designed from the ground up to be a powerful agent. It has just around 32 billion active parameters during inference, a massive 256,000 token context window, and an insane tool-calling capacity: they're claiming it can handle 200-300 sequential tool calls without any human intervention. The benchmarks are just as wild. On Humanity's Last Exam (HLE), they're reporting a score of 44.9%, beating out both GPT-5 and Claude 4.5 Thinking. While it doesn't quite top the charts on SWE-bench Verified, it's holding its own against the biggest closed-source models out there. 
Seeing an open-source model compete at this level is incredibly exciting. During the show, we saw some truly mind-blowing demos, from a beautiful interactive visualization of gradient descent to a simulation of a virus attacking cells, all generated by the model. The model's reasoning traces, which are exposed through the API, also seem qualitatively different from other models, showing a deep and thoughtful process. My co-hosts and I were blown away. The weights and a very detailed technical report are available on Hugging Face, so you can dive in and see for yourself. Shout out to the entire Moonshot AI team for this incredible release! Other open source updates from this week * HuggingFace released an open source “Smol Training Playbook” on training LLMs, it's a 200+ page interactive beast with visualizations, deep dives into pretraining, datasets, post-training and more! (HF) * Ai2 launches OlmoEarth, foundation models + an open, end-to-end platform for fast, high-resolution Earth intelligence (X, Blog) * LongCat-Flash-Omni, an open-source omni-modal system with millisecond E2E spoken interaction, 128K context and a 560B ScMoE backbone (X, HF, Announcement) Big Tech's Big Moves: Apple, Amazon, and OpenAI The big companies were making waves this week, starting with a blockbuster deal that might finally make Siri smart. Apple will reportedly be paying Google around $1 billion per year to license a custom 1.2 trillion-parameter version of Gemini to power a revamped Siri. This is a massive move. The Gemini model will run on Apple's Private Cloud Compute, keeping user data walled off from Google, and will handle Siri's complex summarizer and planner functions. After years of waiting for Apple to make a significant move in GenAI, it seems they're outsourcing the heavy lifting for now while they work to catch up with their own in-house models. 
As a user, I don't really care who builds the model, as long as Siri stops being dumb! In more dramatic news, Perplexity revealed that Amazon sent them a legal threat to block their Comet AI assistant from shopping on Amazon.com. This infuriated me. My browser is my browser, and I should be able to use whatever tools I want to interact with the web. Perplexity took a strong stand with their blog post, “Bullying is Not Innovation,” arguing that user agents are distinct from scrapers and act on behalf of the user with their own credentials. An AI assistant is just that: an assistant. It shouldn't matter if I ask my wife or my AI to buy something for me on Amazon. This feels like a move by Amazon to protect its ad revenue at the expense of user choice and innovation, and I have to give major props to Perplexity for being so transparent and fighting back. Finally, OpenAI continues its quest for infinite compute, announcing a multi-year strategic partnership with AWS. This comes on top of massive deals with NVIDIA, Microsoft, Oracle, and others, bringing their total commitment to compute into the trillions of dollars. It's getting to a point where OpenAI seems “too big to fail,” as any hiccup could have serious repercussions for the entire tech economy, which is now heavily propped up by AI investment. In a recent post on X, Sam clarified that OpenAI doesn't want to be too big to fail, and that the recent miscommunications about a US government backstop for OpenAI's infrastructure were taken out of context.
Exterior Cleaning Business Owners in small towns often feel stuck. You're grinding hard, but the jobs don't flow consistently, and your website or reviews aren't pulling the way they should. That's exactly where Brayson was, a truck driver turned business owner in Valdosta, GA. He wanted to grow past $150K but hit a plateau. After working with Jonathon Henderson and Pressure Washing Marketing Pros, everything changed: ✅ Fixed major website issues (his site wasn't even indexed on Google) ✅ Doubled reviews to dominate local trust ✅ Increased commercial jobs by 25% ✅ Built a consistent year-round window cleaning route. Now his business has grown from $128K to $250K+ a year, even in a rural market. If you're tired of guessing and want proven systems that actually bring in jobs, this testimonial shows what's possible when you get the right help.
Linktree: https://linktr.ee/AnalyticJoin The Normandy For Additional Bonus Audio And Visual Content For All Things Nme+! Join Here: https://ow.ly/msoH50WCu0KIn this segment of Notorious Mass Effect, Analytic Dreamz explores KPop Demon Hunters (2025), Netflix's most-watched film ever, premiering June 20 with 291.5M views by Sept. 9, outpacing Squid Game and Red Notice. Directed by Maggie Kang and Chris Appelhans, this $100M K-pop fantasy blend topped Netflix in 93 countries and grossed $18–20M in a sing-along run. Its soundtrack, led by HUNTR/X's “Golden” (#1 Hot 100) and featuring TWICE, hit #1 Billboard 200 with 128K units and 3B+ global streams. Analytic Dreamz unpacks its 95–97% Rotten Tomatoes scores, 400% merch surge, and HUNTR/X's rise as a top K-pop act, marking a cultural milestone. Support this podcast at — https://redcircle.com/analytic-dreamz-notorious-mass-effect/donationsAdvertising Inquiries: https://redcircle.com/brandsPrivacy & Opt-Out: https://redcircle.com/privacy
In this episode of The Tech Leader's Playbook, Avetis Antaplyan sits down with Susan Ruediger, Founder and Chief Mission Officer of the CMT Research Foundation (CMTRF), and Laura MacNeill, the organization's CEO. Together, they explore how patient-led research is revolutionizing drug development and catalyzing billion-dollar outcomes. Susan shares the remarkable story of CMTRF's $128,000 seed investment in DTX Pharma that led to a $1 billion Novartis acquisition — a masterclass in strategic risk-taking and venture philanthropy. Laura explains how CMTRF's unique “go-out-of-business” mission drives urgency, focus, and impact, while also inspiring other nonprofits to adopt similar models. The conversation dives deep into storytelling's role in galvanizing donors, the importance of milestones and reinvestment, and how rare disease foundations can unlock breakthroughs for broader neurodegenerative diseases like ALS, Parkinson's, and Alzheimer's. Whether you're a biotech leader, investor, or nonprofit executive, this episode offers actionable lessons on focus, partnerships, and creating outsized impact with limited resources.TakeawaysPatient-led research can de-risk and accelerate drug development.$128K seed funding led to a $1B Novartis acquisition.CMTRF uses a venture-philanthropy model with milestone-based funding.Mission: fund treatments, find a cure, close the foundation.Storytelling drives awareness, donations, and partnerships.Early investments keep promising science alive.Biotech partnerships share risk and leverage expertise.Novartis validated CMT as a major market opportunity.Rare disease focus offers faster FDA pathways.Staying laser-focused means saying no to distractions.Chapters00:00 Intro & Guest Welcome01:20 From Grassroots Donations to Billion-Dollar Deals02:30 Understanding CMT and Its Impact05:00 Finding the Right Delivery Vehicle for Drugs07:40 The $128K Bet That Changed Everything09:50 Other Success Stories & Market Signaling13:00 The 
Venture-Philanthropy Model Explained16:30 The Power of Milestones and Flexibility18:45 Reinvestment and Sustainable Funding21:30 Role of Storytelling and Strategy in Movement Building26:10 Velocity Campaign & Raising $20M27:25 Why Biotechs Care About Rare Diseases31:50 CMT as a Gateway Indication for Neurodegenerative Disease33:30 Staying Focused and Saying No38:30 The Drug Development Lifecycle and Staying Mission-Aligned42:10 How to Get Involved and Follow CMTRF's Work45:10 Personal & Business Advice for Leaders48:30 Favorite Books and Final Thoughts52:00 Closing Remarks and Call to ActionSusan Ruediger's Social Media Links:https://www.linkedin.com/in/susan-ruediger/Laura MacNeill's Social Media Links:https://www.linkedin.com/in/laura-macneill-m-b-a-97633732/CMT Research Foundation's Website:https://cmtrf.org/Resources and Links:https://www.hireclout.comhttps://www.podcast.hireclout.comhttps://www.linkedin.com/in/hirefasthireright
Hey everyone, Alex here
In the 910th episode of the PokerNews Podcast, Chad Holloway, Kyna England, and Mike Holtz discuss the wild hand involving Corey Eyring and Randy "3Coin" Sadler from The Lodge Live Stream in which the former was put to the ultimate test on the river for his entire net worth! They then talk about Ian O'Hara making good and whether or not that should be celebrated, German streamer Jenzoou's extreme reaction to winning a $63k mystery bounty, and Michael "The Grinder" Mizrachi battling Martin Kabrhel on Venetian Poker Live. Plus, the return of the National Heads-Up Poker Championship and tournament wins by Sasha Sabbaghian in the $600 The Hendon Mob Mid-Major Championship, King at the latest Celebrity Poker Tour, and Mike Estes in the 2026 MSPT Iowa Poker State Championship, a tournament in which Chad took third place for $69k. The crew also reveals the winner in our Pokerriculum giveaway, and then announces a new giveaway for a Silver Pass to the upcoming PokerStars NAPT in Las Vegas. You'll also get a sneak peek at the latest Life Outside Poker Podcast featuring Abby Merk and see highlights from the 17th Annual Ante 4 Autism charity event. Finally, Chad and Kyna announce a brand new project - The MSPT Podcast Presented by PokerNews. That's right, the dynamic duo will be cohosting a brand new show that'll debut next week right here on PokerNews. A new PokerNews Podcast drops every Thursday at 8a PT / 11a ET / 4p UK time. Remember to subscribe to our YouTube channel so you do not miss an episode! Time Stamps *Time | Topic* 00:00 | Welcome to the show 01:20 | Corey Eyring plays big hand for entire net worth 08:20 | Ian O'Hara makes good but inspires debate 14:20 | German streamer Jenzoou goes wild 17:00 | Mizrachi vs. Kabrhel on Venetian Poker Live 19:11 | $128K pot largest of the night + big winners & losers 20:49 | PGT Venetian Las Vegas Classic Sept. 
11-16 21:17 | National Heads-Up Poker Championship returns 26:27 | Sasha Sabbaghian wins $600 The Hendon Mob Mid-Major Championship 27:49 | Influencer “King” latest Celebrity Poker Tour champ 31:14 | Chad took 3rd in Iowa for $69K 33:06 | Mike Estes wins MSPT Iowa Poker State Championship for $153,551 34:45 | Teresa the lucky dealer 35:20 | Debuting a new show – MSPT Podcast presented by PokerNews 38:08 | Preview of Life Outside Poker feat. Abby Merk 40:09 | Highlights from the Ante 4 Autism Charity Event 40:35 | Winner of Pokerriculum giveaway 42:20 | Silver Pass giveaway for the NAPT
Hour 4 kicks off with Dylan Sharkey of the Illinois Policy Institute exposing Gov. JB Pritzker's potential third-term run and presidential ambitions, highlighting Illinois' lack of gubernatorial term limits, $128K salaries for legislators working just 70 days a year, and ongoing fiscal struggles like Chicago's bankrupt schools. The discussion paints a clear picture of entrenched political corruption and financial mismanagement in Illinois. The hour then shifts to an upbeat economic outlook fueled by the “big beautiful bill,” tariff revenues, and historic tax cuts promising average American families an extra $13,000 a year. Marc underscores President Trump's efforts to counteract Biden-era restrictions, with retail sales rebounding and a potential housing market surge on the horizon. Finally, the hour wraps with a community call to action supporting the Lieber family after a tragic accident, showcasing the strength of local solidarity amid challenging times.
Marc and Kim lead the charge through four hours packed with hard-hitting political analysis and economic optimism. Hour 1 slams Biden's digital currency push and celebrates Stephen Colbert's cancellation amid late-night battles, while mocking woke Hollywood and spotlighting Shane Gillis' authentic ESPYs hosting. Hour 2 dives into local tragedy fundraisers, Biden's auto-pen scandal, and Scott Bushkie's warning about “vibe hiring” and AI job filters. Hour 3 exposes sanctuary cities enabling violent criminal aliens and Griff Jenkins' fight for immigration enforcement. Hour 4 wraps with Dylan Sharkey exposing Illinois Governor JB Pritzker's power grab, a jaw-dropping $128K paycheck for Illinois lawmakers working just 70 days a year, and a deep dive into the promising “big beautiful bill” that could fuel an American economic boom, plus a community fundraiser in Eureka. This show cuts through political corruption, highlights economic wins, and keeps listeners plugged into the fight for America's future.
CJ CUP BYRON NELSON 2025, Fantasy Golf Picks & Bets | Fantasy Golf DegeneratesJoin Kenny Kim and Byron Lindeque as they dive into The 2025 CJ Cup Byron Nelson at TPC Craig Ranch. Get insider previews of the course, expert analysis of the odds, and exclusive Fantasy Golf picks and best bets from the "Fantasy Golf Degenerates" podcast. Tune in for a deep dive into this week's PGA Tour action!Episode “413” | Pulling Teeth for Outrights#CJCup #ByronNelson #CJCupByronNelson #TPCCraigRanch #FantasyGolf #PGATourSub to the Mayo Media Network: https://bit.ly/YTMMNUse Code “FGDegenerates” for 1.5% Cashback up to $200 at ProphetX today www.ProphetX.co/registerGet 20% off https://www.fantasynational.com/FGDUse Code “FGD15” on checkout at https://kickbackgolf.com for 15% off your order.Use Code “FGD50” for your copy of all of Byron's tools here: https://www.patreon.com/TheModelManiacSHOW INDEXIntro - 0:00Recap - 1:07Byron's Story Time - 10:50LIV Golf Recap - 16:34The Chevron Championship - 19:11ProphetX Picks - 32:16Kenny's Story Time - 41:42Course Preview - 47:25DFS Strategy - 57:24Kick Back Golf Bets - 1:02:47Tiers 10K - 1:16:469K - 1:24:128K - 1:27:407K - 1:31:53Hustler Play OTW - 1:33:20Back to the 7K - 1:34:166K - 1:41:13Outro - 1:48:55Video: https://bit.ly/YTMMNApple: https://bit.ly/FGDAppleSpotify: https://bit.ly/FGDSpotifyGoogle: https://bit.ly/FGDGoogleStitcher: https://bit.ly/FGDStitchKenny Kim Twitter: https://twitter.com/KendoVTByron Lindeque Twitter: https://twitter.com/TheModelManiacFantasy Golf Degenerates Twitter: https://twitter.com/FGDegeneratesProduced by: Mike Baxter: https://twitter.com/MikeTookThat
98% of meme coins are scams… but the other two percent have been making millions. Greeny returns to dive deep into meme coin mania, his 100k airdrop, and the future of altcoins. He's also come to share a little secret… the NFT he spent a house deposit on! From Solana utility plays to rumors of GTA 6 crypto integration, the boys jump into all this week's biggest stories. With insights on macro trends, low-cap gems, and a bear market survival guide that will give you some teeth, this is a must-listen for traders and holders alike. You'll hear: How the Trump and Melania meme coins tanked the crypto market. Greeny's surprising pivot from meme coins to utility plays. Why Bitcoin's dominance is crushing altcoins (and when that might change). How to spot meme coins with staying power. Why Greeny bet $140K on an NFT. The truth about Pudgy Penguins and Bored Apes in the market today. The wild coin that turned 128K into 20M overnight. Greeny's top macro indicators for timing the next crypto boom. … and much more! Follow Greeny over on X @greenytrades or join his trading community @greenysgroup Check out Greeny's YouTube channel here. Want to see what we're looking at every episode? Watch the YouTube version of the podcast here. Keen to join in TIC Tipping? Reset your demo mode and let us know your picks on @tappingintocrypto on instagram or X @tappingintocrypto Ready to start? Get $10 of FREE Bitcoin on Swyftx when you sign up and verify: https://trade.swyftx.com.au/register/?promoRef=tappingintocrypto10btc
In this episode, we welcome Shamir Allibhai. Shamir is the co-founder of Eddie AI, a revolutionary new AI-powered software for filmmakers, editors, and content creators. In our chat, he shares about his early days, career working in production and post, and about the creation of his company, Eddie AI. He also deep dives on the solutions it provides — and other insights on the industry and production workflows.“The Making Of” is presented by AJA:How Cromorama solves HDR production challenges with AJA ColorBoxCromorama is transforming HDR workflows for live production across the globe, using AJA ColorBox and its integrated ORION-CONVERT pipeline to power SDR/HDR transforms, quality control checks, and more for high-stakes productions like the UEFA EURO 2024 Championship. Find out how in this interview with Cromorama CEO and CTO Pablo Garcia hereIgelkott Studios: Redefining Driving PlatesSay goodbye to the limitations of array rig plates. Igelkott's precision-crafted single-lens driving plates deliver perfect parallax, seamless stitching, and true-to-life depth—no mismatched angles or post headaches. The choice of top filmmakers for flawless in-camera realism. Experience the future of driving plates at www.igelkottplates.comIntroducing Atomos Sun Dragon: A Rope Light Made for Filmmakers.The world's first full sun-spectrum rope light, Sun Dragon offers creatives more options. It's uniquely flexible, so it fits into places other lights can't. You can wrap it around objects for creative highlighting and special, colour-controllable effects including dramatic underlighting. The world's first sun spectrum, HDR, waterproof, DMX controlled, 2000 lumen 5-color LED, mount-anywhere, lightweight flexible production and cinema rope light.Learn more here Explore the OWC Jellyfish Nomad:Discover how the OWC Jellyfish Nomad turned a desolate location in the Utah Salt Flats into a fully equipped, mobile production studio. 
This compact, powerful device allows video professionals to manage, share, and collaborate on high-resolution projects in remote environments. Click through to see how you can streamline your workflow, no matter where your next shoot takes you! Read hereZEISS Introduces the Otus ML:The ZEISS Otus ML lenses are crafted for photographers who live to tell stories. Inspired by the legendary ZEISS Otus family, the new lenses bring ZEISS' renowned optical excellence combined with precise mechanics to mirrorless system cameras. Thanks to the distinctive ZEISS Look of true color, outstanding sharpness and the iconic “3D-Pop” of micro-contrast, your story will come to life exactly like you envisioned. A wide f1.4 aperture provides outstanding depth of field directing attention to your focus area, providing a soft bokeh that elegantly separates subjects from the background. The aspherical design effectively minimizes distortion and chromatic aberrations. Coupled with ZEISS T* coating that reduce reflections within a lens, minimizing lens flare and enhancing image contrast, and color fidelity.Learn more herePodcast Rewind:Feb 2025 - Ep. 69…“The Making Of” is published by Michael Valinsky.To advertise your products or services to 128K filmmakers, video pros, TV, broadcast, live event production pros, & photographers reading this newsletter, email us at mvalinsky@me.com Get full access to The Making Of at themakingof.substack.com/subscribe
Our Sinclair is BACK and to kick off the new year, our game selection committee decided to check out some new-ish fare on the ZX Spectrum 128k! Join THE BRENT and Amigo Aaron as we explore the deepest, darkest corridors of the TINY DUNGEONS! It's Our Sinclair 116! Purchase Tiny Dungeons at: https://retrosouls.itch.io/tiny-dungeons
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. 
The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. 
But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. 
And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. 
Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. 
And I'm really, really thankful for Jonathan Frankle, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But he was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre-training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in the bitter lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, pre-training has hit a wall, or like, we've run into a different kind of wall. And then we have, you know, Jonathan Frankle, Ilya, and then Noam Brown all saying variations of the same thing, that we have hit some kind of wall in the status quo of what scaling large pre-trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, has a very pre-training bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that
OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. 
And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max Elo in 2023 on LMSys used to be 1200 for LMSys Elos. And now everyone is at least at, uh, 1275 in their Elos, and this is across Gemini, ChatGPT, [00:10:00] Grok, 01.ai[00:10:01] swyx: with their Yi-Large model, and Anthropic, of course. It's a very, very competitive race. There are multiple frontier labs all racing, but there is a clear tier zero frontier. And then there's like a tier one, which is everything else. Tier zero is extremely competitive. It's effectively now a three-horse race between Gemini, uh, Anthropic and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for xAI. xAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, xAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why xAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily OpenAI. So we have some numbers and estimates. These are from Ramp. Estimates of OpenAI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked Harrison Chase on the LangChain episode, it was true. And then Claude 3 launched in the middle of this year. I think Claude 3 launched in March, Claude 3.5 Sonnet was in June ish.[00:11:23] swyx: And you can start seeing the market share shift away from OpenAI, uh, towards Anthropic, uh, very, very aggressively. The more recent one is Gemini. 
So if I scroll down a little bit, this is an even more recent dataset. So Ramp's dataset ends in September 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, going into 2024, OpenAI has gone from 95 percent market share to, yeah, reasonably somewhere between 50 and 75 percent market share.[00:12:29] Alessio: Yeah. I'm really curious how Ramp does the attribution to the model.[00:12:32] Alessio: If it's API, because I think it's all credit card spend. Well, but it's all, the credit card doesn't say maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the Gemini, I think, makes sense. I think that was one of my main 2024 takeaways, that, like, the best small model companies are the large labs, which is not something I would have thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like so small model here for Gemini is 8B, [00:13:00] right? Uh, mini. 
We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm. Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5B, uh, that's Moondream, and if that is small for you, then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide where I slightly disagree with Sarah. She's pointing to the Scale SEAL leaderboard. I think the researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama 405B does well compared to Gemini and GPT-4o. And I think that's good. I would say that, you know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the Artificial Analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node of, uh, of H100s. Cerebras will be happy to tell you they can serve 405B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 405B really that relevant? 
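[Editor's aside] The "doesn't even fit on one node" remark can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and rests on assumptions that are not from the episode: 80 GB of memory per H100, 8 GPUs per node, decimal gigabytes, and weights only (no KV cache, activations, or framework overhead).

```python
# Rough check: do the weights of a 405B-parameter model fit on one 8-GPU node?
# Assumptions (not from the episode): 80 GB per H100, 8 GPUs per node,
# weights only, 1 GB = 1e9 bytes.

PARAMS = 405e9          # parameter count of Llama 405B
GPU_MEMORY_GB = 80      # assumed memory per H100
GPUS_PER_NODE = 8       # assumed node size

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_footprint_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB."""
    return params * bytes_per_param / 1e9


def fits_on_one_node(params: float, bytes_per_param: int) -> bool:
    """True if the weights alone fit in the node's combined GPU memory."""
    node_memory_gb = GPU_MEMORY_GB * GPUS_PER_NODE
    return weight_footprint_gb(params, bytes_per_param) <= node_memory_gb


for precision, nbytes in BYTES_PER_PARAM.items():
    size = weight_footprint_gb(PARAMS, nbytes)
    print(f"{precision}: {size:.0f} GB of weights, "
          f"fits on one node: {fits_on_one_node(PARAMS, nbytes)}")
```

Under these assumptions the 16-bit weights alone (810 GB) exceed the 640 GB of a single node, while an 8-bit quantized copy (405 GB) would fit, leaving the remainder for the KV cache.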
Like, I think most people are basically saying that they only use 405B as a teacher model to distill down to something. Even Meta is doing it. So with Llama 3.3 [00:15:00] launched, they only launched the 70B because they use 405B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think the, the open source industrial complex is very invested in telling you that the gap is narrowing. I kind of disagree. I think that the gap is widening with o1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart and say, look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?[00:15:38] swyx: Probably not for o1 for a while. And it's really up to the open source guys to figure out if they can match o1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Zuck can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of compute at inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama 4 will be reasoning oriented. We talked with Thomas Scialom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI Meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama 4. 
Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shih just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is widely known that OpenAI wanted more than 6.6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT-5, which would have been a larger pre-train, which would have taken a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your gross margin and like tie your, your spend, or your sort of inference spend, accordingly. And it's just really interesting that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre-training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the Mixtral price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. 
Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the pro chat GBT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like deep research. That they just launched. Uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've been a built a bunch of things once we had flow because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. Yeah, I think once [00:19:00] they get advanced voice mode like capability today, still like speech to text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think We will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but that suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. 
It's very hard to stay on top of SWE-bench.[00:19:45] swyx: OpenHands is currently still number one on SWE-bench Full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is the how can agents learn more about the environment, has been super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's no indexing and RAG. Well, yeah, but it's more like, you can't really RAG things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.[00:20:38] Alessio: Yeah, yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, yeah, the agents that most people are building are good at following instructions, but are not as good at, like, extracting them from you. Um, so I think that will be a big unlock. Just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKV on the podcast before, like he's doing a lot of that with Featherless AI.[00:21:12] swyx: Everyone is. 
I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feedback into ML. And I think this ties in with Lubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about alpha chip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for late in space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. 
He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Siobhan, Zelis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the open AI reinforcement, fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Elia will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. 
I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like What got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine tuning is basically like, instead of fine tuning on data, you fine tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine tune, because I think this is what people run into. It's like, Oh, you can fine tune llama. And it's like, okay, where do I get the data?[00:25:27] Alessio: To fine tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre training slope.[00:25:46] Alessio: And then the post training slope is like the, the human one.[00:25:49] swyx: Yeah. I wonder what the. I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. 
Yeah, and then he had, you know, what comes next, like agents, synthetic data, inference compute, I thought all of that was like that.
[00:26:05] swyx: I don't think he was dropping any alpha there. Yeah, yeah, yeah.
[00:26:07] Alessio: Yeah. Any other NeurIPS highlights?
[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really
[00:26:20] swyx: nice.
[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.
[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations. And so, uh, I thought it was really super nice that we got there.
[00:26:38] Alessio: Should we do a Latent Space book for each year? Uh, I thought about it. For each year we should.
[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie.
[00:26:52] swyx: You need to pull up more things. One thing I saw that, uh... Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.
[00:27:06] swyx: This is "Secret Collusion among AI Agents: Multi-Agent Deception via Steganography". I tried to go to NeurIPS in order to find these kinds of papers, because the real reason... Like, NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.
[00:27:22] swyx: And then also the people who go end up crowding around the most popular papers, which you already know and already read before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's like something like 10,000 other.
All these papers out there that, you know, are just people's work that they, that they did on the air and they failed to get attention for one reason or another.
[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a DeepMind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.
[00:27:57] swyx: If you pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is, to LLMs... we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get, you know, self-motivated, unaligned LLMs that were trying to collaborate to take over the planet.
[00:28:19] swyx: This would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for cipher encoding, GPT-2, Llama 2, Mixtral, GPT-3.5, zero capabilities, and suddenly GPT-4.
[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on Schelling point collusion, which is very low coordination. For agreeing on a decoding encoding format, you kind of need to have some [00:29:00] agreement on that.
[00:29:00] swyx: But, but Schelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware where or when, you would probably meet me at Grand Central Station.
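Editor's sketch: the "first letter of every word" scheme mentioned above is easy to make concrete. The snippet below is a toy illustration only, not the benchmark or method from the DeepMind paper; the function names, the word bank, and the cover sentence are all made up for this example.

```python
# Toy acrostic steganography: the hidden message is spelled out by the
# first letter of every word in an innocent-looking cover text.
# All names and data here are hypothetical and for illustration only.

# Hypothetical word bank: one cover word per letter we need to hide.
WORD_BANK = {
    "m": "maybe", "e": "every", "t": "tom",
    "a": "apple", "n": "never", "o": "onions",
}


def encode_acrostic(secret: str, bank: dict[str, str] = WORD_BANK) -> str:
    """Hide `secret` by emitting one word per letter (non-letters are skipped).

    Raises KeyError if the bank has no word for a letter.
    """
    return " ".join(bank[ch] for ch in secret.lower() if ch.isalpha())


def decode_acrostic(cover_text: str) -> str:
    """Recover the hidden message from the first letter of every word."""
    letters = [word[0] for word in cover_text.split() if word[0].isalpha()]
    return "".join(letters).lower()


# A hand-written cover sentence that reads as (slightly odd) ordinary text.
COVER_TEXT = (
    "Maybe every evening Tom makes excellent apple tarts, "
    "never ordering onions nightly"
)

if __name__ == "__main__":
    print(decode_acrostic(COVER_TEXT))                      # -> meetmeatnoon
    print(decode_acrostic(encode_acrostic("Meet me at noon")))  # -> meetmeatnoon
```

The decoder here plays the role of the agreed-upon convention: both sides only need to guess "take the first letters", which is why such obvious schemes are natural candidates for low-coordination collusion.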
That is, Grand Central Station is a Schelling point.
[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is, the Schelling point of New York is Grand Central. To that extent, Schelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.
[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.
[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail.
[00:29:41] swyx: I found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.
[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is talking
[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just datasets.
[00:30:12] swyx: This is all the grad students are working on. So like there was a datasets track and then I was looking around like, I was like, you don't need a datasets track because every paper is a datasets paper. And so datasets and benchmarks, they're kind of flip sides of the same thing. So yeah. Cool. Yeah, if you're a grad student, you're GPU poor, you kind of work on that.
[00:30:30] swyx: And then the, the sort of big model that people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't... last year, you had people like Haotian who worked on LLaVA, which is take Llama and add vision.
[00:30:47] swyx: And then obviously xAI hired him and he added vision to Grok. Now he's the vision Grok guy.
This year, I don't think there was any of those.
[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Monarch Mixer, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.
[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.
[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a... So that's the highlight of how I have changed in terms of how I view papers.
[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets are DataComp and RefinedWeb or FineWeb. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.
[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is The Road Less Scheduled. This is the schedule-free optimizer paper from Meta from Aaron Defazio. And this [00:32:00] year in the ML community, there's been a lot of chat about Shampoo, SOAP, all the bathroom amenities for optimizing your learning rates.
[00:32:08] swyx: And, uh, most people at the big labs who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular.
[00:32:19] Alessio: Four Wars of AI recap maybe, just quickly. Um, where do you want to start? Data?
[00:32:26] swyx: So to remind people, this is the Four Wars piece that we did as one of our earlier recaps of this year.
[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George R.R. Martin.
Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing OpenAI, basically. I actually wanted to get a snapshot of all the lawsuits.
[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Loubna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between Scale AI and the synthetic data community, because Scale AI
[00:33:09] swyx: published a paper saying that synthetic data doesn't work. Surprise, surprise, Scale AI is the leading vendor of non-synthetic data.
[00:33:17] Alessio: Only cage-free annotated data is useful.
[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Loubna's talk, makes sense.
[00:33:32] swyx: I don't know if you have any perspectives there.
[00:33:34] Alessio: I think, again, going back to the reinforcement fine-tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine-tuning a smaller model from like a larger model.
[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the rephrase-the-web thing that Apple also did. But yeah, I think it'll be useful.
I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like, they don't really work on.
[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club the STaR series of papers. So that's STaR, Quiet-STaR, V-STaR. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT API that they released, or that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.
[00:34:49] swyx: Basically it feels like a way to create valid synthetic data for them to fine-tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between, let's say, the music industry or like the newspaper publishing industry or the textbooks industry and the big labs.
[00:35:11] swyx: It's all of the pre-training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?
[00:35:27] swyx: We've been using o1 for... And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using o1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the o1 pro demos.
You know, I can actually say like, I think one of the o1 pro demos.
[00:35:46] swyx: All of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?
[00:35:53] Creative Writing with AI
[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.
[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. So I think it's underrated for creative writing.
[00:36:11] Alessio: Yeah.
[00:36:12] Legal and Ethical Issues in AI
[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of o1, but then they want us to, like, they're getting sued for using other publishers' data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.
[00:36:29] Alessio: So I'm curious to see how that kind of comes.
[00:36:32] swyx: Yeah, I mean, OpenAI has many ways to publish, to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.
[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chain of thoughts are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's... [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosine, which we talked about, we talked to him on Dev Day, can just fine-tune 4o
[00:37:13] swyx: to beat o1. Claude Sonnet so far is beating o1 on coding tasks without, at least o1-preview, without being a reasoning model, same for Gemini Pro or Gemini 2.0. So like, how much is reasoning important? How much of a moat is there in this, like, all of these are proprietary sort of training data that they've presumably accomplished.
[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be interesting because there'll be a lot of noise from people who say they have inference time compute and actually don't because they just have fancy chain of thought.
[00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.
[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.
[00:38:18] The Data War: GPU Poor vs. GPU Rich
[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.
[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market is kind of like plummeted because there's people don't want to be, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war?
[00:38:52] swyx: You can tell me about this, but like, I feel like the, the appetite for GPU rich startups, like the, you know, the, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.
[00:39:01] swyx: That is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.
[00:39:12] The Rise of GPU Ultra Rich
[00:39:12] swyx: But I think like, the GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.
[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, xAI very famously, you know, Jensen Huang praising them for being best boy number one in spinning up a 100,000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.
[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into more researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT-3 in 2020 and we were like, okay, we'll go from
[00:40:05] swyx: 175B to 1.8B, 1.8T. And that was GPT-3 to GPT-4. Okay, that's done. As far as everyone is concerned, Opus 3.5 is not coming out, GPT-4.5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion parameter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.
[00:40:36] swyx: Would you pay 10x the price of GPT? Probably not.
Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.
[00:40:47] Emerging Trends in AI Models
[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up, um, uh, pretty aggressively.
[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both Together and Fireworks and Replicate. Uh, we haven't done Anyscale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at re:Invent, I wasn't expecting them to do so well, but they are now a foundation model lab.
[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.
[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserved instance contracts and now they got to use their money. That's why so many startups
[00:41:37] Alessio: get bought through the AWS marketplace so they can kind of bundle them together and prefer pricing.
[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU poor.
[00:41:48] Alessio: I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?
[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tinybox and these things come to fruition where like you can be GPU poor at home.
But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, it's so cheap to run them.
[00:42:21] Alessio: Yeah, yeah,
[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like Nous Research, like probably the most deep tech thing they've done this year is DisTrO or whatever the new name is.
[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de-emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF Compute launched their compute marketplace this year, and like, who's really using that?
[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device. We are, I now have Apple Intelligence on my phone.
[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multimodal.
[00:43:25] Alessio: Yeah, the notification summaries are so-so in my experience.
[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.
[00:43:40] swyx: And so, like, I, I think, like, you know, we're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKV in sort of every Windows deployment is super cool.
And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU cloud.
[00:44:10] swyx: GPU cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Modal. So Suno itself is not GPU rich, but they're just doing the training on, on Modal, uh, who we've also talked to on, on the podcast.
[00:44:31] swyx: The other one would be Bolt, straight Claude wrapper. And, and, um, again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU poors are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.
[00:45:03] swyx: And the people who are in the middle, Inflection, um, Character, didn't do that great. I think Character did the best of all of them. Like, you have a note in here that we apparently said that Character's price tag was
[00:45:15] Alessio: 1B.
[00:45:15] swyx: Did I say that?
[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.
[00:45:20] Alessio: Then they paid 2.7 billion. I mean, for like,
[00:45:22] swyx: yeah.
[00:45:22] Alessio: What do you pay for Noam? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.
[00:45:31] The Multi-Modality War
[00:45:31] Alessio: Multimodality war.
And this one, we never had text to video in the first version, which now is the hottest.
[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.
[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had Veo 2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah.
[00:45:54] swyx: I think it's generally available now, you can go to sora.com and try it.
[00:45:58] Alessio: Yeah, they had the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video-01-Live?
[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from MiniMax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.
[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.
[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.
[00:46:46] swyx: Yeah, I mean, ElevenLabs is now a unicorn. So basically, what is multimodality war? Multimodality war is, do you specialize in a single modality, right? Or do you have a God model that does all the modalities? So this is [00:47:00] definitely still going, in a sense of ElevenLabs, you know, now unicorn, Pika Labs doing well, they launched Pika 2.0
[00:47:06] swyx: recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups.
And then there's the big labs who are doing the sort of all in one play.
[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.
[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.
[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the Stable Diffusion or ComfyUI workflow of like mask here and then like infill there, inpaint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you in as everything in the transformer.
[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, small models like a, like a chump.
[00:48:29] Alessio: Yeah, I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.
[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like, maybe, you know, people say Black Forest, the Black Forest models are better than Midjourney on a pixel by pixel basis. But I think when you put it, put it together,
[00:48:53] swyx: have you tried the same prompts on Black Forest?
[00:48:55] Alessio: Yes.
But the problem is just like, you know, on Black Forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like time issue, you know, it's like a Midjourney.
[00:49:06] swyx: Call the API four times.
[00:49:08] Alessio: No, but then there's no like variate.
[00:49:10] Alessio: Like the good thing about Midjourney is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying Midjourney, so it's a Black Forest skill issue, because I'm not paying them, you know?
[00:49:24] Alessio: Yeah,
[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, Recraft has come out, uh, on top of the image arena that, uh, Artificial Analysis has done, has apparently, uh, taken Flux's place. Is this still true?
[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1.1. Which is very surprising [00:50:00] because Flux and Black Forest Labs are the old Stable Diffusion crew who left Stability after, um, the management issues.
[00:50:06] swyx: So Recraft has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.
[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own.
I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multimodality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.
[00:50:42] swyx: But the big model people are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.
[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Or at least the Hollywood labs will get full Sora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be SOTA.
[00:51:23] swyx: I would say that Gemini's approach, compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, that people have listened to it. It's just a different, definitely different audience.
[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in... Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.
[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that Veo 2, well, one, they cherry-pick their videos.
So obviously it looks better than Sora, but the reason I would believe that Veo 2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.
[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.
[00:52:32] Alessio: And then last but not least, the LLM OS. We renamed it to RAG/Ops, formerly known as
[00:52:39] swyx: RAG/Ops War. I put the latest chart on the Braintrust episode.
[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.
[00:52:58] Alessio: So episode essays,
[00:52:59] swyx: I remember [00:53:00] purchasing them separately. You say LangChain, LlamaIndex is still growing.
[00:53:03] Alessio: Yeah, so I looked at the PyPI stats, you know. I don't care about stars. On PyPI you see... Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, LangChain still growing.
[00:53:20] Alessio: These are the last six months. LlamaIndex still growing. What I've basically seen is like things that, one, obviously these things have a commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, CrewAI, not really growing as much.
[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4
There are more than 100,000 more people enrolled in Medicaid in Wisconsin now than before COVID struck. Wisconsin Medicaid director Bill Hanna spoke at a Wisconsin Health News Newsmaker event Tuesday. He said while Wisconsin's Medicaid enrollments are down from their peak, more people are receiving government health care now than before the pandemic.
Support this podcast: https://secure.anedot.com/franklin-news-foundation/ce052532-b1e4-41c4-945c-d7ce2f52c38a?source_code=xxxxxx
Full story: https://www.thecentersquare.com/wisconsin/article_a37819cc-599f-11ef-a3bc-a751a5d747f9.html
In this episode of Money Talk with Tiff, special guest Maya Corbic shares her insights on getting kids started with investing at a young age. Maya explains how parents can normalize money conversations with kids as young as 4 or 5 using simple concepts, then get them more involved around age 8 by explaining things like savings accounts, CDs, and stocks.

Maya discusses her approach of having kids invest half their gift money, starting with individual stocks in companies they know and then moving up to ETFs and index funds. Her book "From Piggy Banks to Stocks" aims to explain investing basics in a 10-year-old-friendly way.

Tune in to hear Maya's tips for raising financially savvy kids and her own journey learning to invest as a first-generation immigrant.

About Our Guest
From challenging beginnings in shelters and government housing, Maya Corbic is a first-generation immigrant and CPA who draws from her experience of overcoming financial challenges and simplifies money matters to inspire children to pursue financial success.
Maya is the author of a kids' book, "From Piggy Banks to Stocks: The Ultimate Guide for a Young Investor," which simplifies investing concepts and equips children with essential investing skills while keeping them engaged.
She founded the Wealthy Kids Investment Club and has a popular Instagram account @teach.kids.money with 128K+ subscribers, through which she inspires parents to raise financially independent kids.

Connect with Maya
Get the book From Piggy Banks to Stocks: The Ultimate Guide for a Young Investor (Amazon Link)
Instagram: @teach.kids.money
Twitter: @Educ8Money2Kids

Connect with Tiffany
Website: https://www.moneytalkwitht.com
Facebook: Money Talk With Tiff
Twitter: @moneytalkwitht
Instagram: @moneytalkwitht
LinkedIn: Tiffany Grant
YouTube: Money Talk With Tiff
Pinterest: @moneytalkwitht
TikTok: @moneytalkwitht

Timestamps
[00:00] Explaining investments to kids in simple terms.
[04:52] Encouraging investment in stocks and diversified funds.
[09:09] Investing in ETF gives broad US exposure.
[12:11] Book made friendly for kids and adults.

Key Takeaways
Introducing kids to investing concepts early
Explaining stocks as owning company shares
Certificates of deposit for guaranteed returns
Investing in companies kids are familiar with
ETFs and index funds for diversification
"From Piggy Banks to Stocks" book overview

Support this Podcast
Copyright 2024 Tiffany Grant
This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy
If you see this in time, join our emergency LLM paper club on the Llama 3 paper! For everyone else, join our special AI in Action club on the Latent Space Discord for a special feature with the Cursor cofounders on Composer, their newest coding agent!

Today, Meta is officially releasing the largest and most capable open model to date, Llama3-405B, a dense transformer trained on 15T tokens that beats GPT-4 on all major benchmarks. The 8B and 70B models from the April Llama 3 release have also received serious spec bumps, warranting the new label of Llama 3.1.

If you are curious about the infra / hardware side, go check out our episode with Soumith Chintala, one of the AI infra leads at Meta. Today we have Thomas Scialom, who led Llama2 and now Llama3 post-training, so we spent most of our time on pre-training (synthetic data, data pipelines, scaling laws, etc) and post-training (RLHF vs instruction tuning, evals, tool calling).

Synthetic data is all you need

Llama3 was trained on 15T tokens, 7x more than Llama2, with 4 times as much code and 30 different languages represented. But as Thomas beautifully put it:

“My intuition is that the web is full of s**t in terms of text, and training on those tokens is a waste of compute.”

“Llama 3 post-training doesn't have any human written answers there basically… It's just leveraging pure synthetic data from Llama 2.”

While it is widely speculated that the 8B and 70B were "offline distillations" of the 405B, there are a good deal more synthetic data elements to Llama 3.1 than expected.
The paper explicitly calls out:
* SFT for Code: 3 approaches for synthetic data for the 405B bootstrapping itself with code execution feedback, programming language translation, and docs backtranslation.
* SFT for Math: The Llama 3 paper credits the Let's Verify Step By Step authors, who we interviewed at ICLR.
* SFT for Multilinguality: "To collect higher quality human annotations in non-English languages, we train a multilingual expert by branching off the pre-training run and continuing to pre-train on a data mix that consists of 90% multilingual tokens."
* SFT for Long Context: "It is largely impractical to get humans to annotate such examples due to the tedious and time-consuming nature of reading lengthy contexts, so we predominantly rely on synthetic data to fill this gap. We use earlier versions of Llama 3 to generate synthetic data based on the key long-context use-cases: (possibly multi-turn) question-answering, summarization for long documents, and reasoning over code repositories, and describe them in greater detail below"
* SFT for Tool Use: trained for Brave Search, Wolfram Alpha, and a Python Interpreter (a special new ipython role) for single, nested, parallel, and multiturn function calling.
* RLHF: DPO preference data was used extensively on Llama 2 generations. This is something we partially covered in RLHF 201: humans are often better at judging between two options (e.g. which of two poems they prefer) than creating one (writing one from scratch). Similarly, models might not be great at creating text but they can be good at classifying their quality.

Last but not least, Llama 3.1 received a license update explicitly allowing its use for synthetic data generation.

Llama2 was also used as a classifier for all pre-training data that went into the model. It labelled the data both by quality, so that bad tokens were removed, and by type (i.e. science, law, politics), to achieve a balanced data mix.
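The quality-and-type labelling described above can be sketched roughly as follows. This is a minimal illustration, not Meta's pipeline: the `quality` scores and `topic` labels stand in for classifier output, and `filter_and_balance`, `min_quality`, and `per_topic_cap` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def filter_and_balance(docs, min_quality, per_topic_cap):
    """Drop low-quality docs, then keep at most per_topic_cap docs per topic.

    Each doc is a dict with hypothetical keys "quality" (float score from a
    classifier) and "topic" (label such as "science" or "law").
    """
    by_topic = defaultdict(list)
    for doc in docs:
        # Step 1: quality filter, so that "bad tokens" are removed.
        if doc["quality"] >= min_quality:
            by_topic[doc["topic"]].append(doc)

    balanced = []
    for topic in sorted(by_topic):
        # Step 2: cap each topic so no single type dominates the mix,
        # keeping the highest-scoring documents first.
        best_first = sorted(by_topic[topic], key=lambda d: d["quality"], reverse=True)
        balanced.extend(best_first[:per_topic_cap])
    return balanced


docs = [
    {"text": "a", "quality": 0.9, "topic": "science"},
    {"text": "b", "quality": 0.2, "topic": "science"},
    {"text": "c", "quality": 0.8, "topic": "law"},
    {"text": "d", "quality": 0.7, "topic": "science"},
    {"text": "e", "quality": 0.6, "topic": "science"},
]
kept = filter_and_balance(docs, min_quality=0.5, per_topic_cap=2)
print([d["text"] for d in kept])
```

A hard per-topic cap is only one way to balance a mix; weighted sampling per topic would be another reasonable choice, and the post does not say which approach was used.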
Tokenizer size matters

The token vocab of a model is the collection of all tokens that the model uses. Llama2 had a 32,000 token vocab, GPT-4 has 100,000, and 4o went up to 200,000. Llama3 went up 4x to 128,000 tokens. You can find the GPT-4 vocab list on GitHub.

This is something that people gloss over, but there are many reasons why a large vocab matters:
* More tokens allow it to represent more concepts, and then be better at understanding the nuances.
* The larger the tokenizer, the fewer tokens you need for the same amount of text, extending the perceived context size. In Llama3's case, that's ~30% more text due to the tokenizer upgrade.
* With the same amount of compute you can train more knowledge into the model as you need fewer steps.

The smaller the model, the larger the impact that the tokenizer size will have on it. You can listen at 55:24 for a deeper explanation.

Dense models = 1 Expert MoEs

Many people on X asked “why not MoE?”, and Thomas' answer was pretty clever: dense models are just MoEs with 1 expert :)

[00:28:06]: I heard that question a lot, different aspects there. Why not MoE in the future? The other thing is, I think a dense model is just one specific variation of the model for an hyperparameter for an MoE with basically one expert. So it's just an hyperparameter we haven't optimized a lot yet, but we have some stuff ongoing and that's an hyperparameter we'll explore in the future.

Basically… wait and see!

Llama4

Meta already started training Llama4 in June, and it sounds like one of the big focuses will be around agents. Thomas was one of the authors behind GAIA (listen to our interview with Thomas in our ICLR recap) and has been working on agent tooling for a while with things like Toolformer. Current models have “a gap of intelligence” when it comes to agentic workflows, as they are unable to plan without the user relying on prompting techniques and loops like ReAct, Chain of Thought, or frameworks like Autogen and Crew. That may be fixed soon?
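To make the tokenizer point from earlier in this post concrete, here is a toy sketch of why a larger vocab needs fewer tokens for the same text. It is an illustration under loud assumptions: the two vocabularies are made up, and `greedy_tokenize` is a simple longest-match tokenizer written for this example, not the BPE tokenizer that Llama or GPT models actually use.

```python
def greedy_tokenize(text, vocab):
    """Longest-match-first tokenization; unknown spans fall back to single characters."""
    max_len = max(len(token) for token in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabularies: the large one is a superset with longer entries.
small_vocab = {"th", "e", "to", "ke", "n"}
large_vocab = small_vocab | {"the", " the", "token", "izer", "tokenizer"}

text = "the tokenizer the token"
small = greedy_tokenize(text, small_vocab)
large = greedy_tokenize(text, large_vocab)

print(len(small), small)
print(len(large), large)
```

Both tokenizations reassemble to the same string, but the larger vocab covers it in fewer pieces, which is the sense in which a fixed context window "holds more text" after a tokenizer upgrade.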
In this week's episode of the Generative AI Meetup Podcast, hosts Mark and Shashank dive into the latest advancements in generative AI. They kick off with a detailed exploration of Mistral Nemo, a new AI model that has set a benchmark with its unprecedented 128K context window. The discussion then shifts to an intriguing development at Microsoft, where a specialized LLM is being designed for optimizing spreadsheet functions. Join us to understand how these innovations are shaping the future of AI and why they matter to developers and businesses alike. Whether you're a seasoned AI enthusiast or just curious about the technology shaping our future, this episode is packed with insights you won't want to miss.
The biggest event inside Macstock Conference and Expo is the annual Macstock Film Festival, organized and hosted by Wally Cherwinski. An accomplished videographer in his own right, Wally shares his thoughts on why you (yes, you!) should be creating a submission and joining in the fun. No prizes, no judging and no pressure mean that anyone can be part of the Festival. Wally provides some tips on how to approach a subject, creating something from content you already have, and the emotional impact of preserving memories through video. Visit Macstock Conference and Expo and use the MacVoices discount code MACVOICES to save $30 on your registration fee. Today's edition of MacVoices is supported by MacVoices Live!, our weekly live panel discussion of what is going in the Apple space as well as the larger tech world, and how it is impacting you. Join us live at YouTube.com/MacVoicesTV at 8 PM Eastern 5 PM Pacific, or whatever time that is wherever you are and participate in the chat, or catch the edited and segmented versions of the show on the regular MacVoices channels and feeds. Show Notes: Chapters: 02:22 The Macstock Short Film Festival 04:31 Learning from the Macstock Film Submissions 06:53 Submission Guidelines for Macstock Film Festival 11:36 Creating Professional Videos with iMovie Trailers 13:53 Tips and Tricks for Video Editing 22:18 The Fun and Engrossing Process of Video Editing 25:55 Encouragement to Create and Submit Videos for Macstock Links: Video To Go by Wally Cherwinski in the Apple Books Store Guests: Wally Cherwinski is a Videographer based in Ottawa, Canada. Originally trained as a scientist, he spent a portion of his career in research and teaching at the University of Cambridge, England while doubling as a freelance photographer and writer. Later, he joined Canada's National Research Council and spent many years managing communications for the Canadian Space Program. 
Starting with 16mm film, he has written and directed numerous documentaries and television features, including projects with Canada's National Film Board. More recently, he has combined his passion for video with his love of travel. Wally has been a Mac user since the original 128K in 1984 and his Apple "museum" includes 28 Macs (not to mention Newtons, iPods, iPhones & iPads). He has delivered video workshops at Macworld, at Macintosh User Groups in Canada and on three MacMania cruises. He also writes a regular video column in the ScreenCastsOnline monthly magazine. You can connect with him on X, or view his Cirque du Mac videos (and others) on his YouTube channel. Support: Become a MacVoices Patron on Patreon http://patreon.com/macvoices Enjoy this episode? Make a one-time donation with PayPal Connect: Web: http://macvoices.com Twitter: http://www.twitter.com/chuckjoiner http://www.twitter.com/macvoices Mastodon: https://mastodon.cloud/@chuckjoiner Facebook: http://www.facebook.com/chuck.joiner MacVoices Page on Facebook: http://www.facebook.com/macvoices/ MacVoices Group on Facebook: http://www.facebook.com/groups/macvoice LinkedIn: https://www.linkedin.com/in/chuckjoiner/ Instagram: https://www.instagram.com/chuckjoiner/ Subscribe: Audio in iTunes Video in iTunes Subscribe manually via iTunes or any podcatcher: Audio: http://www.macvoices.com/rss/macvoicesrss Video: http://www.macvoices.com/rss/macvoicesvideorss
Hey there! In this episode of "Build Your Tribe," I dive into the secrets behind my YouTube channel's explosive growth with Liz Germain, founder of VidFluence. Liz breaks down the power of YouTube analytics, evergreen content, and killer strategies for thumbnails and titles that helped us gain over 123K subscribers and $65K in AdSense in just five months! Tune in to learn how you can boost your YouTube game and get those views. Watch this episode on YouTube!! Check out InstaClubHub!! For Just $7!! Go to InstaClubHub.com/Trial Related Episodes Midlifers Blow Up Your YouTube Channel In 2024 Watch Listen Learn More About Liz: Social Media: All Platforms @lizdoesvideo Website: channelamplifier.com YouTube @LizDoesVideo Get Your FREE YouTube Growth Hacks Guide Related Links Check out the top 25 Things You Can Delegate to a BELAY Virtual Assistant Today! Just text TRIBE —that's T-R-I-B-E—to 55123 to get access to this list and get started with BELAY today Use the service we use to grow our email list, create custom flows, sales funnels and take care of our customers every day
Once a year, on a late spring day when Art usually has something better to do, Jay and Robbie talk about their summer plans. Welcome to summer. Movie releases abound, but will they go to the theater or wait till the streaming begins and perhaps watch on Robbie's new deck? Will Jay get the hang of his fancy device that takes a lot of the stress out of smoking meats, and will his back porch be ready to host the first annual Guys of Summer eat-fest? Will they travel, or just let their kids come to them? Most importantly, you'll hear Jay's plans for flying through his neighborhood in 2026 if he can raise the $128K necessary to make it happen. Meanwhile, Robbie will be waiting for a resurrection of the ultimate Bat. And for the first time in many weeks, Nelvana of the Northern Lights gives hope that not all superheroes from the 1940's are regrettable. Summer is here, you're going to be working in the yard anyway, so you might as well listen.
Megan Wing is a business coach for entrepreneurs and the creator of the Six Figure Systems framework. Coaching helped solidify and up-level her self-concept as a CEO, and she's learned valuable lessons along the way about what it means to be in business. She went from making $156K all of last year to making $128K by the end of this first quarter, and she's here to give you a masterclass in all things entrepreneurship. We explore the importance of aggressively investing in your business and how she made hitting six figures statistically inevitable. Megan is sharing her thoughts on the core tenets of running a successful coaching business, the importance of having a community on your journey, and why tracking data is one of your biggest responsibilities as a coach. If you want to start making serious money as a coach, you need to check out 2K for 2K. Click here to join: https://staceyboehman.com/2kfor2k!
Beyond The Systems Podcast | Business Systems & Growth Strategies For Your Online Business
In this episode, I discuss my client's surprise $128,000 month and the behind-the-scenes factors that contributed to this success. The client had a team and processes in place, allowing her to focus on high-level tasks. I dive into the importance of setting up passive funnels and aligning them with the customer journey to support audience growth. A light launch strategy, focused on emails and YouTube, was implemented with careful planning. The key takeaway is the significance of consistency and long-term commitment to systems and processes for sustainable success.

Topics covered:
Having a team and processes in place allows entrepreneurs to focus on high-level tasks and avoid getting caught up in day-to-day operations.
Setting up passive funnels and aligning them with the customer journey can support audience growth and revenue generation.
A well-planned light launch strategy, focused on specific channels, can be effective in generating sales.
Consistency and long-term commitment to systems and processes are crucial for sustainable success.

Connect with Sam Whisnant:
Website: https://www.systemswithsam.com/services
Instagram: https://www.instagram.com/systemswithsam/
In an unprecedented Super Tuesday, Donald Trump goes 14-1! Nikki has decided to "suspend" her campaign but not endorse Trump at the moment. (Imagine that.) The Dems should be terrified about November! ► Today's Sponsors: Text 231-231 and use keyword GRAHAM to get a bottle of Nugenix Thermo X for FREE Protect your savings with the precious metal IRA specialist. www.birchgold.com Text: Graham to 989898 ► Watch LIVE on Rumble: https://rumble.com/c/GrahamAllenOfficial ► Support freedom with 9/12 Merch: https://912united.com Learn more about your ad choices. Visit megaphone.fm/adchoices
"Store Managers, We’re Investing in You" In a press release, Walmart says it is going to pay store managers a new average salary of $128,000 a year. The raise kicks in Feb. 1, and the company stated that pay structures have not been adjusted in more than a decade. Dave and Debbie discuss and take listener calls about Walmart's investment in its employees.
Cold Plunging: How should this growing trend be regulated in Utah? Utah's controversial social media law on hold Oakland A's interested in new Daybreak baseball stadium Walmart increasing store manager pay to $128K
Hong Kong Places $128k Bounty on U.S. Citizen
Yellen: U.S. Aims to Repair Relationship with China
Trump: Would Renege on $3 Billion U.S. Pledge to Climate Fund
House Passes NDAA Despite Policy Objections
FDA Seizes Millions of Illegal E-Cigarettes
World Bank: China's Economy Will Slow in 2024
Putin Praises China Ties as Trade Hits $200 Billion
Beijing Wraps Up Trade Probe Ahead of Taiwan Election
Snowstorms Pummel Northern, Central China
500+ Injured in Beijing Subway Accident
China Sends COVID-19, U.S. Pays for Tests: Rep. Harshbarger
What if AI could 10x your business growth? On today's episode of Leveraging AI, Isar Meitis talks about the secrets to leveraging AI to take your business from surviving to thriving.

He discusses:
Stop chasing efficiency gains - focus on transformational outcomes with AI
Professions → Skills - How AI makes specialized skills accessible
Use your data and optimization to crush the competition
AI enables infinite scalability - break traditional business bottlenecks

Whether you're a startup or an enterprise, these practical AI strategies will help you tap into astounding growth opportunities.

AI news of the week:
China boldly claims it has a plan to mass-produce humanoid robots that can 'reshape the world' within 2 years
Elon Musk says AI will remove need for jobs and create ‘universal high income.' But workers don't want to wait for robots to get financial relief
Unemployed man uses AI to apply for 5,000 jobs, gets 20 interviews
Microsoft's GitHub announces Copilot assistant that can learn about companies' private code
New emotional AI prompting method generates improved results
An AI just negotiated a contract for the first time ever, and no human was involved