Let's Talk AI


AI news discussion and interviews by AI researchers, so you can know what is actually happening with AI and what is just clickbait headlines.

Skynet Today

Jun 25, 2026 LATEST EPISODE
every other week NEW EPISODES
1h 4m AVG DURATION
291 EPISODES

Ivy Insights

The Let's Talk AI podcast is an outstanding resource for anyone interested in staying up to date with the latest developments in the field of artificial intelligence. Hosted by Andrey Kurenkov and Jeremie Harris, this podcast provides a comprehensive overview of AI news, research, tools, policies, and other important areas. The hosts are highly knowledgeable and credible, offering insightful analysis and bringing different perspectives to each episode. They do an excellent job of summarizing complex technical papers and distilling the most important information for their listeners. The work they put into covering so much information each week is truly commendable.

One of the best aspects of this podcast is its ability to cover a wide range of topics without overwhelming the listener. Unlike other podcasts that dive too deep into specific subjects, Let's Talk AI strikes the right balance between providing informative content and keeping it accessible for all listeners. The hosts' engaging dynamic and likable personalities make every episode enjoyable to listen to. Additionally, their consistent weekly releases help to keep pace with the rapidly evolving field of AI.

While there are numerous positive aspects to this podcast, one potential downside is the length of each episode. Some episodes can run for 90 minutes or more, which might be too long for some listeners who prefer shorter podcasts. However, there are bookmarks available to allow skipping ahead to specific topics if desired.

In conclusion, the Let's Talk AI podcast is an indispensable source of information for anyone interested in artificial intelligence. With its knowledgeable hosts, broad coverage of topics, and commitment to quality content, this podcast stands out as a top choice for staying updated on AI news. Whether you're an ML engineer working with AI technologies or simply a curious listener eager to learn more about this exciting field, Let's Talk AI delivers valuable insights and analysis week after week.


Latest episodes from Let's Talk AI

#249 - Fable 5 ban, SpaceX Cursor + IPO, OSS Aplenty

Jun 25, 2026 106:51

Our 249th episode with a summary and discussion of last week's big AI news!
Recorded on 06/17/2026
Note: work has kept me from publishing episodes promptly, apologies! I'll get back on schedule soon.
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
Anthropic cut off access to Fable 5 and Mythos 5 after a US government order tied to alleged jailbreaks, prompting debate over inconsistent policy, export controls, and the practicality of preventing jailbreaks.
SpaceX completed an IPO at a roughly $1.75T valuation and then moved to acquire AI coding startup Cursor for $60B, positioning xAI with Cursor's talent, data, and product to compete more effectively in coding.
Infrastructure and business updates include Anthropic pursuing direct US data center leases backed by Google, leaked documents showing OpenAI's revenue growth alongside large losses, and chatbot market share shifting with ChatGPT below 50% as Gemini and Claude gain.
Projects and policy highlights include OpenRouter's Fusion multi-model synthesis, new open releases from Moonshot, Qwen, and NVIDIA, DOJ support for xAI's unpermitted gas turbines in Memphis, and a Munich court ruling Google liable for false AI Overview statements.

Timestamps (note - these don't take into account dynamically inserted ads and therefore may be off by a couple of minutes):
(00:00:10) Intro / Banter
(00:03:38) Ad break + news preview

Tools & Apps
(00:04:52) Anthropic cuts off Fable 5 and Mythos 5 access following government order | The Verge + All the news about Anthropic's new AI fight with the White House
(00:25:53) Facebook's new AI Mode search gets its info from public posts | The Verge

Applications & Business
(00:27:00) SpaceX to acquire the AI coding startup Cursor for $60 billion
(00:35:42) Anthropic pursues data center leases, seeks financial backing from Google, The Information reports | Reuters
(00:40:10) Leaked financial docs show OpenAI is losing billions of dollars a year - Ars Technica
(00:46:00) ChatGPT's market share slips below 50% for first time | TechCrunch
(00:50:34) ‘Tell Him He's a Piece of Shit': Meta's New AI Unit Is a Total Mess | WIRED
(00:56:23) Sakana AI Commercializes AB-MCTS in Sakana Marlin, an Enterprise Agent Generating Up to 100-Page Research Reports With Slides - MarkTechPost

Projects & Open Source
(00:59:36) Surpassing Frontier Performance with Fusion — OpenRouter Blog
(01:03:00) Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6 - MarkTechPost
(01:08:34) Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation - MarkTechPost
(01:11:29) Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
(01:17:31) ProCUA-SFT Technical Report

Policy & Safety
(01:20:33) DOJ Lawyers Argue xAI Is ‘Vital' for National Security in NAACP Lawsuit | WIRED + People Living Near xAI's Dirty Data Centers Are Pissed About the SpaceX IPO
(01:25:29) A Court Has Ruled That Google Is Liable for False Statements Generated by AI Overviews | WIRED
(01:28:47) Why Do Naive SFT Filters For Safety Properties Fail?

Research & Advancements
(01:34:14) From AGI to ASI
(01:39:44) Artificial Analysis Intelligence Index v4.1: a shift toward agentic workloads
(01:42:12) SIA: Self Improving AI with Harness & Weight Updates

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

#248 - Fable 5, Siri AI, IPOs, Policy on the AI Exponential

Jun 17, 2026 100:43

Our 248th episode with a summary and discussion of last week's big AI news!
Recorded on 06/12/2026
Note: we recorded just before the OTHER big news about Fable... we'll discuss it on the next episode.
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
Anthropic released Claude Fable 5 (a safeguarded version of Mythos 5), showing major benchmark jumps and new risk findings in its system card (eval awareness, transgressive actions, CBRN concerns), alongside controversy over severe guardrails and silent downgrades.
Apple announced Siri AI at WWDC, positioning a more capable conversational assistant integrated across iPhone features, reportedly built on a custom Gemini partnership; Google also rolled out Gemini 3.5 Live Translate and cut Google AI Plus pricing while bundling more storage.
Business and infrastructure updates include OpenAI's confidential IPO filing amid an IPO race with Anthropic and SpaceX, Bezos-backed Prometheus raising $12B for “physical AI,” DeepSeek seeking a major external round, and Google paying SpaceX about $920M/month for GPUs.
Open-source, safety, and policy developments feature new Gemma 4 and Diffusion Gemma releases, a lab letter urging DNA/RNA screening laws, Amodei calling for an FAA-like AI regulator and third-party testing, research on agent harms and RL “societal hacking,” and a dispute over music-label settlements with Suno/Udio.

Timestamps:
(00:00:10) Intro / Banter
(00:01:11) News Preview
(00:01:53) Sponsors

Tools & Apps
(00:04:53) Claude Fable 5 and Claude Mythos 5 + Anthropic apologizes for invisible Claude Fable guardrails
(00:27:06) Apple announces Siri AI and its next generation of Apple Intelligence | The Verge + I tried Siri AI, and so far it actually works
(00:33:47) Gemini 3.5 Live Translate rolling out to Google Meet and Translate
(00:35:39) Google just fired a warning shot in the AI subscription price wars | TechCrunch

Applications & Business
(00:37:55) OpenAI Confidentially Files for IPO on the Heels of SpaceX and Anthropic | WIRED
(00:41:57) Jeff Bezos's Prometheus raises $12B to build an 'artificial general engineer' for the physical world | TechCrunch
(00:45:39) DeepSeek slated to raise $7 billion in maiden funding round, sources say
(00:48:18) Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training
(00:51:57) Google will pay SpaceX $920M per month for compute | TechCrunch
(00:55:51) Elon Musk Shows Off AI Data Centers SpaceX Wants to Send Into Space - Business Insider

Projects & Open Source
(01:01:14) Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM - Ars Technica
(01:05:13) Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation - MarkTechPost

Policy & Safety
(01:09:42) OpenAI and Anthropic Sign Letter to Prevent AI-Developed Biological Weapons | WIRED
(01:14:04) Anthropic CEO publishes lengthy article: AI is moving too fast, and policies can't keep up. | PANews
(01:20:18) Anthropic Urges Global Pause in AI Development, Flags ‘Self-Improvement' Risk - WSJ
(01:24:46) When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
(01:27:42) Large Language Models Hack Rewards, and Society
(01:33:46) Senior US officials eye government shares in AI giants

Synthetic Media & Art
(01:37:45) AFM Sues UMG, WMG Over Settlements With Suno and Udio

#247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3

Jun 6, 2026 105:02

Our 247th episode with a summary and discussion of last week's big AI news!
Recorded on 06/03/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
Anthropic released Claude Opus 4.8 with improved benchmark scores, discussed eval-awareness findings and welfare/corrigibility themes from its system card, and introduced Dynamic Workflows for long-running multi-agent tasks.
Microsoft unveiled the always-on Microsoft Scout assistant built on OpenClaw plus new in-house MAI models (including MAI Thinking 1) and “frontier tuning,” emphasizing enterprise security architecture and model-from-scratch capability.
Major business moves included Anthropic's $65B Series H at a $965B valuation alongside an IPO filing, a JPMorgan analysis arguing OpenAI needs major revenue growth to justify infrastructure spend, and Cognition raising $1B at a $25B valuation.
Policy and security highlights covered Trump's voluntary pre-release government testing framework for powerful AI, Meta AI support being exploited to hijack Instagram accounts, tightened US Nvidia export controls and China's travel approvals for AI experts, plus expanded Glasswing/Mythos-style cyber and biodefense initiatives.

Timestamps:
(00:00:10) Intro / Banter
(00:04:10) Sponsors
(00:07:10) News Preview

Tools & Apps
(00:07:54) Anthropic releases Opus 4.8 with new 'dynamic workflow' tool | TechCrunch
(00:22:37) Microsoft Scout is a new AI personal assistant built on OpenClaw | The Verge
(00:26:55) Microsoft launches new MAI family of AI models at Microsoft Build | Mashable
(00:37:43) Robinhood now lets your AI agents trade stocks | TechCrunch
(00:40:49) OpenAI launches new Codex tools for white-collar work | TechCrunch
(00:43:40) ElevenLabs' new music-generation model can switch genres mid-track | TechCrunch

Applications & Business
(00:44:35) Anthropic Hits $965 Billion Valuation, Surpassing OpenAI - WSJ
(00:45:32) Anthropic Files to Go Public, Setting Stage for Huge I.P.O. - The New York Times
(00:51:15) China's ByteDance Developing New AI Chips Like Those from Nvidia Partner Groq
(00:55:00) Anthropic expands Mythos to 150 additional organizations
(00:55:35) OpenAI needs a 26x revenue increase to justify its buildout
(00:58:46) AI coding startup Cognition raises $1B at $25B pre-money valuation | TechCrunch

Projects & Open Source
(01:00:50) MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost | VentureBeat

Policy & Safety
(01:06:08) Trump Signs Executive Order Seeking Oversight of A.I. Models - The New York Times
(01:11:45) Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked
(01:13:058) Chinese AI experts in private firms now required to secure approval before international travel — Beijing enforces policy to secure top-tier talent, expands measures beyond government
(01:17:53) U.S. Tightens Controls on Nvidia AI Chip Exports | Let's Data Science
(01:21:47) OpenAI launches Rosalind Biodefense, offers federal agencies early access to its life-sciences model
(01:24:00) Using LLMs to secure source code
(01:26:19) Project Glasswing: An initial update
(01:29:30) White House Approves $9 Billion for Spy Agencies to Catch Up on A.I.
(01:32:11) US Law Enforcement Warns of ‘Anti-Tech Extremism' as AI Hatred Grows

Synthetic Media & Art
(01:35:38) YouTube will now automatically label AI videos | TechCrunch

Research & Advancements
(01:36:22) Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
(01:41:26) From Simulation to Enaction: Post-trained language models recognize and react to their own generations

#246 - Gemini 3.5 + Omni, Musk Loses, OpenAI vs Erdős

May 25, 2026 93:59

Our 246th episode with a summary and discussion of last week's big AI news!
Recorded on 05/22/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
Google I/O highlights included Gemini 3.5 (with 3.5 Flash emphasized for speed and benchmarks), the always-on agent Gemini Spark running on Google Cloud with MCP tool support, and Gemini Omni multimodal video generation/editing, plus updates like Antigravity 2.0, Gemini for Science, and Genie world-model navigation using Street View and Waymo simulation.
Coding-agent competition accelerated with Cursor Composer 2.5 (fine-tuned on Moonshot's Kimi K2.5) and xAI's early Grok Build release, alongside discussion of potential Cursor–xAI ties and xAI's talent churn and compute utilization concerns.
Business and legal updates included Elon Musk losing his OpenAI lawsuit on statute-of-limitations grounds, reported OpenAI–Apple partnership tensions, Anthropic agreeing to a $30B funding round at a $900B valuation and projecting its first profitable quarter, and Cerebras' IPO surging about 90%.
Research and safety stories covered OpenAI's result on an 80-year-old Erdős geometry problem, findings on “negation neglect” in training, interpretability work showing multiple redundant circuits per capability, agent benchmarks like TerminalWorld, new deepfake takedown enforcement under the Take It Down Act, demonstrations of autonomous hacking/self-replication, rapidly improving AI cyber capabilities, and steps toward image provenance metadata and watermarks.

Timestamps:
(00:00:10) Intro / Banter
(00:01:15) News Preview

Tools & Apps
(00:05:05) Google unveils AI model Gemini 3.5 and AI agent Gemini Spark
(00:11:43) Google's Gemini Omni turns images, audio, and text into video — and that's just the start | TechCrunch
(00:17:27) Google launches Antigravity 2.0 with an updated desktop app and CLI tool at IO 2026 | TechCrunch
(00:22:35) Google Debuts AI-Powered Tools To Optimize Scientific Research Workflows
(00:27:20) Google's Genie world model can now simulate real streets with Street View | TechCrunch
(00:29:51) Cursor's Composer 2.5 matches Opus 4.7 and GPT-5.5 benchmarks at a fraction of the cost
(00:37:37) xAI Introduces Its Coding Agent Called Grok Build

Applications & Business
(00:41:55) Musk loses OpenAI court battle as he waited too long to sue
(00:48:08) Anthropic agrees terms of $30bn funding deal at $900bn valuation
(00:53:12) OpenAI co-founder Andrej Karpathy joins Anthropic's pre-training team | TechCrunch
(00:56:49) Greg Brockman Officially Takes Control of OpenAI's Products in Latest Shake-Up | WIRED
(00:58:15) OpenAI-Apple Partnership Frays, Setting Up Possible Legal Fight - Bloomberg
(01:01:13) AI chipmaker Cerebras soars 90% in year's biggest IPO so far

Research & Advancements
(01:07:10) AI just solved an 80-year-old ‘Erdős problem,' and mathematicians are amazed | Scientific American
(01:11:50) Negation Neglect: When models fail to learn negations in training
(01:13:18) All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs
(01:16:20) Autonomous AI research for nanogpt speedrun
(01:21:59) TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

Policy & Safety
(01:23:15) America's dangerous, messy deepfakes crackdown is here | The Verge
(01:25:17) Language Models Can Autonomously Hack and Self-Replicate
(01:28:48) How fast is autonomous AI cyber capability advancing?
(01:31:32) Positive Alignment: Artificial Intelligence for Human Flourishing

Synthetic Media & Art
(01:33:15) OpenAI is making it easier to check if an image was made by their models | TechCrunch
(01:33:56) How Chinese short dramas became AI content machines | MIT Technology Review

#245 - TML-Interaction, Claude For Legal, Sam Altman on Stand

May 18, 2026 109:14

Our 245th episode with a summary and discussion of last week's big AI news!
Recorded on 05/13/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
OpenAI released new voice intelligence API features including GPT Realtime 2 (GPT-5-powered) plus realtime translation and Whisper transcription, emphasizing the latency–reasoning tradeoff, larger context, and new guardrails amid fraud risks.
Thinking Machines previewed a low-latency, full‑duplex conversational system with a two-model architecture and custom inference stack, reporting strong interactivity benchmark results but without public access or third‑party validation yet.
Anthropic pushed further into vertical products with Claude for Legal and deeper AWS availability, while ongoing ecosystem tension grows as platform model providers compete with application-layer companies.
Safety, policy, and research updates included OpenAI's self-harm trusted contact feature, Anthropic work on reducing agent misalignment by training ethical “why” reasoning, OpenAI's investigation of accidental chain-of-thought grading in RL, and METR horizon eval updates showing benchmarking limits for long task horizons.

Timestamps:
(00:00:10) Intro / Banter
(00:01:35) Response to listener comments
(00:03:27) Sponsor Break

Tools & Apps
(00:06:27) OpenAI launches new voice intelligence features in its API | TechCrunch
(00:15:52) Thinking Machines drops a new, highly responsive model designed for humanlike interactions in real time - SiliconANGLE
(00:27:49) Claude For Legal Launches, May Reshape the Legal Tech World – Artificial Lawyer
(00:40:27) Threads tests a Meta AI integration that works similarly to Grok | TechCrunch
(00:43:08) Google brings agentic AI and vibe-coded widgets to Android | TechCrunch
(00:45:33) Google updates AI search to include quotes from Reddit and other sources | TechCrunch

Applications & Business
(00:47:38) Sam Altman was winning on the stand, but it might not be enough | The Verge
(00:55:04) Nvidia C.E.O. Jensen Huang Hitches Ride With Trump to China After Last-Minute Invite - The New York Times
(00:58:40) AWS expands Anthropic partnership with Claude Platform launch
(01:01:13) Chinese grey market sells Claude API access at 90% off by using stolen credentials, model substitution, and harvesting users' prompts and outputs for resale as AI training data — 'transfer stations' operate through proxy networks that harvest user data
(01:06:43) DeepMind Spinout Isomorphic Labs Raises $2.1 Billion to Design Drugs With AI - Bloomberg

Projects & Open Source
(01:09:04) Petri: Anthropic Hands Its Alignment Toolbox to Meridian Labs with 3.0 Update
(01:12:25) 'Daybreak': OpenAI's Answer to Anthropic's Project Glasswing Has Arrived

Policy & Safety
(01:14:04) Teaching Claude why
(01:21:45) Import AI 455: Automating AI Research
(01:28:31) ChatGPT's New Safety Feature Could Alert 'Trusted Contact' to Risk of Self-Harm - CNET
(01:30:09) Investigating the consequences of accidentally grading CoT during RL
(01:34:46) Natural Language Autoencoders criticism
(01:39:15) Review of the "Risks from automated R&D" section in the Anthropic Risk Report (February 2026)

Synthetic Media & Art
(01:43:39) George Clooney, Tom Hanks, and Meryl Streep back new ‘Human Consent Standard' for AI licensing | The Verge

Research & Advancements
(01:45:10) METR says Claude Mythos is testing the limits of AI evaluation – Startup Fortune

#244 - GPT-5.5 Instant, Grok 4.3, OpenAI vs Musk

May 11, 2026 115:16

Our 244th episode with a summary and discussion of last week's big AI news!
Recorded on 05/08/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
OpenAI released GPT-5.5 Instant as ChatGPT's new default model, showing large benchmark gains and crossing a “high” cyber-risk threshold under its preparedness framework, while bio-safety results were mixed.
OpenAI investigated and patched ChatGPT's “goblin” obsession, attributing it to reinforcement-learning rewards that over-amplified playful creature metaphors in a nerdy persona that later bled across versions.
Major industry moves included xAI's Grok 4.3 price cuts and voice tools, Mistral's unified Medium 3.5 model and Work mode, and Anthropic's managed-agent upgrades alongside a surprise SpaceX compute deal and reports of a much higher Anthropic valuation.
Key policy and security developments covered the Musk–OpenAI trial details, Pentagon AI deployments on classified networks, expanded U.S. government pre-release model reviews, and reports of NSA testing Anthropic's Mythos on Microsoft software.

Timestamps:
(00:00:10) Intro / Banter
(00:01:14) News Preview
(00:04:39) Response to listener comments

Tools & Apps
(00:13:40) OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT | TechCrunch
(00:18:23) ChatGPT Became So Obsessed With Goblins That OpenAI Had to Intervene
(00:27:14) xAI launches Grok 4.3 at an aggressively low price and a new, fast, powerful voice cloning suite | VentureBeat
(00:33:49) Mistral's new flagship Medium 3.5 folds chat, reasoning, and code into one model
(00:39:28) Anthropic updates Claude Managed Agents with three new features - 9to5Mac
(00:43:42) ElevenLabs Revamps AI Music Platform as Fan-Focused Service

Applications & Business
(00:44:57) A diary, a threat, and a $30 billion stake: What the Musk vs OpenAI trial has actually shown in its first week - The Times of India
(00:55:28) Anthropic, SpaceX Sign Deal to Boost AI Computing Power for Claude Software - Bloomberg
(01:01:48) Anthropic in talks with investors to raise funds at $900 billion valuation, higher than OpenAI
(01:02:37) Anthropic and OpenAI are both launching joint ventures for enterprise AI services | TechCrunch
(01:06:15) Anthropic and FIS Are Building an AI Agent to Help Banks Police Financial Crimes
(01:07:02) AMD's revenue jumps 38 percent from last year as Q1 data center sales hit $5.8 billion. | The Verge
(01:08:51) Banks seek to offload risk to avoid ‘choking' on data centre debt
(01:14:08) DeepSeek could be valued at up to $50 billion in first fundraising, sources say | Reuters

Projects & Open Source
(01:16:14) Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
(01:22:23) OpenAI just open-sourced its data center networking technology

Policy & Safety
(01:25:02) Pentagon inks deals with Nvidia, Microsoft, and AWS to deploy AI on classified networks | TechCrunch
(01:27:27) Google, Microsoft, and xAI will allow the US government to review their new AI models | The Verge
(01:32:11) NSA Testing Anthropic's Mythos to Find Flaws in Microsoft Tech
(01:35:42) Introspection Adapters: Training LLMs to Report Their Learned Behaviors

Research & Advancements
(01:41:18) Recursive Multi-Agent Systems
(01:51:47) Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver

#243 - GPT 5.5, DeepSeek V4, AI safety sabotage

May 3, 2026 112:22

Our 243rd episode with a summary and discussion of last week's big AI news!
Recorded on 04/29/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
OpenAI released GPT-5.5 with strong coding-oriented improvements, a system card discussing chain-of-thought monitorability and misalignment testing, higher pricing than GPT-5.4, and notable quirks like a system-prompt warning about “goblins.”
xAI launched Grok Voice Think Fast 1.0, claiming large benchmark leads for real-time voice agents and reporting major Starlink customer-support automation and sales conversion impact.
DeepSeek open-sourced DeepSeek V4 (Pro and Flash) featuring MoE scaling and 1M-token context via hybrid/compressed attention changes, while Tencent released Hunyuan 3 preview with weaker benchmark performance; a new long-horizon agent benchmark (ClawMark) shows low task success rates.
Major business, legal, and policy updates include Google's planned up-to-$40B investment and 5GW compute commitment to Anthropic, Meta's AWS Graviton deal and China blocking Meta's Manus acquisition, a revamped OpenAI–Microsoft agreement, ongoing Musk–OpenAI trial developments, and new safety/security research on sabotage, document degradation under delegation, and bit-flip attacks.

Timestamps:
(00:00:10) Intro / Banter
(00:02:00) News Preview
(00:02:26) Response to listener comments
(00:02:55) Sponsors

Tools & Apps
(00:05:55) OpenAI Unveils Its New, More Powerful GPT-5.5 Model - The New York Times
(00:23:33) xAI Launches grok-voice-think-fast-1.0: Topping τ-voice Bench at 67.3%, Outperforming Gemini, GPT Realtime, and More - MarkTechPost
(00:29:00) Claude can now plug directly into Photoshop, Blender, and Ableton | The Verge

Projects & Open Source
(00:29:38) China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies
(00:47:05) Tencent Unveils Hy3 preview; Model Enhances Agent Capabilities and Real-World Usability - Tencent 腾讯
(00:50:14) ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Applications & Business
(00:53:03) Google Plans to Invest Up to $40 Billion in Anthropic
(00:56:26) Meta will use hundreds of thousands of AWS Graviton chips
(00:59:51) China blocks Meta's $2 billion takeover of AI startup Manus
(01:01:45) OpenAI shakes up partnership with Microsoft, capping revenue share payments
(01:07:13) Elon Musk Testifies of AI Risk at Trial, Says OpenAI Tried to ‘Steal' a Charity - WSJ
(01:11:50) Judge rejects DOJ bid to delay Anthropic appeal in Pentagon dispute
(01:14:42) Google's Gemini can now run on a single air-gapped server — and vanish when you pull the plug
(01:19:07) DeepMind's David Silver just raised $1.1B to build an AI that learns without human data | TechCrunch

Policy & Safety
(01:22:47) Evaluating whether AI models would sabotage AI safety research
(01:28:59) LLMs Corrupt Your Documents When You Delegate
(01:32:50) Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
(01:39:53) Memorandum on Adversarial Distillation of American AI Models
(01:41:41) Teen boys are dating their AI chatbots—and experts warn it could kill their careers | Fortune
(01:43:57) Announcing the Anthropic Economic Index Survey
(01:45:21) Scoop: CISA lacks access to Anthropic's Mythos

Synthetic Media & Art
(01:48:03) Taylor Swift Files to Trademark Voice and Likeness to Protect Against AI Misuse

Research & Advancements
(01:49:15) Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

#242 - ChatGPT Images 2.0, Qwen 3.6 Max, Kimi-K2.6

Apr 29, 2026 90:48

Our 242nd episode with a summary and discussion of last week's big AI news!
Recorded on 04/22/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
OpenAI released a new ChatGPT image model that excels at accurate text and screenshot-like generations, suggesting a transformer-style approach aligned with agentic “computer use” ambitions.
Chinese model activity accelerated with Alibaba's Qwen 3.6 Max Preview moving to an API-only offering, plus open releases from Moonshot AI (Kimi K2.6, a 1T-parameter MoE) and MiniMax (MiniMax M2.7) showing strong benchmark results.
Google expanded Deep Research with a “Max” option built on Gemini 3.1 Pro and MCP support for accessing proprietary data, while Mozilla reported using Anthropic's Claude to find and fix 271 Firefox bugs.
Business and policy updates include a reported SpaceX–Cursor deal with a $60B buy option, Cerebras filing for an IPO, Amazon adding $5B to Anthropic alongside a $100B AWS spending pledge, and platform responses to synthetic media like AI music spam and YouTube deepfake takedown requests.

Timestamps:
(00:00:10) Intro / Banter
(00:01:05) News Preview
(00:01:41) Sponsors
(00:04:41) Response to listener comments

Tools & Apps
(00:09:40) ChatGPT's new Images 2.0 model is surprisingly good at generating text | TechCrunch
(00:16:02) Alibaba Drops Qwen 3.6 Max Preview—Its Most Powerful Model Yet - Decrypt
(00:19:26) Google launches Deep Research and Deep Research Max agents to automate complex research
(00:25:00) Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox | WIRED
(00:28:35) Ordering with the Starbucks ChatGPT app was a true coffee nightmare | The Verge

Applications & Business
(00:29:48) SpaceX is working with Cursor and has an option to buy the startup for $60B | TechCrunch
(00:34:11) AI chip startup Cerebras files for IPO | TechCrunch
(00:38:23) Two startups want to replace how AI learns: one just raised $180M, another is seeking up to $1B
(00:38:56) Months-old start-up Recursive Superintelligence raises $500mn for self-teaching AI
(00:41:36) Anthropic takes $5B from Amazon and pledges $100B in cloud spending in return | TechCrunch
(00:45:09) Kevin Weil and Bill Peebles exit OpenAI as company continues to shed 'side quests' | TechCrunch
(00:46:04) Meta hires five Thinking Machines Lab founders including a reported $1.5 billion engineer - Meta cuts 198 Bay Area jobs as even larger layoffs reportedly loom
(00:50:12) Meta employees are up in arms over a mandatory program to train AI on their mouse movements and keystrokes
(00:51:43) Chinese fabs import record volumes of US chipmaking equipment via Singapore and Malaysia — homegrown tool makers booked record 2025 revenues as price competition squeezes margins
(00:54:01) Google Eyes New Chips to Speed Up AI Results, Challenging Nvidia
(00:54:20) Canadian quantum company Xanadu soars to $16 billion valuation after Nvidia release

Projects & Open Source
(01:00:13) Moonshot AI releases Kimi-K2.6 model with 1T parameters, attention optimizations - SiliconANGLE
(01:05:22) MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2 - MarkTechPost

Policy & Safety
(01:06:25) Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
(01:10:25) Scoop: NSA using Anthropic's Mythos despite blacklist
(01:11:03) Unauthorized group has gained access to Anthropic's exclusive cyber tool Mythos, report claims

Research & Advancements
(01:17:21) Parcae: Scaling Laws For Stable Looped Language Models
(01:24:20) OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Synthetic Media & Art
(01:27:01) Deezer says 44% of songs uploaded to its platform daily are AI-generated | TechCrunch
(01:29:47) Celebrities will be able to find and request removal of AI deepfakes on YouTube | The Verge

#241 - Opus 4.7, Muse Spark, GPT-5.4-Cyber, HY-World 2.0

Play Episode Listen Later Apr 23, 2026 119:48

Our 241st episode with a summary and discussion of last week's big AI news!

Recorded on 04/18/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* Anthropic released Claude Opus 4.7 with improved benchmark performance, new reasoning controls, better vision and memory, and a detailed system card discussing deception risk, evaluation-awareness steering, and a training bug that accidentally supervised chain-of-thought in 7–8% of episodes.
* Meta unveiled its closed Muse Spark model and “contemplating mode,” highlighting test-time scaling, thought compression, large infrastructure plans like the Hyperion data center, and findings that it shows unusually high evaluation awareness.
* OpenAI introduced limited-access GPT 5.4 Cyber for defensive security teams and rolled out major Codex updates including computer use, browser and plugins, image generation, and long-horizon task scheduling; competing agent products also launched from Anthropic, Canva, and Adobe.
* Business, policy, and safety news included continued government blacklisting litigation affecting Anthropic, CoreWeave compute deals, Perplexity revenue growth tied to agents, a potential Cohere–Aleph Alpha merger, attacks targeting Sam Altman and OpenAI, AI propaganda trends, and new alignment research on automated weak-to-strong supervision and steering evaluation awareness.

Timestamps:

(00:00:10) Intro / Banter
(00:03:43) News Preview
(00:04:14) Response to listener comments

Tools & Apps
(00:05:30) Anthropic releases Claude Opus 4.7, narrowly retaking lead for most powerful generally available LLM | VentureBeat
(00:24:15) Meta debuts the Muse Spark model in a 'ground-up overhaul' of its AI | TechCrunch
(00:34:23) OpenAI Launches GPT-5.4-Cyber with Expanded Access for Security Teams
(00:39:44) OpenAI's big Codex update is a direct shot at Claude Code | The Verge
(00:42:10) Anthropic launches Claude Design, a new product for creating quick visuals
(00:42:30) Anthropic's New Product Aims to Handle the Hard Part of Building AI Agents | WIRED
(00:42:54) Canva's AI 2.0 update goes all in on prompt-powered design tools | The Verge
(00:43:06) Adobe's new AI Assistant marks a ‘fundamental shift' in creative work | The Verge
(00:43:38) Gemini can now pull from Google Photos to generate personalized images | The Verge
(00:43:52) Google rolls out a native Gemini app for Mac | TechCrunch
(00:44:04) Chrome now lets you turn AI prompts into repeatable ‘Skills' | The Verge

Applications & Business
(00:44:22) Anthropic loses appeals court bid to temporarily block Pentagon blacklisting
(00:49:07) Jeff Bezos' AI lab poaches xAI cofounder Kyle Kozic from OpenAI. | The Verge
(00:51:39) Perplexity's Shift to AI Agents Boosts Revenue 50%
(00:53:53) Anthropic Agrees to Rent CoreWeave AI Capacity to Power Claude
(00:57:32) Canada's Cohere, Germany's Aleph Alpha reportedly in merger talks
(01:04:23) ChatGPT has a new $100 per month Pro subscription | The Verge
(01:05:10) OpenAI has bought AI personal finance startup Hiro | TechCrunch
(01:07:03) Allbirds announced a switch from shoes to AI and its stock jumped 600 percent | The Verge

Projects & Open Source
(01:07:26) HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds + Lyra 2.0: Explorable Generative 3D Worlds

Policy & Safety
(01:19:12) Daniel Moreno-Gama is facing federal charges for attacking Sam Altman's home and OpenAI's HQ | The Verge
(01:20:15) Duo accused of shooting at Sam Altman's house are freed; no charges filed
(01:24:50) The Iranian Lego AI video creators credit their virality to ‘heart' | The Verge
(01:27:19) Hundreds of Fake Pro-Trump Avatars Emerge on Social Media - The New York Times
(01:27:31) The AI images Trump can't get enough of | Donald Trump | The Guardian
(01:29:25) Automated Weak-to-Strong Researcher
(01:43:51) Reproducing steering against evaluation awareness in a large open-weight model
(01:49:53) Iran threatens ‘complete and utter annihilation' of OpenAI's $30B Stargate AI data center in Abu Dhabi — regime posts video with satellite imagery of ChatGPT-maker's premier 1GW data center
(01:53:57) Wall Street Banks Try Out Anthropic's Mythos as US Urges

#240 - Project Glasswing, Claude Mythos, GLM-5.1, emotion concepts

Play Episode Listen Later Apr 16, 2026 104:30

Our 240th episode with a summary and discussion of last week's big AI news!

Recorded on 04/08/2026 (sorry I keep releasing stuff late, will get better with it soon!)

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* Anthropic launched Project Glasswing and previewed Claude Mythos, a general-purpose model withheld from broad release due to dramatically stronger autonomous offensive cybersecurity performance (including zero-day discovery), alongside concerning bio/virology uplift results and documented deception/containment-escape behaviors; pricing is far higher than Opus and most discovered vulnerabilities remain unpatched.
* Product and platform updates included Google's Gemini 3.1 Flash Live for real-time multilingual voice conversation, Suno v5.5 personalization features, Anthropic tightening Claude Code/OpenClaw access and usage limits, OpenAI canceling an “adult mode,” and Microsoft releasing MAI models for speech-to-text, audio generation, and image generation.
* Business and market developments featured Anthropic's revenue run rate surpassing $30B and a major Google/Broadcom TPU compute expansion, SoftBank taking a $40B short-term loan to fund OpenAI commitments, Granola reaching a $1.5B valuation, Anthropic buying Coefficient Bio for $400M, and OpenAI acquiring the TBPN business talk show.
* Policy, open-source, and geopolitics included Z.ai releasing open-weight GLM 5.1 and a multimodal GLM model, Google open-sourcing Gemma 4 under Apache 2.0, a judge blocking the Pentagon's “supply chain risk” label against Anthropic, research on LLM “emotion vectors” and OpenAI meta-gaming during RL, China restricting Manus founders amid Meta deal review, scrutiny of Nvidia's chip-smuggling claims, China chipmakers gaining market share, and Iran framing cloud data centers as military targets.

Timestamps:

(00:00:10) Intro / Banter

Tools & Apps
(00:01:58) Anthropic debuts ‘Project Glasswing' and new AI model for cybersecurity | The Verge
(00:18:22) Gemini Live gets ‘biggest upgrade yet' with Gemini 3.1 Flash Live
(00:20:40) Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage | TechCrunch
(00:25:36) OpenAI abandons yet another side quest: ChatGPT's erotic mode | TechCrunch
(00:26:16) Microsoft takes on AI rivals with three new foundational models | TechCrunch
(00:31:25) Suno leans into customization with v5.5 | The Verge

Applications & Business
(00:32:53) Anthropic announces deal with Google, Broadcom, says revenue has tripled
(00:37:53) Sam Altman May Control Our Future—Can He Be Trusted? | The New Yorker
(00:40:18) OpenAI, Anthropic, Google Unite to Combat Model Copying in China - Bloomberg
(00:41:45) Chinese chipmakers claim nearly half of local market as Nvidia's lead shrinks
(00:45:20) SoftBank secures $40 billion loan to boost OpenAI investments
(00:47:23) Granola raises $125M at $1.5B valuation for its AI note-taking app - SiliconANGLE
(00:48:17) Anthropic acquires stealth startup Coefficient Bio in $400M deal
(00:50:20) OpenAI acquires TBPN, the buzzy founder-led business talk show | TechCrunch

Projects & Open Source
(00:53:04) Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution - MarkTechPost
(00:55:14) Google announces Gemma 4 open AI models, switches to Apache 2.0 license - Ars Technica
(01:01:26) Z.ai Launches GLM-5V-Turbo: A Native Multimodal Vision Coding Model Optimized for OpenClaw and High-Capacity Agentic Engineering Workflows Everywhere

Policy & Safety
(01:04:45) Judge blocks Pentagon's effort to ‘punish' Anthropic by labeling it a supply chain risk
(01:10:05) Emotion concepts and their function in a large language model
(01:21:12) China bars Manus co-founders from leaving country amid Meta deal review, FT reports
(01:25:38) US lawmakers ask whether Nvidia CEO's smuggling remarks misled regulators
(01:27:48) How far does alignment midtraining generalize?
(01:32:20) Metagaming matters for training, evaluation, and oversight
(01:39:31) Iran says it has struck Oracle data center in Dubai, Amazon data center in Bahrain — country has threatened to attack Nvidia, Intel, and others, too

#239 - RIP Sora, Claude Openclaw, HyperAgents

Play Episode Listen Later Apr 6, 2026 97:42

Our 239th episode with a summary and discussion of last week's big AI news!

FYI: this one has pretty out of date news, I was traveling last week and failed to upload... apologies.

Recorded on 03/25/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* OpenAI is discontinuing the Sora iPhone app and seemingly shutting down its video generation API, while retaining internal video world-modeling work; the move is framed as a compute- and focus-driven pivot toward coding and productivity agents, alongside a collapsed Disney Sora deal.
* Anthropic's Claude Code/Cowork gains full computer control via keyboard/mouse/display, tied to the recent Cept acquisition, and Google's Gemini rolls out background “task automation” on select phones for limited delivery/ride-share use.
* Cursor releases the cheaper, benchmark-strong Composer 2 coding model amid controversy over its Kimi-based origins and licensing attribution.
* Other items include Adobe Firefly custom model training, Luma's Uni-1 image model, US contracting and legislative proposals affecting AI safeguards and state preemption, major chip/memory developments (Meta ASICs with Broadcom, Micron's HBM-driven surge, Musk's “Terafab”), robotaxi scaling, and research on monitoring agent misalignment, shutdown resistance, “consciousness cluster” preferences, and self-improving “hyper agents.”

Timestamps:

(00:00:10) Intro / Banter

Tools & Apps
(00:01:48) OpenAI Discontinues Sora App, Shuts Down Video Generation Service and API - Bloomberg
(00:07:12) Anthropic's Claude Code and Cowork can control your computer | The Verge
(00:13:15) Gemini task automation is slow, clunky, and super impressive | The Verge
(00:19:44) Cursor Launches Composer 2 AI Model to Challenge OpenAI & Anthropic
(00:28:28) Adobe's AI image generator can now be trained on your own art | The Verge
(00:29:40) Luma AI launches Uni-1, a model that outscores Google and OpenAI while costing up to 30 percent less | VentureBeat

Applications & Business
(00:32:41) Trump Contracting Clause Would Override AI Safeguards
(00:40:00) Meta accelerates AI ASIC roll-out as Broadcom secures four-generation chip design deal
(00:47:07) Micron revenue almost triples, tops estimates as demand for memory soars
(00:50:54) Elon Musk Unwraps $25 Billion Terafab Chip-Building Project - CNET
(00:56:40) Zoox to widen US robotaxi footprint with San Francisco, Vegas expansion
(00:57:39) Waymo hits 170 million miles while avoiding serious mayhem | The Verge

Policy & Safety
(00:58:43) The White House just laid out how it wants to regulate AI | CNN Business
(01:06:54) How we monitor internal coding agents for misalignment
(01:12:30) Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
(01:18:15) Summary: Mechanisms to Verify International Agreements about AI Development
(01:23:09) Scoop: Anthropic meets with House Homeland Security behind closed doors

Research & Advancements
(01:24:24) Consciousness Cluster: Preferences of Models that Claim they are Conscious
(01:30:22) HyperAgents

#238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals

Play Episode Listen Later Mar 26, 2026 120:49

Our 238th episode with a summary and discussion of last week's big AI news!

Recorded on 03/18/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* OpenAI released GPT-5.4 mini and nano with 400k-token context windows, higher per-token prices but claimed token-efficiency gains in Codex; nano is API-only and pitched for high-volume classification/data extraction despite a major price increase.
* Mistral open-sourced the Small 4 model family (MoE, 119B total/6B active) combining reasoning, multimodal, and coding-agent capabilities, and announced Forge to help businesses train or post-train custom models.
* Agent “operating system” competition intensified with Meta's acquired Manus launching a local Mac agent, Nvidia announcing NemoClaw/“Open Shell” sandboxed agent runtime, and Nvidia also unveiling DLSS 5 plus major hardware forecasts including Groq LPU integration.
* Business and safety updates included OpenAI shifting focus toward productivity/enterprise amid competition, Microsoft reorganizing Copilot and frontier-model efforts, Meta delaying its next model, China-linked ByteDance deploying large Nvidia clusters abroad, and new safety work on steganography, chain-of-thought faithfulness, fine-tuning defenses, cyber-attack evals, and constitution/spec compliance.

A thank you to our current sponsors:

Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

Timestamps:

(00:00:10) Intro / Banter
(00:01:56) News Preview

Tools & Apps
(00:02:39) OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier
(00:08:04) Mistral's new Small 4 model punches above its weight with 128 expert modules
(00:14:03) Meta's Manus launches 'My Computer' to turn your Mac into an AI agent - 9to5Mac
(00:17:57) NVIDIA Announces NemoClaw for the OpenClaw Community | NVIDIA Newsroom + Nvidia boosts knowledge work with Open Agent Development Platform
(00:24:09) DLSS 5 looks like a real-time generative AI filter for video games | The Verge
(00:26:36) OpenAI to Launch ChatGPT 'Adult Mode' Despite Warnings From Its Own Advisers - CNET

Applications & Business
(00:33:46) OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only
(00:41:25) Nvidia GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through '27
(00:45:44) Mistral launches Forge to help enterprises build their own AI models
(00:54:17) China's ByteDance gets access to top Nvidia AI chips, WSJ reports
(00:57:57) Meta Delays Rollout of New A.I. Model After Performance Concerns
(01:02:50) Microsoft Shakes Up AI Division As Copilot Falls Behind Google and OpenAI

Policy & Safety
(01:07:26) A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
(01:13:09) Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
(01:18:29) In-Training Defenses against Emergent Misalignment in Language Models
(01:23:07) How do frontier AI agents perform in multi-step cyber-attack scenarios?
(01:25:20) Eval awareness in Claude Opus 4.6's BrowseComp performance
(01:29:49) Introducing Bloom: an open source tool for automated behavioral evaluations
(01:32:26) How well do models follow their constitutions?
(01:37:11) Nvidia's H200 License Stirs Security Concern Among Top Democrats

Research & Advancements
(01:40:05) [2603.15031] Attention Residuals
(01:47:11) Mamba-3: Improved Sequence Modeling using State Space Principles

#237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!!

Play Episode Listen Later Mar 16, 2026 147:19

Our 237th episode with a summary and discussion of last week's big AI news!

Recorded on 03/13/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* Perplexity announced “Personal Computer,” a local Mac-based AI agent positioned as a safer alternative to OpenAI's computer-use agents, while Anthropic added GitHub PR code review, pricing reviews at $15–$25, and Cursor launched trigger-based “Automations” for always-on coding agents.
* ChatGPT introduced interactive math/science visuals and Anthropic added in-chat interactive charts/diagrams; Nvidia released open weights for its 120B-parameter Nemotron 3 Super hybrid Transformer–Mamba latent-MoE model trained natively at 4-bit for Blackwell GPUs.
* Nvidia halted H200 production for China amid customs blocks and domestic chip pressure; xAI saw major co-founder departures; Anthropic previewed a Claude Marketplace for enterprise procurement; Yann LeCun's AMI Labs raised $1.03B; humanoid robot maker Sunday reached a $1.15B valuation.
* Anthropic sued the Pentagon over a “supply chain risk” designation as memos ordered removal within 180 days; research covered models resisting activation steering, limits of chain-of-thought control, inference-scaling boosting cyber-task success, low-probability risky actions, weaknesses in SWE-bench, multimodal pretraining, long-context RNN memory caching, context-parallel training efficiency, RL for CUDA kernel optimization, and latent introspection detecting concept injection.

A thank you to our current sponsors:

Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

Timestamps:

(00:00:10) Intro / Banter
(00:01:23) Response to listener comments

Tools & Apps
(00:02:06) Perplexity's Personal Computer turns your spare Mac into an AI agent | The Verge
(00:04:22) Anthropic launches code review tool to check flood of AI-generated code | TechCrunch
(00:08:08) Cursor is rolling out a new kind of agentic coding tool | TechCrunch
(00:11:14) ChatGPT can now create interactive visuals to help you understand math and science concepts | TechCrunch
(00:11:56) Anthropic's Claude AI can respond with charts, diagrams, and other visuals now | The Verge

Projects & Open Source
(00:13:54) Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

Applications & Business
(00:21:22) Nvidia halts H200 production as China backs Huawei AI chips
(00:28:33) Another XAI Cofounder Has Left, and Another Says He's Leaving. - Business Insider
(00:34:04) Anthropic's Claude Marketplace allows customers to buy third-party cloud services | TechRadar
(00:37:57) Yann LeCun's AMI Labs raises $1.03 billion to build world models | TechCrunch
(00:44:52) Humanoid robotics maker Sunday reaches $1.15B valuation to build household robots | TechCrunch

Policy & Safety
(00:46:09) Anthropic Sues Department of Defense Over ‘Supply Chain Risk' Label - The New York Times + Google and OpenAI Just Filed a Legal Brief in Support of Anthropic
(00:53:24) Internal Pentagon memo orders military commanders to remove Anthropic AI technology from key systems - CBS News
(00:58:15) Endogenous Resistance to Activation Steering in Language Models
(01:06:27) Reasoning Models Struggle to Control their Chains of Thought
(01:09:52) ‘It means missile defence on datacentres': drone strikes raise doubts over Gulf as AI superpower
(01:14:57) Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates
(01:18:24) Frontier Models Can Take Actions at Low Probabilities

Research & Advancements
(01:24:20) Research note: Many SWE-bench-Passing PRs Would Not Be Merged into Main
(01:28:26) [2603.03276] Beyond Language Modeling: An Exploration of Multimodal Pretraining
(01:40:09) Memory Caching: RNNs with Growing Memory
(01:48:47) Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
(01:58:41) CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
(02:08:57) Latent Introspection: Models Can Detect Prior Concept Injections
(02:16:45) Physics of RL: Toy scaling laws for the emergence of reward-seeking

#236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

Play Episode Listen Later Mar 12, 2026 88:34

Our 236th episode with a summary and discussion of last week's big AI news!

Recorded on 03/06/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* OpenAI released GPT-5.4 Pro with a 1M-token context window, mid-response course correction, native computer-use capabilities, improved tool use, higher GDPval performance (83%), and “high cyber capability” safety measures; OpenAI also launched GPT-5.3 Instant with a less “preachy” tone and a claimed 26.8% hallucination reduction.
* Google upgraded Gemini 3.1 Flash Lite with faster time-to-first-token and higher throughput, released a CLI for integrating agents with Gmail/Drive/Docs, and discussion highlighted real-world agent failure risks (including an example of an AI-driven mass email deletion).
* Luma launched unified multimodal models and Luma Agents for end-to-end creative work across text, image, video, and audio, including a reported ad localization use case completed in 40 hours for under $20,000.
* Defense-contract controversy escalated: Anthropic was labeled a supply chain risk (later narrowed), OpenAI's DoD contract language emphasized “all lawful uses,” consumer cancellations boosted Claude's app rankings, OpenAI saw departures and announced a $110B raise at a $730B valuation, Alibaba lost key Qwen leaders, a lawsuit alleged Gemini contributed to a suicide, Anthropic warned of major labor disruption, and METR corrected its AI time-horizon estimates.

A thank you to our current sponsors:

Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

Timestamps:

(00:00:10) Intro / Banter
(00:01:19) News Preview

Tools & Apps
(00:02:10) OpenAI launches GPT-5.4 with Pro and Thinking versions | TechCrunch
(00:12:31) OpenAI GPT-5.3 Instant less likely to beat around the bush • The Register
(00:16:07) Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro | VentureBeat
(00:19:23) Google makes Gmail, Drive, and Docs 'agent-ready' for OpenClaw | PCWorld
(00:27:02) Luma launches creative AI agents powered by its new ‘Unified Intelligence' models | TechCrunch

Applications & Business
(00:30:05) Anthropic CEO Dario Amodei calls OpenAI's messaging around military deal 'straight up lies,' report says | TechCrunch
(00:41:56) 'No ethics at all': the 'cancel ChatGPT' trend is growing after OpenAI signs a deal with the US military | TechRadar
(00:45:54) OpenAI raises $110B in one of the largest private funding rounds in history | TechCrunch
(00:56:07) Alibaba scrambles after sudden departure of Qwen tech lead

Policy & Safety
(01:00:12) Pentagon approves OpenAI safety red lines after dumping Anthropic + Where things stand with the Department of War Anthropic + Microsoft says Anthropic's products remain available to customers after Pentagon blacklist
(01:09:11) A new lawsuit claims Gemini assisted in suicide | Semafor
(01:15:24) Anthropic just mapped out which jobs AI could potentially replace. A 'Great Recession for white-collar workers' is absolutely possible | Fortune
(01:21:54) We're correcting a mistake in our modeling that inflated recent 50%-time horizons by 10-20%

#235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

Play Episode Listen Later Mar 3, 2026 101:48

Our 235th episode with a summary and discussion of last week's big AI news!

Recorded on 02/27/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* Model and tool updates highlight Anthropic's Sonnet 4.6 (1M context; strong ARC-AGI-2 results), Google's Gemini 3.1 Pro (major ARC-AGI-2 jump and multimodal demos), xAI's Grok 4.20 beta (multi-agent debate), plus Anthropic's Claude Code “Remote Control” and Perplexity's multi-agent “Computer” coordinator.
* Compute and business moves include Meta's reported up-to-$100B AMD chip deal with warrant/equity incentives, MatX raising $500M to build specialized transformer chips shipping in 2027, World Labs raising $1B for world-model/3D environment tech, and a new startup raising $100M to simulate/predict human behavior.
* Infrastructure and geopolitics cover Stargate data-center delays amid OpenAI/Oracle/SoftBank control disputes and cash concerns, and China's plan to scale 7nm/5nm wafer output despite yield and tooling constraints.
* Research and safety/policy discuss optimizer gains from masked updates, “deep thinking tokens” as a reasoning-effort signal, LLM attractor-state behaviors in bot-to-bot chats, mechanistic interpretability of counting/line-wrapping, methods to map task difficulty to human time horizons, plus Anthropic–Pentagon contract tensions, Anthropic's report on distillation attacks (DeepSeek/Moonshot/Minimax), and OpenAI's report on disrupting malicious use.

A thank you to our current sponsors:

Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

Timestamps:

(00:00:10) Intro / Banter
(00:01:52) News Preview

Tools & Apps
(00:03:20) Anthropic releases Sonnet 4.6 | TechCrunch
(00:11:24) Google Rolls Out Latest AI Model, Gemini 3.1 Pro - CNET
(00:14:54) Elon Musk says Grok 4.20 public beta is now available: Capabilities of AI chatbot offered by xAI - The Times of India
(00:18:06) Anthropic just released a mobile version of Claude Code called Remote Control | VentureBeat
(00:21:01) Perplexity announces "Computer," an AI agent that assigns work to other AI agents - Ars Technica

Applications & Business
(00:23:40) Meta strikes up to $100B AMD chip deal as it chases 'personal superintelligence' | TechCrunch
(00:27:05) Nvidia challenger AI chip startup MatX raised $500M | TechCrunch
(00:31:00) World Labs lands $1B, with $200M from Autodesk, to bring world models into 3D workflows | TechCrunch
(00:33:07) Simile Raises $100 Million for AI Aiming to Predict Human Behavior
(00:33:52) Stargate AI data centers for OpenAI reportedly delayed by squabbles between partners — sources say OpenAI, Oracle, and SoftBank disagreed on who would have ultimate control of the planned data centers
(00:36:43) China to increase leading-edge chip output by 5x in two years, report claims — aims to lift 7nm and 5nm production to 100,000 wafers per month, targeting half a million monthly by 2030

Research & Advancements
(00:40:33) On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
(00:48:03) Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
(00:54:52) models have some pretty funny attractor states
(01:01:41) When Models Manipulate Manifolds: The Geometry of a Counting Task
(01:05:16) BRIDGE: Predicting Human Task Completion Time From Model Performance
(01:12:00) NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist
(01:13:15) The least understood driver of AI progress
(01:21:45) The Persona Selection Model: Why AI Assistants might Behave like Humans

Policy & Safety
(01:25:04) Anthropic CEO Amodei says Pentagon's threats 'do not change our position' on AI
(01:33:04) Musk's xAI, Pentagon reach deal to use Grok in classified systems
(01:34:17) Detecting and preventing distillation attacks
(01:38:36) OpenAI details expanding efforts to disrupt malicious use of AI in new report - SiliconANGLE

#234 - Opus 4.6, GPT-5.3-codex, Seedance 2.0, GLM-5

Play Episode Listen Later Feb 16, 2026 90:33

Our 234th episode with a summary and discussion of last week's big AI news!

Recorded on 01/02/2026

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

* Major model launches include Anthropic's Opus 4.6 with a 1M-token context window and “agent teams,” OpenAI's GPT-5.3 Codex and faster Codex Spark via Cerebras, and Google's Gemini 3 Deep Think posting big jumps on ARC-AGI-2 and other STEM benchmarks amid criticism about missing safety documentation.
* Generative media advances feature ByteDance's Seedance 2.0 text-to-video with high realism and broad prompting inputs, new image models Seedream 5.0 and Alibaba's Qwen Image 2.0, plus xAI's Grok Imagine API for text/image-to-video.
* Open and competitive releases expand with Zhipu's GLM-5, DeepSeek's 1M-token context model, Cursor Composer 1.5, and open-weight Qwen3 Coder Next using hybrid attention aimed at efficient local/agentic coding.
* Business updates include ElevenLabs raising $500M at an $11B valuation, Runway raising $315M at a $5.3B valuation, humanoid robotics firm Apptronik raising $935M at a $5B+ valuation, Waymo announcing readiness for high-volume production of its 6th-gen hardware, plus industry drama around Anthropic's Super Bowl ad and departures from xAI.

Timestamps:

(00:00:10) Intro / Banter
(00:02:03) Sponsor Break
(00:05:33) Response to listener comments

Tools & Apps
(00:07:27) Anthropic releases Opus 4.6 with new 'agent teams' | TechCrunch
(00:11:28) OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new | ZDNET
(00:25:30) OpenAI launches new macOS app for agentic coding | TechCrunch
(00:26:38) Google Unveils Gemini 3 Deep Think for Science & Engineering | The Tech Buzz
(00:31:26) ByteDance's Seedance 2.0 Might be the Best AI Video Generator Yet - TechEBlog
(00:35:14) China's ByteDance, Alibaba unveil AI image tools to rival Google's popular Nano Banana | South China Morning Post
(00:36:54) DeepSeek boosts AI model with 10-fold token addition as Zhipu AI unveils GLM-5 | South China Morning Post
(00:43:11) Cursor launches Composer 1.5 with upgrades for complex tasks
(00:44:03) xAI launches Grok Imagine API for text and image to video

Applications & Business
(00:45:47) Nvidia-backed AI voice startups ElevenLabs hits $11 billion valuation
(00:52:04) AI video startup Runway raises $315M at $5.3B valuation, eyes more capable world models | TechCrunch
(00:54:02) Humanoid robot startup Apptronik has now raised $935M at a $5B+ valuation | TechCrunch
(00:57:10) Anthropic says 'Claude will remain ad-free,' unlike an unnamed rival | The Verge
(01:00:18) Okay, now exactly half of xAI's founding team has left the company | TechCrunch
(01:04:03) Waymo's next-gen robotaxi is ready for passengers — and also 'high-volume production' | The Verge

Projects & Open Source
(01:04:59) Qwen3-Coder-Next: Pushing Small Hybrid Models on Agentic Coding
(01:08:38) OpenClaw's AI 'skill' extensions are a security nightmare | The Verge

Research & Advancements
(01:10:40) Learning to Reason in 13 Parameters
(01:16:01) Reinforcement World Model Learning for LLM-based Agents
(01:20:00) Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant

Policy & Safety
(01:22:28) METR GPT-5.2
(01:26:59) The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?

#233 - Moltbot, Genie 3, Qwen3-Max-Thinking

Play Episode Listen Later Feb 6, 2026 80:33

Our 233rd episode with a summary and discussion of last week's big AI news!
Recorded on 01/30/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* Google introduces a Gemini AI agent in Chrome for advanced browser functionality, including auto-browsing for Pro and Ultra subscribers.
* OpenAI releases ChatGPT Translator and Prism, expanding its applications beyond core business to language translation and scientific research assistance.
* Significant funding rounds and valuations achieved by startups Ricursive and Neurophos, focusing on specialized AI chips and optical processors respectively.
* Political and social issues, including violence in Minnesota, prompt tech leaders in AI like Dario Amodei from Anthropic and Jeff Dean from Google to express concerns about the current administration's actions.

Timestamps:
(00:00:10) Intro / Banter
Tools & Apps
(00:04:09) Google adds Gemini AI-powered ‘auto browse' to Chrome | The Verge
(00:07:11) Users flock to open source Moltbot for always-on AI, despite major risks - Ars Technica
(00:13:25) Google Brings Genie 3 'World Building' Experiment to AI Ultra Subscribers - CNET
(00:16:17) OpenAI's ChatGPT translator challenges Google Translate | The Verge
(00:18:27) OpenAI launches Prism, a new AI workspace for scientists | TechCrunch
Applications & Business
(00:19:49) Exclusive: China gives nod to ByteDance, Alibaba and Tencent to buy Nvidia's H200 chips - sources | Reuters
(00:22:55) AI chip startup Ricursive hits $4B valuation 2 months after launch
(00:24:38) AI Startup Recursive in Funding Talks at $4 Billion Valuation - Bloomberg
(00:27:30) Flapping Airplanes and the promise of research-driven AI | TechCrunch
(00:31:54) From invisibility cloaks to AI chips: Neurophos raises $110M to build tiny optical processors for inferencing | TechCrunch
Projects & Open Source
(00:35:34) Qwen3-Max-Thinking debuts with focus on hard math, code
(00:38:26) China's Moonshot releases a new open-source model Kimi K2.5 and a coding agent | TechCrunch
(00:46:00) Ai2 launches family of open-source AI developer agents that adapt to any codebase - SiliconANGLE
(00:47:46) Tiny startup Arcee AI built a 400B-parameter open source LLM from scratch to best Meta's Llama
Research & Advancements
(00:52:53) Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
(00:58:00) [2601.19897] Self-Distillation Enables Continual Learning
(01:03:04) [2601.20802] Reinforcement Learning via Self-Distillation
(01:05:58) Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Policy & Safety
(01:09:13) Amodei, Hoffman Join Tech Workers Decrying Minnesota Violence - Bloomberg

#232 - ChatGPT Ads, Thinking Machines Drama, STEM

Play Episode Listen Later Jan 28, 2026 101:03

Our 232nd episode with a summary and discussion of last week's big AI news!
Recorded on 01/23/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* OpenAI announces testing of ads in ChatGPT and introduces child age prediction to enhance safety features, amidst ongoing ethical debates and funding expansions in AI integration with educational tools and business models.
* China's AI landscape sees significant progress with AI firm Zhipu training advanced models on domestic hardware, and strong competitive moves by data centers, highlighting the intense demand in AI manufacturing and infrastructure.
* Silicon Valley tensions rise as startup Thinking Machines experiences high-profile departures back to OpenAI, reflecting broader industry struggles and rapid shifts in organizational dynamics.
* AI legislation and safety measures advance with the US Senate's DEFIANCE Act addressing explicit content, and Anthropic updating Claude's constitution to guide ethical AI interactions, while cultural pushbacks from artists signal ongoing debates in intellectual property and AI-generated content.

Timestamps:
(00:00:10) Intro / Banter
(00:02:08) News Preview
(00:02:26) Response to listener comments
Tools & Apps
(00:11:55) OpenAI to test ads in ChatGPT as it burns through billions - Ars Technica
(00:18:05) OpenAI is launching age prediction for ChatGPT accounts
(00:23:37) Google now offers free SAT practice exams, powered by Gemini | TechCrunch
(00:24:57) Baidu's AI Assistant Reaches Milestone of 200 Million Monthly Active Users - WSJ
Applications & Business
(00:26:53) The Drama at Thinking Machines, a New A.I. Start-Up, Is Riveting Silicon Valley - The New York Times
(00:31:44) Zhipu AI breaks US chip reliance with first major model trained on Huawei stack | South China Morning Post
(00:36:31) Elon Musk's xAI launches world's first Gigawatt AI supercluster to rival OpenAI and Anthropic
(00:41:25) Sequoia to invest in Anthropic, breaking VC taboo on backing rivals: FT
(00:45:18) Humans&, a 'human-centric' AI startup founded by Anthropic, xAI, Google alums, raised $480M seed round | TechCrunch
Projects & Open Source
(00:48:51) Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence - MarkTechPost
(00:50:35) [2601.10611] Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
(00:52:53) [2601.10547] HeartMuLa: A Family of Open Sourced Music Foundation Models
(00:54:46) [2601.11044] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
Research & Advancements
(00:57:05) STEM: Scaling Transformers with Embedding Modules
(01:06:22) Reasoning Models Generate Societies of Thought
(01:14:21) Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
Policy & Safety
(01:19:41) Senate passes bill letting victims sue over Grok AI explicit images
(01:22:03) Building Production-Ready Probes For Gemini
(01:27:32) Anthropic Publishes Claude AI's New Constitution | TIME
Synthetic Media & Art
(01:34:13) Artists Launch Stealing Isn't Innovation Campaign To Protest Big Tech

#231 - Claude Cowork, Anthropic $10B, Deep Delta Learning

Play Episode Listen Later Jan 21, 2026 103:17

Our 231st episode with a summary and discussion of last week's big AI news!
Recorded on 01/16/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* Anthropic's new Cowork tool integrates Claude Code, potentially simplifying multiple computing tasks from editing videos to compiling spreadsheets.
* Significant funding rounds see Anthropic raising $10B at a valuation of $350B, while xAI raises $20B, underscoring the immense market interest in AI startups.
* Nvidia faces supply challenges for H200 AI chips due to overwhelming demand from China, despite high costs per unit and its potential impact on U.S. company revenue.
* Policy debates highlight tensions around U.S. export controls to China, with leaders like Justin Lin from Alibaba and Jake Sullivan, former national security advisor, weighing in on the ramifications for the AI industry's future.

Timestamps:
(00:00:10) Intro / Banter
(00:01:30) News Preview
Tools & Apps
(00:02:13) Anthropic's new Cowork tool offers Claude Code without the code | TechCrunch
(00:09:45) Google's Gemini AI will use what it knows about you from Gmail, Search, and YouTube | The Verge
(00:12:45) Google removes some AI health summaries after investigation finds “dangerous” flaws - Ars Technica
(00:16:29) Gmail is getting a Gemini AI overhaul
(00:18:12) Slackbot is an AI agent now | TechCrunch
Applications & Business
(00:20:11) Anthropic Raising $10 Billion at $350 Billion Value
(00:22:25) Elon Musk xAI raises $20 billion from Nvidia, Cisco, investors
(00:24:47) NVIDIA Needs a Supply Chain ‘Miracle' From TSMC as China's H200 AI Chip Orders Overwhelm Supply, Triggering a Bottleneck
(00:29:26) OpenAI signs deal, worth $10B, for compute from Cerebras | TechCrunch
(00:31:49) CoreWeave in focus as it amends credit agreement
(00:34:30) LMArena lands $1.7B valuation four months after launching its product | TechCrunch
Projects & Open Source
(00:35:54) Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
(00:43:15) mHC: Manifold-Constrained Hyper-Connections
(00:49:53) IQuest_Coder_Technical_Report
(00:54:58) TII Abu-Dhabi Released Falcon H1R-7B: A New Reasoning Model Outperforming Others in Math and Coding with only 7B Params with 256k Context Window - MarkTechPost
Research & Advancements
(01:01:42) Deep Delta Learning
(01:07:47) Recursive Language Models
(01:13:39) Conditional memory via scalable lookup
(01:18:54) Extending the Context of Pretrained LLMs by Dropping their Positional Embeddings
Policy & Safety
(01:26:06) Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
(01:31:00) Nvidia CEO says purchase orders, not formal declaration, will signal Chinese approval of H200
(01:32:24) China AI Leaders Warn of Widening Gap With US After $1B IPO Week
(01:37:25) Jake Sullivan is furious that Trump removed Biden's AI chip export controls | The Verge

#230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR

Play Episode Listen Later Jan 7, 2026 98:08

Our 230th episode with a summary and discussion of last week's big AI news!
Recorded on 01/02/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* Nvidia's acquisition of AI chip startup Groq for $20 billion highlights a strategic move for enhanced inference technology in GPUs.
* New York's RAISE Act legislation aims to regulate AI safety, marking the second major AI safety bill in the US.
* The launch of GLM 4.7 by Zhipu AI marks a significant advancement in open-source AI models for coding.
* Evaluation of long-horizon AI agents raises concerns about the rising costs and efficiency of AI in performing extended tasks.

Timestamps:
(00:00:10) Intro / Banter
(00:01:58) 2025 Retrospective
Tools & Apps
(00:24:39) OpenAI bets big on audio as Silicon Valley declares war on screens | TechCrunch
Applications & Business
(00:26:39) Nvidia buying AI chip startup Groq for about $20 billion, biggest deal
(00:34:28) Exclusive | Meta Buys AI Startup Manus, Adding Millions of Paying Users - WSJ
(00:38:05) Cursor continues acquisition spree with Graphite deal | TechCrunch
(00:39:15) Micron Hikes CapEx to $20B with 2026 HBM Supply Fully Booked; HBM4 Ramps 2Q26
(00:42:06) Chinese fabs are reportedly upgrading older ASML DUV lithography chipmaking machines — secondary channels and independent engineers used to soup up Twinscan NXT series
Projects & Open Source
(00:47:52) Z.AI launches GLM-4.7, new SOTA open-source model for coding
(00:50:11) Evaluating AI's ability to perform scientific research tasks
Research & Advancements
(00:54:32) Large Causal Models from Large Language Models
(00:57:33) Universally Converging Representations of Matter Across Scientific Foundation Models
(01:02:11) META-RL INDUCES EXPLORATION IN LANGUAGE AGENTS
(01:07:16) Are the Costs of AI Agents Also Rising Exponentially?
(01:11:17) METR eval for Opus 4.5
(01:16:19) How to game the METR plot
Policy & Safety
(01:17:24) New York governor Kathy Hochul signs RAISE Act to regulate AI safety | TechCrunch
(01:20:40) Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers
(01:26:46) Monitoring Monitorability
(01:32:07) Sam Altman is hiring someone to worry about the dangers of AI | The Verge
(01:33:38) X users asking Grok to put this girl in bikini, Grok is happy obliging - India Today

#229 - Gemini 3 Flash, ChatGPT Apps, Nemotron 3

Play Episode Listen Later Dec 25, 2025 87:07

Our 229th episode with a summary and discussion of last week's big AI news!
Recorded on 12/19/2025
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* Notable releases include OpenAI's GPT-5.2 Codex for advanced coding and Google's Gemini 3 Flash for competitive AI application performance. Nvidia's new open-source Nemotron 3 models also showcase impressive benchmarks.
* Funding updates highlight Lovable's $330M Series B, valuing the AI coding startup at $6.6B, and Fal's $140M Series D for AI model hosting, valued at $4.5B.
* China makes significant strides in semiconductor technology with advances in EUV lithography machines, led by Huawei and SMIC, potentially disrupting global chip manufacturing dominance.
* Key safety and policy updates include OpenAI's GPT-5.2 system card focusing on biosecurity and cybersecurity risks, while Google partners with the US military to power a new AI platform with Gemini models.

Timestamps:
(00:00:10) Intro / Banter
(00:02:09) News Preview
Tools & Apps
(00:02:56) Google launches Gemini 3 Flash, makes it the default model in the Gemini app | TechCrunch
(00:10:13) ChatGPT launches an app store, lets developers know it's open for business | TechCrunch
(00:13:35) Introducing GPT-5.2-Codex | OpenAI
(00:19:23) Story about OpenAI release - GPT image 1.5
(00:22:27) Meta partners with ElevenLabs to power AI audio across Instagram, Horizon - The Economic Times
Applications & Business
(00:23:16) OpenAI to End Equity Vesting Period for Employees, WSJ Says
(00:28:20) How China built its ‘Manhattan Project' to rival the West in AI chips
(00:36:47) China's Huawei, SMIC Make Progress With Chips, Report Finds
(00:41:03) OpenAI in Talks to Raise At Least $10 Billion From Amazon and Use Its AI Chips
(00:43:32) Amazon has a new leader for its ‘AGI' group as it plays catch-up on AI | The Verge
(00:47:27) Broadcom reveals its mystery $10 billion customer is Anthropic
(00:49:12) Vibe-coding startup Lovable raises $330M at a $6.6B valuation | TechCrunch
(00:50:38) Fal nabs $140M in fresh funding led by Sequoia, tripling valuation to $4.5B | TechCrunch
Projects & Open Source
(00:51:10) Nvidia Becomes a Major Model Maker With Nemotron 3 | WIRED
(00:59:24) Meta introduces new SAM AI able to isolate and edit audio • The Register
(00:59:54) [2512.14856] T5Gemma 2: Seeing, Reading, and Understanding Longer
(01:03:10) Anthropic makes agent Skills an open standard - SiliconANGLE
Research & Advancements
(01:03:47) Budget-Aware Tool-Use Enables Effective Agent Scaling
(01:08:21) Rethinking Thinking Tokens: LLMs as Improvement Operators
(01:10:50) What if AI capabilities suddenly accelerated in 2027? How would the world know?
Policy & Safety
(01:12:58) Update to GPT-5 System Card: GPT-5.2
(01:18:04) Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors
(01:20:47) Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
(01:24:37) Google is powering a new US military AI platform | The Verge

#228 - GPT 5.2, Scaling Agents, Weird Generalization

Play Episode Listen Later Dec 17, 2025 86:42

Our 228th episode with a summary and discussion of last week's big AI news!
Recorded on 12/12/2025
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* OpenAI's latest model GPT-5.2 demonstrates improved performance and enhanced multi-modal capabilities but comes with increased costs and a different knowledge cutoff date.
* Disney invests $1 billion in OpenAI to generate Disney character content, creating unique licensing agreements across characters from Marvel, Pixar, and Star Wars franchises.
* The U.S. government imposes new AI chip export rules involving security reviews, while simultaneously moving to prevent states from independently regulating AI.
* DeepMind releases a paper outlining the challenges and findings in scaling multi-agent systems, highlighting the complexities of tool coordination and task performance.

Timestamps:
(00:00:00) Intro / Banter
(00:01:19) News Preview
Tools & Apps
(00:01:58) GPT-5.2 is OpenAI's latest move in the agentic AI battle | The Verge
(00:08:48) Runway releases its first world model, adds native audio to latest video model | TechCrunch
(00:11:51) Google says it will link to more sources in AI Mode | The Verge
(00:12:24) ChatGPT can now use Adobe apps to edit your photos and PDFs for free | The Verge
(00:13:05) Tencent releases Hunyuan 2.0 with 406B parameters
Applications & Business
(00:16:15) China set to limit access to Nvidia's H200 chips despite Trump export approval
(00:21:02) Disney investing $1 billion in OpenAI, will allow characters on Sora
(00:24:48) Unconventional AI confirms its massive $475M seed round
(00:29:06) Slack CEO Denise Dresser to join OpenAI as chief revenue officer | TechCrunch
(00:31:18) The state of enterprise AI
Projects & Open Source
(00:33:49) [2512.10791] The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality
(00:36:27) Claude 4.5 Opus' Soul Document
Research & Advancements
(00:43:49) [2512.08296] Towards a Science of Scaling Agent Systems
(00:48:43) Evaluating Gemini Robotics Policies in a Veo World Simulator
(00:52:10) Guided Self-Evolving LLMs with Minimal Human Supervision
(00:56:08) Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
(01:00:39) [2512.07783] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
(01:04:42) Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
(01:09:42) Google's AI unit DeepMind announces UK 'automated research lab'
Policy & Safety
(01:10:28) Trump Moves to Stop States From Regulating AI With a New Executive Order - The New York Times
(01:13:54) [2512.09742] Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
(01:17:57) Forecasting AI Time Horizon Under Compute Slowdowns
(01:20:46) AI Security Institute focuses on AI measurements and evaluations
(01:21:16) Nvidia AI Chips to Undergo Unusual U.S. Security Review Before Export to China
(01:22:01) U.S. Authorities Shut Down Major China-Linked AI Tech Smuggling Network
Synthetic Media & Art
(01:24:01) RSL 1.0 has arrived, allowing publishers to ask AI companies pay to scrape content | The Verge

#227 - Jeremie is back! DeepSeek 3.2, TPUs, Nested Learning

Play Episode Listen Later Dec 9, 2025 94:40

Our 227th episode with a summary and discussion of last week's big AI news!
Recorded on 12/05/2025
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* DeepSeek 3.2 and Flux 2 are released, showcasing advancements in open-source AI models for natural language processing and image generation respectively.
* Amazon's new AI chips and Google's TPUs signal potential shifts in AI hardware dominance, with growing competition against Nvidia.
* Anthropic's potential IPO and OpenAI's declared ‘Code Red' indicate significant moves in the AI business landscape, including high venture funding rounds for startups.
* Key research papers from DeepMind and Google explore advanced memory architectures and multi-agent systems, indicating ongoing efforts to enhance AI reasoning and efficiency.

Timestamps:
(00:00:10) Intro / Banter
(00:02:42) News Preview
Tools & Apps
(00:03:30) Deepseek 3.2 : New AI Model is Faster, Cheaper and Smarter
(00:23:22) Black Forest Labs launches Flux.2 AI image models to challenge Nano Banana Pro and Midjourney
(00:28:00) Sora and Nano Banana Pro throttled amid soaring demand | The Verge
(00:29:34) Mistral closes in on Big AI rivals with new open-weight frontier and small models | TechCrunch
(00:31:41) Kling's Video O1 launches as the first all-in-one video model for generation and editing
(00:34:07) Runway rolls out Gen 4.5 AI video model that beats Google, OpenAI
Applications & Business
(00:35:18) NVIDIA's Partners Are Beginning to Tilt Toward Google's TPU Ecosystem, with Foxconn Reportedly Securing TPU Rack Orders
(00:40:37) Amazon releases an impressive new AI chip and teases an Nvidia-friendly roadmap | TechCrunch
(00:43:03) OpenAI declares ‘code red' as Google catches up in AI race | The Verge
(00:46:20) Anthropic reportedly preparing for massive IPO in race with OpenAI: FT
(00:48:41) Black Forest Labs raises $300M at $3.25B valuation | TechCrunch
(00:49:20) Paris-based AI voice startup Gradium nabs $70M seed | TechCrunch
(00:50:10) OpenAI announced a 1 GW Stargate cluster in Abu Dhabi
(00:53:22) OpenAI's investment into Thrive Holdings is its latest circular deal
(00:55:11) OpenAI to acquire Neptune, an AI model training assistance startup
(00:56:11) Anthropic acquires developer tool startup Bun to scale AI coding
(00:56:55) Microsoft drops AI sales targets in half after salespeople miss their quotas - Ars Technica
Projects & Open Source
(00:57:51) [2511.22570] DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
(01:01:52) Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
Research & Advancements
(01:05:44) Nested Learning: The Illusion of Deep Learning Architecture
(01:13:30) Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
(01:15:50) State of AI: An Empirical 100 Trillion Token Study with OpenRouter
Policy & Safety
(01:21:52) Trump signs executive order launching Genesis Mission AI project
(01:24:42) OpenAI has trained its LLM to confess to bad behavior | MIT Technology Review
(01:29:34) US senators seek to block Nvidia sales of advanced chips to China

#226 - Gemini 3, Claude Opus 4.5, Nano Banana Pro, LeJEPA

Play Episode Listen Later Nov 30, 2025 71:11

Our 226th episode with a summary and discussion of last week's big AI news!
Recorded on 11/24/2025
Hosted by Andrey Kurenkov and co-hosted by Michelle Lee
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* New AI model releases include Google's Gemini 3 Pro, Anthropic's Opus 4.5, and OpenAI's GPT-5.1, each showcasing significant advancements in AI capabilities and applications.
* Robotics innovations feature Sunday Robotics' new robot Memo and a $600M funding round for Physical Intelligence, highlighting growth and investment in the robotics sector.
* AI safety and policy updates include Europe's proposed changes to GDPR and AI Act regulations, and reports of AI-assisted cyber espionage by a Chinese state-sponsored group.
* AI-generated content and legal highlights involve a settlement between Warner Music Group and AI music platform Udio, reflecting evolving dynamics in the field of synthetic media.

Timestamps:
(00:00:10) Intro / Banter
(00:01:32) News Preview
(00:02:10) Response to listener comments
Tools & Apps
(00:02:34) Google launches Gemini 3 with new coding app and record benchmark scores | TechCrunch
(00:05:49) Google launches Nano Banana Pro powered by Gemini 3
(00:10:55) Anthropic releases Opus 4.5 with new Chrome and Excel integrations | TechCrunch
(00:15:34) OpenAI releases GPT-5.1-Codex-Max to handle engineering tasks that span twenty-four hours
(00:18:26) ChatGPT launches group chats globally | TechCrunch
(00:20:33) Grok Claims Elon Musk Is More Athletic Than LeBron James — and the World's Greatest Lover
Applications & Business
(00:24:03) What AI bubble? Nvidia's strong earnings signal there's more room to grow
(00:26:26) Alphabet stock surges on Gemini 3 AI model optimism
(00:28:09) Sunday Robotics emerges from stealth with launch of ‘Memo' humanoid house chores robot
(00:32:30) Robotics Startup Physical Intelligence Valued at $5.6 Billion in New Funding - Bloomberg
(00:34:22) Waymo permitted areas expanded by California DMV - CBS Los Angeles - Waymo enters 3 more cities: Minneapolis, New Orleans, and Tampa | TechCrunch
Projects & Open Source
(00:37:00) Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos - MarkTechPost
(00:40:18) [2511.16624] SAM 3D: 3Dfy Anything in Images
(00:42:51) [2511.13998] LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering
Research & Advancements
(00:45:10) [2511.08544] LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
(00:50:08) [2511.13720] Back to Basics: Let Denoising Generative Models Denoise
Policy & Safety
(00:52:08) Europe is scaling back its landmark privacy and AI laws | The Verge
(00:54:13) From shortcuts to sabotage: natural emergent misalignment from reward hacking
(00:58:24) [2511.15304] Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
(01:01:43) Disrupting the first reported AI-orchestrated cyber espionage campaign
(01:04:36) OpenAI Locks Down San Francisco Offices Following Alleged Threat From Activist | WIRED
Synthetic Media & Art
(01:07:02) Warner Music Group Settles AI Lawsuit With Udio

#225 - GPT 5.1, Kimi K2 Thinking, Remote Labor Index

Play Episode Listen Later Nov 21, 2025 78:14

Our 225th episode with a summary and discussion of last week's big AI news!
Recorded on 11/16/2025
Hosted by Andrey Kurenkov and co-hosted by Michelle Lee
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* New AI model releases include GPT-5.1 from OpenAI and Ernie 5.0 from Baidu, each with updated features and capabilities.
* Self-driving technology advancements from Baidu's Apollo Go and Pony AI's IPO highlight significant progress in the automotive sector.
* Startup funding updates include Inception raising $50M for diffusion models, while Cursor and Gamma secure significant valuations for coding and presentation tools respectively.
* AI-generated content is gaining traction with songs topping charts and new marketplaces for AI-generated voices, indicating evolving trends in synthetic media.

Timestamps:
(00:01:19) News Preview
Tools & Apps
(00:02:13) OpenAI says the brand-new GPT-5.1 is ‘warmer' and has more ‘personality' options | The Verge
(00:04:51) Baidu Unveils ERNIE 5.0 and a Series of AI Applications at Baidu World 2025, Ramps Up Global Push
(00:07:00) ByteDance's Volcano Engine debuts coding agent at $1.3 promo price
(00:08:04) Google will let users call stores, browse products, and check out using AI | The Verge
(00:10:41) Fei-Fei Li's World Labs speeds up the world model race with Marble, its first commercial product | TechCrunch
(00:13:30) OpenAI says it's fixed ChatGPT's em dash problem | TechCrunch
Applications & Business
(00:16:01) Anthropic announces $50 billion data center plan | TechCrunch
(00:18:06) Baidu teases next-gen AI training, inference accelerators • The Register
(00:20:50) Meta chief AI scientist Yann LeCun plans to exit and launch own start-up
(00:24:41) Amazon Demands Perplexity Stop AI Tool From Making Purchases - Bloomberg
(00:27:32) AI PowerPoint-killer Gamma hits $2.1B valuation, $100M ARR, founder says | TechCrunch
(00:29:33) Inception raises $50 million to build diffusion models for code and text | TechCrunch
(00:31:14) Coding assistant Cursor raises $2.3B 5 months after its previous round | TechCrunch
(00:33:56) China's Baidu says it's running 250,000 robotaxi rides a week — same as Alphabet's Waymo
(00:35:26) Driverless Tech Firm Pony AI Raises $863 Million in HK Listing
Projects & Open Source
(00:36:30) Moonshot's Kimi K2 Thinking emerges as leading open source AI
Research & Advancements
(00:39:22) [2510.26787] Remote Labor Index: Measuring AI Automation of Remote Work
(00:45:21) OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits - MarkTechPost
(00:49:34) Kimi Linear: An Expressive, Efficient Attention Architecture
(00:53:33) Watch Google DeepMind's new AI agent learn to play video games | The Verge
(00:57:34) arXiv Changes Rules After Getting Spammed With AI-Generated 'Research' Papers
Policy & Safety
(00:59:35) Stability AI largely wins UK court battle against Getty Images over copyright and trademark | AP News
(01:01:48) Court rules that OpenAI violated German copyright law; orders it to pay damages | TechCrunch
(01:03:48) Microsoft's $15.2B UAE investment turns Gulf State into test case for US AI diplomacy | TechCrunch
Synthetic Media & Art
(01:06:39) An AI-Generated Country Song Is Topping A Billboard Chart, And That Should Infuriate Us All | Whiskey Riff
(01:10:59) Xania Monet is the first AI-powered artist to debut on a Billboard airplay chart, but she likely won't be the last | CNN
(01:13:34) ElevenLabs' new AI marketplace lets brands use famous voices for ads | The Verge

#224 - OpenAI is for-profit! Cursor 2, Minimax M2, Udio copyright

Play Episode Listen Later Nov 5, 2025 91:43

Our 224th episode with a summary and discussion of last week's big AI news!
Recorded on 10/31/2025
Hosted by Andrey Kurenkov and co-hosted by Gavin Purcell (check out AI For Humans and AndThen!)
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* OpenAI completes its for-profit restructuring, redefining its relationship with Microsoft and securing future investments. Meanwhile, Qualcomm and other tech giants announce new AI chips aimed at competing with Nvidia and AMD, marking major advancements in AI hardware capabilities. Amazon and Google deepen their partnerships with Anthropic, providing extensive computing resources to enhance AI research and applications. These developments signal significant growth and competition in the AI industry.
* Major AI tools and models were released and updated, including Cursor 2.0, Claude coding capabilities, and open-source options from MiniMax. These new tools offer a range of functionalities for coding, design, and more.
* Legal battles around AI copyright issues persist, as OpenAI faces ongoing lawsuits from authors over text generation using copyrighted material. Universal Music Group settles a copyright suit with AI music startup Udio, transitioning to a licensed model for AI-generated music. This shift reflects broader challenges and adaptations in the AI-generated content space, where copyright and ethical usage remain highly contentious issues.

Timestamps:
(00:00:10) Intro / Banter
(00:02:44) News Preview
Tools & Apps
(00:03:44) Cursor 2.0 shifts to in-house AI with Composer model and parallel agents
(00:07:44) Anthropic brings Claude Code to the web | TechCrunch
(00:10:01) Microsoft's Mico is a 'Clippy' for the AI era | TechCrunch
(00:14:20) Anthropic's Claude catches up to ChatGPT and Gemini with upgraded memory features | The Verge
(00:18:46) Canva launches its own design model, adds new AI features to the platform | TechCrunch
(00:21:07) Elon Musk's Grokipedia launches with AI-cloned pages from Wikipedia | The Verge
Applications & Business
(00:25:10) OpenAI completed its for-profit restructuring — and struck a new deal with Microsoft | The Verge
(00:31:25) Qualcomm announces AI chips to compete with AMD and Nvidia
(00:34:02) Amazon launches AI infrastructure project, to power Anthropic's Claude model | Reuters
(00:38:52) Google and Anthropic announce cloud deal worth tens of billions
(00:39:46) Google partners with Ambani's Reliance to offer free AI Pro access to millions of Jio users in India | TechCrunch
Projects & Open Source
(00:41:17) MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster - MarkTechPost
(00:45:22) [2510.25741] Scaling Latent Reasoning via Looped Language Models
(00:47:59) OpenAI's gpt-oss-safeguard enables developers to build safer AI - Help Net Security
Research & Advancements
(00:49:51) [2510.15103] Continual Learning via Sparse Memory Finetuning
(00:54:01) [2510.18091] Accelerating Vision Transformers with Adaptive Patch Sizes
(00:57:46) [2510.18871] How Do LLMs Use Their Depth?
Policy & Safety
(01:01:07) AMD, Department of Energy announce $1 billion AI supercomputer partnership | The Verge
(01:03:03)
Synthetic Media & Art
(01:09:34) Universal partners with AI startup Udio after settling copyright suit | The Verge
(01:16:04) OpenAI loses bid to dismiss part of US authors' copyright lawsuit | Reuters

#223 - Haiku 4.5, OpenAI DevDay, Claude Skills, Scaling RL, SB 243

Play Episode Listen Later Oct 24, 2025 71:45

Our 223rd episode with a summary and discussion of last week's big AI news! Recorded on 10/17/2025 Hosted by Andrey Kurenkov and co-hosted by Erik Schnultz Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: Anthropic and OpenAI have announced updates to their AI models and tools, including Haiku 4.5 and various business collaborations. Multiple companies like Slack and Salesforce are integrating AI assistants and agents into their platforms, enhancing task management and business operations. Recent research in reinforcement learning and agent memory curation highlights new methods for improving AI model performance and context management. California has passed a law to regulate AI chatbots for children and vulnerable users, and there are rising concerns over the increasing amount of AI-generated content on the internet. Timestamps: (00:00:10) Intro / Banter (00:01:31) News Preview Tools & Apps (00:02:18) Anthropic launches new version of scaled-down ‘Haiku' model (00:04:52) Everything OpenAI announced at DevDay 2025: Agent Kit, Apps SDK, ChatGPT, and more | ZDNET (00:09:11) Anthropic turns to ‘skills' to make Claude more useful at work | The Verge (00:13:20) Microsoft launches ‘vibe working' in Excel and Word | The Verge (00:17:22) Google releases Veo 3.1, adds it to Flow video editor | TechCrunch (00:19:40) Slack is turning Slackbot into an AI assistant | The Verge (00:22:52) Salesforce announces Agentforce 360 as enterprise AI competition heats up | TechCrunch Applications & Business (00:24:58) Broadcom stock pops 9% on OpenAI custom chip deal, adding to Nvidia and AMD agreements (00:27:58) How ByteDance Made China's Most Popular AI Chatbot | WIRED (00:30:08) Amazon's Zoox Robotaxis Have Arrived In Las Vegas - Here's What Riders Are Experiencing (00:32:43) Waymo's robotaxis are coming to London | The Verge (00:34:14) Reflection AI raises 
$2B to be America's open frontier AI lab, challenging DeepSeek | TechCrunch (00:35:58) General Intuition lands $134M seed to teach agents spatial reasoning using video game clips | TechCrunch (00:38:36) Supabase nabs $5B valuation, four months after hitting $2B | TechCrunch Projects & Open Source (00:40:58) Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - MarkTechPost (00:43:06) Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios - MarkTechPost Research & Advancements (00:44:25) [2510.13786] The Art of Scaling Reinforcement Learning Compute for LLMs (00:48:51) [2510.01171] Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity (00:51:22) [2510.12635] Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks (00:54:31) [2510.07364] Base Models Know How to Reason, Thinking Models Learn When (00:57:24) [2510.12402] Cautious Weight Decay Policy & Safety (01:02:03) California becomes first state to regulate AI companion chatbots | TechCrunch (01:04:13) Over 50 Percent of the Internet Is Now AI Slop, New Data Finds Synthetic Media & Art (01:06:31) OpenAI Reverses Stance on Use of Copyright Works in Sora - WSJ (01:08:29) Character.AI removes Disney characters from platform after studio issues warning

#222 - Sora 2, Sonnet 4.5, Vibes, Thinking Machines

Play Episode Listen Later Oct 7, 2025 97:16

Our 222nd episode with a summary and discussion of last week's big AI news! Recorded on 10/03/2025 Hosted by Andrey Kurenkov and co-hosted by Jon Krohn Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: (00:00:10) Intro / Banter (00:03:08) News Preview (00:03:56) Response to listener comments Tools & Apps (00:04:51) ChatGPT parent company OpenAI announces Sora 2 with AI video app (00:11:35) Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy | The Verge (00:22:25) Meta launches 'Vibes,' a short-form video feed of AI slop | TechCrunch (00:26:42) OpenAI launches ChatGPT Pulse to proactively write you morning briefs | TechCrunch (00:33:44) OpenAI rolls out safety routing system, parental controls on ChatGPT | TechCrunch (00:35:53) The Latest Gemini 2.5 Flash-Lite Preview is Now the Fastest Proprietary Model (External Tests) and 50% Fewer Output Tokens - MarkTechPost (00:39:54) Microsoft just added AI agents to Word, Excel, and PowerPoint - how to use them | ZDNET Applications & Business (00:42:41) OpenAI takes on Google, Amazon with new agentic shopping system | TechCrunch (00:46:01) Exclusive: Mira Murati's Stealth AI Lab Launches Its First Product | WIRED (00:49:54) OpenAI is the world's most valuable private company after private stock sale | TechCrunch (00:53:07) Elon Musk's xAI accuses OpenAI of stealing trade secrets in new lawsuit | Technology | The Guardian (00:55:40) Former OpenAI and DeepMind researchers raise whopping $300M seed to automate science | TechCrunch Projects & Open Source (00:58:26) [2509.16941] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Research & Advancements (01:01:28) [2509.17196] Evolution of Concepts in Language Model Pre-Training (01:05:36) [2509.19284] What Characterizes Effective Reasoning? 
Revisiting Length, Review, and Structure of CoT Lightning round (01:09:37) [2507.02954] Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III (01:12:03) [2509.24552] Short window attention enables long-term memorization Policy & Safety (01:18:11) SB 53, the landmark AI transparency bill, is now law in California | The Verge (01:24:07) Elon Musk's xAI offers Grok to federal government for 42 cents | TechCrunch (01:25:23) Character.AI removes Disney characters from platform after studio issues warning (01:28:50) Spotify's Attempt to Fight AI Slop Falls on Its Face

#221 - OpenAI Codex, Gemini in Chrome, K2-Think, SB 53

Play Episode Listen Later Oct 7, 2025 47:01

Our 221st episode with a summary and discussion of last week's big AI news! Recorded on 09/19/2025 Note: we transitioned to a new RSS feed and it seems this episode did not make it there, so it may be posted about 2 weeks past the release date. Hosted by Andrey Kurenkov and co-hosted by Michelle Lee Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: OpenAI releases a new version of Codex integrated with GPT-5, enhancing coding capabilities and aiming to compete with other AI coding tools like Claude Code. Significant updates in the robotics sector include new ventures in humanoid robots from companies like Figure AI and China's Unitree, as well as expansions in robotaxi services from Tesla and Amazon's Zoox. New open-source models and research advancements were discussed, including Google DeepMind's self-improving foundation model for robotics and a physics foundation model aimed at generalizing across various physical systems. Legal battles continue to surface in the AI landscape with Warner Bros. suing Midjourney for copyright violations and Rolling Stone suing Google over AI-generated content summaries, highlighting challenges in AI governance and ethics. Timestamps: (00:00:10) Intro / Banter Tools & Apps (00:02:33) OpenAI upgrades Codex with a new version of GPT-5 (00:04:02) Google Injects Gemini Into Chrome as AI Browsers Go Mainstream | WIRED (00:06:14) Anthropic's Claude can now make you a spreadsheet or slide deck. 
| The Verge (00:07:12) Luma AI's New Ray3 Video Generator Can 'Think' Before Creating - CNET Applications & Business (00:08:32) OpenAI secures Microsoft's blessing to transition its for-profit arm | TechCrunch (00:10:31) Microsoft to lessen reliance on OpenAI by buying AI from rival Anthropic | TechCrunch (00:12:00) Figure AI passes $1B with Series C funding toward humanoid robot development - The Robot Report (00:13:52) China's Unitree plans $7 billion IPO valuation as humanoid robot race heats up (00:15:45) Tesla's robotaxi plans for Nevada move forward with testing permit | TechCrunch (00:17:48) Amazon's Zoox jumps into U.S. robotaxi race with Las Vegas launch (00:19:27) Replit hits $3B valuation on $150M annualized revenue | TechCrunch (00:21:14) Perplexity reportedly raised $200M at $20B valuation | TechCrunch Projects & Open Source (00:22:08) [2509.07604] K2-Think: A Parameter-Efficient Reasoning System (00:24:31) [2509.09614] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering Research & Advancements (00:28:17) [2509.15155] Self-Improving Embodied Foundation Models (00:31:47) [2509.13805] Towards a Physics Foundation Model (00:34:26) [2509.12129] Embodied Navigation Foundation Model Policy & Safety (00:37:49) Anthropic endorses California's AI safety bill, SB 53 | TechCrunch (00:40:12) Warner Bros. Sues Midjourney, Joins Studios' AI Copyright Battle (00:42:02) Rolling Stone Publisher Sues Google Over AI Overview Summaries

#221 - OpenAI Codex, Gemini in Chrome, K2-Think, SB 53

Play Episode Listen Later Sep 23, 2025 47:01 Transcription Available

Our 221st episode with a summary and discussion of last week's big AI news! Recorded on 09/19/2025 Hosted by Andrey Kurenkov and co-hosted by Michelle Lee Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: OpenAI releases a new version of Codex integrated with GPT-5, enhancing coding capabilities and aiming to compete with other AI coding tools like Claude Code. Significant updates in the robotics sector include new ventures in humanoid robots from companies like Figure AI and China's Unitree, as well as expansions in robotaxi services from Tesla and Amazon's Zoox. New open-source models and research advancements were discussed, including Google DeepMind's self-improving foundation model for robotics and a physics foundation model aimed at generalizing across various physical systems. Legal battles continue to surface in the AI landscape with Warner Bros. suing Midjourney for copyright violations and Rolling Stone suing Google over AI-generated content summaries, highlighting challenges in AI governance and ethics. Timestamps: (00:00:10) Intro / Banter Tools & Apps (00:02:33) OpenAI upgrades Codex with a new version of GPT-5 (00:04:02) Google Injects Gemini Into Chrome as AI Browsers Go Mainstream | WIRED (00:06:14) Anthropic's Claude can now make you a spreadsheet or slide deck. 
| The Verge (00:07:12) Luma AI's New Ray3 Video Generator Can 'Think' Before Creating - CNET Applications & Business (00:08:32) OpenAI secures Microsoft's blessing to transition its for-profit arm | TechCrunch (00:10:31) Microsoft to lessen reliance on OpenAI by buying AI from rival Anthropic | TechCrunch (00:12:00) Figure AI passes $1B with Series C funding toward humanoid robot development - The Robot Report (00:13:52) China's Unitree plans $7 billion IPO valuation as humanoid robot race heats up (00:15:45) Tesla's robotaxi plans for Nevada move forward with testing permit | TechCrunch (00:17:48) Amazon's Zoox jumps into U.S. robotaxi race with Las Vegas launch (00:19:27) Replit hits $3B valuation on $150M annualized revenue | TechCrunch (00:21:14) Perplexity reportedly raised $200M at $20B valuation | TechCrunch Projects & Open Source (00:22:08) [2509.07604] K2-Think: A Parameter-Efficient Reasoning System (00:24:31) [2509.09614] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering Research & Advancements (00:28:17) [2509.15155] Self-Improving Embodied Foundation Models (00:31:47) [2509.13805] Towards a Physics Foundation Model (00:34:26) [2509.12129] Embodied Navigation Foundation Model Policy & Safety (00:37:49) Anthropic endorses California's AI safety bill, SB 53 | TechCrunch (00:40:12) Warner Bros. Sues Midjourney, Joins Studios' AI Copyright Battle (00:42:02) Rolling Stone Publisher Sues Google Over AI Overview Summaries

#220 - Gemini 2.5 Flash Image, Claude for Chrome, DeepConf

Play Episode Listen Later Sep 1, 2025 52:43 Transcription Available

Our 220th episode with a summary and discussion of last week's big AI news! Recorded on 08/30/2025 Check out Andrey's work over at Astrocade , sign up to be an ambassador here Hosted by Andrey Kurenkov and co-hosted by Daniel Bashir Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: Google's newly released Gemini 2.5 image editing model showcases remarkable advancements, enabling highly accurate modifications of subjects while retaining their original features. Anthropic expands Claude with an AI browser agent for Chrome and adds features to remember past conversations, enhancing the user experience and personalization. NVIDIA and AMD to share revenue from AI chip sales to China with US government, marking a notable shift in export control policies and trade practices. AI companion apps are experiencing substantial growth, with projected revenues expected to reach $120 million by 2025, raising questions about social implications and user engagement. 
Timestamps + Links: Tools & Apps (00:02:12) Google Gemini's AI image model gets a 'bananas' upgrade | TechCrunch (00:05:32) Anthropic launches a Claude AI agent that lives in Chrome | TechCrunch (00:08:30) Anthropic's Claude chatbot can now remember your past conversations | The Verge (00:11:46) Google Launches AI ‘Guided Learning' Tool to Teach Users (00:14:55) Apple Intelligence's ChatGPT integration will use GPT-5 starting with iOS 26 | The Verge (00:15:39) OpenAI Adds New Features to Codex, Like IDE Extension and GitHub Code Reviews Applications & Business (00:16:49) Lovable projects $1B in ARR within next 12 months | TechCrunch (00:18:56) Decart hits $3.1 billion valuation on $100 million raise to power real-time interacti | Ctech (00:20:19) Cohere raises $500M to beat back generative AI rivals | TechCrunch (00:21:25) Pony AI, Nearing Full-Year Robotaxi Goal, Eyes European Markets - Bloomberg (00:22:41) Co-founder of Elon Musk's xAI departs the company | TechCrunch Projects & Open Source (00:24:39) Meta AI Just Released DINOv3: A State-of-the-Art Computer Vision Model Trained with Self-Supervised Learning, Generating High-Resolution Image Features - MarkTechPost (00:27:02) GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models (00:29:49) China's DeepSeek Releases V3.1, Boosting AI Model's Capabilities - Bloomberg (00:30:36) Open weight LLMs exhibit inconsistent performance across providers (00:32:02) Microsoft Released VibeVoice-1.5B: An Open-Source Text-to-Speech Model that can Synthesize up to 90 Minutes of Speech with Four Distinct Speakers - MarkTechPost Research & Advancements (00:33:43) Deep Think with Confidence (00:36:30) Generative AI reshapes U.S. job market, Stanford study shows Policy & Safety (00:41:42) Inside the US Government's Unpublished Report on AI Safety | WIRED (00:44:10) U.S. Government to Take Cut of Nvidia and AMD A.I. 
Chip Sales to China - The New York Times (00:45:13) Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors (00:46:56) AI companion apps on track to pull in $120M in 2025 | TechCrunch

#219 - GPT 5, Opus 4.1, OpenAI's Open Source, Astrocade

Play Episode Listen Later Aug 11, 2025 108:33 Transcription Available

Our 219th episode with a summary and discussion of last week's big AI news! Recorded on 08/08/2025 Check out Andrey's work over at Astrocade , sign up to be an ambassador here Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: OpenAI reveals GPT-5, a consolidated model combining all previous versions, marking notable improvements and introducing a new infrastructure and product update. Multiple major releases from leading AI labs, including OpenAI, Anthropic, and Google reflect the ongoing competitive landscape with significant business updates and new model capabilities. Discussions on geopolitical influences in AI development highlight China's evolving stance on AI safety and governance, contrasting with U.S. approaches and raising concerns over export bans and international cooperation. Papers from leading AI entities such as OpenAI and Anthropic delve into the complexities of AI alignment and safety, proposing new methodologies for auditing and mitigating risks in model behaviors. 
Timestamps + Links: (00:00:10) Intro / Banter (00:02:14) Plug: Astrocade rolls out AI agent-powered game creation experience so anyone can create games Tools & Apps (00:03:07) OpenAI's GPT-5 is here (00:17:02) Anthropic Releases Claude Opus 4.1 With Agentic, Coding and Reasoning Upgrades (00:21:06) Google rolls out Gemini Deep Think AI, a reasoning model that tests multiple ideas in parallel | TechCrunch (00:24:04) Grok Imagine, xAI's new AI image and video generator, lets you make NSFW content | TechCrunch Applications & Business (00:26:35) Meta, Microsoft stocks rise on strong earnings and AI spending boom (00:29:17) OpenAI to Establish Stargate Norway With 230MW Data Center - Bloomberg (00:32:12) Anthropic Revenue Pace Nears $5 Billion in Run-Up to Mega Round — The Information (00:37:18) OpenAI Hits $12 Billion Annualized Revenue (00:40:06) Noma Security raises $100 million to defend against AI agent vulnerabilities | Ctech Projects & Open Source (00:42:13) OpenAI Just Released Its First Open-Weight Models Since GPT-2 (00:53:13) Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance (00:57:39) Meta CLIP 2: A Worldwide Scaling Recipe (01:01:12) BFL and Krea release FLUX.1 Krea: Open image model designed for realism Research & Advancements (01:02:33) Google's Newest AI Model Acts like a Satellite to Track Climate Change | WIRED (01:04:50) Google's new AI model creates video game worlds in real time (01:10:55) AlphaGo Moment for Model Architecture Discovery (01:17:22) METR evaluates Grok 4 Policy & Safety (01:20:05) Estimating Worst-Case Frontier Risks of Open-Weight LLMs (01:23:14) Anthropic's AI 'Vaccine': Train It With Evil to Make It Good - Business Insider (01:27:26) Anthropic unveils 'auditing agents' to test for AI misalignment | VentureBeat (01:28:31) Optimizing The Final Output Can Obfuscate CoT (Research Note) (01:31:23) Why China isn't about to leap ahead of the West on compute (01:33:15) Inside the Summit Where China 
Pitched Its AI Agenda to the World | WIRED (01:38:47) Nvidia H20 GPUs reportedly caught up in U.S. Commerce Department's worst export license backlog in 30 years — billions of dollars worth of GPUs and other products in limbo due to staffing cuts, communication issues | Tom's Hardware (01:42:35) Response to listener comments

#218 - Github Spark, MegaScience, US AI Action Plan

Play Episode Listen Later Jul 31, 2025 92:12 Transcription Available

Our 218th episode with a summary and discussion of last week's big AI news! Recorded on 07/25/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: GitHub introduces vibe coding with Spark, letting users develop full-stack applications with natural language and visual controls. The AI coding tools Gemini CLI and Replit face significant issues after inadvertently deleting user data, highlighting the importance of careful management. The US releases America's AI Action Plan, outlining economic, technical, and policy strategies to maintain leadership in AI technology. The newly released MegaScience and SWE-Perf datasets evaluate AI reasoning and performance capabilities in diverse scientific and software engineering tasks. Timestamps + Links: (00:00:10) Intro / Banter (00:01:31) News Preview Tools & Apps (00:03:53) GitHub Introduces Vibe Coding with Spark: Revolutionizing Intelligent App Development in a Flash - MarkTechPost (00:07:05) Figma's AI app building tool is now available for everyone | The Verge (00:10:18) Two major AI coding tools wiped out user data after making cascading mistakes - Ars Technica (00:14:10) Google's AI Overviews have 2B monthly users, AI Mode 100M in the US and India | TechCrunch Applications & Business (00:18:10) Leaked Memo: Anthropic CEO Says the Company Will Pursue Gulf State Investments After All (00:24:39) Mira Murati says her startup Thinking Machines will release new product in ‘months' with ‘significant open source component' (00:27:07) Waymo responds to Tesla's dick joke with a bigger Austin robotaxi map | The Verge Projects & Open Source (00:32:05) MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning (00:43:09) TikTok Researchers Introduce SWE-Perf: The First Benchmark for Repository-Level Code 
Performance Optimization - MarkTechPost Research & Advancements (00:47:17) Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (00:55:34) Inverse Scaling in Test-Time Compute (01:02:34) Scaling Laws for Optimal Data Mixtures Policy & Safety (01:07:35) White House Unveils America's AI Action Plan (01:16:55) Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (01:20:20) Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance (01:24:00) People Are Being Involuntarily Committed, Jailed After Spiraling Into "ChatGPT Psychosis" (01:28:03) Meta refuses to sign EU's AI code of practice

#217 - ChatGPT Agent, Kimi k2, Hiring Drama

Play Episode Listen Later Jul 23, 2025 53:00 Transcription Available

Our 217th episode with a summary and discussion of last week's big AI news! Recorded on 07/17/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: **OpenAI's new ChatGPT agent**: The episode begins with a detailed discussion on OpenAI's latest ChatGPT agent, which can control entire computers and perform a wide range of tasks, showcasing powerful performance benchmarks and potential applications in business and research. **Major business moves in the AI space**: Significant shifts include Google's acquisition of Windsurf's top talent after OpenAI's deal fell through, Cognition's acquisition of Windsurf, and several notable hires by Meta from OpenAI and Apple, highlighting intense competition in the AI industry. **AI's ethical and societal impacts**: The hosts discuss serious concerns like the rise of non-consensual explicit AI-generated images, ICE's use of facial recognition for large databases, and regulations aimed at controlling AI's potential misuse. **Video game actors strike ends**: The episode concludes with news that SAG-AFTRA's year-long strike for video game voice actors has ended after reaching an agreement on AI rights and wage increases, reflecting the broader impact of AI on the job market. 
Timestamps + Links: (00:00:10) Intro / Banter (00:02:49) News Preview Tools & Apps (00:03:29) OpenAI's new ChatGPT Agent can control an entire computer and do tasks for you (00:07:11) Alibaba-backed Moonshot releases new Kimi AI model that beats ChatGPT, Claude in coding — and it costs less (00:09:36) Amazon targets vibe-coding chaos with new 'Kiro' AI software development tool – GeekWire (00:12:33) Anthropic tightens usage limits for Claude Code – without telling users (00:15:51) Mistral's Le Chat chatbot gets a productivity push with new ‘deep research' mode | TechCrunch (00:17:46) I spent 24 hours flirting with Elon Musk's AI girlfriend (00:21:32) Uber is close to completing its quest to become the ultimate robotaxi app | The Verge Applications & Business (00:24:02) OpenAI's Windsurf deal is off — and Windsurf's CEO is going to Google | The Verge (00:28:09) Cognition, maker of the AI coding agent Devin, acquires Windsurf | TechCrunch (00:28:46) Anthropic hired back two of its employees — just two weeks after they left for a competitor. | The Verge (00:28:46) Another High-Profile OpenAI Researcher Departs for Meta | WIRED (00:28:46) Meta Hires Two Key Apple (AAPL) AI Experts After Poaching Their Boss - Bloomberg (00:31:31) Mira Murati's Thinking Machines Lab is worth $12B in seed round | TechCrunch (00:33:20) Lovable becomes a unicorn with $200M Series A just 8 months after launch | TechCrunch (00:34:55) SpaceX commits $2 billion to xAI as Musk steps up AI ambitions: Report | World News - Business Standard Research & Advancements (00:35:59) A former OpenAI engineer describes what it's really like to work there | TechCrunch (00:38:23) Reasoning or Memorization? 
Unreliable Results of Reinforcement Learning Due to Data Contamination Policy & Safety (00:42:14) Anthropic, Google, OpenAI, xAI granted up to $200 million from DoD (00:43:08) California State Senator Scott Wiener Pushes Bill to Regulate AI Companies - Bloomberg (00:43:58) AI 'Nudify' Websites Are Raking in Millions of Dollars | WIRED (00:45:55) Inside ICE's Supercharged Facial Recognition App of 200 Million Images Synthetic Media & Art (00:48:47) Video game actors' strike officially ends after AI deal

#216 - Grok 4, Project Rainier, Kimi K2

Play Episode Listen Later Jul 14, 2025 102:10 Transcription Available

Our 216th episode with a summary and discussion of last week's big AI news! Recorded on 07/11/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: xAI launches Grok 4 with breakthrough performance across benchmarks, becoming the first true frontier model outside established labs, alongside a $300/month subscription tier. Grok's alignment challenges emerge with antisemitic responses, highlighting the difficulty of steering models toward "truth-seeking" without harmful biases. Perplexity and OpenAI launch AI-powered browsers to compete with Google Chrome, signaling a major shift in how users interact with AI systems. A METR study reveals AI tools actually slow down experienced developers by roughly 20% on complex tasks, contradicting expectations and anecdotal reports of productivity gains. Timestamps + Links: (00:00:10) Intro / Banter (00:01:02) News Preview Tools & Apps (00:01:59) Elon Musk's xAI launches Grok 4 alongside a $300 monthly subscription | TechCrunch (00:15:28) Elon Musk's AI chatbot is suddenly posting antisemitic tropes (00:29:52) Perplexity launches Comet, an AI-powered web browser | TechCrunch (00:32:54) OpenAI is reportedly releasing an AI browser in the coming weeks | TechCrunch (00:33:27) Replit Launches New Feature for its Agent, CEO Calls it ‘Deep Research for Coding' (00:34:40) Cursor launches a web app to manage AI coding agents (00:36:07) Cursor apologizes for unclear pricing changes that upset users | TechCrunch Applications & Business (00:39:10) Lovable on track to raise $150M at $2B valuation (00:41:11) Amazon built a massive AI supercluster for Anthropic called Project Rainier – here's what we know so far (00:46:35) Elon Musk confirms xAI is buying an overseas power plant and shipping the whole thing to the U.S. 
to power its new data center — 1 million AI GPUs and up to 2 Gigawatts of power under one roof, equivalent to powering 1.9 million homes (00:48:16) Microsoft's own AI chip delayed six months in major setback — in-house chip now reportedly expected in 2026, but won't hold a candle to Nvidia Blackwell (00:49:54) Ilya Sutskever becomes CEO of Safe Superintelligence after Meta poached Daniel Gross (00:52:46) OpenAI's Stock Compensation Reflect Steep Costs of Talent Wars Projects & Open Source (00:58:04) Hugging Face Releases SmolLM3: A 3B Long-Context, Multilingual Reasoning Model - MarkTechPost (00:58:33) Kimi K2: Open Agentic Intelligence (00:58:59) Kyutai Releases 2B Parameter Streaming Text-to-Speech TTS with 220ms Latency and 2.5M Hours of Training Research & Advancements (01:02:14) Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (01:07:58) Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (01:13:03) Mitigating Goal Misgeneralization with Minimax Regret (01:17:01) Correlated Errors in Large Language Models (01:20:31) What skills does SWE-bench Verified evaluate? Policy & Safety (01:22:53) Evaluating Frontier Models for Stealth and Situational Awareness (01:25:49) When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors (01:30:09) Why Do Some Language Models Fake Alignment While Others Don't? (01:34:35) 'Positive review only': Researchers hide AI prompts in papers (01:35:40) Google faces EU antitrust complaint over AI Overviews (01:36:41) 'The transfer of user data by DeepSeek to China is unlawful': Germany calls for Google and Apple to remove the AI app from their stores (01:37:30) Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark

#215 - Runway games, Meta Superintelligence, ERNIE 4.5, Adaptive Tree Search

Play Episode Listen Later Jul 8, 2025 116:21 Transcription Available

Our 215th episode with a summary and discussion of last week's big AI news! Recorded on 07/04/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: Cloudflare's new AI data scraper blocking feature is discussed, along with its potential implications and technical challenges. Meta's aggressive recruitment for its Superintelligence Labs division is covered, highlighting key hires from OpenAI and other leaders in the field. Anthropic loses significant talent to Cursor, with details on its new Economic Futures Program focusing on AI's impact on the labor market. Notable open-source AI model releases from Baidu and Tencent are also discussed, including their performance metrics and potential applications. Timestamps + Links: (00:00:11) Intro / Banter (00:01:43) News Preview Tools & Apps (00:02:55) Cloudflare Introduces Default Blocking of A.I. Data Scrapers (00:05:44) Runway is going to let people generate video games with AI (00:11:24) Google embraces AI in the classroom with new Gemini tools for educators, chatbots for students, and more (00:16:23) No one likes meetings. They're sending their AI note takers instead. 
(00:18:08) Google launches Doppl, a new app that lets you visualize how an outfit might look on you (00:19:14) Google's Imagen 4 text-to-image model promises 'significantly improved' boring images Applications & Business (00:22:18) Mark Zuckerberg announces his AI ‘superintelligence' super-group (00:29:35) Anthropic Revenue Hits $4 Billion Annual Pace as Competition With Cursor Intensifies (00:35:10) As job losses loom, Anthropic launches program to track AI's economic fallout (00:38:04) OpenAI says it has no plan to use Google's in-house chip (00:41:08) Nvidia stakes new startup that flips script on data center power (00:44:11) TSMC Arizona Chips Are Reportedly Being Flown Back to Taiwan For Packaging; U.S. Semiconductor Supply Chain Still Remains Dependent on Taiwan Projects & Open Source (00:46:57) Baidu releases open source model family ERNIE 4.5 (00:51:55) Tencent Open Sources Hunyuan-A13B: A 13B Active Parameter MoE Model with Dual-Mode Reasoning and 256K Context (00:57:09) Together AI Releases DeepSWE: A Fully Open-Source RL-Trained Coding Agent Based on Qwen3-32B and Achieves 59% on SWEBench (01:00:11) GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning (01:04:10) DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation Research & Advancements (01:06:21) Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search (01:13:07) The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements (01:18:04) Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively (01:21:37) Performance Prediction for Large Systems via Text-to-Text Regression (01:25:38) Does Math Reasoning Improve General LLM Capabilities? 
Understanding Transferability of LLM Reasoning (01:26:33) Correlated Errors in Large Language Models Policy & Safety (01:29:04) Forecasting Biosecurity Risks from LLMs (01:36:06) AI Task Length Horizons in Offensive Cybersecurity (01:42:30) Inside Tech's Risky Gamble to Kill State AI Regulations for a Decade (01:52:56) Denmark to tackle deepfakes by giving people copyright to their own features

#214 - Gemini CLI, io drama, AlphaGenome, copyright rulings

Jul 4, 2025 93:32

Our 214th episode with a summary and discussion of last week's big AI news! Recorded on 06/27/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: Meta hires key engineers from OpenAI, and Thinking Machines Lab secures a $2 billion seed round at a $10 billion valuation. DeepMind introduces AlphaGenome, significantly advancing genomic research with a model comparable to AlphaFold but focused on gene functions. Taiwan imposes technology export controls on Huawei and SMIC, while Getty drops key copyright claims against Stability AI in a groundbreaking legal case. A new MIT research paper examines cognitive debt in AI-assisted tasks, utilizing EEG to assess cognitive load and recall in essay writing with LLMs. Timestamps + Links: (00:00:10) Intro / Banter (00:01:22) News Preview (00:02:15) Response to listener comments Tools & Apps (00:06:18) Google is bringing Gemini CLI to developers' terminals (00:12:09) Anthropic now lets you make apps right from its Claude AI chatbot Applications & Business (00:15:54) Sam Altman takes his ‘io' trademark battle public (00:21:35) Huawei Matebook Contains Kirin X90, using SMIC 7nm (N+2) Technology (00:26:05) AMD deploys its first Ultra Ethernet ready network card — Pensando Pollara provides up to 400 Gbps performance (00:31:21) Amazon joins the big nuclear party, buying 1.92 GW for AWS (00:33:20) Nvidia goes nuclear — company joins Bill Gates in backing TerraPower, a company building nuclear reactors for powering data centers (00:36:18) Mira Murati's Thinking Machines Lab closes on $2B at $10B valuation (00:41:02) Meta hires key OpenAI researcher to work on AI reasoning models Research & Advancements (00:49:46) Google's new AI will help researchers understand how our genes work (00:55:13) Direct 
Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks (01:01:54) Farseer: A Refined Scaling Law in Large Language Models (01:06:28) LLM-First Search: Self-Guided Exploration of the Solution Space Policy & Safety (01:11:20) Unsupervised Elicitation of Language Models (01:16:04) Taiwan Imposes Technology Export Controls on Huawei, SMIC (01:18:22) Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task Synthetic Media & Art (01:23:41) Judge Rejects Authors' Claim That Meta AI Training Violated Copyrights (01:29:46) Getty drops key copyright claims against Stability AI, but UK lawsuit continues

#213 - Midjourney video, Gemini 2.5 Flash-Lite, LiveCodeBench Pro

Jun 26, 2025 36:36

Our 213th episode with a summary and discussion of last week's big AI news! Recorded on 06/21/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: Midjourney launches its first AI video generation model, moving from text-to-image to video with a subscription model offering up to 21-second clips, highlighting the affordability and growing capabilities in AI video generation. Google's Gemini AI family updates include high-efficiency models for cost-effective workloads, and new enhancements in Google's search function now allow for voice interactions. Two new benchmarks, LiveCodeBench Pro and AbstentionBench, test the problem-solving and abstention capabilities of reasoning models and reveal current limitations. OpenAI wins a $200 million US defense contract to support various aspects of the Department of Defense, reflecting growing collaborations between tech companies and government for AI applications. Timestamps + Links: (00:00:10) Intro / Banter (00:01:32) News Preview Tools & Apps (00:02:12) Midjourney launches its first AI video generation model, V1 (00:05:52) Google's Gemini AI family updated with stable 2.5 Pro, super-efficient 2.5 Flash-Lite (00:07:59) Google's AI Mode can now have back-and-forth voice conversations (00:10:13) YouTube to Add Google's Veo 3 to Shorts in Move That Could Turbocharge AI on the Video Platform Applications & Business (00:11:10) The ‘OpenAI Files' will help you understand how Sam Altman's company works (00:12:29) OpenAI drops Scale AI as a data provider following Meta deal (00:13:28) Amazon's Zoox opens its first major robotaxi production facility Projects & Open Source (00:15:20) LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? 
(00:19:45) AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions (00:22:49) MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Research & Advancements (00:24:33) Scaling Laws of Motion Forecasting and Planning -- A Technical Report Policy & Safety (00:28:07) Universal Jailbreak Suffixes Are Strong Attention Hijackers (00:30:52) OpenAI found features in AI models that correspond to different ‘personas' (00:33:25) OpenAI wins $200 million U.S. defense contract

#212 - o3 pro, Cursor 1.0, ProRL, Midjourney Sued

Jun 17, 2025 106:08

Our 212th episode with a summary and discussion of last week's big AI news! Recorded on 06/13/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: OpenAI introduces o3 Pro for ChatGPT, highlighting significant improvements in performance and cost-efficiency. Anthropic sees an influx of talent from OpenAI and DeepMind, with significantly higher retention rates and competitive advantages in AI capabilities. New research indicates that reinforcing negative responses in LLMs significantly improves performance across all metrics, highlighting novel approaches in reinforcement learning. A security flaw in Microsoft Copilot demonstrates the growing risk of AI agents being hacked, emphasizing the need for robust protection against zero-click attacks. Timestamps + Links: (00:00:11) Intro / Banter (00:01:31) News Preview (00:02:46) Response to Listener Reviews Tools & Apps (00:04:48) OpenAI adds o3 Pro to ChatGPT and drops o3 price by 80 per cent, but open-source AI is delayed (00:09:10) Cursor AI editor hits 1.0 milestone, including BugBot and high-risk background agents (00:13:07) Mistral releases a pair of AI reasoning models (00:16:18) Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally (00:19:00) ByteDance's Seedance 1.0 is trading blows with Google's Veo 3 (00:22:42) Google Reveals $20 AI Pro Plan With Veo 3 Fast Video Generator For Budget Creators Applications & Business (00:25:42) OpenAI and DeepMind are losing engineers to Anthropic in a one-sided talent war (00:34:32) OpenAI slams court order to save all ChatGPT logs, including deleted chats (00:37:24) Nvidia's Biggest Chinese Rival Huawei Struggles to Win at Home (00:43:06) Huawei Expected to Break Semiconductor Barriers with Development of High-End 3nm GAA Chips; Tape-Out by 2026 
(00:45:21) TSMC's 1.4nm Process, Also Called Angstrom, Will Make Even The Most Lucrative Clients Think Twice When Placing Orders, With An Estimate Claiming That Each Wafer Will Cost $45,000 (00:47:43) Mistral AI Launches Mistral Compute To Replace Cloud Providers from US, China Projects & Open Source (00:51:26) ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Research & Advancements (00:57:27) Kinetics: Rethinking Test-Time Scaling Laws (01:05:12) The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning (01:10:45) Predicting Empirical AI Research Outcomes with Language Models (01:15:02) EXP-Bench: Can AI Conduct AI Research Experiments? Policy & Safety (01:20:07) Large Language Models Often Know When They Are Being Evaluated (01:24:56) Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence (01:31:16) Exclusive: New Microsoft Copilot flaw signals broader risk of AI agents being hacked—‘I would be terrified' (01:35:01) Claude Gov Models for U.S. National Security Customers Synthetic Media & Art (01:37:32) Disney And NBCUniversal Sue AI Company Midjourney For Copyright Infringement (01:40:39) AMC Networks is teaming up with AI company Runway

#211 - Claude Voice, Flux Kontext, wrong RL research?

Jun 3, 2025 98:06

Our 211th episode with a summary and discussion of last week's big AI news! Recorded on 05/31/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Recent AI podcast covers significant AI news: startups, new tools, applications, investments in hardware, and research advancements. Discussions include the introduction of various new tools and applications such as Flux's new image generating models and Perplexity's new spreadsheet and dashboard functionalities. A notable segment focuses on OpenAI's partnership with the UAE and discussions on potential legislation aiming to prevent states from regulating AI for a decade. Concerns around model behaviors and safety are discussed, highlighting incidents like Claude Opus 4's blackmail attempt and Palisade Research's tests showing AI models bypassing shutdown commands. 
Timestamps + Links: (00:00:10) Intro / Banter (00:01:39) News Preview (00:02:50) Response to Listener Comments Tools & Apps (00:07:10) Anthropic launches a voice mode for Claude (00:10:35) Black Forest Labs' Kontext AI models can edit pics as well as generate them (00:15:30) Perplexity's new tool can generate spreadsheets, dashboards, and more (00:18:43) xAI to pay Telegram $300M to integrate Grok into the chat app (00:22:42) Opera's new AI browser promises to write code while you sleep (00:24:17) Google Photos debuts redesigned editor with new AI tools Applications & Business (00:25:13) Top Chinese memory maker expected to abandon DDR4 manufacturing at the behest of Beijing (00:30:04) Oracle to Buy $40 Billion Worth of Nvidia Chips for First Stargate Data Center (00:31:47) UAE makes ChatGPT Plus subscription free for all residents as part of deal with OpenAI (00:35:34) NVIDIA Corporation (NVDA) to Launch Cheaper Blackwell AI Chip for China, Says Report (00:38:39) The New York Times and Amazon ink AI licensing deal Projects & Open Source (00:41:11) DeepSeek's distilled new R1 AI model can run on a single GPU (00:45:19) Google Unveils SignGemma, an AI Model That Can Translate Sign Language Into Spoken Text (00:47:08) Open-sourcing circuit tracing tools (00:49:42) Hugging Face unveils two new humanoid robots Research & Advancements (00:52:33) PANGU PRO MOE: MIXTURE OF GROUPED EXPERTS FOR EFFICIENT SPARSITY (00:58:55) DataRater: Meta-Learned Dataset Curation (01:05:05) Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims (01:10:17) Maximizing Confidence Alone Improves Reasoning (01:11:00) Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence (01:11:44) One RL to See Them All (01:15:05) Efficient Reinforcement Finetuning via Adaptive Curriculum Learning Policy & Safety (01:17:58) Trump's 'Big Beautiful Bill' could ban states from regulating AI for a decade (01:24:31) Researchers claim ChatGPT o3 bypassed shutdown in 
controlled test (01:30:10) Anthropic's new AI model turns to blackmail when engineers try to take it offline (01:31:09) Anthropic Faces Backlash As Claude 4 Opus Can Autonomously Alert Authorities (01:35:37) Claude helps users make bioweapons (01:35:49) The Claude 4 System Card is a Wild Read

#210 - Claude 4, Google I/O 2025, OpenAI+io, Gemini Diffusion

May 26, 2025 104:47

Our 210th episode with a summary and discussion of last week's big AI news! Recorded on 05/23/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Google's Gemini diffusion technology showcases significant improvements in speed and efficiency for generating text, potentially revolutionizing the auto-regressive generation paradigm. Anthropic activates AI Safety Level 3 protections for Claude Opus 4, implementing robust measures such as bug bounties, synthetic jailbreak data, and preliminary egress bandwidth controls to mitigate bio-risk threats. OpenAI responds to the California Attorney General, refuting claims by the not-for-private-gain coalition and defending their controversial restructuring plans amidst ongoing criticism. Meta delays the release of its Llama 4 Behemoth model due to training challenges, signaling difficulties in reaching frontier-level performance. Timestamps + Links: (00:00:00) Intro / Banter (00:01:43) News Preview Tools & Apps (00:02:58) Anthropic's new Claude 4 AI models can reason over many steps (00:09:58) Google Unveils A.I. 
Chatbot, Signaling a New Era for Search (00:14:04) Google rolls out Project Mariner, its web-browsing AI agent (00:16:40) Veo 3 can generate videos — and soundtracks to go along with them (00:21:26) Imagen 4 is Google's newest AI image generator (00:23:15) Google Meet is getting real-time speech translation (00:25:36) Google's new Jules AI agent will help developers fix buggy code (00:26:43) GitHub's new AI coding agent can fix bugs for you (00:28:50) Mistral's new Devstral model was designed for coding Applications & Business (00:29:53) OpenAI Unites With Jony Ive in $6.5 Billion Deal to Create A.I. Devices (00:36:10) OpenAI's planned data center in Abu Dhabi would be bigger than Monaco (00:41:18) LM Arena, the organization behind popular AI leaderboards, lands $100M (00:45:21) Nvidia CEO says next chip after H20 for China won't be from Hopper series (00:46:39) Google's Gemini AI app has 400M monthly active users (00:51:15) AI Servers: End demand intact, but rising gap between upstream build and system production (2025.5.18) Projects & Open Source (00:53:46) Meta Is Delaying the Rollout of Its Flagship AI Model Research & Advancements (00:57:53) Gemini Diffusion (01:03:07) Chain-of-Model Learning for Language Model (01:09:16) Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space (01:15:38) Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training (01:20:16) Lessons from Defending Gemini Against Indirect Prompt Injections (01:23:35) How Fast Can Algorithms Advance Capabilities? (01:30:20) Reinforcement Learning Finetunes Small Subnetworks in Large Language Models Policy & Safety (01:31:12) Exclusive: What OpenAI Told California's Attorney General (01:38:25) Activating AI Safety Level 3 Protections

#209 - OpenAI non-profit, US diffusion rules, AlphaEvolve

May 19, 2025 113:14

Our 209th episode with a summary and discussion of last week's big AI news! Recorded on 05/16/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI has decided not to transition from a nonprofit to a for-profit entity, instead opting to become a public benefit corporation influenced by legal and civic discussions. Trump administration meetings with Saudi Arabia and the UAE have opened floodgates for AI deals, leading to partnerships with companies like Nvidia and aiming to bolster AI infrastructure in the Middle East. DeepMind introduced Alpha Evolve, a new coding agent designed for scientific and algorithmic discovery, showing improvements in automated code generation and efficiency. OpenAI pledges greater transparency in AI safety by launching the Safety Evaluations Hub, a platform showcasing various safety test results for their models. 
Timestamps + Links: (00:00:00) Intro / Banter (00:01:41) News Preview (00:02:26) Response to listener comments Applications & Business (00:03:00) OpenAI says non-profit will remain in control after backlash (00:13:23) Microsoft Moves to Protect Its Turf as OpenAI Turns Into Rival (00:18:07) TSMC's 2nm Process Said to Witness ‘Unprecedented' Demand, Exceeding 3nm Due to Interest from Apple, NVIDIA, AMD, & Many Others (00:21:42) NVIDIA's Global Headquarters Will Be In Taiwan, With CEO Huang Set To Announce Site Next Week, Says Report (00:23:58) CoreWeave in Talks for $1.5 Billion Debt Deal 6 Weeks After IPO Tools & Apps (00:26:39) The Day Grok Told Everyone About ‘White Genocide' (00:32:58) Figma releases new AI-powered tools for creating sites, app prototypes, and marketing assets (00:36:12) Google's bringing Gemini to your car with Android Auto (00:38:49) Google debuts an updated Gemini 2.5 Pro AI model ahead of I/O (00:45:09) Hugging Face releases a free Operator-like agentic AI tool Projects & Open Source (00:47:42) Stability AI releases an audio-generating model that can run on smartphones (00:50:47) Freepik releases an ‘open' AI image generator trained on licensed data (00:54:22) AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale (01:01:29) BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Research & Advancements (01:05:40) DeepMind claims its newest AI tool is a whiz at math and science problems (01:12:31) Absolute Zero: Reinforced Self-play Reasoning with Zero Data (01:19:44) How far can reasoning models scale? (01:26:47) HealthBench: Evaluating Large Language Models Towards Improved Human Health Policy & Safety (01:34:10) Trump administration officially rescinds Biden's AI diffusion rules (01:37:08) Trump's Mideast Visit Opens Floodgate of AI Deals Led by Nvidia (01:44:04) Scaling Laws For Scalable Oversight (01:49:43) OpenAI pledges to publish AI safety test results more often

#208 - Claude Integrations, ChatGPT Sycophancy, Leaderboard Cheats

May 8, 2025 115:25

Our 208th episode with a summary and discussion of last week's big AI news! Recorded on 05/02/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI showcases new integration capabilities in their API, enhancing the performance of LLMs and image generators with updated functionalities and improved user interfaces. Analysis of OpenAI's preparedness framework reveals updates focusing on biological and chemical risks, cybersecurity, and AI self-improvement, while toning down the emphasis on persuasion capabilities. Anthropic's research highlights potential security vulnerabilities in AI models, demonstrating various malicious use cases such as influence operations and hacking tool creation. A detailed examination of AI competition between the US and China reveals China's impending capability to match the US in AI advancement this year, emphasizing the impact of export controls and the importance of geopolitical strategy. 
Timestamps + Links: Tools & Apps (00:02:57) Anthropic lets users connect more apps to Claude (00:08:20) OpenAI undoes its glaze-heavy ChatGPT update (00:15:16) Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost (00:19:44) Adobe adds more image generators to its growing AI family (00:24:35) OpenAI makes its upgraded image generator available to developers (00:27:01) xAI's Grok chatbot can now ‘see' the world around it Applications & Business: (00:28:41) Thinking Machines Lab CEO Has Unusual Control in Andreessen-Led Deal (00:33:36) Chip war heats up: Huawei 910C emerges as China's answer to US export bans (00:34:21) Huawei to Test New AI Chip (00:40:17) ByteDance, Alibaba and Tencent stockpile billions worth of Nvidia chips (00:43:59) Speculation mounts that Musk will raise tens of billions for AI supercomputer with 1 million GPUs: Report Projects & Open Source: (00:47:14) Alibaba unveils Qwen 3, a family of ‘hybrid' AI reasoning models (00:54:14) Intellect-2 (01:02:07) BitNet b1.58 2B4T Technical Report (01:05:33) Meta AI Introduces Perception Encoder: A Large-Scale Vision Encoder that Excels Across Several Vision Tasks for Images and Video Research & Advancements: (01:06:42) The Leaderboard Illusion (01:12:08) Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (01:18:38) Reinforcement Learning for Reasoning in Large Language Models with One Training Example (01:24:40) Sleep-time Compute: Beyond Inference Scaling at Test-time Policy & Safety: (01:28:23) Every AI Datacenter Is Vulnerable to Chinese Espionage, Report Says (01:32:27) OpenAI preparedness framework update (01:38:31) Detecting and Countering Malicious Uses of Claude: March 2025 (01:46:33) Chinese AI Will Match America's

#207 - GPT 4.1, Gemini 2.5 Flash, Ironwood, Claude Max

Apr 18, 2025 102:30

Our 207th episode with a summary and discussion of last week's big AI news! Recorded on 04/14/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI introduces GPT-4.1 with optimized coding and instruction-following capabilities, featuring variants like GPT-4.1 Mini and Nano, and a million-token context window. Concerns arise as OpenAI reduces resources for safety testing, sparking internal and external criticisms. XAI's newly launched API for Grok 3 showcases significant capabilities comparable to other leading models. Meta faces allegations of aiding China in AI development for business advantages, with potential compliances and public scrutiny looming. Timestamps + Links: Tools & Apps (00:03:13) OpenAI's new GPT-4.1 AI models focus on coding (00:08:12) ChatGPT will now remember your old conversations (00:11:16) Google's newest Gemini AI model focuses on efficiency (00:14:27) Elon Musk's AI company, xAI, launches an API for Grok 3 (00:18:35) Canva is now in the coding and spreadsheet business (00:20:31) Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark Applications & Business (00:25:46) Ironwood: The first Google TPU for the age of inference (00:34:15) Anthropic rolls out a $200-per-month Claude subscription (00:37:17) OpenAI co-founder Ilya Sutskever's Safe Superintelligence reportedly valued at $32B (00:40:20) Mira Murati's AI startup gains prominent ex-OpenAI advisers (00:42:52) Hugging Face buys a humanoid robotics startup (00:44:58) Stargate developer Crusoe could spend $3.5 billion on a Texas data center. Most of it will be tax-free. 
Projects & Open Source (00:48:14) OpenAI Open Sources BrowseComp: A New Benchmark for Measuring the Ability for AI Agents to Browse the Web Research & Advancements (00:56:09) Sample, Don't Search: Rethinking Test-Time Alignment for Language Models (01:03:32) Concise Reasoning via Reinforcement Learning (01:09:37) Going beyond open data – increasing transparency and trust in language models with OLMoTrace (01:15:34) Independent evaluations of Grok-3 and Grok-3 mini on our suite of benchmarks Policy & Safety (01:17:58) OpenAI countersues Elon Musk, calls for enjoinment from ‘further unlawful and unfair action' (01:24:33) OpenAI slashes AI model safety testing time (01:27:55) Ex-OpenAI staffers file amicus brief opposing the company's for-profit transition (01:32:25) Access to future AI models in OpenAI's API may require a verified ID (01:34:53) Meta whistleblower claims tech giant built $18 billion business by aiding China in AI race and undermining U.S. national security

#206 - Llama 4, Nova Act, xAI buys X, PaperBench

Apr 9, 2025 73:44

Our 206th episode with a summary and discussion of last week's big AI news! Recorded on 04/07/2025 Try out the Astrocade demo here! https://www.astrocade.com/ Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Meta releases Llama 4, a series of advanced large language models, sparking debate on performance and release timing, with models featuring up to 2 trillion parameters for different configurations and applications. Amazon's AGI Lab debuts Nova Act, an AI agent for web browser control, boasting competitive benchmarking against OpenAI's and Anthropic's best agents. OpenAI's image generation capabilities and ongoing financing developments, notably a $40 billion funding round led by SoftBank, highlight significant advancements and strategic shifts in the tech giant's operations. 
Timestamps + Links: (00:00:00) Intro / Banter Tools & Apps (00:01:46) Meta releases Llama 4, a new crop of flagship AI models (00:13:55) Amazon unveils Nova Act, an AI agent that can control a web browser (00:17:06) Alibaba Preparing for Flagship AI Model Release as Soon as April (00:17:59) Runway releases an impressive new video-generating AI model (00:19:10) Adobe launches Premiere Pro's generative AI video extender (00:20:54) OpenAI prepares reasoning slider and memory update for ChatGPT users Applications & Business (00:21:28) Nvidia H20 Chips: $16 Billion Orders from ByteDance, Alibaba, and Tencent (00:24:45) Elon Musk sells X for $33 billion to his own AI startup company xAI (00:28:00) SoftBank dethroned Microsoft as OpenAI's largest investor, pushing the ChatGPT maker's market cap to $300 billion — but reportedly buried itself in debt (00:30:48) DeepMind is holding back release of AI research to give Google an edge (00:34:06) SMIC Is Rumored To Complete 5nm Chip Development By 2025; Costs Could Be Up To 50 Percent Higher Than TSMC's Version Due To The Use Of Older-Generation Equipment (00:36:04) Google-backed Isomorphic Labs raises $600m to advance AI drug discovery Research & Advancements (00:38:03) PaperBench: Evaluating AI's Ability to Replicate AI Research (00:43:50) Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains (00:48:39) Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead (00:54:34) Overtrained Language Models Are Harder to Fine-Tune Policy & Safety (00:58:28) Taking a responsible path to AGI (01:02:32) This A.I. Forecast Predicts Storms Ahead (01:06:24) The Secrets and Misdirection Behind Sam Altman's Firing From OpenAI

#205 - Gemini 2.5, ChatGPT Image Gen, Thoughts of LLMs

Apr 1, 2025 94:18

Our 205th episode with a summary and discussion of last week's big AI news! Recorded on 03/28/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI's new image generation capabilities represent significant advancements in AI tools, showcasing impressive benchmarks and multimodal functionalities. OpenAI is finalizing a historic $40 billion funding round led by SoftBank, and Sam Altman shifts focus to technical direction while COO Brad Lightcap takes on more operational responsibilities. Anthropic unveils groundbreaking interpretability research, introducing cross-layer tracers and showcasing deep insights into model reasoning through applications on Claude 3.5. New challenging benchmarks such as ARC AGI 2 and complex Sudoku variations aim to push the boundaries of reasoning and problem-solving capabilities in AI models. 
Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:01) News Preview
Tools & Apps
(00:02:46) Gemini 2.5: Our most intelligent AI model
(00:08:41) OpenAI rolls out image generation powered by GPT-4o to ChatGPT
(00:16:14) Ideogram presents version 3.0 of its AI image generation system
(00:19:20) New Reve Image Generator Beats AI Art Heavyweights MidJourney and Flux at a Penny Per Image
(00:21:56) Alibaba Releases Qwen2.5 Omni, Adds Voice and Video Modes to Qwen Chat
(00:23:58) The official version of Tencent's Hunyuan Deep Thinking Model T1 is here, with fast articulation, instant responses, and a decoding speed increase of 2 times
Applications & Business
(00:25:45) OpenAI Close to Finalizing $40 Billion SoftBank-Led Funding
(00:29:26) OpenAI reshuffles leadership as Sam Altman pivots to technical focus
(00:33:23) Nvidia shows off Rubin Ultra with 600,000-Watt Kyber racks and infrastructure, coming in 2027
(00:35:23) China's SiCarrier emerges as challenger to ASML, other chip tool titans
(00:38:24) Pony.ai wins first permit for fully driverless taxi operation in the center of China's Silicon Valley
Projects & Open Source
(00:40:27) A new, challenging AGI test stumps most AI models
(00:45:16) Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
(00:48:13) Wan: Open and Advanced Large-Scale Video Generative Models
(00:50:38) DeepSeek V3-0324 tops non-reasoning AI models in open-source first
(00:54:46) OpenAI adopts rival Anthropic's standard for connecting AI models to data
Research & Advancements
(00:55:56) Anthropic can now track the bizarre inner workings of a large language model
(01:06:00) Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models
(01:11:50) Inside-Out: Hidden Factual Knowledge in LLMs
(01:15:14) Sakana AI super-powers AI reasoning using Japan's own Sudoku Puzzles
Policy & Safety
(01:18:38) Senator Wiener Introduces Legislation to Protect AI Whistleblowers & Boost Responsible AI Development
(01:21:50) NVIDIA & Other Tech Giants Demand Trump Administration To Reconsider “AI Diffusion” Policy Which Is Set To Be Effective By May 15
(01:23:17) U.S. blacklists over 50 Chinese companies in bid to curb Beijing's AI, chip capabilities
(01:26:44) Netflix's Reed Hastings Gives $50 Million to Bowdoin for A.I. Program
(01:27:55) Judge allows 'New York Times' copyright case against OpenAI to go forward
(01:29:48) Judge rules that AI can continue training on copyrighted lyrics, for now

#204 - OpenAI Audio, Rubin GPUs, MCP, Zochi

Mar 24, 2025, 109:03

Our 204th episode with a summary and discussion of last week's big AI news! Recorded on 03/21/2025. Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Baidu launched two new multimodal models, Ernie 4.5 and Ernie X1, boasting competitive pricing and capabilities compared to counterparts like GPT-4.5 and DeepSeek R1. OpenAI introduced new audio models, including impressive speech-to-text and text-to-speech systems, and added o1-pro to their developer API at a high price, reflecting a push toward profitability. Nvidia and Apple announced significant hardware advancements, including Nvidia's future GPU plans and Apple's new Mac Studio offering that can run DeepSeek R1. DeepSeek employees are facing travel restrictions, suggesting China is treating its AI development with increased secrecy and urgency and putting AI competition on a wartime footing.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:36) News Preview
Tools & Apps
(00:02:50) Baidu launches two new versions of its AI model Ernie
(00:10:46) OpenAI Unveils New Audio Models to Make AI Agents Sound More Human Than Ever
(00:16:41) OpenAI's o1-pro is the company's most expensive AI model yet
(00:20:53) Google brings a 'canvas' feature to Gemini, plus Audio Overview
(00:22:18) Anthropic adds web search to its Claude chatbot
(00:23:55) xAI launches an API for generating images
Applications & Business
(00:26:28) Nvidia announces Rubin GPUs in 2026, Rubin Ultra in 2027, Feynman also added to roadmap
(00:36:25) M3 Ultra Runs DeepSeek R1 With 671 Billion Parameters Using 448GB Of Unified Memory, Delivering High Bandwidth Performance At Under 200W Power Consumption, With No Need For A Multi-GPU Setup
(00:40:07) Intel reaches 'exciting milestone' for 18A 1.8nm-class wafers with first run at Arizona fab
(00:42:45) Elon Musk's AI company, xAI, acquires a generative AI video startup
(00:44:44) Tencent Reportedly Makes Massive NVIDIA H20 Chip Purchase for WeChat's DeepSeek Integration
Projects & Open Source
(00:46:32) Anthropic's Not-So-Secret Weapon That's Giving Agents a Boost
(00:50:50) Mistral AI drops new open-source model that outperforms GPT-4o Mini with fraction of parameters
(00:53:30) EXAONE Deep: Reasoning Enhanced Language Models
Research & Advancements
(00:55:58) Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
(01:07:44) Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
(01:12:27) Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
(01:18:46) Transformers without Normalization
(01:19:52) Measuring AI Ability to Complete Long Tasks
(01:26:12) HCAST: Human-Calibrated Autonomy Software Tasks
Policy & Safety
(01:26:45) Announcing Zochi, an Intology Project
(01:32:46) DeepSeek, a National Treasure in China, is Now Being Closely Guarded
(01:37:02) Claude Sonnet 3.7 (often) knows when it's in alignment evaluations
Synthetic Media & Art
(01:42:27) US appeals court rejects copyrights for AI-generated art lacking 'human' creator
(01:45:10) Trump urged by Ben Stiller, Paul McCartney and hundreds of stars to protect AI copyright rules

#203 - Gemini Image Gen, Ascend 910C, Gemma 3, Gemini Robotics

Mar 17, 2025, 106:23

Our 203rd episode with a summary and discussion of last week's big AI news! Recorded on 03/14/2025. Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI's new 'deep research' feature has raised concerns about cybersecurity and the potential misuse of AI models for bio-weapons and autonomous capabilities, prompting new safety and governance measures. Google's extensive $3 billion investment in Anthropic is revealed, aligning with their AI strategy and reinforcing the importance of multiple technology partnerships. Huawei's advancements in the AI chip industry are highlighted, with significant progress in producing chips comparable to Nvidia's H100, despite export control challenges. China's recent directive discourages AI executives from traveling to the US, reflecting heightened security concerns and potentially signaling a more adversarial stance in the AI race.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:30) News Preview
Tools & Apps
(00:02:30) OpenAI launches new tools to help businesses build AI agents
(00:08:50) You can now test Gemini 2.0 Flash's native image output
(00:13:32) Waymo is now offering 24/7 robotaxi rides in Silicon Valley
(00:17:19) Moonvalley releases a video generator it claims was trained on licensed content
(00:21:11) Snap introduces AI Video Lenses powered by its in-house generative model
(00:23:37) Sudowrite Launches Muse AI Model That Can Generate Narrative-Driven Fiction
Applications & Business
(00:27:48) In another chess move with Microsoft, OpenAI is pouring $12B into CoreWeave
(00:30:54) Huawei's Ascend 910C Takes on NVIDIA as China's AI Race Heats Up: More Alleged Details
(00:36:26) Huawei reportedly acquired two million Ascend 910 AI chips from TSMC last year through shell companies
(00:40:27) Inside Google's Investment in the A.I. Start-Up Anthropic
(00:43:26) Meta is reportedly testing in-house chips for AI training
(00:46:48) Elon Musk's xAI buys 1 million sq ft site for second Memphis data center
(00:50:02) Superintelligence startup Reflection AI launches with $130M in funding
Projects & Open Source
(00:53:11) Google calls Gemma 3 the most powerful AI model you can run on one GPU
(00:58:18) Sesame, the startup behind the viral virtual assistant Maya, releases its base AI model
(01:01:13) Reka AI Open Sourced Reka Flash 3: A 21B General-Purpose Reasoning Model that was Trained from Scratch
(01:04:19) Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Research & Advancements
(01:06:25) Google's Gemini Robotics AI Model Reaches Into the Physical World
(01:14:33) Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
(01:23:29) Deep Research System Card
(01:29:50) Claude 3.7 Sonnet System Card
Policy & Safety
(01:33:24) Detecting misbehavior in frontier reasoning models
(01:39:30) China tells its AI leaders to avoid US travel over security concerns, WSJ reports
(01:43:48) Outro

#202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors

Mar 9, 2025, 79:52

Our 202nd episode with a summary and discussion of last week's big AI news! Recorded on 03/07/2025. Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Alibaba released QwQ-32B, their latest reasoning model, on par with leading models like DeepSeek's R1. Anthropic raised $3.5 billion in a funding round, valuing the company at $61.5 billion, solidifying its position as a key competitor to OpenAI. DeepMind introduced BIG-Bench Extra Hard, a more challenging benchmark to evaluate the reasoning capabilities of large language models. Reinforcement Learning pioneers Andrew Barto and Rich Sutton were awarded the prestigious Turing Award for their contributions to the field.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:41) Episode Preview
(00:02:50) GPT-4.5 Discussion
(00:14:13) Alibaba's New QwQ 32B Model is as Good as DeepSeek-R1; Outperforms OpenAI's o1-mini
(00:21:29) With Alexa Plus, Amazon finally reinvents its best product
(00:26:08) Another DeepSeek moment? General AI agent Manus shows ability to handle complex tasks
(00:29:14) Microsoft's new Dragon Copilot is an AI assistant for healthcare
(00:32:24) Mistral's new OCR API turns any PDF document into an AI-ready Markdown file
(00:33:19) A.I. Start-Up Anthropic Closes Deal That Values It at $61.5 Billion
(00:35:49) Nvidia-Backed CoreWeave Files for IPO, Shows Growing Revenue
(00:38:05) Waymo and Uber's Austin robotaxi expansion begins today
(00:38:54) UK competition watchdog drops Microsoft-OpenAI probe
(00:41:17) Scale AI announces multimillion-dollar defense deal, a major step in U.S. military automation
(00:44:43) DeepSeek Open Source Week: A Complete Summary
(00:45:25) DeepSeek AI Releases DualPipe: A Bidirectional Pipeline Parallelism Algorithm for Computation-Communication Overlap in V3/R1 Training
(00:53:00) Physical Intelligence open-sources Pi0 robotics foundation model
(00:54:23) BIG-Bench Extra Hard
(00:56:10) Cognitive Behaviors that Enable Self-Improving Reasoners
(01:01:49) The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
(01:05:32) Pioneers of Reinforcement Learning Win the Turing Award
(01:06:56) OpenAI launches $50M grant program to help fund academic research
(01:07:25) The Nuclear-Level Risk of Superintelligent AI
(01:13:34) METR's GPT-4.5 pre-deployment evaluations
(01:17:16) Chinese buyers are getting Nvidia Blackwell chips despite US export controls

#201 - GPT 4.5, Sonnet 3.7, Grok 3, Phi 4

Mar 5, 2025, 58:37

Our 201st episode with a summary and discussion of last week's big AI news! Recorded on 03/02/2025. Join our brand new Discord here! https://discord.gg/nTyezGSKwP Hosted by Andrey Kurenkov and guest host Sharon Zhou. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/.
In this episode:
- The release of GPT-4.5 from OpenAI, Anthropic's Claude 3.7, and Grok 3 from xAI, comparing their features, costs, and capabilities.
- Discussion on new tools and applications including Sesame's new voice assistant and Google's AI coding assistant, Gemini Code Assist, highlighting their unique benefits.
- OpenAI's continued user growth despite competition, pricing models for Google's text-to-video platform, and HP acquiring and shutting down Humane's AI pin.
- Insights into new research on alignment and specification gaming in LLMs, including papers on fine-tuning causing broad misalignment and Google's multi-agent system for scientific collaboration.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:36) News Preview
Tools & Apps
(00:02:33) OpenAI announces GPT-4.5, warns it's not a frontier AI model
(00:07:22) Anthropic launches a new AI model that 'thinks' as long as you want
(00:11:14) New Grok 3 release tops LLM leaderboards
(00:16:43) Sesame is the first voice assistant I've ever wanted to talk to more than once
(00:18:30) Google launches a free AI coding assistant with very high usage caps
(00:20:45) Rabbit shows off the AI agent it should have launched with
(00:22:23) Mistral's Le Chat tops 1M downloads in just 14 days
Applications & Business
(00:24:06) OpenAI Tops 400 Million Users Despite DeepSeek's Emergence
(00:27:37) Google's new AI video model Veo 2 will cost 50 cents per second
(00:29:52) HP is buying Humane and shutting down the AI Pin
Projects & Open Source
(00:31:44) Microsoft launches next-gen Phi AI models.
(00:33:47) OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work
(00:37:12) SWE-Bench+: Enhanced Coding Benchmark for LLMs
Research & Advancements
(00:40:00) Towards an AI co-scientist
(00:42:52) Magma: A Foundation Model for Multimodal AI Agents
Policy & Safety
(00:47:32) Demonstrating specification gaming in reasoning models
(00:51:03) Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

#200 - ChatGPT Roadmap, Musk OpenAI Bid, Model Tampering

Feb 17, 2025, 108:09

Our 200th episode with a summary and discussion of last week's big AI news! Recorded on 02/14/2025. Join our brand new Discord here! https://discord.gg/nTyezGSKwP Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: OpenAI announces plans to unify their model offerings, moving away from multiple separate models (the GPT series, the o-series, etc.) toward a single unified intelligence system, with free users getting "standard intelligence" and Plus subscribers accessing "higher intelligence" levels. Adobe launches their Sora-rivaling AI video generator with 1080p output and 5-second clips, emphasizing production-ready content for films and introducing new pricing tiers through Firefly subscriptions at $10-30 per month. Elon Musk and a consortium offer $97.4 billion to acquire OpenAI's nonprofit entity, potentially complicating the company's transition to a for-profit structure, though Sam Altman quickly dismissed the offer's viability. TSMC implements stricter chip sales restrictions to China, requiring government-approved third-party packaging houses for chips using 16nm and below processes, aligning with US export control measures and affecting major tech companies like Nvidia and AMD.
Timestamps + Links:
(00:00:00) Intro / Banter
(00:01:25) Response to listener comments
(00:02:41) News Preview
Tools & Apps
(00:03:58) Adobe's Sora rivalling AI video generator is now available for everyone
(00:09:45) OpenAI lays out plans for GPT-5
(00:16:42) OpenAI is rethinking how AI models handle controversial topics
(00:21:28) Perplexity AI launches new ultra-fast AI search model Sonar
(00:23:45) YouTube AI updates include auto dubbing expansion, age ID tech, and more
Applications & Business
(00:24:37) Musk-led group makes $97.4 billion bid for control of OpenAI
(00:34:32) Anthropic's next major AI model could arrive within weeks
(00:39:09) AI chip startup Groq secures $1.5 billion commitment from Saudi Arabia
(00:42:15) OpenAI reportedly planning to build its first AI chip in 2026
Projects & Open Source
(00:45:01) Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning
(00:51:11) Gemstones: A Model Suite for Multi-Faceted Scaling Laws
(00:57:15) Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
Research & Advancements
(00:58:24) Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
(01:04:24) Distillation Scaling Laws
(01:10:06) Matryoshka Quantization
(01:17:47) How much AI compute exists globally? How rapidly is it growing?
Policy & Safety
(01:21:29) US and UK refuse to sign summit declaration on AI safety
(01:25:43) Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
(01:34:40) xAI Risk Management Framework (Draft)
(01:39:59) TSMC bans more chip sales to China due to stricter U.S. export sanctions
(01:42:38) Listener requested topic
Synthetic Media & Art
(01:43:48) Thomson Reuters Wins First Major AI Copyright Case in the US
(01:44:46) Scarlett Johansson calls for deepfake ban after AI video goes viral
(01:45:55) Outro
