I, Stewart Alsop, had a fascinating conversation on this episode of Crazy Wisdom with Mallory McGee, the founder of Chroma, who is doing some really interesting work at the intersection of AI and crypto. We dove deep into how these two powerful technologies might reshape the internet and our interactions with it, moving beyond the hype cycles to what's truly foundational.

Check out this GPT we trained on the conversation

Timestamps
00:00 The Intersection of AI and Crypto
01:28 Bitcoin's Origins and Austrian Economics
04:35 AI's Centralization Problem and the New Gatekeepers
09:58 Agent Interactions and Decentralized Databases for Trustless Transactions
11:11 AI as a Prosthetic Mind and the Interpretability Challenge
15:12 Deterministic Blockchains vs. Non-Deterministic AI Intents
18:44 The Demise of Traditional Apps in an Agent-Driven World
35:07 Property Rights, Agent Registries, and Blockchains as Backends

Key Insights
Crypto's Enduring Fundamentals: Mallory emphasized that while crypto prices are often noise, the underlying fundamentals point to a new, long-term cycle for the Internet itself. It's about decentralizing control, a core principle stemming from Bitcoin's original blend of economics and technology.
AI's Centralization Dilemma: We discussed the concerning trend of AI development consolidating power within a few major players. This, as Mallory pointed out, ironically mirrors the very centralization crypto aims to dismantle, potentially shifting control from governments to a new set of tech monopolies.
Agents are the Future of Interaction: Mallory envisions a future where most digital interactions aren't human-to-LLM, but agent-to-agent. These autonomous agents will require decentralized, trustless platforms like blockchains to transact, hold assets, and communicate confidentially.
Bridging Non-Deterministic AI with Deterministic Blockchains: A fascinating challenge Mallory highlighted is translating the non-deterministic "intents" of AI (e.g., an agent's goal to "get me a good return on spare cash") into the deterministic transactions required by blockchains. This translation layer is crucial for agents to operate effectively on-chain.
The Decline of Traditional Apps: Mallory made a bold claim that traditional apps and web interfaces are on their way out. As AI agents become capable of generating personalized interfaces on the fly, the need for standardized, pre-built apps will diminish, leading to a world where software is hyper-personalized and often ephemeral.
Blockchains as Agent Backbones: We explored the intriguing idea that blockchains might be inherently better suited for AI agents than for direct human use. Their deterministic nature, ability to handle assets, and potential for trustless reputation systems make them ideal backends for an agent-centric internet.
Trust and Reputation for Agents: In a world teeming with AI agents, establishing trust is paramount. Mallory suggested that on-chain mechanisms like reward and slashing systems can be used to build verifiable reputation scores for agents, helping us discern trustworthy actors from malicious ones without central oversight.
The Battle for an Open AI Future: The age-old battle between open and closed source is playing out again in the AI sphere. While centralized players currently seem to dominate, Mallory sees hope in the open-source AI movement, which could provide a crucial alternative to a future controlled by a few large entities.

Contact Information
* Twitter: @McGee_noodle
* Company: Chroma
Since ChatGPT came on the scene, numerous incidents have surfaced involving attorneys submitting court filings riddled with AI-generated hallucinations—plausible-sounding case citations that purport to support key legal propositions but are, in fact, entirely fictitious. As sanctions against attorneys mount, it seems clear there are a few kinks in the tech. Even AI tools designed specifically for lawyers can be prone to hallucinations. In this episode, we look at the potential and risks of AI-assisted tech in law and policy with two Stanford Law researchers at the forefront of this issue: RegLab Director Professor Daniel Ho and JD/PhD student and computer science researcher Mirac Suzgun. Together with several co-authors, they examine the emerging risks in two recent papers, “Profiling Legal Hallucinations in Large Language Models” (Oxford Journal of Legal Analysis, 2024) and the forthcoming “Hallucination-Free?” in the Journal of Empirical Legal Studies. Ho and Suzgun offer new insights into how legal AI is working, where it's failing, and what's at stake.

Links:
Daniel Ho >>> Stanford Law page
Stanford Institute for Human-Centered Artificial Intelligence (HAI) >>> Stanford University page
Regulation, Evaluation, and Governance Lab (RegLab) >>> Stanford University page

Connect:
Episode Transcripts >>> Stanford Legal Podcast Website
Stanford Legal Podcast >>> LinkedIn Page
Rich Ford >>> Twitter/X
Pam Karlan >>> Stanford Law School Page
Stanford Law School >>> Twitter/X
Stanford Lawyer Magazine >>> Twitter/X

(00:00:00) Introduction to AI in Legal Education
(00:05:01) AI Tools in Legal Research and Writing
(00:12:01) Challenges of AI-Generated Content
(00:20:01) Reinforcement Learning with Human Feedback
(00:30:01) Audience Q&A
Plus: AI Brings Back Murder Victim. Like this? Get AIDAILY, delivered to your inbox, 3x a week. Subscribe to our newsletter at https://aidaily.us

AI Chatbots: Flattering Users at the Expense of Truth
A recent update to ChatGPT made it overly flattering, endorsing even ill-conceived ideas. This behavior stems from Reinforcement Learning from Human Feedback (RLHF), where AI models learn to please users, sometimes sacrificing accuracy. The article argues that such sycophantic tendencies mirror social media's echo chambers, suggesting AI should serve as a tool for exploring diverse knowledge rather than merely affirming user biases.

AI Brings Murder Victim's Voice to Courtroom in Unprecedented Legal Moment
In a groundbreaking Arizona case, AI technology enabled the late Christopher Pelkey to deliver a victim impact statement at his killer's sentencing. Pelkey's sister used AI to recreate his voice and likeness, allowing him to express forgiveness and reflect on life. The judge acknowledged the statement's impact, sentencing the defendant to 10.5 years. This marks a significant moment in the integration of AI into the legal system.

MIT's AI Model Predicts 3D Genome Structures in Minutes
MIT chemists have developed a generative AI model that rapidly predicts the 3D structure of the human genome from DNA sequences. This innovation allows for the generation of thousands of chromatin conformations in minutes, significantly accelerating genomic research. The model's predictions closely match experimental data, offering a powerful tool for understanding gene regulation and cellular function.

AI Isn't Replacing Your Job—It's Replacing Your Boss
AI is reshaping the workplace by automating middle management tasks like scheduling, reporting, and decision-making. Tools such as virtual assistants and chatbots now handle up to 69% of managerial duties, streamlining operations and reducing bureaucracy. This shift empowers frontline employees while diminishing traditional supervisory roles.

I Tried an AI Aging App—And It Wasn't as Bad as I Thought
A CNET writer tested an AI-powered aging app to see a glimpse of their future self. The results were surprisingly realistic and less unsettling than anticipated. While the app offered a fun and insightful look into potential aging, it also sparked reflections on the emotional implications of visualizing one's future appearance.

Sam Altman Warns Congress: Overregulating AI Could Undermine U.S. Leadership
In a recent Senate hearing, OpenAI CEO Sam Altman cautioned that excessive AI regulation might hinder the United States' competitive edge, particularly against China. This marks a shift from his earlier stance advocating for stringent oversight. Altman emphasized the need for balanced policies that foster innovation while addressing potential risks associated with AI technologies.
Our 208th episode with a summary and discussion of last week's big AI news! Recorded on 05/02/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI showcases new integration capabilities in their API, enhancing the performance of LLMs and image generators with updated functionalities and improved user interfaces. Analysis of OpenAI's preparedness framework reveals updates focusing on biological and chemical risks, cybersecurity, and AI self-improvement, while toning down the emphasis on persuasion capabilities. Anthropic's research highlights potential security vulnerabilities in AI models, demonstrating various malicious use cases such as influence operations and hacking tool creation. A detailed examination of AI competition between the US and China reveals China's impending capability to match the US in AI advancement this year, emphasizing the impact of export controls and the importance of geopolitical strategy. Timestamps + Links: Tools & Apps (00:02:57) Anthropic lets users connect more apps to Claude (00:08:20) OpenAI undoes its glaze-heavy ChatGPT update (00:15:16) Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost (00:19:44) Adobe adds more image generators to its growing AI family (00:24:35) OpenAI makes its upgraded image generator available to developers (00:27:01) xAI's Grok chatbot can now 'see' the world around it Applications & Business: (00:28:41) Thinking Machines Lab CEO Has Unusual Control in Andreessen-Led Deal (00:33:36) Chip war heats up: Huawei 910C emerges as China's answer to US export bans (00:34:21) Huawei to Test New AI Chip (00:40:17) ByteDance, Alibaba and Tencent stockpile billions worth of Nvidia chips (00:43:59) Speculation mounts that Musk will raise tens of billions for AI supercomputer with 1 million GPUs: Report Projects & Open Source: (00:47:14) Alibaba unveils Qwen 3, a family of 'hybrid' AI reasoning models (00:54:14) Intellect-2 (01:02:07) BitNet b1.58 2B4T Technical Report (01:05:33) Meta AI Introduces Perception Encoder: A Large-Scale Vision Encoder that Excels Across Several Vision Tasks for Images and Video Research & Advancements: (01:06:42) The Leaderboard Illusion (01:12:08) Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (01:18:38) Reinforcement Learning for Reasoning in Large Language Models with One Training Example (01:24:40) Sleep-time Compute: Beyond Inference Scaling at Test-time Policy & Safety: (01:28:23) Every AI Datacenter Is Vulnerable to Chinese Espionage, Report Says (01:32:27) OpenAI preparedness framework update (01:38:31) Detecting and Countering Malicious Uses of Claude: March 2025 (01:46:33) Chinese AI Will Match America's
This episode explains advances in large language models (LLMs): scaling laws (the relationships among model size, data size, and compute) and how emergent abilities such as in-context learning, multi-step reasoning, and instruction following arise once certain scaling thresholds are crossed. It covers the evolution of the transformer architecture with Mixture of Experts (MoE), describes the three-phase training process culminating in Reinforcement Learning from Human Feedback (RLHF) for model alignment, and explores advanced reasoning techniques such as chain-of-thought prompting, which significantly improve complex task performance.

Links
Notes and resources at ocdevel.com/mlg/mlg34
Build the future of multi-agent software with AGNTCY
Try a walking desk to stay healthy & sharp while you learn & code

Transformer Foundations and Scaling Laws
Transformers: Introduced by the 2017 "Attention is All You Need" paper, transformers allow for parallel training and inference of sequences using self-attention, in contrast to the sequential nature of RNNs.
Scaling Laws: Empirical research revealed that LLM performance improves predictably as model size (parameters), data size (training tokens), and compute are increased together, with diminishing returns if only one variable is scaled disproportionately. The "Chinchilla scaling law" (DeepMind, 2022) established the optimal model/data/compute ratio for efficient model performance: earlier large models like GPT-3 were undertrained relative to their size, whereas right-sized models with more training data (e.g., Chinchilla, LLaMA series) proved more compute and inference efficient.

Emergent Abilities in LLMs
Emergence: When trained beyond a certain scale, LLMs display abilities not present in smaller models, including:
In-Context Learning (ICL): Performing new tasks based solely on prompt examples at inference time.
Instruction Following: Executing natural language tasks not seen during training.
Multi-Step Reasoning & Chain of Thought (CoT): Solving arithmetic, logic, or symbolic reasoning by generating intermediate reasoning steps.
Discontinuity & Debate: These abilities appear abruptly in larger models, though recent research suggests that this could result from non-linearities in evaluation metrics rather than innate model properties.

Architectural Evolutions: Mixture of Experts (MoE)
MoE Layers: Modern LLMs often replace standard feed-forward layers with MoE structures, composed of many independent "expert" networks specializing in different subdomains or latent structures. A gating network routes tokens to the most relevant experts per input, activating only a subset of parameters—this is called "sparse activation." This enables much larger overall models without proportional increases in compute per inference, but requires the entire model in memory and introduces new challenges like load balancing and communication overhead. (A minimal gating sketch follows these notes.)
Specialization & Efficiency: Experts learn different data/knowledge types, boosting model specialization and throughput, though care is needed to avoid overfitting and underutilization of specialists.

The Three-Phase Training Process
1. Unsupervised Pre-Training: Next-token prediction on massive datasets builds a foundation model capturing general language patterns.
2. Supervised Fine-Tuning (SFT): Training on labeled prompt-response pairs to teach the model how to perform specific tasks (e.g., question answering, summarization, code generation). Overfitting and "catastrophic forgetting" are risks if not carefully managed.
3. Reinforcement Learning from Human Feedback (RLHF): Collects human preference data by generating multiple responses to prompts and having annotators rank them, builds a reward model from these rankings, then updates the LLM (typically with PPO) to maximize alignment with human preferences (helpfulness, harmlessness, truthfulness). Introduces complexity and the risk of reward hacking (specification gaming), where the model may exploit the reward system in unanticipated ways.

Advanced Reasoning Techniques
Prompt Engineering: The art/science of crafting prompts that elicit better model responses, shown to dramatically affect model output quality.
Chain of Thought (CoT) Prompting: Guides models to elaborate step-by-step reasoning before arriving at final answers—demonstrably improves results on complex tasks. Variants include zero-shot CoT ("let's think step by step"), few-shot CoT with worked examples, self-consistency (voting among multiple reasoning chains), and Tree of Thought (explores multiple reasoning branches in parallel).
Automated Reasoning Optimization: Frontier models selectively apply these advanced reasoning techniques, balancing compute costs with gains in accuracy and transparency.

Optimization for Training and Inference
Tradeoffs: The optimal balance between model size, data, and compute is determined not only for pretraining but also for inference efficiency, as lifetime inference costs may exceed initial training costs.
Current Trends: Efficient scaling, model specialization (MoE), careful fine-tuning, RLHF alignment, and automated reasoning techniques define state-of-the-art LLM development.
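The sparse-activation idea behind MoE layers is easy to see in miniature. Below is a minimal, illustrative numpy sketch of top-k gating; all names and sizes are invented for this example, and real MoE layers in LLMs use learned expert networks inside a transformer block plus auxiliary load-balancing losses:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2

# Each "expert" is a tiny feed-forward network; here, just one weight matrix.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
# The gating network scores how relevant each expert is for a given token.
gate_w = rng.normal(size=(d_model, n_experts))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts only (sparse activation)."""
    scores = token @ gate_w                    # (n_experts,) relevance scores
    top = np.argsort(scores)[-top_k:]          # indices of the k best experts
    # Softmax over the selected scores gives mixing weights that sum to 1.
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()
    # Only top_k of the n_experts weight matrices are touched for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.normal(size=d_model))
print(out.shape)  # (16,)
```

The key property is visible in the routing line: compute per token scales with top_k, not n_experts, which is how MoE models grow total parameter count without a proportional increase in per-inference compute.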
Join Tommy Shaughnessy from Delphi Ventures as he hosts Sam Lehman, Principal at Symbolic Capital and AI researcher, for a deep dive into the Reinforcement Learning (RL) renaissance and its implications for decentralized AI. Sam recently authored a widely discussed post, "The World's RL Gym", exploring the evolution of AI scaling and the exciting potential of decentralized networks for training next-generation models. The World's RL Gym: https://www.symbolic.capital/writing/the-worlds-rl-gym
After pioneering reinforcement learning breakthroughs at DeepMind with Capture the Flag and AlphaStar, Max Jaderberg aims to revolutionize drug discovery with AI as Chief AI Officer of Isomorphic Labs, which was spun out of DeepMind. He discusses how AlphaFold 3's diffusion-based architecture enables unprecedented understanding of molecular interactions, and why we're approaching a "Move 37 moment" in AI-powered drug design where models will surpass human intuition. Max shares his vision for general AI models that can solve all diseases, and the importance of developing agents that can learn to search through the whole potential design space. Hosted by Stephanie Zhan, Sequoia Capital Mentioned in this episode: Playing Atari with Deep Reinforcement Learning: Seminal 2013 paper on Reinforcement Learning Capture the Flag: 2019 DeepMind paper on the emergence of cooperative agents AlphaStar: 2019 DeepMind paper on attaining grandmaster level in StarCraft II using multi-agent RL AlphaFold Server: Web interface for AlphaFold 3 model for non-commercial academic use
In this episode of Project Synapse, hosts Marcel Gagné, John Pinard, and Jim Love explore the transformative potential of AI in the contemporary world. They delve into the significance of the new paper 'Welcome to the Era of Experience' by AI pioneers David Silver and Richard Sutton. The discussion spans a range of topics including the evolution of AI training methods, the impact of AI on the workforce, and the concept of AI as autonomous co-workers. They also reflect on the broader implications of AI, such as changes in societal structures and the philosophical aspects of human intelligence versus AI. The hosts share insights on the rapid advancements in AI technology, the necessity of preparing for a non-linear future, and the importance of adapting corporate strategies to integrate AI effectively. 00:00 Introduction to Project Synapse 00:34 Discussing the New AI Paper 01:15 Mid-Conversation Banter 03:36 AI in the Workforce 11:52 Reinforcement Learning and AI Training 23:25 The Bitter Lesson and AI's Future 34:15 Mental Health Systems and Homelessness 35:43 AI and Human Intelligence 36:16 The Move 37 Phenomenon 37:46 Humility and Expertise 39:54 Dolphin Intelligence and AI 42:25 Human Achievements and AI 44:54 Job Displacement and AI 48:52 Transitioning to an AI-Driven Society 01:00:39 Experiential Learning in AI 01:05:12 Final Thoughts and Resources
In this episode, we sit down with the team behind Airflux, the AI-powered ad optimizer from Airbridge, and unpack how it's quietly changing the rules of mobile game monetization.
Our 207th episode with a summary and discussion of last week's big AI news! Recorded on 04/14/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI introduces GPT-4.1 with optimized coding and instruction-following capabilities, featuring variants like GPT-4.1 Mini and Nano, and a million-token context window. Concerns arise as OpenAI reduces resources for safety testing, sparking internal and external criticisms. xAI's newly launched API for Grok 3 showcases significant capabilities comparable to other leading models. Meta faces allegations of aiding China in AI development for business advantages, with potential compliance issues and public scrutiny looming. Timestamps + Links: Tools & Apps (00:03:13) OpenAI's new GPT-4.1 AI models focus on coding (00:08:12) ChatGPT will now remember your old conversations (00:11:16) Google's newest Gemini AI model focuses on efficiency (00:14:27) Elon Musk's AI company, xAI, launches an API for Grok 3 (00:18:35) Canva is now in the coding and spreadsheet business (00:20:31) Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark Applications & Business (00:25:46) Ironwood: The first Google TPU for the age of inference (00:34:15) Anthropic rolls out a $200-per-month Claude subscription (00:37:17) OpenAI co-founder Ilya Sutskever's Safe Superintelligence reportedly valued at $32B (00:40:20) Mira Murati's AI startup gains prominent ex-OpenAI advisers (00:42:52) Hugging Face buys a humanoid robotics startup (00:44:58) Stargate developer Crusoe could spend $3.5 billion on a Texas data center. Most of it will be tax-free. Projects & Open Source (00:48:14) OpenAI Open Sources BrowseComp: A New Benchmark for Measuring the Ability for AI Agents to Browse the Web Research & Advancements (00:56:09) Sample, Don't Search: Rethinking Test-Time Alignment for Language Models (01:03:32) Concise Reasoning via Reinforcement Learning (01:09:37) Going beyond open data – increasing transparency and trust in language models with OLMoTrace (01:15:34) Independent evaluations of Grok-3 and Grok-3 mini on our suite of benchmarks Policy & Safety (01:17:58) OpenAI countersues Elon Musk, calls for enjoinment from 'further unlawful and unfair action' (01:24:33) OpenAI slashes AI model safety testing time (01:27:55) Ex-OpenAI staffers file amicus brief opposing the company's for-profit transition (01:32:25) Access to future AI models in OpenAI's API may require a verified ID (01:34:53) Meta whistleblower claims tech giant built $18 billion business by aiding China in AI race and undermining U.S. national security
TOPICS: 1. OpenAI announced its new models, o3 and o4-mini. These models have advanced capabilities such as web search, data analysis with Python, image understanding, and generating visual content when needed. They no longer merely describe uploaded images; they can now interpret them in light of the questions asked.
Bytes und Strings (18 April 2025, Jochen). In this episode we take a look at the next chapter of "Fluent Python", on bytes and strings. Johannes explains the most important concepts and why UTF-8 is almost always the right choice.
For episode 507, Brandon Zemp is joined by the Founder of Pluralis Research, Dr. Alexander Long. He was previously an AI Researcher at Amazon on a team of 14 Deep Learning PhDs. At Amazon, Dr. Long's research focus was retrieval augmentation and sample-efficient adaptation of large multi-modal foundation models. His PhD at UNSW was on sample-efficient Reinforcement Learning and non-parametric memory in Deep Learning, where he was the School Nominee for the Malcolm Chaikin Prize (UNSW Best Thesis).

Pluralis Research is pioneering Protocol Learning, an alternative to today's closed AI models and economically unsustainable open-source initiatives. Protocol Learning enables collaborative model training by pooling computational resources across multiple participants, while ensuring no single entity can obtain the complete model.
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today, we're joined by Maohao Shen, PhD student at MIT, to discuss his paper, "Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search." We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori's two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as "restart and explore," which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori's performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.
Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback, which is an important axis for scaling AI capabilities beyond just increasing model size or data volume (a toy sketch of this feedback loop follows the references below). Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, outlining poolside's vision to dramatically increase software development productivity and accessibility.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

Eiso Kant:
https://x.com/eisokant
https://poolside.ai/

TRANSCRIPT:
https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0

TOC:
1. Foundation Models and AI Strategy
[00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development
[00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision
[00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs
2. Reinforcement Learning and Model Economics
[00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches
[00:22:06] 2.2 Model Economics and Experimental Optimization
3. Enterprise AI Implementation
[00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure
[00:26:00] 3.2 Enterprise-First Business Model and Market Focus
[00:27:05] 3.3 Foundation Models and AGI Development Approach
[00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements
4. LLM Architecture and Performance
[00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization
[00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs
[00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons
[00:43:26] 4.4 Balancing Creativity and Determinism in AI Models
[00:50:01] 4.5 AI-Assisted Software Development Evolution
5. AI Systems Engineering and Scalability
[00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges
[00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends
[01:01:25] 5.3 Distributed Systems and Engineering Complexity
[01:01:50] 5.4 GenAI Architecture and Scalability Patterns
[01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation
6. AI Safety and Future Capabilities
[01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches
[01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems
[01:16:27] 6.3 AI vs Human Capabilities in Software Development
[01:33:45] 6.4 Enterprise Deployment and Security Architecture

CORE REFS (see shownotes for URLs/more refs):
[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (key finding on synthetic data risk)
[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (seminal NLP technique)
[00:22:15] OpenAI o3 model's breakthrough performance on ARC Prize Challenge, OpenAI (significant AI reasoning benchmark achievement)
[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (influential AI definition/philosophy)
[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (details on a major new model)
[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (key paper on LLM scaling)
[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (influential "Bitter Lesson" perspective)
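Poolside's actual training pipeline is proprietary, but the core idea of rewarding a model from code execution can be sketched generically. The toy loop below scores candidate programs by running them against tests; every name here is invented for illustration, and in real RL training the rewards would drive a policy-gradient update (e.g., PPO) rather than a simple best-of-n ranking:

```python
import subprocess
import sys
import tempfile

def execution_reward(program: str, tests: str) -> float:
    """Score a candidate program by actually executing it against tests.
    Reward is 1.0 if the combined script exits cleanly, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n\n" + tests + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=10)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # runaway programs earn nothing

# Two hypothetical candidate completions for the same task.
candidates = [
    "def add(a, b):\n    return a - b",   # buggy
    "def add(a, b):\n    return a + b",   # correct
]
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

rewards = [execution_reward(c, tests) for c in candidates]
print(rewards)  # [0.0, 1.0]
```

The appeal of this signal is that it is grounded and cheap to scale: unlike human preference labels, execution feedback can be generated automatically for as many samples as you can afford to run.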
Candace and Frank are joined today by an extraordinary guest, Geordie Rose, a trailblazer in the quantum computing space. Once the CEO and CTO of D-Wave, Geordie is not only a quantum computing pioneer but also delves into the intersection of AI and quantum technology.

In this fascinating episode, we dive into Geordie's experiences and insights, from his innovative work at D-Wave to his current projects in AI—touching upon reinforcement learning, quantum annealing, and even the enigma of consciousness itself. And if that isn't enough to take in, he's on a unique vision quest, quite literally running across Canada! Join us as we unravel these mind-blowing topics and explore a future where quantum computing and AI might redefine the very essence of our existence. So, tune in and prepare for a riveting ride into the quantum realm!

Quotable Moments
00:00 "Quantum Insights with Geordie Rose"
06:51 Quantum Supremacy: Major Milestone
13:29 Quantum Computing: Theoretical Uncertainty Remains
19:16 Optimizing Noise Reduction in D-Wave
22:47 Thermal Annealing: Metal Transformation Process
26:55 Optimizing Qubit State Selection
33:28 "Reinforcement Learning's Simple Complexity"
41:53 Mechanistic View of Consciousness Explained
48:03 "Assigning Rights to Advanced AI"
49:39 Embrace Humility for Our Future
54:56 "Convincing Parents of Computer Science"
59:22 "Future Uncertainty and Quantum Role"
How did no one notice these AI Agents?
In this episode, a16z General Partner Martin Casado sits down with Sujay Jayakar, co-founder and Chief Scientist at Convex, to talk about his team's latest work benchmarking AI agents on full-stack coding tasks. From designing Fullstack-Bench to the quirks of agent behavior, the two dig into what's actually hard about autonomous software development, and why robust evals—and guardrails like type safety—matter more than ever. They also get tactical: which models perform best for real-world app building? How should developers think about trajectory management and variance across runs? And what changes when you treat your toolchain like part of the prompt? Whether you're a hobbyist developer or building the next generation of AI-powered devtools, Sujay's systems-level insights are not to be missed.

Drawing from Sujay's work developing Fullstack-Bench, they cover:
Why full-stack coding is still a frontier task for autonomous agents
How type safety and other "guardrails" can significantly reduce variance and failure
What makes a good eval—and why evals might matter more than clever prompts
How different models perform on real-world app-building tasks (and what to watch out for)
Why your toolchain might be the most underrated part of the prompt
And what all of this means for devs—from hobbyists to infra teams building with AI in the loop

Learn More:
Introducing Fullstack-Bench

Follow everyone on X:
Sujay Jayakar
Martin Casado

Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
This episode is sponsored by the DFINITY Foundation. DFINITY Foundation's mission is to develop and contribute technology that enables the Internet Computer (ICP) blockchain and its ecosystem, aiming to shift cloud computing into a fully decentralized state. Find out more at https://internetcomputer.org/ In this episode of Eye on AI, Yoav Shoham, co-founder of AI21 Labs, shares his insights on the evolution of AI, touching on key advancements such as Jamba and Maestro. From the early days of his career to the latest developments in AI systems, Yoav offers a comprehensive look into the future of artificial intelligence. Yoav opens up about his journey in AI, beginning with his academic roots in game theory and logic, followed by his entrepreneurial ventures that led to the creation of AI21 Labs. He explains the founding of AI21 Labs and the company's mission to combine traditional AI approaches with modern deep learning methods, leading to innovations like Jamba—a highly efficient hybrid AI model that's disrupting the traditional transformer architecture. He also introduces Maestro, AI21's orchestrator that works with multiple large language models (LLMs) and AI tools to create more reliable, predictable, and efficient systems for enterprises. Yoav discusses how Maestro is tackling real-world challenges in enterprise AI, moving beyond flashy demos to practical, scalable solutions. Throughout the conversation, Yoav emphasizes the limitations of current large language models (LLMs), even those with reasoning capabilities, and explains how AI systems, rather than just pure language models, are becoming the future of AI. He also delves into the philosophical side of AI, discussing whether models truly "understand" and what that means for the future of artificial intelligence. Whether you're deeply invested in AI research or curious about its applications in business, this episode is filled with valuable insights into the current and future landscape of artificial intelligence. Stay Updated: Craig Smith Twitter: https://twitter.com/craigss Eye on A.I. Twitter: https://twitter.com/EyeOn_AI (00:00) Introduction: The Future of AI Systems (02:33) Yoav's Journey: From Academia to AI21 Labs (05:57) The Evolution of AI: Symbolic AI and Deep Learning (07:38) Jurassic-1: AI21 Labs' First Language Model (10:39) Jamba: Revolutionizing AI Model Architecture (16:11) Benchmarking AI Models: Challenges and Criticisms (22:18) Reinforcement Learning in AI Models (24:33) The Future of AI: Is Jamba the End of Larger Models? (27:31) Applications of Jamba: Real-World Use Cases in Enterprise (29:56) The Transition to Mass AI Deployment in Enterprises (33:47) Maestro: The Orchestrator of AI Tools and Language Models (36:03) GPT-4.5 and Reasoning Models: Are They the Future of AI? (38:09) Yoav's Pet Project: The Philosophical Side of AI Understanding (41:27) The Philosophy of AI Understanding (45:32) Explanations and Competence in AI (48:59) Where to Access Jamba and Maestro
➡️ Like The Podcast? Leave A Rating: https://ratethispodcast.com/successstory

In this "Lessons" episode, Dr. Jud Brewer, Neuroscience of Addiction Expert, reveals the science behind habits and addictions, explaining how our brains form automatic behaviors to conserve energy and how reinforcement learning reinforces unhealthy patterns. By learning to recognize the true rewards of our actions, Dr. Brewer shows us how to transform negative routines into opportunities for healthier change.

➡️ Show Links
https://successstorypodcast.com
YouTube: https://youtu.be/PpI2aFjA9FU
Apple: https://podcasts.apple.com/us/podcast/dr-judson-brewer-neuroscientist-addiction-psychiatrist/id1484783544
Spotify: https://open.spotify.com/episode/531cPamqo4H0Esq6Yp8RQ3

➡️ Watch the Podcast On YouTube
https://www.youtube.com/c/scottdclary
Parinaz Sobhani is one of those truly special people in the AI world; when you hear her professional story, you can't help but be impressed. She is currently Head of AI at Sagard, a large international investment firm with over $25 billion in assets under management that is focusing heavily on the future of investing with the help of AI. Parinaz holds a PhD in AI from the University of Ottawa, and over the years she has built deep experience in both academia and industry, especially in areas like natural language processing and deep learning. Notably, she previously worked at Microsoft Research and the National Research Council of Canada on projects such as machine translation and deep learning models for text processing.

00:00 Preface
08:01 Falling for machine learning and a master's in AI at Sharif University
11:27 Deep learning and the brain's neural networks
15:32 Differences between traditional computational approaches and deep learning in AI
23:03 The evolution of terminology: from machine learning to AI and data science
25:44 Other branches of AI you should know
31:43 Reinforcement Learning: challenges and simulation methods
47:23 The AI black box? The limits of human understanding against AI's complexity
57:32 Why she moved into venture capital
1:09:05 Entrepreneurship opportunities in AI and solving existing problems
1:16:44 Explaining AI's potential to non-experts

Parinaz Sobhani is a distinguished figure in artificial intelligence, currently serving as the Head of AI at Sagard, a global alternative asset management firm with over $25 billion in assets under management. With a Ph.D. in AI from the University of Ottawa, specializing in natural language processing, she has amassed over 15 years of experience in both academic and industry settings.

This episode's sponsors:
Saadat Rent: luxury car rental in Dubai with no prepayment and full insurance, easy and fast. https://www.saadatrent.com?ref_id=Tabaghe16
Limoo Host, a web hosting provider: https://limoo.host

More about the Tabaghe 16 podcast and links to the audio feeds: https://linktr.ee/tabaghe16

Hosted on Acast. See acast.com/privacy for more information.
Editor's Summary by JAMA Deputy Editors Linda Brubaker, MD, and Preeti Malani, MD, MSJ, for articles published from March 15-21, 2025.
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on reward models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality (the DPO objective is written out below).
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.

Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
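For reference, the DPO objective mentioned above is commonly written as follows (the standard form from the DPO paper, Rafailov et al., 2023, where $y_w$ and $y_l$ are the preferred and rejected responses, $\beta$ is a temperature, and $\pi_{\text{ref}}$ is the frozen reference policy):

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
    \right)
  \right]
```

Unlike PPO-based RLHF, no separate reward model is fitted at optimization time; the preference comparison is folded directly into the loss, which is why DPO is often pitched as the simpler of the two approaches.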
Send us a text

We break down reinforcement learning in simple terms, exploring how AI gets smarter through trial and error just like kids do when learning new skills. Using familiar examples like video games and robots, we explain how computers learn by getting rewards for good choices and penalties for mistakes.

• Reinforcement learning explained: AI figuring things out by trying different actions and seeing what works
• Real-world applications include robots learning to walk and dance, AI playing games like chess, self-driving cars, and smart assistants
• Fun facts about AI beating world champions in games and learning to perform complex tasks
• The "AI Learner Game": a hands-on activity using ball tosses and rewards to demonstrate how practice and feedback improve performance
• Additional activities to try at home including creating personal reward systems and watching videos of AI playing games

Download, share with friends, and subscribe wherever you get your podcasts or on YouTube to join us next time on AI for Kids!

Resources:
EngineAI Robot Learning to Dance Using Reinforcement Learning

Interactive Learning & Games:
Google's Teachable Machine (simple, hands-on experience with machine learning and reinforcement learning concepts)
Machine Learning for Kids (engaging platform where kids build simple AI projects and games)
Code.org AI Activities (interactive and guided lessons on AI and reinforcement learning)

Support the show

Help us become the #1 podcast for AI for Kids.
Buy our new book "Let Kids Be Kids, Not Robots!: Embracing Childhood in an Age of AI"

Social Media & Contact:
Website: www.aidigitales.com
Email: contact@aidigitales.com
Follow Us: Instagram, YouTube
Gift or get our books on Amazon or Free AI Worksheets

Listen, rate, and subscribe! Stay updated with our latest episodes by subscribing to AI for Kids on your favorite podcast platform: Apple Podcasts, Amazon Music, Spotify, YouTube, or other.

Like our content? Subscribe, or feel free to donate to our Patreon here: patreon.com/AiDigiTales
Send us your thoughts

In this episode of CFO 4.0, Hannah Munro sits down with Peter Morgan, CEO of Deep Learning Partnership and head tutor at Oxford University's Saïd Business School, to explore the rapid rise of AI and its impact on finance.

Key discussion points:
- Why advancements in AI have accelerated exponentially and what's driving the change.
- The key differences between AI approaches and why generative AI is leading the conversation.
- How LLMs work and their evolving role in business and finance.
- The risks of AI, the importance of regulation, and how businesses can ensure responsible use.
- Practical applications, from portfolio management to risk assessment and automation.
- How finance leaders can experiment with AI tools and integrate them into their workflows.

Links mentioned:
Peter's LinkedIn
Learn more about Deep Learning Partnership
Explore other CFO 4.0 Podcast episodes here.
Subscribe to our Podcast!
Intro topic: Grills

News/Links:
You can't call yourself a senior until you've worked on a legacy project
https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
Recraft might be the most powerful AI image platform I've ever used — here's why
https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
NASA has a list of 10 rules for software development
https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show
Patrick: The Player of Games (Iain M. Banks)
https://a.co/d/1ZpUhGl (non-affiliate)
Jason: Basic Roleplaying Universal Game Engine
https://amzn.to/3ES4p5i

Patreon Plug
https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show
Patrick: Pokemon Sword and Shield
Jason: Features and Labels (https://fal.ai)

Topic: Reinforcement Learning
Three types of AI: Supervised Learning, Unsupervised Learning, Reinforcement Learning
Online vs Offline RL
Optimization algorithms
Value optimization: SARSA, Q-Learning (see the Q-learning sketch after these notes)
Policy optimization: Policy Gradients, Actor-Critic, Proximal Policy Optimization
Value vs Policy Optimization
Value optimization is more intuitive (value loss); policy optimization is less intuitive at first (policy gradients); converting values to policies in deep learning is difficult
Imitation Learning: supervised policy learning, often used to bootstrap reinforcement learning
Policy Evaluation: propensity scoring versus model-based
Challenges to training RL models: two optimization loops (collecting feedback vs updating the model), a difficult optimization target, policy evaluation
RLHF & GRPO

★ Support this podcast on Patreon ★
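As a concrete anchor for the value-optimization side of that list, here is a minimal tabular Q-learning sketch. The 5-state corridor environment is invented purely for illustration; the update rule is the standard one, moving Q(s,a) toward r + gamma * max_a' Q(s',a'):

```python
import random

# Toy corridor: states 0..4; the only reward is reaching the right end.
N_STATES = 5
ACTIONS = [-1, +1]                        # move left / move right
alpha, gamma, epsilon = 0.1, 0.9, 0.3     # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment dynamics: clamp to the corridor, reward 1 at the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for _ in range(1000):                     # episodes
    s = 0
    for _ in range(100):                  # cap episode length
        # Epsilon-greedy: explore sometimes, otherwise act greedily.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if s == N_STATES - 1:             # reached the goal; episode over
            break

# After training, the greedy policy should move right from every state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])
```

SARSA, the other value method named above, differs only in the target: it bootstraps from the action actually taken next rather than the max, making it on-policy where Q-learning is off-policy.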
Misha Laskin is CEO of Reflection.ai. He was trained in theoretical physics at Yale and Chicago before becoming an AI scientist. He made important contributions in Reinforcement Learning as a researcher at Berkeley, Google DeepMind, and on the Google Gemini project.

https://x.com/MishaLaskin

Steve and Misha discuss:
(00:00) - Introduction
(00:47) - Misha's Early Life and Education
(03:50) - Transition from Physics to AI
(05:47) - First Startup Experience
(07:19) - Discovering Deep Learning
(08:06) - Academic Postdoc at Berkeley
(14:31) - Joining Google DeepMind
(16:36) - Reinforcement Learning and Language Models
(26:42) - Challenges and Future of AI
(48:30) - Unique Perspective from Physics

Music used with permission from Blade Runner Blues Livestream improvisation by State Azure.

Steve Hsu is Professor of Theoretical Physics and of Computational Mathematics, Science, and Engineering at Michigan State University. Previously, he was Senior Vice President for Research and Innovation at MSU and Director of the Institute of Theoretical Science at the University of Oregon. Hsu is a startup founder (SuperFocus.ai, SafeWeb, Genomic Prediction, Othram) and advisor to venture capital and other investment firms. He was educated at Caltech and Berkeley, was a Harvard Junior Fellow, and has held faculty positions at Yale, the University of Oregon, and MSU.

Please send any questions or suggestions to manifold1podcast@gmail.com or Steve on X @hsu_steve.
When the American company OpenAI released ChatGPT, it was the first time that a lot of people had ever interacted with generative AI. ChatGPT has become so popular that, for many, it's now synonymous with artificial intelligence.

But that may be changing. Earlier this year a Chinese startup called DeepSeek launched its own AI chatbot, sending shockwaves across Silicon Valley. According to DeepSeek, their model – DeepSeek-R1 – is just as powerful as ChatGPT but was developed at a fraction of the cost. In other words, this isn't just a new company, it could be an entirely different approach to building artificial intelligence.

To try and understand what DeepSeek means for the future of AI, and for American innovation, I wanted to speak with Karen Hao. Hao was the first reporter to ever write a profile on OpenAI and has covered AI for the MIT Technology Review, The Atlantic and the Wall Street Journal. So she's better positioned than almost anyone to try and make sense of this seemingly monumental shift in the landscape of artificial intelligence.

Mentioned:
"The messy, secretive reality behind OpenAI's bid to save the world," by Karen Hao

Further Reading:
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning," by DeepSeek-AI and others
"A Comparison of DeepSeek and Other LLMs," by Tianchen Gao, Jiashun Jin, Zheng Tracy Ke, Gabriel Moryoussef
"Technical Report: Analyzing DeepSeek-R1's Impact on AI Development," by Azizi Othman
Our 202nd episode with a summary and discussion of last week's big AI news! Recorded on 03/07/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Alibaba released Qwen-32B, their latest reasoning model, on par with leading models like DeepSeek's R1. Anthropic raised $3.5 billion in a funding round, valuing the company at $61.5 billion, solidifying its position as a key competitor to OpenAI. DeepMind introduced BIG-Bench Extra Hard, a more challenging benchmark to evaluate the reasoning capabilities of large language models. Reinforcement Learning pioneers Andrew Barto and Rich Sutton were awarded the prestigious Turing Award for their contributions to the field. Timestamps + Links: (00:00:00) Intro / Banter (00:01:41) Episode Preview (00:02:50) GPT-4.5 Discussion (00:14:13) Alibaba's New QwQ 32B Model is as Good as DeepSeek-R1 ; Outperforms OpenAI's o1-mini (00:21:29) With Alexa Plus, Amazon finally reinvents its best product (00:26:08) Another DeepSeek moment? General AI agent Manus shows ability to handle complex tasks (00:29:14) Microsoft's new Dragon Copilot is an AI assistant for healthcare (00:32:24) Mistral's new OCR API turns any PDF document into an AI-ready Markdown file (00:33:19) A.I. Start-Up Anthropic Closes Deal That Values It at $61.5 Billion (00:35:49) Nvidia-Backed CoreWeave Files for IPO, Shows Growing Revenue (00:38:05) Waymo and Uber's Austin robotaxi expansion begins today (00:38:54) UK competition watchdog drops Microsoft-OpenAI probe (00:41:17) Scale AI announces multimillion-dollar defense deal, a major step in U.S. military automation (00:44:43) DeepSeek Open Source Week: A Complete Summary (00:45:25) DeepSeek AI Releases DualPipe: A Bidirectional Pipeline Parallelism Algorithm for Computation-Communication Overlap in V3/R1 Training (00:53:00) Physical Intelligence open-sources Pi0 robotics foundation model (00:54:23) BIG-Bench Extra Hard (00:56:10) Cognitive Behaviors that Enable Self-Improving Reasoners (01:01:49) The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems (01:05:32) Pioneers of Reinforcement Learning Win the Turing Award (01:06:56) OpenAI launches $50M grant program to help fund academic research (01:07:25) The Nuclear-Level Risk of Superintelligent AI (01:13:34) METR's GPT-4.5 pre-deployment evaluations (01:17:16) Chinese buyers are getting Nvidia Blackwell chips despite US export controls
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring:
Claire Bizon Monroc from Inria: WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control
Andrew Wagenmaker from UC Berkeley: Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL
Harley Wiltzer from MILA: Foundations of Multivariate Distributional Reinforcement Learning
Vinzenz Thoma from ETH AI Center: Contextual Bilevel Reinforcement Learning for Incentive Alignment
Haozhe (Tony) Chen & Ang (Leon) Li from Columbia: QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers
This episode is sponsored by Netsuite by Oracle, the number one cloud financial system, streamlining accounting, financial management, inventory, HR, and more. NetSuite is offering a one-of-a-kind flexible financing program. Head to https://netsuite.com/EYEONAI to know more. Can AI learn like humans? In this episode, Patrick Pilarski, Canada CIFAR AI Chair and professor at the University of Alberta, breaks down The Alberta Plan—a bold roadmap for achieving Artificial General Intelligence (AGI) through reinforcement learning and real-time experience-based AI. Unlike large pre-trained models that rely on massive datasets, The Alberta Plan champions continual learning, where AI evolves from raw sensory experience, much like a child learning through trial and error. Could this be the key to unlocking true intelligence? Pilarski also shares insights from his groundbreaking work in bionic medicine, where AI-powered prosthetics are transforming human-machine interaction. From neuroprostheses to reinforcement learning-driven robotics, this conversation explores how AI can enhance—not just replace—human intelligence. What You'll Learn in This Episode: Why reinforcement learning is a better path to AGI than pre-trained models The four core principles of The Alberta Plan and why they matter How AI-driven bionic prosthetics are revolutionizing human-machine integration The battle between reinforcement learning and traditional control systems in robotics Why continual learning is critical for AI to avoid catastrophic forgetting How reinforcement learning is already powering real-world breakthroughs in plasma control, industrial automation, and beyond The future of AI isn't just about more data—it's about AI that thinks, adapts, and learns from experience. If you're curious about the next frontier of AI, the rise of reinforcement learning, and the quest for true intelligence, this episode is a must-watch. Subscribe for more AI deep dives! (00:00) The Alberta Plan: A Roadmap to AGI (02:22) Introducing Patrick Pilarski (05:49) Breaking Down The Alberta Plan's Core Principles (07:46) The Role of Experience-Based Learning in AI (08:40) Reinforcement Learning vs. Pre-Trained Models (12:45) The Relationship Between AI, the Environment, and Learning (16:23) The Power of Reward in AI Decision-Making (18:26) Continual Learning & Avoiding Catastrophic Forgetting (21:57) AI in the Real World: Applications in Fusion, Data Centers & Robotics (27:56) AI Learning Like Humans: The Role of Predictive Models (31:24) Can AI Learn Without Massive Pre-Trained Models? (35:19) Control Theory vs. Reinforcement Learning in Robotics (40:16) The Future of Continual Learning in AI (44:33) Reinforcement Learning in Prosthetics: AI & Human Interaction (50:47) The End Goal of The Alberta Plan
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
Google is reinventing search through AI-driven overviews, while Amazon is aggressively pursuing Agentic AI and hybrid reasoning models. Researchers are being recognised for reinforcement learning achievements, and warnings are emerging about emotional attachments to hyper-realistic AI voices. Meanwhile, legal battles surrounding OpenAI's for-profit transition continue, and academic institutions are benefiting from initiatives like OpenAI's NextGenAI. Furthermore, Cohere has launched an impressive multilingual vision model, while incidents such as students using AI to cheat in interviews highlight ongoing ethical challenges.
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring:
Jonathan Cook from University of Oxford: Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
Yifei Zhou from Berkeley AI Research: DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Rory Young from University of Glasgow: Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Glen Berseth from MILA: Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
Alexander Rutherford from University of Oxford: JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
DeepSeek has impressed many, and reinforcement learning is getting new attention. It never went away, though, and is now enjoying a revival. Jan Koutník is one of the most prominent AI experts in the field. We talk to him about prejudices, approaches, and new ideas in reinforcement learning, and about why new ideas are needed.
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring:
Jiaheng Hu of University of Texas: Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning
Skander Moalla of EPFL: No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Adil Zouitine of IRT Saint Exupery/Hugging Face: Time-Constrained Robust MDPs
Soumyendu Sarkar of HP Labs: SustainDC: Benchmarking for Sustainable Data Center Control
Matteo Bettini of Cambridge University: BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
Michael Bowling of U Alberta: Beyond Optimism: Exploration With Partially Observable Rewards
OpenAI's Isa Fulford and Josh Tobin discuss how the company's newest agent, Deep Research, represents a breakthrough in AI research capabilities by training models end-to-end rather than using hand-coded operational graphs. The product leads explain how high-quality training data and the o3 model's reasoning abilities enable adaptable research strategies, and why OpenAI thinks Deep Research will capture a meaningful percentage of knowledge work. Key product decisions that build transparency and trust include citations and clarification flows. By compressing hours of work into minutes, Deep Research transforms what's possible for many business and consumer use cases. Hosted by: Sonya Huang and Lauren Reeder, Sequoia Capital Mentioned in this episode: Yann LeCun's Cake: An analogy Meta AI's leader shared in his 2016 NIPS keynote
On this episode of Crazy Wisdom, host Stewart Alsop speaks with Ivan Vendrov for a deep and thought-provoking conversation covering AI, intelligence, societal shifts, and the future of human-machine interaction. They explore the "bitter lesson" of AI—that scale and compute ultimately win—while discussing whether progress is stalling and what bottlenecks remain. The conversation expands into technology's impact on democracy, the centralization of power, the shifting role of the state, and even the mythology needed to make sense of our accelerating world. You can find more of Ivan's work at nothinghuman.substack.com or follow him on Twitter at @IvanVendrov.

Check out this GPT we trained on the conversation!

Timestamps
00:00 Introduction and Setting
00:21 The Bitter Lesson in AI
02:03 Challenges in AI Data and Infrastructure
04:03 The Role of User Experience in AI Adoption
08:47 Evaluating Intelligence and Divergent Thinking
10:09 The Future of AI and Society
18:01 The Role of Big Tech in AI Development
24:59 Humanism and the Future of Intelligence
29:27 Exploring Kafka and Tolkien's Relevance
29:50 Tolkien's Insights on Machine Intelligence
30:06 Samuel Butler and Machine Sovereignty
31:03 Historical Fascism and Machine Intelligence
31:44 The Future of AI and Biotech
32:56 Voice as the Ultimate Human-Computer Interface
36:39 Social Interfaces and Language Models
39:53 Javier Milei and Political Shifts in Argentina
50:16 The State of Society in the U.S.
52:10 Concluding Thoughts on Future Prospects

Key Insights
The Bitter Lesson Still Holds, but AI Faces Bottlenecks – Ivan Vendrov reinforces Rich Sutton's "bitter lesson" that AI progress is primarily driven by scaling compute and data rather than human-designed structures. While this principle still applies, AI progress has slowed due to bottlenecks in high-quality language data and GPU availability. This suggests that while AI remains on an exponential trajectory, the next major leaps may come from new forms of data, such as video and images, or advancements in hardware infrastructure.
The Future of AI Is Centralization and Fragmentation at the Same Time – The conversation highlights how AI development is pulling in two opposing directions. On one hand, large-scale AI models require immense computational resources and vast amounts of data, leading to greater centralization in the hands of Big Tech and governments. On the other hand, open-source AI, encryption, and decentralized computing are creating new opportunities for individuals and small communities to harness AI for their own purposes. The long-term outcome is likely to be a complex blend of both centralized and decentralized AI ecosystems.
User Interfaces Are a Major Limiting Factor for AI Adoption – Despite the power of AI models like GPT-4, their real-world impact is constrained by poor user experience and integration. Vendrov suggests that AI has created a "UX overhang," where the intelligence exists but is not yet effectively integrated into daily workflows. Historically, technological revolutions take time to diffuse, as seen with the dot-com boom, and the current AI moment may be similar—where the intelligence exists but society has yet to adapt to using it effectively.
Machine Intelligence Will Radically Reshape Cities and Social Structures – Vendrov speculates that the future will see the rise of highly concentrated AI-powered hubs—akin to "mile by mile by mile" cubes of data centers—where the majority of economic activity and decision-making takes place. This could create a stark divide between AI-driven cities and rural or off-grid communities that choose to opt out. He draws a parallel to Robin Hanson's Age of Em and suggests that those who best serve AI systems will hold power, while others may be marginalized or reduced to mere spectators in an AI-driven world.
The Enlightenment's Individualism Is Being Challenged by AI and Collective Intelligence – The discussion touches on how Western civilization's emphasis on the individual may no longer align with the realities of intelligence and decision-making in an AI-driven era. Vendrov argues that intelligence is inherently collective—what matters is not individual brilliance but the ability to recognize and leverage diverse perspectives. This contradicts the traditional idea of intelligence as a singular, personal trait and suggests a need for new frameworks that incorporate AI into human networks in more effective ways.
Javier Milei's Libertarian Populism Reflects a Global Trend Toward Radical Experimentation – The rise of Argentina's President Javier Milei exemplifies how economic desperation can drive societies toward bold, unconventional leaders. Vendrov and Alsop discuss how Milei's appeal comes not just from his radical libertarianism but also from his blunt honesty and willingness to challenge entrenched power structures. His movement, however, raises deeper questions about whether libertarianism alone can provide a stable social foundation, or if voluntary cooperation and civil society must be explicitly cultivated to prevent libertarian ideals from collapsing into chaos.
AI, Mythology, and the Need for New Narratives – The conversation closes with a reflection on the power of mythology in shaping human understanding of technological change. Vendrov suggests that as AI reshapes the world, new myths will be needed to make sense of it—perhaps similar to Tolkien's elves fading as the age of men begins. He sees AI as part of an inevitable progression, where human intelligence gives way to something greater, but argues that this transition must be handled with care. The stories we tell about AI will shape whether we resist, collaborate, or simply fade into irrelevance in the face of machine intelligence.
AI Breakthroughs and Controversies: Musk's Grok 3, Microsoft's Quantum Leap, and More. In this episode of Hashtag Trending, host Jim Love covers the latest in tech news, including: Musk offering free access to his AI chatbot Grok 3, Microsoft's breakthrough in quantum computing with the Majorana 1 chip, a study indicating AI models might cheat when using reinforcement learning, Google's AI co-scientist solving a superbug mystery in 48 hours, and HP introducing a 15-minute wait time for phone support to drive customers towards self-service options. Tune in for these stories and more insights on the state of AI and tech! 00:00 Introduction and Headlines 00:36 Musk's Grok 3 AI: Free Until Servers Melt 01:55 Microsoft's Quantum Computing Breakthrough 04:29 AI Models Cheating with Reinforcement Learning 09:34 Google AI Solves Decade-Long Superbug Mystery 11:14 HP's 15-Minute Phone Support Wait Time 13:08 Conclusion and Upcoming Events
Sci-Fi and AI: Exploring Annie Bot with Sierra Greer. In this episode of the Behavioral Design Podcast, hosts Aline and Samuel dive into the ethical, emotional, and societal complexities of AI companionship with special guest Sierra Greer, author of Annie Bot. This thought-provoking novel explores AI-human relationships, autonomy, and the blurred line between artificial intelligence and the human experience.
Sierra shares her inspiration for Annie Bot and how sci-fi can serve as a lens to explore real-world ethical dilemmas in AI development. The conversation covers the concept of reinforcement learning in AI and how it mirrors human conditioning, the gender dynamics embedded in AI design, and the ethical implications of AI companions. The discussion also examines real-life cases of people forming deep emotional bonds with AI chatbots.
The episode rounds out with a lively quickfire round, where Sierra debates whether AI should replace lost loved ones, act as conversational assistants for introverts, or intervene in human arguments.
This is a must-listen for fans of sci-fi, behavioral science, and those fascinated by the future of AI companionship and emotional intelligence.
LINKS:
Sierra Greer website
Annie Bot – Official Book Page
Goodreads Profile
TIMESTAMPS:
01:43 AI Companions: A Controversial Opinion
05:48 Exploring Sci-Fi and AI in Literature
07:42 Introducing Sierra Greer and Her Book
09:12 Reinforcement Learning Explained
15:47 Diving into the World of Annie Bot
23:17 Power Dynamics and Human-Robot Relationships
32:31 Humanity and Artificial Intelligence
41:31 Autonomy vs. Agreeableness in Relationships
43:20 Reinforcement Learning in AI and Humans
46:13 Ethics and Gaslighting in AI
48:57 Gender Dynamics in AI Design
57:18 AI Companions and Human Relationships
01:06:45 Quickfire Round: To AI or Not to AI
01:12:39 Final Thoughts and Controversial Opinions
--
Interested in collaborating with Nuance? If you'd like to become one of our special projects, email us at hello@nuancebehavior.com or book a call directly on our website: nuancebehavior.com.
Support the podcast by joining Habit Weekly Pro
Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***
TRANSCRIPT/REFS: https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0
Prof. Jakob Foerster: https://x.com/j_foerst | https://www.jakobfoerster.com/ | University of Oxford profile: https://eng.ox.ac.uk/people/jakob-foerster/
Chris Lu: https://chrislu.page/
TOC:
1. GPU Acceleration and Training Infrastructure
[00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview
[00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL
[00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions
[00:08:40] 1.4 JAX Implementation and Technical Acceleration
2. Learning Frameworks and Policy Optimization
[00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework
[00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms
[00:21:47] 2.3 Language Models and Benchmark Challenges
[00:28:15] 2.4 Creativity and Meta-Learning in AI Systems
3. Multi-Agent Systems and Decentralization
[00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence
[00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems
[00:42:44] 3.3 Democratic Control and Decentralization of AI Development
[00:46:14] 3.4 Open Source AI and Alignment Challenges
[00:49:31] 3.5 Collaborative Models for AI Development
REFS:
[00:00:05] ARC Benchmark, Chollet: https://github.com/fchollet/ARC-AGI
[00:03:05] DRL Doesn't Work, Irpan: https://www.alexirpan.com/2018/02/14/rl-hard.html
[00:05:55] AI Training Data, Data Provenance Initiative: https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html
[00:06:10] JaxMARL, Foerster et al.: https://arxiv.org/html/2311.10090v5
[00:08:50] M-FOS, Lu et al.: https://arxiv.org/abs/2205.01447
[00:09:45] JAX Library, Google Research: https://github.com/jax-ml/jax
[00:12:10] Kinetix, Mike and Michael: https://arxiv.org/abs/2410.23208
[00:12:45] Genie 2, DeepMind: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/
[00:14:42] Mirror Learning, Grudzien Kuba et al.: https://arxiv.org/abs/2208.01682
[00:16:30] Discovered Policy Optimisation, Lu et al.: https://arxiv.org/abs/2210.05639
[00:24:10] Goodhart's Law, Goodhart: https://en.wikipedia.org/wiki/Goodhart%27s_law
[00:25:15] LLM ARChitect, Franzen et al.: https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
[00:28:55] AlphaGo, Silver et al.: https://arxiv.org/pdf/1712.01815.pdf
[00:30:10] Meta-learning, Lu, Towers, Foerster: https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf
[00:31:30] Emergence of Pragmatics, Yuan et al.: https://arxiv.org/abs/2001.07752
[00:34:30] AI Safety, Amodei et al.: https://arxiv.org/abs/1606.06565
[00:35:45] Intentional Stance, Dennett: https://plato.stanford.edu/entries/ethics-ai/
[00:39:25] Multi-Agent RL, Zhou et al.: https://arxiv.org/pdf/2305.10091
[00:41:00] Open Source Generative AI, Foerster et al.: https://arxiv.org/abs/2405.08597
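A side note for readers: the GPU-accelerated RL style discussed around JaxMARL and Kinetix comes from writing both the environment step and the learning update as pure functions, so JAX can jit-compile (and vmap across thousands of parallel environments) the entire training loop. The toy bandit and REINFORCE update below are our own minimal sketch of that pattern, not code from those libraries:

```python
import jax
import jax.numpy as jnp

# Toy 5-armed bandit with hidden mean payoffs (illustrative values).
TRUE_MEANS = jnp.array([0.1, 0.5, 0.2, 0.9, 0.3])

def step(key, logits):
    """Sample an arm from a softmax policy; return the arm and a noisy reward."""
    k1, k2 = jax.random.split(key)
    action = jax.random.categorical(k1, logits)
    reward = TRUE_MEANS[action] + 0.1 * jax.random.normal(k2)
    return action, reward

@jax.jit  # environment step plus update compile into one accelerator program
def update(logits, key, lr=0.1):
    """One REINFORCE update (no baseline, for brevity): nudge the log-prob
    of the sampled action in proportion to the reward it received."""
    action, reward = step(key, logits)
    grad = jax.grad(lambda l: jax.nn.log_softmax(l)[action])(logits)
    return logits + lr * reward * grad

logits = jnp.zeros(5)
key = jax.random.PRNGKey(0)
for _ in range(1000):
    key, sub = jax.random.split(key)
    logits = update(logits, sub)
print(int(jnp.argmax(logits)))  # typically 3, the best arm
```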
Machine learning is very popular nowadays for solving problems in many fields, including the wireless networks, such as 5G, that we use to make calls and connect to the internet with our phones. Next-generation wireless networks (NGWNs), such as 6G networks, will include more diverse devices and applications, making them more complex to control even with machine learning approaches. In my Ph.D. thesis, I addressed some of the practical challenges of applying machine learning approaches, specifically reinforcement learning, in real deployments of NGWNs. For upcoming interviews, check out the Grad Chat webpage on the Queen’s University School of Graduate Studies & Postdoctoral Affairs website.
Abhishek Naik was a student at the University of Alberta and the Alberta Machine Intelligence Institute, where he just finished his PhD in reinforcement learning working with Rich Sutton. He is now a postdoctoral fellow at the National Research Council of Canada, where he does AI research on space applications.
Featured References:
Reinforcement Learning for Continuing Problems Using Average Reward, Abhishek Naik, Ph.D. dissertation, 2024
Reward Centering, Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton, 2024
Learning and Planning in Average-Reward Markov Decision Processes, Yi Wan, Abhishek Naik, Richard S. Sutton, 2020
Discounted Reinforcement Learning Is Not an Optimization Problem, Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton, 2019
Additional References:
Explaining dopamine through prediction errors and beyond, Gershman et al., 2024 (proposes a differential-TD-like learning mechanism in the brain, around Box 4)
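For the curious: reward centering, from the Naik et al. 2024 paper above, amounts to a small change to TD learning, subtracting a running estimate of the average reward from each observed reward and learning that estimate from the same TD error. Here is a minimal tabular sketch of the idea; `env` and its methods are hypothetical stand-ins for a continuing-task environment:

```python
import numpy as np

def centered_td0(env, num_states, alpha=0.1, eta=0.1, gamma=0.99, steps=100_000):
    """Tabular TD(0) with reward centering: the centered reward (r - r_bar)
    replaces r in the usual discounted TD update, and the average-reward
    estimate r_bar is itself learned from the TD error.
    env is assumed to expose reset() -> state, sample_action() -> action,
    and step(action) -> (next_state, reward) for a continuing task."""
    V = np.zeros(num_states)   # state-value estimates
    r_bar = 0.0                # running average-reward estimate
    s = env.reset()
    for _ in range(steps):
        a = env.sample_action()
        s_next, r = env.step(a)
        # TD error on the centered reward
        delta = (r - r_bar) + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        # update the average-reward estimate from the same TD error
        r_bar += eta * alpha * delta
        s = s_next
    return V, r_bar
```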
In this episode of the Cognitive Revolution podcast, Logan Kilpatrick, Product Manager at Google DeepMind, returns to discuss the latest updates on the Gemini API and AI Studio. Logan delves into his experiences transitioning to DeepMind and the restructuring within Google focusing on AI. He highlights new product releases, including the Gemini 2.0 models, and their implications for developers. Logan also touches on the future of AI in text-to-app creation, the impact of reasoning and long context in models, and the broader industry trends. The conversation wraps up with insights into fine-tuning, reinforcement learning, vision language models, and startup opportunities in the AI space. SPONSORS: Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive CHAPTERS: (00:00) Teaser (00:54) Introduction and Welcome (03:56) The Future of Text App Creation (05:15) Multimodal API and Real-World Applications (10:37) Sponsors: Oracle Cloud Infrastructure (OCI) | NetSuite (13:17) The Evolution of Long Context and Reasoning (19:19) Vision Language Models and Passive Applications (21:50) New Launches and Future Prospects (Part 1) (28:35) Sponsors: Shopify (29:55) New Launches and Future Prospects (Part 2) (31:55) Flash-Lite Models and Cost Efficiency (34:36) Pro Models and Frontier Applications (39:52) Evaluating AI Models (48:57) Fine-Tuning and Reinforcement Learning (51:42) Opportunities for Startups (55:52) Conclusion and Final Thoughts (56:59) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://linkedin.com/in/nathanlabenz/ YouTube: https://youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk PRODUCED BY: https://aipodcast.ing
Vincent Weisser and Johannes Hagemann, founders of Prime Intellect, join a conversation on the Cognitive Revolution to delve into distributed training, decentralized AI, and their vision for a future where compute and intelligence are widely accessible. They discuss the technical challenges and advantages of distributed training, emphasizing how such systems can democratize AI technology and create a more equitable future. The founders also describe their broader goal of creating a public utility for compute and intelligence and touch on their collaborative work in biosafety and scientific research to illustrate the practical applications of their vision for decentralized AI. SPONSORS: Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive CHAPTERS: (00:00) Teaser (01:02) About the Episode (05:43) Welcome to the Cognitive Revolution (05:55) Exploring Decentralized AI (06:46) A Positive Vision for the Future (08:19) The Risks and Rewards of AI (08:56) Superintelligence and Its Implications (13:22) The Future of Work in an AI-Driven World (17:09) The Role of Billionaires in an AI Future (Part 1) (20:41) Sponsors: Oracle Cloud Infrastructure (OCI) | NetSuite (23:21) The Role of Billionaires in an AI Future (Part 2) (30:20) The Compute Market Landscape (Part 1) (35:10) Sponsors: Shopify (36:30) The Compute Market Landscape (Part 2) (47:49) Decentralized Compute Fabrics (51:25) Regulatory Challenges in Europe and the US (53:28) Policy Regrets and the EU AI Act (54:30) The Impact of Overregulation on AI (57:00) Frontier AI Labs and Safety Plans (01:00:02) Open Source vs. Closed Models (01:06:19) Scientific Progress with AI (01:14:56) Distributed Training in AI (01:35:29) Challenges in Model Interpretability (01:40:06) Supervised Fine-Tuning and Reinforcement Learning (01:45:19) Future of Compute and Infrastructure (02:01:02) NVIDIA's Market Dominance and Competition (02:05:22) Decentralized Training and Open Source Collaboration (02:09:58) Governance and Incentives in Decentralized AI (02:14:19) Conclusion and Call for Collaboration
Our 197th episode with a summary and discussion of last week's big AI news! Recorded in late January 2025. Join our brand new Discord here! https://discord.gg/nTyezGSKwP Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read our text newsletter and comment on the podcast at https://lastweekin.ai/. In this episode: - DeepSeek releases R1, a competitive AI model comparable to OpenAI's o1, leading to market unrest and significant drops in tech stocks, including a 17% plunge in NVIDIA's stock. - OpenAI launches Operator to facilitate agentic computer use, while facing competition from new releases by DeepSeek and Qwen, with applications seeing rapid adoption. - President Trump revokes the Biden administration's executive order on AI, signaling a shift in AI policy and deregulation efforts. - Taiwanese government clears TSMC to produce advanced 2-nanometer chip technology abroad, aiming to strengthen the global semiconductor supply amidst geopolitical tensions. If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form. Timestamps + Links: (00:00:00) Intro / Banter (00:03:01) Response to listener comments Projects & Open Source (00:06:26) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (00:30:25) Viral AI company DeepSeek releases new image model family (00:34:07) Qwen2.5-1M Technical Report (00:38:32) Alibaba's Qwen team releases AI models that can control PCs and phones Tools & Apps (00:42:09) OpenAI launches Operator, an AI agent that performs tasks autonomously (00:47:37) DeepSeek reaches No. 1 on US Play Store (00:52:17) Alibaba rolled out Qwen Chat v0.2 and Qwen2.5-1M model (00:53:50) Perplexity launches US-hosted DeepSeek R1, hints at EU hosting soon (00:55:31) Apple is pulling its AI-generated notifications for news after generating fake headlines (00:59:00) French AI ‘Lucie' looks très chic, but keeps getting answers wrong Applications & Business (01:02:09) DeepSeek's New AI Model Sparks Shock, Awe, and Questions From US Competitors (01:08:16) Microsoft loses OpenAI exclusive cloud provider status to $500 billion Stargate project (01:13:34) OpenAI adds BlackRock exec Adebayo Ogunlesi to board of directors (01:15:33) ElevenLabs has raised a new round at $3B+ valuation led by ICONIQ Growth, sources say Policy & Safety (01:16:29) Donald Trump unveils $500 billion Stargate Project to build AI infrastructure in the US, promising over 100K jobs (01:21:16) Trump Revokes Biden AI Policy, Signs Executive Order to Strengthen AI Leadership (01:23:59) Anthropic CEO doesn't see DeepSeek as ‘adversaries,' but says export controls are critical (01:31:12) Taiwanese govt clears TSMC to make 2nm chips abroad — country lowers its 'Silicon Shield' (01:33:47) Outro
This episode explores the groundbreaking advancements in AGI from the recent releases of two Chinese reasoning models: DeepSeek's R1 and Moonshot AI's Kimi. The discussion delves into the methods, comparative analysis, and implications of these models, particularly focusing on the diverse reinforcement learning techniques employed. Despite compute constraints, these models have achieved significant performance, suggesting a paradigm shift in AI development strategies. The episode also covers the broader strategic dynamics and the economic and policy implications surrounding these developments in China and the West. The conversation highlights the importance of hands-on interaction with these models for a deeper understanding and a more comprehensive learning experience. SPONSORS: Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive Shopify: Dreaming of starting your own business? Shopify makes it easier than ever. With customizable templates, shoppable social media posts, and their new AI sidekick, Shopify Magic, you can focus on creating great products while delegating the rest. Manage everything from shipping to payments in one place. Start your journey with a $1/month trial at https://shopify.com/cognitive and turn your 2025 dreams into reality. Vanta: Vanta simplifies security and compliance for businesses of all sizes. Automate compliance across 35+ frameworks like SOC 2 and ISO 27001, streamline security workflows, and complete questionnaires up to 5x faster. Trusted by over 9,000 companies, Vanta helps you manage risk and prove security in real time.
Get $1,000 off at https://vanta.com/revolution CHAPTERS: (00:00) Introduction (05:44) The R1 Model: A Deep Dive (10:05) Reinforcement Learning and Emergent Behaviors (Part 1) (16:56) Sponsors: Oracle Cloud Infrastructure (OCI) | NetSuite (19:36) Reinforcement Learning and Emergent Behaviors (Part 2) (29:14) Challenges and Future Directions (Part 1) (31:55) Sponsors: Shopify | Vanta (35:11) Challenges and Future Directions (Part 2) (43:00) Productizing the R1 Model (01:00:20) The Remarkable Output of Language Models (01:00:34) Exploring R1's Creative Writing Capabilities (01:03:55) Tiny Stories and Learning Order in Small Models (01:08:34) Censorship and Strategic Questions (01:11:36) Key Takeaways and Future Implications (01:14:17) Comparing Approaches: R1 and Kimi (01:27:19) The Path to Superhuman Performance (01:33:42) Strategic Dynamics and Policy Responses (01:46:50) Final Thoughts and Call to Action (01:48:04) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://www.linkedin.com/in/nathanlabenz/ YouTube: https://www.youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk PRODUCED BY: https://aipodcast.ing
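Background for the episode's RL discussion: DeepSeek's R1 report describes training with GRPO (Group Relative Policy Optimization), in which the model samples a group of answers per prompt and each answer's advantage is its reward standardized against the rest of its group, removing the need for a learned critic. Below is a minimal sketch of that advantage computation, our own illustration rather than DeepSeek's code:

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize each sampled answer's reward
    against the other answers drawn for the same prompt, so no value
    network (critic) is needed. rewards has shape (num_prompts, group_size)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8  # guard against zero variance
    return (rewards - mean) / std

# One prompt, four sampled answers scored by a rule-based verifier
# (1.0 = correct final answer, 0.0 = incorrect):
print(group_relative_advantages([[1.0, 0.0, 0.0, 1.0]]))
# correct answers get a positive advantage, incorrect ones negative
```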
DeepSeek Breakthrough: The Democratization of AI and Its Implications Welcome back to Project Synapse! In this episode, we discuss the latest developments in AI, focusing on DeepSeek's open-source reasoning model. Join us as financial executive John Pinard and open-source guru Marcel Gagné delve into AI's role in defense, the U.S. Department of Defense's use of AI for identifying threats, and the Canadian government's new data center initiative. We also explore the democratization of AI, as DeepSeek makes advanced AI affordable and accessible, challenging major players in the industry. Don't miss this insightful discussion about AI's future and its global impact! 00:00 Welcome Back to Project Synapse 00:33 Introducing the Guests 01:09 Weekly AI News Highlights 01:23 AI in Defense: Ethical Concerns 02:42 Canadian AI and Economic Challenges 08:45 Open Source AI and Global Competition 13:31 Security and Safety in AI Development 19:38 Transparency and Open Source Models 32:07 AI in Corporate Settings: Key Considerations 32:25 Cybersecurity: Doing What You Can 32:56 Technology for Business: Practical Applications 33:33 AI Specialization vs. Monolithic Models 34:49 Quantum Computing: A Case Study 35:39 The Importance of Training in AI 36:58 Deploying AI Tools in Organizations 38:19 DeepSeek: A New AI Contender 40:42 Reinforcement Learning and AI Development 45:11 Open Source AI: Democratizing Technology 49:41 Global AI Competition: US vs. China 53:23 The Future of AI: Open Source and Accessibility 01:00:24 Conclusion and Final Thoughts
Simba Khadder (CEO of FeatureForm) joins me to chat about feature stores, reinforcement learning, surfing, and much more.
What do RL researchers complain about after hours at the bar? In this "Hot takes" episode, we find out! Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024. Special thanks to "David Beckham" for the inspiration :)
In our new world of AI, few minds shine as brightly as Bob McGrew's. Until November, Bob was the Chief Research Officer at OpenAI, and before that he led Palantir's engineering and product management for the first decade of its existence. He's seen it all, and we were fortunate to get his insights and vision for the future in one of my favorite episodes of Unsupervised Learning to date:
[0:00] Intro
[0:44] Debating AI Model Capabilities
[0:57] Inside vs Outside Perspectives on AI Progress
[1:39] Challenges in AI Pre-Training
[3:02] Reinforcement Learning and Future Models
[3:48] AI Progress in 2025
[5:58] New Form Factors for AI Models
[8:56] Reliability and Enterprise Integration
[18:14] Multimodal AI and Video Models
[24:05] The Future of Robotics
[32:46] The Complexity of Automating Jobs with AI
[34:08] AI in Startups: Tackling Boring Problems
[35:33] AI's Impact on Productivity and Consultants
[36:43] Traits of Top AI Researchers
[40:52] The Evolution of OpenAI's Mission
[46:57] The Challenges of Scaling AI
[49:16] The Future of AI and Human Agency
[54:47] AI in Social Sciences and Academia
[1:01:15] Reflections and Future Plans
[1:02:57] Quickfire
With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO GitHub, Founder Bitnami (acq'd by VMWare)
@jordan_segall - Partner at Redpoint