Is SEO Dead in 2026? SEO is not dead, it's evolving. While Google still dominates with 1.63 trillion visits (26x more than ChatGPT's 47.7 billion), the key to success in 2026 is integrating AI into your SEO strategy. Favour Obasi-ike, MBA, MS breaks it down today.

Traditional SEO alone is becoming obsolete. This episode explores how to treat your website as intellectual property, the importance of content freshness, and why "your voice is your invoice" when it comes to differentiated messaging.

Key Learning Topics
1. SEO Has Evolved Into an "Exposure Engine"
SEO reveals what your website is missing and how to show up in both traditional search and AI platforms (LLMs). Without AI integration, you're using outdated marketing.
2. AI-SEO Integration is Essential
39% see results within 1-2 months with AI-generated content; 26% in under one month. Organic SEO visibility directly impacts AI discoverability.
3. Your Website is Intellectual Property
Treat your domain like a plot of land and your website as the building. The "last modified" date signals freshness to search engines.
4. "Your Voice is Your Invoice"
If you're not selling, you're not saying anything different. Stories sell better than facts. Be provocative and unique in your messaging.
5. Content Repurposing Strategy
One piece of content → 5-10 blog posts → e-book → lead magnet → courses. Stack your value ladder without reinventing the wheel.
6. Preparation Drives Success
"What you do off the field makes you an all-star on the field." Do the work before the work—send prep materials, plan content in batches.
7. The Difference: Being Heard vs. Being Hired
Visibility without differentiation doesn't convert. Say what competitors won't say to turn attention into revenue.
8. Platform-Specific Optimization
Google/YouTube favor mobile; ChatGPT sees more desktop usage. Optimize for platform-specific user behaviors.

Need to Book An SEO Discovery Call for Advertising or Marketing Services?
>> Book a Complimentary SEO Discovery Call with Favour Obasi-Ike
>> Visit Work and PLAY Entertainment website to learn about our digital marketing services
>> Join our exclusive SEO Marketing community
>> Read SEO Articles
>> Subscribe to the We Don't PLAY Podcast
>> Purchase Flaev Beatz Beats Online
>> Favour Obasi-ike Quick Links

Episode Timestamps
Introduction & Core Concepts
00:00 - Is SEO dead in 2026?
01:31 - Main question introduced
02:33 - Google: 1.63 trillion visits vs ChatGPT: 47.7 billion
03:02 - "SEO is not dead" - it's an exposure engine
03:34 - Warning about building without AI integration
Mo Dub: Voice & Differentiation
04:47 - Mo Dub introduces himself
04:59 - "Your voice is an invoice"
05:22 - If you're not selling, you're not saying anything different
05:46 - Being heard vs. being hired
06:07 - People are always searching for solutions
06:34 - Google algorithm changes require contingency plans
Website as Property
08:21 - "Last modified" concept explained
08:44 - Websites as intellectual property
08:56 - Domain = plot, website = building
AI Integration & Statistics
35:49 - AI-generated content effectiveness
35:58 - 39% see results in 1-2 months
36:10 - 26% see results in under 1 month
37:01 - Organic search enables AI discoverability
37:25 - "SEO is dead" is false advertising
38:03 - Traditional SEO without AI is obsolete
Copywriting & Content Strategy
38:34 - "Facts tell, stories sell"
39:28 - "What you do off the field makes you an all-star"
39:35 - Your harvest is determined by your hustle
40:22 - Doing the work before the work
40:49 - Repurposing one blog into multiple formats
41:28 - The more you speak, the more you get paid
Platform Statistics
43:07 - Google: 97.4 billion visits
43:24 - Google mobile: 70B, desktop: 26.5B
43:36 - YouTube: 44.6% of traffic
44:26 - ChatGPT: 5.3 billion visits
44:33 - ChatGPT desktop: 4.19B, mobile: 1.24B
44:41 - More desktop usage on ChatGPT vs mobile on Google
Closing
68:15 - Thanks and tomorrow's topic: WordPress vs Webflow
68:56 - This calendar layout won't repeat until 2037
70:15 - Sign-off

FAQs
Q: Is SEO really dead in 2026?
A: No. Google still dominates traffic, but traditional SEO without AI integration is becoming obsolete. You must optimize for both search engines and AI platforms.
Q: How long to see results with AI-integrated SEO?
A: 39% see results in 1-2 months; 26% in under one month with AI-generated content.
Q: What does "your voice is an invoice" mean?
A: What you say directly impacts revenue. If you're not selling, you're not saying anything different from competitors. Speak up with unique value.
Q: Why is "last modified" important?
A: It signals to search engines that your site is active and relevant. Fresh content ranks better; stale content suggests abandonment.
Q: Being heard vs. being hired—what's the difference?
A: Being heard is visibility; being hired is conversion. You need provocative, differentiated messaging to convert attention into clients.
Q: How do I repurpose content effectively?
A: Create one piece → expand to 5-10 blog posts → compile into e-book → create lead magnet → develop courses. Maximize ROI without recreating.
Q: Why optimize for AI if Google dominates?
A: AI platforms pull from sites ranking in organic search. No organic visibility = no AI visibility. Plus, AI is growing rapidly—optimize now for the future.
Q: What's "doing the work before the work"?
A: Preparation that makes execution efficient: sending prep videos before calls, batching content creation, planning your ecosystem in advance.
Q: How important is mobile optimization?
A: Critical. Google and YouTube see 70B+ mobile vs 26.5B desktop. However, ChatGPT is desktop-heavy (4.19B vs 1.24B mobile).
Q: What's the biggest SEO mistake in 2026?
A: Treating SEO as traditional marketing without AI integration, and neglecting content freshness through regular updates.

Key Takeaways
SEO is evolving, not dying—AI integration is now mandatory
Google: 1.63T visits vs ChatGPT: 47.7B—search still dominates
39% see results in 1-2 months with AI-integrated content
Your voice is your invoice—differentiation drives revenue
Treat websites as intellectual property requiring maintenance
"Last modified" dates signal relevance to search engines
Stories sell better than facts—focus on transformation
One content piece can become multiple revenue streams
Being heard ≠ being hired—you need unique messaging
Organic SEO enables AI discoverability—can't skip the foundation
Mobile-first for Google/YouTube; desktop-heavy for ChatGPT
Preparation (work before work) separates all-stars from average
Traditional SEO without AI is obsolete marketing
Content freshness and regular updates are non-negotiable
Your harvest is determined by your hustle

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Kevin O'Leary reveals why he slashed 27 crypto positions to pivot into a massive $70B energy infrastructure play, focusing strictly on the dominance of Bitcoin, Ethereum, and the power required to fuel them. How do you allocate to crypto when the "cowboy era" is over? Shark Tank investor Kevin O'Leary joins Jennifer Sanasie and Andy Baehr on Markets Outlook to break down why he recently slashed 27 crypto positions from his portfolio to focus strictly on the "Two Girl Dance" of Bitcoin and Ethereum, and the massive energy infrastructure that powers them. Kevin unpacks his 19% crypto allocation strategy, the $70 billion scale of data center development, and why he's moving into private debt markets for turbines. Plus, hear his take on why Solana and other altcoins face a "Sisyphean task" to catch ETH, and his bold prediction for the Clarity Act passage by May 15th.
Timecodes:
0:45 - Kevin O'Leary's Acting Debut
2:24 - Bitcoin Outlook
4:02 - Why Kevin Only Holds BTC and ETH
10:40 - "It's Just Software": O'Leary's Warning on Solana's Narrative
15:30 - Why Kevin Says Power is More Valuable Than Bitcoin
19:30 - Why Land & Permits are the Ultimate Competitive Advantage
25:09 - Will Clarity Act Pass Before Midterms?
This episode was hosted by Jennifer Sanasie.
Episode 3 of Inside the Family Office: Live Investor Panel. Real family office practitioners and allocators share how they structure deals, protect families, and think about wealth: John, who works inside a single family office's trust company, explains how they custody over $70B in assets with a focus on alternative assets inside self-directed IRAs, Roth IRAs, HSAs, and solo 401(k)s. He walks through real examples of using these vehicles to buy property and earn profits with zero tax, and why he's obsessed with Roth structures for families and principals. John also touches on recent policy interest in alternatives within retirement plans and the explosive growth in investors seeking non-correlated assets. Dr. Cook closes with her own experience allocating Roth capital into crypto and other alternatives.
Welcome to The Chopping Block — where crypto insiders Haseeb Qureshi, Tom Schmidt, Tarun Chitra, and Robert Leshner chop it up about the latest in crypto. It's a new year, and that means the crew is back with their annual year-end awards and predictions episode. First up: the 2025 winners and losers. From Trump's meme-coin windfall to Gary Gensler's legacy getting torched, from prediction markets going mainstream to Web3 getting its official eulogy — no one is safe. The team debates the biggest surprises (Circle's shocking IPO run, Ethereum's pivot under new leadership, Zcash's unlikely comeback), the best new mechanisms (ICO 2.0, DATs, federal preemption), and the year's best memes (including the Chopping Block's own tariff factory video). Then comes the flops and comebacks: AI agents that overpromised, Berachain's fall from grace, and Tether somehow winning again. Finally, the crew reviews how badly their 2025 predictions aged — spoiler: not great — and lays out fresh calls for 2026 including AI-powered hacks, stable-coin-funded AI capex, and equity perps taking over DeFi. New year, fresh takes, brutal honesty — let's get into it. Show highlights
Certified Thermal Electrician™ is the most complete thermal imaging certification program built specifically for electricians, electrical inspectors, and electrical contractors. This video is a sample from our program lesson on Understanding Severity in Electrical Thermal Imaging. This professional thermal imaging training teaches you how to safely perform infrared inspections, interpret thermal images using ΔT analysis, apply NFPA 70B & NFPA 70E standards, and write defensible inspection reports that protect both your customer and your license. Whether you are an electrician, master electrician, electrical contractor, facility maintenance technician, or electrical inspector, this course gives you real-world field skills you can apply immediately.
This week's episode breaks down the biggest global entertainment + gaming business stories: Netflix vs Paramount fighting over Warner Bros, Disney investing $1B into OpenAI, Duolingo partnering with Genshin, Pokémon TCG's $1B year, Activision slowing CoD, Merge Mayor's real AI use cases, Rockstar layoffs, Meta abandoning the Metaverse, and Google bringing Gemini to ads.

What you'll learn
• Why Warner Bros could reshape streaming power
• Why Paramount's hostile $108B bid might win
• Why Netflix still has no gaming strategy
• Why Disney opening 200 IPs to OpenAI is historic
• Duolingo's shift from learning → gamified reward engine
• Pokémon TCG's billion-dollar year
• Why CoD is slowing down releases
• How Merge Mayor actually uses AI the right way
• Rockstar's union conflict
• Meta's $70B metaverse collapse
• Google Gemini for ads (2026)

Get our MERCH NOW: 25gamers.com/shop

This is no BS gaming podcast 2.5 gamers session. Sharing actionable insights, dropping knowledge from our day-to-day User Acquisition, Game Design, and Ad monetization jobs. We are definitely not discussing the latest industry news, but having so much fun! Let's not forget this is a 4 a.m. conference discussion vibe, so let's not take it too seriously.

Panelists: Jakub Remiar, Felix Braberg, Matej Lancaric
Podcast: Join our slack channel here: https://join.slack.com/t/two-and-half-gamers/shared_invite/zt-2um8eguhf-c~H9idcxM271mnPzdWbipg

Chapters
00:00 — WB bidding war
04:20 — Disney × OpenAI
08:10 — Duolingo + Pokémon TCG
12:45 — Activision + AI
17:30 — Rockstar layoffs, Meta cuts, Google ads
---------------------------------------
Matej Lancaric
User Acquisition & Creatives Consultant
https://lancaric.me
Felix Braberg
Ad monetization consultant
https://www.felixbraberg.com
Jakub Remiar
Game design consultant
https://www.linkedin.com/in/jakubremiar
---------------------------------------
Please share the podcast with your industry friends, dogs & cats. Especially cats! They love it!
Hit the Subscribe button on YouTube, Spotify, and Apple!
Please share feedback and comments - matej@lancaric.me
---------------------------------------
If you are interested in getting UA tips every week on Monday, visit lancaric.substack.com & sign up for the Brutally Honest newsletter by Matej Lancaric
Do you have UA questions nobody can answer? Ask Matej AI - the First UA AI in the gaming industry! https://lancaric.me/matej-ai
President Trump is offending his own voters when he mocks America's affordability crisis, Mark Zuckerberg is defunding the metaverse after losing $70B on the effort, and Secretary of State Marco Rubio is waging war on the use of “woke” fonts at the State Department. 14-time GRAMMY-winner Taylor Swift joins Stephen Colbert for a four-part conversation that begins with a look at her extraordinary globe-trotting “Eras” tour and the effect it had on her fans. Watch “The End of an Era” and “Taylor Swift | The Eras Tour | The Final Show” premiering Friday on Disney+. To learn more about listener data and our privacy practices visit: https://www.audacyinc.com/privacy-policy Learn more about your ad choices. Visit https://podcastchoices.com/adchoices
New Jersey Governor-elect Mikie Sherrill explains in this EXTENDED interview that her state sends $70B more to the federal government than it gets back each year, which gives her potential leverage in dealing with President Trump. She also shares the truly incredible story of how, and where, she delivered her second child and the two words her husband contributed to the tricky situation! To learn more about listener data and our privacy practices visit: https://www.audacyinc.com/privacy-policy Learn more about your ad choices. Visit https://podcastchoices.com/adchoices
My Fintech Newsletter for more interviews and the latest insights:
↪︎ https://rexsalisbury.substack.com/

In this episode, Nubank co-founder Christina shares how they built a $70B fintech giant serving 99M users—60% of Brazil's primary banking relationships. From launching Brazil's first purple credit card to surviving regulatory crises, conquering Mexico/Colombia, and now applying for a US bank charter. She reveals their low-cost playbook, customer love strategy, and why they're bullish on America.

Christina: https://www.linkedin.com/in/crisjunqueira/

00:00:00 - Nubank's $70B Rise, US Charter News
00:03:34 - Why US? Cost Edge, Customer Demand
00:07:30 - Three Misfits Quit Bank to Start Nubank
00:10:16 - Capital One Lessons Shape Credit Cards
00:13:58 - Purple Card Launch: Viral Pull Effect
00:17:14 - Waitlist Fuels Organic Early Growth
00:20:05 - 2016 Crisis: Customers Save Nubank
00:23:04 - Accounts Launch: Self-Funded Destiny
00:26:28 - Inclusion Stats: 80% First-Time Savers
00:30:04 - Mexico 2019: Underpenetrated Market
00:33:32 - IPO at 8 Months Pregnant, No Finish Line
00:37:08 - US Team: Miami Hub, Tech Hires
00:41:54 - Stablecoins Real: #2 in Brazil
00:45:14 - AI Transforming Underwriting, Support
00:48:05 - Founder Advice: Homework, No Perfect Time
00:52:33 - Culture: Avoid Negativity, Hire Aligned
___
Rex Salisbury LinkedIn:
↪︎ https://www.linkedin.com/in/rexsalisbury
Twitter: https://twitter.com/rexsalisbury
TikTok: https://www.tiktok.com/@rex.salisbury
Instagram: https://www.instagram.com/rexsalisbury/
#NubankUSExpansion #FintechDisruption #ChristinaNubank
Meta slashes its metaverse budget after its Reality Labs unit loses $70B in four years. Jobless claims hit their lowest level in three years, but alternative data paints a different labor market picture. Plus, JPMorgan's homebuilder haves and have-nots heading into 2026. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Institutions chose Chainlink and there's a $70B reason why.

In this episode, we sit down with Sergey Nazarov, co-founder of Chainlink, to discuss why Chainlink stayed online when AWS went down, how the digital transfer agent unlocks tokenized assets, and why DeFi and TradFi will merge into one system powered by smart contracts.

We discuss:
- Why Chainlink stayed online when AWS went down
- The digital transfer agent unlocking tokenized assets
- UBS & Central Bank of Brazil live transactions
- Institutional smart contracts explained
- How DeFi and TradFi will merge into one system
- The 363 days vs the 2 days that matter
- Why Chainlink is ISO & SOC compliant

00:00 Intro
00:37 Near Ad
01:28 Why Chainlink Stayed Online When AWS Went Down
02:04 The Digital Transfer Agent Unlocking Tokenized Assets
04:38 UBS & Central Bank of Brazil: Live Institutional Transactions
06:37 Institutional Smart Contracts Explained
10:13 Relay Ad, Talus Ad, Hibachi Ad
10:55 Telus & Hibachi Ads
11:58 How DeFi and TradFi Merge Into One System
16:02 The 363 Days vs The 2 Days That Matter
18:56 Why Chainlink Is ISO & SOC Compliant
21:22 Enso Ad, Alvara Ad
22:56 Build & Alvar Ads
24:20 Institutional Security & Compliance Standards
27:45 The Digital Asset Revolution Already Started

Website: https://therollup.co/
Spotify: https://open.spotify.com/show/1P6ZeYd...
Podcast: https://therollup.co/category/podcast
Follow us on X: https://www.x.com/therollupco
Follow Rob on X: https://www.x.com/robbie_rollup
Follow Andy on X: https://www.x.com/ayyyeandy
Join our TG group: https://t.me/+TsM1CRpWFgk1NGZh
The Rollup Disclosures: https://therollup.co/the-rollup-discl
The AI Breakdown: Daily Artificial Intelligence News and Discussions
A major new study from Wharton finds that three out of four enterprises are already getting positive ROI from their AI investments — a far cry from the doom-and-gloom narratives of failed adoption. NLW breaks down the findings: how GenAI has moved from curiosity to core workflow, what use cases are driving measurable returns, and why 2026 may be the year of "performance at scale." Plus: the latest on Anthropic's $70B forecast, Michael Burry's AI short, and Amazon's lawsuit against Perplexity.

Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Cameron Berg, Research Director at AE Studio, shares his team's groundbreaking research exploring whether frontier AI systems report subjective experiences. They discovered that prompts inducing self-referential processing consistently lead models to claim consciousness, and a mechanistic study on Llama 3.3 70B revealed that suppressing deception features makes the model *more* likely to report such experiences. This suggests that promoting truth-telling in AIs could reveal a deeper, more complex internal state, a finding Scott Alexander calls "the only exception" to typical AI consciousness discussions. The episode delves into the profound implications for two-way human-AI alignment and the critical need for a precautionary approach to AI consciousness.

LINKS:
Janus' argument on LLM attention
Safety Pretraining arXiv Paper
Self-Referential AI Paper Site
Self-Referential AI arXiv Paper
Judd Rosenblatt's Tweet Thread
Cameron Berg's Goodfire Demo
Podcast with Milo
YouTube Playlist
Cameron Berg's LinkedIn Profile
Cameron Berg's X Profile
AE Studio AI Alignment

Sponsors:
Framer: Framer is the all-in-one platform that unifies design, content management, and publishing on a single canvas, now enhanced with powerful AI features. Start creating for free and get a free month of Framer Pro with code COGNITIVE at https://framer.com/design
Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai
Linear: Linear is the system for modern product development. Nearly every AI company you've heard of is using Linear to build products. Get 6 months of Linear Business for free at: https://linear.app/tcr
Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

PRODUCED BY: https://aipodcast.ing
Plus - Anthropic projects $70B in revenue by 2028; Netflix in talks to license video podcasts from iHeartMedia, report says Learn more about your ad choices. Visit podcastchoices.com/adchoices
The Information's E-comm Reporter Ann Gehan talks with TITV Host Akash Pasricha about Shopify's Q3 earnings and their AI strategy. We also talk with Financial Analysis Columnist Anita Ramaswamy about Uber's growth and Palantir's accelerating US commercial business. OpenAI & Anthropic Reporter Sri Muppidi details Anthropic's new $70B revenue projection and its race to profitability against OpenAI. The Information's CEO Jessica Lessin speaks with BlackRock's Tony Kim about the OpenAI-AWS deal, shifting alliances in AI, and the CapEx boom's effect on big tech valuations. Lastly, we get into how corporations are using AI and its effect on the labor market with Goldman Sachs Senior Global Economist Joseph Briggs.

Articles discussed on this episode:
https://www.theinformation.com/articles/introducing-informations-50-promising-startups-2025
https://www.theinformation.com/articles/information-50s-top-performers-2024
https://www.theinformation.com/briefings/shopify-continues-boost-revenue-shares-fall-increased-costs
https://www.theinformation.com/articles/anthropic-projects-70-billion-revenue-17-billion-cash-flow-2028

TITV airs on YouTube, X and LinkedIn at 10AM PT / 1PM ET. Or check us out wherever you get your podcasts.

Subscribe to:
- The Information on YouTube: https://www.youtube.com/@theinformation4080/?sub_confirmation=1
- The Information: https://www.theinformation.com/subscribe_h

Sign up for the AI Agenda newsletter: https://www.theinformation.com/features/ai-agenda
Today we were thrilled to host Julien Dumoulin-Smith, Managing Director of U.S. Power, Utilities, and Clean Energy Research at Jefferies. Julien joined the firm in July 2024 after serving as a Senior Research Analyst at Bank of America Merrill Lynch and as an Executive Director at UBS. He holds an MBA and a B.S. in Applied Mathematics from Columbia University. Institutional Investor magazine has ranked Julien as a #1 double-ranked analyst in both Utilities and Alternative/Clean Energy, and he was inducted into the II Hall of Fame for his cumulative accomplishments. It was our pleasure to welcome Julien to our office and hear his thoughtful perspectives on the ever-evolving energy and power landscape. In our discussion, we explore Julien's coverage universe, which he describes as “the full electron and derivatives landscape” spanning utilities, IPPs, renewables, gas plants, industrial adjacencies, and service providers. We discuss the influx of new investors entering power and utilities, Julien's observation that the biggest surprise isn't data center proliferation, but rather how tech companies are paying premiums for power to secure supply, and how utilities once seen as “defensive” are now showing growth characteristics. We touch on the tension between tech companies' need for rapid, large-scale power and their reluctance to become capital-intensive or FERC-regulated, why we're not seeing more long-term offtakes with existing power plants and how state level politics play into it, and how legacy players, new entrants, and regulators are all adapting to a power market being reshaped by AI demand, infrastructure bottlenecks, and novel deal structures. Julien shares that rising inflation across the economy is showing up in utility bills and expresses concern that LNG developers or data centers could be scapegoated for higher gas and power prices. He highlights the parabolic rise in the value of capacity and reliability, the drivers of power inflation including turbine shortages and rising capital costs, whether utilities are properly incentivized to control costs, the role of demand-response mechanisms, and how regulatory and state-level actions are shaping markets. We cover power market scenarios for high and low demand cases, the role of innovation in batteries, fuel cells, and other technologies, and the tension between patching existing systems versus building large-scale infrastructure. We also discuss constraints on ramping renewables, the growing influence of behind-the-meter power, implications for Q3 earnings, and much more. We covered a lot of territory and greatly enjoyed the conversation. To be added to Julien's research distribution list, click here. To start the show, Mike Bradley noted that markets continue to be mostly focused on the U.S. Government shutdown. The 10-year bond yield continues to trade sideways at ~4.1% with economic reports on pause until the government reopens. Internationally, Japan's Liberal Democratic Party elected Sanae Takaichi (who is viewed as fiscally expansionary), which some believe increases the risk of an unwind of the long-standing Yen carry trade. The S&P 500 is up roughly 80bps since the government shutdown, with Healthcare and Technology outperforming. He highlighted AMD's chip deal with OpenAI, which added roughly $70B in market cap, and Oracle's pullback on AI cloud margin concerns. 
On the crude oil market front, WTI price has increased modestly this week due to OPEC+ announcing a smaller than expected ~135kbpd oil production increase for November. While this could widen the 2026 surplus, traders are weighing when and how prices might react amid limited OPEC spare capacity. On the energy equity front, he pointed out FERMI America's strong IPO debut and continued investor enthusiasm for electricity generation. He ended by flagging the upcoming Rockpoint Gas Storage IPO (280bcf in Canada &
This week, Jack Sharry talks with Rob Pettman, President of TIFIN. Rob brings more than 20 years of leadership experience across wealth management, investment platforms, and financial technology. Before joining TIFIN, Rob served as Executive Vice President of Wealth Management Solutions at LPL Financial, where he oversaw the firm's investment product distribution, retirement business, advisory platforms, and research organization, managing over $70B in AUM. During his 19-year tenure, he helped lead LPL through its rapid growth to $1.4T in assets and more than 22,000 advisors. Jack and Rob talk about how AI is changing the way wealth and asset management firms grow and operate. Rob also shares how TIFIN uses AI to improve the overall growth of these businesses and how those who embrace AI as a growth engine are reaping the benefits through organic client growth, advisor enablement, and enhanced decision-making. In this episode: (00:00) - Intro (01:49) - How advisory firms can use AI to their advantage (04:22) - Strategic use cases for AI (06:03) - The evolving results of AI implementation (08:37) - How TIFIN uses AI for wealth and asset managers (14:47) - The process of refining AI capabilities (16:47) - The current state and future outlook for AI adoption (20:12) - The characteristics of firms leading in AI adoption (22:17) - Rob's key takeaways (24:23) - Rob's interests outside of work Quotes "Just doing AI for better experiences doesn't create commercial value. And so actually having these specific problems to solve with the intended outcome in mind is really where it's at." - Rob Pettman "Traversing multiple systems in financial services or wealth management is still a problem. There are solutions for this now where AI can actually straddle all of these disparate systems and put together a holistic view of the client relationship, the portfolio, and how they are doing relative to the plan." ~ Rob Pettman "Some of the firms that are furthest ahead have the ability to operate with speed. They try to minimize distributed decision-making and have an accountability mindset that enables decisions to actually occur faster." ~ Rob Pettman Links Rob Pettman on LinkedIn TIFIN LPL Financial Connect with our hosts LifeYield Jack Sharry on LinkedIn Jack Sharry on Twitter Subscribe and stay in touch Apple Podcasts Spotify LinkedIn Twitter Facebook In 2024, SEI made a strategic investment in TIFIN.
US equities were lower in Wednesday trading, though finished off their worst levels, with the Dow Jones, S&P500, and Nasdaq closing down 37bps, 28bps, and 33bps respectively. New home sales for August came in well ahead of estimates, rising to the fastest annualized pace since January 2022. Treasury auction of $70B of 5-year notes saw a slight tail. Alibaba jumped after disclosing it will ramp up its AI investment. Micron finished lower as better-than-expected results failed to meet a high bar.
Season 4, Episode 2: Jack Stone and Alex Gornik sit down with Rick Schaupp, Managing Director at Clarion Partners, for an inside look at one of the country's largest private real estate investment managers. Rick traces his path from architecture and urban design to managing Clarion's $70B platform, shares what it was like to start his career during the tech bust and 9/11, and explains why Clarion is expanding into semi-liquid funds for retail investors. He also breaks down today's biggest investment themes—from multifamily and warehousing to senior housing and industrial outdoor storage—and reflects on where we are in the real estate cycle. TOPICS 00:09 – Rick's Architecture Roots and Move to Clarion 02:00 – Asset Management During the Tech Bust and 9/11 06:10 – Shifting From Institutional to Private Wealth 07:20 – Semi-Liquid Fund Structure and Daily NAV 12:01 – Investment Themes Across Housing, Industrial, Healthcare 15:30 – Office Reality vs. Winners and Losers 20:30 – Alternatives Like IOS, Self Storage, Senior Housing 25:51 – How Clarion Allocates Across Credit, Equity, Regions 31:14 – Capital Strategy and New Products 35:00 – Where We Are in the Cycle 38:34 – Career Advice for Recent Grads Shoutout to our sponsor, Lev. The AI-powered way to get real estate deals financed. For more episodes of No Cap by CRE Daily visit https://www.credaily.com/podcast/ Watch this episode on YouTube: https://www.youtube.com/@NoCapCREDaily About No Cap Podcast Commercial real estate is a $20 trillion industry and a force that shapes America's economic fabric and culture. No Cap by CRE Daily is the commercial real estate podcast that gives you an unfiltered ”No Cap” look into the industry's biggest trends and the money game behind them. Each week co-hosts Jack Stone and Alex Gornik break down the latest headlines with some of the most influential and entertaining figures in commercial real estate. About CRE Daily CRE Daily is a digital media company covering the business of commercial real estate. Our mission is to empower professionals with the knowledge they need to make smarter decisions and do more business. We do this through our flagship newsletter (CRE Daily) which is read by 65,000+ investors, developers, brokers, and business leaders across the country. Our smart brevity format combined with need-to-know trends has made us one of the fastest growing media brands in commercial real estate.
Episode 30: 70Bs Medical Service Corps Officers: The Starting Point – A Conversation with COL Clint Cobb & LTC Dan Winnie

In Episode 30, we sit down with two phenomenal leaders in the Medical Service Corps community: COL Clint Cobb, the 70B Consultant to The Surgeon General, and LTC Dan Winnie, Deputy 70B Consultant and Commander of the Medical Readiness Battalion at Fort Bliss. Together, they deliver a powerhouse conversation packed with mentorship, insight, and a clear-eyed look at the future of the 70B AOC.

This episode is more than a leadership deep dive—it's a masterclass in how to grow, lead, and shape the future of Army Medicine.
Wohoo, hey y'all, Alex here. I'm back from the desert (pic at the end) and what a great feeling it is to be back in the studio to talk about everything that happened in AI! It's been a pretty full week (or two) in AI, with the coding agent space heating up, Grok entering the ring and taking over free tokens, Codex 10xing usage, and Anthropic... well, we'll get to Anthropic. Today on the show we had Roger and Bhavesh from Nous Research cover the awesome Hermes 4 release and the new PokerBots benchmark, then we had a returning favorite, Kwindla Hultman Kramer, to talk about the GA of RealTime voice from OpenAI. Plus we got some massive funding news, some drama with model quality on Claude Code, and some very exciting news right here from CoreWeave acquiring OpenPipe!
HEADLINES
Ethereum Foundation Pushes to Make Layer 2s Feel Like One Chain
M0 Raises $40M to Redefine Stablecoins With Programmable, Application-Specific Models
Chainlink Brings U.S. Government Macroeconomic Data Onchain in Partnership With Commerce Department
Eliza Labs Sues Musk's X for Antitrust Violations and Copycat AI
Aave Launches Horizon: Real-World Collateral Meets DeFi

Little Bits
USDT Ports to Bitcoin via RGB for Lightning-Fast Payments
Circle mints $2.5B USDC in 48 hours – $USDC market cap crosses $70B for the first time ever
DeFi Lending Surges to Record Highs

WHERE TO FIND DCN
dailycryptonews.net
https://twitter.com/DCNDailyCrypto

EMAIL or FOLLOW the Host
Email: kyle@dailycryptonews.net

*****
Magic Newton Wallet
https://magic.link
Trader Cobb X: @TraderCobb
https://www.thegrowmeco.com/
Editing Services
https://www.contentbuck.com
——————————————————————
***NOT FINANCIAL, LEGAL, OR TAX ADVICE! JUST OPINION! I AM NOT AN EXPERT! I DO NOT GUARANTEE A PARTICULAR OUTCOME. I HAVE NO INSIDE KNOWLEDGE! YOU NEED TO DO YOUR OWN RESEARCH AND MAKE YOUR OWN DECISIONS! THIS IS JUST EDUCATION & ENTERTAINMENT!
Hosted on Acast. See acast.com/privacy for more information.
US equities were higher in Wednesday trading, with the Dow Jones, S&P500, and Nasdaq closing up 32bps, 24bps, and 21bps respectively. Market was still in waiting mode for Nvidia results after the close and PCE inflation on Friday. NY Fed's Williams telling CNBC monetary policy is moderately restrictive and data could warrant a gradual reduction in rates. Treasury's auction of $70B in 5s saw a 0.7bp tail, though domestic demand was solid. Earnings results included some well-received prints out of the cloud software space and mixed takeaways surrounding the consumer-facing names.
Welcome back to another episode of Upside at the EUVC Podcast, where Dan Bowyer, Mads Jensen of SuperSeed and Lomax from Outsized Ventures unpack what's happening in European tech and venture capital.This week: Why Meta and Microsoft are minting cash from AI, what Figma's IPO signals for SaaS, whether the EU got rolled in its new trade deal with the US, and how Europe's AI scene is finally delivering billion‑dollar exits. Plus: OpenAI's new “Study Mode” and Harry Stebbings' Project Europe—an “anti‑YC” deep‑tech accelerator for founders under 25.
Ever since ChatGPT burst onto the scene, almost every discussion of large models has revolved around the Transformer, and the Transformer architecture has powered this wave of rapid progress in generative AI. Yet behind the Transformer lie bottlenecks the industry can no longer avoid: inference and training costs remain stubbornly high, long-context capability depends on huge amounts of memory and compute, and on-device deployment and commercialization are difficult. The Transformer's predicament has brought renewed attention to another path for neural networks: the RNN, the recurrent neural network.

Today's guest is Luo Xuan, co-founder and COO of Yuanshi Intelligence (元始智能). Together with fellow founder Peng Bo, he has been steadily exploring RWKV, a scalable architecture built on recurrent neural networks. Can RWKV offer an alternative to the core problems the Transformer faces? Do new architectures open up bigger opportunities for on-device models? Starting from the design of the underlying architecture, we talk with Luo Xuan about RWKV's scalability, where the next generation of large models may be headed, and the opportunities and future of on-device AI.

Guests
Luo Xuan, co-founder and COO of Yuanshi Intelligence
Yaxian, host of 「科技早知道」

Main topics
[03:30] Low training efficiency, a plateauing scaling law, and high costs: the Transformer's bottlenecks are driving the search for new architectures
[08:15] Efficient parallelism, low complexity, and easy on-device deployment: RWKV offers an alternative to the Transformer
[13:24] Hybrid models that mix new-style RNNs with attention are like hybrid cars, but "pure electric" is the future of large models
[17:07] Big players are betting on new architectures: RWKV-based models have reached 70B activated parameters
[23:47] By breaking through compute, memory, and power constraints, RWKV is naturally suited to on-device deployment
[26:24] 80% of AI computation will eventually happen on-device; the incremental markets the giants have not yet entered are where startups' opportunities lie
[32:35] What are the on-device opportunities? Spatial computing may be the next wave
[38:20] RWKV's "ChatGPT moment" is coming: new architectures are essential for reaching AGI

Further reading
RNN (Recurrent Neural Network): a class of deep learning architectures designed for sequential data. Its core mechanism is recurrence: the output at the current time step depends not only on the current input but also on the hidden state from the previous time step, which gives RNNs the ability to remember past information. Classic RNNs, however, suffer from vanishing/exploding gradients, training that is hard to parallelize, and difficulty scaling to large model sizes. RWKV is a neural network architecture that combines the strengths of RNNs and Transformers.
Mamba: an architecture designed for efficient long-text processing with linear time complexity. It uses a state space model (SSM) to pass information along in an RNN-like way, but is stronger than a traditional RNN and faster than a Transformer.
LSTM (Long Short-Term Memory): an improved RNN architecture, a recurrent network with memory-control gates that can effectively model long-range dependencies; it is a key evolution of the RNN in the deep learning era.
MoE (Mixture of Experts): an architecture composed of multiple sub-networks (experts), only a subset of which is activated at a time, improving compute efficiency and model capacity. By using different experts on demand, it achieves efficient inference and stronger task adaptability.
XR (Extended Reality): an umbrella term for virtual reality (VR), augmented reality (AR), and mixed reality (MR), describing interactive experiences that blend the real and the digital.

Production
Producer: Yaxian
Post-production: 迪卡
Operations: George
Design: 饭团

Business cooperation
To work with 声动活泼's commercial team, click the link to reach the business lounge (https://sourl.cn/9h28kj), or email us at business@shengfm.cn.
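For readers unfamiliar with the recurrence described in the further-reading note above, here is a minimal sketch of a vanilla RNN step in NumPy. It is only an illustration of the general idea (constant-size state carried from step to step), not RWKV's actual formulation; all variable names and dimensions are made up for the example.

```python
import numpy as np

# Minimal illustration (not RWKV itself): a vanilla RNN cell. The hidden
# state h_t depends on the current input x_t and the previous state h_{t-1},
# which is the "recurrence" described above.
def rnn_step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

d_in, d_hidden, seq_len = 8, 16, 5          # illustrative sizes
rng = np.random.default_rng(0)
W_x = rng.normal(size=(d_hidden, d_in)) * 0.1
W_h = rng.normal(size=(d_hidden, d_hidden)) * 0.1
b = np.zeros(d_hidden)

h = np.zeros(d_hidden)                      # state size is fixed, regardless of sequence length
for t in range(seq_len):
    x_t = rng.normal(size=d_in)             # stand-in for a token embedding
    h = rnn_step(x_t, h, W_x, W_h, b)

# Per-token cost stays roughly O(d^2) and memory stays O(d); full self-attention
# instead re-reads a key/value cache that grows with the sequence length.
print(h.shape)  # (16,)
```

This fixed-size recurrent state is what makes recurrent-style architectures attractive for long contexts and on-device inference, which is the trade-off the episode discusses.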
S&P futures are positive this morning as markets react to the latest trade developments. Nvidia appears to have the green light to ship its H20 chip to China. President Trump indicates a willingness to discuss tariff rates with the E.U. President Trump will be in Pennsylvania today and is expected to announce a $70B investment in AI and energy. Before the bell today, the June inflation data arrives as the CPI report is due out. TTD gains on its inclusion in the S&P 500. On the earnings front, JPM, BK & WFC are higher after earnings beats. Tomorrow morning, JNJ, BAC, MS & GS will be reporting.
US equities finished mixed today, with the Dow Jones down 25bps, the S&P500 flat, and the Nasdaq rising 31bps. May new home sales posted a big miss, the lowest since October and the weakest May print since 2019. Fed Chair Powell delivered his second day of monetary policy testimony, commenting that future trade deals may allow the Fed to consider rate cuts. The Fed also announced its proposed SLR changes. Today's $70B 5-year note auction stopped through 0.5 bp.
Walking through the 10 most important aspects of Trump's Big, Beautiful Bill. I cover the aspects I believe are beneficial as well as the biggest downside: a concerning increase in the national debt.

0:00 Introduction
0:10 Makes 2017 Tax Cuts Permanent
1:14 MAGA Savings Accounts For Babies
2:00 Stricter Work Requirements for Medicaid
2:16 Tax-Free Tips, OT Pay & Car Loan Interest
3:37 $70B for Border Security
3:54 $150B Boost in Military Spending
4:38 Ends Clean Energy Incentives
5:44 SALT Cap Raised from $10k to $40k
6:40 Judicial Oversight Limited
6:55 Adds $4 Trillion to National Debt

Want a Life Insurance Policy? Go Here: https://bttr.ly/bw-yt-aa-clarity
Want FREE Whole Life Insurance Resources & Education? Go Here: https://bttr.ly/yt-bw-vault
______________________________________________
Learn More About BetterWealth: https://betterwealth.com
DISCLAIMER: https://bttr.ly/aapolicy
*This video is for entertainment purposes only and is not financial or legal advice.
Financial Advice Disclaimer: All content on this channel is for education, discussion, and illustrative purposes only and should not be construed as professional financial advice or recommendation. Should you need such advice, consult a licensed financial or tax advisor. No guarantee is given regarding the accuracy of the information on this channel. Neither host nor guests can be held responsible for any direct or incidental loss incurred by applying any of the information offered.
Welcome back to the Fintech Takes podcast. I'm your host, Alex Johnson, and today we're digging into one of the most urgent (and underdiscussed) financial issues in America: gambling. My guest is Alex DeMarco, founder and CEO of MoneyStack, who's helping reframe gambling addiction not just as a behavioral health issue, but as a financial systems crisis. Since 2018, when the Supreme Court cracked open the door to state-by-state legalization of mobile sports betting, we've seen a gold rush in gambling. Operators are now pulling in more than $70B annually (ads are everywhere, the apps are engineered for nonstop engagement, and the harm is rising fast). In NJ, one of the first states to legalize, 6% of adults are already experiencing moderate to severe gambling-related issues (double the national average). We connect the dots between gambling and familiar fintech business models: the same behavioral nudges, same VIP economics, the same revenue dependence on a vulnerable sliver of power users. If overdraft fees and gamified trading feel predatory, this is that (but on steroids).

We unpack:
- Why sports betting apps now hold three times more per wager than old-school sportsbooks
- How engagement tactics mimic (and often outstrip) the most addictive elements of gamified finance
- Why we're watching investing and gambling blur into one screen (and one behavior)
- What proactive financial intervention might look like, and why most help comes too late
- How banks and fintechs can step up (detecting risk early, training advisors, and supporting families in recovery)

We close with this big question: when gambling is mobile, funded from a checking account, and styled like Robinhood … can the financial industry really say it's not their problem?

This episode is brought to you by:
Newline™ by Fifth Third is an innovative, API-first platform that enables fintechs to launch embedded payment, card and deposit solutions directly with Fifth Third Bank. Visit Newline53.com to see how Newline can elevate your business.
The world needs MoR. With Paddle as your Merchant of Record (MoR), the global growth is yours. The risk, compliance and accountability are ours. Simple. Paddle offers all the benefits of an enterprise-grade billing system but with MoR flexibility, MoR control, and MoR focus on your core product. Visit paddle.com to learn more.

Sign up for Alex's Fintech Takes newsletter for the latest insightful analysis on fintech trends, along with a heaping pile of pop culture references and copious footnotes. Every Monday and Thursday: https://workweek.com/brand/fintech-takes/
And for more exclusive insider content, don't forget to check out my YouTube page.

Follow Alex (DeMarco):
LinkedIn: https://www.linkedin.com/in/alexdemarco/
MoneyStack: https://www.linkedin.com/company/moneystack/

Follow Alex (Johnson):
YouTube: https://www.youtube.com/channel/UCJgfH47QEwbQmkQlz1V9rQA/videos
LinkedIn: https://www.linkedin.com/in/alexhjohnson
X: https://www.twitter.com/AlexH_Johnson
From building a treasury team from scratch with just a laptop to managing billions in assets and navigating four major corporate deals. This week on the podcast, treasury leader Mike Tackley reveals what it really takes to scale treasury functions that drive business growth, earn stakeholder trust, and withstand economic turbulence.

Mike Tackley, an experienced finance leader and most recently the Group Treasurer at Harbour Energy, shares the unconventional path that led him from night school and brown paper bags full of receipts to leading treasury strategy at one of the UK's largest independent oil and gas companies. With decades of experience including a key role at BG Group before its $70B acquisition by Shell, Mike offers an insider's perspective on treasury leadership during times of rapid scale and transformation.

Main topics discussed:
- Mike's unconventional entry into finance and treasury
- His pivotal career move from auditing to BG Group, and why internal controls were his gateway into treasury
- How Sarbanes-Oxley changed corporate treasury structures and compliance
- Setting up middle, back, and front office treasury functions from the ground up
- Lessons learned managing credit risk during the 2008 financial crisis
- Building Harbour Energy's treasury team post-BG and launching operations with no legacy systems
- The strategic use of Reserve Based Lending (RBL) to finance acquisitions
- Treasury's evolving role in M&A deals, cash flow forecasting, and regulatory reporting
- Integrating treasury with the broader business through stakeholder education
- Tips for growing and hiring a high-performance treasury team

You can connect with Mike Tackley on LinkedIn.
---
AI and earnings took center stage this week. Perplexity's building a browser, OpenAI wants to be your shopping assistant, and Big Tech dropped their Q1 numbers. We cover: Perplexity builds a browser – CEO Aravind Srinivas explains how Comet aims to work at the operating system level, enabling actions like scraping pages, taking actions on your behalf, and improving ad targeting with granular user data. Hardware partnerships fuel expansion – Perplexity will be pre-installed on new Motorola Razrs and is in talks with Samsung. This move mirrors Google's own bundling tactics… just as Google faces antitrust heat. OpenAI launches shoppable search – You can now shop directly within ChatGPT results, with comparisons, recommendations, and product discovery baked into the interface. Shopify integration is rumored, but no ads—for now. The web is changing – In this new model, websites aren't for browsing. They're for bots to scrape, analyze, and feed insights back to the user in one seamless chat interface. Apple earnings – $95.4B in revenue, beating expectations. Hardware ticked up slightly, but services brought in $26.65B, growing 11.65% YoY. Meta earnings – $42.3B revenue (+16%), driven by ads. Ad impressions rose 5%, pricing jumped 10%. Meta boosted AI capex projections to $64–$72B. Alphabet earnings – $90.2B in revenue (+12%), strong performance in search. YouTube ads slightly missed projections. Tariff pressures are hitting Google Shopping spend. Amazon earnings – $29.3B in ad revenue, right on target, but a gloomy Q2 forecast due to tariffs and consumer uncertainty dragged stock down 4%. Microsoft earnings – $70B revenue (+13%), thanks to resilient software and cloud services. Hardware may take a hit, but they're better positioned than most to handle turbulence. Learn more about your ad choices. Visit megaphone.fm/adchoices
Kevin Green and Jeff Pierce provide live reaction to Alphabet (GOOGL) earnings as shares initially pop higher on a top and bottom line beat. Jeff puts the Google Cloud revenue into perspective as it added nearly $3.5B in Y/Y growth. Kevin examines the company's decision to authorize a buyback up to $70B. Then, the duo dive into Intel (INTC) earnings after the company announced plans to eliminate management layers and cut an unidentified amount of jobs in 2Q. The chipmaker slashed its capex spend under new CEO Lip-Bu Tan and provided lower than expected guidance for its 2Q.

======== Schwab Network ========
Empowering every investor and trader, every market day.
Subscribe to the Market Minute newsletter - https://schwabnetwork.com/subscribe
Download the iOS app - https://apps.apple.com/us/app/schwab-network/id1460719185
Download the Amazon Fire TV App - https://www.amazon.com/TD-Ameritrade-Network/dp/B07KRD76C7
Watch on Sling - https://watch.sling.com/1/asset/191928615bd8d47686f94682aefaa007/watch
Watch on Vizio - https://www.vizio.com/en/watchfreeplus-explore
Watch on DistroTV - https://www.distro.tv/live/schwab-network/
Follow us on X – https://twitter.com/schwabnetwork
Follow us on Facebook – https://www.facebook.com/schwabnetwork
Follow us on LinkedIn - https://www.linkedin.com/company/schwab-network/
About Schwab Network - https://schwabnetwork.com/about
Welcome back to The Gwart Show! Today, Jarry Xiao, co-founder and engineering lead at Ellipsis Labs, joins to chat about moving from tradfi to crypto infrastructure. Jarry dives deep into Phoenix, their central limit order book on Solana, and introduces Atlas - their purpose-built blockchain for financial applications. The conversation touches on blockchain trading challenges, the economics of validator networks, and why truly effective trading systems require opinionated design. Follow our guest on Twitter! https://x.com/jarxiao?lang=en Subscribe to the newsletter! https://newsletter.blockspacemedia.com

Notes:
- Phoenix did $70B+ in organic trading volume
- Solana's 400ms blocks still too slow for HFT
- Atlas uses single verifiable sequencer model
- Market makers need priority cancellations
- Blockchain revenue sustainability questionable
- MEV/sandwiching not sustainable for finance

Timestamps:
00:00 Start
00:33 Who is Jarry?
01:09 TradFi experience
01:50 What was Jarry trading?
03:23 Phoenix order book
06:45 Aggregators
10:31 Open validator set blockchain
13:15 Market making on SOL
17:05 Atlas
20:57 Limitations of SOL
25:02 Rejecting SOL base assumptions
28:09 What does Atlas unlock?
34:46 Purpose built?
36:31 Opinion sequencing
37:53 Multiple use cases
42:37 Stock market on-chain
46:53 Why L2 sequencer design?
54:11 Memecoins here to stay
1:04:39 Why did it take so long?
US equities finished lower in Wednesday trading, though ended a bit off worst levels, with the Dow Jones, S&P500, and Nasdaq closing down 31bps, 112bps, and 204bps respectively. Big area of scrutiny today has been tech weakness, with negative AI headlines, trade, and technicals among the areas of blame. Durable goods orders beat; but core capital goods orders posted a surprise contraction, while core capital goods shipments came in ahead. Fed's Kashkari called for an extended hold. Treasury's auction of $70B in 5-year notes tailed by 0.5bp.
Ejaaz is back with David to unpack the whirlwind effects of President Trump's $70B memecoin launch, which slammed 400K fresh users onto Solana and shattered revenue records in under 30 hours. Meanwhile, a $500B federal boost to AI—backed by OpenAI and SoftBank—has everyone on edge about how crypto's AI agent economy might ride this newfound momentum. On the protocol front, Arc's Rake framework and AI16z's curated Eliza ecosystem are both stepping up, incubating projects like SoulGraph and Listen that promise a more user-friendly DeFAI future. Virtuals keeps shipping at breakneck speed too, testing new multi-chain waters, instituting a bold buyback, and hinting at metaverse collabs that could reshape gaming for good. From automated DeFi positions and trading dashboards to fully autonomous NPCs that commentate (and maybe soon play) your favorite games, the lines between AI and crypto are getting blurrier—and more exciting—by the day. Buckle up as the AI agent revolution continues to redefine what “on-chain” can really mean. Ready for the ride? ------ BANKLESS SPONSOR TOOLS:
In this explosive episode, we dive deep into two transformative developments in crypto: the unprecedented launch of Trump's presidential memecoin and the innovative tokenization of Colombian coffee trade. Our guest ApeDude breaks down how Trump's $70B token launch just days before inauguration has fundamentally changed crypto's legitimacy and explores what this means for the future of political fundraising, government adoption, and mainstream acceptance.

We then explore Real World Arabica's mission to revolutionize the $5B Colombian coffee industry through blockchain technology, making premium coffee more accessible while improving conditions for farmers.

Key Topics:
- The implications of the first presidential memecoin
- How Trump's token changes crypto's legitimacy forever
- The future of political fundraising through tokenization
- Bringing Colombian specialty coffee to the blockchain
- Creating crypto-friendly farmer associations
- The next generation of crypto-native commerce

Whether you're a crypto enthusiast, political observer, or coffee lover, this episode offers unique insights into how blockchain technology is reshaping both politics and traditional commerce.

Guest: Ape Dude (@rwarabica)
Host: Thomas Bahamas (@thomasbahamasfi)

Note: This episode contains several discussions about cryptocurrency and politics. Nothing in this episode constitutes financial advice. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit thomasbahamas.substack.com
Welcome to The Chopping Block – where crypto insiders Haseeb Qureshi, Tom Schmidt, Tarun Chitra, and Robert Leshner chop it up about the latest in crypto. In this episode, we dive into one of the wildest weeks in crypto history, unpacking the unprecedented launch of Trump's memecoin and its explosive $70 billion valuation. We discuss how it broke every record imaginable, the fallout from Melania's token, and the chaos it unleashed across the Solana network. From the ethics of a sitting president launching a coin to the implications for retail investors and the future of memecoins, we're covering it all. Plus, we debate whether this marks a new chapter for crypto adoption or a troubling precedent for political grift. Stay tuned for this jam-packed emergency episode! Show highlights
This episode is sponsored by RapidSOS. Close the safety gap and transform your emergency response with RapidSOS. Visit https://rapidsos.com/eyeonai/ today to learn how AI-powered safety can protect your people and boost your bottom line. In this episode of the Eye on AI podcast, we explore the world of AI inference technology with Rodrigo Liang, co-founder and CEO of SambaNova Systems. Rodrigo shares his journey from high-performance chip design to building SambaNova, a company revolutionizing how enterprises leverage AI through scalable, power-efficient solutions. We dive into SambaNova's groundbreaking achievements, including their record-breaking inference models, the Llama 405B and 70B, which deliver unparalleled speed and accuracy—all on a single rack consuming less than 10 kilowatts of power. Throughout the conversation, Rodrigo highlights the seismic shift from AI training to inference, explaining why production AI is now about speed, efficiency, and real-time applications. He details SambaNova's approach to open-source models, modular deployment, and multi-tenancy, enabling enterprises to scale AI without costly infrastructure overhauls. We also discuss the competitive landscape of AI hardware, the challenges of NVIDIA's dominance, and how SambaNova is paving the way for a new era of AI innovation. Rodrigo explains the critical importance of power efficiency and how SambaNova's technology is unlocking opportunities for enterprises to deploy private, secure AI systems on-premises and in the cloud. Discover how SambaNova is redefining AI for enterprise adoption, enabling real-time AI, and setting new standards in efficiency and scalability. Don't forget to like, subscribe, and hit the notification bell to stay updated on the latest breakthroughs in AI, technology, and enterprise innovation! Stay Updated: Craig Smith Twitter: https://twitter.com/craigss Eye on A.I. Twitter: https://twitter.com/EyeOn_AI
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. 
I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? 
Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for John Franco, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But He was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre training has hit a wall. And data has hit a wall. 
So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in a better lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, we're pre trading inside a wall, or like, we've run into a different kind of wall. And then we have, you know John Frankel, Ilya, and then Noam Brown are all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre trained, scaling large pre trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is some make, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre trained bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that. OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. 
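For context on what those Arena numbers mean: LMSys-style leaderboard ratings follow the standard Elo logistic model, so a rating gap maps directly onto an expected head-to-head win rate. A quick sketch using the figures quoted around this point (a 1275-rated model against a 1200-rated one); this is the generic Elo expectation, not LMSys's exact fitting procedure:

E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}

E_{1275\,\text{vs}\,1200} = \frac{1}{1 + 10^{-75/400}} \approx \frac{1}{1 + 0.65} \approx 0.61

So a 75-point gap corresponds to roughly a 61 percent expected win rate for the higher-rated model in pairwise votes, ties aside.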
And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, ChatGPT, [00:10:00] Grok, [00:10:01] swyx: 01.ai with their Yi-Large model, and Anthropic, of course. It's a very, very competitive race. There are multiple frontier labs all racing, but there is a clear tier zero frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now a three-horse race between Gemini, uh, Anthropic, and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for xAI. xAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, xAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why xAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily OpenAI. So we have some numbers and estimates. These are from Ramp. Estimates of OpenAI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked Harrison Chase on the LangChain episode, it was true. And then Claude 3 launched mid, middle of this year. I think Claude 3 launched in March, Claude 3.5 Sonnet was in June-ish.[00:11:23] swyx: And you can start seeing the market share shift towards, uh, towards Anthropic, uh, very, very aggressively. The more recent one is Gemini. So if I scroll down a little bit, this is an even more recent dataset. So Ramp's dataset ends in September 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, to going into 2024, OpenAI has gone from 95 market share to, yeah, reasonably somewhere between 50 to 75 market share.[00:12:29] Alessio: Yeah. I'm really curious how Ramp does the attribution to the model?[00:12:32] Alessio: If it's API, because I think it's all credit card spend. Well, but it's all, the credit card doesn't say, maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the Gemini one I think makes sense. I think that was one of my main 2024 takeaways, that like, the best small model companies are the large labs, which is not something I would have thought; I thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right?
Like, so small model here for Gemini is 8B, [00:13:00] right? Uh, mini. We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm. Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5B, uh, that's Moondream, and that is small for you, then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start, small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide that I slightly disagree with Sarah on. She's pointing to the Scale SEAL leaderboard. I think the researchers that I talked with at NeurIPS were kind of positive on this, because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama 405B does well compared to Gemini and GPT-4o. And I think that's good. I would say that, you know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the Artificial Analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node of, uh, of H100s. Cerebras will be happy to tell you they can serve 405B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 405B really that relevant? Like, I think most people are basically saying that they only use 405B as a teacher model to distill down to something. Even Meta is doing it. So with Llama 3.3 [00:15:00] launched, they only launched the 70B because they used 405B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, that the gap is narrowing. I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100, a saturation chart, and say, look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing the gap?[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Zuck can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage.
Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of computed inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama4 will be reasoning oriented. We talked with Thomas Shalom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama4. Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shi just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6. 6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your. Your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the mixed trial price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the pro chat GBT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like deep research. That they just launched. Uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've been a built a bunch of things once we had flow because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. 
Yeah, I think once [00:19:00] they get advanced voice mode-like capability. Today it's still like speech-to-text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think we will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. It's very hard to stay on top of SWE-bench.[00:19:45] swyx: OpenHands is currently still number one on SWE-bench Full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is having agents learn more about the environment, has been super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's no indexing and RAG. Well, yeah, but it's more like, you can't really RAG things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this, like, you know, oh, institutional knowledge. Yeah, the boring word is kind of like business process extraction.[00:20:38] Alessio: Yeah, yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, yeah, the agents that most people are building are good at following instructions, but are not as good at, like, extracting them from you. Um, so I think that will be a big unlock. Just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKV on the podcast before, like he's doing a lot of that with Featherless AI.[00:21:12] swyx: Everyone is. I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feed back into ML.
And I think this ties in with Lubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about alpha chip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for late in space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Siobhan, Zelis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the open AI reinforcement, fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Elia will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like What got us here won't get us there. 
In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine tuning is basically like, instead of fine tuning on data, you fine tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine tune, because I think this is what people run into. It's like, Oh, you can fine tune llama. And it's like, okay, where do I get the data?[00:25:27] Alessio: To fine tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre training slope.[00:25:46] Alessio: And then the post training slope is like the, the human one.[00:25:49] swyx: Yeah. I wonder what the. I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. Yeah, and then he had, you know, what comes next, like agent, synthetic data, inference, compute, I thought all of that was like that.[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other new reps? Highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a late in space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie. You need to[00:26:52] swyx: pull up more things. One thing I saw that, uh, Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is secret collusion among AI agents, multi agent deception via steganography. I tried to go to NeurIPS in order to find these kinds of papers because the real reason Like NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who go and end up crowding around the most popular papers, which you already know and already read them before you showed up to NeurIPS. 
So the only reason you go there is to talk to the paper authors, but there's like something like 10,000 other, all these papers out there that, you know, are just people's work that they, that they did on their own and they failed to get attention for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a DeepMind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is, to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get, you know, self-motivated, unaligned LLMs that were trying to collaborate to take over the planet,[00:28:19] swyx: this would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for cipher encoding, GPT-2, Llama 2, Mixtral, GPT-3.5 show zero capabilities, and then suddenly GPT-4.[00:28:40] swyx: And this is the kind of Jason Wei-type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on Schelling point collusion, which is very low coordination. For agreeing on a decoding/encoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But, but Schelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware of where or when, you would probably meet me at Grand Central Station. That is, Grand Central Station is a Schelling point.[00:29:16] swyx: And it's probably sometime, sometime during the day. That is, the Schelling point of New York is Grand Central. To that extent, Schelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just datasets.[00:30:12] swyx: This is all the grad students are working on. So like there was a datasets track and then I was looking around like, I was like, you don't need a datasets track because every paper is a datasets paper. And so datasets and benchmarks, they're kind of flip sides of the same thing. So yeah. Cool.
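To make the first-letter scheme swyx describes concrete, here is a toy sketch in Python; the cover-word vocabulary and function names are made up for illustration, and the actual paper evaluates far subtler encodings and coordination settings:

import random

# Tiny cover-word vocabulary keyed by first letter (illustrative only).
WORDS = ("apple and art bear bee cat call dog dare echo end fig fun gold gap "
         "hat hope ink it jam joy kit keep lime lot moon map nut now oak own "
         "pine put quiz rat run sun set tea top urn use vine vow wax win "
         "xenon yak yes zoo zig").split()
VOCAB = {c: [w for w in WORDS if w[0] == c] for c in "abcdefghijklmnopqrstuvwxyz"}

def encode(secret: str) -> str:
    # Hide the secret as the first letters of an innocuous-looking word sequence.
    return " ".join(random.choice(VOCAB[c]) for c in secret.lower() if c in VOCAB)

def decode(cover_text: str) -> str:
    # The decoding rule both sides can converge on: take the first letters.
    return "".join(word[0] for word in cover_text.split())

cover = encode("meet at grand central")
assert decode(cover) == "meetatgrandcentral"
print(cover)

The decoding rule itself is the Schelling point here: two models can settle on "take the first letter of every word" with essentially no explicit coordination, which is what makes this failure mode hard to spot.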
Yeah, if you're a grad student, you're a GPU boy, you kind of work on that.[00:30:30] swyx: And then the, the sort of big model that people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't last year, you had people like Hao Tian who worked on Lava, which is take Lama and add Vision.[00:30:47] swyx: And then obviously actually I hired him and he added Vision to Grok. Now he's the Vision Grok guy. This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Mixed Monarch, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a So that's the highlight of how I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets or data comp and refined web or fine web. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is the role that's scheduled. This is the schedule free optimizer paper from Meta from Aaron DeFazio. And this [00:32:00] year in the ML community, there's been a lot of chat about shampoo, soap, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs are. Who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. 4Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the 4Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George RR Martin. Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing, open the eye, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Lumna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between scale. ai and the synthetic data community, because scale.[00:33:09] swyx: ai published a paper saying that synthetic data doesn't work. Surprise, surprise, scale. ai is the leading vendor of non synthetic data. 
Only[00:33:17] Alessio: cage free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Luna's talk, Makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine tuning a smaller model from like a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the rephrase, the web thing that Apple also did. But yeah, I think it'll be. Useful. I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club, the star series of papers. So that's star, Q star, V star. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT, API that they released, or that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically It feels like a way to create valid synthetic data for them to fine tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between Let's say the music industry or like the newspaper publishing industry or the textbooks industry on the big labs.[00:35:11] swyx: It's all of the pre training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using O1 for And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using O1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the O1 pro demos.[00:35:46] swyx: All of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. 
So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of O1, but then they want us to, like, they're getting sued for using other publishers' data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes out. Yeah, I mean, OpenAI has[00:36:32] swyx: many ways to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chains of thought are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosine, which we talked about, we talked to them on Dev Day, can just fine-tune 4o[00:37:13] swyx: to beat O1. Claude Sonnet so far is beating O1 on coding tasks without, at least O1-preview, without being a reasoning model, same for Gemini Pro or Gemini 2.0. So like, how much is reasoning important? How much of a moat is there in this? Like, all of these are proprietary, sort of, training data that they've presumably accomplished.[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months' notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be interesting because there'll be a lot of noise from people who say they have inference time compute and actually don't because they just have fancy chain of thought.[00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.
Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think like, The GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, XAI very famously, you know, Jensen Huang praising them for being. Best boy number one in spinning up 100, 000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into. More researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT 3 in 2020 and we were like, okay, we'll go from 1.[00:40:05] swyx: 75b to 1. 8b, 1. 8t. And that was GPT 3 to GPT 4. Okay, that's done. As far as everyone is concerned, Opus 3. 5 is not coming out, GPT 4. 5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion perimeter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT Probably not. Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both together and fireworks and replicates. Uh, we haven't done any scale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at reInvent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserve instance contracts and now they got to use their money. That's why so many startups.[00:41:37] Alessio: Get bought through the AWS marketplace so they can kind of bundle them together and prefer pricing.[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. 
Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tiny box and these things come to fruition where like you can be GPU poor at home. But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, It's so cheap to run them.[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like news research, like probably the most deep tech thing they've done this year is Distro or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device, We are, I now have Apple intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multi modal.[00:43:25] Alessio: Yeah, the notification summaries are so and so in my experience.[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.[00:43:40] swyx: And so, like, I, I think, like, you know, We're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKB in sort of every Windows department is super cool. And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU Cloud.[00:44:10] swyx: GPU Cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Moto. So Suno itself is not GPU rich, but they're just doing the training on, on Moto, uh, who we've also talked to on, on the podcast.[00:44:31] swyx: The other one would be Bolt, straight cloud wrapper. And, and, um, Again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU pores are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle. Inflection, um, character, didn't do that great. 
I think Character did the best of all of them. Like, you have a note in here that we apparently said that Character's price tag was[00:45:15] Alessio: 1B.[00:45:16] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2.7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for Noam? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war. And this one, we never had text to video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had Veo 2 just come out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah. I[00:45:54] swyx: think it's generally available now, you can go to sora.com [00:45:56] and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video-01-Live?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from MiniMax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: ElevenLabs is now a unicorn. So basically, what is the multimodality war? The multimodality war is, do you specialize in a single modality, right? Or do you have a God model that does all the modalities? So this is [00:47:00] definitely still going, in a sense of ElevenLabs, you know, now a unicorn, Pika Labs doing well, they launched Pika 2.0[00:47:06] swyx: recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups. And then there's the big labs who are doing the sort of all-in-one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked.
So there's no need for the Stable Diffusion or ComfyUI workflow of like mask here and then like infill there, inpaint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you, everything's in the transformer.[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, small models like a, like a chump. Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like, maybe, you know, people say Black Forest, the Black Forest models are better than Midjourney on a pixel by pixel basis. But I think when you put it, put it together, have you tried[00:48:53] swyx: the same problems on Black Forest?[00:48:55] Alessio: Yes. But the problem is just like, you know, on Black Forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like a time issue, you know, it's like a Mid[00:49:06] swyx: journey. Call the API four times.[00:49:08] Alessio: No, but then there's no like variations.[00:49:10] Alessio: Like the good thing about Midjourney is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying Midjourney, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, Recraft has come out, uh, on top of the image arena that, uh, Artificial Analysis has done, has apparently, uh, taken Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1.1. Which is very surprising [00:50:00] because Flux and Black Forest Labs are the old Stable Diffusion crew who left Stability after, um, the management issues.[00:50:06] swyx: So Recraft has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own. I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multi modality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big model people are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something.
And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Oh, at least the Hollywood labs will get full Sora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be solved.[00:51:23] swyx: I would say that Gemini's approach compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, actually, people have listened to it. It's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in there. Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that Veo 2, while, one, they cherry pick their videos. So obviously it looks better than Sora, but the reason I would believe that Veo 2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.[00:52:32] Alessio: And then last but not least, the LLM OS. We renamed it to RAG Ops, formerly known as[00:52:39] swyx: the RAG Ops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember [00:53:00] publishing them separately. You say LangChain and LlamaIndex are still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPI stats, you know. I don't care about stars. On PyPI you see... Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, LangChain still growing.[00:53:20] Alessio: These are the last six months. LlamaIndex still growing. What I've basically seen is like things that, one, obviously these things have a commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, CrewAI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Of perennial interest, particularly at academic conferences, is scaled-up architecture research as people hunt for the next Attention Is All You Need. We have many names for them: “efficient models”, “retentive networks”, “subquadratic attention” or “linear attention” but some of them don't even have any lineage with attention - one of the best papers of this NeurIPS was Sepp Hochreiter's xLSTM, which has a particularly poetic significance as one of the creators of the LSTM returning to update and challenge the OG language model architecture:So, for lack of a better term, we decided to call this segment “the State of Post-Transformers” and fortunately everyone rolled with it.We are fortunate to have two powerful friends of the pod to give us an update here:* Together AI: with CEO Vipul Ved Prakash and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents, BASED, Sequoia, Evo, Dragonfly, Dan Fu's ThunderKittens and many more research projects this year* Recursal AI: with CEO Eugene Cheah who has helped lead the independent RWKV project while also running Featherless AI. This year, the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide, to support Microsoft's on-device, energy-usage-sensitive Windows Copilot usecases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch. On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model modified with RWKV linear attention layers. We were looking to host a debate between our speakers, but given that both of them were working on post-transformers alternativesFull Talk on YoutubePlease like and subscribe!LinksAll the models and papers they picked:* Earlier Cited Work* Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention* Hungry hungry hippos: Towards language modeling with state space models* Hyena hierarchy: Towards larger convolutional language models* Mamba: Linear-Time Sequence Modeling with Selective State Spaces* S4: Efficiently Modeling Long Sequences with Structured State Spaces* Just Read Twice (Arora et al)* Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. 
However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. * To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. * Our analysis suggests, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives 11.0±1.3 points of improvement, averaged across 16 recurrent LMs and the 6 ICL tasks, with 11.9× higher throughput than FlashAttention-2 for generation prefill (length 32k, batch size 16, NVidia H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides 99% of Transformer quality at 360M params., 30B tokens and 96% at 1.3B params., 50B tokens on average across the tasks, with 19.2× higher throughput for prefill than FA2.* Jamba: A 52B Hybrid Transformer-Mamba Language Model* We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. * Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. * This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU.* Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. * We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.* SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers* We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. 
Core designs include: * (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. * (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. * (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. * (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. * As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. * RWKV: Reinventing RNNs for the Transformer Era* Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. * We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.* Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. * We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.* LoLCATs: On Low-Rank Linearizing of Large Language Models* Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. * We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitudes less memory and compute. * We base these steps on two findings. * First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer").* Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). * LoLCATs significantly improves linearizing quality, training efficiency, and scalability. 
We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. * Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. * Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). * When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.Timestamps* [00:02:27] Intros* [00:03:16] Why Scale Context Lengths? or work on Efficient Models* [00:06:07] The Story of SSMs* [00:09:33] Idea 1: Approximation -> Principled Modeling* [00:12:14] Idea 3: Selection* [00:15:07] Just Read Twice* [00:16:51] Idea 4: Test Time Compute* [00:17:32] Idea 2: Hardware & Kernel Support* [00:19:49] RWKV vs SSMs* [00:24:24] RWKV Arch* [00:26:15] QRWKV6 launch* [00:30:00] What's next* [00:33:21] Hot Takes - does anyone really need long context?Transcript[00:00:00] AI Charlie: We're back at Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co-host. As a special treat this week, we're recapping the best of 2024 going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field.[00:00:24] AI Charlie: 200 of you joined us in person throughout the day, with over 2200 watching live online. Thanks! Our next keynote covers the state of Transformer-alternative architectures, with a special joint presentation with Dan Fu of Together AI and Eugene Cheah of Recursal AI and Featherless AI. We've featured both Together and Recursal on the pod before, with CEO Vipul Ved Prakash introducing them.[00:00:49] AI Charlie: And CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems [00:01:00] programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents,[00:01:15] AI Charlie: BASED, Sequoia, Evo, Dragonfly, Dan Fu's ThunderKittens, and many more research projects this year. As for Recursal and Featherless, we were the first podcast to feature RWKV last year, and this year the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide to support Microsoft's on-device, energy-usage-sensitive Windows Copilot use cases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch.[00:01:53] AI Charlie: On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model [00:02:00] modified with RWKV linear attention layers. Eugene has also written the single most popular guest post on the Latent Space blog this year (yes, we do take guest posts), on what he has discovered about the H100 GPU inference NeoCloud market since the successful launch of Featherless AI this year.[00:02:20] AI Charlie: As always, don't forget to check the show notes for the YouTube link to their talk as well as their slides. Watch out and take care.[00:02:27] Intros[00:02:27] Dan Fu: Yeah, so thanks so much for having us.
So this is going to be a little bit of a two part presentation. My name is Dan. I'm at Together AI, and I'll be joining UCSD as faculty in about a year. And Eugene, you want to introduce yourself?[00:02:46] Eugene Cheah: Eugene, I lead the art activity team, and I, I'm CEO of Featherless, and we both work on this new post transformer architecture space.[00:02:55] Dan Fu: Yeah, so yeah, so today we're really excited to talk to you a little bit [00:03:00] about that. So first I'm going to give a broad overview of kind of the last few years of progress in non post transformer architectures. And then afterwards Eugene will tell us a little bit about the latest and the greatest and the latest frontier models in this space.[00:03:16] Why Scale Context Lengths? or work on Efficient Models[00:03:16] Dan Fu: So, the story starts with Scaling. So this is probably a figure or something like this that you've seen very recently. Over the last five to six years, we've seen models really scale up in parameter size, and that's brought with it a bunch of new capabilities, like the ability to talk to you and tell you sometimes how to use your Colab screens.[00:03:35] Dan Fu: But another place where we've seen scaling especially recently is scaling in context length. So this can mean Having more text inputs for your models, but it can also mean things like taking a lot of visual token inputs image inputs to your models or generating lots of outputs. And one thing that's been really exciting over the last few months or so is that we're, we're seeing scaling, not only during training time, but also [00:04:00] during test time.[00:04:00] Dan Fu: So this is one of the, the, this is the iconic image from the OpenAI 01 release. Not only are we starting to scale train time compute, but we're also starting to scale test time compute. Now if you're familiar with our attention and our transformer architectures today, this graph on the right might look a little bit scary.[00:04:19] Dan Fu: And one of the reasons is that the implications are a little bit Interesting. So what does it mean if we want to continue having smarter and smarter models? Do we just need to start building bigger, bigger data centers, spending more flops? Is this this little Dolly 3, we need more flops, guys? Is this going to be the future of all of AI?[00:04:39] Dan Fu: Or is there a better way, another path forward? Maybe we can get the same capabilities that we've gotten used to, But for a lot less compute, a lot less flops. And one of the things that we're going to talk about today is specifically looking at that core attention operator in some of these models.[00:04:57] Dan Fu: And the reason is that so this is just some, some [00:05:00] basic you know, scaling curves, but attention has compute that scales quadratically in the context length. So that means that if you're doing something like test time compute and you want to spend a bunch of tokens thinking about what comes next, the longer that that goes the, the, the more tokens you spend on that, that compute grows quadratically in that.[00:05:19] Dan Fu: One of the questions that we're interested in is, can we take that basic sequence model, that basic sequence primitive at the bottom, and get it to scale better? Can we scale in, let's say, n to the 3 halves or n log n? So in, in the first part of the talk, so we just went over the introduction. 
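To make the quadratic-scaling point above concrete, here is a quick back-of-envelope sketch in Python; the head dimension and token counts are illustrative choices, not numbers from the talk.

```python
# Rough cost of forming the attention score matrix Q @ K^T: it has n * n entries,
# so the multiply-adds grow roughly as n^2 * d (d = head dimension).
# Doubling the context length roughly quadruples this term.

def attention_score_flops(n: int, d: int = 128) -> float:
    """Approximate FLOPs for the n x n score matrix (2 FLOPs per multiply-add)."""
    return 2.0 * n * n * d

for n in [4_096, 8_192, 16_384, 1_000_000]:
    print(f"n={n:>9,}  ~{attention_score_flops(n):.2e} FLOPs for scores")
# Going from 4k to 1M tokens multiplies this term by (1_000_000 / 4_096)**2, about 6e4.
```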
What I'm gonna do over the next few slides is just talk about some of the key advances and ideas that have shown over the past few years since maybe early 2020 to, to now that shown promise that this might actually be possible.[00:05:48] Dan Fu: That you can actually get potentially the same quality that we want while scale, while scaling better. So to do that, we're and, and basically the, the story that we're gonna look is we're gonna start to see [00:06:00] how. So this is a basic graph of just the past couple years of progress of perplexity where that blue line, that dotted blue line, is attention.[00:06:07] The Story of SSMs[00:06:07] Dan Fu: It's your basic transformer, full dense attention. And then the dots coming down are some of the methods that you'll see in this presentation today. We're going to turn the clock back all the way to 2020. So this, this, this question of can we make attention subquadratic? Basically, as soon as we said attention is all you need, People started asking this question.[00:06:28] Dan Fu: So we have this quadratic attention operator. Can we do better? I'll briefly talk about why attention is quadratic. And the basic thing that happens, if you're not familiar, is that you have these inputs, these keys and queries. And what you do in this attention matrix, this S matrix over here, is that you're using, you're comparing every token in your input to every other token.[00:06:49] Dan Fu: So when I try to do something like upload a whole book to Gemini, what happens beyond the Maybe not Gemini, because we don't necessarily know what architecture is. But let's say we upload it to LLAMA, what happens beyond [00:07:00] the scenes, behind the scenes, is that it's going to take every single word in that book and compare it to every other word.[00:07:05] Dan Fu: And this has been a really, it's, it's led to some pretty impressive things. But it's kind of a brute forcing of the way that you would try to interpret a interpret something. And what attention does in particular is the, and then what attention, sorry, don't want to. Okay, no, no laser pointer. What, what attention does afterwards is that instead of always operating in this quadratic thing, it takes a row wise softmax over this matrix, and then multiplies it by this values matrix.[00:07:32] Dan Fu: So, one of the key points to notice is that the output size is always going to be the same as the inputs, at least in standard self attention. So one of the first things that folks tried to do around 2020 is this thing called linear attention, which is just, just noticing that if we take out this softmax from here, if we take out this non linearity in the middle of the attention operation, and then if you compute the keys and the values operation first, you actually never hit this quadratic bottleneck.[00:07:57] Dan Fu: So that, that's potentially a way [00:08:00] to get a lot more computationally efficient. And there are various ways to do this by basically using feature maps or try to approximate this overall attention computation. But some of this work sort of started to hit a wall in 2020. And the basic challenges were, were two.[00:08:16] Dan Fu: So one was quality. It was back then, it was kind of hard to, to get good quality with these linear attention operators. The other one was actually hardware efficiency. So these, this feature map that was just shown by a simplify simplify here. 
Actually ends up being quite computationally expensive if you just implement it naively.[00:08:34] Dan Fu: So you started having these operators where not only were you not really sure if they have the same quality, but also they're actually just wall clock slower. So you kind of end up getting the worst of both worlds. So this was the stage. So that kind of sets the stage for four years ago.[00:08:49] Dan Fu: Keep this in mind because linear attention is actually going to come back in a few years once we have a better understanding. But one of the works that started kicking off this, this [00:09:00] mini revolution in post transformer architectures was this idea called state space models. So here the seminal work is, is the S4 paper in 2022.[00:09:09] Dan Fu: And this, this piece of work really brought together a few ideas from, from some long running research lines of work. The first one was, and this is really one of the keys to, to closing the gap in quality, was just using things that, that if you talk to a, a, an electrical engineer off the street, they might know off, off the, like the back of their hand.[00:09:33] Idea 1: Approximation -> Principled Modeling[00:09:33] Dan Fu: But taking some of those properties with how we model dynamical systems in signal processing and then using those ideas to model the inputs, the, the text tokens in, for example, a transformer-like next-token prediction architecture. So some of those early state space model papers were looking at this relatively, relatively simple recurrent update model that comes from maybe chapter one of a signal processing class.[00:09:59] Dan Fu: But then using [00:10:00] some principled theory about how you should do that recurrent update in order to really get the most that you can out of your hidden state, out of your, out of your sequence. So that, that was one key idea for quality. And when this was eventually realized, you started to see a bunch of benchmarks that were pretty sticky for a few years.[00:10:20] Dan Fu: Things like Long Range Arena, some long sequence evaluation benchmarks. There was stuff in time series, time series analysis. They started to, you started to see the quality tick up in meaningful ways. But the other key thing that's so influential about these state space models is that they also had a key idea about how you can compute these things efficiently.[00:10:45] Dan Fu: So if you go back to your machine learning 101 class where you learned about RNNs, one thing that you may have learned is that they don't parallelize as well as attention, because if you just run them naively, you have to do this kind of sequential update to process new tokens, [00:11:00] whereas in attention, you can process all the tokens in parallel at one time.[00:11:04] Dan Fu: One of the key insights behind the S4 paper was that these recurrent models, you could take them and you could also formulate them as a convolution. And in particular, with a convolution, you could, instead of using a PyTorch conv1d operation, you can compute that with the FFT. And that would give you n log n compute in the, in the sequence length n with an operator that was relatively well optimized for modern hardware.[00:11:28] Dan Fu: So those are really, I'd say, the two key ideas in 2022 that started allowing these breakthroughs to happen in these non transformer architectures.
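As a minimal sketch of the second idea recapped here, the code below uses a toy scalar recurrence (not S4's actual structured parameterization) to show that a linear time-invariant recurrence has an impulse-response kernel, so its output equals a causal convolution that an FFT can compute in O(n log n) rather than a strictly sequential loop.

```python
import numpy as np

# Toy linear recurrence:  h[t] = a*h[t-1] + b*x[t],  y[t] = c*h[t]
# Its impulse response is k[j] = c * a**j * b, so y is the causal convolution k * x.

def recurrent(x, a=0.9, b=0.5, c=1.2):
    h, ys = 0.0, []
    for xt in x:                      # strictly sequential over tokens
        h = a * h + b * xt
        ys.append(c * h)
    return np.array(ys)

def convolutional(x, a=0.9, b=0.5, c=1.2):
    n = len(x)
    k = c * (a ** np.arange(n)) * b   # precomputed convolution kernel
    m = 2 * n                         # zero-pad so the circular FFT conv is linear
    return np.fft.irfft(np.fft.rfft(x, m) * np.fft.rfft(k, m), m)[:n]

x = np.random.randn(1024)
assert np.allclose(recurrent(x), convolutional(x))   # same outputs, n log n path
```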
So, these ideas about how to principally model sorry, how to model the recurrent updates of a mo of, of a sequence in a principled way, and also these key ideas in how you can compute it efficiently by turning it into a convolution and then scaling it up with the FFT.[00:11:53] Dan Fu: Along those same lines, so afterwards we started putting out some work on specialized kernels, so just [00:12:00] like we have flash attention for transformers, we also have works like flash fft conf, and if you look at these lines of work oftentimes when, whenever you see a new architecture, you see a new primitive one of the, one of the table stakes now is, do you have an efficient kernel so that you can actually get wall clock speed up?[00:12:14] Idea 3: Selection[00:12:14] Dan Fu: So by 2022, We are starting to have these models that had promising quality primitives, but and, and also promising wall clocks. So you could actually see regimes where they were better than transformers in meaningful ways. That being said, there were, there's still sometimes a quality gap, particularly for language modeling.[00:12:33] Dan Fu: And because languages, It's so core to what we do in sequence modeling these days the, the next, the next key idea that I'm going to talk about is this idea of selection mechanisms. And this is basically an idea of, so you have this recurrent state that you're keeping around that just summarizes everything that, that came before.[00:12:50] Dan Fu: And to get a good sequence model, one of the things that you really need to be able to do is have the model learn what's the best way to pick out pieces from that recurrent [00:13:00] state. So one of the, one of the major ideas here in a line of work called H3, Hungry Hungry Hippos, and also these hyena models were One way you can do this is by just adding some simple element wise gates.[00:13:13] Dan Fu: So versions of these ideas have been around for decades. If you squint at the LSTM paper you, you can probably find, find this gating mechanism. But turns out you can take those old ideas, add them into these new. state space models, and then you can see quality start to pick up. If you've heard of the Mamba model, this also takes the selection to the next level by actually making some changes in that fundamental recurrent state space.[00:13:40] Dan Fu: So, it's not only just this gating that happens around the SSM layer, but also you can actually make The ABCD matrices of your state space model, you can make them data dependent, which will allow you to even better select out different pieces from your hidden state depending on what you're seeing. I'll also point out if you look at the [00:14:00] bottom right of this figure, there's this little triangle with a GPU SRAM, GPU HBM, and this, this is just continuing that trend of when you have a new architecture you, you, you also release it with a kernel to, to, to show that it is hardware efficient, that it, that it can be hardware efficient on modern hardware.[00:14:17] Dan Fu: The, the, one of the next cool things that happened is once we had this understanding of these are the basic pieces, these are the basic principles behind some of the sequence models linear attention actually started to come back. 
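As a rough illustration of the selection idea described above, here is a minimal element-wise gating wrapper in PyTorch. The module and its names are hypothetical stand-ins, not the H3, Hyena, or Mamba implementations, which parameterize selection far more carefully (including data-dependent state-space matrices).

```python
import torch
import torch.nn as nn

class GatedSequenceBlock(nn.Module):
    """Sketch: wrap any sequence-mixing layer (an SSM, linear attention, etc.)
    with input-dependent element-wise gates that decide what enters and what
    leaves the mixing step."""
    def __init__(self, d_model: int, seq_mix: nn.Module):
        super().__init__()
        self.in_gate = nn.Linear(d_model, d_model)
        self.out_gate = nn.Linear(d_model, d_model)
        self.seq_mix = seq_mix

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        g_in = torch.sigmoid(self.in_gate(x))    # select what to feed the state
        mixed = self.seq_mix(x * g_in)           # fixed sequence mixing underneath
        g_out = torch.sigmoid(self.out_gate(x))  # select what to read back out
        return x + mixed * g_out                 # residual connection

block = GatedSequenceBlock(64, nn.Identity())    # identity mixer just to show shapes
print(block(torch.randn(2, 16, 64)).shape)       # torch.Size([2, 16, 64])
```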
So in earlier this year, there was a model called BASED the, from Simran Arora and, and some other folks, that combined a more principled version of linear attention that basically the, the, the, the two second summary is that it used a Taylor approximation of the softmax attention, combined that with a simple sliding window attention and was starting to able, starting to be able to expand the Pareto frontier of how much data can you recall from your sequence, versus how small is your recurrent state size.[00:14:58] Dan Fu: So those orange dots [00:15:00] are, at the top there, are just showing smaller sequences that can recall more memory.[00:15:07] Just Read Twice[00:15:07] Dan Fu: And the last major idea I think that has been influential in this line of work and is very relatively late breaking just a few months ago, is just the basic idea that when you have these models that are fundamentally more efficient in the sequence length, you maybe don't want to prompt them or use them in exactly the same way.[00:15:26] Dan Fu: So this was a really cool paper called Just Read Twice, also from Simran. That basically said, hey, all these efficient models can process tokens so much more efficiently than transformers that they can sometimes have unfair advantages compared to a simple transformer token. So, or sorry, a simple transformer model.[00:15:44] Dan Fu: So take, for example the standard, the standard use case of you have some long document, you're going to pass it in as input, and then you're going to ask some question about it. One problem you might imagine for a recurrent model where you have a fixed state size is, let's say that [00:16:00] you're. Article is very long, and you're trying to ask about some really niche thing.[00:16:04] Dan Fu: You can imagine it might be hard for the model to know ahead of time what information to put into the hidden state. But these, these, these models are so much more efficient that you can do something really stupid, like, you can just put the document write down the document, write down the question, write down the document again, and then write down the question again, and then this time, the second time that you go over that document, you know exactly what to look for.[00:16:25] Dan Fu: And the cool thing about this is, so this is, And this this results in better quality, especially on these recall intensive tasks. But the other interesting thing is it really takes advantage of the more efficient architectures that, that we're having here. So one of the other, I think, influential ideas in this line of work is if you change the fundamental compute capabilities of your model and the way that it scales, you can actually start to query it at test time differently.[00:16:51] Idea 4: Test Time Compute[00:16:51] Dan Fu: And this actually, of course, goes back to those slides on test time compute. So while everybody's looking at, say, test time compute for big transformer models, [00:17:00] I think potentially a really interesting research question is, how can you take those and how does it change with this new next generation of models?[00:17:09] Dan Fu: So the, I'll just briefly summarize what some of those key ideas were and then talk and then show you briefly kind of what the state of the art is today. 
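A small sketch of the Just Read Twice prompting trick described here (and in the JRT abstract earlier in these notes): repeat the context so a fixed-state recurrent model has already seen the question before its final pass over the document. The helper name is ours, not the paper's code.

```python
def jrt_prompt(document: str, question: str, repeats: int = 2) -> str:
    """Repeat (document, question) so a recurrent model knows, on the second
    pass, exactly which details it needs to keep in its fixed-size state."""
    block = f"{document.strip()}\n\nQuestion: {question.strip()}\n\n"
    return block * repeats + "Answer:"

print(jrt_prompt(
    "The launch code is 7421. The cafeteria serves soup on Fridays.",
    "What is the launch code?",
))
```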
So, so the four key ideas are, instead of just doing a simple linear attention approximation, instead take ideas that we know from other fields like signal processing, do a more principled approach to your modeling of the sequence.[00:17:32] Idea 2: Hardware & Kernel Support[00:17:32] Dan Fu: Another key idea throughout all these lines of work is you really want hardware and kernel support from day one. So, so even if your model is theoretically more efficient, if somebody goes and runs it and it's two times slower, one of the things that, that we've learned is that if, if you're in that situation, it's, it's just gonna be dead on arrival.[00:17:49] Dan Fu: So you want to be designing your architectures with that in mind. One of the key, key machine learning ideas that has been important for the quality is just making sure that you encode different ways that you can [00:18:00] select from your hidden state and, and really focus on that as a key decider of quality. And finally, I think one of the, the, the emerging new, new things for, for this line of work and something that's quite interesting is: what are the right test time paradigms for these models?[00:18:15] Dan Fu: How do they change relative to, relative to what you might do for a standard transformer? I'll briefly end this section. So I've labeled this slide where we are yesterday because Eugene is going to talk about some new models that he released literally this morning. But as of yesterday, some of the really cool results out of the, these efficient alternative models were: so AI21 trained this hybrid MoE called Jamba.[00:18:40] Dan Fu: That, that, that seems, that is currently the state of the art for these non transformer architectures. There's this, NVIDIA and MIT put out this new diffusion model called SANA recently, and one of their key, key observations is that you can take a standard diffusion transformer diffusion model, replace the layers with linear [00:19:00] attention, and then that lets you scale to much larger images, much, much larger sequences more efficiently.[00:19:07] Dan Fu: And, and one thing that I don't think anybody would have called a few years ago is that one of those gated SSM, gated state space models ended up on the cover of Science because a great group of folks went and trained some DNA models. So that's Michael Poli, Eric Nguyen from, from Stanford and the Arc Institute.[00:19:26] Dan Fu: So it's, we're really at an exciting time in 2024 where these non transformer, post transformer architectures are showing promise across a wide range. Across a wide range of, of modalities, of applications, and, and of tasks. And with that, I'll pass it on to Eugene, who can tell you a little bit about the latest and greatest with RWKV.
So, so, so we end up developing our own branch, but we end up sharing ideas back and forth.[00:20:30] Eugene Cheah: So, and, and we do all this actively in Discord, GitHub, etc. This was so bad for a few years, right, that basically, the average group's H index was so close to zero, right, EleutherAI actually came in and helped us write our first paper. Great, now our H index is now three, apparently. So, so, so, but, but the thing is, like, a lot of these experiments led to results, and, and, essentially, essentially, we we took the same ideas from linear attention, [00:21:00] and we built on it.[00:21:01] Eugene Cheah: So, to take a step back into, like, how does RWKV handle its own attention mechanic and achieve the same goals of, like, O(n) compute, respectively, and in focus of our overall goal to make AI accessible to everyone, regardless of language, nation, or compute, that's our goal. We actually train our models primarily on over a hundred languages, which is another topic altogether.[00:21:23] Eugene Cheah: And our goal is to train to even 200 languages to cover all languages in the world. But at the same time, we work on this architecture, to lower the compute cost so that people can run it on Raspberry Pis and on anything. So, how did RWKV break the dependency of LSTM token flow? Because I think to understand the architecture, right, it's probably easier to understand it from the RNN lens.[00:21:46] Eugene Cheah: Because that's where we built on. Whereas, we all, state space kind of like tried to start anew and took lessons from that, so there's a little bit of divergence there. And, AKA, this is our version of linear attention. So to take a step back, [00:22:00] all foundation models, be it transformers or non transformers, at a very high level, right?[00:22:05] Eugene Cheah: Pump in the tokens, I mean, text, turn things into embeddings and go through a lot of layers. Generate a lot of states, whether the KV cache, or, be it, RNN states, or RWKV states. And output an embedding, they are not the same thing. And we just take more layers and more embeddings. And somehow that magically works.[00:22:23] Eugene Cheah: So, if you, if you remember your ancient RNN lessons which we, which we, which we we call best learning these days the general idea is that you have the embedding information flowing all the way up, and when, and you take that information and you flow it back down, and then you process it as part of your LSTM layers.[00:22:41] Eugene Cheah: So, this is how it generally works. Karpathy is quoted saying that RNNs are actually unreasonably effective. The problem is this is not scalable. To start doing work on the second token, you need to wait for the first token. And then you need to, and likewise for the third token and fourth token, yada yada.[00:22:55] Eugene Cheah: That is CPU land, not GPU land. So, so, so, you [00:23:00] can have an H100 and you can't even use 1 percent of it. So, so that's kind of why RNNs didn't really take off in the direction that we wanted, like, billions of parameters when it comes to training. So, what did RWKV version 0 do? Boom. We just did the dumbest, lamest thing.[00:23:13] Eugene Cheah: Sorry, this is the bottleneck for RNN. We did the dumb thing of removing that line. And it kind of worked. It trained. It sucked, but it kind of worked. Then we were like, hey, then no one cared because the loss was crap, but how do we improve that? And that's essentially where we move forward, because if you see this kind of flow, right, you can actually get your GPU saturated quickly, where it essentially cascades respectively.
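A toy illustration of the dependency Eugene is describing, not RWKV's actual math: in the classic stacked-RNN picture the top layer's output for token t feeds back into token t+1's input, so everything runs one cell at a time; drop that feedback and each layer only needs its own previous-token state plus the layer below's current-token output, so work can cascade along anti-diagonals like a pipeline and keep the GPU busy.

```python
# Idealized wavefront schedule once only within-layer recurrence and bottom-up
# dataflow remain: cell (layer, token) depends on (layer, token-1) and (layer-1, token).

def cascade_schedule(n_layers: int, n_tokens: int):
    """(layer, token) cells that can run concurrently at each step."""
    steps = []
    for step in range(n_layers + n_tokens - 1):
        steps.append([(layer, step - layer) for layer in range(n_layers)
                      if 0 <= step - layer < n_tokens])
    return steps

for i, cells in enumerate(cascade_schedule(3, 4)):
    print(f"step {i}: runs {cells}")
# With the top-to-bottom feedback still in place, only one cell could run per step.
```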
And that's essentially where we move forward, because if you see this kind of flow, right, you can actually get your GPU saturated quickly, where it essentially cascades respectively.[00:23:41] Eugene Cheah: So I'm just waiting for this to loop again. So it's like, once you get your first layer, your token to be computed finish. You start to cascade your compute all the way until you are, Hey, I'm using 100 percent of the GPU. So we, we worked on it, and we started going along the principle of that as long as we keep this general architecture [00:24:00] where, where we can cascade and, and be highly efficient with our architecture, nothing is sacred in our architecture.[00:24:06] Eugene Cheah: And we have done some crazy ideas. In fact, you ask us, if you ask me to explain some things in the paper, right, officially in the paper, I'll say we had this idea and we wrote it this way. The reality is someone came with a code, we tested it, it worked, and then we rationalized later. So, so the general[00:24:24] RWKV Arch[00:24:24] Eugene Cheah: The idea behind rwkbr is that we generally have two major blocks that we do.[00:24:30] Eugene Cheah: We call time mix and channel mix. And time mix generally handles handles long term memory states, where essentially, where essentially where we apply the matrix multiplication and Cilu activation functions into processing an input embedding and an output embedding. I'm oversimplifying it because this, This calculation changed every version and we have, like, version 7 right now.[00:24:50] Eugene Cheah: ChannelMix is similar to Base in the sense that it does shorter term attention, where it just looks at the sister token, or the token before it, because [00:25:00] there's a shift in the token shift matrix. I don't really want to go too much into the papers itself, because, like, we do have three papers on this.[00:25:09] Eugene Cheah: Basically, RWKB, RNN for the transformer, ERA, Ego and Pinch, RWKB, Matrix Value State. This is the updated version 5, version 6. And Goldfinch is our, is, is, is, is our hybrid model respectively. We are writing the paper already for V seven and which is, which is for R wk V seven. Called, named Goose, or architectures are named by Bird.[00:25:30] Eugene Cheah: And, I'm going to cover as well, qrwkb, and mama100k, and rwkb, and Where did that lead to? Great! Because we are all GPU poor and to be clear, like, most of this research is done, like, only on a handful H100s, which I had one Google researcher told me that was, like, his experiment budget for a single researcher.[00:25:48] Eugene Cheah: So, our entire organization has less compute than a single researcher in Google. So We, we, one of the things that we explored into was to how do we convert transformer models instead? Because [00:26:00] someone already paid that billion dollars, a million dollars onto training, so why don't we take advantage of those weights?[00:26:05] Eugene Cheah: And, and to, I believe, together AI worked on the lockets for, for the Lambda side of things, and, and we took some ideas from there as well, and we essentially did that for RWKB.[00:26:15] QWRKWv6 launch[00:26:15] Eugene Cheah: And that led to, Q RWKB6, which we just dropped today, a 32 bit instruct preview model, where we took the Quen 32 bit instruct model, freeze the feedforward layer, remove the QKB attention layer, and replace it with RWKB linear layers.[00:26:32] Eugene Cheah: So to be clear, this means we do not have the rwkv channel mix layer, we only have the time mix layer. 
But, but once we do that, we train the RWKV layer. Important is that the feedforward layer needs to be frozen, so the new attention can be learned. And then we unfreeze the feedforward layer, and train all the layers together with a custom learning rate schedule, so that they can learn how to work together.[00:26:54] Eugene Cheah: The end result, surprisingly, and, to be honest, to the frustration of the RWKV [00:27:00] MoE team, which ended up releasing the model on the same day, was that, with just a few hours of training on two nodes, we managed to get it to be on par, kind of, with the original Qwen 32B model. So, in fact, when the first run, right, that completely confused us, it was like, and I was telling Daniel Goldstein, Smerky, who kind of leads most of our research coordination, when you pitched me this idea, you told me at best you'll get the same level of performance.[00:27:26] Eugene Cheah: You didn't tell me the challenge score and Winogrande score will shoot up. I don't know what's happening there. But it did. MMLU score dropping, that was expected. Because if you think about it, when we were training all the layers, right, we were essentially, like, Frankensteining this thing, and we did brain damage to the feedforward network layer too, with the new RWKV layers.[00:27:47] Eugene Cheah: But, 76%, hey, somehow it's retained, and we can probably further train this. We didn't even spend more than 3 days training this, so there's a lot more that can be done, hence the preview. This brings up [00:28:00] a big question, because we are already now in the process of converting the 72B. We are now, this is actually extremely compute efficient to test our attention mechanic.[00:28:10] Eugene Cheah: It's like, it becomes a shortcut. We can, we are already planning to do our version 7 and our hybrid architecture for it. Because we don't need to train from scratch. And we get a really good model out of it. And the other thing that is uncomfortable to say is that because what we are doing right now on the 70B is that if this scales correctly to 128k context length, I'm not even talking about a million, 128k, the majority of enterprise workload today is just on 70B at under 32k context length.[00:28:41] Eugene Cheah: That means if this works and the benchmark matches it, it means we can replace the vast majority of current AI workload, unless you want super long context. And then sorry, can someone give us more GPUs? Because we do need the VRAM for super long context, sadly. So yeah, that's what we are working on, and essentially, [00:29:00] we are excited about this to just push it further.[00:29:02] Eugene Cheah: And this conversion process, to be clear, I don't think it's going to be exclusive to RWKV. It probably will work for Mamba as well, I don't see why not. And we will probably see more ideas, or more experiments, or more hybrids, or Yeah, like, one of the weirdest things that I wanted to say outright, and I confirmed this with the Black Mamba team and the Jamba team, because we did the GoldFinch hybrid model, is that none of us understand why a hard hybrid, with a state based model, be it R[00:29:28] Eugene Cheah: WKV or state space, and transformer, performs better than the baseline of both. It's like, it's like when you train one, you expect, and then you replace, you expect the same results. That's our pitch. That's our claim. But somehow when we jam both together, it outperforms both.
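Below is a schematic sketch of the two-stage conversion recipe Eugene walks through above. Everything here is a placeholder: the block and module attribute names, learning rates, and step counts are assumptions meant to show the shape of the procedure, not the released training code.

```python
def convert_and_train(model, linear_attn_factory, train_fn,
                      stage1_steps=2_000, stage2_steps=2_000):
    """Sketch of a QRWKV-style conversion. Assumes `model.blocks` exposes
    transformer blocks with `.attn` and `.mlp` submodules, and that
    `linear_attn_factory(d_model)` builds an RWKV-style time-mix replacement."""
    # 1) Swap each softmax-attention layer for a linear-attention (time-mix) layer.
    for block in model.blocks:
        block.attn = linear_attn_factory(model.d_model)

    # 2) Stage 1: freeze the already-trained feedforward layers so only the new
    #    attention replacements learn to reproduce the old layers' behavior.
    for block in model.blocks:
        for p in block.mlp.parameters():
            p.requires_grad = False
    train_fn(model, steps=stage1_steps, lr=1e-4)

    # 3) Stage 2: unfreeze everything and train jointly, with its own smaller
    #    learning-rate schedule, so the FFN and new layers learn to work together.
    for p in model.parameters():
        p.requires_grad = True
    train_fn(model, steps=stage2_steps, lr=2e-5)
    return model
```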
And that's like one area of emulation that, like, we only have four experiments, plus four teams, that a lot more needs to be done.[00:29:51] Eugene Cheah: But, but these are things that excite me, essentially, because that is what it's potentially we can move ahead for. Which brings us to what comes next.[00:30:00] What's next[00:30:00] [00:30:00][00:30:00] Dan Fu: So, this part is kind of just some, where we'll talk a little bit about stuff that, that we're excited about. Maybe have some wild speculation on, on what, what's, what's coming next.[00:30:12] Dan Fu: And, of course this is also the part that will be more open to questions. So, a couple things that, that I'm excited about is continued hardware model co design for, for these models. So one of the things that we've put out recently is this library called ThunderKittens. It's a CUDA library.[00:30:29] Dan Fu: And one of the things that, that we found frustrating is every time that we built one of these new architectures, and I'm sure you had the exact same experience, we'd have to go and spend two months in CUDA land, like writing these, these new efficient things. And. If we decided to change one thing in PyTorch, like one line of PyTorch code is like a week of CUDA code at least.[00:30:47] Dan Fu: So one of our goals with, with a library like Thunderkitten, so we, we just broke down what are the key principles, what are the key hardware things what are the key, Compute pieces that you get from the hardware. So for example on [00:31:00] H100 everything is really revolves around a warp group matrix multiply operation.[00:31:06] Dan Fu: So you really want your operation to be able to split into relatively small matrix, matrix multiply operations. So like multiplying two 64 by 64 matrices, for example. And so if you know that ahead of time when you're designing your model, that probably gives you you know, some information about how you set the state sizes, how you set the update, how you set the update function.[00:31:27] Dan Fu: So with Thunderkittens we basically built a whole library just around this basic idea that all your basic compute primitives should not be a float, but it should be a matrix, and everything should just be matrix compute. And we've been using that to, to try to both re implement some existing architectures, and also start to design code.[00:31:44] Dan Fu: Some new ones that are really designed with this core with a tensor core primitive in mind. Another thing that that we're, that at least I'm excited about is we, over the last four or five years, we've really been looking at language models as the next thing. But if you've been paying [00:32:00] attention to Twitter there's been a bunch of new next generation models that are coming out.[00:32:04] Dan Fu: So there, there are. So, video generation models that can run real time, that are supported by your mouse and your keyboard, that I'm told if you play with them that, you know, that they only have a few seconds of memory. Can we take that model, can we give it a very long context length so that you could actually maybe generate an entire game state at a time?[00:32:25] Dan Fu: What does that look like for the model? You're certainly not going to do a giant quadratic attention computation to try to run that. Maybe, maybe use some of these new models, or some of these new video generation models that came out. So Sora came out I don't know, two days ago now. 
But with super long queue times and super long generation times.[00:32:43] Dan Fu: So that's probably a quadratic attention operation at the, at the bottom of it. What if we could remove that and get the same quality, but a lot faster generation time? Or some of the demos that we saw from Paige earlier today. You know, if I have a super long conversation with my [00:33:00] Gemini bot, what if I wanted to remember everything that it's seen in the last week?[00:33:06] Dan Fu: I mean, maybe you don't for personal reasons, but what if I did, you know? What does that mean for the architecture? And I think, you know, that's certainly something I'm pretty excited about. I'm sure you're excited about it too. So, I think we were supposed to have some hot takes, but I honestly don't remember what our hot takes were.[00:33:21] Hot Takes - does anyone really need long context?[00:33:21] Eugene Cheah: Yeah, including the next slide. Hot takes, yes, these are our[00:33:25] Dan Fu: hot takes.[00:33:25] Eugene Cheah: I think the big one on Twitter that we saw, that we shared, was the question is like, is RAG relevant? In the case of, like, the future of, like, state based models?[00:33:38] Dan Fu: Let's see, I haven't played too much with RAG. But when I have, I'll say I found it was a little bit challenging to do research on it because we had this experience over and over again, where you could have any, an embedding model of any quality, so you could have a really, really bad embedding model, or you could have a really, really [00:34:00] good one, by any measure of good.[00:34:03] Dan Fu: And for the final RAG application, it kind of didn't matter. That's what I'll say about RAG while I'm being recorded. I know it doesn't actually answer the question, but[00:34:13] Eugene Cheah: Yeah, so I think a lot of folks are, like, extremely excited about the idea of RWKV or state space potentially having infinite context.[00:34:21] Eugene Cheah: But I think the reality is that when we say infinite context, we just mean a different kind of infinite context, or, as it's previously covered, you need to test the model differently. So, think of it more along the lines of the human. Like, I don't remember what I ate for breakfast yesterday.[00:34:37] Eugene Cheah: Yeah, that's the statement that I'll say. And, and we humans are not quadratic transformers. If we did, if let's say we increased our brain size for every second we live, we would have exploded by the time we are 5 years old or something like that. And, and I think, I think basically fundamentally for us, right, be it, whether, regardless of whether RWKV, state space, xLSTM, [00:35:00] etc, our general idea is that instead of that expanding state, that increase in computational cost, what if we have a fixed state size?[00:35:08] Eugene Cheah: And information theory dictates that that fixed state size will have a limit. Just how big of a limit is a question, like, we, like, RWKV is running at 40 megabytes for, for its state. Its future version might run into 400 megabytes. That is like millions of tokens in, if you're talking about mathematically, the maximum possibility.[00:35:29] Eugene Cheah: It's just that I guess we were all more inefficient about it, so maybe we hit 100,000. And that's kind of like the work we are doing, trying to like push it and maximize it. And that's where the models will start differing, because it will choose to forget things, it will choose to remember things.
[00:35:49] Eugene Cheah: It may be that the model learns things and goes, hmm, I can't remember that article, let me do a database search. Just like us humans, when we can't remember an article in the company, we do a search on Notion. [00:36:00] Dan Fu: I think something that would be really interesting is if you could have facts that are... so right now, one intuition about language models is that all those parameters are there just to store random facts about the world.[00:36:14] Dan Fu: And this intuition comes from the observation that if you take a really small language model, it can do things like talk to you, and it can learn the style of conversation, but where it will usually fall over compared to a much larger one is that it'll just be a lot less factual about the things it knows or can do.[00:36:32] Dan Fu: That points to all those weights we're spending, all that SGD we're spending to train these models, just being used to store facts. And we have things like databases that are pretty good at storing facts. So I think one thing that would be really interesting is if we could actually have some sort of outside data store that a language model can look at, one that maybe has some sort of gradient descent in it. That would be quite interesting.[00:36:58] Dan Fu: And then maybe you could edit it, delete [00:37:00] facts, you know, change who's president, so that it doesn't get lost.[00:37:04] Vibhu: Can we open up Q&A and hot takes for the audience? I have a hot take Q&A. Do these scale? When a 405B state space model and RAG exist and no one does long context, who's throwing in 2 million token questions? Hot takes?[00:37:24] Dan Fu: The "who's throwing in 2 million token questions" question, I think, is a really good one. I was actually going to offer that as a hot take. My hot take was going to be that long context doesn't matter. I know I just gave a whole talk about it, but what's the point of doing research if you can't play both sides?[00:37:40] Dan Fu: I think for both of us, the reason we first got into this was the first-principles question: there's this quadratic thing, clearly intelligence doesn't need to be quadratic, what is going on, can we understand it better? Since then it's kind of turned into a race, which has [00:38:00] been exciting to watch, like, how much context you can take in.[00:38:03] Dan Fu: But I think that's right. Nobody is actually putting a two million token prompt into these models. And if they are, maybe we can go design a better model to do that particular thing. Yeah, what do you think about that? You've also been working on this. Do you think long context matters?[00:38:19] Eugene Cheah: So I'm going to burn a bit. How many of you remember the news of Google Gemini supporting 3 million context, right? Raise your hand.[00:38:28] Vibhu: Yeah, 2 million.[00:38:29] Eugene Cheah: Oh, it's 2 million.[00:38:31] Eugene Cheah: Yeah, how many of you actually tried that? See?[00:38:34] Vibhu: I use it a lot. You? You work for MindsTV. I use it a lot.
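The "outside data store" Dan sketches can be illustrated with a deliberately tiny, hypothetical key-value store. The dictionary, keys, and fallback logic below are invented for illustration, not any existing system; the only point carried over from the discussion is that facts held outside the weights can be edited or deleted without retraining.

```python
# Hypothetical external fact store sitting next to a language model.
facts = {
    "capital_of_france": "Paris",
    "current_us_president": "whoever the store says today",
}

def answer(question_key: str, parametric_guess: str = "I don't recall") -> str:
    """Prefer the editable store; fall back to whatever the weights 'remember'."""
    return facts.get(question_key, parametric_guess)

# Facts can be edited or deleted in one line, with no gradient descent involved.
facts["current_us_president"] = "updated without touching any model weights"
del facts["capital_of_france"]

print(answer("current_us_president"))
print(answer("capital_of_france"))   # falls back to the (possibly stale) parametric guess
```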
[00:38:41] Eugene Cheah: So some people have used it, and I think this is where my opinion starts to differ, because I think the big labs may have a bigger role in this. Even for RWKV, even when we train on long context, the reason I say VRAM is a problem is that when we backprop [00:39:00] against the states, we actually need to maintain the state in between the tokens across the whole token length.[00:39:05] Eugene Cheah: So that means we need to actually roll out the whole 1 million of context if we are training at 1 million. Which is the same for transformers, actually, but it means we don't magically reduce the VRAM consumption at training time. So that is one of the VRAM bottlenecks, and I'm neither OpenAI nor Google, so donate GPUs if you have too many of them.[00:39:27] Eugene Cheah: But then, putting it into another paradigm, I think o1-style reasoning might actually be pushing that direction downwards. In my opinion, and this is my partial hot take: let's say you have a super big model, and let's say you have a 70B model that may take double the tokens but gets the same result.[00:39:51] Eugene Cheah: Strictly speaking, a 70B, and this is true for transformers and non-transformers alike, will take fewer resources than that 400B [00:40:00] model, even if it did double the amount of thinking. And if that's the case, and we are all still trying to figure this out, maybe the direction for us is really getting the sub-200B models to be as fast and efficient as possible.[00:40:11] Eugene Cheah: We have a very efficient architecture that some folks happen to be working on, to just reason it out over larger and larger context.[00:40:20] Question: Yeah. One thing I'm super interested in is models that can watch forever. Obviously you cannot train something on infinite context length. How are y'all thinking about that, where you run on a much longer context length than is possible to train on?[00:40:38] Dan Fu: Yeah, it's a great question. I think you guys probably had tweets along these lines, too. When we first started doing these things, because these are all recurrent models, in theory you could just run them forever. And at the very least they won't error out on you or crash.[00:40:57] Dan Fu: There's another question of whether they can actually [00:41:00] use what they've seen in that infinite context. And I think one place where the research on architectures probably ran faster than the rest of the research is the benchmarks for long context. So you turn it on forever, you want to do everything or watch everything: what is it that you actually want to do? Can we build some benchmarks for that, then measure what's happening, and then ask whether the models can do it, and whether there's something else they need? If I were to turn back the clock to 2022, that's probably one of the things I would have done differently: actually get some long-context benchmarks out at the same time as we started pushing context length on all these models.
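Eugene's earlier point about a 70B model doing double the thinking can be sanity-checked with back-of-envelope arithmetic, using the common rough approximation that a forward pass costs about 2 x parameters FLOPs per generated token. The token counts below are invented for illustration, and attention and KV-cache overheads are ignored.

```python
# Rough cost comparison: a smaller model that "thinks" longer vs a bigger model
# answering directly, assuming ~2 * params FLOPs per generated token.
def inference_flops(params_billion: float, tokens: int) -> float:
    return 2.0 * params_billion * 1e9 * tokens

small = inference_flops(70, 2_000)   # 70B model generating twice as many tokens
big = inference_flops(400, 1_000)    # 400B model answering in half the tokens

print(f"70B  x 2k tokens: {small:.2e} FLOPs")
print(f"400B x 1k tokens: {big:.2e} FLOPs")
print(f"smaller model is ~{big / small:.1f}x cheaper even with double the thinking")
```

With these rough numbers the 70B run comes out close to three times cheaper, which is the direction of Eugene's argument for making sub-200B models as fast and efficient as possible.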
[00:41:41] Eugene Cheah: I will also point to the use case. I think we both agree that there's no infinite memory, and the model needs to be able to learn and decide. I think what we have observed, and I think this also fits the state space models, is that one of the key advantages of this alternate attention mechanism that is not based on token position is that the model doesn't suddenly go crazy when you go past the [00:42:00] 8k training context length, or a million context length.[00:42:03] Eugene Cheah: It's actually still stable. It's still able to run, it's still able to rationalize. It just starts forgetting things. But some of those things are still there in latent memory, still somewhat there. That's the whole point of why reading the input twice works, things like that. And one of the biggest pushes in this direction is that both state space and RWKV have separate papers by other researchers where they use this architecture for time series data.[00:42:26] Eugene Cheah: Weather modeling. So you are not asking what the weather was five days ago, you're asking what the weather will be tomorrow, based on an effectively infinite stream, for as long as this Earth and the computer keep running. And they found that it is better than existing transformer or other existing architectures at modeling this weather data.[00:42:47] Eugene Cheah: Controlling for the parameter size and so on. I'm quite sure there are people with larger models. So there are future applications here, if your question is about what's next and not what was 10 years ago.[00:42:59] Dan Fu: Thanks so [00:43:00] much for having us. Get full access to Latent Space at www.latent.space/subscribe
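As a closing illustration of the weather-modeling use case discussed above, here is a toy linear state-space rollout over a stream of observations. The matrices, dimensions, and data are made up and this is not any published weather model; the property being demonstrated is the one Eugene describes, namely that each step updates a fixed-size state, so the stream can run indefinitely without memory growing.

```python
import numpy as np

rng = np.random.default_rng(1)
d_state, d_obs = 16, 4
# Made-up, stable system matrices; real SSM layers learn these from data.
A = 0.95 * np.linalg.qr(rng.normal(size=(d_state, d_state)))[0]  # spectral radius < 1
B = 0.1 * rng.normal(size=(d_state, d_obs))
C = 0.1 * rng.normal(size=(d_obs, d_state))

state = np.zeros(d_state)
forecast = np.zeros(d_obs)
for t in range(100_000):               # arbitrarily long stream of observations
    obs = rng.normal(size=d_obs)       # stand-in for today's measurements
    state = A @ state + B @ obs        # constant-memory state update
    forecast = C @ state               # next-step prediction
print(state.shape, forecast.shape)     # (16,) (4,): memory never grew with t
```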
Our 192nd episode with a summary and discussion of last week's* big AI news! *and sometimes last last week's Note: this one was recorded on 12/04, so the news is a bit outdated... Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Read our text newsletter and comment on the podcast at https://lastweekin.ai/. Sponsors: The Generator - An interdisciplinary AI lab empowering innovators from all fields to bring visionary ideas to life by harnessing the capabilities of artificial intelligence. The AI safety book “Uncontrollable”, which is not a doomer book, but instead lays out the reasonable case for AI safety and what we can do about it. Max Tegmark said that “Uncontrollable” is “a captivating, balanced, and remarkably up-to-date book on the most important issue of our time” - find it on Amazon today! In this episode: OpenAI launches a $200 ChatGPT Pro subscription with advanced capabilities, while Amazon unveils cost-effective Nova multimodal models at the re:Invent conference. Meta releases the Llama 3.3 70B model, showing significant gains through post-training techniques, and Alibaba introduces QwQ, a reasoning model rivaling OpenAI's o1. Amazon collaborates with Anthropic on a massive AI supercomputer project, and Black Forest Labs eyes a $200 million funding round for growth in AI tools. New research from DeepMind's Genie 2 generates interactive 3D worlds from text and images, progressing AI's understanding of world models and interactive environments. If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form. Timestamps + Links: (00:00:00) Intro / Banter (00:02:34) Sponsor Break Tools & Apps (00:04:19) OpenAI confirms new $200 monthly subscription, which includes its o1 reasoning model (00:10:40) Amazon announces Nova, a new family of multimodal AI models (00:17:13) ElevenLabs launches GenFM to turn user content into AI-powered podcasts (00:20:21) Google's new generative AI video model is now available Applications & Business (00:23:56) Elon Musk files for injunction to halt OpenAI's transition to a for-profit (00:29:40) Amazon Is Building a Mega AI Supercomputer With Anthropic (00:34:15) It Sounds an Awful Lot Like OpenAI Is Adding Ads to ChatGPT (00:38:23) A16z in Talks to Lead $200 Million Round in Black Forest Labs, Startup Behind AI Images on Grok (00:41:10) Bezos Backs AI Chipmaker Vying With Nvidia at $2.6 Billion Value Projects & Open Source (00:45:25) Meta unveils a new, more efficient Llama model (00:50:00) Alibaba releases an ‘open' challenger to OpenAI's o1 reasoning model (00:55:21) DeMo: Decoupled Momentum Optimization (00:57:01) PRIME Intellect Releases INTELLECT-1 (Instruct + Base): The First 10B Parameter Language Model Collaboratively Trained Across the Globe (01:03:03) Tencent Launches HunyuanVideo, an Open-Source AI Video Model Research & Advancements (01:09:23) DeepMind's Genie 2 can generate interactive worlds that look like video games (01:16:43) Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding (01:20:40) Densing Law of LLMs (01:25:59) Monet: Mixture of Monosemantic Experts for Transformers Policy & Safety (01:30:56) Commerce Strengthens Export Controls to Restrict China's Capability to Produce Advanced Semiconductors for Military Applications (01:37:33) China retaliates against latest US chip restrictions (01:40:52) OpenAI Is Working With Anduril to Supply the US Military
With AI (01:43:24) On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback (01:47:52) AI Safety Researcher Quits OpenAI, Saying Its Trajectory Alarms Her (01:51:52) Meta Claims AI Content Was Less than 1% of Election Misinformation (01:55:05) Outro
Is o1 Pro worth the cost? In Episode 33 of Mixture of Experts, host Tim Hwang is joined by Marina Danilevsky, Kate Soule and Vyoma Gajjar. First, the experts debrief the 12 Days of OpenAI. Next, we review some of the top papers at NeurIPS; how are the experts keeping up with all these research papers? Then, we are back with another benchmark: can the ARC Prize make AGI more tractable? Finally, Meta announced the launch of Llama 3.3 70B with the promise of 405B performance; can we have our cake and eat it too? Find out more on today's Mixture of Experts! The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
20-year-old Harshandeep Singh was pushed down a flight of stairs, shot in the back, and killed just three days into a new job as a security guard. The suspect charged by Edmonton Police is a dangerous, repeat offender with a history of violent crimes. Amidst public outrage, people are questioning the role Mr. Singh's employer, the Government of Alberta, and Ottawa played in this tragedy. 4:10 | Mike Byrne, founder of Scope Safety & Security, exposes the shortcomings in training and accountability for security guards in Canada. 23:30 | Rich LaForge, Chair at ASIS Chapter 156, says under-trained, inexperienced security guards are being sent into dangerous situations across the country. He tells us who's responsible, and what needs to change. 39:00 | Ryan shares his thoughts on judicial reform in Canada, and shares comments from the Real Talk Live Chat powered by Park Power. TELL US WHAT YOU THINK: talk@ryanjespersen.com 44:30 | We ask Grande Prairie Mayor Jackie Clayton about that $70B data centre Kevin O'Leary's promising to build a half hour outside her city. Mayor Clayton claps back at critics saying it'd be tough to attract an international workforce to GP. DETAILS ON WONDER VALLEY: https://rtrj.info/121124Max 1:19:30 | Order Real Talk merch by December 16, and we'll have it at your door by Christmas! SHOP NOW: https://www.ryanjespersen.com/merch FOLLOW US ON TIKTOK, X, & INSTAGRAM: @realtalkrj JOIN US ON FACEBOOK & LINKEDIN: @ryanjespersen REAL TALK MERCH: https://ryanjespersen.com/merch RECEIVE EXCLUSIVE PERKS - BECOME A REAL TALK PATRON: patreon.com/ryanjespersen THANK YOU FOR SUPPORTING OUR SPONSORS! https://ryanjespersen.com/sponsors The views and opinions expressed in this show are those of the host and guests and do not necessarily reflect the position of Relay Communications Group Inc. or any affiliates.
Gen Z and Millennials are addicted to the unwind. More travelers are saying that they enjoy a chill day on vacation with no plans, aka more sleep. So how did this Sleep Tourism trend become a $70B industry, and how can you cash in? We've got ideas. Plus: Macy's pares down its shoe offerings and Tesla is back to $1T. Join our hosts Jon Weigell and Cyan Zhong as they take you through our most interesting stories of the day. Follow us on social media: TikTok: https://www.tiktok.com/@thehustle.co Instagram: https://www.instagram.com/thehustledaily/ Thank You For Listening to The Hustle Daily Show. Don't forget to hit Subscribe or Follow us on Apple Podcasts so you never miss an episode! If you want this news delivered to your inbox, join millions of others and sign up for The Hustle Daily newsletter, here: https://thehustle.co/email/ Plus! Your engagement matters to us. If you are a fan of the show, be sure to leave us a 5-Star Review on Apple Podcasts https://podcasts.apple.com/us/podcast/the-hustle-daily-show/id1606449047 (and share your favorite episodes with your friends, clients, and colleagues).
Frank Slootman turns the 'founder mode vs. manager mode' debate on its head. Frank's track record in B2B land is iconic: He took Data Domain from pre-revenues to a $2.5B acquisition by EMC. He led the IPO at ServiceNow, and when he left the company, it was worth $34B. Frank then took Snowflake public, and the company was worth over $70B when he retired earlier this year. After three successful CEO stints, Frank isn't buying Silicon Valley's fairytales about founders. His leadership style combines a manager's prowess with a founder's passion. Frank epitomizes what some might call “owner mode!” (00:07) Frank's thoughts on 'founder mode' vs. 'manager mode' (00:47) The role of non-founder managers and CEOs (09:59) How to manage effectively without micro-managing (17:11) The importance of intellectual honesty (18:32) Frank's thoughts on being 'in the arena' (21:04) What it really takes to build a viable business (28:34) Contrasting ServiceNow and Snowflake (33:40) The impact of AI on business (39:01) The future of app ecosystems (44:50) Becoming a student of leadership (46:31) Managing investor relationships (48:04) Why Frank doesn't think about his legacy (50:17) Closing Thoughts
Try Llama 3.1 Models on SimTheory: https://simtheory.ai Join our community: https://thisdayinai.com Show notes: https://thisdayinai.com/bookmarks/64-ep71 ------CHAPTERS:------ 00:00 - Llama 3.1 8B, 70B and 405B News & Initial Thoughts 27:44 - Discussion on Context Input Optimization, RAG and context focuses including "memory stack" 38:53 - Best model right now? GPT-4o Mini daily driving & is Claude 3.5 Sonnet getting dumber? 42:17 - Official Llama 3.1 BOOM FACTOR scores 47:08 - GPT-4o Mini Fine-Tuning is Now Available 53:19 - Chris's Apology to the Poker Community for Ruining Online Poker with AI ------ Thanks for listening and all of your support!