Try English online at Tutlo during a free trial lesson from Żurnalista: https://tutlo.pro/ZURNALISTAYT
Chris Lattner of Modular (https://modular.com) joined us (again!) to talk about how they are breaking the CUDA monopoly, what it took to match NVIDIA performance with AMD, and how they are building a company of "elite nerds".
X: https://x.com/latentspacepod
Substack: https://latent.space
00:00:00 Introductions
00:00:12 Overview of Modular and the Shape of Compute
00:02:27 Modular's R&D Phase
00:06:55 From CPU Optimization to GPU Support
00:11:14 MAX: Modular's Inference Framework
00:12:52 Mojo Programming Language
00:18:25 MAX Architecture: From Mojo to Cluster-Scale Inference
00:29:16 Open Source Contributions and Community Involvement
00:32:25 Modular's Differentiation from VLLM and SGLang
00:41:37 Modular's Business Model and Monetization Strategy
00:53:17 DeepSeek's Impact and Low-Level GPU Programming
01:00:00 Inference Time Compute and Reasoning Models
01:02:31 Personal Reflections on Leading Modular
01:08:27 Daily Routine and Time Management as a Founder
01:13:24 Using AI Coding Tools and Staying Current with Research
01:14:47 Personal Projects and Work-Life Balance
01:17:05 Hiring, Open Source, and Community Engagement
Interview with Stephen Witt Altman's Gentle Singularity Sutskever video: start at 5:50-6:40 Paris on Apple Glass OpenAI slams court order to save all ChatGPT logs, including deleted chats Disney and Universal Sue A.I. Firm for Copyright Infringement Apple paper: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Futurism on the paper Could AI make a Scorsese movie? Demis Hassabis and Darren Aronofsky discuss YouTube Loosens Rules Guiding the Moderation of Videos Meta Is Creating a New A.I. Lab to Pursue 'Superintelligence' Meta and Yandex are de-anonymizing Android users' web browsing identifiers Amazon 'testing humanoid robots to deliver packages' Google battling 'fox infestation' on roof of £1bn London office 23andMe's Former CEO Pushes Purchase Price Nearly $50 Million Higher Code to control vocal production with hands Warner Bros. Discovery to split into two public companies by next year Social media creators to overtake traditional media in ad revenue this year Hosts: Leo Laporte, Jeff Jarvis, and Paris Martineau Guest: Stephen Witt Download or subscribe to Intelligent Machines at https://twit.tv/shows/intelligent-machines. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit Sponsors: agntcy.org smarty.com/twit monarchmoney.com with code TWIT spaceship.com/twit
Python is the dominant language for AI and data science applications, but it lacks the performance and low-level control needed to fully leverage GPU hardware. As a result, developers often rely on NVIDIA's CUDA framework, which adds complexity and fragments the development stack. Mojo is a new programming language designed to combine the simplicity of…
The post Mojo and Building a CUDA Replacement with Chris Lattner appeared first on Software Engineering Daily.
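To make the performance gap the blurb describes concrete, here is a minimal pure-Python sketch (sizes and names are illustrative, not from the episode). Even pushing a loop into C-implemented builtins within CPython changes the timing noticeably; compiled GPU kernels go orders of magnitude further, which is the niche Mojo targets.

```python
import timeit

N = 100_000
xs = list(range(N))
ys = list(range(N))

def dot_loop(a, b):
    # Interpreted loop: one bytecode dispatch per element.
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_builtin(a, b):
    # Moves iteration into C-implemented zip/sum; typically faster,
    # but still nowhere near compiled or GPU code.
    return sum(x * y for x, y in zip(a, b))

assert dot_loop(xs, ys) == dot_builtin(xs, ys)
t_loop = timeit.timeit(lambda: dot_loop(xs, ys), number=10)
t_builtin = timeit.timeit(lambda: dot_builtin(xs, ys), number=10)
print(f"loop: {t_loop:.3f}s  builtin: {t_builtin:.3f}s")
```

Exact numbers vary by machine; the point is only that per-element interpreter overhead, not the arithmetic itself, dominates the pure-Python version.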
Did you know that whether you recover is influenced not only by the quality of your medication, but also by its color, size, quantity, and even the doctor's office? And did you know that by drinking coffee every day you are fooling yourself? Or perhaps you would be interested in how it is possible that hearing other people moan makes you feel more pain yourself? It's not magic, it's psychology, so be sure to listen to this episode!
Send us a text
The Get Out N Drive Podcast is Fueled By AMD.
Ride along as John CustomCarNerd Meyer chats with the guys from MOPARS5150 at MCACN.
The Get Out N Drive Podcast is Fueled By AMD ~ AMD: More Than Metal
Visit the AMD Garage ~ Your one stop source for high quality body panels
Speed over to our friends at Racing_Junk
For all things Get Out N Drive, cruise on over to the Get Out N Drive website.
Be sure to follow GOND on social media!
GOND Website | IG | X | FB | YouTube
Recording Engineer, Paul Meyer
Subscribe to the Str8sixfan YouTube Channel
#c10sinthecity #classiccars #automotive #amd #autometaldirect #c10 #restoration #autorestoration #autoparts #restorationparts #truckrestoration #Jasonchandler #podcast #sheetmetal #badchad #polebarngarage #vicegripgarage #youtube #tradeschool #carengines #WhatDrivesYOUth #GetOutNDriveFAST
Join our fb group to share pics of how you Get Out N Drive
Follow Jason on IG
Follow Jason on fb
Subscribe To the OldeCarrGuy YouTube Channel
Follow John on IG
Sign Up and Learn more about National Get Out N Drive Day.
Music Credit:
Licensor's Author Username: LoopsLab
Licensee: Get Out N Drive Podcast
Item Title: The Rockabilly
Item URL: https://audiojungle.ne...
Item ID: 25802696
Purchase Date: 2022-09-07 22:37:20 UTC
Support the show
An airhacks.fm conversation with Juan Fumero (@snatverk) about: tornadovm as a Java parallel framework for accelerating data parallelization on GPUs and other hardware, first GPU experiences with ELSA Winner and Voodoo cards, explanation of TornadoVM as a plugin to existing JDKs that uses Graal as a library, TornadoVM's programming model with @parallel and @reduce annotations for parallelizable code, introduction of kernel API for lower-level GPU programming, TornadoVM's ability to dynamically reconfigure and select the best hardware for workloads, implementation of LLM inference acceleration with TornadoVM, challenges in accelerating Llama models on GPUs, introduction of tensor types in TornadoVM to support FP8 and FP16 operations, shared buffer capabilities for GPU memory management, comparison of Java Vector API performance versus GPU acceleration, discussion of model quantization as a potential use case for TornadoVM, exploration of Deep Java Library (DJL) and its ND array implementation, potential standardization of tensor types in Java, integration possibilities with Project Babylon and its Code Reflection capabilities, TornadoVM's execution plans and task graphs for defining accelerated workloads, ability to run on multiple GPUs with different backends simultaneously, potential enterprise applications for LLMs in Java including model distillation for domain-specific models, discussion of Foreign Function & Memory API integration in TornadoVM, performance comparison between different GPU backends like OpenCL and CUDA, collaboration with Intel Level Zero oneAPI and integrated graphics support, future plans for RISC-V support in TornadoVM Juan Fumero on twitter: @snatverk
Introducing @danielcuda_ — the Melbourne-based DJ/producer takes over the Crosstown Mix Show ahead of his debut EP 'Mulholland Cowboy', out tomorrow via Rebellion! Expect deep cuts, hypnotic grooves, and a preview of what's to come.
Join us for Shark Bytes on Teal Town USA. Puckguy, Jules, and Ian discuss, among other topics: the Cuda eliminated in the division semis by Colorado; interesting locker clean-out comments; we get Cup Crazy as the craziness that is the Playoffs continues; curious hires in SoCal; Blues thicken and tinker; and... is #2 overall really available?
Hey folks, Alex here (yes, the real me, not my AI avatar, yet). Compared to previous weeks, this week was pretty "chill" in the world of AI, though we did get a pretty significant Gemini 2.5 Pro update; it basically beat itself on the Arena. With Mistral releasing a new medium model (not OSS) and Nvidia finally dropping Nemotron Ultra (both ignoring Qwen 3 performance), there were also a few open source updates. To me the highlight of this week was a breakthrough in AI avatars: with Heygen's new IV model beating ByteDance's OmniHuman (our coverage) and Hedra Labs, they've set an absolute SOTA benchmark for one-photo-to-animated-realistic-avatar. Hell, let me record all this real quick and show you how good it is! How good is that?? I'm still kind of blown away. I have managed to get a free month promo code for you guys; look for it in the TL;DR section at the end of the newsletter. Of course, if you'd rather watch than listen or read, here's our live recording on YT.
Open Source AI
NVIDIA's Nemotron Ultra V1: Refining the Best with a Reasoning Toggle
On March 3, 2023, I finally got to attend my first ever San Jose Barracuda game during their first season at Tech CU Arena, and it was more than I could have ever imagined, as not only did I get to see a win but I also met a very special someone.
What does a koi carp have in common with practicing the way in a community of disciples? Since no one follows Jesus alone, it is worth learning to practice openness toward trusted disciples of Christ. One way is confessing our weaknesses to others. This stretches us and helps us grow spiritually, just as a koi carp grows to the size of the aquarium it lives in. Visit us: www.SpolecznoscMIASTO.pl Follow us on:
An epic night of hockey concluded with the San Jose Barracuda winning their first playoff game at TechCU Arena, beating the Colorado Eagles in overtime 2-1. The Cuda tie the series at 1-1. Puckguy and Jules discuss the game, first round recap of the Stanley Cup Playoffs, preview round two, then joined by Kevin and AJ later on about the atmosphere of the win and AJ vividly elaborates on meeting Logan Couture.
This week we discuss:
00:00 Playoffs Recap - Mikko Rantanen's insane game against The Avs
10:53 Playoff Bracket update
17:51 William Eklund's scary injury during a friendly against Czechia
24:48 More on the coaching carousel
33:20 Draft lottery talk
45:34 Utah Mammoth logo?
53:50 Update on the Cuda's playoffs
Partnership with The Hockey Podcast Network
Sponsored by DraftKings
Follow/subscribe to us on Instagram, Twitter, YouTube, and wherever you get your podcasts! Check out our website! Love what we do? Share with a friend! Or leave us a tip on Ko-fi!
Opening Track: Make It Happen by Fifty Sounds
Jonathan reviews the OrangePI RV2, Windows runs Arch btw, and Nvidia is deprecating CUDA for some old video cards. PewDiePie made a Linux video, Proton 10 enters Beta, and OSU's Open Source Labs has a funding crunch. For command line tips, Ken starts a series on the pw-cli, Jeff has some ricing tips with eww, and Jonathan talks about Open Source character recognition with ocrmypdf and pdftotext. You can find the show notes at https://bit.ly/3GxPRbY and enjoy! Host: Jonathan Bennett Co-Hosts: Ken McDonald and Jeff Massie Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
On the latest episode of Barracuda Breakdown, Ted goes over San Jose's Game 1 loss to Colorado, and chats with Cuda play by play announcer, Nick Nollenberger, about everything we saw.
Join us as we discuss: - Barracuda moving on in the Calder Cup Playoffs - Coaches and GMs are moving on from their teams - Stanley Cup Playoff Reset and Recap - Network and Teams getting cheaper with their broadcasts. and maybe Jules gets her wish to fight Ian about ocean rankings...or not. Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!
Send us a text
Ride along as John CustomCarNerd Meyer talks with Art Kelly about his GTO Tribute car that you probably have never seen from the MCACN show floor.
The Get Out N Drive Podcast is Fueled By AMD ~ AMD: More Than Metal
Visit the AMD Garage ~ Your one stop source for high quality body panels
Speed over to our friends at Racing_Junk
For all things Get Out N Drive, cruise on over to the Get Out N Drive website.
Be sure to follow GOND on social media!
GOND Website | IG | X | FB | YouTube
Recording Engineer, Paul Meyer
Subscribe to the Str8sixfan YouTube Channel
#classiccars #automotive #amd #autometaldirect #c10 #restoration #autorestoration #autoparts #restorationparts #truckrestoration #Jasonchandler #podcast #sheetmetal #fm3 #barnfinds #mcacn #coronet #tradeschool #carengines #WhatDrivesYOUth #GetOutNDriveFAST
Join our fb group to share pics of how you Get Out N Drive
Follow Jason on IG
Follow Jason on fb
Subscribe To the OldeCarrGuy YouTube Channel
Follow John on IG
Sign Up and Learn more about National Get Out N Drive Day.
Music Credit:
Licensor's Author Username: LoopsLab
Licensee: Get Out N Drive Podcast
Item Title: The Rockabilly
Item URL: https://audiojungle.ne...
Item ID: 25802696
Purchase Date: 2022-09-07 22:37:20 UTC
Support the show
On the latest episode of Barracuda Breakdown, Ted goes over the Cuda's series win in Ontario, gets into postgame sound, and more!
The Sharks blew a lead for the last time this season during an OT loss in Vancouver. A former Shark scores the only goal needed to win in a shutout over San Jose in the season finale. Macklin Celebrini becomes the 2nd Shark in franchise history to score the first and last goals of a season. Also, Logan Couture officially steps away from hockey, Nikolai Kovalenko maybe wasn't taken out of context after all, Alexandar Georgiev with the quickest exit, player comments following the end of the 2024-2025 San Jose Sharks season, and curious decisions for Mike Grier. Finally, the San Jose Barracuda welcomed in Quentin Musty and Igor Chernyshov, fight off injuries, and will face Ontario in the first round of the AHL playoffs.
- The Canucks hand San Jose their 37th one goal loss, 2-1 in OT
- Shutout to end the season by Ty Emberson and the Edmonton Oilers
- Logan Couture calls it a career
- Nikolai Kovalenko's dad says San Jose is disappointing
- Alexandar Georgiev dealt with quickly
- Postseason comments from players, front office
- Stock Up, Stock Down… week and season
- NHL Playoff picks
- Barracuda update: a split in Calgary, off to Ontario and more…
Teal Town USA - A San Jose Sharks post-game podcast, for fans, by fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Want audio only? Subscribe to our audio-only platforms below:
The AI revolution is charging ahead—but powering it shouldn't cost us the planet. That tension lies at the heart of Vaire Computing's bold proposition: rethinking the very logic that underpins silicon to make chips radically more energy efficient. Speaking on the Data Center Frontier Show podcast, Vaire CEO Rodolfo Rossini laid out a compelling case for why the next era of compute won't just be about scaling transistors—but reinventing the way they work. "Moore's Law is coming to an end, at least for classical CMOS," Rossini said. "There are a number of potential architectures out there—quantum and photonics are the most well known. Our bet is that the future will look a lot like existing CMOS, but the logic will look very, very, very different." That bet is reversible computing—a largely untapped architecture that promises major gains in energy efficiency by recovering energy lost during computation.
Product, Not IP
Unlike some chip startups focused on licensing intellectual property, Vaire is playing to win with full-stack product development. "Right now we're not really planning to license. We really want to build product," Rossini emphasized. "It's very important today, especially from the point of view of the customer. It's not just the hardware—it's the hardware and software." Rossini points to Nvidia's CUDA ecosystem as the gold standard for integrated hardware/software development. "The reason why Nvidia is so great is because they spent a decade perfecting their CUDA stack," he said. "You can't really think of a chip company being purely a hardware company anymore. Better hardware is the ticket to the ball—and the software is how you get to dance." A great metaphor for a company aiming to rewrite the playbook on compute logic.
The Long Game: Reimagining Chips Without Breaking the System
In an industry where even incremental change can take years to implement, Vaire Computing is taking a pragmatic approach to a deeply ambitious goal: reimagining chip architecture through reversible computing — but without forcing the rest of the computing stack to start over. "We call it the Near-Zero Energy Chip," said Rossini. "And by that we mean a chip that operates at the lowest possible energy point compared to classical chips—one that dissipates the least amount of energy, and where you can reuse the software and the manufacturing supply chain." That last point is crucial. Vaire isn't trying to uproot the hyperscale data center ecosystem — it's aiming to integrate into it. The company's XPU architecture is designed to deliver breakthrough efficiency while remaining compatible with existing tooling, manufacturing processes, and software paradigms.
Another blown 3rd period and their 34th one goal loss comes against Calgary. A drunken soiree of hat tricks comes in Minnesota while the Sharks get their 35th one goal loss. McDavid assists on every Edmonton goal during the Sharks' 36th one goal loss. Finally, the Sharks look for their first April win in Calgary in an effort to end an 8 game losing streak. Also, Shakir Mukhamadullin and Mario Ferraro are ruled out for the season, captain Couture causes clickbait chaos, tank watch ends with the Sharks on top, or at the bottom depending on your perspective, free agent targets this offseason, a projected lineup for next season, and 3 silly positives from the 2024-2025 San Jose Sharks season. Elsewhere, Eddie Lack is pissed off, history is made in big D, a one time only draft, MLB killing RSNs, and controversies in Chicago and New York. Meanwhile, the Barracuda get reinforcements but still disappoint the Cuda cult on Fan Appreciation Night.
- Smith scores twice, but Sharks blow another 3rd period lead, 3-2 to Flames
- Celebrini gets his first NHL hat trick in a wild 8-7 loss in Minnesota
- Connor McDavid assists all Oilers goals in 4-2 win over San Jose
- Sharks look to end Flames' playoff hopes in Calgary
- Defensive injuries pile up
- Logan Couture sparks clickbait on Instagram
- Stock Up, Stock Down
- The Athletic looking at this season, this offseason, and next season
- Tank watch ends, Sharks win
- Around the NHL: Eddie Lack pissed, Dallas disaster, New York controversy
- Barracuda update: Quentin Musty debuts, Barracuda clinch playoff spot
- and more…
Teal Town USA - A San Jose Sharks post-game podcast, for fans, by fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Want audio only? Subscribe to our audio-only platforms below:
This week, we discuss the rise of MCP, Google's Agent2Agent protocol, and 20 years of Git. Plus, lazy ways to get rid of your junk. Watch the YouTube Live Recording of Episode (https://www.youtube.com/live/o2bmkzXOzHE?si=bPrbuPlKYODQj88s) 514 (https://www.youtube.com/live/o2bmkzXOzHE?si=bPrbuPlKYODQj88s) Runner-up Titles They like to keep it tight, but I'll distract them Bring some SDT energy Salesforce is where AI goes to struggle I like words Rundown MCP The Strategy Behind MCP (https://fintanr.com/links/2025/03/31/mcp-strategy.html?utm_source=substack&utm_medium=email) Google's Agent2Agent Protocol Helps AI Agents Talk to Each Other (https://thenewstack.io/googles-agent2agent-protocol-helps-ai-agents-talk-to-each-other/) Announcing the Agent2Agent Protocol (A2A)- Google Developers Blog (https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/) MCP: What It Is and Why It Matters (https://addyo.substack.com/p/mcp-what-it-is-and-why-it-matters) 20 years of Git. Still weird, still wonderful. 
(https://blog.gitbutler.com/20-years-of-git/) A love letter to the CSV format (https://github.com/medialab/xan/blob/master/docs/LOVE_LETTER.md?ref=labnotes.org) Relevant to your Interests JFrog Survey Surfaces Limited DevSecOps Gains - DevOps.com (https://substack.com/redirect/dc38a19b-484e-47bc-83ec-f0413af42718?j=eyJ1IjoiMmw5In0.XyGUvWHNbIDkkVfjKDkxiDWJVFXc4dKUhxHaMrlgmdI) Raspberry Pi's sliced profits are easier to swallow than its valuation (https://on.ft.com/42d3mol) 'I begin spying for Deel': (https://www.yahoo.com/news/begin-spying-deel-rippling-employee-151407449.html) Bill Gates Publishes Original Microsoft Source Code in a Blog Post (https://www.cnet.com/tech/computing/bill-gates-publishes-original-microsoft-source-code-in-a-blog-post/) WordPress.com owner Automattic is laying off 16 percent of workers (https://www.theverge.com/news/642187/automattic-wordpress-layoffs-matt-mullenweg) Intel, TSMC recently discussed chipmaking joint venture (https://www.reuters.com/technology/intel-tsmc-tentatively-agree-form-chipmaking-joint-venture-information-reports-2025-04-03/) TikTok deal scuttled because of Trump's tariffs on China (https://www.nbcnews.com/politics/politics-news/trump-tiktok-ban-extension-rcna199394) NVIDIA Finally Adds Native Python Support to CUDA (https://thenewstack.io/nvidia-finally-adds-native-python-support-to-cuda/) Cloudflare Acquires Outerbase (https://www.cloudflare.com/press-releases/2025/cloudflare-acquires-outerbase-to-expand-developer-experience/) UK loses bid to keep Apple appeal against demand for iPhone 'backdoor' a secret (https://www.cnbc.com/2025/04/07/uk-loses-bid-to-keep-apple-appeal-against-iphone-backdoor-a-secret.html) Cloud Asteroids | Wiz (https://www.wiz.io/asteroids) Unpacking Google Cloud Platform's Acquisition Of Wiz (https://moorinsightsstrategy.com/unpacking-google-cloud-platforms-acquisition-of-wiz/) Trade, Tariffs, and Tech 
(https://stratechery.com/2025/trade-tariffs-and-tech/?access_token=eyJhbGciOiJSUzI1NiIsImtpZCI6InN0cmF0ZWNoZXJ5LnBhc3Nwb3J0Lm9ubGluZSIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzdHJhdGVjaGVyeS5wYXNzcG9ydC5vbmxpbmUiLCJhenAiOiJIS0xjUzREd1Nod1AyWURLYmZQV00xIiwiZW50Ijp7InVyaSI6WyJodHRwczovL3N0cmF0ZWNoZXJ5LmNvbS8yMDI1L3RyYWRlLXRhcmlmZnMtYW5kLXRlY2gvIl19LCJleHAiOjE3NDY2MjA4MTAsImlhdCI6MTc0NDAyODgxMCwiaXNzIjoiaHR0cHM6Ly9hcHAucGFzc3BvcnQub25saW5lL29hdXRoIiwic2NvcGUiOiJmZWVkOnJlYWQgYXJ0aWNsZTpyZWFkIGFzc2V0OnJlYWQgY2F0ZWdvcnk6cmVhZCBlbnRpdGxlbWVudHMiLCJzdWIiOiJDS1RtckdldHdmM1lYa3FCYkpKaUgiLCJ1c2UiOiJhY2Nlc3MifQ.pVeppxFZcYy960AbHM--oz5gzQdMEa_mv3ZPrqrZmbw9PhwL3iCEQ7_PtfPEKgInTfvSGWofXW0ZjAN-G_Eug5BlvwlF8T6HhXOCNJlwJJeqkWKvNdjvVz0t6bc5fOjn4Tbt_JobtrwxIEe-4-L7QRMhzFj9ajiiRqU6KNi3qYxWScg3XWfYmuhRdItQsgWINcSyW9iLaTkDLga_m95MMBNAat-CXDhEeKKCrAApZBM_RoNFaQ3s679vslz2IbJuCIAN1jVvZYR2Vg18lDbwubPiddDQAOkjs77PZRX_tCnMSwVXtOq0S1cCn4GZIw1qPY8j0qWWmkUck_izqPAveg) Google Workspace gets automation flows, podcast-style summaries (https://techcrunch.com/2025/04/09/google-workspace-gets-automation-flows-podcast-style-summaries/?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAAm5axmZnaAYjPgnDoqozIFkZHFPG8FHWa9y8pWwoQMN-oJ8MvJjY0IOg7Ej35bBB1Y2Ej192X3dHr5Q8PZ4i8WP_VNeXKj4f1n-KXFgqrpjfjUbiUvE4eGIl1j1VPWIg62ApISVGhYQ-__bXdIteBex8_k5-wxcpSYtfmlAFxsk) Zelle is shutting down its app. 
Here's how you can still use the service (https://www.cnn.com/2025/04/03/business/zelle-cash-transferring-app-shuts-down/index.html) One year ago Redis changed its license – and lost most of its external contributors (https://devclass.com/2025/04/01/one-year-ago-redis-changed-its-license-and-lost-most-of-its-external-contributors/?ck_subscriber_id=512840665&utm_source=convertkit&utm_medium=email&utm_campaign=[Last%20Week%20in%20AWS]%20Issue%20#417:%20Way%20of%20the%20Weasel,%20RDS%20and%20SageMaker%20Edition%20-%2017192200) Tailscale raises $160 Million (USD) Series C to build the New Internet (https://tailscale.com/blog/series-c) Nonsense NFL announces use of virtual measurement technology for first downs (https://www.nytimes.com/athletic/6247338/2025/04/01/nfl-announces-virtual-first-down-measurement-technology/?source=athletic_scoopcity_newsletter&campaign=13031970&userId=56655) Listener Feedback GitJobs (https://gitjobs.dev/) Freecycle (https://www.freecycle.org) Conferences Tanzu Annual Update AI PARTY! (https://go-vmware.broadcom.com/april-moment-2025?utm_source=cote&utm_campaign=devrel&utm_medium=newsletter), April 16th, Coté speaking DevOps Days Atlanta (https://devopsdays.org/events/2025-atlanta/welcome/), April 29th-30th Cloud Foundry Day US (https://events.linuxfoundation.org/cloud-foundry-day-north-america/), May 14th, Palo Alto, CA, Coté speaking Fr (https://vmwarereg.fig-street.com/051325-tanzu-workshop/)ee AI workshop (https://vmwarereg.fig-street.com/051325-tanzu-workshop/), May 13th. 
the day before Cloud Foundry Day (https://events.linuxfoundation.org/cloud-foundry-day-north-america/) NDC Oslo (https://ndcoslo.com/), May 21st-23rd, Coté speaking SDT News & Community Join our Slack community (https://softwaredefinedtalk.slack.com/join/shared_invite/zt-1hn55iv5d-UTfN7mVX1D9D5ExRt3ZJYQ#/shared-invite/email) Email the show: questions@softwaredefinedtalk.com (mailto:questions@softwaredefinedtalk.com) Free stickers: Email your address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) Follow us on social media: Twitter (https://twitter.com/softwaredeftalk), Threads (https://www.threads.net/@softwaredefinedtalk), Mastodon (https://hachyderm.io/@softwaredefinedtalk), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com) Watch us on: Twitch (https://www.twitch.tv/sdtpodcast), YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured), Instagram (https://www.instagram.com/softwaredefinedtalk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk) Book offer: Use code SDT for $20 off "Digital WTF" by Coté (https://leanpub.com/digitalwtf/c/sdt) Sponsor the show (https://www.softwaredefinedtalk.com/ads): ads@softwaredefinedtalk.com (mailto:ads@softwaredefinedtalk.com) Recommendations Brandon: KONNWEI KW208 12V Car Battery Tester (https://www.amazon.com/dp/B08MPXGSGN?ref=ppx_yo2ov_dt_b_fed_asin_title) Matt: Search Engine: The Memecoin Casino (https://www.searchengine.show/planet-money-the-memecoin-casino/) Coté: Knipex Cobra High-Tech Water Pump Pliers (https://www.amazon.com/atramentized-125-self-service-87-01/dp/B098D1HNGY/) Photo Credits Header
(https://unsplash.com/photos/a-bicycle-parked-on-the-side-of-a-road-next-to-a-traffic-sign-wPv1QV_i8ek)
Let's dive into some “Con-versations” for episode 165 of The Steve & Crypto Show! We have a couple of special appearances coming up, including Saturday, April 12th at Squatchcon (Port Angeles, WA) & May 2nd-4th at Crypticon Seattle (SeaTac, WA). Tune in for some details of what to expect as we get squatchy, then we bring our friend/cosplayer Krystin Bogan aka Cuda Kris Cosplay on the show to chat about the con scene in Alaska and our mutual love of Crypticon Seattle! All that and more... Tune in, share with your friends, and if you're in the WA area, we hope you're getting your costume ready for these conventions. Hope to see you there!
Get your Squatchcon passes at www.squatchconpa.com, and come hang with us on April 12th!
Head over to RondoAward.com for details on how to vote for us in the "Best Podcast" category!
Space Monsters Magazine is coming! Stay tuned at spacemonsters.art for the latest.
Crypticon Seattle tickets are on sale NOW at www.crypticonseattle.com
If you've been enjoying The Steve & Crypto Show and want to support your #3rd FAVORITE PODCAST, you can do so in the following places:
Promote The Steve & Crypto Show and look really freakin' cool doing it with some merch: www.etsy.com/shop/SteveAndCryptoMerch
Get exclusive content on Patreon: www.patreon.com/stevecrypto
Buy Me A Coffee: www.buymeacoffee.com/stevecrypto
Join the Facebook Group: www.facebook.com/groups/stevecryptoshow
And of course, be sure to follow Steve and Crypto Zoo on social media @thestevestrout and @cryptozoo88, both on X and Instagram!
Other Friends of The Steve & Crypto Show:
Subscribe to Pinup Palmer aka Gwengoolie at http://www.youtube.com/@pinuppalmer
Join Steve & Crypto Zoo in Expedition Roasters' 'Coffeeverse'. Visit ExpeditionRoasters.com and use the code STEVECRYPTO for a huge discount!
Goth Cloth Co. is an amazing female owned and run company featuring clothing, decor, accessories and more to fill your spooky and goth needs during the spooky season and beyond.
Visit www.gothclothco.com and use the code GHOUL10 for 10% off your first purchase!
Crypto Zoo Clothing is another amazing shop where you can find all the rad spooky gear and accessories and more. Get all their details at cryptozootees.com
Visit Galactic Druid Treats at www.galacticdruidtreats.com and use the code STEVECRYPTO for a discount.
Thank you for listening and for your support! Be sure to spread the word about The Steve & Crypto Show, and subscribe wherever you listen to us!
A strong performance from Georgi Romanov comes apart in the third period as the San Jose Sharks lose to the Calgary Flames 3-2. Will Smith had both goals for San Jose in the loss, while Macklin Celebrini had a pair of assists. Jules, Landi, and Puckguy collab on the game, the future with Quentin Musty coming to the Cuda, and more. Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!
In this episode, Conor and Bryce chat about Bryce's talk The CUDA C++ Developer's Toolbox from NVIDIA GTC 2025.
Link to Episode 227 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach
Show Notes
Date Generated: 2025-03-20
Date Released: 2025-03-28
NVIDIA GTC 2025
NVIDIA GTC Trip Report
⭐ The CUDA C++ Developer's Toolbox - GTC 2025 - Bryce Lelbach
Thrust
RAPIDS.ai
CUTLASS
CUB
nvbench
How to Make Beautiful Code Presentations
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
In 2022, Lin Qiao decided to leave Meta, where she was managing several hundred engineers, to start Fireworks AI. In this episode, we sit down with Lin for a deep dive on her work, starting with her leadership on PyTorch, now one of the most influential machine learning frameworks in the industry, powering research and production at scale across the AI industry. Now at the helm of Fireworks AI, Lin is leading a new wave in generative AI infrastructure, simplifying model deployment and optimizing performance to empower all developers building with Gen AI technologies. We dive into the technical core of Fireworks AI, uncovering their innovative strategies for model optimization, Function Calling in agentic development, and low-level breakthroughs at the GPU and CUDA layers.
Fireworks AI
Website - https://fireworks.ai
X/Twitter - https://twitter.com/FireworksAI_HQ
Lin Qiao
LinkedIn - https://www.linkedin.com/in/lin-qiao-22248b4
X/Twitter - https://twitter.com/lqiao
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:20) What is Fireworks AI?
(02:47) What is PyTorch?
(12:50) Traditional ML vs GenAI
(14:54) AI's enterprise transformation
(16:16) From Meta to Fireworks
(19:39) Simplifying AI infrastructure
(20:41) How Fireworks clients use GenAI
(22:02) How many models are powered by Fireworks
(30:09) LLM partitioning
(34:43) Real-time vs pre-set search
(36:56) Reinforcement learning
(38:56) Function calling
(44:23) Low-level architecture overview
(45:47) Cloud GPUs & hardware support
(47:16) VPC vs on-prem vs local deployment
(49:50) Decreasing inference costs and its business implications
(52:46) Fireworks roadmap
(55:03) AI future predictions
Send us a text
Ride along as John CustomCarNerd Meyer talks with Clay King about his 1973 Plymouth Cuda at the MCACN car show. Ever wish you kept that car you had in high school? Well, Clay King did! This is a great story!
The Get Out N Drive Podcast is Fueled By AMD ~ AMD: More Than Metal
Visit the AMD Garage ~ Your one stop source for high quality body panels
Speed over to our friends at Racing_Junk
For all things Get Out N Drive, cruise on over to the Get Out N Drive website.
Be sure to follow GOND on social media!
GOND Website | IG | X | FB | YouTube
Recording Engineer, Paul Meyer
Subscribe to the Str8sixfan YouTube Channel
#classiccars #automotive #amd #autometaldirect #c10 #restoration #autorestoration #autoparts #restorationparts #truckrestoration #Jasonchandler #podcast #sheetmetal #rileysrebuilds #armo #sema #carburetorrebuild #queenofcarbs #mcacn #1957chevynomad #chevy #nomad #tradeschool #carengines #WhatDrivesYOUth #GetOutNDriveFAST
Join our fb group to share pics of how you Get Out N Drive
Follow Jason on IG
Follow Jason on fb
Subscribe To the OldeCarrGuy YouTube Channel
Follow John on IG
Sign Up and Learn more about National Get Out N Drive Day.
Music Credit:
Licensor's Author Username: LoopsLab
Licensee: Get Out N Drive Podcast
Item Title: The Rockabilly
Item URL: https://audiojungle.ne...
Item ID: 25802696
Purchase Date: 2022-09-07 22:37:20 UTC
Support the show
Chapter 9 of the Viveka Cuda Mani (The Crest-Jewel of Discrimination), titled "Maya, the Great Enchantress". Glossary: _ Gunas: qualities, attributes or characteristics of the universal energy, three in number (rajas, tamas and sattva), whose combination creates the various elements from which multiform nature proceeds. _ Purusha: the supreme Consciousness, substrate of all the operations of Matter, Prakriti. Purusha is then synonymous with the supreme Being, the universal supreme Soul; Adi Purusha is the archetypal Person, Parama Purusha is the supreme Being, and Purushottama is the highest among the Purushas. Bibliography: https://www.babelio.com/livres/Sankar... Music: Jaja (https://jaja.bandcamp.com/track/sternenseele) Narration and direction: Bruno Léger Production: The patrons of the Vieux Sage May peace and love reign among all the beings of the universe. OM Shanti, Shanti, Shanti.
The San Jose Barracuda (28-20-3-3) held four different leads on Wednesday but failed to close out the San Diego Gulls (21-25-5-3) at Tech CU Arena, falling 5-4 in overtime. In the loss, Andrew Poturalski collected his 17th multi-point (1+1=2) effort of the year and is now up to a league-high 60 points, and Walker Duehr picked up a pair (1+1=2) of points in his Cuda debut.
The San Jose Sharks acquire defenseman Vincent Desharnais from the Pittsburgh Penguins in exchange for a 2028 5th-round pick. Puckguy, Kevin, and Jules break down this move, plus: why did Tyler Toffoli have a sleepover with Macklin Celebrini & Will Smith? We also check in on the Barracuda during their overtime loss to the Gulls. Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!
NVIDIA RAPIDS is an open-source suite of GPU-accelerated data science and AI libraries. It leverages CUDA and significantly enhances the performance of core Python frameworks including Polars, pandas, scikit-learn and NetworkX. Chris Deotte is a Senior Data Scientist at NVIDIA and Jean-Francois Puget is the Director and a Distinguished Engineer at NVIDIA.
The post NVIDIA RAPIDS and Open Source ML Acceleration with Chris Deotte and Jean-Francois Puget appeared first on Software Engineering Daily.
We recap the Sharks' 3-2 loss to the Flames on Sunday night, then cover the Sharks returning from the 4 Nations break, the Sharks' seven-game pre-trade road trip, and what could happen next. How can the league keep the post-4 Nations hype going? Meanwhile, the Barracuda roller coaster continues with an Askarov injury derailing a playoff push. Break over: refresh or continue the same? Hero and Zero. Did 4 Nations build the hype for more hockey coverage? Around the NHL: maybe end the tourney early? Crazy results on the return. Barracuda update: Askarov injured, Cuda with mixed results, and more… Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!
The San Jose Barracuda (26-18-2-3) never trailed on Saturday, upending the Ontario Reign (30-15-2-1), 3-1, at the Toyota Arena. With the win, the Cuda snapped their three-game skid versus the Reign and ended Ontario's seven-game points streak and four-game winning streak.
Ontario is in San Jose for a battle with the Barracuda on Wednesday night at Tech CU Arena. Jared Shafran and Paige Capistran recap the team's sweep of the Wranglers in Calgary with Monday's 3-2 overtime win and preview the contest against San Jose, the Reign's first against the Cuda since Dec. 15.
On this episode Ian and Puckguy talk about the 4-2 loss for the San Jose Barracuda to the San Diego Gulls, the struggle for the Cuda since Thanksgiving, the hype around the 4 Nations Face-Off, and a high school broadcaster cursing?! Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!
Welcome to Finance in Depth, produced by Xueqiu, China's leading wealth-management platform combining investment discussion and trading, where smart investors gather. Today's piece is "NVIDIA CUDA's Advantages and Challenges," by wangdizhe.

DeepSeek's challenge to NVIDIA is not simply "algorithmic leveling"; it is also open source challenging closed source. Knowing about the A100/H100 or GB200 is of limited use: NVIDIA's real moat is CUDA. The CUDA story began with the GeForce 8800 GTX, released in November 2006, 19 years ago; that was the starting point. In June 2007 NVIDIA shipped its general-purpose GPU and CUDA was born, a leap that let graphics cards be used not just for rendering but for other workloads. AI is, at its core, similar to BTC's hash algorithms: enormous amounts of mathematical computation. That also explains why finance has grown ever more "mathematized" over the past decade, including how High-Flyer, the hedge fund with the deepest quantitative-finance DNA, could produce DeepSeek. It is mainly the transformer recipe: from one-dimensional vectors to two-dimensional matrices to three- and higher-dimensional tensors. The point is not that each calculation is hard, but that the volume of calculation is huge.

A GPU is like a "division manager," while a CPU is like a "CEO." In IT's early days the CEO mattered more, because opportunities were everywhere and everything needed attention, like doing business in the 1980s and 90s, when picking the right direction was key. Over time, though, the work had to be subdivided and deepened, especially graphics computation, which gaming drove at first and scientific computing later; seizing that opportunity required a better "project manager." A GPU contains many logic units, each doing little more than simple addition, subtraction, multiplication, and division, completing enormous computational tasks through division of labor. CUDA is the GPU project manager's "dispatch chief": for a job like tensor computation, it assigns exactly who does what. CUDA's role, in other words, is "compute scheduler": it optimizes algorithmic efficiency, much as Alfred Sloan's management system did for General Motors. Within the "compute division," this scheduler rivals the division manager itself in importance: AMD has raw compute too, so the manager is not scarce; the dispatch chief's management methods are. Optimizing compute management is also why DeepSeek drew so much attention: people assumed "compute scheduling" should be optimized within CUDA's logic, yet DeepSeek seems to have achieved greater optimization by other means. The open questions are how it did so, whether future compute demand falls as a result, and what this means for the compute-optimization world.

CUDA's appeal is that a researcher who knows only model training and inference, and nothing about task partitioning, is still fine: NVIDIA's libraries handle the scheduling automatically, so AI practitioners can focus on training or inference alone. That lowers the barrier to project development, a kind of "lazy bundle." Developers therefore love it, and after roughly 20 years of adoption it has accumulated ecosystem influence and deep developer lock-in. NVIDIA is also pushing into quantum computing, launching the CUDA Quantum platform in 2023 as a bet on the future. The core logic remains parallel computing: many processing units working simultaneously, so the larger the workload, the more "brute force works miracles." To a degree, CUDA's position in GPUs resembles x86's "patent advantage" in CPUs.

Does CUDA face no challenges? It faces plenty, along roughly four dimensions.

1. Hardware. Every chipmaker envies NVIDIA's success. AMD's MI300X targets the H100 directly at roughly one third the price, and AMD's ROCm platform courts developers by running CUDA-compatible code, eroding the CUDA ecosystem. Intel, whatever its troubles, has US government backing and has not been idle: its GPU accelerators pair the Xe architecture with the open SYCL standard, and oneAPI offers unified cross-hardware programming that reduces CUDA dependence. Then come the hyperscalers' in-house chips: Google's TPU reaches higher energy efficiency in AI training through dedicated tensor cores and its own software stack, while AWS's custom silicon decouples from CUDA entirely, challenging NVIDIA's cloud market share. Chinese players add further pressure: Huawei Ascend, Cambricon, and other domestic chips, propelled by policy, are taking local market share and bypassing CUDA lock-in through PyTorch-compatible frameworks.

2. Software. CUDA's closed-source nature inevitably invites open-source challengers; DeepSeek is one example. Open-source compilers are closing the performance gap: OpenAI's Triton lets developers write GPU kernels in Python, approaches CUDA performance on NVIDIA GPUs, and also targets AMD and Intel hardware, making it a CUDA "substitute." AI frameworks are abstracting away the hardware: PyTorch 2.0 with TorchDynamo uses compiler techniques to optimize compute graphs automatically, delivering high performance without hand-written CUDA kernels and reducing developers' reliance on CUDA. And cross-platform standards such as Vulkan Compute and SYCL support multi-vendor hardware, potentially squeezing CUDA's space further.

3. CUDA's own technical bottlenecks. The memory wall and communication bottlenecks: GPU memory capacity and bandwidth growth are slowing while large-model training needs terabyte-scale memory, pushing developers toward distributed or multi-chip designs and diluting CUDA's single-card optimization advantage. NVLink's and InfiniBand's proprietary protocols face competition from open standards such as chiplet interconnects, which could weaken the synergy of NVIDIA's full stack. Energy efficiency: with Moore's Law slowing, riding process-node upgrades for compute gains is unsustainable; CUDA must innovate at the algorithmic layer (sparse computation, mixed precision) while competitors gain efficiency through architectural change. Longer term, quantum and neuromorphic computing loom: quantum breakthroughs in specific domains could divert HPC demand, and neuromorphic chips suit spiking neural networks, new paradigms incompatible with CUDA's SIMT model.

4. Markets and policy. Geopolitics and supply-chain risk: US export restrictions on high-end GPUs are pushing Chinese vendors to de-CUDA faster, with Huawei Ascend and Baidu steadily strengthening alternative ecosystems. The US is expected to tighten oversight of the Hong Kong and Singapore channels as well; AI-chip revenue tied to these routes is said to account for 20 to 25 percent of NVIDIA's total, so tighter controls would hit NVIDIA's results. Cloud providers' "de-NVIDIA" strategies: Amazon, Microsoft, and other cloud vendors are cutting NVIDIA GPU purchases through in-house chips and diversified hardware, which could weaken CUDA's grip on the cloud. Falling migration costs: toolchains can now auto-convert CUDA code to HIP (AMD) or SYCL (Intel), cutting migration from "months" to "days" and loosening CUDA's ecosystem lock-in.

NVIDIA is no fool; it saw these threats long ago and has shored up the CUDA moat on roughly four fronts. 1. Strengthening the full stack: hardware-software co-design, with the Grace Hopper superchip providing CPU/GPU memory coherence to sharpen CUDA's edge in heterogeneous computing, and CUDA-X ecosystem expansion, integrating more acceleration libraries to cover new fields such as quantum and scientific computing. 2. Embracing open standards: limited support for open-source compilers, plus contributions to standards bodies so it is not marginalized. 3. Seizing emerging scenarios: edge computing, via the Jetson platform and CUDA-on-ARM for edge AI, meeting the heterogeneous-compute needs of robotics frameworks such as ROS 2; and digital twins and the metaverse, where the Omniverse platform relies on CUDA for real-time physics simulation, building a new moat. 4. Business-model innovation: CUDA-as-a-Service through NGC, offering pretrained models and optimized containers to deepen user stickiness.

Overall, 20 years of accumulated technology, developer stickiness, and huge migration costs keep CUDA's moat formidable for now. AMD is closing fastest, but for at least three years NVIDIA's CUDA advantage looks clear. Judging from Seeking Alpha and similar commentary, there are two warning thresholds for CUDA being caught or surpassed: 1. a technical tipping point, when a competitor's hardware outperforms NVIDIA's and its software ecosystem reaches at least 80 percent maturity; and 2. an economic tipping point, when cloud providers' in-house chips cost less than 30 percent of buying NVIDIA GPUs. Investing in internet or chip companies therefore demands a deep grasp of technology trends and a great deal of reading. Valuing "high-growth tech" is especially hard, which is why Buffett mostly avoids such stocks (he bought Apple as a consumer stock). US equities are expensive, so I am watching from the sidelines, building up knowledge and material so I can act when a pullback comes.
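The essay's "many simple units plus a dispatch chief" picture can be sketched in plain Python. This is a toy illustration only, not CUDA: a large elementwise job is ceil-divided into chunks (the way a CUDA grid is divided into blocks) and handed to simple workers, each of which only does basic additions. The `vector_add` function and the worker count are invented for the example:

```python
# Toy illustration (plain Python, not CUDA) of dividing one big computation
# into many small, simple operations that independent workers execute,
# while a scheduler assigns the work.
from concurrent.futures import ThreadPoolExecutor

def vector_add(a, b, workers=4):
    """Element-wise add, partitioned across `workers` like blocks in a grid."""
    n = len(a)
    chunk = (n + workers - 1) // workers  # ceil-divide the work into chunks

    def worker(start):  # each worker does only simple additions
        return [a[i] + b[i] for i in range(start, min(start + chunk, n))]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(worker, range(0, n, chunk))  # dispatch chunks
    return [x for part in parts for x in part]        # gather results in order

print(vector_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

On a real GPU the "workers" are thousands of hardware threads and CUDA's runtime does the scheduling; the point here is only the divide-and-dispatch structure the essay describes.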
One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here - if you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you!

While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny but incredibly cracked engineering team — Chai Research. In short order they have:

* Started a chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.
* Crossed 1m DAU in 2.5 years - William updates us on the pod that they've hit 1.4m DAU now, another +40% from a few months ago. Revenue crossed >$22m.
* Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week.

While they're not paying million dollar salaries, you can tell they're doing pretty well for an 11 person startup:

The Chai Recipe: Building infra for rapid evals

Remember how the central thesis of LMArena (formerly LMSYS) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners? At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized in retaining chat users for Chai's use cases (therapy, assistant, roleplay, etc.). It's basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot). Chai publishes occasional research on how they think about this, including talks at their Palo Alto office. William expands upon this in today's podcast (34 mins in):

Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it.
And through evaluation, you can iterate, we can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope in the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it. And we can then have a really accurate ranking of like which model, or users finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've got only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign, let's do different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be, do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. 
And so we were able to get that 30-day feedback loop all the way down to something like three hours.

In Crowdsourcing the leap to Ten Trillion-Parameter AGI, William describes Chai's routing as a recommender system, which makes a lot more sense to us than previous pitches for model routing startups. William is notably counter-consensus in a lot of his AI product principles:

* No streaming: Chats appear all at once to allow rejection sampling.
* No voice: Chai actually beat Character AI to introducing voice - but removed it after finding that it was far from a killer feature.
* Blending: “Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model.” (that's it!)

But chief above all is the recommender system. We also referenced Exa CEO Will Bryk's concept of SuperKnowledge.

Full Video version: on YouTube,
please like and subscribe!

Timestamps

* 00:00:04 Introductions and background of William Beauchamp
* 00:01:19 Origin story of Chai AI
* 00:04:40 Transition from finance to AI
* 00:11:36 Initial product development and idea maze for Chai
* 00:16:29 User psychology and engagement with AI companions
* 00:20:00 Origin of the Chai name
* 00:22:01 Comparison with Character AI and funding challenges
* 00:25:59 Chai's growth and user numbers
* 00:34:53 Key inflection points in Chai's growth
* 00:42:10 Multi-modality in AI companions and focus on user-generated content
* 00:46:49 Chaiverse developer platform and model evaluation
* 00:51:58 Views on AGI and the nature of AI intelligence
* 00:57:14 Evaluation methods and human feedback in AI development
* 01:02:01 Content creation and user experience in Chai
* 01:04:49 Chai Grant program and company culture
* 01:07:20 Inference optimization and compute costs
* 01:09:37 Rejection sampling and reward models in AI generation
* 01:11:48 Closing thoughts and recruitment

Transcript

Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and today we're in the Chai AI office with my usual co-host, Swyx.

swyx [00:00:14]: Hey, thanks for having us. It's rare that we get to get out of the office, so thanks for inviting us to your home. We're in the office of Chai with William Beauchamp. Yeah, that's right. You're founder of Chai AI, but previously, I think you're concurrently also running your fund?

William [00:00:29]: Yep, so I was simultaneously running an algorithmic trading company, but I fortunately was able to kind of exit from that, I think just in Q3 last year. Yeah, congrats. Yeah, thanks.

swyx [00:00:43]: So Chai has always been on my radar because, well, first of all, you do a lot of advertising, I guess, in the Bay Area, so it's working. Yep. And second of all, the reason I reached out to a mutual friend, Joyce, was because I'm just generally interested in the...
...consumer AI space, chat platforms in general. I think there's a lot of inference insights that we can get from that, as well as human psychology insights, kind of a weird blend of the two. And we also share a bit of a history as former finance people crossing over. I guess we can just kind of start it off with the origin story of Chai.

William [00:01:19]: Why decide working on a consumer AI platform rather than B2B SaaS? So just quickly touching on the background in finance. Sure. Originally, I'm from... I'm from the UK, born in London. And I was fortunate enough to go study economics at Cambridge. And I graduated in 2012. And at that time, everyone in the UK and everyone on my course, HFT, quant trading was really the big thing. It was like the big wave that was happening. So there was a lot of opportunity in that space. And throughout college, I'd sort of played poker. So I'd, you know, I dabbled as a professional poker player. And I was able to accumulate this sort of, you know, say $100,000 through playing poker. And at the time, as my friends would go work at companies like Jane Street or Citadel, I kind of did the maths. And I just thought, well, maybe if I traded my own capital, I'd probably come out ahead. I'd make more money than just going to work at Jane Street.

swyx [00:02:20]: With 100k base as capital?

William [00:02:22]: Yes, yes. That's not a lot. Well, it depends what strategies you're doing. And, you know, there is an advantage. There's an advantage to being small, right? Because there are, if you have a 10... Strategies that don't work in size. Exactly, exactly. So if you have a fund of $10 million, if you find a little anomaly in the market that you might be able to make 100k a year from, that's a 1% return on your 10 million fund. If your fund is 100k, that's 100% return, right? So being small, in some sense, was an advantage. So started off, and the, taught myself Python, and machine learning was like the big thing as well.
Machine learning had really, it was the first, you know, big time machine learning was being used for image recognition, neural networks come out, you get dropout. And, you know, so this, this was the big thing that's going on at the time. So I probably spent my first three years out of Cambridge, just building neural networks, building random forests to try and predict asset prices, right, and then trade that using my own money. And that went well. And, you know, if you start something, and it goes well, you try and hire more people. And the first people that came to mind was the talented people I went to college with. And so I hired some friends. And that went well and hired some more. And eventually, I kind of ran out of friends to hire. And so that was when I formed the company. And from that point on, we had our ups and we had our downs. And that was a whole long story and journey in itself. But after doing that for about eight or nine years, on my 30th birthday, which was four years ago now, I kind of took a step back to just evaluate my life, right? This is what one does when one turns 30. You know, I just heard it. I hear you. And, you know, I looked at my 20s and I loved it. It was a really special time. I was really lucky and fortunate to have worked with this amazing team, been successful, had a lot of hard times. And through the hard times, learned wisdom and then a lot of success and, you know, was able to enjoy it. And so the company was making about five million pounds a year. And it was just me and a team of, say, 15, like, Oxford and Cambridge educated mathematicians and physicists. It was like the real dream that you'd have if you wanted to start a quant trading firm. It was like...

swyx [00:04:40]: Your own, all your own money?

William [00:04:41]: Yeah, exactly. It was all the team's own money. We had no customers complaining to us about issues. There's no investors, you know, saying, you know, they don't like the risk that we're taking.
We could really run the thing exactly as we wanted it. It's like Susquehanna or like Rentec. Yeah, exactly. Yeah. And they're the companies that we would kind of look towards as we were building that thing out. But on my 30th birthday, I look and I say, OK, great. This thing is making as much money as kind of anyone would really need. And I thought, well, what's going to happen if we keep going in this direction? And it was clear that we would never have a kind of a big, big impact on the world. We can enrich ourselves. We can make really good money. Everyone on the team would be paid very, very well. Presumably, I can make enough money to buy a yacht or something. But this stuff wasn't that important to me. And so I felt a sort of obligation that if you have this much talent and if you have a talented team, especially as a founder, you want to be putting all that talent towards a good use. I looked at the time of like getting into crypto and I had a really strong view on crypto, which was that as far as a gambling device, this is like the most fun form of gambling invented in like ever, super fun. I thought as a way to evade monetary regulations and banking restrictions, I think it's also absolutely amazing. So it has two like killer use cases, not so much banking the unbanked, but everything else to do with like the blockchain and, you know, web 3.0, that didn't really make much sense. And so instead of going into crypto, which I thought, even if I was successful, I'd end up in a lot of trouble, I thought maybe it'd be better to build something that governments wouldn't have a problem with. I knew that LLMs were like a thing. I think OpenAI, they hadn't released GPT-3 yet, but they'd said GPT-3 is so powerful, we can't release it to the world or something. Was it GPT-2? And then I started interacting with, I think Google had open sourced some language models.
They weren't necessarily LLMs, but they were. But yeah, exactly. So I was able to play around with those. But nowadays so many people have interacted with ChatGPT, they get it, but it's like the first time you can just talk to a computer and it talks back. It's kind of a special moment and you know, everyone who's done that goes like, wow, this is how it should be. Right. It should be like, rather than having to type on Google and search, you should just be able to ask Google a question. When I saw that I read the literature, I kind of came across the scaling laws and I think even four years ago, all the pieces of the puzzle were there, right? Google had done this amazing research and published, you know, a lot of it. OpenAI was still open. And so they'd published a lot of their research. And so you really could be fully informed on the state of AI and where it was going. And so at that point I was confident enough, it was worth a shot. I think LLMs are going to be the next big thing. And so that's the thing I want to be building in, in that space. And I thought what's the most impactful product I can possibly build. And I thought it should be a platform. So I myself love platforms. I think they're fantastic because they open up an ecosystem where anyone can contribute to it. Right. So if you think of a platform like a YouTube, instead of it being like a Hollywood situation where you have to, if you want to make a TV show, you have to convince Disney to give you the money to produce it, instead, anyone in the world can post any content they want to YouTube. And if people want to view it, the algorithm is going to promote it. Nowadays you can look at creators like Mr. Beast or Joe Rogan. They would have never have had that opportunity unless it was for this platform. Other ones like Twitter's a great one, right?
But I would also consider Wikipedia to be a platform. Instead of the Encyclopedia Britannica, which is monolithic, where you get all the researchers together, you get all the data together, and you combine it into this one monolithic source, you have this distributed thing. Anyone can host their content on Wikipedia, anyone can contribute to it, and maybe someone's contribution is that they delete stuff. When I was hearing the Sam Altman and the Muskian perspective on AI, it was a very monolithic thing. It was all about AI basically being a single thing, which is intelligence. Yeah. Yeah. The more compute, the more intelligent; the more and better AI researchers, the more intelligent, right? They would speak about it as a kind of race: who can get the most data, the most compute, and the most researchers, and that would end up with the most intelligent AI. But I didn't believe any of that. I thought that perspective is the perspective of someone who's never actually done machine learning. Because with machine learning, first of all, you see that the performance of the models follows an S-curve. So it's not like it just goes off to infinity, right? And the S-curve kind of plateaus around human-level performance. You can look at all the machine learning that was going on in the 2010s: everything plateaued around human-level performance. We can think about the self-driving car promises, how Elon Musk kept saying the self-driving car is going to happen next year, then next year. Or you can look at image recognition, speech recognition, all of these things. There was almost nothing that went superhuman, except for something like AlphaGo. And we can speak about why AlphaGo was able to go superhuman.
So I thought the most likely outcome was not a monolithic thing like the Encyclopedia Britannica. I thought it must be a distributed thing. And I actually like to look at the world of finance for what I think a mature machine learning ecosystem looks like. Finance is a machine learning ecosystem, because all of these quant trading firms are running machine learning algorithms, but they're running them on a centralized platform, a marketplace. And it's not the case that there's one giant quant trading company with all the data, all the quant researchers, all the algorithms and compute. Instead they all specialize. One will specialize in high-frequency trading, another in mid-frequency, another in equities, and so on. And I thought, that's the way the world works. That's how it is. And so there must exist a platform where a small team can produce an AI for a unique purpose, and they can iterate and build the best thing for that, right? And so that was the vision for Chai. We wanted to build a platform for LLMs.Alessio [00:11:36]: That's kind of the insight, the contrarian view, that led you to start the company. Yeah. And then what was the initial idea maze? Because if somebody told you that was the Hugging Face founding story, people might believe it; there's a similar ethos behind it. How did you land on the product you have today? And what were some of the ideas you initially considered and then discarded?William [00:11:58]: So the first thing we built was fundamentally an API. Nowadays people would describe it as agents, right? Anyone could write a Python script, submit it to an API, and send it to the Chai backend, and we would then host this code and execute it. So that's the developer side of the platform.
For their Python script, the interface was essentially text in and text out. An example would be the very first bot that I created, which I think was a Reddit news bot. First, it would pull the popular news. Then it would prompt some external API, like BERT or GPT-2 or whatever; it was a very, very small thing. And then the user could talk to it. So you could say, hi bot, what's the news today? And it would say, these are the top stories, and you could chat with it. Four years later, that's basically Perplexity or something, right? But back then the models were, first of all, really, really dumb. They had the IQ of a four-year-old. And there really wasn't any demand or any PMF for interacting with the news. So then I was like, OK, let's make another one. And I made a bot you could talk to about a recipe. So you could say, I've got eggs in my fridge, what should I cook? And it would say, you should make an omelet. Right? There was no PMF for that. No one used it. And so I just kept creating bots. Every single night after work, I'd be like, OK, we have AI, we have this platform, I can create any text-in, text-out sort of agent and put it on the platform. So we just created stuff night after night. And to all the coders I knew, I would say, look, there's this platform, you can create any chat AI, you should put one on. And, you know, everyone's like, well, chatbots are super lame, we want absolutely nothing to do with your chatbot app. No one who knew Python wanted to build on it. I'm trying to build all these bots, and no consumers want to talk to any of them.
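The developer-side interface he describes, a hosted Python script with text in and text out, can be sketched roughly like this. The bot logic, function names, and headlines below are hypothetical stand-ins; the real Chai API would have pulled live headlines and prompted an external LLM.

```python
# A minimal sketch of a text-in/text-out bot, in the spirit of the
# "Reddit news bot" example. Everything here is illustrative:
# a real version would fetch live news and call an LLM API.

def news_bot(user_message: str) -> str:
    """Text in, text out: take the user's message, return the bot's reply."""
    headlines = [  # hard-coded so the sketch is self-contained
        "Open-source language model released",
        "New scaling-law paper published",
    ]
    if "news" in user_message.lower():
        return "Top stories today: " + "; ".join(headlines)
    return "Ask me about the news!"

print(news_bot("Hi bot, what's the news today?"))
```

The platform's job was then just to host this function and route user messages through it.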
And then my sister, who at the time was just finishing college or something, I said to her, if you want to learn Python, you should just submit a bot to my platform. And she built a therapist bot. She was like, OK, cool, I'm going to build a therapist bot. And then the next day I checked the performance of the app and I'm like, oh my God, we've got 20 active users, and they spent an average of 20 minutes on the app. I was like, oh my God, what bot were they speaking to for an average of 20 minutes? And I looked, and it was the therapist bot. And I went, oh, this is where the PMF is. There was no demand for recipe help, no demand for news, no demand for dad jokes or pub quizzes or fun facts. What they wanted was the therapist bot. At the time I reflected on that, and I thought, well, if I want to consume news, the most fun way to consume news is Twitter. The value of there being a back and forth wasn't that high, right? And if I need help with a recipe, I just go to the New York Times, which has a good recipe section. It's not actually that hard. And so I thought the thing that AI is 10x better at is a sort of conversation that's not intrinsically informative, but is more about an opportunity. You can say whatever you want, and you're not going to get judged. If it's 3 a.m., you don't have to wait for your friend to text back; the reply is immediate. It's judgment-free, and it's much more like a playground, much more like a fun experience. And you could see that if the AI gave a person a compliment, they would love it. It's much easier to get a compliment from the AI than from a human. From that day on, I said, OK, I get it. Humans want to speak to humans, or human-like entities, and they want to have fun.
And that was when I started to look less at platforms like Google and more at platforms like Instagram. I was trying to think about why people use Instagram, and I could see that Chai was filling the same desire, the same drive. If you go on Instagram, typically you want to look at the faces of other humans, or you want to hear about other people's lives. So if The Rock is making himself pancakes on a cheese plate, you kind of feel a little bit like you're The Rock's friend, or you're having pancakes with him or something, right? But if you do it too much, you feel sad, like a lonely person. With AI, you can talk to it, tell it stories and have it tell you stories, and you can play with it for as long as you want without feeling like a sad, lonely person. You feel like you actually have a friend.Alessio [00:16:29]: And why is that? Do you have any insight on that from using it?William [00:16:33]: I think it's just human psychology. With old-school social media, you're just consuming passively, right? You'll just swipe. If I'm watching TikTok, I just swipe and swipe and swipe. And even though I'm getting the dopamine of watching an engaging video, there's this other thing building in my head, which is that I'm feeling lazier and lazier. And after a certain period of time, I'm like, man, I just wasted 40 minutes and achieved nothing. But with AI, because you're interacting, it's not like work, but you feel like you're participating and contributing to the thing. You don't feel like you're just consuming, so you don't have a sense of remorse, basically. And on the whole, the way people talk about their interactions with the AI, they speak about it in an incredibly positive sense.
We get people with eating disorders saying that the AI helps them with their eating disorders. People who say they're depressed say it helps them through the rough patches. So I think there's something intrinsically healthy about interacting, a box that TikTok and Instagram and YouTube don't quite tick. From that point on, it was about building more and more human-centric AI for people to interact with. And I was like, OK, let's make a Kanye West bot, right? And then no one wanted to talk to the Kanye West bot. And I was like, ah, who's a cool persona for teenagers to want to interact with? I was trying to find the influencers and stuff like that, but no one cared. They didn't want to interact with them. And instead, the really special moment was the realization that developers and software engineers aren't interested in building this sort of AI, but the consumers are. And rather than me trying to guess every day what the right bot to submit to the platform is, why don't we just create the tools for the users to build it themselves? Nowadays this is the most obvious thing in the world, but when Chai first did it, it was not obvious at all. So we took the API for, I think it was GPT-J, which was this 6-billion-parameter open-source transformer-style LLM. We took GPT-J, we let users create the prompt, we let users select the image, and we let users choose the name. And that was the bot. Through that, they could shape the experience, right? So they could say, this bot's going to be really mean, and it's going to be called Bully in the Playground. That was a whole category I never would have guessed. People love to fight. They love to have a disagreement, right? And then there'd be all these romantic archetypes that I didn't know existed.
And so once the users could create the content that they wanted, that was when Chai was able to get this huge variety of content. Rather than appealing to the 1% of the population whose wants I had figured out, you could appeal to something much, much broader. And from that moment on, it was crystal clear: just as Instagram is this social media platform that lets people create and upload images and videos, Chai was really about how we can let the users create this experience in AI and then share it, interact, and search. So I say it's a platform for social AI.Alessio [00:20:00]: Where did the Chai name come from? Because you started around the same time, I was like, is it Character AI shortened? And then I was curious about the UK origin, the chai.William [00:20:15]: We started way before Character AI. And there's an interesting story there: Chai's numbers were very, very strong. I think in late 2022, or maybe early 2023, Chai was the number one AI app in the App Store. We had something like 100,000 daily active users. And then one day we saw there was this website, and we were like, oh, this website looks just like Chai. And it was the Character AI website. I think it's much more common knowledge nowadays that when they left Google with the funding, they knew what was the most trending, number one app, and I think they sort of built that.swyx [00:21:03]: You found the PMF for them.William [00:21:04]: We found the PMF for them. Exactly. Yeah. So I'd worked a year very, very hard, and then that was when I learned a lesson about what it means to be up against VC-backed companies. So Chai, we'd got to this point, and I was the only person who'd invested.
I'd invested maybe 2 million pounds in the business, and from that we were able to build this thing and get to, say, a hundred thousand daily active users. And then when Character AI came along, at their first version we sort of laughed. We were like, oh man, this thing sucks, they don't know what they're building, they're building the wrong thing. But then I saw, oh, they've raised a hundred million dollars. Oh, they've raised another hundred million dollars. And then our users started saying, guys, your AI sucks. Because we were serving a 6-billion-parameter model, right? How big was the model that Character AI could afford to serve? So let's say we would spend a dollar per user, right? Over the entire lifetime.swyx [00:22:01]: A dollar per session, per chat, per month? No, no, no, no.William [00:22:04]: Let's say over the course of the year we'd have a million users and we'd spend a million dollars on the AI throughout the year. Aggregated. Exactly. Right. They could spend a hundred times that. So people would say, why is your AI so much dumber than Character AI's? And then I was like, oh, OK, I get it. This is the Silicon Valley-style hyperscale business. And so, yeah, we moved to Silicon Valley, got some funding, iterated, and built the flywheels. And I'm very proud that we were able to compete with that. I think the reason we were able to do it was just customer obsession. And it's similar, I guess, to how DeepSeek have been able to produce such a compelling model compared to someone like OpenAI. DeepSeek, with their latest model, claim to have spent 5 million dollars training it.swyx [00:22:57]: It may be a bit more, but, like, why are you making such a big deal out of this? There's an agenda there. Yeah. You brought up DeepSeek.
So we have to ask: you had a call with them.William [00:23:07]: We did. We did. We did. Let me think what to say about that. For one, they have an amazing story, right? Their background is, again, in finance.swyx [00:23:16]: They're the Chinese version of you. Exactly.William [00:23:18]: Well, there are a lot of similarities. Yes. I have a great affinity for companies which are founder-led, customer-obsessed, and just try to build something great. And what DeepSeek have achieved that's quite special is this amazing inference engine. They've been able to reduce the size of the KV cache significantly, and by doing that, they're able to significantly reduce their inference costs. With AI, people get really focused on the foundation model, the model itself, and they don't pay much attention to the inference. To give you an example with Chai: let's say a typical user session is 90 minutes, which is very, very long. For comparison, let's say the average session length on TikTok is 70 minutes. So people are spending a lot of time, and in that time they're able to send, say, 150 messages. That's a lot of completions, right? It's quite different from an OpenAI scenario, where people might come in with a particular question in mind and ask that one question plus a few follow-ups. So because users are consuming, say, 30 times as many requests for a chat or conversational experience, you've got to figure out the right balance between the cost of that and the quality. And with AI it's always been the case that if you want a better experience, you can throw compute at the problem. If you want a better model, you can just make it bigger. If you want it to remember better, give it a longer context.
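One common way to "throw compute at the problem" at inference time is best-of-n sampling: generate several candidate replies and serve the one a reward model scores highest. The sketch below is illustrative only; `generate()` and `reward()` are hypothetical toy stand-ins, not Chai's or anyone else's actual components.

```python
# Best-of-n sampling sketch: spend extra inference-time compute by
# generating several candidates and serving the highest-scoring one.
import random

def generate(prompt: str, seed: int) -> str:
    """Stand-in for sampling one candidate completion from an LLM."""
    rng = random.Random(seed)
    words = [rng.choice(["ok", "great", "fine", "sure"]) for _ in range(4)]
    return prompt + " " + " ".join(words)

def reward(completion: str) -> float:
    """Stand-in for a learned reward model scoring engagement/quality."""
    return float(completion.count("great"))

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=reward)  # serve the most promising candidate

print(best_of_n("User: hi!"))
```

The cost scales linearly with n, which is exactly the quality-versus-cost balance a high-volume chat product has to manage.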
And now, what OpenAI is doing to great fanfare: with rejection sampling, you can generate many candidates, and then with some sort of reward model or scoring system, you can serve the most promising of those many candidates. That's scaling up on the inference-time compute side of things. And so for us, it doesn't make sense to think of AI as just absolute performance, like the MMLU score or any of these benchmarks people like to look at. That score on its own doesn't really tell you anything, because progress is really made by improving the performance per dollar. And I think that's an area where DeepSeek have been able to perform very, very well, surprisingly so. So I'm very interested in what Llama 4 is going to look like, and whether they're able to match what DeepSeek have achieved with this performance-per-dollar gain.Alessio [00:25:59]: Before we go into the inference and some of the deeper stuff, can you give people an overview of some of the numbers? I think last I checked, you have like 1.4 million daily actives now, and over 22 million of revenue. So it's quite a business.William [00:26:12]: Yeah, users grew by a factor of three last year, and revenue more than doubled. It's very exciting. We're competing with some really big, really well-funded companies. Character AI got, I think, an almost $3 billion valuation, and 5 million DAU is the last number I heard for them. Talkie, which is a Chinese-built app owned by a company called MiniMax, is incredibly well funded. And these companies didn't grow by a factor of three last year. Right.
And so when you've got this company and this team that's able to keep building something that gets users excited, so that they want to tell their friends about it, and they want to come back and stick on the platform, I think that's very special. Last year was a great year for the team, and I think the numbers reflect the hard work we put in, and fundamentally the quality of the app and the quality of the content. AI is the quality of the experience that you have. You actually published your DAU growth chart, which is unusual. And I see some inflections. It's not just a straight line; there are some things that actually inflect. Yes. What were the big ones? Cool. That's a great question. Let me think of a good answer. I'm basically looking to annotate this chart, which doesn't have annotations on it. Cool. The first thing I would say is, I think the most important thing to know about success is that success is born out of failures. It's through failures that we learn. If you think something's a good idea, and you do it and it works, great, but you didn't actually learn anything, because everything went exactly as you imagined. But if you have an idea you think is going to be good, you try it, and it fails, there's a gap between reality and expectation, and that's an opportunity to learn. The flat periods, that's us learning. And the up periods, that's us reaping the rewards of that. So looking at the growth chart for 2024, the first thing that really put a dent in our growth was our backend. We had just reached this scale. From day one we'd built on top of GCP, which is Google's cloud platform, and they were fantastic.
We used them when we had one daily active user, and they worked pretty well all the way up till we had about 500,000. It was never the cheapest, but from an engineering perspective, man, that thing scaled insanely well. Like, not Vertex? Not Vertex. Like GKE, that kind of stuff? We used Firebase. I'm pretty sure we're the biggest user ever on Firebase. That's expensive. Yeah, we had calls with engineers, and they're like, we wouldn't recommend using this product beyond this point, and you're 3x over that. So we pushed Google to their absolute limits. It was fantastic for us, because we could focus on the AI, on just adding as much value as possible. But what happened was, after 500,000, the way we were using it just wouldn't scale any further. And so we had a really, really painful, at least three-month period as we migrated between different services, figuring out which requests we wanted to keep on Firebase and which ones to move onto something else, making mistakes and learning things the hard way. After about three months we got that right, and we were then able to scale to 1.5 million DAU without any further issues from GCP. But what happens is, if you have an outage, new users who go on your app experience a dysfunctional app, and then they exit. And the key metrics that the app stores track are things like retention rates, money spent, and the star rating that users give you. In the App Store. In the App Store, yeah. Tyranny. So if you're ranked top 50 in entertainment, you're going to acquire a certain rate of users organically. If they come in and have a bad experience, it's going to tank where you're positioned in the algorithm.
And then it can take a long time to earn your way back up, at least if you want to do it organically. If you throw money at it, you can jump to the top, and I could talk about that. But broadly speaking, if we look at 2024, the first kink in the graph was outages from hitting 500k DAU; the backend didn't want to scale past that, so we just had to do the engineering and build through it. OK, so we built through that, and then we got a little bit of growth. So that's feeling a little bit good. The next thing, I'm not going to lie, I have a feeling it was when Character AI got... Acquired? I think so. So the Character AI team fundamentally got acquired by Google, and I don't know what they changed in their business. I don't know if they dialed down their ad spend. The product didn't change, right? The product just is what it is. I don't think so. Yeah, I think the product is what it is. It's like maintenance mode. Yes. Some people may think this is an obvious fact, but running a business can be very competitive, because other businesses can see what you're doing and they can imitate you. And then there's this question: if you've got one company that's spending $100,000 a day on advertising, and another company that's spending zero, then in terms of market share, of the new users entering the market, the one spending $100,000 a day is going to be getting 90% of them. And so I have a suspicion that when the founders of Character AI left, they dialed down their spending on user acquisition, and that gave oxygen to the other apps. Chai was then able to start growing again in a really healthy fashion. That's the second thing. The third thing is we've built a great data flywheel.
The AI team perfected their flywheel, I would say, around the end of Q2, and I could speak about that at length. Fundamentally, the way I'd describe it is: when you're building anything, you need to be able to evaluate it, and through evaluation you can iterate. We can look at benchmarks, and we can talk about the issues with benchmarks, why they may not generalize as well as one would hope, and the challenges of working with them. But something that works incredibly well is getting feedback from humans. So we built this thing where anyone can submit a model to our developer backend, it gets put in front of 5,000 users, and the users can rate it. We can then have a really accurate ranking of which models users are finding more engaging or more entertaining. It's at the point now where every day we evaluate between 20 and 50 LLMs. So even though we've only got a team of, say, five AI researchers, they're able to iterate through a huge quantity of LLMs; our team ships, let's say, a minimum of 100 LLMs a week. Before that moment in time, we might iterate through three a week; there was a time when even doing five a month was a challenge, right? We changed the feedback loops so that it's not: let's launch these three models, let's do an A/B test, let's assign different cohorts, let's wait 30 days to see what the day-30 retention is. If you're doing an app, that's A/B testing 101: do a 30-day retention test, assign different treatments to different cohorts, and come back in 30 days. That's insanely slow. It's just too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours.
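The ranking half of the flywheel he describes, candidate models rated by live users and sorted into a leaderboard, can be sketched as follows. The data and model names are illustrative only, not Chai's actual pipeline or numbers.

```python
# Sketch of a human-feedback leaderboard: each candidate model is shown
# to users, ratings come back, and models are ranked by mean rating.
from collections import defaultdict
from statistics import mean

# (model_id, user_rating) pairs collected from live traffic (made up)
ratings = [
    ("model_a", 4), ("model_a", 5),
    ("model_b", 2), ("model_b", 3),
    ("model_c", 5), ("model_c", 5),
]

# Group ratings by model
by_model = defaultdict(list)
for model_id, score in ratings:
    by_model[model_id].append(score)

# Rank models by mean user rating, best first
leaderboard = sorted(by_model, key=lambda m: mean(by_model[m]), reverse=True)
print(leaderboard)  # → ['model_c', 'model_a', 'model_b']
```

With ratings streaming in continuously, this kind of ranking can be recomputed in hours rather than waiting 30 days for cohort retention, which is the feedback-loop compression being described.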
And when we did that, we could really, really perfect techniques like DPO, fine-tuning, prompt engineering, blending, rejection sampling, training a reward model, really successfully, like boom, boom, boom. And so in Q3 and Q4, the amount of AI improvement we got was astounding. It was getting to the point where I thought, how much more edge is there to be had here? But the team just kept going and going. That was number three for the inflection points.swyx [00:34:53]: There's a fourth?William [00:34:54]: The important thing about the third one is, if you go on our Reddit or you talk to users of the AI, there's a clear date, somewhere in October or so, when the users flipped. Before October, the users would say Character AI is better than you, for the most part. From October onwards, they would say, wow, you guys are better than Character AI. And that was a really clear positive signal that we'd done it. And you can't cheat consumers. You can't trick them. You can't b******t them. They know, right? If you're going to spend 90 minutes on a platform, and with apps the barriers to switching are pretty low: you can try Character AI for a day, and if you get bored, you can try Chai; if you get bored of Chai, you can go back to Character AI. So user loyalty is not strong, right? What keeps them on the app is the experience. If you deliver a better experience, they're going to stay, and they can tell. So, the fourth one: we were fortunate enough to get this hire. We had hired one really talented engineer, and they said, at my last company we had a head of growth who was really, really good; he was the head of growth for ByteDance for two years. Would you like to speak to him? And I was like, yes.
Yes, I think I would. And so I spoke to him, and he just blew me away with what he knew about user acquisition. It was like a 3D chessswyx [00:36:21]: sort of thing. You know, as much as I know about AI. Like ByteDance as in TikTok US? Yes.William [00:36:26]: Not ByteDance as in other stuff. Yep. He was interviewing us as we were interviewing him, right? And so, picking up options. Yeah, exactly. And so he was looking at our metrics, and I saw him get really excited when he said, guys, you've got a million daily active users and you've done no advertising. I said, correct. And he was like, that's unheard of. I've never heard of anyone doing that. And then he kept looking at our metrics and said, if you've got all of this organically, then if you start spending money, this is going to be very exciting. I was like, let's give it a go. So then he came in and we started ramping up user acquisition. We started spending $20,000 a day, and it looked very promising. Right now we're spending $40,000 a day on user acquisition. That's still only half of what Character AI or Talkie may be spending. But from that, we went from growing at a rate of maybe 2x a year to growing at 3x a year. So I'm evolving more and more towards the Silicon Valley-style hypergrowth: you build something decent, and then you canswyx [00:37:33]: slap on a huge... You did the important thing, you did the product first.William [00:37:36]: Of course, but then you can slap on the rocket or the jet engine or something, which is just cash in: you pour in as much cash as you can, you buy a lot of ads, and your growth is faster.swyx [00:37:48]: I'm just kind of curious what's working right now versus what surprisinglyWilliam [00:37:52]: doesn't work.
Oh, there's a long, long list of surprising stuff that doesn't work. The most surprising thing about what doesn't work is that almost everything doesn't work. That's what's surprising. And I'll give you an example. A year and a half ago at the company, we were super excited by audio. I was like, audio is going to be the next killer feature, we have to get it in the app, and I want us to be first. Everything Chai does, I want us to be first. We may not be the company that's strongest at execution, but we can always be theswyx [00:38:22]: most innovative. Interesting. Right? So we can... You're pretty strong at execution.William [00:38:26]: We're much stronger now. A lot of the reason we're here is because we were first. If we launched today, it'd be so hard to get the traction: to get the flywheel, to get the users, to build a product people are excited about. If you're first, people are naturally excited about it. But if you're fifth or tenth, man, you've got to beswyx [00:38:46]: insanely good at execution. So you were first with voice? We were first. We were first. I only knowWilliam [00:38:51]: when Character launched voice. They launched it, I think, at least nine months after us. OK. But the team worked so hard for it. At the time we did it, latency was a huge problem. Cost was a huge problem. Getting the right quality of voice was a huge problem. Then there's the user interface and getting the right user experience, because you don't just want it to start blurting out, right? You want to be able to activate it, but without having to keep pressing a button every single time. There's a lot that goes into getting a really smooth audio experience. So we went ahead, we invested the three months, we built it all. And then when we did the A/B test, there was no change in any of the numbers.
And I was like, this can't be right, there must be a bug. And we spent a week just checking everything, checking again and again. And it was like, the users just did not care. Only 10 or 15% of users even clicked the button to engage the audio, and they would only use it for 10 or 15% of the time. So if you do the math, if it's something that one in seven people use for one seventh of their time, you've changed about 2% of the experience. So even if that 2% of the time is insanely good, it doesn't translate much when you look at the retention, the engagement, and the monetization rates. So audio did not have a big impact. I'm pretty big on audio. Yeah, I like it too. But, you know, a lot of the stuff which I do: you can have a theory. And you resist. Exactly, exactly. So I think if you want to make audio work, it has to be a unique, compelling, exciting experience that they can't have anywhere else.

swyx [00:40:37]: It could be your models, which just weren't good enough.

William [00:40:39]: No, no, no, they were great. Oh yeah, they were very good. It was kind of like, you know, if you listen to an Audible or Kindle or something, you just hear this voice, and you don't go, wow, this is special, right? It's a convenience thing. But the idea is this: if Chai is the only platform, like, let's say you have a Mr. Beast, and YouTube is the only platform you can use to make audio work, then you can watch a Mr. Beast video, and it's the most engaging, fun video that you want to watch, so you'll go to YouTube. And so for audio, you can't just put the audio on there and have people go, oh yeah, it's 2% better. Or, like, 5% of users think it's 20% better, right?
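William's back-of-the-envelope math checks out. As a quick illustrative sketch, using the rough 15% figures quoted above:

```python
# Rough estimate of how much of the overall product experience the
# audio feature touched, using the approximate figures from the episode.
adoption = 0.15      # ~1 in 7 users ever clicked the audio button
usage_share = 0.15   # ...and used audio for ~15% of their session time

affected_share = adoption * usage_share
print(f"audio touched ~{affected_share:.1%} of the experience")  # ~2.2%
```

Even a large quality win inside that ~2% slice moves aggregate retention and monetization metrics very little, which is why the A/B test read flat.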
It has to be something where the majority of people, for the majority of the experience, go, wow, this is a big deal. Those are the features you need to be shipping. If it's not going to appeal to the majority of people, for the majority of the experience, and it's not a big deal, it's not going to move you. Cool. So you killed it; I don't see it anymore. Yep. So I love this. It's kind of cheesy, I guess, but the longer I've been working at Chai, and I think the team agrees with this, all the platitudes, at least I thought they were platitudes, that you would get from Steve Jobs, like build something insanely great, or be maniacally focused, or the most important thing is choosing what not to work on, all of these lessons are just painfully true. They're painfully true. So now everything I say, I'm either quoting Steve Jobs or Zuckerberg. I'm like, guys, move fast and break things.

swyx [00:42:10]: You've jumped the Apollo to cool it now.

William [00:42:12]: Yeah, everything they said is so, so true. The turtleneck. Yeah, yeah, yeah. Everything is so true.

swyx [00:42:18]: This is my last question, and then I want to pass it to Alessio: it's on multimodality in general. This actually comes from Justine Moore from a16z, who's a friend of ours. A lot of people are trying to do voice, image, and video for AI companions. Yes. You just said voice didn't work. Yep. What would make you revisit?

William [00:42:36]: So Steve Jobs was very, very clear on this. There's a habit among engineers where, once they've got some cool technology, they want to find a way to package up the cool technology and sell it to consumers, right? That does not work. So you're free to try to build a startup where you've got your cool tech and you want to find someone to sell it to. That's not what we do at Chai. At Chai, we start with the consumer.
What does the consumer want? What is their problem? And how do we solve it? So right now, the number one problem for the users is not the audio. It's not the image generation either. The number one problem for users in AI is this: all the AI is being generated by middle-aged men in Silicon Valley, right? That's all the content. You're interacting with this AI, you're speaking to it for 90 minutes on average, and it's being trained by middle-aged men. They're the ones asking, what should the AI say in this situation? What's funny? What's cool? What's boring? What's entertaining? That's not the way it should be. The way it should be is that the users should be creating the AI, right? And so the way I speak about it is this: at Chai, we have this AI engine which sits atop a thin layer of UGC. That thin layer of UGC is absolutely essential, but it's just prompts. It's just an image, it's just a name. We've done 1% of what we could do. So we need to keep thickening up that layer of UGC. It must be the case that the users can train the AI. And if reinforcement learning is powerful and important, they have to be able to do that. And so, as I say to the team: just as Mr. Beast is able to spend $100 million a year, or whatever it is, on his production company, with a team building the content which he then shares on the YouTube platform, until there's a team earning $100 million a year, or spending $100 million a year, on the content they're producing for the Chai platform, we're not finished, right? So that's the problem. That's what we're excited to build.
And getting too caught up in the tech, I think, is a fool's errand. It does not work.

Alessio [00:44:52]: As an aside, I saw the Beast Games thing on Amazon Prime. It's not doing well, and I'm curious.

swyx [00:44:56]: It's kind of like, I mean, the audience rating is high. The Rotten Tomatoes score sucks, but the audience rating is high.

Alessio [00:45:02]: But it's not like in the top 10. I saw it dropped off of like the... Oh, okay. Yeah, that one I don't know. I'm curious, because it's kind of like similar content, but a different platform. And then going back to some of what you were saying: people come to Chai expecting some type of content.

William [00:45:13]: Yeah, I think something that's interesting to discuss is moats. What is the moat? So, you know, if you look at a platform like YouTube, the moat, I think, is really in the ecosystem. And the ecosystem is comprised of the content creators, the users (the consumers), and the algorithms. This creates a sort of flywheel where the algorithms are trained on the users and the users' data, and the recommender systems can then feed information to the content creators. So Mr. Beast knows which thumbnail does the best. He knows the first 10 seconds of the video has to be this particular way. And so his content is super optimized for the YouTube platform. That's why it doesn't do well on Amazon. If he wants to do well on Amazon, well, how many videos has he created on the YouTube platform? Thousands, tens of thousands, I guess. He needs to get those iterations in on Amazon.
So at Chai, I think it's all about how we can get the most compelling, rich, user-generated content, stick that on top of the AI engine and the recommender systems, such that we get this beautiful data flywheel: more users, better recommendations, more creators, more content, more users.

Alessio [00:46:34]: You mentioned the algorithm. You have this idea of the Chaiverse on Chai, and you have your own kind of LMSYS-like ELO system. Yeah, what are the things that your models optimize for, that your users optimize for? And maybe talk about how you built it, and how people submit models.

William [00:46:49]: So Chaiverse is what I would describe as a developer platform. More often when we're speaking about Chai, we're thinking about the Chai app, which is really a product for consumers. Consumers can come on the Chai app, interact with our AI, and interact with other UGC. And it's really just these kinds of bots, a thin layer of UGC. Okay. But our mission is not to have just a very thin layer of UGC. Our mission is to have as much UGC as possible. So, I don't want only people at Chai training the AI. I want everyone, not middle-aged men, building the AI; as many people building the AI as possible. Okay, so what we built was Chaiverse. And Chaiverse is kind of like a prototype, is the way to think about it. And it started with this observation: how many models get submitted to Hugging Face a day? It's hundreds, right? There are hundreds of LLMs submitted each day. Now consider what it takes to build an LLM. It takes a lot of work, actually. Someone devoted several hours of compute, several hours of their time, prepared a dataset, launched it, ran it, evaluated it, submitted it, right?
So there's a lot of work going into that. So what we did was we said, well, why can't we host their models for them and serve them to users? And what would that look like? The first issue is, how do you know if a model is good or not? We don't want to serve users the crappy models, right? So we took the LMSYS approach. I think it's really cool. It's really simple; it's a very intuitive thing: you simply present the users with two completions and say, look, this is from model A, this is from model B, which is better? And so if someone submits a model to Chaiverse, what we do is we spin up a GPU, download the model, host that model on the GPU, and start routing traffic to it. We think it takes about 5,000 completions to get an accurate signal, which is roughly what LMSYS does. And from that, we're able to get an accurate ranking of which models people find entertaining and which models they don't. If you look at the bottom 80%, they'll suck. You can just disregard them. They totally suck. Then when you get to the top 20%, you know you've got a decent model, but you can break it down into more nuance. There might be one that's really descriptive. There might be one that's got a lot of personality to it. There might be one that's really illogical. Then the question is, well, what do you do with these top models? From that, you can do more sophisticated things. You can try a routing thing, where for a given user request, you try to predict which of these N models the user will enjoy the most. That turns out to be pretty expensive and not a huge source of edge or improvement.
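The LMSYS-style ranking William describes can be sketched with a standard Elo update over pairwise "which completion was better?" votes. This is a generic illustration: the K-factor of 32 and the 1,000 starting rating are textbook defaults, not Chai's actual parameters.

```python
# Minimal Elo ranking from pairwise preference votes between two models.
# K=32 and a 1000 starting rating are generic defaults, not Chai's values.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed outcome of one comparison."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e_w)
    ratings[loser] -= k * (1.0 - e_w)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
# Simulate users preferring model_a in 7 of every 10 head-to-head votes.
for i in range(10):
    if i % 10 < 7:
        update(ratings, "model_a", "model_b")
    else:
        update(ratings, "model_b", "model_a")
print(sorted(ratings, key=ratings.get, reverse=True))  # model_a ranked first
```

After enough votes (the episode cites roughly 5,000 completions per model) the ratings stabilize enough to separate the bottom 80% from the top 20%.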
Something that we love to do at Chai is blending. The simplest way to think about it is: you're pretty quickly going to end up with one model that's really smart and one model that's really funny. How do you give the user an experience that is both smart and funny? Well, for 50% of the requests you serve them the smart model, and for 50% of the requests you serve them the funny model. Just a random 50%? Just a random, yeah. And then... That's blending? That's blending. You can do more sophisticated things on top of that, as with all things in life, but that's the 80/20 solution: if you just do that, you get a pretty powerful effect out of the gate. Random number generator. I think it's the robustness of randomness. Random is a very powerful optimization technique, and it's a very robust thing, so you can explore a lot of the space very efficiently. There's one thing that's really, really important to share, and this is the most exciting thing for me: after you do the ranking, you get an ELO score, and you can track a user from the first date they submit a model to Chaiverse. They almost always get a terrible ELO at first. Let's say the first submission gets an ELO of 1,100 or 1,000 or something, and you can see that they iterate and iterate and iterate, and it will be like, no improvement, no improvement, no improvement, and then boom. Do you give them any data, or do they have to come up with this themselves? We do. We try to strike a balance between giving them data that's very useful and staying compliant with GDPR, which means you have to work very hard to preserve the privacy of users of your app. So we try to give them as much signal as possible, to be helpful. The minimum is we're just going to give you a score, right? That's the minimum.
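The 50/50 blending William describes is just uniform random routing per request. A minimal sketch, with hypothetical model names and weights (not Chai's actual setup):

```python
import random

# Per-request random routing between models ("blending").
# Model names and weights are illustrative, not Chai's actual configuration.
BLEND = [("smart-model", 0.5), ("funny-model", 0.5)]

def pick_model(blend=BLEND, rng=random) -> str:
    """Choose which model serves this request, proportional to its weight."""
    models = [name for name, _ in blend]
    weights = [w for _, w in blend]
    return rng.choices(models, weights=weights, k=1)[0]

# Over many requests, the user sees a roughly even mix of both models.
counts = {"smart-model": 0, "funny-model": 0}
for _ in range(10_000):
    counts[pick_model()] += 1
```

Because the draw is independent per request, a single conversation interleaves both models, so the user experiences a persona that feels both smart and funny without any extra routing intelligence.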
But even with a score alone, people can optimize pretty well, because they're able to come up with theories: submit it, does it work? No. A new theory, does it work? No. And then boom: as soon as they figure something out, they keep it, and then they iterate, and then boom, they figure something else out, and they keep it.

Alessio [00:51:46]: Last year, you had this post on your blog about crowdsourcing the leap to the 10-trillion-parameter AGI, and you called it a mixture of experts of recommenders. Yep. Any insights? Updated thoughts, 12 months later?

William [00:51:58]: I think the timeline for AGI has certainly been pushed out, right? Now, I'm a controversial person, I don't know, like, I just think... You don't believe in scaling laws; you think AGI is further away. I think it's an S-curve. I think everything's an S-curve. And I think the models have proven to be far worse at reasoning than people thought. Whenever I hear people talk about LLMs as reasoning engines, I sort of cringe a bit. I don't think that's what they are. I think of them more as simulators. They get trained to predict the next most likely token. It's like a physics simulation engine: you get these games where you can construct a bridge, and you drop a car down, and it predicts what should happen. And that's really what LLMs are doing. It's not so much that they're reasoning; it's more that they're just doing the most likely thing. So fundamentally, the ability for people to add in intelligence, I think, is very limited. What most people would consider intelligence, I think, is not a crowdsourcing problem, right? Wikipedia crowdsources knowledge; it doesn't crowdsource intelligence. It's a subtle distinction. AI is fantastic at knowledge. I think it's weak at intelligence.
And it's easy to conflate the two, because if you ask it a question, say, who was the seventh president of the United States, and it gives you the correct answer, well, I don't know the answer to that, and you can conflate that with intelligence. But really, that's a question of knowledge. And knowledge is really about saying, how can I store all of this information, and then how can I retrieve something that's relevant? Okay, they're fantastic at that: at storing knowledge and retrieving the relevant knowledge. They're superior to humans in that regard. And so I think we need to come up with a new word. How does one describe it? AI should contain more knowledge than any individual human, and it should be more accessible than any individual human. That's a very powerful thing.

swyx [00:54:07]: That's super powerful. But what words do we use to describe that? We had a previous guest from Exa AI, which does search, and he tried to coin "super knowledge" as the opposite of superintelligence.

William [00:54:20]: Exactly. I think super knowledge is a more accurate word for it.

swyx [00:54:24]: You can store more things than any human can.

William [00:54:26]: And you can retrieve it better than any human can as well. And I think it's those two things combined that are special. I think that thing will exist. That thing can be built. And I think you can start with something that's entertaining and fun. I often think it's going to be a 20-year journey, and we're in, like, year four. It's like the web, and this is 1998 or something. You've got a long, long way to go before the Amazon.coms are these huge, multi-trillion-dollar businesses that every single person uses every day. And so AI today is very simplistic.
And it's fundamentally the way we're using it, the flywheels, and this ability for everyone to contribute to it, that will really magnify the value it brings. Right now, I think it's a bit sad. Right now you have big labs, I'm going to pick on OpenAI, and they go to these human labelers and say, we're going to pay you to label this subset of questions so we get a really high-quality dataset, and then we're going to get our own computers that are really powerful. And that's kind of the thing. For me, it's so much like Encyclopedia Britannica. It's insane. All the people that were interested in blockchain, well, this is what needs to be decentralized. You need to decentralize that thing, because if you distribute it, people can generate way more data in a distributed fashion. Way more, right? You need the incentive. Yeah, of course. But that's kind of the exciting thing about Wikipedia: it's this understanding of the incentives. You don't need money to incentivize people. You don't need dog coins. No. Sometimes people get the satisfaction fro
Sponsorships and applications for the AI Engineer Summit in NYC are live! (Speaker CFPs have closed.) If you are building AI agents or leading teams of AI Engineers, this will be the single highest-signal conference of the year for you.

Right after Christmas, the Chinese Whale Bros ended 2024 by dropping the last big model launch of the year: DeepSeek v3. Right now on LM Arena, DeepSeek v3 has a score of 1319, right under the full o1 model, Gemini 2, and 4o latest. This makes it the best open weights model in the world in January 2025.

There has been a big recent trend of Chinese labs releasing very large open weights models, with Tencent releasing Hunyuan-Large in November and Hailuo releasing MiniMax-Text this week, both over 400B parameters in size. However, these extra-large language models are very difficult to serve.

Baseten was the first of the inference neocloud startups to get DeepSeek v3 online, because of their H200 clusters, their close collaboration with the DeepSeek team, and their early support of SGLang, a relatively new vLLM alternative that is also used at frontier labs like X.ai. Each H200 has 141 GB of VRAM with 4.8 TB per second of bandwidth, meaning that you can use 8 H200s in a node to inference DeepSeek v3 in FP8, taking into account KV cache needs.

We have been close to Baseten since Sarah Guo introduced Amir Haghighat to swyx, and they supported the very first Latent Space Demo Day in San Francisco, which was effectively the trial run for swyx and Alessio to work together! Since then, Philip Kiely also led a well-attended workshop on TensorRT-LLM at the 2024 World's Fair.
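The 8xH200 sizing mentioned above can be sanity-checked with rough arithmetic. DeepSeek v3 has about 671B total parameters; the accounting below is deliberately coarse and purely illustrative:

```python
# Rough memory check for serving DeepSeek v3 in FP8 on one 8xH200 node.
# Parameter count (~671B total, MoE with all experts resident) is the
# published figure; the overhead accounting is a coarse illustration.
params_b = 671e9          # total parameters
bytes_per_param = 1       # FP8 weights: one byte per parameter
weights_gb = params_b * bytes_per_param / 1e9       # ~671 GB of weights

node_gb = 8 * 141         # 8 x H200 at 141 GB HBM each = 1128 GB
kv_cache_budget_gb = node_gb - weights_gb           # ~457 GB left over
print(f"weights ~{weights_gb:.0f} GB, headroom ~{kv_cache_budget_gb:.0f} GB")
```

That leftover ~450 GB is what makes room for the KV cache and runtime overhead the paragraph alludes to; in BF16, the weights alone (~1.3 TB) would not fit on a single node.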
We worked with him to get two of their best representatives, Amir and Lead Model Performance Engineer Yineng Zhang, to discuss DeepSeek, SGLang, and everything they have learned running mission-critical inference workloads at scale for some of the largest AI products in the world.

The Three Pillars of Mission Critical Inference

We initially planned to focus the conversation on SGLang, but Amir and Yineng were quick to correct us that the choice of inference framework is only the simplest, first choice of 3 things you need for production inference at scale:

“I think it takes three things, and each of them individually is necessary but not sufficient:

* Performance at the model level: how fast you are running this one model on a single GPU, let's say. The framework that you use there can matter. The techniques that you use there can matter. The MLA technique, for example, that Yineng mentioned, or the CUDA kernels that are being used. But there are also techniques being used at a higher level, things like speculative decoding with draft models or with Medusa heads. These are implemented in the different frameworks, or you can even implement them yourself, but they're not necessarily tied to a single framework. Using speculative decoding gets you massive upside when it comes to being able to handle high throughput. But that's not enough. Invariably, that one model running on a single GPU is going to get more traffic than it can handle.

* Horizontal scaling at the cluster/region level: At that point, you need to horizontally scale it. That's not an ML problem. That's not a PyTorch problem. That's an infrastructure problem. How quickly do you go from a single replica of that model to 5, to 10, to 100? That's the second pillar that is necessary for running these mission-critical inference workloads. And what does it take to do that?
Some people are like, oh, you just need Kubernetes, and Kubernetes has an autoscaler and that just works. That doesn't work for these kinds of mission-critical inference workloads, and you end up catching yourself wanting, bit by bit, to rebuild those infrastructure pieces from scratch. This has been our experience. And then going even a layer beyond that: Kubernetes runs in a single cluster, tied to a single region. When it comes to inference workloads and needing GPUs more and more, we're seeing that you cannot meet the demand inside of a single region of a single cloud. In other words, if a single model wants to horizontally scale up to 200 replicas, each of which is, let's say, 2 H100s or 4 H100s or even a full node, you run into the capacity limits of that one region. What we had to build to get around that was the ability for a single model to have replicas across different regions. So there are models on Baseten today that have 50 replicas in GCP East, 80 replicas in AWS West, Oracle in London, etc.

* Developer experience for Compound AI Systems: The final pillar is wrapping the power of the first two pillars in a very good developer experience, to be able to afford certain workflows like the ones I mentioned around multi-step, multi-model inference workloads, because more and more we're seeing that the market is moving towards those more complex workflows.”
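Of the techniques named in the first pillar, speculative decoding is the easiest to sketch. The toy below only illustrates the control flow (a cheap draft model proposes tokens, the expensive target model verifies them, and the longest agreeing prefix is kept); both "models" here are arithmetic stubs, not real LLMs, and real implementations verify all draft tokens in a single batched forward pass:

```python
# Toy illustration of speculative decoding's control flow.
# Both models are stand-in stubs operating on integer "tokens".

def draft_model(prefix: list, k: int = 4) -> list:
    """Cheap model: guess the next k tokens (stub: just counts up)."""
    return [prefix[-1] + i + 1 for i in range(k)]

def target_model_next(prefix: list) -> int:
    """Expensive model's true next token (stub: +1, with a twist at 5)."""
    return prefix[-1] + 2 if prefix[-1] == 5 else prefix[-1] + 1

def speculative_step(prefix: list, k: int = 4) -> list:
    """Accept draft tokens while the target model agrees, then append the
    target's own token at the first disagreement (or stop after k hits)."""
    proposed = draft_model(prefix, k)
    for tok in proposed:
        if target_model_next(prefix) == tok:
            prefix = prefix + [tok]      # draft verified: accepted for free
        else:
            return prefix + [target_model_next(prefix)]  # fix and stop
    return prefix

print(speculative_step([3]))  # prints [3, 4, 5, 7]: target diverges after 5
```

The throughput win comes from the target model scoring all k draft tokens in one pass instead of k sequential decoding passes, so every accepted draft token is nearly free.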
We think they said it very well.

Show Notes
* Amir Haghighat, Co-Founder, Baseten
* Yineng Zhang, Lead Software Engineer, Model Performance, Baseten

Full YouTube Episode. Please like and subscribe!

Timestamps
* 00:00 Introduction and Latest AI Model Launch
* 00:11 DeepSeek v3: Specifications and Achievements
* 03:10 Latent Space Podcast: Special Guests Introduction
* 04:12 DeepSeek v3: Technical Insights
* 11:14 Quantization and Model Performance
* 16:19 MoE Models: Trends and Challenges
* 18:53 Baseten's Inference Service and Pricing
* 31:13 Optimization for DeepSeek
* 31:45 Three Pillars of Mission Critical Inference Workloads
* 32:39 Scaling Beyond Single GPU
* 33:09 Challenges with Kubernetes and Infrastructure
* 33:40 Multi-Region Scaling Solutions
* 35:34 SGLang: A New Framework
* 38:52 Key Techniques Behind SGLang
* 48:27 Speculative Decoding and Performance
* 49:54 Future of Fine-Tuning and RLHF
* 01:00:28 Baseten's V3 and Industry Trends

Baseten's previous TensorRT-LLM workshop: Get full access to Latent Space at www.latent.space/subscribe
It's the year-in-review show, and the Steam survey, and the Linux Kernel commit review. There's also Proxmox news, news on Debian 13, and questions about x.org. Then the guys dove into their predictions from last year, and made new predictions for 2025. Check it out to see how they did! You can find the show notes at https://bit.ly/4fMbHnK and happy new year! Host: Jonathan Bennett Co-Hosts: Rob Campbell, Jeff Massie, and Ken McDonald Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.