Want to Start or Grow a Successful Business? Schedule a FREE 13-Point Assessment with Clay Clark Today At: www.ThrivetimeShow.com Join Clay Clark's Thrivetime Show Business Workshop!!! Learn Branding, Marketing, SEO, Sales, Workflow Design, Accounting & More. **Request Tickets & See Testimonials At: www.ThrivetimeShow.com **Request Tickets Via Text At (918) 851-0102 See the Thousands of Success Stories and Millionaires That Clay Clark Has Helped to Produce HERE: https://www.thrivetimeshow.com/testimonials/ Download A Millionaire's Guide to Become Sustainably Rich: A Step-by-Step Guide to Become a Successful Money-Generating and Time-Freedom Creating Business HERE: www.ThrivetimeShow.com/Millionaire See Thousands of Case Studies Today HERE: www.thrivetimeshow.com/does-it-work/
The Client Stampede - An Unconventional Marketing Podcast by Julie Guest
Kobe Bryant wasn't just a basketball legend—he was a master of mindset, discipline, and pushing past limits. His leadership wasn't about pleasing people; it was about demanding greatness. In this episode, we take a look at Kobe's most powerful lessons on success, resilience, and thinking bigger, from his brutal honesty about failure to the “GOAT mountain” he climbed to learn from the best. If you've ever needed a shot of motivation (or a swift kick to step up your game), this one's for you. Ready for more Kobe inspiration? In Kobe's honor, Nike have dubbed it the year of the Mamba and they made this great video called “Have a Hard Year." Check out that video HERE. And as Kobe once said, “Everything negative — pressure, challenges — is all an opportunity for me to rise.” Watch this second video HERE any time you need to reset, refocus, and rise.
GET MORE MARKETING & SALES TOOLS:
Are you interested in becoming the published author of a powerful book to help you attract more ideal clients and set you apart from the competition? Imagine holding your own book in your hands as quickly as 3-6 weeks without you ever having to write a word. We do all the work, you get all the glory! Find out how we Capture Your Genius at our sister publishing house Lunch Break Books - powerful books for entrepreneurs with big growth goals.
Are you subscribed to Marketing Gold? Get more marketing tools, tips and strategies delivered to your inbox most Mondays. Sign up here.
Is your business doing $2M+ and you're ready to take it to the next level? We'll show you how. Get your free marketing roadmap by taking the Client Stampede Assessment. It's fast, free (Value $197) and your 20+ page report is emailed to you instantly.
Enjoying the podcast? You'll love the audio book. Get The Client Stampede audio book on Amazon.
We had blockmamba on way back when for Episode #2, and since then he has had the most listened-to episode in the history of the show. We had to have him back! Mamba is on to chat Super Mario 64, It Takes Two, and a Pac-Man/Mappy combo. You can find all of blockmamba's links at c3.gg/blockmamba. Our hosts' links can be found at c3.gg/rey and c3.gg/dan. The show is Executive Produced by Channel 3 Founder Joel Willis, who can be found at c3.gg/joel. Our theme song is provided by Castor Garden. Find all of their tracks on Spotify by simply going to c3.gg/castorgardenmusic or find all of their links at c3.gg/castorgarden. ALSO! https://adam-evalt.itch.io/neoclassical-mystery is a music package that our own castorgarden put together for game developers. Go check it out or contact Castor Garden for your own custom music package. About Channel3.gg: channel3.gg is social networking built from the ground up for gamers. Sure, you can do all the stuff like on the old social medias, like post pictures, videos, comments and the like. Channel 3 is so much more than that, though. It takes the social media experience and gamifies it. Made a great post that someone likes (1-ups) or respawns? You earn XP (experience points) that level you up. New levels mean chances to win tickets for physical prizes, earn digital flair for your profile, and more. Additionally, there are weekly events hosted by Channel 3 that let the community unwind and kick back with a little friendly competition. Sure, you want to win, but it's more about hanging out and the vibes. These events are hosted on C3's Twitch Channel and also earn XP for participants. XP can also be earned for completing quests (questions related to games and being a gamer), challenges where you go forth and complete a task in a game, rating & reviewing games and systems, creating specifically themed lists of games, and more. You can find Channel 3 in both the Android and Apple App Stores or at c3.gg/app
White Mamba | Take the Scallenge And Experience the Ultimate Brian Scalabrine Highlight Video To learn more about the man, the myth and the legend that is Brian Scalabrine click the links below: https://en.wikipedia.org/wiki/Brian_Scalabrine https://x.com/scalabrine?lang=en To discover more original music by Brett Raio and Clay Clark click the links below: https://www.youtube.com/@BrettRaio/videos https://www.thrivetimeshow.com/lyrical-miracles/ https://www.youtube.com/@thrivetimeshowbusinessscho5008/videos
“What's the legacy you want to leave — what values are most important to you? We're all made differently, so it's always important that people show up at work who they are.” Adam Antoniewicz is Nike China's VP of Men's Marketing, leading groundbreaking campaigns that redefine consumer engagement. Previously, he served as Apple's Head of Marketing Communications for Greater China, overseeing all customer-facing communications in Apple's $44B market - spearheading iconic campaigns, launching the company's first social media in the region, and navigating complex challenges like the China Cybersecurity Law and U.S.-China trade tensions. Adam's earlier roles include leading Nike Basketball in Greater China, where he led the Rise (打出名堂) campaign and launched the House of Mamba - the world's first reactive LED basketball court. Adam also drove key marketing partnerships for the NBA and helped build the NFL's presence in China. You'll enjoy the candid conversation about how we can enhance our team by learning from sport, and the importance of a pregame before work! This is part of our Chinese leaders series - hosted by P&G Alumni Emily Chang. Got an idea for a future “Learnings from Leaders” episode? Reach out at pgalumpod@gmail.com
Happy National Seal Day! What an absolute marathon of basketball with March Madness, yet the Madness seems to be missing from the Sweet 16! However, it's still top tier sports. There is still plenty to talk about around the sports world with Jameis Winston/Aaron Rodgers taking new homes (maybe), MLB parks innovating with food, Tiger Woods wants Privacy dad gum it, and Much More! Also, The Valspar Championship was at Copperhead this weekend, and what a show! Victory for Valhalla at the Valspar, as Viktor Hovland gets back in the W column! The G.U.Y.S are back and we totally wrecked ourselves this weekend for DraftKings DFS! We will still throw out some Hang The Banners, Salute Your Sports, and talk Other Relevant Sports News. We've got it all, including a mall debate, let's laugh! Look alive folks!
Follow us on:
HOF Bets: https://hof-bets.app.link/millygoats (Promo Code: MILLYGOATS)
Twitter - https://www.twitter.com/MillyGoats
Instagram - https://www.instagram.com/TheMillyGoats
YouTube - https://www.youtube.com/@TheMillyGoats
Twitch - https://www.twitch.tv/TheMillyGoatsPodcast
TikTok - https://www.tiktok.com/@TheMillyGoats
Apple Pod - https://rb.gy/0meu1
Spotify Pod - https://t.ly/ZUfOb
Web - https://themillygoats.godaddysites.com/
It's been a fantastic week sitting with Brandon Hernandez from the SanDiegoBeer.News Awards! Tonight we close out the week with a toast to the host of the official after party, recently named the number one beer bar in the USA by USA Today, O'Brien's Pub! Of course we have to toast with a big award-winning beer from last year's ceremony: GOAL. Brewing's Year of the Mamba Filipino Style Lager.
Around The NFL, QB competition in Indy? NFL Report Cards are out. Is Clint right about the notion of this team “coddling” CJ? What's Popping, The White Mamba calls it a career, Luka faces his old team, & more.
Discover the Plan to Create and Scale Your Digital Business in Just 1 Hour!
In this new episode, I'm delighted to share my conversation with Marie Cibot. Marie is a veterinarian specializing in end-of-life support through her practice, Solame. In this conversation, she tells us about her journey with Mamba and the different experiences that led her to dedicate her career to the human/animal relationship. How to support an animal at the end of its life, what euthanasia is, how to pay tribute to your dog, or how to talk to your child about death: these are some of the topics we cover in this episode. Grief is not an easy subject to bring up, but this conversation matters, and I hope it will help you as much as it helped me.
⭐ Join the Club des Explorateurs! The first community for dog parents who want to go further in exploring their relationship with their dog. A community of support and mutual aid, monthly interactive webinars, and concrete tools to move forward at your own pace with your dog: https://lanicheaventure.podia.com/le-club-des-explorateurs ⭐
Contents:
00:10: Introduction to the podcast
01:24: Introducing Marie and Solame
2:20: The end of an animal's life
29:00: Euthanasia
33:20: Paying tribute to your animal
45:00: Talking to a child about an animal's death
55:00: How veterinarians are trained on animal end-of-life care
01:10:00: Supporting the podcast
Mentioned in this episode:
The episode on grief with Livia Arce: https://podcast.ausha.co/la-niche-aventure/ep56-livia-arce-psychopaws-le-deuil
Son odeur après la pluie by Cédric Sapin Defour
Solame's resources section: https://www.solame.vet/ressources/
Marie's Instagram account: https://www.instagram.com/marie_veto_solame/
Marie's Facebook page: https://www.facebook.com/profile.php?id=100094000063438
Solame's website: https://www.solame.vet/
Keep listening by following La Niche on:
Website: https://laniche-aventure.fr/
Instagram: https://www.instagram.com/lanicheaventure/
Facebook: https://www.facebook.com/lanicheaventure
LinkedIn: https://www.linkedin.com/company/lanicheaventure/
YouTube: https://www.youtube.com/channel/UC8FGY3ZcycTD6AfcTVNaIKA
Music: Dolling - Cybersdf
Source: https://soundcloud.com/cybersdf
License: https://creativecommons.org/licenses/by/3.0/deed.fr
Download (6MB): https://auboutdufil.com/?id=502
Hosted by Ausha. Visit ausha.co/politique-de-confidentialite for more information.
Use our promo code ALLDODGERS at https://play.underdogfantasy.com/pc-KrbSsn56X0
Clayton Kershaw officially returns to the Dodgers on a one-year deal! We hear from Kershaw about his recovery from offseason surgery and his surprising updated recovery timeline that could have him back in action sooner than expected. The Dodgers signed veteran reliever Luis Garcia to a minor league deal. We talk about the addition and what his role could be in the organization. Plus some Mookie Betts shortstop talk from Dave Roberts has us hyped for the position change now. Could Mookie find himself in the MVP conversation if he can be average at SS? We discuss. Plus Mookie's mindset has us ready to move on from 2024... All that and more on another edition of All Dodgers Live! Tune in all season long! Leave a voicemail or text the Friend of the Show hotline!
We discover that martial arts are the inspiration behind this musical group.
Sepp Hochreiter is the inventor of LSTM (Long Short-Term Memory) networks – a foundational technology in AI. Sepp discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***
TRANSCRIPT AND BACKGROUND READING: https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0
Prof. Sepp Hochreiter
https://www.nx-ai.com/
https://x.com/hochreitersepp
https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en
TOC:
1. LLM Evolution and Reasoning Capabilities
[00:00:00] 1.1 LLM Capabilities and Limitations Debate
[00:03:16] 1.2 Program Generation and Reasoning in AI Systems
[00:06:30] 1.3 Human vs AI Reasoning Comparison
[00:09:59] 1.4 New Research Initiatives and Hybrid Approaches
2. LSTM Technical Architecture
[00:13:18] 2.1 LSTM Development History and Technical Background
[00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity
[00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison
[00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential
3. Industrial Applications and Neuro-Symbolic AI
[00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages
[00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project
[00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches
[00:51:29] 3.4 Evolution of AI Paradigms and System Thinking
[00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison
[00:58:12] 3.6 NXAI Company and Industrial AI Applications
REFS:
[00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber) https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory
[00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov) https://link.springer.com/article/10.1007/BF02478259
[00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors) https://www.arxiv.org/pdf/2502.03671
[00:09:05] AlphaGo's Move 37 demonstrating creative AI (Google DeepMind) https://deepmind.google/research/breakthroughs/alphago/
[00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier) https://tufalabs.ai
[00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.) https://arxiv.org/abs/2405.04517
[00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.) https://arxiv.org/abs/2205.14135
[00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey) https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx
[00:36:10] Mamba 2 state space model architecture (Albert Gu et al.) https://arxiv.org/abs/2312.00752
[00:46:00] Austria's Pi AI project integrating symbolic & neural AI (Hochreiter et al.) https://www.jku.at/en/institute-of-machine-learning/research/projects/
[00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.) https://openreview.net/forum?id=7PGluppo4k
[00:49:30] JKU Linz's historical and neuro-symbolic research (Sepp Hochreiter) https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/
YT: https://www.youtube.com/watch?v=8u2pW2zZLCs
Matthew S. Ibrahim, PhD(c), CSCS, LMT, is a strength and conditioning coach, college professor, public speaker, author, and founder of Athletic Performance University (APU). Currently, he serves as Clinical Coordinator/Instructor of Exercise Science and Co-Advisor of The Hidden Opponent chapter at Endicott College. Initially drawn to strength and conditioning for the structure, routine and discipline, he started to see the impact on self-efficacy, problem solving and positive change. He has shared his knowledge and experience in over 25 U.S. states at prestigious venues such as the National Strength & Conditioning Association (NSCA), Perform Better, EXOS at Google Headquarters, Sports Academy (formerly Mamba), UFC Performance Institute, Duke University, Stanford University, and Equinox, along with several engagements across Europe. His professional work and expertise have been featured in leading platforms and publications, including Muscle & Fitness, Men's Journal, NSCA Personal Training Quarterly, Science for Sport, and T-Nation. As an author, Matthew is set to release his first book through Human Kinetics in July 2025: “Train Like a Pro: Programming to Develop Your Inner Athlete”. @matthewibrahim_, www.athleticperformanceu.com
You will forever remember where you were when this trade went down. Join the Dropping Dimes crew and friend of the show Bruce Mamba as they share their reactions to the Luka - AD trade and what the immediate future looks like for each team.
Welcome back to another episode of Fratello On Air! This week, we discuss a popular request from our listeners about traveling with watches. We've discussed this topic before, but it's been a while since we last did so. For our listeners, we begin the watch content after 32 minutes. Traveling with watches is a typical topic on forums and boards. Some may think it's a tired subject, but with new collectors always entering the fray, the theme deserves more attention. Plus, with new schemes arising to defraud watch wearers, it is constantly on some folks' minds.
Handgelenkskontrolle
To kick off this episode, we have a lengthy chat about Burns Night, television, movies, and shoes. Balazs mentions one of his favorite Scottish comedians, Kevin Bridges. We've been watching or awaiting titles such as The Night Agent, Nosferatu, Cross, Shining Girls, Paradise, Landman, and a documentary on Kimbo Slice on the VICE channel. Regarding shoes, the latest Nike Air Tech Challenge 2 retro tennis shoes worn by Andre Agassi are about to launch. We discuss the anniversary of Kobe Bryant's tragic death and mention the upcoming Nike Kobe V Protro "Year of the Mamba" shoes. For the Handgelenkskontrolle, Mike is wearing his newly repaired Movado Curviplan with a black dial and an 18K gold case. Balazs is sporting a new release, the Nivada Grenchen Antarctic Diver in green. The piece is limited to 75 pieces with a date and the same number without the function for €956. Expect a review soon!
Traveling with watches
Several people in our Discord group have asked about traveling with watches. It seems that the new year will bring heavy travel, and with watch friends all over the world, it's fun to bring a few watches to show and tell. However, is this the best idea? We discuss the pitfalls of packing precious timepieces for the trip. Safety, forgetfulness, how to pack, and more enter into the discussion. We hope you enjoy the episode and thanks again to our community for the topic suggestion! If you have ideas for future shows, please feel free to let us know!
On this episode of 'The Kevin O'Connor Show,' NBA player Larry Nance Jr. stops by to remember his former teammate, Kobe Bryant. Both KOC and Larry have a hard time believing it's been 5 years since we lost the legend. They talk about the lasting impact Kobe had on Larry's career and some of the funny moments they shared on and off the court. Larry had a front row seat to Kobe's 60-point final game and told Kevin what it was like watching that special night unfold.
Tom Haberstroh joins to discuss the latest news from around the league, including the latest on trade talks surrounding disgruntled Miami Heat star Jimmy Butler. The two also discuss Steph Curry, Zach LaVine, Khris Middleton, Bradley Beal, the injury-riddled Dallas Mavericks, and more.
(0:53) Suns trade 1st round pick to Jazz
(7:17) Jimmy Butler suspended again
(12:33) Zach LaVine trade buzz
(20:03) Steph Curry to the Spurs?
(40:09) Dereck Lively injury update
(42:43) Jarred Vanderbilt cleared to play
(43:10) Tom's favorite Kobe Bryant memories
(45:48) Remembering Kobe Bryant with Larry Nance Jr.
Subscribe to The Kevin O'Connor Show on your favorite podcast app:
QuantStack is an open-source technology software company specializing in tools for data science, scientific computing, and visualization. They are known for maintaining vital projects such as Jupyter, the conda-forge package channel, and the Mamba package manager. Sylvain Corlay is the CEO of QuantStack. He joins the podcast to talk about his company, Conda, Mamba, and software package security. The post Mamba and Software Package Security with Sylvain Corlay appeared first on Software Engineering Daily.
Terms like toxic workplace, toxic relationship, abuse, workplace burnout, and self-fulfillment are flying around these days. Can you burn out in your private life too, and are there workplaces that aren't toxic? What is it that makes a workplace or a relationship toxic in the first place, how can it be fixed, and is it even worth it? At the start of the year we tend to spin world-changing plans about doing more for our well-being, which naturally touches on our workplace too, and many people decide that this year everything will be different from last year. Then by the end of January they usually realize that either it isn't that simple, or that deciding alone isn't enough for real change. Especially when workplace pressure has tipped over into burnout and the employee is just struggling through the everyday. Burnout, however, is not only the individual's problem: time lost from work can cause serious economic damage, so prevention and treatment would be very important from every point of view. How can burnout be avoided, and if someone is already burned out, what can they do to climb out of the hole? These are the kinds of things we'll talk about in today's Tyúkól; we'll try to find red flags worth watching out for, and the symptoms and some shocking statistics will come up too. Our guest is Szalay Ágnes, psychologist and coach. In her work she strives to support people in finding their inner balance and in finding satisfaction in their work and their lives. As an organizational developer and coach she often encounters the effects of excessive stress and burnout – and fortunately she has already seen many people come out of it and change their lives. She has been practicing mindfulness meditation for five years and also teaches it in eight-week mindfulness-based stress reduction (MBSR) courses. If you'd like to read more on burnout and stress, you can do so here, here, and here.
In more detail:
00:00:16 - At the start of the year we sometimes promise ourselves too much, for example about our work.
00:01:54 - Why does everything suck?
00:04:14 - Blue Monday doesn't exist, but we discuss what it is anyway, because there might be something to it…
00:07:10 - What does it take at work not to burn out and not to be completely unmotivated? Is there a solution at all?
00:07:59 - You can really only burn out if you love your job.
00:12:06 - Practically everyone may be burned out…
00:15:52 - We can't imagine taking on fewer tasks.
00:16:33 - In this system, step zero is self-exploitation.
00:19:22 - The employer is, IN THEORY, legally obliged to pay attention to employees' mental health.
00:23:34 - Burnout has stages, and we go through them one by one!
00:27:22 - Women are more prone to burnout.
00:31:38 - … and they can also burn out in family duties.
00:32:07 - The 2025 Mamba campaign.
00:38:05 - We are not at all alone with stress, the pressure to meet expectations, and the feeling of inadequacy!
00:40:42 - Women have to perform, ideally perfectly, in more roles to begin with.
00:43:18 - And not only when they have children; if they don't, they become targets precisely because of that.
00:47:16 - Asserting your interests is very important, as is spotting the red flags of a toxic work environment as early as the job interview.
00:59:17 - There is also a generational difference in our attitude to work and in how much we let ourselves be drained…
01:05:14 - Change of topic: where has the sisterhood gone from workplaces? Why does woman become a wolf to woman?
01:08:02 - We are trying to get by as women in a patriarchal society.
01:12:03 - We arrive at situations with different "life scripts" as women and as men, and these sometimes make it harder to deal with problems.
01:14:20 - The "If we could take it, you have to take it too!" attitude.
01:18:22 - As a woman, don't be too good, but don't be not good enough either.
01:19:53 - Conversation and building workplace relationships can also be tools for handling conflicts and difficulties at work.
01:27:34 - Publicly humiliating girls, in education and in the workplace alike, has a very bad effect both on those who suffer it and on the boys, even if they don't recognize it right away.
01:30:06 - Write in with your Blue Monday experiences! :)
Some reading:
Depressing statistics according to which more than four-fifths of employees may be close to burnout?
The symptoms of burnout are fairly general, but still quite telling.
Byung-Chul Han has also written about burnout; his book The Burnout Society (A kiégés társadalma) was published in 2010.
The home office that spread during the COVID pandemic did not help: free time disappeared, work and private life slid together, and that paved the way for burnout.
Young adults, procrastination, and burnout – the article was a huge hit, so we also discussed it separately on 444.
Burnout is now officially recognized as an illness; there is such a thing as burnout syndrome.
We get stuck in autopilot mode and life runs past us.
It is easiest to burn out when you love your job.
It seems selfish to focus on yourself, but it is not selfishness to pay attention to how you are doing.
Our podcast comes out with a new episode every two weeks and can also be heard on 444's Spotify and Apple channels. Our earlier episodes can be found here. We welcome your suggestions, ideas, and observations at tyukol@444.hu. Illustration: Kiss Bence/444. See omnystudio.com/listener for privacy information.
In this week's episode, the guys go over some of the biggest matchups. They also take a look at many of the upcoming Kobe releases for 2025.
Check us out on Instagram: https://www.instagram.com/sneaksandstats/
We're also on YouTube: https://www.youtube.com/channel/UChfjqV40wCrqVFIqlfbnt_A
Buy a pair for yourself:
Nike Sabrina 2 Doernbecher - https://stockx.com/nike-sabrina-2-doernbecher-sophia-womens
Nike LeBron 22 - https://www.nike.com/t/lebron-xxii-basketball-shoes-aNB6tabQ
Converse All Star BB Trilliant CX - https://stockx.com/converse-all-star-bb-trilliant-cx-ox-diamond-pack-rainy-daze-blue
Nike Kobe 5 Protro Year of the Mamba Red - https://stockx.com/nike-kobe-5-protro-year-of-the-mamba-university-red
Nike Kobe 5 Protro Year of the Mamba Eggplant - https://stockx.com/nike-kobe-5-protro-year-of-the-mamba-eggplant
Introduction 00:00:26–00:01:37
Hosts Chris Horwedel and Matt Crone open the episode with lighthearted banter about the cold weather and clarify that their Patreon page is non-existent.
Kobe Bryant Shoe Discussion 00:01:37–00:09:03
Chris and Matt discuss the release schedule of Kobe Bryant shoes. Critique of various colorways, including "Year of the Mamba" and "Venice Beach." Matt expresses enthusiasm for the Elite Protro and All-Star 2.0 editions but criticizes the high price points.
California Wildfires and Impact 00:09:03–00:10:12
Brief reflection on the devastation caused by wildfires in Malibu, highlighting the destruction of multimillion-dollar homes. Both hosts send thoughts and prayers to those affected.
TV Series: Slow Horses 00:10:12–00:11:19
Chris shares his enjoyment of the spy thriller Slow Horses, which Matt had recommended. Discussion of favorite scenes and characters, with a focus on Gary Oldman's performance.
College Football Playoff Discussion 00:11:19–00:13:07
Analysis of the upcoming Penn State vs. Notre Dame game, highlighting defensive matchups and potential player absences. Chris notes the recent line shift in favor of Penn State and discusses quarterback Drew Allar's NFL draft potential.
Tyreek Hill's Potential Trade 00:14:09–00:15:54
Chris highlights rumors of Tyreek Hill leaving the Dolphins, with the Ravens and Raiders among potential landing spots. The hosts debate Hill's fit with different teams and how his departure could affect Miami's offense.
Custom Sneaker Viral Video 00:22:34–00:23:57
Chris and Matt discuss a viral video of a child receiving custom sneakers based on his drawings. They praise the parents' creativity and reflect on the joy of such personalized gifts.
NFL Wild Card Weekend Game Analysis
1. Los Angeles Chargers vs. Houston Texans 00:17:18–00:18:21
Chargers are favored by three points. Discussion about Justin Herbert's impressive passing stats and the Texans' inconsistent performances.
2. Pittsburgh Steelers vs. Baltimore Ravens 00:24:11–00:25:06
Ravens favored by 9.5 points. Matt hopes for a Steelers win due to his family ties but acknowledges Baltimore's strong defense.
3. Denver Broncos vs. Buffalo Bills 00:25:56–00:26:45
Bills favored by 8.5 points. Both hosts praise the Broncos' turnaround but expect the Bills to dominate due to their offensive strength.
4. Green Bay Packers vs. Philadelphia Eagles 00:27:02–00:28:38
Eagles favored by 4.5 points. Chris emphasizes the Eagles' depth and Jalen Hurts' return from injury. The absence of Christian Watson is noted as a significant blow to the Packers.
5. Washington Commanders vs. Tampa Bay Buccaneers 00:28:38–00:29:58
Buccaneers favored by three points. Matt argues Washington could upset Tampa, criticizing the Buccaneers' performance against stronger teams.
6. Minnesota Vikings vs. Los Angeles Rams 00:30:16–00:31:41
Rams are one-point underdogs. Discussion on Sam Darnold's struggles and the Rams' ability to score points.
Closing Thoughts and Props 00:32:04–00:33:04
Chris mentions an intriguing NFL prop bet involving Jalen Hurts, Saquon Barkley, and Josh Jacobs scoring touchdowns.
Chad Hyams and Bob Stewart explore Kobe Bryant's "10 Rules" for success, highlighting lessons applicable to life, business, and personal growth. They discuss concepts like getting better every day, proving skeptics wrong, learning from wins and losses, practicing mindfulness, and fostering ambition. The episode reflects on Kobe's Mamba mentality, his influence on sports and beyond, and how these principles can translate into everyday achievements. Whether seeking inspiration in personal endeavors or professional environments, this episode offers valuable insights from Kobe's extraordinary life. ---------- Connect with the hosts: • Ben Kinney: https://www.BenKinney.com/ • Bob Stewart: https://www.linkedin.com/in/activebob • Chad Hyams: https://ChadHyams.com/ • Book one of our co-hosts for your next event: https://WinMakeGive.com/speakers/ More ways to connect: • Join our Facebook group at www.facebook.com/groups/winmakegive • Sign up for our weekly newsletter: https://WinMakeGive.com/sign-up • Explore the Win Make Give Podcast Network: https://WinMakeGive.com/ Part of the Win Make Give Podcast Network
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all, all our LS supporters who helped fund the gorgeous venue and A/V production!
For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.
Of perennial interest, particularly at academic conferences, is scaled-up architecture research as people hunt for the next Attention Is All You Need. We have many names for them: “efficient models”, “retentive networks”, “subquadratic attention” or “linear attention”, but some of them don't even have any lineage with attention - one of the best papers of this NeurIPS was Sepp Hochreiter's xLSTM, which has a particularly poetic significance as one of the creators of the LSTM returning to update and challenge the OG language model architecture. So, for lack of a better term, we decided to call this segment “the State of Post-Transformers” and fortunately everyone rolled with it.
We are fortunate to have two powerful friends of the pod to give us an update here:
* Together AI: with CEO Vipul Ved Prakash and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents, BASED, Sequoia, Evo, Dragonfly, Dan Fu's ThunderKittens and many more research projects this year
* Recursal AI: with CEO Eugene Cheah who has helped lead the independent RWKV project while also running Featherless AI. This year, the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide, to support Microsoft's on-device, energy-usage-sensitive Windows Copilot use cases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch. On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model modified with RWKV linear attention layers.
We were looking to host a debate between our speakers, but given that both of them were working on post-transformer alternatives, it ended up as a joint presentation instead.
Full Talk on Youtube
Please like and subscribe!
Links
All the models and papers they picked:
* Earlier Cited Work
* Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
* Hungry hungry hippos: Towards language modeling with state space models
* Hyena hierarchy: Towards larger convolutional language models
* Mamba: Linear-Time Sequence Modeling with Selective State Spaces
* S4: Efficiently Modeling Long Sequences with Structured State Spaces
* Just Read Twice (Arora et al)
* Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference.
However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. * To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. * Our analysis suggests, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives 11.0±1.3 points of improvement, averaged across 16 recurrent LMs and the 6 ICL tasks, with 11.9× higher throughput than FlashAttention-2 for generation prefill (length 32k, batch size 16, NVidia H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides 99% of Transformer quality at 360M params., 30B tokens and 96% at 1.3B params., 50B tokens on average across the tasks, with 19.2× higher throughput for prefill than FA2.* Jamba: A 52B Hybrid Transformer-Mamba Language Model* We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. * Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. * This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU.* Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. * We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.* SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers* We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. 
Core designs include: * (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. * (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. * (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. * (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. * As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. * RWKV: Reinventing RNNs for the Transformer Era* Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. * We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.* Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. * We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.* LoLCATs: On Low-Rank Linearizing of Large Language Models* Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. * We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitudes less memory and compute. * We base these steps on two findings. * First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer").* Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). * LoLCATs significantly improves linearizing quality, training efficiency, and scalability. 
We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. * Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. * Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). * When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.Timestamps* [00:02:27] Intros* [00:03:16] Why Scale Context Lengths? or work on Efficient Models* [00:06:07] The Story of SSMs* [00:09:33] Idea 1: Approximation -> Principled Modeling* [00:12:14] Idea 3: Selection* [00:15:07] Just Read Twice* [00:16:51] Idea 4: Test Time Compute* [00:17:32] Idea 2: Hardware & Kernel Support* [00:19:49] RWKV vs SSMs* [00:24:24] RWKV Arch* [00:26:15] QWRKWv6 launch* [00:30:00] What's next* [00:33:21] Hot Takes - does anyone really need long context?Transcript[00:00:00] AI Charlie: We're back at Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co host. As a special treat this week, we're recapping the best of 2024 going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field.[00:00:24] AI Charlie: 200 of you joined us in person throughout the day, with over 2200 watching live online. Thanks Our next keynote covers the State of Transformers alternative architectures, with a special joint presentation with Dan Fu of Together AI and Eugene Chia of Recursal AI and Featherless AI. We've featured both Together and Recursal on the pod before, with CEO Veepal Vedprakash introducing them.[00:00:49] AI Charlie: And CTO CE Zhang joining us to talk about how they are building together together as a quote unquote full stack AI startup from the lowest level kernel and systems [00:01:00] programming to the highest level mathematical abstractions driving new model architectures and inference algorithms with notable industry contributions from Red Pajama V2, Flash Attention 3, Mamba 2, Mixture of Agents.[00:01:15] AI Charlie: Based, Sequoia, Evo, Dragonfly, Danfoo's Thunder Kittens, and many more research projects this year. As for Recursal and Featherless, we were the first podcast to feature RWKV last year, and this year the team has shipped RWKV v5, codenamed Eagle, to 1. 5 billion Windows 10 and Windows 11 machines worldwide to support Microsoft's on device, end Energy Usage Sensitive Windows Copilot Use Cases and has launched the first updates on RWKV v6, codenamed Finch and Goldfinch.[00:01:53] AI Charlie: On the morning of Latent Space Live, they also announced QRdata UKv6, a QEN32B model [00:02:00] modified with RDWKV linear attention layers. Eugene has also written the most single most popular guest post on the Latent Space blog this year. Yes, we do take guest posts on what he has discovered about the H100 GPU inference NeoCloud market since the successful launch of Featherless AI this year.[00:02:20] AI Charlie: As always, don't forget to check the show notes for the YouTube link to their talk as well as their slides. Watch out and take care.[00:02:27] Intros[00:02:27] Dan Fu: Yeah, so thanks so much for having us. 
So this is going to be a little bit of a two part presentation. My name is Dan. I'm at Together AI, and I'll be joining UCSD as faculty in about a year. And Eugene, you want to introduce yourself?[00:02:46] Eugene Cheah: Eugene, I lead the art activity team, and I, I'm CEO of Featherless, and we both work on this new post transformer architecture space.[00:02:55] Dan Fu: Yeah, so yeah, so today we're really excited to talk to you a little bit [00:03:00] about that. So first I'm going to give a broad overview of kind of the last few years of progress in non post transformer architectures. And then afterwards Eugene will tell us a little bit about the latest and the greatest and the latest frontier models in this space.[00:03:16] Why Scale Context Lengths? or work on Efficient Models[00:03:16] Dan Fu: So, the story starts with Scaling. So this is probably a figure or something like this that you've seen very recently. Over the last five to six years, we've seen models really scale up in parameter size, and that's brought with it a bunch of new capabilities, like the ability to talk to you and tell you sometimes how to use your Colab screens.[00:03:35] Dan Fu: But another place where we've seen scaling especially recently is scaling in context length. So this can mean Having more text inputs for your models, but it can also mean things like taking a lot of visual token inputs image inputs to your models or generating lots of outputs. And one thing that's been really exciting over the last few months or so is that we're, we're seeing scaling, not only during training time, but also [00:04:00] during test time.[00:04:00] Dan Fu: So this is one of the, the, this is the iconic image from the OpenAI 01 release. Not only are we starting to scale train time compute, but we're also starting to scale test time compute. Now if you're familiar with our attention and our transformer architectures today, this graph on the right might look a little bit scary.[00:04:19] Dan Fu: And one of the reasons is that the implications are a little bit Interesting. So what does it mean if we want to continue having smarter and smarter models? Do we just need to start building bigger, bigger data centers, spending more flops? Is this this little Dolly 3, we need more flops, guys? Is this going to be the future of all of AI?[00:04:39] Dan Fu: Or is there a better way, another path forward? Maybe we can get the same capabilities that we've gotten used to, But for a lot less compute, a lot less flops. And one of the things that we're going to talk about today is specifically looking at that core attention operator in some of these models.[00:04:57] Dan Fu: And the reason is that so this is just some, some [00:05:00] basic you know, scaling curves, but attention has compute that scales quadratically in the context length. So that means that if you're doing something like test time compute and you want to spend a bunch of tokens thinking about what comes next, the longer that that goes the, the, the more tokens you spend on that, that compute grows quadratically in that.[00:05:19] Dan Fu: One of the questions that we're interested in is, can we take that basic sequence model, that basic sequence primitive at the bottom, and get it to scale better? Can we scale in, let's say, n to the 3 halves or n log n? So in, in the first part of the talk, so we just went over the introduction. 
What I'm gonna do over the next few slides is just talk about some of the key advances and ideas that have shown over the past few years since maybe early 2020 to, to now that shown promise that this might actually be possible.[00:05:48] Dan Fu: That you can actually get potentially the same quality that we want while scale, while scaling better. So to do that, we're and, and basically the, the story that we're gonna look is we're gonna start to see [00:06:00] how. So this is a basic graph of just the past couple years of progress of perplexity where that blue line, that dotted blue line, is attention.[00:06:07] The Story of SSMs[00:06:07] Dan Fu: It's your basic transformer, full dense attention. And then the dots coming down are some of the methods that you'll see in this presentation today. We're going to turn the clock back all the way to 2020. So this, this, this question of can we make attention subquadratic? Basically, as soon as we said attention is all you need, People started asking this question.[00:06:28] Dan Fu: So we have this quadratic attention operator. Can we do better? I'll briefly talk about why attention is quadratic. And the basic thing that happens, if you're not familiar, is that you have these inputs, these keys and queries. And what you do in this attention matrix, this S matrix over here, is that you're using, you're comparing every token in your input to every other token.[00:06:49] Dan Fu: So when I try to do something like upload a whole book to Gemini, what happens beyond the Maybe not Gemini, because we don't necessarily know what architecture is. But let's say we upload it to LLAMA, what happens beyond [00:07:00] the scenes, behind the scenes, is that it's going to take every single word in that book and compare it to every other word.[00:07:05] Dan Fu: And this has been a really, it's, it's led to some pretty impressive things. But it's kind of a brute forcing of the way that you would try to interpret a interpret something. And what attention does in particular is the, and then what attention, sorry, don't want to. Okay, no, no laser pointer. What, what attention does afterwards is that instead of always operating in this quadratic thing, it takes a row wise softmax over this matrix, and then multiplies it by this values matrix.[00:07:32] Dan Fu: So, one of the key points to notice is that the output size is always going to be the same as the inputs, at least in standard self attention. So one of the first things that folks tried to do around 2020 is this thing called linear attention, which is just, just noticing that if we take out this softmax from here, if we take out this non linearity in the middle of the attention operation, and then if you compute the keys and the values operation first, you actually never hit this quadratic bottleneck.[00:07:57] Dan Fu: So that, that's potentially a way [00:08:00] to get a lot more computationally efficient. And there are various ways to do this by basically using feature maps or try to approximate this overall attention computation. But some of this work sort of started to hit a wall in 2020. And the basic challenges were, were two.[00:08:16] Dan Fu: So one was quality. It was back then, it was kind of hard to, to get good quality with these linear attention operators. The other one was actually hardware efficiency. So these, this feature map that was just shown by a simplify simplify here. 
Actually ends up being quite computationally expensive if you just implement it naively.[00:08:34] Dan Fu: So you started having these operators that not only were you sure, you're not really sure if they have the same quality, but also they're actually just wall clock slower. So you kind of end up getting the worst of both worlds. So this was the the stage. So that kind of sets the stage for four years ago.[00:08:49] Dan Fu: Keep this in mind because linear attention is actually going to come back in a few years once we have a better understanding. But one of the works that started kicking off this, this [00:09:00] mini revolution in post transformer architectures was this idea called states based model. So here the seminal work is, is one about our work queue in 2022.[00:09:09] Dan Fu: And this, this piece of work really brought together a few ideas from, from some long running research research lines of work. The first one was, and this is really one of the keys to, to closing the gap in quality was just using things that, that if you talk to a, a, an electrical engineer off the street, they might know off, off the, like the back of their hand.[00:09:33] Idea 1: Approximation -> Principled Modeling[00:09:33] Dan Fu: But taking some of those properties with how we model dynamical systems in signal processing and then using those ideas to model the inputs, the, the text tokens in, for example a transformer like Next Token Prediction Architecture. So some of those early states-based model papers were looking at this relatively, relatively simple recurrent update model that comes from maybe chapter one of a signal processing class.[00:09:59] Dan Fu: But then using [00:10:00] some principle theory about how you should do that recurrent update in order to really get the most that you can out of your hidden state, out of your out of your sequence. So that, that was one key idea for quality and. When this was eventually realized, you started to see a bunch of benchmarks that were pretty sticky for a few years.[00:10:20] Dan Fu: Things like long range arena, some long sequence evaluation benchmarks, There was stuff in time series, time series analysis. They started to, you started to see the quality tick up in meaningful ways. But the other key thing that What's so influential about these states based models is that they also had a key idea about how you can compute these things efficiently.[00:10:45] Dan Fu: So if you go back to your machine learning 101 class where you learned about RNNs, one thing that you may have learned is that they don't paralyze as well as detention, because if you just run them naively, you have to do this kind of sequential update to process new tokens, [00:11:00] whereas in attention, you can process all the tokens in parallel at one time.[00:11:04] Dan Fu: One of the key insights behind the S4 paper was that these recurrent models, you could take them and you could also formulate them as a convolution. And in particular, with a convolution, you could, instead of using a PyTorch conv1d operation, you can compute that with the FFT. And that would give you n log n compute in the in the sequence length n with an operator that was relatively well optimized for modern hardware.[00:11:28] Dan Fu: So those are really, I'd say, the two key ideas in 2022 that started allowing these breakthroughs to happen in these non transformer architectures. 
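To make that FFT idea concrete, here is a minimal sketch (assuming PyTorch; the exponentially decaying filter `k` is a hypothetical stand-in for the kernel an SSM's parameters would define, not the actual S4 kernel construction):

```python
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal convolution of an input u (batch, length) with a long filter k (length,),
    computed via FFT in O(n log n) rather than with a direct sliding product."""
    L = u.shape[-1]
    n = 2 * L                               # zero-pad to avoid circular wrap-around
    u_f = torch.fft.rfft(u, n=n)            # transform the input
    k_f = torch.fft.rfft(k, n=n)            # transform the filter
    y = torch.fft.irfft(u_f * k_f, n=n)     # pointwise multiply, then invert
    return y[..., :L]                       # keep only the causal part

# Toy usage: a decaying filter standing in for what an SSM's (A, B, C) would produce.
u = torch.randn(4, 1024)
k = torch.exp(-0.01 * torch.arange(1024.0))
y = fft_long_conv(u, k)                     # same shape as u
```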
So, these ideas about how to principally model sorry, how to model the recurrent updates of a mo of, of a sequence in a principled way, and also these key ideas in how you can compute it efficiently by turning it into a convolution and then scaling it up with the FFT.[00:11:53] Dan Fu: Along those same lines, so afterwards we started putting out some work on specialized kernels, so just [00:12:00] like we have flash attention for transformers, we also have works like flash fft conf, and if you look at these lines of work oftentimes when, whenever you see a new architecture, you see a new primitive one of the, one of the table stakes now is, do you have an efficient kernel so that you can actually get wall clock speed up?[00:12:14] Idea 3: Selection[00:12:14] Dan Fu: So by 2022, We are starting to have these models that had promising quality primitives, but and, and also promising wall clocks. So you could actually see regimes where they were better than transformers in meaningful ways. That being said, there were, there's still sometimes a quality gap, particularly for language modeling.[00:12:33] Dan Fu: And because languages, It's so core to what we do in sequence modeling these days the, the next, the next key idea that I'm going to talk about is this idea of selection mechanisms. And this is basically an idea of, so you have this recurrent state that you're keeping around that just summarizes everything that, that came before.[00:12:50] Dan Fu: And to get a good sequence model, one of the things that you really need to be able to do is have the model learn what's the best way to pick out pieces from that recurrent [00:13:00] state. So one of the, one of the major ideas here in a line of work called H3, Hungry Hungry Hippos, and also these hyena models were One way you can do this is by just adding some simple element wise gates.[00:13:13] Dan Fu: So versions of these ideas have been around for decades. If you squint at the LSTM paper you, you can probably find, find this gating mechanism. But turns out you can take those old ideas, add them into these new. state space models, and then you can see quality start to pick up. If you've heard of the Mamba model, this also takes the selection to the next level by actually making some changes in that fundamental recurrent state space.[00:13:40] Dan Fu: So, it's not only just this gating that happens around the SSM layer, but also you can actually make The ABCD matrices of your state space model, you can make them data dependent, which will allow you to even better select out different pieces from your hidden state depending on what you're seeing. I'll also point out if you look at the [00:14:00] bottom right of this figure, there's this little triangle with a GPU SRAM, GPU HBM, and this, this is just continuing that trend of when you have a new architecture you, you, you also release it with a kernel to, to, to show that it is hardware efficient, that it, that it can be hardware efficient on modern hardware.[00:14:17] Dan Fu: The, the, one of the next cool things that happened is once we had this understanding of these are the basic pieces, these are the basic principles behind some of the sequence models linear attention actually started to come back. 
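Before moving on, a rough sketch of that gating idea (illustrative only; the module names and shapes here are assumptions, not the H3 or Mamba implementation). The pattern is simply an element-wise, data-dependent gate multiplied into the output of whatever sequence-mixing layer sits in the middle:

```python
import torch
import torch.nn as nn

class GatedSequenceMixer(nn.Module):
    """Element-wise gating wrapped around a sequence-mixing op (an SSM, a long
    convolution, linear attention, ...). The gate is computed from the input
    itself, so what gets kept from the mixed state is data dependent."""
    def __init__(self, d_model: int, seq_mixer: nn.Module):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.seq_mixer = seq_mixer            # placeholder for the SSM / conv layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model)
        mixed = self.seq_mixer(self.in_proj(x))   # mix information across time
        gate = torch.sigmoid(self.gate_proj(x))   # data-dependent element-wise gate
        return self.out_proj(gate * mixed)        # select what to pass forward

# Usage with an identity "mixer" just to show the shapes involved.
layer = GatedSequenceMixer(64, nn.Identity())
out = layer(torch.randn(2, 128, 64))              # -> (2, 128, 64)
```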
So earlier this year, there was a model called BASED, from Simran Arora and some other folks, that combined a more principled version of linear attention. The two-second summary is that it used a Taylor approximation of softmax attention, combined that with a simple sliding-window attention, and was starting to be able to expand the Pareto frontier of how much data you can recall from your sequence versus how small your recurrent state size is.[00:14:58] Dan Fu: So those orange dots [00:15:00] at the top there are just showing smaller state sizes that can recall more memory.[00:15:07] Just Read Twice[00:15:07] Dan Fu: And the last major idea that I think has been influential in this line of work, and is relatively late breaking, just a few months ago, is the basic idea that when you have these models that are fundamentally more efficient in the sequence length, you maybe don't want to prompt them or use them in exactly the same way.[00:15:26] Dan Fu: So this was a really cool paper called Just Read Twice, also from Simran, that basically said, hey, all these efficient models can process tokens so much more efficiently than transformers that they can sometimes have unfair advantages compared to a simple transformer model.[00:15:44] Dan Fu: So take, for example, the standard use case: you have some long document, you're going to pass it in as input, and then you're going to ask some question about it. One problem you might imagine for a recurrent model, where you have a fixed state size, is: let's say that [00:16:00] your article is very long, and you're trying to ask about some really niche thing.[00:16:04] Dan Fu: You can imagine it might be hard for the model to know ahead of time what information to put into the hidden state. But these models are so much more efficient that you can do something really stupid, like: you can just write down the document, write down the question, write down the document again, and then write down the question again. And this time, the second time that you go over that document, you know exactly what to look for.[00:16:25] Dan Fu: And the cool thing is that this results in better quality, especially on these recall-intensive tasks. But the other interesting thing is that it really takes advantage of the more efficient architectures that we're getting here. So one of the other, I think, influential ideas in this line of work is that if you change the fundamental compute capabilities of your model and the way that it scales, you can actually start to query it at test time differently.[00:16:51] Idea 4: Test Time Compute[00:16:51] Dan Fu: And this, of course, goes back to those slides on test-time compute. So while everybody's looking at, say, test-time compute for big transformer models, [00:17:00] I think a potentially really interesting research question is: how can you take those ideas, and how do they change with this new next generation of models?[00:17:09] Dan Fu: So I'll just briefly summarize what some of those key ideas were, and then show you briefly kind of what the state of the art is today.
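The "read it twice" trick is mostly a prompting pattern, so it is easy to sketch. The formatting below is illustrative only; it is not the exact template from the Just Read Twice paper.

```python
def just_read_twice_prompt(document: str, question: str) -> str:
    """Sketch of the 'Just Read Twice' prompting idea for recurrent models.

    Because these models are cheap in sequence length, you can afford to feed the
    document twice: on the second pass the model already knows the question, so it
    knows what to load into its fixed-size state.
    """
    return (
        f"{document}\n\n"
        f"Question: {question}\n\n"
        f"{document}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Usage example:
prompt = just_read_twice_prompt("...long article text...", "What year was the treaty signed?")
```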
So the four key ideas are: instead of just doing a simple linear attention approximation, take ideas that we know from other fields like signal processing and do a more principled approach to your modeling of the sequence.[00:17:32] Idea 2: Hardware & Kernel Support[00:17:32] Dan Fu: Another key idea throughout all these lines of work is that you really want hardware and kernel support from day one. So even if your model is theoretically more efficient, if somebody goes and runs it and it's two times slower, one of the things that we've learned is that it's just going to be dead on arrival.[00:17:49] Dan Fu: So you want to be designing your architectures with that in mind. One of the key machine learning ideas that has been important for quality is just making sure that you encode different ways that you can [00:18:00] select from your hidden state, and really focus on that as a key decider of quality. And finally, I think one of the emerging new things for this line of work, and something that's quite interesting, is: what are the right test-time paradigms for these models?[00:18:15] Dan Fu: How do they change relative to what you might do for a standard transformer? I'll briefly end this section. I've labeled this slide "where we are yesterday" because Eugene is going to talk about some new models that he released literally this morning. But as of yesterday, some of the really cool results out of these efficient alternative models were: AI21 trained this hybrid MoE called Jamba,[00:18:40] Dan Fu: which currently seems to be the state of the art for these non-transformer architectures. NVIDIA and MIT put out this new diffusion model called SANA recently, and one of their key observations is that you can take a standard diffusion transformer, replace the layers with linear [00:19:00] attention, and that lets you scale to much larger images, much larger sequences, more efficiently.[00:19:07] Dan Fu: And one thing that I don't think anybody would have called a few years ago is that one of those gated SSMs, gated state space models, ended up on the cover of Science, because a great group of folks went and trained some DNA models. So that's Michael Poli and Eric Nguyen from Stanford and the Arc Institute.[00:19:26] Dan Fu: So we're really at an exciting time in 2024, where these non-transformer, post-transformer architectures are showing promise across a wide range of modalities, of applications, and of tasks. And with that, I'll pass it on to Eugene, who can tell you a little bit about the latest and greatest with RWKV.[00:19:49] RWKV vs SSMs[00:19:49] Eugene Cheah: So, is that on? Yeah. You're talking into here? Oh, I'm talking into here. Okay. So, yeah, two streams. So, I think one common question that we tend to get asked is: what's the difference between [00:20:00] RWKV and state space? So I think one of the key things to really understand about the difference between the two groups is that we are actually more of an open source, random-internet-meets-academia kind of situation.[00:20:11] Eugene Cheah: Like, most of us never wrote any paper, but we basically looked at RNNs and linear attention when "Attention Is All You Need" came out, and then we decided, hey, there is a quadratic scaling problem, why don't we try fixing that instead?
So we ended up developing our own branch, but we end up sharing ideas back and forth.[00:20:30] Eugene Cheah: And we do all this actively in Discord, GitHub, etc. This was so bad for a few years that, basically, the average group's h-index was so close to zero that EleutherAI actually came in and helped us write our first paper. Great, now our h-index is three, apparently. But the thing is, a lot of these experiments led to results, and essentially we took the same ideas from linear attention, [00:21:00] and we built on them.[00:21:01] Eugene Cheah: So, to take a step back into how RWKV handles its own attention mechanic and achieves the same goals of, like, O(n) compute, respectively, in focus of our overall goal to make AI accessible to everyone, regardless of language, nation, or compute: that's our goal. We actually train our models primarily on over a hundred languages, which is another topic altogether.[00:21:23] Eugene Cheah: And our goal is to train up to even 200 languages, to cover all languages in the world. But at the same time, we work on this architecture to lower the compute cost so that people can run it on Raspberry Pis and on anything. So, how did RWKV break the dependency of LSTM token flow? Because I think to understand the architecture, it's probably easier to understand it from the RNN lens.[00:21:46] Eugene Cheah: Because that's where we built on. State space kind of tried to start anew and take lessons from that, so there's a little bit of divergence there. And, AKA, this is our version of linear attention. So to take a step back: [00:22:00] all foundation models, be they transformers or non-transformers, at a very high level, right?[00:22:05] Eugene Cheah: They pump in the tokens, I mean text, turn things into embeddings, and go through a lot of layers. They generate a lot of states, whether that's the QKV cache, or RNN states, or RWKV states. And they output an embedding at the end. And we just take more layers and more embeddings, and somehow that magically works.[00:22:23] Eugene Cheah: So, if you remember your ancient RNN lessons, the general idea is that you have the embedding information flowing all the way up, and you take that information and flow it back down, and then you process it as part of your LSTM layers.[00:22:41] Eugene Cheah: So, this is how it generally works. Karpathy is quoted as saying that RNNs are actually unreasonably effective. The problem is that this is not scalable. To start doing work on the second token, you need to wait for the first token. And likewise for the third token and fourth token, yada yada.[00:22:55] Eugene Cheah: That is CPU land, not GPU land. So you [00:23:00] can have an H100 and you can't even use 1 percent of it. So that's kind of why RNNs didn't really take off in the direction that we wanted, like billions of parameters, when it comes to training. So, what did RWKV version 0 do? Boom. We just did the dumbest, lamest thing.[00:23:13] Eugene Cheah: Sorry, this is the bottleneck for the RNN. We did the dumb thing of removing that line. And it kind of worked. It trained. It sucked, but it kind of worked. Then no one cared because the loss was crap, but we were like, hey, how do we improve that?
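A toy contrast of the two flows described above: the classic per-token RNN loop versus a mixer that no longer needs the previous token's output from the same layer. Both functions and their arguments are hypothetical stand-ins, not RWKV code.

```python
import torch

def sequential_rnn(x, cell, h0):
    """Classic RNN flow: token t needs h[t-1] from the same layer, so work is serial."""
    h, outs = h0, []
    for t in range(x.shape[1]):           # one token at a time: CPU land, not GPU land
        h = cell(x[:, t], h)
        outs.append(h)
    return torch.stack(outs, dim=1)

def layerwise_parallel(x, mix_layer):
    """What removing that dependency buys: every token in a layer is computed at once,
    and compute cascades layer by layer until the GPU is saturated. `mix_layer` stands
    in for a token-shift / time-mix style operator that only needs the layer's inputs."""
    return mix_layer(x)                    # one parallel call over the whole sequence
```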
And that's essentially how we moved forward, because if you see this kind of flow, you can actually get your GPU saturated quickly, where it essentially cascades respectively.[00:23:41] Eugene Cheah: So I'm just waiting for this to loop again. Once your first layer's token is computed and finished, you start to cascade your compute all the way until you're at, hey, I'm using 100 percent of the GPU. So we worked on it, and we started going along the principle that, as long as we keep this general architecture [00:24:00] where we can cascade and be highly efficient, nothing is sacred in our architecture.[00:24:06] Eugene Cheah: And we have done some crazy ideas. In fact, if you ask me to explain some things in the paper, officially in the paper I'll say we had this idea and we wrote it this way. The reality is someone came with the code, we tested it, it worked, and then we rationalized later. So, the general[00:24:24] RWKV Arch[00:24:24] Eugene Cheah: idea behind RWKV is that we generally have two major blocks.[00:24:30] Eugene Cheah: We call them time mix and channel mix. Time mix generally handles long-term memory states, where essentially we apply matrix multiplications and SiLU activation functions to process an input embedding into an output embedding. I'm oversimplifying it, because this calculation has changed every version, and we have, like, version 7 right now.[00:24:50] Eugene Cheah: Channel mix is similar to BASED in the sense that it does shorter-term attention: it just looks at the sister token, the token before it, because [00:25:00] there's a shift in the token shift matrix. I don't really want to go too much into the papers themselves, because we do have three papers on this.[00:25:09] Eugene Cheah: Basically: "RWKV: Reinventing RNNs for the Transformer Era"; "Eagle and Finch", the matrix-valued-state papers, which are the updated version 5 and version 6; and Goldfinch is our hybrid model, respectively. We are already writing the paper for version 7, RWKV-7, named Goose; our architectures are named after birds.[00:25:30] Eugene Cheah: And I'm going to cover as well QRWKV, and Mamba, and RWKV, and where did that lead to? Great! Because we are all GPU poor, and to be clear, most of this research is done on only a handful of H100s, which one Google researcher told me was, like, the experiment budget for a single researcher.[00:25:48] Eugene Cheah: So our entire organization has less compute than a single researcher at Google. So one of the things that we explored was: how do we convert transformer models instead? Because [00:26:00] someone already paid that billion dollars, that million dollars, for training, so why don't we take advantage of those weights?[00:26:05] Eugene Cheah: And I believe Together AI worked on LoLCATs for the Llama side of things, and we took some ideas from there as well, and we essentially did that for RWKV.[00:26:15] QRWKV6 launch[00:26:15] Eugene Cheah: And that led to QRWKV6, which we just dropped today: a 32B instruct preview model, where we took the Qwen 32B instruct model, froze the feedforward layer, removed the QKV attention layer, and replaced it with RWKV linear layers.[00:26:32] Eugene Cheah: So to be clear, this means we do not have the RWKV channel mix layer; we only have the time mix layer.
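For intuition on the "token shift" that the channel mix block uses, here is a simplified, hypothetical PyTorch sketch; the real RWKV blocks differ in detail and change between versions, so treat this as an illustration of the idea rather than the released code.

```python
import torch
import torch.nn as nn

class ChannelMixSketch(nn.Module):
    """Rough sketch of an RWKV-style channel-mix block with a token shift."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.mix = nn.Parameter(torch.full((d_model,), 0.5))      # per-channel shift mixing
        self.key = nn.Linear(d_model, d_hidden, bias=False)
        self.value = nn.Linear(d_hidden, d_model, bias=False)
        self.receptance = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:           # x: (batch, seq_len, d_model)
        # Token shift: each position also sees the previous token's embedding.
        prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        xm = x * self.mix + prev * (1 - self.mix)
        k = torch.relu(self.key(xm)) ** 2                          # squared-ReLU feed-forward
        return torch.sigmoid(self.receptance(xm)) * self.value(k)  # gated output

# Usage example:
y = ChannelMixSketch(d_model=64, d_hidden=256)(torch.randn(2, 16, 64))
```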
But once we do that, we train the RWKV layer. What's important is that the feedforward layer needs to be frozen, so the new attention can be learned. And then we unfreeze the feedforward layer and train all the layers together with a custom learning rate schedule, so that they can learn how to work together.[00:26:54] Eugene Cheah: The end result, surprisingly, and, to be honest, to the frustration of the RWKV [00:27:00] MoE team, which ended up releasing their model on the same day, was that, with just a few hours of training on two nodes, we managed to get it to be on par, kind of, with the original Qwen 32B model. In fact, the first run completely confused us. I was telling Daniel Goldstein (Smerky), who kind of leads most of our research coordination: when you pitched me this idea, you told me at best you'd get the same level of performance.[00:27:26] Eugene Cheah: You didn't tell me the ARC Challenge score and the Winogrande score would shoot up. I don't know what's happening there. But it did. The MMLU score dropping, that was expected. Because if you think about it, when we were training all the layers, we essentially Frankensteined this thing, and we did brain damage to the feedforward network layer, too, with the new RWKV layers.[00:27:47] Eugene Cheah: But 76 percent, hey, somehow it's retained, and we can probably further train this. We didn't even spend more than 3 days training this, so there's a lot more that can be done, hence the preview. This brings up [00:28:00] a big question, because we are already now in the process of converting the 70B. This is actually an extremely compute-efficient way to test our attention mechanic.[00:28:10] Eugene Cheah: It becomes a shortcut. We are already planning to do our version 7 and our hybrid architecture with it, because we don't need to train from scratch, and we get a really good model out of it. And the other thing that is uncomfortable to say, because we are doing the 70B right now, is that if this scales correctly to 128k context length, and I'm not even talking about a million, just 128k, the majority of enterprise workloads today are just on 70B at under 32k context length.[00:28:41] Eugene Cheah: That means if this works and the benchmarks match, we can replace the vast majority of current AI workloads, unless you want super long context. And then, sorry, can someone give us more GPUs? Because we do need the VRAM for super long context, sadly. So yeah, that's what we are working on, and essentially [00:29:00] we are excited to just push this further.[00:29:02] Eugene Cheah: And this conversion process, to be clear, I don't think is going to be exclusive to RWKV. It probably will work for Mamba as well; I don't see why not. And we will probably see more ideas, more experiments, more hybrids. Yeah, one of the weirdest things that I wanted to say outright, and I confirmed this with the Black Mamba team and the Jamba team, because we did the Goldfinch hybrid model, is that none of us understands why a hybrid of a state-based model, be it RWKV or state space, and a transformer performs better than the baseline of both.[00:29:28] Eugene Cheah: It's like, when you train one and then you replace parts of it, you expect the same results. That's our pitch. That's our claim. But somehow when we jam both together, it outperforms both.
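Here is a hedged sketch of the two-stage conversion recipe described above: freeze the pretrained weights, swap attention for the new mixer, and train only the new layers first. `model.blocks`, `block.attn`, and `make_linear_attn` are hypothetical names for illustration, not the actual QRWKV6 conversion code.

```python
import torch.nn as nn

def convert_attention_stage1(model: nn.Module, make_linear_attn) -> nn.Module:
    """Stage 1 of a transformer-to-linear-attention conversion (sketch only).

    Everything pretrained (embeddings, feedforward layers, norms) is frozen, the
    QKV attention in each block is replaced by a new linear-attention / time-mix
    style module, and only those new modules receive gradients. Stage 2 (not shown)
    would unfreeze the feedforward layers and train everything together with a
    custom learning-rate schedule.
    """
    for p in model.parameters():
        p.requires_grad = False                       # freeze the pretrained weights
    for block in model.blocks:                        # assumes a list of transformer blocks
        block.attn = make_linear_attn(block.attn)     # swap attention for the new mixer
        for p in block.attn.parameters():
            p.requires_grad = True                    # only the new layers train in stage 1
    return model
```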
And that's one area where, like, we only have four experiments across four teams, and a lot more needs to be done.[00:29:51] Eugene Cheah: But these are things that excite me, essentially, because that is potentially where we can move ahead. Which brings us to what comes next.[00:30:00] What's next[00:30:00] Dan Fu: So, this part is kind of just where we'll talk a little bit about stuff that we're excited about, and maybe have some wild speculation on what's coming next.[00:30:12] Dan Fu: And, of course, this is also the part that will be more open to questions. So, a couple of things that I'm excited about: continued hardware-model co-design for these models. One of the things that we've put out recently is this library called ThunderKittens. It's a CUDA library.[00:30:29] Dan Fu: And one of the things that we found frustrating is that every time we built one of these new architectures, and I'm sure you had the exact same experience, we'd have to go and spend two months in CUDA land writing these new efficient things. And if we decided to change one thing in PyTorch, like, one line of PyTorch code is like a week of CUDA code at least.[00:30:47] Dan Fu: So one of our goals with a library like ThunderKittens was to just break down what are the key principles, what are the key hardware things, what are the key compute pieces that you get from the hardware. So for example, on [00:31:00] H100, everything really revolves around a warp-group matrix multiply operation.[00:31:06] Dan Fu: So you really want your operation to be able to split into relatively small matrix-matrix multiply operations, like multiplying two 64 by 64 matrices, for example. And if you know that ahead of time, when you're designing your model, that probably gives you, you know, some information about how you set the state sizes and how you set the update function.[00:31:27] Dan Fu: So with ThunderKittens we basically built a whole library just around this basic idea that all your basic compute primitives should not be a float but a matrix, and everything should just be matrix compute. And we've been using that to try to both re-implement some existing architectures and also start to design[00:31:44] Dan Fu: some new ones that are really designed with this tensor core primitive in mind. Another thing that at least I'm excited about is that, over the last four or five years, we've really been looking at language models as the next thing. But if you've been paying [00:32:00] attention to Twitter, there's been a bunch of new next-generation models that are coming out.[00:32:04] Dan Fu: So there are video generation models that can run in real time, that are controlled by your mouse and your keyboard, and I'm told that if you play with them, they only have a few seconds of memory. Can we take that model, can we give it a very long context length, so that you could actually maybe generate an entire game state at a time? What does that look like for the model? You're certainly not going to do a giant quadratic attention computation to try to run that. Or maybe use some of these new video generation models that came out. So Sora came out, I don't know, two days ago now.
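The "everything should be a matrix" mindset can be illustrated without any CUDA. The toy PyTorch loop below is not the ThunderKittens API; it just shows a matmul decomposed into the small 64 by 64 tile multiplies that map onto tensor-core-style hardware, which is the shape of computation the library is built around.

```python
import torch

def tiled_matmul(a: torch.Tensor, b: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """Pedagogical tile-by-tile matmul: every unit of work is a small matrix multiply."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % tile == 0 and n % tile == 0 and k % tile == 0
    out = torch.zeros(m, n, dtype=a.dtype, device=a.device)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            acc = torch.zeros(tile, tile, dtype=a.dtype, device=a.device)
            for p in range(0, k, tile):
                # One tile-level matrix-multiply-accumulate, the hardware's fast path.
                acc += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
            out[i:i + tile, j:j + tile] = acc
    return out

# Usage example: matches torch.matmul up to floating-point error.
x, w = torch.randn(128, 256), torch.randn(256, 192)
assert torch.allclose(tiled_matmul(x, w), x @ w, atol=1e-4)
```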
But with super long queue times and super long generation times.[00:32:43] Dan Fu: So that's probably a quadratic attention operation at the bottom of it. What if we could remove that and get the same quality, but a lot faster generation time? Or some of the demos that we saw from Paige earlier today. You know, if I have a super long conversation with my [00:33:00] Gemini bot, what if I wanted it to remember everything that it's seen in the last week?[00:33:06] Dan Fu: I mean, maybe you don't, for personal reasons, but what if I did, you know? What does that mean for the architecture? And I think that's certainly something I'm pretty excited about. I'm sure you're excited about it too. So, I think we were supposed to have some hot takes, but I honestly don't remember what our hot takes were.[00:33:21] Hot Takes - does anyone really need long context?[00:33:21] Eugene Cheah: Yeah, including the next slide. Hot takes, yes, these are our[00:33:25] Dan Fu: hot takes.[00:33:25] Eugene Cheah: I think the big one on Twitter that we saw, that we shared, was the question: is RAG relevant, in the case of, like, the future of state space models?[00:33:38] Dan Fu: Let's see, I haven't played too much with RAG. But when I have, I'll say I found it a little bit challenging to do research on, because we had this experience over and over again where you could have an embedding model of any quality, so you could have a really, really bad embedding model, or you could have a really, really [00:34:00] good one, by any measure of good,[00:34:03] Dan Fu: and for the final RAG application, it kind of didn't matter. That's what I'll say about RAG while I'm being recorded. I know it doesn't actually answer the question, but[00:34:13] Eugene Cheah: Yeah, so I think a lot of folks are, like, extremely excited about the idea of RWKV or state space potentially having infinite context.[00:34:21] Eugene Cheah: But I think the reality is that when we say infinite context, we just mean a different kind of infinite context, or, as was previously covered, you need to test the model differently. So, think of it more along the lines of the human. Like, I don't remember what I ate for breakfast yesterday.[00:34:37] Eugene Cheah: Yeah, that's the statement that I'll say. And we humans are not quadratic transformers. If we were, let's say we increased our brain size for every second we live, we would have exploded by the time we were 5 years old or something like that. And I think, basically, fundamentally for us, regardless of whether it's RWKV, state space, xLSTM, [00:35:00] etc., our general idea is that instead of that expanding state, that increase in computational cost, what if we have a fixed state size?[00:35:08] Eugene Cheah: And information theory dictates that that fixed state size will have a limit. Just how big of a limit is a question. Like, RWKV is running at 40 megabytes for its state. Its future version might run at 400 megabytes. That is like millions of tokens, if you're talking about the mathematical maximum possibility.[00:35:29] Eugene Cheah: It's just that I guess we were all more inefficient about it, so maybe we hit 100,000. And that's kind of the work we are doing, trying to push it and maximize it. And that's where the models will start differing, because they will choose to forget things, they will choose to remember things.
And that's why I think there might be some element of RAG, but it may not be the same RAG.[00:35:49] Eugene Cheah: It may be that the model learns things, and it's like, hmm, I can't remember that article. Let me do a database search. Just like us humans: when we can't remember an article in the company, we do a search on Notion. [00:36:00][00:36:00] Dan Fu: I think something that would be really interesting is if you could have facts that are... So right now, the one intuition about language models is that all those parameters are around just to store random facts about the world.[00:36:14] Dan Fu: And this intuition comes from the observation that if you take a really small language model, it can do things like talk to you, or it can learn the style of a conversation, but where it will usually fall over compared to a much larger one is that it'll just be a lot less factual about things that it knows or that it can do.[00:36:32] Dan Fu: But that points to all those weights we're spending, all that SGD we're spending to train these models, just being used to store facts. And we have things like databases that are pretty good at storing facts. So I think one thing that would be really interesting is if we could actually have some sort of outside data store that a language model can look at, that maybe, you know, has some sort of gradient descent in it, but it would be quite interesting.[00:36:58] Dan Fu: And then maybe you could edit it, delete [00:37:00] facts, you know, change who's president, so that it doesn't get lost.[00:37:04] Vibhu: Can we open up Q&A and hot takes for the audience? I have a hot take Q&A. Do these scale? When is the 405B state space model coming? RAG exists, no one does long context, who's throwing in 2-million-token questions? Hot takes?[00:37:24] Dan Fu: The "who's throwing in 2 million token questions", I think, is a really good question. So actually, I was going to offer that as a hot take. I mean, my hot take was going to be that long context doesn't matter. I know I just gave a whole talk about it, but, you know, what's the point of doing research if you can't play both sides?[00:37:40] Dan Fu: But I think, for both of us, the reason that we first got into this was just from the first-principles question of: there's this quadratic thing; clearly intelligence doesn't need to be quadratic; what is going on; can we understand it better? You know, since then it's kind of turned into a race, which has [00:38:00] been exciting to watch, like, how much context you can take in.[00:38:03] Dan Fu: But I think it's right: nobody is actually putting a two-million-token prompt into these models. And, you know, if they are, maybe we can go design a better model to do that particular thing. Yeah, what do you think about that? You've also been working on this. Do you think long context matters?[00:38:19] Eugene Cheah: So I'm going to burn a bit. How many of you remember the news of Google Gemini supporting 3 million context, right? Raise your hand.[00:38:28] Vibhu: Yeah, 2 million.[00:38:29] Eugene Cheah: Oh, it's 2 million.[00:38:31] Eugene Cheah: Yeah, how many of you actually tried that? See?[00:38:34] Vibhu: I use it a lot. You? You work for MindsTV.
I use it a lot.[00:38:41] Eugene Cheah: So, some people have used it, and I think that might be where my opinion starts to differ, because I think the big labs may have a bigger role in this. Like, even for RWKV, even when we train long context, the reason why I say VRAM is a problem is that when we need to backprop [00:39:00] against the states, we actually need to maintain the state in between the tokens, by the token length.[00:39:05] Eugene Cheah: So that means we need to actually roll out the whole 1 million context if we are actually training on 1 million. Which is the same for transformers, actually, but it just means we don't magically reduce the VRAM consumption at training time. So that is one of the VRAM bottlenecks, and I'm neither OpenAI nor Google, so donate GPUs if you have too many of them.[00:39:27] Eugene Cheah: But then, putting it back to another paradigm: I think O1-style reasoning might actually be pushing that direction downwards. In my opinion, and this is my partial hot take: let's say you have a super big model, and let's say you have a 70B model that may take double the tokens but gets the same result.[00:39:51] Eugene Cheah: Strictly speaking, the 70B, and this is for transformer or non-transformer alike, will take less resources than that 400B [00:40:00] model, even if it did double the amount of thinking. And if that's the case, and we are still all trying to figure this out, maybe the direction for us is really getting the sub-200B models to be as fast and efficient as possible,[00:40:11] Eugene Cheah: with a very efficient architecture that some folks happen to be working on, to just reason it out over larger and larger contexts.[00:40:20] Question: Yeah. One thing I'm super interested in is models that can watch forever. Obviously you cannot train something on infinite context length. How are y'all thinking about that, where you run on a much longer context length than is possible to train on?[00:40:38] Dan Fu: Yeah, it's a great question. I think you guys probably had tweets along these lines, too. When we first started doing these things, because these are all recurrent models, in theory you could just run them forever. And at the very least they won't error out on you or crash.[00:40:57] Dan Fu: There's another question of whether they can actually [00:41:00] use what they've seen in that infinite context. And I think one place where the research on architectures probably ran faster than another area of research is the benchmarks for long context. So you turn it on forever; you want to do everything or watch everything.[00:41:16] Dan Fu: What is it that you actually wanted to do? Can we actually build some benchmarks for that? Then measure what's happening, and then ask the question: can the models do it? Is there something else that they need? Yeah, I think if I were to turn back the clock to 2022, that's probably one of the things I would have done differently: actually get some long-context benchmarks out at the same time as we started pushing context length on all these models.[00:41:41] Eugene Cheah: I will also say: the use case. So, I think we both agree that there's no infinite memory, and the model needs to be able to learn and decide.
I think what we have observed, and I think this also fits the state space models, is that one of the key advantages of this alternate attention mechanic that is not based on token position is that the model doesn't suddenly become crazy when you go past the [00:42:00] 8k training context length, or a million context length.[00:42:03] Eugene Cheah: It's actually still stable. It's still able to run, it's still able to rationalize. It just starts forgetting things. But some of these things are still there in latent memory. Some of these things are still somewhat there. That's the whole point of why reading twice works, things like that. And one of the biggest pushes in this direction is that I think both state space and RWKV have separate papers by other researchers where they use this architecture for time series data.[00:42:26] Eugene Cheah: Weather modeling. So you are not asking what the weather was five days ago; you're asking what the weather will be tomorrow, based on the effectively infinite length of data that keeps accumulating as long as this Earth and the computer keep running. And they found that it is better than existing transformers or existing architectures at modeling this weather data,[00:42:47] Eugene Cheah: controlled for the param size and stuff. I'm quite sure there are people with larger models. So there are future applications here, if your question is just what's next and not what was 10 years ago.[00:42:59] Dan Fu: Thanks so [00:43:00] much for having us. Get full access to Latent Space at www.latent.space/subscribe
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all our LS supporters who helped fund the venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (which we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap the 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.Since Nathan Lambert (Interconnects) joined us for the hit RLHF 201 episode at the start of this year, it is hard to overstate how much Open Models have exploded this past year. In 2023 only five names were playing in the top LLM ranks: Mistral, Mosaic's MPT, TII UAE's Falcon, Yi from Kai-Fu Lee's 01.ai, and of course Meta's Llama 1 and 2. This year a whole cast of new open models have burst on the scene, from Google's Gemma and Cohere's Command R, to Alibaba's Qwen and DeepSeek models, to LLM 360 and DCLM, and of course to the Allen Institute's OLMo, OLMoE, Pixmo, Molmo, and OLMo 2 models. We were honored to host Luca Soldaini, one of the research leads on the OLMo series of models at AI2.Pursuing Open Model research comes with a lot of challenges beyond just funding and access to GPUs and datasets, particularly the regulatory debates this year across Europe, California and the White House. We were also honored to hear from Sophia Yang, head of devrel at Mistral, who also presented a great session at the AI Engineer World's Fair Open Models track!Full Talk on YouTubePlease like and subscribe!Timestamps* 00:00 Welcome to Latent Space Live * 00:12 Recap of 2024: Best Moments and Keynotes * 01:22 Explosive Growth of Open Models in 2024 * 02:04 Challenges in Open Model Research * 02:38 Keynote by Luca Soldaini: State of Open Models * 07:23 Significance of Open Source AI Licenses * 11:31 Research Constraints and Compute Challenges * 13:46 Fully Open Models: A New Trend * 27:46 Mistral's Journey and Innovations * 32:57 Interactive Demo: Le Chat Capabilities * 36:50 Closing Remarks and NetworkingTranscriptSession3Audio[00:00:00] AI Charlie: Welcome to Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co-host. As a special treat this week, we're recapping the best of 2024, going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the Latent Space network to cover each field.[00:00:28] AI Charlie: 200 of you joined us in person throughout the day, with over 2,200 watching live online. Our next keynote covers the state of open models in 2024, with Luca Soldaini and Nathan Lambert of the Allen Institute for AI, with a special appearance from Dr. Sophia Yang of Mistral. Our first hit episode of 2024 was with Nathan Lambert on RLHF 201 back in January,[00:00:57] AI Charlie: where he discussed both reinforcement learning for language [00:01:00] models and the growing post-training and mid-training stack, with hot takes on everything from constitutional AI to DPO to rejection sampling, and also previewed the sea change coming to the Allen Institute.
And to Interconnects, his incredible Substack on the technical aspects of state-of-the-art AI training.[00:01:18] AI Charlie: We highly recommend subscribing to get access to his Discord as well. It is hard to overstate how much open models have exploded this past year. In 2023, only five names were playing in the top LLM ranks: Mistral, Mosaic's MPT, TII UAE's Falcon, Yi from Kai-Fu Lee's 01.ai, and of course, Meta's Llama 1 and 2.[00:01:43] AI Charlie: This year, a whole cast of new open models have burst on the scene, from Google's Gemma and Cohere's Command R, to Alibaba's Qwen and DeepSeek models, to LLM360 and DCLM, and of course, to the Allen Institute's OLMo, [00:02:00] OLMoE, Pixmo, Molmo, and OLMo 2 models. Pursuing open model research comes with a lot of challenges beyond just funding and access to GPUs and datasets, particularly the regulatory debates this year across Europe,[00:02:14] AI Charlie: California, and the White House. We also were honored to hear from Mistral, who also presented a great session at the AI Engineer World's Fair Open Models track. As always, don't forget to check the show notes for the YouTube link to their talk, as well as their slides. Watch out and take care.[00:02:35] Luca Intro[00:02:35] Luca Soldaini: Cool. Yeah, thanks for having me over. I'm Luca. I'm a research scientist at the Allen Institute for AI. I threw together a few slides as sort of a recap of interesting themes in open models for 2024. I have maybe 20, 25 minutes of slides, and then we can chat if there are any questions.[00:02:57] Luca Soldaini: If I can advance to the next slide. [00:03:00] Okay, cool. So I did a quick check to get a sense of how much 2024 was different from 2023. I went on Hugging Face and tried to get a picture of what kind of models were released in 2023 and what we got in 2024.[00:03:16] Luca Soldaini: In 2023 we got things like both Llama 1 and 2, we got Mistral, we got MPT, Falcon models, and I think the Yi model came in at the tail end of the year. It was a pretty good year. But then I did the same for 2024, and it's actually a quite stark difference. You have models that are, you know, rivaling frontier-level[00:03:38] Luca Soldaini: performance of what you can get from closed models, from, like, Qwen, from DeepSeek. We got Llama 3. We got all sorts of different models. I added our own Olmo at the bottom. There's this growing group of fully open models that I'm going to touch on a little bit later. But you know, just looking at the slides, it feels like 2024 [00:04:00] was just smooth sailing, much better than the previous year.[00:04:04] Luca Soldaini: And you know, you can pick your favorite benchmark, or least favorite, I don't know, depending on what point you're trying to make, and plot your closed model and your open model, and sort of spin it in ways that show that, oh, you know, open models are much closer to where closed models are today, versus last year, where the gap was fairly significant.[00:04:29] Luca Soldaini: So one thing that, I don't know if I have to convince people in this room, but usually when I give these talks about open models, there is always this background question in people's minds of: why should we use open models? The API argument, you know, is that it's
just an HTTP request to get output from one of the best models out there.[00:04:53] Luca Soldaini: Why do I have to set up infra and use local models? And there are really two answers. There is the more [00:05:00] researchy answer, which is where my background lies, which is just research. If you want to do research on language models, research thrives on open models. There is a large swath of research on modeling, on how these models behave, on evaluation, on inference, on mechanistic interpretability that could not happen at all if you didn't have open models. And for AI builders, there are also good use cases for using local models.[00:05:30] Luca Soldaini: You know, this is a very non-comprehensive slide, but you have things like: there are some applications where local models just blow closed models out of the water. Retrieval is a very clear example. We might have constraints like edge AI applications where it makes sense.[00:05:51] Luca Soldaini: But even just in terms of stability, being able to say this model is not changing under the hood, there are plenty of good cases for [00:06:00] open models. And the community is not just models. I stole this slide from one of the Qwen 2 announcement blog posts, but it's super cool to see how much tech exists around open models: serving them, making them efficient, hosting them.[00:06:18] Luca Soldaini: It's pretty cool. And if you think about where the term "open" comes from, it comes from open source, and open models really meet the core tenets of open source, specifically when it comes to collaboration. There is truly a spirit that, through these open models, you can build on top of other people's innovation. We see a lot of this even in our own work: as we iterate on the various versions of Olmo, it's not like every time we collect all the data from scratch. No, the first step is, okay, what are the cool data sources and datasets people have put [00:07:00] together for language model training?[00:07:01] Luca Soldaini: Or when it comes to our post-training pipeline, one of the steps is you want to do some DPO, and you use a lot of outputs of other models to improve your preference model. So having an open ecosystem really benefits and accelerates the development of open models.[00:07:23] The Definition of Open Models[00:07:23] Luca Soldaini: One thing that we got in 2024, which is not a specific model but I thought was really significant, is that we got our first open source AI definition. This is from the Open Source Initiative; they've generally been the steward of a lot of the open source licenses when it comes to software, and so they embarked on this journey of trying to figure out, okay, what does an open source license for a model look like?[00:07:52] Luca Soldaini: The majority of the work is very dry, because licenses are dry, so I'm not going to walk through the license step by [00:08:00] step. I'm just going to pick out one aspect that is very good, and then one aspect that personally feels like it needs improvement. On the good side: this open source AI definition is actually[00:08:13] Luca Soldaini: very intuitive.
If you ever built open source software and you have some expectations around what open source looks like for software, this definition for AI sort of matches your intuition. So: the weights need to be freely available, the code must be released with an open source license, and there shouldn't be license clauses that block specific use cases.[00:08:39] Luca Soldaini: So, under this definition, for example, Llama or some of the Qwen models are not open source, because the license says you can't use this model for this, or it says if you use this model you have to name the output this way, or a derivative needs to be named that way. Those clauses don't meet the open source [00:09:00] definition, and so they will not be covered.[00:09:02] Luca Soldaini: The Llama license will not be covered under the open source definition. It's not perfect. One of the things that, internally, in discussion with OSI, we were sort of disappointed by is the language around data. You might imagine that an open source AI model means a model where the data is freely available.[00:09:26] Luca Soldaini: There were discussions around that, but at the end of the day, they decided to go with a softened stance, where they say a model is open source if you provide sufficiently detailed information on how to replicate the data pipeline, so you have an equivalent system. "Sufficiently detailed"[00:09:46] Luca Soldaini: is very fuzzy. I don't like that. "An equivalent system" is also very fuzzy. And this doesn't take into account the accessibility of the process, right? It might be that you provide enough [00:10:00] information, but this process costs, I don't know, 10 million dollars to do. Now, the open source definition, like any open source license, has never been about accessibility, so how accessible software is has never been a factor in open source software.[00:10:14] Luca Soldaini: I can make a piece of open source, put it on my hard drive, and never access it. That software is still open source; the fact that it's not widely distributed doesn't change the license. But practically, there are expectations of what we want good open source to be. So it's kind of sad to see that the data component in this license is not as open as some of us would like it to be.[00:10:40] Challenges for Open Models[00:10:40] Luca Soldaini: And I linked a blog post that Nathan wrote on the topic, which is less rambly and easier to follow. One thing that, in general, I think it's fair to say about the state of open models in 2024 is that we know a lot more than what we knew in [00:11:00] 2023, both on the training data, the pre-training data you curate, and on how to do all the post-training, especially on the RL side.[00:11:10] Luca Soldaini: You know, 2023 was a lot of throwing random darts at the board. In 2024, we have clear recipes that, okay, don't get the same results as a closed lab, because there is a cost in actually matching what they do, but at least we have a good sense of, okay, this is the path to get a state-of-the-art language model.[00:11:31] Luca Soldaini: I think one thing that is a downside of 2024 is that we are more research-constrained than in 2023. It feels that, you know, the barrier for compute that you need to move innovation along has just been rising and rising.
So if you go back to this slide, there is now this cluster of models that are sort of released by the[00:11:57] Luca Soldaini: compute-rich club. Membership is [00:12:00] hotly debated. You know, some people don't want to be called rich, because it comes with expectations. Some people want to be called rich. I don't know, there's debate, but these are players that have, you know, 10,000, 50,000 GPUs at minimum. And so they can do a lot of work and a lot of exploration in improving models that is not very accessible.[00:12:21] Luca Soldaini: To give you a sense of how I personally think about the research budget for each part of the language model pipeline: on the pre-training side, you can maybe do something with a thousand GPUs; really, you want 10,000. And if you want real state of the art, you know, your DeepSeek minimum is like 50,000, and you can scale to infinity.[00:12:44] Luca Soldaini: The more you have, the better it gets. Everyone on that side still complains that they don't have enough GPUs. Post-training is a super wide spectrum. You can do it with as little as, like, eight GPUs; as long as you're able to [00:13:00] run, you know, a good version of, say, a Llama model, you can do a lot of work there.[00:13:05] Luca Soldaini: You can scale a lot of the methodology; it just scales with compute, right? If you're interested in, you know, your open replication of what OpenAI's O1 is, you're going to be on the 10K spectrum of GPUs. Inference, you can do a lot with very few resources. Evaluation, you can do a lot with, well, I should say at least one GPU, if you want to evaluate[00:13:30] Luca Soldaini: open models. But in general, if you care a lot about interventions to do on these models, which is my preferred area of research, then, you know, the resources that you need are quite significant. Yeah. One other trend that has emerged in 2024 is this cluster of fully open models.[00:13:54] Luca Soldaini: So Olmo, the model that we built at AI2, is one of them, and you know, it's nice [00:14:00] that it's not just us. There's a cluster of other, mostly research, efforts who are working on this. So it's good to give you a primer of what fully open means. The easy way to think about it is: instead of just releasing a model checkpoint that you run, you release a full recipe, so that other people working in[00:14:24] Luca Soldaini: that space can pick and choose whatever they want from your recipe and create their own model, or improve on top of your model. You're giving out the full pipeline and all the details there, instead of just the end output. So I pulled up the screenshot from our recent MoE model.[00:14:43] Luca Soldaini: And for this model, for example, we released the model itself, the data that it was trained on, the code for both training and inference, all the logs that we got through the training run, as well as every intermediate checkpoint. And the fact that you release different parts of the pipeline [00:15:00] allows others to do really cool things.[00:15:02] Luca Soldaini: So for example, this tweet from early this year, from folks at Nous Research: they used our pre-training data to do a replication of the BitNet paper in the open. So they took just the initial part of the pipeline and then built their thing on top of it.
It goes both ways.[00:15:21] Luca Soldaini: So for example, for the Olmo 2 model, a lot of our pre-training data for the first stage of pre-training was from this DCLM initiative that was led by folks at a variety of institutions. It was a really nice group effort. And for us it was nice to be able to say, okay, you know, the state of the art in terms of what is done in the open has improved;[00:15:46] AI2 Models - Olmo, Molmo, Pixmo etc[00:15:46] Luca Soldaini: we don't have to do all this work from scratch to catch up to the state of the art. We can just take it directly, integrate it, and do our own improvements on top of that. I'm going to spend a few minutes doing a [00:16:00] shameless plug for some of our fully open recipes. So indulge me in this.[00:16:05] Luca Soldaini: A few things that we released this year: as I was mentioning, there's the OLMoE model, which I think is still the state-of-the-art MoE model in its size class. And it's also fully open, so every component of this model is available. We released a multimodal model called Molmo. Molmo is not just a model; it's a full recipe of how you go from a text-only model to a multimodal model, and we applied this recipe on top of Qwen checkpoints, on top of Olmo checkpoints, as well as on top of OLMoE.[00:16:37] Luca Soldaini: And I think there's been a replication doing that on top of Mistral as well. On the post-training side, we recently released Tulu 3. Same story: this is a recipe on how you go from a base model to a state-of-the-art post-trained model. We used the Tulu recipe on top of Olmo, on top of Llama, and then there's been an open replication effort [00:17:00] to do that on top of Qwen as well.[00:17:02] Luca Soldaini: It's really nice to see that, when your recipe is kind of turnkey, you can apply it to different models and it kind of just works. And finally, the last thing we released this year was Olmo 2, which so far is the best state-of-the-art fully open language model. It sort of combines aspects from all three of these previous models:[00:17:22] Luca Soldaini: what we learned on the data side from OLMoE, and what we learned about making models that are easy to adapt from the Molmo project and the Tulu project. I will close with a little bit of reflection on the ways this ecosystem of open models is not all roses. It's not all happy. It feels like, day to day, it's always in peril.[00:17:44] Luca Soldaini: And, you know, I talked a little bit about the compute issues that come with it. But it's really not just compute. One thing that is on top of my mind is that, due to the environment and, you know, growing feelings about how AI is treated, [00:18:00] it's actually harder to get access to a lot of the data that was used to train a lot of the models up to last year.[00:18:06] Luca Soldaini: This is a screenshot from really fabulous work from Shane Longpre, who I think is in Europe, about the diminishing access to data for language model pre-training. So what they did is they went through every snapshot of Common Crawl. Common Crawl is this publicly available scrape of a subset of the internet.[00:18:29] Luca Soldaini: And they looked at, for any given website, whether a website that was accessible in, say, 2017 was still accessible or not in 2024.
And what they found is that, as a reaction to the existence of closed models like OpenAI's ChatGPT or Claude, a lot of content owners have blanket-blocked any type of crawling of their websites.[00:18:57] Luca Soldaini: And this is something that we see also internally at [00:19:00] AI2. One project that we started this year is we wanted to understand: if you're a good citizen of the internet, and you crawl following the norms and policies that have been established in the last 25 years, what can you crawl?[00:19:17] Luca Soldaini: And we found that there are a lot of websites where the norms of how you express your preference of whether to allow crawling of your data or not are broken. A lot of people will block a lot of crawling but do not advertise that in robots.txt. You can only tell that they're blocking you from crawling when you try doing it.[00:19:37] Luca Soldaini: Sometimes you can't even crawl the robots.txt to check whether you're allowed or not. And then for a lot of websites, there are all these technologies that historically have existed to make serving websites easier, such as Cloudflare or DNS, and they're now being repurposed for blocking AI or any type of crawling [00:20:00] in a way that is very opaque to the content owners themselves.[00:20:04] Luca Soldaini: So, you know, you go to these websites, you try to access them, and they're not available, and you get a feeling like, oh, something changed on the DNS side that is blocking this, and likely the content owner has no idea. They're just using Cloudflare for better, you know, load balancing.[00:20:25] Luca Soldaini: And this is something that was sort of sprung on them with very little notice. And I think the problem is that this blocking really impacts people in different ways. It disproportionately helps companies that have a head start, which are usually the closed labs, and it hurts incoming newcomer players, who either have to now do things in a sketchy way, or are never going to get that content that the closed labs might have.[00:20:54] Luca Soldaini: So there's been a lot of coverage. I'm going to plug Nathan's blog post again; [00:21:00] I think the title of this one is very succinct, which is: before thinking about running out of training data, we're actually running out of open training data. And so if we want better open models, this should be on top of our minds.[00:21:13] Regulation and Lobbying[00:21:13] Luca Soldaini: The other thing that has emerged is that there are strong lobbying efforts to define any kind of AI as new and extremely risky. And I want to be precise here. The problem is not considering the risks of this technology; every technology has risks that should always be considered.[00:21:37] Luca Soldaini: The thing that, to me, sorry, is disingenuous, is just putting this AI on a pedestal and calling it an unknown alien technology that has new and undiscovered potential to destroy humanity, when in reality all the dangers, I think, are rooted in [00:22:00] dangers that we know from the existing software industry, or existing issues that come with using software in a lot of sensitive domains, like medical areas.[00:22:13] Luca Soldaini: And I've also noticed a lot of efforts that have actually been going on in trying to make these open models safe.
I pasted one example here from AI2, but there's actually a lot of work that has been going on on, okay, if you're distributing this model[00:22:31] Luca Soldaini: openly, how do you make it safe? What's the right balance between the accessibility of open models and safety? And then there's also the annoying brushing under the rug of concerns that are then proved to be unfounded. You know, if you remember the beginning of this year, it was all about the bio-risk of these open models.[00:22:48] Luca Soldaini: The whole thing fizzled because, finally, there's been rigorous research, not just this paper from the Cohere folks, but rigorous research showing [00:23:00] that this is really not a concern that we should be worried about. Again, there is a lot of dangerous use of AI applications, but this one was just a lobbying ploy to make things sound scarier than they actually are.[00:23:15] Luca Soldaini: So, I've got to preface this part by saying this is my personal opinion, not my employer's, but I look at things like SB 1047 from California, and I think we kind of dodged a bullet on this legislation. You know, the open source community, a lot of the community, came together at the last minute and did a very good effort trying to explain all the negative impacts of this bill.[00:23:43] Luca Soldaini: But I feel like there's a lot of excitement about building these open models, or researching these open models, and lobbying is not sexy, it's kind of boring, but it's sort of necessary to make sure that this ecosystem can really [00:24:00] thrive. At the end of the presentation I have some links and emails, the sort of standard thing, in case anyone wants to reach out, and if folks have questions or anything they want to discuss.[00:24:13] Luca Soldaini: Is there an open floor? I think we have Sophia,[00:24:16] swyx: who wants to, well, one very important open model that we haven't covered is Mistral, so we'll ask her on this slide. Yeah, yeah. Well, it's nice to have the Mistral person recap the year in Mistral. But while Sophia gets set up, does anyone have just thoughts or questions about the progress in this space?[00:24:32] Questions - Incentive Alignment[00:24:32] swyx: Do you always have questions?[00:24:34] Question: I'm very curious how we should build incentives to build open models, things like Francois Chollet's ArcPrize, and other initiatives like that. What is your opinion on how we should better align incentives in the community so that open models stay open?[00:24:49] Luca Soldaini: The incentive bit is, like, really hard.[00:24:51] Luca Soldaini: It's something that we actually think a lot about internally, because building open models is risky. [00:25:00] It's very expensive. And so people don't want to take risky bets. I think challenges like the ones you mention are very valid approaches to it.[00:25:13] Luca Soldaini: And then I think, in general, for any kind of effort to participate in those challenges, if we can promote doing that on top of open models and really lean into this multiplier effect, I think that is a good way to go. It would help if there were more money for[00:25:35] Luca Soldaini: efforts like research efforts around open models.
I think there's a lot of investment in companies that at the moment are releasing their models in the open, which is really cool. But it's usually more because of commercial interest than wanting to support open models in the long term. It's a really hard problem, because I think everyone is operating sort of [00:26:00] at...[00:26:01] Luca Soldaini: Everyone is at their local maximum, right? In ways that really optimize their position in the market. The global maximum is harder to achieve.[00:26:11] Question2: Can I ask one question?[00:26:12] Luca Soldaini: Yeah.[00:26:13] Question2: So I think one of the gaps between the closed and open source models is multilinguality. The closed source models like ChatGPT work pretty well on low-resource languages, which is not the same for the open source models, right?[00:26:27] Question2: So is it in your plan to improve on that?[00:26:32] Luca Soldaini: I think in general,[00:26:32] Luca Soldaini: yes. I think we'll see a lot of improvements there in, like, 2025. There are groups on the smaller side that are already working on better crawl support and multilingual support. I think what I'm trying to say here is you really want experts who are actually in those countries, who speak those languages, to [00:27:00] participate in the international community. To give you a very easy example, I'm originally from Italy. I think I'm terribly equipped to build a model that works well in Italian, because one of the things you need is that knowledge of, okay, how do I access, you know, libraries or content that is from this region and covers this language.[00:27:23] Luca Soldaini: I've been in the US long enough that I no longer know. So I think that's the effort that folks in Central Europe, for example, are doing: okay, let's tap into regional communities to get access, you know, to bring in collaborators from those areas. I think it's going to be very crucial for getting products there.[00:27:46] Mistral intro[00:27:46] Sophia Yang: Hi everyone. Yeah, I'm super excited to be here to talk to you guys about Mistral. A really short and quick recap of what we have done, what kind of models and products we have released in the [00:28:00] past year and a half. Most of you may already know that we are a small startup founded about a year and a half ago in Paris, in May 2023. It was founded by three of our co-founders, and in September 2023 we released our first open source model, Mistral 7B. Yeah, how many of you have used or heard about Mistral 7B?[00:28:24] Sophia Yang: Hey, pretty much everyone. Thank you. Yeah, it's pretty popular, and our community really loved this model. And in December 2023 we released another popular model with the MoE architecture, Mixtral 8x7B. Going into this year, you can see we have released a lot of things this year.[00:28:46] Sophia Yang: First of all, in February 2024, we released Mistral Small, Mistral Large, and Le Chat, which is our chat interface; I will show it to you in a little bit. We released an embedding model for, you [00:29:00] know, converting your text into embedding vectors, and all of our models are available on the big cloud providers. So you can use our models on Google Cloud, AWS, Azure, Snowflake, IBM.[00:29:16] Sophia Yang: So very useful for enterprises who want to use our models through the cloud.
And in April and May this year, we released another powerful open source MoE model, Mixtral 8x22B. And we also released our first code model, Codestral, which is amazing at 80-plus languages. And then we provided another fine-tuning service for customization.[00:29:41] Sophia Yang: Because we know the community loves to fine-tune our models, we provide a very nice and easy option for you to fine-tune our models on our platform. And we also released our fine-tuning code base, called mistral-finetune. It's open source, so feel free to take a look.[00:29:58] Sophia Yang: More models. [00:30:00] From July to November this year, we released many, many other models. First of all are the two new best small models. We have Ministral 3B, great for deploying on edge devices, and we have Ministral 8B; if you used to use Mistral 7B, Ministral 8B is a great replacement with much stronger performance than Mistral 7B.[00:30:25] Sophia Yang: We also collaborated with NVIDIA and open sourced another model, Mistral NeMo 12B, another great model. And just a few weeks ago, we updated Mistral Large to version 2, with updated state-of-the-art features and really great native function calling capabilities.[00:30:45] Sophia Yang: And we released two multimodal models: Pixtral 12B, which is open source, and Pixtral Large, just amazing models for not only understanding images but also great at text understanding. Yeah, a [00:31:00] lot of image models are not so good at textual understanding, but Pixtral Large and Pixtral 12B are good at both image understanding and textual understanding.[00:31:09] Sophia Yang: And of course, we have models for research: Codestral Mamba is built on the Mamba architecture, and Mathstral is great for working with math problems. So yeah, those are other models.[00:31:29] Sophia Yang: Here's another view of our model reference. We have several premier models, which means these models are mostly available through our API. I mean, all of the models are available through our API, except for Ministral 3B. But the premier models have a special license, the Mistral research license: you can use them for free for exploration, but if you want to use them for enterprise or production use, you will need to purchase a license [00:32:00] from us.[00:32:00] Sophia Yang: So on the top row here, we have Ministral 3B and 8B as our premier models. Mistral Small is best for low-latency use cases, Mistral Large is great for your most sophisticated use cases. Pixtral Large is the frontier-class multimodal model. And we have Codestral, great for coding, and then again the Mistral Embed model.[00:32:22] Sophia Yang: And at the bottom of the slide here, we have several Apache 2.0 licensed open-weight models, free for the community to use, and also if you want to fine-tune them, use them for customization or production, feel free to do so. The latest, we have Pixtral 12B. We also have Mistral NeMo, Codestral Mamba and Mathstral, as I mentioned, and we have three legacy models that we don't update anymore.[00:32:49] Sophia Yang: So we recommend you move to our newer models if you are still using them. And then, just a few weeks ago, [00:33:00] we made a lot of improvements to our chat interface, Le Chat. How many of you have used Le Chat? Oh, no, only a few. Okay. I highly recommend Le Chat. It's at chat.mistral.ai. It's free to use.[00:33:16] Sophia Yang: It has all the amazing capabilities I'm going to show you right now.
But before that: Le Chat in French means "the cat," so this is actually a cat logo. You can tell these are the cat's eyes. Yeah. So first of all, I want to show you something. Maybe let's take a look at image understanding.[00:33:36] Sophia Yang: So here I have a receipt and I want to ask... I'm just going to get the prompt. Cool. So basically I have a receipt and I said, I ordered, I don't know, coffee and the sausage. How much do I owe? Add an 18 percent tip. So hopefully it was able to get the cost of the coffee and the [00:34:00] sausage and ignore the other things.[00:34:03] Sophia Yang: And yeah, I don't really understand this, but I think this is coffee. It's, yeah, nine, eight. And then the cost of the sausage, we have 22 here. And then it was able to add the costs, calculate the tip, and all that. Great. So, it's great at image understanding, it's great at OCR tasks. So, if you have OCR tasks, please use it.[00:34:28] Sophia Yang: It's free on the chat. It's also available through our API. And I also want to show you a Canvas example. A lot of you may have used Canvas with other tools before, but with Le Chat it's completely free. Here, I'm asking it to create a canvas that uses PyScript to execute Python in my browser.[00:34:51] Sophia Yang: Let's see if it works. Import this. Okay, so, yeah, so basically it's executing [00:35:00] Python here. Exactly what we wanted. And the other day, I was trying to ask Le Chat to create a game for me. Let's see if we can make it work. Yeah, the Tetris game. Yep. Let's just get one row. Maybe. Oh no. Okay. All right. You get the idea. I failed my mission. Okay. Here we go. Yay! Cool. Yeah. So as you can see, Le Chat can write the code for a simple game pretty easily. And you can ask Le Chat to explain the code and make updates however you like. Another example: there is a bar here I want to move.[00:35:48] Sophia Yang: Okay, great, okay. And let's go back to another one. Yeah, we also have web search capabilities. Like, you can [00:36:00] ask, what's the latest AI news? Image generation is pretty cool: generate an image about researchers. Okay, in Vancouver? Yeah, it's Black Forest Labs' FLUX Pro. Again, this is free, so... Oh, cool.[00:36:19] Sophia Yang: I guess the researchers here are mostly from the University of British Columbia. That's smart. Yeah. So this is Le Chat. Please feel free to use it, and let me know if you have any feedback. We're always looking for improvements, and we're going to release a lot more powerful features in the coming years.[00:36:37] Sophia Yang: Thank you. Get full access to Latent Space at www.latent.space/subscribe
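For reference, the arithmetic in that receipt demo is easy to check by hand. The snippet below is only a sanity check of the expected answer; the item prices (9.80 for the coffee, 22 for the sausage) are assumptions based on how they were read aloud in the demo, not values from the actual receipt.

# Sanity check of the receipt demo's arithmetic; prices are assumptions
# taken from how they were read aloud in the demo.
coffee = 9.80
sausage = 22.00
subtotal = coffee + sausage
tip = subtotal * 0.18                     # 18 percent tip
print(f"subtotal={subtotal:.2f} tip={tip:.2f} total={subtotal + tip:.2f}")
# subtotal=31.80 tip=5.72 total=37.52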
Use code LOGAN10 for 10% off your SeatGeek order https://seatgeek.onelink.me/RrnK/LOGAN10 *Up to $25 off Our ex-roommate Dwarf Mamba RETURNS to discuss life after Logan Paul, quitting social media for a 9-5 job, Hawk Tuah’s crypto scandal, bone-chilling truth about NJ drones, Logan’s family disaster at Thanksgiving, Trump ending daylight savings, backlash following Luigi Mangione’s m*rder, how to talk to aliens & more… SUBSCRIBE TO THE PODCAST ► https://www.youtube.com/impaulsive Watch Previous (AMP’s Biggest Member FANUM on iShowSpeed VS Kai Cenat, Taxing John Cena, Cops Stealing His Lambo) ► https://www.youtube.com/watch?v=oZziB35XSnw&t=145s ADD US ON: INSTAGRAM: https://www.instagram.com/impaulsiveshow/ Timestamps: 0:00 Welcome Dwarf Mamba!
Philadelphia Eagles CB Darius Slay reacts to the Eagles' Week 14 victory over the Carolina Panthers. Slay pleads his case for Eagles RB Saquon Barkley for MVP and why Zach Baun should be a Defensive Player of the Year candidate. Slay then takes us through his experience of the nail-biting final drive of the game as Panthers QB Bryce Young attempted to rally his team to victory against one of the best defenses in the NFL. (Times are approximate due to added advertisements) 0:00 - Start 0:28 - Slay's concussion vs Rams 2:21 - Carolina is NOT as bad as you think! 5:39 - Saquon for MVP 6:43 - Eagles defense has been amazing since the bye week 14:50 - Big Play Slay final drive 27:01 - Eagles-Steelers preview Learn more about your ad choices. Visit podcastchoices.com/adchoices
In this episode, Shashank Rajput, Research Scientist at Mosaic and Databricks, explores innovative approaches in large language models (LLMs), with a focus on Retrieval Augmented Generation (RAG) and its impact on improving efficiency and reducing operational costs. Highlights include:
- How RAG enhances LLM accuracy by incorporating relevant external documents.
- The evolution of attention mechanisms, including mixed attention strategies.
- Practical applications of Mamba architectures and their trade-offs with traditional transformers.
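Since the episode summary above compresses RAG into one line, here is a minimal, self-contained sketch of the retrieve-then-generate pattern it refers to. This is an illustration, not Mosaic's or Databricks' implementation: it uses a toy word-overlap score in place of a real embedding model and vector index, and build_prompt is just one way to splice retrieved context into an LLM prompt.

# Minimal Retrieval Augmented Generation (RAG) sketch: score documents against
# the query, keep the top k, and prepend them to the prompt sent to an LLM.
from collections import Counter

DOCS = [
    "Mamba is a state space model architecture with linear-time sequence processing.",
    "Retrieval augmented generation grounds LLM answers in external documents.",
    "Mixed attention strategies combine full attention with cheaper mechanisms.",
]

def overlap(query: str, doc: str) -> int:
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[w], d[w]) for w in q)   # toy stand-in for embedding similarity

def retrieve(query: str, k: int = 2) -> list:
    return sorted(DOCS, key=lambda doc: overlap(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join("- " + doc for doc in retrieve(query))
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + query

print(build_prompt("How does retrieval augmented generation improve accuracy?"))
# The assembled prompt would then be sent to whichever LLM you are using.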
This episode is brought to you by the Change Makers Certification Program! Sinikka Waugh and Mike Rice discuss God's Plan is Always Better than My Plan (Jeremiah 29:11). As a dedicated IT Staffing executive and leader, Mike's commitment lies in orchestrating teams to deliver exceptional outcomes for their valued clients. With a relentless focus on nurturing talent and fostering innovation, he leads the organization in identifying, recruiting, and deploying top-tier IT professionals who not only meet but exceed the specific needs and aspirations of their clients. By leveraging cutting-edge strategies and a deep understanding of the tech landscape, Mike empowers their teams to curate solutions that drive success for their clients, fostering enduring partnerships built on trust, reliability, and excellence. It's his passion to align their expertise with the dynamic demands of the industry, ensuring that they consistently deliver unparalleled services and cultivate environments where businesses thrive through exceptional talent acquisition and management. For 25 years, Mike has dedicated himself to the IT Consulting field, navigating diverse roles from recruiting and account management to branch management, regional directorship, and executive leadership. His fundamental philosophy centers around fostering businesses incrementally through individual relationships while steadfastly adhering to faith-driven principles in every action. Beyond his professional pursuits, Mike finds joy in golfing, actively participating in Life Change Church, and cherishing moments with his wife and four adult children. Life Verse: Matthew 6:33 "Seek ye first the kingdom of God and his righteousness, and he will give you everything you need."
The Color of Money | Transformative Conversations for Wealth Building
Charles Holloway is a leader of people. He has built a brokerage firm, a title company, and a property management company from the ground up. He joins us today to tell us what it takes to lead a business empire. We discuss Kobe Bryant's "Mamba" mentality for achieving greatness, getting the right people on your team who are as driven as you are, and what it means to lead your wider community. If you aspire to not only be a great business professional, but a true leader of people, this is the episode for you.
Resources:
Learn more at The Color of Money
Read "Rich Dad, Poor Dad"
Read "The Energy Bus"
Become a real estate agent HERE
Connect with Our Hosts
Emerick Peace: Instagram: @theemerickpeace | Facebook: facebook.com/emerickpeace
Daniel Dixon: Instagram: @dixonsoldit | Facebook: facebook.com/realdanieldixon | LinkedIn: linkedin.com/in/dixonsoldit | YouTube: @dixongroupcompanies
Julia Lashay: Instagram: @iamjulialashay | Facebook: facebook.com/growwithjulia | LinkedIn: linkedin.com/in/julialashay/ | YouTube: @JuliaLashay
Bo Menkiti: Instagram: @themenkitigroup | Facebook: facebook.com/obiora.menkiti | LinkedIn: linkedin.com/in/bomenkiti/
Produced by NOVA
This podcast is for general informational purposes only. The guest's views, thoughts, and opinions represent those of the guest and not KWRI and its affiliates and should not be construed as financial, economic, legal, tax, or other advice. This podcast is provided without any warranty, or guarantee of its accuracy, completeness, timeliness, or results from using the information.
Advertising Inquiries: https://redcircle.com/brands
Coach Mark Gottfried interviews Audrey and Nicole Nourse, four-time NCAA beach volleyball champions from USC. The twins share their journey, including their competitive spirit, the "Mamba mentality," and their inspiring path from high school multi-sport athletes to collegiate champions. They discuss plans for future careers in investment banking and private wealth management, and their aspiration to start a fund for women's sports. The Nourse twins also talk about their personal growth, unwavering faith, and appreciation for the welcoming nature of beach volleyball.
In this episode of The Cognitive Revolution, Nathan dives deep into the world of state space models with returning co-host Jason Meaux and special guest Quentin Anthony, Head of Model Training at Zyphra. Explore the cutting-edge Zamba 2-7b model, which combines selective state space and attention mechanisms (a toy code sketch of this hybrid pattern follows this description). Uncover practical insights on model training, architectural choices, and the challenges of scaling AI. From learning schedules to hybrid architectures, loss metrics to context length extension, this technical discussion covers it all. Don't miss this in-depth conversation on the future of personalized, on-device AI. Check out more about Zyphra and Jason Meaux here: Zyphra's website: https://www.zyphra.com Zamba2-7B Blog: https://www.zyphra.com/post/zamba2-7b Zamba2 GitHub: https://github.com/Zyphra/Zamba2 Tree attention: https://www.zyphra.com/post/tree-attention-topology-aware-decoding-for-long-context-attention-on-gpu-clusters Jason Meaux's Twitter: https://x.com/KamaraiCode Jason Meaux's website: https://www.statespace.info Be notified early when Turpentine drops new publications: https://www.turpentine.co/exclusiveaccess SPONSORS: Weights & Biases RAG++: Advanced training for building production-ready RAG applications. Learn from experts to overcome LLM challenges, evaluate systematically, and integrate advanced features. Includes free Cohere credits. Visit https://wandb.me/cr to start the RAG++ course today. Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution LMNT: LMNT is a zero-sugar electrolyte drink mix that's redefining hydration and performance. Ideal for those who fast or anyone looking to optimize their electrolyte intake. Support the show and get a free sample pack with any purchase at https://drinklmnt.com/tcr CHAPTERS: (00:00:00) Teaser (00:00:42) About the Show (00:01:05) About the Episode (00:03:09) Introducing Zyphra (00:07:28) Personalization in AI (00:12:48) State Space Models & Efficiency (Part 1) (00:19:22) Sponsors: Weights & Biases RAG++ | Shopify (00:21:26) State Space Models & Efficiency (Part 2) (00:22:23) Dense Attention to Shared Attention (00:29:41) Zyphra's Early Bet on Mamba (Part 1) (00:33:18) Sponsors: Notion | LMNT (00:36:00) Zyphra's Early Bet on Mamba (Part 2) (00:37:22) Loss vs. Model Quality (00:44:53) Emergence & Grokking (00:50:06) Loss Landscapes & Convergence (00:56:55) Sophia, Distillation & Secrets (01:09:00) Competing with Big Tech (01:23:50) The Future of Model Training (01:30:02) Deep Dive into Zamba 1 (01:34:24) Zamba 2 and Mamba 2 (01:38:56) Context Extension & Memory (01:44:04) Sequence Parallelism (01:45:44) Zamba 2 Architecture (01:53:57) Mamba Attention Hybrids (02:00:00) Lock-in Effects (02:05:32) Mamba Hybrids in Robotics (02:07:07) Ease of Use & Compatibility (02:12:10) Tree Attention vs. Ring Attention (02:22:02) Zyphra's Vision & Goals (02:23:57) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://www.linkedin.com/in/nathanlabenz/
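The episode above centers on hybrid designs like Zamba 2 that interleave attention with state space (Mamba-style) blocks. The sketch below is only a toy illustration of that interleaving pattern in NumPy; it is not Zyphra's architecture (which, among other things, shares attention weights across layers and uses selective, input-dependent SSM parameters). It just shows the basic idea of stacking a mostly linear-cost recurrent mixer with an occasional quadratic-cost attention mixer.

# Toy hybrid stack: alternate linear state space blocks with a causal
# self-attention block. Illustrative only -- real hybrids add normalization,
# MLPs, gating, and selective (input-dependent) SSM parameters.
import numpy as np

rng = np.random.default_rng(0)
L, d, n = 8, 4, 16  # sequence length, model width, SSM state size

def attention_block(x):
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((L, L), dtype=bool), 1)] = -np.inf  # causal mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                          # O(L^2) token mixing

def ssm_block(x):
    A = 0.9 * np.eye(n)                   # fixed, stable state transition
    B, C = rng.normal(size=(n, d)), rng.normal(size=(d, n))
    h, out = np.zeros(n), []
    for t in range(L):
        h = A @ h + B @ x[t]              # O(L) recurrent token mixing
        out.append(C @ h)
    return np.stack(out)

x = rng.normal(size=(L, d))
for block in (ssm_block, ssm_block, attention_block, ssm_block):  # mostly SSM
    x = x + block(x)                      # residual connection
print(x.shape)  # (8, 4)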
This week, we took over a high school homecoming, and let me tell you, after hearing about the DJ disaster they had last year, I knew I had to swoop in and save the day. Last year's guy played all the wrong songs, capped the night with Mamba—which, I mean, why? So, they canned him and called in the pro (yours truly). The kids? They had a blast. I'm talking, Unwritten by Natasha Bedingfield brought the house down. You know you've hit the right note when an entire room of teenagers collectively loses their minds two seconds into a song. Then came Since U Been Gone by Kelly Clarkson, because obviously, you can't go wrong with early 2000s anthems at a 2024 homecoming. These kids? They get it. Who knew? But, let's get to the meat of this episode. We're in the final stretch, people—the 2024 election is less than 20 days away. And let's be real, nobody has any idea who they're voting for. We've got Zaddy Trumpets out here spinning his usual “policies” (read: nonsense). He's supposed to be answering questions at a town hall but ends up playing DJ with his summer playlist while people literally pass out from heat exhaustion. Instead of answering questions like a normal candidate, he's out here doing the YMCA for 40 minutes straight. I mean, what are we doing? But the highlight? Trump's diving into crypto now—yep, you heard me. He's promoting some sketchy token called “Liberty Financial.” Look, if you're feeling adventurous, go ahead and dump your entire savings into it, but I wouldn't recommend it (not financial advice, wink, wink). The best part? You can't even withdraw your money until the “scheme”—I mean, system—decides you can. Yeah, read the fine print, folks. Meanwhile, Elon Musk is jumping on the Trump train, donating millions and becoming Trump's new bestie after they were practically enemies a few weeks ago. Musk is now campaigning in Pennsylvania like he's the freaking Director of Government Efficiency, which is a real job title Trump invented just for him. Honestly, the whole thing is starting to feel like a crossover episode of The Apprentice and Shark Tank. And if all that wasn't enough, we've got Ice Cube dropping new music—yes, that Ice Cube. The man who hasn't rapped in forever decided now is the time. I don't think anyone's listening to it, but hey, props to him for trying. Exhibit's also crawling out of retirement with a new album on a record label run by… wait for it… Conor McGregor. Yes, because when I think “music mogul,” I totally think “MMA fighter.” What are we doing? Oh, and shoutout to Dasani. They're back with a new formula—now without salt. Yes, that's right. The selling point for water is that it finally doesn't have salt in it. What a groundbreaking concept. Welcome to 2024, people. That's it for this week. It's all a mess, and I'm just here trying to make sense of it. If you haven't subscribed to the YouTube channel yet, what are you even doing? See you next week for more chaos. Peace! --- Support this podcast: https://podcasters.spotify.com/pod/show/what-are-we-doing-pod/support
Welcome back to The Switch! In today's episode, Eric and I explore the edges of our creativity—those moments when our ideas push us out of our comfort zones. We dive into how creating alter egos like Kobe's Mamba or my own Happy Oscar allows us to tackle deeper topics and push boundaries. Switch-it-up Steps: Embrace Alter Egos: Learn how adopting a creative persona, like Eric's Little Italy, can expand your expressive range and tackle themes that might feel too personal or challenging. Question the Limits: We discuss what really caps our creativity—is it resources, experience, or just our own hesitations? Leverage Your Assets: Whether it's the tools at your disposal or your life experiences, every creative has unique ammunition to break barriers. Today's chat pushes us to think about how far we can stretch our creative muscles and what it truly means to limit—or liberate—our artistic expressions. Stay Connected: Catch Eric: @infinitetalkspod Catch Oscar: @happyoscarstudio Tune in as we dissect the boundaries of creativity and how sometimes, stepping into another character can be the key to breaking free. Let's dive deep and push past those limits together!
Cybercab, Golden Jackal, Mamba 2FA, Multi Microsoft, iPhone thieves, esims, Aaran Leyland, and More, on this edition of the Security Weekly News. Visit https://www.securityweekly.com/swn for all the latest episodes! Show Notes: https://securityweekly.com/swn-421
In today's episode I sit down in the city of Medellín with Juan David Zapata, co-founder of Mamba Negra, Juniper Drinks, Selva Gin, and Ron Carbón. Juan David tells me about his years growing up in Comuna 13, the stabbing he suffered at 17 during a fight between rival fan groups at a soccer match, the impact of finishing in tenth place in the World Class 2018 final, and how Mamba Negra was born, the only bar in Medellín that has been recognized in 50Best Discovery. We also talk about the importance of getting to know other cultures, how they created the first Colombian tonic water brand, the power of partnering with the right people, and what he credits for Mamba Negra's success in just two years. Three takeaways from this episode: 1. Reputation is the foundation for building a venture, a business, or a career. 2. If you have a successful partnership, hold on to those values and don't get greedy as the business keeps growing. 3. These days, if you don't tell people what you do, someone else who doesn't do it as well but knows how to communicate will come along and win more clients. Follow Juan David and his companies on Instagram: Juan David Zapata | Mamba Negra | Juniper Drinks | Selva Gin | Ron Carbón. Don't forget to subscribe to our YouTube channel.
Episode 5-494 – Hello my running friends, yes I let this one slide a week, because I needed to align it with the off-week of my other podcast – the apocalypse podcast – in which I am entering my 5th season now. Honestly, I have crowded my calendar so much it's hard to get done all the stuff I'm signed up for. Today I will talk about one of those top 5 questions from the running forums – how to stay motivated and get the workouts done. I'll talk a bit about how I do goal setting and why. And I'll give you an update on my own goals for the summer that, for better or worse, I'm closing in on the deadlines for. Today's interview is with the race director for the Mamba 100 in Memphis. That is my goal race for this campaign. I am glad I talked to him because I acquired insights into the course that filled in some blanks for me. So, sit back (unless you're in a chair with no back, because you'll sprawl onto the floor). Sprawl is an interesting word that comes to us through Old English and Danish. It rolls off the tongue nicely, doesn't it? Almost onomatopoeia. Anyhow – sit back or sprawl back and let's go on an adventure together. On with the show. Interview… Thanks so much. Great talking to you. www.mambatrailrunners.com is our website. Facebook and Instagram: Mamba Trail Runners. Outro: Ok my running friends – I have to tell you I was quite relieved after my discussion with James. When I talked to him a few weeks ago, I did not yet have any big weeks under my belt and the 100K distance was looking a bit daunting. But, after our talk I realized that the course is runnable and it's well supported, so good news all around for what my goal is. Now I just need to stay healthy for a few more weeks. What kind of mileage do you need to have for this distance? I think the mileage isn't as important as your basic strength and time on your feet. So, cumulative fitness is more important than any set mileage goal. When I ran 50K, my long run was probably 20 miles, but I was coming off a marathon campaign. When I ran 50 miles, my long run was 36 miles, and that was probably overkill. But it was my first ultra so it was ok. When I trained for 100 miles my longest run was a 10-12 hour overnight run, so probably in the mid-30's. But, again, for these longer distances, it's all about time on your feet and fitness. All these longer distances are grouped together for impact in the training calendar – so you are getting that time on your feet and running on tired legs effect. My long run for this cycle is probably going to be 30ish miles, again on tired legs. I've been listening to a lot of audiobooks. I find it's a great way to get content in parallel with my training and dead time. It's so convenient now with the library app. Last week I listened to Ryan Holiday's Discipline is Destiny. Great listen when you're training and trying to get stuff done. I would highly recommend it on audiobook. It's read by the author. There is a lot of the 'same old' stoic philosophy stuff, but it's a good reminder and might get you motivated. So my friends, I pushed this episode to line up with the off week of my other post-apocalyptic podcast After the Apocalypse, soon to be a series of novels, lol… Keep pushing, keep being smart, take time to learn, take time for yourself, sharpen your saw and I'll see you out there. Hosted on Acast. See acast.com/privacy for more information.
In this episode of Early to Rise Radio, I reveal the five mindset traits that distinguish my most successful millionaire clients. These “Mamba Millionaire Mentalities,” inspired by Kobe Bryant's fierce drive, are key to unlocking greater success, wealth, and resilience. I'll share real-life stories of entrepreneurs like Joel Marion and Isabel Price who embody these… The post 385 – Mamba Millionaire Mindset appeared first on Early To Rise.
In this episode of Barca Talk, it was clear Barca was determined to avenge the two losses against Girona, and they did just that with a dominating performance vs. their Catalan rival. Also, we discuss Dani Olmo's injury, Kounde's Mamba mentality and other player news as we get ready for Monaco in the Champions League. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Episode 5-493 – Interview Les is Made. Hello my running friends. This week I present the last of the repurposed interviews that I did originally for my other show "After the Apocalypse". This is a friendly talk with Les for his Les is Made podcast – and fittingly he is really interested in talking about running. I had planned to write an article that dove deeper into some of the frequently asked questions. But I'm out of day today and I figure something is better than nothing. I had another satirical piece on how women running together have an entirely different conversation than men talking to each other – based on my admittedly biased observations. But, alas, there is only so much gas in the tank and I have miles to go before I sleep. So – my friends, I'll give you an update on my training, drop the interview and slink off stage left in an embarrassed shuffle. My training is going great. Within the last 7 days I have completed a nice 12 mile trail run, a 100K bike and capped it off with a 100 mile bike ride with my buddies up to Saco, Maine yesterday. Everything went well. We got perfect weather. Nobody crashed. No flat tires. And we made the trip, with breaks, in just about 10 hours. My second goal of weight loss is going well also. I poked my nose under the 173 pound flap of the tent this week and with my training load increasing it should continue to trend down. I also finished the last additional chapter for the first novel in my apocalypse series – so getting very close to the end of the editing process and having a finished manuscript to kick out the door like a grown child who has tarried too long. Funny story – I actually brought that final chapter to my new Monday morning writers group for a read this past week as well. I was scared like a little kid having to stand in front of the class. But they liked it. And gave me some good feedback. All is not lost. I grabbed the race director of the Mamba 100 for an interview and I'll shove that up for the next show. You know me. Why read the race web site when you can just talk to the race director? I was a little worried about running a 100K in the trails in the dark. But he put my mind at ease. The course is in a very gentle trail system right in a park in the center of Memphis. That's the next adventure! So – my friends. This is it for commentary. I won't be back at the end. Stay in touch. Wish me well. And I will see you out there. Les is Made, by Les Madewell: My name is Leslie and I am a photographer. I have a few good stories to share and would love to share them with the world. I try to have models and people I have worked with on with me as much as I can. I hope I am at least not boring. https://podcasters.spotify.com/pod/show/lesismade Hosted on Acast. See acast.com/privacy for more information.
Michael thinks it might well be, and it is likely the pistol he will shoot in the Rimfire Challenge World Championships in October. Meanwhile, fond thoughts on the long-discontinued Remington Versa Max semiauto. MichaelBane.TV - On the Radio episode # 235. Scroll down for reference links on topics discussed in this episode. Disclaimer: The statements and opinions expressed here are our own and may not represent those of the companies we represent or any entities affiliated to it. Host: Michael Bane Producer: Flying Dragon Ltd. More information and reference links: Kevin's Tisas 9mm 1911 Michael's Tisas 9mm 1911 Volquartsen Black Mamba C-More Slide Ride Michael on the Remington Versa Max Remington Model V3 Ruger Precision Rifle The 6mm Creedmoor/John B. Snow, Outdoor Life FTW Ranch The Music of Kyle Cox The Music of the African Tribal Orchestra
In honor of 8/24/24, Day of the Mamba, we delve into the legendary 2009-2010 NBA season that saw Kobe Bryant and the Los Angeles Lakers claim their fifth championship together. As Kobe's career neared its twilight, this championship run served as a poignant reminder of his unparalleled greatness while the Black Mamba Mentality elevated one of the NBA's oldest rivalries between the Lakers and the Celtics. Factor - Healthy Eating, Made Easy. Factor Meals - Click Here for Special Offer: Get 50% off your first box plus 20% off your next month. Become a supporter of this podcast: https://www.spreaker.com/podcast/the-baseline-nba-podcast--3677698/support.
Mamba Hot Sauces Review
Nathan's podcast, The Cognitive Revolution ... The enduring enigmas of AI ... Conceptualizing how LLMs conceptualize ... Do AIs actually understand things? ... Why AI doesn't need to be superhuman to be revolutionary ... Human vs AI representations of the world ... Thinking through high-dimensional AI brain space ... Nathan on AI risk: Doomer? Accelerationist? Both? ... The open source question and Cold War II ... Do LLMs “naturally” learn human values? ... Mamba: a new approach to building large language models ...
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert's recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We dig into the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines versus end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models which incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.
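For listeners who want a concrete picture of what a state space layer actually computes, here is a minimal sketch of the discrete SSM recurrence that Mamba builds on: a fixed-size hidden state is updated once per token, so the cost grows linearly with sequence length rather than quadratically as with full attention, and there is no growing KV cache. This toy version uses fixed, random parameters; the point of Mamba (and Mamba-2) is precisely that A, B, and C become input-dependent ("selective") and the scan is computed efficiently on GPUs, which this sketch does not attempt.

# Toy linear state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# One update per token => linear cost in sequence length; the state h is a
# fixed-size summary of everything seen so far. Parameters are random and
# NOT selective/input-dependent as in Mamba.
import numpy as np

rng = np.random.default_rng(0)
d_state, d_model, seq_len = 16, 4, 10

A = 0.9 * np.eye(d_state)                 # state transition (kept stable)
B = rng.normal(size=(d_state, d_model))   # input projection
C = rng.normal(size=(d_model, d_state))   # output projection

x = rng.normal(size=(seq_len, d_model))   # a toy input sequence
h = np.zeros(d_state)
ys = []
for t in range(seq_len):
    h = A @ h + B @ x[t]                  # recurrent state update
    ys.append(C @ h)                      # per-token output
y = np.stack(ys)
print(y.shape)  # (10, 4)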
In Hour 2... [02:30] CHARLOTTE WILDER LIVE FROM BOSTON ON HURLEY TO LAKERS Charlotte Wilder reacts to the Lakers' pursuit of Dan Hurley, noting LeBron James' influence on the coaching search. They discuss whether another successful college coach moving to the NBA signals a trend in college sports. [17:50] NEW ENGLANDERS VS. KYRIE IRVING Charlotte shares her excitement and nerves for Game 1 as a Boston fan. The hosts also dive into how much Boston fans truly despise Kyrie Irving. [23:17] NBA FINALS FACTS OR “FAN”TASY GoJo, Golic, and Charlotte decide if these are the real issues from the NBA Finals or if they're just talk radio topics: #1: For Boston sports fans, Kyrie Irving is Public Enemy #1. #2: Porzingis is the make-or-break factor for the Celtics. #3: Jayson Tatum doesn't have enough “Mamba” in him to win a title. [33:28] WILDER ON WNBA MEDIA DISCOURSE Charlotte discusses the importance of how the media covers the WNBA, highlighting recent missteps and their impact on the league. She also touches on the Aces celebrating Kate Martin's birthday, the Chicago Sky's run-in with a man getting off their team bus, and Cameron Brink's emotional response to making the Olympic 3-on-3 team. [40:08] NBA FINALS PREDICTIONS GoJo, Golic, and Charlotte make their NBA Finals predictions, debating whether the series will end in a gentlemen's sweep, the Mavericks will take it in six games, the Celtics will prevail in seven, or any other results. [43:20] THIS, THAT, AND THE THIRD This: Jokic doesn't waste time having fun this post-season as he's spotted white-water rafting. That: Chase Budinger, a former NBA player, will compete in beach volleyball at the 2024 Olympics. The Third: Dr. Pepper surpasses Pepsi as the number two soda brand in the U.S. Click here to subscribe, rate, and review the newest episodes of GoJo and Golic! If you or someone you know has a gambling problem, crisis counseling and referral services can be accessed by calling 1-800-GAMBLER (1-800-426-2537) (IL/IN/MI/NJ/PA/WV/WY), 1-800-NEXT STEP (AZ), 1-800-522-4700 (CO/NH), 888-789-7777/visit http://ccpg.org/chat (CT), 1-800-BETS OFF (IA), 1-877-770-STOP (7867) (LA), 877-8-HOPENY/text HOPENY (467369) (NY), visit OPGR.org (OR), call/text TN REDLINE 1-800-889-9789 (TN), or 1-888-532-3500 (VA). 21+ (18+ WY). Physically present in AZ/CO/CT/IL/IN/IA/LA/MI/NJ/ NY/PA/TN/VA/WV/WY only. New customers only. Min. $5 deposit required. Eligibility restrictions apply. See http://draftkings.com/sportsbook for details. Learn more about your ad choices. Visit podcastchoices.com/adchoices