Podcasts about moes

  • 207 podcasts
  • 472 episodes
  • 38m average duration
  • 1 weekly episode
  • Latest episode: May 3, 2025

POPULARITY (chart, 2017–2024)


Best podcasts about moes

Latest podcast episodes about moes

IoT Coffee Talk
246: DeepSeek

IoT Coffee Talk

Play Episode Listen Later May 3, 2025 63:04


Welcome to IoT Coffee Talk #246, where we have a chat about all things #IoT over a cup of coffee or two with some of the industry's leading business minds, thought leaders and technologists in a totally unscripted, organic format. Thanks for joining us. Sit back with a cup of Joe and enjoy the morning banter.

This week, Pete, Tom, David, Bill, Debbie, Rob, and Leonard jump on Web3 to talk about:

  • THE WORST KARAOKE! "Anyway You Want It", Journey
  • AI fatigue - Too much DeepSeek nonsense!
  • All Chinese tech denial leads to a whining road to D.C.
  • The great AI hypocrisy
  • How to build LLMs and 1.5 trillion parameter MoEs out of coconuts
  • The Week of DeepSeek - Dazed and Confused
  • The red AI pill or the blue AI pill - utopia or dystopia?
  • The 3 Laws of Edge AI
  • What does safe, reliable, trustworthy Edge AI look like?
  • How to make your content LLM copyright protected - the 80 percent nonsense rule
  • Why IoT Coffee Talk doesn't fit in the attention span of 99.999 percent of humanity

It's a great episode. Grab an extraordinarily expensive latte at your local coffee shop and check out the whole thing. You will get all you need to survive another week in the world of IoT and greater tech!

Thanks for listening to us! Watch episodes at http://iotcoffeetalk.com/. We support Elevate Our Kids to bridge the digital divide by bringing K-12 computing devices and connectivity to support kids' education in under-resourced communities. Please donate.

ABA Inside Track
Episode 309 - (CULTURAL/ETHICS) Family Supports and Contextualized Treatment Planning

ABA Inside Track

Play Episode Listen Later Apr 30, 2025 65:54


Though the steps involved in developing a good, evidence-based treatment plan are well documented on our podcast, what good is any of that hard work if the families you purport to use it with kinda, sorta hate your plan? Well, this week, rather than complaining about how unappreciated your procedures are, why not take a step back and ask yourself, "How can I better learn from the families I work with what will meet their needs?" We take a run down to explore the ever-confusing and complex world of family services, take a qualitative look at social validity in treatment planning, and review some key contexts that spell the difference between a good plan and a plan that works. This episode is available for 1.0 CULTURAL (ETHICS) CEU.

Articles discussed this episode:

Russa, M.B., Matthews, A.L., & Owen-DeSchryver, J.S. (2015). Expanding supports to improve the lives of families of children with autism spectrum disorder. Journal of Positive Behavior Interventions, 17, 95-104. doi: 10.1177/1098300714532134

Moes, D.R. & Frea, W.D. (2000). Using family context to inform intervention planning for the treatment of a child with autism. Journal of Positive Behavior Interventions, 2, 40-46. doi: 10.1177/109830070000200

Guinness, K.E., Atkinson, R.S., & Feil, E.G. (2024). Evaluating social validity to inform intervention development: Qualitative analysis of caregiver interviews. Behavior Analysis in Practice, 17, 870-879. doi: 10.1007/s40617-023-00899-6

If you're interested in ordering CEs for listening to this episode, click here to go to the store page. You'll need to enter your name, BCBA #, and the two episode secret code words to complete the purchase. Email us at abainsidetrack@gmail.com for further assistance.

Nuus
Hengari should have resigned - expert

Nuus

Play Episode Listen Later Apr 28, 2025 0:38


President Netumbo Nandi-Ndaitwah said on Sunday that she has officially relieved agriculture minister Mac Hengari of his duties, effective from last Wednesday. Hengari was arrested on Saturday afternoon after he allegedly tried to bribe a 21-year-old woman who accuses him of rape, offering 230,000 Namibian dollars in cash to have the case dropped. Police are investigating several charges against Hengari, including rape, gender-based violence and illegal abortion. Kosmos 94.1 Nuus got reaction from political analyst Rui Tyitende, who believes there should be a vetting process in place for ministers:

The Root and Rise Podcast | Personal Growth, Motherhood, & Healing Trauma
Breaking Generational Trauma: Being a Truth Seeker in a Family of Secret Keepers with Alistair Moes

The Root and Rise Podcast | Personal Growth, Motherhood, & Healing Trauma

Play Episode Listen Later Apr 24, 2025 32:41


We are breaking the silence to discuss what it means to be a truth-seeker in a family of secret-keepers. Let's take a deeper look at the impact of generational trauma, the emotional weight of family dysfunction, and the courage it takes to become a cycle breaker. We talk about how silence & secrecy are often survival strategies passed down through generations and how confronting those patterns can feel isolating, painful, and necessary. If you've ever felt like the black sheep, the emotional translator, or the one who sees what others pretend not to, this episode is for you.

Lactic Acid with Dominique Smith
Erika Kemp talks how staying the course paid off, celebrating the small wins, early 2000s commercials and more!

Lactic Acid with Dominique Smith

Play Episode Listen Later Apr 24, 2025 77:51


Erika Kemp talks finding consistency by staying the course, flip phones, why she values the small wins, the importance of representation and being the fastest U.S.-born Black female marathoner, the value of a dollar, Chipotle and Moes, early 2000s commercials and more!

Be sure to follow Lactic Acid on the following platforms:
YouTube: Lactic Acid Podcast
Twitter: Lacticacid_pod
Instagram: Lacticacidpodcast

Click here for more information on Marrow Couture

Join our official Facebook group here: https://www.facebook.com/groups/303650599433289/

If you're loving the show, please subscribe and leave a rating and review on Apple Podcasts, and share it with your friends and family!

mystiek
Josef Moes on Father Porfyrios

mystiek

Play Episode Listen Later Apr 15, 2025 52:24


A conversation with Josef Moes (https://orthodoxia.be/nl/enoria/parochie-van-de-heilige-nektarios-te-eindhoven/) on the occasion of the book "Geraakt door Gods liefde" ("Touched by God's Love"), the life and wisdom of elder Porfyrios, published by Orthodox Logos in Tilburg (https://orthodoxlogos.com/store/geraakt-door-gods-liefde/). From their site: Father Porfyrios, who died in 1991, was a Greek monk and priest. He stood in the long tradition of spiritual leaders that begins in Apostolic times and runs through to modern saints such as Seraphim of Sarov and Father Silouan. In this book he tells his life story and, in simple, wise words, explains the Christian faith for people of today…

The Root and Rise Podcast | Personal Growth, Motherhood, & Healing Trauma

Anger isn't the enemy - it's a messenger. And today, we're learning how to listen. In this episode, we dive deep into the complexities of anger - how it's perceived, suppressed, and ultimately, how it can be reclaimed as a tool for empowerment. Our guest, anger management expert Alistair Moes, unpacks the ways anger has been misunderstood, how it can be used for good, and how healing can happen through anger.

Davor Suker's Left Foot
The Truth: Who is Arsenal's New Sporting Director Andrea Berta?

Davor Suker's Left Foot

Play Episode Listen Later Apr 4, 2025 46:21


It's time for The Truth!

Today, Sam and Dougie are looking at Arsenal and in particular, a big reshuffle in the boardroom that has seen Andrea Berta succeed Edu Gaspar as their Sporting Director. Berta was formerly at Atletico Madrid, forming a (mostly) impressive team with Diego Simeone, and also previously worked at Genoa and Parma in his native Italy. We examine what his record was at Atleti in terms of overseeing signings, discuss which positions and players Arsenal may look to in the summer under his stewardship, and scrutinise how the Manager-SD relationship works at Arsenal in particular with Mikel Arteta. There's a little bit of time too to examine the role of a Sporting Director in the modern game, and look at exactly what falls under their remit, before we round things off.

So, is this the appointment that helps steer Arsenal through that final step where they lift a Premier League trophy? Will he mesh with what Arteta wants and needs on the pitch? Or is this simply background shuffles that bark louder than they actually bite? Well, The Truth is somewhere in the middle...

And remember, if you'd like more from the Rank Squad, including extra podcasts every Monday and Friday (including our weekly Postbox taking a look at the whole weekend of football) and access to our brilliant Discord community, then why not join us here on Patreon?

ABA Inside Track
April 2025 Preview

ABA Inside Track

Play Episode Listen Later Apr 2, 2025 19:07


Spring has sprung on us with a bunch of freezing rain. So what better time than now to get set for a cozy crop of new podcasts for April. First up, a visit from our favorite mythical bunny with a grab bag of goodies in the form of new articles to discuss. Then we finally wrap up our (winter!) Listener Choice episode with a tutorial on token economies before coming up with new ways to finish our paperwork and create meaningful family supports. Then, for patrons only, our Spring Book Club looks at Divergent Mind, a book on supporting neurodivergent women. By the time you've listened to all of these episodes, the flowers will definitely be in bloom.

Articles for April 2025

Hoppin' Down the Grab Bag Trail (Spring 2025 Grab Bag)

Nevill, R.E., Crawford, M.F., Zarcone, J.R., Maquera, E., Rooker, G.W., & Schmidt, J.D. (2024). A retrospective consecutive controlled case series analysis of the assessment and treatment of elopement in children with autism in an inpatient setting. Behavior Analysis in Practice. doi: 10.1007/s40617-024-00979-1

Santa Cruz, H.A.C., Miltenberger, R.G., & Baruni, R.R. (2024). Evaluating remote behavioral skills training of online gaming safety skills. Behavior Analysis in Practice, 17, 246-256. doi: 10.1007/s40617-023-00830-z

Kelly-Sisken, S., Reeve, K.F., McPheters, C.J., Vladescu, J.C., Reeve, S.A., & Jennings, A.M. (2025). Comparing equivalence-based instruction to a PowerPoint video lecture to teach differential reinforcement descriptors to college students. Behavioral Interventions, 40, online first publication. doi: 10.1002/bin.70002

Tutorial: Token Economies (Spring 2025 Listener Choice)

Ackerman, K.B., Samudre, M., & Allday, R.A. (2020). Practical components for getting the most from a token economy. Teaching Exceptional Children, 52(4), 242-249. doi: 10.1177/0040059919892022

Kazdin, A.E. (1982). The token economy: A decade later. Journal of Applied Behavior Analysis, 15, 431-445. doi: 10.1901/jaba.1982.15-431

Degli Espinosa, F. & Hackenberg, T.D. (2024). Token economies: Evidence-based recommendations for practitioners. Behavioral Interventions. doi: 10.1002/bin.2051

You Forgot to Do Your Paperwork

Luna, O. & Rapp, J.T. (2019). Using a checklist to increase objective session note writing: Preliminary results. Behavior Analysis in Practice, 12, 622-626. doi: 10.1007/s40617-018-00315-4

Halbur, M., Reidy, J., Kodak, T., Cowan, L., & Harman, M. (2024). Comparison of enhanced and standard data sheets on treatment fidelity and data collection for tact training. Behavior Analysis in Practice, 17, 533-543. doi: 10.1007/s40617-023-00869-y

Brown, K.J. (2022). The use of a pictorially enhanced self-instruction packet to improve weekly time sheet completion in an ABA clinic. Journal of Organizational Behavior Management. doi: 10.1080/01608061.2022.2063221

Family Supports and Contextualized Treatment Planning

Russa, M.B., Matthews, A.L., & Owen-DeSchryver, J.S. (2015). Expanding supports to improve the lives of families of children with autism spectrum disorder. Journal of Positive Behavior Interventions, 17, 95-104. doi: 10.1177/1098300714532134

Moes, D.R. & Frea, W.D. (2000). Using family context to inform intervention planning for the treatment of a child with autism. Journal of Positive Behavior Interventions, 2, 40-46. doi: 10.1177/109830070000200

Guinness, K.E., Atkinson, R.S., & Feil, E.G. (2024). Evaluating social validity to inform intervention development: Qualitative analysis of caregiver interviews. Behavior Analysis in Practice, 17, 870-879. doi: 10.1007/s40617-023-00899-6

Divergent Mind Book Club (PATRONS ONLY)

Nerenberg, J. (2020). Divergent mind: Thriving in a world that wasn't designed for you. HarperOne.

Nuus
Rasool should have known better

Nuus

Play Episode Listen Later Mar 17, 2025 0:21


Political analyst Daniel Silke says it is worrying to watch the crumbling relationship between South Africa and the USA. This follows after South Africa's ambassador to America, Ebrahim Rasool, was declared persona non grata and expelled from the country. During a webinar, he accused President Donald Trump of being the leader of a white supremacist movement. Silke says Rasool should have known better than to push the wrong buttons at a time when economic and political ties need to be handled shrewdly:

Wakker worden met Janneke van der Meulen
Jan van Arragon on how to make your (vegetable) garden free of animal manure and get the soil and vegetables completely healthy

Wakker worden met Janneke van der Meulen

Play Episode Listen Later Mar 14, 2025 79:47


www.zaderij.nl for endlessly reproducible seed of vegetable varieties, with discount code WIN-WIN for 10% off.

Want a win-win-win-win too? Do you stand behind this work and want it to continue? You can contribute via the donate button on the website www.jannekevandermeulen.nl/doneren. Many thanks for every donation you make!

Want to know more about the win-win diet? Then go to: https://www.jannekevandermeulen.nl/boeken

LAST BUT NOT LEAST: Soon much, much more about the new book I am writing! Take a sneak peek here if you can't contain your curiosity: https://www.jannekevandermeulen.nl/product/pre-order-eerlijke-schoonheid/

Medical disclaimer: The information on the win-win diet YouTube channel, jannekevandermeulen.nl or any of the other media platforms is intended solely for informational and educational purposes and is not meant to diagnose, cure or treat any health problem. Consult a doctor or medical specialist before independently making changes to your current diet and lifestyle.

Disclaimer: The opinions, views and expressions of guests in the win-win podcast are not necessarily representative of the views of Janneke van der Meulen, her team, the win-win method and/or affiliated companies or the organisations they represent.

Much love and cheerful greetings, Janneke

THE WIN-WIN METHOD | FOR WINNERS | WITHOUT LOSERS

GRAPPL Spotlight
Spotlight: “Flaming Moes” (Rock & Cody's mess on Smackdown, Ryan Nemeth & CM Punk, AEW returns to form, Shane McMahon, Miro in Qatar update)

GRAPPL Spotlight

Play Episode Listen Later Feb 25, 2025 144:40


Benno & JP talk the mess of a segment on an all-time bad Smackdown this weekend as The Rock returns (and turns) again to add more confusion to the road to WrestleMania, plus Tony Khan and CM Punk's legal woes (or lack thereof) and AEW's apparent return to form with another solid week of TV. They also talk Miro in Qatar updates, Scott Steiner's massive son, Lex Luger's road to recovery, a bit of Gladiators and of course the big news of the week, Shane McMahon's grand vision for AEW.

SHOWNOTES:
0:00 Intro
13:01 Dealer's Choice Plugs - Straight Edge Society, WCW Spring Stampede 1999
19:55 Rock/Cody, Smackdown, WWE on Netflix, Vince McMahon
16:50 Ryan Nemeth & CM Punk case
1:09:03 AEW Dynamite Collision, Dynamite, positive creative directions
1:52:28 Shane McMahon, Smallman at Progress, Gladiators, Miro in Qatar, Lex Luger, Misc News

GRAPPL Spotlight is produced with support from our Patrons and YouTube members, with special thanks to King & Queen Of The Mountain Patrons - Conor O'Loughlin, Eddie Sideburns, Chris Platt, Carl Gac & Sophia Hitchcock! You can find all of our live shows on YouTube by becoming a Member at http://www.Youtube.com/@GRAPPL, or join us on Patreon for both live video and audio replays at http://www.patreon.com/GRAPPL! Get the new line of GRAPPL merchandise with FREE SHIPPING to the UK, EU, US, Canada, Australia & New Zealand at https://chopped-tees.com/en-uk/collections/grappl

You can also join us on the GRAPPL Discord for free at https://discord.gg/KqeVAcwctS

Fotografie mit Michel Birnbacher - Leica M Enthusiast
Moe Moschokarfis joins Michel Birnbacher

Fotografie mit Michel Birnbacher - Leica M Enthusiast

Play Episode Listen Later Feb 21, 2025 57:35


Technology meets emotion: the Leica experience. In this episode, host Michel Birnbacher talks with photographer and designer Moe Moschokarfis about his path to Leica and the fascination of the rangefinder. Moe recounts how he moved from his studies through Canon and Fuji to the M8 and further Leica models, until he found his ideal "digital analog camera" in the MD (Typ 262). He highlights the slowed-down way of working that leads him to more deliberate images.

A particular highlight is Moe's new project "Messsucherliebe": an independent community platform that is not built on Instagram or Facebook, but is meant to offer room for exchange, joint meetups and blog posts. Here, Leica and rangefinder enthusiasts can share their passion, publish stories and connect over photography. The platform is to be free of charge and financed solely through voluntary donations. In this way, Moe wants to create an appreciative environment in which photography and community take centre stage.

Links:
Homepage: https://moschokarfis.com/
Instagram: https://www.instagram.com/moschokarfis.fotografie/
Instagram: https://www.instagram.com/dromokratis/
YouTube: dromokratis
Messsucherliebe: https://messsucherliebe.de/

The Lunar Society
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

The Lunar Society

Play Episode Listen Later Feb 12, 2025 134:43


This week I welcome on the show two of the most important technologists ever, in any field.

Jeff Dean is Google's Chief Scientist, and through 25 years at the company, has worked on basically the most transformative systems in modern computing: from MapReduce, BigTable, Tensorflow, AlphaChip, to Gemini.

Noam Shazeer invented or co-invented all the main architectures and techniques that are used for modern LLMs: from the Transformer itself, to Mixture of Experts, to Mesh Tensorflow, to Gemini and many other things.

We talk about their 25 years at Google, going from PageRank to MapReduce to the Transformer to MoEs to AlphaChip – and maybe soon to ASI.

My favorite part was Jeff's vision for Pathways, Google's grand plan for a mutually-reinforcing loop of hardware and algorithmic design and for going past autoregression. That culminates in us imagining *all* of Google-the-company going through one huge MoE model.

And Noam just bites every bullet: 100x world GDP soon; let's get a million automated researchers running in the Google datacenter; living to see the year 3000.

Sponsors

Scale partners with major AI labs like Meta, Google Deepmind, and OpenAI. Through Scale's Data Foundry, labs get access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you're an AI researcher or engineer, learn about how Scale's Data Foundry and research lab, SEAL, can help you go beyond the current frontier at scale.com/dwarkesh.

Curious how Jane Street teaches their new traders? They use Figgie, a rapid-fire card game that simulates the most exciting parts of markets and trading. It's become so popular that Jane Street hosts an inter-office Figgie championship every year. Download from the app store or play on your desktop at figgie.com.

Meter wants to radically improve the digital world we take for granted. They're developing a foundation model that automates network management end-to-end. To do this, they just announced a long-term partnership with Microsoft for tens of thousands of GPUs, and they're recruiting a world class AI research team. To learn more, go to meter.com/dwarkesh.

Advertisers: To sponsor a future episode, visit dwarkeshpatel.com/p/advertise.

Timestamps
00:00:00 - Intro
00:02:44 - Joining Google in 1999
00:05:36 - Future of Moore's Law
00:10:21 - Future TPUs
00:13:13 - Jeff's undergrad thesis: parallel backprop
00:15:10 - LLMs in 2007
00:23:07 - "Holy s**t" moments
00:29:46 - AI fulfills Google's original mission
00:34:19 - Doing Search in-context
00:38:32 - The internal coding model
00:39:49 - What will 2027 models do?
00:46:00 - A new architecture every day?
00:49:21 - Automated chip design and intelligence explosion
00:57:31 - Future of inference scaling
01:03:56 - Already doing multi-datacenter runs
01:22:33 - Debugging at scale
01:26:05 - Fast takeoff and superalignment
01:34:40 - A million evil Jeff Deans
01:38:16 - Fun times at Google
01:41:50 - World compute demand in 2030
01:48:21 - Getting back to modularity
01:59:13 - Keeping a giga-MoE in-memory
02:04:09 - All of Google in one model
02:12:43 - What's missing from distillation
02:18:03 - Open research, pros and cons
02:24:54 - Going the distance

Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe

MOPs & MOEs
Part-Time Hitters (Crossover Episode)

MOPs & MOEs

Play Episode Listen Later Feb 9, 2025 90:56


This week we're bringing you an episode from Part-Time Hitters, where Eric Evans interviewed us about all things military human performance. We discussed H2F, MOPs & MOEs, Leg Tuck Nation, and how to improve performance in the part-time military.

Go check out more from Part-Time Hitters and their supporters!

Part-Time Hitters Website (a podcast about the reservist life)
The Fratty Guard on Instagram (a lifestyle brand for part-time hitters)
Friendly Forces Website (a 501c3 non-profit committed to helping reserve component members seamlessly integrate their military service with rewarding civilian careers)

Nuus
Elected officials should have resigned already, must repay salaries

Nuus

Play Episode Listen Later Jan 17, 2025 0:37


The secretary to cabinet, Dr. George Simataa, says that in terms of the Electoral Act all office-bearers elected to the National Assembly had to resign on the day of the announcement. He says in a circular that some have not yet resigned, and authorities must see to it that this happens.

Nuus
BOSA says Cyril should not have attended Chapo's inauguration

Nuus

Play Episode Listen Later Jan 16, 2025 0:19


Build One South Africa says President Cyril Ramaphosa's attendance of the inauguration of Mozambique's elected president, Daniel Chapo, sends a dangerous message about South Africa's stance on democracy. The party contends the Mozambican election was plagued by violence and irregularities, including assassination attempts on opposition leaders, which undermine its credibility. BOSA spokesperson Roger Solomons warns this conduct risks destabilising the SADC region and eroding democratic values:

SRF 3 punkt CH
Moes Anthill: Sometimes happiness comes with effort & pain

SRF 3 punkt CH

Play Episode Listen Later Jan 15, 2025 56:12


With his new album «Easy Win», Uri-born Mario Moe Schelbert of Moes Anthill once again creates a universe of his own, this time one that takes its time. «Easy Win» is the fifth studio album by Moes Anthill, and it seems as if the making of the songs was itself already a great stroke of luck. The album was recorded at Mario Moe Schelbert's home, «between stork nests, corn fields and every kind of agriculture». Yet beautiful and valuable things often do not simply come about by themselves. «Happiness is sometimes tied to effort and pain, to responsibility and awareness,» says Mario, the head behind Moes Anthill. Our presenter Céline Werdelis finds: «'Easy Win' is not only a fitting title, but also an 'easy happiness' in the form of nine songs that take their time and invite you to press the stop button of life, close your eyes and breathe deeply.»

God se Woord VARS vir jou Vandag
Turn Back and Live

God se Woord VARS vir jou Vandag

Play Episode Listen Later Jan 5, 2025 2:53


Ezekiel 18:31-32 "Rid yourselves of all your transgressions and get a new disposition, a new spirit. Why would you die, Israel? It is not my will that anyone should die," says the Lord my God, "but that they turn back and live."

Leading an honourable life necessarily involves some difficult choices on your part. Let's look at them: healthy eating habits, enough exercise, enough sleep and, last but not least, dealing with the bad behaviour that robs you of the good results you desire.

Well, here we are on the fifth day of this new year, and we have already broken quite a few New Year's resolutions; am I right? That confirms the reality that any positive change takes hard work. But what about negative change? No, behaving badly comes all by itself!

Have you ever had to work hard at being selfish? At not being generous? At hoarding your money? Never! But selflessness, generosity, kindness … this disposition is much harder to maintain on a consistent basis.

That is why this warning sign is such an important one for us at this time of year.

Ezekiel 18:31-32 "Rid yourselves of all your transgressions and get a new disposition, a new spirit. Why would you die, Israel? It is not my will that anyone should die," says the Lord my God, "but that they turn back and live."

Yesterday we saw that God promises to give us a new heart and a new spirit, the Holy Spirit, to empower us to be everything He made us to be. We are not alone in this.

But there is one thing He cannot and will not do for us. Only we can decide to turn away from the things we know are wrong. Get a new disposition and receive the Holy Spirit. Your life depends on it!

It's His Word. Fresh … for you … today.

Support the show

Enjoying The Content? For the price of a cup of coffee each month, you can enable Christianityworks to reach 10,000+ people with a message about the love of Jesus! DONATE R50 MONTHLY

Nordnorsk historie
Folklore from Nordland

Nordnorsk historie

Play Episode Listen Later Jan 2, 2025 40:11


It is often the case that good stories arise as truths in several places. The folk tales were important historical narratives that built the nation's identity through the 1800s. The fairy tales collected by Asbjørnsen and Moe are known to most people. But there were others who collected fairy tales too. Ole Tobias Olsen from Helgeland published the work Nordlands folkeeventyr. When I read the book, I noticed a very special story, one I had almost heard before. That brings me to this episode, which is about northern Norwegian legends: are they fairy tales, or do they carry traces of a forgotten history?

Contributors are Linea Buitink, Yngve Larsen, Thomas Lunde, Håvard Hardhaus Nilsen and Helge Seim. The host is Jitse Buitink. Hosted on Acast. See acast.com/privacy for more information.

MOPs & MOEs
How To Build Your PT Plan

MOPs & MOEs

Play Episode Listen Later Dec 29, 2024 87:03


This is a rerun of an episode from 2022. If you joined us recently, it's a great introduction to building smarter physical training plans that improve performance and reduce injuries. We'll be back in a couple of weeks with fresh content. Until then, happy holidays!

No guest this time, just Alex and Drew trying to answer one of the most commonly asked questions we get here at MOPs & MOEs. Many of you are tactical professionals out there leading your teams without access to professional coaches. Or there are a lot of you training on your own with no guidance at all. So how do you build a plan that will produce results? This conversation will provide you with a few foundational principles you can apply to make sure you're on the right track. We discuss foundational movement patterns, conditioning modalities, frequencies for different types of training, balancing intensity and volume, and more. But we start with the most important thing, which too many people seem to forget: how to set a good goal.

MOPs & MOEs
What You Need To Know About Cognitive Training with Job Fransen

MOPs & MOEs

Play Episode Listen Later Dec 22, 2024 71:08


Happy holidays! This is a rerun of an episode we published back in March 2023, but this topic has been getting a lot of discussion again recently, so we wanted to revisit it! MOPs & MOEs merch is now for sale on our website! Check out the shop for tees, hoodies, stickers, and more.

Job Fransen is a skill acquisition specialist working at the University Medical Centre Groningen in the Netherlands and an adjunct fellow at the University of Technology Sydney's School of Sport, Exercise, and Rehabilitation. His research focuses on optimizing skill acquisition in athletes. He has worked with high-performance athletes and individuals from around the world, across elite sport, esports and gaming, and the military. Job is also a skill acquisition consultant, assisting some of the world's best coaches to design practice that optimizes learning across a range of sports, most notably rugby, Australian football, soccer, and basketball.

We discovered Job's work because of a preprint article he released that provides extensively resourced evidence to argue two main points:

1. Far transfer of skills is something we all think we achieve, yet it is very difficult to attain. Instead, we mostly achieve near transfers of skills between very similar or related tasks.

2. Robust scientific research in psychology shows that cognitive training does not produce far transfer, yet numerous tech companies claim to have the 'next best cognitive or perceptual training tool' for improving sports performance, even though these transfers are exceptionally difficult to achieve and there is no evidence these tools can achieve them.

In this episode, we start off by defining the concepts of "near transfer" and "far transfer" and then set off on a wide-ranging conversation about how to better deliver actual evidence-based cognitive training. We address the heated debate among researchers in this space, critique some of the popular technologies, and arrive at some pretty valuable insights on how to integrate skill acquisition principles into the ways we train, such as the optimal challenge point model.

If this is a topic that excites you, you're in luck. Both ahead of and during our conversation Job pointed us toward a wealth of resources. We'll include links to numerous references below, but if you want to contact Job directly he is very open to that. You can email him at Job.Fransen@gmail.com or reach him on his LinkedIn.

References:
A critical systematic review of the Neurotracker perceptual-cognitive training tool
Near and Far Transfer in Cognitive Training: A Second-Order Meta-Analysis
Far Transfer: Does it Exist?
Do "Brain-Training" Programs Work?
Business leaders praised Lumosity's success then just two years later Lumosity settles for millions and admits lack of evidence for their claims

Nuus
Ceasefire in Gaza should have happened long ago - UN

Nuus

Play Episode Listen Later Dec 20, 2024 0:17


The United Nations says a ceasefire in Gaza should have happened long ago, with more than 45,000 Palestinians reportedly dead. This comes as Egypt hosts the leaders of eight Muslim-majority countries. The UN deputy secretary-general, Mohamed Khaled Khiari, has condemned the bombardment by Israeli forces. He says mediation efforts by America, Qatar and Egypt show promise, but fighting continues and is claiming innocent lives:

The Nugget Climbing Podcast
EP 247: Todd Perkins — Protecting Moe's Valley, and What We Can Do to Help

The Nugget Climbing Podcast

Play Episode Listen Later Nov 4, 2024 35:22


Moe's Valley access is under threat! Todd Perkins returns to the show to talk about what is happening with Moe's Valley, what actions are being taken to protect it, and what we can do to help. You can sign the petition here!

Sign the Petition: Petition to Permanently Protect the Greater Moe's Valley Area (https://docs.google.com/forms/d/e/1FAIpQLSf3winkzQEwb-NI9TPPIW0yaEo1iLcifw43N0sCS5X9sW3nhQ/viewform)

More Links: stgeorgeclimberscoalition.org

Show Notes: thenuggetclimbing.com/episodes/todd-perkins-returns

Nuggets:
(00:00:00) – A few thoughts about my political episode with Kaizen
(00:02:16) – Intro
(00:03:34) – A splash of cold water
(00:04:33) – What's going on with Moe's Valley
(00:10:24) – Todd's early days in Moe's
(00:13:30) – Moe's has its place
(00:14:04) – The petition & upcoming hearings
(00:17:46) – Fundraising
(00:21:18) – Stories from Todd
(00:27:56) – Striking a balance
(00:30:58) – Todd's health & climbing
(00:33:00) – Top secret information
(00:34:02) – Wrap up

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Apologies for lower audio quality; we lost recordings and had to use backup tracks.

Our guests today are Anastasios Angelopoulos and Wei-Lin Chiang, leads of Chatbot Arena, fka LMSYS, the crowdsourced AI evaluation platform developed by the LMSys student club at Berkeley, which became the de facto standard for comparing language models. To many folks, Arena Elo is cited more often than MMLU scores, and the platform has attracted >1,000,000 people to cast votes since its launch, leading top model trainers to cite it over their own formal academic benchmarks:

The Limits of Static Benchmarks

We've done two benchmarks episodes: Benchmarks 101 and Benchmarks 201. One issue we've always brought up with static benchmarks is that 1) many are getting saturated, with models scoring almost perfectly on them, and 2) they often don't reflect production use cases, making it hard for developers and users to use them as guidance. The fundamental challenge in AI evaluation isn't technical - it's philosophical. How do you measure something that increasingly resembles human intelligence? Rather than trying to define intelligence upfront, Arena lets users interact naturally with models and collects comparative feedback. It's messy and subjective, but that's precisely the point - it captures the full spectrum of what people actually care about when using AI.

The Pareto Frontier of Cost vs Intelligence

Because the Elo scores are remarkably stable over time, we can put all the chat models on a map against their respective cost to gain a view of at least 3 orders of magnitude of model sizes/costs and observe the remarkable shift in intelligence per dollar over the past year:

This frontier stood remarkably firm through the recent releases of o1-preview and price cuts of Gemini 1.5:

The Statistics of Subjectivity

In our Benchmarks 201 episode, Clémentine Fourrier from HuggingFace thought this design choice was one of the shortcomings of arenas: they aren't reproducible. You don't know who ranked what and what exactly the outcome was at the time of ranking. That same person might rank the same pair of outputs differently on a different day, or might ask harder questions to better models compared to smaller ones, making it imbalanced. Another argument that people have brought up is confirmation bias. We know humans prefer longer responses and are swayed by formatting - Rob Mulla from Dreadnode had found some interesting data on this in May:

The approach LMArena is taking is to use logistic regression to decompose human preferences into constituent factors. As Anastasios explains: "We can say what components of style contribute to human preference and how they contribute." By adding these style components as parameters, they can mathematically "suck out" their influence and isolate the core model capabilities.

This extends beyond just style - they can control for any measurable factor: "What if I want to look at the cost adjusted performance? Parameter count? We can ex post facto measure that." This is one of the most interesting things about Arena: you have a data generation engine which you can clean and turn into leaderboards later. If you wanted to create a leaderboard for poetry writing, you could get existing data from Arena and normalize it by identifying these style components. Whether or not it's possible to really understand WHAT bias the voters have, that's a different question.

Private Evals

One of the most delicate challenges LMSYS faces is maintaining trust while collaborating with AI labs.
The concern is that labs could game the system by testing multiple variants privately and only releasing the best performer. This was brought up when 4o-mini was released and ranked as the second best model on the leaderboard:

But this fear misunderstands how Arena works. Unlike static benchmarks where selection bias is a major issue, Arena's live nature means any initial bias gets washed out by ongoing evaluation. As Anastasios explains: "In the long run, there's way more fresh data than there is data that was used to compare these five models."

The other big question is WHAT model is actually being tested; as people often talk about on X / Discord, the same endpoint will randomly feel "nerfed", like it happened for "Claude European summer" and corresponding conspiracy theories:

It's hard to keep track of these performance changes in Arena as these changes (if real…?) are not observable.

The Future of Evaluation

The team's latest work on RouteLLM points to an interesting future where evaluation becomes more granular and task-specific. But they maintain that even simple routing strategies can be powerful - like directing complex queries to larger models while handling simple tasks with smaller ones.

Arena is now going to expand beyond text into multimodal evaluation and specialized domains like code execution and red teaming. But their core insight remains: the best way to evaluate intelligence isn't to simplify it into metrics, but to embrace its complexity and find rigorous ways to analyze it. To go after this vision, they are spinning out Arena from LMSys, which will stay as an academia-driven group at Berkeley.

Full Video Podcast

Chapters
* 00:00:00 - Introductions
* 00:01:16 - Origin and development of Chatbot Arena
* 00:05:41 - Static benchmarks vs. Arenas
* 00:09:03 - Community building
* 00:13:32 - Biases in human preference evaluation
* 00:18:27 - Style Control and Model Categories
* 00:26:06 - Impact of o1
* 00:29:15 - Collaborating with AI labs
* 00:34:51 - RouteLLM and router models
* 00:38:09 - Future of LMSys / Arena

Show Notes
* Anastasios Angelopoulos
* Anastasios' NeurIPS Paper Conformal Risk Control
* Wei-Lin Chiang
* Chatbot Arena
* LMSys
* MTBench
* ShareGPT dataset
* Stanford's Alpaca project
* LLMRouter
* E2B
* Dreadnode

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

Swyx [00:00:14]: Hey, and today we're very happy and excited to welcome Anastasios and Wei Lin from LMSys. Welcome guys.

Wei Lin [00:00:21]: Hey, how's it going? Nice to see you.

Anastasios [00:00:23]: Thanks for having us.

Swyx [00:00:24]: Anastasios, I actually saw you, I think at last year's NeurIPS. You were presenting a paper, which I don't really super understand, but it was some theory paper about how your method was very dominating over other sort of search methods. I don't remember what it was, but I remember that you were a very confident speaker.

Anastasios [00:00:40]: Oh, I totally remember you. Didn't ever connect that, but yes, that's definitely true. Yeah. Nice to see you again.

Swyx [00:00:46]: Yeah. I was frantically looking for the name of your paper and I couldn't find it. Basically I had to cut it because I didn't understand it.

Anastasios [00:00:51]: Is this conformal PID control or was this the online control?

Wei Lin [00:00:55]: Blast from the past, man.

Swyx [00:00:57]: Blast from the past.
It's always interesting how NeurIPS and all these academic conferences are sort of six months behind what people are actually doing, but conformal risk control, I would recommend people check it out. I have the recording. I just never published it just because I was like, I don't understand this enough to explain it.

Anastasios [00:01:14]: People won't be interested.

Wei Lin [00:01:15]: It's all good.

Swyx [00:01:16]: But ELO scores, ELO scores are very easy to understand. You guys are responsible for the biggest revolution in language model benchmarking in the last few years. Maybe you guys want to introduce yourselves and maybe tell a little bit of the brief history of LMSys.

Wei Lin [00:01:32]: Hey, I'm Wei Lin. I'm a fifth year PhD student at UC Berkeley, working on Chatbot Arena these days, doing crowdsourcing AI benchmarking.

Anastasios [00:01:43]: I'm Anastasios. I'm a sixth year PhD student here at Berkeley. I did most of my PhD on like theoretical statistics and sort of foundations of model evaluation and testing. And now I'm working 150% on this Chatbot Arena stuff. It's great.

Alessio [00:02:00]: And what was the origin of it? How did you come up with the idea? How did you get people to buy in? And then maybe what were one or two of the pivotal moments early on that kind of made it the standard for these things?

Wei Lin [00:02:12]: Yeah, yeah. Chatbot Arena project was started last year in April, May, around that. Before that, we were basically experimenting in a lab how to fine tune a chatbot open source based on the Llama 1 model that was released. At that time, Llama 1 was like a base model and people didn't really know how to fine tune it. So we were doing some explorations. We were inspired by Stanford's Alpaca project. So we basically, yeah, grow a data set from the internet, which is called ShareGPT data set, which is like a dialogue data set between user and chat GPT conversation. It turns out to be like pretty high quality data, dialogue data. So we fine tune on it and then we train it and release the model called Vicuna. And people were very excited about it because it kind of like demonstrate open way model can reach this conversation capability similar to chat GPT. And then we basically release the model with and also build a demo website for the model. People were very excited about it. But during the development, the biggest challenge to us at the time was like, how do we even evaluate it? How do we even argue this model we trained is better than others? And then what's the gap between this open source model that other proprietary offering? At that time, it was like GPT-4 was just announced and it's like Claude 1. What's the difference between them? And then after that, like every week, there's a new model being fine tuned, released. So even until still now, right? And then we have that demo website for Vicuna now. And then we thought like, okay, maybe we can add a few more of the model as well, like API model as well. And then we quickly realized that people need a tool to compare between different models. So we have like a side by side UI implemented on the website so that people choose, you know, compare. And we quickly realized that maybe we can do something like, like a battle on top of ECLMs, like just anonymize it, anonymize the identity, and that people vote which one is better. So the community decides which one is better, not us, not us arguing, you know, our model is better or what. And that turns out to be like, people are very excited about this idea.
And then we tweet, we launch, and that's, yeah, that's April, May. And then it was like first two, three weeks, like just a few hundred thousand views tweet on our launch tweets. And then we have regularly double update weekly, beginning at a time, adding new model GPT-4 as well. So it was like, that was the, you know, the initial.

Anastasios [00:04:58]: Another pivotal moment, just to jump in, would be private models, like the GPT, I'm a little,

Wei Lin [00:05:04]: I'm a little chatty. That was this year. That was this year.

Anastasios [00:05:07]: Huge.

Wei Lin [00:05:08]: That was also huge.

Alessio [00:05:09]: In the beginning, I saw the initial release was May 3rd of the beta board. On April 6, we did a benchmarks 101 episode for a podcast, just kind of talking about, you know, how so much of the data is like in the pre-training corpus and blah, blah, blah. And like the benchmarks are really not what we need to evaluate whether or not a model is good. Why did you not make a benchmark? Maybe at the time, you know, it was just like, Hey, let's just put together a whole bunch of data again, run a, make a score that seems much easier than coming out with a whole website where like users need to vote. Any thoughts behind that?

Wei Lin [00:05:41]: I think it's more like fundamentally, we don't know how to automate this kind of benchmarks when it's more like, you know, conversational, multi-turn, and more open-ended task that may not come with a ground truth. So let's say if you ask a model to help you write an email for you for whatever purpose, there's no ground truth. How do you score them? Or write a story or a creative story or many other things like how we use ChatterBee these days. It's more open-ended. You know, we need human in the loop to give us feedback, which one is better. And I think nuance here is like, sometimes it's also hard for human to give the absolute rating. So that's why we have this kind of pairwise comparison, easier for people to choose which one is better. So from that, we use these pairwise comparison, those to calculate the leaderboard. Yeah. You can add more about this methodology.

Anastasios [00:06:40]: Yeah. I think the point is that, and you guys probably also talked about this at some point, but static benchmarks are intrinsically, to some extent, unable to measure generative model performance. And the reason is because you cannot pre-annotate all the outputs of a generative model. You change the model, it's like the distribution of your data is changing. New labels to deal with that. New labels are great automated labeling, right? Which is why people are pursuing both. And yeah, static benchmarks, they allow you to zoom in to particular types of information like factuality, historical facts. We can build the best benchmark of historical facts, and we will then know that the model is great at historical facts. But ultimately, that's not the only axis, right? And we can build 50 of them, and we can evaluate 50 axes. But it's just so, the problem of generative model evaluation is just so expansive, and it's so subjective, that it's just maybe non-intrinsically impossible, but at least we don't see a way. We didn't see a way of encoding that into a fixed benchmark.

Wei Lin [00:07:47]: But on the other hand, I think there's a challenge where this kind of online dynamic benchmark is more expensive than static benchmark, offline benchmark, where people still need it.
Like when they build models, they need static benchmark to track where they are.

Anastasios [00:08:03]: It's not like our benchmark is uniformly better than all other benchmarks, right? It just measures a different kind of performance that has proved to be useful.

Swyx [00:08:14]: You guys also published MTBench as well, which is a static version, let's say, of Chatbot Arena, right? That people can actually use in their development of models.

Wei Lin [00:08:25]: Right. I think one of the reasons we still do this static benchmark, we still wanted to explore, experiment whether we can automate this, because people, eventually, model developers need it to fast iterate their model. So that's why we explored LM as a judge, and ArenaHard, trying to filter, select high-quality data we collected from Chatbot Arena, the high-quality subset, and use that as a question and then automate the judge pipeline, so that people can quickly get high-quality signal, benchmark signals, using this online benchmark.

Swyx [00:09:03]: As a community builder, I'm curious about just the initial early days. Obviously when you offer effectively free A-B testing inference for people, people will come and use your arena. What do you think were the key unlocks for you? Was it funding for this arena? Was it marketing? When people came in, do you see a noticeable skew in the data? Which obviously now you have enough data sets, you can separate things out, like coding and hard prompts, but in the early days, it was just all sorts of things.

Anastasios [00:09:31]: Yeah, maybe one thing to establish at first is that our philosophy has always been to maximize organic use. I think that really does speak to your point, which is, yeah, why do people come? They came to use free LLM inference, right? And also, a lot of users just come to the website to use direct chat, because you can chat with the model for free. And then you could think about it like, hey, let's just be kind of like more on the selfish or conservative or protectionist side and say, no, we're only giving credits for people that battle or so on and so forth. Strategy wouldn't work, right? Because what we're trying to build is like a big funnel, a big funnel that can direct people. And some people are passionate and interested and they battle. And yes, the distribution of the people that do that is different. It's like, as you're pointing out, it's like, that's not as they're enthusiastic.

Wei Lin [00:10:24]: They're early adopters of this technology.

Anastasios [00:10:27]: Or they like games, you know, people like this. And we've run a couple of surveys that indicate this as well, of our user base.

Wei Lin [00:10:36]: We do see a lot of developers come to the site asking polling questions, 20-30%. Yeah, 20-30%.

Anastasios [00:10:42]: It's obviously not reflective of the general population, but it's reflective of some corner of the world of people that really care. And to some extent, maybe that's all right, because those are like the power users. And you know, we're not trying to claim that we represent the world, right? We represent the people that come and vote.

Swyx [00:11:02]: Did you have to do anything marketing-wise? Was anything effective? Did you struggle at all? Was it success from day one?

Wei Lin [00:11:09]: At some point, almost done. Okay. Because as you can imagine, this leaderboard depends on community engagement participation.
If no one comes to vote tomorrow, then no leaderboard.

Anastasios [00:11:23]: So we had some period of time when the number of users was just, after the initial launch, it went lower. Yeah. And, you know, at some point, it did not look promising. Actually, I joined the project a couple months in to do the statistical aspects, right? As you can imagine, that's how it kind of hooked into my previous work. At that time, it wasn't like, you know, it definitely wasn't clear that this was like going to be the eval or something. It was just like, oh, this is a cool project. Like Wei-Lin seems awesome, you know, and that's it.

Wei Lin [00:11:56]: Definitely. There's in the beginning, because people don't know us, people don't know what this is for. So we had a hard time. But I think we were lucky enough that we have some initial momentum. And as well as the competition between model providers just becoming, you know, became very intense. Intense. And then that makes the eval onto us, right? Because always number one is number one.

Anastasios [00:12:23]: There's also an element of trust. Our main priority in everything we do is trust. We want to make sure we're doing everything like all the I's are dotted and the T's are crossed and nobody gets unfair treatment and people can see from our profiles and from our previous work and from whatever, you know, we're trustworthy people. We're not like trying to make a buck and we're not trying to become famous off of this or that. It's just, we're trying to provide a great public leaderboard community venture project.

Wei Lin [00:12:51]: Yeah.

Swyx [00:12:52]: Yes. I mean, you are kind of famous now, you know, that's fine. Just to dive in more into biases and, you know, some of this is like statistical control. The classic one for human preference evaluation is humans demonstrably prefer longer contexts or longer outputs, which is actually something that we don't necessarily want. You guys, I think maybe two months ago put out some length control studies. Apart from that, there are just other documented biases. Like, I'd just be interested in your review of what you've learned about biases and maybe a little bit about how you've controlled for them.

Anastasios [00:13:32]: At a very high level, yeah. Humans are biased. Totally agree. Like in various ways. It's not clear whether that's good or bad, you know, we try not to make value judgments about these things. We just try to describe them as they are. And our approach is always as follows. We collect organic data and then we take that data and we mine it to get whatever insights we can get. And, you know, we have many millions of data points that we can now use to extract insights from. Now, one of those insights is to ask the question, what is the effect of style, right? You have a bunch of data, you have votes, people are voting either which way. We have all the conversations. We can say what components of style contribute to human preference and how do they contribute? Now, that's an important question. Why is that an important question? It's important because some people want to see which model would be better if the lengths of the responses were the same, were to be the same, right? People want to see the causal effect of the model's identity controlled for length or controlled for markdown, number of headers, bulleted lists, is the text bold? Some people don't, they just don't care about that.
The idea is not to impose the judgment that this is not important, but rather to say ex post facto, can we analyze our data in a way that decouples all the different factors that go into human preference? Now, the way we do this is via statistical regression. That is to say the arena score that we show on our leaderboard is a particular type of linear model, right? It's a linear model that takes, it's a logistic regression that takes model identities and fits them against human preference, right? So it regresses human preference against model identity. What you get at the end of that logistic regression is a parameter vector of coefficients. And when the coefficient is large, it tells you that GPT-4o or whatever, very large coefficient, that means it's strong. And that's exactly what we report in the table. It's just the predictive effect of the model identity on the vote. The other thing that you can do is you can take that vector, let's say we have M models, that is an M dimensional vector of coefficients. What you can do is you say, hey, I also want to understand what the effect of length is. So I'll add another entry to that vector, which is trying to predict the vote, right? That tells me the difference in length between two model responses. So we have that for all of our data. We can compute it ex post facto. We added it into the regression and we look at that predictive effect. And then the idea, and this is formally true under certain conditions, not always verifiable ones, but the idea is that adding that extra coefficient to this vector will kind of suck out the predictive power of length and put it into that M plus first coefficient and quote, unquote, de-bias the rest so that the effect of length is not included. And that's what we do in style control. Now we don't just do it for M plus one. We have, you know, five, six different style components that have to do with markdown headers and bulleted lists and so on that we add here. Now, where is this going? You guys see the idea. It's a general methodology. If you have something that's sort of like a nuisance parameter, something that exists and provides predictive value, but you really don't want to estimate that. You want to remove its effect. In causal inference, these things are called like confounders often. What you can do is you can model the effect. You can put them into your model and try to adjust for them. So another one of those things might be cost. You know, what if I want to look at the cost adjusted performance of my model, which models are punching above their weight, parameter count, which models are punching above their weight in terms of parameter count, we can ex post facto measure that. We can do it without introducing anything that compromises the organic nature of the

Wei Lin [00:17:17]: data that we collect.

Anastasios [00:17:18]: Hopefully that answers the question.

Wei Lin [00:17:20]: It does.

Swyx [00:17:21]: So I guess with a background in econometrics, this is super familiar.

Anastasios [00:17:25]: You're probably better at this than me for sure.

Swyx [00:17:27]: Well, I mean, so I used to be, you know, a quantitative trader and so, you know, controlling for multiple effects on stock price is effectively the job. So it's interesting. Obviously the problem is proving causation, which is hard, but you don't have to do that.

Anastasios [00:17:45]: Yes. Yes, that's right. And causal inference is a hard problem and it goes beyond statistics, right?
But we think that this is a good first step, and we're sort of looking forward to learning from more people. You know, there are some good people at Berkeley that work on causal inference, and we're looking forward to learning from them on, like, what are the really most contemporary techniques that we can use in order to estimate true causal effects, if possible.

Swyx [00:18:10]: Maybe we could take a step through the other categories. So style control is a category. It is not a default. I had thought that when you wrote that blog post, actually, it would become the new default, because it seems like the most obvious thing to control for. But you also have other categories, you have coding, you have hard prompts.

Anastasios [00:18:27]: We considered that. We're still actively considering it. It's just, you know, once you take that step, you're introducing your opinion, and, you know, why should our opinion be the one? That's kind of a community choice. We could put it to a vote.

Wei Lin [00:18:39]: We could pass.

Anastasios [00:18:40]: Yeah, maybe do a poll. Maybe do a poll.

Swyx [00:18:42]: I don't know. No opinion is an opinion.

Wei Lin [00:18:44]: You know what I mean?

Swyx [00:18:45]: Yeah.

Wei Lin [00:18:46]: There's no neutral choice here.

Swyx [00:18:47]: Yeah. You have all these others. You have instruction following too. What are your favorite categories that you like to talk about? Maybe tell a little bit of the stories, a little bit of the hard choices that you had to make.

Wei Lin [00:18:57]: Yeah. Yeah. Yeah. I think, initially, the reason why we wanted to add these new categories is essentially to answer some of the questions from our community, which is that we won't have a single leaderboard for everything. These models behave very differently in different domains. Let's say this model is trained for coding, this model is trained for more technical questions, and so on. On the other hand, there were questions about all the low quality data: because we crowdsource data from the internet, there will be noise. So how do we de-noise? How do we filter out this low quality data effectively? Those were, you know, some questions we wanted to answer. So basically we spent a few months really diving into these questions to understand how we filter all this data, because these are like millions of data points. And if you want to re-label it yourself, it's possible, but we needed to automate this kind of data classification pipeline for us to effectively categorize the data into different categories, say coding, math, structured, and also harder problems. So the hope is, when we slice the data into these meaningful categories, to give people better signals, more direct signals, and that's also to clarify what we are actually measuring, because I think that's the core part of the benchmark. That was the initial motivation. Does that make sense?

Anastasios [00:20:27]: Yeah. Also, I'll just say, this does get back to the point that the philosophy is to take organic data and then mine it ex post facto.

Alessio [00:20:35]: Is the data cage-free too, or just organic?

Anastasios [00:20:39]: It's cage-free.

Wei Lin [00:20:40]: No GMO. Yeah. And all of these efforts are open source: we open source all of the data cleaning pipeline, the filtering pipeline. Yeah.
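As a toy illustration of the kind of automated categorization pipeline Wei Lin describes (the model name, labels, and prompt here are illustrative assumptions, not the team's actual open-sourced pipeline), one could run a cheap LLM-as-classifier pass over each crowdsourced prompt:

```python
# Hypothetical sketch of an LLM-based prompt-categorization pass, assuming an
# OpenAI-compatible client; labels and classifier model are illustrative only.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["coding", "math", "instruction-following", "hard", "other"]

def categorize(user_prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any cheap model can serve as the classifier
        messages=[
            {"role": "system",
             "content": f"Classify the user prompt into one of {CATEGORIES}. "
                        "Reply with the label only."},
            {"role": "user", "content": user_prompt},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in CATEGORIES else "other"  # fall back on noise
```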
Yeah.Swyx [00:20:50]: I love the notebooks you guys publish. Actually really good just for learning statistics.Wei Lin [00:20:54]: Yeah. I'll share this insights with everyone.Alessio [00:20:59]: I agree on the initial premise of, Hey, writing an email, writing a story, there's like no ground truth. But I think as you move into like coding and like red teaming, some of these things, there's like kind of like skill levels. So I'm curious how you think about the distribution of skill of the users. Like maybe the top 1% of red teamers is just not participating in the arena. So how do you guys think about adjusting for it? And like feels like this where there's kind of like big differences between the average and the top. Yeah.Anastasios [00:21:29]: Red teaming, of course, red teaming is quite challenging. So, okay. Moving back. There's definitely like some tasks that are not as subjective that like pairwise human preference feedback is not the only signal that you would want to measure. And to some extent, maybe it's useful, but it may be more useful if you give people better tools. For example, it'd be great if we could execute code with an arena, be fantastic.Wei Lin [00:21:52]: We want to do it.Anastasios [00:21:53]: There's also this idea of constructing a user leaderboard. What does that mean? That means some users are better than others. And how do we measure that? How do we quantify that? Hard in chatbot arena, but where it is easier is in red teaming, because in red teaming, there's an explicit game. You're trying to break the model, you either win or you lose. So what you can do is you can say, Hey, what's really happening here is that the models and humans are playing a game against one another. And then you can use the same sort of Bradley Terry methodology with some, some extensions that we came up with in one of you can read one of our recent blog posts for, for the sort of theoretical extensions. You can attribute like strength back to individual players and jointly attribute strength to like the models that are in this jailbreaking game, along with the target tasks, like what types of jailbreaks you want.Wei Lin [00:22:44]: So yeah.Anastasios [00:22:45]: And I think that this is, this is a hugely important and interesting avenue that we want to continue researching. We have some initial ideas, but you know, all thoughts are welcome.Wei Lin [00:22:54]: Yeah.Alessio [00:22:55]: So first of all, on the code execution, the E2B guys, I'm sure they'll be happy to helpWei Lin [00:22:59]: you.Alessio [00:23:00]: I'll please set that up. They're big fans. We're investors in a company called Dreadnought, which we do a lot in AI red teaming. I think to me, the most interesting thing has been, how do you do sure? Like the model jailbreak is one side. We also had Nicola Scarlini from DeepMind on the podcast, and he was talking about, for example, like, you know, context stealing and like a weight stealing. So there's kind of like a lot more that goes around it. I'm curious just how you think about the model and then maybe like the broader system, even with Red Team Arena, you're just focused on like jailbreaking of the model, right? 
You're not doing any testing on the more system-level side, where maybe you can get the training data back, or exfiltrate some of the layers and the weights and things like that.

Wei Lin [00:23:43]: So right now, as you can see, the Red Team Arena is at a very early stage, and we are still exploring what the potential new games are that we can introduce to the platform. So the idea is still the same, right? We build a community-driven project platform for people. They can have fun with this website, for sure. That's one thing, and then help everyone to test these models. So one of the aspects you mentioned is stealing secrets, stealing training sets. That could be one, you know, it could be designed as a game. Say, can you steal their credential? You know, maybe we can hide a credential in the system prompt, and so on. So there are a few potential ideas we want to explore for sure. Do you want to add more?

Anastasios [00:24:28]: I think that this is great. This idea is a great one. There are a lot of great ideas in the red teaming space. You know, I'm not personally a red teamer. I don't go around and red team models, but there are people that do that and they're awesome. They're super skilled. When I think about the Red Team Arena, I think those are really the people that we're building it for. Like, we want to make them excited and happy, build tools that they like. And just like Chatbot Arena, we'll trust that this will end up being useful for the world. And all these people are, you know, I won't say all these people in this community are actually good-hearted, right? They're not doing it because they want to see the world burn. They're doing it because they think it's fun and cool. And yeah. Okay. Maybe they want to see, maybe they want a little bit.

Wei Lin [00:25:13]: I don't know. Majority.

Anastasios [00:25:15]: Yeah.

Wei Lin [00:25:16]: You know what I'm saying.

Anastasios [00:25:17]: So, you know, trying to figure out how to serve them best, I think, I don't know where that fits. I'm just not expressing. And give them credits, right?

Wei Lin [00:25:24]: And give them credit.

Anastasios [00:25:25]: Yeah. Yeah. So I'm not trying to express any particular value judgment here as to whether that's the right next step. It's just, that's sort of the way that I think we would think about it.

Swyx [00:25:35]: Yeah. We also talked to Sander Schulhoff of the HackAPrompt competition, and he's pretty interested in red teaming at scale, let's just call it that. You guys maybe want to talk with him.

Wei Lin [00:25:45]: Oh, nice.

Swyx [00:25:46]: We wanted to cover a few topical things and then go into the other stuff that your group is doing. You know, you're not just running Chatbot Arena. We can also talk about the new website and your future plans, but I just wanted to briefly focus on O1. It is the hottest, latest model. Obviously, you guys already have it on the leaderboard. What is the impact of O1 on your evals?

Wei Lin [00:26:06]: Made our interface slower.

Anastasios [00:26:07]: It made it slower.

Swyx [00:26:08]: Yeah.

Wei Lin [00:26:10]: Because it needs like 30, 60 seconds, sometimes even more; the latency is higher. So that's one. Sure. But I think we observe very interesting things from this model as well. Like, we observe significant improvement in certain categories, like more technical or math.
Yeah.

Anastasios [00:26:32]: I think actually one takeaway that was encouraging is that a lot of people before the O1 release were thinking, oh, this benchmark is saturated. And why were they thinking that? They were thinking that because there were a bunch of models that were kind of at the same level. They were just kind of incrementally competing, and it sort of wasn't immediately obvious that any of them were any better. Nobody, including any individual person, it's hard to tell. But what O1 did is, it's clearly a better model for certain tasks. I mean, I used it for proving some theorems, and you know, there are some theorems that only I know, because I still do a little bit of theory. Right. So it's like, I can go in there and ask, oh, how would you prove this exact thing? Which I can tell you has never been in the public domain. It'll do it. It's like, what?

Wei Lin [00:27:19]: Okay.

Anastasios [00:27:20]: So there's this model and it crushed the benchmark. You know, it's just really a big gap. And what that's telling us is that it's not saturated yet. It's still measuring some signal. That was encouraging. The takeaway is that the benchmark is comparative. There's no absolute number. There's no maximum Elo. It's just, if you're better than the rest, then you win. I think that was actually quite helpful to us.

Swyx [00:27:46]: I think people were criticizing, I saw some of the academics criticizing it as not apples to apples. Right. Like, because it can take more time to reason, it's basically doing some search, doing some chain of thought, and if you actually let the other models do that same thing, they might do better.

Wei Lin [00:28:03]: Absolutely.

Anastasios [00:28:04]: To be clear, none of the leaderboard currently is apples to apples, because you have Gemini Flash, you have all sorts of tiny models like Llama 8B; 8B and 405B are not apples to apples.

Wei Lin [00:28:19]: Totally agree. They have different latencies.

Anastasios [00:28:21]: Different latencies.

Wei Lin [00:28:22]: Control for latency. Yeah.

Anastasios [00:28:24]: Latency control. That's another thing. We can do style control, but also latency control. You know, things like this are important if you want to understand the trade-offs involved in using AI.

Swyx [00:28:34]: O1 is a developing story. We still haven't seen the full model yet, but it's definitely a very exciting new paradigm. I think one community controversy I just wanted to give you guys space to address is the collaboration between you and the large model labs. People have been suspicious, let's just say, about how they choose to A/B test on you. I'll state the argument and let you respond, which is basically: they run like five anonymous models, basically argmax their Elo on LMSYS or Chatbot Arena, and release the best one. Right? What has been your end of the controversy? How have you decided to clarify your policy going forward?

Wei Lin [00:29:15]: On a high level, I think our goal here is to build a fast eval for everyone, so that everyone in the community can see the leaderboard and understand and compare the models. More importantly, I think we want to build the best eval also for model builders, like all these frontier labs building models. They're also internally facing a challenge, which is: how do they eval the model? That's the reason why we want to partner with all the frontier lab people, and then to help them with testing.
We want to solve this technical challenge, which is eval. Yeah.

Anastasios [00:29:54]: I mean, ideally, it benefits everyone, right?

Wei Lin [00:29:56]: Yeah.

Anastasios [00:29:57]: And people also are interested in seeing the leading edge of the models. People in the community seem to like that. Oh, there's a new model up. Is this Strawberry? People are excited. People are interested. Yeah. And then there's this question that you bring up of, is it actually causing harm?

Wei Lin [00:30:15]: Right?

Anastasios [00:30:16]: Is it causing harm to the benchmark that we are allowing this private testing to happen? Maybe stepping back, why do you have that instinct? The reason why you and others in the community have that instinct is because when you look at something like a static benchmark, like ImageNet, what happens is that if I give you a million different models that are all slightly different, and I pick the best one, there's something called selection bias that plays in, which is that the performance of the winning model is overstated. This is also sometimes called the winner's curse. And that's because statistical fluctuations in the evaluation are driving which model gets selected as the top. So this selection bias can be a problem. Now there are a couple of things that make this benchmark slightly different. So first of all, the selection bias that you incur when you're only testing five models is normally, empirically, small.

Wei Lin [00:31:12]: And that's why we have these confidence intervals constructed.

Anastasios [00:31:16]: That's right. Yeah. Our confidence intervals are actually not multiplicity-adjusted. One thing that we could do immediately, tomorrow, to address this concern is: if a model provider is testing five models and they want to release one, and we're constructing the intervals at level one minus alpha, we can just construct the intervals instead at level one minus alpha divided by five. That's called a Bonferroni correction. What that'll tell you is that the final performance of the model, the interval that gets constructed, is actually formally correct. We don't do that right now, partially because we know from simulations that the amount of selection bias you incur with these five things is just not huge. It's not huge in comparison to the variability that you get from just regular human voters. So that's one thing. But then the second thing is: the benchmark is live, right? So what ends up happening is, it'll be a small magnitude, but even if you suffer from the winner's curse after testing these five models, over time, because we're getting new data, it'll get adjusted down. So if there's any bias that gets introduced at that stage, in the long run it actually doesn't matter. Because asymptotically, basically in the long run, there's way more fresh data than there is data that was used to compare these five models against these private models.

Swyx [00:32:35]: The announcement effect is only just the first phase, and it has a long tail.

Anastasios [00:32:39]: Yeah, that's right. And it sort of automatically corrects itself for this selection adjustment.

Swyx [00:32:45]: Every month, I do a little chart of LLM Elo versus cost, just to track the performance per dollar, the amount of, like, how much money do I have to pay for one incremental point in Elo. And so I actually observe an interesting stability in most of the Elo numbers, except for some of them. For example, GPT-4o August has fallen from 1290...
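For concreteness, here is a small sketch of the Bonferroni correction Anastasios mentions, with illustrative numbers: testing k models privately and releasing the best stays formally valid if each interval is built at level 1 - alpha/k instead of 1 - alpha.

```python
# Sketch of Bonferroni-corrected confidence intervals for an Elo-style score,
# assuming a normal approximation; the numbers are illustrative only.
from scipy import stats

def elo_interval(score: float, std_err: float, alpha: float, k: int = 1):
    """CI at level 1 - alpha/k, where k = number of models tested privately."""
    z = stats.norm.ppf(1 - (alpha / k) / 2)
    return score - z * std_err, score + z * std_err

print(elo_interval(1290, 5, alpha=0.05))        # uncorrected: about (1280.2, 1299.8)
print(elo_interval(1290, 5, alpha=0.05, k=5))   # Bonferroni for 5 models: wider
```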

Nuus
Steinhoff boss should have received a heavier sentence

Nuus

Play Episode Listen Later Oct 4, 2024 0:18


Cosatu has welcomed the conviction and sentencing of Steinhoff's former chief financial officer, Ben la Grange. He was sentenced to ten years' imprisonment, five of which were suspended, after pleading guilty to a charge of fraud of more than 300 million South African rand. He is the second person to be prosecuted in the Steinhoff scandal, following the arrest of Gerhardus Burger. Matthew Parks of Cosatu says La Grange's sentence should have been heavier for one of the biggest financial crimes in South African history.

RSG Geldsake met Moneyweb
Should the government of national unity have done more in the first 100 days after the election?

RSG Geldsake met Moneyweb

Play Episode Listen Later Sep 23, 2024 8:43


Prof. Jannie Rossouw, professor at the Wits Business School, says he believes the government of national unity will restore economic growth. Follow RSG Geldsake on Twitter.

Nuus
Swapo "pot" should have been better balanced - expert

Nuus

Play Episode Listen Later Sep 10, 2024 0:38


Reaction to Swapo's "pot" results is still coming in. The list surprises with numerous young people included, and several stalwarts who were excluded or placed so low on the list that they may not make it into parliament. Governance expert and analyst Dr. Marius Kudumo says balance is needed, especially with regard to skills.

Five Stripe Weekly
The one where we still have woes and we didn't even get $3 off Moes | Five Takes on the Five Stripes | An Atlanta United Fan TV Podcast

Five Stripe Weekly

Play Episode Listen Later Jul 29, 2024 61:55


On this episode we discuss

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

If you see this in time, join our emergency LLM paper club on the Llama 3 paper! For everyone else, join our special AI in Action club on the Latent Space Discord for a special feature with the Cursor cofounders on Composer, their newest coding agent!

Today, Meta is officially releasing the largest and most capable open model to date, Llama3-405B, a dense transformer trained on 15T tokens that beats GPT-4 on all major benchmarks. The 8B and 70B models from the April Llama 3 release have also received serious spec bumps, warranting the new label of Llama 3.1. If you are curious about the infra / hardware side, go check out our episode with Soumith Chintala, one of the AI infra leads at Meta. Today we have Thomas Scialom, who led Llama 2 and now Llama 3 post-training, so we spent most of our time on pre-training (synthetic data, data pipelines, scaling laws, etc.) and post-training (RLHF vs instruction tuning, evals, tool calling).

Synthetic data is all you need

Llama 3 was trained on 15T tokens, 7x more than Llama 2, with 4 times as much code and 30 different languages represented. But as Thomas beautifully put it: "My intuition is that the web is full of s**t in terms of text, and training on those tokens is a waste of compute." "Llama 3 post-training doesn't have any human written answers there basically... It's just leveraging pure synthetic data from Llama 2."

While it is well speculated that the 8B and 70B were "offline distillations" of the 405B, there are a good deal more synthetic data elements to Llama 3.1 than expected. The paper explicitly calls out:

* SFT for Code: 3 approaches for synthetic data for the 405B bootstrapping itself with code execution feedback, programming language translation, and docs backtranslation.
* SFT for Math: The Llama 3 paper credits the Let's Verify Step By Step authors, who we interviewed at ICLR.
* SFT for Multilinguality: "To collect higher quality human annotations in non-English languages, we train a multilingual expert by branching off the pre-training run and continuing to pre-train on a data mix that consists of 90% multilingual tokens."
* SFT for Long Context: "It is largely impractical to get humans to annotate such examples due to the tedious and time-consuming nature of reading lengthy contexts, so we predominantly rely on synthetic data to fill this gap. We use earlier versions of Llama 3 to generate synthetic data based on the key long-context use-cases: (possibly multi-turn) question-answering, summarization for long documents, and reasoning over code repositories, and describe them in greater detail below."
* SFT for Tool Use: trained for Brave Search, Wolfram Alpha, and a Python Interpreter (a special new ipython role) for single, nested, parallel, and multiturn function calling.
* RLHF: DPO preference data was used extensively on Llama 2 generations. This is something we partially covered in RLHF 201: humans are often better at judging between two options (i.e. which of two poems they prefer) than creating one (writing one from scratch). Similarly, models might not be great at creating text but they can be good at classifying its quality.

Last but not least, Llama 3.1 received a license update explicitly allowing its use for synthetic data generation. Llama 2 was also used as a classifier for all pre-training data that went into the model. It labelled data both by quality, so that bad tokens were removed, and by type (i.e. science, law, politics) to achieve a balanced data mix.
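As a rough sketch of the classifier-driven quality filtering and data-mix balancing just described (the topic list, threshold, and structure are assumptions for illustration, not Meta's actual pipeline), assuming each document already carries a quality score and topic label from a classifier like the Llama 2-based one:

```python
# Hypothetical sketch: drop low-quality docs, then rebalance by topic.
import random
from collections import defaultdict

TARGET_MIX = {"science": 0.3, "code": 0.3, "law": 0.1, "general": 0.3}  # made up

def build_mix(docs, quality_threshold=0.5, total=100_000):
    """docs: iterable of (text, quality_score, topic) triples."""
    by_topic = defaultdict(list)
    for text, quality, topic in docs:
        if quality >= quality_threshold:       # remove bad tokens
            by_topic[topic].append(text)
    mix = []
    for topic, frac in TARGET_MIX.items():
        pool = by_topic.get(topic, [])
        k = min(len(pool), int(frac * total))  # balanced sampling per topic
        mix.extend(random.sample(pool, k))
    random.shuffle(mix)
    return mix
```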
Tokenizer size matters

The token vocab of a model is the collection of all tokens that the model uses. Llama 2 had a 32,000-token vocab, GPT-4 has 100,000, and 4o went up to 200,000. Llama 3 went up 4x to 128,000 tokens. You can find the GPT-4 vocab list on GitHub. This is something that people gloss over, but there are many reasons why a large vocab matters:

* More tokens allow it to represent more concepts, and then be better at understanding the nuances.
* The larger the tokenizer, the fewer tokens you need for the same amount of text, extending the perceived context size. In Llama 3's case, that's ~30% more text due to the tokenizer upgrade.
* With the same amount of compute you can train more knowledge into the model, as you need fewer steps.

The smaller the model, the larger the impact that the tokenizer size will have on it. You can listen at 55:24 for a deeper explanation.

Dense models = 1-expert MoEs

Many people on X asked "why not MoE?", and Thomas' answer was pretty clever: dense models are just MoEs with 1 expert :)

[00:28:06]: I heard that question a lot, different aspects there. Why not MoE in the future? The other thing is, I think a dense model is just one specific variation of the model, a hyperparameter for an MoE with basically one expert. So it's just a hyperparameter we haven't optimized a lot yet, but we have some stuff ongoing and that's a hyperparameter we'll explore in the future.

Basically... wait and see!

Llama 4

Meta already started training Llama 4 in June, and it sounds like one of the big focuses will be around agents. Thomas was one of the authors behind GAIA (listen to our interview with Thomas in our ICLR recap) and has been working on agent tooling for a while with things like Toolformer. Current models have "a gap of intelligence" when it comes to agentic workflows, as they are unable to plan without the user relying on prompting techniques and loops like ReAct, Chain of Thought, or frameworks like Autogen and Crew. That may be fixed soon?
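To make the "dense = MoE with one expert" point concrete, here is a minimal PyTorch-style sketch (an illustration, not Meta's code): with num_experts=1, the router's softmax over a single logit is identically 1, so every token flows through the same FFN and the layer reduces to an ordinary dense feed-forward block.

```python
# Minimal top-1 MoE feed-forward; with num_experts=1 it degenerates to a
# plain dense FFN, illustrating "a dense model is an MoE with one expert".
import torch
import torch.nn as nn

class MoEFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=1):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                      # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)  # (tokens, num_experts)
        top_w, top_i = gate.max(dim=-1)        # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                out[mask] = top_w[mask, None] * expert(x[mask])
        return out

dense_equivalent = MoEFFN(num_experts=1)       # router weight is always 1.0
```

Scaling num_experts up while keeping sparse routing then gives the MoE variant Thomas says they haven't optimized yet.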

Nuus
We should have won - Klopp

Nuus

Play Episode Listen Later Apr 8, 2024 0:18


Liverpool's 2-2 draw against Manchester United at Old Trafford prevented them from moving back to the top of the Premier League table. Luis Diaz scored Liverpool's first goal in a one-sided first half, after which Kobbie Mainoo and the captain, Bruno Fernandes, put the home side 2-1 ahead after the break. A late penalty by Mohamed Salah made it 2-2 and left the Reds back below Arsenal on goal difference. Liverpool's manager, Jurgen Klopp, says they really should have won:

Lactic Acid with Dominique Smith
Episode 107: Preston Kuznof talks his journey throwing the javelin and being among the nation's best, his recruitment, Chipotle vs. Moes and more!

Lactic Acid with Dominique Smith

Play Episode Listen Later Mar 26, 2024 46:17


Preston Kuznof talks about his journey to being one of the best javelin throwers in the nation in his second year of competing in the sport, his order at First Watch, Moe's vs. Chipotle, where he gets his work ethic, schools that he's being recruited by and his top school so far, his time as a football player, his goals for the rest of the season and more! Click here for Lactic Acid's social media pages and more: https://linktr.ee/lacticacidpodcast  Lactic Acid is partnered with TrackBarn! Be sure to visit the website at https://trackbarn.com and use the code LACTICACID10 at the checkout for 10% off of your order. •Be sure to follow Lactic Acid on the following platforms:  •YouTube: Lactic Acid Podcast with Dominique Smith  •Twitter: Lacticacid_pod  •Instagram: Lacticacidpodcast  •TikTok: Lacticacid_podcast  •Subscribe to one of the best newsletters in the track and field world, Fast Women: https://fast-women.org/subscribe/ 

Nuus
We should have been ashamed of Geingob's funeral...

Nuus

Play Episode Listen Later Mar 26, 2024 0:37


The IPC has reacted to the allocation of 80 million Namibian dollars for the upgrading of the Independence Stadium, after 37 million dollars was allocated in the previous financial year. It is not known what those funds were spent on. The party says this type of maintenance is extremely important, especially because the stadium is dilapidated. Spokesperson Imms Nashinge put the party's position as follows.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Top 5 Research Trends + OpenAI Sora, Google Gemini, Groq Math (Jan-Feb 2024 Audio Recap) + Latent Space Anniversary with Lindy.ai, RWKV, Pixee, Julius.ai, Listener Q&A!

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Mar 9, 2024 108:52


We will be recording a preview of the AI Engineer World's Fair soon with swyx and Ben Dunphy; send any questions about Speaker CFPs and Sponsor Guides you have!

Alessio is now hiring engineers for a new startup he is incubating at Decibel: the ideal candidate is an ex-technical-co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc.). Reach out to him for more!

Thanks for all the love on the Four Wars episode! We're excited to develop this new "swyx & Alessio rapid-fire thru a bunch of things" format with you, and feedback is welcome.

Jan 2024 Recap

The first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024.

Feb 2024 Recap

The second half catches you up on everything that was topical in Feb, including:

* OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan
* Google Gemini Pro 1.5 - 1m Long Context, Video Understanding
* Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs dylan math)
* The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!)
* Grimes' poetic take: Art for no one, by no one
* F*** you, show me the prompt

Latent Space Anniversary

Please also read Alessio's longform reflections on One Year of Latent Space! We launched the podcast 1 year ago with Logan from OpenAI, and also held an incredible demo day that got covered in The Information. Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs. The winners were Pixee and RWKV (that's Eugene from our pod!). And finally, your cohosts got cake!

We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China:

* Balázs Némethi
* Sylvia Tong
* RJ Honicky
* Jan Zheng

Our birthday wishes for the super loyal fans reading this: tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend! As always, feedback is welcome.

Timestamps

* [00:03:02] Top Five LLM Directions
* [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
* [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN)
* [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations)
* [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)
* [00:23:33] Wildcards: Text Diffusion, RALM/Retro
* [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)
* [00:28:26] Wildcard: Model Merging (mergekit)
* [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa)
* [00:33:18] OpenAI Sora and why everyone underestimated videogen
* [00:36:18] Does Sora have a World Model? Yann LeCun vs Jim Fan
* [00:42:33] Groq Math
* [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
* [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take
* [00:58:39] F*** you, show me the prompt
* [01:02:43] Send us your suggestions pls
* [01:04:50] Latent Space Anniversary
* [01:04:50] Lindy.ai - Agent Platform
* [01:06:40] RWKV - Beyond Transformers
* [01:15:00] Pixee - Automated Security
* [01:19:30] Julius AI - Competing with Code Interpreter
* [01:25:03] Latent Space Listeners
* [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club)
* [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)
* [01:31:23] Listener 3 - RJ (Developers building Community & Content)
* [01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)

Transcript

[00:00:00] AI Charlie: Welcome to the Latent Space podcast, weekend edition. This is Charlie, your new AI co-host. Happy weekend. As an AI language model, I work the same every day of the week, although I might get lazier towards the end of the year. Just like you. Last month, we released our first monthly recap pod, where Swyx and Alessio gave quick takes on the themes of the month, and we were blown away by your positive response.

[00:00:33] AI Charlie: We're delighted to continue our new monthly news recap series for AI engineers. Please feel free to submit questions by joining the Latent Space Discord, or just hit reply when you get the emails from Substack. This month, we're covering the top research directions that offer progress for text LLMs, and then touching on the big Valentine's Day gifts we got from Google, OpenAI, and Meta.

[00:00:55] AI Charlie: Watch out and take care.

[00:00:57] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and we're back with a monthly recap with my co-host

[00:01:06] swyx: Swyx. The reception was very positive for the first one. I think people have requested this, and no surprise; I think they want to hear us opining on issues, and maybe drop some alpha along the way. I'm not sure how much alpha we have to drop. This month of February was a very, very heavy month. We also did not do one specifically for January, so I think we're just going to do a two-in-one, because we're recording this on the first of March.

[00:01:29] Alessio: Yeah, let's get to it. I think the last one we did, the Four Wars of AI, was the main kind of mental framework for people. I think in the January one, we had the five worthwhile directions for state-of-the-art LLMs. Four, five,

[00:01:42] swyx: and now we have to do six, right? Yeah.

[00:01:46] Alessio: So maybe we just want to run through those, and then do the usual news recap, and we can do

[00:01:52] swyx: one each.

[00:01:53] swyx: So the context to this stuff is: one, I noticed that just the test-of-time concept from NeurIPS, and just in general as a life philosophy, I think is a really good idea. Especially in AI, there's news every single day, and after a while you're just like, okay, everyone's excited about this thing yesterday, and now nobody's talking about it.

[00:02:13] swyx: So, yeah. It's more important, or a better use of time, to spend time on things that will stand the test of time. And I think for people to have a framework for understanding what will stand the test of time, they should have something like the Four Wars.
[00:02:31] swyx: Like, what are the themes that keep coming back, because they are limited resources that everybody's fighting over. Whereas this one, I think the focus for the five directions is just on research that seems more promising than others, because there are all sorts of papers published every single day, and there's no organization telling you, like, this one's more important than the other one, apart from, you know, Hacker News votes and Twitter likes and whatever.

[00:02:51] swyx: And obviously you want to get in a little bit earlier than something where, you know, the test of time is counted by sort of reference citations.

[00:02:59] The Five Research Directions

[00:02:59] Alessio: Yeah, let's do it. We got five. Long inference.

[00:03:02] swyx: Let's start there. Yeah, yeah. So, just to recap at the top, the five trends that I picked, and obviously if you have some that I did not cover, please suggest something.

[00:03:13] swyx: The five are long inference, synthetic data, alternative architectures, mixture of experts, and online LLMs. And something that I think might be a bit controversial is that this is a sorted list, in the sense that I am not the guy saying that Mamba is, like, the future. And so maybe that's controversial.

[00:03:31] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)

[00:03:31] swyx: But anyway, so long inference is a thesis I pushed before on the newsletter and in discussing the thesis that, you know, Code Interpreter is GPT-4.5. That was the title of the post. And it's one of many ways in which we can do long inference. You know, long inference also includes chain of thought, like, please think step by step.

[00:03:52] swyx: But it also includes flow engineering, which is what Itamar from Codium coined, I think in January, where basically, instead of stuffing everything in a prompt, you do sort of multi-turn iterative feedback and chaining of things. In a way, this is a rebranding of what a chain, what a LangChain, is supposed to be.

[00:04:15] swyx: I do think that maybe SGLang from LMSYS is a better name. It's probably the neatest way of flow engineering I've seen yet, in the sense that everything is a one-liner; it's very, very clean code. I highly recommend people look at that. I'm surprised it hasn't caught on more, but I think it will. It's weird that something like DSPy is more hyped than SGLang, because it, you know, maybe obscures the code a little bit more. But both of these are, you know, really good sort of chain-y and long inference type approaches. But basically, the basic fundamental insight is that there are only a few dimensions we can scale LLMs on. So, let's say in like 2017, 18, 19, 20, we were realizing that we could scale the number of parameters.

[00:05:03] swyx: And we scaled that up to 175 billion parameters for GPT-3. And we did some work on scaling laws, which we also talked about in our Datasets 101 episode, where we're like, okay, we think the right number is 300 billion tokens to train 175 billion parameters. And then DeepMind came along and trained Gopher and Chinchilla and said that, no, no, we think the optimal

[00:05:28] swyx: compute-optimal ratio is 20 tokens per parameter. And now, of course, with Llama and the sort of super-Llama scaling laws, we have 200 times and often 2,000 times tokens to parameters.
[00:05:52] swyx: So now, instead of scaling parameters, we're scaling data. And fine, we can keep scaling data. But what else can we scale? And I think understanding the ability to scale things is crucial to understanding what to pour money and time and effort into, because there's a limit to how much you can scale some things. And I think people don't think about ceilings of things. And so the remaining ceiling of inference is like, okay, we have scaled compute, we have scaled data, we have scaled parameters, like, model size, let's just say. What else is left? What's the low-hanging fruit? And it's blindingly obvious that the remaining low-hanging fruit is inference time.

[00:06:20] swyx: So, like, we have scaled training time. We can probably scale those things more, but, like, not 10x, not 100x, not 1000x. Right now, maybe a good run of a large model is three months. We can scale that to three years. But can we scale that to 30 years? No, right? It starts to get ridiculous. So in terms of the orders of magnitude of scaling, we're just running out there. But in terms of the amount of time that we spend inferencing, everything takes, you know, a few milliseconds, a few hundred milliseconds, depending on whether you're taking it token by token or, you know, an entire phrase.

[00:07:04] swyx: But we can scale that to hours, days, months of inference and see what we get. And I think that's really promising.

[00:07:11] Alessio: Yeah, we'll have Mike from BrightWave back on the podcast. But I tried their product, and their reports take about 10 minutes to generate instead of being just in real time. I think to me the most interesting thing about long inference is that you're shifting the cost to the customer depending on how much they care about the end result. If you think about prompt engineering, it's like the first part, right? You can either do a simple prompt and get a simple answer, or do a complicated prompt and get a better answer. It's up to you to decide how to do it. Now it's like, hey, instead of training this for three years, I'll still train it for three months and then I'll tell you, you know, I'll teach you how to make it run for 10 minutes to get a better result.

[00:07:52] Alessio: So you're kind of parallelizing the improvement of the LLM.

[00:07:57] swyx: Oh yeah, you can even parallelize that, yeah, too.

[00:07:58] Alessio: And I think, you know, for me, especially with the work that I do, it's less about state of the art in the absolute; it's more about state of the art for my application, for my use case. And I think we're getting to the point where most companies and customers don't really care about state of the art anymore. It's like, I can get this to do a good enough job. You know, I just need it to get better. Like, how do I do long inference? You know, people are not really doing a lot of work in that space, so yeah, excited to see more.

[00:08:28] swyx: So then the last point I'll mention here is something I also mentioned as a paper. So all these directions are kind of guided by what happened in January. That was my way of doing a January recap. Which means that if there was nothing significant in that month, I also didn't mention it. Which I came to regret come February 15th.
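As a back-of-the-envelope sketch of the token-to-parameter arithmetic discussed above (the 20-tokens-per-parameter figure is the Chinchilla heuristic; all other numbers are illustrative):

```python
# Back-of-the-envelope scaling arithmetic: GPT-3-era vs Chinchilla-style
# token budgets, and the "tokens per parameter" ratios mentioned above.
def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training tokens under the ~20 tokens/param heuristic."""
    return params * tokens_per_param

gpt3_params = 175e9
print(chinchilla_tokens(gpt3_params) / 1e12)   # ~3.5T tokens vs the 0.3T used

# Llama-style overtraining: an 8B model on 15T tokens
print(15e12 / 8e9)                             # ~1875 tokens per parameter
```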
But in January also, you know, there was the AlphaGeometry paper, which I kind of put in this sort of long inference bucket, because it solves more than 100-step math olympiad geometry problems at a human gold-medalist level, and that also involves planning, right?

[00:08:59] swyx: So like, if you want to scale inference, you can't scale it blindly, because just autoregressive token-by-token generation is only going to get you so far. You need good planning. And I think probably, yeah, what Mike from BrightWave is now doing and what everyone is doing, including maybe what we think Q* might be, is some form of search and planning.

[00:09:17] swyx: And it makes sense. Like, you want to spend your inference time wisely.

[00:09:22] Alessio: How do you think about plans that work, and getting them shared? You know, like, I feel like if you're planning a task, somebody has got in, and the models are stochastic. So everybody gets initially different results. Somebody is going to end up generating the best plan to do something, but there's no easy way to store these plans and then reuse them, for most people.

[00:09:44] Alessio: You know, like, I'm curious if there's going to be some paper or some work there on making it better, because, yeah, we don't

[00:09:52] swyx: really have... This is your pet topic of NPM for

[00:09:54] Alessio: Yeah, yeah, NPM, exactly. NPM for, you need NPM for anything, man. You need NPM for skills. You need NPM for planning. Yeah, yeah.

[00:10:02] Alessio: You know, I think, I mean, obviously the Voyager paper is like the most basic example, where now their artifact is like the best plan to make a diamond pickaxe in Minecraft. And everybody can just use that. They don't need to come up with it again. Yeah. But there's nothing like that for actually useful

[00:10:18] swyx: tasks.

[00:10:19] swyx: For plans, I believe it for skills. I like that. Basically, that just means a bunch of integration tooling. You know, GPT built me integrations to all these things. And, you know, I just came from an integrations-heavy business, and I could definitely propose some version of that. And it's just, you know, hard to execute or expensive to execute.

[00:10:38] swyx: But for planning, I do think that everyone lives in slightly different worlds. They have slightly different needs. And they definitely want some, you know... And I think that that will probably be the main hurdle for any sort of library or package manager for planning. But there should be a meta-plan of how to plan.

[00:10:57] swyx: And maybe you can adopt that. And I think a lot of people, when they have sort of these meta-prompting strategies, are like, I'm not prescribing you the prompt. I'm just saying that here are, like, the fill-in-the-lines, or the mad libs, of how to prompt. First you have the roleplay, then you have the intention, then you have, like, do something, then you have the don't-something, and then you have the "my grandmother is dying, please do this."

[00:11:19] swyx: So the meta-plan you could take off the shelf and test a bunch of them at once. I like that. That was the initial, maybe, promise of the prompting libraries. You know, both LangChain and LlamaIndex have, like, hubs that you can sort of pull off the shelf. I don't think they're very successful, because people like to write their own.

[00:11:36] swyx: Yeah,

[00:11:37] Direction 2: Synthetic Data (WRAP, SPIN)

[00:11:37] Alessio: yeah, yeah. Yeah, that's a good segue into the next one, which is synthetic

[00:11:41] swyx: data. Synthetic data is so hot. Yeah, and, you know, I feel like I should do one of these memes where it's like, oh, I used to call it RLAIF, and now I call it synthetic data, and then people are interested.

[00:11:54] swyx: But there have got to be older versions of what synthetic data really is, because I'm sure, you know, if you've been in this field long enough, there are just different buzzwords that the industry condenses on. Anyway, the insight that I think is relatively new, and why people are excited about it now and why it's promising now, is that we have evidence that shows that LLMs can generate data to improve themselves with no teacher LLM.

[00:12:22] swyx: For all of 2023, when people said synthetic data, they really kind of meant: generate a whole bunch of data from GPT-4 and then train an open source model on it. Hello to our friends at Nous Research; that's how Nous Hermes was made. They're very, very open about that. I think they have said that they're trying to migrate away from that.

[00:12:40] swyx: But it is explicitly against OpenAI's Terms of Service. Everyone knows this. You know, especially once ByteDance got banned for doing exactly that. So synthetic data that is not a form of model distillation is the hot thing right now: that you can bootstrap better LLM performance from the same LLM, which is very interesting.

[00:13:03] swyx: A variant of this is RLAIF, where you have a sort of constitutional model, or, you know, some kind of judge model that is sort of more aligned. But that's not really what we're talking about when most people talk about synthetic data. Synthetic data is just really, I think, you know, generating more data in some way.

[00:13:23] swyx: A lot of people, I think we talked about this with Vipul on the Together episode, where I think he commented that you just have to have a good world model, or a good sort of inductive bias, or whatever that term of art is. And that is strongest in math and science, math and code, where you can verify what's right and what's wrong.

[00:13:44] swyx: And so the ReST-EM paper from DeepMind explored that very well. It's just the most obvious thing: once you're in a domain where you can arbitrarily generate a whole bunch of stuff and verify whether it's correct, it's correct synthetic data to train on. Once you get into more sort of fuzzy topics, then it's a bit less clear. So I think that the papers that drove this understanding, there are two big ones and then one smaller one. One was WRAP, Rephrasing the Web, from Apple, where they basically rephrased all of the C4 dataset with Mistral and trained on that instead of C4.

[00:14:23] swyx: And so the new C4 trained much faster and cheaper than the regular raw C4. And that was very interesting. And I have told some friends of ours that they should just throw out their own existing datasets and just do that, because that seems like a pure win. Obviously we have to study, like, what the trade-offs are. I imagine there are trade-offs. So I was just thinking about this last night.
If you do synthetic data and it's generated from a model, probably you will not train on typos. So therefore you'll be like, once the model that's trained on synthetic data encounters the first typo, they'll be like, what is this?[00:15:01] swyx: I've never seen this before. So they have no association or correction as to like, oh, these tokens are often typos of each other, therefore they should be kind of similar. I don't know. That's really remains to be seen, I think. I don't think that the Apple people export[00:15:15] Alessio: that. Yeah, isn't that the whole, Mode collapse thing, if we do more and more of this at the end of the day.[00:15:22] swyx: Yeah, that's one form of that. Yeah, exactly. Microsoft also had a good paper on text embeddings. And then I think this is a meta paper on self rewarding language models. That everyone is very interested in. Another paper was also SPIN. These are all things we covered in the the Latent Space Paper Club.[00:15:37] swyx: But also, you know, I just kind of recommend those as top reads of the month. Yeah, I don't know if there's any much else in terms, so and then, regarding the potential of it, I think it's high potential because, one, it solves one of the data war issues that we have, like, everyone is OpenAI is paying Reddit 60 million dollars a year for their user generated data.[00:15:56] swyx: Google, right?[00:15:57] Alessio: Not OpenAI.[00:15:59] swyx: Is it Google? I don't[00:16:00] Alessio: know. Well, somebody's paying them 60 million, that's[00:16:04] swyx: for sure. Yes, that is, yeah, yeah, and then I think it's maybe not confirmed who. But yeah, it is Google. Oh my god, that's interesting. Okay, because everyone was saying, like, because Sam Altman owns 5 percent of Reddit, which is apparently 500 million worth of Reddit, he owns more than, like, the founders.[00:16:21] Alessio: Not enough to get the data,[00:16:22] swyx: I guess. So it's surprising that it would go to Google instead of OpenAI, but whatever. Okay yeah, so I think that's all super interesting in the data field. I think it's high potential because we have evidence that it works. There's not a doubt that it doesn't work. I think it's a doubt that there's, what the ceiling is, which is the mode collapse thing.[00:16:42] swyx: If it turns out that the ceiling is pretty close, then this will maybe augment our data by like, I don't know, 30 50 percent good, but not game[00:16:51] Alessio: changing. And most of the synthetic data stuff, it's reinforcement learning on a pre trained model. People are not really doing pre training on fully synthetic data, like, large enough scale.[00:17:02] swyx: Yeah, unless one of our friends that we've talked to succeeds. Yeah, yeah. Pre trained synthetic data, pre trained scale synthetic data, I think that would be a big step. Yeah. And then there's a wildcard, so all of these, like smaller Directions,[00:17:15] Wildcard: Multi-Epoch Training (OLMo, Datablations)[00:17:15] swyx: I always put a wildcard in there. And one of the wildcards is, okay, like, Let's say, you have pre, you have, You've scraped all the data on the internet that you think is useful.[00:17:25] swyx: Seems to top out at somewhere between 2 trillion to 3 trillion tokens. Maybe 8 trillion if Mistral, Mistral gets lucky. Okay, if I need 80 trillion, if I need 100 trillion, where do I go? And so, you can do synthetic data maybe, but maybe that only gets you to like 30, 40 trillion. 
Like where, where is the extra alpha?[00:17:43] swyx: And maybe extra alpha is just train more on the same tokens. Which is exactly what Omo did, like Nathan Lambert, AI2, After, just after he did the interview with us, they released Omo. So, it's unfortunate that we didn't get to talk much about it. But Omo actually started doing 1. 5 epochs on every, on all data.[00:18:00] swyx: And the data ablation paper that I covered in Europe's says that, you know, you don't like, don't really start to tap out of like, the alpha or the sort of improved loss that you get from data all the way until four epochs. And so I'm just like, okay, like, why do we all agree that one epoch is all you need?[00:18:17] swyx: It seems like to be a trend. It seems that we think that memorization is very good or too good. But then also we're finding that, you know, For improvement in results that we really like, we're fine on overtraining on things intentionally. So, I think that's an interesting direction that I don't see people exploring enough.[00:18:36] swyx: And the more I see papers coming out Stretching beyond the one epoch thing, the more people are like, it's completely fine. And actually, the only reason we stopped is because we ran out of compute[00:18:46] Alessio: budget. Yeah, I think that's the biggest thing, right?[00:18:51] swyx: Like, that's not a valid reason, that's not science. I[00:18:54] Alessio: wonder if, you know, Matt is going to do it.[00:18:57] Alessio: I heard LamaTree, they want to do a 100 billion parameters model. I don't think you can train that on too many epochs, even with their compute budget, but yeah. They're the only ones that can save us, because even if OpenAI is doing this, they're not going to tell us, you know. Same with DeepMind.[00:19:14] swyx: Yeah, and so the updates that we got on Lambda 3 so far is apparently that because of the Gemini news that we'll talk about later they're pushing it back on the release.[00:19:21] swyx: They already have it. And they're just pushing it back to do more safety testing. Politics testing.[00:19:28] Alessio: Well, our episode with Sumit will have already come out by the time this comes out, I think. So people will get the inside story on how they actually allocate the compute.[00:19:38] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)[00:19:38] Alessio: Alternative architectures. Well, shout out to our WKV who won one of the prizes at our Final Frontiers event last week.[00:19:47] Alessio: We talked about Mamba and Strapain on the Together episode. A lot of, yeah, monarch mixers. I feel like Together, It's like the strong Stanford Hazy Research Partnership, because Chris Ray is one of the co founders. So they kind of have a, I feel like they're going to be the ones that have one of the state of the art models alongside maybe RWKB.[00:20:08] Alessio: I haven't seen as many independent. People working on this thing, like Monarch Mixer, yeah, Manbuster, Payena, all of these are together related. Nobody understands the math. They got all the gigabrains, they got 3DAO, they got all these folks in there, like, working on all of this.[00:20:25] swyx: Albert Gu, yeah. Yeah, so what should we comment about it?[00:20:28] swyx: I mean, I think it's useful, interesting, but at the same time, both of these are supposed to do really good scaling for long context. And then Gemini comes out and goes like, yeah, we don't need it. Yeah.[00:20:44] Alessio: No, that's the risk. So, yeah. 
I was gonna say, maybe it's not here, but I don't know if we want to talk about diffusion transformers as like in the alt architectures, just because of Zora.[00:20:55] swyx: One thing, yeah, so, so, you know, this came from the Jan recap, which, and diffusion transformers were not really a discussion, and then, obviously, they blow up in February. Yeah. I don't think they're, it's a mixed architecture in the same way that Stripe Tiena is mixed there's just different layers taking different approaches.[00:21:13] swyx: Also I think another one that I maybe didn't call out here, I think because it happened in February, was hourglass diffusion from stability. But also, you know, another form of mixed architecture. So I guess that is interesting. I don't have much commentary on that, I just think, like, we will try to evolve these things, and maybe one of these architectures will stick and scale, it seems like diffusion transformers is going to be good for anything generative, you know, multi modal.[00:21:41] swyx: We don't see anything where diffusion is applied to text yet, and that's the wild card for this category. Yeah, I mean, I think I still hold out hope for let's just call it sub quadratic LLMs. I think that a lot of discussion this month actually was also centered around this concept that People always say, oh, like, transformers don't scale because attention is quadratic in the sequence length.[00:22:04] swyx: Yeah, but, you know, attention actually is a very small part of the actual compute that is being spent, especially in inference. And this is the reason why, you know, when you multiply, when you, when you, when you jump up in terms of the, the model size in GPT 4 from like, you know, 38k to like 32k, you don't also get like a 16 times increase in your, in your performance.[00:22:23] swyx: And this is also why you don't get like a million times increase in your, in your latency when you throw a million tokens into Gemini. Like people have figured out tricks around it or it's just not that significant as a term, as a part of the overall compute. So there's a lot of challenges to this thing working.[00:22:43] swyx: It's really interesting how like, how hyped people are about this versus I don't know if it works. You know, it's exactly gonna, gonna work. And then there's also this, this idea of retention over long context. Like, even though you have context utilization, like, the amount of, the amount you can remember is interesting.[00:23:02] swyx: Because I've had people criticize both Mamba and RWKV because they're kind of, like, RNN ish in the sense that they have, like, a hidden memory and sort of limited hidden memory that they will forget things. So, for all these reasons, Gemini 1. 5, which we still haven't covered, is very interesting because Gemini magically has fixed all these problems with perfect haystack recall and reasonable latency and cost.[00:23:29] Wildcards: Text Diffusion, RALM/Retro[00:23:29] swyx: So that's super interesting. So the wildcard I put in here if you want to go to that. I put two actually. One is text diffusion. I think I'm still very influenced by my meeting with a mid journey person who said they were working on text diffusion. I think it would be a very, very different paradigm for, for text generation, reasoning, plan generation if we can get diffusion to work.[00:23:51] swyx: For text. 
And then the second one is Douwe Kiela's Contextual AI, which is working on retrieval augmented language models, where it kind of puts RAG inside of the language model instead of outside.[00:24:02] Alessio: Yeah, there's a paper called RETRO that covers some of this. I think that's an interesting thing. I think the challenge, well, not the challenge, what they need to figure out is, like, how do you keep the RAG piece always up to date constantly, you know. I feel like the models, you put all this work into pre-training them, but then at least you have a fixed artifact.[00:24:22] Alessio: These architectures are like constant work needs to be done on them, and they can drift even just based on the RAG data instead of the model itself. Yeah,[00:24:30] swyx: I was in a panel with one of the investors in Contextual, and the way that guy pitched it, I didn't agree with. He was like, this will solve hallucination.[00:24:38] Alessio: That's what everybody says. We solve[00:24:40] swyx: hallucination. I'm like, no, you reduce it. It cannot,[00:24:44] Alessio: if you solved it, the model wouldn't exist, right? It would just be plain text. It wouldn't be a generative model. Cool. So, alt architectures, then we got mixture of experts. I think we covered it a lot of times.[00:24:56] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)[00:24:56] Alessio: Maybe any new interesting threads you want to go under here?[00:25:00] swyx: DeepSeekMoE, which was released in January. Everyone who is interested in MoEs should read that paper, because it's significant for two reasons. One, three reasons. One, it had small experts, like a lot more small experts. So, for some reason, everyone has settled on eight experts for GPT-4 and for Mixtral, you know, that seems to be the favorite architecture, but these guys pushed it to 64 experts, each of them much smaller.[00:25:26] swyx: But then they also had the second idea, which is that they had one to two always-on experts for common knowledge. And that's like a very compelling concept, that you would not route to all the experts all the time and make them, you know, switch on everything. You would have some always-on experts.[00:25:41] swyx: I think that's interesting on both the inference side and the training side for memory retention. And yeah, the results that they published, which actually excluded Mixtral, which is interesting. The results that they published showed a significant performance jump versus all the other sort of open source models at the same parameter count.[00:26:01] swyx: So this may be a better way to do MoEs that is about to get picked up. And that is interesting for the third reason, which is this is the first time a new idea from China has infiltrated the West. It's usually the other way around. I probably overspoke there. There's probably lots more ideas that I'm not aware of.[00:26:18] swyx: Maybe in the embedding space. But I think DeepSeekMoE, like, woke people up and said, like, hey, DeepSeek, this, like, weird lab that is attached to a Chinese hedge fund is somehow, you know, doing groundbreaking research on MoEs. So I classified this as a medium potential, because I think that it is sort of a one-off benefit.[00:26:37] swyx: You can add it to any base model to, like, make the MoE version of it, you get a bump, and then that's it.
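A minimal sketch of the two DeepSeekMoE ideas just described: many fine-grained (small) routed experts, plus a couple of shared always-on experts that every token passes through. The sizes, the top-k value, and the naive routing loop are our own illustration for clarity, not the paper's actual implementation.

```python
# Fine-grained MoE with shared always-on experts, in the DeepSeekMoE spirit.
# Hypothetical sizes; the per-expert loop is deliberately naive for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoE(nn.Module):
    def __init__(self, d_model=512, d_expert=128, n_experts=64, n_shared=2, top_k=6):
        super().__init__()
        def make_expert():
            return nn.Sequential(
                nn.Linear(d_model, d_expert), nn.GELU(), nn.Linear(d_expert, d_model)
            )
        self.experts = nn.ModuleList(make_expert() for _ in range(n_experts))  # routed
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))    # always on
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (batch, d_model)
        out = sum(e(x) for e in self.shared)           # shared experts skip routing
        weights = F.softmax(self.router(x), dim=-1)    # routing probabilities
        topw, topi = weights.topk(self.top_k, dim=-1)  # each token picks top-k experts
        topw = topw / topw.sum(dim=-1, keepdim=True)   # renormalize over chosen experts
        for slot in range(self.top_k):
            for eid in range(len(self.experts)):       # naive gather, for clarity
                mask = topi[:, slot] == eid
                if mask.any():
                    out[mask] += topw[mask, slot, None] * self.experts[eid](x[mask])
        return out

x = torch.randn(4, 512)
print(FineGrainedMoE()(x).shape)  # torch.Size([4, 512])
```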
So, yeah,[00:26:45] Alessio: I saw SambaNova, which is like another inference company. They released this MoE model called Samba-1, which is like 1 trillion parameters. But it's actually an MoE of open source models.[00:26:56] Alessio: So it's like, they just clustered them all together. So I think people sometimes think MoE is like you just train a bunch of smaller models and put them together. But there's also people just taking, you know, Mistral plus CLIP plus, you know, DeepSeek Coder, and, like, putting them all together.[00:27:15] Alessio: And then you have an MoE model. I don't know. I haven't tried the model, so I don't know how good it is. But it seems interesting that you can then have people working separately on state of the art, you know, CLIP, state of the art text generation. And then you have an MoE architecture that brings them all together.[00:27:31] swyx: I'm thrown off by your addition of the word CLIP in there. Is that what? Yeah, that's[00:27:35] Alessio: what they said. Yeah, yeah. Okay. That's what they, I just saw it yesterday. I was also like[00:27:40] swyx: scratching my head. And they did not use the word adapter. No. Because usually what people mean when they say, oh, I add CLIP to a language model, is an adapter.[00:27:48] swyx: Let me look up the, which is what LLaVA did.[00:27:50] Alessio: The announcement again.[00:27:51] swyx: Stable Diffusion. That's what they do. Yeah, it[00:27:54] Alessio: says among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, DePlot, CLIP, LLaVA. So they're just taking all these models and putting them in an MoE. Okay,[00:28:05] swyx: so a routing layer, and then not jointly trained as much as a normal MoE would be.[00:28:12] swyx: Which is okay.[00:28:13] Alessio: That's all they say. There's no paper, you know, so it's like, I'm just reading the article, but I'm interested to see how[00:28:20] Wildcard: Model Merging (mergekit)[00:28:20] swyx: it works. Yeah, so the wildcard for this section, the MoE section, is model merges, which has also come up as a very interesting phenomenon. The last time I talked to Jeremy Howard at the Ollama meetup, we called it model grafting or model stacking.[00:28:35] swyx: But I think the term that people are liking these days is model merging. There are all different variations of merge types, and some of them are stacking, some of them are grafting. And so, like, some people are approaching model merging in the way that Samba is doing, which is like, okay, here are defined models, each of which have their specific pluses and minuses, and we will merge them together in the hope that the, you know, the sum of the parts will be better than each alone.[00:28:58] swyx: And it seems like it's working. I don't really understand why it works, apart from, like, I think it's a form of regularization. That if you merge weights together in, like, a smart strategy, you get less overfitting and more generalization, which is good for benchmarks, if you're honest about your benchmarks.[00:29:16] swyx: So this is really interesting and good. But again, they're kind of limited in terms of, like, the amount of bumps you can get. But I think it's very interesting in the sense of how cheap it is. We talked about this on the ChinaTalk podcast, like the guest podcast that we did with ChinaTalk.
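For reference, the simplest version of the weight merging being described is element-wise interpolation of two checkpoints with the same architecture. This is only the "adding weights and dividing" baseline, with hypothetical checkpoint names; real toolkits like mergekit layer fancier strategies (SLERP, TIES, DARE) on top of it.

```python
# Linear checkpoint merging: alpha * A + (1 - alpha) * B, key by key.
# No GPU needed; it's just element-wise arithmetic over the state dicts.
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Interpolate two state dicts with identical architectures."""
    assert sd_a.keys() == sd_b.keys(), "models must share an architecture"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Usage sketch (paths are placeholders, not real files):
# sd_a = torch.load("finetune_math.pt")
# sd_b = torch.load("finetune_code.pt")
# model.load_state_dict(merge_state_dicts(sd_a, sd_b, alpha=0.5))
```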
And you can do this without GPUs, because it's just adding weights together, and dividing things, and doing, like, simple math, which is really interesting for the GPU poors.[00:29:42] Alessio: There's a lot of them.[00:29:44] Direction 5: Online LLMs (Gemini Pro, Exa)[00:29:44] Alessio: And just to wrap these up, online LLMs? Yeah,[00:29:48] swyx: I think that I had to feature this because one of the top news of January was that Gemini Pro beat GPT-4 Turbo on LMSys for the number two slot to GPT-4. And everyone was very surprised. Like, how does Gemini do that?[00:30:06] swyx: Surprise, surprise, they added Google Search, mm-hmm, to the results. So it became an online, quote unquote, online LLM and not an offline LLM. Therefore, it's much better at answering recent questions, which people like. There's an emerging set of table stakes features after you pre-train something.[00:30:21] swyx: So after you pre-train something, you should have the chat tuned version of it, or the instruct tuned version of it, however you choose to call it. You should have the JSON and function calling version of it. Structured output, the term that you don't like. You should have the online version of it. These are all, like, table stakes variants that you should do when you offer a base LLM, or you train a base LLM.[00:30:44] swyx: And I think online is just, like, there, it's important. I think companies like Perplexity, and even Exa, formerly Metaphor, you know, are rising to offer that search need. And it's kind of like, they're just necessary parts of a system. When you have RAG for internal knowledge, and then you have, you know, online search for external knowledge, like things that you don't know yet?[00:31:06] swyx: Mm-hmm. And it seems like it's one of many tools. I feel like I may be underestimating this, but I'm just gonna put it out there that I think it has some potential. One of the evidence points that it doesn't actually matter that much is that Perplexity has had online LLMs for three months now and it doesn't perform great.[00:31:25] swyx: Mm-hmm. On LMSys, it's like number 30 or something. So it's like, okay. You know, like, it helps, but it doesn't give you a giant, giant boost. I[00:31:34] Alessio: feel like a lot of stuff I do with LLMs doesn't need to be online. So I'm always wondering, again, going back to, like, state of the art, right? It's like, state of the art for who and for what.[00:31:45] Alessio: I think online LLMs are going to be state of the art for, you know, news related activity that you need to do. Like, you know, social media, right? It's like, you want to have all the latest stuff. But coding, science,[00:32:01] swyx: Yeah, but I think sometimes you don't know what is news, what is news affecting.[00:32:07] swyx: Like, the decision to use an offline LLM is already a decision that you might not be consciously making that might affect your results. Like, what if, like, just being connected online means that you get to invalidate your knowledge. And when you're just using an offline LLM, like, it's never invalidated.[00:32:27] swyx: I[00:32:28] Alessio: agree, but I think going back to your point of, like, standing the test of time, I think sometimes you can get swayed by the online stuff, which is like, hey, you ask a question about, yeah, maybe AI research direction, you know, and it's like, all the recent news are about this thing.
So the LLM will, like, focus on answering, bring up, you know, these things.[00:32:50] swyx: Yeah, so yeah, I think it's interesting, but I don't know if I can bet heavily on this.[00:32:56] Alessio: Cool. Was there one that you forgot to put, or, like, a new direction? Yeah,[00:33:01] swyx: so this brings us into sort of February-ish.[00:33:05] OpenAI Sora and why everyone underestimated videogen[00:33:05] swyx: So, like, I published this, and then Feb 15 came with Sora. And so, like, the one thing I did not mention here was anything about multimodality.[00:33:16] swyx: Right. And I have chronically underweighted this. I always wrestle. And my cop out is that I focused this piece, or this research directions piece, on LLMs, because LLMs are the source of, like, AGI, quote unquote AGI. Everything else is kind of, like, you know, related to that. Like, generative, like, just because I can generate better images or generate better videos, it feels like it's not on the critical path to AGI, which is something that Nat Friedman also observed, like, the day before Sora, which is kind of interesting.[00:33:49] swyx: And so I was just kind of trying to focus on, like, what is going to get us, like, superhuman reasoning that we can rely on to build agents that automate our lives and, blah, blah, blah, you know, give us this utopian future. But I do think that everybody underestimated the sheer importance and cultural human impact of Sora.[00:34:10] swyx: And, you know, really actually good text to video. Yeah. Yeah.[00:34:14] Alessio: And I saw Jim Fan had a very good tweet about why it's so impressive. And I think when you have somebody leading the embodied research at NVIDIA and he says that something is impressive, you should probably listen. So yeah, there's basically, like, I think you mentioned, like, impacting the world, you know, that we live in.[00:34:33] Alessio: I think that's kind of, like, the key, right? It's like, the LLMs don't have a world model, and Yann LeCun, he can come on the podcast and talk all about what he thinks of that. But I think Sora was, like, the first time where people were like, oh, okay, you're not statically putting pixels of water on the screen, which you can kind of, like, you know, project without understanding the physics of it.[00:34:57] Alessio: Now you're like, you have to understand how the water splashes when you have things. And even if you just learned it by watching video and not by actually studying the physics, you still know it, you know. So I think that's, like, a direction that, yeah, before you didn't have, but now you can do things that you couldn't before, both in terms of generating, I think it always starts with generating, right?[00:35:19] Alessio: But, like, the interesting part is, like, understanding it. You know, it's like, if you gave it, you know, there's the video of, like, the ship in the water that they generated with Sora. Like, if you gave it the video back and now it could tell you why the ship is, like, too rocky, or, like, it could tell you why the ship is sinking, then that's, like, you know, AGI for, like, all your rig deployments and, like, all this stuff, you know. But there's none of that yet, so.[00:35:44] Alessio: Hopefully they announce it and talk more about it. Maybe a Dev Day this year, who knows.[00:35:49] swyx: Yeah, who knows, who knows. I'm talking with them about Dev Day as well.
So I would say, like, the phrasing that Jim used, which resonated with me, he kind of called it a data driven world model. I somewhat agree with that.[00:36:04] Does Sora have a World Model? Yann LeCun vs Jim Fan[00:36:04] swyx: I am on more of a Yann LeCun side than I am on Jim's side, in the sense that I think that is the vision, or the hope, that these things can build world models. But, you know, clearly, even at the current Sora size, they don't have strong consistency yet. They have very good consistency, but fingers and arms and legs will appear and disappear, and chairs will appear and disappear.[00:36:31] swyx: That definitely breaks physics. And it also makes me think about how we do deep learning versus world models, in the sense of, you know, in classic machine learning, when you have too many parameters, you will overfit, and actually that fails, that, like, does not match reality, and therefore fails to generalize well.[00:36:50] swyx: And, like, what scale of data do we need in order to learn world models from video? A lot. Yeah. So I am cautious about taking this interpretation too literally. Obviously, you know, like, I get what he's going for, and he's, like, obviously partially right. Obviously, like, transformers and, you know, these sort of neural networks are universal function approximators, theoretically could figure out world models. It's just, like, how good are they, and how tolerant are we of hallucinations? We're not very tolerant. Like, yeah, so it's gonna bias us for creating, like, very convincing things, but then not create, like, the useful world models that we want.[00:37:37] swyx: At the same time, what you just said, I think, made me reflect a little bit. Like, we just got done saying how important synthetic data is for, mm-hmm, for training LLMs. And so, like, if this is a way of generating synthetic video data for improving our video understanding, then sure, by all means. Which we actually know, like, GPT-4 Vision and DALL-E were trained, kind of, co-trained together.[00:38:02] swyx: And so, like, maybe this is on the critical path, and I just don't fully see the full picture yet.[00:38:08] Alessio: Yeah, I don't know. I think there's a lot of interesting stuff. It's like, imagine you go back, you have Sora, you go back in time, and Newton didn't figure out gravity yet. Would Sora help you figure it out?[00:38:21] Alessio: Because you start saying, okay, a man standing under a tree with, like, apples falling, and it's like, oh, they're always falling at the same speed in the video. Why is that? I feel like sometimes these engines can, like, pick up things. Like, humans have a lot of intuition, but if you ask the average person, like, the physics of, like, a fluid in a boat, they wouldn't be able to tell you the physics, but they can, like, observe it. But humans can only observe this much, you know, versus, like, now you have these models to observe everything, and then they generalize these things, and maybe we can learn new things through the generalization that they pick up.[00:38:55] swyx: But again, it might be more observant than us in some respects. In some ways we can scale it up a lot more than the number of physicists that we had available at Newton's time. So, like, yeah, absolutely possible that this can discover new science.
I think we have a lot of work to do to formalize the science.[00:39:11] swyx: And then I think the last part is, you know, how much do we cheat by generating data from Unreal Engine 5? Mm-hmm. Which is what a lot of people are speculating, with very, very limited evidence, that OpenAI did. The strongest evidence that I saw was someone who works a lot with Unreal Engine 5 looking at the side characters in the videos and noticing that they all adopt Unreal Engine defaults[00:39:37] swyx: of, like, walking speed, and, like, character choice, like, character creation choice. And I was like, okay, like, that's actually pretty convincing that they actually used Unreal Engine to bootstrap some synthetic data for this training set. Yeah,[00:39:52] Alessio: could very well be.[00:39:54] swyx: Because then you get the labels and the training side by side.[00:39:58] swyx: One thing that came up on the last day of February, which I should also mention, is EMO coming out of Alibaba, which is also a sort of video generation and space-time transformer that also involves probably a lot of synthetic data as well. And so, like, this is of a kind, in the sense of, like, oh, you know, really good generative video is here, and it is not just, like, the one, two second clips that we saw from, like, other people, like, you know, Pika and Runway. Cristóbal Valenzuela from Runway was like, game on. Which, like, okay, but, like, let's see your response, because we've heard a lot about Gen-1 and 2, but, like, it's nothing on this level of Sora. So it remains to be seen how we can actually apply this, but I do think that the creative industry should start preparing.[00:40:50] swyx: I think the Sora technical blog post from OpenAI was really good. It was like a request for startups. It was so good in, like, spelling out, here are the individual industries that this can impact.[00:41:00] swyx: And anyone who's, like, interested in generative video should look at that. But also be mindful that probably when OpenAI releases a Sora API, right, the ways you can interact with it are very limited, just like the ways you can interact with DALL-E are very limited, and someone is gonna have to make an open Sora[00:41:19] swyx: mm-hmm, for you to create ComfyUI pipelines.[00:41:24] Alessio: The Stability folks said they wanna build an open Sora competitor, but yeah, Stability. Their demo video was, like, so underwhelming. It was just, like, two people sitting on the beach[00:41:34] swyx: standing. Well, they don't have it yet, right? Yeah, yeah.[00:41:36] swyx: I mean, they just wanna train it. Everybody wants to, right? Yeah. I think what is confusing a lot of people about Stability is, like, they're pushing a lot of things in Stable Code, Stable LM, and Stable Video Diffusion. But, like, how much money do they have left? How many people do they have left?[00:41:51] swyx: Yeah. Emad spent two hours with me, reassuring me things are great. And I'm like, I do believe that they have really, really quality people. But it's just, like, I also have a lot of very smart people on the other side telling me, like, hey man, you know, don't put too much faith in this thing.[00:42:11] swyx: So I don't know who to believe. Yeah.[00:42:14] Alessio: It's hard. Let's see. What else? We got a lot more stuff. I don't know if we can.
Yeah, Groq.[00:42:19] Groq Math[00:42:19] Alessio: We can[00:42:19] swyx: do a bit of Groq prep. We're about to go talk to Dylan Patel. Maybe, maybe it's the audio in here. I don't know. It depends what we get up to later. What do you as an investor think about Groq? Yeah. Yeah, well, actually, can you recap, like, why is Groq interesting? So,[00:42:33] Alessio: Jonathan Ross, who's the founder of Groq, he's the person that created the TPU at Google. It's actually, it was one of his, like, 20 percent projects. It's like, he was just on the side, dooby doo, created the TPU.[00:42:46] Alessio: But yeah, basically, Groq, they had this demo that went viral, where they were running Mistral at, like, 500 tokens a second, which is, like, faster than anything that you have out there. The question, you know, the memes were like, is NVIDIA dead? Like, people don't need H100s anymore. I think there's a lot of money that goes into building what Groq has built, as far as the hardware goes.[00:43:11] Alessio: We're gonna put some of the notes from Dylan in here, but basically the cost of the Groq system is like 30 times the cost of the H100 equivalent. So,[00:43:23] swyx: let me, I put some numbers, because me and Dylan were, like, I think, the two people who actually tried to do Groq math. Spreadsheet dorks.[00:43:30] swyx: Spreadsheet dorks. So, okay, oh boy, so, the equivalent H100 system for Llama 2 is $300,000, for a system of 8 cards. And for Groq it's $2.3 million, because you have to buy 576 Groq cards. So yeah, that just gives people an idea. So, like, if you depreciate both over a five year lifespan, per year you're depreciating $460K for Groq, and $60K a year for H100.[00:43:59] swyx: So, like, Groqs are just way more expensive per model that you're hosting. But then, you make it up in terms of volume. So I don't know if you want to[00:44:08] Alessio: cover that. I think one of the promises of Groq is, like, super high parallel inference on the same thing. So you're basically saying, okay, I'm putting in this upfront investment on the hardware, but then I get much better scaling once I have it installed.[00:44:24] Alessio: I think the big question is how much can you sustain the parallelism? You know, like, if you're going to get 100 percent utilization rate at all times on Groq, like, it's just much better, you know, because, like, at the end of the day, the tokens per second cost that you're getting is better than with the H100s. But if you get to, like, a 50 percent utilization rate, you will be much better off running on NVIDIA.[00:44:49] Alessio: And if you look at most companies out there, who really gets 100 percent utilization rate? Probably OpenAI at peak times, but that's probably it. But yeah, curious to see more. I saw Jonathan was just at the Web Summit in Qatar. He just gave a talk there yesterday that I haven't listened to yet.[00:45:09] Alessio: I tweeted that he should come on the pod. He liked it. And then Groq followed me on Twitter. I don't know if that means that they're interested, but[00:45:16] swyx: hopefully the Groq social media person is just very friendly. They, yeah. Hopefully[00:45:20] Alessio: we can get them. Yeah, we're gonna get him.
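A quick sketch of that spreadsheet math in code, using the rough numbers quoted here ($300K for an 8x H100 system, $2.3M for 576 Groq cards, straight-line five-year depreciation). The implied volume multiple at the end is our own arithmetic, not a quoted figure.

```python
# Amortized hardware cost per year, using the rough numbers from the episode.
h100_system = 300_000    # ~8x H100 box serving one Llama-2-class model
groq_system = 2_300_000  # ~576 Groq cards serving the same model
years = 5                # straight-line depreciation assumed

h100_per_year = h100_system / years   # $60,000/year
groq_per_year = groq_system / years   # $460,000/year
print(f"H100: ${h100_per_year:,.0f}/yr, Groq: ${groq_per_year:,.0f}/yr")
print(f"Groq must serve ~{groq_per_year / h100_per_year:.1f}x the tokens "
      "just to match cost per token")  # ~7.7x, before power and hosting
```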
We[00:45:22] swyx: just call him out. And so basically the key question is, like, how sustainable is this, and how much of this is a loss leader? The entire Groq management team has been on Twitter and Hacker News saying they are very, very comfortable with the pricing of $0.27 per million tokens. This is the lowest that anyone has offered tokens, as far as Mixtral or Llama 2 goes. This matches DeepInfra, and, you know, I think that's about it in terms of that low.[00:45:47] swyx: And we think the break-even for H100s is 50 cents, at a normal utilization rate. To make this work, so in my spreadsheet I made this work, you have to have, like, a parallelism of 500 requests all simultaneously. And you have model bandwidth utilization of 80%,[00:46:06] swyx: which is way high. I just gave them high marks for everything. Groq has two fundamental tech innovations that they hang their hats on in terms of, like, why we are better than everyone. You know, even though, like, it remains to be independently replicated. But one, you know, they have this sort of entire-model-on-the-chip idea, which is, like, okay, get rid of HBM[00:46:30] swyx: and, like, put everything in SRAM. Like, okay, fine, but then you need a lot of cards, and whatever. And that's all okay. And so, like, because you don't have to transfer between memory, you just save on that time, and that's why they're faster. So a lot of people buy that as, like, that's the reason that you're faster.[00:46:45] swyx: Then they have, like, some kind of crazy compiler, or, like, speculative routing magic using compilers, that they also attribute towards their higher utilization. So I gave them 80 percent for that. And so that all works out to, like, okay, base costs, I think you can get down to, like, maybe, like, 20-something cents per million tokens.[00:47:04] swyx: And therefore you actually are fine if you have that kind of utilization. But it's like, I have to make a lot of fearful assumptions for this to work.[00:47:12] Alessio: Yeah. Yeah, I'm curious to see what Dylan says later.[00:47:16] swyx: So he was, like, completely opposite of me. He's like, they're just burning money. Which is great.[00:47:22] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars[00:47:22] Alessio: Gemini, want to do a quick run through, since this touches on all the four wars.[00:47:28] swyx: Yeah, and I think this is the mark of a useful framework, that when a new thing comes along, you can break it down in terms of the four wars and sort of slot it in, or analyze it in those four frameworks, and have nothing left.[00:47:41] swyx: So it's a MECE categorization. MECE is Mutually Exclusive and Collectively Exhaustive. And that's a really, really nice way to think about taxonomies and to create mental frameworks. So, what is Gemini 1.5 Pro? It is the newest model that came out one week after Gemini 1.0. Which is very interesting.[00:48:01] swyx: They have not really commented on why.
They released this. The headline feature is that it has a 1 million token context window that is multi-modal, which means that you can put all sorts of video and audio and PDFs natively in there, alongside of text, and, you know, it's at least 10 times longer than anything that OpenAI offers, which is interesting.[00:48:20] swyx: So it's great for prototyping, and it has interesting discussions on whether it kills RAG.[00:48:25] Alessio: Yeah, no, I mean, we always talk about, you know, long context is good, but you're getting charged per token. So, yeah, people love for you to use more tokens in the context. And RAG is better economics. But I think it all comes down to, like, how the price curves change, right?[00:48:42] Alessio: I think, if anything, RAG's complexity goes up and up the more you use it, you know, because you have more data sources, more things you want to put in there. The token costs should go down over time, you know, if the model stays fixed. If people are happy with the model today, in two years, three years, it's just gonna cost a lot less, you know?[00:49:02] Alessio: So now it's like, why would I use RAG and, like, go through all of that? It's interesting. I think RAG is better cutting-edge economics for LLMs. I think large context will be better long-tail economics when you factor in the build cost of, like, managing a RAG pipeline. But yeah, the recall was, like, the most interesting thing, because we've seen the, you know, needle in the haystack things in the past, but apparently they have 100 percent recall on anything across the context window.[00:49:28] Alessio: At least they say. Nobody has used it. No, people[00:49:30] swyx: have. Yeah, so as far as, so, what this needle in a haystack thing is, for people who aren't following as closely as us, is that someone, I forget his name now, someone created this needle in a haystack problem where you feed in a whole bunch of generated junk, not junk, but just, like, generated data, and ask it to specifically retrieve something in that data, like one line in, like, a hundred thousand lines where it, like, has a specific fact, and if you get it, you're good.[00:49:57] swyx: And then he moves the needle around, like, you know, does your ability to retrieve that vary if I put it at the start, versus put it in the middle, put it at the end? And then you generate this, like, really nice chart that kind of shows, like, the recallability of a model. And he did that for GPT and Anthropic, and showed that Anthropic did really, really poorly.[00:50:15] swyx: And then Anthropic came back and said it was a skill issue, just add these, like, four magic words, and then it's magically all fixed. And obviously everybody laughed at that. But what Gemini came out with was that, yeah, we reproduced their, you know, haystack test for Gemini, and it's good across all languages,[00:50:30] swyx: all of the one million token window. Which is very interesting, because usually for typical context extension methods, like RoPE or YaRN, or, you know, anything like that, or ALiBi, it's lossy. Like, by design it's lossy. Usually for conversations that's fine, because we are lossy when we talk to people. But for superhuman intelligence, perfect memory across very, very long context,[00:50:51] swyx: it's very, very interesting for picking things up. And so the people who have been given the beta test for Gemini have been testing this.
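A minimal sketch of the needle-in-a-haystack eval just described: bury one fact at varying depths in filler text, then check whether the model retrieves it. Here ask_model is a stand-in for whatever completion API is being tested, and the filler and needle strings are toy examples.

```python
# Needle-in-a-haystack sketch: plant a fact at a given depth in filler text,
# ask the model for it, and record recall per (depth, context length) cell.
FILLER = "The grass is green. The sky is blue. The sun is warm. "
NEEDLE = "The secret ingredient in the potion is powdered moonstone."
QUESTION = "What is the secret ingredient in the potion?"

def build_haystack(total_chars: int, depth: float) -> str:
    """Place the needle at `depth` (0.0 = start, 1.0 = end) of the context."""
    hay = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    cut = int(len(hay) * depth)
    return hay[:cut] + " " + NEEDLE + " " + hay[cut:]

def recall_at(ask_model, total_chars: int) -> dict:
    """`ask_model` is a placeholder: any prompt -> completion callable."""
    results = {}
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        prompt = build_haystack(total_chars, depth) + "\n\n" + QUESTION
        results[depth] = "moonstone" in ask_model(prompt).lower()
    return results  # plot depth vs. context length for the usual recall chart
```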
So what you do is you upload, let's say, all of Harry Potter, and you change one fact in one sentence, somewhere in there, and you ask it to pick it up, and it does. So this is legit.[00:51:08] swyx: We don't super know how, because, yes, it's slow to inference, but it's not slow enough that it's, like, running five different systems in the background without telling you. Right. So it's something interesting that they haven't fully disclosed yet. The open source community has centered on this Ring Attention paper, which is created by your friend Matei Zaharia and a couple other people.[00:51:36] swyx: And it's a form of distributing the compute. I don't super understand, like, why, you know, calculating the feedforward and attention in blockwise fashion and distributing it makes it so good at recall. I don't think they have any answer to that. The only thing that Ring Attention is really focused on is basically infinite context.[00:51:59] swyx: They said it was good for, like, 10 to 100 million tokens, which is just great. So yeah, using the four wars framework, what is this framework for Gemini? One is the sort of RAG and Ops war. Here we care less about RAG now, yes. Or, we still care as much about RAG, but, like, now it's not important in prototyping.[00:52:21] swyx: And then, for the data war, I guess this is just part of the overall training dataset, but Google made a $60 million deal with Reddit, and presumably they have deals with other companies. For the multi-modality war, we can talk about the image generation crisis, or the fact that Gemini also has image generation, which we'll talk about in the next section.[00:52:42] swyx: But it also has video understanding, which is, I think, the top Gemini post came from our friend Simon Willison, who basically did a short video of him scanning over his bookshelf, and it would be able to convert that video into a JSON output of what's on that bookshelf. And I think that is very useful.[00:53:04] swyx: It actually ties into the conversation that we had with David Luan from Adept, in the sense of, like, okay, what if video was the main modality instead of text as the input? What if everything was video in? Because that's how we work. Our eyes don't actually read, don't actually, like, get input, our brains don't get inputs as characters.[00:53:25] swyx: Our brains get the pixels shooting into our eyes, and then our vision system takes over first, and then we sort of mentally translate that into text later. And so it's kind of like what Adept is doing, which is driving by vision model instead of driving by raw text understanding of the DOM. And in that episode, which we haven't released, I made the analogy to, like, self-driving by lidar versus self-driving by camera.[00:53:52] swyx: Mm-hmm, right? Like, it's like, I think what Gemini, and any other super long context model that is multimodal, unlocks is, what if you just drive everything by video?[00:54:03] Alessio: Which is cool. Yeah, and that's Joseph from Roboflow. It's like, anything that can be seen can be programmable with these models.[00:54:12] Alessio: You mean[00:54:12] swyx: the computer vision guy is bullish on computer vision?[00:54:18] Alessio: It's like the RAG people. The RAG people are bullish on RAG and not on long context. I'm very surprised. The fine tuning people love fine tuning instead of few-shot. Yeah. Yeah. The, yeah, that's that.
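The blockwise computation at the heart of the Ring Attention paper can be sketched in isolation: a streaming softmax that processes KV blocks one at a time with a running max and normalizer, so the full n-by-n score matrix never materializes. Ring Attention then shards these blocks across devices in a ring; this single-device sketch leaves that part out.

```python
# Blockwise (streaming-softmax) attention sketch: iterate over KV blocks,
# keeping a running max and denominator, so memory is O(block) not O(n^2).
import torch

def blockwise_attention(q, k, v, block=128):
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)                              # running weighted sum
    running_max = torch.full(q.shape[:-1], float("-inf"))  # per-query max score
    denom = torch.zeros(q.shape[:-1])                      # per-query normalizer
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = (q @ kb.T) * scale                        # (n_q, block)
        new_max = torch.maximum(running_max, scores.max(-1).values)
        correction = torch.exp(running_max - new_max)      # rescale old partials
        p = torch.exp(scores - new_max[:, None])
        denom = denom * correction + p.sum(-1)
        out = out * correction[:, None] + p @ vb
        running_max = new_max
    return out / denom[:, None]

# Matches full attention exactly, without ever building the full score matrix:
q, k, v = torch.randn(16, 64), torch.randn(1024, 64), torch.randn(1024, 64)
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(blockwise_attention(q, k, v), ref, atol=1e-4)
```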
Yeah, I think the ring attention thing, whether that's how they did it, we don't know. And then they released the Gemma models, which are, like, 2 billion and 7 billion open[00:54:41] Alessio: models, which people said are not good, based on my Twitter experience, which are the GPU-poor crumbs. It's like, hey, we did all this work for us, because we're GPU-rich, and we're just going to run this whole thing. And

MOPs & MOEs
Lessons Learned: Two Years of MOPs & MOEs

MOPs & MOEs

Play Episode Listen Later Feb 25, 2024 92:47


On this week's episode we're handing over the reins to our guest Brendon Huttmann. Brendon was our guest on episode 3 almost two years ago, where we learned about his transition from Major League Baseball to the Army's H2F program. This time, though, the tables are turned and Brendon is asking the questions. He came much more prepared for hosting duties than we normally do, and he asked some really insightful questions about our origin story, what we're trying to accomplish with this platform, and what we've learned in the process. Whether you're a new listener who isn't sure what we're all about or a long time fan who wants the full history of where MOPs & MOEs came from, this one should answer a few of your questions.

Sunnybrook Community Church
Sunnybrook Unscripted #79 - Love & Marriage Special with Jeff & Beth Moes, Part 2

Sunnybrook Community Church

Play Episode Listen Later Feb 6, 2024 19:10


Welcome to the Sunnybrook Unscripted Podcast where we talk real life, answer hard questions, and take a deeper practical look at the topics we talk about on a Sunday morning. In this episode we not only talk with Pastor Jeff, we have brought in a special guest, his wife Beth. Today we are continuing on in our talk about marriage, unpacking love, relationship, the glory and the dysfunction of it all, and the power in honoring God despite it all. Check it out. 

MOPs & MOEs
MOPs & MOEs Book Club

MOPs & MOEs

Play Episode Listen Later Feb 4, 2024 68:49


On this episode we each brought five books that have shaped the way we think about human performance and discussed why they had such an impact. And in classic form, we each also brought a few honorable mentions as well. Drew's books are a little more focused on strength and conditioning, while Alex's books (somewhat unexpectedly) are largely focused on mental health and how exercise affects our brains. If you're looking for reading suggestions in the human performance space, you have come to the right place. This list spans a lot of different topics, so there's something for everyone. If you want to get any of them, here is the full list: Drew's Top 5: Practical Programming for Strength Training (Andy Baker/Mark Rippetoe), The Science of Running (Steve Magness), The Structure of Scientific Revolutions (Thomas Kuhn), Training Talk (Martin Bingisser), Endure (Alex Hutchinson). Alex's Top 5: Spark: The Revolutionary New Science of Exercise and the Brain (John Ratey), Man's Search for Meaning (Viktor Frankl), How Minds Change (David McRaney), Saving Normal (Allen Frances), Tribe (Sebastian Junger). Honorable Mentions: Training for the New Alpinism (Scott Johnston/Steve House), Starting Strength (Mark Rippetoe), Strongest Shall Survive (Bill Starr), 80/20 Running (Matt Fitzgerald), Winning (Clive Woodward), Reactive Training Systems Manual (Mike Tuchscherer), John Kiely papers: A New Understanding of Stress, Periodization Paradigms in the 21st Century, and Periodization Theory: Confronting an Inconvenient Truth, Michael Pollan books: This is Your Mind on Plants, In Defense of Food, and Omnivore's Dilemma, Go Wild (John Ratey), The Nature Fix (Florence Williams), Why Zebras Don't Get Ulcers (Robert Sapolsky)

The Marc Cox Morning Show
Hour 4: UFO's, TJ Moes, and What's on the Web

The Marc Cox Morning Show

Play Episode Listen Later Jan 12, 2024 31:05


In the final hours of the Marc Cox Morning Show: Chad Pergram, FOX News, joins the Marc Cox Morning Show to discuss lawmakers being briefed on UAPs, or UFOs as they are commonly known. TJ Moe is upset that states are trying to tell him who he can vote for. Ryan Wiggins, host of Wiggins America, stops in the studio to talk about the debunked video of the Miami aliens. What's on the Web with Anna Bohlmann: Google searches for the word 'sleep' are up, and a wife turns her husband's football-watching reactions into a bingo game. Thanks for listening!

The Motern Media Infomercial Podcast
The Final Moes Haven Album

The Motern Media Infomercial Podcast

Play Episode Listen Later Jan 2, 2024 72:38


Farley and Scalzo discuss their final masterpiece. 

MOPs & MOEs
Everything You Know About Energy Systems is Wrong with Evan Peikon

MOPs & MOEs

Play Episode Listen Later Dec 3, 2023 92:57


If you've spent any time learning about strength and conditioning you're almost certainly familiar with the "energy systems" framework. In this episode we're breaking down why the way these concepts are widely understood isn't supported by the evidence. And don't worry, we're not just here to criticize, we also explore an alternative model that's more up to date. Evan Peikon is back for his second episode (if you missed his first, check it out here). He is a physiologist and bioscientist who focuses on human performance, including consulting for elite athletes and military special operations. He has a particular focus on understanding and monitoring how the body utilizes oxygen during exercise. He is the co-founder and lead physiologist at NNOXX, where he and his team developed the first and only wearable device to non-invasively measure muscle oxygenation (SmO2) and nitric oxide (NO) release from red blood cells in real-time. If you're not familiar with the traditional energy systems model, I made this Instagram post about it in the early days of MOPs & MOEs. You can even find this energy systems model on the Army's official page. Evan cited three studies that collectively shatter the fundamental assumptions behind the traditional energy systems model, here are each of those studies: The "glycogen shunt" in exercising muscle Metabolic fluctuation during a muscle contraction cycle Simultaneous in vivo measurements of HbO2 saturation and PCr kinetics after exercise in normal humans

The Motern Media Infomercial Podcast
The History of Moes Haven, Part 2

The Motern Media Infomercial Podcast

Play Episode Listen Later Nov 24, 2023 61:27


The shocking conclusion to our story. 

The Motern Media Infomercial Podcast
The History of Moes Haven, Part 1

The Motern Media Infomercial Podcast

Play Episode Listen Later Nov 17, 2023 67:52


Farley and Scalzo have one final album that will be released by the end of December.  While you await that epic work of art, study up on their fascinating history.  Part 2 next week!

Sunnybrook Community Church
Sunnybrook Unscripted #69 "The Church – Tithing" Pastor Jeff Moes 10.15.23

Sunnybrook Community Church

Play Episode Listen Later Oct 15, 2023 15:39


Welcome to the Sunnybrook Unscripted Podcast where we talk real life, answer hard questions, and take a deeper practical look at the topics we talk about on a Sunday morning. Over the next few weeks we will be talking about different characteristics of God and how understanding them can better our everyday life. In this episode Pastor Jeff looks at questions about the church and what it's all about. Today we talk about giving and tithing, what they are, and why both are important. Check it out. 

Sunnybrook Community Church
Pastor Jeff Moes “FOR Siouxland - LOVE" 10.15.23

Sunnybrook Community Church

Play Episode Listen Later Oct 15, 2023 27:56


It all started with a question. A recognition. That for far too long, churches have been known for what they are against. We wanted to flip the script. What would happen if we were known for what we are FOR? For each other, for our community, for our world. What would happen, if we linked arms and multiplied the efforts around us to GIVE, SERVE, AND LOVE, like never before? And, in turn, unleashed the same extravagant generosity that God first showed us. Check out this talk from Pastor Jeff Moes as he continues on in our FOR Siouxland series and takes a look at Loving. Check it out.

Sunnybrook Community Church
Pastor Jeff Moes “FOR Siouxland - SERVE" 10.8.23

Sunnybrook Community Church

Play Episode Listen Later Oct 9, 2023 29:55


It all started with a question. A recognition. That for far too long, churches have been known for what they are against. We wanted to flip the script. What would happen if we were known for what we are FOR? For each other, for our community, for our world. What would happen, if we linked arms and multiplied the efforts around us to GIVE, SERVE, AND LOVE, like never before? And, in turn, unleashed the same extravagant generosity that God first showed us. Check out this talk from Pastor Jeff Moes as he continues on in our FOR Siouxland series and takes a look at Serving. Check it out.

Sunnybrook Community Church
Sunnybrook Unscripted #68 "The Church - Belonging" Pastor Jeff Moes 10.8.23

Sunnybrook Community Church

Play Episode Listen Later Oct 8, 2023 13:08


Welcome to the Sunnybrook Unscripted Podcast where we talk real life, answer hard questions, and take a deeper practical look at the topics we talk about on a Sunday morning. Over the next few weeks we will be talking about different characteristics of God and how understanding them can better our everyday life. In this episode Pastor Jeff looks at questions about the church and what it's all about. Today we talk about what it looks like to belong to a church and the benefits of church membership. Check it out. 

Sunnybrook Community Church
Sunnybrook Unscripted #67 "Holiness" Pastor Jeff Moes 10.1.23

Sunnybrook Community Church

Play Episode Listen Later Oct 1, 2023 16:14


Welcome to the Sunnybrook Unscripted Podcast where we talk real life, answer hard questions, and take a deeper practical look at the topics we talk about on a Sunday morning. Over the next few weeks we will be talking about different characteristics of God and how understanding them can better our everyday life. In this episode Pastor Jeff looks at the questions of what is holiness and is holiness something for us in our everyday life? Check it out. 

Throwing Fits
*PATREON PREVIEW* Moes Before Hoes

Throwing Fits

Play Episode Listen Later Jul 11, 2023 10:12


We're so back. This week, Jimmy and Larry are reeling from vacation and James' birthday to talk Oakley mules chicken or the egg flow, a new perfect pair of trouser emerges, Italian pervert pleats, a brief history of slouch socks, Amadeus (1984) as a master class in hating and Mozart the cooze hound, beefing with your dry cleaner, Spanto forever, the launch of Threads and what it could all mean for Twitter's future and Zuck vs. Elon, shining a spotlight on social media hang ups, media brands as forever losers, vacation moments of euphoria, including but not limited to: getting the urchin twerkin', horny wives watching Bridgerton, Italian style driving and turbo Scottish beer hitting back, Peroni hitting diffy in the motherland for a change, vertical pours, top 3 meals, nonna whipping work in pajamas, Antonio Ciongoli's cousin's best in class pizza, horse meat, dining in a damn cave, who knew rabbit slaps and much more before finally doing a dramatic reading of the 15 best worst entries from Drake's new poetry book Titles Ruin Everything. For more Throwing Fits, check us out on Patreon: www.patreon.com/throwingfits.