Podcasts about T5

  • 252 podcasts
  • 756 episodes
  • 46m avg. duration
  • 5 weekly new episodes
  • Latest: Jul 10, 2025

POPULARITY

2017–2024

Best podcasts about T5

Latest podcast episodes about T5

Tea & Trails
Tea & Trails Ultra LIVE Podcast #133

Tea & Trails

Jul 10, 2025 · 51:10


This week, we're thrilled to share the Tea & Trails Ultra Live Podcast, a truly special evening that brought together old friends, new faces, and plenty of laughter. We had the honour of sharing the stage with the brilliant Trish, Mel, Lucy, and Robyn, whose stories and energy lit up the room. The night was wrapped in something beautiful, as the ever-inspiring Running Reverend Lawrence Basham closed things out with a heartfelt message of togetherness. Yes, the audio may have had its hiccups, but the spirit of community was loud and clear. Thank you all so much for being part of it, and for making it unforgettable. See you next year for more trails, tea, and tales.

XMILES UK - Listeners now receive 10% of their order value back as store credit via the link below: https://xmiles.avln.me/c/RiwxnARvfHeR
Runderwear - Use code TEATRAILS15 for 15% off your order: https://runderwear.avln.me/c/GPVNMgMfYfLP
SHOKZ - Use code TEA102025 to receive £10 off: https://uk.shokz.com?sca_ref=7394994.MfsDQZBAeLQihi
Precision Fuel & Hydration - https://visit.pfandh.com/3GKxHjU
Precision Fuel & Hydration Planner - https://visit.pfandh.com/3RuP25z
Harrier - Use code TEA10 for 10% off: https://harrierrunfree.co.uk/
Fenixlight Limited - Use code T&T5 for 5% off your order: https://www.fenixlight.co.uk/
Protein Rebel - Use code Tea15 for 15% off your first order: https://proteinrebel.com/
Centurion Running - Use code TEAANDTRAILS10 to receive 10% off (excluding sale items): https://centurionrunning.com/
GOODR - Use code GOTEAANDTRAILS to receive 10% off your order: https://goodr.avln.me/c/VLEmsAIZCDtm
LIFE JACKET SKIN PROTECTION - Use code GOTYOURBACK for 10% off your first order: https://lifejacketskin.com/
PRIMUS UK - Use code TT-PRIMUS-20 for 20% off: https://primusuk.avln.me/c/kBWmOJaEiByD
Content may contain affiliate links which can help support and grow this channel at no extra cost to you. Thanks for your continued support!

Brew with the Coaches - CLICK HERE
Keeping Dry & Staying Warm - https://amzn.to/42JCexq
Fix Your Feet - https://amzn.to/3FE4nf0
Running Challenges by Keri Wallace - https://amzn.to/3KGdU7e
ROAR - https://amzn.to/3WU7xB2
NEXT LEVEL - https://amzn.to/3Hu15Lr
Ultra Trails - https://www.ultratrails.co.uk/
Hannah Walsh - https://www.hannahwalsh.co.uk/
Punk Panther - https://www.punkpanther.co.uk/
Pen Llyn Ultra - https://penllyn.niftyentries.com
Raw Adventures - https://www.raw-adventures.co.uk/

Tea & Trails
Luke Grenfell-Shaw - A Life In Tandem #132

Tea & Trails

Jul 6, 2025 · 55:54


Luke Grenfell-Shaw is a remarkable endurance athlete and cancer advocate whose story reads like a masterclass in resilience. Diagnosed with stage 4 sarcoma at just 24, he refused to let the label “incurable” define him. Instead, he coined the term “CanLiver”. He's cycled 30,000 km from Bristol to Beijing on a tandem bike, inviting hundreds of others, including fellow CanLivers, to join him along the way. He's since become a professional trail runner for Brooks, and he casually ran an 80-minute half marathon during chemotherapy.
https://www.alifeintandemfilm.com/
https://www.instagram.com/lukegshaw/

Tea & Trails
Tea & Trails Ultra Debrief #131

Tea & Trails

Jul 3, 2025 · 61:24


This week, we raise our mugs and reflect on the inaugural Tea & Trails Ultra, an unforgettable celebration of endurance, friendship, and the wild beauty of the Lake District.

Tea & Trails
Sean Conway - Ultra Endurance Athlete #130

Tea & Trails

Jun 29, 2025 · 80:47


Sean Conway is a British-Zimbabwean ultra-endurance athlete known for pushing the limits of human endurance. He has set multiple world records, including completing 105 Ironman triathlons in 105 days, becoming the first person to swim the length of Britain, and achieving the fastest self-supported cycle across Europe. His adventurous spirit has led him to complete a 4,200-mile triathlon, cycle around the world, and run a marathon in every UK national park. Conway has also written several books and starred in documentaries about his challenges.

JSA Podcasts for Telecom and Data Centers
David Mettler on T5 Data Centers' Growth, AI Impact & Capacity Challenges at DCD Connect New York

JSA Podcasts for Telecom and Data Centers

Jun 27, 2025 · 8:19


Check out our latest #JSATV interview from DCD Connect New York with David Mettler, EVP of Sales and Marketing at T5 Data Centers! David shares insights on the growth of T5's three key business segments – data center development, construction, and operations – and how they address capacity challenges in the rapidly evolving digital infrastructure industry. David also discusses the impact of AI on data center design, power density, and how T5 continues to innovate and meet demand. 

Tea & Trails
Rest Days, Epic long runs & Fuelling Disasters! #129

Tea & Trails

Jun 26, 2025 · 72:18


Cue the drumroll... the wait is finally over: the very first Tea & Trails Ultra has arrived! This week, we're diving headfirst into epic long runs that test the limits of sanity, a look at other podcasts, and reliving a trail-fuel-related disaster. Of course, we also serve up our usual dose of TV talk, because even ultrarunners need sofa time. And through it all, we're reminded that sometimes the boldest move is choosing to rest. So here's to the noble rest day: the quiet champion whispering, “You've earned that nap... and maybe just one more episode.”
https://paulforbrainrecovery.enthuse.com/cf/160-mile-yorkshire-wolds-way-challenge

Tea & Trails
Lowri Morgan - Presenter, Adventurer & Ultra Marathon Runner #128

Tea & Trails

Jun 22, 2025 · 64:57


Lowri Morgan is a remarkable Welsh television presenter, adventurer, and runner. She's one of the very few people to have completed both the gruelling 350-mile 6633 Ultra in the Arctic and the Amazon Jungle Ultra Marathon. Beyond her athletic feats, she's a BAFTA-winning broadcaster who has presented shows like Scrum V on the BBC and Uned 5 on S4C. She's also dived to the wreck of the Titanic and produced award-winning documentaries about her adventures. What's even more inspiring is that she overcame serious injuries, only to go on and conquer some of the world's toughest races.

Tea & Trails
Glory Holes, Fuelling Your Ultra & The Worst Trail Food EVER #127

Tea & Trails

Jun 19, 2025 · 60:28


This week, we plummeted to new depths and somehow, Glory Holes ended up in the show. Don't ask. We move swiftly on (with some emotional scarring) to the real meat of the episode: fuelling your ultra without breaking hearts (or wind), and how Gary's Garmin cost him precious minutes. We also crown the worst checkpoint food of all time. Plus, Gary dishes out his Swaledale Marathon tales.

Live to Walk Again
Episode 224 Brandon Parkes

Live to Walk Again

Jun 18, 2025 · 76:26


This week on the Live to Walk Again Podcast we had the pleasure of speaking with Brandon Parkes, who is a Disability Advocate, Digital Creator, Spinal Cord Injury Survivor, and Twitch Streamer. We talked to Brandon about the rare infection that caused damage to his spinal cord and left him paralyzed at the T5/6 level, his love of wheelchair basketball, and the content he's putting out, from gaming to comedy to motivation for other people dealing with an SCI. Connect with Brandon at his social media links below!! Please listen, like, rate, review, and share the podcast!! We're just trying to find a cure for paralysis!!
Brandon Parkes: IG: @professorparkes | Twitter: @professorparkes | Twitch: @professorparkes12 | TikTok: @professorparkes_ and @professorparkesclips

Tea & Trails
Joe Barrs - Arctic Spine Race - 126

Tea & Trails

Jun 15, 2025 · 63:36


Joe Barrs recently became one of the first-ever finishers of the gruelling 293-mile Montane Arctic Spine Race, a brutal ultramarathon across Sweden's Kungsleden Trail, where temperatures can plummet to -40°C and competitors haul their supplies on sleds. A seasoned adventurer and former Royal Marines Commando, Joe was drawn to the race by the camaraderie of the Spine Race community. He completed the course in eight days, 15 hours, and 19 minutes, tying for second place with fellow racer Ulf Nore. His background includes winter ultramarathons, Alpine skiing, and mountaineering, so he's no stranger to extreme endurance.
Pic credits: @willbaldlygo & @trail_bear

Tea & Trails
Training Metrics - Time or Miles? Scottish Trails, GI Battles, Dog Poop & Strava Triumphs #125

Tea & Trails

Jun 12, 2025 · 66:25


Welcome to a Poetic Tea & Trails Podcast. This week brings a slight format shake-up, but nothing drastic, don't panic! Expect stories of epic Scottish mountain runs, unexpected GI battles, and a bit of dog poop too. But it's not all a struggle! There's ice-cream-fuelled Swaledale Marathon fun, moments of triumph, and a bit of Strava inspiration to keep us all motivated. Also, our coaches talk about how you track your running: miles or time on feet? So lace up, breathe & believe, and we hope you enjoy episode 125 of the Tea & Trails Podcast.

Ecos a 10.000 kilómetros
S12E05 - En el que un desastre tras otro

Ecos a 10.000 kilómetros

Jun 11, 2025 · 89:14


INTRO
BOOKS
00:01:45 La muy catastrófica visita al zoo (Joël Dicker)
00:03:20 Juliette (Camille Jourdy)
00:04:15 Happy Endings (Lucie Bryon)
00:05:05 Las gratitudes (Delphine de Vigan)
00:06:20 Apocalipsis (Stephen King)
00:08:00 La singularidad está más cerca (Ray Kurzweil)
00:09:55 Crímenes rurales (Las amigas estupendas)
00:12:15 Cuál es tu tormento (Sigrid Nunez)
00:13:35 Un lugar feliz (Emily Henry)
00:14:55 Casi nada que ponerte (Lucía Lijtmaer)
00:16:05 El mito del idealismo americano (Noam Chomsky)
00:18:10 El factor Rachel (Caroline O'Donoghue)
00:19:10 Todos en este tren son sospechosos (Benjamin Stevenson)
00:20:20 Tres (Dror Mishani)
00:22:10 Señoras bien; La soledad de la Reina; Ena (Pilar Eyre)
00:24:05 Homework: La mujer de arriba (Freida McFadden) / Mi familia y otros animales (Gerald Durrell)
FILMS
00:26:20 Lost in translation
00:28:00 Otro pequeño favor
00:28:35 Destino final: Lazos de Sangre
00:29:45 Matrimonio mortal en Carolina del Norte
00:32:40 La fuente de la eterna juventud
00:34:10 Fred & Rose West. Love & Murder
00:36:05 Historias para no contar
00:36:40 Misión Imposible: Sentencia Mortal, parte 2
00:38:10 La reina del baile
00:39:30 Matteo Lane: The Al Dente Special
00:40:00 Lilo & Stitch
00:41:00 La viuda negra
00:44:55 La maldición del colgante
00:46:25 Twisters
00:47:40 Ballerina
SERIES
00:49:55 La canción
00:51:15 Prisoner of the Prophet
00:52:15 The killing of Dolores McCrea
00:53:55 Sirenas
00:55:45 Onision in real life
00:57:15 Los crímenes del Tetris
00:58:45 El diablo en la familia: la caída de Rubby Frankle
01:01:20 Fred & Rose West. Una historia británica de terror
01:02:55 El asesinato no resuelto de Beverly Lynn Smith
01:04:10 Los secretos que ocultamos
01:05:20 Felipe y Letizia
01:06:25 Citizen detective (T1)
01:07:50 Matlock (T1)
01:09:25 The Studio (T1)
01:11:00 Dept Q (T1)
01:12:55 Doctor Who (T2) WARNING: SPOILERS
01:16:00 Elsbeth (T2)
01:17:41 The last of us (T2)
01:18:20 Hacks (T4)
01:19:50 El cuento de la criada (T5)
01:22:05 Grey's anatomy (T21) WARNING: SPOILERS
01:24:50 Homework: Cuando nadie nos ve / Yellowjackets (T3)
01:27:20 FAREWELL
Music featured in this episode: Radical Opinion (Archers) / Siesta (Jahzzar) / Place on Fire (Creo) / I saw you on TV (Jahzzar) / Bicycle Waltz (Goodbye Kumiko)

Tea & Trails
Sarah Ingram - Cape Wrath Ultra Champ - 124

Tea & Trails

Jun 5, 2025 · 126:36


Honed in the peat bogs of the Dark Peak and the Lake District rain, a suffering enthusiast of the belief that not much in life can't be improved with a cup of Yorkshire tea or a dunk in a cold body of water. Not our words, but those of tea-slurping Cape Wrath Ultra champion and this week's guest, Sarah Ingram!
Entries are open for the 2026 event: https://www.capewrathultra.com/
Photo credit: No Limits Photography

High on Cars - podcast
Bilrevy 1994 med Bollerslev!

High on Cars - podcast

Jun 3, 2025 · 81:01


The year has now reached 1994 in Niels and Bollerslev's marathon read-through of every car annual! This time around there are kei cars, the Volvo 850 T5, and someone who has "won" a Supra. The podcast contains advertising. Thanks to our partners: OK Oktan 100, Engel Workwear, Scania Danmark, Aros Forsikring.

Tea & Trails
Old County Tops Fell Race Special - 123

Tea & Trails

May 29, 2025 · 120:55


This week we do an Old County Tops deep dive! Does the bromance continue, or was it sad times on the trails? Let's find out. Kudos to everyone who toed the line, and to the team for putting together another awesome race! Held in May, the Old County Tops Fell Race covers 37 miles and involves around 10,000 feet of ascent. The exact distance and amount of ascent are dependent on the route you choose! This week's Brew With the Coaches question is about self-doubt.

JSA Podcasts for Telecom and Data Centers
T5's CEO Pete Marin on Scalable Data Center Growth Fueled by AI

JSA Podcasts for Telecom and Data Centers

May 28, 2025 · 6:07


T5's CEO Pete Marin talks about having an owner's mindset when it comes to data center development, construction and operations - and the immense scale of growth that will be needed to support AI and high performance computing.

RJ Bell's Dream Preview
Memorial Tournament Picks

RJ Bell's Dream Preview

May 28, 2025 · 48:16


Will Doctor gives you the sharpest card for the action at Jack's Place: going over the top players on the odds board, 1 matchup, 2 p2p, 3 outrights (40/1, 75/1, 110/1), a sleeper, 2 FRP, scoring, and a best bet.

Will Doctor delivers a focused and stat-driven breakdown of the Memorial Tournament at Muirfield Village, offering sharp PGA betting insights, critiques of tour policies, and precise player analysis. He opens with a recap of Week 21's 10-unit loss, missing on Ben Griffin's win despite Griffin's elite short game and putting. Griffin, a two-time winner this season, overcame poor driving stats at Colonial and held off Matti Schmid and Bud Cauley. Doctor also critiques picks like JT Poston, who faltered due to big numbers, and others like Riley, Højgaard, and Rai, who failed to deliver.

Scottie Scheffler is highlighted as a dominant force at Muirfield, with podium finishes in his last three appearances, though Doctor avoids betting him at 3-1 due to putting issues and his third straight week competing. Rory McIlroy receives heavy criticism for skipping his third signature event of the year, including Memorial, without informing host Jack Nicklaus. Doctor dissects the PGA's approach to field size, arguing it unfairly excludes players like Higgo and Phillips while excessively relying on sponsor exemptions for names like Fowler and Snedeker. Muirfield Village is described as a long and punishing course with narrow fairways and small bentgrass greens that reward elite ball-striking and putting accuracy.

Top betting lines are reviewed: Morikawa (16-1) is doubted due to form; Schauffele (18-1) lacks Sunday contention; Justin Thomas (25-1) and Patrick Cantlay (25-1) show concerning stats despite course fits. Doctor recommends a matchup bet of Taylor Pendrith over Davis Thompson, citing Pendrith's recent T5 and solid form. Key top finishes include Tony Finau Top 20 (+120) and Shane Lowry Top 10 (+250), with Finau's ball-striking and putting trending positively. Three outright picks are revealed: Lowry (40-1), Novak (75-1), and Cauley (110-1), each supported with course history and recent performance data. Cauley's comeback from injury and recent top-5 finishes are especially praised. The sleeper pick is Cauley to Top 10 (+550), and First Round Top 10s include Lowry (+275) and Novak (+400). Fantasy lineups include combinations of Scheffler, Lowry, Novak, Fowler, Cauley, and Graceman, with strategy adjusted for DraftKings and PGATour.com rules. Doctor projects a winning score of 10-under due to rain-softened conditions in Dublin, Ohio. The final best bet is Novak Top 20 (+175), emphasizing his current form and statistical edge.

For the latest on the world of golf, follow Doc on X @drmedia59. Learn more about your ad choices. Visit megaphone.fm/adchoices

Tea & Trails
Kallum Pritchard - Dukeries 40 Champ - 122

Tea & Trails

May 22, 2025 · 108:31


Kallum Pritchard is a formidable trail and ultra-distance runner from the UK, renowned for his impressive performances in endurance events. He has excelled in prestigious races such as the Country to Capital Ultra, the Centurion Thames Path 100, and the Manchester to Liverpool Ultra, frequently securing podium finishes. Recently, Pritchard claimed second place in the challenging Hundred Hills 50km, a race that demands resilience with over 4,000 feet of climbing through the scenic Chilterns. With a remarkable ability to adapt across varied terrains and distances, Pritchard continues to distinguish himself as a standout figure in the UK ultra-running scene.

No Laying Up - Golf Podcast
1006: Friday at the PGA Championship

No Laying Up - Golf Podcast

May 17, 2025 · 92:31


Johnny Vegas leads the 2025 PGA Championship by two over Si Woo Kim, Matty Fitz, and Matthieu Pavon. Round of the day from The Pro Max Homa with a 64 to get to -5 (T5). We break it all down: leaderboard, MCs, Mudballs, TIO, and more! Presented by High Noon. Support our sponsors: High Noon - Sun's Up! H&B - NLU10 The Stack - code NOLAYINGUP Join us in our support of the Evans Scholars Foundation: https://nolayingup.com/esf PGA Champ mystery box: go to subscribe.nolayingup.com/pga-2025 for all the details Learn more about your ad choices. Visit megaphone.fm/adchoices

Tea & Trails
Lianne van Dijk - 2025 Northern Traverse - 121

Tea & Trails

May 15, 2025 · 114:07


Lianne van Dijk embodies the spirit of perseverance and adventure, inspiring countless runners to push beyond their limits. As an ultrarunner, writer, and coach, she shares her deep passion for trail and mountain running, guiding others on their endurance journeys. Her dedication has led her to remarkable achievements, including a second-place finish in the women's race at the Northern Traverse in 2025, an extraordinary feat in one of the UK's toughest ultra-distance events. Through coaching and storytelling, she reminds us that endurance sports are not just about speed or distance, but about resilience, discovery, and personal triumph. Whether on rugged trails or guiding others toward their goals, Lianne continues to be a force for positivity and inspiration in the outdoor endurance community. This week's Brew with the Coaches is all about how to crush a HOT race.

Tea & Trails
Sean Merryweather - Chester Ultra 100 Champ - 120

Tea & Trails

May 8, 2025 · 111:09


Sean Merryweather is a British ultra-runner and obstacle course racer who has won multiple endurance events, including the GB Ultra Chester 50, the Manchester to Liverpool Ultra, and the GB Ultra Snowdon 50. He recently set a new course record at the Chester Ultra 100, finishing in 16 hours, 20 minutes, and 5 seconds. Smoking fast! Outside of running, Sean has a background in football and holds a black belt in kickboxing. His favorite races tend to be in the mountains, and he lives in Cheshire. This week's Brew with the Coaches is all about injury prevention.

Vital MX
MotoXpod Ep352 | Ft. Pierce Brown, Devin Simonson, and Mike Muye

Vital MX

Apr 23, 2025 · 121:43


This week, the MotoXpod features Monster Energy Yamaha Star Racing's Pierce Brown, who has been recovering from a broken T5 vertebra. We will check in with him and see what his plans are going forward. Then Muc-Off/FXR/ClubMX's Devin Simonson will be on the phone talking about his return to racing and finishing inside the top 10. During the Yamaha Open Chat, we will discuss the incident during the second 450 qualifying session, where some believe Cooper Webb intentionally got in the way of Chase Sexton. Director of Operations for Supercross, Mike Muye, will also join to discuss the logistics of setting the Supercross schedule. Let us know your thoughts below. If you have any questions for the guests, let us know. You can also email Motoxpodshow@Gmail.com if you want to get in on the Evan's Coolant Emails, T-Bolt USA Top 5, FXR Picks for East Rutherford, and the X Brand Forum Check-In. Watch live on the Vital MX YouTube channel starting at 4:30 Pacific/7:30 Eastern.

Fishin' for Birdies
Ep 061: Ya Gotta Have a Stinger

Fishin' for Birdies

Apr 9, 2025 · 34:22


Patrick rode a stinger and the longest made putt in 17 years on the PGA Tour to a T5 at the Valero Texas Open in San Antonio. Sponsored by Goldenwest Credit Union. 

Ford Mustang The First Generation, The Early Years Podcast
First Gen 6 Banger Pros Take the Stage

Ford Mustang The First Generation, The Early Years Podcast

Apr 8, 2025 · 37:23


Danny Stucker and Aaron Cox
I met both of these guests today in the Vintage Mustang 6 Forum on Facebook, and even had a chance to watch the miracle of a new wiring harness and all sorts of goodness being added to a 6-banger in SoCal. Excited to have maybe a more technical conversation than I am ready for. A little scared, but confident in my ability to hit the "mute" button. Danny Stucker and Aaron Cox, welcome to Ford Mustang the Early Years podcast.

Danny Stucker Notes:
How long have you owned your ride? Since November 29th, 2020
If you've made improvements to your classic car or restored it, what work have you done? I have made substantial modifications to the 200 inline 6, including a Vintage Inlines alloy head, multiport fuel injection, T5 conversion, 9" rear end, and Wilwood disc brakes. All new stock interior. Basically everything has been done that is not cosmetic outside.
What plans do you have for improvements/restoration/modification of your classic car? I want to do the Street or Track front coil-over suspension upgrade.
Danny's YouTube Channel: https://www.youtube.com/mechtrician1

Aaron Cox Notes:
How long have you owned your ride? 8 years for this one
What is his/her name? Her name is Sasha
If you've made improvements to your classic car or restored it, what work have you done? A very long list of modifications. Forged engine and turbocharged with electronic fuel injection.
What plans do you have for improvements/restoration/modification of your classic car? Track days, cars and coffee, and enjoyment

Connect with the show: @mustangpodcast https://www.instagram.com/mustangpodcast/
An Expert's Guide to Maintaining Your Classic Mustang: www.TheMustangPodcast.com/repair
Sponsored by: National Parts Depot, www.npdlink.com. With 4 warehouses nationwide, you'll get your parts fast!
"Keep it safe, keep it rollin' and keep it on the road. Until next time!" Doug Sandler, doug@turnkeypodcast.com

Boomer & Warrener in the Morning
Pat Mayo on Betting The Masters!

Boomer & Warrener in the Morning

Apr 8, 2025 · 46:47


Hour 1 of the Big Show+ with Patrick Dumas and GVP is on demand!!! The guys started off with the Flames potentially giving Parekh and Suniev playing time this season, Martin Pospisil having to leave the game with an injury, Adam Klapka performing, the motivation to keep winning in the group, the team matchups throughout the season, and the importance of overtime and shootout losses. (20:06) Later on, Pat Mayo joins the show to talk about the Masters coming up. He brings an interesting stat: the winner of the Masters has not come from the T5 in the last 5 years. Also, who can win the Masters this weekend, how are individual players doing coming into the Masters, and which Canadian has the best chance at the Masters? The views and opinions expressed in this podcast are those of the hosts and guests and do not necessarily reflect the position of Rogers Media Inc. or any affiliate.

Beyond the Clubhouse
Ep 212: 2025 Augusta National Women's Amateur w/Gianna Clemente

Beyond the Clubhouse

Apr 2, 2025 · 27:19


2x Augusta National Women's Amateur participant Gianna Clemente shares what it's like to compete at Augusta with her dad Patrick on the bag. She also paints a picture of what it felt like to compete in the Drive, Chip & Putt National Finals as a 9-year-old in 2017. She returns for her 3rd ANWA this year after playing in the final group last year with winner Lottie Woad and shares what she learned from 2024 when she finished T5.

der Hoefliche & der BAUstein
Wie Benjamin Penderock von Dotlux die Elektrobranche aufmischt

der Hoefliche & der BAUstein

Apr 1, 2025 · 35:57


DOTLUX stands for innovative LED technology and offers a broad range of products for a variety of applications. One highlight is the QUICK-FIX system, which allows existing luminaires to be retrofitted quickly and easily. The QUICK-FIXdc modules, for example, replace conventional T5 and T8 fluorescent tubes and stand out for their high efficiency of up to 200 lm/W and a service life of 100,000 hours.

Machine Learning Street Talk
Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman)

Machine Learning Street Talk

Mar 22, 2025 · 63:36


Mohamed Osman joins to discuss MindsAI's highest-scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network.

SPONSOR MESSAGES:
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/

TRANSCRIPT + REFS:
https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0
Mohamed Osman (Tufa Labs): https://x.com/MohamedOsmanML
Jack Cole (Tufa Labs): https://x.com/MindsAI_Jack
How and why deep learning for ARC paper: https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

TOC:
1. Abstract Reasoning Foundations
[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview
[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning
[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture
[00:20:26] 1.4 Technical Implementation with Long T5 Model
2. ARC Solution Architectures
[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions
[00:27:54] 2.2 Model Generalization and Function Generation Challenges
[00:32:53] 2.3 Input Representation and VLM Limitations
[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration
[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches
3. Advanced Systems Integration
[00:43:00] 3.1 DreamCoder Evolution and LLM Integration
[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs
[00:54:15] 3.3 ARC v2 Development and Performance Scaling
[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations
[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

REFS:
[00:01:32] Original ARC challenge paper, François Chollet - https://arxiv.org/abs/1911.01547
[00:06:55] DreamCoder, Kevin Ellis et al. - https://arxiv.org/abs/2006.08381
[00:12:50] Deep Learning with Python, François Chollet - https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
[00:13:35] Influence of pretraining data for reasoning, Laura Ruis - https://arxiv.org/abs/2411.12580
[00:17:50] Latent Program Networks, Clement Bonnet - https://arxiv.org/html/2411.08706v1
[00:20:50] T5, Colin Raffel et al. - https://arxiv.org/abs/1910.10683
[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al. - https://arxiv.org/abs/2411.02272
[00:34:15] Six finger problem, Chen et al. - https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf
[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al. - https://arxiv.org/html/2412.04604v2
[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis - https://arxiv.org/html/2503.15540
[00:54:25] Abstraction and Reasoning Corpus, François Chollet - https://github.com/fchollet/ARC-AGI
[00:57:10] O3 breakthrough on ARC-AGI, OpenAI - https://arcprize.org/
[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell - https://arxiv.org/abs/2305.07141
[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W. - http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf
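The test-time fine-tuning paradigm discussed in this episode is simple to sketch: briefly train a small seq2seq model on a task's demonstration pairs right before predicting on that task's test input, optionally ensembling several predictions by voting. Below is a minimal Python sketch, assuming the Hugging Face transformers and torch libraries; the "t5-small" checkpoint, hyperparameters, and the toy grid serialisation are illustrative assumptions, not the MindsAI pipeline.

```python
# Minimal test-time fine-tuning sketch (illustrative, not MindsAI's actual setup).
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Demonstration pairs standing in for one task's input/output examples (hypothetical).
demos = [("grid: 1 2 | 3 4", "grid: 4 3 | 2 1"),
         ("grid: 5 6 | 7 8", "grid: 8 7 | 6 5")]

model.train()
for step in range(10):                      # a few gradient steps on the demos only
    for src, tgt in demos:
        batch = tok(src, return_tensors="pt")
        labels = tok(tgt, return_tensors="pt").input_ids
        loss = model(**batch, labels=labels).loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model.eval()                                # then predict on the held-out test input
test = tok("grid: 2 4 | 6 8", return_tensors="pt")
out = model.generate(**test, max_new_tokens=16)
print(tok.decode(out[0], skip_special_tokens=True))
```

In practice one would repeat this per task and combine multiple sampled outputs with a voting step, as described in the episode.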

The Empowered Investor
Essential Tax Tips to Complete your 2024 Tax Returns

The Empowered Investor

Mar 20, 2025 · 34:02 · Transcription available


Are you ready for tax season? Do you know the key deadlines and the essential documents you need to file your 2024 tax return? Have you explored all the deductions and tax relief options available to you? In this episode, Keith Matthews and Andrea LeRoyer guide you through a structured approach to preparing your taxes efficiently. From crucial deadlines to tax-saving opportunities, they cover everything you need to stay compliant and optimize your return. They also discuss major changes for the 2024 tax season, including updates on the capital gains inclusion rate, home flipping rules, and the latest home buyer plan withdrawal limits. Whether you're a salaried employee, self-employed, a retiree, or a parent, this episode provides a simple and practical framework to handle your taxes effectively. Don't miss this essential conversation to help you file your 2024 return with confidence!

Thank you for tuning in!

Key Topics:
- Overview of the 2024 tax season (0:39)
- Introducing co-host Andrea LeRoyer and her background in tax and finance (1:47)
- Five key points covered in this episode (2:02)
- April 30: General filing deadline (3:21)
- June 16: Self-employed individuals' deadline (4:19)
- Income slips (T4, T4A, T3, T5, etc.) (4:50)
- Notice of Assessment: Why it matters (5:35)
- Installment payment summaries (6:58)
- Childcare and activity credits (8:42)
- Tuition and education-related deductions (9:43)
- Medical expenses and insurance coverage (11:01)
- Charitable donation changes and extended deadline (13:36)
- Home office deductions: Employees vs. self-employed (14:28)
- Tax credits for home support services (16:45)
- Caregiver tax credits (17:19)
- Multigenerational renovation tax credit (17:44)
- Disability tax credit and eligibility (20:08)
- Changes in relationship status (22:09)
- Moving, renting, or selling property (23:02)
- Foreign asset ownership and T1135 requirements (24:03)
- Capital gains inclusion rate update (25:48)
- Short-term rental compliance rules (26:47)
- Home Buyers' Plan withdrawal increase (27:57)
- New tax rules for property flipping (29:10)
- Crypto asset reporting for Quebec residents (30:26)
- Alternative Minimum Tax (AMT) updates (31:26)
- Final tax season tips & best practices (32:20)
- Closing thoughts & embracing tax season (32:51)
And much more!

Mentioned in this Episode: Tulett, Matthews & Associates

Thanks for Listening! Be sure to subscribe on Apple, Google, Spotify, or wherever you get your podcasts. Feel free to drop us a line at lawrence@tma-invest.com or 514-695-0096 ext. 112. Follow Tulett, Matthews & Associates on social media: LinkedIn, Facebook, and more! Follow The Empowered Investor on Facebook, LinkedIn, and

The Rewind
Episode 402: 2024 Composite Top 10

The Rewind

Mar 1, 2025 · 239:02


Josh is joined by The Rewind's most frequent guests from 2024 as everyone goes on the record with their Top 10 movies of the year. Each segment appears as follows: Daniel Lima (beginning-36:05), Josh Brown (36:07-1:15:57), Elijah Howard (1:16:00-1:52:16), Fred Kolb (1:52:19-2:34:15), Ben Luben (2:34:18-3:23:36), Joe Morgan (3:23:38-end). Also, The Rewind's composite Top 10 for 2024 is as follows: 1. Flow; 2. The Brutalist; 3. Challengers; 4. A Different Man; T5. Anora; T5. Dune: Part 2; T5. Close Your Eyes; T8. The Seed of the Sacred Fig; T8. I Saw the TV Glow; 10. La Chimera.

Entre nos pages
Episode #98 : Où l'on tient un journal de lecture

Entre nos pages

Feb 18, 2025 · 29:01


Hellooooo! We're back with a reading journal, in which we talk about our reading over the course of the recording. As always, it's fairly dense, but full of lovely discoveries :D We hope you enjoy it; don't hesitate to share your thoughts with us on Instagram @entrenospages or by email: entrenospages@gmail.com. Happy listening!
The books covered in this episode are:
- Tsubasa: World chronicle T2 and T3, CLAMP
- Les embrasés, Stefan Platteau
- Le golem de pierre, Claire Krust
- La souris du futur, Collectif
- Les aigles de Vishan Lour, Pierre Bottero
- Petit pays (BD), Marzena Sowa and Sylvain Savoia
- Et à la fin, ils meurent, Lou Lubie
- Le trône de fer T5, George R. R. Martin
- Les carnets de l'apothicaire T1 and T2, Itsuki Nanao and Nekokurage
- Le coeur en braille (BD), Joris Chamblain and Anne-Lise Nalin
- Lightfall T1, Tim Probert
- Le jour où le bus est reparti sans elle, Béka and Marko
- Les 5 terres: Demeus Lor, Lewelyn and Sylvain Guinebaud
- Homo sapienne, Niviaq Korneliussen
Music promoted by La Musique Libre: Joakim Karud - Canals: https://youtu.be/zrXbhncmorc / Joakim Karud: https://soundcloud.com/joakimkarud

Ford Mustang The First Generation, The Early Years Podcast
Labors of Love: Wiring, Wrenching, and Rewinding a Vintage Mustang, Interview with Ron Bossen

Ford Mustang The First Generation, The Early Years Podcast

Jan 10, 2025 · 42:23


Around the same time I acquired Jewel, my 1965 convertible, today's guest Ron Bossen got his 1965 coupe. Helsinki, Finland continues bringing the Mustang brand out strong. Welcome, Ron, to Ford Mustang The Early Years podcast.

Ford Mustang, The Early Years Podcast -- Guest Interview Application
Do you own an early year Mustang? Yes! A 1965 Mustang Coupe
If you own a Mustang, how long have you owned your ride? 4 years+
If you own a Mustang or classic car, have you named your car? If so, what is his/her name? Rowena
If you've made improvements to your classic car or restored it, what work have you done? New wiring bumper to bumper, LEDs on all exterior and interior lights except headlights, heater rebuild, shocks and suspension on all four corners, T5 manual transmission, power windows, pony interior, center console, wheels and tires
What plans do you have for improvements/restoration/modification of your classic car? Repair leaky windows, add emergency flashers, some minor paint and body work, maybe three-point seat belts, other small stuff like seat belt and parking brake reminder lights

The Facebook Group: TheMustangPodcast.com/facebook, https://www.facebook.com/groups/185146876036328
Instagram: @mustangpodcast https://www.instagram.com/mustangpodcast/ | @fordpickuppodcast https://www.instagram.com/fordpickuppodcast/
An Expert's Guide to Maintaining Your Classic Mustang: www.TheMustangPodcast.com/repair
Sponsored by: National Parts Depot, www.npdlink.com. With 4 warehouses nationwide, you'll get your parts fast!
"Keep it safe, keep it rollin' and keep it on the road. Until next time!" Doug Sandler, doug@turnkeypodcast.com

The Secure Developer
Securing The Future: How AI Is Transforming Vulnerability Detection With Berkay Berabi

The Secure Developer

Jan 7, 2025 · 29:45


Episode Summary
Imagine if AI could detect and fix vulnerabilities in your code faster and with greater precision than ever before. That future is already here! In today's episode, we're joined by Berkay Berabi, an AI researcher and Senior Software Engineer at Snyk, to dive into the cutting-edge world of AI-powered vulnerability detection. Berkay offers insight into how Snyk is leveraging a hybrid AI approach to detect and fix vulnerabilities in code, combining human-driven expertise with machine learning for greater accuracy and scalability. He also introduces CodeReduce, a game-changing tool by Snyk that strips away irrelevant code, streamlining the detection process and addressing the challenges posed by complex, multi-step data flows. Through rigorous model testing, Snyk ensures that AI-generated fixes are validated to prevent errors, making the process faster and more reliable.

Show Notes
In this fascinating episode of The Secure Developer, host Danny Allan sits down with Berkay Berabi, an AI researcher at Snyk, to explore the groundbreaking CodeReduce technology and its implications for software security. Berabi, who transitioned from electrical engineering to AI research, shares insights into how Snyk is revolutionizing vulnerability detection and remediation using artificial intelligence.

The conversation delves deep into the technical aspects of CodeReduce, explaining how this innovative approach reduces complex code structures by up to 50 times their original size while maintaining vulnerability detection capabilities. Berabi explains the sophisticated process of code reduction, analysis, and fix generation, highlighting how AI models can better understand and address security vulnerabilities when working with simplified code. The discussion also covers the challenges of different AI models, from T5 to StarCoder and Mixtral, exploring their varying capabilities, accuracies, and performance trade-offs.

The episode critically examines the future of AI in software development, addressing both opportunities and concerns. Berabi and Allan discuss recent findings about AI-generated code potentially introducing new vulnerabilities, referencing Gartner's prediction that by 2027, 25% of software vulnerabilities could be created by AI-generated code. They explore how tools like CodeReduce and other AI-powered security measures might help mitigate these risks while examining the broader implications of AI assistance in software development. This episode offers valuable insights for developers, security professionals, and anyone interested in the intersection of AI and software security.

Links
DeepCode AI Fix Research Paper
DeepCode AI Fix Blog Post

Follow Us
Our Website
Our LinkedIn
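The episode doesn't spell out CodeReduce's internals, but the idea of "stripping away irrelevant code" can be pictured as a delta-debugging-style loop: keep deleting lines as long as an analyser still reports the same finding. Here is a hypothetical Python sketch; the toy substring "analyser" and the greedy line-removal strategy are stand-ins for illustration, not Snyk's implementation.

```python
# Illustrative code-reduction sketch: drop lines that do not affect the finding.

def flags_vulnerability(source: str) -> bool:
    """Toy detector: 'vulnerable' if user input appears to reach os.system()."""
    return "input(" in source and "os.system(" in source

def reduce_code(source: str) -> str:
    """Greedy reduction: remove any single line whose removal keeps the flag."""
    lines = source.splitlines()
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if flags_vulnerability("\n".join(candidate)):
                lines = candidate   # the dropped line was irrelevant to the finding
                changed = True
                break
    return "\n".join(lines)

snippet = """import os
banner = "welcome"
print(banner)
cmd = input("cmd: ")
log = open("audit.log", "a")
os.system(cmd)
"""

print(reduce_code(snippet))   # keeps only the lines the toy detector still needs
```

A smaller, still-flagged snippet like this is what a fix-generation model would then be asked to repair, which is the intuition behind the 50x reductions discussed in the episode.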

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
2024 in Post-Transformers Architectures (State Space Models, RWKV) [LS Live @ NeurIPS]

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Dec 24, 2024 43:02


Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Of perennial interest, particularly at academic conferences, is scaled-up architecture research as people hunt for the next Attention Is All You Need. We have many names for them: “efficient models”, “retentive networks”, “subquadratic attention” or “linear attention” but some of them don't even have any lineage with attention - one of the best papers of this NeurIPS was Sepp Hochreiter's xLSTM, which has a particularly poetic significance as one of the creators of the LSTM returning to update and challenge the OG language model architecture:So, for lack of a better term, we decided to call this segment “the State of Post-Transformers” and fortunately everyone rolled with it.We are fortunate to have two powerful friends of the pod to give us an update here:* Together AI: with CEO Vipul Ved Prakash and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents, BASED, Sequoia, Evo, Dragonfly, Dan Fu's ThunderKittens and many more research projects this year* Recursal AI: with CEO Eugene Cheah who has helped lead the independent RWKV project while also running Featherless AI. This year, the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide, to support Microsoft's on-device, energy-usage-sensitive Windows Copilot usecases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch. On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model modified with RWKV linear attention layers. We were looking to host a debate between our speakers, but given that both of them were working on post-transformers alternativesFull Talk on YoutubePlease like and subscribe!LinksAll the models and papers they picked:* Earlier Cited Work* Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention* Hungry hungry hippos: Towards language modeling with state space models* Hyena hierarchy: Towards larger convolutional language models* Mamba: Linear-Time Sequence Modeling with Selective State Spaces* S4: Efficiently Modeling Long Sequences with Structured State Spaces* Just Read Twice (Arora et al)* Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. 
However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. * To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. * Our analysis suggests, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives 11.0±1.3 points of improvement, averaged across 16 recurrent LMs and the 6 ICL tasks, with 11.9× higher throughput than FlashAttention-2 for generation prefill (length 32k, batch size 16, NVidia H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides 99% of Transformer quality at 360M params., 30B tokens and 96% at 1.3B params., 50B tokens on average across the tasks, with 19.2× higher throughput for prefill than FA2.* Jamba: A 52B Hybrid Transformer-Mamba Language Model* We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. * Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. * This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU.* Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. * We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.* SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers* We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. 
Core designs include: * (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. * (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. * (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. * (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. * As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. * RWKV: Reinventing RNNs for the Transformer Era* Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. * We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.* Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. * We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.* LoLCATs: On Low-Rank Linearizing of Large Language Models* Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. * We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitudes less memory and compute. * We base these steps on two findings. * First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer").* Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). * LoLCATs significantly improves linearizing quality, training efficiency, and scalability. 
We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. * Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. * Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). * When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.Timestamps* [00:02:27] Intros* [00:03:16] Why Scale Context Lengths? or work on Efficient Models* [00:06:07] The Story of SSMs* [00:09:33] Idea 1: Approximation -> Principled Modeling* [00:12:14] Idea 3: Selection* [00:15:07] Just Read Twice* [00:16:51] Idea 4: Test Time Compute* [00:17:32] Idea 2: Hardware & Kernel Support* [00:19:49] RWKV vs SSMs* [00:24:24] RWKV Arch* [00:26:15] QWRKWv6 launch* [00:30:00] What's next* [00:33:21] Hot Takes - does anyone really need long context?Transcript[00:00:00] AI Charlie: We're back at Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co host. As a special treat this week, we're recapping the best of 2024 going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field.[00:00:24] AI Charlie: 200 of you joined us in person throughout the day, with over 2200 watching live online. Thanks Our next keynote covers the State of Transformers alternative architectures, with a special joint presentation with Dan Fu of Together AI and Eugene Chia of Recursal AI and Featherless AI. We've featured both Together and Recursal on the pod before, with CEO Veepal Vedprakash introducing them.[00:00:49] AI Charlie: And CTO CE Zhang joining us to talk about how they are building together together as a quote unquote full stack AI startup from the lowest level kernel and systems [00:01:00] programming to the highest level mathematical abstractions driving new model architectures and inference algorithms with notable industry contributions from Red Pajama V2, Flash Attention 3, Mamba 2, Mixture of Agents.[00:01:15] AI Charlie: Based, Sequoia, Evo, Dragonfly, Danfoo's Thunder Kittens, and many more research projects this year. As for Recursal and Featherless, we were the first podcast to feature RWKV last year, and this year the team has shipped RWKV v5, codenamed Eagle, to 1. 5 billion Windows 10 and Windows 11 machines worldwide to support Microsoft's on device, end Energy Usage Sensitive Windows Copilot Use Cases and has launched the first updates on RWKV v6, codenamed Finch and Goldfinch.[00:01:53] AI Charlie: On the morning of Latent Space Live, they also announced QRdata UKv6, a QEN32B model [00:02:00] modified with RDWKV linear attention layers. Eugene has also written the most single most popular guest post on the Latent Space blog this year. Yes, we do take guest posts on what he has discovered about the H100 GPU inference NeoCloud market since the successful launch of Featherless AI this year.[00:02:20] AI Charlie: As always, don't forget to check the show notes for the YouTube link to their talk as well as their slides. Watch out and take care.[00:02:27] Intros[00:02:27] Dan Fu: Yeah, so thanks so much for having us. 
So this is going to be a little bit of a two part presentation. My name is Dan. I'm at Together AI, and I'll be joining UCSD as faculty in about a year. And Eugene, you want to introduce yourself?[00:02:46] Eugene Cheah: Eugene, I lead the art activity team, and I, I'm CEO of Featherless, and we both work on this new post transformer architecture space.[00:02:55] Dan Fu: Yeah, so yeah, so today we're really excited to talk to you a little bit [00:03:00] about that. So first I'm going to give a broad overview of kind of the last few years of progress in non post transformer architectures. And then afterwards Eugene will tell us a little bit about the latest and the greatest and the latest frontier models in this space.[00:03:16] Why Scale Context Lengths? or work on Efficient Models[00:03:16] Dan Fu: So, the story starts with Scaling. So this is probably a figure or something like this that you've seen very recently. Over the last five to six years, we've seen models really scale up in parameter size, and that's brought with it a bunch of new capabilities, like the ability to talk to you and tell you sometimes how to use your Colab screens.[00:03:35] Dan Fu: But another place where we've seen scaling especially recently is scaling in context length. So this can mean Having more text inputs for your models, but it can also mean things like taking a lot of visual token inputs image inputs to your models or generating lots of outputs. And one thing that's been really exciting over the last few months or so is that we're, we're seeing scaling, not only during training time, but also [00:04:00] during test time.[00:04:00] Dan Fu: So this is one of the, the, this is the iconic image from the OpenAI 01 release. Not only are we starting to scale train time compute, but we're also starting to scale test time compute. Now if you're familiar with our attention and our transformer architectures today, this graph on the right might look a little bit scary.[00:04:19] Dan Fu: And one of the reasons is that the implications are a little bit Interesting. So what does it mean if we want to continue having smarter and smarter models? Do we just need to start building bigger, bigger data centers, spending more flops? Is this this little Dolly 3, we need more flops, guys? Is this going to be the future of all of AI?[00:04:39] Dan Fu: Or is there a better way, another path forward? Maybe we can get the same capabilities that we've gotten used to, But for a lot less compute, a lot less flops. And one of the things that we're going to talk about today is specifically looking at that core attention operator in some of these models.[00:04:57] Dan Fu: And the reason is that so this is just some, some [00:05:00] basic you know, scaling curves, but attention has compute that scales quadratically in the context length. So that means that if you're doing something like test time compute and you want to spend a bunch of tokens thinking about what comes next, the longer that that goes the, the, the more tokens you spend on that, that compute grows quadratically in that.[00:05:19] Dan Fu: One of the questions that we're interested in is, can we take that basic sequence model, that basic sequence primitive at the bottom, and get it to scale better? Can we scale in, let's say, n to the 3 halves or n log n? So in, in the first part of the talk, so we just went over the introduction. 
What I'm gonna do over the next few slides is just talk about some of the key advances and ideas that have shown over the past few years since maybe early 2020 to, to now that shown promise that this might actually be possible.[00:05:48] Dan Fu: That you can actually get potentially the same quality that we want while scale, while scaling better. So to do that, we're and, and basically the, the story that we're gonna look is we're gonna start to see [00:06:00] how. So this is a basic graph of just the past couple years of progress of perplexity where that blue line, that dotted blue line, is attention.[00:06:07] The Story of SSMs[00:06:07] Dan Fu: It's your basic transformer, full dense attention. And then the dots coming down are some of the methods that you'll see in this presentation today. We're going to turn the clock back all the way to 2020. So this, this, this question of can we make attention subquadratic? Basically, as soon as we said attention is all you need, People started asking this question.[00:06:28] Dan Fu: So we have this quadratic attention operator. Can we do better? I'll briefly talk about why attention is quadratic. And the basic thing that happens, if you're not familiar, is that you have these inputs, these keys and queries. And what you do in this attention matrix, this S matrix over here, is that you're using, you're comparing every token in your input to every other token.[00:06:49] Dan Fu: So when I try to do something like upload a whole book to Gemini, what happens beyond the Maybe not Gemini, because we don't necessarily know what architecture is. But let's say we upload it to LLAMA, what happens beyond [00:07:00] the scenes, behind the scenes, is that it's going to take every single word in that book and compare it to every other word.[00:07:05] Dan Fu: And this has been a really, it's, it's led to some pretty impressive things. But it's kind of a brute forcing of the way that you would try to interpret a interpret something. And what attention does in particular is the, and then what attention, sorry, don't want to. Okay, no, no laser pointer. What, what attention does afterwards is that instead of always operating in this quadratic thing, it takes a row wise softmax over this matrix, and then multiplies it by this values matrix.[00:07:32] Dan Fu: So, one of the key points to notice is that the output size is always going to be the same as the inputs, at least in standard self attention. So one of the first things that folks tried to do around 2020 is this thing called linear attention, which is just, just noticing that if we take out this softmax from here, if we take out this non linearity in the middle of the attention operation, and then if you compute the keys and the values operation first, you actually never hit this quadratic bottleneck.[00:07:57] Dan Fu: So that, that's potentially a way [00:08:00] to get a lot more computationally efficient. And there are various ways to do this by basically using feature maps or try to approximate this overall attention computation. But some of this work sort of started to hit a wall in 2020. And the basic challenges were, were two.[00:08:16] Dan Fu: So one was quality. It was back then, it was kind of hard to, to get good quality with these linear attention operators. The other one was actually hardware efficiency. So these, this feature map that was just shown by a simplify simplify here. 
Actually ends up being quite computationally expensive if you just implement it naively.[00:08:34] Dan Fu: So you started having these operators that not only were you sure, you're not really sure if they have the same quality, but also they're actually just wall clock slower. So you kind of end up getting the worst of both worlds. So this was the the stage. So that kind of sets the stage for four years ago.[00:08:49] Dan Fu: Keep this in mind because linear attention is actually going to come back in a few years once we have a better understanding. But one of the works that started kicking off this, this [00:09:00] mini revolution in post transformer architectures was this idea called states based model. So here the seminal work is, is one about our work queue in 2022.[00:09:09] Dan Fu: And this, this piece of work really brought together a few ideas from, from some long running research research lines of work. The first one was, and this is really one of the keys to, to closing the gap in quality was just using things that, that if you talk to a, a, an electrical engineer off the street, they might know off, off the, like the back of their hand.[00:09:33] Idea 1: Approximation -> Principled Modeling[00:09:33] Dan Fu: But taking some of those properties with how we model dynamical systems in signal processing and then using those ideas to model the inputs, the, the text tokens in, for example a transformer like Next Token Prediction Architecture. So some of those early states-based model papers were looking at this relatively, relatively simple recurrent update model that comes from maybe chapter one of a signal processing class.[00:09:59] Dan Fu: But then using [00:10:00] some principle theory about how you should do that recurrent update in order to really get the most that you can out of your hidden state, out of your out of your sequence. So that, that was one key idea for quality and. When this was eventually realized, you started to see a bunch of benchmarks that were pretty sticky for a few years.[00:10:20] Dan Fu: Things like long range arena, some long sequence evaluation benchmarks, There was stuff in time series, time series analysis. They started to, you started to see the quality tick up in meaningful ways. But the other key thing that What's so influential about these states based models is that they also had a key idea about how you can compute these things efficiently.[00:10:45] Dan Fu: So if you go back to your machine learning 101 class where you learned about RNNs, one thing that you may have learned is that they don't paralyze as well as detention, because if you just run them naively, you have to do this kind of sequential update to process new tokens, [00:11:00] whereas in attention, you can process all the tokens in parallel at one time.[00:11:04] Dan Fu: One of the key insights behind the S4 paper was that these recurrent models, you could take them and you could also formulate them as a convolution. And in particular, with a convolution, you could, instead of using a PyTorch conv1d operation, you can compute that with the FFT. And that would give you n log n compute in the in the sequence length n with an operator that was relatively well optimized for modern hardware.[00:11:28] Dan Fu: So those are really, I'd say, the two key ideas in 2022 that started allowing these breakthroughs to happen in these non transformer architectures. 
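For readers who want to see the recurrence-as-convolution idea in code, here is a tiny NumPy sketch (an illustration under toy assumptions, not the S4 implementation): the same scalar state space recurrence is evaluated once as an RNN and once as an FFT-based causal convolution, which is where the n log n cost comes from.

```python
import numpy as np

L = 512                      # sequence length
a, b, c = 0.9, 1.0, 0.5      # toy scalar SSM parameters (chosen for illustration)
u = np.random.randn(L)       # input sequence

# RNN view: step the recurrence x_t = a*x_{t-1} + b*u_t, y_t = c*x_t token by token.
x, y_rec = 0.0, np.zeros(L)
for t in range(L):
    x = a * x + b * u[t]
    y_rec[t] = c * x

# Convolution view: unroll the recurrence into a kernel k_t = c * a^t * b once.
k = c * (a ** np.arange(L)) * b

# FFT-based causal convolution in O(L log L); pad to 2L to avoid circular wrap-around.
n = 2 * L
y_fft = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)[:L]

print(np.allclose(y_rec, y_fft))   # True: both views of the same layer agree
```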
So, these ideas about how to principally model sorry, how to model the recurrent updates of a mo of, of a sequence in a principled way, and also these key ideas in how you can compute it efficiently by turning it into a convolution and then scaling it up with the FFT.[00:11:53] Dan Fu: Along those same lines, so afterwards we started putting out some work on specialized kernels, so just [00:12:00] like we have flash attention for transformers, we also have works like flash fft conf, and if you look at these lines of work oftentimes when, whenever you see a new architecture, you see a new primitive one of the, one of the table stakes now is, do you have an efficient kernel so that you can actually get wall clock speed up?[00:12:14] Idea 3: Selection[00:12:14] Dan Fu: So by 2022, We are starting to have these models that had promising quality primitives, but and, and also promising wall clocks. So you could actually see regimes where they were better than transformers in meaningful ways. That being said, there were, there's still sometimes a quality gap, particularly for language modeling.[00:12:33] Dan Fu: And because languages, It's so core to what we do in sequence modeling these days the, the next, the next key idea that I'm going to talk about is this idea of selection mechanisms. And this is basically an idea of, so you have this recurrent state that you're keeping around that just summarizes everything that, that came before.[00:12:50] Dan Fu: And to get a good sequence model, one of the things that you really need to be able to do is have the model learn what's the best way to pick out pieces from that recurrent [00:13:00] state. So one of the, one of the major ideas here in a line of work called H3, Hungry Hungry Hippos, and also these hyena models were One way you can do this is by just adding some simple element wise gates.[00:13:13] Dan Fu: So versions of these ideas have been around for decades. If you squint at the LSTM paper you, you can probably find, find this gating mechanism. But turns out you can take those old ideas, add them into these new. state space models, and then you can see quality start to pick up. If you've heard of the Mamba model, this also takes the selection to the next level by actually making some changes in that fundamental recurrent state space.[00:13:40] Dan Fu: So, it's not only just this gating that happens around the SSM layer, but also you can actually make The ABCD matrices of your state space model, you can make them data dependent, which will allow you to even better select out different pieces from your hidden state depending on what you're seeing. I'll also point out if you look at the [00:14:00] bottom right of this figure, there's this little triangle with a GPU SRAM, GPU HBM, and this, this is just continuing that trend of when you have a new architecture you, you, you also release it with a kernel to, to, to show that it is hardware efficient, that it, that it can be hardware efficient on modern hardware.[00:14:17] Dan Fu: The, the, one of the next cool things that happened is once we had this understanding of these are the basic pieces, these are the basic principles behind some of the sequence models linear attention actually started to come back. 
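Since kernelized linear attention keeps coming up in this talk, here is a rough NumPy sketch of the associativity trick being described: drop the softmax, push queries and keys through a feature map, and multiply keys by values first so the L-by-L attention matrix is never materialised. The feature map below and the omitted row-wise normaliser are assumptions of this sketch, not any particular paper's recipe.

```python
import numpy as np

L, d = 1024, 64                             # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))

phi = lambda x: np.maximum(x, 0.0) + 1e-6   # a simple positive feature map (illustrative choice)

# Quadratic order: phi(Q) @ phi(K).T is an L x L matrix -- the O(L^2) bottleneck.
out_quadratic = (phi(Q) @ phi(K).T) @ V

# Linear order: phi(K).T @ V is only d x d, so the cost scales as O(L * d^2).
out_linear = phi(Q) @ (phi(K).T @ V)

print(np.allclose(out_quadratic, out_linear))   # True: same result, very different cost
```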
So in earlier this year, there was a model called BASED the, from Simran Arora and, and some other folks, that combined a more principled version of linear attention that basically the, the, the, the two second summary is that it used a Taylor approximation of the softmax attention, combined that with a simple sliding window attention and was starting to able, starting to be able to expand the Pareto frontier of how much data can you recall from your sequence, versus how small is your recurrent state size.[00:14:58] Dan Fu: So those orange dots [00:15:00] are, at the top there, are just showing smaller sequences that can recall more memory.[00:15:07] Just Read Twice[00:15:07] Dan Fu: And the last major idea I think that has been influential in this line of work and is very relatively late breaking just a few months ago, is just the basic idea that when you have these models that are fundamentally more efficient in the sequence length, you maybe don't want to prompt them or use them in exactly the same way.[00:15:26] Dan Fu: So this was a really cool paper called Just Read Twice, also from Simran. That basically said, hey, all these efficient models can process tokens so much more efficiently than transformers that they can sometimes have unfair advantages compared to a simple transformer token. So, or sorry, a simple transformer model.[00:15:44] Dan Fu: So take, for example the standard, the standard use case of you have some long document, you're going to pass it in as input, and then you're going to ask some question about it. One problem you might imagine for a recurrent model where you have a fixed state size is, let's say that [00:16:00] you're. Article is very long, and you're trying to ask about some really niche thing.[00:16:04] Dan Fu: You can imagine it might be hard for the model to know ahead of time what information to put into the hidden state. But these, these, these models are so much more efficient that you can do something really stupid, like, you can just put the document write down the document, write down the question, write down the document again, and then write down the question again, and then this time, the second time that you go over that document, you know exactly what to look for.[00:16:25] Dan Fu: And the cool thing about this is, so this is, And this this results in better quality, especially on these recall intensive tasks. But the other interesting thing is it really takes advantage of the more efficient architectures that, that we're having here. So one of the other, I think, influential ideas in this line of work is if you change the fundamental compute capabilities of your model and the way that it scales, you can actually start to query it at test time differently.[00:16:51] Idea 4: Test Time Compute[00:16:51] Dan Fu: And this actually, of course, goes back to those slides on test time compute. So while everybody's looking at, say, test time compute for big transformer models, [00:17:00] I think potentially a really interesting research question is, how can you take those and how does it change with this new next generation of models?[00:17:09] Dan Fu: So the, I'll just briefly summarize what some of those key ideas were and then talk and then show you briefly kind of what the state of the art is today. 
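As a small illustration of the "just read twice" prompting idea described here, the prompt really is as simple as it sounds; the sketch below only shows the prompt construction, and the model call it would feed into is hypothetical.

```python
def jrt_prompt(document: str, question: str) -> str:
    # Repeat the context: on the second pass over the document the recurrent
    # model already knows the question, so it knows what to keep in its fixed state.
    return f"{document}\n\n{question}\n\n{document}\n\n{question}"

# Hypothetical usage: answer = model.generate(jrt_prompt(long_article, "What is the niche fact?"))
```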
So, so the four key ideas are instead of just doing a simple linear attention approximation, instead take ideas that we know from other fields like signal processing, do a more principled approach to your modeling of the sequence.[00:17:32] Idea 2: Hardware & Kernel Support[00:17:32] Dan Fu: Another key idea throughout all these lines of work is you really want. Hardware and kernel support from day one. So, so even if your model is theoretically more efficient if somebody goes and runs it and it's two times slower one of the things that, that we've learned is that if, if you're in that situation, it's, it's just gonna be dead on arrival.[00:17:49] Dan Fu: So you want to be designing your architectures one of the key, key machine learning ideas that has been important for the quality is just making sure that you encode different ways that you can [00:18:00] select from your hidden state and, and really focus on that as a key decider of quality. And finally, I think one of the, the, the emerging new, new things for, for this line of work and something that's quite interesting is, What are the right test time paradigms for these models?[00:18:15] Dan Fu: How do they change relative to relative to what you might do for a standard transformer? I'll briefly end this section. So I've labeled this slide where we are yesterday because Eugene is going to talk about some new models that he released literally this morning. But as of yesterday, some of the really cool results out of the, these efficient alternative models were so AI2 trained this hybrid MOE called Jamba.[00:18:40] Dan Fu: That, that, that seems, that is currently the state of the art for these non transformer architectures. There's this NVIDIA and MIT put out this new diffusion model called SANA recently that one of their key key observations is that you can take a standard diffusion transformer diffusion model, replace the layers with linear [00:19:00] attention, and then that lets you scale to much larger much larger images, much, much Much larger sequences more efficiently.[00:19:07] Dan Fu: And and one thing that I don't think anybody would have called when a few years ago is that one of those gated SSM, gated states based models ended up on the cover of Science because a great group of folks went and trained some DNA models. So that's Michael Polley, Eric Yuen from from Stanford and the Arc Institute.[00:19:26] Dan Fu: So it's, we're really at an exciting time in 2024 where these non transformer, post transformer architectures are showing promise across a wide range. Across a wide range of, of modalities, of applications, and, and of tasks. And with that, I'll pass it on to Eugene, who can tell you a little bit about the latest and greatest with RWKV.[00:19:49] RWKV vs SSMs[00:19:49] Eugene Cheah: So, that's useful? Yeah. You're talking to here. Oh, I'm talking to here. Okay. So, yeah, two streams. Yeah. So, I think one common questions that we tend to get asked, right, is what's the difference between [00:20:00] RWKV and state space? So I think one of the key things to really understand, right the difference between the two groups, right, is that we are actually more like an open source, random internet meets academia kind of situation.[00:20:11] Eugene Cheah: Like, most of us never wrote any paper, but we, we basically look at RNNs and linear intention when intention is all you need came out, and then we decided to like, hey there is a quadratic scaling problem. Why don't we try fixing that instead? 
So, so, so we end up developing our own branch, but we end up sharing ideas back and forth.[00:20:30] Eugene Cheah: So, and, and we do all this actively in Discord, GitHub, etc. This was so bad for a few years, right, that basically, the average group's H index was so close to zero, right, Illuter. ai actually came in and helped us write our first paper. Great, now our H index is now three, apparently. So, so, so, but, but the thing is, like, a lot of these experiments led to results, and, and, essentially, essentially, we we took the same ideas from linear attention, [00:21:00] and we built on it.[00:21:01] Eugene Cheah: So, to take a step back into, like, how does RWKB handle its own attention mechanic and achieve the same goals of, like, O and compute, respectively, and in focus of our overall goal to make AI accessible to everyone, regardless of language, nation, or compute, that's our goal. We actually train our models primarily on over a hundred languages, which is another topic altogether.[00:21:23] Eugene Cheah: And our goal is to train to even 200 languages to cover all languages in the world. But at the same time, we work on this architecture, To lower the compute cost so that people can run it on Raspberry Pis and on anything. So, how did RWKB break the dependency of LSTM token flow? Because I think to understand architecture, right, it's probably easier to understand it from the RNN lens.[00:21:46] Eugene Cheah: Because that's where we built on. We all, we all state space kind of like try to, try to start anew and took lessons from that and say, So there's a little bit of divergence there. And AKA, this our version of linear attention. So to take step back [00:22:00] all foundation models, be it transformers or non transformers at a very high level, right?[00:22:05] Eugene Cheah: Pumps in the token. I mean, text that things into embeddings and go through a lot of layers. Generate a lot of states where the QKV cache or be iron in states or RW KB states. And outputs and embedding, they are not the same thing. And we just take more layers and more embeddings. And somehow that magically works.[00:22:23] Eugene Cheah: So, if you, if you remember your ancient RNN lessons which we, which we, which we we call best learning these days the general idea is that you have the embedding information flowing all the way up, and when, and you take that information and you flow it back down, and then you process it as part of your LSTM layers.[00:22:41] Eugene Cheah: So, this is how it generally works. Kapati is quoted saying that RNNs are actually unreasonably effective. The problem is this is not scalable. To start doing work on the second token, you need to wait for the first token. And then you need to, and likewise for the third token and fourth token, yada yada.[00:22:55] Eugene Cheah: That is CPU land, not GPU land. So, so, so, you [00:23:00] can have a H100 and you can't even use 1 percent of it. So, so that's kind of why RNNs didn't really take off in the direction that we wanted, like, billions of parameters when it comes to training. So, what did RDAP KV version 0 do? Boom. We just did the dumbest, lamest thing.[00:23:13] Eugene Cheah: Sorry, this is the bottleneck for RNN. We did the dumb thing of removing that line. And it kind of worked. It trained. It sucked, but it kind of worked. Then we were like, hey, then no one cared because the loss was crap, but how do we improve that? 
And that's essentially where we move forward, because if you see this kind of flow, right, you can actually get your GPU saturated quickly, where it essentially cascades respectively.[00:23:41] Eugene Cheah: So I'm just waiting for this to loop again. So it's like, once you get your first layer, your token to be computed finish. You start to cascade your compute all the way until you are, Hey, I'm using 100 percent of the GPU. So we, we worked on it, and we started going along the principle of that as long as we keep this general architecture [00:24:00] where, where we can cascade and, and be highly efficient with our architecture, nothing is sacred in our architecture.[00:24:06] Eugene Cheah: And we have done some crazy ideas. In fact, you ask us, if you ask me to explain some things in the paper, right, officially in the paper, I'll say we had this idea and we wrote it this way. The reality is someone came with a code, we tested it, it worked, and then we rationalized later. So, so the general[00:24:24] RWKV Arch[00:24:24] Eugene Cheah: The idea behind rwkbr is that we generally have two major blocks that we do.[00:24:30] Eugene Cheah: We call time mix and channel mix. And time mix generally handles handles long term memory states, where essentially, where essentially where we apply the matrix multiplication and Cilu activation functions into processing an input embedding and an output embedding. I'm oversimplifying it because this, This calculation changed every version and we have, like, version 7 right now.[00:24:50] Eugene Cheah: ChannelMix is similar to Base in the sense that it does shorter term attention, where it just looks at the sister token, or the token before it, because [00:25:00] there's a shift in the token shift matrix. I don't really want to go too much into the papers itself, because, like, we do have three papers on this.[00:25:09] Eugene Cheah: Basically, RWKB, RNN for the transformer, ERA, Ego and Pinch, RWKB, Matrix Value State. This is the updated version 5, version 6. And Goldfinch is our, is, is, is, is our hybrid model respectively. We are writing the paper already for V seven and which is, which is for R wk V seven. Called, named Goose, or architectures are named by Bird.[00:25:30] Eugene Cheah: And, I'm going to cover as well, qrwkb, and mama100k, and rwkb, and Where did that lead to? Great! Because we are all GPU poor and to be clear, like, most of this research is done, like, only on a handful H100s, which I had one Google researcher told me that was, like, his experiment budget for a single researcher.[00:25:48] Eugene Cheah: So, our entire organization has less compute than a single researcher in Google. So We, we, one of the things that we explored into was to how do we convert transformer models instead? Because [00:26:00] someone already paid that billion dollars, a million dollars onto training, so why don't we take advantage of those weights?[00:26:05] Eugene Cheah: And, and to, I believe, together AI worked on the lockets for, for the Lambda side of things, and, and we took some ideas from there as well, and we essentially did that for RWKB.[00:26:15] QWRKWv6 launch[00:26:15] Eugene Cheah: And that led to, Q RWKB6, which we just dropped today, a 32 bit instruct preview model, where we took the Quen 32 bit instruct model, freeze the feedforward layer, remove the QKB attention layer, and replace it with RWKB linear layers.[00:26:32] Eugene Cheah: So to be clear, this means we do not have the rwkv channel mix layer, we only have the time mix layer. 
But but once we do that, we train the rwkv layer. Important is that the feedforward layer needs to be frozen, so the new attention can be learned. And then we unfreeze the feedforward layer, and train all the layers together with a custom learning rate schedule, so that they can learn how to work together.[00:26:54] Eugene Cheah: The end result, surprisingly, And, to be honest, to the frustration of the R. W. [00:27:00] KV MOE team, which ended up releasing the model on the same day, was that, with just a few hours of training on two nodes, we managed to get it to be on par, kind of, with the original QUAN32B model. So, in fact, when the first run, right, that completely confused us, it was like, and I was telling Daniel Goldstein, Smirky, who kind of leads most of our research coordination, When you pitched me this idea, you told me at best you'll get the same level of performance.[00:27:26] Eugene Cheah: You didn't tell me the challenge and score and Winograd score will shoot up. I don't know what's happening there. But it did. MMLU score dropping, that was expected. Because if you think about it, when we were training all the layers, right, we were essentially Like, Frankenstein this thing, and we did brain damage to the feedforward network layer 2 with the new RWKB layers.[00:27:47] Eugene Cheah: But, 76%, hey, somehow it's retained, and we can probably further train this. We didn't even spend more than 3 days training this, so there's a lot more that can be done, hence the preview. This brings up [00:28:00] a big question, because We are already now in the process of converting to 7TB. We are now, this is actually extremely compute efficient to test our attention mechanic.[00:28:10] Eugene Cheah: It's like, it becomes a shortcut. We can, we are already planning to do our version 7 and our hybrid architecture for it. Because we don't need to train from scratch. And we get a really good model out of it. And the other thing that is uncomfortable to say is that because we are doing right now on the 70b is that if this scales correctly to 128k context length, I'm not even talking about a million 128, majority of enterprise workload today is just on 70b at under 32k context length.[00:28:41] Eugene Cheah: That means if this works and the benchmark matches it, It means we can replace the vast majority of current AI workload, unless you want super long context. And then sorry, can someone give us more GPUs? Because we do need the VRAM for super long context, sadly. So yeah, that's what we are working on, and essentially, [00:29:00] we are excited about this to just push it further.[00:29:02] Eugene Cheah: And this conversion process, to be clear, I don't think it's going to be exclusive to RWKB. It probably will work for Mamba as well, I don't see why not. And we will probably see more ideas, or more experiments, or more hybrids, or Yeah, like, one of the weirdest things that I wanted to say outright, and I confirmed this with the Black Mamba team and the Jamba team, which because we did the GoFinch hybrid model, is that none of us understand why a hard hybrid with a state based model to be R.[00:29:28] Eugene Cheah: QA state space and transformer performs better when, than the baseline of both. It's like, it's like when you train one, you expect, and then you replace, you expect the same results. That's our pitch. That's our claim. But somehow when we jam both together, it outperforms both. 
And that's like one area of emulation that, like, we only have four experiments, plus four teams, that a lot more needs to be done.[00:29:51] Eugene Cheah: But, but these are things that excite me, essentially, because that is what it's potentially we can move ahead for. Which brings us to what comes next.[00:30:00] What's next[00:30:00] [00:30:00][00:30:00] Dan Fu: So, this part is kind of just some, where we'll talk a little bit about stuff that, that we're excited about. Maybe have some wild speculation on, on what, what's, what's coming next.[00:30:12] Dan Fu: And, of course this is also the part that will be more open to questions. So, a couple things that, that I'm excited about is continued hardware model co design for, for these models. So one of the things that we've put out recently is this library called ThunderKittens. It's a CUDA library.[00:30:29] Dan Fu: And one of the things that, that we found frustrating is every time that we built one of these new architectures, and I'm sure you had the exact same experience, we'd have to go and spend two months in CUDA land, like writing these, these new efficient things. And. If we decided to change one thing in PyTorch, like one line of PyTorch code is like a week of CUDA code at least.[00:30:47] Dan Fu: So one of our goals with, with a library like Thunderkitten, so we, we just broke down what are the key principles, what are the key hardware things what are the key, Compute pieces that you get from the hardware. So for example on [00:31:00] H100 everything is really revolves around a warp group matrix multiply operation.[00:31:06] Dan Fu: So you really want your operation to be able to split into relatively small matrix, matrix multiply operations. So like multiplying two 64 by 64 matrices, for example. And so if you know that ahead of time when you're designing your model, that probably gives you you know, some information about how you set the state sizes, how you set the update, how you set the update function.[00:31:27] Dan Fu: So with Thunderkittens we basically built a whole library just around this basic idea that all your basic compute primitives should not be a float, but it should be a matrix, and everything should just be matrix compute. And we've been using that to, to try to both re implement some existing architectures, and also start to design code.[00:31:44] Dan Fu: Some new ones that are really designed with this core with a tensor core primitive in mind. Another thing that that we're, that at least I'm excited about is we, over the last four or five years, we've really been looking at language models as the next thing. But if you've been paying [00:32:00] attention to Twitter there's been a bunch of new next generation models that are coming out.[00:32:04] Dan Fu: So there, there are. So, video generation models that can run real time, that are supported by your mouse and your keyboard, that I'm told if you play with them that, you know, that they only have a few seconds of memory. Can we take that model, can we give it a very long context length so that you could actually maybe generate an entire game state at a time?[00:32:25] Dan Fu: What does that look like for the model? You're certainly not going to do a giant quadratic attention computation to try to run that. Maybe, maybe use some of these new models, or some of these new video generation models that came out. So Sora came out I don't know, two days ago now. 
But with super long queue times and super long generation times.[00:32:43] Dan Fu: So that's probably a quadratic attention operation at the, at the bottom of it. What if we could remove that and get the same quality, but a lot faster generation time? Or some of the demos that we saw from Paige earlier today. You know, if I have a super long conversation with my [00:33:00] Gemini bot, what if I wanted to remember everything that it's seen in the last week?[00:33:06] Dan Fu: I mean, maybe you don't for personal reasons, but what if I did, you know? What does that mean for the architecture? And I think, you know, that's certainly something I'm pretty excited about. I'm sure you're excited about it too. So, I think we were supposed to have some hot takes, but I honestly don't remember what our hot takes were.[00:33:21] Hot Takes - does anyone really need long context?[00:33:21] Eugene Cheah: Yeah, including the next slide. Hot takes, yes, these are our[00:33:25] Dan Fu: hot takes.[00:33:25] Eugene Cheah: I think the big one on Twitter that we saw, that we shared, was the question is like, is RAG relevant? In the case of, like, the future of, like, state based models?[00:33:38] Dan Fu: Let's see, I haven't played too much with RAG. But when I have. I'll say I found it was a little bit challenging to do research on it because we had this experience over and over again, where you could have any, an embedding model of any quality, so you could have a really, really bad embedding model, or you could have a really, really [00:34:00] good one, By any measure of good.[00:34:03] Dan Fu: And for the final RAG application, it kind of didn't matter. That's what I'll say about RAG while I'm being recorded. I know it doesn't actually answer the question, but[00:34:13] Eugene Cheah: Yeah, so I think a lot of folks are like, extremely excited of the idea of RWKB or State Space potentially having infinite context.[00:34:21] Eugene Cheah: But I think the reality is that when we say infinite context, we just mean a different kind of infinite context, or you, or as it's previously covered, you need to test the model differently. So, think of it more along the lines of the human. Like, I don't remember what I ate for breakfast yesterday.[00:34:37] Eugene Cheah: Yeah, that's the statement that I'll say. And And we humans are not quadratic transformers. If we did, if let's say we increased our brain size for every second we live, we would have exploded by the time we are 5 years old or something like that. And, and I think, I think basically fundamentally for us, right, be it whether we, regardless of whether RWKB, statespace, XLSTM, [00:35:00] etc, our general idea is that instead of that expanding state, that increase in computational cost, what if we have a fixed state size?[00:35:08] Eugene Cheah: And Information theory detects that that fixed state size will have a limit. Just how big of a limit is a question, like, we, like, RWKB is running at 40 megabytes for, for its state. Its future version might run into 400 megabytes. That is like millions of tokens in, if you're talking about mathematically, the maximum possibility.[00:35:29] Eugene Cheah: It's just that I guess we were all more inefficient about it, so maybe we hit 100, 000. And that's kind of like the work we are doing, trying to like push it and maximize it. And that's where the models will start differing, because it will choose to forget things, it will choose to remember things. 
And that's why I think that there might be some element of right, but it may not be the same right.[00:35:49] Eugene Cheah: It may be the model learn things, and it's like, hmm, I can't remember that, that article. Let me do a database search, to search. Just like us humans, when we can't remember the article in the company. We do a search on Notion. [00:36:00][00:36:00] Dan Fu: I think something that would be really interesting is if you could have facts that are, so right now, the one intuition about language models is that all those parameters are around just to store random facts about the world.[00:36:14] Dan Fu: And this intuition comes from the observation that if you take a really small language model, it can do things like talk to you, or kind of has like the The style of conversation, it can learn that, but where it will usually fall over compared to a much larger one is it'll just be a lot less factual about things that it knows or that it can do.[00:36:32] Dan Fu: But that points to all those weights that we're spending, all that SGD that we're spending to train these models are just being used to store facts. And we have things like databases that are pretty good at storing facts. So I think one thing that would be really interesting is if we could actually have some sort of outside data store that a language model can can look at that that maybe is you know, has has some sort of gradient descent in it, but but would be quite interesting.[00:36:58] Dan Fu: And then maybe you could edit it, delete [00:37:00] facts, you know, change who's president so that it doesn't, it doesn't get lost.[00:37:04] Vibhu: Can we open up Q& A and hot takes for the audience? I have a hot take Q& A. Do these scale? When, when 405B state space model, RAG exists, no one does long context, who's throwing in 2 million token questions, hot takes?[00:37:24] Dan Fu: The, the who's throwing in 2 million token question, I think, is, is a really good question. So I actually, I was going to offer that as a hot take. I mean, my hot take was going to be that long context doesn't matter. I know I just gave a whole talk about it, but you know, what, what's the point of doing research if you can't, you know, play both sides.[00:37:40] Dan Fu: But I think one of the, so I think for both of us, the reason that we first got into this was just from the first principled questions of there's this quadratic thing. Clearly intelligence doesn't need to be quadratic. What is going on? Can we understand it better? You know, since then it's kind of turned into a race, which has [00:38:00] been exciting to watch, like, how much context you can take in.[00:38:03] Dan Fu: But I think it's right. Nobody is actually putting in a two million context prompt into these models. And, and, you know, if they are, maybe we can go, go You know, design a better model to do that particular thing. Yeah, what do you think about that? So you've also been working on this. Do you think long context matters?[00:38:19] Eugene Cheah: So I'm going to burn a bit. How many of you remember the news of Google Gemini supporting 3 million contacts, right? Raise your hand.[00:38:28] Vibhu: Yeah, 2 million.[00:38:29] Eugene Cheah: Oh, it's 2 million.[00:38:31] Eugene Cheah: Yeah, how many of you actually tried that? See?[00:38:34] Vibhu: I use it a lot. You? You work for MindsTV. 
I use it a lot.[00:38:41] Eugene Cheah: So, for some people that has used, and I think, I think that's the, that's might be, like, this is where my opinion starts to differ, because I think the big labs may have a bigger role in this, because Like, even for RWKB, even when we train non contacts, the reason why I say VRAM is a problem is that because when we did the, we need to backprop [00:39:00] against the states, we actually need to maintain the state in between the tokens by the token length.[00:39:05] Eugene Cheah: So that means we need to actually roll out the whole 1 million contacts if we are actually training 1 million. Which is the same for transformers, actually, but it just means we don't magically reuse the VRAM consumption in the training time space. So that is one of the VRAM bottlenecks, and I'm neither OpenAI nor Google, so donate GPUs if you have too much of them.[00:39:27] Eugene Cheah: But then, putting it back to another paradigm, right, is that I think O1 style reasoning might be actually pushing that direction downwards. In my opinion, this is my partial hot take is that if, let's say you have a super big model, And let's say you have a 70B model that may take double the tokens, but gets the same result.[00:39:51] Eugene Cheah: Strictly speaking, a 70B, and this is even for transformer or non transformer, right? We we'll take less less resources than that 400 B [00:40:00] model, even if it did double the amount thinking. And if that's the case, and we are still all trying to figure this out, maybe the direction for us is really getting the sub 200 B to be as fast as efficient as possible.[00:40:11] Eugene Cheah: We a very efficient architecture that some folks happen to be working on to, to just reason it out over larger and larger context thing.[00:40:20] Question: Yeah. One thing I'm super interested in is. Models that can watch forever? Obviously you cannot train something on infinite context length. How are y'all thinking about that, where you run on a much longer context length than is possible to train on?[00:40:38] Dan Fu: Yeah, it's a, it's a great question. So I think when I think you guys probably had tweets along these lines, too. When we first started doing these things, because these are all recurrent models in theory you could just run it forever. You could just run it forever. And at the very least it won't, it won't like error out on your crash.[00:40:57] Dan Fu: There's another question of whether it can actually [00:41:00] use what it's seen in that infinite context. And I think there, so one place where probably the research and architectures ran faster Then another research is actually the benchmarks for long context. So you turn it on forever. You want to do everything or watch everything.[00:41:16] Dan Fu: What is it that you actually wanted to do? Can we actually build some benchmarks for that? Then measure what's happening. And then ask the question, can the models do it? Is there something else that they need? Yeah, I think that if I were to turn back the clock to 2022, that's probably one of the things I would have done differently, which would have been actually get some long context benchmarks out at the same time as we started pushing context length on all these models.[00:41:41] Eugene Cheah: I will also say the use case. So like, I think we both agree that there's no Infinite memory and the model needs to be able to learn and decide. 
I think what we have observed, and I think this also fits the state space models, is that one of the key advantages of this alternate attention mechanic that is not based on token position is that the model doesn't suddenly become crazy when you go past the [00:42:00] 8k training context length, or a million context length.[00:42:03] Eugene Cheah: It's actually still stable. It's still able to run, it's still able to rationalize. It just starts forgetting things. But some of these things are still there in latent memory. Some of these things are still somewhat there. That's the whole point of why reading twice works. Things like that. And one of the biggest pushes in this direction is that I think both state space models and RWKV have separate papers by other researchers where they use this architecture for time series data.[00:42:26] Eugene Cheah: Weather modeling. So, you are not asking what was the weather five days ago. You're asking what's the weather tomorrow, based on the infinite length that we have, as long as this Earth and the computer keep running. So, and they found that it is, like, better than existing transformer or existing architectures in modeling this weather data.[00:42:47] Eugene Cheah: Controlling for the param size and stuff. I'm quite sure there are people with larger models. So there are things that, in this case, right, there are future applications, if your question is just what's next and not what's 10 years ago.[00:42:59] Dan Fu: Thanks so [00:43:00] much for having us. Get full access to Latent Space at www.latent.space/subscribe
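To make Eugene's VRAM point above concrete: when you backpropagate through a recurrent or state space model, every intermediate state along the training context has to stay live (or be recomputed) for the backward pass, so training memory grows with sequence length even though inference memory stays constant. A minimal PyTorch sketch of that effect, with a toy GRU cell standing in for the real recurrent block (illustrative only, not RWKV's actual training code):

```python
import torch
import torch.nn as nn

# Toy recurrent "language model": one GRU cell plus a vocab projection.
# The point is the training loop, not the architecture.
vocab, dim, seq_len = 100, 64, 1024
cell = nn.GRUCell(dim, dim)
embed = nn.Embedding(vocab, dim)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (seq_len,))
state = torch.zeros(1, dim)

loss = 0.0
for t in range(seq_len - 1):
    # Each step's state stays in the autograd graph, so training memory
    # scales with seq_len even though inference could discard old states.
    state = cell(embed(tokens[t]).unsqueeze(0), state)
    logits = head(state)
    loss = loss + nn.functional.cross_entropy(logits, tokens[t + 1].unsqueeze(0))

loss.backward()  # backprop "against the states" across the whole context
```

Truncating backprop or checkpointing states trades compute for memory, but training a genuine million-token context still means holding, or recomputing, a million states.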

Entre nos pages
Episode #91 : Où l'on tient un journal de lecture

Entre nos pages

Play Episode Listen Later Nov 12, 2024 34:06


Hellooooo! We're back with a reading journal, and as usual we went off in every direction :D We hope you enjoy it; feel free to send us your thoughts on Instagram @entrenospages or by email: entrenospages@gmail.com. Happy listening! The books discussed in this episode are: - Capitale du Nord T2, Claire Duvivier - Fils-des-Brumes/Mistborn T2, Brandon Sanderson - Le club des veufs noirs/Tales of the Black Widowers, Isaac Asimov - Jusque dans la terre/Follow Me to Ground, Sue Rainsford - Chaussette, Loïc Clément and Anne Montel - Chaque jour Dracula, Loïc Clément and Clément Lefèvre - Les détectives du Yorkshire/The Dales Detective T9, Julia Chapman - Brussailes, Eléonore Devillepoix - Journal d'un Assasynth/The Murderbot Diaries T6, Martha Wells - Les 5 terres T7 to 12, Lewelyn and Jérôme Lereculey - Tu réclamais le soir, Fabrice Colin - Le cercle du dragon-thé/The Tea Dragon Society, K. O'Neill - Six versions/Six Stories T5, Matt Wesolowski Music promoted by La Musique Libre Joakim Karud - Canals: https://youtu.be/zrXbhncmorc Joakim Karud: https://soundcloud.com/joakimkarud

PodCacher: Geocaching Goodness
Show 887.0: Extreme Caches and Cache Odyssey

PodCacher: Geocaching Goodness

Play Episode Listen Later Nov 11, 2024 36:23


On our geocaching podcast today, we have a discussion of the November Cache hiding theme, extreme (T5) caches, along with a big announcement of a special geocaching project you'll want to know about. We also share interesting alternatives to GPS, a special message and new product from Logwerk, some contest winners and much more.

NO ENCORE
TOP 5 NONSENSE SONGS ft. Max Zanga

NO ENCORE

Play Episode Listen Later Oct 18, 2024 106:51


Autumnal gloom has set in and the only way is down after Zara Hedderman's all-timer of a Top 5 on the previous episode, right? WRONG! Despite the astounding heights of that aforementioned T5, we're keeping the energy supercharged with the returning Max Zanga, present in-studio as a representative of mysterious rising solo talent Filmore!, who definitely is a different person than Max, honest. The new Filmore! EP is called Idle Death Gamble and it's really very good indeed, and so we talk about that and the overall project at large. Elsewhere, we've got a truly nonsensical Top 5, and a grab-bag of news to get through...And don't forget to hit up patreon.com/noencore, we've got a new Film Club episode dropping this Sunday, all about sweaty and problematic 1993 thriller Falling Down.ACT ONE: The preamble in which we ramble. ACT TWO: Filmore! in conversation. Kinda. ACT THREE (31:54): News! Robert Smith bashes Oasis, Taylor Swiftonomics keeps on churning out the goods, Donald Trump reveals his megamix, Atomic Kitten are still a thing, and some words for the late, great Ka. ACT FOUR (1:04:32): Top 5 Nonsense Songs. -Follow Max Zanga on Instagram / X Follow Filmore! on InstagramListen to Idle Death Gamble Get bonus content on Patreon Hosted on Acast. See acast.com/privacy for more information.

JSA Podcasts for Telecom and Data Centers
How T5's Integrated Data Center Services Are Transforming the Industry | Interview with David Mettler, EVP of Sales and Marketing, & John Shingler, EVP

JSA Podcasts for Telecom and Data Centers

Play Episode Listen Later Oct 9, 2024 6:45


In this exclusive interview, David Mettler, EVP of Sales and Marketing, and John Shingler, EVP of T5 Data Centers, dive into the trends shaping the future of #datacenterdevelopment and #datacenteroperations. Discover how their company's integrated approach is attracting more clients and meeting the evolving needs of the industry. Learn why businesses are turning to T5 for both building and operating their data centers, ensuring they remain #ForeverOn.

JSA Podcasts for Telecom and Data Centers
DCD Connect | London: T5's EVP John Shingler talks European Market and Expansion in the US

JSA Podcasts for Telecom and Data Centers

Play Episode Listen Later Sep 30, 2024 5:28


Watch as T5's EVP John Shingler discusses the company's journey in Europe and what's next on the horizon. Over the past two years, T5 has achieved remarkable success in the region—launching a new client this year. The company has also expanded its footprint in the US with projects announced in Atlanta and Chicago. #datacenters #digitalinfrastructure

Focus Check
ep30 - IBC 2024 Recap | AF Cine Lenses by SIGMA & BLAZAR | LC-Tec eDiffusion | PDMovie 3D Air Budget 3D Shooting Rig

Focus Check

Play Episode Listen Later Sep 19, 2024 58:49


Fresh after IBC 2024 trade show, Johnnie and Nino recap the most innovative and exciting products they discovered. They highlight standout innovations, including the latest autofocus lenses from BLAZAR and SIGMA, and showcase many more fascinating products they explored at the event. Be sure to watch until the end for all the details! Sponsor: This episode is sponsored by Fujifilm. Check it out at (17:20). Chapters & articles mentioned in this episode:   (00:00) - Intro (06:04)  - BLAZAR APEX 1.33x Autofocus Anamorphic Lenses – First Look https://www.cined.com/blazar-apex-1-33x-autofocus-anamorphic-lenses-first-look/ (10:06) - SIGMA Auto Focus Cine Lens Prototype to be Shown at IBC https://www.cined.com/sigma-auto-focus-cine-lens-prototype-to-be-shown-at-ibc/ (18:40) - VILTROX LUNA 42-420mm T5.6 Large-Format Cine Zoom – First Look https://www.cined.com/viltrox-luna-42-420mm-t5-6-large-format-cine-zoom-first-look/ (20:55) - NiSi ATHENA Tuned Full-Frame Cinema Prime Lenses – First Look https://www.cined.com/nisi-athena-tuned-full-frame-cinema-prime-lenses-first-look/ (24:38) - PDMOVIE 3D AIR Smart Mini – First Look https://www.cined.com/pdmovie-3d-air-smart-mini-first-look/ (33:22) - iFootage Shark Slider Nano 2 Announced – First Look https://www.cined.com/ifootage-shark-slider-nano-2-announced-first-look/ (36:56) - Aputure STORM 1200x with New BLAIR Light Engine – First Look https://www.cined.com/aputure-storm-1200x-with-new-blair-light-engine-first-look/ (42:13) - Pixelcube remote DIT box https://www.cined.com/awesome-pixels-x-ottomatic-pixel-cube-explained-automated-remote-offload-solution-for-dits/ (49:58) - LC-Tec Electronic Variable Diffusion Filter System & Metabones EF-E CINE eND Smart Adapter – First Look https://www.cined.com/lc-tec-electronic-variable-diffusion-filter-system-metabones-ef-e-cine-end-smart-adapter-first-look/ (57:31) - Outro    We hope you enjoyed this episode! You have feedback, comments, or suggestions? Write us at podcast@cined.com. 

Slate Star Codex Podcast
Your Book Review: Two Arms and a Head

Slate Star Codex Podcast

Play Episode Listen Later Aug 26, 2024 54:48


[This is one of the finalists in the 2024 book review contest, written by an ACX reader who will remain anonymous until after voting is done. I'll be posting about one of these a week for several months. When you've read them all, I'll ask you to vote for a favorite, so remember which ones you liked] Content warning: body horror, existential devastation, suicide. This book is an infohazard that will permanently alter your view of paraplegia. The Death of a Newly-Paraplegic Philosopher For me, paraplegia and life itself are not compatible. This is not life, it is something else. In May of 2006, philosophy student Clayton Schwartz embarks on a Pan-American motorcycle trip for the summer before law school. He is 30 years old and in peak physical condition.  He makes it as far south as Acapulco in Mexico before crashing into a donkey that had wandered into the road.  The impact crushes his spinal cord at the T5 vertebra, rendering him paralyzed from the nipples down.  On Sunday, February 24, 2008, he commits suicide. In the year and a half in between, he writes Two Arms and a Head, his combination memoir and suicide note.  https://www.astralcodexten.com/p/your-book-review-two-arms-and-a-head 

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
AI Magic: Shipping 1000s of successful products with no managers and a team of 12 — Jeremy Howard of Answer.ai

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Aug 16, 2024 58:56


Disclaimer: We recorded this episode ~1.5 months ago, timing for the FastHTML release. It then got bottlenecked by Llama3.1, Winds of AI Winter, and SAM2 episodes, so we're a little late. Since then FastHTML was released, swyx is building an app in it for AINews, and Anthropic has also released their prompt caching API. Remember when Dylan Patel of SemiAnalysis coined the GPU Rich vs GPU Poor war? (if not, see our pod with him). The idea was that if you're GPU poor you shouldn't waste your time trying to solve GPU rich problems (i.e. pre-training large models) and are better off working on fine-tuning, optimized inference, etc. Jeremy Howard (see our “End of Finetuning” episode to catchup on his background) and Eric Ries founded Answer.AI to do exactly that: “Practical AI R&D”, which is very in-line with the GPU poor needs. For example, one of their first releases was a system based on FSDP + QLoRA that let anyone train a 70B model on two NVIDIA 4090s. Since then, they have come out with a long list of super useful projects (in no particular order, and non-exhaustive):* FSDP QDoRA: this is just as memory efficient and scalable as FSDP/QLoRA, and critically is also as accurate for continued pre-training as full weight training.* Cold Compress: a KV cache compression toolkit that lets you scale sequence length without impacting speed.* colbert-small: state of the art retriever at only 33M params* JaColBERTv2.5: a new state-of-the-art retrievers on all Japanese benchmarks.* gpu.cpp: portable GPU compute for C++ with WebGPU.* Claudette: a better Anthropic API SDK. They also recently released FastHTML, a new way to create modern interactive web apps. Jeremy recently released a 1 hour “Getting started” tutorial on YouTube; while this isn't AI related per se, but it's close to home for any AI Engineer who are looking to iterate quickly on new products: In this episode we broke down 1) how they recruit 2) how they organize what to research 3) and how the community comes together. At the end, Jeremy gave us a sneak peek at something new that he's working on that he calls dialogue engineering: So I've created a new approach. It's not called prompt engineering. I'm creating a system for doing dialogue engineering. It's currently called AI magic. 
I'm doing most of my work in this system and it's making me much more productive than I was before I used it.He explains it a bit more ~44:53 in the pod, but we'll just have to wait for the public release to figure out exactly what he means.Timestamps* [00:00:00] Intro by Suno AI* [00:03:02] Continuous Pre-Training is Here* [00:06:07] Schedule-Free Optimizers and Learning Rate Schedules* [00:07:08] Governance and Structural Issues within OpenAI and Other AI Labs* [00:13:01] How Answer.ai works* [00:23:40] How to Recruit Productive Researchers* [00:27:45] Building a new BERT* [00:31:57] FSDP, QLoRA, and QDoRA: Innovations in Fine-Tuning Large Models* [00:36:36] Research and Development on Model Inference Optimization* [00:39:49] FastHTML for Web Application Development* [00:46:53] AI Magic & Dialogue Engineering* [00:52:19] AI wishlist & predictionsShow Notes* Jeremy Howard* Previously on Latent Space: The End of Finetuning, NeurIPS Startups* Answer.ai* Fast.ai* FastHTML* answerai-colbert-small-v1* gpu.cpp* Eric Ries* Aaron DeFazio* Yi Tai* Less Wright* Benjamin Warner* Benjamin Clavié* Jono Whitaker* Austin Huang* Eric Gilliam* Tim Dettmers* Colin Raffel* Sebastian Raschka* Carson Gross* Simon Willison* Sepp Hochreiter* Llama3.1 episode* Snowflake Arctic* Ranger Optimizer* Gemma.cpp* HTMX* UL2* BERT* DeBERTa* Efficient finetuning of Llama 3 with FSDP QDoRA* xLSTMTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:14]: And today we're back with Jeremy Howard, I think your third appearance on Latent Space. Welcome.Jeremy [00:00:19]: Wait, third? Second?Swyx [00:00:21]: Well, I grabbed you at NeurIPS.Jeremy [00:00:23]: I see.Swyx [00:00:24]: Very fun, standing outside street episode.Jeremy [00:00:27]: I never heard that, by the way. You've got to send me a link. I've got to hear what it sounded like.Swyx [00:00:30]: Yeah. Yeah, it's a NeurIPS podcast.Alessio [00:00:32]: I think the two episodes are six hours, so there's plenty to listen, we'll make sure to send it over.Swyx [00:00:37]: Yeah, we're trying this thing where at the major ML conferences, we, you know, do a little audio tour of, give people a sense of what it's like. But the last time you were on, you declared the end of fine tuning. I hope that I sort of editorialized the title a little bit, and I know you were slightly uncomfortable with it, but you just own it anyway. I think you're very good at the hot takes. And we were just discussing in our pre-show that it's really happening, that the continued pre-training is really happening.Jeremy [00:01:02]: Yeah, absolutely. I think people are starting to understand that treating the three ULM FIT steps of like pre-training, you know, and then the kind of like what people now call instruction tuning, and then, I don't know if we've got a general term for this, DPO, RLHFE step, you know, or the task training, they're not actually as separate as we originally suggested they were in our paper, and when you treat it more as a continuum, and that you make sure that you have, you know, more of kind of the original data set incorporated into the later stages, and that, you know, we've also seen with LLAMA3, this idea that those later stages can be done for a lot longer. These are all of the things I was kind of trying to describe there. 
It wasn't the end of fine tuning, but more that we should treat it as a continuum, and we should have much higher expectations of how much you can do with an already trained model. You can really add a lot of behavior to it, you can change its behavior, you can do a lot. So a lot of our research has been around trying to figure out how to modify the model by a larger amount rather than starting from random weights, because I get very offended at the idea of starting from random weights.Swyx [00:02:14]: Yeah, I saw that in ICLR in Vienna, there was an outstanding paper about starting transformers from data-driven piers. I don't know if you saw that one, they called it sort of never trained from scratch, and I think it was kind of rebelling against like the sort of random initialization.Jeremy [00:02:28]: Yeah, I've, you know, that's been our kind of continuous message since we started Fast AI, is if you're training for random weights, you better have a really good reason, you know, because it seems so unlikely to me that nobody has ever trained on data that has any similarity whatsoever to the general class of data you're working with, and that's the only situation in which I think starting from random weights makes sense.Swyx [00:02:51]: The other trends since our last pod that I would point people to is I'm seeing a rise in multi-phase pre-training. So Snowflake released a large model called Snowflake Arctic, where they detailed three phases of training where they had like a different mixture of like, there was like 75% web in the first instance, and then they reduced the percentage of the web text by 10% each time and increased the amount of code in each phase. And I feel like multi-phase is being called out in papers more. I feel like it's always been a thing, like changing data mix is not something new, but calling it a distinct phase is new, and I wonder if there's something that you're seeingJeremy [00:03:32]: on your end. Well, so they're getting there, right? So the point at which they're doing proper continued pre-training is the point at which that becomes a continuum rather than a phase. So the only difference with what I was describing last time is to say like, oh, there's a function or whatever, which is happening every batch. It's not a huge difference. You know, I always used to get offended when people had learning rates that like jumped. And so one of the things I started doing early on in Fast.ai was to say to people like, no, you should actually have your learning rate schedule should be a function, not a list of numbers. So now I'm trying to give the same idea about training mix.Swyx [00:04:07]: There's been pretty public work from Meta on schedule-free optimizers. I don't know if you've been following Aaron DeFazio and what he's doing, just because you mentioned learning rate schedules, you know, what if you didn't have a schedule?Jeremy [00:04:18]: I don't care very much, honestly. I don't think that schedule-free optimizer is that exciting. It's fine. We've had non-scheduled optimizers for ages, like Less Wright, who's now at Meta, who was part of the Fast.ai community there, created something called the Ranger optimizer. I actually like having more hyperparameters. You know, as soon as you say schedule-free, then like, well, now I don't get to choose. And there isn't really a mathematically correct way of, like, I actually try to schedule more parameters rather than less. So like, I like scheduling my epsilon in my atom, for example. I schedule all the things. 
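A small sketch of the "schedules should be functions, not lists of numbers" idea, in plain Python rather than fast.ai's actual API: the learning rate, Adam's epsilon, and even the pre-training data mix can all be callables of training progress, which is what turns discrete "phases" into a continuum. The numbers below are made up for illustration:

```python
import math

def cosine_lr(progress: float, lr_max: float = 3e-4, lr_min: float = 3e-5) -> float:
    """Learning rate as a smooth function of training progress in [0, 1]."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

def eps_schedule(progress: float) -> float:
    """Adam epsilon annealed over training -- the same idea works for any hyperparameter."""
    return 1e-6 - (1e-6 - 1e-8) * progress

def data_mix(progress: float) -> dict:
    """Continuous data-mixture 'schedule': web text tapers off, code ramps up."""
    web = 0.75 - 0.2 * progress
    return {"web": web, "code": 1.0 - web}

total_steps = 10_000
for step in range(total_steps):
    p = step / total_steps
    lr, eps, mix = cosine_lr(p), eps_schedule(p), data_mix(p)
    # ...sample a batch according to `mix`, then take an optimizer step with `lr` and `eps`
```

Seen this way, a multi-phase recipe like the Snowflake Arctic mix described earlier is just a step-function special case of `data_mix`.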
But then the other thing we always did with the Fast.ai library was make it so you don't have to set any schedules. So Fast.ai always supported, like, you didn't even have to pass a learning rate. Like, it would always just try to have good defaults and do the right thing. But to me, I like to have more parameters I can play with if I want to, but you don't have to.Alessio [00:05:08]: And then the more less technical side, I guess, of your issue, I guess, with the market was some of the large research labs taking all this innovation kind of behind closed doors and whether or not that's good, which it isn't. And now we could maybe make it more available to people. And then a month after we released the episode, there was the whole Sam Altman drama and like all the OpenAI governance issues. And maybe people started to think more, okay, what happens if some of these kind of labs, you know, start to break from within, so to speak? And the alignment of the humans is probably going to fall before the alignment of the models. So I'm curious, like, if you have any new thoughts and maybe we can also tie in some of the way that we've been building Answer as like a public benefit corp and some of those aspects.Jeremy [00:05:51]: Sure. So, yeah, I mean, it was kind of uncomfortable because two days before Altman got fired, I did a small public video interview in which I said, I'm quite sure that OpenAI's current governance structure can't continue and that it was definitely going to fall apart. And then it fell apart two days later and a bunch of people were like, what did you know, Jeremy?Alessio [00:06:13]: What did Jeremy see?Jeremy [00:06:15]: I didn't see anything. It's just obviously true. Yeah. So my friend Eric Ries and I spoke a lot before that about, you know, Eric's, I think probably most people would agree, the top expert in the world on startup and AI governance. And you know, we could both clearly see that this didn't make sense to have like a so-called non-profit where then there are people working at a company, a commercial company that's owned by or controlled nominally by the non-profit, where the people in the company are being given the equivalent of stock options, like everybody there was working there with expecting to make money largely from their equity. So the idea that then a board could exercise control by saying like, oh, we're worried about safety issues and so we're going to do something that decreases the profit of the company, when every stakeholder in the company, their remuneration pretty much is tied to their profit, it obviously couldn't work. So I mean, that was a huge oversight there by someone. I guess part of the problem is that the kind of people who work at non-profits and in this case the board, you know, who are kind of academics and, you know, people who are kind of true believers. I think it's hard for them to realize that 99.999% of the world is driven very heavily by money, especially huge amounts of money. So yeah, Eric and I had been talking for a long time before that about what could be done differently, because also companies are sociopathic by design and so the alignment problem as it relates to companies has not been solved. Like, companies become huge, they devour their founders, they devour their communities and they do things where even the CEOs, you know, often of big companies tell me like, I wish our company didn't do that thing. 
You know, I know that if I didn't do it, then I would just get fired and the board would put in somebody else and the board knows if they don't do it, then their shareholders can sue them because they're not maximizing profitability or whatever. So what Eric's spent a lot of time doing is trying to think about how do we make companies less sociopathic, you know, how to, or more, you know, maybe a better way to think of it is like, how do we make it so that the founders of companies can ensure that their companies continue to actually do the things they want them to do? You know, when we started a company, hey, we very explicitly decided we got to start a company, not a academic lab, not a nonprofit, you know, we created a Delaware Seacorp, you know, the most company kind of company. But when we did so, we told everybody, you know, including our first investors, which was you Alessio. They sound great. We are going to run this company on the basis of maximizing long-term value. And in fact, so when we did our second round, which was an angel round, we had everybody invest through a long-term SPV, which we set up where everybody had to agree to vote in line with long-term value principles. So like never enough just to say to people, okay, we're trying to create long-term value here for society as well as for ourselves and everybody's like, oh, yeah, yeah, I totally agree with that. But when it comes to like, okay, well, here's a specific decision we have to make, which will not maximize short-term value, people suddenly change their mind. So you know, it has to be written into the legal documents of everybody so that no question that that's the way the company has to be managed. So then you mentioned the PBC aspect, Public Benefit Corporation, which I never quite understood previously. And turns out it's incredibly simple, like it took, you know, like one paragraph added to our corporate documents to become a PBC. It was cheap, it was easy, but it's got this huge benefit, which is if you're not a public benefit corporation, then somebody can come along and offer to buy you with a stated description of like turning your company into the thing you most hate, right? And if they offer you more than the market value of your company and you don't accept it, then you are not necessarily meeting the kind of your fiduciary responsibilities. So the way like Eric always described it to me is like, if Philip Morris came along and said that you've got great technology for marketing cigarettes to children, so we're going to pivot your company to do that entirely, and we're going to pay you 50% more than the market value, you're going to have to say yes. If you have a PBC, then you are more than welcome to say no, if that offer is not in line with your stated public benefit. So our stated public benefit is to maximize the benefit to society through using AI. So given that more children smoking doesn't do that, then we can say like, no, we're not selling to you.Alessio [00:11:01]: I was looking back at some of our emails. You sent me an email on November 13th about talking and then on the 14th, I sent you an email working together to free AI was the subject line. And then that was kind of the start of the C round. And then two days later, someone got fired. So you know, you were having these thoughts even before we had like a public example of like why some of the current structures didn't work. So yeah, you were very ahead of the curve, so to speak. 
You know, people can read your awesome introduction blog and answer and the idea of having a R&D lab versus our lab and then a D lab somewhere else. I think to me, the most interesting thing has been hiring and some of the awesome people that you've been bringing on that maybe don't fit the central casting of Silicon Valley, so to speak. Like sometimes I got it like playing baseball cards, you know, people are like, oh, what teams was this person on, where did they work versus focusing on ability. So I would love for you to give a shout out to some of the awesome folks that you have on the team.Jeremy [00:11:58]: So, you know, there's like a graphic going around describing like the people at XAI, you know, Elon Musk thing. And like they are all connected to like multiple of Stanford, Meta, DeepMind, OpenAI, Berkeley, Oxford. Look, these are all great institutions and they have good people. And I'm definitely not at all against that, but damn, there's so many other people. And one of the things I found really interesting is almost any time I see something which I think like this is really high quality work and it's something I don't think would have been built if that person hadn't built the thing right now, I nearly always reach out to them and ask to chat. And I tend to dig in to find out like, okay, you know, why did you do that thing? Everybody else has done this other thing, your thing's much better, but it's not what other people are working on. And like 80% of the time, I find out the person has a really unusual background. So like often they'll have like, either they like came from poverty and didn't get an opportunity to go to a good school or had dyslexia and, you know, got kicked out of school in year 11, or they had a health issue that meant they couldn't go to university or something happened in their past and they ended up out of the mainstream. And then they kind of succeeded anyway. Those are the people that throughout my career, I've tended to kind of accidentally hire more of, but it's not exactly accidentally. It's like when I see somebody who's done, two people who have done extremely well, one of them did extremely well in exactly the normal way from the background entirely pointing in that direction and they achieved all the hurdles to get there. And like, okay, that's quite impressive, you know, but another person who did just as well, despite lots of constraints and doing things in really unusual ways and came up with different approaches. That's normally the person I'm likely to find useful to work with because they're often like risk-takers, they're often creative, they're often extremely tenacious, they're often very open-minded. So that's the kind of folks I tend to find myself hiring. So now at Answer.ai, it's a group of people that are strong enough that nearly every one of them has independently come to me in the past few weeks and told me that they have imposter syndrome and they're not convinced that they're good enough to be here. And I kind of heard it at the point where I was like, okay, I don't think it's possible that all of you are so far behind your peers that you shouldn't get to be here. But I think part of the problem is as an R&D lab, the great developers look at the great researchers and they're like, wow, these big-brained, crazy research people with all their math and s**t, they're too cool for me, oh my God. 
And then the researchers look at the developers and they're like, oh, they're killing it, making all this stuff with all these people using it and talking on Twitter about how great it is. I think they're both a bit intimidated by each other, you know. And so I have to kind of remind them like, okay, there are lots of things in this world where you suck compared to lots of other people in this company, but also vice versa, you know, for all things. And the reason you came here is because you wanted to learn about those other things from those other people and have an opportunity to like bring them all together into a single unit. You know, it's not reasonable to expect you're going to be better at everything than everybody else. I guess the other part of it is for nearly all of the people in the company, to be honest, they have nearly always been better than everybody else at nearly everything they're doing nearly everywhere they've been. So it's kind of weird to be in this situation now where it's like, gee, I can clearly see that I suck at this thing that I'm meant to be able to do compared to these other people where I'm like the worst in the company at this thing for some things. So I think that's a healthy place to be, you know, as long as you keep reminding each other about that's actually why we're here. And like, it's all a bit of an experiment, like we don't have any managers. We don't have any hierarchy from that point of view. So for example, I'm not a manager, which means I don't get to tell people what to do or how to do it or when to do it. Yeah, it's been a bit of an experiment to see how that would work out. And it's been great. So for instance, Ben Clavier, who you might have come across, he's the author of Ragatouille, he's the author of Rerankers, super strong information retrieval guy. And a few weeks ago, you know, this additional channel appeared on Discord, on our private Discord called Bert24. And these people started appearing, as in our collab sections, we have a collab section for like collaborating with outsiders. And these people started appearing, there are all these names that I recognize, like Bert24, and they're all talking about like the next generation of Bert. And I start following along, it's like, okay, Ben decided that I think, quite rightly, we need a new Bert. Because everybody, like so many people are still using Bert, and it's still the best at so many things, but it actually doesn't take advantage of lots of best practices. And so he just went out and found basically everybody who's created better Berts in the last four or five years, brought them all together, suddenly there's this huge collaboration going on. So yeah, I didn't tell him to do that. He didn't ask my permission to do that. And then, like, Benjamin Warner dived in, and he's like, oh, I created a whole transformers from scratch implementation designed to be maximally hackable. He originally did it largely as a teaching exercise to show other people, but he was like, I could, you know, use that to create a really hackable BERT implementation. In fact, he didn't say that. He said, I just did do that, you know, and I created a repo, and then everybody's like starts using it. They're like, oh my god, this is amazing. I can now implement all these other BERT things. And it's not just answer AI guys there, you know, there's lots of folks, you know, who have like contributed new data set mixes and blah, blah, blah. So, I mean, I can help in the same way that other people can help. 
So like, then Ben Clavier reached out to me at one point and said, can you help me, like, what have you learned over time about how to manage intimidatingly capable and large groups of people who you're nominally meant to be leading? And so, you know, I like to try to help, but I don't direct. Another great example was Kerem, who, after our FSTP QLORA work, decided quite correctly that it didn't really make sense to use LoRa in today's world. You want to use the normalized version, which is called Dora. Like two or three weeks after we did FSTP QLORA, he just popped up and said, okay, I've just converted the whole thing to Dora, and I've also created these VLLM extensions, and I've got all these benchmarks, and, you know, now I've got training of quantized models with adapters that are as fast as LoRa, and as actually better than, weirdly, fine tuning. Just like, okay, that's great, you know. And yeah, so the things we've done to try to help make these things happen as well is we don't have any required meetings, you know, but we do have a meeting for each pair of major time zones that everybody's invited to, and, you know, people see their colleagues doing stuff that looks really cool and say, like, oh, how can I help, you know, or how can I learn or whatever. So another example is Austin, who, you know, amazing background. He ran AI at Fidelity, he ran AI at Pfizer, he ran browsing and retrieval for Google's DeepMind stuff, created Jemma.cpp, and he's been working on a new system to make it easier to do web GPU programming, because, again, he quite correctly identified, yeah, so I said to him, like, okay, I want to learn about that. Not an area that I have much expertise in, so, you know, he's going to show me what he's working on and teach me a bit about it, and hopefully I can help contribute. I think one of the key things that's happened in all of these is everybody understands what Eric Gilliam, who wrote the second blog post in our series, the R&D historian, describes as a large yard with narrow fences. Everybody has total flexibility to do what they want. We all understand kind of roughly why we're here, you know, we agree with the premises around, like, everything's too expensive, everything's too complicated, people are building too many vanity foundation models rather than taking better advantage of fine-tuning, like, there's this kind of general, like, sense of we're all on the same wavelength about, you know, all the ways in which current research is fucked up, and, you know, all the ways in which we're worried about centralization. We all care a lot about not just research for the point of citations, but research that actually wouldn't have happened otherwise, and actually is going to lead to real-world outcomes. And so, yeah, with this kind of, like, shared vision, people understand, like, you know, so when I say, like, oh, well, you know, tell me, Ben, about BERT 24, what's that about? And he's like, you know, like, oh, well, you know, you can see from an accessibility point of view, or you can see from a kind of a actual practical impact point of view, there's far too much focus on decoder-only models, and, you know, like, BERT's used in all of these different places and industry, and so I can see, like, in terms of our basic principles, what we're trying to achieve, this seems like something important. And so I think that's, like, a really helpful that we have that kind of shared perspective, you know?Alessio [00:21:14]: Yeah. 
And before we maybe talk about some of the specific research, when you're, like, reaching out to people, interviewing them, what are some of the traits, like, how do these things come out, you know, usually? Is it working on side projects that you, you know, you're already familiar with? Is there anything, like, in the interview process that, like, helps you screen for people that are less pragmatic and more research-driven versus some of these folks that are just gonna do it, you know? They're not waiting for, like, the perfect process.Jeremy [00:21:40]: Everybody who comes through the recruiting is interviewed by everybody in the company. You know, our goal is 12 people, so it's not an unreasonable amount. So the other thing to say is everybody so far who's come into the recruiting pipeline, everybody bar one, has been hired. So which is to say our original curation has been good. And that's actually pretty easy, because nearly everybody who's come in through the recruiting pipeline are people I know pretty well. So Jono Whitaker and I, you know, he worked on the stable diffusion course we did. He's outrageously creative and talented, and he's super, like, enthusiastic tinkerer, just likes making things. Benjamin was one of the strongest parts of the fast.ai community, which is now the alumni. It's, like, hundreds of thousands of people. And you know, again, like, they're not people who a normal interview process would pick up, right? So Benjamin doesn't have any qualifications in math or computer science. Jono was living in Zimbabwe, you know, he was working on, like, helping some African startups, you know, but not FAANG kind of credentials. But yeah, I mean, when you actually see people doing real work and they stand out above, you know, we've got lots of Stanford graduates and open AI people and whatever in our alumni community as well. You know, when you stand out above all of those people anyway, obviously you've got something going for you. You know, Austin, him and I worked together on the masks study we did in the proceeding at the National Academy of Science. You know, we had worked together, and again, that was a group of, like, basically the 18 or 19 top experts in the world on public health and epidemiology and research design and so forth. And Austin, you know, one of the strongest people in that collaboration. So yeah, you know, like, I've been lucky enough to have had opportunities to work with some people who are great and, you know, I'm a very open-minded person, so I kind of am always happy to try working with pretty much anybody and some people stand out. You know, there have been some exceptions, people I haven't previously known, like Ben Clavier, actually, I didn't know before. But you know, with him, you just read his code, and I'm like, oh, that's really well-written code. And like, it's not written exactly the same way as everybody else's code, and it's not written to do exactly the same thing as everybody else's code. So yeah, and then when I chatted to him, it's just like, I don't know, I felt like we'd known each other for years, like we just were on the same wavelength, but I could pretty much tell that was going to happen just by reading his code. I think you express a lot in the code you choose to write and how you choose to write it, I guess. You know, or another example, a guy named Vic, who was previously the CEO of DataQuest, and like, in that case, you know, he's created a really successful startup. 
He won the first, basically, Kaggle NLP competition, which was automatic essay grading. He's got the current state-of-the-art OCR system, Surya. Again, he's just a guy who obviously just builds stuff, you know, he doesn't ask for permission, he doesn't need any, like, external resources. Actually, Karim's another great example of this, I mean, I already knew Karim very well because he was my best ever master's student, but it wasn't a surprise to me then when he then went off to create the world's state-of-the-art language model in Turkish on his own, in his spare time, with no budget, from scratch. This is not fine-tuning or whatever, he, like, went back to Common Crawl and did everything. Yeah, it's kind of, I don't know what I'd describe that process as, but it's not at all based on credentials.Swyx [00:25:17]: Assemble based on talent, yeah. We wanted to dive in a little bit more on, you know, turning from the people side of things into the technical bets that you're making. Just a little bit more on Bert. I was actually, we just did an interview with Yi Tay from Reka, I don't know if you're familiar with his work, but also another encoder-decoder bet, and one of his arguments was actually people kind of over-index on the decoder-only GPT-3 type paradigm. I wonder if you have thoughts there that is maybe non-consensus as well. Yeah, no, absolutely.Jeremy [00:25:45]: So I think it's a great example. So one of the people we're collaborating with a little bit with BERT24 is Colin Raffle, who is the guy behind, yeah, most of that stuff, you know, between that and UL2, there's a lot of really interesting work. And so one of the things I've been encouraging the BERT group to do, Colin has as well, is to consider using a T5 pre-trained encoder backbone as a thing you fine-tune, which I think would be really cool. You know, Colin was also saying actually just use encoder-decoder as your Bert, you know, why don't you like use that as a baseline, which I also think is a good idea. Yeah, look.Swyx [00:26:25]: What technical arguments are people under-weighting?Jeremy [00:26:27]: I mean, Colin would be able to describe this much better than I can, but I'll give my slightly non-expert attempt. Look, I mean, think about like diffusion models, right? Like in stable diffusion, like we use things like UNet. You have this kind of downward path and then in the upward path you have the cross connections, which it's not a tension, but it's like a similar idea, right? You're inputting the original encoding path into your decoding path. It's critical to make it work, right? Because otherwise in the decoding part, the model has to do so much kind of from scratch. So like if you're doing translation, like that's a classic kind of encoder-decoder example. If it's decoder only, you never get the opportunity to find the right, you know, feature engineering, the right feature encoding for the original sentence. And it kind of means then on every token that you generate, you have to recreate the whole thing, you know? So if you have an encoder, it's basically saying like, okay, this is your opportunity model to create a really useful feature representation for your input information. So I think there's really strong arguments for encoder-decoder models anywhere that there is this kind of like context or source thing. And then why encoder only? Well, because so much of the time what we actually care about is a classification, you know? It's like an output. It's like generating an arbitrary length sequence of tokens. 
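A rough sketch of the "use a T5 pre-trained encoder backbone as a thing you fine-tune" suggestion, using Hugging Face transformers' `T5EncoderModel` with a simple pooled classification head (an illustrative setup, not anything from the BERT24 collaboration itself):

```python
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class T5EncoderClassifier(nn.Module):
    """Classification head on top of a pre-trained T5 encoder; the decoder is discarded."""
    def __init__(self, name: str = "t5-base", num_labels: int = 2):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(name)
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens, then classify.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        return self.head(pooled)

tok = AutoTokenizer.from_pretrained("t5-base")
model = T5EncoderClassifier()
batch = tok(["T5 encoders make good classification backbones."], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```

The decoder is dropped entirely: for classification-style outputs, only the encoder's representation is needed, which is exactly the argument being made here.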
So anytime you're not generating an arbitrary length sequence of tokens, decoder models don't seem to make much sense. Now the interesting thing is, you see on like Kaggle competitions, that decoder models still are at least competitive with things like Deberta v3. They have to be way bigger to be competitive with things like Deberta v3. And the only reason they are competitive is because people have put a lot more time and money and effort into training the decoder only ones, you know? There isn't a recent Deberta. There isn't a recent Bert. Yeah, it's a whole part of the world that people have slept on a little bit. And this is just what happens. This is how trends happen rather than like, to me, everybody should be like, oh, let's look at the thing that has shown signs of being useful in the past, but nobody really followed up with properly. That's the more interesting path, you know, where people tend to be like, oh, I need to get citations. So what's everybody else doing? Can I make it 0.1% better, you know, or 0.1% faster? That's what everybody tends to do. Yeah. So I think it's like, Itay's work commercially now is interesting because here's like a whole, here's a whole model that's been trained in a different way. So there's probably a whole lot of tasks it's probably better at than GPT and Gemini and Claude. So that should be a good commercial opportunity for them if they can figure out what those tasks are.Swyx [00:29:07]: Well, if rumors are to be believed, and he didn't comment on this, but, you know, Snowflake may figure out the commercialization for them. So we'll see.Jeremy [00:29:14]: Good.Alessio [00:29:16]: Let's talk about FSDP, Qlora, Qdora, and all of that awesome stuff. One of the things we talked about last time, some of these models are meant to run on systems that nobody can really own, no single person. And then you were like, well, what if you could fine tune a 70B model on like a 4090? And I was like, no, that sounds great, Jeremy, but like, can we actually do it? And then obviously you all figured it out. Can you maybe tell us some of the worst stories behind that, like the idea behind FSDP, which is kind of taking sharded data, parallel computation, and then Qlora, which is do not touch all the weights, just go quantize some of the model, and then within the quantized model only do certain layers instead of doing everything.Jeremy [00:29:57]: Well, do the adapters. Yeah.Alessio [00:29:59]: Yeah. Yeah. Do the adapters. Yeah. I will leave the floor to you. I think before you published it, nobody thought this was like a short term thing that we're just going to have. And now it's like, oh, obviously you can do it, but it's not that easy.Jeremy [00:30:12]: Yeah. I mean, to be honest, it was extremely unpleasant work to do. It's like not at all enjoyable. I kind of did version 0.1 of it myself before we had launched the company, or at least the kind of like the pieces. They're all pieces that are difficult to work with, right? So for the quantization, you know, I chatted to Tim Detmers quite a bit and, you know, he very much encouraged me by saying like, yeah, it's possible. He actually thought it'd be easy. It probably would be easy for him, but I'm not Tim Detmers. And, you know, so he wrote bits and bytes, which is his quantization library. You know, he wrote that for a paper. He didn't write that to be production like code. It's now like everybody's using it, at least the CUDA bits. So like, it's not particularly well structured. 
There's lots of code paths that never get used. There's multiple versions of the same thing. You have to try to figure it out. So trying to get my head around that was hard. And you know, because the interesting bits are all written in CUDA, it's hard to like to step through it and see what's happening. And then, you know, FSTP is this very complicated library and PyTorch, which not particularly well documented. So the only really, really way to understand it properly is again, just read the code and step through the code. And then like bits and bytes doesn't really work in practice unless it's used with PEF, the HuggingFace library and PEF doesn't really work in practice unless you use it with other things. And there's a lot of coupling in the HuggingFace ecosystem where like none of it works separately. You have to use it all together, which I don't love. So yeah, trying to just get a minimal example that I can play with was really hard. And so I ended up having to rewrite a lot of it myself to kind of create this like minimal script. One thing that helped a lot was Medec had this LlamaRecipes repo that came out just a little bit before I started working on that. And like they had a kind of role model example of like, here's how to train FSTP, LoRa, didn't work with QLoRa on Llama. A lot of the stuff I discovered, the interesting stuff would be put together by Les Wright, who's, he was actually the guy in the Fast.ai community I mentioned who created the Ranger Optimizer. So he's doing a lot of great stuff at Meta now. So yeah, I kind of, that helped get some minimum stuff going and then it was great once Benjamin and Jono joined full time. And so we basically hacked at that together and then Kerim joined like a month later or something. And it was like, gee, it was just a lot of like fiddly detailed engineering on like barely documented bits of obscure internals. So my focus was to see if it kind of could work and I kind of got a bit of a proof of concept working and then the rest of the guys actually did all the work to make it work properly. And, you know, every time we thought we had something, you know, we needed to have good benchmarks, right? So we'd like, it's very easy to convince yourself you've done the work when you haven't, you know, so then we'd actually try lots of things and be like, oh, and these like really important cases, the memory use is higher, you know, or it's actually slower. And we'd go in and we just find like all these things that were nothing to do with our library that just didn't work properly. And nobody had noticed they hadn't worked properly because nobody had really benchmarked it properly. So we ended up, you know, trying to fix a whole lot of different things. And even as we did so, new regressions were appearing in like transformers and stuff that Benjamin then had to go away and figure out like, oh, how come flash attention doesn't work in this version of transformers anymore with this set of models and like, oh, it turns out they accidentally changed this thing, so it doesn't work. You know, there's just, there's not a lot of really good performance type evals going on in the open source ecosystem. So there's an extraordinary amount of like things where people say like, oh, we built this thing and it has this result. And when you actually check it, so yeah, there's a shitload of war stories from getting that thing to work. 
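For readers who want the shape of what's being assembled here, a hedged, minimal single-GPU sketch of the QLoRA recipe with bitsandbytes and PEFT: quantize the frozen base weights to 4-bit NF4 and train only small LoRA adapters. This is not Answer.AI's FSDP/QLoRA code, which adds the multi-GPU sharding that made 70B-on-two-4090s possible, and the model name and hyperparameters are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder; the Answer.AI work targeted 70B models

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                       # NF4-quantized base weights stay frozen
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # which layers get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)   # only the small adapter matrices are trainable
model.print_trainable_parameters()

# ...train as usual (e.g. with transformers.Trainer), then publish just the adapter:
model.save_pretrained("my-adapter")   # a few hundred MB at most, not the full model
```

This split is also what makes "distribute the adapter, not a merged model" workable: someone who already has the quantized base only downloads the small adapter and attaches it with PEFT's `PeftModel.from_pretrained(base_model, "my-adapter")`.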
And it did require a particularly like tenacious group of people and a group of people who don't mind doing a whole lot of kind of like really janitorial work, to be honest, to get the details right, to check them. Yeah.Alessio [00:34:09]: We had a trade out on the podcast and we talked about how a lot of it is like systems work to make some of these things work. It's not just like beautiful, pure math that you do on a blackboard. It's like, how do you get into the nitty gritty?Jeremy [00:34:22]: I mean, flash attention is a great example of that. Like it's, it basically is just like, oh, let's just take the attention and just do the tiled version of it, which sounds simple enough, you know, but then implementing that is challenging at lots of levels.Alessio [00:34:36]: Yeah. What about inference? You know, obviously you've done all this amazing work on fine tuning. Do you have any research you've been doing on the inference side, how to make local inference really fast on these models too?Jeremy [00:34:47]: We're doing quite a bit on that at the moment. We haven't released too much there yet. But one of the things I've been trying to do is also just to help other people. And one of the nice things that's happened is that a couple of folks at Meta, including Mark Seraphim, have done a nice job of creating this CUDA mode community of people working on like CUDA kernels or learning about that. And I tried to help get that going well as well and did some lessons to help people get into it. So there's a lot going on in both inference and fine tuning performance. And a lot of it's actually happening kind of related to that. So PyTorch team have created this Torch AO project on quantization. And so there's a big overlap now between kind of the FastAI and AnswerAI and CUDA mode communities of people working on stuff for both inference and fine tuning. But we're getting close now. You know, our goal is that nobody should be merging models, nobody should be downloading merged models, everybody should be using basically quantized plus adapters for almost everything and just downloading the adapters. And that should be much faster. So that's kind of the place we're trying to get to. It's difficult, you know, because like Karim's been doing a lot of work with VLM, for example. These inference engines are pretty complex bits of code. They have a whole lot of custom kernel stuff going on as well, as do the quantization libraries. So we've been working on, we're also quite a bit of collaborating with the folks who do HQQ, which is a really great quantization library and works super well. So yeah, there's a lot of other people outside AnswerAI that we're working with a lot who are really helping on all this performance optimization stuff, open source.Swyx [00:36:27]: Just to follow up on merging models, I picked up there that you said nobody should be merging models. That's interesting because obviously a lot of people are experimenting with this and finding interesting results. I would say in defense of merging models, you can do it without data. That's probably the only thing that's going for it.Jeremy [00:36:45]: To explain, it's not that you shouldn't merge models. You shouldn't be distributing a merged model. You should distribute a merged adapter 99% of the time. And actually often one of the best things happening in the model merging world is actually that often merging adapters works better anyway. 
The point is, Sean, that once you've got your new model, if you distribute it as an adapter that sits on top of a quantized model that somebody's already downloaded, then it's a much smaller download for them. And also the inference should be much faster because you're not having to transfer FB16 weights from HPM memory at all or ever load them off disk. You know, all the main weights are quantized and the only floating point weights are in the adapters. So that should make both inference and fine tuning faster. Okay, perfect.Swyx [00:37:33]: We're moving on a little bit to the rest of the fast universe. I would have thought that, you know, once you started Answer.ai, that the sort of fast universe would be kind of on hold. And then today you just dropped Fastlight and it looks like, you know, there's more activity going on in sort of Fastland.Jeremy [00:37:49]: Yeah. So Fastland and Answerland are not really distinct things. Answerland is kind of like the Fastland grown up and funded. They both have the same mission, which is to maximize the societal benefit of AI broadly. We want to create thousands of commercially successful products at Answer.ai. And we want to do that with like 12 people. So that means we need a pretty efficient stack, you know, like quite a few orders of magnitude more efficient, not just for creation, but for deployment and maintenance than anything that currently exists. People often forget about the D part of our R&D firm. So we've got to be extremely good at creating, deploying and maintaining applications, not just models. Much to my horror, the story around creating web applications is much worse now than it was 10 or 15 years ago in terms of, if I say to a data scientist, here's how to create and deploy a web application, you know, either you have to learn JavaScript or TypeScript and about all the complex libraries like React and stuff, and all the complex like details around security and web protocol stuff around how you then talk to a backend and then all the details about creating the backend. You know, if that's your job and, you know, you have specialists who work in just one of those areas, it is possible for that to all work. But compared to like, oh, write a PHP script and put it in the home directory that you get when you sign up to this shell provider, which is what it was like in the nineties, you know, here are those 25 lines of code and you're done and now you can pass that URL around to all your friends, or put this, you know, .pl file inside the CGI bin directory that you got when you signed up to this web host. So yeah, the thing I've been mainly working on the last few weeks is fixing all that. And I think I fixed it. I don't know if this is an announcement, but I tell you guys, so yeah, there's this thing called fastHTML, which basically lets you create a complete web application in a single Python file. Unlike excellent projects like Streamlit and Gradio, you're not working on top of a highly abstracted thing. That's got nothing to do with web foundations. You're working with web foundations directly, but you're able to do it by using pure Python. There's no template, there's no ginger, there's no separate like CSS and JavaScript files. It looks and behaves like a modern SPA web application. And you can create components for like daisy UI, or bootstrap, or shoelace, or whatever fancy JavaScript and or CSS tailwind etc library you like, but you can write it all in Python. 
You can pip install somebody else's set of components and use them entirely from Python. You can develop and prototype it all in a Jupyter notebook if you want to. It all displays correctly, so you can like interactively do that. And then you mentioned fastlite, so specifically now if you're using SQLite in particular, it's like ridiculously easy to have that persistence, and all of your handlers will be passed database-ready objects automatically, that you can just call .delete, .update, .insert on. Yeah, you get session, you get security, you get all that. So again, like with most everything I do, it's very little code. It's mainly tying together really cool stuff that other people have written. You don't have to use it, but a lot of the best stuff comes from its incorporation of HTMX, which to me is basically the thing that changes your browser to make it work the way it always should have. So it just does four small things, but those four small things are the things that are basically unnecessary constraints that HTML should never have had, so it removes the constraints. It sits on top of Starlette, which is a very nice kind of lower level platform for building these kind of web applications. The actual interface matches as closely as possible to FastAPI, which is a really nice system for creating the kind of classic JavaScript type applications. And Sebastian, who wrote FastAPI, has been kind enough to help me think through some of these design decisions, and so forth. I mean, everybody involved has been super helpful. Actually, I chatted to Carson, who created HTMX, you know, about it. Some of the folks involved in Django, like everybody in the community I've spoken to definitely realizes there's a big gap to be filled around, like, a highly scalable, web foundation-based, pure Python framework with a minimum of fuss. So yeah, I'm getting a lot of support and trying to make sure that FastHTML works well for people.
Swyx [00:42:38]: I would say, when I heard about this, I texted Alessio. I think this is going to be pretty huge. People consider Streamlit and Gradio to be the state of the art, but I think there's so much to improve, and having what you call web foundations and web fundamentals at the core of it, I think, would be really helpful.
Jeremy [00:42:54]: I mean, it's based on 25 years of thinking and work for me. So like, FastMail was built on a system much like this one, but that was in Perl. And so I spent, you know, 10 years working on that. We had millions of people using that every day, really pushing it hard. And I really always enjoyed working in that. Yeah. So, you know, and obviously lots of other people have done like great stuff, and particularly HTMX. So I've been thinking about like, yeah, how do I pull together the best of the web framework I created for FastMail with HTMX? There's also things like PicoCSS, which is the CSS system which, by default, FastHTML comes with. Although, as I say, you can pip install anything you want to, but it makes it like super easy to, you know, so we try to make it so that just out of the box, you don't have any choices to make. Yeah. You can make choices, but for most people, you just, you know, it's like the PHP in your home directory thing. You just start typing and just by default, you'll get something which looks and feels, you know, pretty okay. And if you want to then write a version of Gradio or Streamlit on top of that, you totally can.
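To make the HTMX point concrete, here is a hedged sketch of a partial page update written as FastHTML-style components: the hx_* keyword arguments are assumed to render as hx-* attributes, so a button can swap in a server-rendered fragment without any hand-written JavaScript. Route names and component details are illustrative, not taken from the FastHTML docs.

```python
# Sketch of an HTMX-driven partial update, written in FastHTML style.
# Assumption: hx_* kwargs render as hx-* attributes; names are illustrative.
from fasthtml.common import *

app, rt = fast_app()
count = 0

@rt("/")
def get():
    return Titled(
        "Counter",
        # Clicking issues a GET to /increment and swaps the response into #count.
        Button("Increment", hx_get="/increment", hx_target="#count", hx_swap="innerHTML"),
        Div("0", id="count"),
    )

@rt("/increment")
def increment():
    global count
    count += 1
    return str(count)  # the returned fragment replaces the contents of #count

serve()
```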
And then the nice thing is if you then write it in kind of the Gradio equivalent, which will be, you know, I imagine we'll create some kind of pip installable thing for that. Once you've outgrown, or if you outgrow that, it's not like, okay, throw that all away and start again and learn this like whole separate language. It's like this kind of smooth, gentle path that you can take step-by-step, because it's all just standard web foundations all the way, you know.
Swyx [00:44:29]: Just to wrap up the sort of open source work that you're doing, you're aiming to create thousands of projects with a very, very small team. I haven't heard you mention once AI agents or AI developer tooling or AI code maintenance. I know you're very productive, but you know, what is the role of AI in your own work?
Jeremy [00:44:47]: So I'm making something. I'm not sure how much I want to say just yet.
Swyx [00:44:52]: Give us a nibble.
Jeremy [00:44:53]: All right. I'll give you the key thing. So I've created a new approach. It's not called prompt engineering. It's called dialogue engineering. But I'm creating a system for doing dialogue engineering. It's currently called AI magic. I'm doing most of my work in this system and it's making me much more productive than I was before I used it. So I always just build stuff for myself and hope that it'll be useful for somebody else. Think about ChatGPT with Code Interpreter, right? The basic UX is the same as a 1970s teletype, right? So if you wrote APL on a teletype in the 1970s, you typed onto a thing, your words appeared at the bottom of a sheet of paper and you'd like hit enter and it would scroll up. And then the answer from APL would be printed out, scroll up, and then you would type the next thing. And like, which is also the way, for example, a shell works like bash or zsh or whatever. It's not terrible, you know, like we all get a lot done in these like very, very basic teletype style REPL environments, but I've never felt like it's optimal and everybody else has just copied ChatGPT. So it's also the way Bard and Gemini work. It's also the way the Claude web app works. And then you add Code Interpreter. And the most you can do is to like plead with ChatGPT to write the kind of code you want. It's pretty good for very, very, very beginner users who like can't code at all, like by default now the code's even hidden away, so you never even have to see it ever happened. But for somebody who's like wanting to learn to code or who already knows a bit of code or whatever, it seems really not ideal. So okay, that's one end of the spectrum. The other end of the spectrum, which is where Sean's work comes in, is, oh, you want to do more than ChatGPT? No worries. Here is Visual Studio Code. I run it. There's an empty screen with a flashing cursor. Okay, start coding, you know, and it's like, okay, you can use systems like Sean's or like Cursor or whatever to be like, okay, Apple K in Cursor, like, creates a form that blah, blah, blah. But in the end, it's like a convenience over the top of this incredibly complicated system that full-time sophisticated software engineers have designed over the past few decades in a totally different environment as a way to build software, you know. And so we're trying to like shoehorn AI into that. And it's not easy to do. And I think there are like much better ways of thinking about the craft of software development in a language model world to be much more interactive, you know.
So the thing that I'm building is neither of those things. It's something between the two. And it's built around this idea of crafting a dialogue, you know, where the outcome of the dialogue is the artifacts that you want, whether it be a piece of analysis or whether it be a Python library or whether it be a technical blog post or whatever. So as part of building that, I've created something called Claudette, which is a library for Claude. I've created something called Cosette, which is a library for OpenAI. They're libraries which are designed to make those APIs much more usable, much easier to use, much more concise. And then I've written AI magic on top of those. And that's been an interesting exercise because I did Claudette first, and I was looking at what Simon Willison did with his fantastic LLM library. And his library is designed around like, let's make something that supports all the LLM inference engines and commercial providers. I thought, okay, what if I did something different, which is like make something that's as Claude friendly as possible and forget everything else. So that's what Claudette was. So for example, one of the really nice things in Claude is prefill. So by telling the assistant that this is what your response started with, there's a lot of powerful things you can take advantage of. So yeah, I created Claudette to be as Claude friendly as possible. And then after I did that, and then particularly with GPT-4o coming out, I kind of thought, okay, now let's create something that's as OpenAI friendly as possible. And then I tried to look to see, well, where are the similarities and where are the differences? And now can I make them compatible in places where it makes sense for them to be compatible without losing out on the things that make each one special for what they are. So yeah, those are some of the things I've been working on in that space. And I'm thinking we might launch AI magic via a course called How to Solve It With Code. The name is based on the classic Polya book, How to Solve It, which is, you know, one of the classic math books of all time, where we're basically going to try to show people how to solve challenging problems that they didn't think they could solve without doing a full computer science course, by taking advantage of a bit of AI and a bit of like practical skills, particularly for this like whole generation of people who are learning to code with and because of ChatGPT. Like I love it, I know a lot of people who didn't really know how to code, but they've created things because they use ChatGPT, but they don't really know how to maintain them or fix them or add things to them that ChatGPT can't do, because they don't really know how to code. And so this course will be designed to show you how you can like either become a developer who can like supercharge their capabilities by using language models, or become a language model first developer who can supercharge their capabilities by understanding a bit about process and fundamentals.
Alessio [00:50:19]: Nice. That's a great spoiler. You know, I guess the fourth time you're going to be on Latent Space, we're going to talk about AI magic. Jeremy, before we wrap, this was just a great run through everything. What are the things that when you next come on the podcast in nine, 12 months, we're going to be like, man, Jeremy was like really ahead of it. Like, is there anything that you see in the space that maybe people are not talking about enough?
You know, what's the next company that's going to fall, like have drama internally, anything in your mind?
Jeremy [00:50:47]: You know, hopefully we'll be talking a lot about FastHTML and hopefully the international community that at that point has come up around that. And also about AI magic and about dialogue engineering. Hopefully dialogue engineering catches on because I think it's the right way to think about a lot of this stuff. What else? Just trying to think about it all on the research side. Yeah. I think, you know, I mean, we've talked about a lot of it. Like I think encoder-decoder architectures, encoder-only architectures, hopefully we'll be talking about like the whole renewed interest in BERT that BERT24 stimulated.
Swyx [00:51:17]: There's a state space model that came out today that might be interesting for this general discussion. One thing that stood out to me with Cartesia's blog posts was that they were talking about real time ingestion, billions and trillions of tokens, and keeping that context, obviously in the state space that they have.
Jeremy [00:51:34]: Yeah.
Swyx [00:51:35]: I'm wondering what your thoughts are because you've been entirely transformers the whole time.
Jeremy [00:51:38]: Yeah. No. So obviously my background is RNNs and LSTMs. Of course. And I'm still a believer in the idea that state is something you can update, you know? So obviously Sepp Hochreiter came out with xLSTM recently. Oh my God. Okay. Another whole thing we haven't talked about, just somewhat related. I've been going crazy for like a long time about like, why can I not pay anybody to save my KV cache? I just ingested the Great Gatsby or the documentation for Starlette or whatever, you know, I'm sending it as my prompt context. Why are you redoing it every time? So Gemini is about to finally come out with KV caching, and this is something that Austin actually in Gemma.cpp had had on his roadmap for years, well not years, months, a long time. The idea that the KV cache is like a thing that, it's a third thing, right? So there's RAG, you know, there's in-context learning, you know, and prompt engineering, and there's KV cache creation. I think it creates like a whole new class almost of applications or techniques where, you know, for me, for example, I very often work with really new libraries or I've created my own library that I'm now writing with rather than on. So I want all the docs in my new library to be there all the time. So I want to upload them once, and then we have a whole discussion about building this application using FastHTML. Well nobody's got FastHTML in their language model yet, I don't want to send all the FastHTML docs across every time. So one of the things I'm looking at doing in AI Magic actually is taking advantage of some of these ideas so that you can have the documentation of the libraries you're working on be kind of always available. Something over the next 12 months people will be spending time thinking about is how to like, where to use RAG, where to use fine-tuning, where to use KV cache storage, you know. And how to use state, because in state space models and xLSTM, again, state is something you update. So how do we combine the best of all of these worlds?
Alessio [00:53:46]: And Jeremy, I know before you talked about how some of the autoregressive models are not maybe a great fit for agents.
Any other thoughts on like JEPA, diffusion for text, any interesting thing that you've seen pop up?
Jeremy [00:53:58]: In the same way that we probably ought to have state that you can update, i.e. xLSTM and state space models, in the same way that a lot of things probably should have an encoder, JEPA and diffusion both seem like the right conceptual mapping for a lot of things we probably want to do. So the idea of like, there should be a piece of the generative pipeline, which is like thinking about the answer and coming up with a sketch of what the answer looks like before you start outputting tokens. That's where it kind of feels like diffusion ought to fit, you know. And diffusion is, because it's not autoregressive, it's like, let's try to like gradually de-blur the picture of how to solve this. So this is also where dialogue engineering fits in, by the way. So with dialogue engineering, one of the reasons it's working so well for me is I use it to kind of like craft the thought process before I generate the code, you know. So yeah, there's a lot of different pieces here and I don't know how they'll all kind of exactly fit together. I don't know if JEPA is going to actually end up working in the text world. I don't know if diffusion will end up working in the text world, but they seem to be like trying to solve a class of problem which is currently unsolved.
Alessio [00:55:13]: Awesome, Jeremy. This was great, as usual. Thanks again for coming back on the pod and thank you all for listening. Yeah, that was fantastic. Get full access to Latent Space at www.latent.space/subscribe

Utah Golf Radio
Ep 954: Danny T5, Cooper & Kihei T23 after R1 at Utah Championship

Utah Golf Radio

Play Episode Listen Later Aug 2, 2024 9:41


Danny Summerhays sits at T5 after an opening round (-7) 64 in the Korn Ferry Tour's Utah Championship being played at Oakridge CC. Amateurs Cooper Jones and Kihei Akina are at T23 with (-5) 66s. Peter Kuest, Conner Howe, Preston Summerhays, Carson Lundell, Max Brenchley and Cole Ponich are also in the field. Akina joins the pod. Sponsored by Goldenwest Credit Union. 

Hacker News Recap
July 19th, 2024 | Crowdstrike Outage Causing Widespread Issues

Hacker News Recap

Play Episode Listen Later Jul 20, 2024 12:47


This is a recap of the top 10 posts on Hacker News on July 19th, 2024. This podcast was generated by wondercraft.ai
(00:36): FCC votes to limit prison telecom charges
Original post: https://news.ycombinator.com/item?id=41005181&utm_source=wondercraft_ai
(01:51): Crowdstrike Outage Causing Widespread Issues
Original post: https://news.ycombinator.com/item?id=41002677&utm_source=wondercraft_ai
(02:42): Bangladesh imposes curfew after dozens killed in anti-government protests
Original post: https://news.ycombinator.com/item?id=41007396&utm_source=wondercraft_ai
(03:51): Ryanair – when every page is a dark pattern
Original post: https://news.ycombinator.com/item?id=41004039&utm_source=wondercraft_ai
(04:55): AI paid for by Ads – the GPT-4o mini inflection point
Original post: https://news.ycombinator.com/item?id=41010188&utm_source=wondercraft_ai
(06:14): Multisatellite data depicts a record-breaking methane leak from a well blowout
Original post: https://news.ycombinator.com/item?id=41012193&utm_source=wondercraft_ai
(07:25): CrowdStrike fixes start at "reboot up to 15 times", gets more complex from there
Original post: https://news.ycombinator.com/item?id=41007898&utm_source=wondercraft_ai
(08:33): NASA's Curiosity rover discovers a surprise in a Martian rock
Original post: https://news.ycombinator.com/item?id=41006552&utm_source=wondercraft_ai
(09:46): What happened to BERT and T5?
Original post: https://news.ycombinator.com/item?id=41009803&utm_source=wondercraft_ai
(11:07): Never Update Anything
Original post: https://news.ycombinator.com/item?id=41009942&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

The CEO Sessions
Grow Your Next Gen Leaders - Tom Mertz, COO at T5 Data Centers

The CEO Sessions

Play Episode Listen Later Jul 11, 2024 29:04


Grow Your Next Gen Leaders
Tom Mertz, COO of T5 Data Centers, shares his strategy for cultivating talent, fostering mentorship, and building a resilient leadership pipeline. Next generation leaders are pivotal for your organization's future, offering fresh perspectives, challenging the status quo, and bringing ideas on emerging technologies. With data centers driving the tech world forward, Tom provides invaluable insights into their crucial role moving forward. Whether you're an executive, HR professional, entrepreneur, or aspiring leader, unlock actionable insights that empower your organization and elevate your team's potential.
Tom has served in the data center industry for over 20 years, holding many different leadership roles in sales, operations, and as CEO. He's also coached youth, middle school and high school football for 12 years.
T5 Data Centers supports companies that are changing the world through AI and technology innovation with their development, construction, and data center operation services. T5 develops build-to-suit data center solutions across their campuses, and they uniquely deliver construction and facility management as services within their customers' own data centers. With nearly two decades of experience successfully managing execution and operations risks, their customers can be confident that T5 will safely deliver on their commitment to Forever On performance.
LinkedIn Profile: https://www.linkedin.com/in/tomjmertz
Company Link: https://t5datacenters.com/
What You'll Discover in this Episode:
Coaching Championship Football…The Big Lesson from 6 State Titles.
A Strategy to Hold the Team Accountable.
The Rise of the Data Center, and Why They are VITAL.
The AI Effect on Data Infrastructure.
How to Communicate a Powerful Vision.
A Hack to Block Out the Noise.
What his First Job Taught Him about Leadership.
The One Trait He'd Instill in Every Employee.
A Failure that Led to his Success.
A Tool that Boosts Productivity.
-----
Connect with the Host, #1 bestselling author Ben Fanning
Speaking and Training inquiries
Subscribe to my YouTube channel
LinkedIn
Instagram
Twitter

The Animals at Home Network
205: The Reptile Heat Bulb Ban | Thomas Griffiths- AAH

The Animals at Home Network

Play Episode Listen Later Jun 27, 2024 98:15


Thomas Griffiths owns Tomaskas Ltd, an animal lighting and husbandry consultancy out of the UK. In this episode, we discuss the ban on halogen & other incandescent heat lamps in the USA. Thomas explains what the ban means for the reptile industry moving forward, how to fight the ban, and also the science behind why we need to use incandescent bulbs for our animals. Thomas also discusses the European ban on T5 fluorescent tubes and some new exciting technology in the UVB lamp space.
SHOW NOTES: https://www.animalsathomenetwork.com/205-heat-bulb-ban/
SHOW SPONSORS: Visit The BioDude here: www.thebiodude.com @TheBioDudeJoshHalter
Visit Zoo Med Labs here: https://zoomed.com/ @ZooMed
CUSTOM REPTILE HABITATS: https://www.animalsathome.ca/crh
JOIN US ON PATREON: https://www.patreon.com/animalsathome
LINKS FROM THE EPISODE:
https://tomaskas.co.uk/
https://tomaskas.co.uk/dont-let-them-take-your-heat-bulbs/
https://www.instagram.com/thetomaskas/
Reptile Lighting FB: https://www.facebook.com/groups/ReptileLighting/permalink/3222394354561805
We Discuss: 0:00 Coming Up The Bio Dude + Zoo Med 3:06 Welcome Thomas - What Is Going On? 12:51 The Halogen Ban 15:51 What is a Black Body Radiator? How Heat Lamps Work 26:18 Accidental Mismarketing - Heat Projectors 34:50 NERD Video - What they got wrong 43:18 NERD Video - What they got right 44:50 125W+ Is the Current Exception 46:02 The Zoo Med Labs Letter Leak 51:05 How To Fight The Halogen Ban 1:03:20 The Fluorescent Lamp Ban 1:05:44 New Exo Terra Bulbs 1:11:50 How Do UVB Fluorescent Tubes Function? 1:18:11 Understanding the Fluorescent Ban 1:30:18 What To Do Next? 1:35:52 Closing Thoughts

Utah Golf Radio
Ep 935: Big Week for UPOTs; Coop Leader for Quote of the Year

Utah Golf Radio

Play Episode Listen Later Jun 24, 2024 14:30


Several UPOTs, the Utah Players on Tour, had big weeks, but none more meaningful than BYU rising sophomore Cooper Jones in his Korn Ferry Tour debut. Tony had a T5; Weirsy a solo 2nd. Peter Kuest and Danny Summerhays made noise, too. Jones joins the pod. Sponsored by Goldenwest Credit Union.

Reverse Sweep
OpTic's Losses are “UNACCEPTABLE!” Black Ops 6 REACTION! What If COD Had National Teams?

Reverse Sweep

Play Episode Listen Later Jun 10, 2024 72:25


It's been a big week for pro Call of Duty, as OpTic Texas and Pred, Shotzzy, Dashy & Kenny saw their post-major hangover continue with a loss. And will Call of Duty: Black Ops 6 be the game COD needs? Reverse Sweep hosts and Call of Duty legends Patrick ‘ACHES' Price, Chris ‘Parasite' Duarte, Doug ‘Censor' Martin and Mark ‘MarkyB' Bryceland discuss a big weekend of COD esports action. CHAPTERS: 0:00 Reaction to Black Ops 6 announcement! 7:50 Thoughts on the Esports World Cup 11:48 OpTic suffering a post-major hangover! What's going wrong? 19:12 Vegas challenging Seattle for T5? 25:52 Boston Breach's struggles continue - when will it end? 32:02 Who's going to Champs? 36:34 COD National Team DRAFT! 1:01:22 Is the CDL suppressing Challengers talent? 1:06:49 Most improved CDL player?

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jun 10, 2024 269:19


Our second wave of speakers for AI Engineer World's Fair were announced! The conference sold out of Platinum/Gold/Silver sponsors and Early Bird tickets! See our Microsoft episode for more info and buy now with code LATENTSPACE.This episode is straightforwardly a part 2 to our ICLR 2024 Part 1 episode, so without further ado, we'll just get right on with it!Timestamps[00:03:43] Section A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger* [00:07:44] WebArena* [00:18:45] Sotopia* [00:24:00] Performance Improving Code Edits* [00:29:39] OpenDevin* [00:47:40] Industry and Academia[01:05:29] Section B: Benchmarks* [01:05:52] SWEBench* [01:17:05] SWEBench/SWEAgent Interview* [01:27:40] Dataset Contamination Detection* [01:39:20] GAIA Benchmark* [01:49:18] Moritz Hart - Science of Benchmarks[02:36:32] Section C: Reasoning and Post-Training* [02:37:41] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection* [02:51:00] Let's Verify Step By Step* [02:57:04] Noam Brown* [03:07:43] Lilian Weng - Towards Safe AGI* [03:36:56] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis* [03:48:43] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework[04:00:51] Bonus: Notable Related Papers on LLM CapabilitiesSection A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger* Guests* Graham Neubig* Aman Sanger - Previous guest and NeurIPS friend of the pod!* WebArena * * Sotopia (spotlight paper, website)* * Learning Performance-Improving Code Edits* OpenDevin* Junyang Opendevin* Morph Labs, Jesse Han* SWE-Bench* SWE-Agent* Aman tweet on swebench* LiteLLM* Livecodebench* the role of code in reasoning* Language Models of Code are Few-Shot Commonsense Learners* Industry vs academia* the matryoshka embeddings incident* other directions* UnlimiformerSection A timestamps* [00:00:00] Introduction to Guests and the Impromptu Nature of the Podcast* [00:00:45] Graham's Experience in Japan and Transition into Teaching NLP* [00:01:25] Discussion on What Constitutes a Good Experience for Students in NLP Courses* [00:02:22] The Relevance and Teaching of Older NLP Techniques Like Ngram Language Models* [00:03:38] Speculative Decoding and the Comeback of Ngram Models* [00:04:16] Introduction to WebArena and Zotopia Projects* [00:05:19] Deep Dive into the WebArena Project and Benchmarking* [00:08:17] Performance Improvements in WebArena Using GPT-4* [00:09:39] Human Performance on WebArena Tasks and Challenges in Evaluation* [00:11:04] Follow-up Work from WebArena and Focus on Web Browsing as a Benchmark* [00:12:11] Direct Interaction vs. 
Using APIs in Web-Based Tasks* [00:13:29] Challenges in Base Models for WebArena and the Potential of Visual Models* [00:15:33] Introduction to Zootopia and Exploring Social Interactions with Language Models* [00:16:29] Different Types of Social Situations Modeled in Zootopia* [00:17:34] Evaluation of Language Models in Social Simulations* [00:20:41] Introduction to Performance-Improving Code Edits Project* [00:26:28] Discussion on DevIn and the Future of Coding Agents* [00:32:01] Planning in Coding Agents and the Development of OpenDevon* [00:38:34] The Changing Role of Academia in the Context of Large Language Models* [00:44:44] The Changing Nature of Industry and Academia Collaboration* [00:54:07] Update on NLP Course Syllabus and Teaching about Large Language Models* [01:00:40] Call to Action: Contributions to OpenDevon and Open Source AI Projects* [01:01:56] Hiring at Cursor for Roles in Code Generation and Assistive Coding* [01:02:12] Promotion of the AI Engineer ConferenceSection B: Benchmarks * Carlos Jimenez & John Yang (Princeton) et al: SWE-bench: Can Language Models Resolve Real-world Github Issues? (ICLR Oral, Paper, website)* “We introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. The best-performing model, Claude 2, is able to solve a mere 1.96% of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.”* Yonatan Oren et al (Stanford): Proving Test Set Contamination in Black-Box Language Models (ICLR Oral, paper, aman tweet on swebench contamination)* “We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples. 
* We demonstrate that our procedure is sensitive enough to reliably prove test set contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets of only 1000 examples, and datasets that appear only a few times in the pretraining corpus.”* Outstanding Paper mention: “A simple yet elegant method to test whether a supervised-learning dataset has been included in LLM training.”* Thomas Scialom (Meta AI-FAIR w/ Yann LeCun): GAIA: A Benchmark for General AI Assistants (paper)* “We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. * GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. * GAIA's philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answer.* * Mortiz Hardt (Max Planck Institute): The emerging science of benchmarks (ICLR stream)* “Benchmarks are the keystone that hold the machine learning community together. Growing as a research paradigm since the 1980s, there's much we've done with them, but little we know about them. In this talk, I will trace the rudiments of an emerging science of benchmarks through selected empirical and theoretical observations. Specifically, we'll discuss the role of annotator errors, external validity of model rankings, and the promise of multi-task benchmarks. The results in each case challenge conventional wisdom and underscore the benefits of developing a science of benchmarks.”Section C: Reasoning and Post-Training* Akari Asai (UW) et al: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (ICLR oral, website)* (Bad RAG implementations) indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. * We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. * Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. * Self-RAG (7B and 13B parameters) outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models. * Hunter Lightman (OpenAI): Let's Verify Step By Step (paper)* “Even state-of-the-art models still regularly produce logical mistakes. 
To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. * We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. * To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.* * Noam Brown - workshop on Generative Models for Decision Making* Solving Quantitative Reasoning Problems with Language Models (Minerva paper)* Describes some charts taken directly from the Let's Verify Step By Step paper listed/screenshotted above.* Lilian Weng (OpenAI) - Towards Safe AGI (ICLR talk)* OpenAI Model Spec* OpenAI Instruction Hierarchy: The Instruction Hierarchy: Training LLMs to Prioritize Privileged InstructionsSection D: Agent Systems* Izzeddin Gur (Google DeepMind): A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR oral, paper)* [Agent] performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML.* We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions.* WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those.* We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization.* We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.* Sirui Hong (DeepWisdom): MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR Oral, Paper)* We introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. Bonus: Notable Related Papers on LLM CapabilitiesThis includes a bunch of papers we wanted to feature above but could not.* Lukas Berglund (Vanderbilt) et al: The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” (ICLR poster, paper, Github)* We expose a surprising failure of generalization in auto-regressive large language models (LLMs). 
If a model is trained on a sentence of the form ''A is B'', it will not automatically generalize to the reverse direction ''B is A''. This is the Reversal Curse. * The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as ''Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]'' and the reverse ''Who is Mary Lee Pfeiffer's son?''. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter.* * Omar Khattab (Stanford): DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines (ICLR Spotlight Poster, GitHub)* presented by Krista Opsahl-Ong* “Existing LM pipelines are typically implemented using hard-coded “prompt templates”, i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, or imperative computational graphs where LMs are invoked through declarative modules. * DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. * We design a compiler that will optimize any DSPy pipeline to maximize a given metric, by creating and collecting demonstrations. * We conduct two case studies, showing that succinct DSPy programs can express and optimize pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. * Within minutes of compiling, DSPy can automatically produce pipelines that outperform out-of-the-box few-shot prompting as well as expert-created demonstrations for GPT-3.5 and Llama2-13b-chat. On top of that, DSPy programs compiled for relatively small LMs like 770M parameter T5 and Llama2-13b-chat are competitive with many approaches that rely on large and proprietary LMs like GPT-3.5 and on expert-written prompt chains. * * MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning* Scaling Laws for Associative Memories * DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models* Efficient Streaming Language Models with Attention Sinks Get full access to Latent Space at www.latent.space/subscribe

No Laying Up - Golf Podcast
NLU Podcast, Episode 825: Friday Chevron Championship Live Show

No Laying Up - Golf Podcast

Play Episode Listen Later Apr 20, 2024 97:43


We are 36 holes complete at the 2024 Chevron Championship with two atop the leaderboard. Jin Hee Im and Atthaya Thitikul are tied at -8 with Nelly Korda 1 shot behind. Our very own Lauren Coughlin played tough trying to keep the momentum going after her opening 66. She sits at T5 with a host of others. We run down the leaderboard, discuss the other events going on, and close with our ESPN+ experience. Learn more about your ad choices. Visit megaphone.fm/adchoices