vl m podcasts | Ivy.fm

Jérémy Charvet – invité exceptionnel (Au fil de la vie) | La loi des séries #830

Play Episode Listen Later Nov 7, 2025 29:19

Figure emblématique du nouveau Plus belle la vie, Jérémy Charvet vient nous présenter son premier single, « Au fil de la vie ». L’invité : Jérémy Charvet Depuis janvier 2024, Jérémy Charvet incarne Ulysse Kepler dans Plus... Cet article Jérémy Charvet – invité exceptionnel (Au fil de la vie) | La loi des séries #830 est apparu en premier sur VL Média.

figure la vie invit la loi exceptionnel au fil charvet vl m

Jean-Paul Césari et Dominique Poulain – Manga Méga Show | Le Club #67

La Loi des séries

Play Episode Listen Later Nov 5, 2025 72:09

Nouveau numéro du Club avec une grande spéciale Manga Méga Show avec Jean-Paul Césari et Dominique Poulain. Les invités : Dominique Poulain et Jean-Paul Césari / Victoire « Nos sourires » A l’occasion du Manga Mega Show... Cet article Jean-Paul Césari et Dominique Poulain – Manga Méga Show | Le Club #67 est apparu en premier sur VL Média.

club manga nouveau jean paul le club paul c poulain vl m

Faut-il « romancer » des histoires comme celle d’Ed Gein ? | La loi des séries #829

La Loi des séries

Play Episode Listen Later Oct 31, 2025 19:32

C’est LA série dont tout le monde a parlé : Alexandre Letren fait face cette semaine à la série « Monstre : l’histoire de Ed Gein ». Face à la série sur Ed Gein Nouveau format dans... Cet article Faut-il « romancer » des histoires comme celle d’Ed Gein ? | La loi des séries #829 est apparu en premier sur VL Média.

comme faut histoires ed gein la loi monstre vl m romancer

Damien Boisseau – invité exceptionnel | Le Club #66

La Loi des séries

Play Episode Listen Later Oct 29, 2025 68:43

Une nouvelle fois, c’est le doublage qui est à l’honneur avec notre invité, le génial Damien Boisseau, voix iconique des séries et des films. Les invités : Damien Boisseau et Clerwie Frin Grey’s Anatomy, Supernatural,... Cet article Damien Boisseau – invité exceptionnel | Le Club #66 est apparu en premier sur VL Média.

anatomy supernatural invit le club exceptionnel vl m boisseau

The top AI news from the past week, every ThursdAI

Play Episode Listen Later Oct 24, 2025 95:16

Hey everyone, Alex here! Welcome... to the browser war II - the AI edition! This week we chatted in depth about ChatGPT's new Atlas agentic browser, and the additional agentic powers Microsoft added to Edge with Copilot Mode (tho it didn't work for me) Also this week was a kind of crazy OCR week, with more than 4 OCR models releasing, and the crown one is DeepSeek OCR, that turned the whole industry on it's head (more later) Quite a few video updates as well, with real time lipsync from Decart, and a new update from LTX with 4k native video generation, it's been a busy AI week for sure! Additionally, I've had the pleasure to talk about AI Browsing agents with Paul from BrowserBase and real time video with Kwindla Kramer from Pipecat/Daily, so make sure to tune in for those interviews, buckle up, let's dive in! Thanks for reading ThursdAI - Recaps of the most high signal AI weekly spaces! This post is public so feel free to share it.Open Source: OCR is Not What You Think It Is (X, HF, Paper)The most important and frankly mind-bending release this week came from DeepSeek. They dropped DeepSeek-OCR, and let me tell you, this is NOT just another OCR model. The cohost were buzzing about this, and once I dug in, I understood why. This isn't just about reading text from an image; it's a revolutionary approach to context compression.We think that DeepSeek needed this as an internal tool, so we're really grateful to them for open sourcing this, as they did something crazy here. They are essentially turning text into a visual representation, compressing it, and then using a tiny vision decoder to read it back with incredible accuracy. We're talking about a compression ratio of up to 10x with 97% decoding accuracy. Even at 20x compression they are achieving 60% decoding accuracy! My head exploded live on the show when I read that. This is like the middle-out compression algorithm joke from Silicon Valley, but it's real. As Yam pointed out, this suggests our current methods of text tokenization are far from optimal.With only 3B and ~570M active parameters, they are taking a direct stab at long context inefficiency, imagine taking 1M tokens, encoding them into 100K visual tokens, and then feeding those into a model. Since the model is tiny, it's very cheap to run, for example, alphaXiv claimed they have OCRd' all of the papers on ArXiv with this model for $1000, a task that would have cost $7500 using MistalOCR - as per their paper, with DeepSeek OCR, on a single H100 GPU, its possible to scan up to 200K pages!

Les grands moments de Croque Vacances | Croque le Club #2

La Loi des séries

Play Episode Listen Later Oct 24, 2025 33:59

On poursuit notre belle série de l'été avec Croque le Club qui aujourd'hui s'intéresse aux grands événements survenus dans Croque Vacances. A quoi ressemblera Croque le Club ? Génériques, séries, dessins animés, plateaux ou chansons... Cet article Les grands moments de Croque Vacances | Croque le Club #2 est apparu en premier sur VL Média.

club vacances les grands le club croque vl m

Benoît Solès – Le fantôme de l’Opéra / Vanille | Le club #65

La Loi des séries

Play Episode Listen Later Oct 22, 2025 62:57

Avec Benoît Solès et Vanille, on parlera de comédie musicale et d’univers artistique singulier dans cette édition du Club. Les invités : Benoît Solès et Vanille Figue incontournable du théâtre, Benoît Solès signe le livret... Cet article Benoît Solès – Le fantôme de l’Opéra / Vanille | Le club #65 est apparu en premier sur VL Média.

club sol beno le club vanille avec beno le fant vl m

Leur(s) Nom(s) est Bond… Jame(s) Bond(s) – (Partie 1 : 1962-1987) | Seriefonia

La Loi des séries

Play Episode Listen Later Oct 19, 2025 64:11

Comme le nom du prochain Bond, Seriefonia s’est fait attendre mais pour mieux vous régaler avec cette spéciale « Bond » … James Bond. SERIFONIA, SEASON 8 OPENING THEME by Jérôme Marie Oui, je sais, elle arrive... Cet article Leur(s) Nom(s) est Bond… Jame(s) Bond(s) – (Partie 1 : 1962-1987) | Seriefonia est apparu en premier sur VL Média.

bond james bond comme leur jame nom opening theme vl m

Joseph : Lucien Jean-Baptiste tourne la suite de sa série

La Loi des séries

Play Episode Listen Later Oct 16, 2025 16:26

Après une première saison, Joseph s’apprête à revenir mais sous une nouvelle forme, un nouvel épisode baptisé « La vie de palace ». Nouvelle histoire, nouvelle formule pour Joseph « Il faut être plus didactique, plus bavard car... Cet article Joseph : Lucien Jean-Baptiste tourne la suite de sa série est apparu en premier sur VL Média.

nouvelle jean baptiste lucien la suite tourne vl m

Nicolas Marié – spéciale doublage / Julien Chiron (Panini) | Le Club #64

La Loi des séries

Play Episode Listen Later Oct 15, 2025 78:38

Une belle émission spéciale doublage cette semaine avec l’excellent Nicolas Marié (Jarod). On parlera aussi album Panini avec Julien Chrinon. Les invités : Nicolas Marié et Julien Chiron Nicolas Marié est l’invité exceptionnel de cette... Cet article Nicolas Marié – spéciale doublage / Julien Chiron (Panini) | Le Club #64 est apparu en premier sur VL Média.

panini chiron le club doublage vl m

Polerik Rouvière – compositeur | La loi des séries #828

La Loi des séries

Play Episode Listen Later Oct 10, 2025 26:19

Polérik Rouvière, compositeur du film de Mister V, McWalter, mais aussi de la série Il était deux fois, est notre invité cette semaine.

pol entretien la loi compositeur mister v vl m

Laura Ferré, Anna-Maria Giachin, Alain Jeanne et Titia Augustin | Le Club #63

La Loi des séries

Play Episode Listen Later Oct 8, 2025 63:36

Un Club 100% nouveaux talents nouveaux visages avec des artistes et un journaliste : Laura Ferré, Anna-Maria Giachin, Alain Jeanne et Titia Augustin. Les invités : Laura Ferré, Anna-Maria Giachin, Alain Jeanne et Titia Augustin... Cet article Laura Ferré, Anna-Maria Giachin, Alain Jeanne et Titia Augustin | Le Club #63 est apparu en premier sur VL Média.

alain ferr augustin anna maria le club titia vl m

Juliette Lamboley pour « Nuit noire nuit blanche » | La loi des séries #827

La Loi des séries

Play Episode Listen Later Oct 3, 2025 26:17

Avec « Nuit noire nuit blanche », c’est une fiction audio que l’on met en avant cette semaine et portée par Juliette Lamboley. Juliette Lamboley nous présente une fiction forte et nécessaire, Nuit noire Nuit Blanche Disponible... Cet article Juliette Lamboley pour « Nuit noire nuit blanche » | La loi des séries #827 est apparu en premier sur VL Média.

nuit noire la loi nuit blanche vl m juliette lamboley

Guillaume Lemans – créateur de Les sentinelles | La loi des séries #826

La Loi des séries

Play Episode Listen Later Sep 26, 2025 25:49

C’est la série événement de cette rentrée : Les sentinelles arrivent sur Canal+ et Guillaume Lemans, créateur de la série, est notre invité. Guillaume Lemans présente ses Sentinelles C’est l’une des séries que l’on attendait... Cet article Guillaume Lemans – créateur de Les sentinelles | La loi des séries #826 est apparu en premier sur VL Média.

canal guillaume le mans la loi vl m les sentinelles

Richard Joffo – invité exceptionnel | Le Club #62

La Loi des séries

Play Episode Listen Later Sep 24, 2025 54:19

On lui doit le groupe culte Mini Star mais aussi de nombreux génériques : Richard Joffo est l’invité du Club cette semaine. Richard Joffo nous raconte la naissance du groupe Mini Star Les Mini Star... Cet article Richard Joffo – invité exceptionnel | Le Club #62 est apparu en premier sur VL Média.

club invit le club exceptionnel vl m

Maïra Schmitt : de Léo Mattéi à sa première réalisation | La loi des séries #825

La Loi des séries

Play Episode Listen Later Sep 19, 2025 27:13

Alors que le Festival de la Fiction bat son plein, nous recevons aujourd’hui Maïra Schmitt, héroïne de la série Léo Mattéi avec JL Reichmann. Maïra Schmitt signe son premier court métrage Pitché à la Rochelle... Cet article Maïra Schmitt : de Léo Mattéi à sa première réalisation | La loi des séries #825 est apparu en premier sur VL Média.

festival fiction pitch alors premi schmitt la loi vl m

Les trésors musicaux de Croque Vacances | Croque le Club #1

La Loi des séries

Play Episode Listen Later Sep 17, 2025 26:00

On poursuit notre belle série de l’été avec Croque le Club qui aujourd’hui s’intéresse aux pépites musicales dans Croque Vacances. A quoi ressemblera Croque le Club ? Génériques, séries, dessins animés, plateaux ou chansons cultes,... Cet article Les trésors musicaux de Croque Vacances | Croque le Club #1 est apparu en premier sur VL Média.

club vacances le club sors les tr musicaux croque vl m

Iñaki Lartigue : « Aucun regret d’être parti » | La loi des séries #824

La Loi des séries

Play Episode Listen Later Sep 12, 2025 26:07

Nouvelle émission en compagnie d’un des visages de vos séries : Iñaki Lartigue vient faire le bilan de son aventure dans Plus belle la vie. Iñaki Lartigue : deux mois après son départ de Plus... Cet article Iñaki Lartigue : « Aucun regret d’être parti » | La loi des séries #824 est apparu en premier sur VL Média.

regret nouvelle parti la loi aucun vl m lartigue

Constantin Pappas et Yann Pichon / Lisa Mistretta | Le Club #61

La Loi des séries

Play Episode Listen Later Sep 10, 2025 61:40

Pour sa rentrée, Le Club s’ouvre une nouvelle fois au doublage en compagnie de Constantin Pappas et Yann Pichon. Les invités : Constantin Pappas et Yann Pichon On aime le doublage dans Le Club et... Cet article Constantin Pappas et Yann Pichon / Lisa Mistretta | Le Club #61 est apparu en premier sur VL Média.

yann constantin pappas le club pichon vl m mistretta

Ava Baya (Soleil Noir) – invitée exceptionnelle | La loi des séries #823

La Loi des séries

Play Episode Listen Later Sep 5, 2025 27:33

Nouvelle formule de La loi des séries en cette rentrée : Ava Baya, héroïne de Soleil Noir, invitée exceptionnelle d’Alexandre Letren. Ava Baya évoque Soleil Noir Héroïne du carton de Netflix cet été, Ava Baya... Cet article Ava Baya (Soleil Noir) – invitée exceptionnelle | La loi des séries #823 est apparu en premier sur VL Média.

netflix noir nouvelle soleil invit la loi exceptionnelle baya vl m

Acilion et les autres | Le Club croque l’été #8

La Loi des séries

Play Episode Listen Later Aug 27, 2025 13:37

C’est la dernière de l’été pour cette grande aventure Croque l’été et elle est consacrée aux autres émissions de la galaxie Croque dont Acilion. A quoi ressemblera Le club croque l'été ? Génériques, séries, dessins... Cet article Acilion et les autres | Le Club croque l’été #8 est apparu en premier sur VL Média.

les autres le club croque vl m

La famille Croque Vacances | Le Club croque l’été #7

La Loi des séries

Play Episode Listen Later Aug 20, 2025 10:15

En dehors de Claude, Isidore et Clémentine, ils étaient nombreux à faire partie de la famille Croque vacances et on en parle aujourd’hui. EDIT : ON S’EXCUSE MAIS DANS LA VERSION VIDEO, UNE SEQUENCE A... Cet article La famille Croque Vacances | Le Club croque l’été #7 est apparu en premier sur VL Média.

vacances la famille le club isidore croque vl m

#305 - Lebendig gefressen

Viva la Movielución - Podcast

Play Episode Listen Later Aug 19, 2025 44:14

Mike und Kane reden über Umberto Lenzis Kannibalen-Schocker „Lebendig gefressen“ – voller Trash, Kult und Skandale.

trash kult skandale horror podcasts lebendig eaten alive umberto lenzi vl m

Hiroshi Watari et Makoto Sumikawa (Spielvan) | Le Club #60

La Loi des séries

Play Episode Listen Later Aug 15, 2025 22:24

Un mini épisode du Club avant la rentrée mais avec deux légendes des séries jeunesse : Hiroshi Watari et Makoto Sumikawa aka Spielvan et Diana. C’est lors de la dernière édition de la Japan Expo... Cet article Hiroshi Watari et Makoto Sumikawa (Spielvan) | Le Club #60 est apparu en premier sur VL Média.

club le club makoto hiroshi japan expo vl m watari

The top AI news from the past week, every ThursdAI

Play Episode Listen Later Aug 15, 2025 89:41

Hey everyone, Alex here

Les chansons de Claude Pierrard, Isidore, Clémentine et les autres | Le Club croque l’été #6

La Loi des séries

Play Episode Listen Later Aug 13, 2025 12:12

L’émission se poursuit tout comme la campagne du livre qui rencontre un joli succès et voici ce 6e rendez-vous dédié aux chansons des animateurs. A quoi ressemblera Le club croque l'été ? Génériques, séries, dessins... Cet article Les chansons de Claude Pierrard, Isidore, Clémentine et les autres | Le Club croque l’été #6 est apparu en premier sur VL Média.

les autres le club isidore les chansons croque vl m

Les dessins animés | Le Club croque l’été #5

La Loi des séries

Play Episode Listen Later Aug 6, 2025 10:36

Pour cette nouvelle émission, place à ces trop nombreux dessins animés entendus pour et découverts dans Croque vacances. A quoi ressemblera Le club croque l'été ? Génériques, séries, dessins animés, plateaux ou chansons cultes, Croque... Cet article Les dessins animés | Le Club croque l’été #5 est apparu en premier sur VL Média.

anim le club dessins croque vl m

Isidore et Clémentine | Le Club croque l’été #4

La Loi des séries

Play Episode Listen Later Jul 30, 2025 11:51

Après Claude Pierrard, nous consacrons une émission à ses deux comparses, Isidore et Clémentine qui ont marqué les esprits. A quoi ressemblera Le club croque l'été ? Génériques, séries, dessins animés, plateaux ou chansons cultes,... Cet article Isidore et Clémentine | Le Club croque l’été #4 est apparu en premier sur VL Média.

le club isidore croque vl m

#302 - Woah Lecks: Hundreds of Beavers, Wicked, Squid Game, uvm.

Viva la Movielución - Podcast

Play Episode Listen Later Jul 29, 2025 82:18

Dinos, Dämonen & Dokus: Kane, Korbi und Mike kämpfen sich durch Squid Game 3, Jurassic World: Rebirth, Wicked, Hundreds of Beavers u. v. m.!

wicked hundreds squid game beavers dinos pastewka vl m korbi

#366. Ferievrøvl: måltidsfrekvens, kalorier på ferie og "krangel"

AFPT podden

Play Episode Listen Later Jul 27, 2025 51:43

Hva betyr egentlig måltidsfrekvens for kroppen din, og trenger du å spise hver tredje time for å holde formen? Vi pratet om det å finne en balanse som passer deg – også på ferie. Du får tips om protein, kalorier, ferieliv og hvorfor sosiale bånd er like viktige som hva du putter i kroppen. .

tips acast hvorfor hva ferie oppsummering personlige forslag viktigheten eksempler vl m fordeler shaw media ferietid

Les séries télévisées | Le Club croque l’été #3

La Loi des séries

Play Episode Listen Later Jul 23, 2025 11:48

Pour cette nouvelle émission, place à ces trop nombreuses séries entendues pour et découvertes dans Croque vacances. A quoi ressemblera Le club croque l'été ? Génériques, séries, dessins animés, plateaux ou chansons cultes, Croque Vacances... Cet article Les séries télévisées | Le Club croque l’été #3 est apparu en premier sur VL Média.

le club l vis croque vl m

Claude Pierrard | Le Club croque l’été #2

La Loi des séries

Play Episode Listen Later Jul 16, 2025 10:42

Pour cette deuxième émission, évoquons celui sans qui toute cette aventure ne serait rien, Claude Pierrard, journaliste et animateur. A quoi ressemblera Le club croque l'été ? Génériques, séries, dessins animés, plateaux ou chansons cultes,... Cet article Claude Pierrard | Le Club croque l’été #2 est apparu en premier sur VL Média.

le club croque vl m

125. Best of VLM: How to Honor Your Cultural Foods While Meeting Your Health Goals with Mariana Dineen

Viva la Mami

Play Episode Listen Later May 15, 2025 62:33 Transcription Available

You're listening to the Best of VLM episode series featuring the most popular episodes of the Viva la Mami podcast!In this episode, we welcome Mariana Dineen, a registered dietitian and founder of Elemento Health. She shares the importance of nutrition within the Latino community, especially for mothers striving to adopt a healthier lifestyle while preserving cultural traditions. Mariana emphasizes the significance of cultural relevance in nutrition counseling, overcoming language barriers, and the unique challenges faced by the Latino community in accessing healthcare.She offers practical strategies for meal planning and managing chronic diseases like diabetes through a cultural lens, stressing the importance of balance and inclusion rather than the elimination of cultural foods. We also touched on the impact of stress on eating habits, particularly for busy Latina moms, and how to address these issues holistically.Mariana's commitment to cultural sensitivity in her practice, Elemento Health, underscores her dedication to providing accessible and empathetic nutrition care. If you've ever felt shame about your cultural foods or struggled to find a healthcare provider who truly gets you, this episode is for you.For detailed show notes, visit vivalamami.com/episode125Key topics covered:Breaking down barriers to accessing nutrition care for LatinasMaking traditional dishes healthier while preserving cultural rootsManaging stress eating and emotional relationships with foodPractical meal planning strategies for busy mamasCulturally sensitive approaches to managing conditions like diabetesConnect with Mariana from Elemento Health!Email: mariana@elementohealth.comInstagram: @elemento_healthWebsite: elementohealth.comLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.Follow Viva la Mami on Instagram @vivalamamiJoin the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!Have a suggestion for an episode topic? Click HEREHave a suggestion for a guest? Click HEREVisit the Viva la Mami Websitewww.vivalamami.comHave questions or want to connect? Email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠

apple nutrition cultural latinas foods latino dietitian mami health goals dineen vl m

124. Best of VLM: Managing Mom Rage and Emotional Dysregulation with Jocelyn Flores

Viva la Mami

Play Episode Listen Later May 8, 2025 54:13 Transcription Available

You're listening to the Best of VLM episode series featuring the most popular episodes of the Viva la Mami podcast!In this episode, I delve into the all-too-common experiences of emotional dysregulation and ‘mom rage' that many Latina moms face. Joined by licensed marriage and family therapist Jocelyn Flores, founder of Raíz Parenting, we discuss practical tools like mindful breathing techniques and setting healthy boundaries to help manage these overwhelming emotions.Jocelyn also shares insights on the impact of our cultural background and the importance of self-compassion in the parenting journey. This is an authentic, judgment-free conversation aimed at providing support and reminding mamis that they are not alone in their struggles. We also explore how to break generational cycles of parenting and create a more emotionally secure environment for our children.For full show notes, visit vivalamami.com/episode124Connect with Jocelyn FloresWebsite: raiz-parenting.comInstagram: @raizparentingResources Mentioned:Get Raíz Parenting's Break the Cycle Freebie: www.raiz-parenting.com/freebieLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.Follow Viva la Mami on Instagram @vivalamamiJoin the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!Have a suggestion for an episode topic? Click HEREHave a suggestion for a guest? Click HEREVisit the Viva la Mami Websitewww.vivalamami.comHave questions or want to connect? Email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠

apple parenting managing emotional rage latinas ra flores mami dysregulation vl m

123. Best of VLM: Navigating the Mental Load of Bilingual Parenting with Erika Milla from Spanish En Casita

Viva la Mami

Play Episode Listen Later May 1, 2025 60:25 Transcription Available

You're listening to the Best of VLM episode series featuring the most popular episodes of the Viva la Mami podcast!In this episode, we welcome Erika Milla, creator of Spanish En Casita, an online community for parents striving to raise bilingual children. As a dedicated mom raising bilingual kids, Erika shares how she's preserving Spanish at home.In our conversation, we dive into the mental and emotional struggles of bilingual parenting and the ups and downs of dual immersion programs. Erika also opens up about homeschooling her kids and gives tips for other parents on similar journeys. In addition, talk about the importance of community, cultural pride, and staying intentional in raising bilingual children.For full show notes, visit vivalamami.com/episode123Follow Erika Milla!Instagram: instagram.com/spanish.en.casitaFeeling overwhelmed by navigating cultural expectations and modern parenting as a Latina mom? Join Balanced Madrehood, Viva la Mami's signature coaching program designed to empower Latina moms to create a more balanced and fulfilling madrehood journey. Head over to vivalamami.com/balanced-madrehood to learn more!Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.Follow Viva la Mami on Instagram @vivalamamiJoin the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!Have a suggestion for an episode topic? Click HEREHave a suggestion for a guest? Click HEREVisit the Viva la Mami Websitewww.vivalamami.comHave questions or want to connect? Email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠

love head apple navigating parenting spanish latinas bilingual mami mental load milla casita vl m

π0: A Foundation Model for Robotics with Sergey Levine - #719

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Feb 18, 2025 52:30

Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which uses human operators and teleoperation rigs, the potential of synthetic data and reinforcement learning in enhancing robotic capabilities, and much more. We also introduce the team's new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research. The complete show notes for this episode can be found at https://twimlai.com/go/719.

foundation model robotics uc berkeley levine transformer sergey vl m

108. VLM Spotlight: Navigating Divorce as a Latina Mom with Marisa Lopez

Viva la Mami

Play Episode Listen Later Jan 23, 2025 39:55 Transcription Available

In this powerful first episode of a two-part series, I sit down with Marisa Lopez, a civil engineer turned real estate investor, who shares her raw and inspiring journey of breaking generational patterns to create a healthier future for her daughter.Marisa opens up about navigating divorce as a Latina mom, breaking cultural norms, and building a healthy co-parenting relationship. From being told she might never have children to becoming a single mom at 5 months postpartum, Marisa shares how she found strength through therapy, family support, and prioritizing her daughter's wellbeing. This conversation is especially powerful for mamás considering divorce or struggling with family expectations around separation. Join us as Marisa shows how choosing yourself can be the greatest act of love for your children.For detailed show notes, visit vivalamami.com/episode108In This Episode, You'll Hear:Marisa's path to madrehood despite medical challengesThe emotional process of choosing divorce as a new motherBreaking traditional cultural expectations around marriageBuilding a support system as a first-generation divorceeCreating healthy family dynamics for the next generationResources Mentioned:Mujerón MovementConnect with Marisa:Instagram: @asiramzepolLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.Follow Viva la Mami on Instagram@vivalamamiJoin the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!Have a suggestion for an episode topic? Click HEREHave a suggestion for a guest? Click HEREJoin the Viva la Mami Collectivewww.vivalamami.com/vlm-collectiveVisit the Viva la Mami Websitewww.vivalamami.comHave questions or want to connect? Email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠

apple navigating divorce latinas lopez mami challengesthe vl m

2024 in Agents [LS Live! @ NeurIPS 2024]

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Dec 25, 2024 48:59

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Our next keynote covers The State of LLM Agents, with the triumphant return of Professor Graham Neubig's return to the pod (his ICLR episode here!). OpenDevin is now a startup known as AllHands! The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number 1 on the hardest SWE-Bench Full leaderboard at 29%, though on the smaller SWE-Bench Verified, they are at 53%, behind Amazon Q, devlo, and OpenAI's self reported o3 results at 71.7%.Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind and Anthropic setting their sights on consumer and coding agents, vision based computer-using agents and multi agent systems. There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and Codeium's Windsurf Cascade in the IDE arena, to the explosive revenue growth of Stackblitz's Bolt, Lovable, and Vercel's v0, and the unicorn rounds and high profile movements of customer support agents like Sierra (now worth $4 billion) and search agents like Perplexity (now worth $9 billion). We wanted to take a little step back to understand the most notable papers of the year in Agents, and Graham indulged with his list of 8 perennial problems in building agents in 2024.Must-Read Papers for the 8 Problems of Agents* The agent-computer interface: CodeAct: Executable Code Actions Elicit Better LLM Agents. Minimial viable tools: Execution Sandbox, File Editor, Web Browsing* The human-agent interface: Chat UI, GitHub Plugin, Remote runtime, …?* Choosing an LLM: See Evaluation of LLMs as Coding Agents on SWE-Bench at 30x - must understand instructions, tools, code, environment, error recovery* Planning: Single Agent Systems vs Multi Agent (CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration) - Explicit vs Implicit, Curated vs Generated* Reusable common workflows: SteP: Stacked LLM Policies for Web Actions and Agent Workflow Memory - Manual prompting vs Learning from Experience* Exploration: Agentless: Demystifying LLM-based Software Engineering Agents and BAGEL: Bootstrapping Agents by Guiding Exploration with Language* Search: Tree Search for Language Model Agents - explore paths and rewind* Evaluation: Fast Sanity Checks (miniWoB and Aider) and Highly Realistic (WebArena, SWE-Bench) and SWE-Gym: An Open Environment for Training Software Engineering Agents & VerifiersFull Talk on YouTubePlease like and subscribe!Timestamps* 00:00 Welcome to Latent Space Live at NeurIPS 2024* 00:29 State of LLM Agents in 2024* 02:20 Professor Graham Newbig's Insights on Agents* 03:57 Live Demo: Coding Agents in Action* 08:20 Designing Effective Agents* 14:13 Choosing the Right Language Model for Agents* 16:24 Planning and Workflow for Agents* 22:21 Evaluation and Future Predictions for Agents* 25:31 Future of Agent Development* 25:56 Human-Agent Interaction Challenges* 26:48 Expanding Agent Use Beyond Programming* 27:25 Redesigning Systems for Agent Efficiency* 28:03 Accelerating Progress with Agent Technology* 28:28 Call to Action for Open Source Contributions* 30:36 Q&A: Agent Performance and Benchmarks* 33:23 Q&A: Web Agents and Interaction Methods* 37:16 Q&A: Agent Architectures and Improvements* 43:09 Q&A: Self-Improving Agents and Authentication* 47:31 Live Demonstration and Closing RemarksTranscript[00:00:29] State of LLM Agents in 2024[00:00:29] Speaker 9: Our next keynote covers the state of LLM agents. With the triumphant return of Professor Graham Newbig of CMU and OpenDevon, now a startup known as AllHands. The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number one on the hardest SWE Benchful leaderboard at 29%.[00:00:53] Speaker 9: Though, on the smaller SWE bench verified, they are at 53 percent behind Amazon Q [00:01:00] Devlo and OpenAI's self reported O3 results at 71. 7%. Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind, and Anthropic setting their sights on consumer and coding agents. Vision based computer using agents and multi agent systems.[00:01:22] Speaker 9: There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devon this year, to the sleeper hit of Cursor Composer and recent guest Codium's Windsurf Cascade in the IDE arena. To the explosive revenue growth of recent guests StackBlitz's Bolt, Lovable, and Vercel's vZero.[00:01:44] Speaker 9: And the unicorn rounds and high profile movements of customer support agents like Sierra, now worth 4 billion, and search agents like Perplexity, now worth 9 billion. We wanted to take a little step back to understand the most notable papers of the year in [00:02:00] agents, and Graham indulged with his list of eight perennial problems in building agents.[00:02:06] Speaker 9: As always, don't forget to check our show notes for all the selected best papers of 2024, and for the YouTube link to their talk. Graham's slides were especially popular online, and we are honoured to have him. Watch out and take care![00:02:20] Professor Graham Newbig's Insights on Agents[00:02:20] Speaker: Okay hi everyone. So I was given the task of talking about agents in 2024, and this is An impossible task because there are so many agents, so many agents in 2024. So this is going to be strongly covered by like my personal experience and what I think is interesting and important, but I think it's an important topic.[00:02:41] Speaker: So let's go ahead. So the first thing I'd like to think about is let's say I gave you you know, a highly competent human, some tools. Let's say I gave you a web browser and a terminal or a file system. And the ability to [00:03:00] edit text or code. What could you do with that? Everything. Yeah.[00:03:07] Speaker: Probably a lot of things. This is like 99 percent of my, you know, daily daily life, I guess. When I'm, when I'm working. So, I think this is a pretty powerful tool set, and I am trying to do, and what I think some other people are trying to do, is come up with agents that are able to, you know, manipulate these things.[00:03:26] Speaker: Web browsing, coding, running code in successful ways. So there was a little bit about my profile. I'm a professor at CMU, chief scientist at All Hands AI, building open source coding agents. I'm maintainer of OpenHands, which is an open source coding agent framework. And I'm also a software developer and I, I like doing lots of coding and, and, you know, shipping new features and stuff like this.[00:03:51] Speaker: So building agents that help me to do this, you know, is kind of an interesting thing, very close to me.[00:03:57] Live Demo: Coding Agents in Action[00:03:57] Speaker: So the first thing I'd like to do is I'd like to try [00:04:00] some things that I haven't actually tried before. If anybody has, you know, tried to give a live demo, you know, this is, you know very, very scary whenever you do it and it might not work.[00:04:09] Speaker: So it might not work this time either. But I want to show you like three things that I typically do with coding agents in my everyday work. I use coding agents maybe five to 10 times a day to help me solve my own problems. And so this is a first one. This is a data science task. Which says I want to create scatter plots that show the increase of the SWE bench score over time.[00:04:34] Speaker: And so I, I wrote a kind of concrete prompt about this. Agents work better with like somewhat concrete prompts. And I'm gonna throw this into open hands and let it work. And I'll, I'll go back to that in a second. Another thing that I do is I create new software. And I, I've been using a [00:05:00] service a particular service.[00:05:01] Speaker: I won't name it for sending emails and I'm not very happy with it. So I want to switch over to this new service called resend. com, which makes it easier to send emails. And so I'm going to ask it to read the docs for the resend. com API and come up with a script that allows me to send emails. The input to the script should be a CSV file and the subject and body should be provided in Jinja2 templates.[00:05:24] Speaker: So I'll start another agent and and try to get it to do that for me.[00:05:35] Speaker: And let's go with the last one. The last one I do is. This is improving existing software and in order, you know, once you write software, you usually don't throw it away. You go in and, like, actually improve it iteratively. This software that I have is something I created without writing any code.[00:05:52] Speaker: It's basically software to monitor how much our our agents are contributing to the OpenHance repository. [00:06:00] And on the, let me make that a little bit bigger, on the left side, I have the number of issues where it like sent a pull request. I have the number of issues where it like sent a pull request, whether it was merged in purple, closed in red, or is still open in green. And so these are like, you know, it's helping us monitor, but one thing it doesn't tell me is the total number. And I kind of want that feature added to this software.[00:06:33] Speaker: So I'm going to try to add that too. So. I'll take this, I'll take this prompt,[00:06:46] Speaker: and here I want to open up specifically that GitHub repo. So I'll open up that repo and paste in the prompt asking it. I asked it to make a pie chart for each of these and give me the total over the entire time period that I'm [00:07:00] monitoring. So we'll do that. And so now I have let's see, I have some agents.[00:07:05] Speaker: Oh, this one already finished. Let's see. So this one already finished. You can see it finished analyzing the Swebench repository. It wrote a demonstration of, yeah, I'm trying to do that now, actually.[00:07:30] Speaker: It wrote a demonstration of how much each of the systems have improved over time. And I asked it to label the top three for each of the data sets. And so it labeled OpenHands as being the best one for SWE Bench Normal. For SWE Bench Verified, it has like the Amazon QAgent and OpenHands. For the SWE Bench Lite, it has three here over three over here.[00:07:53] Speaker: So you can see like. That's pretty useful, right? If you're a researcher, you do data analysis all the time. I did it while I was talking to all [00:08:00] of you and making a presentation. So that's, that's pretty nice. I, I doubt the other two are finished yet. That would be impressive if the, yeah. So I think they're still working.[00:08:09] Speaker: So maybe we'll get back to them at the end of the presentation. But so these are the kinds of the, these are the kinds of things that I do every day with coding agents now. And it's or software development agents. It's pretty impressive.[00:08:20] Designing Effective Agents[00:08:20] Speaker: The next thing I'd like to talk about a little bit is things I worry about when designing agents.[00:08:24] Speaker: So we're designing agents to, you know, do a very difficult task of like navigating websites writing code, other things like this. And within 2024, there's been like a huge improvement in the methodology that we use to do this. But there's a bunch of things we think about. There's a bunch of interesting papers, and I'd like to introduce a few of them.[00:08:46] Speaker: So the first thing I worry about is the agent computer interface. Like, how do we get an agent to interact with computers? And, How do we provide agents with the tools to do the job? And [00:09:00] within OpenHands we are doing the thing on the right, but there's also a lot of agents that do the thing on the left.[00:09:05] Speaker: So the thing on the left is you give like agents kind of granular tools. You give them tools like or let's say your instruction is I want to determine the most cost effective country to purchase the smartphone model, Kodak one the countries to consider are the USA, Japan, Germany, and India. And you have a bunch of available APIs.[00:09:26] Speaker: And. So what you do for some agents is you provide them all of these tools APIs as tools that they can call. And so in this particular case in order to solve this problem, you'd have to make about like 30 tool calls, right? You'd have to call lookup rates for Germany, you'd have to look it up for the US, Japan, and India.[00:09:44] Speaker: That's four tool goals. And then you go through and do all of these things separately. And the method that we adopt in OpenHands instead is we provide these tools, but we provide them by just giving a coding agent, the ability to call [00:10:00] arbitrary Python code. And. In the arbitrary Python code, it can call these tools.[00:10:05] Speaker: We expose these tools as APIs that the model can call. And what that allows us to do is instead of writing 20 tool calls, making 20 LLM calls, you write a program that runs all of these all at once, and it gets the result. And of course it can execute that program. It can, you know, make a mistake. It can get errors back and fix things.[00:10:23] Speaker: But that makes our job a lot easier. And this has been really like instrumental to our success, I think. Another part of this is what tools does the agent need? And I, I think this depends on your use case, we're kind of extreme and we're only giving the agent five tools or maybe six tools.[00:10:40] Speaker: And what, what are they? The first one is program execution. So it can execute bash programs, and it can execute Jupyter notebooks. It can execute cells in Jupyter notebooks. So that, those are two tools. Another one is a file editing tool. And the file editing tool allows you to browse parts of files.[00:11:00][00:11:00] Speaker: And kind of read them, overwrite them, other stuff like this. And then we have another global search and replace tool. So it's actually two tools for file editing. And then a final one is web browsing, web browsing. I'm kind of cheating when I call it only one tool. You actually have like scroll and text input and click and other stuff like that.[00:11:18] Speaker: But these are basically the only things we allow the agent to do. What, then the question is, like, what if we wanted to allow it to do something else? And the answer is, well, you know, human programmers already have a bunch of things that they use. They have the requests PyPy library, they have the PDF to text PyPy library, they have, like, all these other libraries in the Python ecosystem that they could use.[00:11:41] Speaker: And so if we provide a coding agent with all these libraries, it can do things like data visualization and other stuff that I just showed you. So it can also get clone repositories and, and other things like this. The agents are super good at using the GitHub API also. So they can do, you know, things on GitHub, like finding all of the, you know, [00:12:00] comments on your issues or checking GitHub actions and stuff.[00:12:02] Speaker: The second thing I think about is the human agent interface. So this is like how do we get humans to interact with agents? Bye. I already showed you one variety of our human agent interface. It's basically a chat window where you can browse through the agent's results and things like this. This is very, very difficult.[00:12:18] Speaker: I, I don't think anybody has a good answer to this, and I don't think we have a good answer to this, but the, the guiding principles that I'm trying to follow are we want to present enough info to the user. So we want to present them with, you know, what the agent is doing in the form of a kind of.[00:12:36] Speaker: English descriptions. So you can see here you can see here every time it takes an action, it says like, I will help you create a script for sending emails. When it runs a bash command. Sorry, that's a little small. When it runs a bash command, it will say ran a bash command. It won't actually show you the whole bash command or the whole Jupyter notebook because it can be really large, but you can open it up and see if you [00:13:00] want to, by clicking on this.[00:13:01] Speaker: So like if you want to explore more, you can click over to the Jupyter notebook and see what's displayed in the Jupyter notebook. And you get like lots and lots of information. So that's one thing.[00:13:16] Speaker: Another thing is go where the user is. So like if the user's already interacting in a particular setting then I'd like to, you know, integrate into that setting, but only to a point. So at OpenHands, we have a chat UI for interaction. We have a GitHub plugin for tagging and resolving issues. So basically what you do is you Do at open hands agent and the open hands agent will like see that comment and be able to go in and fix things.[00:13:42] Speaker: So if you say at open hands agent tests are failing on this PR, please fix the tests. It will go in and fix the test for you and stuff like this. Another thing we have is a remote runtime for launching headless jobs. So if you want to launch like a fleet of agents to solve, you know five different problems at once, you can also do [00:14:00] that through an API.[00:14:00] Speaker: So we have we have these interfaces and this probably depends on the use case. So like, depending if you're a coding agent, you want to do things one way. If you're a like insurance auditing agent, you'll want to do things other ways, obviously.[00:14:13] Choosing the Right Language Model for Agents[00:14:13] Speaker: Another thing I think about a lot is choosing a language model.[00:14:16] Speaker: And for agentic LMs we have to have a bunch of things work really well. The first thing is really, really good instruction following ability. And if you have really good instruction following ability, it opens up like a ton of possible applications for you. Tool use and coding ability. So if you provide tools, it needs to be able to use them well.[00:14:38] Speaker: Environment understanding. So it needs, like, if you're building a web agent, it needs to be able to understand web pages either through vision or through text. And error awareness and recovery ability. So, if it makes a mistake, it needs to be able to, you know, figure out why it made a mistake, come up with alternative strategies, and other things like this.[00:14:58] Speaker: [00:15:00] Under the hood, in all of the demos that I did now Cloud, we're using Cloud. Cloud has all of these abilities very good, not perfect, but very good. Most others don't have these abilities quite as much. So like GPT 4. 0 doesn't have very good error recovery ability. And so because of this, it will go into loops and do the same thing over and over and over again.[00:15:22] Speaker: Whereas Claude does not do this. Claude, if you, if you use the agents enough, you get used to their kind of like personality. And Claude says, Hmm, let me try a different approach a lot. So, you know, obviously it's been trained in some way to, you know, elicit this ability. We did an evaluation. This is old.[00:15:40] Speaker: And we need to update this basically, but we evaluated CLOD, mini LLAMA 405B, DeepSeq 2. 5 on being a good code agent within our framework. And CLOD was kind of head and shoulders above the rest. GPT 40 was kind of okay. The best open source model was LLAMA [00:16:00] 3. 1 405B. This needs to be updated because this is like a few months old by now and, you know, things are moving really, really fast.[00:16:05] Speaker: But I still am under the impression that Claude is the best. The other closed models are, you know, not quite as good. And then the open models are a little bit behind that. Grok, I, we haven't tried Grok at all, actually. So, it's a good question. If you want to try it I'd be happy to help.[00:16:24] Speaker: Cool.[00:16:24] Planning and Workflow for Agents[00:16:24] Speaker: Another thing is planning. And so there's a few considerations for planning. The first one is whether you have a curated plan or you have it generated on the fly. And so for solving GitHub issues, you can kind of have an overall plan. Like the plan is first reproduce. If there's an issue, first write tests to reproduce the issue or to demonstrate the issue.[00:16:50] Speaker: After that, run the tests and make sure they fail. Then go in and fix the tests. Run the tests again to make sure they pass and then you're done. So that's like a pretty good workflow [00:17:00] for like solving coding issues. And you could curate that ahead of time. Another option is to let the language model basically generate its own plan.[00:17:10] Speaker: And both of these are perfectly valid. Another one is explicit structure versus implicit structure. So let's say you generate a plan. If you have explicit structure, you could like write a multi agent system, and the multi agent system would have your reproducer agent, and then it would have your your bug your test writer agent, and your bug fixer agent, and lots of different agents, and you would explicitly write this all out in code, and then then use it that way.[00:17:38] Speaker: On the other hand, you could just provide a prompt that says, please do all of these things in order. So in OpenHands, we do very light planning. We have a single prompt. We don't have any multi agent systems. But we do provide, like, instructions about, like, what to do first, what to do next, and other things like this.[00:17:56] Speaker: I'm not against doing it the other way. But I laid [00:18:00] out some kind of justification for this in this blog called Don't Sleep on Single Agent Systems. And the basic idea behind this is if you have a really, really good instruction following agent it will follow the instructions as long as things are working according to your plan.[00:18:14] Speaker: But let's say you need to deviate from your plan, you still have the flexibility to do this. And if you do explicit structure through a multi agent system, it becomes a lot harder to do that. Like, you get stuck when things deviate from your plan. There's also some other examples, and I wanted to introduce a few papers.[00:18:30] Speaker: So one paper I liked recently is this paper called CoAct where you generate plans and then go in and fix them. And so the basic idea is like, if you need to deviate from your plan, you can You know, figure out that your plan was not working and go back and deviate from it.[00:18:49] Speaker: Another thing I think about a lot is specifying common workflows. So we're trying to tackle a software development and I already showed like three use cases where we do [00:19:00] software development and when we. We do software development, we do a ton of different things, but we do them over and over and over again.[00:19:08] Speaker: So just to give an example we fix GitHub actions when GitHub actions are failing. And we do that over and over and over again. That's not the number one thing that software engineers do, but it's a, you know, high up on the list. So how can we get a list of all of, like, the workflows that people are working on?[00:19:26] Speaker: And there's a few research works that people have done in this direction. One example is manual prompting. So there's this nice paper called STEP that got state of the art on the WebArena Web Navigation Benchmark where they came up with a bunch of manual workflows for solving different web navigation tasks.[00:19:43] Speaker: And we also have a paper recently called Agent Workflow Memory where the basic idea behind this is we want to create self improving agents that learn from their past successes. And the way it works is is we have a memory that has an example of lots of the previous [00:20:00] workflows that people have used. And every time the agent finishes a task and it self judges that it did a good job at that task, you take that task, you break it down into individual workflows included in that, and then you put it back in the prompt for the agent to work next time.[00:20:16] Speaker: And this we demonstrated that this leads to a 22. 5 percent increase on WebArena after 40 examples. So that's a pretty, you know, huge increase by kind of self learning and self improvement.[00:20:31] Speaker: Another thing is exploration. Oops. And one thing I think about is like, how can agents learn more about their environment before acting? And I work on coding and web agents, and there's, you know, a few good examples of this in, in both areas. Within coding, I view this as like repository understanding, understanding the code base that you're dealing with.[00:20:55] Speaker: And there's an example of this, or a couple examples of this, one example being AgentList. [00:21:00] Where they basically create a map of the repo and based on the map of the repo, they feed that into the agent so the agent can then navigate the repo and and better know where things are. And for web agents there's an example of a paper called Bagel, and basically what they do is they have the agent just do random tasks on a website, explore the website, better understand the structure of the website, and then after that they they feed that in as part of the product.[00:21:27] Speaker: Part seven is search. Right now in open hands, we just let the agent go on a linear search path. So it's just solving the problem once. We're using a good agent that can kind of like recover from errors and try alternative things when things are not working properly, but still we only have a linear search path.[00:21:45] Speaker: But there's also some nice work in 2024 that is about exploring multiple paths. So one example of this is there's a paper called Tree Search for Language Agents. And they basically expand multiple paths check whether the paths are going well, [00:22:00] and if they aren't going well, you rewind back. And on the web, this is kind of tricky, because, like, how do you rewind when you accidentally ordered something you don't want on Amazon?[00:22:09] Speaker: It's kind of, you know, not, not the easiest thing to do. For code, it's a little bit easier, because you can just revert any changes that you made. But I, I think that's an interesting topic, too.[00:22:21] Evaluation and Future Predictions for Agents[00:22:21] Speaker: And then finally evaluation. So within our development for evaluation, we want to do a number of things. The first one is fast sanity checks.[00:22:30] Speaker: And in order to do this, we want things we can run really fast, really really cheaply. So for web, we have something called mini world of bits, which is basically these trivial kind of web navigation things. We have something called the Adder Code Editing Benchmark, where it's just about editing individual files that we use.[00:22:48] Speaker: But we also want highly realistic evaluation. So for the web, we have something called WebArena that we created at CMU. This is web navigation on real real open source websites. So it's open source [00:23:00] websites that are actually used to serve shops or like bulletin boards or other things like this.[00:23:07] Speaker: And for code, we use Swebench, which I think a lot of people may have heard of. It's basically a coding benchmark that comes from real world pull requests on GitHub. So if you can solve those, you can also probably solve other real world pull requests. I would say we still don't have benchmarks for the fur full versatility of agents.[00:23:25] Speaker: So, for example We don't have benchmarks that test whether agents can code and do web navigation. But we're working on that and hoping to release something in the next week or two. So if that sounds interesting to you, come talk to me and I, I will tell you more about it.[00:23:42] Speaker: Cool. So I don't like making predictions, but I was told that I should be somewhat controversial, I guess, so I will, I will try to do it try to do it anyway, although maybe none of these will be very controversial. Um, the first thing is agent oriented LLMs like large language models for [00:24:00] agents.[00:24:00] Speaker: My, my prediction is every large LM trainer will be focusing on training models as agents. So every large language model will be a better agent model by mid 2025. Competition will increase, prices will go down, smaller models will become competitive as agents. So right now, actually agents are somewhat expensive to run in some cases, but I expect that that won't last six months.[00:24:23] Speaker: I, I bet we'll have much better agent models in six months. Another thing is instruction following ability, specifically in agentic contexts, will increase. And what that means is we'll have to do less manual engineering of agentic workflows and be able to do more by just prompting agents in more complex ways.[00:24:44] Speaker: Cloud is already really good at this. It's not perfect, but it's already really, really good. And I expect the other models will catch up to Cloud pretty soon. Error correction ability will increase, less getting stuck in loops. Again, this is something that Cloud's already pretty good at and I expect the others will, will follow.[00:25:00][00:25:01] Speaker: Agent benchmarks. Agent benchmarks will start saturating.[00:25:05] Speaker: And Swebench I think WebArena is already too easy. It, it is, it's not super easy, but it's already a bit too easy because the tasks we do in there are ones that take like two minutes for a human. So not, not too hard. And kind of historically in 2023 our benchmarks were too easy. So we built harder benchmarks like WebArena and Swebench were both built in 2023.[00:25:31] Future of Agent Development[00:25:31] Speaker: In 2024, our agents were too bad, so we built agents and now we're building better agents. In 2025, our benchmarks will be too easy, so we'll build better benchmarks, I'm, I'm guessing. So, I would expect to see much more challenging agent benchmarks come out, and we're already seeing some of them.[00:25:49] Speaker: In 2026, I don't know. I didn't write AGI, but we'll, we'll, we'll see.[00:25:56] Human-Agent Interaction Challenges[00:25:56] Speaker: Then the human agent computer interface. I think one thing that [00:26:00] we'll want to think about is what do we do at 75 percent success rate at things that we like actually care about? Right now we have 53 percent or 55 percent on Swebench verified, which is real world GitHub PRs.[00:26:16] Speaker: My impression is that the actual. Actual ability of models is maybe closer to 30 to 40%. So 30 to 40 percent of the things that I want an agent to solve on my own repos, it just solves without any human intervention. 80 to 90 percent it can solve without me opening an IDE. But I need to give it feedback.[00:26:36] Speaker: So how do we, how do we make that interaction smooth so that humans can audit? The work of agents that are really, really good, but not perfect is going to be a big challenge.[00:26:48] Expanding Agent Use Beyond Programming[00:26:48] Speaker: How can we expose the power of programming agents to other industries? So like as programmers, I think not all of us are using agents every day in our programming, although we probably will be [00:27:00] in in months or maybe a year.[00:27:02] Speaker: But I, I think it will come very naturally to us as programmers because we know code. We know, you know. Like how to architect software and stuff like that. So I think the question is how do we put this in the hands of like a lawyer or a chemist or somebody else and have them also be able to, you know, interact with it as naturally as we can.[00:27:25] Redesigning Systems for Agent Efficiency[00:27:25] Speaker: Another interesting thing is how can we redesign our existing systems for agents? So we had a paper on API based web agents, and basically what we showed is If you take a web agent and the agent interacts not with a website, but with APIs, the accuracy goes way up just because APIs are way easier to interact with.[00:27:42] Speaker: And in fact, like when I ask the, well, our agent, our agent is able to browse websites, but whenever I want it to interact with GitHub, I tell it do not browse the GitHub website. Use the GitHub API because it's way more successful at doing that. So maybe, you know, every website is going to need to have [00:28:00] an API because we're going to be having agents interact with them.[00:28:03] Accelerating Progress with Agent Technology[00:28:03] Speaker: About progress, I think progress will get faster. It's already fast. A lot of people are already overwhelmed, but I think it will continue. The reason why is agents are building agents. And better agents will build better agents faster. So I expect that you know, if you haven't interacted with a coding agent yet, it's pretty magical, like the stuff that it can do.[00:28:24] Speaker: So yeah.[00:28:28] Call to Action for Open Source Contributions[00:28:28] Speaker: And I have a call to action. I'm honestly, like I've been working on, you know, natural language processing and, and Language models for what, 15 years now. And even for me, it's pretty impressive what like AI agents powered by strong language models can do. On the other hand, I believe that we should really make these powerful tools accessible.[00:28:49] Speaker: And what I mean by this is I don't think like, you know, We, we should have these be opaque or limited to only a set, a certain set of people. I feel like they should be [00:29:00] affordable. They shouldn't be increasing the, you know, difference in the amount of power that people have. If anything, I'd really like them to kind of make it It's possible for people who weren't able to do things before to be able to do them well.[00:29:13] Speaker: Open source is one way to do that. That's why I'm working on open source. There are other ways to do that. You know, make things cheap, make things you know, so you can serve them to people who aren't able to afford them. Easily, like Duolingo is one example where they get all the people in the US to pay them 20 a month so that they can give all the people in South America free, you know, language education, so they can learn English and become, you know like, and become, you know, More attractive on the job market, for instance.[00:29:41] Speaker: And so I think we can all think of ways that we can do that sort of thing. And if that resonates with you, please contribute. Of course, I'd be happy if you contribute to OpenHands and use it. But another way you can do that is just use open source solutions, contribute to them, research with them, and train strong open source [00:30:00] models.[00:30:00] Speaker: So I see, you know, Some people in the room who are already training models. It'd be great if you could train models for coding agents and make them cheap. And yeah yeah, please. I, I was thinking about you among others. So yeah, that's all I have. Thanks.[00:30:20] Speaker 2: Slight, slightly controversial. Tick is probably the nicest way to say hot ticks. Any hot ticks questions, actual hot ticks?[00:30:31] Speaker: Oh, I can also show the other agents that were working, if anybody's interested, but yeah, sorry, go ahead.[00:30:36] Q&A: Agent Performance and Benchmarks[00:30:36] Speaker 3: Yeah, I have a couple of questions. So they're kind of paired, maybe. The first thing is that you said that You're estimating that your your agent is successfully resolving like something like 30 to 40 percent of your issues, but that's like below what you saw in Swebench.[00:30:52] Speaker 3: So I guess I'm wondering where that discrepancy is coming from. And then I guess my other second question, which is maybe broader in scope is that [00:31:00] like, if, if you think of an agent as like a junior developer, and I say, go do something, then I expect maybe tomorrow to get a Slack message being like, Hey, I ran into this issue.[00:31:10] Speaker 3: How can I resolve it? And, and, like you said, your agent is, like, successfully solving, like, 90 percent of issues where you give it direct feedback. So, are you thinking about how to get the agent to reach out to, like, for, for planning when it's, when it's stuck or something like that? Or, like, identify when it runs into a hole like that?[00:31:30] Speaker: Yeah, so great. These are great questions. Oh,[00:31:32] Speaker 3: sorry. The third question, which is a good, so this is the first two. And if so, are you going to add a benchmark for that second question?[00:31:40] Speaker: Okay. Great. Yeah. Great questions. Okay. So the first question was why do I think it's resolving less than 50 percent of the issues on Swebench?[00:31:48] Speaker: So first Swebench is on popular open source repos, and all of these popular open source repos were included in the training data for all of the language models. And so the language [00:32:00] models already know these repos. In some cases, the language models already know the individual issues in Swebench.[00:32:06] Speaker: So basically, like, some of the training data has leaked. And so it, it definitely will overestimate with respect to that. I don't think it's like, you know, Horribly, horribly off but I think, you know, it's boosting the accuracy by a little bit. So, maybe that's the biggest reason why. In terms of asking for help, and whether we're benchmarking asking for help yes we are.[00:32:29] Speaker: So one one thing we're working on now, which we're hoping to put out soon, is we we basically made SuperVig. Sweep edge issues. Like I'm having a, I'm having a problem with the matrix multiply. Please help. Because these are like, if anybody's run a popular open source, like framework, these are what half your issues are.[00:32:49] Speaker: You're like users show up and say like, my screen doesn't work. What, what's wrong or something. And so then you need to ask them questions and how to reproduce. So yeah, we're, we're, we're working on [00:33:00] that. I think. It, my impression is that agents are not very good at asking for help, even Claude. So like when, when they ask for help, they'll ask for help when they don't need it.[00:33:11] Speaker: And then won't ask for help when they do need it. So this is definitely like an issue, I think.[00:33:20] Speaker 4: Thanks for the great talk. I also have two questions.[00:33:23] Q&A: Web Agents and Interaction Methods[00:33:23] Speaker 4: It's first one can you talk a bit more about how the web agent interacts with So is there a VLM that looks at the web page layout and then you parse the HTML and select which buttons to click on? And if so do you think there's a future where there's like, so I work at Bing Microsoft AI.[00:33:41] Speaker 4: Do you think there's a future where the same web index, but there's an agent friendly web index where all the processing is done offline so that you don't need to spend time. Cleaning up, like, cleaning up these TML and figuring out what to click online. And any thoughts on, thoughts on that?[00:33:57] Speaker: Yeah, so great question. There's a lot of work on web [00:34:00] agents. I didn't go into, like, all of the details, but I think there's There's three main ways that agents interact with websites. The first way is the simplest way and the newest way, but it doesn't work very well, which is you take a screenshot of the website and then you click on a particular pixel value on the website.[00:34:23] Speaker: And Like models are not very good at that at the moment. Like they'll misclick. There was this thing about how like clawed computer use started like looking at pictures of Yellowstone national park or something like this. I don't know if you heard about this anecdote, but like people were like, oh, it's so human, it's looking for vacation.[00:34:40] Speaker: And it was like, no, it probably just misclicked on the wrong pixels and accidentally clicked on an ad. So like this is the simplest way. The second simplest way. You take the HTML and you basically identify elements in the HTML. You don't use any vision whatsoever. And then you say, okay, I want to click on this element.[00:34:59] Speaker: I want to enter text [00:35:00] in this element or something like that. But HTML is too huge. So it actually, it usually gets condensed down into something called an accessibility tree, which was made for screen readers for visually impaired people. And So that's another way. And then the third way is kind of a hybrid where you present the screenshot, but you also present like a textual summary of the output.[00:35:18] Speaker: And that's the one that I think will probably work best. What we're using is we're just using text at the moment. And that's just an implementation issue that we haven't implemented the. Visual stuff yet, but that's kind of like we're working on it now. Another thing that I should point out is we actually have two modalities for web browsing.[00:35:35] Speaker: Very recently we implemented this. And the reason why is because if you want to interact with full websites you will need to click on all of the elements or have the ability to click on all of the elements. But most of our work that we need websites for is just web browsing and like gathering information.[00:35:50] Speaker: So we have another modality where we convert all of it to markdown because that's like way more concise and easier for the agent to deal with. And then [00:36:00] can we create an index specifically for agents, maybe a markdown index or something like that would be, you know, would make sense. Oh, how would I make a successor to Swebench?[00:36:10] Speaker: So I mean, the first thing is there's like live code bench, which live code bench is basically continuously updating to make sure it doesn't leak into language model training data. That's easy to do for Swebench because it comes from real websites and those real websites are getting new issues all the time.[00:36:27] Speaker: So you could just do it on the same benchmarks that they have there. There's also like a pretty large number of things covering various coding tasks. So like, for example, Swebunch is mainly fixing issues, but there's also like documentation, there's generating tests that actually test the functionality that you want.[00:36:47] Speaker: And there there was a paper by a student at CMU on generating tests and stuff like that. So I feel like. Swebench is one piece of the puzzle, but you could also have like 10 different other tasks and then you could have like a composite [00:37:00] benchmark where you test all of these abilities, not just that particular one.[00:37:04] Speaker: Well, lots, lots of other things too, but[00:37:11] Speaker 2: Question from across. Use your mic, it will help. Um,[00:37:15] Speaker 5: Great talk. Thank you.[00:37:16] Q&A: Agent Architectures and Improvements[00:37:16] Speaker 5: My question is about your experience designing agent architectures. Specifically how much do you have to separate concerns in terms of tasks specific agents versus having one agent to do three or five things with a gigantic prompt with conditional paths and so on.[00:37:35] Speaker: Yeah, so that's a great question. So we have a basic coding and browsing agent. And I won't say basic, like it's a good, you know, it's a good agent, but it does coding and browsing. And it has instructions about how to do coding and browsing. That is enough for most things. Especially given a strong language model that has a lot of background knowledge about how to solve different types of tasks and how to use different APIs and stuff like that.[00:37:58] Speaker: We do have [00:38:00] a mechanism for something called micro agents. And micro agents are basically something that gets added to the prompt when a trigger is triggered. Right now it's very, very rudimentary. It's like if you detect the word GitHub anywhere, you get instructions about how to interact with GitHub, like use the API and don't browse.[00:38:17] Speaker: Also another one that I just added is for NPM, the like JavaScript package manager. And NPM, when it runs and it hits a failure, it Like hits in interactive terminals where it says, would you like to quit? Yep. Enter yes. And if that does it, it like stalls our agent for the time out until like two minutes.[00:38:36] Speaker: So like I added a new microagent whenever it started using NPM, it would Like get instructions about how to not use interactive terminal and stuff like that. So that's our current solution. Honestly, I like it a lot. It's simple. It's easy to maintain. It works really well and stuff like that. But I think there is a world where you would want something more complex than that.[00:38:55] Speaker 5: Got it. Thank you.[00:38:59] Speaker 6: I got a [00:39:00] question about MCP. I feel like this is the Anthropic Model Context Protocol. It seems like the most successful type of this, like, standardization of interactions between computers and agents. Are you guys adopting it? Is there any other competing standard?[00:39:16] Speaker 6: Anything, anything thought about it?[00:39:17] Speaker: Yeah, I think the Anth, so the Anthropic MCP is like, a way to It, it's essentially a collection of APIs that you can use to interact with different things on the internet. I, I think it's not a bad idea, but it, it's like, there's a few things that bug me a little bit about it.[00:39:40] Speaker: It's like we already have an API for GitHub, so why do we need an MCP for GitHub? Right. You know, like GitHub has an API, the GitHub API is evolving. We can look up the GitHub API documentation. So it seems like kind of duplicated a little bit. And also they have a setting where [00:40:00] it's like you have to spin up a server to serve your GitHub stuff.[00:40:04] Speaker: And you have to spin up a server to serve your like, you know, other stuff. And so I think it makes, it makes sense if you really care about like separation of concerns and security and like other things like this, but right now we haven't seen, we haven't seen that. To have a lot more value than interacting directly with the tools that are already provided.[00:40:26] Speaker: And that kind of goes into my general philosophy, which is we're already developing things for programmers. You know,[00:40:36] Speaker: how is an agent different than from a programmer? And it is different, obviously, you know, like agents are different from programmers, but they're not that different at this point. So we can kind of interact with the interfaces we create for, for programmers. Yeah. I might change my mind later though.[00:40:51] Speaker: So we'll see.[00:40:54] Speaker 7: Yeah. Hi. Thanks. Very interesting talk. You were saying that the agents you have right now [00:41:00] solve like maybe 30 percent of your, your issues out of the gate. I'm curious of the things that it doesn't do. Is there like a pattern that you observe? Like, Oh, like these are the sorts of things that it just seems to really struggle with, or is it just seemingly random?[00:41:15] Speaker: It's definitely not random. It's like, if you think it's more complex than it's. Like, just intuitively, it's more likely to fail. I've gotten a bit better at prompting also, so like, just to give an example it, it will sometimes fail to fix a GitHub workflow because it will not look at the GitHub workflow and understand what the GitHub workflow is doing before it solves the problem.[00:41:43] Speaker: So I, I think actually probably the biggest thing that it fails at is, um, er, that our, our agent plus Claude fails at is insufficient information gathering before trying to solve the task. And so if you provide all, if you provide instructions that it should do information [00:42:00] gathering beforehand, it tends to do well.[00:42:01] Speaker: If you don't provide sufficient instructions, it will try to solve the task without, like, fully understanding the task first, and then fail, and then you need to go back and give feedback. You know, additional feedback. Another example, like, I, I love this example. While I was developing the the monitor website that I, I showed here, we hit a really tricky bug where it was writing out a cache file to a different directory than it was reading the cache file from.[00:42:26] Speaker: And I had no idea what to do. I had no idea what was going on. I, I thought the bug was in a different part of the code, but what I asked it to do was come up with five possible reasons why this could be failing and decreasing order of likelihood and examine all of them. And that worked and it could just go in and like do that.[00:42:44] Speaker: So like I think a certain level of like scaffolding about like how it should sufficiently Gather all the information that's necessary in order to solve a task is like, if that's missing, then that's probably the biggest failure point at the moment. [00:43:00][00:43:01] Speaker 7: Thanks.[00:43:01] Speaker 6: Yeah.[00:43:06] Speaker 6: I'm just, I'm just using this as a chance to ask you all my questions.[00:43:09] Q&A: Self-Improving Agents and Authentication[00:43:09] Speaker 6: You had a, you had a slide on here about like self improving agents or something like that with memory. It's like a really throwaway slide for like a super powerful idea. It got me thinking about how I would do it. I have no idea how.[00:43:21] Speaker 6: So I just wanted you to chain a thought more on this.[00:43:25] Speaker: Yeah, self, self improving. So I think the biggest reason, like the simplest possible way to create a self improving agent. The problem with that is to have a really, really strong language model that with infinite context, and it can just go back and look at like all of its past experiences and, you know, learn from them.[00:43:46] Speaker: You might also want to remove the bad stuff just so it doesn't over index on it's like failed past experiences. But the problem is a really powerful language model is large. Infinite context is expensive. We don't have a good way to [00:44:00] index into it because like rag, Okay. At least in my experience, RAG from language to code doesn't work super well.[00:44:08] Speaker: So I think in the end, it's like, that's the way I would like to solve this problem. I'd like to have an infinite context and somehow be able to index into it appropriately. And I think that would mostly solve it. Another thing you can do is fine tuning. So I think like RAG is one way to get information into your model.[00:44:23] Speaker: Fine tuning is another way to get information into your model. So. That might be another way of continuously improving. Like you identify when you did a good job and then just add all of the good examples into your model.[00:44:34] Speaker 6: Yeah. So, you know, how like Voyager tries to write code into a skill library and then you reuse as a skill library, right?[00:44:40] Speaker 6: So that it improves in the sense that it just builds up the skill library over time.[00:44:44] Speaker: Yep.[00:44:44] Speaker 6: One thing I was like thinking about and there's this idea of, from, from Devin, your, your arch nemesis of playbooks. I don't know if you've seen them.[00:44:52] Speaker: Yeah, I mean, we're calling them workflows, but they're simpler.[00:44:55] Speaker 6: Yeah, so like, basically, like, you should, like, once a workflow works, you can kind of, [00:45:00] like, persist them as a skill library. Yeah. Right? Like I, I feel like that there's a, that's like some in between, like you said, you know, it's hard to do rag between language and code, but I feel like that is ragged for, like, I've done this before, last time I did it, this, this worked.[00:45:14] Speaker 6: So I'm just going to shortcut. All the stuff that failed before.[00:45:18] Speaker: Yeah, I totally, I think it's possible. It's just, you know, not, not trivial at the same time. I'll explain the two curves. So basically, the base, the baseline is just an agent that does it from scratch every time. And this curve up here is agent workflow memory where it's like adding the successful experiences back into the prompt.[00:45:39] Speaker: Why is this improving? The reason why is because just it failed on the first few examples and for the average to catch up it, it took a little bit of time. So it's not like this is actually improving it. You could just basically view the this one is constant and then this one is like improving.[00:45:56] Speaker: Like this, basically you can see it's continuing to go [00:46:00] up.[00:46:01] Speaker 8: How do you think we're going to solve the authentication problem for agents right now?[00:46:05] Speaker: When you say authentication, you mean like credentials, like, yeah.[00:46:09] Speaker 8: Yeah. Cause I've seen a few like startup solutions today, but it seems like it's limited to the amount of like websites or actual like authentication methods that it's capable of performing today.[00:46:19] Speaker: Yeah. Great questions. So. My preferred solution to this at the moment is GitHub like fine grained authentication tokens and GitHub fine grained authentication tokens allow you to specify like very free. On a very granular basis on this repo, you have permission to do this, on this repo, you have permission to do this.[00:46:41] Speaker: You also can prevent people from pushing to the main branch unless they get approved. You can do all of these other things. And I think these were all developed for human developers. Or like, the branch protection rules were developed for human developers. The fine grained authentication tokens were developed for GitHub apps.[00:46:56] Speaker: I think for GitHub, maybe [00:47:00] just pushing this like a little bit more is the way to do this. For other things, they're totally not prepared to give that sort of fine grained control. Like most APIs don't have something like a fine grained authentication token. And that goes into my like comment that we're going to need to prepare the world for agents, I think.[00:47:17] Speaker: But I think like the GitHub authentication tokens are like a good template for how you could start doing that maybe, but yeah, I don't, I don't, I don't have an answer.[00:47:25] Speaker 8: I'll let you know if I find one.[00:47:26] Speaker: Okay. Yeah.[00:47:31] Live Demonstration and Closing Remarks[00:47:31] Speaker: I'm going to finish up. Let, let me just see.[00:47:37] Speaker: Okay. So this one this one did write a script. I'm not going to actually read it for you. And then the other one, let's see.[00:47:51] Speaker: Yeah. So it sent a PR, sorry. What is, what is the PR URL?[00:48:00][00:48:02] Speaker: So I don't, I don't know if this sorry, that's taking way longer than it should. Okay, cool. Yeah. So this one sent a PR. I'll, I'll tell you later if this actually like successfully Oh, no, it's deployed on Vercel, so I can actually show you, but let's, let me try this real quick. Sorry. I know I don't have time.[00:48:24] Speaker: Yeah, there you go. I have pie charts now. So it's so fun. It's so fun to play with these things. Cause you could just do that while I'm giving a, you know, talk and things like that. So, yeah, thanks. Get full access to Latent Space at www.latent.space/subscribe

2024 in Synthetic Data and Smol Models [LS Live @ NeurIPS]

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Dec 24, 2024 28:36

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver. Today, we're proud to share Loubna's highly anticipated talk (slides here)!Synthetic DataWe called out the Synthetic Data debate at last year's NeurIPS, and no surprise that 2024 was dominated by the rise of synthetic data everywhere:* Apple's Rephrasing the Web, Microsoft's Phi 2-4 and Orca/AgentInstruct, Tencent's Billion Persona dataset, DCLM, and HuggingFace's FineWeb-Edu, and Loubna's own Cosmopedia extended the ideas of synthetic textbook and agent generation to improve raw web scrape dataset quality* This year we also talked to the IDEFICS/OBELICS team at HuggingFace who released WebSight this year, the first work on code-vs-images synthetic data.* We called Llama 3.1 the Synthetic Data Model for its extensive use (and documentation!) of synthetic data in its pipeline, as well as its permissive license. * Nemotron CC and Nemotron-4-340B also made a big splash this year for how they used 20k items of human data to synthesize over 98% of the data used for SFT/PFT.* Cohere introduced Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress observing gains of up to 56.5% improvement in win rates comparing multiple teachers vs the single best teacher model* In post training, AI2's Tülu3 (discussed by Luca in our Open Models talk) and Loubna's Smol Talk were also notable open releases this year.This comes in the face of a lot of scrutiny and criticism, with Scale AI as one of the leading voices publishing AI models collapse when trained on recursively generated data in Nature magazine bringing mainstream concerns to the potential downsides of poor quality syndata:Part of the concerns we highlighted last year on low-background tokens are coming to bear: ChatGPT contaminated data is spiking in every possible metric:But perhaps, if Sakana's AI Scientist pans out this year, we will have mostly-AI AI researchers publishing AI research anyway so do we really care as long as the ideas can be verified to be correct?Smol ModelsMeta surprised many folks this year by not just aggressively updating Llama 3 and adding multimodality, but also adding a new series of “small” 1B and 3B “on device” models this year, even working on quantized numerics collaborations with Qualcomm, Mediatek, and Arm. It is near unbelievable that a 1B model today can qualitatively match a 13B model of last year:and the minimum size to hit a given MMLU bar has come down roughly 10x in the last year. We have been tracking this proxied by Lmsys Elo and inference price:The key reads this year are:* MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases* Apple Intelligence Foundation Language Models* Hymba: A Hybrid-head Architecture for Small Language Models* Loubna's SmolLM and SmolLM2: a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters on the pareto efficiency frontier.* and Moondream, which we already covered in the 2024 in Vision talkFull Talk on YouTubeplease like and subscribe!Timestamps* [00:00:05] Loubna Intro* [00:00:33] The Rise of Synthetic Data Everywhere* [00:02:57] Model Collapse* [00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks* [00:12:36] DCLM, Nemotron-CC* [00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage* [00:16:17] Smol Models* [00:18:24] On Device Models* [00:22:45] Smol Vision Models* [00:25:14] What's NextTranscript2024 in Synthetic Data and Smol Models[00:00:00] [00:00:05] Loubna Intro[00:00:05] Speaker: I'm very happy to be here. Thank you for the invitation. So I'm going to be talking about synthetic data in 2024. And then I'm going to be talking about small on device models. So I think the most interesting thing about synthetic data this year is that like now we have it everywhere in the large language models pipeline.[00:00:33] The Rise of Synthetic Data Everywhere[00:00:33] Speaker: I think initially, synthetic data was mainly used just for post training, because naturally that's the part where we needed human annotators. And then after that, we realized that we don't really have good benchmarks to [00:01:00] measure if models follow instructions well, if they are creative enough, or if they are chatty enough, so we also started using LLMs as judges.[00:01:08] Speaker: Thank you. And I think this year and towards the end of last year, we also went to the pre training parts and we started generating synthetic data for pre training to kind of replace some parts of the web. And the motivation behind that is that you have a lot of control over synthetic data. You can control your prompt and basically also the kind of data that you generate.[00:01:28] Speaker: So instead of just trying to filter the web, you could try to get the LLM to generate what you think the best web pages could look like and then train your models on that. So this is how we went from not having synthetic data at all in the LLM pipeline to having it everywhere. And so the cool thing is like today you can train an LLM with like an entirely synthetic pipeline.[00:01:49] Speaker: For example, you can use our Cosmopedia datasets and you can train a 1B model on like 150 billion tokens that are 100 percent synthetic. And those are also of good quality. And then you can [00:02:00] instruction tune the model on a synthetic SFT dataset. You can also do DPO on a synthetic dataset. And then to evaluate if the model is good, you can use.[00:02:07] Speaker: A benchmark that uses LLMs as a judge, for example, MTBench or AlpacaEvil. So I think this is like a really mind blowing because like just a few years ago, we wouldn't think this is possible. And I think there's a lot of concerns about model collapse, and I'm going to talk about that later. But we'll see that like, if we use synthetic data properly and we curate it carefully, that shouldn't happen.[00:02:29] Speaker: And the reason synthetic data is very popular right now is that we have really strong models, both open and closed. It is really cheap and fast to use compared to human annotations, which cost a lot and take a lot of time. And also for open models right now, we have some really good inference frameworks.[00:02:47] Speaker: So if you have enough GPUs, it's really easy to spawn these GPUs and generate like a lot of synthetic data. Some examples are VLM, TGI, and TensorRT.[00:02:57] Model Collapse[00:02:57] Speaker: Now let's talk about the elephant in the room, model [00:03:00] collapse. Is this the end? If you look at the media and all of like, for example, some papers in nature, it's really scary because there's a lot of synthetic data out there in the web.[00:03:09] Speaker: And naturally we train on the web. So we're going to be training a lot of synthetic data. And if model collapse is going to happen, we should really try to take that seriously. And the other issue is that, as I said, we think, a lot of people think the web is polluted because there's a lot of synthetic data.[00:03:24] Speaker: And for example, when we're building fine web datasets here at Guillerm and Hinek, we're interested in like, how much synthetic data is there in the web? So there isn't really a method to properly measure the amount of synthetic data or to save a webpage synthetic or not. But one thing we can do is to try to look for like proxy words, for example, expressions like as a large language model or words like delve that we know are actually generated by chat GPT.[00:03:49] Speaker: We could try to measure the amount of these words in our data system and compare them to the previous years. For example, here, we measured like a, these words ratio in different dumps of common crawl. [00:04:00] And we can see that like the ratio really increased after chat GPT's release. So if we were to say that synthetic data amount didn't change, you would expect this ratio to stay constant, which is not the case.[00:04:11] Speaker: So there's a lot of synthetic data probably on the web, but does this really make models worse? So what we did is we trained different models on these different dumps. And we then computed their performance on popular, like, NLP benchmarks, and then we computed the aggregated score. And surprisingly, you can see that the latest DOMs are actually even better than the DOMs that are before.[00:04:31] Speaker: So if there's some synthetic data there, at least it did not make the model's worse. Yeah, which is really encouraging. So personally, I wouldn't say the web is positive with Synthetic Data. Maybe it's even making it more rich. And the issue with like model collapse is that, for example, those studies, they were done at like a small scale, and you would ask the model to complete, for example, a Wikipedia paragraph, and then you would train it on these new generations, and you would do that every day.[00:04:56] Speaker: iteratively. I think if you do that approach, it's normal to [00:05:00] observe this kind of behavior because the quality is going to be worse because the model is already small. And then if you train it just on its generations, you shouldn't expect it to become better. But what we're really doing here is that we take a model that is very large and we try to distill its knowledge into a model that is smaller.[00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks[00:05:14] Speaker: And in this way, you can expect to get like a better performance for your small model. And using synthetic data for pre-training has become really popular. After the textbooks are all you need papers where Microsoft basically trained a series of small models on textbooks that were using a large LLM.[00:05:32] Speaker: And then they found that these models were actually better than models that are much larger. So this was really interesting. It was like first of its time, but it was also met with a lot of skepticism, which is a good thing in research. It pushes you to question things because the dataset that they trained on was not public, so people were not really sure if these models are really good or maybe there's just some data contamination.[00:05:55] Speaker: So it was really hard to check if you just have the weights of the models. [00:06:00] And as Hugging Face, because we like open source, we tried to reproduce what they did. So this is our Cosmopedia dataset. We basically tried to follow a similar approach to what they documented in the paper. And we created a synthetic dataset of textbooks and blog posts and stories that had almost 30 billion tokens.[00:06:16] Speaker: And we tried to train some models on that. And we found that like the key ingredient to getting a good data set that is synthetic is trying as much as possible to keep it diverse. Because if you just throw the same prompts as your model, like generate like a textbook about linear algebra, and even if you change the temperature, the textbooks are going to look alike.[00:06:35] Speaker: So there's no way you could scale to like millions of samples. And the way you do that is by creating prompts that have some seeds that make them diverse. In our case, the prompt, we would ask the model to generate a textbook, but make it related to an extract from a webpage. And also we try to frame it within, to stay within topic.[00:06:55] Speaker: For example, here, we put like an extract about cardiovascular bioimaging, [00:07:00] and then we ask the model to generate a textbook related to medicine that is also related to this webpage. And this is a really nice approach because there's so many webpages out there. So you can. Be sure that your generation is not going to be diverse when you change the seed example.[00:07:16] Speaker: One thing that's challenging with this is that you want the seed samples to be related to your topics. So we use like a search tool to try to go all of fine web datasets. And then we also do a lot of experiments with the type of generations we want the model to generate. For example, we ask it for textbooks for middle school students or textbook for college.[00:07:40] Speaker: And we found that like some generation styles help on some specific benchmarks, while others help on other benchmarks. For example, college textbooks are really good for MMLU, while middle school textbooks are good for benchmarks like OpenBookQA and Pico. This is like a sample from like our search tool.[00:07:56] Speaker: For example, you have a top category, which is a topic, and then you have some [00:08:00] subtopics, and then you have the topic hits, which are basically the web pages in fine web does belong to these topics. And here you can see the comparison between Cosmopedia. We had two versions V1 and V2 in blue and red, and you can see the comparison to fine web, and as you can see throughout the training training on Cosmopedia was consistently better.[00:08:20] Speaker: So we managed to get a data set that was actually good to train these models on. It's of course so much smaller than FineWeb, it's only 30 billion tokens, but that's the scale that Microsoft data sets was, so we kind of managed to reproduce a bit what they did. And the data set is public, so everyone can go there, check if everything is all right.[00:08:38] Speaker: And now this is a recent paper from NVIDIA, Neumatron CC. They took things a bit further, and they generated not a few billion tokens, but 1. 9 trillion tokens, which is huge. And we can see later how they did that. It's more of, like, rephrasing the web. So we can see today that there's, like, some really huge synthetic datasets out there, and they're public, so, [00:09:00] like, you can try to filter them even further if you want to get, like, more high quality corpses.[00:09:04] Speaker: So for this, rephrasing the web this approach was suggested in this paper by Pratyush, where basically in this paper, they take some samples from C4 datasets, and then they use an LLM to rewrite these samples into a better format. For example, they ask an LLM to rewrite the sample into a Wikipedia passage or into a Q& A page.[00:09:25] Speaker: And the interesting thing in this approach is that you can use a model that is Small because it doesn't, rewriting doesn't require knowledge. It's just rewriting a page into a different style. So the model doesn't need to have like knowledge that is like extensive of what is rewriting compared to just asking a model to generate a new textbook and not giving it like ground truth.[00:09:45] Speaker: So here they rewrite some samples from C4 into Q& A, into Wikipedia, and they find that doing this works better than training just on C4. And so what they did in Nemo Trans CC is a similar approach. [00:10:00] They rewrite some pages from Common Crawl for two reasons. One is to, like improve Pages that are low quality, so they rewrite them into, for example, Wikipedia page, so they look better.[00:10:11] Speaker: And another reason is to create more diverse datasets. So they have a dataset that they already heavily filtered, and then they take these pages that are already high quality, and they ask the model to rewrite them in Question and Answer format. into like open ended questions or like multi choice questions.[00:10:27] Speaker: So this way they can reuse the same page multiple times without fearing like having multiple duplicates, because it's the same information, but it's going to be written differently. So I think that's also a really interesting approach for like generating synthetic data just by rephrasing the pages that you already have.[00:10:44] Speaker: There's also this approach called Prox where they try to start from a web page and then they generate a program which finds how to write that page to make it better and less noisy. For example, here you can see that there's some leftover metadata in the web page and you don't necessarily want to keep that for training [00:11:00] your model.[00:11:00] Speaker: So So they train a model that can generate programs that can like normalize and remove lines that are extra. So I think this approach is also interesting, but it's maybe less scalable than the approaches that I presented before. So that was it for like rephrasing and generating new textbooks.[00:11:17] Speaker: Another approach that I think is really good and becoming really popular for using synthetic data for pre training is basically building a better classifiers. For filtering the web for example, here we release the data sets called fine web edu. And the way we built it is by taking Llama3 and asking it to rate the educational content of web pages from zero to five.[00:11:39] Speaker: So for example, if a page is like a really good textbook that could be useful in a school setting, it would get a really high score. And if a page is just like an advertisement or promotional material, it would get a lower score. And then after that, we take these synthetic annotations and we train a classifier on them.[00:11:57] Speaker: It's a classifier like a BERT model. [00:12:00] And then we run this classifier on all of FineWeb, which is a 15 trillion tokens dataset. And then we only keep the pages that have like a score that's higher than 3. So for example, in our case, we went from 15 trillion tokens to 3. to just 1. 5 trillion tokens. Those are really highly educational.[00:12:16] Speaker: And as you can see here, a fine web EDU outperforms all the other public web datasets by a larger margin on a couple of benchmarks here, I show the aggregated score and you can see that this approach is really effective for filtering web datasets to get like better corpuses for training your LLMs.[00:12:36] DCLM, Nemotron-CC[00:12:36] Speaker: Others also try to do this approach. There's, for example, the DCLM datasets where they also train the classifier, but not to detect educational content. Instead, they trained it on OpenHermes dataset, which is a dataset for instruction tuning. And also they explain like IAM5 subreddits, and then they also get really high quality dataset which is like very information dense and can help [00:13:00] you train some really good LLMs.[00:13:01] Speaker: And then Nemotron Common Crawl, they also did this approach, but instead of using one classifier, they used an ensemble of classifiers. So they used, for example, the DCLM classifier, and also classifiers like the ones we used in FineWebEducational, and then they combined these two. Scores into a, with an ensemble method to only retain the best high quality pages, and they get a data set that works even better than the ones we develop.[00:13:25] Speaker: So that was it for like synthetic data for pre-training.[00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage[00:13:28] Speaker: Now we can go back to post training. I think there's a lot of interesting post training data sets out there. One that was released recently, the agent instructs by Microsoft where they basically try to target some specific skills. And improve the performance of models on them.[00:13:43] Speaker: For example, here, you can see code, brain teasers, open domain QA, and they managed to get a dataset that outperforms that's when fine tuning Mistral 7b on it, it outperforms the original instruct model that was released by Mistral. And as I said, to get good synthetic data, you really [00:14:00] have to have a framework to make sure that your data is diverse.[00:14:03] Speaker: So for example, for them, they always. And then they see the generations on either source code or raw text documents, and then they rewrite them to make sure they're easier to generate instructions from, and then they use that for their like instruction data generation. There's also the Tool3SFT mixture, which was released recently by Allen AI.[00:14:23] Speaker: It's also really good quality and it covers a wide range of tasks. And the way they make sure that this dataset is diverse is by using personas from the persona hub datasets. Which is basically a data set of like I think over a million personas. And for example, in the tool mixture to generate like a new code snippet, they would give like the model persona, for example, a machine learning researcher interested in neural networks, and then ask it to generate like a coding problem.[00:14:49] Speaker: This way you make sure that your data set is really diverse, and then you can further filter the data sets, for example, using the reward models. We also released a dataset called Smalltalk, [00:15:00] and we also tried to cover the wide range of tasks, and as you can see here, for example, when fine tuning Mistral 7b on the dataset, we also outperformed the original Mistral instructs on a number of benchmarks, notably on mathematics and instruction following with ifevil.[00:15:18] Speaker: Another paper that's really interesting I wanted to mention is this one called Multilingual Data Arbitrage by Cohere. And basically they want to generate a data set for post training that is multilingual. And they have a really interesting problem. It's the fact that there isn't like one model that's really good at all the languages they wanted.[00:15:36] Speaker: So what they do is that like they use not just one teacher model, but multiple teachers. And then they have a router which basically sends the prompts they have to all these models. And then they get the completions and they have a reward model that traces all these generations and only keeps the best one.[00:15:52] Speaker: And this is like arbitrage and finance. So well, I think what's interesting in this, it shows that like synthetic data, it doesn't have to come from a single model. [00:16:00] And because we have so many good models now, you could like pull these models together and get like a dataset that's really high quality and that's diverse and that's covers all your needs.[00:16:12] Speaker: I was supposed to put a meme there, but. Yeah, so that was it for like a synthetic data.[00:16:17] Smol Models[00:16:17] Speaker: Now we can go to see what's happening in the small models field in 2024. I don't know if you know, but like now we have some really good small models. For example, Lama 3. 2 1B is. It matches Lama 2. 13b from, that was released last year on the LMSYS arena, which is basically the default go to leaderboard for evaluating models using human evaluation.[00:16:39] Speaker: And as you can see here, the scores of the models are really close. So I think we've made like hugely forward in terms of small models. Of course, that's one, just one data point, but there's more. For example, if you look at this chart from the Quint 2. 5 blog post, it shows that today we have some really good models that are only like 3 billion parameters [00:17:00] and 4 billion that score really high on MMLU.[00:17:03] Speaker: Which is a really popular benchmark for evaluating models. And you can see here that the red, the blue dots have more than 65 on MMLU. And the grey ones have less. And for example, Llama33b had less. So now we have a 3b model that outperforms a 33b model that was released earlier. So I think now people are starting to realize that like, we shouldn't just scale and scale models, but we should try to make them more efficient.[00:17:33] Speaker: I don't know if you knew, but you can also chat with a 3B plus model on your iPhone. For example, here, this is an app called PocketPal, where you can go and select a model from Hugging Face. It has a large choice. For example, here we loaded the 5. 3. 5, which is 3. 8 billion parameters on this iPhone. And we can chat with this and you can see that even the latency is also acceptable.[00:17:57] Speaker: For example, here, I asked it to give me a joke about [00:18:00] NeurIPS. So let's see what it has to say.[00:18:06] Speaker: Okay, why did the neural network attend NeurIPS? Because it heard there would be a lot of layers and fun and it wanted to train its sense of humor. So not very funny, but at least it can run on device. Yeah, so I think now we have good small models, but we also have like good frameworks and tools to use these small models.[00:18:24] On Device Models[00:18:24] Speaker: So I think we're really close to having like really on edge and on device models that are really good. And I think for a while we've had this narrative. But just training larger models is better. Of course, this is supported by science scaling laws. As you can see here, for example, when we scale the model size, the loss is lower and obviously you get a better model.[00:18:46] Speaker: But and we can see this, for example, in the GPT family of models, how we went from just a hundred million parameters to more than a trillion. parameters. And of course, we all observed the performance improvement when using the latest model. But [00:19:00] one thing that we shouldn't forget is that when we scale the model, we also scale the inference costs and time.[00:19:05] Speaker: And so the largest models were are going to cost so much more. So I think now instead of just building larger models, we should be focusing on building more efficient models. It's no longer a race for the largest models since these models are really expensive to run and they require like a really good infrastructure to do that and they cannot run on, for example, consumer hardware.[00:19:27] Speaker: And when you try to build more efficient models that match larger models, that's when you can really unlock some really interesting on device use cases. And I think a trend that we're noticing now is the trend of training smaller models longer. For example, if you compare how much, how long LLAMA was trained compared to LLAMA3, there is a huge increase in the pre training length.[00:19:50] Speaker: LLAMA was trained on 1 trillion tokens, but LLAMA3 8b was trained on 15 trillion tokens. So Meta managed to get a model that's the same size, but But it performs so much [00:20:00] better by choosing to like spend the sacrifice during training, because as we know, training is a one time cost, but inference is something that's ongoing.[00:20:08] Speaker: If we want to see what are like the small models reads in 2024, I think this mobile LLM paper by Meta is interesting. They try to study different models that are like have the less than 1 billion parameters and find which architecture makes most sense for these models. For example, they find that depth is more important than width.[00:20:29] Speaker: So it's more important to have models that have like more layers than just one. making them more wide. They also find that GQA helps, that tying the embedding helps. So I think it's a nice study overall for models that are just a few hundred million parameters. There's also the Apple intelligence tech report, which is interesting.[00:20:48] Speaker: So for Apple intelligence, they had two models, one that was like on server and another model that was on device. It had 3 billion parameters. And I think the interesting part is that they trained this model using [00:21:00] pruning. And then distillation. And for example, they have this table where they show that, like, using pruning and distillation works much better than training from scratch.[00:21:08] Speaker: And they also have some interesting insights about, like, how they specialize their models on specific tasks, like, for example, summarization and rewriting. There's also this paper by NVIDIA that was released recently. I think you've already had a talk about, like, hybrid models that was all interesting.[00:21:23] Speaker: And this model, they used, like, a hybrid architecture between state space models and transformers. And they managed to train a 1B model that's really performant without needing to train it on a lot of tokens. And regarding our work, we just recently released SmallM2, so it's a series of three models, which are the best in class in each model size.[00:21:46] Speaker: For example, our 1. 7b model outperforms Lama 1b and also Qt 2. 5. And how we managed to train this model is the following. That's where you spent a lot of time trying to curate the pre training datasets. We did a lot of [00:22:00] ablations, trying to find which datasets are good and also how to mix them. We also created some new math and code datasets that we're releasing soon.[00:22:08] Speaker: But you basically really spent a lot of time trying to find what's the best mixture that you can train these models on. And then we spent some time trying to like we also trained these models for very long. For example, small M1 was trained only on 1 trillion tokens, but this model is trained on 11 trillion tokens.[00:22:24] Speaker: And we saw that the performance kept improving. The models didn't really plateau mid training, which I think is really interesting. It shows that you can train such small models for very long and keep getting performance gains. What's interesting about SmallLM2 is that it's fully open. We also released, like the pre training code base, the fine tuning code, the datasets, and also evaluation in this repository.[00:22:45] Smol Vision Models[00:22:45] Speaker: Also there's, like, really interesting small models for text, but also for vision. For example, here you can see SmallVLM, which is a 2B model that's really efficient. It doesn't consume a lot of RAM, and it also has a good performance. There's also Moondream 0. [00:23:00] 5b, which was released recently. It's like the smallest visual language model.[00:23:04] Speaker: And as you can see, there isn't like a big trade off compared to Moondream 2b. So now I showed you that we have some really good small models. We also have the tools to use them, but why should you consider using small models and when? I think, like, small models are really interesting because of the on device feature.[00:23:23] Speaker: Because these models are small and they can run fast, you can basically run them on your laptop, but also on your mobile phone. And this means that your dataset stays locally. You don't have to send your queries to third parties. And this really enhances privacy. That was, for example, one of the big selling points for Apple Intelligence.[00:23:42] Speaker: Also, right now, we really have a lot of work to do. So many frameworks to do on device inference. For example, there's MLX, MLC, Llama, CPP, Transformers, JS. So we have a lot of options and each of them have like great features. So you have so many options for doing that. Small models are also really powerful if you choose to specialize them.[00:24:00][00:24:00] Speaker: For example, here there's a startup called Numind, which took small LM and then they fine tuned it on text extraction datasets. And they managed to get a model that's not very far from models that are much larger. So I think text extraction is like one use case where small models can be really performant and it makes sense to use them instead of just using larger models.[00:24:19] Speaker: You can also chat with these models in browser. For example, here, you can go there, you can load the model, you can even turn off your internet and just start chatting with the model locally. Speaking of text extraction, if you don't want to fine tune the models, there's a really good method of structure generation.[00:24:36] Speaker: We can basically force the models to follow a JSON schema that you defined. For example, here, we try to force the model to follow a schema for extracting key information from GitHub issues. So you can input free text, which is a complaint about a GitHub repository, something not working. And then you can run it there and the model can extract anything that is relevant for your GitHub issue creation.[00:24:58] Speaker: For example, the [00:25:00] priority, for example, here, priority is high, the type of the issue bug, and then a title and the estimation of how long this will take to fix. And you can just like do this in the browser, you can transform your text into a GitHub issue that's properly formatted.[00:25:14] What's Next[00:25:14] Speaker: So what's next for synthetic data and small models?[00:25:18] Speaker: I think that domain specific synthetic data is going to be, it's already important, it's going to be even more important. For example, generating synthetic data for math. I think this really would help improve the reasoning of a lot of models. And a lot of people are doing it, for example, Quint 2. 12 math, everyone's trying to reproduce a one.[00:25:37] Speaker: And so I think for synthetic data, trying to specialize it on some domains is going to be really important. And then for small models, I think specializing them through fine tuning, it's also going to be really important because I think a lot of companies are just trying to use these large models because they are better.[00:25:53] Speaker: But on some tasks, I think you can already get decent performance with small models. So you don't need to Pay like a [00:26:00] cost that's much larger just to make your model better at your task by a few percent. And this is not just for text. And I think it also applies for other modalities like vision and audio.[00:26:11] Speaker: And I think you should also watch out for on device frameworks and applications. For example, like the app I showed, or lama, all these frameworks are becoming really popular and I'm pretty sure that we're gonna get like more of them in 2025. And users really like that. Maybe for other, I should also say hot take.[00:26:28] Speaker: I think that like in AI, we just started like with fine tuning, for example, trying to make BERT work on some specific use cases, and really struggling to do that. And then we had some models that are much larger. So we just switched to like prompt engineering to get the models And I think we're going back to fine tuning where we realize these models are really costly.[00:26:47] Speaker: It's better to use just a small model or try to specialize it. So I think it's a little bit of a cycle and we're going to start to see like more fine tuning and less of just like a prompt engineering the models. So that was my talk. Thank you for following. And if you have [00:27:00] any questions, we can take them now. Get full access to Latent Space at www.latent.space/subscribe

How to Manage Your Energy Without the Mom Guilt with Jessica Lynn Rojas

Viva la Mami

Play Episode Listen Later Dec 12, 2024 64:01 Transcription Available

In this episode, we welcome Jessica Rojas, a certified health and wellness coach, work-life balance expert, and founder of Jessica Lynn Wellness. She shares her insights and strategies on maintaining balance and managing energy effectively. We dive into the myths of the 50/50 balance and explore practical tips for prioritizing self-care, setting realistic goals, and breaking free from societal and cultural expectations. Jessica addresses the cultural conditioning many Latinas face, where being constantly busy is seen as a badge of honor, and helps us reframe these limiting beliefs.This episode is especially valuable for Latina moms who are struggling to balance their various roles and responsibilities while maintaining their sense of self. Whether you're a working mom, stay-at-home mom, or preparing for motherhood, this conversation offers valuable insights for creating a more balanced, fulfilling life that honors both your cultural heritage and personal well-being.For detailed show notes, visit vivalamami.com/episode104Key Topics Covered:Why the popular "50/50 work-life balance" concept is a myth and what to focus on insteadHow to break free from cultural conditioning that makes us feel guilty for prioritizing restThe game-changing shift from time management to energy managementPractical strategies for managing your energy as a busy momHow to identify and overcome the "mind drama" that keeps us exhaustedWays to build your own support system when you don't have family nearbyConnect with Jessica Rojas!Instagram: @jessicalynnwellnessWebsite: jessicalynn-wellness.comLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

apple latinas rojas mom guilt mami manage your energy jessica lynn vl m

103. How to Honor Your Cultural Foods While Meeting Your Health Goals with Mariana Dineen

Viva la Mami

Play Episode Listen Later Dec 5, 2024 62:02 Transcription Available

In this episode, we welcome Mariana Dineen, a registered dietitian and founder of Elemento Health. She shares the importance of nutrition within the Latino community, especially for mothers striving to adopt a healthier lifestyle while preserving cultural traditions. Mariana emphasizes the significance of cultural relevance in nutrition counseling, overcoming language barriers, and the unique challenges faced by the Latino community in accessing healthcare.She offers practical strategies for meal planning and managing chronic diseases like diabetes through a cultural lens, stressing the importance of balance and inclusion rather than the elimination of cultural foods. We also touched on the impact of stress on eating habits, particularly for busy Latina moms, and how to address these issues holistically.Mariana's commitment to cultural sensitivity in her practice, Elemento Health, underscores her dedication to providing accessible and empathetic nutrition care. If you've ever felt shame about your cultural foods or struggled to find a healthcare provider who truly gets you, this episode is for you.For detailed show notes, visit vivalamami.com/episode103Key topics covered:Breaking down barriers to accessing nutrition care for LatinasMaking traditional dishes healthier while preserving cultural rootsManaging stress eating and emotional relationships with foodPractical meal planning strategies for busy mamasCulturally sensitive approaches to managing conditions like diabetesConnect with Mariana from Elemento Health!Email: mariana@elementohealth.comInstagram: @elemento_healthWebsite: elementohealth.comLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

apple cultural latinas foods latino mami health goals dineen vl m

102. [Replay Episode] How to Practice Self-Gratitude as Moms

Viva la Mami

Play Episode Listen Later Nov 28, 2024 21:56 Transcription Available

In this replay episode and Thanksgiving edition, I discuss the significance of expressing self-gratitude, especially for moms who often neglect to acknowledge their own efforts while caring for their families. Join me as I share ways on how you can practice self-gratitude.These practices are not only beneficial to your own mental and emotional well-being but can also serve as an inspiring model of self-appreciation for your children.For detailed show notes, visit vivalamami.com/episode102In this episode, you'll hear:The importance of self-gratitudeHow to express self-gratitude this ThanksgivingFive ways to practice self-gratitudeIn what ways do you express self-gratitude?Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

love thanksgiving apple practice gratitude moms mami vl m

Why Compound AI + Open Source will beat Closed AI

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Nov 25, 2024 58:25

We have a full slate of upcoming events: AI Engineer London, AWS Re:Invent in Las Vegas, and now Latent Space LIVE! at NeurIPS in Vancouver and online. Sign up to join and speak!We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!We try to stay close to the inference providers as part of our coverage, as our podcasts with Together AI and Replicate will attest: However one of the most notable pull quotes from our very well received Braintrust episode was his opinion that open source model adoption has NOT gone very well and is actually declining in relative market share terms (it is of course increasing in absolute terms):Today's guest, Lin Qiao, would wholly disagree. Her team of Pytorch/GPU experts are wholly dedicated toward helping you serve and finetune the full stack of open source models from Meta and others, across all modalities (Text, Audio, Image, Embedding, Vision-understanding), helping customers like Cursor and Hubspot scale up open source model inference both rapidly and affordably.Fireworks has emerged after its successive funding rounds with top tier VCs as one of the leaders of the Compound AI movement, a term first coined by the Databricks/Mosaic gang at Berkeley AI and adapted as “Composite AI” by Gartner:Replicating o1We are the first podcast to discuss Fireworks' f1, their proprietary replication of OpenAI's o1. This has become a surprisingly hot area of competition in the past week as both Nous Forge and Deepseek r1 have launched competitive models.Full Video PodcastLike and subscribe!Timestamps* 00:00:00 Introductions* 00:02:08 Pre-history of Fireworks and PyTorch at Meta* 00:09:49 Product Strategy: From Framework to Model Library* 00:13:01 Compound AI Concept and Industry Dynamics* 00:20:07 Fireworks' Distributed Inference Engine* 00:22:58 OSS Model Support and Competitive Strategy* 00:29:46 Declarative System Approach in AI* 00:31:00 Can OSS replicate o1?* 00:36:51 Fireworks f1* 00:41:03 Collaboration with Cursor and Speculative Decoding* 00:46:44 Fireworks quantization (and drama around it)* 00:49:38 Pricing Strategy* 00:51:51 Underrated Features of Fireworks Platform* 00:55:17 HiringTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner at CTO at Danceable Partners, and I'm joined by my co-host, Swyx founder, Osmalayar.Swyx [00:00:11]: Hey, and today we're in a very special studio inside the Fireworks office with Lin Qiang, CEO of Fireworks. Welcome. Yeah.Lin [00:00:20]: Oh, you should welcome us.Swyx [00:00:21]: Yeah, welcome. Yeah, thanks for having us. It's unusual to be in the home of a startup, but it's also, I think our relationship is a bit unusual compared to all our normal guests. Definitely.Lin [00:00:34]: Yeah. I'm super excited to talk about very interesting topics in that space with both of you.Swyx [00:00:41]: You just celebrated your two-year anniversary yesterday.Lin [00:00:43]: Yeah, it's quite a crazy journey. We circle around and share all the crazy stories across these two years, and it has been super fun. All the way from we experienced Silicon Valley bank run to we delete some data that shouldn't be deleted operationally. We went through a massive scale where we actually are busy getting capacity to, yeah, we learned to kind of work with it as a team with a lot of brilliant people across different places to join a company. It has really been a fun journey.Alessio [00:01:24]: When you started, did you think the technical stuff will be harder or the bank run and then the people side? I think there's a lot of amazing researchers that want to do companies and it's like the hardest thing is going to be building the product and then you have all these different other things. So, were you surprised by what has been your experience the most?Lin [00:01:42]: Yeah, to be honest with you, my focus has always been on the product side and then after the product goes to market. And I didn't realize the rest has been so complicated, operating a company and so on. But because I don't think about it, I just kind of manage it. So it's done. I think I just somehow don't think about it too much and solve whatever problem coming our way and it worked.Swyx [00:02:08]: So let's, I guess, let's start at the pre-history, the initial history of Fireworks. You ran the PyTorch team at Meta for a number of years and we previously had Sumit Chintal on and I think we were just all very interested in the history of GenEI. Maybe not that many people know how deeply involved Faire and Meta were prior to the current GenEI revolution.Lin [00:02:35]: My background is deep in distributed system, database management system. And I joined Meta from the data side and I saw this tremendous amount of data growth, which cost a lot of money and we're analyzing what's going on. And it's clear that AI is driving all this data generation. So it's a very interesting time because when I joined Meta, Meta is going through ramping down mobile-first, finishing the mobile-first transition and then starting AI-first. And there's a fundamental reason about that sequence because mobile-first gave a full range of user engagement that has never existed before. And all this user engagement generated a lot of data and this data power AI. So then the whole entire industry is also going through, falling through this same transition. When I see, oh, okay, this AI is powering all this data generation and look at where's our AI stack. There's no software, there's no hardware, there's no people, there's no team. I want to dive up there and help this movement. So when I started, it's very interesting industry landscape. There are a lot of AI frameworks. It's a kind of proliferation of AI frameworks happening in the industry. But all the AI frameworks focus on production and they use a very certain way of defining the graph of neural network and then use that to drive the model iteration and productionization. And PyTorch is completely different. So they could also assume that he was the user of his product. And he basically says, researchers face so much pain using existing AI frameworks, this is really hard to use and I'm going to do something different for myself. And that's the origin story of PyTorch. PyTorch actually started as the framework for researchers. They don't care about production at all. And as they grow in terms of adoption, so the interesting part of AI is research is the top of our normal production. There are so many researchers across academic, across industry, they innovate and they put their results out there in open source and that power the downstream productionization. So it's brilliant for MATA to establish PyTorch as a strategy to drive massive adoption in open source because MATA internally is a PyTorch shop. So it creates a flying wheel effect. So that's kind of a strategy behind PyTorch. But when I took on PyTorch, it's kind of at Caspo, MATA established PyTorch as the framework for both research and production. So no one has done that before. And we have to kind of rethink how to architect PyTorch so we can really sustain production workload, the stability, reliability, low latency, all this production concern was never a concern before. Now it's a concern. And we actually have to adjust its design and make it work for both sides. And that took us five years because MATA has so many AI use cases, all the way from ranking recommendation as powering the business top line or as ranking newsfeed, video ranking to site integrity detect bad content automatically using AI to all kinds of effects, translation, image classification, object detection, all this. And also across AI running on the server side, on mobile phones, on AI VR devices, the wide spectrum. So by the time we actually basically managed to support AI across ubiquitous everywhere across MATA. But interestingly, through open source engagement, we work with a lot of companies. It is clear to us like this industry is starting to take on AI first transition. And of course, MATA's hyperscale always go ahead of industry. And it feels like when we start this AI journey at MATA, there's no software, no hardware, no team. For many companies we engage with through PyTorch, we feel the pain. That's the genesis why we feel like, hey, if we create fireworks and support industry going through this transition, it will be a huge amount of impact. Of course, the problem that the industry is facing will not be the same as MATA. MATA is so big, right? So it's kind of skewed towards extreme scale and extreme optimization in the industry will be different. But we feel like we have the technical chop and we've seen a lot. We'll look to kind of drive that. So yeah, so that's how we started.Swyx [00:06:58]: When you and I chatted about the origins of fireworks, it was originally envisioned more as a PyTorch platform, and then later became much more focused on generative AI. Is that fair to say? What was the customer discovery here?Lin [00:07:13]: Right. So I would say our initial blueprint is we should build a PyTorch cloud because a PyTorch library and there's no SaaS platform to enable AI workloads.Swyx [00:07:26]: Even in 2022, it's interesting.Lin [00:07:28]: I would not say absolutely no, but cloud providers have some of those, but it's not first class citizen, right? At 2022, there's still like TensorFlow is massively in production. And this is all pre-gen AI, and PyTorch is kind of getting more and more adoption. But there's no PyTorch-first SaaS platform existing. At the same time, we are also a very pragmatic set of people. We really want to make sure from the get-go, we get really, really close to customers. We understand their use case, we understand their pain points, we understand the value we deliver to them. So we want to take a different approach instead of building a horizontal PyTorch cloud. We want to build a verticalized platform first. And then we talk with many customers. And interestingly, we started the company in September 2022, and in October, November, the OpenAI announced ChatGPT. And then boom, when we talked with many customers, they were like, can you help us work on the JNS aspect? So of course, there are some open source models. It's not as good at that time, but people are already putting a lot of attention there. Then we decided that if we're going to pick a vertical, we're going to pick JNI. The other reason is all JNI models are PyTorch models. So that's another reason. We believe that because of the nature of JNI, it's going to generate a lot of human consumable content. It will drive a lot of consumer, customer-developer-facing application and product innovation. Guaranteed. We're just at the beginning of this. Our prediction is for those kind of applications, the inference is much more important than training because inference scale is proportional to the up-limit award population. And training scale is proportional to the number of researchers. Of course, each training round could be very expensive. Although PyTorch supports both inference and training, we decided to laser focus on inference. So yeah, so that's how we got started. And we launched our public platform August last year. When we launched, it was a single product. It's a distributed inference engine with a simple API, open AI compatible API with many models. We started with LM and then we added a lot of models. Fast forward to now, we are a full platform with multiple product lines. So we love to kind of dive deep into what we offer. But that's a very fun journey in the past two years.Alessio [00:09:49]: What was the transition from you start to focus on PyTorch and people want to understand the framework, get it live. And now say maybe most people that use you don't even really know much about PyTorch at all. You know, they're just trying to consume a model. From a product perspective, like what were some of the decisions early on? Like right in October, November, you were just like, hey, most people just care about the model, not about the framework. We're going to make it super easy or was it more a gradual transition to the model librarySwyx [00:10:16]: you have today?Lin [00:10:17]: Yeah. So our product decision is all based on who is our ICP. And one thing I want to acknowledge here is the generic technology is disruptive. It's very different from AI before GNI. So it's a clear leap forward. Because before GNI, the companies that want to invest in AI, they have to train from scratch. There's no other way. There's no foundation model. It doesn't exist. So that means then to start a team, first hire a team who is capable of crunch data. There's a lot of data to crunch, right? Because training from scratch, you have to prepare a lot of data. And then they need to have GPUs to train, and then you start to manage GPUs. So then it becomes a very complex project. It takes a long time and not many companies can afford it, actually. And the GNI is a very different game right now, because it is a foundation model. So you don't have to train anymore. That makes AI much more accessible as a technology. As an app developer or product manager, even, not a developer, they can interact with GNI models directly. So our goal is to make AI accessible to all app developers and product engineers. That's our goal. So then getting them into the building model doesn't make any sense anymore with this new technology. And then building easy, accessible APIs is the most important. Early on, when we got started, we decided we're going to be open AI compatible. It's just kind of very easy for developers to adopt this new technology, and we will manage the underlying complexity of serving all these models.Swyx [00:11:56]: Yeah, open AI has become the standard. Even as we're recording today, Gemini announced that they have open AI compatible APIs. Interesting. So we just need to drop it all in line, and then we have everyone popping in line.Lin [00:12:09]: That's interesting, because we are working very closely with Meta as one of the partners. Meta, of course, is kind of very generous to donate many very, very strong open source models, expecting more to come. But also they have announced LamaStack, which is basically standardized, the upper level stack built on top of Lama models. So they don't just want to give out models and you figure out what the upper stack is. They instead want to build a community around the stack and build a new standard. I think there's an interesting dynamics in play in the industry right now, when it's more standardized across open AI, because they are kind of creating the top of the funnel, or standardized across Lama, because this is the most used open source model. So I think it's a lot of fun working at this time.Swyx [00:13:01]: I've been a little bit more doubtful on LamaStack, I think you've been more positive. Basically it's just like the meta version of whatever Hugging Face offers, you know, or TensorRT, or BLM, or whatever the open source opportunity is. But to me, it's not clear that just because Meta open sources Lama, that the rest of LamaStack will be adopted. And it's not clear why I should adopt it. So I don't know if you agree.Lin [00:13:27]: It's very early right now. That's why I kind of work very closely with them and give them feedback. The feedback to the meta team is very important. So then they can use that to continue to improve the model and also improve the higher level I think the success of LamaStack heavily depends on the community adoption. And there's no way around it. And I know the meta team would like to kind of work with a broader set of community. But it's very early.Swyx [00:13:52]: One thing that after your Series B, so you raced for Benchmark, and then Sequoia. I remember being close to you for at least your Series B announcements, you started betting heavily on this term of Compound AI. It's not a term that we've covered very much in the podcast, but I think it's definitely getting a lot of adoption from Databricks and Berkeley people and all that. What's your take on Compound AI? Why is it resonating with people?Lin [00:14:16]: Right. So let me give a little bit of context why we even consider that space.Swyx [00:14:22]: Because like pre-Series B, there was no message, and now it's like on your landing page.Lin [00:14:27]: So it's kind of very organic evolution from when we first launched our public platform, we are a single product. We are a distributed inference engine, where we do a lot of innovation, customized KUDA kernels, raw kernel kernels, running on different kinds of hardware, and build distributed disaggregated execution, inference execution, build all kinds of caching. So that is one. So that's kind of one product line, is the fast, most cost-efficient inference platform. Because we wrote PyTorch code, we know we basically have a special PyTorch build for that, together with a custom kernel we wrote. And then we worked with many more customers, we realized, oh, the distributed inference engine, our design is one size fits all. We want to have this inference endpoint, then everyone come in, and no matter what kind of form and shape or workload they have, it will just work for them. So that's great. But the reality is, we realized all customers have different kinds of use cases. The use cases come in all different forms and shapes. And the end result is the data distribution in their inference workload doesn't align with the data distribution in the training data for the model. It's a given, actually. If you think about it, because researchers have to guesstimate what is important, what's not important in preparing data for training. So because of that misalignment, then we leave a lot of quality, latency, cost improvement on the table. So then we're saying, OK, we want to heavily invest in a customization engine. And we actually announced it called FHIR Optimizer. So FHIR Optimizer basically helps users navigate a three-dimensional optimization space across quality, latency, and cost. So it's a three-dimensional curve. And even for one company, for different use cases, they want to land in different spots. So we automate that process for our customers. It's very simple. You have your inference workload. You inject into the optimizer along with the objective function. And then we spit out inference deployment config and the model setup. So it's your customized setup. So that is a completely different product. So that product thinking is one size fits all. And now on top of that, we provide a huge variety of state-of-the-art models, hundreds of them, varying from text to large state-of-the-art English models. That's where we started. And as we talk with many customers, we realize, oh, audio and text are very, very close. Many of our customers start to build assistants, all kinds of assistants using text. And they immediately want to add audio, audio in, audio out. So we support transcription, translation, speech synthesis, text, audio alignment, all different kinds of audio features. It's a big announcement. You should have heard by the time this is out. And the other areas of vision and text are very close with each other. Because a lot of information doesn't live in plain text. A lot of information lives in multimedia format, images, PDFs, screenshots, and many other different formats. So oftentimes to solve a problem, we need to put the vision model first to extract information and then use language model to process and then send out results. So vision is important. We also support vision model, various different kinds of vision models specialized in processing different kinds of source and extraction. And we're also going to have another announcement of a new API endpoint we'll support for people to upload various different kinds of multimedia content and then get the extract very accurate information out and feed that into LM. And of course, we support embedding because embedding is very important for semantic search, for RAG, and all this. And in addition to that, we also support text-to-image, image generation models, text-to-image, image-to-image, and we're adding text-to-video as well in our portfolio. So it's a very comprehensive set of model catalog that built on top of File Optimizer and Distributed Inference Engine. But then we talk with more customers, they solve business use case, and then we realize one model is not sufficient to solve their problem. And it's very clear because one is the model hallucinates. Many customers, when they onboard this JNI journey, they thought this is magical. JNI is going to solve all my problems magically. But then they realize, oh, this model hallucinates. It hallucinates because it's not deterministic, it's probabilistic. So it's designed to always give you an answer, but based on probabilities, so it hallucinates. And that's actually sometimes a feature for creative writing, for example. Sometimes it's a bug because, hey, you don't want to give misinformation. And different models also have different specialties. To solve a problem, you want to ask different special models to kind of decompose your task into multiple small tasks, narrow tasks, and then have an expert model solve that task really well. And of course, the model doesn't have all the information. It has limited knowledge because the training data is finite, not infinite. So the model oftentimes doesn't have real-time information. It doesn't know any proprietary information within the enterprise. It's clear that in order to really build a compiling application on top of JNI, we need a compound AI system. Compound AI system basically is going to have multiple models across modalities, along with APIs, whether it's public APIs, internal proprietary APIs, storage systems, database systems, knowledge to work together to deliver the best answer.Swyx [00:20:07]: Are you going to offer a vector database?Lin [00:20:09]: We actually heavily partner with several big vector database providers. Which is your favorite? They are all great in different ways. But it's public information, like MongoDB is our investor. And we have been working closely with them for a while.Alessio [00:20:26]: When you say distributed inference engine, what do you mean exactly? Because when I hear your explanation, it's almost like you're centralizing a lot of the decisions through the Fireworks platform on the quality and whatnot. What do you mean distributed? It's like you have GPUs in a lot of different clusters, so you're sharding the inference across the same model.Lin [00:20:45]: So first of all, we run across multiple GPUs. But the way we distribute across multiple GPUs is unique. We don't distribute the whole model monolithically across multiple GPUs. We chop them into pieces and scale them completely differently based on what's the bottleneck. We also are distributed across regions. We have been running in North America, EMEA, and Asia. We have regional affinity to applications because latency is extremely important. We are also doing global load balancing because a lot of applications there, they quickly scale to global population. And then at that scale, different content wakes up at a different time. And you want to kind of load balancing across. So all the way, and we also have, we manage various different kinds of hardware skew from different hardware vendors. And different hardware design is best for different types of workload, whether it's long context, short context, long generation. So all these different types of workload is best fitted for different kinds of hardware skew. And then we can even distribute across different hardware for a workload. So the distribution actually is all around in the full stack.Swyx [00:22:02]: At some point, we'll show on the YouTube, the image that Ray, I think, has been working on with all the different modalities that you offer. To me, it's basically you offer the open source version of everything that OpenAI typically offers. I don't think there is. Actually, if you do text to video, you will be a superset of what OpenAI offers because they don't have Sora. Is that Mochi, by the way? Mochi. Mochi, right?Lin [00:22:27]: Mochi. And there are a few others. I will say, the interesting thing is, I think we're betting on the open source community is going to proliferate. This is literally what we're seeing. And there's amazing video generation companies. There is amazing audio companies. Like cross-border, the innovation is off the chart, and we are building on top of that. I think that's the advantage we have compared with a closed source company.Swyx [00:22:58]: I think I want to restate the value proposition of Fireworks for people who are comparing you versus a raw GPU provider like a RunPod or Lambda or anything like those, which is like you create the developer experience layer and you also make it easily scalable or serverless or as an endpoint. And then, I think for some models, you have custom kernels, but not all models.Lin [00:23:25]: Almost for all models. For all large language models, all your models, and the VRMs. Almost for all models we serve.Swyx [00:23:35]: And so that is called Fire Attention. I don't remember the speed numbers, but apparently much better than VLM, especially on a concurrency basis.Lin [00:23:44]: So Fire Attention is specific mostly for language models, but for other modalities, we'll also have a customized kernel.Swyx [00:23:51]: And I think the typical challenge for people is understanding that has value, and then there are other people who are also offering open-source models. Your mode is your ability to offer a good experience for all these customers. But if your existence is entirely reliant on people releasing nice open-source models, other people can also do the same thing.Lin [00:24:14]: So I would say we build on top of open-source model foundation. So that's the kind of foundation we build on top of. But we look at the value prop from the lens of application developers and product engineers. So they want to create new UX. So what's happening in the industry right now is people are thinking about a completely new way of designing products. And I'm talking to so many founders, it's just mind-blowing. They help me understand existing way of doing PowerPoint, existing way of coding, existing way of managing customer service. It's actually putting a box in our head. For example, PowerPoint. So PowerPoint generation is we always need to think about how to fit into my storytelling into this format of slide one after another. And I'm going to juggle through design together with what story to tell. But the most important thing is what's our storytelling lines, right? And why don't we create a space that is not limited to any format? And those kind of new product UX design combined with automated content generation through Gen AI is the new thing that many founders are doing. What are the challenges they're facing? Let's go from there. One is, again, because a lot of products built on top of Gen AI, they are consumer-personal developer facing, and they require interactive experience. It's just a kind of product experience we all get used to. And our desire is to actually get faster and faster interaction. Otherwise, nobody wants to spend time, right? And then that requires low latency. And the other thing is the nature of consumer-personal developer facing is your audience is very big. You want to scale up to product market fit quickly. But if you lose money at a small scale, you're going to bankrupt quickly. So it's actually a big contrast. I actually have product market fit, but when I scale, I scale out of my business. So that's kind of a very funny way to think about it. So then having low latency and low cost is essential for those new applications and products to survive and really become a generation company. So that's the design point for our distributed inference engine and the file optimizer. File optimizer, you can think about that as a feedback loop. The more you feed your inference workload to our inference engine, the more we help you improve quality, lower latency further, lower your cost. It basically becomes better. And we automate that because we don't want you as an app developer or product engineer to think about how to figure out all these low-level details. It's impossible because you're not trained to do that at all. You should kind of keep your focus on the product innovation. And then the compound AI, we actually feel a lot of pain as the app developers, engineers, there are so many models. Every week, there's at least a new model coming out.Swyx [00:27:09]: Tencent had a giant model this week. Yeah, yeah.Lin [00:27:13]: I saw that. I saw that.Swyx [00:27:15]: It's like $500 billion.Lin [00:27:18]: So they're like, should I keep chasing this or should I forget about it? And which model should I pick to solve what kind of sub-problem? How do I even decompose my problem into those smaller problems and fit the model into it? I have no idea. And then there are two ways to think about this design. I think I talked about that in the past. One is imperative, as in you figure out how to do it. You give developer tools to dictate how to do it. Or you build a declarative system where a developer tells what they want to do, not how. So these are completely two different designs. So the analogy I want to draw is, in the data world, the database management system is a declarative system because people use database, use SQL. SQL is a way you say, what do you want to extract out of a database? What kind of result do you want? But you don't figure out which node is going to, how many nodes you're going to run on top of, how you redefine your disk, which index you use, which project. You don't need to worry about any of those. And database management system will figure out, generate a new best plan, and execute on that. So database is declarative. And it makes it super easy. You just learn SQL, which is learn a semantic meaning of SQL, and you can use it. Imperative side is there are a lot of ETL pipelines. And people design this DAG system with triggers, with actions, and you dictate exactly what to do. And if it fails, then how to recover. So that's an imperative system. We have seen a range of systems in the ecosystem go different ways. I think there's value of both. There's value of both. I don't think one is going to subsume the other. But we are leaning more into the philosophy of the declarative system. Because from the lens of app developer and product engineer, that would be easiest for them to integrate.Swyx [00:29:07]: I understand that's also why PyTorch won as well, right? This is one of the reasons. Ease of use.Lin [00:29:14]: Focus on ease of use, and then let the system take on the hard challenges and complexities. So we follow, we extend that thinking into current system design. So another announcement is we will also announce our next declarative system is going to appear as a model that has extremely high quality. And this model is inspired by Owen's announcement for OpenAI. You should see that by the time we announce this or soon.Alessio [00:29:46]: Trained by you.Lin [00:29:47]: Yes.Alessio [00:29:48]: Is this the first model that you trained? It's not the first.Lin [00:29:52]: We actually have trained a model called FireFunction. It's a function calling model. It's our first step into compound AI system. Because function calling model can dispatch a request into multiple APIs. We have pre-baked set of APIs the model learned. You can also add additional APIs through the configuration to let model dispatch accordingly. So we have a very high quality function calling model that's already released. We have actually three versions. The latest version is very high quality. But now we take a further step that you don't even need to use function calling model. You use our new model we're going to release. It will solve a lot of problems approaching very high OpenAI quality. So I'm very excited about that.Swyx [00:30:41]: Do you have any benchmarks yet?Lin [00:30:43]: We have a benchmark. We're going to release it hopefully next week. We just put our model to LMSYS and people are guessing. Is this the next Gemini model or a MADIS model? People are guessing. That's very interesting. We're watching the Reddit discussion right now.Swyx [00:31:00]: I have to ask more questions about this. When OpenAI released o1, a lot of people asked about whether or not it's a single model or whether it's a chain of models. Noam and basically everyone on the Strawberry team was very insistent that what they did for reinforcement learning, chain of thought, cannot be replicated by a whole bunch of open source model calls. Do you think that that is wrong? Have you done the same amount of work on RL as they have or was it a different direction?Lin [00:31:29]: I think they take a very specific approach where the caliber of team is very high. So I do think they are the domain expert in doing the things they are doing. I don't think there's only one way to achieve the same goal. We're on the same direction in the sense that the quality scaling law is shifting from training to inference. For that, I fully agree with them. But we're taking a completely different approach to the problem. All of that is because, of course, we didn't train the model from scratch. All of that is because we built on the show of giants. The current model available we have access to is getting better and better. The future trend is the gap between the open source model and the co-source model. It's just going to shrink to the point there's not much difference. And then we're on the same level field. That's why I think our early investment in inference and all the work we do around balancing across quality, latency, and cost pay off because we have accumulated a lot of experience and that empowers us to release this new model that is approaching open-ended quality.Alessio [00:32:39]: I guess the question is, what do you think the gap to catch up will be? Because I think everybody agrees with open source models eventually will catch up. And I think with 4, then with Lama 3.2, 3.1, 4.5b, we close the gap. And then 0.1 just reopened the gap so much and it's unclear. Obviously, you're saying your model will have...Swyx [00:32:57]: We're closing that gap.Alessio [00:32:58]: But you think in the future, it's going to be months?Lin [00:33:02]: So here's the thing that's happened. There's public benchmark. It is what it is. But in reality, open source models in certain dimensions are already on par or beat closed source models. So for example, in the coding space, open source models are really, really good. And in function calling, file function is also really, really good. So it's all a matter of whether you build one model to solve all the problems and you want to be the best of solving all the problems, or in the open source domain, it's going to specialize. All these different model builders specialize in certain narrow area. And it's logical that they can be really, really good in that very narrow area. And that's our prediction is with specialization, there will be a lot of expert models really, really good and even better than one-size-fits-all closed source models.Swyx [00:33:55]: I think this is the core debate that I am still not 100% either way on in terms of compound AI versus normal AI. Because you're basically fighting the bitter lesson.Lin [00:34:09]: Look at the human society, right? We specialize. And you feel really good about someone specializing doing something really well, right? And that's how our way evolved from ancient times. We're all journalists. We do everything. Now we heavily specialize in different domains. So my prediction is in the AI model space, it will happen also. Except for the bitter lesson.Swyx [00:34:30]: You get short-term gains by having specialists, domain specialists, and then someone just needs to train like a 10x bigger model on 10x more inference, 10x more data, 10x more model perhaps, whatever the current scaling law is. And then it supersedes all the individual models because of some generalized intelligence slash world knowledge. I think that is the core insight of the GPTs, the GPT-123 networks. Right.Lin [00:34:56]: But the training scaling law is because you have an increasing amount of data to train from. And you can do a lot of compute. So I think on the data side, we're approaching the limit. And the only data to increase that is synthetic generated data. And then there's like what is the secret sauce there, right? Because if you have a very good large model, you can generate very good synthetic data and then continue to improve quality. So that's why I think in OpenAI, they are shifting from the training scaling law intoSwyx [00:35:25]: inference scaling law.Lin [00:35:25]: And it's the test time and all this. So I definitely believe that's the future direction. And that's where we are really good at, doing inference.Swyx [00:35:34]: A couple of questions on that. Are you planning to share your reasoning choices?Lin [00:35:39]: That's a very good question. We are still debating.Swyx [00:35:43]: Yeah.Lin [00:35:45]: We're still debating.Swyx [00:35:46]: I would say, for example, it's interesting that, for example, SweetBench. If you want to be considered for ranking, you have to submit your reasoning choices. And that has actually disqualified some of our past guests. Cosign was doing well on SweetBench, but they didn't want to leak those results. So that's why you don't see O1 preview on SweetBench, because they don't submit their reasoning choices. And obviously, it's IP. But also, if you're going to be more open, then that's one way to be more open. So your model is not going to be open source, right? It's going to be an endpoint that you provide. Okay, cool. And then pricing, also the same as OpenAI, just kind of based on...Lin [00:36:25]: Yeah, this is... I don't have, actually, information. Everything is going so fast, we haven't even thought about that yet. Yeah, I should be more prepared.Swyx [00:36:33]: I mean, this is live. You know, it's nice to just talk about it as it goes live. Any other things that you want feedback on or you're thinking through? It's kind of nice to just talk about something when it's not decided yet. About this new model. It's going to be exciting. It's going to generate a lot of buzz. Right.Lin [00:36:51]: I'm very excited to see how people are going to use this model. So there's already a Reddit discussion about it. And people are asking very deep, mathematical questions. And since the model got it right, surprising. And internally, we're also asking the model to generate what is AGI. And it generates a very complicated DAG thinking process. So we're having a lot of fun testing this internally. But I'm more curious, how will people use it? What kind of application they're going to try and test on it? And that's where we really like to hear feedback from the community. And also feedback to us. What works out well? What doesn't work out well? What works out well, but surprising them? And what kind of thing they think we should improve on? And those kind of feedback will be tremendously helpful.Swyx [00:37:44]: Yeah. So I've been a production user of Preview and Mini since launch. I would say they're very, very obvious jobs in quality. So much so that they made clods on it. And they made the previous state-of-the-art look bad. It's really that stark, that difference. The number one thing, just feedback or feature requests, is people want control on the budget. Because right now, in 0.1, it kind of decides its own thinking budget. But sometimes you know how hard the problem is. And you want to actually tell the model, spend two minutes on this. Or spend some dollar amount. Maybe it's time you miss dollars. I don't know what the budget is. That makes a lot of sense.Lin [00:38:27]: So we actually thought about that requirement. And it should be, at some point, we need to support that. Not initially. But that makes a lot of sense.Swyx [00:38:38]: Okay. So that was a fascinating overview of just the things that you're working on. First of all, I realized that... I don't know if I've ever given you this feedback. But I think you guys are one of the reasons I agreed to advise you. Because I think when you first met me, I was kind of dubious. I was like... Who are you? There's Replicate. There's Together. There's Laptop. There's a whole bunch of other players. You're in very, very competitive fields. Like, why will you win? And the reason I actually changed my mind was I saw you guys shipping. I think your surface area is very big. The team is not that big. No. We're only 40 people. Yeah. And now here you are trying to compete with OpenAI and everyone else. What is the secret?Lin [00:39:21]: I think the team. The team is the secret.Swyx [00:39:23]: Oh boy. So there's no thing I can just copy. You just... No.Lin [00:39:30]: I think we all come from a very aligned culture. Because most of our team came from meta.Swyx [00:39:38]: Yeah.Lin [00:39:38]: And many startups. So we really believe in results. One is result. And second is customer. We're very customer obsessed. And we don't want to drive adoption for the sake of adoption. We really want to make sure we understand we are delivering a lot of business values to the customer. And we really value their feedback. So we would wake up midnight and deploy some model for them. Shuffle some capacity for them. And yeah, over the weekend, no brainer.Swyx [00:40:15]: So yeah.Lin [00:40:15]: So that's just how we work as a team. And the caliber of the team is really, really high as well. So as plug-in, we're hiring. We're expanding very, very fast. So if we are passionate about working on the most cutting-edge technology in the general space, come talk with us. Yeah.Swyx [00:40:38]: Let's talk a little bit about that customer journey. I think one of your more famous customers is Cursor. We were the first podcast to have Cursor on. And then obviously since then, they have blown up. Cause and effect are not related. But you guys especially worked on a fast supply model where you were one of the first people to work on speculative decoding in a production setting. Maybe just talk about what was the behind the scenes of working with Cursor?Lin [00:41:03]: I will say Cursor is a very, very unique team. I think the unique part is the team has very high technical caliber. There's no question about it. But they have decided, although many companies building coding co-pilot, they will say, I'm going to build a whole entire stack because I can. And they are unique in the sense they seek partnership. Not because they cannot. They're fully capable, but they know where to focus. That to me is amazing. And of course, they want to find a bypass partner. So we spent some time working together. They are pushing us very aggressively because for them to deliver high caliber product experience, they need the latency. They need the interactive, but also high quality at the same time. So actually, we expanded our product feature quite a lot as we support Cursor. And they are growing so fast. And we massively scaled quickly across multiple regions. And we developed a pretty high intense inference stack, almost like similar to what we do for Meta. I think that's a very, very interesting engagement. And through that, there's a lot of trust being built. They realize, hey, this is a team they can really partner with. And they can go big with. That comes back to, hey, we're really customer obsessed. And all the engineers working with them, there's just enormous amount of time syncing together with them and discussing. And we're not big on meetings, but we are like stack channel always on. Yeah, so you almost feel like working as one team. So I think that's really highlighted.Swyx [00:42:38]: Yeah. For those who don't know, so basically Cursor is a VS Code fork. But most of the time, people will be using closed models. Like I actually use a lot of SONET. So you're not involved there, right? It's not like you host SONET or you have any partnership with it. You're involved where Cursor is small, or like their house brand models are concerned, right?Lin [00:42:58]: I don't know what I can say, but the things they haven't said.Swyx [00:43:04]: Very obviously, the drop down is 4.0, but in Cursor, right? So I assume that the Cursor side is the Fireworks side. And then the other side, they're calling out the other. Just kind of curious. And then, do you see any more opportunity on the... You know, I think you made a big splash with 1,000 tokens per second. That was because of speculative decoding. Is there more to push there?Lin [00:43:25]: We push a lot. Actually, when I mentioned Fire Optimizer, right? So as in, we have a unique automation stack that is one size fits one. We actually deployed to Cursor earlier on. Basically optimized for their specific workload. And that's a lot of juice to extract out of there. And we see success in that product. It actually can be widely adopted. So that's why we started a separate product line called Fire Optimizer. So speculative decoding is just one approach. And speculative decoding here is not static. We actually wrote a blog post about it. There's so many different ways to do speculative decoding. You can pair a small model with a large model in the same model family. Or you can have equal pads and so on. There are different trade-offs which approach you take. It really depends on your workload. And then with your workload, we can align the Eagle heads or Medusa heads or a small big model pair much better to extract the best latency reduction. So all of that is part of the Fire Optimizer offering.Alessio [00:44:23]: I know you mentioned some of the other inference providers. I think the other question that people always have is around benchmarks. So you get different performance on different platforms. How should people think about... People are like, hey, Lama 3.2 is X on MMLU. But maybe using speculative decoding, you go down a different path. Maybe some providers run a quantized model. How should people think about how much they should care about how you're actually running the model? What's the delta between all the magic that you do and what a raw model...Lin [00:44:57]: Okay, so there are two big development cycles. One is experimentation, where they need fast iteration. They don't want to think about quality, and they just want to experiment with product experience and so on. So that's one. And then it looks good, and they want to post-product market with scaling. And the quality is really important. And latency and all the other things are becoming important. During the experimentation phase, it's just pick a good model. Don't worry about anything else. Make sure you even generate the right solution to your product. And that's the focus. And then post-product market fit, then that's kind of the three-dimensional optimization curve start to kick in across quality, latency, cost, where you should land. And to me, it's purely a product decision. To many products, if you choose a lower quality, but better speed and lower cost, but it doesn't make a difference to the product experience, then you should do it. So that's why I think inference is part of the validation. The validation doesn't stop at offline eval. The validation will go through A-B testing, through inference. And that's where we offer various different configurations for you to test which is the best setting. So this is the traditional product evaluation. So product evaluation should also include your new model versions and different model setup into the consideration.Swyx [00:46:22]: I want to specifically talk about what happens a few months ago with some of your major competitors. I mean, all of this is public. What is your take on what happens? And maybe you want to set the record straight on how Fireworks does quantization because I think a lot of people may have outdated perceptions or they didn't read the clarification post on your approach to quantization.Lin [00:46:44]: First of all, it's always a surprise to us that without any notice, we got called out.Swyx [00:46:51]: Specifically by name, which is normally not what...Lin [00:46:54]: Yeah, in a public post. And have certain interpretation of our quality. So I was really surprised. And it's not a good way to compete, right? We want to compete fairly. And oftentimes when one vendor gives out results, the interpretation of another vendor is always extremely biased. So we actually refrain ourselves to do any of those. And we happily partner with third parties to do the most fair evaluation. So we're very surprised. And we don't think that's a good way to figure out the competition landscape. So then we react. I think when it comes to quantization, the interpretation, we wrote actually a very thorough blog post. Because again, no one says it's all. We have various different quantization schemes. We can quantize very different parts of the model from ways to activation to cross-TPU communication. They can use different quantization schemes or consistent across the board. And again, it's a trade-off. It's a trade-off across this three-dimensional quality, latency, and cost. And for our customer, we actually let them find the best optimized point. And we have a very thorough evaluation process to pick that point. But for self-serve, there's only one point to pick. There's no customization available. So of course, it depends on what we talk with many customers. We have to pick one point. And I think the end result, like AA published, later on AA published a quality measure. And we actually looked really good. So that's why what I mean is, I will leave the evaluation of quality or performance to third party and work with them to find the most fair benchmark. And I think that's a good approach, a methodology. But I'm not a part of an approach of calling out specific namesSwyx [00:48:55]: and critique other competitors in a very biased way. Databases happens as well. I think you're the more politically correct one. And then Dima is the more... Something like this. It's you on Twitter.Lin [00:49:11]: It's like the Russian... We partner. We play different roles.Swyx [00:49:20]: Another one that I wanted to... I'm just the last one on the competition side. There's a perception of price wars in hosting open source models. And we talked about the competitiveness in the market. Do you aim to make margin on open source models? Oh, absolutely, yes.Lin [00:49:38]: So, but I think it really... When we think about pricing, it's really need to coordinate with the value we're delivering. If the value is limited, or there are a lot of people delivering the same value, there's no differentiation. There's only one way to go. It's going down. So through competition. If I take a big step back, there is pricing from... We're more compared with close model providers, APIs, right? The close model provider, their cost structure is even more interesting because we don't bear any training costs. And we focus on inference optimization, and that's kind of where we continue to add a lot of product value. So that's how we think about product. But for the close source API provider, model provider, they bear a lot of training costs. And they need to amortize the training costs into the inference. So that created very interesting dynamics of, yeah, if we match pricing there, and I think how they are going to make money is very, very interesting.Swyx [00:50:37]: So for listeners, opening eyes 2024, $4 billion in revenue, $3 billion in compute training, $2 billion in compute inference, $1 billion in research compute amortization, and $700 million in salaries. So that is like...Swyx [00:50:59]: I mean, a lot of R&D.Lin [00:51:01]: Yeah, so I think matter is basically like, make it zero. So that's a very, very interesting dynamics we're operating within. But coming back to inference, so we are, again, as I mentioned, our product is, we are a platform. We're not just a single model as a service provider as many other inference providers, like they're providing a single model. We have our optimizer to highly customize towards your inference workload. We have a compound AI system where significantly simplify your interaction to high quality and low latency, low cost. So those are all very different from other providers.Alessio [00:51:38]: What do people not know about the work that you do? I guess like people are like, okay, Fireworks, you run model very quickly. You have the function model. Is there any kind of like underrated part of Fireworks that more people should try?Lin [00:51:51]: Yeah, actually, one user post on x.com, he mentioned, oh, actually, Fireworks can allow me to upload the LoRa adapter to the service model at the same cost and use it at same cost. Nobody has provided that. That's because we have a very special, like we rolled out multi-LoRa last year, actually. And we actually have this function for a long time. And many people has been using it, but it's not well known that, oh, if you find your model, you don't need to use on demand. If you find your model is LoRa, you can upload your LoRa adapter and we deploy it as if it's a new model. And then you use, you get your endpoint and you can use that directly, but at the same cost as the base model. So I'm happy that user is marketing it for us. He discovered that feature, but we have that for last year. So I think to feedback to me is, we have a lot of very, very good features, as Sean just mentioned. I'm the advisor to the company,Swyx [00:52:57]: and I didn't know that you had speculative decoding released.Lin [00:53:02]: We have prompt catching way back last year also. We have many, yeah. So I think that is one of the underrated feature. And if they're developers, you are using our self-serve platform, please try it out.Swyx [00:53:16]: The LoRa thing is interesting because I think you also, the reason people add additional costs to it, it's not because they feel like charging people. Normally in normal LoRa serving setups, there is a cost to dedicating, loading those weights and dedicating a machine to that inference. How come you can't avoid it?Lin [00:53:36]: Yeah, so this is kind of our technique called multi-LoRa. So we basically have many LoRa adapters share the same base model. And basically we significantly reduce the memory footprint of serving. And the one base model can sustain a hundred to a thousand LoRa adapters. And then basically all these different LoRa adapters can share the same, like direct the same traffic to the same base model where base model is dominating the cost. So that's how we advertise that way. And that's how we can manage the tokens per dollar, million token pricing, the same as base model.Swyx [00:54:13]: Awesome. Is there anything that you think you want to request from the community or you're looking for model-wise or tooling-wise that you think like someone should be working on in this?Lin [00:54:23]: Yeah, so we really want to get a lot of feedback from the application developers who are starting to build on JNN or on the already adopted or starting about thinking about new use cases and so on to try out Fireworks first. And let us know what works out really well for you and what is your wishlist and what sucks, right? So what is not working out for you and we would like to continue to improve. And for our new product launches, typically we want to launch to a small group of people. Usually we launch on our Discord first to have a set of people use that first. So please join our Discord channel. We have a lot of communication going on there. Again, you can also give us feedback. We'll have a starting office hour for you to directly talk with our DevRel and engineers to exchange more long notes.Alessio [00:55:17]: And you're hiring across the board?Lin [00:55:18]: We're hiring across the board. We're hiring front-end engineers, infrastructure cloud, infrastructure engineers, back-end system optimization engineers, applied researchers, like researchers who have done post-training, who have done a lot of fine-tuning and so on.Swyx [00:55:34]: That's it. Thank you. Thanks for having us. Get full access to Latent Space at www.latent.space/subscribe

101. Managing Mom Rage and Emotional Dysregulation with Jocelyn Flores

Viva la Mami

Play Episode Listen Later Nov 21, 2024 53:45 Transcription Available

In this episode, I delve into the all-too-common experiences of emotional dysregulation and 'mom rage' that many Latina moms face. Joined by licensed marriage and family therapist Jocelyn Flores, founder of Raíz Parenting, we discuss practical tools like mindful breathing techniques and setting healthy boundaries to help manage these overwhelming emotions.Jocelyn also shares insights on the impact of our cultural background and the importance of self-compassion in the parenting journey. This is an authentic, judgment-free conversation aimed at providing support and reminding mamis that they are not alone in their struggles. We also explore how to break generational cycles of parenting and create a more emotionally secure environment for our children.For detailed show notes, visit vivalamami.com/episode101Connect with Jocelyn FloresWebsite: raiz-parenting.comInstagram: @raizparentingResources Mentioned:Get Raíz Parenting's Break the Cycle Freebie: www.raiz-parenting.com/freebie Sign up for the Raíz Parenting Community waitlist: www.raiz-parenting.com/waitlistLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

apple parenting managing emotional rage latinas ra flores mami dysregulation vl m

100. Thank You for 100 Episodes!

Viva la Mami

Play Episode Listen Later Nov 14, 2024 27:23 Transcription Available

In this special 100th episode, I reflect on the journey of creating this podcast, from its start in 2022 to reaching this monumental milestone. I discuss the initial challenges and inspirations behind starting the Viva la Mami podcast, and how it has evolved over time. I also share the lessons I've learned and the transformations I've undergone through interviewing Latina moms and Latine experts. More importantly, I talk about the significance of this podcast in creating a community and elevating Latina voices and experiences. Finally, I express my hopes for the future of the Viva la Mami podcast and call for listener feedback on desired topics and guest suggestions.Thank you, mil gracias, for being part of this journey!For detailed show notes, visit vivalamami.com/episode100Resources Mentioned:Get my inexpensive microphone from Amazon!Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

love amazon apple latinas mami latine vl m

099. [Replay Episode] How to Raise Children Through a Lens of Social Justice with Samantha Siers

Viva la Mami

Play Episode Listen Later Nov 7, 2024 54:07 Transcription Available

In this episode, I reflect on the emotional aftermath of the 2024 Presidential Election and the importance of focusing on family and community during uncertain times. Because I needed time to rest, I made the decision to replay an impactful interview with a Latina mami, social worker, and Obama Foundation Scholar, Samantha Siers. As mamás, we have been or will be having many difficult conversations with our children. Some of these conversations will be related to social justice-related issues like racism, poverty, and homelessness. In order for us to dismantle biases and prejudices, we should be mindful of what we say and what we model for our kids to ensure they are social justice-minded children.-------In this week's episode, we welcome Samantha Siers (she/her) who is a mom of two, and a 2023-2024 Obama Foundation Scholar. Through her personal experience with teenage parenthood, Samantha is passionate about offering robust, equitable, support services to mothers of color on Chicago's south and west sides. Her project with the Obama Foundation centers around creating a resource database that is truly reflective of the unique needs of single mothers.Samantha discusses the importance of instilling kindness, thoughtfulness and a sense of social justice in children and shares tips for families to become agents of change.In this episode you'll hear:Why it is important to raise justice-minded children.How to raise children through a social justice lens and how to help reduce fear.Age-appropriate lessons that we can teach our children and how to expose our children to normalize this process.How to navigate political differences in family.Ways to connect with Samantha Siers on LinkedIn and learn about the work she is doing.Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

love children chicago apple raise latinas social justice lens presidential election mami obama foundation vl m

097. VLM Spotlight: From Beauty Content Creation to Authentic Madrehood with Rocio Isabel

Viva la Mami

Play Episode Listen Later Oct 24, 2024 59:53 Transcription Available

In this Viva la Mami Spotlight episode, we welcome content creator Rocio Isabel as she shares her personal journey from beauty content creator to an advocate for authenticity and mental health in motherhood. Rocio opens up about the challenges she faced, including postpartum anxiety and raising biracial, bilingual children in a less diverse community. She highlights the importance of preserving cultural identity through family traditions, food, music, and teaching her daughters Spanish. Rocio also emphasizes the value of seeking support - building online and offline communities - and embracing one's roots while navigating the complexities of madrehood. Her story is a testament to the power of authenticity, community, and resilience in raising confident, well-rounded children.In This Episode, You'll Hear About:The transition from beauty content to authentic motherhood storiesPregnancy and childbirth experiences during COVID-19Postpartum mental health awareness in the Latino communityBicultural parenting and preserving Latino heritageBuilding community both online and offline as a Latina momBreaking generational patterns while honoring cultural rootsFor detailed show notes, visit vivalamami.com/episode97Connect With Rocio Isabel:Instagram: @larocioisabelYouTube: Risas RizosWebsite: www.rocioisabel.comTikTok: @larocioisabelThreads: @larocioisabelLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

apple beauty spanish authentic content creation mami rocio vl m

096. Fitness Tips for the Modern Latina Mom with Vanessa Flores

Viva la Mami

Play Episode Listen Later Oct 17, 2024 55:22 Transcription Available

In this episode, we discuss a holistic approach to fitness and health, specifically catered for busy moms. Vanessa Flores shares her journey from battling unhealthy body image and eating habits to creating Kettlebell Kickstart, a platform for simple and realistic fitness routines. We discuss the cultural challenges Latina moms face, emphasize viewing self-care as an investment in family well-being, and debunk fitness myths within the Latine community. Vanessa shares practical strategies for overcoming workout procrastination, balancing traditional foods with nutrition, and achieving mental resilience are highlighted. Through personal anecdotes and success stories, we talk about the importance of small, sustainable changes and the broader goal of redefining madrehood for lasting health benefits.For detailed show notes, visit vivalamami.com/episode96Ways to Connect With Vanessa FloresWebsite: thestrongfitlife.comInstagram: instagram.com/thestrongfitlifePersonal Facebook: facebook.com/vanessa.mugnainiProgram Facebook: facebook.com/strongfitmomsLove this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

apple modern latinas mami latine fitness tips vl m vanessa flores

095. Navigating the Realities of Madrehood: A Raw and Honest Reflection

Viva la Mami

Play Episode Listen Later Oct 10, 2024 28:27 Transcription Available

In this State of Motherhood Address, I discuss the realities of motherhood, emphasizing the importance of authenticity and community within the Viva la Mami platform. I openly share my personal struggles with mental health, identity pressures, and the challenges of balancing family and entrepreneurship. My conversation touches on cultural expectations, the clutter of physical and mental spaces, and the necessity of self-care and seeking help.I also encourage others to join in embracing their true selves, acknowledging that it's okay not to be okay, and fostering a supportive environment for sharing experiences.This State of Madrehood was an Instagram Live. To see the video, check it out here.Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

love apple state navigating reflection honest realities mami vl m

092. Making Time For When There's No Time As Moms

Viva la Mami

Play Episode Listen Later Sep 19, 2024 26:51 Transcription Available

In this episode, I tackle the challenge of how moms can prioritize ourselves when time is stretched thin. I identify common barriers, such as cultural expectations and internalized guilt, and share five strategies for self-care, including redefining 'me time', learning to say no, delegating tasks, multitasking strategically, and embracing good enough.I emphasize that self-care is necessary, not selfish, and encourage you to start small with simple, actionable steps. Additionally, I share information about my coaching program, Balanced Motherhood, designed to help Latina moms create balanced and fulfilling lives.For detailed show notes, visit vivalamami.com/episode92Resources shared:For quick at-home workouts, get a free trial of Get Mom Strong's SLAM Program!If you're in the Chicago area, join Rise & Thrive Latinas Book Club!Feeling overwhelmed by navigating cultural expectations and modern parenting as a Latina mom? Join Balanced Madrehood, Viva la Mami's first signature coaching program designed to empower bicultural Latinas to create a more balanced and fulfilling madrehood journey. Head over to vivalamami.com/balanced-madrehood to learn more and enroll today!Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠leave a review⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ on Apple podcasts.You can connect with Viva la Mami on ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Instagram⁠⁠⁠⁠,⁠⁠⁠⁠⁠⁠⁠⁠ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Facebook⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠VLM website⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠, or email us at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠podcast@vivalamami.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠.Join the ⁠⁠⁠⁠Viva la Mami newsletter⁠⁠⁠⁠ so you won't miss a thing!

love head chicago apple moms latinas no time making time mami vl m

Podcasts about vl m

Best podcasts about vl m

Viva la Mami

La Loi des séries

Marathon Talk

Cizaire Palace

Les Actuvores – VL Média

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Distances Inconnues – VL Média

Un éclair de Gueny

Les Prolongations – VL Média

Less & The City – VL Média

vegaslegalmagazine's podcast

Le Sporting Club

Au Fil des Siècles

Papers Read on AI

Les Snipers de l'info

HyperLink

GPT Reviews

MakingOf sur VL

Salade Tomate Oignons – VL Média

The top AI news from the past week, every ThursdAI

Les As du placard

Entre Deux Rendez-Vous

Le Grand N'importe Quoi

100% Ligue 1 – VL Média

Red Universe

Tout Pour Réussir

Fluggesellschaft.de Podcast

Latest news about vl m

Latest podcast episodes about vl m

Jérémy Charvet – invité exceptionnel (Au fil de la vie) | La loi des séries #830

Jean-Paul Césari et Dominique Poulain – Manga Méga Show | Le Club #67

Faut-il « romancer » des histoires comme celle d’Ed Gein ? | La loi des séries #829

Damien Boisseau – invité exceptionnel | Le Club #66

Les grands moments de Croque Vacances | Croque le Club #2

Benoît Solès – Le fantôme de l’Opéra / Vanille | Le club #65

Leur(s) Nom(s) est Bond… Jame(s) Bond(s) – (Partie 1 : 1962-1987) | Seriefonia

Joseph : Lucien Jean-Baptiste tourne la suite de sa série

Nicolas Marié – spéciale doublage / Julien Chiron (Panini) | Le Club #64

Polerik Rouvière – compositeur | La loi des séries #828

Laura Ferré, Anna-Maria Giachin, Alain Jeanne et Titia Augustin | Le Club #63

Juliette Lamboley pour « Nuit noire nuit blanche » | La loi des séries #827

Guillaume Lemans – créateur de Les sentinelles | La loi des séries #826

Richard Joffo – invité exceptionnel | Le Club #62

Maïra Schmitt : de Léo Mattéi à sa première réalisation | La loi des séries #825

Les trésors musicaux de Croque Vacances | Croque le Club #1

Iñaki Lartigue : « Aucun regret d’être parti » | La loi des séries #824

Constantin Pappas et Yann Pichon / Lisa Mistretta | Le Club #61

Ava Baya (Soleil Noir) – invitée exceptionnelle | La loi des séries #823

Acilion et les autres | Le Club croque l’été #8

La famille Croque Vacances | Le Club croque l’été #7

#305 - Lebendig gefressen

Hiroshi Watari et Makoto Sumikawa (Spielvan) | Le Club #60

Les chansons de Claude Pierrard, Isidore, Clémentine et les autres | Le Club croque l’été #6

Les dessins animés | Le Club croque l’été #5

Isidore et Clémentine | Le Club croque l’été #4

#302 - Woah Lecks: Hundreds of Beavers, Wicked, Squid Game, uvm.

#366. Ferievrøvl: måltidsfrekvens, kalorier på ferie og "krangel"

Les séries télévisées | Le Club croque l’été #3

Claude Pierrard | Le Club croque l’été #2

125. Best of VLM: How to Honor Your Cultural Foods While Meeting Your Health Goals with Mariana Dineen

124. Best of VLM: Managing Mom Rage and Emotional Dysregulation with Jocelyn Flores

123. Best of VLM: Navigating the Mental Load of Bilingual Parenting with Erika Milla from Spanish En Casita

π0: A Foundation Model for Robotics with Sergey Levine - #719

108. VLM Spotlight: Navigating Divorce as a Latina Mom with Marisa Lopez

2024 in Agents [LS Live! @ NeurIPS 2024]

2024 in Synthetic Data and Smol Models [LS Live @ NeurIPS]

How to Manage Your Energy Without the Mom Guilt with Jessica Lynn Rojas

103. How to Honor Your Cultural Foods While Meeting Your Health Goals with Mariana Dineen

102. [Replay Episode] How to Practice Self-Gratitude as Moms

Why Compound AI + Open Source will beat Closed AI

101. Managing Mom Rage and Emotional Dysregulation with Jocelyn Flores

100. Thank You for 100 Episodes!

099. [Replay Episode] How to Raise Children Through a Lens of Social Justice with Samantha Siers

097. VLM Spotlight: From Beauty Content Creation to Authentic Madrehood with Rocio Isabel

096. Fitness Tips for the Modern Latina Mom with Vanessa Flores

095. Navigating the Realities of Madrehood: A Raw and Honest Reflection

092. Making Time For When There's No Time As Moms