Podcasts about VLM

  • 74 PODCASTS
  • 1,313 EPISODES
  • 55m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Feb 20, 2026 LATEST

POPULARITY (trend chart: 2019–2026)


Best podcasts about VLM


Latest podcast episodes about VLM

La Loi des séries
Laly Meignan – invitée exceptionnelle | La loi des séries #842

Feb 20, 2026 · 34:58

She is a guest you already know, and she is embarking on a new adventure alongside me: Laly Meignan is my guest. The guest: Laly Meignan, a key figure of the Hélène saga...

La Loi des séries
Croque La 5 | Croque Le Club #5

Feb 18, 2026 · 27:53

Croque Le Club returns with a new run of "collector" shows, and for this first one we celebrate the 40th anniversary of La 5. What does Croque le Club Collector look like? Theme songs, series, cartoons,...

La Loi des séries
Valérie Bonneton – L’affaire Laura Stern | La loi des séries #841

Feb 17, 2026 · 21:59

To mark the arrival of L'affaire Laura Stern on France.TV, Valérie Bonneton is the guest of this special podcast. The guest: Valérie Bonneton. For many years, Valérie Bonneton has been one of France's favorite actresses...

La Loi des séries
Phoenix (France.TV) – les producteurs, invités exceptionnels | La loi des séries #840

Feb 13, 2026 · 27:22

To mark the broadcast of Phoenix, a socially committed series, on France.TV, its two producers are our guests this week. The guests: Alexandre Charlet and Nicolas de Saint Meleuc, producers. A bold series,...

The Atari Jaguar Game by Game Podcast
33 - JaguarCD and VLM

Feb 7, 2026 · 182:01

It's rare, it's controversial, and it completes the Jaguar picture. Atari's JaguarCD peripheral saw limited release after numerous delays, but was packed with a remarkable audio visualization feature called the Virtual Light Machine, and bundled with a non-trivial percentage of the Jaguar's total commercial CD library. In this episode, we dig through the development timeline for the hardware itself as well as the Virtual Light Machine, go into perhaps a bit too much detail regarding the 81 VLM effects, marvel at Edge Magazine's JaguarCD coverage, talk about the Tempest 2000 Soundtrack CD also included in the box, and cover the separate-but-necessary MemoryTrack cartridge and its lofty ambitions for game data storage. Also included is feedback from Troff, Graymane Shadow, Editorb, and Aritheus! All that plus Storytime can be found in this slow-loading installment of the Atari Jaguar Game by Game Podcast. Full shownotes can be found at https://forums.atariage.com/blogs/entry/19767-33-jaguarcd-and-vlm-and-memorytrack-and-tempest-2000-soundtrack/ Next up: Blue Lightning!

La Loi des séries
Spéciale « Highlander la série » | Seriefonia

Jan 30, 2026 · 74:05

To mark the release of the complete series on TF1+, here is a Highlander special with plenty of interviews about the show. It's the eighth season already. And it's still… SérieFonia. SERIFONIA, SEASON...

La Loi des séries
Stars Manga – Marie Dauphin, Bernard Minet, JP Césari, Noam et Jacky – invités exceptionnels | Le club #73

Jan 28, 2026 · 70:28

The Stars Manga concert is on October 30, and we bring you a special show with the greatest performers of French theme songs. With Rui Pascoal. The guests: Marie Dauphin, Bernard Minet, JP...

La Loi des séries
Spéciale jeux télévisés avec Mouss et Alexandre Raveleau | Le Club #72

Jan 21, 2026 · 68:19

To mark the release of the book « La grande histoire des jeux télévisés », we present a special edition of Le Club today. With Lola Moreau and Arnaud Magnier. The guests: Alexandre Raveleau and...

La Loi des séries
Sarah Pachoud et Leïna Djema – Le Signal 149 kHz | La loi des séries #839

Jan 16, 2026 · 26:43

As Le Signal 149 kHz begins airing on Novo 19 on January 29, we welcome Sarah Pachoud and Leïna Djema to talk about the series. The guests: Sarah Pachoud and...

La Loi des séries
Capucine Malarre et Guillaume Labbé – Anaon | La loi des séries #838

Jan 15, 2026 · 28:08

To mark the broadcast of the series Anaon on France 2, we welcome Capucine Malarre and Guillaume Labbé. The guests: Capucine Malarre and Guillaume Labbé. They are daughter and mother in this story...

La Loi des séries
Amina et Antoine Delie – The Voice, Incroyable talent, l’Eurovision | Le Club #71

Jan 14, 2026 · 62:15

Two great voices are our guests this week: the latest Eurovision winner, Amina, and breakout talent Antoine Delie. The guests: Amina and Antoine Delie. We welcome them for a complete...

La Loi des séries
Fanny Riedberger – Le diplôme (TF1) | La loi des séries #837

Jan 11, 2026 · 19:25

As the series Le diplôme airs on TF1, we welcome its creator, Fanny Riedberger, who talks about TF1's new gem. The guest: Fanny Riedberger. TF1 launches its new...

La Loi des séries
Les Simpson – une vie en jaune avec Romain Nigita | La loi des séries #836

Jan 9, 2026 · 26:59

To mark the release of an essay on The Simpsons, journalist Romain Nigita is our guest this week. The guest: Romain Nigita – Les Simpson ou Le paradoxe du donut universel. It's...

Viva la Mami
146. How to Make 2026 Your Best Year Yet | Season 6 Premiere

Jan 8, 2026 · 30:14 · Transcription available

Welcome to Season 6 of Viva la Mami! In this solo episode, I'm getting real with you about our move to Mexico, the lessons 2025 taught me, and how we can make 2026 our best year yet. Whether you're questioning the "American Dream", struggling with mom guilt, or dreaming of making a radical change in your own life, this episode is for you. Let's redefine madrehood together, one decision at a time. For detailed show notes, visit vivalamami.com/episode146

What You'll Hear:
  • What 2025 taught me about moving to another country and the lessons learned
  • Navigating the "ni de aquí, ni de allá" feeling all over again, even in the country our parents came from
  • What's new in Season 6 and exciting changes for VLM!
  • Tips to make 2026 your best year yet

Resources Mentioned:
  • Living in Mexico Series (Season 5) – full episodes about our relocation journey
  • Apply to be a guest on the show!
  • Suggest an episode topic HERE. Suggest a guest for the podcast HERE.
  • SHOP MY NEWEST PRODUCTS – "How to Get Dual Citizenship in Mexico" E-Guide & Digital Course

La Loi des séries
Hervé Rey – spéciale doublage | Le Club #70

Jan 7, 2026 · 85:18

For this first edition of 2026, we devote a large part of the show to our guest, one of the great voices of French dubbing, Hervé Rey. The guests: Hervé Rey and Leina Djema. He is an essential voice...

La Loi des séries
Croque Noël – avec Claude Pierrard | Croque le Club #4

Dec 25, 2025 · 33:32

We wanted to save you a gift for this December 25: a special show in the company of, and launched by, Claude Pierrard. What does Croque le Club look like? Theme songs, series, cartoons, studio sets or...

La Loi des séries
Leur(s) Nom(s) est Bond… Jame(s) Bond(s) (Partie 2 : 1989-2021) | Seriefonia

Dec 17, 2025 · 71:42

We continue our grand retrospective of James Bond through his music in our show Seriefonia. It's the eighth season already. And it's still… SérieFonia. SERIFONIA, SEASON 8 OPENING THEME by Jérôme Marie [AMBIANCE...

AI + a16z
Building the “See Something, Say Something” AI for Every Camera

Dec 16, 2025 · 39:25

a16z's Martin Casado sits down with Shikhar Shrestha, CEO and cofounder of Ambient, the company bringing agentic AI to physical security. Shikhar shares how a traumatic armed robbery at age 12, and a security camera that no one was watching, sparked his mission to make every camera intelligent. They discuss how Ambient's AI monitors camera feeds in real time to detect threats and prevent incidents before they happen, navigating COVID as a physical security company, building their own reasoning VLM called Pulsar, and why the future of security is AI not just detecting threats but automatically responding to them. If you enjoyed this episode, please be sure to like, subscribe, and share with your friends.

Follow Shikhar on X: https://x.com/shikharshrestha
Follow Martin on X: x.com/martin_casado
Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

AJR Podcast Series
Will AI Replace Me? A Grounded Overview of Vision–Language Models in Current Radiology Practice

Dec 15, 2025 · 5:50

Full article: Decoupling Visual Parsing and Diagnostic Reasoning for Vision–Language Models (GPT-4o and GPT-5): Analysis Using Thoracic Imaging Quiz Cases

What is the bottleneck in ongoing attempts to use vision-language models to interpret radiologic imaging? Pranjal Rai, MD, discusses this recent AJR article by Han et al., which seeks to separate the roles of visual parsing and diagnostic reasoning in VLM performance.
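A minimal sketch of the decoupling idea discussed here: stage one asks a VLM only to parse the image into findings, and stage two asks a text-only model to reason over those findings. Both call_* functions are hypothetical stubs standing in for real model calls, not any API from the study:

```python
# Two-stage "decoupled" pipeline sketch: visual parsing first, then
# diagnostic reasoning over text alone. Both call_* functions are
# hypothetical stubs; swap in real model clients to experiment.

def call_vlm(image_path: str, prompt: str) -> str:
    # Stub: a real VLM would return a findings report for this image.
    return "Findings: right lower lobe consolidation; no pleural effusion."

def call_llm(prompt: str) -> str:
    # Stub: a real LLM would reason over the findings text.
    return "Leading differential: community-acquired pneumonia."

def read_case(image_path: str) -> str:
    findings = call_vlm(image_path, "List all imaging findings; do not diagnose.")
    return call_llm(f"Given only these findings, give a differential diagnosis:\n{findings}")

print(read_case("thoracic_quiz_case_001.png"))  # hypothetical file name
```

Separating the stages this way makes it possible to tell whether an error came from mis-parsing the image or from faulty reasoning over correctly parsed findings.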

La Loi des séries
Nos sourires – Anna Brauge et Victoire Baraduc | La loi des séries #835

Dec 12, 2025 · 21:25

For this last issue of La loi des séries before the holidays, we focus on a self-produced web series: Nos sourires, whose first season has just ended. The guests:...

La Loi des séries
Stephen J. Cannell avec Lionel Olenga | La loi des séries #834

Dec 5, 2025 · 28:08

To mark the release of the complete Le rebelle, a Stephen J. Cannell special in the company of a French writer, Lionel Olenga. The guest: Lionel Olenga, screenwriter and great admirer of Stephen J. Cannell...

La Loi des séries
Hiba Tawaji – invitée exceptionnelle | Le Club #69

Dec 3, 2025 · 64:03

She is without a doubt one of the most beautiful voices in French song: Hiba Tawaji is the special guest of Le Club. The guests: Hiba Tawaji and Soundous Moustarhim. Two inspiring women are this...

La Loi des séries
Nouveau jour c’est (bientôt) fini | La loi des séries #833

Nov 28, 2025 · 24:01

The series Nouveau jour will end for good on M6+ on December 26. We look back on an adventure that was unfortunately squandered and cut short. The guest: Geoffrey Bidaut, writer on the series Nouveau jour. While the...

La Loi des séries
Daniel Njo Lobé – La petite boutique des horreurs | Le Club #68

Nov 26, 2025 · 61:03

For this new edition of Le Club, several guests join us, including Daniel Njo Lobé for La petite boutique des horreurs. The guests: Daniel Njo Lobé, Claire-Lise Lecerf and Clémentine Millière. Le Club...

La Loi des séries
Capucine Malarre – invitée exceptionnelle | La loi des séries 832

Nov 21, 2025 · 26:10

While the series Désenchantées is still available on France.TV, one of its cast, Capucine Malarre, is our guest this week. The guest: Capucine Malarre. Discovered in the series Anaon on Prime Video, Capucine Malarre has gone from role to role...

La Loi des séries
« Au pays de Croque Vacances » – spéciale | Croque le Club #3

Nov 19, 2025 · 74:18

It was during an evening at the Grand Rex that the book « Au pays de Croque Vacances » was unveiled to the public, and we bring you a special show. The guests of the Croque Vacances special. To mark...

La Loi des séries
Mélanie Bernier – Prisonnière (Ciné+ OCS) | La loi des séries #831

Nov 14, 2025 · 28:27

As the TV movie Prisonnière airs on Ciné+ Frisson, we welcome Mélanie Bernier, heroine of this infernal locked-room drama. The guest: Mélanie Bernier for Prisonnière. Mélanie Bernier, soon to headline Daron season 2,...

La Loi des séries
Jérémy Charvet – invité exceptionnel (Au fil de la vie) | La loi des séries #830

Nov 7, 2025 · 29:19

An emblematic figure of the new Plus belle la vie, Jérémy Charvet joins us to present his first single, « Au fil de la vie ». The guest: Jérémy Charvet. Since January 2024, Jérémy Charvet has played Ulysse Kepler in Plus...

La Loi des séries
Jean-Paul Césari et Dominique Poulain – Manga Méga Show | Le Club #67

Nov 5, 2025 · 72:09

A new issue of Le Club with a big Manga Méga Show special featuring Jean-Paul Césari and Dominique Poulain. The guests: Dominique Poulain and Jean-Paul Césari / Victoire (« Nos sourires »). On the occasion of the Manga Mega Show...

La Loi des séries
Faut-il « romancer » des histoires comme celle d’Ed Gein ? | La loi des séries #829

Oct 31, 2025 · 19:32

It's THE series everyone has been talking about: this week Alexandre Letren confronts « Monstre : l'histoire de Ed Gein ». Facing the Ed Gein series. A new format in...

La Loi des séries
Damien Boisseau – invité exceptionnel | Le Club #66

Oct 29, 2025 · 68:43

Once again, dubbing takes the spotlight with our guest, the brilliant Damien Boisseau, an iconic voice of series and films. The guests: Damien Boisseau and Clerwie Frin. Grey's Anatomy, Supernatural,...

La Loi des séries
Les grands moments de Croque Vacances | Croque le Club #2

Oct 24, 2025 · 33:59

We continue our fine summer series with Croque le Club, which today looks at the great events that took place in Croque Vacances. What will Croque le Club look like? Theme songs, series, cartoons, studio sets or songs...

The top AI news from the past week, every ThursdAI

Hey everyone, Alex here! Welcome... to the browser war II - the AI edition! This week we chatted in depth about ChatGPT's new Atlas agentic browser, and the additional agentic powers Microsoft added to Edge with Copilot Mode (tho it didn't work for me). Also this week was a kind of crazy OCR week, with more than 4 OCR models releasing, and the crown one is DeepSeek OCR, which turned the whole industry on its head (more later). Quite a few video updates as well, with real-time lipsync from Decart, and a new update from LTX with 4K native video generation; it's been a busy AI week for sure! Additionally, I've had the pleasure to talk about AI browsing agents with Paul from BrowserBase and real-time video with Kwindla Kramer from Pipecat/Daily, so make sure to tune in for those interviews. Buckle up, let's dive in!

Open Source: OCR is Not What You Think It Is (X, HF, Paper)

The most important and frankly mind-bending release this week came from DeepSeek. They dropped DeepSeek-OCR, and let me tell you, this is NOT just another OCR model. The cohosts were buzzing about this, and once I dug in, I understood why. This isn't just about reading text from an image; it's a revolutionary approach to context compression. We think that DeepSeek needed this as an internal tool, so we're really grateful to them for open sourcing this, as they did something crazy here. They are essentially turning text into a visual representation, compressing it, and then using a tiny vision decoder to read it back with incredible accuracy. We're talking about a compression ratio of up to 10x with 97% decoding accuracy. Even at 20x compression they are achieving 60% decoding accuracy! My head exploded live on the show when I read that. This is like the middle-out compression algorithm joke from Silicon Valley, but it's real. As Yam pointed out, this suggests our current methods of text tokenization are far from optimal. With only 3B parameters (~570M active), they are taking a direct stab at long-context inefficiency: imagine taking 1M tokens, encoding them into 100K visual tokens, and then feeding those into a model. Since the model is tiny, it's very cheap to run; for example, alphaXiv claimed they OCR'd all of the papers on arXiv with this model for $1,000, a task that would have cost $7,500 using Mistral OCR. As per their paper, with DeepSeek OCR on a single H100 GPU, it's possible to scan up to 200K pages!
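To make those ratios concrete, here is a back-of-the-envelope sketch using only the figures quoted above (the compression ratios, accuracies, and the alphaXiv cost claim); nothing here calls the actual model:

```python
# Back-of-the-envelope arithmetic for the optical context compression claims
# quoted above. Nothing here calls the model; all numbers are the episode's.

def visual_tokens(text_tokens: int, compression_ratio: float) -> int:
    """How many vision tokens represent a given span of text tokens."""
    return int(text_tokens / compression_ratio)

long_context = 1_000_000  # the 1M-token example from the discussion

for ratio, accuracy in [(10, 0.97), (20, 0.60)]:
    print(f"{ratio}x: {long_context:,} text tokens -> "
          f"{visual_tokens(long_context, ratio):,} visual tokens "
          f"(~{accuracy:.0%} decoding accuracy)")

# Cost claim from alphaXiv: OCR-ing all of the papers on arXiv.
deepseek_usd, mistral_usd = 1_000, 7_500
print(f"Claimed savings: {mistral_usd / deepseek_usd:.1f}x cheaper")
```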

La Loi des séries
Benoît Solès – Le fantôme de l’Opéra / Vanille | Le club #65

Oct 22, 2025 · 62:57

With Benoît Solès and Vanille, we talk musical theatre and singular artistic worlds in this edition of Le Club. The guests: Benoît Solès and Vanille. An essential figure of the theatre, Benoît Solès wrote the libretto...

La Loi des séries
Leur(s) Nom(s) est Bond… Jame(s) Bond(s) – (Partie 1 : 1962-1987) | Seriefonia

Oct 19, 2025 · 64:11

Like the name of the next Bond, Seriefonia kept you waiting, the better to treat you to this « Bond »… James Bond special. SERIFONIA, SEASON 8 OPENING THEME by Jérôme Marie. Yes, I know, it arrives...

La Loi des séries
Joseph : Lucien Jean-Baptiste tourne la suite de sa série

Oct 16, 2025 · 16:26

After a first season, Joseph is about to return, but in a new form: a new episode titled « La vie de palace ». A new story and a new formula for Joseph. "You have to be more didactic, more talkative, because...

La Loi des séries
Nicolas Marié – spéciale doublage / Julien Chiron (Panini) | Le Club #64

Oct 15, 2025 · 78:38

A fine dubbing special this week with the excellent Nicolas Marié (Jarod). We also talk Panini albums with Julien Chiron. The guests: Nicolas Marié and Julien Chiron. Nicolas Marié is the special guest of this...

AFPT podden
#366. Ferievrøvl: måltidsfrekvens, kalorier på ferie og "krangel"

Jul 27, 2025 · 51:43

What does meal frequency really mean for your body, and do you need to eat every three hours to stay in shape? We talked about finding a balance that suits you, even on vacation. You'll get tips on protein, calories, vacation life, and why social bonds are just as important as what you put into your body.

Viva la Mami
125. Best of VLM: How to Honor Your Cultural Foods While Meeting Your Health Goals with Mariana Dineen

May 15, 2025 · 62:33 · Transcription available

You're listening to the Best of VLM episode series featuring the most popular episodes of the Viva la Mami podcast! In this episode, we welcome Mariana Dineen, a registered dietitian and founder of Elemento Health. She shares the importance of nutrition within the Latino community, especially for mothers striving to adopt a healthier lifestyle while preserving cultural traditions. Mariana emphasizes the significance of cultural relevance in nutrition counseling, overcoming language barriers, and the unique challenges faced by the Latino community in accessing healthcare. She offers practical strategies for meal planning and managing chronic diseases like diabetes through a cultural lens, stressing the importance of balance and inclusion rather than the elimination of cultural foods. We also touched on the impact of stress on eating habits, particularly for busy Latina moms, and how to address these issues holistically. Mariana's commitment to cultural sensitivity in her practice, Elemento Health, underscores her dedication to providing accessible and empathetic nutrition care. If you've ever felt shame about your cultural foods or struggled to find a healthcare provider who truly gets you, this episode is for you. For detailed show notes, visit vivalamami.com/episode125

Key topics covered:
  • Breaking down barriers to accessing nutrition care for Latinas
  • Making traditional dishes healthier while preserving cultural roots
  • Managing stress eating and emotional relationships with food
  • Practical meal planning strategies for busy mamas
  • Culturally sensitive approaches to managing conditions like diabetes

Connect with Mariana from Elemento Health!
Email: mariana@elementohealth.com
Instagram: @elemento_health
Website: elementohealth.com

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts. Follow Viva la Mami on Instagram @vivalamami. Join the Viva la Mami newsletter so you won't miss a thing! Have a suggestion for an episode topic? Click HERE. Have a suggestion for a guest? Click HERE. Visit the Viva la Mami website: www.vivalamami.com. Have questions or want to connect? Email us at podcast@vivalamami.com

Viva la Mami
124. Best of VLM: Managing Mom Rage and Emotional Dysregulation with Jocelyn Flores

May 8, 2025 · 54:13 · Transcription available

You're listening to the Best of VLM episode series featuring the most popular episodes of the Viva la Mami podcast! In this episode, I delve into the all-too-common experiences of emotional dysregulation and 'mom rage' that many Latina moms face. Joined by licensed marriage and family therapist Jocelyn Flores, founder of Raíz Parenting, we discuss practical tools like mindful breathing techniques and setting healthy boundaries to help manage these overwhelming emotions. Jocelyn also shares insights on the impact of our cultural background and the importance of self-compassion in the parenting journey. This is an authentic, judgment-free conversation aimed at providing support and reminding mamis that they are not alone in their struggles. We also explore how to break generational cycles of parenting and create a more emotionally secure environment for our children. For full show notes, visit vivalamami.com/episode124

Connect with Jocelyn Flores
Website: raiz-parenting.com
Instagram: @raizparenting

Resources Mentioned:
  • Get Raíz Parenting's Break the Cycle freebie: www.raiz-parenting.com/freebie

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts. Follow Viva la Mami on Instagram @vivalamami. Join the Viva la Mami newsletter so you won't miss a thing! Have a suggestion for an episode topic? Click HERE. Have a suggestion for a guest? Click HERE. Visit the Viva la Mami website: www.vivalamami.com. Have questions or want to connect? Email us at podcast@vivalamami.com

Viva la Mami
123. Best of VLM: Navigating the Mental Load of Bilingual Parenting with Erika Milla from Spanish En Casita

May 1, 2025 · 60:25 · Transcription available

You're listening to the Best of VLM episode series featuring the most popular episodes of the Viva la Mami podcast! In this episode, we welcome Erika Milla, creator of Spanish En Casita, an online community for parents striving to raise bilingual children. As a dedicated mom raising bilingual kids, Erika shares how she's preserving Spanish at home. In our conversation, we dive into the mental and emotional struggles of bilingual parenting and the ups and downs of dual immersion programs. Erika also opens up about homeschooling her kids and gives tips for other parents on similar journeys. In addition, we talk about the importance of community, cultural pride, and staying intentional in raising bilingual children. For full show notes, visit vivalamami.com/episode123

Follow Erika Milla!
Instagram: instagram.com/spanish.en.casita

Feeling overwhelmed by navigating cultural expectations and modern parenting as a Latina mom? Join Balanced Madrehood, Viva la Mami's signature coaching program designed to empower Latina moms to create a more balanced and fulfilling madrehood journey. Head over to vivalamami.com/balanced-madrehood to learn more!

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts. Follow Viva la Mami on Instagram @vivalamami. Join the Viva la Mami newsletter so you won't miss a thing! Have a suggestion for an episode topic? Click HERE. Have a suggestion for a guest? Click HERE. Visit the Viva la Mami website: www.vivalamami.com. Have questions or want to connect? Email us at podcast@vivalamami.com

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
π0: A Foundation Model for Robotics with Sergey Levine - #719

Feb 18, 2025 · 52:30

Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision-language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which uses human operators and teleoperation rigs, the potential of synthetic data and reinforcement learning in enhancing robotic capabilities, and much more. We also introduce the team's new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research. The complete show notes for this episode can be found at https://twimlai.com/go/719.
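For readers who want a feel for the "VLM backbone plus diffusion action expert" pairing described above, here is a toy sketch. Every module size, name, and the fixed-step denoising loop are illustrative placeholders, not Physical Intelligence's actual π0 architecture:

```python
import torch
import torch.nn as nn

# Toy sketch: a VLM encodes image + instruction into a context vector; a
# diffusion-style "action expert" iteratively denoises a chunk of future
# actions conditioned on that context. All sizes are placeholders.

class TinyVLM(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.vision = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))
        self.text = nn.EmbeddingBag(1000, dim)  # stand-in for tokenizer + LM

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        return self.vision(image) + self.text(token_ids)

class ActionExpert(nn.Module):
    """Predicts a denoising update for a flattened chunk of future actions."""
    def __init__(self, dim: int = 256, action_dim: int = 7, horizon: int = 16):
        super().__init__()
        self.action_dim, self.horizon = action_dim, horizon
        self.net = nn.Sequential(
            nn.Linear(dim + horizon * action_dim + 1, 512), nn.GELU(),
            nn.Linear(512, horizon * action_dim),
        )

    def forward(self, ctx, noisy_actions, t):
        return self.net(torch.cat([ctx, noisy_actions, t], dim=-1))

@torch.no_grad()
def sample_actions(vlm, expert, image, token_ids, steps: int = 10):
    ctx = vlm(image, token_ids)
    acts = torch.randn(ctx.shape[0], expert.horizon * expert.action_dim)
    for i in range(steps):  # crude fixed-step denoising loop
        t = torch.full((ctx.shape[0], 1), 1.0 - i / steps)
        acts = acts - expert(ctx, acts, t) / steps
    return acts.view(-1, expert.horizon, expert.action_dim)

vlm, expert = TinyVLM(), ActionExpert()
chunk = sample_actions(vlm, expert, torch.randn(1, 3, 64, 64), torch.randint(0, 1000, (1, 8)))
print(chunk.shape)  # torch.Size([1, 16, 7]): a 16-step chunk of 7-DoF actions
```

The design point to notice is that the policy emits an action *chunk* (a short horizon of continuous actions) per inference, rather than one discrete token per step.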

Viva la Mami
108. VLM Spotlight: Navigating Divorce as a Latina Mom with Marisa Lopez

Jan 23, 2025 · 39:55 · Transcription available

In this powerful first episode of a two-part series, I sit down with Marisa Lopez, a civil engineer turned real estate investor, who shares her raw and inspiring journey of breaking generational patterns to create a healthier future for her daughter. Marisa opens up about navigating divorce as a Latina mom, breaking cultural norms, and building a healthy co-parenting relationship. From being told she might never have children to becoming a single mom at 5 months postpartum, Marisa shares how she found strength through therapy, family support, and prioritizing her daughter's wellbeing. This conversation is especially powerful for mamás considering divorce or struggling with family expectations around separation. Join us as Marisa shows how choosing yourself can be the greatest act of love for your children. For detailed show notes, visit vivalamami.com/episode108

In This Episode, You'll Hear:
  • Marisa's path to madrehood despite medical challenges
  • The emotional process of choosing divorce as a new mother
  • Breaking traditional cultural expectations around marriage
  • Building a support system as a first-generation divorcee
  • Creating healthy family dynamics for the next generation

Resources Mentioned:
  • Mujerón Movement

Connect with Marisa:
Instagram: @asiramzepol

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts. Follow Viva la Mami on Instagram @vivalamami. Join the Viva la Mami newsletter so you won't miss a thing! Have a suggestion for an episode topic? Click HERE. Have a suggestion for a guest? Click HERE. Join the Viva la Mami Collective: www.vivalamami.com/vlm-collective. Visit the Viva la Mami website: www.vivalamami.com. Have questions or want to connect? Email us at podcast@vivalamami.com

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all, all our LS supporters who helped fund the gorgeous venue and A/V production!

For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (which we have now also done for ICLR and ICML); however, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap the 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.

Our next keynote covers The State of LLM Agents, with the triumphant return of Professor Graham Neubig to the pod (his ICLR episode here!). OpenDevin is now a startup known as AllHands! The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number 1 on the hardest SWE-Bench Full leaderboard at 29%, though on the smaller SWE-Bench Verified, they are at 53%, behind Amazon Q, devlo, and OpenAI's self-reported o3 results at 71.7%.

Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind and Anthropic setting their sights on consumer and coding agents, vision-based computer-using agents and multi-agent systems. There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and Codeium's Windsurf Cascade in the IDE arena, to the explosive revenue growth of Stackblitz's Bolt, Lovable, and Vercel's v0, and the unicorn rounds and high-profile movements of customer support agents like Sierra (now worth $4 billion) and search agents like Perplexity (now worth $9 billion). We wanted to take a little step back to understand the most notable papers of the year in Agents, and Graham indulged with his list of 8 perennial problems in building agents in 2024.

Must-Read Papers for the 8 Problems of Agents
* The agent-computer interface: CodeAct: Executable Code Actions Elicit Better LLM Agents.
Minimal viable tools: Execution Sandbox, File Editor, Web Browsing
* The human-agent interface: Chat UI, GitHub Plugin, Remote runtime, …?
* Choosing an LLM: See Evaluation of LLMs as Coding Agents on SWE-Bench at 30x - must understand instructions, tools, code, environment, error recovery
* Planning: Single Agent Systems vs Multi Agent (CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration) - Explicit vs Implicit, Curated vs Generated
* Reusable common workflows: SteP: Stacked LLM Policies for Web Actions and Agent Workflow Memory - Manual prompting vs Learning from Experience
* Exploration: Agentless: Demystifying LLM-based Software Engineering Agents and BAGEL: Bootstrapping Agents by Guiding Exploration with Language
* Search: Tree Search for Language Model Agents - explore paths and rewind
* Evaluation: Fast Sanity Checks (miniWoB and Aider) and Highly Realistic (WebArena, SWE-Bench) and SWE-Gym: An Open Environment for Training Software Engineering Agents & Verifiers

Full Talk on YouTube
Please like and subscribe!

Timestamps
* 00:00 Welcome to Latent Space Live at NeurIPS 2024
* 00:29 State of LLM Agents in 2024
* 02:20 Professor Graham Neubig's Insights on Agents
* 03:57 Live Demo: Coding Agents in Action
* 08:20 Designing Effective Agents
* 14:13 Choosing the Right Language Model for Agents
* 16:24 Planning and Workflow for Agents
* 22:21 Evaluation and Future Predictions for Agents
* 25:31 Future of Agent Development
* 25:56 Human-Agent Interaction Challenges
* 26:48 Expanding Agent Use Beyond Programming
* 27:25 Redesigning Systems for Agent Efficiency
* 28:03 Accelerating Progress with Agent Technology
* 28:28 Call to Action for Open Source Contributions
* 30:36 Q&A: Agent Performance and Benchmarks
* 33:23 Q&A: Web Agents and Interaction Methods
* 37:16 Q&A: Agent Architectures and Improvements
* 43:09 Q&A: Self-Improving Agents and Authentication
* 47:31 Live Demonstration and Closing Remarks

Transcript

[00:00:29] State of LLM Agents in 2024

[00:00:29] Speaker 9: Our next keynote covers the state of LLM agents. With the triumphant return of Professor Graham Neubig of CMU and OpenDevin, now a startup known as AllHands. The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number one on the hardest SWE-Bench Full leaderboard at 29%.
We wanted to take a little step back to understand the most notable papers of the year in [00:02:00] agents, and Graham indulged with his list of eight perennial problems in building agents.

[00:02:06] Speaker 9: As always, don't forget to check our show notes for all the selected best papers of 2024, and for the YouTube link to their talk. Graham's slides were especially popular online, and we are honoured to have him. Watch out and take care!

[00:02:20] Professor Graham Neubig's Insights on Agents

[00:02:20] Speaker: Okay, hi everyone. So I was given the task of talking about agents in 2024, and this is an impossible task because there are so many agents, so many agents in 2024. So this is going to be strongly colored by my personal experience and what I think is interesting and important, but I think it's an important topic.

[00:02:41] Speaker: So let's go ahead. So the first thing I'd like to think about is, let's say I gave you, you know, a highly competent human, some tools. Let's say I gave you a web browser and a terminal or a file system. And the ability to [00:03:00] edit text or code. What could you do with that? Everything. Yeah.

[00:03:07] Speaker: Probably a lot of things. This is like 99 percent of my, you know, daily life, I guess, when I'm working. So, I think this is a pretty powerful tool set, and I am trying to do, and what I think some other people are trying to do, is come up with agents that are able to, you know, manipulate these things.

[00:03:26] Speaker: Web browsing, coding, running code in successful ways. So here's a little bit about my profile. I'm a professor at CMU, chief scientist at All Hands AI, building open source coding agents. I'm maintainer of OpenHands, which is an open source coding agent framework. And I'm also a software developer, and I like doing lots of coding and, you know, shipping new features and stuff like this.

[00:03:51] Speaker: So building agents that help me to do this, you know, is kind of an interesting thing, very close to me.

[00:03:57] Live Demo: Coding Agents in Action

[00:03:57] Speaker: So the first thing I'd like to do is I'd like to try [00:04:00] some things that I haven't actually tried before. If anybody has, you know, tried to give a live demo, you know, this is very, very scary whenever you do it and it might not work.

[00:04:09] Speaker: So it might not work this time either. But I want to show you like three things that I typically do with coding agents in my everyday work. I use coding agents maybe five to 10 times a day to help me solve my own problems. And so this is the first one. This is a data science task, which says: I want to create scatter plots that show the increase of the SWE-Bench score over time.

[00:04:34] Speaker: And so I wrote a kind of concrete prompt about this. Agents work better with somewhat concrete prompts. And I'm gonna throw this into OpenHands and let it work. And I'll go back to that in a second. Another thing that I do is I create new software. And I've been using a [00:05:00] service, a particular service,

[00:05:01] Speaker: I won't name it, for sending emails, and I'm not very happy with it. So I want to switch over to this new service called resend.com, which makes it easier to send emails. And so I'm going to ask it to read the docs for the resend.com API and come up with a script that allows me to send emails.
The input to the script should be a CSV file, and the subject and body should be provided in Jinja2 templates.

[00:05:24] Speaker: So I'll start another agent and try to get it to do that for me.

[00:05:35] Speaker: And let's go with the last one. The last one I do is improving existing software. You know, once you write software, you usually don't throw it away. You go in and, like, actually improve it iteratively. This software that I have is something I created without writing any code.

[00:05:52] Speaker: It's basically software to monitor how much our agents are contributing to the OpenHands repository. [00:06:00] And on the, let me make that a little bit bigger, on the left side, I have the number of issues where it like sent a pull request, whether it was merged in purple, closed in red, or is still open in green. And so these are like, you know, it's helping us monitor, but one thing it doesn't tell me is the total number. And I kind of want that feature added to this software.

[00:06:33] Speaker: So I'm going to try to add that too. So I'll take this prompt,

[00:06:46] Speaker: and here I want to open up specifically that GitHub repo. So I'll open up that repo and paste in the prompt asking it. I asked it to make a pie chart for each of these and give me the total over the entire time period that I'm [00:07:00] monitoring. So we'll do that. And so now, let's see, I have some agents.

[00:07:05] Speaker: Oh, this one already finished. Let's see. So this one already finished. You can see it finished analyzing the SWE-Bench repository. It wrote a demonstration of, yeah, I'm trying to do that now, actually.

[00:07:30] Speaker: It wrote a demonstration of how much each of the systems have improved over time. And I asked it to label the top three for each of the data sets. And so it labeled OpenHands as being the best one for SWE-Bench Normal. For SWE-Bench Verified, it has like the Amazon Q agent and OpenHands. For SWE-Bench Lite, it has three over here.

[00:07:53] Speaker: So you can see, that's pretty useful, right? If you're a researcher, you do data analysis all the time. I did it while I was talking to all [00:08:00] of you and making a presentation. So that's pretty nice. I doubt the other two are finished yet. That would be impressive if the, yeah. So I think they're still working.

[00:08:09] Speaker: So maybe we'll get back to them at the end of the presentation. But these are the kinds of things that I do every day with coding agents now, or software development agents. It's pretty impressive.

[00:08:20] Designing Effective Agents

[00:08:20] Speaker: The next thing I'd like to talk about a little bit is things I worry about when designing agents.

[00:08:24] Speaker: So we're designing agents to, you know, do a very difficult task of like navigating websites, writing code, other things like this. And within 2024, there's been like a huge improvement in the methodology that we use to do this. But there's a bunch of things we think about. There's a bunch of interesting papers, and I'd like to introduce a few of them.

[00:08:46] Speaker: So the first thing I worry about is the agent-computer interface. Like, how do we get an agent to interact with computers? And how do we provide agents with the tools to do the job?
And [00:09:00] within OpenHands we are doing the thing on the right, but there are also a lot of agents that do the thing on the left.

[00:09:05] Speaker: So the thing on the left is you give agents kind of granular tools. You give them tools like, let's say your instruction is: I want to determine the most cost-effective country to purchase the smartphone model, Kodak one; the countries to consider are the USA, Japan, Germany, and India. And you have a bunch of available APIs.

[00:09:26] Speaker: So what you do for some agents is you provide all of these APIs as tools that they can call. And so in this particular case, in order to solve this problem, you'd have to make about like 30 tool calls, right? You'd have to call lookup rates for Germany, you'd have to look it up for the US, Japan, and India.

[00:09:44] Speaker: That's four tool calls. And then you go through and do all of these things separately. And the method that we adopt in OpenHands instead is we provide these tools, but we provide them by just giving a coding agent the ability to call [00:10:00] arbitrary Python code. And in the arbitrary Python code, it can call these tools.

[00:10:05] Speaker: We expose these tools as APIs that the model can call. And what that allows us to do is, instead of writing 20 tool calls, making 20 LLM calls, you write a program that runs all of these all at once, and it gets the result. And of course it can execute that program. It can, you know, make a mistake. It can get errors back and fix things.

[00:10:23] Speaker: But that makes our job a lot easier. And this has been really like instrumental to our success, I think. Another part of this is: what tools does the agent need? And I think this depends on your use case; we're kind of extreme and we're only giving the agent five tools, or maybe six tools.

[00:10:40] Speaker: And what are they? The first one is program execution. So it can execute bash programs, and it can execute cells in Jupyter notebooks. So those are two tools. Another one is a file editing tool. And the file editing tool allows you to browse parts of files [00:11:00]

[00:11:00] Speaker: and kind of read them, overwrite them, other stuff like this. And then we have another global search and replace tool. So it's actually two tools for file editing. And then a final one is web browsing. I'm kind of cheating when I call it only one tool. You actually have like scroll and text input and click and other stuff like that.

[00:11:18] Speaker: But these are basically the only things we allow the agent to do. Then the question is, like, what if we wanted to allow it to do something else? And the answer is, well, you know, human programmers already have a bunch of things that they use. They have the requests PyPI library, they have the PDF-to-text PyPI library, they have, like, all these other libraries in the Python ecosystem that they could use.

[00:11:41] Speaker: And so if we provide a coding agent with all these libraries, it can do things like data visualization and other stuff that I just showed you. So it can also git clone repositories and other things like this. The agents are super good at using the GitHub API also. So they can do, you know, things on GitHub, like finding all of the, you know, [00:12:00] comments on your issues or checking GitHub actions and stuff.
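To make the contrast concrete, here is a minimal sketch of the code-as-action idea just described: the tool is exposed as a Python function, and the agent's single "action" is a small program that makes all the lookups in one pass. The function names and prices are hypothetical stand-ins, not OpenHands' actual tool interface:

```python
# Contrast sketch: instead of one LLM round-trip per granular tool call,
# a code-acting agent emits ONE program that calls the same tools in a loop.
# lookup_rate() and PRICE_USD are hypothetical stand-ins for the tool APIs.

PRICE_USD = {"USA": 999, "Japan": 940, "Germany": 1020, "India": 880}

def lookup_rate(country: str) -> float:
    """Pretend exchange-rate/tax multiplier; a real tool would hit an API."""
    return {"USA": 1.00, "Japan": 0.95, "Germany": 1.19, "India": 1.18}[country]

# The single "action" a code-acting agent might emit:
def cheapest_country(countries: list[str]) -> tuple[str, dict]:
    totals = {c: PRICE_USD[c] * lookup_rate(c) for c in countries}  # all lookups at once
    return min(totals, key=totals.get), totals

best, totals = cheapest_country(["USA", "Japan", "Germany", "India"])
print(best, totals)  # one execution result returns to the LLM, not 4+ round-trips
```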
[00:12:02] Speaker: The second thing I think about is the human-agent interface. So this is like, how do we get humans to interact with agents? I already showed you one variety of our human-agent interface. It's basically a chat window where you can browse through the agent's results and things like this. This is very, very difficult.

[00:12:18] Speaker: I don't think anybody has a good answer to this, and I don't think we have a good answer to this, but the guiding principles that I'm trying to follow are: we want to present enough info to the user. So we want to present them with, you know, what the agent is doing in the form of a kind of

[00:12:36] Speaker: English description. So you can see here, every time it takes an action, it says like, I will help you create a script for sending emails. When it runs a bash command, sorry, that's a little small, it will say "ran a bash command". It won't actually show you the whole bash command or the whole Jupyter notebook, because it can be really large, but you can open it up and see it if you [00:13:00] want to, by clicking on this.

[00:13:01] Speaker: So like if you want to explore more, you can click over to the Jupyter notebook and see what's displayed in the Jupyter notebook. And you get like lots and lots of information. So that's one thing.

[00:13:16] Speaker: Another thing is: go where the user is. So like if the user's already interacting in a particular setting, then I'd like to, you know, integrate into that setting, but only to a point. So at OpenHands, we have a chat UI for interaction. We have a GitHub plugin for tagging and resolving issues. So basically what you do is you @-mention the OpenHands agent, and the OpenHands agent will see that comment and be able to go in and fix things.

[00:13:42] Speaker: So if you say, @OpenHands agent, tests are failing on this PR, please fix the tests, it will go in and fix the tests for you and stuff like this. Another thing we have is a remote runtime for launching headless jobs. So if you want to launch like a fleet of agents to solve, you know, five different problems at once, you can also do [00:14:00] that through an API.

[00:14:00] Speaker: So we have these interfaces, and this probably depends on the use case. So like, depending, if you're a coding agent, you want to do things one way. If you're like an insurance auditing agent, you'll want to do things other ways, obviously.

[00:14:13] Choosing the Right Language Model for Agents

[00:14:13] Speaker: Another thing I think about a lot is choosing a language model.

[00:14:16] Speaker: And for agentic LMs we have to have a bunch of things work really well. The first thing is really, really good instruction following ability. And if you have really good instruction following ability, it opens up like a ton of possible applications for you. Tool use and coding ability. So if you provide tools, it needs to be able to use them well.

[00:14:38] Speaker: Environment understanding. So it needs, like, if you're building a web agent, it needs to be able to understand web pages, either through vision or through text. And error awareness and recovery ability. So, if it makes a mistake, it needs to be able to, you know, figure out why it made a mistake, come up with alternative strategies, and other things like this.

[00:14:58] Speaker: [00:15:00] Under the hood, in all of the demos that I did now, we're using Claude. Claude has all of these abilities: very good, not perfect, but very good. Most others don't have these abilities quite as much. So like GPT-4o doesn't have very good error recovery ability. And so because of this, it will go into loops and do the same thing over and over and over again.

[00:15:22] Speaker: Whereas Claude does not do this. Claude, if you use the agents enough, you get used to their kind of like personality. And Claude says, "Hmm, let me try a different approach" a lot. So, you know, obviously it's been trained in some way to, you know, elicit this ability. We did an evaluation. This is old,

[00:15:40] Speaker: and we need to update this basically, but we evaluated Claude, o1-mini, Llama 405B, and DeepSeek 2.5 on being a good code agent within our framework. And Claude was kind of head and shoulders above the rest. GPT-4o was kind of okay. The best open source model was Llama [00:16:00] 3.1 405B. This needs to be updated because this is like a few months old by now and, you know, things are moving really, really fast.

[00:16:05] Speaker: But I still am under the impression that Claude is the best. The other closed models are, you know, not quite as good. And then the open models are a little bit behind that. Grok, we haven't tried Grok at all, actually. So, it's a good question. If you want to try it, I'd be happy to help.

[00:16:24] Speaker: Cool.

[00:16:24] Planning and Workflow for Agents

[00:16:24] Speaker: Another thing is planning. And so there are a few considerations for planning. The first one is whether you have a curated plan or you have it generated on the fly. And so for solving GitHub issues, you can kind of have an overall plan. Like the plan is: first reproduce. If there's an issue, first write tests to reproduce the issue or to demonstrate the issue.

[00:16:50] Speaker: After that, run the tests and make sure they fail. Then go in and fix the tests. Run the tests again to make sure they pass, and then you're done. So that's like a pretty good workflow [00:17:00] for like solving coding issues. And you could curate that ahead of time. Another option is to let the language model basically generate its own plan.

[00:17:10] Speaker: And both of these are perfectly valid. Another one is explicit structure versus implicit structure. So let's say you generate a plan. If you have explicit structure, you could like write a multi-agent system, and the multi-agent system would have your reproducer agent, and then it would have your test writer agent, and your bug fixer agent, and lots of different agents, and you would explicitly write this all out in code, and then use it that way.

[00:17:38] Speaker: On the other hand, you could just provide a prompt that says, please do all of these things in order. So in OpenHands, we do very light planning. We have a single prompt. We don't have any multi-agent systems. But we do provide, like, instructions about, like, what to do first, what to do next, and other things like this.

[00:17:56] Speaker: I'm not against doing it the other way. But I laid [00:18:00] out some kind of justification for this in this blog called Don't Sleep on Single Agent Systems. And the basic idea behind this is: if you have a really, really good instruction-following agent, it will follow the instructions as long as things are working according to your plan.

[00:18:14] Speaker: But let's say you need to deviate from your plan; you still have the flexibility to do this. And if you do explicit structure through a multi-agent system, it becomes a lot harder to do that. Like, you get stuck when things deviate from your plan.
There are also some other examples, and I wanted to introduce a few papers. So one paper I liked recently is this paper called CoAct, where you generate plans and then go in and fix them. And so the basic idea is like, if you need to deviate from your plan, you can, you know, figure out that your plan was not working and go back and revise it.

[00:18:49] Speaker: Another thing I think about a lot is specifying common workflows. So we're trying to tackle software development, and I already showed like three use cases where we do [00:19:00] software development. And when we do software development, we do a ton of different things, but we do them over and over and over again.

[00:19:08] Speaker: So just to give an example, we fix GitHub actions when GitHub actions are failing. And we do that over and over and over again. That's not the number one thing that software engineers do, but it's, you know, high up on the list. So how can we get a list of all of, like, the workflows that people are working on?

[00:19:26] Speaker: And there are a few research works that people have done in this direction. One example is manual prompting. So there's this nice paper called SteP that got state of the art on the WebArena web navigation benchmark, where they came up with a bunch of manual workflows for solving different web navigation tasks.

[00:19:43] Speaker: And we also have a recent paper called Agent Workflow Memory, where the basic idea behind this is we want to create self-improving agents that learn from their past successes. And the way it works is, we have a memory that has an example of lots of the previous [00:20:00] workflows that people have used. And every time the agent finishes a task and it self-judges that it did a good job at that task, you take that task, you break it down into the individual workflows included in it, and then you put them back in the prompt for the agent to work with next time.

[00:20:16] Speaker: And we demonstrated that this leads to a 22.5 percent increase on WebArena after 40 examples. So that's a pretty, you know, huge increase by kind of self-learning and self-improvement.
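Here is a minimal sketch of that workflow-memory loop, assuming trivially simplified "induction" and self-judging steps (the real Agent Workflow Memory method is more involved):

```python
# Sketch of the Agent Workflow Memory loop: keep workflows distilled from
# self-judged successes, and prepend them to the prompt for future tasks.
# The "induction" and judging steps are deliberately trivial placeholders.

workflow_memory: list[str] = []

def induce_workflow(task: str, actions: list[str]) -> str:
    """Compress a successful trajectory into a reusable recipe (placeholder)."""
    return f"Task pattern: {task}\nSteps: " + " -> ".join(actions)

def record_episode(task: str, actions: list[str], self_judged_success: bool) -> None:
    if self_judged_success:  # only successes enter memory
        workflow_memory.append(induce_workflow(task, actions))

def build_prompt(new_task: str) -> str:
    recipes = "\n\n".join(workflow_memory[-5:])  # keep the prompt bounded
    return f"Known workflows:\n{recipes}\n\nNew task: {new_task}"

record_episode("fix failing GitHub Action",
               ["read CI logs", "reproduce locally", "patch code", "re-run tests"],
               self_judged_success=True)
print(build_prompt("fix failing GitHub Action in repo X"))
```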
Back to search: we're using a good agent that can recover from errors and try alternative things when things are not working properly, but we still only have a linear search path.[00:21:45] Speaker: But there's also some nice work in 2024 about exploring multiple paths. One example is a paper called Tree Search for Language Model Agents. They basically expand multiple paths, check whether the paths are going well, [00:22:00] and if they aren't going well, rewind back. On the web, this is kind of tricky, because how do you rewind when you accidentally ordered something you don't want on Amazon?[00:22:09] Speaker: It's not the easiest thing to do. For code, it's a little bit easier, because you can just revert any changes that you made. But I think that's an interesting topic, too.[00:22:21] Evaluation and Future Predictions for Agents[00:22:21] Speaker: And then finally, evaluation. Within our development, we want evaluation to do a number of things. The first one is fast sanity checks,[00:22:30] Speaker: and in order to do this, we want things we can run really fast and really cheaply. For web, we have MiniWoB (Mini World of Bits), which is basically a set of trivial web navigation tasks. For code, we have the Aider code editing benchmark, which is just about editing individual files.[00:22:48] Speaker: But we also want highly realistic evaluation. So for the web, we have WebArena, which we created at CMU. This is web navigation on real open source [00:23:00] websites, so open source websites that are actually used to serve shops or bulletin boards or other things like this.[00:23:07] Speaker: And for code, we use SWE-bench, which I think a lot of people may have heard of. It's basically a coding benchmark that comes from real-world pull requests on GitHub. So if you can solve those, you can also probably solve other real-world pull requests. I would say we still don't have benchmarks for the full versatility of agents.[00:23:25] Speaker: So, for example, we don't have benchmarks that test whether agents can code and do web navigation. But we're working on that and hoping to release something in the next week or two. So if that sounds interesting to you, come talk to me and I will tell you more about it.[00:23:42] Speaker: Cool. So I don't like making predictions, but I was told that I should be somewhat controversial, I guess, so I will try to do it anyway, although maybe none of these will be very controversial. The first thing is agent-oriented LLMs, large language models for [00:24:00] agents.[00:24:00] Speaker: My prediction is that every large LM trainer will be focusing on training models as agents, so every large language model will be a better agent model by mid-2025. Competition will increase, prices will go down, smaller models will become competitive as agents. Right now, agents are actually somewhat expensive to run in some cases, but I expect that won't last six months.[00:24:23] Speaker: I bet we'll have much better agent models in six months. Another thing is that instruction following ability, specifically in agentic contexts, will increase. And what that means is we'll have to do less manual engineering of agentic workflows and be able to do more by just prompting agents in more complex ways.[00:24:44] Speaker: Claude is already really good at this.
It's not perfect, but it's already really, really good, and I expect the other models will catch up to Claude pretty soon. Error correction ability will increase, with less getting stuck in loops. Again, this is something that Claude's already pretty good at, and I expect the others will follow.[00:25:00][00:25:01] Speaker: Agent benchmarks. Agent benchmarks will start saturating.[00:25:05] Speaker: SWE-bench and, I think, WebArena are already too easy. Not super easy, but already a bit too easy, because the tasks we do in there are ones that take like two minutes for a human. So not too hard. And kind of historically, in 2023, our benchmarks were too easy, so we built harder benchmarks: WebArena and SWE-bench were both built in 2023.[00:25:31] Future of Agent Development[00:25:31] Speaker: In 2024, our agents were too bad, so we built agents, and now we're building better agents. In 2025, our benchmarks will be too easy, so we'll build better benchmarks, I'm guessing. So I would expect to see much more challenging agent benchmarks come out, and we're already seeing some of them.[00:25:49] Speaker: In 2026, I don't know. I didn't write AGI, but we'll see.[00:25:56] Human-Agent Interaction Challenges[00:25:56] Speaker: Then the human-agent-computer interface. I think one thing that [00:26:00] we'll want to think about is, what do we do at a 75 percent success rate on things that we actually care about? Right now we have 53 or 55 percent on SWE-bench Verified, which is real-world GitHub PRs.[00:26:16] Speaker: My impression is that the actual ability of models is maybe closer to 30 to 40 percent. So 30 to 40 percent of the things that I want an agent to solve on my own repos, it just solves without any human intervention. 80 to 90 percent it can solve without me opening an IDE, but I need to give it feedback.[00:26:36] Speaker: So how do we make that interaction smooth, so that humans can audit the work of agents that are really, really good but not perfect? That is going to be a big challenge.[00:26:48] Expanding Agent Use Beyond Programming[00:26:48] Speaker: How can we expose the power of programming agents to other industries? As programmers, I think not all of us are using agents every day in our programming, although we probably will be [00:27:00] in months or maybe a year.[00:27:02] Speaker: But I think it will come very naturally to us as programmers, because we know code. We know how to architect software and stuff like that. So I think the question is, how do we put this in the hands of a lawyer or a chemist or somebody else, and have them also be able to interact with it as naturally as we can?[00:27:25] Redesigning Systems for Agent Efficiency[00:27:25] Speaker: Another interesting thing is, how can we redesign our existing systems for agents? We had a paper on API-based web agents, and basically what we showed is that if you take a web agent and the agent interacts not with a website but with APIs, the accuracy goes way up, just because APIs are way easier to interact with.[00:27:42] Speaker: And in fact, our agent is able to browse websites, but whenever I want it to interact with GitHub, I tell it: do not browse the GitHub website. Use the GitHub API, because it's way more successful at doing that.
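As a concrete illustration (my wording, not the exact OpenHands instruction), the steering hint plus the kind of call it points the agent toward can be as simple as the following; the endpoint used here is the public GitHub REST API.

```python
# Sketch: tell the agent to prefer the GitHub REST API over browsing,
# and show the call it should make instead of clicking around.

import json
import urllib.request

GITHUB_HINT = (
    "When a task involves GitHub, do NOT browse github.com. "
    "Use the GitHub REST API instead; it is far more reliable for agents."
)


def fetch_issue(repo: str, number: int) -> dict:
    """GET /repos/{owner}/{repo}/issues/{number} from the public API."""
    url = f"https://api.github.com/repos/{repo}/issues/{number}"
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example: fetch_issue("All-Hands-AI/OpenHands", 1) returns structured
# JSON (title, body, labels, ...) with no HTML parsing required.
```

The appeal is exactly what the talk describes: the API returns structured data in one request, where browsing requires many fragile click-and-read steps.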
So maybe, you know, every website is going to need to have [00:28:00] an API, because we're going to have agents interacting with them.[00:28:03] Accelerating Progress with Agent Technology[00:28:03] Speaker: About progress: I think progress will get faster. It's already fast, and a lot of people are already overwhelmed, but I think it will continue. The reason why is that agents are building agents, and better agents will build better agents faster. So I expect that, you know, if you haven't interacted with a coding agent yet, it's pretty magical, the stuff that it can do.[00:28:24] Speaker: So yeah.[00:28:28] Call to Action for Open Source Contributions[00:28:28] Speaker: And I have a call to action. Honestly, I've been working on natural language processing and language models for, what, 15 years now, and even for me, it's pretty impressive what AI agents powered by strong language models can do. On the other hand, I believe that we should really make these powerful tools accessible.[00:28:49] Speaker: And what I mean by this is, I don't think we should have these be opaque or limited to only a certain set of people. I feel like they should be [00:29:00] affordable. They shouldn't be increasing the difference in the amount of power that people have. If anything, I'd really like them to make it possible for people who weren't able to do things before to be able to do them well.[00:29:13] Speaker: Open source is one way to do that. That's why I'm working on open source. There are other ways to do that too: make things cheap, so you can serve them to people who aren't able to afford them easily. Duolingo is one example, where they get all the people in the US to pay them $20 a month so that they can give people in South America free language education, so they can learn English and become more attractive on the job market, for instance.[00:29:41] Speaker: And so I think we can all think of ways that we can do that sort of thing. And if that resonates with you, please contribute. Of course, I'd be happy if you contribute to OpenHands and use it. But another way you can do that is just use open source solutions, contribute to them, research with them, and train strong open source [00:30:00] models.[00:30:00] Speaker: So I see, you know, some people in the room who are already training models. It'd be great if you could train models for coding agents and make them cheap. And yeah, please. I was thinking about you among others. So yeah, that's all I have. Thanks.[00:30:20] Speaker 2: Slightly controversial is probably the nicest way to say hot takes. Any hot-takes questions, actual hot takes?[00:30:31] Speaker: Oh, I can also show the other agents that were working, if anybody's interested, but yeah, sorry, go ahead.[00:30:36] Q&A: Agent Performance and Benchmarks[00:30:36] Speaker 3: Yeah, I have a couple of questions; they're kind of paired, maybe. The first thing is that you said you're estimating that your agent is successfully resolving something like 30 to 40 percent of your issues, but that's below what you saw on SWE-bench.[00:30:52] Speaker 3: So I guess I'm wondering where that discrepancy is coming from.
And then I guess my second question, which is maybe broader in scope, is that, [00:31:00] if you think of an agent as a junior developer and I say, go do something, then I expect maybe tomorrow to get a Slack message like, "Hey, I ran into this issue.[00:31:10] Speaker 3: How can I resolve it?" And, like you said, your agent is successfully solving like 90 percent of issues where you give it direct feedback. So are you thinking about how to get the agent to reach out for planning help when it's stuck, or something like that? Or identify when it runs into a hole like that?[00:31:30] Speaker: Yeah, so these are great questions. Oh,[00:31:32] Speaker 3: sorry, the third question, which goes with the first two: if so, are you going to add a benchmark for that second question?[00:31:40] Speaker: Okay. Great. Yeah. Great questions. So the first question was, why do I think it's resolving less than 50 percent of the issues on SWE-bench?[00:31:48] Speaker: So first, SWE-bench is built on popular open source repos, and all of these popular open source repos were included in the training data for all of the language models. And so the language [00:32:00] models already know these repos. In some cases, the language models already know the individual issues in SWE-bench.[00:32:06] Speaker: So basically, some of the training data has leaked, and it definitely will overestimate ability with respect to that. I don't think it's horribly, horribly off, but I think it's boosting the accuracy by a little bit. So maybe that's the biggest reason why. In terms of asking for help, and whether we're benchmarking asking for help: yes, we are.[00:32:29] Speaker: One thing we're working on now, which we're hoping to put out soon, is that we basically made a version of SWE-bench with vague issues, like, "I'm having a problem with the matrix multiply. Please help." Because if you've ever run a popular open source framework, these are what half your issues are.[00:32:49] Speaker: Users show up and say, "My screen doesn't work. What's wrong?" or something, and then you need to ask them questions about how to reproduce it. So yeah, we're working on [00:33:00] that. My impression is that agents are not very good at asking for help, even Claude. When they ask for help, they'll ask for help when they don't need it,[00:33:11] Speaker: and then they won't ask for help when they do need it. So this is definitely an issue, I think.[00:33:20] Speaker 4: Thanks for the great talk. I also have two questions.[00:33:23] Q&A: Web Agents and Interaction Methods[00:33:23] Speaker 4: The first one: can you talk a bit more about how the web agent interacts with websites? Is there a VLM that looks at the web page layout, and then you parse the HTML and select which buttons to click on? And if so, do you think there's a future where, so I work at Bing at Microsoft AI, do you think there's a future with the same web index, but an agent-friendly web index, where all the processing is done offline, so that you don't need to spend time cleaning up the HTML and figuring out what to click online? Any thoughts on that?[00:33:57] Speaker: Yeah, so great question. There's a lot of work on web [00:34:00] agents.
I didn't go into all of the details, but there are three main ways that agents interact with websites. The first way is the simplest way and the newest way, but it doesn't work very well: you take a screenshot of the website and then click on a particular pixel value on the website.[00:34:23] Speaker: And models are not very good at that at the moment. They'll misclick. There was this thing about how Claude computer use started looking at pictures of Yellowstone National Park or something like this. I don't know if you heard about this anecdote, but people were like, "Oh, it's so human, it's looking for a vacation." And it was like, no, it probably just misclicked on the wrong pixels and accidentally clicked on an ad. So that's the simplest way. The second simplest way:[00:34:40] Speaker: you take the HTML and you basically identify elements in the HTML. You don't use any vision whatsoever, and then you say, okay, I want to click on this element, I want to enter text [00:35:00] in this element, or something like that. But HTML is too huge, so it usually gets condensed down into something called an accessibility tree, which was made for screen readers for visually impaired people.[00:35:18] Speaker: So that's another way. And the third way is kind of a hybrid, where you present the screenshot, but you also present a textual summary of the page. And that's the one that I think will probably work best. What we're using is just text at the moment, and that's just an implementation issue: we haven't implemented the visual stuff yet, but we're working on it now. Another thing I should point out is that we actually have two modalities for web browsing.[00:35:35] Speaker: We implemented this very recently. The reason why is that if you want to interact with full websites, you need the ability to click on all of the elements. But most of our work that we need websites for is just web browsing and gathering information,[00:35:50] Speaker: so we have another modality where we convert all of it to Markdown, because that's way more concise and easier for the agent to deal with. And then, [00:36:00] could we create an index specifically for agents? Maybe a Markdown index or something like that would make sense. Oh, and how would I make a successor to SWE-bench?[00:36:10] Speaker: The first thing is, there's LiveCodeBench, which is basically continuously updating to make sure it doesn't leak into language model training data. That's easy to do for SWE-bench, because it comes from real websites and those real websites are getting new issues all the time,[00:36:27] Speaker: so you could just do it on the same repositories that they have there. There's also a pretty large number of things covering various coding tasks. For example, SWE-bench is mainly about fixing issues, but there's also documentation, and there's generating tests that actually test the functionality that you want,[00:36:47] Speaker: and there was a paper by a student at CMU on generating tests and stuff like that. So I feel like
SWE-bench is one piece of the puzzle, but you could also have ten different other tasks, and then you could have a composite [00:37:00] benchmark where you test all of these abilities, not just that particular one. Well, lots of other things too, but...[00:37:11] Speaker 2: Question from across. Use your mic, it will help.[00:37:15] Speaker 5: Great talk. Thank you.[00:37:16] Q&A: Agent Architectures and Improvements[00:37:16] Speaker 5: My question is about your experience designing agent architectures. Specifically, how much do you have to separate concerns in terms of task-specific agents, versus having one agent do three or five things with a gigantic prompt with conditional paths and so on?[00:37:35] Speaker: Yeah, so that's a great question. So we have a basic coding and browsing agent. And I won't say basic; it's a good agent, but it does coding and browsing, and it has instructions about how to do coding and browsing. That is enough for most things, especially given a strong language model that has a lot of background knowledge about how to solve different types of tasks and how to use different APIs and stuff like that.[00:37:58] Speaker: We do have [00:38:00] a mechanism for something called microagents. Microagents are basically something that gets added to the prompt when a trigger is triggered. Right now it's very, very rudimentary: if you detect the word GitHub anywhere, you get instructions about how to interact with GitHub, like use the API and don't browse.[00:38:17] Speaker: Another one that I just added is for npm, the JavaScript package manager. When npm runs and hits a failure, it opens an interactive terminal prompt that says, "Would you like to quit? Enter yes." And that stalls our agent until the timeout, which is like two minutes.[00:38:36] Speaker: So I added a new microagent: whenever the agent starts using npm, it gets instructions about how to avoid interactive terminals and stuff like that. So that's our current solution. Honestly, I like it a lot. It's simple, it's easy to maintain, it works really well, and stuff like that. But I think there is a world where you would want something more complex than that.[00:38:55] Speaker 5: Got it. Thank you.[00:38:59] Speaker 6: I got a [00:39:00] question about MCP, the Anthropic Model Context Protocol. It seems like the most successful attempt at this kind of standardization of interactions between computers and agents. Are you guys adopting it? Is there any other competing standard? Any thoughts about it?[00:39:17] Speaker: Yeah, so the Anthropic MCP is essentially a collection of APIs that you can use to interact with different things on the internet. I think it's not a bad idea, but there are a few things that bug me a little bit about it.[00:39:40] Speaker: We already have an API for GitHub, so why do we need an MCP for GitHub? Right? You know, GitHub has an API, the GitHub API is evolving, and we can look up the GitHub API documentation. So it seems kind of duplicated a little bit. And also, they have a setting where [00:40:00] you have to spin up a server to serve your GitHub stuff,[00:40:04] Speaker: and you have to spin up a server to serve your other stuff.
And so I think it makes sense if you really care about separation of concerns and security and other things like this, but right now we haven't seen it have a lot more value than interacting directly with the tools that are already provided.[00:40:26] Speaker: And that kind of goes into my general philosophy, which is that we're already developing things for programmers. You know,[00:40:36] Speaker: how is an agent different from a programmer? And it is different, obviously; agents are different from programmers, but they're not that different at this point. So we can kind of interact with the interfaces we create for programmers. Yeah. I might change my mind later, though. So we'll see.[00:40:54] Speaker 7: Yeah. Hi. Thanks. Very interesting talk. You were saying that the agents you have right now [00:41:00] solve maybe 30 percent of your issues out of the gate. I'm curious about the things that it doesn't do. Is there a pattern that you observe, like, oh, these are the sorts of things that it just seems to really struggle with, or is it seemingly random?[00:41:15] Speaker: It's definitely not random. If a task is more complex, then, just intuitively, it's more likely to fail. I've gotten a bit better at prompting also. Just to give an example: it will sometimes fail to fix a GitHub workflow, because it will not look at the GitHub workflow and understand what the workflow is doing before it tries to solve the problem.[00:41:43] Speaker: So I think probably the biggest thing that it, that our agent plus Claude, fails at is insufficient information gathering before trying to solve the task. If you provide instructions that it should do information [00:42:00] gathering beforehand, it tends to do well.[00:42:01] Speaker: If you don't provide sufficient instructions, it will try to solve the task without fully understanding the task first, and then fail, and then you need to go back and give it additional feedback. Another example, and I love this example: while I was developing the monitor website that I showed here, we hit a really tricky bug where it was writing out a cache file to a different directory than it was reading the cache file from.[00:42:26] Speaker: And I had no idea what to do. I had no idea what was going on. I thought the bug was in a different part of the code. But what I asked it to do was come up with five possible reasons why this could be failing, in decreasing order of likelihood, and examine all of them. And that worked, and it could just go in and do that.[00:42:44] Speaker: So I think if a certain level of scaffolding about how it should sufficiently gather all the information necessary to solve a task is missing, then that's probably the biggest failure point at the moment. [00:43:00][00:43:01] Speaker 7: Thanks.[00:43:01] Speaker 6: Yeah.[00:43:06] Speaker 6: I'm just using this as a chance to ask you all my questions.[00:43:09] Q&A: Self-Improving Agents and Authentication[00:43:09] Speaker 6: You had a slide on here about self-improving agents with memory, or something like that. It's like a really throwaway slide for a super powerful idea. It got me thinking about how I would do it.
I have no idea how.[00:43:21] Speaker 6: So I just wanted you to chain a thought more on this.[00:43:25] Speaker: Yeah, self-improving. So I think the simplest possible way to create a self-improving agent is to have a really, really strong language model with infinite context, so it can just go back and look at all of its past experiences and, you know, learn from them.[00:43:46] Speaker: You might also want to remove the bad stuff, just so it doesn't over-index on its failed past experiences. But the problem is that a really powerful language model is large, infinite context is expensive, and we don't have a good way to [00:44:00] index into it, because RAG, at least in my experience, RAG from language to code doesn't work super well.[00:44:08] Speaker: So I think in the end, that's the way I would like to solve this problem: I'd like to have infinite context and somehow be able to index into it appropriately, and I think that would mostly solve it. Another thing you can do is fine-tuning. RAG is one way to get information into your model; fine-tuning is another way.[00:44:23] Speaker: So that might be another way of continuously improving: you identify when you did a good job and then just add all of the good examples into your model.[00:44:34] Speaker 6: Yeah. So, you know how Voyager tries to write code into a skill library, and then you reuse the skill library, right? So that it improves in the sense that it just builds up the skill library over time.[00:44:44] Speaker: Yep.[00:44:44] Speaker 6: One thing I was thinking about, and there's this idea from Devin, your arch-nemesis, of playbooks. I don't know if you've seen them.[00:44:52] Speaker: Yeah, I mean, we're calling them workflows, but they're simpler.[00:44:55] Speaker 6: Yeah, so, basically, once a workflow works, you can kind of [00:45:00] persist it into a skill library. Yeah. Right? I feel like there's some in-between there. Like you said, it's hard to do RAG between language and code, but I feel like that is RAG for, like, "I've done this before; last time I did it, this worked."[00:45:14] Speaker 6: So I'm just going to shortcut all the stuff that failed before.[00:45:18] Speaker: Yeah, I totally think it's possible. It's just, you know, not trivial at the same time. I'll explain the two curves. So basically, the baseline is just an agent that does it from scratch every time, and this curve up here is Agent Workflow Memory, where it's adding the successful experiences back into the prompt.[00:45:39] Speaker: Why is this one improving? The reason is just that it failed on the first few examples, and it took a little bit of time for the average to catch up. So it's not like this one is actually improving over time; you could basically view this one as constant, and then this one as improving.[00:45:56] Speaker: Like this, basically you can see it's continuing to go [00:46:00] up.[00:46:01] Speaker 8: How do you think we're going to solve the authentication problem for agents right now?[00:46:05] Speaker: When you say authentication, you mean credentials?[00:46:09] Speaker 8: Yeah.
Because I've seen a few startup solutions today, but it seems like they're limited in the number of websites or authentication methods that they're capable of handling today.[00:46:19] Speaker: Yeah, great question. So my preferred solution to this at the moment is GitHub's fine-grained authentication tokens. GitHub fine-grained authentication tokens allow you to specify, on a very granular basis, "on this repo, you have permission to do this; on this repo, you have permission to do that."[00:46:41] Speaker: You can also prevent people from pushing to the main branch unless they get approved, and you can do all of these other things. And I think these were all developed for human developers. Or, like, the branch protection rules were developed for human developers; the fine-grained authentication tokens were developed for GitHub apps.[00:46:56] Speaker: I think for GitHub, maybe [00:47:00] just pushing this a little bit further is the way to do this. Other services are totally not prepared to give that sort of fine-grained control; most APIs don't have something like a fine-grained authentication token. And that goes back to my comment that we're going to need to prepare the world for agents, I think.[00:47:17] Speaker: But I think the GitHub authentication tokens are a good template for how you could start doing that, maybe. But yeah, I don't have a complete answer.[00:47:25] Speaker 8: I'll let you know if I find one.[00:47:26] Speaker: Okay. Yeah.[00:47:31] Live Demonstration and Closing Remarks[00:47:31] Speaker: I'm going to finish up. Let me just see.[00:47:37] Speaker: Okay. So this one did write a script. I'm not going to actually read it for you. And then the other one, let's see.[00:47:51] Speaker: Yeah. So it sent a PR. Sorry, what is the PR URL?[00:48:00][00:48:02] Speaker: So I don't know if this, sorry, that's taking way longer than it should. Okay, cool. Yeah. So this one sent a PR. I'll tell you later if this actually, oh no, it's deployed on Vercel, so I can actually show you. Let me try this real quick. Sorry. I know I don't have time.[00:48:24] Speaker: Yeah, there you go. I have pie charts now. It's so fun to play with these things, because you can just do that while I'm giving a talk and things like that. So, yeah, thanks. Get full access to Latent Space at www.latent.space/subscribe

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all our LS supporters who helped fund the gorgeous venue and A/V production!

For NeurIPS last year we did our standard conference podcast coverage, interviewing selected papers (which we have now also done for ICLR and ICML); however, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap the 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver. Today, we're proud to share Loubna's highly anticipated talk (slides here)!

Synthetic Data

We called out the synthetic data debate at last year's NeurIPS, and no surprise that 2024 was dominated by the rise of synthetic data everywhere:

* Apple's Rephrasing the Web, Microsoft's Phi 2-4 and Orca/AgentInstruct, Tencent's Billion Persona dataset, DCLM, HuggingFace's FineWeb-Edu, and Loubna's own Cosmopedia extended the ideas of synthetic textbook and agent generation to improve raw web scrape dataset quality
* This year we also talked to the IDEFICS/OBELICS team at HuggingFace, who released WebSight this year, the first work on code-vs-images synthetic data.
* We called Llama 3.1 the Synthetic Data Model for its extensive use (and documentation!) of synthetic data in its pipeline, as well as its permissive license.
* Nemotron-CC and Nemotron-4-340B also made a big splash this year for how they used 20k items of human data to synthesize over 98% of the data used for SFT/PFT.
* Cohere introduced Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress, observing gains of up to 56.5% improvement in win rates comparing multiple teachers vs the single best teacher model.
* In post-training, AI2's Tülu 3 (discussed by Luca in our Open Models talk) and Loubna's Smol Talk were also notable open releases this year.

This comes in the face of a lot of scrutiny and criticism, with Scale AI as one of the leading voices, publishing "AI models collapse when trained on recursively generated data" in Nature magazine, bringing mainstream concerns about the potential downsides of poor-quality synthetic data. Part of the concerns we highlighted last year on low-background tokens are coming to bear: ChatGPT-contaminated data is spiking in every possible metric. But perhaps, if Sakana's AI Scientist pans out this year, we will have mostly-AI AI researchers publishing AI research anyway, so do we really care, as long as the ideas can be verified to be correct?

Smol Models

Meta surprised many folks this year by not just aggressively updating Llama 3 and adding multimodality, but also adding a new series of "small" 1B and 3B "on device" models this year, even working on quantized numerics collaborations with Qualcomm, Mediatek, and Arm. It is near unbelievable that a 1B model today can qualitatively match a 13B model of last year, and the minimum size to hit a given MMLU bar has come down roughly 10x in the last year.
We have been tracking this, proxied by LMSYS Elo and inference price. The key reads this year are:

* MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
* Apple Intelligence Foundation Language Models
* Hymba: A Hybrid-head Architecture for Small Language Models
* Loubna's SmolLM and SmolLM2: a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters on the pareto efficiency frontier
* and Moondream, which we already covered in the 2024 in Vision talk

Full Talk on YouTube: please like and subscribe!

Timestamps

* [00:00:05] Loubna Intro
* [00:00:33] The Rise of Synthetic Data Everywhere
* [00:02:57] Model Collapse
* [00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks
* [00:12:36] DCLM, Nemotron-CC
* [00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage
* [00:16:17] Smol Models
* [00:18:24] On Device Models
* [00:22:45] Smol Vision Models
* [00:25:14] What's Next

Transcript: 2024 in Synthetic Data and Smol Models

[00:00:05] Loubna Intro[00:00:05] Speaker: I'm very happy to be here. Thank you for the invitation. So I'm going to be talking about synthetic data in 2024, and then I'm going to be talking about small on-device models. I think the most interesting thing about synthetic data this year is that now we have it everywhere in the large language model pipeline.[00:00:33] The Rise of Synthetic Data Everywhere[00:00:33] Speaker: I think initially, synthetic data was mainly used just for post-training, because naturally that's the part where we needed human annotators. And then after that, we realized that we don't really have good benchmarks to [00:01:00] measure if models follow instructions well, if they are creative enough, or if they are chatty enough, so we also started using LLMs as judges.[00:01:08] Speaker: And I think this year, and towards the end of last year, we also went to the pre-training part, and we started generating synthetic data for pre-training to kind of replace some parts of the web. The motivation behind that is that you have a lot of control over synthetic data: you can control your prompt and basically also the kind of data that you generate.[00:01:28] Speaker: So instead of just trying to filter the web, you could try to get the LLM to generate what you think the best web pages could look like and then train your models on that. So this is how we went from not having synthetic data at all in the LLM pipeline to having it everywhere. And the cool thing is that today you can train an LLM with an entirely synthetic pipeline.[00:01:49] Speaker: For example, you can use our Cosmopedia datasets and train a 1B model on 150 billion tokens that are 100 percent synthetic, and those are also of good quality. Then you can [00:02:00] instruction-tune the model on a synthetic SFT dataset. You can also do DPO on a synthetic dataset. And then to evaluate if the model is good, you can use a benchmark that uses LLMs as a judge, for example MT-Bench or AlpacaEval. So I think this is really mind-blowing, because just a few years ago we wouldn't have thought this was possible. And I think there are a lot of concerns about model collapse; I'm going to talk about that later, but we'll see that if we use synthetic data properly and curate it carefully, that shouldn't happen.[00:02:29] Speaker: And the reason synthetic data is very popular right now is that we have really strong models, both open and closed.
It is really cheap and fast to use compared to human annotations, which cost a lot and take a lot of time. And also, for open models right now, we have some really good inference frameworks,[00:02:47] Speaker: so if you have enough GPUs, it's really easy to spin them up and generate a lot of synthetic data. Some examples are vLLM, TGI, and TensorRT-LLM.[00:02:57] Model Collapse[00:02:57] Speaker: Now let's talk about the elephant in the room: model [00:03:00] collapse. Is this the end? If you look at the media, and for example some papers in Nature, it's really scary, because there's a lot of synthetic data out there on the web,[00:03:09] Speaker: and naturally we train on the web, so we're going to be training on a lot of synthetic data. If model collapse is going to happen, we should really try to take that seriously. And the other issue is that, as I said, a lot of people think the web is polluted because there's a lot of synthetic data.[00:03:24] Speaker: For example, when we were building the FineWeb dataset here, Guilherme and Hynek were interested in how much synthetic data there is in the web. There isn't really a method to properly measure the amount of synthetic data, or to say whether a web page is synthetic or not. But one thing we can do is look for proxy words, for example expressions like "as a large language model" or words like "delve" that we know are actually generated by ChatGPT.[00:03:49] Speaker: We can measure the ratio of these words in our datasets and compare it to previous years. For example, here we measured this ratio in different dumps of Common Crawl, [00:04:00] and we can see that the ratio really increased after ChatGPT's release. If the amount of synthetic data hadn't changed, you would expect this ratio to stay constant, which is not the case.[00:04:11] Speaker: So there's probably a lot of synthetic data on the web. But does this really make models worse? What we did is we trained different models on these different dumps, then computed their performance on popular NLP benchmarks and computed the aggregated score. And surprisingly, you can see that the latest dumps are actually even better than the dumps that came before.[00:04:31] Speaker: So if there's some synthetic data there, at least it did not make the models worse. Yeah, which is really encouraging. So personally, I wouldn't say the web is polluted with synthetic data; maybe it's even making it richer. And the issue with the model collapse studies is that, for example, they were done at a small scale: you would ask the model to complete, say, a Wikipedia paragraph, then train it on these new generations, and do that iteratively.[00:04:56] Speaker: I think if you take that approach, it's normal to [00:05:00] observe this kind of behavior, because the quality is going to be worse: the model is already small, and if you train it just on its own generations, you shouldn't expect it to become better. But what we're really doing here is taking a model that is very large and trying to distill its knowledge into a model that is smaller.[00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks[00:05:14] Speaker: And in this way, you can expect to get better performance for your small model. Using synthetic data for pre-training became really popular after the Textbooks Are All You Need paper, where Microsoft basically trained a series of small models on textbooks that were generated using a large LLM.[00:05:32] Speaker: And they found that these models were actually better than models that are much larger. So this was really interesting. It was the first of its kind, but it was also met with a lot of skepticism, which is a good thing in research. It pushes you to question things, because the dataset that they trained on was not public, so people were not really sure if these models are really good or if maybe there's just some data contamination.[00:05:55] Speaker: So it was really hard to check if you just have the weights of the models.
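As a rough sketch of the proxy-word measurement described above (the phrase list and the `load_dump` loader are illustrative assumptions, not the exact FineWeb methodology), the check can be as simple as:

```python
# Count how often ChatGPT-flavored phrases appear per document in
# different Common Crawl dumps, as a proxy for synthetic data volume.

import re

PROXY_PATTERNS = [
    r"as a large language model",
    r"as an ai language model",
    r"\bdelve\b",
]
REGEX = re.compile("|".join(PROXY_PATTERNS), flags=re.IGNORECASE)


def proxy_word_ratio(documents: list[str]) -> float:
    """Fraction of documents containing at least one proxy phrase."""
    hits = sum(1 for doc in documents if REGEX.search(doc))
    return hits / max(len(documents), 1)


# Usage: compare dumps before and after ChatGPT's release, e.g.
# proxy_word_ratio(load_dump("CC-MAIN-2021-21")) vs
# proxy_word_ratio(load_dump("CC-MAIN-2024-10")), where load_dump is
# a hypothetical loader returning a list of page texts.
```

A flat ratio across dumps would suggest the synthetic share of the web is stable; a jump after late 2022 suggests the opposite, which is what the talk reports.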
After the textbooks are all you need papers where Microsoft basically trained a series of small models on textbooks that were using a large LLM.[00:05:32] Speaker: And then they found that these models were actually better than models that are much larger. So this was really interesting. It was like first of its time, but it was also met with a lot of skepticism, which is a good thing in research. It pushes you to question things because the dataset that they trained on was not public, so people were not really sure if these models are really good or maybe there's just some data contamination.[00:05:55] Speaker: So it was really hard to check if you just have the weights of the models. [00:06:00] And as Hugging Face, because we like open source, we tried to reproduce what they did. So this is our Cosmopedia dataset. We basically tried to follow a similar approach to what they documented in the paper. And we created a synthetic dataset of textbooks and blog posts and stories that had almost 30 billion tokens.[00:06:16] Speaker: And we tried to train some models on that. And we found that like the key ingredient to getting a good data set that is synthetic is trying as much as possible to keep it diverse. Because if you just throw the same prompts as your model, like generate like a textbook about linear algebra, and even if you change the temperature, the textbooks are going to look alike.[00:06:35] Speaker: So there's no way you could scale to like millions of samples. And the way you do that is by creating prompts that have some seeds that make them diverse. In our case, the prompt, we would ask the model to generate a textbook, but make it related to an extract from a webpage. And also we try to frame it within, to stay within topic.[00:06:55] Speaker: For example, here, we put like an extract about cardiovascular bioimaging, [00:07:00] and then we ask the model to generate a textbook related to medicine that is also related to this webpage. And this is a really nice approach because there's so many webpages out there. So you can. Be sure that your generation is not going to be diverse when you change the seed example.[00:07:16] Speaker: One thing that's challenging with this is that you want the seed samples to be related to your topics. So we use like a search tool to try to go all of fine web datasets. And then we also do a lot of experiments with the type of generations we want the model to generate. For example, we ask it for textbooks for middle school students or textbook for college.[00:07:40] Speaker: And we found that like some generation styles help on some specific benchmarks, while others help on other benchmarks. For example, college textbooks are really good for MMLU, while middle school textbooks are good for benchmarks like OpenBookQA and Pico. This is like a sample from like our search tool.[00:07:56] Speaker: For example, you have a top category, which is a topic, and then you have some [00:08:00] subtopics, and then you have the topic hits, which are basically the web pages in fine web does belong to these topics. And here you can see the comparison between Cosmopedia. We had two versions V1 and V2 in blue and red, and you can see the comparison to fine web, and as you can see throughout the training training on Cosmopedia was consistently better.[00:08:20] Speaker: So we managed to get a data set that was actually good to train these models on. 
Cosmopedia is of course much smaller than FineWeb, only 30 billion tokens, but that's the scale that Microsoft's datasets were, so we kind of managed to reproduce a bit of what they did. And the dataset is public, so everyone can go there and check that everything is all right.[00:08:38] Speaker: Now, this is a recent paper from NVIDIA, Nemotron-CC. They took things a bit further, and they generated not a few billion tokens, but 1.9 trillion tokens, which is huge, and we can see later how they did that. It's more of a rephrasing of the web. So we can see today that there are some really huge synthetic datasets out there, and they're public, so [00:09:00] you can try to filter them even further if you want to get more high-quality corpora.[00:09:04] Speaker: This rephrasing-the-web approach was suggested in a paper by Pratyush, where basically they take some samples from the C4 dataset and then use an LLM to rewrite these samples into a better format. For example, they ask an LLM to rewrite the sample into a Wikipedia passage or into a Q&A page.[00:09:25] Speaker: The interesting thing in this approach is that you can use a model that is small, because rewriting doesn't require knowledge; it's just rewriting a page into a different style. So the model doesn't need to have extensive knowledge of what it's rewriting, compared to just asking a model to generate a new textbook without giving it ground truth.[00:09:45] Speaker: So here they rewrite some samples from C4 into Q&A and into Wikipedia style, and they find that doing this works better than training just on C4. And what they did in Nemotron-CC is a similar approach. [00:10:00] They rewrite some pages from Common Crawl for two reasons. One is to improve pages that are low quality, so they rewrite them into, for example, a Wikipedia-style page, so they look better.[00:10:11] Speaker: The other reason is to create more diverse datasets. So they take a dataset that they already heavily filtered, take the pages that are already high quality, and ask the model to rewrite them in question-and-answer format, into open-ended questions or multiple-choice questions.[00:10:27] Speaker: This way they can reuse the same page multiple times without fearing having multiple duplicates, because it's the same information, but it's going to be written differently. So I think that's also a really interesting approach for generating synthetic data just by rephrasing the pages that you already have.[00:10:44] Speaker: There's also an approach called ProX, where they try to start from a web page and then generate a program which figures out how to rewrite that page to make it better and less noisy. For example, here you can see that there's some leftover metadata in the web page, and you don't necessarily want to keep that for training [00:11:00] your model.[00:11:00] Speaker: So they train a model that can generate programs that can normalize and remove lines that are extraneous. I think this approach is also interesting, but it's maybe less scalable than the approaches I presented before. So that was it for rephrasing and generating new textbooks.[00:11:17] Speaker: Another approach that I think is really good, and is becoming really popular for using synthetic data for pre-training, is building better classifiers for filtering the web. For example, here we released a dataset called FineWeb-Edu.
And the way we built it is by taking Llama 3 and asking it to rate the educational content of web pages from zero to five.[00:11:39] Speaker: So for example, if a page is like a really good textbook that could be useful in a school setting, it would get a really high score, and if a page is just an advertisement or promotional material, it would get a lower score. And then after that, we take these synthetic annotations and we train a classifier on them,[00:11:57] Speaker: a classifier like a BERT model. [00:12:00] And then we run this classifier on all of FineWeb, which is a 15 trillion token dataset, and we only keep the pages that have a score higher than 3. So for example, in our case, we went from 15 trillion tokens to just 1.5 trillion tokens that are really highly educational.[00:12:16] Speaker: And as you can see here, FineWeb-Edu outperforms all the other public web datasets by a large margin on a couple of benchmarks; here I show the aggregated score, and you can see that this approach is really effective for filtering web datasets to get better corpora for training your LLMs.[00:12:36] DCLM, Nemotron-CC[00:12:36] Speaker: Others have also tried this approach. There's, for example, the DCLM dataset, where they also trained a classifier, but not to detect educational content. Instead, they trained it on the OpenHermes dataset, which is a dataset for instruction tuning, and also on the ExplainLikeImFive (ELI5) subreddit, and they also get a really high quality dataset which is very information-dense and can help [00:13:00] you train some really good LLMs.[00:13:01] Speaker: And then for Nemotron-CC on Common Crawl, they also took this approach, but instead of using one classifier, they used an ensemble of classifiers. So they used, for example, the DCLM classifier, and also classifiers like the one we used in FineWeb-Edu, and then they combined these scores with an ensemble method to only retain the best high-quality pages, and they get a dataset that works even better than the ones we developed.[00:13:25] Speaker: So that was it for synthetic data for pre-training.[00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage[00:13:28] Speaker: Now we can go back to post-training. I think there are a lot of interesting post-training datasets out there. One that was released recently is AgentInstruct by Microsoft, where they basically try to target some specific skills and improve the performance of models on them.[00:13:43] Speaker: For example, here you can see code, brain teasers, and open-domain QA, and they managed to get a dataset such that, when fine-tuning Mistral 7B on it, it outperforms the original instruct model that was released by Mistral. And as I said, to get good synthetic data, you really [00:14:00] have to have a framework to make sure that your data is diverse.
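Stepping back for a moment, here is a condensed sketch of the FineWeb-Edu classifier recipe described a little earlier: an LLM scores a sample of pages from 0 to 5, a small regressor on top of an embedding model learns to imitate those scores, and the corpus is filtered at a threshold of 3. The model names and the regression head here are simplifications, not the released classifier.

```python
# Sketch: distill LLM educational-value ratings into a cheap
# classifier, then use it to filter a huge web corpus.

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge


def train_edu_classifier(pages: list[str], llm_scores: list[float]):
    """Fit a small head on embeddings to imitate the 0-5 LLM ratings."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    X = embedder.encode(pages)
    head = Ridge().fit(X, llm_scores)
    return embedder, head


def filter_corpus(pages: list[str], embedder, head, threshold: float = 3.0):
    """Keep only pages the distilled classifier rates as educational."""
    scores = head.predict(embedder.encode(pages))
    return [p for p, s in zip(pages, scores) if s >= threshold]
```

The point of the distillation step is cost: the LLM only has to rate a sample, while the cheap classifier can then be run over all 15 trillion tokens.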
And for example, in the tool mixture to generate like a new code snippet, they would give like the model persona, for example, a machine learning researcher interested in neural networks, and then ask it to generate like a coding problem.[00:14:49] Speaker: This way you make sure that your data set is really diverse, and then you can further filter the data sets, for example, using the reward models. We also released a dataset called Smalltalk, [00:15:00] and we also tried to cover the wide range of tasks, and as you can see here, for example, when fine tuning Mistral 7b on the dataset, we also outperformed the original Mistral instructs on a number of benchmarks, notably on mathematics and instruction following with ifevil.[00:15:18] Speaker: Another paper that's really interesting I wanted to mention is this one called Multilingual Data Arbitrage by Cohere. And basically they want to generate a data set for post training that is multilingual. And they have a really interesting problem. It's the fact that there isn't like one model that's really good at all the languages they wanted.[00:15:36] Speaker: So what they do is that like they use not just one teacher model, but multiple teachers. And then they have a router which basically sends the prompts they have to all these models. And then they get the completions and they have a reward model that traces all these generations and only keeps the best one.[00:15:52] Speaker: And this is like arbitrage and finance. So well, I think what's interesting in this, it shows that like synthetic data, it doesn't have to come from a single model. [00:16:00] And because we have so many good models now, you could like pull these models together and get like a dataset that's really high quality and that's diverse and that's covers all your needs.[00:16:12] Speaker: I was supposed to put a meme there, but. Yeah, so that was it for like a synthetic data.[00:16:17] Smol Models[00:16:17] Speaker: Now we can go to see what's happening in the small models field in 2024. I don't know if you know, but like now we have some really good small models. For example, Lama 3. 2 1B is. It matches Lama 2. 13b from, that was released last year on the LMSYS arena, which is basically the default go to leaderboard for evaluating models using human evaluation.[00:16:39] Speaker: And as you can see here, the scores of the models are really close. So I think we've made like hugely forward in terms of small models. Of course, that's one, just one data point, but there's more. For example, if you look at this chart from the Quint 2. 5 blog post, it shows that today we have some really good models that are only like 3 billion parameters [00:17:00] and 4 billion that score really high on MMLU.[00:17:03] Speaker: Which is a really popular benchmark for evaluating models. And you can see here that the red, the blue dots have more than 65 on MMLU. And the grey ones have less. And for example, Llama33b had less. So now we have a 3b model that outperforms a 33b model that was released earlier. So I think now people are starting to realize that like, we shouldn't just scale and scale models, but we should try to make them more efficient.[00:17:33] Speaker: I don't know if you knew, but you can also chat with a 3B plus model on your iPhone. For example, here, this is an app called PocketPal, where you can go and select a model from Hugging Face. It has a large choice. For example, here we loaded the 5. 3. 5, which is 3. 8 billion parameters on this iPhone. 
And we can chat with this and you can see that even the latency is also acceptable.[00:17:57] Speaker: For example, here, I asked it to give me a joke about [00:18:00] NeurIPS. So let's see what it has to say.[00:18:06] Speaker: Okay, why did the neural network attend NeurIPS? Because it heard there would be a lot of layers and fun and it wanted to train its sense of humor. So not very funny, but at least it can run on device. Yeah, so I think now we have good small models, but we also have like good frameworks and tools to use these small models.[00:18:24] On Device Models[00:18:24] Speaker: So I think we're really close to having like really on edge and on device models that are really good. And I think for a while we've had this narrative. But just training larger models is better. Of course, this is supported by science scaling laws. As you can see here, for example, when we scale the model size, the loss is lower and obviously you get a better model.[00:18:46] Speaker: But and we can see this, for example, in the GPT family of models, how we went from just a hundred million parameters to more than a trillion. parameters. And of course, we all observed the performance improvement when using the latest model. But [00:19:00] one thing that we shouldn't forget is that when we scale the model, we also scale the inference costs and time.[00:19:05] Speaker: And so the largest models were are going to cost so much more. So I think now instead of just building larger models, we should be focusing on building more efficient models. It's no longer a race for the largest models since these models are really expensive to run and they require like a really good infrastructure to do that and they cannot run on, for example, consumer hardware.[00:19:27] Speaker: And when you try to build more efficient models that match larger models, that's when you can really unlock some really interesting on device use cases. And I think a trend that we're noticing now is the trend of training smaller models longer. For example, if you compare how much, how long LLAMA was trained compared to LLAMA3, there is a huge increase in the pre training length.[00:19:50] Speaker: LLAMA was trained on 1 trillion tokens, but LLAMA3 8b was trained on 15 trillion tokens. So Meta managed to get a model that's the same size, but But it performs so much [00:20:00] better by choosing to like spend the sacrifice during training, because as we know, training is a one time cost, but inference is something that's ongoing.[00:20:08] Speaker: If we want to see what are like the small models reads in 2024, I think this mobile LLM paper by Meta is interesting. They try to study different models that are like have the less than 1 billion parameters and find which architecture makes most sense for these models. For example, they find that depth is more important than width.[00:20:29] Speaker: So it's more important to have models that have like more layers than just one. making them more wide. They also find that GQA helps, that tying the embedding helps. So I think it's a nice study overall for models that are just a few hundred million parameters. There's also the Apple intelligence tech report, which is interesting.[00:20:48] Speaker: So for Apple intelligence, they had two models, one that was like on server and another model that was on device. It had 3 billion parameters. And I think the interesting part is that they trained this model using [00:21:00] pruning. And then distillation. 
[00:20:08] Speaker: If we want to see what the small model trends were in 2024, I think this MobileLLM paper by Meta is interesting. They study models that have less than 1 billion parameters and try to find which architecture makes the most sense for these models. For example, they find that depth is more important than width,
[00:20:29] Speaker: so it's more important to have models with more layers than to make them wider. They also find that GQA helps and that tying the embeddings helps. So I think it's a nice study overall for models that are just a few hundred million parameters. There's also the Apple Intelligence tech report, which is interesting.
[00:20:48] Speaker: For Apple Intelligence, they had two models: one that ran on server, and another on-device model with 3 billion parameters. And I think the interesting part is that they trained this model using [00:21:00] pruning and then distillation. For example, they have this table where they show that using pruning and distillation works much better than training from scratch.
[00:21:08] Speaker: They also have some interesting insights about how they specialize their models for specific tasks, for example summarization and rewriting. There's also this paper by NVIDIA that was released recently; I think you've already had a talk about hybrid models, which was also interesting.
[00:21:23] Speaker: In this model, they used a hybrid architecture between state space models and transformers, and they managed to train a 1B model that's really performant without needing to train it on a lot of tokens. And regarding our work, we just recently released SmolLM2, a series of three models which are the best in class at each model size.
[00:21:46] Speaker: For example, our 1.7B model outperforms Llama 3.2 1B and also Qwen 2.5. How we managed to train this model is the following: we spent a lot of time trying to curate the pre-training datasets. We did a lot of [00:22:00] ablations, trying to find which datasets are good and also how to mix them. We also created some new math and code datasets that we're releasing soon.
[00:22:08] Speaker: We basically spent a lot of time trying to find the best mixture to train these models on. And we also trained these models for very long: for example, SmolLM1 was trained on only 1 trillion tokens, but this model is trained on 11 trillion tokens.
[00:22:24] Speaker: And we saw that the performance kept improving. The models didn't really plateau mid-training, which I think is really interesting: it shows that you can train such small models for very long and keep getting performance gains. What's interesting about SmolLM2 is that it's fully open: we also released the pre-training code base, the fine-tuning code, the datasets, and the evaluations in this repository.
[00:22:45] Smol Vision Models
[00:22:45] Speaker: There are also really interesting small models not just for text but also for vision. For example, here you can see SmolVLM, which is a 2B model that's really efficient: it doesn't consume a lot of RAM, and it also has good performance. There's also Moondream [00:23:00] 0.5B, which was released recently; it's the smallest visual language model.
[00:23:04] Speaker: And as you can see, there isn't a big trade-off compared to Moondream 2B. So now I've shown you that we have some really good small models, and we also have the tools to use them. But why should you consider using small models, and when? I think small models are really interesting because of the on-device story.
[00:23:23] Speaker: Because these models are small and they can run fast, you can basically run them on your laptop, but also on your mobile phone. And this means that your data stays local: you don't have to send your queries to third parties, and that really enhances privacy. That was, for example, one of the big selling points for Apple Intelligence.
[00:23:42] Speaker: Also, right now we have a lot of good tools for on-device inference: for example, there's MLX, MLC, llama.cpp, Transformers.js. So we have a lot of options, and each of them has great features.
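If you want to try one of these small models on your own machine, here is a minimal sketch using the Hugging Face transformers library. The checkpoint id follows the published SmolLM2 naming but should be verified on the Hub; the same few lines work with any small instruct model.

```python
# Minimal local chat with a small instruct model using Hugging Face transformers.
# The checkpoint id is assumed from the SmolLM2 release -- verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{"role": "user", "content": "Tell me a joke about NeurIPS."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the model's reply.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```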
Small models are also really powerful if you choose to specialize them.
[00:24:00] Speaker: For example, here there's a startup called NuMind, which took SmolLM and fine-tuned it on text extraction datasets, and they managed to get a model that's not very far from models that are much larger. So I think text extraction is one use case where small models can be really performant, and it makes sense to use them instead of much larger models.
[00:24:19] Speaker: You can also chat with these models in the browser. For example, here, you can go there, load the model, even turn off your internet, and just start chatting with the model locally. Speaking of text extraction, if you don't want to fine-tune the models, there's a really good method called structured generation.
[00:24:36] Speaker: You can basically force the models to follow a JSON schema that you define (there's a code sketch of this at the end of the talk). For example, here we try to force the model to follow a schema for extracting key information from GitHub issues. So you can input free text, which is a complaint about a GitHub repository, something not working, then you run it, and the model extracts everything that's relevant for creating your GitHub issue:
[00:24:58] Speaker: for example the [00:25:00] priority (here it's high), the type of the issue (a bug), a title, and an estimate of how long it will take to fix. And you can just do this in the browser and transform your text into a properly formatted GitHub issue.
[00:25:14] What's Next
[00:25:14] Speaker: So what's next for synthetic data and small models?
[00:25:18] Speaker: I think domain-specific synthetic data is already important and is going to be even more important. For example, generating synthetic data for math: I think this would really help improve the reasoning of a lot of models, and a lot of people are doing it, for example Qwen 2.5 Math, and everyone's trying to reproduce o1.
[00:25:37] Speaker: So for synthetic data, trying to specialize it on some domains is going to be really important. And then for small models, I think specializing them through fine-tuning is also going to be really important, because a lot of companies are just trying to use these large models because they are better.
[00:25:53] Speaker: But on some tasks, you can already get decent performance with small models, so you don't need to pay a [00:26:00] cost that's much larger just to make your model better at your task by a few percent. And this is not just for text; I think it also applies to other modalities like vision and audio.
[00:26:11] Speaker: And I think you should also watch out for on-device frameworks and applications. For example, the app I showed, or Ollama: all these frameworks are becoming really popular, and I'm pretty sure that we're going to get more of them in 2025. And users really like that. Maybe I should also give a hot take.
[00:26:28] Speaker: I think that in AI we started with fine-tuning, for example trying to make BERT work on specific use cases, and really struggling to do that. Then we got models that are much larger, so we switched to prompt engineering to get the models to do our tasks. And I think we're going back to fine-tuning, where we realize these models are really costly.
[00:26:47] Speaker: It's better to just use a small model or try to specialize it. So I think it's a bit of a cycle, and we're going to start to see more fine-tuning and less of just prompt engineering the models.
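Circling back to the structured generation demo: here is a sketch of what that GitHub-issue schema could look like. The field names are assumptions about what the demo extracts, and the constrained-decoding call itself is library-specific (tools like Outlines or llama.cpp grammars compile a JSON schema into a token mask), so it is only indicated in a comment; the schema definition and the validation step are runnable Pydantic v2 code.

```python
# Structured generation sketch: a JSON schema the decoder is forced to follow.
# Field names mirror the GitHub-issue demo above but are assumptions; the
# constrained-decoding step depends on your library (e.g. Outlines, llama.cpp
# GBNF grammars) and is only indicated as a comment.
from enum import Enum
from pydantic import BaseModel

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"

class GitHubIssue(BaseModel):
    priority: Priority
    issue_type: str        # e.g. "bug" or "feature"
    title: str
    estimate_hours: float  # rough time-to-fix estimate

# This JSON schema is what a constrained decoder enforces token by token:
schema = GitHubIssue.model_json_schema()

# generator = make_json_generator(model, schema)  # hypothetical library call
# raw = generator("Free-text complaint about something not working ...")
raw = ('{"priority": "high", "issue_type": "bug", '
       '"title": "App crashes on load", "estimate_hours": 4}')
issue = GitHubIssue.model_validate_json(raw)  # raises if the schema is violated
print(issue.title, "->", issue.priority.value)
```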
So that was my talk. Thank you for following, and if you have [00:27:00] any questions, we can take them now. Get full access to Latent Space at www.latent.space/subscribe

Viva la Mami
104. How to Manage Your Energy Without the Mom Guilt with Jessica Lynn Rojas

Viva la Mami

Play Episode Listen Later Dec 12, 2024 64:01 Transcription Available


In this episode, we welcome Jessica Rojas, a certified health and wellness coach, work-life balance expert, and founder of Jessica Lynn Wellness. She shares her insights and strategies on maintaining balance and managing energy effectively. We dive into the myths of the 50/50 balance and explore practical tips for prioritizing self-care, setting realistic goals, and breaking free from societal and cultural expectations. Jessica addresses the cultural conditioning many Latinas face, where being constantly busy is seen as a badge of honor, and helps us reframe these limiting beliefs.

This episode is especially valuable for Latina moms who are struggling to balance their various roles and responsibilities while maintaining their sense of self. Whether you're a working mom, stay-at-home mom, or preparing for motherhood, this conversation offers valuable insights for creating a more balanced, fulfilling life that honors both your cultural heritage and personal well-being.

For detailed show notes, visit vivalamami.com/episode104

Key Topics Covered:
- Why the popular "50/50 work-life balance" concept is a myth and what to focus on instead
- How to break free from cultural conditioning that makes us feel guilty for prioritizing rest
- The game-changing shift from time management to energy management
- Practical strategies for managing your energy as a busy mom
- How to identify and overcome the "mind drama" that keeps us exhausted
- Ways to build your own support system when you don't have family nearby

Connect with Jessica Rojas!
Instagram: @jessicalynnwellness
Website: jessicalynn-wellness.com

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts.

You can connect with Viva la Mami on Instagram, Facebook, the VLM website, or email us at podcast@vivalamami.com.

Join the Viva la Mami newsletter so you won't miss a thing!

Viva la Mami
103. How to Honor Your Cultural Foods While Meeting Your Health Goals with Mariana Dineen

Viva la Mami

Play Episode Listen Later Dec 5, 2024 62:02 Transcription Available


In this episode, we welcome Mariana Dineen, a registered dietitian and founder of Elemento Health. She shares the importance of nutrition within the Latino community, especially for mothers striving to adopt a healthier lifestyle while preserving cultural traditions. Mariana emphasizes the significance of cultural relevance in nutrition counseling, overcoming language barriers, and the unique challenges faced by the Latino community in accessing healthcare.

She offers practical strategies for meal planning and managing chronic diseases like diabetes through a cultural lens, stressing the importance of balance and inclusion rather than the elimination of cultural foods. We also touched on the impact of stress on eating habits, particularly for busy Latina moms, and how to address these issues holistically.

Mariana's commitment to cultural sensitivity in her practice, Elemento Health, underscores her dedication to providing accessible and empathetic nutrition care. If you've ever felt shame about your cultural foods or struggled to find a healthcare provider who truly gets you, this episode is for you.

For detailed show notes, visit vivalamami.com/episode103

Key topics covered:
- Breaking down barriers to accessing nutrition care for Latinas
- Making traditional dishes healthier while preserving cultural roots
- Managing stress eating and emotional relationships with food
- Practical meal planning strategies for busy mamas
- Culturally sensitive approaches to managing conditions like diabetes

Connect with Mariana from Elemento Health!
Email: mariana@elementohealth.com
Instagram: @elemento_health
Website: elementohealth.com

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts.

You can connect with Viva la Mami on Instagram, Facebook, the VLM website, or email us at podcast@vivalamami.com.

Join the Viva la Mami newsletter so you won't miss a thing!

Viva la Mami
102. [Replay Episode] How to Practice Self-Gratitude as Moms

Viva la Mami

Play Episode Listen Later Nov 28, 2024 21:56 Transcription Available


In this replay episode and Thanksgiving edition, I discuss the significance of expressing self-gratitude, especially for moms who often neglect to acknowledge their own efforts while caring for their families. Join me as I share ways you can practice self-gratitude.

These practices are not only beneficial to your own mental and emotional well-being but can also serve as an inspiring model of self-appreciation for your children.

For detailed show notes, visit vivalamami.com/episode102

In this episode, you'll hear:
- The importance of self-gratitude
- How to express self-gratitude this Thanksgiving
- Five ways to practice self-gratitude

In what ways do you express self-gratitude?

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts.

You can connect with Viva la Mami on Instagram, Facebook, the VLM website, or email us at podcast@vivalamami.com.

Join the Viva la Mami newsletter so you won't miss a thing!

Viva la Mami
101. Managing Mom Rage and Emotional Dysregulation with Jocelyn Flores

Viva la Mami

Play Episode Listen Later Nov 21, 2024 53:45 Transcription Available


In this episode, I delve into the all-too-common experiences of emotional dysregulation and 'mom rage' that many Latina moms face. Joined by licensed marriage and family therapist Jocelyn Flores, founder of Raíz Parenting, we discuss practical tools like mindful breathing techniques and setting healthy boundaries to help manage these overwhelming emotions.

Jocelyn also shares insights on the impact of our cultural background and the importance of self-compassion in the parenting journey. This is an authentic, judgment-free conversation aimed at providing support and reminding mamis that they are not alone in their struggles. We also explore how to break generational cycles of parenting and create a more emotionally secure environment for our children.

For detailed show notes, visit vivalamami.com/episode101

Connect with Jocelyn Flores
Website: raiz-parenting.com
Instagram: @raizparenting

Resources Mentioned:
- Get Raíz Parenting's Break the Cycle Freebie: www.raiz-parenting.com/freebie
- Sign up for the Raíz Parenting Community waitlist: www.raiz-parenting.com/waitlist

Love this episode? Subscribe wherever you are listening, share this episode with an amiga, and leave a review on Apple Podcasts.

You can connect with Viva la Mami on Instagram, Facebook, the VLM website, or email us at podcast@vivalamami.com.

Join the Viva la Mami newsletter so you won't miss a thing!