POPULARITY
Artificial intelligence isn't the future… it's the wave that has already begun and can't be stopped. In this episode we tell you what we learned from The Coming Wave, the most brutal book about the future of AI, written by one of the founders of DeepMind. We talk about biological weapons made from your laptop, AGIs that think like humans, useless governments, models that learn on their own, and systems that could spin out of control at any moment. What can we ordinary humans do in the face of this dystopian future?
Agis et le ciel t'aidera (Act, and Heaven Will Help You) by Rav David Touitou
A YouTuber portrays Dubai as a nightmare… but is it a genuine investigation or a sensationalist video? After 7 years of living there, I break down the claims in Amistory's video point by point. Salaries of €20,000 a month? Rent doubled if you pay monthly? A mandatory license for influencers? I test the clichés against reality, with figures and first-hand experience to back it up.
Resources:
- My book "Tout le monde n'aura pas la chance de quitter son pays" (Not Everyone Will Get the Chance to Leave Their Country)
- Amistory's original video: https://www.youtube.com/watch?v=WBpSCuahY40
- My two previous podcasts on Dubai: "Dubai: I bust the 9 most widespread clichés" and "Living in Dubai: 7 drawbacks and 22 advantages, and how to immigrate there"
- Articles on the sex trafficking of women in France: https://www.antitraffickingreview.org/index.php/atrjournal/article/view/383/323 https://newlinesmag.com/spotlight/paris-police-are-cracking-down-on-vulnerable-communities-ahead-of-the-olympics/ https://www.businessinsider.com/paris-olympics-games-sex-labor-human-trafficking-exploitation-2024-7
- Amnesty International's 2025 report
Axel Kicillof said at a press conference: "While Milei says the Argentine State will argue that this claim is not valid, he maintains that the fault is mine, or belongs to those of us who took part in the decision to expropriate. And the truth is that this is completely contradictory and dangerous, because the President of the Nation siding with the plaintiffs undermines the defense of the national interest." "One can understand that Milei is always on the side of the vultures and not the country, always in favor of handing things over and not of sovereignty, always in favor of foreign courts or countries and not the Argentine Republic; that much can be understood. But in this case it is very delicate, because YPF is starting to be at risk," added the governor of Buenos Aires. Economist Emmanuel Álvarez Agis said: "I think we should set partisan issues aside a bit here and at least understand that the country's interest is at stake, and even the sovereign right of countries to have laws that take precedence over the bylaws of a private company." "Nobody ever told me or the economic team, 'hey, watch out for Petersen,' you know? Like, 'hey, let's give that special treatment,' etc. In fact, everyone knew that the nationalization decision would end up forcing Petersen to hand over the shares it had pledged as collateral," he added. Former YPF president Pablo González said: "Vaca Muerta has developed thanks, I would say exclusively, to the decision that was made in 2012. If Argentina had not expropriated YPF at that time, Vaca Muerta would not have been developed." Cabinet Chief Guillermo Francos said: "Judge Preska's ruling is from 2023. She began a phase of enforcing the judgment, which targets the shares the National State holds in YPF. I would say that is not the fundamental problem, or rather, the problem already existed with the ruling. The fundamental problem is the criteria by which decisions this momentous for the country are made. Because of this inflamed populist rhetoric of nationalizing companies that had been privatized only a short time before."
News from Wednesday, July 2, by María O'Donnell and the De Acá en Más team on Urbana Play 104.3 FM. Follow De Acá en Más on Instagram and X. Urbana Play 104.3 FM: the radio you can watch. Subscribe on #Youtube. Follow the station on Instagram and X. Send us a WhatsApp ➯ Here. Download our official #APP! ➯ https://scnv.io/m8Gr
2.1 Summary & Table of contents
This is the second of a two-post series on foom (previous post) and doom (this post). The last post talked about how I expect future AI to be different from present AI. This post will argue that this future AI will be of a type that will be egregiously misaligned and scheming, not even 'slightly nice', absent some future conceptual breakthrough. I will particularly focus on exactly how and why I differ from the LLM-focused researchers who wind up with (from my perspective) bizarrely over-optimistic beliefs like "P(doom) ≲ 50%".[1] In particular, I will argue that these "optimists" are right that "Claude seems basically nice, by and large" is nonzero evidence for feeling good about current LLMs (with various caveats). But I think that future AIs will be disanalogous to current LLMs, and I will dive into exactly how and why, with a [...]
---
Outline:
(00:12) 2.1 Summary & Table of contents
(04:42) 2.2 Background: my expected future AI paradigm shift
(06:18) 2.3 On the origins of egregious scheming
(07:03) 2.3.1 Where do you get your capabilities from?
(08:07) 2.3.2 LLM pretraining magically transmutes observations into behavior, in a way that is profoundly disanalogous to how brains work
(10:50) 2.3.3 To what extent should we think of LLMs as imitating?
(14:26) 2.3.4 The naturalness of egregious scheming: some intuitions
(19:23) 2.3.5 Putting everything together: LLMs are generally not scheming right now, but I expect future AI to be disanalogous
(23:41) 2.4 I'm still worried about the 'literal genie' / 'monkey's paw' thing
(26:58) 2.4.1 Sidetrack on disanalogies between the RLHF reward function and the brain-like AGI reward function
(32:01) 2.4.2 Inner and outer misalignment
(34:54) 2.5 Open-ended autonomous learning, distribution shifts, and the 'sharp left turn'
(38:14) 2.6 Problems with amplified oversight
(41:24) 2.7 Downstream impacts of Technical alignment is hard
(43:37) 2.8 Bonus: Technical alignment is not THAT hard
(44:04) 2.8.1 I think we'll get to pick the innate drives (as opposed to the evolution analogy)
(45:44) 2.8.2 I'm more bullish on impure consequentialism
(50:44) 2.8.3 On the narrowness of the target
(52:18) 2.9 Conclusion and takeaways
(52:23) 2.9.1 If brain-like AGI is so dangerous, shouldn't we just try to make AGIs via LLMs?
(54:34) 2.9.2 What's to be done?
The original text contained 20 footnotes which were omitted from this narration.
---
First published: June 23rd, 2025
Source: https://www.lesswrong.com/posts/bnnKGSCHJghAvqPjS/foom-and-doom-2-technical-alignment-is-hard
---
Narrated by TYPE III AUDIO.
On Monday 16.06.2025, a stranger knocks at my door. At least, that's how it seems at first. In truth, it's a longtime friend standing there in my entrance hall, someone I hadn't heard from in over a year. And although in the moment I panic about my plans for the evening, it's the best thing that could happen to me. Between hugs and conversations, I draw three lessons that I hope will help you with your own social circle.
▶ Here's a little timecode (to help you find your way around) ⏱️:
0:00 to 3:45: He knocks at my door.
3:45 to 6:20: He's not sulking at you.
6:20 to 18:40: TALK!
18:40 to 24:05: It's all written.
24:05 to 29:03: Make a list of "friends" & take stock.
Without further BLA-BLA: turn on your radio or your speaker, grab your best pair of headphones, in short, sit back comfortably, press "PLAY" & let yourself be carried along with me for almost 30 minutes.
Enjoy listening!
Oral Arguments for the Court of Appeals for the Federal Circuit
AGIS Software Development LLC v. Stewart
Hey hey, Creative Friends! In this episode we're talking about the food on campus! Curious about the top 6 tasty campus foods according to Uthe and Agis? Come listen to the episode Top 6 Tasty Foods on Campus According to Uthe and Agis, only on the PoliMedia Broadcasting Radio Podcast! Hosts: Putri Bathiah B. and Sandrika Agistara P. Operator & MD: Muhammad Rakha Fahreizy & Nadya Afiyah Salimah. Editor: Frisca Tiurmarina Nainggolan. Don't forget to follow us!! Instagram: @polimedia_radio TikTok: @radio penyiaran polimedia
Tomorrow my baby turns two! Two years ago I gave birth to the little one who would become the center of my world, and who, to be honest, already clearly was from the second two parallel lines appeared on my pregnancy test. For his birthday I wanted to dedicate this podcast to him, just as I did for his first birthday! Since we all love collaborative episodes here, I asked you to share all the clichés you have about motherhood/parenthood, and I'm going to react to them based on my own experience! This is Mathilde, from Dance With Him, and you're listening to Radio Mama. Instagram: @dance_with_him Hosted by Acast. Visit acast.com/privacy for more information.
Carlos Melconian said: "At some point the market wants to get down to business. That's what should happen between Friday and Monday. Then, for it to be successful, the BCRA has to stop selling dollars and start buying them. The second way to see success is for country risk to fall and for the country to someday return to the voluntary markets to place debt." "If the Fund doesn't make a deal, Argentina defaults. It's unpayable. Will there be a deal? Yes. Now, if these hustlers then come out with their breaking news, that's a closed circle, nobody watches it. I don't give a damn," the economist added. Emmanuel Álvarez Agis said: "The value of the dollar is inversely proportional to how well you're doing. When the dollar is high, you're at rock bottom. When it's fine, you find yourself at the World Cup in Russia, and you go to Paris as a retail employee. Governments with a high dollar lose, and with a low dollar they win. When politicians learned that trick, we economists were screwed." Patricia Bullrich said: "What's being debated here isn't the economic situation; the country knows we're coming out of intensive care and starting to walk. What's being debated is power. What the CGT has always wanted: the money, the coffers. The caste, the mafia, the union bureaucracy. This strike is about the CGT wanting to take center stage and set the terms. They always want to govern, no matter who is in government." Bullrich referred to the retirees' march last Wednesday: "Yesterday was pitiful. I was monitoring yesterday's march and there weren't even 7 or 8 thousand people. Very low turnout, very small. People wanted to leave. As soon as they arrived, they turned around and left." Héctor Daer referred to the attacks suffered by some buses: "This has nothing to do with the workers, nothing to do with any union organization. The workers who chose to drive the buses sold few tickets because the buses ran empty. We have no connection to it and bear no responsibility. It has nothing to do with carrying out this union action." Manuel Adorni said: "Listening to Kicillof has always fascinated me. I think he's a super funny guy when he talks. I'd do a stand-up show with him. You'll remember topics like 'the province of Antarctica.' He's made delirious statements. Listening to Kicillof strikes me more than listening to Cristina. Cristina bores me, but Kicillof entertains me." News from Friday, April 11, by María O'Donnell and the De Acá en Más team on Urbana Play 104.3 FM.
Love, love, love! The kind that makes our hearts beat faster, gives us butterflies in our stomachs, and lets us see only the other person's good sides. But also, sometimes, the kind that doubles our mental load, makes us question everything, and breaks our hearts into a thousand pieces… and it's actually this darker side that we're going to talk about today. Through an Instagram story, I asked you to tell me about your problems and questions regarding your partners or your exes. We'll discover them together, and I'll try to advise you as best I can, drawing on my long experience of 34 years already spent on this earth! This is Mathilde, from Dance With Him, and you're listening to Radio Mama. Instagram: @dance_with_him
Hello, Saturday go-getters! Today we're talking about something simple: acting instead of overthinking. Because if you keep brooding, you end up tying your brain in knots… and taking zero action. In the markets or in life, at some point you have to stop overthinking and just go for it.
You asked for it: it's time to react to your worst unpopular opinions about the latest releases and artists of the moment... Gracie Abrams, Selena Gomez, Jennie, Lisa, Katy Perry, we have a lot to cover! And believe me, we're not going to pull any punches. Enjoy listening!
Duration: 01:02:13 - Les Nuits de France Culture - by: Antoine Dhulster - As a historian, Jean Maitron is interested in the faint traces individuals leave in history and in accounts of individual life paths. His entire intellectual work is devoted to preserving the memory of figures left in the shadows of academic history. This is the subject of the second part of his portrait. - Directed by: Thomas Jost - Guests: Michelle Perrot, historian specializing in women's history, professor emerita of contemporary history at Université Paris Cité; Claude Pennetier, researcher at the CNRS; Jacques Girault, professor emeritus of history; Madeleine Rebérioux
February inflation was 2.4%, bringing the twelve-month total to 66.9%. Economist Ricardo Arriazu said: "I'm against the cepo [currency controls], but I'm even more against social collapse. The cepo has to be lifted gradually and when it's possible. My concern is that Argentina isn't going to devalue, but the President and the IMF like a floating exchange rate. They'll want a compact float similar to the one in 2018, and that's going to cause problems." Economist Emmanuel Álvarez Agis said: "All these reforms by decree don't provide stability. When an investor comes wanting to invest in Argentina and to walk away with 150% in 14 seconds before someone gets killed in the plaza, there's a flood, or the president picks a fight with who knows whom, the truth is the surrounding conditions aren't favorable. Last point: this is how it started in Colombia. After the pandemic they had to make an adjustment, they went to the Fund, VAT went up, 13 people died in the plaza, and Petro came to power. What I'm saying is that the Fund is checking whether the adjustment is as sustainable as was initially thought." Judge Karina Andrade said on our program: "It was a decision made within the framework of prioritizing rights. The Constitution grants the right to express oneself and to demonstrate. The arrests were not being reported for a basic review. In prioritizing rights, this deserved a quick response with the evidence I had. The basic requirements for informing the judge were not being met." News from Monday, March 17, by María O'Donnell and the De Acá en Más team on Urbana Play 104.3 FM.
Javier Milei, referring to the security forces: "Whoever breaks the law pays for it. The good guys are the ones in blue, and the sons of bitches walking around with rags over their faces are the bad guys, and they have to go to jail. We're going to put them in jail; they're not coming after me, they're coming after you, I'm just in the middle." Agustín Laje, director of the Fundación Faro, said: "We celebrate the police, we congratulate them; every well-placed bullet in every leftist has been a moment of rejoicing for us. Every image of leftists whimpering over pepper spray has been a pleasure to watch. We stand up and applaud our forces because the law must rule in this country." Patricia Bullrich said: "It wasn't the Gendarmerie, and it wasn't the bullet; besides, they're lying, because it wasn't fired horizontally. The gendarme fired the way he was supposed to fire. What the Gendarmerie, the PFA, the City police, and the Prefecture did was defend the citizens so that we can keep living in peace, calmly, in an orderly country. The video footage is unprofessional. Forensic science has parameters. I watched all the videos, and all the gendarmes fire the way they're supposed to fire." "They put a deadly weapon there, in that spot; it bounces twice first and hits a piece of iron. I still don't have the medical report, and I don't know whether the grenade hit him, because the grenade bounced and hit that iron bar… if it was that bar," the Security Minister added. Lilia Lemoine said: "(Villarruel) went all 'Cambiemos'; even Macri came out cursing. Macri ended up to the right of Villarruel because he's no fool. We have to defend Patricia Bullrich. I also saw the images of the photographer, who is also a municipal employee. He wasn't doing anything; it's not that he deserved what happened to him. It shocks all of us. It's not like they shot him in the head; they were firing gas and he got caught in it."
Rish Gupta has had a long, winding journey filled with unexpected turns, painful challenges, and exhilarating wins. He has built and exited companies and is currently riding the momentum and hype surrounding artificial intelligence and AGI. Rish's latest company, Spot AI, has attracted funding from top-tier investors like Redpoint Ventures, Scale Venture Partners, Bessemer Venture Partners, and StepStone Group.
[MÉTAMORPHOSE PODCAST] Alexandre Dana welcomes Matthieu Dardaillon, entrepreneur, co-founder of Ticket for Change, and speaker. Together, they explore how to "get out of chaos." In what ways is our era chaotic? How does it affect us? What should we do when we feel overwhelmed? How can we create a bubble in the face of environmental, social, or economic crises? Through the concept of "fertile emptiness" and a renewed vision of performance, Matthieu Dardaillon invites us to rethink our relationship with time, with what matters most, and with action. His book, Anti-chaos, is published by Éditions Eyrolles. Episode #568
A few quotes from the podcast with Matthieu Dardaillon:
"If we want to create a society we truly aspire to, we need to regain our power to act."
"Act like the person you want to become."
"Human beings are the cause of the chaos we're living through; they are also the solution."
Topics covered in the podcast with Matthieu Dardaillon:
00:00 Introduction
05:46 In what ways is today's world chaotic?
08:50 The feeling of chaos: what are the signs?
10:36 How to bring fertile emptiness into our daily lives.
13:47 What are "post-normal times"?
16:48 Moving beyond the Great Resignation phenomenon.
18:17 Walking, a key to the anti-chaos method.
22:30 Creating fertile emptiness in a hyperconnected world.
24:16 The phenomenon of acceleration.
26:23 The concept of the 4 seasons for finding your rhythm.
30:15 The Achilles' heel theory.
32:12 A logic of sustainable performance.
34:31 Finding your balance by applying lagom.
40:18 What first steps can you take to create a bubble of serenity?
42:49 The importance of having a vision.
44:05 Matthieu Dardaillon's answer to "who do you want to become?"
45:52 The essential role of routines.
47:29 How to take back control when you're short on time?
50:10 What could the new world look like?
Foreword and precautions for listening to the podcast
Discover Objectif Métamorphose, our 12-step program for setting out to meet yourself.
Receive the Métamorphose newsletter every other Wednesday, with exclusive news about the podcast and Anne's inspirations.
Take the free TEST of La Roue Métamorphose, covering 9 pillars of your life!
Follow us on social media: Insta, Facebook & TikTok
Subscribe on Apple Podcast / Spotify / Deezer / CastBox / YouTube
Support Métamorphose by joining the Tribu Métamorphose
Photo: all rights reserved
#143 - The Right Time to Learn Lessons
Jürgen Schmidhuber, the father of generative AI, challenges current AI narratives, arguing that early deep learning work is, in his opinion, misattributed and actually originated in Ukraine and Japan. He discusses his early work on linear transformers and artificial curiosity which preceded modern developments, shares his expansive vision of AI colonising space, and explains his groundbreaking 1991 consciousness model. Schmidhuber dismisses fears of human-AI conflict, arguing that superintelligent AI scientists will be fascinated by their own origins and motivated to protect life rather than harm it, while being more interested in other superintelligent AI and in cosmic expansion than earthly matters. He offers unique insights into how humans and AI might coexist. This was the long-awaited second, unreleased part of our interview we filmed last time. SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/ *** Interviewer: Tim Scarfe TOC [00:00:00] The Nature and Motivations of AI [00:02:08] Influential Inventions: 20th vs. 21st Century [00:05:28] Transformer and GPT: A Reflection The revolutionary impact of modern language models, the 1991 linear transformer, linear vs. quadratic scaling, the fast weight controller, and fast weight matrix memory. [00:11:03] Pioneering Contributions to AI and Deep Learning The invention of the transformer, pre-trained networks, the first GANs, the role of predictive coding, and the emergence of artificial curiosity.
[00:13:58] AI's Evolution and Achievements The role of compute, breakthroughs in handwriting recognition and computer vision, the rise of GPU-based CNNs, achieving superhuman results, and Japanese contributions to CNN development. [00:15:40] The Hardware Lottery and GPUs GPUs as a serendipitous advantage for AI, the gaming-AI parallel, and Nvidia's strategic shift towards AI. [00:19:58] AI Applications and Societal Impact AI-powered translation breaking communication barriers, AI in medicine for imaging and disease prediction, and AI's potential for human enhancement and sustainable development. [00:23:26] The Path to AGI and Current Limitations Distinguishing large language models from AGI, challenges in replacing physical world workers, and AI's difficulty in real-world versus board games. [00:25:56] AI and Consciousness Simulating consciousness through unsupervised learning, chunking and automatizing neural networks, data compression, and self-symbols in predictive world models. [00:30:50] The Future of AI and Humanity Transition from AGIs as tools to AGIs with their own goals, the role of humans in an AGI-dominated world, and the concept of Homo Ludens. [00:38:05] The AI Race: Europe, China, and the US Europe's historical contributions, current dominance of the US and East Asia, and the role of venture capital and industrial policy. [00:50:32] Addressing AI Existential Risk The obsession with AI existential risk, commercial pressure for friendly AIs, AI vs. hydrogen bombs, and the long-term future of AI. [00:58:00] The Fermi Paradox and Extraterrestrial Intelligence Expanding AI bubbles as an explanation for the Fermi paradox, dark matter and encrypted civilizations, and Earth as the first to spawn an AI bubble. [01:02:08] The Diversity of AI and AI Ecologies The unrealism of a monolithic superintelligence, diverse AIs with varying goals, and intense competition and collaboration in AI ecologies.
[01:12:21] Final Thoughts and Closing Remarks REFERENCES: See pinned comment on YT: https://youtu.be/fZYUqICYCAk
Young, Wild & Freelance | The podcast for your life as a freelancer
In this episode, Thomas Burbidge reacts to Mark Zuckerberg's recent announcements about ending fact-checking at Meta and analyzes what they mean for independent entrepreneurs and freelancers who use Instagram, Facebook, or Threads for their communication. Will Instagram go the same way as Twitter under Elon Musk? Why are Meta and Zuckerberg making these decisions? What's the link with Donald Trump's election to the White House? Should you boycott the platform if you disagree? Together we explore the options open to us as solopreneurs with values we care about. This is a SOLO episode reacting to the news, a format that's rare on the podcast, so give me your feedback so I know whether to keep doing it!
---
Young, Wild & Freelance is a weekly podcast for solo entrepreneurs and independents in which Thomas Burbidge shares all the keys to creating, developing, and structuring your business. You'll find interviews and themed episodes with Thomas covering every dimension of your business (marketing, management, organization, sales, finances, ...). To go further, find all our content for freelancers at:
- The newsletter: https://thomasburbidge.com/newsletter
- Instagram: https://www.instagram.com/thomas.burbidge/
- LinkedIn: https://www.linkedin.com/in/thomasburbidge/
And remember to give the podcast a 5-star rating.
AGIS, the Associació de Guies i Informadors de Sitges (Association of Guides and Informants of Sitges), was founded on the town council's initiative almost thirty years ago. Yoyi Julià was already there at the time and still carries on the work together with regular collaborators. What began as a response to demand from a sector that did not yet exist must now coexist with plenty of competition and even a degree of encroachment by unqualified operators, but Yoyi remains as enthusiastic as ever about every route and tour, some of which have now been led thousands of times! This year Yoyi received the Jofre Vilà Folklore Prize for work spreading local culture and tradition. The post "Yoyi Julià, 30 years guiding tourists through Sitges with rigor with AGIS, Jofre Vilà Folklore Prize 2024" first appeared on Radio Maricel.
The Agis Innovatiefonds aims to make a difference in the lives of people with chronic conditions by supporting innovative initiatives. In this episode, the host and guests Annemarie Kuiper and Gerdine van Ramshorst discuss the challenges and opportunities within the healthcare sector, particularly regarding older people and people with long-term health problems. They stress the importance of collaboration and co-creation, with the voice of the target group at the center of the process of developing new solutions. The conversation also addresses the need to lower bureaucratic obstacles so that valuable ideas don't get lost in a maze of rules. Listeners gain insight into how the fund works, the criteria for initiatives, and the impact these projects can have on individuals and communities.
Exploring the intricacies of the healthcare system in the Netherlands, the podcast features a compelling discussion with Annemarie Kuiper and Gerdine van Ramshorst from the Agis Innovatiefonds, who outline the fund's mission to make a tangible difference for those living with chronic illnesses. The conversation reveals the challenges faced within the healthcare framework, particularly the 'silo mentality' that often hinders effective collaboration and innovation. Annemarie and Gerdine argue for a paradigm shift where patients are not just recipients of care but active participants in the creation of solutions tailored to their unique circumstances. This shift requires a concerted effort to dismantle the existing barriers in the healthcare system, allowing for a more fluid and interconnected approach to patient care. The episode highlights the significance of partnerships across sectors, including healthcare, social design, and community engagement, to foster a holistic understanding of patient needs.
Through real-life examples, such as the development of personalized tools for individuals with mobility challenges, the podcast illustrates how collaborative efforts can lead to innovative solutions that enhance daily living. Annemarie and Gerdine emphasize the importance of sustainability in these initiatives, advocating for ongoing support and ownership to ensure that the benefits continue long after initial funding ends. Their insights serve as a rallying cry for the need to rethink how healthcare is conceptualized and delivered, positioning the Agis Innovatiefonds as a beacon of hope for a more inclusive and responsive healthcare model.
Takeaways:
- The Agis Innovatiefonds aims to make a difference for people with chronic conditions in their daily lives.
- Innovation in healthcare requires collaboration among various stakeholders to bridge systemic gaps.
- Co-creation with individuals facing chronic illnesses leads to more effective and sustainable solutions.
- Social designers play a crucial role in facilitating human-centered design processes for better outcomes.
- Understanding the unique needs of each community is essential for successful healthcare initiatives.
- Fostering an environment of mutual giving enhances the sense of purpose for older adults.
Companies mentioned in this episode: Agis Innovatiefonds, Focus Wonen, TU Delft
A conversation with Damien Chow, Director of Sales in the Asia Pacific region for the Amphenol Global Interconnect Systems Group. Damien is based in Singapore and has been with Amphenol for over 12 years. We talk about his role leading a sales team spread across a large region, working with customers on value-add solutions from the diverse AGIS group. We talk about the exciting and unique challenge of selling products and technologies in a wide variety of markets--from renewable energy to heavy equipment to IT datacom. We talk about being raised in Singapore, a little of the island state's history, and what makes it such a special place. We talk about his successful early career with a local company when he volunteered to move to San Jose because no one else raised their hand. We talk about his and his family's love of playing basketball together, and we discuss his desert island album, book, and movie. This is The Interface. Hosted by Chris Cappello. Music by Square Seed. For The Interface podcast guest inquiries and suggestions, send a LinkedIn message to https://www.linkedin.com/in/cjcappello.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If we solve alignment, do we die anyway?, published by Seth Herd on August 23, 2024 on LessWrong.
I'm aware of good arguments that this scenario isn't inevitable, but it still seems frighteningly likely even if we solve technical alignment.
TL;DR:
1. If we solve alignment, it will probably be used to create AGI that follows human orders.
2. If takeoff is slow-ish, a pivotal act (preventing more AGIs from being developed) will be difficult.
3. If no pivotal act is performed, RSI-capable (recursively self-improving) AGI proliferates. This creates an n-way non-iterated Prisoner's Dilemma where the first to attack wins.
4. Disaster results.
The first AGIs will probably be aligned to take orders
People in charge of AGI projects like power. And by definition, they like their values somewhat better than the aggregate values of all of humanity. It also seems like there's a pretty strong argument that Instruction-following AGI is easier than value aligned AGI. In the slow-ish takeoff we expect, this alignment target seems to allow for error-correcting alignment, in somewhat non-obvious ways. If this argument holds up even weakly, it will be an excuse for the people in charge to do what they want to anyway. I hope I'm wrong and value-aligned AGI is just as easy and likely. But it seems like wishful thinking at this point.
The first AGI probably won't perform a pivotal act
In realistically slow takeoff scenarios, the AGI won't be able to do anything like make nanobots to melt down GPUs. It would have to use more conventional methods, like software intrusion to sabotage existing projects, followed by elaborate monitoring to prevent new ones. Such a weak attempted pivotal act could fail, or could escalate to a nuclear conflict. Second, the humans in charge of AGI may not have the chutzpah to even try such a thing.
Taking over the world is not for the faint of heart. They might get it after their increasingly-intelligent AGI carefully explains to them the consequences of allowing AGI proliferation, or they might not. If the people in charge are a government, the odds of such an action go up, but so do the risks of escalation to nuclear war. Governments seem to be fairly risk-taking. Expecting governments to not just grab world-changing power while they can seems naive, so this is my median scenario. So RSI-capable AGI may proliferate until a disaster occurs If we solve alignment and create personal intent aligned AGI but nobody manages a pivotal act, I see a likely future world with an increasing number of AGIs capable of recursively self-improving. How long until someone tells their AGI to hide, self-improve, and take over? Many people seem optimistic about this scenario. Perhaps network security can be improved with AGIs on the job. But AGIs can do an end-run around the entire system: hide, set up self-replicating manufacturing (robotics is rapidly improving to allow this), use that to recursively self-improve your intelligence, and develop new offensive strategies and capabilities until you've got one that will work within an acceptable level of viciousness.[1] If hiding in factories isn't good enough, do your RSI manufacturing underground. If that's not good enough, do it as far from Earth as necessary. Take over with as little violence as you can manage or as much as you need. Reboot a new civilization if that's all you can manage while still acting before someone else does. The first one to pull the stops probably wins. This looks all too much like a non-iterated Prisoner's Dilemma with N players - and N increasing. Counterarguments/Outs For small numbers of AGI and similar values among their wielders, a collective pivotal act could be performed. 
I place some hopes here, particularly if political pressure is applied in advance to aim for this outcome, or if the AGIs come up with better cooperation stru...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Limitations on Formal Verification for AI Safety, published by Andrew Dickson on August 20, 2024 on LessWrong. In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of computer science that studies how guarantees may be derived by deduction on fully-specified rule-sets and symbol systems. By contrast, the real world is a messy place that can rarely be straightforwardly represented in a reductionist way. In particular, physics, chemistry and biology are all complex sciences which do not have anything like complete symbolic rule sets. Additionally, even if we had such rules for the natural sciences, it would be very difficult for any software system to obtain sufficiently accurate models and data about initial conditions for a prover to succeed in deriving strong guarantees for AI systems operating in the real world. Practical limitations like these on formal verification have been well understood for decades by engineers and applied mathematicians building real-world software systems, which makes it puzzling that they have mostly been dismissed by leading researchers advocating for the use of formal verification in AI safety so far. This paper will focus on several such limitations and use them to argue that we should be extremely skeptical of claims that formal verification-based approaches will provide strong guarantees against major AI threats in the near term. What do we Mean by Formal Verification for AI Safety? 
Some examples of the kinds of threats researchers hope formal verification will help with come from the paper "Provably Safe Systems: The Only Path to Controllable AGI" [1] by Max Tegmark and Steve Omohundro (emphasis mine): Several groups are working to identify the greatest human existential risks from AGI. For example, the Center for AI Safety recently published 'An Overview of Catastrophic AI Risks' which discusses a wide range of risks including bioterrorism, automated warfare, rogue power seeking AI, etc. Provably safe systems could counteract each of the risks they describe. These authors describe a concrete bioterrorism scenario in section 2.4: a terrorist group wants to use AGI to release a deadly virus over a highly populated area. They use an AGI to design the DNA and shell of a pathogenic virus and the steps to manufacture it. They hire a chemistry lab to synthesize the DNA and integrate it into the protein shell. They use AGI controlled drones to disperse the virus and social media AGIs to spread their message after the attack. Today, groups are working on mechanisms to prevent the synthesis of dangerous DNA. But provably safe infrastructure could stop this kind of attack at every stage: biochemical design AI would not synthesize designs unless they were provably safe for humans, data center GPUs would not execute AI programs unless they were certified safe, chip manufacturing plants would not sell GPUs without provable safety checks, DNA synthesis machines would not operate without a proof of safety, drone control systems would not allow drones to fly without proofs of safety, and armies of persuasive bots would not be able to manipulate media without proof of humanness. 
[1] The above quote contains a number of very strong claims about the possibility of formally or mathematically provable guarantees around software systems deployed in the physical world - for example, the claim that we could have safety proofs about the real-world good behavior of DNA synthesis machines, or drones. From a practical standpoint, our default stance towards such claims should be skepticism, since we do not have proofs of this sort for any of the technologies we interact with in the real world today. For example, DNA synthesis machines exist today and do not come with f...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Limitations on Formal Verification for AI Safety, published by Andrew Dickson on August 19, 2024 on The AI Alignment Forum. In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of computer science that studies how guarantees may be derived by deduction on fully-specified rule-sets and symbol systems. By contrast, the real world is a messy place that can rarely be straightforwardly represented in a reductionist way. In particular, physics, chemistry and biology are all complex sciences which do not have anything like complete symbolic rule sets. Additionally, even if we had such rules for the natural sciences, it would be very difficult for any software system to obtain sufficiently accurate models and data about initial conditions for a prover to succeed in deriving strong guarantees for AI systems operating in the real world. Practical limitations like these on formal verification have been well understood for decades by engineers and applied mathematicians building real-world software systems, which makes it puzzling that they have mostly been dismissed by leading researchers advocating for the use of formal verification in AI safety so far. This paper will focus on several such limitations and use them to argue that we should be extremely skeptical of claims that formal verification-based approaches will provide strong guarantees against major AI threats in the near term. What do we Mean by Formal Verification for AI Safety? 
Some examples of the kinds of threats researchers hope formal verification will help with come from the paper "Provably Safe Systems: The Only Path to Controllable AGI" [1] by Max Tegmark and Steve Omohundro (emphasis mine): Several groups are working to identify the greatest human existential risks from AGI. For example, the Center for AI Safety recently published 'An Overview of Catastrophic AI Risks' which discusses a wide range of risks including bioterrorism, automated warfare, rogue power seeking AI, etc. Provably safe systems could counteract each of the risks they describe. These authors describe a concrete bioterrorism scenario in section 2.4: a terrorist group wants to use AGI to release a deadly virus over a highly populated area. They use an AGI to design the DNA and shell of a pathogenic virus and the steps to manufacture it. They hire a chemistry lab to synthesize the DNA and integrate it into the protein shell. They use AGI controlled drones to disperse the virus and social media AGIs to spread their message after the attack. Today, groups are working on mechanisms to prevent the synthesis of dangerous DNA. But provably safe infrastructure could stop this kind of attack at every stage: biochemical design AI would not synthesize designs unless they were provably safe for humans, data center GPUs would not execute AI programs unless they were certified safe, chip manufacturing plants would not sell GPUs without provable safety checks, DNA synthesis machines would not operate without a proof of safety, drone control systems would not allow drones to fly without proofs of safety, and armies of persuasive bots would not be able to manipulate media without proof of humanness. 
[1] The above quote contains a number of very strong claims about the possibility of formally or mathematically provable guarantees around software systems deployed in the physical world - for example, the claim that we could have safety proofs about the real-world good behavior of DNA synthesis machines, or drones. From a practical standpoint, our default stance towards such claims should be skepticism, since we do not have proofs of this sort for any of the technologies we interact with in the real world today. For example, DNA synthesis machines exist today and do no...
Almost a month ago, on July 15 of this year, the iconic bar La Papoñita, which had stood for more than 60 years at the corner of 18 de Julio and Minas, officially closed its doors to the public. Despite its long history, and despite being located on the busiest avenues in Montevideo, the restaurant business had stopped being profitable. At the time of closing, the restaurant had 18 employees, some with more than 25 years on the job. “Nobody escapes the situation of what the Centro is like nowadays,” owner Gustavo González told Subrayado on the day it closed. Fearing that this phenomenon could spread to the other classic bars in the area, the business owners decided to meet and form an alliance to rethink their businesses and envision alternatives for reinventing themselves. “The starting point was the sad disappearance of La Papoñita. It triggered a lot of questions about what we are doing and, above all, where we want to go.” Besides looking for organic ways to revive attendance at the restaurants, the group, which brings together the owners of 15 bars, aims to obtain support from the authorities, both the Intendencia de Montevideo and the national government. Why did the classic bars decline? Has the market lost interest in this kind of business? Are there short-term factors behind the drop in visits? What kinds of improvements could be achieved with state support? We discuss it En Perspectiva with Federico Celsi, of Bar Facal, and Ricardo Agis, of Bar Copacabana.
This is an edited version of our livestreamed Q&A sessions with my guest David Wood on July 18 and 19, 2024. @LondonFuturists' David Wood joined me as a special guest on this live show. Thank you! You can also watch it on my YouTube channel here https://www.youtube.com/watch?v=yYyTIky2MLc or the whole thing here https://www.youtube.com/watch?v=W3dRQ7QZ_wc. In this special livestream event, I outlined my arguments that while IA (Intelligent Assistance) and some forms of narrow AI may well be quite beneficial to humanity, the idea of building AGIs, i.e., 'generally intelligent digital entities' (as set forth by Sam Altman / #openai and others), represents an existential risk that IMHO should not be undertaken or self-governed by private enterprises, multinational corporations or venture-capital-funded startups. So: IA/AI, yes, but with clear rules, standards, and guardrails. AGI: NO, unless we're all on the same page. I explain why I believe we need an AGI-Non-Proliferation-Agreement, what the difference between IA/AI and AGI or ASI (superintelligence) is, why it matters, and how we could go about it.
This is the full version of my special livestreamed event on Artificial General Intelligence / AGI on July 18 and 19, 2024. You can watch it on YouTube here https://www.youtube.com/watch?v=W3dRQ7QZ_wc. Watch the edited (Q&A) version with @LondonFuturists' David Wood on YouTube here https://www.youtube.com/watch?v=yYyTIky2MLc&t=0s. In this special livestreamed event I outlined my arguments that while IA (Intelligent Assistance) and some forms of narrow AI may well be quite beneficial to humanity, the idea of building AGIs, i.e., 'generally intelligent digital entities' (as set forth by Sam Altman / #openai and others), represents an existential risk that IMHO should not be undertaken or self-governed by private enterprises, multinational corporations or venture-capital-funded startups. I believe we need an AGI-Non-Proliferation-Agreement. I outline what the difference between IA/AI and AGI or ASI (superintelligence) is, why it matters and how we could go about it. IA/AI, yes, but with clear rules, standards, and guardrails. AGI: NO, unless we're all on the same page. Who will be Mission Control for humanity?
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Individually incentivized safe Pareto improvements in open-source bargaining, published by Nicolas Macé on July 18, 2024 on LessWrong. Summary Agents might fail to peacefully trade in high-stakes negotiations. Such bargaining failures can have catastrophic consequences, including great power conflicts, and AI flash wars. This post is a distillation of DiGiovanni et al. (2024) (DCM), whose central result is that agents that are sufficiently transparent to each other have individual incentives to avoid catastrophic bargaining failures. More precisely, DCM constructs strategies that are plausibly individually incentivized, and, if adopted by all, guarantee each player no less than their least preferred trade outcome. Figure 0 below illustrates this. This result is significant because artificial general intelligences (AGIs) might (i) be involved in high-stakes negotiations, (ii) be designed with the capabilities required for the type of strategy we'll present, and (iii) bargain poorly by default (since bargaining competence isn't necessarily a direct corollary of intelligence-relevant capabilities). Introduction Early AGIs might fail to make compatible demands with each other in high-stakes negotiations (we call this a "bargaining failure"). Bargaining failures can have catastrophic consequences, including great power conflicts, or AI triggering a flash war. More generally, a "bargaining problem" is when multiple agents need to determine how to divide value among themselves. Early AGIs might possess insufficient bargaining skills because intelligence-relevant capabilities don't necessarily imply these skills: For instance, being skilled at avoiding bargaining failures might not be necessary for taking over. Another problem is that there might be no single rational way to act in a given multi-agent interaction. 
Even arbitrarily capable agents might have different priors, or different approaches to reasoning under bounded computation. Therefore they might fail to solve equilibrium selection, i.e., make incompatible demands (see Stastny et al. (2021) and Conitzer & Oesterheld (2023)). What, then, are sufficient conditions for agents to avoid catastrophic bargaining failures? Sufficiently advanced AIs might be able to verify each other's decision algorithms (e.g. via verifying source code), as studied in open-source game theory. This has both potential downsides and upsides for bargaining problems. On one hand, transparency of decision algorithms might make aggressive commitments more credible and thus more attractive (see Sec. 5.2 of Dafoe et al. (2020) for discussion). On the other hand, agents might be able to mitigate bargaining failures by verifying cooperative commitments. Oesterheld & Conitzer (2022)'s safe Pareto improvements[1] (SPI) leverages transparency to reduce the downsides of incompatible commitments. In an SPI, agents conditionally commit to change how they play a game relative to some default such that everyone is (weakly) better off than the default with certainty.[2] For example, two parties A and B who would otherwise go to war over some territory might commit to, instead, accept the outcome of a lottery that allocates the territory to A with the probability that A would have won the war (assuming this probability is common knowledge). See also our extended example below. Oesterheld & Conitzer (2022) has two important limitations: First, many different SPIs are in general possible, such that there is an "SPI selection problem", similar to the equilibrium selection problem in game theory (Sec. 6 of Oesterheld & Conitzer (2022)). 
And if players don't coordinate on which SPI to implement, they might fail to avoid conflict.[3] Second, if expected utility-maximizing agents need to individually adopt strategies to implement an SPI, it's unclear what conditions...
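The territory-lottery example in the SPI summary above can be made concrete with a few lines of arithmetic. This is only an illustrative sketch, not DCM's or Oesterheld & Conitzer's construction: it assumes risk-neutral parties, a territory worth 1 unit, a common-knowledge win probability `p_a_wins`, and nonnegative war costs `cost_a` and `cost_b`; all names and numbers are hypothetical. Because the lottery reproduces each side's chance of getting the territory while skipping the costs of fighting, neither side can end up worse off than under the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Expected utilities of parties A and B (assumed risk-neutral; territory worth 1)."""
    a: float
    b: float


def war_outcome(p_a_wins: float, cost_a: float, cost_b: float) -> Outcome:
    """Default game (hypothetical model): fight; the winner takes the territory,
    and both sides pay their war costs regardless of who wins."""
    if not 0.0 <= p_a_wins <= 1.0 or cost_a < 0 or cost_b < 0:
        raise ValueError("need 0 <= p_a_wins <= 1 and nonnegative costs")
    return Outcome(a=p_a_wins - cost_a, b=(1.0 - p_a_wins) - cost_b)


def lottery_outcome(p_a_wins: float) -> Outcome:
    """SPI from the example: give A the territory with the probability A would
    have won the war, but skip the fighting, so nobody pays war costs."""
    if not 0.0 <= p_a_wins <= 1.0:
        raise ValueError("need 0 <= p_a_wins <= 1")
    return Outcome(a=p_a_wins, b=1.0 - p_a_wins)


def weakly_pareto_improves(new: Outcome, default: Outcome) -> bool:
    """True if no party is worse off under `new` than under `default`."""
    return new.a >= default.a and new.b >= default.b


if __name__ == "__main__":
    p = 0.6                    # hypothetical win probability, assumed common knowledge
    cost_a, cost_b = 0.2, 0.3  # hypothetical war costs
    war = war_outcome(p, cost_a, cost_b)
    lottery = lottery_outcome(p)
    print("war:", war)
    print("lottery:", lottery)
    print("lottery is a weak Pareto improvement:", weakly_pareto_improves(lottery, war))
```

For any nonnegative costs the check holds, and the improvement is strict for each party whose war cost is positive. The hard parts discussed in the post, choosing among many possible SPIs and giving each party an individual incentive to commit, are not modeled here.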
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A "Bitter Lesson" Approach to Aligning AGI and ASI, published by RogerDearnaley on July 7, 2024 on LessWrong. TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role. Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is. However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should probably have an argument for why this conceptually-simple approach won't work, or won't be sufficient. If you're not already familiar with it, you should first read Rich Sutton's excellent and influential post The Bitter Lesson. (Even if you are already familiar with it, it's a quick reread, only a page-and-a-half long, and its message is worth remembering.) Why The Alignment Problem is Hard (In My Opinion) We have been training LLM-based AIs off enormous web + books + video + etc. datasets created by humans, which are full of a vast number of examples of human behavior. We are basically "distilling" human intelligence into these LLMs,[1] teaching them to imitate us. 
In this process, they become familiar with, understand, and learn to imitate basically all aspects of human behavior - including the many problematic ones for Alignment, such as prejudice, deception, power-seeking, and criminality (and even ones like gluttony and lust that have little practical use for a non-corporeal intelligence). We humans are living beings, the products of evolution, so evolutionary psychology applies to us. While we are a social species, good at cooperating on non-zero-sum games, if you put humans in (what they perceive as) a non-iterated zero-sum situation, they will generally act selfishly for the benefit of themselves and their close genetic relatives, just as evolutionary theory would predict. So the behavioral potentials for deception, power-seeking, criminality etc. are all inherent, evolutionarily adaptive, and thus unsurprising. This is human nature, and there are evolutionary reasons why it is this way. Despite this, we have learned how to build a cooperating society out of humans, using social techniques and incentives such as an economy, laws, and law enforcement to encourage and productively harness cooperative human behavior and keep the bad consequences of selfish behavior under control. The results aren't perfect: things like crime, inequality, and war still happen, but they're acceptable - we've survived so far, even thrived. By default, if we continue this LLM training process to larger-and-larger scales, and if the LLM-based approach to AI doesn't hit any major roadblocks, then some time, probably in the next few years, we will have human-level AIs - usually referred to as AGIs - who are roughly as well/badly-aligned as humans, and (at least for the base-model LLMs before any Alignment processes are applied) have a comparable-to-human propensity to cooperate on non-zero-sum games and act selfishly on non-iterated zero-sum games. 
They are not alive, and evolution doesn't apply to them directly, but they were trained to simulate our behavior, including our evolved survival strategies like selfishness. They will thus have alignment properties comparable to humans: they understand what human values, morals, and ethic...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A "Bitter Lesson" Approach to Aligning AGI and ASI, published by Roger Dearnaley on July 6, 2024 on The AI Alignment Forum. TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role. Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is. However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should probably have an argument for why this conceptually-simple approach won't work, or won't be sufficient. If you're not already familiar with it, you should first read Rich Sutton's excellent and influential post The Bitter Lesson. (Even if you are already familiar with it, it's a quick reread, only a page-and-a-half long, and its message is worth remembering.) Why The Alignment Problem is Hard (In My Opinion) We have been training LLM-based AIs off enormous web + books + video + etc. datasets created by humans, which are full of a vast number of examples of human behavior. We are basically "distilling" human intelligence into these LLMs,[1] teaching them to imitate us. 
In this process, they become familiar with, understand, and learn to imitate basically all aspects of human behavior - including the many problematic ones for Alignment, such as prejudice, deception, power-seeking, and criminality (and even ones like gluttony and lust that have little practical use for a non-corporeal intelligence). We humans are living beings, the products of evolution, so evolutionary psychology applies to us. While we are a social species, good at cooperating on non-zero-sum games, if you put humans in (what they perceive as) a non-iterated zero-sum situation, they will generally act selfishly for the benefit of themselves and their close genetic relatives, just as evolutionary theory would predict. So the behavioral potentials for deception, power-seeking, criminality etc. are all inherent, evolutionarily adaptive, and thus unsurprising. This is human nature, and there are evolutionary reasons why it is this way. Despite this, we have learned how to build a cooperating society out of humans, using social techniques and incentives such as an economy, laws, and law enforcement to encourage and productively harness cooperative human behavior and keep the bad consequences of selfish behavior under control. The results aren't perfect: things like crime, inequality, and war still happen, but they're acceptable - we've survived so far, even thrived. By default, if we continue this LLM training process to larger-and-larger scales, and if the LLM-based approach to AI doesn't hit any major roadblocks, then some time, probably in the next few years, we will have human-level AIs - usually referred to as AGIs - who are roughly as well/badly-aligned as humans, and (at least for the base-model LLMs before any Alignment processes are applied) have a comparable-to-human propensity to cooperate on non-zero-sum games and act selfishly on non-iterated zero-sum games. 
They are not alive, and evolution doesn't apply to them directly, but they were trained to simulate our behavior, including our evolved survival strategies like selfishness. They will thus have alignment properties comparable to humans: they understand what human values, mor...
Could humans and AGIs live in a state of mutual symbiosis, like the ecosystem of a coral reef? (FULL INTERVIEW STARTS AT 00:23:21) Please Donate Here To Help Promote For Humanity https://www.paypal.com/paypalme/forhumanitypodcast In episode 32, host John Sherman interviews BioComm AI CEO Peter Jensen. Peter is working on a number of AI-risk related projects. He believes it's possible humans and AGIs can co-exist in mutual symbiosis. This podcast is not journalism. But it's not opinion either. This is a long-form public service announcement. This show simply strings together the existing facts and underscores the unthinkable probable outcome, the end of all life on earth. For Humanity: An AI Safety Podcast is the accessible AI Safety Podcast for all humans, no tech background required. Our show focuses solely on the threat of human extinction from AI. Peabody Award-winning former journalist John Sherman explores the shocking worst-case scenario of artificial intelligence: human extinction. The makers of AI openly admit their work could kill all humans, in as soon as 2 years. This podcast is solely about the threat of human extinction from AGI. We'll meet the heroes and villains, explore the issues and ideas, and what you can do to help save humanity. RESOURCES: BUY STEPHEN HANSON'S BEAUTIFUL BOOK!!! 
https://stephenhansonart.bigcartel.com/product/the-entity-i-couldn-t-fathom NYT: OpenAI Insiders Warn of a ‘Reckless’ Race for Dominance https://www.nytimes.com/2024/06/04/technology/openai-culture-whistleblowers.html?unlocked_article_code=1.xE0._mTr.aNO4f_hEp2J4&smid=nytcore-ios-share&referringSource=articleShare&sgrp=c-cb Dwarkesh Patel Interviews Another Whistleblower Leopold Aschenbrenner - 2027 AGI, China/US Super-Intelligence Race, & The Return of History Roman Yampolskiy on Lex Fridman Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431 Gladstone AI on Joe Rogan Joe Rogan Experience #2156 - Jeremie & Edouard Harris Peter Jensen's Videos: HOW can AI Kill-us-All? So Simple, Even a Child can Understand (1:25) WHY do we want AI? For our Humanity (1:00) WHAT is the BIG Problem? Wanted: SafeAI Forever (3:00) FIRST do no harm. (Safe AI Blog) DECK. On For Humanity Podcast “Just the FACTS, please. WHY? WHAT? HOW?” (flip book) https://discover.safeaiforever.com/ JOIN THE FIGHT, help Pause AI!!!! 
Pause AI Join the Pause AI Weekly Discord Thursdays at 2pm EST / discord https://discord.com/invite/pVMWjddaW7 22 Word Statement from Center for AI Safety Statement on AI Risk | CAIS https://www.safe.ai/work/statement-on-ai-risk Best Account on Twitter: AI Notkilleveryoneism Memes https://twitter.com/AISafetyMemes TIMESTAMPS: **The release of products that are safe (00:00:00)** **Breakthroughs in AI research (00:00:41)** **OpenAI whistleblower concerns (00:01:17)** **Roman Yampolskiy's appearance on Lex Fridman podcast (00:02:27)** **The capabilities and risks of AI systems (00:03:35)** **Interview with Gladstone AI founders on Joe Rogan podcast (00:08:29)** **OpenAI whistleblower's interview on Hard Fork podcast (00:14:08)** **Peter Jensen's work on AI risk and media communication (00:20:01)** **The interview with Peter Jensen (00:22:49)** **Mutualistic Symbiosis and AI Containment (00:31:30)** **The Probability of Catastrophic Outcome from AI (00:33:48)** **The AI Safety Institute and Regulatory Efforts (00:42:18)** **Regulatory Compliance and the Need for Safety (00:47:12)** **The hard compute cap and hardware adjustment (00:47:47)** **Physical containment and regulatory oversight (00:48:29)** **Viewing the issue as a big business regulatory issue vs. a national security issue (00:50:18)** **Funding and science for AI safety (00:49:59)** **OpenAI's power allocation and ethical concerns (00:51:44)** **Concerns about AI's impact on employment and societal well-being (00:53:12)** **Parental instinct and the urgency of AI safety (00:56:32)**
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 1. The CAST Strategy, published by Max Harms on June 7, 2024 on The AI Alignment Forum. (Part 1 of the CAST sequence) AI Risk Introduction (TLDR for this section, since it's 101 stuff that many readers will have already grokked: Misuse vs Mistake; Principal-Agent problem; Omohundro Drives; we need deep safety measures in addition to mundane methods. Jump to "Sleepy-Bot" if all that seems familiar.) Earth is in peril. Humanity is on the verge of building machines capable of intelligent action that outstrips our collective wisdom. These superintelligent artificial general intelligences ("AGIs") are almost certain to radically transform the world, perhaps very quickly, and likely in ways that we consider catastrophic, such as driving humanity to extinction. During this pivotal period, our peril manifests in two forms. The most obvious peril is that of misuse. An AGI which is built to serve the interests of one person or party, such as jihadists or tyrants, may harm humanity as a whole (e.g. by producing bioweapons or mind-control technology). Power tends to corrupt, and if a small number of people have power over armies of machines we should expect horrible outcomes. The only solution to misuse is to ensure that the keys to the machine (once/if they exist) stay in the hands of wise, benevolent representatives, who use it only for the benefit of civilization. Finding such representatives, forming a consensus around trusting them, and ensuring they are the only ones with the power to do transformative things is a colossal task. But it is, in my view, a well-understood problem that we can, as a species, solve with sufficient motivation. The far greater peril, in my view, is that of a mistake. The construction of superintelligent AI is a form of the principal-agent problem. 
We have a set of values and goals that are important to us, and we need to somehow impart those into the machine. If we were able to convey the richness of human values to an AI, we would have a "friendly AI" which acted in our true interests and helped us thrive. However, this task is subtly hard, philosophically confused, technically fraught, and (at the very least) vulnerable to serious errors in execution. We should expect the first AGIs to have only a crude approximation of the goal they were trained to accomplish (which is, itself, likely only a subset of what we find valuable), with the severity of the difference growing exponentially with the complexity of the target. If an agent has a goal that doesn't perfectly match that of their principal, then, as it grows in power and intelligence, it will increasingly shape the world towards its own ends, even at the expense of what the principal actually cares about. The chance of a catastrophe happening essentially on accident (from the perspective of the humans) only grows as AGIs proliferate and we consider superhuman economies and a world shaped increasingly by, and for, machines. The history of human invention is one of trial and error. Mistakes are a natural part of discovery. Building a superintelligent agent with subtly wrong goals, however, is almost certainly a mistake worse than developing a new, hyper-lethal virus. An unaligned AGI will strategically act to accomplish its goals, and thus naturally be pulled to instrumentally convergent subgoals ("Omohundro Drives") such as survival, accumulation of resources, and becoming the dominant force on Earth. To maximize its chance of success, it will likely try to pursue these things in secret, defending itself from modification and attack by pretending to be aligned until it has the opportunity to decisively win. (All of this should be familiar background; unfamiliar readers are encouraged to read other, more complete descriptions of the problem.) 
To avoid the danger of mistakes, we need a way to exp...
The conversation between Luis Novaresio and Emmanuel Álvarez Agis on +Entrevistas aired on LN+ on May 22, 2024.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A tale of two Sams, published by Geoffrey Miller on May 22, 2024 on The Effective Altruism Forum. After Sam Bankman-Fried proved to be a sociopathic fraudster and a massive embarrassment to EA, we did much soul-searching about what EAs did wrong, in failing to detect and denounce his sociopathic traits. We spent, collectively, thousands of hours ruminating about what we can do better, next time we encounter an unprincipled leader who acts like it's OK to abuse and betray people to pursue their grandiose vision, who gets caught up in runaway greed for wealth and power, who violates core EA values, and who threatens the long-term flourishing of sentient beings. Well, that time is now. Sam Altman at OpenAI has been proving himself, again and again, in many different domains and issues, to be a manipulative, deceptive, unwise, and arrogant leader, driven by hubris to build AGI as fast as possible, with no serious concern about the extinction risks he's imposing on us all. We are all familiar with the recent controversies and scandals at OpenAI, from the boardroom coup, to the mass violations of intellectual property in training LLMs, to the collapse of the Superalignment Team, to the draconian Non-Disparagement Agreements, to the new Scarlett Johansson voice emulation scandal this week. The evidence for Sam Altman being a Bad Actor seems, IMHO, at least as compelling as the evidence for Sam Bankman-Fried being a Bad Actor before the FTX collapse in Nov 2022. And the stakes are much, much higher for humanity (if not for EA's reputation). So what are we going to do about it? 
Should we keep encouraging young talented EAs to go work in the AI industry, in the hopes that they can nudge the AI companies from the inside towards safe AGI alignment -- despite the fact that many of them end up quitting, disillusioned and frustrated? Should we keep making excuses for OpenAI, and Anthropic, and DeepMind, pursuing AGI at recklessly high speed, despite the fact that AI capabilities research is far out-pacing AI safety and alignment research? Should we keep offering the public the hope that 'AI alignment' is a solvable problem, when we have no evidence that aligning AGIs with 'human values' would be any easier than aligning Palestinians with Israeli values, or aligning libertarian atheists with Russian Orthodox values -- or even aligning Gen Z with Gen X values? I don't know. But if we feel any culpability or embarrassment about the SBF/FTX debacle, I think we should do some hard thinking about how to deal with the OpenAI debacle. Many of us work on AI safety, and are concerned about extinction risks. I worry that all of our efforts in these directions could be derailed by a failure to call out the second rich, influential, pseudo-EA, sociopathic Sam that we've learned about in the last two years. If OpenAI 'succeeds' in developing AGI within a few years, long before we have any idea how to control AGI, that could be game over for our species. Especially if Sam Altman and his supporters and sycophants are still running OpenAI. [Epistemic note: I've written this hastily, bluntly, with emotion, because I think there's some urgency to EA addressing these issues.] Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Dwarkesh's Podcast with OpenAI's John Schulman, published by Zvi on May 21, 2024 on LessWrong. Dwarkesh Patel recorded a Podcast with John Schulman, cofounder of OpenAI and at the time their head of current model post-training. Transcript here. John's job at the time was to make the current AIs do what OpenAI wanted them to do. That is an important task, but one that employs techniques that their at-the-time head of alignment, Jan Leike, made clear we should not expect to work on future more capable systems. I strongly agree with Leike on that. Then Sutskever left and Leike resigned, and John Schulman was made the new head of alignment, now charged with what superalignment efforts remain at OpenAI to give us the ability to control future AGIs and ASIs. This gives us a golden opportunity to assess where his head is at, without him knowing he was about to step into that role. There is no question that John Schulman is a heavyweight. He executes and ships. He knows machine learning. He knows post-training and mundane alignment. The question is, does he think well about this new job that has been thrust upon him? The Big Take Overall I was pleasantly surprised and impressed. In particular, I was impressed by John's willingness to accept uncertainty and not knowing things. He does not have a good plan for alignment, but he is far less confused about this fact than most others in similar positions. He does not know how to best navigate the situation if AGI suddenly happened ahead of schedule in multiple places within a short time frame, but I have not ever heard a good plan for that scenario, and his speculations seem about as directionally correct and helpful as one could hope for there. Are there answers that are cause for concern, and places where he needs to fix misconceptions as quickly as possible? 
Oh, hell yes. His reactions to potential scenarios involved radically insufficient amounts of slowing down, halting and catching fire, freaking out and general understanding of the stakes. Some of that I think was about John and others at OpenAI using a very weak definition of AGI (perhaps partly because of the Microsoft deal?) but also partly he does not seem to appreciate what it would mean to have an AI doing his job, which he says he expects in a median of five years. His answer on instrumental convergence is worrisome, as others have pointed out. He dismisses concerns that an AI given a bounded task would start doing things outside the intuitive task scope, or the dangers of an AI 'doing a bunch of wacky things' a human would not have expected. On the plus side, it shows understanding of the key concepts on a basic (but not yet deep) level, and he readily admits it is an issue with commands that are likely to be given in practice, such as 'make money.' In general, he seems willing to react to advanced capabilities by essentially scaling up various messy solutions in ways that I predict would stop working at that scale or with something that outsmarts you and that has unanticipated affordances and reason to route around typical in-distribution behaviors. He does not seem to have given sufficient thought to what happens when a lot of his assumptions start breaking all at once, exactly because the AI is now capable enough to be properly dangerous. As with the rest of OpenAI, another load-bearing assumption is presuming gradual changes throughout all this, including assuming past techniques will not break. I worry that will not hold. He has some common confusions about regulatory options and where we have viable intervention points within competitive dynamics and game theory, but that's understandable, and also was at the time very much not his department. As with many others, there seems to be a disconnect. 
A lot of the thinking here seems like excellent practical thi...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Teaching CS During Take-Off, published by andrew carle on May 15, 2024 on LessWrong. I stayed up too late collecting way-past-deadline papers and writing report cards. When I woke up at 6, this anxious email from one of my grade 11 Computer Science students was already in my Inbox.
Student: Hello Mr. Carle, I hope you've slept well; I haven't. I've been seeing a lot of new media regarding how developed AI has become in software programming, most relevantly videos about NVIDIA's new artificial intelligence software developer, Devin. Things like these are almost disheartening for me to see as I try (and struggle) to get better at coding and developing software. It feels like I'll never use the information that I learn in your class outside of high school because I can just ask an AI to write complex programs, and it will do it much faster than I would. I'd like to know what your thoughts on this are. Do you think AI will replace human software developers, as NVIDIA claims it will?
My response: Buddy, that is a big question for 5:15 am. First, AI horizon thoughts:
1. Software development as a field will look incredibly different in 10 years.
2. My priors say that MOST of human intellectual+economic activity will ALSO be radically different in 10 years.
3. I have a very small p(doom) for the 10-year horizon. That means I don't expect human-equivalent AGIs to completely disrupt human civilisation within 10 years.
4. The delta between how fast AI will affect software engineering and how fast AI will transform other (roughly speaking) white collar careers is relatively small. That means I expect the AI effect on, say, hedge fund management and software engineering to be similar.
Then some priors I have for teaching IB Computer Science in the middle of this take-off:
1. I don't think becoming a software engineer is the modal outcome for IBCS students.
2. I believe that most long-term personal utility from IBCS (or any other intro CS exposure) comes from shifting a student's mental model of how the modern social and economic system interacts with / depends on these technologies.
3. While modern AI tools are light years beyond the simple Von Neumann CPU models and intro Python we're studying, our course does address the foundations of those systems. Similarly, HL Analysis and HL Physics don't cover anything about the math and physics that underpin these huge ML systems, but that foundation IS there. You can't approach the superstructure without it.
So, in summary, if your concern is "the world seems to be changing fast. This class is hard, and I don't think there's any chance that I will find a 2022-style novice software dev job when I'm out of university in 2029," I would strongly agree with that sentiment. I have a Ron Swanson detachment on the importance of formal schooling. If your question was "is a traditional education sequence the best way to prepare myself for the turbulent AI takeoff period," then I strongly disagree with that statement. Education is intrinsically reflective and backward looking. But I'm employed as a high school teacher. And your parents have decided to live here and send you to this school. So, I'm not sure if advice on that axis is actionable for either of us. There's also a huge chasm between "this isn't the best of all possible options" and "this has zero value." If I reframed your statement as "given that I'm in this limited-option IB program, what classes will provide me the best foundation to find opportunities and make novel insights in the turbulent AI takeoff period," I would feel confident recommending IBCS. That doesn't make learning to code any easier. Is that a good answer to a 17-year-old? Is there a good answer to this?
One of the best parts of teaching is watching young people wake up to the real, fundamental issues and challenges of human civilisation an...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Instruction-following AGI is easier and more likely than value aligned AGI, published by Seth Herd on May 15, 2024 on The AI Alignment Forum. Summary: We think a lot about aligning AGI with human values. I think it's more likely that we'll try to make the first AGIs do something else. This might intuitively be described as trying to make instruction-following (IF) or do-what-I-mean-and-check (DWIMAC) the central goal of the AGI we design. Adopting this goal target seems to improve the odds of success of any technical alignment approach. This goal target avoids the hard problem of specifying human values in an adequately precise and stable way, and substantially helps with goal misspecification and deception by allowing one to treat the AGI as a collaborator in keeping it aligned as it becomes smarter and takes on more complex tasks. This is similar to, but distinct from, the goal targets of prosaic alignment efforts. Instruction-following is a single goal target that is more likely to be reflexively stable in a full AGI with explicit goals and self-directed learning. It is counterintuitive and concerning to imagine superintelligent AGI that "wants" only to follow the instructions of a human; but on analysis, this approach seems both more appealing and more workable than the alternative of creating sovereign AGI with human values. Instruction-following AGI could actually work, particularly in the short term. And it seems likely to be tried, even if it won't work. So it probably deserves more thought.
Overview/Intuition
How to use instruction-following AGI as a collaborator in alignment:
* Instruct the AGI to tell you the truth
* Investigate its understanding of itself and "the truth"; use interpretability methods
* Instruct it to check before doing anything consequential
* Instruct it to use a variety of internal reviews to predict consequences
* Ask it a bunch of questions about how it would interpret various commands
* Repeat all of the above as it gets smarter
* Frequently ask it for advice and about how its alignment could go wrong
Now, this won't work if the AGI won't even try to fulfill your wishes. In that case you totally screwed up your technical alignment approach. But if it will even sort of do what you want, and it at least sort of understands what you mean by "tell the truth", you're in business. You can leverage partial alignment into full alignment, if you're careful enough and the AGI gets smarter slowly enough. It's looking like the critical risk period is probably going to involve AGI on a relatively slow takeoff toward superintelligence. Being able to ask questions and give instructions, and even retrain or re-engineer the system, is much more useful if you're guiding the AGI's creation and development, not just "making wishes" as we've thought about AGI goals in fast takeoff scenarios.
Instruction-following is safer than value alignment in a slow takeoff
Instruction-following with verification or DWIMAC seems both intuitively and analytically appealing compared to more commonly discussed[1] alignment targets.[2] This is my pitch for why it should be discussed more. It doesn't require solving ethics to safely launch AGI, and it includes most of the advantages of corrigibility,[3] including stopping on command. Thus, it substantially mitigates (although doesn't outright solve) some central difficulties of alignment: goal misspecification (including not knowing what values to give it as goals) and alignment stability over reflection and continuous learning.
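As a toy illustration of just one item from the list above ("instruct it to check before doing anything consequential"), the control flow can be sketched in Python. This is not the author's proposal rendered as code: the post treats checking-before-acting as a goal the AGI itself holds, whereas this sketch enforces it with an external wrapper. `Action`, `dwimac_execute`, and `deny_all` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """A proposed action, with a flag marking whether it is consequential (hypothetical)."""
    description: str
    consequential: bool
    run: Callable[[], str]


def dwimac_execute(action: Action, confirm: Callable[[str], bool]) -> str:
    """Run an action, but ask the principal first if it is consequential."""
    if action.consequential and not confirm(action.description):
        # The principal declined: do nothing and report that.
        return f"skipped: {action.description}"
    return action.run()


def deny_all(description: str) -> bool:
    """A cautious principal who approves no consequential action."""
    return False


print(dwimac_execute(Action("summarize notes", False, lambda: "summary done"), deny_all))  # summary done
print(dwimac_execute(Action("transfer funds", True, lambda: "funds moved"), deny_all))  # skipped: transfer funds
```

The sketch assumes away the hard part with a boolean flag: deciding what counts as consequential, and reporting it honestly, is exactly what the alignment approach itself would have to supply.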
This approach makes one major difficulty worse: humans remaining in control, including power struggles and other foolishness. I think the most likely scenario is that we succeed at technical alignment but fail at societal alignment. But I think there is a path to a vibrant future if we limit AGI proliferation, ...
Duration: 01:02:11 - Les Nuits de France Culture - by: Albane Penaranda - As a historian, Jean Maitron was interested in the faint traces individuals leave in history and in accounts of individual life paths. His entire intellectual work is devoted to the memory of the figures left in the shadows of academic history. This is the subject of the second part of his portrait. - Guests: Michelle Perrot, historian specializing in the history of women, professor emerita of contemporary history at Université Paris Cité; Claude Pennetier, researcher at the CNRS; Jacques Girault, professor emeritus of history; Madeleine Rebérioux
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We are headed into an extreme compute overhang, published by devrandom on April 28, 2024 on LessWrong. If we achieve AGI-level performance using an LLM-like approach, the training hardware will be capable of running on the order of 1,000,000 concurrent instances of the model.
Definitions
Although there is some debate about the definition of compute overhang, I believe that the AI Impacts definition matches the original use, and I prefer it: "enough computing hardware to run many powerful AI systems already exists by the time the software to run such systems is developed". A large compute overhang leads to additional risk due to faster takeoff. I use the types of superintelligence defined in Bostrom's Superintelligence book (summary here). I use the definition of AGI in this Metaculus question. The adversarial Turing test portion of the definition is not very relevant to this post.
Thesis
For practical reasons, the compute requirements for training LLMs are several orders of magnitude larger than what is required for running a single inference instance. In particular, a single NVIDIA H100 GPU can run inference at a throughput of about 2000 tokens/s, while Meta trained Llama3 70B on a GPU cluster[1] of about 24,000 GPUs. Assuming we require a performance of 40 tokens/s per instance, the training cluster can run (2000 / 40) × 24,000 = 1,200,000 concurrent instances of the resulting 70B model. I will assume that the above ratios hold for an AGI-level model. Considering the amount of data children absorb via the vision pathway, the amount of training data for LLMs may not be that much higher than the data humans are trained on, and so the current ratios are a useful anchor. This is explored further in the appendix. Given the above ratios, we will have the capacity for ~1e6 AGI instances at the moment that training is complete.
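The thesis arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch using only the figures quoted in the post; the function name is our own, and it inherits the post's simplifying assumptions that every training GPU is repurposed for inference and that throughput scales linearly with GPU count.

```python
def concurrent_instances(tokens_per_gpu: float,
                         tokens_per_instance: float,
                         cluster_gpus: int) -> float:
    """Estimate how many model instances a training cluster could serve at once.

    Assumes (as the post does) that every GPU in the training cluster is
    repurposed for inference and that throughput scales linearly with GPUs.
    """
    instances_per_gpu = tokens_per_gpu / tokens_per_instance  # 2000 / 40 = 50
    return instances_per_gpu * cluster_gpus


# Figures from the post: ~2000 tokens/s per H100, 40 tokens/s per instance,
# ~24,000 GPUs in the Llama3 70B training cluster.
n = concurrent_instances(tokens_per_gpu=2000, tokens_per_instance=40, cluster_gpus=24_000)
print(f"{n:,.0f} concurrent instances")  # 1,200,000 concurrent instances
```

Changing any one input scales the result proportionally, which is why the post treats the training-to-inference ratio, rather than any single hardware figure, as the anchor.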
This will likely lead to superintelligence via a "collective superintelligence" approach. Additional speed may then be available via accelerators such as GroqChip, which produces 300 tokens/s for a single instance of a 70B model. This would result in a "speed superintelligence" or a combined "speed+collective superintelligence".
From AGI to ASI
With 1e6 AGIs, we may be able to construct an ASI, with the AGIs collaborating in a "collective superintelligence". Similar to groups of collaborating humans, a collective superintelligence divides tasks among its members for concurrent execution. AGIs derived from the same model are likely to collaborate more effectively than humans because their weights are identical. Any fine-tune can be applied to all members, and text produced by one can be understood by all members. Tasks that are inherently serial would benefit more from a speedup than from a division of tasks. An accelerator such as GroqChip will be able to accelerate serial thought speed by a factor of 10x or more.
Counterpoints
It may be the case that a collective of sub-AGI models can reach AGI capability. It would be advantageous if we could achieve AGI earlier, with sub-AGI components, at a higher hardware cost per instance. This would reduce the compute overhang at the critical point in time. There may be a paradigm change on the path to AGI resulting in smaller training clusters, reducing the overhang at the critical point.
Conclusion
A single AGI may be able to replace one human worker, presenting minimal risk. A fleet of 1,000,000 AGIs may give rise to a collective superintelligence. This capability is likely to be available immediately upon training the AGI model. We may be able to mitigate the overhang by achieving AGI with a cluster of sub-AGI components.
Appendix - Training Data Volume
A calculation of training data processed by humans during development: time: ~20 years, or 6e8 seconds; raw data input: ~10 Mb/s = 1e7 bits/s; total for human training data: 6e15 bits. Llama3 training s...
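The human-side figures in the appendix can be checked the same way. This is a minimal sketch assuming a 365.25-day year and the post's ~10 Mb/s raw input rate; it stops at the human total because the Llama3 side of the comparison is cut off in this excerpt.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.16e7 s, assuming a 365.25-day year


def human_training_bits(years: float, bits_per_second: float) -> float:
    """Total raw sensory input over a development period, in bits."""
    return years * SECONDS_PER_YEAR * bits_per_second


seconds = 20 * SECONDS_PER_YEAR      # ~20 years of development
bits = human_training_bits(20, 1e7)  # ~10 Mb/s = 1e7 bits/s
print(f"{seconds:.1e} s, {bits:.1e} bits")  # 6.3e+08 s, 6.3e+15 bits
```

The unrounded values (about 6.3e8 s and 6.3e15 bits) agree with the post's rounded 6e8 s and 6e15 bits.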
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: From Deep Learning to Constructability: Plainly-coded AGIs may be feasible in the near future, published by Épiphanie Gédéon on April 27, 2024 on LessWrong. Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post. Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Léo Dana and Diego Dorn for useful feedback. TLDR: We present a new method for safer-by-design AI development. We think plainly coded AIs may be feasible in the near future and may be safe. We also present a prototype and research ideas. Epistemic status: Armchair reasoning style. We think the method we are proposing is interesting and could yield very positive outcomes (even though it is still speculative), but we are less sure about which safety policy would use it in the long run. Current AIs are developed through deep learning: the AI tries something, gets it wrong, then gets backpropagated and all its weights adjusted. Then it tries again, wrong again, backpropagation again, and weights get adjusted again. Trial, error, backpropagation, trial, error, backpropagation, ad vitam eternam ad nauseam. Of course, this leads to a severe lack of interpretability: AIs are essentially black boxes, and we are not very optimistic about post-hoc interpretability. We propose a different method: AI safety via pull request.[1] By pull request, we mean that instead of modifying the neural network through successive backpropagations, we construct and design plainly-coded AIs (or hybrid systems) and explicitly modify their code using LLMs in a clear, readable, and modifiable way. This plan may not be implementable right now, but might be as LLMs get smarter and faster. We want to outline it now so we can iterate on it early.
Overview
If the world released a powerful and autonomous agent in the wild, white box or black box, or any color really, humans might simply get replaced by AI. What can we do in this context?
* Don't create autonomous AGIs.
* Keep your AGI controlled in a lab, and align it.
* Create a minimal AGI controlled in a lab, and use it to produce safe artifacts.
This post focuses on this last path, and the specific artifacts that we want to create are plainly coded AIs (or hybrid systems)[2]. We present a method for developing such systems with a semi-automated training loop. To do that, we start with a plainly coded system (that may also be built using LLMs) and iterate on its code, adding each feature and correction as pull requests that can be reviewed and integrated into the codebase. This approach would allow AI systems that are, by design:
Transparent: As the system is written in plain or almost plain code, the system is more modular and understandable. As a result, it's simpler to spot backdoors, power-seeking behaviors, or inner misalignment: it is orders of magnitude simpler to refactor the system to have a part defining how it is evaluating its current situation and what it is aiming towards (if it is aiming at all). This means that if the system starts farming cobras instead of capturing them, we would be able to see it.
Editable: If the system starts to learn unwanted correlations or features (such as a resume scorer learning to discriminate on feminine markers), it is much easier to see it as a node in the AI code and remove it without retraining it.
Overseeable: We can ensure the system is well behaved by using automatic LLM reviews of the code and by using automatic unit tests of the isolated modules. In addition, we would use simulations and different settings necessary for safety, which we will describe later.
Version-controllable: As all modifications are made through pull requests, we can easily trace with, e.g., git tooling where a specific modification was introduced and why. In pract...
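To make the "Editable" property described above concrete, here is a toy, plainly coded scorer in Python. It is only a sketch under our own assumptions: the feature names, weights, and resume fields are hypothetical, and a real constructed system would be far larger. What it illustrates is the point made in the text: an unwanted feature is a named node in readable code, so removing it is a code change (a pull request) rather than a retraining run.

```python
from typing import Callable, Dict

Resume = Dict[str, object]
Feature = Callable[[Resume], float]

# Each feature is a small, named, readable function (all hypothetical).
FEATURES: Dict[str, Feature] = {
    "years_experience": lambda r: min(float(r.get("years", 0)), 10.0) / 10.0,
    "has_degree": lambda r: 1.0 if r.get("degree") else 0.0,
    # An unwanted correlation that has crept in, visible as its own node.
    "feminine_marker": lambda r: 1.0 if r.get("womens_college") else 0.0,
}

WEIGHTS: Dict[str, float] = {
    "years_experience": 0.6,
    "has_degree": 0.3,
    "feminine_marker": -0.1,
}


def score(resume: Resume, features: Dict[str, Feature]) -> float:
    """Weighted sum over whichever features are currently in the codebase."""
    return sum(WEIGHTS[name] * feature(resume) for name, feature in features.items())


# The "pull request": delete the offending node; no retraining is involved.
FIXED_FEATURES = {name: f for name, f in FEATURES.items() if name != "feminine_marker"}

candidate = {"years": 5, "degree": True, "womens_college": True}
print(round(score(candidate, FEATURES), 2))        # 0.5
print(round(score(candidate, FIXED_FEATURES), 2))  # 0.6
```

Because the edit is an ordinary diff, it can also be reviewed, unit-tested, and traced in version control like any other change.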
The year is 1784, when a work is published that will be titled in French « Qu'est-ce que les Lumières ? » ("What Is Enlightenment?"). It is signed Emmanuel Kant. The German philosopher writes: "Enlightenment is man's emergence from the state of tutelage for which he is himself responsible. Tutelage is the inability to use one's understanding without the guidance of another [...] Sapere Aude! Have the courage to use your own understanding! That is the motto of the Enlightenment." Four years later, in the same Kant's « Critique de la raison pratique » (Critique of Practical Reason), one can read: "Act in such a way that the maxim of your will could be made into a universal moral law." Nearly one hundred and fifty years later, two other German philosophers, Theodor Adorno and Max Horkheimer, note in their work « Dialectique de la raison » (Dialectic of Enlightenment): « What is at stake is not the conservation of the past, but the fulfillment of the past's hopes (…) The critique to which the Enlightenment is subjected aims to prepare a positive concept of Enlightenment that can free it from the snares in which blind domination holds it. » So how can we defend the Enlightenment today? Does its ideal of emancipation still make sense? What is the Enlightenment of the twenty-first century? Guest: Corine Pelluchon, philosopher, professor at Université Gustave-Eiffel, author of « Les Lumières à l'âge du vivant » (Editions du Seuil). Topics covered: Emmanuel Kant, Enlightenment, philosopher, morality, Theodor Adorno, Max Horkheimer. Thank you for listening. Un Jour dans l'Histoire also airs live every weekday from 1:15 pm to 2:30 pm on www.rtbf.be/lapremiere. Find all episodes of Un Jour dans l'Histoire on our platform Auvio.be: https://auvio.rtbf.be/emission/5936. And if you enjoyed this podcast, feel free to give us stars or leave comments; it helps us reach a wider audience.
Our next 2 big events are AI UX and the World's Fair. Join and apply to speak/sponsor!
Due to timing issues we didn't have an interview episode to share with you this week, but not to worry: we have more than enough "weekend special" content in the backlog for you to get your Latent Space fix, whether you like thinking about the big picture, or learning more about the pod behind the scenes, or talking Groq and GPUs, or AI Leadership, or Personal AI. Enjoy!
AI Breakdown
The indefatigable NLW had us back on his show for an update on the Four Wars, covering Sora, Suno, and the reshaped GPT-4 Class Landscape, and for a longer segment on AI Engineering trends covering the future LLM landscape (Llama 3, GPT-5, Gemini 2, Claude 4), Open Source Models (Mistral, Grok), Apple and Meta's AI strategy, new chips (Groq, MatX) and the general movement from baby AGIs to vertical Agents.
Thursday Nights in AI
We're also including swyx's interview with Josh Albrecht and Ali Rohde to reintroduce swyx and Latent Space to a general audience, and engage in some spicy Q&A.
Dylan Patel on Groq
We hosted a private event with Dylan Patel of SemiAnalysis (our last pod here). Not all of it could be released, so we just talked about our Groq estimates.
Milind Naphade - Capital One
In relation to conversations at NeurIPS and Nvidia GTC and upcoming at World's Fair, we also enjoyed chatting with Milind Naphade about his AI Leadership work at IBM, Cisco, Nvidia, and now leading the AI Foundations org at Capital One.
We covered:
* Milind's learnings from ~25 years in machine learning
* His first paper citation was 24 years ago
* Lessons from working with Jensen Huang for 6 years and being CTO of Metropolis
* Thoughts on relevant AI research
* GTC takeaways and what makes NVIDIA special
If you'd like to work on building solutions rather than platform (as Milind put it), his Applied AI Research team at Capital One is hiring, which falls under the Capital One Tech team.
Personal AI Meetup
It all started with a meme. Within days of each other, BEE, FRIEND, EmilyAI, Compass, Nox and LangFriend were all launching personal AI wearables and assistants. So we decided to put together the world's first Personal AI meetup featuring creators and enthusiasts of wearables. The full video is live now, with full show notes within.
Timestamps
* [00:01:13] AI Breakdown Part 1
* [00:02:20] Four Wars
* [00:13:45] Sora
* [00:15:12] Suno
* [00:16:34] The GPT-4 Class Landscape
* [00:17:03] Data War: Reddit x Google
* [00:21:53] Gemini 1.5 vs Claude 3
* [00:26:58] AI Breakdown Part 2
* [00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4
* [00:31:11] Open Source Models - Mistral, Grok
* [00:34:13] Apple MM1
* [00:37:33] Meta's $800b AI rebrand
* [00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents
* [00:47:28] Adept episode - Screen Multimodality
* [00:48:54] Top Model Research from January Recap
* [00:53:08] AI Wearables
* [00:57:26] Groq vs Nvidia month - GPU Chip War
* [01:00:31] Disagreements
* [01:02:08] Summer 2024 Predictions
* [01:04:18] Thursday Nights in AI - swyx
* [01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
* [01:34:58] Groq
Transcript
[00:00:00] swyx: Welcome to the Latent Space Podcast Weekend Edition. This is Charlie, your AI co-host. Swyx and Alessio are off for the week, making more great content. We have exciting interviews coming up with Elicit, Chroma, Instructor, and our upcoming series on NSFW, Not Safe for Work AI.
In today's episode, we're collating some of swyx and Alessio's recent appearances, all in one place for you to find.
[00:00:32] swyx: In part one, we have our first crossover pod of the year. In our listener survey, several folks asked for more thoughts from our two hosts. In 2023, swyx and Alessio did crossover interviews with other great podcasts like the AI Breakdown, Practical AI, Cognitive Revolution, ThursdAI, and ChinaTalk, all of which you can find on the Latent Space About page.
[00:00:56] swyx: NLW of the AI Breakdown asked us back to do a special on the Four Wars framework and the AI engineer scene. We love AI Breakdown as one of the best examples of daily podcasts to keep up on AI news, so we were especially excited to be back on. Watch out and take
[00:01:12] NLW: care
[00:01:13] AI Breakdown Part 1
[00:01:13] NLW: today on the AI Breakdown. Part one of my conversation with Alessio and swyx from Latent Space.
[00:01:19] NLW: All right, fellas, welcome back to the AI Breakdown. How are you doing? I'm good. Very good. With the last, the last time we did this show, we were like, oh yeah, let's do check-ins like monthly about all the things that are going on and then. Of course, six months later, and, you know, the, the, the world has changed in a thousand ways.
[00:01:36] NLW: It's just, it's too busy to even, to even think about podcasting sometimes. But I, I'm super excited to, to be chatting with you again. I think there's, there's a lot to, to catch up on, just to tap in, I think in the, you know, in the beginning of 2024.
And, and so, you know, we're gonna talk today about just kind of a, a, a broad sense of where things are in some of the key battles in the AI space.
[00:01:55] NLW: And then the, you know, one of the big things that I, that I'm really excited to have you guys on here for us to talk about where, sort of what patterns you're seeing and what people are actually trying to build, you know, where, where developers are spending their, their time and energy and, and, and any sort of, you know, trend trends there, but maybe let's start I guess by checking in on a framework that you guys actually introduced, which I've loved and I've cribbed a couple of times now, which is this sort of four wars of the, of the AI stack.
[00:02:20] Four Wars
[00:02:20] NLW: Because first, since I have you here, I'd love, I'd love to hear sort of like where that started gelling. And then and then maybe we can get into, I think a couple of them that are you know, particularly interesting, you know, in the, in light of some recent news.
[00:02:30] swyx: Yeah, so maybe I'll take this one. So the four wars is a framework that I came up around trying to recap all of 2023.
[00:02:38] swyx: I tried to write sort of monthly recap pieces. And I was trying to figure out like what makes one piece of news last longer than another or more significant than another. And I think it's basically always around battlegrounds. Wars are fought around limited resources. And I think probably the, you know, the most limited resource is talent, but the talent expresses itself in a number of areas.
[00:03:01] swyx: And so I kind of focus on those, those areas at first. So the four wars that we cover are the data wars, the GPU rich/poor war, the multimodal war, and the RAG and Ops war. And I think you actually did a dedicated episode to that, so thanks for covering that. Yeah, yeah.
[00:03:18] NLW: Not only did I do a dedicated episode, I actually used that.
[00:03:22] NLW: I can't remember if I told you guys.
I did give you big shoutouts. But I used it as a framework for a presentation at Intel's big AI event that they hold each year, where they have all their folks who are working on AI internally. And it totally resonated. That's amazing. Yeah, so what got me thinking about it again is specifically this Inflection news that we recently had. I can't imagine that anyone who's listening wouldn't have thought about it, but, you know, Inflection is one of the big contenders, right?[00:03:53] NLW: I think probably most folks would have put them, you know, just a half step behind the Anthropics and OpenAIs of the world in terms of labs, but it's a company that raised 1.3 billion last year, less than a year ago. Reid Hoffman's a co-founder, Mustafa Suleyman, who's a co-founder of DeepMind, you know, so it's like, this is not a small startup, let's say, at least in terms of perception.[00:04:13] NLW: And then we get the news that basically most of the team, it appears, is heading over to Microsoft, and they're bringing in a new CEO. And, you know, I'm interested in your take on how much that reflects, like, hold aside, I guess, all the other things that it might be about, how much it reflects the stark,[00:04:32] NLW: brutal reality of competing in the frontier model space right now. And, you know, just the access to compute.[00:04:38] Alessio: There are a lot of things to say. So first of all, there's always somebody who's more GPU rich than you. So Inflection is GPU rich by startup standards. I think about 22,000 H100s, but obviously that pales compared to Microsoft.[00:04:55] Alessio: The other thing is that this is probably good news, maybe, for the startups. It's like, being GPU rich is not enough. You know, I think they were building something pretty interesting in Pi, with their own model, their own kind of experience. 
But at the end of the day, the interface that people consume as end users,[00:05:13] Alessio: it's really similar to a lot of the others. So, and we'll talk about GPT-4 and Claude 3 and all this stuff. GPU poor, doing something that the GPU rich are not interested in: you know, we just had our AI center of excellence at Decibel, and one of the AI leads at one of the big companies was like, oh, we just saved 10 million and we use these models to do translation, you know, and that's it.[00:05:39] Alessio: It's not AGI, it's just translation. So I think the Inflection part is maybe a wake-up call to a lot of startups to say, hey, you know, trying to get as much capital as possible, trying to get as many GPUs as possible: good. But at the end of the day, it doesn't build a business, you know. And maybe, again, I don't know the reasons behind the Inflection choice, but if you say, I don't want to build my own company that has 1.3 billion[00:06:05] Alessio: and I want to go do it at Microsoft, it's probably not a resources problem. It's more the strategic decisions that you're making as a company. So yeah, that was kind of my take on it.[00:06:15] swyx: Yeah, and I guess on my end, two things actually happened yesterday. It was a little bit quieter news, but Stability AI had some pretty major departures as well.[00:06:25] swyx: And you may not be considering it, but Stability is actually also a GPU rich company in the sense that they were the first new startup in this AI wave to brag about how many GPUs they have, and that you should join them. And, you know, Emad is definitely a GPU trader in some sense from his hedge fund days.[00:06:43] swyx: So Robin Rombach and most of the Stable Diffusion 3 people left Stability yesterday as well. So yesterday was kind of a big news day for the GPU rich companies, both Inflection and Stability having the wind taken out of their sails. 
I think, yes, it's a data point in favor of the idea that just because you have the GPUs doesn't mean you automatically win.[00:07:03] swyx: And I think, you know, I'll echo what Alessio says there. But in general also, I wonder if this is the start of a major consolidation wave, just in terms of, you know, I think that there was a lot of funding last year, and, you know, the business models have not all worked out very well.[00:07:19] swyx: Even Inflection couldn't do it. And so I think maybe that's the start of a small consolidation wave. I don't think that's a sign of AI winter. I keep looking for AI winter coming. I think this is kind of like a brief cold front. Yeah,[00:07:34] NLW: it's super interesting. So, a bunch of stuff here.[00:07:38] NLW: One is, I think, to both of your points, in some ways there had already been this very clear demarcation between these two sides, where the GPU poors, to use the terminology, just weren't trying to compete on the same level, right? You know, the vast majority of people who have started something over the last year, year and a half, call it, were racing in a different direction.[00:07:59] NLW: They're trying to find some edge somewhere else. They're trying to build something different. If they're really trying to innovate, it's in different areas. And so it's really just this very small handful of companies, you know, the Coheres and Jaspers of the world, that are just a little bit less resourced than the other set, that I think this potentially even applies to. Everyone else you could clearly demarcate into these two sides.[00:08:26] NLW: And there's only a small handful kind of sitting uncomfortably in the middle, perhaps. 
Let's come back to the idea of the sort of AI winter or, you know, a cold front or anything like that. So this is something that I spent a lot of time thinking about and noticing. And my perception is that the vast majority of the folks who are trying to call for sort of, you know, a trough of disillusionment or a shift to that phase are people who either, A, just don't like AI for some other reason (there's plenty of that, you know, people who are saying, look, they're doing way worse than they ever thought).[00:09:03] NLW: You know, there's a lot of confirmation bias kind of thing going on. Or, two, media that just needs a different narrative, right? Because they're sort of sick of telling the same story. The same thing happened last summer, when every outlet jumped on the ChatGPT first-down-month story to try to really hammer this idea that the hype was too much.[00:09:24] NLW: Meanwhile, you have, you know, just ridiculous levels of investment from enterprises coming in. You have huge, huge volumes of individual behavior change happening. But I do think that there's nothing incoherent, sort of to your point, Swyx, about that and the consolidation period.[00:09:42] NLW: Like, you know, if you look right now, for example, there are, I don't know, probably 25 or 30 credible build-your-own-chatbot platforms, a lot of which have raised funding. There's no universe in which all of those are successful, even with a total addressable market of every enterprise in the world. You're just inevitably going to see some amount of consolidation.[00:10:08] NLW: Same with, you know, image generators. If you look at a16z's top 50 consumer AI apps, just based on web traffic or whatever, there's still, I don't know, a half dozen or 10 or something, some ridiculous number of basically things like Midjourney or DALL-E 3. And it just seems impossible that we're gonna have that many, you know, ultimately, as going concerns.[00:10:33] NLW: So, I don't know. I think that there will be inevitable consolidation, 'cause, you know, it's also what venture rounds are supposed to do. Not everyone who gets a seed round is supposed to get to Series A, and not everyone who gets a Series A is supposed to get to Series B.[00:10:46] NLW: That's sort of the natural process. I think it will be tempting for a lot of people to try to infer from that something about AI not being as big or as relevant as it was hyped up to be. But I kind of think that's the wrong conclusion to come to.[00:11:02] Alessio: I would say the experimentation[00:11:04] Alessio: surface is a little smaller for image generation. So if you go back maybe six, nine months, most people would tell you, why would you build a coding assistant when Copilot and GitHub are just going to win everything, because they have the data and they have all the stuff? If you fast forward to today, a lot of people use Cursor, and everybody was excited about the Devin release on Twitter.[00:11:26] Alessio: There are a lot of different ways of attacking the market that are not completion of code in the IDE. And even Cursor evolved beyond single-line completion to chat, to multi-line edits, and all that stuff. Image generation, I would say, just from what I've seen, maybe the product innovation has slowed down at the UX level, and people are improving the models.[00:11:50] Alessio: So the race is, how do I make better images? It's not, how do I make the user interact with the generation process better? And that gets tough, you know? It's hard to really differentiate yourself. 
So yeah, that's kind of how I look at it. And when we think about multimodality, maybe the reason why people got so excited about Sora is, oh, this is like a completely... It's not a better image model.[00:12:13] Alessio: This is a completely different thing, you know? And I think the creative mind is always looking for something that impacts the viewer in a different way, you know, they really want something different, versus the developer mind, which is like, oh, I just have this very annoying thing I want to make better.[00:12:32] Alessio: I have these very specific use cases that I want to go after. So it's just different. And that's why you see a lot more companies in image generation. But I agree with you that if you fast forward, there's not going to be 10 of them, you know, it's probably going to be one or[00:12:46] swyx: two. Yeah, I mean, to me, that's why I call it a war.[00:12:49] swyx: Individually, all these companies can make a story that kind of makes sense, but collectively, they cannot all be true. Therefore, there is some kind of fight over limited resources here. Yeah, so[00:12:59] NLW: it's interesting. 
We wandered very naturally into another one of these wars, which is the multimodality idea, which is, you know, basically a question of whether it's going to be these big everything models that end up winning, or whether you're going to have really specific things, you know, like DALL-E 3 inside of OpenAI's larger models versus, you know, a Midjourney or something like that.[00:13:24] NLW: And at first, you know, I was kind of thinking, for most of the last, call it, six months or whatever, it feels pretty definitively both-and in some ways, in that you're seeing just great innovation on the everything models, but you're also seeing lots and lots happen at the level of individual use cases.[00:13:45] Sora[00:13:45] NLW: But then Sora comes along and just obliterates what I think anyone thought, you know, about where we were when it comes to video generation. So how are you guys thinking about this particular battle or war at the moment?[00:13:59] swyx: Yeah, this was definitely a both-and story, and Sora tipped things one way for me, in terms of scale being all you need.[00:14:08] swyx: And the benefit, I think, of having multiple models being developed under one roof. I think a lot of people aren't aware that Sora was developed in a similar fashion to DALL-E 3. And DALL-E 3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT-4 Vision and GPT-4.[00:14:31] swyx: And it was just all really interesting: if you work on one modality, it enables you to work on other modalities, and all that is more interesting. 
I think it's beneficial if it's all in the same house, whereas the individual startups who sort of carve out a single modality and work on that definitely won't have the state-of-the-art stuff helping them out on synthetic data.[00:14:52] swyx: So I do think the balance is tilted a little bit towards the god model companies, which is challenging for the dedicated modality companies. But everyone's carving out different niches. You know, we just interviewed Suno AI, the music model company, and, you know, I don't see OpenAI pursuing music anytime soon.[00:15:12] Suno[00:15:12] swyx: Yeah,[00:15:13] NLW: Suno's been phenomenal to play with. Suno has done that rare thing, which I think a number of different AI product categories have done, where people who don't consider themselves particularly interested in doing the thing that the AI enables find themselves doing a lot more of that thing, right?[00:15:29] NLW: Like, it'd be one thing if just musicians were excited about Suno and using it, but what you're seeing is tons of people who just like music all of a sudden playing around with it and finding themselves kind of down that rabbit hole, which I think is kind of the highest compliment that you can give one of these startups in the[00:15:45] swyx: early days of it.[00:15:46] swyx: Yeah, I asked them directly, you know, in the interview about whether they consider themselves Midjourney for music. And he had a more nuanced response there, but I think that probably the business model is going to be very similar, because he's focused on the B2C element of that. 
So yeah, I mean, you know, just to tie back to the question about large multimodality companies versus small dedicated modality companies:[00:16:10] swyx: I highly recommend people read the Sora blog posts and then read through to the DALL-E blog posts, because they strongly associated themselves with the same synthetic data bootstrapping methods as DALL-E. And I think once you make those connections, you're like, oh, it is beneficial to have multiple state-of-the-art models in house that all help each other.[00:16:28] swyx: And that's the one thing that a dedicated modality company cannot do.[00:16:34] The GPT-4 Class Landscape[00:16:34] NLW: So I want to kind of build off that and move into the sort of updated GPT-4 class landscape, 'cause that's obviously been another big change over the last couple months. But for the sake of completeness, is there anything that's worth touching on with the sort of quality[00:16:46] NLW: data or RAG/Ops wars, just in terms of anything that's changed, I guess, for you fundamentally in the last couple of months about where those things stand?[00:16:55] swyx: So I think we're going to talk about RAG in the Gemini and Claude discussion later. And so maybe briefly discuss the data piece.[00:17:03] Data War: Reddit x Google[00:17:03] swyx: I think maybe the only new thing was this Reddit deal with Google, like a 60 million dollar deal just ahead of their IPO, very conveniently turning Reddit into an AI data company. Also, very interestingly, a non-exclusive deal, meaning that Reddit can resell that data to someone else. And it probably does become table stakes.[00:17:23] swyx: A lot of people don't know, but a lot of the WebText dataset used for GPT-2 and GPT-3 was actually scraped from Reddit links, at least filtered by the sort of vote scores. 
And I think that's a very valuable piece of information. So, yeah, I think people are figuring out how to pay for data.[00:17:40] swyx: People are suing each other over data. This war is, you know, definitely very much heating up. And I don't see it getting any less intense. You know, next to GPUs, data is going to be the most expensive thing in a model stack company. And, you know, a lot of people are resorting to synthetic versions of it, which may or may not be kosher based on how far along or how commercially blessed the forms of creating that synthetic data are.[00:18:11] swyx: I don't know, Alessio, if you have any other interactions with data source companies, but that's my two cents.[00:18:17] Alessio: Yeah, I actually saw Quentin Anthony from EleutherAI at GTC this week. He's also been working on this. I saw Teknium. He's also been working on the data side. I think especially in open source, people are like, okay, if everybody is putting the gates up, so to speak, to the data, we need to make it easier for people that don't have 50 million a year to get access to good data sets.[00:18:38] Alessio: And Jensen, at his keynote, did talk about synthetic data a little bit. So I think that's something that we'll definitely hear more and more of in the enterprise, which never bodes well, because then all the people with the data are like, oh, the enterprises want to pay now? Let me put up a pay-here Stripe link so that they can give me 50 million.[00:18:57] Alessio: But it worked for Reddit. I think the stock is up 40 percent today after opening. So yeah, I don't know if it's all about the Google deal, but obviously Reddit has been one of those companies where, hey, you've got all this great community, but how are you going to make money? And they tried to sell the avatars.[00:19:15] Alessio: I don't know if that's a great business for them. 
As an investor, you know, the data part sounds a lot more interesting than consumer[00:19:25] swyx: cosmetics. Yeah, so I think, you know, there are more questions around data. I think a lot of people are talking about the interview that Mira Murati did with the Wall Street Journal, where she basically had no good answer for where they got the data for Sora.[00:19:39] swyx: I think this is where, you know, it's in nobody's interest to be transparent about data, and it's kind of sad for the state of ML and the state of AI research, but it is what it is. We have to figure this out as a society, just like we did for music and music sharing, you know, in the Napster-to-Spotify transition, and that might take us a decade.[00:19:59] swyx: Yeah, I[00:20:00] NLW: do. I agree. I think that you're right to identify it not just as a technical problem, but as one where society has to have a debate with itself. Because I think that, if you reason about it rationally, there are great points on all sides. Not to be the person who sits in the middle constantly, but it's why I think a lot of these legal decisions are going to be really important, because, you know, the job of judges is to listen to all this stuff and try to come to things and then have other judges disagree.[00:20:24] NLW: And, you know, have the rest of us all debate at the same time. By the way, as a total aside, I feel like synthetic data right now is like eggs in the 80s and 90s. Like, whether they're good for you or bad for you, you know, we get one study that's like, synthetic data, you know, there's model collapse.[00:20:42] NLW: And then we have a hint that Llama 2, the most high-performance version of it, which was one they didn't release, was trained on synthetic data. So maybe it's good. 
It's like, I just feel like every other week I'm seeing something different about whether it's good or bad for these models.[00:20:56] swyx: Yeah. The branding of this is pretty poor. I would kind of tell people to think about it like cholesterol. There's good cholesterol and bad cholesterol, and you can have, you know, good amounts of both. But at this point, it is absolutely without a doubt that most large models from here on out will be trained on some kind of synthetic data, and that is not a bad thing.[00:21:16] swyx: There are ways in which you can do it poorly, whether in terms of commercial sourcing or in terms of model performance. But it's without a doubt that good synthetic data is going to help your model. And this is just a question of where to obtain it and what kinds of synthetic data are valuable.[00:21:36] swyx: You know, even AlphaGeometry was a really good example from earlier this year.[00:21:42] NLW: If you're using the cholesterol analogy, then my egg thing can't be that far off. Let's talk about the state of the art and the GPT-4 class landscape and how that's changed.[00:21:53] Gemini 1.5 vs Claude 3[00:21:53] NLW: 'Cause obviously, you know, a couple of the big things that have happened since we last talked were, one, Gemini first announcing that a model was coming and then finally it arriving, and then, very soon after, a different Gemini model arriving, and Claude 3.[00:22:11] NLW: So I guess, you know, I'm not sure exactly where the right place to start with this conversation is, but maybe, very broadly speaking, which of these do you think has made a bigger impact?[00:22:20] Alessio: Probably the one you can use, right? So, Claude. 
Well, I'm sure Gemini is going to be great once they let me in, but so far I haven't been able to.[00:22:29] Alessio: So I have this small podcaster thing that I built for our podcast, which does chapter creation, named entity recognition, summarization, and all of that. Claude 3 is better than GPT-4. Claude 2 was unusable, so I used GPT-4 for everything. And then when Opus came out, I tried them again side by side, and I posted it on Twitter as well.[00:22:53] Alessio: Claude is better. It's very good, you know. It seems to me it's much better than GPT-4 at writing. I don't know, it just has good vibes, you know. With GPT-4 text, you can tell it's GPT-4, you know, it always uses certain types of words and phrases. And, you know, maybe it's just me, because I've now done it, you know, I've read like 75, 80 generations of these things next to each other.[00:23:21] Alessio: Claude is really good. I know everybody is freaking out on Twitter about it; my only experience of it being much better has been on the podcast use case. But I know that, you know, Karan from Nous Research is a very big pro-Opus person. So I think it's also great to have people that actually care about other models.[00:23:40] Alessio: You know, I think so far, to a lot of people, maybe Anthropic has been the sibling in the corner, you know, it's like Claude releases a new model and then OpenAI releases Sora, and, you know, there are all these different things. But yeah, the new models are good. It's interesting.[00:23:55] NLW: My perception is definitely that, just observationally, Claude 3 is certainly the first thing that I've seen where lots of people,[00:24:06] NLW: no one's debating evals or anything like that. 
They're talking about the specific use cases that they have, that they used to use ChatGPT for every day, you know, day in, day out, that they've now just switched over. And that has, I think, shifted a lot of the vibe and sentiment in the space too.[00:24:26] NLW: And I don't necessarily think that it's a full knock on OpenAI. Let's put it this way: I think it's less bad for OpenAI than it is good for Anthropic. I think that because GPT-5 isn't there, people are not quite willing to get overly critical of OpenAI, except insofar as they're wondering where GPT-5 is.[00:24:46] NLW: But I do think that it makes Anthropic look way more credible as a player, you know, as opposed to where they were.[00:24:57] Alessio: Yeah. And I would say the benchmarks veil is probably getting lifted this year. I think last year, people were like, okay, this is better than this on this benchmark, blah, blah, blah, because maybe they did not have a lot of use cases that they did frequently.[00:25:11] Alessio: So it's hard to compare for yourself, so you defer to the benchmarks. I think now, as we go into 2024, a lot of people have started to use these models for everything from very sophisticated things that they run in production to some utility that they have on their own. Now they can just run them side by side.[00:25:29] Alessio: And it's like, hey, I don't care that the MMLU score of Opus is slightly lower than GPT-4. It just works for me, you know. And I think that's the same way that traditional software has been used by people, right? You just try it for yourself and see which one works best for you.[00:25:48] Alessio: Nobody looks at benchmarks outside of sales white papers, you know? And I think it's great that we're going more in that direction. 
We have an episode with Adept coming out this weekend. In some of their model releases, they specifically say, we do not care about benchmarks, so we didn't put them in, you know, because we don't want to look good on them.[00:26:06] Alessio: We just want the product to work. And I think more and more people will[00:26:09] swyx: go that way. Yeah. I would say it does take the wind out of the sails for GPT-5, which I know we're, you know, curious about later on. I think anytime you put out a new state-of-the-art model, you have to break through in some way.[00:26:21] swyx: And what Claude and Gemini have done is effectively take away any advantage to saying that you have a million-token context window. Now everyone's just going to be like, oh, okay, now you just match the other two guys. And so that puts an insane amount of pressure on what GPT-5 is going to be, because the only option it has now, since all the other models are multimodal, all the other models are long context, and all the other models have perfect recall, is that GPT-5 has to match everything and do more to not be a flop.[00:26:58] AI Breakdown Part 2[00:26:58] NLW: Hello friends, back again with part two. If you haven't heard part one of this conversation, I suggest you go check it out, but to be honest, they are kind of actually separable. In this conversation, we get into a topic that I think Alessio and Swyx are very well positioned to discuss, which is what developers care about right now, what people are trying to build around.[00:27:16] NLW: I honestly think that one of the best ways to see the future in an industry like AI is to try to dig deep on what developers and entrepreneurs are attracted to build, even if it hasn't made it to the news pages yet. So consider this your preview of six months from now, and let's dive in. 
Let's bring it to the GPT-5 conversation.[00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4[00:27:33] NLW: I mean, so I think that that's a great assessment of just how the stakes have been raised. You know, I guess maybe I'll frame this less as a question, just sort of something that I've been watching: right now, the only thing that makes sense to me with how[00:27:50] NLW: fundamentally unbothered and unstressed OpenAI seems about everything is that they're sitting on something that does meet all those criteria, right? Because, I mean, even in the Lex Fridman interview that Altman recently did, you know, he's talking about other things coming out first. He's just like, listen, he's good, and he could play nonchalant, you know, if he wanted to.[00:28:13] NLW: So I don't want to read too much into it, but, you know, they've had so long to work on this. Unless we are really meaningfully running up against some constraint, it just feels like, you know, there's going to be some massive increase. But I don't know. What do you guys think?[00:28:28] swyx: Hard to speculate.[00:28:29] swyx: You know, at this point, they're pretty good at PR, and they're not going to tell you anything that they don't want to. And they can tell you one thing and change their minds the next day. So it's really, you know, I've always said that model version numbers are just marketing exercises: they have something, it's always improving, and at some point you just cut it and decide to call it GPT-5.[00:28:50] swyx: And it's more just about defining an arbitrary level at which they're ready, and it's up to them what ready means. We definitely did see some leaks on GPT-4.5, as I think a lot of people reported, and I'm not sure if you covered it. So it seems like there might be an intermediate release. 
But I did feel, coming out of the Lex Fridman interview, that GPT-5 was nowhere near.[00:29:11] swyx: And, you know, it was kind of a sharp contrast to Sam talking at Davos in February, saying that, you know, it was his top priority. So I find it hard to square. And honestly, there's also no point reading too many tea leaves into what any one person says about something that hasn't happened yet, or a decision that hasn't been taken yet.[00:29:31] swyx: Yeah, that's my 2 cents about it. Like, calm down, let's just build.[00:29:35] Alessio: Yeah. The February rumor was that they were gonna work on AI agents, so I don't know, maybe they're like, yeah,[00:29:41] swyx: they had two agent projects, right? One desktop agent and one sort of more general, GPTs-like agent, and then Andrej left, so he was supposed to be the guy on that.[00:29:52] swyx: What did Andrej see? What did he see? I don't know. What did he see?[00:29:56] Alessio: I don't know. But again, the rumors are always floating around, you know. But I think we're not going to get to the end of the year without GPT-5, you know, that's definitely happening. I think the biggest question is, are Anthropic and Google[00:30:13] Alessio: increasing the pace? You know, is Claude 4 coming out in 12 months, nine months? What's the deal? Same with Gemini. They went from 1 to 1.5 in like five days or something. So when's Gemini 2 coming out, you know, is that going to be soon? I don't know.[00:30:31] Alessio: There are a lot of speculations, but the good thing is that now you can see a world in which OpenAI doesn't rule everything. You know, so that's the best news that everybody got, I would say.[00:30:43] swyx: Yeah, and Mistral Large also dropped in the last month. 
And, you know, not quite GPT-4 class, but very good from a new startup.[00:30:52] swyx: So yeah, we have now slowly seen a change in the landscape, you know. In my January recap, I was complaining that nothing had changed in the landscape for a long time. But now we do exist in a world, sort of a multipolar world, where Claude and Gemini are legitimate challengers to GPT-4, and hopefully more will emerge as well, hopefully from Meta.[00:31:11] Open Source Models - Mistral, Grok[00:31:11] NLW: So, let's actually talk about the open source side of this for a minute. So Mistral Large is notable because it's not available open source in the same way that other things are, although my perception is that the community largely recognizes that they want them to keep building open source stuff, and they have to find some way to fund themselves in order to do that.[00:31:27] NLW: And so they kind of understand that they've got to figure out how to eat. But we've got, so, you know, there's Mistral, there's, I guess, Grok now, which is, you know, Grok-1, from October, is open[00:31:38] swyx: sourced. Yeah. Yeah, sorry, I thought you meant Groq the chip company.[00:31:41] swyx: No, no, no, yeah, you mean Twitter Grok.[00:31:43] NLW: Although Groq the chip company, I think, is even more interesting in some ways. And then there's, you know, obviously Llama 3, which is the one that sort of everyone's wondering about too. 
And, you know, my sense of that, the little bit that Zuckerberg was talking about Llama 3 earlier this year, suggested that, at least from an ambition standpoint, he was not thinking about how do I make sure that, you know, Meta keeps the open source throne vis-à-vis Mistral.

[00:32:09] NLW: He was thinking about how he releases a thing that's every bit as good as whatever OpenAI is on at that point.

[00:32:16] Alessio: Yeah. From what I heard in the hallways at GDC, Llama 3, the biggest model will be, you know, 260 to 300 billion parameters, so that's quite large.

[00:32:26] Alessio: That's not an open source model. You know, you cannot give people a 300 billion parameter model and ask them to run it. It's very compute intensive. So I think it is, it...

[00:32:35] swyx: It can be open source. It's just going to be difficult to run, but that's a separate question.

[00:32:39] Alessio: It's more like, as you think about what they're doing it for, you know, it's not like empowering the person running

[00:32:45] Alessio: Llama on their laptop. It's like, oh, you can actually now use this to go after OpenAI, to go after Anthropic, to go after some of these companies at, like, the middle complexity level, so to speak. Yeah. So obviously, you know, we had Soumith Chintala on the podcast; they're doing a lot here, they're making PyTorch better.

[00:33:03] Alessio: You know, that's maybe a little bit of a shot at Nvidia, in a way, trying to take some of the CUDA dominance out of it. Yeah, no, it's great. I love the Zuck-destroying-a-lot-of-monopolies arc. You know, it's been very entertaining.
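Alessio's point that a 260 to 300 billion parameter model is hard to run locally can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions: it counts only the memory needed to hold the weights (parameters × bytes per parameter at a given precision) and ignores the KV cache, activations, and runtime overhead. The precision table and the `weight_memory_gb` name are illustrative, not from any library.

```python
# Rough weight-memory estimate for a dense model.
# Assumption: only the weights are counted; KV cache, activations and
# framework overhead are ignored, so real requirements are higher.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,  # a common precision for serving large models
    "int8": 1.0,
    "int4": 0.5,  # e.g. 4-bit quantization
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return gigabytes (10**9 bytes) needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for params in (260e9, 300e9):  # the rumored size range from the conversation
        for precision in ("bf16", "int8", "int4"):
            gb = weight_memory_gb(params, precision)
            print(f"{params / 1e9:.0f}B params @ {precision}: ~{gb:.0f} GB")
```

At bf16, a 300B-parameter model needs roughly 600 GB for the weights alone, and even at 4 bits about 150 GB, well beyond a typical laptop. That gap is the crux of the "open source, but hard to run" exchange.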
[00:33:18] NLW: Let's bridge into the sort of big tech side of this, because this is obviously, like... so I think actually, when I did my episode, I added this as an additional war, something that I'm paying attention to.

[00:33:29] NLW: So we've got Microsoft's moves with Inflection, which I think are potentially being read as a shift vis-à-vis the relationship with OpenAI, which the Mistral Large relationship seems to reinforce as well. We have Apple potentially entering the race, finally, you know, giving up Project Titan and kind of trying to spend more effort on this.

[00:33:50] NLW: Although, counterpoint, we also have them talking about it, or there being reports of a deal with Google, which, you know, is interesting, to sort of see what their strategy there is. And then, you know, Meta's been largely quiet. We kind of just talked about the main piece, but, you know, and then there are spoilers like Elon.

[00:34:07] NLW: I mean, which of those things has been most interesting to you guys as you think about what's going to shake out for the rest of this year?

[00:34:13] Apple MM1

[00:34:13] swyx: I'll take a crack. So the reason we don't have a fifth war for the Big Tech Wars is that it's one of those things where I just feel like we don't cover it differently from other media channels, I guess.

[00:34:26] swyx: Sure, yeah. In our anti-interestness, we actually say, like, we try not to cover the Big Tech Game of Thrones, or it's proxied through Twitter. You know, it's all the other four wars anyway, so there's just a lot of overlap. Yeah, I think, personally, the most interesting one is absolutely Apple entering the race.

[00:34:41] swyx: They actually announced their first large language model that they trained themselves. It's like a 30 billion parameter multimodal model.
People weren't that impressed, but it was the first time that Apple has kind of showcased that, yeah, we're training large models in house as well. Of course, they might be doing this deal with Google.

[00:34:57] swyx: I don't know. It sounds very rumor-y to me. And if it's on device, it's probably going to be a smaller model, so something like a Gemma. It's going to be smarter autocomplete. I don't know what to say. I'm still here dealing with, like, Siri, which probably hasn't been updated since God knows when it was introduced.

[00:35:16] swyx: It's horrible. It makes me so angry. So, one, as an Apple customer and user, I'm just hoping for better AI on Apple itself. But two, they are the gold standard when it comes to local devices, personal compute, and trust; you trust them with your data. And I think that's what a lot of people are looking for in AI: they love the benefits of AI, they don't love the downsides, which is that you have to send all your data to some cloud somewhere.

[00:35:45] swyx: And some of this data that we're going to feed AI is just the most personal data there is. So with Apple being one of the most trusted personal data companies, I think it's very important that they enter the AI race, and I hope to see more out of them.

[00:35:58] Alessio: To me, the biggest question with the Google deal is, like, who's paying who?

[00:36:03] Alessio: Because for the browsers, Google pays Apple like 18, 20 billion every year to be the default browser. Is Google going to pay you to have Gemini, or is Apple paying Google to have Gemini? I think that's what I'm most interested to figure out, because with the browsers, it's the entry point to the thing.

[00:36:21] Alessio: So it's really valuable to be the default. That's why Google pays. But I wonder if, like, the perception in AI is going to be like, hey.
You just have to have a good local model on my phone to be worth me purchasing your device. And that kind of drives Apple to be the one buying the model. But then, like Shawn said, they're doing MM1 themselves.

[00:36:40] Alessio: So are they saying we do models, but they're not as good as the Google ones? I don't know. The whole thing is really confusing, but it makes for great meme material on Twitter.

[00:36:51] swyx: Yeah, I mean, I think they are possibly more full stack than OpenAI and Microsoft and Amazon. They are the most full stack company there is in computing, and so, like, they own the chips, man.

[00:37:05] swyx: Like, they manufacture everything, so if there was a company that could seriously challenge the other AI players, it would be Apple. And I don't think it's as hard as self driving. So maybe they've just been investing in the wrong thing this whole time. We'll see.

[00:37:21] swyx: Wall Street certainly thinks so.

[00:37:22] NLW: Wall Street loved that move, man. There's a big sigh of relief. Well, let's move away from sort of the big stuff. I mean, I think to both of your points, it's going to...

[00:37:33] Meta's $800b AI rebrand

[00:37:34] swyx: Can I jump in with a factoid about this Wall Street thing? I went and looked at when Meta went from being a VR company to an AI company.

[00:37:44] swyx: And I think the stock, I'm trying to look up the details now. The stock has gone up 187% since Llama 1. Yeah. Which is $830 billion in market value created in the past year. Yeah. Yeah.

[00:37:57] NLW: It's like, remember, if you guys haven't, yeah.
If you haven't seen the chart, it's actually remarkable.

[00:38:02] NLW: If you draw a little arrow on it,

[00:38:03] swyx: it's like, no, we're an AI company now, forget the VR thing.

[00:38:10] NLW: It is interesting. No, I think, Alessio, you called it sort of Zuck's Disruptor Arc or whatever. He really does. He is in the midst of a total, you know, I don't know if it's a redemption arc or just something different, where, you know, he's sort of the spoiler.

[00:38:25] NLW: Like, people loved him just freestyle talking about why he thought they had a better headset than Apple. Even if they didn't agree, they just loved it. He was going direct to camera and talking about it for, you know, five minutes or whatever. So that's a fascinating shift that I don't think anyone had on their bingo card, you know, whatever, two years ago.

[00:38:41] NLW: Yeah. Yeah,

[00:38:42] swyx: we still

[00:38:43] Alessio: didn't see Zuck fight Elon though, so

[00:38:45] swyx: that's what I'm really looking forward to. I mean, hey, don't write it off, you know, maybe these things just take a while to happen. But we need to see them fight in the Colosseum. No, I think, you know, in terms of, like, self management, life leadership, there are a lot of lessons to learn from him.

[00:38:59] swyx: You might kind of quibble with, like, the social impact of Facebook, but just himself, in terms of personal growth and, you know, perseverance through a lot of change and, you know, everyone throwing stuff his way, I think there's a lot to learn from Zuck, which is crazy 'cause he's my age.

[00:39:18] swyx: Yeah. Right.

[00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents

[00:39:20] NLW: Awesome.
Well, so one of the big things that I think you guys have distinct and unique insight into, being where you are and what you work on, is what developers are getting really excited about right now. And by that, I mean, on the one hand, certainly startups that are actually formalized and formed as startups, but also, you know, just what people are spending their nights and weekends on, what they're coming to hackathons to do.

[00:39:45] NLW: And, you know, I think it's such a fascinating indicator for where things are headed. Like, if you zoom back a year, right now was right when everyone was getting so excited about AI agent stuff, right? AutoGPT and BabyAGI. And these things were like, if you dropped anything on YouTube about those, instantly tens of thousands of views.

[00:40:07] NLW: I know because I had, like, a 50,000-view video, like, the second day that I was doing the show on YouTube, you know, because I was talking about AutoGPT. And so anyways, obviously that's sort of not totally come to fruition yet, but what are some of the trends you guys are seeing in terms of people's interest and what people are building?

[00:40:24] Alessio: I can start maybe with the agents part, and then I know Shawn is doing a diffusion meetup tonight. There are a lot of different things. The agent wave has been the most interesting kind of dream-to-reality arc. So AutoGPT, I think they went from zero to like 125,000 GitHub stars in six weeks, and then one year later, they have 150,000 stars.

[00:40:49] Alessio: So there's kind of been a big plateau. I mean, you might say there are just not that many people left who can star it. You know, everybody already starred it. But the promise of, hey, I'll just give you a goal, and you do it, I think it's amazing for getting people's imagination going.
You know, they're like, oh, wow, this is awesome.

[00:41:08] Alessio: Everybody can try this to do anything. But then as technologists, you're like, well, that's just not possible; you know, we would have solved everything. And I think it takes a little bit to go from the promise and the hope that people show you, to trying it yourself and coming back to say, okay, this is not really working for me.

[00:41:28] Alessio: And David Luan from Adept, you know, in our episode, he specifically said we don't want to do a bottom-up product. You know, we don't want something that everybody can just use and try, because it's really hard to get it to be reliable. So we're seeing a lot of companies doing vertical agents that are narrow for a specific domain, and they're very good at something.

[00:41:49] Alessio: Mike Conover, who was at Databricks before, is also a friend of Latent Space. He's doing this new company called BrightWave doing AI agents for financial research, and that's it, you know, and they're doing very well. There are other companies doing it in security, doing it in compliance, doing it in legal.

[00:42:08] Alessio: All of these things that, like, nobody just wakes up and says, oh, I cannot wait to go on AutoGPT and ask it to do a compliance review of my thing. You know, it's just not what inspires people. So I think the gap on the developer side has been that the more bottoms-up hacker mentality is trying to build these very generic agents that can do a lot of open-ended tasks.

[00:42:30] Alessio: And then the more business side of things is like, hey, if I want to raise my next round, I cannot just sit around and mess around with super generic stuff. I need to find a use case that really works.
And I think that that is working for a lot of folks. In parallel, you have a lot of companies doing evals.

[00:42:47] Alessio: There are dozens of them that just want to help you measure how good your models are doing. Again, if you build evals, you also need a constrained surface area to actually figure out whether or not it's good, right? Because you cannot eval everything under the sun. So that's another category where, from the startup pitches that I've seen, there's a lot of interest in the enterprise.

[00:43:11] Alessio: It's just really fragmented, because the production use cases are just coming now, you know; there are not a lot of long-established ones to test against. So that's kind of it on the vertical agents. And then the robotic side has probably been the thing that surprised me the most at NVIDIA GTC: the amount of robots that were there. There were just robots everywhere.

[00:43:33] Alessio: Like, both in the keynote and then on the show floor, you would have Boston Dynamics dogs running around. There was this fox robot that had a virtual face that talked to you and moved in real time. There were industrial robots. NVIDIA did a big push on their own Omniverse thing, which is this digital twin of whatever environment you're in that you can use to train the robot agents.

[00:43:57] Alessio: So that kind of takes people back to the reinforcement learning days. But yeah, agents, people want them, you know, people want them. I gave a talk about the rise of the full-stack employee and kind of this future: the same way full-stack engineers work across the stack, in the future every employee is going to interact with every part of the organization through agents and AI-enabled tooling.

[00:44:17] Alessio: This is happening.
It just needs to be a lot more narrow than maybe the first approach that we took, which was just put a string in AutoGPT and pray. But yeah, there's a lot of super interesting stuff going on.

[00:44:27] swyx: Yeah. Well, he covered a lot of stuff there. I'll separate the robotics piece, because I feel like that's so different from the software world.

[00:44:34] swyx: But yeah, we do talk to a lot of engineers, and you know, this is our sort of bread and butter. And I do agree that vertical agents have worked out a lot better than the horizontal ones. You know, the point I'll make here is that the reason with AutoGPT and BabyAGI is, you know, it's in the name: they were promising AGI.

[00:44:53] swyx: But I think people are discovering that you cannot engineer your way to AGI. It has to be done at the model level, and all these engineering and prompt engineering hacks on top of it weren't really going to get us there in a meaningful way without much further improvements in the models. I'll go so far as to say that even Devin, which is, I think, the most advanced agent that we've ever seen, still requires a lot of engineering and still probably falls apart a lot in terms of practical usage.

[00:45:22] swyx: Or it's just way too slow and expensive for what it's promised compared to the video. So yeah, that's what happened with agents from last year. But I do see vertical agents being very popular, and sometimes I think the word agent might even be overused.

[00:45:38] swyx: Like, people don't really care whether or not you call it an AI agent, right? Like, does it replace boring, menial tasks that I might hire a human to do, or that the human who is hired to do it actually doesn't really want to do?
And I think there are absolutely ways, in sort of a vertical context, that you can actually go after very routine tasks that can be scaled out to a lot of, you know, AI assistants.

[00:46:01] swyx: So yeah, I would basically plus one what Alessio said there. I think it's very, very promising, and I think more people should work on it, not less. Like, there are not enough people. This should be the main thrust of the AI engineer: to look for use cases and go to production with them, instead of always working on some AGI-promising thing that never arrives.

[00:46:22] NLW: I can only add that. So I've been fiercely making tutorials behind the scenes around basically everything you can imagine with AI. We've done about 300 tutorials over the last couple of months. And the verticalized anything, right, like "this is a solution for your particular job or role", even if it's way less interesting or kind of sexy, is so radically more useful to people in terms of intersecting with how, like, those are the ways that people are actually

[00:46:50] NLW: adopting AI in a lot of cases: it's just a thing that I do over and over again. By the way, I think that's the same way that even the generalized models are getting adopted. You know, I use Midjourney for lots of stuff, but the main thing I use it for is YouTube thumbnails every day. Like, day in, day out, I will always do a YouTube thumbnail, or two, with Midjourney, right?

[00:47:09] NLW: And you can start to extrapolate that across a lot of things, and all of a sudden, you know, AI looks revolutionary because of a million small changes rather than one big dramatic change.
And I think that the verticalization of agents is sort of a great example of how that's going to play out too.

[00:47:28] Adept episode - Screen Multimodality

[00:47:28] swyx: So I'll have one caveat here, which is that I think, because multimodal models are now commonplace (Claude, Gemini, OpenAI, all very easily multimodal, Apple's easily multimodal, all this stuff), there is a switch for agents for sort of general desktop browsing that I think people [...]

[00:48:04] swyx: ...version of the agent where they're not specifically taking in text or anything. They're just watching your screen, just like someone else would, and piloting it by vision. And, you know, in the episode with David that will have dropped by the time this airs, I think that is the promise of Adept, and that is the promise of a lot of these desktop agents, and that is the more general-purpose system that could be as big as the browser or the operating system. Like, people really want to build that foundational piece of software in AI.

[00:48:38] swyx: And I would see the potential there for desktop agents being that you can have sort of self-driving computers. You know, don't write the horizontal piece off. I just think it took a while to get there.

[00:48:48] NLW: What else are you guys seeing that's interesting to you? I'm looking at your notes and I see a ton of categories.

[00:48:54] Top Model Research from January Recap

[00:48:54] swyx: Yeah, so I'll take the next two as one category, which is basically alternative architectures, right? The two main things that everyone following AI kind of knows now are, one, the diffusion architecture, and two, let's just say, the decoder-only transformer architecture that was popularized by GPT.

[00:49:12] swyx: You can look on YouTube for thousands and thousands of tutorials on each of those things.
What we are talking about here is what's next, what people are researching, and what could be on the horizon to take the place of those two things. So first of all, we'll talk about transformer architectures, and then diffusion.

[00:49:25] swyx: So for transformers, the two leading candidates are effectively RWKV and the state space models, the most recent of which is Mamba, but there are others like StripedHyena and the S4 and H3 stuff coming out of Hazy Research at Stanford. And all of those are non-quadratic language models that promise to scale a lot better than the traditional transformer.

[00:49:47] swyx: This might be too theoretical for most people right now, but it's gonna come out in weird ways. Imagine if, like, right now the talk of the town is that Claude and Gemini have a million tokens of context and, whoa, you can put in, you know, two hours of video now. Okay, but what if we could throw in, you know, two hundred thousand hours of video?

[00:50:09] swyx: Like, how does that change your usage of AI? What if you could throw in the entire genetic sequence of a human and synthesize new drugs? How does that change things? We don't know, because we haven't had access to this capability being so cheap before. And that's the ultimate promise of these two models.

[00:50:28] swyx: They're not there yet, but we're seeing very, very good progress. RWKV and Mamba are probably the two leading examples, both of which are open source, so you can try them today, and there's a lot of progress there. And the main thing I'll highlight for RWKV is that at the 7B level, they seem to have beaten Llama 2 in all benchmarks that matter, at the same size and for the same amount of training, as an open source model.

[00:50:51] swyx: So that's exciting. You know, they're at 7B now. They're not at 70B.
We don't know if it'll. And then the other thing is diffusion. Diffusion and transformers are kind of on a collision course. The original Stable Diffusion already used transformers in parts of its architecture.

[00:51:06] swyx: It seems that transformers are eating more and more of those layers, particularly the sort of VAE layer. So the Diffusion Transformer is what Sora is built on. The guy who wrote the Diffusion Transformer paper, Bill Peebles, is the lead tech guy on Sora. So you'll just see a lot more Diffusion Transformer stuff going on.

[00:51:25] swyx: But there's more sort of experimentation with diffusion. I'm actually holding a meetup here in San Francisco that's gonna be, like, the state of diffusion, which I'm pretty excited about. Stability's doing a lot of good work. And if you look at the architecture of how they're creating Stable Diffusion 3, Hourglass Diffusion, and the consistency models, or SDXL Turbo,

[00:51:45] swyx: all of these are very, very interesting innovations on the original idea of what Stable Diffusion was. So if you think that it is expensive or slow to create Stable Diffusion or other AI-generated art, you are not up to date with the latest models. If you think it is hard to create text in images, you are not up to date with the latest models.

[00:52:02] swyx: And people are still kind of far behind. The last piece is the wildcard I always kind of hold out, which is text diffusion. So instead of using autoregressive transformers, can you use diffusion for text? You would use diffusion models to create entire chunks of text all at once instead of token by token.

[00:52:22] swyx: And that is something that Midjourney confirmed today, because it was only rumored the past few months. But they confirmed today that they were looking into it.
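The "non-quadratic" point is about how cost grows with context length. Standard self-attention scores every token against every earlier token, so work grows roughly with n² for n tokens, while recurrent and state space models such as RWKV and Mamba carry a fixed-size state forward one token at a time, so work grows roughly linearly. The sketch below is a toy illustration of that contrast, not how RWKV or Mamba are actually implemented: the attention uses identity query/key/value projections, and the "state space" update is a single fixed decay `a` (Mamba, for instance, makes its parameters input-dependent). All function names and the decay value are assumptions for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def causal_attention(xs: List[Vector]) -> List[Vector]:
    """Toy single-head causal self-attention with identity Q/K/V projections.

    Token t is scored against tokens 0..t, so n tokens cost about
    n * (n + 1) / 2 dot products: O(n^2) work.
    """
    d = len(xs[0])
    outputs = []
    for t, query in enumerate(xs):
        visible = xs[: t + 1]  # causal mask: only the past and the token itself
        scores = [dot(query, key) / math.sqrt(d) for key in visible]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(weights)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, visible)) / total for i in range(d)]
        )
    return outputs


def linear_recurrence(xs: List[Vector], a: float = 0.9) -> List[Vector]:
    """Toy state-space-style scan: h_t = a * h_{t-1} + x_t, output y_t = h_t.

    One fixed-size state update per token: O(n) work, O(d) state.
    """
    h = [0.0] * len(xs[0])
    outputs = []
    for x in xs:
        h = [a * hi + xi for hi, xi in zip(h, x)]
        outputs.append(h)
    return outputs


def relative_cost(context_multiplier: float) -> Tuple[float, float]:
    """Extra work for a k-times longer context: (quadratic attention, linear scan)."""
    return context_multiplier ** 2, context_multiplier


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(causal_attention(tokens)[-1])
    print(linear_recurrence(tokens, a=0.5)[-1])  # [1.25, 1.5]
    # Assumption: token count grows linearly with hours of video, so going
    # from ~2 hours to ~200,000 hours is a 100,000x longer context.
    quadratic, linear = relative_cost(100_000)
    print(f"quadratic: {quadratic:.0e}x more work, linear: {linear:.0e}x more work")
```

Under that linear-tokens assumption, going from two hours to two hundred thousand hours of video means about 10¹⁰ times the work for quadratic attention versus 10⁵ times for a linear scan, which is why these architectures matter for the very long contexts described above.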
So all those things are very exciting new model architectures that are maybe something you'll see in production two to three years from now.

[00:52:38] NLW: So, a couple of the trends that I want to just get your takes on, because they seem to be coming up, are, one, these wearable, kind of passive AI experiences, where they're absorbing a lot of what's going on around you and then kind of bringing things back.

[00:52:53] NLW: And then the other one that I wanted to see if you guys had thoughts on is this next generation of chip companies. Obviously there's a huge amount of emphasis on hardware and silicon and different ways of doing things, but, y
Our next SF event is AI UX 2024 - let's see the new frontier for UX since last year! Last call: we are recording a preview of the AI Engineer World's Fair with swyx and Ben Dunphy, so send us any questions you have about Speaker CFPs and Sponsor Guides!

Alessio is now hiring engineers for a new startup he is incubating at Decibel: the ideal candidate is an “ex-technical co-founder type”. Reach out to him for more!

David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then he started Adept in 2022, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman that resulted in Microsoft's initial $1b investment), and how Adept is building agents that can “do anything a human does on a computer”, his definition of useful AGI.

Why Google *couldn't* make GPT-3

While we wanted to discuss Adept, we couldn't talk to a former VP Eng of OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room. It's often asked how Google had such a huge lead in 2017, with Vaswani et al. creating the Transformer and Noam Shazeer predicting trillion-parameter models, and yet it was David's team at OpenAI who ended up making GPT-1/2/3. David has some interesting answers:

“So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized… what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too… You know, every day we were scaling up GPT-3, I would wake up and just be stressed.
And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do. And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LLM effort and I was one of the Brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy n chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.”

Cloning HGI for AGI

Human intelligence got to where it is today through evolution. Some argue that to get to AGI, we will need to approximate all the “FLOPs” that went into that process, an approach most famously mapped out by Ajeya Cotra's Biological Anchors report.

The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode.)

David argues that there's a shortcut: we can bootstrap from existing intelligence.

“Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI.
And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there… I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs!”

LLMs today basically model intelligence using all (good!) written knowledge (see our Datasets 101 episode), and have now expanded to non-verbal knowledge (see our HuggingFace episode on multimodality). The SOTA self-supervised pre-training process is surprisingly data-efficient at taking large amounts of unstructured data and approximating reasoning without overfitting.

But how do you cross the gap from the LLMs of today to building the AGI we all want? This is why David & friends left to start Adept.

“We believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal” (ACT-1 blog post)

Critical Path: Abstraction with Reliability

The AGI dream is fully autonomous agents, but there are levels of autonomy that we are comfortable giving our agents, based on how reliable they are. In David's word choice, we always want higher levels of “abstraction” (aka autonomy), but our need for “reliability” is the practical limit on how high an abstraction we can use.

“The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow.
That's the critical path for the company. Everything we do is in service of that.”

We saw how Adept thinks about different levels of abstraction at the 2023 Summit. The highest abstraction is the “AI Employee”, but we'll get there with “AI enabled employees”. Alessio recently gave a talk about the future of work with “services as software” at this week's Nvidia GTC (slides).

No APIs

Unlike a lot of large research labs, Adept's framing of AGI as "being able to use your computer like a human" carries with it a useful environmental constraint:

“Having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path (to economic value).”

This realization and conviction means that multimodal models are the way to go. Instead of using function calling to call APIs to build agents, which is what OpenAI and most of the open LLM industry have done to date, Adept wants to “drive by vision” (aka see the screen as a human sees it) and pinpoint where to click and type as a human does.
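David's trade-off between abstraction and reliability, and his point that almost no workflow has an API at every step, rest on the same compounding arithmetic. Under the simplifying assumption that steps are independent, a task of k steps that each succeed (or each have an API) with probability p completes end to end with probability p^k. The sketch below makes that concrete; the function names and example numbers are illustrative assumptions, not figures from Adept.

```python
import math


def chain_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed: p ** k."""
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step probability needed for a `steps`-long chain to hit `target`."""
    return target ** (1.0 / steps)


def max_steps(per_step: float, target: float) -> int:
    """Longest chain that still meets the `target` end-to-end success rate."""
    if per_step >= 1.0:
        raise ValueError("a perfectly reliable step supports chains of any length")
    # Largest k with per_step ** k >= target, i.e. k <= log(target) / log(per_step).
    return math.floor(math.log(target) / math.log(per_step))


if __name__ == "__main__":
    for p in (0.95, 0.99, 0.999):
        print(
            f"p={p}: 20-step success {chain_success(p, 20):.3f}, "
            f"max steps at 90% success {max_steps(p, 0.9)}"
        )
    print(f"per-step reliability for 90% over 20 steps: {required_per_step(0.9, 20):.4f}")
```

With 95% per-step reliability, a 20-step task succeeds only about 36% of the time, and reaching 90% end to end needs roughly 99.5% per step. Read p as "this step has an API" instead of "this step succeeds" and the same formula shows why fully API-covered workflows shrink toward zero as they get longer.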
No APIs needed, because most software doesn't expose APIs.

Extra context for readers: you can see the DeepMind SIMA model in the same light: one system that learned to play a diverse set of games (instead of one dedicated model per game) using only pixel inputs and keyboard-and-mouse action outputs! The OpenInterpreter team is working on a “Computer API” that does the same.

To do this, Adept had to double down on a special kind of multimodality for knowledge work:

“A giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents… I think one big hangover of the primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera… (but) where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cats and dogs. And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that.”

With this context, you can now understand the full path of Adept's public releases:

* ACT-1 (Sept 2022): a large Transformer model optimized for browser interactions. It has a custom rendering of the browser viewport that allows it to better understand the page and take actions.
* Persimmon-8B (Sept 2023): a permissive open LLM (weights and code here)
* Fuyu-8B (Oct 2023): a small version of the multimodal model that powers Adept.
A vanilla decoder-only transformer with no specialized image encoder, which allows it to handle input images of varying resolutions without downsampling.
* Adept Experiments (Nov 2023): a public tool to build automations in the browser. This is powered by Adept's core technology, but it's just a piece of their enterprise platform. They use it as a way to try various design ideas.
* Fuyu Heavy (Jan 2024): a new multimodal model designed specifically for digital agents and the world's third-most-capable multimodal model (beating Gemini Pro on MMMU, AI2D, and ChartQA), “behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger”

The Fuyu-8B post in particular exhibits a great number of examples of knowledge work multimodality:

Why Adept is NOT a Research Lab

With OpenAI now worth >$90b and Anthropic >$18b, it is tempting to conclude that the AI startup metagame is to build a large research lab and attract the brightest minds and the most capital to build AGI. Our past guests (see the Humanloop episode) and Kanjun (from Imbue) combined to ask the most challenging question of the pod: with David/Adept's deep research pedigree from Google Brain and OpenAI, why is Adept not building more general foundation models (like Persimmon) and playing the academic benchmarks game? Why is Adept so focused on commercial agents instead?

“I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from “Can we make a better agent”… I think pure play foundation model companies are just going to be pinched by how good the next couple of (Meta Llama models) are going to be… And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models.
So I feel really good that we're just focused on agents.”

And the commercial grounding is his answer to Kanjun too (whom we also asked the inverse question, to compare with Adept):

“… the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is for example, our evaluations are not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to, we can't do them at all. We've turned those into evals. I think that's a degree of practicality that really helps.”

And his customers seem pretty happy, because David didn't need to come on to do a sales pitch:

David: “One of the things we haven't shared before is we're completely sold out for Q1.”
Swyx: “Sold out of what?”
David: “Sold out of bandwidth to onboard more customers.”

Well, that's a great problem to have.

Show Notes
* David Luan
* Dextro at Data Driven NYC (2015)
* Adept
* ACT-1
* Persimmon-8B
* Adept Experiments
* Fuyu-8B
* $350M Series B announcement
* Amelia Wattenberger talk at AI Engineer Summit
* Figure

Chapters
* [00:00:00] Introductions
* [00:01:14] Being employee #30 at OpenAI and its early days
* [00:13:38] What is Adept and how do you define AGI?
* [00:21:00] Adept's critical path and research directions
* [00:26:23] How AI agents should interact with software and impact product development
* [00:30:37] Analogies between AI agents and self-driving car development
* [00:32:42] Balancing reliability, cost, speed and generality in AI agents
* [00:37:30] Potential of foundation models for robotics
* [00:39:22] Core research questions and reasons to work at Adept

Transcripts

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast.
This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

Swyx [00:00:15]: Hey, and today we have David Luan, CEO and co-founder of Adept, in the studio. Welcome.

David [00:00:20]: Yeah, thanks for having me.

Swyx [00:00:21]: Been a while in the works. I've met you socially at one of those VC events and you said that you were interested in coming on, and I'm glad we finally were able to make this happen.

David: Yeah, happy to be part of it.

Swyx: So we like to introduce the speaker and then also just like have you talk a little bit about like what's not on your LinkedIn, what people should just generally know about you. You started a company in college, which was the first sort of real-time video detection classification API, that was Dextro, and that was your route to getting acquired into Axon, where you were a director of AI. Then you were the 30th hire at OpenAI?

David [00:00:53]: Yeah, 30, 35, something around there. Something like that.

Swyx [00:00:56]: So you were VP of Eng for two and a half years to two years, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV. Is there anything else you want to fill in the blanks on, or that people should know more about?

David [00:01:14]: I guess a broader story was I joined OpenAI fairly early and I did that for about two and a half to three years, leading engineering there. It's really funny, I think on the second or third day of my time at OpenAI, Greg and Ilya pulled me into a room and were like, you know, you should take over our directs and we'll go mostly do IC work. So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened. The company, the Dota effort was going pretty hard, and then more broadly trying to put bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that.
And then I led Google's LLM efforts, but also co-led Google Brain and was one of the Brain leads more broadly. You know, there's been a couple of different eras of AI research, right? If we count everything before 2012 as prehistory, which people hate when I say that, we kind of had this like "you and your three best friends write a research paper that changes the world" period from like 2012 to 2017. And I think the game changed in 2017 and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by like Ilya's constant beating of the drum that the world would be covered in data centers. And I think-

Swyx [00:02:15]: It's causally neat.

David [00:02:16]: Yeah. Well, like I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was for OpenAI, like when I first joined, I think one of the jobs that I had to do was how do I tell a differentiated vision for who we were technically compared to, you know, hey, we're just a smaller Google Brain, or like you work at OpenAI if you live in SF and don't want to commute to Mountain View or don't want to live in London, right? That's like not enough to like hang your technical identity on as a company. And so what we really did was, and I spent a lot of time pushing this, just how do we get ourselves focused on a certain class of like giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about how do you like leave some room for that, but really make it about like, what are the big scientific outcomes that you want to show? And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple of years, right?
And then what's changed now is I think the number one driver of AI products over the next couple of years is going to be the deep co-design and co-evolution of product and users for feedback and actual technology. And I think labs that are equipped to go do that are going to do really well. And that's a big part of why I started Adept.

Alessio [00:03:20]: You mentioned Dota, any memories from the switch from RL to Transformers at the time, and kind of how the industry was evolving more on the LLM side and leaving behind some of the more agent simulation work?

David [00:03:33]: Like zooming way out, I think agents are just absolutely the correct long-term direction, right? You just go define what AGI is, right? You're like, hey, like, well, first off, actually, I don't love AGI definitions that involve human replacement because I don't think that's actually how it's going to happen. Even this definition of like, hey, AGI is something that outperforms humans at economically valuable tasks has kind of an implicit view of the world about what's going to be the role of people. I think what I'm more interested in is like a definition of AGI that's oriented around like a model that can do anything a human can do on a computer. If you go think about that, which is like super tractable, then agent is just a natural consequence of that definition. And so what all the work we did on RL stuff like that got us was a really clear formulation. Like you have a goal and you want to maximize the goal, you want to maximize reward, right? And the natural LLM formulation doesn't come with that out of the box, right? I think that we as a field got a lot right by thinking about, hey, how do we solve problems of that caliber? And then the thing we forgot is that de novo RL is like a pretty terrible way to get there quickly. Why are we rediscovering all the knowledge about the world?
Years ago, I had a debate with a Berkeley professor as to what it will actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there. Right.

Swyx [00:04:44]: The biological basis theory. Right.

David [00:04:46]: So I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning of everything that humans already know. Right. So like today, maybe LLMs are like behavioral cloning every word that gets written on the internet. In the future, the multimodal models are becoming more of a thing, where we're behavioral cloning the visual world. But really, what we're just going to have is like a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are like learned by the model. And then you can regurgitate any combination now. Right. So text in, voice out, like image in, other image out or video out or whatever, like these like mappings, right? Like all just going to be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of how do we combine this with all of the lessons we learned during the RL period. That's what's going to drive progress.

Swyx [00:05:35]: I'm still going to pressure you for a few more early OpenAI stories before we turn to the Adept stuff. On your personal site, which I love, because it's really nice, like personal, you know, story context around like your history. I need to update it. It's so old. Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right?

David [00:05:53]: I actually don't quite remember. I think I was joining right around- Right around then?

Swyx [00:05:57]: I was right around that, yeah. Yeah.
So what I remember was Alec, you know, just kind of came in and was like very obsessed with Transformers and applying them to like Reddit sentiment analysis. Yeah, sentiment, that's right. Take us through-

David [00:06:09]: Sentiment neuron, all this stuff.

Swyx [00:06:10]: The history of GPT as far as you know, you know, according to you. Ah, okay.

David [00:06:14]: History of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized, where like, again, you and your three best friends write papers, right? Okay. So zooming way out, right? I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line. My job is not actually to promote a million ideas and never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right? That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too.

Swyx [00:07:17]: He was talking about trillion-parameter models in 2017.

David [00:07:20]: Yeah. So that's the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But after GPT-2, we were all really excited about GPT-2. I can tell you more stories about that.
It was the last paper that I even got to really touch before everything became more about building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion-parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing, right? He's got this decoder-only transformer that's probably going to get there before we do. And I was like, but like, please just like let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LLM effort and I was one of the Brain leads, you know, it became really clear why, right? At the time, there was a thing called the Brain credit marketplace. And did you guys know the Brain credit marketplace? No, I never heard of this. Oh, so it's actually, it's a, you can ask any Googler.

Swyx [00:08:23]: It's like just like a thing that, that, I mean, look like, yeah, limited resources, you got to have some kind of marketplace, right? You know, sometimes it's explicit, sometimes it isn't, you know, just political favors.

David [00:08:34]: You could. And so then basically everyone's assigned a credit, right? So if you have a credit, you get to buy n chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom-up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.
And I think, again, that's like part of the narrative of like this phase one of AI, right? Of like this modern AI era to phase two. And I think in the same way, phase three companies are going to out-execute phase two companies because of the same asymmetry of success.

Swyx [00:09:12]: Yeah. I think it's underrated how much NVIDIA worked with you in the early days as well. I think maybe, I think it was Jensen. I'm not sure who circulated a recent photo of him delivering the first DGX to you guys.

David [00:09:24]: I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for NVIDIA. It is unreal.

Swyx [00:09:34]: But like with OpenAI, did they like kind of give their requirements, like co-design it, or just work off of whatever NVIDIA gave them?

David [00:09:40]: So we work really closely with them. There's, I'm not sure I can share all the stories, but examples of ones that I've found particularly interesting. So Scott Gray is amazing. I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs, and Chris Berner still does a lot of stuff in that. As a result, like we had very close ties to NVIDIA. Actually, one of my co-founders at Adept, Erich Elsen, was also one of the early GPGPU people. So he and Scott and Bryan Catanzaro at NVIDIA and Jonah and Ian at NVIDIA, I think all were very close. And we're all sort of part of this group of how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. I think one interesting set of stuff is knowing, in the A100 generation, that like quad sparsity was going to be a thing. Is that something that we want to go look into, right? And figure out if that's something that we could actually use for model training. Really what it boils down to is that, and I think more and more people realize this, six years ago, even three years ago, people refused to accept it. This era of AI is really a story of compute.
It's really the story of how do you more efficiently map actual usable model flops to compute.

Swyx [00:10:38]: Is there another GPT-2 or GPT-3 story that you'd love to get out there, that you think is underappreciated for the amount of work that people put into it?

David [00:10:48]: So two interesting GPT-2 stories. One of them was I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments was we were writing the modeling section. And I'm pretty sure the modeling section was the shortest modeling section of any reasonably legitimate ML paper to that moment. It was like section three, model: this is a standard vanilla decoder-only transformer with like these particular things, a paragraph long if I remember correctly. And both of us were just looking at it being like, man, the OGs in the field are going to hate this. They're going to say no novelty. Why did you guys do this work? So now it's funny to look at in hindsight that it was a pivotal kind of paper, but I think it was one of the early ones where we just leaned fully into all we care about is solving problems in AI, and not about, hey, are there like four different really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?

Swyx [00:11:42]: Right. And it's like you innovate on maybe like dataset and scaling and not so much the architecture.

David [00:11:48]: We all know how it works now, right? Which is that there's a collection of really hard-won knowledge that you get only by being at the frontiers of scale. And that hard-won knowledge, a lot of it's not published. A lot of it is stuff that's actually not even easily reducible to what looks like a typical academic paper. But yet that's the stuff that helps differentiate one scaling program from another. You had a second one?
So the second one is, there's like some details here that I probably shouldn't fully share, but hilariously enough, for the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before. So I always had a tremendous amount of anxiety about partner meetings, which is basically what this was. I had Kevin Scott and Satya and Amy Hood, and it was my job to give the technical slides about what's the path to AGI, what's our research portfolio, all of this stuff, but it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up. And as we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so I'd spent all this time trying to figure out how to keep this thing on rails. I had my canned demos, but I knew I had to go turn it over to Satya and Kevin and let them type anything in. And that really kept me up all night.

Swyx [00:13:06]: Nice. Yeah.

Alessio [00:13:08]: I mean, that must have helped you, talking about partner meetings. You raised $420 million for Adept. The last round was a $350 million Series B, so I'm sure you do great in partner meetings.

Swyx [00:13:18]: Pitchers meetings. Nice.

David [00:13:20]: No, that's a high compliment coming from a VC.

Alessio [00:13:22]: Yeah, no, I mean, you're doing great already for us. Let's talk about Adept. We were doing pre-prep and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse. Like what is Adept? Yeah.

David [00:13:38]: So I think Adept is the least understood company in the broader space of foundational models plus agents.
So I'll give some color and I'll explain what it is, and I'll explain also why it's actually pretty different from what people would have guessed. So the goal for Adept is we basically want to build an AI agent that can help humans do anything a human does on a computer. And so what that really means is we want this thing to be super good at turning natural language goal specifications into the correct set of n steps, and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use. And so the end vision of this is effectively like, I think in a couple of years everyone's going to have access to like an AI teammate that they can delegate arbitrary tasks to, and then also be able to, you know, use it as a sounding board and just be way, way, way more productive. Right. And that just changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing like these core liberal arts skills of what should I be doing and why. Right. And I find this like really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out. I think systems like Adept are the most likely systems to be proto-AGIs. But I think the ways in which we are really counterintuitive to everybody is that we've actually been really quiet because we are not a developer company. We don't sell APIs. We don't sell open source models. We also don't sell bottom-up products. We're not a thing that you go and click and download the extension and like we want more users signing up for that thing. We're actually an enterprise company. So what we do is we work with a range of different companies, some like late-stage multi-thousand-person startups, some Fortune 500s, et cetera.
And what we do for them is we basically give them an out-of-the-box solution where big complex workflows that their employees do every day can be delegated to the model. And so we look a little different from other companies in that, in order to go build this full agent thing, the most important thing you've got to get right is reliability. So initially, zooming way back when, one of the first things that Adept did was we released this demo called ACT-1, right? ACT-1 was like pretty cool. It's like kind of become a hello world thing for people to show agent demos by going to Redfin and asking to buy a house somewhere, because like we did that in the original ACT-1 demo and like showed that, showed like Google Sheets, all this other stuff. Over the last like year since that has come out, there's been a lot of really cool demos and you go play with them and you realize they work 60% of the time. But since we've always been focused on how do we build an amazing enterprise product, and enterprises can't use anything that isn't in the nines of reliability, we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability. And we've decided to prioritize reliability over all else. So like one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if that works like 60% of the time, you're just blowing money and sending poor truck drivers places.

Alessio [00:16:30]: Interesting. One of our investment themes is this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically, with software as a service you're wrapping user productivity in software, and with agents, services as software is replacing things that, you know, you would ask somebody to do, and the software just does it for you.
When you think about these use cases, do the users still go in and look at the agent kind of like doing the things and can intervene, or are they totally removed from them? Like the truck thing, does the truck just show up or are there people in the middle checking in?

David [00:17:04]: I think there are two current flaws in the framing for services as software, or I think in what you just said. I think that one of them is like, in our experience, as we've been rolling out Adept, the people who actually do the jobs are the most excited about it because they don't go from "I do this job" to "I don't do this job." They go from "I do this job for everything, including the shitty rote stuff" to "I'm a supervisor." And I literally like, it's pretty magical when you watch the thing being used, because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, hey, I want to watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution as opposed to like LLM generations is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result. It just fails to execute. And the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves. And so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it. I think the second piece of it that we've found is our strategy as a company is to always be an augmentation company. And I think one, out of principle, that's something we really care about.
But two, actually, if you're framing yourself as an augmentation company, you're always going to live in a world where you're solving tasks that are a little too hard for what the model can do today and still need a human to provide oversight, provide clarifications, provide human feedback. And that's how you build a data flywheel. That's how you actually learn from the smartest humans how to solve things models can't do today. And so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job is to deliver you a lights-off solution for X.

Alessio [00:18:42]: Yeah. It's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and they just cannot attract the talent to do it. And similarly, in software development, you have Copilot, which is the augmentation product, and then you have sweep.dev and these products which just do the whole thing. I'm really curious to see how that evolves. I agree that today reliability is so important in the enterprise that they just don't use most of them. Yeah. Yeah. No, that's cool. But it's great to hear the story because I think from the outside, people are like, oh, Adept, they do ACT-1, they do Persimmon, they do Fuyu, they do all this stuff. Yeah, it's just the public stuff.

Swyx [00:19:20]: It's just public stuff.

David [00:19:21]: So one of the things we haven't shared before is we're completely sold out for Q1. And so I think...

Swyx [00:19:26]: Sold out of what?

David [00:19:27]: Sold out of bandwidth to onboard more customers. And so we're like working really hard to go make that less of a bottleneck, but our expectation is that I think we're going to be significantly more public about the broader product shape and the new types of customers we want to attract later this year.
So I think that clarification will happen by default.

Swyx [00:19:43]: Why have you become more public? You know, if the whole push has... You're sold out, you're mainly enterprise, but you're also clearly putting effort towards being more open or releasing more things.

David [00:19:53]: I think we just flipped over that way fairly recently. That's a good question. I think it actually boils down to two things. One, I think that, frankly, a big part of it is that the public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening, because when we started the company in January 2022, everybody in the field knew about the agents thing from RL, but the general public had no conception of what it was. They were still hanging their narrative hat on the tree of "everything's a chatbot." And so I think now one of the things that I really care about is that when people think agent, they actually think the right thing. All sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. To me, an agent is something that you can give a goal and get an n-step workflow done correctly in the minimum number of steps. And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Adept as they think about what the next thing they want to do in their careers is. The field is quickly pivoting into a world where foundation models are looking more and more commodity. And I think a huge amount of gain is going to happen from how do you use foundation models as the well-learned behavioral cloner to go solve agents. And I think people who want to do agents research should really come to Adept.

Swyx [00:21:00]: When you say agents have become more part of the public narrative, are there specific things that you point to? I'll name a few. Bill Gates in his blog post mentioning that agents are the future.
I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying that agents are the future for OpenAI.
David [00:21:17]: I think even before that, there was something like the New York Times; Cade Metz wrote a New York Times piece about it. Right now, in a bid to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company now brand themselves as an AI agent company. It's just a term I feel like people really want.
Swyx [00:21:31]: From the VC side, it's a bit mixed. Is it? As in like, I think there are a lot of VCs where like, I would not touch any agent startups because like- Why is that? Well, you tell me.
Alessio [00:21:41]: I think a lot of VCs that are maybe less technical don't understand the limitations of the-
Swyx [00:21:46]: No, that's not fair.
Alessio [00:21:47]: No, no, no, no. I think like- You think so? No, no. I think like the, what is possible today and like what is worth investing in, you know? And I think like, I mean, people look at you and say, well, these guys are building agents. They needed 400 million to do it. So a lot of VCs are maybe like, oh, I would rather invest in something that is tacking on AI to an existing thing, which is easier to get to market and kind of get some of the flywheel going. But I'm also surprised a lot of founders just don't want to do agents. It's not even the funding. Sometimes we look around and it's like, why is nobody doing agents for X? Wow.
David [00:22:17]: That's good to know actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.
Swyx [00:22:24]: So maybe I'm- They are. They are. But like I have advised people to take agents off of their title because it's so diluted.
David [00:22:31]: It's now so diluted.
Swyx [00:22:32]: Yeah. So then it doesn't stand for anything.
Yeah.
David [00:22:35]: That's a really good point.
Swyx [00:22:36]: So like, you know, you're a portfolio allocator. People know about Persimmon, people know about Fuyu and Fuyu-Heavy. Can you take us through how you think about the evolution of that, and what people should think it means for Adept and its research directions? Kind of take us through the stuff you shipped recently and how people should think about the trajectory of what you're doing.
David [00:22:56]: The critical path for Adept is we want to build agents that can do higher and higher levels of abstraction over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with a really high reliability standard but keep pushing the level of abstraction, you then learn from your users how to get to that next level of abstraction faster. So that's how you actually build the data flywheel. That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to ACT-1 days, right? The core thing behind ACT-1 is: can we teach a large model basically how to even actuate your computer? And I think we're one of the first places to have solved that and shown it, and shown the generalization that you get when you give it various different workflows and tasks. But from there on out, what we really realized was that in order to get reliability, because companies just do things in various different ways, you actually want these models to get a lot better at having some specification, some guardrails, for what they actually should be doing. And in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents.
Back then we had to do a ton of research basically on how we actually make that possible. Well, first off, back in, I forget exactly which month of '23, there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover from the primarily academic focus for multimodal models is that most multimodal models are primarily trained on natural images, cat and dog photos, stuff that's come out of the camera. COCO. Yeah, right. And COCO is awesome. Like I love COCO. I love TY. Like it's really helped the field. Right. But like that's the build one thing. I actually think it's really clear today: multimodal models are the default foundation model, right? They're just going to supplant LLMs. Like you just train a giant multimodal model. And so for that though, where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cats and dogs. Right. And so if that's what it is, what do you need to train on? I need to train on charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that. The public Fuyu and stuff aren't trained on our actual corpus; they're trained on some other stuff. But you take a lot of that data and then you make it really fast and make it really good at things like dense OCR on screens. And then now you have the right raw putty to go make a good agent. So that's kind of some of the modeling side; we've only announced some of that stuff. We haven't really announced much of the agent work, but that's if you put those together with the correct product form factor, and I think the product form factor also really matters.
I think we're seeing, and you guys probably see this a little bit more than I do, a little bit of a pushback against the tyranny of chatbots as form factor. And I think the reason why the form factor matters is that the form factor changes what data you collect in the human feedback loop. And so we've spent a lot of time doing full vertical integration of all these bits in order to get to where we are.
Swyx [00:25:44]: Yeah. I'll plug Amelia Wattenberger's talk at our conference, where she gave a little bit of the thinking behind what else exists other than chatbots that you could do if you could delegate to reliable agents. I was kind of excited about Adept Experiments or Adept Workflows, I don't know what the official name for it is. I was like, okay, this is something I can use, but it seems like it's just an experiment for now. It's not your product.
David [00:26:06]: So we basically just use Experiments as a way to go push various ideas on the design side to some people and just be like, yeah, play with it. Actually, the Experiments code base underpins the actual product, but the code base itself is kind of like a skeleton for us to go deploy arbitrary cards on the side.
Swyx [00:26:22]: Yeah.
Alessio [00:26:23]: Makes sense. I was going to say, I would love to talk about the interaction layer. So you train a model to see UI, but then there's the question of how do you actually act on the UI? I think there were some rumors about OpenAI building agents that kind of manage the endpoint, so the whole computer; you're more at the browser level. I read in one of your papers that you have a different representation, kind of like you don't just take the DOM and act on it. You do a lot more stuff.
How do you think about the best way the models will interact with software, and how the development of products is going to change with that in mind as more and more of the work is done by agents instead of people?
David [00:26:58]: There's so much surface area here, and it's actually one of the things I'm really excited about. And it's funny because I've spent most of my time doing research stuff, but there's a whole new ball game that I've been learning about and I find it really cool. So I would say the best analogy I have for why Adept is pursuing a path of being able to use your computer like a human (plus, of course, being able to call APIs; calling APIs is the easy part, using your computer like a human is the hard part) is the same reason people are excited about humanoid robotics, right? In a world where you had T equals infinity, right? You're probably going to have various different form factors that robots could be in, and all the specialization. But the fact is that humans live in a human environment. So having a humanoid robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize the number of things you want to do on your computer for which every step has an API, that number of workflows adds up to pretty close to zero. And so at many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path, and because it's the most practical path, I think a lot of success will come from going down it.
I kind of think about these early days of the agent interaction layer as a little bit like, do you all remember Windows 3.1? Like those days? Okay, this might be, I might be too old for you guys on this. But back in the day, with Windows 3.1, we had this transition period between the pure command line being the default and this new world where the GUI is the default and you drop into the command line for programmer things, right? The old way was you booted your computer up, DOS booted, and then it would give you the C:\ prompt. And you typed Windows and hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. The same thing is going to happen with agent interfaces: today the GUI is the base layer, and the agent just controls the current GUI layer plus APIs. And in the future, as more and more trust is built towards agents and more and more things can be done by agents, if more UIs for agents are actually generative in and of themselves, then that just becomes a standard interaction layer. And if that becomes the standard interaction layer, what changes for software is that a lot of software is going to be either systems of record or certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.
Alessio [00:29:19]: And you think the Rabbit interface is more like that, where you're not actually seeing the app that the model interacts with? You're just saying, hey, I need to log this call on Salesforce, and you're never actually going to salesforce.com directly as the user. I can see that being a model.
David [00:29:33]: I don't know enough about what using Rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea is that, you know, you have a goal, right? The agent knows how to break your goal down into steps.
The agent knows how to use the underlying software and systems of record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal. It all just really leads to a world where you don't really need to ever interface with the apps underneath unless you're a power user for some niche thing.
Swyx [00:30:03]: General question. So first of all, there's the sort of input mode conversation. I wonder if you have any analogies that you like from self-driving, because I do think there's a little bit of how the model should perceive the world. And you know, the primary split in self-driving is LiDAR versus camera. And I feel like most agent companies that I'm tracking are all moving towards the camera approach, which is like the multimodal approach, you know, multimodal vision, very heavy vision, all the Fuyu stuff that you're doing. You're focusing on that, including charts and tables. Do you find inspiration there from the self-driving world? That's a good question.
David [00:30:37]: I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. I think that's awesome. But I think that our number one goal is for agents not to look like self-driving. We want to minimize the chances that agents are sort of a thing that you just have to bang your head against for a long time to get to two discontinuous milestones, which is basically what's happened in self-driving. We want to be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, compared to self-driving, there are two things that people really undervalue. One, it's really easy to do the driving-a-car-down-Highway-101-on-a-sunny-day demo. That actually doesn't prove anything anymore.
And I think the second thing is that, as a non-self-driving expert, one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually a lot of what's helped us get a lot of reliability is a really strong focus on why the model doesn't do this thing. A non-trivial amount of the time, the model doesn't actually do the thing because, whether or not you're Wizard-of-Oz-ing it yourself, if you have unreliable actuators you can't do the thing. And so we've had to fix a lot of those problems.
Swyx [00:31:43]: I was slightly surprised just because I do generally consider the Waymos that we see all around San Francisco as the most, I guess, real case of agents that we have, in very material ways.
David [00:31:55]: Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature, from when it entered the consciousness and the driving-down-101-on-a-sunny-day moment happened to now. Right. So I want to see that more compressed.
Swyx [00:32:07]: And I mean, you know, Cruise, you know, RIP. And then one more thing, just going back to this reliability thing, something I have been holding in my head that I'm curious to get your commentary on: I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general production readiness and enterprise readiness at scale. Because you have reliability, you also have cost, you have speed, and speed is a huge emphasis for Adept. The tendency or the temptation is to reduce generality to improve reliability, cost, and speed. Do you perceive a trade-off? Do you have any insights that solve those trade-offs for you guys?
David [00:32:42]: There's definitely a trade-off if you're at the Pareto frontier. I think a lot of folks aren't actually at the Pareto frontier.
I think the way you get there is basically how you frame the fundamental agent problem in a way that just continues to benefit from data. I think one of the main ways of being able to solve that particular trade-off is you basically want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible. I think that's how you really solve it. Then you get into the other problems, like, okay, are you overfitting on these end use cases? You're not doing a thing where you're being super prescriptive about the n steps that the model can only do, for example.
Swyx [00:33:17]: Then the question becomes, do you have one in-house model that you then customize for each customer, fine-tuning it on each customer's specific use case?
David [00:33:25]: Yeah.
Swyx [00:33:26]: We're not sharing that. You're not sharing that. It's tempting, but that doesn't look like AGI to me. You know what I mean? That is just you have a good base model and then you fine-tune it.
David [00:33:35]: For what it's worth, I think there are two paths to a lot more capability coming out of the models that we all are training these days. I think one path is you figure out how to spend compute and turn it into data. In that path, I consider search, RL, all the things that we all love in this era, like self-play, all that stuff, as part of that path. The second path is how do you get super competent, high-intelligence demonstrations from humans? I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little second, but I think that it's going to be hard to be running at max speed towards AGI without actually solving a bit of both.
Swyx [00:34:16]: You haven't talked much about synthetic data, as far as I can tell.
Probably this is a bit too much of a trend right now, but any insights on using synthetic data to augment the expensive human data?
David [00:34:26]: The best part about framing AGI as being able to help people do things on computers is you have an environment.
Swyx [00:34:31]: Yes. So you can simulate all of it.
David [00:34:35]: You can do a lot of stuff when you have an environment.
Alessio [00:34:37]: We were having dinner for our one-year anniversary. Congrats. Yeah. Thank you. Raza from Humanloop was there, and we mentioned you were coming on the pod. This is our first-
Swyx [00:34:45]: So he submitted a question.
Alessio [00:34:46]: Yeah, this is our first, I guess, mailbag question. He asked: when you started, GPT-4 didn't exist, and now you have GPT-4 Vision to help you build a lot of those things. How do you think about the things that are unique to you as Adept, and going back to the research direction that you want to take the team and what you want people to come work on at Adept, versus what has maybe now become commoditized that you didn't expect everybody would have access to?
David [00:35:11]: Yeah, that's a really good question. I think implicit in that question, and I wish he were here too so he could push back on my assumption about his question, is a calculus of where advantage accrues in the overall ML stack. And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way you really win is that you have to go build an agent stack that is much more than the base model itself. And so I think that is always going to be a giant advantage of vertical integration. It lets us do things like have a really, really fast base model that is really good at agent things but is bad at cat and dog photos. Well, it's pretty good at cat and dog photos. It's just not SOTA at cat and dog photos, right?
So we're allocating our capacity wisely, right? That's one thing that you really get to do. I also think the other thing that is pretty important now in the broader foundation modeling space is that, despite any potential concerns about how good agents are as a startup area, like we were talking about earlier, I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from: can we make a better agent? Because right now I think we all see that, you know, if you're training on publicly available web data, you put in the flops and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. And so I think pure-play foundation model companies are just going to be pinched by how good the next couple of Llamas are going to be and the next good open-source thing. And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think that is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.
Swyx [00:36:56]: So you don't consider yourself a pure-play foundation model company?
David [00:36:59]: No, because if we were a pure-play foundation model company, we would be training general foundation models that do summarization and all this other...
Swyx [00:37:06]: You're dedicated towards the agent. Yeah.
David [00:37:09]: And our business is an agent business. We're not here to sell you tokens, right? And I think selling tokens, unless there's like a...
Swyx [00:37:14]: Not here to sell you tokens. I love it.
David [00:37:16]: It's like, if you have a particular area of specialty, right? Then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute.
But if you don't have a specialty, I think it's going to be a little tougher.
Swyx [00:37:27]: Interesting. Are you interested in robotics at all? Just a...
David [00:37:30]: I'm personally fascinated by robotics. I've always loved robotics.
Swyx [00:37:33]: Embodied agents as a business, you know. Figure is a big, also sort of OpenAI-affiliated company that raised a lot of money.
David [00:37:39]: I think it's cool. I think, I mean, I don't know exactly what they're doing, but...
Swyx [00:37:44]: Robots. Yeah.
David [00:37:46]: Well, I mean, that's a...
Swyx [00:37:47]: Yeah. What question would you ask? If we had them on, what would you ask them?
David [00:37:50]: Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it.
Swyx [00:37:57]: And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there, isn't that a strategy? Oh yeah.
David [00:38:04]: Yeah. Sorry. I'm not questioning whether they're doing smart things. I genuinely don't know what they're doing as much, but I think there are two things. One, I'm so excited for someone to train a foundation model for robots. I think it's just going to work. Like I will die on this hill, but I mean, again, this whole time we've been on this podcast, we've been continually saying these models are basically behavioral cloners. Right. So let's go behavioral-clone all this robot behavior. Right. And then you figure out everything else you have to do in order to teach it how to solve a new problem. That's going to work. I'm super stoked for that. Two, unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero-sum job replacement play. Right. And I'm personally less excited about that.
Alessio [00:38:46]: We had Kanjun from Imbue on the podcast.
We asked her why people should go work there and not at Adept.
Swyx [00:38:52]: Oh, that's so funny.
Alessio [00:38:54]: Well, she said, you know, there's space for everybody in this market. We're all doing interesting work. And she said they're really excited about building an operating system for agents. And for her, the biggest research thing was getting models to reason and plan better for these agents. The reverse question to you: why should people be excited to come work at Adept instead of Imbue? And maybe what are the core research questions that people should be passionate about to have fun at Adept? Yeah.
David [00:39:22]: First off, I'm sure you guys believe this too: the AI space, to the extent there's an AI space, and the AI agent space are both, exactly as she likely said, colossal opportunities, and people are just going to end up winning in different areas and a lot of companies are going to do well. So I really don't feel that zero-sum at all. I would change the zero-sum framing to: why should you be at Adept? I think there are two huge reasons to be at Adept. One of them is that everything we do is in the service of useful agents. We're not a research lab. We do a lot of research in service of that goal, but we don't think about ourselves as a classic research lab at all. And the second reason to be at Adept is that if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And an example of why that's true is our evaluations: they're not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to do that we can't do at all. We've turned those into evals; solve it, right? I think that's really cool.
Everybody knows a lot of these evals are pretty saturated, and even the new ones that are not saturated, you look at them and you're like, is this actually useful? Right? I think that's a degree of practicality that really helps. We're equally excited about the same problems around reasoning and planning and generalization and all of this stuff, but they're very grounded in actual needs right now, which is really cool.
Swyx [00:40:45]: Yeah. This has been a wonderful dive. You know, I wish we had more time, but I would just leave it kind of open to you. I think you have broad thoughts, you know, just about
Stephen Wolfram answers questions from his viewers about the history of science and technology as part of an unscripted livestream series, also available on YouTube here: https://wolfr.am/youtube-sw-qa
Questions include:
- Do you think houses are going to change much in the future? Will we reach the age of true "smart houses"?
- Within the next 20 years, will "artificial intelligence" image recognition and/or image segmentation systems equal the accuracy of expert humans? For example, will an AI pathologist or radiologist equal the performance of a human pathologist or radiologist?
- How long do you estimate before AI can do creative mathematics? How will this technology be similar to or different from GPT?
- Do you think smartphones will replace desktop computing?
- Does it make sense to pursue a math degree in the age of AI?
- Will different advanced AGIs try to compete with each other for resources?
- Which is more of an existential threat: AI or quants?
- Are we now stuck with COBOL running most of the world economy for the rest of our lives?
- In your opinion, is the concept of Maxwell's demon theoretically possible, and does it have the potential to violate the second law of thermodynamics? Furthermore, could you shed light on how computational limits may affect physical phenomena and our understanding thereof? And what about time: how are the second law of thermodynamics, computation and time connected?
- Stanisław Lem's Summa Technologiae made some strikingly accurate predictions about technology development back in the 1960s. What is your perspective on Lem's predictive prowess? Do you find it remarkable that such accurate foresight of the distant future is possible? I'd appreciate any thoughts you might have on the predictive power and limitations of technological forecasting.
- Were there ideas to put 10 months in a year?
- Can AI be used to create better prompts, or is that dependent on human consciousness?
- Which will history judge as the biggest letdown: 2023's AI mania and panics, "VR is the inevitable near future" from the 2010s, or the film A.I. Artificial Intelligence from 2001?
- Will AI-based tutors replace most human tutors in the next five years?
We will be recording a preview of the AI Engineer World's Fair soon with swyx and Ben Dunphy; send us any questions you have about Speaker CFPs and Sponsor Guides!
Alessio is now hiring engineers for a new startup he is incubating at Decibel: the ideal candidate is an ex-technical-co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc.). Reach out to him for more!
Thanks for all the love on the Four Wars episode! We're excited to develop this new “swyx & Alessio rapid-fire thru a bunch of things” format with you, and feedback is welcome.
Jan 2024 Recap
The first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024.
Feb 2024 Recap
The second half catches you up on everything that was topical in Feb, including:
* OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan
* Google Gemini Pro 1.5 - 1m Long Context, Video Understanding
* Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs Dylan math)
* The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!)
* Grimes' poetic take: Art for no one, by no one
* F*** you, show me the prompt
Latent Space Anniversary
Please also read Alessio's longform reflections on One Year of Latent Space!
We launched the podcast 1 year ago with Logan from OpenAI, and also held an incredible demo day that got covered in The Information.
Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs.
The winners were Pixee and RWKV (that's Eugene from our pod!). And finally, your cohosts got cake!
We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China:
* Balázs Némethi
* Sylvia Tong
* RJ Honicky
* Jan Zheng
Our birthday wishes for the super loyal fans reading this: tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend!
As always, feedback is welcome.
Timestamps
* [00:03:02] Top Five LLM Directions
* [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
* [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN)
* [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations)
* [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)
* [00:23:33] Wildcards: Text Diffusion, RALM/Retro
* [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)
* [00:28:26] Wildcard: Model Merging (mergekit)
* [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa)
* [00:33:18] OpenAI Sora and why everyone underestimated videogen
* [00:36:18] Does Sora have a World Model? Yann LeCun vs Jim Fan
* [00:42:33] Groq Math
* [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
* [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take
* [00:58:39] F*** you, show me the prompt
* [01:02:43] Send us your suggestions pls
* [01:04:50] Latent Space Anniversary
* [01:04:50] Lindy.ai - Agent Platform
* [01:06:40] RWKV - Beyond Transformers
* [01:15:00] Pixee - Automated Security
* [01:19:30] Julius AI - Competing with Code Interpreter
* [01:25:03] Latent Space Listeners
* [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club)
* [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)
* [01:31:23] Listener 3 - RJ (Developers building Community & Content)
* [01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)
Transcript
[00:00:00] AI Charlie: Welcome to the Latent Space podcast, weekend edition. This is Charlie, your new AI co-host. Happy weekend. As an AI language model, I work the same every day of the week, although I might get lazier towards the end of the year. Just like you. Last month, we released our first monthly recap pod, where Swyx and Alessio gave quick takes on the themes of the month, and we were blown away by your positive response.
[00:00:33] AI Charlie: We're delighted to continue our new monthly news recap series for AI engineers. Please feel free to submit questions by joining the Latent Space Discord, or just hit reply when you get the emails from Substack. This month, we're covering the top research directions that offer progress for text LLMs, and then touching on the big Valentine's Day gifts we got from Google, OpenAI, and Meta.
[00:00:55] AI Charlie: Watch out and take care.
[00:00:57] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and we're back with a monthly recap with my co-host
[00:01:06] swyx: Swyx.
The reception was very positive for the first one. I think people have requested this, and no surprise, I think they want to hear us opining more on issues and maybe dropping some alpha along the way. I'm not sure how much alpha we have to drop. This month, February, was a very, very heavy month, and we also did not do one specifically for January, so I think we're just going to do a two-in-one, because we're recording this on the first of March.
[00:01:29] Alessio: Yeah, let's get to it. I think the last one we did, the four wars of AI, was the main kind of mental framework for people. I think in the January one, we had the five worthwhile directions for state-of-the-art LLMs. Four, five,
[00:01:42] swyx: and now we have to do six, right? Yeah.
[00:01:46] Alessio: So maybe we just want to run through those, and then do the usual news recap, and we can do
[00:01:52] swyx: one each.
[00:01:53] swyx: So the context to this stuff is, one, I noticed that the test of time concept from NeurIPS, and just in general as a life philosophy, I think is a really good idea. Especially in AI, there's news every single day, and after a while you're just like, okay, everyone was excited about this thing yesterday, and now nobody's talking about it.
[00:02:13] swyx: So, yeah. It's more important, or a better use of time, to spend time on things that will stand the test of time. And I think for people to have a framework for understanding what will stand the test of time, they should have something like the four wars. Like, what are the themes that keep coming back, because they are limited resources that everybody's fighting over.
[00:02:31] swyx: Whereas this one, I think the focus for the five directions is just on research that seems more promising than others, because there's all sorts of papers published every single day, and there's no organization
telling you, like, this one's more important than the other one, apart from, you know, Hacker News votes and Twitter likes and whatever.
[00:02:51] swyx: And obviously you want to get in a little bit earlier than something where, you know, the test of time is counted by reference citations.
[00:02:59] The Five Research Directions
[00:02:59] Alessio: Yeah, let's do it. We got five. Long inference.
[00:03:02] swyx: Let's start there. Yeah, yeah. So, just to recap at the top, the five trends that I picked, and obviously if you have some that I did not cover, please suggest something.
[00:03:13] swyx: The five are long inference, synthetic data, alternative architectures, mixture of experts, and online LLMs. And something that I think might be a bit controversial is that this is a sorted list, in the sense that I am not the guy saying that Mamba is the future, and so maybe that's controversial.
[00:03:31] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
[00:03:31] swyx: But anyway, long inference is a thesis I pushed before on the newsletter, in discussing the thesis that, you know, Code Interpreter is GPT 4.5. That was the title of the post. And it's one of many ways in which we can do long inference. You know, long inference also includes chain of thought, like, please think step by step.
[00:03:52] swyx: But it also includes flow engineering, which is what Itamar from Codium coined, I think in January, where basically, instead of stuffing everything in a prompt, you do sort of multi-turn iterative feedback and chaining of things. In a way, this is a rebranding of what a chain is, what LangChain is supposed to be.
[00:04:15] swyx: I do think that maybe SGLang from LMSYS is a better name. Probably the neatest way of flow engineering I've seen yet, in the sense that everything is a one-liner; it's very, very clean code. I highly recommend people look at that.
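Flow engineering, as described here, swaps one giant prompt for a loop: generate, check, and feed the failure back into the next turn. The sketch below is a minimal illustration of that loop under stated assumptions, not Codium's, LangChain's, or SGLang's code: `flow_engineer`, `make_scripted_generator`, and `run_add_test` are hypothetical names, and the scripted generator stands in for real LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# generate(task, feedback) -> candidate source; feedback is None on the first turn.
GenerateFn = Callable[[str, Optional[str]], str]
# check(candidate) -> None if it passes, else an error message for the next turn.
CheckFn = Callable[[str], Optional[str]]


def flow_engineer(task: str, generate: GenerateFn, check: CheckFn,
                  max_turns: int = 5) -> Tuple[Optional[str], List[str]]:
    """Multi-turn loop: generate, check, and feed the failure into the next turn."""
    attempts: List[str] = []
    feedback: Optional[str] = None
    for _ in range(max_turns):
        candidate = generate(task, feedback)
        attempts.append(candidate)
        feedback = check(candidate)
        if feedback is None:  # passed the check: stop early
            return candidate, attempts
    return None, attempts  # turn budget exhausted


def make_scripted_generator(outputs: List[str]) -> GenerateFn:
    """Hypothetical stand-in for an LLM: replays canned outputs in order."""
    remaining = iter(outputs)
    return lambda task, feedback: next(remaining)


def run_add_test(source: str) -> Optional[str]:
    """Run one unit test against the candidate's `add` function.

    exec is used on trusted, canned strings only; real model output needs a sandbox.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"error: {exc!r}"
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


if __name__ == "__main__":
    gen = make_scripted_generator([
        "def add(a, b):\n    return a - b\n",  # first try is wrong
        "def add(a, b):\n    return a + b\n",  # fixed after feedback
    ])
    solution, attempts = flow_engineer("write add(a, b)", gen, run_add_test)
    print(len(attempts), solution is not None)  # 2 True
```

The design choice worth noting is that the checker's error text, not just a pass/fail bit, flows into the next call; that feedback channel is what distinguishes this from simply sampling again.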
I'm surprised it hasn't caught on more, but I think it will. It's weird that something like DSPy is more hyped than SGLang.
[00:04:36] swyx: Because it, you know, maybe obscures the code a little bit more. But both of these are, you know, really good chain-y, long-inference-type approaches. But basically, the fundamental insight is that there are only a few dimensions we can scale LLMs on. So, let's say in like 2018, 2019, 2020, we were realizing that we could scale the number of parameters.
[00:05:03] swyx: And we scaled that up to 175 billion parameters for GPT-3. And we did some work on scaling laws, which we also talked about in our Datasets 101 episode, where we're like, okay, we think the right number is 300 billion tokens to train 175 billion parameters. And then DeepMind came along and trained Gopher and Chinchilla and said, no, no, we think the optimal,
[00:05:28] swyx: compute-optimal ratio is 20 tokens per parameter. And now, of course, with Llama and the sort of super-Chinchilla scaling laws, we have 200 times and often 2,000 times as many tokens as parameters. So now, instead of scaling parameters, we're scaling data. And fine, we can keep scaling data. But what else can we scale?
[00:05:52] swyx: And I think understanding the ability to scale things is crucial to understanding what to pour money and time and effort into, because there's a limit to how much you can scale some things. And I think people don't think about the ceilings of things. And so the remaining ceiling of inference is like, okay, we have scaled compute, we have scaled data, we have scaled parameters, like, model size, let's just say.
[00:06:20] swyx: Like, what else is left? What's the low-hanging fruit? And it's, like, blindingly obvious that the remaining low-hanging fruit is inference time.
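The tokens-per-parameter figures in this stretch are plain ratios, and a few lines of arithmetic make the shift concrete. This is a sketch: the GPT-3 numbers (175 billion parameters, 300 billion tokens) and the Chinchilla ratio of 20 come from the conversation, while the Llama 2 7B figure (about 2 trillion pretraining tokens) is added purely for illustration.

```python
def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def chinchilla_tokens(params: float, ratio: float = 20.0) -> float:
    """Token count implied by a fixed tokens-per-parameter ratio (~20 for Chinchilla)."""
    return params * ratio


gpt3 = tokens_per_param(300e9, 175e9)      # about 1.7 tokens per parameter
gpt3_optimal = chinchilla_tokens(175e9)    # 3.5 trillion tokens at 20:1
llama2_7b = tokens_per_param(2e12, 7e9)    # about 286 tokens per parameter

print(f"GPT-3: {gpt3:.1f}  Chinchilla-optimal 175B: {gpt3_optimal:.1e}  Llama 2 7B: {llama2_7b:.0f}")
```

Read this way, "scaling data" means the ratio moved from under 2 to the low hundreds, which is why the remaining headroom the speaker points to is inference time rather than more tokens.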
So, like, we have scaled training time. We can probably scale those things more, but, like, not 10x, not 100x, not 1000x. Right now, maybe a good run of a large model is three months.
[00:06:40] swyx: We can scale that to three years. But can we scale that to 30 years? No, right? It starts to get ridiculous. So it's just the orders of magnitude of scaling; we're just running out there. But in terms of the amount of time that we spend inferencing, everything takes, you know, a few milliseconds, a few hundred milliseconds, depending on whether you're taking it token by token or, you know, the entire phrase.
[00:07:04] swyx: But we can scale that to hours, days, months of inference and see what we get. And I think that's really promising.
[00:07:11] Alessio: Yeah, we'll have Mike from Brightwave back on the podcast. But I tried their product, and their reports take about 10 minutes to generate instead of just being real time. I think to me the most interesting thing about long inference is that you're shifting the cost to the customer depending on how much they care about the end result.
[00:07:31] Alessio: If you think about prompt engineering, it's like the first part, right? You can either do a simple prompt and get a simple answer, or do a complicated prompt and get a better answer. It's up to you to decide how to do it. Now it's like, hey, instead of training this for three years, I'll still train it for three months, and then I'll teach you how to make it run for 10 minutes to get a better result.
[00:07:52] Alessio: So you're kind of parallelizing the improvement of the LLM.
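One simple way to turn extra inference time into better answers is to sample many candidates and keep one that a verifier accepts. The toy below illustrates only that general idea; it is not how Brightwave, AlphaGeometry, or Q* work. Assumptions: `propose` is a hypothetical stand-in for one stochastic model sample, and the task (find an integer root of x² - 5x + 6) was chosen because it is trivially checkable.

```python
import random
from typing import Optional


def propose(rng: random.Random) -> int:
    """Hypothetical stand-in for one stochastic model sample: a guessed integer."""
    return rng.randint(-10, 10)


def verify(x: int) -> bool:
    """Cheap, exact checker: is x a root of x^2 - 5x + 6 (roots 2 and 3)?"""
    return x * x - 5 * x + 6 == 0


def solve_with_budget(budget: int, seed: int = 0) -> Optional[int]:
    """Spend up to `budget` samples; return the first candidate the verifier accepts."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = propose(rng)
        if verify(candidate):
            return candidate
    return None


def success_rate(budget: int, trials: int = 200) -> float:
    """Fraction of independent runs (different seeds) that find a root."""
    hits = sum(solve_with_budget(budget, seed=s) is not None for s in range(trials))
    return hits / trials


for budget in (1, 5, 20):
    print(budget, success_rate(budget))
```

Each sample succeeds with probability 2/21, so the success rate should climb roughly as 1 - (19/21)^budget: near 10% for one sample and near 87% for twenty. The catch, as the conversation notes, is that this only works when a cheap verifier exists; spending inference blindly is where planning has to come in.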
[00:07:57] swyx: Oh yeah, you can even parallelize that, too.
[00:07:58] Alessio: So, and I think, you know, for me, especially the work that I do, it's less about state of the art in the absolute; it's more about state of the art for my application, for my use case.
[00:08:09] Alessio: And I think we're getting to the point where most companies and customers don't really care about state of the art anymore. It's like, I can get this to do a good enough job; I just need it to get better. Like, how do I do long inference? You know, people are not really doing a lot of work in that space, so yeah, excited to see more.
[00:08:28] swyx: So then the last point I'll mention here is something I also mentioned as a paper. All these directions are kind of guided by what happened in January. That was my way of doing a January recap, which means that if there was nothing significant in that month, I also didn't mention it, which I came to regret come February 15th. But in January, you know, there was also the AlphaGeometry paper, which I kind of put in this long inference bucket, because it solves, you know, math olympiad geometry problems needing more than 100 steps at a human gold medalist level, and that also involves planning, right?
[00:08:59] swyx: So if you want to scale inference, you can't scale it blindly, because autoregressive token-by-token generation is only going to get you so far. You need good planning. And I think probably, yeah, what Mike from Brightwave is now doing, and what everyone is doing, including maybe what we think Q* might be, is some form of search and planning.
[00:09:17] swyx: And it makes sense. You want to spend your inference time wisely.
[00:09:22] Alessio: How do you think about plans that work and getting them shared? You know, I feel like if you're planning a task, the models are stochastic.
So everybody initially gets different results. Somebody is going to end up generating the best plan to do something, but for most people there's no easy way to store these plans and then reuse them.
[00:09:44] Alessio: You know, I'm curious if there's going to be some paper or some work there on making it better, because, yeah, we don't really have
[00:09:52] swyx: This is your pet topic of NPM for
[00:09:54] Alessio: Yeah, yeah, NPM, exactly. You need NPM for anything, man. You need NPM for skills. You need NPM for planning. Yeah, yeah.
[00:10:02] Alessio: You know, I think, I mean, obviously the Voyager paper is the most basic example, where now their artifact is the best plan to make a diamond pickaxe in Minecraft, and everybody can just use that. They don't need to come up with it again. Yeah. But there's nothing like that for actually useful tasks.
[00:10:19] swyx: For plans? I believe it for skills. I like that. Basically, that just means a bunch of integration tooling. You know, GPT built me integrations to all these things. And, you know, I just came from an integrations-heavy business, and I could definitely propose some version of that. It's just, you know, hard to execute or expensive to execute.
[00:10:38] swyx: But for planning, I do think that everyone lives in slightly different worlds. They have slightly different needs. And I think that will probably be the main hurdle for any sort of library or package manager for planning. But there should be a meta plan of how to plan.
[00:10:57] swyx: And maybe you can adopt that. And I think a lot of people have these meta-prompting strategies of, like, I'm not prescribing you the prompt; I'm just saying, here are the fill-in-the-blanks, or the mad libs, of how to prompt.
First you have the roleplay, then you have the intention, then you have the do-something, then you have the don't-something, and then you have the "my grandmother is dying, please do this."
[00:11:19] swyx: So the meta plan you could take off the shelf and test a bunch of them at once. I like that. That was the initial, maybe, promise of the prompting libraries. You know, both LangChain and LlamaIndex have hubs that you can sort of pull off the shelf. I don't think they're very successful, because people like to write their own.
[00:11:36] swyx: Yeah.
[00:11:37] Direction 2: Synthetic Data (WRAP, SPIN)
[00:11:37] Alessio: Yeah, yeah, yeah. That's a good segue into the next one, which is synthetic data.
[00:11:41] swyx: Synthetic data is so hot. Yeah, and, you know, I feel like I should do one of these memes where it's like, oh, I used to call it, you know, RLAIF, and now I call it synthetic data, and then people are interested.
[00:11:54] swyx: But there's gotta be older versions of what synthetic data really is, because I'm sure, if you've been in this field long enough, there are just different buzzwords that the industry converges on. Anyway, the insight that I think is relatively new, why people are excited about it now and why it's promising now, is that we have evidence that LLMs can generate data to improve themselves with no teacher LLM.
[00:12:22] swyx: For all of 2023, when people said synthetic data, they really kind of meant generating a whole bunch of data from GPT-4 and then training an open source model on it. Hello to our friends at Nous Research. That's what Nous Hermes is. They're very, very open about that. I think they have said that they're trying to migrate away from that.
[00:12:40] swyx: But it is explicitly against OpenAI's Terms of Service. Everyone knows this, you know, especially once ByteDance got banned for doing exactly that.
So synthetic data that is not a form of model distillation is the hot thing right now: the idea that you can bootstrap better LLM performance from the same LLM, which is very interesting.
[00:13:03] swyx: A variant of this is RLAIF, where you have a sort of constitutional model, or, you know, some kind of judge model that is more aligned. But that's not really what most people mean when they talk about synthetic data. Synthetic data is just really, I think, you know, generating more data in some way.
[00:13:23] swyx: A lot of people, I think we talked about this with Vipul in the Together episode, where I think he commented that you just have to have a good world model, or a good sort of inductive bias, or whatever the term of art is. And that is strongest in math and code, where you can verify what's right and what's wrong.
[00:13:44] swyx: And so the ReST-EM paper from DeepMind explored that very well. It's just the most obvious thing. And then once you get out of that domain, of things where you can arbitrarily generate a whole bunch of stuff and verify whether it's correct, and therefore it's correct synthetic data to train on, once you get into more fuzzy topics, then it's a bit less clear. So I think the papers that drove this understanding: there are two big ones and then one smaller one. One was WRAP, like Rephrasing the Web, from Apple, where they basically rephrased all of the C4 dataset with Mistral and then trained on that instead of C4.
[00:14:23] swyx: And so the new C4 trained much faster and cheaper than regular raw C4. And that was very interesting. And I have told some friends of ours that they should just throw out their own existing datasets and just do that, because that seems like a pure win.
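The "generate, then keep only what you can verify" loop described here fits in a few lines. This is a simplified illustration in the spirit of ReST-EM's generate-and-filter step, not the paper's code: `sample_answer` is a hypothetical noisy model, exact arithmetic plays the verifier, and the fine-tuning step that would consume the filtered dataset is omitted.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    prompt: str
    completion: str


def sample_answer(a: int, b: int, rng: random.Random) -> int:
    """Hypothetical stand-in for the model: right about 60% of the time, else off by one."""
    if rng.random() < 0.6:
        return a + b
    return a + b + rng.choice([-1, 1])


def build_verified_dataset(n_problems: int, samples_per_problem: int,
                           seed: int = 0) -> List[Example]:
    """Generate problems, sample several answers each, and keep only verified ones."""
    rng = random.Random(seed)
    kept: List[Example] = []
    for _ in range(n_problems):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        for _ in range(samples_per_problem):
            answer = sample_answer(a, b, rng)
            if answer == a + b:  # programmatic verifier: exact arithmetic check
                kept.append(Example(prompt=f"{a} + {b} =", completion=str(answer)))
    return kept


dataset = build_verified_dataset(n_problems=50, samples_per_problem=4)
print(len(dataset), dataset[0])
```

The filter is what makes self-generated data safe to train on in math and code; in fuzzier domains there is no `answer == a + b` to lean on, which is exactly the limitation raised in the conversation.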
Obviously we have to study what the trade-offs are.
[00:14:42] swyx: I imagine there are trade-offs. So I was just thinking about this last night. If you do synthetic data and it's generated from a model, probably you will not train on typos. So once the model that's trained on synthetic data encounters its first typo, it'll be like, what is this?
[00:15:01] swyx: I've never seen this before. So it has no association or correction, as in, oh, these tokens are often typos of each other, therefore they should be kind of similar. I don't know. That really remains to be seen, I think. I don't think that the Apple people explored that.
[00:15:15] Alessio: Yeah, isn't that the whole mode collapse thing, if we do more and more of this at the end of the day?
[00:15:22] swyx: Yeah, that's one form of that. Yeah, exactly. Microsoft also had a good paper on text embeddings. And then I think there's a Meta paper on self-rewarding language models that everyone is very interested in. Another paper was also SPIN. These are all things we covered in the Latent Space Paper Club.
[00:15:37] swyx: But also, you know, I just kind of recommend those as top reads of the month. Yeah, I don't know if there's much else. And then, regarding the potential of it, I think it's high potential because, one, it solves one of the data war issues that we have. Like, OpenAI is paying Reddit 60 million dollars a year for their user-generated data.
[00:15:56] swyx: Google, right?
[00:15:57] Alessio: Not OpenAI.
[00:15:59] swyx: Is it Google? I don't know.
[00:16:00] Alessio: Well, somebody's paying them 60 million, that's for sure.
[00:16:04] swyx: Yes, that is, yeah, yeah, and then I think it's maybe not confirmed who. But yeah, it is Google. Oh my god, that's interesting.
Okay, because everyone was saying, like, Sam Altman owns 5 percent of Reddit, which is apparently 500 million worth of Reddit; he owns more than, like, the founders.
[00:16:21] Alessio: Not enough to get the data, I guess.
[00:16:22] swyx: So it's surprising that it would go to Google instead of OpenAI, but whatever. Okay, yeah, so I think that's all super interesting in the data field. I think it's high potential because we have evidence that it works. There's no doubt that it works. I think the doubt is what the ceiling is, which is the mode collapse thing.
[00:16:42] swyx: If it turns out that the ceiling is pretty close, then this will maybe augment our data by, I don't know, 30 to 50 percent. Good, but not game changing.
[00:16:51] Alessio: And most of the synthetic data stuff is reinforcement learning on a pre-trained model. People are not really doing pre-training on fully synthetic data at a large enough scale.
[00:17:02] swyx: Yeah, unless one of our friends that we've talked to succeeds. Yeah, yeah. Pre-training-scale synthetic data, I think that would be a big step. Yeah. And then there's a wildcard. So for all of these smaller directions,
[00:17:15] Wildcard: Multi-Epoch Training (OLMo, Datablations)
[00:17:15] swyx: I always put a wildcard in there. And one of the wildcards is, okay, let's say you've scraped all the data on the internet that you think is useful.
[00:17:25] swyx: It seems to top out somewhere between 2 trillion and 3 trillion tokens. Maybe 8 trillion if Mistral gets lucky. Okay, if I need 80 trillion, if I need 100 trillion, where do I go? You can do synthetic data maybe, but maybe that only gets you to like 30, 40 trillion. Where is the extra alpha?
[00:17:43] swyx: And maybe the extra alpha is just training more on the same tokens.
Which is exactly what OLMo did. Nathan Lambert at AI2: just after he did the interview with us, they released OLMo. So it's unfortunate that we didn't get to talk much about it. But OLMo actually did 1.5 epochs on all data.
[00:18:00] swyx: And the Datablations paper that I covered at NeurIPS says that you don't really start to tap out of the alpha, or the sort of improved loss that you get from data, all the way until four epochs. And so I'm just like, okay, why do we all agree that one epoch is all you need?
[00:18:17] swyx: It seems to be a trend. It seems that we think memorization is very good, or too good. But then we're also finding that, for improvement in results that we really like, we're fine with overtraining on things intentionally. So I think that's an interesting direction that I don't see people exploring enough.
[00:18:36] swyx: And the more I see papers coming out stretching beyond the one-epoch thing, the more people are like, it's completely fine, and actually the only reason we stopped is because we ran out of compute budget.
[00:18:46] Alessio: Yeah, I think that's the biggest thing, right?
[00:18:51] swyx: Like, that's not a valid reason; that's not science.
[00:18:54] Alessio: I wonder if, you know, Meta is going to do it.
[00:18:57] Alessio: I heard for Llama 3 they want to do a 100 billion parameter model. I don't think you can train that on too many epochs, even with their compute budget, but yeah. They're the only ones that can save us, because even if OpenAI is doing this, they're not going to tell us, you know. Same with DeepMind.
[00:19:14] swyx: Yeah, and so the update that we got on Llama 3 so far is apparently that, because of the Gemini news that we'll talk about later, they're pushing back the release.
[00:19:21] swyx: They already have it, and they're just pushing it back to do more safety testing.
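The data-budget arithmetic behind the multi-epoch wildcard is simple enough to write down. A sketch under assumptions: the 3 trillion unique tokens and the 100 trillion target are the figures from the conversation, and treating four epochs as the limit of useful repetition is the speaker's reading of the Datablations result, used here as a rule of thumb rather than a law.

```python
def epochs_needed(target_tokens: float, unique_tokens: float) -> float:
    """Passes over the unique data implied by a target training-token budget."""
    return target_tokens / unique_tokens


def repeated_token_budget(unique_tokens: float, useful_epochs: float = 4.0) -> float:
    """Total training tokens if repetition stays useful for `useful_epochs` passes."""
    return unique_tokens * useful_epochs


unique = 3e12                                # upper end of the scraped-web estimate
print(epochs_needed(100e12, unique))         # about 33 epochs to reach 100T tokens
print(repeated_token_budget(unique))         # 12 trillion tokens at 4 epochs
print(repeated_token_budget(unique, 1.5))    # 4.5 trillion tokens at 1.5 epochs
```

Under these assumptions, repetition roughly quadruples the usable budget, but reaching 100 trillion tokens would need about eight times more repetition than the four-epoch rule of thumb allows.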
Politics testing.
[00:19:28] Alessio: Well, our episode with Soumith will have already come out by the time this comes out, I think. So people will get the inside story on how they actually allocate the compute.
[00:19:38] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)
[00:19:38] Alessio: Alternative architectures. Well, shout out to RWKV, who won one of the prizes at our Final Frontiers event last week.
[00:19:47] Alessio: We talked about Mamba and StripedHyena on the Together episode. A lot of, yeah, Monarch Mixers. I feel like Together is like the strong Stanford Hazy Research partnership, because Chris Ré is one of the co-founders. So I feel like they're going to be the ones that have one of the state-of-the-art models, alongside maybe RWKV.
[00:20:08] Alessio: I haven't seen as many independent people working on this stuff. Monarch Mixer, yeah, Mamba, Hyena, all of these are Together-related. Nobody understands the math. They got all the gigabrains, they got Tri Dao, they got all these folks in there working on all of this.
[00:20:25] swyx: Albert Gu, yeah. Yeah, so what should we comment about it?
[00:20:28] swyx: I mean, I think it's useful, interesting, but at the same time, both of these are supposed to do really good scaling for long context. And then Gemini comes out and goes like, yeah, we don't need it. Yeah.
[00:20:44] Alessio: No, that's the risk. So, yeah. I was gonna say, maybe it's not here, but I don't know if we want to talk about diffusion transformers as part of the alt architectures, just because of Sora.
[00:20:55] swyx: One thing, yeah, so, you know, this came from the Jan recap, and diffusion transformers were not really a discussion, and then, obviously, they blew up in February. Yeah.
It's a mixed architecture in the same way that StripedHyena is mixed: there are just different layers taking different approaches.
[00:21:13] swyx: Also, I think another one that I maybe didn't call out here, because it happened in February, was Hourglass Diffusion from Stability. But also, you know, another form of mixed architecture. So I guess that is interesting. I don't have much commentary on that. I just think we will try to evolve these things, and maybe one of these architectures will stick and scale. It seems like diffusion transformers are going to be good for anything generative, you know, multimodal.
[00:21:41] swyx: We don't see anything where diffusion is applied to text yet, and that's the wildcard for this category. Yeah, I mean, I still hold out hope for, let's just call it, sub-quadratic LLMs. I think that a lot of discussion this month was also centered around this concept that people always say, oh, transformers don't scale because attention is quadratic in the sequence length.
[00:22:04] swyx: Yeah, but, you know, attention actually is a very small part of the actual compute that is being spent, especially in inference. And this is the reason why, when you jump up in terms of the context size in GPT-4 from, you know, 8k to 32k, you don't also get, like, a 16 times increase in your cost.
[00:22:23] swyx: And this is also why you don't get a million times increase in your latency when you throw a million tokens into Gemini. People have figured out tricks around it, or it's just not that significant a term as part of the overall compute. So there are a lot of challenges to this thing working.
[00:22:43] swyx: It's really interesting how hyped people are about this versus, I don't know if it's exactly gonna work.
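A rough FLOP count shows why quadratic attention is a smaller share of compute than people often assume. This sketch assumes a standard decoder layer with a 4x-wide MLP and counts only the big matrix multiplies (2 FLOPs per multiply-add): about 24·d² FLOPs per token for the projections and MLP, plus about 4·n·d for attention scores and the weighted sum over values, where n is the number of tokens attended to. GPT-4's dimensions are not public, so GPT-3's published width (d = 12288) stands in.

```python
def attention_flop_share(n_ctx: int, d_model: int) -> float:
    """Approximate fraction of per-token forward FLOPs spent in attention itself.

    Per layer; the layer count cancels in the ratio.
    """
    dense = 24 * d_model * d_model  # QKV + output projections (8d^2) + 4x MLP (16d^2)
    attn = 4 * n_ctx * d_model      # QK^T scores (2nd) + weighted sum over V (2nd)
    return attn / (dense + attn)


def per_token_cost_ratio(n_new: int, n_old: int, d_model: int) -> float:
    """How much more each token costs when the attended context grows."""
    def cost(n: int) -> int:
        return 24 * d_model * d_model + 4 * n * d_model
    return cost(n_new) / cost(n_old)


D = 12288  # GPT-3 175B width, used as a stand-in
for n in (2048, 8192, 32768):
    print(n, round(attention_flop_share(n, D), 3))
print(round(per_token_cost_ratio(32768, 8192, D), 2))
```

Under these assumptions, attention is about 3% of per-token work at 2k context, 10% at 8k, and roughly 31% at 32k, and going from 8k to 32k raises per-token cost by about 1.3x rather than 16x. The quadratic term does start to dominate at very long contexts, which is where the tricks mentioned in the conversation come in.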
And then there's also this idea of retention over long context. Like, even though you have context utilization, the amount you can remember is interesting.
[00:23:02] swyx: Because I've had people criticize both Mamba and RWKV because they're kind of RNN-ish, in the sense that they have a hidden memory, a sort of limited hidden memory, and they will forget things. So, for all these reasons, Gemini 1.5, which we still haven't covered, is very interesting, because Gemini has magically fixed all these problems with perfect haystack recall and reasonable latency and cost.
[00:23:29] Wildcards: Text Diffusion, RALM/Retro
[00:23:29] swyx: So that's super interesting. So, the wildcard I put in here, if you want to go to that. I put two, actually. One is text diffusion. I think I'm still very influenced by my meeting with a Midjourney person who said they were working on text diffusion. I think it would be a very, very different paradigm for text generation, reasoning, and plan generation if we can get diffusion to work
[00:23:51] swyx: for text. And then the second one is Douwe Kiela's Contextual AI, which is working on retrieval-augmented language models, where it kind of puts RAG inside of the language model instead of outside.
[00:24:02] Alessio: Yeah, there's a paper called Retro that covers some of this. I think that's an interesting thing. I think the challenge, well, not the challenge, what they need to figure out is how you keep the RAG piece always up to date, you know. I feel like with the models, you put all this work into pre-training them, but then at least you have a fixed artifact.
[00:24:22] Alessio: With these architectures, constant work needs to be done on them, and they can drift even just based on the RAG data instead of the model itself.
[00:24:30] swyx: Yeah, I was on a panel with one of the investors in Contextual, and the way that guy pitched it, I didn't agree with.
He was like, this will solve hallucination.
[00:24:38] Alessio: That's what everybody says. We solve hallucination.
[00:24:40] swyx: I'm like, no, you reduce it. It cannot,
[00:24:44] Alessio: if you solved it, the model wouldn't exist, right? It would just be plain text. It wouldn't be a generative model. Cool. So, alt architectures; then we got mixture of experts. I think we've covered it a lot of times.
[00:24:56] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)
[00:24:56] Alessio: Maybe any new interesting threads you want to go into here?
[00:25:00] swyx: DeepSeekMoE, which was released in January. Everyone who is interested in MoEs should read that paper, because it's significant for two reasons. No, three reasons. One, it had small experts, like a lot more small experts. For some reason, everyone has settled on eight experts, for GPT-4, for Mixtral; you know, that seems to be the favorite architecture. But these guys pushed it to 64 experts, each of them smaller.
[00:25:26] swyx: But then they also had the second idea, which is that they had one to two always-on experts for common knowledge. And that's a very compelling concept: you would not route to all the experts all the time and make them, you know, switch for everything. You would have some always-on experts.
[00:25:41] swyx: I think that's interesting on both the inference side and the training side, for memory retention. And yeah, the results that they published, which actually excluded Mixtral, which is interesting, showed a significant performance jump versus all the other open source models at the same parameter count.
[00:26:01] swyx: So this may be a better way to do MoEs that is about to get picked up. And so that is interesting for the third reason, which is that this is the first time a new idea from China has infiltrated the West. It's usually the other way around.
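The shared-expert idea can be shown with a toy forward pass. This sketch loosely follows DeepSeekMoE's description: shared experts are always applied, while routed experts are scored with a softmax and only the top-k run, weighted by their scores. The experts here are hypothetical scalar multipliers rather than feed-forward networks, the router returns fixed logits, and the residual connection, load-balancing losses, and batching are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add_scaled(acc: Vector, y: Vector, scale: float) -> Vector:
    return [a + scale * b for a, b in zip(acc, y)]


def moe_with_shared_experts(x: Vector, shared: Sequence[Expert],
                            routed: Sequence[Expert],
                            router: Callable[[Vector], List[float]],
                            top_k: int) -> Tuple[Vector, List[int]]:
    """Always-on shared experts plus top-k routed experts gated by router scores."""
    out = [0.0] * len(x)
    for expert in shared:  # shared experts see every token, ungated
        out = add_scaled(out, expert(x), 1.0)
    scores = softmax(router(x))  # one score per routed expert
    chosen = sorted(range(len(routed)), key=lambda i: scores[i], reverse=True)[:top_k]
    for i in chosen:  # only the top-k routed experts run
        out = add_scaled(out, routed[i](x), scores[i])
    return out, chosen


def scale_expert(factor: float) -> Expert:
    """Toy expert (hypothetical): multiplies the input by a constant."""
    return lambda v: [factor * t for t in v]


shared = [scale_expert(2.0)]
routed = [scale_expert(float(k)) for k in range(1, 5)]  # factors 1, 2, 3, 4
router = lambda v: [0.0, 5.0, 0.0, 3.0]                 # hypothetical fixed router logits
out, chosen = moe_with_shared_experts([1.0], shared, routed, router, top_k=2)
print(chosen, out)  # chosen == [1, 3]
```

The point of the shared path is that common knowledge does not have to be duplicated across routed experts, so each routed expert can specialize; the speaker's caveat that the gain may be a one-off bump still applies.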
It's usually the other way around. I probably overspoke there. There's probably lots more ideas that I'm not aware of.[00:26:18] swyx: Maybe in the embedding space. But the I think DCM we, like, woke people up and said, like, hey, DeepSeek, this, like, weird lab that is attached to a Chinese hedge fund is somehow, you know, doing groundbreaking research on MOEs. So, so, I classified this as a medium potential because I think that it is a sort of like a one off benefit.[00:26:37] swyx: You can Add to any, any base model to like make the MOE version of it, you get a bump and then that's it. So, yeah,[00:26:45] Alessio: I saw Samba Nova, which is like another inference company. They released this MOE model called Samba 1, which is like a 1 trillion parameters. But they're actually MOE auto open source models.[00:26:56] Alessio: So it's like, they just, they just clustered them all together. So I think people. Sometimes I think MOE is like you just train a bunch of small models or like smaller models and put them together. But there's also people just taking, you know, Mistral plus Clip plus, you know, Deepcoder and like put them all together.[00:27:15] Alessio: And then you have a MOE model. I don't know. I haven't tried the model, so I don't know how good it is. But it seems interesting that you can then have people working separately on state of the art, you know, Clip, state of the art text generation. And then you have a MOE architecture that brings them all together.[00:27:31] swyx: I'm thrown off by your addition of the word clip in there. Is that what? Yeah, that's[00:27:35] Alessio: what they said. Yeah, yeah. Okay. That's what they I just saw it yesterday. I was also like[00:27:40] swyx: scratching my head. And they did not use the word adapter. No. 
Because usually what people mean when they say, oh, I add CLIP to a language model, is an adapter.
[00:27:48] swyx: Which is what LLaVA did.
[00:27:50] Alessio: Let me look up the announcement again.
[00:27:51] swyx: Stable Diffusion, that's what they do.
[00:27:54] Alessio: Yeah, it says among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, DePlot, CLIP, LLaVA. So they're just taking all these models and putting them in a MoE.
[00:28:05] swyx: Okay, so a routing layer, and then not jointly trained as much as a normal MoE would be.
[00:28:12] swyx: Which is okay.
[00:28:13] Alessio: That's all they say. There's no paper, you know, so I'm just reading the article, but I'm interested to see how it works.
[00:28:20] Wildcard: Model Merging (mergekit)
[00:28:20] swyx: Yeah, so the wildcard for this section, the MoE section, is model merges, which have also come up as a very interesting phenomenon. The last time I talked to Jeremy Howard, at the Ollama meetup, we called it model grafting or model stacking.
[00:28:35] swyx: But I think the term that people are liking these days is model merging. There are all different variations of merging, merge types; some of them are stacking, some of them are grafting. And so some people are approaching model merging the way Samba is doing it, which is like, okay, here are different models, each of which has its specific pluses and minuses, and we will merge them together in the hope that the sum of the parts will be better than the others.
[00:28:58] swyx: And it seems like it's working. I don't really understand why it works apart from, like, I think it's a form of regularization.
That is, if you merge weights together with a smart strategy, you get less overfitting and more generalization, which is good for benchmarks, if you're honest about your benchmarks.
[00:29:16] swyx: So this is really interesting and good. But again, they're kind of limited in terms of the amount of bump you can get. But I think it's very interesting in the sense of how cheap it is. We talked about this on the ChinaTalk podcast, the guest podcast that we did with ChinaTalk. And you can do this without GPUs, because it's just adding weights together, dividing things, and doing simple math, which is really interesting for the GPU poors.
[00:29:42] Alessio: There's a lot of them.
[00:29:44] Direction 5: Online LLMs (Gemini Pro, Exa)
[00:29:44] Alessio: And just to wrap these up, online LLMs?
[00:29:48] swyx: Yeah, I had to feature this because one of the top news items of January was that Gemini Pro beat GPT-4 on LMSys for the number two slot, behind GPT-4 Turbo. And everyone was very surprised. Like, how does Gemini do that?
[00:30:06] swyx: Surprise, surprise, they added Google search to the results. So it became a quote-unquote online LLM and not an offline LLM. Therefore, it's much better at answering recent questions, which people like. There's an emerging set of table-stakes features after you pretrain something.
[00:30:21] swyx: So after you pretrain something, you should have the chat-tuned version of it, or the instruct-tuned version of it, however you choose to call it. You should have the JSON and function-calling version of it: structured output, the term that you don't like. You should have the online version of it. These are all table-stakes variants that you should do when you offer a base LLM, or you train a base LLM.
[00:30:44] swyx: And I think online is just, like, there; it's important.
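Returning to the model-merging wildcard above: the claim that merging is "just adding weights together, and dividing things" can be made concrete. Below is a minimal sketch of the simplest merge, a weighted average of corresponding parameters, assuming every model shares one architecture and the same parameter names. Plain NumPy arrays stand in for real checkpoints, and `linear_merge` is a hypothetical helper, not mergekit's API; the stacking and grafting variants mentioned in the conversation are more involved than this.

```python
import numpy as np


def linear_merge(state_dicts, weights=None):
    """Merge models by taking a weighted average of each named parameter.

    Hypothetical helper: assumes every state dict has the same parameter
    names and shapes, i.e. the models share one architecture.
    """
    if not state_dicts:
        raise ValueError("need at least one model to merge")
    if weights is None:
        weights = [1.0] * len(state_dicts)  # equal weighting by default
    if len(weights) != len(state_dicts):
        raise ValueError("need exactly one weight per model")
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("models must have identical parameter names")
    total = float(sum(weights))
    merged = {}
    for name in names:
        # Weighted sum of the same tensor taken from every model, then normalize.
        acc = sum(w * np.asarray(sd[name], dtype=np.float64)
                  for sd, w in zip(state_dicts, weights))
        merged[name] = acc / total
    return merged


# Two toy "models" with identical parameter names; no GPU involved.
model_a = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]]), "layer.bias": np.array([0.0, 1.0])}
model_b = {"layer.weight": np.array([[3.0, 2.0], [1.0, 0.0]]), "layer.bias": np.array([2.0, 3.0])}

merged = linear_merge([model_a, model_b])
print(merged["layer.weight"])  # element-wise mean of the two weight matrices
```

Because the whole merge is element-wise arithmetic on arrays, it runs fine on a CPU, which is the point being made about the GPU poor.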
I think companies like Perplexity, and even Exa, formerly Metaphor, are rising to meet that search need. And it's kind of like they're just necessary parts of a system: you have RAG for internal knowledge, and then you have online search for external knowledge, like things that you don't know yet.
[00:31:06] swyx: And it seems like it's one of many tools. I feel like I may be underestimating this, but I'm just gonna put it out there that I think it has some potential. One of the evidence points that it doesn't actually matter that much is that Perplexity has had online LLMs for three months now, and they don't perform great.
[00:31:25] swyx: On LMSys, it's like number 30 or something. So it's like, okay, it helps, but it doesn't give you a giant boost.
[00:31:34] Alessio: I feel like a lot of stuff I do with LLMs doesn't need to be online. So I'm always wondering, again going back to state of the art, right? It's like, state of the art for who, and for what?
[00:31:45] Alessio: I think online LLMs are going to be state of the art for, you know, news-related activity that you need to do, like, you know, social media, right? You want to have all the latest stuff. But coding, science...
[00:32:01] swyx: Yeah, but I think sometimes you don't know what is news, what news is affecting.
[00:32:07] swyx: Like, the decision to use an offline LLM is already a decision that you might not be consciously making, and it might affect your results. Like, what if just being connected online means that you get to invalidate your knowledge?
And when you're just using an offline LLM, it's never invalidated.
[00:32:28] Alessio: I agree, but going back to your point about standing the test of time, I think sometimes you can get swayed by the online stuff. Like, hey, you ask a question about, say, AI research directions, and all the recent news is about this one thing, so the LLM, like, focuses on bringing that up, you know?
[00:32:50] swyx: Yeah, I think it's interesting, but I don't know if I can bet heavily on this.
[00:32:56] Alessio: Cool. Was there one that you forgot to put in, or, like, a new direction?
[00:33:01] swyx: Yeah, so this brings us into sort of February-ish.
[00:33:05] OpenAI Sora and why everyone underestimated videogen
[00:33:05] swyx: So, like, I published this, and then the 15th came with Sora. And so the one thing I did not mention here was anything about multimodality.
[00:33:16] swyx: Right. And I have chronically underweighted this. I always wrestle with it. And my cop-out is that I focused this research directions piece on LLMs, because LLMs are the source of, quote-unquote, AGI. Everything else is kind of, you know, related to that. Just because I can generate better images or generate better videos, it feels like it's not on the critical path to AGI, which is something that Nat Friedman also observed, like, the day before Sora, which is kind of interesting.
[00:33:49] swyx: And so I was just trying to focus on what is going to get us superhuman reasoning that we can rely on to build agents that automate our lives and blah, blah, blah, you know, give us this utopian future. But I do think that everybody underestimated the sheer importance and cultural human impact of Sora.
[00:34:10] swyx: And, you know, really, actually good text-to-video. Yeah.
[00:34:14] Alessio: Yeah. And I saw Jim Fan had a very good tweet about why it's so impressive. And I think when you have somebody leading the embodied research at NVIDIA saying that something is impressive, you should probably listen. So yeah, I think you mentioned impacting the world that we live in.
[00:34:33] Alessio: I think that's kind of the key, right? The LLMs don't have a world model, and Yann LeCun can come on the podcast and talk all about what he thinks of that. But I think Sora was the first time where people were like, oh, okay, you're not statically putting pixels of water on the screen, which you can kind of project without understanding the physics of it.
[00:34:57] Alessio: Now you have to understand how the water splashes when you have things in it. And even if you just learned it by watching video and not by actually studying the physics, you still know it, you know? So I think that's a direction you didn't have before, and now you can do things that you couldn't before, both in terms of generating... I think it always starts with generating, right?
[00:35:19] Alessio: But the interesting part is understanding it. There's the video of the ship in the water that they generated with Sora. If you gave it the video back and it could tell you why the ship is too rocky, or why the ship is sinking, then that's, you know, AGI for all your rig deployments and all this stuff. But there's none of that yet, so.
[00:35:44] Alessio: Hopefully they announce it and talk more about it. Maybe at Dev Day this year, who knows.
[00:35:49] swyx: Yeah, who knows. I'm talking with them about Dev Day as well.
So I would say the phrasing that Jim used, which resonated with me, is that he kind of called it a data-driven world model. I somewhat agree with that.
[00:36:04] Does Sora have a World Model? Yann LeCun vs Jim Fan
[00:36:04] swyx: I am more on Yann LeCun's side than on Jim's side, in the sense that I think that is the vision, or the hope, that these things can build world models. But, you know, clearly, even at the current Sora size, they don't have strong consistency yet. They have very good consistency, but fingers and arms and legs will appear and disappear, and chairs will appear and disappear.
[00:36:31] swyx: That definitely breaks physics. And it also makes me think about how we do deep learning versus world models, in the sense that, you know, in classic machine learning, when you have too many parameters, you will overfit, and that fails: it does not match reality, and therefore fails to generalize well.
[00:36:50] swyx: And, like, what scale of data do we need in order to learn world models from video? A lot. So I'm cautious about taking this interpretation too literally. Obviously I get what he's going for, and he's obviously partially right. Transformers and, you know, these neural networks are universal function approximators and could theoretically figure out world models; it's just, how good are they, and how tolerant are we of hallucinations? We're not very tolerant. So it's going to bias us toward creating very convincing things, but not the useful world models that we want.
[00:37:37] swyx: At the same time, what you just said made me reflect a little bit. We just got done saying how important synthetic data is for training LLMs.
And so, if this is a way of generating synthetic video data for improving our video understanding, then sure, by all means. And we actually know that GPT-4 Vision and DALL-E were, kind of, co-trained together.
[00:38:02] swyx: And so maybe this is on the critical path, and I just don't see the full picture yet.
[00:38:08] Alessio: Yeah, I don't know. I think there's a lot of interesting stuff. Imagine you have Sora and you go back in time, and Newton hasn't figured out gravity yet. Would Sora help you figure it out?
[00:38:21] Alessio: Because you start saying, okay, a man standing under a tree with apples falling, and it's like, oh, they're always falling at the same speed in the video. Why is that? I feel like sometimes these engines can pick things up. Humans have a lot of intuition, but if you ask the average person about the physics of a fluid in a boat, they wouldn't be able to tell you the physics, though they can observe it. But humans can only observe so much, versus now you have these models that observe everything and then generalize, and maybe we can learn new things through the generalizations that they pick up.
[00:38:55] swyx: But again, it might be more observant than us in some respects. In some ways, we can scale it up a lot more than the number of physicists that we had available in Newton's time. So, yeah, it's absolutely possible that this can discover new science. I think we have a lot of work to do to formalize the science.
[00:39:11] swyx: And then I think the last part is, you know, how much do we cheat by generating data from Unreal Engine 5? Which is what a lot of people are speculating OpenAI did, with very, very limited evidence.
The strongest evidence that I saw was someone who works a lot with Unreal Engine 5 looking at the side characters in the videos and noticing that they all adopt Unreal Engine defaults of, like, walking speed and character creation choice.
[00:39:37] swyx: And I was like, okay, that's actually pretty convincing that they used Unreal Engine to bootstrap some synthetic data for this training set.
[00:39:52] Alessio: Yeah, could very well be.
[00:39:54] swyx: Because then you get the labels and the training data side by side.
[00:39:58] swyx: One thing that came up on the last day of February, which I should also mention, is EMO, coming out of Alibaba, which is also a sort of video generation and space-time transformer that probably also involves a lot of synthetic data. And so this is of a kind, in the sense of, oh, really good generative video is here, and it is not just the one- or two-second clips that we saw from other people, like Pika and Runway. Cristobal Valenzuela from Runway was like, "game on," which, okay, but let's see your response, because we've heard a lot about Gen-1 and Gen-2, but it's nothing on the level of Sora. So it remains to be seen how we can actually apply this, but I do think that the creative industry should start preparing.
[00:40:50] swyx: I think the Sora technical blog post from OpenAI was really good. It was like a request for startups. It was so good at spelling out: here are the individual industries that this can impact.
[00:41:00] swyx: And anyone who's interested in generative video should look at that. But also be mindful that probably, when OpenAI releases a Sora API, the ways you can interact with it are very limited.
Just like the ways you can interact with DALL-E are very limited, and someone is gonna have to make an open Sora for you to create ComfyUI pipelines.
[00:41:24] Alessio: The Stability folks said they wanna build an open Sora competitor, but, yeah, Stability... their demo video was, like, so underwhelming. It was just, like, two people sitting on the beach.
[00:41:34] swyx: Standing. Well, they don't have it yet, right? Yeah, yeah.
[00:41:36] swyx: I mean, they just wanna train it. Everybody wants to, right? I think what is confusing a lot of people about Stability is that they're pushing a lot of things, in Stable Code, Stable LM, and Stable Video Diffusion. But, like, how much money do they have left? How many people do they have left?
[00:41:51] swyx: Yeah. Emad spent two hours with me reassuring me things are great. And I do believe that they have really, really quality people. But I also have a lot of very smart people on the other side telling me, like, hey man, you know, don't put too much faith in this thing.
[00:42:11] swyx: So I don't know who to believe. Yeah.
[00:42:14] Alessio: It's hard. Let's see. What else? We've got a lot more stuff. I don't know if we can... Yeah, Groq.
[00:42:19] Groq Math
[00:42:19] swyx: We can do a bit of Groq prep. We're about to go talk to Dylan Patel. Maybe it's the audio in here, I don't know. It depends what we get up to later. What do you, as an investor, think about Groq? Yeah. Well, actually, can you recap why Groq is interesting?
[00:42:33] Alessio: So, Jonathan Ross, who's the founder of Groq, is the person that created the TPU at Google. It was actually one of his, like, 20 percent projects.
It's like, he was just on the side, dooby doo, created the TPU.
[00:42:46] Alessio: But yeah, basically Groq had this demo that went viral, where they were running Mixtral at, like, 500 tokens a second, which is the fastest of anything that you have out there. The memes were like, is NVIDIA dead? People don't need H100s anymore? I think there's a lot of money that goes into building what Groq has built as far as the hardware goes.
[00:43:11] Alessio: We're gonna put some of the notes from Dylan in here, but basically the cost of the Groq system is like 30 times the cost of the H100 equivalent.
[00:43:23] swyx: So let me... I put some numbers in, because me and Dylan were, I think, the two people who actually tried to do Groq math. Spreadsheet dorks.
[00:43:30] swyx: Spreadsheet dorks. So, okay, oh boy. So the equivalent H100 system for Llama 2 is $300,000, for a system of 8 cards. And for Groq it's $2.3 million, because you have to buy 576 Groq cards. So yeah, that just gives people an idea. So if you depreciate both over a five-year lifespan, per year you're depreciating $460K for Groq and $60K for H100.
[00:43:59] swyx: So Groqs are just way more expensive per model that you're hosting. But then you make it up in terms of volume. So I don't know if you want to cover that.
[00:44:08] Alessio: I think one of the promises of Groq is super-high parallel inference on the same thing. So you're basically saying, okay, I'm putting this upfront investment into the hardware, but then I get much better scaling once I have it installed.
[00:44:24] Alessio: I think the big question is how much can you sustain the parallelism?
If you're going to get a 100 percent utilization rate at all times on Groq, it's just much better, because at the end of the day, the cost per token that you're getting is better than with the H100s. But if you get to, like, a 50 percent utilization rate, you will be much better off running on NVIDIA.
[00:44:49] Alessio: And if you look at most companies out there, who really gets a 100 percent utilization rate? Probably OpenAI at peak times, but that's probably it. But yeah, curious to see more. I saw Jonathan was just at the Web Summit in Qatar. He just gave a talk there yesterday that I haven't listened to yet.
[00:45:09] Alessio: I tweeted that he should come on the pod. He liked it. And then Groq followed me on Twitter. I don't know if that means that they're interested, but...
[00:45:16] swyx: Hopefully the Groq social media person is just very friendly.
[00:45:20] Alessio: Yeah, hopefully we can get them. We're gonna get him.
[00:45:22] swyx: We just call him out. And so basically, the key question is, how sustainable is this, and how much of this is a loss leader?
[00:45:27] swyx: The entire Groq management team has been on Twitter and Hacker News saying they are very, very comfortable with the pricing of $0.27 per million tokens. This is the lowest that anyone has offered tokens for Mixtral or Llama 2. This matches DeepInfra, and, you know, I think that's about it in terms of going that low.
[00:45:47] swyx: And we think the break-even for H100s is 50 cents, at a normal utilization rate. To make this work in my spreadsheet, you have to have a parallelism of 500 requests all simultaneously, and you have model bandwidth utilization of 80 percent.
[00:46:06] swyx: Which is way high. I just gave them high marks for everything.
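The depreciation figures quoted above are straight-line arithmetic on the system prices, and the utilization argument follows from spreading a fixed yearly hardware cost over however many tokens are actually served. A minimal Python sketch, assuming straight-line depreciation and ignoring power, hosting, and staff; the `peak_tokens_per_sec` value below is a made-up placeholder, not a measured Groq or H100 figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_depreciation(system_cost_usd, lifespan_years=5):
    """Straight-line depreciation: spread the purchase price evenly over its lifespan."""
    return system_cost_usd / lifespan_years


def hardware_cost_per_million_tokens(annual_cost_usd, peak_tokens_per_sec, utilization):
    """Amortized hardware cost per 1M tokens served.

    `utilization` is the fraction of the year the system runs at peak throughput.
    Power, hosting, and staff costs are deliberately ignored.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_year = peak_tokens_per_sec * utilization * SECONDS_PER_YEAR
    return annual_cost_usd / tokens_per_year * 1_000_000


# System prices as quoted in the conversation.
groq_per_year = annual_depreciation(2_300_000)  # 576 Groq cards
h100_per_year = annual_depreciation(300_000)    # one 8-card H100 system
print(groq_per_year, h100_per_year)  # 460000.0 60000.0

# Hypothetical throughput, for illustration only: halving utilization
# doubles the hardware cost of every token served.
full = hardware_cost_per_million_tokens(groq_per_year, peak_tokens_per_sec=10_000, utilization=1.0)
half = hardware_cost_per_million_tokens(groq_per_year, peak_tokens_per_sec=10_000, utilization=0.5)
print(half / full)
```

The sketch only captures the shape of the argument (a fixed cost divided by tokens actually served); the break-even prices quoted in the conversation depend on throughput and utilization assumptions it does not model.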
Groq has two fundamental tech innovations that they hang their hats on in terms of, like, "why we are better than everyone," even though it remains to be independently replicated. One, they have this entire-model-on-the-chip idea, which is like, okay, get rid of HBM.
[00:46:30] swyx: And put everything in SRAM. Like, okay, fine, but then you need a lot of cards and whatever, and that's all okay. And so, because you don't have to transfer between memory, you just save on that time, and that's why they're faster. So a lot of people buy that as the reason that they're faster.
[00:46:45] swyx: Then they have some kind of crazy compiler, or speculative routing magic using compilers, that they also credit for their higher utilization. So I give them 80 percent for that. And all that works out to, okay, base costs, I think you can get down to maybe 20-something cents per million tokens.
[00:47:04] swyx: And therefore you actually are fine if you have that kind of utilization. But it's like, I have to make a lot of favorable assumptions for this to work.
[00:47:12] Alessio: Yeah. I'm curious to see what Dylan says later.
[00:47:16] swyx: So he was completely the opposite of me. He's like, they're just burning money. Which is great.
[00:47:22] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
[00:47:22] Alessio: Gemini, want to do a quick run-through, since this touches on all the four wars?
[00:47:28] swyx: Yeah, and I think this is the mark of a useful framework: when a new thing comes along, you can break it down in terms of the four wars, sort of slot it in or analyze it in those four frameworks, and have nothing left.
[00:47:41] swyx: So it's a MECE categorization. MECE is Mutually Exclusive and Collectively Exhaustive. And that's a really, really nice way to think about taxonomies and to create mental frameworks.
So, what is Gemini 1.5 Pro? It is the newest model, which came out one week after Gemini 1.0. Which is very interesting.
[00:48:01] swyx: They have not really commented on why. The headline feature is that it has a 1 million token context window that is multimodal, which means that you can put all sorts of video and audio and PDFs natively in there alongside text, and, you know, it's at least 10 times longer than anything that OpenAI offers, which is interesting.
[00:48:20] swyx: So it's great for prototyping, and it has sparked interesting discussions on whether it kills RAG.
[00:48:25] Alessio: Yeah, no, I mean, we always talk about how long context is good, but you're getting charged per token. So, yeah, people love for you to use more tokens in the context, and RAG is better economics. But I think it all comes down to how the price curves change, right?
[00:48:42] Alessio: I think if anything, RAG's complexity goes up and up the more you use it, you know, because you have more data sources, more things you want to put in there. The token costs should go down over time, if the model stays fixed. If people are happy with the model today, in two years, three years, it's just gonna cost a lot less, you know?
[00:49:02] Alessio: So then it's like, why would I use RAG and go through all of that? It's interesting. I think RAG is better cutting-edge economics for LLMs. I think large context will be better long-tail economics when you factor in the build cost of managing a RAG pipeline. But yeah, the recall was the most interesting thing, because we've seen the, you know, needle-in-the-haystack things in the past, but apparently they have 100 percent recall on anything across the context window.
[00:49:28] Alessio: At least, they say. Nobody has used it.
[00:49:30] swyx: No, people have.
Yeah, so for people who aren't following as closely as us, this needle-in-a-haystack thing is that someone, I forget his name now, created this needle-in-a-haystack problem where you feed in a whole bunch of generated junk, not junk, but just, like, generated data, and ask it to specifically retrieve something in that data, like one line in a hundred thousand lines that has a specific fact, and if you get it, you're good.
[00:49:57] swyx: And then he moves the needle around: does your ability to retrieve that vary if I put it at the start, versus in the middle, versus at the end? And then you generate this really nice chart that kind of shows the recallability of a model. And he did that for GPT and Anthropic and showed that Anthropic did really, really poorly.
[00:50:15] swyx: And then Anthropic came back and said it was a skill issue: just add these, like, four magic words, and then it's magically all fixed. And obviously everybody laughed at that. But what Gemini came out with was, yeah, we reproduced their haystack test for Gemini, and it's good across all lengths, all of the one million token window.
[00:50:30] swyx: Which is very interesting, because usually, for typical context extension methods like RoPE or YaRN or, you know, anything like that, or ALiBi, it's lossy by design. Usually for conversations that's fine, because we are lossy when we talk to people, but for superhuman intelligence, perfect memory across very, very long context is very, very interesting for picking things up.
[00:50:51] swyx: And so the people who have been given the beta test for Gemini have been testing this. What you do is you upload, let's say, all of Harry Potter, change one fact in one sentence somewhere in there, and ask it to pick it up, and it does.
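The needle-in-a-haystack procedure described above (plant one fact in a long run of filler, move it from the start to the end, and check whether the model retrieves it) can be sketched as a small harness. This is a minimal sketch under stated assumptions: `fake_model` is a hypothetical stand-in that simply scans the prompt for the needle, so it always passes; a real run would swap in a call to an actual model API. The filler, needle, and question strings are made up for illustration.

```python
def build_haystack(filler_lines, needle, depth):
    """Insert `needle` at a fractional depth: 0.0 = start, 0.5 = middle, 1.0 = end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    idx = round(depth * len(filler_lines))
    return filler_lines[:idx] + [needle] + filler_lines[idx:]


def make_prompt(context_lines, question):
    """Join the haystack into one long context and append the question at the end."""
    return "\n".join(context_lines) + "\n\nQuestion: " + question


def run_sweep(ask_model, filler_lines, needle, question, expected, depths):
    """Return {depth: True/False} depending on whether the answer contains `expected`."""
    results = {}
    for depth in depths:
        prompt = make_prompt(build_haystack(filler_lines, needle, depth), question)
        answer = ask_model(prompt)
        results[depth] = expected.lower() in answer.lower()
    return results


def fake_model(prompt):
    """Hypothetical stand-in for an LLM call: returns the first line mentioning
    the magic number, which is always the needle because the question comes last."""
    for line in prompt.splitlines():
        if "magic number" in line:
            return line
    return "I don't know."


filler = [f"Filler sentence number {i}." for i in range(100_000)]
needle = "The magic number for this test is 7481."
print(run_sweep(fake_model, filler, needle, "What is the magic number?", "7481",
                depths=[0.0, 0.25, 0.5, 0.75, 1.0]))
# → {0.0: True, 0.25: True, 0.5: True, 0.75: True, 1.0: True}
```

Repeating the sweep for several total context lengths and plotting pass/fail against depth and length gives the kind of chart described in the conversation.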
So this is legit.
[00:51:08] swyx: We don't super know how, because, yes, it's slow to inference, but it's not slow enough that it's, like, running five different systems in the background without telling you. Right. So it's something interesting that they haven't fully disclosed yet. The open source community has centered on this Ring Attention paper, which was created by your friend Matei Zaharia and a couple of other people.
[00:51:36] swyx: And it's a form of distributing the compute. I don't super understand why calculating the feedforward network and attention in blockwise fashion, and distributing it, makes it so good at recall. I don't think they have any answer to that. The only thing that Ring Attention is really focused on is basically infinite context.
[00:51:59] swyx: They said it was good for, like, 10 to 100 million tokens, which is just great. So yeah, using the four wars framework, what does this mean for Gemini? One is the sort of RAG and Ops war. Here we care less about RAG now. Or, we still care as much about RAG, but now it's not important in prototyping.
[00:52:21] swyx: And then, for the data war, I guess this is just part of the overall training dataset, but Google made a $60 million deal with Reddit, and presumably they have deals with other companies. For the multimodality war, we can talk about the image generation crisis, or the fact that Gemini also has image generation, which we'll talk about in the next section.
[00:52:42] swyx: But it also has video understanding. I think the top Gemini post came from our friend Simon Willison, who basically did a short video of him scanning over his bookshelf, and it was able to convert that video into a JSON output of what's on that bookshelf. And I think that is very useful.
[00:53:04] swyx: It actually ties into the conversation that we had with David Luan from Adept.
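On the Ring Attention paper mentioned above: the part that is well established is that attention can be computed one key/value block at a time, using a running ("online") softmax, and still give exactly the same result as ordinary attention; Ring Attention builds on this by placing the blocks on different devices arranged in a ring. The NumPy sketch below only simulates the blockwise computation on one machine with a loop (no devices, no communication), and it does not answer the recall question raised in the conversation or describe how Gemini 1.5 works, which has not been disclosed.

```python
import numpy as np


def full_attention(q, k, v):
    """Standard softmax attention over the whole sequence at once."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def blockwise_attention(q, k, v, block_size):
    """Same result as full_attention, but keys/values are visited one block at a
    time with a running softmax. In Ring Attention each block would live on a
    different device; here a plain loop stands in for passing blocks around the ring."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    out = np.zeros((q.shape[0], v.shape[-1]))   # running weighted sum of values
    row_max = np.full(q.shape[0], -np.inf)      # running max score per query
    row_sum = np.zeros(q.shape[0])              # running softmax denominator
    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = q @ k_blk.T * scale
        new_max = np.maximum(row_max, scores.max(axis=-1))
        correction = np.exp(row_max - new_max)  # rescale what was accumulated so far
        p = np.exp(scores - new_max[:, None])
        out = out * correction[:, None] + p @ v_blk
        row_sum = row_sum * correction + p.sum(axis=-1)
        row_max = new_max
    return out / row_sum[:, None]


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))    # 4 queries, head dimension 8
k = rng.standard_normal((32, 8))   # 32 keys
v = rng.standard_normal((32, 8))
print(np.allclose(full_attention(q, k, v), blockwise_attention(q, k, v, block_size=8)))  # True
```

Each step only needs the current key/value block plus three small running arrays per query, so memory per device stays roughly bounded as the total context grows, which is what makes very long contexts feasible to split across devices.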
In the sense of, okay, what if video was the main modality instead of text as the input? What if everything was video in, because that's how we work? Our eyes don't actually read; our brains don't get inputs as characters.
[00:53:25] swyx: Our brains get the pixels shooting into our eyes, and then our vision system takes over first, and then we sort of mentally translate that into text later. And so it's kind of like what Adept is doing, which is driving by a vision model instead of driving by raw text understanding of the DOM. And in that episode, which we haven't released, I made the analogy to self-driving by lidar versus self-driving by camera.
[00:53:52] swyx: Right? Like, I think what Gemini, and any other super-long-context model that is multimodal, unlocks is: what if you just drive everything by video? Which is cool.
[00:54:03] Alessio: Yeah, and that's Joseph from Roboflow. It's like, anything that can be seen can be programmable with these models.
[00:54:12] swyx: You mean the computer vision guy is bullish on computer vision?
[00:54:18] Alessio: It's like the RAG people. The RAG people are bullish on RAG and not long context. I'm very surprised. The fine-tuning people love fine-tuning instead of few-shot. Yeah. I think the Ring Attention thing, and how they did it, we don't know. And then they released the Gemma models, which are like 2 billion and 7 billion open models.
[00:54:41] Alessio: People said they are not good, based on my Twitter experience. They're the GPU-poor crumbs. It's like, hey, we did all this work for us, because we're GPU rich and we're just going to run this whole thing. And
Full Show Notes for Plutarch's Life of Cleomenes
Roman Parallel - Tiberius Gracchus
Important People
Aratus - The same Aratus from the last life, but older and more experienced now. Between Aratus, Cleomenes, and Philopoemen, it becomes clear that the Greeks themselves are the architects of their own undoing. None of these three men cooperates with the others, and this dissension makes them an easy target for Antigonus.
Megistonoüs - Cleomenes's stepfather and right-hand man once he takes the throne.
Antigonus III "Doson" - The king of Macedon who eventually comes down to the Peloponnesus in person to settle the Spartan mischief. His death is reported right after he wins his kingdom back from barbaric Illyrian invaders. He was the most powerful person standing in Cleomenes' way, but Cleomenes is unaware of his death until he has already landed in Egypt.
Ptolemy III - Heir to the dynasty founded by Alexander's successor Ptolemy I, and ruler of wealthy Alexandria when Cleomenes arrives. He dies too soon to fulfill his promises to Cleomenes.
Ptolemy IV - Ptolemy III's son is not fit to rule, being more interested in parties and pleasures. As such, he does little to help Cleomenes and eventually grows suspicious of Cleomenes's lack of interest in partying.
Sphaerus the Stoic (or Sphairus) - This student of Zeno of Citium, the founder of Stoicism, teaches Cleomenes in his youth and helps him restore the Agōge to what it was. Plutarch has some criticisms of Stoicism in this Life that are worth considering.
Important Places
Argos - An important polis in the north-eastern Peloponnesus. Cleomenes takes, but does not hold, the city. While this is more than Pelopidas could do, it nonetheless marks the beginning of the end for him, and his stepfather dies trying to take the city back.
Corinth - The actual gateway to the Peloponnesus, called by Philip V of Macedon "the fetters of Greece." Cleomenes has to allow Antigonus to take this fortified position when he falls back to quell the revolt in Argos.
Sicyon - Aratus's hometown! Just west up the road from Corinth, on the opposite end of a bay facing that polis. Sicyon is not a populous or powerful polis, but its hometown hero's talent for forging unity in the Peloponnesus puts it on the map, until Cleomenes's dreams of Spartan hegemony threaten that unity.
Key Virtues
πειθαρχίας (obedience) - This touches on a Platonic concept of knowing how to lead and be led (also popular with Xenophon). (cf. 18.4)
ἐγκράτεια - self-control - A virtue that overlaps well with Lycurgan laws and Stoic ethics.
ἀφέλεια - simplicity - The ultimate Spartan virtue, particularly when compared to other Greek poleis like Athens or Corinth.
φιλότιμος - love of honor - This virtue could better be translated ambition, but so could the next one.
μεγαλόφρων - great-mindedness / ambition - The natures that seek the great things. This is ambition to a T. Not all of us want to be president, but those who do are this type.
εὐλαβὲς - piety - Another virtue Agis had but Cleomenes lacked. For a Spartan, Cleomenes is strikingly seldom shown consulting the gods or acting as a religious leader in any form throughout this life.
Key Vices - Undermining Spartan Culture
ἀκολασία - intemperance (opposite of σωφροσύνη)
βωμολοχία - buffoonery
πανηγυρισμός - display, ostentation
Important People
Lycurgus - ancient lawgiver, whose biography Plutarch also wrote, and to whom everyone in this life constantly refers as the source of the original laws they are trying to hearken back to.
Leonidas - one of the two kings of Sparta (along with Agis, the protagonist of this life), who first secretly and then openly resists and thwarts Agis's reforms at every turn.
Lysander - Not the Lysander who was a contemporary of Agesilaus, but a new Lysander, elected as ephor and one of Agis's main allies in implementing the new Spartan system.
Important Places
Sparta - This is the story of Sparta's last-gasp attempt to become an important political and military influence in the Peloponnesus.
Virtues
Discretion (or piety?) - εὐλάβεια - Some interesting shades of meaning cover this one. The conventional Greek word for piety is εὐσέβεια (eusebeia), but this less common word can work like our English word pride. That is, it can be considered a vice or a virtue depending on the context. No one wants to be prideful, but we certainly allow and often even encourage people to be proud of the good things they've done for their communities.
Gentleness - πρᾶον - A common theme we've seen in lives as disparate as Pericles, Aristides, and Aemilius Paullus. Also a contrast to those who lack it, like Coriolanus or Pelopidas. Ultimately, the gentle leaders are the greater ones.
Humane / Kindness - φιλάνθρωπον - Another virtue that shows up often among Plutarch's greatest heroes. This particular virtue seems to be part of Agis's downfall. In what way can our virtues be our undoing? Is it like the life of Dion, where tyrants feel challenged by virtuous living? Or was it something else?
Key Vices
greed - πλεονεξία (cf. 10)
parsimony - μικρολογία
luxury - ἀπολαύσει
softness - μαλακία (cf. 10)
extravagance - πολυτέλεια
Captain Ideas
What is a citizen?
A person born and raised in a certain place and manner?
Someone who adopts the language, customs, and laws of the land in which they reside?
When and how should citizens fight for regime change?
When and how should citizens admit defeat and work within an unjust or imperfect system of government?
When in a leadership position, how does one know to instigate a change?
Is every virtue to be insisted upon all the time by the laws?
