POPULARITY
宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:https://www.xrism.jaxa.jp/topics/science/1140/
宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:https://www.tohoku.ac.jp/japanese/2025/05/press20250509-02-blackhole.html
宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:
宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:
イベントアーカイブ動画はこちら宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!筆頭著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」も好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:
イベントアーカイブ動画はこちら宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!筆頭著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」も好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:
宇宙ばなしがベースになっている書籍「やっぱり宇宙はすごい(SB新書)」は好評発売中!おかげさまで即重版!Audibleでも!Audibleの無料体験はこちらから!筆頭著書「マーケティングをAIで超効率化!ChatGPT APIビジネス活用入門(講談社)」も好評発売中!もう1つのチャンネル「となりのデータ分析屋さん」はこちら!Spotify /Apple Podcast個人ホームページはこちら!Twitter(_ryo_astro)ジングル作成:モリグチさんfromワクワクラジオソース:https://www.nasa.gov/news-release/president-trumps-fy26-budget-revitalizes-human-space-exploration/
This episode includes a serious, hour-long discussion with Ryan McBeth on Syria, Iran, Lebanon, Israel and everything in between. ANDWe dive deep into this tweet…Of course, on December 24, 1917, President Woodrow Wilson issued the controversial pardon for his brother-in-law, Hunter DeButts, convicted of arms smuggling during World War I. DeButts, married to Wilson's sister-in-law, Alice, was sentenced to 15 years after British intelligence exposed his fraudulent shipping scheme. Though furious, Wilson faced mounting political pressure amid war preparations. The White House cited new evidence suggesting DeButts was manipulated by foreign spies, and critics accused Wilson of nepotism, while supporters framed the pardon as holiday clemency. After his release, DeButts vanished from public life, reportedly living quietly in Cuba until his death in 1933.Except. Wait a minute. What you just read, isn't true. I fabricated it by directing ChatGPT using Model 4o with the Mac app to make up a fictional reason why Hunter DeButts received a pardon from Woodrow Wilson. Because Hunter DeButts never received a pardon from Woodrow Wilson. Hunter DeButts did not marry Wilson's sister. Nor did he receive a pardon. There are other Hunter DeButts involved with Wilson or that time in history.And yet, Anna Navarro tweeted about it. Upon a simple Google search Navarro wound up getting serially dunked on as people realized very quickly something wasn't accurate.And so Anna Navarro posted the following explanation:She blamed ChatGPT's hallucinations.Oh, well. We've all been there. But have we? While conservatives dunked on Navarro even further for believing ChatGPT, I am here to tell you, as a reporter through and through, I don't know if ChatGPT hallucinated this. And really, I am following the research of my friend, Andrew Mayne, who first sent this to me and said, he could not replicate the Hunter DeButts answer on any ChatGPT model. Not 4o, not any model that is available, and specifically was available to Navarro on December 2nd.Now, here's something that you guys might not know about large language models: they are fairly replicable. You can get similar answers based on similar questions. It's not exact, but a hallucination is something that you should be able to recreate. It would be odd if you couldn't.And my friend Andrew should know. He worked at OpenAI. He was a science communicator. He made a lot of videos that demonstrated OpenAI products up to and including ChatGPT itself and is known as the first prompt engineer for that company. He spent a lot of time with these models.And with that, I went down my own reporting rabbit hole. Because one of the other things is that the screen grab that Anna Navarro showed was a ChatGPT search that had web results.See those little brackets with quotes in between them. Those would be annotations. Theoretically, you could click on them and they would bring you to a webpage that would show you where ChatGPT got this information.What's odd about it is that those are not the annotations that ChatGPT uses now. And they certainly were not used on December 2nd when Anna Navarro said that she did this search.So where'd she get it? What version of ChatGPT is she using? And what large language model is going to be the origin story of dear sweet DeButts?I had a theory. Let's say you're not particularly tech-savvy, if you don't know exactly what ChatGPT is or OpenAI is, then it is very easy, as ChatGPT has become more and more popular, to just go into the iOS app store and find a lot of — I'm going to call them copycats.What they really are are other apps that are using the ChatGPT API, but they do a skin on top of it and they often charge you a subscription service. Do not use them. But I did because my theory was that Ana Navarro was using one of these apps, one of these apps that are not using similar if not exact user interface the official ChatGPT app is. Maybe they are using those old annotations?All is revealed!We get to the bottom of DeButts, on this episode of the Politics Politics Politics. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.politicspoliticspolitics.com/subscribe
In this conversation, I jumped into the world of Artificial Intelligence (AI) and its applications beyond just content creation. Focusing on automation and workflow efficiency, AI enthusiast, Parker Olson discusses how he uses AI to streamline operations in his business, PodPitch - an initiative fueled by AI. Day-to-day business tasks like data collection, data analysis, content creation, and more can be efficiently automated using AI. Olson introduces us to some handy tools for such purposes, like Bardeen and the ChatGPT API for Google Sheets. These tools can gather data from popular websites, analyze large datasets, clean and summarize the data, and even outreach prospective customers - all in real-time. Today's businesses are leveraging AI to automate their LinkedIn tasks, using tools like MeetAlfred. Coupled with ChatGPT and Bardeen, this AI trio can run a comprehensive LinkedIn profile analysis to determine potential customers. By comparing and contrasting your profile information with others, it gives you a list of potential business connections. But AI doesn't stop at LinkedIn automation; it also helps identify other sources for content publishing and relevant websites for clients. By creating a Google filter for specific search terms, AI tools can gather data on the latest publishing channels directly into a spreadsheet and provide an insightful analysis of the content.
"...Happy birthday dear ThursdAIiiiiiiii, happy birthday to youuuuuu
The ChatGPT API has reduced its prices, making it more accessible for developers to use. Nvidia CEO Huang is calling for governments to build sovereign AI infrastructure, while also addressing concerns about the dangers of AI. The Aya Dataset is a valuable resource for researchers looking to develop multilingual NLP models. Finally, the "Animated Stickers" paper introduces a model that generates high-quality animated stickers with interesting and relevant motion. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:51 ChatGPT API Reduced Prices 03:19 Nvidia CEO Huang says countries must build sovereign AI infrastructure 05:03 Adrej Karpathi on Learning 06:21 Fake sponsor 08:03 Animated Stickers: Bringing Stickers to Life with Video Diffusion 09:34 Feedback Loops With Language Models Drive In-Context Reward Hacking 11:29 Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning 13:15 Outro
FTC investigating Big Tech, Apple's AI and LLM advancements with Siri, Google's new video model and more! We're going over the AI news that matters and why it's important.Newsletter: Sign up for our free daily newsletterMore on this Episode: Episode pageJoin the discussion: Ask Jordan questions on AIUpcoming Episodes: Check out the upcoming Everyday AI Livestream lineupWebsite: YourEverydayAI.comEmail The Show: info@youreverydayai.comConnect with Jordan on LinkedInTimestamps:02:45 OpenAI releases GPT mentions 06:24 Utilize GPT to automate coding and development.10:10 FTC investigating large partnerships for competitive impact.11:02 Microsoft invests in OpenAI with FTC concerns.16:29 Apple reportedly testing OpenAI's ChatGPT API.20:59 Edge AI brings improved memory to smartphones.24:45 Development of impressive AI models in tech.26:10 2024: Year of advanced image modeling technology.30:23 Large companies using public domain art for marketing.33:45 Access to tech giants with valuable insights.Topics Covered in This Episode:1. OpenAI's New GPT Mentions Feature2. FTC's Investigation of Big Tech Companies3. Apple's Upcoming Generative AI Capabilities4. Google's New AI Video Model, Lumiere5. Samsung's AI DevelopmentsKeywords:Apple, generative AI, edge AI, Samsung s 24, iOS 18, iPhone, smart assistant, Google, Lumiere, AI video model, 2024 video model year, ChatGPT, FTC investigation, big tech AI companies, Amazon, Microsoft, Anthropic, OpenAI, AI investments, AI partnerships, AI-generated photos, personalized marketing videos, text-to-video capabilities, tech regulation, AI productivity challenges, GPT mentions, AI in healthcare Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
In this episode, we decode the financial implications of ChatGPT's API launch, highlighting the 10X efficiency it brings to the table, and examining how this cost-effective solution has the potential to reshape the economics of AI integration. Invest in AI Box: https://Republic.com/ai-box Get on the AI Box Waitlist: https://AIBox.ai/ AI Facebook Community Learn more about AI in Video Learn more about Open AI
Snap's latest version of its AR development tool includes a ChatGPT API, boosted productivity and more; Meta and Amazon team up on new in-app shopping feature on Facebook & Instagram; Amazon makes online grocery available for non-Prime members, starting with Amazon Fresh; Learn more about your ad choices. Visit megaphone.fm/adchoices
L'intelligence artificielle ChatGPT se fait une place à bord des véhicules DS Automobiles. L'IA sera proposée aux 20 000 premiers inscrits (disposant de DS Iris System) pour une phase pilote durant six mois, et ce, sans le moindre surcoût. Chez DS Automobiles, on a décidé d'intégrer la célèbre intelligence artificielle à bord des gammes DS 3, DS 4, DS 7 et DS 9, au travers de DS Iris System. À bord des véhicules, ChatGPT se transforme en un authentique « assistant numérique dédié à l'expérience de voyage » selon la marque. Pour faire appel à ChatGPT au volant de sa DS, il suffit de dicter la commande « OK Iris » ou de presser le bouton dédié sur le volant. L'interaction vocale avec ChatGPT démarre alors, et le conducteur peut dialoguer avec l'IA sans avoir à quitter la route des yeux et sans lâcher le volant. Le conducteur peut notamment demander à ChatGPT de générer un conte pour occuper les enfants, de lister les plus beaux lieux à visiter dans la ville avoisinante, d'expliquer l'histoire du monument à peine croisé, et finalement effectuer à peu près n'importe quel type de demande. Le constructeur va lancer une phase pilote de cette « SoundHound AI powered by ChatGPT API » au sein de Stellantis, de manière à évaluer l'expérience client auprès des 20 000 premiers utilisateurs. L'intégration de ChatGPT au système embarqué DS Iris est proposée sans surcoût et pour une durée de six mois. À noter que la souscription doit être effectuée entre le 19 octobre 2023 et le 29 février 2024. Learn more about your ad choices. Visit megaphone.fm/adchoices
L'intelligence artificielle ChatGPT se fait une place à bord des véhicules DS Automobiles. L'IA sera proposée aux 20 000 premiers inscrits (disposant de DS Iris System) pour une phase pilote durant six mois, et ce, sans le moindre surcoût.Chez DS Automobiles, on a décidé d'intégrer la célèbre intelligence artificielle à bord des gammes DS 3, DS 4, DS 7 et DS 9, au travers de DS Iris System. À bord des véhicules, ChatGPT se transforme en un authentique « assistant numérique dédié à l'expérience de voyage » selon la marque. Pour faire appel à ChatGPT au volant de sa DS, il suffit de dicter la commande « OK Iris » ou de presser le bouton dédié sur le volant. L'interaction vocale avec ChatGPT démarre alors, et le conducteur peut dialoguer avec l'IA sans avoir à quitter la route des yeux et sans lâcher le volant. Le conducteur peut notamment demander à ChatGPT de générer un conte pour occuper les enfants, de lister les plus beaux lieux à visiter dans la ville avoisinante, d'expliquer l'histoire du monument à peine croisé, et finalement effectuer à peu près n'importe quel type de demande.Le constructeur va lancer une phase pilote de cette « SoundHound AI powered by ChatGPT API » au sein de Stellantis, de manière à évaluer l'expérience client auprès des 20 000 premiers utilisateurs. L'intégration de ChatGPT au système embarqué DS Iris est proposée sans surcoût et pour une durée de six mois. À noter que la souscription doit être effectuée entre le 19 octobre 2023 et le 29 février 2024. Hébergé par Acast. Visitez acast.com/privacy pour plus d'informations.
Large language models (LLMs) can be used to serve as agents to simulate human behaviors, given the powerful ability to understand human instructions and provide high-quality generated texts. Such ability stimulates us to wonder whether LLMs can simulate a person in a higher form than simple human behaviors. Therefore, we aim to train an agent with the profile, experience, and emotional states of a specific person instead of using limited prompts to instruct ChatGPT API. In this work, we introduce Character-LLM that teach LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc. Our method focuses on editing profiles as experiences of a certain character and training models to be personal simulacra with these experiences. To assess the effectiveness of our approach, we build a test playground that interviews trained agents and evaluates whether the agents textit{memorize} their characters and experiences. Experimental results show interesting observations that help build future simulacra of humankind. 2023: Yunfan Shao, Linyang Li, Junqi Dai, Xipeng Qiu https://arxiv.org/pdf/2310.10158v1.pdf
AI Applied: Covering AI News, Interviews and Tools - ChatGPT, Midjourney, Runway, Poe, Anthropic
Discover how ChatGPT's newly launched API is revolutionizing accessibility by slashing costs by 10 times. In this episode, we dive deep into the details of this groundbreaking development and explore the potential implications across various industries. Tune in to learn how ChatGPT is making high-quality language processing more affordable than ever. Get on the AI Box Waitlist: https://AIBox.ai/Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/Follow me on Twitter: https://twitter.com/jaeden_ai
AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs
In this episode, we dive into the game-changing announcement of ChatGPT's API launch, accompanied by a remarkable 10X reduction in cost. Explore the exciting implications of this move, from increased accessibility to innovative applications across industries. Join us for an insightful discussion on how ChatGPT's API is set to reshape the AI landscape and drive transformative change. Get on the AI Box Waitlist: https://AIBox.ai/Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/Follow me on Twitter: https://twitter.com/jaeden_ai
As alluded to on the pod, LangChain has just launched LangChain Hub: “the go-to place for developers to discover new use cases and polished prompts.” It's available to everyone with a LangSmith account, no invite code necessary. Check it out!In 2023, LangChain has speedrun the race from 2:00 to 4:00 to 7:00 Silicon Valley Time. From the back to back $10m Benchmark seed and (rumored) $20-25m Sequoia Series A in April, to back to back critiques of “LangChain is Pointless” and “The Problem with LangChain” in July, to teaching with Andrew Ng and keynoting at basically every AI conference this fall (including ours), it has been an extreme rollercoaster for Harrison and his growing team creating one of the most popular (>60k stars at time of writing) building blocks for AI Engineers.LangChain's OriginsThe first commit to LangChain shows its humble origins as a light wrapper around Python's formatter.format for prompt templating. But as Harrison tells the story, even his first experience with text-davinci-002 in early 2022 was focused on chatting with data from their internal company Notion and Slack, what is now known as Retrieval Augmented Generation (RAG). As the Generative AI meetup scene came to life post Stable Diffusion, Harrison saw a need for common abstractions for what people were building with text LLMs at the time:* LLM Math, aka Riley Goodside's “You Can't Do Math” REPL-in-the-loop (PR #8)* Self-Ask With Search, Ofir Press' agent pattern (PR #9) (later ReAct, PR #24)* NatBot, Nat Friedman's browser controlling agent (PR #18)* Adapters for OpenAI, Cohere, and HuggingFaceHubAll this was built and launched in a few days from Oct 16-25, 2022. Turning research ideas/exciting usecases into software quickly and often has been in the LangChain DNA from Day 1 and likely a big driver of LangChain's success, to date amassing the largest community of AI Engineers and being the default launch framework for every big name from Nvidia to OpenAI:Dancing with GiantsBut AI Engineering is built atop of constantly moving tectonic shifts: * ChatGPT launched in November (“The Day the AGI Was Born”) and the API released in March. Before the ChatGPT API, OpenAI did not have a chat endpoint. In order to build a chatbot with history, you had to make sure to chain all messages and prompt for completion. LangChain made it easy to do that out of the box, which was a huge driver of usage. * Today, OpenAI has gone all-in on the chat API and is deprecating the old completions models, essentially baking in the chat pattern as the default way most engineers should interact with LLMs… and reducing (but not eliminating) the value of ConversationChains.* And there have been more updates since: Plugins released in API form as Functions in June (one of our top pods ever… reducing but not eliminating the value of OutputParsers) and Finetuning in August (arguably reducing some need for Retrieval and Prompt tooling). With each update, OpenAI and other frontier model labs realign the roadmaps of this nascent industry, and Harrison credits the modular design of LangChain in staying relevant. LangChain has not been merely responsive either: LangChain added Agents in November, well before they became the hottest topic of the AI Summer, and now Agents feature as one of LangChain's top two usecases. LangChain's problem for podcasters and newcomers alike is its sheer scope - it is the world's most complete AI framework, but it also has a sprawling surface area that is difficult to fully grasp or document in one sitting. This means it's time for the trademark Latent Space move (ChatGPT, GPT4, Auto-GPT, and Code Interpreter Advanced Data Analysis GPT4.5): the executive summary!What is LangChain?As Harrison explains, LangChain is an open source framework for building context-aware reasoning applications, available in Python and JS/TS.It launched in Oct 2022 with the central value proposition of “composability”, aka the idea that every AI engineer will want to switch LLMs, and combine LLMs with other things into “chains”, using a flexible interface that can be saved via a schema.Today, LangChain's principal offerings can be grouped as:* Components: isolated modules/abstractions* Model I/O* Models (for LLM/Chat/Embeddings, from OpenAI, Anthropic, Cohere, etc)* Prompts (Templates, ExampleSelectors, OutputParsers)* Retrieval (revised and reintroduced in March)* Document Loaders (eg from CSV, JSON, Markdown, PDF)* Text Splitters (15+ various strategies for chunking text to fit token limits)* Retrievers (generic interface for turning an unstructed query into a set of documents - for self-querying, contextual compression, ensembling)* Vector Stores (retrievers that search by similarity of embeddings)* Indexers (sync documents from any source into a vector store without duplication)* Memory (for long running chats, whether a simple Buffer, Knowledge Graph, Summary, or Vector Store)* Use-Cases: compositions of Components* Chains: combining a PromptTemplate, LLM Model and optional OutputParser* with Router, Sequential, and Transform Chains for advanced usecases* savable, sharable schemas that can be loaded from LangChainHub* Agents: a chain that has access to a suite of tools, of nondeterministic length because the LLM is used as a reasoning engine to determine which actions to take and in which order. Notable 100LOC explainer here.* Tools (interfaces that an agent can use to interact with the world - preset list here. Includes things like ChatGPT plugins, Google Search, WolframAlpha. Groups of tools are bundled up as toolkits)* AgentExecutor (the agent runtime, basically the while loop, with support for controls, timeouts, memory sharing, etc)* LangChain has also added a Callbacks system for instrumenting each stage of LLM, Chain, and Agent calls (which enables LangSmith, LangChain's first cloud product), and most recently an Expression Language, a declarative way to compose chains.LangChain the company incorporated in January 2023, announced their seed round in April, and launched LangSmith in July. At time of writing, the company has 93k followers, their Discord has 31k members and their weekly webinars are attended by thousands of people live.The full-featuredness of LangChain means it is often the first starting point for building any mainstream LLM use case, because they are most likely to have working guides for the new developer. Logan (our first guest!) from OpenAI has been a notable fan of both LangChain and LangSmith (they will be running the first LangChain + OpenAI workshop at AI Eng Summit). However, LangChain is not without its critics, with Aravind Srinivas, Jim Fan, Max Woolf, Mckay Wrigley and the general Reddit/HN community describing frustrations with the value of their abstractions, and many are attempting to write their own (the common experience of adding and then removing LangChain is something we covered in our Agents writeup). Harrison compares this with the timeless ORM debate on the value of abstractions.LangSmithLast month, Harrison launched LangSmith, their LLM observability tool and first cloud product. LangSmith makes it easy to monitor all the different primitives that LangChain offers (agents, chains, LLMs) as well as making it easy to share and evaluate them both through heuristics (i.e. manually written ones) and “LLM evaluating LLM” flows. The top HN comment in the “LangChain is Pointless” thread observed that orchestration is the smallest part of the work, and the bulk of it is prompt tuning and data serialization. When asked this directly our pod, Harrison agreed:“I agree that those are big pain points that get exacerbated when you have these complex chains and agents where you can't really see what's going on inside of them. And I think that's partially why we built Langsmith…” (48min mark)You can watch the full launch on the LangChain YouTube:It's clear that the target audience for LangChain is expanding to folks who are building complex, production applications rather than focusing on the simpler “Q&A your docs” use cases that made it popular in the first place. As the AI Engineer space matures, there will be more and more tools graduating from supporting “hobby” projects to more enterprise-y use cases. In this episode we run through some of the history of LangChain, how it's growing from an open source project to one of the highest valued AI startups out there, and its future. We hope you enjoy it!Show Notes* LangChain* LangChain's Berkshire Hathaway Homepage* Abstractions tweet* LangSmith* LangSmith Cookbooks repo* LangChain Retrieval blog* Evaluating CSV Question/Answering blog and YouTube* MultiOn Partner blog* Harvard Sports Analytics Collective* Evaluating RAG Webinar* awesome-langchain:* LLM Math Chain* Self-Ask* LangChain Hub UI* “LangChain is Pointless”* Harrison's links* sports - estimating player compatibility in the NBA* early interest in prompt injections* GitHub* TwitterTimestamps* [00:00:00] Introduction* [00:00:48] Harrison's background and how sports led him into ML* [00:04:54] The inspiration for creating LangChain - abstracting common patterns seen in other GPT-3 projects* [00:05:51] Overview of LangChain - a framework for building context-aware reasoning applications* [00:10:09] Components of LangChain - modules, chains, agents, etc.* [00:14:39] Underappreciated parts of LangChain - text splitters, retrieval algorithms like self-query* [00:18:46] Hiring at LangChain* [00:20:27] Designing the LangChain architecture - balancing flexibility and structure* [00:24:09] The difference between chains and agents in LangChain* [00:25:08] Prompt engineering and LangChain* [00:26:16] Announcing LangSmith* [00:30:50] Writing custom evaluators in LangSmith* [00:33:19] Reducing hallucinations - fixing retrieval vs generation issues* [00:38:17] The challenges of long context windows* [00:40:01] LangChain's multi-programming language strategy* [00:45:55] Most popular LangChain blog posts - deep dives into specific topics* [00:50:25] Responding to LangChain criticisms* [00:54:11] Harrison's advice to AI engineers* [00:55:43] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai. [00:00:19]Swyx: Welcome. Today we have Harrison Chase in the studio with us. Welcome Harrison. [00:00:23]Harrison: Thank you guys for having me. I'm excited to be here. [00:00:25]Swyx: It's been a long time coming. We've been asking you for a little bit and we're really glad that you got some time to join us in the studio. Yeah. [00:00:32]Harrison: I've been dodging you guys for a while. [00:00:34]Swyx: About seven months. You pulled me in here. [00:00:37]Alessio: About seven months. But it's all good. I totally understand. [00:00:38]Swyx: We like to introduce people through the official backgrounds and then ask you a little bit about your personal side. So you went to Harvard, class of 2017. You don't list what you did in Harvard. Was it CS? [00:00:48]Harrison: Stats and CS. [00:00:50]Swyx: That's awesome. I love me some good stats. [00:00:52]Harrison: I got into it through stats, through doing sports analytics. And then there was so much overlap between stats and CS that I found myself doing more and more of that. [00:00:59]Swyx: And it's interesting that a lot of the math that you learn in stats actually comes over into machine learning which you applied at Kensho as a machine learning engineer and Robust Intelligence, which seems to be the home of a lot of AI founders.Harrison: It does. Yeah. Swyx: And you started LangChain, I think around November 2022 and incorporated in January. Yeah. [00:01:19]Harrison: I was looking it up for the podcast and the first tweet was on, I think October 24th. So just before the end of November or end of October. [00:01:26]Swyx: Yeah. So that's your LinkedIn. What should people know about you on the personal side that's not obvious on LinkedIn? [00:01:33]Harrison: A lot of how I got into this is all through sports actually. Like I'm a big sports fan, played a lot of soccer growing up and then really big fan of the NBA and NFL. And so freshman year at college showed up and I knew I liked math. I knew I liked sports. One of the clubs that was there was the Sports Analytics Collective. And so I joined that freshman year, I was doing a lot of stuff in like Excel, just like basic stats, but then like wanted to do more advanced stuff. So learn to code, learn kind of like data science and machine learning through that way. Kind of like just kept on going down that path. I think sports is a great entryway to data science and machine learning. There's a lot of like numbers out there. People like really care. Like I remember, I think sophomore, junior year, I was in the Sports Collective and the main thing we had was a blog. And so we wrote a blog. It wasn't me. One of the other people in the club wrote a blog predicting the NFL season. I think they made some kind of like with stats and I think their stats showed that like the Dolphins would end up beating the Patriots and New England got like pissed about it, of course. So people like really care and they'll give you feedback about whether you're like models doing well or poorly. And so you get that. And then you also get like instantaneous kind of like, well, not instantaneous, but really quick feedback. Like if you predict a game, the game happens that night. Like you don't have to wait a year to see what happens. So I think sports is a great kind of like entryway for kind of like data science. [00:02:43]Alessio: There was actually my first article on the Twilio blog with a Python script to like predict pricing of like Daily Fantasy players based on my past week performance. Yeah, I don't know. It's a good getaway drug. [00:02:56]Swyx: And on my end, the way I got into finance was through sports betting. So maybe we all have some ties in there. Was like Moneyball a big inspiration? The movie? [00:03:06]Harrison: Honestly, not really. I don't really like baseball. That's like the big thing. [00:03:10]Swyx: Let's call it a lot of stats. Cool. Well, we can dive right into LangChain, which is what everyone is excited about. But feel free to make all the sports analogies you want. That really drives home a lot of points. What was your GPT aha moment? When did you start working on GPT itself? Maybe not LangChain, just anything to do with the GPT API? [00:03:29]Harrison: I think it probably started around the time we had a company hackathon. I think that was before I launched LangChain. I'm trying to remember the exact sequence of events, but I do remember that at the hackathon I worked with Will, who's now actually at LangChain as well, and then two other members of Robust. And we made basically a bot where you could ask questions of Notion and Slack. And so I think, yeah, RAG, basically. And I think I wanted to try that out because I'd heard that it was getting good. I'm trying to remember if I did anything before that to realize that it was good. So then I would focus on that on the hackathon. I can't remember or not, but that was one of the first times that I built something [00:04:06]Swyx: with GPT-3. There wasn't that much opportunity before because the API access wasn't that widespread. You had to get into some kind of program to get that. [00:04:16]Harrison: DaVinci-002 was not terrible, but they did an upgrade to get it to there, and they didn't really publicize that as much. And so I think I remember playing around with it when the first DaVinci model came out. I was like, this is cool, but it's not amazing. You'd have to do a lot of work to get it to do something. But then I think that February or something, I think of 2022, they upgraded it and it was it got better, but I think they made less of an announcement around it. And so I just, yeah, it kind of slipped under the radar for me, at least. [00:04:45]Alessio: And what was the step into LangChain? So you did the hackathon, and then as you were building the kind of RAG product, you felt like the developer experience wasn't that great? Or what was the inspiration? [00:04:54]Harrison: No, honestly, so around that time, I knew I was going to leave my previous job. I was trying to figure out what I was going to do next. I went to a bunch of meetups and other events. This was like the September, August, September of that year. So after Stable Diffusion, but before ChatGPT. So there was interest in generative AI as a space, but not a lot of people hacking on language models yet. But there were definitely some. And so I would go to these meetups and just chat with people and basically saw some common abstractions in terms of what they were building, and then thought it would be a cool side project to factor out some of those common abstractions. And that became kind of like LangChain. I looked up again before this, because I remember I did a tweet thread on Twitter to announce LangChain. And we can talk about what LangChain is. It's a series of components. And then there's some end-to-end modules. And there was three end-to-end modules that were in the initial release. One was NatBot. So this was the web agent by Nat Friedman. Another was LLM Math Chain. So it would construct- [00:05:51]Swyx: GPT-3 cannot do math. [00:05:53]Harrison: Yeah, exactly. And then the third was Self-Ask. So some type of RAG search, similar to React style agent. So those were some of the patterns in terms of what I was seeing. And those all came from open source or academic examples, because the people who were actually working on this were building startups. And they were doing things like question answering over your databases, question answering over SQL, things like that. But I couldn't use their code as kind of like inspiration to factor things out. [00:06:18]Swyx: I talked to you a little bit, actually, roundabout, right after you announced LangChain. I'm honored. I think I'm one of many. This is your first open source project. [00:06:26]Harrison: No, that's not actually true. I released, because I like sports stats. And so I remember I did release some really small, random Python package for scraping data from basketball reference or something. I'm pretty sure I released that. So first project to get a star on GitHub, let's say that. [00:06:45]Swyx: Did you reference anything? What was the inspirations, like other frameworks that you look to when open sourcing LangChain or announcing it or anything like that? [00:06:53]Harrison: I mean, the only main thing that I looked for... I remember reading a Hacker News post a little bit before about how a readme on the project goes a long way. [00:07:02]Swyx: Readme's help. [00:07:03]Harrison: Yeah. And so I looked at it and was like, put some status checks at the top and have the title and then one or two lines and then just right into installation. And so that's the main thing that I looked at in terms of how to structure it. Because yeah, I hadn't done open source before. I didn't really know how to communicate that aspect of the marketing or getting people to use it. I think I had some trouble finding it, but I finally found it and used that as a lot [00:07:25]Swyx: of the inspiration there. Yeah. It was one of the subjects of my write-up how it was surprising to me that significant open source experience actually didn't seem to matter in the new wave of AI tooling. Most like auto-GPTs, Torrents, that was his first open source project ever. And that became auto-GPT. Yeah. I don't know. To me, it's just interesting how open source experience is kind of fungible or not necessary. Or you can kind of learn it on the job. [00:07:49]Alessio: Overvalued. [00:07:50]Swyx: Overvalued. Okay. You said it, not me. [00:07:53]Alessio: What's your description of LangChain today? I think when I built the LangChain Hub UI in January, there were a few things. And I think you were one of the first people to talk about agents that were already in there before it got hot now. And it's obviously evolved into a much bigger framework today. Run people through what LangChain is today, how they should think about it, and all of that. [00:08:14]Harrison: The way that we describe it or think about it internally is that LangChain is basically... I started off saying LangChain's a framework for building LLM applications, but that's really vague and not really specific. And I think part of the issue is LangChain does do a lot, so it's hard to be somewhat specific. But I think the way that we think about it internally, in terms of prioritization, what to focus on, is basically LangChain's a framework for building context-aware reasoning applications. And so that's a bit of a mouthful, but I think that speaks to a lot of the core parts of what's in LangChain. And so what concretely that means in LangChain, there's really two things. One is a set of components and modules. And these would be the prompt template abstraction, the LLM abstraction, chat model abstraction, vector store abstraction, text splitters, document loaders. And so these are combinations of things that we build and we implement, or we just have integrations with. So we don't have any language models ourselves. We don't have any vector stores ourselves, but we integrate with a lot of them. And then the text splitters, we have our own logic for that. The document loaders, we have our own logic for that. And so those are the individual modules. But then I think another big part of LangChain, and probably the part that got people using it the most, is the end-to-end chains or applications. So we have a lot of chains for getting started with question answering over your documents, chat question answering, question answering over SQL databases, agent stuff that you can plug in off the box. And that basically combines these components in a series of specific ways to do this. So if you think about a question answering app, you need a lot of different components kind of stacked. And there's a bunch of different ways to do question answering apps. So this is a bit of an overgeneralization, but basically, you know, you have some component that looks up an embedding from a vector store, and then you put that into the prompt template with the question and the context, and maybe you have the chat history as well. And then that generates an answer, and then maybe you parse that out, or you do something with the answer there. And so there's just this sequence of things that you basically stack in a particular way. And so we just provide a bunch of those assembled chains off the shelf to make it really easy to get started in a few lines of code. [00:10:09]Alessio: And just to give people context, when you first released LangChain, OpenAI did not have a chat API. It was a completion-only API. So you had to do all the human assistant, like prompting and whatnot. So you abstracted a lot of that away. I think the most interesting thing to me is you're kind of the Switzerland of this developer land. There's a bunch of vector databases that are killing each other out there to get people to embed data in them, and you're like, I love you all. You all are great. How do you think about being an opinionated framework versus leaving a lot of choice to the user? I mean, in terms of spending time into this integration, it's like you only have 10 people on the team. Obviously that takes time. Yeah. What's that process like for you all? [00:10:50]Harrison: I think right off the bat, having different options for language models. I mean, language models is the main one that right off the bat we knew we wanted to support a bunch of different options for. There's a lot to discuss there. People want optionality between different language models. They want to try it out. They want to maybe change to ones that are cheaper as new ones kind of emerge. They don't want to get stuck into one particular one if a better one comes out. There's some challenges there as well. Prompts don't really transfer. And so there's a lot of nuance there. But from the bat, having this optionality between the language model providers was a big important part because I think that was just something we felt really strongly about. We believe there's not just going to be one model that rules them all. There's going to be a bunch of different models that are good for a bunch of different use cases. I did not anticipate the number of vector stores that would emerge. I don't know how many we supported in the initial release. It probably wasn't as big of a focus as language models was. But I think it kind of quickly became so, especially when Postgres and Elastic and Redis started building their vector store implementations. We saw that some people might not want to use a dedicated vector store. Maybe they want to use traditional databases. I think to your point around what we're opinionated about, I think the thing that we believe most strongly is it's super early in the space and super fast moving. And so there's a lot of uncertainty about how things will shake out in terms of what role will vector databases play? How many will there be? And so I think a lot of it has always kind of been this optionality and ability to switch and not getting locked in. [00:12:19]Swyx: There's other pieces of LangChain which maybe don't get as much attention sometimes. And the way that you explained LangChain is somewhat different from the docs. I don't know how to square this. So for example, you have at the top level in your docs, you have, we mentioned ModelIO, we mentioned Retrieval, we mentioned Chains. Then you have a concept called Agents, which I don't know if exactly matches what other people call Agents. And we also talked about Memory. And then finally there's Callbacks. Are there any of the less understood concepts in LangChain that you want to give some air to? [00:12:53]Harrison: I mean, I think buried in ModelIO is some stuff around like few-shot example selectors that I think is really powerful. That's a workhorse. [00:13:01]Swyx: Yeah. I think that's where I start with LangChain. [00:13:04]Harrison: It's one of those things that you probably don't, if you're building an application, you probably don't start with it. You probably start with like a zero-shot prompt. But I think that's a really powerful one that's probably just talked about less because you don't need it right off the bat. And for those of you who don't know, that basically selects from a bunch of examples the ones that are maybe most relevant to the input at hand. So you can do some nice kind of like in-context learning there. I think that's, we've had that for a while. I don't think enough people use that, basically. Output parsers also used to be kind of important, but then function calling. There's this interesting thing where like the space is just like progressing so rapidly that a lot of things that were really important have kind of diminished a bit, to be honest. Output parsers definitely used to be an understated and underappreciated part. And I think if you're working with non-OpenAI models, they still are, but a lot of people are working with OpenAI models. But even within there, there's different things you can do with kind of like the function calling ability. Sometimes you want to have the option of having the text or the application you're building, it could return either. Sometimes you know that it wants to return in a structured format, and so you just want to take that structured format. Other times you're extracting things that are maybe a key in that structured format, and so you want to like pluck that key. And so there's just like some like annoying kind of like parsing of that to do. Agents, memory, and retrieval, we haven't talked at all. Retrieval, there's like five different subcomponents. You could also probably talk about all of those in depth. You've got the document loaders, the text splitters, the embedding models, the vector stores. Embedding models and vector stores, we don't really have, or sorry, we don't build, we integrate with those. Text splitters, I think we have like 15 or so. Like I think there's an under kind of like appreciated amount of those. [00:14:39]Swyx: And then... Well, it's actually, honestly, it's overwhelming. Nobody knows what to choose. [00:14:43]Harrison: Yeah, there is a lot. [00:14:44]Swyx: Yeah. Do you have personal favorites that you want to shout out? [00:14:47]Harrison: The one that we have in the docs is the default is like the recursive text splitter. We added a playground for text splitters the other week because, yeah, we heard a lot that like, you know, and like these affect things like the chunk overlap and the chunks, they affect things in really subtle ways. And so like I think we added a playground where people could just like choose different options. We have like, and a lot of the ideas are really similar. You split on different characters, depending on kind of like the type of text that you have marked down, you might want to split on differently than HTML. And so we added a playground where you can kind of like choose between those. I don't know if those are like underappreciated though, because I think a lot of people talk about text splitting as being a hard part, and it is a really important part of creating these retrieval applications. But I think we have a lot of really cool retrieval algorithms as well. So like self query is maybe one of my favorite things in LangChain, which is basically this idea of when you have a user question, the typical kind of like thing to do is you embed that question and then find the document that's most similar to that question. But oftentimes questions have things that just, you don't really want to look up semantically, they have some other meaning. So like in the example that I use, the example in the docs is like movies about aliens in the year 1980. 1980, I guess there's some semantic meaning for that, but it's a very particular thing that you care about. And so what the self query retriever does is it splits out the metadata filter and most vector stores support like a metadata filter. So it splits out this metadata filter, and then it splits out the semantic bit. And that's actually like kind of tricky to do because there's a lot of different filters that you can have like greater than, less than, equal to, you can have and things if you have multiple filters. So we have like a pretty complicated like prompt that does all that. That might be one of my favorite things in LangChain, period. Like I think that's, yeah, I think that's really cool. [00:16:26]Alessio: How do you think about speed of development versus support of existing things? So we mentioned retrieval, like you got, or, you know, text splitting, you got like different options for all of them. As you get building LangChain, how do you decide which ones are not going to keep supporting, you know, which ones are going to leave behind? I think right now, as you said, the space moves so quickly that like you don't even know who's using what. What's that like for you? [00:16:50]Harrison: Yeah. I mean, we have, you know, we don't really have telemetry on what people are using in terms of what parts of LangChain, the telemetry we have is like, you know, anecdotal stuff when people ask or have issues with things. A lot of it also is like, I think we definitely prioritize kind of like keeping up with the stuff that comes out. I think we added function calling, like the day it came out or the day after it came out, we added chat model support, like the day after it came out or something like that. That's probably, I think I'm really proud of how the team has kind of like kept up with that because this space is like exhausting sometimes. And so that's probably, that's a big focus of ours. The support, I think we've like, to be honest, we've had to get kind of creative with how we do that. Cause we have like, I think, I don't know how many open issues we have, but we have like 3000, somewhere between 2000 and 3000, like open GitHub issues. We've experimented with a lot of startups that are doing kind of like question answering over your docs and stuff like that. And so we've got them on the website and in the discord and there's a really good one, dosu on the GitHub that's like answering issues and stuff like that. And that's actually something we want to start leaning into more heavily as a company as well as kind of like building out an AI dev rel because we're 10 people now, 10, 11 people now. And like two months ago we were like six or something like that. Right. So like, and to have like 2,500 open issues or something like that, and like 300 or 400 PRs as well. Cause like one of the amazing things is that like, and you kind of alluded to this earlier, everyone's building in the space. There's so many different like touch points. LangChain is lucky enough to kind of like be a lot of the glue that connects it. And so we get to work with a lot of awesome companies, but that's also a lot of like work to keep up with as well. And so I don't really have an amazing answer, but I think like the, I think prioritize kind of like new things that, that come out. And then we've gotten creative with some of kind of like the support functions and, and luckily there's, you know, there's a lot of awesome people working on all those support coding, question answering things that we've been able to work with. [00:18:46]Swyx: I think there is your daily rhythm, which I've seen you, you work like a, like a beast man, like mad impressive. And then there's sometimes where you step back and do a little bit of high level, like 50,000 foot stuff. So we mentioned, we mentioned retrieval. You did a refactor in March and there's, there's other abstractions that you've sort of changed your mind on. When do you do that? When do you do like the, the step back from the day to day and go, where are we going and change the direction of the ship? [00:19:11]Harrison: It's a good question so far. It's probably been, you know, we see three or four or five things pop up that are enough to make us think about it. And then kind of like when it reaches that level, you know, we don't have like a monthly meeting where we sit down and do like a monthly plan or something. [00:19:27]Swyx: Maybe we should. I've thought about this. Yeah. I'd love to host that meeting. [00:19:32]Harrison: It's really been a lot of, you know, one of the amazing things is we get to interact with so many different people. So it's been a lot of kind of like just pattern matching on what people are doing and trying to see those patterns before they punch us in the face or something like that. So for retrieval, it was the pattern of seeing like, Hey, yeah, like a lot of people are using vector sort of stuff. But there's also just like other methods and people are offering like hosted solutions and we want our abstractions to work with that as well. So we shouldn't bake in this paradigm of doing like semantic search too heavily, which sounds like basic now, but I think like, you know, to start a lot of it was people needed help doing these things. But then there was like managed things that did them, hybrid retrieval mechanisms, all of that. I think another example of this, I mean, Langsmith, which we can maybe talk about was like very kind of like, I think we worked on that for like three or four months before announcing it kind of like publicly, two months maybe before giving it to kind of like anyone in beta. But this was a lot of debugging these applications as a pain point. We hear that like just understanding what's going on is a pain point. [00:20:27]Alessio: I mean, you two did a webinar on this, which is called Agents vs. Chains. It was fun, baby. [00:20:32]Swyx: Thanks for having me on. [00:20:33]Harrison: No, thanks for coming. [00:20:34]Alessio: That was a good one. And on the website, you list like RAG, which is retrieval of bank debt generation and agents as two of the main goals of LangChain. The difference I think at the Databricks keynote, you said chains are like predetermined steps and agents is models reasoning to figure out what steps to take and what actions to take. How should people think about when to use the two and how do you transition from one to the other with LangChain? Like is it a path that you support or like do people usually re-implement from an agent to a chain or vice versa? [00:21:05]Swyx: Yeah. [00:21:06]Harrison: You know, I know agent is probably an overloaded term at this point, and so there's probably a lot of different definitions out there. But yeah, as you said, kind of like the way that I think about an agent is basically like in a chain, you have a sequence of steps. You do this and then you do this and then you do this and then you do this. And with an agent, there's some aspect of it where the LLM is kind of like deciding what to do and what steps to do in what order. And you know, there's probably some like gray area in the middle, but you know, don't fight me on this. And so if we think about those, like the benefits of the chains are that they're like, you can say do this and you just have like a more rigid kind of like order and the way that things are done. They have more control and they don't go off the rails and basically everything that's bad about agents in terms of being uncontrollable and expensive, you can control more finely. The benefit of agents is that I think they handle like the long tail of things that can happen really well. And so for an example of this, let's maybe think about like interacting with a SQL database. So you can have like a SQL chain and you know, the first kind of like naive approach at a SQL chain would be like, okay, you have the user question. And then you like write the SQL query, you do some rag, you pull in the relevant tables and schemas, you write a SQL query, you execute that against the SQL database. And then you like return that as the answer, or you like summarize that with an LLM and return that to the answer. And that's basically the SQL chain that we have in LangChain. But there's a lot of things that can go wrong in that process. Starting from the beginning, you may like not want to even query the SQL database at all. Maybe they're saying like, hi, or something, or they're misusing the application. Then like what happens if you have some step, like a big part of the application that people with LangChain is like the context aware part. So there's generally some part of bringing in context to the language model. So if you bring in the wrong context to the language model, so it doesn't know which tables to query, what do you do then? If you write a SQL query, it's like syntactically wrong and it can't run. And then if it can run, like what if it returns an unexpected result or something? And so basically what we do with the SQL agent is we give it access to all these different tools. So it has another tool, it can run the SQL query as another, and then it can respond to the user. But then if it kind of like, it can decide which order to do these. And so it gives it flexibility to handle all these edge cases. And there's like, obviously downsides to that as well. And so there's probably like some safeguards you want to put in place around agents in terms of like not letting them run forever, having some observability in there. But I do think there's this benefit of, you know, like, again, to the other part of what LangChain is like the reasoning part, like each of those steps individually involves some aspect of reasoning, for sure. Like you need to reason about what the SQL query is, you need to reason about what to return. But there's then there's also reasoning about the order of operations. And so I think to me, the key is kind of like giving it an appropriate amount to reason about while still keeping it within checks. And so to the point, like, I would probably recommend that most people get started with chains and then when they get to the point where they're hitting these edge cases, then they think about, okay, I'm hitting a bunch of edge cases where the SQL query is just not returning like the relevant things. Maybe I should add in some step there and let it maybe make multiple queries or something like that. Basically, like start with chain, figure out when you're hitting these edge cases, add in the reasoning step to that to handle those edge cases appropriately. That would be kind of like my recommendation, right? [00:24:09]Swyx: If I were to rephrase it, in my words, an agent would be a reasoning node in a chain, right? Like you start with a chain, then you just add a reasoning node, now it's an agent. [00:24:17]Harrison: Yeah, the architecture for your application doesn't have to be just a chain or just an agent. It can be an agent that calls chains, it can be a chain that has an agent in different parts of them. And this is another part as well. Like the chains in LangChain are largely intended as kind of like a way to get started and take you some amount of the way. But for your specific use case, in order to kind of like eke out the most performance, you're probably going to want to do some customization at the very basic level, like probably around the prompt or something like that. And so one of the things that we've focused on recently is like making it easier to customize these bits of existing architectures. But you probably also want to customize your architectures as well. [00:24:52]Swyx: You mentioned a bit of prompt engineering for self-ask and then for this stuff. There's a bunch of, I just talked to a prompt engineering company today, PromptOps or LLMOps. Do you have any advice or thoughts on that field in general? Like are you going to compete with them? Do you have internal tooling that you've built? [00:25:08]Harrison: A lot of what we do is like where we see kind of like a lot of the pain points being like we can talk about LangSmith and that was a big motivation for that. And like, I don't know, would you categorize LangSmith as PromptOps? [00:25:18]Swyx: I don't know. It's whatever you want it to be. Do you want to call it? [00:25:22]Harrison: I don't know either. Like I think like there's... [00:25:24]Swyx: I think about it as like a prompt registry and you store them and you A-B test them and you do that. LangSmith, I feel like doesn't quite go there yet. Yeah. It's obviously the next step. [00:25:34]Harrison: Yeah, we'll probably go. And yeah, we'll do more of that because I think that's definitely part of the application of a chain or agent is you start with a default one, then you improve it over time. And like, I think a lot of the main new thing that we're dealing with here is like language models. And the main new way to control language models is prompts. And so like a lot of the chains and agents are powered by this combination of like prompt language model and then some output parser or something doing something with the output. And so like, yeah, we want to make that core thing as good as possible. And so we'll do stuff all around that for sure. [00:26:05]Swyx: Awesome. We might as well go into LangSmith because we're bringing it up so much. So you announced LangSmith I think last month. What are your visions for it? Is this the future of LangChain and the company? [00:26:16]Harrison: It's definitely part of the future. So LangSmith is basically a control center for kind of like your LLM application. So the main features that it kind of has is like debugging, logging, monitoring, and then like testing and evaluation. And so debugging, logging, monitoring, basically you set three environment variables and it kind of like logs all the runs that are happening in your LangChain chains or agents. And it logs kind of like the inputs and outputs at each step. And so the main use case we see for this is in debugging. And that's probably the main reason that we started down this path of building it is I think like as you have these more complex things, debugging what's actually going on becomes really painful whether you're using LangChain or not. And so like adding this type of observability and debuggability was really important. Yeah. There's a debugging aspect. You can see the inputs, outputs at each step. You can then quickly enter into like a playground experience where you can fiddle around with it. The first version didn't have that playground and then we'd see people copy, go to open AI playground, paste in there. Okay. Well, that's a little annoying. And then there's kind of like the monitoring, logging experience. And we recently added some analytics on like, you know, how many requests are you getting per hour, minute, day? What's the feedback like over time? And then there's like a testing debugging, sorry, testing and evaluation component as well where basically you can create datasets and then test and evaluate these datasets. And I think importantly, all these things are tied to each other and then also into LangChain, the framework. So what I mean by that is like we've tried to make it as easy as possible to go from logs to adding a data point to a dataset. And because we think a really powerful flow is you don't really get started with a dataset. You can accumulate a dataset over time. And so being able to find points that have gotten like a thumbs up or a thumbs down from a user can be really powerful in terms of creating a good dataset. And so that's maybe like a connection between the two. And then the connection in the other way is like all the runs that you have when you test or evaluate something, they're logged in the same way. So you can debug what exactly is going on and you don't just have like a final score. You have like this nice trace and thing where you can jump in. And then we also want to do more things to hook this into a LangChain proper, the framework. So I think like some of like the managing the prompts will tie in here already. Like we talked about example selectors using datasets as a few short examples is a path that we support in a somewhat janky way right now, but we're going to like make better over time. And so there's this connection between everything. Yeah. [00:28:42]Alessio: And you mentioned the dataset in the announcement blog post, you touched on heuristic evaluation versus LLMs evaluating LLMs. I think there's a lot of talk and confusion about this online. How should people prioritize the two, especially when they might start with like not a good set of evals or like any data at all? [00:29:01]Harrison: I think it's really use case specific in the distinction that I draw between heuristic and LLM. LLMs, you're using an LLM to evaluate the output heuristics, you have some common heuristic that you can use. And so some of these can be like really simple. So we were doing some kind of like measuring of an extraction chain where we wanted it to output JSON. Okay. One evaluation can be, can you use JSON.loads to load it? And like, right. And that works perfectly. You don't need an LLM to do that. But then for like a lot of like the question answering, like, is this factually accurate? And you have some ground truth fact that you know it should be answering with. I think, you know, LLMs aren't perfect. And I think there's a lot of discussion around the pitfalls of using LLMs to evaluate themselves. And I'm not saying they're perfect by any means, but I do think they're, we've found them to be kind of like better than blue or any of those metrics. And the way that I also like to use those is also just like guide my eye about where to look. So like, you know, I might not trust the score of like 0.82, like exactly correct, but like I can look to see like which data points are like flagged as passing or failing. And sometimes the evaluators messing up, but it's like good to like, you know, I don't have to look at like a hundred data points. I can focus on like 10 or something like that. [00:30:10]Alessio: And then can you create a heuristic once in Langsmith? Like what's like your connection to that? [00:30:16]Harrison: Yeah. So right now, all the evaluation, we actually do client side. And part of this is basically due to the fact that a lot of the evaluation is really application specific. So we thought about having evaluators, you could just click off and run in a server side or something like that. But we still think it's really early on in evaluation. We still think there's, it's just really application specific. So we prioritized instead, making it easy for people to write custom evaluators and then run them client side and then upload the results so that they can manually inspect them because I think manual inspection is still a pretty big part of evaluation for better or worse. [00:30:50]Swyx: We have this sort of components of observability. We have cost, latency, accuracy, and then planning. Is that listed in there? [00:30:57]Alessio: Well, planning more in the terms of like, if you're an agent, how to pick the right tool and whether or not you are picking the right tool. [00:31:02]Swyx: So when you talk to customers, how would you stack rank those needs? Are they cost sensitive? Are they latency sensitive? I imagine accuracy is pretty high up there. [00:31:13]Harrison: I think accuracy is definitely the top that we're seeing right now. I think a lot of the applications, people are, especially the ones that we're working with, people are still struggling to get them to work at a level where they're reliable [00:31:24]Swyx: enough. [00:31:25]Harrison: So that's definitely the first. Then I think probably cost becomes the next one. I think a few places where we've started to see this be like one of the main things is the AI simulation that came out. [00:31:36]Swyx: Generative agents. Yeah, exactly. [00:31:38]Harrison: Which is really fun to run, but it costs a lot of money. And so one of our team members, Lance, did an awesome job hooking up like a local model to it. You know, it's not as perfect, but I think it helps with that. Another really big place for this, we believe, is in like extraction of structured data from unstructured data. And the reason that I think it's so important there is that usually you do extraction of some type of like pre-processing or indexing process over your documents. I mean, there's a bunch of different use cases, but one use case is for that. And generally that's over a lot of documents. And so that starts to rack up a bill kind of quickly. And I think extraction is also like a simpler task than like reasoning about which tools to call next in an agent. And so I think it's better suited for that. Yeah. [00:32:15]Swyx: On one of the heuristics I wanted to get your thoughts on, hallucination is one of the big problems there. Do you have any recommendations on how people should reduce hallucinations? [00:32:25]Harrison: To reduce hallucinations, we did a webinar on like evaluating RAG this past week. And I think there's this great project called RAGOS that evaluates four different things across two different spectrums. So the two different spectrums are like, is the retrieval part right? Or is the generation, or sorry, like, is it messing up in retrieval or is it messing up in generation? And so I think to fix hallucination, it probably depends on where it's messing up. If it's messing up in generation, then you're getting the right information, but it's still hallucinating. Or you're getting like partially right information and hallucinating some bits, a lot of that's prompt engineering. And so that's what we would recommend kind of like focusing on the prompt engineering part. And then if you're getting it wrong in the, if you're just not retrieving the right stuff, then there's a lot of different things that you can probably do, or you should look at on the retrieval bit. And honestly, that's where it starts to become a bit like application specific as well. Maybe there's some temporal stuff going on. Maybe you're not parsing things correctly. Yeah. [00:33:19]Swyx: Okay. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:33:35]Harrison: Yeah. Yeah. [00:33:37]Swyx: Yeah. [00:33:38]Harrison: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:33:56]Swyx: Yeah. Yeah. [00:33:58]Harrison: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:34:04]Swyx: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:34:17]Harrison: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah, I mean, there's probably a larger discussion around that, but openAI definitely had a huge headstart, right? And that's... Clawds not even publicly available yet, I don't think. [00:34:28]Swyx: The API? Yeah. Oh, well, you can just basically ask any of the business reps and they'll give it to you. [00:34:33]Harrison: You can. But it's still a different signup process. I think there's... I'm bullish that other ones will catch up especially like Anthropic and Google. The local ones are really interesting. I think we're seeing a big... [00:34:46]Swyx: Lama Two? Yeah, we're doing the fine-tuning hackathon tomorrow. Thanks for promoting that. [00:34:50]Harrison: No, thanks for it. I'm really excited about that stuff. I mean, that's something that like we've been, you know, because like, as I said, like the only thing we know is that the space is moving so fast and changing so rapidly. And like, local models are, have always been one of those things that people have been bullish on. And it seems like it's getting closer and closer to kind of like being viable. So I'm excited to see what we can do with some fine-tuning. [00:35:10]Swyx: Yeah. I have to confess, I did not know that you cared. It's not like a judgment on Langchain. I was just like, you know, you write an adapter for it and you're done, right? Like how much further does it go for Langchain? In terms of like, for you, it's one of the, you know, the model IO modules and that's it. But like, you seem very personally, very passionate about it, but I don't know what the Langchain specific angle for this is, for fine-tuning local models, basically. Like you're just passionate about local models and privacy and all that, right? And open source. [00:35:41]Harrison: Well, I think there's a few different things. Like one, like, you know, if we think about what it takes to build a really reliable, like context-aware reasoning application, there's probably a bunch of different nodes that are doing a bunch of different things. And I think it is like a really complex system. And so if you're relying on open AI for every part of that, like, I think that starts to get really expensive. Also like, probably just like not good to have that much reliability on any one thing. And so I do think that like, I'm hoping that for like, you know, specific parts at the end, you can like fine-tune a model and kind of have a more specific thing for a specific task. Also, to be clear, like, I think like, I also, at the same time, I think open AI is by far the easiest way to get started. And if I was building anything, I would absolutely start with open AI. So. [00:36:27]Swyx: It's something I think a lot of people are wrestling with. But like, as a person building apps, why take five vendors when I can take one vendor, right? Like, as long as I trust Azure, I'm just entrusting all my data to Azure and that's it. So I'm still trying to figure out the real case for local models in production. And I don't know, but fine-tuning, I think, is a good one. That's why I guess open AI worked on fine-tuning. [00:36:49]Harrison: I think there's also like, you know, like if there is, if there's just more options available, like prices are going to go down. So I'm happy about that. So like very selfishly, there's that aspect as well. [00:37:01]Alessio: And in the Lancsmith announcement, I saw in the product screenshot, you have like chain, tool and LLM as like the three core atoms. Is that how people should think about observability in this space? Like first you go through the chain and then you start dig down between like the model itself and like the tool it's using? [00:37:19]Harrison: We've added more. We've added like a retriever logging so that you can see like what query is going in and what are the documents you're getting out. Those are like the three that we started with. I definitely think probably the main ones, like basically the LLM. So the reason I think the debugging in Lancsmith and debugging in general is so needed for these LLM apps is that if you're building, like, again, let's think about like what we want people to build in with LangChain. These like context aware reasoning applications. Context aware. There's a lot of stuff in the prompt. There's like the instructions. There's any previous messages. There's any input this time. There's any documents you retrieve. And so there's a lot of like data engineering that goes into like putting it into that prompt. This sounds silly, but just like making sure the data shows up in the right format is like really important. And then for the reasoning part of it, like that's obviously also all in the prompt. And so being able to like, and there's like, you know, the state of the world right now, like if you have the instructions at the beginning or at the end can actually make like a big difference in terms of whether it forgets it or not. And so being able to kind of like. [00:38:17]Swyx: Yeah. And it takes on that one, by the way, this is the U curve in context, right? Yeah. [00:38:21]Harrison: I think it's real. Basically I've found long context windows really good for when I want to extract like a single piece of information about something basically. But if I want to do reasoning over perhaps multiple pieces of information that are somewhere in like the retrieved documents, I found it not to be that great. [00:38:36]Swyx: Yeah. I have said that that piece of research is the best bull case for Lang chain and all the vector companies, because it means you should do chains. It means you should do retrieval instead of long context, right? People are trying to extend long context to like 100K, 1 million tokens, 5 million tokens. It doesn't matter. You're going to forget. You can't trust it. [00:38:54]Harrison: I expect that it will probably get better over time as everything in this field. But I do also think there'll always be a need for kind of like vector stores and retrieval in some fashions. [00:39:03]Alessio: How should people get started with Langsmith Cookbooks? Wanna talk maybe a bit about that? [00:39:08]Swyx: Yeah. [00:39:08]Harrison: Again, like I think the main thing that even I find valuable about Langsmith is just like the debugging aspect of it. And so for that, it's very simple. You can kind of like turn on three environment variables and it just logs everything. And you don't look at it 95% of the time, but that 5% you do when something goes wrong, it's quite handy to have there. And so that's probably the easiest way to get started. And we're still in a closed beta, but we're letting people off the wait list every day. And if you really need access, just DM me and we're happy to give you access there. And then yeah, there's a lot that you can do with Langsmith that we've been talking about. And so Will on our team has been leading the charge on a really great like Langsmith Cookbooks repo that covers everything from collecting feedback, whether it's thumbs up, thumbs down, or like multi-scale or comments as well, to doing evaluation, doing testing. You can also use Langsmith without Langchain. And so we've got some notebooks on that in there. But we have Python and JavaScript SDKs that aren't dependent on Langchain in any way. [00:40:01]Swyx: And so you can use those. [00:40:01]Harrison: And then we'll also be publishing a notebook on how to do that just with the REST APIs themselves. So yeah, definitely check out that repo. That's a great resource that Will's put together. [00:40:10]Swyx: Yeah, awesome. So we'll zoom out a little bit from Langsmith and talk about Langchain, the company. You're also a first-time founder. Yes. And you've just hired your 10th employee, Julia, who I know from my data engineering days. You mentioned Will Nuno, I think, who maintains Langchain.js. I'm very interested in like your multi-language strategy, by the way. Ankush, your co-founder, Lance, who did AutoEval. What are you staffing up for? And maybe who are you hiring? [00:40:34]Harrison: Yeah, so 10 employees, 12 total. We've got three more joining over the next three weeks. We've got Julia, who's awesome leading a lot of the product, go-to-market, customer success stuff. And then we've got Bri, who's also awesome leading a lot of the marketing and ops aspects. And then other than that, all engineers. We've staffed up a lot on kind of like full stack infra DevOps, kind of like as we've started going into the hosted platform. So internally, we're split about 50-50 between the open source and then the platform stuff. And yeah, we're looking to hire particularly on kind of like the things, we're actually looking to hire across most fronts, to be honest. But in particular, we probably need one or two more people on like open source, both Python and JavaScript and happy to dive into the multi-language kind of like strategy there. But again, like strong focus there on engineering, actually, as opposed to maybe like, we're not a research lab, we're not a research shop. [00:41:48]Swyx: And then on the platform side, [00:41:49]Harrison: like we definitely need some more people on the infra and DevOps side. So I'm using this as an opportunity to tell people that we're hiring and that you should reach out if that sounds like you. [00:41:58]Swyx: Something like that, jobs, whatever. I don't actually know if we have an official job. [00:42:02]Harrison: RIP, what happened to your landing page? [00:42:04]Swyx: It used to be so based. The Berkshire Hathaway one? Yeah, so what was the story, the quick story behind that? Yeah, the quick story behind that is we needed a website [00:42:12]Harrison: and I'm terrible at design. [00:42:14]Swyx: And I knew that we couldn't do a good job. [00:42:15]Harrison: So if you can't do a good job, might as well do the worst job possible. Yeah, and like lean into it. And have some fun with it, yeah. [00:42:21]Swyx: Do you admire Warren Buffett? Yeah, I admire Warren Buffett and admire his website. And actually you can still find a link to it [00:42:26]Harrison: from our current website if you look hard enough. So there's a little Easter egg. Before we dive into more of the open source community things, [00:42:33]Alessio: let's dive into the language thing. How do you think about parity between the Python and JavaScript? Obviously, they're very different ecosystems. So when you're working on a LangChain, is it we need to have the same abstraction in both language or are you to the needs? The core stuff, we want to have the same abstractions [00:42:50]Harrison: because we basically want to be able to do serialize prompts, chains, agents, all the core stuff as tightly as possible and then use that between languages. Like even, yeah, like even right now when we log things to LangChain, we have a playground experience where you can run things that runs in JavaScript because it's kind of like in the browser. But a lot of what's logged is like Python. And so we need that core equivalence for a lot of the core things. Then there's like the incredibly long tail of like integrations, more researchy things. So we want to be able to do that. Python's probably ahead on a lot of like the integrations front. There's more researchy things that we're able to include quickly because a lot of people release some of their code in Python and stuff like that. And so we can use that. And there's just more of an ecosystem around the Python project. But the core stuff will have kind of like the same abstractions and be translatable. That didn't go exactly where I was thinking. So like the LangChain of Ruby, the LangChain of C-sharp, [00:43:44]Swyx: you know, there's demand for that. I mean, I think that's a big part of it. But you are giving up some real estate by not doing it. Yeah, it comes down to kind of like, you know, ROI and focus. And I think like we do think [00:43:58]Harrison: there's a strong JavaScript community and we wanted to lean into that. And I think a lot of the people that we brought on early, like Nuno and Jacob have a lot of experience building JavaScript tooling in that community. And so I think that's a big part of it. And then there's also like, you know, building JavaScript tooling in that community. Will we do another language? Never say never, but like... [00:44:21]Swyx: Python JS for now. Yeah. Awesome. [00:44:23]Alessio: You got 83 articles, which I think might be a record for such a young company. What are like the hottest hits, the most popular ones? [00:44:32]Harrison: I think the most popular ones are generally the ones where we do a deep dive on something. So we did something a few weeks ago around evaluating CSV q
AI Governance is a topic of great interest today. Within the context of AI governance, often data is discussed from a perspective of data privacy, data security and so on. Sridhar, Bharath and Satya are looking at the fact that the few companies that have access to large amounts of data and discussing how this may impact AI development. They raise a lot of questions around the law, data ownership and access to promote competition, innovation and human progress. Reading List: Computing Machinery and Intelligence, A.M.Turing This is the key to designing sustainable data cooperatives | World Economic Forum The world's most valuable resource is no longer oil, but data Why data governance is essential for enterprise AI - IBM Blog Trust region Policy Optimization, Schulman and others Do check out Takshashila's public policy courses: https://school.takshashila.org.in/courses We are @IVMPodcasts on Facebook, Twitter, & Instagram. https://twitter.com/IVMPodcasts https://www.instagram.com/ivmpodcasts/?hl=en https://www.facebook.com/ivmpodcasts/ You can check out our website at https://shows.ivmpodcasts.com/featured Follow the show across platforms: Spotify, Google Podcasts, Apple Podcasts, JioSaavn, Gaana, Amazon Music Do share the word with your folks! See omnystudio.com/listener for privacy information.
What You'll Learn in This Episode:Why influencer marketing, ESG and misinformation were the hot topics at this year's Meltwater SummitWhat differentiates companies that succeed in the short term from those that succeed in the longterm This episode is part of our Behind the Brand series, which pulls back the curtain on an iconic brand to focus on the people shaping that brand's communications and marketing strategy. We launched this series in April 2023, with Gráinne O'Brien, senior director of corporate affairs for Kellogg Europe. This month, host Linda Descano welcomes Dino Delic back to the pod (he previously chatted with us in December 2020 about our Word of the Year). Dino has been with Meltwater since 2009, having grown with the company through four promotions and roles in Melbourne, New York, Chicago and now Los Angeles. Now with 27,000 global customers in 50 offices across six continents, and 2,300 employees, Meltwater empowers companies with a suite of solutions that spans media, social, consumer and sales intelligence. Linda kicks off her conversation with Dino by asking him to describe his role at Meltwater, to which he replies, “My job is to help our team help our clients connect the dots as much as possible with all the external data that we collect. What we're really helping companies do is collect information outside of their four walls, make sense of it and make more informed decisions so they can better understand where they're spending money wisely and not spending money.” Dino says he landed in the external intelligence field many years ago, by accident. “I studied marketing, but I got my first sales job at the age of 18; I was just hooked on interacting with people. It turns out, if you like problem solving, sales is a pretty good profession. But I didn't stick around in a sales career for over two decades now because I love sales. It was just through pure luck that I landed on a company that has such an interesting data set.” The company also has an immense following, having recently attracted more than 800 attendees and more than 50 speakers, including Bethenny Frankel and Trevor Noah, to the Meltwater Summit in New York. At the Summit, Meltwater introduced two new AI Assistants, leveraging the latest technology in generative AI (learn more through the link in our show notes). The first is a PR Assistant which helps PR professionals draft press releases and personalize pitches to journalists in record time. The second is the AI Writing Assistant, powered by the ChatGPT API, which drafts highly engaging social content, saving teams time and increasing social engagement at scale. Dino says, “There's a lot of interest in AI, especially because of how excited people are that automation can save time, but I'm kind of sick of the conversation going into fear mongering about AI. Especially in professions in strategy, customer service, sales, brand marketing, PR and comms, the biggest gripe for everybody is that they're so busy, that they don't have time to do their job because they have to do a lot of manual work. There are so many things that people just pull their hair out about. That's what I'm excited about — is that AI can eliminate all those menial tasks. That shouldn't be replacing jobs. That I think is just hype and hyperbole.” Linda and Dino also talk about what defines a successful company today and how data can be used to ensure not only longevity but collaboration between departments. “The companies that do better, versus the ones struggling with business, are the ones that have a nice, central unifying mission and set of values, and everybody contributes to those goals,” says Dino. “The companies that do well, but only for a short period of time, are the ones that have a really strong marketing discipline or function, but it doesn't last because they don't work all well together.” He points to use cases highlighted at the Summit, whereby PR and sales teams are working together, using Meltwater's media monitoring tools to track competitors or key accounts and ultimately inform customer service and sales. “The holy grail for PR is to be able to say, ‘We ran a campaign and generated this much business,'” says Dino. “That's next to impossible because that's not how people make purchasing decisions. But a good PR campaign changes user behavior. What keeps me employed and excited is realizing that, especially in PR and comms, people don't often connect the dots between what they're doing and what their company is doing. They're just looking to measure their own performance. With the benefit of data, and the tools that we have, you can really inform strategy, but you can also inform other departments.” The most popular track at this year's Meltwater Summit was the influence track, says Dino, explaining they couldn't put enough chairs in the room. “What was interesting is when I looked around at all the name tags, and all the disciplines that were in that room, it wasn't just marketing or PR,” he says. “Influencer marketing is becoming such an interesting space, because it's at the confluence of a really good earned campaign, backed by an influencer campaign, and then also a paid strategy. It's customer acquisition, it's building trust, it's creating awareness.” The Summit's second most-talked about topic: ESG and how consumers crave authenticity and desire tangible actions from brands. This is particularly relevant in influencer marketing, where consumers seek a genuine connection with influencers and desire behind-the-scenes content. Dino emphasizes that it's not enough for brands to make promises, such as electrifying their fleet or implementing sustainable practices. Consumers want to know how these promises will be fulfilled and desire influencers who genuinely share their passion for the brand. They want to see influencers who can take them behind the scenes and provide an authentic look at the brand's actions and values. Linda closes by asking Dino a few rapid-fire questions, including what his superpower is, what's the best career advice he's ever received and if he lives by any particular motto or mantra. Give “Red Sky Fuel for Thought” a listen, and subscribe to the show on iTunes, Spotify or your favorite podcasting app. Don't forget to rate and review to help more people find us! Also mentioned on this episode:Our first Behind-the-Brand episode with Gráinne O'BrienOur 2020 word of the year episode featuring Dino DelicMeltwater announces new AI-powered assistants, summaries and analysis at Meltwater SummitRecapping Meltwater Summit Follow Red Havas for a daily dose of comms news:TwitterFacebookInstagramLinkedIn Subscribe:Don't forget to subscribe to the show using your favorite podcasting app.iTunesSpotify What did you love? What would you like to hear about next?Remember to rate and review today's show; we'd love to hear f
This is the start of a series on AI in L&D. The series is exploring what people are doing now with AI, what people are thinking about, and what people are learning.In this episode, we kick off the series where Robin does an interview with ChatGPT on transforming learning at work. The voice is generated by ElevenLabs.If you are interested, it might be easier to read the blog post along with the podcast, than listening to the computer generated voice.Key takeaways:Promoting self-guided learning and automation in instructional design can contribute to a culture of continuous learning. Automation can be used in instructional design to automate routine tasks and increase efficiency, such as AI automating content curation, learner data analysis, and feedback collection and analysis.There are risks and considerations to be aware of when using AI. These include the dangers of dependency on AI, the digital divide in access to AI tools, ethical concerns regarding biassed information, and the importance of data privacy. To address these risks, organisations should comply with data privacy laws, implement quality control measures, balance AI with human-led learning experiences, and provide training and support for employees who are not comfortable with technology.Strategies for using AI in workplace learning include having human mentors or instructors for discussions and social learning opportunities, providing training and support for employees who are not comfortable with technology, and incorporating ethical considerations when choosing AI providers.Segmented time stamps:(00:00) The argumentation of human lead learning(01:58) AI in workplace learning(05:35) Adaptive learning experience pathways(08:15) On-demand learning for safety leadership(11:41) Content curation and updates(16:07) The risks of using AI in L&D(21:33) The role of humans and AI in the workplace(24:53) Balancing the use of AILinks from the podcast Check out ElevenLabsTranscript:Robin: Act as an expert on L&D and learning. You're going to be a guest on the Learning While Working podcast. The audience is workplace L&D professionals. The interview is going to talk about how you (chatGPT) can transform learning at work. I'm going to ask you questions.ChatGPT: Sure, I'd be happy to participate in this mock podcast interview. Feel free to start with your questions.Please remember that while I can provide useful and insightful information, I'm an AI model developed by OpenAI and my responses are based on patterns in the data I was trained on, up until September 2021. Any information or insights I provide should be validated and adapted to your specific context.Robin: Welcome to the Learning While Working podcastChatGPT: Thank you for having me on the Learning While Working podcast. It's an exciting opportunity to connect with the L&D community. As an AI developed by OpenAI, I aim to provide insights and information that can help transform learning at work. Let's get started!Robin: What are 3 ways you transform learning at work?ChatGPT: AI, such as myself, can help to transform learning at work in various ways. Here are three significant ways:Personalised Learning: One of the biggest challenges in workplace learning is meeting the diverse needs of learners. AI can analyse an individual's learning style, previous knowledge, and performance data to deliver personalised learning experiences. For instance, based on a learner's progress, AI can recommend next steps, additional resources, or even a change in learning strategy to improve outcomes.On-Demand Learning: The traditional model of set training times and classrooms isn't flexible or scalable, especially in a world where remote work is increasingly common. AI can provide on-demand learning, allowing employees to learn at their own pace, anytime and anywhere. This is especially useful for quickly updating employees on new tools, policies, or industry developments.Automation of Routine Tasks: A lot of L&D professionals spend considerable time on administrative tasks like scheduling training sessions, tracking progress, and generating reports. AI can automate these routine tasks, allowing L&D professionals to focus more on strategic tasks such as designing better learning experiences and interventions.These are just a few examples of how AI can help. The actual impact will depend on the organisation, its culture, and its readiness to leverage AI in learning and development.Robin: Give me three examples of Personalized Learning?ChatGPT: Certainly, personalised learning through AI can take many forms. Here are three specific examples:Adaptive Learning Paths: AI can create adaptive learning paths based on each learner's knowledge level, skills, and learning speed. For instance, if an employee is already proficient in certain areas, the AI can automatically adjust their learning pathway to focus on areas where they need improvement. This avoids wasting time on content they already know and focuses on developing areas where they need growth.Recommendation Systems: Similar to how Netflix or Amazon recommends movies or products based on past behaviour, an AI can recommend learning resources based on a user's past learning behaviour and performance. For example, if a user struggled with a particular concept, the AI might suggest additional resources, such as videos or articles, to reinforce their understanding.Learning Style Adaptation: AI can adapt the format of the content to match the learner's preferred style. For example, some people learn better through visual materials like infographics and videos, while others prefer text-based resources or interactive simulations. AI can track how users interact with different types of content and then prioritise delivering content in the formats that work best for them.In all these ways, AI can help create a personalised, more engaging, and effective learning experience for each individual.Robin: You're being a bit vague. Give me and example of using ChatGPT for Adaptive Learning Paths for cybersecurity skills.ChatGPT: Certainly, let's take a more detailed look at how ChatGPT could be utilised for creating an adaptive learning path in the field of cybersecurity.Suppose an employee, let's call her Jane, is tasked with improving her cybersecurity skills. After an initial assessment or quiz, it's clear that Jane has a strong understanding of basic concepts such as types of threats and fundamental security protocols, but she struggles more with advanced topics like incident response management and network security.Initial Assessment: In this scenario, ChatGPT could begin by providing an interactive assessment, where Jane answers questions or solves problems related to various aspects of cybersecurity. This could involve everything from basic concepts to more advanced ones.Custom Learning Pathway: Based on Jane's performance on the assessment, ChatGPT identifies her strengths and weaknesses. Since she already has a strong understanding of basic concepts, the AI doesn't include these topics in her learning pathway. Instead, it prioritises advanced topics like incident response management and network security.Learning Delivery and Continuous Assessment: As Jane moves through her personalised learning pathway, she interacts with ChatGPT in a conversational manner, discussing cybersecurity scenarios, answering quiz questions, and solving problems related to her learning topics. ChatGPT adjusts the complexity and depth of the content based on Jane's progress.Regular Feedback and Adjustments: Throughout this process, Jane receives instant feedback on her progress, which helps her understand where she stands and how she's improving. If Jane is struggling with a certain topic, ChatGPT might choose to slow down, provide additional learning resources, or approach the topic from a different angle.In this way, ChatGPT can provide a flexible, responsive learning experience that adapts to Jane's needs, ensuring she focuses on the areas where she needs to grow most.This approach not only saves time by avoiding unnecessary repetition of known topics but also promotes engagement and effective learning by focusing on areas of need and providing instant feedback.Robin: This is good. How about an example for On-Demand Learning for safety leadershipChatGPT: Absolutely. Here's how ChatGPT could provide on-demand learning for an area such as safety leadership:Let's consider a manager, we'll call him John, who has been tasked with improving his safety leadership skills to better lead his team in a manufacturing environment. John works on a tight schedule and he often can't participate in traditional training programs during usual business hours.24/7 Availability: Since ChatGPT is an AI, it is available 24/7. John can interact with the training program whenever he has time, be it early in the morning before his shift starts, or late in the evening after he finishes his work. He doesn't have to wait for a scheduled training session.Learning in Context: Suppose a specific situation arises at work that John is unsure how to handle from a safety leadership perspective, like responding to a safety incident. He could immediately consult with ChatGPT, asking specific questions to learn how to best handle the situation. ChatGPT could provide guidance based on best practices in safety leadership, and provide resources for deeper learning.Role-Playing Simulations: On-demand learning isn't just about reading material or answering quiz questions. For example, John could engage in a role-playing scenario with ChatGPT at any time, where he has to navigate a simulated safety incident, make decisions and respond to changes. This on-the-spot, interactive learning can be highly effective for building practical skills.Just-In-Time Learning: If a new safety regulation or procedure is implemented, John can immediately get the necessary training from ChatGPT. It can provide an overview of the new rule, explain its implications, and even quiz John to ensure he has understood the change.This kind of on-demand learning can be particularly effective because it is self-paced, flexible, and contextual. It allows learners to get the knowledge when they need it, apply it immediately, and therefore, they are more likely to retain it.Robin: Is this key to type experience making sure employees are self guided learners?ChatGPT:Yes, fostering a culture of self-guided learning is a key aspect of successful implementation of AI-driven personalised and on-demand learning. Here's why:Ownership of Learning: When employees are self-guided learners, they take ownership of their learning journey. This makes the learning process more effective because they are actively seeking knowledge, rather than passively receiving it.Pace and Style: Self-guided learners can set their own pace and choose their preferred style of learning. This flexibility can lead to more engagement and better absorption of knowledge.Immediate Application: Self-guided learners can immediately apply what they've learned to their work, which can reinforce their learning and demonstrate its practical relevance.Continuous Learning Culture: Promoting self-guided learning can contribute to a culture of continuous learning. In today's rapidly changing work environment, continuous learning is crucial for businesses to stay competitive and for employees to keep their skills up-to-date.However, it's important to note that not everyone may be used to or comfortable with self-guided learning. In these cases, organisations can provide support and guidance to help employees become more self-directed learners. This could include training on how to use AI learning tools, creating clear learning paths for skills development, and providing mentors or coaches to guide learners.Robin: Getting those top 3 things. How could an instructional designer use Automation of Routine Tasks?ChatGPT: Absolutely. There are several routine tasks within instructional design that can be automated with AI, which can greatly increase efficiency and allow instructional designers to focus more on the creative and strategic aspects of their work. Here are three specific examples:Content Curation and Update: AI can automate the process of sourcing, organising, and updating learning content. For example, an AI could crawl the web or a specific database to find relevant articles, videos, or research papers for a particular course topic. It can also regularly check for updates or newer versions of the existing learning content and replace outdated materials.Learner Data Analysis: AI can automate the analysis of learner data to provide insights into learning behaviours and performance. For instance, AI could automatically generate reports on learner engagement, quiz performance, areas of struggle, and more. This allows instructional designers to quickly and efficiently evaluate the effectiveness of their courses and make data-driven improvements.Feedback Collection and Analysis: Collecting and analysing learner feedback is an important but time-consuming task. AI can automate this process by prompting learners for feedback at optimal times (like after a course completion), collating the feedback, and analysing it to provide actionable insights. It could categorise feedback into themes, identify common issues, and even suggest potential improvements.By automating these tasks, instructional designers can save significant time and focus more on designing effective, engaging learning experiences. Additionally, the insights provided by AI can inform their design decisions, leading to more effective learning outcomes.Robin: How about for content creation?ChatGPT: AI can indeed assist with content creation in a variety of ways, which can be particularly useful for instructional designers. Here are a few examples:AI-Powered Writing Assistants: Tools like ChatGPT can assist in drafting course content. Given an outline or a set of key points, AI can generate draft content, which can then be edited and refined by instructional designers. This can speed up the content creation process, especially for first drafts or when creating content in a specific area of expertise.Content Optimization: AI can analyse content and suggest improvements based on best practices. For example, it might suggest simplifying complex sentences, adding more visuals, or including quizzes and interactive elements at certain intervals to boost engagement and retention.Translation and Localization: If content needs to be provided in multiple languages, AI-powered translation tools can be used to translate the content. Additionally, AI can help localise content, i.e., adapt the content to fit the cultural, societal, and linguistic context of different learner groups.Creating Interactive Content: AI can help create interactive content such as quizzes, simulations, or even chatbots for role-playing scenarios. Instructional designers can input the learning objectives and key points, and the AI can generate appropriate interactive content.It's important to note that while AI can assist in content creation, the role of instructional designers remains crucial. They provide the human touch, the deep understanding of learner needs, and the creativity that AI cannot replicate. AI should be seen as a tool that can augment the capabilities of instructional designers, not replace them.Robin: What are some risks of using you for workplace learning?ChatGPT: Using AI like ChatGPT for workplace learning presents several opportunities but also comes with some risks. It's important to consider these potential challenges:Data Privacy: AI platforms often rely on data from users to provide personalised experiences. This can raise concerns about data privacy, particularly if sensitive information is involved. It's crucial to ensure that any AI used in the workplace complies with data privacy regulations and that users are informed about how their data is used.Quality Control: While AI can generate content and assist with learning, the quality of that content may not always meet the necessary standards. Misunderstandings or inaccuracies could lead to the dissemination of incorrect information, which could be particularly harmful in areas such as safety training or compliance.Dependency: There's a risk that organisations could become overly reliant on AI for learning and development, neglecting the importance of human interaction and guidance in the learning process. While AI can deliver information and even personalise learning to some extent, it can't replace the mentorship and social interaction that humans provide.Digital Divide: While AI can offer powerful tools for learning, not all employees may have equal access to these tools. For example, employees who are not comfortable with technology or who lack access to high-quality internet may not benefit as much from AI-based learning tools.Ethical Considerations: AI systems, including ChatGPT, are trained on vast amounts of data, which can include biassed or prejudiced information. Even though measures are taken to prevent it, these biases can sometimes influence the AI's output. This is a broader issue with AI and machine learning and something to be aware of when using AI tools.It's crucial for organisations to be aware of these risks and to implement strategies to mitigate them when using AI for workplace learning.Robin: Can we just drill into Data Privacy? What are your Data privacy features?ChatGPT: As an AI developed by OpenAI, I am designed with privacy in mind. Here are some key aspects to understand about how I handle data:No Personal Data Storage: As of my training cut-off in September 2021, I don't have the ability to access or retrieve personal data from users unless it's shared with me in the course of our conversation. I don't store personal conversations or use them to inform my responses.Temporary Data Retention: By default, OpenAI retains data passed to the ChatGPT API for 30 days, but doesn't use it to improve my models.Privacy by Design: From the outset, I've been designed to respect user privacy. For example, I don't ask for personal data unless it's strictly necessary for the conversation.However, it's important to note that the usage of AI tools should still comply with the organisation's data privacy policies and any local laws or regulations. You should always be clear with employees about how their data is being used, and make sure you have their consent before using AI tools in a learning context.As data privacy is a dynamic field with constantly evolving regulations and standards, it's crucial to regularly review and update your data privacy policies and practices.Robin: But doesn't mean you're losing opportunities for training data?ChatGPT: While it's true that retaining and using conversational data could potentially provide opportunities for refining and improving AI models, OpenAI has chosen not to use data from individual interactions with ChatGPT to improve the model in order to prioritise user privacy and confidentiality.This decision reflects an important balance in AI development between improving the system and ensuring user trust. Given that AI systems like me are often used in contexts where sensitive or private information might be discussed, ensuring that users feel safe and confident that their data won't be used without their explicit permission is a critical priority.It's worth noting, though, that overall improvements in the model, which are based on extensive and diverse datasets, will continue to enhance the effectiveness and accuracy of the AI, benefiting all users.Robin: Going back to the risks you talked about, how can we overcome those risks?ChatGPT: Addressing these risks requires thoughtful planning and implementation. Here are some suggestions:Data Privacy: Ensure any AI tool used complies with all relevant data privacy laws and regulations, such as GDPR in Europe. Implement robust data protection measures, and be transparent with users about what data you collect, why you collect it, and how you use and protect it.Quality Control: Don't rely solely on AI for creating and delivering content. Always have a human in the loop to review and validate the information. Regularly test the system and provide opportunities for users to give feedback or report errors.Dependency: Balance the use of AI with human-led learning experiences. AI is a tool to supplement and enhance learning, not replace human guidance and interaction. For example, you might use AI for delivering content and personalising learning paths, but still have human mentors or instructors for discussions, mentoring, and social learning opportunities.Digital Divide: Provide training and support for employees who are not comfortable with technology. This might include basic digital literacy training or one-on-one support. Make sure your learning experiences are accessible on different devices and internet speeds, and where necessary, provide alternative, non-digital learning options.Ethical Considerations: Choose AI providers who are transparent about their training data and methods and who actively work to minimise and address bias. Incorporate ethical considerations into your AI strategy and continuously monitor and evaluate the AI's output for potential bias or ethical issues.These strategies can help mitigate the risks of using AI in workplace learning. However, each organisation's situation is unique, so it's important to continually assess and address risks in your specific context.Robin: In the past I've talked about how AI means that humans need to focus on the 3 things that AI can't do, Create, Collaborate and Change. What are your thoughts on this?ChatGPT: You've highlighted a critical perspective on the interaction between humans and AI in the workplace, and particularly in learning and development. AI technology has made significant strides, but there remain areas where humans outperform any existing AI. Let's unpack the three Cs you've mentioned:Create: While AI can generate content, solve problems within a specified context, and even mimic some forms of creativity, it fundamentally lacks the ability to truly innovate or think outside of the box. Human creativity, which involves the generation of new ideas, unique insights, and novel solutions, remains unchallenged. In learning and development, this creativity plays a crucial role in designing engaging and effective learning experiences.Collaborate: AI can facilitate collaboration (for example, by coordinating schedules or organising tasks), but it cannot truly collaborate the way humans can. Human collaboration involves complex social interactions, empathetic understanding, negotiation, and the ability to build on each other's ideas in dynamic, unpredictable ways. Especially in a learning context, this ability to collaborate and learn from each other is a deeply human skill that AI cannot replicate.Change: AI can adapt to new data within a predefined framework, but it doesn't have the capacity for the kind of flexible, broad-spectrum change and adaptability that humans possess. Humans can change their beliefs, assumptions, behaviors, and strategies based on new experiences or insights, and this ability is key to learning and development.In the context of learning while working, humans should leverage AI for what it does best - managing large amounts of data, providing personalised content, offering on-demand support, and automating routine tasks. Meanwhile, human efforts can focus on these irreplaceable areas of creativity, collaboration, and change, which are at the heart of effective learning and development. This combination of AI's capabilities and the unique human skills can lead to a powerful, holistic approach to workplace learning.Robin: Now as the wrap questions what your greatest piece of advice about using ChatGPT is for learning while working?ChatGPT: When it comes to using ChatGPT or any AI in the learning while working context, the most important piece of advice I'd give is this: View AI as a tool that augments and enhances human-led learning, not as a replacement for it.AI tools like ChatGPT can offer great advantages in terms of personalising learning, providing on-demand support, and automating routine tasks. But they are not a silver bullet. The effectiveness of AI in learning and development depends greatly on how it's integrated into your broader L&D strategy.Here are a few key points to consider:Learner-Centric Approach: Focus on the learner. AI offers exciting possibilities, but ultimately, the goal is to create a better, more effective, more engaging learning experience for your employees. So always consider how the use of AI will benefit the learner.Blend with Human Interaction: Balance the use of AI with opportunities for human interaction and guidance. While AI can deliver content and personalise learning, it can't replace the mentorship, social learning, and human touch that are also crucial for effective learning.Ongoing Evaluation: Continually evaluate and adjust your approach. AI is a rapidly evolving field, so it's important to stay updated with new developments and be ready to adapt your strategies. Regularly solicit feedback from learners and use it to improve your AI-enhanced learning experiences.Training & Support: Provide training and support to help employees use AI tools effectively. Not everyone may be comfortable with technology, so it's important to help them understand how to use AI tools and how they can benefit from them.By keeping these principles in mind, organisations can leverage the power of AI to transform workplace learning, while still maintaining a human-centric approach that values and nurtures the unique capabilities of each individual learner.
05-15-2023 J.L. Hancock Learn more about the interview and get additional links here: https://usadailytimes.com/2023/05/15/is-the-chatgpt-api-creating-a-small-business-boom/ Subscribe to the best of our content here: https://priceofbusiness.substack.com/ Subscribe to our YouTube channel here: https://www.youtube.com/channel/UCywgbHv7dpiBG2Qswr_ceEQ
NewsDiscussion topic: Generating package summaries with GPTDiscussion topic: Experimenting with ingesting in SPI data into GPT via pluginSwift Package Index Playgrounds 1.1.0Validate SPI manifestPackagesEmojiTextFoundationPreview, FoundationICU by AppleDSWaveformImageswift-ast-explorer by Kishikawa KatsumiSwift AST Explorer web appSwiftSyntaxExpression Macros Swift evolution proposal
In this unique conversation, I interviewed ChatGPT as a guest on the podcast.ChatGPT took the world by storm in November of 2022, and since then, the world hasn't been the same, and the release of ChatGPT API created thousands of new applications geared towards specific tasks. In this fascinating conversation, we cover some critical aspects of the implications of AI on businesses and society. Some of these topics include:
Warner Media has changed the name of its HBO Max service to just “Max” with details on news and sports on the service to be announced in a few months time. Game developers in China are using generative AI to create background landscapes more easily. And Stanford University scientists create a Sims-like virtual world RPG with the player characters controlled by the ChatGPT API.Starring Tom Merritt, Sarah Lane, Scott Johnson, Roger Chang, Joe, AmosLink to show notes here. Become a member at https://plus.acast.com/s/dtns. Hosted on Acast. See acast.com/privacy for more information.
話した内容Blog スポンサー Elith さん ElithはAI自社マルチプロダクト開発/AI受託開発 公式サイト: https://elith.co.jp/ ☀️特色1 豊富なデータ系人材 →画像、自然言語、音声、生成系、系列データの各専門家が在籍 →幅広い領域での機械学習、深層学習による課題解決が得意 ☀️特色2 高いKaggler率 →DataScientist/MLエンジニア11名(うちKaggleMaster1名、KaggleExpert6名) →毎日わいわいがやがやとAI爆速開発! ☀️サービス例 ・AIおみくじ(二拍手を運勢に変換) ・SHUWAI[シュワイ](手話を音声に変換) ・週刊エーアイElith(AI技術・AIビジネスのニュースレター) ...そのほか続々開発中! ☀️お気軽にお問い合わせください AI開発を依頼したいまたはElithで働きたいなど、興味を持たれた方は公式サイトのお問い合わせまでご連絡をお待ちしております。 今回は、スポンサー紹介、ChatGPT API、2023年3月の目標、今週の分析コンペ、雑談・来週話したいこと(言語処理学会、Misskey)について話しました。 #regonn_curry_fm へのお便りはこちら https://forms.gle/BZsrPSa4znoQNfww8
If you want to get caught up on the top generative AI news of the week, Eric Schwartz and Andrew Herndon from Voicebot.ai and Synthedia break down the top headlines for the first week in March 2023. On tap this week in the video (with links if you want to read more): ChatGPT API Snap My AI Spotify AI DJ New Bing in Windows 11 Meta joins the LLM wars…sort of More About GAIN The Generative AI News (GAIN) Rundown is recorded live and streamed via YouTube and LinkedIn at 12 noon EST on Thursdays. Join us live if you can make it. You can re-watch the discussion on Voicebot's YouTube channel.
Episode #5 of This Day in AI is Here! We Discuss GPT4, What we Can Expect from GPT4, ChatGPT API 1 Week On, AI Stock Picking, AI Gambling, More on AGI, Doctors being Replaced by AI, VC Investing in AI, Meta's LLaMA and More!00:00 - Whisper v2 AI Example & Intro00:29 - GPT4 Releasing Next Week? According to Microsoft CTO02:47 - GPT4 What to Expect and what will GPT-4 enable?06:00 - ChatGPT API: Great Interface, Token Limits, Censorship07:26 - ChatGPT Releases: Salesforce, Slack, DuckDuckGo, Hubspot10:42 - Custom AI Models: Is this the next wave of AI startups?11:56 - GPT Index, LangChain for Solving Token Limits15:08 - Will GPT4 Wipe Out LangChain and GPT Index?16:16 - The Ultimate AI Stock Picker. Can AI Be Used for Investing?20:00 - Is AI Model Chaining Like Specialization in the Brain? New Roles for Developers with AI21:11 - More AI Stock Picking & Investing21:53 - Gambling with AI: Can AI Place the Best Bets? Wealth Creation with AI24:35 - When Will the Entire Stock Market by AIs?25:56 - Whisper v2 AI Demo & Will Evil AGI Destroy Humanity?33:30 - Are AI Models are "Just Math" or Are Humans Just Dumb? 36:25 - Is AI The Next Predator? More on AGI39:07 - How Long Until Voice AI Chatbots Are in Cars? Homes? Alexa? Google?42:55 - Can Salesforce be Disrupted by AI? Snowflake with Dyanic AI Generated Interfaces? 47:41 - AI Job Wipeout: Can AI LLMs Replace Doctors? Do Models Need To Upskill?54:53 - MidJourney v5 Launch: Generative AI Progression57:52 - Reid Hoffman Quits OpenAI Board: Investing in AI. Salesforce Ventures AI Fund. 59:29 - How Can Individual Invest and Make Money from the AI Boom?1:01:53 - Meta's LLaMA: Is Basing AI on Facebook Comments Stupid?1:04:26 - More Bing "Sydney" LOLz: Does AI have Memory?Chris's Whisper V2 API Demo: https://www.youtube.com/watch?v=5QdjD_wLVT8&ab_channel=ChrisSharkeySOURCES:https://www.heise.de/news/GPT-4-is-coming-next-week-and-it-will-be-multimodal-says-Microsoft-Germany-7540972.htmlhttps://twitter.com/alyssamvance/status/1633932883801825284?s=20https://www.youtube.com/watch?v=5QdjD_wLVT8&ab_channel=ChrisSharkeyhttps://twitter.com/nearcyan/status/1632661647226462211?s=46&t=uXHUN4Glah4CaV-g2czc6Qhttps://www.reddit.com/r/midjourney/comments/yz5saa/midjourney_v5/https://twitter.com/TheRealAdamG/status/1633137765071167492?s=20https://www.reddit.com/r/bing/comments/11m1exf/can_bing_actually_know_this/https://twitter.com/pmarca/status/1633260935010988033?s=46&t=uXHUN4Glah4CaV-g2czc6Qhttps://twitter.com/wallstreetsilv/status/1632522698982080512?s=46&t=uXHUN4Glah4CaV-g2czc6QIf you enjoy this episode please consider subscribing, sharing, liking and commenting!
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Speed running everyone through the bad alignement bingo. $5k bounty for a LW conversational agent, published by ArthurB on March 9, 2023 on LessWrong. There's a wave of people, of various degrees of knowledge and influence, currently waking up to the ideas of AI existential risk. They seem to be literally going through every box of the bad alignement bingo card takes. I think there is value in educating those people. I'm aware there's an argument to be made that: education at scale doesn't matter, coordination is too difficult, all that matter is solving alignment and that takes care of the rest. There's something to that, but I disagree that education at scale doesn't help. It can make progress of frontrunners marginally more safety oriented, it can steer company cultures, it can move the Overton window, change the Zeitgeist, it can buy a bit of time. You likely didn't stumble on these ideas all on your own, so arguing against the value of outreach or education is also arguing against your own ability to do anything. It's also a matter of ROI, and there are some very low hanging fruit there. The simplest thing would be to write a long FAQ that goes through every common objections. No, people won't read the whole sequences, or Arbital on their own, but they might go through a FAQ. But we can do better than a FAQ. It's now fairly straightforward, with tools like langchain () to turn a set of documents into a body of knowledge for a conversational agent. This is done by building an index of embedding that a language model can search to bring context to an answer. This doesn't preclude fine tuning, but it makes it unnecessary. So a straightforward project is to index lesswrong, index arbitral, index the alignment forum, maybe index good alignement papers as well, blog posts, books. Then hook that up to the ChatGPT API, and prompt it to: list search queries for relevant material to answer the question compose an answer that reflects the content and opinion of the data answer with infinite patience Some jailbreak prompts may be needed to prevent ChatGPT's conditioning to regurgitate AI risk appeasing propaganda through the API, but there are a bunch of those out there. Or use the API of other models as they become open source or commercially available. Will this save humanity? No. Will this turn the course of safety research? Also no. Is this using AI to advance alignment? Well, yes, a little bit, don't dismiss very small starts. Is this worth spending a weekend hacking on this project instead of posting on Twitter? Absolutely. Will this actually make things worse? No, you're overthinking this. I'll pay $5k to the best version built by the end of March (if any is built). It's a modest bounty but it's really not all that much work, and it's fun work. And of course if anyone wants to add their own contribution to the bounty please do. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Speed running everyone through the bad alignement bingo. $5k bounty for a LW conversational agent, published by ArthurB on March 9, 2023 on LessWrong. There's a wave of people, of various degrees of knowledge and influence, currently waking up to the ideas of AI existential risk. They seem to be literally going through every box of the bad alignement bingo card takes. I think there is value in educating those people. I'm aware there's an argument to be made that: education at scale doesn't matter, coordination is too difficult, all that matter is solving alignment and that takes care of the rest. There's something to that, but I disagree that education at scale doesn't help. It can make progress of frontrunners marginally more safety oriented, it can steer company cultures, it can move the Overton window, change the Zeitgeist, it can buy a bit of time. You likely didn't stumble on these ideas all on your own, so arguing against the value of outreach or education is also arguing against your own ability to do anything. It's also a matter of ROI, and there are some very low hanging fruit there. The simplest thing would be to write a long FAQ that goes through every common objections. No, people won't read the whole sequences, or Arbital on their own, but they might go through a FAQ. But we can do better than a FAQ. It's now fairly straightforward, with tools like langchain () to turn a set of documents into a body of knowledge for a conversational agent. This is done by building an index of embedding that a language model can search to bring context to an answer. This doesn't preclude fine tuning, but it makes it unnecessary. So a straightforward project is to index lesswrong, index arbitral, index the alignment forum, maybe index good alignement papers as well, blog posts, books. Then hook that up to the ChatGPT API, and prompt it to: list search queries for relevant material to answer the question compose an answer that reflects the content and opinion of the data answer with infinite patience Some jailbreak prompts may be needed to prevent ChatGPT's conditioning to regurgitate AI risk appeasing propaganda through the API, but there are a bunch of those out there. Or use the API of other models as they become open source or commercially available. Will this save humanity? No. Will this turn the course of safety research? Also no. Is this using AI to advance alignment? Well, yes, a little bit, don't dismiss very small starts. Is this worth spending a weekend hacking on this project instead of posting on Twitter? Absolutely. Will this actually make things worse? No, you're overthinking this. I'll pay $5k to the best version built by the end of March (if any is built). It's a modest bounty but it's really not all that much work, and it's fun work. And of course if anyone wants to add their own contribution to the bounty please do. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
0:00:00 – HKPUG 會訊 + 每週 IT 新聞 1:08:57 – Main Topic 本集全長:2:25:36 ChatGPT API, Whisper API, Google I/O 2023, 俄羅斯禁用通訊軟件竟包括 WeChat, 小鵬電動汽車要人下跪, Ring Alarm …
This week's episode of The Marketing AI Show touches on generative AI, and you guessed it, ChatGPT. But it's not more of the same. APIs and HubSpot take ChatGPT to the next level. Tune in! ChatSpot…the latest in ChatGPT The week is starting off with a big development. Just yesterday, Monday, March 6, HubSpot co-founder and CTO Dharmesh Shah released ChatSpot, an AI tool that combines the power of ChatGPT, image generation AI, and HubSpot's CRM. The tool lets you ask questions of your HubSpot portal and provide instructions in natural language through a chat interface. For example, you can use ChatSpot to give you a summary of data in your portal, create a report of companies added last quarter summarized by country, or generate an image of an orange rocket ship. Mike and Paul break down this latest development and what it means for HubSpot customers and agencies. The biggest winners generative AI tech stack…so far Legendary venture capital firm Andreessen Horowitz published a deep dive into the generative AI market: “Who Owns the Generative AI Platform?” To create this, the firm met with dozens of startup founders and operators who deal directly with generative AI to better understand where the value in this market will accrue. Andreessen breaks down the generative AI tech stack into three main categories: Infrastructure - the cloud platforms and chips used to train models Models - the foundational models like GPT-3 that power generative AI tools Apps - the actual products like Jasper that customers use Andreessen observed that infrastructure vendors are likely the biggest winners in this market so far, capturing the majority of dollars flowing through the stack. Application companies are growing topline revenues very quickly but often struggle with retention, product differentiation, and gross margins. And most model providers, though responsible for the very existence of this market, haven't yet achieved a large commercial scale. Bottom line: the companies creating the most value — i.e. training generative AI models and applying them in new apps — haven't captured most of it. APIs are available for ChatGPT and Whisper We knew it would happen soon: developers can now integrate ChatGPT and Whisper, OpenAI's human-level speech recognition system, into apps and products through the company's API. Since December, OpenAI says it has reduced the cost of ChatGPT by 90%—savings that API users will now receive when they use it, making it much easier and cheaper for companies to incorporate the capabilities of ChatGPT and Whisper into their businesses. However, this doesn't just mean every business can have its own instance of ChatGPT. It means they can use these capabilities to build innovative new products. And tech and e-commerce companies are here for it. Already, Snap, the creator of Snapchat, introduced My AI, a customizable on-platform chatbot that is built on the ChatGPT API. Instacart is using the ChatGPT API to pair ChatGPT with its own data so that customers can ask open-ended natural language questions. And Speak is an AI language learning app and the fastest-growing English app in South Korea. They're using the Whisper API to power an AI-speaking companion product. It's impressive to see the API in action. These advancements and developments—happening at lightning speed—have an immediate impact on the marketing world. Paul and Mike help us uncover new opportunities and possibilities. Listen to this week's episode on your favorite podcast player, and be sure to explore the links below for more thoughts and perspectives on these important topics.
In this episode James announces his big news as he reflects on the past 5 years and how he got to where he is today. We also discuss B2C vs B2B and how people like Pieter Levels (@levelsio on Twitter) and Danny Postma (@dannypostmaa on Twitter) have managed to crack the code with B2C using AI to great advantage but is it worth the risk chasing that big win? Whilst mentioning Simon Høiberg (@SimonHoiberg on YouTube) we talk about how the best use of AI is fast becoming the role of an assistant in your app and so NoCoders are keenly jumping into the new ChatGPT API to see how they might add some killer features to their apps. James states how important AI will become in his SaaS product moving forwards with some really juicy early insights from his experimentation from which we can all learn. Another drop-the-mic moment from James comes in the form of Tiny Bird and how it has transformed James's ability to process vast amounts of data in his NoCode solution. Definitely one worth learning about for anyone recording 100's of 1000's, even millions of rows of data in a database, but one which you need to query in a performant manner. Kieran tells us about his experience with website/app roasting-as-a-service which he found very useful despite his reservations and so maybe more of us should be hiring roasters to help us to see the woods rather than the trees? And, naturally, Kieran has started to delve into ChatGPT too and is considering how he could leverage it inside Yep.so Glenn mentions an organisation called Big Change who are supplying grants to early stage ideas which are aiming to bring about transformation in the British education system and so, naturally, he has applied for recognition for his work he is doing with NoCode Kids. Let's see if they appreciate his ideas on teaching NoCode to kids in the school setting - watch this space. What we're working on Glenn's SaaS is NoCode Kids, a learning management system to teach kids about no-code, built on Webflow. Kieran's SaaS is Yep.so, a super fast landing page builder and idea validation tool, built on Bubble. James' SaaS is Userloop, a customer feedback tool for Shopify merchants, built on Bubble.
(00:07:06) WTF is Temu? https://www.modernretail.co/technology/after-a-successful-super-bowl-ad-temus-growth-is-outpacing-rivals-like-target/ (00:13:25) „Schutzgeld“ Monetarisierung in den Sozialen Medien https://www.bigtechnology.com/p/social-media-is-changing-and-paid (00:22:21) Hat BeReal eine Chance? (00:27:18) Bold Glamour https://www.theverge.com/2023/3/2/23621751/bold-glamour-tiktok-face-filter-beauty-ai-ar-body-dismorphia (00:32:55) Auf der Suche nach OpenAI-Alternativen https://www.theinformation.com/articles/fighting-woke-ai-musk-recruits-team-to-develop-openai-rival https://techcrunch.com/2023/03/02/stability-ai-hugging-face-and-canva-back-new-ai-research-nonprofit/ (00:39:35) ChatGPT API und OpenAI-Zapier-Integration https://openai.com/blog/introducing-chatgpt-and-whisper-apis https://zapier.com/apps/openai/integrations (00:42:55) LinkedIn-Content-Inflation mit ChatGPT https://www.theverge.com/2023/3/4/23624241/linkedin-collaborative-articles-ai-prompts-content (00:46:10) Generative AI bei WPP https://www.theguardian.com/technology/2023/feb/23/ai-artificial-intelligence-wpp-global-advertising-revolution-technology (00:50:00) Kann KI Gedanken lesen und kann ChatGPT Abhilfe bei Wordle leisen? https://medium.com/mlearning-ai/ai-art-meets-brain-activity-ai-can-literally-read-our-minds-5f6a5fce80e https://www.wired.com/story/face-recognition-software-led-to-his-arrest-it-was-dead-wrong/ (00:01:02) Brustkrebserkennung mit KI https://www.nytimes.com/2023/03/05/technology/artificial-intelligence-breast-cancer-detection.html (01:05:50) Buchempfehlung: Quit von Annie Duke
Reorx lists awesome apps & tools using the new ChatGPT API, Ernie Smith ranks self-hosted app alternatives, Very Good Ventures brings Dart to the server, Daniel Stenberg tells curl's NuGet story & Hacker Stations showcases tech workspace setups from all over the world.
Reorx lists awesome apps & tools using the new ChatGPT API, Ernie Smith ranks self-hosted app alternatives, Very Good Ventures brings Dart to the server, Daniel Stenberg tells curl's NuGet story & Hacker Stations showcases tech workspace setups from all over the world.
Reorx lists awesome apps & tools using the new ChatGPT API, Ernie Smith ranks self-hosted app alternatives, Very Good Ventures brings Dart to the server, Daniel Stenberg tells curl's NuGet story & Hacker Stations showcases tech workspace setups from all over the world.
Bugünün Ötesi podcast serisinin yeni bölümünde Özcan Yazıcı ve Dağhan Uzgur, Dünya'da ve Türkiye'de ön plana çıkan güncel konuları analiz ediyor, ‘Bugünün Ötesi'ni yorumluyor.Yapay zeka gazetecileri yoldaOpenAI'dan iki yenilik: ChatGPT API ve Whisper APIElon Musk ChatGPT'ye rakip olmak için düğmeye bastıJack Dorsey'in Twitter alternatifi Bluesky App Store'da yayınlandıYapay zeka bu kez F-16 uçurduGoogle, Gmail'e yapay zeka özellikleri entegre edecekBill Gates'ten Musk'a gönderme: Mars'a gideceğime aşıları finanse ederim“Bugünün Ötesi” podcast'ini Spotify, Google Podcasts, Apple Podcasts, Deezer gibi podcast platformlarından takip edebilir ve dinleyebilirsiniz.
This week on Rocket, Simone kissed too many Italians, so Brianna and Christina are running amok. They talk about the new ChatGPT API and the potential it could have on different startups and industries, the risky bets Binance made with its customer's money (without telling them) and Elizabeth Holmes' latest attempt to get out of jail, thus saving Christina $500.
The Wall Street Journal's Tom Dotan joins Ranjan Roy and Alex Kantrowitz for our weekly news recap show. We cover: 1) Salesforce's recent struggles 2) Salesforce's ‘monster quarter' 3) Amazon hitting pause on HQ2 4) State of the market 5) OpenAI's ChatGPT API 6) OpenAI vs. Microsoft? 7) Tinder robberies. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. For weekly updates on the show, sign up for the pod newsletter on LinkedIn: https://www.linkedin.com/newsletters/6901970121829801984/ Questions? Feedback? Write to: bigtechnologypodcast@gmail.com Rajan and Alex at SXSW: https://schedule.sxsw.com/2023/events/PP122865
This week on Rocket, Simone kissed too many Italians, so Brianna and Christina are running amok. They talk about the new ChatGPT API and the potential it could have on different startups and industries, the risky bets Binance made with its customer's money (without telling them) and Elizabeth Holmes' latest attempt to get out of jail, thus saving Christina $500.
Это главные IT-новости недели в подкасте Telegram-канала ForGeeks с Сергеем Кузнецовым. Рассказал про удаление Tinkoff из AppStore, столкновение чёрных дыр, интересные анонсы MWC 2023 и многое другое. Слушайте свежий выпуск, читайте и подписывайтесь на ForGeeks в Telegram.
Heute u.A. mit diesen Themen:Deutscher Business Angel Report 2023 veröffentlichtRTL Ventures will Consumer-Tech-Unternehmen unterstützenSteve Davis könnte Twitter-CEO werdenSalesforce wächst erneut zweistelligNutzer-Akzeptanz bei Fintech-Apps geringEU könnte Metas Metaverse prüfenApple baut Chip-Zentrum in München ausKryptobank Silvergate in TurbulenzenOpenAI öffnet ChatGPT-API für EntwicklerElon Musk stellt Tesla-„Masterplan“ vor
A ChatGPT API for business is here. Microsoft gives Bing those nobs and dials that I've been talking about. What are multimodal LLMs? New turmoil in crypto, this time around one of the big crypto friendly banks. How is it going in terms of social platforms diversifying into subscription revenue? And why the FDA has rejected Neuralink's applications to begin human testing of brain implants.Links:OpenAI launches an API for ChatGPT, plus dedicated capacity for enterprise customers (TechCrunch)Microsoft now lets you change Bing's chatbot personality to be more entertaining (The Verge)Microsoft unveils AI model that understands image content, solves visual puzzles (ArsTechnica)Apple Blocks Update of ChatGPT-Powered App, as Concerns Grow Over AI's Potential Harm (WSJ)Coinbase is no longer accepting, initiating payments with Silvergate (The Block)Snapchat will now let you pause your Snap Streaks (TechCrunch)TikTok Earned $205 Million More Than Facebook, Twitter, Snap And Instagram Combined On In-App Purchases In 2023 (Forbes)U.S. regulators rejected Elon Musk's bid to test brain chips in humans, citing safety risks (Reuters)See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
OpenAI just rollicked the AI world yet again yesterday — while releasing the long awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series of system-wide optimizations, we've achieved 90% cost reduction for ChatGPT since December; we're now passing through those savings to API users.We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and in fact have started productizing their own infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. Timestamps* 00:00: Intro to Varun and Exafunction* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing* 05:30: Should companies own their ML infrastructure?* 07:00: The two kinds of LLM Applications* 08:30: Codeium* 14:50: “Our growth is 4-5% day over day”* 16:30: Latency, Quality, and Correctability* 20:30: Acceleration mode vs Exploration mode* 22:00: Copilot for X - Harvey AI's deal with Allen & Overy* 25:00: Scaling Laws (Chinchilla)* 28:45: “The compute-optimal model might not be easy to serve”* 30:00: Smaller models* 32:30: Deepmind Retro can retrieve external infromation* 34:30: Implications for embedding databases* 37:10: LLMOps - Eval, Data Cleaning* 39:45: Testing/User feedback* 41:00: “Users Is All You Need”* 42:45: General Intelligence + Domain Specific Dataset* 43:15: The God Nvidia computer* 46:00: Lightning roundShow notes* Varun Mohan Linkedin* Exafunction* Blogpost: Are GPUs Worth it for ML* Codeium* Copilot statistics* Eleuther's The Pile and The Stack* What Building “Copilot for X” Really Takes* Copilot for X* Harvey, Copilot for Law - deal with Allen & Overy* Scaling Laws* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)* chinchilla's wild implications (LessWrong)* UL2 20B: An Open Source Unified Language Learner (20B)* Paper - Deepmind Retro* “Does it make your beer taste better”* HumanEval benchmark/dataset* Reverse Engineering Copilot internals* Quora Poe* Prasanna Sankar notes on FLOPs and Bandwidth* NVIDIA H100 specs - 3TB/s GPU memory, 900GB/s NVLink Interconnect* Optimizer state is 14x size of model - 175B params => 2.5TB to store state → needs at least 30 H100 machines with 80GB each* Connor Leahy on The Gradient PodcastLightning Rounds* Favorite AI Product: Midjourney* Favorite AI Community: Eleuther and GPT-J* One year prediction: Better models, more creative usecases* Request for Startup: Superathlete Fitness Assistant* Takeaway: Continue to tinker!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer, editor of L Space Diaries.[00:00:20] swyx: Hey, and today we have Varun Mohan from Codeium / Exafunction on. I should introduce you a little bit because I like to get the LinkedIn background out of the way.[00:00:30] So you did CS at MIT and then you spent a few years at Nuro where you were ultimately tech lead manager for autonomy. And that's an interesting dive. Self-driving cars in AI and then you went straight into Exafunction with a few of your coworkers and that's where I met some of them and started knowing about Exafunction.[00:00:51] And then from out of nowhere you cloned GitHub Copilot. That's a lot of progress in a very short amount of time. So anyway, welcome .[00:00:59] Varun Mohan: That's high praise.[00:01:00] swyx: What's one thing about you that doesn't appear on LinkedIn that is a big part of what people should know?[00:01:05] Varun Mohan: I actually really like endurance sports actually.[00:01:09] Like I, I've done multiple triathlons. I've actually biked from San Francisco to LA. I like things that are like suffering. I like to suffer while I, while I do sports. Yeah.[00:01:19] swyx: Do you think a lot about like code and tech while you're doing those endurance sports or are you just,[00:01:24] Varun Mohan: your mind is just focused?[00:01:26] I think it's maybe a little bit of both. One of the nice things about, I guess, endurance athletics, It's one of the few things you can do where you're not thinking about, you can't really think about much beyond suffering. Like you're climbing up a hill on a bike and you see like, uh, you see how many more feet you need to climb, and at that point you're just struggling.[00:01:45] That's your only job. Mm-hmm. . Yeah. The only thing you can think of is, uh, pedaling one more pedal. So it's actually like a nice, a nice way to not think about work. Yeah,[00:01:53] Alessio Fanelli: yeah, yeah. Maybe for the audience, you wanna tell a bit about exa function, how that came to be and how coding came out[00:01:59] Varun Mohan: of that. So a little bit about exo function.[00:02:02] Before working at exa function, I worked at Neuro as Sean was just saying, and at neuro, I sort of managed large scale offline deep learning infrastructure. Realized that deep learning infrastructure is really hard to build and really hard to maintain for even the most sophisticated companies, and started exa function to basically solve that gap, to make it so that it was much easier for companies.[00:02:24] To serve deep learning workloads at scale. One of the key issues that we noticed is GPUs are extremely hard to manage fundamentally because they work differently than CPUs. And once a company has heterogeneous hardware requirements, it's hard to make sure that you get the most outta the hardware. It's hard to make sure you can get, get great GPU utilization and exa function was specifically built to make it so that you could get the most outta the hardware.[00:02:50] Make sure. Your GP was effectively virtualized and decoupled from your workload to make it so that you could be confident that you were running at whatever scale you wanted without burning the bank.[00:03:00] swyx: Yeah. You gave me this metric about inefficiency,[00:03:03] Varun Mohan: right? Oh, okay. Like flop efficiency. Yeah. Yeah. So basically, I think it comes down to, for most people, one of the things about CPUs that's really nice is with containers, right?[00:03:13] You can end up having a single. You can place many containers on them and all the containers will slowly start eating the compute. It's not really the same with GPUs. Like let's say you have a single. For the most part, only have one container using that gpu. And because of that, people heavily underestimate what a single container can sort of do.[00:03:33] And the GPU is left like heavily idle. And I guess the common term now with a lot of LM workloads is like the flop efficiency of these workloads. M F U, yeah. Yeah. Model flop utilization. The model flop utilization, which is basically like what fraction of the flops or compute on the hardware is actually getting used.[00:03:49] And sort of what we did at exa function. Not only make it so that the model was always running, we also built compiler technology to make it so that the model was also running more efficiently. And some of these things are with tricks like operator fusion, like basically you could imagine fusing two operations together such that the time it takes to compute.[00:04:07] the fused operation is lower than the time it takes for each individual operation. Oh my God. Yeah. .[00:04:13] Alessio Fanelli: Yeah. And you have this technique called dynamic multiplexing, which is basically, instead of having a one-to-one relationship, you have one GP for multiple clients. And I saw one of your customers, they went from three clients to just one single GPU and the cost by 97%.[00:04:29] What were some of those learning, seeing hardware usage and efficiencies and how that then played into what, what[00:04:34] Varun Mohan: you're building? Yeah, I think it basically showed that there was probably a gap with even very sophisticated teams. Making good use of the hardware is just not an easy problem. I think that was the main I, it's not that these teams were like not good at what they were doing, it's just that they were trying to solve a completely separate problem.[00:04:50] They had a model that was trained in-house and their goal was to just run it and it, that should be an easy. Easy thing to do, but surprisingly still, it's not that easy. And that problem compounds in complexity with the fact that there are more accelerators now in the cloud. There's like TPUs, inferential and there's a lot of decisions, uh, that users need to make even in terms of GPU types.[00:05:10] And I guess sort of what we had was we had internal expertise on what the right way to run the workload was, and we were basically able to build infrastructure and make it so that companies could do that without thinking. So most[00:05:21] Alessio Fanelli: teams. Under utilizing their hardware, how should they think about what to own?[00:05:26] You know, like should they own the appearance architecture? Like should they use Xlo to get it to production? How do you think[00:05:32] Varun Mohan: about it? So I think one thing that has proven to be true over the last year and a half is companies, for the most part, should not be trying to figure out what the optimal ML architecture is or training architecture is.[00:05:45] Especially with a lot of these large language models. We have generic models and transformer architecture that are solving a lot of distinct problems. I'll caveat that with most companies. Some of our customers, which are autonomous vehicle companies, have extremely strict requirements like they need to be able to run a model at very low latency, extremely high precision recall.[00:06:05] You know, GBT three is great, but the Precision Recall, you wouldn't trust someone's life with that, right? So because of that, they need to innovate new kinds of model architectures. For a vast majority of enterprises, they should probably be using something off the shelf, fine tuning Bert models. If it's vision, they should be fine tuning, resonant or using something like clip like the less work they can do, the better.[00:06:25] And I guess that was a key turning point for us, which is like we start to build more and more infrastructure for the architectures that. The most popular and the most popular architecture was the transformer architecture. We had a lot of L L M companies explicitly reach out to us and ask us, wow, our GT three bill is high.[00:06:44] Is there a way to serve G P T three or some open source model much more cheaply? And that's sort of what we viewed as why we were maybe prepared for when we internally needed to deploy transform models our.[00:06:58] Alessio Fanelli: And so the next step was, Hey, we have this amazing infrastructure. We can build kind of consumer facing products, so to speak, at with much better unit economics, much better performance.[00:07:08] And that's how code kind[00:07:10] Varun Mohan: of came to be. Yeah. I think maybe the, the play is not maybe for us to be just, we make a lot of consumer products. We want to make products with like clear ROI in the long term in the enterprise. Like we view code as maybe one of those things. Uh, and maybe we can, we can talk about code maybe after this.[00:07:27] We. Products like co-pilot as being extremely valuable and something that is generating a lot of value to professionals. We saw that there was a gap there where a lot of people probably weren't developing high intensive L L M applications because of cost, because of the inability to train models the way they want to.[00:07:44] And we thought we could do that with our own infrastructure really quickly.[00:07:48] swyx: I wanna highlight when you say high intensive, you mean basically generate models every key, uh, generate inferences on every keystroke? That's[00:07:55] Varun Mohan: right. Yeah. So I would say like, there's probably two kinds of L l M applications here.[00:07:59] There's an L L M application where, you know, it rips through a bunch of data and maybe you wait a couple minutes and then you see something, and then there's an application where the quality is not exactly what you want, but it's able to generate enough, sorry, low enough latency. It's still providing a ton of value.[00:08:16] And I will say there's like a gap there where the number of products that have hit that co-pilot spot is actually not that high. Mm. A lot of them are, are kind of like weight and, you know, just generate a lot of stuff and see what happens because one is clearly more compute intensive than the other Basically.[00:08:31] swyx: Well co uh, I don't know if we told the whole story yet, you were going to[00:08:35] Varun Mohan: dive into it. . Yeah, so I guess, I guess the story was I guess four or five months ago we sort of decided internally as a team we were like very early adopters of co-pilot. I'm not gonna sit here and say co-pilot, it's not a great tool.[00:08:45] We love co-pilot. It's like a fantastic tool. We all got on the beta. The moment it came out we're like a fairly small T, but we, like we all got in, we were showing each other completions. We end up writing like a lot of cuda and c plus plus inside the company. And I think there was probably a thought process within us that was like, Hey, the code we write is like very high aq.[00:09:04] You know? So like there's no way it can help. And one of the things in c plus plus that's like the most annoying is writing templates. Writing template programming is maybe one of those things. No one, maybe there's like some people in the C plus O standards community that can do it without looking at the, looking at anything online.[00:09:19] But we struggle. We struggle writing bariatric templates and COPA just like ripped through. Like we had a 500 line file and it was just like writing templates like, and we didn't really even test it while we were running it. We then just compiled it and it just, We're like, wow. Like this is actually something that's not just like it's completing four loops, it's completing code for us.[00:09:38] That is like hard in our brains to reach, but fundamentally and logically is not that complicated. The only reason why it's complicated is there's just a lot of rules, right. And from then we were just like, wow, this is, that was maybe the first l l m application for us internally, because we're not like marketers that would use, uh, Jasper, where we were like, wow, this is like extremely valuable.[00:09:58] This is not a toy anymore. So we wanted to take our technology to build maybe apps where these apps were not gonna be toys, right? They were not gonna be like a demo where you post it on Twitter and then you know there's hype and then maybe like a month later, no one's using.[00:10:11] swyx: There's a report this morning, um, from co-pilot where they, they were estimating the key tabs on amount of code generated by a co-pilot that is then left in code repos and checked in, and it's something like 60 to 70%[00:10:24] Varun Mohan: That's, that's nuts, but I totally believe it given, given the stats we have too. There's this flips in your head once you start using products like this, where in the beginning there's like, there's like skepticism, like how, how valuable can it be? And suddenly now like user behavior fundamentally changes so that now when I need to write a function, I'm like documenting my code more because I think it's prompting the model better, right?[00:10:43] So there's like this crazy thing where it's a self-fulfilling prophecy where when you get more value from it, more of your code is generated. From co-pilot[00:10:50] swyx: just to walk through the creation process, I actually assumed that you would have grabbed your data from the pile, which is the Luther ai, uh, open source, uh, code information.[00:11:00] But apparently you scraped your own[00:11:01] Varun Mohan: stuff. Yeah. We ended up basically using a lot of open, I guess, permissively licensed code, uh, in the public internet, mainly because I think also the pile is, is fairly a small subset. Uh, I think maybe after we started there was the, that was also came to be, but for us, we had a model for ourselves even before that, uh, was the point.[00:11:21] Ah, okay. So the timing was just a little bit off. Yeah, exactly. Exactly. But it's awesome work. It's, it seems like there's a good amount of work that's getting done Decentrally. Yeah. Which is a little bit surprising to me because I'm like more bullish on everyone needs to get together in a room and make stuff happen.[00:11:35] Like we're all in person in Mountain View. But yeah, no, it's pretty impressive. Yeah. Luther in general, like everything they've done, I'm pretty impressed with it. Yeah, and we're[00:11:42] swyx: gonna talk about that. Cause I, I didn't know you were that involved in the community[00:11:45] Varun Mohan: that early on I wasn't involved. It was more of like a, I was watching and maybe commenting from time to time.[00:11:50] So they're a very special community for sure. Yeah,[00:11:52] swyx: yeah, yeah. That's true. That's true. My impression is a bunch of you are geniuses. You sit down together in a room and you. , get all your data, you train your model, like everything's very smooth sailing. Um, what's wrong with that[00:12:02] Varun Mohan: image? Yeah, so probably a lot of it just in that a lot of our serving infrastructure was already in place, Uhhuh before then.[00:12:09] So like, hey, we were able to knock off one of these boxes that I think a lot of other people maybe struggle with. The open source serving offerings are just, I will say, not great in that. That they aren't customized to transformers and these kind of workloads where I have high latency and I wanna like batch requests, and I wanna batch requests while keeping latency low.[00:12:29] Mm-hmm. , right? One of the weird things about generation models is they're like auto regressive, at least for the time being. They're auto aggressive. So the latency for a generation is a function of the amount of tokens that you actually end up generating. Like that's like the math. And you could imagine while you're generating the tokens though, unless you batch a.[00:12:46] It's gonna end up being the case that you're not gonna get great flop utilization on the hardware. So there's like a bunch of trade offs here where if you end up using something completely off the shelf, like one of these serving thing, uh, serving frameworks, you're gonna end up leaving a lot of performance on the table.[00:13:00] But for us, we were already kind of prepared. To sort of do that because of our infrastructure that we had already built up. And probably the other thing to sort of note is early on we were able to leverage open source models, sort of bootstrap it internally within our company, but then to ship, we finally had some requirements like, Hey, we want this model to have fill in the middle capabilities and a bunch of other things.[00:13:20] And we were able to ship a model ourselves. So we were able to time it so that over the course of multiple months, different pieces were like working out properly for us. So it wasn't. . You know, we started out and we were just planning the launch materials. The moment we started there was like maybe some stuff that was already there, some stuff that we had already figured out how to train models at scale internally.[00:13:38] So we were able to just leverage that muscle very quickly. I think the one[00:13:41] swyx: thing that you had figured out from the beginning was that it was gonna be free forever. Yeah. Yeah, co-pilot costs $10[00:13:47] Varun Mohan: a month. Co-pilot costs $10 a month. I would argue significantly more value than $10 a month. The important thing for us though, was we are gonna continue to build more great products on top of code completion.[00:13:58] We think code completion is maybe day one of what the future looks like. And for that, clearly we can't be a product that's like we're $10 a month and we're adding more products. We want a user base that loves using us. And we'll continue to stay with us as we continue to layer on more products. And I'm sure we're gonna get more users from the other products that we have, but we needed some sort of a differentiator.[00:14:17] And along the way we realized, hey, we're pretty efficient at running these workloads. We could probably do this. Oh, so it wasn't,[00:14:23] swyx: it was a plan to be free from the start. You just[00:14:25] Varun Mohan: realized we, yeah. We realized we could probably, if we cut and optimized heavily, we could probably do this properly. Part of the reasoning here was we were confident we could probably build a pro tier and go to the enter.[00:14:35] But for now, originally when we, when we started, we weren't like, we're just gonna go and give every, all pieces of software away for free. That wasn't like sort of the goal there. And[00:14:43] swyx: since you mentioned, uh, adoption and, you know, traction and all that, uh, what can you disclose about user growth? Yeah, user adoption.[00:14:50] Varun Mohan: Yeah. So right now we have. We probably have over 10,000 users and thousands of daily actives, and people come back day over day. Our growth is like around, you know, four to 5% day over day right now. So all of our growth right now is sort of like word of mouth, and that's fundamentally because like the product is actually one of those products where.[00:15:08] Even use COT and use us, it's, it's hard to tell the difference actually. And a lot of our users have actually churned off of cot isn't Yeah. I,[00:15:14] swyx: I swept Yeah. Yeah. To support you guys, but also also to try[00:15:17] Varun Mohan: it out. Yeah, exactly. So the, the crazy thing is it wasn't like, Hey, we're gonna figure out a marketing motion of like, Going to the people that have never heard of co-pilot and we're gonna like get a bunch of users.[00:15:27] We wanted to just get users so that in our own right we're like a really great product. Uh, and sort of we've spent a lot of engineering time and obviously we co-wrote a blog post with you, Sean, on this in terms of like, there's a lot of engineering work, even beyond the latency, making sure that you can get your cost down to make a product like this actually work.[00:15:44] swyx: Yeah. That's a long tail of, of stuff that you referenced,[00:15:47] Varun Mohan: right? Yes. Yeah, exactly.[00:15:48] swyx: And you, you said something to the order of, um, and this maybe gets into co-pilot for X uh, which is something that everybody is keen about cuz they, they see the success of co-pilot. They're like, okay, well first of all, developer tools, there's more to do here.[00:16:00] And second of all, let's say the co-pilot idea and apply for other disciplines. I don't know if you wanna Yeah.[00:16:06] Varun Mohan: There's[00:16:06] Alessio Fanelli: gonna some. Key points that, that you touched on. Um, how to estimate, inference a scale, you know, and the latency versus quality trade-offs. Building on first party. So this is free forever because you run your own models, right?[00:16:19] That's right. If you were building on open ai, you wouldn't be able to offer it for free real-time. You know, when I first use coding, It was literally the same speed as Copi is a little bit[00:16:29] swyx: faster. I don't know how to quantify it,[00:16:31] Varun Mohan: but we are faster. But it's one of those things that we're not gonna like market as that's the reason because it's not in and of itself a right for you to like, I'm just gonna be open with you.[00:16:39] It's not a reason for you to like suddenly turn off a copilot where if our answers were trash, uh, but we were faster. You know what I mean? But your focus[00:16:46] Alessio Fanelli: was there. We used the alpha, I think prem on our discord came to us and say, you guys should try this out. So it was really fast. Even then, prompt optimization is another big thing, and model outputs and UX kind of how you bring them together.[00:17:00] Which ones of these things are maybe like the one or two that new founders should really think about first?[00:17:07] Varun Mohan: Yeah, I think, I think my feeling on this is unless you are ex, you probably should always bootstrap on top of an existing a. Because like even if you were to, the only reason why we didn't is because we knew that this product was actually buildable.[00:17:22] Probably if we worked hard enough to train a model, we would actually be able to build a great product already. But if you're actually going out and trying to build something from scratch, unless you genuinely believe, I need to fine tune on top of, you know, terabytes of data terabyte is a very large amount of data, but like tens of gigabytes of data.[00:17:37] Probably go out and build on top of an API and spend most of your time to make it so that you can hit that quality latency trade off properly. And if I were to go out and think about like the three categories of like an LM product, it's probably like latency, quality, and correct ability. The reality is, you know, if I were to take a product like co-pilot or Coum, the latency is very low.[00:17:58] The quality I think, is good enough for the task, but the correct ability is, is very easy. Credibility. What, what is correct ability? Correct ability means, let's say the quality is not there. Like you consider the the case where, The answer is wrong. How easy is it for your user to actually go and leverage parts of the generation?[00:18:16] Maybe a, a concrete example. There's a lot of things people are excited about right now where I write a comment and it generates a PR for me, and that's like, that's like really awesome in theory. I think that's like a really cool thing and I'm sure at some point we will be able to get there. That will probably require an entirely new model for what it's worth that's trained on diffs and commits and all these other things that looks at like improvements and code and stuff.[00:18:37] It's probably not gonna be just trained on generic code. But the problem with those, those sort of, I would say, applications are that, let's suppose something does change many files, makes large amounts of changes. First of all, it's guaranteed not gonna be. Because even the idea of like reviewing the change takes a long time.[00:18:54] So if the quality and the correct ability is just not there, let's say you had 10 file, a 10 file change and you modified like, you know, file two and four, and those two modifications were consistent, but the other eight files were not consistent. Then suddenly the correct ability is like really hard.[00:19:10] It's hard to correct the output of the model. And so the user interface is 100% really important. But maybe until you get the latency down or the correct ability, like correct ability, like a lot better, it's probably not gonna be shippable. And I think that's what you gotta spend your time focusing on.[00:19:26] Can you deliver a product that is actually something users want to use? And I think this is why I was talking about like demo. It's like very easy to hand to handpick something that like works, that works for a demo, exceedingly hard for something that has large scope, like a PR to work consistently. It will take a lot of engineering effort to make it work on small enough chunks so that a user is like, wow, this is value generative to me.[00:19:49] Because eroding user trust or consumer trust is very easy. Like that is, it is is much, much, it's very easy to erode user trust versus enterprise. So just be mindful of that, and I think that's probably like the mantra that most of these companies need to operate under. Have you done any[00:20:05] Alessio Fanelli: analysis on. What the ratio between code generated and latency is.[00:20:11] So you can generate one line, but you could also generate the whole block. You can generate Yeah. A whole class and Yeah. You know, the more you generate the, the more time it takes. Like what's the sweet spot that, that you[00:20:21] Varun Mohan: found? Yeah, so I think there was a great study and, and I'm not sure if it's possible to link it, but there was a great study about co-pilot actually that came out.[00:20:28] Basically what they said was there were two ways that developers usually develop with a code assistant technology. They're either in what's called like acceleration mode or exploration mode. And exploration mode is basically you're in the case where you don't even know what the solution space for the function is.[00:20:43] and you just wanna generate a lot of code because you don't even know what that looks like. Like it might use some API that you've never heard of. And what you're actually doing at that point is like you're writing a clean comment, just wishing and praying that you know, the generation is long enough and gets you, gets you far enough, right?[00:20:57] acceleration mode is basically you are doing things where you are very confident in what you're doing and effectively. Code gives you that muscle so that you can basically stay in flow state and you're not thinking about like exactly what the APIs look like, but push comes to shove. You will figure out what the APIs look like, but actually like mentally, it takes off like a load in your head where you're like, oh wow.[00:21:18] Like I can just do this. The intent to execution is just a lot, a lot lower there. And I think effectively you want a tool that captures that a little bit. And we have heuristics in terms of captur. Whether or not you're in acceleration versus exploration mode. And a good heuristic is, let's say you're inside like a basic block of a piece of code.[00:21:37] Let's say you're inside a a block of code or an IF statement, you're probably already in acceleration mode and you would feel really bad if I started generating the ELs clause. Because what happens if that else causes really wrong? That's gonna cause like mental load for you because you are the way programmers think.[00:21:51] They only want to complete the if statement first, if that makes sense. So there are things where we are mindful of like how many lines we generate if you use the product, like multi-line generations happen and we are happy to do them, but we don't want to do them when we think it's gonna increase load on developers, if that makes sense.[00:22:07] That[00:22:07] Alessio Fanelli: makes sense. So co-pilot for x. , what are access that you think are interesting for people to build[00:22:13] Varun Mohan: in? Didn't we see some, some tweet recently about Harvey ai, uh, company that, that is trying to sell legal? It's like a legal, legal assistance. That's, that's pretty impressive, honestly. That's very impressive.[00:22:23] So it seems like I would really love to see what the product looks like there, because there's a lot of text there. You know, looking at bing, bing, ai, like, I mean, it's, it's pretty cool. But it seems like groundedness is something a lot of these products struggle with, and I assume legal, if there's one thing you want them to.[00:22:39] To get right. It's like the groundedness. Yeah.[00:22:42] swyx: Yeah. I've made the analogy before that law and legal language is basically just another form of programming language. You have to be that precise. Yes. Definitions must be made, and you can scroll to find the definition. It's the same thing. Yes. ,[00:22:55] Varun Mohan: yes. Yeah. But like, I guess there's a question of like comprehensiveness.[00:22:59] So like, let's say, let's say the only way it generates a suggestion is it provides like, you know, citations to other legal. You don't want it to be the case that it misses things, so you somehow need the comprehensiveness, but also at the same time, you also don't want it to make conclusions that are not from the site, the things at sites.[00:23:15] So, I don't know, like that's, that's very impressive. It's clear that they've demonstrated some amount of value because they've been able to close a fairly sizable enterprise contract. It was like a firm with 3,500 lawyers, something nuts, honestly. Very cool. So it's clear this is gonna happen, uh, and I think people are gonna need to be clever about how they actually make it work.[00:23:34] Within the constraints of whatever workload they're operating in. Also, you, you guys[00:23:37] swyx: are so good at trading stuff, why don't you, you try[00:23:39] Varun Mohan: cloning it. Yeah. So I think, I think that's, that's, uh, preview the roadmap. Yeah, yeah, yeah, yeah. No, no, no, but I'm just kidding. I think one of the things that we genuinely believe as a startup is most startups can't really even do one thing properly.[00:23:52] Mm-hmm. Focus. Yeah. Yeah. Usually doing one thing is really hard. Most companies that go public have like maybe a couple big products. They don't really have like 10, so we're under no illusions. Give the best product experience, the amount of engineering and attention to detail, to build one good product as hard.[00:24:08] So it's probably gonna be a while before we even consider leaving code. Like that's gonna be a big step because the amount of learning we need to do is gonna be high. We need to get users right. We've learned so much from our users already, so, yeah, I don't think we'd go into law anytime soon.[00:24:22] swyx: 3,500 lawyers with Ellen and Ry, uh, is, is is apparently the, the new[00:24:27] Varun Mohan: That's actually really big.[00:24:28] Yeah. Yeah. I can congrat.[00:24:29] swyx: Yeah, it's funny cuz like, it seems like these guys are moving faster than co-pilot. You know, co-pilot just launched, just announced enterprise, uh, like co-pilot for teams or co-pilot for Enterprise. Yeah. After like two years of testing.[00:24:40] Varun Mohan: Yeah, it does seem like the co-pilot team has built a very, very good product.[00:24:44] Um, so I don't wanna like say anything, but I think it is the case to startups will be able to move faster. I feel like that is true, but hey, like GitHub has great distribution. Whatever product they do have, they will be able to sell it really. Shall[00:24:56] swyx: we go into model numbers and infra estimates? our favorite[00:25:01] Varun Mohan: topics.[00:25:02] Nice small models. Nice.[00:25:04] swyx: So this is, um, relevant to basically I'm researching a lot of skilling law stuff. You have a lot of thoughts. You, you host paper discussions[00:25:12] Varun Mohan: in your team. Yeah, we, we try to like read papers that we think are really interesting and relevant to us. Recently that's been, there's just a fire hose of papers.[00:25:21] You know, someone even just curating what papers we should read internally as a company. Yeah, I think, I think there's, there's so much good content[00:25:28] swyx: out there. You should, you guys should have a podcast. I mean, I told you this before. Should have a podcast. Just, just put a mic near where, where you guys are[00:25:33] Varun Mohan: talking.[00:25:34] We gotta, we gotta keep developing coding though, . No, but you're doing this discussion[00:25:38] swyx: anyway. You[00:25:38] Varun Mohan: might as well just, oh, put the discussion on a podcast. I feel like some of the, some of the thoughts are raw, right? Like, they're not gonna be as, as nuanced. Like we'll just say something completely stupid during our discussions.[00:25:48] I don't know, , maybe that's exciting. Maybe that's, it's kinda like a justin.tv, but for ML papers, Okay, cool. I watched that.[00:25:55] swyx: Okay, so co-pilot is 12 billion parameters. Salesforce cogen is up to 16. G P t three is 175. GP four is gonna be 100 trillion billion. Yeah. So what, what we landed on with you is with, uh, with Cilla, is that we now have an idea of what compute optimal data scaling is.[00:26:14] Yeah. Which is about 20 times parameters. Is that intuitive to you? Like what, what did that[00:26:18] Varun Mohan: unlock? I think basically what this shows is that bigger models are like more data efficient, like given the same number of tokens, a big model like trained on the same number of tokens. A bigger model is like, is gonna learn more basically.[00:26:32] But also at the same time, the way you have to look at it is there are more flops to train a bigger model on the same number of tokens. So like let's say I had a 10 billion parameter model and I trained it on on 1 million tokens, but then I had a 20 billion parameter model at the end of it will be a better.[00:26:47] It will have better perplexity numbers, which means like the probability of like a prediction is gonna be better for like the next token is gonna be better. But at the end of it, you did burn twice the amount of compute on it. Right? So Shinto is an interesting observation, which says if you have a fixed compute budget, And you want the best model that came out of it because there's like a difference here where a model that is, that is smaller, trained on the same number of tokens as fewer flops.[00:27:12] There's a a sweet spot of like number of tokens and size a model. I will say like people probably like. Are talking about it more than they should, and, and I'll, I'll explain why, but it's a useful result, which is like, let's say I have, you know, some compute budget and I want the best model. It tells you what that, what you should generate.[00:27:31] The problem I think here is there is a real trade off of like, you do need to run this model somewhere. You need to run it on a piece of hardware. So then it comes down to how much memory does that piece of hardware have. Let's say for a fixed compute budget, you could train a 70 billion parameter. What are you gonna put that on?[00:27:47] Yeah, maybe you could, could you put that on an 80 gig, A 100? It would be a stretch. You could do things like f, you know, in eight F p a, to reduce the amount of memory that's on the box and do all these other things. But you have to think about that first, right? When you want to go out and train that model.[00:27:59] The worst case is you ended up training that mo, that model, and you cannot serve it. So actually what you end up finding is for a lot of these code completion models, they are actually what you would consider over-trained . So by that I mean like, let's look at a model like Cogen. It's actually trained on, I believe, and, and I could be wrong by, you know, a hundred billion here or there.[00:28:18] I got some data. Oh, okay. Let's look at the 3 billion parameter model. It's a 2.7. I think it's actually a 2.7 billion barometer model. It's weird because they also trained on natural language on top of code, but it's trained on hundreds of billions of tokens. If you applied that chinchilla, Optimization to it, you'd be like, wow, this is, this is a stupid use of compute.[00:28:36] Right? Because three, they should be going to 60, any anything more than 60. And they're like, they should have just increased the model size. But the reality is if they had like the compute optimal one might not be one that's easy to serve, right? It could just have more parameters. And for our case, our models that we train internally, they might not be the most compute.[00:28:56] In other words, we probably could have had a better model by making it larger, but the trade off would've been latency. We know what the impact of having higher latency is, and on top of that, being able to fit properly on our hardware constraints would've also been a concern.[00:29:08] swyx: Isn't the classic stopping point when you, you see like loss kind of levels off.[00:29:12] Right now you're just letting chinchilla tell you,[00:29:16] Varun Mohan: but like you should just look at loss. The problem is the loss will like continue to go down. It'll just continue to go down like, like in a, in a way that's like not that pleasing. It's gonna take longer and longer. It's gonna be painful, but it's like one of those things where if you look at the perplexity number of difference between.[00:29:31] Let's say a model that's like 70 billion versus 10 billion. It's not massive. It's not like tens of percentage points. It's like very small, right? Mm. The reality is here, like, I mean this comes down to like IQ of like these models in some sense, like small wins at the margins are massive wins in terms of iq.[00:29:47] Like it's harder to get those and they don't look as big, but they're like massive wins in terms of reasoning. They can now do chain of thought, all these other things. Yeah, yeah, yeah.[00:29:55] swyx: It's, and, and so apparently unlocked around the[00:29:57] Varun Mohan: 20 billion. Yes. That's right. Some kind of magic. Yeah. I think that was from the UL two or maybe one of those land papers.[00:30:03] Any thoughts on why? Like is there is? I don't know. I mean, emergence of intelligence, I think. I think maybe one of the things is like we don't even know, maybe like five years from now of what we're gonna be running are transformers. But I think it's like, we don't, we don't 100% know that that's true. I mean, there's like a lot of maybe issues with the current version of the transformers, which is like the way attention works, the attention layers work, the amount of computers quadratic in the context sense, because you're like doing like an n squared operation on the attention blocks basically.[00:30:30] And obviously, you know, one of the things that everyone wants right now is infinite context. They wanna shove as much prop as possible in here. And the current version of what a transformer looks like is maybe not ideal. You might just end up burning a lot of flops on this when there are probably more efficient ways of doing it.[00:30:45] So I'm, I'm sure in the future there's gonna be tweaks to this. Yeah. Uh, but it is interesting that we found out interesting things of like, hey, bigger is pretty much always better. There are probably ways of making smaller models significantly better through better data. That is like definitely true. Um, And I think one of the cool things that the stack showed actually was they did a, like a, I think they did some ablation studies where they were like, Hey, what happens if we do, if we do decontamination of our data, what happens if we do de-duplication?[00:31:14] What happens if we do near dup of our data and how does the model get better? And they have like some compelling results that showcase data quality really matters here, but ultimately, Yeah, I think it is an interesting result that at 20 billion there's something happening. But I also think like some of these things in the future may look materially different than what they look like right now.[00:31:30] Hmm. Do you think[00:31:31] Alessio Fanelli: the token limitation is actually a real architectural limitation? Like if you think about the tokens need as kind of like atic, right? Like once you have. 50,000 tokens context, like 50,000 or infinite. For most use cases, it's like the same. Where do you think that number is, especially as you think about code, like some people have very large code bases, there's a lot.[00:31:53] Have you done any work there to figure out where the sweet[00:31:55] Varun Mohan: spot is? Yeah, look, I think what's gonna really end up happening is if people come up with a clever way and, and it, there was some result research that I believe came out of Stanford. I think the team from the Helm group, I think came out with some architecture that looks a little bit different than Transformers, and I'm sure something like this will work in the future.[00:32:13] What I think is always gonna happen is if you find a cheap way to embed context, people are gonna figure out a way to, to put as much as possible in because L LM so far have been like virtually stateless. So the only thing that they have beyond fine tuning is like just shoveling everything you can inside.[00:32:28] And there are some interesting papers, like retro, actually there are maybe some interesting pieces of thought like ideas that have come out recently. Yeah, let's go through them. So one of the really interesting ideas, I think is retro. It's this paper that came out of DeepMind and the idea is actually, let's say you send out, you send out, uh, a prompt.[00:32:44] Okay? Send out a prompt. You compute the burt embedding of that. And then you have this massive embedding database. And by massive, I'm not talking about like gigabytes, I'm talking about terabytes. Like you have, geez, you actually have 10 times the number of tokens as what was used to train the model. So like, let's say you had a model that was trained on a trillion tokens, you have a 10 trillion embed, uh, like embedding database.[00:33:04] And obviously Google has this because they have all content that ever existed in humanity and they have like the best data set and sort of, they were able to make one of these, uh, embedding databases. But the idea here, which is really cool, is you end. Taking your prompt, computing, the bird, embedding you find out the things that were nearby.[00:33:20] So you do roughly like a semantic search or an embedding search within that. And then you take those, you take the documents that were from those embeddings and you shove those in the model too, in what are called like cross chunked attention. So you like shove them in the model with it as well.[00:33:34] Suddenly now the model is able to take in external. Which is really exciting actually, because suddenly now you're able to get dynamic context in, and the model in some sense is deciding what that context is. It's not deciding it completely. In this case, because the Bert model in this case was actually frozen.[00:33:50] It wasn't trained with the retro model as well, but. The idea is you're somehow adding or augmenting context, which I think is like quite exciting. There's probably two futures. Either context becomes really cheap. Right now it's quadratic. Maybe there's a future where it becomes linear in the, in the size of the context, but the future might actually be the model itself dictates, Hey, I have this context.[00:34:10] You have this data source. Give me this. The model itself is going out into your database and like being like, I want this information, and this is kind of like. What Bing search is looking like. Right? Or bing chat is sort of looking like where it's like I, the model is probably, there's probably some model that's saying I want this information.[00:34:27] And that is getting augmented into the context. Now the model itself knows what context it sort of has and it can sort of like build a state machine of sort of what it needs. And that's probably what the future of this looks like. So you, you[00:34:37] swyx: predict monster embedding database[00:34:39] Varun Mohan: companies? Probably Monster embedding database companies or, yeah.[00:34:43] The model in some sense will need to talk to, Talk to these embedding databases. I'm actually not convinced that the current breed of embedding database companies are like ready for what the future sort of looks like. I think I'm just looking at their pricing, how much it costs per gigabyte and it's prohibitive at the scale we're talking about, like let's say you actually did want to host a 10 terabyte embedding database.[00:35:03] A lot of them were created, let's say two years ago, two, three years ago, where people were like, you know, embedding databases are small and they need to make the cost economics work. But maybe, yeah, there's probably gonna be a big workload there. I will just say for us, we will probably just build this in-house to start with, and that's because I think the technology probably isn't there.[00:35:20] And I think that the technology isn't there yet. Like waiting on point solutions to come up is a lot harder, um, than probably building it up. The way I, I like to think about this is probably the world looks on the LM space. Looks like how the early internet days were, where I think the value was accrued to probably like Google and Google needed to figure out all the crazy things to make their workload work.[00:35:41] And the reason why they weren't able to outsource is, is no one else was feeling the pain. ,[00:35:46] swyx: they're just solving their own pain points. They're just solving their own pain points. They're so far ahead of everyone else. Yes, yes. And just wait[00:35:50] Varun Mohan: for people to catch up. Yes. Yes. And that's maybe different than how things like Snowflake look where the interface has been decided for what SQL looks like 50 years ago.[00:35:58] And because of that, you can go out and build the best database and Yeah, like everyone's gonna be like, this doesn't make my beer taste better. And buy your database basically. That's[00:36:08] swyx: a great reference, by the way. Yeah. We have some friends of the, the pod that are working on embedding database, so we'll try to connect you Toroma[00:36:14] Varun Mohan: and see.[00:36:14] Yeah. Oh, I actually know Anton. I worked with him at Neuro. Oh. Although, there you go. Yeah. Uh, what do you, well, what do you think about, I mean,[00:36:20] swyx: so chromas pivoting towards an embedding[00:36:22] Varun Mohan: database. I think it's an interesting idea. I think it's an interesting idea. I wonder what the early set of workloads that.[00:36:27] They will hit our, and you know what the scaling requirements are. This is maybe the classic thing where like, the teams are great, but you need to pick a workload here that you care about the most. You could build anything. You could build anything. When you're an infrastructure company, you can go in, if I was selling, serving in for, I could build, serving for like linear aggression.[00:36:44] I could build this, but like, unless you hit the right niche for the end user, it's gonna be. . So I think it, I'm excited to see what comes out and if they're great, then we'll use it. Yeah.[00:36:54] swyx: I also like how you slowly equated yourself to Google there. Oh, we're not, we're not Google. You're, you're gonna be the Google of ai.[00:37:00] Varun Mohan: We're definitely, we're definitely not Google. But I was just saying in terms of like, if you look at like the style of companies that came out. Yeah. You know? Absolutely. Or maybe we should live in the cutting edge in[00:37:08] swyx: the future. Yeah. I think that's the pitch.[00:37:10] Varun Mohan: Okay, thanks for b***h us.[00:37:13] Alessio Fanelli: So you just mentioned the older vector embedding source are kind of not made for the L l M generation of compute size.[00:37:21] what does l LM ops look like? You know, which pieces need to be drastically different? Which ones can we recycle?[00:37:27] Varun Mohan: Yeah. One of the things that we've found, like in our own thing of building code that's been just shows how much is missing, and this is the thing where like, I don't know how much of this you can really outsource, which is like we needed to build eval infrastructure.[00:37:40] That means how do you build a great code? And there are things online like human eval, right? And uh, I was telling, which is the benchmark telling Sean about this, the idea of human eval is really neat for code. The idea is you provide a bunch of functions with Docstrings and the eval instead of being, did you predict next token?[00:37:56] It's like, did you generate the entire function and does the function run correctly against a bunch of unit tests? Right. And we've built more sophisticated evals to work on many languages, to work on more variety of code bases. One of the issues that ends up coming up with things like human eval is contam.[00:38:12] Because a lot of these, uh, things that train models end up training on all of GitHub GitHub itself has human eva, so they end up training on that. And then the numbers are tiny, though. It's gonna be tiny, right? But it doesn't matter if it's tiny because it'll just remember it. It'll remember that it's, it's not that it's that precise, but it will, it's like, it's basically like mixing your, your training and validation set.[00:38:32] It's like, oh, yeah, yeah, yeah, yeah. But we've seen cases where like online where someone is like, we have a code model that's like, they we're like, we did this one thing, and HU and human eval jumped a ton and we were just like, huh, did human eval get into your data set? Is that really what happened there?[00:38:46] But we've needed to build all this eval. And what is shown is data cleaning is massive, but data cleaning looks different by. Like code data cleaning is different than what is a high quality piece of code is probably different than what's a high quality legal document. Yeah. And then on top of that, how do you eval this?[00:39:01] How do you also train it at scale at whatever cost you really want to get? But those are things that the end user is either gonna need to solve or someone else is gonna need to solve for them. And I guess maybe one of the things I'm a little bearish on is if another company comes out and solves eval properly for a bunch of different verticals, what was the company that they were selling to really?[00:39:21] What were they really doing at that point? If they themselves were not eval for their own workload and all these other things? I think there are cases where, let's say for code where we probably couldn't outsource our eval, like we wouldn't be able to ship models internally if we didn't know how to eval, but it's clear that there's a lot of different things that people need to take.[00:39:38] Like, Hey, maybe there's an embedding piece. How large is this embedding database actually need to be? But hey, this does look very different than what classic ML ops probably did. Mm-hmm. . How[00:39:47] Alessio Fanelli: do you compare some of these models? Like when you're thinking about model upgrading and making changes, like what does the testing piece of it internally?[00:39:56] Yeah. For us look like.[00:39:56] Varun Mohan: For us, it's like old school AB testing. We've built like infrastructure to be able to say, ramp up users from one to 10 to. 50% and slowly roll things out. This is all classic software, uh, which[00:40:09] swyx: you do in-house. You don't, you don't buy any[00:40:10] Varun Mohan: services. We don't buy services for that.[00:40:13] There are good services, open source services that help you just don't need them. Uh, yeah, I think that's just like not the most complicated thing for us. Sure. Basically. Yeah. Uh, but I think in the future, maybe, we'll, obviously we use things like Google Analytics and all this other stuff, but Yeah. For things of ramping our models, finding out if they're actually better because the eval also doesn't tell the whole story because also for us, Even before generating the prompt, we do a lot of work.[00:40:36] And the only way to know that it's really good across all the languages that our users need to tell us that it's actually good. And, and they tell us by accepting completions. So, so GitHub[00:40:44] swyx: co-pilot, uh, the extension does this thing where they, they like, they'll set a timer and then within like five minutes, 10 minutes, 20 minutes, they'll check in to see if the code is still there.[00:40:54] I thought it was a[00:40:54] Varun Mohan: pretty creative way. It's, it's a very, it's honestly a very creative way. We do do things to see, like in the long term, if people did. Accept or write things that are roughly so because they could accept and then change their minds. They could accept and then change their minds. So we, we are mindful of, of things like that.[00:41:09] But for the most part, the most important metric is at the time, did they actually, did we generate value? And we want to know if that's true. And it's, it's kind of, it's honestly really hard to get signal unless you have like a non-trivial amount of usage, non-trivial, meaning you're getting, you're doing hundreds of thousands of completions, if not millions of completions.[00:41:25] That sounds like, oh wow. Like, that's like a very small amount. But like it's classic. Maybe like if you look at like when I used to be an intern at Quora, like, you know, now more than seven, eight years ago. When I was there, I like shipped a change and then Cora had like millions of daily actives and then it looked like it was good, and then a week later it was just like way worse.[00:41:43] And how is this possible? Like in a given hour we get like hundreds of thousands of interaction, just like, no, you just need way more data. So this is like one of those things where I think having users is like genuinely very valuable to us, basically. Users is all you need. . Yeah.[00:41:59] swyx: Um, by the way, since you brought out Quora, have you tried po any, any thoughts[00:42:03] Varun Mohan: on po I have not actually tried po I've not actually tried.[00:42:05] I[00:42:05] swyx: mean, it seems like a question answering website that's been around for 20 years or something. Would be very, would be very good at question answering. Yeah.[00:42:12] Varun Mohan: Also Adam, the ceo, is like incredibly brilliant. That guy is like insanely smart, so I'm sure they're gonna do,[00:42:18] swyx: they have accidentally built the perfect like data collection company for For qa.[00:42:22] Varun Mohan: Yeah. . It takes a certain kind of person to go and like cannibalize your original company like the in, I mean, it was kinda stagnant for like a few years. Yeah, that's probably true. That's[00:42:31] swyx: probably true. The observation is I feel like you have a bias to its domain specific. , whereas most research is skewed towards, uh, general models, general purpose models.[00:42:40] I don't know if there's like a, a deeper insight here that you wanna go into or, or not, but like, train on all the things, get all the data and you're like, no, no, no. Everyone needs like customized per task,[00:42:49] Varun Mohan: uh, data set. Yeah. I think I'm not gonna. Say that general intelligence is not good. You want a base model that's still really good and that's probably trained on normal text, like a lot of different content.[00:43:00] But I think probably one thing that old school machine learning, even though I'm like the kind of person that says a lot of old school machine learning is just gonna die, is that training on a high quality data set for your workload is, is always gonna yield better results and more, more predictable results.[00:43:15] And I think we are under no illusions that that's not the case. Basical. And[00:43:19] swyx: then the other observation is bandwidth and connectivity, uh, which is not something that people usually think about, but apparently is a, is a big deal. Apparently training agreed in the synchronous needs, high GPU coordination.[00:43:29] These are deleted notes from Sam Altman talking about how they think about training and I was like, oh yeah, that's an insight. And[00:43:34] Varun Mohan: you guys have the same thing. Yeah. So I guess for, for training, you're right in that it is actually nuts to think about how insane the networks are for NVIDIA's most recent hardware, it's.[00:43:46] For the H 100 boxes, you shove eight of these H 100 s on a. Between two nodes. The bandwidth is 3,200 gigabits a second, so 400 gigabytes a second between machines. That's like nuts when you just sit and think about it. That's like double the memory bandwidth of what a CPU has, but it's like between two machines.[00:44:04] On top of that, within the machine, they've created this, this fabric called envy link that allows you to communicate at ultra low latency. That's even lower than P C I E. If you're familiar, that's like the communication protocol. . Yeah, between like the CPU and the other devices or other P C I E devices.[00:44:21] All of this is to make sure that reductions are fast, low latency, and you don't need to think about it. And that's because like a lot of deep learning has sort of evolved. Uh, training has evolved to be synchronous in the OG days. There is a lot of analysis in terms of how good is asynchronous training, which is like, Hey, I have a node, it has a current state of the model.[00:44:39] It's gonna update that itself locally, and it'll like every once in a while, go to another machine and update the weights. But I think like everyone has converged to synchronous. I'm not exactly sure. There's not a lot of good research on asynchronous training right now. Or maybe there is an, I haven't read it.[00:44:52] It's just that there isn't as much research because people are just like, oh, synchronous works. Uh, and the hardware is continually upleveled to handle[00:44:59] swyx: that. Yeah. It was just un unintuitive to me cuz like the whole purpose of GPUs could train things. A lot of things in parallel. Yes.[00:45:05] Varun Mohan: But the crazy thing is also, maybe I can, I can give some dumb math here.[00:45:09] Sure. Here, which is that, uh, let's go with uh, G B T three, which is like 170 billion per. The optimizer state, so while you're training is 14 times the size of the model, so in this case, if it's like 170 billion parameters, it's probably, I'm not great at mental math here, but that's probably around 2.5 terabytes to just store the optimizer state.[00:45:30] That has gotta be sharded across a lot of machines. Like that is not a single gpu. Even if you take an H 100 with 80 gigs to just shard that much, that's like 40, at least 30 machines. So there's like something there where these things need to communicate with each other too.[00:45:44] swyx: You need to vertically scale horizontally.[00:45:46] Varun Mohan: Yeah. You gotta co-located, you gotta somehow feel like you have this massive, the, the ideal programming paradigm is you feel like you have this massive computer. That has no communication, you know, overhead at all, but it has like infinite computer and infinite memory bandwidth.[00:45:59] swyx: That's the AI cluster. Um, okay, well, uh, we want to head to the questions.[00:46:05] Alessio Fanelli: So favorite AI product that you are not[00:46:08] Varun Mohan: building? Yeah, I'm friends with some of the folks at Mid Journey and I really think the Mid Journey product is super cool, especially seeing how the team is iterating and the quality of generations. It consistently gets upleveled. I think it's like quite neat and I think internally at at exa functional, we've been trying out mid Journey for like random content to like generate images and stuff.[00:46:26] Does it bother[00:46:26] swyx: you that they have like a style. I don't know. It, it seems like they're hedging themselves into a particular, like you want mid journey art, you go there.[00:46:33] Varun Mohan: Yeah. It's a brand of art. Yeah, you're right. I think they do have a style, but it seems more predictably good for that style. Okay. So maybe that's too, so just get good at, uh, domain specific thing.[00:46:41] Yeah. Yeah. maybe. Maybe I, maybe I'm just selling, talking to a booker right now. . Yeah. Uh, okay.[00:46:46] swyx: Uh, next question. Uh, favorite AI people and[00:46:48] Varun Mohan: communities? Yeah, so I think I mentioned this before, but I think obviously the open. The opening eye folks are, are insane. Like we, we only have respect for them. But beyond that, I think Elu is a pretty special group.[00:46:59] Especially it's been now probably more than a year and a half since they released like G P T J, which was like back when open source G PT three Curri, which was comparable. And it wasn't like a model where like, It wasn't good. It was like comparable in terms of perplexity to GT three curity and it was trained by a university student actually, and it just showed that, you know, in the end, like I would say pedigree is great, but in if you have people that are motivated know how computers work and they're willing to just get their hands dirty, you can do crazy things and that was a crazy project that gave me more hope.[00:47:34] Decentral training being potentially pretty massive. But I think that was like a very cool thing where a bunch of people just got on Discord and were chatting and they were able to just turn this out. Yeah. I did[00:47:42] swyx: not know this until I looked in further into Luther, but it was not a formal organization.[00:47:45] Was a company was a startup. It's not, yeah. Bunch of guys on Discord.[00:47:48] Varun Mohan: They gotta you, they gotta keep you research grant and they somehow just wrote some codes. .[00:47:52] Alessio Fanelli: Yeah. Yeah. Listen to APAC with Connor, who's the person, and basically Open Eye at the time was like, we cannot release G P T because it's like too good and so bad.[00:48:01] And he was like, He actually said he was sick, so he couldn't leave home for like a, a few weeks. So it was like, what else am I gonna do? And ended up
Want to learn more? Check out our Membership! http://bit.ly/3SrEkwE We designed this circuit board for beginners! Kit-On-A-Shield: https://amzn.to/3lfWClU Arduino MEGA video: https://www.youtube.com/watch?v=lai4aGdc78A PEA Customer Projects Page: https://www.programmingelectronics.com/customer-project-gallery/ FOLLOW US ELSEWHERE --------------------------------------------------- Facebook: https://www.facebook.com/ProgrammingElectronicsAcademy/ Twitter:https://twitter.com/ProgElecAcademy Website: https://www.programmingelectronics.com/
Nesse episódio trouxemos as notícias e novidades do mundo da programação que nos chamaram atenção dos dias 18/02 a 24/02!
Nesse episódio trouxemos as notícias e novidades do mundo da programação que nos chamaram atenção dos dias 18/02 a 24/02!
We're so glad to launch our first podcast episode with Logan Kilpatrick! This also happens to be his first public interview since joining OpenAI as their first Developer Advocate. Thanks Logan!Recorded in-person at the beautiful StudioPod studios in San Francisco. Full transcript is below the fold.Timestamps* 00:29: Logan's path to OpenAI* 07:06: On ChatGPT and GPT3 API* 16:16: On Prompt Engineering* 20:30: Usecases and LLM-Native Products* 25:38: Risks and benefits of building on OpenAI* 35:22: OpenAI Codex* 42:40: Apple's Neural Engine* 44:21: Lightning RoundShow notes* Sam Altman's interview with Connie Loizos* OpenAI Cookbook* OpenAI's new Embedding Model* Cohere on Word and Sentence Embeddings* (referenced) What is AGI-hard?Lightning Rounds* Favorite AI Product: https://www.synthesia.io/* Favorite AI Community: MLOps * One year prediction: Personalized AI, https://civitai.com/* Takeaway: AI Revolution is here!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx writer editor of L Space Diaries. Hey.[00:00:20] swyx: Hey . Our guest today is Logan Kilpatrick. What I'm gonna try to do is I'm gonna try to introduce you based on what people know about you, and then you can fill in the blanks.[00:00:28] Introducing Logan[00:00:28] swyx: So you are the first. Developer advocate at OpenAI, which is a humongous achievement. Congrats. You're also the lead developer community advocate of the Julia language. I'm interested in a little bit of that and apparently as I've did a bit of research on you, you got into Julia through NASA where you interned and worked on stuff that's gonna land on the moon apparently.[00:00:50] And you are also working on computer vision at Apple. And had to sit at path, the eye as you fell down the machine learning rabbit hole. What should people know about you that's kind of not on your LinkedIn that like sort of ties together your interest[00:01:02] Logan Kilpatrick: in story? It's a good question. I think so one of the things that is on my LinkedIn that wasn't mentioned that's super near and dear to my heart and what I spend a lot of time in sort of wraps a lot of my open source machine learning developer advocacy experience together is supporting NumFOCUS.[00:01:17] And NumFOCUS is the nonprofit that helps enable a bunch of the open source scientific projects like Julia, Jupyter, Pandas, NumPy, all of those open source projects are. Facilitated legal and fiscally through NumFOCUS. So it's a very critical, important part of the ecosystem and something that I, I spend a bunch of my now more limited free time helping support.[00:01:37] So yeah, something that's, It's on my LinkedIn, but it's, it's something that's important to me. Well,[00:01:42] swyx: it's not as well known of a name, so maybe people kind of skip over it cuz they were like, I don't know what[00:01:45] Logan Kilpatrick: to do with this. Yeah. It's super interesting to see that too. Just one point of context for that is we tried at one point to get a Wikipedia page for non focus and it's, it's providing, again, the infrastructure for, it's like a hundred plus open source scientific projects and they're like, it's not notable enough.[00:01:59] I'm like, well, you know, there's something like 30 plus million developers around the world who use all these open source tools. It's like the foundation. All open source like science that happens. Every breakthrough in science is they discovered the black hole, the first picture of the black hole, all that stuff using numb focus tools, the Mars Rovers, NumFOCUS tools, and it's interesting to see like the disconnect between the nonprofit that supports those projects and the actual success of the projects themselves.[00:02:26] swyx: Well, we'll, we'll get a bunch of people focused on NumFOCUS and we'll get it on Wikipedia. That that is our goal. . That is the goal. , that is our shot. Is this something that you do often, which is you? You seem to always do a lot of community stuff. When you get into something, you're also, I don't know where this, where you find time for this.[00:02:42] You're also a conference chair for DjangoCon, which was last year as well. Do you fall down the rabbit hole of a language and then you look for community opportunities? Is that how you get into.[00:02:51] Logan Kilpatrick: Yeah, so the context for Django stuff was I'd actually been teaching and still am through Harvard's division of continuing education as a teaching fellow for a Django class, and had spent like two and a half years actually teaching students every semester, had a program in Django and realized that like it was kind of the one ecosystem or technical tool that I was using regularly that I wasn't actually contributing to that community.[00:03:13] So, I think sometime in 2021 like applied to be on the board of directors of the Django Events Foundation, north America, who helps run DjangoCon and was fortunate enough to join a support to be the chair of DjangoCon us and then just actually rolled off the board because of all the, all the craziness and have a lot less free time now.[00:03:32] And actually at PATH ai. Sort of core product was also using, was using Django, so it also had a lot of connections to work, so it was a little bit easier to justify that time versus now open ai. We're not doing any Django stuff unfortunately, so, or[00:03:44] swyx: Julia, I mean, should we talk about this? Like, are you defecting from Julia?[00:03:48] What's going on? ,[00:03:50] Logan Kilpatrick: it's actually felt a little bit strange recently because I, for the longest time, and, and happy to talk about this in the context of Apple as well, the Julie ecosystem was my outlet to do a lot of the developer advocacy, developer relations community work that I wanted to do. because again, at Apple I was just like training machine learning models.[00:04:07] Before that, doing software engineering at Apple, and even at Path ai, we didn't really have a developer product, so it wasn't, I was doing like advocacy work, but it wasn't like developer relations in the traditional sense. So now that I'm so deeply doing developer relations work at Open OpenAI, it's really difficult to.[00:04:26] Continue to have the energy after I just spent nine hours doing developer relations stuff to like go and after work do a bunch more developer relations stuff. So I'll be interested to see for myself like how I'm able to continue to do that work and I. The challenge is that it's, it's such critical, important work to happen.[00:04:43] Like I think the Julie ecosystem is so important. I think the language is super important. It's gonna continue to grow in, in popularity, and it's helping scientists and engineers solve problems they wouldn't otherwise be able to. So it's, yeah, the burden is on me to continue to do that work, even though I don't have a lot of time now.[00:04:58] And I[00:04:58] Alessio Fanelli: think when it comes to communities, the machine learning technical community, I think in the last six to nine months has exploded. You know, you're the first developer advocate at open ai, so I don't think anybody has a frame of reference on what that means. What is that? ? So , what do you, how did, how the[00:05:13] swyx: job, yeah.[00:05:13] How do you define the job? Yeah, let's talk about that. Your role.[00:05:16] Logan Kilpatrick: Yeah, it's a good question and I think there's a lot of those questions that actually still exist at OpenAI today. Like I think a lot of traditional developed by advocacy, at least like what you see on Twitter, which I think is what a lot of people's perception of developer advocacy and developer relations is, is like, Just putting out external content, going to events, speaking at conferences.[00:05:35] And I think OpenAI is very unique in the sense that, at least at the present moment, we have so much inbound interest that there's, there is no desire for us to like do that type of developer advocacy work. So it's like more from a developer experience point of view actually. Like how can we enable developers to be successful?[00:05:53] And that at the present moment is like building a strong foundation of documentation and things like that. And we had a bunch of amazing folks internally who were. Who were doing some of this work, but it really wasn't their full-time job. Like they were focused on other things and just helping out here and there.[00:06:05] And for me, my full-time job right now is how can we improve the documentation so that people can build the next generation of, of products and services on top of our api. And it's. Yeah. There's so much work that has to happen, but it's, it's, it's been a ton of fun so far. I find[00:06:20] swyx: being in developer relations myself, like, it's kind of like a fill in the blanks type of thing.[00:06:24] Like you go to where you, you're needed the most open. AI has no problem getting attention. It is more that people are not familiar with the APIs and, and the best practices around programming for large language models, which is a thing that did not exist three years ago, two years ago, maybe one year ago.[00:06:40] I don't know. When she launched your api, I think you launched Dall-E. As an API or I, I don't[00:06:45] Logan Kilpatrick: know. I dunno. The history, I think Dall-E was, was second. I think it was some of the, like GPT3 launched and then GPT3 launched and the API I think like two years ago or something like that. And then Dali was, I think a little more than a year ago.[00:06:58] And then now all the, the Chachi Beast ChatGPT stuff has, has blown it all outta the water. Which you have[00:07:04] swyx: a a wait list for. Should we get into that?[00:07:06] Logan Kilpatrick: Yeah. .[00:07:07] ChatGPT[00:07:07] Alessio Fanelli: Yeah. We would love to hear more about that. We were looking at some of the numbers you went. Zero to like a million users in five days and everybody, I, I think there's like dozens of ChatGPT API wrappers on GitHub that are unofficial and clearly people want the product.[00:07:21] Like how do you think about that and how developers can interact with it.[00:07:24] Logan Kilpatrick: It. It's absolutely, I think one of the most exciting things that I can possibly imagine to think about, like how much excitement there was around ChatGPT and now getting to hopefully at some point soon, put that in the hands of developers and see what they're able to unlock.[00:07:38] Like I, I think ChatGPT has been a tremendous success, hands down without a question, but I'm actually more excited to see what developers do with the API and like being able to build those chat first experiences. And it's really fascinating to see. Five years ago or 10 years ago, there was like, you know, all this like chatbot sort of mm-hmm.[00:07:57] explosion. And then that all basically went away recently, and the hype went to other places. And I think now we're going to be closer to that sort of chat layer and all these different AI chat products and services. And it'll be super interesting to see if that sticks or not. I, I'm not. , like I think people have a lot of excitement for ChatGPT right now, but it's not clear to me that that that's like the, the UI or the ux, even though people really like it in the moment, whether that will stand the test of time, I, I just don't know.[00:08:23] And I think we'll have to do a podcast in five years. Right. And check in and see whether or not people are still really enjoying that sort of conversational experience. I think it does make sense though cause like that's how we all interact and it's kind of weird that you wouldn't do that with AI products.[00:08:37] So we. and I think like[00:08:40] Alessio Fanelli: the conversational interface has made a lot of people, first, the AI to hallucinate, you know, kind of come up with things that are not true and really find all the edge cases. I think we're on the optimism camp, you know, like we see the potential. I think a lot of people like to be negative.[00:08:56] In your role, kind of, how do you think about evangelizing that and kind of the patience that sometimes it takes for these models to become.[00:09:03] Logan Kilpatrick: Yeah, I think what, what I've done is just continue to scream from the, the mountains that like ChatGPT has, current form is definitely a research preview. The model that underlies ChatGPT GPT 3.5 is not a research preview.[00:09:15] I think there's things that folks can do to definitely reduce the amount of hall hallucinations and hopefully that's something that over time I, I, again have full confidence that it'll, it'll solve. Yeah, there's a bunch of like interesting engineering challenges. you have to solve in order to like really fix that problem.[00:09:33] And I think again, people are, are very fixated on the fact that like in, you know, a few percentage points of the conversations, things don't sound really good. Mm-hmm. , I'm really more excited to see, like, again when the APIs and the Han developers like what are the interesting solutions that people come up with, I think there's a lot that can be explored and obviously, OpenAI can explore all them because we have this like one product that's using the api.[00:09:56] And once you get 10,000, a hundred thousand developers building on top of that, like, we'll see what are the different ways that people handle this. And I imagine there's a lot of low-hanging fruit solutions that'll significantly improve the, the amount of halluc hallucinations that are showing up. Talk about[00:10:11] swyx: building on top of your APIs.[00:10:13] Chat GPTs API is not out yet, but let's assume it is. Should I be, let's say I'm, I'm building. A choice between GP 3.5 and chat GPT APIs. As far as I understand, they are kind of comparable. What should people know about deciding between either of them? Like it's not clear to me what the difference is.[00:10:33] Logan Kilpatrick: It's a great question.[00:10:35] I don't know if there's any, if we've made any like public statements about like what the difference will be. I think, I think the point is that the interface for the Chachi B API will be like conversational first, and that's not the case now. If you look at text da Vinci oh oh three, like you, you just put in any sort of prompt.[00:10:52] It's not really built from the ground up to like keep the context of a conversation and things like that. And so it's really. Put in some sort of prompt, get a response. It's not always designed to be in that sort of conversational manner, so it's not tuned in that way. I think that's the biggest difference.[00:11:05] I think, again, the point that Sam made in a, a strictly the strictly VC talk mm-hmm. , which was incredible and I, I think that that talk got me excited and my, which, which part? The whole thing. And I think, I haven't been at open AI that long, so like I didn't have like a s I obviously knew who Sam was and had seen a bunch of stuff, but like obviously before, a lot of the present craziness with Elon Musk, like I used to think Elon Musk seemed like a really great guy and he was solving all these really important problems before all the stuff that happened.[00:11:33] That's a hot topic. Yeah. The stuff that happened now, yeah, now it's much more questionable and I regret having a Tesla, but I, I think Sam is actually. Similar in the sense that like he's solving and thinking about a lot of the same problems that, that Elon, that Elon is still today. But my take is that he seems like a much more aligned version of Elon.[00:11:52] Like he's, he's truly like, I, I really think he cares deeply about people and I think he cares about like solving the problems that people have and wants to enable people. And you can see this in the way that he's talked about how we deploy models at OpenAI. And I think you almost see Tesla in like the completely opposite end of the spectrum, where they're like, whoa, we.[00:12:11] Put these 5,000 pound machines out there. Yeah. And maybe they'll run somebody over, maybe they won't. But like it's all in the interest of like advancement and innovation. I think that's really on the opposite end of the spectrum of, of what open AI is doing, I think under Sam's leadership. So it's, it's interesting to see that, and I think Sam said[00:12:30] Alessio Fanelli: that people could have built Chen g p t with what you offered like six, nine months ago.[00:12:35] I[00:12:35] swyx: don't understand. Can we talk about this? Do you know what, you know what we're talking about, right? I do know what you're talking about. da Vinci oh three was not in the a p six months before ChatGPT. What was he talking about? Yeah.[00:12:45] Logan Kilpatrick: I think it's a little bit of a stretch, but I do think that it's, I, I think the underlying principle is that.[00:12:52] The way that it, it comes back to prompt engineering. The way that you could have engineered, like the, the prompts that you were put again to oh oh three or oh oh two. You would be able to basically get that sort of conversational interface and you can do that now. And, and I, you know, I've seen tutorials.[00:13:05] We have tutorials out. Yep. No, we, I mean, we, nineties, we have tutorials in the cookbook right now in on GitHub. We're like, you can do this same sort of thing. And you just, it's, it's all about how you, how you ask for responses and the way you format data and things like that. It. The, the models are currently only limited by what people are willing to ask them to do.[00:13:24] Like I really do think that, yeah, that you can do a lot of these things and you don't need the chat CBT API to, to build that conversational layer. That is actually where I[00:13:33] swyx: feel a little bit dumb because I feel like I don't, I'm not smart enough to think of new things to ask the models. I have to see an example and go, oh, you can do that.[00:13:43] All right, I'm gonna do that for now. You know, and, and that's why I think the, the cookbook is so important cuz it's kind of like a compendium of things we know about the model that you can ask it to do. I totally[00:13:52] Logan Kilpatrick: agree and I think huge shout out to the, the two folks who I work super closely with now on the cookbook, Ted and Boris, who have done a lot of that work and, and putting that out there and it's, yeah, you see number one trending repo on, on GitHub and it was super, like when my first couple of weeks at Open ai, super unknown, like really, we were only sort of directing our customers to that repo.[00:14:13] Not because we were trying to hide it or anything, but just because. It was just the way that we were doing things and then all of a sudden it got picked up on GitHub trending and a bunch of tweets went viral, showing the repo. So now I think people are actually being able to leverage the tools that are in there.[00:14:26] And, and Ted's written a bunch of amazing tutorials, Boris, as well. So I think it's awesome that more people are seeing those. And from my perspective, it's how can we take those, make them more accessible, give them more visibility, put them into the documentation, and I don't think that that connection right now doesn't exist, which I'm, I'm hopeful we'll be able to bridge those two things.[00:14:44] swyx: Cookbook is kind of a different set of documentation than API docs, and I think there's, you know, sort of existing literature about how you document these things and guide developers the right way. What, what I, what I really like about the cookbook is that it actually cites academic research. So it's like a nice way to not read the paper, but just read the conclusions of the paper ,[00:15:03] Logan Kilpatrick: and, and I think that's, that's a shout out to Ted and Boris cuz I, I think they're, they're really smart in that way and they've done a great job of finding the balance and understanding like who's actually using these different tools.[00:15:13] So, . Yeah.[00:15:15] swyx: You give other people credit, but you should take credit for yourself. So I read your last week you launched some kind of documentation about rate limiting. Yeah. And one of my favorite things about reading that doc was seeing examples of, you know, you were, you're telling people to do exponential back off and, and retry, but you gave code examples with three popular libraries.[00:15:32] You didn't have to do that. You could have just told people, just figure it out. Right. But you like, I assume that was you. It wasn't.[00:15:38] Logan Kilpatrick: So I think that's the, that's, I mean, I'm, I'm helping sort of. I think there's a lot of great stuff that people have done in open ai, but it was, we have the challenge of like, how can we make that accessible, get it into the documentation and still have that high bar for what goes into the doc.[00:15:51] So my role as of recently has been like helping support the team, building that documentation first culture, and supporting like the other folks who actually are, who wrote that information. The information was actually already in. Help center but it out. Yeah, it wasn't in the docs and like wasn't really focused on, on developers in that sense.[00:16:10] So yeah. I can't take the, the credit for the rate limit stuff either. , no, this[00:16:13] swyx: is all, it's part of the A team, that team effort[00:16:16] On Prompt Engineering[00:16:16] Alessio Fanelli: I was reading on Twitter, I think somebody was saying in the future will be kind of like in the hair potter word. People have like the spell book, they pull it out, they do all the stuff in chat.[00:16:24] GP z. When you talk with customers, like are they excited about doing prompt engineering and kind of getting a starting point or do they, do they wish there was like a better interface? ?[00:16:34] Logan Kilpatrick: Yeah, that's a good question. I think prompt engineering is so much more of an art than a science right now. Like I think there are like really.[00:16:42] Systematic things that you can do and like different like approaches and designs that you can take, but really it's a lot of like, you kind of just have to try it and figure it out. And I actually think that this remains to be one of the challenges with large language models in general, and not just head open ai, but for everyone doing it is that it's really actually difficult to understand what are the capabilities of the model and how do I get it to do the things that I wanted to do.[00:17:05] And I think that's probably where a lot of folks need to do like academic research and companies need to invest in understanding the capabilities of these models and the limitations because it's really difficult to articulate the capabilities of a model without those types of things. So I'm hopeful that, and we're shipping hopefully some new updated prompt engineering stuff.[00:17:24] Cause I think the stuff we have on the website is old, and I think the cookbook actually has a little bit more up-to-date stuff. And so hopefully we'll ship some new prompt engineering stuff in the, in the short term. I think dispel some of the myths and rumors, but like I, it's gonna continue to be like a, a little bit of a pseudoscience, I would imagine.[00:17:41] And I also think that the whole prompt engineering being like a job in the future meme, I think is, I think it's slightly overblown. Like I think at, you see this now actually with like, there's tools that are showing up and I forgot what the, I just saw went on Twitter. The[00:17:57] swyx: next guest that we are having on this podcast, Lang.[00:17:59] Yeah. Yeah.[00:18:00] Logan Kilpatrick: Lang Chain and Harrison on, yeah, there's a bunch of repos too that like categorize and like collect all the best prompts that you can put into chat. For example, and like, that's like the people who are, I saw the advertisement for someone to be like a prompt engineer and it was like a $350,000 a year.[00:18:17] Mm-hmm. . Yeah, that was, that was philanthropic. Yeah, so it, it's just unclear to me like how, how sustainable stuff like that is. Cuz like, once you figure out the interesting prompts and like right now it's kind of like the, the Wild West, but like in a year you'll be able to sort of categorize all those and then people will be able to find all the good ones that are relevant for what they want to do.[00:18:35] And I think this goes back to like, having the examples is super important and I'm, I'm with you as well. Like every time I use Dall-E the little. While it's rendering the image, it gives you like a suggestion of like how you should ask for the art to be generated. Like do it in like a cyberpunk format. Do it in a pixel art format.[00:18:53] Et cetera, et cetera, and like, I really need that. I'm like, I would never come up with asking for those things had it not prompted me to like ask it that way. And now I always ask for pixel art stuff or cyberpunk stuff and it looks so cool. That's what I, I think,[00:19:06] swyx: is the innovation of ChatGPT as a format.[00:19:09] It reduces. The need for getting everything into your prompt in the first try. Mm-hmm. , it takes it from zero shot to a few shot. If, if, if that, if prompting as, as, as shots can be concerned.[00:19:21] Logan Kilpatrick: Yeah. , I think that's a great perspective and, and again, this goes back to the ux UI piece of it really being sort of the differentiating layer from some of the other stuff that was already out there.[00:19:31] Because you could kind of like do this before with oh oh three or something like that if you just made the right interface and like built some sort of like prompt retry interface. But I don't think people were really, were really doing that. And I actually think that you really need that right now. And this is the, again, going back to the difference between like how you can use generative models versus like large scale.[00:19:53] Computer vision systems for self-driving cars, like the, the answer doesn't actually need to be right all the time. That's the beauty of, of large language models. It can be wrong 50% of the time and like it doesn't really cost you anything to like regenerate a new response. And there's no like, critical safety issue with that, so you don't need those.[00:20:09] I, I keep seeing these tweets about like, you need those like 99.99% reliability and like the three nines or whatever it is. Mm-hmm. , but like you really don't need that because the cost of regenerating the prop is again, almost, almost. I think you tweeted a[00:20:23] Alessio Fanelli: couple weeks ago that the average person doesn't yet fully grasp how GBT is gonna impact human life in the next four, five years.[00:20:30] Usecases and LLM-Native Products[00:20:30] Alessio Fanelli: I think you had an example in education. Yeah. Maybe touch on some of these. Example of non-tech related use cases that are enabling, enabled by C G B[00:20:38] T.[00:20:39] Logan Kilpatrick: I'm so excited and, and there's a bunch of other like random threads that come to my mind now. I saw a thread and, and our VP of product was, Peter, was, was involved in that thread as well, talking about like how the use of systems like ChatGPT will unlock like pretty almost low to zero cost access to like mental health services.[00:20:59] You know, you can imagine like the same use case for education, like really personalized tutors and like, it's so crazy to think about, but. The technology is not actually , like it's, it's truly like an engineering problem at this point of like somebody using one of these APIs to like build something like that and then hopefully the models get a little bit better and make it, make it better as well.[00:21:20] But like it, I have no doubt in my mind that three years from now that technology will exist for every single student in the world to like have that personalized education experience, have a pr, have a chat based experience where like they'll be able. Ask questions and then the curriculum will just evolve and be constructed for them in a way that keeps, I think the cool part is in a way that keeps them engaged, like it doesn't have to be sort of like the same delivery of curriculum that you've always seen, and this now supplements.[00:21:49] The sort of traditional education experience in the sense of, you know, you don't need teachers to do all of this work. They can really sort of do the thing that they're amazing at and not spend time like grading assignments and all that type of stuff. Like, I really do think that all those could be part of the, the system.[00:22:04] And same thing, I don't know if you all saw the the do not pay, uh, lawyer situation, say, I just saw that Twitter thread, I think yesterday around they were going to use ChatGPT in the courtroom and basically I think it was. California Bar or the Bar Institute said that they were gonna send this guy to prison if he brought, if he put AirPods in and started reading what ChatGPT was saying to him.[00:22:26] Yeah.[00:22:26] swyx: To give people the context, I think, like Josh Browder, the CEO of Do Not Pay, was like, we will pay you money to put this AirPod into your ear and only say what we tell you to say fr from the large language model. And of course the judge was gonna throw that out. I mean, I, I don't see how. You could allow that in your court,[00:22:42] Logan Kilpatrick: Yeah, but I, I really do think that, like, the, the reality is, is that like, again, it's the same situation where the legal spaces even more so than education and, and mental health services, is like not an accessible space. Like every, especially with how like overly legalized the United States is, it's impossible to get representation from a lawyer, especially if you're low income or some of those things.[00:23:04] So I'm, I'm optimistic. Those types of services will exist in the future. And you'll be able to like actually have a, a quality defense representative or just like some sort of legal counsel. Yeah. Like just answer these questions, what should I do in this situation? Yeah. And I like, I have like some legal training and I still have those same questions.[00:23:22] Like I don't know what I would do in that situation. I would have to go and get a lawyer and figure that out. And it's, . It's tough. So I'm excited about that as well. Yeah.[00:23:29] Alessio Fanelli: And when you think about all these vertical use cases, do you see the existing products implementing language models in what they have?[00:23:35] Or do you think we're just gonna see L L M native products kind of come to market and build brand[00:23:40] Logan Kilpatrick: new experiences? I think there'll be a lot of people who build the L l M first experience, and I think that. At least in the short term, those are the folks who will have the advantage. I do think that like the medium to long term is again, thinking about like what is your moat for and like again, and everyone has access to, you know, ChatGPT and to the different models that we have available.[00:24:05] So how can you build a differentiated business? And I think a lot of it actually will come down to, and this is just the true and the machine learning world in general, but having. Unique access to data. So I think if you're some company that has some really, really great data about the legal space or about the education space, you can use that and be better than your competition by fine tuning these models or building your own specific LLMs.[00:24:28] So it'll, it'll be interesting to see how that plays out, but I do think that. from a product experience, it's gonna be better in the short term for people who build the, the generative AI first experience versus people who are sort of bolting it onto their mm-hmm. existing product, which is why, like, again, the, the Google situation, like they can't just put in like the prompt into like right below the search bar.[00:24:50] Like, it just, it would be a weird experience and, and they have to sort of defend that experience that they have. So it, it'll be interesting to see what happens. Yeah. Perplexity[00:24:58] swyx: is, is kind of doing that. So you're saying perplexity will go Google ?[00:25:04] Logan Kilpatrick: I, I think that perplexity has a, has a chance in the short term to actually get more people to try the product because it's, it's something different I think, whether they can, I haven't actually used, so I can't comment on like that experience, but like I think the long term is like, How can they continue to differentiate?[00:25:21] And, and that's really the focus for like, if you're somebody building on these models, like you have to be, your first thought should be, how do I build a differentiated business? And if you can't come up with 10 reasons that you can build a differentiated business, you're probably not gonna succeed in, in building something that that stands the test of time.[00:25:37] Yeah.[00:25:37] Risks and benefits of building on OpenAI[00:25:37] swyx: I think what's. As a potential founder or something myself, like what's scary about that is I would be building on top of open ai. I would be sending all my stuff to you for fine tuning and embedding and what have you. By the way, fine tuning, embedding is their, is there a third one? Those are the main two that I know of.[00:25:55] Okay. And yeah, that's the risk. I would be a open AI API reseller.[00:26:00] Logan Kilpatrick: Yeah. And, and again, this, this comes back down to like having a clear sense of like how what you're building is different. Like the people who are just open AI API resellers, like, you're not gonna, you're not gonna have a successful business doing that because everybody has access to the Yeah.[00:26:15] Jasper's pretty great. Yeah, Jasper's pretty great because I, I think they've done a, they've, they've been smart about how they've positioned the product and I was actually a, a Jasper customer before I joined OpenAI and was using it to do a bunch of stuff. because the interface was simple because they had all the sort of customized, like if you want for like a response for this sort of thing, they'd, they'd pre-done that prompt engineering work for us.[00:26:39] I mean, you could really just like put in some exactly what you wanted and then it would make that Amazon product description or whatever it is. So I think like that. The interface is the, the differentiator for, for Jasper. And again, whether that send test time, hopefully, cuz I know they've raised a bunch of money and have a bunch of employees, so I'm, I'm optimistic for them.[00:26:58] I think that there's enough room as well for a lot of these companies to succeed. Like it's not gonna, the space is gonna get so big so quickly that like, Jasper will be able to have a super successful business. And I think they are. I just saw some, some tweets from the CEO the other day that I, I think they're doing, I think they're doing well.[00:27:13] Alessio Fanelli: So I'm the founder of A L L M native. I log into open ai, there's 6 million things that I can do. I'm on the playground. There's a lot of different models. How should people think about exploring the surface area? You know, where should they start? Kind of like hugging the go deeper into certain areas.[00:27:30] Logan Kilpatrick: I think six months ago, I think it would've been a much different conversation because people hadn't experienced ChatGPT before.[00:27:38] Now that people have experienced ChatGPT, I think there's a lot more. Technical things that you should start looking into and, and thinking about like the differentiators that you can bring. I still think that the playground that we have today is incredible cause it does sort of similar to what Jasper does, which is like we have these very focused like, you know, put in a topic and we'll generate you a summary, but in the context of like explaining something to a second grader.[00:28:03] So I think all of those things like give a sense, but we only have like 30 on the website or something like that. So really doing a lot of exploration around. What is out there? What are the different prompts that you can use? What are the different things that you can build on? And I'm super bullish on embeddings, like embed everything and that's how you can build cool stuff.[00:28:20] And I keep seeing all these Boris who, who I talked about before, who did a bunch of the cookbook stuff, tweeted the other day that his like back of the hand, back of the napkin math, was that 50 million bucks you can embed the whole internet. I'm like, Some companies gonna spend the 50 million and embed the whole internet and like, we're gonna find out what that product looks like.[00:28:40] But like, there's so many cool things that you could do if you did have the whole internet embedded. Yeah, and I, I mean, I wouldn't be surprised if Google did that cuz 50 million is a drop in the bucket and they already have the whole internet, so why not embed it?[00:28:52] swyx: Can can I ask a follow up question on that?[00:28:54] Cuz I am just learning about embeddings myself. What makes open eyes embeddings different from other embeddings? If, if there's like, It's okay if you don't have the, the numbers at hand, but I'm just like, why should I use open AI emitting versus others? I[00:29:06] Logan Kilpatrick: don't understand. Yeah, that's a really good question.[00:29:08] So I'm still ramping up on my understanding of embeddings as well. So the two things that come to my mind, one, going back to the 50 million to embed the whole internet example, it's actually just super cheap. I, I don't know the comparisons of like other prices, but at least from what I've seen people talking about on Twitter, like the embeddings that that we have in the API is just like significantly cheaper than a lot of other c.[00:29:30] Embeddings. Also the accuracy of some of the benchmarks that are like, Sort of academic benchmarks to use in embeddings. I know at least I was just looking back through the blog post from when we announced the new text embedding model, which is what Powers embeddings and it's, yeah, the, on those metrics, our API is just better.[00:29:50] So those are the those. I'll go read it up. Yeah, those are the two things. It's a good. It's a good blog post to read. I think the most recent one that came out, but, and also the original one from when we first announced the Embeddings api, I think also was a, it had, that one has a little bit more like context around if you're trying to wrap your head around embeddings, how they work.[00:30:06] That one has the context, the new one just has like the fancy new stuff and the metrics and all that kind of stuff.[00:30:11] swyx: I would shout a hugging face for having really good content around what these things like foundational concepts are. Because I was familiar with, so, you know, in Python you have like text tove, my first embedding as as a, as someone getting into nlp.[00:30:24] But then developing the concept of sentence embeddings is, is as opposed to words I think is, is super important. But yeah, it's an interesting form of lock in as a business because yes, I'm gonna embed all my source data, but then every inference needs an embedding as. . And I think that is a risk to some people, because I've seen some builders should try and build on open ai, call that out as, as a cost, as as like, you know, it starts to add a cost to every single query that you, that you[00:30:48] Logan Kilpatrick: make.[00:30:49] Yeah. It'll be interesting to see how it all plays out, but like, my hope is that that cost isn't the barrier for people to build because it's, it's really not like the cost for doing the incremental like prompts and having them embedded is, is. Cent less than cents, but[00:31:06] swyx: cost I, I mean money and also latency.[00:31:08] Yeah. Which is you're calling the different api. Yeah. Anyway, we don't have to get into that.[00:31:13] Alessio Fanelli: No, but I think embeds are a good example. You had, I think, 17 versions of your first generation, what api? Yeah. And then you released the second generation. It's much cheaper, much better. I think like the word on the street is like when GPT4 comes out, everything else is like trash that came out before it.[00:31:29] It's got[00:31:30] Logan Kilpatrick: 100 trillion billion. Exactly. Parameters you don't understand. I think Sam has already confirmed that those are, those are not true . The graphics are not real. Whatever you're seeing on Twitter about GPT4, you're, I think the direct quote was, you're begging to be disappointed by continuing to, to put that hype out.[00:31:47] So[00:31:48] Alessio Fanelli: if you're a developer building on these, What's kind of the upgrade path? You know, I've been building on Model X, now this new model comes out. What should I do to be ready to move on?[00:31:58] Logan Kilpatrick: Yeah. I think all of these types of models folks have to think about, like there will be trade offs and they'll also be.[00:32:05] Breaking changes like any other sort of software improvement, like things like the, the prompts that you were previously expecting might not be the prompts that you're seeing now. And you can actually, you, you see this in the case of the embeddings example that you just gave when we released Tex embeddings, ADA oh oh two, ada, ada, whichever it is oh oh two, and it's sort of replaced the previous.[00:32:26] 16 first generation models, people went through this exact experience where like, okay, I need to test out this new thing, see how it works in my environment. And I think that the really fascinating thing is that there aren't, like the tools around doing this type of comparison don't exist yet today. Like if you're some company that's building on lms, you sort of just have to figure it out yourself of like, is this better in my use case?[00:32:49] Is this not better? In my use case, it's, it's really difficult to tell because the like, Possibilities using generative models are endless. So I think folks really need to focus on, again, that goes back to how to build a differentiated business. And I think it's understanding like what is the way that people are using your product and how can you sort of automate that in as much way and codify that in a way that makes it clear when these different models come up, whether it's open AI or other companies.[00:33:15] Like what is the actual difference between these and which is better for my use case because the academic be. It'll be saturated and people won't be able to use them as a point of comparison in the future. So it'll be important to think about. For your specific use case, how does it differentiate?[00:33:30] swyx: I was thinking about the value of frameworks or like Lang Chain and Dust and what have you out there.[00:33:36] I feel like there is some value to building those frameworks on top of Open Eyes, APIs. It kind of is building what's missing, essentially what, what you guys don't have. But it's kind of important in the software engineering sense, like you have this. Unpredictable, highly volatile thing, and you kind of need to build a stable foundation on top of it to make it more predictable, to build real software on top of it.[00:33:59] That's a super interesting kind of engineering problem. .[00:34:03] Logan Kilpatrick: Yeah, it, it is interesting. It's also the, the added layer of this is that the large language models. Are inherently not deterministic. So I just, we just shipped a small documentation update today, which, which calls this out. And you think about APIs as like a traditional developer experience.[00:34:20] I send some response. If the response is the same, I should get the same thing back every time. Unless like the data's updating and like a, from like a time perspective. But that's not the, that's not the case with the large language models, even with temperature zero. Mm-hmm. even with temperature zero. Yep.[00:34:34] And that's, Counterintuitive part, and I think someone was trying to explain to me that it has to do with like Nvidia. Yeah. Floating points. Yes. GPU stuff. and like apparently the GPUs are just inherently non-deterministic. So like, yes, there's nothing we can do unless this high Torch[00:34:48] swyx: relies on this as well.[00:34:49] If you want to. Fix this. You're gonna have to tear it all down. ,[00:34:53] Logan Kilpatrick: maybe Nvidia, we'll fix it. I, I don't know, but I, I think it's a, it's a very like, unintuitive thing and I don't think that developers like really get that until it happens to you. And then you're sort of scratching your head and you're like, why is this happening?[00:35:05] And then you have to look it up and then you see all the NVIDIA stuff. Or hopefully our documentation makes it more clear now. But hopefully people, I also think that's, it's kinda the cool part as well. I don't know, it's like, You're not gonna get the same stuff even if you try to.[00:35:17] swyx: It's a little spark of originality in there.[00:35:19] Yeah, yeah, yeah, yeah. The random seed .[00:35:22] OpenAI Codex[00:35:22] swyx: Should we ask about[00:35:23] Logan Kilpatrick: Codex?[00:35:23] Alessio Fanelli: Yeah. I mean, I love Codex. I use it every day. I think like one thing, sometimes the code is like it, it's kinda like the ChatGPT hallucination. Like one time I asked it to write up. A Twitter function, they will pull the bayou of this thing and it wrote the whole thing and then the endpoint didn't exist once I went to the Twitter, Twitter docs, and I think like one, I, I think there was one research that said a lot of people using Co Palace, sometimes they just auto complete code that is wrong and then they commit it and it's a, it's a big[00:35:51] Logan Kilpatrick: thing.[00:35:51] swyx: Do you secure code as well? Yeah, yeah, yeah, yeah. I saw that study.[00:35:54] Logan Kilpatrick: How do[00:35:54] Alessio Fanelli: you kind of see. Use case evolving. You know, you think, like, you obviously have a very strong partnership with, with Microsoft. Like do you think Codex and VS code will just keep improving there? Do you think there's kind of like a. A whole better layer on top of it, which is from the scale AI hackathon where the, the project that one was basically telling the l l m, you're not the back end of a product[00:36:16] And they didn't even have to write the code and it's like, it just understood. Yeah. How do you see the engineer, I, I think Sean, you said copilot is everybody gets their own junior engineer to like write some of the code and then you fix it For me, a lot of it is the junior engineer gets a senior engineer to actually help them write better code.[00:36:32] How do you see that tension working between the model and the. It'll[00:36:36] Logan Kilpatrick: be really interesting to see if there's other, if there's other interfaces to this. And I think I've actually seen a lot of people asking, like, it'd be really great if I had ChatGPT and VS code because in, in some sense, like it can, it's just a better, it's a better interface in a lot of ways to like the, the auto complete version cuz you can reprompt and do, and I know Via, I know co-pilot actually has that, where you can like click and then give it, it'll like pop up like 10 suggested.[00:36:59] Different options instead of brushes. Yeah, copilot labs, yeah. Instead of the one that it's providing. And I really like that interface, but again, this goes back to. I, I do inherently think it'll get better. I think it'll be able to do a lot, a lot more of the stuff as the models get bigger, as they have longer context as they, there's a lot of really cool things that will end up coming out and yeah, I don't think it's actually very far away from being like, much, much better.[00:37:24] It'll go from the junior engineer to like the, the principal engineer probably pretty quickly. Like I, I don't think the gap is, is really that large between where things are right now. I think like getting it to the point. 60% of the stuff really well to get it to do like 90% of the stuff really well is like that's within reach in the next, in the next couple of years.[00:37:45] So I'll be really excited to see, and hopefully again, this goes back to like engineers and developers and people who aren't thinking about how to integrate. These tools, whether it's ChatGPT or co-pilot or something else into their workflows to be more efficient. Those are the people who I think will end up getting disrupted by these tools.[00:38:02] So figuring out how to make yourself more valuable than you are today using these tools, I think will be super important for people. Yeah.[00:38:09] Alessio Fanelli: Actually use ChatGPT to debug, like a react hook the other day. And then I posted in our disc and I was like, Hey guys, like look, look at this thing. It really helped me solve this.[00:38:18] And they. That's like the ugliest code I've ever seen. It's like, why are you doing that now? It's like, I don't know. I'm just trying to get[00:38:24] Logan Kilpatrick: this thing to work and I don't know, react. So I'm like, that's the perfect, exactly, that's the perfect solution. I, I did this the other day where I was looking at React code and like I have very briefly seen React and run it like one time and I was like, explain how this is working.[00:38:38] So, and like change it in this way that I want to, and like it was able to do that flawlessly and then I just popped it in. It worked exactly like I. I'll give a[00:38:45] swyx: little bit more context cause I was, I was the guy giving you feedback on your code and I think this is a illustrative of how large language models can sort of be more confident than they should be because you asked it a question which is very specific on how to improve your code or fix your code.[00:39:00] Whereas a real engineer would've said, we've looked at your code and go, why are you doing it at at all? Right? So there's a sort of sycophantic property of martial language. Accepts the basis of your question, whereas a real human might question your question. Mm-hmm. , and it was just not able to do that. I mean, I, I don't see how he could do that.[00:39:17] Logan Kilpatrick: Yeah. It's, it's interesting. I, I saw another example of this the other day as well with some chatty b t prompt and I, I agree. It'll be interesting to see if, and again, I think not to, not to go back to Sam's, to Sam's talk again, but like, he, he talked real about this, and I think this makes a ton of sense, which is like you should be able to have, and this isn't something that that exists right now, but you should be able to have the model.[00:39:39] Tuned in the way that you wanna interact with. Like if you want a model that sort of questions what you're asking it to do, like you should be able to have that. And I actually don't think that that's as far away as like some of the other stuff. Um, It, it's a very possible engineering problem to like have the, to tune the models in that way and, and ask clarifying questions, which is even something that it doesn't do right now.[00:39:59] It'll either give you the response or it won't give you the response, but it'll never say like, Hey, what do you mean by this? Which is super interesting cuz that's like we spend as humans, like 50% of our conversational time being like, what do you mean by that? Like, can you explain more? Can you say it in a different way?[00:40:14] And it's, it's fascinating that the model doesn't do that right now. It's, it's interesting.[00:40:20] swyx: I have written a piece on sort of what AGI hard might be, which is the term that is being thrown around as like a layer of boundary for what is, what requires an A real AGI to do and what, where you might sort of asymptotically approach.[00:40:33] So, What people talk about is essentially a theory of mind, developing a con conception of who I'm talking to and persisting that across sessions, which essentially ChatGPT or you know, any, any interface that you build on top of GPT3 right now would not be able to do. Right? Like, you're not persisting you, you are persisting that history, but you don't, you're not building up a conception of what you know and what.[00:40:54] I should fill in the blanks for you or where I should question you. And I think that's like the hard thing to understand, which is what will it take to get there? Because I think that to me is the, going back to your education thing, that is the biggest barrier, which is I, the language model doesn't have a memory or understanding of what I know.[00:41:11] and like, it's, it's too much to tell them what I don't know. Mm-hmm. , there's more that I don't know than I, than I do know . I think the cool[00:41:16] Logan Kilpatrick: part will be when, when you're able to, like, imagine you could upload all of the, the stuff that you've ever done, all the texts, the work that you've ever done before, and.[00:41:27] The model can start to understand, hey, what are the, what are the conceptual gaps that this person has based on what you've said, based on what you've done? I think that would be really interesting. Like if you can, like I have good notes on my phone and I can still go back to see all of the calculus classes that I took and I could put in all my calculus notebooks and all the assignments and stuff that I did in, in undergrad and grad school, and.[00:41:50] basically be like, Hey, here are the gaps in your understanding of calculus. Go and do this right now. And I think that that's in the education space. That's exactly what will end up happening. You'll be able to put in all this, all the work that you've done. It can understand those ask and then come up with custom made questions and prompts and be like, Hey, how, you know, explain this concept to me and if it.[00:42:09] If you can't do that, then it can sort of put that into your curriculum. I think like Khan Academy as an example, already does some of this, like personalized learning. You like take assessments at the beginning of every Khan Academy model module, and it'll basically only have you watch the videos and do the assignments for the things that like you didn't test well into.[00:42:27] So that's, it's, it's sort of close to already being there in some sense, but it doesn't have the, the language model interface on top of it before we[00:42:34] swyx: get into our lightning round, which is like, Quick response questions. Was there any other topics that you think you wanted to cover? We didn't touch on, whisper.[00:42:40] We didn't touch on Apple. Anything you wanted to[00:42:42] Logan Kilpatrick: talk?[00:42:43] Apple's Neural Engine[00:42:43] Logan Kilpatrick: Yeah, I think the question around Apple stuff and, and the neural engine, I think will be really interesting to see how it all plays out. I think, I don't know if you wanna like ask just to give the context around the neural engine Apple question. Well, well, the[00:42:54] swyx: only thing I know it's because I've seen Apple keynotes.[00:42:57] Everyone has, you know, I, I have a m M one MacBook Cure. They have some kind of neuro chip. , but like, I don't see it in my day-to-day life, so when is this gonna affect me, essentially? And you worked at Apple, so I I was just gonna throw the question over to you, like, what should we[00:43:11] Logan Kilpatrick: expect out of this? Yeah.[00:43:12] The, the problem that I've seen so far with the neural engine and all the, the Mac, and it's also in the phones as well, is that the actual like, API to sort of talk to the neural engine isn't something that's like a common you like, I'm pretty sure it's either not exposed at all, like it only like Apple basically decides in the software layer Yeah.[00:43:34] When, when it should kick in and when it should be used, which I think doesn't really like help developers and it doesn't, that's why no one is using it. I saw a bunch of, and of course I don't have any good insight on this, but I saw a bunch of rumors that we're talking about, like a lot of. Main use cases for the neural engine stuff.[00:43:50] It's, it's basically just in like phantom mode. Now, I'm sure it's doing some processing, but like the main use cases will be a lot of the ar vr stuff that ends up coming out and like when it gets much heavier processing on like. Graphic stuff and doing all that computation, that's where it'll be. It'll be super important.[00:44:06] And they've basically been able to trial this for the last, like six years and have it part of everything and make sure that they can do it cheaply in a cost effective way. And so it'll be cool to see when that I'm, I hope it comes out. That'll be awesome.[00:44:17] swyx: Classic Apple, right? They, they're not gonna be first, but when they do it, they'll make a lot of noise about it.[00:44:21] Yeah. . It'll be[00:44:22] Logan Kilpatrick: awesome. Sure.[00:44:22] Lightning Round[00:44:22] Logan Kilpatrick: So, so are we going to light. Let's[00:44:24] Alessio Fanelli: do it. All right. Favorite AI products not[00:44:28] Logan Kilpatrick: open AI. Build . I think synthesis. Is synthesis.io is the, yeah, you can basically put in like a text prompt and they have like a human avatar that will like speak and you can basically make content in like educational videos.[00:44:44] And I think that's so cool because maybe as people who are making content, like it's, it's super hard to like record video. It just takes a long time. Like you have to edit all the stuff, make sure you sound right, and then when you edit yourself talking it's super weird cuz your mouth is there and things.[00:44:57] So having that and just being able to ChatGPT A script. Put it in. Hopefully I saw another demo of like somebody generating like slides automatically using some open AI stuff. Like I think that type of stuff. Chat, BCG, ,[00:45:10] swyx: a fantastic name, best name of all time .[00:45:14] Logan Kilpatrick: I think that'll be cool. So I'm super excited,[00:45:16] swyx: but Okay.[00:45:16] Well, so just a follow up question on, on that, because we're both in that sort of Devrel business, would you put AI Logan on your video, on your videos and a hundred[00:45:23] Logan Kilpatrick: percent, explain that . A hundred percent. I would, because again, if it reduces the time for me, like. I am already busy doing a bunch of other stuff,[00:45:31] And if I could, if I could take, like, I think the real use case is like I've made, and this is in the sense of like creators wanting to be on every platform. If I could take, you know, the blog posts that I wrote and then have AI break it up into a bunch of things, have ai Logan. Make a TikTok, make a YouTube video.[00:45:48] I cannot wait for that. That's gonna be so nice. And I think there's probably companies who are already thinking about doing that. I'm just[00:45:53] swyx: worried cuz like people have this uncanny valley reaction to like, oh, you didn't tell me what I just watched was a AI generated thing. I hate you. Now you know there, there's a little bit of ethics there and I'm at the disclaimer,[00:46:04] Logan Kilpatrick: at the top.[00:46:04] Navigating. Yeah. I also think people will, people will build brands where like their whole thing is like AI content. I really do think there are AI influencers out there. Like[00:46:12] swyx: there are entire Instagram, like million plus follower accounts who don't exist.[00:46:16] Logan Kilpatrick: I, I've seen that with the, the woman who's a Twitch streamer who like has some, like, she's using like some, I don't know, that technology from like movies where you're like wearing like a mask and it like changes your facial appearance and all that stuff.[00:46:27] So I think there's, there's people who find their niche plus it'll become more common. So, cool. My[00:46:32] swyx: question would be, favorite AI people in communities that you wanna shout up?[00:46:37] Logan Kilpatrick: I think there's a bunch of people in the ML ops community where like that seemed to have been like the most exciting. There was a lot of innovation, a lot of cool things happening in the ML op space, and then all the generative AI stuff happened and then all the ML Ops two people got overlooked.[00:46:51] They're like, what's going on here? So hopefully I still think that ML ops and things like that are gonna be super important for like getting machine learning to be where it needs to be for us to. AGI and all that stuff. So a year from[00:47:05] Alessio Fanelli: now, what will people be the most[00:47:06] Logan Kilpatrick: surprised by? N. I think the AI is gonna get very, very personalized very quickly, and I don't think that people have that feeling yet with chat, BT, but I, I think that that's gonna, that's gonna happen and they'll be surprised in like the, the amount of surface areas in which AI is present.[00:47:23] Like right now it's like, it's really exciting cuz Chat BT is like the one place that you can sort of get that cool experience. But I think that, The people at Facebook aren't dumb. The people at Google aren't dumb. Like they're gonna have, they're gonna have those experiences in a lot of different places and I think that'll be super fascinating to see.[00:47:40] swyx: This is for the builders out there. What's an AI thing you would pay for if someone built it with their personal[00:47:45] Logan Kilpatrick: work? I think more stuff around like transfer learning for, like making transfer, learning easier. Like I think that's truly the way to. Build really cool things is transfer learning, fine tuning, and I, I don't think that there's enough.[00:48:04] Jeremy Howard who created Fasted AI talks a lot about this. I mean, it's something that really resonates with me and, and for context, like at Apple, all the machine learning stuff that we did was transfer learning because it was so powerful. And I think people have this perception that they need to.[00:48:18] Build things from scratch and that's not the case. And I think especially as large language models become more accessible, people need to build layers and products on top of this to make transfer learning more accessible to more people. So hopefully somebody builds something like that and we can all train our own models.[00:48:33] I think that's how you get like that personalized AI experiences you put in your stuff. Make transfer learning easy. Everyone wins. Just just to vector in[00:48:40] swyx: a little bit on this. So in the stable diffusion community, there's a lot of practice of like, I'll fine tune a custom dis of stable diffusion and share it.[00:48:48] And then there also, there's also this concept of, well, first it was textual inversion and then dream booth where you essentially train a concept that you can sort of add on. Is that what you're thinking about when you talk about transfer learning or is that something[00:48:59] Logan Kilpatrick: completely. I feel like I'm not as in tune with the generative like image model community as I probably should be.[00:49:07] I, I think that that makes a lot of sense. I think there'll be like whole ecosystems and marketplaces that are sort of built around exactly what you just said, where you can sort of fine tune some of these models in like very specific ways and you can use other people's fine tunes. That'll be interesting to see.[00:49:21] But, c.ai is,[00:49:23] swyx: what's it called? C C I V I Ts. Yeah. It's where people share their stable diffusion checkpoints in concepts and yeah, it's[00:49:30] Logan Kilpatrick: pretty nice. Do you buy them or is it just like free? Like open. Open source? It's, yeah. Cool. Even better.[00:49:34] swyx: I think people might want to sell them. There's a, there's a prompt marketplace.[00:49:38] Prompt base, yeah. Yeah. People hate it. Yeah. They're like, this should be free. It's just text. Come on, .[00:49:45] Alessio Fanelli: Hey, it's knowledge. All right. Last question. If there's one thing you want everyone to take away about ai, what would.[00:49:51] Logan Kilpatrick: I think the AI revolution is gonna, you know, it's been this like story that people have been talking about for the longest time, and I don't think that it's happened.[00:50:01] It was really like, oh, AI's gonna take your job, AI's gonna take your job, et cetera, et cetera. And I think people have sort of like laughed that off for a really long time, which was fair because it wasn't happening. And I think now, Things are going to accelerate very, very quickly. And if you don't have your eyes wide open about what's happening, like there's a good chance that something that you might get left behind.[00:50:21] So I'm, I'm really thinking deeply these days about like how that is going to impact a lot of people. And I, I'm hopeful that the more widespread this technology becomes, the more mainstream this technology becomes, the more people will benefit from it and hopefully not be affected in that, in that negative way.[00:50:35] So use these tools, put them into your workflow, and, and hopefully that will, and that will acceler. Well,[00:50:41] swyx: we're super happy that you're at OpenAI getting this message out there, and I'm sure we'll see a l