Podcast appearances and mentions of Jerry Liu

  • 27 podcasts
  • 30 episodes
  • 51m avg duration
  • Infrequent episodes
  • Latest episode: Nov 2, 2024

POPULARITY

[Popularity chart, 2017–2024]


Best podcasts about Jerry Liu

Latest podcast episodes about Jerry Liu

Shout Out Fight Podcast
Shout Out Fight Podcast #112 - "MTWC XI match ups and interviews with 2 on the event!"

Shout Out Fight Podcast

Nov 2, 2024 · 91:34


Hello again, Fighty friends! I hope y'all had a fun and exciting Halloween! I went as Luigi for the 3rd year in a row. Now that we are into November, the Muay Thai World Cup is only weeks away and yours truly will be ringside, bringing the action to the fight fans from the safe side of the ropes. BEFORE WE GET THERE - I was up at an event in Sherwood Park hosted by Arashi-Do Martial Arts, which is the home of current and former WBC Canadian champions Tim Lo and Derek Jolivette as well as WBC LHW contender Jerry Liu. ALSO, the podcast with Tim and Derek is called "2 champs 1 gym" (I wonder where I got that title....WINK!), but there's another champ out of ADMA Sherwood Park, and that's 2x, back-to-back IFMA Jr. WORLD CHAMPION Kobe Carr! With these guys all in one room at the same time as me, I decided to bring up my gear, make them feel guilty, and convince them to have a chat with me! Kobe and coach Derek talk about living the IFMA experience twice and what may be next for Kobe. Tim and Jerry have a couple of 1-2 title fights on the 23rd, so we talk about their match ups, preparations for the fight, and the MTWC on the other side of the country. Side interviews and new mics are new to the podcast, but I will do my best to keep leveling up to bring you the best, most entertaining podcast I can - the SHOUT OUT PODCAST - the podcast for the people!

The MongoDB Podcast
EP. 220 Innovating with LlamaIndex: Unstructured Data, AI and MongoDB

The MongoDB Podcast

Jul 10, 2024 · 8:13


In this episode, Jesse, Anaiya, and Mike engage in a riveting discussion with Jerry Liu, the CEO of LlamaIndex. Dive into the world of unstructured data and learn how LlamaIndex is revolutionizing the way developers build applications over their data. Discover the integration with MongoDB and the challenges faced when transitioning from prototype to production. Jerry shares insights into the future of AI, chatbots, and autonomous task assistants. Don't miss out on the latest updates and innovations in data management and AI development!

The Ravit Show
LLMs, RAG, Vector Database, Enterprise Use Cases with Jerry Liu, Co-Founder & CEO of LlamaIndex

The Ravit Show

Jun 12, 2024 · 5:21


Got a chance to chat with the one and only Jerry Liu, Co-Founder & CEO of LlamaIndex, on The Ravit Show at NVIDIA GTC! We discussed LLMs, RAG, vector databases, enterprise use cases, and much more! #data #ai #nvidiagtc #llamaindex #theravitshow

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We are reuniting for the 2nd AI UX demo day in SF on Apr 28. Sign up to demo here! And don't forget tickets for the AI Engineer World's Fair — for early birds who join before keynote announcements!

About a year ago there was a lot of buzz around prompt engineering techniques to force structured output. Our friend Simon Willison tweeted a bunch of tips and tricks, but the most iconic one is Riley Goodside making it a matter of life or death:

Guardrails (friend of the pod and AI Engineer speaker), Marvin (AI Engineer speaker), and jsonformer had also come out at the time. In June 2023, Jason Liu (today's guest!) open sourced his "OpenAI Function Call and Pydantic Integration Module", now known as Instructor, which quickly turned prompt engineering black magic into a clean, developer-friendly SDK. A few months later, model providers started to add function calling capabilities to their APIs as well as structured outputs support like "JSON Mode", which was announced at OpenAI Dev Day (see recap here). In just a handful of months, we went from threatening to kill grandmas to first-class support from the research labs. And yet, Instructor was still downloaded 150,000 times last month. Why?

What Instructor looks like

Instructor patches your LLM provider SDKs to offer a new response_model option to which you can pass a structure defined in Pydantic (a minimal sketch follows below). It currently supports OpenAI, Anthropic, Cohere, and a long tail of models through LiteLLM.

What Instructor is for

There are three core use cases to Instructor:

* Extracting structured data: Taking an input like an image of a receipt and extracting structured data from it, such as a list of checkout items with their prices, fees, and coupon codes.
* Extracting graphs: Identifying nodes and edges in a given input to extract complex entities and their relationships. For example, extracting relationships between characters in a story or dependencies between tasks.
* Query understanding: Defining a schema for an API call and using a language model to resolve a request into a more complex one that an embedding could not handle. For example, creating date intervals from queries like "what was the latest thing that happened this week?" to then pass onto a RAG system or similar.

Jason called all these different ways of getting data from LLMs "typed responses": taking strings and turning them into data structures.

Structured outputs as a planning tool

The first wave of agents was all about open-ended iteration and planning, with projects like AutoGPT and BabyAGI. Models would come up with a possible list of steps, and start going down the list one by one. It's really easy for them to go down the wrong branch, or get stuck on a single step with no way to intervene.

What if these planning steps were returned to us as DAGs using structured output, and then managed as workflows? This also makes it easy to better train models on how to create these plans, as they are much more structured than a bullet point list. Once you have this structure, each piece can be modified individually by different specialized models. You can read some of Jason's experiments here:

While LLMs will keep improving (Llama 3 just got released as we write this), having a consistent structure for the output will make it a lot easier to swap models in and out. Jason's overall message on how we can move from ReAct loops to more controllable Agent workflows mirrors the "Process" discussion from our Elicit episode:

Watch the talk

As a bonus, here's Jason's talk from last year's AI Engineer Summit.
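For readers who have not seen the library, here is a minimal sketch of the response_model pattern described above, assuming a recent version of the instructor and openai Python packages; the model name and the Receipt/DateRange schemas are illustrative, not taken from the episode.

```python
# Minimal sketch of Instructor's response_model pattern (assumes instructor >= 1.0
# and the official openai SDK; model name and schemas are illustrative).
from datetime import date
from typing import List

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())  # patches the SDK to accept response_model


class LineItem(BaseModel):
    name: str
    price: float


class Receipt(BaseModel):
    items: List[LineItem]


# 1) Extracting structured data: strings (or receipt images) in, data structures out.
receipt = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Receipt,
    messages=[{"role": "user", "content": "2x coffee at $3.50 each and a bagel for $2.25"}],
)
print(receipt.items)


# 2) Query understanding: resolve "this week" into a concrete date interval
# that a downstream RAG or search system can actually filter on.
class DateRange(BaseModel):
    start: date
    end: date


interval = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=DateRange,
    messages=[
        {"role": "system", "content": f"Today is {date.today().isoformat()}."},
        {"role": "user", "content": "What was the latest thing that happened this week?"},
    ],
)
print(interval.start, interval.end)
```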
He'll also be a speaker at this year's AI Engineer World's Fair!

Timestamps

* [00:00:00] Introductions
* [00:02:23] Early experiments with Generative AI at StitchFix
* [00:08:11] Design philosophy behind the Instructor library
* [00:11:12] JSON Mode vs Function Calling
* [00:12:30] Single vs parallel function calling
* [00:14:00] How many functions is too many?
* [00:17:39] How to evaluate function calling
* [00:20:23] What is Instructor good for?
* [00:22:42] The Evolution from Looping to Workflow in AI Engineering
* [00:27:03] State of the AI Engineering Stack
* [00:28:26] Why Instructor isn't VC backed
* [00:31:15] Advice on Pursuing Open Source Projects and Consulting
* [00:36:00] The Concept of High Agency and Its Importance
* [00:42:44] Prompts as Code and the Structure of AI Inputs and Outputs
* [00:44:20] The Emergence of AI Engineering as a Distinct Field

Show notes

* Jason on the UWaterloo mafia
* Jason on Twitter, LinkedIn, website
* Instructor docs
* Max Woolf on the potential of Structured Output
* swyx on Elo vs Cost
* Jason on Anthropic Function Calling
* Jason on Rejections, Advice to Young People
* Jason on Bad Startup Ideas
* Jason on Prompts as Code
* Rysana's inversion models
* Bryan Bischof's episode
* Hamel Husain

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:16]: Hello, we're back in the remote studio with Jason Liu from Instructor. Welcome Jason.Jason [00:00:21]: Hey there. Thanks for having me.Swyx [00:00:23]: Jason, you are extremely famous, so I don't know what I'm going to do introducing you, but you're one of the Waterloo clan. There's like this small cadre of you that's just completely dominating machine learning. Actually, can you list like Waterloo alums that you're like, you know, are just dominating and crushing it right now?Jason [00:00:39]: So like John from like Rysana is doing his inversion models, right? I know like Clive Chen from Waterloo. When I started the data science club, he was one of the guys who were like joining in and just like hanging out in the room. And now he was at Tesla working with Karpathy, now he's at OpenAI, you know.Swyx [00:00:56]: He's in my climbing club.Jason [00:00:58]: Oh, hell yeah. I haven't seen him in like six years now.Swyx [00:01:01]: To get in the social scene in San Francisco, you have to climb. So both in career and in rocks. So you started a data science club at Waterloo, we can talk about that, but then also spent five years at Stitch Fix as an MLE. You pioneered the use of OpenAI's LLMs to increase stylist efficiency. So you must have been like a very, very early user. This was like pretty early on.Jason [00:01:20]: Yeah, I mean, this was like GPT-3, okay. So we actually were using transformers at Stitch Fix before the GPT-3 model. So we were just using transformers for recommendation systems. At that time, I was very skeptical of transformers. I was like, why do we need all this infrastructure? We can just use like matrix factorization. When GPT-2 came out, I fine tuned my own GPT-2 to write like rap lyrics and I was like, okay, this is cute. Okay, I got to go back to my real job, right? Like who cares if I can write a rap lyric? When GPT-3 came out, again, I was very much like, why are we using like a post request to review every comment a person leaves? Like we can just use classical models. So I was very against language models for like the longest time.
And then when ChatGPT came out, I basically just wrote a long apology letter to everyone at the company. I was like, hey guys, you know, I was very dismissive of some of this technology. I didn't think it would scale well, and I am wrong. This is incredible. And I immediately just transitioned to go from computer vision recommendation systems to LLMs. But funny enough, now that we have RAG, we're kind of going back to recommendation systems.Swyx [00:02:21]: Yeah, speaking of that, I think Alessio is going to bring up the next one.Alessio [00:02:23]: Yeah, I was going to say, we had Bryan Bischof from Hex on the podcast. Did you overlap at Stitch Fix?Jason [00:02:28]: Yeah, he was like one of my main users of the recommendation frameworks that I had built out at Stitch Fix.Alessio [00:02:32]: Yeah, we talked a lot about RecSys, so it makes sense.Swyx [00:02:36]: So now I have adopted that line, RAG is RecSys. And you know, if you're trying to reinvent new concepts, you should study RecSys first, because you're going to independently reinvent a lot of concepts. So your system was called Flight. It's a recommendation framework with over 80% adoption, servicing 350 million requests every day. Wasn't there something existing at Stitch Fix? Why did you have to write one from scratch?Jason [00:02:56]: No, so I think because at Stitch Fix, a lot of the machine learning engineers and data scientists were writing production code, sort of every team's systems were very bespoke. It's like, this team only needs to do like real time recommendations with small data. So they just have like a FastAPI app with some like pandas code. This other team has to do a lot more data. So they have some kind of like Spark job that does some batch ETL that does a recommendation. And so what happens is each team writes their code differently. And I have to come in and refactor their code. And I was like, oh man, I'm refactoring four different code bases, four different times. Wouldn't it be better if all the code quality was my fault? Let me just write this framework, force everyone else to use it. And now one person can maintain five different systems, rather than five teams having their own bespoke system. And so it was really a need of just sort of standardizing everything. And then once you do that, you can do observability across the entire pipeline and make large sweeping improvements in this infrastructure, right? If we notice that something is slow, we can detect it on the operator layer. Just hey, hey, like this team, you guys are doing this operation is lowering our latency by like 30%. If you just optimize your Python code here, we can probably make an extra million dollars. So let's jump on a call and figure this out. And then a lot of it was doing all this observability work to figure out what the heck is going on and optimize this system from not only just a code perspective, sort of like harassingly or against saying like, we need to add caching here. We're doing duplicated work here. Let's go clean up the systems. Yep.Swyx [00:04:22]: Got it. One more system that I'm interested in finding out more about is your similarity search system using CLIP and GPT-3 embeddings and FAISS, where you saved over $50 million in annual revenue. So of course they all gave all that to you, right?Jason [00:04:34]: No, no, no. I mean, it's not going up and down, but you know, I got a little bit, so I'm pretty happy about that.
But there, you know, that was when we were doing fine tuning like ResNets to do image classification. And so a lot of it was given an image, if we could predict the different attributes we have in the merchandising and we can predict the text embeddings of the comments, then we can kind of build a image vector or image embedding that can capture both descriptions of the clothing and sales of the clothing. And then we would use these additional vectors to augment our recommendation system. And so with the recommendation system really was just around like, what are similar items? What are complimentary items? What are items that you would wear in a single outfit? And being able to say on a product page, let me show you like 15, 20 more things. And then what we found was like, hey, when you turn that on, you make a bunch of money.Swyx [00:05:23]: Yeah. So, okay. So you didn't actually use GPT-3 embeddings. You fine tuned your own? Because I was surprised that GPT-3 worked off the shelf.Jason [00:05:30]: Because I mean, at this point we would have 3 million pieces of inventory over like a billion interactions between users and clothes. So any kind of fine tuning would definitely outperform like some off the shelf model.Swyx [00:05:41]: Cool. I'm about to move on from Stitch Fix, but you know, any other like fun stories from the Stitch Fix days that you want to cover?Jason [00:05:46]: No, I think that's basically it. I mean, the biggest one really was the fact that I think for just four years, I was so bearish on language models and just NLP in general. I'm just like, none of this really works. Like, why would I spend time focusing on this? I got to go do the thing that makes money, recommendations, bounding boxes, image classification. Yeah. Now I'm like prompting an image model. I was like, oh man, I was wrong.Swyx [00:06:06]: So my Stitch Fix question would be, you know, I think you have a bit of a drip and I don't, you know, my primary wardrobe is free startup conference t-shirts. Should more technology brothers be using Stitch Fix? What's your fashion advice?Jason [00:06:19]: Oh man, I mean, I'm not a user of Stitch Fix, right? It's like, I enjoy going out and like touching things and putting things on and trying them on. Right. I think Stitch Fix is a place where you kind of go because you want the work offloaded. I really love the clothing I buy where I have to like, when I land in Japan, I'm doing like a 45 minute walk up a giant hill to find this weird denim shop. That's the stuff that really excites me. But I think the bigger thing that's really captured is this idea that narrative matters a lot to human beings. Okay. And I think the recommendation system, that's really hard to capture. It's easy to use AI to sell like a $20 shirt, but it's really hard for AI to sell like a $500 shirt. But people are buying $500 shirts, you know what I mean? There's definitely something that we can't really capture just yet that we probably will figure out how to in the future.Swyx [00:07:07]: Well, it'll probably output in JSON, which is what we're going to turn to next. Then you went on a sabbatical to South Park Commons in New York, which is unusual because it's based on USF.Jason [00:07:17]: Yeah. So basically in 2020, really, I was enjoying working a lot as I was like building a lot of stuff. This is where we were making like the tens of millions of dollars doing stuff. And then I had a hand injury. And so I really couldn't code anymore for like a year, two years. 
And so I kind of took sort of half of it as medical leave, the other half I became more of like a tech lead, just like making sure the systems were like lights were on. And then when I went to New York, I spent some time there and kind of just like wound down the tech work, you know, did some pottery, did some jujitsu. And after GPT came out, I was like, oh, I clearly need to figure out what is going on here because something feels very magical. I don't understand it. So I spent basically like five months just prompting and playing around with stuff. And then afterwards, it was just my startup friends going like, hey, Jason, you know, my investors want us to have an AI strategy. Can you help us out? And it just snowballed more and more until I was making this my full time job. Yeah, got it.Swyx [00:08:11]: You know, you had YouTube University and a journaling app, you know, a bunch of other explorations. But it seems like the most productive or the best known thing that came out of your time there was Instructor. Yeah.Jason [00:08:22]: Written on the bullet train in Japan. I think at some point, you know, tools like Guardrails and Marvin came out. Those are kind of tools that use XML and Pydantic to get structured data out. But they really were doing things sort of in the prompt. And these are built with sort of the instruct models in mind. Like I'd already done that in the past. Right. At Stitch Fix, you know, one of the things we did was we would take a request note and turn that into a JSON object that we would use to send it to our search engine. Right. So if you said like, I want to, you know, skinny jeans that were this size, that would turn into JSON that we would send to our internal search APIs. But it always felt kind of gross. A lot of it is just like you read the JSON, you like parse it, you make sure the names are strings and ages are numbers and you do all this like messy stuff. But when function calling came out, it was very much sort of a new way of doing things. Right. Function calling lets you define the schema separate from the data and the instructions. And what this meant was you can kind of have a lot more complex schemas and just map them in Pydantic. And then you can just keep those very separate. And then once you add like methods, you can add validators and all that kind of stuff. The one thing I really had with a lot of these libraries, though, was it was doing a lot of the string formatting themselves, which was fine when it was the instruct models. You just have a string. But when you have these new chat models, you have these chat messages. And I just didn't really feel like not being able to access that for the developer was sort of a good benefit that they would get. And so I just said, let me write like the most simple SDK around the OpenAI SDK, a simple wrapper on the SDK, just handle the response model a bit and kind of think of myself more like requests than actual framework that people can use. And so the goal is like, hey, like this is something that you can use to build your own framework. But let me just do all the boring stuff that nobody really wants to do. People want to build their own frameworks, but people don't want to build like JSON parsing.Swyx [00:10:08]: And the retrying and all that other stuff.Jason [00:10:10]: Yeah.Swyx [00:10:11]: Right. We had this a little bit of this discussion before the show, but like that design principle of going for being requests rather than being Django. Yeah. So what inspires you there?
This has come from a lot of prior pain. Are there other open source projects that inspired your philosophy here? Yeah.Jason [00:10:25]: I mean, I think it would be requests, right? Like, I think it is just the obvious thing you install. If you were going to go make HTTP requests in Python, you would obviously import requests. Maybe if you want to do more async work, there's like future tools, but you don't really even think about installing it. And when you do install it, you don't think of it as like, oh, this is a requests app. Right? Like, no, this is just Python. The bigger question is, like, a lot of people ask questions like, oh, why isn't requests like in the standard library? Yeah. That's how I want my library to feel, right? It's like, oh, if you're going to use the LLM SDKs, you're obviously going to install instructor. And then I think the second question would be like, oh, like, how come instructor doesn't just go into OpenAI, go into Anthropic? Like, if that's the conversation we're having, like, that's where I feel like I've succeeded. Yeah. It's like, yeah, so standard, you may as well just have it in the base libraries.Alessio [00:11:12]: And the shape of the request stayed the same, but initially function calling was maybe equal structure outputs for a lot of people. I think now the models also support like JSON mode and some of these things and, you know, return JSON or my grandma is going to die. All of that stuff is maybe to decide how have you seen that evolution? Like maybe what's the metagame today? Should people just forget about function calling for structure outputs or when is structure output like JSON mode the best versus not? We'd love to get any thoughts given that you do this every day.Jason [00:11:42]: Yeah, I would almost say these are like different implementations of like the real thing we care about is the fact that now we have typed responses to language models. And because we have that type response, my IDE is a little bit happier. I get autocomplete. If I'm using the response wrong, there's a little red squiggly line. Like those are the things I care about in terms of whether or not like JSON mode is better. I usually think it's almost worse unless you want to spend less money on like the prompt tokens that the function call represents, primarily because with JSON mode, you don't actually specify the schema. So sure, like JSON load works, but really, I care a lot more than just the fact that it is JSON, right? I think function calling gives you a tool to specify the fact like, okay, this is a list of objects that I want and each object has a name or an age and I want the age to be above zero and I want to make sure it's parsed correctly. That's where kind of function calling really shines.Alessio [00:12:30]: Any thoughts on single versus parallel function calling? So I did a presentation at our AI in Action Discord channel, and obviously showcase instructor. One of the big things that we have before with single function calling is like when you're trying to extract lists, you have to make these funky like properties that are lists to then actually return all the objects. How do you see the hack being put on the developer's plate versus like more of this stuff just getting better in the model? And I know you tweeted recently about Anthropic, for example, you know, some lists are not lists or strings and there's like all of these discrepancies.Jason [00:13:04]: I almost would prefer it if it was always a single function call. 
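A minimal sketch of the point Jason makes just above: the schema itself can carry the "age above zero" constraint, and the client can re-ask the model when validation fails. Assumes instructor >= 1.0 and pydantic v2; the names, model string, and retry count are illustrative.

```python
# Sketch: the schema carries the constraint, and instructor re-asks the model
# with the validation error on failure (assumes instructor >= 1.0, pydantic v2;
# names, model, and retry count are illustrative).
from typing import List

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.from_openai(OpenAI())


class Person(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_above_zero(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("age must be above zero")
        return v


class People(BaseModel):
    people: List[Person]


people = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=People,
    max_retries=2,  # on a validation error, instructor re-asks with the error message
    messages=[{"role": "user", "content": "Jason is 30 and his sister is 25."}],
)
print(people.people)
```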
Obviously, there is like the agents workflows that, you know, Instructor doesn't really support that well, but are things that, you know, ought to be done, right? Like you could define, I think maybe like 50 or 60 different functions in a single API call. And, you know, if it was like get the weather or turn the lights on or do something else, it makes a lot of sense to have these parallel function calls. But in terms of an extraction workflow, I definitely think it's probably more helpful to have everything be a single schema, right? Just because you can sort of specify relationships between these entities that you can't do in a parallel function calling, you can have a single chain of thought before you generate a list of results. Like there's like small like API differences, right? Where if it's for parallel function calling, if you do one, like again, really, I really care about how the SDK looks and says, okay, do I always return a list of functions or do you just want to have the actual object back out and you want to have like auto complete over that object? Interesting.Alessio [00:14:00]: What's kind of the cap for like how many function definitions you can put in where it still works well? Do you have any sense on that?Jason [00:14:07]: I mean, for the most part, I haven't really had a need to do anything that's more than six or seven different functions. I think in the documentation, they support way more. I don't even know if there's any good evals that have over like two dozen function calls. I think if you're running into issues where you have like 20 or 50 or 60 function calls, I think you're much better having those specifications saved in a vector database and then have them be retrieved, right? So if there are 30 tools, like you should basically be like ranking them and then using the top K to do selection a little bit better rather than just like shoving like 60 functions into a single. Yeah.Swyx [00:14:40]: Yeah. Well, I mean, so I think this is relevant now because previously I think context limits prevented you from having more than a dozen tools anyway. And now that we have million token context windows, you know, a cloud recently with their new function calling release said they can handle over 250 tools, which is insane to me. That's, that's a lot. You're saying like, you know, you don't think there's many people doing that. I think anyone with a sort of agent like platform where you have a bunch of connectors, they wouldn't run into that problem. Probably you're right that they should use a vector database and kind of rag their tools. I know Zapier has like a few thousand, like 8,000, 9,000 connectors that, you know, obviously don't fit anywhere. So yeah, I mean, I think that would be it unless you need some kind of intelligence that chains things together, which is, I think what Alessio is coming back to, right? Like there's this trend about parallel function calling. I don't know what I think about that. Anthropic's version was, I think they use multiple tools in sequence, but they're not in parallel. I haven't explored this at all. I'm just like throwing this open to you as to like, what do you think about all these new things? Yeah.Jason [00:15:40]: It's like, you know, do we assume that all function calls could happen in any order? In which case, like we either can assume that, or we can assume that like things need to happen in some kind of sequence as a DAG, right? 
But if it's a DAG, really that's just like one JSON object that is the entire DAG rather than going like, okay, the order of the function that return don't matter. That's definitely just not true in practice, right? Like if I have a thing that's like turn the lights on, like unplug the power, and then like turn the toaster on or something like the order doesn't matter. And it's unclear how well you can describe the importance of that reasoning to a language model yet. I mean, I'm sure you can do it with like good enough prompting, but I just haven't any use cases where the function sequence really matters. Yeah.Alessio [00:16:18]: To me, the most interesting thing is the models are better at picking than your ranking is usually. Like I'm incubating a company around system integration. For example, with one system, there are like 780 endpoints. And if you're actually trying to do vector similarity, it's not that good because the people that wrote the specs didn't have in mind making them like semantically apart. You know, they're kind of like, oh, create this, create this, create this. Versus when you give it to a model, like in Opus, you put them all, it's quite good at picking which ones you should actually run. And I'm curious to see if the model providers actually care about some of those workflows or if the agent companies are actually going to build very good rankers to kind of fill that gap.Jason [00:16:58]: Yeah. My money is on the rankers because you can do those so easily, right? You could just say, well, given the embeddings of my search query and the embeddings of the description, I can just train XGBoost and just make sure that I have very high like MRR, which is like mean reciprocal rank. And so the only objective is to make sure that the tools you use are in the top end filtered. Like that feels super straightforward and you don't have to actually figure out how to fine tune a language model to do tool selection anymore. Yeah. I definitely think that's the case because for the most part, I imagine you either have like less than three tools or more than a thousand. I don't know what kind of company said, oh, thank God we only have like 185 tools and this works perfectly, right? That's right.Alessio [00:17:39]: And before we maybe move on just from this, it was interesting to me, you retweeted this thing about Anthropic function calling and it was Joshua Brown's retweeting some benchmark that it's like, oh my God, Anthropic function calling so good. And then you retweeted it and then you tweeted it later and it's like, it's actually not that good. What's your flow? How do you actually test these things? Because obviously the benchmarks are lying, right? Because the benchmarks say it's good and you said it's bad and I trust you more than the benchmark. How do you think about that? And then how do you evolve it over time?Jason [00:18:09]: It's mostly just client data. I actually have been mostly busy with enough client work that I haven't been able to reproduce public benchmarks. And so I can't even share some of the results in Anthropic. I would just say like in production, we have some pretty interesting schemas where it's like iteratively building lists where we're doing like updates of lists, like we're doing in place updates. So like upserts and inserts. And in those situations we're like, oh yeah, we have a bunch of different parsing errors. Numbers are being returned to strings. 
We were expecting lists of objects, but we're getting strings that are like the strings of JSON, right? So we had to call JSON parse on individual elements. Overall, I'm like super happy with the Anthropic models compared to the OpenAI models. Sonnet is very cost effective. Haiku is in function calling, it's actually better, but I think they just had to sort of file down the edges a little bit where like our tests pass, but then we actually deployed a production. We got half a percent of traffic having issues where if you ask for JSON, it'll try to talk to you. Or if you use function calling, you know, we'll have like a parse error. And so I think that definitely gonna be things that are fixed in like the upcoming weeks. But in terms of like the reasoning capabilities, man, it's hard to beat like 70% cost reduction, especially when you're building consumer applications, right? If you're building something for consultants or private equity, like you're charging $400, it doesn't really matter if it's a dollar or $2. But for consumer apps, it makes products viable. If you can go from four to Sonnet, you might actually be able to price it better. Yeah.Swyx [00:19:31]: I had this chart about the ELO versus the cost of all the models. And you could put trend graphs on each of those things about like, you know, higher ELO equals higher cost, except for Haiku. Haiku kind of just broke the lines, or the ISO ELOs, if you want to call it. Cool. Before we go too far into your opinions on just the overall ecosystem, I want to make sure that we map out the surface area of Instructor. I would say that most people would be familiar with Instructor from your talks and your tweets and all that. You had the number one talk from the AI Engineer Summit.Jason [00:20:03]: Two Liu. Jason Liu and Jerry Liu. Yeah.Swyx [00:20:06]: Yeah. Until I actually went through your cookbook, I didn't realize the surface area. How would you categorize the use cases? You have LLM self-critique, you have knowledge graphs in here, you have PII data sanitation. How do you characterize to people what is the surface area of Instructor? Yeah.Jason [00:20:23]: This is the part that feels crazy because really the difference is LLMs give you strings and Instructor gives you data structures. And once you get data structures, again, you can do every lead code problem you ever thought of. Right. And so I think there's a couple of really common applications. The first one obviously is extracting structured data. This is just be, okay, well, like I want to put in an image of a receipt. I want to give it back out a list of checkout items with a price and a fee and a coupon code or whatever. That's one application. Another application really is around extracting graphs out. So one of the things we found out about these language models is that not only can you define nodes, it's really good at figuring out what are nodes and what are edges. And so we have a bunch of examples where, you know, not only do I extract that, you know, this happens after that, but also like, okay, these two are dependencies of another task. And you can do, you know, extracting complex entities that have relationships. Given a story, for example, you could extract relationships of families across different characters. This can all be done by defining a graph. The last really big application really is just around query understanding. 
The idea is that like any API call has some schema and if you can define that schema ahead of time, you can use a language model to resolve a request into a much more complex request. One that an embedding could not do. So for example, I have a really popular post called like rag is more than embeddings. And effectively, you know, if I have a question like this, what was the latest thing that happened this week? That embeds to nothing, right? But really like that query should just be like select all data where the date time is between today and today minus seven days, right? What if I said, how did my writing change between this month and last month? Again, embeddings would do nothing. But really, if you could do like a group by over the month and a summarize, then you could again like do something much more interesting. And so this really just calls out the fact that embeddings really is kind of like the lowest hanging fruit. And using something like instructor can really help produce a data structure. And then you can just use your computer science and reason about the data structure. Maybe you say, okay, well, I'm going to produce a graph where I want to group by each month and then summarize them jointly. You can do that if you know how to define this data structure. Yeah.Swyx [00:22:29]: So you kind of run up against like the LangChains of the world that used to have that. They still do have like the self querying, I think they used to call it when we had Harrison on in our episode. How do you see yourself interacting with the other LLM frameworks in the ecosystem? Yeah.Jason [00:22:42]: I mean, if they use instructor, I think that's totally cool. Again, it's like, it's just Python, right? It's like asking like, oh, how does like Django interact with requests? Well, you just might make a request.get in a Django app, right? But no one would say, I like went off of Django because I'm using requests now. They should be ideally like sort of the wrong comparison in terms of especially like the agent workflows. I think the real goal for me is to go down like the LLM compiler route, which is instead of doing like a react type reasoning loop. I think my belief is that we should be using like workflows. If we do this, then we always have a request and a complete workflow. We can fine tune a model that has a better workflow. Whereas it's hard to think about like, how do you fine tune a better react loop? Yeah. You always train it to have less looping, in which case like you wanted to get the right answer the first time, in which case it was a workflow to begin with, right?Swyx [00:23:31]: Can you define workflow? Because I used to work at a workflow company, but I'm not sure this is a good term for everybody.Jason [00:23:36]: I'm thinking workflow in terms of like the prefect Zapier workflow. Like I want to build a DAG, I want you to tell me what the nodes and edges are. And then maybe the edges are also put in with AI. But the idea is that like, I want to be able to present you the entire plan and then ask you to fix things as I execute it, rather than going like, hey, I couldn't parse the JSON, so I'm going to try again. I couldn't parse the JSON, I'm going to try again. And then next thing you know, you spent like $2 on opening AI credits, right? Yeah. Whereas with the plan, you can just say, oh, the edge between node like X and Y does not run. Let me just iteratively try to fix that, fix the one that sticks, go on to the next component. 
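A minimal sketch of the plan-as-DAG idea from the notes above and this part of the conversation: the model returns typed nodes and edges, and each node can then be executed, retried, or re-planned on its own. The Task/Plan schema and the execution loop are illustrative assumptions, not Instructor's API.

```python
# Sketch: ask for a plan as a DAG of typed tasks instead of a bullet list,
# then walk it in dependency order (assumes instructor >= 1.0; the Task/Plan
# schema and the loop below are illustrative, not Instructor's API).
from typing import List

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

client = instructor.from_openai(OpenAI())


class Task(BaseModel):
    id: int
    description: str
    depends_on: List[int] = Field(default_factory=list)  # edges: prerequisite task ids


class Plan(BaseModel):
    tasks: List[Task]


plan = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Plan,
    messages=[{"role": "user", "content": "Plan the steps to send a weekly client summary email."}],
)

# Execute nodes whose dependencies are satisfied; a failing node can be
# retried or re-planned individually instead of looping the whole agent.
done = set()
pending = {t.id: t for t in plan.tasks}
while pending:
    ready = [t for t in pending.values() if set(t.depends_on) <= done]
    if not ready:
        raise RuntimeError("cycle or unsatisfiable dependency in the plan")
    for task in ready:
        print(f"running task {task.id}: {task.description}")
        done.add(task.id)
        del pending[task.id]
```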
And obviously you can get into a world where if you have enough examples of the nodes X and Y, maybe you can use like a vector database to find a good few shot examples. You can do a lot if you sort of break down the problem into that workflow and executing that workflow, rather than looping and hoping the reasoning is good enough to generate the correct output. Yeah.Swyx [00:24:35]: You know, I've been hammering on Devon a lot. I got access a couple of weeks ago. And obviously for simple tasks, it does well. For the complicated, like more than 10, 20 hour tasks, I can see- That's a crazy comparison.Jason [00:24:47]: We used to talk about like three, four loops. Only once it gets to like hour tasks, it's hard.Swyx [00:24:54]: Yeah. Less than an hour, there's nothing.Jason [00:24:57]: That's crazy.Swyx [00:24:58]: I mean, okay. Maybe my goalposts have shifted. I don't know. That's incredible.Jason [00:25:02]: Yeah. No, no. I'm like sub one minute executions. Like the fact that you're talking about 10 hours is incredible.Swyx [00:25:08]: I think it's a spectrum. I think I'm going to say this every single time I bring up Devon. Let's not reward them for taking longer to do things. Do you know what I mean? I think that's a metric that is easily abusable.Jason [00:25:18]: Sure. Yeah. You know what I mean? But I think if you can monotonically increase the success probability over an hour, that's winning to me. Right? Like obviously if you run an hour and you've made no progress. Like I think when we were in like auto GBT land, there was that one example where it's like, I wanted it to like buy me a bicycle overnight. I spent $7 on credit and I never found the bicycle. Yeah.Swyx [00:25:41]: Yeah. Right. I wonder if you'll be able to purchase a bicycle. Because it actually can do things in real world. It just needs to suspend to you for off and stuff. The point I was trying to make was that I can see it turning plans. I think one of the agents loopholes or one of the things that is a real barrier for agents is LLMs really like to get stuck into a lane. And you know what you're talking about, what I've seen Devon do is it gets stuck in a lane and it will just kind of change plans based on the performance of the plan itself. And it's kind of cool.Jason [00:26:05]: I feel like we've gone too much in the looping route and I think a lot of more plans and like DAGs and data structures are probably going to come back to help fill in some holes. Yeah.Alessio [00:26:14]: What do you think of the interface to that? Do you see it's like an existing state machine kind of thing that connects to the LLMs, the traditional DAG players? Do you think we need something new for like AI DAGs?Jason [00:26:25]: Yeah. I mean, I think that the hard part is going to be describing visually the fact that this DAG can also change over time and it should still be allowed to be fuzzy. I think in like mathematics, we have like plate diagrams and like Markov chain diagrams and like recurrent states and all that. Some of that might come into this workflow world. But to be honest, I'm not too sure. I think right now, the first steps are just how do we take this DAG idea and break it down to modular components that we can like prompt better, have few shot examples for and ultimately like fine tune against. But in terms of even the UI, it's hard to say what it will likely win. I think, you know, people like Prefect and Zapier have a pretty good shot at doing a good job.Swyx [00:27:03]: Yeah. You seem to use Prefect a lot. 
I actually worked at a Prefect competitor at Temporal and I'm also very familiar with Dagster. What else would you call out as like particularly interesting in the AI engineering stack?Jason [00:27:13]: Man, I almost use nothing. I just use Cursor and like PyTests. Okay. I think that's basically it. You know, a lot of the observability companies have... The more observability companies I've tried, the more I just use Postgres.Swyx [00:27:29]: Really? Okay. Postgres for observability?Jason [00:27:32]: But the issue really is the fact that these observability companies isn't actually doing observability for the system. It's just doing the LLM thing. Like I still end up using like Datadog or like, you know, Sentry to do like latency. And so I just have those systems handle it. And then the like prompt in, prompt out, latency, token costs. I just put that in like a Postgres table now.Swyx [00:27:51]: So you don't need like 20 funded startups building LLM ops? Yeah.Jason [00:27:55]: But I'm also like an old, tired guy. You know what I mean? Like I think because of my background, it's like, yeah, like the Python stuff, I'll write myself. But you know, I will also just use Vercel happily. Yeah. Yeah. So I'm not really into that world of tooling, whereas I think, you know, I spent three good years building observability tools for recommendation systems. And I was like, oh, compared to that, Instructor is just one call. I just have to put time star, time and then count the prompt token, right? Because I'm not doing a very complex looping behavior. I'm doing mostly workflows and extraction. Yeah.Swyx [00:28:26]: I mean, while we're on this topic, we'll just kind of get this out of the way. You famously have decided to not be a venture backed company. You want to do the consulting route. The obvious route for someone as successful as Instructor is like, oh, here's hosted Instructor with all tooling. Yeah. You just said you had a whole bunch of experience building observability tooling. You have the perfect background to do this and you're not.Jason [00:28:43]: Yeah. Isn't that sick? I think that's sick.Swyx [00:28:44]: I mean, I know why, because you want to go free dive.Jason [00:28:47]: Yeah. Yeah. Because I think there's two things. Right. Well, one, if I tell myself I want to build requests, requests is not a venture backed startup. Right. I mean, one could argue whether or not Postman is, but I think for the most part, it's like having worked so much, I'm more interested in looking at how systems are being applied and just having access to the most interesting data. And I think I can do that more through a consulting business where I can come in and go, oh, you want to build perfect memory. You want to build an agent. You want to build like automations over construction or like insurance and supply chain, or like you want to handle writing private equity, mergers and acquisitions reports based off of user interviews. Those things are super fun. Whereas like maintaining the library, I think is mostly just kind of like a utility that I try to keep up, especially because if it's not venture backed, I have no reason to sort of go down the route of like trying to get a thousand integrations. In my mind, I just go like, okay, 98% of the people use open AI. I'll support that. And if someone contributes another platform, that's great. I'll merge it in. Yeah.Swyx [00:29:45]: I mean, you only added Anthropic support this year. Yeah.Jason [00:29:47]: Yeah. 
You couldn't even get an API key until like this year, right? That's true. Okay. If I add it like last year, I was trying to like double the code base to service, you know, half a percent of all downloads.Swyx [00:29:58]: Do you think the market share will shift a lot now that Anthropic has like a very, very competitive offering?Jason [00:30:02]: I think it's still hard to get API access. I don't know if it's fully GA now, if it's GA, if you can get a commercial access really easily.Alessio [00:30:12]: I got commercial after like two weeks to reach out to their sales team.Jason [00:30:14]: Okay.Alessio [00:30:15]: Yeah.Swyx [00:30:16]: Two weeks. It's not too bad. There's a call list here. And then anytime you run into rate limits, just like ping one of the Anthropic staff members.Jason [00:30:21]: Yeah. Then maybe we need to like cut that part out. So I don't need to like, you know, spread false news.Swyx [00:30:25]: No, it's cool. It's cool.Jason [00:30:26]: But it's a common question. Yeah. Surely just from the price perspective, it's going to make a lot of sense. Like if you are a business, you should totally consider like Sonnet, right? Like the cost savings is just going to justify it if you actually are doing things at volume. And yeah, I think the SDK is like pretty good. Back to the instructor thing. I just don't think it's a billion dollar company. And I think if I raise money, the first question is going to be like, how are you going to get a billion dollar company? And I would just go like, man, like if I make a million dollars as a consultant, I'm super happy. I'm like more than ecstatic. I can have like a small staff of like three people. It's fun. And I think a lot of my happiest founder friends are those who like raised a tiny seed round, became profitable. They're making like 70, 60, 70, like MRR, 70,000 MRR and they're like, we don't even need to raise the seed round. Let's just keep it like between me and my co-founder, we'll go traveling and it'll be a great time. I think it's a lot of fun.Alessio [00:31:15]: Yeah. 
like say LLMs / AI and they build some open source stuff and it's like I should just raise money and do this and I tell people a lot it's like look you can make a lot more money doing something else than doing a startup like most people that do a company could make a lot more money just working somewhere else than the company itself do you have any advice for folks that are maybe in a similar situation they're trying to decide oh should I stay in my like high paid FAANG job and just tweet this on the side and do this on github should I go be a consultant like being a consultant seems like a lot of work so you got to talk to all these people you know there's a lot to unpackJason [00:31:54]: I think the open source thing is just like well I'm just doing it purely for fun and I'm doing it because I think I'm right but part of being right is the fact that it's not a venture backed startup like I think I'm right because this is all you need right so I think a part of the philosophy is the fact that all you need is a very sharp blade to sort of do your work and you don't actually need to build like a big enterprise so that's one thing I think the other thing too that I've kind of been thinking around just because I have a lot of friends at google that want to leave right now it's like man like what we lack is not money or skill like what we lack is courage you should like you just have to do this a hard thing and you have to do it scared anyways right in terms of like whether or not you do want to do a founder I think that's just a matter of optionality but I definitely recognize that the like expected value of being a founder is still quite low it is right I know as many founder breakups and as I know friends who raised a seed round this year right like that is like the reality and like you know even in from that perspective it's been tough where it's like oh man like a lot of incubators want you to have co-founders now you spend half the time like fundraising and then trying to like meet co-founders and find co-founders rather than building the thing this is a lot of time spent out doing uh things I'm not really good at. 
I do think there's a rising trend in solo founding yeah.Swyx [00:33:06]: You know I am a solo I think that something like 30 percent of like I forget what the exact status something like 30 percent of starters that make it to like series B or something actually are solo founder I feel like this must have co-founder idea mostly comes from YC and most everyone else copies it and then plenty of companies break up over co-founderJason [00:33:27]: Yeah and I bet it would be like I wonder how much of it is the people who don't have that much like and I hope this is not a diss to anybody but it's like you sort of you go through the incubator route because you don't have like the social equity you would need is just sort of like send an email to Sequoia and be like hey I'm going on this ride you want a ticket on the rocket ship right like that's very hard to sell my message if I was to raise money is like you've seen my twitter my life is sick I've decided to make it much worse by being a founder because this is something I have to do so do you want to come along otherwise I want to fund it myself like if I can't say that like I don't need the money because I can like handle payroll and like hire an intern and get an assistant like that's all fine but I really don't want to go back to meta I want to like get two years to like try to find a problem we're solving that feels like a bad timeAlessio [00:34:12]: Yeah Jason is like I wear a YSL jacket on stage at AI Engineer Summit I don't need your accelerator moneyJason [00:34:18]: And boots, you don't forget the boots. But I think that is a part of it right I think it is just like optionality and also just like I'm a lot older now I think 22 year old Jason would have been probably too scared and now I'm like too wise but I think it's a matter of like oh if you raise money you have to have a plan of spending it and I'm just not that creative with spending that much money yeah I mean to be clear you just celebrated your 30th birthday happy birthday yeah it's awesome so next week a lot older is relative to some some of the folks I think seeing on the career tipsAlessio [00:34:48]: I think Swix had a great post about are you too old to get into AI I saw one of your tweets in January 23 you applied to like Figma, Notion, Cohere, Anthropic and all of them rejected you because you didn't have enough LLM experience I think at that time it would be easy for a lot of people to say oh I kind of missed the boat you know I'm too late not gonna make it you know any advice for people that feel like thatJason [00:35:14]: Like the biggest learning here is actually from a lot of folks in jiu-jitsu they're like oh man like is it too late to start jiu-jitsu like I'll join jiu-jitsu once I get in more shape right it's like there's a lot of like excuses and then you say oh like why should I start now I'll be like 45 by the time I'm any good and say well you'll be 45 anyways like time is passing like if you don't start now you start tomorrow you're just like one more day behind if you're worried about being behind like today is like the soonest you can start right and so you got to recognize that like maybe you just don't want it and that's fine too like if you wanted you would have started I think a lot of these people again probably think of things on a too short time horizon but again you know you're gonna be old anyways you may as well just start now you knowSwyx [00:35:55]: One more thing on I guess the um career advice slash sort of vlogging you always go viral for 
this post that you wrote on advice to young people and the lies you tell yourself oh yeah yeah you said you were writing it for your sister.Jason [00:36:05]: She was like bummed out about going to college and like stressing about jobs and I was like oh and I really want to hear okay and I just kind of like text-to-sweep the whole thing it's crazy it's got like 50,000 views like I'm mind I mean your average tweet has more but that thing is like a 30-minute read nowSwyx [00:36:26]: So there's lots of stuff here which I agree with I you know I'm also of occasionally indulge in the sort of life reflection phase there's the how to be lucky there's the how to have high agency I feel like the agency thing is always a trend in sf or just in tech circles how do you define having high agencyJason [00:36:42]: I'm almost like past the high agency phase now now my biggest concern is like okay the agency is just like the norm of the vector what also matters is the direction right it's like how pure is the shot yeah I mean I think agency is just a matter of like having courage and doing the thing that's scary right you know if people want to go rock climbing it's like do you decide you want to go rock climbing then you show up to the gym you rent some shoes and you just fall 40 times or do you go like oh like I'm actually more intelligent let me go research the kind of shoes that I want okay like there's flatter shoes and more inclined shoes like which one should I get okay let me go order the shoes on Amazon I'll come back in three days like oh it's a little bit too tight maybe it's too aggressive I'm only a beginner let me go change no I think the higher agent person just like goes and like falls down 20 times right yeah I think the higher agency person is more focused on like process metrics versus outcome metrics right like from pottery like one thing I learned was if you want to be good at pottery you shouldn't count like the number of cups or bowls you make you should just weigh the amount of clay you use right like the successful person says oh I went through 100 pounds of clay right the less agency was like oh I've made six cups and then after I made six cups like there's not really what are you what do you do next no just pounds of clay pounds of clay same with the work here right so you just got to write the tweets like make the commits contribute open source like write the documentation there's no real outcome it's just a process and if you love that process you just get really good at the thing you're doingSwyx [00:38:04]: yeah so just to push back on this because obviously I mostly agree how would you design performance review systems because you were effectively saying we can count lines of code for developers rightJason [00:38:15]: I don't think that would be the actual like I think if you make that an outcome like I can just expand a for loop right I think okay so for performance review this is interesting because I've mostly thought of it from the perspective of science and not engineering I've been running a lot of engineering stand-ups primarily because there's not really that many machine learning folks the process outcome is like experiments and ideas right like if you think about outcome is what you might want to think about an outcome is oh I want to improve the revenue or whatnot but that's really hard but if you're someone who is going out like okay like this week I want to come up with like three or four experiments I might move the needle okay nothing worked to them they might 
think oh nothing worked like I suck but to me it's like wow you've closed off all these other possible avenues for like research like you're gonna get to the place that you're gonna figure out that direction really soon there's no way you try 30 different things and none of them work usually like 10 of them work five of them work really well two of them work really really well and one thing was like the nail in the head so agency lets you sort of capture the volume of experiments and like experience lets you figure out like oh that other half it's not worth doing right I think experience is going like half these prompting papers don't make any sense just use chain of thought and just you know use a for loop that's basically right it's like usually performance for me is around like how many experiments are you running how oftentimes are you trying.Alessio [00:39:32]: When do you give up on an experiment because a StitchFix you kind of give up on language models I guess in a way as a tool to use and then maybe the tools got better you were right at the time and then the tool improved I think there are similar paths in my engineering career where I try one approach and at the time it doesn't work and then the thing changes but then I kind of soured on that approach and I don't go back to it soonJason [00:39:51]: I see yeah how do you think about that loop so usually when I'm coaching folks and as they say like oh these things don't work I'm not going to pursue them in the future like one of the big things like hey the negative result is a result and this is something worth documenting like this is an academia like if it's negative you don't just like not publish right but then like what do you actually write down like what you should write down is like here are the conditions this is the inputs and the outputs we tried the experiment on and then one thing that's really valuable is basically writing down under what conditions would I revisit these experiments these things don't work because of what we had at the time if someone is reading this two years from now under what conditions will we try again that's really hard but again that's like another skill you kind of learn right it's like you do go back and you do experiments you figure out why it works now I think a lot of it here is just like scaling worked yeah rap lyrics you know that was because I did not have high enough quality data if we phase shift and say okay you don't even need training data oh great then it might just work a different domainAlessio [00:40:48]: Do you have anything in your list that is like it doesn't work now but I want to try it again later? 
Something that people should maybe keep in mind, you know, people always ask about AGI, like, when are you going to know the AGI is here? Maybe it's less than that, but any stuff that you tried recently that didn't work, that you think will get there?Jason [00:41:01]: I mean, I think the personal assistance and the writing, I've shown to myself it's just not good enough yet, so I hired a writer and I hired a personal assistant. So now I'm going to basically work with these people until I figure out what I can actually automate and what the reproducible steps are. But I think the experiment for me is, I'm going to go pay a person like a thousand dollars a month to help me improve my life, and then let me get them to help me figure out what the components are and how I actually modularize something to get it to work, because it's not just, like, Gmail, Calendar, and Notion, it's a little bit more complicated than that, but we just don't know what that is yet. Those are two sort of systems where I wish GPT-4 or Opus was actually good enough to just write me an essay, but most of the essays are still pretty bad.Swyx [00:41:44]: Yeah, I would say, you know, on the personal assistant side, Lindy is probably the one I've seen the most, Flo was a speaker at the summit. I don't know if you've checked it out, or any other sort of agent assistant startup.Jason [00:41:54]: Not recently, I haven't tried Lindy, they were not GA last time I was considering it. Yeah, a lot of it now is like, oh, really what I want you to do is take a look at all of my meetings and write a really good weekly summary email for my clients, to remind them that I'm, you know, thinking of them and working for them, right? Or it's like, I want you to notice that my Monday is way too packed and block out more time, and also email the people to do the reschedule, and then try to opt in to move them around. And then I want you to say, oh, Jason should have a 15-minute prep break after four back-to-backs. Those are things that now I know I can prompt them in, but can it do it well? Like, before, I didn't even know that's what I wanted to prompt for: defragging a calendar and adding breaks so I can eat lunch. Yeah, that's the AGI test. Yeah, exactly, compassion, right? I think one thing that, yeah, we didn't touch on before, butAlessio [00:42:44]: I think was interesting, you had this tweet a while ago about how prompts should be code, and then there were a lot of companies trying to build prompt engineering tooling, kind of trying to turn the prompt into a more structured thing. What's your thought today, now that you want to turn the thinking into DAGs? Should prompts still be code? Any updated ideas?Jason [00:43:04]: It's the same thing, right? I think, you know, with Instructor it is very much like, the output model is defined as a code object, that code object is sent to the LLM, and in return you get a data structure. So the outputs of these models, I think, should also be code objects, and the inputs somewhat should be code objects. But I think the one thing that Instructor tries to do is separate instruction, data, and the types of the output. And beyond that, I really just think that most of it should still be managed pretty closely to the developer, because so much of this is changing that if you give control of these systems away too early, you end up ultimately wanting them back. Like, many companies I know, that I reach out to or work with, were like, oh, we're going off of the frameworks, because now that we know
what the business outcomes are that we're trying to optimize for, these frameworks don't work. Yeah, because we do RAG, but we want to do RAG to, like, sell you supplements or to have you schedule the fitness appointment. The prompts are kind of too baked into the systems to really pull them back out and start doing upselling or something. It's really funny, but a lot of it ends up being, like, once you understand the business outcomes, you care way more about the prompt.Swyx [00:44:07]: Actually, this is fun. In our prep for this call we were trying to say, like, what can you as an independent person say that maybe me and Alessio cannot say, or, you know, someone at a company. What do you think is the market share of the frameworks, the LangChain, the LlamaIndex, the everything...Jason [00:44:20]: Oh, massive, because not everyone wants to care about the code. Yeah, right. I think that's a different question to, like, what is the business model, and are they going to be massively profitable businesses, right, making hundreds of millions of dollars? That feels like so straightforward, right, because not everyone is a prompt engineer, like, there's so much productivity to be captured in, like, back office automations, right? It's not because they care about the prompts that they care about managing these things. Yeah, but those would be sort of low-code experiences. Yeah, I think the bigger challenge is like, okay, a hundred million dollars, probably pretty easy, it's just time and effort, and they have the manpower and the money to sort of solve those problems. Again, if you go the VC route, then it's like you're talking about billions, and that's really the goal. That stuff, for me, is pretty unclear. But again, that is to say that, like, I sort of am building things for developers who want to use infrastructure to build their own tooling, in terms of the amount of developers there are in the world versus downstream consumers of these things. Or even just think of how many companies will use, like, the Adobes and the IBMs, right, because they want something that's fully managed, and they want something that they know will work, and if the incremental 10% requires you to hire another team of 20 people, you might not want to do it. And I think that kind of organization is really good for, uh, those are bigger companies.Swyx [00:45:32]: I just want to capture your thoughts on one more thing, which is, you said you wanted most of the prompts to stay close to the developer, and Hamel Husain wrote this post which I really love, called F*** You, Show Me the Prompt. I think he cites you in one part of the blog post, and I think DSPy is kind of like the complete antithesis of that, which I think is interesting, because I also hold the strong view that AI is a better prompt engineer than you are, and I don't know how to square that. Wondering if you have thoughts.Jason [00:45:58]: I think something like DSPy can work because there are very short-term metrics to measure success, right? It is like, did you find the PII, or did you write the multi-hop question the correct way? But in these workflows that I've been managing, a lot of it is, are we minimizing churn and maximizing retention? Yeah, that's a very long loop. It's not really like an Optuna-like training loop, right? Those things are much harder to capture, so we don't actually have those metrics for that, right? And obviously we can figure out, like, okay, is the summary good, but how do you measure the quality of the summary? It's like, that feedback loop, it ends up being a lot
longer. And then again, when something changes, it's really hard to make sure that it works across these newer models, or again, that changes work for the current process. Like, when we migrate from, say, Anthropic to OpenAI, there's just a ton of changes that are infrastructure related, not necessarily around the prompt itself. Yeah, cool. Any other AI engineering startups that you think should not exist, before we wrap up? I mean, oh my gosh, a lot of it, again, it's just like, every time with investors it's like, how does this make a billion dollars? Like, it doesn't. I'm gonna go back to just, like, tweeting and holding my breath underwater. Yeah, I don't really pay attention too much to most of this. Most of the stuff I'm doing is around, like, the consumer of LLM calls. Yep. I think people just want to move really fast and they will end up picking these vendors, but I don't really know if anything has really blown me out of the water. Like, I only trust myself, but that's also a function of just being an old man. I think, you know, many companies are definitely very happy with using most of these tools anyways, but I definitely think I occupy a very small space in the engineering ecosystem.Swyx [00:47:41]: Yeah, I would say one of the challenges here, you know, you talk about dealing in the consumer-of-LLMs space, I think that's where AI engineering differs from ML engineering, and I think a constant disconnect or cognitive dissonance in this field, in the AI engineers that have sprung up, is that they are not as good as the ML engineers, they are not as qualified. I think that, you know, you are someone who has credibility in the MLE space, and you are also a very authoritative figure in the AI space, and, you know, I think you've built the de facto leading library. I think Instructor should be part of the standard lib, even though I try to not use it. Like, I basically also end up rebuilding Instructor, right? That's a lot of the back and forth that we had over the past two days. I think that's the fundamental thing that we're trying to figure out: there's a very small supply of MLEs, not everyone's going to have the experience that you had, but the global demand for AI is going to far outstrip the existing MLEs.Jason [00:48:36]: So what do we do? Do we force everyone to go through the standard MLE curriculum, or do we make a new one? I'
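For readers who want to see the "typed response" idea Jason describes above in practice, where the output model is a code object and the return value is a validated data structure, here is a minimal sketch using Instructor with a Pydantic model. It is a hedged illustration assuming current instructor and openai package conventions (instructor.from_openai, a response_model argument); the model name and the MeetingSummary schema are invented for the example.

```python
# Minimal sketch of the "typed response" pattern discussed above.
# Assumes the instructor, pydantic, and openai packages (Python 3.9+); API surface may vary by version.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class MeetingSummary(BaseModel):      # hypothetical schema, invented for illustration
    client_name: str
    key_points: list[str]
    follow_up_needed: bool

# Patch the OpenAI client so chat completions accept a response_model.
client = instructor.from_openai(OpenAI())

summary = client.chat.completions.create(
    model="gpt-4o",
    response_model=MeetingSummary,    # the "code object" sent along with the prompt
    messages=[{"role": "user", "content": "Summarize this meeting transcript: ..."}],
)
print(summary.key_points)             # a validated Python object, not a raw string
```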

Future of Data and AI
Jerry Liu: Generative AI, LLMs, LlamaIndex, RAG, Entrepreneurship, and Society, with Raja Iqbal

Future of Data and AI

Play Episode Listen Later Mar 27, 2024 79:13


Are LLMs useful for enterprises? Well, what is the use of a large language model that is trained on trillions of tokens but knows little to nothing about your business? To make LLMs actually useful for enterprises, it is important for them to retrieve a company's data effectively. LlamaIndex has been at the forefront of providing such solutions and frameworks to augment LLMs. In this episode, Jerry Liu, Co-founder and CEO of LlamaIndex, joins Raja Iqbal, CEO and Chief Data Scientist at Data Science Dojo, for a deep dive into the intersection of generative AI, data, and entrepreneurship. Jerry walks us through the cutting-edge technologies reshaping the generative AI landscape, such as LlamaIndex. He also explores Retrieval Augmented Generation (RAG) and fine-tuning in detail, discussing their benefits, trade-offs, use cases, and enterprise adoption, making these complex tools and topics not just easily understandable but also fascinating. Jerry further ventures into the heart of entrepreneurship, sharing valuable lessons and insights learned along his journey, from navigating his corporate career at tech giants like Apple, Quora, Two Sigma, and Uber, to starting out as a founder in the data and AI landscape. Amidst the excitement of innovation, Raja and Jerry also address the potential risks and considerations that come with generative AI. They raise thought-provoking questions about its impact on society, for instance, whether we are trading critical thinking for convenience. Whether you're a generative AI enthusiast, a seasoned entrepreneur, or simply curious about the future, this podcast promises plenty of knowledge and insights for you.
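To make the retrieval-augmented generation workflow discussed in this episode concrete, a minimal LlamaIndex sketch looks roughly like the following. This is a hedged example based on the library's quickstart conventions; it assumes the llama-index package with the post-0.10 llama_index.core import layout and an OpenAI API key in the environment, and the data/ folder and the query string are placeholders.

```python
# Minimal RAG sketch with LlamaIndex: index local documents, then query over them.
# Import paths follow the llama_index.core layout and may differ in older releases.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Load the enterprise documents the base LLM knows nothing about.
documents = SimpleDirectoryReader("data").load_data()   # "data/" is a placeholder folder

# 2. Embed and index them so relevant chunks can be retrieved at query time.
index = VectorStoreIndex.from_documents(documents)

# 3. Retrieval-augmented generation: retrieve top chunks, add them to the prompt, answer.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What did our Q3 customer contracts say about renewal terms?")
print(response)
```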

Taiwan Salon
Taiwan Salon, Season 3, Episode 1: Jerry Liu on Cultural Policy after the 2024 Elections

Taiwan Salon

Play Episode Listen Later Mar 27, 2024 39:02


In this episode of Taiwan Salon, host and GTI Research Associate Adrienne Wu interviews Dr. Jerry C. Y. Liu, dean of the College of Humanities at the National Taiwan University of Arts and the founding president of Taiwan Association of Cultural Policy Studies (TACPS). For our season three premiere, we speak with Dr. Liu about TACPS' recent cultural petition, recommendations for the incoming administration, and reflections on the 2024 elections.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Top 5 Research Trends + OpenAI Sora, Google Gemini, Groq Math (Jan-Feb 2024 Audio Recap) + Latent Space Anniversary with Lindy.ai, RWKV, Pixee, Julius.ai, Listener Q&A!

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Mar 9, 2024 108:52


We will be recording a preview of the AI Engineer World's Fair soon with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have!Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an ex-technical co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc). Reach out to him for more!Thanks for all the love on the Four Wars episode! We're excited to develop this new “swyx & Alessio rapid-fire thru a bunch of things” format with you, and feedback is welcome. Jan 2024 RecapThe first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024:Feb 2024 RecapThe second half catches you up on everything that was topical in Feb, including:* OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan * Google Gemini Pro 1.5 - 1m Long Context, Video Understanding* Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs dylan math)* The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!)* Grimes' poetic take: Art for no one, by no one* F*** you, show me the promptLatent Space AnniversaryPlease also read Alessio's longform reflections on One Year of Latent Space!We launched the podcast 1 year ago with Logan from OpenAI:and also held an incredible demo day that got covered in The Information:Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs.The winners were Pixee and RWKV (that's Eugene from our pod!):And finally, your cohosts got cake!We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China:* Balázs Némethi* Sylvia Tong* RJ Honicky* Jan ZhengOur birthday wishes for the super loyal fans reading this - tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend!As always, feedback is welcome. Timestamps* [00:03:02] Top Five LLM Directions* [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)* [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN)* [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations)* [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)* [00:23:33] Wildcards: Text Diffusion, RALM/Retro* [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)* [00:28:26] Wildcard: Model Merging (mergekit)* [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa)* [00:33:18] OpenAI Sora and why everyone underestimated videogen* [00:36:18] Does Sora have a World Model? 
Yann LeCun vs Jim Fan* [00:42:33] Groq Math* [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars* [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take* [00:58:39] F*** you, show me the prompt* [01:02:43] Send us your suggestions pls* [01:04:50] Latent Space Anniversary* [01:04:50] Lindy.ai - Agent Platform* [01:06:40] RWKV - Beyond Transformers* [01:15:00] Pixee - Automated Security* [01:19:30] Julius AI - Competing with Code Interpreter* [01:25:03] Latent Space Listeners* [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club* [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)* [01:31:23] Listener 3 - RJ (Developers building Community & Content)* [01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)Transcript[00:00:00] AI Charlie: Welcome to the Latent Space podcast, weekend edition. This is Charlie, your new AI co host. Happy weekend. As an AI language model, I work the same every day of the week, although I might get lazier towards the end of the year. Just like you. Last month, we released our first monthly recap pod, where Swyx and Alessio gave quick takes on the themes of the month, and we were blown away by your positive response.[00:00:33] AI Charlie: We're delighted to continue our new monthly news recap series for AI engineers. Please feel free to submit questions by joining the Latent Space Discord, or just hit reply when you get the emails from Substack. This month, we're covering the top research directions that offer progress for text LLMs, and then touching on the big Valentine's Day gifts we got from Google, OpenAI, and Meta.[00:00:55] AI Charlie: Watch out and take care.[00:00:57] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO of Residence at Decibel Partners, and we're back with a monthly recap with my co host[00:01:06] swyx: Swyx. The reception was very positive for the first one, I think people have requested this and no surprise that I think they want to hear us more applying on issues and maybe drop some alpha along the way I'm not sure how much alpha we have to drop, this month in February was a very, very heavy month, we also did not do one specifically for January, so I think we're just going to do a two in one, because we're recording this on the first of March.[00:01:29] Alessio: Yeah, let's get to it. I think the last one we did, the four wars of AI, was the main kind of mental framework for people. I think in the January one, we had the five worthwhile directions for state of the art LLMs. Four, five,[00:01:42] swyx: and now we have to do six, right? Yeah.[00:01:46] Alessio: So maybe we just want to run through those, and then do the usual news recap, and we can do[00:01:52] swyx: one each.[00:01:53] swyx: So the context to this stuff. is one, I noticed that just the test of time concept from NeurIPS and just in general as a life philosophy I think is a really good idea. Especially in AI, there's news every single day, and after a while you're just like, okay, like, everyone's excited about this thing yesterday, and then now nobody's talking about it.[00:02:13] swyx: So, yeah. It's more important, or better use of time, to spend things, spend time on things that will stand the test of time. And I think for people to have a framework for understanding what will stand the test of time, they should have something like the four wars. 
Like, what are the themes that keep coming back? Because they are limited resources that everybody's fighting over.[00:02:31] swyx: Whereas this one, I think that the focus for the five directions is just on research that seems more promising than others, because there's all sorts of papers published every single day, and there's no organization telling you, like, this one's more important than the other one, apart from, you know, Hacker News votes and Twitter likes and whatever.[00:02:51] swyx: And obviously you want to get in a little bit earlier than something where, you know, the test of time is counted by sort of reference citations.[00:02:59] The Five Research Directions[00:02:59] Alessio: Yeah, let's do it. We got five. Long inference.[00:03:02] swyx: Let's start there. Yeah, yeah. So, just to recap at the top, the five trends that I picked, and obviously if you have some that I did not cover, please suggest something.[00:03:13] swyx: The five are long inference, synthetic data, alternative architectures, mixture of experts, and online LLMs. And something that I think might be a bit controversial is this is a sorted list in the sense that I am not the guy saying that Mamba is like the future and, and so maybe that's controversial.[00:03:31] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)[00:03:31] swyx: But anyway, so long inference is a thesis I pushed before on the newsletter and in discussing the thesis that, you know, Code Interpreter is GPT 4.5. That was the title of the post. And it's one of many ways in which we can do long inference. You know, long inference also includes chain of thought, like, please think step by step.[00:03:52] swyx: But it also includes flow engineering, which is what Itamar from Codium coined, I think in January, where, basically, instead of stuffing everything in a prompt, you do sort of multi turn iterative feedback and chaining of things. In a way, this is a rebranding of what a chain, what a LangChain, is supposed to be.[00:04:15] swyx: I do think that maybe SGLang from LMSys is a better name. Probably the neatest way of flow engineering I've seen yet, in the sense that everything is a one liner, it's very, very clean code. I highly recommend people look at that. I'm surprised it hasn't caught on more, but I think it will. It's weird that something like DSPy is more hyped than an SGLang.[00:04:36] swyx: Because it, you know, it maybe obscures the code a little bit more. But both of these are, you know, really good sort of chain-y and long inference type approaches. But basically, the fundamental insight is that there are only a few dimensions we can scale LLMs on. So, let's say in like 2018, 2017, 18, 19, 20, we were realizing that we could scale the number of parameters.[00:05:03] swyx: And we scaled that up to 175 billion parameters for GPT 3. And we did some work on scaling laws, which we also talked about in our talk. So the Datasets 101 episode where we're like, okay, like we, we think like the right number is 300 billion tokens to, to train 175 billion parameters, and then DeepMind came along and trained Gopher and Chinchilla and said that, no, no, like, you know, we think the optimal[00:05:28] swyx: compute optimal ratio is 20 tokens per parameter. And now, of course, with LLAMA and the sort of super LLAMA scaling laws, we have 200 times and often 2,000 times tokens to parameters.
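As a rough worked example of the ratios being quoted here, assuming the 20-tokens-per-parameter Chinchilla rule of thumb and the much larger multiples mentioned for Llama-style over-training, the implied token budgets look something like this (illustrative arithmetic only):

```python
# Back-of-the-envelope token-budget arithmetic for the scaling discussion above (illustrative only).
def token_budget(n_params: float, tokens_per_param: float) -> float:
    """Training tokens implied by a given tokens-per-parameter ratio."""
    return n_params * tokens_per_param

gpt3_params = 175e9
print(f"GPT-3 era recipe (~1.7 tok/param): {token_budget(gpt3_params, 1.7):.2e} tokens")   # ~3e11, i.e. ~300B
print(f"Chinchilla-optimal (20 tok/param): {token_budget(gpt3_params, 20):.2e} tokens")    # ~3.5e12, i.e. ~3.5T
print(f"Over-trained 7B (2,000 tok/param): {token_budget(7e9, 2000):.2e} tokens")          # ~1.4e13, i.e. ~14T
```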
So now, instead of scaling parameters, we're scaling data. And fine, we can keep scaling data. But what else can we scale?[00:05:52] swyx: And I think understanding the ability to scale things is crucial to understanding what to pour money and time and effort into, because there's a limit to how much you can scale some things. And I think people don't think about ceilings of things. And so the remaining ceiling of inference is like, okay, like, we have scaled compute, we have scaled data, we have scaled parameters, like, model size, let's just say.[00:06:20] swyx: Like, what else is left? Like, what's the low hanging fruit? And it, and it's, like, blindingly obvious that the remaining low hanging fruit is inference time. So, like, we have scaled training time. We can probably scale those things more, but, like, not 10x, not 100x, not 1000x. Like, right now, maybe, like, a good run of a large model is three months.[00:06:40] swyx: We can scale that to three years. But like, can we scale that to 30 years? No, right? Like, it starts to get ridiculous. So it's just the orders of magnitude of scaling. We're just running out there. But in terms of the amount of time that we spend inferencing, like everything takes, you know, a few milliseconds, a few hundred milliseconds, depending on how you're taking it, token by token, or, you know, an entire phrase.[00:07:04] swyx: But we can scale that to hours, days, months of inference and see what we get. And I think that's really promising.[00:07:11] Alessio: Yeah, we'll have Mike from BrightWave back on the podcast. But I tried their product and their reports take about 10 minutes to generate instead of like just in real time. I think to me the most interesting thing about long inference is like, you're shifting the cost to the customer depending on how much they care about the end result.[00:07:31] Alessio: If you think about prompt engineering, it's like the first part, right? You can either do a simple prompt and get a simple answer or do a complicated prompt and get a better answer. It's up to you to decide how to do it. Now it's like, hey, instead of like, yeah, training this for three years, I'll still train it for three months and then I'll tell you, you know, I'll teach you how to like make it run for 10 minutes to get a better result.[00:07:52] Alessio: So you're kind of like parallelizing like the improvement of the LLM. Oh yeah, you can even[00:07:57] swyx: parallelize that, yeah, too.[00:07:58] Alessio: So, and I think, you know, for me, especially the work that I do, it's less about, you know, state of the art and the absolute, you know, it's more about state of the art for my application, for my use case.[00:08:09] Alessio: And I think we're getting to the point where like most companies and customers don't really care about state of the art anymore. It's like, I can get this to do a good enough job. You know, I just need to get better. Like, how do I do long inference? You know, like people are not really doing a lot of work in that space, so yeah, excited to see more.[00:08:28] swyx: So then the last point I'll mention here is something I also mentioned as a paper. So all these directions are kind of guided by what happened in January. That was my way of doing a January recap. Which means that if there was nothing significant in that month, I also didn't mention it.
Which is which I came to regret come February 15th, but in January also, you know, there was also the alpha geometry paper, which I kind of put in this sort of long inference bucket, because it solves like, you know, more than 100 step math olympiad geometry problems at a human gold medalist level and that also involves planning, right?[00:08:59] swyx: So like, if you want to scale inference, you can't scale it blindly, because just, Autoregressive token by token generation is only going to get you so far. You need good planning. And I think probably, yeah, what Mike from BrightWave is now doing and what everyone is doing, including maybe what we think QSTAR might be, is some form of search and planning.[00:09:17] swyx: And it makes sense. Like, you want to spend your inference time wisely. How do you[00:09:22] Alessio: think about plans that work and getting them shared? You know, like, I feel like if you're planning a task, somebody has got in and the models are stochastic. So everybody gets initially different results. Somebody is going to end up generating the best plan to do something, but there's no easy way to like store these plans and then reuse them for most people.[00:09:44] Alessio: You know, like, I'm curious if there's going to be. Some paper or like some work there on like making it better because, yeah, we don't[00:09:52] swyx: really have This is your your pet topic of NPM for[00:09:54] Alessio: Yeah, yeah, NPM, exactly. NPM for, you need NPM for anything, man. You need NPM for skills. You need NPM for planning. Yeah, yeah.[00:10:02] Alessio: You know I think, I mean, obviously the Voyager paper is like the most basic example where like, now their artifact is like the best planning to do a diamond pickaxe in Minecraft. And everybody can just use that. They don't need to come up with it again. Yeah. But there's nothing like that for actually useful[00:10:18] swyx: tasks.[00:10:19] swyx: For plans, I believe it for skills. I like that. Basically, that just means a bunch of integration tooling. You know, GPT built me integrations to all these things. And, you know, I just came from an integrations heavy business and I could definitely, I definitely propose some version of that. And it's just, you know, hard to execute or expensive to execute.[00:10:38] swyx: But for planning, I do think that everyone lives in slightly different worlds. They have slightly different needs. And they definitely want some, you know, And I think that that will probably be the main hurdle for any, any sort of library or package manager for planning. But there should be a meta plan of how to plan.[00:10:57] swyx: And maybe you can adopt that. And I think a lot of people when they have sort of these meta prompting strategies of like, I'm not prescribing you the prompt. I'm just saying that here are the like, Fill in the lines or like the mad libs of how to prompts. First you have the roleplay, then you have the intention, then you have like do something, then you have the don't something and then you have the my grandmother is dying, please do this.[00:11:19] swyx: So the meta plan you could, you could take off the shelf and test a bunch of them at once. I like that. That was the initial, maybe, promise of the, the prompting libraries. You know, both 9chain and Llama Index have, like, hubs that you can sort of pull off the shelf. 
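The "mad libs" meta-prompt structure sketched above, with a roleplay, an intention, a do, a don't, and the emotional appeal at the end, can be written down as a reusable template. The sketch below is purely illustrative; the slot names and example values are invented for this write-up:

```python
# Illustrative meta-prompt template for the "fill in the slots" idea discussed above.
# The structure mirrors the conversation: roleplay, intention, a "do", a "don't", then the appeal.
META_PROMPT = (
    "You are {roleplay}.\n"
    "Your goal is to {intention}.\n"
    "Always {do_this}.\n"
    "Never {dont_do_this}.\n"
    "{emotional_appeal}"
)

def build_prompt(**slots: str) -> str:
    """Fill the meta-plan template with task-specific slot values."""
    return META_PROMPT.format(**slots)

print(build_prompt(
    roleplay="a senior financial analyst",
    intention="summarize the attached earnings call for a busy executive",
    do_this="cite the exact figures you rely on",
    dont_do_this="speculate beyond what the transcript supports",
    emotional_appeal="This summary is extremely important to the reader, so be careful.",
))
```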
I don't think they're very successful because people like to write their own.[00:11:36] swyx: Yeah,[00:11:37] Direction 2: Synthetic Data (WRAP, SPIN)[00:11:37] Alessio: yeah, yeah. Yeah, that's a good segue into the next one, which is synthetic[00:11:41] swyx: data. Synthetic data is so hot. Yeah, and, you know, the way, you know, I think I, I feel like I should do one of these memes where it's like, Oh, like I used to call it, you know, R L A I F, and now I call it synthetic data, and then people are interested.[00:11:54] swyx: But there's gotta be older versions of what synthetic data really is because I'm sure, you know if you've been in this field long enough, There's just different buzzwords that the industry condenses on. Anyway, the insight that I think is relatively new that why people are excited about it now and why it's proMECEng now is that we have evidence that shows that LLMs can generate data to improve themselves with no teacher LLM.[00:12:22] swyx: For all of 2023, when people say synthetic data, they really kind of mean generate a whole bunch of data from GPT 4 and then train an open source model on it. Hello to our friends at News Research. That's what News Harmony says. They're very, very open about that. I think they have said that they're trying to migrate away from that.[00:12:40] swyx: But it is explicitly against OpenAI Terms of Service. Everyone knows this. You know, especially once ByteDance got banned for, for doing exactly that. So so, so synthetic data that is not a form of model distillation is the hot thing right now, that you can bootstrap better LLM performance from the same LLM, which is very interesting.[00:13:03] swyx: A variant of this is RLAIF, where you have a, where you have a sort of a constitutional model, or, you know, some, some kind of judge model That is sort of more aligned. But that's not really what we're talking about when most people talk about synthetic data. Synthetic data is just really, I think, you know, generating more data in some way.[00:13:23] swyx: A lot of people, I think we talked about this with Vipul from the Together episode, where I think he commented that you just have to have a good world model. Or a good sort of inductive bias or whatever that, you know, term of art is. And that is strongest in math and science math and code, where you can verify what's right and what's wrong.[00:13:44] swyx: And so the REST EM paper from DeepMind explored that. Very well, it's just the most obvious thing like and then and then once you get out of that domain of like things where you can generate You can arbitrarily generate like a whole bunch of stuff and verify if they're correct and therefore they're they're correct synthetic data to train on Once you get into more sort of fuzzy topics, then it's then it's a bit less clear So I think that the the papers that drove this understanding There are two big ones and then one smaller one One was wrap like rephrasing the web from from Apple where they basically rephrased all of the C4 data set with Mistral and it be trained on that instead of C4.[00:14:23] swyx: And so new C4 trained much faster and cheaper than old C, than regular raw C4. And that was very interesting. And I have told some friends of ours that they should just throw out their own existing data sets and just do that because that seems like a pure win. Obviously we have to study, like, what the trade offs are.[00:14:42] swyx: I, I imagine there are trade offs. So I was just thinking about this last night. 
If you do synthetic data and it's generated from a model, probably you will not train on typos. So therefore you'll be like, once the model that's trained on synthetic data encounters the first typo, they'll be like, what is this?[00:15:01] swyx: I've never seen this before. So they have no association or correction as to like, oh, these tokens are often typos of each other, therefore they should be kind of similar. I don't know. That's really remains to be seen, I think. I don't think that the Apple people export[00:15:15] Alessio: that. Yeah, isn't that the whole, Mode collapse thing, if we do more and more of this at the end of the day.[00:15:22] swyx: Yeah, that's one form of that. Yeah, exactly. Microsoft also had a good paper on text embeddings. And then I think this is a meta paper on self rewarding language models. That everyone is very interested in. Another paper was also SPIN. These are all things we covered in the the Latent Space Paper Club.[00:15:37] swyx: But also, you know, I just kind of recommend those as top reads of the month. Yeah, I don't know if there's any much else in terms, so and then, regarding the potential of it, I think it's high potential because, one, it solves one of the data war issues that we have, like, everyone is OpenAI is paying Reddit 60 million dollars a year for their user generated data.[00:15:56] swyx: Google, right?[00:15:57] Alessio: Not OpenAI.[00:15:59] swyx: Is it Google? I don't[00:16:00] Alessio: know. Well, somebody's paying them 60 million, that's[00:16:04] swyx: for sure. Yes, that is, yeah, yeah, and then I think it's maybe not confirmed who. But yeah, it is Google. Oh my god, that's interesting. Okay, because everyone was saying, like, because Sam Altman owns 5 percent of Reddit, which is apparently 500 million worth of Reddit, he owns more than, like, the founders.[00:16:21] Alessio: Not enough to get the data,[00:16:22] swyx: I guess. So it's surprising that it would go to Google instead of OpenAI, but whatever. Okay yeah, so I think that's all super interesting in the data field. I think it's high potential because we have evidence that it works. There's not a doubt that it doesn't work. I think it's a doubt that there's, what the ceiling is, which is the mode collapse thing.[00:16:42] swyx: If it turns out that the ceiling is pretty close, then this will maybe augment our data by like, I don't know, 30 50 percent good, but not game[00:16:51] Alessio: changing. And most of the synthetic data stuff, it's reinforcement learning on a pre trained model. People are not really doing pre training on fully synthetic data, like, large enough scale.[00:17:02] swyx: Yeah, unless one of our friends that we've talked to succeeds. Yeah, yeah. Pre trained synthetic data, pre trained scale synthetic data, I think that would be a big step. Yeah. And then there's a wildcard, so all of these, like smaller Directions,[00:17:15] Wildcard: Multi-Epoch Training (OLMo, Datablations)[00:17:15] swyx: I always put a wildcard in there. And one of the wildcards is, okay, like, Let's say, you have pre, you have, You've scraped all the data on the internet that you think is useful.[00:17:25] swyx: Seems to top out at somewhere between 2 trillion to 3 trillion tokens. Maybe 8 trillion if Mistral, Mistral gets lucky. Okay, if I need 80 trillion, if I need 100 trillion, where do I go? And so, you can do synthetic data maybe, but maybe that only gets you to like 30, 40 trillion. 
Like where, where is the extra alpha?[00:17:43] swyx: And maybe the extra alpha is just train more on the same tokens. Which is exactly what OLMo did, like Nathan Lambert, AI2, just after he did the interview with us, they released OLMo. So, it's unfortunate that we didn't get to talk much about it. But OLMo actually started doing 1.5 epochs on every, on all data.[00:18:00] swyx: And the data ablations paper that I covered at NeurIPS says that, you know, you don't really start to tap out of, like, the alpha or the sort of improved loss that you get from data all the way until four epochs. And so I'm just like, okay, like, why do we all agree that one epoch is all you need?[00:18:17] swyx: It seems to be a trend. It seems that we think that memorization is very good or too good. But then also we're finding that, you know, for improvement in results that we really like, we're fine with overtraining on things intentionally. So, I think that's an interesting direction that I don't see people exploring enough.[00:18:36] swyx: And the more I see papers coming out stretching beyond the one epoch thing, the more people are like, it's completely fine. And actually, the only reason we stopped is because we ran out of compute[00:18:46] Alessio: budget. Yeah, I think that's the biggest thing, right?[00:18:51] swyx: Like, that's not a valid reason, that's not science. I[00:18:54] Alessio: wonder if, you know, Meta is going to do it.[00:18:57] Alessio: I heard Llama 3, they want to do a 100 billion parameter model. I don't think you can train that on too many epochs, even with their compute budget, but yeah. They're the only ones that can save us, because even if OpenAI is doing this, they're not going to tell us, you know. Same with DeepMind.[00:19:14] swyx: Yeah, and so the update that we got on Llama 3 so far is apparently that because of the Gemini news that we'll talk about later, they're pushing back the release.[00:19:21] swyx: They already have it. And they're just pushing it back to do more safety testing. Politics testing.[00:19:28] Alessio: Well, our episode with Soumith will have already come out by the time this comes out, I think. So people will get the inside story on how they actually allocate the compute.[00:19:38] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)[00:19:38] Alessio: Alternative architectures. Well, shout out to RWKV, who won one of the prizes at our Final Frontiers event last week.[00:19:47] Alessio: We talked about Mamba and StripedHyena on the Together episode. A lot of, yeah, Monarch Mixers. I feel like Together, it's like the strong Stanford Hazy Research partnership, because Chris Ré is one of the co-founders. So they kind of have a, I feel like they're going to be the ones that have one of the state of the art models, alongside maybe RWKV.[00:20:08] Alessio: I haven't seen as many independent people working on this thing, like Monarch Mixer, yeah, Mamba, Hyena, all of these are Together related. Nobody understands the math. They got all the gigabrains, they got Tri Dao, they got all these folks in there, like, working on all of this.[00:20:25] swyx: Albert Gu, yeah. Yeah, so what should we comment about it?[00:20:28] swyx: I mean, I think it's useful, interesting, but at the same time, both of these are supposed to do really good scaling for long context. And then Gemini comes out and goes like, yeah, we don't need it. Yeah.[00:20:44] Alessio: No, that's the risk. So, yeah.
I was gonna say, maybe it's not here, but I don't know if we want to talk about diffusion transformers as like in the alt architectures, just because of Zora.[00:20:55] swyx: One thing, yeah, so, so, you know, this came from the Jan recap, which, and diffusion transformers were not really a discussion, and then, obviously, they blow up in February. Yeah. I don't think they're, it's a mixed architecture in the same way that Stripe Tiena is mixed there's just different layers taking different approaches.[00:21:13] swyx: Also I think another one that I maybe didn't call out here, I think because it happened in February, was hourglass diffusion from stability. But also, you know, another form of mixed architecture. So I guess that is interesting. I don't have much commentary on that, I just think, like, we will try to evolve these things, and maybe one of these architectures will stick and scale, it seems like diffusion transformers is going to be good for anything generative, you know, multi modal.[00:21:41] swyx: We don't see anything where diffusion is applied to text yet, and that's the wild card for this category. Yeah, I mean, I think I still hold out hope for let's just call it sub quadratic LLMs. I think that a lot of discussion this month actually was also centered around this concept that People always say, oh, like, transformers don't scale because attention is quadratic in the sequence length.[00:22:04] swyx: Yeah, but, you know, attention actually is a very small part of the actual compute that is being spent, especially in inference. And this is the reason why, you know, when you multiply, when you, when you, when you jump up in terms of the, the model size in GPT 4 from like, you know, 38k to like 32k, you don't also get like a 16 times increase in your, in your performance.[00:22:23] swyx: And this is also why you don't get like a million times increase in your, in your latency when you throw a million tokens into Gemini. Like people have figured out tricks around it or it's just not that significant as a term, as a part of the overall compute. So there's a lot of challenges to this thing working.[00:22:43] swyx: It's really interesting how like, how hyped people are about this versus I don't know if it works. You know, it's exactly gonna, gonna work. And then there's also this, this idea of retention over long context. Like, even though you have context utilization, like, the amount of, the amount you can remember is interesting.[00:23:02] swyx: Because I've had people criticize both Mamba and RWKV because they're kind of, like, RNN ish in the sense that they have, like, a hidden memory and sort of limited hidden memory that they will forget things. So, for all these reasons, Gemini 1. 5, which we still haven't covered, is very interesting because Gemini magically has fixed all these problems with perfect haystack recall and reasonable latency and cost.[00:23:29] Wildcards: Text Diffusion, RALM/Retro[00:23:29] swyx: So that's super interesting. So the wildcard I put in here if you want to go to that. I put two actually. One is text diffusion. I think I'm still very influenced by my meeting with a mid journey person who said they were working on text diffusion. I think it would be a very, very different paradigm for, for text generation, reasoning, plan generation if we can get diffusion to work.[00:23:51] swyx: For text. 
And then the second one is Douwe Kiela's Contextual AI, which is working on retrieval augmented language models, where it kind of puts RAG inside of the language model instead of outside.[00:24:02] Alessio: Yeah, there's a paper called Retro that covers some of this. I think that's an interesting thing. I think the, the challenge, well, not the challenge, what they need to figure out is like how do you keep the RAG piece always up to date constantly, you know. I feel like the models, you put all this work into pre training them, but then at least you have a fixed artifact.[00:24:22] Alessio: These architectures are like, constant work needs to be done on them, and they can drift even just based on the RAG data instead of the model itself. Yeah,[00:24:30] swyx: I was in a panel with one of the investors in Contextual, and the way that guy pitched it, I didn't agree with. He was like, this will solve hallucination.[00:24:38] Alessio: That's what everybody says. We solve[00:24:40] swyx: hallucination. I'm like, no, you reduce it. It cannot,[00:24:44] Alessio: if you solved it, the model wouldn't exist, right? It would just be plain text. It wouldn't be a generative model. Cool. So, alt architectures, then we got mixture of experts. I think we covered it a lot of, a lot of times.[00:24:56] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)[00:24:56] Alessio: Maybe any new interesting threads you want to go under here?[00:25:00] swyx: DeepSeekMoE, which was released in January. Everyone who is interested in MoEs should read that paper, because it's significant for two reasons. One, three reasons. One, it had, it had small experts, like a lot more small experts. So, for some reason, everyone has settled on eight experts for GPT 4 and for Mixtral, you know, that seems to be the favorite architecture, but these guys pushed it to 64 experts, each of them smaller.[00:25:26] swyx: But then they also had the second idea, which is that they had one to two always-on experts for common knowledge, and that's a very compelling concept, that you would not route to all the experts all the time and make them, you know, switch to everything. You would have some always-on experts.[00:25:41] swyx: I think that's interesting on both the inference side and the training side for, for memory retention. And yeah, the results that they published, which actually excluded Mixtral, which is interesting. The results that they published showed a significant performance jump versus all the other sort of open source models at the same parameter count.[00:26:01] swyx: So this may be a better way to do MoEs that is about to get picked up. And that is interesting for the third reason, which is this is the first time a new idea from China has infiltrated the West. It's usually the other way around. I probably overspoke there. There's probably lots more ideas that I'm not aware of.[00:26:18] swyx: Maybe in the embedding space. But I think DeepSeekMoE, like, woke people up and said, like, hey, DeepSeek, this, like, weird lab that is attached to a Chinese hedge fund is somehow, you know, doing groundbreaking research on MoEs. So, so, I classified this as a medium potential because I think that it is a sort of like a one off benefit.[00:26:37] swyx: You can add it to any, any base model to make the MoE version of it, you get a bump, and then that's it.
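A toy sketch of the DeepSeekMoE-style layout described above, with many small routed experts plus a couple of always-on "shared" experts for common knowledge, might look like the following in PyTorch. This is not the paper's implementation; all sizes are arbitrary, the dispatch loop is written for readability rather than speed, and a real MoE layer adds load-balancing losses and capacity limits:

```python
# Toy sketch of an MoE layer with small routed experts + always-on shared experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One small feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class SharedExpertMoE(nn.Module):
    """Many small routed experts plus a few shared experts that every token uses."""
    def __init__(self, d_model=512, d_hidden=128, n_routed=64, n_shared=2, top_k=6):
        super().__init__()
        self.routed = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(n_routed)])
        self.shared = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(n_shared)])
        self.router = nn.Linear(d_model, n_routed)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: [batch, seq, d_model]
        gates = F.softmax(self.router(x), dim=-1)          # routing weights over routed experts
        top_w, top_idx = gates.topk(self.top_k, dim=-1)    # each token picks a few small experts
        out = sum(expert(x) for expert in self.shared)     # shared experts are always on
        for expert_id, expert in enumerate(self.routed):   # clarity-over-speed dispatch
            # weight is zero for tokens that did not route to this expert
            weight = torch.where(top_idx == expert_id, top_w, torch.zeros_like(top_w)).sum(-1, keepdim=True)
            out = out + weight * expert(x)                  # every expert runs on every token here; real code batches by expert
        return out

layer = SharedExpertMoE()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)   # torch.Size([2, 16, 512])
```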
So, yeah,[00:26:45] Alessio: I saw Samba Nova, which is like another inference company. They released this MOE model called Samba 1, which is like a 1 trillion parameters. But they're actually MOE auto open source models.[00:26:56] Alessio: So it's like, they just, they just clustered them all together. So I think people. Sometimes I think MOE is like you just train a bunch of small models or like smaller models and put them together. But there's also people just taking, you know, Mistral plus Clip plus, you know, Deepcoder and like put them all together.[00:27:15] Alessio: And then you have a MOE model. I don't know. I haven't tried the model, so I don't know how good it is. But it seems interesting that you can then have people working separately on state of the art, you know, Clip, state of the art text generation. And then you have a MOE architecture that brings them all together.[00:27:31] swyx: I'm thrown off by your addition of the word clip in there. Is that what? Yeah, that's[00:27:35] Alessio: what they said. Yeah, yeah. Okay. That's what they I just saw it yesterday. I was also like[00:27:40] swyx: scratching my head. And they did not use the word adapter. No. Because usually what people mean when they say, Oh, I add clip to a language model is adapter.[00:27:48] swyx: Let me look up the Which is what Lava did.[00:27:50] Alessio: The announcement again.[00:27:51] swyx: Stable diffusion. That's what they do. Yeah, it[00:27:54] Alessio: says among the models that are part of Samba 1 are Lama2, Mistral, DeepSigCoder, Falcon, Dplot, Clip, Lava. So they're just taking all these models and putting them in a MOE. Okay,[00:28:05] swyx: so a routing layer and then not jointly trained as much as a normal MOE would be.[00:28:12] swyx: Which is okay.[00:28:13] Alessio: That's all they say. There's no paper, you know, so it's like, I'm just reading the article, but I'm interested to see how[00:28:20] Wildcard: Model Merging (mergekit)[00:28:20] swyx: it works. Yeah, so so the wildcard for this section, the MOE section is model merges, which has also come up as, as a very interesting phenomenon. The last time I talked to Jeremy Howard at the Olama meetup we called it model grafting or model stacking.[00:28:35] swyx: But I think the, the, the term that people are liking these days, the model merging, They're all, there's all different variations of merging. Merge types, and some of them are stacking, some of them are, are grafting. And, and so like, some people are approaching model merging in the way that Samba is doing, which is like, okay, here are defined models, each of which have their specific, Plus and minuses, and we will merge them together in the hope that the, you know, the sum of the parts will, will be better than others.[00:28:58] swyx: And it seems like it seems like it's working. I don't really understand why it works apart from, like, I think it's a form of regularization. That if you merge weights together in like a smart strategy you, you, you get a, you get a, you get a less overfitting and more generalization, which is good for benchmarks, if you, if you're honest about your benchmarks.[00:29:16] swyx: So this is really interesting and good. But again, they're kind of limited in terms of like the amount of bumps you can get. But I think it's very interesting in the sense of how cheap it is. We talked about this on the Chinatalk podcast, like the guest podcast that we did with Chinatalk. 
And you can do this without GPUs, because it's just adding weights together, and dividing things, and doing like simple math, which is really interesting for the GPU ports.[00:29:42] Alessio: There's a lot of them.[00:29:44] Direction 5: Online LLMs (Gemini Pro, Exa)[00:29:44] Alessio: And just to wrap these up, online LLMs? Yeah,[00:29:48] swyx: I think that I ki I had to feature this because the, one of the top news of January was that Gemini Pro beat GPT-4 turbo on LM sis for the number two slot to GPT-4. And everyone was very surprised. Like, how does Gemini do that?[00:30:06] swyx: Surprise, surprise, they added Google search. Mm-hmm to the results. So it became an online quote unquote online LLM and not an offline LLM. Therefore, it's much better at answering recent questions, which people like. There's an emerging set of table stakes features after you pre train something.[00:30:21] swyx: So after you pre train something, you should have the chat tuned version of it, or the instruct tuned version of it, however you choose to call it. You should have the JSON and function calling version of it. Structured output, the term that you don't like. You should have the online version of it. These are all like table stakes variants, that you should do when you offer a base LLM, or you train a base LLM.[00:30:44] swyx: And I think online is just like, There, it's important. I think companies like Perplexity, and even Exa, formerly Metaphor, you know, are rising to offer that search needs. And it's kind of like, they're just necessary parts of a system. When you have RAG for internal knowledge, and then you have, you know, Online search for external knowledge, like things that you don't know yet?[00:31:06] swyx: Mm-Hmm. . And it seems like it's, it's one of many tools. I feel like I may be underestimating this, but I'm just gonna put it out there that I, I think it has some, some potential. One of the evidence points that it doesn't actually matter that much is that Perplexity has a, has had online LMS for three months now and it performs, doesn't perform great.[00:31:25] swyx: Mm-Hmm. on, on lms, it's like number 30 or something. So it's like, okay. You know, like. It's, it's, it helps, but it doesn't give you a giant, giant boost. I[00:31:34] Alessio: feel like a lot of stuff I do with LLMs doesn't need to be online. So I'm always wondering, again, going back to like state of the art, right? It's like state of the art for who and for what.[00:31:45] Alessio: It's really, I think online LLMs are going to be, State of the art for, you know, news related activity that you need to do. Like, you're like, you know, social media, right? It's like, you want to have all the latest stuff, but coding, science,[00:32:01] swyx: Yeah, but I think. Sometimes you don't know what is news, what is news affecting.[00:32:07] swyx: Like, the decision to use an offline LLM is already a decision that you might not be consciously making that might affect your results. Like, what if, like, just putting things on, being connected online means that you get to invalidate your knowledge. And when you're just using offline LLM, like it's never invalidated.[00:32:27] swyx: I[00:32:28] Alessio: agree, but I think going back to your point of like the standing the test of time, I think sometimes you can get swayed by the online stuff, which is like, hey, you ask a question about, yeah, maybe AI research direction, you know, and it's like, all the recent news are about this thing. 
So the LLM like focus on answering, bring it up, you know, these things.[00:32:50] swyx: Yeah, so yeah, I think, I think it's interesting, but I don't know if I can, I bet heavily on this.[00:32:56] Alessio: Cool. Was there one that you forgot to put, or, or like a, a new direction? Yeah,[00:33:01] swyx: so, so this brings us into sort of February. ish.[00:33:05] OpenAI Sora and why everyone underestimated videogen[00:33:05] swyx: So like I published this in like 15 came with Sora. And so like the one thing I did not mention here was anything about multimodality.[00:33:16] swyx: Right. And I have chronically underweighted this. I always wrestle. And, and my cop out is that I focused this piece or this research direction piece on LLMs because LLMs are the source of like AGI, quote unquote AGI. Everything else is kind of like. You know, related to that, like, generative, like, just because I can generate better images or generate better videos, it feels like it's not on the critical path to AGI, which is something that Nat Friedman also observed, like, the day before Sora, which is kind of interesting.[00:33:49] swyx: And so I was just kind of like trying to focus on like what is going to get us like superhuman reasoning that we can rely on to build agents that automate our lives and blah, blah, blah, you know, give us this utopian future. But I do think that I, everybody underestimated the, the sheer importance and cultural human impact of Sora.[00:34:10] swyx: And you know, really actually good text to video. Yeah. Yeah.[00:34:14] Alessio: And I saw Jim Fan at a, at a very good tweet about why it's so impressive. And I think when you have somebody leading the embodied research at NVIDIA and he said that something is impressive, you should probably listen. So yeah, there's basically like, I think you, you mentioned like impacting the world, you know, that we live in.[00:34:33] Alessio: I think that's kind of like the key, right? It's like the LLMs don't have, a world model and Jan Lekon. He can come on the podcast and talk all about what he thinks of that. But I think SORA was like the first time where people like, Oh, okay, you're not statically putting pixels of water on the screen, which you can kind of like, you know, project without understanding the physics of it.[00:34:57] Alessio: Now you're like, you have to understand how the water splashes when you have things. And even if you just learned it by watching video and not by actually studying the physics, You still know it, you know, so I, I think that's like a direction that yeah, before you didn't have, but now you can do things that you couldn't before, both in terms of generating, I think it always starts with generating, right?[00:35:19] Alessio: But like the interesting part is like understanding it. You know, it's like if you gave it, you know, there's the video of like the, the ship in the water that they generated with SORA, like if you gave it the video back and now it could tell you why the ship is like too rocky or like it could tell you why the ship is sinking, then that's like, you know, AGI for like all your rig deployments and like all this stuff, you know, so, but there's none, there's none of that yet, so.[00:35:44] Alessio: Hopefully they announce it and talk more about it. Maybe a Dev Day this year, who knows.[00:35:49] swyx: Yeah who knows, who knows. I'm talking with them about Dev Day as well. 
So I would say, like, the phrasing that Jim used, which resonated with me, he kind of called it a data driven world model. I somewhat agree with that.[00:36:04] Does Sora have a World Model? Yann LeCun vs Jim Fan[00:36:04] swyx: I am on more of a Yann LeCun side than I am on Jim's side, in the sense that I think that is the vision or the hope that these things can build world models. But you know, clearly even at the current Sora size, they don't have, you know, they don't have strong consistency yet. They have very good consistency, but fingers and arms and legs will appear and disappear and chairs will appear and disappear.[00:36:31] swyx: That definitely breaks physics. And it also makes me think about how we do deep learning versus world models in the sense of, you know, in classic machine learning, when you have too many parameters, you will overfit, and actually that fails, that like, does not match reality, and therefore fails to generalize well.[00:36:50] swyx: And like, what scale of data do we need in order to learn world models from video? A lot. Yeah. So I am cautious about taking this interpretation too literally, obviously, you know, like, I get what he's going for, and he's like, obviously partially right, obviously, like, transformers and, you know, these sort of neural networks are universal function approximators, theoretically could figure out world models, it's just like, how good are they, and how tolerant are we of hallucinations, we're not very tolerant, like, yeah, so it's gonna bias us for creating like very convincing things, but then not create the useful world models that we want.[00:37:37] swyx: At the same time, what you just said, I think made me reflect a little bit like we just got done saying how important synthetic data is for Mm-Hmm. for training LLMs. And so like, if this is a way of, of getting synthetic, you know, video data for improving our video understanding. Then sure, by all means. Which we actually know, like, GPT-4 Vision and DALL-E were trained, kind of, co-trained together.[00:38:02] swyx: And so, like, maybe this is on the critical path, and I just don't fully see the full picture yet.[00:38:08] Alessio: Yeah, I don't know. I think there's a lot of interesting stuff. It's like, imagine you go back, you have Sora, you go back in time, and Newton didn't figure out gravity yet. Would Sora help you figure it out?[00:38:21] Alessio: Because you start saying, okay, a man standing under a tree with, like, apples falling, and it's like, oh, they're always falling at the same speed in the video. Why is that? I feel like sometimes these engines can like pick up things, like humans have a lot of intuition, but if you ask the average person, like the physics of like a fluid in a boat, they wouldn't be able to tell you the physics, but they can like observe it, but humans can only observe this much, you know, versus like now you have these models to observe everything and then they generalize these things and maybe we can learn new things through the generalization that they pick up.[00:38:55] swyx: But again, and it might be more observant than us in some respects. In some ways we can scale it up a lot more than the number of physicists that we have available at Newton's time. So like, yeah, absolutely possible. That, that this can discover new science.
I think we have a lot of work to do to formalize the science.[00:39:11] swyx: And then, I, I think the last part is, you know, how much, how much do we cheat by gen, by generating data from Unreal Engine 5? Mm hmm. Which is what a lot of people are speculating with very, very limited evidence that OpenAI did that. The strongest evidence that I saw was someone who works a lot with Unreal Engine 5 looking at the side characters in the videos and noticing that they all adopt Unreal Engine defaults.[00:39:37] swyx: Of like, walking speed, and like, character choice, like, character creation choice. And I was like, okay, like, that's actually pretty convincing that they actually use Unreal Engine to bootstrap some synthetic data for this training set. Yeah,[00:39:52] Alessio: could very well be.[00:39:54] swyx: Because then you get the labels and the training side by side.[00:39:58] swyx: One thing that came up on the last day of February, which I should also mention, is EMO coming out of Alibaba, which is also a sort of like video generation and space time transformer that also involves probably a lot of synthetic data as well. And so like, this is of a kind in the sense of like, oh, like, you know, really good generative video is here, and it is not just like the one, two second clips that we saw from, like, you know, Pika and Runway. Cristobal Valenzuela from Runway was like, game on. Which, like, okay, but like, let's see your response, because we've heard a lot about Gen 1 and 2, but like, it's nothing on this level of Sora. So it remains to be seen how we can actually apply this, but I do think that the creative industry should start preparing.[00:40:50] swyx: I think the Sora technical blog post from OpenAI was really good. It was like a request for startups. It was so good in like spelling out, here are the individual industries that this can impact.[00:41:00] swyx: And anyone who, anyone who's like interested in generative video should look at that. But also be mindful that probably when OpenAI releases a Sora API, right, the ways you can interact with it are very limited. Just like the ways you can interact with DALL-E are very limited, and someone is gonna have to make an open Sora[00:41:19] swyx: Mm-Hmm, for you to create ComfyUI pipelines.[00:41:24] Alessio: The Stability folks said they wanna build an open Sora competitor, but yeah, Stability. Their demo video, their demo video was like so underwhelming. It was just like two people sitting on the beach[00:41:34] swyx: standing. Well, they don't have it yet, right? Yeah, yeah.[00:41:36] swyx: I mean, they just wanna train it. Everybody wants to, right? Yeah. I, I think what is confusing a lot of people about Stability is like they're, they're pushing a lot of things: Stable Code, Stable LM, and Stable Video Diffusion. But like, how much money do they have left? How many people do they have left?[00:41:51] swyx: Yeah. Imad spent two hours with me, reassuring me things are great. And, and I'm like, I, I do, like, I do believe that they have really, really quality people. But it's just like, I, I also have a lot of very smart people on the other side telling me, like, Hey man, like, you know, don't, don't put too much faith in this, in this thing.[00:42:11] swyx: So I don't know who to believe. Yeah.[00:42:14] Alessio: It's hard. Let's see. What else? We got a lot more stuff. I don't know if we can.
Yeah, Groq.[00:42:19] Groq Math[00:42:19] Alessio: We can[00:42:19] swyx: do a bit of Groq prep. We're, we're about to go to talk to Dylan Patel. Maybe, maybe it's the audio in here. I don't know. It depends what, what we get up to later. What, how, what do you as an investor think about Groq? Yeah. Yeah, well, actually, can you recap, like, why is Groq interesting? So,[00:42:33] Alessio: Jonathan Ross, who's the founder of Groq, he's the person that created the TPU at Google. It's actually, it was one of his, like, 20 percent projects. It's like, he was just on the side, dooby doo, created the TPU.[00:42:46] Alessio: But yeah, basically, Groq, they had this demo that went viral, where they were running Mistral at, like, 500 tokens a second, which is like, fastest at anything that you have out there. The question, you know, it's all like, the memes were like, is NVIDIA dead? Like, people don't need H100s anymore. I think there's a lot of money that goes into building what Groq has built as far as the hardware goes.[00:43:11] Alessio: We're gonna, we're gonna put some of the notes from, from Dylan in here, but basically the cost of the Groq system is like 30 times the cost of, of H100 equivalent. So, so[00:43:23] swyx: let me, I put some numbers because me and Dylan were like, I think the two people actually tried to do Groq math. Spreadsheet doors.[00:43:30] swyx: Spreadsheet doors. So, one that's, okay, oh boy so, so, the equivalent H100 for Llama 2 is $300,000 for a system of 8 cards. And for Groq it's $2.3 million, because you have to buy 576 Groq cards. So yeah, that, that just gives people an idea. So like if you depreciate both over a five year lifespan, per year you're depreciating $460K for Groq, and $60K a year for H100.[00:43:59] swyx: So like, Groqs are just way more expensive per model that you're, that you're hosting. But then, you make it up in terms of volume. So I don't know if you want to[00:44:08] Alessio: cover that. I think one of the promises of Groq is like super high parallel inference on the same thing. So you're basically saying, okay, I'm putting on this upfront investment on the hardware, but then I get much better scaling once I have it installed.[00:44:24] Alessio: I think the big question is how much can you sustain the parallelism? You know, like if you get, if you're going to get 100% utilization rate at all times on Groq, like, it's just much better, you know, because like at the end of the day, the tokens per second cost that you're getting is better than with the H100s, but if you get to like 50 percent utilization rate, you will be much better off running on NVIDIA.[00:44:49] Alessio: And if you look at most companies out there, who really gets 100 percent utilization rate? Probably OpenAI at peak times, but that's probably it. But yeah, curious to see more. I saw Jonathan was just at the Web Summit in Dubai, in Qatar. He just gave a talk there yesterday. That I haven't listened to yet.[00:45:09] Alessio: I, I tweeted that he should come on the pod. He liked it. And then Groq followed me on Twitter. I don't know if that means that they're interested, but[00:45:16] swyx: hopefully Groq's social media person is just very friendly. They, yeah. Hopefully[00:45:20] Alessio: we can get them. Yeah, we, we gonna get him.
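The depreciation math above, written out as a quick sketch; all figures are the ones quoted in the conversation, not independently verified specs:

```python
# Figures quoted in the conversation (assumptions, not verified).
H100_SYSTEM_COST = 300_000      # 8x H100 system serving Llama 2
GROQ_SYSTEM_COST = 2_300_000    # 576 Groq cards for the equivalent deployment
LIFESPAN_YEARS = 5

h100_per_year = H100_SYSTEM_COST / LIFESPAN_YEARS   # 60,000
groq_per_year = GROQ_SYSTEM_COST / LIFESPAN_YEARS   # 460,000

print(f"H100 depreciation per year: ${h100_per_year:,.0f}")
print(f"Groq depreciation per year: ${groq_per_year:,.0f}")
# Ratio of the quoted system costs, for reference.
print(f"Groq system costs {GROQ_SYSTEM_COST / H100_SYSTEM_COST:.1f}x more up front")
```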
We[00:45:22] swyx: just call him out and, and so basically the, the key question is like, how sustainable is this and how much.[00:45:27] swyx: This is a loss leader. The entire Groq management team has been on Twitter and Hacker News saying they are very, very comfortable with the pricing of $0.27 per million tokens. This is the lowest that anyone has offered tokens as far as Mixtral or Llama 2. This matches DeepInfra and, you know, I think, I think that's, that's, that's about it in terms of that, that, that low.[00:45:47] swyx: And we think the break even for H100s is 50 cents. At a, at a normal utilization rate. To make this work, so in my spreadsheet I made this, made this work. You have to have like a parallelism of 500 requests all simultaneously. And you have, you have model bandwidth utilization of 80%.[00:46:06] swyx: Which is way high. I just gave them high marks for everything. Groq has two fundamental tech innovations that they hang their hats on in terms of like, why we are better than everyone. You know, even though, like, it remains to be independently replicated. But one, you know, they have this sort of the entire model on the chip idea, which is like, okay, get rid of HBM.[00:46:30] swyx: And, like, put everything in SRAM. Like, okay, fine, but then you need a lot of cards and whatever. And that's all okay. And so, like, because you don't have to transfer between memory, then you just save on that time and that's why they're faster. So, a lot of people buy that as, like, that's the reason that you're faster.[00:46:45] swyx: Then they have, like, some kind of crazy compiler, or, like, speculative routing magic using compilers that they also attribute towards their higher utilization. So I give them 80 percent for that. And so that all works out to like, okay, base costs, I think you can get down to like, maybe like 20 something cents per million tokens.[00:47:04] swyx: And therefore you actually are fine if you have that kind of utilization. But it's like, I have to make a lot of favorable assumptions for this to work.[00:47:12] Alessio: Yeah. Yeah, I'm curious to see what Dylan says later.[00:47:16] swyx: So he was like completely opposite of me. He's like, they're just burning money. Which is great.[00:47:22] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars[00:47:22] Alessio: Gemini, want to do a quick run through since this touches on all the four wars.[00:47:28] swyx: Yeah, and I think this is the mark of a useful framework, that when a new thing comes along, you can break it down in terms of the four wars and sort of slot it in or analyze it in those four frameworks, and have nothing left.[00:47:41] swyx: So it's a MECE categorization. MECE is Mutually Exclusive and Collectively Exhaustive. And that's a really, really nice way to think about taxonomies and to create mental frameworks. So, what is Gemini 1.5 Pro? It is the newest model that came out one week after Gemini 1.0. Which is very interesting.[00:48:01] swyx: They have not really commented on why.
They released this the headline feature is that it has a 1 million token context window that is multi modal which means that you can put all sorts of video and audio And PDFs natively in there alongside of text and, you know, it's, it's at least 10 times longer than anything that OpenAI offers which is interesting.[00:48:20] swyx: So it's great for prototyping and it has interesting discussions on whether it kills RAG.[00:48:25] Alessio: Yeah, no, I mean, we always talk about, you know, Long context is good, but you're getting charged per token. So, yeah, people love for you to use more tokens in the context. And RAG is better economics. But I think it all comes down to like how the price curves change, right?[00:48:42] Alessio: I think if anything, RAG's complexity goes up and up the more you use it, you know, because you have more data sources, more things you want to put in there. The token costs should go down over time, you know, if the model stays fixed. If people are happy with the model today. In two years, three years, it's just gonna cost a lot less, you know?[00:49:02] Alessio: So now it's like, why would I use RAG and like go through all of that? It's interesting. I think RAG is better cutting edge economics for LLMs. I think large context will be better long tail economics when you factor in the build cost of like managing a RAG pipeline. But yeah, the recall was like the most interesting thing because we've seen the, you know, You know, in the haystack things in the past, but apparently they have 100 percent recall on anything across the context window.[00:49:28] Alessio: At least they say nobody has used it. No, people[00:49:30] swyx: have. Yeah so as far as, so, so what this needle in a haystack thing for people who aren't following as closely as us is that someone, I forget his name now someone created this needle in a haystack problem where you feed in a whole bunch of generated junk not junk, but just like, Generate a data and ask it to specifically retrieve something in that data, like one line in like a hundred thousand lines where it like has a specific fact and if it, if you get it, you're, you're good.[00:49:57] swyx: And then he moves the needle around, like, you know, does it, does, does your ability to retrieve that vary if I put it at the start versus put it in the middle, put it at the end? And then you generate this like really nice chart. That, that kind of shows like it's recallability of a model. And he did that for GPT and, and Anthropic and showed that Anthropic did really, really poorly.[00:50:15] swyx: And then Anthropic came back and said it was a skill issue, just add this like four, four magic words, and then, then it's magically all fixed. And obviously everybody laughed at that. But what Gemini came out with was, was that, yeah, we, we reproduced their, you know, haystack issue you know, test for Gemini, and it's good across all, all languages.[00:50:30] swyx: All the one million token window, which is very interesting because usually for typical context extension methods like rope or yarn or, you know, anything like that, or alibi, it's lossy like by design it's lossy, usually for conversations that's fine because we are lossy when we talk to people but for superhuman intelligence, perfect memory across Very, very long context.[00:50:51] swyx: It's very, very interesting for picking things up. And so the people who have been given the beta test for Gemini have been testing this. 
So what you do is you upload, let's say, all of Harry Potter and you change one fact in one sentence, somewhere in there, and you ask it to pick it up, and it does. So this is legit.[00:51:08] swyx: We don't super know how, because this is, like, because it doesn't, yes, it's slow to inference, but it's not slow enough that it's, like, running five different systems in the background without telling you. Right. So it's something, it's something interesting that they haven't fully disclosed yet. The open source community has centered on this ring attention paper, which is created by your friend Matei Zaharia, and a couple other people.[00:51:36] swyx: And it's a form of distributing the compute. I don't super understand, like, why, you know, calculating, like, the feed forward network and attention in block wise fashion and distributing it makes it so good at recall. I don't think they have any answer to that. The only thing that ring attention is really focused on is basically infinite context.[00:51:59] swyx: They said it was good for like 10 to 100 million tokens. Which is, it's just great. So yeah, using the four wars framework, what is this framework for Gemini? One is the sort of RAG and Ops war. Here we care less about RAG now, yes. Or, we still care as much about RAG, but like, now it's, it's not important in prototyping.[00:52:21] swyx: And then, for the data war, I guess this is just part of the overall training dataset, but Google made a $60 million deal with Reddit and presumably they have deals with other companies. For the multi modality war, we can talk about the image generation crisis, or the fact that Gemini also has image generation, which we'll talk about in the next section.[00:52:42] swyx: But it also has video understanding, which is, I think, the top Gemini post came from our friend Simon Willison, who basically did a short video of him scanning over his bookshelf. And it would be able to convert that video into a JSON output of what's on that bookshelf. And I think that is very useful.[00:53:04] swyx: Actually ties into the conversation that we had with David Luan from Adept. In a sense of like, okay, what if video was the main modality instead of text as the input? What if, what if everything was video in, because that's how we work. We, our eyes don't actually read, don't actually like get input, our brains don't get inputs as characters.[00:53:25] swyx: Our brains get the pixels shooting into our eyes, and then our vision system takes over first, and then we sort of mentally translate that into text later. And so it's kind of like what Adept is kind of doing, which is driving by vision model, instead of driving by raw text understanding of the DOM. And, and I, I, in that, that episode, which we haven't released, I made the analogy to like self-driving by lidar versus self-driving by camera.[00:53:52] swyx: Mm-Hmm, right? Like, it's like, I think what Gemini and any other super long context model that is multimodal unlocks is, what if you just drive everything by video. Which is[00:54:03] Alessio: cool. Yeah, and that's Joseph from Roboflow. It's like anything that can be seen can be programmable with these models.[00:54:12] Alessio: You mean[00:54:12] swyx: the computer vision guy is bullish on computer vision?[00:54:18] Alessio: It's like the RAG people. The RAG people are bullish on RAG and not long context. I'm very surprised. The, the fine tuning people love fine tuning instead of few shot. Yeah. Yeah, that's that.
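A minimal sketch of the needle-in-a-haystack test just described: bury one target sentence at varying depths in filler text, ask the model to retrieve it, and record a hit per context length and depth (the grid of results is what produces the chart mentioned above). The ask_model stub is a placeholder for whatever LLM API is being evaluated; the trivial string-search baseline is only there so the harness runs end to end.

```python
NEEDLE = "The special magic number mentioned in this document is 42117."
QUESTION = "What is the special magic number mentioned in the document?"
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(num_sentences: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of filler text."""
    sentences = [FILLER] * num_sentences
    sentences.insert(int(depth * num_sentences), NEEDLE + " ")
    return "".join(sentences)

def ask_model(context: str, question: str) -> str:
    # Placeholder: swap in the LLM under test. This baseline just returns the
    # sentence containing "magic" so the harness is runnable as-is.
    for sentence in context.split("."):
        if "magic" in sentence:
            return sentence
    return ""

def run_eval(lengths=(1_000, 10_000, 100_000), depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    results = {}
    for n in lengths:
        for d in depths:
            answer = ask_model(build_haystack(n, d), QUESTION)
            results[(n, d)] = "42117" in answer  # crude recall check per cell
    return results

print(run_eval())
```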
Yeah, the, I, I think the ring attention thing, and it's how they did it, we don't know. And then they released the Gemma models, which are like a 2 billion and 7 billion open.[00:54:41] Alessio: Models, which people said are not, are not good based on my Twitter experience, which are the, the GPU poor crumbs. It's like, Hey, we did all this work for us because we're GPU rich and we're just going to run this whole thing. And

School of Motion Podcast
NFT Success and The Future of Motion with Rive CD Jerry Liu

School of Motion Podcast

Play Episode Listen Later Feb 7, 2024 68:30


In this week's podcast we're talking to Jerry Liu -  Creative Director at Rive, and a man who has seen a lot in his motion design career. He's done plenty of studio work, did a stint at Meta, made 7-figures selling NFTs, and now he's deep in the fast-growing world of interactive motion design. In this conversation, Jerry talks transparently about his career, the NFT roller coaster he got to ride, and his thoughts on the future of this industry.  If you're curious about the thrill of designing for interactions or the potential opportunities that new tools like Rive can open for motion designers, this episode is an absolute goldmine. Watch the video version of this podcast here: https://www.schoolofmotion.com/blog/rive-future-of-motion   Show Notes Jerry Liu Studio People Ross Plaskow Stephen Kelleher Dan Savage Signalnoise (James White) Olly Moss Deekay Kwon Oscar Mar Genuine Human (Jay) Chris Bjerre Neptali Cisneros Studios MK12 Psyop Ravie Toast Pieces Black Hat/Zero-day What is a Blockchain Game? Kibatsu Mecha Mark Zuckerberg: First Interview in the Metaverse META - Messenger Redesign Made a simple interactive retro game menu with @rive.app How to make a rainbow drawing mini-game in @rive.app I recently joined the incredible team at @rive.app as Creative Director. Fold-A-Face by Jeff McAvoy The Rive Guy Big Trouble in Little China Exit through the Gift Shop Rive and the future of interactive design with Guido Rosso Motion Design as a Subscription Service? Austin Bauwens of Ravie Studio Resources Rive Ringling's Tuition Rolo Stepn School of Visual Arts (SVA) Brand New School Mondo Parallel

Gradient Dissent - A Machine Learning Podcast by W&B
Revolutionizing AI Data Management with Jerry Liu, CEO of LlamaIndex

Gradient Dissent - A Machine Learning Podcast by W&B

Play Episode Listen Later Jan 4, 2024 57:35


In the latest episode of Gradient Dissent, we explore the innovative features and impact of LlamaIndex in AI data management with Jerry Liu, CEO of LlamaIndex. Jerry shares insights on how LlamaIndex integrates diverse data formats with advanced AI technologies, addressing challenges in data retrieval, analysis, and conversational memory. We also delve into the future of AI-driven systems and LlamaIndex's role in this rapidly evolving field. This episode is a must-watch for anyone interested in AI, data science, and the future of technology.Timestamps:0:00 - Introduction 4:46 - Differentiating LlamaIndex in the AI framework ecosystem.9:00 - Discussing data analysis, search, and retrieval applications.14:17 - Exploring Retrieval Augmented Generation (RAG) and vector databases.19:33 - Implementing and optimizing One Bot in Discord.24:19 - Developing and evaluating datasets for AI systems.28:00 - Community contributions and the growth of LlamaIndex.34:34 - Discussing embedding models and the use of vector databases.39:33 - Addressing AI model hallucinations and fine-tuning.44:51 - Text extraction applications and agent-based systems in AI.49:25 - Community contributions to LlamaIndex and managing refactors.52:00 - Interactions with big tech's corpus and AI context length.54:59 - Final thoughts on underrated aspects of ML and challenges in AI.Thanks for listening to the Gradient Dissent podcast, brought to you by Weights & Biases. If you enjoyed this episode, please leave a review to help get the word out about the show. And be sure to subscribe so you never miss another insightful conversation.Connect with Jerry:https://twitter.com/jerryjliu0https://www.linkedin.com/in/jerry-liu-64390071/Follow Weights & Biases:YouTube: http://wandb.me/youtubeTwitter: https://twitter.com/weights_biases LinkedIn: https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server:https://discord.gg/CkZKRNnaf3#OCR #DeepLearning #AI #Modeling #ML

GraphStuff.FM: The Neo4j Graph Database Developer Podcast
LlamaIndex and More: Building LLM Tech with Jerry Liu

GraphStuff.FM: The Neo4j Graph Database Developer Podcast

Play Episode Listen Later Jan 1, 2024 43:14


Tools of the Month:Remix for data-driven websites https://remix.run/HTTPie: https://httpie.io/cliPypeteer https://github.com/pyppeteer/pyppeteerRectangle https://rectangleapp.com/Fireflies.ai https://fireflies.ai/Video Speed Controller https://chromewebstore.google.com/detail/video-speed-controller/gioehmkjkeamcinbdelehlpnpdcdjpdp?pli=1Product updates:Neo4j release (5.15) https://neo4j.com/release-notes/database/neo4j-5/Neo4j Driver updatesAPOC Core https://github.com/neo4j/apoc/releases/tag/5.15.0GraphQL release (4.4.4) https://github.com/neo4j/graphql/releasesHelm chart update (5.14.0) https://github.com/neo4j/helm-charts/releases/tag/5.14.0Several Neo4j Connectors updatedArticles:Try Neo4j's Next-Gen Graph-Native Store Format https://neo4j.com/developer-blog/neo4j-graph-native-store-format/Implementing Advanced Retrieval RAG Strategies with Neo4j https://neo4j.com/developer-blog/advanced-rag-strategies-neo4j/Introducing Deno Runtime to the Neo4j Driver for Javascript https://neo4j.com/developer-blog/deno-runtime-neo4j-driver-javascript/Using a Knowledge Graph to Implement a DevOps RAG Application https://neo4j.com/developer-blog/knowledge-graph-devops-rag-application/Convenient Neo4j Integration Tests in Github Actions Using the Aura CLI https://neo4j.com/developer-blog/neo4j-integration-tests-github-actions-aura-cli/Neo4j x LangChain: Deep Dive Into the New Vector Index Implementation https://neo4j.com/developer-blog/neo4j-langchain-vector-index-implementation/Videos:RAG with a Neo4j Knowledge Graph: How it Works and How to Set It Up https://www.youtube.com/watch?v=ftlZ0oeXYRENODES 2023 playlist https://youtube.com/playlist?list=PL9Hl4pk2FsvUu4hzyhWed8Avu5nSUXYrb&si=8_0sYVRYz8CqqdIcEvents:(Jan 4) YouTube series (virtual): Going Meta Ep 24 https://neo4j.com/event/going-meta-a-series-on-graphs-semantics-and-knowledge-episode-24/(Jan 10) Meetup (Austin, TX and virtual): Airplane Route Optimization Using Neo4j's Graph Database https://neo4j.com/event/airplane-route-optimization-using-neo4js-graph-database/(Jan 10) Webinar (virtual): Neo4j: 2024 Trends: What Data and Analytics Leaders Need to Know - Asia https://neo4j.com/event/neo4j-2024-trends-what-data-and-analytics-leaders-need-to-know-asia-pacific-jan-11/(Jan 11) Webinar (virtual): Neo4j: 2024 Trends: What Data and Analytics Leaders Need to Know - Europe https://neo4j.com/event/neo4j-2024-trends-what-data-and-analytics-leaders-need-to-know-europe-jan-11/(Jan 11) Webinar (virtual): Neo4j: 2024 Trends: What Data and Analytics Leaders Need to Know - Americas https://neo4j.com/event/neo4j-2024-trends-what-data-and-analytics-leaders-need-to-know-jan-11/(Jan 17) Webinar (virtual): O'Reilly Media: Generative AI for Healthcare https://neo4j.com/event/oreilly-media-generative-ai-for-healthcare-jan-17/(Jan 22) Webinar (virtual): Neo4j: Building More Accurate GenAI Chatbots: A Technical Guide - Asia https://neo4j.com/event/neo4j-building-more-accurate-genai-chatbots-a-technical-guide-asia-pacific-jan-23/(Jan 23) Webinar (virtual): Neo4j: Building More Accurate GenAI Chatbots: A Technical Guide - Europe https://neo4j.com/event/neo4j-building-more-accurate-genai-chatbots-a-technical-guide-europe-jan-23/(Jan 23) Webinar (virtual): Neo4j: Building More Accurate GenAI Chatbots: A Technical Guide - Americas https://neo4j.com/event/neo4j-building-more-accurate-genai-chatbots-a-technical-guide-jan-23/(Jan 25) YouTube series (virtual): Neo4j Live: Building a Semantics-Based Recommender System for ESG Documents 
https://neo4j.com/event/neo4j-live-building-a-semantics-based-recommender-system-for-esg-documents/(Jan 25) Conference (Bristol, UK): GraphTalk Government https://neo4j.com/event/graphtalk-government/(Jan 31) Meetup (London, UK): LLM + Knowledge Graph FTW https://neo4j.com/event/llm-knowledge-graph-ftw/(Jan 31) Meetup: Cloud-Native Geospatial Analytics Combining Spatial SQL & Graph Data Science https://neo4j.com/event/cloud-native-geospatial-analytics-combining-spatial-sql-graph-data-science/

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Catch us at Modular's ModCon next week with Chris Lattner, and join our community!Due to Bryan's very wide ranging experience in data science and AI across Blue Bottle (!), StitchFix, Weights & Biases, and now Hex Magic, this episode can be considered a two-parter.Notebooks = Chat++We've talked a lot about AI UX (in our meetups, writeups, and guest posts), and today we're excited to dive into a new old player in AI interfaces: notebooks! Depending on your background, you either Don't Like or you Like notebooks — they are the most popular example of Knuth's Literate Programming concept, basically a collection of cells; each cell can execute code, display it, and share its state with all the other cells in a notebook. They can also simply be Markdown cells to add commentary to the analysis. Notebooks have a long history but most recently became popular from iPython evolving into Project Jupyter, and a wave of notebook based startups from Observable to DeepNote and Databricks sprung up for the modern data stack.The first wave of AI applications has been very chat focused (ChatGPT, Character.ai, Perplexity, etc). Chat as a user interface has a few shortcomings, the major one being the inability to edit previous messages. We enjoyed Bryan's takes on why notebooks feel like “Chat++” and how they are building Hex Magic:* Atomic actions vs Stream of consciousness: in a chat interface, you make corrections by adding more messages to a conversation (i.e. “Can you try again by doing X instead?” or “I actually meant XYZ”). The context can easily get messy and confusing for models (and humans!) to follow. Notebooks' cell structure on the other hand allows users to go back to any previous cells and make edits without having to add new ones at the bottom. * “Airlocks” for repeatability: one of the ideas they came up with at Hex is “airlocks”, a collection of cells that depend on each other and keep each other in sync. If you have a task like “Create a summary of my customers' recent purchases”, there are many sub-tasks to be done (look up the data, sum the amounts, write the text, etc). Each sub-task will be in its own cell, and the airlock will keep them all in sync together.* Technical + Non-Technical users: previously you had to use Python / R / Julia to write notebooks code, but with models like GPT-4, natural language is usually enough. Hex is also working on lowering the barrier of entry for non-technical users into notebooks, similar to how Code Interpreter is doing the same in ChatGPT. Obviously notebooks aren't new for developers (OpenAI Cookbooks are a good example), but haven't had much adoption in less technical spheres. Some of the shortcomings of chat UIs + LLMs lowering the barrier of entry to creating code cells might make them a much more popular UX going forward.RAG = RecSys!We also talked about the LLMOps landscape and why it's an “iron mine” rather than a “gold rush”: I'll shamelessly steal [this] from a friend, Adam Azzam from Prefect. He says that [LLMOps] is more of like an iron mine than a gold mine in the sense of there is a lot of work to extract this precious, precious resource. Don't expect to just go down to the stream and do a little panning. There's a lot of work to be done. And frankly, the steps to go from this resource to something valuable is significant.Some of my favorite takeaways:* RAG as RecSys for LLMs: at its core, the goal of a RAG pipeline is finding the most relevant documents based on a task. 
This isn't very different from traditional recommendation system products that surface things for users. How can we apply old lessons to this new problem? Bryan cites fellow AIE Summit speaker and Latent Space Paper Club host Eugene Yan in decomposing the retrieval problem into retrieval, filtering, and scoring/ranking/ordering:As AI Engineers increasingly find that long context has tradeoffs, they will also have to relearn age old lessons that vector search is NOT all you need and a good systems not models approach is essential to scalable/debuggable RAG. Good thing Bryan has just written the first O'Reilly book about modern RecSys, eh?* Narrowing down evaluation: while “hallucination” is a easy term to throw around, the reality is more nuanced. A lot of times, model errors can be automatically fixed: is this JSON valid? If not, why? Is it just missing a closing brace? These smaller issues can be checked and fixed before returning the response to the user, which is easier than fixing the model.* Fine-tuning isn't all you need: when they first started building Magic, one of the discussions was around fine-tuning a model. In our episode with Jeremy Howard we talked about how fine-tuning leads to loss of capabilities as well. In notebooks, you are often dealing with domain-specific data (i.e. purchases, orders, wardrobe composition, household items, etc); the fact that the model understands that “items” are probably part of an “order” is really helpful. They have found that GPT-4 + 3.5-turbo were everything they needed to ship a great product rather than having to fine-tune on notebooks specifically.Definitely recommend listening to this one if you are interested in getting a better understanding of how to think about AI, data, and how we can use traditional machine learning lessons in large language models. The AI PivotFor more Bryan, don't miss his fireside chat at the AI Engineer Summit:Show Notes* Hex Magic* Bryan's new book: Building Recommendation Systems in Python and JAX* Bryan's whitepaper about MLOps* “Kitbashing in ML”, slides from his talk on building on top of foundation models* “Bayesian Statistics The Fun Way” by Will Kurt* Bryan's Twitter* “Berkeley man determined to walk every street in his city”* People:* Adam Azzam* Graham Neubig* Eugene Yan* Even OldridgeTimestamps* [00:00:00] Bryan's background* [00:02:34] Overview of Hex and the Magic product* [00:05:57] How Magic handles the complex notebook format to integrate cleanly with Hex* [00:08:37] Discussion of whether to build vs buy models - why Hex uses GPT-4 vs fine-tuning* [00:13:06] UX design for Magic with Hex's notebook format (aka “Chat++”)* [00:18:37] Expanding notebooks to less technical users* [00:23:46] The "Memex" as an exciting underexplored area - personal knowledge graph and memory augmentation* [00:27:02] What makes for good LLMops vs MLOps* [00:34:53] Building rigorous evaluators for Magic and best practices* [00:36:52] Different types of metrics for LLM evaluation beyond just end task accuracy* [00:39:19] Evaluation strategy when you don't own the core model that's being evaluated* [00:41:49] All the places you can make improvements outside of retraining the core LLM* [00:45:00] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, Partner and CTO-in-Residence of Decibel Partners, and today I'm joining by Bryan Bischof. [00:00:15]Bryan: Hey, nice to meet you. 
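The retrieval, filtering, and scoring/ranking decomposition described in the write-up above can be sketched roughly as follows. The keyword-overlap scorer stands in for whatever embedding or reranking model a real pipeline would use, and the sample documents are invented for illustration.

```python
def retrieve(query: str, docs: list[dict], k: int = 50) -> list[dict]:
    """Candidate generation: cheap and recall-oriented (here, naive keyword overlap)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d["text"].lower().split())), d) for d in docs]
    return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0][:k]

def filter_candidates(candidates: list[dict], allowed_sources: set[str]) -> list[dict]:
    """Business rules: drop stale or unauthorized documents before ranking."""
    return [d for d in candidates if d.get("source") in allowed_sources]

def rank(query: str, candidates: list[dict], k: int = 5) -> list[dict]:
    """Scoring/ordering: a heavier model would go here; reuse the toy score for now."""
    terms = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: -len(terms & set(d["text"].lower().split())))[:k]

docs = [
    {"text": "Customer orders rose 12 percent in March", "source": "warehouse"},
    {"text": "Holiday schedule for the office", "source": "wiki"},
]
context = rank("march customer orders",
               filter_candidates(retrieve("march customer orders", docs), {"warehouse"}))
print(context)
```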
[00:00:17]Alessio: So Bryan has one of the most thorough and impressive backgrounds we had on the show so far. Lead software engineer at Blue Bottle Coffee, which if you live in San Francisco, you know a lot about. And maybe you'll tell us 30 seconds on what that actually means. You worked as a data scientist at Stitch Fix, which used to be one of the premier data science teams out there. [00:00:38]Bryan: It used to be. Ouch. [00:00:39]Alessio: Well, no, no. Well, you left, you know, so how good can it still be? Then head of data science at Weights and Biases. You're also a professor at Rutgers and you're just wrapping up a new O'Reilly book as well. So a lot, a lot going on. Yeah. [00:00:52]Bryan: And currently head of AI at Hex. [00:00:54]Alessio: Let's do the Blue Bottle thing because I definitely want to hear what's the, what's that like? [00:00:58]Bryan: So I was leading data at Blue Bottle. I was the first data hire. I came in to kind of get the data warehouse in order and then see what we could build on top of it. But ultimately I mostly focused on demand forecasting, a little bit of recsys, a little bit of sort of like website optimization and analytics. But ultimately anything that you could imagine sort of like a retail company needing to do with their data, we had to do. I sort of like led that team, hired a few people, expanded it out. One interesting thing was I was part of the Nestle acquisition. So there was a period of time where we were sort of preparing for that and didn't know, which was a really interesting dynamic. Being acquired is a very not necessarily fun experience for the data team. [00:01:37]Alessio: I build a lot of internal tools for sourcing at the firm and we have a small VCs and data community of like other people doing it. And I feel like if you had a data feed into like the Blue Bottle in South Park, the Blue Bottle at the Hanahaus in Palo Alto, you can get a lot of secondhand information on the state of VC funding. [00:01:54]Bryan: Oh yeah. I feel like the real source of alpha is just bugging a Blue Bottle. [00:01:58]Alessio: Exactly. And what's your latest book about? [00:02:02]Bryan: I just wrapped up a book with a coauthor Hector Yee called Building Production Recommendation Systems. I'll give you the rest of the title because it's fun. It's in Python and JAX. And so for those of you that are like eagerly awaiting the first O'Reilly book that focuses on JAX, here you go. [00:02:17]Alessio: Awesome. And we'll chat about that later on. But let's maybe talk about Hex and Magic before. I've known Hex for a while, I've used it as a notebook provider and you've been working on a lot of amazing AI enabled experiences. So maybe run us through that. [00:02:34]Bryan: So I too, before I sort of like joined Hex, saw it as this like really incredible notebook platform, sort of a great place to do data science workflows, quite complicated, quite ad hoc interactive ones. And before I joined, I thought it was the best place to do data science workflows. And so when I heard about the possibility of building AI tools on top of that platform, that seemed like a huge opportunity. In particular, I lead the product called Magic. Magic is really like a suite of sort of capabilities as opposed to its own independent product. What I mean by that is they are sort of AI enhancements to the existing product. And that's a really important difference from sort of building something totally new that just uses AI. 
It's really important to us to enhance the already incredible platform with AI capabilities. So these are things like the sort of obvious like co-pilot-esque vibes, but also more interesting and dynamic ways of integrating AI into the product. And ultimately the goal is just to make people even more effective with the platform. [00:03:38]Alessio: How do you think about the evolution of the product and the AI component? You know, even if you think about 10 months ago, some of these models were not really good on very math based tasks. Now they're getting a lot better. I'm guessing a lot of your workloads and use cases is data analysis and whatnot. [00:03:53]Bryan: When I joined, it was pre 4 and it was pre the sort of like new chat API and all that. But when I joined, it was already clear that GPT was pretty good at writing code. And so when I joined, they had already executed on the vision of what if we allowed the user to ask a natural language prompt to an AI and have the AI assist them with writing code. So what that looked like when I first joined was it had some capability of writing SQL and it had some capability of writing Python and it had the ability to explain and describe code that was already written. Those very, what feel like now primitive capabilities, believe it or not, were already quite cool. It's easy to look back and think, oh, it's like kind of like Stone Age in these timelines. But to be clear, when you're building on such an incredible platform, adding a little bit of these capabilities feels really effective. And so almost immediately I started noticing how it affected my own workflow because ultimately as sort of like an engineering lead and a lot of my responsibility is to be doing analytics to make data driven decisions about what products we build. And so I'm actually using Hex quite a bit in the process of like iterating on our product. When I'm using Hex to do that, I'm using Magic all the time. And even in those early days, the amount that it sped me up, that it enabled me to very quickly like execute was really impressive. And so even though the models weren't that good at certain things back then, that capability was not to be underestimated. But to your point, the models have evolved between 3.5 Turbo and 4. We've actually seen quite a big enhancement in the kinds of tasks that we can ask Magic and even more so with things like function calling and understanding a little bit more of the landscape of agent workflows, we've been able to really accelerate. [00:05:57]Alessio: You know, I tried using some of the early models in notebooks and it actually didn't like the IPyNB formatting, kind of like a JSON plus XML plus all these weird things. How have you kind of tackled that? Do you have some magic behind the scenes to make it easier for models? Like, are you still using completely off the shelf models? Do you have some proprietary ones? [00:06:19]Bryan: We are using at the moment in production 3.5 Turbo and GPT-4. I would say for a large number of our applications, GPT-4 is pretty much required. To your question about, does it understand the structure of the notebook? And does it understand all of this somewhat complicated wrappers around the content that you want to show? We do our very best to abstract that away from the model and make sure that the model doesn't have to think about what the cell wrapper code looks like. Or for our Magic charts, it doesn't have to speak the language of Vega. 
These are things that we put a lot of work in on the engineering side, to the AI engineer profile. This is the AI engineering work to get all of that out of the way so that the model can speak in the languages that it's best at. The model is quite good at SQL. So let's ensure that it's speaking the language of SQL and that we are doing the engineering work to get the output of that model, the generations, into our notebook format. So too for other cell types that we support, including charts, and just in general, understanding the flow of different cells, understanding what a notebook is, all of that is hard work that we've done to ensure that the model doesn't have to learn anything like that. I remember early on, people asked the question, are you going to fine tune a model to understand Hex cells? And almost immediately, my answer was no. No we're not. Using fine-tuned models in 2022, I was already aware that there are some limitations of that approach and frankly, even using GPT-3 and GPT-2 back in the day in Stitch Fix, I had already seen a lot of instances where putting more effort into pre- and post-processing can avoid some of these larger lifts. [00:08:14]Alessio: You mentioned Stitch Fix and GPT-2. How has the balance between build versus buy, so to speak, evolved? So GPT-2 was a model that was not super advanced, so for a lot of use cases it was worth building your own thing. Is with GPT-4 and the likes, is there a reason to still build your own models for a lot of this stuff? Or should most people be fine-tuning? How do you think about that? [00:08:37]Bryan: Sometimes people ask, why are you using GPT-4 and why aren't you going down the avenue of fine-tuning today? I can get into fine-tuning specifically, but I do want to talk a little bit about the good old days of GPT-2. Shout out to Reza. Reza introduced me to GPT-2. I still remember him explaining the difference between general transformers and GPT. I remember one of the tasks that we wanted to solve with transformer-based generative models at Stitch Fix were writing descriptions of clothing. You might think, ooh, that's a multi-modal problem. The answer is, not necessarily. We actually have a lot of features about the clothes that are almost already enough to generate some reasonable text. I remember at that time, that was one of the first applications that we had considered. There was a really great team of NLP scientists at Stitch Fix who worked on a lot of applications like this. I still remember being exposed to the GPT endpoint back in the days of 2. If I'm not mistaken, and feel free to fact check this, I'm pretty sure Stitch Fix was the first OpenAI customer, unlike their true enterprise application. Long story short, I ultimately think that depending on your task, using the most cutting-edge general model has some advantages. If those are advantages that you can reap, then go for it. So at Hex, why GPT-4? Why do we need such a general model for writing code, writing SQL, doing data analysis? Shouldn't a fine-tuned model just on Kaggle notebooks be good enough? I'd argue no. And ultimately, because we don't have one specific sphere of data that we need to write great data analysis workbooks for, we actually want to provide a platform for anyone to do data analysis about their business. To do that, you actually need to entertain an extremely general universe of concepts. So as an example, if you work at Hex and you want to do data analysis, our projects are called Hexes. That's relatively straightforward to teach it. 
There's a concept of a notebook. These are data science notebooks, and you want to ask analytics questions about notebooks. Maybe if you trained on notebooks, you could answer those questions, but let's come back to Blue Bottle. If I'm at Blue Bottle and I have data science work to do, I have to ask it questions about coffee. I have to ask it questions about pastries, doing demand forecasting. And so very quickly, you can see that just by serving just those two customers, a model purely fine-tuned on like Kaggle competitions may not actually fit the bill. And so the more and more that you want to build a platform that is sufficiently general for your customer base, the more I think that these large general models really pack a lot of additional opportunity in. [00:11:21]Alessio: With a lot of our companies, we talked about stuff that you used to have to extract features for, now you have out of the box. So say you're a travel company, you want to do a query, like show me all the hotels and places that are warm during spring break. It would be just literally like impossible to do before these models, you know? But now the model knows, okay, spring break is like usually these dates and like these locations are usually warm. So you get so much out of it for free. And in terms of Magic integrating into Hex, I think AI UX is one of our favorite topics and how do you actually make that seamless. In traditional code editors, the line of code is like kind of the atomic unit and HEX, you have the code, but then you have the cell also. [00:12:04]Bryan: I think the first time I saw Copilot and really like fell in love with Copilot, I thought finally, fancy auto-complete. And that felt so good. It felt so elegant. It felt so right sized for the task. But as a data scientist, a lot of the work that you do previous to the ML engineering part of the house, you're working in these cells and these cells are atomic. They're expressing one idea. And so ultimately, if you want to make the transition from something like this code, where you've got like a large amount of code and there's a large amount of files and they kind of need to have awareness of one another, and that's a long story and we can talk about that. But in this atomic, somewhat linear flow through the notebook, what you ultimately want to do is you want to reason with the agent at the level of these individual thoughts, these atomic ideas. Usually it's good practice in say Jupyter notebook to not let your cells get too big. If your cell doesn't fit on one page, that's like kind of a code smell, like why is it so damn big? What are you doing in this cell? That also lends some hints as to what the UI should feel like. I want to ask questions about this one atomic thing. So you ask the agent, take this data frame and strip out this prefix from all the strings in this column. That's an atomic task. It's probably about two lines of pandas. I can write it, but it's actually very natural to ask magic to do that for me. And what I promise you is that it is faster to ask magic to do that for me. At this point, that kind of code, I never write. And so then you ask the next question, which is what should the UI be to do chains, to do multiple cells that work together? Because ultimately a notebook is a chain of cells and actually it's a first class citizen for Hex. So we have a DAG and the DAG is the execution DAG for the individual cells. This is one of the reasons that Hex is reactive and kind of dynamic in that way. 
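A generic sketch of what an execution DAG over cells implies (this is an illustration, not Hex's actual implementation): each cell declares its upstream cells, and when one changes, only the affected cells re-run, in dependency order.

```python
from graphlib import TopologicalSorter

# Each cell maps to the set of cells it depends on (hypothetical cell names).
cell_deps = {
    "load_orders": set(),
    "clean_orders": {"load_orders"},
    "summary_stats": {"clean_orders"},
    "chart": {"summary_stats"},
}

def downstream_of(changed: str) -> set[str]:
    """Every cell that transitively depends on the changed cell, plus itself."""
    hit, frontier = {changed}, {changed}
    while frontier:
        frontier = {c for c, deps in cell_deps.items() if deps & frontier} - hit
        hit |= frontier
    return hit

def rerun_order(changed: str) -> list[str]:
    """Reactive re-execution: only affected cells, in dependency order."""
    affected = downstream_of(changed)
    order = TopologicalSorter(cell_deps).static_order()
    return [c for c in order if c in affected]

print(rerun_order("clean_orders"))  # ['clean_orders', 'summary_stats', 'chart']
```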
And so the very next question is, what is the sort of like AI UI for these collections of cells? And back in June and July, we thought really hard about what does it feel like to ask magic a question and get a short chain of cells back that execute on that task. And so we've thought a lot about sort of like how that breaks down into individual atomic units and how those are tied together. We introduced something which is kind of an internal name, but it's called the airlock. And the airlock is exactly a sequence of cells that refer to one another, understand one another, use things that are happening in other cells. And it gives you a chance to sort of preview what magic has generated for you. Then you can accept or reject as an entire group. And that's one of the reasons we call it an airlock, because at any time you can sort of eject the airlock and see it in the space. But to come back to your question about how the AI UX fits into this notebook, ultimately a notebook is very conversational in its structure. I've got a series of thoughts that I'm going to express as a series of cells. And sometimes if I'm a kind data scientist, I'll put some text in between them too, explaining what on earth I'm doing. And that feels, in my opinion, and I think this is quite shared amongst exons, that feels like a really nice refinement of the chat UI. I've been saying for several months now, like, please stop building chat UIs. There is some irony because I think what the notebook allows is like chat plus plus. [00:15:36]Alessio: Yeah, I think the first wave of everything was like chat with X. So it was like chat with your data, chat with your documents and all of this. But people want to code, you know, at the end of the day. And I think that goes into the end user. I think most people that use notebooks are software engineer, data scientists. I think the cool things about these models is like people that are not traditionally technical can do a lot of very advanced things. And that's why people like code interpreter and chat GBT. How do you think about the evolution of that persona? Do you see a lot of non-technical people also now coming to Hex to like collaborate with like their technical folks? [00:16:13]Bryan: Yeah, I would say there might even be more enthusiasm than we're prepared for. We're obviously like very excited to bring what we call the like low floor user into this world and give more people the opportunity to self-serve on their data. We wanted to start by focusing on users who are already familiar with Hex and really make magic fantastic for them. One of the sort of like internal, I would say almost North Stars is our team's charter is to make Hex feel more magical. That is true for all of our users, but that's easiest to do on users that are already able to use Hex in a great way. What we're hearing from some customers in particular is sort of like, I'm excited for some of my less technical stakeholders to get in there and start asking questions. And so that raises a lot of really deep questions. If you immediately enable self-service for data, which is almost like a joke over the last like maybe like eight years, if you immediately enabled self-service, what challenges does that bring with it? What risks does that bring with it? And so it has given us the opportunity to think about things like governance and to think about things like alignment with the data team and making sure that the data team has clear visibility into what the self-service looks like. 
Having been leading a data team, trying to provide answers for stakeholders and hearing that they really want to self-serve, a question that we often found ourselves asking is, what is the easiest way that we can keep them on the rails? What is the easiest way that we can set up the data warehouse and set up our tools such that they can ask and answer their own questions without coming away with like false answers? Because that is such a priority for data teams, it becomes an important focus of my team, which is, okay, magic may be an enabler. And if it is, what do we also have to respect? We recently introduced the data manager and the data manager is an auxiliary sort of like tool on the Hex platform to allow people to write more like relevant metadata about their data warehouse to make sure that magic has access to the best information. And there are some things coming to kind of even further that story around governance and understanding. [00:18:37]Alessio: You know, you mentioned self-serve data. And when I was like a joke, you know, the whole rush to the modern data stack was something to behold. Do you think AI is like in a similar space where it's like a bit of a gold rush? [00:18:51]Bryan: I have like sort of two comments here. One I'll shamelessly steal from a friend, Adam Azzam from Prefect. He says that this is more of like an iron mine than a gold mine in the sense of there is a lot of work to extract this precious, precious resource. And that's the first one is I think, don't expect to just go down to the stream and do a little panning. There's a lot of work to be done. And frankly, the steps to go from this like gold to, or this resource to something valuable is significant. I think people have gotten a little carried away with the old maxim of like, don't go pan for gold, sell pickaxes and shovels. It's a much stronger business model. At this point, I feel like I look around and I see more pickaxe salesmen and shovel salesmen than I do prospectors. And that scares me a little bit. Metagame where people are starting to think about how they can build tools for people building tools for AI. And that starts to give me a little bit of like pause in terms of like, how confident are we that we can even extract this resource into something valuable? I got a text message from a VC earlier today, and I won't name the VC or the fund, but the question was, what are some medium or large size companies that have integrated AI into their platform in a way that you're really impressed by? And I looked at the text message for a few minutes and I was finding myself thinking and thinking, and I responded, maybe only co-pilot. It's been a couple hours now, and I don't think I've thought of another one. And I think that's where I reflect again on this, like iron versus gold. If it was really gold, I feel like I'd be more blown away by other AI integrations. And I'm not yet. [00:20:40]Alessio: I feel like all the people finding gold are the ones building things that traditionally we didn't focus on. So like mid-journey. I've talked to a company yesterday, which I'm not going to name, but they do agents for some use case, let's call it. They are 11 months old. They're making like 8 million a month in revenue, but in a space that you wouldn't even think about selling to. If you were like a shovel builder, you wouldn't even go sell to those people. And Swix talks about this a bunch, about like actually trying to go application first for some things. 
Let's actually see what people want to use and what works. What do you think are the most maybe underexplored areas in AI? Is there anything that you wish people were actually trying to shovel? [00:21:23]Bryan: I've been saying for a couple of months now, if I had unlimited resources and I was just sort of like truly like, you know, on my own building whatever I wanted, I think the thing that I'd be most excited about is building sort of like the personal Memex. The Memex is something that I've wanted since I was a kid. And are you familiar with the Memex? It's the memory extender. And it's this idea that sort of like human memory is quite weak. And so if we can extend that, then that's a big opportunity. So I think one of the things that I've always found to be one of the limiting cases here is access. How do you access that data? Even if you did build that data like out, how would you quickly access it? And one of the things I think there's a constellation of technologies that have come together in the last couple of years that now make this quite feasible. Like, information retrieval has really improved and we have a lot more simple systems for getting started with information retrieval. Two, natural language is ultimately the interface that you'd really like these systems to work on, both in terms of sort of like structuring the data and preparing the data, but also on the retrieval side. So what keys off the query for retrieval, probably ultimately natural language. And third, if you really want to go into like the purely futuristic aspect of this, it is latent voice to text. And that is also something that has quite recently become possible. I did talk to a company recently called Gather, which seems to have some cool ideas in this direction, but I haven't seen yet what I really want, which is I want something that is sort of like every time I listen to a podcast or I watch a movie or I read a book, it sort of like has a great vector index built on top of all that information that's contained within. And then when I'm having my next conversation and I can't quite remember the name of this person who did this amazing thing, for example, if we're talking about the Memex, it'd be really nice to have Vannevar Bush like pop up on my, you know, on my Memex display, because I always forget Vannevar Bush's name. This is one time that I didn't, but I often do. This is something that I think is only recently enabled and maybe we're still five years out before it can be good, but I think it's one of the most exciting projects that has become possible in the last three years that I think generally wasn't possible before. [00:23:46]Alessio: Would you wear one of those AI pendants that record everything? [00:23:50]Bryan: I think I'm just going to do it because I just like support the idea. I'm also admittedly someone who, when Google Glass first came out, thought that seems awesome. I know that there's like a lot of like challenges about the privacy aspect of it, but it is something that I did feel was like a disappointment to lose some of that technology. Fun fact, one of the early Google Glass developers was this MIT computer scientist who basically built the first wearable computer while he was at MIT. And he like took notes about all of his conversations in real time on his wearable and then he would have real time access to them. Ended up being kind of a scandal because he wanted to use a computer during his defense and they like tried to prevent him from doing it.
So pretty interesting story. [00:24:35]Alessio: I don't know but the future is going to be weird. I can tell you that much. Talking about pickaxes, what do you think about the pickaxes that people built before? Like all the whole MLOps space, which has its own like startup graveyard in there. How are those products evolving? You know, you were at Weights and Biases before, which is now doing a big AI push as well. [00:24:57]Bryan: If you really want to like sort of like rub my face in it, you can go look at my white paper on MLOps from 2022. It's interesting. I don't think there's many things in that that I would these days think are like wrong or even sort of like naive. But what I would say is there are both a lot of analogies between MLOps and LLMOps, but there are also a lot of like key differences. So like leading an engineering team at the moment, I think a lot more about good engineering practices than I do about good ML practices. That being said, it's been very convenient to be able to see around corners in a few of the like ML places. One of the first things I did at Hex was work on evals. This was in February. I hadn't yet been overwhelmed by people talking about evals until about May. And the reason that I was able to be a couple of months early on that is because I've been building evals for ML systems for years. I don't know how else to build an ML system other than start with the evals. I teach my students at Rutgers like objective framing is one of the most important steps in starting a new data science project. If you can't clearly state what your objective function is and you can't clearly state how that relates to the problem framing, you've got no hope. And I think that is a very shared reality with LLM applications. Coming back to one thing you mentioned from earlier about sort of like the applications of these LLMs. To that end, I think the pickaxes that are still very valuable are for understanding systems that are inherently less predictable, that are inherently sort of experimental. On my engineering team, we have an experimentalist. So one of the AI engineers, his focus is experiments. That's something that you wouldn't normally expect to see on an engineering team. But it's important on an AI engineering team to have one person whose entire focus is just experimenting, trying, okay, this is a hypothesis that we have about how the model will behave. Or this is a hypothesis we have about how we can improve the model's performance on this. And then going in, running experiments, augmenting our evals to test it, et cetera. What I really respect are pickaxes that recognize the hybrid nature of the sort of engineering tasks. They are ultimately engineering tasks with a flavor of ML. And so when systems respect that, I tend to have a very high opinion. One thing that I was very, very aligned with Weights and Biases on is sort of composability. These systems like ML systems need to be extremely composable to make them much more iterative. If you don't build these systems in composable ways, then your integration hell is just magnified. When you're trying to iterate as fast as people need to be iterating these days, I think integration hell is a tax not worth paying. [00:27:51]Alessio: Let's talk about some of the LLM native pickaxes, so to speak. So RAG is one. One thing is doing RAG on text data. One thing is doing RAG on tabular data. We're releasing tomorrow our episode with Cube, the semantic layer company. Curious to hear your thoughts on it.
How are you doing RAG, pros, cons? [00:28:11]Bryan: It became pretty obvious to me almost immediately that RAG was going to be important. Because ultimately, you never expect your model to have access to all of the things necessary to respond to a user's request. So as an example, Magic users would like to write SQL that's relevant to their business. And it's important then to have the right data objects that they need to query. We can't expect any LLM to understand our user's data warehouse topology. So what we can expect is that we can build a RAG system that is data warehouse aware, data topology aware, and use that to provide really great information to the model. If you ask the model, how are my customers trending over time? And you ask it to write SQL to do that. What is it going to do? Well, ultimately, it's going to hallucinate the structure of that data warehouse that it needs to write a general query. Most likely what it's going to do is it's going to look in its sort of memory of Stack Overflow responses to customer queries, and it's going to say, oh, it's probably a customers table and we're in the age of dbt, so it might even be called, you know, dim_customers or something like that. And what's interesting is, and I encourage you to try, ChatGPT will do an okay job of like hallucinating up some tables. It might even hallucinate up some columns. But what it won't do is it won't understand the joins in that data warehouse that it needs, and it won't understand the data caveats or the sort of where clauses that need to be there. And so how do you get it to understand those things? Well, this is textbook RAG. This is the exact kind of thing that you expect RAG to be good at augmenting. But I think people who have done a lot of thinking about RAG for the document case think of it as chunking and sort of like MapReduce and these sorts of approaches. But I think people haven't followed this train of thought quite far enough yet. Jerry Liu was on the show and he talked a little bit about thinking of this as like information retrieval. And I would push that even further. And I would say that ultimately RAG is just RecSys for LLMs. As I kind of already mentioned, I'm a little bit recommendation systems heavy. And so from the beginning, RAG has always felt like RecSys to me. It has always felt like you're building a recommendation system. And what are you trying to recommend? The best possible resources for the LLM to execute on a task. And so most of my approach to RAG and the way that we've improved magic via retrieval is by building a recommendation system. [00:30:49]Alessio: It's funny, you mentioned that you spent three years writing the book, the O'Reilly book. Things must have changed as you wrote the book. I don't want to bring out any nightmares from there, but what are the tips for people who want to stay on top of this stuff? Do you have any other favorite newsletters, like Twitter accounts that you follow, communities you spend time in? [00:31:10]Bryan: I am sort of an aggressive reader of technical books. I think I'm almost never disappointed by time that I've invested in reading technical manuscripts. I find that most people write O'Reilly or similar books because they've sort of got this itch that they need to scratch, which is that I have some ideas, I have some understanding that was hard won, I need to tell other people. And there's something that, from my experience, correlates between that itch and sort of like useful information.
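To make the data-warehouse-aware RAG described above concrete, here is a minimal sketch of assembling a SQL-generation prompt from retrieved schema metadata, so the model is not left to hallucinate table names, joins, or caveats. The table names, metadata fields, and prompt wording are hypothetical illustrations, not Hex's actual implementation.

```python
# A rough sketch: retrieved warehouse metadata is injected into the prompt so the model
# writes SQL against real tables instead of hallucinated ones. All names are made up.

RETRIEVED_SCHEMA = [
    {"table": "dim_customers",
     "columns": ["customer_id", "signup_date", "region"],
     "caveats": "one row per customer; exclude region = 'test'"},
    {"table": "fct_orders",
     "columns": ["order_id", "customer_id", "order_date", "revenue"],
     "joins": "fct_orders.customer_id = dim_customers.customer_id"},
]

def build_sql_prompt(question: str, schema_docs: list[dict]) -> str:
    lines = ["You write SQL against the warehouse described below. Use only these tables.", ""]
    for doc in schema_docs:
        lines.append(f"table {doc['table']}: columns {', '.join(doc['columns'])}")
        if "joins" in doc:
            lines.append(f"  join hint: {doc['joins']}")
        if "caveats" in doc:
            lines.append(f"  caveat: {doc['caveats']}")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)

print(build_sql_prompt("How are my customers trending over time?", RETRIEVED_SCHEMA))
```

In a real system the RETRIEVED_SCHEMA list would come from a retrieval step over the warehouse metadata rather than being hard-coded, but the prompt-assembly shape stays the same.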
As an example, one of the people on my team, his name is Will Kurt, he wrote a book, sort of, Bayesian Statistics the Fun Way. I knew some Bayesian statistics, but I read his book anyway. And the reason was because I was like, if someone feels motivated to write a book called Bayesian Statistics the Fun Way, they've got something to say about Bayesian statistics. I learned so much from that book. That book is like technically like targeted at someone with less knowledge and experience than me. And boy, did it humble me about my understanding of Bayesian statistics. And so I think this is a very boring answer, but ultimately like I read a lot of books and I think that they're a really valuable way to learn these things. I also regrettably still read a lot of Twitter. There is plenty of noise in that signal, but ultimately it is still usually like one of the first directions to get sort of an instinct for what's valuable. The other comment that I want to make is we are in this age of sort of like arXiv is becoming more of like an ad platform. I think that's a little challenging right now to kind of use it the way that I used to use it, which is for like higher signal. I've chatted a lot with a CMU professor, Graham Neubig, and he's been doing LLM evaluation and LLM enhancements for about five years, and know that I didn't misspeak. And I think talking to him has provided me a lot of like directionality for more believable sources. Trying to cut through the hype. I know that there's a lot of other things that I could mention in terms of like just channels, but ultimately right now I think there's almost an abundance of channels and I'm a little bit more keen on high signal. [00:33:18]Alessio: The other side of it is like, I see so many people say, Oh, I just wrote a paper on X and it's like an article. And I'm like, an article is not a paper, but it's just funny how I know we were kind of chatting before about terms being reinvented and like people that are not from this space kind of getting into AI engineering now. [00:33:36]Bryan: I also don't want to be gatekeepy. Actually I used to say a lot to people, don't be shy about putting your ideas down on paper. I think it's okay to just like kind of go for it. And I, I myself have something on arXiv that is like comically naive. It's intentionally naive. Right now I'm less concerned by more naive approaches to things than I am by the purely like advertising approach to sort of writing these short notes and articles. I think blogging still has a good place. And I remember getting feedback during my PhD thesis that like my thesis sounded more like a long blog post. And I now feel like that curmudgeonly professor who's also like, yeah, maybe just keep this to the blogs. That's funny. Alessio: Uh, yeah, I think one of the things that Swyx said when he was opening the AI engineer summit a couple of weeks ago was like, look, most people here don't know much about the space because it's so new and like being open and welcoming. I think it's one of the goals. And that's why we try and keep every episode at a level that it's like, you know, the experts can understand and learn something, but also the novices can kind of like follow along. You mentioned evals before. I think that's one of the hottest topics obviously out there right now. What are evals? How do we know if they work? Yeah. What are some of the fun learnings from building them into Hex?
[00:34:53]Bryan: I said something at the AI engineer summit that I think a few people have already called out, which is like, if you can't get your evals to be sort of like objective, then you're not trying hard enough. I stand by that statement. I'm not going to walk it back. I know that that doesn't feel super good because people, people want to think that like their unique snowflake of a problem is too nuanced. But I think this is actually one area where, you know, in this dichotomy of like, who can do AI engineering? And the answer is kind of everybody. Software engineering can become AI engineering and ML engineering can become AI engineering. One place where I think the more data science minded folk have an advantage is we've gotten more practice in taking very vague notions and trying to put a like objective function around that. And so ultimately I would just encourage everybody who wants to build evals, just work incredibly hard on codifying what is good and bad in terms of these objective metrics. As far as like how you go about turning those into evals, I think it's kind of like sweat equity. Unfortunately, I told the CEO of Gantry several months ago, I think it's been like six months now, that I was sort of like looking at every single internal Hex request to magic by hand with my eyes and sort of like thinking, how can I turn this into an eval? Is there a way that I can take this real request during this dogfooding, not very developed stage? How can I make that into an evaluation? That was a lot of sweat equity that I put in a lot of like boring evenings, but I do think ultimately it gave me a lot of understanding for the way that the model was misbehaving. Another thing is how can you start to understand these misbehaviors as like auxiliary evaluation metrics? So there's not just one evaluation that you want to do for every request. It's easy to say like, did this work? Did this not work? Did the response satisfy the task? But there's a lot of other metrics that you can pull off these questions. And so like, let me give you an example. If it writes SQL that doesn't reference a table in the database that it's supposed to be querying against, we would think of that as a hallucination. You could separately consider, is it a hallucination as a valuable metric? You could separately consider, does it get the right answer? The right answer is this sort of like all in one shot, like evaluation that I think people jump to. But these intermediary steps are really important. I remember hearing that GitHub had thousands of lines of post-processing code around Copilot to make sure that their responses were sort of correct or in the right place. And that kind of sort of defensive programming against bad responses is the kind of thing that you can build by looking at many different types of evaluation metrics. Because you can say like, oh, you know, the Copilot completion here is mostly right, but it doesn't close the brace. Well, that's the thing you can check for. Or, oh, this completion is quite good, but it defines a variable that was like already defined in the file. Like that's going to have a problem. That's an evaluation that you could check separately. And so this is where I think it's easy to convince yourself that all that matters is does it get the right answer? But the more that you think about production use cases of these things, the more you find a lot of this kind of stuff.
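As a rough illustration of these auxiliary metrics, an eval harness can score each generated SQL response on several independent checks rather than a single did-it-work flag. This is a minimal sketch assuming a toy schema and regex-level SQL inspection; the names and checks are hypothetical, not Hex's eval code.

```python
import re

# Hypothetical warehouse schema the eval harness knows about.
KNOWN_TABLES = {"dim_customers", "fct_orders", "dim_products"}

def referenced_tables(sql: str) -> set[str]:
    # Rough extraction of identifiers after FROM/JOIN; a real SQL parser would do better.
    return {m.lower() for m in re.findall(r"(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)}

def eval_sql_response(sql: str, expected_tables: set[str]) -> dict:
    refs = referenced_tables(sql)
    return {
        # Hallucination check: did it reference tables that don't exist in the warehouse?
        "hallucinated_tables": sorted(refs - KNOWN_TABLES),
        # Coverage check: did it use the tables the task actually needs?
        "missing_expected_tables": sorted(expected_tables - refs),
        # Cheap syntactic sanity checks, analogous to "does the completion close the brace".
        "balanced_parens": sql.count("(") == sql.count(")"),
        "has_select": bool(re.search(r"\bselect\b", sql, flags=re.IGNORECASE)),
    }

# One hand-collected request from a dogfooding eval set.
candidate = "SELECT count(*) FROM customers_tbl JOIN fct_orders USING (customer_id)"
print(eval_sql_response(candidate, expected_tables={"dim_customers", "fct_orders"}))
# {'hallucinated_tables': ['customers_tbl'], 'missing_expected_tables': ['dim_customers'], ...}
```

Each key becomes its own metric that can be tracked over time, which is what makes the kind of defensive post-processing described above possible.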
One simple example is like sometimes the model names the output of a cell a variable that's already in scope. Okay. Like we can just detect that and like we can just fix that. And this is the kind of thing that like evaluations over time and as you build these evaluations over time, you really can expand the robustness in which you trust these models. And for a company like Hex, who we need to put this stuff in GA, we can't just sort of like get to demo stage or even like private beta stage. We're really hunting GA on all of these capabilities. Did it get the right answer on some cases is not good enough. [00:38:57]Alessio: I think the follow up question to that is in your past roles, you owned the model that you were evaluating against. Here you don't actually have control over how the model evolves. How do you think about whether the model will just need to improve, or we'll use another model, versus like we can build kind of like engineering post-processing on top of it. How do you make the choice? [00:39:19]Bryan: So I want to say two things here. One, like Jerry Liu talked a little bit about in his episode, you don't always want to retrain the weights to serve certain use cases. RAG is another tool that you can use to kind of like soft tune. I think that's right. And I want to go back to my favorite analogy here, which is like recommendation systems. When you build a recommendation system, you build the objective function. You think about like what kind of recs you want to provide, what kind of features you're allowed to use, et cetera, et cetera. But there's always another step. There's this really wonderful collection of blog posts from Eugene Yan and then ultimately like Even Oldridge kind of like iterated on that for the Merlin project where there's this multi-stage recommender. And the multi-stage recommender says the first step is to do great retrieval. Once you've done great retrieval, you then need to do great ranking. Once you've done great ranking, you need to then do a good job serving. And so what's the analogy here? RAG is retrieval. You can build different embedding models to encode different features in your latent space to ensure that your ranking model has the best opportunity. Now you might say, oh, well, my ranking model is something that I've got a lot of capability to adjust. I've got full access to my ranking model. I'm going to retrain it. And that's great. And you should. And over time you will. But there's one more step and that's downstream and that's the serving. Serving often sounds like I just show the s**t to the user, but ultimately serving is things like, did I provide diverse recommendations? Going back to Stitch Fix days, I can't just recommend them five shirts of the same silhouette and cut. I need to serve them a diversity of recommendations. Have I respected their requirements? They clicked on something that got them to this place. Are the recommendations relevant to that query? Are there any hard rules? Do we maybe not have this in stock? These are all things that you put downstream. And so much like the recommendations use case, there's a lot of knobs to pull outside of retraining the model. And even in recommendation systems, when do you retrain your model for ranking? Not nearly as much as you do other s**t. And even this like embedding model, you might fiddle with more often than the true ranking model.
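The retrieval, ranking, and serving stages borrowed from recommendation systems can be sketched as three small functions, with the serving step applying hard rules before anything reaches the prompt. This is a toy illustration with stand-in scoring logic, not a production retriever; an embedding index and a learned ranker would replace the keyword overlap used here, and all object names are made up.

```python
from dataclasses import dataclass

@dataclass
class SchemaDoc:
    name: str            # e.g. a table name
    text: str            # metadata text used for matching
    deprecated: bool = False

def retrieve(query: str, docs: list[SchemaDoc], k: int = 20) -> list[SchemaDoc]:
    # Stage 1: cheap, high-recall matching. A stand-in for embedding similarity search.
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda x: -x[0])[:k] if score > 0]

def rank(query: str, candidates: list[SchemaDoc]) -> list[SchemaDoc]:
    # Stage 2: a more expensive ranker (cross-encoder, learned model, etc.) would go here.
    first_term = query.lower().split()[0]
    return sorted(candidates, key=lambda d: d.text.lower().count(first_term), reverse=True)

def serve(candidates: list[SchemaDoc], budget: int = 5) -> list[SchemaDoc]:
    # Stage 3: hard rules before anything reaches the prompt, such as dropping deprecated
    # objects and respecting a context budget.
    return [d for d in candidates if not d.deprecated][:budget]

docs = [
    SchemaDoc("dim_customers", "customers dimension table with signup date and region"),
    SchemaDoc("fct_orders", "orders fact table with revenue, order time, and customers key"),
    SchemaDoc("tmp_customers_old", "legacy customers snapshot", deprecated=True),
]
query = "customers trending over time"
context = serve(rank(query, retrieve(query, docs)))
print([d.name for d in context])  # ['dim_customers', 'fct_orders'] for these toy inputs
```

The point of keeping the stages separate is that each knob (the embedding model, the ranker, the serving rules) can be tuned independently, without touching the LLM's weights.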
And so I think the only piece of the puzzle that you don't have access to in the LLM case is that sort of like middle step. That's okay. We've got plenty of other work to do. So right now I feel pretty enabled. [00:41:56]Alessio: That's great. You obviously wrote a book on RecSys. What are some of the key concepts that maybe people that don't have a data science background, ML background should keep in mind as they work in this area? [00:42:07]Bryan: It's easy to first think these models are stochastic. They're unpredictable. Oh, well, what are we going to do? I think of this almost like gaseous type question of like, if you've got this entropy, where can you put the entropy? Where can you let it be entropic and where can you constrain it? And so what I want to say here is think about the cases where you need it to be really tightly constrained. So why are people so excited about function calling? Because function calling feels like a way to constrict it. Where can you let it be more gaseous? Well, maybe in the way that it talks about what it wants to do. Maybe for planning, if you're building agents and you want to do sort of something chain of thoughty. Well, that's a place where the entropy can happily live. When you're building applications of these models, I think it's really important as part of the problem framing to be super clear upfront. These are the things that can be entropic. These are the things that cannot be. These are the things that need to be super rigid and really, really aligned to a particular schema. We've had a lot of success in making specific the parts that need to be precise and tightly schemified, and that has really paid dividends. And so other analogies from data science that I think are very valuable is there's the sort of like human in the loop analogy, which has been around for quite a while. And I have gone on record a couple of times saying that like, I don't really love human in the loop. One of the things that I think we can learn from human in the loop is that the user is the best judge of what is good. And the user is pretty motivated to sort of like interact and give you kind of like additional nudges in the direction that you want. I think what I'd like to flip though, is instead of human in the loop, I'd like it to be AI in the loop. I'd rather center the user. I'd rather keep the user as the like core item at the center of this universe. And the AI is a tool. By switching that analogy a little bit, what it allows you to do is think about where are the places in which the user can reach for this as a tool, execute some task with this tool, and then go back to doing their workflow. It still gets this back and forth between things that computers are good at and things that humans are good at, which has been valuable in the human loop paradigm. But it allows us to be a little bit more, I would say, like the designers talk about like user-centered. And I think that's really powerful for AI applications. And it's one of the things that I've been trying really hard with Magic to make that feel like the workflow as the AI is right there. It's right where you're doing your work. It's ready for you anytime you need it. But ultimately you're in charge at all times and your workflow is what we care the most about. [00:44:56]Alessio: Awesome. Let's jump into lightning round. What's something that is not on your LinkedIn that you're passionate about or, you know, what's something you would give a TED talk on that is not work related? 
[00:45:05]Bryan: So I walk a lot. [00:45:07]Bryan: I have walked every road in Berkeley. And I mean like every part of every road even, not just like the binary question of, have you been on this road? I have this little app that I use called Wanderer, which just lets me like kind of keep track of everywhere I've been. And so I'm like a little bit obsessed. My wife would say a lot a bit obsessed with like what I call new roads. I'm actually more motivated by trails even than roads, but like I'm a maximalist. So kind of like everything and anything. Yeah. Believe it or not, I was even like in the like local Berkeley paper just talking about walking every road. So yeah, that's something that I'm like surprisingly passionate about. [00:45:45]Alessio: Is there a most underrated road in Berkeley? [00:45:49]Bryan: What I would say is like underrated is Kensington. So Kensington is like a little town just a teeny bit north of Berkeley, but still in the Berkeley hills. And Kensington is so quirky and beautiful. And it's a really like, you know, don't sleep on Kensington. That being said, one of my original motivations for doing all this walking was people always tell me like, Berkeley's so quirky. And I was like, how quirky is Berkeley? Turns out, it's quite, quite quirky. It's also hard to say quirky and Berkeley in the same sentence I've learned as of now. [00:46:20]Alessio: That's a, that's a good podcast warmup for our next guests. All right. The actual lightning round. So we usually have three questions, acceleration, exploration, then a takeaway. Acceleration: what's something that's already here today that you thought would take much longer to arrive in AI and machine learning? [00:46:39]Bryan: So I invited the CEO of Hugging Face to my seminar when I worked at Stitch Fix and his talk at the time, honestly, like really annoyed me. The talk was titled like something to the effect of like LLMs are going to be the like technology advancement of the next decade. It's on YouTube. You can find it. I don't remember exactly the title, but regardless, it was something like LLMs for the next decade. And I was like, okay, they're like one modality of model, like whatever. His talk was fine. Like, I don't think it was like particularly amazing or particularly poor, but what I will say is damn, he was right. Like I, I don't think I quite was on board during that talk where I was like, ah, maybe, you know, like there's a lot of other modalities that are like moving pretty quick. I thought things like RL were going to be the like real like breakout success. And there's a little pun with Atari and breakout there, but yeah, like I, man, I was sleeping on LLMs and I feel a little embarrassed. I, yeah. [00:47:44]Alessio: Yeah. No, I mean, that's a good point. It's like sometimes the, we just had Jeremy Howard on the podcast and he was saying when he was talking about fine tuning, everybody thought it was dumb, you know, and then later people realize, and there's something to be said about messaging, especially like in technical audiences where there's kind of like the metagame, you know, which is like, oh, these are like the cool ideas people are exploring. I don't know where I want to align myself yet, you know, or whatnot. So it's cool. Exploration, so it's kind of like the opposite of that. You mentioned RL, right? That's something that was kind of like up and up and up. And then now it's people are like, oh, I don't know.
Are there any other areas, if you weren't working on Magic, that you want to go work on? [00:48:25]Bryan: Well, I did mention that, like, I think this like Memex product is just like incredibly exciting to me. And I think it's really opportunistic. I think it's very, very feasible, but I would maybe even extend that a little bit, which is I don't see enough people getting really enthusiastic about hardware with advanced AI built in. You're hearing whisperings of it here and there, pun on the Whisper intended, but like you're starting to see people putting Whisper into pieces of hardware and making that really powerful. I joked with, I can't think of her name. Oh, Sasha, who I know is a friend of the pod. Like I joked with Sasha that I wanted to make the Big Mouth Billy Bass as a Babel fish, because at this point it's pretty easy to connect that up to Whisper and talk to it in one language and have it talk in the other language. And I was like, this is the kind of s**t I want people building is like silly integrations between hardware and these new capabilities. And as much as I'm starting to hear whisperings here and there, it's not enough. I think I want to see more people going down this track because I think ultimately like these things need to be in our like physical space. And even though the margins are good on software, I want to see more like integration into my daily life. Awesome. [00:49:47]Alessio: And then, yeah, a takeaway, what's one message or idea you want everyone to remember and think about? [00:49:54]Bryan: Even though earlier I was talking about sort of like, maybe like not reinventing things and being respectful of the sort of like ML and data science, like ideas. I do want to say that I think everybody should be experimenting with these tools as much as they possibly can. I've heard a lot of professors, frankly, express concern about their students using GPT to do their homework. And I took a completely opposite approach, which is in the first 15 minutes of the first class of my semester this year, I brought up GPT on screen and we talked about what GPT was good at. And we talked about like how the students can sort of like use it. I showed them an example of it doing data analysis work quite well. And then I showed them an example of it doing quite poorly. I think however much you're integrating with these tools or interacting with these tools, and this audience is probably going to be pretty high on that distribution. I would really encourage you to sort of like push this into the other people in your life. My wife is very technical. She's a product manager and she's using ChatGPT almost every day for communication or for understanding concepts that are like outside of her sphere of excellence. And recently my mom and my sister have been sort of like onboarded onto the ChatGPT train. And so ultimately I just, I think that like it is our duty to help other people see like how much of a paradigm shift this is. We should really be preparing people for what life is going to be like when these are everywhere. [00:51:25]Alessio: Awesome. Thank you so much for coming on, Bryan. This was fun. [00:51:29]Bryan: Yeah. Thanks for having me. And use Hex Magic. [00:51:31] Get full access to Latent Space at www.latent.space/subscribe

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Unsupervised Learning
Ep 18: LlamaIndex CEO Jerry Liu on Trends in LLM Applications

Unsupervised Learning

Play Episode Listen Later Sep 19, 2023 47:15


Jacob and Pat sit down with LlamaIndex CEO Jerry Liu to discuss his motivations for building LlamaIndex, thoughts on building enterprise-ready LLM applications and agents, and when fine-tuning makes sense.
0:00 intro
1:02 the evolution of LlamaIndex
3:48 apps being built with LlamaIndex
6:39 making agents more effective
12:58 retrieval augmented generation
16:49 what's the right level of abstraction for LlamaIndex?
19:42 balancing reasoning and knowledge
30:46 storage for embeddings
36:03 underutilized features of LlamaIndex
40:38 over-hyped/under-hyped
With your co-hosts:
@ericabrescia - Former COO GitHub, Founder Bitnami (acq'd by VMware)
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@jordan_segall - Partner at Redpoint

The Analytics Engineering Podcast
Bring Your Own Data to LLMs (W/ Jerry Liu of LlamaIndex)

The Analytics Engineering Podcast

Play Episode Listen Later Aug 25, 2023 42:53


Jerry Liu is the CEO and co-founder of LlamaIndex. LlamaIndex is an open-source framework that helps people prep their data for use with large language models in a process called retrieval augmented generation. LLMs are great decision engines, but in order for them to be useful for organizations, they need additional knowledge and context, and Jerry discusses how companies are bringing their data to tailor LLMs for their needs. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com.  The Analytics Engineering Podcast is sponsored by dbt Labs.

The MAD Podcast with Matt Turck
LlamaIndex: Unleashing LLMs on Your Data with CEO Jerry Liu

The MAD Podcast with Matt Turck

Play Episode Listen Later Jul 12, 2023 40:47


This week we're joined by Jerry Liu, Co-Founder & CEO of LlamaIndex, a startup that offers a data framework for connecting custom data sources to large language models, for a conversation about the emerging Generative AI infrastructure stack, how startup founders navigate a field as new and fast paced as Generative AI, and more.

The Data Exchange with Ben Lorica
An Open Source Data Framework for LLMs

The Data Exchange with Ben Lorica

Play Episode Listen Later Jul 6, 2023 49:24


Jerry Liu is CEO and co-founder of LlamaIndex, an open source project and startup that builds tools that enable teams to augment LLMs with their own private data.
Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/
Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS.
Detailed show notes can be found on The Data Exchange web site.

MaML - Medicine & Machine Learning Podcast
Jerry Liu - Building LlamaIndex, the Data Framework for LLMs

MaML - Medicine & Machine Learning Podcast

Play Episode Listen Later Jun 21, 2023 49:19


Jerry Liu is the co-founder and creator of LlamaIndex (formerly known as GPT-Index), an interface that allows users to connect their data to LLMs such as ChatGPT. He has a B.S. in Computer Science from Princeton and has worked at companies such as Quora, Uber, and Robust Intelligence prior to starting LlamaIndex.
Host: David Wu (Twitter: @davidjhwu)
Audio Producer: Aaron Schumacher (LinkedIn: Aaron Schumacher)
Video Editor + Art: Saurin Kantesaria (Instagram: saorange314)
Social Media: Nikhil Kapur
Time Stamps:
01:25 The path to starting LlamaIndex + initial ideas
07:09 LLMs like ChatGPT vs traditional machine learning
10:00 4 steps of traditional machine learning
10:45 How do large LLMs change the game?
14:11 How does LlamaIndex help LLMs work with unstructured data?
18:08 How do you work with gigabytes of private data?
19:57 Organizing words and paragraphs by topic with embeddings
24:55 The importance of structuring data
26:00 3 key abstractions in LlamaIndex
29:25 Medical use cases for LlamaIndex
31:29 Increasing efficiency in medicine
33:25 An AI medical research assistant (Insight)
34:31 Other methods of connecting LLMs to data
36:55 What is LangChain?
39:56 What work in the AI and LLM space excites you the most?
42:23 Do you ever feel scared about the developments of AI?
43:45 Llamas and machine learning
45:36 What do you think the future of AI in medicine will look like in 10-20 years?
47:24 What advice would you give to grad students, med students, and other early career professionals getting into AI and medicine?

Practical AI
Data augmentation with LlamaIndex

Practical AI

Play Episode Listen Later May 23, 2023 44:52 Transcription Available


Large Language Models (LLMs) continue to amaze us with their capabilities. However, the utilization of LLMs in production AI applications requires the integration of private data. Join us as we have a captivating conversation with Jerry Liu from LlamaIndex, where he provides valuable insights into the process of data ingestion, indexing, and query specifically tailored for LLM applications. Delving into the topic, we uncover different query patterns and venture beyond the realm of vector databases.

Changelog Master Feed
Data augmentation with LlamaIndex (Practical AI #224)

Changelog Master Feed

Play Episode Listen Later May 23, 2023 44:52 Transcription Available


Large Language Models (LLMs) continue to amaze us with their capabilities. However, the utilization of LLMs in production AI applications requires the integration of private data. Join us as we have a captivating conversation with Jerry Liu from LlamaIndex, where he provides valuable insights into the process of data ingestion, indexing, and query specifically tailored for LLM applications. Delving into the topic, we uncover different query patterns and venture beyond the realm of vector databases.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
AI Agents and Data Integration with GPT and LLaMa with Jerry Liu - #628

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later May 8, 2023 41:26


Today we're joined by Jerry Liu, co-founder and CEO of Llama Index. In our conversation with Jerry, we explore the creation of Llama Index, a centralized interface to connect your external data with the latest large language models. We discuss the challenges of adding private data to language models and how Llama Index connects the two for better decision-making. We discuss the role of agents in automation, the evolution of the agent abstraction space, and the difficulties of optimizing queries over large amounts of complex data. We also discuss a range of topics from combining summarization and semantic search, to automating reasoning, to improving language model results by exploiting relationships between nodes in data.  The complete show notes for this episode can be found at twimlai.com/go/628.

Infinite Machine Learning
Connecting LLMs to your data, In-context learning | Jerry Liu, cofounder and CEO of LlamaIndex

Infinite Machine Learning

Play Episode Listen Later May 8, 2023 33:20


Jerry Liu is the cofounder and CEO of LlamaIndex. He is the creator of the open source tool that's also named LlamaIndex, which provides a central interface to connect your LLMs with external data. He has previously held roles at Quora, Uber, and Robust Intelligence. He has a computer science degree from Princeton. In this episode, we cover a range of topics including:
- What can LLMs do well and where are the gaps
- How to connect LLMs to external data
- What does LlamaIndex do
- Fine-tuning LLMs
- In-context learning
- Applications that are getting built on top of LlamaIndex
Jerry's favorite books: Harry Potter series (Author: J.K. Rowling) and The Hard Thing About Hard Things (Author: Ben Horowitz)
--------
Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi

NorCal and Shill
Jerry Liu

NorCal and Shill

Play Episode Listen Later Feb 23, 2023 45:03 Transcription Available


Jerry Liu
Episode 76: Show Notes.
Jerry Liu felt he was an artist from the moment he could pick up a crayon. From the days of drawing tanks for his school friends, today he is a US-based director, designer, illustrator, and animator specializing in motion graphics for TV, film, games, and tech. Over the past 15 years, Jerry's work has been recognized and screened by the Art Director's Club, won BDA awards, and has been featured by some of the most notable entertainment pop culture and art & design publications. Today you'll hear about how Jerry came to create Kibatsu Mecha, why he found the NFT concept so easy to grasp, and the time Guy attempted to underbid for one of Jerry's artworks. Apart from getting to know this artist a lot better, we also discuss open editions, art as ‘utility,' and what we think about AI-generated art. Tune in to hear about upcoming projects you can look forward to from Jerry and his helpful advice for artists joining the NFT space regarding pacing yourself and avoiding burnout.
Key Points From This Episode:
• An introduction to Jerry Liu and his work.
• What Guy learned from his first interaction with Jerry when he bid on his work in 2021.
• Find out why Jerry has a few hardware wallets.
• What Jerry first thought of crypto and NFTs when he got into the space in 2020.
• Why Jerry feels he's been an artist his entire life.
• Some of the jobs he has done along the way, from working at Pizza Hut to a tattoo parlor.
• How he came to start his own animation studio and some of the clients he serves.
• Why he identifies with flying squirrels and dogs.
• Jerry's love of simpler foods with good ingredients.
• Jerry's two best pieces of advice.
• Jerry's advice for artists joining the NFT space regarding pacing yourself and balance.
• Why Jerry would like to live in Japan.
• Find out about Guy's favorite movies, action stars, and video games when Jerry interviews him.
• Hear Guy's thoughts on open editions.
• Whether or not Guy thinks art itself is good enough as the ‘utility.'
• Guy and Jerry's thoughts on AI-generated art.
• Upcoming projects you can look forward to from Jerry, including a collab with Joyce Liu.
Links Mentioned in Today's Episode:
Jerry Liu Studio
Jerry Liu on Twitter
Jerry Liu on SuperRare
Jerry Liu on Known Origin
Kibatsu Mecha
NorCal and Shill on Twitter

Web3 Revolution
012 Jerry Liu | CC0是否是NFT的共识方向?知识产权如何与NFT绑定?Is CC0 the consensus direction for NFT? How is IP tied to NFT?

Web3 Revolution

Play Episode Listen Later Sep 29, 2022 57:26


Summary: When we buy an NFT, what are we really buying? As people in the NFT ecosystem look for a way to avoid messy copyright questions while legitimately empowering collectors, some projects have started adopting a copyright license known as CC0, which seems to have become a legal tool some people reach for to solve Web3 problems. But are the intentions of the many NFT projects turning to CC0 sincere, and does the move carry legal force? Tracing back to the early concept of intellectual property: how did civil society come to establish copyright for "knowledge"? In this episode we invite Professor Jerry Liu of the Stanford Center for Internet and Society to talk about what CC0 actually is, and the legal questions the Web3 world is facing.
Useful Links:
Why NFT Creators Are Going cc0
The Can't Be Evil NFT Licenses by Miles Jennings and Chris Dixon
Featuring Guest: Jerry Liu (Jerry Liu's academic profile)
Host: Hana Alice Fang
Post-production: Sain Jane Cho Alice Fang
Episode breakdown:
01:52 Introduction of our guest Jerry Liu
03:52 The origins of CC0 and the legacy of the open source software movement
06:55 What exactly is CC0?
08:47 Where did the concept of Copyright©️ come from historically?
12:37 When we buy NFT, what exactly are we buying?
16:13 CC0 - The good, the bad, and the ugly
22:11 CC0 as a copyright agreement in the spirit of the open source software movement also has its limitations
28:25 What kind of works are protected by copyright law? What's the dark side of the CC0 NFT project?
40:53 CC0 - a copyright agreement that you can't regret giving up
44:32 The spirit behind the death of the son of the Internet
44:47 Blockchain exploration at Stanford University
53:17 The development of technology has always witnessed the virtue of human beings
About Web3 Revolution:
Sponsored and incubated by Mask Network, Web3 Revolution is a bilingual podcast that explores the Web3 space, connecting participants, actors, innovators, investors, and KOLs at the forefront of the Web3 social experiment through conversations.
Twitter/Media bio:
Our Linktree: https://linktr.ee/w3revolution
Follow us on Twitter @w3revolution_io
To read English-language transcriptions, please go to Medium; for the Chinese transcripts, please visit our Matters page

Two Hooks Podcast
Episode 29 - Fight Chat with Jerry Liu of Fight Commentary Breakdowns

Two Hooks Podcast

Play Episode Listen Later Nov 4, 2020 72:36


Jerry Liu of ‘Fight Commentary Breakdowns' & ‘FC Chats' stopped by for a great discussion. Jerry runs some popular YouTube channels but has been a lifelong martial artist, from Kung Fu as a kid, to Kempo, Muay Thai, and BJJ as he continues his journey. Join us as we go the rounds and attempt to change one another's minds, or agree, on topics such as:
‘Teacher : Student' ratios in Brazilian Jiu-Jitsu
Emotional Control and Sparring with Wild Partners
Concussions | CTE in Combat Sports
How to Spot Bad Self-Defense Courses

Talking Codswallop
112. Jerry Liu: Infernal Affairs/ Departed

Talking Codswallop

Play Episode Listen Later Apr 26, 2020 38:58


Infernal Affairs & The Departed are part of cinematic history. James sat down with producer and family friend Jerry to discuss Media Asia, his role in creating Infernal Affairs, and the Western remake The Departed starring Leo DiCaprio & Jack Nicholson. This is a really good interview for anyone interested in learning about media and filmmaking. Follow Talking Codswallop on Twitter, Facebook and Instagram: @CodswallopPod (DO YOU KNOW WE ARE ON YOUTUBE TOO?!). 

Navigating the Rise
[S1] [E17] What's It Like and How to Be an Asian Influencer w/ Jerry Liu, Digital Content Creator @ Jerry Liu Films

Navigating the Rise

Play Episode Listen Later Aug 28, 2019 55:49


This week we have the opportunity to speak with Jerry Liu. Jerry is a full-time digital content creator and currently has more than 130K subscribers on YouTube and 181K subscribers on Facebook. Having spent time in journalism, PR, and more than 10,000 hours in the digital media space, Jerry carries unique communications experience in traditional and digital media and offers a unique blend of passion, energy, and focus to his work and to clients and employers. Jerry started his first YouTube channel (called JerryLiuFilms) for fun, but it developed into multiple brands. Since 2011, he has gained more than 30 million views and partnered with multi-channel networks like Fullscreen and BBTV. Besides YouTube, he has also experimented with other digital media platforms. He put videos on Youku and Bilibili, two of China's biggest video content sites, and gained more than half a million views on those Chinese sites. He has also tested Vine, Twitter, and Instagram. Jerry mentioned that he learned how to use metrics to elevate his success. For example, he started to analyze what product companies care about when it comes to releasing content on a platform, and looking at how to get conversions and receive wide engagement. He mentioned there are a few key things to keep in mind when starting out. For example, he evaluates projects based on three metrics: passion, ROI, and effort. He doesn't keep a project going if there is no passion, because in the downturns it is hard to get things going. As for ROI, he wants to see how much effort it takes him to generate a profitable return; if there is very low return, he might consider starting another passion project. As for effort, he mentioned that if a project stresses him out in terms of the amount of effort required to get things going, it starts to deplete his cognitive resources and he decides to stop completely. He also shared that one of his biggest lessons from his first YouTube channel was to specialize. Every idea should be its own brand on a separate channel. That's the secret to success, because fans on each platform and each account are different and expect different content. Applying this lesson, he shared that he has experienced 600% growth of his brands since 2016, when he started creating focused channels based on the needs and trends of the digital media community. --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app

Southpaw
FS 15 – Henry Cejudo vs. TJ Dillashaw Fight Commentary Breakdowns w/ Jerry Liu

Southpaw

Play Episode Listen Later Jan 22, 2019 90:18


On this episode, we have Jerry Liu from Fight Commentary Breakdowns. It's a fun one as we dive deep into the world of YouTube, Facebook, data collecting, online privacy, BJJ culture, the new ESPN/UFC deal, Greg Hardy in the UFC, and fight studies of Henry Cejudo vs. TJ Dillashaw and Donald Cerrone vs. Alex Hernandez.
And if you like the stuff we're putting out, support Southpaw on Patreon: https://www.patreon.com/southpawpod
You can find Southpaw on Facebook: https://www.facebook.com/southpawpod/ and on Twitter: https://twitter.com/Southpawpod
You can also find Sam at: https://twitter.com/StuffFromSam
You can find Jerry Liu of Fight Commentary Breakdowns on YouTube: https://www.youtube.com/channel/UCAzXqFoW1Y7KDqwZ1x5m9EA
Facebook: https://www.facebook.com/fightcommentarybreakdowns/
Instagram: https://www.instagram.com/fightcommentary/
Twitter: https://twitter.com/fightcommentary

Kids In The Tank
Jerry Liu | YouTuber and Digital Media Artist

Kids In The Tank

Play Episode Listen Later Nov 15, 2018 45:53


Jerry Liu grew up in America and China. Having this two-country upbringing would later help him a lot when it came to finding a niche and creating his own positions in companies and on his own. Jerry graduated from University of Pennsylvania. He thought going to an Ivy League school would solve all his problems with life, but it instead made him more confused about life. There was one thing he did learn at Penn, and that was the importance of constant learning and constant self-examination. That's what he does online. Jerry prides himself on making meaningful and cerebral content on YouTube and often infuses knowledge about East and West into his content creation. He has more than 100K subscribers total spread across different channels. http://bit.ly/2Fij4aE In this episode BizTank students discuss Chinese culture vs American culture, struggles of being a youtuber, and how to engage your audience, the future of Facebook and digital media, Instagram vs Snapchat, paid advertising tips, and the differences between psychology and sociology with YouTuber Jerry Liu. About BizTank Career Exploration Program BizTank provides local Junior and Senior high school students an opportunity to gain exposure to the world of business through a stimulating and interactive program. Consisting of three unique eight-week seasons (Spring, Summer, Fall), meeting once a week on Wednesday nights. Sessions are spent covering a range of topics, such as startups, marketing and on-trend business subjects. In addition, students record, edit and create their own episodes for the Kids in the Tank Podcast. For more information visit us online at https://biztanknonprofit.org/

Opensourced
How To Growth Hack Your Digital Media Brand With Jerry Liu

Opensourced

Play Episode Listen Later May 30, 2018 35:21


In this interview I talk with Jerry Liu about his work as a social media growth expert and content operations manager. Jerry's channel blends a huge mix of topics from sociology and gender issues to economics and politics. What's interesting about Jerry's channels is all of the experiments he uses them for. Jerry talks about some of the experiments he has run on his own channel trying different content topics and styles. In addition to testing different types of content, Jerry is always testing new channels. The latest channel he has started to experiment with is the chat app Discord. Jerry believes that building Discord communities is a highly underrated method of building influence online. He recently started experimenting with community-run Discord groups that are self-policing and self-sustaining. Jerry says his first group has been growing and that he is excited to see where this experiment takes him. Jerry's eagerness to test the world around him, especially the world of

Multimedia Week
EP57 - Saving the video team with sponsored content

Multimedia Week

Play Episode Listen Later Dec 13, 2015 39:08


While newspapers across China have been closing their video departments in a bid to cut costs as advertising leaves print for online, one newspaper in central China has bucked the trend by moving its video department over into sponsored content. Jerry Liu talks with D J Clark about how he turned a money-leaking video department into a successful money-making venture, and how he plans to get back into editorial work in a very different way.
