Podcasts about Matei Zaharia

23 podcasts
29 episodes
49m average duration
Infrequent episodes
Latest: Jun 24, 2026

Popularity chart (2019 to 2026)

Best podcasts about Matei Zaharia

MLOps.community

2 episodes with Matei Zaharia

Papers Read on AI

4 episodes with Matei Zaharia

The top AI news from the past week, every ThursdAI

2 episodes with Matei Zaharia

Latest podcast episodes about Matei Zaharia

Why the Frontier Ecosystem must be Open — Matei Zaharia and Reynold Xin, Databricks

Latent Space: The AI Engineer Podcast - CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Jun 24, 2026 · 68:52

We're excited to have Databricks join us at AIEWF, among hundreds of the top companies in the AI Engineer ecosystem. LS subscribers can use their discount to get past the late-bird pricing and access over $50k in sponsor offers!

Everyone is still talking about Satya's Frontier Ecosystems post, but few have actually built a (now $175 billion) frontier ecosystem and cloud like our guests today. From open-sourcing the layer above coding agents to rethinking databases for the agent era, Databricks cofounders Matei Zaharia and Reynold Xin are pushing the company beyond the lakehouse into a full data-and-AI operating system. In this episode, Matei and Reynold join swyx at the 2026 Data + AI Summit to unpack Omnigent, LTAP, Lakebase, agent security, open formats, Mosaic, and why databases may matter more than ever once AI agents start doing real work.

We go deep on Omnigent: Databricks' open-source meta-harness for combining, controlling, and sharing agents across Claude Code, Codex, Cursor, Pi, custom agents, and internal tools. Matei explains why coding agents and enterprise agents run into the same problems: portability, collaboration, session history, security, spend controls, and the need for a common API above every harness.

Then Reynold walks through Databricks' database dream: why CDC is brittle enough to joke that it stands for “continuous data corruption,” why HTAP has been the holy grail of database engineering, and why Databricks thinks LTAP gets most of the benefits by unifying the storage layer instead of collapsing every query engine into one.
We also cover Databricks' infrastructure scale, the culture behind rapid prototyping, the difference between tech and enterprise customers, Databricks vs Snowflake, whether vector databases should ever have existed, the Mosaic model strategy, Genie, AI Runtime, RL fine-tuning, and the thesis that traditional software gets rewritten once the data is in the right place and agents sit on top.

Databricks began as a company for the big data era. Spark originated in the Berkeley AMPLab, and the lakehouse that eventually grew out of it convinced enterprises that they didn't need a separate data lake, warehouse, ML platform, and governance layer. They just needed one open foundation where all of their data could live and be reasoned over.

Since then a lot has changed, but data has only become more important. Data is no longer something you keep track of and analyze ad hoc; it's the context agents need in order to act. So the framing has shifted from “where do we put all of our data?” to “how do we expose the right slice of state, history, permissions, and business logic to an AI system at the exact moment it's doing work?”

If frontier model performance becomes commoditized, the durable advantage becomes the company-specific context around the models: proprietary data, governed access, operational state, transaction logs, workflows, and feedback loops. That leaves Databricks well positioned.

Fresh off the Data + AI Summit 2026, the company is moving just as fast, announcing Genie One, Omnigent, LTAP, and more. Together these point to a central mission in its newer work: Databricks is trying to become the operating system for enterprise agents.

Models are getting good enough, but agents are only useful if they have the right context, permissions, memory, state, cost controls, and access to live business data.
Fundamentally, it appears that significantly better model performance in production is a systems problem, one that data guys like us are remarkably well prepared to solve!

We discuss:
* Why Databricks built Omnigent as a meta-harness above existing AI agents
* Why coding agents and custom enterprise agents need the same infrastructure
* The common API for agent sessions, files, streams, tool calls, and cancellation
* Why persistent sessions, cloud sandboxes, sharing, search, and collaboration matter
* Why Databricks open-sourced Omnigent instead of keeping it proprietary
* Databricks' internal agent usage, cloud sandboxes, and coding workflows
* The scale of Databricks: 50–60 million virtual machines a day and exabytes before breakfast
* Why agent security needs contextual and stateful policies
* How an agent could read confidential docs, install a compromised npm package, and leak data
* Why spend control matters when an agent can burn $500 reading logs
* Startup opportunities around coding-agent analytics, quality, skills, and spend
* LTAP, Lakebase, and why Databricks wants to rethink the database stack
* OLTP vs OLAP, CDC, and why data pipelines break at 3 a.m.
* Why HTAP has historically been the holy grail of database engineering
* Why Databricks thinks LTAP is “HTAP done right”
* How writing transactional data into column-oriented formats changes analytics
* Why agents need live operational context from databases, not just telemetry
* How Databricks prototypes strategic systems without endless process
* Enterprise vs tech customers, governance, procurement, and DIY culture
* The “second system syndrome” risk of rewriting a database engine
* Building a database engine from a decade of traces and quadrillions of data points
* Why vector databases should never have been a separate category
* Why open formats and AI changed the race with Snowflake
* The Mosaic story, DBRX, Genie, document parsing models, and specialized model training
* Why model customization and RL fine-tuning may become mainstream
* Why “get the data there, slap some agent on top” may rewrite traditional software

Matei Zaharia
* LinkedIn: https://www.linkedin.com/in/mateizaharia
* X: https://x.com/matei_zaharia

Reynold Xin
* LinkedIn: https://www.linkedin.com/in/rxin
* X: https://x.com/rxin

Databricks
* Website: https://www.databricks.com
* X: https://x.com/databricks

Timestamps
00:00:00 Introduction
00:02:22 Omnigent and the Agent Infrastructure Layer
00:08:39 Agent Clouds, Common APIs, and Open Source
00:16:52 Databricks Scale and Internal AI Workflows
00:18:03 Agent Security, Governance, and Spend Controls
00:27:34 LTAP and the Database Dream
00:30:30 CDC, HTAP, and Why Data Pipelines Break
00:34:05 Lakebase, Parquet, and Live Data for Agents
00:36:47 Databricks' Culture of Fast Prototyping
00:43:40 The Dream Engine and Rewriting the Database Stack
00:51:02 Vector Databases, Query Engines, and LTAP
00:52:36 Databricks vs Snowflake
00:57:48 Mosaic, DBRX, Genie, and Specialized Models
01:03:11 Context, AI Runtime, and RL Fine-Tuning
01:06:15 Why Data + Agents May Rewrite Software
01:07:09 Closing Thoughts

Transcript

Introduction: Databricks, Data + AI Summit, and Founder Dynamics

Swyx [00:00:00]: Matei and Reynold from Databricks, welcome to Latent Space.
Reynold Xin [00:00:06]: Hey, thanks for having us.
Swyx [00:00:07]: Yeah.
Matei Zaharia [00:00:08]: Yeah, thanks so much.
Swyx [00:00:09]: Thanks for taking time out. You have your Databricks Data + AI Summit going on. You were just telling me how the first summit that you guys ran was just 50 people
Reynold Xin [00:00:17]: Yeah, it was
Swyx [00:00:17]: in Berkeley
Reynold Xin [00:00:18]: a little meetup at Berkeley, I think
Matei Zaharia [00:00:19]: Yeah
Reynold Xin [00:00:19]: put together
Matei Zaharia [00:00:20]: We were doing these tutorials and, yeah, just teaching people Spark.
Swyx [00:00:23]: Yeah. Obviously now, I think the headline number's like 100,000 people around the world, 30,000 in person.
Swyx [00:00:30]: It's a crazy
Matei Zaharia [00:00:31]: Amazing
Swyx [00:00:31]: community. Well, I just saw the keynote.
Swyx [00:00:35]: Ali's just... Was it obvious back then that Ali would be such a great CEO? Like
Reynold Xin [00:00:42]: Oh
Swyx [00:00:42]: such a great presenter?
Reynold Xin [00:00:43]: What do you think?
Matei Zaharia [00:00:44]: I think among our group of founders it was clear that he'd be the best at this.
Swyx [00:00:50]: Yeah.
Matei Zaharia [00:00:50]: And yeah, it turned out great. And he's ramped up on so many topics while growing the company. He would just go in and study it and talk to all the experts. Even if he couldn't hire the person, he'd learn enough about finance and sales and whatever it was, and go from there. Yeah.
Swyx [00:01:09]: Yeah.
Reynold Xin [00:01:10]: He's obviously very high IQ and very high EQ, but it wasn't... Ali today is quite different from Ali from 10 years ago. I think there's a lot of work that he put in to get to this point.
Swyx [00:01:20]: Yeah. No, to me the most appealing thing about him is that he's funny. And it's
Matei Zaharia [00:01:26]: It's true, yeah
Swyx [00:01:26]: it's hard to make jokes about data warehouses
Reynold Xin [00:01:30]: About serious topics
Swyx [00:01:31]: security
Matei Zaharia [00:01:32]: Yeah
Swyx [00:01:32]: what have you.
Matei Zaharia [00:01:33]: Oh, yeah. That's for sure.
Swyx [00:01:34]: Yeah. So you guys launched a whole bunch of things. I'll just name-check the stuff briefly, because we're not gonna cover everything. Omnigent, your baby. LTAP, your baby, your dream engine.
Swyx [00:01:47]: We're also gonna cover Genie, CustomerLake, you acquired Panther
Matei Zaharia [00:01:52]: Yeah
Swyx [00:01:52]: Open Sharing, and there's Unity AI Gateway. A lot of these, I think, are things you would expect Databricks to do. It's part of the roadmap. Everyone in your category has similar things. But I think the two of you are probably leading the two most unique and differentiated initiatives in the landscape.

Omnigent and the Agent Infrastructure Layer

Swyx [00:02:09]: Maybe we'll start with Omnigent. I do think that a lot of people are exploring this meta-harness concept.
Matei Zaharia [00:02:21]: Yeah, totally.
Swyx [00:02:21]: What led you to it?
Matei Zaharia [00:02:22]: Yeah. There were a couple of converging lines, which I think is a good sign that you need something new. On the one hand, there's all the coding agent infra internally. We have a really great dev infra team. They built something called Isaac, which is a wrapper on Claude Code and Codex that lets you use them either on the web in sandboxes or just on your dev machine or your laptop or whatever. And they were adding all kinds of stuff there. And we saw the more advanced engineers building their own workflows with tons of agents, and building their own UIs and stuff on top of that. The other line was us building agents. We ship a data science agent called Genie from the research team, which I lead. We also build a lot of internal ones for various things, and then we have all the customer ones. And all of them were running into this thing of, “Oh, I need to switch model and harness and so on” every few months. Plus the agent is completely useless if you can't share sessions with someone and have history and search and this whole layer on top of it for collaboration. I thought about it from both contexts, and at first people thought it was weird. They were like, “Why are you doing coding agents and custom agents in the same thing?” But I said it's the same problems, and you just wanna build the stuff that lets you deliver the agent, maybe control it if you care about security, and make it portable across things. Then we prototyped some things as experiments. We saw, yeah, we can make it work, and then we built it for real.
Swyx [00:04:06]: I'm wondering if this, let's call it architecture,
Matei Zaharia [00:04:11]: Yeah
Swyx [00:04:11]: maps to anything in your careers in the past. I always think about how a lot of things just tie back to operating systems.
Swyx [00:04:18]: A lot of operating
Matei Zaharia [00:04:19]: Yeah
Swyx [00:04:20]: systems tie back to databases,
Matei Zaharia [00:04:21]: So
Swyx [00:04:21]: or the other way around
Matei Zaharia [00:04:22]: So the thing, I do think it ties a lot to network protocols, the internet protocol. We also
Swyx [00:04:29]: Communication between entities.
Matei Zaharia [00:04:30]: Yeah. We did stuff with data sharing also, which most viewers probably won't know unless they
Swyx [00:04:36]: Yeah, open protocol is the term.
Matei Zaharia [00:04:37]: Yeah.
Swyx [00:04:38]: Open sharing. Open sharing.
Matei Zaharia [00:04:38]: Open sharing.
Swyx [00:04:39]: Yes.
Matei Zaharia [00:04:39]: Yeah. So say you're a company and you maintain some table, let's say a Walmart or something. They have the inventory and what's been sold in each store. And then you also have suppliers, and they would love to produce more things and ship them exactly the moment you need them. So they would love real-time access to your table. Instead of sending emails or Excel sheets around, or making phone calls, why can't you share a view of that table with them in real time? Then they query it, join it with their data, and decide what to send.
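Matei's shared-inventory example can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the Open Sharing protocol or any Databricks API: `SharedView`, `plan_shipments`, the SKU-prefix filter, and the reorder rule are all hypothetical. It shows the two properties he describes: the provider exposes a filtered, read-only view that is evaluated at read time (so the recipient sees live data rather than an emailed snapshot), and the recipient joins that view with its own private data to decide what to ship.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class InventoryRow:
    """One row of the provider's (e.g. a retailer's) inventory table."""
    store: str
    sku: str
    on_hand: int
    sold_today: int


class SharedView:
    """Hypothetical read-only share over a table the provider keeps owning.

    The filter is evaluated on every read, so the recipient always sees the
    table's current state. Rows are frozen dataclasses, so the recipient
    cannot modify the provider's data through the view.
    """

    def __init__(self, table: List[InventoryRow],
                 row_filter: Callable[[InventoryRow], bool]) -> None:
        self._table = table  # provider-owned list; the provider keeps writing to it
        self._row_filter = row_filter

    def scan(self) -> List[InventoryRow]:
        return [row for row in self._table if self._row_filter(row)]


def plan_shipments(shared: SharedView, case_sizes: Dict[str, int],
                   reorder_point: int) -> Dict[Tuple[str, str], int]:
    """Recipient side: join the shared rows with the supplier's own data.

    case_sizes is the supplier's private table (sku -> units per case).
    Returns the number of cases to ship per (store, sku).
    """
    plan: Dict[Tuple[str, str], int] = {}
    for row in shared.scan():
        case_size = case_sizes.get(row.sku)
        if case_size is None or row.on_hand >= reorder_point:
            continue  # not this supplier's product, or still well stocked
        shortfall = reorder_point - row.on_hand
        plan[(row.store, row.sku)] = -(-shortfall // case_size)  # ceiling division
    return plan


# The retailer shares only the supplier's own SKUs (hypothetical prefix rule).
inventory = [
    InventoryRow("store-1", "ACME-1", on_hand=5, sold_today=20),
    InventoryRow("store-2", "ACME-1", on_hand=50, sold_today=3),
    InventoryRow("store-1", "OTHER-9", on_hand=0, sold_today=10),
]
acme_view = SharedView(inventory, lambda row: row.sku.startswith("ACME-"))
print(plan_shipments(acme_view, {"ACME-1": 12}, reorder_point=30))
# {('store-1', 'ACME-1'): 3}

# A new write on the provider side shows up on the recipient's next read.
inventory.append(InventoryRow("store-3", "ACME-1", on_hand=0, sold_today=8))
print(plan_shipments(acme_view, {"ACME-1": 12}, reorder_point=30))
```

The design choice being illustrated is that the provider never hands over a copy: the recipient's query runs against the provider's current table through a filter the provider controls.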
So it's one of these things where you might ask: today, since we can vibe code anything so fast, why do we even need to design protocols or APIs or software? Why can't you just vibe code things on demand? But for this type of interoperability, where multiple parties moving at different speeds are building stuff and you still want some layer on top to coordinate, you do wanna design it and build it. So it reminds me of that: agents talking to each other, and users talking to agents and tools.

Agent Clouds, Cloud Sandboxes, and Keeping Sessions Alive

Swyx [00:05:42]: Reynold, any other comments or alternative viewpoints?
Reynold Xin [00:05:46]: I think, by the way, we had a debate on exactly which set of benefits would matter a lot, and around the time we decided to do this thing I was telling Matei, “Hey,” it just happened that there was a particular week when I was coding nonstop
Swyx [00:06:00]: from the moment I woke up to the moment I went to bed, looking at my Claude sessions, my Codex sessions. And one of the things that was particularly annoying was having to keep my laptop open.
Swyx [00:06:12]: I was driving to a doctor's appointment, and I remember it because I wanted to make sure the whole thing kept working.
Matei Zaharia [00:06:18]: But by the way, it's so comforting to hear you say that, because I'm like, “I don't know if I'm a clown for doing this.”
Swyx [00:06:25]: Yeah. Honestly, I was driving and I was tethering my laptop to my phone.
Matei Zaharia [00:06:29]: Huh.
Swyx [00:06:29]: Keeping it on the side. Whenever I hit a red light, I'd look at what was going on on my laptop.
Matei Zaharia [00:06:35]: Yeah.
Swyx [00:06:35]: And I just felt that was ridiculous.
Matei Zaharia [00:06:37]: Yeah.
Swyx [00:06:37]: It felt like we went back to the dark ages
Matei Zaharia [00:06:39]: Yeah
Swyx [00:06:40]: of programming. The productivity you gain from all these coding agents is amazing, but, yeah.
Matei Zaharia [00:06:45]: Have you heard of cloud?
Swyx [00:06:47]: Yeah.
Swyx [00:06:48]: It was crazy to me.
Matei Zaharia [00:06:49]: Oh, the thing you were working on was the sandboxes, or was this before that?
Swyx [00:06:52]: It was a sandbox.
Matei Zaharia [00:06:53]: Okay.
Swyx [00:06:54]: I was work
Matei Zaharia [00:06:54]: So you were in
Swyx [00:06:55]: So I was approaching it from a very different angle. I wanted, “Hey, we're gonna have cloud sandboxes that don't shut down. You can get one very quickly,” but not just for running agentic sessions.
Matei Zaharia [00:07:06]: Yeah.
Swyx [00:07:06]: It's also for doing development. So I was personally building that week, and through building it I ran into all these issues, and then I wrote
Matei Zaharia [00:07:15]: Yeah
Swyx [00:07:15]: a document for Matei: “Here's my wish list of what the actual environment should do.” And I think he ended up implementing almost
Matei Zaharia [00:07:22]: Yeah
Swyx [00:07:22]: every single one of them.
Matei Zaharia [00:07:23]: Yeah, I remember Reynold saying, 'cause my first prototype of this just had chats with your agent, and he said, “I have to be able to open a shell, my own shell, and list files and tail them and stuff.” So
Swyx [00:07:36]: So SSH into a mainframe.
Matei Zaharia [00:07:37]: Yeah. It has that now.
Swyx [00:07:39]: Tailing my log.
Matei Zaharia [00:07:40]: Yeah.
Matei Zaharia [00:07:41]: Yeah.
Swyx [00:07:41]: And another thing I think I asked for was, I still use Cursor for the sole purpose of rendering Markdown files.
Matei Zaharia [00:07:48]: Huh. Yes.
Swyx [00:07:49]: So I said, “If you just give me a way to see my Markdown files and render
Matei Zaharia [00:07:53]: Yeah
Swyx [00:07:53]: them properly, I don't need a separate tool anymore.”
Matei Zaharia [00:07:55]: Yeah.
Swyx [00:07:56]: And I think you also built that in.
Matei Zaharia [00:07:57]: Yeah, we did that, yeah. We had a lot of engineers building their own vibe coding setups. But the other thing they all said is, “Hey, I built something that's amazing for me, but no one else on the team can use it 'cause I don't have a server to collaborate.” And this is why we tried to set up Omnigent so you can have a server and have the security set up in there. So you log in with Google or whatever and securely share stuff. And that's where we've seen a lot of other agents run into trouble. People think they've prototyped an awesome agent, but it's not allowed to connect to some really important data or whatever because of the security team.

Omnigent Architecture, Open Source, and Common APIs

Swyx [00:08:38]: Yeah.
Matei Zaharia [00:08:38]: So yeah.
Swyx [00:08:39]: Yeah. At this point, for those watching along on YouTube, we're gonna put up an image of the structure here, and we can talk a little bit about the architecture. I just want people to understand, 'cause when we're talking about software it can be very abstract, here is what we're talking about. You've worked out this entire platform in open source, and there's a runner component and a server component with a uniform API that you've figured out. Any other elements? And obviously you can plug in all these persistence layers and compute layers. This is a whole cloud. It's an agent cloud.
Matei Zaharia [00:09:12]: Yeah. It's got these components to work with it. A lot of the action also happens on the machine where you deploy your agent, so whatever you've got on there, you can run. But yeah, I think it's the minimal thing you want for hosted, collaborative agents: that server. And one of the reasons we open-sourced it is that, for anyone building agents, this gives them an app they can start with and customize, which we were seeing in Databricks too. Someone would make a nice agent app, and then other teams would ask, “Oh, can I just use yours for my agent?”
Swyx [00:09:45]: Yeah, I think we had like five or six different agentic frameworks
Matei Zaharia [00:09:48]: Yeah
Swyx [00:09:48]: built by different teams. They all do more or less the same thing. People wanna take something that works and fork it, so you might as well have something open source. Which was also another question that's interesting for Databricks: what do you choose to open-source, and what do you choose to make proprietary? This goes back to Spark, right?
Matei Zaharia [00:10:05]: Yeah.
Matei Zaharia [00:10:06]: So one of the reasons to open-source something is if you think it's a layer where there'll be some network effect, where it'll benefit from many people collaborating on it. For example, with Spark, I don't know if you remember, when Spark came out we also focused a lot on letting you have libraries on top. There used to be different
Swyx [00:10:28]: Ecosystem
Matei Zaharia [00:10:28]: distributed computing engines for machine learning and graph computation. We said they should all be libraries that you can compose. And we made it super easy to add connectors to data sources too. Then we benefit, because we don't have the time to write connectors to 1,000 different databases and file formats, but we can just use the ones people make, and of course they benefit from joining this thing. So that's one of these. Another way to think about it: imagine our thing wasn't open. We had some agent hosting thing, but it's not open, and then there is an open one. If you're a user, which one's gonna win in the long run? Here, because there is this benefit from people writing integrations, it'll be the open one. And then there are other things that you just can't deliver as open source, that are things the company does. For example, how do you make sure your streaming jobs or your Lakebase database doesn't lose all your data at night? Well, that requires an operational team that's gonna sit there. There's no way around it; it has to be a service. So as a company we wanna make sure we're really good at those infra services, and then we're as open as we can be in terms of what you build on top.
Swyx [00:11:42]: Speaking of benefits, I think we are already seeing pull requests
Matei Zaharia [00:11:45]: Yeah
Swyx [00:11:45]: for all kinds of ecosystem integrations, even though it was only released on Saturday.
Matei Zaharia [00:11:50]: Yeah, Saturday. Yeah. So someone
Swyx [00:11:51]: Let's see what's going on. Yeah, you can look at the merged ones. I asked Sam Nigon this morning about
Matei Zaharia [00:11:59]: 400 merged already?
Matei Zaharia [00:12:00]: Yeah. I would guess around half are not from our team. For example, someone added support for running it on Kubernetes. People added many cloud sandboxes, so this can launch a cloud sandbox and run your agent in there, which is great for sharing too, 'cause it's not on your laptop with someone running scary code on there. So yeah, many startups have put those in, and we expect to see more of them. We also have more agent harnesses already: Cursor CLI and Antigravity too.

The Modern Data Stack and the Emerging AI Stack

Matei Zaharia [00:12:34]: Yeah. That's all beautiful. And I feel like the last time this happened, there was the rise of the modern data stack.
Matei Zaharia [00:12:42]: I don't know if it's that useful. I'm curious about your postmortem.
Matei Zaharia [00:12:46]: I think most people
Swyx [00:12:47]: Agree
Matei Zaharia [00:12:47]: will agree that it is finally dead. But maybe this gives rise to a new modern AI stack that does the same thing.
Matei Zaharia [00:12:52]: I don't know.
Reynold Xin [00:12:54]: I think the modern data stack was a pretty useful thing, probably even up until this day. For the audience who doesn't know the history, the modern data stack effectively decomposes into: you need a layer to ingest the data, you need a layer to transform your data, and then you need a layer to maybe visualize your data. And all of this runs on some data warehouse or, later on, a lakehouse.
Reynold Xin [00:13:21]: I think those concepts are all very powerful and very useful. They enable a lot of workloads. What people eventually run into is a question of unification and consolidation: hey, do you really need to chop all this into different pieces and work with so many different vendors and platforms in order to get a very simple visualization done, right? So over time everybody started realizing it; customers were pushing us, we realized it too, and we started building more and more capabilities and trying to consolidate. At the end of the day now, customers don't have to worry about hooking up five different systems in order
Matei Zaharia [00:13:55]: Yeah
Reynold Xin [00:13:55]: to produce a chart. And I think, honestly, something like this is probably happening with agents: how many different frameworks do you want to hook together in order to build a very simple agent?
Matei Zaharia [00:14:06]: Just to be clear, I would say the core of this is the common API on top of all the harnesses. The API is: you've got an agent session, and you can send in a message or a file. That's what you can send in, and then you get out streams as it's streaming text or doing tool calls. The other thing you can send in is you can tell it to cancel a turn. So that's the API. Now, the thing we did is we can give you that on top of Claude Code running in a terminal, Codex, Pi, the OpenAI SDK, all that stuff. We map them all to that same interface. That is something you'd have to maintain yourself if you built your own agent orchestrator, and whenever Claude changes its API, you gotta tweak your thing or it's gonna lose some messages. So that's the thing that's valuable to maintain. Then on top of that we built a few apps. I think we built a pretty cool UI and stuff, and we built a security and control piece, which I'm excited about. But it's that common interface; it doesn't try to be a stack. In fact, you could plug in your own UI on top of this server, and that's one of the use cases we care a lot about, 'cause we want to use this in our own products.

Compute, Sandboxes, and Databricks Scale

Swyx [00:15:20]: Yeah. It should be everywhere.
Matei Zaharia [00:15:22]: Yeah.
Swyx [00:15:22]: I think one of the things that is really interesting to me is, well, first of all, I'll endeavor to do everything I can not to call it the modern AI stack, because it needs a different name.
Matei Zaharia [00:15:32]: Yeah.
Swyx [00:15:32]: But one of the first people who told me about compute sandboxing was Nikita from Neon.
Swyx [00:15:39]: A lot of people think about Neon as, well, serverless Postgres with the separation of compute and storage and instant branching and all those things. But every database company is also a compute company.
Matei Zaharia [00:15:51]: Yeah. Yeah.
Swyx [00:15:52]: And so he was showing me his whole sandboxing solution. I don't think he has ever launched it.
Matei Zaharia [00:15:57]: So the reason we could build our sandbox solution so quickly was because we realized that if you just take the actual Lakebase architecture
Swyx [00:16:05]: Yeah
Matei Zaharia [00:16:05]: and remove the database from it, which came from Neon
Swyx [00:16:08]: Exactly, right
Matei Zaharia [00:16:09]: you have this sandbox
Swyx [00:16:09]: Every database company has it already, yeah.
Matei Zaharia [00:16:11]: Now, there are some differences. For example, to support this particular workflow, it's important to have local persistence,
Swyx [00:16:19]: Yeah
Matei Zaharia [00:16:19]: because you want your state to persist. Your libraries: you don't want to install your libraries every time, right?
Matei Zaharia [00:16:24]: Whereas in the Neon architecture, because of the separation of storage from compute, you don't need a persistent local disk.
Swyx [00:16:30]: Yeah.
Matei Zaharia [00:16:30]: So there are some differences.
Swyx [00:16:32]: Yeah.
Matei Zaharia [00:16:32]: But at the end of the day, yeah, this is when you run a coding sandbox. If I use it, we have the dev env internally at Databricks, there's tens of gigabytes of data just for all the source code and artifacts and stuff that I built, and I want that to come back next time, so.
Matei Zaharia [00:16:51]: Yeah.
Matei Zaharia [00:16:51]: But yeah.
Matei Zaharia [00:16:52]: Before the show, we were talking about some statistics on adoption that might be surprising.
Matei Zaharia [00:16:56]: It could be internal, it could be external, whatever comes to mind, just to impress on people the scale at which this is happening.
Swyx [00:17:02]: So on the analytics side, I think we launch
Reynold Xin [00:17:06]: maybe 50 or 60 million virtual machines a day across all three clouds, so we're one of the biggest compute orchestrators out there.
Reynold Xin [00:17:13]: For sure for CPU compute.
Swyx [00:17:14]: Yeah.
Matei Zaharia [00:17:14]: Yeah.
Reynold Xin [00:17:15]: And all of this processes, I think, exabytes of data. I joke that, depending on which time zone you're in, typically before you have breakfast Databricks will have processed exabytes of data already that day. And on Neon, it's pretty interesting too. It's launching, I think, 13 million databases
Swyx [00:17:34]: Yeah
Reynold Xin [00:17:34]: a day now.
Swyx [00:17:35]: Yeah, to me that was, like, a
Reynold Xin [00:17:36]: And that's just like
Swyx [00:17:37]: Like, what do you mean?
Matei Zaharia [00:17:38]: Yeah. And that's the point.
Reynold Xin [00:17:40]: And a lot of those were thanks to agents and branching experimentation
Swyx [00:17:44]: Yeah
Reynold Xin [00:17:44]: because we made it so easy and so quick to launch databases, thanks a lot to Nikita's team. So it's changing the way people use databases.
Swyx [00:17:54]: Yeah. Okay, we're gonna go into more database talk in a bit, but I wanna make sure we close up anything on Omnigent. You mentioned you were excited about the security and control side.

Omnigent Security, Contextual Policies, and Spend Controls

Matei Zaharia [00:18:04]: Yeah.
Swyx [00:18:04]: A lot of companies are figuring that out right now, as well as the spend side.
Matei Zaharia [00:18:08]: Yep.
Swyx [00:18:09]: What have you found there?
Matei Zaharia [00:18:11]: Yeah, so I spent quite a bit of time talking to internal users, developers, the security team, managers, and also lots of customers, and there are a few things. First of all, one thing that immediately became obvious is that for security, there's this tension between usability and security. A lot of coding agents today have very basic controls, like you can tell them which tool patterns to allow or disallow or whatever. It's yes or no. But that puts you in a very tough spot. Just as an example: should my agent be able to read some confidential documents? Or, let's say, should it be able to install new packages from npm, which might be compromised? Yes or no? Maybe I wanna allow it. Should my agent be able to publish stuff to the company website? Well, if I'm using it to code on the website, yes. But should it be able to do both, so it can grab a confidential document, get prompt injected, and leak it? Probably not. So the thing we decided we need is stateful, or what we call contextual, policies, where you keep track of the state of that session. It's not just whether it's allowed to push to the marketing site or not, but, hey, if it did a risky thing, like it installed an old package from npm, or it read 1,000 confidential docs, then no, don't do it. Otherwise, maybe it's okay. That's one example of moving that trade-off so it's both more secure and more useful by having a more powerful engine, essentially. This requires tracking sessions.
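A stateful policy of the kind Matei describes can be sketched as follows. This is a toy model, not Omnigent's policy API: the action names (`read_confidential_doc`, `install_untrusted_package`, `publish_to_website`), the `SessionState` fields, and the 1,000-document threshold (borrowed from the example in the conversation) are all illustrative assumptions. The point it demonstrates is that, unlike a yes/no allowlist, the same action gets a different answer depending on what the session has already done.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Facts about one agent session that a contextual policy can condition on."""
    confidential_docs_read: int = 0
    installed_untrusted_package: bool = False


@dataclass
class ContextualPolicy:
    # Illustrative threshold taken from the example in the conversation.
    confidential_read_limit: int = 1000

    def is_risky(self, state: SessionState) -> bool:
        """Has the session already done something that makes exfiltration a concern?"""
        return (state.installed_untrusted_package
                or state.confidential_docs_read >= self.confidential_read_limit)

    def allows(self, action: str, state: SessionState) -> bool:
        # Publishing is fine for a clean session, but not after a risky step.
        if action == "publish_to_website":
            return not self.is_risky(state)
        return True  # every other action is allowed in this toy policy

    def record(self, action: str, state: SessionState) -> None:
        """Update the session state after an allowed action runs."""
        if action == "read_confidential_doc":
            state.confidential_docs_read += 1
        elif action == "install_untrusted_package":
            state.installed_untrusted_package = True


class AgentSession:
    def __init__(self, policy: ContextualPolicy) -> None:
        self.policy = policy
        self.state = SessionState()

    def attempt(self, action: str) -> bool:
        """Check the action against the policy and the session history so far."""
        if not self.policy.allows(action, self.state):
            return False
        self.policy.record(action, self.state)
        return True


session = AgentSession(ContextualPolicy())
print(session.attempt("publish_to_website"))         # True: nothing risky yet
print(session.attempt("install_untrusted_package"))  # True: installs are allowed
print(session.attempt("publish_to_website"))         # False: session is now tainted
```

Each session carries its own state, so a fresh session that never installed anything can still publish.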
The other piece that was interesting there is that there are these very low-level events the agent is doing, and you want some libraries on top that parse them. For example, we have an MCP server on Google Drive internally. It's got 60 API calls. How do I know which of those will share a document with the internet and which ones won't? It's annoying. So we designed the policy layer in Omnigent so that policies are functions, and you can have libraries. Someone can make something that maps the low-level events to high-level ones, and then you write a policy about the high-level events that come out.
Swyx [00:20:25]: This is related to the Panther,
Matei Zaharia [00:20:27]: Yeah, Panther will help with that. Panther
Swyx [00:20:30]: Yeah.
Matei Zaharia [00:20:30]: is a similar idea on the event processing side, and it's Python-based versus a weird custom language. This is more, as in real
Swyx [00:20:39]: I didn't even know we were good, yeah.
Matei Zaharia [00:20:41]: Those things are happening, yeah.
Swyx [00:20:42]: Yeah.
Matei Zaharia [00:20:42]: So yeah, these are the cool things: the contextual or stateful part, and then the way it can be libraries. That was another reason to make it open source, because others will write libraries, and we and our customers can use them. And the final thing: because it's stateful, one of the states we track is how much you've spent in that session. I've had cases where I asked an agent to debug something, and it spent $500 because it decided to read a lot of log files and burn a lot of tokens. But I can literally say, “Okay, launch an agent to do this and cap it at $5. Ask me for permission if it needs more.” And because we're counting that within the session, it'll pop up and tell me, “Okay, you've spent $5. Do you wanna go on?”
Reynold Xin [00:21:27]: So, important context here.
Matei spent the last five years, a lot of his time, architecting Unity Catalog at Databricks,
Matei Zaharia [00:21:34]: Yeah.
Reynold Xin [00:21:34]: which is the governance layer for data.
Matei Zaharia [00:21:35]: That's right, yeah.
Reynold Xin [00:21:36]: And he's combining expertise at that layer with all the AI governance he knows.
Matei Zaharia [00:21:41]: Yeah.
Swyx [00:21:41]: Do
Matei Zaharia [00:21:41]: But I also spent a lot of time being annoyed by coding agents and getting prompted.
Matei Zaharia [00:21:46]: And also as the
Reynold Xin [00:21:48]: All the above.
Matei Zaharia [00:21:48]: I don't want to end up on the front page as, like, I installed some weird npm package and leaked
Swyx [00:21:53]: Yeah.
Matei Zaharia [00:21:53]: all the code, so I'm especially paranoid. But I also have very little time, so I don't want to sit there approving "Do you want to run a 20-line bash script, yes or no?" That's why I spend a lot of time figuring out how to make it as safe as possible and not annoying.
Swyx [00:22:10]: Yeah. Is safety, let's call it security, a bigger concern than token maxing or token budgets? Which one is, like
Matei Zaharia [00:22:19]: Oh, yeah, they're both there. I don't know. I guess it depends on the type of company you are. For some companies, the budget is limited, and they really care about that.
Swyx [00:22:34]: You can be Uber and still be concerned?
Matei Zaharia [00:22:36]: Yeah. Oh, yeah, totally. Yeah. If you have
Reynold Xin [00:22:38]: For us, security
Matei Zaharia [00:22:39]: Yeah.
Reynold Xin [00:22:40]: is super paramount.
Matei Zaharia [00:22:40]: For us, as a cloud provider, security is absolutely critical. It's the most important thing, and token maxing, we're not so worried about yet. But, for example, I talked to some consulting companies. They have, like, 100,000 employees who are all coding for customers.
If each of those spends an extra $1,000 a month, that's not fun.
Swyx [00:23:04]: Yeah.
Matei Zaharia [00:23:04]: We have only a few thousand engineers.
Swyx [00:23:06]: What's the policy at Databricks? Is it just unlimited, or what?
Matei Zaharia [00:23:08]: It's unlimited, but we use our own product to analyze the traces and stuff, and we have a team that's looking to optimize and to see if anyone's doing something weird. And we had some really cool insights just from analyzing current traces, like which
Swyx [00:23:24]: Yeah.
Matei Zaharia [00:23:25]: models are better at, say, Rust versus TypeScript or whatever. At least in our code base.
Swyx [00:23:31]: Yeah. Amazing. Obviously, I have to ask the token question.
Matei Zaharia [00:23:34]: Yeah.
Swyx [00:23:34]: I think it's
Reynold Xin [00:23:34]: Yeah.
Swyx [00:23:34]: it's a key thing. But yes, security and control above that, and figuring out a sane layer where you can have some autonomy, but not too much.
Matei Zaharia [00:23:43]: Yeah. Yeah, and we wanna make it super easy. As an engineer, you should be able to set it yourself. So in Omnigent, you can ask your agent, “Set a policy on yourself to do this.” So it can, like
Swyx [00:23:52]: But if there's something I should be showing
Matei Zaharia [00:23:53]: Yeah.
Swyx [00:23:53]: I don't see it on the GitHub, but
Matei Zaharia [00:23:55]: Oh, yeah.
Swyx [00:23:56]: there's just
Matei Zaharia [00:23:56]: Well, there's something in the docs.
Swyx [00:23:57]: Yeah, this is it.
Matei Zaharia [00:23:58]: You can look at it later.
Swyx [00:23:59]: Okay.
Yeah.
Matei Zaharia [00:23:59]: Just look in the docs
Swyx [00:24:00]: Yeah.
Matei Zaharia [00:24:00]: for contextual policies if you wanna see.
Swyx [00:24:04]: I just like to point people
Matei Zaharia [00:24:05]: Look at the built-in policies.
Swyx [00:24:06]: Yeah.
Reynold Xin [00:24:06]: Yeah.
Swyx [00:24:06]: If you want to follow up on this, this is exactly where to look, right?
Reynold Xin [00:24:10]: Yeah.
Matei Zaharia [00:24:10]: Yeah. And the story of these is, I wrote a doc with, like, 10 ideas before we started on them. That was my wish list of things people asked for, and I told the team, “Hey, can you do at least five of these for the launch?” And then they just came back with all of them, so.
Swyx [00:24:29]: Oh, wow.
Matei Zaharia [00:24:29]: So you can come up with more, but some of them are just meant to be examples. Really, you can intercept any event the agent is making, and you can then either block it, force it to ask the user, or allow it, and you can update state to keep
Swyx [00:24:45]: Yeah.
Matei Zaharia [00:24:45]: track of stuff.
Swyx [00:24:46]: Yeah, ‘cause ultimately, I think of you as a systems designer.
Swyx [00:24:50]: You let people plug in, right? That's the whole
Matei Zaharia [00:24:51]: Yeah.
Swyx [00:24:52]: modus operandi of what you do.
Matei Zaharia [00:24:53]: Yeah.
Swyx [00:24:54]: It's like
Matei Zaharia [00:24:54]: And we also care a lot about composability: can someone else write a library that others use, which
Swyx [00:24:59]: Yeah.
Matei Zaharia [00:24:59]: is what this is meant for.
Reynold Xin [00:25:00]: There's also a batteries-included philosophy here,
Matei Zaharia [00:25:03]: Yes.
Reynold Xin [00:25:03]: probably very similar to how you did Spark, which is that you can just start using it.
Swyx [00:25:06]: Yeah.
Matei Zaharia [00:25:06]: Yeah, that's right.
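The policy pieces described in this part of the conversation (intercepting each event, lifting low-level tool calls into high-level events with a reusable library, answering allow, ask, or block, and keeping per-session state such as spend) can be sketched the same way. Everything here is hypothetical: `ToolCall`, the `drive.share` tool name and its `audience` argument, `drive_mapper`, and `decide` are invented for illustration and are not Omnigent's, Panther's, or any Google Drive MCP server's interface. The $5 default budget mirrors the example Matei gives.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToolCall:
    tool: str   # raw, low-level tool name as the harness sees it (hypothetical)
    args: dict


@dataclass
class Session:
    spent_usd: float = 0.0
    budget_usd: float = 5.0  # the "$5 cap" example from the conversation


# A policy "library" is just a function that lifts a raw call into a high-level event name.
EventMapper = Callable[[ToolCall], Optional[str]]


def drive_mapper(call: ToolCall) -> Optional[str]:
    """Hypothetical mapping for a Drive-like MCP server: flag public sharing."""
    if call.tool == "drive.share" and call.args.get("audience") == "anyone":
        return "share_externally"
    return None


def decide(session: Session, call: ToolCall, cost_usd: float,
           mappers: List[EventMapper]) -> str:
    """Return "allow", "ask", or "block" for one intercepted call."""
    # Collect the high-level events that any mapper recognized in this call.
    events = {mapper(call) for mapper in mappers} - {None}
    if "share_externally" in events:
        return "block"
    if session.spent_usd + cost_usd > session.budget_usd:
        return "ask"  # over budget: pause and ask the user before spending more
    session.spent_usd += cost_usd  # state update: track spend for this session
    return "allow"


session = Session()
share = ToolCall("drive.share", {"audience": "anyone"})
read = ToolCall("drive.read", {"file_id": "abc"})
print(decide(session, share, 0.01, [drive_mapper]))  # block
print(decide(session, read, 4.0, [drive_mapper]))    # allow
print(decide(session, read, 2.0, [drive_mapper]))    # ask
```

Keeping the mappers as plain functions is what lets someone else publish a mapping library that any policy can reuse, which is the composability point Matei makes.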
It has to be good out of the box at certain things, and then you can build your own things on top that we don't wanna do. In Spark, if you just wanna read a table or do an aggregation, it should be awesome at that out of the box.
Building on Omnigent: Contributions, Startups, and Analytics
Swyx [00:25:23]: Yeah. If people wanna catch up on Omnigent, they should watch your keynote.
Swyx [00:25:26]: They should go through the GitHub and the docs. If they wanted to contribute, or build on this ecosystem, what would you call out as the most high-leverage places to get involved?
Matei Zaharia [00:25:36]: Yeah, do get involved in the Discord and on GitHub. Our team is there monitoring, and some of the things people asked for, we just built ourselves. For some of them, we're collaborating with people to build them. And also tell us
Swyx [00:25:49]: Yeah, they're gonna be very
Matei Zaharia [00:25:49]: how you would like to use it, because especially with developers, everyone wants it to work their own way, and for a really good developer tool, you have to hear the feedback on all the ways and figure out the abstractions and how to let people customize. So if you think, “Hey, I don't want it to work this way,” tell us. We really just wanna get that compatibility layer across agents and then let you do stuff on top.
Swyx [00:26:14]: Yeah. Is there anything on the startup side? Say I'm a founder.
Swyx [00:26:18]: I want
Matei Zaharia [00:26:18]: Yeah.
Swyx [00:26:18]: I see an opportunity, I wanna get in front of you. What's your request for a startup, like, I wish someone
Matei Zaharia [00:26:23]: Oh, like you wanna integrate with us?
Swyx [00:26:24]: someone was working on this.
Matei Zaharia [00:26:26]: Oh, for a startup?
Swyx [00:26:27]: Yeah.
Swyx [00:26:28]: Like, you've got your own startup.
It's doing well.
Matei Zaharia [00:26:30]: Yeah.
Swyx [00:26:30]: But if you weren't working on your own startup, what is obvious that you should... You advise many startups too, obviously.
Matei Zaharia [00:26:37]: I do think, just as a company with a lot of engineers, anything that helps me make sense of how people are using
Swyx [00:26:46]: Spend.
Matei Zaharia [00:26:46]: coding agents, and
Swyx [00:26:48]: Yeah. Analytics.
Matei Zaharia [00:26:48]: spend, but also quality, like "you should add this skill," or "you should write this thing," or "your agents are really horrible at tasks involving this service, so go spend time there." That would be nice. Yeah.
Swyx [00:27:00]: Yeah. The closest I've found is this team, GitAI.
Matei Zaharia [00:27:03]: Oh, cool. Yeah.
Swyx [00:27:04]: They started with just code and human attribution, but they're building the analytics layer on top of that.
Matei Zaharia [00:27:12]: Yeah.
Swyx [00:27:12]: I do think there are a bunch of them. Artificial Analysis is obviously
Matei Zaharia [00:27:18]: Yeah, they have their benchmarks.
Swyx [00:27:18]: doing super well
Matei Zaharia [00:27:19]: Yeah.
Swyx [00:27:19]: with their stuff. So there will be people. I think this is the domain of consultants first, but then people
Matei Zaharia [00:27:26]: Yeah.
Swyx [00:27:26]: will build software that's kinda like the management plane
Matei Zaharia [00:27:29]: Yeah.
Swyx [00:27:30]: for coding agents.
Matei Zaharia [00:27:30]: Yeah, I think there'll be a lot of insights there. You have it in other areas.
Swyx [00:27:34]: Okay. And then the other big thing is your dream engine.
LTAP: Lake Transactional/Analytical Processing
Swyx [00:27:39]: Maybe you wanna tell the story of LTAP.
Reynold Xin [00:27:45]: So, and background with
I'm gonna make people listen to our Ankur Goyal episode, where we talked about SingleStore, HTAP
Matei Zaharia [00:27:52]: Yeah.
Reynold Xin [00:27:52]: and all that history.
Matei Zaharia [00:27:52]: Yeah. The LTAP idea is pretty simple. If people have heard Ankur's talk about HTAP, it's effectively about the world of databases. Sorry, maybe a lot of context needs to be injected here. The world of databases
Swyx [00:28:06]: I am happy to be the database podcast. I'm forcing people to learn their databases, guys.
Swyx [00:28:11]: You cannot vibe code with just markdown files.
Reynold Xin [00:28:13]: Yeah.
Swyx [00:28:13]: Like,
Reynold Xin [00:28:14]: It's one of the most important, fundamental systems technologies out there. But the world of databases is effectively split into roughly two halves. There's what we call OLTP databases, which are transactional: think of your Postgres, your MySQL, your Oracle databases. The other side is what we call analytics, sometimes referred to by the term OLAP. The difference is that on OLTP, you typically run some transaction on some event that looks up one specific row, and you update that row, right? It's a very row-oriented data structure. On analytics, you're trying to reason on the data. You're trying to compute, “Hey, what's my revenue per store? How's my website doing every day?” And then eventually you probably want to run machine learning on it to predict, “Hey, how will my sales be going in the future?” They are very different architectures, and everybody starts with OLTP databases. Every app, when it becomes serious enough that it needs more than markdown files, needs a database. You don't want to lose your data; you want some transactional consistency. But once you want to reason on the data, if you only have, like, a hundred rows, it's probably okay to run it on your Postgres or your MySQL database.
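The two workload shapes Reynold contrasts can be seen side by side with Python's built-in `sqlite3` module. This is only an illustration of query shapes: SQLite is a single row-oriented engine, so the example says nothing about how OLTP or OLAP systems are actually built. The `orders` table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (store, amount) VALUES (?, ?)",
    [("sf", 20.0), ("nyc", 35.0), ("sf", 15.0)],
)

# OLTP-shaped work: find one row by its key and update it inside a transaction.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE orders SET amount = amount + 5 WHERE id = ?", (1,))

# OLAP-shaped work: scan every row and aggregate ("what's my revenue per store?").
revenue = dict(conn.execute(
    "SELECT store, SUM(amount) FROM orders GROUP BY store ORDER BY store"
).fetchall())
print(revenue)  # {'nyc': 35.0, 'sf': 40.0}
```

The first statement touches a single row located by key; the second has to read every row, which is why heavy analytical queries compete with transactional traffic once the table grows.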
But once you have more data and want to run more complicated analysis, that very analysis might crush your Postgres database. So you start getting data out of the OLTP database
Swyx [00:29:35]: Replication.
Reynold Xin [00:29:36]: replicating it into the analytic systems, and you just start
Swyx [00:29:39]: Yeah, which for people, Elasticsearch is, like, a
Reynold Xin [00:29:42]: Yeah. So some of them get it into Elasticsearch for, like, log analysis. A lot of our customers obviously get it into Databricks to run more sophisticated things.
Swyx [00:29:51]: Yeah.
Reynold Xin [00:29:51]: And there's this term called CDC, which
Matei Zaharia [00:29:54]: Change data capture.
Reynold Xin [00:29:55]: change data capture. What it does is read the binlog of the database, and if you don't understand what a binlog is, that's fine. It's a little delta of the data, and, based on the deltas, it reconstructs the state of the database on the analytics side. But CDC is a very painful thing. It's standard in the industry, everybody uses it, but it ends up being
I think many data engineers end up being woken up at 3:00 a.m. because of some pipeline thing.
Swyx [00:30:22]: My explanation is that Airbyte became a $5 billion company just doing CDC.
Reynold Xin [00:30:27]: Yeah, exactly.
Reynold Xin [00:30:28]: CDC is, like, a very
Matei Zaharia [00:30:30]: It's hard.
Reynold Xin [00:30:30]: It's one of the most boring but also one of the most fundamental operations powering modern society.
Matei Zaharia [00:30:37]: Huh.
Reynold Xin [00:30:37]: But it's so brittle that we joke it should be called continuous data corruption, because you might change the schema on your OLTP database, and then the CDC pipeline fails to handle
Swyx [00:30:48]: Yeah.
Reynold Xin [00:30:48]: the schema change.
Swyx [00:30:49]: Yeah.
Reynold Xin [00:30:49]: And then everything goes out.
Swyx [00:30:51]: And there are all sorts of tricks you can do, like adding some versioning or whatever, but yeah.
Reynold Xin [00:30:55]: Yeah, but in general it's very complicated. At my keynote, I asked the audience to put up their hands if they love their CDC pipeline. Only maybe two people put them up. So with SingleStore, maybe about a decade ago, the industry had this idea: hey, what if I built a single database that can handle both workloads? Now I don't
Swyx [00:31:12]: Which, by the way, every database person has always dreamed about.
Reynold Xin [00:31:15]: Yes. Yes.
Reynold Xin [00:31:16]: This is the holy grail of database engineering: why not build a single system that can do both of these? But it ends up being a lot of compromises. I think one of the first issues is that, say, Postgres has a massive ecosystem, right? You want to be using the tools that are built for Postgres. And Spark, for example, has a massive ecosystem. There are a lot of libraries you want to use. If you were to create a new thing now, you don't have an ecosystem.
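A toy version of the CDC flow, replaying a stream of change events to rebuild table state on the analytics side, shows why a schema change can break a pipeline. This is a sketch under simplifying assumptions: real connectors parse the source database's binlog or write-ahead log and often receive partial updates, while here the log is a hand-written list of full row images and the replica's column list is fixed. The `replay` function and the event format are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (operation, primary key, full row image)


def replay(changes: List[Change], columns: Tuple[str, ...]) -> Dict[int, Row]:
    """Rebuild a replica table, keyed by primary key, from a stream of change events."""
    replica: Dict[int, Row] = {}
    for op, key, row in changes:
        if op in ("insert", "update"):
            unexpected = set(row) - set(columns)
            if unexpected:
                # Schema drift: the source table gained a column this pipeline doesn't know.
                raise ValueError(f"row {key} has unknown columns {sorted(unexpected)}")
            replica[key] = dict(row)
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return replica


log = [
    ("insert", 1, {"id": 1, "status": "new"}),
    ("insert", 2, {"id": 2, "status": "new"}),
    ("update", 1, {"id": 1, "status": "shipped"}),
    ("delete", 2, {}),
]
print(replay(log, ("id", "status")))  # {1: {'id': 1, 'status': 'shipped'}}

# Someone adds a "region" column upstream; the replica's fixed schema no longer matches.
drifted = log + [("insert", 3, {"id": 3, "status": "new", "region": "au"})]
try:
    replay(drifted, ("id", "status"))
except ValueError as err:
    print("pipeline stopped:", err)
```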
You tend to create a new, smaller proprietary API, so you're lacking both ecosystems, and it's also very difficult to make it comparable performance-wise on either side. So it ends up sucking at both. Our whole idea of LTAP, which is obviously a wordplay on the term HTAP, is that we think this is HTAP done right. HTAP wants to build a single engine for both. We think you can get 99% of what you need by unifying the storage and just having a single storage layer. Once you have the single storage layer, if your Postgres databases are writing data in a column-oriented format, all your analytics can just go read that data directly, without any delay, right? There's no pipeline in between, so all the data is immediately available for reasoning and analytics. I was telling some customers earlier that when we said this is gonna be super useful for agents, at first I didn't really believe it myself, even though we wrote that positioning.
Lakebase, Agents, and Live Operational Data
Matei Zaharia [00:32:39]: Yeah.
Reynold Xin [00:32:40]: But then last night I was having dinner with an Australian customer, and they told me, “Oh, hey, one of the big issues we have is that we have all these logs from our services, and we see SLA dips and want to investigate. But there's no way for those agents to even understand what's going on in the actual databases themselves. All we see is product telemetry of the database and the services.” It would make those agents 10 times more powerful if they understood, for example, who's placing those orders, what is happening, what exactly they are doing. So now I'm sold on our own message.
Swyx [00:33:13]: Yeah.
Reynold Xin [00:33:14]: I think it's really
It gets you almost all of the benefits of the HTAP holy grail, which is, hey, make the data available immediately for reasoning and analytics
Swyx [00:33:26]: Yeah, I think,
Reynold Xin [00:33:27]: without compromise.
Swyx [00:33:28]: in the way that humans are generally intelligent and want the ability and access to query anything
Reynold Xin [00:33:34]: Yeah.
Swyx [00:33:35]: while they do the work, they also need history and need context.
Swyx [00:33:38]: And where else do they get context? It's an analytical workload.
Reynold Xin [00:33:41]: Exactly.
Matei Zaharia [00:33:42]: Yeah. Yeah. And I remember when we had incidents with our databases and engineers said, “Well, I can't just run a giant query on it to see what's going on, because that's gonna bring down the database and hurt it even more.” That's the stuff this gets rid of, because you spin up a whole separate fleet of machines to do the analytics. You're not overloading the main database
Reynold Xin [00:34:02]: Right.
Matei Zaharia [00:34:02]: that's still trying to serve stuff.
Reynold Xin [00:34:04]: Yeah.
Matei Zaharia [00:34:04]: Yeah.
Why LTAP Works Now: Parquet, Postgres, and Lakebase
Swyx [00:34:05]: So this has been a dream for a while. What had to get done in order to get to today? Like,
Reynold Xin [00:34:11]: Yeah.
Swyx [00:34:11]: I feel like you have announced variants of this several times, but it wasn't as clear as LTAP.
Reynold Xin [00:34:18]: Yeah.
Swyx [00:34:18]: I think LTAP is like, okay, we've got it, guys.
Matei Zaharia [00:34:21]: This thing, yeah.
Reynold Xin [00:34:21]: I was talking to somebody at Meta, and he asked me, “Hey, what's the catch? Why is it possible now?” I think the reality is we took a lot of time to work on the Lakebase architecture, and obviously a lot of it came from the Neon team, which is a separation of storage from compute.
And it turned out it was just a tiny little step to go from that to this LTAP idea. In the Neon architecture and in the Lakebase architecture, we're writing data in a row-oriented format, as Postgres pages, to the open data lake. Ali and I were spending a lot of time debating, hey, can we just change that to write in a column-oriented format? And while we were debating, one day one of our engineers, who's super smart, came in and said, “Hey, I just prototyped it. It works.”
Swyx [00:35:07]: Wait, prototyped what?
Reynold Xin [00:35:09]: He prototyped, instead of storing the data in the data lake in the row-oriented format
Swyx [00:35:15]: Column.
Reynold Xin [00:35:15]: like Postgres pages,
Swyx [00:35:15]: Yeah.
Reynold Xin [00:35:16]: writing it in Parquet.
Swyx [00:35:17]: Yeah.
Reynold Xin [00:35:18]: And he made the observation that, hey, our storage fleet has a lot of extra idle CPUs, and we could use those CPUs to do the transcoding from row to column, where row is good for OLTP but column is good for analytics. So let's do that transcoding then. And as a matter of fact, once you transcode, the data compresses better, so those services writing to S3 or other data-lake object stores can write faster, ‘cause the data is now smaller.
Matei Zaharia [00:35:49]: Yeah.
Reynold Xin [00:35:49]: So there's no overhead, no compromise in performance
Matei Zaharia [00:35:52]: Some CPU overhead.
Swyx [00:35:54]: Yeah, because
Matei Zaharia [00:35:55]: Yeah.
Swyx [00:35:55]: we had extra CPUs anyway.
Matei Zaharia [00:35:56]: We had that fleet anyway, yeah.
Swyx [00:35:57]: So the debate ended.
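The row-to-column transcoding, and why the transcoded data compresses better, can be illustrated in a few lines. This is not Lakebase's transcoder or the Parquet file format; it only transposes row tuples into per-column lists and applies run-length encoding, one of the encodings Parquet supports alongside dictionary encoding. Storing a column's values contiguously puts repeated values next to each other, which is what the encoding exploits. The sample rows and the helper names are made up.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def rows_to_columns(rows: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Transcode row-major records (OLTP-friendly) into one list per column (analytics-friendly)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values: list) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


rows = [
    (1, "sf", "paid"),
    (2, "sf", "paid"),
    (3, "sf", "paid"),
    (4, "nyc", "refunded"),
]
columns = rows_to_columns(rows, ("id", "store", "status"))
print(columns["store"])                     # ['sf', 'sf', 'sf', 'nyc']
print(run_length_encode(columns["store"]))  # [('sf', 3), ('nyc', 1)]
```

On real tables the gain depends on how often values repeat and how rows are ordered; a column of unique IDs, for example, gets nothing from run-length encoding.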
It's one of the tech classics: an issue with a lot of debate, but then somebody went ahead, tried to prototype it, and it worked.
Matei Zaharia [00:36:06]: But for something this strategic
Swyx [00:36:07]: That's right.
Matei Zaharia [00:36:07]: and important to the company, I'd expect there to be a kickoff, like a design doc. Nothing like that.
Swyx [00:36:13]: Nothing like that.
Swyx [00:36:14]: He just, we were debating in many meetings
Matei Zaharia [00:36:17]: Yeah.
Swyx [00:36:17]: and we were just debating from first principles whether it's possible or not,
Matei Zaharia [00:36:20]: Yeah.
Swyx [00:36:20]: and then somebody just did it.
Matei Zaharia [00:36:23]: Yeah, if you set yourself up so people do that, that's great. And that happened a bit with Omnigent too. If I had just written a doc on how we can make these work together, everyone would think, “Oh, what about this? What about this?” But if you try it out, it helps. And then if you have real users and they bash on it and it's still working, or, in this case, if you know what the workload looks like, you can just test the same pattern.
Databricks' Culture of Fast Prototyping
Swyx [00:36:47]: Yeah.
Matei Zaharia [00:36:47]: Yeah.
Swyx [00:36:47]: Tech aside, which is very cool, this is maybe the most important thing: the culture of innovation. You don't have to ask my permission, you don't have to go through a whole formal process, just do it?
Matei Zaharia [00:36:59]: Well, especially these days, I think with
Swyx [00:37:01]: Yeah.
Matei Zaharia [00:37:01]: AI, it's easier to build
Swyx [00:37:02]: But so, like
Matei Zaharia [00:37:03]: a prototype.
Swyx [00:37:03]: I've seen a lot of large companies, and I think that at scale, things slow down, and I'm sure you've felt it already, but somehow you have this core of people that are exempt. How?
I think we hire and work with really good people, and that's a very important part of it, and empowering them, but also spending a lot of time in the trenches ourselves; maybe that matters a lot too.
Matei Zaharia [00:37:28]: Yeah, I think, first, people can adapt to being in a larger company, so that helps. And we wanna make sure they know that they can try stuff and settle debates, and they have a lot of examples of how it was done before, or they can launch a thing in beta or whatever. And the other thing: as a company, despite the size, we don't launch that many products. We try to keep it pretty coherent. That was the whole theory of the company: instead of having 20 Amazon services you need to set up for an analytics and machine learning stack, you just have one, with the same API, the same semantics across all of them, and the same copy of the data. So that requires unification. And then we added one more thing at a time. We added storage with Delta Lake; we didn't use to do any storage. Then we added SQL, and we added machine learning platform stuff. So don't do too many things, but do those things well. That also helps keep it manageable.
Reynold Xin [00:38:33]: Yeah. The other thing we encourage a lot is, instead of trying to boil the ocean for everything, let's figure out how to do it incrementally, how to do it very quickly. Many of our products
Matei Zaharia [00:38:43]: Yeah.
Reynold Xin [00:38:43]: are built in the span of weeks. Usually my first question to whichever team is building is: who's the target customer? Who are you working with? Are you on a first-name basis with them? Are you texting with them? I think having that very tight loop,
Matei Zaharia [00:38:59]: Can you bring up another launch that comes to mind in this vein?
I just want to give examples.
Reynold Xin [00:39:04]: Omnigent itself happened that way.
Reynold Xin [00:39:05]: Yeah.
Matei Zaharia [00:39:06]: Who's the customer? That's a good one.
Reynold Xin [00:39:34]: storage layer we did. Our largest customer at the time said, “Okay, I want something in the cloud, ‘cause if the rest of our network is compromised, this thing needs to be separate to store and query the events.” And then he talked to us and said, “Okay, this is the rate of events per second. This is the freshness I want. Can you do it?” That was way larger than any workload we had, and we had our engineer Michael Armbrust working on that, and he worked just to make this work. And once it worked for them, it worked for everyone else. Yeah. This was early in the company, probably about four years in.
Matei Zaharia [00:40:24]: 2018?
Swyx [00:40:26]: Yeah, ‘17, ‘18.
Matei Zaharia [00:40:28]: A few companies
Swyx [00:40:28]: Do you have other examples?
Matei Zaharia [00:40:30]: There's
Swyx [00:40:31]: Maybe you have others.
Matei Zaharia [00:40:31]: Yeah, Clean Rooms, which is how you share data without sharing
Swyx [00:40:35]: Yeah.
Matei Zaharia [00:40:35]: the underlying data, but you allow specific operations. Those were initially built for just two customers. I think the industry has a sense that if you overfit to one or two customers, it's gonna be really bad for you. But I think the downside of overfitting is much smaller than the upside. If you try to be too ambitious and boil the ocean, that's a much bigger problem.
Swyx [00:40:58]: Yeah. Yeah.
Matei Zaharia [00:40:58]: ‘Cause you might end up having no customers.
Swyx [00:41:00]: Yeah, that's the more likely outcome.
Matei Zaharia [00:41:02]: Yeah.
Tech Companies vs. Enterprises
Swyx [00:41:03]: Then you can pivot from there. I do think there is such a thing as a bad customer that sometimes you should fire.
Yeah.
Matei Zaharia [00:41:08]: They could exist sometimes. Well, one of the challenges I think we see, and maybe many newer-generation AI companies are seeing, is that tech companies are very different from non-tech companies or traditional enterprises.
Swyx [00:41:22]: Yeah.
Matei Zaharia [00:41:22]: And if you optimize everything just for tech companies, you might have various challenges
Swyx [00:41:27]: Oh.
Matei Zaharia [00:41:27]: scaling outside of tech companies.
Swyx [00:41:28]: Okay, what, like
Matei Zaharia [00:41:30]: Yeah.
Swyx [00:41:30]: what are the top three differences that you always think about?
Reynold Xin [00:41:33]: Governance is a big one.
Matei Zaharia [00:41:34]: I think, yeah, a big one is security, data privacy, governance, all that stuff. Usually, if you're building some kind of B2B or developer tool, your biggest market is gonna be enterprises, but they're just very different. A company that's had some form of IT for 30 years has so many legacy systems, or it operates in a regulated space, whereas at a startup or even a more recent tech company, everything is new and pristine. So yeah, it's just different, and if you've never worked with enterprises or been in one, you just won't know about it.
Reynold Xin [00:42:13]: Yeah.
Matei Zaharia [00:42:13]: Yeah.
Reynold Xin [00:42:13]: And the procurement process is probably quite different. There are far more stakeholders.
Matei Zaharia [00:42:17]: Yeah, that is one. Yeah.
Matei Zaharia [00:42:18]: Another interesting piece is that at some tech companies, people will say, “Oh, I can build that myself,” right? “I'll just build that myself.”
Matei Zaharia [00:42:27]: So then you go,
Reynold Xin [00:42:28]: I don't think people say that about Databricks, but
Matei Zaharia [00:42:31]: Yeah, it depends.
Reynold Xin [00:42:32]: They do.
Matei Zaharia [00:42:32]: They do?
Matei Zaharia [00:42:32]: Yeah, the
Yeah, and it depends on the teams and things. But on the other hand, many of the enterprises say, I never wanna be in the business of building that. Like, I'm a retailer or something, I never wanna
Reynold Xin [00:42:45]: Yeah, I sell clothes,
Matei Zaharia [00:42:46]: be down because some weird nerd couldn't get streaming pipelines working.
Matei Zaharia [00:42:51]: That is not what I'm doing.
Reynold Xin [00:42:53]: Yeah.
Reynold Xin [00:42:53]: Yeah. That makes them great customers, to be honest, right?
Matei Zaharia [00:42:55]: Yeah. But it's hard to understand without having worked there; you may not appreciate it.
Reynold Xin [00:43:01]: Look, I think they're all great. Don't get me wrong, they have different challenges. But many of the tech companies, for sure, are far more DIY.
Matei Zaharia [00:43:10]: On the flip side, you have people who are very much experts in their domain: they're building airplanes, they're designing medicines, whatever, and they just want to bridge to the technology. They don't wanna learn databases or whatever. As cool as we think it is, and as interesting as the average software engineer might find it to read a little about, they just never wanna know. They just say, “I have a giant matrix or whatever with my clinical data. How do I cluster it?” So yeah.
The Dream Engine and Rewriting the Database Stack
Reynold Xin [00:43:40]: Yeah. That's true. Okay, and then I wanted to build out the dream engine vision. Where does this all lead? One of the things we realized maybe a couple of years back is that every single database engine out there, especially on the analytics side, is a decade old. Pretty much everything that has reasonable traction is about a decade old.
And they all started out targeting some very specific, narrow use cases, and over time they became more and more successful. They grew in their ambition and tried to support more and more use cases. But the fastest way to support those use cases tends to be hacking around the abstractions that were initially created, which were not designed for those use cases.
Matei Zaharia [00:44:23]: Yeah.
Reynold Xin [00:44:23]: And you can support them more or less okay. But after 10 years of organic evolution that way, it becomes a gigantic pile of s**t.
Reynold Xin [00:44:31]: And that includes Databricks. Very few companies, or very few systems, I think, have the guts to say, let's start from scratch. Let's go back to the drawing board and, knowing everything we know today after a decade of workloads and probably billions in revenue, attempt to rewrite it from scratch and make sure it can support all of these use cases. So we started doing that, but it's a very ambitious project. By the way, you can search on Wikipedia: there's this thing called second-system syndrome.
Matei Zaharia [00:45:08]: Yeah, I know that. Yes.
Reynold Xin [00:45:09]: Or the second-system effect.
Matei Zaharia [00:45:11]: Every developer must know what second-system syndrome is.
Reynold Xin [00:45:12]: You built your first thing and it worked out great, and the second one's bound to fail because you become too ambitious.
Reynold Xin [00:45:19]: And then you add so many requirements.
Matei Zaharia [00:45:20]: Or you think of everything,
Reynold Xin [00:45:21]: Yeah.
Matei Zaharia [00:45:21]: and then you're like
Reynold Xin [00:45:22]: You just
Matei Zaharia [00:45:22]: “I'm gonna design the perfect system this time.”
Reynold Xin [00:45:24]: Yeah. And it turns out it's not perfect, and then it starts failing, and because you were too ambitious, you never launch, and you get killed. And the engineering team that started this, they were brilliant.
I think we hired some of the best database engineers on the planet into Databricks, and they were brilliant. Thank God it's not their second system. Many of them have built more than two in the past.
Matei Zaharia [00:45:44]: Ah, nice.
Reynold Xin [00:45:45]: But they were still worried about this: for building a database engine from scratch, the conventional wisdom is that it's gonna take like five years to mature. This would be a very long-term project. It could fail. I think one of the engineers jokingly said, “Hey, maybe we just call it Reynold's Dream Engine.” If we name it after a founder, maybe it won't get canceled or killed. But I think they built something pretty remarkable. They changed the way database engines were built from a paradigm point of view. Usually when y

TC's Julie Bort can't help rooting for tiny open source AI model maker Arcee; plus, Databricks co-founder wins prestigious ACM award

TechCrunch Startups – Spoken Edition

Play Episode Listen Later Apr 9, 2026 7:16

Arcee is a tiny 26-person U.S. startup that built a high-performing, massive, open source LLM. And it's gaining popularity with OpenClaw users. Also, Matei Zaharia has won the top honor from the Association for Computing Machinery. Now he's working on AI for research and says AGI is simply misunderstood.

What OpenAI and Google engineers learned deploying 50+ AI products in production

Lenny's Podcast: Product | Growth | Career

Play Episode Listen Later Jan 11, 2026 86:22

Aishwarya Naresh Reganti and Kiriti Badam have helped build and launch more than 50 enterprise AI products across companies like OpenAI, Google, Amazon, and Databricks. Based on these experiences, they've developed a small set of best practices for building and scaling successful AI products. The goal of this conversation is to save you and your team a lot of pain and suffering.

We discuss:
1. Two key ways AI products differ from traditional software, and why that fundamentally changes how they should be built
2. Common patterns and anti-patterns in companies that build strong AI products versus those that struggle
3. A framework they developed from real-world experience to iteratively build AI products that create a flywheel of improvement
4. Why obsessing about customer trust and reliability is an underrated driver of successful AI products
5. Why evals aren't a cure-all, and the most common misconceptions people have about them
6. The skills that matter most for builders in the AI era

Brought to you by:
Merge—The fastest way to ship 220+ integrations: https://merge.dev/lenny
Strella—The AI-powered customer research platform: https://strella.io/lenny
Brex—The banking solution for startups: https://www.brex.com/product/business-account?ref_code=bmk_dp_brand1H25_ln_new_fs

Transcript: https://www.lennysnewsletter.com/p/what-openai-and-google-engineers-learned

My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/183007822/referenced

Get 15% off Aishwarya and Kiriti's Maven course, Building Agentic AI Applications with a Problem-First Approach, using this link: https://bit.ly/3V5XJFp

Where to find Aishwarya Naresh Reganti:
• LinkedIn: https://www.linkedin.com/in/areganti
• GitHub: https://github.com/aishwaryanr/awesome-generative-ai-guide
• X: https://x.com/aish_reganti

Where to find Kiriti Badam:
• LinkedIn: https://www.linkedin.com/in/sai-kiriti-badam
• X: https://x.com/kiritibadam

Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:
(00:00) Introduction to Aishwarya and Kiriti
(05:03) Challenges in AI product development
(07:36) Key differences between AI and traditional software
(13:19) Building AI products: start small and scale
(15:23) The importance of human control in AI systems
(22:38) Avoiding prompt injection and jailbreaking
(25:18) Patterns for successful AI product development
(33:20) The debate on evals and production monitoring
(41:27) Codex team's approach to evals and customer feedback
(45:41) Continuous calibration, continuous development (CC/CD) framework
(58:07) Emerging patterns and calibration
(01:01:24) Overhyped and under-hyped AI concepts
(01:05:17) The future of AI
(01:08:41) Skills and best practices for building AI products
(01:14:04) Lightning round and final thoughts

Referenced:
• LevelUp Labs: https://levelup-labs.ai/
• Why your AI product needs a different development lifecycle: https://www.lennysnewsletter.com/p/why-your-ai-product-needs-a-different
• Booking.com: https://www.booking.com
• Research paper on agents in production (by Matei Zaharia's lab): https://arxiv.org/pdf/2512.04123
• Matei Zaharia's research on Google Scholar: https://scholar.google.com/citations?user=I1EvjZsAAAAJ&hl=en
• The coming AI security crisis (and what to do about it) | Sander Schulhoff: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis
• Gajen Kandiah on LinkedIn: https://www.linkedin.com/in/gajenkandiah
• Rackspace: https://www.rackspace.com
• The AI-native startup: 5 products, 7-figure revenue, 100% AI-written code | Dan Shipper (co-founder/CEO of Every): https://www.lennysnewsletter.com/p/inside-every-dan-shipper
• Semantic Diffusion: https://martinfowler.com/bliki/SemanticDiffusion.html
• LMArena: https://lmarena.ai
• Artificial Analysis: https://artificialanalysis.ai/leaderboards/providers
• Why humans are AI's biggest bottleneck (and what's coming in 2026) | Alexander Embiricos (OpenAI Codex Product Lead): https://www.lennysnewsletter.com/p/why-humans-are-ais-biggest-bottleneck
• Airline held liable for its chatbot giving passenger bad advice—what this means for travellers: https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know
• Demis Hassabis on LinkedIn: https://www.linkedin.com/in/demishassabis
• We replaced our sales team with 20 AI agents—here's what happened | Jason Lemkin (SaaStr): https://www.lennysnewsletter.com/p/we-replaced-our-sales-team-with-20-ai-agents
• Socrates's quote: https://en.wikipedia.org/wiki/The_unexamined_life_is_not_worth_living
• Noah Smith's newsletter: https://www.noahpinion.blog
• Silicon Valley on HBO Max: https://www.hbomax.com/shows/silicon-valley/b4583939-e39f-4b5c-822d-5b6cc186172d
• Clair Obscur: Expedition 33: https://store.steampowered.com/app/1903340/Clair_Obscur_Expedition_33/
• Wisprflow: https://wisprflow.ai
• Raycast: https://www.raycast.com
• Steve Jobs's quote: https://www.goodreads.com/quotes/463176-you-can-t-connect-the-dots-looking-forward-you-can-only

Recommended books:
• When Breath Becomes Air: https://www.amazon.com/When-Breath-Becomes-Paul-Kalanithi/dp/081298840X
• The Three-Body Problem: https://www.amazon.com/Three-Body-Problem-Cixin-Liu/dp/0765382032
• A Fire Upon the Deep: https://www.amazon.com/Fire-Upon-Deep-Zones-Thought/dp/0812515285

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed. To hear more, visit www.lennysnewsletter.com

The top AI news from the past week, every ThursdAI

Play Episode Listen Later Mar 28, 2024 95:10

Hey everyone, this is Alex, and can you believe that we're almost done with Q1 2024? March 2024 was kind of crazy, so of course I'm excited to see what April brings (besides the Weights & Biases conference in SF called Fully Connected, which I encourage you to attend and say hi to me and the team!). This week we have tons of exciting stuff on the leaderboards: say hello to the new best AI in the world, Opus (+ some other surprises). In open source we had new MoEs (one from the Mosaic/Databricks folks, which tops the open source game, and one from AI21 called Jamba, which shows that a transformers alternative/hybrid can actually scale) and a tiny MoE from Alibaba, as well as an incredible emotion TTS from Hume. I also had the pleasure to finally sit down with friend of the pod Tanishq Abraham and Paul Scotti from MedArc and chat about MindEye 2 and how they teach AI to read minds using diffusion models.

Top 5 Research Trends + OpenAI Sora, Google Gemini, Groq Math (Jan-Feb 2024 Audio Recap) + Latent Space Anniversary with Lindy.ai, RWKV, Pixee, Julius.ai, Listener Q&A!

Latent Space: The AI Engineer Podcast – CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Mar 9, 2024 108:52

We will be recording a preview of the AI Engineer World's Fair soon with swyx and Ben Dunphy; send any questions you have about Speaker CFPs and Sponsor Guides!

Alessio is now hiring engineers for a new startup he is incubating at Decibel. The ideal candidate is an ex-technical co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc.). Reach out to him for more!

Thanks for all the love on the Four Wars episode! We're excited to develop this new “swyx & Alessio rapid-fire thru a bunch of things” format with you, and feedback is welcome.

Jan 2024 Recap
The first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024.

Feb 2024 Recap
The second half catches you up on everything that was topical in Feb, including:
* OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan
* Google Gemini Pro 1.5 - 1m Long Context, Video Understanding
* Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs dylan math)
* The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!)
* Grimes' poetic take: Art for no one, by no one
* F*** you, show me the prompt

Latent Space Anniversary
Please also read Alessio's longform reflections on One Year of Latent Space!

We launched the podcast 1 year ago with Logan from OpenAI, and also held an incredible demo day that got covered in The Information.

Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs.

The winners were Pixee and RWKV (that's Eugene from our pod!). And finally, your cohosts got cake!

We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China:
* Balázs Némethi
* Sylvia Tong
* RJ Honicky
* Jan Zheng

Our birthday wish for the super loyal fans reading this: tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend!

As always, feedback is welcome.

Timestamps
* [00:03:02] Top Five LLM Directions
* [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
* [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN)
* [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations)
* [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)
* [00:23:33] Wildcards: Text Diffusion, RALM/Retro
* [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)
* [00:28:26] Wildcard: Model Merging (mergekit)
* [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa)
* [00:33:18] OpenAI Sora and why everyone underestimated videogen
* [00:36:18] Does Sora have a World Model? Yann LeCun vs Jim Fan
* [00:42:33] Groq Math
* [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
* [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take
* [00:58:39] F*** you, show me the prompt
* [01:02:43] Send us your suggestions pls
* [01:04:50] Latent Space Anniversary
* [01:04:50] Lindy.ai - Agent Platform
* [01:06:40] RWKV - Beyond Transformers
* [01:15:00] Pixee - Automated Security
* [01:19:30] Julius AI - Competing with Code Interpreter
* [01:25:03] Latent Space Listeners
* [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club)
* [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect)
* [01:31:23] Listener 3 - RJ (Developers building Community & Content)
* [01:39:25] Listener 4 - Jan Zheng (Australia, AI UX)

Transcript
[00:00:00] AI Charlie: Welcome to the Latent Space podcast, weekend edition. This is Charlie, your new AI co-host. Happy weekend. As an AI language model, I work the same every day of the week, although I might get lazier towards the end of the year. Just like you. Last month, we released our first monthly recap pod, where Swyx and Alessio gave quick takes on the themes of the month, and we were blown away by your positive response.
[00:00:33] AI Charlie: We're delighted to continue our new monthly news recap series for AI engineers. Please feel free to submit questions by joining the Latent Space Discord, or just hit reply when you get the emails from Substack. This month, we're covering the top research directions that offer progress for text LLMs, and then touching on the big Valentine's Day gifts we got from Google, OpenAI, and Meta.
[00:00:55] AI Charlie: Watch out and take care.
[00:00:57] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and we're back with a monthly recap with my co-host
[00:01:06] swyx: Swyx.
The reception was very positive for the first one. I think people have requested this, and no surprise, I think they want to hear us opining more on issues and maybe dropping some alpha along the way. I'm not sure how much alpha we have to drop. This month, February, was a very, very heavy month. We also did not do one specifically for January, so I think we're just going to do a two-in-one, because we're recording this on the first of March.
[00:01:29] Alessio: Yeah, let's get to it. I think the last one we did, the four wars of AI, was the main kind of mental framework for people. I think in the January one, we had the five worthwhile directions for state-of-the-art LLMs. Four, five,
[00:01:42] swyx: and now we have to do six, right? Yeah.
[00:01:46] Alessio: So maybe we just want to run through those, and then do the usual news recap, and we can do
[00:01:52] swyx: one each.
[00:01:53] swyx: So the context to this stuff is, one, I noticed that the test of time concept from NeurIPS, and just in general as a life philosophy, I think is a really good idea. Especially in AI, there's news every single day, and after a while you're just like, okay, everyone was excited about this thing yesterday, and now nobody's talking about it.
[00:02:13] swyx: So, yeah. It's more important, or a better use of time, to spend time on things that will stand the test of time. And I think for people to have a framework for understanding what will stand the test of time, they should have something like the four wars. Like, what are the themes that keep coming back because they are limited resources that everybody's fighting over?
[00:02:31] swyx: Whereas for this one, I think the focus of the five directions is just on research that seems more promising than others, because there are all sorts of papers published every single day, and there's no organization telling you, like, this one's more important than the other one, apart from, you know, Hacker News votes and Twitter likes and whatever.
[00:02:51] swyx: And obviously you want to get in a little bit earlier than something where, you know, the test of time is counted by sort of reference citations.
[00:02:59] The Five Research Directions
[00:02:59] Alessio: Yeah, let's do it. We got five. Long inference.
[00:03:02] swyx: Let's start there. Yeah, yeah. So, just to recap at the top, the five trends that I picked, and obviously if you have some that I did not cover, please suggest something.
[00:03:13] swyx: The five are long inference, synthetic data, alternative architectures, mixture of experts, and online LLMs. And something that I think might be a bit controversial is that this is a sorted list, in the sense that I am not the guy saying that Mamba is like the future, and so maybe that's controversial.
[00:03:31] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering)
[00:03:31] swyx: But anyway, long inference is a thesis I pushed before on the newsletter, in discussing the thesis that, you know, Code Interpreter is GPT-4.5. That was the title of the post. And it's one of many ways in which we can do long inference. You know, long inference also includes chain of thought, like, please think step by step.
[00:03:52] swyx: But it also includes flow engineering, which is what Itamar from Codium coined, I think in January, where basically, instead of stuffing everything in a prompt, you do sort of multi-turn iterative feedback and chaining of things. In a way, this is a rebranding of what a chain is, what a LangChain is supposed to be.
[00:04:15] swyx: I do think that maybe SGLang from LMSys is a better name. Probably the neatest way of flow engineering I've seen yet, in the sense that everything is a one-liner; it's very, very clean code. I highly recommend people look at that.
I'm surprised it hasn't caught on more, but I think it will. It's weird that something like DSPy is more hyped than SGLang.
[00:04:36] swyx: Because it, you know, maybe obscures the code a little bit more. But both of these are, you know, really good sort of chain-y, long inference type approaches. But basically, the fundamental insight is that there are only a few dimensions along which we can scale LLMs. So, let's say in 2017, 18, 19, 20, we were realizing that we could scale the number of parameters.
[00:05:03] swyx: And we scaled that up to 175 billion parameters for GPT-3. And we did some work on scaling laws, which we also talked about in our Datasets 101 episode, where we're like, okay, we think the right number is 300 billion tokens to train 175 billion parameters. And then DeepMind came along and trained Gopher and Chinchilla and said, no, no, we think the optimal,
[00:05:28] swyx: compute-optimal ratio is 20 tokens per parameter. And now, of course, with LLaMA and the sort of super-LLaMA scaling laws, we have 200 times and often 2,000 times tokens to parameters. So now, instead of scaling parameters, we're scaling data. And fine, we can keep scaling data. But what else can we scale?
[00:05:52] swyx: And I think understanding the ability to scale things is crucial to understanding what to pour money and time and effort into, because there's a limit to how much you can scale some things. And I think people don't think about the ceilings of things. And so the remaining ceiling of inference is like, okay, we have scaled compute, we have scaled data, we have scaled parameters, like, model size, let's just say.
[00:06:20] swyx: Like, what else is left? What's the low-hanging fruit? And it's, like, blindingly obvious that the remaining low-hanging fruit is inference time.
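The token-to-parameter ratios swyx cites are easy to check. A minimal Python sketch, using only the figures quoted in the conversation (175B parameters and 300B tokens for GPT-3, about 20 tokens per parameter for Chinchilla, 200x to 2,000x for LLaMA-style over-training); the 7B model size in the last loop is a hypothetical example, not a figure from the episode:

```python
# Figures as quoted in the conversation, not independently verified.
GPT3_PARAMS = 175e9
GPT3_TOKENS = 300e9
CHINCHILLA_TOKENS_PER_PARAM = 20  # "compute optimal" rule of thumb


def tokens_per_param(tokens: float, params: float) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / params


def chinchilla_tokens(params: float) -> float:
    """Training tokens suggested by the 20-tokens-per-parameter heuristic."""
    return CHINCHILLA_TOKENS_PER_PARAM * params


print(f"GPT-3: {tokens_per_param(GPT3_TOKENS, GPT3_PARAMS):.2f} tokens per parameter")
print(f"20:1 budget for 175B params: {chinchilla_tokens(GPT3_PARAMS) / 1e12:.1f}T tokens")

# Over-trained regime: 200x and 2,000x tokens per parameter,
# applied to a hypothetical 7B-parameter model for illustration.
HYPOTHETICAL_PARAMS = 7e9
for ratio in (200, 2000):
    print(f"{ratio}x on 7B params: {ratio * HYPOTHETICAL_PARAMS / 1e12:.1f}T tokens")
```

The move from under 2 tokens per parameter (GPT-3) to 20 (Chinchilla) and then to hundreds or thousands is the shift from scaling parameters to scaling data described above.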
So, like, we have scaled training time. We can probably scale those things more, but, like, not 10x, not 100x, not 1000x. Like, right now, maybe a good run of a large model is three months.
[00:06:40] swyx: We can scale that to three years. But like, can we scale that to 30 years? No, right? Like, it starts to get ridiculous. So it's just the orders of magnitude of scaling; we're just running out there. But in terms of the amount of time that we spend inferencing, everything takes, you know, a few milliseconds, a few hundred milliseconds, depending on whether you're taking it token by token or, you know, an entire phrase.
[00:07:04] swyx: But we can scale that to hours, days, months of inference and see what we get. And I think that's really promising.
[00:07:11] Alessio: Yeah, we'll have Mike from BrightWave back on the podcast. But I tried their product, and their reports take about 10 minutes to generate instead of coming back in real time. I think to me the most interesting thing about long inference is, like, you're shifting the cost to the customer depending on how much they care about the end result.
[00:07:31] Alessio: If you think about prompt engineering, it's like the first part, right? You can either do a simple prompt and get a simple answer, or do a complicated prompt and get a better answer. It's up to you to decide how to do it. Now it's like, hey, instead of training this for three years, I'll still train it for three months, and then I'll teach you how to make it run for 10 minutes to get a better result.
[00:07:52] Alessio: So you're kind of like parallelizing the improvement of the LLM.
Oh yeah, you can even
[00:07:57] swyx: parallelize that, yeah, too.
[00:07:58] Alessio: So, and I think, you know, for me, especially with the work that I do, it's less about, you know, state of the art in the absolute; it's more about state of the art for my application, for my use case.
[00:08:09] Alessio: And I think we're getting to the point where most companies and customers don't really care about state of the art anymore. It's like, I can get this to do a good enough job. You know, I just need it to get better. Like, how do I do long inference? You know, people are not really doing a lot of work in that space, so yeah, excited to see more.
[00:08:28] swyx: So then the last point I'll mention here is something I also mentioned as a paper. All these directions are kind of guided by what happened in January. That was my way of doing a January recap, which means that if there was nothing significant in that month, I also didn't mention it, which I came to regret come February 15th. But in January, you know, there was also the AlphaGeometry paper, which I kind of put in this sort of long inference bucket, because it solves, you know, more-than-100-step math olympiad geometry problems at a human gold medalist level, and that also involves planning, right?
[00:08:59] swyx: So if you want to scale inference, you can't scale it blindly, because autoregressive token-by-token generation is only going to get you so far. You need good planning. And I think probably, yeah, what Mike from BrightWave is now doing, and what everyone is doing, including maybe what we think Q* might be, is some form of search and planning.
[00:09:17] swyx: And it makes sense. Like, you want to spend your inference time wisely. How do you
[00:09:22] Alessio: think about plans that work and getting them shared? You know, I feel like if you're planning a task, somebody has got in and the models are stochastic.
So everybody initially gets different results. Somebody is going to end up generating the best plan to do something, but for most people there's no easy way to store these plans and then reuse them.
[00:09:44] Alessio: You know, I'm curious if there's going to be some paper or some work there on making it better, because, yeah, we don't
[00:09:52] swyx: really have... This is your pet topic of NPM for
[00:09:54] Alessio: Yeah, yeah, NPM, exactly. NPM for, you need NPM for anything, man. You need NPM for skills. You need NPM for planning. Yeah, yeah.
[00:10:02] Alessio: You know, I think, I mean, obviously the Voyager paper is the most basic example, where now their artifact is the best plan to make a diamond pickaxe in Minecraft. And everybody can just use that. They don't need to come up with it again. Yeah. But there's nothing like that for actually useful
[00:10:18] swyx: tasks.
[00:10:19] swyx: For plans, I believe it for skills. I like that. Basically, that just means a bunch of integration tooling. You know, GPT built me integrations to all these things. And, you know, I just came from an integrations-heavy business, and I could definitely propose some version of that. And it's just, you know, hard to execute or expensive to execute.
[00:10:38] swyx: But for planning, I do think that everyone lives in slightly different worlds. They have slightly different needs. And they definitely want some, you know... And I think that will probably be the main hurdle for any sort of library or package manager for planning. But there should be a meta plan of how to plan.
[00:10:57] swyx: And maybe you can adopt that. And I think a lot of people have these sort of meta-prompting strategies of, like, I'm not prescribing you the prompt, I'm just saying that here are the, like, fill-in-the-lines, or the mad libs, of how to prompt.
First you have the roleplay, then you have the intention, then you have, like, do something, then you have the don't do something, and then you have the “my grandmother is dying, please do this.”
[00:11:19] swyx: So the meta plan you could take off the shelf and test a bunch of them at once. I like that. That was the initial, maybe, promise of the prompting libraries. You know, both LangChain and LlamaIndex have hubs that you can sort of pull off the shelf. I don't think they're very successful, because people like to write their own.
[00:11:36] swyx: Yeah,
[00:11:37] Direction 2: Synthetic Data (WRAP, SPIN)
[00:11:37] Alessio: yeah, yeah. Yeah, that's a good segue into the next one, which is synthetic
[00:11:41] swyx: data. Synthetic data is so hot. Yeah, and, you know, I feel like I should do one of these memes where it's like, oh, I used to call it, you know, RLAIF, and now I call it synthetic data, and then people are interested.
[00:11:54] swyx: But there's gotta be older versions of what synthetic data really is, because I'm sure, you know, if you've been in this field long enough, there are just different buzzwords that the industry condenses on. Anyway, the insight that I think is relatively new, why people are excited about it now and why it's promising now, is that we have evidence that shows that LLMs can generate data to improve themselves with no teacher LLM.
[00:12:22] swyx: For all of 2023, when people said synthetic data, they really kind of meant generating a whole bunch of data from GPT-4 and then training an open source model on it. Hello to our friends at Nous Research. That's what Nous Hermes does. They're very, very open about that. I think they have said that they're trying to migrate away from that.
[00:12:40] swyx: But it is explicitly against OpenAI's Terms of Service. Everyone knows this. You know, especially once ByteDance got banned for doing exactly that.
So synthetic data that is not a form of model distillation is the hot thing right now: the idea that you can bootstrap better LLM performance from the same LLM, which is very interesting.
[00:13:03] swyx: A variant of this is RLAIF, where you have a sort of constitutional model, or, you know, some kind of judge model that is sort of more aligned. But that's not really what we're talking about when most people talk about synthetic data. Synthetic data is just really, I think, you know, generating more data in some way.
[00:13:23] swyx: A lot of people, I think we talked about this with Vipul in the Together episode, where I think he commented that you just have to have a good world model, or a good sort of inductive bias, or whatever that, you know, term of art is. And that is strongest in math and science, math and code, where you can verify what's right and what's wrong.
[00:13:44] swyx: And so the ReST-EM paper from DeepMind explored that very well. It's just the most obvious thing. And then once you get out of that domain of things where you can arbitrarily generate a whole bunch of stuff and verify whether it's correct, and therefore have correct synthetic data to train on, once you get into more sort of fuzzy topics, then it's a bit less clear. So I think the papers that drove this understanding, there are two big ones and then one smaller one. One was WRAP, like Rephrasing the Web, from Apple, where they basically rephrased all of the C4 dataset with Mistral and trained on that instead of C4.
[00:14:23] swyx: And so the new C4 trained much faster and cheaper than regular raw C4. And that was very interesting. And I have told some friends of ours that they should just throw out their own existing datasets and just do that, because that seems like a pure win.
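The generate-and-verify idea behind synthetic data in math and code can be illustrated with a toy example. This Python sketch is not the ReST-EM or WRAP method; it assumes a domain (integer addition) with an exact programmatic checker, and `noisy_solver` is a hypothetical stand-in for sampling answers from a model. Only samples that pass the checker are kept as training data:

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Example:
    problem: tuple[int, int]
    answer: int


def noisy_solver(a: int, b: int, rng: random.Random) -> int:
    """Hypothetical stand-in for a model sample: right ~70% of the time, else off by one."""
    guess = a + b
    if rng.random() < 0.3:
        guess += rng.choice((-1, 1))
    return guess


def verify(a: int, b: int, answer: int) -> bool:
    """Exact checker; having one is what makes math and code easy domains for this."""
    return answer == a + b


def generate_verified(n_problems: int, samples_per_problem: int, seed: int = 0) -> list[Example]:
    """Sample several answers per problem and keep only the ones the checker accepts."""
    rng = random.Random(seed)
    kept: list[Example] = []
    for _ in range(n_problems):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        for _ in range(samples_per_problem):
            candidate = noisy_solver(a, b, rng)
            if verify(a, b, candidate):  # discard anything the checker rejects
                kept.append(Example((a, b), candidate))
    return kept


data = generate_verified(n_problems=100, samples_per_problem=4)
print(f"kept {len(data)} of 400 samples")
```

Because the checker is exact, every kept example is correct no matter how often the sampler is wrong; the error rate only changes how many samples survive. That guarantee is what becomes hard to get on the fuzzier topics mentioned above.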
Obviously we have to study, like, what the trade-offs are.
[00:14:42] swyx: I imagine there are trade-offs. So I was just thinking about this last night. If you do synthetic data and it's generated from a model, you probably will not train on typos. So once the model that's trained on synthetic data encounters its first typo, it'll be like, what is this?
[00:15:01] swyx: I've never seen this before. So it has no association or correction as to, like, oh, these tokens are often typos of each other, therefore they should be kind of similar. I don't know. That really remains to be seen, I think. I don't think that the Apple people explored
[00:15:15] Alessio: that. Yeah, isn't that the whole mode collapse thing, if we do more and more of this at the end of the day?
[00:15:22] swyx: Yeah, that's one form of that. Yeah, exactly. Microsoft also had a good paper on text embeddings. And then I think there's a Meta paper on self-rewarding language models that everyone is very interested in. Another paper was also SPIN. These are all things we covered in the Latent Space Paper Club.
[00:15:37] swyx: But also, you know, I just kind of recommend those as top reads of the month. Yeah, I don't know if there's much else. So then, regarding the potential of it, I think it's high potential because, one, it solves one of the data war issues that we have. Like, OpenAI is paying Reddit 60 million dollars a year for their user-generated data.
[00:15:56] swyx: Google, right?
[00:15:57] Alessio: Not OpenAI.
[00:15:59] swyx: Is it Google? I don't
[00:16:00] Alessio: know. Well, somebody's paying them 60 million, that's
[00:16:04] swyx: for sure. Yes, that is, yeah, yeah, and then I think it's maybe not confirmed who. But yeah, it is Google. Oh my god, that's interesting.
Okay, because everyone was saying that Sam Altman owns 5 percent of Reddit, which is apparently 500 million worth of Reddit; he owns more than the founders.

[00:16:21] Alessio: Not enough to get the data, I guess.

[00:16:22] swyx: So it's surprising that it would go to Google instead of OpenAI, but whatever. Okay, yeah, so I think that's all super interesting in the data field. I think it's high potential because we have evidence that it works. There's no doubt that it works; I think the doubt is about what the ceiling is, which is the mode collapse thing.

[00:16:42] swyx: If it turns out that the ceiling is pretty close, then this will maybe augment our data by, I don't know, 30 to 50 percent. Good, but not game changing.

[00:16:51] Alessio: And most of the synthetic data stuff is reinforcement learning on a pre-trained model. People are not really doing pre-training on fully synthetic data at a large enough scale.

[00:17:02] swyx: Yeah, unless one of our friends that we've talked to succeeds. Yeah, yeah. Pre-training-scale synthetic data, I think that would be a big step. Yeah. And then there's a wildcard.

[00:17:15] Wildcard: Multi-Epoch Training (OLMo, Datablations)

[00:17:15] swyx: For all of these smaller directions, I always put a wildcard in there. And one of the wildcards is, okay, let's say you've scraped all the data on the internet that you think is useful.

[00:17:25] swyx: That seems to top out somewhere between 2 trillion and 3 trillion tokens. Maybe 8 trillion if Mistral gets lucky. Okay, if I need 80 trillion, if I need 100 trillion, where do I go? You can do synthetic data maybe, but maybe that only gets you to like 30 or 40 trillion. Where is the extra alpha?

[00:17:43] swyx: And maybe the extra alpha is just training more on the same tokens.
Which is exactly what OLMo did. Nathan Lambert at AI2: just after he did the interview with us, they released OLMo, so it's unfortunate that we didn't get to talk much about it. But OLMo actually started doing 1.5 epochs on all data.

[00:18:00] swyx: And the data ablation paper that I covered at NeurIPS says that you don't really start to tap out the alpha, the improved loss that you get from data, until four epochs. And so I'm just like, okay, why do we all agree that one epoch is all you need?

[00:18:17] swyx: It seems to be a trend. It seems that we think memorization is very good, or too good. But then we're also finding that, for improvements in results that we really like, we're fine with overtraining on things intentionally. So I think that's an interesting direction that I don't see people exploring enough.

[00:18:36] swyx: And the more I see papers coming out stretching beyond the one-epoch thing, the more people are like, it's completely fine, and actually the only reason we stopped is because we ran out of compute budget.

[00:18:46] Alessio: Yeah, I think that's the biggest thing, right?

[00:18:51] swyx: Like, that's not a valid reason; that's not science.

[00:18:54] Alessio: I wonder if, you know, Meta is going to do it.

[00:18:57] Alessio: I heard that for Llama 3 they want to do a 100 billion parameter model. I don't think you can train that on too many epochs, even with their compute budget, but yeah. They're the only ones that can save us, because even if OpenAI is doing this, they're not going to tell us, you know. Same with DeepMind.

[00:19:14] swyx: Yeah, and the update that we got on Llama 3 so far is apparently that, because of the Gemini news that we'll talk about later, they're pushing back the release.

[00:19:21] swyx: They already have it, and they're just pushing it back to do more safety testing.
Politics testing.

[00:19:28] Alessio: Well, our episode with Soumith will have already come out by the time this comes out, I think. So people will get the inside story on how they actually allocate the compute.

[00:19:38] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers)

[00:19:38] Alessio: Alternative architectures. Well, shout out to RWKV, who won one of the prizes at our Final Frontiers event last week.

[00:19:47] Alessio: We talked about Mamba and StripedHyena on the Together episode. A lot of, yeah, Monarch Mixers. I feel like Together is the strong Stanford Hazy Research partnership, because Chris Ré is one of the co-founders. So I feel like they're going to be the ones that have one of the state-of-the-art models, alongside maybe RWKV.

[00:20:08] Alessio: I haven't seen as many independent people working on this stuff. Monarch Mixer, Mamba, Hyena: all of these are Together-related. Nobody understands the math. They got all the gigabrains, they got Tri Dao, they got all these folks in there working on all of this.

[00:20:25] swyx: Albert Gu, yeah. Yeah, so what should we comment about it?

[00:20:28] swyx: I mean, I think it's useful and interesting, but at the same time, both of these are supposed to do really good scaling for long context. And then Gemini comes out and goes, yeah, we don't need it. Yeah.

[00:20:44] Alessio: No, that's the risk. So, yeah. I was going to say, maybe it's not here, but I don't know if we want to talk about diffusion transformers as part of the alt architectures, just because of Sora.

[00:20:55] swyx: One thing, yeah. So this came from the January recap, and diffusion transformers were not really a discussion then, and then obviously they blew up in February. Yeah.
It's a mixed architecture in the same way that StripedHyena is mixed, where different layers take different approaches.

[00:21:13] swyx: Another one that I maybe didn't call out here, I think because it happened in February, was Hourglass Diffusion from Stability. That's also another form of mixed architecture. So I guess that is interesting. I don't have much commentary on that; I just think we will try to evolve these things, and maybe one of these architectures will stick and scale. It seems like diffusion transformers are going to be good for anything generative, you know, multimodal.

[00:21:41] swyx: We don't see anything where diffusion is applied to text yet, and that's the wildcard for this category. Yeah, I mean, I still hold out hope for, let's just call it, sub-quadratic LLMs. I think a lot of the discussion this month was also centered around this concept that people always raise: transformers don't scale because attention is quadratic in the sequence length.

[00:22:04] swyx: Yeah, but attention is actually a very small part of the compute that is being spent, especially in inference. And this is the reason why, when you jump up in context size in GPT-4 from 8k to 32k, you don't also get a 16 times increase in your compute cost.

[00:22:23] swyx: And this is also why you don't get a million times increase in your latency when you throw a million tokens into Gemini. People have either figured out tricks around it, or it's just not that significant a part of the overall compute. So there are a lot of challenges to this thing working.

[00:22:43] swyx: It's really interesting how hyped people are about this versus, I don't know if it's exactly going to work.
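The "attention is a small part of the compute" point can be checked with back-of-the-envelope arithmetic. Kaplan et al. (2020) approximate forward-pass compute as roughly 2N FLOPs per token for the dense weights plus 2 · n_layer · n_ctx · d_model for the attention scores (assuming the attention width equals d_model). A minimal sketch under that approximation, using a hypothetical 70B-class configuration chosen for illustration only; it counts FLOPs and ignores memory bandwidth, KV-cache traffic, and fused attention kernels:

```python
def forward_flops_per_token(n_params: int, n_layer: int, n_ctx: int, d_model: int):
    """Kaplan et al. (2020) style approximation of forward FLOPs per token.

    dense: ~2 FLOPs per parameter (MLP and projection matmuls)
    attention: ~2 * n_layer * n_ctx * d_model for the QK^T and AV products
    """
    dense = 2 * n_params
    attention = 2 * n_layer * n_ctx * d_model
    return dense, attention

def sequence_flops(n_params: int, n_layer: int, n_ctx: int, d_model: int):
    """Total forward FLOPs for a full sequence of n_ctx tokens."""
    dense, attention = forward_flops_per_token(n_params, n_layer, n_ctx, d_model)
    return n_ctx * dense, n_ctx * attention

# Hypothetical 70B-class config (assumption, for illustration only).
N, LAYERS, D = 70_000_000_000, 80, 8192

d8, a8 = sequence_flops(N, LAYERS, 8_192, D)
d32, a32 = sequence_flops(N, LAYERS, 32_768, D)

attention_growth = a32 / a8             # quadratic term: 4x context -> 16x
total_growth = (d32 + a32) / (d8 + a8)  # what the whole forward pass grows by
attention_share_8k = a8 / (d8 + a8)
print(f"attention grows {attention_growth:.0f}x, total grows {total_growth:.2f}x, "
      f"attention share at 8k = {attention_share_8k:.1%}")
```

With these assumed numbers the attention term does grow 16x when the context quadruples, but because it starts as a small fraction of the total, the full forward pass grows only modestly more than the 4x you would pay anyway for four times as many tokens.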
And then there's also this idea of retention over long context. Even though you have context utilization, the amount you can remember is interesting.

[00:23:02] swyx: Because I've had people criticize both Mamba and RWKV because they're kind of RNN-ish, in the sense that they have a hidden memory, a limited hidden memory, so they will forget things. So, for all these reasons, Gemini 1.5, which we still haven't covered, is very interesting, because Gemini has magically fixed all these problems with perfect haystack recall and reasonable latency and cost.

[00:23:29] Wildcards: Text Diffusion, RALM/Retro

[00:23:29] swyx: So that's super interesting. So, the wildcard I put in here, if you want to go to that. I put two, actually. One is text diffusion. I'm still very influenced by my meeting with a Midjourney person who said they were working on text diffusion. I think it would be a very, very different paradigm for text generation, reasoning, and plan generation if we can get diffusion to work for text.

[00:23:51] swyx: And then the second one is Douwe Kiela's Contextual AI, which is working on retrieval-augmented language models, where it kind of puts RAG inside of the language model instead of outside.

[00:24:02] Alessio: Yeah, there's a paper called RETRO that covers some of this. I think that's an interesting thing. I think the challenge, well, not the challenge, what they need to figure out is how you keep the RAG piece always up to date. With the models, you put all this work into pre-training them, but then at least you have a fixed artifact.

[00:24:22] Alessio: With these architectures, constant work needs to be done on them, and they can drift even just based on the RAG data instead of the model itself.

[00:24:30] swyx: Yeah, I was on a panel with one of the investors in Contextual, and the way that guy pitched it, I didn't agree with.
He was like, this will solve hallucination.

[00:24:38] Alessio: That's what everybody says. We solve hallucination.

[00:24:40] swyx: I'm like, no, you reduce it. It cannot...

[00:24:44] Alessio: If you solved it, the model wouldn't exist, right? It would just be plain text. It wouldn't be a generative model. Cool. So, alt architectures; then we've got mixture of experts. I think we've covered that a lot of times.

[00:24:56] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1)

[00:24:56] Alessio: Maybe any new interesting threads you want to go into here?

[00:25:00] swyx: DeepSeekMoE, which was released in January. Everyone who is interested in MoEs should read that paper, because it's significant for two reasons. Actually, three reasons. One, it had small experts, a lot more small experts. For some reason, everyone has settled on eight experts for GPT-4 and Mixtral; that seems to be the favorite architecture. But these guys pushed it to 64 experts, each of them smaller.

[00:25:26] swyx: Then they also had a second idea, which is that they had one to two always-on experts for common knowledge. That's a very compelling concept: you would not route to all the experts all the time and make them switch for everything; you would have some always-on experts.

[00:25:41] swyx: I think that's interesting on both the inference side and the training side, for memory retention. And the results that they published, which actually excluded Mixtral, which is interesting, showed a significant performance jump versus all the other open source models at the same parameter count.

[00:26:01] swyx: So this may be a better way to do MoEs that is about to get picked up. And that is interesting for the third reason, which is that this is the first time a new idea from China has infiltrated the West.
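The two DeepSeekMoE ideas described here, many small routed experts plus a few always-on shared experts, can be illustrated with a toy NumPy layer. This is a sketch for intuition, not the paper's code: the sizes, `TOP_K`, and the random weights are illustrative assumptions, there is no batching, load-balancing loss, or training, and the gate weights are simply the softmax affinities of the selected experts.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # toy model width
H = 8           # toy hidden width inside each small expert
N_SHARED = 2    # always-on experts, applied to every token
N_ROUTED = 64   # many fine-grained routed experts
TOP_K = 6       # routed experts activated per token (illustrative choice)

def make_expert():
    """A small two-layer MLP expert: (W_in, W_out)."""
    return (rng.standard_normal((D, H)) * 0.1, rng.standard_normal((H, D)) * 0.1)

def expert_forward(params, x):
    w_in, w_out = params
    return np.maximum(x @ w_in, 0.0) @ w_out  # ReLU MLP

shared_experts = [make_expert() for _ in range(N_SHARED)]
routed_experts = [make_expert() for _ in range(N_ROUTED)]
router = rng.standard_normal((D, N_ROUTED)) * 0.1  # token-to-expert affinity

def moe_layer(x):
    """One MoE layer for a single token vector x of shape (D,)."""
    # 1. Shared experts see every token (the "common knowledge" path).
    out = sum(expert_forward(p, x) for p in shared_experts)
    # 2. Softmax affinities over routed experts; only the top-k are evaluated.
    logits = x @ router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    chosen = np.argsort(probs)[-TOP_K:]
    for i in chosen:
        out = out + probs[i] * expert_forward(routed_experts[i], x)
    # 3. Residual connection.
    return x + out, chosen

y, chosen = moe_layer(rng.standard_normal(D))
print(y.shape, sorted(chosen.tolist()))
```

Only `N_SHARED + TOP_K` of the 66 expert MLPs run for a given token, which is where the inference savings of fine-grained experts come from.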
It's usually the other way around. I probably overspoke there. There are probably lots more ideas that I'm not aware of.

[00:26:18] swyx: Maybe in the embedding space. But I think DeepSeekMoE woke people up and said, hey, DeepSeek, this weird lab that is attached to a Chinese hedge fund, is somehow doing groundbreaking research on MoEs. So I classified this as medium potential, because I think it is sort of a one-off benefit.

[00:26:37] swyx: You can add it to any base model to make the MoE version of it; you get a bump, and then that's it. So, yeah.

[00:26:45] Alessio: I saw SambaNova, which is another inference company. They released this MoE model called Samba-1, which is like 1 trillion parameters. But it's actually an MoE of open source models.

[00:26:56] Alessio: So they just clustered them all together. I think sometimes people think MoE means you just train a bunch of smaller models and put them together. But there are also people just taking, you know, Mistral plus CLIP plus DeepSeek Coder and putting them all together.

[00:27:15] Alessio: And then you have an MoE model. I don't know; I haven't tried the model, so I don't know how good it is. But it seems interesting that you can then have people working separately on state-of-the-art CLIP and state-of-the-art text generation, and then you have an MoE architecture that brings them all together.

[00:27:31] swyx: I'm thrown off by your addition of the word CLIP in there. Is that what...?

[00:27:35] Alessio: Yeah, that's what they said. Yeah, yeah. Okay. That's what they... I just saw it yesterday. I was also scratching my head.

[00:27:40] swyx: And they did not use the word adapter. No.
Because usually what people mean when they say, oh, I add CLIP to a language model, is an adapter.

[00:27:48] swyx: Which is what LLaVA did.

[00:27:50] Alessio: Let me look up the announcement again.

[00:27:51] swyx: Stable Diffusion, that's what they do.

[00:27:54] Alessio: Yeah, it says among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, DePlot, CLIP, and LLaVA. So they're just taking all these models and putting them in an MoE.

[00:28:05] swyx: Okay, so a routing layer, and then not jointly trained as much as a normal MoE would be.

[00:28:12] swyx: Which is okay.

[00:28:13] Alessio: That's all they say. There's no paper, you know, so I'm just reading the article, but I'm interested to see how it works.

[00:28:20] Wildcard: Model Merging (mergekit)

[00:28:20] swyx: Yeah, so the wildcard for this section, the MoE section, is model merges, which has also come up as a very interesting phenomenon. The last time I talked to Jeremy Howard, at the Ollama meetup, we called it model grafting or model stacking.

[00:28:35] swyx: But I think the term that people are liking these days is model merging. There are all different variations of merging, different merge types; some of them are stacking, some of them are grafting. And some people are approaching model merging the way Samba is doing it, which is: okay, here are different models, each of which has its specific pluses and minuses, and we will merge them together in the hope that the sum of the parts will be better than the others.

[00:28:58] swyx: And it seems like it's working. I don't really understand why it works, apart from, I think, it's a form of regularization.
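The simplest form of the model merging described here is a weighted average of parameters across models that share an architecture, which needs nothing more than elementwise arithmetic and runs fine on a CPU. A minimal sketch, using NumPy arrays as stand-ins for checkpoint tensors and hypothetical parameter names; this is not mergekit's API, and other merge methods (SLERP, TIES, layer stacking or grafting) are more involved. Note that this kind of weight merge is different from Samba-1's approach of routing between separate models.

```python
import numpy as np

def linear_merge(state_dicts, weights=None):
    """Weighted average of checkpoints with identical parameter names and shapes."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("need one weight per model")
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("merge weights should sum to 1")
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("models must share the same parameter names")
    merged = {}
    for name in names:
        # "Adding weights together and dividing": a per-tensor weighted sum.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two hypothetical fine-tunes of the same base model (toy tensors).
model_a = {"layer.weight": np.array([1.0, 2.0]), "layer.bias": np.array([0.0])}
model_b = {"layer.weight": np.array([3.0, 4.0]), "layer.bias": np.array([1.0])}
merged = linear_merge([model_a, model_b])
print(merged["layer.weight"], merged["layer.bias"])
```

Passing unequal `weights`, for example `[0.7, 0.3]`, biases the merge toward one parent model.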
That is, if you merge weights together with a smart strategy, you get less overfitting and more generalization, which is good for benchmarks, if you're honest about your benchmarks.

[00:29:16] swyx: So this is really interesting and good. But again, they're kind of limited in terms of the size of the bump you can get. But I think it's very interesting in the sense of how cheap it is. We talked about this on the ChinaTalk podcast, the guest podcast that we did with ChinaTalk. You can do this without GPUs, because it's just adding weights together, dividing things, and doing simple math, which is really interesting for the GPU poor.

[00:29:42] Alessio: There's a lot of them.

[00:29:44] Direction 5: Online LLMs (Gemini Pro, Exa)

[00:29:44] Alessio: And just to wrap these up, online LLMs?

[00:29:48] swyx: Yeah, I had to feature this because one of the top news items of January was that Gemini Pro beat GPT-4 on LMSys for the number two slot, behind GPT-4 Turbo. And everyone was very surprised. Like, how does Gemini do that?

[00:30:06] swyx: Surprise, surprise, they added Google Search to the results. So it became a quote-unquote online LLM and not an offline LLM. Therefore it's much better at answering recent questions, which people like. There's an emerging set of table-stakes features after you pre-train something.

[00:30:21] swyx: So after you pre-train something, you should have the chat-tuned version of it, or the instruct-tuned version of it, however you choose to call it. You should have the JSON and function calling version of it; structured output, the term that you don't like. You should have the online version of it. These are all table-stakes variants that you should do when you offer a base LLM, or when you train a base LLM.

[00:30:44] swyx: And I think online is just there; it's important.
I think companies like Perplexity, and even Exa, formerly Metaphor, are rising to offer that search need. They're just necessary parts of a system: you have RAG for internal knowledge, and then you have online search for external knowledge, things that you don't know yet.

[00:31:06] swyx: And it seems like it's one of many tools. I feel like I may be underestimating this, but I'm just going to put it out there that I think it has some potential. One of the evidence points that it doesn't actually matter that much is that Perplexity has had online LLMs for three months now, and they don't perform great on LMSys.

[00:31:25] swyx: It's like number 30 or something. So it's like, okay, it helps, but it doesn't give you a giant boost.

[00:31:34] Alessio: I feel like a lot of stuff I do with LLMs doesn't need to be online. So I'm always wondering, again going back to state of the art, right: state of the art for whom, and for what?

[00:31:45] Alessio: I think online LLMs are going to be state of the art for news-related activity that you need to do. Like social media, right? You want to have all the latest stuff. But coding, science...

[00:32:01] swyx: Yeah, but I think sometimes you don't know what is news, or what news is affecting.

[00:32:07] swyx: Like, the decision to use an offline LLM is already a decision that you might not be consciously making, and it might affect your results. Being connected online means that you get to invalidate your knowledge.
And when you're just using an offline LLM, it's never invalidated.

[00:32:28] Alessio: I agree, but going back to your point about standing the test of time, I think sometimes you can get swayed by the online stuff. Say you ask a question about, maybe, AI research directions, and all the recent news is about one thing, so the LLM focuses on bringing up those things.

[00:32:50] swyx: Yeah, so I think it's interesting, but I don't know if I can bet heavily on this.

[00:32:56] Alessio: Cool. Was there one that you forgot to put in, or a new direction?

[00:33:01] swyx: Yeah, so this brings us into sort of February-ish.

[00:33:05] OpenAI Sora and why everyone underestimated videogen

[00:33:05] swyx: So I published this, and then on like the 15th came Sora. And the one thing I did not mention here was anything about multimodality.

[00:33:16] swyx: Right. And I have chronically underweighted this. I always wrestle with it. My cop-out is that I focused this research direction piece on LLMs because LLMs are the source of, quote-unquote, AGI. Everything else is kind of related to that; just because I can generate better images or better videos, it feels like it's not on the critical path to AGI, which is something that Nat Friedman also observed the day before Sora, which is kind of interesting.

[00:33:49] swyx: And so I was just trying to focus on what is going to get us superhuman reasoning that we can rely on to build agents that automate our lives and, blah, blah, blah, give us this utopian future. But I do think everybody underestimated the sheer importance and cultural human impact of Sora.

[00:34:10] swyx: And, you know, really actually good text-to-video. Yeah.
Yeah.

[00:34:14] Alessio: And I saw Jim Fan had a very good tweet about why it's so impressive. When you have somebody leading the embodied research at NVIDIA saying that something is impressive, you should probably listen. So yeah, I think you mentioned impacting the world that we live in.

[00:34:33] Alessio: I think that's kind of the key, right? The LLMs don't have a world model, and Yann LeCun can come on the podcast and talk all about what he thinks of that. But I think Sora was the first time where people were like, oh, okay, you're not statically putting pixels of water on the screen, which you can kind of project without understanding the physics of it.

[00:34:57] Alessio: Now you have to understand how the water splashes. And even if you just learned it by watching video and not by actually studying the physics, you still know it. So I think that's a direction you didn't have before, and now you can do things that you couldn't before, starting with generating. I think it always starts with generating, right?

[00:35:19] Alessio: But the interesting part is understanding it. There's the video of the ship in the water that they generated with Sora. If you gave it the video back and it could tell you why the ship is too rocky, or why the ship is sinking, then that's, you know, AGI for all your rig deployments and all this stuff. But there's none of that yet.

[00:35:44] Alessio: Hopefully they announce it and talk more about it. Maybe at Dev Day this year, who knows.

[00:35:49] swyx: Yeah, who knows. I'm talking with them about Dev Day as well.
So I would say the phrasing that Jim used, which resonated with me: he kind of called it a data-driven world model. I somewhat agree with that.

[00:36:04] Does Sora have a World Model? Yann LeCun vs Jim Fan

[00:36:04] swyx: I am more on Yann LeCun's side than on Jim's side, in the sense that I think that is the vision, or the hope, that these things can build world models. But clearly, even at the current Sora size, they don't have strong consistency yet. They have very good consistency, but fingers and arms and legs will appear and disappear, and chairs will appear and disappear.

[00:36:31] swyx: That definitely breaks physics. And it also makes me think about how we do deep learning versus world models, in the sense that in classic machine learning, when you have too many parameters, you will overfit, and that does not match reality and therefore fails to generalize well.

[00:36:50] swyx: And what scale of data do we need in order to learn world models from video? A lot. Yeah. So I am cautious about taking this interpretation too literally. Obviously I get what he's going for, and he's obviously partially right: transformers and these neural networks are universal function approximators, so theoretically they could figure out world models. It's just a question of how good they are, and how tolerant we are of hallucinations. We're not very tolerant. So it's going to bias us toward creating very convincing things, but not the useful world models that we want.

[00:37:37] swyx: At the same time, what you just said made me reflect a little bit. We just got done saying how important synthetic data is for training LLMs.
So if this is a way of producing synthetic video data for improving our video understanding, then sure, by all means. Which we actually know: GPT-4 Vision and DALL-E were kind of co-trained together.

[00:38:02] swyx: And so maybe this is on the critical path, and I just don't fully see the full picture yet.

[00:38:08] Alessio: Yeah, I don't know. I think there's a lot of interesting stuff. Imagine you have Sora, you go back in time, and Newton hasn't figured out gravity yet. Would Sora help you figure it out?

[00:38:21] Alessio: Because you start saying, okay, a man standing under a tree with apples falling, and it's like, oh, they're always falling at the same speed in the video. Why is that? I feel like sometimes these engines can pick things up. Humans have a lot of intuition, but if you ask the average person about the physics of a fluid in a boat, they wouldn't be able to tell you the physics, but they can observe it. But humans can only observe so much, versus now you have these models observing everything, and then they generalize these things, and maybe we can learn new things through the generalizations that they pick up.

[00:38:55] swyx: But again, it might be more observant than us in some respects. In some ways we can scale it up a lot more than the number of physicists that were available in Newton's time. So yeah, it's absolutely possible that this can discover new science. I think we have a lot of work to do to formalize the science.

[00:39:11] swyx: And then, I think the last part is: how much do we cheat by generating data from Unreal Engine 5? Which is what a lot of people are speculating, with very, very limited evidence, that OpenAI did.
The strongest evidence that I saw was someone who works a lot with Unreal Engine 5 looking at the side characters in the videos and noticing that they all adopt Unreal Engine defaults.

[00:39:37] swyx: Like walking speed, and character creation choices. And I was like, okay, that's actually pretty convincing that they used Unreal Engine to bootstrap some synthetic data for this training set.

[00:39:52] Alessio: Yeah, could very well be.

[00:39:54] swyx: Because then you get the labels and the training data side by side.

[00:39:58] swyx: One thing that came up on the last day of February, which I should also mention, is EMO, coming out of Alibaba, which is also a sort of video generation and space-time transformer that probably also involves a lot of synthetic data. And so this is of a kind, in the sense that really good generative video is here, and it's not just the one- or two-second clips that we saw from other people like Pika and Runway. Cristobal Valenzuela from Runway was like, game on. Which, okay, but let's see your response, because we've heard a lot about Gen-1 and Gen-2, but it's nothing on the level of Sora. So it remains to be seen how we can actually apply this, but I do think that the creative industry should start preparing.

[00:40:50] swyx: I think the Sora technical blog post from OpenAI was really good. It was like a request for startups. It was so good at spelling out the individual industries that this can impact.

[00:41:00] swyx: And anyone who's interested in generative video should look at that. But also be mindful that, probably, when OpenAI releases a Sora API, the ways you can interact with it will be very limited.
Just like the ways you can interact with DALL-E are very limited, and someone is going to have to make an open Sora for you to create ComfyUI pipelines.

[00:41:24] Alessio: The Stability folks said they want to build an open Sora competitor, but yeah, Stability's demo video was so underwhelming. It was just two people sitting on the beach.

[00:41:34] swyx: Standing. Well, they don't have it yet, right? Yeah, yeah.

[00:41:36] swyx: I mean, they just want to train it. Everybody wants to, right? Yeah. I think what is confusing a lot of people about Stability is that they're pushing a lot of things: Stable Code, Stable LM, and Stable Video Diffusion. But how much money do they have left? How many people do they have left?

[00:41:51] swyx: Yeah. Emad spent two hours with me reassuring me things are great. And I do believe that they have really, really high-quality people. But I also have a lot of very smart people on the other side telling me, hey man, don't put too much faith in this thing.

[00:42:11] swyx: So I don't know who to believe. Yeah.

[00:42:14] Alessio: It's hard. Let's see. What else? We've got a lot more stuff. I don't know if we can... Yeah, Groq.

[00:42:19] Groq Math

[00:42:19] swyx: We can do a bit of Groq prep. We're about to go talk to Dylan Patel. Maybe we'll put the audio in here, I don't know; it depends on what we get up to later. What do you, as an investor, think about Groq? Yeah. Well, actually, can you recap why Groq is interesting?

[00:42:33] Alessio: So, Jonathan Ross, who's the founder of Groq, is the person who created the TPU at Google. It was actually one of his 20 percent projects.
It's like he was just on the side, dooby doo, and created the TPU.

[00:42:46] Alessio: But yeah, basically, Groq had this demo that went viral, where they were running Mixtral at like 500 tokens a second, which is the fastest of anything that you have out there. The memes were like, is NVIDIA dead? People don't need H100s anymore. I think there's a lot of money that goes into building what Groq has built as far as the hardware goes.

[00:43:11] Alessio: We're going to put some of the notes from Dylan in here, but basically the cost of the Groq system is like 30 times the cost of the H100 equivalent.

[00:43:23] swyx: So let me... I put some numbers in, because me and Dylan were, I think, the two people who actually tried to do Groq math. Spreadsheet dorks.

[00:43:30] swyx: Spreadsheet dorks. So, okay, oh boy. The equivalent H100 setup for Llama 2 is 300,000 for a system of 8 cards. And for Groq it's 2.3 million, because you have to buy 576 Groq cards. So that just gives people an idea. If you depreciate both over a five-year lifespan, per year you're depreciating 460K for Groq and 60K for H100.

[00:43:59] swyx: So Groqs are just way more expensive per model that you're hosting. But then you make it up in terms of volume. So I don't know if you want to cover that.

[00:44:08] Alessio: I think one of the promises of Groq is super high parallel inference on the same thing. So you're basically saying, okay, I'm putting in this upfront investment on the hardware, but then I get much better scaling once I have it installed.

[00:44:24] Alessio: I think the big question is how much you can sustain the parallelism.
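The depreciation arithmetic above can be reproduced directly. This sketch uses only the figures quoted in the conversation (swyx's spreadsheet estimates, not vendor pricing) and assumes straight-line depreciation over five years with no salvage value:

```python
def annual_depreciation(capex_usd: float, lifespan_years: int = 5) -> float:
    """Straight-line depreciation: spread the purchase price evenly over its life."""
    return capex_usd / lifespan_years

H100_SYSTEM_USD = 300_000    # 8-card H100 system for Llama 2 (figure quoted above)
GROQ_SYSTEM_USD = 2_300_000  # 576 Groq cards (figure quoted above)
GROQ_CARDS = 576

h100_per_year = annual_depreciation(H100_SYSTEM_USD)
groq_per_year = annual_depreciation(GROQ_SYSTEM_USD)
print(f"H100: ${h100_per_year:,.0f}/yr, Groq: ${groq_per_year:,.0f}/yr, "
      f"ratio {groq_per_year / h100_per_year:.1f}x, "
      f"implied Groq price per card ${GROQ_SYSTEM_USD / GROQ_CARDS:,.0f}")
```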
If you're going to get a 100% utilization rate at all times on Groq, it's just much better, because at the end of the day the tokens-per-second cost that you're getting is better than with H100s. But if you get to like a 50 percent utilization rate, you will be much better off running on NVIDIA.

[00:44:49] Alessio: And if you look at most companies out there, who really gets a 100 percent utilization rate? Probably OpenAI at peak times, but that's probably it. But yeah, curious to see more. I saw Jonathan was just at the Web Summit in Doha, in Qatar. He just gave a talk there yesterday that I haven't listened to yet.

[00:45:09] Alessio: I tweeted that he should come on the pod. He liked it. And then Groq followed me on Twitter. I don't know if that means that they're interested, but...

[00:45:16] swyx: Hopefully the Groq social media person is just very friendly. Yeah.

[00:45:20] Alessio: Hopefully we can get them. Yeah, we're going to get him.

[00:45:22] swyx: We'll just call him out. And so basically the key question is how sustainable this is, and how much of it is a loss leader.

[00:45:27] swyx: The entire Groq management team has been on Twitter and Hacker News saying they are very, very comfortable with the pricing of $0.27 per million tokens. This is the lowest that anyone has offered tokens for Mixtral or Llama 2. This matches DeepInfra, and I think that's about it in terms of going that low.

[00:45:47] swyx: And we think the break-even for H100s is 50 cents, at a normal utilization rate. To make this work in my spreadsheet, you have to have a parallelism of 500 requests, all simultaneously, and a model bandwidth utilization of 80%.

[00:46:06] swyx: Which is way high. I just gave them high marks for everything.
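Why utilization dominates the comparison can be seen from a simple cost model: amortized hardware cost per token is the annual cost divided by the tokens actually served, so it scales as one over utilization. A minimal sketch, assuming the $460K/year depreciation figure from above and a HYPOTHETICAL peak aggregate throughput (a placeholder, not a Groq specification); it ignores power, hosting, networking, and margin:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def cost_per_million_tokens(annual_cost_usd: float, peak_tokens_per_s: float,
                            utilization: float) -> float:
    """Amortized hardware cost per 1M tokens when busy `utilization` of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_year = peak_tokens_per_s * utilization * SECONDS_PER_YEAR
    return annual_cost_usd / tokens_per_year * 1_000_000

ANNUAL_COST = 460_000  # Groq depreciation per year, from the figures above
PEAK_TPS = 50_000      # HYPOTHETICAL aggregate throughput, placeholder only

for u in (1.0, 0.8, 0.5, 0.25):
    cost = cost_per_million_tokens(ANNUAL_COST, PEAK_TPS, u)
    print(f"utilization {u:>4.0%}: ${cost:.3f} per 1M tokens")
```

Halving utilization doubles the cost per token, which is why a system with a high upfront price only wins if it can be kept busy with many parallel requests.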
Groq has two fundamental tech innovations that they hang their hats on in terms of, like, why we are better than everyone, even though it remains to be independently replicated. One, they have this idea of the entire model on the chip, which is like, okay, get rid of HBM.
[00:46:30] swyx: And put everything in SRAM. Like, okay, fine, but then you need a lot of cards and whatever. And that's all okay. And so, because you don't have to transfer between memory, you just save on that time, and that's why they're faster. So a lot of people buy that as the reason that they're faster.
[00:46:45] swyx: Then they have some kind of crazy compiler, or speculative routing magic using compilers, that they also attribute their higher utilization to. So I give them 80 percent for that. And so all that works out to, okay, base costs, I think you can get down to maybe 20-something cents per million tokens.
[00:47:04] swyx: And therefore you actually are fine if you have that kind of utilization. But I have to make a lot of fearful assumptions for this to work.
[00:47:12] Alessio: Yeah. I'm curious to see what Dylan says later.
[00:47:16] swyx: So he was completely opposite of me. He's like, they're just burning money. Which is great.
[00:47:22] Analyzing Gemini's 1M Context, Reddit deal, Imagegen politics, Gemma via the Four Wars
[00:47:22] Alessio: Gemini, want to do a quick run-through, since this touches on all the Four Wars?
[00:47:28] swyx: Yeah, and I think this is the mark of a useful framework: when a new thing comes along, you can break it down in terms of the Four Wars, slot it in or analyze it in those four frameworks, and have nothing left over.
[00:47:41] swyx: So it's a MECE categorization. MECE is Mutually Exclusive and Collectively Exhaustive. And that's a really nice way to think about taxonomies and to create mental frameworks.
So, what is Gemini 1.5 Pro? It is the newest model, and it came out one week after Gemini 1.0, which is very interesting.
[00:48:01] swyx: They have not really commented on why. The headline feature is that it has a 1 million token context window that is multimodal, which means that you can put all sorts of video and audio and PDFs natively in there alongside text, and it's at least 10 times longer than anything that OpenAI offers, which is interesting.
[00:48:20] swyx: So it's great for prototyping, and it has sparked interesting discussions on whether it kills RAG.
[00:48:25] Alessio: Yeah, no, I mean, we always talk about, you know, long context is good, but you're getting charged per token. So, yeah, people love for you to use more tokens in the context. And RAG is better economics. But I think it all comes down to how the price curves change, right?
[00:48:42] Alessio: I think if anything, RAG's complexity goes up and up the more you use it, because you have more data sources, more things you want to put in there. The token costs should go down over time if the model stays fixed. If people are happy with the model today, in two or three years it's just gonna cost a lot less, you know?
[00:49:02] Alessio: So now it's like, why would I use RAG and go through all of that? It's interesting. I think RAG is better cutting-edge economics for LLMs. I think large context will be better long-tail economics when you factor in the build cost of managing a RAG pipeline. But yeah, the recall was the most interesting thing, because we've seen the needle-in-the-haystack things in the past, but apparently they have 100 percent recall on anything across the context window.
[00:49:28] Alessio: At least, they say. Nobody has used it. No, people
[00:49:30] swyx: have.
Yeah, so this needle-in-a-haystack thing, for people who aren't following as closely as us: someone, I forget his name now, created this needle-in-a-haystack problem where you feed in a whole bunch of generated junk, not junk, but just generated data, and ask the model to specifically retrieve something in that data, like one line in a hundred thousand lines that has a specific fact, and if it gets it, you're good.
[00:49:57] swyx: And then he moves the needle around: does your ability to retrieve it vary if I put it at the start versus in the middle versus at the end? And then you generate this really nice chart that shows the recall of a model. And he did that for GPT and Anthropic and showed that Anthropic did really, really poorly.
[00:50:15] swyx: And then Anthropic came back and said it was a skill issue, just add these four magic words, and then it's magically all fixed. And obviously everybody laughed at that. But what Gemini came out with was, yeah, we reproduced their haystack test for Gemini, and it's good across all, all languages.
[00:50:30] swyx: All the one million token window, which is very interesting, because usually for typical context extension methods like RoPE or YaRN or ALiBi, it's lossy by design. Usually for conversations that's fine, because we are lossy when we talk to people, but superhuman intelligence means perfect memory across very, very long context.
[00:50:51] swyx: It's very, very interesting for picking things up. And so the people who have been given beta access to Gemini have been testing this. So what you do is you upload, let's say, all of Harry Potter, and you change one fact in one sentence somewhere in there, and you ask it to pick it up, and it does.
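The needle-in-a-haystack protocol described here can be sketched as a small harness: build filler text, insert one fact at a chosen relative depth, ask for it back, and record whether the answer contains it. A minimal Python sketch; `ask_model` is a hypothetical stand-in for whatever model API is under test, and the filler text, needle, and question are placeholders rather than the original benchmark's data:

```python
from typing import Callable

def build_haystack(n_lines: int, needle: str, depth: float) -> str:
    """Return n_lines of filler with the needle inserted at a relative depth in [0, 1]."""
    lines = [f"Filler line {i}: nothing important here." for i in range(n_lines)]
    lines.insert(round(depth * n_lines), needle)
    return "\n".join(lines)

def needle_recall(ask_model: Callable[[str], str], n_lines: int,
                  depths: list[float]) -> dict[float, bool]:
    """Probe the model at several needle depths; True means the secret was retrieved."""
    secret = "The magic number is 7481."
    results = {}
    for depth in depths:
        prompt = build_haystack(n_lines, secret, depth) + "\n\nQuestion: What is the magic number?"
        results[depth] = "7481" in ask_model(prompt)
    return results

def fake_model(prompt: str) -> str:
    # Hypothetical "model" that simply scans its prompt; replace with a real API call.
    for line in prompt.splitlines():
        if line.startswith("The magic number"):
            return line
    return "I don't know."

print(needle_recall(fake_model, n_lines=1000, depths=[0.0, 0.5, 1.0]))
```

A real run would sweep context length as well as depth and plot the resulting grid as the kind of chart described in the conversation.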
So this is legit.
[00:51:08] swyx: We don't super know how, because, yes, it's slow to inference, but it's not slow enough that it's running five different systems in the background without telling you. Right. So it's something interesting that they haven't fully disclosed yet. The open source community has centered on this Ring Attention paper, which was created by your friend Matei Zaharia and a couple of other people.
[00:51:36] swyx: And it's a form of distributing the compute. I don't super understand why computing the feedforward network and attention in blockwise fashion and distributing it makes it so good at recall. I don't think they have any answer to that. The only thing that Ring Attention is really focused on is basically infinite context.
[00:51:59] swyx: They said it was good for like 10 to 100 million tokens, which is just great. So yeah, using the Four Wars framework, what is this framework for Gemini? One is the sort of RAG and Ops war. Here we care less about RAG now. Or, we still care as much about RAG, but now it's not important in prototyping.
[00:52:21] swyx: And then, for the data war, I guess this is just part of the overall training dataset, but Google made a $60 million deal with Reddit and presumably they have deals with other companies. For the multimodality war, we can talk about the image generation crisis, or the fact that Gemini also has image generation, which we'll talk about in the next section.
[00:52:42] swyx: But it also has video understanding. I think the top Gemini post came from our friend Simon Willison, who basically did a short video of him scanning over his bookshelf, and it was able to convert that video into a JSON output of what's on that bookshelf. And I think that is very useful.
[00:53:04] swyx: It actually ties into the conversation that we had with David Luan from Adept.
In a sense of, like, okay, what if video was the main modality instead of text as the input? What if everything was video in, because that's how we work. Our eyes don't actually read; our brains don't get inputs as characters.
[00:53:25] swyx: Our brains get the pixels shooting into our eyes, and then our vision system takes over first, and then we sort of mentally translate that into text later. And so it's kind of like what Adept is doing, which is driving by vision model instead of driving by raw text understanding of the DOM. And in that episode, which we haven't released, I made the analogy to self-driving by lidar versus self-driving by camera.
[00:53:52] swyx: Mm-hmm, right? Like, I think what Gemini and any other super-long-context model that is multimodal unlocks is: what if you just drive everything by video? Which is
[00:54:03] Alessio: cool. Yeah, and that's Joseph from Roboflow. It's like anything that can be seen can be programmable with these models.
[00:54:12] Alessio: You mean
[00:54:12] swyx: the computer vision guy is bullish on computer vision?
[00:54:18] Alessio: It's like the RAG people. The RAG people are bullish on RAG and not a lot of context. I'm very surprised. The fine-tuning people love fine-tuning instead of few-shot. Yeah. I think the Ring Attention thing, and how they did it, we don't know. And then they released the Gemma models, which are 2 billion and 7 billion parameter open
[00:54:41] Alessio: models, which people said are not good, based on my Twitter experience, which are the GPU-poor crumbs. It's like, hey, we did all this work for us because we're GPU rich and we're just going to run this whole thing. And


Ring Attention with Blockwise Transformers for Near-Infinite Context

Papers Read on AI

Feb 26, 2024 · 26:42

Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands imposed by Transformers limit their ability to handle long sequences, thereby posing challenges in utilizing videos, actions, and other long-form sequences and modalities in complex environments. We present a novel approach, Ring Attention with Blockwise Transformers (Ring Attention), which leverages blockwise computation of self-attention and feedforward to distribute long sequences across multiple devices while fully overlapping the communication of key-value blocks with the computation of blockwise attention. Our approach enables training and inference of sequences that are up to device count times longer than those achievable by prior memory-efficient Transformers, without resorting to approximations or incurring additional communication and computation overheads. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of our approach in allowing millions of tokens context size and improving performance. 2023: Hao Liu, Matei Zaharia, Pieter Abbeel https://arxiv.org/pdf/2310.01889v4.pdf
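To make the blockwise idea concrete, here is a single-process Python simulation, a sketch under our own assumptions rather than the authors' implementation: each simulated device keeps one query block, key-value blocks rotate around the ring one hop per step, and each device folds every incoming block into running softmax statistics (an online softmax), so the result matches ordinary attention exactly. It uses non-causal attention and runs the ring sequentially, so it shows the arithmetic, not the communication/computation overlap the abstract describes; all function names are illustrative.

```python
import math
import random

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def full_attention(q_rows, k_rows, v_rows):
    """Reference: standard softmax(Q K^T / sqrt(d)) V over the whole sequence."""
    out = []
    for q in q_rows:
        scores = [dot(q, k) / math.sqrt(len(q)) for k in k_rows]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, v_rows)) / z
                    for j in range(len(v_rows[0]))])
    return out

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Simulated ring: device `dev` holds q_blocks[dev]; KV blocks arrive one hop per step."""
    n_dev = len(q_blocks)
    d_v = len(v_blocks[0][0])
    outputs = []
    for dev in range(n_dev):
        # Per query row: running max score, running softmax denominator, running weighted sum of V.
        state = [(-math.inf, 0.0, [0.0] * d_v) for _ in q_blocks[dev]]
        for step in range(n_dev):
            src = (dev - step) % n_dev  # KV block held by this device after `step` hops
            for i, q in enumerate(q_blocks[dev]):
                m, z, acc = state[i]
                for k, v in zip(k_blocks[src], v_blocks[src]):
                    s = dot(q, k) / math.sqrt(len(q))
                    m_new = max(m, s)
                    rescale = math.exp(m - m_new)  # shrink earlier terms to the new max
                    p = math.exp(s - m_new)
                    z = z * rescale + p
                    acc = [a * rescale + p * vj for a, vj in zip(acc, v)]
                    m = m_new
                state[i] = (m, z, acc)
        outputs.extend([a / z for a in acc] for _, z, acc in state)
    return outputs

def split_blocks(rows, n_blocks):
    size = len(rows) // n_blocks
    return [rows[b * size:(b + 1) * size] for b in range(n_blocks)]

random.seed(0)
d, n_dev, seq_len = 4, 3, 12
q, k, v = ([[random.uniform(-1, 1) for _ in range(d)] for _ in range(seq_len)] for _ in range(3))
ring = ring_attention(split_blocks(q, n_dev), split_blocks(k, n_dev), split_blocks(v, n_dev))
ref = full_attention(q, k, v)
max_err = max(abs(a - b) for r1, r2 in zip(ring, ref) for a, b in zip(r1, r2))
print(f"max |ring - full| = {max_err:.1e}")
```

The rescaling step is what lets each device process the sequence one block at a time without ever materializing the full score matrix, which is why context length can grow with the number of devices.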


World Model on Million-Length Video And Language With RingAttention

Papers Read on AI

Feb 17, 2024 · 30:00

Current language models fall short in understanding aspects of the world not easily described in words, and struggle with complex, long-form tasks. Video sequences offer valuable temporal information absent in language and static images, making them attractive for joint modeling with language. Such models could develop an understanding of both human textual knowledge and the physical world, enabling broader AI capabilities for assisting humans. However, learning from millions of tokens of video and language sequences poses challenges due to memory constraints, computational complexity, and limited datasets. To address these challenges, we curate a large dataset of diverse videos and books, utilize the RingAttention technique to scalably train on long sequences, and gradually increase context size from 4K to 1M tokens. This paper makes the following contributions: (a) Largest context size neural network: We train one of the largest context size transformers on long video and language sequences, setting new benchmarks in difficult retrieval tasks and long video understanding. (b) Solutions for overcoming vision-language training challenges, including using masked sequence packing for mixing different sequence lengths, loss weighting to balance language and vision, and model-generated QA dataset for long sequence chat. (c) A highly-optimized implementation with RingAttention, masked sequence packing, and other key features for training on million-length multimodal sequences. (d) Fully open-sourced a family of 7B parameter models capable of processing long text documents (LWM-Text, LWM-Text-Chat) and videos (LWM, LWM-Chat) of over 1M tokens. This work paves the way for training on massive datasets of long video and language to develop understanding of both human knowledge and the multimodal world, and broader capabilities. 2024: Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel https://arxiv.org/pdf/2402.08268.pdf
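Masked sequence packing, named in contribution (b), can be illustrated with a mask builder: several sequences of different lengths share one training row, and the attention mask stays causal within each sequence while blocking attention across sequence boundaries and into padding. A minimal pure-Python sketch of the general technique; the function names and the segment-id scheme are our own assumptions, not the LWM implementation:

```python
def pack_sequences(seqs: list[list[int]], pad_id: int, row_len: int):
    """Concatenate token sequences into one row; return tokens and per-token segment ids (0 = padding)."""
    tokens, segments = [], []
    for seg_id, seq in enumerate(seqs, start=1):
        tokens.extend(seq)
        segments.extend([seg_id] * len(seq))
    if len(tokens) > row_len:
        raise ValueError("sequences do not fit in one row")
    pad = row_len - len(tokens)
    return tokens + [pad_id] * pad, segments + [0] * pad

def packed_causal_mask(segments: list[int]) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j: same non-padding segment and j <= i."""
    n = len(segments)
    return [[segments[i] != 0 and segments[i] == segments[j] and j <= i for j in range(n)]
            for i in range(n)]

tokens, segments = pack_sequences([[11, 12, 13], [21, 22]], pad_id=0, row_len=6)
mask = packed_causal_mask(segments)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
# prints:
# 1.....
# 11....
# 111...
# ...1..
# ...11.
# ......
```

The block-diagonal shape is the point: packing wastes no compute on padding between short sequences, yet no token sees a token from a different sequence.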


DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Papers Read on AI

Jan 31, 2024 · 47:55

The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy 2023: O. Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts https://arxiv.org/pdf/2310.03714.pdf
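The abstract's "learn by creating and collecting demonstrations" step can be illustrated with a toy bootstrap loop: run a program over training inputs, keep the (input, output) pairs whose output passes the metric, and reuse them as few-shot demonstrations. This is a pure-Python sketch of that general idea under our own assumptions, not DSPy's API; every name (`Example`, `bootstrap_demos`, `toy_program`) is hypothetical, and `toy_program` stands in for an LM call:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Example:
    question: str
    answer: str

def bootstrap_demos(program: Callable[[str, list[Example]], str],
                    trainset: list[Example],
                    metric: Callable[[Example, str], bool],
                    max_demos: int = 4) -> list[Example]:
    """Run the program on training inputs and keep (input, output) pairs that pass the metric."""
    demos: list[Example] = []
    for ex in trainset:
        prediction = program(ex.question, demos)
        if metric(ex, prediction):
            demos.append(Example(ex.question, prediction))
        if len(demos) >= max_demos:
            break
    return demos

def exact_match(ex: Example, prediction: str) -> bool:
    return prediction.strip() == ex.answer

def toy_program(question: str, demos: list[Example]) -> str:
    # Hypothetical stand-in for an LM call (a real one would put `demos` in its prompt).
    # It deliberately mishandles multi-digit operands so the metric has something to reject.
    a, b = question.split("+")
    return str(int(a[-1]) + int(b[-1]))

trainset = [Example("1+1", "2"), Example("12+3", "15"), Example("4+4", "8")]
demos = bootstrap_demos(toy_program, trainset, exact_match)
print([(d.question, d.answer) for d in demos])
# prints: [('1+1', '2'), ('4+4', '8')]
```

The metric acts as the filter: only traces it accepts become demonstrations, which is what replaces hand-written prompt examples in this style of pipeline.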


FrugalGPT: Better Quality and Lower Cost for LLM Applications // Lingjiao Chen // MLOps Podcast #173

MLOps.community

Aug 22, 2023 · 62:58

MLOps Coffee Sessions #173 with Lingjiao Chen, FrugalGPT: Better Quality and Lower Cost for LLM Applications. This episode is sponsored by QuantumBlack. We are now accepting talk proposals for our next LLM in Production virtual conference on October 3rd. Apply to speak here: https://go.mlops.community/NSAX1O // Abstract There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently. // Bio Lingjiao Chen is a Ph.D. candidate in the computer sciences department at Stanford University. He is broadly interested in machine learning, data management, and optimization. Working with Matei Zaharia and James Zou, he is currently exploring the fast-growing marketplaces of artificial intelligence and data. His work has been published at premier conferences and journals such as ICML, NeurIPS, SIGMOD, and PVLDB, and partially supported by a Google fellowship. 
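The LLM cascade strategy can be sketched in a few lines: call models from cheapest to most expensive and stop at the first answer whose quality score clears that tier's threshold. In FrugalGPT the model combination and routing are learned; in this minimal Python sketch the tier order, thresholds, and per-call costs are hard-coded placeholders, and `small_model`, `large_model`, and `toy_scorer` are hypothetical stand-ins for real LLM APIs and a learned answer scorer:

```python
from typing import Callable

Model = Callable[[str], str]
Scorer = Callable[[str, str], float]
Tier = tuple[str, Model, float, float]  # (name, model, cost per call, acceptance threshold)

def llm_cascade(query: str, tiers: list[Tier], scorer: Scorer) -> tuple[str, str, float]:
    """Call models cheapest-first; return the first answer whose score clears its tier's threshold.

    Returns (name of the model that answered, answer, total cost of all calls made).
    The last tier's answer is returned even if it fails its threshold.
    """
    name, answer, total_cost = "", "", 0.0
    for name, model, cost, threshold in tiers:
        answer = model(query)
        total_cost += cost
        if scorer(query, answer) >= threshold:
            break
    return name, answer, total_cost

def small_model(query: str) -> str:
    return "Paris" if "France" in query else "not sure"

def large_model(query: str) -> str:
    return "Canberra" if "Australia" in query else "Paris"

def toy_scorer(query: str, answer: str) -> float:
    # Stand-in for a learned answer-quality scorer.
    return 0.0 if answer == "not sure" else 0.9

tiers: list[Tier] = [("small", small_model, 0.001, 0.8), ("large", large_model, 0.03, 0.5)]
for q in ("Capital of France?", "Capital of Australia?"):
    print(llm_cascade(q, tiers, toy_scorer))
```

Easy queries stop at the cheap tier and hard ones pay for both calls, which is where the cost savings come from: the expensive model is only billed when the cheap answer is not trusted.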
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: https://lchen001.github.io/ FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance paper: https://arxiv.org/abs/2305.05176 --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Lingjiao on LinkedIn:


EP 74 – Inspiration session #9: Golf, Gender and Generative AI

WHU's Most Awesome Founder Podcast

Aug 2, 2023 · 83:13

We are thrilled to announce the launch of the 9th edition of the inspiration session with Gerrit and Dries!

How is ChatGPT's behavior changing over time?

Papers Read on AI

Jul 26, 2023 · 18:44

GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four diverse tasks: 1) solving math problems, 2) answering sensitive/dangerous questions, 3) generating code and 4) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task. GPT-4 was less willing to answer sensitive questions in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the same LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLM quality. 2023: Lingjiao Chen, Matei Zaharia, James Y. Zou https://arxiv.org/pdf/2307.09009v1.pdf
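Continuous monitoring, as recommended above, amounts to re-running a fixed evaluation set against each model snapshot and flagging large accuracy changes. A minimal Python sketch using a prime-identification task as the example; the two "snapshots" are hypothetical stand-in functions rather than real model versions, the evaluation set is our own choice, and the 10-point alert threshold is arbitrary:

```python
from typing import Callable

def is_prime(n: int) -> bool:
    """Ground truth by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def accuracy(snapshot: Callable[[int], bool], eval_set: list[int]) -> float:
    """Fraction of eval numbers the snapshot classifies the same way as the ground truth."""
    return sum(snapshot(n) == is_prime(n) for n in eval_set) / len(eval_set)

def drift_alert(old_acc: float, new_acc: float, max_drop: float = 0.10) -> bool:
    """Flag when accuracy falls by more than max_drop (absolute)."""
    return old_acc - new_acc > max_drop

def march_snapshot(n: int) -> bool:
    # Hypothetical snapshot that answers correctly.
    return is_prime(n)

def june_snapshot(n: int) -> bool:
    # Hypothetical snapshot that always answers "not prime".
    return False

eval_set = list(range(2, 102))  # fixed set, reused unchanged for every snapshot
acc_march = accuracy(march_snapshot, eval_set)
acc_june = accuracy(june_snapshot, eval_set)
print(acc_march, acc_june, drift_alert(acc_march, acc_june))
# prints: 1.0 0.74 True
```

Keeping the evaluation set fixed is the essential design choice: any change in the score can then be attributed to the model service, not to the test.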


ThursdAI July 20 - LLaMa 2, Vision and multimodality for all, and is GPT-4 getting dumber?

The top AI news from the past week, every ThursdAI

Jul 21, 2023 · 14:40

ThursdAI - Recaps of the most high-signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. If you'd like to hear the whole 2-hour conversation, here's the link to the Twitter Space we had. And if you'd like to add us to your favorite podcatcher, here's the RSS link while we're pending approval from Apple/Spotify. Happy LLaMa day! Meta open sourced LLaMa v2 with a fully commercial license. LLaMa 1 was considered the best open source LLM; this one can be used for commercial purposes, unless you have more than 700MM monthly active users (no


How to Deploy AI in Your Company (Matei Zaharia, CTO & Co-Founder of Databricks)

How to B2B a CEO (with Ashu Garg)

Jun 2, 2023 · 39:47

If you're not hardcore about AI, then stop the podcast now — because this episode is for real technology nerds. Ashu's guest is Matei Zaharia, CTO and cofounder of Databricks, and a professor of computer science at Stanford University. Ashu and Matei cover a lot of ground — most of it technical, all of it very relevant for anyone who's serious about building with artificial intelligence. They start with a discussion of Databricks' early days: how the startup established a foothold in a market dominated by entrenched incumbents and some of the challenges Matei and his cofounders faced. The rest of the conversation is all about AI. Matei breaks down the most common challenges that enterprises run into when attempting to adopt AI. He shares tips for how startups can best deploy foundation models. And he speculates on what the game-changing new use cases for AI will be over the next two years. Matei also pulls back the kimono on some of the cutting-edge machine-learning research he and his team at Stanford are working on. Finally, listen to the end for a fascinating exchange about artificial intelligence beyond large language models.


The Bazaar in the Cathedral with Matei Zaharia

Data Radicals

May 24, 2023 · 49:47

When building a data platform, it's important to stay true to your vision. Whether that's through creating a definitive user experience or an open platform that allows people to build upon it, you're constructing a cathedral. This cathedral is sophisticated and dependable, and allows for a bazaar of business intelligence, machine learning, and AI use cases.

In this episode, Satyen interviews Matei Zaharia, Chief Technologist and Co-founder of Databricks. Matei is an open source trailblazer and the creator of Apache Spark, a widely used framework for distributed data processing. He is also an Associate Professor of Computer Science at Stanford University where he leads various data management and machine learning projects. Matei and Satyen discuss the Databricks and Alation partnership, exploring how platforms can help companies own their data, and consider the value of democratizing open source large language models.

--------

“One of the early stories about open source has been this thing about the cathedral and the bazaar. The cathedral is the thing that's all designed by one person, maybe. It's extremely coherent and so on, but also takes forever to build. And when you go there, there's one message you're hearing. And then the bazaar is the open thing. You don't know who's going to show up each day, but there'll be some really interesting goods and things that you just wouldn't see anywhere else. If you just want to get started and get stuff done, follow the defaults in the product and it'll work. But, we want to be open to some of that innovation and let people bring that in.” – Matei Zaharia

--------

Time Stamps:
* (01:33): The story behind Spark
* (11:56): Solving for user problems versus product vision
* (20:12): The cathedral and the bazaar of open source
* (24:04): Matei explains the Databricks Unity Catalog
* (31:04): The Databricks and Alation partnership
* (43:36): The data culture at Databricks
* (48:21): Satyen's Takeaways

--------

Sponsor
This podcast is presented by Alation.
Learn more:
* Subscribe to the newsletter: https://www.alation.com/podcast/
* Alation's LinkedIn Profile: https://www.linkedin.com/company/alation/
* Satyen's LinkedIn Profile: https://www.linkedin.com/in/ssangani/

--------

Links
Follow Matei on LinkedIn
Follow Matei on Twitter
Learn more about Databricks's Unity Catalog
Learn more about Alation + Databricks


The Birth and Growth of Spark: An Open Source Success Story // Matei Zaharia // MLOps Podcast #155

MLOps.community

Apr 25, 2023 · 58:12

MLOps Coffee Sessions #155 with Matei Zaharia, The Birth and Growth of Spark: An Open Source Success Story, co-hosted by Vishnu Rachakonda. // Abstract We dive deep into the creation of Spark, with the creator himself, Matei Zaharia, Chief Technologist at Databricks. This episode also explores the development of Databricks' other open source home run, MLflow, and the concept of "lakehouse ML". As a special treat, Matei talked to us about the details of the "DSP" (Demonstrate-Search-Predict) project, which aims to enable building applications by combining LLMs and other text-returning systems. // About the guest: Matei has the unique advantage of being able to see different perspectives, having worked in both academia and the industry. He listens carefully to people's challenges and excitement about ML and uses this to come up with new ideas. As a member of Databricks, Matei also has the advantage of applying ML to Databricks' own internal practices. He is constantly asking the question "What's a better way to do this?" // Bio Matei Zaharia is an Associate Professor of Computer Science at Stanford and Chief Technologist at Databricks. He started the Apache Spark project during his Ph.D. at UC Berkeley, and co-developed other widely used open-source projects, including MLflow and Delta Lake, at Databricks. At Stanford, he works on distributed systems, NLP, and information retrieval, building programming models that can combine language models and external services to perform complex tasks. Matei's research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best Ph.D. dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links https://cs.stanford.edu/~matei/ https://spark.apache.org/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Matei on LinkedIn: https://www.linkedin.com/in/mateizaharia/ Timestamps: [00:00] Matei's preferred coffee [01:45] Takeaways [05:50] Please subscribe to our newsletters, join our Slack, and subscribe to our podcast channels! [06:52] Getting to know Matei as a person [09:10] Spark [14:18] Open and freewheeling cross-pollination [16:35] Actual formation of Spark [20:05] Spark and MLflow Similarities and Differences [24:24] Concepts in MLflow [27:34] DJ Khaled of the ML world [30:58] Data Lakehouse [33:35] Stanford's unique culture of the Computer Science Department [36:06] Starting a company [39:30] Unique advice to grad students [41:51] Open source project [44:35] LLMs in the New Revolution [47:57] Type of company to start with [49:56] Emergence of Corporate Research Labs [53:50] LLMs size context [54:44] Companies to respect [57:28] Wrap up


The Future is Small Models, with Matei Zaharia, CTO of Databricks

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

Apr 6, 2023 · 39:29

If you have 30 dollars, a few hours, and one server, then you are ready to create a ChatGPT-like model that can do what's known as instruction-following. Databricks' latest launch, Dolly, foreshadows a potential move in the industry toward smaller and more accessible but extremely capable AIs. Plus, Dolly is open source, requires less computing power, and fewer data parameters than its counterparts. Matei Zaharia, Cofounder & Chief Technologist at Databricks, joins Sarah and Elad to talk about how big data sets actually need to be, why manual annotation is becoming less necessary to train some models, and how he went from a Berkeley PhD student with a little project called Spark to the founder of a company that is now critical data infrastructure that's increasingly moving into AI. No Priors is now on YouTube! Subscribe to the channel on YouTube and like this episode. Show Links: Hello Dolly: Democratizing the magic of ChatGPT with open models Dolly Source Code on Github Matei Zaharia - Chief Technologist & Cofounder - Databricks | LinkedIn Matei Zaharia - Google Scholar Databricks debuts ChatGPT-like Dolly, a clone any enterprise can own | VentureBeat Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Databricks | @Matei_Zaharia Show Notes: [01:29] - Origin of Databricks [4:30] - Work at Stanford Lab [5:29] - Dolly and Role of Open Source [12:30] - Industry focus on high parameter count, understanding reasoning at small model scale [18:42] - Enterprise applications for Dolly & chat bots [25:06] - Making bets as an academic turned CTO [36:23] - The early stages of AI and future predictions


Ep 2: Databricks CTO Matei Zaharia on scaling and orchestrating large language models

Unsupervised Learning

Mar 7, 2023 · 46:24

Patrick and Jacob sit down with Matei Zaharia, Co-Founder and CTO at Databricks and Professor at Stanford. They discuss how companies are training and serving models in production with Databricks, where LLMs fall short for search and how to improve them, state-of-the-art AI research at Stanford, and how the size and cost of models are likely to change with technological advances in the coming years.
(0:00) - Introduction
(2:04) - Founding story of Databricks
(6:03) - PhD classmates using an early version of Spark for the Netflix competition
(6:55) - Building applications with MLflow
(9:55) - LLMs and ChatGPT
(12:05) - Working with and fine-tuning foundation models
(13:00) - Is prompt engineering here to stay or temporary?
(15:12) - Matei's research at Stanford: the Demonstrate-Search-Predict framework (DSP)
(17:42) - How LLMs will be combined with classic information retrieval systems for world-class search
(19:38) - LLMs writing programs to orchestrate LLMs
(20:36) - Using LLMs in the Databricks cloud product
(24:21) - Scaling LLM training and serving
(27:29) - How much will the cost to train LLMs go down in the coming years?
(29:22) - How many parameters is too many?
(31:14) - Open source vs. closed source?
(35:19) - Stanford AI research: Snorkel, ColBERT, and more
(38:58) - Matei getting a $50 Amazon gift card for weeks of work
(43:23) - Quick-fire round
With your co-hosts:
@jasoncwarner - Former CTO GitHub, VP Eng Heroku & Canonical
@ericabrescia - Former COO GitHub, Founder Bitnami (acq'd by VMware)
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health

Matei Zaharia - Episode 32

ACM ByteCast

Dec 13, 2022 · 54:27

In this episode of ACM ByteCast, Bruke Kifle hosts Matei Zaharia, computer scientist, educator, and creator of Apache Spark. Matei is the Chief Technologist and Co-Founder of Databricks and an Assistant Professor of Computer Science at Stanford. He started the Apache Spark project during his PhD at UC Berkeley in 2009 and has worked broadly on other widely used data and machine learning software, including MLflow, Delta Lake, and Apache Mesos. Matei's research was recognized through the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers. Matei, who was born in Romania and grew up mostly in Canada, describes how he developed Spark, a framework for writing programs that run on a large cluster of nodes and process data in parallel, and how this led him to co-found Databricks around this technology. Matei and Bruke also discuss the paradigm shift from traditional data warehouses to data lakes, as well as his work on MLflow, an open-source platform for managing the end-to-end machine learning lifecycle. He highlights some recent announcements in the field of AI and machine learning and shares observations from teaching and conducting research at Stanford, including an important current gap in computing education.


ColBERT + ColBERTv2: late interaction at a reasonable inference cost

Neural Information Retrieval Talks — Zeta Alpha

Aug 16, 2022 · 57:30

Andrew Yates (Assistant Professor at the University of Amsterdam) and Sergi Castella (Analyst at Zeta Alpha) discuss the two influential papers introducing ColBERT (2020) and ColBERTv2 (2022), which propose a fast late-interaction operation that achieves performance close to full cross-encoders at a more manageable computational cost at inference time, along with many other optimizations.
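To make "late interaction" concrete, here is a minimal sketch of the MaxSim scoring step used by ColBERT: each query token embedding is compared against every document token embedding, the best match per query token is kept, and those maxima are summed. Because document token embeddings do not depend on the query, they can be computed offline and indexed, which is where the saving over a cross-encoder comes from. Assumptions: the BERT encoders are omitted, the 2-d vectors are made-up stand-ins for real token embeddings, ColBERTv2's compression and candidate generation are not shown, and `maxsim_score` and `rank` are illustrative names, not the ColBERT library's API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale an embedding to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embs: Sequence[Vector], doc_embs: Sequence[Vector]) -> float:
    """Late interaction: best-matching document token per query token, summed."""
    q = [l2_normalize(v) for v in query_embs]
    d = [l2_normalize(v) for v in doc_embs]
    return sum(max(dot(qi, dj) for dj in d) for qi in q)


def rank(query_embs: Sequence[Vector], index: Dict[str, Sequence[Vector]]) -> List[str]:
    """Score every pre-encoded document against the query and sort best-first."""
    return sorted(index, key=lambda doc_id: maxsim_score(query_embs, index[doc_id]),
                  reverse=True)


# Toy "token embeddings" standing in for real encoder outputs (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
index = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens exactly
    "doc_b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    "doc_c": [[0.6, 0.8]],              # one token partially matching both
}
print(rank(query, index))  # ['doc_a', 'doc_c', 'doc_b']
```

Only `maxsim_score` runs per query; in a real system the per-document embeddings in `index` would be produced once at indexing time.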


Episode 97: Escaping Poverty, Embracing Digital Learning, Benchmarking ML Systems, and Advancing Data-Centric AI with Cody Coleman

Datacast

Aug 2, 2022 · 87:29

Show Notes
(01:49) Cody shared his upbringing in New Jersey, his childhood interest in science and technology, and the few people who have made big differences in his story.
(09:35) Cody went over his academic experience studying Electrical Engineering and Computer Science at MIT.
(17:51) Cody recalled his favorite classes taken at MIT.
(22:43) Cody talked about serving as president of MIT's chapter of the Eta Kappa Nu Honor Society and advancing online education at the MIT Office of Digital Learning.
(31:25) Cody is bullish on the future of digital learning.
(35:43) Cody expanded on his internships with Google throughout his time at MIT, working on local search quality and YouTube analytics.
(42:31) Cody described the challenges of dealing with high-frequency trading data during his year as a junior data scientist at the Vendor Data Group of Jump Trading in Chicago.
(46:50) Cody reflected on his decision to embark on a Ph.D. in Computer Science at Stanford University.
(51:54) Cody mentioned his participation in the DAWN project, specifically DAWNBench, an end-to-end deep learning benchmark and competition.
(54:21) Cody unpacked the evolution of MLPerf, an industry-standard benchmark for the training and inference performance of ML models.
(56:52) Cody walked through the motivation and empirical work in his paper “Selection via Proxy: Efficient Data Selection for Deep Learning.”
(59:34) Cody discussed his paper “Similarity Search for Efficient Active Learning and Search of Rare Concepts.”
(01:06:32) Cody shared what he learned about bringing ML from research to industry from his advisors, Matei Zaharia and Peter Bailis, who were both academics and startup founders simultaneously.
(01:09:19) Cody went over key trends in the emerging Data-Centric AI community, given his involvement with the Data-Centric AI workshop at NeurIPS 2021 and the DataPerf benchmark suite.
(01:12:19) Cody shared lessons learned about finding product-market fit as the founder of Coactive AI, which brings unstructured data into the world of SQL and the big data tools that teams already love.
(01:15:34) Cody emphasized the importance of focusing on the HR function and defining cultural guiding principles for any early-stage startup founder.
(01:21:05) Cody provided his perspective on the differences and similarities between being a researcher and a founder.
(01:23:47) Closing segment.
Cody's Contact Info: Website, Twitter, LinkedIn, Google Scholar
Coactive AI's Resources: Website, Twitter, LinkedIn, Culture Values
Mentioned Content
Talks
“Digging Deeper: How a Few Extra Moments Can Change Lives” (TEDxStanford 2017)
“Data Selection for Data-Centric AI” (Stanford MLSys 2022)
Research
“Probabilistic Use Cases: Discovering Behavioral Patterns for Predicting Certification” (2015)
“DAWNBench: An End-to-End Deep Learning Benchmark and Competition” (Dec 2017)
“MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance” (Feb 2020)
“Selection via Proxy: Efficient Data Selection for Deep Learning” (Oct 2020)
“Similarity Search for Efficient Active Learning and Search of Rare Concepts” (July 2021)
DataPerf, a new benchmark suite for machine learning datasets and data-centric algorithms (Dec 2021)
People
Matei Zaharia (Cody's Ph.D. Advisor, Co-Creator of Apache Spark, Co-Founder of Databricks)
Fei-Fei Li (Professor of Computer Science at Stanford, Creator of ImageNet Dataset)
Michael Bernstein (Professor of Computer Science at Stanford with a focus on Human-Computer Interaction)
Books
“No Rules Rules: Netflix and the Culture of Reinvention” (by Reed Hastings)
“What You Do Is Who You Are: How to Create Your Business Culture” (by Ben Horowitz)
“The Inner Game of Tennis: The Classic Guide to the Mental Side of Peak Performance” (by Timothy Gallwey)
Notes
My conversation with Cody was recorded back in January 2022. Since then, many things have happened at Coactive AI. I'd recommend:
Attending Cody's upcoming talk at Snorkel's The Future of Data-Centric AI.
Reviewing the DataPerf workshop at ICML 2022.
Reading the Coactive AI blog post on bringing UI props to MLOps.
Watching Cody's CBS News interview back in February 2022.
About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests from a wide range of career paths (from scientists and analysts to founders and investors) to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts or click one of the links below: Listen on Spotify, Listen on Apple Podcasts, Listen on Google Podcasts.
If you're new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

10/22/20 #2 Matei Zaharia - Machine Learning at Industrial Scale: Lessons from the MLflow Project

Stanford MLSys Seminar

Jan 8, 2022 · 59:33

Although enterprise adoption of machine learning is still at an early stage, many enterprises in all industries already have hundreds of internal ML applications. ML powers business processes with an impact of hundreds of millions of dollars in industrial IoT, finance, healthcare, and retail. Building and operating these applications reliably requires infrastructure that is different from traditional software development, which has led to significant investment in “ML platforms” designed specifically to run ML applications. In this talk, I'll discuss some of the common challenges in productionizing ML applications based on experience building MLflow, an open source ML platform started at Databricks. MLflow is now the most widely used open source project in this area, with over 2 million downloads a month and integrations with dozens of other products. I'll also highlight some interesting problems users face that are not covered deeply in current ML systems research, such as the need for “hands-free” ML that can train thousands of independent models without direct tuning from the ML developer for regulatory reasons, and the impact of privacy and interpretability regulations on ML. All my examples will be based on experience at large Databricks / MLflow customers.


Data Brew Season 2 Episode 1: ML in Production

Data Brew by Databricks

Apr 22, 2021 · 30:49

For our second season, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more.In the season opener, Matei Zaharia discusses how he entered the field of ML, best practices for productionizing ML pipelines, leveraging MLflow & the Lakehouse architecture for reproducible ML, and his current research in this field.See more at databricks.com/data-brew


Episode 4: MLFlow with Matei Zaharia

Data Science In Production

Mar 7, 2019 · 31:48

I caught up with Apache Spark creator and Databricks co-founder Matei Zaharia at this year's Big Data London. We discussed the release of MLflow, Databricks, and Project DAWN.


001 - Data Center Scale Computing and Artificial Intelligence with Matei Zaharia, Inventor of Apache Spark

Pioneers in AI

Sep 4, 2018 · 22:03

Matei Zaharia is a co-founder and Chief Technologist at Databricks, an Assistant Professor of Computer Science at Stanford and the inventor of Apache Spark. Microsoft has partnered with Databricks to bring you Azure Databricks, a Spark-based analytics platform optimized for Azure offering simple setup, streamlined workflows and ease of collaboration between data scientists, engineers and business analysts. Let’s see what Matei has to say about Spark, ML and interesting AI applications he’s encountered lately. Databricks, https://databricks.com/ Azure Databricks, https://azure.microsoft.com/en-us/services/databricks/

Building a Data Science Capability with Stephanie Yee, Matei Zaharia, Sid Anand and Soups Ranjan

The InfoQ Podcast

Apr 27, 2018 · 43:09

In this podcast, recorded live at QCon.ai, Principal Technical Advisor & QCon Chair Wes Reisz and InfoQ Editor-in-Chief Charles Humble chair a panel discussion with Stephanie Yee, data scientist at Stitch Fix; Matei Zaharia, professor of computer science at Stanford and chief scientist at Databricks; Sid Anand, chief data engineer at PayPal; and Soups Ranjan, director of data science at Coinbase.
Why listen to this podcast:
- Before you start putting a data science team together, make sure you have a business goal or question that you want to answer. If you have a specific question, like increasing lift on a metric or understanding customer usage patterns, you know where you can get the data from, and you can then figure out how to organise that data.
- You need to make sure you have the right culture for the team, and find people who are excited about solving the business problems and interested in them. Also look at the environment you are going to provide.
- Your first hire shouldn't be a data scientist (or quant). You need support to productionise the models, so if you don't have a colleague to help productionise them, don't hire the quant first.
- Given the scarcity of talent, it is worth remembering that data scientists come from a variety of backgrounds: some have computer science backgrounds, while others may be astrophysicists or neuroscientists who approach problems in different ways.
- There are two common ways to structure a data science team: one is a vertical team that does everything; the other, more common in large companies, is a separate data science team alongside an infrastructure team.
More on this: Quickly scan our curated show notes on InfoQ https://bit.ly/2Jym1RI
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development: bit.ly/24x3IVq
Subscribe: www.youtube.com/infoq
Like InfoQ on Facebook: bit.ly/2jmlyG8
Follow on Twitter: twitter.com/InfoQ
Follow on LinkedIn: www.linkedin.com/company/infoq
Check the landing page on InfoQ: https://bit.ly/2Jym1RI


Spark and Streaming with Matei Zaharia

Greatest Hits – Software Engineering Daily

Feb 26, 2018 · 60:04

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory for fast, iterative processing. Matei Zaharia created Spark with two goals: to provide a composable, high-level set of APIs for performing distributed processing; …
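As a rough illustration of the RDD idea described above, the sketch below models a partitioned dataset whose transformations (`map`, `filter`) are lazy and only record lineage, whose actions (`collect`, `reduce`) evaluate partition by partition, and which can be cached in memory for reuse across actions. `ToyRDD` and all of its methods are hypothetical: this is a single-process teaching model, not the PySpark API, and it leaves out distribution, scheduling, and recovery of partitions lost on other nodes.

```python
from functools import reduce as fold
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical single-process model of an RDD; not the PySpark API."""

    def __init__(self, compute: Callable[[int], List], num_partitions: int, lineage: str):
        # compute(i) rebuilds partition i from the parent dataset: this is the lineage
        # that lets a lost partition be recomputed instead of replicated.
        self._compute = compute
        self.num_partitions = num_partitions
        self.lineage = lineage
        self._cached: Optional[List[List]] = None
        self.compute_calls = 0  # partitions this RDD has computed from its parent

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        chunks = [items[i::num_partitions] for i in range(num_partitions)]
        return ToyRDD(lambda i: list(chunks[i]), num_partitions, "parallelize")

    def _partition(self, i: int) -> List:
        if self._cached is not None:
            return self._cached[i]  # served from memory, no recomputation
        self.compute_calls += 1
        return self._compute(i)

    # Transformations are lazy: they build a new RDD that remembers how to derive its data.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [f(x) for x in self._partition(i)],
                      self.num_partitions, self.lineage + " -> map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [x for x in self._partition(i) if pred(x)],
                      self.num_partitions, self.lineage + " -> filter")

    def cache(self) -> "ToyRDD":
        # Materialize each partition once and keep it in memory for later actions.
        self._cached = [self._compute(i) for i in range(self.num_partitions)]
        self.compute_calls += self.num_partitions
        return self

    # Actions force evaluation, one partition at a time.
    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self._partition(i)]

    def reduce(self, f: Callable):
        parts = [self._partition(i) for i in range(self.num_partitions)]
        partials = [fold(f, p) for p in parts if p]  # per-partition partial results
        return fold(f, partials)                     # then combine the partials


nums = ToyRDD.parallelize(range(10), num_partitions=3)
evens_sq = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
print(evens_sq.lineage)                     # parallelize -> filter -> map
print(sorted(evens_sq.collect()))           # [0, 4, 16, 36, 64]
print(evens_sq.reduce(lambda a, b: a + b))  # 120
```

Calling `collect` twice on an uncached RDD recomputes every partition each time, while `cache()` pays that cost once; that trade-off is what makes keeping a working set in memory attractive for iterative jobs.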


The state of machine learning in Apache Spark

O'Reilly Data Show - O'Reilly Media Podcast

Sep 14, 2017 · 21:41

In this episode of the Data Show, we look back at a recent conversation I had at the Spark Summit in San Francisco with Ion Stoica (UC Berkeley professor and executive chairman of Databricks) and Matei Zaharia (assistant professor at Stanford and chief technologist of Databricks). Stoica and Zaharia were core members of UC Berkeley’s […]


Episode 25: How Apache Spark's creator is now tackling AI's data problem

THE ARCHITECHT SHOW

Jun 22, 2017 · 59:05

In episode 25(!) of the ARCHITECHT Show, Matei Zaharia and Peter Bailis discuss how the DAWN project at Stanford University is trying to bring the power of AI and machine learning to a broader set of users. The project's initial focus is on making it easier to collect, process, and analyze enough high-quality data to take advantage of today's leading models and algorithms. Aside from the technologies, Zaharia and Bailis also discuss how they're working with industry on this project and the right path for moving academic research into the real world. It's an area Zaharia knows well, having previously helped create Apache Mesos and Apache Spark, and having co-founded Databricks, where he's still CTO. In the news segment, co-hosts Derrick Harris (ARCHITECHT) and Barb Darrow (Fortune) talk all about Amazon, and how its $14 billion purchase of Whole Foods is having competitive repercussions even in its cloud business.


a16z Podcast: A Conversation With the Inventor of Spark

a16z

Jun 24, 2015 · 19:03

One of the most active and fastest-growing open source big data cluster computing projects is Apache Spark, which was originally developed at U.C. Berkeley's AMPLab and is now used by internet giants and other companies around the world, including, as most recently announced, IBM. In this Q&A with Spark inventor Matei Zaharia (also the CTO and co-founder of Databricks, and a professor at MIT), on the heels of the recent Spark Summit, we cover the difference between Hadoop MapReduce and Spark, the ingredients of a successful open source project, and the story of how Spark almost helped a friend win a million dollars.

