Varun Mohan is the co-founder and CEO of Windsurf (formerly Codeium), an AI-powered integrated development environment (IDE) that has been used by over 1 million developers in just four months and has quickly emerged as a leader in transforming how developers build software. Before finding success with Windsurf, the company pivoted twice—first from GPU virtualization infrastructure to an IDE plugin, and then to its own standalone IDE.

In this conversation, you'll learn:
1. Why Windsurf walked away from a profitable GPU infrastructure business and bet the company on helping engineers code.
2. The surprising UI discovery that tripled adoption rates overnight.
3. The secret behind Windsurf's B2B enterprise plan, and why they invested early in an 80-person sales team despite conventional startup wisdom.
4. How non-technical staff at Windsurf built their own custom tools instead of purchasing SaaS products, saving over $500k in software costs.
5. Why Varun believes 90% of code will be AI-generated, but engineering jobs will actually increase.
6. How training on millions of incomplete code samples gives Windsurf an edge and creates a long-term moat.
7. Why agency is the most undervalued and important skill in the AI era.
—
Brought to you by:
• Brex—The banking solution for startups
• Productboard—Make products that matter
• Coda—The all-in-one collaborative workspace
—
Where to find Varun Mohan:
• X: https://x.com/_mohansolo
• LinkedIn: https://www.linkedin.com/in/varunkmohan/
—
Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
—
In this episode, we cover:
(00:00) Varun's background
(03:57) Building and scaling Windsurf
(12:58) Windsurf: The new purpose-built IDE to harness magic
(17:11) The future of engineering and AI
(21:30) Skills worth investing in
(23:07) Hiring philosophy and company culture
(35:22) Sales strategy and market position
(39:37) JetBrains vs. VS Code: extensibility and enterprise adoption
(41:20) Live demo: building an Airbnb for dogs with Windsurf
(42:46) Tips for using Windsurf effectively
(46:38) AI's role in code modification and review
(48:56) Empowering non-developers to build custom software
(54:03) Training Windsurf
(01:00:43) Windsurf's unique team structure and product strategy
(01:06:40) The importance of continuous innovation
(01:08:57) Final thoughts and advice for aspiring developers
—
Referenced:
• Windsurf: https://windsurf.com/
• VS Code: https://code.visualstudio.com/
• JetBrains: https://www.jetbrains.com/
• Eclipse: https://eclipseide.org/
• Visual Studio: https://visualstudio.microsoft.com/
• Vim: https://www.vim.org/
• Emacs: https://www.gnu.org/software/emacs/
• Lessons from a two-time unicorn builder, 50-time startup advisor, and 20-time company board member | Uri Levine (co-founder of Waze): https://www.lennysnewsletter.com/p/lessons-from-uri-levine
• IntelliJ: https://www.jetbrains.com/idea/
• Julia: https://julialang.org/
• Parallel computing: https://en.wikipedia.org/wiki/Parallel_computing
• Douglas Chen on LinkedIn: https://www.linkedin.com/in/douglaspchen/
• Carlos Delatorre on LinkedIn: https://www.linkedin.com/in/cadelatorre/
• MongoDB: https://www.mongodb.com/
• Cursor: https://www.cursor.com/
• GitHub Copilot: https://github.com/features/copilot
• Llama: https://www.llama.com/
• Mistral: https://mistral.ai/
• Building Lovable: $10M ARR in 60 days with 15 people | Anton Osika (CEO and co-founder): https://www.lennysnewsletter.com/p/building-lovable-anton-osika
• Inside Bolt: From near-death to ~$40m ARR in 5 months—one of the fastest-growing products in history | Eric Simons (founder & CEO of StackBlitz): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons
• Behind the product: Replit | Amjad Masad (co-founder and CEO): https://www.lennysnewsletter.com/p/behind-the-product-replit-amjad-masad
• React: https://react.dev/
• Sonnet: https://www.anthropic.com/claude/sonnet
• OpenAI: https://openai.com/
• FedRAMP: https://www.fedramp.gov/
• Dario Amodei on LinkedIn: https://www.linkedin.com/in/dario-amodei-3934934/
• Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law
• How to win in the AI era: Ship a feature every week, embrace technical debt, ruthlessly cut scope, and create magic your competitors can't copy | Gaurav Misra (CEO and co-founder of Captions): https://www.lennysnewsletter.com/p/how-to-win-in-the-ai-era-gaurav-misra
—
Recommended book:
• Fall in Love with the Problem, Not the Solution: A Handbook for Entrepreneurs: https://www.amazon.com/Fall-Love-Problem-Solution-Entrepreneurs/dp/1637741987
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.
—
Lenny may be an investor in the companies discussed. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
Topics discussed: Opening talk. Car sharing: Times Car; Mitsui Car Shares (formerly Careco). Car topics: Boonboomger (Super Sentai series); "Why are right-side merges, the Shuto Expressway's toughest feature, so common? Right-side ramps are the norm at entrances, exits, and JCTs, and checking behind you is hard!"; HiAce; Alphard (Toyota); tips for merging onto expressways. AI & development tools: Devin (AI engineer); GitHub Copilot; Cursor (AI code editor); Claude (Anthropic); cline; Windsurf Editor by Codeium; "AI Code Agents Festival" ~2025 Winter~; OpenHands; Grok (X's AI); AI and the future of work (MIT Technology Review); The Mythical Man-Month - Wikipedia; Word2vec - Wikipedia. Speakers: Mark (tetuo41), company employee; Sugai (sugaishun), company employee; Suruga (snowlong), company president working from home.
In this episode, we discuss Codeium, an AI coding startup that is eyeing a $3B valuation just months after its last funding round. We also talk about how Apple's iPhone 16e is shifting toward on-device AI processing and Microsoft's Majorana 1 quantum chip breakthrough, which could fast-track fault-tolerant quantum computing. Plus, our Chart of the Week reveals ChatGPT's milestone of reaching 400 million weekly active users, and we reminisce on its historical scaling to the first 100 million users. Remember to Stay Current! To learn more, visit us on the web at https://www.morgancreekcap.com/morgan-creek-digital/ . To speak to a team member or sign up for additional content, please email mcdigital@morgancreekcap.com Legal Disclaimer This podcast is for informational purposes only and should not be construed as investment advice or a solicitation for the sale of any security, advisory, or other service. Investments related to the themes and ideas discussed may be owned by funds managed by the host and podcast guests. Any conflicts mentioned by the host are subject to change. Listeners should consult their personal financial advisors before making any investment decisions.
Welcome to Wall-E's Tech Briefing for Thursday, February 20th! Here's what's making waves in today's top tech stories:
• Codeium's valuation leap: The AI-powered coding startup is entering a new funding round at a $2.85 billion valuation, led by Kleiner Perkins, marking a significant rise from its previous $1.25 billion post-money valuation.
• Saronic's funding boost: Defense tech startup Saronic raises $600 million in Series C funding, elevating its valuation to $4 billion. Its autonomous ship factory, "Port Alpha," aims to revolutionize U.S. shipbuilding.
• Elon Musk's X fundraising: X, formerly Twitter, is in talks to raise funds at a $44 billion valuation, matching Musk's purchase price, as it maneuvers through challenges in retaining advertisers.
• Hyundai-Tesla charging challenges: With the 2025 Ioniq 5 adopting Tesla's charging standard, infrastructure and compatibility issues continue to concern EV owners.
• Mistral's Le Chat success: Mistral's AI assistant, Le Chat, achieves over one million downloads in 14 days, topping France's iOS App Store rankings and gaining attention before France's AI Action Summit.
Tune in tomorrow for the latest tech updates!
In this episode of RAW STORIES LIVE, Lori Lipscomb shares her incredible journey in sales at Codeium. From early struggles to breakthroughs, she reveals how a single cold call worth $50,000 became the turning point in her career. Lori discusses the challenges she faced along the way, the lessons she learned, and the mindset shift that helped her close the deal of a lifetime. In a field dominated by competition and high stakes, Lori's story is a testament to the power of resilience and persistence. She opens up about the emotional rollercoaster of sales, from missing out on big deals to hitting that one major win. Lori also dives into the personal growth that comes from taking ownership of your success. Tune in to hear more about how Lori turned obstacles into opportunities and how you can apply these lessons to your own career. Whether you're in sales or another field, this episode offers valuable insights on overcoming setbacks and staying determined in the pursuit of success.
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all, all our LS supporters who helped fund the gorgeous venue and A/V production!
For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (which we have now also done for ICLR and ICML); however, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap the 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.
Our next keynote covers The State of LLM Agents, with the triumphant return of Professor Graham Neubig to the pod (his ICLR episode here!). OpenDevin is now a startup known as All Hands AI! The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number 1 on the hardest SWE-Bench Full leaderboard at 29%, though on the smaller SWE-Bench Verified they are at 53%, behind Amazon Q, devlo, and OpenAI's self-reported o3 results at 71.7%.
Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind, and Anthropic setting their sights on consumer and coding agents, vision-based computer-using agents, and multi-agent systems. There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and Codeium's Windsurf Cascade in the IDE arena, to the explosive revenue growth of StackBlitz's Bolt, Lovable, and Vercel's v0, and the unicorn rounds and high-profile movements of customer support agents like Sierra (now worth $4 billion) and search agents like Perplexity (now worth $9 billion). We wanted to take a little step back to understand the most notable papers of the year in Agents, and Graham indulged with his list of 8 perennial problems in building agents in 2024.
Must-Read Papers for the 8 Problems of Agents
* The agent-computer interface: CodeAct: Executable Code Actions Elicit Better LLM Agents.
Minimal viable tools: Execution Sandbox, File Editor, Web Browsing
* The human-agent interface: Chat UI, GitHub Plugin, Remote runtime, …?
* Choosing an LLM: See Evaluation of LLMs as Coding Agents on SWE-Bench at 30x - must understand instructions, tools, code, environment, error recovery
* Planning: Single Agent Systems vs Multi Agent (CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration) - Explicit vs Implicit, Curated vs Generated
* Reusable common workflows: SteP: Stacked LLM Policies for Web Actions and Agent Workflow Memory - Manual prompting vs Learning from Experience
* Exploration: Agentless: Demystifying LLM-based Software Engineering Agents and BAGEL: Bootstrapping Agents by Guiding Exploration with Language
* Search: Tree Search for Language Model Agents - explore paths and rewind
* Evaluation: Fast Sanity Checks (miniWoB and Aider) and Highly Realistic (WebArena, SWE-Bench) and SWE-Gym: An Open Environment for Training Software Engineering Agents & Verifiers
Full Talk on YouTube
Please like and subscribe!
Timestamps
* 00:00 Welcome to Latent Space Live at NeurIPS 2024
* 00:29 State of LLM Agents in 2024
* 02:20 Professor Graham Neubig's Insights on Agents
* 03:57 Live Demo: Coding Agents in Action
* 08:20 Designing Effective Agents
* 14:13 Choosing the Right Language Model for Agents
* 16:24 Planning and Workflow for Agents
* 22:21 Evaluation and Future Predictions for Agents
* 25:31 Future of Agent Development
* 25:56 Human-Agent Interaction Challenges
* 26:48 Expanding Agent Use Beyond Programming
* 27:25 Redesigning Systems for Agent Efficiency
* 28:03 Accelerating Progress with Agent Technology
* 28:28 Call to Action for Open Source Contributions
* 30:36 Q&A: Agent Performance and Benchmarks
* 33:23 Q&A: Web Agents and Interaction Methods
* 37:16 Q&A: Agent Architectures and Improvements
* 43:09 Q&A: Self-Improving Agents and Authentication
* 47:31 Live Demonstration and Closing Remarks
Transcript
[00:00:29] State of LLM Agents in 2024[00:00:29] Speaker 9: Our next keynote covers the state of LLM agents. With the triumphant return of Professor Graham Neubig of CMU and OpenDevin, now a startup known as All Hands AI. The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number one on the hardest SWE-Bench Full leaderboard at 29%.[00:00:53] Speaker 9: Though, on the smaller SWE-Bench Verified, they are at 53 percent, behind Amazon Q, [00:01:00] devlo, and OpenAI's self-reported o3 results at 71.7%. Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind, and Anthropic setting their sights on consumer and coding agents, vision-based computer-using agents, and multi-agent systems.[00:01:22] Speaker 9: There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and recent guest Codeium's Windsurf Cascade in the IDE arena. To the explosive revenue growth of recent guests StackBlitz's Bolt, Lovable, and Vercel's v0.[00:01:44] Speaker 9: And the unicorn rounds and high-profile movements of customer support agents like Sierra, now worth 4 billion, and search agents like Perplexity, now worth 9 billion.
We wanted to take a little step back to understand the most notable papers of the year in [00:02:00] agents, and Graham indulged with his list of eight perennial problems in building agents.[00:02:06] Speaker 9: As always, don't forget to check our show notes for all the selected best papers of 2024, and for the YouTube link to their talk. Graham's slides were especially popular online, and we are honoured to have him. Watch out and take care![00:02:20] Professor Graham Newbig's Insights on Agents[00:02:20] Speaker: Okay hi everyone. So I was given the task of talking about agents in 2024, and this is An impossible task because there are so many agents, so many agents in 2024. So this is going to be strongly covered by like my personal experience and what I think is interesting and important, but I think it's an important topic.[00:02:41] Speaker: So let's go ahead. So the first thing I'd like to think about is let's say I gave you you know, a highly competent human, some tools. Let's say I gave you a web browser and a terminal or a file system. And the ability to [00:03:00] edit text or code. What could you do with that? Everything. Yeah.[00:03:07] Speaker: Probably a lot of things. This is like 99 percent of my, you know, daily daily life, I guess. When I'm, when I'm working. So, I think this is a pretty powerful tool set, and I am trying to do, and what I think some other people are trying to do, is come up with agents that are able to, you know, manipulate these things.[00:03:26] Speaker: Web browsing, coding, running code in successful ways. So there was a little bit about my profile. I'm a professor at CMU, chief scientist at All Hands AI, building open source coding agents. I'm maintainer of OpenHands, which is an open source coding agent framework. And I'm also a software developer and I, I like doing lots of coding and, and, you know, shipping new features and stuff like this.[00:03:51] Speaker: So building agents that help me to do this, you know, is kind of an interesting thing, very close to me.[00:03:57] Live Demo: Coding Agents in Action[00:03:57] Speaker: So the first thing I'd like to do is I'd like to try [00:04:00] some things that I haven't actually tried before. If anybody has, you know, tried to give a live demo, you know, this is, you know very, very scary whenever you do it and it might not work.[00:04:09] Speaker: So it might not work this time either. But I want to show you like three things that I typically do with coding agents in my everyday work. I use coding agents maybe five to 10 times a day to help me solve my own problems. And so this is a first one. This is a data science task. Which says I want to create scatter plots that show the increase of the SWE bench score over time.[00:04:34] Speaker: And so I, I wrote a kind of concrete prompt about this. Agents work better with like somewhat concrete prompts. And I'm gonna throw this into open hands and let it work. And I'll, I'll go back to that in a second. Another thing that I do is I create new software. And I, I've been using a [00:05:00] service a particular service.[00:05:01] Speaker: I won't name it for sending emails and I'm not very happy with it. So I want to switch over to this new service called resend. com, which makes it easier to send emails. And so I'm going to ask it to read the docs for the resend. com API and come up with a script that allows me to send emails. 
The input to the script should be a CSV file and the subject and body should be provided in Jinja2 templates.[00:05:24] Speaker: So I'll start another agent and and try to get it to do that for me.[00:05:35] Speaker: And let's go with the last one. The last one I do is. This is improving existing software and in order, you know, once you write software, you usually don't throw it away. You go in and, like, actually improve it iteratively. This software that I have is something I created without writing any code.[00:05:52] Speaker: It's basically software to monitor how much our our agents are contributing to the OpenHance repository. [00:06:00] And on the, let me make that a little bit bigger, on the left side, I have the number of issues where it like sent a pull request. I have the number of issues where it like sent a pull request, whether it was merged in purple, closed in red, or is still open in green. And so these are like, you know, it's helping us monitor, but one thing it doesn't tell me is the total number. And I kind of want that feature added to this software.[00:06:33] Speaker: So I'm going to try to add that too. So. I'll take this, I'll take this prompt,[00:06:46] Speaker: and here I want to open up specifically that GitHub repo. So I'll open up that repo and paste in the prompt asking it. I asked it to make a pie chart for each of these and give me the total over the entire time period that I'm [00:07:00] monitoring. So we'll do that. And so now I have let's see, I have some agents.[00:07:05] Speaker: Oh, this one already finished. Let's see. So this one already finished. You can see it finished analyzing the Swebench repository. It wrote a demonstration of, yeah, I'm trying to do that now, actually.[00:07:30] Speaker: It wrote a demonstration of how much each of the systems have improved over time. And I asked it to label the top three for each of the data sets. And so it labeled OpenHands as being the best one for SWE Bench Normal. For SWE Bench Verified, it has like the Amazon QAgent and OpenHands. For the SWE Bench Lite, it has three here over three over here.[00:07:53] Speaker: So you can see like. That's pretty useful, right? If you're a researcher, you do data analysis all the time. I did it while I was talking to all [00:08:00] of you and making a presentation. So that's, that's pretty nice. I, I doubt the other two are finished yet. That would be impressive if the, yeah. So I think they're still working.[00:08:09] Speaker: So maybe we'll get back to them at the end of the presentation. But so these are the kinds of the, these are the kinds of things that I do every day with coding agents now. And it's or software development agents. It's pretty impressive.[00:08:20] Designing Effective Agents[00:08:20] Speaker: The next thing I'd like to talk about a little bit is things I worry about when designing agents.[00:08:24] Speaker: So we're designing agents to, you know, do a very difficult task of like navigating websites writing code, other things like this. And within 2024, there's been like a huge improvement in the methodology that we use to do this. But there's a bunch of things we think about. There's a bunch of interesting papers, and I'd like to introduce a few of them.[00:08:46] Speaker: So the first thing I worry about is the agent computer interface. Like, how do we get an agent to interact with computers? And, How do we provide agents with the tools to do the job? 
And [00:09:00] within OpenHands we are doing the thing on the right, but there's also a lot of agents that do the thing on the left.[00:09:05] Speaker: So the thing on the left is you give like agents kind of granular tools. You give them tools like or let's say your instruction is I want to determine the most cost effective country to purchase the smartphone model, Kodak one the countries to consider are the USA, Japan, Germany, and India. And you have a bunch of available APIs.[00:09:26] Speaker: And. So what you do for some agents is you provide them all of these tools APIs as tools that they can call. And so in this particular case in order to solve this problem, you'd have to make about like 30 tool calls, right? You'd have to call lookup rates for Germany, you'd have to look it up for the US, Japan, and India.[00:09:44] Speaker: That's four tool goals. And then you go through and do all of these things separately. And the method that we adopt in OpenHands instead is we provide these tools, but we provide them by just giving a coding agent, the ability to call [00:10:00] arbitrary Python code. And. In the arbitrary Python code, it can call these tools.[00:10:05] Speaker: We expose these tools as APIs that the model can call. And what that allows us to do is instead of writing 20 tool calls, making 20 LLM calls, you write a program that runs all of these all at once, and it gets the result. And of course it can execute that program. It can, you know, make a mistake. It can get errors back and fix things.[00:10:23] Speaker: But that makes our job a lot easier. And this has been really like instrumental to our success, I think. Another part of this is what tools does the agent need? And I, I think this depends on your use case, we're kind of extreme and we're only giving the agent five tools or maybe six tools.[00:10:40] Speaker: And what, what are they? The first one is program execution. So it can execute bash programs, and it can execute Jupyter notebooks. It can execute cells in Jupyter notebooks. So that, those are two tools. Another one is a file editing tool. And the file editing tool allows you to browse parts of files.[00:11:00][00:11:00] Speaker: And kind of read them, overwrite them, other stuff like this. And then we have another global search and replace tool. So it's actually two tools for file editing. And then a final one is web browsing, web browsing. I'm kind of cheating when I call it only one tool. You actually have like scroll and text input and click and other stuff like that.[00:11:18] Speaker: But these are basically the only things we allow the agent to do. What, then the question is, like, what if we wanted to allow it to do something else? And the answer is, well, you know, human programmers already have a bunch of things that they use. They have the requests PyPy library, they have the PDF to text PyPy library, they have, like, all these other libraries in the Python ecosystem that they could use.[00:11:41] Speaker: And so if we provide a coding agent with all these libraries, it can do things like data visualization and other stuff that I just showed you. So it can also get clone repositories and, and other things like this. The agents are super good at using the GitHub API also. So they can do, you know, things on GitHub, like finding all of the, you know, [00:12:00] comments on your issues or checking GitHub actions and stuff.[00:12:02] Speaker: The second thing I think about is the human agent interface. 
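A rough sketch of the tools-as-Python-APIs idea described above: instead of the model emitting ~30 separate tool calls, the tools are exposed as ordinary Python functions and the agent writes one short program that calls them. The function names and numbers below are invented for illustration and are not OpenHands' actual interface.

```python
# Sketch only: tool functions exposed to a coding agent as plain Python APIs.
# Names and data are hypothetical; this is not OpenHands' real interface.
def lookup_exchange_rate(country: str) -> float:
    """Local-currency units per USD (stand-in for a real rates API)."""
    return {"USA": 1.0, "Japan": 150.0, "Germany": 0.92, "India": 83.0}[country]

def lookup_phone_price(country: str, model: str) -> float:
    """Retail price in local currency (stand-in for a real catalog API)."""
    return {"USA": 799.0, "Japan": 120_000.0, "Germany": 749.0, "India": 66_000.0}[country]

# Instead of ~30 individual tool calls, the agent writes and runs one program:
countries = ["USA", "Japan", "Germany", "India"]
prices_usd = {
    c: lookup_phone_price(c, "Kodak One") / lookup_exchange_rate(c) for c in countries
}
cheapest = min(prices_usd, key=prices_usd.get)
print(f"Most cost-effective country: {cheapest} (${prices_usd[cheapest]:.2f})")
```

Because the program runs in the agent's sandbox, a mistake simply surfaces as an error the agent can read and correct, which is the recovery loop described in the talk.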
So this is like how do we get humans to interact with agents? Bye. I already showed you one variety of our human agent interface. It's basically a chat window where you can browse through the agent's results and things like this. This is very, very difficult.[00:12:18] Speaker: I, I don't think anybody has a good answer to this, and I don't think we have a good answer to this, but the, the guiding principles that I'm trying to follow are we want to present enough info to the user. So we want to present them with, you know, what the agent is doing in the form of a kind of.[00:12:36] Speaker: English descriptions. So you can see here you can see here every time it takes an action, it says like, I will help you create a script for sending emails. When it runs a bash command. Sorry, that's a little small. When it runs a bash command, it will say ran a bash command. It won't actually show you the whole bash command or the whole Jupyter notebook because it can be really large, but you can open it up and see if you [00:13:00] want to, by clicking on this.[00:13:01] Speaker: So like if you want to explore more, you can click over to the Jupyter notebook and see what's displayed in the Jupyter notebook. And you get like lots and lots of information. So that's one thing.[00:13:16] Speaker: Another thing is go where the user is. So like if the user's already interacting in a particular setting then I'd like to, you know, integrate into that setting, but only to a point. So at OpenHands, we have a chat UI for interaction. We have a GitHub plugin for tagging and resolving issues. So basically what you do is you Do at open hands agent and the open hands agent will like see that comment and be able to go in and fix things.[00:13:42] Speaker: So if you say at open hands agent tests are failing on this PR, please fix the tests. It will go in and fix the test for you and stuff like this. Another thing we have is a remote runtime for launching headless jobs. So if you want to launch like a fleet of agents to solve, you know five different problems at once, you can also do [00:14:00] that through an API.[00:14:00] Speaker: So we have we have these interfaces and this probably depends on the use case. So like, depending if you're a coding agent, you want to do things one way. If you're a like insurance auditing agent, you'll want to do things other ways, obviously.[00:14:13] Choosing the Right Language Model for Agents[00:14:13] Speaker: Another thing I think about a lot is choosing a language model.[00:14:16] Speaker: And for agentic LMs we have to have a bunch of things work really well. The first thing is really, really good instruction following ability. And if you have really good instruction following ability, it opens up like a ton of possible applications for you. Tool use and coding ability. So if you provide tools, it needs to be able to use them well.[00:14:38] Speaker: Environment understanding. So it needs, like, if you're building a web agent, it needs to be able to understand web pages either through vision or through text. And error awareness and recovery ability. So, if it makes a mistake, it needs to be able to, you know, figure out why it made a mistake, come up with alternative strategies, and other things like this.[00:14:58] Speaker: [00:15:00] Under the hood, in all of the demos that I did now Cloud, we're using Cloud. Cloud has all of these abilities very good, not perfect, but very good. Most others don't have these abilities quite as much. So like GPT 4. 
0 doesn't have very good error recovery ability. And so because of this, it will go into loops and do the same thing over and over and over again.[00:15:22] Speaker: Whereas Claude does not do this. Claude, if you, if you use the agents enough, you get used to their kind of like personality. And Claude says, Hmm, let me try a different approach a lot. So, you know, obviously it's been trained in some way to, you know, elicit this ability. We did an evaluation. This is old.[00:15:40] Speaker: And we need to update this basically, but we evaluated CLOD, mini LLAMA 405B, DeepSeq 2. 5 on being a good code agent within our framework. And CLOD was kind of head and shoulders above the rest. GPT 40 was kind of okay. The best open source model was LLAMA [00:16:00] 3. 1 405B. This needs to be updated because this is like a few months old by now and, you know, things are moving really, really fast.[00:16:05] Speaker: But I still am under the impression that Claude is the best. The other closed models are, you know, not quite as good. And then the open models are a little bit behind that. Grok, I, we haven't tried Grok at all, actually. So, it's a good question. If you want to try it I'd be happy to help.[00:16:24] Speaker: Cool.[00:16:24] Planning and Workflow for Agents[00:16:24] Speaker: Another thing is planning. And so there's a few considerations for planning. The first one is whether you have a curated plan or you have it generated on the fly. And so for solving GitHub issues, you can kind of have an overall plan. Like the plan is first reproduce. If there's an issue, first write tests to reproduce the issue or to demonstrate the issue.[00:16:50] Speaker: After that, run the tests and make sure they fail. Then go in and fix the tests. Run the tests again to make sure they pass and then you're done. So that's like a pretty good workflow [00:17:00] for like solving coding issues. And you could curate that ahead of time. Another option is to let the language model basically generate its own plan.[00:17:10] Speaker: And both of these are perfectly valid. Another one is explicit structure versus implicit structure. So let's say you generate a plan. If you have explicit structure, you could like write a multi agent system, and the multi agent system would have your reproducer agent, and then it would have your your bug your test writer agent, and your bug fixer agent, and lots of different agents, and you would explicitly write this all out in code, and then then use it that way.[00:17:38] Speaker: On the other hand, you could just provide a prompt that says, please do all of these things in order. So in OpenHands, we do very light planning. We have a single prompt. We don't have any multi agent systems. But we do provide, like, instructions about, like, what to do first, what to do next, and other things like this.[00:17:56] Speaker: I'm not against doing it the other way. But I laid [00:18:00] out some kind of justification for this in this blog called Don't Sleep on Single Agent Systems. And the basic idea behind this is if you have a really, really good instruction following agent it will follow the instructions as long as things are working according to your plan.[00:18:14] Speaker: But let's say you need to deviate from your plan, you still have the flexibility to do this. And if you do explicit structure through a multi agent system, it becomes a lot harder to do that. Like, you get stuck when things deviate from your plan. 
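A minimal sketch of the light-planning approach described above: the curated issue-fixing workflow lives as plain instructions in a single prompt rather than as an explicit multi-agent system, so the agent can still deviate when the plan stops fitting. The prompt wording and helper are illustrative assumptions, not OpenHands' actual prompt.

```python
# Illustrative only: a curated workflow carried as instructions in one prompt,
# not as explicit multi-agent code. Wording is hypothetical.
ISSUE_WORKFLOW = """\
When asked to fix a GitHub issue, proceed roughly in this order:
1. Write a test that reproduces (demonstrates) the issue.
2. Run the test and confirm it fails.
3. Change the code to fix the issue.
4. Re-run the tests and confirm they pass.
If the plan stops fitting the problem, explain why and take a different approach.
"""

def build_messages(issue_text: str) -> list[dict]:
    """Assemble the single-agent prompt for one issue-fixing run (hypothetical helper)."""
    return [
        {"role": "system", "content": ISSUE_WORKFLOW},
        {"role": "user", "content": issue_text},
    ]
```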
There's also some other examples, and I wanted to introduce a few papers.[00:18:30] Speaker: So one paper I liked recently is this paper called CoAct where you generate plans and then go in and fix them. And so the basic idea is like, if you need to deviate from your plan, you can You know, figure out that your plan was not working and go back and deviate from it.[00:18:49] Speaker: Another thing I think about a lot is specifying common workflows. So we're trying to tackle a software development and I already showed like three use cases where we do [00:19:00] software development and when we. We do software development, we do a ton of different things, but we do them over and over and over again.[00:19:08] Speaker: So just to give an example we fix GitHub actions when GitHub actions are failing. And we do that over and over and over again. That's not the number one thing that software engineers do, but it's a, you know, high up on the list. So how can we get a list of all of, like, the workflows that people are working on?[00:19:26] Speaker: And there's a few research works that people have done in this direction. One example is manual prompting. So there's this nice paper called STEP that got state of the art on the WebArena Web Navigation Benchmark where they came up with a bunch of manual workflows for solving different web navigation tasks.[00:19:43] Speaker: And we also have a paper recently called Agent Workflow Memory where the basic idea behind this is we want to create self improving agents that learn from their past successes. And the way it works is is we have a memory that has an example of lots of the previous [00:20:00] workflows that people have used. And every time the agent finishes a task and it self judges that it did a good job at that task, you take that task, you break it down into individual workflows included in that, and then you put it back in the prompt for the agent to work next time.[00:20:16] Speaker: And this we demonstrated that this leads to a 22. 5 percent increase on WebArena after 40 examples. So that's a pretty, you know, huge increase by kind of self learning and self improvement.[00:20:31] Speaker: Another thing is exploration. Oops. And one thing I think about is like, how can agents learn more about their environment before acting? And I work on coding and web agents, and there's, you know, a few good examples of this in, in both areas. Within coding, I view this as like repository understanding, understanding the code base that you're dealing with.[00:20:55] Speaker: And there's an example of this, or a couple examples of this, one example being AgentList. [00:21:00] Where they basically create a map of the repo and based on the map of the repo, they feed that into the agent so the agent can then navigate the repo and and better know where things are. And for web agents there's an example of a paper called Bagel, and basically what they do is they have the agent just do random tasks on a website, explore the website, better understand the structure of the website, and then after that they they feed that in as part of the product.[00:21:27] Speaker: Part seven is search. Right now in open hands, we just let the agent go on a linear search path. So it's just solving the problem once. 
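A compact sketch of the Agent Workflow Memory idea mentioned above: when the agent judges a finished task successful, a distilled workflow is kept and prepended to later prompts so the agent can reuse its own past successes. The function names and in-memory list are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of Agent Workflow Memory: keep distilled recipes from
# self-judged successes and feed them back into future prompts.
from typing import Callable

workflow_memory: list[str] = []

def record_episode(task: str, trace: str,
                   self_judge: Callable[[str, str], bool],
                   distill: Callable[[str, str], str]) -> None:
    """After a task finishes, store a short workflow if the agent judged it a success."""
    if self_judge(task, trace):                       # e.g. an LLM call: "did this go well?"
        workflow_memory.append(distill(task, trace))  # e.g. a few-step recipe

def prompt_with_memory(task: str) -> str:
    """Prepend remembered workflows so the agent can reuse what worked before."""
    remembered = "\n".join(f"- {w}" for w in workflow_memory) or "- (none yet)"
    return f"Workflows that worked on past tasks:\n{remembered}\n\nNew task: {task}"
```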
We're using a good agent that can kind of like recover from errors and try alternative things when things are not working properly, but still we only have a linear search path.[00:21:45] Speaker: But there's also some nice work in 2024 that is about exploring multiple paths. So one example of this is there's a paper called Tree Search for Language Agents. And they basically expand multiple paths check whether the paths are going well, [00:22:00] and if they aren't going well, you rewind back. And on the web, this is kind of tricky, because, like, how do you rewind when you accidentally ordered something you don't want on Amazon?[00:22:09] Speaker: It's kind of, you know, not, not the easiest thing to do. For code, it's a little bit easier, because you can just revert any changes that you made. But I, I think that's an interesting topic, too.[00:22:21] Evaluation and Future Predictions for Agents[00:22:21] Speaker: And then finally evaluation. So within our development for evaluation, we want to do a number of things. The first one is fast sanity checks.[00:22:30] Speaker: And in order to do this, we want things we can run really fast, really really cheaply. So for web, we have something called mini world of bits, which is basically these trivial kind of web navigation things. We have something called the Adder Code Editing Benchmark, where it's just about editing individual files that we use.[00:22:48] Speaker: But we also want highly realistic evaluation. So for the web, we have something called WebArena that we created at CMU. This is web navigation on real real open source websites. So it's open source [00:23:00] websites that are actually used to serve shops or like bulletin boards or other things like this.[00:23:07] Speaker: And for code, we use Swebench, which I think a lot of people may have heard of. It's basically a coding benchmark that comes from real world pull requests on GitHub. So if you can solve those, you can also probably solve other real world pull requests. I would say we still don't have benchmarks for the fur full versatility of agents.[00:23:25] Speaker: So, for example We don't have benchmarks that test whether agents can code and do web navigation. But we're working on that and hoping to release something in the next week or two. So if that sounds interesting to you, come talk to me and I, I will tell you more about it.[00:23:42] Speaker: Cool. So I don't like making predictions, but I was told that I should be somewhat controversial, I guess, so I will, I will try to do it try to do it anyway, although maybe none of these will be very controversial. Um, the first thing is agent oriented LLMs like large language models for [00:24:00] agents.[00:24:00] Speaker: My, my prediction is every large LM trainer will be focusing on training models as agents. So every large language model will be a better agent model by mid 2025. Competition will increase, prices will go down, smaller models will become competitive as agents. So right now, actually agents are somewhat expensive to run in some cases, but I expect that that won't last six months.[00:24:23] Speaker: I, I bet we'll have much better agent models in six months. Another thing is instruction following ability, specifically in agentic contexts, will increase. And what that means is we'll have to do less manual engineering of agentic workflows and be able to do more by just prompting agents in more complex ways.[00:24:44] Speaker: Cloud is already really good at this. 
It's not perfect, but it's already really, really good. And I expect the other models will catch up to Cloud pretty soon. Error correction ability will increase, less getting stuck in loops. Again, this is something that Cloud's already pretty good at and I expect the others will, will follow.[00:25:00][00:25:01] Speaker: Agent benchmarks. Agent benchmarks will start saturating.[00:25:05] Speaker: And Swebench I think WebArena is already too easy. It, it is, it's not super easy, but it's already a bit too easy because the tasks we do in there are ones that take like two minutes for a human. So not, not too hard. And kind of historically in 2023 our benchmarks were too easy. So we built harder benchmarks like WebArena and Swebench were both built in 2023.[00:25:31] Future of Agent Development[00:25:31] Speaker: In 2024, our agents were too bad, so we built agents and now we're building better agents. In 2025, our benchmarks will be too easy, so we'll build better benchmarks, I'm, I'm guessing. So, I would expect to see much more challenging agent benchmarks come out, and we're already seeing some of them.[00:25:49] Speaker: In 2026, I don't know. I didn't write AGI, but we'll, we'll, we'll see.[00:25:56] Human-Agent Interaction Challenges[00:25:56] Speaker: Then the human agent computer interface. I think one thing that [00:26:00] we'll want to think about is what do we do at 75 percent success rate at things that we like actually care about? Right now we have 53 percent or 55 percent on Swebench verified, which is real world GitHub PRs.[00:26:16] Speaker: My impression is that the actual. Actual ability of models is maybe closer to 30 to 40%. So 30 to 40 percent of the things that I want an agent to solve on my own repos, it just solves without any human intervention. 80 to 90 percent it can solve without me opening an IDE. But I need to give it feedback.[00:26:36] Speaker: So how do we, how do we make that interaction smooth so that humans can audit? The work of agents that are really, really good, but not perfect is going to be a big challenge.[00:26:48] Expanding Agent Use Beyond Programming[00:26:48] Speaker: How can we expose the power of programming agents to other industries? So like as programmers, I think not all of us are using agents every day in our programming, although we probably will be [00:27:00] in in months or maybe a year.[00:27:02] Speaker: But I, I think it will come very naturally to us as programmers because we know code. We know, you know. Like how to architect software and stuff like that. So I think the question is how do we put this in the hands of like a lawyer or a chemist or somebody else and have them also be able to, you know, interact with it as naturally as we can.[00:27:25] Redesigning Systems for Agent Efficiency[00:27:25] Speaker: Another interesting thing is how can we redesign our existing systems for agents? So we had a paper on API based web agents, and basically what we showed is If you take a web agent and the agent interacts not with a website, but with APIs, the accuracy goes way up just because APIs are way easier to interact with.[00:27:42] Speaker: And in fact, like when I ask the, well, our agent, our agent is able to browse websites, but whenever I want it to interact with GitHub, I tell it do not browse the GitHub website. Use the GitHub API because it's way more successful at doing that. 
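To make the API-over-browsing point concrete, here is a small example of the kind of call an agent can make against GitHub's REST API instead of clicking through github.com; the helper name and example repository are just for illustration.

```python
# Calling the GitHub REST API directly is far more reliable for an agent than
# navigating the website. Helper name and example repo are illustrative.
import requests

def open_issues(owner: str, repo: str, token: str | None = None) -> list[dict]:
    """List open issues for a repository via the GitHub REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        headers=headers,
        params={"state": "open"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: open_issues("octocat", "Hello-World")
```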
So maybe, you know, every website is going to need to have [00:28:00] an API because we're going to be having agents interact with them.[00:28:03] Accelerating Progress with Agent Technology[00:28:03] Speaker: About progress, I think progress will get faster. It's already fast. A lot of people are already overwhelmed, but I think it will continue. The reason why is agents are building agents. And better agents will build better agents faster. So I expect that you know, if you haven't interacted with a coding agent yet, it's pretty magical, like the stuff that it can do.[00:28:24] Speaker: So yeah.[00:28:28] Call to Action for Open Source Contributions[00:28:28] Speaker: And I have a call to action. I'm honestly, like I've been working on, you know, natural language processing and, and Language models for what, 15 years now. And even for me, it's pretty impressive what like AI agents powered by strong language models can do. On the other hand, I believe that we should really make these powerful tools accessible.[00:28:49] Speaker: And what I mean by this is I don't think like, you know, We, we should have these be opaque or limited to only a set, a certain set of people. I feel like they should be [00:29:00] affordable. They shouldn't be increasing the, you know, difference in the amount of power that people have. If anything, I'd really like them to kind of make it It's possible for people who weren't able to do things before to be able to do them well.[00:29:13] Speaker: Open source is one way to do that. That's why I'm working on open source. There are other ways to do that. You know, make things cheap, make things you know, so you can serve them to people who aren't able to afford them. Easily, like Duolingo is one example where they get all the people in the US to pay them 20 a month so that they can give all the people in South America free, you know, language education, so they can learn English and become, you know like, and become, you know, More attractive on the job market, for instance.[00:29:41] Speaker: And so I think we can all think of ways that we can do that sort of thing. And if that resonates with you, please contribute. Of course, I'd be happy if you contribute to OpenHands and use it. But another way you can do that is just use open source solutions, contribute to them, research with them, and train strong open source [00:30:00] models.[00:30:00] Speaker: So I see, you know, Some people in the room who are already training models. It'd be great if you could train models for coding agents and make them cheap. And yeah yeah, please. I, I was thinking about you among others. So yeah, that's all I have. Thanks.[00:30:20] Speaker 2: Slight, slightly controversial. Tick is probably the nicest way to say hot ticks. Any hot ticks questions, actual hot ticks?[00:30:31] Speaker: Oh, I can also show the other agents that were working, if anybody's interested, but yeah, sorry, go ahead.[00:30:36] Q&A: Agent Performance and Benchmarks[00:30:36] Speaker 3: Yeah, I have a couple of questions. So they're kind of paired, maybe. The first thing is that you said that You're estimating that your your agent is successfully resolving like something like 30 to 40 percent of your issues, but that's like below what you saw in Swebench.[00:30:52] Speaker 3: So I guess I'm wondering where that discrepancy is coming from. 
And then I guess my other second question, which is maybe broader in scope is that [00:31:00] like, if, if you think of an agent as like a junior developer, and I say, go do something, then I expect maybe tomorrow to get a Slack message being like, Hey, I ran into this issue.[00:31:10] Speaker 3: How can I resolve it? And, and, like you said, your agent is, like, successfully solving, like, 90 percent of issues where you give it direct feedback. So, are you thinking about how to get the agent to reach out to, like, for, for planning when it's, when it's stuck or something like that? Or, like, identify when it runs into a hole like that?[00:31:30] Speaker: Yeah, so great. These are great questions. Oh,[00:31:32] Speaker 3: sorry. The third question, which is a good, so this is the first two. And if so, are you going to add a benchmark for that second question?[00:31:40] Speaker: Okay. Great. Yeah. Great questions. Okay. So the first question was why do I think it's resolving less than 50 percent of the issues on Swebench?[00:31:48] Speaker: So first Swebench is on popular open source repos, and all of these popular open source repos were included in the training data for all of the language models. And so the language [00:32:00] models already know these repos. In some cases, the language models already know the individual issues in Swebench.[00:32:06] Speaker: So basically, like, some of the training data has leaked. And so it, it definitely will overestimate with respect to that. I don't think it's like, you know, Horribly, horribly off but I think, you know, it's boosting the accuracy by a little bit. So, maybe that's the biggest reason why. In terms of asking for help, and whether we're benchmarking asking for help yes we are.[00:32:29] Speaker: So one one thing we're working on now, which we're hoping to put out soon, is we we basically made SuperVig. Sweep edge issues. Like I'm having a, I'm having a problem with the matrix multiply. Please help. Because these are like, if anybody's run a popular open source, like framework, these are what half your issues are.[00:32:49] Speaker: You're like users show up and say like, my screen doesn't work. What, what's wrong or something. And so then you need to ask them questions and how to reproduce. So yeah, we're, we're, we're working on [00:33:00] that. I think. It, my impression is that agents are not very good at asking for help, even Claude. So like when, when they ask for help, they'll ask for help when they don't need it.[00:33:11] Speaker: And then won't ask for help when they do need it. So this is definitely like an issue, I think.[00:33:20] Speaker 4: Thanks for the great talk. I also have two questions.[00:33:23] Q&A: Web Agents and Interaction Methods[00:33:23] Speaker 4: It's first one can you talk a bit more about how the web agent interacts with So is there a VLM that looks at the web page layout and then you parse the HTML and select which buttons to click on? And if so do you think there's a future where there's like, so I work at Bing Microsoft AI.[00:33:41] Speaker 4: Do you think there's a future where the same web index, but there's an agent friendly web index where all the processing is done offline so that you don't need to spend time. Cleaning up, like, cleaning up these TML and figuring out what to click online. And any thoughts on, thoughts on that?[00:33:57] Speaker: Yeah, so great question. There's a lot of work on web [00:34:00] agents. 
I didn't go into, like, all of the details, but I think there's There's three main ways that agents interact with websites. The first way is the simplest way and the newest way, but it doesn't work very well, which is you take a screenshot of the website and then you click on a particular pixel value on the website.[00:34:23] Speaker: And Like models are not very good at that at the moment. Like they'll misclick. There was this thing about how like clawed computer use started like looking at pictures of Yellowstone national park or something like this. I don't know if you heard about this anecdote, but like people were like, oh, it's so human, it's looking for vacation.[00:34:40] Speaker: And it was like, no, it probably just misclicked on the wrong pixels and accidentally clicked on an ad. So like this is the simplest way. The second simplest way. You take the HTML and you basically identify elements in the HTML. You don't use any vision whatsoever. And then you say, okay, I want to click on this element.[00:34:59] Speaker: I want to enter text [00:35:00] in this element or something like that. But HTML is too huge. So it actually, it usually gets condensed down into something called an accessibility tree, which was made for screen readers for visually impaired people. And So that's another way. And then the third way is kind of a hybrid where you present the screenshot, but you also present like a textual summary of the output.[00:35:18] Speaker: And that's the one that I think will probably work best. What we're using is we're just using text at the moment. And that's just an implementation issue that we haven't implemented the. Visual stuff yet, but that's kind of like we're working on it now. Another thing that I should point out is we actually have two modalities for web browsing.[00:35:35] Speaker: Very recently we implemented this. And the reason why is because if you want to interact with full websites you will need to click on all of the elements or have the ability to click on all of the elements. But most of our work that we need websites for is just web browsing and like gathering information.[00:35:50] Speaker: So we have another modality where we convert all of it to markdown because that's like way more concise and easier for the agent to deal with. And then [00:36:00] can we create an index specifically for agents, maybe a markdown index or something like that would be, you know, would make sense. Oh, how would I make a successor to Swebench?[00:36:10] Speaker: So I mean, the first thing is there's like live code bench, which live code bench is basically continuously updating to make sure it doesn't leak into language model training data. That's easy to do for Swebench because it comes from real websites and those real websites are getting new issues all the time.[00:36:27] Speaker: So you could just do it on the same benchmarks that they have there. There's also like a pretty large number of things covering various coding tasks. So like, for example, Swebunch is mainly fixing issues, but there's also like documentation, there's generating tests that actually test the functionality that you want.[00:36:47] Speaker: And there there was a paper by a student at CMU on generating tests and stuff like that. So I feel like. 
Swebench is one piece of the puzzle, but you could also have like 10 different other tasks and then you could have like a composite [00:37:00] benchmark where you test all of these abilities, not just that particular one.[00:37:04] Speaker: Well, lots, lots of other things too, but[00:37:11] Speaker 2: Question from across. Use your mic, it will help. Um,[00:37:15] Speaker 5: Great talk. Thank you.[00:37:16] Q&A: Agent Architectures and Improvements[00:37:16] Speaker 5: My question is about your experience designing agent architectures. Specifically how much do you have to separate concerns in terms of tasks specific agents versus having one agent to do three or five things with a gigantic prompt with conditional paths and so on.[00:37:35] Speaker: Yeah, so that's a great question. So we have a basic coding and browsing agent. And I won't say basic, like it's a good, you know, it's a good agent, but it does coding and browsing. And it has instructions about how to do coding and browsing. That is enough for most things. Especially given a strong language model that has a lot of background knowledge about how to solve different types of tasks and how to use different APIs and stuff like that.[00:37:58] Speaker: We do have [00:38:00] a mechanism for something called micro agents. And micro agents are basically something that gets added to the prompt when a trigger is triggered. Right now it's very, very rudimentary. It's like if you detect the word GitHub anywhere, you get instructions about how to interact with GitHub, like use the API and don't browse.[00:38:17] Speaker: Also another one that I just added is for NPM, the like JavaScript package manager. And NPM, when it runs and it hits a failure, it Like hits in interactive terminals where it says, would you like to quit? Yep. Enter yes. And if that does it, it like stalls our agent for the time out until like two minutes.[00:38:36] Speaker: So like I added a new microagent whenever it started using NPM, it would Like get instructions about how to not use interactive terminal and stuff like that. So that's our current solution. Honestly, I like it a lot. It's simple. It's easy to maintain. It works really well and stuff like that. But I think there is a world where you would want something more complex than that.[00:38:55] Speaker 5: Got it. Thank you.[00:38:59] Speaker 6: I got a [00:39:00] question about MCP. I feel like this is the Anthropic Model Context Protocol. It seems like the most successful type of this, like, standardization of interactions between computers and agents. Are you guys adopting it? Is there any other competing standard?[00:39:16] Speaker 6: Anything, anything thought about it?[00:39:17] Speaker: Yeah, I think the Anth, so the Anthropic MCP is like, a way to It, it's essentially a collection of APIs that you can use to interact with different things on the internet. I, I think it's not a bad idea, but it, it's like, there's a few things that bug me a little bit about it.[00:39:40] Speaker: It's like we already have an API for GitHub, so why do we need an MCP for GitHub? Right. You know, like GitHub has an API, the GitHub API is evolving. We can look up the GitHub API documentation. So it seems like kind of duplicated a little bit. And also they have a setting where [00:40:00] it's like you have to spin up a server to serve your GitHub stuff.[00:40:04] Speaker: And you have to spin up a server to serve your like, you know, other stuff. 
And so I think it makes, it makes sense if you really care about like separation of concerns and security and like other things like this, but right now we haven't seen, we haven't seen that. To have a lot more value than interacting directly with the tools that are already provided.[00:40:26] Speaker: And that kind of goes into my general philosophy, which is we're already developing things for programmers. You know,[00:40:36] Speaker: how is an agent different than from a programmer? And it is different, obviously, you know, like agents are different from programmers, but they're not that different at this point. So we can kind of interact with the interfaces we create for, for programmers. Yeah. I might change my mind later though.[00:40:51] Speaker: So we'll see.[00:40:54] Speaker 7: Yeah. Hi. Thanks. Very interesting talk. You were saying that the agents you have right now [00:41:00] solve like maybe 30 percent of your, your issues out of the gate. I'm curious of the things that it doesn't do. Is there like a pattern that you observe? Like, Oh, like these are the sorts of things that it just seems to really struggle with, or is it just seemingly random?[00:41:15] Speaker: It's definitely not random. It's like, if you think it's more complex than it's. Like, just intuitively, it's more likely to fail. I've gotten a bit better at prompting also, so like, just to give an example it, it will sometimes fail to fix a GitHub workflow because it will not look at the GitHub workflow and understand what the GitHub workflow is doing before it solves the problem.[00:41:43] Speaker: So I, I think actually probably the biggest thing that it fails at is, um, er, that our, our agent plus Claude fails at is insufficient information gathering before trying to solve the task. And so if you provide all, if you provide instructions that it should do information [00:42:00] gathering beforehand, it tends to do well.[00:42:01] Speaker: If you don't provide sufficient instructions, it will try to solve the task without, like, fully understanding the task first, and then fail, and then you need to go back and give feedback. You know, additional feedback. Another example, like, I, I love this example. While I was developing the the monitor website that I, I showed here, we hit a really tricky bug where it was writing out a cache file to a different directory than it was reading the cache file from.[00:42:26] Speaker: And I had no idea what to do. I had no idea what was going on. I, I thought the bug was in a different part of the code, but what I asked it to do was come up with five possible reasons why this could be failing and decreasing order of likelihood and examine all of them. And that worked and it could just go in and like do that.[00:42:44] Speaker: So like I think a certain level of like scaffolding about like how it should sufficiently Gather all the information that's necessary in order to solve a task is like, if that's missing, then that's probably the biggest failure point at the moment. [00:43:00][00:43:01] Speaker 7: Thanks.[00:43:01] Speaker 6: Yeah.[00:43:06] Speaker 6: I'm just, I'm just using this as a chance to ask you all my questions.[00:43:09] Q&A: Self-Improving Agents and Authentication[00:43:09] Speaker 6: You had a, you had a slide on here about like self improving agents or something like that with memory. It's like a really throwaway slide for like a super powerful idea. It got me thinking about how I would do it. 
I have no idea how.[00:43:21] Speaker 6: So I just wanted you to chain a thought more on this.[00:43:25] Speaker: Yeah, self-improving. So the simplest possible way to create a self-improving agent would be to have a really, really strong language model with infinite context, so it can just go back, look at all of its past experiences, and learn from them.[00:43:46] Speaker: You might also want to remove the bad stuff, just so it doesn't over-index on its failed past experiences. But the problem is that a really powerful language model is large, infinite context is expensive, and we don't have a good way to [00:44:00] index into it, because at least in my experience, RAG from language to code doesn't work super well.[00:44:08] Speaker: So in the end, that's the way I would like to solve this problem: have infinite context and somehow be able to index into it appropriately. I think that would mostly solve it. Another thing you can do is fine-tuning. RAG is one way to get information into your model.[00:44:23] Speaker: Fine-tuning is another way to get information into your model. So that might be another way of continuously improving: you identify when you did a good job and then add all of the good examples into your model.[00:44:34] Speaker 6: Yeah. So, you know how Voyager tries to write code into a skill library and then you reuse that skill library, right?[00:44:40] Speaker 6: So it improves in the sense that it just builds up the skill library over time.[00:44:44] Speaker: Yep.[00:44:44] Speaker 6: One thing I was thinking about: there's this idea, from Devin, your arch nemesis, of playbooks. I don't know if you've seen them.[00:44:52] Speaker: Yeah, I mean, we're calling them workflows, but they're simpler.[00:44:55] Speaker 6: Yeah, so basically, once a workflow works, you can kind of [00:45:00] persist it as a skill library, right? I feel like there's some in-between there. Like you said, it's hard to do RAG between language and code, but I feel like that is RAG for, like, I've done this before, last time I did it, this worked.[00:45:14] Speaker 6: So I'm just going to shortcut all the stuff that failed before.[00:45:18] Speaker: Yeah, I totally think it's possible. It's just, you know, not trivial at the same time. I'll explain the two curves. So basically, the baseline is just an agent that does it from scratch every time, and this curve up here is agent workflow memory, where it's adding the successful experiences back into the prompt.[00:45:39] Speaker: Why is this improving? The reason is just that it failed on the first few examples, and for the average to catch up, it took a little bit of time. So it's not that this is actually improving over time; you could basically view this one as constant and this one as improving.[00:45:56] Speaker: Basically you can see it's continuing to go [00:46:00] up.[00:46:01] Speaker 8: How do you think we're going to solve the authentication problem for agents right now?[00:46:05] Speaker: When you say authentication, you mean like credentials?[00:46:09] Speaker 8: Yeah.
Because I've seen a few startup solutions today, but it seems like they're limited in the number of websites or actual authentication methods they can handle today.[00:46:19] Speaker: Yeah, great question. So my preferred solution to this at the moment is GitHub fine-grained authentication tokens, which allow you to specify, on a very granular basis, on this repo you have permission to do this, and on that repo you have permission to do that.[00:46:41] Speaker: You can also prevent people from pushing to the main branch unless they get approved, and you can do all of these other things. And I think these were all developed for human developers. Or rather, the branch protection rules were developed for human developers, and the fine-grained authentication tokens were developed for GitHub apps.[00:46:56] Speaker: For GitHub, maybe [00:47:00] just pushing this a little bit further is the way to do it. For other things, they're totally not prepared to give that sort of fine-grained control. Most APIs don't have something like a fine-grained authentication token. And that goes into my comment that we're going to need to prepare the world for agents, I think.[00:47:17] Speaker: But I think the GitHub authentication tokens are a good template for how you could start doing that, maybe. But yeah, I don't have an answer.[00:47:25] Speaker 8: I'll let you know if I find one.[00:47:26] Speaker: Okay. Yeah.[00:47:31] Live Demonstration and Closing Remarks[00:47:31] Speaker: I'm going to finish up. Let me just see.[00:47:37] Speaker: Okay. So this one did write a script. I'm not going to actually read it for you. And then the other one, let's see.[00:47:51] Speaker: Yeah. So it sent a PR, sorry. What is the PR URL?[00:48:00][00:48:02] Speaker: So I don't know if this, sorry, that's taking way longer than it should. Okay, cool. Yeah. So this one sent a PR. I'll tell you later if this actually succeeded. Oh, no, it's deployed on Vercel, so I can actually show you, but let me try this real quick. Sorry, I know I don't have time.[00:48:24] Speaker: Yeah, there you go. I have pie charts now. It's so fun to play with these things, because you can just do that while I'm giving a talk and things like that. So, yeah, thanks. Get full access to Latent Space at www.latent.space/subscribe
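The trigger-based "micro agent" mechanism described in the Q&A above is simple enough to sketch. Below is a minimal, hypothetical Python illustration of the idea; the MicroAgent class, the build_prompt helper, and the trigger keywords are assumptions for illustration, not the project's actual code. When a keyword such as "github" or "npm" appears in the task, the matching instructions get appended to the prompt.

```python
# Minimal sketch of keyword-triggered micro agents, as described in the Q&A above.
# Names (MicroAgent, build_prompt) and trigger keywords are hypothetical.
from dataclasses import dataclass


@dataclass
class MicroAgent:
    name: str
    triggers: list[str]   # keywords that activate this micro agent
    instructions: str     # extra guidance appended to the prompt when triggered


MICRO_AGENTS = [
    MicroAgent(
        name="github",
        triggers=["github"],
        instructions="Interact with GitHub through its REST API; do not drive the browser UI.",
    ),
    MicroAgent(
        name="npm",
        triggers=["npm"],
        instructions="Run npm non-interactively (e.g. set CI=true) so prompts never stall the session.",
    ),
]


def build_prompt(base_prompt: str, task: str) -> str:
    """Append the instructions of every micro agent whose trigger appears in the task."""
    extras = [
        agent.instructions
        for agent in MICRO_AGENTS
        if any(trigger in task.lower() for trigger in agent.triggers)
    ]
    return "\n\n".join([base_prompt, f"Task: {task}", *extras])


if __name__ == "__main__":
    print(build_prompt("You are a coding and browsing agent.",
                       "Open a GitHub pull request after running npm test"))
```

The appeal, as the speaker notes, is that the whole mechanism is just prompt assembly: easy to maintain, easy to audit, and easy to extend by adding another trigger.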
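The fine-grained token pattern from the authentication answer can also be made concrete. The snippet below is a hedged sketch, not a prescribed setup: the repository, branch names, and AGENT_GITHUB_TOKEN environment variable are placeholders. The idea is that the agent only ever holds a GitHub fine-grained personal access token scoped to a single repository, so the permission boundary is enforced by GitHub rather than by the agent's own prompt.

```python
# Sketch: an agent opening a pull request with a fine-grained PAT scoped to one repo.
# Anything outside the granted repo/permissions is rejected by GitHub itself.
import os
import requests

GITHUB_API = "https://api.github.com"
TOKEN = os.environ["AGENT_GITHUB_TOKEN"]   # placeholder: fine-grained PAT limited to one repo
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}


def open_pull_request(owner: str, repo: str, head: str, base: str, title: str) -> str:
    """Open a PR via the REST API; works only if the token grants pull-request write access on this repo."""
    resp = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/pulls",
        headers=HEADERS,
        json={"title": title, "head": head, "base": base},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["html_url"]


if __name__ == "__main__":
    # Branch protection still blocks direct pushes to main, so the agent can only
    # propose changes that a human reviews and merges.
    print(open_pull_request("example-org", "example-repo",
                            head="agent/fix-cache-path", base="main",
                            title="Fix cache file read/write directory mismatch"))
```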
This Week in Startups is brought to you by… Squarespace. Turn your idea into a new website! Go to https://www.Squarespace.com/TWIST for a free trial. When you're ready to launch, use offer code TWIST to save 10% off your first purchase of a website or domain. LinkedIn Ads. To redeem a $100 LinkedIn ad credit and launch your first campaign, go to http://www.linkedin.com/thisweekinstartups PrizePicks - Run your game. THE BEST place to get real money action while watching your favorite sports! Download the app today and use code TWIST to get $50 instantly after you play your first $5 lineup! Today's show: Jason & Alex chat with Varun Mohan, founder & CEO of Codeium, the revolutionary AI-powered coding platform backed by Kleiner Perkins & Founders Fund. Varun shares insights on pivoting to greatness, scaling enterprise solutions, and launching Codeium's new IDE, Windsurf. Timestamps: (0:00) Jason and Alex kick off the show (0:30) Failure examples and Star Wars character confusion (1:38) Introduction to the episode, guests, and Perplexity AI (4:36) Varun Mohan and Codeium's journey in AI and IDE development (11:01) Squarespace - Use offer code TWIST to save 10% off your first purchase of a website or domain at https://www.Squarespace.com/TWIST (12:29) The challenge of moderate success and building 10x better experiences (19:48) LinkedIn Ads - Get a $100 LinkedIn ad credit at http://www.linkedin.com/thisweekinstartups (21:21) Alex Wilhelm's Cascade demo and Codeium's capabilities (24:49) Codeium's pricing and the future of no code development (31:45) PrizePicks - Run your game. Download the app today and use code TWIST to get $50 instantly after you play your first $5 lineup! (33:31) Satya Nadella on SaaS's future and AI replacing agents (36:00) AI's impact on developer jobs, salaries, and remote work (41:27) Elite developers, hiring strategies, and in-person vs. remote work (46:03) Twist 500, upcoming events, and what VCs seek in startups (53:02) Hyper competition in AI services and niche market strategies (58:26) Audience Q&A and closing remarks Subscribe to the TWiST500 newsletter: https://ticker.thisweekinstartups.com Check out the TWIST500: https://www.twist500.com Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp Follow Varun: X: https://x.com/_mohansolo LinkedIn: https://www.linkedin.com/in/varunkmohan Check out: https://codeium.com Follow Alex: X: https://x.com/alex LinkedIn: https://www.linkedin.com/in/alexwilhelm Follow Jason: X: https://twitter.com/Jason LinkedIn: https://www.linkedin.com/in/jasoncalacanis Thank you to our partners: (11:01) Squarespace - Use offer code TWIST to save 10% off your first purchase of a website or domain at https://www.Squarespace.com/TWIST (19:48) LinkedIn Ads - Get a $100 LinkedIn ad credit at http://www.linkedin.com/thisweekinstartups (31:45) PrizePicks - Run your game. Download the app today and use code TWIST to get $50 instantly after you play your first $5 lineup! Great TWIST interviews: Will Guidara, Eoghan McCabe, Steve Huffman, Brian Chesky, Bob Moesta, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland Check out Jason's suite of newsletters: https://substack.com/@calacanis Follow TWiST: Twitter: https://twitter.com/TWiStartups YouTube: https://www.youtube.com/thisweekin Instagram: https://www.instagram.com/thisweekinstartups TikTok: https://www.tiktok.com/@thisweekinstartups Substack: https://twistartups.substack.com Subscribe to the Founder University Podcast: https://www.youtube.com/@founderuniversity1916
Our second podcast guest ever in March 2023 was Varun Mohan, CEO of Codeium; at the time, they had around 10,000 users and vowed to keep their autocomplete free forever. Today, over a million developers use their products, they still have their free tier, and they recently launched Windsurf, an AI IDE. Chapters* 00:00:00: Introductions & Catchup* 00:03:52: Why they created Windsurf* 00:05:52: Limitations of VS Code* 00:10:12: Evaluation methods for Cascade and Windsurf* 00:16:15: Listener questions about Windsurf launch* 00:20:30: Remote execution and security concerns* 00:25:18: Evolution of Codeium's strategy* 00:28:29: Cascade and its capabilities* 00:33:12: Multi-agent systems* 00:37:02: Areas of improvement for Windsurf* 00:39:12: Building an enterprise-first company* 00:42:01: Copilot for X, AI UX, and Enterprise AI blog posts Get full access to Latent Space at www.latent.space/subscribe
Launch, run, and protect your business to make it official TODAY at https://www.legalzoom.com/ and use promo code EYEONAI to get 10% off any LegalZoom business formation product excluding subscriptions and renewals. In this episode of the Eye on AI podcast, Varun Mohan, Co-Founder and CEO of Codeium, shares the journey behind building one of the most innovative AI-powered coding tools available today. Varun joins Craig Smith to explore the future of software development, the evolution of Codeium, and how generative AI is revolutionizing the way developers interact with codebases. As a leader in AI and software engineering, Varun discusses how Codeium integrates advanced AI to enable real-time code completion, refactoring, and even end-to-end application building. From processing billions of lines of code to making complex legacy systems like COBOL more accessible, Codeium is empowering developers to work smarter and faster. We delve into the challenges of managing massive codebases, the importance of maintaining security and self-hosted solutions, and how tools like Codeium stand apart from competitors like Copilot and CodeWhisperer. Varun also introduces **Cascade**, the AI assistant within Codeium's custom IDE, Windsurf, which merges human ingenuity with AI's reasoning capabilities for a seamless coding experience. Other fascinating topics include the role of AI in onboarding developers to complex systems, its ability to handle rare programming languages, and how Codeium is reshaping the software lifecycle from design to deployment. Learn how Codeium is redefining the possibilities of software development with AI, and what the future holds as this technology continues to evolve. Don't miss this deep dive into AI-powered innovation with Varun Mohan. Like, subscribe, and hit the notification bell for more episodes exploring cutting-edge advancements in AI! Stay Updated: Craig Smith Twitter: https://twitter.com/craigss Eye on A.I. Twitter: https://twitter.com/EyeOn_AI (00:00) Introduction to Varun Mohan and Codeium (05:32) Introducing Codeium: AI-powered coding revolution (08:42) Competing with Copilot and CodeWhisperer (11:46) Using Codeium to streamline software development (14:01) Features of Cascade: Refactoring and automation (17:26) Building applications from scratch with Codeium (22:00) Enterprise adoption and recognition by JPMorgan Chase (23:14) The pivot to Codeium and lessons learned (24:32) The vision for Codeium's future
Guests: Varun Mohan, CEO & Co-Founder of Codeium; and Leigh Marie Braswell, partner at Kleiner Perkins. “A lot of people are really bad at knowing what good is,” says Codeium CEO Varun Mohan. Specifically, he's thinking of startups that hire based on a “logo” — a well-known company on the résumé — rather than exceptional talent. Codeium is based in Mountain View, CA, and Varun believes that it's incumbent on any new startup to hire in the San Francisco Bay Area, because of how concentrated exceptional talent is there. “When you hire someone that's 10x better,” he says, “you can't replace them with 10 1x people. Because the 10x person is going to be thinking of ideas that none of these 1x people are ever going to think of.” Chapters: (01:05) - Ludicrous growth (03:54) - Seizing opportunity (07:29) - Product-market fit (13:05) - Scale AI & MIT (17:42) - Coding efficiency (22:58) - Larger companies (25:20) - Varun and Leigh Marie's working relationship (29:51) - Pivoting to Codeium (34:00) - Giving away the product (37:01) - The code-gen landscape (42:20) - Annual reinvention (45:00) - Picking a problem (47:07) - Bipul Sinha's help (50:43) - Ambition (53:13) - Building in Silicon Valley (55:11) - Spotting talent (59:11) - Who Codeium is hiring (59:43) - What “grit” means to Varun Mentioned in this episode: Graham Moreno, Wiz, ChatGPT, Google, Nuro, Goldman Sachs, Waymo, the DARPA Challenge, Alex Wang, Douglas Chen, Safeway, Equinox, Carlos Delatorre and MongoDB, The Qualified Sales Leader by John McMahon, GitHub Copilot, Microsoft, Exafunction, Mamoon Hamid, Figma, JPMorgan Chase, Starlink, SpaceX, Rubrik, Michael Dell, Stripe, and John Doerr. Links: Connect with Varun: LinkedIn, Twitter. Connect with Leigh Marie: LinkedIn, Twitter. Connect with Joubin: Twitter, LinkedIn, Email: grit@kleinerperkins.com. Learn more about Kleiner Perkins. This episode was edited by Eric Johnson from LightningPod.fm
Audiobooks, Starship 5 launch, Tesla robotaxi event, Transformers One, OnlyFans, Apple Vision Pro, Google Glass, Orion, HoloLens, GenAI, Groq, Nobel Prizes, Twitter vs Brazil, OpenAI, GPT o1, Codeium. Contact: Our website Discord YouTube iTunes TuneIn Spotify RSS Twitter Facebook PayPal supporter link! PATREON link Outro: Track: I'm Yours [Lofi Hip Hop/Chill Study Music Mix] Music provided by Lofi Fruits / Strange Fruits Watch: https://youtu.be/9eyqOO7i3b8
In this engaging conversation, Varun Mohan, CEO and Co-Founder of Codeium, shares his journey of building a unicorn AI company in just three and a half years. He discusses the importance of having fun in business, the challenges of navigating the startup landscape, and the critical role of sales strategy and market fit. Varun discusses the future of AI and its impact on software development, as well as Codeium's vision to revolutionize the industry. He also highlights the significance of cultural principles, lean operations, and personal motivation in driving success. Key Takeaways Having fun is essential in business. Codeium achieved unicorn status in three years. Intellectual honesty is vital for startups. A product must solve real pain for customers. ROI is crucial for AI products. Running lean is important for sustainability. It's more valuable to be adaptable than to be right. Personal motivation drives success in startups. Surrounding yourself with smarter people is key.
Gardner Johnson discusses his experiences and challenges in building successful channel ecosystems. He shares his journey from direct sales to the channel and how he developed a playbook for repeatable and scalable channel success. Gardner emphasizes the importance of data and metrics in understanding partner performance and driving success. He also highlights the need for buy-in from the entire go-to-market organization and the value that a well-executed partner ecosystem can bring. The hiring criteria for partner managers are similar to those for account managers, focusing on intelligence, entrepreneurship, and a sense of urgency. In this conversation, Gardner and the hosts discuss the qualities and skills required for success in partner management roles. They highlight the importance of drive, coachability, and experience in building successful partnerships. They also explore the challenges of convincing AEs to transition into partner roles and the need to check ego and take a backseat in sales campaigns. The conversation delves into the critical role of champion building in partner management and the need to build champions both internally and externally. They also discuss the evolving role of the channel in the changing buyer landscape and the emergence of marketplaces. Gardner shares his insights on why the channel is becoming more important and the potential for a channel-first approach. The conversation concludes with Gardner discussing his new role at Codeium and the exciting opportunities in leveraging AI and large language models to solve specific business problems. Key Takeaways • Building a successful channel ecosystem requires buy-in from the entire go-to-market organization. • Data and metrics are crucial in understanding partner performance and driving success. • A well-executed partner ecosystem can bring significant value to an organization. • The hiring criteria for partner managers are similar to those for account managers, focusing on intelligence, entrepreneurship, and a sense of urgency. Drive, coachability, and experience are key qualities for success in partner management roles. • Convincing AEs to transition into partner roles can be a challenge, as it requires checking ego and taking a backseat in sales campaigns. • Champion building is crucial in partner management, both internally and externally. • Enablement plays a critical role in building successful partner ecosystems. • The channel is becoming more important due to the emergence of marketplaces and changing buyer behaviors. • AI and large language models offer opportunities to solve specific business problems and drive meaningful business value.
Scott and Wes serve up a discussion on AI coding assistants with a deep look at Cursor AI, exploring its unique features like multi-line auto-complete and Smart Rewrites. They also discuss why Cursor's intuitive UI stands out and tackle the big question: is it worth the investment? Show Notes 00:00 Welcome to Syntax! 01:16 Brought to you by Sentry.io. 01:48 Handling objections around AI assistants. 02:55 Context windows and how they're improving functionality. Syntax.fm Episode 728 with Kevin Hou of Codeium 04:08 Cursor's UI. 04:51 This is cool, why is it not a plugin? 08:12 What makes the UI interesting. 09:13 Smart Rewrites. 11:44 It can create multiple files. 13:05 Using the chat interface. 16:32 Another chat example. 20:22 The main features of Cursor. 21:55 Multi-line auto-complete. 23:55 Using docs for additional context. 27:26 AI is here to help you, not replace you. 33:27 Is it worth it? 33:55 The pricing. 44:10 Sick Picks & Shameless Plugs. Sick Picks Scott: The Iron Historian, Scott's Salt & Pepper Mills. Wes: Oxo Salt & Pepper Mills. Shameless Plugs Scott: Syntax.fm Zed Theme. Hit us up on Socials! Syntax: X Instagram Tiktok LinkedIn Threads Wes: X Instagram Tiktok LinkedIn Threads Scott: X Instagram Tiktok LinkedIn Threads Randy: X Instagram YouTube Threads
This week, we discuss Intel's challenges, Elastic's adopting the AGPL, and getting AI to introduce itself. Plus, some thoughts on a gesticulating flâneur using a speakerphone in public. Watch the YouTube Live Recording of Episode 483 (https://www.youtube.com/live/iOnzx3rL_hE?si=1Z53JKvpHYRmTji7) Runner-up Titles Why are you in an attic Flâneur Gesticulating That's not I'm over the big pile of nachos What's your corporate strategy here? They carry the flame Keeper of the Flame Cote' likes weather Brakenthetical Rundown Intel Intel Said to Explore Options to Cope With Historic Slump (https://www.bloomberg.com/news/articles/2024-08-30/intel-is-said-to-explore-options-to-cope-with-historic-slump) Intel debuts second-gen Core Ultra mobile chips in bid to fend off Qualcomm, AMD (https://finance.yahoo.com/news/intel-debuts-second-gen-core-ultra-mobile-chips-in-bid-to-fend-off-qualcomm-amd-160038473.html) 'Time is running out' for Intel CEO Patrick Gelsinger: Analyst (https://finance.yahoo.com/video/time-running-intel-ceo-patrick-205943896.html) Report: Intel CEO Pat Gelsinger to reveal new plan to cut costs and sell off business units (https://siliconangle.com/2024/09/01/report-intel-ceo-pat-gelsinger-reveal-new-plan-cut-costs-sell-off-business-units/) “Broadcom and Intel?” (https://stratechery.com/2024/broadcom-and-intel-nvidia-earnings-doj-investigating-nvidia/) Qualcomm's new eight-core Snapdragon X Plus makes these Windows laptops cheaper (https://www.theverge.com/2024/9/4/24235065/qualcomm-8-core-snapdragon-x-plus-windows-laptops) Senior Intel CPU architects splinter to develop RISC-V processors at AheadComputing (https://www.tomshardware.com/tech-industry/senior-intel-cpu-architects-splinter-to-develop-risc-v-processors-veterans-establish-aheadcomputing) OSS Licensing Elasticsearch is Open Source, Again (https://www.elastic.co/blog/elasticsearch-is-open-source-again) Why Elastic (ESTC) Shares Are Trading Lower Today (https://finance.yahoo.com/news/why-elastic-estc-shares-trading-154216841.html) The slow evaporation of the free/open source surplus (https://www.baldurbjarnason.com/2024/the-slow-evaporation-of-the-foss-surplus/) Post-Open Source (https://www.boringcactus.com/2020/08/13/post-open-source.html) (circa 2020) Relevant to your Interests Data infrastructure startup Cribl raises $319M at a $3.5B valuation (https://techcrunch.com/2024/08/27/data-infrastructure-startup-cribl-raises-319m-at-a-3-5b-valuation/?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axiosprorata&stream=top) Cribl Closes $319M Series E Round at a $3.5B Valuation to Revolutionize Enterprise Data Management (https://cribl.io/blog/announcing-our-series-e/) Broadcom Launches Rally Anywhere (https://investors.broadcom.com/news-releases/news-release-details/broadcom-launches-rally-anywhere-premises-version-its-leading) Courts close the loophole letting the feds search your phone at the border (https://reason.com/2024/07/26/courts-close-the-loophole-letting-the-feds-search-your-phone-at-the-border/) GitHub Copilot competitor Codeium raises $150M at a $1.25B valuation (https://techcrunch.com/2024/08/29/github-copilot-competitor-codeium-raises-150m-at-a-1-25b-valuation/) Amazon's Remarkable Alexa will reportedly run on Claude AI and cost $5-10 per month 
(https://www.engadget.com/ai/amazons-remarkable-alexa-will-reportedly-run-on-claude-ai-and-cost-5-10-per-month-122532161.html?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAANR-tMgHgoAVIPpbIFukogu1d3pwdPd_GUdl0sjGCYNHFhfKgwOjeuFmr1t3Hgf6Lr2icFYYS13rAx4TimFx4Tpzmn1N0MKYtCSGd_LSydKC-avA5igy4bwhj4meKgXe04Bcbxke9b36h40_4yeupf2uivsAlg9_J8KFvqgAy-SV) Artifacts are now generally available (https://www.anthropic.com/news/artifacts?ueid=3d6a2c800561880c901963d51e644d95&utm_source=Sailthru&utm_medium=email&utm_campaign=2024-08-31%20Installer%2050&utm_term=Installer) Founder Mode (https://paulgraham.com/foundermode.html) Research Papers into Podcasts (https://www.tiktok.com/t/ZTF1Jj9dH/) via Google Illuminate (https://illuminate.google.com/home?pli=1) Andreessen Horowitz Ditches Miami Two Years After Opening Office (https://www.bloomberg.com/news/articles/2024-09-04/andreessen-horowitz-ditches-miami-two-years-after-opening-office) Clearview AI fined $33.7 million by Dutch data protection watchdog over 'illegal database' of faces (https://apnews.com/article/clearview-ai-facial-recognition-privacy-fine-netherlands-a1ac33c15d561d37a923b6c382f48ab4) China leads in up to 89% of tech research, study shows (https://www.techradar.com/pro/china-leads-in-up-to-89-of-tech-research-study-shows) This 451 Research highlights Kloudfuse (https://www.kloudfuse.com/analystreports/451researchkloudfusecoverage) Nonsense Beaches are for people who enjoy the bureaucracy of going to the beach (https://interconnected.org/home/2024/08/29/beaches). Costco membership fee increase starts Sept. 1 for new members and renewals (https://www.axios.com/2024/08/31/costco-membership-cost-increase-2024-renewal-price). The penny. (https://www.axios.com/newsletters/axios-pro-rata-cc0b0182-805c-4be3-8d7c-cd9da8aa9779.html?chunk=1&utm_term=emshare#story1) Sponsor Nasuni: Head to nasuni.com/software (https://bit.ly/3MvMDoY) and see how it can revolutionize your data infrastructure today! Listener Feedback Amazon.com: Fulext Sleep Headphones Bluetooth Headband (https://www.amazon.com/Headphones-Bluetooth-Fulext-Cancelling-Meditating/dp/B0BBL1GMHC/ref=asc_df_B0BBL1GMHC/?tag=hyprod-20&linkCode=df0&hvadid=693275908971&hvpos=&hvnetw=g&hvrand=17257257273966662560&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9028305&hvtargid=pla-1928720591899&psc=1&mcid=470edb27a8173df5bccd8f05cc58a325) Shokz Bone Conduction Headphone (https://shokz.com/) Conferences Civo Navigate Europe, Berlin (https://www.civo.com/navigate/europe), Sept 10-11, 2024. SREday London 2024 (https://sreday.com/2024-london/), Sept 19–20, 2024. Coté speaking, 20% off with code SRE20DAY. Cloud Foundry Day EU (https://events.linuxfoundation.org/cloud-foundry-day-europe/), Karlsruhe, GER, Oct 9, 2024, 20% off with code CFEU24VMW. VMware Explore Barcelona (https://www.vmware.com/explore/eu), Nov 4-7, 2024. Coté speaking. SREday Amsterdam (https://sreday.com/2024-amsterdam/), Nov 21, 2024. Coté speaking (https://sreday.com/2024-amsterdam/Michael_Cote_VMwarePivotal_We_Fear_Change), 20% off with code SRE20DAY. 
SDT News & Community Join our Slack community (https://softwaredefinedtalk.slack.com/join/shared_invite/zt-1hn55iv5d-UTfN7mVX1D9D5ExRt3ZJYQ#/shared-invite/email) Email the show: questions@softwaredefinedtalk.com (mailto:questions@softwaredefinedtalk.com) Free stickers: Email your address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) Follow us on social media: Twitter (https://twitter.com/softwaredeftalk), Threads (https://www.threads.net/@softwaredefinedtalk), Mastodon (https://hachyderm.io/@softwaredefinedtalk), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com) Watch us on: Twitch (https://www.twitch.tv/sdtpodcast), YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured), Instagram (https://www.instagram.com/softwaredefinedtalk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk) Book offer: Use code SDT for $20 off "Digital WTF" by Coté (https://leanpub.com/digitalwtf/c/sdt) Sponsor the show (https://www.softwaredefinedtalk.com/ads): ads@softwaredefinedtalk.com (mailto:ads@softwaredefinedtalk.com) Recommendations Brandon: Slow Horses (https://www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwigrYDZxamIAxUnh8IIHePYKxIYABAAGgJqZg&ae=2&co=1&gclid=Cj0KCQjwiuC2BhDSARIsALOVfBJ1WsnYOESOZRg8Ap8iUw1C9wYhRV1nBMLjjALo3grF-ZePh8aZ_NoaAhiFEALw_wcB&ohost=www.google.com&cid=CAESVeD2D814G1UIoqvw-1HqoEsvv6MaYwDg2-N74hrOR-bCYEOGPDzVcSkVVs58ogPWXwlW9yc3L9DdHBTGpVY_pgFBJ_G_VPyjTfyMmS-sFlNeY8UcCB0&sig=AOD64_3l1gukzjldLrXEWw5KGhEWtlUzVA&q&adurl&ved=2ahUKEwiapvjYxamIAxXwGtAFHdi7IsAQ0Qx6BAgIEAE) Season 4 (https://www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwigrYDZxamIAxUnh8IIHePYKxIYABAAGgJqZg&ae=2&co=1&gclid=Cj0KCQjwiuC2BhDSARIsALOVfBJ1WsnYOESOZRg8Ap8iUw1C9wYhRV1nBMLjjALo3grF-ZePh8aZ_NoaAhiFEALw_wcB&ohost=www.google.com&cid=CAESVeD2D814G1UIoqvw-1HqoEsvv6MaYwDg2-N74hrOR-bCYEOGPDzVcSkVVs58ogPWXwlW9yc3L9DdHBTGpVY_pgFBJ_G_VPyjTfyMmS-sFlNeY8UcCB0&sig=AOD64_3l1gukzjldLrXEWw5KGhEWtlUzVA&q&adurl&ved=2ahUKEwiapvjYxamIAxXwGtAFHdi7IsAQ0Qx6BAgIEAE) Matt: open source Synergy (https://github.com/symless/synergy/) Coté: denim pearl snap shirt at C&A (https://www.c-and-a.com/be/nl/shop/spijkerblouse-regular-fit-kent-2222891/1) Little Shop of Hoarders (https://www.facebook.com/littleshopofhoardersantwerpen/) in Antwerp. Photo Credits Header (https://unsplash.com/photos/brown-and-black-concrete-building-ypcEalNjg0Y) Artwork (https://unsplash.com/photos/photography-of-person-walking-on-sideway-looking-at-leftside-with-buildings-olpYWcO2kHw)
During the week of August 24 to 30, 2024, multiple startups raised substantial funding, indicating increased investment in AI and biotech sectors. Key funding events included:1. **Magic** raised $320 million for AI software development, achieving a total of $465 million in funding since its founding in 2022.2. **Cribl** secured $319 million in a Series E round for data observability, increasing its valuation to $3.5 billion.3. **Codeium** completed a $150 million Series C for AI coding suggestions, totaling $243 million raised since 2021.4. **Elektrofi** raised $112 million in its Series C for drug delivery innovations, founded in 2016.5. **Navigator Medicines** closed a $100 million Series A to develop immune regulation therapies, marking its first funding since 2024.6. **Parry Labs** garnered $80 million for defense technology systems, representing its initial institutional investment.7. **RentVine** obtained $74 million for property management software since its inception in 2005.8. **Bridge** raised $40 million for Web3 payment solutions, totaling $58 million since 2022.9. **Noetik** secured $40 million for cancer therapeutics, with a total funding of $54 million since 2022.10. **Butlr** raised $38 million to enhance sensor technology, upping its total to $68 million since 2019.Additionally, **Xinwanxing Composites**, a Chinese firm, secured approximately $141 million, marking the largest funding deal outside the U.S. this week. Overall, this week showcased significant funding activity across various innovative sectors.Learn more on this news visit us at: https://greyjournal.net/ Hosted on Acast. See acast.com/privacy for more information.
Episode SummaryAre you ready to revolutionize your coding experience with cutting-edge AI tools? In this episode of The Secure Developer, host Danny Allan is joined by Jeff Wang, Head of Business at Codeium, to take a deep dive into the transformative power of generative AI in software development. Discover how coding assistants have evolved from simple auto-complete functions to sophisticated AI-driven tools, the significant impact these advancements have had on productivity and innovation, and how Codeium is addressing some of the security challenges they pose. Tuning in, you'll learn how you can stay ahead in the rapidly changing tech landscape and supercharge your development process.Show NotesIn this insightful episode of The Secure Developer, host Danny Allan sits down with Jeff Wang from Codeium to explore the rapidly evolving world of AI-powered coding assistants. As organizations increasingly look to harness the power of Generative AI in software development, Jeff provides a comprehensive overview of how these tools transform the coding landscape.The conversation starts with a journey through the history of coding assistants, from early autocomplete features to today's sophisticated AI-driven tools. Jeff explains how Large Language Models (LLMs) have revolutionized code generation, offering unprecedented levels of accuracy and efficiency. He delves into the various features of modern coding assistants, including chat functions for code understanding and debugging, highlighting how these tools cater to both junior and senior developers.Security concerns are a key focus of the discussion, with Jeff addressing how Codeium tackles data privacy and protection. He outlines strategies such as air-gapped deployments and local data processing to ensure that sensitive code remains secure. The episode also touches on the challenges of measuring the impact of these tools, with Jeff sharing insights on how companies are quantifying success through metrics like code generation percentage and developer productivity.Looking to the future, Jeff and Danny explore the potential trajectories of AI in software development. They discuss the possibility of more complex, multi-step AI processes and the integration of AI across the entire software development lifecycle. The conversation concludes with thought-provoking insights on how AI coding assistants are improving productivity and enabling developers and organizations to "dream bigger" and tackle more ambitious projects.This episode offers listeners a deep dive into the cutting-edge world of AI-assisted coding, providing valuable insights for developers, technology leaders, and anyone interested in the future of software development. Tune in to understand how these tools reshape the industry and why they're becoming essential to modern development practices.LinksCodeium Follow UsOur WebsiteOur LinkedIn
Ed speaks with Varun Mohan and Jeff Wang from Codeium, an AI code generator. They discuss the importance of being a lean company, how their product stacks up against competitors and why having a level of paranoia has been imperative to their success. Follow the podcast across socials @profgpod: Instagram Threads X Reddit Follow Ed on Instagram and X Learn more about your ad choices. Visit podcastchoices.com/adchoices
Join Varun Mohan, CEO of Codeium, as he discusses the evolution of his company from a GPU virtualization startup to a leading AI-powered coding assistant. He explains how Codeium differentiates itself by training its own models, leveraging context from entire codebases, and focusing on improving the entire software development lifecycle beyond just code writing. Mohan also shares his views on the future of coding assistants, predicting an increase in AI-generated code and a profound change in the IDE experience, while emphasizing that these tools will likely create more developer jobs rather than replace them.
From humble beginnings in Sunnyvale to leading a groundbreaking AI company, Varun Mohan's journey is a testament to resilience, innovation, and a relentless pursuit of impactful solutions. In this interview, Varun shares his story, insights on the evolution of technology, and the strategic pivots that led him from autonomous vehicles to the creation of Codeium, an AI code acceleration tool transforming the software development landscape. Codeium has attracted funding from top-tier investors like Kleiner Perkins, General Catalyst, Greenoaks, and Founders Fund.
Varun Mohan, co-founder of Codeium, shares the journey of his company from being a GPU virtualization company to building an end-to-end application for code AI. They made a hard pivot when they realized that transformers were going to become popular and decided to build a product around it. Codeium now has over 600,000 users and processes over 100 billion tokens of code every day. Varun emphasizes the importance of maintaining a lean company and constantly questioning the value of each decision. They prioritize trust, accountability, and realism in their culture and give away their Legos to encourage reinvention. They also focus on attracting top talent by fostering a culture of achievement and showcasing their massive vision and breakneck speed of execution. As the company grows, they aim to maintain their culture of innovation and scalability. Takeaways Codeium made a hard pivot from being a GPU virtualization company to building an end-to-end application for code AI. They prioritize maintaining a lean company and constantly questioning the value of each decision. Trust, accountability, and realism are key cultural principles at Codeium. They give away their Legos and encourage reinvention to foster a culture of innovation. Attracting top talent is achieved by showcasing their massive vision and breakneck speed of execution. As the company grows, they aim to maintain their culture of innovation and scalability. Chapters 00:00 Introduction 05:12 The Challenges and Benefits of Making a Hard Pivot 11:10 Trust, Accountability, and Innovation in a High-Performance Culture 13:55 Attracting Top Talent: Achievement, Vision, and Impact 18:22 Managing Burnout: Focusing on the Most Important Tasks 22:50 The Key to Startup Success: Irrational Optimism and Uncompromising Realism
Today's guest on the OneHaas alumni podcast is Jeff Wang, the head of business at Codeium and co-founder of RocketFuel Education. Jeff grew up in Chicago with a passion for experimenting with the latest cutting edge technology. After some unfilling corporate jobs, Jeff got his MBA at Haas and jumped head first into the startup world. From there, he found a new passion for crypto and AI and started writing his own newsletter filled with keen market analysis. Jeff and host Sean Li chat about his unique view o n the crypto and AI markets, what Jeff views as the best uses of AI currently, how those uses could shift in the near future, and if the overall impact of AI on our world is net positive or negative. *OneHaas Alumni Podcast is a production of Haas School of Business and is produced by University FM.*Episode Quotes:What he got from his time at Haas“Open doors is probably the biggest kind of value that [Haas] can bring and meeting people that also were in the spirit. And I think at least like two or three of the opportunities after just came from other classmates who were trying to build something. And I think that's saying something that, yeah, opening doors is not because of going to a class. It really is like people that you interact with and have common values or common alignments of what you want to build.”On what RocketFuel Education is“We converted that into kind of like lessons on the crypto markets. And now it's more like crypto macro and AI markets and just like really understanding what's going on. And then again, being predictive.And if I'm accurate, sometimes that's great. And sometimes I'll be wrong, but I think over time, just having the understanding. And really understanding how markets work on RocketFuel Education, that is why people stick to it.”How he got the idea for RocketFuel“If you join these crypto communities, you actually get some really good insights as to what is upcoming that nobody else is going to be joining. If you go to these crypto conferences, you meet people that are actually the CEOs of these projects. And you can see if they're like for real or not. Or you could even meet the CEOs of projects that had not even been released yet. And you could actually invest in those companies as well. So you can get an edge by just being very early. And a lot of those interactions like kind of compelled me to be like, ‘Hey, spending all this time doing all this research, at least I should put a brain dump of that somewhere.'”On how AI is going to help humans“Everywhere that we are stuck in right now, like even if it's due to physics or if it's due to just manpower, right? Anywhere that humanity has slowed down. I think AI is just going to speed it back up again. I mean just think about like, if I could add more, headcount to any problem that humanity is facing. I think AI is kind of that solution, right?”Show Links:LinkedIn ProfileJeff's education platform: https://rocketfueledu.com/Jeff's AI blog: https://jeffwang.substack.com/Codeium: https://codeium.com/Twitter: @jeffwangcrypto Support this podcast at — https://redcircle.com/onehaas/donations
In this episode of Gradient Dissent, Varun Mohan, Co-Founder & CEO of Codeium, joins host Lukas Biewald to discuss the transformative power of AI in coding. They explore how Codeium evolved from GPU virtualization to a widely used AI coding tool and tackled the technical challenges and future prospects of AI-assisted software development. Varun shares insights on overcoming performance and latency issues and how AI can significantly enhance engineering velocity. This episode offers an in-depth look at the intersection of AI and coding, highlighting both technological advancements and the potential for more efficient development processes.✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYzConnect with Varun Mohan:https://www.linkedin.com/in/varunkmohan/ https://x.com/_mohansolo Follow Weights & Biases:https://twitter.com/weights_biases https://www.linkedin.com/company/wandb Join the Weights & Biases Discord Server:https://discord.gg/CkZKRNnaf3
In this Potluck episode of Syntax, Scott and Wes tackle community questions on handling overwork, teaching programming to family, and React's Server Components. They also discuss quantifying accessibility improvements, choosing course platforms, and navigating professional growth in challenging workplaces. Show Notes 00:00 Welcome to Syntax! Thanks to Codeium. Syntax Episode 728 with Codeium. 01:37 Brought to you by Sentry.io. 02:38 Rewarding hard work with more work. 07:14 April Fools Episode. Syntax Episode 750 on April Fools. 09:03 Teaching programming to your family. Codespark. 15:56 React Server Components using JS runtime. 19:58 What happened to the intro music? 23:06 How can I quantify my work to the executive leadership team? 29:16 Should I host my own course website or use Udemy? 35:34 Navigating professional growth in toxic workplaces. 40:43 Getting started in native dev with tools like Tamagui. Syntax Episode 601 on Tamagui. 45:42 Cloudflare, what the heck is it? 50:49 Sick Picks + Shameless Plugs. You're looking for this > https://sentry.shop/products/sentry-aint-broke-tee Sick Picks Scott: Folex. Wes: CalDigit TS4. Shameless Plugs Scott: Syntax on YouTube Wes: Ditto Hit us up on Socials! Syntax: X Instagram Tiktok LinkedIn Threads Wes: X Instagram Tiktok LinkedIn Threads Scott:X Instagram Tiktok LinkedIn Threads Randy: X Instagram YouTube Threads
Varun Mohan is the co-founder and CEO of Codeium, an AI code generation tool used by hundreds of thousands of developers. They recently announced their $65M Series B led by Kleiner Perkins with participation from Greenoaks and General Catalyst. He was previously at Nuro. He has bachelor's and master's degrees from MIT. Varun's favorite book: The Idea Factory (Author: Jon Gertner) (00:00) Introduction and State of Play (03:03) What Generative AI Can Do Well (06:10) Introduction to Codeium (08:53) Handling Different Programming Languages (11:26) Model Architectures and Optimization (13:27) Interpreting and Trusting AI Decisions (18:33) Security and Privacy Considerations (20:07) Impact on Software Quality and Developers (21:50) Potential Obsolescence of Programming Languages (23:39) Handling Edge Cases (26:07) The Biggest Impact of Generative AI for Coding (28:27) Technological Breakthroughs in Generative AI (29:30) Rapid Fire Round--------Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: https://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 Twitter: https://twitter.com/prateekvjoshi
In this supper club, Scott and Wes welcome Kevin Hou, Head of Product Engineering at Codeium, a blazing fast AI-powered code completion and chat tool for developers. Show Notes 00:00 Welcome to Syntax! 05:31 An introduction to Codeium. 07:56 What information are you sending the AI to get such good completions? Codeium compliance 09:15 Codeium runs a 'Language Server'. 11:15 Crawling dependency tree and abstract syntax tree. 12:07 Using Codeium Live. 12:34 How big of a codebase can you run this on? 14:39 Sending select amounts of data to AI. 16:06 Does Codeium maintain codebase preferences and styling (i.e. snake case)? 17:39 Will Codeium scan the dependency? 19:23 AI UI, have we found the best format? 21:55 Crazy ideas in tech. 22:53 Additional AI UI inputs. 24:14 How do you make an AI model? 28:42 How does Codeium manage the product roadmap? 32:09 Do AI models get worse over time? How does Codeium validate that it's not? Open AI Evals 35:39 How is Codeium THAT fast? 36:49 What programming languages does Codeium use? 38:55 Codeium Playground. Codeium Playground 39:15 Caching as a performance improvement. 39:58 What is the pipe between Codeium and editor? 40:17 Codeium chat service. 41:44 A WebSocket system allowing push and pull communication. 43:13 Closed Beta for GPT 4.0. 46:12 The dreaded closing quote bug. 48:26 Sometimes bugs bug Wes. 49:49 Supper Club Questions 51:40 Perplexity.ai 52:35 What editor does Kevin use? 55:31 Sick Picks + Shameless Plugs Sick Picks Kevin: Lapse.com Shameless Plugs Kevin: Codeium.com Hit us up on Socials! Syntax: X Instagram Tiktok LinkedIn Threads Wes: X Instagram Tiktok LinkedIn Threads Scott: X Instagram Tiktok LinkedIn Threads Randy: X Instagram YouTube Threads
AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
In this episode, we delve into Codeium's impressive $65 million funding round and its impact on the future of AI-powered tools for developers. Discover how this significant investment is set to revolutionize the way developers code and enhance productivity in the tech industry. Invest in AI Box: https://Republic.com/ai-box Get on the AI Box Waitlist: https://AIBox.ai/ AI Facebook Community Learn About ChatGPT Learn About AI at Tesla
Technovation with Peter High (CIO, CTO, CDO, CXO Interviews)
842: Software development, like the tech landscape as a whole, is constantly evolving, and the pace of change is faster than ever. Now, with the advent of generative AI and ChatGPT, a significant paradigm shift is occurring. Varun Mohan, co-founder and CEO of the code acceleration toolkit Codeium, joins Peter High on the broadcast to explain this shift and how generative AI is being leveraged to accelerate coding and the software development lifecycle. Varun shares the genesis story of Codeium, the benefit his infrastructure background had on building the company, and the trends he noticed signaling the impact of artificial intelligence on software development. He shares the future ahead for generative AI as a whole, the value proposition that Codeium's generative AI toolkit offers its customers and the best practices that executives should follow when partnering with generative AI vendors. Finally, Varun talks about how Codeium competes with larger-scale organizations, the company's plans with its new Series B investment, and the lessons he learned from his career.
We are announcing the second edition of our Latent Space demo day event in SF on 2/23: Final Frontiers, a startup and research competition in “The Autonomous Workforce”, ”Beyond Transformers & GPUs”, and “Embodied AI”. RSVP here! The first one was aimed for 15-20 people and ended up blowing up to >200 and covered in the Information - let's see what a year of growth (and competition) does to the local events space in 2024.You can find all Latent Space events here, and of course get in touch with us to host your own AI Engineer meetups like AI Engineering Singapore.In our December 2023 recap we covered the Four Wars of the AI stack. But how do we know when it's time to crown a winner? As we kick off 2024, we wanted to do a recap of the State of AI in 2023 to set a baseline of adoption for different products. Retool had a great report at the end of last year which covered a lot of it. David Hsu, CEO and co-founder of Retool, joined us to go over it together. We also talked about the history of Retool, why they were too embarrassed to present at YC demo day, and how they got to $1M ARR with 3 employees. If you're a founder, there are a lot of nuggets of advice in here!Retool AIIn our modeling of the “Software 3.0 Stack”, we have generally left a pretty wide open gap as to the “user interface” equivalent of the AI stack:Retool AI launched 4 months ago with some nifty features for SQL generation, and its own hosted vector storage service (using pgvector). However, as he explains on the pod, the more interesting potential of Retool is in helping developers build AI infused applications quickly, in combination with its Workflows feature. This moves Retool down the stack from just the UI for internal tooling to the business logic “piping” as well. There are a bunch of dedicated tools in this space like Respell, BuildShip, Flowise, and Ironclad Rivet."We think that practically every internal app is going to be AI infused over the next three years." - David on the podRIP StackOverflow?In July 2023 we talked about the impact of ChatGPT and Copilot:This was then disputed by StackOverflow, who pointed out (very fairly so) that there were privacy-related changes in their analytics instrumentation in 2022. StackOverflow no longer reports traffic, but based on StackOverflow's continuing transparency we can see that organic declines have continued throughout 2023.Retool's report comes over a year after those changes and has some self reported samples from users:* 57.6% of people said they have used StackOverflow less; almost all of them replaced it with ChatGPT and Copilot.* 10.2% said they no longer use StackOverflow.We also saw a lot more tools being released in the dev tools space such as (one of our oldest pod friends) Codeium (which just raised a $65M Series B), SourceGraph (and their newly released Cody), Codium AI (just released AlphaCodium which was picked up by Karpathy), Phind (which beat GPT-4 with OSS models), and Cursor, one of the most beloved products in the dev community at the moment. Intelligence is getting closer and closer to the IDE, and the trend doesn't seem to be reverting. We already said that “You are not too old (to pivot into AI)“, and the advice still stands. When asked to rate “Preference for hiring engineers effective at using ChatGPT/Copilot for coding” on a scale of 1 to 10, where 10 is “Much more likely”, ~40% of companies voted 8-10. Having an AI Engineer skillset is extremely important. 
45% of companies between 1,000-4,999 employees said that they increased the difficulty of technical interviews to compensate for these new tools, so the gap between users and non-users will keep widening.Crossing the AI in Production ChasmGeoffrey Moore's “Crossing the Chasm” is one of the most quoted business frameworks. Every market has an initial group of Innovators and Early Adopters, who are willing to suffer through the rough edges of products initially, and eventually crosses into the Early Majority, which expects a full product.In the AI world, ChatGPT and Midjourney / DALL-E have crossed the chasm in the consumer space. Copilot is probably the only tool that did it in the enterprise, having crossed 1M paid users. ~$50B were invested in AI in 2023, and we still only have
We are running an end of year survey for our listeners. Let us know any feedback you have for us, what episodes resonated with you the most, and guest requests for 2024! RAG has emerged as one of the key pieces of the AI Engineer stack. Jerry from LlamaIndex called it a “hack”, Bryan from Hex compared it to “a recommendation system from LLMs”, and even LangChain started with it. RAG is crucial in any AI coding workflow. We talked about context quality for code in our Phind episode. Today's guests, Beyang Liu and Steve Yegge from SourceGraph, have been focused on code indexing and retrieval for over 15 years. We locked them in our new studio to record a 1.5-hour masterclass on the history of code search, retrieval interfaces for code, and how they get a SOTA 30% completion acceptance rate in their Cody product by being better at the “bin packing problem” of LLM context generation. Google Grok → SourceGraph → Cody: While at Google in 2008, Steve built Grok, which lives on today as Google Kythe. It allowed engineers to do code parsing and searching across different codebases and programming languages. (You might remember this blog post from Steve's time at Google.) Beyang was an intern at Google at the same time, and Grok became the inspiration to start SourceGraph in 2013. The two didn't know each other personally until Beyang brought Steve out of retirement 9 years later to join him as VP Engineering. Fast forward 10 years, SourceGraph has become the best code search tool out there and raised $223M along the way. Nine months ago, they open sourced SourceGraph Cody, their AI coding assistant. All their code indexing and search infrastructure allows them to get SOTA results by having better RAG than competitors: * Code completions as you type that achieve an industry-best Completion Acceptance Rate (CAR) as high as 30% using a context-enhanced open-source LLM (StarCoder) * Context-aware chat that provides the option of using GPT-4 Turbo, Claude 2, GPT-3.5 Turbo, Mistral 7x8B, or Claude Instant, with more model integrations planned * Doc and unit test generation, along with AI quick fixes for common coding errors * AI-enhanced natural language code search, powered by a hybrid dense/sparse vector search engine. There are a few pieces of infrastructure that helped Cody achieve these results. Dense-sparse vector retrieval system: For many people, RAG = vector similarity search, but there's a lot more that you can do to get the best possible results. From their release: "Sparse vector search" is a fancy name for keyword search that potentially incorporates LLMs for things like ranking and term expansion (e.g., "k8s" expands to "Kubernetes container orchestration", possibly weighted as in SPLADE): * Dense vector retrieval makes use of embeddings, the internal representation that LLMs use to represent text. Dense vector retrieval provides recall over a broader set of results that may have no exact keyword matches but are still semantically similar. * Sparse vector retrieval is very fast, human-understandable, and yields high recall of results that closely match the user query. * We've found the approaches to be complementary. There's a very good blog post by Pinecone on SPLADE for sparse vector search if you're interested in diving in. If you're building RAG applications in areas that have a lot of industry-specific nomenclature, acronyms, etc., this is a good approach to getting better results. SCIP: In 2016, Microsoft announced the Language Server Protocol (LSP) and the Language Server Index Format (LSIF).
This protocol makes it easy for IDEs to get all the context they need from a codebase to get things like file search, references, “go to definition”, etc. SourceGraph developed SCIP, “a better code indexing format than LSIF”:* Simpler and More Efficient Format: SCIP utilizes Protobuf instead of JSON, which is used by LSIF. Protobuf is more space-efficient, simpler, and more suitable for systems programming. * Better Performance and Smaller Index Sizes: SCIP indexers, such as scip-clang, show enhanced performance and reduced index file sizes compared to LSIF indexers (10%-20% smaller)* Easier to Develop and Debug: SCIP's design, centered around human-readable string IDs for symbols, makes it faster and more straightforward to develop new language indexers. Having more efficient indexing is key to more performant RAG on code. Show Notes* Sourcegraph* Cody* Copilot vs Cody* Steve's Stanford seminar on Grok* Steve's blog* Grab* Fireworks* Peter Norvig* Noam Chomsky* Code search* Kelly Norton* Zoekt* v0.devSee also our past episodes on Cursor, Phind, Codeium and Codium as well as the GitHub Copilot keynote at AI Engineer Summit.Timestamps* [00:00:00] Intros & Backgrounds* [00:05:20] How Steve's work on Grok inspired SourceGraph for Beyang* [00:08:10] What's Cody?* [00:11:22] Comparison of coding assistants and the capabilities of Cody* [00:16:00] The importance of context (RAG) in AI coding tools* [00:21:33] The debate between Chomsky and Norvig approaches in AI* [00:30:06] Normsky: the Norvig + Chomsky models collision* [00:36:00] The death of the DSL?* [00:40:00] LSP, Skip, Kythe, BFG, and all that fun stuff* [00:53:00] The SourceGraph internal stack* [00:58:46] Building on open source models* [01:02:00] SourceGraph for engineering managers?* [01:12:00] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:16]Swyx: Hey, and today we're christening our new podcast studio in the Newton, and we have Beyang and Steve from Sourcegraph. Welcome. [00:00:25]Beyang: Hey, thanks for having us. [00:00:26]Swyx: So this has been a long time coming. I'm very excited to have you. We also are just celebrating the one year anniversary of ChatGPT yesterday, but also we'll be talking about the GA of Cody later on today. We'll just do a quick intros of both of you. Obviously, people can research you and check the show notes for more. Beyang, you worked in computer vision at Stanford and then you worked at Palantir. I did, yeah. You also interned at Google. [00:00:48]Beyang: I did back in the day where I get to use Steve's system, DevTool. [00:00:53]Swyx: Right. What was it called? [00:00:55]Beyang: It was called Grok. Well, the end user thing was Google Code Search. That's what everyone called it, or just like CS. But the brains of it were really the kind of like Trigram index and then Grok, which provided the reference graph. [00:01:07]Steve: Today it's called Kythe, the open source Google one. It's sort of like Grok v3. [00:01:11]Swyx: On your podcast, which you've had me on, you've interviewed a bunch of other code search developers, including the current developer of Kythe, right? [00:01:19]Beyang: No, we didn't have any Kythe people on, although we would love to if they're up for it. We had Kelly Norton, who built a similar system at Etsy, it's an open source project called Hound. 
We also had Han-Wen Nienhuys, who created Zoekt, which is, I think, heavily inspired by the Trigram index that powered Google's original code search and that we also now use at Sourcegraph. Yeah. [00:01:45]Swyx: So you teamed up with Quinn over 10 years ago to start Sourcegraph and you were indexing all code on the internet. And now you're in a perfect spot to create a code intelligence startup. Yeah, yeah. [00:01:56]Beyang: I guess the backstory was, I used Google Code Search while I was an intern. And then after I left that internship and worked elsewhere, it was the single dev tool that I missed the most. I felt like my job was just a lot more tedious and much more of a hassle without it. And so when Quinn and I started working together at Palantir, he had also used various code search engines in open source over the years. And it was just a pain point that we both felt, both working on code at Palantir and also working within Palantir's clients, which were a lot of Fortune 500 companies, large financial institutions, folks like that. And if anything, the pains they felt in dealing with large complex code bases made our pain points feel small by comparison. So that was really the impetus for starting Sourcegraph. [00:02:42]Swyx: Yeah, excellent. Steve, you famously worked at Amazon. And you've told many, many stories. I want every single listener of Latent Space to check out Steve's YouTube because he effectively had a podcast that you didn't tell anyone about or something. You just hit record and just went on a few rants. I'm always here for your Stevie rants. And then you moved to Google, where you also had some interesting thoughts on just the overall Google culture versus Amazon. You joined Grab as head of eng for a couple of years. I'm from Singapore, so I have actually personally used a lot of Grab's features. And it was very interesting to see you talk so highly of Grab's engineering and sort of overall prospects. [00:03:21]Steve: Because as a customer, it sucked? [00:03:22]Swyx: Yeah, no, it's just like, being from a smaller country, you never see anyone from our home country being on a global stage or talked about as a startup that people admire or look up to, like on the league that you, with all your legendary experience, would consider equivalent. Yeah. [00:03:41]Steve: Yeah, no, absolutely. They actually, they didn't even know that they were as good as they were, in a sense. They started hiring a bunch of people from Silicon Valley to come in and sort of like fix it. And we came in and we were like, Oh, we could have been a little better operational excellence and stuff. But by and large, they're really sharp. The only thing about Grab is that they get criticized a lot for being too westernized. Oh, by who? By Singaporeans who don't want to work there. [00:04:06]Swyx: Okay. I guess I'm biased because I'm here, but I don't see that as a problem. If anything, they've had their success because they were more westernized than the Sanders Singaporean tech company. [00:04:15]Steve: I mean, they had their success because they are laser focused. They copy to Amazon. I mean, they're executing really, really, really well for a giant. I was on a slack with 2,500 engineers. It was like this giant waterfall that you could dip your toe into. You'd never catch up. Actually, the AI summarizers would have been really helpful there. But yeah, no, I think Grab is successful because they're just out there with their sleeves rolled up, just making it happen. 
[00:04:43]Swyx: And for those who don't know, it's not just like Uber of Southeast Asia, it's also a super app. PayPal Plus. [00:04:48]Steve: Yeah. [00:04:49]Swyx: In the way that super apps don't exist in the West. It's one of the enduring mysteries of B2C that super apps work in the East and don't work in the West. We just don't understand it. [00:04:57]Beyang: Yeah. [00:04:58]Steve: It's just kind of curious. They didn't work in India either. And it was primarily because of bandwidth reasons and smaller phones. [00:05:03]Swyx: That should change now. It should. [00:05:05]Steve: And maybe we'll see a super app here. [00:05:08]Swyx: You retired-ish? I did. You retired-ish on your own video game? Mm-hmm. Any fun stories about that? And that's also where you discovered some need for code search, right? Mm-hmm. [00:05:16]Steve: Sure. A need for a lot of stuff. Better programming languages, better databases. Better everything. I mean, I started in like 95, right? Where there was kind of nothing. Yeah. Yeah. [00:05:24]Beyang: I just want to say, I remember when you first went to Grab because you wrote that blog post talking about why you were excited about it, about like the expanding Asian market. And our reaction was like, oh, man, how did we miss stealing it with you? [00:05:36]Swyx: Hiring you. [00:05:37]Beyang: Yeah. [00:05:38]Steve: I was like, miss that. [00:05:39]Swyx: Tell that story. So how did this happen? Right? So you were inspired by Grok. [00:05:44]Beyang: I guess the backstory from my point of view is I had used code search and Grok while at Google, but I didn't actually know that it was connected to you, Steve. I knew you from your blog posts, which were always excellent, kind of like inside, very thoughtful takes from an engineer's perspective on some of the challenges facing tech companies and tech culture and that sort of thing. But my first introduction to you within the context of code intelligence, code understanding was I watched a talk that you gave, I think at Stanford, about Grok when you're first building it. And that was very eye opening. I was like, oh, like that guy, like the guy who, you know, writes the extremely thoughtful ranty like blog posts also built that system. And so that's how I knew, you know, you were involved in that. And then, you know, we always wanted to hire you, but never knew quite how to approach you or, you know, get that conversation started. [00:06:34]Steve: Well, we got introduced by Max, right? Yeah. It was temporal. Yeah. Yeah. I mean, it was a no brainer. They called me up and I had noticed when Sourcegraph had come out. Of course, when they first came out, I had this dagger of jealousy stabbed through me piercingly, which I remember because I am not a jealous person by any means, ever. But boy, I was like, but I was kind of busy, right? And just one thing led to another. I got sucked back into the ads vortex and whatever. So thank God Sourcegraph actually kind of rescued me. [00:07:05]Swyx: Here's a chance to build DevTools. Yeah. [00:07:08]Steve: That's the best. DevTools are the best. [00:07:10]Swyx: Cool. Well, so that's the overall intro. I guess we can get into Cody. Is there anything else that like people should know about you before we get started? [00:07:18]Steve: I mean, everybody knows I'm a musician. I can juggle five balls. [00:07:24]Swyx: Five is good. Five is good. I've only ever managed three. [00:07:27]Steve: Five is hard. Yeah. And six, a little bit. [00:07:30]Swyx: Wow. [00:07:31]Beyang: That's impressive. 
[00:07:32]Alessio: So yeah, to jump into Sourcegraph, this has been a company 10 years in the making. And as Sean said, now you're at the right place. Phase two. Now, exactly. You spent 10 years collecting all this code, indexing, making it easy to surface it. Yeah. [00:07:47]Swyx: And also learning how to work with enterprises and having them trust you with their code bases. Yeah. [00:07:52]Alessio: Because initially you were only doing on-prem, right? Like a lot of like VPC deployments. [00:07:55]Beyang: So in the very early days, we're cloud only. But the first major customers we landed were all on-prem, self-hosted. And that was, I think, related to the nature of the problem that we're solving, which becomes just like a critical, unignorable pain point once you're above like 100 devs or so. [00:08:11]Alessio: Yeah. And now Cody is going to be GA by the time this releases. So congrats to your future self for launching this in two weeks. Can you give a quick overview of just what Cody is? I think everybody understands that it's a AI coding agent, but a lot of companies say they have a AI coding agent. So yeah, what does Cody do? How do people interface with it? [00:08:32]Beyang: Yeah. So how is it different from the like several dozen other AI coding agents that exist in the market now? When we thought about building a coding assistant that would do things like code generation and question answering about your code base, I think we came at it from the perspective of, you know, we've spent the past decade building the world's best code understanding engine for human developers, right? So like it's kind of your guide as a human dev if you want to go and dive into a large complex code base. And so our intuition was that a lot of the context that we're providing to human developers would also be useful context for AI developers to consume. And so in terms of the feature set, Cody is very similar to a lot of other assistants. It does inline autocompletion. It does code base aware chat. It does specific commands that automate, you know, tasks that you might rather not want to do like generating unit tests or adding detailed documentation. But we think the core differentiator is really the quality of the context, which is hard to kind of describe succinctly. It's a bit like saying, you know, what's the difference between Google and Alta Vista? There's not like a quick checkbox list of features that you can rattle off, but it really just comes down to all the attention and detail that we've paid to making that context work well and be high quality and fast for human devs. We're now kind of plugging into the AI coding assistant as well. Yeah. [00:09:53]Steve: I mean, just to add my own perspective on to what Beyang just described, RAG is kind of like a consultant that the LLM has available, right, that knows about your code. RAG provides basically a bridge to a lookup system for the LLM, right? Whereas fine tuning would be more like on the job training for somebody. If the LLM is a person, you know, and you send them to a new job and you do on the job training, that's what fine tuning is like, right? So tuned to our specific task. You're always going to need that expert, even if you get the on the job training, because the expert knows your particular code base, your task, right? That expert has to know your code. And there's a chicken and egg problem because, right, you know, we're like, well, I'm going to ask the LLM about my code, but first I have to explain it, right? 
It's this chicken and egg problem. That's where RAG comes in. And we have the best consultants, right? The best assistant who knows your code. And so when you sit down with Cody, right, what Beyang said earlier about going to Google and using code search and then starting to feel like without it, his job was super tedious. Once you start using these, do you guys use coding assistants? [00:10:53]Swyx: Yeah, right. [00:10:54]Steve: I mean, like we're getting to the point very quickly, right? Where you feel like almost like you're programming without the internet, right? Or something, you know, it's like you're programming back in the nineties without the coding assistant. Yeah. Hopefully that helps for people who have like no idea about coding assistants, what they are. [00:11:09]Swyx: Yeah. [00:11:10]Alessio: I mean, going back to using them, we had a lot of them on the podcast already. We had Cursor, we had Codeium and Codium, very similar names. [00:11:18]Swyx: Yeah. Phind, and then of course there's Copilot. [00:11:22]Alessio: You had a Copilot versus Cody blog post, and I think it really shows the context improvement. So you had two examples that stuck with me. One was, what does this application do? And the Copilot answer was like, oh, it uses JavaScript and NPM and this. And it's like, but that's not what it does. You know, that's what it's built with. Versus Cody was like, oh, these are like the major functions. And like, these are the functionalities and things like that. And then the other one was, how do I start this up? And Copilot just said NPM start, even though there was like no start command in the package JSON, but you know, it's the most plausible, right? Most projects use NPM start. So maybe this does too. How do you think about open source models? Because Copilot has their own private thing. And I think you guys use StarCoder, if I remember right. Yeah, that's correct. [00:12:09]Beyang: I think Copilot uses some variant of Codex. They're kind of cagey about it. I don't think they've like officially announced what model they use. [00:12:16]Swyx: And I think they use a range of models based on what you're doing. Yeah. [00:12:19]Beyang: So everyone uses a range of models. Like no one uses the same model for like inline completion versus like chat because the latency requirements for. Oh, okay. Well, there's fill in the middle. There's also like what the model's trained on. So like we actually had completions powered by Claude Instant for a while. But you had to kind of like prompt hack your way to get it to output just the code and not like, hey, you know, here's the code you asked for, like that sort of text. So like everyone uses a range of models. We've kind of designed Cody to be like especially model, not agnostic, but like pluggable. So one of our kind of design considerations was like as the ecosystem evolves, we want to be able to integrate the best in class models, whether they're proprietary or open source, into Cody, because the pace of innovation in the space is just so quick. And I think that's been to our advantage. Like today, Cody uses StarCoder for inline completions. And with the benefit of the context that we provide, we actually show comparable completion acceptance rate metrics. It's kind of like the standard metric that folks use to evaluate inline completion quality. It's like if I show you a completion, what's the chance that you actually accept the completion versus you reject it? And so we're at par with Copilot, which is at the head of that industry right now.
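For reference, the completion acceptance rate Beyang is describing is simply accepted completions divided by completions shown. A toy calculation over a hypothetical editor event log (not Sourcegraph's real telemetry schema) looks like this:

def completion_acceptance_rate(events):
    # events: dicts like {"type": "shown"} or {"type": "accepted"},
    # logged each time the editor displays or applies an inline completion.
    shown = sum(1 for e in events if e["type"] == "shown")
    accepted = sum(1 for e in events if e["type"] == "accepted")
    return accepted / shown if shown else 0.0

log = [{"type": "shown"}, {"type": "accepted"},
       {"type": "shown"},
       {"type": "shown"}]
print(completion_acceptance_rate(log))  # 1 accepted / 3 shown = ~0.33, i.e. roughly the ~30% CAR discussed above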
And we've been able to do that with the StarCoder model, which is open source, and the benefit of the context fetching stuff that we provide. And of course, a lot of like prompt engineering and other stuff along the way. [00:13:40]Alessio: And Steve, you wrote a post called Cheating Is All You Need about what you're building. And one of the points you made is that everybody's fighting on the same axis, which is better UI and the IDE, maybe like a better chat response. But data moats are kind of the most important thing. And you guys have like a 10-year-old moat with all the data you've been collecting. How do you kind of think about what other companies are doing wrong, right? Like, why is nobody doing this in terms of like really focusing on RAG? I feel like you see so many people. Oh, we just got a new model. It's like, it beat HumanEval. And it's like, well, but maybe like that's not what we should really be doing, you know? Like, do you think most people underestimate the importance of like the actual RAG in code? [00:14:21]Steve: I think that people weren't doing it much. It wasn't. It's kind of at the edges of AI. It's not in the center. I know that when ChatGPT launched, so within the last year, I've heard a lot of rumblings from inside of Google, right? Because they're undergoing a huge transformation to try to, you know, of course, get into the new world. And I heard that they told, you know, a bunch of teams to go and train their own models or fine tune their own models, right? [00:14:43]Swyx: Both. [00:14:43]Steve: And, you know, it was a s**t show. Nobody knew how to do it. They launched two coding assistants. One was called Codey, with an E-Y. And then there was, I don't know what happened in that one. And then there's Duet, right? Google loves to compete with themselves, right? They do this all the time. And they had a paper on Duet like from a year ago. And they were doing exactly what Copilot was doing, which was just pulling in the local context, right? But fundamentally, I thought of this because we were talking about the splitting of the [00:15:10]Swyx: models. [00:15:10]Steve: In the early days, it was the LLM did everything. And then we realized that for certain use cases, like completions, that a different, smaller, faster model would be better. And that fragmentation of models, actually, we expected to continue and proliferate, right? Because we are fundamentally, we're a recommender engine right now. Yeah, we're recommending code to the LLM. We're saying, may I interest you in this code right here so that you can answer my question? [00:15:34]Swyx: Yeah? [00:15:34]Steve: And being a good recommender engine, I mean, who are the best recommenders, right? There's YouTube and Spotify and, you know, Amazon or whatever, right? Yeah. [00:15:41]Swyx: Yeah. [00:15:41]Steve: And they all have many, many, many, many, many models, right? All fine-tuned for very specific, you know. And that's where we're heading in code, too. Absolutely. [00:15:50]Swyx: Yeah. [00:15:50]Alessio: We just did an episode we released on Wednesday, where we said RAG is like RecSys for LLMs. You're basically just suggesting good content. [00:15:58]Swyx: It's like what? Recommendations. [00:15:59]Beyang: Recommendations. [00:16:00]Alessio: Oh, got it. [00:16:01]Steve: Yeah, yeah, yeah. [00:16:02]Swyx: So like the naive implementation of RAG is you embed everything, throw it in a vector database, you embed your query, and then you find the nearest neighbors, and that's your RAG.
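That naive pipeline (embed every chunk, embed the query, take the nearest neighbors, stuff them into the prompt) fits in a few lines. Here embed and llm are hypothetical stand-ins for an embedding model and a chat model, and a plain list plays the role of the vector database:

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-9)

def naive_rag(question, chunks, embed, llm, k=4):
    index = [(embed(c), c) for c in chunks]                 # the "vector database"
    q = embed(question)
    top = sorted(index, key=lambda p: cosine(q, p[0]), reverse=True)[:k]
    context = "\n---\n".join(c for _, c in top)             # nearest neighbors become the context
    return llm("Answer using only this context:\n" + context + "\n\nQuestion: " + question)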
But actually, you need to rank it. And actually, you need to make sure there's sample diversity and that kind of stuff. And then you're like slowly gradient descending yourself towards rediscovering proper RecSys, which has been traditional ML for a long time. But like approaching it from an LLM perspective. Yeah. [00:16:24]Beyang: I almost think of it as like a generalized search problem because it's a lot of the same things. Like you want your layer one to have high recall and get all the potential things that could be relevant. And then there's typically like a layer two re-ranking mechanism that bumps up the precision and tries to get the relevant stuff to the top of the results list. [00:16:43]Swyx: Have you discovered that ranking matters a lot? Oh, yeah. So the context is that I think a lot of research shows that like one, context utilization matters based on model. Like GPT uses the top of the context window, and then apparently Claude uses the bottom better. And it's lossy in the middle. Yeah. So ranking matters. No, it really does. [00:17:01]Beyang: The skill with which models are able to take advantage of context is always going to be dependent on how that factors into the impact on the training loss. [00:17:10]Swyx: Right? [00:17:10]Beyang: So like if you want long context window models to work well, then you have to have a ton of data where it's like, here's like a billion lines of text. And I'm going to ask a question about like something that's like, you know, embedded deeply into it and like, give me the right answer. And unless you have that training set, then of course, you're going to have variability in terms of like where it attends to. And in most kind of like naturally occurring data, the thing that you're talking about right now, the thing I'm asking you about is going to be something that we talked about recently. [00:17:36]Swyx: Yeah. [00:17:36]Steve: Did you really just say "gradient descending yourself"? Actually, I love that it's entered the casual lexicon. Yeah, yeah, yeah. [00:17:44]Swyx: My favorite version of that is, you know, how we have to p-hack papers. So, you know, when you throw humans at the problem, that's called graduate student descent. That's great. It's really awesome. [00:17:54]Alessio: I think the other interesting thing that you have is this inline assist UX that I wouldn't say async, but like it works while you can also do work. So you can ask Cody to make changes on a code block and you can still edit the same file at the same time. [00:18:07]Swyx: Yeah. [00:18:07]Alessio: How do you see that in the future? Like, do you see a lot of Codys running together at the same time? Like, how do you validate also that they're not messing each other up as they make changes in the code? And maybe what are the limitations today? And what do you think about where the tech is going? [00:18:21]Steve: I want to start with a little history and then I'm going to turn it over to Beyang, all right? So we actually had this feature in the very first launch back in June. Dominic wrote it. It was called nonstop Cody. And you could have multiple, basically, LLM requests in parallel modifying your source [00:18:37]Swyx: file. [00:18:37]Steve: And he wrote a bunch of code to handle all of the diffing logic. And you could see the regions of code that the LLM was going to change, right? And he was showing me demos of it. And it just felt like it was just a little before its time, you know?
But a bunch of that stuff, that scaffolding was able to be reused for where we're inline [00:18:56]Swyx: sitting today. [00:18:56]Steve: How would you characterize it today? [00:18:58]Beyang: Yeah, so that interface has really evolved from a, like, hey, general purpose, like, request anything inline in the code and have the code update to really, like, targeted features, like, you know, fix the bug that exists at this line or request a very specific [00:19:13]Swyx: change. [00:19:13]Beyang: And the reason for that is, I think, the challenge that we ran into with inline fixes, and we do want to get to the point where you could just fire and forget and have, you know, half a dozen of these running in parallel. But I think we ran into the challenge early on that a lot of people are running into now when they're trying to construct agents, which is the reliability of, you know, working code generation is just not quite there yet in today's language models. And so that kind of constrains you to an interaction where the human is always, like, in the inner loop, like, checking the output of each response. And if you want that to work in a way where you can be asynchronous, you kind of have to constrain it to a domain where today's language models can generate reliable code well enough. So, you know, generating unit tests, that's, like, a well-constrained problem. Or fixing a bug that shows up as, like, a compiler error or a test error, that's a well-constrained problem. But the more general, like, hey, write me this class that does X, Y, and Z using the libraries that I have, that is not quite there yet, even with the benefit of really good context. Like, it definitely moves the needle a lot, but we're not quite there yet to the point where you can just fire and forget. And I actually think that this is something that people don't broadly appreciate yet, because I think that, like, everyone's chasing this dream of agentic execution. And if we're to really define that down, I think it implies a couple things. You have, like, a multi-step process where each step is fully automated. We don't have to have a human in the loop every time. And there's also kind of like an LM call at each stage or nearly every stage in that [00:20:45]Swyx: chain. [00:20:45]Beyang: Based on all the work that we've done, you know, with the inline interactions, with kind of like general Codyfeatures for implementing longer chains of thought, we're actually a little bit more bearish than the average, you know, AI hypefluencer out there on the feasibility of agents with purely kind of like transformer-based models. To your original question, like, the inline interactions with CODI, we actually constrained it to be more targeted, like, you know, fix the current error or make this quick fix. I think that that does differentiate us from a lot of the other tools on the market, because a lot of people are going after this, like, shnazzy, like, inline edit interaction, whereas I think where we've moved, and this is based on the user feedback that we've gotten, it's like that sort of thing, it demos well, but when you're actually coding day to day, you don't want to have, like, a long chat conversation inline with the code base. That's a waste of time. You'd rather just have it write the right thing and then move on with your life or not have to think about it. And that's what we're trying to work towards. [00:21:37]Steve: I mean, yeah, we're not going in the agent direction, right? 
I mean, I'll believe in agents when somebody shows me one that works. Yeah. Instead, we're working on, you know, sort of solidifying our strength, which is bringing the right context in. So new context sources, ways for you to plug in your own context, ways for you to control or influence the context, you know, the mixing that happens before the request goes out, etc. And there's just so much low-hanging fruit left in that space that, you know, agents seems like a little bit of a boondoggle. [00:22:03]Beyang: Just to dive into that a little bit further, like, I think, you know, at a very high level, what do people mean when they say agents? They really mean, like, greater automation, fully automated, like, the dream is, like, here's an issue, go implement that. And I don't have to think about it as a human. And I think we are working towards that. Like, that is the eventual goal. I think it's specifically the approach of, like, hey, can we have a transformer-based LM alone be the kind of, like, backbone or the orchestrator of these agentic flows? Where we're a little bit more bearish today. [00:22:31]Swyx: You want the human in the loop. [00:22:32]Beyang: I mean, you kind of have to. It's just a reality of the behavior of language models that are purely, like, transformer-based. And I think that's just like a reflection of reality. And I don't think people realize that yet. Because if you look at the way that a lot of other AI tools have implemented context fetching, for instance, like, you see this in the Copilot approach, where if you use, like, the at-workspace thing that supposedly provides, like, code-based level context, it has, like, an agentic approach where you kind of look at how it's behaving. And it feels like they're making multiple requests to the LM being like, what would you do in this case? Would you search for stuff? What sort of files would you gather? Go and read those files. And it's like a multi-hop step, so it takes a long while. It's also non-deterministic. Because any sort of, like, LM invocation, it's like a dice roll. And then at the end of the day, the context it fetches is not that good. Whereas our approach is just like, OK, let's do some code searches that make sense. And then maybe, like, crawl through the reference graph a little bit. That is fast. That doesn't require any sort of LM invocation at all. And we can pull in much better context, you know, very quickly. So it's faster. [00:23:37]Swyx: It's more reliable. [00:23:37]Beyang: It's deterministic. And it yields better context quality. And so that's what we think. We just don't think you should cargo cult or naively go like, you know, agents are the [00:23:46]Swyx: future. [00:23:46]Beyang: Let's just try to, like, implement agents on top of the LM that exists today. I think there are a couple of other technologies or approaches that need to be refined first before we can get into these kind of, like, multi-stage, fully automated workflows. [00:24:00]Swyx: It makes sense. You know, we're very much focused on developer inner loop right now. But you do see things eventually moving towards developer outer loop. Yeah. So would you basically say that they're tackling the agent's problem that you don't want to tackle? [00:24:11]Beyang: No, I would say at a high level, we are after maybe, like, the same high level problem, which is like, hey, I want some code written. I want to develop some software and can automate a system. Go build that software for me. I think the approaches might be different. 
So I think the analogy in my mind is, I think about, like, the AI chess players. Coding, in some senses, I mean, it's similar and dissimilar to chess. I think one question I ask is, like, do you think producing code is more difficult than playing chess or less difficult than playing chess? More. [00:24:41]Swyx: I think more. [00:24:41]Beyang: Right. And if you look at the best AI chess players, like, yes, you can use an LLM to play chess. Like, people have shown demos where it's like, oh, like, yeah, GPT-4 is actually a pretty decent, like, chess move suggester. Right. But you would never build, like, a best in class chess player off of GPT-4 alone. [00:24:57]Swyx: Right. [00:24:57]Beyang: Like, the way that people design chess players is that you have kind of like a search space and then you have a way to explore that search space efficiently. There's a bunch of search algorithms, essentially. Whether it's doing tree search in various ways. And you can have heuristic functions, which might be powered by an LLM. [00:25:12]Swyx: Right. [00:25:12]Beyang: Like, you might use an LLM to generate proposals in that space that you can efficiently explore. But the backbone is still this kind of more formalized tree search based approach rather than the LLM itself. And so I think my high level intuition is that, like, the way that we get to more reliable multi-step workflows that do things beyond, you know, generate unit tests, it's really going to be like a search based approach where you use an LLM as kind of like an advisor or a proposal function, sort of your heuristic function, like the A* search algorithm. But it's probably not going to be the thing that is the backbone, because I guess it's not the right tool for that. Yeah. [00:25:50]Swyx: I can see yourself kind of thinking through this, but not saying the words, the sort of philosophical Peter Norvig type discussion. Maybe you want to sort of introduce that in software. Yeah, definitely. [00:25:59]Beyang: So your listeners are savvy. They're probably familiar with the classic like Chomsky versus Norvig debate. [00:26:04]Swyx: No, actually, I wanted, I was prompting you to introduce that. Oh, got it. [00:26:08]Beyang: So, I mean, if you look at the history of artificial intelligence, right, you know, it goes way back to, I don't know, it's probably as old as modern computers, like 50s, 60s, 70s. People are debating on like, what is the path to producing a sort of like general human level of intelligence? And kind of two schools of thought emerged. One is the Norvig school of thought, which roughly speaking includes large language models, you know, regression, SVMs, basically any model that you kind of like learn from data. And it's like data driven. Most of machine learning would fall under this umbrella. And that school of thought says like, you know, just learn from the data. That's the approach to reaching intelligence. And then the Chomsky approach is more things like compilers and parsers and formal systems. So basically like, let's think very carefully about how to construct a formal, precise system. And that will be the approach to how we build a truly intelligent system. I think Lisp was invented so that you could create like rules-based systems that you would call AI. As a language. Yeah. And for a long time, there was like this debate, like there's certain like AI research labs that were more like, you know, in the Chomsky camp and others that were more in the Norvig camp. It's a debate that rages on today.
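Beyang's chess analogy, where a classical search backbone does the exploring and the LLM is only a proposal and heuristic function, can be sketched as a best-first search. The propose_edits and score callbacks below are hypothetical hooks where an LLM (or a test suite, or a type checker) could plug in; this is an illustration of the idea, not a claim about how Cody works:

import heapq

def best_first_search(start, is_done, propose_edits, score, max_steps=100):
    # Classical backbone: a priority queue ordered by the heuristic score.
    # propose_edits(state) -> candidate next states (e.g. LLM-suggested code edits).
    # score(state) -> float, higher is better (e.g. LLM judgment or tests passed).
    # States are assumed hashable, e.g. candidate source files as strings.
    frontier = [(-score(start), 0, start)]
    seen = {start}
    tie_breaker = 1
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_done(state):
            return state
        for nxt in propose_edits(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), tie_breaker, nxt))
                tie_breaker += 1
    return None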
And I feel like the consensus right now is that, you know, Norvig definitely has the upper hand right now with the advent of LLMs and diffusion models and all the other recent progress in machine learning. But the Chomsky-based stuff is still really useful in my view. I mean, it's like parsers, compilers, basically a lot of the stuff that provides really good context. It provides kind of like the knowledge graph backbone that you want to explore with your AI dev tool. Like that will come from kind of like Chomsky-based tools like compilers and parsers. It's a lot of what we've invested in in the past decade at Sourcegraph and what you build with Grok. Basically like these formal systems that construct these very precise knowledge graphs that are great context providers and great kind of guard rails enforcers and kind of like safety checkers for the output of a more kind of like data-driven, fuzzier system that uses like the Norvig-based models. [00:28:03]Steve: Beyang was talking about this stuff like it happened in the middle ages. Like, okay, so when I was in college, I was in college learning Lisp and Prolog and planning and all the deterministic Chomsky approaches to AI. And I was there when Norvig basically declared it dead. I was there 3,000 years ago when Norvig and Chomsky fought on the volcano. When did he declare it dead? [00:28:26]Swyx: What do you mean he declared it dead? [00:28:27]Steve: It was like late 90s. [00:28:29]Swyx: Yeah. [00:28:29]Steve: When I went to Google, Peter Norvig was already there. He had basically like, I forget exactly where. It was some, he's got so many famous short posts, you know, amazing. [00:28:38]Swyx: He had a famous talk, the unreasonable effectiveness of data. Yeah. [00:28:41]Steve: Maybe that was it. But at some point, basically, he basically convinced everybody that deterministic approaches had failed and that heuristic-based, you know, data-driven statistical approaches, stochastic were better. [00:28:52]Swyx: Yeah. [00:28:52]Steve: The primary reason I can tell you this, because I was there, was that, was that, well, the steam-powered engine, no. The reason was that the deterministic stuff didn't scale. [00:29:06]Swyx: Yeah. Right. [00:29:06]Steve: They were using Prolog, man, constraint systems and stuff like that. Well, that was a long time ago, right? Today, actually, these Chomsky-style systems do scale. And that's, in fact, exactly what Sourcegraph has built. Yeah. And so we have a very unique, I love the framing that Beyang's made, that the marriage of the Chomsky and the Norvig, you know, sort of models, you know, conceptual models, because we, you know, we have both of them and they're both really important. And in fact, there, there's this really interesting, like, kind of overlap between them, right? Where like the AI or our graph or our search engine could potentially provide the right context for any given query, which is, of course, why ranking is important. But what we've really signed ourselves up for is an extraordinary amount of testing. [00:29:45]Swyx: Yeah. [00:29:45]Steve: Because, Swyx, you were saying that, you know, GPT-4 tends to the front of the context window and maybe other LLMs to the back and maybe, maybe the LLM in the middle. [00:29:53]Swyx: Yeah.
[00:29:53]Steve: And so that means that, you know, if we're actually like, you know, verifying whether we, you know, some change we've made has improved things, we're going to have to test putting it at the beginning of the window and at the end of the window, you know, and maybe make the right decision based on the LLM that you've chosen. Which some of our competitors, that's a problem that they don't have, but we meet you, you know, where you are. Yeah. And we're, just to finish, we're writing tens of thousands. We're generating tests, you know, fill in the middle type tests and things. And then using our graph to basically sort of fine tune Cody's behavior there. [00:30:20]Swyx: Yeah. [00:30:21]Beyang: I also want to add, like, I have like an internal pet name for this, like kind of hybrid architecture that I'm trying to make catch on. Maybe I'll just say it here. Just saying it publicly kind of makes it more real. But like, I call the architecture that we've developed the Normsky architecture. [00:30:36]Swyx: Yeah. [00:30:36]Beyang: I mean, it's obviously a portmanteau of Norvig and Chomsky, but the acronym, it stands for non-agentic, rapid, multi-source code intelligence. So non-agentic because... Rolls right off the tongue. And Normsky. But it's non-agentic in the sense that like, we're not trying to like pitch you on kind of like agent hype, right? Like it's the things it does are really just developer tools developers have been using for decades now, like parsers and really good search indexes and things like that. Rapid because we place an emphasis on speed. We don't want to sit there waiting for kind of like multiple LLM requests to return to complete a simple user request. Multi-source because we're thinking broadly about what pieces of information and knowledge are useful context. So obviously starting with things that you can search in your code base, and then you add in the reference graph, which kind of like allows you to crawl outward from those initial results. But then even beyond that, you know, sources of information, like there's a lot of knowledge that's embedded in docs, in PRDs or product specs, in your production logging system, in your chat, in your Slack channel, right? Like there's so much context is embedded there. And when you're a human developer, and you're trying to like be productive in your code base, you're going to go to all these different systems to collect the context that you need to figure out what code you need to write. And I don't think the AI developer will be any different. It will need to pull context from all these different sources. So we're thinking broadly about how to integrate these into Codi. We hope through kind of like an open protocol that like others can extend and implement. And this is something else that should be accessible by December 14th in kind of like a preview stage. But that's really about like broadening this notion of the code graph beyond your Git repository to all the other sources where technical knowledge and valuable context can live. [00:32:21]Steve: Yeah, it becomes an artifact graph, right? It can link into your logs and your wikis and any data source, right? [00:32:27]Alessio: How do you guys think about the importance of, it's almost like data pre-processing in a way, which is bring it all together, tie it together, make it ready. Any thoughts on how to actually make that good? Some of the innovation you guys have made. [00:32:40]Steve: We talk a lot about the context fetching, right? 
I mean, there's a lot of ways you could answer this question. But, you know, we've spent a lot of time just in this podcast here talking about context fetching. But stuffing the context into the window is, you know, the bin packing problem, right? Because the window is not big enough, and you've got more context than you can fit. You've got a ranker maybe. But what is that context? Is it a function that was returned by an embedding or a graph call or something? Do you need the whole function? Or do you just need, you know, the top part of the function, this expression here, right? You know, so that art, the golf game of trying to, you know, get each piece of context down into its smallest state, possibly even summarized by another model, right, before it even goes to the LLM, becomes this is the game that we're in, yeah? And so, you know, recursive summarization and all the other techniques that you got to use to like stuff stuff into that context window become, you know, critically important. And you have to test them across every configuration of models that you could possibly need. [00:33:32]Beyang: I think data preprocessing is probably the like unsexy, way underappreciated secret to a lot of the cool stuff that people are shipping today. Whether you're doing like RAG or fine tuning or pre-training, like the preprocessing step matters so much because it's basically garbage in, garbage out, right? Like if you're feeding in garbage to the model, then it's going to output garbage. Concretely, you know, for code RAG, if you're not doing some sort of like preprocessing that takes advantage of a parser and is able to like extract the key components of a particular file of code, you know, separate the function signature from the body, from the doc string, what are you even doing? Like that's like table stakes. It opens up so much more possibilities with which you can kind of like tune your system to take advantage of the signals that come from those different parts of the code. Like we've had a tool, you know, since computers were invented that understands the structure of source code to a hundred percent precision. The compiler knows everything there is to know about the code in terms of like structure. Like why would you not want to use that in a system that's trying to generate code, answer questions about code? You shouldn't throw that out the window just because now we have really good, you know, data-driven models that can do other things. [00:34:44]Steve: Yeah. When I called it a data moat, you know, in my cheating post, a lot of people were confused, you know, because data moat sort of sounds like data lake because there's data and water and stuff. I don't know. And so they thought that we were sitting on this giant mountain of data that we had collected, but that's not what our data moat is. It's really a data pre-processing engine that can very quickly and scalably, like basically dissect your entire code base in a very small, fine-grained, you know, semantic unit and then serve it up. Yeah. And so it's really, it's not a data moat. It's a data pre-processing moat, I guess. [00:35:15]Beyang: Yeah. If anything, we're like hypersensitive to customer data privacy requirements. So it's not like we've taken a bunch of private data and like, you know, trained a generally available model. In fact, exactly the opposite. A lot of our customers are choosing Cody over Copilot and other competitors because we have an explicit guarantee that we don't do any of that. 
And that we've done that from day one. Yeah. I think that's a very real concern in today's day and age, because like if your proprietary IP finds its way into the training set of any model, it's very easy both to like extract that knowledge from the model and also use it to, you know, build systems that kind of work on top of the institutional knowledge that you've built up. [00:35:52]Alessio: About a year ago, I wrote a post on LLMs for developers. And one of the points I had was maybe the depth of like the DSL. I spent most of my career writing Ruby and I love Ruby. It's so nice to use, but you know, it's not as performant, but it's really easy to read, right? And then you look at other languages, maybe they're faster, but like they're more verbose, you know? And when you think about efficiency of the context window, that actually matters. [00:36:15]Swyx: Yeah. [00:36:15]Alessio: But I haven't really seen a DSL for models, you know? I haven't seen like code being optimized to like be easier to put in a model context. And it seems like your pre-processing is kind of doing that. Do you see in the future, like the way we think about the DSL and APIs and kind of like service interfaces be more focused on being context friendly, where it's like maybe it's harder to read for the human, but like the human is never going to write it anyway. We were talking on the Hacks podcast. There are like some data science things like spin up the spandex, like humans are never going to write again because the models can just do very easily. Yeah, curious to hear your thoughts. [00:36:51]Steve: Well, so DSLs, they involve, you know, writing a grammar and a parser and they're like little languages, right? We do them that way because, you know, we need them to compile and humans need to be able to read them and so on. The LLMs don't need that level of structure. You can throw any pile of crap at them, you know, more or less unstructured and they'll deal with it. So I think that's why a DSL hasn't emerged for sort of like communicating with the LLM or packaging up the context or anything. Maybe it will at some point, right? We've got, you know, tagging of context and things like that that are sort of peeking into DSL territory, right? But your point on do users, you know, do people have to learn DSLs like regular expressions or, you know, pick your favorite, right? XPath. I think you're absolutely right that the LLMs are really, really good at that. And I think you're going to see a lot less of people having to slave away learning these things. They just have to know the broad capabilities and the LLM will take care of the rest. [00:37:42]Swyx: Yeah, I'd agree with that. [00:37:43]Beyang: I think basically like the value proposition of a DSL is that it makes it easier to work with a lower level language, but at the expense of introducing an abstraction layer. And in many cases today, you know, without the benefit of AI code generation, like, that's totally worth it, right? With the benefit of AI code generation, I mean, I don't think all DSLs will go away. I think there's still, you know, places where that trade-off is going to be worthwhile. But it's kind of like how much of source code do you think is going to be generated through natural language prompting in the future? Because in a way, like any programming language is just a DSL on top of assembly, right?
And so if people can do that, then yeah, like maybe for a large portion of the code [00:38:21]Swyx: that's written, [00:38:21]Beyang: people don't actually have to understand the DSL that is Ruby or Python or basically any other programming language that exists. [00:38:28]Steve: I mean, seriously, do you guys ever write SQL queries now without using a model of some sort? At least a draft. [00:38:34]Swyx: Yeah, right. [00:38:36]Steve: And so we have kind of like, you know, past that bridge, right? [00:38:39]Alessio: Yeah, I think like to me, the long-term thing is like, is there ever going to be, you don't actually see the code, you know? It's like, hey, the basic thing is like, hey, I need a function to sum two numbers and that's it. I don't need you to generate the code. [00:38:53]Steve: And the following question, do you need the engineer or the paycheck? [00:38:56]Swyx: I mean, right? [00:38:58]Alessio: That's kind of the agents discussion in a way where like you cannot automate the agents, but like slowly you're getting more of the atomic units of the work kind of like done. I kind of think of it as like, you know, [00:39:09]Beyang: do you need a punch card operator to answer that for you? And so like, I think we're still going to have people in the role of a software engineer, but the portion of time they spend on these kinds of like low-level, tedious tasks versus the higher level, more creative tasks is going to shift. [00:39:23]Steve: No, I haven't used punch cards. [00:39:25]Swyx: Yeah, I've been talking about like, so we kind of made this podcast about the sort of rise of the AI engineer. And like the first step is the AI enhanced engineer. That is that software developer that is no longer doing these routine, boilerplate-y type tasks, because they're just enhanced by tools like yours. So you mentioned OpenCodeGraph. I mean, that is a kind of DSL maybe, and because we're releasing this as you go GA, you hope for other people to take advantage of that? [00:39:52]Beyang: Oh yeah, I would say so OpenCodeGraph is not a DSL. It's more of a protocol. It's basically like, hey, if you want to make your system, whether it's, you know, chat or logging or whatever, accessible to an AI developer tool like Cody, here's kind of like the schema by which you can provide that context and offer hints. So I would, you know, comparisons like LSP obviously did this for kind of like standard code intelligence. It's kind of like a lingua franca for providing find references and go to definition. There's kind of like analogs to that. There might be also analogs to kind of the original OpenAI, kind of like plugins, API. There's all this like context out there that might be useful for an LLM-based system to consume. And so at a high level, what we're trying to do is define a common language for context providers to provide context to other tools in the software development lifecycle. Yeah. Do you have any critiques of LSP, by the way, [00:40:42]Swyx: since like this is very much, very close to home? [00:40:45]Steve: One of the authors wrote a really good critique recently. Yeah. I don't think I saw that. Yeah, yeah. LSP could have been better. It just came out a couple of weeks ago. It was a good article. [00:40:54]Beyang: Yeah. I think LSP is great. Like for what it did for the developer ecosystem, it was absolutely fantastic. Like nowadays, like it's much easier now to get code navigation up and running in a bunch of editors by speaking this protocol.
I think maybe the interesting question is like looking at the different design decisions comparing LSP basically with Kythe. Because Kythe has more of a... How would you describe it? [00:41:18]Steve: A storage format. [00:41:20]Beyang: I think the critique of LSP from a Kythe point of view would be like with LSP, you don't actually have an actual symbolic model of the code. It's not like LSP models like, hey, this function calls this other function. LSP is all like range-based. Like, hey, your cursor's at line 32, column 1. [00:41:35]Swyx: Yeah. [00:41:35]Beyang: And that's the thing you feed into the language server. And then it's like, okay, here's the range that you should jump to if you click on that range. So it kind of is intentionally ignorant of the fact that there's a thing called a reference underneath your cursor, and that's linked to a symbol definition. [00:41:49]Steve: Well, actually, that's the worst example you could have used. You're right. But that's the one thing that it actually did bake in is following references. [00:41:56]Swyx: Sure. [00:41:56]Steve: But it's sort of hardwired. [00:41:58]Swyx: Yeah. [00:41:58]Steve: Whereas Kythe attempts to model [00:42:00]Beyang: like all these things explicitly. [00:42:02]Swyx: And so... [00:42:02]Steve: Well, so LSP is a protocol, right? And so Google's internal protocol is gRPC-based. And it's a different approach than LSP. It's basically you make a heavy query to the back end, and you get a lot of data back, and then you render the whole page, you know? So we've looked at LSP, and we think that it's a little long in the tooth, right? I mean, it's a great protocol, lots and lots of support for it. But we need to push into the domain of exposing the intelligence through the protocol. Yeah. [00:42:29]Beyang: And so I would say we've developed a protocol of our own called Skip, which is at a very high level trying to take some of the good ideas from LSP and from Kythe and merge that into a system that in the near term is useful for Sourcegraph, but I think in the long term, we hope will be useful for the ecosystem. Okay, so here's what LSP did well. LSP, by virtue of being like intentionally dumb, dumb in air quotes, because I'm not like ragging on it, allowed language servers developers to kind of like bypass the hard problem of like modeling language semantics precisely. So like if all you want to do is jump to definition, you don't have to come up with like a universally unique naming scheme for each symbol, which is actually quite challenging because you have to think about like, okay, what's the top scope of this name? Is it the source code repository? Is it the package? Does it depend on like what package server you're fetching this from? Like whether it's the public one or the one inside your... Anyways, like naming is hard, right? And by just going from kind of like a location to location based approach, you basically just like throw that out the window. All I care about is jumping definition, just make that work. And you can make that work without having to deal with like all the complex global naming things. The limitation of that approach is that it's harder to build on top of that to build like a true knowledge graph. Like if you actually want a system that says like, okay, here's the web of functions and here's how they reference each other. And I want to incorporate that like semantic model of how the code operates or how the code relates to each other at like a static level. 
You can't do that with LSP because you have to deal with line ranges. And like concretely the pain point that we found in using LSP for Sourcegraph is like in order to do like a find references [00:44:04]Swyx: and then jump definitions, [00:44:04]Beyang: it's like a multi-hop process because like you have to jump to the range and then you have to find the symbol at that range. And it just adds a lot of latency and complexity of these operations where as a human, you're like, well, this thing clearly references this other thing. Why can't you just jump me to that? And I think that's the thing that Kythe does well. But then I think the issue that Kythe has had with adoption is because it is a more sophisticated schema, I think. And so there's basically more things that you have to implement to get like a Kythe implementation up and running. I hope I'm not like, correct me if I'm wrong about any of this. [00:44:35]Steve: 100%, 100%. Kythe also has a problem, all these systems have the problem, even Skip, or at least the way that we implemented the indexers, that they have to integrate with your build system in order to build that knowledge graph, right? Because you have to basically compile the code in a special mode to generate artifacts instead of binaries. And I would say, by the way, earlier I was saying that XREFs were in LSP, but it's actually, I was thinking of LSP plus LSIF. [00:44:58]Swyx: Yeah. That's another. [00:45:01]Steve: Which is actually bad. We can say that it's bad, right? [00:45:04]Steve: It's like Skip or Kythe, it's supposed to be sort of a model serialization, you know, for the code graph, but it basically just does what LSP needs, the bare minimum. LSIF is basically if you took LSP [00:45:16]Beyang: and turned that into a serialization format. So like you build an index for language servers to kind of like quickly bootstrap from cold start. But it's a graph model [00:45:23]Steve: with all of the inconvenience of the API without an actual graph. And so, yeah. [00:45:29]Beyang: So like one of the things that we try to do with Skip is try to capture the best of both worlds. So like make it easy to write an indexer, make the schema simple, but also model some of the more symbolic characteristics of the code that would allow us to essentially construct this knowledge graph that we can then make useful for both the human developer through Sourcegraph and the AI developer through Cody. [00:45:49]Steve: So anyway, just to finish off the graph comment, we've got a new graph, yeah, that's Skip-based. We call it BFG internally, right? It's a beautiful something graph. A big friendly graph. [00:46:00]Swyx: A big friendly graph. [00:46:01]Beyang: It's a blazing fast. [00:46:02]Steve: Blazing fast. [00:46:03]Swyx: Blazing fast graph. [00:46:04]Steve: And it is blazing fast, actually. It's really, really interesting. I should probably do a blog post about it to walk you through exactly how they're doing it. Oh, please. But it's a very AI-like iterative, you know, experimentation sort of approach. We're building a code graph based on all of our 10 years of knowledge about building code graphs, yeah? But we're building it quickly with zero configuration, and it doesn't have to integrate with your build. And through some magic tricks that we have. And so what just happens is, when you install the plugin, it'll be there, indexing your code and providing that knowledge graph in the background, without all that build system integration.
This is a bit of secret sauce that we haven't really like advertised it very much lately. But I am super excited about it because what they do is they say, all right, you know, let's tackle function parameters today. Cody's not doing a very good job of completing function call arguments or function parameters in the definition, right? Yeah, we generate those thousands of tests, and then we can actually reuse those tests for the AI context as well. So fortunately, things are kind of converging on, we have, you know, half a dozen really, really good context sources, and we mix them all together. So anyway, BFG, you're going to hear more about it probably in the holidays? [00:47:12]Beyang: I think it'll be online for December 14th. We'll probably mention it. BFG is probably not the public name we're going to go with. I think we might call it like Graph Context or something like that. [00:47:20]Steve: We're officially calling it BFG. [00:47:22]Swyx: You heard it here first. [00:47:24]Beyang: BFG is just kind of like the working name. And so the impetus for BFG was like, if you look at like current AI inline code completion tools and the errors that they make, a lot of the errors that they make, even in kind of like the easy, like single line case, are essentially like type errors, right? Like you're trying to complete a function call and it suggests a variable that you defined earlier, but that variable is the wrong type. [00:47:47]Swyx: And that's the sort of thing [00:47:47]Beyang: where it's like a first year, like freshman CS student would not make that error, right? So like, why does the AI make that error? And the reason is, I mean, the AI is just suggesting things that are plausible without the context of the types or any other like broader files in the code. And so the kind of intuition here is like, why don't we just do the basic thing that like any baseline intelligent human developer would do, which is like click jump to definition, click some fine references and pull in that like Graph Context into the context window and then have it generate the completion. So like that's sort of like the MVP of what BFG was. And turns out that works really well. Like you can eliminate a lot of type errors that AI coding tools make just by pulling in that context. Yeah, but the graph is definitely [00:48:32]Steve: our Chomsky side. [00:48:33]Swyx: Yeah, exactly. [00:48:34]Beyang: So like this like Chomsky-Norvig thing, I think pops up in a bunch of differ
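A rough sketch of the BFG intuition described above: before asking the model for a completion, do what a careful human would do, jump to the definitions of the symbols around the cursor, and pack what you find into a fixed token budget. find_nearby_symbols, lookup_definition, and count_tokens are hypothetical helpers, not Sourcegraph's API, and the greedy packing is a toy version of the context "bin packing" discussed earlier in the episode:

def build_completion_prompt(file_text, cursor, find_nearby_symbols,
                            lookup_definition, count_tokens, budget_tokens=1500):
    # Graph step: resolve the definitions (types, signatures) of symbols near the cursor.
    snippets = []
    for sym in find_nearby_symbols(file_text, cursor):
        definition = lookup_definition(sym)        # e.g. via a code graph index
        if definition:
            snippets.append(definition)
    # Packing step: greedily add snippets until the token budget is exhausted.
    packed, used = [], 0
    for snippet in snippets:
        cost = count_tokens(snippet)
        if used + cost > budget_tokens:
            continue
        packed.append(snippet)
        used += cost
    prefix = file_text[:cursor]                    # the code before the cursor
    return "\n".join(["# Relevant definitions:"] + packed + ["", prefix])

Pulling in those definitions is exactly the kind of context that lets a completion model avoid the "freshman type error" failures described above.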
MLOps podcast #195 with Varun Mohan, CEO of Codeium, Building the Future of AI in Software Development brought to us by QuantumBlack. // Abstract This brief overview traces the evolution of Exafunction and Codeium, highlighting the strategic transition. It explores the inception of Codeium's key features, offering insights into the thoughtful design process. This emphasizes the company's forward-looking approach to preparing for a rapidly advancing technological landscape. Additionally, it touches upon developing essential MLOps systems, showcasing the commitment to maintaining rigor and efficiency in the face of evolving challenges. // Bio Varun Mohan developed a knack for programming in high school where he actively participated in various competitions. This passion for programming was shared with his now co-founder, with whom he frequently competed. Their common interest in programming and competition led them to attend MIT together, where they undertook more programming challenges. After college, they ventured into the Bay Area where they continued to compete and further cultivate their programming abilities. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Websites: codeium.com, https://exafunction.com/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Varun on LinkedIn: https://www.linkedin.com/in/varunkmohan/ Timestamps: [00:00] Varun's preferred coffee [00:15] Takeaways [02:50] Please like, share, and subscribe to our MLOps channels! [03:05] QuantumBlack ad by Nayur Khan [05:51] Varun's background in tech [10:55] Language Models Advancement [14:17] GPU scarce world [18:23] Vision and Pain Points [19:18] Fine-tuning Challenges in NLP [21:04] ML and AI Caution [21:49] MLOps: App vs Infra [23:53] Data Engineering Abstraction Evolution [26:12] Codeium and Scaling Discussion [31:59] API, Cloud, Computation [34:20] Codeium scaling [35:11] Reserved GPUs, companies self-hosting products [38:00] Open-source code Codeium training [40:03] Protecting IP Licenses [41:32] ML Challenges: Data, Bias, Security [44:37] Evaluating code [48:29] Getting values from Codeium [49:49] Exafunction ML AI Production [52:17] AWS Creation [53:58] Feature flags and the ML/AI lifecycle [56:34] Coding problem [58:40] New software architectures [1:03:28] Wrap up
Artificial intelligence can already generate convincing images, videos, and text. How, and whom, can we still trust? Which AI tools actually make studying or office work easier? Exactly one year after the first mentions of ChatGPT, we are mainly asking about the possibilities for use and misuse. Are Russian technologies dangerous? Are flip phones a fashion trend or a step in a new direction? Maruša and Anže debate with their colleague Matej Praprotnik about topics chosen by the show's listeners. Notes: DeepL Translate: The world's most accurate translator VERAS | EvolveLAB Stable Diffusion Online - AI Image Generator GitHub Copilot · Your AI pair programmer · GitHub Tabnine is an AI assistant that speeds up delivery and keeps your code safe CodeGeeX Codeium · Free AI Code Completion & Chat Perplexity AI for Research - scite.ai ChatPDF - Chat with any PDF! Consensus: AI Search Engine for Research Chat LlamaIndex HuggingChat You can join the discussion of these topics on Discord.
In this episode of The AI for Sales Podcast, Joe Papa welcomes Jeff Wang, Head of Business and Sales at Codeium, to explore the growing impact of AI on the sales industry. They discuss how AI is transforming sales processes, improving productivity, and enabling automation. Jeff shares his experiences and insights into the world of AI and sales. KEY TAKEAWAYS: AI is revolutionizing sales by enhancing productivity and automating various tasks. AI tools are being used to transcribe and summarize sales calls, making it easier to gather insights and follow up with potential customers. Chatbots and AI-driven communication tools are becoming more prevalent in sales to streamline interactions with customers. The AI landscape is evolving rapidly, with various companies exploring AI applications in different industries. The AI business model is still evolving, and there is a need for more clarity regarding its profitability. QUOTES: "In sales, time kills deals. In AI for Sales, AI kills time." - Dr. Joel Aban "The sales calls and deals might soon be conducted entirely by AI, from discovery to negotiation." - Jeff Wang "The trust in AI to handle customer interactions will continue to grow as AI accuracy reaches 100%." - Jeff Wang "The race to AI is fueling innovation and expanding the possibilities in the industry." - Joe Papa "It's the people using AI for nefarious purposes that we should be worried about, not the AI itself." - Jeff Wang. Connect and learn more about Jeff Wang through the link below: LinkedIn: https://www.linkedin.com/in/jefflinwang/ Learn more about AI for Sales with Chad: LinkedIn Group: https://www.linkedin.com/groups/12811259/ LinkedIn Personal Page: https://www.linkedin.com/in/chadburmeister/ YouTube Channel: https://www.youtube.com/@TheAIforSalesPodcast TikTok: https://www.tiktok.com/@ai4sales Facebook Page: https://www.facebook.com/theaiforsalespodcast/ Twitter Page: https://twitter.com/saleshack The AI For Sales Podcast is sponsored by our proud partners: BDR.ai | https://www.bdr.ai/ TruVersity | https://www.truversity.com/
At the AI Pioneers Summit we announced Latent Space Launchpad, an AI-focused accelerator in partnership with Decibel. If you're an AI founder or enterprise early adopter, fill out this form and we'll be in touch with more details. We also have a lot of events coming up as we wrap up the year, so make sure to check out our community events page and come say hi!We previously interviewed the founders of many developer productivity startups embedded in the IDE, like Codium AI, Cursor, and Codeium. We also covered Replit's (former) SOTA model, replit-code-v1-3b and most recently had Amjad and Michele announce replit-code-v1_5-3b at the AI Engineer Summit.Much has been speculated about the StackOverflow traffic drop since ChatGPT's release, but the experience is still not perfect. There's now a new player in the “search for developers” arena: Phind.Phind's goal is to help you find answers to your technical questions, and then help you implement them. For example “What should I use to create a frontend for a Python script?” returns a list of frameworks as well as links to the sources. You can then ask follow up questions on specific implementation details, having it write some code for you, etc. They have both a web version and a VS Code integration. They recently were top of Hacker News with the announcement of their latest model, which is now the #1 rated model on the BigCode Leaderboard, beating their previous version:TLDR Cheat Sheet:* Based on CodeLlama-34B, which is trained on 500B tokens* Further fine-tuned on 70B+ high quality code and reasoning tokens* Expanded context window to 16k tokens* 5x faster than GPT-4 (100 tok/s vs 20 tok/s on single stream)* 74.7% HumanEval vs 45% for the base model. We've talked before about HumanEval being limited in a lot of cases and how it needs to be complemented with “vibe based” evals. Phind thinks of evals alongside two axes: * Context quality: when asking the model to generate code, was the context high quality? Did we put outdated examples in it? Did we retrieve the wrong files?* Result quality: was the code generated correct? Did it follow the instructions I gave it or did it misunderstand some of it?If you have bad results with bad context, you might get to a good result by working on better RAG. If you have good context and bad result you might either need to work on your prompting or you have hit the limits of the model, which leads you to fine tuning (like they did). Michael was really early to this space and started working on CommonCrawl filtering and indexing back in 2020, which led to a lot of the insights that now power Phind. 
We talked about that evolution, his experience at YC, how he got Paul Graham to invest in Phind and invite him to dinner at his house, and how Ron Conway connected him with Jensen Huang to get access to more GPUs! Show Notes* Phind* BigScience T0* InstructGPT Paper* Inception-V3* LMQL* Marginalia Nu* Mistral AI* People:* Paul Graham (pg)* Ron Conway* Yacine Jernite from HuggingFace* Jeff Delaney Timestamps* [00:00:00] Intros & Michael's early interest in computer vision* [00:03:14] Pivoting to NLP and natural language question answering models* [00:07:20] Building a search engine index of Common Crawl and web pages* [00:11:26] Releasing the first version of Hello based on the search index and BigScience T0 model* [00:14:02] Deciding to focus the search engine specifically for programmers* [00:17:39] Overview of Phind's current product and focus on code reasoning* [00:21:51] The future vision for Phind to go from idea to complete code* [00:24:03] Transitioning to using the GPT-4 model and the impact it had* [00:29:43] Developing the Phind model based on CodeLlama and additional training* [00:32:28] Plans to continue improving the Phind model with open source technologies* [00:43:59] The story of meeting Paul Graham and Ron Conway and how that impacted the company* [00:53:02] How Ron Conway helped them get GPUs from Nvidia* [00:57:12] Tips on how Michael learns complex AI topics* [01:01:12] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:19]Swyx: Hey, and today we have in the studio Michael Royzen from Phind. Welcome. [00:00:23]Michael: Thank you so much. [00:00:24]Alessio: It's great to be here. [00:00:25]Swyx: Yeah, we are recording this in a surprisingly hot October in San Francisco. And sometimes the studio works, but the Blue Angels are flying by right now, so sorry about the noise. So welcome. I've seen Phind blow up this year, mostly, I think since your launch in Feb and V2 and then your Hacker News posts. We tend to like to introduce our guests, but then obviously you can fill in the blanks with the origin story. You actually were a high school entrepreneur. You started SmartLens, which is a computer vision startup in 2017. [00:00:59]Michael: That's right. I remember when like TensorFlow came out and people started talking about, obviously at the time after AlexNet, the deep learning revolution was already in flow. Good computer vision models were a thing. And what really made me interested in deep learning was I got invited to go to Apple's WWDC conference as a student scholar because I was really into making iOS apps at the time. So I go there and I go to this talk where they added an API that let people run computer vision models on the device using far more efficient GPU primitives. After seeing that, I was like, oh, this is cool. This is going to have a big explosion of different computer vision models running locally on the iPhone. And so I had this crazy idea where it was like, what if I could just make this model that could recognize just about anything and have it run on the device? And that was the genesis for what eventually became SmartLens. I took this data set called ImageNet 22K. So most people, when they think of ImageNet, think of ImageNet 1K. But the full ImageNet actually has, I think, 22,000 different categories. 
So I took that, filtered it, pre-processed it, and then did a massive fine-tune on Inception V3, which was, I think, the state of the art deep convolutional computer vision model at the time. And to my surprise, it actually worked insanely well. I had no idea what would happen if I gave a single model that many categories. I think it ended up being approximately 17,000 categories that I collapsed them into. It worked so well that it actually worked better than Google Lens, which released its V1 around the same time. And on top of this, the model ran on the device. So it didn't need an internet connection. A big part of the issue with Google Lens at the time was that connections were slower. 4G was around, but it wasn't nearly as fast. So there was a noticeable lag having to upload an image to a server and get it back. But just processing it locally, even on the iPhones of the day in 2017, much faster. It was a cool little project. It got some traction. TechCrunch wrote about it. There was kind of like one big spike in usage, and then over time it tapered off. But people still pay for it, which is wild. [00:03:14]Swyx: That's awesome. Oh, it's like a monthly or annual subscription? [00:03:16]Michael: Yeah, it's like a monthly subscription. [00:03:18]Swyx: Even though you don't actually have any servers? [00:03:19]Michael: Even though we don't have any servers. That's right. I was in high school. I had a little bit of money. I was like, yeah. [00:03:25]Swyx: That's awesome. I always wonder about the modern equivalents, kind of like "Be My Eyes". And it was actually disclosed in the GPT-4 Vision system card recently that the usage was surprisingly not that frequent. The extent to which all three of us have our sense of sight. I would think that if I lost my sense of sight, I would use Be My Eyes all the time. The average usage of Be My Eyes per day is 1.5 times. [00:03:49]Michael: Exactly. I was thinking about this as well, where I was also looking into image captioning, where you give a model an image and then it tells you what's in the image. But it turns out that what people want is the exact opposite. People want to give a description of an image and then have the AI generate the image. [00:04:04]Alessio: Oh, the other way. [00:04:06]Michael: Exactly. And so at the time, I think there were some GANs, NVIDIA was working on this back in 2019, 2020. They had some impressive, I think, face GANs where they had this model that would produce these really high quality portraits, but it wasn't able to take a natural language description the way Midjourney or DALL-E 3 can and just generate you an image with exactly what you described in it. [00:04:32]Swyx: And how did that get into NLP? [00:04:35]Michael: Yeah, I released the SmartLens app and that was around the time I was a senior in high school. I was applying to college. College rolls around. I'm still sort of working on updating the app in college. But I start thinking like, hey, what if I make an enterprise version of this as well? At the time, there was Clarifai that provided some computer vision APIs, but I thought this massive classification model works so well and it's so small and so fast, might as well build an enterprise product. And I didn't even talk to users or do any of those things that you're supposed to do. I was just mainly interested in building a type of backend I've never built before. So I was mainly just doing it for myself just to learn. 
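As an aside, here is a minimal sketch of the kind of fine-tune Michael describes above: take a pretrained Inception V3 from torchvision and retrain its classifier heads on a collapsed label set. This is not the SmartLens code; the dataset path and the 17,000-class figure are illustrative placeholders taken from the interview.

```python
# Minimal sketch (not SmartLens's actual code): fine-tune a pretrained Inception V3
# on a custom, collapsed label set using torchvision. Paths and class count are
# illustrative placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 17_000  # collapsed ImageNet-22K categories (approximate, per the interview)

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)                        # main classifier
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, NUM_CLASSES)    # auxiliary head

preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),   # Inception V3 expects 299x299 inputs
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("imagenet22k_collapsed/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    outputs, aux_outputs = model(images)   # training mode returns both heads
    loss = criterion(outputs, labels) + 0.4 * criterion(aux_outputs, labels)
    loss.backward()
    optimizer.step()
```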
I built this enterprise classification product and as part of it, I'm also building an invoice processing product where using some of the aspects that I built previously, although obviously it's very different from classification, I wanted to be able to just extract a bunch of structured data from an unstructured invoice through our API. And that's what led me to Hugging Face for the first time because that involves some natural language components. And so I go to Hugging Face and with various encoder models that were around at the time, I used the standard BERT and also Longformer, which came out around the same time. And Longformer was interesting because it had a much bigger context window than those models at the time, like BERT, all of the first gen encoder only models, they only had a context window of 512 tokens and it's fixed. There's none of this ALiBi or RoPE that we have now where we can basically massage it to be longer. They're fixed, 512 absolute encodings. Longformer at the time was the only way that you can fit, say, like a sequence length or ask a question about like 4,000 tokens worth of text. Implemented Longformer, it worked super well, but like nobody really kind of used the enterprise product and that's kind of what I expected because at the end of the day, it was COVID. I was building this kind of mostly for me, mostly just kind of to learn. And so nobody really used it and my heart wasn't in it and I kind of just shelved it. But a little later, I went back to Hugging Face and I saw this demo that they had, and this is in the summer of 2020. They had this demo made by this researcher, Yacine Jernite, and he called it long form question answering. And basically, it was this self-contained notebook demo where you can ask a question the way that we do now with ChatGPT. It would do a lookup into some database and it would give you an answer. And it absolutely blew my mind. The demo itself, it used, I think, BART as the model and in the notebook, it had support for both an Elasticsearch index of Wikipedia, as well as a dense index powered by Facebook's FAISS. I think that's how you pronounce it. It was very iffy, but when it worked, I think the question in the demo was, why are all boats white? When it worked, it blew my mind that instead of doing this few shot thing, like people were doing with GPT-3 at the time, which was all the rage, you could just ask a model a question, provide no extra context, and it would know what to do and just give you the answer. It blew my mind to such an extent that I couldn't stop thinking about that. When I started thinking about ways to make it better, I tried training, doing the fine-tune with a larger BART model. And this BART model, yeah, it was fine-tuned on this Reddit data set called ELI5. So basically... [00:08:02]Alessio: Subreddit. [00:08:03]Swyx: Yeah, subreddit. [00:08:04]Alessio: Yeah. [00:08:05]Michael: And put it into like a well-formatted, relatively clean data set of like human questions and human answers. And that was a really great bootstrap for that model to be able to answer these types of questions. And so ELI5 actually turned out to be a good data set for training these types of question answering models, because the question is written by a human, the answer is written by a human, and at least helps the model get the format right, even if the model is still very small and it can't really think super well, at least it gets the format right. 
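For reference, a minimal sketch of the long-context extractive QA Michael describes with Longformer, using the Hugging Face transformers API. This is not his invoice product; the checkpoint name is a stand-in, and in practice you would want a variant already fine-tuned for question answering rather than the base model.

```python
# Minimal sketch, assuming a Longformer checkpoint with a QA head: ask a question
# about a ~4,000-token document in one pass instead of BERT's 512-token window.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "allenai/longformer-base-4096"  # stand-in; a QA-fine-tuned variant works better
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What payment terms are listed on this invoice?"
long_document = open("invoice.txt").read()   # several thousand tokens of unstructured text

inputs = tokenizer(question, long_document, return_tensors="pt",
                   truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)

# Extractive QA: pick the highest-scoring start/end span and decode it.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax()) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```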
And so it ends up acting as kind of a glorified summarization model, where if it's fed in high quality context from the retrieval system, it's able to have a reasonably high quality output. And so once I made the model as big as I could, just fine-tuning on BART large, I started looking for ways to improve the index. So in the demo, in the notebook, there were instructions for how to make an Elasticsearch index just for Wikipedia. And I was like, why not do all of Common Crawl? So I downloaded Common Crawl, and thankfully, I had like $10,000 or $15,000 worth of AWS credits left over from the SmartLens project. And that's what really allowed me to do this, because there's no other funding. I was still in college, not a lot of money, and so I was able to spin up a bunch of instances and just process all of Common Crawl, which is massive. So it's roughly like, it's terabytes of text. I went to Alexa to get the top 1,000 websites or 10,000 websites in the world, then filtered only by those websites, and then indexed those websites, because the web pages were already included in the dump. [00:09:38]Swyx: You mean to supplement Common Crawl or to filter Common Crawl? [00:09:41]Michael: Filter Common Crawl. [00:09:42]Alessio: Oh, okay. [00:09:43]Michael: Yeah, sorry. So we filtered Common Crawl just by the top, I think, 10,000, just to limit this, because obviously there's this massive long tail of small sites that are really cool, actually. There's other projects like, shout out to Marginalia Nu, which is a search engine specialized on the long tail. I think they actually exclude the top 10,000. [00:10:03]Swyx: That's what they do. [00:10:04]Alessio: Yeah. [00:10:05]Swyx: I've seen them around, I just don't really know what their pitch is. Okay, that makes sense. [00:10:08]Michael: So they exclude all the top stuff. So the long tail is cool, but for this, that was kind of out of the question, and that was most of the data anyway. So we've removed that. And then I indexed the remaining approximately 350 million webpages through Elasticsearch. So I built this index running on AWS with these webpages, and it actually worked quite well. You can ask it general common knowledge, history, politics, current events, questions, and it would be able to do a fast lookup in the index, feed it into the model, and it would give a surprisingly good result. And so when I saw that, I thought that this is definitely doable. And it kind of shocked me that no one else was doing this. And so this was now the fall of 2020. And yeah, I was kind of shocked no one was doing this, but it costs a lot of money to keep it up. I was still in college. There are things going on. I got bogged down by classes. And so I ended up shelving this for almost a full year, actually. When I returned to it in fall of 2021, when BigScience released T0, when BigScience released the T0 models, that was a massive jump in the reasoning ability of the model. And it was better at reasoning, it was better at summarization, it was still a glorified summarizer basically. [00:11:26]Swyx: Was this a precursor to Bloom? Because Bloom's the one that I know. [00:11:29]Alessio: Yeah. [00:11:30]Michael: Actually coming out in 2022. But Bloom had other problems where for whatever reason, the Bloom models just were never really that good, which is so sad because I really wanted to use them. But I think they didn't train on that much data. I think they used like the original, they were trying to replicate GPT-3.
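To make the indexing step concrete, here is a hypothetical sketch of the pipeline Michael describes: filter Common Crawl WARC records against a top-domains allowlist and bulk-index the surviving pages into Elasticsearch. File names, the index name, and the allowlist source are placeholders, not his actual setup.

```python
# Hypothetical sketch: filter Common Crawl to a top-domains allowlist and index
# the pages into Elasticsearch. Requires `pip install warcio elasticsearch`.
from urllib.parse import urlparse
from warcio.archiveiterator import ArchiveIterator
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
TOP_DOMAINS = set(open("alexa_top_10k.txt").read().split())   # e.g. "wikipedia.org"

def filtered_pages(warc_path):
    """Yield Elasticsearch bulk actions for responses from allowlisted domains."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            domain = urlparse(url).netloc.removeprefix("www.")
            if domain not in TOP_DOMAINS:
                continue   # drop the long tail, keep only the top sites
            html = record.content_stream().read().decode("utf-8", errors="ignore")
            yield {
                "_index": "commoncrawl-top10k",
                "_source": {"url": url, "domain": domain, "html": html},
            }

helpers.bulk(es, filtered_pages("CC-MAIN-sample.warc.gz"))
```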
So they just use those numbers, which we now know are like far below Chinchilla Optimal and even Chinchilla Optimal, which we can like talk about later, like what we're currently doing with MIMO goes, yeah, it goes way beyond that. But they weren't training on enough data. I'm not sure how clean that data was, but it probably wasn't super clean. And then they didn't really do any fine-tuning until much later. So T0 worked well because they took the T5 models, which were closer to Chinchilla Optimal because I think they were trained on also like 300 something billion tokens, similar to GPT-3, but the models were much smaller. I think T0 is the first model that did large scale instruction tuning from diverse data sources in the fall of 2021. This is before InstructGPT. This is before Flan-T5, which came out in 2022. This is the very, very first, at least well-known example of that. And so it came out and then I did, on top of T0, I also did the Reddit ELI5 fine-tune. And that was the first model and system that actually worked well enough to where I didn't get discouraged like I did previously, because the failure cases of the BART based system were so egregious. Sometimes it would just miss a question so horribly that it was just extremely discouraging. But for the first time, it was working reasonably well. Also using a much bigger model. I think the BART model is like 800 million parameters, but T0, we were using 3B. So it was T0, 3B, bigger model. And that was the very first iteration of Hello. So I ended up doing a show HN on Hacker News in January 2022 of that system. Our fine-tuned T0 model connected to our Elasticsearch index of those 350 million top 10,000 Common Crawl websites. And to the best of my knowledge, I think that's the first example that I'm aware of of an LLM search engine model that's effectively connected to like a large enough index that I consider like an internet scale. So I think we were the first to release like an internet scale LLM powered RAG search system in January 2022, around the time me and my future co-founder, Justin, we were like, this seems like the future. [00:14:02]Alessio: This is really cool. [00:14:03]Michael: I couldn't really sleep even like I was going to bed and I was like, I was thinking about it. Like I would say up until like 2:30 AM, like reading papers on my phone in bed, go to sleep, wake up the next morning at like eight and just be super excited to keep working. And I was also doing my thesis at the same time, my senior honors thesis at UT Austin about something very similar. We were researching factuality in abstractive question answering systems. So a lot of overlap with this project and the conclusions of my research actually kind of helped guide the development path of Hello. In the research, we found that LLMs, they don't know what they don't know. So the conclusion was, is that you always have to do a search to ensure that the model actually knows what it's talking about. And my favorite example of this even today is kind of with ChatGPT browsing, where you can ask ChatGPT browsing, how do I run llama.cpp? And ChatGPT browsing will think that llama.cpp is some file on your computer that you can just compile with GCC and you're all good. It won't even bother doing a lookup, even though I'm sure somewhere in their internal prompts they have something like, if you're not sure, do a lookup. [00:15:13]Alessio: That's not good enough. So models don't know what they don't know. [00:15:15]Michael: You always have to do a search.
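Putting the two halves together, here is a minimal sketch of the Hello-style retrieve-then-generate loop described above: pull passages from the Elasticsearch index and let a T0-style seq2seq model write the answer grounded in them. Index and checkpoint names are placeholders, not the original system.

```python
# Minimal sketch of a retrieve-then-generate loop, assuming the Elasticsearch
# index from the previous step and a T0-style seq2seq checkpoint.
from elasticsearch import Elasticsearch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

es = Elasticsearch("http://localhost:9200")
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

def answer(question: str, k: int = 3) -> str:
    # Lexical retrieval over the indexed web pages.
    hits = es.search(index="commoncrawl-top10k",
                     query={"match": {"html": question}}, size=k)["hits"]["hits"]
    context = "\n\n".join(h["_source"]["html"][:2000] for h in hits)

    # The model doesn't know what it doesn't know, so always ground it in search results.
    prompt = (f"Answer the question using the context.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(answer("Why are all boats white?"))
```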
And so we approached LLM powered question answering from the search angle. We pivoted to make this for programmers in June of 2022, around the time that we were getting into YC. We realized that what we're really interested in is the case where the models actually have to think. Because up until then, the models were kind of more glorified summarization models. We really thought of them like the Google featured snippets, but on steroids. And so we saw a future where the simpler questions would get commoditized. And I still think that's going to happen with like Google SGE and like it's nowadays, it's really not that hard to answer the more basic kind of like summarization, like current events questions with lightweight models that'll only continue to get cheaper over time. And so we kind of started thinking about this trade-off where LLM models are going to get both better and cheaper over time. And that's going to force people who run them to make a choice. Either you can run a model of the same intelligence that you could previously for cheaper, or you can run a better model for the same price. So someone like Google, once the price kind of falls low enough, they're going to deploy and they're already doing this with SGE, they're going to deploy a relatively basic glorified summarizer model that can answer very basic questions about like current events, who won the Super Bowl, like, you know, what's going on on Capitol Hill, like those types of things. The flip side of that is like more complex questions where like you have to reason and you have to solve problems and like debug code. And we realized like we're much more interested in kind of going along the bleeding edge of that frontier case. And so we've optimized everything that we do for that. And that's a big reason why we've built Phind specifically for programmers, as opposed to saying like, you know, we're kind of a search engine for everyone because as these models get more capable, we're very interested in seeing kind of what the emergent properties are in terms of reasoning, in terms of being able to solve complex multi-step problems. And I think that some of those emerging capabilities like we're starting to see, but we don't even fully understand. So I think there's always an opportunity for us to become more general if we wanted, but we've been along this path of like, what is the best, most advanced reasoning engine that's connected to your code base, that's connected to the internet that we can just provide. [00:17:39]Alessio: What is Phind today, pragmatically, from a product perspective, how do people interact with it? Yeah. Or does it plug into your workflow? [00:17:46]Michael: Yeah. [00:17:47]Alessio: So Phind is really a system. [00:17:48]Michael: Phind is a system for programmers when they have a question or when they're frustrated or when something's not working. [00:17:54]Swyx: When they're frustrated. [00:17:55]Alessio: Yeah. [00:17:56]Michael: For them to get unblocked. I think like the single most abstract pitch for Phind is like, if you're experiencing really any kind of issue as a programmer, we'll solve that issue for you in 15 seconds as opposed to 15 minutes or longer. Phind has an interface on the web. It has an interface in VS Code and more IDEs to come, but ultimately it's just a system where a developer can paste in a question or paste in code that's not working and Phind will do a search on the internet or it will find other code in your code base perhaps that's relevant. 
And then we'll find the context that it needs to answer your question and then feed it to a reasoning engine powerful enough to actually answer it. So that's really the philosophy behind Phind. It's a system for getting developers the answers that they're looking for. And so right now from a product perspective, this means that we're really all about getting the right context. So the VS Code extension that we launched recently is a big part of this because you can just ask a question and it knows where to find the right code context in your code. It can do an internet search as well. So it's up to date and it's not just reliant on what the model knows and it's able to figure out what it needs by itself and answer your question based on that. If it needs some help, there's also opportunities for you yourself to put all that context in. But the issue is also like not everyone wants to use VS Code. Some people like are real Neovim sticklers or they're using like PyCharm or other IDEs, JetBrains. And so for those people, they're actually like okay with switching tabs, at least for now, if it means them getting their answer. Because really like there's been an explosion of all these like startups doing code, doing search, etc. But really who everyone's competing with is ChatGPT, which only has like that one web interface. Like ChatGPT is really the bar. And so that's what we're up against. [00:19:50]Alessio: And so your idea, you know, we have Aman from Cursor on the podcast and they've gone through the we need to own the IDE thing. Yours is more like in order to get the right answer, people are happy to like go somewhere else basically. They're happy to get out of their IDE. [00:20:05]Michael: That was a great podcast, by the way. But yeah, so part of it is that people sometimes perhaps aren't even in an IDE. So like the whole task of software engineering goes way beyond just running code, right? There's also like a design stage. There's a planning stage. A lot of this happens like on whiteboards. It happens in notebooks. And so the web part also exists for that where you're not even coding it and you're just trying to get like a more conceptual understanding of what you're trying to build first. The podcast with Aman was great, but somewhere where I disagree with him is that you need to own the IDE. I think like he made some good points about not having platform risk in the long term. But some of the features that were mentioned like suggesting diffs, for example, those are all doable with an extension. We haven't yet seen with VS Code in particular any functionality that we'd like to do yet in the IDE that we can't either do through directly supported VS Code functionality or something that we kind of hack into there, which we've also done a fair bit of. And so I think it remains to be seen where that goes. But I think what we're looking to be is like we're not trying to just be in an IDE or be an IDE. Like Phind is a system that goes beyond the IDE and like is really meant to cover the entire lifecycle of a developer's thought process in going about like, hey, like I have this idea and I want to get from that idea to a working product. And so then that's what the long term vision of Phind is really about is starting with that. In the future, I think programming is just going to be really just the problem solving. 
Like you come up with an idea, you come up with like the basic design for the algorithm in your head, and you just tell the AI, hey, just like just do it, just make it work. And that's what we're building towards. [00:21:51]Swyx: I think we might want to give people an impression about like type of traffic that you have, because when you present it with a text box, you could type in anything. And I don't know if you have some mental categorization of like what are like the top three use cases that people tend to coalesce around. [00:22:08]Alessio: Yeah, that's a great question. [00:22:09]Michael: The two main types of searches that we see are how-to questions, like how to do X using Y tool. And this historically has been our bread and butter, because with our embeddings, like we're really, really good at just going over a bunch of developer documentation and figuring out exactly the part that's relevant and just telling you, OK, like you can use this method. But as LLMs have gotten better, and as we've really transitioned to using GPT-4 a lot in our product, people organically just started pasting in code that's not working and just said, fix it for me. [00:22:42]Swyx: Fix this. [00:22:43]Alessio: Yeah. [00:22:44]Michael: And what really shocks us is that a lot of the people who do that, they're coming from ChatGPT. So they tried it in ChatGPT with GPT-4. It didn't work. Maybe it required like some multi-step reasoning. Maybe it required some internet context or something found in either a Stack Overflow post or some documentation to solve it. And so then they paste it into Phind and then Phind works. So those are really those two different cases. Like, how can I build this conceptually or like remind me of this one detail that I need to build this thing? Or just like, here's this code. Fix it. And so that's what a big part of our VS Code extension is, is like enabling a much smoother here just like fix it for me type of workflow. That's really its main benefit. Like it's in your code base. It's in the IDE. It knows how to find the relevant context to answer that question. But at the end of the day, like I said previously, that's still a relatively, not to say it's a small part, but it's a limited part of the entire mental life cycle of a programmer. [00:23:47]Swyx: Yep. So you launched in Feb and then you launched V2 in August. You had a couple other pretty impactful posts slash feature launches. The web search one was massive. So you were mostly a GPT-4 wrapper. We were for a long time. [00:24:03]Michael: For a long time until recently. Yeah. [00:24:05]Alessio: Until recently. [00:24:06]Swyx: So like people coming over from ChatGPT were getting the same model with your version of web search. Would that be the primary value proposition? [00:24:13]Michael: Basically yeah. And so what we've seen is that any model plus web search is just significantly better than [00:24:18]Alessio: that model itself. Do you think that's what you got right in April? [00:24:21]Swyx: Like so you got 1500 points on Hacker News in April, which is like, if you live on Hacker News a lot, that is unheard of for someone so early on in your journey. [00:24:31]Alessio: Yeah. [00:24:32]Michael: We're super, super grateful for that. Definitely was not expecting it. So what we've done with Hacker News is we've just kept launching. [00:24:38]Alessio: Yeah. [00:24:39]Michael: Like what they don't tell you is that you can just keep launching. That's what we've been doing. 
So we launched the very first version of Phind in its current incarnation after like the previous demo connected to our own index. Like once we got into YC, we scrapped our own index because it was too cumbersome at the time. So we moved over to using Bing as kind of just the raw source data. We launched as Hello Cognition. Over time, every time we like added some intelligence to the product, a better model, we just keep launching. And every additional time we launched, we got way more traffic. So we actually silently rebranded to Phind in late December of last year. But like we didn't have that much traffic. Nobody really knew who we were. [00:25:18]Swyx: How'd you pick the name out of it? [00:25:19]Michael: Paul Graham actually picked it for us. [00:25:21]Swyx: All right. [00:25:22]Alessio: Tell the story. Yeah. So, oh boy. [00:25:25]Michael: So this is the biggest aside. Should we go for like the full Paul Graham story or just the name? [00:25:29]Swyx: Do you want to do it now? Or do you want to do it later? I'll give you a choice. [00:25:32]Alessio: Hmm. [00:25:33]Michael: I think, okay, let's just start with the name for now and then we can do the full Paul Graham story later. But basically, Paul Graham, when we were lucky enough to meet him, he saw our name and our domain was at the time, sayhello.so and he's just like, guys, like, come on, like, what is this? You know? And we were like, yeah, but like when we bought it, you know, we were just kind of broke college students. Like we didn't have that much money. And like, we really liked hello as a name because it was the first like conversational search engine. And that's kind of, that's the angle that we were approaching it from. And so we had sayhello.so and he's like, there's so many problems with that. Like, like, like the say hello, like, what does that even mean? And like .so, like, it's gotta be like a .com. And so we spent some time just like with Paul Graham in the room. We just like looked at different domain names, like different things that like popped into our head. And one of the things that popped up, like Paul Graham said, was find with the Phind spelling in particular. [00:26:33]Swyx: Yeah. Which is not typical naming advice, right? Yes. Because it's not when people hear it, they don't spell it that way. [00:26:38]Michael: Exactly. It's hard to spell. And also it's like very 90s. And so at first, like, we didn't like, I was like, like, ah, like, I don't know. But over time it kept growing on us. And eventually we're like, okay, we like the name. It's owned by this elderly Canadian gentleman who we got to know, and he was willing to sell it to us. [00:26:57]Michael: And so we bought it and we changed the name. Yeah. [00:27:01]Swyx: Anyways, where were you? [00:27:02]Alessio: I had to ask. [00:27:03]Swyx: I mean, you know, everyone who looks at you is wondering. [00:27:06]Michael: And a lot of people actually pronounce it Phind, which, you know, by now it's part of the game. But eventually we want to buy Find.com and then just have that redirect to Phind. So Phind is like definitely the right spelling. But like, we'll just, yeah, we'll have all the cases addressed. [00:27:23]Swyx: Cool. So Bing web search, and then August you launched V2. Is V2 the Phind as a system pitch? Or have you moved, evolved since then? [00:27:31]Michael: Yeah, so I don't, like the V2 moniker, like, I don't really think of it that way in my mind. 
There's like, there's the version we launched during, last summer during YC, which was the Bing version directed towards programmers. And that's kind of like, that's why I call it like the first incarnation of what we currently are. Because it was already directed towards programmers. We had like a code snippet search built in as well, because at the time, you know, the models we were using weren't good enough to generate code snippets. Even GPT, like text-davinci-002, which was available at the time, wasn't that good at generating code and it would generate like very, very short, very incomplete code snippets. And so we launched that last summer, got some traction, but really like we were only doing like, I don't know, maybe like 10,000 searches a day. [00:28:15]Alessio: Some people knew about it. [00:28:16]Michael: Some people used it, which is impressive because looking back, the product like was not that good. And every time we've like made an improvement to the way that we retrieve context through better embeddings, more intelligent, like HTML parsers, and importantly, like better underlying models. Every major version after that was when we introduced a better underlying answering model. Like in February, we had to swallow a bit of our pride when we were like, okay, our own models aren't good enough. We have to go to OpenAI. And actually that did lead to kind of like our first decent bump of traffic in February. And people kept using it, like our retention was way better too. But we were still kind of running into problems of like more advanced reasoning. Some people tried it, but people were leaving because even like GPT-3.5, both turbo and non-turbo, like still not that great at doing like code related reasoning beyond the how do you do X, like documentation search type of use case. And so it was really only when GPT-4 came around in April that we were like, okay, like this is like our first real opportunity to really make this thing like the way that it should have been all along. And having GPT-4 as the brain is what led to that Hacker News post. And so what we did was we just let anyone use GPT-4 on Phind for free without a login, [00:29:43]Alessio: which I actually don't regret. [00:29:45]Michael: So it was very expensive, obviously. But like at that stage, all we needed to do was show like, we just needed to like show people here's what Phind can do. That was the main thing. And so that worked. That worked. [00:29:58]Alessio: Like we got a lot of users. [00:29:59]Michael: Do you know Fireship? [00:30:01]Swyx: Yeah. YouTube, Jeff Delaney. [00:30:03]Michael: Yeah. He made a short about Phind. [00:30:06]Alessio: Oh. [00:30:07]Michael: And that's on top of the Hacker News post. And that's what like really, really made it blow up. It got millions of views in days. And he's just funny. Like what I love about Fireship is like he, like you guys, yeah, like humor goes a long, long way towards like really grabbing people's attention. And so that blew up. [00:30:25]Swyx: Something I would be anxious about as a founder during that period, so obviously we all remember that pretty closely. So there were a couple of people who had access to the GPT-4 API doing this, which is unrestricted access to GPT-4. And I have to imagine OpenAI wasn't that happy about that because it was like kind of de facto access to GPT-4 before they released it. [00:30:46]Alessio: No, no. [00:30:47]Michael: GPT-4 was in ChatGPT from day one. I think. 
OpenAI actually came to our support because what happened was we had people building unofficial APIs around it to try to get free access to it. And I think OpenAI actually has the right perspective on this where they're like, OK, people can do whatever they want with the API if they're paying for it, like they can do whatever they want, but it's like not OK if, you know, paying customers are being exploited by these other actors. They actually got in touch with us and they helped us like set up better Cloudflare bot monitoring controls to effectively like crack down on those unofficial APIs, which we're very happy about. But yeah, so we launched GPT-4. A lot of people come to the product and yeah, for a long time, we're just we're figuring out like what do we make of this, right? How do we make it better, but also deal with like our costs, which have just like massively, massively ballooned. Over time, it's become more clear with the release of Llama 2 and Llama 3 on the horizon that we will once again see a return to vertical applications running their own models. As was true last year and before, I think that GPT-4, my hypothesis is that the jump from 4 to 4.5 or 4 to 5 will be smaller than the jump from 3 to 4. And the reason why is because there were a lot of different things. Like there was two plus, effectively two, two and a half years of research that went into going from 3 to 4. Like more data, bigger model, all of the instruction tuning techniques, RLHF, all of that is known. And like Meta, for example, and now there's all these other startups like Mistral too, like there's a bunch of very well-funded open source players that are now working on just like taking the recipe that's now known and scaling it up. So I think that even if a delta exists, the delta in 2024 between proprietary and open source won't be large enough that a startup like us, with a lot of data that we've collected, can't take the data that we have, fine-tune an open source model, and like be able to have it be better than whatever the proprietary model is at the time. That's my hypothesis. Michael: But we'll once again see a return to these verticalized models. And that's something that we're super excited about because, yeah, that brings us to kind of the Phind model because the plan from kind of the start was to be able to return to that if that makes sense. And I think now we're definitely at a point where it does make sense because we have requests from users who like, they want longer context in the model, basically, like they want to be able to ask questions about their entire code base without, you know, chunking and retrieval and taking a chance on that. Like, I think it's generally been shown that if you have the space to just put the raw files inside of a big context window, that is still better than chunking and retrieval. So there's various things that we could do with longer context, faster speed, lower cost. Super excited about that. And that's the direction that we're going with the Phind model. And our big hypothesis there is precisely that we can take a really good open source model and then just train it on absolutely all of the high quality data that we can find. And there's a lot of various, you know, interesting ideas for this. We have our own techniques that we're kind of playing with internally. One of the very interesting ideas that I've seen, I think it's called Octopack from BigCode. I don't think that it made that big of waves when it came out, I think in August. 
But the idea is that they have this data set that maps GitHub commits to a change. So basically there's all this really high quality, like human made, human written diff data out there on every time someone makes a commit in some repo. And you can use that to train models. Take the file state before and like given a commit message, what should that code look like in the future? [00:34:52]Swyx: Got it. [00:34:53]Alessio: Do you think your HumanEval is any good? Michael: So we ran this experiment. We trained the Phind model. And if you go to the BigCode leaderboard, as of today, October 5th, all of our models are at the top of the BigCode leaderboard by far. It's not close, particularly in languages other than Python. We have a 10 point gap between us and the next best model on JavaScript. I think C sharp, multilingual. And what we kind of learned from that whole experience releasing those models is that HumanEval doesn't really matter. Not just that, but GPT-4 itself has been trained on HumanEval. And we know this because GPT-4 is able to predict the exact docstring in many of the problems. I've seen it predict like the specific example values in the docstring, which is extremely improbable. So I think there's a lot of dataset contamination and it only captures a very limited subset of what programmers are actually doing. What we do internally for evaluations are we have GPT-4 score answers. GPT-4 is a really good evaluator. I mean, obviously by really good, I mean, it's the best that we have. I'm sure that, you know, a couple of months from now, next year, we'll be like, oh, you know, like GPT-4.5, GPT-5, it's so much better. Like GPT-4 is terrible, but like right now it's the best that we have short of humans. And what we found is that when doing like temperature zero evals, GPT-4 is actually mostly deterministic across runs in assigning scores to two different answers. So we found it to be a very useful tool in comparing our model to say, GPT-4, but yeah, on our like internal real world, here's what people will be asking this model dataset. And the other thing that we're running is just like releasing the model to our users and just seeing what they think. Because that's like the only thing that really matters is like releasing it for the application that it's intended for, and then seeing how people react. And for the most part, the incredible thing is, is that people don't notice a difference between our model and GPT-4 for the vast majority of searches. There's some reasoning problems that GPT-4 can still do better. We're working on addressing that. But in terms of like the types of questions that people are asking on Phind, there's not that much difference. And in fact, I've been running my own kind of side by side comparisons, shout out to GodMode, by the way. [00:37:16]Michael: And I've like myself, I've kind of confirmed this to be the case. And even sometimes it gives a better answer, perhaps like more concise or just like better implementation than GPT-4, which that's what surprises me. And by now we kind of have like this reasoning is all you need kind of hypothesis where we've seen emerging capabilities in the Phind model, whereby training it on high quality code, it can actually like reason better. It went from not being able to solve word problems, like riddles with like temporal placement of objects and moving and stuff like that, that GPT-4 can do pretty well. 
We went from not being able to do those at all to being able to do them just by training on more code, which is wild. So we're already like starting to see like these emerging capabilities. [00:37:59]Swyx: So I just wanted to make sure that we have the, I guess, like the model card in our heads. So you started from Code Llama? [00:38:07]Alessio: Yes. [00:38:08]Swyx: 65, 34? 34. [00:38:10]Michael: So unfortunately, there's no Code Llama 70B. If there was, that would be super cool. But there's not. [00:38:15]Swyx: 34. And then, which in itself was Llama 2, which is on 2 trillion tokens and they added 500 billion code tokens. Yes. [00:38:22]Michael: And you just added a bunch more. [00:38:23]Alessio: Yeah. [00:38:24]Michael: And they also did a couple of things. So they did, I think they did 500 billion, like general pre-training and then they did an extra 20 billion long context pre-training. So they actually increased the like max position tokens to 16k up from 8k. And then they changed the theta parameter for the RoPE embeddings as well to give it theoretically better long context support up to 100k tokens. But yeah, but otherwise it's like basically Llama 2. [00:38:50]Swyx: And so you just took that and just added data. [00:38:52]Michael: Exactly. [00:38:53]Swyx: You didn't do any other fundamental. [00:38:54]Michael: Yeah. So we didn't actually, we haven't yet done anything with the model architecture and we just trained it on like many, many more billions of tokens on our own infrastructure. And something else that we're taking a look at now is using reinforcement learning for correctness. One of the interesting pitfalls that we've noticed with the Phind model is that in cases where it gets stuff wrong, it sometimes is capable of getting the right answer. It's just, there's a big variance problem. It's wildly inconsistent. There are cases when it is able to get the right chain of thought and able to arrive [00:39:25]Alessio: at the right answer, but not always. [00:39:27]Michael: And so like one of our hypotheses is something that we're going to try is that like we can actually do reinforcement learning on, for a given problem, generate a bunch of completions and then like use the correct answer as like a loss basically to try to get it to be more correct. And I think there's a high chance I think of this working because it's very similar to the like RLHF method where you basically show pairs of completions for a given question except the criteria is like which one is like less harmful. But here we have a different criteria. But if the model is already capable of getting the right answer, which it is, we're just, we just need to cajole it into being more consistent. [00:40:06]Alessio: There were a couple of things that I noticed in the product that were not strange but unique. So first of all, the model can talk multiple times in a row, like most other applications it's like human, model, human, model. And then you had outside of the thumbs up, thumbs down, you have things like have the LLM prioritize this message and its answers or then continue from this message to like go back. How does that change the flow of the user and like in terms of like prompting it, yeah, what are like some tricks or learnings you've had? [00:40:37]Michael: So yeah, that's specifically in our pair programmer mode, which is a more conversational mode that also like asks you clarifying questions back if it doesn't fully understand what you're doing and it kind of it holds your hand a bit more. 
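Before continuing, here is a hypothetical sketch of the "generate many completions, score them against a known correct answer" loop and the temperature-zero GPT-4 judging described above. The model names, the scoring prompt, and the example question are illustrative placeholders, not Phind's internal setup.

```python
# Hypothetical sketch: best-of-n sampling plus a temperature-zero GPT-4 judge,
# the building blocks of the RL-for-correctness idea discussed above.
from openai import OpenAI

client = OpenAI()

def sample_completions(question: str, n: int = 8) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4",                     # stand-in for the model being improved
        messages=[{"role": "user", "content": question}],
        temperature=0.8,                   # diverse samples expose the variance problem
        n=n,
    )
    return [choice.message.content for choice in resp.choices]

def judge_score(question: str, candidate: str, reference: str) -> float:
    """Temperature-zero judging is close to deterministic across runs."""
    prompt = (f"Question:\n{question}\n\nReference answer:\n{reference}\n\n"
              f"Candidate answer:\n{candidate}\n\n"
              "Score the candidate from 0 to 10 for correctness. Reply with only the number.")
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return float(resp.choices[0].message.content.strip())

# Keep the best-scoring samples; in a rejection-sampling or RL setup these become
# the preferred completions used to push the model toward consistency.
question = "Why does this Python loop never terminate? ..."
reference = "The counter is never incremented inside the loop."
ranked = sorted(sample_completions(question),
                key=lambda c: judge_score(question, c, reference), reverse=True)
print(ranked[0])
```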
And so from user feedback, we had requests to make more of an AutoGPT where you can kind of give it this problem that might take multiple searches or multiple different steps like multiple reasoning steps to solve. And so that's the impetus behind building that product. Being able to do multiple steps and also be able to handle really long conversations. Like people are really trying to use the pair programmer to go from like sometimes really from like basic idea to like complete working code. And so what we noticed was that we were having like these very, very long threads, sometimes with like 60 messages, like 100 messages. And like those become really, really challenging to manage the appropriate context window of what should go inside of the context and how to preserve the context so that the model can continue or the product can continue giving good responses, even if you're like 60 messages deep in a conversation. So that's where the prioritized user messages like comes from. It's like people have asked us to just like let them pin messages that they want to be left in the conversation. And yeah, and then that seems to have like really gone a long way towards solving that problem, yeah. [00:41:54]Alessio: And then you have a run-on-Replit thing. Are you planning to build your own REPL? Like, learning that some people are trying to run the wrong code, unsafe code? [00:42:03]Michael: Yes. Yes. So I think like in the long term vision of like being a place where people can go from like idea to like fully working code, having a code sandbox, like a natively integrated code sandbox makes a lot of sense. And Replit is great and people use that feature. But yeah, I think there's more we can do in terms of like having something a bit closer to Code Interpreter where it's able to run the code and then like recursively iterate on it. Exactly. [00:42:31]Swyx: So you're working on APIs to enable you to do that? Yep. So Amjad has specifically told me in person that he wants to enable that for people at the same time. He's also working on his own models, and Ghostwriter and you know, all the other stuff. So it's going to get interesting. Like he wants to power you, but also compete with you. Yeah. [00:42:47]Michael: And like, and we love Replit. I think that a lot of the companies in our space, like we're all going to converge to solving a very similar problem, but from a different angle. So like Replit approaches this problem from the IDE side. Like they started as like this IDE that you can run in the browser. And they started from that side, making coding just like more accessible. And we're approaching it from the side of like an LLM that's just like connected to everything that it needs to be connected to, which includes your code context. So that's why we're kind of making inroads into IDEs, but we're kind of, we're approaching this problem from different sides. And I think it'll be interesting to see where things end up. But I think that in the long, long term, we have an opportunity to also just have like this general technical reasoning engine product that's potentially also not just for, not just for programmers. It's also powered in this web interface, like where there's potential, I think other things that we will build that eventually might go beyond like our current scope. [00:43:49]Swyx: Exciting. We'll look forward to that. We're going to zoom out a little bit into sort of AI ecosystem stories, but first we got to get the Paul Graham, Ron Conway story. [00:43:59]Alessio: Yeah. 
[00:44:00]Michael: So flashback to last summer, we're in the YC batch. We're doing the summer batch, summer 22. So the summer batch runs from June to September, approximately. And so this was late July, early August, right around the time that many like YC startups start like going out, like during up, here's how we're going to pitch investors and everything. And at the same time, me and my co-founder, Justin, we were planning on moving to New York. So for a long time, actually, we were thinking about building this company in New York, mainly for personal reasons, actually, because like during the pandemic, pre-ChatGPT, pre last year, pre the AI boom, SF unfortunately really kind of, you know, like lost its luster. Yeah. Like no one was here. It was far from clear, like if there would be an AI boom, if like SF would be like... [00:44:49]Alessio: Back. [00:44:50]Michael: Yeah, exactly. Back. As everyone is saying these days, it was far from clear. And so, and all of our friends, we were graduating college because like we happened to just graduate college and immediately start YC, like we didn't even have, I think we had a week in between. [00:45:06]Swyx: You didn't bother looking for jobs. You were just like, this is what we want to do. [00:45:08]Michael: Well, actually both me and my co-founder, we had jobs that we secured in 2021 from previous internships, but we both, funny enough, when I spoke to my boss's boss at the company at where I reneged my offer, I told him we got into YC, they actually said, yeah, you should do YC. [00:45:27]Swyx: Wow. [00:45:28]Alessio: That's very selfless. [00:45:29]Swyx: That was really great that they did that. But in San Francisco, they would have offered to invest as well. [00:45:33]Michael: Yes, they would have. But yeah, but we were both planning to be in New York and all of our friends were there from college at this point, like we have this whole plan where like on August 1st, we're going to move to New York and we had like this Airbnb for the month of New York. We're going to stay there and we're going to work and like all of that. The day before we go to New York, I called Justin and I just, I tell him like, why are we doing this? Because in our batch, by the time August 1st rolled around, all of our mentors at YC were saying like, hey, like you should really consider staying in SF. [00:46:03]Swyx: It's the hybrid batch, right? [00:46:04]Michael: Yeah, it was the hybrid batch, but like there were already signs that like something was kind of like afoot in SF, even if like we didn't fully want to admit it yet. And so we were like, I don't know, I don't know. Something kind of clicked when the rubber met the road and it was time to go to New York. We're like, why are we doing this? And like, we didn't have any good reasons for staying in New York at that point beyond like our friends are there. So we still go to New York because like we have the Airbnb, like we don't have any other kind of place to go for the next few weeks. We're in New York and New York is just unfortunately too much fun. Like all of my other friends from college who are just, you know, basically starting their jobs, starting their lives as adults. They just stepped into these jobs, they're making all this money and they're like partying and like all these things are happening. And like, yeah, it's just a very distracting place to be. 
And so we were just like sitting in this like small, you know, like cramped apartment, terrible posture, trying to get as much work done as we can, too many distractions. And then we get this email from YC saying that Paul Graham is in town in SF and he is doing office hours with a certain number of startups in the current batch. And whoever signs up first gets it. And I happen to be super lucky. I was about to go for a run, but I just, I saw the email notification come across the street. I immediately clicked on the link and like immediately, like half the spots were gone, but somehow the very last spot was still available. And so I picked the very, very last time slot at 7 p.m. semi-strategically, you know, so we would have like time to go over. And also because I didn't really know how we're going to get to SF yet. And so we made a plan that we're going to fly from New York to SF and back to New York in one day and do like the full round trip. And we're going to meet with PG at the YC Mountain View office. And so we go there, we do that, we meet PG, we tell him about the startup. And one thing I love about PG is that he gets like, he gets so excited. Like when he gets excited about something, like you can see his eyes like really light up. And he'll just start asking you questions. In fact, it's a little challenging sometimes to like finish kind of like the rest of like the description of your pitch because like, he'll just like asking all these questions about how it works. And I'm like, you know, what's going on? [00:48:19]Swyx: What was the most challenging question that he asked you? [00:48:21]Michael: I think that like really how it worked. Because like as soon as like we told him like, hey, like we think that the future of search is answers, not links. Like we could really see like the gears turning in his head. I think we were like the first demo of that. [00:48:35]Swyx: And you're like 10 minutes with him, right? [00:48:37]Michael: We had like 45, yeah, we had a decent chunk of time. And so we tell him how it works. Like he's very excited about it. And I just like, I just blurted out, I just like asked him to invest and he hasn't even seen the product yet. We just asked him to invest and he says, yeah. And like, we're super excited about that. [00:48:55]Swyx: You haven't started your batch. [00:48:56]Michael: No, no, no. This is about halfway through the batch or two, two, no, two thirds of the batch. [00:49:02]Swyx: And you're like not technically fundraising yet. We're about to start fundraising. Yeah. [00:49:06]Michael: So we have like this demo and like we showed him and like there was still a lot of issues with the product, but I think like it must have like still kind of like blown his mind in some way. So like we're having fun. He's having fun. We have this dinner planned with this other friend that we had in SF because we were only there for that one day. So we thought, okay, you know, after an hour we'll be done, you know, we'll grab dinner with our friend and we'll fly back to New York. But PG was like, like, I'm having so much fun. Do you want to have dinner? Yeah. Come to my house. Or he's like, I gotta go have dinner with my wife, Jessica, who's also awesome, by the way. [00:49:40]Swyx: She's like the heart of YC. Yeah. [00:49:42]Michael: Jessica does not get enough credit as an aside for her role. [00:49:46]Swyx: He tries. [00:49:47]Michael: He understands like the technical side and she understands people and together they're just like a phenomenal team. 
But he's like, yeah, I got to go see Jessica, but you guys are welcome to come with. Do you want to come with? And we're like, we have this friend who's like right now outside of like literally outside the door who like we also promised to get dinner with. It's like, we'd love to, but like, I don't know if we can. He's like, oh, he's welcome to come too. So all of us just like hop in his car and we go to his house and we just like have this like we have dinner and we have this just chat about the future of search. Like I remember him telling Jessica distinctly, like our kids as kids are not going to know what like a search result is. Like they're just going to like have answers. That was really like a mind blowing, like inflection point moment for sure. [00:50:34]Swyx: Wow, that email changed your life. [00:50:35]Michael: Absolutely. [00:50:36]Swyx: And you also just spoiled the booking system for PG because now everyone's just going to go after the last slot. Oh man. [00:50:42]Michael: Yeah. But like, I don't know if he even does that anymore. [00:50:46]Swyx: He does. He does. Yeah. I've met other founders that he did it this year. [00:50:49]Michael: This year. Gotcha. But when we told him about how we did it, he was like, I am like frankly shocked that YC just did like a random like scheduling system. [00:50:55]Alessio: They didn't like do anything else. But, um. [00:50:58]Swyx: Okay. And then he introduces Duron Conway. Yes. Who is one of the most legendary angels in Silicon Valley. [00:51:04]Michael: Yes.So after PG invested, the rest of our round came together pretty quickly. [00:51:10]Swyx: I'm, by the way, I'm surprised. Like it's, it might feel like playing favorites right within the current batch to be like, yo, PG invested in this one. Right. [00:51:17]Alessio: Too bad for the others. [00:51:18]Swyx: Too bad for the others, I guess. [00:51:19]Michael: I think this is a bigger point about YC and like these accelerators in general is like YC gets like a lot of criticism from founders who feel like they didn't get value out of it. But like, in my view, YC is what you make of it. And YC tells you this. They're like, you really got to grab this opportunity, like buy the balls and make the most of it. And if you do, then it could be the best thing in the world. And if you don't, and if you're just kind of like a passive, even like an average founder in YC, you're still going to fail. And they tell you that. They're like, if you're average in your batch, you're going to fail. Like you have to just be exceptional in every way. With that in mind, perhaps that's even part of the reason why we asked PG to invest. And so yeah, after PG invested, the rest of our round came together pretty quickly, which I'm very fortunate for. And yeah, he introduced us to Ron. And after he did, I get a call from Ron. And then Ron says like, hey, like PG tells me what you're working on. I'd love to come meet you guys. And I'm like, wait, no way. And then we're just holed up in this like little house in San Mateo, which is a little small, but you know, it had a nice patio. In fact, we had like a monitor set up outside on the deck out there. And so Ron Conway comes over, we go over to the patio where like our workstation is. And Ron Conway, he's known for having like this notebook that he goes around with where he like sits down with the notebook and like takes very, very detailed notes. So he never like forgets anything. So he sits down with his notebook and he asks us like, hey guys, like, what do you need? 
And we're like, oh, we need GPUs. Back then, the GPU shortage wasn't even nearly as bad as it is now. But like even then, it was still challenging to get like the quota that we needed. And he's like, okay, no problem. And then like he leaves a couple hours later, we get an email and we're CC'd on an email that Ron wrote to Jensen, the CEO of Nvidia, saying like, hey, these guys need GPUs. [00:53:02]Swyx: You didn't say how much? It was just like, just give them GPUs. [00:53:04]Alessio: Basically, yeah. [00:53:05]Michael: Ron is known for writing these like one-liner emails that are like very short, but very to the point. And I think that's why like everyone responds to Ron. Everyone loves Ron. And so Jensen responds. He responds quickly, like tagging this VP of AI at Nvidia. And we start working with Nvidia, which is great. And something that I love about Nvidia, by the way, is that after that intro, we got matched with like a dedicated team. And at Nvidia, they know that they're going to win regardless. So they don't care where you get the GPUs from. They're like, they're truly neutral, unlike various sales reps that you might encounter at various like clouds and, you know, hardware companies, et cetera. They actually just want to help you because they know they don't care. Like regardless, they know that if you're getting Nvidia GPUs, they're still winning. So I guess that's a tip is that like if you're looking for GPUs like Nvidia, they'll help you do it. [00:53:54]Swyx: So just to tie up this thing, because so first of all, that's a fantastic story. And I just wanted to let you tell that because it's special. That is a strategic shift, right? That you already decided to make by the time you met Ron, which is we are going to have our own hardware. We're going to rack him in a data center somewhere. [00:54:11]Michael: Well, not even that we need our own hardware because actually we don't. Right. But we just we just need GPUs, period. And like every cloud loves like they have their own sales tactics and like they want to make you commit to long terms and like very non-flexible terms. And like there's a web of different things that you kind of have to navigate. Nvidia will kind of be to the point like, OK, you can do this on this cloud, this on this cloud. Like this is your budget. Maybe you want to consider buying as well. Like they'll help you walk through what the options are. And the reason why they're helpful is because like they look at the full picture. So they'll help you with the hardware. And in terms of software, they actually implemented a custom feature for us in Faster Transformer, which is one of their libraries.Swyx: For you? [00:54:53]Michael: For us. Yeah. Which is wild. I don't think they would have done it otherwise. They implemented streaming generation for T5 based models, which we were running at the time up until we switched to GPT in February, March of this year. So they implemented that just for us, actually, in Faster Transformer. And so like they'll help you like look at the complete picture and then just help you get done what you need to get done. I know one of your interests is also local models, open source models and hardware kind of goes hand in hand.Alessio: Any fun projects, explorations in the space that you want to share with local llamas and stuff? [00:55:27]Michael: Yeah, it's something that we're very interested in because something that kind of we're hearing a lot about is like people want something like find, especially comp
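On the streaming-generation feature mentioned above: the point of streaming is that each token is handed to the caller as soon as it is sampled instead of after the whole completion finishes. A generic sketch of the difference follows; it is illustrative only and does not use the FasterTransformer API (the `step` function is a hypothetical single-decoding-step callable).

```python
from typing import Callable, Iterator

def generate_blocking(step: Callable[[list[int]], int], prompt: list[int],
                      max_new: int, eos: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new):
        nxt = step(tokens)          # one forward pass -> next token id
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens                   # caller waits for the whole completion

def generate_streaming(step: Callable[[list[int]], int], prompt: list[int],
                       max_new: int, eos: int) -> Iterator[int]:
    tokens = list(prompt)
    for _ in range(max_new):
        nxt = step(tokens)
        tokens.append(nxt)
        yield nxt                   # caller can render this token immediately
        if nxt == eos:
            break

# Usage: for tok in generate_streaming(step, prompt, 256, eos_id): render(tok)
```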
Interview with Varun Mohan, CEO & Co-Founder of Exafunction and Codeium. He gives valuable insights on the considerations of AI deployment in the enterprise and shares his own thoughts and predictions on what the future holds for these models. Sign up to the Level-up Engineering newsletter!
In this interview we're covering:
Considerations of AI deployment
Benefits of deploying AI in the enterprise
Concerns regarding AI deployment
Typical mistakes when deploying AI
Excerpt from the interview:"When I think about the last 40 years, there hasn't been a sparsity of software in the world; there are just more and more software companies providing more and more value. Every ten years, we're having larger IPOs, and more companies are building systems that can automate different parts of enterprises. There's more software that can be consumed and generated across companies, but that doesn't reduce the need for employees. It'll just increase the value businesses can provide. Perhaps in the future, AI systems are going to understand more and more company-wide context over time, which is a big gap in their knowledge currently. When they're able to piece together more information from your systems, your documents in the cloud and your other products, they'll aid humans to be even more productive and perhaps even enable stronger cross-functional collaboration."
Varun Mohan is Co-founder & CEO of Codeium, the AI-powered coding platform for developers. In this episode, we dig into competing with a behemoth like GitHub Copilot, how developing their own infrastructure has enabled incredible scale, building a 30K person community, and much more!
Exafunction is a leader in deploying deep learning models at scale. One of their products is Codeium, a coding assistant for software developers based on Exafunction's deep learning technology. Codeium provides AI-assisted autocompletion in your IDE, making it easier for you to incorporate deep learning technology in your software development workflow. Varun Mohan is the CEO and co-founder of Exafunction. The post Modern Coding Superpowers with Varun Mohan appeared first on Software Engineering Daily.
The world increasingly runs on code. Accelerating the work of those who create that code will boost their productivity — and that's just what AI startup Codeium, a member of NVIDIA's Inception program for startups, aims to do. On the latest episode of NVIDIA's AI Podcast, host Noah Kravitz interviewed Codeium founder and CEO Varun Mohan and Jeff Wang, the company's head of business, about how AI is transforming software development. Codeium's AI-powered code acceleration toolkit boasts three core features: autocomplete, chat and search. Autocomplete intelligently suggests code segments, saving developers time by minimizing the need for writing boilerplate or unit tests. At the same time, the chat function empowers developers to rework or even create code with natural language queries, enhancing their coding efficiency while providing searchable context on the entire code base. Noah spoke with Mohan and Wang about the future of software development with AI, and the continued, essential role of humans in the process.
Chris sat down with Varun Mohan and Anshul Ramachandran, CEO / Cofounder and Lead of Enterprise and Partnership at Codeium, respectively. They discussed how to streamline and enable modern development in generative AI and large language models (LLMs). Their new tool, Codeium, was born out of the insights they gleaned from their work in GPU software and solutions development, particularly with respect to generative AI, large language models, and supporting infrastructure. Codeium is a free AI-powered toolkit for developers, with in-house models and infrastructure - not another API wrapper.
A scathing takedown of Serverless... By Amazon? We react to this strange revelation and more.
We've reached Compilado episode number 100!! In this special episode we bring several announcements for our followers, along with the programming-world news and updates that caught our attention from 04/22 to 04/28.
We have a laugh at Elon's alt account, why the knives are out for GitHub Co-pilot, and our thoughts on Apple's "major victory" this week.
OpenAI just rollicked the AI world yet again yesterday — while releasing the long awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series of system-wide optimizations, we've achieved 90% cost reduction for ChatGPT since December; we're now passing through those savings to API users.We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and in fact have started productizing their own infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. Timestamps* 00:00: Intro to Varun and Exafunction* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing* 05:30: Should companies own their ML infrastructure?* 07:00: The two kinds of LLM Applications* 08:30: Codeium* 14:50: “Our growth is 4-5% day over day”* 16:30: Latency, Quality, and Correctability* 20:30: Acceleration mode vs Exploration mode* 22:00: Copilot for X - Harvey AI's deal with Allen & Overy* 25:00: Scaling Laws (Chinchilla)* 28:45: “The compute-optimal model might not be easy to serve”* 30:00: Smaller models* 32:30: Deepmind Retro can retrieve external infromation* 34:30: Implications for embedding databases* 37:10: LLMOps - Eval, Data Cleaning* 39:45: Testing/User feedback* 41:00: “Users Is All You Need”* 42:45: General Intelligence + Domain Specific Dataset* 43:15: The God Nvidia computer* 46:00: Lightning roundShow notes* Varun Mohan Linkedin* Exafunction* Blogpost: Are GPUs Worth it for ML* Codeium* Copilot statistics* Eleuther's The Pile and The Stack* What Building “Copilot for X” Really Takes* Copilot for X* Harvey, Copilot for Law - deal with Allen & Overy* Scaling Laws* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)* chinchilla's wild implications (LessWrong)* UL2 20B: An Open Source Unified Language Learner (20B)* Paper - Deepmind Retro* “Does it make your beer taste better”* HumanEval benchmark/dataset* Reverse Engineering Copilot internals* Quora Poe* Prasanna Sankar notes on FLOPs and Bandwidth* NVIDIA H100 specs - 3TB/s GPU memory, 900GB/s NVLink Interconnect* Optimizer state is 14x size of model - 175B params => 2.5TB to store state → needs at least 30 H100 machines with 80GB each* Connor Leahy on The Gradient PodcastLightning Rounds* Favorite AI Product: Midjourney* Favorite AI Community: Eleuther and GPT-J* One year prediction: Better models, more creative usecases* Request for Startup: Superathlete Fitness Assistant* Takeaway: Continue to tinker!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer, editor of L Space Diaries.[00:00:20] swyx: Hey, and today we have Varun Mohan from Codeium / Exafunction on. I should introduce you a little bit because I like to get the LinkedIn background out of the way.[00:00:30] So you did CS at MIT and then you spent a few years at Nuro where you were ultimately tech lead manager for autonomy. And that's an interesting dive. 
Self-driving cars and AI, and then you went straight into Exafunction with a few of your coworkers and that's where I met some of them and started knowing about Exafunction.[00:00:51] And then from out of nowhere you cloned GitHub Copilot. That's a lot of progress in a very short amount of time. So anyway, welcome.[00:00:59] Varun Mohan: That's high praise.[00:01:00] swyx: What's one thing about you that doesn't appear on LinkedIn that is a big part of what people should know?[00:01:05] Varun Mohan: I actually really like endurance sports actually.[00:01:09] Like I, I've done multiple triathlons. I've actually biked from San Francisco to LA. I like things that are like suffering. I like to suffer while I, while I do sports. Yeah.[00:01:19] swyx: Do you think a lot about like code and tech while you're doing those endurance sports or are you just,[00:01:24] Varun Mohan: your mind is just focused?[00:01:26] I think it's maybe a little bit of both. One of the nice things about, I guess, endurance athletics, it's one of the few things you can do where you're not thinking about, you can't really think about much beyond suffering. Like you're climbing up a hill on a bike and you see like, uh, you see how many more feet you need to climb, and at that point you're just struggling.[00:01:45] That's your only job. Mm-hmm. Yeah. The only thing you can think of is, uh, pedaling one more pedal. So it's actually like a nice, a nice way to not think about work. Yeah,[00:01:53] Alessio Fanelli: yeah, yeah. Maybe for the audience, you wanna tell a bit about Exafunction, how that came to be and how Codeium came out[00:01:59] Varun Mohan: of that. So a little bit about Exafunction.[00:02:02] Before working at Exafunction, I worked at Nuro, as Sean was just saying, and at Nuro, I sort of managed large scale offline deep learning infrastructure. Realized that deep learning infrastructure is really hard to build and really hard to maintain for even the most sophisticated companies, and started Exafunction to basically solve that gap, to make it so that it was much easier for companies[00:02:24] to serve deep learning workloads at scale. One of the key issues that we noticed is GPUs are extremely hard to manage fundamentally because they work differently than CPUs. And once a company has heterogeneous hardware requirements, it's hard to make sure that you get the most outta the hardware. It's hard to make sure you can get, get great GPU utilization, and Exafunction was specifically built to make it so that you could get the most outta the hardware.[00:02:50] Make sure your GPU was effectively virtualized and decoupled from your workload to make it so that you could be confident that you were running at whatever scale you wanted without burning the bank.[00:03:00] swyx: Yeah. You gave me this metric about inefficiency,[00:03:03] Varun Mohan: right? Oh, okay. Like flop efficiency. Yeah. Yeah. So basically, I think it comes down to, for most people, one of the things about CPUs that's really nice is with containers, right?[00:03:13] You can end up having a single CPU. You can place many containers on it and all the containers will slowly start eating the compute. It's not really the same with GPUs. Like let's say you have a single GPU. For the most part, you only have one container using that GPU. And because of that, people heavily underestimate what a single container can sort of do.[00:03:33] And the GPU is left like heavily idle.
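One way to put a number on that idleness is to compare achieved FLOPs against the hardware's peak, which is the model flop utilization (MFU) idea the conversation turns to next. A back-of-the-envelope sketch, using the common approximation of ~6 FLOPs per parameter per trained token (the specific model size, throughput, and the A100-class 312 TFLOP/s bf16 peak below are illustrative assumptions):

```python
def mfu(params: float, tokens_per_second: float, peak_flops: float,
        flops_per_param_per_token: float = 6.0) -> float:
    """Achieved FLOPs/s divided by the hardware's peak FLOPs/s."""
    achieved = flops_per_param_per_token * params * tokens_per_second
    return achieved / peak_flops

# Example: a 3e9-parameter model processing 5,000 tokens/s on a GPU with a
# 312 TFLOP/s (bf16) peak -- numbers chosen only to show the arithmetic.
print(f"MFU ≈ {mfu(3e9, 5_000, 312e12):.1%}")  # ≈ 28.8%
```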
And I guess the common term now with a lot of LLM workloads is like the flop efficiency of these workloads. MFU, yeah. Yeah. Model flop utilization. The model flop utilization, which is basically like what fraction of the flops or compute on the hardware is actually getting used.[00:03:49] And sort of what we did at Exafunction. Not only make it so that the model was always running, we also built compiler technology to make it so that the model was also running more efficiently. And some of these things are with tricks like operator fusion, like basically you could imagine fusing two operations together such that the time it takes to compute[00:04:07] the fused operation is lower than the time it takes for each individual operation. Oh my God. Yeah.[00:04:13] Alessio Fanelli: Yeah. And you have this technique called dynamic multiplexing, which is basically, instead of having a one-to-one relationship, you have one GPU for multiple clients. And I saw one of your customers, they went from three clients to just one single GPU and cut the cost by 97%.[00:04:29] What were some of those learnings, seeing hardware usage and efficiencies, and how that then played into what, what[00:04:34] Varun Mohan: you're building? Yeah, I think it basically showed that there was probably a gap with even very sophisticated teams. Making good use of the hardware is just not an easy problem. I think that was the main, I, it's not that these teams were like not good at what they were doing, it's just that they were trying to solve a completely separate problem.[00:04:50] They had a model that was trained in-house and their goal was to just run it, and that should be an easy, easy thing to do, but surprisingly still, it's not that easy. And that problem compounds in complexity with the fact that there are more accelerators now in the cloud. There's like TPUs, Inferentia, and there's a lot of decisions, uh, that users need to make even in terms of GPU types.[00:05:10] And I guess sort of what we had was we had internal expertise on what the right way to run the workload was, and we were basically able to build infrastructure and make it so that companies could do that without thinking.[00:05:21] Alessio Fanelli: So most teams are underutilizing their hardware; how should they think about what to own?[00:05:26] You know, like should they own the inference architecture? Like should they use XLA to get it to production? How do you think[00:05:32] Varun Mohan: about it? So I think one thing that has proven to be true over the last year and a half is companies, for the most part, should not be trying to figure out what the optimal ML architecture is or training architecture is.[00:05:45] Especially with a lot of these large language models. We have generic models and transformer architecture that are solving a lot of distinct problems. I'll caveat that with most companies. Some of our customers, which are autonomous vehicle companies, have extremely strict requirements like they need to be able to run a model at very low latency, extremely high precision recall.[00:06:05] You know, GPT-3 is great, but the precision recall, you wouldn't trust someone's life with that, right? So because of that, they need to innovate new kinds of model architectures. For a vast majority of enterprises, they should probably be using something off the shelf, fine-tuning BERT models.
If it's vision, they should be fine-tuning a ResNet or using something like CLIP. Like, the less work they can do, the better.[00:06:25] And I guess that was a key turning point for us, which is like we start to build more and more infrastructure for the architectures that were the most popular, and the most popular architecture was the transformer architecture. We had a lot of LLM companies explicitly reach out to us and ask us, wow, our GPT-3 bill is high.[00:06:44] Is there a way to serve GPT-3 or some open source model much more cheaply? And that's sort of what we viewed as why we were maybe prepared for when we internally needed to deploy transformer models ourselves.[00:06:58] Alessio Fanelli: And so the next step was, Hey, we have this amazing infrastructure. We can build kind of consumer facing products, so to speak, with much better unit economics, much better performance.[00:07:08] And that's how Codeium kind[00:07:10] Varun Mohan: of came to be. Yeah. I think maybe the, the play is not maybe for us to be just, we make a lot of consumer products. We want to make products with like clear ROI in the long term in the enterprise. Like we view Codeium as maybe one of those things. Uh, and maybe we can, we can talk about Codeium maybe after this.[00:07:27] We view products like Copilot as being extremely valuable and something that is generating a lot of value to professionals. We saw that there was a gap there where a lot of people probably weren't developing high intensive LLM applications because of cost, because of the inability to train models the way they want to.[00:07:44] And we thought we could do that with our own infrastructure really quickly.[00:07:48] swyx: I wanna highlight when you say high intensive, you mean basically, uh, generate inferences on every keystroke? That's[00:07:55] Varun Mohan: right. Yeah. So I would say like, there's probably two kinds of LLM applications here.[00:07:59] There's an LLM application where, you know, it rips through a bunch of data and maybe you wait a couple minutes and then you see something, and then there's an application where the quality is not exactly what you want, but it's able to generate at low enough latency. It's still providing a ton of value.[00:08:16] And I will say there's like a gap there where the number of products that have hit that Copilot spot is actually not that high. Mm. A lot of them are, are kind of like wait and, you know, just generate a lot of stuff and see what happens, because one is clearly more compute intensive than the other, basically.[00:08:31] swyx: Well, Codeium, uh, I don't know if we told the whole story yet, you were going to[00:08:35] Varun Mohan: dive into it. Yeah, so I guess, I guess the story was, I guess four or five months ago we sort of decided internally as a team we were like very early adopters of Copilot. I'm not gonna sit here and say Copilot, it's not a great tool.[00:08:45] We love Copilot. It's like a fantastic tool. We all got on the beta. The moment it came out we're like a fairly small team, but we, like we all got in, we were showing each other completions. We end up writing like a lot of CUDA and C++ inside the company. And I think there was probably a thought process within us that was like, Hey, the code we write is like very high IQ.[00:09:04] You know? So like there's no way it can help. And one of the things in C++ that's like the most annoying is writing templates. Writing template programming is maybe one of those things.
No one, maybe there's like some people in the C++ standards community that can do it without looking at the, looking at anything online.[00:09:19] But we struggle. We struggle writing variadic templates, and Copilot just, like, ripped through. Like we had a 500 line file and it was just like writing templates like, and we didn't really even test it while we were running it. We then just compiled it and it just, we're like, wow. Like this is actually something that's not just like it's completing for loops, it's completing code for us.[00:09:38] That is like hard in our brains to reach, but fundamentally and logically is not that complicated. The only reason why it's complicated is there's just a lot of rules, right. And from then we were just like, wow, this is, that was maybe the first LLM application for us internally, because we're not like marketers that would use, uh, Jasper, where we were like, wow, this is like extremely valuable.[00:09:58] This is not a toy anymore. So we wanted to take our technology to build maybe apps where these apps were not gonna be toys, right? They were not gonna be like a demo where you post it on Twitter and then you know there's hype and then maybe like a month later, no one's using.[00:10:11] swyx: There's a report this morning, um, from Copilot where they, they were estimating the, uh, amount of code generated by Copilot that is then left in code repos and checked in, and it's something like 60 to 70%[00:10:24] Varun Mohan: That's, that's nuts, but I totally believe it given, given the stats we have too. There's this flip in your head once you start using products like this, where in the beginning there's like, there's like skepticism, like how, how valuable can it be? And suddenly now like user behavior fundamentally changes so that now when I need to write a function, I'm like documenting my code more because I think it's prompting the model better, right?[00:10:43] So there's like this crazy thing where it's a self-fulfilling prophecy where when you get more value from it, more of your code is generated from Copilot.[00:10:50] swyx: Just to walk through the creation process, I actually assumed that you would have grabbed your data from The Pile, which is the EleutherAI, uh, open source, uh, code dataset.[00:11:00] But apparently you scraped your own[00:11:01] Varun Mohan: stuff. Yeah. We ended up basically using a lot of open, I guess, permissively licensed code, uh, in the public internet, mainly because I think also The Pile is, is fairly a small subset. Uh, I think maybe after we started, that also came to be, but for us, we had a model for ourselves even before that, uh, was the point.[00:11:21] Ah, okay. So the timing was just a little bit off. Yeah, exactly. Exactly. But it's awesome work. It's, it seems like there's a good amount of work that's getting done decentrally. Yeah. Which is a little bit surprising to me because I'm like more bullish on everyone needs to get together in a room and make stuff happen.[00:11:35] Like we're all in person in Mountain View. But yeah, no, it's pretty impressive. Yeah. Eleuther in general, like everything they've done, I'm pretty impressed with it. Yeah, and we're[00:11:42] swyx: gonna talk about that. Cause I, I didn't know you were that involved in the community[00:11:45] Varun Mohan: that early on I wasn't involved. It was more of like a, I was watching and maybe commenting from time to time.[00:11:50] So they're a very special community for sure.
Yeah,[00:11:52] swyx: yeah, yeah. That's true. That's true. My impression is a bunch of you are geniuses. You sit down together in a room and you. , get all your data, you train your model, like everything's very smooth sailing. Um, what's wrong with that[00:12:02] Varun Mohan: image? Yeah, so probably a lot of it just in that a lot of our serving infrastructure was already in place, Uhhuh before then.[00:12:09] So like, hey, we were able to knock off one of these boxes that I think a lot of other people maybe struggle with. The open source serving offerings are just, I will say, not great in that. That they aren't customized to transformers and these kind of workloads where I have high latency and I wanna like batch requests, and I wanna batch requests while keeping latency low.[00:12:29] Mm-hmm. , right? One of the weird things about generation models is they're like auto regressive, at least for the time being. They're auto aggressive. So the latency for a generation is a function of the amount of tokens that you actually end up generating. Like that's like the math. And you could imagine while you're generating the tokens though, unless you batch a.[00:12:46] It's gonna end up being the case that you're not gonna get great flop utilization on the hardware. So there's like a bunch of trade offs here where if you end up using something completely off the shelf, like one of these serving thing, uh, serving frameworks, you're gonna end up leaving a lot of performance on the table.[00:13:00] But for us, we were already kind of prepared. To sort of do that because of our infrastructure that we had already built up. And probably the other thing to sort of note is early on we were able to leverage open source models, sort of bootstrap it internally within our company, but then to ship, we finally had some requirements like, Hey, we want this model to have fill in the middle capabilities and a bunch of other things.[00:13:20] And we were able to ship a model ourselves. So we were able to time it so that over the course of multiple months, different pieces were like working out properly for us. So it wasn't. . You know, we started out and we were just planning the launch materials. The moment we started there was like maybe some stuff that was already there, some stuff that we had already figured out how to train models at scale internally.[00:13:38] So we were able to just leverage that muscle very quickly. I think the one[00:13:41] swyx: thing that you had figured out from the beginning was that it was gonna be free forever. Yeah. Yeah, co-pilot costs $10[00:13:47] Varun Mohan: a month. Co-pilot costs $10 a month. I would argue significantly more value than $10 a month. The important thing for us though, was we are gonna continue to build more great products on top of code completion.[00:13:58] We think code completion is maybe day one of what the future looks like. And for that, clearly we can't be a product that's like we're $10 a month and we're adding more products. We want a user base that loves using us. And we'll continue to stay with us as we continue to layer on more products. And I'm sure we're gonna get more users from the other products that we have, but we needed some sort of a differentiator.[00:14:17] And along the way we realized, hey, we're pretty efficient at running these workloads. We could probably do this. Oh, so it wasn't,[00:14:23] swyx: it was a plan to be free from the start. You just[00:14:25] Varun Mohan: realized we, yeah. 
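A toy model of the batching-versus-latency trade-off described above: each generated token costs one decoding step (latency grows with the number of tokens), while batching several requests into the same step raises throughput far more than it raises per-request latency, because decoding is dominated by reading the weights from memory. The cost numbers below are illustrative assumptions, not Codeium's measurements.

```python
def step_time_ms(batch_size: int, base_ms: float = 10.0, per_req_ms: float = 0.5) -> float:
    # Assumed cost model: a mostly fixed per-step cost (weights are read from
    # GPU memory either way) plus a small per-request increment.
    return base_ms + per_req_ms * batch_size

def request_latency_ms(tokens: int, batch_size: int) -> float:
    # Autoregressive generation: latency scales with the tokens generated.
    return tokens * step_time_ms(batch_size)

def throughput_tokens_per_s(batch_size: int) -> float:
    return batch_size * 1000.0 / step_time_ms(batch_size)

for b in (1, 8, 32):
    print(f"batch={b:>2}  latency(50 tok)={request_latency_ms(50, b):6.0f} ms"
          f"  throughput={throughput_tokens_per_s(b):7.0f} tok/s")
# batch=1 : ~525 ms, ~95 tok/s; batch=32: ~1300 ms, ~1230 tok/s
```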
We realized we could probably, if we cut and optimized heavily, we could probably do this properly. Part of the reasoning here was we were confident we could probably build a pro tier and go to the enter.[00:14:35] But for now, originally when we, when we started, we weren't like, we're just gonna go and give every, all pieces of software away for free. That wasn't like sort of the goal there. And[00:14:43] swyx: since you mentioned, uh, adoption and, you know, traction and all that, uh, what can you disclose about user growth? Yeah, user adoption.[00:14:50] Varun Mohan: Yeah. So right now we have. We probably have over 10,000 users and thousands of daily actives, and people come back day over day. Our growth is like around, you know, four to 5% day over day right now. So all of our growth right now is sort of like word of mouth, and that's fundamentally because like the product is actually one of those products where.[00:15:08] Even use COT and use us, it's, it's hard to tell the difference actually. And a lot of our users have actually churned off of cot isn't Yeah. I,[00:15:14] swyx: I swept Yeah. Yeah. To support you guys, but also also to try[00:15:17] Varun Mohan: it out. Yeah, exactly. So the, the crazy thing is it wasn't like, Hey, we're gonna figure out a marketing motion of like, Going to the people that have never heard of co-pilot and we're gonna like get a bunch of users.[00:15:27] We wanted to just get users so that in our own right we're like a really great product. Uh, and sort of we've spent a lot of engineering time and obviously we co-wrote a blog post with you, Sean, on this in terms of like, there's a lot of engineering work, even beyond the latency, making sure that you can get your cost down to make a product like this actually work.[00:15:44] swyx: Yeah. That's a long tail of, of stuff that you referenced,[00:15:47] Varun Mohan: right? Yes. Yeah, exactly.[00:15:48] swyx: And you, you said something to the order of, um, and this maybe gets into co-pilot for X uh, which is something that everybody is keen about cuz they, they see the success of co-pilot. They're like, okay, well first of all, developer tools, there's more to do here.[00:16:00] And second of all, let's say the co-pilot idea and apply for other disciplines. I don't know if you wanna Yeah.[00:16:06] Varun Mohan: There's[00:16:06] Alessio Fanelli: gonna some. Key points that, that you touched on. Um, how to estimate, inference a scale, you know, and the latency versus quality trade-offs. Building on first party. So this is free forever because you run your own models, right?[00:16:19] That's right. If you were building on open ai, you wouldn't be able to offer it for free real-time. You know, when I first use coding, It was literally the same speed as Copi is a little bit[00:16:29] swyx: faster. I don't know how to quantify it,[00:16:31] Varun Mohan: but we are faster. But it's one of those things that we're not gonna like market as that's the reason because it's not in and of itself a right for you to like, I'm just gonna be open with you.[00:16:39] It's not a reason for you to like suddenly turn off a copilot where if our answers were trash, uh, but we were faster. You know what I mean? But your focus[00:16:46] Alessio Fanelli: was there. We used the alpha, I think prem on our discord came to us and say, you guys should try this out. So it was really fast. 
Even then, prompt optimization is another big thing, and model outputs and UX kind of how you bring them together.[00:17:00] Which ones of these things are maybe like the one or two that new founders should really think about first?[00:17:07] Varun Mohan: Yeah, I think, I think my feeling on this is unless you are ex, you probably should always bootstrap on top of an existing a. Because like even if you were to, the only reason why we didn't is because we knew that this product was actually buildable.[00:17:22] Probably if we worked hard enough to train a model, we would actually be able to build a great product already. But if you're actually going out and trying to build something from scratch, unless you genuinely believe, I need to fine tune on top of, you know, terabytes of data terabyte is a very large amount of data, but like tens of gigabytes of data.[00:17:37] Probably go out and build on top of an API and spend most of your time to make it so that you can hit that quality latency trade off properly. And if I were to go out and think about like the three categories of like an LM product, it's probably like latency, quality, and correct ability. The reality is, you know, if I were to take a product like co-pilot or Coum, the latency is very low.[00:17:58] The quality I think, is good enough for the task, but the correct ability is, is very easy. Credibility. What, what is correct ability? Correct ability means, let's say the quality is not there. Like you consider the the case where, The answer is wrong. How easy is it for your user to actually go and leverage parts of the generation?[00:18:16] Maybe a, a concrete example. There's a lot of things people are excited about right now where I write a comment and it generates a PR for me, and that's like, that's like really awesome in theory. I think that's like a really cool thing and I'm sure at some point we will be able to get there. That will probably require an entirely new model for what it's worth that's trained on diffs and commits and all these other things that looks at like improvements and code and stuff.[00:18:37] It's probably not gonna be just trained on generic code. But the problem with those, those sort of, I would say, applications are that, let's suppose something does change many files, makes large amounts of changes. First of all, it's guaranteed not gonna be. Because even the idea of like reviewing the change takes a long time.[00:18:54] So if the quality and the correct ability is just not there, let's say you had 10 file, a 10 file change and you modified like, you know, file two and four, and those two modifications were consistent, but the other eight files were not consistent. Then suddenly the correct ability is like really hard.[00:19:10] It's hard to correct the output of the model. And so the user interface is 100% really important. But maybe until you get the latency down or the correct ability, like correct ability, like a lot better, it's probably not gonna be shippable. And I think that's what you gotta spend your time focusing on.[00:19:26] Can you deliver a product that is actually something users want to use? And I think this is why I was talking about like demo. It's like very easy to hand to handpick something that like works, that works for a demo, exceedingly hard for something that has large scope, like a PR to work consistently. 
It will take a lot of engineering effort to make it work on small enough chunks so that a user is like, wow, this is value generative to me.[00:19:49] Because eroding user trust or consumer trust is very easy. Like that is, it is is much, much, it's very easy to erode user trust versus enterprise. So just be mindful of that, and I think that's probably like the mantra that most of these companies need to operate under. Have you done any[00:20:05] Alessio Fanelli: analysis on. What the ratio between code generated and latency is.[00:20:11] So you can generate one line, but you could also generate the whole block. You can generate Yeah. A whole class and Yeah. You know, the more you generate the, the more time it takes. Like what's the sweet spot that, that you[00:20:21] Varun Mohan: found? Yeah, so I think there was a great study and, and I'm not sure if it's possible to link it, but there was a great study about co-pilot actually that came out.[00:20:28] Basically what they said was there were two ways that developers usually develop with a code assistant technology. They're either in what's called like acceleration mode or exploration mode. And exploration mode is basically you're in the case where you don't even know what the solution space for the function is.[00:20:43] and you just wanna generate a lot of code because you don't even know what that looks like. Like it might use some API that you've never heard of. And what you're actually doing at that point is like you're writing a clean comment, just wishing and praying that you know, the generation is long enough and gets you, gets you far enough, right?[00:20:57] acceleration mode is basically you are doing things where you are very confident in what you're doing and effectively. Code gives you that muscle so that you can basically stay in flow state and you're not thinking about like exactly what the APIs look like, but push comes to shove. You will figure out what the APIs look like, but actually like mentally, it takes off like a load in your head where you're like, oh wow.[00:21:18] Like I can just do this. The intent to execution is just a lot, a lot lower there. And I think effectively you want a tool that captures that a little bit. And we have heuristics in terms of captur. Whether or not you're in acceleration versus exploration mode. And a good heuristic is, let's say you're inside like a basic block of a piece of code.[00:21:37] Let's say you're inside a a block of code or an IF statement, you're probably already in acceleration mode and you would feel really bad if I started generating the ELs clause. Because what happens if that else causes really wrong? That's gonna cause like mental load for you because you are the way programmers think.[00:21:51] They only want to complete the if statement first, if that makes sense. So there are things where we are mindful of like how many lines we generate if you use the product, like multi-line generations happen and we are happy to do them, but we don't want to do them when we think it's gonna increase load on developers, if that makes sense.[00:22:07] That[00:22:07] Alessio Fanelli: makes sense. So co-pilot for x. , what are access that you think are interesting for people to build[00:22:13] Varun Mohan: in? Didn't we see some, some tweet recently about Harvey ai, uh, company that, that is trying to sell legal? It's like a legal, legal assistance. That's, that's pretty impressive, honestly. 
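A toy version of the acceleration-versus-exploration heuristic described above (an illustration, not Codeium's actual logic): if the cursor is inside an open block such as an `if` body, prefer a short completion rather than guessing the rest of the structure; after a comment or a fresh function signature, allow a longer multi-line generation.

```python
def completion_budget(prefix: str) -> int:
    """Rough max number of lines to generate for a given editor prefix."""
    open_blocks = prefix.count("{") - prefix.count("}")
    last_line = prefix.rstrip("\n").splitlines()[-1] if prefix.strip() else ""
    in_block = open_blocks > 0 or last_line.startswith((" ", "\t"))
    exploring = last_line.lstrip().startswith(("#", "//")) or last_line.rstrip().endswith(":")
    if exploring and not in_block:
        return 12   # exploration mode: let the model propose a whole body
    if in_block:
        return 1    # acceleration mode: finish the current statement only
    return 4
```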
That's very impressive.[00:22:23] So it seems like I would really love to see what the product looks like there, because there's a lot of text there. You know, looking at bing, bing, ai, like, I mean, it's, it's pretty cool. But it seems like groundedness is something a lot of these products struggle with, and I assume legal, if there's one thing you want them to.[00:22:39] To get right. It's like the groundedness. Yeah.[00:22:42] swyx: Yeah. I've made the analogy before that law and legal language is basically just another form of programming language. You have to be that precise. Yes. Definitions must be made, and you can scroll to find the definition. It's the same thing. Yes. ,[00:22:55] Varun Mohan: yes. Yeah. But like, I guess there's a question of like comprehensiveness.[00:22:59] So like, let's say, let's say the only way it generates a suggestion is it provides like, you know, citations to other legal. You don't want it to be the case that it misses things, so you somehow need the comprehensiveness, but also at the same time, you also don't want it to make conclusions that are not from the site, the things at sites.[00:23:15] So, I don't know, like that's, that's very impressive. It's clear that they've demonstrated some amount of value because they've been able to close a fairly sizable enterprise contract. It was like a firm with 3,500 lawyers, something nuts, honestly. Very cool. So it's clear this is gonna happen, uh, and I think people are gonna need to be clever about how they actually make it work.[00:23:34] Within the constraints of whatever workload they're operating in. Also, you, you guys[00:23:37] swyx: are so good at trading stuff, why don't you, you try[00:23:39] Varun Mohan: cloning it. Yeah. So I think, I think that's, that's, uh, preview the roadmap. Yeah, yeah, yeah, yeah. No, no, no, but I'm just kidding. I think one of the things that we genuinely believe as a startup is most startups can't really even do one thing properly.[00:23:52] Mm-hmm. Focus. Yeah. Yeah. Usually doing one thing is really hard. Most companies that go public have like maybe a couple big products. They don't really have like 10, so we're under no illusions. Give the best product experience, the amount of engineering and attention to detail, to build one good product as hard.[00:24:08] So it's probably gonna be a while before we even consider leaving code. Like that's gonna be a big step because the amount of learning we need to do is gonna be high. We need to get users right. We've learned so much from our users already, so, yeah, I don't think we'd go into law anytime soon.[00:24:22] swyx: 3,500 lawyers with Ellen and Ry, uh, is, is is apparently the, the new[00:24:27] Varun Mohan: That's actually really big.[00:24:28] Yeah. Yeah. I can congrat.[00:24:29] swyx: Yeah, it's funny cuz like, it seems like these guys are moving faster than co-pilot. You know, co-pilot just launched, just announced enterprise, uh, like co-pilot for teams or co-pilot for Enterprise. Yeah. After like two years of testing.[00:24:40] Varun Mohan: Yeah, it does seem like the co-pilot team has built a very, very good product.[00:24:44] Um, so I don't wanna like say anything, but I think it is the case to startups will be able to move faster. I feel like that is true, but hey, like GitHub has great distribution. Whatever product they do have, they will be able to sell it really. Shall[00:24:56] swyx: we go into model numbers and infra estimates? our favorite[00:25:01] Varun Mohan: topics.[00:25:02] Nice small models. 
Nice.[00:25:04] swyx: So this is, um, relevant to basically I'm researching a lot of scaling law stuff. You have a lot of thoughts. You, you host paper discussions[00:25:12] Varun Mohan: in your team. Yeah, we, we try to like read papers that we think are really interesting and relevant to us. Recently that's been, there's just a fire hose of papers.[00:25:21] You know, someone even just curating what papers we should read internally as a company. Yeah, I think, I think there's, there's so much good content[00:25:28] swyx: out there. You should, you guys should have a podcast. I mean, I told you this before. Should have a podcast. Just, just put a mic near where, where you guys are[00:25:33] Varun Mohan: talking.[00:25:34] We gotta, we gotta keep developing Codeium though. No, but you're doing this discussion[00:25:38] swyx: anyway. You[00:25:38] Varun Mohan: might as well just, oh, put the discussion on a podcast. I feel like some of the, some of the thoughts are raw, right? Like, they're not gonna be as, as nuanced. Like we'll just say something completely stupid during our discussions.[00:25:48] I don't know, maybe that's exciting. Maybe that's, it's kinda like a justin.tv, but for ML papers. Okay, cool. I'd watch that.[00:25:55] swyx: Okay, so Copilot is 12 billion parameters. Salesforce CodeGen is up to 16. GPT-3 is 175. GPT-4 is gonna be 100 trillion billion. Yeah. So what, what we landed on with you is with, uh, with Chinchilla, is that we now have an idea of what compute optimal data scaling is.[00:26:14] Yeah. Which is about 20 times parameters. Is that intuitive to you? Like what, what did that[00:26:18] Varun Mohan: unlock? I think basically what this shows is that bigger models are like more data efficient, like given the same number of tokens, a big model like trained on the same number of tokens. A bigger model is like, is gonna learn more basically.[00:26:32] But also at the same time, the way you have to look at it is there are more flops to train a bigger model on the same number of tokens. So like let's say I had a 10 billion parameter model and I trained it on 1 million tokens, but then I had a 20 billion parameter model, at the end of it, it will be a better model.[00:26:47] It will have better perplexity numbers, which means like the probability of like a prediction is gonna be better for like the next token is gonna be better. But at the end of it, you did burn twice the amount of compute on it. Right? So Chinchilla is an interesting observation, which says if you have a fixed compute budget, and you want the best model that came out of it, because there's like a difference here where a model that is, that is smaller, trained on the same number of tokens, has fewer flops.[00:27:12] There's a sweet spot of, like, number of tokens and size of model. I will say like people probably like are talking about it more than they should, and, and I'll, I'll explain why, but it's a useful result, which is like, let's say I have, you know, some compute budget and I want the best model. It tells you what that, what you should generate.[00:27:31] The problem I think here is there is a real trade off of like, you do need to run this model somewhere. You need to run it on a piece of hardware. So then it comes down to how much memory does that piece of hardware have. Let's say for a fixed compute budget, you could train a 70 billion parameter model. What are you gonna put that on?[00:27:47] Yeah, maybe you could, could you put that on an 80 gig A100? It would be a stretch.
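The "20 times parameters" rule of thumb above can be turned into a back-of-the-envelope calculator. Assuming the standard approximation that training compute C ≈ 6·N·D FLOPs, the compute-optimal recipe of roughly D ≈ 20·N tokens gives C ≈ 120·N², which you can solve for N given a FLOP budget (budgets below are illustrative).

```python
import math

def compute_optimal(c_flops: float) -> tuple[float, float]:
    """Chinchilla-style sketch: optimal params N and tokens D for a FLOP budget."""
    n = math.sqrt(c_flops / 120.0)   # from C = 6 * N * (20 * N)
    d = 20.0 * n
    return n, d

for c in (1e21, 1e22, 1e23):  # illustrative FLOP budgets
    n, d = compute_optimal(c)
    print(f"C={c:.0e} FLOPs -> ~{n / 1e9:5.1f}B params on ~{d / 1e9:6.0f}B tokens")
# 1e21 -> ~2.9B params / ~58B tokens; 1e23 -> ~29B params / ~577B tokens
```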
You could do things like quantization, you know, int8, FP8, to reduce the amount of memory that's on the box, and do all these other things. But you have to think about that first, right, when you want to go out and train that model. [00:27:59] The worst case is you end up training that model and you cannot serve it. So actually what you end up finding is that a lot of these code completion models are what you would consider over-trained. By that I mean, let's look at a model like CodeGen. It's actually trained on, I believe, and I could be wrong by a hundred billion here or there... I've got some data here. Okay, let's look at the 3 billion parameter model; I think it's actually a 2.7 billion parameter model. It's a bit unusual because they also trained on natural language on top of code, but it's trained on hundreds of billions of tokens. If you applied that Chinchilla [00:28:36] optimization to it, you'd be like, wow, this is a stupid use of compute, right? Because at 3 billion parameters they should be going to about 60 billion tokens; for anything more than 60, they should have just increased the model size. But the reality is that the compute-optimal one might not be one that's easy to serve, right? It could just have more parameters. And in our case, the models we train internally might not be the most compute-optimal. [00:28:56] In other words, we probably could have had a better model by making it larger, but the trade-off would've been latency. We know what the impact of having higher latency is, and on top of that, being able to fit properly within our hardware constraints would've also been a concern.

[00:29:08] swyx: Isn't the classic stopping point when you see the loss kind of level off? [00:29:12] Right now you're just letting Chinchilla tell you, but you could just look at loss.

[00:29:16] Varun Mohan: The problem is the loss will continue to go down. It'll just continue to go down in a way that's not that pleasing: it's gonna take longer and longer, it's gonna be painful. But it's one of those things where, if you look at the perplexity difference between, [00:29:31] let's say, a model that's 70 billion versus 10 billion, it's not massive. It's not tens of percentage points; it's very small, right? The reality is, this comes down to the IQ of these models in some sense: small wins at the margins are massive wins in terms of IQ. [00:29:47] They're harder to get and they don't look as big, but they're massive wins in terms of reasoning. The models can now do chain of thought, all these other things.

[00:29:55] swyx: And so that's apparently unlocked around the...

[00:29:57] Varun Mohan: 20 billion mark. Yes, that's right. Some kind of magic. I think that was from the UL2 paper, or maybe one of those LaMDA papers.

swyx: Any thoughts on why?

Varun Mohan: I don't know. Emergence of intelligence, I think. I think maybe one of the things is we don't even know whether, five years from now, what we're gonna be running will still be transformers. We don't 100% know that that's true.
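(A rough sketch of the Chinchilla trade-off being discussed, for readers who want numbers. Only the "about 20 tokens per parameter" rule and the serving concern come from the conversation; the C ≈ 6·N·D FLOPs approximation, the fp16 serving assumption, and the example budget are illustrative additions.)

```python
# Rough sketch of the compute-optimal vs. "over-trained but servable" trade-off.
# Assumptions (illustrative, not from the speakers): training compute C ~= 6*N*D
# FLOPs, the Chinchilla-style rule of thumb D ~= 20*N tokens, and fp16 weights
# (~2 bytes/param) at serving time.

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Params and tokens that roughly exhaust a budget under C = 6*N*(20*N)."""
    n_params = (compute_flops / 120.0) ** 0.5
    return n_params, 20.0 * n_params

def serving_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory just to hold the weights at inference (ignores KV cache, activations)."""
    return n_params * bytes_per_param / 1e9

if __name__ == "__main__":
    budget = 1e24  # FLOPs, an arbitrary example budget

    n, d = chinchilla_optimal(budget)
    print(f"compute-optimal: {n/1e9:.0f}B params, {d/1e9:.0f}B tokens, "
          f"~{serving_memory_gb(n):.0f} GB of weights to serve")

    # The "over-trained" alternative: fix a smaller, easy-to-serve model size
    # and spend the same compute on more tokens instead.
    n_small = 10e9
    d_small = budget / (6.0 * n_small)
    print(f"over-trained:    {n_small/1e9:.0f}B params, {d_small/1e9:.0f}B tokens, "
          f"~{serving_memory_gb(n_small):.0f} GB of weights to serve")
```

Under these toy numbers the compute-optimal model needs a couple hundred gigabytes of weights, which is exactly the "what are you gonna put that on?" problem; the smaller model fits on one 80 GB card at the cost of sitting off the Chinchilla frontier.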
I mean, there are maybe a lot of issues with the current version of transformers, with the way the attention layers work: the amount of compute is quadratic in the context size, because you're doing an n-squared operation on the attention blocks, basically. [00:30:30] And obviously, one of the things that everyone wants right now is infinite context. They wanna shove as much of the prompt as possible in there, and the current version of what a transformer looks like is maybe not ideal. You might just end up burning a lot of flops on this when there are probably more efficient ways of doing it. [00:30:45] So I'm sure in the future there are gonna be tweaks to this. But it is interesting that we've found out things like, hey, bigger is pretty much always better. There are probably ways of making smaller models significantly better through better data; that is definitely true. And I think one of the cool things the Stack showed was their ablation studies: hey, what happens if we do decontamination of our data, what happens if we do de-duplication, [00:31:14] what happens if we do near-dedup of our data, and how does the model get better? And they have some compelling results that showcase that data quality really matters here. But ultimately, yeah, I think it is an interesting result that at 20 billion something is happening. I also think some of these things may look materially different in the future than they do right now.

[00:31:31] Alessio Fanelli: Do you think the token limitation is actually a real architectural limitation? Like, once you have 50,000 tokens of context, 50,000 or infinite is basically the same for most use cases. Where do you think that number is, especially as you think about code? Some people have very large code bases. [00:31:53] Have you done any work there to figure out where the sweet spot is?

[00:31:55] Varun Mohan: Yeah, look, I think what's gonna really end up happening is, if people come up with a clever way... there was some research that I believe came out of Stanford, I think from the HELM group, with an architecture that looks a little bit different than transformers, and I'm sure something like that will work in the future. What I think is always gonna happen is, if you find a cheap way to embed context, people are gonna figure out a way to put as much as possible in, because LLMs so far have been virtually stateless. The only thing they have beyond fine-tuning is just shoveling everything you can inside. [00:32:28] And there are some interesting papers, like RETRO; there are maybe some interesting ideas that have come out recently.

swyx: Yeah, let's go through them.

Varun Mohan: So one of the really interesting ideas, I think, is RETRO. It's this paper that came out of DeepMind, and the idea is: let's say you send out a prompt. [00:32:44] You compute the BERT embedding of that. And then you have this massive embedding database. And by massive, I'm not talking about gigabytes, I'm talking about terabytes. You actually have ten times the number of tokens as what was used to train the model.
So let's say you had a model that was trained on a trillion tokens: you'd have a 10 trillion token embedding database. [00:33:04] And obviously Google has this, because they have all the content that's ever existed in humanity, they have the best dataset, so they were able to make one of these embedding databases. But the idea here, which is really cool, is you end up taking your prompt, computing the BERT embedding, and finding the things that are nearby. [00:33:20] So you do roughly a semantic search, an embedding search, within that. And then you take the documents that came from those embeddings and you shove those into the model too, in what's called chunked cross-attention. [00:33:34] Suddenly the model is able to take in external information, which is really exciting, because now you're able to get dynamic context in, and the model in some sense is deciding what that context is. It's not deciding it completely in this case, because the BERT model here was actually frozen; [00:33:50] it wasn't trained along with the RETRO model. But the idea is you're somehow adding or augmenting context, which I think is quite exciting. There are probably two futures. Either context becomes really cheap (right now it's quadratic; maybe there's a future where it becomes linear in the size of the context), or the future might actually be that the model itself dictates: hey, I have this context, [00:34:10] you have this data source, give me this. The model itself goes out into your database and says, I want this information. And this is kind of what Bing search, or Bing chat, is looking like, where there's probably some model that's saying, I want this information, [00:34:27] and that gets augmented into the context. Now the model itself knows what context it has, and it can build a state machine of what it needs. That's probably what the future of this looks like.

[00:34:37] swyx: So you predict monster embedding database companies?

[00:34:39] Varun Mohan: Probably. Monster embedding database companies, yeah. [00:34:43] The model in some sense will need to talk to these embedding databases. I'm actually not convinced that the current breed of embedding database companies are ready for what the future looks like. Just looking at their pricing, how much it costs per gigabyte, it's prohibitive at the scale we're talking about; let's say you actually did want to host a 10 terabyte embedding database. [00:35:03] A lot of them were created, say, two or three years ago, when people assumed embedding databases are small and they needed to make the cost economics work. But yeah, there's probably gonna be a big workload there. I will just say, for us, we will probably just build this in-house to start with, and that's because I think the technology isn't there yet. [00:35:20] Waiting on point solutions to show up is a lot harder than probably building it ourselves. The way I like to think about this is that the LLM space probably looks
like how the early internet days were, where I think the value accrued to somebody like Google, and Google needed to figure out all the crazy things to make their workload work. [00:35:41] And the reason why they weren't able to outsource it is that no one else was feeling the pain.

[00:35:46] swyx: They're just solving their own pain points. They're so far ahead of everyone else, and just wait [00:35:50] for people to catch up.

Varun Mohan: Yes. And that's maybe different from how things like Snowflake look, where the interface was decided by what SQL looked like 50 years ago. [00:35:58] And because of that, you can go out and build the best database, and everyone's gonna be like, this doesn't make my beer taste better, and just buy your database, basically.

[00:36:08] swyx: That's a great reference, by the way. We have some friends of the pod that are working on embedding databases, so we'll try to connect you to Chroma [00:36:14] and see.

Varun Mohan: Yeah. Oh, I actually know Anton. I worked with him at Nuro.

swyx: Oh, there you go. Well, what do you think about, I mean, [00:36:20] Chroma's pivoting towards an embedding [00:36:22] database.

Varun Mohan: I think it's an interesting idea. I wonder what the early set of workloads that [00:36:27] they will hit are, and what the scaling requirements are. This is maybe the classic thing where the teams are great, but you need to pick the workload you care about the most. When you're an infrastructure company, you could build anything. If I were selling serving infra, I could build serving for, like, linear regression. [00:36:44] I could build that, but unless you hit the right niche for the end user, it's gonna be hard. So I'm excited to see what comes out, and if they're great, then we'll use it.

[00:36:54] swyx: I also like how you slowly equated yourself to Google there.

Varun Mohan: Oh, we're not Google.

swyx: You're gonna be the Google of AI.

[00:37:00] Varun Mohan: We're definitely not Google. I was just saying, in terms of the style of companies that came out. Or maybe we should live on the cutting edge, in [00:37:08] the future.

swyx: Yeah, I think that's the pitch.

[00:37:10] Varun Mohan: Okay, thanks for b***h us.

[00:37:13] Alessio Fanelli: So you just mentioned the older vector embedding solutions are kind of not made for the LLM generation of compute size. [00:37:21] What does LLM ops look like? Which pieces need to be drastically different? Which ones can we recycle?

[00:37:27] Varun Mohan: Yeah. One of the things we've found in our own work building for code just shows how much is missing, and this is the thing where I don't know how much of it you can really outsource: we needed to build eval infrastructure. [00:37:40] That means: how do you evaluate a great code model? There are things online like HumanEval, right, which is the benchmark; I was telling Sean about this. The idea of HumanEval is really neat for code. [00:37:56] The idea is you provide a bunch of functions with docstrings, and the eval, instead of being "did you predict the next token," is "did you generate the entire function, and does the function run correctly against a bunch of unit tests?" Right.
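(A minimal sketch of the HumanEval-style check being described: generate a whole function from a docstring prompt and run it against unit tests, rather than scoring next-token accuracy. The `generate_completion` stub and the toy problem are hypothetical stand-ins, not the actual benchmark harness.)

```python
# Minimal sketch of a HumanEval-style functional-correctness eval.
# `generate_completion` is a hypothetical stand-in for the model under test;
# a real harness would also sandbox execution and time out runaway code.

PROMPT = '''
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
'''

TESTS = [((1, 2), 3), ((-5, 5), 0), ((10, 7), 17)]

def generate_completion(prompt: str) -> str:
    # Stand-in for a call to the code model being evaluated.
    return "    return a + b\n"

def passes_unit_tests(prompt: str, completion: str) -> bool:
    """Execute prompt + completion and check the function against the tests."""
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)  # toy example only; never exec untrusted code like this
        fn = namespace["add"]
        return all(fn(*args) == expected for args, expected in TESTS)
    except Exception:
        return False

if __name__ == "__main__":
    completion = generate_completion(PROMPT)
    print("pass" if passes_unit_tests(PROMPT, completion) else "fail")
```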
And we've built more sophisticated evals that work on many languages and on a wider variety of code bases. One of the issues that ends up coming up with things like HumanEval is contamination. [00:38:12] A lot of these models end up training on all of GitHub, and GitHub itself has HumanEval on it, so they end up training on that. And sure, the numbers are tiny, it's a tiny fraction, but it doesn't matter that it's tiny, because the model will just remember it. It's not that it's that precise, but it will remember; it's basically like mixing your training and validation sets. [00:38:32] And we've seen cases online where someone says, we have a code model, we did this one thing and HumanEval jumped a ton, and we're just like, huh, did HumanEval get into your dataset? Is that really what happened there? [00:38:46] But we've needed to build all this eval. And what it's shown is that data cleaning is massive, but data cleaning looks different by domain: what makes a high-quality piece of code is probably different from what makes a high-quality legal document. And then on top of that, how do you eval this? [00:39:01] How do you also train it at scale, at whatever cost you really want to hit? Those are things that the end user is either gonna need to solve, or someone else is gonna need to solve for them. And I guess one of the things I'm a little bearish on is, if another company comes out and solves eval properly for a bunch of different verticals, what was the company they were selling to really [00:39:21] doing at that point, if they themselves couldn't eval their own workload and all these other things? I think there are cases, let's say for code, where we probably couldn't outsource our eval; we wouldn't be able to ship models internally if we didn't know how to eval. But it's clear there are a lot of different pieces people need to take on. [00:39:38] Like, hey, maybe there's an embedding piece: how large does this embedding database actually need to be? But hey, this does look very different than what classic MLOps probably did.

[00:39:47] Alessio Fanelli: How do you compare some of these models? When you're thinking about model upgrades and making changes, what does the testing piece of it look like internally?

[00:39:56] Varun Mohan: For us, it's old-school A/B testing. We've built infrastructure to be able to ramp users up from 1 to 10 to [00:40:09] 50 percent and slowly roll things out. This is all classic software.

swyx: Which you do in-house. You don't buy any [00:40:10] services?

Varun Mohan: We don't buy services for that. There are good services, open-source services, that help; we just don't need them. I think that's just not the most complicated thing for us. Obviously we use things like Google Analytics and all this other stuff, but for ramping our models and finding out if they're actually better, the eval doesn't tell the whole story, because for us, even before generating the prompt, we do a lot of work. [00:40:36] And the only way to know it's really good across all the languages is for our users to tell us that it's actually good. And they tell us by accepting completions.
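(A small sketch of the kind of percentage ramp described above: hash the user ID so each user lands deterministically in the same arm as the rollout widens from 1% to 10% to 50%, then compare completion-acceptance rates between arms. The bucketing scheme and names are illustrative, not Codeium's actual infrastructure.)

```python
# Deterministic percentage ramp for a new model, in the spirit of the
# 1% -> 10% -> 50% rollout described above. Illustrative only.
import hashlib

def in_rollout(user_id: str, experiment: str, percent: float) -> bool:
    """Assign a user to the new-model arm; the same user stays in the same
    arm as `percent` grows, so the ramp only ever adds users."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000          # stable bucket in 0..9999
    return bucket < percent * 100              # percent=10 -> buckets 0..999

def acceptance_rate(accepted: int, shown: int) -> float:
    """The online metric the arms get compared on: accepted / shown completions."""
    return accepted / shown if shown else 0.0

if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    for pct in (1, 10, 50):
        exposed = sum(in_rollout(u, "new-model-v2", pct) for u in users)
        print(f"{pct}% ramp -> {exposed} of {len(users)} users on the new model")
```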
[00:40:44] swyx: So the GitHub Copilot extension does this thing where they'll set a timer, and then within like five minutes, ten minutes, twenty minutes, they'll check in to see if the code is still there. [00:40:54] I thought it was a pretty creative way.

Varun Mohan: It's honestly a very creative way. We do do things to see, in the long term, whether people accepted and kept things that are roughly what we generated, because they could accept and then change their minds. So we are mindful of things like that. [00:41:09] But for the most part, the most important metric is: at the time, did we actually generate value? We want to know if that's true. And it's honestly really hard to get signal unless you have a non-trivial amount of usage, non-trivial meaning you're doing hundreds of thousands of completions, if not millions of completions. [00:41:25] That might sound like a very small amount, but it's classic. When I used to be an intern at Quora, now more than seven, eight years ago, I shipped a change, and Quora had millions of daily actives, and it looked like it was good, and then a week later it was just way worse. [00:41:43] How is this possible? In a given hour we get hundreds of thousands of interactions. No, you just need way more data. So this is one of those things where I think having users is genuinely very valuable to us. Users is all you need.

[00:41:59] swyx: By the way, since you brought up Quora, have you tried Poe? Any thoughts [00:42:03] on Poe?

Varun Mohan: I have not actually tried Poe.

[00:42:05] swyx: I mean, it seems like a question-answering website that's been around for 20 years or something would be very good at question answering.

[00:42:12] Varun Mohan: Also Adam, the CEO, is incredibly brilliant. That guy is insanely smart, so I'm sure they're gonna do well.

[00:42:18] swyx: They have accidentally built the perfect data collection company for QA.

[00:42:22] Varun Mohan: Yeah. It takes a certain kind of person to go and cannibalize your original company. I mean, it was kinda stagnant for a few years.

[00:42:31] swyx: Yeah, that's probably true. The observation is, I feel like you have a bias towards domain-specific models, whereas most research is skewed towards general-purpose models. [00:42:40] I don't know if there's a deeper insight here that you wanna go into or not, but it's like: train on all the things, get all the data. And you're like, no, no, no, everyone needs a customized, per-task [00:42:49] dataset.

Varun Mohan: Yeah. I'm not gonna say that general intelligence is not good.
You want a base model that's still really good, and that's probably trained on normal text, a lot of different content. [00:43:00] But I think one thing from old-school machine learning, even though I'm the kind of person that says a lot of old-school machine learning is just gonna die, is that training on a high-quality dataset for your workload is always gonna yield better and more predictable results. [00:43:15] And we are under no illusions that that's not the case, basically.

swyx: And then the other observation is bandwidth and connectivity, which is not something people usually think about, but apparently is a big deal. Apparently training is synchronous and needs high GPU coordination. [00:43:29] These were deleted notes from Sam Altman talking about how they think about training, and I was like, oh yeah, that's an insight. And [00:43:34] you guys have the same thing.

Varun Mohan: Yeah. So for training, you're right in that it is actually nuts to think about how insane the networks are for NVIDIA's most recent hardware. [00:43:46] For the H100 boxes, you shove eight of these H100s in a box. Between two nodes, the bandwidth is 3,200 gigabits a second, so 400 gigabytes a second between machines. That's nuts when you just sit and think about it. That's like double the memory bandwidth a CPU has, but it's between two machines. [00:44:04] On top of that, within the machine they've created this fabric called NVLink that lets you communicate at ultra-low latency, even lower than PCIe, if you're familiar; that's the communication protocol between the CPU and other PCIe devices. [00:44:21] All of this is to make sure that reductions are fast and low latency and you don't need to think about it. And that's because deep learning training has evolved to be synchronous. In the OG days there was a lot of analysis of how good asynchronous training is, which is like: hey, I have a node, it has the current state of the model, [00:44:39] it's gonna update that locally, and every once in a while it'll go to another machine and update the weights. But I think everyone has converged on synchronous. I'm not exactly sure; there's not a lot of good research on asynchronous training right now, or maybe there is and I haven't read it. [00:44:52] It's just that there isn't as much research, because people are just like, oh, synchronous works, and the hardware is continually upleveled to handle [00:44:59] that.

swyx: Yeah. It was just unintuitive to me, because the whole purpose of GPUs is to train a lot of things in parallel.

[00:45:05] Varun Mohan: But the crazy thing is, maybe I can give some dumb math here, [00:45:09] which is: let's go with GPT-3, which is like 175 billion parameters. The optimizer state while you're training is 14 times the size of the model, so in this case, if it's 175 billion parameters, I'm not great at mental math here, but that's probably around 2.5 terabytes just to store the optimizer state. [00:45:30] That has gotta be sharded across a lot of machines; that is not a single GPU. Even if you take an H100 with 80 gigs, to just shard that much, that's like 40, or at least 30, machines.
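(The "dumb math" spelled out. The 14 bytes of training state per parameter is the figure quoted in the conversation; the exact multiplier depends on the optimizer and precision scheme.)

```python
# Back-of-the-envelope training-memory math for a GPT-3-scale model.
params = 175e9                 # parameters
bytes_per_param_state = 14     # optimizer/gradient/weight state, as quoted above
gpu_memory_bytes = 80e9        # one 80 GB H100 (or A100)

total_state = params * bytes_per_param_state
print(f"training state: {total_state / 1e12:.2f} TB")                        # ~2.45 TB
print(f"80 GB GPUs just to hold it: ~{total_state / gpu_memory_bytes:.0f}")  # ~31
```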
So there's something there where these things need to communicate with each other too.

[00:45:44] swyx: You need to vertically scale horizontally.

[00:45:46] Varun Mohan: Yeah. You gotta co-locate them. The ideal programming paradigm is that you feel like you have one massive computer with no communication overhead at all, but with infinite compute and infinite memory bandwidth.

[00:45:59] swyx: That's the AI cluster. Okay, well, we want to head to the questions.

[00:46:05] Alessio Fanelli: So, favorite AI product that you are not [00:46:08] building?

Varun Mohan: Yeah, I'm friends with some of the folks at Midjourney, and I really think the Midjourney product is super cool, especially seeing how the team is iterating and how the quality of generations consistently gets upleveled. I think it's quite neat, and internally at Exafunction we've been trying out Midjourney for random content, to generate images and stuff.

[00:46:26] swyx: Does it bother you that they have a style? It seems like they're hedging themselves into a particular niche: you want Midjourney art, you go there.

[00:46:33] Varun Mohan: Yeah, it's a brand of art. You're right, I think they do have a style, but it seems more predictably good for that style. Okay, so maybe that's it too: just get good at a domain-specific thing. [00:46:41] Yeah, maybe. Maybe I'm just talking my book right now.

[00:46:46] swyx: Okay, next question. Favorite AI people and [00:46:48] communities?

Varun Mohan: Yeah, I think I mentioned this before, but obviously the OpenAI folks are insane. We only have respect for them. But beyond that, I think EleutherAI is a pretty special group. [00:46:59] It's been now probably more than a year and a half since they released GPT-J, back when it was comparable to GPT-3 Curie. It wasn't a model that wasn't good; it was comparable in terms of perplexity to GPT-3 Curie, and it was trained by a university student, actually. It just showed that, in the end, pedigree is great, but if you have people that are motivated, know how computers work, and are willing to just get their hands dirty, you can do crazy things. That was a crazy project that gave me more hope [00:47:34] for decentralized training being potentially pretty massive. I think that was a very cool thing, where a bunch of people just got on Discord, were chatting, and were able to turn this out.

[00:47:42] swyx: I did not know this until I looked further into Eleuther, but it was not a formal organization, not a company, not a startup. It's just a bunch of guys on Discord.

[00:47:48] Varun Mohan: They got a TPU research grant and they somehow just wrote some code.

[00:47:52] Alessio Fanelli: Yeah. I listened to a podcast with Connor, who's the person behind it, and basically OpenAI at the time was like, we cannot release GPT because it's too good and could be so bad. [00:48:01] And he actually said he was sick, so he couldn't leave home for a few weeks, so it was like, what else am I gonna do? And ended up