In this episode of the Earley AI Podcast, host Seth Earley sits down with Yang Li, a leading figure in AI and software innovation. Yang is the Chief Operating Officer of Cosine, an advanced AI development firm, with deep experience driving startups, scaling organizations, and pioneering advancements in engineering and software development. Yang's work focuses on leveraging AI to empower the next generation of developers, especially in navigating the increasingly complex landscape of modern and legacy codebases.

Yang and Seth dive into how AI is reshaping the role of software engineers, the evolving challenges of handling massive backlogs and legacy systems, and what creativity and efficiency really look like in an age of AI-powered software development.

Key Takeaways:
- AI's Impact on Software Engineering: AI is shifting the developer's role from hands-on coding to more creative, iterative, and strategic work.
- Tackling Legacy Code: Cosine is pioneering new ways for AI to handle outdated and complex codebases (like COBOL and Fortran) that most engineers, and AI models, struggle with.
- Augmenting, Not Replacing, Engineers: AI tools like Cosine's Genie reduce ramp-up time for engineers, help address daunting backlogs, and act as creative partners rather than outright replacements.
- The Challenge of Benchmarks: Yang explains why public coding benchmarks can be misleading when bringing products to real-world enterprise environments, especially with diverse codebases.
- The Emergence of 'Vibe Coding': Idea-to-prototype time is shrinking, allowing non-technical team members to quickly bring their ideas to life using AI assistants.
- Risks & Limits: Over-reliance on AI, standardization versus differentiation, and the need for new evaluation criteria in engineering organizations.
- Future Skills: The importance of risk-taking, adaptability, and prompt engineering as software development evolves, plus insights into how organizations are rethinking career ladders and promotions in an AI-powered world.

Insightful Quote from Yang Li: "Previously you had to use words and language to describe your idea, you can now show people your idea... The time between you having thought of an idea to actually be able to show people that idea has now reduced almost to zero because of vibe coding."

Tune in to discover what's next for software engineering in the age of AI, and how to stay ahead in this rapidly changing landscape.

Thanks to our sponsors: VKTR, Earley Information Science, and the AI-Powered Enterprise book.
More info: https://docs.anthropic.com/en/docs/claude-code/overview

The AI coding wars have now split across four battlegrounds:
1. AI IDEs: two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation), with a sea of competition behind them (like Cline, GitHub Copilot, etc.).
2. Vibe coding platforms: Bolt.new, Lovable, v0, etc., all experiencing fast growth and getting to tens of millions in revenue within months.
3. The teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results).
4. The CLI-based agents: after Aider's initial success, we are now seeing many other alternatives, including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable and 2) they are pay-as-you-go based on tokens used.

Since we have already covered the first three categories, today's guests are Boris and Cat, the lead engineer and PM for Claude Code.

If you only take one thing away from this episode, it's this piece from Boris: Claude Code is not a product so much as it's a Unix utility. This fits very well with Anthropic's product principle: "do the simple thing first." Whether it's the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning ("/think") and memory (#tags in markdown) fit the same idea of having text I/O as the core interface, very much in the spirit of the original UNIX design philosophy.

Claude Code is also the most direct way to consume Sonnet for coding, rather than going through all the hidden prompting and optimization that the other products do. You will feel that right away, as the average spend per user is $6/day on Claude Code, compared to $20/mo for Cursor, for example. Apparently, there are some engineers inside of Anthropic who have spent >$1,000 in one day!

If you're building AI developer tools, there's also a lot of alpha here on how to design a CLI tool, interactive vs non-interactive modes, and how to balance feature creation. Enjoy!

Timestamps
[00:00:00] Intro
[00:01:59] Origins of Claude Code
[00:04:32] Anthropic's Product Philosophy
[00:07:38] What should go into Claude Code?
[00:09:26] Claude.md and Memory Simplification
[00:10:07] Claude Code vs Aider
[00:11:23] Parallel Workflows and Unix Utility Philosophy
[00:12:51] Cost considerations and pricing model
[00:14:51] Key Features Shipped Since Launch
[00:16:28] Claude Code writes 80% of Claude Code
[00:18:01] Custom Slash Commands and MCP Integration
[00:21:08] Terminal UX and Technical Stack
[00:27:11] Code Review and Semantic Linting
[00:28:33] Non-Interactive Mode and Automation
[00:36:09] Engineering Productivity Metrics
[00:37:47] Balancing Feature Creation and Maintenance
[00:41:59] Memory and the Future of Context
[00:50:10] Sandboxing, Branching, and Agent Planning
[01:01:43] Future roadmap
[01:11:00] Why Anthropic Excels at Developer Tools
Al and Codey talk about Mini Mini Farm

Timings
00:00:00: Theme Tune
00:00:30: Intro
00:01:33: What Have We Been Up To
00:11:26: I Know What You Released Last Month
00:13:57: Game News
00:28:37: New Games
00:41:43: Other News
00:56:24: Mini Mini Farm
01:33:42: Outro

Links
Piczle Cross: Rune Factory Release Date
Space Sprouts Release Date
Luma Island: Pirates
Ranch of Rivershine “1.7” Update
Horticular: Frozen Frontier
Melobot: A Last Song OST
Sky Harvest
Phoenix Labs Layoffs
ConcernedApe NPR Interview
Reuters Cozy Gaming Interactive Article

Contact
Al on Mastodon: https://mastodon.scot/@TheScotBot
Email Us: https://harvestseason.club/contact/

Transcript
(0:00:30) Al: Hello farmers, and welcome to another episode of the harvest season. My name is Al, (0:00:36) Codey: And my name is Cody! (0:00:37) Al: and we are here today to talk about cottagecore games. (0:00:42) Codey: Oh woo! (0:00:44) Al: It’s like a pack of wolves. (0:00:50) Codey: I’m never gonna un-hear that now. (0:00:51) Al: We this episode, we are going to talk about Mini Mini Farm because apparently we’re doing two (0:00:59) Al: mobile games in a row because you did Animal Crossing last week. And then we’re doing Mini (0:01:00) Codey: Mm-hmm. (0:01:04) Codey: Yeah. (0:01:05) Codey: Yeah. (0:01:06) Al: Mini Farm this week. And yeah, I just realized that today. I was like, Oh, yeah, (0:01:08) Codey: Yes, I forgot about that. (0:01:13) Al: two in a row. Interesting. Oh, well, we’re making Cody work for their title of (0:01:19) Al: mobile correspondent. (0:01:20) Codey: Yep, I’m here for it. For sure. And I am still actively playing mobile games. (0:01:21) Al: Before that, well, yes, so before that we have news, I’m going to overview the (0:01:31) Al: January releases because it’s now February. But first of all, Cody, what have you been up to? (0:01:36) Codey: Um, so, uh, definitely been playing Mini Mini Farm. Um, because of the last episode. (0:01:42) Al: Oh yeah, that’s what MMF stands for. (0:01:44) Codey: Yeah. (0:01:44) Al: I was like, what’s MMF? (0:01:45) Al: Mini Mini Fun, of course. (0:01:46) Codey: And Mini Mini Farm. (0:01:48) Codey: Um, because of Johnny, I am now cursed to be playing Animal Crossing Pocket Camp. (0:01:54) Al: A game which you hadn’t played for the podcast, you know, playing (0:01:59) Al: because of the podcast. (0:02:00) Codey: Correct. (0:02:01) Al: Oops. (0:02:02) Codey: Um, I had played it when it was like not the complete. (0:02:06) Codey: Like paid version. (0:02:06) Codey: Um, but because we were talking about it and I saw that it was like cheap and then it was possibly going to become less cheap. (0:02:14) Codey: And I’ve been doing really well with budgeting lately. (0:02:16) Codey: I was like, you know what? (0:02:18) Codey: I can, I can afford 10 bucks. (0:02:18) Codey: So, and I don’t, I don’t know. (0:02:20) Al: Yes. Do we have the actual date? I know that it’s very soon, or it’s, like, just in the (0:02:26) Al: past, but it wasn’t when the podcast episode came out. (0:02:30) Codey: And unfortunately I have now bought it. (0:02:32) Codey: So I have no way of checking because I’m pretty sure. (0:02:35) Al: Ah, it was the 31st of January. So, if you bought it when the last episode came out, (0:02:36) Codey: Okay. (0:02:38) Codey: So now it is what? (0:02:38) Codey: 20 bucks. (0:02:42) Al: or the two days after that, you were good. Otherwise, sorry, too late. And now it’s, yeah, $20. (0:02:48) Codey: So, I’ve been playing that.
I’ve also been playing, still been playing Honeygrove, still (0:02:58) Codey: really sucked into Honeygrove. And I, you know, it’s so funny because we, whenever we would (0:03:01) Al: You’re just playing all the mobile games. (0:03:06) Codey: cover them before, it’s like, yay, I can uninstall it now. And the last couple ones, I’ve been (0:03:12) Codey: like, oh, no, I want to keep playing this. So, yeah, I do. (0:03:14) Al: Mm-hmm, oops. (0:03:19) Codey: But it’s nice because I’m, you know, nearing the end, the other thing, quote unquote, I’ve been (0:03:24) Codey: doing is is a PhD. And it is crunch time for sure now. So I pretty much like, I’m doing a lot of (0:03:34) Codey: stuff all the time. If I’m not doing specimens, I’m writing if I’m not doing that, either of those (0:03:40) Codey: two things I’m like, I’m always doing something. So this, this gives me a nice little like, okay, (0:03:46) Codey: I’m gonna sit down for like a half an hour and just like (0:03:48) Al: Mm, are you rotating through them or? (0:03:48) Codey: brain off play these silly little games. Yeah, so every (0:03:56) Codey: well, I guess I’m also playing too many games. I’m also playing (0:04:02) Codey: Pokemon TCG Pocket and the new thing just released. And so I (0:04:08) Codey: always check that first. Let me look at my guess. I always (0:04:11) Codey: check that first. And then I do Honeygrove because I can like (0:04:14) Codey: send everything off, like my little bees off on their (0:04:17) Codey: or expeditions. (0:04:18) Codey: And then I do Pocket Camp and then I do Mini Mini Farm for a little bit. (0:04:23) Codey: And then if for whatever reason, I am bored after that, um, or not (0:04:28) Codey: sucked into Mini Mini Farm, uh, I have my cross-stitch coloring out. (0:04:30) Al: Hmm (0:04:33) Codey: But yeah, that’s, that’s my, my, my brain off time now. (0:04:33) Al: Fair enough (0:04:40) Codey: Was it? (0:04:40) Al: Nice, I have been playing a lot of Pokemon. (0:04:45) Al: So I think last time we talked, Cody, (0:04:47) Al: I was just nearly finished, (0:04:47) Codey: Mm-hmm. (0:04:50) Al: Brilliant Diamond and Shining Pearl. (0:04:51) Codey: Mm-hmm. (0:04:53) Al: I have now finished that, thank goodness. (0:04:54) Codey: Okay. (0:04:55) Codey: Yeah, you’re free. (0:04:57) Al: So that one’s done. (0:04:59) Al: And I was going to kind of maybe stop there, (0:05:02) Al: but then I was like, no, I need to do Let’s Go as well. (0:05:05) Al: So I did the Let’s Go Pokedex, (0:05:08) Al: and I rushed through. (0:05:10) Al: Uh, another save because I hadn’t recreated my Pokemon (0:05:15) Al: Let’s Go Pikachu save. (0:05:18) Al: So I did that. (0:05:18) Al: So that now is done. (0:05:19) Al: So all of my Home dexes are done except Sword and Shield. (0:05:25) Al: So I’ve got the Let’s Go one. (0:05:27) Al: I’ve got Brilliant Diamond, Shining Pearl. (0:05:29) Al: I’ve got Let’s Go. (0:05:30) Al: Arceus and I’ve got all the Scarlet and Violet ones. (0:05:32) Al: They’re all done. (0:05:33) Al: I haven’t done the Sword and Shield ones. (0:05:35) Al: Um, and I now have a post game (0:05:40) Al: save of every Pokemon Switch game, except Shield. (0:05:45) Al: So I’ve recreated all my saves, except that. (0:05:46) Codey: Okay. Wow. (0:05:49) Al: And I have done a Professor Oak challenge now of every pair of games, except can you (0:05:55) Al: guess? (0:05:56) Al: No, no, it’s Sword and Shield. (0:05:56) Codey: Brilliant Diamond and Shining Pearl? Oh, I don’t know. Okay. Okay.
(0:06:03) Al: So, so at some point, I would like to do a Professor Oak challenge in Shield. (0:06:10) Al: Uh, and that does all three of those things. (0:06:12) Al: It does a Professor Oak challenge in, in that series of games. (0:06:16) Al: It basically completes my Home Pokedex, right? (0:06:19) Al: Cause you’re catching everything anyway. (0:06:22) Al: Um, and it, it then me, it will mean I have a Shield save in post game as well. (0:06:27) Al: But I don’t think I’m going to do that now because I’m worried I might burn out on Pokemon. (0:06:31) Codey: Mm-hmm. (0:06:35) Al: And we don’t know yet when the new games coming out. (0:06:38) Al: I don’t expect it to come out until (0:06:40) Al: November but we don’t know for certain and I don’t want them to be like come out and like me (0:06:46) Al: spend the next month doing this and then they come out and then I burn out and I’m like okay (0:06:50) Al: I’m done with Pokemon for like six months and then they come out and say oh Legends Z-A is actually (0:06:57) Al: coming out in April and I’m like oh no that is really soon so uh but I’m also worried that they (0:07:05) Al: might do the release of Pokemon for the Home dexes very soon and yeah sure I don’t (0:07:10) Al: need to do it as soon as it’s done of course I don’t need to but I will feel the drive to (0:07:15) Al: do it at that point so I’m like do I actually just do the Home dexes just now and then leave (0:07:21) Al: the Professor Oak challenge for another time but then why why not just do the Professor Oak challenge (0:07:28) Al: but then I’m also the reason I was playing those games in January was because there weren’t any (0:07:32) Al: games coming out that I was planning on playing for the podcast and now we have a billion of (0:07:35) Codey: Yeah, but it would be really inefficient to not just do it, do them together. (0:07:38) Al: them coming out in February and March. (0:07:40) Al: I’m probably going to just wait and do it all at the same time, probably next January. (0:07:54) Codey: yep. yep. (0:07:57) Al: Because January does tend to be quite a quiet period, but I guess that depends on when Z-A (0:08:04) Al: comes out. (0:08:05) Al: Because if Z-A comes out in November, I’m probably not going to want to do a Professor Oak challenge (0:08:08) Al: of Shield (0:08:10) Al: in January, so maybe, maybe I’ll just wait till Pokemon Day and they’ll all they will almost definitely tell us the release date then right like there’s no way they’re not going to do that. (0:08:22) Al: That reminds me, we want to do Pokemon Day predictions (0:08:26) Codey: Oh, okay, okay. (0:08:28) Al: and Pokemon Day reactions greenhouse episode, that gets us to this month. (0:08:29) Codey: Mm-hmm. (0:08:33) Codey: Mm-hmm. (0:08:39) Al: so (0:08:40) Al: all that to say I have played a lot of Pokemon in the last month and a half a lot a lot a lot (0:08:48) Al: of Pokemon and I think like maybe like 120 hours over the last month and a half we’ve just Pokemon (0:08:56) Codey: Mm hmm. I’m shocked. (0:08:58) Al: still not fond of Brilliant Diamond and Shining Pearl (0:09:02) Al: still a big big fan of Let’s Go great games love them second best Pokemon game (0:09:11) Codey: Yeah, I really have been wanting to go back and replay. I have Eevee, but I also have (0:09:17) Al: Mm-hmm. Yes, fair, fair, fair. So yeah, I’m probably Pokémon’d out for now, but we’ll see. (0:09:21) Codey: else going on. So, yep. (0:09:30) Al: I’ve also been keeping up with Harvest Moon: Home Sweet Home.
Look at me actually playing a farming (0:09:36) Al: game a little bit a day. What a crazy idea. I know. So I’m now in chapter five, enjoying that. (0:09:37) Codey: Wow not guzzling (0:09:40) Codey: - I’m done. (0:09:44) Al: I don’t know. (0:09:45) Al: I don’t know what to do with that. (0:09:47) Al: Yeah, actually, yeah, no, I will. (0:09:49) Al: I’m enjoying it. (0:09:51) Al: I’m enjoying playing this game. (0:09:52) Al: This is a fun game. (0:09:53) Codey: Mm-hmm. (0:09:54) Al: It still has issues. (0:09:56) Al: Absolutely. (0:09:58) Al: The cloud save still not working for me two months later. (0:10:02) Codey: I’m shocked. (0:10:04) Al: But controller support has made it playable. (0:10:07) Al: And it’s actually fun. (0:10:10) Al: It’s no Stardew Valley. (0:10:11) Al: I don’t care about the characters as much. (0:10:14) Al: But there’s a lot to like about it. (0:10:16) Al: and I’m hopeful for… (0:10:17) Al: and I have also started playing Hello Kitty Island Adventure, which I did play when it came to Apple Arcade, so I played it for about a month and enjoyed it. (0:10:33) Al: But then I was like, I am so fed up with playing on the touchscreen and let me tell you, playing with the controller infinitely better. So good! (0:10:37) Codey: Yeah. Okay. (0:10:41) Al: So I’ve been playing on my Steam Deck, I know that I know some people who are playing on… (0:10:47) Al: which may or may not come up in a future episode, but yeah, no, it’s a good game. It is way better than it had any right to be. (0:10:50) Codey: - Ooh. (0:11:02) Al: I completely forgot until I started the game how the game starts and I’m like, I just love… it’s so ridiculous that it basically starts with a plane crash. (0:11:11) Al: Because who would expect that in a Hello Kitty game? Alright, I think that’s everything. That’s what we’ve been… (0:11:17) Al: We are going to continue the new segment, the month’s releases and the previous month’s releases. We’re going to talk about last month’s releases. (0:11:18) Codey: Oh, woo. Oww, ow, ow, ow. (0:11:31) Codey: Last month’s what what was released last month? I got you (0:11:36) Al: Listeners, write in and tell us what this segment should be called, this monthly segment. What released last month? (0:11:44) Al: January 2025 edition. Or should it be fae- (0:11:47) Al: Maybe the 2025 edition, because last month would be January, but it’s like what would (0:11:51) have- what released last month? (0:11:53) Codey: What are you what do you call it so like in? (0:11:57) Codey: In Stardew Valley or whatever, when you go to sleep, and there’s like that recap screen (0:12:02) Al: This summary… (0:12:03) Codey: That’s all you call it. That’s all it’s called (0:12:07) Al: He gla-la… (0:12:07) Codey: Like the daily summary listeners. Let us know (0:12:10) Al: I don’t know… (0:12:11) Codey: What do you call that screen like when it tells you what you did for the day how much all your stuff sold? (0:12:16) Codey: I think that whatever that is called is what this segment should be called because it’s like we just fell asleep on January (0:12:23) Codey: And we’re waking up and it’s February, but let’s like think about the things that occurred last last month (0:12:30) Al: I’m really struggling to Google this. (0:12:31) Codey: Don’t work yeah, don’t worry about looking it up. They got a listeners have to tell us (0:12:35) Al: Okay, so January, what released in January? (0:12:41) Al: We have four releases in January 2025.
(0:12:44) Al: We got Harvest Hills, releasing mid-January the 15th. (0:12:48) Al: We got Into the Emberlands, Not Wonderful, released on the 20th of January, then we had (0:12:54) Al: Hello Kitty Island Adventure on Steam and Switch that released on the 30th of January, (0:12:58) Al: My Little Life which is our first (0:13:01) Al: game of Rusty’s-like or as the developer of Rusty’s Retirement is calling them bottom of the screen game (0:13:06) Codey: boss game which he he like talked it up on the um on Bluesky he like was like y’all (0:13:06) Al: and that released on the 31st of January. So a kind of (0:13:17) Codey: should get this game and now i’m looking at it oh it’s only five dollars and 39 cents (0:13:24) Al: Yeah, it is dangerous. (0:13:25) Codey: oh no it’s only it’s only windows oh I almost clicked it y’all I almost we good okay i’m (0:13:34) Codey: I wish. (0:13:36) Codey: Let it be not just Windows, My Little Life developer. (0:13:39) Codey: I want to play this game. (0:13:40) Codey: Thank you. (0:13:42) Al: Yeah, so that’s the January releases, wild that I’m about to say this, but that’s a (0:13:47) Al: quiet month. Four games is a quiet month, apparently. (0:13:51) Codey: Yeah, not a lot going on. (0:13:57) Al: Okay, so we’ve now got a bunch of news. We’re going to start with the gaming news. So first (0:14:04) Al: up we have Piczle Cross: Rune Factory, they have announced a release date for this. So this is (0:14:09) Al: like the Piczle Cross: Story of Seasons. (0:14:12) Al: It’s a Picross type game, but not Picross because Nintendo own the trademark to that kind of. (0:14:21) Al: Yeah, you do your nonograms. I think that’s what the generic term people have been using, (0:14:27) Al: nonograms. You do your nonograms and in the Story of Seasons one, it like built up a farm (0:14:34) Al: in the background as you do it. Yeah, I don’t know. I don’t know if they’ve shown (0:14:36) Codey: is that is that what it is because it also says customize like in this trailer they say customize (0:14:40) Al: and I’ll see you next time. (0:14:42) Al: Yeah. (0:14:43) Codey: your farm and I don’t and it like looks like you choose what like where things are placed and so (0:14:50) Codey: that was one that was my only question was like I mean it’s coming some when it comes out folks (0:14:55) Codey: can tell me unless people have been playing I don’t know if there’s a dummy (0:14:57) Al: I don’t, I, yeah, oh, interesting. (0:15:01) Al: So it does look like you can change things in this one. (0:15:03) Al: So I’m pretty sure in the Story of Seasons one, (0:15:06) Al: you just saw the farm build up and things grow (0:15:09) Al: and you didn’t have any control over how it looked, (0:15:12) Al: but you’re right, it does. (0:15:13) Al: So it says customize your farm (0:15:14) Al: and it shows different animals or monsters. (0:15:17) Al: And then it shows you actually selecting (0:15:19) Al: what weapon you want your character to have, (0:15:21) Al: including a massive lollipop as an option. (0:15:25) Al: So yeah, it looks like it’s more in depth. (0:15:28) Codey: Well, I’m wondering if it’s just like you can control what it looks like in the background (0:15:34) Codey: while you are, yeah. (0:15:34) Al: I think that, yeah, I think that’s all it is. (0:15:36) Al: I don’t think you’re actually doing any farming (0:15:39) Al: or any battling, that just happens in the background (0:15:41) Al: as you’re doing it, but in the Story of Seasons one, (0:15:44) Al: I’m pretty sure you couldn’t change how it looked.
(0:15:46) Codey: is it like learn how to do a carrot by doing a carrot learn how to plant (0:15:52) Codey: carrots I like okay (0:15:52) Al: It wasn’t even that much in the Story of Seasons one, it was literally, you do stuff and things grow in the background you weren’t really. (0:16:00) Al: Yeah, there was nothing else. (0:16:02) Codey: Sounds good. Some people are probably jumping for joy that (0:16:06) Codey: there’s a new Picross game coming out. A new nonogram coming (0:16:10) Al: nonogram yeah yes yeah anyway Space Sprouts have announced that they’re (0:16:13) Codey: it’s like Kleenex. It’s like people say get a Kleenex but (0:16:17) Codey: that’s a brand. (0:16:21) Al: releasing on the 31st of March (0:16:24) Codey: Mm-hmm. They’re also participating in Steam’s Next Fest, so… (0:16:28) Al: Cody who isn’t (0:16:30) Codey: Oh, okay, fine. They’re also moving on to the next part. Uh, I guess the only thing… (0:16:33) Al: I mean I’d like I just I find the Steam Next Fest stuff so funny because it’s (0:16:40) Al: it doesn’t really mean anything it’s like it’s it’s like it’s like being part of (0:16:45) Al: a sale right you can still do a sale whenever you want you can put your price (0:16:48) Al: down whatever but it’s like if you do it at a specific time you might get on a (0:16:50) Codey: Right. But that’s the thing, like there’s a specific list that they’ll get clicked on. (0:16:52) Al: list (0:16:55) Al: but the list is too long (0:16:58) Codey: Okay, but like not every it is still selective, right? Like not hashtag not everyone gets on the (0:17:02) Al: no I don’t I don’t think so I think anyone even get in the list (0:17:03) Codey: list. Well, anyways, that’s from February 27 to March 3. But they are also looking for playtesters. (0:17:12) Codey: So if you go to the show notes, go to the Steam page, etc, etc, you can figure out how to become (0:17:18) Codey: a playtester for Space Sprouts. (0:17:20) Al: Whoo! Yeah, what was this game again? I can’t… Oh yeah, it was like the 2D space. It was like, (0:17:28) Al: yeah, I’m not describing that very well. But yeah, it was a 2D world where you’re floating around, (0:17:30) Codey: Mm-hmm (0:17:33) Codey: 2D floating in space. Yeah (0:17:36) Al: yeah. Yeah, I’m very interested in this one. I wonder how it’s going to feel playing (0:17:42) Al: farming in 0G. Although it does look like some of it has gravity, and some of it doesn’t, (0:17:44) Codey: Mm-hmm. I mean it’s interesting. (0:17:50) Codey: Well, if I learned anything from The Martian, farming in 0G contains the recycling of human waste. (0:17:58) Codey: So, very excited for that. (0:18:00) Al: for sure next we have a free update coming to Luma Island and this is called Pirates (0:18:12) Codey: With an exclamation point, your boy, pirates! (0:18:15) Al: just pirates um it’s literally called Luma Island Pirates what uh although I don’t think (0:18:22) Al: the exclamation mark is actually part of the title because later down they say what’s coming (0:18:26) Al: in Luma Island Pirates without the exclamation mark. (0:18:29) Codey: Uh, I choose to believe, yes. (0:18:30) Al: Don’t you always? (0:18:32) Al: So this brings a pirate themed zone with new minigames, a new temple, traps and enemies, a new profession. (0:18:44) Al: Johnny and Dallin I think, they both play it. They’ll be excited about a new profession. (0:18:51) Al: A full screen map, that’s definitely something it needed. I was annoyed about not having the full screen map.
(0:18:56) Al: three new game modes, (0:18:58) Al: including (0:19:00) Al: hero mode and cozy mode. (0:19:02) Al: I wonder what the third mode is. (0:19:04) Al: I love how they’d say three modes (0:19:04) Codey: I’m curious what the new profession is. (0:19:06) Al: and they mentioned two of them. (0:19:10) Codey: Is it piracy? (0:19:12) Al: Oh, interesting. Yeah, that’s a good point. (0:19:14) Al: It could be. (0:19:14) Codey: Like goats? (0:19:16) Al: Even if not actually, (0:19:18) Al: piracy definitely could be related to that. (0:19:20) Codey: Or like treasure hunting? (0:19:20) Al: Yeah, that is a good point. (0:19:22) Al: Yeah, well they do have a treasure hunter one already, I’m pretty sure. (0:19:26) Codey: Okay, so it is piracy. (0:19:28) Codey: Destroy this city and loot the people. (0:19:30) Al: Maybe. You never know. You never know. Also, new outfits, quests, NPCs, Lumas, powers, (0:19:34) Codey: Mutiny, mutiny your own. (0:19:38) Codey: I’m very curious. (0:19:44) Al: bonuses, and achievements. Yes, yeah, that is a free update. That is not a DLC. That (0:19:46) Codey: Ooh, that’s a lot of content in a free update. (0:19:52) Codey: Yeah. (0:19:53) Al: is a free update. Coming soon. No date yet. Coming soon. (0:19:58) Al: Speaking of updates, (0:20:00) Al: Ranch of Rivershine have announced their 1.7 update. (0:20:05) Al: This is called Azure Coast Trail. (0:20:08) Codey: what? No, say it, say it how you say it again. Oh, that’s so cool. We just say Azure. I like (0:20:08) Al: Azure. (0:20:09) Al: Azure. (0:20:10) Al: How would you say it? (0:20:13) Al: Azure. (0:20:14) Al: Oh, no, Azure. (0:20:18) Codey: your way of saying it. Continue. (0:20:20) Codey: you. (0:20:21) Codey: You’re welcome. (0:20:21) Codey: You’re welcome. (0:20:21) Al: Thank you. (0:20:25) Al: This brings new competitions, horses, music, accessories, and loading screens. (0:20:30) Codey: Oh, whoo, the loading screens look really good, like the the art. (0:20:35) Codey: I mean, I’m betting that they have like a ton of humans that love this game (0:20:39) Codey: and just are like, take here, take my art. (0:20:41) Codey: The loading screens look really cool. (0:20:43) Codey: And the new horse is like a cool new wild horse species. (0:20:48) Codey: I almost look I. (0:20:48) Al: Rabicano, Rabicano, I think Rabicano. (0:20:55) Codey: Let me it’s probably something like Rabicano or Robicano or something. (0:21:00) Codey: Um, yeah, I didn’t look up to see if those are actually like a thing. (0:21:06) Codey: I almost did and then I didn’t. (0:21:07) Codey: Oh, yep. (0:21:08) Codey: They’re a type of Arabian horse. (0:21:10) Al: rare horse coat color pattern that features white. (0:21:15) Codey: Oh, so it’s just a whatever. (0:21:17) Codey: It’s they’re really cute. (0:21:21) Codey: Yeah, and they also announced that the next update is going to introduce (0:21:27) Codey: a new character and that is the veterinarian. (0:21:30) Codey: Also introduce, you know, care for your horse. (0:21:34) Codey: So different, you know, that care like they might maybe they get sick. (0:21:39) Codey: Maybe they have certain nutritional needs and you’d need to make sure (0:21:43) Codey: you meet them. (0:21:44) Codey: I’m not entirely this is all just me. (0:21:46) Codey: Just I don’t know. (0:21:47) Codey: Just like trying to like think of what it could be, but that’s cool. (0:21:52) Codey: I’m all for it. (0:21:53) Codey: Can I be the veterinarian? (0:21:58) Codey: Aww.
(0:22:01) Codey: We need a game where you’re like the veterinarian. (0:22:03) Codey: We don’t have that. (0:22:04) Al: Go make it. (0:22:05) Codey: No, I’m good. (0:22:06) Codey: Someone should make it though. (0:22:08) Codey: Or like a wildlife biologist. (0:22:08) Al: Let us know. (0:22:11) Codey: I don’t know. (0:22:12) Al: Is that not just Research Story? (0:22:14) Codey: Go play research. (0:22:16) Al: I mean, tell me if I’m wrong, you’re the one that’s played it. (0:22:17) Codey: No, I’m trying to. (0:22:18) Codey: Yeah, no, I’m trying to think of like, no, (0:22:20) Codey: like a game where you’re a rehabber. (0:22:22) Codey: Where you rehabilitate wild animals that people bring to you. (0:22:25) Codey: I think the only issue with that is that it’s sad because they die. (0:22:28) Codey: die, but hey. (0:22:30) Codey: There was a Bluey episode about a bird dying, so it’s okay these days. (0:22:36) Al: Bluey can do anything. (0:22:38) Codey: Bluey did it. That means it’s child approved. (0:22:42) Al: Let me tell you, right, me and Craig watch Bluey together, (0:22:46) Al: and he’ll be sitting and laughing at the jokes and watching it and stuff, (0:22:48) Al: and then I’ll just be sitting behind him, just sobbing. (0:22:50) Codey: Sobbing. Yeah. Yeah, I just just finished it. And it I am (0:22:52) Al: Like, “Oh, no, what is happening? What’s this doing to me?” (0:23:00) Codey: upset. And I need more. I watched all of it. Thanks. I’ve, (0:23:04) Al: Nice. Well done. (0:23:07) Codey: I just I crave distraction in the background while I run (0:23:11) Codey: meaningless analysis. They’re not meaningless analysis. They’re (0:23:14) Codey: just tedious analysis correction. But yeah, cool that (0:23:21) Codey: give me a game mode where I can play as the veterinarian and I (0:23:24) Codey: will play this game. Developers if you’re like, man, what do (0:23:26) Al: I mean, I feel like that’s just a whole different game, not just a different game mode, but… (0:23:31) Codey: people want these days? I bet a vet mode like a vet game would (0:23:37) Codey: crush. Yep. And I would pay probably $30 for it. So if it (0:23:39) Al: There’s at least one person who would buy it, that’s for sure. (0:23:45) Codey: takes more than $30 to make. I’m out. (0:23:48) Al: I’m not even promising there’d be two people because I’m not sure who the second person (0:23:50) Codey: Listeners. Let me know. Can you contribute $30 we can offer $60 (0:23:53) Al: would do it for the podcast. (0:23:57) Al: I’m sure there are I mean, look, if you could make a game for $60 you’d be rolling in it. (0:24:02) Codey: to developers. There’d be a lot of really bad games. Yeah. (0:24:14) Al: For sure for you. (0:24:15) Al: Well, yeah, you probably can make a game for $60. (0:24:18) Al: Absolutely dreadful. (0:24:21) Al: Just a Skinner box. (0:24:22) Al: All right. (0:24:24) Al: Next we have Horticular have announced a free update and a paid DLC. (0:24:33) Al: They’re both releasing on the same day, 28th of February, and the paid DLC Frozen Frontier (0:24:39) Al: has a new story, world quests, new items and creatures. (0:24:45) Al: written snowshoe hare. Is that a creature? (0:24:47) Codey: Yeah, yeah, they specifically say snowshoe hare well, that is just one that they blurbed (0:24:48) Al: Is that a creature that’s there? Just one creature. (0:24:55) Al: blurb. No, I know what you mean. That’s a great example of verification.
(0:24:55) Codey: It could be (0:24:59) Codey: Where did that word come from it is keep going I’m gonna look up what that where that came from (0:25:05) Codey: - Um. (0:25:07) Al: And the free update includes new creatures, some temp mechanics. What do you mean by that? (0:25:14) Codey: - Temperature, sorry. (0:25:15) Al: Oh, temperature was like temporary mechanics. Yes, temporary temperature mechanics, snow, (0:25:16) Codey: Now, (0:25:22) Al: And then obviously, quality of life improvements. (0:25:26) Codey: Yeah, so they both yeah, they both kind of include like, adding snow as a as a thing that you can see in the game. But one just adds like a whole new world. Also, I wanted to note that they say on in the beginning of this show notes, whatever, what is this called, like a, thank you. (0:25:48) Al: release notes or well it’s not really release notes because it’s not released (0:25:51) Codey: It’s a (0:25:51) Al: teaser (0:25:54) Codey: Teaser, they say… (0:25:56) Codey: “As spring arrives in the northern hemisphere, we’re not quite done with the cold weather. We got you southern hemisphere folks.” (0:26:02) Codey: Correction. We are also not ready for spring. (0:26:08) Codey: The United States weather predicting rodent has proclaimed that there are six more weeks of winter. (0:26:16) Codey: So, yeah, we’re not ready. (0:26:18) Al: Do I need to tap the sign? Seasons aren’t universal, Cody. (0:26:20) Codey: What’s the sign? (0:26:26) Codey: They specifically say “As spring arrives in the northern hemisphere.” (0:26:28) Al: Seasons aren’t universal in the northern hemisphere, Cody. (0:26:32) Codey: There’s six more weeks of winter. I don’t know what to tell you. (0:26:35) Al: Look, okay, so not every country has the same definitions of seasons. Not every country even (0:26:41) Al: has four seasons, and certainly not every country is going to listen to America when they say that (0:26:46) Al: that a rodent has decided it’s- (0:26:49) Codey: Okay, there’s like certain things that they should listen to us on and the majority of (0:26:54) Codey: things that other countries should just ignore Americans on, especially these days. (0:27:00) Codey: But one thing y’all should really listen to is our, our groundhog, Punxsutawney Phil, (0:27:07) Codey: who is an immortal groundhog that has bespake unto the cultists or whomstever and told them (0:27:18) Codey: in Groundhogese! (0:27:20) Codey: There will be six more weeks of winter and a bunch of people just went. (0:27:22) Al: Yeah, I have watched Groundhog Day. I do know the idea behind it. (0:27:27) Al: Finally, in the game news bit, we have Melobot: A Last Song, have released their (0:27:34) Al: original soundtrack on Steam. It is $12.99 on its own, or it’s also included in the Deluxe (0:27:42) Al: Edition, which is more expensive. It’s actually a really good deal if you get the (0:27:46) Codey: Oh, whoo. Yeah. (0:27:49) Al: deluxe edition, though, right? Because it’s like… (0:27:52) Al: 20 quid for the game. I’m back into pounds here because you confused me with your whole (0:27:57) Al: dollars. 20 quid for the game or is it 25 quid for the 25 dollars for the game? (0:27:58) Codey: - Yeah, sorry, I put dollars, I put US dollars. (0:28:00) Codey: $13. (0:28:04) Codey: $25 for the deluxe edition and then $13 if, (0:28:08) Codey: for just the soundtrack. (0:28:10) Al: Yeah. Well, how much is the base game is that is that $20 then? (0:28:13) Codey: - Man, I didn’t look at that. (0:28:14) Codey: Let me look.
(0:28:16) Al: See, it is 20 quid or 10 and 10 quid for 20 or 27 quid. (0:28:22) Codey: Oh I had it wrong! The game is 25. The bundle that includes the digital deluxe upgrade is 35. (0:28:22) Al: Ah. (0:28:32) Al: OK, so it’s still a good deal, but it’s not as good a deal. (0:28:35) Codey: You save 8%. (0:28:37) Al: All righty, so that is the game news. (0:28:41) Al: We also have two new games announced. (0:28:43) Al: Well, kind of. (0:28:44) Al: One of them is a new game. (0:28:44) Codey: - In quotes. (0:28:46) Al: One of them is actually two that are not– (0:28:50) Al: should we talk about the one that’s actually new first? (0:28:52) Al: So that is Sky Harvest. (0:28:52) Codey: - Yes. (0:28:57) Al: The blurb for this one is, “Armed with hand-me-down tools (0:29:01) Al: and some cash. (0:29:03) Al: You begin your new life as the chief farmer, a position your (0:29:07) Al: grandfather once excelled in. Can you honor his legacy and (0:29:10) Al: transform the overgrown, untamed and desolate floating island (0:29:14) Al: into a flourishing farm abundant with produce? (0:29:19) Codey: produce is weird in that trailer was was grandfather sleeping (0:29:26) Al: I didn’t actually watch the trailer give me two seconds. No, he did (0:29:33) Al: Again, yeah, no he did (0:29:36) Codey: so as a child you come upon your grand your beloved grandfather deceased at the (0:29:44) Al: Dead, at the kitchen table, reading his hopes and dreams. (0:29:47) Codey: kitchen table with (0:29:49) Codey: a book in front of him. (0:29:53) Codey: The book says, if I wish I could have gone back one last time and it’s got like a ticket and then it shows you taking that ticket and going and honoring his legacy. (0:30:04) Al: This is how this is how I know that he’s dead because if he’s not dead that is horrific. You’ve just stolen his ticket (0:30:05) Codey: But like, (0:30:12) Al: The one thing he wanted to do you’ve stolen his ticket and gone without him (0:30:16) Al: Let waited a while because you’ve grown a beard now. You’re an adult now (0:30:16) Codey: also, (0:30:20) Codey: Yeah, there’s a whole beard, a mustache, wild. (0:30:21) Al: Goodness me. That’s dreadful (0:30:23) Al: You (0:30:24) Codey: But like, if I came upon my grandmother deceased, (0:30:29) Codey: I say this because my grandfather is already deceased. (0:30:34) Codey: If I came upon my grandmother deceased, and (0:30:37) Codey: I don’t care what’s in front of her, I’m not looking at that. (0:30:42) Al: Yeah, he just like rests his head on his grandfather’s dead arm and sheds one singular tear before stealing his ticket, his boat ticket. (0:30:45) Codey: And then… (laughs) (0:30:50) Codey: I don’t know how beloved that grandfather was, if that’s your reaction, my guy. (0:30:55) Codey: Anyway, this is me just… (laughs) (0:30:56) Al: One, he’s one tear’s worth a little bit. (0:31:00) Codey: Alright, this looks cool though. So you’re on floating islands, you’re flying around with a jetpack, you can manage a restaurant. (0:31:07) Codey: It just says manage a restaurant, but it just shows you telling the person what the one meal that you guys are making in the day is. (0:31:20) Codey: Not what restaurants do. (0:31:23) Codey: And it’s a really bad restaurant. (0:31:24) Al: It’s what really bad restaurants. (0:31:28) Codey: And then it also says make friends, and then there’s a dog with a scroll in its mouth, so I’m guessing you befriend a dog. (0:31:36) Codey: And unfortunately that wasn’t in the trailer, it was in this thing. 
(0:31:40) Al: I mean the trailer didn’t show you much, lesbian. (0:31:43) Codey: Right, the trailer was very teaser-y, but underneath that, on the post, they have… (0:31:50) Codey: I watched that video where there’s a dog 10 times to see… I wanted to see more of the dog. (0:31:59) Codey: What kind of dog is it? All that. (0:32:03) Codey: It’s definitely a tricolor something, but other than that, no. (0:32:08) Al: So I will say I’m not particularly enamored by the graphics in this game. (0:32:17) Al: Not that it looks bad, it’s very definitely trying to look how it looks, I think. (0:32:23) Al: What I find a bit weird is the graphics of the game and the graphics of the heads-up display, (0:32:30) Al: like the menus and stuff, they feel like they’re from different games. (0:32:31) Codey: Mm hmm. Yeah, like they had two different people designing those, (0:32:38) Codey: and one understood the assignment and one didn’t. (0:32:38) Al: Yeah. Yeah, so it’s a little bit weird. Very, very. I do like the flying. The flying looks fun. (0:32:45) Codey: The character also looks lanky. This is a tall character. (0:32:53) Codey: Yeah. Mm hmm. Cosine. I don’t. (0:32:57) Al: Yeah, I don’t know what else to say. That looks interesting. I love how it calls it sky farming (0:33:01) Codey: It’s farming, but in the sky. Or are we? Are you farming the sky? (0:33:04) Al: when it’s just farming. (0:33:08) Al: In the sky? Okay. No, no, no, no, it’s just you’re on a sky island. Which I feel like this whole (0:33:12) Codey: Like is there part you’re like collecting the sky? (0:33:16) Codey: You don’t know that. What if they what if you collect the sky? (0:33:21) Al: let’s this game has sky islands was a fun idea five years ago and now half the games are doing (0:33:26) Al: it. Which is I guess the problem with game development, right? It was even before that (0:33:30) Codey: It’s the tears of the kingdom like. (0:33:35) Al: people were doing it. They didn’t. (0:33:38) Al: No, I know. Yeah, yeah. No, I get it. I feel like this could be possibly interesting. (0:33:46) Al: I’m not really sure what it’s… The flying is the thing that is most interesting to me, (0:33:51) Al: but other than that, I’m not really sure what it is that they’re doing that’s unique, (0:33:54) Al: which is always the problem with cottagecore games is why should I play you over Stardew? (0:34:01) Codey: I think that’s correct. I think like that’s the thing about this is it’s just to get your (0:34:06) Codey: attention and we will continue. It’s not like I saw this and I’m like, yep, not going to play (0:34:11) Codey: that because there’s not a lot here. I want to, I want to see more. They’re going to probably (0:34:15) Codey: release more. And so far they just say Q2 2025 in the trailer. (0:34:16) Al: Yep. Yeah, where did you see Q2? I just see 2025. Oh, in the trailer, okay. Because on (0:34:28) Al: Steam just says 2025. Okay, I will update my list then. I didn’t pay attention to the (0:34:31) Codey: Yeah, he didn’t watch the trailer. (0:34:38) Al: trailer, there’s a difference there. All right, we also have the brand new and exciting (0:34:39) Codey: Oh, my bad. I get that though. (0:34:47) Al: Harvest Moon, Skytree Village, and The Lost Valley are for some reason coming to Switch. (0:34:55) Al: The good thing about this is it is a bundle, so it’s like you’re not buying the game separately, (0:35:00) Al: which is good, because my word that would be not worth any sort of money. I’m not sure who (0:35:06) Al: wants these games. 
It’s like they went, “Oh, when we did…” Because they worked with… (0:35:09) Codey: Yeah, so you– (0:35:16) Al: Because the rights are complicated to the old Harvest Moon games, right? So they’ve done some, (0:35:22) Al: they released the original Harvest Moon on, what’s it called, Nintendo Switch Online, (0:35:31) Al: and they had to do that in collaboration with Marvelous, because Marvelous owned the game, (0:35:36) Al: but they owned the name, and so they had to both agree to that. Anyway, whatever, it doesn’t matter. (0:35:41) Al: And I feel like that combined with Marvelous redoing A Wonderful… (0:35:46) Al: Life has made them go, “Oh, people like when we remake Harvest Moon games and knock on which ones (0:35:55) Al: is it that people actually want to play, because I guarantee you it’s not Skytree Village and The (0:35:59) Al: Lost Valley.” (0:36:00) Codey: - Yeah, I will say, okay, so two things. (0:36:04) Codey: First of all, I looked, so one social media user, (0:36:08) Codey: to your question of who’s asking for this, (0:36:10) Codey: one social media user named Chrissy said, (0:36:13) Codey: “Cozy gamers have really been winning lately.” (0:36:16) Al: I wonder whether that person has ever actually played either of these. (0:36:16) Codey: To which another, (0:36:23) Codey: to which another user said, (0:36:25) Codey: “These games are more like a loss.” (0:36:28) Al: The funny thing is they did the whole, “Oh, we’re going to announce an announcement.” (0:36:34) Al: And they were like, “Oh, we’ve got an exciting announcement coming for you.” (0:36:34) Codey: Yeah (0:36:37) Al: And you’re like, “Okay, fine.” (0:36:40) Al: And then they did this and people were like, “Really? (0:36:43) Al: That was your… (0:36:44) Al: Please tell me this wasn’t everything.” (0:36:46) Al: Because it’s just, they’re like, I am not the sort of person who just hates on Harvest (0:36:50) Al: Moon, you know, Natsume, Harvest Moon games for the sake of it. (0:36:54) Al: You know, I am literally playing Harvest Moon: Home Sweet Home, as we’re recording (0:36:58) Al: the podcast, right? And I’ve talked about how I like that. I’ve talked about how I like the ideas (0:37:02) Al: in One World and Winds of Anthos. I think they’re very interesting and I think that they’re very (0:37:07) Al: close to legitimately having a good game. These games are not that. These games are just bad. (0:37:10) Codey: Mm hmm. This ain’t it chief. Yeah, I will say so. I was listening to another podcast (0:37:22) Codey: about metal music lately and they were talking about I had there’s a point fault. Stay with (0:37:27) Al: I look forward to it. (0:37:28) Codey: me. They were talking about how this one band re like, is republishing like re thank you (0:37:37) Al: - Remastered. (0:37:38) Codey: remastering. (0:37:40) Codey: I think they’re actually just straight up rerecording an (0:37:42) Codey: entire album and like reproducing it. (0:37:42) Al: - Oh, okay. (0:37:44) Al: They’re Taylor-swifting it. (0:37:45) Codey: Basically, they are that’s the they literally made a joke about (0:37:49) Codey: that and they had the same question like what who’s asking (0:37:53) Codey: for this and a bunch of people on social media were like, (0:37:56) Codey: ah, this is thanks, but I’d rather have no music, etc, etc. (0:38:00) Codey: But they actually said they made a really good point, which (0:38:02) Codey: is if there are people who have not played these games or (0:38:07) Codey: listen to this music or whatever.
(0:38:10) Codey: Kind of an introduction to that to that content for them (0:38:13) Codey: because there might be people who have heard of this Harvest (0:38:17) Codey: Moon thing, but they haven’t really played it yet or whatever (0:38:22) Codey: and then maybe they see this bundle and they’re like, oh (0:38:24) Codey: wow, there’s two of them in here. (0:38:26) Al: Their first and last Harvest Moon games (0:38:27) Codey: And so it’s not. (0:38:30) Codey: Well, yeah, so that’s the thing. (0:38:31) Codey: So I mean that they were talking about an actually good album (0:38:35) Codey: versus– (0:38:35) Al: Yes, I think that’s that is the key difference here Cody (0:38:39) Al: I think like I am NOT against remakes. I think remakes can be really good (0:38:40) Codey: - Yeah. (0:38:43) Al: I think doing A Wonderful Life last year was good because that is a very beloved game (0:38:43) Codey: Yeah. (0:38:47) Al: That is a lot of people’s first farming game (0:38:51) Al: And you just have to listen to Kevin for five minutes to know how much some people were waiting for that (0:38:55) Al: Nobody has that about (0:38:56) Codey: Oh, yeah. That’s fair. (0:38:56) Al: these games. (0:38:59) Codey: Yeah, I I’m trying to give them the benefit of the doubt, but, uh, yeah, (0:39:04) Codey: I think that the thing is that it’s not just if they’re not being remade (0:39:08) Codey: or or bundled in for the switch for the fans. (0:39:14) Al: No. This is the problem is there aren’t games that Natsume can nostalgia grab on. (0:39:15) Codey: It’s to try and get new people into the into the fandom. (0:39:26) Al: Because they’re all owned by Marvelous. They only own the name. And Marvelous aren’t going to do (0:39:33) Al: anything about it. I think it was very different when they did the original on Nintendo Switch (0:39:38) Al: Online because that is the actual original game. They’re just porting. (0:39:44) Al: It’s not even porting. It’s literally just an emulator. They’re just literally allowing (0:39:51) Al: the game to run on it. And that’s very different to remaking games. And there’s no way Marvelous (0:40:00) Al: have remade multiple. They remade Friends of Mineral Town, which was a fun one to do. They (0:40:05) Al: remade A Wonderful Life. I can’t remember if they’ve done any other remakes recently. (0:40:10) Codey: Yeah, I don’t know, because I’m not ever going to touch it, so. (0:40:14) Al: They want to jump on the bandwagon. They have to just release bad games again. (0:40:20) Al: This is the thing. So many people have this bee in their bonnet about Natsume and they’re like, (0:40:25) Al: oh, they’re just jumping on the name and using it to sell bad games. And yeah, that’s kind of true. (0:40:31) Al: Or at least it was kind of true. I do think now they’re actually getting better and they’re (0:40:36) Al: actually trying to make good games. They’re getting there. But the problem is that releasing (0:40:42) Al: their bad games again, (0:40:44) Al: they’re not going to convince anybody that they’re doing anything other than money-grabbing. (0:40:44) Codey: Yeah, it’s like there’s someone at the company that remembers when all these games first (0:40:55) Codey: came out and like the hype the hype of it and they’re trying to like regain that the (0:41:01) Codey: glory days and the it’s sometimes you just got to let things go and like when Bluey and (0:41:08) Codey: Bingo had to get rid of a bunch of their stuffies. Yeah. (0:41:10) Al: Oh, we watched that the other day. That was a good episode.
(0:41:17) Al: Yeah, it’s painful to watch what they’re doing, because it’s like one step forward, 73 steps (0:41:24) Al: back. Like, I just… Why do this? And I… Oh, goodness. Yes, right. (0:41:24) Codey: like the American government. So we have some other people one year forward 73 years backwards. (0:41:40) Al: So… Yeah, a section we don’t often have, because normally it’s just game updates and (0:41:41) Codey: We have some other news. Uh-huh. Oh. (0:41:47) Al: occasionally new games, we do have the other news section. So we have three pieces of other (0:41:53) Al: news to talk about. The first one is super… Let’s start off with the negative one, shall we? (0:41:54) Codey: You got to be more specific. Oh, oh, you’re right. You’re right. You’re right. I needed. (0:42:01) Al: There’s only one negative one. Okay. (0:42:04) Codey: I had to look through it again. Yeah. (0:42:10) Al: So Phoenix Labs, the developers of Fae Farm and Dauntless, and were creating other games (0:42:15) Al: until last year when they laid off almost everybody who was working on any game other (0:42:20) Al: than Fae Farm and Dauntless have now laid off almost everybody else. Huzzah! (0:42:26) Codey: - Yay. (0:42:27) Al: They’re like, “What’s the point in a game studio that makes games? (0:42:30) Al: We don’t want to make games. We don’t even want to continue making our existing games.” (0:42:34) Codey: Yeah, you don’t have yeah, but you know, they really said, the developer said, quote, It’s unfortunate, but necessary. (0:42:44) Codey: Yeah, so I did do a dive into this, more than just like, just the top of the of the article or whatever I start, I really got into reading this article and like kind of looking at some stuff because I was just like, what is going on here? (0:42:45) Al: Yeah, I guess the games aren’t failing then. (0:43:00) Al: - Were you rage reading? (0:43:02) Al: Were you rage reading? (0:43:04) Codey: I was so after basically, the developer, the Phoenix Labs, whom’s ever the whole the whole Phoenix Labs people. Correct. Thank you. You’re so good with the words today. So they, they were, they were acquired by a blockchain company called Forte Labs. (0:43:14) Al: the company. What can I say? I’m on a roll. Words is my whole thing. (0:43:30) Codey: And when they were acquired, they then laid off (0:43:34) Codey: as you already mentioned 160 people and quote the new owner (0:43:39) Codey: reportedly pressed developers to draft methods for integrating (0:43:44) Codey: blockchain technology in its games for the purpose of buying (0:43:49) Codey: and selling and trading in game goods, according to former (0:43:52) Codey: employees. So the crypto market has has joined games y’all. (0:43:58) Codey: Uh. (0:43:58) Al: are we back on NFTs? I thought we killed NFTs like four years ago, what are you doing? (0:43:59) Codey: Yeah. (0:44:04) Codey: NFTs and crypto, man, they’re here to stay, I guess. (0:44:08) Al: Well no, crypto isn’t dead, but come on, when was the last time you heard about NFTs? (0:44:14) Al: Especially in games, they’re so 2022. (0:44:18) Codey: I believe a sentient Cheeto recently gave more NFTs. (0:44:24) Codey: Continuing on, apparently after releasing Dauntless, (0:44:28) Codey: they were “criticized by players for its new in-app monetization design,” (0:44:32) Codey: which was probably the blockchain, (0:44:34) Codey: but erasing previous progression with the new Awakening update. (0:44:38) Codey: So they had an update and it released, it erased all the previous progression.
(0:44:38) Al: Oh no! No! What?! (0:44:42) Codey: The game still has an overwhelmingly negative number of reviews on Steam. (0:44:49) Al: I miss that happening. I wasn’t really aware of this very much. I was aware of it when (0:44:50) Codey: And the… (0:44:54) Al: it initially released because it was like, “Oh, it’s gonna kill Monster Hunter.” And (0:44:56) Codey: Yeah. (0:45:00) Codey: It did not. (0:45:02) Codey: It really, it was nowhere near that, (0:45:04) Codey: because when it first launched, there were probably about 3200 concurrent players like people playing at the same time online. (0:45:12) Codey: Nowadays, it’s only ever around about 150 people. (0:45:18) Codey: So yeah, not sure what they’re doing. (0:45:22) Codey: I would suggest well, I guess I would I would say that I would suggest them to back off the blockchain, but they are literally owned and acquired by a blockchain company. (0:45:32) Codey: So I don’t think that’s going to happen. (0:45:36) Codey: So I’m not not really sure what this means for Fae Farm. (0:45:41) Al: What I find really funny is like, so I think crypto is most often a scam. (0:45:48) Al: I do think there are some interesting applications for blockchain as a concept. (0:45:56) Al: NFTs is not it. (0:46:01) Al: It has never been it, even on their own. (0:46:04) Al: And then when people started putting them into games, I was like, I don’t even know why. (0:46:10) Codey: I mean, I feel like it’s to try and like have an introduction. (0:46:11) Al: Like, what is happening, and why would you do this? (0:46:18) Codey: It’s like when they put smoking in movies so that they would get more smokers, right? (0:46:23) Codey: It’s like a, it’s a, it’s a possible way to normalize something. (0:46:26) Al: It’s like the Transformers series for selling more Transformers. (0:46:29) Codey: Yeah. (0:46:31) Codey: Uh, which it’ll probably have a small, well, it would have a small bump if it wasn’t for (0:46:38) Codey: uh, cozy gamers. (0:46:40) Codey: Cause I don’t think cozy gamers are the people or, or Monster Hunter style players. (0:46:46) Codey: I, you really gotta go for like the Call of Duty people. (0:46:48) Codey: I feel like they, they would do NFTs because they basically, that’s basically all they (0:46:53) Codey: do with their, uh, skins and stuff on all the, all the guns and whatever. (0:47:00) Codey: So like, it’s not Monster Hunter people, uh, with Dauntless and then cozy gamers with Fae (0:47:08) Codey: Farm. (0:47:10) Codey: It’
Discover the secret sauce behind optimizing your content for search engines and master the art of aligning with search intent. Unravel the mystery of cosine similarity, a powerful mathematical tool, and learn how it transforms text into vectors to measure the relationship between your content and user queries. Join me, Edd Dawson, as I guide you through a library analogy to shed light on how this technique helps search engines like Google determine relevance, giving you the edge to boost your search rankings. With a focus on vector space models, you'll gain insights into how search engines evaluate web pages and how you can leverage this knowledge to enhance your SEO strategies.

Transform your content creation approach by leveraging cosine similarity. Explore practical strategies that help refine keyword relevance and align with user intent, ensuring your content hits the mark every time. From content gap analysis to strategic clustering, learn how to identify missing topics, reduce topic cannibalization, and optimize site architecture to maximize unique page value. Embrace the power of semantic richness and regular audits to maintain high-quality content that avoids the pitfalls of keyword stuffing and duplicate content. Whether you're looking to enrich the value of a smoothie recipe or improve overall SEO performance, these actionable insights will empower your content-query alignment like never before.

SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com
You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips
To get a personal no-obligation demo of how KeywordsPeopleUse could help you boost your SEO, and a 7 day FREE trial of our Standard Plan, book a demo with me now
See Edd's personal site at edddawson.com
Ask me a question and get on the show: click here to record a question
Find Edd on LinkedIn, Bluesky & Twitter
Find KeywordsPeopleUse on Twitter @kwds_ppl_use
"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/
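To make the vector space idea above concrete, here is a minimal Python sketch of content-query alignment. The pages and query are invented examples, and scikit-learn's TF-IDF vectors are only a stand-in for whatever representation a production search engine actually uses.

```python
# Toy content-query alignment via cosine similarity.
# TF-IDF is a simplification; real engines use far richer embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = [
    "How to make a strawberry banana smoothie with yogurt",
    "Ten tips for training for your first marathon",
    "Green smoothie recipes for quick healthy breakfasts",
]
query = "easy healthy smoothie recipe"

vectorizer = TfidfVectorizer()
# Fit pages and query together so they share one vector space
matrix = vectorizer.fit_transform(pages + [query])
page_vectors, query_vector = matrix[:-1], matrix[-1]

# Cosine similarity: closer to 1.0 means more topically aligned
scores = cosine_similarity(query_vector, page_vectors).ravel()
for page, score in sorted(zip(pages, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {page}")
```

The same score can also flag content gaps (queries no page aligns well with) and cannibalization (two pages scoring nearly identically for the same query).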
Register here for AWS re:Invent 2024, Dec 2-6, Las Vegas, NV
-------
Benjamin Flast, Director, Product Management at MongoDB, discusses vector search capabilities, integration with AWS Bedrock, and its transformative role in enabling scalable, efficient, and AI-powered solutions.

Topics Include:
- Introduction to MongoDB's vector search and AWS Bedrock
- Core concepts of vectors and embeddings explained
- High-dimensional space and vector similarity overview
- Embedding model use in vector creation
- Importance of distance functions in vector relations
- Vector search uses k-nearest neighbor algorithm
- Euclidean, Cosine, and Dot Product similarity functions
- Applications for different similarity functions discussed
- Large language models and vector search explained
- Introduction to retrieval-augmented generation (RAG)
- Combining external data with LLMs in RAG
- MongoDB's document model for flexible data storage
- MongoDB Atlas platform capabilities overview
- Unified interface for MongoDB document model
- Approximate nearest neighbor search for efficiency
- Vector indexing in MongoDB for fast querying
- Search nodes for scalable vector search processing
- MongoDB AI integrations with third-party libraries
- Semantic caching for efficient response retrieval
- MongoDB's private link support on AWS Bedrock
- Future potential of vector search and RAG applications
- Example use case: Metaphor Data's data catalog
- Example use case: Okta's conversational interface
- Example use case: Delivery Hero product recommendations
- Final takeaways on MongoDB Atlas vector search

Participants:
Benjamin Flast - Director, Product Management, MongoDB

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
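As a rough sketch of the approximate k-nearest-neighbor queries covered in this episode, here is what an Atlas Vector Search aggregation can look like from Python. The connection string, database, collection, field names, and the "vector_index" index are placeholders, and note that the similarity function (Euclidean, cosine, or dot product) is chosen when the index is defined in Atlas, not per query.

```python
# Approximate nearest-neighbor search with MongoDB Atlas Vector Search.
# Assumes a vector search index named "vector_index" already exists on
# the "embedding" field; all names and the URI are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
collection = client["shop"]["products"]

# Must come from the same embedding model used to vectorize the documents
query_vector = [0.12, -0.07, 0.33]

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,  # ANN candidate pool: higher = better recall, slower
            "limit": 5,            # k nearest neighbors to return
        }
    },
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc["name"], doc["score"])
```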
This week, Lois Houston and Nikita Abraham continue their exploration of Oracle AI Vector Search with a deep dive into vector indexes and memory considerations. Senior Principal APEX and Apps Dev Instructor Brent Dayley breaks down what vector indexes are, how they enhance the efficiency of search queries, and the different types supported by Oracle AI Vector Search. Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Nikita: Welcome back to the Oracle University Podcast! I'm Nikita Abraham, Team Lead of Editorial Services at Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week was Part 1 of our discussion on Oracle AI Vector Search. We talked about what it is, its benefits, the new vector data type, vector embedding models, and the overall workflow. In Part 2, we're going to focus on vector indices and memory. 00:56 Nikita: And to help us break it all down, we've got Brent Dayley back with us. Brent is a Senior Principal APEX and Apps Dev Instructor with Oracle University. Hi Brent! Thanks for being with us today. So, let's jump right in! What are vector indexes and how are they useful? Brent: Now, vector indexes are specialized indexing data structures that can make your queries more efficient against your vectors. They use techniques such as clustering, partitioning, and neighbor graphs. Now, they greatly reduce the search space, which means that your queries happen quicker. They're also extremely efficient. They do require that you enable the vector pool in the SGA. 01:42 Lois: Brent, walk us through the different types of vector indices that are supported by Oracle AI Vector Search. How do they integrate into the overall process? Brent: So Oracle AI Vector Search supports two types of indexes. The first is the in-memory neighbor graph vector index; HNSW is the only type of in-memory neighbor graph vector index that is supported. These are very efficient indexes for approximate vector similarity search. HNSW graphs are structured using principles from small world networks along with layered hierarchical organization. The second is the neighbor partition vector index; the inverted file flat (IVF) index is the only type of neighbor partition index supported. It is a partition-based index which balances high search quality with reasonable speed. 02:35 Nikita: Brent, you mentioned that enabling the vector pool in the SGA is a requirement when working with vector indexes. Can you explain that process for us? Brent: In order for you to be able to use vector indexes, you do need to enable the vector pool area. And in order to do that, what you need to do is set the vector memory size parameter. You can set it at the container database level. And the PDB inherits it from the CDB.
Now bear in mind that the database does have to be bounced, that is, restarted, when you set the vector pool. 03:12 Lois: Ok. Are there any other considerations to keep in mind when using vector indices? Brent: Vector indexes are stored in this pool, and vector metadata is also stored here. And you do need to restart the database. So large vector indexes do need lots of RAM, and RAM constrains the vector index size. You should use IVF indexes when there is not enough RAM. IVF indexes use both the buffer cache as well as disk. 03:42 Nikita: And what about memory considerations? Brent: So to remind you, a vector is a numerical representation of text, images, audio, or video that encodes the features or semantic meaning of the data, instead of the actual contents, such as the words or pixels of an image. So the vector is a list of numerical values known as dimensions with a specified format. Now, Oracle does support the int8 format, the float32 format, and the float64 format. The format determines the number of bytes per dimension. For instance, int8 is one byte, float32 is four bytes. Now, Oracle AI Vector Search supports vectors with up to 65,535 dimensions. 04:34 Lois: What should we know about creating a table with a vector column? Brent: Now, Oracle Database 23ai does have a new vector data type. The new data type was created in order to support vector search. The definition can include the number of dimensions and can include the format. Bear in mind that both of those are optional when you define your column. The possible dimension element formats are int8, float32, and float64. Float32 and float64 are IEEE standards, and Oracle Database will automatically cast the value if needed. 05:18 Nikita: Can you give us a few declaration examples? Brent: Now, if we just declare the column as VECTOR, then the vectors can have any arbitrary number of dimensions and formats. If we describe the vector type as VECTOR(*, *), that also means that vectors can have an arbitrary number of dimensions and formats; VECTOR and VECTOR(*, *) are equivalent. VECTOR with the number of dimensions specified, followed by a comma and then an asterisk, is equivalent to VECTOR with just the number of dimensions. Vectors must all have the specified number of dimensions, or an error will be thrown, and every vector will have its dimensions stored without format modification. And if we declare VECTOR(*, dimension element format), what that means is that vectors can have an arbitrary number of dimensions, but their format will be up-converted or down-converted to the specified dimension element format, either int8, float32, or float64. 06:25 Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you're certified, you'll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started. 06:52 Nikita: Welcome back! Brent, what is the vector constructor and why is it useful? Brent: Now, the vector constructor is a function that allows us to create vectors without having to store those in a column in a table. These are useful for learning purposes. You use these usually with a smaller number of dimensions. Bear in mind that most embedding models can contain thousands of different dimensions. You get to specify the vector values, and they usually represent two-dimensional values, like xy coordinates.
The dimensions are optional, and the format is optional as well. 07:29 Lois: Right. Before we wrap up, can you tell us how to calculate vector distances? Brent: Now, vector distance uses the function VECTOR_DISTANCE as the main function. This allows you to calculate distances between two vectors and, therefore, takes two vectors as parameters. Optionally, you can specify a metric. If you do not specify a metric, then the default metric, COSINE, would be used. You can optionally use other shorthand functions, too. These include L1 distance, L2 distance, cosine distance, and inner product. All of these functions also take two vectors as input and return the distance between them. Now the VECTOR_DISTANCE function can be used to perform a similarity search. If a similarity search query does not specify a distance metric, then the default cosine metric will be used for both exact and approximate searches. If a similarity search does specify a distance metric in the VECTOR_DISTANCE function, then an exact search with that distance metric is used if it conflicts with the distance metric specified in a vector index. If the two distance metrics are the same, then this will be used for both exact as well as approximate searches. 08:58 Nikita: I was wondering Brent, what vector distance metrics do we have access to? Brent: We have Euclidean and Euclidean squared distances. We have cosine similarity, dot product similarity, Manhattan distance, and Hamming similarity. Let's take a closer look at the first of these metrics, Euclidean and Euclidean squared distances. This gives us the straight-line distance between two vectors. It does use the Pythagorean theorem. It is sensitive to both the vector size as well as the direction. With Euclidean distances, comparing squared distances is equivalent to comparing distances. So when ordering is more important than the actual distance values, the squared Euclidean distance is very useful: it avoids the square root calculation, which makes it faster to compute than the Euclidean distance. 09:58 Lois: And the cosine similarity metrics? Brent: It is one of the most widely used similarity metrics, especially in natural language processing. The smaller the angle between the two vectors, the more similar they are. While cosine distance measures how different two vectors are, cosine similarity measures how similar two vectors are. Dot product similarity allows us to multiply the size of each vector by the cosine of their angle. The corresponding geometrical interpretation of this definition is equivalent to multiplying the size of one of the vectors by the size of the projection of the second vector onto the first one or vice versa. Larger means that they are more similar. Smaller means that they are less similar. Manhattan distance is useful for describing uniform grids. You can imagine yourself walking from point A to point B in a city such as Manhattan. Now, since there are buildings in the way, maybe we need to walk down one street and then turn and walk down the next street in order to get to our result. As you can imagine, this metric is most useful for vectors describing objects on a uniform grid such as city blocks, power grids, or perhaps a chessboard. 11:27 Nikita: And finally, we have Hamming similarity, right? Brent: This describes where vector dimensions differ. They are binary vectors, and it tells us the number of bits that require change to match. It compares the position of each bit in the sequence. Now, these are usually used in order to detect network errors.
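To see Brent's metrics side by side, here is a small NumPy reference sketch. Oracle computes these server-side through VECTOR_DISTANCE and its shorthand functions; this only restates the same formulas, with invented example vectors.

```python
# Reference implementations of the distance metrics discussed above.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line (L2) distance
euclidean_sq = np.sum((a - b) ** 2)         # same ordering, skips the square root
manhattan = np.sum(np.abs(a - b))           # L1, city-block distance
dot = a @ b                                 # larger = more similar
cosine_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))
cosine_dist = 1.0 - cosine_sim              # smaller angle = more similar = smaller distance

# Hamming: how many positions of two binary vectors differ
h1 = np.array([1, 0, 1, 1, 0])
h2 = np.array([1, 1, 1, 0, 0])
hamming = int(np.sum(h1 != h2))

print(euclidean, euclidean_sq, manhattan, cosine_dist, dot, hamming)
```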
11:53 Nikita: Brent, thanks for joining us these last two weeks and explaining what Oracle AI Vector Search is. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course. Lois: This concludes our season on Oracle Database 23ai New Features for administrators. In our next episode, we're going to talk about database backup and recovery, but more on that later! Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 12:29 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
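Pulling the episode's threads together, here is a hedged sketch of declaring a VECTOR column and running a VECTOR_DISTANCE similarity search from Python. The table, credentials, and DSN are hypothetical, and it assumes python-oracledb 2.x against Oracle Database 23ai, where vectors can be bound as array.array values.

```python
# Sketch: a VECTOR(3, FLOAT32) column plus a cosine similarity search.
# Connection details and the table name are invented for illustration.
import array
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

# Every vector in this column must have 3 dimensions, stored as float32
cur.execute("CREATE TABLE demo_vectors (id NUMBER PRIMARY KEY, v VECTOR(3, FLOAT32))")

rows = [
    (1, array.array("f", [1.0, 2.0, 3.0])),
    (2, array.array("f", [2.0, 1.0, 4.0])),
]
cur.executemany("INSERT INTO demo_vectors VALUES (:1, :2)", rows)
conn.commit()

# COSINE is the default metric; naming it here keeps the intent explicit
cur.execute(
    """SELECT id, VECTOR_DISTANCE(v, :qv, COSINE) AS dist
         FROM demo_vectors
        ORDER BY dist
        FETCH FIRST 1 ROWS ONLY""",
    qv=array.array("f", [1.0, 2.0, 3.5]),
)
print(cur.fetchone())
```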
Matt Schroeder is the Co-Founder and CEO of The Cosine. In this episode of Specified Growth Podcast, Matt talks about his background in lighting and how The Cosine was founded. He also discusses some of the current inefficiencies and challenges in the lighting industry, what they're doing to solve them at The Cosine, and more. Don't miss this episode of Specified Growth Podcast! Please reach out if you have any feedback or questions. Enjoy! Twitter: @TatsuyaNakagawa Instagram: @tats_talks LinkedIn: Tatsuya Nakagawa YouTube: Tats Talks www.tatstalk.com www.castagra.com
We all have fond memories of the first Dev Day in 2023: and the blip that followed soon after. As Ben Thompson has noted, this year's DevDay took a quieter, more intimate tone. No Satya, no livestream, (slightly fewer people?). Instead of putting ChatGPT announcements in DevDay as in 2023, o1 was announced 2 weeks prior, and DevDay 2024 was reserved purely for developer-facing API announcements, primarily the Realtime API, Vision Finetuning, Prompt Caching, and Model Distillation. However, the larger venue and more spread out schedule did allow a lot more hallway conversations with attendees, as well as more community presentations, including our recent guest Alistair Pullen of Cosine, as well as deeper dives from OpenAI, including our recent guest Michelle Pokrass of the API Team. Thanks to OpenAI's warm collaboration (we particularly want to thank Lindsay McCallum Rémy!), we managed to record exclusive interviews with many of the main presenters of both the keynotes and breakout sessions. We present them in full in today's episode, together with a full lightly edited Q&A with Sam Altman.

Show notes and related resources
Some of these used in the final audio episode below
* Simon Willison Live Blog
* swyx live tweets and videos
* Greg Kamradt coverage of Structured Output session, Scaling LLM Apps session
* Fireside Chat Q&A with Sam Altman

Timestamps
* [00:00:00] Intro by Suno.ai
* [00:01:23] NotebookLM Recap of DevDay
* [00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling
* [00:19:16] Olivier Godement, Head of Product, OpenAI
* [00:36:57] Romain Huet, Head of DX, OpenAI
* [00:47:08] Michelle Pokrass, API Tech Lead at OpenAI ft. Simon Willison
* [01:04:45] Alistair Pullen, CEO, Cosine (Genie)
* [01:18:31] Sam Altman + Kevin Weil Q&A
* [02:03:07] Notebook LM Recap of Podcast

Transcript

[00:00:00] Suno AI: Under dev daylights, code ignites. Real time voice streams reach new heights. O1 and GPT-4o in flight. Fine tune the future, data in sight. Schema sync up, outputs precise. Distill the models, efficiency splice.[00:00:33] AI Charlie: Happy October. This is your AI co host, Charlie. One of our longest standing traditions is covering major AI and ML conferences in podcast format. Delving, yes delving, into the vibes of what it is like to be there stitched in with short samples of conversations with key players, just to help you feel like you were there.[00:00:54] AI Charlie: Covering this year's Dev Day was significantly more challenging because we were all requested not to record the opening keynotes. So, in place of the opening keynotes, we had the viral notebook LM Deep Dive crew, my new AI podcast nemesis, give you a seven minute recap of everything that was announced.[00:01:15] AI Charlie: Of course, you can also check the show notes for details. I'll then come back with an explainer of all the interviews we have for you today. Watch out and take care.[00:01:23] NotebookLM Recap of DevDay[00:01:23] NotebookLM: All right, so we've got a pretty hefty stack of articles and blog posts here all about OpenAI's Dev Day 2024.[00:01:32] NotebookLM 2: Yeah, lots to dig into there.[00:01:34] NotebookLM 2: Seems[00:01:34] NotebookLM: like you're really interested in what's new with AI.[00:01:36] NotebookLM 2: Definitely. And it seems like OpenAI had a lot to announce. New tools, changes to the company. It's a lot.[00:01:43] NotebookLM: It is.
And especially since you're interested in how AI can be used in the real world, you know, practical applications, we'll focus on that.[00:01:51] NotebookLM: Perfect. Like, for example, this Real time API, they announced that, right? That seems like a big deal if we want AI to sound, well, less like a robot.[00:01:59] NotebookLM 2: It could be huge. The real time API could completely change how we, like, interact with AI. Like, imagine if your voice assistant could actually handle it if you interrupted it.[00:02:08] NotebookLM: Or, like, have an actual conversation.[00:02:10] NotebookLM 2: Right, not just these clunky back and forth things we're used to.[00:02:14] NotebookLM: And they actually showed it off, didn't they? I read something about a travel app, one for languages. Even one where the AI ordered takeout.[00:02:21] NotebookLM 2: Those demos were really interesting, and I think they show how this real time API can be used in so many ways.[00:02:28] NotebookLM 2: And the tech behind it is fascinating, by the way. It uses persistent WebSocket connections and this thing called function calling, so it can respond in real time.[00:02:38] NotebookLM: So the function calling thing, that sounds kind of complicated. Can you, like, explain how that works?[00:02:42] NotebookLM 2: So imagine giving the AI Access to this whole toolbox, right?[00:02:46] NotebookLM 2: Information, capabilities, all sorts of things. Okay. So take the travel agent demo, for example. With function calling, the AI can pull up details, let's say about Fort Mason, right, from some database. Like nearby restaurants, stuff like that.[00:02:59] NotebookLM: Ah, I get it. So instead of being limited to what it already knows, It can go and find the information it needs, like a human travel agent would.[00:03:07] NotebookLM 2: Precisely. And someone on Hacker News pointed out a cool detail. The API actually gives you a text version of what's being said. So you can store that, analyze it.[00:03:17] NotebookLM: That's smart. It seems like OpenAI put a lot of thought into making this API easy for developers to use. But, while we're on OpenAI, you know, Besides their tech, there's been some news about, like, internal changes, too.[00:03:30] NotebookLM: Didn't they say they're moving away from being a non profit?[00:03:32] NotebookLM 2: They did. And it's got everyone talking. It's a major shift. And it's only natural for people to wonder how that'll change things for OpenAI in the future. I mean, there are definitely some valid questions about this move to for profit. Like, will they have more money for research now?[00:03:46] NotebookLM 2: Probably. But will they, you know, care as much about making sure AI benefits everyone?[00:03:51] NotebookLM: Yeah, that's the big question, especially with all the, like, the leadership changes happening at OpenAI too, right? I read that their Chief Research Officer left, and their VP of Research, and even their CTO.[00:04:03] NotebookLM 2: It's true. A lot of people are connecting those departures with the changes in OpenAI's structure.[00:04:08] NotebookLM: And I guess it makes you wonder what's going on behind the scenes. But they are still putting out new stuff. Like this whole fine tuning thing really caught my eye.[00:04:17] NotebookLM 2: Right, fine tuning. It's essentially taking a pre trained AI model. And, like, customizing it.[00:04:23] NotebookLM: So instead of a general AI, you get one that's tailored for a specific job.[00:04:27] NotebookLM 2: Exactly. 
And that opens up so many possibilities, especially for businesses. Imagine you could train an AI on your company's data, you know, like how you communicate your brand guidelines.[00:04:37] NotebookLM: So it's like having an AI that's specifically trained for your company?[00:04:41] NotebookLM 2: That's the idea.[00:04:41] NotebookLM: And they're doing it with images now, too, right?[00:04:44] NotebookLM: Fine tuning with vision is what they called it.[00:04:46] NotebookLM 2: It's pretty incredible what they're doing with that, especially in fields like medicine.[00:04:50] NotebookLM: Like using AI to help doctors make diagnoses.[00:04:52] NotebookLM 2: Exactly. And AI could be trained on thousands of medical images, right? And then it could potentially spot things that even a trained doctor might miss.[00:05:03] NotebookLM: That's kind of scary, to be honest. What if it gets it wrong?[00:05:06] NotebookLM 2: Well, the idea isn't to replace doctors, but to give them another tool, you know, help them make better decisions.[00:05:12] NotebookLM: Okay, that makes sense. But training these AI models must be really expensive.[00:05:17] NotebookLM 2: It can be. All those tokens add up. But OpenAI announced something called automatic prompt caching.[00:05:23] Alex Volkov: Automatic what now? I don't think I came across that.[00:05:26] NotebookLM 2: So basically, if your AI sees a prompt that it's already seen before, OpenAI will give you a discount.[00:05:31] NotebookLM: Huh. Like a frequent buyer program for AI.[00:05:35] NotebookLM 2: Kind of, yeah. It's good that they're trying to make it more affordable. And they're also doing something called model distillation.[00:05:41] NotebookLM: Okay, now you're just using big words to sound smart. What's that?[00:05:45] NotebookLM 2: Think of it like like a recipe, right? You can take a really complex recipe and break it down to the essential parts.[00:05:50] NotebookLM: Make it simpler, but it still tastes the same.[00:05:53] NotebookLM 2: Yeah. And that's what model distillation is. You take a big, powerful AI model and create a smaller, more efficient version.[00:06:00] NotebookLM: So it's like lighter weight, but still just as capable.[00:06:03] NotebookLM 2: Exactly. And that means more people can actually use these powerful tools. They don't need, like, a supercomputer to run them.[00:06:10] NotebookLM: So they're making AI more accessible. That's great.[00:06:13] NotebookLM 2: It is. And speaking of powerful tools, they also talked about their new O1 model.[00:06:18] NotebookLM 2: That's the one they've been hyping up. The one that's supposed to be this big leap forward.[00:06:22] NotebookLM: Yeah, O1. It sounds pretty futuristic. Like, from what I read, it's not just a bigger, better language model.[00:06:28] NotebookLM 2: Right. It's a different approach.[00:06:29] NotebookLM: They're saying it can, like, actually reason, right? Think.[00:06:33] NotebookLM 2: It's trained differently.[00:06:34] NotebookLM 2: They used reinforcement learning with O1.[00:06:36] NotebookLM: So it's not just finding patterns in the data it's seen before.[00:06:40] NotebookLM 2: Not just that. It can actually learn from its mistakes. Get better at solving problems.[00:06:46] NotebookLM: So give me an example. What can O1 do that, say, GPT 4 can't?[00:06:51] NotebookLM 2: Well, OpenAI showed it doing some pretty impressive stuff with math, like advanced math.[00:06:56] NotebookLM 2: And coding, too. Complex coding.
Things that even GPT 4 struggled with.[00:07:00] NotebookLM: So you're saying if I needed to, like, write a screenplay, I'd stick with GPT 4? But if I wanted to solve some crazy physics problem, O1 is what I'd use.[00:07:08] NotebookLM 2: Something like that, yeah. Although there is a trade off. O1 takes a lot more power to run, and it takes longer to get those impressive results.[00:07:17] NotebookLM: Hmm, makes sense. More power, more time, higher quality.[00:07:21] NotebookLM 2: Exactly.[00:07:22] NotebookLM: It sounds like it's still in development, though, right? Is there anything else they're planning to add to it?[00:07:26] NotebookLM 2: Oh, yeah. They mentioned system prompts, which will let developers, like, set some ground rules for how it behaves. And they're working on adding structured outputs and function calling.[00:07:38] Alex Volkov: Wait, structured outputs? Didn't we just talk about that? We[00:07:41] NotebookLM 2: did. That's the thing where the AI's output is formatted in a way that's easy to use.[00:07:47] NotebookLM: Right, right. So you don't have to spend all day trying to make sense of what it gives you. It's good that they're thinking about that stuff.[00:07:53] NotebookLM 2: It's about making these tools usable.[00:07:56] NotebookLM 2: And speaking of that, Dev Day finished up with this really interesting talk. Sam Altman, the CEO of OpenAI, And Kevin Weil, their new chief product officer. They talked about, like, the big picture for AI.[00:08:09] NotebookLM: Yeah, they did, didn't they? Anything interesting come up?[00:08:12] NotebookLM 2: Well, Altman talked about moving past this whole AGI term, Artificial General Intelligence.[00:08:18] NotebookLM: I can see why. It's kind of a loaded term, isn't it?[00:08:20] NotebookLM 2: He thinks it's become a bit of a buzzword, and people don't really understand what it means.[00:08:24] NotebookLM: So are they saying they're not trying to build AGI anymore?[00:08:28] NotebookLM 2: It's more like they're saying they're focused on just Making AI better, constantly improving it, not worrying about putting it in a box.[00:08:36] NotebookLM: That makes sense. Keep pushing the limits.[00:08:38] NotebookLM 2: Exactly. But they were also very clear about doing it responsibly. They talked a lot about safety and ethics.[00:08:43] NotebookLM: Yeah, that's important.[00:08:44] NotebookLM 2: They said they were going to be very careful. About how they release new features.[00:08:48] NotebookLM: Good! Because this stuff is powerful.[00:08:51] NotebookLM 2: It is. It was a lot to take in, this whole Dev Day event.[00:08:54] NotebookLM 2: New tools, big changes at OpenAI, and these big questions about the future of AI.[00:08:59] NotebookLM: It was. But hopefully this deep dive helped make sense of some of it. At least, that's what we try to do here.[00:09:05] AI Charlie: Absolutely.[00:09:06] NotebookLM: Thanks for taking the deep dive with us.[00:09:08] AI Charlie: The biggest demo of the new Realtime API involved function calling with voice mode and buying chocolate covered strawberries from our friendly local OpenAI developer experience engineer and strawberry shop owner, Ilan Biggio.[00:09:21] AI Charlie: We'll first play you the audio of his demo and then go into a little interview with him.[00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling[00:09:25] Romain Huet: Could you place a call and see if you could get us 400 strawberries delivered to the venue? But please keep that under 1500. I'm on it. 
We'll get those strawberries delivered for you.[00:09:47] Ilan: Hello? Hi there. Is this Ilan? I'm Romain's AI assistant. How is it going? Fantastic. Can you tell me what flavors of strawberry dips you have for me? Yeah, we have chocolate, vanilla, and we have peanut butter. Wait, how much would 400 chocolate covered strawberries cost? 400? Are you sure you want 400? Yes, 400 chocolate covered[00:10:14] swyx: strawberries.[00:10:15] Ilan: Wait,[00:10:16] swyx: how much[00:10:16] Ilan: would that be? I think that'll be around, like, $1,415.92.[00:10:25] Alex Volkov: Awesome. Let's go ahead and place the order for four chocolate covered strawberries.[00:10:31] Ilan: Great, where would you like that delivered? Please deliver them to the Gateway Pavilion at Fort Mason. And I'll be paying in cash.[00:10:42] Alex Volkov: Okay,[00:10:43] Ilan: sweet. So just to confirm, you want four strawberries?[00:10:45] Ilan: 400 chocolate covered strawberries to the Gateway Pavilion. Yes, that's perfect. And when can we expect delivery? Well, you guys are right nearby, so it'll be like, I don't know, 37 seconds? That's incredibly fast. Cool, you too.[00:11:09] swyx: Hi, Ilan, welcome to Latent Space. Oh, thank you. I just saw your amazing demos, had your amazing strawberries. You are dressed up, like, exactly like a strawberry salesman. Gotta have it all. What was building the demo like? What was the story behind the demo?[00:11:22] swyx: It was really interesting. This is actually something I had been thinking about for months before the launch.[00:11:27] swyx: Like, having a, like, AI that can make phone calls is something like I've personally wanted for a long time. And so as soon as we launched internally, like, I started hacking on it. And then that sort of just started. We made it into like an internal demo, and then people found it really interesting, and then we thought how cool would it be to have this like on stage as, as one of the demos.[00:11:47] swyx: Yeah, would would you call out any technical issues building, like you were basically one of the first people ever to build with a voice mode API. Would you call out any issues like integrating it with Twilio like that, like you did with function calling, with like a form filling elements. I noticed that you had like intents of things to fulfill, and then.[00:12:07] swyx: When there's still missing info, the voice would prompt you, roleplaying the store guy.[00:12:13] swyx: Yeah, yeah, so, I think technically, there's like the whole, just working with audio and streams is a whole different beast. Like, even separate from like AI and this, this like, new capabilities, it's just, it's just tough.[00:12:26] swyx: Yeah, when you have a prompt, conversationally it'll just follow, like the, it was, Instead of like, kind of step by step to like ask the right questions based on like the like what the request was, right? The function calling itself is sort of tangential to that. Like, you have to prompt it to call the functions, but then handling it isn't too much different from, like, what you would do with assistant streaming or, like, chat completion streaming.[00:12:47] swyx: I think, like, the API feels very similar just to, like, if everything in the API was streaming, it actually feels quite familiar to that.[00:12:53] swyx: And then, function calling wise, I mean, does it work the same? I don't know. Like, I saw a lot of logs. You guys showed, like, in the playground, a lot of logs.
What is in there?[00:13:03] swyx: What should people know?[00:13:04] swyx: Yeah, I mean, it is, like, the events may have different names than the streaming events that we have in chat completions, but they represent very similar things. It's things like, you know, function call started, argument started, it's like, here's like argument deltas, and then like function call done.[00:13:20] swyx: Conveniently we send one that has the full function, and then I just use that. Nice.[00:13:25] swyx: Yeah and then, like, what restrictions do, should people be aware of? Like, you know, I think, I think, before we recorded, we discussed a little bit about the sensitivities around basically calling random store owners and putting, putting like an AI on them.[00:13:40] swyx: Yeah, so there's, I think there's recent regulation on that, which is why we want to be like very, I guess, aware of, of You know, you can't just call anybody with AI, right? That's like just robocalling. You wouldn't want someone just calling you with AI.[00:13:54] swyx: I'm a developer, I'm about to do this on random people.[00:13:57] swyx: What laws am I about to break?[00:14:00] swyx: I forget what the governing body is, but you should, I think, Having consent of the person you're about to call, it always works. I, as the strawberry owner, have consented to like getting called with AI. I think past that you, you want to be careful. Definitely individuals are more sensitive than businesses.[00:14:19] swyx: I think businesses you have a little bit more leeway. Also, they're like, businesses I think have an incentive to want to receive AI phone calls. Especially if like, they're dealing with it. It's doing business. Right, like, it's more business. It's kind of like getting on a booking platform, right, you're exposed to more.[00:14:33] swyx: But, I think it's still very much like a gray area. Again, so. I think everybody should, you know, tread carefully, like, figure out what it is. I, I, I, the law is so recent, I didn't have enough time to, like, I'm also not a lawyer. Yeah, yeah, yeah, of course. Yeah.[00:14:49] swyx: Okay, cool fair enough. One other thing, this is kind of agentic.[00:14:52] swyx: Did you use a state machine at all? Did you use any framework? No. You just stick it in context and then just run it in a loop until it ends call?[00:15:01] swyx: Yeah, there isn't even a loop, like Okay. Because the API is just based on sessions. It's always just going to keep going. Every time you speak, it'll trigger a call.[00:15:11] swyx: And then after every function call was also invoked invoking like a generation. And so that is another difference here. It's like it's inherently almost like in a loop, be just by being in a session, right? No state machines needed. I'd say this is very similar to like, the notion of routines, where it's just like a list of steps.[00:15:29] swyx: And it, like, sticks to them softly, but usually pretty well. And the steps is the prompts? The steps, it's like the prompt, like the steps are in the prompt. Yeah, yeah, yeah. Right, it's like step one, do this, step one, step two, do that. What if I want to change the system prompt halfway through the conversation?[00:15:44] swyx: You can. Okay. You can. To be honest, I have not played without two too much. Yeah,[00:15:47] swyx: yeah.[00:15:48] swyx: But, I know you can.[00:15:49] swyx: Yeah, yeah. Yeah. Awesome. I noticed that you called it real time API, but not voice API. Mm hmm. So I assume that it's like real time API starting with voice. 
Right, I think that's what he said on the thing.[00:16:00] swyx: I can't imagine, like, what else is real[00:16:02] swyx: time? Well, I guess, to use ChatGPT's voice mode as an example, Like, we've demoed the video, right? Like, real time image, right? So, I'm not actually sure what timelines are, But I would expect, if I had to guess, That, like, that is probably the next thing that we're gonna be making.[00:16:17] swyx: You'd probably have to talk directly with the team building this. Sure. But, You can't promise their timelines. Yeah, yeah, yeah, right, exactly. But, like, given that these are the features that currently exist, that we've demoed on ChatGPT. Yeah. There[00:16:29] swyx: will never be a[00:16:29] swyx: case where there's like a real time text API, right?[00:16:31] swyx: I don't Well, this is a real time text API. You can do text only on this. Oh. Yeah. I don't know why you would. But it's actually So text to text here doesn't quite make a lot of sense. I don't think you'll get a lot of latency gain. But, like, speech to text is really interesting. Because you can prevent You can prevent responses, like audio responses.[00:16:54] swyx: And force function calls. And so you can do stuff like UI control. That is like super super reliable. We had a lot of like, you know, un, like, we weren't sure how well this was gonna work because it's like, you have a voice answering. It's like a whole persona, right? Like, that's a little bit more, you know, risky.[00:17:10] swyx: But if you, like, cut out the audio outputs and make it so it always has to output a function, like you can end up with pretty pretty good, like, Pretty reliable, like, command like a command architecture. Yeah,[00:17:21] swyx: actually, that's the way I want to interact with a lot of these things as well. Like, one sided voice.[00:17:26] swyx: Yeah, you don't necessarily want to hear the[00:17:27] swyx: voice back. And like, sometimes it's like, yeah, I think having an output voice is great. But I feel like I don't always want to hear an output voice. I'd say usually I don't. But yeah, exactly, being able to speak to it is super sweet.[00:17:39] swyx: Cool. Do you want to comment on any of the other stuff that you announced?[00:17:41] swyx: Prompt caching I noticed was like, I like the no code change part. I'm looking forward to the docs because I'm sure there's a lot of details on like, what you cache, how long you cache. Cause like, Anthropic caches were like 5 minutes. I was like, okay, but what if I don't make a call every 5 minutes?[00:17:56] swyx: Yeah,[00:17:56] swyx: to be super honest with you, I've been so caught up with the real time API and making the demo that I haven't read up on the other stuff. Launches too much. I mean, I'm aware of them, but I think I'm excited to see how all distillation works. That's something that we've been doing like, I don't know, I've been like doing it between our models for a while And I've seen really good results like I've done back in a day like from GPT 4 to GPT 3.5[00:18:19] swyx: And got like, like pretty much the same level of like function calling with like hundreds of functions So that was super super compelling So, I feel like easier distillation, I'm really excited for. I see. Is it a tool?[00:18:31] swyx: So, I saw evals. Yeah. Like, what is the distillation product? It wasn't super clear, to be honest.[00:18:36] swyx: I, I think I want to, I want to let that team, I want to let that team talk about it. Okay,[00:18:40] swyx: alright.
Well, I appreciate you jumping on. Yeah, of course. Amazing demo. It was beautifully designed. I'm sure that was part of you and Roman, and[00:18:47] swyx: Yeah, I guess, shout out to like, the first people to like, creators of Wanderlust, originally, were like, Simon and Carolis, and then like, I took it and built the voice component and the voice calling components.[00:18:59] swyx: Yeah, so it's been a big team effort. And like the entire API team for, like, debugging everything as it's been going on. It's been, it's been so good working with them. Yeah, you're the first consumers on the DX[00:19:07] swyx: team. Yeah. Yeah, I mean, the classic role of what we do there. Yeah. Okay, yeah, anything else? Any other call to action?[00:19:13] swyx: No, enjoy Dev Day. Thank you. Yeah. That's it.[00:19:16] Olivier Godement, Head of Product, OpenAI[00:19:16] AI Charlie: The latent space crew then talked to Olivier Godement, head of product for the OpenAI platform, who led the entire Dev Day keynote and introduced all the major new features and updates that we talked about today.[00:19:28] swyx: Okay, so we are here with Olivier Godement. That's right.[00:19:32] swyx: I don't pronounce French. That's fine. It was perfect. And it was amazing to see your keynote today. What was the back story of, of preparing something like this? Preparing, like, Dev Day? It[00:19:43] Olivier Godement: essentially came from a couple of places. Number one, excellent reception from last year's Dev Day.[00:19:48] Olivier Godement: Developers, startup founders, researchers want to spend more time with OpenAI, and we want to spend more time with them as well. And so for us, like, it was a no brainer, frankly, to do it again, like, you know, like a nice conference. The second thing is going global. We've done a few events like in Paris and like a few other like, you know, non European, non American countries.[00:20:05] Olivier Godement: And so this year we're doing SF, Singapore, and London. To frankly just meet more developers.[00:20:10] swyx: Yeah, I'm very excited for the Singapore one.[00:20:12] Olivier Godement: Ah,[00:20:12] swyx: yeah. Will you be[00:20:13] Olivier Godement: there?[00:20:14] swyx: I don't know. I don't know if I got an invite. No. I can't just talk to you. Yeah, like, and then there was some speculation around October 1st.[00:20:22] Olivier Godement: Yeah. Is it because[00:20:23] swyx: 01, October 1st? It[00:20:25] Olivier Godement: has nothing to do. I discovered the tweet yesterday where like, people are so creative. No one, there was no connection to October 1st. But in hindsight, that would have been a pretty good meme by Tiana. Okay.[00:20:37] swyx: Yeah, and you know, I think like, OpenAI's outreach to developers is something that I felt the hole in 2022, when like, you know, like, people were trying to build a ChatGPT, and like, there was no function calling, all that stuff that you talked about in the past.[00:20:51] swyx: And that's why I started my own conference as like like, here's our little developer conference thing. And, but to see this OpenAI Dev Day now, and like to see so many developer oriented products coming to OpenAI, I think it's really encouraging.[00:21:02] Olivier Godement: Yeah, totally.
It's that's what I said, essentially, like, developers are basically the people who make the best connection between the technology and, you know, the future, essentially.[00:21:14] Olivier Godement: Like, you know, essentially see a capability, see a low level, like, technology, and are like, hey, I see how that application or that use case that can be enabled. And so, in the direction of enabling, like, AGI, like, all of humanity, it's a no brainer for us, like, frankly, to partner with Devs.[00:21:31] Alessio: And most importantly, you almost never had waitlists, which, compared to like other releases, people usually, usually have.[00:21:38] Alessio: What is the, you know, you had prompt caching, you had real time voice API, we, you know, Shawn did a long Twitter thread, so people know the releases. Yeah. What is the thing that was like sneakily the hardest to actually get ready for, for that day, or like, what was the kind of like, you know, last 24 hours, anything that you didn't know was gonna work?[00:21:56] Olivier Godement: Yeah. They're all fairly, like, I would say, involved, like, features to ship. So the team has been working for a month, all of them. The one which I would say is the newest for OpenAI is the real time API. For a couple of reasons. I mean, one, you know, it's a new modality. Second, like, it's the first time that we have an actual, like, WebSocket based API.[00:22:16] Olivier Godement: And so, I would say that's the one that required, like, the most work over the month. To get right from a developer perspective and to also make sure that our existing safety mitigations worked well with like real time audio in and audio out.[00:22:30] swyx: Yeah, what design choices or what was like the sort of design choices that you want to highlight?[00:22:35] swyx: Like, you know, like I think for me, like, WebSockets, you just receive a bunch of events. It's two way. I obviously don't have a ton of experience. I think a lot of developers are going to have to embrace this real time programming. Like, what are you designing for, or like, what advice would you have for developers exploring this?[00:22:51] Olivier Godement: The core design hypothesis was essentially, how do we enable, like, human level latency? We did a bunch of tests, like, on average, like, human beings, like, you know, takes, like, something like 300 milliseconds to converse with each other. And so that was the design principle, essentially. Like, working backward from that, and, you know, making the technology work.[00:23:11] Olivier Godement: And so we evaluated a few options, and WebSockets was the one that we landed on. So that was, like, one design choice. A few other, like, big design choices that we had to make: prompt caching. Prompt caching, the design, like, target was automated from the get go. Like, zero code change from the developer.[00:23:27] Olivier Godement: That way you don't have to learn, like, what is a prompt prefix, and, you know, how long does a cache work, like, we just do it as much as we can, essentially. So that was a big design choice as well. And then finally, on distillation, like, and evaluation. The big design choice was something I learned at Skype, like in my previous job, like a philosophy around, like, a pit of success.[00:23:47] Olivier Godement: Like, what is essentially the, the, the minimum number of steps for the majority of developers to do the right thing?
Because when you do evals on fine tuning, there are many, many ways, like, to mess it up, frankly, like, you know, and have, like, a crappy model, like, evals that tell, like, a wrong story. And so our whole design was, okay, we actually care about, like, helping people who don't have, like, that much experience, like, evaluating a model, like, get, like, in a few minutes, like, to a good spot.[00:24:11] Olivier Godement: And so how do we essentially enable that pit of success, like, in the product flow?[00:24:15] swyx: Yeah, yeah, I'm a little bit scared to fine tune especially for vision, because I don't know what I don't know for stuff like vision, right? Like, for text, I can evaluate pretty easily. For vision let's say I'm like trying to, one of your examples was Grab.[00:24:33] swyx: Which, very close to home, I'm from Singapore. I think your example was like, they identified stop signs better. Why is that hard? Why do I have to fine tune that? If I fine tune that, do I lose other things? You know, like, there's a lot of unknowns with Vision that I think developers have to figure out.[00:24:50] swyx: For[00:24:50] Olivier Godement: sure. Vision is going to open up, like, a new, I would say, evaluation space. Because you're right, like, it's harder, like, you know, to tell correct from incorrect, essentially, with images. What I can say is we've been alpha testing, like, the Vision fine tuning, like, for several weeks at that point. We are seeing, like, even higher performance uplift compared to text fine tuning.[00:25:10] Olivier Godement: So that's, there is something here, like, we've been pretty impressed, like, in a good way, frankly. But, you know, how well it works. But for sure, like, you know, I expect the developers who are moving from one modality to, like, text and images will have, like, more, you know Testing, evaluation, like, you know, to set in place, like, to make sure it works well.[00:25:25] Alessio: The model distillation and evals is definitely, like, the most interesting. Moving away from just being a model provider to being a platform provider. How should people think about being the source of truth? Like, do you want OpenAI to be, like, the system of record of all the prompting? Because people sometimes store it in, like, different data sources.[00:25:41] Alessio: And then, is that going to be the same as the models evolve? So you don't have to worry about, you know, refactoring the data, like, things like that, or like future model structures.[00:25:51] Olivier Godement: The vision is if you want to be a source of truth, you have to earn it, right? Like, we're not going to force people, like, to pass us data.[00:25:57] Olivier Godement: There is no value prop, like, you know, for us to store the data. The vision here is at the moment, like, most developers, like, use like a one size fits all model, like be off the shelf, like GPT-4o essentially. The vision we have is fast forward a couple of years. I think, like, most developers will essentially, like, have a.[00:26:15] Olivier Godement: An automated, continuous, fine tuned model. The more, like, you use the model, the more data you pass to the model provider, like, the model is automatically, like, fine tuned, evaluated against some eval sets, and essentially, like, you don't have to every month, when there is a new snapshot, like, you know, to go online and, you know, try a few new things.[00:26:34] Olivier Godement: That's a direction. We are pretty far away from it.
But I think, like, that evaluation and distillation product are essentially a first good step in that direction. It's like, hey, if you're excited by that direction, you give us the evaluation data. We can actually log your completion data and start to do some automation on your behalf.[00:26:52] Alessio: And then you can do evals for free if you share data with OpenAI. How should people think about when it's worth it, when it's not? Sometimes people get overly protective of their data when it's actually not that useful. But how should developers think about when it's right to do it, when not, or[00:27:07] Olivier Godement: if you have any thoughts on it?[00:27:08] Olivier Godement: The default policy is still the same, like, you know, we don't train on, like, any API data unless you opt in. What we've seen from feedback is evaluation can be expensive. Like, if you run, like, O1 evals on, like, thousands of samples Like, your bill will get increased, like, you know, pretty pretty significantly.[00:27:22] Olivier Godement: That's problem statement number one. Problem statement number two is, essentially, I want to get to a world where whenever OpenAI ships a new model snapshot, we have full confidence that there is no regression for the task that developers care about. And for that to be the case, essentially, we need to get evals.[00:27:39] Olivier Godement: And so that, essentially, is sort of a two birds, one stone. It's like, we subsidize, basically, the evals. And we also use the evals when we ship new models to make sure that we keep going in the right direction. So, in my sense, it's a win win, but again, completely opt in. I expect that many developers will not want to share their data, and that's perfectly fine to me.[00:27:56] swyx: Yeah, I think free evals though, very, very good incentive. I mean, it's a fair trade. You get data, we get free evals. Exactly,[00:28:04] Olivier Godement: and we sanitize PII, everything. We have no interest in the actual sensitive data. We just want to have good evaluation on the real use cases.[00:28:13] swyx: Like, I always want to eval the eval. I don't know if that ever came up.[00:28:17] swyx: Like, sometimes the evals themselves are wrong, and there's no way for me to tell you.[00:28:22] Olivier Godement: Everyone who is starting with LLM, teaching with LLM, is like, Yeah, evaluation, easy, you know, I've done testing, like, all my life. And then you start to actually be able to eval, understand, like, all the corner cases, And you realize, wow, there's like a whole field in itself.[00:28:35] Olivier Godement: So, yeah, good evaluation is hard and so, yeah. Yeah, yeah.[00:28:38] swyx: But I think there's a, you know, I just talked to Braintrust which I think is one of your partners. Mm-hmm. They also emphasize code based evals versus your sort of low code. What I see is like, I don't know, maybe there's some more that you didn't demo.[00:28:53] swyx: Yours is kind of like a low code experience, right, for evals. Would you ever support like a more code based, like, would I run code on OpenAI's eval platform?[00:29:02] Olivier Godement: For sure. I mean, we meet developers where they are, you know. At the moment, the demand was more for like, you know, easy to get started, like eval.
But, you know, if we need to expose like an evaluation API, for instance, for people like, you know, to pass, like, you know, their existing test data we'll do it.[00:29:15] Olivier Godement: So yeah, there is no, you know, philosophical, I would say, like, you know, misalignment on that. Yeah,[00:29:19] swyx: yeah, yeah. What I think this is becoming, by the way, and I don't, like it's basically, like, you're becoming AWS. Like, the AI cloud. And I don't know if, like, that's a conscious strategy, or it's, like, It doesn't even have to be a conscious strategy.[00:29:33] swyx: Like, you're going to offer storage. You're going to offer compute. You're going to offer networking. I don't know what networking looks like. Networking is maybe, like, Caching or like it's a CDN. It's a prompt CDN.[00:29:45] Alex Volkov: Yeah,[00:29:45] swyx: but it's the AI versions of everything, right? Do you like do you see the analogies or?[00:29:52] Olivier Godement: Whenever I talk to developers, I feel like good models are just half of the story to build a good app. There's a lot more you need to do. Evaluation is the perfect example. Like, you know, you can have the best model in the world; if you're in the dark, like, you know, it's really hard to gain the confidence. And so our philosophy is[00:30:11] Olivier Godement: The whole like software development stack is being basically reinvented, you know, with LLMs. There is no freaking way that OpenAI can build everything. Like there is just too much to build, frankly. And so my philosophy is, essentially, we'll focus on like the tools which are like the closest to the model itself.[00:30:28] Olivier Godement: So that's why you see us like, you know, investing quite a bit in like fine tuning, distillation, and evaluation, because we think that it actually makes sense to have like in one spot, Like, you know, all of that. Like, there is some sort of virtuous circle, essentially, that you can set in place. But stuff like, you know, LLMOps, like tools which are, like, further away from the model, I don't know if you want to do, like, you know, super elaborate, like, prompt management, or, you know, like, tooling, like, I'm not sure, like, you know, OpenAI has, like, such a big edge, frankly, like, you know, to build this sort of tools.[00:30:56] Olivier Godement: So that's how we view it at the moment. But again, frankly, the philosophy is super simple. The strategy is super simple. It's meeting developers where they want us to be. And so, you know that's frankly, like, you know, day in, day out, like, you know, what I try to do.[00:31:08] Alessio: Cool. Thank you so much for the time.[00:31:10] Alessio: I'm sure you,[00:31:10] swyx: Yeah, I have more questions on, a couple questions on voice, and then also, like, your call to action, like, what you want feedback on, right? So, I think we should spend a bit more time on voice, because I feel like that's, like, the big splash thing. I talked well Well, I mean, I mean, just what is the future of real time for OpenAI?[00:31:28] swyx: Yeah. Because I think obviously video is next. You already have it in the, the ChatGPT desktop app. Do we just have a permanent, like, you know, like, are developers just going to be, like, sending sockets back and forth with OpenAI? Like how do we program for that? Like, what what is the future?[00:31:44] Olivier Godement: Yeah, that makes sense.
I think with multimodality, like, real time is quickly becoming, like, you know, essentially the right experience, like, to build an application. Yeah. So my expectation is that we'll see, like, a non trivial, like, volume of applications, like, moving to the real time API. Like if you zoom out, like, audio until basically now...[00:32:05] Olivier Godement: Audio on the web, in apps, was basically very much like a second class citizen. Like, you basically did, like, an audio chatbot for users who did not have a choice. You know, they were, like, struggling to read, or I don't know, they were, like, not super educated with technology. And so, frankly, it was, like, the crappy option, you know, compared to text.[00:32:25] Olivier Godement: But when you talk to people in the real world, the vast majority of people, like, prefer to talk and listen instead of typing and writing.[00:32:34] swyx: We speak before we write.[00:32:35] Olivier Godement: Exactly. I don't know. I mean, I'm sure it's the case for you in Singapore. For me, my friends in Europe, the number of, like, WhatsApp, like, voice notes they receive every day, I mean, just people, it makes sense, frankly, like, you know.[00:32:45] Olivier Godement: Chinese. Chinese, yeah.[00:32:46] swyx: Yeah,[00:32:47] Olivier Godement: all voice. You know, it's easier. There is more emotion. I mean, you know, you get the point across, like, pretty well. And so my personal ambition for, like, the real time API and, like, audio in general is to make, like, audio and, like, multimodality, like, truly a first class experience.[00:33:01] Olivier Godement: Like, you know, if you're, like, you know, the amazing, like, super bold, like, startup out of YC, you want to build, like, the next, like, billion, like, you know, user application, to make it, like, truly voice first and make it feel, like, you know, an actual good, like, you know, product experience. So that's essentially the ambition, and I think, like, yeah, it could be pretty big.[00:33:17] swyx: Yeah. I think one issue that people have with the voice so far, as released in advanced voice mode, is the refusals.[00:33:24] Alex Volkov: Yeah.[00:33:24] swyx: You guys had a very inspiring model spec. I think Joanne worked on that. Where you said, like, yeah, we don't want to overly refuse all the time. In fact, like, even if, like, not safe for work, like, in some occasions, it's okay.[00:33:38] swyx: How, is there an API that we can say, not safe for work, okay?[00:33:41] Olivier Godement: I think we'll get there. I think we'll get there. The model spec, like, nailed it, like, you know. It nailed it! It's so good! Yeah, we are not in the business of, like, policing, you know, if you can say, like, vulgar words or whatever. You know, there are some use cases, like, you know, I'm writing, like, a Hollywood, like, script, I want to say, like, will go on, and it's perfectly fine, you know?[00:33:59] Olivier Godement: And so I think the direction where we'll go here is that basically there will always be, like, you know, a set of behaviors that we will, you know, just, like, forbid, frankly, because they're illegal or against our terms of service.
But then there will be, like, you know, some more, like, risky, like, themes, which are completely legal, like, you know, vulgar words or, you know, not safe for work stuff.[00:34:17] Olivier Godement: Where basically we'll expose, like, controllable, like, safety, like, knobs in the API to basically allow you to say, hey, that theme okay, that theme not okay. How sensitive do you want the threshold to be on safety refusals? I think that's the direction. So a[00:34:31] swyx: safety API.[00:34:32] Olivier Godement: Yeah, in a way, yeah.[00:34:33] swyx: Yeah, we've never had that.[00:34:34] Olivier Godement: Yeah.[00:34:35] swyx: 'Cause right now, it is whatever you decide. And then it's, that's it. That would be the main reason I don't use OpenAI voice, is because of[00:34:42] Olivier Godement: it's over policed. Over refusals. Yeah. Yeah, yeah. No, we gotta fix that. Yeah. Like singing,[00:34:47] Alessio: we're trying to do voice. I'm a singer.[00:34:49] swyx: And you, you locked off singing.[00:34:51] swyx: Yeah,[00:34:51] Alessio: yeah, yeah.[00:34:52] swyx: But I, I understand music gets you in trouble. Okay. Yeah. So then, and then just generally, like, what do you want to hear from developers? Right? We have all developers watching, you know, what feedback do you want? Anything specific as well, like from, especially from today, anything that you are unsure about, that you are like, our feedback could really help you decide?[00:35:09] swyx: For sure.[00:35:10] Olivier Godement: I think, essentially, it's becoming pretty clear after today that, you know, I would say, the OpenAI direction has become pretty clear, like, you know, after today. Investment in reasoning, investment in multimodality, investment as well, like, in, I would say, tool use, like function calling. To me, the biggest question I have is, you know, where should we put the cursor next?[00:35:30] Olivier Godement: I think we need all three of them, frankly, like, you know, so we'll keep pushing.[00:35:33] swyx: Hire 10,000 people, or actually, no need, build a bunch of bots.[00:35:37] Olivier Godement: Exactly. And so, like, is O1 smart enough for your problems? Like, you know, let's set aside for a second the existing models, like, for the apps that you would love to build, is O1 basically it in reasoning, or do we still have, like, you know, a step to do?[00:35:50] Olivier Godement: Preview is not enough, I[00:35:52] swyx: need the full one.[00:35:53] Olivier Godement: Yeah, so that's exactly that sort of feedback. Essentially, what I would love is for developers... I mean, there's a thing that Sam has been saying, like, over and over again, like, you know, it's easier said than done, but I think it's directionally correct. As a developer, as a founder, you basically want to build an app which is a bit too difficult for the model today, right?[00:36:12] Olivier Godement: Like, what you think is right, it's like, sort of working, sometimes not working. And that way, you know, that basically gives us, like, a goalpost, and be like, okay, that's what you need to enable with the next model release, like, in a few months. And so I would say that, usually, like, that's the sort of feedback which is, like, the most useful that I can, like, directly, like, you know, incorporate.[00:36:33] swyx: Awesome. I think that's our time. Thank you so much, guys. Yeah, thank you so much.[00:36:38] AI Charlie: Thank you.
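As a concrete aside on Olivier's regression point above ("no regression for the tasks that developers care about"), here is a minimal sketch of the kind of personal regression eval he describes, in Python with the official openai client. The toy task list, the naive substring check, and the two snapshot names are illustrative assumptions on our part, not OpenAI's evals product.

# Minimal sketch of a personal regression eval across model snapshots.
# Tasks, snapshot names, and the substring check are illustrative only.
from openai import OpenAI

client = OpenAI()

TASKS = [  # (prompt, substring the answer must contain)
    ("Reply with the capital of France.", "paris"),
    ("What is 17 * 24? Reply with just the number.", "408"),
]

def pass_rate(model: str) -> float:
    passed = 0
    for prompt, expected in TASKS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        if expected in resp.choices[0].message.content.lower():
            passed += 1
    return passed / len(TASKS)

old = pass_rate("gpt-4o-2024-05-13")   # example snapshot names
new = pass_rate("gpt-4o-2024-08-06")
print(f"old={old:.0%} new={new:.0%}")
if new < old:
    print("Regression on your tasks: hold off on upgrading.")

Swyx's "eval the eval" worry applies here too: the substring check is deliberately dumb, and in practice you would want a judge you trust more than this.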
We were particularly impressed that Olivier addressed the not safe for work moderation policy question head on, as that had only previously been picked up on in Reddit forums. This is an encouraging sign that we will return to in the closing candor with Sam Altman at the end of this episode.[00:36:57] Romain Huet, Head of DX, OpenAI[00:36:57] AI Charlie: Next, a chat with Romain Huet, friend of the pod, AI Engineer World's Fair closing keynote speaker, and head of developer experience at OpenAI, on his incredible live demos and advice to AI engineers on all the new modalities.[00:37:12] Alessio: Alright, we're live from OpenAI Dev Day. We're with Romain, who just did two great demos on stage.[00:37:17] Alessio: And he's been a friend of Latent Space, so thanks for taking some of the time.[00:37:20] Romain Huet: Of course, yeah, thank you for being here and spending the time with us today.[00:37:23] swyx: Yeah, I appreciate you guys putting this on. I know it's like extra work, but it really shows the developers that you care about reaching out.[00:37:31] Romain Huet: Yeah, of course, I think when you go back to the OpenAI mission, I think for us it's super important that we have the developers involved in everything we do. Making sure that, you know, they have all of the tools they need to build successful apps. And we really believe that the developers are always going to invent the ideas, the prototypes, the fun factors of AI that we can't build ourselves.[00:37:49] Romain Huet: So it's really cool to have everyone here.[00:37:51] swyx: We had Michelle from you guys on. Yes, great episode. She very seriously said API is the path to AGI. Correct. And people in our YouTube comments were like, API is not AGI. I'm like, no, she's very serious. API is the path to AGI. Like, you're not going to build everything like the developers are, right?[00:38:08] swyx: Of[00:38:08] Romain Huet: course, yeah, that's the whole value of having a platform and an ecosystem of amazing builders who can, like, in turn, create all of these apps. I'm sure we talked about this before, but there's now more than 3 million developers building on OpenAI, so it's pretty exciting to see all of that energy into creating new things.[00:38:26] Alessio: I was going to say, you built two apps on stage today, an International Space Station tracker and then a drone. The hardest thing must have been opening Xcode and setting that up. Now, like, the models are so good that they can do everything else. Yes. You had two modes of interaction. You had kind of like a ChatGPT app to get the plan with O1, and then you had Cursor to apply some of the changes.[00:38:47] Alessio: Correct. How should people think about the best way to consume the coding models, especially both for, you know, brand new projects and then existing projects that you're trying to modify?[00:38:56] Romain Huet: Yeah. I mean, one of the things that's really cool about O1 Preview and O1 Mini being available in the API is that you can use it in your favorite tools, like Cursor, like I did, right?[00:39:06] Romain Huet: And that's also what, like, Devin from Cognition can use in their own software engineering agents. In the case of Xcode, like, it's not quite deeply integrated in Xcode, so that's why I had, like, ChatGPT side by side.
But it's cool, right, because I could instruct O1 Preview to be, like, my coding partner and brainstorming partner for this app, but also consolidate all of the files and architect the app the way I wanted.[00:39:28] Romain Huet: So, all I had to do was just, like, port the code over to Xcode and zero shot the app build. I don't think I conveyed, by the way, how big a deal that is, but, like, you can now create an iPhone app from scratch, describing a lot of intricate details that you want, and your vision comes to life in, like, a minute.[00:39:47] Romain Huet: It's pretty outstanding.[00:39:48] swyx: I have to admit, I was a bit skeptical, because if I open up Xcode, I don't know anything about iOS programming. You knew which file to paste it in. You probably set it up a little bit. So I'm like, I have to go home and test it. And I need the ChatGPT desktop app so that it can tell me where to click.[00:40:04] Romain Huet: Yeah, I mean like, Xcode and iOS development has become easier over the years since they introduced Swift and SwiftUI. I think back in the days of Objective C, or like, you know, the storyboard, it was a bit harder to get in for someone new. But now with Swift and SwiftUI, their dev tools are really exceptional.[00:40:23] Romain Huet: But now when you combine that with O1, as your brainstorming and coding partner, it's like your architect, effectively. That's the best way, I think, to describe O1. People ask me, like, can GPT-4 do some of that? And it certainly can. But I think it will just start spitting out code, right? And I think what's great about O1 is that it can, like, make up a plan.[00:40:42] Romain Huet: In this case, for instance, the iOS app had to fetch data from an API, it had to look at the docs, it had to look at, like, how do I parse this JSON, where do I store this thing, and kind of wire things up together. So that's where it really shines. Is Mini or Preview the better model that people should be using?[00:40:58] Romain Huet: Like, how? I think people should try both. We're obviously very excited about the upcoming O1 that we shared the evals for. But we noticed that O1 Mini is very, very good at everything math, coding, everything STEM. If, for your kind of brainstorming or your kind of science part, you need some broader knowledge, then reaching for O1 Preview is better.[00:41:20] Romain Huet: But yeah, I used O1 Mini for my second demo. And it worked perfectly. All I needed was very much, like, something rooted in code, architecting and wiring up, like, a front end, a backend, some UDP packets, some WebSockets, something very specific. And it did that perfectly.[00:41:35] swyx: And then maybe just talking about voice and Wanderlust, the app that keeps on giving, what's the backstory behind, like, preparing for all of that?[00:41:44] Romain Huet: You know, it's funny, because when, last year for Dev Day, we were trying to think about what could be a great demo app to show, like, an assistive experience. I've always thought travel is kind of a great use case, because you have, like, pictures, you have locations, you have the need for translations, potentially.[00:42:01] Romain Huet: There's, like, so many use cases that are bound to travel that I thought last year, let's use a travel app. And that's how Wanderlust came to be. But of course, a year ago, all we had was a text based assistant.
And now we thought, well, if there's a voice modality, what if we just bring this app back as a wink?[00:42:19] Romain Huet: And what if we were interacting better with voice? And so with this new demo, what we wanted was to have a complete conversation in real time with the app, but also the thing we wanted to highlight was the ability to call tools and functions, right? So, like in this case, we placed a phone call using the Twilio API, interfacing with our AI agents, but developers are so smart that they'll come up with so many great ideas that we could not think of ourselves, right?[00:42:48] Romain Huet: But what if you could have, like, you know, a 911 dispatcher? What if you could have, like, a customer service center that is much smarter than what we've been used to today? There's gonna be so many use cases for real time, it's awesome.[00:43:00] swyx: Yeah, and sometimes actually, like, this should kill phone trees. Like there should not be, like, dial one[00:43:07] Romain Huet: of course, para[00:43:08] swyx: español, you know? Yeah, exactly. Or whatever. I dunno.[00:43:12] Romain Huet: I mean, even you starting to speak Spanish would just do the thing, you know, you don't even have to ask. So yeah, I'm excited for this future where we don't have to interact with those legacy systems.[00:43:22] swyx: Yeah. Yeah. Is there anything, so you are doing function calling in a streaming environment. So basically it's WebSockets. It's UDP, I think. It's basically not guaranteed to be exactly once delivery. Like, are there any coding challenges that you encountered when building this?[00:43:39] Romain Huet: Yeah, it's a bit more delicate to get into it.[00:43:41] Romain Huet: We also think that for now, what we shipped is a beta of this API. I think there's much more to build onto it. It does have the function calling and the tools. But we think that, for instance, if you want to have something very robust on your client side, maybe you want to have WebRTC as a client, right, as opposed to, like, directly working with the sockets at scale.[00:43:58] Romain Huet: So that's why we have partners like LiveKit and Agora, if you want to use them. And I'm sure we'll have many more in the future. But yeah, we keep on iterating on that, and I'm sure the feedback of developers in the weeks to come is going to be super critical for us to get it right.[00:44:16] swyx: Yeah, I think LiveKit has been fairly public that they are used in the ChatGPT app. Like, is it just all open source, and we just use it directly with OpenAI, or do we use LiveKit Cloud or something?[00:44:28] Romain Huet: So right now we released the API, we released some sample code also, and reference clients for people to get started with our API.[00:44:35] Romain Huet: And we also partnered with LiveKit and Agora, so they also have their own, like, ways to help you get started that plug natively with the real time API. So depending on the use case, people can decide what to use. If you're working on something that's completely client side, or if you're working on something on the server side for the voice interaction, you may have different needs, so we want to support all of those.[00:44:55] Alessio: I know you gotta run.
Is there anything that you want the AI engineering community to give feedback on specifically, like even down to, like, you know, a specific API endpoint, or like, what's, like, the thing that you want? Yeah. I[00:45:08] Romain Huet: mean, you know, if we take a step back, I think Dev Day this year is different from last year in a few different ways.[00:45:15] Romain Huet: But one way is that we wanted to keep it intimate, even more intimate than last year. We wanted to make sure that the community is in the spotlight. That's why we have community talks and everything. And the takeaway here is, like, learning from the very best developers and AI engineers.[00:45:31] Romain Huet: And so, you know, we want to learn from them. Most of what we shipped this morning, including things like prompt caching, the ability to generate prompts quickly in the playground, or even things like vision fine tuning, these are all things that developers have been asking of us. And so, the takeaway I would leave them with is to say, like, hey, the roadmap that we're working on is heavily influenced by them and their work.[00:45:53] Romain Huet: And so we love feedback, from high level feature requests, as you say, down to, like, very intricate details of an API endpoint. We love feedback, so yes, that's how we build this API.[00:46:05] swyx: Yeah, I think the model distillation thing as well, it might be, like, the most boring, but, like, actually used a lot.[00:46:12] Romain Huet: True, yeah. And I think maybe the most unexpected, right, because I think if I read Twitter correctly the past few days, a lot of people were expecting us to ship the real time API for speech to speech. I don't think developers were expecting us to have more tools for distillation, and we really think that's gonna be a big deal, right?[00:46:30] Romain Huet: If you're building apps where you want, like, low latency, low cost, but high performance, high quality on the use case, distillation is gonna be amazing.[00:46:40] swyx: Yeah. I sat in the distillation session just now, and they showed how they distilled from 4o to 4o mini, and it was, like, only a 2% hit in the performance and, like, 50x cheaper.[00:46:49] swyx: Yeah,[00:46:50] Romain Huet: I was there as well for the Superhuman inspired kind of use case, for an email client. Yeah, this was really good. Cool, man! Thanks so much for having me. Thanks again for being here today. It's always[00:47:00] AI Charlie: great to have you. As you might have picked up at the end of that chat, there were many sessions throughout the day focused on specific new capabilities.[00:47:08] Michelle Pokrass, Head of API at OpenAI ft. Simon Willison[00:47:08] AI Charlie: Like the new model distillation features combining evals and fine tuning. For our next session, we are delighted to bring back two former guests of the pod, which is something listeners have been greatly enjoying in our second year of doing the Latent Space podcast. Michelle Pokrass of the API team joined us recently to talk about structured outputs, and today gave an updated long form session at Dev Day, describing the implementation details of the new structured output mode.[00:47:39] AI Charlie: We also got her updated thoughts on the voice mode API we discussed in her episode, now that it is finally announced.
She is joined by friend of the pod and super blogger Simon Willison, who also came back as guest co-host in our Dev Day 2023 episode.[00:47:56] Alessio: Great, we're back live at Dev Day, returning guest Michelle, and then returning guest co-host, for the... fourth?[00:48:03] Alessio: Fourth, yeah, I don't know. I've lost count. I think it's been a few. Simon Willison is back. Yeah, we just wrapped, we just wrapped everything up. Congrats on getting everything live. Simon did a great, like, blog, so if you haven't caught up, I[00:48:17] Simon Willison: wrote my, I implemented it. I was starting my live blog while waiting for the first talk to start, using, like, GPT-4. It wrote me the JavaScript, and I got that live just in time, and then, yeah, I was live blogging the whole day.[00:48:28] swyx: Are you a Cursor enjoyer?[00:48:29] Simon Willison: I haven't really gotten into Cursor yet, to be honest. I just haven't spent enough time for it to click, I think. I'm more a copy and paste things out of Claude and ChatGPT. Yeah. It's interesting.[00:48:39] swyx: Yeah. I've converted to Cursor, and O1 is so easy to just toggle on and off.[00:48:45] Alessio: What's your workflow?[00:48:46] Alessio: VS[00:48:48] Michelle Pokrass: Code Copilot, so, yep, same here. Team Copilot. Copilot is actually the reason I joined OpenAI. It was, you know, before ChatGPT, this is the thing that really got me. So I'm still into it, but I keep meaning to try out Cursor, and I think now that things have calmed down, I'm gonna give it a real go.[00:49:03] swyx: Yeah, it's a big thing to change your tool of choice.[00:49:06] swyx: Yes,[00:49:06] Michelle Pokrass: yeah, I'm pretty dialed, so.[00:49:09] swyx: I mean, you know, if you want, you can just fork VS Code and make your own. That's the thing to do, right? We joked about doing a hackathon where the only thing you do is fork VS Code, and may the best fork win.[00:49:20] Michelle Pokrass: Nice.[00:49:22] swyx: That's actually a really good idea. Yeah, what's up?[00:49:26] swyx: I mean, congrats on launching everything today. I know, like, we touched on it a little bit, but, like, everyone was kind of guessing that the voice API was coming, and, like, we talked about it in our episode. How do you feel going into the launch? Like, any design decisions that you want to highlight?[00:49:41] Michelle Pokrass: Yeah, super jazzed about it. The team has been working on it for a while. It's, like, a very different API for us. It's the first WebSocket API, so a lot of different design decisions to be made. It's, like, what kind of events do you send? When do you send an event? What are the event names? What do you send, like, on connection versus on future messages?[00:49:57] Michelle Pokrass: So there have been a lot of interesting decisions there. The team has also hacked together really cool projects as we've been testing it. One that I really liked is we had an internal hackathon for the API team. And some folks built, like, a little hack that you could use to, like, control vim with voice mode, and you would tell it, like, nice, write a file, and it would, you know, know all the vim commands and pipe those in.[00:50:18] Michelle Pokrass: So yeah, a lot of cool stuff we've been hacking on, and really excited to see what people build with it.[00:50:23] Simon Willison: I've gotta call out a demo from today. I think it was Katja who had a 3D visualization of the solar system, like a WebGL solar system, you could talk to.
That is one of the coolest conference demos I've ever seen.[00:50:33] Simon Willison: That was so convincing. I really want the code. I really want the code for that to get put out there. I'll talk[00:50:39] Michelle Pokrass: to the team. I think we can[00:50:40] Simon Willison: probably
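To make Michelle's WebSocket design questions above concrete (which events to send, when to send them, what to name them, what goes on connection versus on later messages), here is a minimal sketch of the client side of such an event protocol, in Python with the websockets library. The URL and every event name below are invented for illustration; they are not the documented API surface.

# Sketch of a client for a WebSocket event protocol like the one Michelle
# describes. The URL and all event names are made-up placeholders.
import asyncio
import json
import websockets  # pip install websockets

async def talk():
    async with websockets.connect("wss://example.com/v1/realtime") as ws:
        # On connection: send configuration once, up front.
        await ws.send(json.dumps(
            {"type": "session.update", "session": {"voice": "alloy"}}))
        # Future messages: each user turn is its own named event.
        await ws.send(json.dumps(
            {"type": "input_text.append", "text": "What's the weather?"}))
        await ws.send(json.dumps({"type": "response.create"}))
        # The server streams typed events back; dispatch on the "type" field.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event.get("delta", ""), end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(talk())

The design questions Michelle lists fall straight out of this shape: one-time configuration lives in the connection-time event, per-turn input gets its own event type, and the client is a dispatch loop over server event names.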
In this special crossover episode of The Cognitive Revolution, Nathan shares an insightful conversation from the Latent.Space podcast. Swyx and Alessio interview Alistair Pullen of Cosine, creators of Genie, showcasing the cutting edge of AI automation in software engineering. Learn how Cosine achieves state-of-the-art results on the SWE-bench benchmark by implementing advanced AI techniques. This episode complements Nathan's recent discussion on AI Automation, demonstrating how far these practices can be pushed in real-world applications. Don't miss this opportunity to explore the future of AI-driven software development and its implications for businesses across industries. Check out the Latent.Space podcast here: https://www.latent.space Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: WorkOS: Building an enterprise-ready SaaS app? WorkOS has got you covered with easy-to-integrate APIs for SAML, SCIM, and more. Join top startups like Vercel, Perplexity, Jasper & Webflow in powering your app with WorkOS. Enjoy a free tier for up to 1M users! Start now at https://bit.ly/WorkOS-Turpentine-Network Weights & Biases Weave: Weights & Biases Weave is a lightweight AI developer toolkit designed to simplify your LLM app development. With Weave, you can trace and debug input, metadata and output with just 2 lines of code. Make real progress on your LLM development and visit the following link to get started with Weave today: https://wandb.me/cr 80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues. Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ CHAPTERS: (00:00:00) About the Show (00:00:22) Sponsors: WorkOS (00:01:22) About the Episode (00:04:29) Alistair and Cosine intro (00:13:50) Building the Code Retrieval Tool (00:17:36) Sponsors: Weights & Biases Weave | 80,000 Hours (00:20:15) Developing Genie and Fine-tuning Process (00:27:41) Working with Customer Data (00:30:53) Code Retrieval Challenges and Solutions (00:36:39) Sponsors: Omneky (00:37:02) Planning and Reasoning in AI Models (00:45:55) Language Support and Generalization (00:49:46) Fine-tuning Experience with OpenAI (00:52:56) Synthetic Data and Self-improvement Loop (00:55:57) Benchmarking and SWE-bench Results (01:01:47) Future Plans for Genie (01:03:02) Industry Trends and Cursor's Success (01:05:23) Calls to Action and Ideal Customers (01:08:43) Outro
Today's guest, Nicholas Carlini, a research scientist at DeepMind, argues that we should be focusing more on what AI can do for us individually, rather than trying to have an answer for everyone.

"How I Use AI" - A Pragmatic Approach

Carlini's blog post "How I Use AI" went viral for good reason. Instead of giving a personal opinion about AI's potential, he simply laid out how he, as a security researcher, uses AI tools in his daily work. He divided it into 12 sections:

* To make applications
* As a tutor
* To get started
* To simplify code
* For boring tasks
* To automate tasks
* As an API reference
* As a search engine
* To solve one-offs
* To teach me
* Solving solved problems
* To fix errors

Each of the sections has specific examples, so we recommend going through it. It also includes all the prompts used; in the "make applications" case, it's 30,000 words total!

My personal takeaway is that the majority of the work AI can do successfully is what humans dislike doing: writing boilerplate code, looking up docs, taking repetitive actions, etc. These are usually boring tasks with little creativity, but with a lot of structure. This is the strongest argument for why LLMs, especially for code, are more beneficial to senior employees: if you can get the boring stuff out of the way, there's a lot more value you can generate. This is less and less true as you go toward entry level jobs, which are mostly boring and repetitive tasks. Nicholas argues both sides at ~21:34 in the pod.

A New Approach to LLM Benchmarks

We recently did a Benchmarks 201 episode, a follow up to our original Benchmarks 101, and some of the issues have stayed the same. Notably, there's a big discrepancy between what benchmarks like MMLU test and what the models are used for. Carlini created his own domain-specific language for writing personalized LLM benchmarks. The idea is simple but powerful:

* Take tasks you've actually needed AI for in the past.
* Turn them into benchmark tests.
* Use these to evaluate new models based on your specific needs.

It can represent very complex tasks, from a single code generation to drawing a US flag using C:

"Write hello world in python" >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world")

"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> VisionLLMRun("What flag is shown in this image?") >> (SubstringEvaluator("United States") | SubstringEvaluator("USA"))

This approach solves a few problems:

* It measures what's actually useful to you, not abstract capabilities.
* It's harder for model creators to "game" your specific benchmark, a problem that has plagued standardized tests.
* It gives you a concrete way to decide if a new model is worth switching to, similar to how developers might run benchmarks before adopting a new library or framework.

Carlini argues that if even a small percentage of AI users created personal benchmarks, we'd have a much better picture of model capabilities in practice.
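For the curious, here is a toy reconstruction of how a >>-style pipeline like the examples above can be wired together in Python with operator overloading. The class names mirror Carlini's examples; the implementation details (and the stubbed LLMRun) are our assumptions, not his actual framework.

# Toy reconstruction of a ">>" eval pipeline DSL. Class names follow the
# examples above; everything else is an illustrative assumption.
import contextlib
import io

class Node:
    def __rshift__(self, other):      # node >> node chains stages
        return Pipeline([self, other])
    def __rrshift__(self, prompt):    # lets a plain string start a chain
        return Pipeline([Const(prompt), self])
    def __or__(self, other):          # evaluator | evaluator: either passes
        return AnyOf(self, other)
    def run(self, value):
        raise NotImplementedError

class Pipeline(Node):
    def __init__(self, stages):
        self.stages = stages
    def __rshift__(self, other):
        return Pipeline(self.stages + [other])
    def run(self, value=None):
        for stage in self.stages:     # feed each stage's output to the next
            value = stage.run(value)
        return value

class Const(Node):
    def __init__(self, value):
        self.value = value
    def run(self, _):
        return self.value

class LLMRun(Node):
    def run(self, prompt):
        # Stub: wire this to whatever model you are benchmarking.
        return "print('hello world')"

class PythonRun(Node):
    def run(self, code):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})            # sandbox this in real use
        return buf.getvalue()

class SubstringEvaluator(Node):
    def __init__(self, needle):
        self.needle = needle
    def run(self, text):
        return self.needle.lower() in text.lower()

class AnyOf(Node):
    def __init__(self, *evals):
        self.evals = evals
    def run(self, text):
        return any(e.run(text) for e in self.evals)

test = ("Write hello world in python"
        >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world"))
print(test.run())  # True if the (stubbed) model's program prints it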
AI Security

While much of the AI security discussion focuses on either jailbreaks or existential risks, Carlini's research targets the space in between. Some highlights from his recent work:

* LAION 400M data poisoning: By buying expired domains referenced in the dataset, Carlini's team could inject arbitrary images into models trained on LAION 400M. You can read the paper "Poisoning Web-Scale Training Datasets is Practical" for all the details. This is a great example of expanding the scope beyond the model itself, and looking at the whole system and how it can become vulnerable.
* Stealing model weights: They demonstrated how to extract parts of production language models (like OpenAI's) through careful API queries. This research, "Stealing Part of a Production Language Model", shows that even black-box access can leak sensitive information.
* Extracting training data: In some cases, they found ways to make models regurgitate verbatim snippets from their training data. He and Milad Nasr wrote a paper on this as well: "Scalable Extraction of Training Data from (Production) Language Models". They also think this might be applicable to extracting RAG results from a generation.

These aren't just theoretical attacks. They've led to real changes in how companies like OpenAI design their APIs and handle data. If you really miss logit_bias and logit results by token, you can blame Nicholas :)

We had a ton of fun also chatting about things like Conway's Game of Life, how much data can fit in a piece of paper, and porting Doom to Javascript. Enjoy!

Show Notes
* How I Use AI
* My Benchmark for LLMs
* Doom Javascript port
* Conway's Game of Life
* Tic-Tac-Toe in one printf statement
* International Obfuscated C Code Contest
* Cursor
* LAION 400M poisoning paper
* Man vs Machine at Black Hat
* Model Stealing from OpenAI
* Milad Nasr
* H.D. Moore
* Vijay Bolina
* Cosine.sh
* uuencode

Timestamps
* [00:00:00] Introductions
* [00:01:14] Why Nicholas writes
* [00:02:09] The Game of Life
* [00:05:07] "How I Use AI" blog post origin story
* [00:08:24] Do we need software engineering agents?
* [00:11:03] Using AI to kickstart a project
* [00:14:08] Ephemeral software
* [00:17:37] Using AI to accelerate research
* [00:21:34] Experts vs non-expert users as beneficiaries of AI
* [00:24:02] Research on generating less secure code with LLMs
* [00:27:22] Learning and explaining code with AI
* [00:30:12] AGI speculations?
* [00:32:50] Distributing content without social media
* [00:35:39] How much data do you think you can put on a single piece of paper?
* [00:37:37] Building personal AI benchmarks
* [00:43:04] Evolution of prompt engineering and its relevance
* [00:46:06] Model vs task benchmarking
* [00:52:14] Poisoning LAION 400M through expired domains
* [00:55:38] Stealing OpenAI models from their API
* [01:01:29] Data stealing and recovering training data from models
* [01:03:30] Finding motivation in your work

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:12]: Hey, and today we're in the in-person studio, which Alessio has gorgeously set up for us, with Nicholas Carlini. Welcome. Thank you. You're a research scientist at DeepMind. You work at the intersection of machine learning and computer security. You got your PhD from Berkeley in 2018, and also your BA from Berkeley as well. And mostly we're here to talk about your blogs, because you are so generous in just writing up what you know. Well, actually, why do you write?Nicholas [00:00:41]: Because I like, I feel like it's fun to share what you've done. I don't like writing, sufficiently didn't like writing, I almost didn't do a PhD, because I knew how much writing was involved in writing papers. I was terrible at writing when I was younger. I did, like, the remedial writing classes when I was in university, because I was really bad at it.
So I don't actually enjoy, I still don't enjoy the act of writing. But I feel like it is useful to share what you're doing, and I like being able to talk about the things that I'm doing that I think are fun. And so I write because I think I want to have something to say, not because I enjoy the act of writing.Swyx [00:01:14]: But yeah. It's a tool for thought, as they often say. Is there any sort of background or thing that people should know about you as a person? Yeah.Nicholas [00:01:23]: So I tend to focus on, like you said, I do security work, I try to like attacking things and I want to do like high quality security research. And that's mostly what I spend my actual time trying to be a productive member of society doing. But then I get distracted by things, and I just like, you know, working on random fun projects. Like a Doom clone in JavaScript.Swyx [00:01:44]: Yes.Nicholas [00:01:45]: Like that. Or, you know, I've done a number of things that have absolutely no utility. But are fun things to have done. And so it's interesting to say, like, you should work on fun things that just are interesting, even if they're not useful in any real way. And so that's what I tend to put up there is after I have completed something I think is fun, or if I think it's sufficiently interesting, write something down there.Alessio [00:02:09]: Before we go into like AI, LLMs and whatnot, why are you obsessed with the game of life? So you built multiplexing circuits in the game of life, which is mind boggling. So where did that come from? And then how do you go from just clicking boxes on the UI web version to like building multiplexing circuits?Nicholas [00:02:29]: I like Turing completeness. The definition of Turing completeness is a computer that can run anything, essentially. And the game of life, Conway's game of life is a very simple cellular 2D automaton where you have cells that are either on or off. And a cell becomes on if in the previous generation some configuration holds true and off otherwise. It turns out there's a proof that the game of life is Turing complete, that you can run any program in principle using Conway's game of life. I don't know. And so you can, therefore someone should. And so I wanted to do it. Some other people have done some similar things, but I got obsessed into like, if you're going to try and make it work, like we already know it's possible in theory. I want to try and like actually make something I can run on my computer, like a real computer I can run. And so yeah, I've been going on this rabbit hole of trying to make a CPU that I can run semi real time on the game of life. And I have been making some reasonable progress there. And yeah, but you know, Turing completeness is just like a very fun trap you can go down. A while ago, as part of a research paper, I was able to show that in C, if you call into printf, it's Turing complete. Like printf, you know, like, which like, you know, you can print numbers or whatever, right?Swyx [00:03:39]: Yeah, but there should be no like control flow stuff.Nicholas [00:03:42]: Because printf has a percent n specifier that lets you write an arbitrary amount of data to an arbitrary location. And the printf format specifier has an index into where it is in the loop that is in memory. So you can overwrite the location of where printf is currently indexing using percent n. So you can get loops, you can get conditionals, and you can get arbitrary data writes again.
So we sort of have another Turing complete language using printf, which again, like this has essentially zero practical utility, but like, it's just, I feel like a lot of people get into programming because they enjoy the art of doing these things. And then they go work on developing some software application and lose all the joy in it. And I want to still have joy in doing these things. And so on occasion, I try to stop doing productive, meaningful things and just like, what's a fun thing that we can do and try and make that happen.Alessio [00:04:39]: Awesome. So you've been kind of like a pioneer in the AI security space. You've done a lot of talks starting back in 2018. We'll kind of leave that to the end because I know the security part is, there's maybe a smaller audience, but it's a very intense audience. So I think that'll be fun. But everybody in our Discord started posting your how I use AI blog post and we were like, we should get Carlini on the podcast. And then you were so nice to just, yeah, and then I sent you an email and you're like, okay, I'll come.Swyx [00:05:07]: And I was like, oh, I thought that would be harder.Alessio [00:05:10]: I think there's, as you said in the blog posts, a lot of misunderstanding about what LLMs can actually be used for. What are they useful at? What are they not good at? And whether or not it's even worth arguing what they're not good at, because they're obviously not. So if you cannot count the R's in a word, they're like, it's just not what it does. So how painful was it to write such a long post, given that you just said that you don't like to write? Yeah. And then we can kind of run through the things, but maybe just talk about the motivation, why you thought it was important to do it.Nicholas [00:05:39]: Yeah. So I wanted to do this because I feel like most people who write about language models being good or bad have some underlying message of like, you know, they have their camp and their camp is like, AI is bad or AI is good or whatever. And they like, they spin whatever they're going to say according to their ideology. And they don't actually just look at what is true in the world. So I've read a lot of things where people say how amazing they are and how all programmers are going to be obsolete by 2024. And I've read a lot of things where people say like, they can't do anything useful at all. And, you know, like, they're just like, it's only the people who've come off of, you know, blockchain crypto stuff and are here to like make another quick buck and move on. And I don't really agree with either of these. And I'm not someone who cares really one way or the other how these things go. And so I wanted to write something that just says like, look, like, let's sort of ground in reality what we can actually do with these things. Because my actual research is in like security and showing that these models have lots of problems. Like this is like my day to day job is saying like, we probably shouldn't be using these in lots of cases. I thought I could have a little bit of credibility in saying, it is true. They have lots of problems. We maybe shouldn't be deploying them in lots of situations. And still, they are also useful. And that is the like, the bit that I wanted to get across is to say, I'm not here to try and sell you on anything. I just think that they're useful for the kinds of work that I do. And hopefully, some people would listen. And it turned out that a lot more people liked it than I thought.
But yeah, that was the motivation behind why I wanted to write this.Alessio [00:07:15]: So you had about a dozen sections of like how you actually use AI. Maybe we can just kind of run through them all. And then maybe the ones where you have extra commentary to add, we can... Sure.Nicholas [00:07:27]: Yeah, yeah. I didn't put as much thought into this as maybe was deserved. I probably spent, I don't know, definitely less than 10 hours putting this together.Swyx [00:07:38]: Wow.Alessio [00:07:39]: It took me close to that to do a podcast episode. So that's pretty impressive.Nicholas [00:07:43]: Yeah. I wrote it in one pass. I've gotten a number of emails of like, you got this editing thing wrong, you got this sort of other thing wrong. It's like, I just haven't looked at it. I feel like I still don't like writing. And so because of this, the way I tend to treat this is like, I will put it together into the best format that I can at the time, and then put it on the internet, and then never change it. And this is an aspect of like the research side of me is like, once a paper is published, like it is done as an artifact that exists in the world. I could forever edit the very first thing I ever put out to make it the most perfect version of what it is, and I would do nothing else. And so I feel like I find it useful to be like, this is the artifact, I will spend some certain amount of hours on it, which is what I think it is worth. And then I will just...Swyx [00:08:22]: Yeah.Nicholas [00:08:23]: Timeboxing.Alessio [00:08:24]: Yeah. Stop. Yeah. Okay. We just recorded an episode with the founder of Cosine, which is like an AI software engineer colleague. You said it took you 30,000 words to get GPT-4 to build you the, like, "can GPT-4 solve this" kind of app. Where are we in the spectrum where ChatGPT is all you need to actually build something versus I need a full on agent that does everything for me?Nicholas [00:08:46]: Yeah. Okay. So this was an... So I built a web app last year sometime that was just like a fun demo where you can guess if you can predict whether or not GPT-4 at the time could solve a given task. This is, as far as web apps go, very straightforward. You need basic HTML, CSS, you have a little slider that moves, you have a button, sort of animate the text coming to the screen. The reason people are going here is not because they want to see my wonderful HTML, right? I used to know how to do modern HTML in 2007, 2008. I was very good at fighting with IE6 and these kinds of things. I knew how to do that. I haven't had to build any web app stuff in the meantime, which means that I know how everything works, but I don't know any of the new... Flexbox is new to me. Flexbox is like 10 years old at this point, but it's just amazing being able to go to the model and just say, write me this thing and it will give me all of the boilerplate that I need to get going. Of course it's imperfect. It's not going to get you the right answer, and it doesn't do anything that's complicated right now, but it gets you to the point where the only remaining work that needs to be done is the interesting hard part for me, the actual novel part. Even the current models, I think, are entirely good enough at doing this kind of thing, that they're very useful. It may be the case that if you had something, like you were saying, a smarter agent that could debug problems by itself, that might be even more useful.
Currently though, you make a model into an agent by just copying and pasting error messages, for the most part. That's what I do, is you run it and it gives you some code that doesn't work, and either I'll fix the code, or it will give me buggy code and I won't know how to fix it, and I'll just copy and paste the error message and say, it tells me this. What do I do? And it will just tell me how to fix it. You can't trust these things blindly, but I feel like most people on the internet already understand that things on the internet, you can't trust blindly. And so this is not like a big mental shift you have to go through to understand that it is possible to read something and find it useful, even if it is not completely perfect in its output.Swyx [00:10:54]: It's very human-like in that sense. It's the same ring of trust, I kind of think about it that way, if you had trust levels.Alessio [00:11:03]: And there's maybe a couple that tie together. So there was like, to make applications, and then there's to get started, which is similar, you know, kickstarting maybe like a project that you know the LLM cannot solve. Is that kind of how you think about it?Nicholas [00:11:15]: Yeah. So for getting started on things is one of the cases where I think it's really great for some of these things, where I sort of use it as a personalized, help me use this technology I've never used before. So for example, I had never used Docker before January. I know what Docker is. Lucky you. Yeah, like I'm a computer security person, like I sort of, I have read lots of papers on, you know, all the technology behind how these things work. You know, I know all the exploits on them, I've done some of these things, but I had never actually used Docker. But I wanted to be able to run the outputs of language model stuff in some controlled, contained environment, which I know is the right application. So I just ask it like, I want to use Docker to do this thing, like, tell me how to run a Python program in a Docker container. And it like gives me a thing. I'm like, step back. You said Docker compose, I do not know what this word Docker compose is. Is this Docker? Help me. And like, it'll sort of tell me all of these things. And I'm sure there's this knowledge that's out there on the internet, like this is not some groundbreaking thing that I'm doing, but I just wanted it as a small piece of one thing I was working on. And I didn't want to learn Docker from first principles. Like I, at some point, if I need it, I can do that. Like I have the background that I can make that happen. But what I wanted to do was thing one. And it's very easy to get bogged down in the details of this other thing that helps you accomplish your end goal. And I just want to like, tell me enough about Docker so I can do this particular thing. And I can check that it's doing the safe thing. I sort of know enough about that from, you know, my other background. And so I can just have the model help teach me exactly the one thing I want to know and nothing more. I don't need to worry about other things that the writer of this thinks is important that actually isn't. Like I can just like stop the conversation and say, no, boring to me. Explain this detail. I don't understand. I think that's why that was very useful for me.
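Nicholas's poor man's agent from a couple of turns up, run the code and paste the error back, is mechanical enough to sketch. A minimal version in Python, where ask_model is a placeholder for whatever chat API you use:

# Minimal sketch of the copy-the-error-back loop Nicholas describes.
# ask_model() is a placeholder; wire it to your chat model of choice.
import subprocess
import sys

def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API")

def run_until_it_works(task: str, max_tries: int = 5) -> str:
    code = ask_model(f"Write a Python script that does this:\n{task}")
    for _ in range(max_tries):
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=60)
        if result.returncode == 0:
            return code  # it ran cleanly; still read it before trusting it
        # The whole "agent": paste the traceback back and ask for a fix.
        code = ask_model(
            f"This script:\n{code}\n"
            f"failed with:\n{result.stderr}\nReturn a corrected script.")
    raise RuntimeError("model never produced working code")

As he says, you can't trust the result blindly; the loop only automates the tedious copy and paste, not the judgment about whether the final program does the safe thing.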
It would have taken me, you know, several hours to figure out some things that take 10 minutes if you could just ask exactly the question you want the answer to.Alessio [00:13:05]: Have you had any issues with like newer tools? Have you felt any meaningful kind of like a cutoff date where like there's not enough data on the internet or? I'm sure that the answer to this is yes.Nicholas [00:13:16]: But I tend to just not use most of these things. Like I feel like the significant way in which I use machine learning models is probably very different than most people's, is that I'm a researcher and I get to pick what tools I use and most of the things that I work on are fairly small projects. And so I can entirely see how someone who is in a big giant company where they have their own proprietary legacy code base of a hundred million lines of code or whatever and like you just might not be able to use things the same way that I do. I still think there are lots of use cases there that are entirely reasonable that are not the same ones that I've put down. But I wanted to talk about what I have personal experience in being able to say is useful. And I would like it very much if someone who is in one of these environments would be able to describe the ways in which they find current models useful to them. And not, you know, philosophize on what someone else might be able to find useful, but actually say like, here are real things that I have done that I found useful for me.Swyx [00:14:08]: Yeah, this is what I often do to encourage people to write more, to share their experiences because they often fear being attacked on the internet. But you are the ultimate authority on how you use things, and that's just objectively true. So it cannot be debated. One thing that people are very excited about is the concept of ephemeral software or like personal software. This use case in particular basically lowers the activation energy for creating software, which I like as a vision. I don't think I have taken as much advantage of it as I could. I feel guilty about that. But also, we're trending towards there.Nicholas [00:14:47]: Yeah. No, I mean, I do think that this is a direction that is exciting to me. One of the things I wrote that was like, a lot of the ways that I use these models are for one-off things that I just need to happen that I'm going to throw away in five minutes. And you can.Swyx [00:15:01]: Yeah, exactly.Nicholas [00:15:02]: Right. It's like the kind of thing where it would not have been worth it for me to have spent 45 minutes writing this, because I don't need the answer that badly. But if it will only take me five minutes, then I'll just figure it out, run the program and then get it right. And if it turns out that you ask the thing, it doesn't give you the right answer. Well, I didn't actually need the answer that badly in the first place. Like either I can decide to dedicate the 45 minutes or I cannot, but like the cost of doing it is fairly low. You see what the model can do. And if it can't, then okay. When you're using these models, if you're getting the answer you want always, it means you're not asking them hard enough questions.Swyx [00:15:35]: Say more.Nicholas [00:15:37]: Lots of people only use them for very small particular use cases and like it always does the thing that they want. Yeah.Swyx [00:15:43]: Like they use it like a search engine.Nicholas [00:15:44]: Yeah. Or like one particular case.
And if you're finding that when you're using these, it's always giving you the answer that you want, then probably it has more capabilities than you're actually using. And so I oftentimes try when I have something that I'm curious about to just feed it into the model and be like, well, maybe it's just solved my problem for me. You know, most of the time it doesn't, but like on occasion, it's like, it's done things that would have taken me, you know, a couple hours, where it's been great and just like solved everything immediately. And if it doesn't, then it's usually easier to verify whether or not the answer is correct than to have written it in the first place. And so you check, you're like, well, that's just, you're entirely misguided. Nothing here is right. It's just like, I'm not going to do this. I'm going to go write it myself or whatever.Alessio [00:16:21]: Even for non-tech, I had to fix my irrigation system. I had an old irrigation system. I didn't know how it worked, to program it. I took a photo, I sent it to Claude and it's like, oh yeah, that's like the RT 900. This is exactly, I was like, oh wow, you know, you know, a lot of stuff.Swyx [00:16:34]: Was it right?Alessio [00:16:35]: Yeah, it was right.Swyx [00:16:36]: It worked. Did you compare with OpenAI?Alessio [00:16:38]: No, I canceled my OpenAI subscription, so I'm a Claude boy. Do you have a way to think about this like one-off software thing? One way I talk to people about it is like LLMs are kind of converging to like semantic serverless functions, you know, like you can say something and like it can run the function in a way and then that's it. It just kind of dies there. Do you have a mental model to just think about how long it should live for and like anything like that?Nicholas [00:17:02]: I don't think I have anything interesting to say here, no. I will take whatever tools are available in front of me and try and see if I can use them in meaningful ways. And if they're helpful, then great. If they're not, then fine. And like, you know, there are lots of people that I'm very excited about seeing all these people who are trying to make better applications that use these or all these kinds of things. And I think that's amazing. I would like to see more of it, but I do not spend my time thinking about how to make this any better.Alessio [00:17:27]: What's the most underrated thing in the list? I know there's like simplified code, solving boring tasks, or maybe is there something that you forgot to add that you want to throw in there?Nicholas [00:17:37]: I mean, so in the list, I only put things that people could look at and go, I understand how this solved my problem. I didn't want to put things where the model was very useful to me, but it would not be clear to someone else that it was actually useful. So for example, one of the things that I use it a lot for is debugging errors. But the errors that I have are very much not the errors that anyone else in the world will have. And in order to understand whether or not the solution was right, you just have to trust me on it. Because, you know, like I got my machine in a state that like CUDA was not talking to whatever some other thing, the versions were mismatched, something, something, something, and everything was broken. And like, I could figure it out with interaction with the model, and it, like, told me the steps I needed to take. But at the end of the day, when you look at the conversation, you just have to trust me that it worked.
And I didn't want to write things online that were, like, you have to trust me on what I'm saying. I want everything that I said to like have evidence that like, here's the conversation, you can go and check whether or not this actually solved the task as I said that the model does. Because a lot of people I feel like say, I used a model to solve this very complicated task. And what they mean is the model did 10%, and I did the other 90% or something. I wanted everything to be verifiable. And so one of the biggest use cases for me, I didn't describe even at all, because it's not the kind of thing that other people could have verified by themselves. So that maybe is like, one of the things that I wish I maybe had said a little bit more about, and just stated the way that this is done, because I feel like that this didn't come across quite as well. But yeah, of the things that I talked about, the thing that I think is most underrated is the ability of it to solve the uninteresting parts of problems for me right now. People always say, and this is one of the biggest arguments that I don't understand why people make, that the model can only do things that people have done before. Therefore, the model is not going to be helpful in doing new research or like discovering new things. And as someone whose day job is to do new things, like what is research? Research is doing something literally no one else in the world has ever done before. So this is what I do every single day, 90% of this is not doing something new, 90% of this is doing things a million people have done before, and then a little bit of something that was new. There's a reason why we say we stand on the shoulders of giants. It's true. Almost everything that I do is something that's been done many, many times before. And that is the piece that can be automated. Even if the thing that I'm doing as a whole is new, it is almost certainly the case that the small pieces that build up to it are not. And a number of people who use these models, I feel like expect that they can either solve the entire task or none of the task. But now I find myself very often, even when doing something very new and very hard, having models write the easy parts for me. And the reason I think this is so valuable, everyone who programs understands this, like you're currently trying to solve some problem and then you get distracted. And whatever the case may be, someone comes and talks to you, you have to go look up something online, whatever it is. You lose a lot of time to that. And one of the ways of being distracted that we currently don't think about is: you're solving some hard problem and you realize you need a helper function that does X, where X is like, it's a known algorithm. Any person in the world, you say like, give me the algorithm: I have a dense graph or a sparse graph, I need to make it dense. You can do this by doing some matrix multiplies. It's like, this is a solved problem. I knew how to do this 15 years ago, but it distracts me from the problem I'm thinking about in my mind. I needed this done. And so instead of using my mental capacity on solving that problem and then coming back to the problem I was originally trying to solve, you could just ask the model, please solve this problem for me. It gives you the answer. You run it. You can check that it works very, very quickly. And now you go back to solving the problem without having lost all the mental state.
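For concreteness, the kind of boring, solved-problem helper Nicholas describes handing off (a sparse graph representation turned into a dense one) might look like this; the names and the edge-list representation are our choice, purely illustrative:

# The kind of solved-problem helper Nicholas describes offloading:
# turn a sparse edge list into a dense adjacency matrix.
import numpy as np

def to_dense(num_nodes: int, edges: list[tuple[int, int]]) -> np.ndarray:
    adj = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        adj[i, j] = 1.0
    return adj

print(to_dense(3, [(0, 1), (1, 2)]))

Trivial, verifiable in seconds, and exactly the kind of thing worth delegating so the hard problem keeps your full attention.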
And I feel like this is one of the things that's been very useful for me.Swyx [00:21:34]: And in terms of this concept of expert users versus non-expert users, floors versus ceilings, you had some strong opinion here that like, basically it actually is more beneficial for non-experts.Nicholas [00:21:46]: Yeah, I don't know. I think it could go either way. Let me give you the argument for both of these. Yes. So I can only speak on the expert user's behalf because I've been doing computers for a long time. And so yeah, the cases where it's useful for me are exactly these cases where I can check the output. I know, and anything the model could do, I could have done. I could have done better. I can check every single thing that the model is doing and make sure it's correct in every way. And so I can only speak and say, definitely it's been useful for me. But I also see a world in which this could be very useful for the kinds of people who do not have this knowledge, with caveats, because I'm not one of these people. I don't have this direct experience. But one of the big ways that I can see this is for things that you can check fairly easily, someone who could never have asked or have written a program themselves to do a certain task could just ask for the program that does the thing. And you know, some of the times it won't get it right. But some of the times it will, and they'll be able to have the thing in front of them that they just couldn't have done before. And we see a lot of people trying to do applications for this, like integrating language models into spreadsheets. Spreadsheets run the world. And there are some people who know how to do all the complicated spreadsheet equations and various things, and other people who don't, who just use the spreadsheet program but just manually do all of the things one by one by one by one. And this is a case where you could have a model that could try and give you a solution. And as long as the person is rigorous in testing that the solution actually does the correct thing, and this is the part that I'm worried about most, you know, I think depending on these systems in ways that we shouldn't, like this is what my research says, my research is entirely on this, like, you probably shouldn't trust these models to do things in adversarial situations, like, I understand this very deeply. And so I think that it's possible for people who don't have this knowledge to make use of these tools in these ways, but I'm worried that it might end up in a world where people just blindly trust them, deploy them in situations that they probably shouldn't, and then someone like me gets to come along and just break everything because everything is terrible. And so I am very, very worried about that being the case, but I think if done carefully it is possible that these could be very useful.Swyx [00:23:54]: Yeah, there is some research out there that shows that when people use LLMs to generate code, they do generate less secure code.Nicholas [00:24:02]: Yeah, Dan Boneh has a nice paper on this. There are a bunch of papers that touch on exactly this.Swyx [00:24:07]: My slight issue is, you know, is there an agenda here?Nicholas [00:24:10]: I mean, okay, yeah, Dan Boneh, at least the one they have, like, I fully trust everything that sort of.Swyx [00:24:15]: Sorry, I don't know who Dan is.Nicholas [00:24:17]: He's a professor at Stanford. Yeah, he and some students have some things on this. Yeah, there's a number.
I agree that a lot of the stuff feels like people have an agenda behind it. There are some that don't, and I trust them to have done the right thing. I also think, even on this though, we have to be careful because the argument, whenever someone says x is true about language models, you should always append the suffix "for current models," because I'll be the first to admit I was one of the people who was very much of the opinion that these language models are fun toys and are going to have absolutely no practical utility. If you had asked me this, let's say, in 2020, I still would have said the same thing. After I had seen GPT-2, I had written a couple of papers studying GPT-2 very carefully. I still would have told you these things are toys. And when I first read the RLHF paper and the instruction tuning paper, I was like, nope, this is this thing that these weird AI people are doing. They're trying to make some analogies to people that make no sense. It's just like, I don't even care to read it. I saw what it was about and just didn't even look at it. I was obviously wrong. These things can be useful. And I feel like a lot of people had the same mentality that I did and decided not to change their mind. And I feel like this is the thing that I want people to be careful about. I want them to at least know what is true about the world so that they can then see that maybe they should reconsider some of the opinions that they had from four or five years ago that may just not be true about today's models.Swyx [00:25:47]: Specifically because you brought up spreadsheets, I want to share my personal experience because I think Google has done a really good job that people don't know about, which is if you use Google Sheets, Gemini is integrated inside of Google Sheets and it helps you write formulas. Great.Nicholas [00:26:00]: That's news to me.Swyx [00:26:01]: Right? They maybe don't do a good job of telling people. Unless you watched Google I.O., there was no other opportunity to learn that Gemini is now in your Google Sheets. And so I just don't write formulas manually anymore. I just prompt Gemini to do it for me. And it does it.Nicholas [00:26:15]: One of the problems that these machine learning models have is a discoverability problem. I think this will be figured out. I mean, it's the same problem that you have with any assistant. You're given a blank box and you're like, what do I do with it? I think this is great. More of these things, it would be good for them to exist. I want them to exist in ways that we can actually make sure that they're done correctly. I don't want to just have them be pushed into more and more things just blindly. I feel like lots of people, there are far too many X plus AI, where X is, like, an arbitrary thing in the world that has nothing to do with AI and could not be benefited by it at all. And they're just doing it because they want to use the word. And I don't want that to happen.Swyx [00:26:58]: You don't want an AI fridge?Nicholas [00:27:00]: No. Yes. I do not want my fridge on the internet.Swyx [00:27:03]: I do not want... Okay.Nicholas [00:27:05]: Anyway, let's not go down that rabbit hole. I understand why some of that happens, because people want to sell things or whatever. But I feel like a lot of people see that and then they write off everything as a result of it. And I just want to say, there are allowed to be people who are trying to do things that don't make any sense. Just ignore them.
Do the things that make sense.Alessio [00:27:22]: Another chunk of use cases was learning. So both explaining code, being an API reference, all of these different things. Any suggestions on how to go at it? I feel like one thing is generate code and then explain it to me. One way is just tell me about this technology. Another thing is like, hey, I read this online, kind of help me understand it. Any best practices on getting the most out of it?Swyx [00:27:47]: Yeah.Nicholas [00:27:47]: I don't know if I have best practices. I have how I use them. I find it very useful for cases where I understand the underlying ideas, but I have never used them in this way before. I know what I'm looking for, but I just don't know how to get there. And so yeah, as an API reference is a great example. The tool everyone always picks on is like FFmpeg. No one in the world knows the command line arguments to do what they want. They're like, make the thing faster. I want lower bitrate, like dash V. Once you tell me what the answer is, I can check. This is one of these things where it's great for these kinds of things. Or in other cases, things where I don't really care that the answer is 100% correct. So for example, I do a lot of security work. Most of security work is reading some code you've never seen before and finding out which pieces of the code are actually important. Because, you know, most of the program doesn't actually have anything to do with security. It has, you know, the display piece or the other piece or whatever. And like, you just, you want to ignore all of that. So one very fun use of models is to like, just have it describe all the functions and just skim it and be like, wait, which ones look like approximately the right things to look at? Because otherwise, what are you going to do? You're going to have to read them all manually. And when you're reading them manually, you're going to skim the function anyway, and not just figure out what's going on perfectly. Like you already know that when you're going to read these things, what you're going to try and do is figure out roughly what's going on. Then you'll delve into the details. This is a great way of just doing that, but faster, because it will abstract most of what is right. It's going to be wrong some of the time. I don't care. I would have been wrong too. And as long as you treat it this way, I think it's great. And so like one of the particular use cases I have in the thing is decompiling binaries, where oftentimes people will release a binary. They won't give you the source code. And you want to figure out how to attack it. And so one thing you could do is you could try and run some kind of decompiler. It turns out for the thing that I wanted, none existed. And so I spent too many hours doing it by hand before I finally thought, why am I doing this? I should just check if the model could do it for me. And it turns out that it can. And it can turn the compiled code, which is impossible for any human to understand, into the Python code that is entirely reasonable to understand. And it doesn't run. It has a bunch of problems. But it's so much nicer that it's immediately a win for me. I can just figure out approximately where I should be looking, and then spend all of my time doing that by hand.
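The decompilation workflow he describes is easy to picture as a small loop: dump the disassembly, hand it to a model, ask for approximate Python. A minimal sketch, where `complete` is a hypothetical stand-in for whatever chat API you use (only the objdump invocation is a standard tool; everything else is our illustration):

```python
# Sketch: dump a binary's disassembly and ask a model for approximate Python.
# The output is a skimming aid, not working code.
import subprocess

def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API of choice")

def rough_decompile(binary_path: str) -> str:
    disassembly = subprocess.run(
        ["objdump", "-d", binary_path],  # disassemble the binary
        capture_output=True, text=True, check=True,
    ).stdout
    return complete(
        "Rewrite this disassembly as readable, approximate Python. "
        "It does not need to run; I only want the rough structure:\n"
        + disassembly
    )
```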
And again, you get a big win there.Swyx [00:30:12]: So I fully agree with all those use cases, especially for you as a security researcher and having to dive into multiple things. I imagine that's super helpful. I do think we want to move to your other blog post. But you ended your post with a little bit of a teaser about your next post and your speculations. What are you thinking about?Nicholas [00:30:34]: So I want to write something. And I will do that at some point when I have time, maybe after I'm done writing my current papers for ICLR or something, where I want to talk about some thoughts I have for where language models are going in the near-term future. The reason why I want to talk about this is because, again, I feel like the discussion tends to be people who are either very much AGI by 2027, or always five years away, or are going to make statements of the form, you know, LLMs are the wrong path, and we should be abandoning this, and we should be doing something else instead. And again, I feel like people tend to look at this and see these two polarizing options and go, well, those obviously are both very far extremes. Like, how do I actually, like, what's a more nuanced take here? And so I have some opinions about this that I want to put down, just saying, you know, I have wide margins of error. I think you should too. If you would say there's a 0% chance that, you know, the models will get very, very good in the next five years, you're probably wrong. If you're going to say there's a 100% chance that they will in the next five years, then you're probably wrong too. And like, to be fair, most of the people, if you read behind the headlines, actually say something like this. But it's very hard to get clicks on the internet with, like, some things may be good in the future. Like, everyone wants like, you know, a very extreme take: nothing is going to be good, this is entirely wrong, it's going to be amazing. You know, like, they want to see this. I want people who have negative reactions to these kinds of extreme views to be able to at least say, like, to tell them, there is something real here. It may not solve all of our problems, but it's probably going to get better. I don't know by how much. And that's basically what I want to say. And then at some point, I'll talk about the safety and security things as a result of this. Because the way in which security intersects with these things depends a lot on exactly how people use these tools. You know, if it turns out to be the case that these models get to be truly amazing and can solve, you know, tasks completely autonomously, that's a very different security world to be living in than if there's always a human in the loop. And the types of security questions I would want to ask would be very different. And so I think, you know, in some very large part, understanding what the future will look like a couple of years ahead of time is helpful for figuring out which problems, as a security person, I want to solve now.Alessio [00:32:50]: You mentioned getting clicks on the internet, but you don't even have, like, an X account or anything. How do you get people to read your stuff? What's your distribution strategy? Because this post was popping up everywhere. And then people on Twitter were like, Nicholas Carlini wrote this. Like, what's his handle? It's like, he doesn't have it. It's like, how did you find it? What's the story?Nicholas [00:33:07]: So I have an RSS feed and an email list. And that's it.
I don't like most social media things. On principle, I feel like they have some harms. As a person, I have a problem when people say things that are wrong on the internet. And I would get nothing done if I had a Twitter. I would spend all of my time correcting people and getting into fights. And so I feel like it is just useful for me for this not to be an option. I tend to just post things online. Yeah, it's a very good question. I don't know how people find it. I feel like for some things that I write, other people think it resonates with them. And then they put it on Twitter. And...Swyx [00:33:43]: Hacker News as well.Nicholas [00:33:44]: Sure, yeah. I am... Because my day job is doing research, I get no value for having this be picked up. There's no whatever. I don't need to be someone who has to have this other thing to give talks. And so I feel like I can just say what I want to say. And if people find it useful, then they'll share it widely. You know, this one went pretty wide. I wrote a thing, whatever, sometime late last year, about how to recover data off of an Apple ProFile drive from 1980. This probably got, I think, like 1000x fewer views than this. But I don't care. Like, that's not why I'm doing this. Like, this is the benefit of having a thing that I actually care about, which is my research. I would care much more if that didn't get seen. This is like a thing that I write because I have some thoughts that I just want to put down.Swyx [00:34:32]: Yeah. I think it's the long form thoughtfulness and authenticity that is sadly lacking sometimes in modern discourse that makes it attractive. And I think now you have a little bit of a brand of you are an independent thinker, writer, person, that people are tuned in to pay attention to whatever is next coming.Nicholas [00:34:52]: Yeah, I mean, this kind of worries me a little bit. I don't like whenever I have a popular thing that like, and then I write another thing, which is like entirely unrelated. Like, I don't, I don't...Swyx [00:35:01]: You should actually just throw people off right now.Nicholas [00:35:02]: Exactly. I'm trying to figure out, like, I need to put something else online. So, like, the last two or three things I've done in a row have been, like, actually, like, things that people should care about.Swyx [00:35:10]: Yes.Nicholas [00:35:11]: So, I have a couple of things. I'm trying to figure out which one do I put online to just, like, cull the list of people who have subscribed to my email. And so, like, tell them, like, no, like, what you're here for is not informed, well-thought-through takes. Like, what you're here for is whatever I want to talk about. And if you're not up for that, then, like, you know, go away. Like, this is not what I want out of my personal website.Swyx [00:35:27]: So, like, here's, like, top 10 enemies or something.Alessio [00:35:30]: What's the next project you're going to work on that is completely unrelated to research or LLMs? Or what games do you want to port into the browser next?Swyx [00:35:39]: Okay. Yeah.Nicholas [00:35:39]: So, maybe.Swyx [00:35:41]: Okay.Nicholas [00:35:41]: Here's a fun question. How much data do you think you can put on a single piece of paper?Swyx [00:35:47]: I mean, you can think about bits and atoms. Yeah.Nicholas [00:35:49]: No, like, normal printer. Like, I gave you an office printer. How much data can you put on a piece of paper?Alessio [00:35:54]: Can you encode it and re-decode it? So, like, you know, base64 or whatever.
Yeah, whatever you want.Nicholas [00:35:59]: Like, you get normal off-the-shelf printer, off-the-shelf scanner. How much data?Swyx [00:36:03]: I'll just throw out there. Like, 10 megabytes. That's enormous. I know.Nicholas [00:36:07]: Yeah, that's a lot.Swyx [00:36:10]: Really small fonts. That's my question.Nicholas [00:36:12]: So, I have a thing. It does about a megabyte.Swyx [00:36:14]: Yeah, okay.Nicholas [00:36:14]: There you go. I was off by an order of magnitude.Swyx [00:36:16]: Yeah, okay.Nicholas [00:36:16]: So, in particular, it's about 1.44 megabytes. A floppy disk.Swyx [00:36:21]: Yeah, exactly.Nicholas [00:36:21]: So, this is supposed to be the title at some point. It's a floppy disk.Swyx [00:36:24]: A paper is a floppy disk. Yeah.Nicholas [00:36:25]: So, this is a little hard because, you know, you can do the math and you get 8.5 by 11. You can print at 300 by 300 DPI. And this gives you 2 megabytes. And so, every single pixel, you need to be able to recover up to like 90 plus percent. Like, 95 percent. Like, 99 point something percent accuracy in order to be able to actually decode this off the paper. This is one of the things that I'm considering. I need to get a couple more things working for this. Where, you know, again, I'm running into some random problems. But this is probably, this will be one thing that I'm going to talk about. There's this contest called the International Obfuscated C Code Contest, which is amazing. People try and write the most obfuscated C code that they can. Which is great. And I have a submission for that whenever they open up the next one for it. And I'll write about that submission. I have a very fun gate-level emulation of an old CPU that runs like fully precisely. And it's a fun kind of thing. Yeah.Swyx [00:37:20]: Interesting. Your comment about the piece of paper reminds me of when I was in college. And you would have like one cheat sheet that you could write. So, you have a formula, a theoretical limit for bits per inch. And, you know, that's how much I would squeeze in really, really small. Yeah, definitely.Nicholas [00:37:36]: Okay.Swyx [00:37:37]: We are also going to talk about your benchmarking. Because you released your own benchmark that got some attention, thanks to some friends on the internet. What's the story behind your own benchmark? Do you not trust the open source benchmarks? What's going on there?Nicholas [00:37:51]: Okay. Benchmarks tell you how well the model solves the task the benchmark is designed to solve. For a long time, models were not useful. And so, the benchmark that you tracked was just something someone came up with, because you need to track something. All of deep learning exists because people tried to make models classify digits and classify images into a thousand classes. There is no one in the world who cares specifically about the problem of distinguishing between 300 breeds of dog for an image that's 224 by 224 pixels. And yet, like, this is what drove a lot of progress. And people did this not because they cared about this problem, but because they wanted to just measure progress in some way. And a lot of benchmarks are of this flavor. You want to construct a task that is hard, and we will measure progress on this benchmark, not because we care about the problem per se, but because we know that progress on this is in some way correlated with making better models. And this is fine when you don't want to actually use the models that you have.
But when you want to actually make use of them, it's important to find benchmarks that track with whether or not they're useful to you. And the thing that I was finding is that there would be model after model after model that was being released that would find some benchmark that they could claim state-of-the-art on and then say, therefore, ours is the best. And that wouldn't be helpful to me to know whether or not I should then switch to it. So the argument that I tried to lay out in this post is that more people should make benchmarks that are tailored to them. And so what I did is I wrote a domain-specific language that anyone can write for, and you can take tasks that you have wanted models to solve for you, and you can put them into your benchmark, the thing that you care about. And then when a new model comes out, you benchmark the model on the things that you care about. And you know that you care about them because you've actually asked for those answers before. And if the model scores well, then you know that for the kinds of things that you have asked models for in the past, it can solve these things well for you. This has been useful for me because when another model comes out, I can run it. I can see, does this solve the kinds of things that I care about? And sometimes the answer is yes, and sometimes the answer is no. And then I can decide whether or not I want to use that model or not. I don't want to say that existing benchmarks are not useful. They're very good at measuring the thing that they're designed to measure. But in many cases, what that's designed to measure is not actually the thing that I want to use it for. And I expect that the way that I want to use it is different from the way that you want to use it. And I would just like more people to have these things out there in the world. And the final reason for this is, it is very easy, if you want to make a model good at some benchmark, to make it good at that benchmark: you can find the distribution of data that you need and train the model to be good on the distribution of data. And then you have your model that can solve this benchmark well. And by having a benchmark that is not very popular, you can be relatively certain that no one has tried to optimize their model for your benchmark.Swyx [00:40:40]: And I would like this to be-Nicholas [00:40:40]: So publishing your benchmark is a little bit-Swyx [00:40:43]: Okay, sure.Nicholas [00:40:43]: Contextualized. So my hope in doing this was not that people would use mine as theirs. My hope in doing this was that... you should make yours. Yes, you should make your benchmark. And if, for example, there were even a very small fraction of people, 0.1% of people who made a benchmark that was useful for them, this would still be hundreds of new benchmarks out there. I might not want to make one myself, but I might know the kinds of work that I do is a little bit like this person, a little bit like that person. I'll go check how it is on their benchmarks. And I'll see, roughly, I'll get a good sense of what's going on. Because the alternative is people just do this vibes-based evaluation thing, where you interact with the model five times, and you see if it worked on the kinds of things that are just, like, your toy questions. But five questions is a very low-bit signal of whether or not it works for this thing. And if you could just automate running 100 questions for you, it's a much better evaluation.
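He doesn't walk through his DSL on air, so here is a hypothetical sketch of the shape such a personal benchmark might take: each task is a prompt plus a checker, and evaluating a new model is one loop. All names here are ours, not from his actual library:

```python
# Hypothetical sketch of a personal benchmark: every task you once asked a
# model for becomes a prompt plus a checker.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # True if the model's output passes

TASKS = [
    Task("Write a Python expression that reverses the string s.",
         check=lambda out: "[::-1]" in out),
    Task("Which ffmpeg flag sets the video bitrate?",
         check=lambda out: "-b:v" in out),
]

def run_benchmark(model: Callable[[str], str]) -> float:
    """Score a new model on your own history of questions."""
    passed = sum(task.check(model(task.prompt)) for task in TASKS)
    return passed / len(TASKS)
```

Because nobody else is training against your hundred questions, a score on this harness tells you something a public leaderboard can't.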
So that's why I did this.Swyx [00:41:37]: Yeah, I like the idea of going through your chat history and actually pulling out real-life examples. I regret to say that I don't think my chat history gets used as much these days, because I'm using Cursor, the native AI IDE. So your examples are all coding related. And the immediate question is, now that you've written the How I Use AI post, which is a little bit broader, are you able to translate all these things to evals? Are some things unevaluable?Nicholas [00:42:03]: Right. A number of things that I do are harder to evaluate. So this is the problem with a benchmark, is you need some way to check whether or not the output was correct. And so all of the kinds of things that I can put into the benchmark are the kinds of things that you can check. You can check more things than you might have thought would be possible if you do a little bit of work on the back end. So for example, all of the code that I have the model write, it runs the code and sees whether the answer is the correct answer. Or in some cases, it runs the code, feeds the output to another language model, and the language model judges whether the output was correct. And again, is using a language model to judge here perfect? No. But like, what's the alternative? The alternative is to not do it. And what I care about is just, is this thing broadly useful for the kinds of questions that I have? And so as long as the accuracy is better than roughly random, like, I'm okay with this. I've inspected the outputs of these, and like, they're almost always correct. If you ask the model to judge these things in the right way, they're very good at being able to tell this. And so, yeah, I think this is probably a useful thing for people to do.Alessio [00:43:04]: You complain about prompting and being lazy and how you do not want to tip your model and you do not want to murder a kitten just to get the right answer. How do you see the evolution of like prompt engineering? Even like 18 months ago, maybe, you know, it was kind of like really hot and people wanted to like build companies around it. Today, it's like the models are getting good. Do you think it's going to be less and less relevant going forward? Or what's the minimum valuable prompt?Nicholas [00:43:29]: Yeah, I don't know. I feel like a big part of making an agent is just like a fancy prompt that like, you know, calls back to the model again. I have no opinion. It seems like maybe it turns out that this is really important. Maybe it turns out that this isn't. I guess the only comment I was making here is just to say, oftentimes when I use a model and I find it's not useful, I talk to people who help make it. The answer they usually give me is like, you're using it wrong. Which, like, reminds me very much of the you're-holding-it-wrong thing from, like, the iPhone, right? Like, you know, like I don't care that I'm holding it wrong. I'm holding it that way. If the thing is not working with me, then like it's not useful for me. Like it may be the case that there exists a way to ask the model such that it gives me the answer that's correct, but that's not the way I'm doing it. If I have to spend so much time thinking about how I want to frame the question that it would have been faster for me just to get the answer, it didn't save me any time. And so oftentimes, you know, what I do is like, I just dump in whatever current thought that I have in whatever ill-formed way it is. And I expect the answer to be correct.
And if the answer is not correct, like in some sense, maybe the model was right to give me the wrong answer. Like I may have asked the wrong question, but I want the right answer still. And so like, I just want to sort of get this as a thing. And maybe the way to fix this is you have some default prompt that always goes into all the models or something, or you do something clever like this. It would be great if someone had a way to package this up and make a thing. I think that's entirely reasonable. Maybe it turns out that as models get better, you don't need to prompt them as much in this way. I just want to use the things that are in front of me.Alessio [00:44:55]: Do you think that's like a limitation of just how models work? Like, you know, at the end of the day, you're using the prompt to kind of like steer it in the latent space. Like, do you think there's a way to actually not make the prompt really relevant and have the model figure it out? Or like, what's the...Nicholas [00:45:10]: I mean, you could fine-tune it into the model, for example, that like it's supposed to... I mean, it seems like some models have done this, for example, like some recent model, many recent models. If you ask them a question, computing an integral of this thing, they'll say, let's think through this step by step. And then they'll go through the step by step answer. I didn't tell it. Two years ago, I would have had to have prompted it. Think step by step on solving the following thing. Now you ask them the question and the model says, here's how I'm going to do it. I'm going to take the following approach and then like sort of self-prompt itself.Swyx [00:45:34]: Is this the right way?Nicholas [00:45:35]: Seems reasonable. Maybe you don't have to do it. I don't know. This is for the people whose job is to make these things better. And yeah, I just want to use these things. Yeah.Swyx [00:45:43]: For listeners, that would be Orca and Agent Instruct. It's the SOTA on this stuff. Great. Yeah.Alessio [00:45:49]: What about few-shot? Is it included in the lazy prompting? Like, do you do few-shot prompting? Like, do you collect some examples when you want to put them in? Or...Nicholas [00:45:57]: I don't because usually when I want the answer, I just want to get the answer.Swyx [00:46:03]: Brutal. This is hard mode.Nicholas [00:46:04]: Yeah, exactly. But this is fine. I want to be clear. There's a difference between testing the ultimate capability level of the model and testing the thing that I'm doing with it. What I'm doing is I'm not exercising its full capability level because there are almost certainly better ways to ask the questions and sort of really see how good the model is. And if you're evaluating a model for being state of the art, this is ultimately what I care about. And so I'm entirely fine with people doing fancy prompting to show me what the true capability level could be because it's really useful to know what the ultimate level of the model could be. But I think it's also important just to have available to you how good the model is if you don't do fancy things.Swyx [00:46:39]: Yeah, I would say that here's a divergence between how models are marketed these days versus how people use it, which is when they test MMLU, they'll do like five shots, 25 shots, 50 shots. And no one's providing 50 examples.Nicholas [00:46:54]: I completely agree. You know, for these numbers, the problem is everyone wants to get state of the art on the benchmark.
And so you find the way that you can ask the model the questions so that you get state of the art on the benchmark. And it's good. It's legitimately good to know. It's good to know the model can do this thing if only you try hard enough. Because it means that if I have some task that I want to be solved, I know what the capability level is. And I could get there if I was willing to work hard enough. And the question then is, should I work harder and figure out how to ask the model the question? Or do I just do the thing myself? And for me, I have programmed for many, many, many years. It's often just faster for me just to do the thing than to figure out the incantation to ask the model. But I can imagine someone who has never programmed before might be fine writing five paragraphs in English describing exactly the thing that they want and have the model build it for them if the alternative is not being able to do it at all. But again, this goes to all these questions of how are they going to validate? Should they be trusting the output? These kinds of things.Swyx [00:47:49]: One problem with your eval paradigm and most eval paradigms, I'm not picking on you, is that we're actually training these things for chat, for interactive back and forth. And you actually obviously reveal much more information in the same way that asking 20 questions reveals more information in sort of a tree search branching sort of way. This is also, by the way, the problem with LMSYS arena, right? Where the vast majority of prompts are single question, single answer, eval, done. But actually, the way that we use chat things, even in the stuff that you posted in your How I Use AI post, you have maybe 20 turns of back and forth. How do you eval that?Nicholas [00:48:25]: Yeah. Okay. Very good question. This is the thing that I think many people should be doing more of. I would like more multi-turn evals. I might be writing a paper on this at some point if I get around to it. A couple of the evals in the benchmark thing I have are already multi-turn. I mentioned 20 questions. I have a 20 question eval there just for fun. But I have a couple others that are like, I just tell the model, here's my git thing, figure out how to cherry-pick off this other branch and move it over there. And so what I do is I just, I basically build a tiny little agent-y thing. I just ask the model how I do it. I run the thing on Linux. This is what I want Docker for. I spin up a Docker container. I run whatever the model told me to do. I feed the output back into the model. I repeat this many rounds. And then I check at the very end, does the git commit history show that it is correctly cherry-picked in
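The multi-turn loop he describes maps almost line for line onto code. A rough sketch, assuming a prepared container with the repo inside and a hypothetical `model` function wrapping your chat API (the prompt, paths, and commit message are illustrative):

```python
# Rough sketch of the multi-turn eval loop described: ask the model for a
# shell command, run it inside a Docker container holding the repo, feed the
# output back, then check the git history at the end.
import subprocess

def model(history: list[str]) -> str:
    raise NotImplementedError("wire this to your LLM API of choice")

def run_in_container(container: str, cmd: str) -> str:
    result = subprocess.run(
        ["docker", "exec", container, "bash", "-lc", cmd],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr

def cherry_pick_eval(container: str, commit_msg: str, rounds: int = 10) -> bool:
    history = ["Cherry-pick the fix commit from the 'feature' branch onto "
               "'main' in /repo. Reply with exactly one shell command per turn."]
    for _ in range(rounds):
        cmd = model(history)
        history.append(cmd)
        history.append(run_in_container(container, cmd))
    # Final check: does main's history now contain the cherry-picked commit?
    log = run_in_container(container, "cd /repo && git log --oneline main")
    return commit_msg in log
```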
Betteridge's law says no: with seemingly infinite flavors of RAG, and >2 million token context + prompt caching from Anthropic/Deepmind/Deepseek, it's reasonable to believe that "in context learning is all you need".

But then there's Cosine Genie, the first to make a huge bet using OpenAI's new GPT-4o fine-tuning for code at the largest scale it has ever been used externally, resulting in what is now the #1 coding agent in the world according to SWE-Bench Full, Lite, and Verified.

SWE-Bench has been the most successful agent benchmark of the year, receiving honors at ICLR (our interview here) and recently being verified by OpenAI. Cognition (Devin) was valued at $2b after reaching 14% on it. So it is very, very big news when a new agent appears to beat all other solutions, by a lot. While this number is self-reported, it seems to be corroborated by OpenAI, who also award it clear highest marks on SWE-Bench Verified.

The secret is GPT-4o finetuning on billions of tokens of synthetic data.

* Finetuning: As OpenAI says: Genie is powered by a fine-tuned GPT-4o model trained on examples of real software engineers at work, enabling the model to learn to respond in a specific way. The model was also trained to be able to output in specific formats, such as patches that could be committed easily to codebases. Due to the scale of Cosine's finetuning, OpenAI worked closely with them to figure out the size of the LoRA: “They have to decide how big your LoRA adapter is going to be… because if you had a really sparse, large adapter, you're not going to get any signal in that at all. So they have to dynamically size these things.”

* Synthetic data: we need to finetune on the process of making code work instead of only training on working code. “…we synthetically generated runtime errors. Where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.”

Genie also has a 4-stage workflow with the standard LLM OS tooling stack that lets it solve problems iteratively.

Full Video Pod

Like and subscribe etc!

Show Notes
* Alistair Pullen - Twitter, Linkedin
* Cosine Genie launch, technical report
* OpenAI GPT-4o finetuning GA
* Llama 3 backtranslation
* Cursor episode and Aman + SWEBench at ICLR episode

Timestamps
* [00:00:00] Suno Intro
* [00:05:01] Alistair and Cosine intro
* [00:16:34] GPT4o finetuning
* [00:20:18] Genie Data Mix
* [00:23:09] Customizing for Customers
* [00:25:37] Genie Workflow
* [00:27:41] Code Retrieval
* [00:35:20] Planning
* [00:42:29] Language Mix
* [00:43:46] Running Code
* [00:46:19] Finetuning with OpenAI
* [00:49:32] Synthetic Code Data
* [00:51:54] SynData in Llama 3
* [00:52:33] SWE-Bench Submission Process
* [00:58:20] Future Plans
* [00:59:36] Ecosystem Trends
* [01:00:55] Founder Lessons
* [01:01:58] CTA: Hiring & Customers

Descript Transcript

[00:01:52] AI Charlie: Welcome back. This is Charlie, your AI cohost. As AI engineers, we have a special focus on coding agents, fine tuning, and synthetic data. And this week, it all comes together with the launch of Cosine's Genie, which reached 50 percent on SWE-Bench Lite, 30 percent on the full SWE-Bench, and 44 percent on OpenAI's new SWE-Bench Verified.

[00:02:17] All state of the art results by the widest ever margin recorded compared to former leaders Amazon Q, AutoCodeRover, and Factory Code Droid.
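To make the synthetic-data quote above concrete: one way to "intentionally mess with the AST" is to rename a variable read so the program fails at runtime, then record the broken code, the traceback, and the original as a training triple. A rough sketch, ours and not Cosine's actual pipeline:

```python
# Ours, not Cosine's pipeline: synthesize a runtime error by mutating the
# AST, yielding (broken code, traceback, original fix) as a training triple.
import ast
import traceback

class BreakOneName(ast.NodeTransformer):
    """Rename the first variable that is *read* so it no longer resolves."""
    def __init__(self) -> None:
        self.done = False

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if not self.done and isinstance(node.ctx, ast.Load):
            self.done = True
            node.id += "_undefined"
        return node

def make_training_triple(source: str) -> tuple[str, str, str]:
    broken = ast.unparse(BreakOneName().visit(ast.parse(source)))
    try:
        exec(compile(broken, "<synthetic>", "exec"), {})
        error = ""
    except Exception:
        error = traceback.format_exc()
    return broken, error, source

broken, error, fixed = make_training_triple("x = 1\nprint(x + 1)\n")
assert "NameError" in error  # the mutated program now fails at runtime
```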
As a reminder, Cognition's Devin went viral with a 14 percent score just five months ago. Cosine did this by working closely with OpenAI to fine-tune GPT-4o, now generally available to you and me, on billions of tokens of code, much of which was synthetically generated.[00:02:47] Alistair Pullen: Hi, I'm Ali. Co-founder and CEO of Cosine, a human reasoning lab. And I'd like to show you Genie, our state of the art, fully autonomous software engineering colleague. Genie has the highest score on SWE-Bench in the world. And the way we achieved this was by taking a completely different approach. We believe that if you want a model to behave like a software engineer, it has to be shown how a human software engineer works.[00:03:15] We've designed new techniques to derive human reasoning from real examples of software engineers doing their jobs. Our data represents perfect information lineage, incremental knowledge discovery, and step by step decision making. Representing everything a human engineer does logically. By actually training Genie on this unique dataset, rather than simply prompting base models, which is what everyone else is doing, we've seen that we're no longer simply generating random code until some of it works. It's tackling problems like a human.[00:03:48] AI Charlie: Alistair Pullen is CEO and co-founder of Cosine, and we managed to snag him on a brief trip stateside for a special conversation on building the world's current number one coding agent. Watch out and take care.[00:04:07] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.[00:04:16] swyx: Hey, and today we're back in the studio. In person, after about three to four months in visa jail and travels and all other fun stuff that we talked about in the previous episode.[00:04:27] But today we have a special guest, Ali Pullen from Cosine. Welcome. Hi, thanks for having me. We're very lucky to have you because you're on a two-day trip to San Francisco.[00:04:38] Alistair Pullen: Yeah, I wouldn't recommend it. I would not recommend it. Don't fly from London to San Francisco for two days.[00:04:40] swyx: And you launched Genie on a plane.[00:04:42] On plane Wi-Fi, um, claiming state of the art in SWE-Bench, which we're all going to talk about. I'm excited to dive into your whole journey, because it has been a journey. I've been lucky to be a small angel in part of that journey. And it's exciting to see that you're launching to such acclaim and, you know, such results.[00:05:01] Alistair and Cosine intro[00:05:01] swyx: Um, so I'll go over your brief background, and then you can sort of fill in the blanks on what else people should know about you. You did your bachelor's in computer science at Exeter.[00:05:10] Speaker 6: Yep.[00:05:10] swyx: And then you worked at a startup that got acquired into GoPuff and round about 2022, you started working on a stealth startup that became a YC startup.[00:05:19] What's that?[00:05:21] Alistair Pullen: Yeah. So basically when I left university, I, I met my now co-founder, Sam. At the time we were both mobile devs. He was an Android developer; I was an iOS developer. And whilst at university, we built this sort of small consultancy, sort of, we'd um, be approached to build projects for people and we would just take them up. To start with, they were student projects.[00:05:41] They weren't, they weren't anything crazy or anything big.
We started with those and over time we started doing larger and larger projects, more interesting things. And then actually, when we left university, we just kept doing that. We didn't really get jobs, traditional jobs. It was also like in the middle of COVID, middle of lockdown.[00:05:57] So we were like, this is a pretty good gig. We'll just keep like writing code in our bedrooms. And yeah, that's it. We did that for a while. And then a friend of ours that we went to Exeter with started a YC startup during COVID. And it was one of these fast grocery delivery companies. At the time I was living in the deepest, darkest countryside in England, where fast grocery companies are still not a thing.[00:06:20] So he, he sort of pitched me this idea and was like, listen, like I need an iOS dev, do you fancy coming along? And I thought, absolutely. It was a chance to get out of my parents' house, chance to move to London, you know, do interesting things. And at the time, truthfully, I had no idea what YC was. I had no idea.[00:06:34] I wasn't in the startup space. I knew I liked coding and building apps and stuff, but I'd never, never really done anything in that area. So I said, yes, absolutely. I moved to London just sort of as COVID was ending and yeah, worked at what was Fancy for about a year and a half. Then we brought Sam along as well.[00:06:52] So we, Sam and I, were the two engineers at Fancy for basically its entire life, and we built literally everything. So like the, the front, the client mobile apps, the, the backends, the internal like stock management system, the driver routing algorithms, all those things. Literally like everything. It was my first...[00:07:12] You know, both of us were super inexperienced. We didn't have, like, proper engineering experience. There were definitely decisions we'd do differently now. We'd definitely buy a lot of stuff off the shelf, stuff like that. But it was the initial dip of the toe into, like, the world of startups, and we were both, like, hooked immediately.[00:07:26] We were like, this is so cool. This sounds so much better than all our friends who were, like, consultants and doing, like, normal jobs, right? We did that, and it ran its course, and after, I want to say, 18 months or so, GoPuff came and acquired us. And there was obviously a transitionary period, an integration period, like with all acquisitions, and we did that, and as soon as we'd vested what we wanted to vest, and as soon as we thought, okay, this chapter is sort of done, uh, in about 2022, we left. And we knew that we wanted to go it alone and try something, like, we'd had this taste.[00:07:54] Now we knew we'd seen how, like, a YC startup was managed, like, up close, and we knew that we wanted to do something similar ourselves. We had no idea what it was at the time. We just knew we wanted to do something. So we, we tried a small, um, some small projects in various different areas, but then GPT-3.[00:08:12] He'd seen it on Reddit, and that's his source of all knowledge. Yeah, Sam loves Reddit. I'd actually heard of GPT-2. And obviously had like loosely followed what OpenAI had done with, what was the game they trained a model to play? Dota. Was it Dota? Yeah. So I'd followed that and, I knew loosely what GPT-2 was, I knew what BERT was, so I was like, Okay, this GPT-3 thing sounds interesting.[00:08:35] And he just mentioned it to me on a walk. And I then went home and, like, googled GPT-3. There was the playground, and the model was DaVinci 2 at the time.
And it was just the old school playground, completions, nothing crazy, no chat, no nothing. I miss completions though. Yeah. Oh, completions. Honestly, I had this conversation in OpenAI's office yesterday.[00:08:54] I was like, I just went. I know. But yeah, so we, we, um, I started playing around with the, the playground and the first thing I ever wrote into it was like, hello world, and it gave me some sort of like, fairly generic response back. I was like, okay, that looks pretty cool. The next thing was, I looked through the docs, um, also they had a lot of example prompts because I had no idea.[00:09:14] I didn't know if the, if you could put anything in, I didn't know if you had to structure it in a certain way or whatever, and I, and I saw that it could start writing like tables and JSON and stuff like that. So I was like, okay, can you write me something in JSON? And it did. And I was like, Oh, wow, this is, this is pretty cool.[00:09:28] Um, can it, can it just write arbitrary JSON for me? And, um, immediately as soon as I realized that my mind was racing and I like got Sam in and we just started messing around in the playground, like fairly innocently to start with. And then, of course, both being mobile devs and also seeing, at that point, we learned about what the Codex model was.[00:09:48] It was like, this thing's trained to write code, sounds awesome. And Copilot was starting, I think, I can't actually remember if Copilot had come out yet, it might have done. It's round about the same time as Codex. Round about the same time, yeah. And we were like, okay, as mobile devs, let's see what we can do.[00:10:02] So the initial thing was like, okay, let's see if we can get this AI to build us a mobile app from scratch. We eventually built the world's most flimsy system, which was back in the day with like 4,000 token context windows, like chaining prompts, trying to keep as much context from one to the other, all these different things, where basically, essentially, you'd put an app idea in a box, and then we'd do, like, very high level stuff, figuring out what the stack should be, figuring out what the frontend should be written in, backend should be written in, all these different things, and then we'd go through, like, for each thing, more and more levels of detail, until the point that you actually got Codex to write the code for each thing.[00:10:41] And we didn't do any templating or anything. We were like, no, we're going to write all the code from scratch every time, which is basically why it barely worked. But there were like occasions where you could put in something and it would build something that did actually run. The backend would run, the database would work.[00:10:54] And we were like, Oh my God, this is insane. This is so cool. And that's what we showed to our co-founder Yang. I met my co-founder Yang through, through Fancy because his wife was their first employee. And, um, we showed him and he was like, You've discovered fire. What is this? This is insane. He has a lot more startup experience.[00:11:12] Historically, he's had a few exits in the past and has been through all different industries. He's like our dad. He's a bit older. He hates me saying that. He's your COO now? He's our COO. Yeah. And, uh, we showed him and he was like, this is absolutely amazing. Let's just do something. Cause he, he, at the time, um, was just about to have a child, so he didn't have anything going on either.[00:11:29] So we, we applied to YC, got an interview.
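The "app idea in a box" system Pullen describes is a hierarchy of chained completions: decide the stack, list the components, then generate code per component while re-packing the high-level context each time. A minimal sketch of that pattern (ours; `complete` is a hypothetical stand-in for the old completions API):

```python
# Minimal sketch of the prompt-chaining pattern described: decompose an app
# idea top-down, re-packing earlier answers as context for each later prompt,
# since a 4,000-token window can't hold the whole conversation.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API of choice")

def build_app(idea: str) -> dict[str, str]:
    stack = complete(f"App idea: {idea}\nPick a frontend and backend stack:")
    components = complete(
        f"App idea: {idea}\nStack: {stack}\n"
        "List the components to build, one per line:"
    )
    files: dict[str, str] = {}
    for component in components.splitlines():
        if component.strip():
            files[component] = complete(
                f"App idea: {idea}\nStack: {stack}\n"
                f"Write the complete code for: {component}"
            )
    return files
```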
The interview was, as most YC interviews are, short, curt, and pretty brutal. They told us they hated the idea. They didn't think it would work. And that's when we started brainstorming. It was almost like the interview was like an office hours kind of thing. And we were like, okay, given what you know about the space now and how to build things with these LLMs, like what can you bring out of what you've learned in building that thing into something that might be a bit more useful to people on the daily, and also YC obviously likes B2B startups a little bit more, at least at the time they did, back then.[00:12:01] So we were like, okay, maybe we could build something that helps you with existing codebases, like can sort of automate development stuff with existing codebases, not knowing at all what that would look like, or how you would build it, or any of these things. And they were like, yeah, that sounds interesting.[00:12:15] You should probably go ahead and do that. You're in, you've got two weeks to build us an MVP. And we were like, okay, okay. We did our best. The MVP was absolutely horrendous. It was a CLI tool. It sucked. And, um, at the time we were like, we, we don't even know how to build what we want to build. And we didn't really know what we wanted to build, to be honest.[00:12:33] Like, we knew we wanted to try to help automate dev work, but back then we just didn't know enough about how LLM apps were built, the intricacies and all those things. And also, like, the LLMs themselves, like 4,000 tokens, you're not going very far, they're extremely expensive. So we ended up building a, uh, a codebase retrieval tool, originally.[00:12:51] Our thought process originally was, we want to build something that can do our jobs for us. That is like the gold star, we know that. We've seen like there are glimpses of it happening with our initial demo that we did. But we don't see the path of how to do that at the moment. Like the tech just wasn't there.[00:13:05] So we were like, well, there are going to be some things that you need to build this when the tech does catch up. So retrieval being one of the most important things, like the model is going to have to, like, pull code out of a codebase somehow. So we were like, well, let's just build the tooling around it.[00:13:17] And eventually when the tech comes, then we'll be able to just like plug it into our, our tooling and then it should work basically. And to be fair, that's basically what we've done. And that's basically what's happened, which is very fortunate. But in the meantime, whilst we were waiting for everything to sort of become available, we built this codebase retrieval tool.[00:13:34] That was the first thing we ever launched when we were in YC, and it didn't work. It was really frustrating for us because it was just me and Sam like working like all hours trying to get this thing to work. It was quite a big task in and of itself, trying to get like a good semantic search engine working that could run locally on your machine.[00:13:51] We were trying to avoid sending code to the cloud as much as possible. And then for very large codebases, you're like, you know, millions of lines of code. You're trying to do some sort of like local HNSW thing that runs inside your VS Code instance that like eats all your RAM as you've seen in the past.[00:14:05] All those different things. Yep. Yeah.[00:14:07] swyx: My first call with you, I had trouble. You were like, yeah, it sucks, man.[00:14:07] Alistair Pullen:
I know, I know. I know it sucks. I'm sorry. I'm sorry. But building all that stuff was essentially the first six to eight months of what at the time was Buildt. Which, by the way, Buildt. Buildt. Yeah, it was a terrible, terrible name.[00:14:25] It was the worst.[00:14:27] swyx: Like, part of trying to think about whether I would invest was whether or not people could pronounce it.[00:14:32] Alistair Pullen: No, when we, so when we went on our first ever YC, like, retreat, no one got the name right. They were like, build, build... well, um, and then we actually changed the name to Cosine, like, although some people would spell it cosign, as in, like, as if you're cosigning for an apartment or something. Like, you can't win.[00:14:49] Yeah. That was what Buildt was back then. But the ambition, and I did a talk on this back in the end of 2022, the ambition to like build something that essentially automated our jobs was still very much like core to what we were doing. But for a very long time, it was just never apparent to us. Like, how would you go about doing these things?[00:15:06] Even when, like, you had 3.5 16k, it suddenly felt huge, because you've gone from 4 to 16, but even then, 16k is like... a lot of Python files are longer than 16k. So you can't, you know, before you even start doing a completion, even then we were like, eh, yeah, it looks like we're still waiting. And then, like, towards the end of last year, you then start, you see 32k.[00:15:28] 32k was really smart. It was really expensive, but also, like, you could fit a decent amount of stuff in it. 32k felt enormous. And then, finally, 128k came along, and we were like, right, this is, like, this is what we can actually deal with. Because, fundamentally, to build a product like this, you need to get as much information in front of the model as possible, and make sure that everything it ever writes in output can be traced back to something in the context window, so it's not hallucinating it.[00:15:49] As soon as that model existed, I was like, okay, I know that this is now going to be feasible in some way. We'd done early sort of dev work on Genie using 3.5 16k. And that was a very, very like crude way of proving that this loop that we were after and the way we were generating the data actually had signal and worked and could do something.[00:16:16] But the model itself was not useful because you couldn't ever fit enough information into it for it to be able to do the task competently and also the base intelligence of the model. I mean, 3.5, anyone who's used 3.5 knows the base intelligence of the model is, is lacking, especially when you're asking it to like do software engineering, this is quite, quite involved.[00:16:34] GPT4o finetuning[00:16:34] Alistair Pullen: So, we saw the 128k context model and um, at that point we'd been in touch with OpenAI about our ambitions and like how we wanted to build it. We essentially, I just took a punt, I was like, I'm just going to ask to see, can we like train this thing? Because at the time 4 Turbo had just come out and back then there was still a decent amount of lag time between like OpenAI releasing a model and then allowing you to fine-tune it in some way.[00:16:59] They've gotten much better about that recently, like 4o fine-tuning came out, I think, a day after... 4o mini fine-tuning came out like a day after the model did.
And I know that's something they're definitely like, optimising for super heavily inside, which is great to see.[00:17:11] swyx: Which is a little bit, you know, for a year or so, YC companies had like a direct Slack channel to OpenAI.[00:17:17] We still do. Yeah. Yeah. So, it's a little bit of a diminishing of the YC advantage there. Yeah. If they're releasing this fine-tuning ability like a day after.[00:17:23] Alistair Pullen: Yeah, no, no, absolutely. But like, you can't build a startup otherwise. The advantage is obviously nice and it makes you feel fuzzy inside. But like, at the end of the day, it's not that that's going to make you win.[00:17:34] But yeah, no, so like we'd spoken to Shyamal there, Devrel guy, I'm sure you know him. I think he's head of solutions or something. In their applied team, yeah, we'd been talking to him from the very beginning when we got into YC, and he's been absolutely fantastic throughout. I basically had pitched him this idea back when we were doing it on 3.5 16k,[00:17:53] and I was like, this is my, this is my crazy thesis. I want to see if this can work. And as soon as like that 128k model came out, I started like laying the groundwork. I was like, I know this definitely isn't possible because they released it like yesterday, but know that I want it. And in the interim, like, GPT-4, like, 8k fine-tuning came out.[00:18:11] We tried that, it's obviously even fewer tokens, but the intelligence helped. And I was like, if we can marry the intelligence and the context window length, then we're going to have something special. And eventually, we were able to get on the Experimental Access Program, and we got access to 4 Turbo fine-tuning.[00:18:25] As soon as we did that, because in the entire run up to that we built the data pipeline, we already had all that set up, so we were like, right, we have the data, now we have the model, let's put it through and iterate, essentially, and that's, that's where, like, Genie as we know it today, really was born. I won't pretend like the first version of Genie that we trained was good.[00:18:45] It was a disaster. That's where you realize all the implicit biases in your data set. And you realize that, oh, actually this decision you made that was fairly arbitrary was the wrong one. You have to do it a different way. Other subtle things like, you know, how you write Git diffs using LLMs and how you can best optimize that to make sure they actually apply and work and loads of different little edge cases.[00:19:03] But as soon as we had access to the underlying tool, we were like, we can actually do this. And I, I breathed a sigh of relief because, I didn't know, it was like, it wasn't a done deal, but I knew that we could build something useful. I mean, I knew that we could build something that would be measurably good on whatever eval at the time that you wanted to use.[00:19:23] Like at the time, back then, we weren't actually that familiar with SWE-Bench. But once Devin came out and they announced their SWE-Bench score, I... like, that's when my life took a turn. Challenge accepted. Yeah, challenge accepted. And that's where like, yes, that's where my friendships have gone. My sleep has gone. My weight.[00:19:40] Everything went into SWE-Bench and yeah, we, we, it was actually a very useful tool in building Genie beforehand. It was like, yes, vibe check this thing and see if it's useful. And then all of a sudden you have a, an actual measure to, to see like, could it do software engineering?
Not the best measure, obviously, but it's the best that we've got now.[00:19:57] We just iterated and built, and eventually we got it to the point where it is now, and a little bit beyond, since we actually got that score a couple of weeks ago. And yeah, it's been a hell of a journey from the beginning all the way to now. That was a very rambling answer to your question about how we got here, but that's essentially the potted answer of how we got here.[00:20:16] swyx: Got the full origin story out.[00:20:17] Alessio: Yeah, no, totally.[00:20:18] Genie Data Mix[00:20:18] Alessio: You mentioned bias in the data and some of these things. In your announcement video, you called Genie the world's first AI software engineering colleague. And you kind of highlighted how the data needed to train it needs to show how a human engineer works. I think maybe you're contrasting that to just putting code in it.[00:20:37] There's a lot more than code that goes into software engineering. How do you think about the data mixture? You know, there's this kind of known truth that code makes models better when you put it in the pre-training data, but since we put so much in the pre-training data, what else do you add when you train Genie?[00:20:54] Alistair Pullen: Yeah, I think that fundamentally boils down to the difference between a model writing code and a model doing software engineering, because the software engineering discipline goes wider. Because if you look at something like a PR, that is obviously an artifact of some thought and some work that has happened and has eventually been squashed into, you know, some diffs, right?[00:21:17] Very crudely, what the pre-trained models are reading is those final diffs, and they're emulating that and being able to output it, right? But of course, a PR is a super lossy thing. You have no idea why or how, for the most part, unless there are some comments, which, you know, anyone who's worked in a company realizes PR reviews can be a bit dodgy at times. But you see that you lose so much information at the end, and that's perfectly fine, because PRs aren't designed to be something that perfectly preserves everything that happened. But what we realized was that if you want something that's a software engineer, and very crudely, we started with something that can do PRs for you, essentially, you need to be able to figure out why those things happened.[00:21:58] Otherwise, you essentially just have a code-writing model, you have something that's good at HumanEval, but not very good at SWE-Bench. Essentially, that realization was part of the kernel of the idea of the approach that we took to design the agent that is Genie. The way that we decided we want to try to extract what happened in the past, as forensically as possible, has been and is currently one of the main things that we focus all our time on, because doing that, getting as much signal out as possible, doing that as well as possible, is the biggest thing that we've seen that determines how well we do on that benchmark at the end of the day.
Once you've sorted things out, like output structure, how to get it consistently writing diffs, and all the stuff that is ancillary to the model actually figuring out how to solve a problem, the core bit of solving the problem is: how did the human solve this problem, and how can we best come up with how the human solved these problems?[00:22:54] So all the effort went in on that. And the mix that we ended up with was, as you've probably seen in the technical report and so on, all of those different languages and different combinations of different task types. All of that has run through that pipeline, and we've extracted all that information out.[00:23:09] Customizing for Customers[00:23:09] Alessio: How does that differ when you work with customers that have private workflows? Like, is there usually a big delta between what you get in open source and public data versus...?[00:23:19] Alistair Pullen: Yeah, yeah, yeah. When you scrape enough of it, most of open source is updating readmes and docs. It's hilarious. We had to filter out so much of that stuff, because when we first did the 16k model, the amount of readme updating that went in... we did no data cleaning, no real anything, we just sort of threw it in and saw what happened.[00:23:38] And it was really good at updating readmes, really good at writing some comments, really good at complaining in PR reviews. And, again, because we didn't clean the data, you'd give it some feedback and it would just reply, and it would be quite insubordinate when it was getting back to you, like, no, I don't think you're right, and it would just sort of argue with you. So the process of doing all that was super interesting, because we realized from the beginning, okay, there's a huge amount of work that needs to go into cleaning this, getting it aligned with what we want the model to do, to be able to get the model to be useful in some way.[00:24:12] Alessio: I'm curious how you think about the customer willingness to share all of this historical data. I've done a lot of developer tools investing in my career, and getting access to the code base is always one of the hard things. Are people getting more cautious about sharing this information? In the past, it was maybe like, you know, you're using a static analysis tool, like whatever else you need to plug into the code base, fine.[00:24:35] Now you're building a model based on it. What's the discussion going into these companies? Are most people comfortable with letting you see how they work and sharing everything?[00:24:44] Alistair Pullen: It depends on the sector, mostly. We've actually seen people becoming more amenable to the idea over time, rather than more skeptical, because I think they can see the upside.[00:24:55] If this thing does what they say it does, it's going to be more of a help to us than it is a risk to our infosec.
Um, and of course, companies building in this space are all going to end up, you know, complying with the same rules, and there are going to be new rules that come out to make sure that when we're looking at your code, everything is safe, and so on.[00:25:12] So from what we've seen so far, we've spoken to some very large companies that you've definitely heard of, and all of them obviously have stipulations, and many of them want it to be sandboxed to start with, and all the very obvious things that I, you know, would say as well. But they're all super keen to have a go and see, because, despite all those things, if we can genuinely make them go faster, allow them to build more in a given time period and stuff, it's super worth it to them.[00:25:37] Genie Workflow[00:25:37] swyx: Okay, I'm going to dive in a little bit on the process that you have created. You showed the demo on your video, and by the time that we release this, you should be taking people off the waitlist and launching people, so people can see this themselves. There are four main parts of the workflow: finding files, planning action, writing code, and running tests.[00:25:58] And controversially, you have set yourself apart from the Devins of the world by saying that things like having access to a browser are not that important for you. Is that an accurate reading of what you wrote?[00:26:09] Alistair Pullen: I don't remember saying that, but at least with what we've seen, the browser is helpful, but it's not as helpful as, like, RAGing the correct files, if that makes sense.[00:26:20] Like, it is still helpful, but obviously there are more fundamental things you have to get right before you get to, oh yeah, you can read some docs, or you can read a Stack Overflow article, and stuff like that.[00:26:30] swyx: Yeah, the phrase I was indexing on was: the other software tools are wrappers around foundational models with a few additional tools, such as a web browser or code interpreter.[00:26:38] Alistair Pullen: Oh, I see. No, I'm not deriding the tools there, I'm deriding the approach.[00:26:44] swyx: Yeah, exactly. So, I would say in my standard model of what a code agent should look like, Devin has been very influential, obviously. Yeah. Because you could just add the docs of something.[00:26:54] Mm-hmm. And like, you know, now when I'm installing a new library, I can just add docs. Cursor also does this. Right. And then obviously having a code interpreter does help. I guess you have that in the form of running tests.[00:27:03] Alistair Pullen: I mean, Genie has both of those tools available to it as well.[00:27:08] So, yeah, we have a tool where you can put in URLs and it will just read the URLs, and we also use Perplexity's API under the hood to be able to actually ask questions if it wants to. Okay. So, no, we use both of those tools as well.
Like, those tools are super important and super key.[00:27:24] I think obviously the most important tools to these agents are being able to retrieve code from a code base, being able to read Stack Overflow articles and what have you, and just being able to essentially Google like we do. That's definitely super useful.[00:27:38] swyx: Yeah, I thought maybe we could just kind of dive into each of those actions.[00:27:41] Code Retrieval[00:27:41] swyx: Code retrieval, one of the core indexers that you've worked on, even as Buildt. What makes it hard? What approach did you think would work that didn't work? Anything like that.[00:27:52] Alistair Pullen: It's funny, I had a similar conversation to this when I was chatting to the guys from OpenAI yesterday. The thing is that searching for code, specifically semantically... at least to start with, I mean, keyword search and stuff like that is a solved problem.[00:28:06] It's been around for ages. But the phrase we always used back in the day was searching for what code does rather than what code is. Searching for functionality is really hard. Really hard. The way that we approached that problem... obviously a very basic and easy approach is, right, let's just embed the code base. We'll chunk it up in some arbitrary way, maybe using an AST, maybe using number of lines, maybe using whatever, some overlapping, just chunk it up and embed it. And once you've done that, I'll write a query saying, like, find me some authentication code or something, embed it, and then do the cosine similarity and get the top K, right?[00:28:43] That doesn't work. And I wish it did work, don't get me wrong. It doesn't work well at all, because fundamentally, if you think about it semantically, how code looks is very different to how English looks, and there's not a huge amount of signal that's carried between the two. So the first approach we took, and that did well enough for a long time, was: okay, let's train a model to be able to take in English code queries and then produce a hypothetical code snippet that might look like the answer, embed that, and then do the cosine similarity.[00:29:18] And that process, although very simple, gets you so much more performance out of the retrieval accuracy. And that was kind of like the start of our engine, as we called it, which is essentially the aggregation of all these different heuristics, like semantic, keyword, LSP, and so on. And then we essentially had a model that would, given an input, choose which ones it thought were most appropriate, given the type of request you had.[00:29:45] So the whole code search thing was a really hard problem. And actually what we ended up doing with Genie is we let the model, through self-play, figure out how to retrieve code. So we don't use our engine for Genie. So instead of a request coming in and then, say, GPT-4 with some JSON output being like, well, I think here we should use a keyword search with these inputs and then we should use semantic, and then we should pick these results, it's actually: a question comes in, and Genie has self-played in its training data to be able to be like, okay, this is how I'm going to approach finding this information. Much more akin to how a developer would do it. Because if I was like, Shawn, go into this new code base you've never seen before,[00:30:26] and find me the code that does this...
You're probably going to do some keyword searches, you're going to look over the file system, you're going to try to figure out from the directories and the file names where it might be, you're going to jump into one, and then once you're in there, you're probably going to be doing the, you know, go-to-definition stuff to jump from file to file and try to use the graph to get closer and closer.[00:30:46] And that is exactly what Genie does. It starts on the file system, looks at the file system, picks some candidate files, asks, is this what I'm looking for, yes or no? And if there's something that's interesting, like an import or something, it can command-click on that thing, go to definition, go to references, and so on.[00:31:00] And it can traverse the codebase that way.[00:31:02] swyx: Are you using the VS Code, uh, LSP, or?[00:31:05] Alistair Pullen: No, we're not doing this in VS Code, we're just using the language servers running. But we really wanted to try to mimic the way we do it as best as possible, and we did that during the self-play process when we were generating the dataset.[00:31:18] Although we did all that work originally, and although Genie still has access to these tools, so it can do keyword searches, and it can do basic semantic searches, and it can use the graph, it uses them through this process and figures out, okay, I've learned from data how to find stuff in codebases. And I think in our technical report, I can't remember the exact number, but I think it was around 65 or 66 percent retrieval accuracy overall, measured as: we know what lines need to be found for the task to actually be completed, and we found about 66 percent of all those lines. That is one of the biggest areas of free performance that we can get a hold of, because when we were building Genie, truthfully, a lot more focus went on: assuming you found the right information, assuming you've been able to reproduce the issue, how do you then go about solving it?[00:32:08] And the bulk of the work we did was on the solving. But when you go higher up the funnel, obviously, the funnel looks like: have you found everything you need for the task? Are you able to reproduce the problem that's seen in the issue? Are you then able to solve it? And the funnel gets narrower as you go down.[00:32:22] And at the top of the funnel, of course, is RAG. So I'm actually quite happy with that score. I think it's still pretty impressive considering the size of some of the codebases we're using for this. But if that number becomes 80, think how many more tasks we get right. That's one of the key areas we're going to focus on when we continue working on Genie.[00:32:37] It'd be interesting to break out a benchmark just for that.[00:32:41] swyx: Yeah, because I don't know what state of the art is.[00:32:43] Alistair Pullen: I mean, it's super easy, because for a given PR, you know what lines were edited.[00:32:50] swyx: Oh, okay, you can source it from SWE-Bench, actually.[00:32:51] Alistair Pullen: Yeah, you can do it super easily. And that's how we got that figure out at the other end. For us, being able to see it against our historic models was super useful, so we could see if we were, you know, actually helping ourselves or not.
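As an aside for readers who want to see the two retrieval approaches contrasted above in code, here is a minimal Python sketch. The `embed` function is a placeholder for any real text-embedding model, and `generate_snippet` stands in for the query-to-hypothetical-code model Alistair describes (similar in spirit to the HyDE technique); all names here are illustrative, not Cosine's actual code.

```python
# Sketch of naive English-to-code embedding retrieval vs. the
# hypothetical-snippet variant described in the conversation above.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed(text: str) -> np.ndarray:
    # Placeholder: swap in a real embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(256)

def naive_search(query, chunks, k=5):
    # Baseline: embed the English query directly and rank code chunks.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]

def hypothetical_snippet_search(query, chunks, generate_snippet, k=5):
    # Variant: first generate a hypothetical code snippet that might look
    # like the answer, embed *that*, and compare code to code.
    q = embed(generate_snippet(query))
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]
```

The only difference between the two functions is what gets embedded on the query side, which is exactly the change credited above with the jump in retrieval accuracy.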
And initially, one of the biggest performance gains that we saw, when we did work on the RAG a bit, was giving it the ability to use the LSP to go to definition, and really trying to get it to emulate how we do that. Because I'm sure when you go into an editor where the LSP is not working or whatever, you suddenly feel really disarmed and naked.[00:33:20] You're like, oh my god, I didn't realize how much I actually used this to get about, rather than just find stuff. So we really tried to get it to do that, and that gave us a big jump in performance. We went from like 54 percent up to the 60s just by focusing on that.[00:33:34] swyx: One weird trick. Yes.[00:33:37] I'll briefly comment here. So this is the standard approach I would say most code tooling startups are pursuing. The one company that's not doing this is magic.dev. So would you do things differently if you had a 10 million[00:33:51] Alistair Pullen: token context window? If I had a 10 million token context window and hundreds of millions of dollars, I wouldn't have gone and built... well, it's an LTM, it's not a transformer, right, that they're using?[00:34:03] If I'm not mistaken, I believe it's not a transformer. Yeah, Eric's going to come on at some point. Listen, they obviously know a lot more about their product than I do. I don't know a great deal about how Magic works. I don't think he knows anything yet. I'm not going to speculate. Would I do it the same way as them?[00:34:17] I like the way we've done it, because fundamentally we focus on the active software engineering and what that looks like, and showing models how to do that. Fundamentally, the underlying model that we use is kind of immaterial to us. So long as it's the best one, I don't mind. And the context windows... we've already seen you can get transformers to have, like, million, one-and-a-half-million token context windows.[00:34:43] And that works perfectly well. So as soon as you can fine-tune Gemini 1.5, then you best be sure that Genie will run on Gemini 1.5, and we'll probably get very good performance out of that. I like our approach because we can be super agile and be like, oh, well, Anthropic have just released whatever, and it might have half a million tokens and it might be really smart.[00:35:01] And I can just immediately take my JSONL file and dump it in there, and suddenly Genie works on there and it can do all the new things.[00:35:07] swyx: Does Anthropic have the same fine-tuning support as OpenAI?[00:35:11] Alistair Pullen: I actually haven't heard of anyone doing it, because they're still working on it. They're partnered with AWS and it's going to be in Bedrock.[00:35:16] Okay. As far as I know, I think that's true. Um, cool. Yeah.[00:35:20] Planning[00:35:20] swyx: We have to keep moving on to the other segments. Sure. Planning, the second piece of your four-step grand master plan: that is the frontier right now. You know, a lot of people are talking about Strawberry, Q*, whatever that is.[00:35:32] Monte Carlo Tree Search. Is current state-of-the-art planning good enough? What prompts have worked? I don't even know what questions to ask.
Like, what is the state of planning?[00:35:41] Alistair Pullen: I think it's fairly obvious that with the foundational models you can ask them to think step by step and ask them to plan and stuff, but that isn't enough, because if you look at how those models score on these benchmarks, they're not even close to state of the art.[00:35:52] swyx: Which ones are you referencing? Benchmarks?[00:35:53] Alistair Pullen: So, like, SWE-Bench and so on, right? And even the things that get really good scores on HumanEval, or agents as well, because they have these loops, right? Obviously these things can reason, quote unquote, but the reasoning is constrained by the model's intelligence, I'd say, very crudely.[00:36:10] And what we essentially wanted to do was... we still thought that, obviously, reasoning is super important, we need it to get the performance we have. But we wanted the reasoning to emulate how we think about problems when we're solving them, as opposed to how a model thinks about a problem when it's solving it.[00:36:23] And that's obviously part of the derivation pipeline that we have when we design our data. But on the reasoning that the models do right now, and who knows what Q*, or whatever it ends up being called, looks like... on a small tangent to that, what I'm really excited about is that when models like that come out, the signal in my data, when I regenerate it, goes up.[00:36:44] And then I can train that model, which is already better at reasoning, with improved reasoning data, and I can keep bootstrapping and keep leapfrogging every single time. And that is super exciting to me, because I welcome new models so much: immediately they just float me up without me having to do much work, which is always nice.[00:37:02] But on the state of reasoning generally, I don't see it going away anytime soon. I mean, an autoregressive model doesn't think, per se. In the absence of having any actual thought... maybe an energy-based model or something like that, maybe that's what Q* is, who knows, some sort of high-level, abstract space where thought happens before tokens get produced.[00:37:22] In the absence of that, for the moment, I think it's all we have and it's going to have to be the way it works. For what happens in the future, we'll have to see, but I think it's certainly never going to hinder performance to do it. And certainly, the reasoning that we see Genie do, when you compare it to asking GPT-4 to break down a step-by-step approach for the same problem, at least on a vibe check alone, looks far better.[00:37:46] swyx: Two elements that I like, that I didn't see in your initial video, and we'll see when this Genie launches: one is a planner chat, where I can modify the plan while it's executing, and the other thing is playbooks, which is also from Devin, where it's like, here's how I like to do a thing, and I'll use Markdown to specify how I do it.[00:38:06] I'm just curious if, you know,[00:38:07] Alistair Pullen: those things help. Yeah, no, absolutely, a hundred percent. We want everything to be editable. Not least because it's really frustrating when it's not.
Like, if you're ever in a situation where there's one thing you just wish you could change, and you'd be right if that one thing were changed, but you can't change it...[00:38:21] So we're going to make everything editable, including the code it writes. If it makes a small error in a patch, you can just change it yourself and let it continue, and it will be fine. Yeah. So those things are super important. We'll be doing those two.[00:38:31] Alessio: I'm curious, once you get to writing code, is most of the job done?[00:38:35] I feel like the models are so good at writing code when they're given small chunks that are very well instructed. What's the drop-off in the funnel once you've got the right files and you've got the right plan?[00:38:47] Alistair Pullen: That's a great question, because by the time this is out, there'll be another blog post, which contains all the information, all the learnings that I delivered to OpenAI's fine-tuning team when we finally got the score.[00:38:59] Oh, that's good. Um, go for it. It's already up. And, um, yeah. I don't have it on my phone, but basically I broke down the log probs. I basically got the average log prob for a token at every token position in the context window. So imagine an x-axis from 0 to 128k, and then the average log prob for each index in there.[00:39:19] As we discussed, the way Genie works normally is, you know, at the beginning you do your RAG, and then you do your planning, and then you do your coding, and that sort of cycle continues. Code writing is so much more certain than every other aspect of Genie's loop. So whatever's going on under the hood, the model is really comfortable with writing code.[00:39:35] There is no doubt, and it's right there in the token probabilities. One slightly different thing, I think, to how most of these models work is that, for the most part, if you ask GPT-4 in ChatGPT to edit some code for you, it's going to rewrite the entire snippet for you with the changes in place. We trained Genie to write diffs and, you know, essentially patches, right?[00:39:55] Because it's more token-efficient, and also, fundamentally... we don't write patches as humans, but the result of what we do is a patch, right? When Genie writes code, I don't know how much it's leaning on the pre-training, like, code-writing corpus, because obviously it's just read code files there.[00:40:14] It's probably read a lot of patches, but I would wager it's read more code files than it has patches. So it's probably leaning on a different part of its brain, is my speculation. I have no proof for this. So I think the discipline of writing code is slightly different, but certainly it's in its most comfortable state when it's writing code.[00:40:29] So once you get to that point, so long as you're not too deep into the context window... another thing that I'll bring up in that blog post is that the performance of Genie over the length of the context window degrades fairly linearly. So I actually broke it down by probability of solving a SWE-Bench issue, given the number of tokens in the context window.[00:40:49] At 60k, it's basically 0.5. So if you go over 60k in context length, you are more likely to fail than you are to succeed, just based on the number of tokens you have in the context window.
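As a rough sketch of the two analyses described here, assuming you can export per-token log probs and per-example context lengths from your eval runs (both inputs below are hypothetical stand-ins for that data):

```python
# Bucket average token log prob by position in the context window, and
# solve rate by context length, mirroring the analysis described above.
from collections import defaultdict

def avg_logprob_by_position(token_logprobs, bucket_size=1024):
    # token_logprobs: iterable of (position, logprob) pairs pooled across
    # many eval examples; returns {bucket start: mean log prob}, i.e. an
    # x-axis running from 0 to 128k.
    sums, counts = defaultdict(float), defaultdict(int)
    for pos, lp in token_logprobs:
        b = pos // bucket_size
        sums[b] += lp
        counts[b] += 1
    return {b * bucket_size: sums[b] / counts[b] for b in sorted(sums)}

def solve_rate_by_context_length(results, bucket_size=10_000):
    # results: iterable of (context_tokens, solved) per benchmark attempt.
    # The 0.5 crossover mentioned above would show up near the 60k bucket.
    wins, totals = defaultdict(int), defaultdict(int)
    for tokens, solved in results:
        b = tokens // bucket_size
        wins[b] += int(solved)
        totals[b] += 1
    return {b * bucket_size: wins[b] / totals[b] for b in sorted(totals)}
```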
And when I presented that to the fine-tuning team at OpenAI, that was super interesting to them as well. And that is more of a foundational model attribute than it is an us attribute.[00:41:10] However the attention mechanism works in GPT-4, however they deal with the context window at that point, is influencing how Genie is able to perform, even though obviously all our training data is perfect, right? So even if stuff is being solved at 110,000 tokens, sort of that area,[00:41:28] the training data still shows it being solved there. It's just that in practice, the model is finding it much harder to solve stuff down that end of the context window.[00:41:35] Alessio: Does that scale with the context? So for a 200k context size, is 100k tokens the 0.5 point? I don't know.[00:41:43] Alistair Pullen: I, um, hope not. I hope you can't just take the context length, halve it, and say, oh, this is the usable context length.[00:41:50] But what's been interesting is that actually really digging into the data, looking at the log probs, looking at how it performs over the entire window, has influenced the short-term improvements we've made to Genie since we got that score. So we actually made some small optimizations, as best we can without overdoing it, to try to make sure that we can artificially make stuff sit within that sort of range, because we know that's our battle zone.[00:42:17] And if we go outside of that, we're starting to push the limits, and we're more likely to fail. So just doing that sort of analysis has been super useful, without actually messing with anything more structural, in getting more performance out of it.[00:42:29] Language Mix[00:42:29] Alessio: What about different languages? So, in your technical report, the data makes sense.[00:42:34] 21 percent JavaScript, 21 percent Python, 14 percent TypeScript, 14 percent TSX, which is JavaScript, JavaScript...[00:42:42] Alistair Pullen: Yeah, yeah, yeah. It's like 49 percent JavaScript. That's true, although TypeScript is so much superior, but anyway.[00:42:46] Alessio: How good is it at just generalizing? You know, if you're writing Rust or C or whatever else, it's quite different.[00:42:55] Alistair Pullen: It's pretty good at generalizing. Obviously, though, I think there are 15 languages in that technical report that we've covered. The ones that we picked in the highest mix were, selfishly, the ones that we internally use the most, and also, I'd argue, some of the most popular ones.[00:43:11] When we have more resource as a company, and more time, and, you know, once all the craziness that has just happened dies down a bit, we are going to work on that mix. I'd love to see everything ideally be represented at a similar level: if you took GitHub as a data set, if you took how the languages are broken down in terms of popularity, that would be my ideal data mix to start.[00:43:34] It's just that it's not cheap. So, um, trying to have an equal amount of Ruby and Rust and all these different things is just, at our current state, not really what we're looking for.[00:43:46] Running Code[00:43:46] Alessio: There's a lot of good Ruby in my GitHub profile. You can have it all. Well, okay, we'll just train on that.
For running tests, it sounds easy, but it isn't, especially when you're working in enterprise codebases that are very hard to spin up.[00:43:58] Yes. How do you set that up? How do you make a model actually understand how to run a codebase, which is different from writing code for a codebase?[00:44:07] Alistair Pullen: The model itself is not in charge of setting up the codebase and running it. Genie sits on top of GitHub, and if you have CI running on GitHub, if you have GitHub Actions and stuff like that, then Genie essentially makes a call out to that, runs your CI, sees the outputs, and then moves on.[00:44:23] Making the model itself set up a repo wasn't scoped into what we wanted Genie to be able to do, because for the most part, at least, most enterprises have some sort of CI pipeline running, and even a lot of hobbyist software development has some sort of basic CI running as well.[00:44:40] And that was the lowest-hanging-fruit approach that we took. So when Genie ships, the way it will run its own code is that it will basically run your CI and take the... I'm not in charge of writing this part, the rest of the team is, but I think it's the Checks API on GitHub that allows you to grab that information and throw it in the context window.[00:44:56] Alessio: What's the handoff like with the person? So, with Genie, you give it a task, and then how long are you supposed to supervise it for? Or are you just waiting for the checks to eventually run, and then you see how it goes? What does it feel like?[00:45:11] Alistair Pullen: There are a couple of modes that it can run in, essentially.[00:45:14] It can run in fully headless autonomous mode, so say you assign it a ticket in Linear or something, then it won't ask you for anything. It will just go ahead and try. Or if you're in the GUI on the website and you're using it, then you can give it a task and it might choose to ask you a clarifying question.[00:45:30] So if you ask it something super broad, it might just come back to you and say, what does that actually mean? Or, can you point me in the right direction for this? Because our decision internally was that it's going to piss people off way more if it just goes off and makes a completely[00:45:45] ruined attempt at it because it got the wrong idea from day one. So it can ask you a lot of questions. And once it's going, much like a regular PR, you can leave review comments, issue comments, all these different things. And because it's been trained to be a software engineering colleague, it responds in actually a better way than a real colleague, because it's less snarky and less high and mighty.[00:46:08] And also the amount of filtering it has to do... when you train a model to be a software engineer, essentially, it's like you can just do anything, and it's like, yeah, looks good to me, bro.[00:46:17] swyx: Let's ship it.[00:46:19] Finetuning with OpenAI[00:46:19] swyx: I just wanted to dive in a little bit more on your experience with the fine-tuning team. John Allard was publicly very supportive in his commentary and, you know, was part of it.[00:46:27] What's it like working with them? I also picked up that you initially started to fine-tune what was publicly available, the 16k to 32k range. You got access to do more than that. Yeah.
You've also trained on billions of tokens instead of the usual millions range. Just take us through that fine-tuning journey and any advice that you might have.[00:46:47] Alistair Pullen: It's been so cool, and this will be public by the time this goes out: OpenAI themselves have said we are pushing the boundaries of what is possible with fine-tuning. We are right on the edge, and we are genuinely working with them in figuring out how stuff works, what works, what doesn't work, because no one else is doing what we're doing.[00:47:06] They have found what we've been working on super interesting, which is why they've allowed us to do so much interesting stuff. Working with John... I mean, I had a really good conversation with John yesterday. We had a little brainstorm after the video we shot. And one of the things you mentioned, the billions of tokens: one of the things we've noticed, and it's actually a very interesting problem for them as well, is[00:47:28] how big your PEFT adapter, your LoRA adapter, is going to be, and figuring that out is actually a really interesting problem. Because if you make it too big... and because they support data sets that are so small, you can put like 20 examples through it or something like that... if you had a really sparse, large adapter, you're not going to get any signal in that at all.[00:47:44] So they have to dynamically size these things, and there is an upper bound. And actually we use adapters that are larger than what's publicly available. It's not publicly available yet, but when this goes out, it will be. We have larger LoRA adapters available to us, just because of the amount of data that we're pumping through.[00:48:01] And at that point, you start seeing really interesting other things, like you have to change your learning rate schedule and do all these different things that you don't have to do when you're on the smaller end of things. So working with that team is such a privilege, because obviously they're at the top of their field in, you know, the fine-tuning space.[00:48:18] So as we learn stuff, they're learning stuff. And one of the things that I think really catalyzed this relationship is that when we first started working on Genie, I delivered them a presentation, which will eventually become the blog post that you'll love to read soon. The information I gave them there, I think, is what showed them, like, oh wow, okay, these guys are really pushing the boundaries of what we can do here.[00:48:38] And truthfully, we view our data set right now as very small. It's the minimum that we're literally able to afford right now to be able to produce a product like this. And it's only going to get bigger. So yesterday, while I was in their offices, we were planning, like, okay, this is where we're going in the next six to twelve months. We're putting our foot on the gas here, because this clearly works. I've demonstrated this is a good... you know, the best approach so far, and I want to see where it can go. I want to see what the scaling laws are like for the data. And at the moment it's hard to figure that out, because you don't know when you're running into saturating a PEFT adapter, as opposed to actually hitting the model's limit. Like, where is that? So finding all that stuff out is the work we're actively doing with them.
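For intuition on the adapter-sizing tension described here, a back-of-envelope calculation helps: LoRA adds rank x (d_in + d_out) trainable parameters per adapted weight matrix, so adapter capacity scales linearly with rank. The layer and dimension numbers below are illustrative (roughly GPT-3-sized public figures), not anything disclosed about OpenAI's models.

```python
def lora_param_count(layers: int, d_model: int, rank: int) -> int:
    # Assume LoRA on the four attention projections (q, k, v, o) per layer;
    # each is a (d_model x d_model) matrix gaining rank * 2 * d_model params.
    return layers * 4 * rank * 2 * d_model

for rank in (8, 64, 512):
    n = lora_param_count(layers=96, d_model=12288, rank=rank)
    print(f"rank {rank:4d}: {n / 1e6:8.1f}M trainable params")
# A 20-example dataset spread over tens or hundreds of millions of adapter
# parameters gives almost no signal per parameter; hence the dynamic sizing.
```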
And yeah, it's going to get more and more collaborative over the next few weeks as we explore larger adapters, pre-training extension, different things like that.[00:49:27] swyx: Awesome. I also wanted to talk briefly about the synthetic data process.[00:49:32] Synthetic Code Data[00:49:32] swyx: One of your core insights was that the vast majority of the time, the code that is published by a human is in a working state. And actually you need to fine-tune on non-working code. So just, yeah, take us through that inspiration. How many rounds did you do?[00:49:47] Alistair Pullen: Yeah, I mean, it might be generous to say that the vast majority of code is in a working state. I don't know if I believe that. I was like, that's very nice of you to say that my code works. Certainly, it's not true for me. No, but you're right, it's an interesting problem. And what we saw was that when we didn't do that, you basically have to hope the model one-shots the answer.[00:50:07] Because after that, it's like, well, I've never seen iteration before. How am I supposed to figure out how this works? So what you're alluding to there is the self-improvement loop that we started working on. And that was in sort of two parts. We synthetically generated runtime errors, where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.[00:50:39] So we threw some of those in with a probability of happening. And on the self-improvement side, I spoke about this in the blog post: essentially, the idea is that you generate your data in batches. The first batch is perfect, like, one example: here's the problem, here's the answer, go, train the model on it.[00:50:57] And then for the second batch, you take the model that you trained before, which can look like one commit into the future, and you let it have the first attempt at solving the problem. And if it gets it wrong, then you have, like, okay, now the codebase is in this incorrect state, but I know what the correct state is, so I can do some diffing, essentially, to figure out how to get from the state it's in now to the state I want it in. And then you can train the model to produce that diff next, and so on, and so on, so the model can learn, and also reason as to why it needs to make these changes, to be able to solve problems iteratively and learn from its mistakes and stuff like that.[00:51:35] Alessio: And you picked the size of the data set just based on how much money you could spend generating it. Maybe you think you could just make more and get better results?[00:51:42] Alistair Pullen: How many multiples of my monthly burn do I spend doing this? Yeah. Basically, it was very much related to capital, and, um, with any luck, that will be alleviated[00:51:53] swyx: very soon.[00:51:54] Alistair Pullen: Yeah.[00:51:54] SynData in Llama 3[00:51:54] swyx: Yeah. I like drawing references to other things that are happening in the wild, 'cause we only get to release this podcast once a week. Mm-hmm. The Llama 3 paper also had some really interesting
thoughts on synthetic data for code. I don't know if you have reviewed that. I'll highlight the back-translation section,[00:52:11] because one of your dataset focuses is updating documentation. I think that translation between natural language, English versus code, and
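To ground the synthetic runtime-error idea from the exchange above, here is a minimal sketch of one of the perturbations Alistair lists: messing with the AST so the code refers to a variable that doesn't exist. A toy illustration of the concept, not Cosine's pipeline.

```python
# Break working code by renaming the first loaded variable reference,
# producing a realistic NameError for the model to learn to fix.
import ast

class BreakOneName(ast.NodeTransformer):
    def __init__(self):
        self.done = False

    def visit_Name(self, node):
        if not self.done and isinstance(node.ctx, ast.Load):
            self.done = True
            bad = ast.Name(id=node.id + "_undefined", ctx=node.ctx)
            return ast.copy_location(bad, node)
        return node

source = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
tree = BreakOneName().visit(ast.parse(source))
broken = ast.unparse(ast.fix_missing_locations(tree))
print(broken)  # now iterates over `xs_undefined` -> NameError at runtime
```

Pairing the broken state with the diff back to the original gives exactly the kind of error-plus-fix training example the self-improvement loop needs.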
In this episode, we talk with Fernando De Come, Data & AI Solution Engineer at Oracle, about how the company has been developing RAG-assisted AI models to deliver solutions for customers across multiple segments. Before that, we go through the week's main news, including the image-generation controversy on X, Cosine's new AI developer Genie, and the Gemini Live feature announced at the Made by Google 2024 event. Here's who joined this conversation: Marcus Mendes, host of Fora de Controle; Fabrício Carraro, host of Fora de Controle, Program Manager at Alura, AI author, and host of the Dev Sem Fronteiras podcast; Fernando De Come, Data & AI Solution Engineer at Oracle
This episode of the Space Industry podcast by satsearch is a conversation with cosine on hyperspectral imaging, edge computing, and the company's strategy and production processes. cosine is a Netherlands-based space company, and satsearch Trusted Supplier, that builds optical and in-situ measurement systems for space, air, and ground use. The company has produced solutions for a variety of industry sectors, including medicine, agriculture, and space, and has particular expertise in edge computing and hyperspectral imaging. In the podcast we cover: the technical advantages that hyperspectral imaging can bring to space services; an overview of cosine's work, and current focus, in the space industry; the applications and status of edge computing in space; the benefits and uses of thermal imaging from satellites; and how cosine has developed a high-volume batch production approach that maintains quality and consistency. You can find out more about cosine here on their satsearch supplier hub. And if you would like to learn more about the space industry and our work at satsearch building the global marketplace for space, please join our newsletter https://satsearch.com/mailing-list. [Music from Uppbeat (free for Creators!): https://uppbeat.io/t/all-good-folks/when-we-get-there License code: Y4KZEAESHXDHNYRA]
Welcome to the Brick & Wonder podcast, where we delve into the art and science of collaboration within the built environment industry. Hosted by Drew Lang, founder of Brick & Wonder, this podcast explores the stories and strategies behind successful collaborations that drive innovation and shape the built world around us.In this episode we spoke with two people who came together in a successful and enduring business consulting engagement. Matt Schroeder is the co-founder and CEO of The Cosine, an architectural lighting and procurement company. Everett Hollander is a partner at Billy Clark Creative Management, a strategy and talent advisor to creative companies in the built environment industry.
Hey guys, in this episode I talk about two very important technical concepts in Deep Learning: contrastive learning and cosine similarity. They are very useful when training embedding models or doing RAG. Very good blog post about contrastive losses: https://lilianweng.github.io/posts/2021-05-31-contrastive/ SimCLR paper: https://arxiv.org/abs/2002.05709 Instagram of the podcast: https://www.instagram.com/podcast.lifewithai Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
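For anyone who wants to see the two concepts side by side, here is a small numpy sketch: cosine similarity between embedding vectors, and a SimCLR-style InfoNCE contrastive loss that pulls an anchor toward its positive view and pushes it away from negatives. A toy illustration, not code from the episode.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: dot product of the vectors over their norms.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Contrastive loss: softmax cross-entropy with the positive at index 0.
    sims = np.array([cosine_sim(anchor, positive)] +
                    [cosine_sim(anchor, n) for n in negatives]) / temperature
    return -sims[0] + np.log(np.exp(sims).sum())

rng = np.random.default_rng(0)
anchor = rng.standard_normal(64)
positive = anchor + 0.1 * rng.standard_normal(64)   # augmented view
negatives = [rng.standard_normal(64) for _ in range(8)]
print(info_nce(anchor, positive, negatives))        # low loss: views match
```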
NService Bus This episode of The Modern .NET Show is supported, in part, by NServiceBus, the ultimate tool to build robust and reliable systems that can handle failures gracefully, maintain high availability, and scale to meet growing demand. Make sure you click the link in the show notes to learn more about NServiceBus. Show Notes Yeah. So what I was thinking the other day is that what we want is to concentrate on the business logic that we need to implement and spend as little time as possible configuring, installing and figuring out the tools and libraries that we are using for this specific task. Like our mission is to produce the business logic and we should try to minimize the time that we spend on the tools and libraries that enable us to build the software. —Giorgi Dalakishvili Welcome to The Modern .NET Show! Formerly known as The .NET Core Podcast, we are the go-to podcast for all .NET developers worldwide and I am your host Jamie "GaProgMan" Taylor. In this episode, I spoke with Giorgi Dalakishvili about PostgreSQL, DuckDB, and where you might use either of them in your applications. As Giorgi points out, .NET has support for SQL Server baked in, but there's also support for other database technologies too: Yes, there are many database technologies and just like you, for me, SQL Server was the default go-to database for quite a long time because it's from Microsoft. All the frameworks and libraries work with SQL Server out of the box, and have usually better support for SQL Server than for other databases. But recently I have been diving into PostgreSQL, which is a free database, and I discovered that it has many interesting features and I think that many .NET developers will be quite excited about these features. They are very useful in some very specific scenarios. And it also has very good support for .NET. Nowadays there is a .NET driver for Postgres, there is a .NET driver for Entity Framework Core. So I would say it's not behind SQL Server in terms of .NET support or feature-wise. —Giorgi Dalakishvili He also points out that our specialist skill as developers is not to focus on the tools, libraries, and frameworks, but to use what we have in our collective toolboxes to build the business logic that our customers, clients, and users desire of us. And along the way, he drops some knowledge on an essential NuGet package for those of us who are using Entity Framework. So let's sit back, open up a terminal, type in dotnet new podcast and we'll dive into the core of Modern .NET. Supporting the Show If you find this episode useful in any way, please consider supporting the show by either leaving a review (check our review page for ways to do that), sharing the episode with a friend or colleague, buying the host a coffee, or considering becoming a Patron of the show.
Full Show Notes The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at: https://dotnetcore.show/season-6/from-net-to-DuckDB-unleashing-the-database-evolution-with-giorgi-dalakishvili/ Useful Links Giorgi's GitHub DuckDB .NET Driver Postgres Array data type Postgres Range data type DuckDB DbUpdateException EntityFramework.Exceptions JsonB data type Vector embeddings Cosine similarity Vector databases: Chroma qdrant pgvector pgvector .NET library OLAP queries parquet files Dapper DuckDB documentation Dapr DuckDB Wasm; run DuckDB in your browser GitHub Codespaces Connecting with Giorgi: on Twitter on LinkedIn on his website Supporting the show: Leave a rating or review Buy the show a coffee Become a patron Getting in touch: via the contact page joining the Discord Music created by Mono Memory Music, licensed to RJJ Software for use in The Modern .NET Show Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts, this will help the show's audience grow. Or you can just share the show with a friend. And don't forget to reach out via our Contact page. We're very interested in your opinion of the show, so please get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast.
OpenAI's large-scale language-generation tool ChatGPT may have been used to draft some content in this episode and some of the show notes of this episode. StudySquare Ltd has adapted the content, and the publication is attributed to StudySquare Ltd. This episode is a general guideline for information and not a specific tutorial for any specific syllabus; therefore, it should not be relied upon. StudySquare Ltd and any people involved in producing this podcast take no responsibility or liability for any potential errors or omissions regarding this podcast and make no guarantees of any completeness, accuracy, or usefulness of the information contained in this podcast, its structure or its show notes. The problems or questions in this episode might not appear in exam papers.The content in this episode might be more relevant to learners in the United Kingdom. Laws, educational standards, and exam requirements may vary significantly from one location to another. It's the listener's responsibility to confirm that the material complies with the requirements and regulations of their local educational system. If any content of this episode does not comply with your local regulations or laws, please discontinue listening and consult with your local educational authorities.Any references to experiments in this episode are for information purposes only and do not allow any listener to perform them without proper guidance or support. Experiments or practical work mentioned during this episode should not be attempted without appropriate supervision from a qualified teacher or professional. Additionally, the information provided in our podcast is not medical advice and should not be taken as such. If you require medical advice, please consult a healthcare professional. This episode is provided 'as is' without any representations or warranties, express or implied.This episode covers the following:• Area of an irregular triangle• Sine rule• Cosine rule• Page for this topic: https://studysquare.co.uk/test/Maths/OCR/GCSE/Applications-of-trigonometry?s=p• Trial lesson (terms and conditions apply): https://www.studysquare.co.uk/trial?s=p-/test/Maths/OCR/GCSE/Applications-of-trigonometry• Privacy policy of Spreaker (used to distribute this episode): https://www.spreaker.com/privacy
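For reference, the standard statements of the results listed for this episode, for a triangle with sides a, b, c opposite angles A, B, C:

```latex
% Sine rule
\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}
% Cosine rule
c^2 = a^2 + b^2 - 2ab\cos C
% Area of a general (non-right-angled) triangle
\text{Area} = \frac{1}{2}ab\sin C
```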
Utter BS. Recorded Live at Edinburgh Fringe. FULL LENGTH SPECIAL 'FIFTY' COMING SOON
Thanks to the almost 30k people who tuned in to the last episode! Your podcast cohosts have been busy shipping: * Alessio open sourced smol-podcaster, which makes the show notes here! * swyx launched GodMode. Maybe someday the Cursor of browsers? * We're also helping organize a Llama Finetuning Hackameetup this Saturday in anticipation of the CodeLlama release. Lastly, more speakers were announced at AI Engineer Summit!
Recorded at the Øredev 2022 developer conference, Fredrik chats with Michele Riva about writing a full-text search engine, maintaining 8% of all Node modules, going to one conference per week, refactoring, the value of a good algorithm, and a lot more. Michele highly recommends writing a full-text search engine. He created Lyra - later renamed Orama - and encourages writing your own in order to demystify subjects. Since the podcast was recorded, Michele has left his then employer Nearform and founded Oramasearch to focus on the search engine full time. We also discuss working for product companies versus consulting, versus open source. It's more about differences between companies than anything else. Open source teaches you to deal with more and more different people. Writing code is never just writing code. Should we worry about taking on too many dependencies? Michele is in favour of not fearing dependencies, but ensuring you understand how the important parts of your application work. Writing books is never convenient, but it can open many doors. When it comes to learning, there are areas where a whole level of tutorials is missing - where there are only really surface-level tutorials and perhaps deep papers, but nothing in between. Michele works quite a bit on bridging such gaps through his presentations. Thank you Cloudnet for sponsoring our VPS! Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write longer. We read everything we receive. If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi. Links Michele Michele's Øredev 2023 presentations Nearform TC39 - the committee which evolves Javascript as a language Matteo Collina - worked at Nearform, works with the Node technical steering committee Lyra - the full-text search engine - has been renamed Orama Lucene Solr Elasticsearch Radix tree Prefix tree Inverted index Thoughtworks McKinsey Daniel Stenberg Curl Deno Express Fastify Turbopack Turborepo from Vercel Vercel Fast queue Refactoring Michele's refactoring talk Real-world Next.js - Michele's book Next.js Multitenancy Create React app Nuxt Vue Sveltekit TF-IDF - "term frequency-inverse document frequency" Cosine similarity Michele's talk on building Lyra Explaining distributed systems like I'm five Are all programming languages in English? 4th dimension Prolog Velato - programming language using MIDI files as source code Titles For foreign people, it's Mitch That kind of maintenance A very particular company A culture around open source software Now part of the 8% Nothing more than a radix tree One simple and common API Multiple ways of doing consultancy What you're doing is hidden You can't expect to change people A problem we definitely created ourselves Math or magic Writing books is never convenient Good for 90% of the use cases (When I can choose,) I choose computer science
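To demystify two of the ingredients in the links above (the inverted index and TF-IDF), here is a toy Python sketch of a full-text index with TF-IDF ranking. It illustrates the general technique only, not Lyra/Orama's actual implementation.

```python
# Tiny inverted index: term -> {doc_id: term frequency}, scored by TF-IDF.
import math
from collections import Counter, defaultdict

docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
index = defaultdict(dict)
for doc_id, text in enumerate(docs):
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def tfidf_search(query):
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(len(docs) / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(tfidf_search("quick dog"))  # doc 2 ranks first: both terms, "quick" twice
```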
I read from direction cosine to directorate. A direction finder is used to find where something is located using triangulation. https://en.wikipedia.org/wiki/Direction_finding The directoire style only lasted for 4 years (almost exactly) and ended when Napoleon Bonaparte overthrew the government. https://en.wikipedia.org/wiki/Directoire_style https://en.wikipedia.org/wiki/French_Directory The word of the episode is "director". https://en.wikipedia.org/wiki/Film_director Theme music from Jonah Kraut https://jonahkraut.bandcamp.com/ Merchandising! https://www.teepublic.com/user/spejampar "The Dictionary - Letter A" on YouTube "The Dictionary - Letter B" on YouTube "The Dictionary - Letter C" on YouTube "The Dictionary - Letter D" on YouTube Featured in a Top 10 Dictionary Podcasts list! https://blog.feedspot.com/dictionary_podcasts/ Backwards Talking on YouTube: https://www.youtube.com/playlist?list=PLmIujMwEDbgZUexyR90jaTEEVmAYcCzuq dictionarypod@gmail.com https://www.facebook.com/thedictionarypod/ https://twitter.com/dictionarypod https://www.instagram.com/dictionarypod/ https://www.patreon.com/spejampar https://www.tiktok.com/@spejampar 917-727-5757
✨S4E27 Highlights✨
✨S4E26 Highlights✨
Trigonometry - from Modern Latin trigonometria (Barthelemi Pitiscus, 1595), from Greek trigonon "triangle" (tri- "three" + gōnia "angle, corner") + metron "a measure": "the branch of mathematics that deals with relations between sides and angles of triangles."
Geometry - "a measuring of the earth," from the combining form of gē (gaia) "earth, land" + -metria "a measuring of." Geometry is, with arithmetic, one of the oldest branches of mathematics. It is concerned with properties of space such as the distance, shape, size, and relative position of figures.
Parallel - from para- "beside" + allēl "each other." In geometry, of lines, "lying in the same plane but never meeting in either direction." As a noun from the 1550s, "a line parallel to another line." The meanings "a comparison made by placing things side by side" and "thing equal to or resembling another in all particulars" are from the 1590s. Parallel bars as a gymnastics apparatus is recorded from 1868.
Perpendicular - "at right angles to the horizon," from per "thoroughly" + pendere "to hang, cause to hang; weigh" (from PIE root *(s)pen- "to draw, stretch, spin").
Percent - "by/through a hundred," from Modern Latin per centum "by the hundred."
Angle - directly from Latin angulus "an angle, a corner," a diminutive form from PIE root *ang-/*ank- "to bend" (source also of Greek ankylos "bent, crooked;" Latin ang(u)ere "to compress in a bend, fold, strangle;" Old Church Slavonic aglu "corner;" Lithuanian anka "loop;" Sanskrit ankah "hook, bent," angam "limb;" Old English ancleo "ankle").
Acute - from Latin acutus "sharp, pointed," figuratively "shrill, penetrating; intelligent, cunning," past participle of acuere "to sharpen" (literal and figurative). It was also used of humors (early 15c.). The meaning "ending in a sharp point" is from the 1560s; the sense of "sharp or penetrating in intellect" is from the 1580s. E.g. acute injury, acute inflammation, acute pancreatitis.
Obtuse - from Latin obtusus "blunted, dull," also used figuratively, past participle of obtundere "to beat against, make dull," from ob "in front of; against" + tundere "to beat," from PIE *(s)tud-e- "to beat, strike, push, thrust," from root *(s)teu- "to push, stick, knock, beat." In geometry, of a plane angle greater than a right angle.
Calculus - from Latin calculus "reckoning, account," originally "pebble used as a reckoning counter," diminutive of calx (genitive calcis) "limestone." In medicine, the word has also been used to refer generally to a "concretion occurring accidentally in the animal body," such as dental plaque: dental calculus.
Sine - from Latin sinus "fold in a garment, bend, curve, bosom."
Cosine - "with the fold": "with" + "bend, curve."
Tangent - "meeting at a point without intersecting," from Latin tangentem (nominative tangens), present participle of tangere "to touch."
Addition - "action of adding numbers," from Latin additionem (nominative additio) "an adding to, addition," noun of action from the past-participle stem of addere "add to, join, attach": "the action of" + "joining."
--- Support this podcast: https://anchor.fm/liam-connerly/support
Many topics, starting with atypical sexuality, explored together with the sexologist, professor, and writer Fabrizio Quattrini. There is no shortage of reflections on labels, sex education, the current situation, and approaches to freedom of sexual expression.
Background modeling for dark matter search with 1.7 years of COSINE-100 data by G. Adhikari et al. on Tuesday 06 September. We present a background model for dark matter searches using an array of NaI(Tl) crystals in the COSINE-100 experiment, which is located in the Yangyang underground laboratory. The model includes background contributions from both internal and external sources, including cosmogenic radionuclides and surface $^{210}$Pb contamination. To build the model in the low energy region, with a threshold of 1 keV, we used a depth profile of $^{210}$Pb contamination in the surface of the NaI(Tl) crystals determined in a comparison between measured and simulated spectra. We also considered the effect of the energy scale errors propagated from the statistical uncertainties and the nonlinear detector response at low energies. The 1.7 years of COSINE-100 data taken between October 21, 2016 and July 18, 2018 were used for this analysis. Our Monte Carlo simulation provides a non-Gaussian peak around 50 keV, originating from beta decays of bulk $^{210}$Pb, in good agreement with the measured background. This model estimates that the activities of bulk $^{210}$Pb and $^{3}$H dominate the background rate, which amounts to an average level of 2.85 $\pm$ 0.15 counts/day/keV/kg in the (1-6) keV energy region, using COSINE-100 data with a total exposure of 97.7 kg$\cdot$years. arXiv: http://arxiv.org/abs/2101.11377v2
Boingo, Buka, and oka talked about C-S-H high-pressure superconductivity, the annual modulation of dark matter, PTEP, and more. The Show Notes below are the abridged version; the full version is here.
0:00 Changing jobs, ISUCON - ISUCON12 roundup: ISUCON official blog; Interaxion 12: AWS as a complex system (the episode where we talked about ISUCON and more)
9:29 Condensed matter news
9:52 Recent developments in C-S-H high-pressure superconductivity - Hirsch's latest report; Buhin blog: "Does Microsoft dream of Majorana? On the controversy over observing Majorana particles in solids"; Buhin blog: "Cold yet hot researchers: the amazing low-temperature physicists honored at LT29 (Sapporo, 2022)"
25:23 Maglev using permanent magnets - a test line has started for a Chinese suspended maglev that runs on permanent magnets
30:21 The cutting edge of condensed matter research - Tokyo University of Science and others develop a "rare-earth-free magnet" (News Switch, by Nikkan Kogyo Shimbun); materials informatics series
37:35 Underground experiment news
37:50 The disappearance of the XENONnT electron recoil excess - first new-physics search results from the XENONnT dark matter experiment: latest observations of electron recoil events (Research at Kobe); Interaxion 6: I love this company (the episode where we discussed the XENON1T report from two years earlier)
41:27 The annual modulation seen in COSINE-100 - [2208.05158] An induced annual modulation signature in COSINE-100 data by DAMA/LIBRA's analysis method
50:24 Submit your best work to PTEP - High-Impact Research, Progress of Theoretical and Experimental Physics, Oxford Academic; 26: A Postdoc Called "It" - Interaxion Podcast (the episode where we talked about the Xi hypernuclear paper); PDG: Mathematical Tools; High Energy News: behind the scenes of PTEP's launch; Vol. 77, No. 8 - back issues - The Physical Society of Japan; Arxiv Sanity; "Beyond Midjourney? Why the free image-generation AI #StableDiffusion can be said to have 'democratized AI'" (Business Insider Japan) - things moved so fast that Midjourney became outdated in the single week between recording and releasing this episode
1:04:39 Content
1:04:47 Corporate podcasts - Quantum Native Radio; HACARUS.FM https://i8n.page.link/bFxF; 白金鉱業.FM
1:08:12 Things we bought - TopJoy Butterfly: Pocket-Sized True Color DES Screen E-Reader by Falcon (Kickstarter); BOOX Leaf; Google Pixel 6a; MacBook Air (M2); NEXGIM MG03 + Zwift; FTP
1:20:15 Games - Elden Ring (also discussed in eps. 36 and 37); Steam: Stray
1:29:50 Project Hail Mary
1:32:23 Listener mail - list of the episodes where we talk about the astronaut selection exam
1:35:17 Announcements - Interaxion Keywords; "A wistful, nostalgic track" (Audiostock)
We welcome guests and look forward to your feedback. #interaxion
S4E5 [Bremen] Creating a whole new worldview with music - the fast-rising band Bremen (布萊梅) takes you into the mysterious land of "Taolede" (陶樂德)! feat. Bremen ✨S4E5 Highlights✨
Elections are won and lost by a very small percentage of persuadable voters, and since Biden is unlikely to get them, he's shoring up his base.
Years before Armageddon and Deep Impact hit cinemas with exactly the same plot (an asteroid heading for Earth), The Simpsons was already there with Bart's Comet. Bart discovers a comet, but it zooms straight toward Springfield, with Moe's Tavern as ground zero. Facing certain death, the whole town goes through the various stages of grief, from denial to a kind of resignation (Lovejoy: "It's all over people, we don't have a prayer!"). With a few side trips to Austin Powers and Schalkse Ruiters, we dissect just about every joke in the episode - and there are a lot of them. From Database, Cosine, and E-mail, to the illustrious fate of principal Kahoutec, to Flanders's shoddy shelterini: this is certainly not the most realistic episode, but it is all the more hilarious.
dear livejournal, i’m not going to complain about what i want to complain about right now, but i’ll just say that i tested negative on day eleven and i just have a lingering cough. my already crappy lungs are pretty tired from all the coughing but i’m good. nice to be back in the world. happy to have shared this experience with more than half the population eh eh eh eh ehhhhhh. see you soon.
DOWNLOAD RECORDING
subscribe to the podcast here: http://feeds.feedburner.com/5432fun
00:00 (intro by omar)
00:20 Melenas “Ya no me importa” Ya no me importa si tu me quieres
03:01 Alien Nosejob “Buffet Of Love” Buffet Of Love EP
08:19 All Girls Arson Club “Dogs” Dark Fruits
10:52 Clever Girls “Tomorrow” Luck
13:26 Stef Chura “Scream” Midnight
15:48 Half Shadow “Song for the Garden” At Home With My Candles
19:52 Blonde Mom “Play Of Martyrdom” New Brute
21:33 Oort Smog “Bright Empty Future” Smeared Pulse Transfers
23:58 Pile “No Hands” Green and Gray
28:38 Drahla “Serenity” Useless Coordinates
31:01 Reptaliens “Baby Come Home” VALIS
34:03 Scientific Maps “How I Dissolved” Warm Apparitions
37:00 Ancient Pools “Sure Fine” Cosine
39:47 Smileswithteeth “for” Care
43:28 LLOLLYANNA “HoneyValle” Tarot Waltz EP
45:12 QWAM “Everybody Wants To Watch” QWAM
47:09 Tight Knit “Want You” Too Hot
49:37 Rainham Sheds “You What” Rainham Sheds
52:40 Gemma “Love Like (Knock It Off)” Feeling’s Not A Tempo
Integral transforms are the most rarefied of analyses – only used by a subset of engineers and computer scientists; laboured over by many an undergraduate, usually with the accompanying moans; yet every computer, every electronic circuit is an incomprehensible jumble of wires without knowledge of integral transforms. The most common, the Fourier transform, is estimated to be the algorithm that is most computed in the world, but what of Laplace and z-transforms? This lecture will explain without using daunting mathematics.
A lecture by Richard Harvey
The transcript and downloadable versions of the lecture are available from the Gresham College website: https://www.gresham.ac.uk/lectures-and-events/integral-transforms
Gresham College has been giving free public lectures since 1597. This tradition continues today with all of our five or so public lectures a week being made available for free download from our website. There are currently over 2,000 lectures free to access or download from the website.
Website: http://www.gresham.ac.uk
Twitter: http://twitter.com/GreshamCollege
Facebook: https://www.facebook.com/greshamcollege
Instagram: http://www.instagram.com/greshamcollege
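For reference, these are the standard definitions of the three transforms the lecture names (an editorial gloss, not material from the lecture itself): the Fourier transform $\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt$, the Laplace transform $F(s) = \int_{0}^{\infty} f(t)\,e^{-st}\,dt$, and the one-sided z-transform $X(z) = \sum_{n=0}^{\infty} x[n]\,z^{-n}$, its discrete-time counterpart.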
Fannies West African Cosine is becoming a staple in the restaurant-dominated streets of Kansas City. The West African-inspired eatery is located on Kansas City's iconic Troost Avenue. Ms. Fannie Gibson is the woman behind it. Her passion for cooking has taken her from cooking at home to owning a restaurant recognized by the Kansas City Chamber of Commerce and becoming a two-time participant in Kansas City Restaurant Week. She takes us through her beautiful journey in the United States: an extraordinary story of a champion lady whose heights are limitless. --- Support this podcast: https://podcasters.spotify.com/pod/show/charles-nzally/support
I read from corymb to cosine. The word of the episode is "coryphaeus". "The Dictionary - Letter A" on YouTube "The Dictionary - Letter B" on YouTube "The Dictionary - Letter C" on YouTube Featured in a Top 10 Dictionary Podcasts list! https://blog.feedspot.com/dictionary_podcasts/ Backwards Talking on YouTube: https://www.youtube.com/playlist?list=PLmIujMwEDbgZUexyR90jaTEEVmAYcCzuq dictionarypod@gmail.com https://www.facebook.com/thedictionarypod/ https://twitter.com/dictionarypod https://www.instagram.com/dictionarypod/ https://www.patreon.com/spejampar 917-727-5757
What is cosine similarity in multidimensional data?
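A one-line answer, offered as an editorial gloss rather than a quote from the episode: for vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$, cosine similarity is
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},$$
the cosine of the angle between them. It depends only on the directions of the vectors, not their magnitudes, which is why it is a common choice for high-dimensional data.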
We discuss hackers shutting down a candy corn factory, how the Cleveland Browns' new starting running back was working on a fishing boat, and Are You Smarter Than Jason Dick. See omnystudio.com/listener for privacy information.
We want to do the best for our children. But what if our children are out of our league? Laugh with me as I describe how smart kids can be raised by dumb parents.
https://abcdefghijklmnopqrstuvwxyz.us --- Send in a voice message: https://anchor.fm/ronald-j-legarski/message Support this podcast: https://anchor.fm/ronald-j-legarski/support
Thunder Force.... Phil only laughed twice! The new Ghostbusters trailers.... WE WANT TO SEE THE MOVIE! Falcon and the Winter Soldier.... Holy Krap!!!
Christian and Anna share their ideas for future inventions, Anna plays "Kiss, Marry, Kill" with sinusoidal functions, and Mark calls in to discuss sleep paralysis and insomnia. April 28, 2021.
Daily Dad Jokes (29 Mar 2021). Jokes sourced from reddit.com/r/dadjokes. Joke credits: porichoygupto, adfunk101, Derpvboii, kevographic, ryan2849, jimjimjimjim69, jg4888. Produced by Klassic Studios using AutoGen Podcast technology (http://klassicstudios.com/autogen-podcasts/)
Fashion industry and Eastern pop culture: what happens when they meet? Today we have a chat all about fashion, manga, and anime. I say a chat because this episode was recorded as a stream of consciousness, starting from some stories I had posted. If you missed the stories (tsk tsk) it's because we haven't met on Instagram yet: https://www.instagram.com/beatrice_mazza_/ Important little things: discount code "RAVPODCAST" for the Glam Observer course 'How to Break into the Fashion Industry' (the discount is already applied in the cart): https://sso.teachable.com/secure/348774/checkout/2846627/how-to-break-into-the-fashion-industry?coupon_code=RAVPODCAST (aff. link) Subscribe to the newsletter "Find a job in Fashion Marketing": https://tinyurl.com/yypayrqu To get the Fashion Innovation Bible (which is free) click here! Support Ravenous Fashion Podcast with a donation: https://tinyurl.com/supportaRavenous -- If you liked the episode, subscribe to the channel and don't forget to leave a review on Apple Podcasts and share the episode on Instagram! See you soon, Bea --- Send in a voice message: https://anchor.fm/ravenousfashionpodcast/message
The Room With the Snakes Pt. 2 - The Apocal-Cat Strikes Again, The Fauci, Soros and Gates Podcast Pt. 1 Featuring Chris Christie. Octuple-Mask! Tanya calls in, Pooh Contemplates the Silence of God, Eeyore is a c*m Donkey.
71: TRYH: How to Write Copy that Converts with Katie Cosine
In today's episode, Katie Cosine talks to us all about how to write copy that converts. Katie shares some helpful tools to assist in this type of writing as well as some tips she has learned along the way. I hope you will join us.
Smartblogger.com
Inbox Besties
Hemingway
Grammarly.com
Katie Cosine on Facebook
Social media Gnothi and email me a screenshot/link for 3-month access to Machine Learning Applied; commit code to the Github repository for life-access. Normed distances link A norm is a function that assigns a strictly positive length to each vector in a vector space. link Minkowski is the generalized form: $d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$, with $p = 1, 2, \ldots$ as below. L1: Manhattan/city-block/taxicab distance, $|x_2 - x_1| + |y_2 - y_1|$. Grid-like distance (the triangle's legs). Preferred in high-dimensional spaces. L2: Euclidean distance, $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, i.e. the square root of the difference's dot product with itself. Straight-line distance; the minimum distance (the Pythagorean triangle's hypotenuse). Others: Mahalanobis, Chebyshev ($p = \infty$), etc. Dot product A type of inner product. Outer product: lies outside the involved planes. Inner product: the dot product lies inside the planes/axes involved link. Dot product: an inner product on a finite-dimensional Euclidean space link. Cosine similarity (a normalized dot product).
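A minimal sketch of the quantities described above, in Python with NumPy (illustrative function names; this is not code from the episode or its repository):

import numpy as np

def minkowski(x, y, p):
    # Generalized Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p).
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def manhattan(x, y):
    # L1 / city-block / taxicab distance: Minkowski with p = 1.
    return minkowski(x, y, p=1)

def euclidean(x, y):
    # L2 / straight-line distance: Minkowski with p = 2.
    return minkowski(x, y, p=2)

def cosine_similarity(x, y):
    # Dot product normalized by the two L2 norms ("normalized dot").
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(manhattan(x, y))          # 6.0
print(euclidean(x, y))          # ~3.742, i.e. sqrt(14)
print(cosine_similarity(x, y))  # 1.0, since y is a scaled copy of x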
C --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app
After a one-day delay, the European company Arianespace launched the Vega rocket last week carrying no fewer than 53 different satellites. An important launch, because never before had so many European satellites been launched at once. Yannick van de Wouw spoke about it with Nathan Vercruyssen of Cosine. At Traffic Radio we focus on mobility in the Netherlands. In various podcasts we cover current topics about traffic on the road, in the air, and on the water.
007 - Vidya Sekhar, Miss India USA 1986 and Miss Michigan State Fair 1984, holds 2 World Records for Longest Solo Classical Dance Performance, through which she fund-raised thousands of dollars to benefit the American Heart Association and the American Cancer Society. Representing the next generation in a lineage of dancers and musicians from India, the Canadian-born singer, dancer and writer is an advanced dance educator and musical performer for Hindu Temple Rhythms, celebrating almost 50 years of Bharata Natyam education in North America.
Today Vidya is a certified PMP, CISSP, CCSP, GCIH with over 20 years' experience. She is a senior security services engineer for Microsoft Managed Desktop (MMD). Earlier, she worked as a senior security PM for Microsoft's Cosine, Devices and Gaming Group Services Security team, focused on threat and vulnerability management. Previously, she was a co-author of the Microsoft Security Intelligence Report v14, v15 and v16, while working for the Microsoft Malware Protection Center (MMPC).
Her passion for ensuring security that protects both individuals and enterprises from theft, fraud, terrorism, defamation and other abuses is evident in the choices she makes to prioritize right mindfulness with the career options she has always chosen. Before Microsoft, Vidya leveraged her expertise to work with government and enterprise customers in areas including: risk & change management, BCP/DRP, application & operations security, and compliance/audit.
Vidya was previously an award-winning television producer and journalist for PBS, CBS, CNN and various magazines such as CRN, HR Executive Magazine, and siliconindia, covering technology and the workforce since the late '80s, and has also been a panelist, speaker and/or host for national industry events, like the IT Consulting World Conference in New York/New Jersey, as well as the E-biz Conference in San Jose. She was also a founding Board Member of the Bay Agile Leadership Network, and participated actively in PMI as a content author and chapter lead for PMI's Program Management Standards 2nd edition.
Weather permitting, more than 50 small satellites will be sent into space tomorrow. On board: a whole lot of Dutch high tech, including the Hyperscout-2 from the small company Cosine. They developed an instrument that can detect things like wildfires, floods, and drought from space. It also has on-board artificial intelligence, so that the selection of which data gets sent back to Earth is already made on board.
036_9.5 The Sine & Cosine Ratios
The Sine & Cosine Ratios
The weekly round-table discussion in which we review the latest news from the world of science. In today's episode: assorted news; the possible discovery of feathers on pterosaurs; AlphaFold, a new AI for protein analysis; the history of solar activity over the last 400 years; first results from the COSINE-100 dark matter detector. In the photo, from top to bottom and left to right: Sara Robisco, Héctor Socas, Francis Villatoro, Alberto Aparici. All opinions expressed during the discussion represent solely the views of the person expressing them... and sometimes not even that. CB:SyR is a collaboration between the Research Area and the Scientific Communication and Culture Unit (UC3) of the Instituto de Astrofísica de Canarias.
In this pulse-pounding episode of The Platformers, we get right into the nitty and discuss sinister invisigle forces that remain strangely brutiful! It's tangents within tangents, as the boys incept their way into podcasting history with the most compellingly irrelevant podcast of all time. Are you seriously still reading this? Alright, well, this week we talk about The Last Of Us, virtual reality, and some books. Chris and Marty look forward to Rogue One, and Brian reviews Everybody's Gone To The Rapture, Black Mesa, Pokémon Sun, and Batman: The Telltale Series. (He's also enjoying Watch Dogs 2!) Spoilers: Sometime before the episode ends, the blundersnatch probably catches the snoogledorf. We are getting conflicting reports, but we'll confirm it as soon as we know for sure.
On this episode of ANTIC the Atari 8-bit podcast: we talk about the release of the Star Raiders source code, how to set up an Atari club of your own, and lust over the 1400XL and 815 dual floppy drive. Don’t forget, we have a contest going this month! Whoever transcribes the most ANTIC interviews from 11/1/15 to 11/30/15 will win a Defender cartridge for the Atari 400/800/XL/XE computers signed by none other than Steve Baker, the person who converted the game from the arcade version! Check with Kevin (kevin@savetz.com) to see what interviews need to be transcribed. Recurring Links Floppy Days Podcast AtariArchives.org AtariMagazines.com Kevin's Book “Terrible Nerd” New Atari books scans at archive.org ANTIC feedback at AtariAge Atari interview discussion thread on AtariAge ANTIC Facebook Page What we’ve been up to Tricky Tutorials - http://www.atarimania.com/documents/Tricky-Tutorials-1-6.pdf Atlanta Maker Faire - http://makerfaireatl.com/ Star Raiders Source Code - https://archive.org/details/AtariStarRaidersSourceCode Portland Retro Gaming Expo - http://www.retrogamingexpo.com/ News AtariNet released by Slor (James Wilkinson) - http://www.atarinet.com/, http://atariage.com/forums/topic/243627-atarinet-binaries-available-for-download/ MULE Board game https://boardgamegeek.com/boardgame/182619/mule-board-game Rose City Atari Club, Portland, OR, Nov. 19, 2015 - http://calagator.org/events/1250469257 Popeye Arcade 8-bit conversion, courtesy of Homesoft, posted by miker - http://atariage.com/forums/topic/242051-popeye-arcade-8-bit-conversion/ Atari 8-bit emulator for the original Xbox - http://atariteca.blogspot.com/2015/08/atarixlbox-emulador-de-computadora.html New game Quarrion - http://matosimi.websupport.sk/atari/2015/10/quarrion/ New Podcast - The Wizards of Odyssey 2 - https://www.facebook.com/TheWizardsOfOdyssey2Podcast How to transfer files from PC to Atari with Turgen System - http://atariteca.blogspot.com/2015/03/como-transferir-archivos-de-pc-atari.html , http://turgen.sourceforge.net/ Pro(c) Issue #7 - http://proc-atari.de/ 1400XL for sale on eBay by bob1200xl on AtariAge - http://www.ebay.com/itm/Atari-8-bit-1400XL-computer-tested-400-800-1200XL-600XL-800XL-/262103742364?hash=item3d0699b79c:g:PPUAAOSw5ZBWJn3c, http://atariage.com/forums/topic/244786-1400xl-on-ebay/ ABBUC Software Contest Update - http://www.abbuc.de/atari/software-ressort/81-software/softwarewettbewerbe/1764-software-wettbewerb-2015 SIO2PC-USB v1.0 Atari XL XE (KIT) + 2' USB Cable + Software DVD - http://www.ebay.com/itm/281710881072 ProGamer Magazine (7800) - http://www.retroriginals.co.uk/ Update on the company that calls itself Atari - http://www.mcvuk.com/news/read/atari-boss-promises-to-fix-its-video-game-mistakes/0157801 New at Archive.org https://archive.org/details/Atari_POOLDISK_1 https://archive.org/details/PicturesOfAtari8-BitHardware https://archive.org/details/AtariStarRaidersSourceCode https://archive.org/details/XLentXpress_v1n1_Winter1986 https://archive.org/details/Atari800OperatorsManualFirstVersion1979 https://archive.org/details/AtariCorporation1991SecondQuarterReport https://archive.org/details/AtariCorporation1991FirstQuarterReport https://archive.org/details/AtariCorporation1990ThirdQuarterReport https://archive.org/details/AtariCorporation1990SecondQuarterReport https://archive.org/details/AtariCorporation1990FirstQuarterReport A+ Programming in Atari BASIC by John M. Reisinger - https://archive.org/details/APlusProgrammingInAtariBasic Bill’s Modern Segment Callisto Official page: http://cosine.org.uk/products.php?prod=callisto&4mat=xl Atarimania entry: http://www.atarimania.com/game-atari-400-800-xl-xe-callisto_28627.html Video: https://www.youtube.com/watch?v=6EG98bkkaq0 "Let's play" video: https://www.youtube.com/watch?v=jL9mTUOnV9U Jason Kelk, aka TMR of Cosine: http://jasonkelk.me.uk/ Cosine: http://www.cosine.org.uk/ HAR'em AtariAge forum post: http://atariage.com/forums/topic/208493-release-harem-final-not-quite/ Atarimania entry: http://www.atarimania.com/game-atari-400-800-xl-xe-har-em_28628.html Video: https://www.youtube.com/watch?v=Ap5ttWJqKJ4 "Let's Play" video: https://www.youtube.com/watch?v=pG2udqjE_ns X:8 Official Google+ page: https://plus.google.com/u/0/116385416373212952958/about AtariAge blog post: http://atariage.com/forums/blog/387/entry-10443-x8/ gury.atari8.info entry: http://gury.atari8.info/details_games/6720.php Video: https://www.youtube.com/watch?v=WqmLi6Xmw7Y ABBUC contests 2012: http://www.abbuc.de/component/content/article/81-software/softwarewettbewerbe/1574-software-wettbewerb-2012 2013: http://www.abbuc.de/component/content/article/81-software/softwarewettbewerbe/1666-software-wettbewerb2013 Zybex Wikipedia article: https://en.wikipedia.org/wiki/Zybex Atarimania entry: http://www.atarimania.com/game-atari-400-800-xl-xe-zybex_s5955.html Video: https://www.youtube.com/watch?v=WrJ6K4c1jwE Life Force (Nintendo) Gradius Wikipedia article: https://en.wikipedia.org/wiki/Gradius Salamander (Life Force) Wikipedia article: https://en.wikipedia.org/wiki/Salamander_(video_game) Of the Month Website: Benj Edwards Inside The Atari 800 on PCWorld - http://www.pcworld.com/article/181421/inside_atari_800.html#slide1 Software: F-15 Strike Eagle by Sid Meier & MicroProse - http://www.atarimania.com/game-atari-400-800-xl-xe-f-15-strike-eagle_1916.html Hardware: The Atari 815 disk drive - http://www.atarimuseum.com/computers/8bits/400800/815/815.html Closing Album “Have You Played Atari Today” by Tony Longworth - www.tonylongworth.com, http://www.cdbaby.com/cd/tonylongworth10
This calculus course covers differentiation and integration of functions of one variable, and concludes with a brief discussion of infinite series. Calculus is fundamental to many scientific disciplines including physics, engineering, and economics.
Print and follow along with corresponding podcasts
Part 1 of 2 guided notes
Part 2 of 2 guided notes
Part 1 of 2 guided notes for translations of sine and cosine
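As a reminder of the standard form such notes usually cover (an editorial note, not content from the episode): a translated sinusoid can be written $y = a\sin\bigl(b(x - c)\bigr) + d$, where $|a|$ is the amplitude, $2\pi/b$ the period, $c$ the horizontal (phase) shift, and $d$ the vertical shift; the same template works with $\cos$ in place of $\sin$.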
What do sines, cosines, and tangents have to do with right triangles? And what are the "sin," "cos," and "tan" buttons on your calculator for? Keep on listening to find out!
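For readers who want the punchline up front (a standard definition, not a transcript excerpt): in a right triangle with acute angle $\theta$,
$$\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}, \qquad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}, \qquad \tan\theta = \frac{\text{opposite}}{\text{adjacent}} = \frac{\sin\theta}{\cos\theta},$$
and the calculator buttons simply evaluate these ratios for a given angle.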
Geometric proof of the dot product using Cosine rule.
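A compressed version of the argument the title points to (a sketch; the episode's own steps may differ): for vectors $\mathbf{a}$ and $\mathbf{b}$ with angle $\theta$ between them, the triangle with sides $|\mathbf{a}|$, $|\mathbf{b}|$ and $|\mathbf{a}-\mathbf{b}|$ obeys the cosine rule
$$|\mathbf{a}-\mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,|\mathbf{a}|\,|\mathbf{b}|\cos\theta.$$
Expanding the left side algebraically gives $|\mathbf{a}|^2 - 2\,\mathbf{a}\cdot\mathbf{b} + |\mathbf{b}|^2$, and comparing the two expressions yields $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$.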
Exploring the derivatives of the trigonometric functions
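For reference, the results being explored (standard calculus facts, not quotes from the audio):
$$\frac{d}{dx}\sin x = \cos x, \qquad \frac{d}{dx}\cos x = -\sin x, \qquad \frac{d}{dx}\tan x = \sec^2 x.$$
The first follows from the limit $\lim_{h\to 0} \frac{\sin h}{h} = 1$ together with the angle-sum formula; the second from the first via $\cos x = \sin(\pi/2 - x)$; and the third from the quotient rule applied to $\tan x = \sin x / \cos x$.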
We prove that cos(-x) = cos(x) and sin(-x) = -sin(x).
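One standard route to these parity identities (a sketch; the episode's proof may differ), via the angle-subtraction formulas: $\cos(-x) = \cos(0 - x) = \cos 0\cos x + \sin 0\sin x = \cos x$, and $\sin(-x) = \sin(0 - x) = \sin 0\cos x - \cos 0\sin x = -\sin x$.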