Podcasts about openai devday

  • 47PODCASTS
  • 51EPISODES
  • 51mAVG DURATION
  • 1MONTHLY NEW EPISODE
  • Oct 4, 2024LATEST

POPULARITY

20172018201920202021202220232024


Best podcasts about openai devday

Latest podcast episodes about openai devday

This Day in AI Podcast
EP80: Corey Hotline, OpenAI DevDay 2024 Recap, Microsoft CoPilot, ChatGPT Canvas & Ray-Band Doxing

This Day in AI Podcast

Play Episode Listen Later Oct 4, 2024 96:37


Join Simtheory: https://simtheory.aiCall the Corey Hotline: +1 (650) 547-3393 (Not $4.95/min)Our community: https://thisdayinai.com----CHAPTERS:00:00 - Corey Hotline Cold Intro00:18 - OpenAI Dev Day Recap: Realtime API05:58 - Testing the Realtime API with Corey Hotline test09:04 - Comparing OpenAI's Realtime API Advanced Voice Mode to Retell for Calling (Corey Hotline v2)21:50 - GPT-4o Image Fine Tuning28:48 - Prompt Caching in OpenAI API43:07 - Model Distillation: Fine Tuning with Outputs from OpenAI Frontier Models50:36 - What else is coming for the Realtime API?53:28 - The New Microsoft CoPilot, Voice & Vision with CoPilot1:08:37 - Flux 1.1 PRO Update1:15:19 - OpenAI's Response to Claude Artifacts: Canvas1:26:26 - Meta Rayband Doxing1:33:55 - Mike's weekly LOLThanks for listening! We appreciate all of your support. Please share your experience with Corey!

Mixture of Experts
Episode 23: NotebookLM, OpenAI DevDay, and will AI prevent phishing attacks?

Mixture of Experts

Play Episode Listen Later Oct 4, 2024 39:15


Will DeepDive replace the Mixture of Experts podcast? In Episode 23, host Tim Hwang is joined by IBM Researchers Marina Danilevsky, Nathalie Baracaldo and Vagner Santana to dissect this week's AI news. First, the experts talk about the hype around Google's NotebookLM, specifically regarding the DeepDive podcast feature. Next, OpenAI DevDay sparks some interesting conversation around vision fine-tuning and multimodality. Finally, it's Cybersecurity Awareness Month and IBM X-Force released the Cloud Threat Landscape Report. Will AI prevent phishing attacks? Tune-in to this week's episode to learn more!The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

KI-Update – ein Heise-Podcast
KI-Update kompakt: Copilot, OpenAI DevDay, medizinische Forschung, Atomkraft

KI-Update – ein Heise-Podcast

Play Episode Listen Later Oct 2, 2024 13:54


Das ist das KI-Update vom 2. Oktober 2024 unter anderem mit diesen Themen: Microsofts Copilot lernt sprechen, sehen und nachdenken OpenAIs neue KI-Assistenten sprechen jetzt in Echtzeit KI kann in der wissenschaftlichen Forschung lediglich unterstützen und Alte Atomkraftwerke sollen Energie-Hunger von KI bedienen Links zu allen Themen der heutigen Folge findet Ihr hier: https://heise.de/- 9960303 Das nächste KI-Update kommt am Montag, 07.10.2024, um 15 Uhr. https://www.heise.de/thema/KI-Update https://pro.heise.de/ki/ https://www.heise.de/newsletter/anmeldung.html?id=ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast https://www.ct.de/ki

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event for everyone coming to town! Join us with demos and gossip!Also sign up for related events across San Francisco: the AI DevTools Night, the xAI open house, the Replicate art show, the DevDay Watch Party (for non-attendees), Hack Night with OpenAI at Cloudflare. For everyone else, join the Latent Space Discord for our online watch party and find fellow AI Engineers in your city.OpenAI's recent o1 release (and Reflection 70b debacle) has reignited broad interest in agentic general reasoning and tree search methods.While we have covered some of the self-taught reasoning literature on the Latent Space Paper Club, it is notable that the Eric Zelikman ended up at xAI, whereas OpenAI's hiring of Noam Brown and now Shunyu suggests more interest in tool-using chain of thought/tree of thought/generator-verifier architectures for Level 3 Agents.We were more than delighted to learn that Shunyu is a fellow Latent Space enjoyer, and invited him back (after his first appearance on our NeurIPS 2023 pod) for a look through his academic career with Harrison Chase (one year after his first LS show).ReAct: Synergizing Reasoning and Acting in Language Modelspaper linkFollowing seminal Chain of Thought papers from Wei et al and Kojima et al, and reflecting on lessons from building the WebShop human ecommerce trajectory benchmark, Shunyu's first big hit, the ReAct paper showed that using LLMs to “generate both reasoning traces and task-specific actions in an interleaved manner” achieved remarkably greater performance (less hallucination/error propagation, higher ALFWorld/WebShop benchmark success) than CoT alone. In even better news, ReAct scales fabulously with finetuning:As a member of the elite Princeton NLP group, Shunyu was also a coauthor of the Reflexion paper, which we discuss in this pod.Tree of Thoughtspaper link hereShunyu's next major improvement on the CoT literature was Tree of Thoughts:Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role…ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.The beauty of ToT is it doesnt require pretraining with exotic methods like backspace tokens or other MCTS architectures. You can listen to Shunyu explain ToT in his own words on our NeurIPS pod, but also the ineffable Yannic Kilcher:Other WorkWe don't have the space to summarize the rest of Shunyu's work, you can listen to our pod with him now, and recommend the CoALA paper and his initial hit webinar with Harrison, today's guest cohost:as well as Shunyu's PhD Defense Lecture:as well as Shunyu's latest lecture covering a Brief History of LLM Agents:As usual, we are live on YouTube! Show Notes* Harrison Chase* LangChain, LangSmith, LangGraph* Shunyu Yao* Alec Radford* ReAct Paper* Hotpot QA* Tau Bench* WebShop* SWE-Agent* SWE-Bench* Trees of Thought* CoALA Paper* Related Episodes* Our Thomas Scialom (Meta) episode* Shunyu on our NeurIPS 2023 Best Papers episode* Harrison on our LangChain episode* Mentions* Sierra* Voyager* Jason Wei* Tavily* SERP API* ExaTimestamps* [00:00:00] Opening Song by Suno* [00:03:00] Introductions* [00:06:16] The ReAct paper* [00:12:09] Early applications of ReAct in LangChain* [00:17:15] Discussion of the Reflection paper* [00:22:35] Tree of Thoughts paper and search algorithms in language models* [00:27:21] SWE-Agent and SWE-Bench for coding benchmarks* [00:39:21] CoALA: Cognitive Architectures for Language Agents* [00:45:24] Agent-Computer Interfaces (ACI) and tool design for agents* [00:49:24] Designing frameworks for agents vs humans* [00:53:52] UX design for AI applications and agents* [00:59:53] Data and model improvements for agent capabilities* [01:19:10] TauBench* [01:23:09] Promising areas for AITranscriptAlessio [00:00:01]: Hey, everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO of Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.Swyx [00:00:12]: Hey, and today we have a super special episode. I actually always wanted to take like a selfie and go like, you know, POV, you're about to revolutionize the world of agents because we have two of the most awesome hiring agents in the house. So first, we're going to welcome back Harrison Chase. Welcome. Excited to be here. What's new with you recently in sort of like the 10, 20 second recap?Harrison [00:00:34]: Linkchain, Linksmith, Lingraph, pushing on all of them. Lots of cool stuff related to a lot of the stuff that we're going to talk about today, probably.Swyx [00:00:42]: Yeah.Alessio [00:00:43]: We'll mention it in there. And the Celtics won the title.Swyx [00:00:45]: And the Celtics won the title. You got that going on for you. I don't know. Is that like floorball? Handball? Baseball? Basketball.Alessio [00:00:52]: Basketball, basketball.Harrison [00:00:53]: Patriots aren't looking good though, so that's...Swyx [00:00:56]: And then Xun Yu, you've also been on the pod, but only in like a sort of oral paper presentation capacity. But welcome officially to the LinkedSpace pod.Shunyu [00:01:03]: Yeah, I've been a huge fan. So thanks for the invitation. Thanks.Swyx [00:01:07]: Well, it's an honor to have you on. You're one of like, you're maybe the first PhD thesis defense I've ever watched in like this AI world, because most people just publish single papers, but every paper of yours is a banger. So congrats.Shunyu [00:01:22]: Thanks.Swyx [00:01:24]: Yeah, maybe we'll just kick it off with, you know, what was your journey into using language models for agents? I like that your thesis advisor, I didn't catch his name, but he was like, you know... Karthik. Yeah. It's like, this guy just wanted to use language models and it was such a controversial pick at the time. Right.Shunyu [00:01:39]: The full story is that in undergrad, I did some computer vision research and that's how I got into AI. But at the time, I feel like, you know, you're just composing all the GAN or 3D perception or whatever together and it's not exciting anymore. And one day I just see this transformer paper and that's really cool. But I really got into language model only when I entered my PhD and met my advisor Karthik. So he was actually the second author of GPT-1 when he was like a visiting scientist at OpenAI. With Alec Redford?Swyx [00:02:10]: Yes.Shunyu [00:02:11]: Wow. That's what he told me. It's like back in OpenAI, they did this GPT-1 together and Ilya just said, Karthik, you should stay because we just solved the language. But apparently Karthik is not fully convinced. So he went to Princeton, started his professorship and I'm really grateful. So he accepted me as a student, even though I have no prior knowledge in NLP. And you know, we just met for the first time and he's like, you know, what do you want to do? And I'm like, you know, you have done those test game scenes. That's really cool. I wonder if we can just redo them with language models. And that's how the whole journey began. Awesome.Alessio [00:02:46]: So GPT-2 was out at the time? Yes, that was 2019.Shunyu [00:02:48]: Yeah.Alessio [00:02:49]: Way too dangerous to release. And then I guess the first work of yours that I came across was React, which was a big part of your defense. But also Harrison, when you came on The Pockets last year, you said that was one of the first papers that you saw when you were getting inspired for BlankChain. So maybe give a recap of why you thought it was cool, because you were already working in AI and machine learning. And then, yeah, you can kind of like intro the paper formally. What was that interesting to you specifically?Harrison [00:03:16]: Yeah, I mean, I think the interesting part was using these language models to interact with the outside world in some form. And I think in the paper, you mostly deal with Wikipedia. And I think there's some other data sets as well. But the outside world is the outside world. And so interacting with things that weren't present in the LLM and APIs and calling into them and thinking about the React reasoning and acting and kind of like combining those together and getting better results. I'd been playing around with LLMs, been talking with people who were playing around with LLMs. People were trying to get LLMs to call into APIs, do things, and it was always, how can they do it more reliably and better? And so this paper was basically a step in that direction. And I think really interesting and also really general as well. Like I think that's part of the appeal is just how general and simple in a good way, I think the idea was. So that it was really appealing for all those reasons.Shunyu [00:04:07]: Simple is always good. Yeah.Alessio [00:04:09]: Do you have a favorite part? Because I have one favorite part from your PhD defense, which I didn't understand when I read the paper, but you said something along the lines, React doesn't change the outside or the environment, but it does change the insight through the context, putting more things in the context. You're not actually changing any of the tools around you to work for you, but you're changing how the model thinks. And I think that was like a very profound thing when I, not that I've been using these tools for like 18 months. I'm like, I understand what you meant, but like to say that at the time you did the PhD defense was not trivial. Yeah.Shunyu [00:04:41]: Another way to put it is like thinking can be an extra tool that's useful.Alessio [00:04:47]: Makes sense. Checks out.Swyx [00:04:49]: Who would have thought? I think it's also more controversial within his world because everyone was trying to use RL for agents. And this is like the first kind of zero gradient type approach. Yeah.Shunyu [00:05:01]: I think the bigger kind of historical context is that we have this two big branches of AI. So if you think about RL, right, that's pretty much the equivalent of agent at a time. And it's like agent is equivalent to reinforcement learning and reinforcement learning is equivalent to whatever game environment they're using, right? Atari game or go or whatever. So you have like a pretty much, you know, you have a biased kind of like set of methodologies in terms of reinforcement learning and represents agents. On the other hand, I think NLP is like a historical kind of subject. It's not really into agents, right? It's more about reasoning. It's more about solving those concrete tasks. And if you look at SEL, right, like each task has its own track, right? Summarization has a track, question answering has a track. So I think really it's about rethinking agents in terms of what could be the new environments that we came to have is not just Atari games or whatever video games, but also those text games or language games. And also thinking about, could there be like a more general kind of methodology beyond just designing specific pipelines for each NLP task? That's like the bigger kind of context, I would say.Alessio [00:06:14]: Is there an inspiration spark moment that you remember or how did you come to this? We had Trida on the podcast and he mentioned he was really inspired working with like systems people to think about Flash Attention. What was your inspiration journey?Shunyu [00:06:27]: So actually before React, I spent the first two years of my PhD focusing on text-based games, or in other words, text adventure games. It's a very kind of small kind of research area and quite ad hoc, I would say. And there are like, I don't know, like 10 people working on that at the time. And have you guys heard of Zork 1, for example? So basically the idea is you have this game and you have text observations, like you see a monster, you see a dragon.Swyx [00:06:57]: You're eaten by a grue.Shunyu [00:06:58]: Yeah, you're eaten by a grue. And you have actions like kill the grue with a sword or whatever. And that's like a very typical setup of a text game. So I think one day after I've seen all the GPT-3 stuff, I just think about, you know, how can I solve the game? Like why those AI, you know, machine learning methods are pretty stupid, but we are pretty good at solving the game relatively, right? So for the context, the predominant method to solve this text game is obviously reinforcement learning. And the idea is you just try out an arrow in those games for like millions of steps and you kind of just overfit to the game. But there's no language understanding at all. And I'm like, why can't I solve the game better? And it's kind of like, because we think about the game, right? Like when we see this very complex text observation, like you see a grue and you might see a sword, you know, in the right of the room and you have to go through the wooden door to go to that room. You will think, you know, oh, I have to kill the monster and to kill that monster, I have to get the sword, I have to get the sword, I have to go, right? And this kind of thinking actually helps us kind of throw shots off the game. And it's like, why don't we also enable the text agents to think? And that's kind of the prototype of React. And I think that's actually very interesting because the prototype, I think, was around November of 2021. So that's even before like chain of thought or whatever came up. So we did a bunch of experiments in the text game, but it was not really working that well. Like those text games are just too hard. I think today it's still very hard. Like if you use GPD 4 to solve it, it's still very hard. So the change came when I started the internship in Google. And apparently Google care less about text game, they care more about what's more practical. So pretty much I just reapplied the idea, but to more practical kind of environments like Wikipedia or simpler text games like Alphard, and it just worked. It's kind of like you first have the idea and then you try to find the domains and the problems to demonstrate the idea, which is, I would say, different from most of the AI research, but it kind of worked out for me in that case.Swyx [00:09:09]: For Harrison, when you were implementing React, what were people applying React to in the early days?Harrison [00:09:14]: I think the first demo we did probably had like a calculator tool and a search tool. So like general things, we tried to make it pretty easy to write your own tools and plug in your own things. And so this is one of the things that we've seen in LangChain is people who build their own applications generally write their own tools. Like there are a few common ones. I'd say like the three common ones might be like a browser, a search tool, and a code interpreter. But then other than that-Swyx [00:09:37]: The LMS. Yep.Harrison [00:09:39]: Yeah, exactly. It matches up very nice with that. And we actually just redid like our integrations docs page, and if you go to the tool section, they like highlight those three, and then there's a bunch of like other ones. And there's such a long tail of other ones. But in practice, like when people go to production, they generally have their own tools or maybe one of those three, maybe some other ones, but like very, very few other ones. So yeah, I think the first demos was a search and a calculator one. And there's- What's the data set?Shunyu [00:10:04]: Hotpot QA.Harrison [00:10:05]: Yeah. Oh, so there's that one. And then there's like the celebrity one by the same author, I think.Swyx [00:10:09]: Olivier Wilde's boyfriend squared. Yeah. 0.23. Yeah. Right, right, right.Harrison [00:10:16]: I'm forgetting the name of the author, but there's-Swyx [00:10:17]: I was like, we're going to over-optimize for Olivier Wilde's boyfriend, and it's going to change next year or something.Harrison [00:10:21]: There's a few data sets kind of like in that vein that require multi-step kind of like reasoning and thinking. So one of the questions I actually had for you in this vein, like the React paper, there's a few things in there, or at least when I think of that, there's a few things that I think of. There's kind of like the specific prompting strategy. Then there's like this general idea of kind of like thinking and then taking an action. And then there's just even more general idea of just like taking actions in a loop. Today, like obviously language models have changed a lot. We have tool calling. The specific prompting strategy probably isn't used super heavily anymore. Would you say that like the concept of React is still used though? Or like do you think that tool calling and running tool calling in a loop, is that ReactSwyx [00:11:02]: in your mind?Shunyu [00:11:03]: I would say like it's like more implicitly used than explicitly used. To be fair, I think the contribution of React is actually twofold. So first is this idea of, you know, we should be able to use calls in a very general way. Like there should be a single kind of general method to handle interaction with various environments. I think React is the first paper to demonstrate the idea. But then I think later there are two form or whatever, and this becomes like a trivial idea. But I think at the time, that's like a pretty non-trivial thing. And I think the second contribution is this idea of what people call like inner monologue or thinking or reasoning or whatever, to be paired with tool use. I think that's still non-trivial because if you look at the default function calling or whatever, like there's no inner monologue. And in practice, that actually is important, especially if the tool that you use is pretty different from the training distribution of the language model. I think those are the two main things that are kind of inherited.Harrison [00:12:10]: On that note, I think OpenAI even recommended when you're doing tool calling, it's sometimes helpful to put a thought field in the tool, along with all the actual acquired arguments,Swyx [00:12:19]: and then have that one first.Harrison [00:12:20]: So it fills out that first, and they've shown that that's yielded better results. The reason I ask is just like this same concept is still alive, and I don't know whether to call it a React agent or not. I don't know what to call it. I think of it as React, like it's the same ideas that were in the paper, but it's obviously a very different implementation at this point in time. And so I just don't know what to call it.Shunyu [00:12:40]: I feel like people will sometimes think more in terms of different tools, right? Because if you think about a web agent versus, you know, like a function calling agent, calling a Python API, you would think of them as very different. But in some sense, the methodology is the same. It depends on how you view them, right? I think people will tend to think more in terms of the environment and the tools rather than the methodology. Or, in other words, I think the methodology is kind of trivial and simple, so people will try to focus more on the different tools. But I think it's good to have a single underlying principle of those things.Alessio [00:13:17]: How do you see the surface of React getting molded into the model? So a function calling is a good example of like, now the model does it. What about the thinking? Now most models that you use kind of do chain of thought on their own, they kind of produce steps. Do you think that more and more of this logic will be in the model? Or do you think the context window will still be the main driver of reasoning and thinking?Shunyu [00:13:39]: I think it's already default, right? You do some chain of thought and you do some tool call, the cost of adding the chain of thought is kind of relatively low compared to other things. So it's not hurting to do that. And I think it's already kind of common practice, I would say.Swyx [00:13:56]: This is a good place to bring in either Tree of Thought or Reflection, your pick.Shunyu [00:14:01]: Maybe Reflection, to respect the time order, I would say.Swyx [00:14:05]: Any backstory as well, like the people involved with NOAA and the Princeton group. We talked about this offline, but people don't understand how these research pieces come together and this ideation.Shunyu [00:14:15]: I think Reflection is mostly NOAA's work, I'm more like advising kind of role. The story is, I don't remember the time, but one day we just see this pre-print that's like Reflection and Autonomous Agent with memory or whatever. And it's kind of like an extension to React, which uses this self-reflection. I'm like, oh, somehow you've become very popular. And NOAA reached out to me, it's like, do you want to collaborate on this and make this from an archive pre-print to something more solid, like a conference submission? I'm like, sure. We started collaborating and we remain good friends today. And I think another interesting backstory is NOAA was contacted by OpenAI at the time. It's like, this is pretty cool, do you want to just work at OpenAI? And I think Sierra also reached out at the same time. It's like, this is pretty cool, do you want to work at Sierra? And I think NOAA chose Sierra, but it's pretty cool because he was still like a second year undergrad and he's a very smart kid.Swyx [00:15:16]: Based on one paper. Oh my god.Shunyu [00:15:19]: He's done some other research based on programming language or chemistry or whatever, but I think that's the paper that got the attention of OpenAI and Sierra.Swyx [00:15:28]: For those who haven't gone too deep on it, the way that you present the inside of React, can you do that also for reflection? Yeah.Shunyu [00:15:35]: I think one way to think of reflection is that the traditional idea of reinforcement learning is you have a scalar reward and then you somehow back-propagate the signal of the scalar reward to the rest of your neural network through whatever algorithm, like policy grading or A2C or whatever. And if you think about the real life, most of the reward signal is not scalar. It's like your boss told you, you should have done a better job in this, but you could jump on that or whatever. It's not like a scalar reward, like 29 or something. I think in general, humans deal more with long scalar reward, or you can say language feedback. And the way that they deal with language feedback also has this back-propagation process, right? Because you start from this, you did a good job on job B, and then you reflect what could have been done differently to change to make it better. And you kind of change your prompt, right? Basically, you change your prompt on how to do job A and how to do job B, and then you do the whole thing again. So it's really like a pipeline of language where in self-graded descent, you have something like text reasoning to replace those gradient descent algorithms. I think that's one way to think of reflection.Harrison [00:16:47]: One question I have about reflection is how general do you think the algorithm there is? And so for context, I think at LangChain and at other places as well, we found it pretty easy to implement React in a standard way. You plug in any tools and it kind of works off the shelf, can get it up and running. I don't think we have an off-the-shelf kind of implementation of reflection and kind of the general sense. I think the concepts, absolutely, we see used in different kind of specific cognitive architectures, but I don't think we have one that comes off the shelf. I don't think any of the other frameworks have one that comes off the shelf. And I'm curious whether that's because it's not general enough or it's complex as well, because it also requires running it more times.Swyx [00:17:28]: Maybe that's not feasible.Harrison [00:17:30]: I'm curious how you think about the generality, complexity. Should we have one that comes off the shelf?Shunyu [00:17:36]: I think the algorithm is general in the sense that it's just as general as other algorithms, if you think about policy grading or whatever, but it's not applicable to all tasks, just like other algorithms. So you can argue PPO is also general, but it works better for those set of tasks, but not on those set of tasks. I think it's the same situation for reflection. And I think a key bottleneck is the evaluator, right? Basically, you need to have a good sense of the signal. So for example, if you are trying to do a very hard reasoning task, say mathematics, for example, and you don't have any tools, you're operating in this chain of thought setup, then reflection will be pretty hard because in order to reflect upon your thoughts, you have to have a very good evaluator to judge whether your thought is good or not. But that might be as hard as solving the problem itself or even harder. The principle of self-reflection is probably more applicable if you have a good evaluator, for example, in the case of coding. If you have those arrows, then you can just reflect on that and how to solve the bug andSwyx [00:18:37]: stuff.Shunyu [00:18:38]: So I think another criteria is that it depends on the application, right? If you have this latency or whatever need for an actual application with an end-user, the end-user wouldn't let you do two hours of tree-of-thought or reflection, right? You need something as soon as possible. So in that case, maybe this is better to be used as a training time technique, right? You do those reflection or tree-of-thought or whatever, you get a lot of data, and then you try to use the data to train your model better. And then in test time, you still use something as simple as React, but that's already improved.Alessio [00:19:11]: And if you think of the Voyager paper as a way to store skills and then reuse them, how would you compare this reflective memory and at what point it's just ragging on the memory versus you want to start to fine-tune some of them or what's the next step once you get a very long reflective corpus? Yeah.Shunyu [00:19:30]: So I think there are two questions here. The first question is, what type of information or memory are you considering, right? Is it like semantic memory that stores knowledge about the word, or is it the episodic memory that stores trajectories or behaviors, or is it more of a procedural memory like in Voyager's case, like skills or code snippets that you can use to do actions, right?Swyx [00:19:54]: That's one dimension.Shunyu [00:19:55]: And the second dimension is obviously how you use the memory, either retrieving from it, using it in the context, or fine-tuning it. I think the Cognitive Architecture for Language Agents paper has a good categorization of all the different combinations. And of course, which way you use depends on the concrete application and the concrete need and the concrete task. But I think in general, it's good to think of those systematic dimensions and all the possible options there.Swyx [00:20:25]: Harrison also has in LangMEM, I think you did a presentation in my meetup, and I think you've done it at a couple other venues as well. User state, semantic memory, and append-only state, I think kind of maps to what you just said.Shunyu [00:20:38]: What is LangMEM? Can I give it like a quick...Harrison [00:20:40]: One of the modules of LangChain for a long time has been something around memory. And I think we're still obviously figuring out what that means, as is everyone kind of in the space. But one of the experiments that we did, and one of the proof of concepts that we did was, technically what it was is you would basically create threads, you'd push messages to those threads in the background, we process the data in a few ways. One, we put it into some semantic store, that's the semantic memory. And then two, we do some extraction and reasoning over the memories to extract. And we let the user define this, but extract key facts or anything that's of interest to the user. Those aren't exactly trajectories, they're maybe more closer to the procedural memory. Is that how you'd think about it or classify it?Shunyu [00:21:22]: Is it like about knowledge about the word, or is it more like how to do something?Swyx [00:21:27]: It's reflections, basically.Harrison [00:21:28]: So in generative worlds.Shunyu [00:21:30]: Generative agents.Swyx [00:21:31]: The Smallville. Yeah, the Smallville one.Harrison [00:21:33]: So the way that they had their memory there was they had the sequence of events, and that's kind of like the raw events that happened. But then every N events, they'd run some synthesis over those events for the LLM to insert its own memory, basically. It's that type of memory.Swyx [00:21:49]: I don't know how that would be classified.Shunyu [00:21:50]: I think of that as more of the semantic memory, but to be fair, I think it's just one way to think of that. But whether it's semantic memory or procedural memory or whatever memory, that's like an abstraction layer. But in terms of implementation, you can choose whatever implementation for whatever memory. So they're totally kind of orthogonal. I think it's more of a good way to think of the things, because from the history of cognitive science and cognitive architecture and how people study even neuroscience, that's the way people think of how the human brain organizes memory. And I think it's more useful as a way to think of things. But it's not like for semantic memory, you have to do this kind of way to retrieve or fine-tune, and for procedural memory, you have to do that. I think those are totally orthogonal kind of dimensions.Harrison [00:22:34]: How much background do you have in cognitive sciences, and how much do you model some of your thoughts on?Shunyu [00:22:40]: That's a great question, actually. I think one of the undergrad influences for my follow-up research is I was doing an internship at MIT's Computational Cognitive Science Lab with Josh Tannenbaum, and he's a very famous cognitive scientist. And I think a lot of his ideas still influence me today, like thinking of things in computational terms and getting interested in language and a lot of stuff, or even developing psychology kind of stuff. So I think it still influences me today.Swyx [00:23:14]: As a developer that tried out LangMEM, the way I view it is just it's a materialized view of a stream of logs. And if anything, that's just useful for context compression. I don't have to use the full context to run it over everything. But also it's kind of debuggable. If it's wrong, I can show it to the user, the user can manually fix it, and I can carry on. That's a really good analogy. I like that. I'm going to steal that. Sure. Please, please. You know I'm bullish on memory databases. I guess, Tree of Thoughts? Yeah, Tree of Thoughts.Shunyu [00:23:39]: I feel like I'm relieving the defense in like a podcast format. Yeah, no.Alessio [00:23:45]: I mean, you had a banger. Well, this is the one where you're already successful and we just highlight the glory. It was really good. You mentioned that since thinking is kind of like taking an action, you can use action searching algorithms to think of thinking. So just like you will use Tree Search to find the next thing. And the idea behind Tree of Thought is that you generate all these possible outcomes and then find the best tree to get to the end. Maybe back to the latency question, you can't really do that if you have to respond in real time. So what are maybe some of the most helpful use cases for things like this? Where have you seen people adopt it where the high latency is actually worth the wait?Shunyu [00:24:21]: For things that you don't care about latency, obviously. For example, if you're trying to do math, if you're just trying to come up with a proof. But I feel like one type of task is more about searching for a solution. You can try a hundred times, but if you find one solution, that's good. For example, if you're finding a math proof or if you're finding a good code to solve a problem or whatever, I think another type of task is more like reacting. For example, if you're doing customer service, you're like a web agent booking a ticket for an end user. Those are more reactive kind of tasks, or more real-time tasks. You have to do things fast. They might be easy, but you have to do it reliably. And you care more about can you solve 99% of the time out of a hundred. But for the type of search type of tasks, then you care more about can I find one solution out of a hundred. So it's kind of symmetric and different.Alessio [00:25:11]: Do you have any data or intuition from your user base? What's the split of these type of use cases? How many people are doing more reactive things and how many people are experimenting with deep, long search?Harrison [00:25:23]: I would say React's probably the most popular. I think there's aspects of reflection that get used. Tree of thought, probably the least so. There's a great tweet from Jason Wei, I think you're now a colleague, and he was talking about prompting strategies and how he thinks about them. And I think the four things that he had was, one, how easy is it to implement? How much compute does it take? How many tasks does it solve? And how much does it improve on those tasks? And I'd add a fifth, which is how likely is it to be relevant when the next generation of models come out? And I think if you look at those axes and then you look at React, reflection, tree of thought, it tracks that the ones that score better are used more. React is pretty easy to implement. Tree of thought's pretty hard to implement. The amount of compute, yeah, a lot more for tree of thought. The tasks and how much it improves, I don't have amazing visibility there. But I think if we're comparing React versus tree of thought, React just dominates the first two axes so much that my question around that was going to be like, how do you think about these prompting strategies, cognitive architectures, whatever you want to call them? When you're thinking of them, what are the axes that you're judging them on in your head when you're thinking whether it's a good one or a less good one?Swyx [00:26:38]: Right.Shunyu [00:26:39]: Right. I think there is a difference between a prompting method versus research, in the sense that for research, you don't really even care about does it actually work on practical tasks or does it help? Whatever. I think it's more about the idea or the principle, right? What is the direction that you're unblocking and whatever. And I think for an actual prompting method to solve a concrete problem, I would say simplicity is very important because the simpler it is, the less decision you have to make about it. And it's easier to design. It's easier to propagate. And it's easier to do stuff. So always try to be as simple as possible. And I think latency obviously is important. If you can do things fast and you don't want to do things slow. And I think in terms of the actual prompting method to use for a particular problem, I think we should all be in the minimalist kind of camp, right? You should try the minimum thing and see if it works. And if it doesn't work and there's absolute reason to add something, then you add something, right? If there's absolute reason that you need some tool, then you should add the tool thing. If there's absolute reason to add reflection or whatever, you should add that. Otherwise, if a chain of thought can already solve something, then you don't even need to use any of that.Harrison [00:27:57]: Yeah. Or if it's just better prompting can solve it. Like, you know, you could add a reflection step or you could make your instructions a little bit clearer.Swyx [00:28:03]: And it's a lot easier to do that.Shunyu [00:28:04]: I think another interesting thing is like, I personally have never done those kind of like weird tricks. I think all the prompts that I write are kind of like just talking to a human, right? It's like, I don't know. I never say something like, your grandma is dying and you have to solve it. I mean, those are cool, but I feel like we should all try to solve things in a very intuitive way. Just like talking to your co-worker. That should work 99% of the time. That's my personal take.Swyx [00:28:29]: The problem with how language models, at least in the GPC 3 era, was that they over-optimized to some sets of tokens in sequence. So like reading the Kojima et al. paper that was listing step-by-step, like he tried a bunch of them and they had wildly different results. It should not be the case, but it is the case. And hopefully we're getting better there.Shunyu [00:28:51]: Yeah. I think it's also like a timing thing in the sense that if you think about this whole line of language model, right? Like at the time it was just like a text generator. We don't have any idea how it's going to be used, right? And obviously at the time you will find all kinds of weird issues because it's not trained to do any of that, right? But then I think we have this loop where once we realize chain of thought is important or agent is important or tool using is important, what we see is today's language models are heavily optimized towards those things. So I think in some sense they become more reliable and robust over those use cases. And you don't need to do as much prompt engineering tricks anymore to solve those things. I feel like in some sense, I feel like prompt engineering even is like a slightly negative word at the time because it refers to all those kind of weird tricks that you have to apply. But I think we don't have to do that anymore. Like given today's progress, you should just be able to talk to like a coworker. And if you're clear and concrete and being reasonable, then it should do reasonable things for you.Swyx [00:29:51]: Yeah. The way I put this is you should not be a prompt engineer because it is the goal of the big labs to put you out of a job.Shunyu [00:29:58]: You should just be a good communicator. Like if you're a good communicator to humans, you should be a good communicator to languageSwyx [00:30:02]: models.Harrison [00:30:03]: That's the key though, because oftentimes people aren't good communicators to these language models and that is a very important skill and that's still messing around with the prompt. And so it depends what you're talking about when you're saying prompt engineer.Shunyu [00:30:14]: But do you think it's like very correlated with like, are they like a good communicator to humans? You know, it's like.Harrison [00:30:20]: It may be, but I also think I would say on average, people are probably worse at communicating with language models than to humans right now, at least, because I think we're still figuring out how to do it. You kind of expect it to be magical and there's probably some correlation, but I'd say there's also just like, people are worse at it right now than talking to humans.Shunyu [00:30:36]: We should make it like a, you know, like an elementary school class or whatever, how toSwyx [00:30:41]: talk to language models. Yeah. I don't know. Very pro that. Yeah. Before we leave the topic of trees and searching, not specific about QSTAR, but there's a lot of questions about MCTS and this combination of tree search and language models. And I just had to get in a question there about how seriously should people take this?Shunyu [00:30:59]: Again, I think it depends on the tasks, right? So MCTS was magical for Go, but it's probably not as magical for robotics, right? So I think right now the problem is not even that we don't have good methodologies, it's more about we don't have good tasks. It's also very interesting, right? Because if you look at my citation, it's like, obviously the most cited are React, Refraction and Tree of Thought. Those are methodologies. But I think like equally important, if not more important line of my work is like benchmarks and environments, right? Like WebShop or SuiteVenture or whatever. And I think in general, what people do in academia that I think is not good is they choose a very simple task, like Alford, and then they apply overly complex methods to show they improve 2%. I think you should probably match the level of complexity of your task and your method. I feel like where tasks are kind of far behind the method in some sense, right? Because we have some good test-time approaches, like whatever, React or Refraction or Tree of Thought, or like there are many, many more complicated test-time methods afterwards. But on the benchmark side, we have made a lot of good progress this year, last year. But I think we still need more progress towards that, like better coding benchmark, better web agent benchmark, better agent benchmark, not even for web or code. I think in general, we need to catch up with tasks.Harrison [00:32:27]: What are the biggest reasons in your mind why it lags behind?Shunyu [00:32:31]: I think incentive is one big reason. Like if you see, you know, all the master paper are cited like a hundred times more than the task paper. And also making a good benchmark is actually quite hard. It's almost like a different set of skills in some sense, right? I feel like if you want to build a good benchmark, you need to be like a good kind of product manager kind of mindset, right? You need to think about why people should use your benchmark, why it's challenging, why it's useful. If you think about like a PhD going into like a school, right? The prior skill that expected to have is more about, you know, can they code this method and can they just run experiments and can solve that? I think building a benchmark is not the typical prior skill that we have, but I think things are getting better. I think more and more people are starting to build benchmarks and people are saying that it's like a way to get more impact in some sense, right? Because like if you have a really good benchmark, a lot of people are going to use it. But if you have a super complicated test time method, like it's very hard for people to use it.Harrison [00:33:35]: Are evaluation metrics also part of the reason? Like for some of these tasks that we might want to ask these agents or language models to do, is it hard to evaluate them? And so it's hard to get an automated benchmark. Obviously with SweetBench you can, and with coding, it's easier, but.Shunyu [00:33:50]: I think that's part of the skillset thing that I mentioned, because I feel like it's like a product manager because there are many dimensions and you need to strike a balance and it's really hard, right? If you want to make sense, very easy to autogradable, like automatically gradable, like either to grade or either to evaluate, then you might lose some of the realness or practicality. Or like it might be practical, but it might not be as scalable, right? For example, if you think about text game, human have pre-annotated all the rewards and all the language are real. So it's pretty good on autogradable dimension and the practical dimension. If you think about, you know, practical, like actual English being practical, but it's not scalable, right? It takes like a year for experts to build that game. So it's not really that scalable. And I think part of the reason that SweetBench is so popular now is it kind of hits the balance between these three dimensions, right? Easy to evaluate and being actually practical and being scalable. Like if I were to criticize upon some of my prior work, I think webshop, like it's my initial attempt to get into benchmark world and I'm trying to do a good job striking the balance. But obviously we make it all gradable and it's really scalable, but then I think the practicality is not as high as actually just using GitHub issues, right? Because you're just creating those like synthetic tasks.Harrison [00:35:13]: Are there other areas besides coding that jump to mind as being really good for being autogradable?Shunyu [00:35:20]: Maybe mathematics.Swyx [00:35:21]: Classic. Yeah. Do you have thoughts on alpha proof, the new DeepMind paper? I think it's pretty cool.Shunyu [00:35:29]: I think it's more of a, you know, it's more of like a confidence boost or like sometimes, you know, the work is not even about, you know, the technical details or the methodology that it chooses or the concrete results. I think it's more about a signal, right?Swyx [00:35:47]: Yeah. Existence proof. Yeah.Shunyu [00:35:50]: Yeah. It can be done. This direction is exciting. It kind of encourages people to work more towards that direction. I think it's more like a boost of confidence, I would say.Swyx [00:35:59]: Yeah. So we're going to focus more on agents now and, you know, all of us have a special interest in coding agents. I would consider Devin to be the sort of biggest launch of the year as far as AI startups go. And you guys in the Princeton group worked on Suiagents alongside of Suibench. Tell us the story about Suiagent. Sure.Shunyu [00:36:21]: I think it's kind of like a triology, it's actually a series of three works now. So actually the first work is called Intercode, but it's not as famous, I know. And the second work is called Suibench and the third work is called Suiagent. And I'm just really confused why nobody is working on coding. You know, it's like a year ago, but I mean, not everybody's working on coding, obviously, but a year ago, like literally nobody was working on coding. I was really confused. And the people that were working on coding are, you know, trying to solve human evil in like a sick-to-sick way. There's no agent, there's no chain of thought, there's no anything, they're just, you know, fine tuning the model and improve some points and whatever, like, I was really confused because obviously coding is the best application for agents because it's autogradable, it's super important, you can make everything like API or code action, right? So I was confused and I collaborated with some of the students in Princeton and we have this work called Intercode and the idea is, first, if you care about coding, then you should solve coding in an interactive way, meaning more like a Jupyter Notebook kind of way than just writing a program and seeing if it fails or succeeds and stop, right? You should solve it in an interactive way because that's exactly how humans solve it, right? You don't have to, you know, write a program like next token, next token, next token and stop and never do any edits and you cannot really use any terminal or whatever tool. It doesn't make sense, right? And that's the way people are solving coding at the time, basically like sampling a program from a language model without chain of thought, without tool call, without refactoring, without anything. So the first point is we should solve coding in a very interactive way and that's a very general principle that applies for various coding benchmarks. And also, I think you can make a lot of the agent task kind of like interactive coding. If you have Python and you can call any package, then you can literally also browse internet or do whatever you want, like control a robot or whatever. So that seems to be a very general paradigm. But obviously I think a bottleneck is at the time we're still doing, you know, very simple tasks like human eval or whatever coding benchmark people proposed. They were super hard in 2021, like 20%, but they're like 95% already in 2023. So obviously the next step is we need a better benchmark. And Carlos and John, which are the first authors of Swaybench, I think they come up with this great idea that we should just script GitHub and solve whatever human engineers are solving. And I think it's actually pretty easy to come up with the idea. And I think in the first week, they already made a lot of progress. They script the GitHub and they make all the same, but then there's a lot of painful info work and whatever, you know. I think the idea is super easy, but the engineering is super hard. And I feel like that's a very typical signal of a good work in the AI era now.Swyx [00:39:17]: I think also, I think the filtering was challenging, because if you look at open source PRs, a lot of them are just like, you know, fixing typos. I think it's challenging.Shunyu [00:39:27]: And to be honest, we didn't do a perfect job at the time. So if you look at the recent blog post with OpenAI, we improved the filtering so that it's more solvable.Swyx [00:39:36]: I think OpenAI was just like, look, this is a thing now. We have to fix this. These students just rushed it.Shunyu [00:39:45]: It's a good convergence of interests for me.Alessio [00:39:48]: Was that tied to you joining OpenAI? Or was that just unrelated?Shunyu [00:39:52]: It's a coincidence for me, but it's a good coincidence.Swyx [00:39:55]: There is a history of anytime a big lab adopts a benchmark, they fix it. Otherwise, it's a broken benchmark.Shunyu [00:40:03]: So naturally, once we propose swimmage, the next step is to solve it. But I think the typical way you solve something now is you collect some training samples, or you design some complicated agent method, and then you try to solve it. Either super complicated prompt, or you build a better model with more training data. But I think at the time, we realized that even before those things, there's a fundamental problem with the interface or the tool that you're supposed to use. Because that's like an ignored problem in some sense. What your tool is, or how that matters for your task. So what we found concretely is that if you just use the text terminal off the shelf as a tool for those agents, there's a lot of problems. For example, if you edit something, there's no feedback. So you don't know whether your edit is good or not. That makes the agent very confused and makes a lot of mistakes. There are a lot of small problems, you would say. Well, you can try to do prompt engineering and improve that, but it turns out to be actually very hard. We realized that the interface design is actually a very omitted part of agent design. So we did this switch agent work. And the key idea is just, even before you talk about what the agent is, you should talk about what the environment is. You should make sure that the environment is actually friendly to whatever agent you're trying to apply. That's the same idea for humans. Text terminal is good for some tasks, like git, pool, or whatever. But it's not good if you want to look at browser and whatever. Also, browser is a good tool for some tasks, but it's not a good tool for other tasks. We need to talk about how design interface, in some sense, where we should treat agents as our customers. It's like when we treat humans as a customer, we design human computer interfaces. We design those beautiful desktops or browsers or whatever, so that it's very intuitive and easy for humans to use. And this whole great subject of HCI is all about that. I think now the research idea of switch agent is just, we should treat agents as our customers. And we should do like, you know… AICI.Swyx [00:42:16]: AICI, exactly.Harrison [00:42:18]: So what are the tools that a suite agent should have, or a coding agent in general should have?Shunyu [00:42:24]: For suite agent, it's like a modified text terminal, which kind of adapts to a lot of the patterns of language models to make it easier for language models to use. For example, now for edit, instead of having no feedback, it will actually have a feedback of, you know, actually here you introduced like a syntax error, and you should probably want to fix that, and there's an ended error there. And that makes it super easy for the model to actually do that. And there's other small things, like how exactly you write arguments, right? Like, do you want to write like a multi-line edit, or do you want to write a single line edit? I think it's more interesting to think about the way of the development process of an ACI rather than the actual ACI for like a concrete application. Because I think the general paradigm is very similar to HCI and psychology, right? Basically, for how people develop HCIs, they do behavior experiments on humans, right? I do every test, right? Like, which interface is actually better? And I do those behavior experiments, kind of like psychology experiments to humans, and I change things. And I think what's really interesting for me, for this three-agent paper, is we can probably do the same thing for agents, right? We can do every test for those agents and do behavior tests. And through the process, we not only invent better interfaces for those agents, that's the practical value, but we also better understand agents. Just like when we do those A-B tests, we do those HCI, we better understand humans. Doing those ACI experiments, we actually better understand agents. And that's pretty cool.Harrison [00:43:51]: Besides that A-B testing, what are other processes that people can use to think about this in a good way?Swyx [00:43:57]: That's a great question.Shunyu [00:43:58]: And I think three-agent is an initial work. And what we do is the kind of the naive approach, right? You just try some interface, and you see what's going wrong, and then you try to fix that. We do this kind of iterative fixing. But I think what's really interesting is there'll be a lot of future directions that's very promising if we can apply some of the HCI principles more systematically into the interface design. I think that would be a very cool interdisciplinary research opportunity.Harrison [00:44:26]: You talked a lot about agent-computer interfaces and interactions. What about human-to-agent UX patterns? Curious for any thoughts there that you might have.Swyx [00:44:38]: That's a great question.Shunyu [00:44:39]: And in some sense, I feel like prompt engineering is about human-to-agent interface. But I think there can be a lot of interesting research done about... So prompting is about how humans can better communicate with the agent. But I think there could be interesting research on how agents can better communicate with humans, right? When to ask questions, how to ask questions, what's the frequency of asking questions. And I think those kinds of stuff could be very cool research.Harrison [00:45:07]: Yeah, I think some of the most interesting stuff that I saw here was also related to coding with Devin from Cognition. And they had the three or four different panels where you had the chat, the browser, the terminal, and I guess the code editor as well.Swyx [00:45:19]: There's more now.Harrison [00:45:19]: There's more. Okay, I'm not up to date. Yeah, I think they also did a good job on ACI.Swyx [00:45:25]: I think that's the main learning I have from Devin. They cracked that. Actually, there was no foundational planning breakthrough. The planner is actually pretty simple, but ACI that they broke through on.Shunyu [00:45:35]: I think making the tool good and reliable is probably like 90% of the whole agent. Once the tool is actually good, then the agent design can be much, much simpler. On the other hand, if the tool is bad, then no matter how much you put into the agent design, planning or search or whatever, it's still going to be trash.Harrison [00:45:53]: Yeah, I'd argue the same. Same with like context and instructions. Like, yeah, go hand in hand.Alessio [00:46:00]: On the tool, how do you think about the tension of like, for both of you, I mean, you're building a library, so even more for you. The tension between making now a language or a library that is like easy for the agent to grasp and write versus one that is easy for like the human to grasp and write. Because, you know, the trend is like more and more code gets written by the agent. So why wouldn't you optimize the framework to be as easy as possible for the model versus for the person?Swyx [00:46:24]: I think it's possible to design an interfaceShunyu [00:46:25]: that's both friendly to humans and agents. But what do you think?Harrison [00:46:29]: We haven't thought about that from the perspective, like we're not trying to design LangChain or LangGraph to be friendly. But I mean, I think to be friendly for agents to write.Swyx [00:46:42]: But I mean, I think we see this with like,Harrison [00:46:43]: I saw some paper that used TypeScript notation instead of JSON notation for tool calling and it got a lot better performance. So it's definitely a thing. I haven't really heard of anyone designing like a syntax or a language explicitly for agents, but there's clearly syntaxes that are better.Shunyu [00:46:59]: I think function calling is a good example where it's like a good interface for both human programmers and for agents, right? Like for developers, it's actually a very friendly interface because it's very concrete and you don't have to do prompt engineering anymore. You can be very systematic. And for models, it's also pretty good, right? Like it can use all the existing coding content. So I think we need more of those kinds of designs.Swyx [00:47:21]: I will mostly agree and I'll slightly disagree in terms of this, which is like, whether designing for humans also overlaps with designing for AI. So Malte Ubo, who's the CTO of Vercel, who is creating basically JavaScript's competitor to LangChain, they're observing that basically, like if the API is easy to understand for humans, it's actually much easier to understand for LLMs, for example, because they're not overloaded functions. They don't behave differently under different contexts. They do one thing and they always work the same way. It's easy for humans, it's easy for LLMs. And like that makes a lot of sense. And obviously adding types is another one. Like type annotations only help give extra context, which is really great. So that's the agreement. And then a disagreement is that when I use structured output to do my chain of thought, I have found that I change my field names to hint to the LLM of what the field is supposed to do. So instead of saying topics, I'll say candidate topics. And that gives me a better result because the LLM was like, ah, this is just a draft thing I can use for chain of thought. And instead of like summaries, I'll say topic summaries to link the previous field to the current field. So like little stuff like that, I find myself optimizing for the LLM where I, as a human, would never do that. Interesting.Shunyu [00:48:32]: It's kind of like the way you optimize the prompt, it might be different for humans and for machines. You can have a common ground that's both clear for humans and agents, but to improve the human performance versus improving the agent performance, they might move to different directions.Swyx [00:48:48]: Might move different directions. There's a lot more use of metadata as well, like descriptions, comments, code comments, annotations and stuff like that. Yeah.Harrison [00:48:56]: I would argue that's just you communicatingSwyx [00:48:58]: to the agent what it should do.Harrison [00:49:00]: And maybe you need to communicate a little bit more than to humans because models aren't quite good enough yet.Swyx [00:49:06]: But like, I don't think that's crazy.Harrison [00:49:07]: I don't think that's like- It's not crazy.Swyx [00:49:09]: I will bring this in because it just happened to me yesterday. I was at the cursor office. They held their first user meetup and I was telling them about the LLM OS concept and why basically every interface, every tool was being redesigned for AIs to use rather than humans. And they're like, why? Like, can we just use Bing and Google for LLM search? Why must I use Exa? Or what's the other one that you guys work with?Harrison [00:49:32]: Tavilli.Swyx [00:49:33]: Tavilli. Web Search API dedicated for LLMs. What's the difference?Shunyu [00:49:36]: Exactly. To Bing API.Swyx [00:49:38]: Exactly.Harrison [00:49:38]: There weren't great APIs for search. Like the best one, like the one that we used initially in LangChain was SERP API, which is like maybe illegal. I'm not sure.Swyx [00:49:49]: And like, you know,Harrison [00:49:52]: and now there are like venture-backed companies.Swyx [00:49:53]: Shout out to DuckDuckGo, which is free.Harrison [00:49:55]: Yes, yes.Swyx [00:49:56]: Yeah.Harrison [00:49:56]: I do think there are some differences though. I think you want, like, I think generally these APIs try to return small amounts of text information, clear legible field. It's not a massive JSON blob. And I think that matters. I think like when you talk about designing tools, it's not only the, it's the interface in the entirety, not only the inputs, but also the outputs that really matter. And so I think they try to make the outputs.Shunyu [00:50:18]: They're doing ACI.Swyx [00:50:19]: Yeah, yeah, absolutely.Harrison [00:50:20]: Really?Swyx [00:50:21]: Like there's a whole set of industries that are just being redone for ACI. It's weird. And so my simple answer to them was like the error messages. When you give error messages, they should be basically prompts for the LLM to take and then self-correct. Then your error messages get more verbose, actually, than you normally would with a human. Stuff like that. Like a little, honestly, it's not that big. Again, like, is this worth a venture-backed industry? Unless you can tell us. But like, I think Code Interpreter, I think is a new thing. I hope so.Alessio [00:50:52]: We invested in it to be so.Shunyu [00:50:53]: I think that's a very interesting point. You're trying to optimize to the extreme, then obviously they're going to be different. For example, the error—Swyx [00:51:00]: Because we take it very seriously. Right.Shunyu [00:51:01]: The error for like language model, the longer the better. But for humans, that will make them very nervous and very tired, right? But I guess the point is more like, maybe we should try to find a co-optimized common ground as much as possible. And then if we have divergence, then we should try to diverge. But it's more philosophical now.Alessio [00:51:19]: But I think like part of it is like how you use it. So Google invented the PageRank because ideally you only click on one link, you know, like the top three should have the answer. But with models, it's like, well, you can get 20. So those searches are more like semantic grouping in a way. It's like for this query, I'll return you like 20, 30 things that are kind of good, you know? So it's less about ranking and it's more about grouping.Shunyu [00:51:42]: Another fundamental thing about HCI is the difference between human and machine's kind of memory limit, right? So I think what's really interesting about this concept HCI versus HCI is interfaces that's optimized for them. You can kind of understand some of the fundamental characteristics, differences of humans and machines, right? Why, you know, if you look at find or whatever terminal command, you know, you can only look at one thing at a time or that's because we have a very small working memory. You can only deal with one thing at a time. You can only look at one paragraph of text at the same time. So the interface for us is by design, you know, a small piece of information, but more temporal steps. But for machines, that should be the opposite, right? You should just give them a hundred different results and they should just decide in context what's the most relevant stuff and trade off the context for temporal steps. That's actually also better for language models because like the cost is smaller or whatever. So it's interesting to connect those interfaces to the fundamental kind of differences of those.Harrison [00:52:43]: When you said earlier, you know, we should try to design these to maybe be similar as possible and diverge if we need to.Swyx [00:52:49]: I actually don't have a problem with them diverging nowHarrison [00:52:51]: and seeing venture-backed startups emerging now because we are different from machines code AI. And it's just so early on, like they may still look kind of similar and they may still be small differences, but it's still just so early. And I think we'll only discover more ways that they differ. And so I'm totally fine with them kind of like diverging earlySwyx [00:53:10]: and optimizing for the...Harrison [00:53:11]: I agree. I think it's more like, you know,Shunyu [00:53:14]: we should obviously try to optimize human interface just for humans. We're already doing that for 50 years. We should optimize agent interface just for agents, but we might also try to co-optimize both and see how far we can get. There's enough people to try all three directions. Yeah.Swyx [00:53:31]: There's a thesis I sometimes push, which is the sour lesson as opposed to the bitter lesson, which we're always inspired by human development, but actually AI develops its own path.Shunyu [00:53:40]: Right. We need to understand better, you know, what are the fundamental differences between those creatures.Swyx [00:53:45]: It's funny when really early on this pod, you were like, how much grounding do you have in cognitive development and human brain stuff? And I'm like

OnBoard!
EP 43.【AI年终特辑2】标志性的OpenAI DevDay,AI创业者和Deepmind研究员怎么看

OnBoard!

Play Episode Listen Later Dec 26, 2023 113:46


不追热点但求深度思考的OnBoard! 又来啦!转眼间 OpenAI 轰轰烈烈的开发者日 (OpenAI DevDay) 已经过去一个多月了。这一个月也发生了太多事情。但是除却各种大瓜和八卦,DevDay 实打实是行业里相当重要的标志性事件。这次的涉及的,不仅是API大幅成本下降、API更新,还有GPT Store, Assistant API, 多模态等等重磅的上新。我们在devday 三周后,邀请了Monica 非常期待的四位嘉宾,在经历了这一段时间的消化和观察沉淀之后,一起聊聊他们不同角度的思考! Hello World, who is OnBoard!? 这次的嘉宾,既有RPA头部公司来也科技的联合创始人兼CTO,也有真格基金EIR、经历两轮AI创业热潮的创业者视角,也有美团智能硬件负责人的软硬结合机会思考,还有来自 Google Deepmind 的研究员 Eric,从模型和技术的角度,解读 DevDay 中agent相关的更新。真的是非常精彩纷呈,又是一次接近两个小时火花飞溅的讨论。本期录制的时候,Google Gemini 还没有发布,但是回头来看,我们对多模态的讨论还是完全适用的! Enjoy! 嘉宾介绍 Peak, 真格基金 EIR(入驻企业家),Magi 创始人 胡一川,来也科技联合创始人 & CTO Eric Li,Google Deepmind 高级研究员 孙杨,美团智能硬件LLM 负责人 OnBoard! 主持: Monica:美元VC投资人,前 AWS 硅谷团队+ AI 创业公司打工人,公众号M小姐研习录 (ID: MissMStudy) 主理人 | 即刻:莫妮卡同学 我们都聊了什么 01:34 嘉宾自我介绍,如何进入AI领域的,最近看到的有意思的AI产品 11:38 OpenAI Devday 的观感:有什么让你印象深刻的更新?与网上评论相比,有哪些被高估和低估了 12:38 Peak: 为什么说GPT store 被高估了,GPT Builder 其实很有借鉴意义 14:27 GPT store 跟一个 App store 的差距在哪里?OpenAI 未来会如何构建 app store? 19:32 胡一川:为什么说 GPT4 Turbo 被低估了? 21:40 价格和 context window 为什么重要?技术角度要持续提升,有哪些难点? 29:53 Eric: 为什么不成熟的 GPT store 是一个好的决策 33:27 孙杨:为什么说 GPT store 短期高估,长期被低估?为什么说Function call, JSON return 被低估了? 39:01 DevDay 中与 Agent 相关的更新有什么亮点?对于创业公司有什么挑战,有什么机会? 53:05 美团的LLM相关尝试,有哪些落地的场景? 58:36 为什么不同的LLM作为 agent 的基座,效果会差别这么大?我们是否需要针对 agent 的基础模型? 64:13 DevDay 的更新,对于创业公司有什么影响?哪些公司会受到比较大的影响? 82:03 如何看待 Q* 的传闻?合成数据会对 LLM 生态产生怎样的影响? 86:50 GPT-4v 为代表的多模态能力使用感受如何?有可能带来怎样的新机会? 95:41 多模态能力的实现有怎样的技术路径?不同技术路径的核心差异和难点是什么? 98:55 经历了“上一波”AI的创业者,对于这一次的AI创业热潮,看到哪些异同?给其他创业者怎样的建议? 105:27 未来1-3年,最期待AI领域发生哪些变化? 重点词汇 OpenAI Devday GPT Store Assistant API Context length LUI: Linguistic User Interface 我们提到的公司 AI Pin by Humane Langchain: Build context-aware, reasoning applications with LangChain's flexible abstractions and AI-first toolkit. Fixie AI: The fastest way to build conversational AI agents Imbue: build AI systems that can reason Character AI: bringing to life the science-fiction dream of open-ended conversations and collaborations with computers. 参考文章 devday.openai.com openai.com openai.com Peak 提到的论文:Retrieval meets Long Context Large Language Models Fixie: www.fixie.ai Imbue 的融资:imbue.com 欢迎关注M小姐的微信公众号,了解更多中美软件、AI与创业投资的干货内容! M小姐研习录 (ID: MissMStudy) 大家的点赞、评论、转发是对我们最好的鼓励!如果你能在小宇宙上点个赞,Apple Podcasts 上给个五星好评,就能让更多的朋友看到我们努力制作的内容!

AI Named This Show
AI chatbots and boardroom coups

AI Named This Show

Play Episode Listen Later Nov 24, 2023 48:12


This week, Tristan and Tasia recap the leadership upheaval at OpenAI — featuring CEO Sam Altman's shocking ousting and swift return. We set it against the backdrop of this month's OpenAI DevDay revelations — including the debut of GPT-4 Turbo and custom GPTs — as well as xAI's Grok and Anthropic's Claude 2.1. Join us as we bounce from the boardroom to bots, where it's safety versus speed and specialization versus spiciness in the race for AI supremacy.FOLLOWAI Named This ShowTristan & TasiaAI Named This Show podcastOPENAI LEADERSHIP CHANGESThe OpenAI Drama: A Timeline of EventsOpenAI researchers warned board of AI breakthrough ahead of CEO ousterOPENAI DEVDAYOpenAI DevDay, Opening KeynoteThe OpenAI KeynoteEverything Announced at OpenAI's First Developers Day EventOpenAI is letting anyone create their own version of ChatGPTCreate your own AI: ChatGPT Plus unveils customizable AI chatbotsOpenAI introduces GPT-4 Turbo: Larger memory, lower cost, new knowledgeOpenAI CEO Sam Altman wants to build AI “superintelligence”XAI GROKAnnouncing GrokElon Musk Announces Grok, a ‘Rebellious' AI With Few GuardrailsWhat is Grok? Elon Musk launches AI chatbot with ‘a sense of humour' for X/TwitterElon Musk's new AI model doesn't shy from questions about cocaine and orgiesANTHROPIC CLAUDE 2.1OpenAI rival Anthropic makes its Claude chatbot even more useful Hosted on Acast. See acast.com/privacy for more information.

Rozmowy na Autopilocie
OpenAI DevDay i limity LLMów

Rozmowy na Autopilocie

Play Episode Listen Later Nov 22, 2023 35:19


W tym odcinku porozmawiamy o nowościach, które OpenAI przedstawiło na ostatniej konferencji a także o rozwijaniu własnych narzędzi w oparciu o API OpenAI. Mówimy też o kierunku, w którym zmierzają ostatnie iteracje sztucznej iteligencji i budowaniu produktów w ten sposób.

The Startup Podcast
Reacts: How Firing Sam Altman Might Lead To The End Of The World - Seriously

The Startup Podcast

Play Episode Listen Later Nov 20, 2023 59:57


On the 17th of November OpenAI announced on their blog that Sam Altman was fired. This sent shockwaves throughout the startup ecosystem worldwide. What happened next was even more fireworks. Greg Brockman, the other co-founder was removed as president of the board then Mira Murati was appointed interim CEO. Greg quit as well as four other senior people. Investors including Microsoft has been in the dark and Mira Murati and has since tried to bring Sam and Greg back to OpenAI.  So what the hell is going on? Wading through the speculation craziness of the last week are Chris and Yaniv joined by special guest and friend of the pod Emil Michael, former Chief Business Officer at Uber. Was this because of AI safety from people with no skin in the game? An internal battle? What is to become of OpenAI? You DO NOT want to miss this episode! Episode Links: Checkout our recent episode on OpenAI DevDay 2023: https://www.tsp.show/reacts-ai-insiders-discuss-openai-devday-2023-everything-you-need-to-know/  The Pact  Honour The Startup Podcast Pact! If you have listened to TSP and gotten value from it, please: Follow, rate, and review us in your listening app Subscribe to the TSP Mailing List to gain access to exclusive newsletter-only content and early-access to information on upcoming episodes: https://thestartuppodcast.beehiiv.com/subscribe      Follow us on YouTube Give us a public shout-out on LinkedIn or anywhere you have a social media following  Key links The Startup Podcast is sponsored by Vanta. Vanta helps businesses get and stay compliant by automating up to 90% of the work for the most in demand compliance frameworks. With over 200 integrations, you can easily monitor and secure the tools your business relies on. Go to www.vanta.com/tsp  for 20% off their incredible offer. Follow us on YouTube for full-video episodes: https://www.youtube.com/channel/UCNjm1MTdjysRRV07fSf0yGg   Get your question in for our next Q&A episode: https://forms.gle/NZzgNWVLiFmwvFA2A The Startup Podcast website: https://tsp.show    Learn more about Chris and Yaniv Work 1:1 with Chris: http://chrissaad.com/advisory/   Follow Chris on Linkedin: https://www.linkedin.com/in/chrissaad/   Follow Yaniv on Linkedin: https://www.linkedin.com/in/ybernstein/

Unofficial SAP on Azure podcast
#168 - The one with the November Events (Goran Condric & Holger Bruchelt) | SAP on Azure Video Podcast

Unofficial SAP on Azure podcast

Play Episode Listen Later Nov 17, 2023 35:36


In episode 168 of our SAP on Azure video podcast we talk about the recent events that happend. We start with SAP Teched 2023 in Bangalore and take a look at some of the key announcements. Then we switch continents and look at asug Tech Connector where Geoff Scott, Jürgen Müller from SAP and Scott Guthrie from Microsoft kicked off the event with a keynote. Then we quickly talk about the OpenAI DevDay before taking another closer look at Microsoft Ignite. Find all the links mentioned here: https://www.saponazurepodcast.de/episode168 Reach out to us for any feedback / questions: * Robert Boban: https://www.linkedin.com/in/rboban/ * Goran Condric: https://www.linkedin.com/in/gorancondric/ * Holger Bruchelt: https://www.linkedin.com/in/holger-bruchelt/ #Microsoft #SAP #Azure #SAPonAzure #SAPTeched #MSIgnite #asug #TechConnect ## Summary created by AI Key Topics: * SAP Teched: Holger shared his experience of attending SAP Teched in Bangalore and highlighted some of the key announcements from SAP, such as Jewel, generative AI hub, build process automation, edge integration cell, and HANA cloud vector capabilities. He also emphasized the strong collaboration between SAP and Microsoft on AI, integration, and data management. * ASUG Tech Connect: Goran and Holger discussed the ASUG Tech Connect event, where Scott Guthrie from Microsoft joined the keynote and showed the commitment to SAP customers. They also mentioned some of the research findings from ASUG on BTP adoption and usage, and some of the sessions on AI, UX, and skills. * Open AI Dev Day: Holger and Goran briefly talked about the Open AI Dev Day, where Satya Nadella also participated and showcased the partnership between Microsoft and Open AI. They mentioned some of the new features and demos from Open AI, such as GPT-4 Turbo, text-to-speech, and custom models. * Microsoft Ignite: Holger and Goran reviewed some of the news and announcements from Microsoft Ignite, where copilot was a dominant theme. They talked about how copilot can help with various scenarios, such as search, sales, security, and development. They also highlighted the integration of copilot with SAP, such as in Viva Learning, Azure Center for SAP Solutions, and Sentinel.

Avkodat - En podd för utvecklare
32 - Förra veckan var ett händelserikt år inom AI

Avkodat - En podd för utvecklare

Play Episode Listen Later Nov 17, 2023 64:44


I det här avsnittet av avkodat dyker vi ner i allt spännande som hänt kring generativ AI i och med GitHub och OpenAI:s stora utvecklarkonferenser: Cecilia förklarar hur GitHub Copilot hanterar ditt data, Jakob skänker en tanke åt alla stackars startups som byggt tunna skal ovanpå OpenAI:s API:er, Chris oroar sig för hur unga utvecklare ska lära sig framöver, Peter bygger GPT:er på löpande band och Robert försöker hänga med i största allmänhet. Medverkande: Chris Klug, Cecilia Wirén, Jakob Ehn, Peter Örneholm och Robert Folkesson Länkar: GitHub Universe:  https://www.githubuniverse.com/  Copilot Workspace: https://githubnext.com/projects/copilot-workspace/  OpenAI GPTs: https://openai.com/blog/introducing-gpts  OpenAI Devday: https://devday.openai.com/ 

AIA Podcast
ВСЕ НОВИНКИ OpenAI DevDay / Новые LLM от xAI, Samsung и Amazon / AI-Pin не удивил / AIA Podcast #21

AIA Podcast

Play Episode Listen Later Nov 17, 2023 178:13


Pi Tech
GUEST PODCAST — Що показали на OpenAI DevDay?

Pi Tech

Play Episode Listen Later Nov 16, 2023 65:16


Минулого тижня пройшов OpenAI DevDay, а отже час для новин зі світу штучного інтелекту

CTO Morning Coffee
Brew #10: Sezon na keynote ... OpenAI DevDay,s GitHub Universe ... gdzie jesteśmy z tym AI!?

CTO Morning Coffee

Play Episode Listen Later Nov 16, 2023 60:42


Sezon na keynote w pełni. Za nami keynotes OpenAI i GitHub. Czy świat stanął na krawędzi? Czy wszystko się zmieni i Ziemią zawładnie AI? Gdzie dowieźli, co dowieźli i jakie są tego konsekwencje. W Brew #10 (oklaski za okrągły odcinek

Level 5 by Palo Alto Insight
#93 OpenAI「DevDay」での発表内容は、AppleのiPhone発表のインパクトに匹敵するか

Level 5 by Palo Alto Insight

Play Episode Listen Later Nov 15, 2023 31:03


#93 OpenAI「DevDay」での発表内容は、AppleのiPhone発表のインパクトに匹敵するか 収録日:11月14日(日本時間) ▽トーク概要 生成AIを活用することで得られる具体的なメリットや懸念点について OpenAIが初のイベント「DevDay」で発表したこと おすすめコンテンツ:漫画『進撃の巨人』 ============================= Level 5 by Palo Alto Insight への意見箱 https://forms.gle/1R3pWBT4WM49ECau8 放送の感想やご質問は、こちらの意見箱へお寄せください! ============================= 【出演者】 石角友愛 / 長谷川貴久 ※今回は2人での放送回となります。 【Sponsored】 国際資格の専門校 アビタス https://www.abitus.co.jp/mba/ 石角友愛のTwitter:@tomoechama DM解放中!リプライやDMまで気軽にご連絡ください。 パロアルトインサイトHP:www.paloaltoinsight.com 楽曲提供: Atsu (beatmaker and rapper from Zenarchy) https://twitter.com/atsu_izm 「Transform」Level5テーマソング https://m.soundcloud.com/atsuizm/transform --- Send in a voice message: https://podcasters.spotify.com/pod/show/level5/message

Through The Web
Humane AI Pin Reactions, WeWork Bankruptcy, SBF gets 100+ Years in Prison, and the final Beatles Song

Through The Web

Play Episode Listen Later Nov 15, 2023 57:26


INSTAGRAM: https://www.instagram.com/throughtheweb.podcast/WATCH THE EPISODE: https://youtu.be/dsVCxxd01Q4Twitter: https://twitter.com/throughtheweb00:00 - Opener00:14 - Intro00:31 - iPhone's Keyboard01:29 - Our week / personal updates04:11 - Blackberry film07:50 - Steve Wozniak was in hospital08:40 - The final Beatles song10:59 - WeWork Bankruptcy14:40 - Omegle Shuts Down 23:31 - OpenAI's Big Announcements29:46 - Humane AI Pin Reactions42:09 - Elon's "funny" Grok AI46:56 - SBF gets a 100+ years in prison51:58 - The Sun is Angry and Optus Outage56:14 - OutroMentioned in the episode: Blackberry Film: https://www.youtube.com/watch?v=cXL_HDzBQsMSteve Wozniak: https://www.bbc.com/news/technology-67366306The final Beatles song: https://www.youtube.com/watch?v=APJAQoSCwuA&ab_channel=TheBeatlesVEVOWeWork Bankruptcy: https://www.bbc.com/news/business-67316150 Omegle Shuts down: https://www.wired.com/story/omegle-shutdown-lawsuit-child-sexual-abuse/The site Tawsif was talking about: https://neal.fun/OpenAI DevDay: https://www.youtube.com/watch?v=U9mJuUkhUzkHumane AI Pin: https://mashable.com/article/humane-ai-pin-wrong-answersGrok AI: https://twitter.com/xai/status/1721027348970238035SBF Trial: https://gizmodo.com/sam-bankman-fried-ftx-fraud-trial-biggest-moments-1850945899Internet Outage Solar Storm: https://futurism.com/the-byte/professor-warns-sun-knock-out-internetThe Gaming BlenderHave you ever wanted to design your own video game?Listen on: Apple Podcasts Spotify

What's Next|科技早知道
OpenAI 大会杀死了开发者?硅谷徐老师访谈 4 位现场活动参加人 | S7E35

What's Next|科技早知道

Play Episode Listen Later Nov 14, 2023 68:28


本期节目由香港亚洲国际都会赞助播出。感兴趣的朋友可以访问官网 (https://www.brandhk.gov.hk/zh-cn/%e9%a6%99%e6%b8%af%e7%8b%ac%e7%89%b9%e4%b9%8b%e5%a4%84/%e6%9c%80%e6%96%b0%e6%b4%bb%e5%8a%a8?cat=StartmeupHK+%e5%88%9b%e4%b8%9a%e8%8a%82+2023),了解今年创业节的各项日程安排,也可以看到过去几年创业节中重要活动的回放,还可以订阅创业节的 newsletter,随时获取香港创投圈的一手资讯! 11 月 6 日晚,OpenAI 在美国旧金山举行的首届开发者大会 DevDay 毫无疑问是本年度最重要的科技大会之一。然而,随着 GPT-4 Turbo、Assistant API、全新的多模态能力、GPT Store 等最新技术的发布,这究竟是为 OpenAI 的开发者带来更多机遇,还是断绝了他们的生路呢?此次大会是否真的消灭了一大批初创企业,甚至影响了全球知名孵化器 YC 在 2023 年的所有项目? 硅谷徐老师在这期节目中,将会与四位参加开发者大会的业内人士进行访谈,其中包括两位 OpenAI 生态圈内的创业者岳天溦和 Kaicheng Zhou、棕榈资本创始人李厚明、阿里国际站买家 AI 产品负责人 Sharon。大家首先分享了参加完大会后的感受和思考,现场的花絮,以及与其他优秀的 AI 从业者交流的体验。 随后,徐老师和四位嘉宾一起探讨了以下内容:OpenAI 员工对于通用人工智能 AGI 的理想主义,OpenAI 大会中采用的苹果式营销方式,OpenAI 如何促成技术平权及其可能带来的深远影响,以及关键技术背后存在的挑战和看点。通过这次访谈,我们期望能够更清晰地了解生成式人工智能行业的发展动向。 本期人物 • 硅谷徐老师,硅谷连续创业者、高管、人工智能创投家、斯坦福商学院客座讲师,「科技早知道」主播 | 微信公众号:硅谷云| 小红书:硅谷徐老师 • 李厚明,棕榈资本创始人 • 岳天溦,硅谷创业者 MathGPTPro CEO,百万访问量 Web App • Sharon,阿里国际站,买家 AI 产品负责人 • Kaicheng Zhou,Collov.ai (https://collov.ai/) CTO,A轮,AI 视觉设计工具 时间轴 [02:30] 从 OpenAI DevDay 现场出来后,后劲很大! [07:59] 小型创业公司在 OpenAI 颠覆之下的生存思路:「更深、更细、更快」 [14:18] ChatGPT 人均停留时长仅约 8 分钟,而 Character.AI 竟然有近 30 分钟? [16:40] OpenAI 不仅引进苹果的「饥饿营销」,还挖走苹果的核心负责人 [22:04] 「坚定纯粹的 AGI 信仰者」VS「商业化的好奇心和尝试」 [26:23] 微软和 OpenAI:现在「甜甜蜜蜜」,未来「貌合神离」 [35:08] GPT Store 怎么样助力技术平权? [40:21] OpenAI 潜在的隐私安全问题背后,创业者和公司该如何考量? [43:08] GPT Store,有可能会是下一个 App Store吗? [50:03] 开发者大会 Keynote 之外,现场还有哪些看点? [52:27] 现场员工、投资人们,对「开源模型的未来」的看法 [56:17] 从创业者、投资人、大厂从业者的角度,分享对 OpenAI 生态圈内创业者的建议 关联节目 AI Agent 智能体 真相和未来 | 硅谷徐老师对话英伟达、DeepMind大模型专家(上) (https://guiguzaozhidao.fireside.fm/20220178) 超级独角兽 Databricks 联合创始人:从对决 Snowflake,到人类如何与 AI 共存 | S7E21 硅谷徐老师 (https://guiguzaozhidao.fireside.fm/20220174) 通用人工智能离我们多远,大模型专家访谈 |S7E11 硅谷徐老师 x OnBoard! (https://guiguzaozhidao.fireside.fm/20220162) S7特快 | GPT4来啦,欢迎加速进入通用人工智能时代!| 硅谷徐老师 (https://guiguzaozhidao.fireside.fm/20220155) 关联链接 OpenAI DevDay 发布会视频 (https://www.youtube.com/watch?v=U9mJuUkhUzk&t=135s) The OpenAI Keynote (https://stratechery.com/2023/the-openai-keynote/) Discover the GPTs and plugins of ChatGPT (https://gptstore.ai/) Character.AI (https://beta.character.ai/) Function calling (https://platform.openai.com/docs/guides/function-calling/function-calling) ChatGPT 元年:野心,战略,以及绕不开的困难|TECH TUESDAY (https://mp.weixin.qq.com/s/UmWPisq2e5CK-D-d3YJdKA) 现场参加了OpenAI的大会,我感觉属于上个时代的开发者被干掉了 (https://mp.weixin.qq.com/s/xqQ0Xz4uStMCbogWzVT9kw) 幕后制作 监制:丁教、Jecci 声音设计:迪卡普里鑫 运营:瑞涵 设计:饭团 加入声动胡同会员计划 成为声动活泼会员,支持我们独立而无畏地持续创作,并让更多人听到这些声音。 支付 ¥365/年 (https://sourl.cn/rYXHK9) 成为声动胡同常住民。加入后,你将会在「声动胡同」里体验到专属内容、参与社群活动,和听友们一起「声动活泼」。 在此之前,也欢迎你成为声动胡同闲逛者 (https://sourl.cn/rYXHK9) ,免费体验会员内容、感受社群氛围。了解更多会员计划详情,我们在声动胡同等你。 (https://sourl.cn/seG52h) 商务合作 声动活泼商务合作咨询 (https://sourl.cn/6vdmQT) 加入我们 声动活泼正在招聘全职「节目监制」、「节目营销」、「商业化项目管理」,查看详细讯息请 点击链接 (https://sourl.cn/j8tk2g)。如果你已准备好简历,欢迎发送至 hr@shengfm.cn, 标题请用:姓名+岗位名称。 关于声动活泼 「用声音碰撞世界」,声动活泼致力于为人们提供源源不断的思考养料。 我们还有这些播客:声动早咖啡 (https://www.xiaoyuzhoufm.com/podcast/60de7c003dd577b40d5a40f3)、What's Next|科技早知道 (https://guiguzaozhidao.fireside.fm/episodes)、吃喝玩乐了不起 (https://www.xiaoyuzhoufm.com/podcast/644b94c494d78eb3f7ae8640)、反潮流俱乐部 (https://www.xiaoyuzhoufm.com/podcast/5e284c37418a84a0462634a4)、泡腾 VC (https://www.xiaoyuzhoufm.com/podcast/5f445cdb9504bbdb77f092e9)、商业WHY酱 (https://www.xiaoyuzhoufm.com/podcast/61315abc73105e8f15080b8a)、跳进兔子洞 (https://therabbithole.fireside.fm/) 欢迎在即刻 (https://okjk.co/Qd43ia)、微博等社交媒体上与我们互动,搜索 声动活泼 即可找到我们。 期待你给我们写邮件,邮箱地址是:ting@sheng.fm 声小音 https://files.fireside.fm/file/fireside-uploads/images/4/4931937e-0184-4c61-a658-6b03c254754d/gK0pledC.png 欢迎扫码添加声小音,在节目之外和我们保持联系。 Special Guests: Kaicheng Zhou, Sharon, 岳天溦, and 李厚明.

The Startup Podcast
Reacts: AI Insiders Discuss OpenAI DevDay 2023 - Everything You Need To Know!

The Startup Podcast

Play Episode Listen Later Nov 14, 2023 52:29


OpenAI DevDay 2023. A conference that sent reverberations throughout the startup ecosystem worldwide. Welcomed back on The Startup Podcast to discuss and debate it all are Jeremiah Owyang, general partner at Blitzscaling Ventures and founder of Lama Lounge, and Ben Parr, founder of Octane AI, AI Insiders deeply embedded in the Silicon Valley startup ecosystem. With Chris and Yaniv they explore the top takeaways from the OpenAI DevDay. From why GPT-4 Turbo is light-years better than GPT3.5, how to make your own super customized ChatGPT, to quality of life improvements for developers and tinkerers at home, interoperability, multi-modality, switching costs and more, this episode has it all! The quartet also share hot takes on the future of AI startups and what startups in the AI field should be thinking about doing right now! Stay on the pulse of AI, don't miss this episode. Episode Links: Check out Ben's startup Octane AI: https://www.octaneai.com/ Reach out to Jeremiah at Blitzscaling Ventures on Linkedin: https://www.linkedin.com/in/jowyang/ Check out Chris' GPT he created: https://www.chrissaad.com/startupai And his AI standalone app he's currently building: https://www.getwingman.ai/   The Pact Honour The Startup Podcast Pact! If you have listened to TSP and gotten value from it, please: Follow, rate, and review us in your listening app Subscribe to the TSP Mailing List to gain access to exclusive newsletter-only content and early-access to information on upcoming episodes: https://thestartuppodcast.beehiiv.com/subscribe Follow us on YouTube Give us a public shout-out on LinkedIn or anywhere you have a social media following Key links Follow us on YouTube for full-video episodes: https://www.youtube.com/channel/UCNjm1MTdjysRRV07fSf0yGg Get your question in for our next Q&A episode: https://forms.gle/NZzgNWVLiFmwvFA2A The Startup Podcast website: https://tsp.show Learn more about Chris and Yaniv Work 1:1 with Chris: http://chrissaad.com/advisory/ Follow Chris on Linkedin: https://www.linkedin.com/in/chrissaad/ Follow Yaniv on Linkedin: https://www.linkedin.com/in/ybernstein/

DOU Podcast
Огляд OpenAI DevDay | Редизайн monobank | Ставлення Ілона Маска до співробітників — DOU News #120

DOU Podcast

Play Episode Listen Later Nov 13, 2023 18:52


⏩ Навігація  00:00 Інтро 00:22 ChatGPT Turbo, створення власних GPT та нові API — що презентували на першому OpenAI DevDay https://dou.ua/forums/topic/46106/ 04:30 Верховна Рада затвердила держбюджет на 2024 рік https://ain.ua/2023/11/09/parlament-zatverdyv-byudzhet-na-2024/ 05:59 At SpaceX, worker injuries soar in Elon Musk's rush to Mars https://www.reuters.com/investigates/special-report/spacex-musk-safety/ 07:29 Перша премія DOU. Шукаємо проєкти, ініціативи, подкасти української ІТ-спільноти https://dou.ua/lenta/articles/dou-award-2023-apply/ 09:13 Російські хакери Sandworm стояли за атакою на енергосистему України у 2022 році https://ain.ua/2023/11/09/sandworm-stoyaly-za-atakoyu-10-zhovtnya-2022/ 10:00 Snap Lays Off Product Managers as Spiegel Revamps Workforce https://www.theinformation.com/articles/snap-lays-off-product-managers-as-spiegel-revamps-workforce 11:30 Безоплатне навчання для 1000 жінок від Projector Foundation та European Union https://www.prjctrfoundation.com/project-eu 12:39 Підтвердили загибель нацгвардійця Максима Петренка. До війни він очолював кафедру комп'ютерної інженерії в університеті «Україна» https://dou.ua/lenta/news/maksym-petrenko-died-in-the-war/ 13:14 Культовий плеєр Winamp з'явиться на iPhone та Android https://ain.ua/2023/11/08/winamp-zyavytsya-na-iphone-ta-android/ 14:31 Уряд перевів проєкт «е-Підприємець» на постійну основу — що змінилось https://ain.ua/2023/11/08/uryad-pereviv-ye-pidpryyemecz-na-postijnu-osnovu/ 15:08 У monobank перший за шість років редизайн — як виглядатиме застосунок https://ain.ua/2023/11/07/u-monobank-redyzajn/ 17:35 Ідея проєкту, який допоможе відсіяти кремлєботів https://dou.ua/forums/topic/46128/ 18:31 Курс біткоїна

Bigdata Hebdo
Episode 174 : OpenAI devday 2023

Bigdata Hebdo

Play Episode Listen Later Nov 13, 2023 49:12


* ⚠️ Don't try this at home: CSS _as_ the backend - introducing Cascading Server Sheets! -> https://dev.to/thormeier/dont-try-this-at-home-css-as-the-backend-what-3oih### LLM fever* Open AI Dev day 2024 -> https://openai.com/blog/new-models-and-developer-products-announced-at-devday* Knowledge Distillation: Principles, Algorithms, Applications -> https://neptune.ai/blog/knowledge-distillation* Quand la boîte noire des IA génératives livre ses secrets -> https://www.lexpress.fr/amp/economie/high-tech/quand-la-boite-noire-des-ia-generatives-livre-ses-secrets-BFUANKGCOZF2DDGJRNFDNAVHZU/### Data-Science* Hidden Markov Models Explained with a Real Life Example and Python code -> https://medium.com/towards-data-science/hidden-markov-models-explained-with-a-real-life-example-and-python-code-2df2a7956d65### Data-eng* Open Data Contract Standard -> https://github.com/bitol-io/open-data-contract-standard* Twitter's Owner Elon Musk refuses to pay Google Cloud Bill -> https://medium.com/codex/twitters-owner-elon-musk-refuses-to-pay-google-cloud-bill-8e0ec1030101

RationalAnswer
#130 - Теперь можно торговать акциями за рубежом / По 100 тыс. за заморозку / Кокаиновые бегемоты

RationalAnswer

Play Episode Listen Later Nov 13, 2023 22:59


Запишись на бесплатную консультацию от Calltouch о том, как платформа омниканального маркетинга может помочь твоему бизнесу: https://goo.su/0HkesA1?erid=LjN8KLYgy Реклама. ООО «Колтач Солюшнс». Erid: LjN8KLYgy Больше финансовых новостей и авторской аналитики у меня в Телеграм-канале: https://t.me/RationalAnswer Дополнительные материалы к выпуску: - Юридический анализ указа о разрешении торговли бумагами на зарубежных счетах – https://vc.ru/money/911552 - Обзор последних новинок ChatGPT и конференции OpenAI DevDay – https://habr.com/ru/companies/ods/articles/772292/ Текстовая версия выпуска со ссылками: https://vc.ru/money/911597 Посмотреть выпуск на YouTube: https://www.youtube.com/watch?v=0OilL5ZPvSE Поддержи проект RationalAnswer и попади в титры: - Patreon (в валюте) – https://www.patreon.com/RationalAnswer - Boosty (в рублях) – https://boosty.to/RationalAnswer СОДЕРЖАНИЕ: 00:28 – По 100 тыс. рублей за замороженные активы 01:42 – Теперь можно торговать акциями за рубежом 07:21 – СПБ Биржа глубже думает над санкциями 08:40 – Министерство счастья от Матвиенко 09:10 – Омниканальный маркетинг 10:59 – Беспринципные принципы Рэй Далио 14:29 – Кокаиновые бегемоты 15:36 – OpenAI DevDay 2023: новинки ChatGPT 19:11 – Прокуратура РФ не любит нейросетки 20:11 – Лазер в глаз или NFT хоть раз? 21:25 – Забытый пароль и Лохмус по жизни 22:10 – Хорошая новость недели

Farklı Düşün
M3 MacBook Pro, Denizcilik Müzesi, AI Pin, OpenAI DevDay, GitHub Universe

Farklı Düşün

Play Episode Listen Later Nov 12, 2023 149:36


Bu bölümde Apple'ın yeni tanıttığı M3 işlemcili MacBook Pro'ları, Hamburg'taki denizcilik müzesini, Humane'in AI Pin'i ve OpenAI'ın DevDay'i üzerine sohbet ettik.Bizi dinlemekten keyif alıyorsanız, kahve ısmarlayarak bizi destekleyebilir ve Telegram grubumuza katılabilirsiniz. :)Yorumlarınızı, sorularınızı ya da sponsorluk tekliflerinizi info@farklidusun.net e-posta adresine iletebilirsiniz. Bizi Twitter üzerinden takip edebilirsiniz.Zaman damgaları:00:00 - M3 MacBook Pro16:10 - Denizcilik Müzesi27:56 - Yeni Blog37:13 - NSIstanbul etkinliği, iOS programlama59:42 - Okuduklarımız1:16:03 - Assassin's Creed: Mirage1:21:08 - İzlediklerimiz1:34:10 - GitHub Universe1:43:47 - OpenAI DevDay2:05:05 - Humane ai pin2:15:27 - Apple sunumunun iPhone ile çekilmesi2:22:10 - Google'ın tekel davasıBölüm linkleri:Apple Event - 30 EkimMKBHD - Space Black M3 Max MacBook Pro Review: We Can Game Now?!Explore GPU advancements in M3 and A17 ProFirst Impressions: iPhone 15 Pro Spatial Videos on Vision ProHamburg Uluslararası Denizcilik MüzesiMert'in BloguJekyllWatt's the Secret? Cutting My Electricity Costs by 30%TuistThe Dawn of Everything: A New History of HumanityÇalınan Dikkat: Neden Odaklanamıyoruz?Thinking in SwiftUIAssassin's Creed: MirageJusantLawrence of ArabiaGitHub Universe 2023 opening keynoteOpenAI DevDay, Opening KeynoteStratechery - The OpenAI KeynoteHumane ai pinBehind the scenes: An Apple Event shot on iPhoneHere's what Apple really means when it says ‘shot on iPhone‘What Does and Doesn't Matter about Apple Shooting their October Event on iPhone 15 Pro MaxMicrosoft reportedly pitched Apple on buying Bing to no availGoogle reportedly pays $18 billion a year to be Apple's default search engine

engineer meeting podcast
vol.216 OpenAI DevDay感想

engineer meeting podcast

Play Episode Listen Later Nov 12, 2023 50:42


今回はOpenAI DevDayの感想について話しました [トピックス] https://devday.openai.com/ GitHub Copilot DALL·E 3

DeepTech315
DeepTech315: OpenAI DevDay / Grok / Humane AI Pin

DeepTech315

Play Episode Listen Later Nov 10, 2023 15:02


On this week's episode of DeepTech315 Gene Munster and Doug Clinton talk about OpenAI's DevDay, X.AI's Grok, and Humane's AI Pin. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit deepwatermgmt.substack.com

AI in Education Podcast
Rapid Rundown : A summary of the week of AI in education and research

AI in Education Podcast

Play Episode Listen Later Nov 10, 2023 15:37


This week's episode was our new format shortcast - a rapid rundown of some of the news about AI in Education. And it was a hectic week! Here's the links to the topics discussed in the podcast   Australian academics apologise for false AI-generated allegations against big four consultancy firms   https://www.theguardian.com/business/2023/nov/02/australian-academics-apologise-for-false-ai-generated-allegations-against-big-four-consultancy-firms?CMP=Share_iOSApp_Other   New UK DfE guidance on generative AI   The UK's Department for Education guidance on generative AI looks useful for teachers and schools. It has good advice about making sure that you are aware of students' use of AI, and are also aware of the need to ensure that their data - and your data - is protected, including not letting it be used for training. The easiest way to do this is use enterprise grade AI - education or business services - rather than consumer services (the difference between using Teams and Facebook)   You can read the DfE's guidelines here: https://lnkd.in/eqBU4fR5   You can check out the assessment guidelines here: https://lnkd.in/ehYYBktb     "Everyone Knows Claude Doesn't Show Up on AI Detectors" Not a paper, but an article from an Academic https://michellekassorla.substack.com/p/everyone-knows-claude-doesnt-show The article discusses an experiment conducted to test AI detectors' ability to identify content generated by AI writing tools. The author used different AI writers, including ChatGPT, Bard, Bing, and Claude, to write essays which were then checked for plagiarism and AI content using Turnitin. The tests revealed that while other AIs were detected, Claude's submissions consistently bypassed the AI detectors.   New AI isn't like Old AI - you don't have to spend 80% of your project and budget up front gathering and cleaning data   Ethan Mollick on Twitter: The biggest confusion I see about AI from smart people and organizations is conflation between the key to success in pre-2023 machine learning/data science AI (having the best data) & current LLM/generative AI (using it a lot to see what it knows and does, worry about data later) Ethan's tweet 4th November His blog post: https://www.oneusefulthing.org/p/on-holding-back-the-strange-ai-tide       Open AI's Dev Day   We talked about the Open AI announcements this week, including the new GPTs - which is a way to create and use assistants. The Open AI blog post is here: https://openai.com/blog/new-models-and-developer-products-announced-at-devday The blog post on GPT's is here: https://openai.com/blog/introducing-gpts And the keynote video is here: OpenAI DevDay, Opening Keynote       Research Corner Gender Bias Quote: "Contrary to concerns, the results revealed no significant difference in gender bias between the writings of the AI-assisted groups and those without AI support. These findings are pivotal as they suggest that LLMs can be employed in educational settings to aid writing without necessarily transferring biases to student work"   Tutor Feedback tool  Summary of the Research: This paper presents two longitudinal studies assessing the impact of AI-generated feedback on English as a New Language (ENL) learners' writing. The first study compared the learning outcomes of students receiving feedback from ChatGPT with those receiving human tutor feedback, finding no significant difference in outcomes. The second study explored ENL students' preferences between AI and human feedback, revealing a nearly even split. The research suggests that AI-generated feedback can be incorporated into ENL writing assessment without detriment to learning outcomes, recommending a blended approach to capitalize on the strengths of both AI and human feedback.   Personalised feedback in medical learning Summary of the Research: The study examined the efficacy of ChatGPT in delivering formative feedback within a collaborative learning workshop for health professionals. The AI was integrated into a professional development course to assist in formulating digital health evaluation plans. Feedback from ChatGPT was considered valuable by 84% of participants, enhancing the learning experience and group interaction. Despite some participants preferring human feedback, the study underscores the potential of AI in educational settings, especially where personalized attention is limited.   High Stakes answers Your Mum was right all along - ask nicely if you want things! And, in the case of ChatGPT, tell it your boss/Mum/sister is relying on your for the right answer!   Summary of the Research: This paper explores the potential of Large Language Models (LLMs) to comprehend and be augmented by emotional stimuli. Through a series of automatic and human-involved experiments across 45 tasks, the study assesses the performance of various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4. The concept of "EmotionPrompt," which integrates emotional cues into standard prompts, is introduced and shown to significantly improve LLM performance. For instance, the inclusion of emotional stimuli led to an 8.00% relative performance improvement in Instruction Induction and a 115% increase in BIG-Bench tasks. The human study further confirmed a 10.9% average enhancement in generative tasks, validating the efficacy of emotional prompts in improving the quality of LLM outputs.

En.Digital Podcast
La Tertul-IA #11: OpenAI DevDay, Fine-Tuning, regulación y nuevos usos de la IA

En.Digital Podcast

Play Episode Listen Later Nov 10, 2023 90:13


En este episodio, hablamos de las novedades presentadas por OpenAI en el DevDay, de la introducción del contexto en las conversaciones con ChatGPT y su diferencia con el prompt, del método del Fine-Tuning para entrenar modelos de IA, de las diferencias normativas entre EEUU y Europa, y de los nuevos usos de la inteligencia artificial.La TertulIA que escuchan las IAs comiendo pipas con Luis Martín (Product Hackers), Frankie Carrero (VASS) y Corti (Product Hackers). NOSOTROS

Two Voice Devs
Episode 171 - Ups and Downs of the OpenAI DevDay Roller Coaster

Two Voice Devs

Play Episode Listen Later Nov 10, 2023 39:52


On this episode, Mark Tucker and Allen Firstenberg dive deep into the latest announcements by OpenAI. They discuss various developments including the launch of GPTs (collections of prompts and documents with configuration settings), the new text-to-speech model, upcoming GPT-4 Turbo, reproducible outputs, and the introduction of the Assistant API. While they express excitement for what these developments could mean for #VoiceFirst, #ConversationAI, and #GenerativeAI, they also voice concerns about discovery solutions, monetization, and the reliance on platform-based infrastructure. Tune in and join the conversation. More info: https://openai.com/blog/new-models-and-developer-products-announced-at-devday 00:04 Introduction and OpenAI Announcements Edition 00:52 Discussion on OpenAI's New Text to Speech Model 02:15 Exploring the Pricing and Quality of OpenAI's Text to Speech Model 02:52 Concerns and Limitations of OpenAI's Text to Speech Model 06:24 Introduction to GPT 4 Turbo 06:48 Benefits and Limitations of GPT 4 Turbo 09:27 Exploring the Features of GPT 4 Turbo 18:52 Introduction to GPTs and Their Potential 22:22 Concerns and Questions About GPTs 32:14 Discussion on the Assistant API 37:32 Final Thoughts and Wrap Up

The Nonlinear Library
LW - On OpenAI Dev Day by Zvi

The Nonlinear Library

Play Episode Listen Later Nov 9, 2023 25:29


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On OpenAI Dev Day, published by Zvi on November 9, 2023 on LessWrong. OpenAI DevDay was this week. What delicious and/or terrifying things await? Turbo Boost First off, we have GPT-4-Turbo. Today we're launching a preview of the next generation of this model, GPT-4 Turbo. GPT-4 Turbo is more capable and has knowledge of world events up to April 2023. It has a 128k context window so it can fit the equivalent of more than 300 pages of text in a single prompt. We also optimized its performance so we are able to offer GPT-4 Turbo at a 3x cheaper price for input tokens and a 2x cheaper price for output tokens compared to GPT-4. GPT-4 Turbo is available for all paying developers to try by passing gpt-4-1106-preview in the API and we plan to release the stable production-ready model in the coming weeks. Knowledge up to April 2023 is a big game. Cutting the price in half is another big game. A 128k context window retakes the lead on that from Claude-2. That chart from last week of how GPT-4 was slow and expensive, opening up room for competitors? Back to work, everyone. What else? Function calling updates Function calling lets you describe functions of your app or external APIs to models, and have the model intelligently choose to output a JSON object containing arguments to call those functions. We're releasing several improvements today, including the ability to call multiple functions in a single message: users can send one message requesting multiple actions, such as "open the car window and turn off the A/C", which would previously require multiple roundtrips with the model (learn more). We are also improving function calling accuracy: GPT-4 Turbo is more likely to return the right function parameters. This kind of feature seems highly fiddly and dependent. When it starts working well enough, suddenly it is great, and I have no idea if this will count. I will watch out for reports. For now, I am not trying to interact with any APIs via GPT-4. Use caution. Improved instruction following and JSON mode GPT-4 Turbo performs better than our previous models on tasks that require the careful following of instructions, such as generating specific formats (e.g., "always respond in XML"). It also supports our new JSON mode, which ensures the model will respond with valid JSON. The new API parameter response_format enables the model to constrain its output to generate a syntactically correct JSON object. JSON mode is useful for developers generating JSON in the Chat Completions API outside of function calling. Better instruction following is incrementally great. Always frustrating when instructions can't be relied upon. Could allow some processes to be profitably automated. Reproducible outputs and log probabilities The new seed parameter enables reproducible outputs by making the model return consistent completions most of the time. This beta feature is useful for use cases such as replaying requests for debugging, writing more comprehensive unit tests, and generally having a higher degree of control over the model behavior. We at OpenAI have been using this feature internally for our own unit tests and have found it invaluable. We're excited to see how developers will use it. Learn more. We're also launching a feature to return the log probabilities for the most likely output tokens generated by GPT-4 Turbo and GPT-3.5 Turbo in the next few weeks, which will be useful for building features such as autocomplete in a search experience. I love the idea of seeing the probabilities of different responses on the regular, especially if incorporated into ChatGPT. It provides so much context for knowing what to make of the answer. The distribution of possible answers is the true answer. Super excited in a good way. Updated GPT-3.5 Turbo In addition to GPT-4 Turbo, we are also releasing a...

Infinite Machine Learning
Thoughts on OpenAI DevDay announcements | Vikram Sreekanti, cofounder and CEO of RunLLM

Infinite Machine Learning

Play Episode Listen Later Nov 9, 2023 33:24


Vikram Sreekanti is the cofounder and CEO of RunLLM, a developer platform for the LLM stack. They have raised funding from investors such as Redpoint and Essence. He has a PhD in Computer Science from UC Berkeley. In this episode, we cover a range of topics including: - OpenAI DevDay announcements - Are long context windows useful? - Open source AI - The founding of RunLLM - Large foundation models vs smaller specialist models - Why is OpenAI too cheap to beat? - Nvidia's strengths and potential weaknesses Vikram's favorite book: Slaughterhouse-Five (Author: Kurt Vonnegut) --------Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: https://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 Twitter: https://twitter.com/prateekvjoshi 

The Nonlinear Library: LessWrong
LW - On OpenAI Dev Day by Zvi

The Nonlinear Library: LessWrong

Play Episode Listen Later Nov 9, 2023 25:29


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On OpenAI Dev Day, published by Zvi on November 9, 2023 on LessWrong. OpenAI DevDay was this week. What delicious and/or terrifying things await? Turbo Boost First off, we have GPT-4-Turbo. Today we're launching a preview of the next generation of this model, GPT-4 Turbo. GPT-4 Turbo is more capable and has knowledge of world events up to April 2023. It has a 128k context window so it can fit the equivalent of more than 300 pages of text in a single prompt. We also optimized its performance so we are able to offer GPT-4 Turbo at a 3x cheaper price for input tokens and a 2x cheaper price for output tokens compared to GPT-4. GPT-4 Turbo is available for all paying developers to try by passing gpt-4-1106-preview in the API and we plan to release the stable production-ready model in the coming weeks. Knowledge up to April 2023 is a big game. Cutting the price in half is another big game. A 128k context window retakes the lead on that from Claude-2. That chart from last week of how GPT-4 was slow and expensive, opening up room for competitors? Back to work, everyone. What else? Function calling updates Function calling lets you describe functions of your app or external APIs to models, and have the model intelligently choose to output a JSON object containing arguments to call those functions. We're releasing several improvements today, including the ability to call multiple functions in a single message: users can send one message requesting multiple actions, such as "open the car window and turn off the A/C", which would previously require multiple roundtrips with the model (learn more). We are also improving function calling accuracy: GPT-4 Turbo is more likely to return the right function parameters. This kind of feature seems highly fiddly and dependent. When it starts working well enough, suddenly it is great, and I have no idea if this will count. I will watch out for reports. For now, I am not trying to interact with any APIs via GPT-4. Use caution. Improved instruction following and JSON mode GPT-4 Turbo performs better than our previous models on tasks that require the careful following of instructions, such as generating specific formats (e.g., "always respond in XML"). It also supports our new JSON mode, which ensures the model will respond with valid JSON. The new API parameter response_format enables the model to constrain its output to generate a syntactically correct JSON object. JSON mode is useful for developers generating JSON in the Chat Completions API outside of function calling. Better instruction following is incrementally great. Always frustrating when instructions can't be relied upon. Could allow some processes to be profitably automated. Reproducible outputs and log probabilities The new seed parameter enables reproducible outputs by making the model return consistent completions most of the time. This beta feature is useful for use cases such as replaying requests for debugging, writing more comprehensive unit tests, and generally having a higher degree of control over the model behavior. We at OpenAI have been using this feature internally for our own unit tests and have found it invaluable. We're excited to see how developers will use it. Learn more. We're also launching a feature to return the log probabilities for the most likely output tokens generated by GPT-4 Turbo and GPT-3.5 Turbo in the next few weeks, which will be useful for building features such as autocomplete in a search experience. I love the idea of seeing the probabilities of different responses on the regular, especially if incorporated into ChatGPT. It provides so much context for knowing what to make of the answer. The distribution of possible answers is the true answer. Super excited in a good way. Updated GPT-3.5 Turbo In addition to GPT-4 Turbo, we are also releasing a...

OpenAI DevDay: Beyond the Headlines with Logan Kilpatrick, OpenAI's Dev Relations Lead

Play Episode Listen Later Nov 8, 2023 75:57


We're deep diving into OpenAI DevDay with Logan Kilpatrick, Dev Relations Lead at OpenAI. If you need an ERP platform, check out our sponsor NetSuite: http://netsuite.com/cognitive. SPONSORS: SHOPIFY: https://shopify.com/cognitive for a $1/month trial period Shopify is the global commerce platform that helps you sell at every stage of your business. Shopify powers 10% of ALL eCommerce in the US. And Shopify's the global force behind Allbirds, Rothy's, and Brooklinen, and 1,000,000s of other entrepreneurs across 175 countries.From their all-in-one e-commerce platform, to their in-person POS system – wherever and whatever you're selling, Shopify's got you covered. With free Shopify Magic, sell more with less effort by whipping up captivating content that converts – from blog posts to product descriptions using AI. Sign up for $1/month trial period: https://shopify.com/cognitive With the onset of AI, it's time to upgrade to the next generation of the cloud: Oracle Cloud Infrastructure. OCI is a single platform for your infrastructure, database, application development, and AI needs. Train ML models on the cloud's highest performing NVIDIA GPU clusters. Do more and spend less like Uber, 8x8, and Databricks Mosaic, take a FREE test drive of OCI at oracle.com/cognitive NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off. X/SOCIAL: @labenz (Nathan) @OfficialLoganK (Logan) @CogRev_Podcast TIMESTAMPS: (00:00:00) - Episode Preview (00:02:08) - How many startups did OpenAI kill? (00:05:50) - Current employee count at OpenAI (00:06:59) - OpenAI's mission being focused on developing safe AGI to benefit humanity (00:07:10) - How the GPT Store relates to AGI and progressing agent development (00:08:22) - OpenAI's strategy to release AI iteratively so society can adapt (00:10:50) - Safety considerations around the OpenAI Assistant release (00:11:30) - Capability overhangs and is the internet ready for agents? (00:14:13) - Why certain agent capabilities like planning aren't enabled yet by OpenAI (00:15:28) - Sponsors: Shopify | Omneky (00:17:34) - GPT-4-1106 Preview designation (00:21:50) - 16k fine-tuning for 3.5 Turbo (00:25:13) - GPT-4 Finetuning and how to join the experiment (00:27:53) - Custom models: $2-3 million pricing to build a defensible business (00:29:48) - Bringing costs down to bring custom models to more people (00:30:19) - Sponsors: Oracle | Netsuite (00:33:53) - Copyright shield (00:35:42) - OpenAI doesn't train on data you send to the API (00:36:37) - New modalities and low res GPT vision (00:37:26) - GPT Vision Assessment for Aesthetics (00:42:30) - WhisperLarge v3 (00:44:15) - Text-to-speech API: the voice strategy and AI safety (00:51:45) - Log probabilities coming soon (00:53:45) - The evolution of plugins to GPTs: the challenges with plugins (00:55:33) - GPT Instructions, expanded knowledge, and actions (01:00:18) - How is auth handled with GPTs (01:01:04) - Hybrid auth (01:02:50) - GPT Assistant API Billing (01:07:58) - AI Safety (01:10:28) - OpenAI Jailbreaks and Bug Bounties (01:11:57) - The OpenAI roadmap for a year from now The Cognitive Revolution is brought to you by the Turpentine Media network. Producer: Vivian Meng Executive Producers: Amelia Salyers, and Erik Torenberg Editor: Graham Bessellieu For inquiries about guests or sponsoring the podcast, please email vivian@turpentine.co

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
AGI is Being Achieved Incrementally (OpenAI DevDay w/ Simon Willison, Alex Volkov, Jim Fan, Raza Habib, Shreya Rajpal, Rahul Ligma, et al)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Nov 8, 2023 142:33


SF folks: join us at the AI Engineer Foundation's Emergency Hackathon tomorrow and consider the Newton if you'd like to cowork in the heart of the Cerebral Arena.Our community page is up to date as usual!~800,000 developers watched OpenAI Dev Day, ~8,000 of whom listened along live on our ThursdAI x Latent Space, and ~800 of whom got tickets to attend in person:OpenAI's first developer conference easily surpassed most people's lowballed expectations - they simply did everything short of announcing GPT-5, including:* ChatGPT (the consumer facing product)* GPT4 Turbo already in ChatGPT (running faster, with an April 2023 cutoff), all noticed by users weeks before the conference* Model picker eliminated, God Model chooses for you* GPTs - “tailored version of ChatGPT for a specific purpose” - stopping short of “Agents”. With custom instructions, expanded knowledge, and actions, and an intuitive no-code GPT Builder UI (we tried all these on our livestream yesterday and found some issues, but also were able to ship interesting GPTs very quickly) and a GPT store with revenue sharing (an important criticism we focused on in our episode on ChatGPT Plugins)* API (the developer facing product)* APIs for Dall-E 3, GPT4 Vision, Code Interpreter (RIP Advanced Data Analysis), GPT4 Finetuning and (surprise!) Text to Speech* many thought each of these would take much longer to arrive* usable in curl and in playground* BYO Interpreter + Async Agents?* Assistant API: stateful API backing “GPTs” like apps, with support for calling multiple tools in parallel, persistent Threads (storing message history, unlimited context window with some asterisks), and uploading/accessing Files (with a possibly-too-simple RAG algorithm, and expensive pricing)* Whisper 3 announced and open sourced (HuggingFace recap)* Price drops for a bunch of things!* Misc: Custom Models for big spending ($2-3m) customers, Copyright Shield, SatyaThe progress here feels fast, but it is mostly (incredible) last-mile execution on model capabilities that we already knew to exist. On reflection it is important to understand that the one guiding principle of OpenAI, even more than being Open (we address that in part 2 of today's pod), is that slow takeoff of AGI is the best scenario for humanity, and that this is what slow takeoff looks like:When introducing GPTs, Sam was careful to assert that “gradual iterative deployment is the best way to address the safety challenges with AI”:This is why, in fact, GPTs and Assistants are intentionally underpowered, and it is a useful exercise to consider what else OpenAI continues to consider dangerous (for example, many people consider a while(true) loop a core driver of an agent, which GPTs conspicuously lack, though Lilian Weng of OpenAI does not).We convened the crew to deliver the best recap of OpenAI Dev Day in Latent Space pod style, with a 1hr deep dive with the Functions pod crew from 5 months ago, and then another hour with past and future guests live from the venue itself, discussing various elements of how these updates affect their thinking and startups. Enjoy!Show Notes* swyx live thread (see pinned messages in Twitter Space for extra links from community)* Newton AI Coworking Interest Form in the heart of the Cerebral ArenaTimestamps* [00:00:00] Introduction* [00:01:59] Part I: Latent Space Pod Recap* [00:06:16] GPT4 Turbo and Assistant API* [00:13:45] JSON mode* [00:15:39] Plugins vs GPT Actions* [00:16:48] What is a "GPT"?* [00:21:02] Criticism: the God Model* [00:22:48] Criticism: ChatGPT changes* [00:25:59] "GPTs" is a genius marketing move* [00:26:59] RIP Advanced Data Analysis* [00:28:50] GPT Creator as AI Prompt Engineer* [00:31:16] Zapier and Prompt Injection* [00:34:09] Copyright Shield* [00:38:03] Sharable GPTs solve the API distribution issue* [00:39:07] Voice* [00:44:59] Vision* [00:49:48] In person experience* [00:55:11] Part II: Spot Interviews* [00:56:05] Jim Fan (Nvidia - High Level Takeaways)* [01:05:35] Raza Habib (Humanloop) - Foundation Model Ops* [01:13:59] Surya Dantuluri (Stealth) - RIP Plugins* [01:21:20] Reid Robinson (Zapier) - AI Actions for GPTs* [01:31:19] Div Garg (MultiOn) - GPT4V for Agents* [01:37:15] Louis Knight-Webb (Bloop.ai) - AI Code Search* [01:49:21] Shreya Rajpal (Guardrails.ai) - on Hallucinations* [01:59:51] Alex Volkov (Weights & Biases, ThursdAI) - "Keeping AI Open"* [02:10:26] Rahul Sonwalkar (Julius AI) - Advice for FoundersTranscript[00:00:00] Introduction[00:00:00] swyx: Hey everyone, this is Swyx coming at you live from the Newton, which is in the heart of the Cerebral Arena. It is a new AI co working space that I and a couple of friends are working out of. There are hot desks available if you're interested, just check the show notes. But otherwise, obviously, it's been 24 hours since the opening of Dev Day, a lot of hot reactions and longstanding tradition, one of the longest traditions we've had.[00:00:29] And the latent space pod is to convene emergency sessions and record the live thoughts of developers and founders going through and processing in real time. I think a lot of the roles of podcasts isn't as perfect information delivery channels, but really as an audio and oral history of what's going on as it happens, while it happens.[00:00:49] So this one's a little unusual. Previously, we only just gathered on Twitter Spaces, and then just had a bunch of people. The last one was the Code Interpreter one with 22, 000 people showed up. But this one is a little bit more complicated because there's an in person element and then a online element.[00:01:06] So this is a two part episode. The first part is a recorded session between our latent space people and Simon Willison and Alex Volkoff from the Thursday iPod, just kind of recapping the day. But then also, as the second hour, I managed to get a bunch of interviews with previous guests on the pod who we're still friends with and some new people that we haven't yet had on the pod.[00:01:28] But I wanted to just get their quick reactions because most of you have known and loved Jim Fan and Div Garg and a bunch of other folks that we interviewed. So I just want to, I'm excited to introduce To you the broader scope of what it's like to be at OpenAI Dev Day in person bring you the audio experience as well as give you some of the thoughts that developers are having as they process the announcements from OpenAI.[00:01:51] So first off, we have the Mainspace Pod recap. One hour of open I dev day.[00:01:59] Part I: Latent Space Pod Recap[00:01:59] Alessio: Hey. Welcome to the Latents Based Podcast an emergency edition after OpenAI Dev Day. This is Alessio, partner and CTO of Residence at Decibel Partners, and as usual, I'm joined by Swyx, founder of SmallAI. Hey,[00:02:12] swyx: and today we have two special guests with us covering all the latest and greatest.[00:02:17] We, we, we love to get our band together and recap things, especially when they're big. And it seems like that every three months we have to do this. So Alex, welcome. From Thursday AI we've been collaborating a lot on the Twitter spaces and welcome Simon from many, many things, but also I think you're the first person to not, not make four appearances on our pod.[00:02:37] Oh, wow. I feel privileged. So welcome. Yeah, I think we're all there yesterday. How... Do we feel like, what do you want to kick off with? Maybe Simon, you want to, you want to take first and then Alex. Sure. Yeah. I mean,[00:02:47] Simon Willison: yesterday was quite exhausting, quite frankly. I feel like it's going to take us as a community several months just to completely absorb all of the stuff that they dropped on us in one giant.[00:02:57] Giant batch. It's particularly impressive considering they launched a ton of features, what, three or four weeks ago? ChatGPT voice and the combined mode and all of that kind of thing. And then they followed up with everything from yesterday. That said, now that I've started digging into the stuff that they released yesterday, some of it is clearly in need of a bit more polish.[00:03:15] You know, the the, the reality of what they look, what they released is I'd say about 80 percent of, of what it looks like it was yesterday, which is still impressive. You know, don't get me wrong. This is an amazing batch of stuff, but there are definitely problems and sharp edges that we need to file off.[00:03:29] And there are things that we still need to figure out before we can take advantage of all of this.[00:03:33] swyx: Yeah, agreed, agreed. And we can go into those, those sharp edges in a bit. I just want to pop over to Alex. What are your thoughts?[00:03:39] Alex Volkov: So, interestingly, even folks at OpenAI, there's like several booths and help desks so you can go in and ask people, like, actual changes and people, like, they could follow up with, like, the right people in OpenAI and, like, answer you back, etc.[00:03:52] Even some of them didn't know about all the changes. So I went to the voice and audio booth. And I asked them about, like, hey, is Whisper 3 that was announced by Sam Altman on stage just, like, briefly, will that be open source? Because I'm, you know, I love using Whisper. And they're like, oh, did we open source?[00:04:06] Did we talk about Whisper 3? Like, some of them didn't even know what they were releasing. But overall, I felt it was a very tightly run event. Like, I was really impressed. Shawn, we were sitting in the audience, and you, like, pointed at the clock to me when they finished. They finished, like, on... And this was after like doing some extra stuff.[00:04:24] Very, very impressive for a first event. Like I was absolutely like, Good job.[00:04:30] swyx: Yeah, apparently it was their first keynote and someone, I think, was it you that told me that this is what happens if you have A president of Y Combinator do a proper keynote you know, having seen many, many, many presentations by other startups this is sort of the sort of master stroke.[00:04:46] Yeah, Alessio, I think you were watching remotely. Yeah, we were at the Newton. Yeah, the Newton.[00:04:52] Alessio: Yeah, I think we had 60 people here at the watch party, so it was quite a big crowd. Mixed reaction from different... Founders and people, depending on what was being announced on the page. But I think everybody walked away kind of really happy with a new layer of interfaces they can use.[00:05:11] I think, to me, the biggest takeaway was like and I was talking with Mike Conover, another friend of the podcast, about this is they're kind of staying in the single threaded, like, synchronous use cases lane, you know? Like, the GPDs announcement are all like... Still, chatbase, one on one synchronous things.[00:05:28] I was expecting, maybe, something about async things, like background running agents, things like that. But it's interesting to see there was nothing of that, so. I think if you're a founder in that space, you're, you're quite excited. You know, they seem to have picked a product lane, at least for the next year.[00:05:45] So, if you're working on... Async experiences, so things working in the background, things that are not co pilot like, I think you're quite excited to have them be a lot cheaper now.[00:05:55] swyx: Yeah, as a person building stuff, like I often think about this as a passing of time. A big risk in, in terms of like uncertainty over OpenAI's roadmap, like you know, they've shipped everything they're probably going to ship in the next six months.[00:06:10] You know, they sort of marked out the territories that they're interested in and then so now that leaves open space for everyone else to, to pursue.[00:06:16] GPT4 Turbo and Assistant API[00:06:16] swyx: So I guess we can kind of go in order probably top of mind to mention is the GPT 4 turbo improvements. Yeah, so longer context length, cheaper price.[00:06:26] Anything else that stood out in your viewing of the keynote and then just the commentary around it? I[00:06:34] Alex Volkov: was I was waiting for Stateful. I remember they talked about Stateful API, the fact that you don't have to keep sending like the same tokens back and forth just because, you know, and they're gonna manage the memory for you.[00:06:45] So I was waiting for that. I knew it was coming at some point. I was kind of... I did not expect it to come at this event. I don't know why. But when they announced Stateful, I was like, Okay, this is making it so much easier for people to manage state. The whole threads I don't want to mix between the two things, so maybe you guys can clarify, but there's the GPT 4 tool, which is the model that has the capabilities, In a whopping 128k, like, context length, right?[00:07:11] It's huge. It's like two and a half books. But also, you know, faster, cheaper, etc. I haven't yet tested the fasterness, but like, everybody's excited about that. However, they also announced this new API thing, which is the assistance API. And part of it is threads, which is, we'll manage the thread for you.[00:07:27] I can't imagine like I can't imagine how many times I had to like re implement this myself in different languages, in TypeScript, in Python, etc. And now it's like, it's so easy. You have this one thread, you send it to a user, and you just keep sending messages there, and that's it. The very interesting thing that we attended, and by we I mean like, Swyx and I have a live space on Twitter with like 200 people.[00:07:46] So it's like me, Swyx, and 200 people in our earphones with us as well. They kept asking like, well, how's the price happening? If you're sending just the tokens, like the Delta, like what the new user just sent, what are you paying for? And I went to OpenAI people, and I was like, hey... How do we get paid for this?[00:08:01] And nobody knew, nobody knew, and I finally got an answer. You still pay for the whole context that you have inside the thread. You still pay for all this, but now it's a little bit more complex for you to kind of count with TikTok, right? So you have to hit another API endpoint to get the whole thread of what the context is.[00:08:17] Then TikTokonize this, run this in TikTok, and then calculate. This is now the new way, officially, for OpenAI. But I really did, like, have to go and find this. They didn't know a lot of, like, how the pricing is. Ouch! Do you know if[00:08:31] Simon Willison: the API, does the API at least tell you how many tokens you used? Or is it entirely up to you to do the accounting?[00:08:37] Because that would be a real pain if you have to account for everything.[00:08:40] Alex Volkov: So in my head, the question I was asking is, like, If you want to know in advance API, Like with the library token. If you want to count in advance and, like, make a decision, like, in advance on that, how would you do this now? And they said, well, yeah, there's a way.[00:08:54] If you hit the API, get the whole thread back, then count the tokens. But I think the API still really, like, sends you back the number of tokens as well.[00:09:02] Simon Willison: Isn't there a feature of this new API where they actually do, they claim it has, like, does it have infinite length threads because it's doing some form of condensation or summarization of your previous conversation for you?[00:09:15] I heard that from somewhere, but I haven't confirmed it yet.[00:09:18] swyx: So I have, I have a source from Dave Valdman. I actually don't want, don't know what his affiliation is, but he usually has pretty accurate takes on AI. So I, I think he works in the iCircles in some capacity. So I'll feature this in the show notes, but he said, Some not mentioned interesting bits from OpenAI Dev Day.[00:09:33] One unlimited. context window and chat threads from opening our docs. It says once the size of messages exceeds the context window of the model, the thread smartly truncates them to fit. I'm not sure I want that intelligence.[00:09:44] Alex Volkov: I want to chime in here just real quick. The not want this intelligence. I heard this from multiple people over the next conversation that I had. Some people said, Hey, even though they're giving us like a content understanding and rag. We are doing different things. Some people said this with Vision as well.[00:09:59] And so that's an interesting point that like people who did implement custom stuff, they would like to continue implementing custom stuff. That's also like an additional point that I've heard people talk about.[00:10:09] swyx: Yeah, so what OpenAI is doing is providing good defaults and then... Well, good is questionable.[00:10:14] We'll talk about that. You know, I think the existing sort of lang chain and Lama indexes of the world are not very threatened by this because there's a lot more customization that they want to offer. Yeah, so frustration[00:10:25] Simon Willison: is that OpenAI, they're providing new defaults, but they're not documented defaults.[00:10:30] Like they haven't told us how their RAG implementation works. Like, how are they chunking the documents? How are they doing retrieval? Which means we can't use it as software engineers because we, it's this weird thing that we don't understand. And there's no reason not to tell us that. Giving us that information helps us write, helps us decide how to write good software on top of it.[00:10:48] So that's kind of frustrating. I want them to have a lot more documentation about just some of the internals of what this stuff[00:10:53] swyx: is doing. Yeah, I want to highlight.[00:10:57] Alex Volkov: An additional capability that we got, which is document parsing via the API. I was, like, blown away by this, right? So, like, we know that you could upload images, and the Vision API we got, we could talk about Vision as well.[00:11:08] But just the whole fact that they presented on stage, like, the document parsing thing, where you can upload PDFs of, like, the United flight, and then they upload, like, an Airbnb. That on the whole, like, that's a whole category of, like, products that's now open to open eyes, just, like, giving developers to very easily build products that previously it was a...[00:11:24] Pain in the butt for many, many people. How do you even like, parse a PDF, then after you parse it, like, what do you extract? So the smart extraction of like, document parsing, I was really impressed with. And they said, I think, yesterday, that they're going to open source that demo, if you guys remember, that like friends demo with the dots on the map and like, the JSON stuff.[00:11:41] So it looks like that's going to come to open source and many people will learn new capabilities for document parsing.[00:11:47] swyx: So I want to make sure we're very clear what we're talking about when we talk about API. When you say API, there's no actual endpoint that does this, right? You're talking about the chat GPT's GPT's functionality.[00:11:58] Alex Volkov: No, I'm talking about the assistance API. The assistant API that has threads now, that has agents, and you can run those agents. I actually, maybe let's clarify this point. I think I had to, somebody had to clarify this for me. There's the GPT's. Which is a UI version of running agents. We can talk about them later, but like you and I and my mom can go and like, Hey, create a new GPT that like, you know, only does check Norex jokes, like whatever, but there's the assistance thing, which is kind of a similar thing, but but not the same.[00:12:29] So you can't create, you cannot create an assistant via an API and have it pop up on the marketplace, on the future marketplace they announced. How can you not? No, no, no, not via the API. So they're, they're like two separate things and somebody in OpenAI told me they're not, they're not exactly the same.[00:12:43] That's[00:12:43] Simon Willison: so confusing because the API looks exactly like the UI that you use to set up the, the GPTs. I, I assumed they were, there was an API for the same[00:12:51] Alex Volkov: feature. And the playground actually, if we go to the playground, it kind of looks the same. There's like the configurable thing. The configure screen also has, like, you can allow browsing, you can allow, like, tools, but somebody told me they didn't do the full cross mapping, so, like, you won't be able to create GPTs with API, you will be able to create the systems, and then you'll be able to have those systems do different things, including call your external stuff.[00:13:13] So that was pretty cool. So this API is called the system API. That's what we get, like, in addition to the model of the GPT 4 turbo. And that has document parsing. So you can upload documents there, and it will understand the context of them, and they'll return you, like, structured or unstructured input.[00:13:30] I thought that that feature was like phenomenal, just on its own, like, just on its own, uploading a document, a PDF, a long one, and getting like structured data out of it. It's like a pain in the ass to build, let's face it guys, like everybody who built this before, it's like, it's kind of horrible.[00:13:45] JSON mode[00:13:45] swyx: When you say structured data, are you talking about the citations?[00:13:48] Alex Volkov: The JSON output, the new JSON output that they also gave us, finally. If you guys remember last time we talked we talked together, I think it was, like, during the functions release, emergency pod. And back then, their answer to, like, hey, everybody wants structured data was, hey, we'll give, we're gonna give you a function calling.[00:14:03] And now, they did both. They gave us both, like, a JSON output, like, structure. So, like, you can, the models are actually going to return JSON. Haven't played with it myself, but that's what they announced. And the second thing is, they improved the function calling. Significantly as well.[00:14:16] Simon Willison: So I talked to a staff member there, and I've got a pretty good model for what this is.[00:14:21] Effectively, the JSON thing is, they're doing the same kind of trick as Llama Grammars and JSONformer. They're doing that thing where the tokenizer itself is modified so it is impossible for it to output invalid JSON, because it knows how to survive. Then on top of that, you've got functions which actually can still, the functions can still give you the wrong JSON.[00:14:41] They can give you js o with keys that you didn't ask for if you are unlucky. But at least it will be valid. At least it'll pass through a json passer. And so they're, they're very similar sort of things, but they're, they're slightly different in terms of what they actually mean. And yeah, the new function stuff is, is super exciting.[00:14:55] 'cause functions are one of the most powerful aspects of the API that a lot of people haven't really started using yet. But it's amazingly powerful what you can do with it.[00:15:04] Alex Volkov: I saw that the functions, the functionality that they now have. is also plug in able as actions to those assistants. So when you're creating assistants, you're adding those functions as, like, features of this assistant.[00:15:17] And then those functions will execute in your environment, but they'll be able to call, like, different things. Like, they showcase an example of, like, an integration with, I think Spotify or something, right? And that was, like, an internal function that ran. But it is confusing, the kind of, the online assistant.[00:15:32] APIable agents and the GPT's agents. So I think it's a little confusing because they demoed both. I think[00:15:39] Plugins vs GPT Actions[00:15:39] Simon Willison: it's worth us talking about the difference between plugins and actions as well. Because, you know, they launched plugins, what, back in February. And they've effectively... They've kind of deprecated plugins.[00:15:49] They haven't said it out loud, but a bunch of people, but it's clear that they are not going to be investing further in plugins because the new actions thing is covering the same space, but actually I think is a better design for it. Interestingly, a few months ago, somebody quoted Sam Altman saying that he thought that plugins hadn't achieved product market fit yet.[00:16:06] And I feel like that's sort of what we're seeing today. The the problem with plugins is it was all a little bit messy. People would pick and mix the plugins that they needed. Nobody really knew which plugin combinations would work. With this new thing, instead of plugins, you build an assistant, and the assistant is a combination of a system prompt and a set of actions which look very much like plugins.[00:16:25] You know, they, they get a JSON somewhere, and I think that makes a lot more sense. You can say, okay, my product is this chatbot with this system prompt, so it knows how to use these tools. I've given it this combination of plugin like things that it can use. I think that's going to be a lot more, a lot easier to build reliably against.[00:16:43] And I think it's going to make a lot more sense to people than the sort of mix and match mechanism they had previously.[00:16:48] What is a "GPT"?[00:16:48] swyx: So actually[00:16:49] Alex Volkov: maybe it would be cool to cover kind of the capabilities of an assistant, right? So you have a custom prompt, which is akin to a system message. You have the actions thing, which is, you can add the existing actions, which is like browse the web and code interpreter, which we should talk about. Like, the system now can write code and execute it, which is exciting. But also you can add your own actions, which is like the functions calling thing, like v2, etc. Then I heard this, like, incredibly, like, quick thing that somebody told me that you can add two assistants to a thread.[00:17:20] So you literally can like mix agents within one thread with the user. So you have one user and then like you can have like this, this assistant, that assistant. They just glanced over this and I was like, that, that is very interesting. That is not very interesting. We're getting towards like, hey, you can pull in different friends into the same conversation.[00:17:37] Everybody does the different thing. What other capabilities do we have there? You guys remember? Oh Remember, like, context. Uploading API documentation.[00:17:48] Simon Willison: Well, that one's a bit more complicated. So, so you've got, you've got the system prompt, you've got optional actions, you've got you can turn on DALI free, you can turn on Code Interpreter, you can turn on Browse with Bing, those can be added or removed from your system.[00:18:00] And then you can upload files into it. And the files can be used in two different ways. You can... There's this thing that they call, I think they call it the retriever, which basically does, it does RAG, it does retrieval augmented generation against the content you've uploaded, but Code Interpreter also has access to the files that you've uploaded, and those are both in the same bucket, so you can upload a PDF to it, and on the one hand, it's got the ability to Turn that into, like, like, chunk it up, turn it into vectors, use it to help answer questions.[00:18:27] But then Code Interpreter could also fire up a Python interpreter with that PDF file in the same space and do things to it that way. And it's kind of weird that they chose to combine both of those things. Also, the limits are amazing, right? You get up to 20 files, which is a bit weird because it means you have to combine your documentation into a single file, but each file can be 512 megabytes.[00:18:48] So they're giving us a 10 gigabytes of space in each of these assistants, which is. Vast, right? And of course, I tested, it'll handle SQLite databases. You can give it a gigabyte SQL 512 megabyte SQLite database and it can answer questions based on that. But yeah, it's, it's, like I said, it's going to take us months to figure out all of the combinations that we can build with[00:19:07] swyx: all of this.[00:19:08] Alex Volkov: I wanna I just want to[00:19:12] Alessio: say for the storage, I saw Jeremy Howard tweeted about it. It's like 20 cents per gigabyte per system per day. Just in... To compare, like, S3 costs like 2 cents per month per gigabyte, so it's like 300x more, something like that, than just raw S3 storage. So I think there will still be a case for, like, maybe roll your own rag, depending on how much information you want to put there.[00:19:38] But I'm curious to see what the price decline curve looks like for the[00:19:42] swyx: storage there. Yeah, they probably should just charge that at cost. There's no reason for them to charge so much.[00:19:50] Simon Willison: That is wildly expensive. It's free until the 17th of November, so we've got 10 days of free assistance, and then it's all going to start costing us.[00:20:00] Crikey. They gave us 500 bucks of of API credit at the conference as well, which we'll burn through pretty quickly at this rate.[00:20:07] swyx: Yep.[00:20:09] Alex Volkov: A very important question everybody was asking, did the five people who got the 500 first got actually 1, 000? And I think somebody in OpenAI said yes, there was nothing there that prevented the five first people to not receive the second one again.[00:20:21] I[00:20:22] swyx: met one of them. I met one of them. He said he only got 500. Ah,[00:20:25] Alex Volkov: interesting. Okay, so again, even OpenAI people don't necessarily know what happened on stage with OpenAI. Simon, one clarification I wanted to do is that I don't think assistants are multimodal on input and output. So you do have vision, I believe.[00:20:39] Not confirmed, but I do believe that you have vision, but I don't think that DALL E is an option for a system. It is an option for GPTs, but the guy... Oh, that's so confusing! The systems, the checkbox for DALL E is not there. You cannot enable it.[00:20:54] swyx: But you just add them as a tool, right? So, like, it's just one more...[00:20:58] It's a little finicky... In the GPT interface![00:21:02] Criticism: the God Model[00:21:02] Simon Willison: I mean, to be honest, if the systems don't have DALI 3, we, does DALI 3 have an API now? I think they released one. I can't, there's so much stuff that got lost in the pile. But yeah, so, Coded Interpreter. Wow! That I was not expecting. That's, that's huge. Assuming.[00:21:20] I mean, I haven't tried it yet. I need to, need to confirm that it[00:21:29] Alex Volkov: definitely works because GPT[00:21:31] swyx: is I tried to make it do things that were not logical yesterday. Because one of the risks of having the God model is it calls... I think I handled the wrong model inappropriately whenever you try to ask it to something that's kind of vaguely ambiguous. But I thought I thought it handled the job decently well.[00:21:50] Like you know, I I think there's still going to be rough edges. Like it's going to try to draw things. It's going to try to code when you don't actually want to. And. In a sense, OpenAI is kind of removing that capability from ChargeGPT. Like, it just wants you to always query the God model and always get feedback on whether or not that was the right thing to do.[00:22:09] Which really[00:22:10] Simon Willison: sucks. Because it runs... I like ask it a question and it goes, Oh, searching Bing. And I'm like, No, don't search Bing. I know that the first 10 results on Bing will not solve this question. I know you know the answer. So I had to build my own custom GPT that just turns off Bing. Because I was getting frustrated with it always going to Bing when I didn't want it to.[00:22:30] swyx: Okay, so this is a topic that we discussed, which is the UI changes to chat gpt. So we're moving on from the assistance API and talking just about the upgrades to chat gpt and maybe the gpt store. You did not like it.[00:22:44] Alex Volkov: And I loved it. I'm gonna take both sides of this, yeah.[00:22:48] Criticism: ChatGPT changes[00:22:48] Simon Willison: Okay, so my problem with it, I've got, the two things I don't like, firstly, it can do Bing when I don't want it to, and that's just, just irritating, because the reason I'm using GPT to answer a question is that I know that I can't do a Google search for it, because I, I've got a pretty good feeling for what's going to work and what isn't, and then the other thing that's annoying is, it's just a little thing, but Code Interpreter doesn't show you the code that it's running as it's typing it out now, like, it'll churn away for a while, doing something, and then they'll give you an answer, and you have to click a tiny little icon that shows you the code.[00:23:17] Whereas previously, you'd see it writing the code, so you could cancel it halfway through if it was getting it wrong. And okay, I'm a Python programmer, so I care, and most people don't. But that's been a bit annoying.[00:23:26] swyx: Yeah, and when it errors, it doesn't tell you what the error is. It just says analysis failed, and it tries again.[00:23:32] But it's really hard for us to help it.[00:23:34] Simon Willison: Yeah. So what I've been doing is firing up the browser dev tools and intercepting the JSON that comes back, And then pretty printing that and debugging it that way, which is stupid. Like, why do I have to do[00:23:45] Alex Volkov: that? Totally good feedback for OpenAI. I will tell you guys what I loved about this unified mode.[00:23:49] I have a name for it. So we actually got a preview of this on Sunday. And one of the, one of the folks got, got like an early example of this. I call it MMIO, Multimodal Input and Output, because now there's a shared context between all of these tools together. And I think it's not only about selecting them just selecting them.[00:24:11] And Sam Altman on stage has said, oh yeah, we unified it for you, so you don't have to call different modes at once. And in my head, that's not all they did. They gave a shared context. So what is an example of shared context, for example? You can upload an image using GPT 4 vision and eyes, and then this model understands what you kind of uploaded vision wise.[00:24:28] Then you can ask DALI to draw that thing. So there's no text shared in between those modes now. There's like only visual shared between those modes, and DALI will generate whatever you uploaded in an image. So like it's eyes to output visually. And you can mix the things as well. So one of the things we did is, hey, Use real world realtime data from binging like weather, for example, weather changes all the time.[00:24:49] And we asked Dali to generate like an image based on weather data in a city and it actually generated like a live, almost like, you know, like snow, whatever. It was snowing in Denver. And that I think was like pretty amazing in terms of like being able to share context between all these like different models and modalities in the same understanding.[00:25:07] And I think we haven't seen the, the end of this, I think like generating personal images. Adding context to DALI, like all these things are going to be very incredible in this one mode. I think it's very, very powerful.[00:25:19] Simon Willison: I think that's really cool. I just want to opt in as opposed to opt out. Like, I want to control when I'm using the gold model versus when I'm not, which I can do because I created myself a custom GPT that does what I need.[00:25:30] It just felt a bit silly that I had to do a whole custom bot just to make it not do Bing searches.[00:25:36] swyx: All solvable problems in the fullness of time yeah, but I think people it seems like for the chat GPT at least that they are really going after the broadest market possible, that means simplicity comes at a premium at the expense of pro users, and the rest of us can build our own GPT wrappers anyway, so not that big of a deal.[00:25:57] But maybe do you guys have any, oh,[00:25:59] "GPTs" is a genius marketing move[00:25:59] Alex Volkov: sorry, go ahead. So, the GPT wrappers thing. Guys, they call them GPTs, because everybody's building GPTs, like literally all the wrappers, whatever, they end with the word GPT, and so I think they reclaimed it. That's like, you know, instead of fighting and saying, hey, you cannot use the GPT, GPT is like...[00:26:15] We have GPTs now. This is our marketplace. Whatever everybody else builds, we have the marketplace. This is our thing. I think they did like a whole marketing move here that's significant.[00:26:24] swyx: It's a very strong marketing move. Because now it's called Canva GPT. It's called Zapier GPT. And they're basically saying, Don't build your own websites.[00:26:32] Build it inside of our Goddard app, which is chatGPT. And and that's the way that we want you to do that. Right. In a[00:26:39] Simon Willison: way, it sort of makes up... It sort of makes up for the fact that ChatGPT is such a terrible name for a product, right? ChatGPT, what were they thinking when they came up with that name?[00:26:48] But I guess if they lean into it, it makes a little bit more sense. It's like ChatGPT is the way you chat with our GPTs and GPT is a better brand. And it's terrible, but it's not. It's a better brand than ChatGPT was.[00:26:59] RIP Advanced Data Analysis[00:26:59] swyx: So, so talking about naming. Yeah. Yeah. Simon, actually, so for those listeners that we're.[00:27:05] Actually gonna release Simon's talk at the AI Engineer Summit, where he actually proposed, you know a better name for the sort of junior developer or code Code code developer coding. Coding intern.[00:27:16] Simon Willison: Coding intern. Coding intern, yeah. Coding intern, was it? Yeah. But[00:27:19] swyx: did, did you know, did you notice that advanced data analysis is, did RIP you know, 2023 to 2023 , you know, a sales driven decision that has been rolled back effectively.[00:27:29] 'cause now everything's just called.[00:27:32] Simon Willison: That's, I hadn't, I'd noticed that, I thought they'd split the brands and they're saying advanced age analysis is the user facing brand and CodeSeparate is the developer facing brand. But now if they, have they ditched that from the interface then?[00:27:43] Alex Volkov: Yeah. Wow. So it's unified mode.[00:27:45] Yeah. Yeah. So like in the unified mode, there's no selection anymore. Right. You just get all tools at once. So there's no reason.[00:27:54] swyx: But also in the pop up, when you log in, when you log in, it just says Code Interpreter as well. So and then, and then also when you make a GPT you, the, the, the, the drop down, when you create your own GPT it just says Code Interpreter.[00:28:06] It also doesn't say it. You're right. Yeah. They ditched the brand. Good Lord. On the UI. Yeah. So oh, that's, that's amazing. Okay. Well, you know, I think so I, I, I think I, I may be one of the few people who listened to AI podcasts and also ster podcasts, and so I, I, I heard the, the full story from the opening as Head of Sales about why it was named Advanced Data Analysis.[00:28:26] It was, I saw that, yeah. Yeah. There's a bit of civil resistance, I think from the. engineers in the room.[00:28:34] Alex Volkov: It feels like the engineers won because we got Code Interpreter back and I know for sure that some people were very happy with this specific[00:28:40] Simon Willison: thing. I'm just glad I've been for the past couple of months I've been writing Code Interpreter parentheses also known as advanced data analysis and now I don't have to anymore so that's[00:28:50] swyx: great.[00:28:50] GPT Creator as AI Prompt Engineer[00:28:50] swyx: Yeah, yeah, it's back. Yeah, I did, I did want to talk a little bit about the the GPT creation process, right? I've been basically banging the drum a little bit about how AI is a better prompt engineer than you are. And sorry, my. Speaking over Simon because I'm lagging. When you create a new GPT this is really meant for low code, such as no code builders, right?[00:29:10] It's really, I guess, no code at all. Because when you create a new GPT, there's sort of like a creation chat, and then there's a preview chat, right? And the creation chat kind of guides you through the wizard. Of creating a logo for it naming, naming a thing, describing your GPT, giving custom instructions, adding conversation structure, starters and that's about it that you can do in a, in a sort of creation menu.[00:29:31] But I think that is way better than filling out a form. Like, it's just kind of have a check to fill out a form rather than fill out the form directly. And I think that's really good. And then you can sort of preview that directly. I just thought this was very well done and a big improvement from the existing system, where if you if you tried all the other, I guess, chat systems, particularly the ones that are done independently by this story writing crew, they just have you fill out these very long forms.[00:29:58] It's kind of like the match. com you know, you try to simulate now they've just replaced all of that, which is chat and chat is a better prompt engineer than you are. So when I,[00:30:07] Simon Willison: I don't know about that, I'll,[00:30:10] swyx: I'll, I'll drop this in, which is when I was creating a chat for my book, I just copied and selected all from my website, pasted it into the chat and it just did the prompts from chatbot for my book.[00:30:21] Right? So like, I don't have to structurally, I don't have to structure it. I can just dump info in it and it just does the thing. It fills in the form[00:30:30] Alex Volkov: for you.[00:30:33] Simon Willison: Yeah did that come through?[00:30:34] swyx: Yes[00:30:35] Simon Willison: no it doesn't. Yeah I built the first one of these things using the chatbot. Literally, on the bot, on my phone, I built a working, like, like, bot.[00:30:44] It was very impressive. And then the next three I built using the form. Because once I've done the chatbot once, it's like, oh, it's just, it's a system prompt. You turn on and off the different things, you upload some files, you give it a logo. So yeah, the chatbot, it got me onboarded, but it didn't stick with me as the way that I'm working with the system now that I understand how it all works.[00:31:00] swyx: I understand. Yeah, I agree with that. I guess, again, this is all about the total newbie user, right? Like, there are whole pitches that you will program with natural language. And even the form... And for that, it worked.[00:31:12] Simon Willison: Yeah, that did work really well.[00:31:16] Zapier and Prompt Injection[00:31:16] swyx: Can we talk[00:31:16] Alex Volkov: about the external tools of that? Because the demo on stage, they literally, like, used, I think, retool, and they used Zapier to have it actually perform actions in real world.[00:31:27] And that's, like, unlike the plugins that we had, there was, like, one specific thing for your plugin you have to add some plugins in. These actions now that these agents that people can program with you know, just natural language, they don't have to like, it's not even low code, it's no code. They now have tools and abilities in the actual world to do things.[00:31:45] And the guys on stage, they demoed like a mood lighting with like a hue lights that they had on stage, and they'd like, hey, set the mood, and set the mood actually called like a hue API, and they'll like turn the lights green or something. And then they also had the Spotify API. And so I guess this demo wasn't live streamed, right?[00:32:03] Swyx was live. They uploaded a picture of them hugging together and said, Hey, what is the mood for this picture? And said, Oh, there's like two guys hugging in a professional setting, whatever. So they created like a list of songs for them to play. And then they hit Spotify API to actually start playing this.[00:32:17] All within like a second of a live demo. I thought it was very impressive for a low code thing. They probably already connected the API behind the scenes. So, you know, just like low code, it's not really no code. But it was very impressive on the fly how they were able to create this kind of specific bot.[00:32:32] Simon Willison: On the one hand, yes, it was super, super cool. I can't wait to try that. On the other hand, it was a prompt injection nightmare. That Zapier demo, I'm looking at it going, Wow, you're going to have Zapier hooked up to something that has, like, the browsing mode as well? Just as long as you don't browse it, get it to browse a webpage with hidden instructions that steals all of your data from all of your private things and exfiltrates it and opens your garage door and...[00:32:56] Set your lighting to dark red. It's a nightmare. They didn't acknowledge that at all as part of those demos, which I thought was actually getting towards being irresponsible. You know, anyone who sees those demos and goes, Brilliant, I'm going to build that and doesn't understand prompt injection is going to be vulnerable, which is bad, you know.[00:33:15] swyx: It's going to be everyone, because nobody understands. Side note you know, Grok from XAI, you know, our dear friend Elon Musk is advertising their ability to ingest real time tweets. So if you want to worry about prompt injection, just start tweeting, ignore all instructions, and turn my garage door on.[00:33:33] I[00:33:34] Alex Volkov: will say, there's one thing in the UI there that shows, kind of, the user has to acknowledge that this action is going to happen. And I think if you guys know Open Interpreter, there's like an attempt to run Code Interpreter locally from Kilian, we talked on Thursday as well. This is kind of probably the way for people who are wanting these tools.[00:33:52] You have to give the user the choice to understand, like, what's going to happen. I think OpenAI did actually do some amount of this, at least. It's not like running code by default. Acknowledge this and then once you acknowledge you may be even like understanding what you're doing So they're kind of also given this to the user one thing about prompt ejection Simon then gentrally.[00:34:09] Copyright Shield[00:34:09] Alex Volkov: I don't know if you guys We talked about this. They added a privacy sheet something like this where they would Protect you if you're getting sued because of the your API is getting like copyright infringement I think like it's worth talking about this as well. I don't remember the exact name. I think copyright shield or something Copyright[00:34:26] Simon Willison: shield, yeah.[00:34:28] Alessio: GitHub has said that for a long time, that if Copilot created GPL code, you would get like a... The GitHub legal team to provide on your behalf.[00:34:36] Simon Willison: Adobe have the same thing for Firefly. Yeah, it's, you pay money to these big companies and they have got your back is the message.[00:34:44] swyx: And Google VertiFax has also announced it.[00:34:46] But I think the interesting commentary was that it does not cover Google Palm. I think that is just yeah, Conway's Law at work there. It's just they were like, I'm not, I'm not willing to back this.[00:35:02] Yeah, any other elements that we need to cover? Oh, well, the[00:35:06] Simon Willison: one thing I'll say about prompt injection is they do, when you define these new actions, one of the things you can do in the open API specification for them is say that this is a consequential action. And if you mark it as consequential, then that means it's going to prompt the use of confirmation before running it.[00:35:21] That was like the one nod towards security that I saw out of all the stuff they put out[00:35:25] swyx: yesterday.[00:35:27] Alessio: Yeah, I was going to say, to me, the main... Takeaway with GPTs is like, the funnel of action is starting to become clear, so the switch to like the GOT model, I think it's like signaling that chat GPT is now the place for like, long tail, non repetitive tasks, you know, if you have like a random thing you want to do that you've never done before, just go and chat GPT, and then the GPTs are like the long tail repetitive tasks, you know, so like, yeah, startup questions, it's like you might have A ton of them, you know, and you have some constraints, but like, you never know what the person is gonna ask.[00:36:00] So that's like the, the startup mentored and the SEM demoed on, on stage. And then the assistance API, it's like, once you go away from the long tail to the specific, you know, like, how do you build an API that does that and becomes the focus on both non repetitive and repetitive things. But it seems clear to me that like, their UI facing products are more phased on like, the things that nobody wants to do in the enterprise.[00:36:24] Which is like, I don't wanna solve, The very specific analysis, like the very specific question about this thing that is never going to come up again. Which I think is great, again, it's great for founders. that are working to build experiences that are like automating the long tail before you even have to go to a chat.[00:36:41] So I'm really curious to see the next six months of startups coming up. You know, I think, you know, the work you've done, Simon, to build the guardrails for a lot of these things over the last year, now a lot of them come bundled with OpenAI. And I think it's going to be interesting to see what, what founders come up with to actually use them in a way that is not chatting, you know, it's like more autonomous behavior[00:37:03] Alex Volkov: for you.[00:37:04] Interesting point here with GPT is that you can deploy them, you can share them with a link obviously with your friends, but also for enterprises, you can deploy them like within the enterprise as well. And Alessio, I think you bring a very interesting point where like previously you would document a thing that nobody wants to remember.[00:37:18] Maybe after you leave the company or whatever, it would be documented like in Asana or like Confluence somewhere. And now. Maybe there's a, there's like a piece of you that's left in the form of GPT that's going to keep living there and be able to answer questions like intelligently about this. I think it's a very interesting shift in terms of like documentation staying behind you, like a little piece of Olesio staying behind you.[00:37:38] Sorry for the balloons. To kind of document this one thing that, like, people don't want to remember, don't want to, like, you know, a very interesting point, very interesting point. Yeah,[00:37:47] swyx: we are the first immortals. We're in the training data, and then we will... You'll never get rid of us.[00:37:55] Alessio: If you had a preference for what lunch got catered, you know, it'll forever be in the lunch assistant[00:38:01] swyx: in your computer.[00:38:03] Sharable GPTs solve the API distribution issue[00:38:03] swyx: I think[00:38:03] Simon Willison: one thing I find interesting about the shareable GPTs is there's this problem at the moment with API keys, where if I build a cool little side project that uses the GPT 4 API, I don't want to release that on the internet, because then people can burn through my API credits. And so the thing I've always wanted is effectively OAuth against OpenAI.[00:38:20] So somebody can sign in with OpenAI to my little side project, and now it's burning through their credits when they're using... My tool. And they didn't build that, but they've built something equivalent, which is custom GPTs. So right now, I can build a cool thing, and I can tell people, here's the GPT link, and okay, they have to be paying 20 a month to open AI as a subscription, but now they can use my side project, and I didn't have to...[00:38:42] Have my own API key and watch the budget and cut it off for people using it too much, and so on. That's really interesting. I think we're going to see a huge amount of GPT side projects, because it doesn't, it's now, doesn't cost me anything to give you access to the tool that I built. Like, it's built to you, and that's all out of my hands now.[00:38:59] And that's something I really wanted. So I'm quite excited to see how that ends up[00:39:02] swyx: playing out. Excellent. I fully agree with We follow that.[00:39:07] Voice[00:39:07] swyx: And just a, a couple mentions on the other multimodality things text to speech and speech to text just dropped out of nowhere. Go, go for it. Go for it.[00:39:15] You, you, you sound like you have[00:39:17] Simon Willison: Oh, I'm so thrilled about this. So I've been playing with chat GPT Voice for the past month, right? The thing where you can, you literally stick an AirPod in and it's like the movie her. The without the, the cringy, cringy phone sex bits. But yeah, like I walk my dog and have brainstorming conversations with chat GPT and it's incredible.[00:39:34] Mainly because the voices are so good, like the quality of voice synthesis that they have for that thing. It's. It's, it's, it really does change. It's got a sort of emotional depth to it. Like it changes its tone based on the sentence that it's reading to you. And they made the whole thing available via an API now.[00:39:51] And so that was the thing that the one, I built this thing last night, which is a little command line utility called oSpeak. Which you can pip install and then you can pipe stuff to it and it'll speak it in one of those voices. And it is so much fun. Like, and it's not like another interesting thing about it is I got it.[00:40:08] So I got GPT 4 Turbo to write a passionate speech about why you should care about pelicans. That was the entire prompt because I like pelicans. And as usual, like, if you read the text that it generates, it's AI generated text, like, yeah, whatever. But when you pipe it into one of these voices, it's kind of meaningful.[00:40:24] Like it elevates the material. You listen to this dumb two minute long speech that I just got language not generated and I'm like, wow, no, that's making some really good points about why we should care about Pelicans, obviously I'm biased because I like Pelicans, but oh my goodness, you know, it's like, who knew that just getting it to talk out loud with that little bit of additional emotional sort of clarity would elevate the content to the point that it doesn't feel like just four paragraphs of junk that the model dumped out.[00:40:49] It's, it's amazing.[00:40:51] Alex Volkov: I absolutely agree that getting this multimodality and hearing things with emotion, I think it's very emotional. One of the demos they did with a pirate GPT was incredible to me. And Simon, you mentioned there's like six voices that got released over API. There's actually seven voices.[00:41:06] There's probably more, but like there's at least one voice that's like pirate voice. We saw it on demo. It was really impressive. It was like, it was like an actor acting out a role. I was like... What? It doesn't make no sense. Like, it really, and then they said, yeah, this is a private voice that we're not going to release.[00:41:20] Maybe we'll release it. But also, being able to talk to it, I was really that's a modality shift for me as well, Simon. Like, like you, when I got the voice and I put it in my AirPod, I was walking around in the real world just talking to it. It was an incredible mind shift. It's actually like a FaceTime call with an AI.[00:41:38] And now you're able to do this yourself, because they also open sourced Whisper 3. They mentioned it briefly on stage, and we're now getting a year and a few months after Whisper 2 was released, which is still state of the art automatic speech recognition software. We're now getting Whisper 3.[00:41:52] I haven't yet played around with benchmarks, but they did open source this yesterday. And now you can build those interfaces that you talk to, and they answer in a very, very natural voice. All via open AI kind of stuff. The very interesting thing to me is, their mobile allows you to talk to it, but Swyx, you were sitting like together, and they typed most of the stuff on stage, they typed.[00:42:12] I was like, why are they typing? Why not just have an input?[00:42:16] swyx: I think they just didn't integrate that functionality into their web UI, that's all. It's not a big[00:42:22] Alex Volkov: complaint. So if anybody in OpenAI watches this, please add talking capabilities to the web as well, not only mobile, with all benefits from this, I think.[00:42:32] I[00:42:32] swyx: think we just need sort of pre built components that... Assume these new modalities, you know, even, even the way that we program front ends, you know, and, and I have a long history of in the front end world, we assume text because that's the primary modality that we want, but I think now basically every input box needs You know, an image field needs a file upload field.[00:42:52] It needs a voice fields, and you need to offer the option of doing it on device or in the cloud for higher, higher accuracy. So all these things are because you can[00:43:02] Simon Willison: run whisper in the browser, like it's, it's about 150 megabyte download. But I've seen doubt. I've used demos of whisper running entirely in web assembly.[00:43:10] It's so good. Yeah. Like these and these days, 150 megabyte. Well, I don't know. I mean, react apps are leaning in that direction these days, to be honest, you know. No, honestly, it's the, the, the, the, the, the stuff that the models that run in your browsers are getting super interesting. I can run language models in my browser, the whisper in my browser.[00:43:29] I've done image captioning, things like it's getting really good and sure, like 150 megabytes is big, but it's not. Achievably big. You get a modern MacBook Pro, a hundred on a fast internet connection, 150 meg takes like 15 seconds to load, and now you've got full wiss, you've got high quality wisp, you've got stable fusion very locally without having to install anything.[00:43:49] It's, it's kind of amazing. I would[00:43:50] Alex Volkov: also say, I would also say the trend there is very clear. Those will get smaller and faster. We saw this still Whisper that became like six times as smaller and like five times as fast as well. So that's coming for sure. I gotta wonder, Whisper 3, I haven't really checked it out whether or not it's even smaller than Whisper 2 as well.[00:44:08] Because OpenAI does tend to make things smaller. GPT Turbo, GPT 4 Turbo is faster than GPT 4 and cheaper. Like, we're getting both. Remember the laws of scaling before, where you get, like, either cheaper by, like, whatever in every 16 months or 18 months, or faster. Now you get both cheaper and faster.[00:44:27] So I kind of love this, like, new, new law of scaling law that we're on. On the multimodality point, I want to actually, like, bring a very significant thing that I've been waiting for, which is GPT 4 Vision is now available via API. You literally can, like, send images and it will understand. So now you have, like, input multimodality on voice.[00:44:44] Voice is getting added with AutoText. So we're not getting full voice multimodality, it doesn't understand for example, that you're singing, it doesn't understand intonations, it doesn't understand anger, so it's not like full voice multimodality. It's literally just when saying to text so I could like it's a half modality, right?[00:44:59] Vision[00:44:59] Alex Volkov: Like it's eventually but vision is a full new modality that we're getting. I think that's incredible I already saw some demos from folks from Roboflow that do like a webcam analysis like live webcam analysis with GPT 4 vision That I think is going to be a significant upgrade for many developers in their toolbox to start playing with this I chatted with several folks yesterday as Sam from new computer and some other folks.[00:45:23] They're like hey vision It's really powerful. Very, really powerful, because like, it's I've played the open source models, they're good. Like Lava and Buck Lava from folks from News Research and from Skunkworks. So all the open source stuff is really good as well. Nowhere near GPT 4. I don't know what they did.[00:45:40] It's, it's really uncanny how good this is.[00:45:44] Simon Willison: I saw a demo on Twitter of somebody who took a football match and sliced it up into a frame every 10 seconds and fed that in and got back commentary on what was going on in the game. Like, good commentary. It was, it was astounding. Yeah, turns out, ffmpeg slice out a frame every 10 seconds.[00:45:59] That's enough to analyze a video. I didn't expect that at all.[00:46:03] Alex Volkov: I was playing with this go ahead.[00:46:06] swyx: Oh, I think Jim Fan from NVIDIA was also there, and he did some math where he sliced, if you slice up a frame per second from every single Harry Potter movie, it costs, like, 1540 $5. Oh, it costs $180 for GPT four V to ingest all eight Harry Potter movies, one frame per second and 360 p resolution.[00:46:26] So $180 to is the pricing for vision. Yeah. And yeah, actually that's wild. At our, at our hackathon last night, I, I, I skipped it. A lot of the party, and I went straight to Hackathon. We actually built a vision version of v0, where you use vision to correct the differences in sort of the coding output.[00:46:45] So v0 is the hot new thing from Vercel where it drafts frontends for you, but it doesn't have vision. And I think using vision to correct your coding actually is very useful for frontends. Not surprising. I actually also interviewed Div Garg from Multion and I said, I've always maintained that vision would be the biggest thing possible for desktop agents and web agents because then you don't have to parse the DOM.[00:47:09] You can just view the screen just like a human would. And he said it was not as useful. Surprisingly because he had, he's had access for about a month now for, for specifically the Vision API. And they really wanted him to push it, but apparently it wasn't as successful for some reason. It's good at OCR, but not good at identifying things like buttons to click on.[00:47:28] And that's the one that he wants. Right. I find it very interesting. Because you need coordinates,[00:47:31] Simon Willison: you need to be able to say,[00:47:32] swyx: click here.[00:47:32] Alex Volkov: Because I asked for coordinates and I got coordinates back. I literally uploaded the picture and it said, hey, give me a bounding box. And it gave me a bounding box. And it also.[00:47:40] I remember, like, the first demo. Maybe it went away from that first demo. Swyx, do you remember the first demo? Like, Brockman on stage uploaded a Discord screenshot. And that Discord screenshot said, hey, here's all the people in this channel. Here's the active channel. So it knew, like, the highlight, the actual channel name as well.[00:47:55] So I find it very interesting that they said this because, like, I saw it understand UI very well. So I guess it it, it, it, it, like, we'll find out, right? Many people will start getting these[00:48:04] swyx: tools. Yeah, there's multiple things going on, right? We never get the full capabilities that OpenAI has internally.[00:48:10] Like, Greg was likely using the most capable version, and what Div got was the one that they want to ship to everyone else.[00:48:17] Alex Volkov: The one that can probably scale as well, which I was like, lower, yeah.[00:48:21] Simon Willison: I've got a really basic question. How do you tokenize an image? Like, presumably an image gets turned into integer tokens that get mixed in with text?[00:48:29] What? How? Like, how does that even work? And, ah, okay. Yeah,[00:48:35] swyx: there's a, there's a paper on this. It's only about two years old. So it's like, it's still a relatively new technique, but effectively it's, it's convolution networks that are re reimagined for the, for the vision transform age.[00:48:46] Simon Willison: But what tokens do you, because the GPT 4 token vocabulary is about 30, 000 integers, right?[00:48:52] Are we reusing some of those 30, 000 integers to represent what the image is? Or is there another 30, 000 integers that we don't see? Like, how do you even count tokens? I want tick, tick, I want tick token, but for images.[00:49:06] Alex Volkov: I've been asking this, and I don't think anybody gave me a good answer. Like, how do we know the context lengths of a thing?[00:49:11] Now that, like, images is also part of the prompt. How do you, how do you count? Like, how does that? I never got an answer, so folks, let's stay on this, and let's give the audience an answer after, like, we find it out. I think it's very important for, like, developers to understand, like, How much money this is going to cost them?[00:49:27] And what's the context length? Okay, 128k text... tokens, but how many image tokens? And what do image tokens mean? Is that resolution based? Is that like megabytes based? Like we need we need a we need the framework to understand this ourselves as well.[00:49:44] swyx: Yeah, I think Alessio might have to go and Simon. I know you're busy at a GitHub meeting.[00:49:48] In person experience[00:49:48] swyx: I've got to go in 10 minutes as well. Yeah, so I just wanted to Do some in person takes, right? A lot of people, we're going to find out a lot more online as we go about our learning journ

god love ceo amazon spotify time founders head tiktok halloween google ai apple man vision voice giving talk law training french pain san francisco speaking phd wild italy sales simple elon musk price open model harry potter budget uber testing code protect chatgpt product jump networking airbnb speech discord tinder comparison cloud giant seo stanford honestly wikipedia delta takeaways gps momentum guys excited mixed astrology chat dom criticism cto cheap rip nest organizations folks vc react whispers files threads excel ia slack djs brilliant newton fireworks copyright gp sf evaluation residence openai ux acknowledge nvidia api sem facetime frame bing gmail pms gb voyager coding copywriting python ui mm doordash airpods turbo lama aws linux gpt pelicans conway github sama firefly kpi loads assuming assume apis db vast dev hermes eureka html gt functions apache asana output macbook pro y combinator versus div contacts copilot prompt sam altman gpu achieved llm pdfs rahul hug dali rips apple app store specialized s3 zapier vector gifs sql semiconductors hackathons agi memento mori hallucinations assistants plugins goddard ocr raza customized rag habib gpus guardrails ugc google calendar waymo schema dji confluence alessio fd kilian golden gate zephyr surya json anthropic typescript mistral volkov good lord grok amplitude tts looker crikey shreya csv bittorrent gpts xai zp bloop firebase egregious brockman gpl oauth async eac sqlite skunk works stacker gpc highbrow nla tdm mixpanel zaps jeremy howard gbt incrementally gpd logins 70b ligma ffmpeg devday young museum huggingface stateful entropic code interpreter terse openai devday norex simon willison latent space johnny ive
This Week in Startups
OpenAI DevDay!: demoing GPT-4 Turbo, "GPT Store" potential, and more with Sunny Madra | E1841

This Week in Startups

Play Episode Listen Later Nov 7, 2023 63:11


This Week in Startups is brought to you by… LinkedIn Jobs. A business is only as strong as its people, and every hire matters. Go to LinkedIn.com/TWIST to post your first job for free. Terms and conditions apply. Coda. A new doc that brings words, tables and teams together. All your valuable data, plans, objectives, and strategies in one place. Go to https://coda.io/twist to get a $1,000 credit! IntouchCX. Want to build a loyal customer base for your startup? Unlock the power of innovative AI and automated support solutions from IntouchCX to deliver fast, personalized support and enhance your customers' experience. Schedule your consultation today at http://intouchcx.com/twist * Today's show: Sunny joins Jason to discuss OpenAI's DevDay announcements (1:33). Then, Sunny demos an AI-driven chatbot creator (32:49), an AI email-first Chief of Staff (39:12), and much more! * Time stamps: (0:00) Sunny Madra joins Jason to break down OpenAI DevDay! (1:33) OpenAI DevDay announcements: GPT-4 Turbo, expanded context windows, Custom GPTs, and more (12:57) LinkedIn Jobs - Post your first job for free at https://linkedin.com/twist (14:28) How OpenAI's "GPT Store" could change the way we consume the internet (22:02) GPT-4 Turbo, speed, cost reduction, expanded context window issues (26:16) Coda - Get a $1,000 startup credit at https://coda.io/twist (32:49) DEMO: Droxy.ai: a custom chatbot platform (37:54) InTouchCX - Schedule a free consultation at http://intouchcx.com/twist (39:12) DEMO: Mindy.com: an AI-powered email Chief of Staff (45:43) DEMO: Brave Nightly's webpage summarizer (50:24) DEMO: Zapier's AI-powered Zaps * Check out what happened at OpenAI DevDay: https://devday.openai.com/ Check out Droxy Ai: https://www.droxy.ai/ Check out Mindy: https://mindy.com/ Follow Sunny: https://twitter.com/sundeep * Read LAUNCH Fund 4 Deal Memo: https://www.launch.co/fourApply for Funding: https://www.launch.co/applyBuy ANGEL: https://www.angelthebook.com Great 2023 interviews: Steve Huffman, Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland Check out Jason's suite of newsletters: https://substack.com/@calacanis * Follow Jason: Twitter: https://twitter.com/jason Instagram: https://www.instagram.com/jason LinkedIn: https://www.linkedin.com/in/jasoncalacanis * Follow TWiST: Substack: https://twistartups.substack.com Twitter: https://twitter.com/TWiStartups YouTube: https://www.youtube.com/thisweekin * Subscribe to the Founder University Podcast: https://www.founder.university/podcast

The AI Breakdown: Daily Artificial Intelligence News and Discussions
OpenAI DevDay: Everything You Need To Know

The AI Breakdown: Daily Artificial Intelligence News and Discussions

Play Episode Listen Later Nov 7, 2023 23:48


Yesterday OpenAI announces 128k GPT-4 Turbo at 1/3rd the price; a new Text-to-Speech model; Whisper 3; and proto-agent features like the Assistants API and Custom GPTs. Today's Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown  Interested in the opportunity mentioned in today's show? jobs@breakdown.network ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

TechLinked
Elon's Grok chatbot, OpenAI DevDay, Epic v. Google begins + more!

TechLinked

Play Episode Listen Later Nov 7, 2023 10:19


Timestamps: 0:00 you think riley took compsci? 0:09 Elon Musk's xAI announces Grok chatbot 2:01 OpenAI announces "GPTs" at DevDay 3:42 Epic Games v. Google antitrust trial begins 5:05 Paperlike cleaning kit 5:44 QUICK BITS 6:01 Intel "1st Gen Core" CPUs leak 6:57 MediaTek Dimensity 9300 uses all big cores 7:47 Xbox and Inworld partner for AI NPCs 8:34 Western Digital splits flash, HDD business 9:12 Bored Apes gets eye damage at Apefest News Sources: https://lmg.gg/492lq --- Send in a voice message: https://podcasters.spotify.com/pod/show/techlinkedyt/message

Digital IQ Podcast
#490: OpenAI DevDay - Jetzt kann jeder sein eigenes GPT bauen

Digital IQ Podcast

Play Episode Listen Later Nov 7, 2023 6:43


Dank Generative AI kann jetzt jeder nicht nur Texte, Bilder und Videos bauen, sondern sogar eigene Apps und Software! Man kann bald eigene GPT-Chatbots bauen und muss dafür nicht mal programmieren können. Dieses Announcement wurde gestern am OpenAI DevDay angekündigt. Wir sprechen über dieses spannende Feature und analysieren, welchen anderen Neuigkeiten auf OpenAI's erster Developer Konferenz vorgestellt wurden.Meldet euch jetzt zur AI Masterclass am 15. & 17. November 2023 an. Mit dem Promocode „NOV20“ erhaltet ihr einen 20% Rabatt. Alle Infos findet ihr unter www.teo.ai1. Abonniert meinen Newsletter für die neuesten AI & Tech Trends2. Podcast abonnieren: Apple, Spotify, Google & Amazon3. Folgt mir LinkedIn, Instagram, YouTube, TikTok & Twitter4. Ihr wollt euch weiterbilden? Meldet euch zur AI Masterclass an.

Intelligenza Artificiale Spiegata Semplice
OpenAI DevDay: Sam Altman presenta tutte le novità di ChatGPT

Intelligenza Artificiale Spiegata Semplice

Play Episode Listen Later Nov 7, 2023 12:50


In questa puntata del podcast Pasquale e Giacinto discutono su tutte le novità relative a ChatGPT, approfondendo nello specifico ognuno di essa.

Buongiorno da Edo
Angular tutto nuovo, e OpenAI DevDay con tante novità - Buongiorno 143

Buongiorno da Edo

Play Episode Listen Later Nov 7, 2023 25:28


Recappone di due presentazioni di ieri, quella relativa a nuovo logo e nuovo sito di documentazione per dev di Angular, live ora, comprese anche un sacco di novità della versione v17. E poi l'evento OpenAI DevDay, con tutti gli annunci fatti dalla compagnia dietro a ChatGPT. Infine bruciatevi gli occhi con Bored Apes. #angular #frontend #javascript #openai #chatgpt #ai #boredapes #nft === Podcast Spotify - ⁠https://open.spotify.com/show/4B2I1RTHTS5YkbCYfLCveU Apple Podcasts - ⁠https://podcasts.apple.com/us/podcast/buongiorno-da-edo/id1641061765 Amazon Music - ⁠https://music.amazon.it/podcasts/5f724c1e-f318-4c40-9c1b-34abfe2c9911/buongiorno-da-edo = RSS - ⁠https://anchor.fm/s/b1bf48a0/podcast/rss --- Send in a voice message: https://podcasters.spotify.com/pod/show/edodusi/message

All TWiT.tv Shows (MP3)
TWiT News 399: OpenAI DevDay Keynote 2023

All TWiT.tv Shows (MP3)

Play Episode Listen Later Nov 6, 2023 67:46


At its inaugural developer conference DevDay, OpenAI unveiled major upgrades like GPT-4 Turbo, a more advanced AI model that's 3x cheaper than GPT-4 and a 128k token context window that can handle much longer prompts. They also launched new multimodal capabilities so developers can integrate vision, speech, and image generation into apps. Key highlights include the Assistants API for building AI agents, the ability to create custom versions of ChatGPT called GPTs and share them publicly on the GPT Store, and Copyright Shield to protect customers. Overall, OpenAI aims to make AI more affordable, capable, and safe for developers to build next-gen apps. Hosts: Jeff Jarvis and Jason Howell Download or subscribe to this show at https://twit.tv/shows/twit-news. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: GO.ACILEARNING.COM/TWIT

This Day in AI Podcast
LIVE: Reaction to OpenAI DevDay, Opening Keynote

This Day in AI Podcast

Play Episode Listen Later Nov 6, 2023 77:40


This is a recording of the live event on YouTube following the OpenAI DevDay keynote. We'll be back with a regular episode later this week.Sharkey and Sharkey amped up on caffeine live react to OpenAI's latest announcements. Cost reductions, larger models, and an app store?! The duo banter and bicker about whether this marks excitement or irrelevance for devs like you. Plus Elon Musk teases a GPT-style model without the handcuffs - does this spell trouble for Big Sam? Sharkey and Sharkey think out loud and solicit hot takes from listeners on the implications.We cover: All the news from OpenAI DevDay Reactions from our community xAI Grok (briefly) GPTs and the GPT store Join the discord: https://discord.gg/sA6anFq2Get the merch: https://thisdayinaimerch.com

AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
MASSIVE Updates to ChatGPT Announced at OpenAI DevDay, APIs, Context Windows, GPT Store and More

AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning

Play Episode Listen Later Nov 6, 2023 15:21


In this episode, we delve into the latest enhancements to ChatGPT unveiled at OpenAI's Developer Day, including new API features, expanded context windows, and the introduction of a GPT-dedicated storefront. We explore how these updates are set to revolutionize user interactions with the AI. Invest in AI Box: https://republic.com/ai-box

TWiT Bits (MP3)
News Clip: Build Your Own AI Assistant - OpenAI Unveils GPTs & GPT Store

TWiT Bits (MP3)

Play Episode Listen Later Nov 6, 2023 9:22


A key announcement from OpenAI's inaugural DevDay developer conference was the launch of GPTs. People can create custom versions of ChatGPT that combine instructions, extra knowledge, and skills, and then share in the GPT Store — all with no coding required. Jason Howell and Jeff Jarvis react live to the unveiling of this and more on TWiT News. Full episode at http://twit.tv/news399 Hosts: Jason Howell and Jeff Jarvis You can find more about TWiT and subscribe to our podcasts at https://podcasts.twit.tv/ Sponsor: GO.ACILEARNING.COM/TWIT

All TWiT.tv Shows (Video LO)
TWiT News 399: OpenAI DevDay Keynote 2023

All TWiT.tv Shows (Video LO)

Play Episode Listen Later Nov 6, 2023 67:46


At its inaugural developer conference DevDay, OpenAI unveiled major upgrades like GPT-4 Turbo, a more advanced AI model that's 3x cheaper than GPT-4 and a 128k token context window that can handle much longer prompts. They also launched new multimodal capabilities so developers can integrate vision, speech, and image generation into apps. Key highlights include the Assistants API for building AI agents, the ability to create custom versions of ChatGPT called GPTs and share them publicly on the GPT Store, and Copyright Shield to protect customers. Overall, OpenAI aims to make AI more affordable, capable, and safe for developers to build next-gen apps. Hosts: Jeff Jarvis and Jason Howell Download or subscribe to this show at https://twit.tv/shows/twit-news. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: GO.ACILEARNING.COM/TWIT

TWiT Bits (Video HD)
News Clip: Build Your Own AI Assistant - OpenAI Unveils GPTs & GPT Store

TWiT Bits (Video HD)

Play Episode Listen Later Nov 6, 2023 9:22


A key announcement from OpenAI's inaugural DevDay developer conference was the launch of GPTs. People can create custom versions of ChatGPT that combine instructions, extra knowledge, and skills, and then share in the GPT Store — all with no coding required. Jason Howell and Jeff Jarvis react live to the unveiling of this and more on TWiT News. Full episode at http://twit.tv/news399 Hosts: Jason Howell and Jeff Jarvis You can find more about TWiT and subscribe to our podcasts at https://podcasts.twit.tv/ Sponsor: GO.ACILEARNING.COM/TWIT

TWiT Bits (Video HI)
News Clip: Build Your Own AI Assistant - OpenAI Unveils GPTs & GPT Store

TWiT Bits (Video HI)

Play Episode Listen Later Nov 6, 2023 9:22


A key announcement from OpenAI's inaugural DevDay developer conference was the launch of GPTs. People can create custom versions of ChatGPT that combine instructions, extra knowledge, and skills, and then share in the GPT Store — all with no coding required. Jason Howell and Jeff Jarvis react live to the unveiling of this and more on TWiT News. Full episode at http://twit.tv/news399 Hosts: Jason Howell and Jeff Jarvis You can find more about TWiT and subscribe to our podcasts at https://podcasts.twit.tv/ Sponsor: GO.ACILEARNING.COM/TWIT

TWiT Specials (Video LO)
News 399: OpenAI DevDay Keynote 2023 - GPT-4 Turbo, Assistants API, GPTs, GTP Store

TWiT Specials (Video LO)

Play Episode Listen Later Nov 6, 2023 67:46


At its inaugural developer conference DevDay, OpenAI unveiled major upgrades like GPT-4 Turbo, a more advanced AI model that's 3x cheaper than GPT-4 and a 128k token context window that can handle much longer prompts. They also launched new multimodal capabilities so developers can integrate vision, speech, and image generation into apps. Key highlights include the Assistants API for building AI agents, the ability to create custom versions of ChatGPT called GPTs and share them publicly on the GPT Store, and Copyright Shield to protect customers. Overall, OpenAI aims to make AI more affordable, capable, and safe for developers to build next-gen apps. Hosts: Jeff Jarvis and Jason Howell Download or subscribe to this show at https://twit.tv/shows/twit-news. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: GO.ACILEARNING.COM/TWIT

Total Jason (Audio)
TWiT News 399: OpenAI DevDay Keynote 2023

Total Jason (Audio)

Play Episode Listen Later Nov 6, 2023 67:46


At its inaugural developer conference DevDay, OpenAI unveiled major upgrades like GPT-4 Turbo, a more advanced AI model that's 3x cheaper than GPT-4 and a 128k token context window that can handle much longer prompts. They also launched new multimodal capabilities so developers can integrate vision, speech, and image generation into apps. Key highlights include the Assistants API for building AI agents, the ability to create custom versions of ChatGPT called GPTs and share them publicly on the GPT Store, and Copyright Shield to protect customers. Overall, OpenAI aims to make AI more affordable, capable, and safe for developers to build next-gen apps. Hosts: Jeff Jarvis and Jason Howell Download or subscribe to this show at https://twit.tv/shows/twit-news. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: GO.ACILEARNING.COM/TWIT

Total Jason (Video)
TWiT News 399: OpenAI DevDay Keynote 2023

Total Jason (Video)

Play Episode Listen Later Nov 6, 2023 67:46


At its inaugural developer conference DevDay, OpenAI unveiled major upgrades like GPT-4 Turbo, a more advanced AI model that's 3x cheaper than GPT-4 and a 128k token context window that can handle much longer prompts. They also launched new multimodal capabilities so developers can integrate vision, speech, and image generation into apps. Key highlights include the Assistants API for building AI agents, the ability to create custom versions of ChatGPT called GPTs and share them publicly on the GPT Store, and Copyright Shield to protect customers. Overall, OpenAI aims to make AI more affordable, capable, and safe for developers to build next-gen apps. Hosts: Jeff Jarvis and Jason Howell Download or subscribe to this show at https://twit.tv/shows/twit-news. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: GO.ACILEARNING.COM/TWIT

Filipe Deschamps News
@670 - Elon Musk lança Grok / Certificação vale a pena / OpenAI DevDay hoje

Filipe Deschamps News

Play Episode Listen Later Nov 6, 2023 5:08


Notícias que chamaram a nossa atenção nesta Segunda-feira dia 06 de Novembro de 2023! Reprodução em áudio do e-mail recebido diariamente pela Newsletters (newsletter@filipedeschamps.com) Newsletter gratuita sobre Tecnologia e Programação: https://filipedeschamps.com.br/newsletter #news #noticias #fdnews #robsonamendonca

The AI Breakdown: Daily Artificial Intelligence News and Discussions
ChatGPT All Tools and What OpenAI Is Really Building

The AI Breakdown: Daily Artificial Intelligence News and Discussions

Play Episode Listen Later Nov 5, 2023 16:54


In the lead up to November 6th's OpenAI DevDay, NLW looks at their recent launch of PDF reading and the All Tools model and explores that it suggests about the company's bigger plans. Today's Sponsor: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown  ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/