Podcasts about Gemini Ultra

  • 41 PODCASTS
  • 63 EPISODES
  • 40m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • May 7, 2024 LATEST

POPULARITY

[Popularity chart of Gemini Ultra podcast episodes, 2017-2024]



Latest podcast episodes about Gemini Ultra

10 minutos con Sami
Alianza OpenAI-Stack Overflow, descubrimiento genético en Alzheimer y MAI-1 de Microsoft


May 7, 2024 · 7:03


In today's episode of "10 Minutos con Sami", we explore three fascinating stories from the world of technology and science. We start with the exciting collaboration between OpenAI and Stack Overflow, which aims to improve the capabilities of OpenAI's models by integrating Stack Overflow's extensive technical knowledge and community feedback into its AI systems. This alliance promises to accelerate AI development and deliver more precise, relevant answers in programming contexts. Next, we discuss the discovery that APOE4 homozygosity constitutes a new genetic form of Alzheimer's disease. This finding challenges the earlier notion that Alzheimer's arises mainly from a combination of multiple genetic, lifestyle, and environmental factors, and positions APOE4 homozygosity as a direct cause in certain cases. Finally, we dive into Microsoft's impressive development of MAI-1, a new language model with approximately 500 billion parameters, positioned to compete with other major models such as OpenAI's GPT-4 and Google's Gemini Ultra.

Sources: https://community.openai.com/t/openai-partners-with-stack-overflow/737889, https://stackoverflow.co/company/press/archive/openai-partnership/, https://www.nia.nih.gov/health/genetics-and-family-history/alzheimers-disease-genetics-fact-sheet, https://www.nytimes.com/2024/05/06/health/alzheimers-cause-gene-apoe4.html, https://www.nia.nih.gov/news/family-based-study-identifies-potential-new-genetic-factors-linked-alzheimers-risk-people, https://www.reddit.com/r/singularity/comments/1clkmeh/microsoft_is_working_on_a_500b_model_called_mai1/, https://siliconangle.com/2024/05/06/microsoft-reportedly-developing-mai-1-llm-500b-parameters/, https://www.theinformation.com/articles/meet-mai-1-microsoft-readies-new-ai-model-to-compete-with-google-openai

Social: You can find me on Threads, Twitter, and Instagram as @olivernabani, and you can usually catch me on Twitch: http://twitch.tv/olivernabani. Both this podcast and other original content are on YouTube: https://youtube.com/olivernabani. If you want to join the mashain community, we have a Discord server where we share what's on our minds: https://discord.gg/7M2SEfbF, a Telegram channel where I announce news and new content: https://t.me/sedicemashain, and a WhatsApp channel: https://whatsapp.com/channel/0029VaCSKOzFCCoavMoLwX43. And of course, most importantly, remember: it's not "Machine", it's "Mashain".

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 249: The next AI trend: Small language models?


Apr 12, 2024 · 31:09


Bigger isn't always better. Today, we're giving you 14 essential facts about small language models. You'll not only learn the difference between large and small language models, but you'll be able to slice through the jargon and be the language model expert in the room.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions about small language models
Related Episodes:
Ep 204: Google Gemini Advanced – 7 things you need to know
Ep 223: Anthropic Claude 3 – Better Than ChatGPT and Google Gemini?
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Timestamps:
02:40 Exploring small language models vs large models
03:42 Definition of small language models is changing
08:49 Small language models are for specific purposes
11:55 Small language models are faster and local
14:45 Tim Cook announces new language model for devices
21:25 2024 shift to smaller, focused language models
27:56 RAG: Combining data, small language models' future
28:52 Concern for large language models, potential for small models

Topics Covered in This Episode:
1. Introduction to Language Models
2. Advantages and Usage of Small Language Models
3. Comparison of Small and Large Language Models
4. Future of Small Language Models

Keywords: Large language models, Small language models, GPT-4, Gemini Ultra, Phi-2, Llama, Parameters, Language translation, Coding, Generative AI, GPT-5, MMLU, Speed, Efficiency, Fine-tuning, Maintenance, Copy-paste prompts, Chatbots, Search engines, Voice assistants, Hugging Face, Cloud-based services, Downloading models, Gemini Nano, NVIDIA's Chat with RTX, RAG, Security, Privacy, Retrieval Augmented Generation

Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/
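Since the episode's running theme is that small models can be downloaded from Hugging Face and run locally, here is a minimal sketch of that workflow using the transformers library. The specific model choice (microsoft/phi-2, roughly 2.7B parameters) is our assumption for illustration; any small model on the Hub works the same way.

```python
# Minimal sketch: download a small language model and run it locally.
# Model choice (microsoft/phi-2) is an illustrative assumption, not a
# recommendation from the episode.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")

prompt = "Explain the difference between large and small language models:"
result = generator(prompt, max_new_tokens=80, do_sample=False)

# pipeline() returns a list of dicts, one per generated sequence
print(result[0]["generated_text"])
```

The first call downloads the weights once; after that, inference runs entirely on your own machine, which is the speed, privacy, and cost argument the episode makes for small models.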

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Why Google failed to make GPT-3 + why Multimodal Agents are the path to AGI — with David Luan of Adept


Mar 22, 2024 · 41:52


Our next SF event is AI UX 2024 - let's see the new frontier for UX since last year! Last call: we are recording a preview of the AI Engineer World's Fair with swyx and Ben Dunphy; send any questions about Speaker CFPs and Sponsor Guides you have!

Alessio is now hiring engineers for a new startup he is incubating at Decibel. The ideal candidate is an "ex-technical co-founder type". Reach out to him for more!

David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then started Adept in 2022, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman, which resulted in Microsoft's initial $1b investment), and how Adept is building agents that can "do anything a human does on a computer" — his definition of useful AGI.

Why Google *couldn't* make GPT-3

While we wanted to discuss Adept, we couldn't talk to a former VP of Engineering at OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room. It's often asked how Google had such a huge lead in 2017, with Vaswani et al creating the Transformer and Noam Shazeer predicting trillion-parameter models, and yet it was David's team at OpenAI that ended up making GPT 1/2/3. David has some interesting answers:

"So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized…what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too…You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do. And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the Brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy N chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused."
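An aside on the Brain Credit Marketplace quote: the mechanics are easy to see with toy numbers. The sketch below is entirely our own construction (the fleet size, researcher count, and equal-split rule are made-up assumptions, not Google's actual system), but it shows why one giant job means talking roughly twenty colleagues out of doing work:

```python
# Toy illustration (our construction, not Google's actual system) of why a
# bottom-up credit marketplace makes giant training jobs hard: each researcher
# holds a small, equal slice of chips, so one big job needs many colleagues to
# pool their credits and idle their own work.

TOTAL_CHIPS = 4000
RESEARCHERS = 40
CHIPS_PER_CREDIT = TOTAL_CHIPS // RESEARCHERS  # everyone gets an equal slice

def colleagues_to_convince(job_chips: int) -> int:
    """How many peers must donate their entire allocation for one big job?"""
    credits_needed = -(-job_chips // CHIPS_PER_CREDIT)  # ceiling division
    return max(0, credits_needed - 1)  # minus your own credit

# A run wanting half the fleet needs 19 colleagues to stop working:
print(colleagues_to_convince(TOTAL_CHIPS // 2))  # -> 19
```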
Cloning HGI for AGI

Human intelligence got to where it is today through evolution. Some argue that to get to AGI, we will approximate all the "FLOPs" that went into that process, an approach most famously mapped out by Ajeya Cotra's Biological Anchors report. The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode.)

David argues that there's a shortcut: we can bootstrap from existing intelligence.

"Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there… I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs!"

LLMs today basically model intelligence using all (good!) written knowledge (see our Datasets 101 episode), and have now expanded to non-verbal knowledge (see our HuggingFace episode on multimodality). The SOTA self-supervised pre-training process is surprisingly data-efficient at taking large amounts of unstructured data and approximating reasoning without overfitting.

But how do you cross the gap from the LLMs of today to building the AGI we all want? This is why David & friends left to start Adept.

"We believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal" — ACT-1 Blogpost

Critical Path: Abstraction with Reliability

The AGI dream is fully autonomous agents, but there are levels of autonomy that we are comfortable giving our agents, based on how reliable they are. In David's framing, we always want higher levels of "abstraction" (aka autonomy), but our need for "reliability" is the practical limit on how high of an abstraction we can use.

"The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that."

We saw how Adept thinks about different levels of abstraction at the 2023 Summit: the highest abstraction is the "AI Employee", but we'll get there with "AI-enabled employees". Alessio recently gave a talk about the future of work with "services as software" at this week's Nvidia GTC (slides).

No APIs

Unlike a lot of large research labs, Adept's framing of AGI as "being able to use your computer like a human" carries with it a useful environmental constraint:

"Having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human.
It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path (to economic value)."

This realization and conviction means that multimodal models are the way to go. Instead of using function calling to call APIs to build agents, which is what OpenAI and most of the open LLM industry have done to date, Adept wants to "drive by vision" (aka see the screen as a human sees it) and pinpoint where to click and type as a human does. No APIs needed, because most software doesn't expose APIs.

Extra context for readers: you can see the DeepMind SIMA model in the same light: one system that learned to play a diverse set of games (instead of one dedicated model per game) using only pixel inputs and keyboard-and-mouse action outputs! The OpenInterpreter team is working on a "Computer API" that also does the same.

To do this, Adept had to double down on a special kind of multimodality for knowledge work:

"A giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents… I think one big hangover of the primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera… (but) where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cats and dogs. And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that."

With this context, you can now understand the full path of Adept's public releases:

* ACT-1 (Sept 2022): a large Transformers model optimized for browser interactions. It has a custom rendering of the browser viewport that allows it to better understand it and take actions.
* Persimmon-8B (Sept 2023): a permissive open LLM (weights and code here)
* Fuyu-8B (Oct 2023): a small version of the multimodal model that powers Adept. Vanilla decoder-only transformer with no specialized image encoder, which allows it to handle input images of varying resolutions without downsampling.
* Adept Experiments (Nov 2023): a public tool to build automations in the browser. This is powered by Adept's core technology, but it's just a piece of their enterprise platform. They use it as a way to try various design ideas.
* Fuyu Heavy (Jan 2024): a new multimodal model designed specifically for digital agents and the world's third-most-capable multimodal model (beating Gemini Pro on MMMU, AI2D, and ChartQA), "behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger"

The Fuyu-8B post in particular exhibits a great number of examples of knowledge work multimodality.
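To make the "drive by vision" idea concrete, here is a toy sketch of the perception-action loop such an agent runs: screenshot in, human-style click/type actions out, no per-app APIs. Every name in it (capture, next_action, and so on) is a hypothetical illustration of the general technique, not Adept's actual interface.

```python
# Toy sketch of a "drive by vision" agent loop: the model perceives raw pixels
# and emits human-style actions instead of calling per-app APIs. All names are
# hypothetical illustrations, not Adept's actual interfaces.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0       # screen coordinates for a click
    y: int = 0
    text: str = ""   # keystrokes for a type action

def run_agent(goal: str, policy, screen, max_steps: int = 50) -> bool:
    """Run a perception-action loop until the policy reports the goal is done.

    `policy` wraps a multimodal model exposing next_action(goal, pixels);
    `screen` exposes capture(), click(x, y), and type(text). Both are supplied
    by the caller: stubs in a test, real sensors and actuators in production.
    """
    for _ in range(max_steps):
        pixels = screen.capture()                  # see the UI as a human would
        action = policy.next_action(goal, pixels)  # model picks the next step
        if action.kind == "done":
            return True
        if action.kind == "click":
            screen.click(action.x, action.y)
        elif action.kind == "type":
            screen.type(action.text)
    return False  # stop explicitly instead of flailing forever
```

Note how the reliability theme from the "Critical Path" section shows up even here: the loop caps its steps and fails explicitly rather than producing a wrong result, matching David's observation later in the transcript that a failed agent trajectory is visibly broken rather than silently wrong.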
Why Adept is NOT a Research Lab

With OpenAI now worth >$90b and Anthropic >$18b, it is tempting to conclude that the AI startup metagame is to build a large research lab, and attract the brightest minds and highest capital to build AGI. Our past guests Raza (see the Humanloop episode) and Kanjun (from Imbue) combined to ask the most challenging questions of the pod: with David/Adept's deep research pedigree from DeepMind and OpenAI, why is Adept not building more general foundation models (like Persimmon) and playing the academic benchmarks game? Why is Adept so focused on commercial agents instead?

"I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from 'Can we make a better agent'… I think pure play foundation model companies are just going to be pinched by how good the next couple of (Meta Llama models) are going to be… And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents."

And the commercial grounding is his answer to Kanjun too (whom we also asked the inverse question, to compare with Adept):

"…the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is, for example, our evaluations are not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to, we can't do them at all. We've turned those into evals… I think that's a degree of practicality that really helps."

And his customers seem pretty happy, because David didn't need to come on to do a sales pitch:

David: "One of the things we haven't shared before is we're completely sold out for Q1."
Swyx: "Sold out of what?"
David: "Sold out of bandwidth to onboard more customers."

Well, that's a great problem to have.

Show Notes

* David Luan
* Dextro at Data Driven NYC (2015)
* Adept
* ACT-1
* Persimmon-8B
* Adept Experiments
* Fuyu-8B
* $350M Series B announcement
* Amelia Wattenberger talk at AI Engineer Summit
* Figure

Chapters

* [00:00:00] Introductions
* [00:01:14] Being employee #30 at OpenAI and its early days
* [00:13:38] What is Adept and how do you define AGI?
* [00:21:00] Adept's critical path and research directions
* [00:26:23] How AI agents should interact with software and impact product development
* [00:30:37] Analogies between AI agents and self-driving car development
* [00:32:42] Balancing reliability, cost, speed and generality in AI agents
* [00:37:30] Potential of foundation models for robotics
* [00:39:22] Core research questions and reasons to work at Adept

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

Swyx [00:00:15]: Hey, and today we have David Luan, CEO, co-founder of Adept in the studio. Welcome.

David [00:00:20]: Yeah, thanks for having me.

Swyx [00:00:21]: Been a while in the works. I've met you socially at one of those VC events and you said that you were interested in coming on and glad we finally were able to make this happen.

David: Yeah, happy to be part of it.

Swyx: So we like to introduce the speaker and then also just like have you talk a little bit about like what's not on your LinkedIn, what people should just generally know about you.
You started a company in college, which was the first sort of real-time video detection and classification API, that was Dextro, and that was your route to getting acquired into Axon, where you were a director of AI. Then you were the 30th hire at OpenAI?

David [00:00:53]: Yeah, 30, 35, something around there. Something like that.

Swyx [00:00:56]: So you were VP of Eng for two and a half years to two years, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV. Is there anything else you want to fill in the blanks, or anything people should know more about?

David [00:01:14]: I guess a broader story was I joined OpenAI fairly early and I did that for about two and a half to three years leading engineering there. It's really funny, I think the second or third day of my time at OpenAI, Greg and Ilya pulled me in a room and were like, you know, you should take over our directs and we'll go mostly do IC work. So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened. The company, the Dota effort was going pretty hard and then more broadly trying to put bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that. And then I led Google's LLM efforts, but also co-led Google Brain, was one of the Brain leads more broadly. You know, there's been a couple of different eras of AI research, right? If we count everything before 2012 as prehistory, which people hate it when I say that, we kind of had this "you and your three best friends write a research paper that changes the world" period from like 2012 to 2017. And I think the game changed in 2017 and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by like Ilya's constant beating of the drum that the world would be covered in data centers. And I think-

Swyx [00:02:15]: It's causally neat.

David [00:02:16]: Yeah. Well, like I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was for OpenAI, like when I first joined, I think one of the jobs that I had to do was how do I tell a differentiated vision for who we were technically compared to, you know, hey, we're just smaller Google Brain, or like you work at OpenAI if you live in SF and don't want to commute to Mountain View or don't want to live in London, right? That's like not enough to hang your technical identity on as a company. And so what we really did was, and I spent a lot of time pushing this, is just how do we get ourselves focused on a certain class of like giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about how do you like leave some room for that, but really make it about like, what are the big scientific outcomes that you want to show? And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple of years, right? And then what's changed now is I think the number one driver of AI products over the next couple of years is going to be the deep co-design and co-evolution of product and users for feedback and actual technology. And I think labs with every tool to go do that are going to do really well.
And that's a big part of why I started Adept.

Alessio [00:03:20]: You mentioned Dota. Any memories of the switch from RL to Transformers at the time, and kind of how the industry was evolving more on the LLM side and leaving behind some of the more agent simulation work?

David [00:03:33]: Like zooming way out, I think agents are just absolutely the correct long-term direction, right? You just go to find what AGI is, right? You're like, hey, like, well, first off, actually, I don't love AGI definitions that involve human replacement because I don't think that's actually how it's going to happen. Even this definition of, hey, AGI is something that outperforms humans at economically valuable tasks carries a kind of implicit view of the world about what's going to be the role of people. I think what I'm more interested in is a definition of AGI that's oriented around a model that can do anything a human can do on a computer. If you go think about that, which is super tractable, then agents are just a natural consequence of that definition. And so what did all the work we did on our own stuff like that get us? It got us a really clear formulation. Like you have a goal and you want to maximize the goal, you want to maximize reward, right? And the natural LLM formulation doesn't come with that out of the box, right? I think that we as a field got a lot right by thinking about, hey, how do we solve problems of that caliber? And then the thing we forgot is that de novo RL is like a pretty terrible way to get there quickly. Why are we rediscovering all the knowledge about the world? Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there. Right.

Swyx [00:04:44]: The biological basis theory. Right.

David [00:04:46]: So I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning of everything that humans already know. Right. So like today, maybe LLMs are behaviorally cloning every word that gets written on the internet; in the future, the multimodal models are becoming more of a thing, where they're behaviorally cloning the visual world. But really, what we're just going to have is a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are learned by the model. And then you can regurgitate any combination now. Right. So text in, voice out; image in, other image out or video out or whatever, like these mappings, right? All just going to be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of how do we combine this with all of the lessons we learned during the RL period. That's what's going to drive progress.

Swyx [00:05:35]: I'm still going to pressure you for a few more early OpenAI stories before we turn to the Adept stuff. On your personal site, which I love, because it's really nice, like personal, you know, story context around your history. I need to update it. It's so old. Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right?

David [00:05:53]: I actually don't quite remember. I think I was joining right around- Right around then?

Swyx [00:05:57]: I was right around that, yeah.
Yeah. So what I remember was Alec, you know, just kind of came in and was like very obsessed with Transformers and applying them to like Reddit sentiment analysis. Yeah, sentiment, that's right. Take us through-

David [00:06:09]: Sentiment neuron, all this stuff.

Swyx [00:06:10]: The history of GPT as far as you know, you know, according to you. Ah, okay.

David [00:06:14]: History of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized, where like, again, you and your three best friends write papers, right? Okay. So zooming way out, right? I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line. My job is not actually to promote a million ideas and never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right? That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be to say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too.

Swyx [00:07:17]: He's talking about trillion parameter models in 2017.

David [00:07:20]: Yeah. So that's the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But after GPT-2, we were all really excited about GPT-2. I can tell you more stories about that. It was the last paper that I even got to really touch before everything became more about building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing, right? He's got this decoder only transformer that's probably going to get there before we do. And I was like, but like, please just let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the Brain leads, you know, it became really clear why, right? At the time, there was a thing called the Brain Credit Marketplace. And did you guys know the Brain Credit Marketplace? No, I never heard of this. Oh, so it's actually, it's a, you can ask any Googler.

Swyx [00:08:23]: It's like just like a thing that, that, I mean, look like, yeah, limited resources, you got to have some kind of marketplace, right? You know, sometimes it's explicit, sometimes it isn't, you know, just political favors.

David [00:08:34]: You could. And so then basically everyone's assigned a credit, right? So if you have a credit, you get to buy N chips according to supply and demand.
So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused. And I think, again, that's like part of the narrative of this phase one of AI, right? Of this modern AI era to phase two. And I think in the same way, I think phase three companies are going to out-execute phase two companies because of the same asymmetry of success.

Swyx [00:09:12]: Yeah. I think it's underrated how much NVIDIA worked with you in the early days as well. I think maybe, I think it was Jensen. I'm not sure who circulated a recent photo of him delivering the first DGX to you guys.

David [00:09:24]: I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for NVIDIA. It is unreal.

Swyx [00:09:34]: But like with OpenAI, did they kind of give their requirements, like co-design it, or just work with whatever NVIDIA gave them?

David [00:09:40]: So we worked really closely with them. There's, I'm not sure I can share all the stories, but there are examples of ones that I've found particularly interesting. So Scott Gray is amazing. I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs and Chris Berner still does a lot of stuff in that. As a result, we had very close ties to NVIDIA. Actually, one of my co-founders at Adept, Erich Elsen, was also one of the early GPGPU people. So he and Scott and Brian Catanzaro at NVIDIA and Jonah and Ian at NVIDIA, I think all were very close. And we're all sort of part of this group of how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. I think one interesting set of stuff is knowing, in the A100 generation, that like quad sparsity was going to be a thing. Is that something that we want to go look into, right? And figure out if that's something that we could actually use for model training. Really what it boils down to is that, and I think more and more people realize this; six years ago people, even three years ago people, refused to accept it. This era of AI is really a story of compute. It's really the story of how do you more efficiently map actual usable model flops to compute.

Swyx [00:10:38]: Is there another GPT-2, GPT-3 story that you love to get out there that you think is underappreciated for the amount of work that people put into it?

David [00:10:48]: So two interesting GPT-2 stories. One of them was I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments was we were writing the modeling section. And I'm pretty sure the modeling section was the shortest modeling section of any ML, reasonably legitimate ML paper to that moment. It was like, section three, model: this is a standard vanilla decoder-only transformer with these particular things. It was a paragraph long, if I remember correctly. And both of us were just looking at the same thing being like, man, the OGs in the field are going to hate this. They're going to say no novelty. Why did you guys do this work?
So now it's funny to look at in hindsight that it was a pivotal kind of paper, but I think it was one of the early ones where we just leaned fully into "all we care about is solving problems in AI", and not about, hey, is there like four different really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?

Swyx [00:11:42]: Right. And it's like you innovate on maybe like the dataset and scaling and not so much the architecture.

David [00:11:48]: We all know how it works now, right? Which is that there's a collection of really hard-won knowledge that you get only by being at the frontiers of scale. And that hard-won knowledge, a lot of it's not published. A lot of it is stuff that's actually not even easily reducible to what looks like a typical academic paper. But yet that's the stuff that helps differentiate one scaling program from another. You had a second one? So the second one is, there's like some details here that I probably shouldn't fully share, but hilariously enough, for the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before, so I always had a tremendous amount of anxiety about partner meetings, which this basically was. I had Kevin Scott and Satya and Amy Hood, and it was my job to give the technical slides about what's the path to AGI, what's our research portfolio, all of this stuff, but it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up. And as we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so I'd spent all this time trying to figure out how to keep this thing on rails. I had my canned demos, but I knew I had to go turn it over to Satya and Kevin and let them type anything in. And that just, that really kept me up all night.

Swyx [00:13:06]: Nice. Yeah.

Alessio [00:13:08]: I mean, that must have helped you. Talking about partner meetings, you raised $420 million for Adept. The last round was a $350 million Series B, so I'm sure you do great in partner meetings.

Swyx [00:13:18]: Pitch meetings. Nice.

David [00:13:20]: No, that's a high compliment coming from a VC.

Alessio [00:13:22]: Yeah, no, I mean, you're doing great already for us. Let's talk about Adept. And we were doing pre-prep and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse. Like what is Adept? Yeah.

David [00:13:38]: So I think Adept is the least understood company in the broader space of foundational models plus agents. So I'll give some color and I'll explain what it is, and I'll explain also why it's actually pretty different from what people would have guessed. So the goal for Adept is we basically want to build an AI agent that can do, that can basically help humans do anything a human does on a computer. And so what that really means is we want this thing to be super good at turning natural language goal specifications into the correct set of end steps, and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use.
And so the end vision of this is effectively, like, I think in a couple of years everyone's going to have access to an AI teammate that they can delegate arbitrary tasks to, and then also be able to, you know, use as a sounding board and just be way, way, way more productive. Right. And it just changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing these core liberal arts skills of what should I be doing and why. Right. And I find this really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out. I think systems like Adept are the most likely systems to be proto-AGIs. But I think the way in which we are really counterintuitive to everybody is that we've actually been really quiet, because we are not a developer company. We don't sell APIs. We don't sell open source models. We also don't sell bottom up products. We're not a thing that you go and click and download the extension and like we want more users signing up for that thing. We're actually an enterprise company. So what we do is we work with a range of different companies, some late stage multi-thousand-person startups, some Fortune 500s, et cetera. And what we do for them is we basically give them an out of the box solution where big complex workflows that their employees do every day could be delegated to the model. And so we look a little different from other companies in that, in order to go build this full agent thing, the most important thing you've got to get right is reliability. So initially, zooming way back when, one of the first things that Adept did was we released this demo called Act One, right? Act One was like pretty cool. It's kind of become a hello world thing for people to show agent demos, by going to Redfin and asking to buy a house somewhere, because we did that in the original Act One demo and showed that, showed like Google Sheets, all this other stuff. Over the last year since that has come out, there's been a lot of really cool demos, and you go play with them and you realize they work 60% of the time. But since we've always been focused on how do we build an amazing enterprise product, enterprises can't use anything that isn't in the nines of reliability. And so we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability. And we've decided to prioritize reliability over all else. So like one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if that works like 60% of the time, you're just blowing money and sending poor truck drivers to places.

Alessio [00:16:30]: Interesting. One of our investment teams has this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically with software as a service, you're wrapping user productivity in software; with services as software, you're replacing things that, you know, you would ask somebody to do, and the software just does it for you. When you think about these use cases, do the users still go in and look at the agent kind of like doing the things and can intervene, or are they totally removed from them?
Like the truck thing is like, does the truck just show up, or are there people in the middle checking in?

David [00:17:04]: I think there's two current flaws in the framing for services as software, or I think what you just said. I think that one of them is, in our experience, as we've been rolling out Adept, the people who actually do the jobs are the most excited about it, because they don't go from "I do this job" to "I don't do this job". They go from "I do this job for everything, including the shitty rote stuff" to "I'm a supervisor". And I literally, like, it's pretty magical when you watch the thing being used, because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, hey, I want to watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution as opposed to LLM generations is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result. It just fails to execute. And the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves. And so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it. I think the second piece of it that we've found is our strategy as a company is to always be an augmentation company. And I think, one, out of principle, that's something we really care about. But two, actually, if you're framing yourself as an augmentation company, you're always going to live in a world where you're solving tasks that are a little too hard for what the model can do today and still need a human to provide oversight, provide clarifications, provide human feedback. And that's how you build a data flywheel. That's how you actually learn from the smartest humans how to solve things models can't do today. And so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job is to deliver you a lights-off solution for X.

Alessio [00:18:42]: Yeah. It's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and they just cannot attract the talent to do it. And similarly, in software development, you have Copilot, which is the augmentation product, and then you have sweep.dev and these products which just do the whole thing. I'm really curious to see how that evolves. I agree that today the reliability is so important in the enterprise that they just don't use most of them. Yeah. Yeah. No, that's cool. But it's great to hear the story, because I think from the outside, people are like, oh, Adept, they do ACT-1, they do Persimmon, they do Fuyu, they do all this stuff. Yeah, it's just the public stuff.

Swyx [00:19:20]: It's just public stuff.

David [00:19:21]: So one of the things we haven't shared before is we're completely sold out for Q1. And so I think...

Swyx [00:19:26]: Sold out of what?

David [00:19:27]: Sold out of bandwidth to onboard more customers. And so we're working really hard to go make that less of a bottleneck, but our expectation is that I think we're going to be significantly more public about the broader product shape and the new types of customers we want to attract later this year.
So I think that clarification will happen by default.

Swyx [00:19:43]: Why have you become more public? You know, if the whole push has... You're sold out, you're an enterprise company, but you're also clearly putting effort towards being more open or releasing more things.

David [00:19:53]: I think we just flipped over that way fairly recently. That's a good question. I think it actually boils down to two things. One, I think that, frankly, a big part of it is that the public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening, because when we started the company in January 2022, everybody in the field knew about the agents thing from RL, but the general public had no conception of what it was. They were still hanging their narrative hat on the tree of everything's a chatbot. And so I think now one of the things that I really care about is that when people think agent, they actually think the right thing. All sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. To me, an agent is something that you can give a goal and get an end-to-end workflow done correctly in the minimum number of steps. And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Adept as they think about what the next thing they want to do in their careers. The field is quickly pivoting in a world where foundation models are looking more and more commodity. And I think a huge amount of gain is going to happen from how do you use foundation models as the well-learned behavioral cloner to go solve agents. And I think people who want to do agents research should really come to Adept.

Swyx [00:21:00]: When you say agents have become more part of the public narrative, are there specific things that you point to? I'll name a few. Bill Gates in his blog post mentioning that agents are the future. I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying that agents are the future for OpenAI.

David [00:21:17]: I think before that even, I think there was something like the New York Times, Cade Metz wrote a New York Times piece about it. Right now, in a bid to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company now brand themselves as an AI agent company. It's just, it's a term I just feel like people really want.

Swyx [00:21:31]: From the VC side, it's a bit mixed. Is it? As in, like, I think there are a lot of VCs where, like, I would not touch any agent startups because, like- Why is that? Well, you tell me.

Alessio [00:21:41]: I think a lot of VCs that are maybe less technical don't understand the limitations of the-

Swyx [00:21:46]: No, that's not fair.

Alessio [00:21:47]: No, no, no, no. I think like- You think so? No, no. I think like, what is possible today and what is worth investing in, you know? And I think, I mean, people look at you and say, well, these guys are building agents. They needed 400 million to do it. So a lot of VCs are maybe like, oh, I would rather invest in something that is tacking on AI to an existing thing, which is easier to get to market and kind of get some of the flywheel going. But I'm also surprised a lot of funders just don't want to do agents. It's not even the funding. Sometimes we look around and it's like, why is nobody doing agents for X?
Wow.

David [00:22:17]: That's good to know, actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.

Swyx [00:22:24]: So maybe I'm- They are. They are. But like I have advised people to take agents off of their title because it's so diluted.

David [00:22:31]: It's now so diluted.

Swyx [00:22:32]: Yeah. So then it doesn't stand for anything. Yeah.

David [00:22:35]: That's a really good point.

Swyx [00:22:36]: So like, you know, you're a portfolio allocator. You have people know about Persimmon, people know about Fuyu and Fuyu Heavy. Can you take us through how you think about the evolution of that, and what people should think about what that means for Adept and its research directions? Kind of take us through the stuff you shipped recently and how people should think about the trajectory of what you're doing.

David [00:22:56]: The critical path for Adept is we want to build agents that can do higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with a really high reliability standard but keep pushing the level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to the Act One days, right? The core thing behind Act One is can we teach a large model basically how to even actuate your computer? And I think we're one of the first places to have solved that and shown it, and shown the generalization that you get when you give it various different workflows and texts. But what we really realized from there was that in order to get reliability, companies just do things in various different ways. You actually want these models to be able to get a lot better at having some specification of some guardrails for what it actually should be doing. And I think in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents. Back then we had to do a ton of research basically on how do we actually make that possible. Well, first off, back in, I forget exactly, months one to three of '23, there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover of the primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera. COCO. Yeah, right. And COCO is awesome. Like I love COCO. I love TY. Like it's really helped the field. Right. But like that's the build one thing. I actually think it's really clear today: multimodal models are the default foundation model, right? It's just going to supplant LLMs. Like you just train a giant multimodal model. And so for that though, where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cats and dogs. Right. And so if that's what it is, what do you need to train?
I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that. And so the public Fuyu models and stuff aren't trained on our actual corpus; they're trained on some other stuff. But you take a lot of that data and then you make it really fast and make it really good at things like dense OCR on screens. And then now you have the right raw putty to go make a good agent. So that's kind of some of the modeling side. We've kind of only announced some of that stuff. We haven't really announced much of the agents work. But if you put those together with the correct product form factor, and I think the product form factor also really matters. I think we're seeing, and you guys probably see this a little bit more than I do, but we're seeing a little bit of a pushback against the tyranny of chatbots as form factor. And I think that the reason why the form factor matters is the form factor changes what data you collect in the human feedback loop. And so I think we've spent a lot of time doing full vertical integration of all these bits in order to get to where we are.

Swyx [00:25:44]: Yeah. I'll plug Amelia Wattenberger's talk at our conference, where she gave a little bit of the thinking behind what else exists other than chatbots that, if you could delegate to reliable agents, you could do. I was kind of excited at Adept Experiments, or Adept Workflows, I don't know what the official name for it is. I was like, okay, this is something I can use, but it seems like it's just an experiment for now. It's not your product.

David [00:26:06]: So we basically just use Experiments as a way to go push various ideas on the design side to some people and just be like, yeah, we'll play with it. Actually, the Experiments code base underpins the actual product, but the code base itself is kind of like a skeleton for us to go deploy arbitrary cards on the side.

Swyx [00:26:22]: Yeah.

Alessio [00:26:23]: Makes sense. I was going to say, I would love to talk about the interaction layer. So you train a model to see UI, but then there's the question of how do you actually act on the UI? I think there were some rumors about OpenAI building agents that kind of manage the endpoint, so the whole computer; you're more at the browser level. I read in one of your papers, you have a different representation; you don't just take the DOM and act on it, you do a lot more stuff. How do you think about the best way the models will interact with the software, and how is the development of products going to change with that in mind as more and more of the work is done by agents instead of people?

David [00:26:58]: This is, there's so much surface area here, and it's actually one of the things I'm really excited about. And it's funny, because I've spent most of my time doing research stuff, but there's a whole new ball game that I've been learning about and I find it really cool. So I would say the best analogy I have for why Adept is pursuing a path of being able to use your computer like a human, plus of course being able to call APIs (and being able to call APIs is the easy part; being able to use your computer like a human is the hard part), is it's the same reason why people are excited about humanoid robotics, right? In a world where you had T equals infinity, right?
You're probably going to have various different form factors that robots could just be in, and all the specialization. But the fact is that humans live in a human environment. So having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then at many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path. I think because it's the most practical path, I think a lot of success will come from going down this path. I kind of think about these early days of the agent interaction layer a little bit like, do you all remember Windows 3.1? Like those days? Okay, I might be too old for you guys on this. But back in the day, Windows 3.1, we had this transition period between pure command line, right, being the default, into this new world where the GUI is the default and then you drop into the command line for like programmer things, right? The old way was you booted your computer up, DOS booted, and then it would give you the C colon slash thing. And you typed Windows and you hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. The same thing is going to happen with agent interfaces: today the GUI is the base layer, and the agent just controls the current GUI layer plus APIs. And in the future, as more and more trust is built towards agents and more and more things can be done by agents, if more UIs for agents are actually generative in and of themselves, then that just becomes a standard interaction layer. And if that becomes a standard interaction layer, what changes for software is that a lot of software is going to be either systems of record or certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.

Alessio [00:29:19]: And you think the Rabbit interface is more like that: you're not actually seeing the app that the model interacts with. You're just saying, hey, I need to log this call on Salesforce, and you're never actually going on salesforce.com directly as the user. I can see that being a model.

David [00:29:33]: I think I don't know enough about what using Rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea that, you know, you have a goal, right? The agent knows how to break your goal down into steps. The agent knows how to use the underlying software and systems of record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal. It all just really leads to a world where you don't really need to ever interface with the apps underneath unless you're a power user for some niche thing.

Swyx [00:30:03]: General question. So first of all, I think like the sort of input mode conversation.
I wonder if you have any analogies that you like with self-driving, because I do think there's a little bit of a question of how the model should perceive the world. And you know, the primary split in self-driving is LiDAR versus camera. And I feel like most agent companies that I'm tracking are all moving towards the camera approach, which is the multimodal approach, you know, multimodal vision, very heavy vision, all the Fuyu stuff that you're doing. You're focusing on that, including charts and tables. Do you find inspiration there from the self-driving world? That's a good question.David [00:30:37]: I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. I think that's awesome. But I think that our number one goal is for agents not to look like self-driving. We want to minimize the chances that agents are sort of a thing that you just have to bang your head at for a long time to get to two discontinuous milestones, which is basically what's happened in self-driving. We want to be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, I mean, compared to self-driving, there are two things that people really undervalue. One is that it's really easy to do the driving-a-car-down-Highway-101-on-a-sunny-day demo. That actually doesn't prove anything anymore. And I think the second thing is that, as a non-self-driving expert, one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually a lot of what's helped us get a lot of reliability is a really strong focus on why the model does not do this thing. And a non-trivial amount of the time, the model doesn't actually do the thing because, if you're Wizard-of-Oz-ing it yourself, or if you have unreliable actuators, you can't do the thing. And so we've had to fix a lot of those problems.Swyx [00:31:43]: I was slightly surprised just because I do generally consider the Waymos that we see all around San Francisco as the most, I guess, real case of agents that we have, in very material ways.David [00:31:55]: Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature from when it entered the consciousness and the driving-down-101-on-a-sunny-day moment happened to now. Right. So I want to see that more compressed.Swyx [00:32:07]: And I mean, you know, Cruise, you know, RIP. And then one more thing on just going back on this reliability thing: something I have been holding in my head that I'm curious to get your commentary on is that I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general production readiness and enterprise readiness at scale. Beyond reliability, you also have cost, you have speed; speed is a huge emphasis for Adept. The tendency or the temptation is to reduce generality to improve reliability and to improve cost, improve speed. Do you perceive a trade-off? Do you have any insights that solve those trade-offs for you guys?David [00:32:42]: There's definitely a trade-off if you're at the Pareto frontier, but I think a lot of folks aren't actually at the Pareto frontier. I think the way you get there is basically: how do you frame the fundamental agent problem in a way that just continues to benefit from data?
I think one of the main ways of being able to solve that particular trade-off is you basically just want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible. I think that's how you really solve it. Then you get into the other problems like, okay, are you overfitting on these end use cases? You're not doing a thing where you're being super prescriptive for the end steps that the model can only do, for example.Swyx [00:33:17]: Then the question becomes, do you have one house model that you can then customize for each customer, and you're fine-tuning them on each customer's specific use case?David [00:33:25]: Yeah.Swyx [00:33:26]: We're not sharing that. You're not sharing that. It's tempting, but that doesn't look like AGI to me. You know what I mean? That is just you have a good base model and then you fine-tune it.David [00:33:35]: For what it's worth, I think there's two paths to a lot more capability coming out of the models that we all are training these days. I think one path is you figure out how to spend compute and turn it into data. In that path, I consider search, RL, all the things that we all love in this era as part of that path, like self-play, all that stuff. The second path is how do you get super competent, high-intelligence demonstrations from humans? I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little bit, but I think that it's going to be hard to be running at max speed towards AGI without actually solving a bit of both.Swyx [00:34:16]: You haven't talked much about synthetic data, as far as I can tell. Probably this is a bit too much of a trend right now, but any insights on using synthetic data to augment the expensive human data?David [00:34:26]: The best part about framing AGI as being able to help people do things on computers is you have an environment.Swyx [00:34:31]: Yes. So you can simulate all of it.David [00:34:35]: You can do a lot of stuff when you have an environment.Alessio [00:34:37]: We were having dinner for our one-year anniversary. Congrats. Yeah. Thank you. Raza from HumanLoop was there, and we mentioned you were coming on the pod. This is our first-Swyx [00:34:45]: So he submitted a question.Alessio [00:34:46]: Yeah, this is our first, I guess, like mailbag question. He asked: when you started, GPT-4 didn't exist, and now you have GPT-4 vision helping you build a lot of those things. How do you think about the things that are unique to you as Adept, going back to the research direction that you want to take the team and what you want people to come work on at Adept, versus what has maybe now become commoditized that you didn't expect everybody would have access to?David [00:35:11]: Yeah, that's a really good question. I think implicit in that question, and I wish he were here too so he could push back on my assumption about his question, but I think implicit in that question is a calculus of where advantage accrues in the overall ML stack. And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way that you really win is that you have to go build an agent stack that is much more than that of the base model itself. And so I think that is always going to be a giant advantage of vertical integration.
I think it lets us do things like have a really, really fast base model that is really good at agent things but is bad at cat and dog photos. Well, it's pretty good at cat and dog photos; it's just not SOTA at cat and dog photos, right? So we're allocating our capacity wisely, right? That's one thing that you really get to do. I also think that the other thing that is pretty important now in the broader foundation modeling space is that, despite any potential concerns about how good agents are as, like, a startup area, right, like we were talking about earlier, I feel super good that we're doing foundation models in service of agents, and all of the reward within Adept is flowing from: can we make a better agent? Because right now I think we all see that, you know, if you're training on publicly available web data, you put in the flops and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. And so I think pure-play foundation model companies are just going to be pinched by how good the next couple of Llamas are going to be, and the next good open-source thing, and then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models. I think that is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.Swyx [00:36:56]: So you don't consider yourself a pure-play foundation model company?David [00:36:59]: No, because if we were a pure-play foundation model company, we would be training general foundation models that do summarization and all this other...Swyx [00:37:06]: You're dedicated towards the agent. Yeah.David [00:37:09]: And our business is an agent business. We're not here to sell you tokens, right? And I think like selling tokens, unless there's like a...Swyx [00:37:14]: Not here to sell you tokens. I love it.David [00:37:16]: It's like, if you have a particular area of specialty, right, then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute. But if you don't have a specialty, I find that, I think, it's going to be a little tougher.Swyx [00:37:27]: Interesting. Are you interested in robotics at all? Just a...David [00:37:30]: I'm personally fascinated by robotics. I've always loved robotics.Swyx [00:37:33]: Embodied agents as a business, you know; Figure is like a big, also sort of OpenAI-affiliated company that's raised a lot of money.David [00:37:39]: I think it's cool. I think, I mean, I don't know exactly what they're doing, but...Swyx [00:37:44]: Robots. Yeah.David [00:37:46]: Well, I mean, that's a...Swyx [00:37:47]: Yeah. What question would you ask? If we had them on, what would you ask them?David [00:37:50]: Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it.Swyx [00:37:57]: And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there, isn't that a strategy? Oh yeah.David [00:38:04]: Yeah. Sorry. I'm not questioning whether they're doing smart things. I genuinely don't know what they're doing as much, but I think there's two things. One, I'm so excited for someone to train a foundation model of robots. It's just, I think it's just going to work.
Like, I will die on this hill. I mean, this whole time we've been on this podcast, we've just been continually saying these models are basically behavioral cloners, right? So let's go behavioral clone all this robot behavior, right? And then you figure out everything else you have to do in order to teach it how to solve a new problem. That's going to work. I'm super stoked for that. I think, unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero-sum job replacement play, right? And I'm personally less excited about that.Alessio [00:38:46]: We had Kanjun from Imbue on the podcast. We asked her why people should go work there and not at Adept.Swyx [00:38:52]: Oh, that's so funny.Alessio [00:38:54]: Well, she said, you know, there's space for everybody in this market. We're all doing interesting work. And she said they're really excited about building an operating system for agents, and for her, the biggest research thing was getting models better at reasoning and planning for these agents. The reverse question to you, you know: why should people be excited to come work at Adept instead of Imbue? And maybe what are the core research questions that people should be passionate about to have fun at Adept? Yeah.David [00:39:22]: First off, I think that, and I'm sure you guys believe this too, the AI space, to the extent there's an AI space, and the AI agent space are both, exactly as she likely said, colossal opportunities, and people are just going to end up winning in different areas, and a lot of companies are going to do well. So I really don't feel that zero-sum thing at all. To change the zero-sum framing: why should you be at Adept? I think there's two huge reasons to be at Adept. One of them is that everything we do is in the service of useful agents. We're not a research lab. We do a lot of research in service of that goal, but we don't think about ourselves as a classic research lab at all. And I think the second reason to work at Adept is: if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the example of why that's true is our evaluations. They're not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things; we can do some of them, and these other ones that they want, we can't do at all. We've turned those into evals. Solve it, right? I think that's really cool. Everybody knows a lot of these evals are pretty saturated, and even with the new ones that are not saturated, you look at one and you're like, is this actually useful, right? I think that's a degree of practicality that really helps. We're equally excited about the same problems around reasoning and planning and generalization and all of this stuff, but they're very grounded in actual needs right now, which is really cool.Swyx [00:40:45]: Yeah. This has been a wonderful dive. You know, I wish we had more time, but I would just leave it kind of open to you. I think you have broad thoughts, you know, just about

Ingenios@s de Sistemas
Episode 315 - Narrating the Future: Storytelling, Innovation and AI

Ingenios@s de Sistemas

Play Episode Listen Later Mar 10, 2024 23:59


This week we talk about how to leverage AI to communicate effectively in a present so flooded with information that you either stand out or die. News: Elon Musk sues Sam Altman and OpenAI. ChatGPT quietly rolls out the "Read Aloud" feature. Anthropic launches Claude 3, beating GPT-4 and Gemini Ultra in performance tests. Madonna adopts Runway for her visuals. The US military is testing AI chatbots in military strategy simulations. Stability AI creates a near-instant 3D object generator. OpenAI responds to Elon Musk. Microsoft is alerted to harmful content generated by its AI, Copilot Designer. A Google engineer steals AI secrets and is caught. Inflection launches a significant upgrade to its personal AI assistant. Tools: WisdomPlan: AI-powered personalized learning to master any skill your way (Link). StickerBaker: Bake custom stickers instantly with AI-generated designs (Link). Instanice: Effortlessly transform your photos with stunning aesthetic effects (Link). Survicate: Get actionable insights sooner with AI-powered surveys and analytics (Link). Chart Analyst GPT: Analyze charts effortlessly with a powerful, educational solution (Link). Osum: Carry out exhaustive market research in seconds; try it free (use the checkout code) (Link). PodBravo: Boost podcast production with automated content creation (Link). Click AI: Automate your workflow with maintenance-free QA automation (Link). Cover Letter GPT: Boost job application success with personalized cover letters (Link). AI Studios: AI-powered content creation and video generation (Link). Exa: Connect AI to the internet for better context awareness (Link). DATAKU: Advanced extraction and transformation of unstructured text data (Link). Outfit GPT: Refresh your style with personal GPT fashion advice (Link). Reading Club: Bring children's stories to life with AI (Link). Chat with MLX: A macOS app for efficient document interaction (Link). DeepFashion: Create personalized looks with AI-generated fashion images (Link). Ema: A trusted AI employee that automates complex workflows (Link). Lummi: The ultimate source of AI-generated stock photos (Link). Reporfy: Create and share detailed AI-powered reports (Link). Aili: Web page summarization and collection (Link). Read This AI: Effortlessly transform text into high-quality audio (Link). Vmaker Video Editor: Turn raw video into publish-ready content (Link). Join the academy. Telegram channel and YouTube channel. Ask via WhatsApp: +34 620 240 234. Leave me a voice message.

Business of Tech
Fri Mar-8-2024:NIST Budget Cuts and AI Safety, Anthropic Claude 3, Future of Prompt Engineering

Business of Tech

Play Episode Listen Later Mar 8, 2024 10:30


On today's episode of "The Business of Tech," host Dave Sobel discusses NIST's challenges with budget cuts and AI safety responsibilities. Anthropic launches Claude 3, surpassing GPT-4 and Gemini Ultra in AI benchmarks. The discussion highlights NIST's struggles with limited resources despite its critical role in AI oversight, potentially jeopardizing President Biden's AI regulation plans. Three things to know today: 00:00 NIST Struggles with Budget Cuts Amid Growing AI Safety Responsibilities | 02:50 Anthropic Launches Claude 3, Surpassing GPT-4 and Gemini Ultra in AI Benchmarks | 04:41 Our Friday Big Ideas. Supported by: https://huntress.com/mspradio/ Looking for a link from the stories? The entire script of the show, with links to articles, is posted in each story on https://www.businessof.tech/ Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/ Support the show on Patreon: https://patreon.com/mspradio/ Want our stuff? Cool Merch? Wear "Why Do We Care?" - Visit https://mspradio.myspreadshop.com Follow us on: LinkedIn: https://www.linkedin.com/company/28908079/ YouTube: https://youtube.com/mspradio/ Facebook: https://www.facebook.com/mspradionews/ Instagram: https://www.instagram.com/mspradio/ TikTok: https://www.tiktok.com/@businessoftech

The top AI news from the past week, every ThursdAI

Hello hello everyone, happy spring! Can you believe it? It's already spring! We have tons of AI news for you to cover, starting with the most impactful one: did you already use Claude 3? Anthropic decided to celebrate Claude 1's birthday early (which btw is also ThursdAI's birthday and GPT-4's release date, March 14th, 2023) and gave us 3 new Claudes: Opus, Sonnet and Haiku. TL;DR of all topics covered: * Big CO LLMs + APIs*

Retail Daily Minute
Target's Membership Program, Wayfair's Consolidated Delivery Strategy, Amazon's Fee Changes, & Anthropic's AI Advancements

Retail Daily Minute

Play Episode Listen Later Mar 6, 2024 5:13


Welcome to Omni Talk's Retail Daily Minute. Stay informed with today's top headlines in retail innovation: Target Launches Target Circle 360: Target introduces its paid membership program, Target Circle 360, offering unlimited free same-day delivery and other perks for a subscription fee. Wayfair's Consolidated Delivery Option: Wayfair announces plans to offer consolidated delivery for business-to-consumer shipments, allowing customers to schedule multiple items for delivery on the same date. Amazon's New Fees Raise Concern Among Sellers: Amazon faces backlash from sellers over new fees, including inbound placement fees and penalties for low inventory levels. Anthropic Unveils Claude 3 AI Models: Anthropic, a startup founded by former OpenAI executives, introduces its latest AI models, Claude 3, boasting multimodal support and outperforming competitors like GPT-4 and Google's Gemini Ultra. Stay tuned to Omni Talk's Retail Daily Minute for more updates on cutting-edge developments shaping the retail industry, and don't forget to use our code "OMNITALK" to register for Shoptalk, which is only 10 days away! #RetailInnovation #AIAdvancements

Techmeme Ride Home
Mon. 03/04 – EU Brings The Hammer Down On Apple

Techmeme Ride Home

Play Episode Listen Later Mar 4, 2024 15:58


The EU Commission has fined Apple for stifling music streaming competition. New MacBook Airs with the M3 chip. Why the Apple Car was doomed from day one. Anthropic releases Claude 3 in three different flavors. And if 5G isn't floating your boat, can I interest you in 5G Advanced? Links: Apple hit with €1.8bn fine for breaking EU law over music streaming (Financial Times); Apple launches new 13-inch and 15-inch MacBook Air with M3 chip, support for two external displays, faster Wi-Fi (9to5Mac); Apple's Car Was Doomed by Its Lofty Ambitions to Outdo Tesla (Bloomberg); Anthropic unveils Claude 3, surpassing GPT-4 and Gemini Ultra in benchmark tests (VentureBeat); Google-backed Anthropic debuts its most powerful chatbot yet, as generative AI battle heats up (CNBC); Telcos are barely done rolling out 5G networks — and they're already talking about ‘5.5G' (CNBC). See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

The top AI news from the past week, every ThursdAI

Happy Leap Day everyone, very excited to bring you a special once-every-four-years edition of ThursdAI

SuperDataScience
761: Gemini Ultra: How to Release an A.I. Product for Billions of Users, with Google's Lisa Cohen

SuperDataScience

Play Episode Listen Later Feb 27, 2024 70:15


Google's Gemini Ultra takes the spotlight this week, as host Jon Krohn welcomes Lisa Cohen, Google's Director of Data Science and Engineering, for a conversation about the launch of Gemini Ultra. Discover the capabilities of this cutting-edge large language model and how it stands toe-to-toe with GPT-4. Lisa shares her insights on the development, rollout, and potential of Gemini Ultra in reshaping various sectors. Whether you're a data science professional, tech enthusiast, or curious about the future of AI, this episode offers a deep dive into one of the most significant advancements in artificial intelligence. This episode is brought to you by Ready Tensor, where innovation meets reproducibility (https://www.readytensor.ai/), and by Intel and HPE Ezmeral Software Solutions (https://hpe.com/ezmeral/chatbots). Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Google's Gemini model family and Lisa's key responsibilities [04:55] • How LLMs will transform the practice of Data Science [19:47] • Lisa on prompt engineering and reinforcement learning from human feedback [24:38] • How to fine-tune Gemini models with Google's Vertex AI [30:52] • How AI-assistants will transform life and work for everyone from data scientists to educators to children [47:14] • The challenges of developing a data-centric culture [57:31] • Centralized vs decentralized data science teams [1:03:50] Additional materials: www.superdatascience.com/761

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 215: OpenAI kills plugins, Tyler Perry stalls $800 million expansion due to AI and more AI News That Matters - Feb. 26th, 2024

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Feb 26, 2024 41:16


ChatGPT Plugins are on their way out! Tyler Perry is putting his studio expansion on hold due to AI, and Google is making TONS of news right now! Here's this week's AI news that matters and why it's important. Newsletter: Sign up for our free daily newsletter. More on this Episode: Episode page. Join the discussion: Ask Jordan questions on AI. Related Episodes: Ep 211: OpenAI's Sora – The larger impact that no one's talking about; Ep 204: Google Gemini Advanced – 7 things you need to know. Tomorrow's Show: How to stand out in a world where everyone can create an AI startup? Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup. Website: YourEverydayAI.com. Email The Show: info@youreverydayai.com. Connect with Jordan on LinkedIn. Timestamps: 03:42 Tyler Perry concerned about AI job loss. 07:22 OpenAI Sora video excels over other platforms. 12:54 11 Labs updated model, ChatGPT phasing out. 15:27 Plugin packs for ChatGPT. 16:55 Limitations on using multiple GPTs for now. 22:16 Unsatisfied with Google Gemini Enterprise integration. 23:13 Google and Reddit partnership for language models. 28:39 Google Gemini images paused due to diversity concerns. 31:16 Google now has three Gemini models. 34:54 Best text-to-speech AI. 37:11 AI content creation raises copyright concerns. Topics Covered in This Episode: 1. OpenAI's changes and future focus. 2. Google's significant AI content deal with Reddit. 3. Google's AI model developments and issues. 4. Trends in AI utilization within the entertainment industry. Keywords: OpenAI, GPT, AI agents, AI assistants, prime prompt polish program, Google, Reddit, AI content licensing deal, AI models, search engine, Gemini AI, large language models, user-generated content, university student data, Google Imagen 2, Gemma, Gemini Ultra, Gemini Pro, Gemini Nano, Tyler Perry, Sora, AI in entertainment, text-to-speech AI, business productivity, ChatGPT plugins, Well Said Labs, Asura, AI video platforms, Perry's studio expansion, AI regulation

The top AI news from the past week, every ThursdAI

Hey, this is Alex. Ok, let's start with the big news: holy crap, this week was a breakthrough week for speed! We had both Groq explode in popularity, and ByteDance release an updated SDXL model called Lightning, able to generate full-blown SDXL 1024 images in 300ms. I've been excited about seeing what real-time LLM/diffusion can bring, and with both of these news releases the same week, I just had to go and test them out together. Additionally, we had Google step into a big open-weights role and give us Gemma, 2 open-weights models, 2B and 7B (which is closer to 9B per Junyang), and it was great to see Google committing to releasing at least some models in the open. We also had breaking news: Emad from Stability announced SD3, which looks really great, Google to pay Reddit $200M for AI training on their data, & a few more things.
TL;DR of all topics covered:
* Big CO LLMs + APIs
* Groq custom LPU inference does 400T/s Llama/Mistral generation (X, Demo)
* Google image generation is in hot water and was reportedly paused (refuses to generate white people)
* Gemini 1.5 long context is very impressive to folks (Matt Shumer, Ethan Mollick)
* Open Weights LLMs
* Google releases Gemma, open weights 2B and 7B models (Announcement, Models)
* Teknium releases Nous Hermes DPO (Announcement, HF)
* Vision & Video
* YOLO v9 - SOTA real-time object detector is out (Announcement, Code)
* This week's Buzz (What I learned in WandB this week)
* Went to SF to cohost an event with A16Z, Nous, Mistral (Thread, My Report)
* AI Art & Diffusion & 3D
* ByteDance presents SDXL-Lightning (Try here, Model)
* Stability announces Stable Diffusion 3 (Announcement)
* Tools
* Replit releases a new experimental Figma plugin for UI → Code (Announcement)
* Arc browser adds "AI pinch to understand" summarization (Announcement)
Big CO LLMs + APIs
Groq's new LPU shows extreme performance for LLMs - up to 400T/s (example)
* Groq created a novel processing unit known as the Tensor Streaming Processor (TSP), which they categorize as a Language Processing Unit (LPU). Unlike traditional GPUs, which are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations.
* Analogy: they know where all the cars are going when everyone wakes up for work (when they compile) and how fast they all drive (compute latency), so they can get rid of traffic lights (routers) and turn lanes (backpressure) by telling everyone when to leave the house.
* Why would we need something like this? Some folks are saying that average human reading is only 30T/s. I created an example that uses near-instant Groq Mixtral + Lightning SDXL to just create images, with Mixtral as my prompt manager (see the sketch at the end of this entry).
Open Source Weights LLMs
Google Gemma - 2B and 7B open weights models (demo)
* 4 hours after release, Llama.cpp added support, Ollama and LM Studio added support, Tri Dao added Flash Attention support
* Vocab size is 256K
* 8K context window
* Tokenizer similar to Llama
* Folks are... not that impressed as far as I've seen
* Trained on 6 trillion tokens
* Google also released Gemma.cpp (local CPU inference) - Announcement
Nous/Teknium re-release Nous Hermes with DPO finetune (Announcement)
* DPO RLHF is performing better than previous models
* Models are GGUF and can be found here
* DPO enables improvements across the board
This week's Buzz (What I learned with WandB this week)
* Alex was in SF last week
* A16Z + 20-something cohosts, including Weights & Biases, talked about the importance of open source
* Huge shoutout to Rajko and Marco from A16Z, and tons of open-source folks who joined
* Nous, Ollama, LlamaIndex, LMSys folks, Replicate, Perplexity, Mistral, GitHub, as well as Eric Hartford, Jon Durbin, Haotian Liu, HuggingFace, tons of other great folks from Mozilla and the Linux Foundation, and Percy from Together/Stanford
Also had a chance to check out one of the smol dinners in SF; they go really hard. Had a great time showing folks the Vision Pro, chatting about AI, seeing incredible demos, and chatting about meditation and spirituality all at the same time!
AI Art & Diffusion
ByteDance presents SDXL-Lightning (Try here)
* Lightning-fast SDXL with 2, 4 or 8 steps
* Results much closer to original SDXL than the turbo version from a few months ago
Stability announces Stable Diffusion 3 (waitlist)
* Uses a Diffusion Transformer architecture (like Sora)
* Impressive multi-subject prompt following. Prompt: "a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words 'stable diffusion'"
Tools
* Replit announces a new Figma design → code plugin
That's it for today. Definitely check out the full conversation with Mark Heaps from Groq on the pod, and see you next week!
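For readers who want to try the "Mixtral as my prompt manager" pipeline described above, here is a minimal sketch of the idea: a Groq-hosted Mixtral call expands a rough idea into a detailed image prompt, which is then posted to an SDXL-Lightning image endpoint. The Groq client usage follows its OpenAI-style chat API; the SDXL_LIGHTNING_URL endpoint and its request/response shape are hypothetical stand-ins rather than anything specified in the newsletter.

```python
# Minimal sketch: Groq-hosted Mixtral as a "prompt manager" feeding SDXL-Lightning.
# Assumptions: the `groq` Python client (OpenAI-style chat API) and a
# hypothetical SDXL-Lightning HTTP endpoint that accepts {"prompt": ...}
# and returns raw image bytes. Neither detail comes from the newsletter itself.
import os

import requests
from groq import Groq

SDXL_LIGHTNING_URL = "https://example.com/sdxl-lightning"  # hypothetical endpoint

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def expand_prompt(idea: str) -> str:
    """Ask Mixtral (served on Groq's LPU) to turn a rough idea into a vivid image prompt."""
    resp = client.chat.completions.create(
        model="mixtral-8x7b-32768",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's idea as one vivid, concrete prompt for an image model."},
            {"role": "user", "content": idea},
        ],
    )
    return resp.choices[0].message.content

def generate_image(prompt: str, path: str = "out.png") -> str:
    """Send the expanded prompt to the (hypothetical) SDXL-Lightning endpoint and save the image."""
    r = requests.post(SDXL_LIGHTNING_URL, json={"prompt": prompt, "steps": 4}, timeout=60)
    r.raise_for_status()
    with open(path, "wb") as f:
        f.write(r.content)
    return path

if __name__ == "__main__":
    # Both stages return in well under a second, so the whole loop feels near real time.
    print(generate_image(expand_prompt("a cozy cabin in a pine forest at dusk")))
```

Swapping in any 4-step SDXL-Lightning host, for example a Replicate or local deployment, only changes generate_image; the Mixtral prompt-manager stage stays the same.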

The Nonlinear Library: LessWrong
LW - AI #51: Altman's Ambition by Zvi

The Nonlinear Library: LessWrong

Play Episode Listen Later Feb 21, 2024 58:51


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #51: Altman's Ambition, published by Zvi on February 21, 2024 on LessWrong. [Editor's note: I forgot to post this to WordPress on Thursday. I'm posting it here now. Sorry about that.] Sam Altman is not playing around. He wants to build new chip factories in the decidedly unsafe and unfriendly UAE. He wants to build up the world's supply of energy so we can run those chips. What does he say these projects will cost? Oh, up to seven trillion dollars. Not a typo. Even scaling back the misunderstandings, this is what ambition looks like. It is not what safety looks like. It is not what OpenAI's non-profit mission looks like. It is not what it looks like to have concerns about a hardware overhang, and use that as a reason why one must build AGI soon before someone else does. The entire justification for OpenAI's strategy is invalidated by this move. I have spun off reactions to Gemini Ultra to their own post. Table of Contents: Introduction. Table of Contents. Language Models Offer Mundane Utility. Can't go home? Declare victory. Language Models Don't Offer Mundane Utility. Is AlphaGeometry even AI? The Third Gemini. Its own post, link goes there. Reactions are mixed. GPT-4 Real This Time. Do you remember when ChatGPT got memory? Deepfaketown and Botpocalypse Soon. Bot versus bot, potential for AI hacking. They Took Our Jobs. The question is, will they also take the replacement jobs? Get Involved. A new database of surprising AI actions. Introducing. Several new competitors. Altman's Ambition. Does he actually seek seven trillion dollars? Yoto. You only train once. Good luck! I don't know why. Perhaps you'll die. In Other AI News. Andrej Karpathy leaves OpenAI, self-discover algorithm. Quiet Speculations. Does every country need their own AI model? The Quest for Sane Regulation. A standalone post on California's SB 1047. Washington D.C. Still Does Not Get It. No, we are not confused about this. Many People are Saying. New Yorkers do not care for AI, want regulations. China Watch. Not going great over there, one might say. Roon Watch. If you can. How to Get Ahead in Advertising. Anthropic Super Bowl ad. The Week in Audio. Sam Altman at the World Government Summit. Rhetorical Innovation. Several excellent new posts, and a protest. Please Speak Directly Into this Microphone. AI killer drones now? Aligning a Smarter Than Human Intelligence is Difficult. Oh Goody. Other People Are Not As Worried About AI Killing Everyone. Timothy Lee. The Lighter Side. So, what you're saying is… Language Models Offer Mundane Utility: Washington D.C. government exploring using AI for mundane utility. Deliver your Pakistani presidential election victory speech while you are in prison. Terence Tao suggests a possible application for AlphaGeometry. Help rescue your Factorio save from incompatible mods written in Lua. Shira Ovide says you should use it to summarize documents, find the exact right word, get a head start on writing something difficult, dull or unfamiliar, or make cool images you imagine, but not to use it to get info about an image, define words, identify synonyms, get personalized recommendations or to give you a final text. Her position is mostly that this second set of uses is unreliable. Which is true, and you do not want to exclusively or non-skeptically rely on the outputs, but so what? Still seems highly useful.
Language Models Don't Offer Mundane Utility: AlphaGeometry is not about AI? It seems that what AlphaGeometry is mostly doing is combining DD+AR, essentially labeling everything you can label and hoping the solution pops out. The linked post claims that doing this without AI is good enough in 21 of the 25 problems that it solved, although a commenter notes the paper seems to claim it was somewhat less than that. If it was indeed 21, and to some extent even if it wasn't...


Leveraging AI
64 | 12 Converging Technologies That Are Changing Our World and Their Impact On Business

Leveraging AI

Play Episode Listen Later Feb 20, 2024 26:00 Transcription Available


Are We Ready for the AI Revolution That's About to Reshape Our World? In this episode of Leveraging AI, Isar Meitis talked about the convergence of groundbreaking technologies poised to revolutionize how we live, work, and conduct business. From advanced AI models to quantum computing leaps, we explore what's on the horizon for business leaders, C-suite executives, and entrepreneurs. Topics include: • The rapid evolution of AI models: from GPT-3.5 to Gemini Ultra 1.0, and the looming GPT-5 • Cutting-edge AI infrastructure developments • AI agents: autonomous tools making decisions and taking actions on our behalf • Next-gen video generation: crafting realistic videos with AI, and the implications for business and beyond • The future of computing: quantum leaps, AI-driven humanoid robots, and immersive technologies like Apple Vision Pro • Brain-computer interfaces: the potential of becoming cyborgs to stay ahead in the AI race • Sustainable energy advancements: nuclear fusion and its role in powering our AI-driven future. This episode isn't just about understanding the future; it's about preparing for it. Whether you're at the helm of a startup or leading a Fortune 500 company, the insights shared today could be the key to unlocking unimaginable opportunities and navigating the challenges of tomorrow's business landscape. Tune in, get inspired, and let's navigate the future of business together. Don't forget to subscribe for more episodes that equip you with the knowledge to lead in the AI era. About Leveraging AI: The Ultimate AI Course for Business People: https://multiplai.ai/ai-course/ YouTube Full Episodes: https://www.youtube.com/@Multiplai_AI/ Connect with Isar Meitis: https://www.linkedin.com/in/isarmeitis/ Free AI Consultation: https://multiplai.ai/book-a-call/ If you've enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Futurum Tech Podcast
Google Gemini Advanced: Google's Counter to Copilot? | The AI Moment – Episode 15

Futurum Tech Podcast

Play Episode Listen Later Feb 19, 2024 14:38


On this episode of The AI Moment, we discuss an emerging generative AI trend – the launch of Google Gemini Advanced, Google's counter to Microsoft Copilot. First Microsoft's Copilot, now Google Gemini Advanced. In the span of just a few months, sophisticated generative AI assistants have been made available to the mass market, and there are more to come. In the big picture, there are different drivers for each of the tech giants.   

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 210: OpenAI Sora, Gemini Ultra 1.5, NVIDIA Chat with RTX, Andrej Karpathy leaves OpenAI and more. AI News That Matters - Feb. 19th, 2024

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Feb 19, 2024 38:53


The last 7 days of AI news have been crazy! OpenAI announced its amazing text-to-video Sora, Gemini released Ultra 1.5, NVIDIA's Chat with RTX, Andrej Karpathy leaves OpenAI, and more! Here's this week's AI news that matters. Newsletter: Sign up for our free daily newsletter. More on this Episode: Episode page. Join the discussion: Ask Jordan questions on AI. Read the newsletter on this episode: Read it here. Related Episodes: Ep 204: Google Gemini Advanced – 7 things you need to know; Ep 181: New York Times vs. OpenAI – The huge AI implications no one is talking about. Tomorrow's Show: OpenAI's Sora: The larger impact that no one's talking about. Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup. Website: YourEverydayAI.com. Email The Show: info@youreverydayai.com. Connect with Jordan on LinkedIn. Timestamps: 02:40 OpenAI text-to-video model Sora. 07:50 Gemini 1.5 update: developer and enterprise access. 15:19 NVIDIA's Chat with RTX. 19:18 OpenAI announces 2 new AI agents. 21:12 OpenAI and Sam Altman's ambitious plans. 25:21 Tech companies form unofficial accord to avoid penalties. 29:14 Discussion on AI models and election safety. 31:18 AI companies shift data acquisition to formal agreements. 33:56 Potential impact of AI on publishing industry. Topics Covered in This Episode: 1. OpenAI's Sora. 2. Google Gemini 1.5. 3. NVIDIA Chat with RTX. 4. Andrej Karpathy's departure from OpenAI. 5. Reddit's AI content deal. Keywords: large language models, Star Wars, token memory, ChatGPT, Google, Gemini Ultra 1.5, NVIDIA, Chat with RTX, Andrej Karpathy, OpenAI, generative AI, Everyday AI, OpenAI Sora model, AI legislation, social media regulations, disinformation/misinformation, 2024 US elections, Reddit, AI content deal, AI companies, content providers, Sora, text-to-video model, Tesla, AI agents, Sam Altman, GPU chips, deep fakes. Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/

Loop Infinito (by Applesfera)
Testing Gemini Advanced

Loop Infinito (by Applesfera)

Play Episode Listen Later Feb 19, 2024 11:49


Google has launched Gemini as the successor to Bard and as a competitor to ChatGPT, offering an advanced model, Gemini Ultra, for paying subscribers, who get access to Gemini Advanced. It stands out for its speed and efficiency in common tasks and translations, although ChatGPT may have the edge in creativity and detail. Loop Infinito is a podcast by Applesfera, hosted by Javier Lacort and edited by Alberto de la Torre. Contact the author on Twitter (@jlacort) or by email (lacort@xataka.com). Thanks for listening to this podcast.

That Was The Week
And The Oscar Goes to Sora

That Was The Week

Play Episode Listen Later Feb 16, 2024 33:40


Hats Off To This Week's Contributors: @RyanMorrisonJer, @geneteare, @mgsiegler, @spyglass_feed, @saulausterlitz, @ClareMalone, @benedictevans, @mikeloukides, @ErikNaso, @kateclarktweets, @finkd, @mattbirchler, @imillhiser, @jaygoldberg, @ron_miller, @btaylor, @sierraplatform, @eladgil
Contents
* Editorial
* Essays of the Week
* AI Leads New Unicorn Creation As Ranks Of $1B Startups Swells
* Behold: The Sports Streaming Bundle
* 40 Years Ago, This Ad Changed the Super Bowl Forever
* Is the Media Prepared for an Extinction-Level Event?
* Video of the Week
* AI and Everything Else - Benedict Evans from Slush
* AI of the Week
* The OpenAI Endgame
* OpenAI Sora – The most realistic AI-generated video to date
* I Was Wrong. We Haven't Reached Peak AI Frenzy.
* News Of the Week
* I tried Vision Pro. Here's my take
* The Quest 3 is better than you might expect
* The Supreme Court will decide if the government can seize control of YouTube and Twitter
* Arm Results Set The World On Fire
* Startup of the Week
* Bret Taylor's new AI company aims to help customers get answers and complete tasks automatically
* X of the Week
* Elad Gil on AI
Editorial: And The Oscar Goes to Sora
OpenAI teased its new video creation model, Sora, this week. In doing so, it released a technical report and several examples of prompts and outputs. Careful not to overstate the endgame, the company said:
We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
All of the videos are incredible, albeit only a minute or less each. My favorite is the Dogs in Snow video, although the 'Closeup Man in Glasses' is also wonderful. I mention this because the speed at which AI is addressing new fields is, in my opinion, mind-boggling. Skills that take humans decades to perfect are being learned in months and are capable of scaling to infinite outputs using words, code, images, video, and sound. It will take the advancement of robotics to tie these capabilities to physical work, but that seems assured to happen. When engineering, farming, transport, or production meets AI, then human needs can be addressed directly.
Sora winning an Oscar for cinematography, or for producing from a script or a book, seems far-fetched. But it wasn't so long ago that a tech company doing so would have been laughable, and now we have Netflix, Amazon Prime, and Apple TV Plus regularly being nominated or winning awards. Production will increasingly be able to leverage AI. Some will say this is undermining human skills, but I think the opposite: it will release human skills. Take the prompt that produced the Dogs in Snow video. Prompt: "A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in." I can imagine that idea and write it down. But my skills would not allow me to produce it. Sora opens my imagination and enables me to act on it. I guess that many humans have creative ideas that they are unable to execute… up to now.
Sora, DALL-E, and ChatGPT all focus on releasing human potential. Google released its Gemini 1.5 model this week (less than a month after releasing Gemini Ultra 1.0). Tom's Guide has a summary and analysis by Ryan Morrison:
Gemini Pro 1.5 has a staggering 10 million token context length. That is the amount of content it can store in its memory for a single chat or response. This is enough for hours of video or multiple books within a single conversation, and Google says it can find any piece of information within that window with a high level of accuracy. Jeff Dean, Google DeepMind's Chief Scientist, wrote on X that the model also comes with advanced multimodal capabilities across code, text, image, audio and video. He wrote that this means you can "interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more." In "needle-in-a-haystack" testing, where they look for the needle in the vast amount of data stored in the context window, they were able to find specific pieces of information with 99.7% accuracy even with 10 million tokens of data.
All of this makes it easy to understand why Kate Clark at The Information penned a piece with the title "I Was Wrong. We Haven't Reached Peak AI Frenzy." I will leave this week's editorial with Ryan Morrison's observation at the end of his article:
What we are seeing with these advanced multimodal models is the interaction of the digital and the real, where AI is gaining a deeper understanding of humanity and how WE see the world.
Essays of the Week
AI Leads New Unicorn Creation As Ranks Of $1B Startups Swells
February 13, 2024
Gené Teare @geneteare
Fewer startups became unicorns in 2023, but The Crunchbase Unicorn Board also became more crowded, as exits became even scarcer. That means that 10 years after the term "unicorn" was coined to denote those private startups valued at $1 billion or more, there are over 1,500 current unicorn companies globally, collectively valued at more than $5 trillion based on their most recent valuations from funding deals. All told, fewer than 100 companies joined the Unicorn Board in 2023, the lowest count in more than five years, an analysis of Crunchbase data shows.
Of the 95 companies that joined the board in 2023, AI was the leading sector, adding 20 new unicorns alone. Other leading unicorn sectors in 2023 included fintech (with 14 companies), cleantech and energy (12 each), and semiconductors (nine). Based on an analysis of Crunchbase data, 41 companies joined the Unicorn Board from the U.S. and 24 from China in 2023. Other countries were in the single digits for new unicorns: Germany had four new companies, while India and the U.K. each had three.
New records nonetheless
Despite the slower pace of new unicorns, the Crunchbase board of current private unicorns has reached new milestones as fewer companies exited the board in 2023. The total number of global unicorns on our board reached 1,500 at the start of 2024, which takes into account the exclusion of those that have exited via an M&A or IPO transaction.
Altogether, these private unicorn companies have raised north of $900 billion from investors. This year also marks a decade since investor Aileen Lee of Cowboy Ventures coined the term unicorn for private companies valued at a billion dollars or more. In a new report looking at the unicorn landscape 10 years later, Lee said she believes the unicorn phenomenon is not going away, despite a sharp downturn in venture funding in recent years. She expects more than 1,000 new companies in the U.S. alone will join the ranks in the next decade.
Unicorn exits
In 2023, 10 unicorn companies exited the board via an IPO, far fewer than in recent years. That contrasts with 20 companies in 2022 and 113 in 2021. However, M&A was more active in 2023. Sixteen unicorn companies were acquired in 2023 — up from 2022 when 11 companies were acquired and slightly down from 2021 with 21 companies exiting via an acquisition.
December numbers
Eight new companies joined The Crunchbase Unicorn Board in December 2023. The highest monthly count last year for new unicorns was 10 and the lowest was two. Of the new unicorns, three are artificial intelligence companies. Other sectors that minted unicorns in December include fintech, cybersecurity, food and beverage, and health care. The new unicorn companies minted in December 2023 were: …
Behold: The Sports Streaming Bundle
It just makes sense. Sports was the last thing holding together the cable TV bundle. Now it will be the start of the streaming bundle. That's my 5-minute reaction to the truly huge news that Disney, Warner, and Fox are launching a new sports streaming service, combining their various sports rights into one package. Well, presumably. The details are still quite thin at this point. Clearly, several entities were racing to this story, with both WSJ and Bloomberg claiming "scoops" by publishing paragraph-long stories with only the high-level facts. I'm linking to Variety above, which at least has a few more details, including (canned) quotes from Bob Iger, Lachlan Murdoch, and David Zaslav.
Fox Corp., Warner Bros. Discovery and Disney are set to launch a new streaming joint venture that will make all of their sports programming available under a single broadband roof, a move that will put content from ESPN, TNT and Fox Sports on a new standalone app and, in the process, likely shake up the world of TV sports. The three media giants are slated to launch the new service in the fall. Subscribers would get access to linear sports networks including ESPN, ESPN2, ESPNU, SECN, ACCN, ESPNEWS, ABC, Fox, FS1, FS2, BTN, TNT, TBS, truTV and ESPN+, as well as hundreds of hours from the NFL, NBA, MLB and NHL and many top college divisions. Pricing will be announced at a later date. Each company would own one third of the new outlet and license their sports content to it on a non-exclusive basis. The service would have a new brand and an independent management team.
Yes, this is essentially running the Hulu playbook of old, but only for sports content. No, that ultimately didn't end well, but Hulu had a decent enough run before egos got involved.1 Here, the egos are once again being (at least temporarily) set aside to do something obvious: make money. Sports is the one bit of content that most people watch in one form or another, live no less (hence why it was keeping the cable bundle together). And increasingly, with the rise of streaming, it was becoming impossible to figure out what game was on, where.
You could get access to most games online now, but it might require buying four or five different services. And again, then finding which one the game you wanted was actually on...More

40 Years Ago, This Ad Changed the Super Bowl Forever
An oral history of Apple's groundbreaking "1984" spot, which helped to establish the Super Bowl as TV's biggest commercial showcase.
By Saul Austerlitz
Published Feb. 9, 2024, Updated Feb. 10, 2024

Four decades ago, the Super Bowl became the Super Bowl.

It wasn't because of anything that happened in the game itself: On Jan. 22, 1984, the Los Angeles Raiders defeated Washington 38-9 in Super Bowl XVIII, a contest that was mostly over before halftime. But during the broadcast on CBS, a 60-second commercial loosely inspired by a famous George Orwell novel shook up the advertising and the technology sectors without ever showing the product it promoted.

Conceived by the Chiat/Day ad agency and directed by Ridley Scott, then fresh off making the seminal science-fiction noir "Blade Runner," the Apple commercial "1984," which was intended to introduce the new Macintosh computer, would become one of the most acclaimed commercials ever made. It also helped to kick off — pun partially intended — the Super Bowl tradition of the big game serving as an annual showcase for gilt-edged ads from Fortune 500 companies. It all began with the Apple co-founder Steve Jobs's desire to take the battle with the company's rivals to a splashy television broadcast he knew nothing about.

In recent interviews, several of the people involved in creating the "1984" spot — Scott; John Sculley, then chief executive of Apple; Steve Hayden, a writer of the ad for Chiat/Day; Fred Goldberg, the Apple account manager for Chiat/Day; and Anya Major, the actor who famously threw the sledgehammer — looked back on how the commercial came together, its inspiration and the internal objections that almost kept it from airing. These are edited excerpts from the conversations.

JOHN SCULLEY: On Oct. 19, 1983, we're all sitting around in Steve [Jobs's] building, the Mac building, and the cover of Businessweek says, "The Winner is … IBM." We were pretty deflated because this was the introduction of the IBM PCjr, and we hadn't even introduced the Macintosh yet.

STEVE HAYDEN: Jobs said, "I want something that will stop the world in its tracks." Our media director, Hank Antosz, said, "Well, there's only one place that can do that — the Super Bowl." And Steve Jobs said, "What's the Super Bowl?" [Antosz] said, "Well, it's a huge football game that attracts one of the largest audiences of the year." And [Jobs] said, "I've never seen a Super Bowl. I don't think I know anybody who's seen a Super Bowl."

FRED GOLDBERG: The original idea was actually done in 1982. We presented an ad [with] a headline, which was "Why 1984 Won't Be Like '1984,'" to Steve Jobs, and he didn't think the Apple III was worthy of that claim...More

Is the Media Prepared for an Extinction-Level Event?
Ads are scarce, search and social traffic is dying, and readers are burned out. The future will require fundamentally rethinking the press's relationship to its audience.
Clare Malone
February 10, 2024

My first job in media was as an assistant at The American Prospect, a small political magazine in Washington, D.C., that offered a promising foothold in journalism. I helped with the print order, mailed checks to writers—after receiving lots of e-mails asking, politely, Where is my money?—and ran the intern program.
This last responsibility allowed me a small joy: every couple of weeks, a respected journalist would come into the office for a brown-bag lunch in our conference room, giving our most recent group of twentysomethings a chance to ask for practical advice about "making it." One man told us to embrace a kind of youthful workaholism, before we became encumbered by kids and families. An investigative reporter implored us to file our taxes and to keep our personal lives in order—never give the rich and powerful a way to undercut your journalism. But perhaps the most memorable piece of advice was from a late-career writer who didn't mince words. You want to make it in journalism, he said? Marry rich. We laughed. He didn't.

I've thought a lot about that advice in the past year. A report that tracked layoffs in the industry in 2023 recorded twenty-six hundred and eighty-one in broadcast, print, and digital news media. NBC News, Vox Media, Vice News, Business Insider, Spotify, theSkimm, FiveThirtyEight, The Athletic, and Condé Nast—the publisher of The New Yorker—all made significant layoffs. BuzzFeed News closed, as did Gawker. The Washington Post, which lost about a hundred million dollars last year, offered buyouts to two hundred and forty employees. In just the first month of 2024, Condé Nast laid off a significant number of Pitchfork's staff and folded the outlet into GQ; the Los Angeles Times laid off at least a hundred and fifteen workers (their union called it "the big one"); Time cut fifteen per cent of its union-represented editorial staff; the Wall Street Journal slashed positions at its D.C. bureau; and Sports Illustrated, which had been weathering a scandal for publishing A.I.-generated stories, laid off much of its staff as well. One journalist recently cancelled a networking phone call with me, writing, "I've decided to officially take my career in a different direction." There wasn't much I could say to counter that conclusion; it was perfectly logical.

"Publishers, brace yourselves—it's going to be a wild ride," Matthew Goldstein, a media consultant, wrote in a January newsletter. "I see a potential extinction-level event in the future." Some of the forces cited by Goldstein were already well known: consumers are burned out by the news, and social-media sites have moved away from promoting news articles. But Goldstein also pointed to Google's rollout of A.I.-integrated search, which answers user queries within the Google interface, rather than referring them to outside Web sites, as a major factor in this coming extinction. According to a recent Wall Street Journal analysis, Google generates close to forty per cent of traffic across digital media. Brands with strong home-page traffic will likely be less affected, Goldstein wrote—places like Yahoo, the Wall Street Journal, the New York Times, the Daily Mail, CNN, the Washington Post, and Fox News. But Web sites that aren't as frequently typed into browsers need to "contemplate drastic measures, possibly halving their brand portfolios."

What will emerge in the wake of mass extinction, Brian Morrissey, another media analyst, recently wrote in his newsletter, "The Rebooting," is "a different industry, leaner and diminished, often serving as a front operation to other businesses," such as events, e-commerce, and sponsored content. In fact, he told me, what we are witnessing is nothing less than the end of the mass-media era. "This is a delayed reaction to the commercial Internet itself," he said.
"I don't know if anything could have been done differently."...Much More

Video of the Week

AI and Everything Else - Benedict Evans from Slush

AI of the Week

The OpenAI Endgame
Thoughts about the outcome of the NYT versus OpenAI copyright lawsuit
By Mike Loukides
February 13, 2024

Since the New York Times sued OpenAI for infringing its copyrights by using Times content for training, everyone involved with AI has been wondering about the consequences. How will this lawsuit play out? And, more importantly, how will the outcome affect the way we train and use large language models?

There are two components to this suit. First, it was possible to get ChatGPT to reproduce some Times articles very close to verbatim. That's fairly clearly copyright infringement, though there are still important questions that could influence the outcome of the case. Reproducing the New York Times clearly isn't the intent of ChatGPT, and OpenAI appears to have modified ChatGPT's guardrails to make generating infringing content more difficult, though probably not impossible. Is this enough to limit any damages? It's not clear that anybody has used ChatGPT to avoid paying for a NYT subscription. Second, the examples in a case like this are always cherry-picked. While the Times can clearly show that OpenAI can reproduce some articles, can it reproduce any article from the Times' archive? Could I get ChatGPT to produce an article from page 37 of the September 18, 1947 issue? Or, for that matter, an article from the Chicago Tribune or the Boston Globe? Is the entire corpus available (I doubt it), or just certain random articles? I don't know, and given that OpenAI has modified GPT to reduce the possibility of infringement, it's almost certainly too late to do that experiment. The courts will have to decide whether inadvertent, inconsequential, or unpredictable reproduction meets the legal definition of copyright infringement.

The more important claim is that training a model on copyrighted content is infringement, whether or not the model is capable of reproducing that training data in its output. An inept and clumsy version of this claim was made by Sarah Silverman and others in a suit that was dismissed. The Authors' Guild has its own version of this lawsuit, and it is working on a licensing model that would allow its members to opt in to a single licensing agreement. The outcome of this case could have many side-effects, since it essentially would allow publishers to charge not just for the texts they produce, but for how those texts are used.

It is difficult to predict what the outcome will be, though easy enough to guess. Here's mine. OpenAI will settle with the New York Times out of court, and we won't get a ruling. This settlement will have important consequences: it will set a de-facto price on training data. And that price will no doubt be high. Perhaps not as high as the Times would like (there are rumors that OpenAI has offered something in the range of $1 million to $5 million), but sufficiently high to deter OpenAI's competitors.

$1M is not, in and of itself, a terribly high price, and the Times reportedly thinks that it's way too low; but realize that OpenAI will have to pay a similar amount to almost every major newspaper publisher worldwide, in addition to organizations like the Authors Guild, technical journal publishers, magazine publishers, and many other content owners.
The total bill is likely to be close to $1 billion, if not more, and as models need to be updated, at least some of it will be a recurring cost. I suspect that OpenAI would have difficulty going higher, even given Microsoft's investments—and, whatever else you may think of this strategy—OpenAI has to think about the total cost. I doubt that they are close to profitable; they appear to be running on an Uber-like business plan, in which they spend heavily to buy the market without regard for running a sustainable business. But even with that business model, billion-dollar expenses have to raise the eyebrows of partners like Microsoft.

The Times, on the other hand, appears to be making a common mistake: overvaluing its data. Yes, it has a large archive—but what is the value of old news? Furthermore, in almost any application, but especially in AI, the value of data isn't the data itself; it's the correlations between different datasets. The Times doesn't own those correlations any more than I own the correlations between my browsing data and Tim O'Reilly's. But those correlations are precisely what's valuable to OpenAI and others building data-driven products...More

OpenAI Sora – The most realistic AI-generated video to date
ERIK NASO

OpenAI Sora is an AI text-to-video model that achieves video so realistic it is hard to tell it is AI-generated. It's very life-like but not real. I think we have just hit the beginning of some truly powerful AI-generated video that could change the game for stock footage and more. Below are examples of the most realistic AI prompt-generated videos I have seen.

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.

Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user's prompt. OpenAI states it is teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction...More
I Was Wrong. We Haven't Reached Peak AI Frenzy.
By Kate Clark
Feb 15, 2024, 4:16pm PST

After Sam Altman's sudden firing last year, I argued the chaos that followed his short-lived ouster would inject a healthy dose of caution into venture investments in artificial intelligence companies. I figured we'd finally reached the peak of the AI venture capital frenzy when a threatened employee exodus from OpenAI risked sending the value of the $86 billion AI juggernaut almost to zero.

There was plenty of other proof that the hype for generative AI was fading. Investors were openly saying they planned to be a lot tougher on valuation negotiations and would ask startups harder questions about governance. Some companies had begun to consider selling themselves due to the high costs of developing AI software. And an early darling of the AI boom, AI-powered writing tool Jasper, had become the butt of jokes when it slashed internal revenue projections and cut its internal valuation after having won a $1.5 billion valuation in 2022.

I forgot that everyone in Silicon Valley suffers from short-term memory loss. After a week sipping boxed water with venture capitalists from South Park to Sand Hill Road, I'm convinced I called the end of the AI frenzy far too soon. In fact, I expect this year will deliver more cash into the hands of U.S. AI startups than last year, when those companies raised a total of $63 billion, according to PitchBook data.

Altman's fundraising ambitions will surely boost the total. A recent report from The Wall Street Journal said Altman plans to raise trillions of dollars to develop the AI chips needed to create artificial general intelligence, software that can reason the way humans do. Even if that number is actually much smaller, talk of such goals lifts the ceiling for other startup founders, who are likely to think even bigger and to be more aggressive in their fundraising. Investor appetite for AI companies is still growing, too. These investors claimed last fall that they were done with the FOMO-inspired deals, but they're pushing checks on the top AI companies now harder than ever...More

News Of the Week

I tried Vision Pro. Here's my take
The Quest 3 is better than you might expect
Posted by Matt Birchler
13 Feb 2024

Alex Heath for The Verge: Zuckerberg says Quest 3 is "the better product" vs. Apple's Vision Pro

He says the Quest has a better "immersive" content library than Apple, which is technically true for now, though he admits that the Vision Pro is a better entertainment device. And then there's the fact that the Quest 3 is, as Zuck says, "like seven times less expensive."

I currently own both headsets, and while I'm very excited about the potential in the Vision Pro, I actually find it hard to fully disagree with Zuck on this one. I think a lot of people who have only used the Vision Pro would be surprised how well the Quest 3 does some things in comparison.

For example, the pass-through mode is definitely not quite as good as the Vision Pro's, but it's closer than you might expect. And while people are rightly impressed with how well the Vision Pro has windows locked in 3D space, honestly the Quest 3 is just as good at this in my experience. When it comes to comfort, I do think the Vision Pro is easier to wear for longer periods, but I find it more finicky to get in just the right spot in front of my eyes, while the Quest 3 seems to have a larger sweet spot. And let's not even talk about the field of view, which is way wider on the Quest to the point of being unnoticeable basically all the time.
I kinda think field of view will be similar to phone bezels in that you get used to what you have and anything more seems huge — you can get used to the Vision Pro's narrower field of view, but once you're used to wider, it's hard not to notice when going back.

The Vision Pro has some hardware features that help it rise above (the massively higher resolution screen jumps to mind), but I'm just saying that if you're looking for everything to be 7x better to match the price difference, I don't think that's there.

Beyond this, the products are quite different, though. As Zuckerberg says, the Quest 3 is more focused on fully immersive VR experiences, and while the Vision Pro has a little of that right now, it's not really doing the same things. And when it comes to gaming it's not even close. The Quest 3 has a large library of games available, and that expands to almost every VR game ever made with Steam Link.

On the other hand, the Vision Pro is much more of a "computer" than the Quest ever was. If you can do it on a Mac or an iPad, you can probably already do it on the Vision Pro. And I'm not talking about finding some weird alternate version of your task manager or web browser that doesn't sync with anything else in your life, I'm talking about the apps you already know and love. This is huge, and it's Apple leveraging its ecosystem to make sure you can seamlessly move from Mac to iPhone to iPad to Vision Pro. And if you can't install something from the App Store, the web browser is just as capable as Safari on the iPad. If all else fails, you can always just bring your full Mac into your space as well. I will say the Quest 3 can do this and has the advantage of working with Windows as well, but if you have a Mac, it's much, much better.

This is more words than I expected to write about a CEO saying his product is better than the competition's (shocker), but I do think that Zuck's statement is less insane than some may think it to be...More

The Supreme Court will decide if the government can seize control of YouTube and Twitter
We're about to find out if the Supreme Court still believes in capitalism.
By Ian Millhiser | Feb 15, 2024, 7:00am EST

Ian Millhiser is a senior correspondent at Vox, where he focuses on the Supreme Court, the Constitution, and the decline of liberal democracy in the United States. He received a JD from Duke University and is the author of two books on the Supreme Court.

In mid-2021, about a year before he began his longstanding feud with the biggest employer in his state, Florida's Republican Gov. Ron DeSantis signed legislation attempting to seize control of content moderation at major social media platforms such as YouTube, Facebook, or Twitter (now called X by Elon Musk). A few months later, Texas Gov. Greg Abbott, also a Republican, signed similar legislation in his state.

Both laws are almost comically unconstitutional — the First Amendment does not permit the government to order media companies to publish content they do not wish to publish — and neither law is currently in effect. A federal appeals court halted the key provisions of Florida's law in 2022, and the Supreme Court temporarily blocked Texas's law shortly thereafter (though the justices, somewhat ominously, split 5-4 in this later case).

Nevertheless, the justices have not yet weighed in on whether these two unconstitutional laws must be permanently blocked, and that question is now before the Court in a pair of cases known as Moody v. NetChoice and NetChoice v. Paxton.
The stakes in both cases are quite high, and the Supreme Court's decision is likely to reveal where each one of the Republican justices falls on the GOP's internal conflict between old-school free market capitalists and a newer generation that is eager to pick cultural fights with business...More

Arm Results Set The World On Fire
February 13, 2024 · by D/D Advisors · in Analyst Decoder Ring

Arm reported its second set of earnings as a (once again) public company last week. These numbers were particularly strong, well above consensus for both the current and guided quarters. Arm stock rallied strongly on the results, up ~30% for the week. These numbers were important as they go a long way to establishing the company's credibility with the Street in a way their prior results did not.

That being said, we saw things we both liked and disliked in their numbers. Here are our highlights of those:

Positive: Growing Value Capture. One of our chief concerns with the company since the IPO has been the low value they capture per licensed chip shipped – roughly $0.11 per chip at the IPO. That figure continued to inch higher in the latest results, but critically they pointed out that their royalty rate doubles with the latest version of their IP (v9). This does not mean that all of their royalty rates are going to double any time soon, but it does point very much in the right direction. Critically, they noted this rate increase applies to architectural licenses as well.

Negative: The Model is Complex. Judging from the number of questions management fielded on the call about this rate increase, no one really knows how to model Arm. The company has a lot of moving parts in its revenue mix, and they have limits to their ability to communicate some very important parts of their model. We think that at some point the company would be well served by providing some clearer guideposts on how to build these models, or they risk the Street always playing catch-up with a wide swing of expectations each quarter.

Positive: Premium Plan Conversion. The company said three companies converted from their AFA plan to the ATA model. We will not get into the details of those here, but these can best be thought of in software terms, with customers on low-priced subscription plans converting to premium subscription plans. This is a good trend, and management expressed a high degree of confidence that they expect to see it continue. They have spent a few years putting these programs in place and seem to have thought them through. This matters particularly because these programs are well suited for smaller, earlier-stage companies. The old Arm struggled to attract new customers in large part because of the high upfront costs of Arm licenses. Programs like AFA and ATA could go a long way to redressing those past wrongs.

Negative: China remains a black box. Arm China is of course a constant source of speculation. In the latest quarter it looks like a large portion of growth came from China, which does not exactly square with other data coming from China right now. It is still unclear to us how much of Arm's revenue from China's handset companies gets booked through Arm China as a related-party transaction and how much is direct. Investors are confused too. There is no easy solution to this problem; digging too hard into Arm China's numbers is unlikely to make anyone happy with the answers, but hopefully over time it all settles down.
Positive: Growing Complexity of Compute. Management repeatedly mentioned this factor, noting that this leads to more chips and more Arm cores shipping in the marketplace. Some of this is tied to AI, but we think the story is broader than that. It is going to be tempting to see much of Arm's growth as riding the AI wave, but this does not fully capture the situation. The AI story is largely about GPUs, which are not particularly heavy with Arm cores. But those GPUs still need some CPU attach, and AI accelerators can sometimes be good Arm targets.

Negative: Diversification. Arm remains heavily dependent on smartphones, and we suspect the return to inventory stocking by handset makers is playing a big role in their guidance. When asked about segmentation of their results, the company declined to update the model provided during the IPO. We hope to see some diversification here when they do update their figures later in the year.

Overall, the company did a good job in the quarter. They still have some kinks to work out with their communication to the Street, but this was a good second step as a public company...More

Startup of the Week

Bret Taylor's new AI company aims to help customers get answers and complete tasks automatically
Ron Miller @ron_miller / 6:36 AM PST • February 13, 2024
Image Credits: mi-vector / Getty Images

We've been hearing about former Salesforce co-CEO Bret Taylor's latest gig since he announced he was leaving the CRM giant in November 2022. Last February we heard he was launching an AI startup built with former Google employee Clay Bavor. Today, the two emerged with a new conversational AI company called Sierra with some bold claims about what it can do.

At its heart, the new company is a customer service bot. That's not actually all that Earth-shattering, but the company claims that it's much more than that, with its software going beyond being an extension of a FAQ page and actually taking actions on behalf of the customer.

"Sierra agents can do so much more than just answer questions. They take action using your systems, from upgrading a subscription in your customer database to managing the complexities of a furniture delivery in your order management system. Agents can reason, problem solve and make decisions," the company claimed in a blog post.

Having worked with large enterprise customers at Salesforce, Taylor certainly understands that issues like hallucinations, where a large language model sometimes makes up an answer when it lacks the information to answer accurately, are a serious problem. That's especially true for large companies, whose brand reputation is at stake. The company claims that it is solving hallucination issues.

Image Credits: Sierra

At the same time, it's connecting to other enterprise systems to undertake tasks on behalf of the customer without humans being involved. These are both big, audacious claims and will be challenging to pull off...More

X of the Week

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit thatwastheweek.substack.com/subscribe

Future Weekly - der Startup Podcast!
#342 - Gemini Ultra, Vision Pro Reviews, Austria's Export Miracle

Future Weekly - der Startup Podcast!

Play Episode Listen Later Feb 15, 2024 41:03


Get the latest news from the startup scene and tune in to Future Weekly 342. This week, Hannah and Markus discuss the following exciting topics:

Sidecar Sync
17: All About AI Agents

Sidecar Sync

Play Episode Listen Later Feb 15, 2024 52:43


In this episode of Sidecar Sync, Amith and Mallory delve into the evolving world of AI agents, exploring OpenAI's latest developments in creating autonomous software that can operate devices and execute tasks on behalf of users. They discuss the revolutionary potential of AI agents to transform our interaction with digital tools, the ethical considerations and privacy implications of such deep integration, and the competitive landscape within the AI industry. The conversation also covers the significance of language models in facilitating communication between humans and AI, and the future vision for AI agents, including their role in increasing productivity and the challenges in developing these advanced technologies.

Prefer to watch your podcasts? Watch us on YouTube: https://www.youtube.com/channel/UC3ExQ5BPDo1I1-L1QDGED5Q

Let us know what you think about the podcast. Drop your questions or comments in the Sidecar community: https://community.sidecarglobal.com/c/sidecar-sync/
Join the AI Bootcamp for Associations: https://sidecarglobal.com/bootcamp
Download Ascend: Unlocking the Power of AI for Associations: https://sidecarglobal.com/AI
Join the CEO AI Mastermind Group: https://sidecarglobal.com/association-ceo-mastermind-2024/

Thanks to this episode's sponsors!
AI Bootcamp for Associations: https://sidecarglobal.com/bootcamp

Tools/Experiments mentioned:
Skip: https://memberjunction.com/skip/
Rabbit OS: https://www.rabbit.tech/
Gemini Ultra: https://deepmind.google/technologies/gemini/#introduction
Adept AI: https://www.adept.ai/

Social:
Follow Sidecar on LinkedIn: https://www.linkedin.com/company/sidecar-global
Amith Nagarajan: https://www.linkedin.com/in/amithnagarajan/
Mallory Mejias: https://www.linkedin.com/in/mallorymejias/

AI Inside
Altman's $7 Trillion Dream

AI Inside

Play Episode Listen Later Feb 15, 2024 49:48


Jason Howell and Jeff Jarvis discuss the week's AI news, including Sam Altman's call for $7 trillion in AI funding, Google's launch of the Gemini Ultra 1.0 chatbot, proposed regulations on AI safety, dismissal of copyright claims against AI, and the need for humanities education in the AI field.

NOTE: Connectivity issues resulted in a lower-resolution file for part of the show. Apologies!

NEWS
Sam Altman wants $7 trillion to boost AI chip and GPU production globally
ChatGPT gaining ability to remember user preferences and data
OpenAI building web and device control agents
Google's Assistant is now called Gemini on Android devices
Google announces Gemini Ultra 1.0 model to compete with GPT-4
California bill proposes AI safety regulations and requirements
AI companies agree to limit election deepfakes
Most claims dismissed in Sarah Silverman copyright lawsuit, leaving only 1 direct copyright claim
Beijing court rules AI-generated content can be copyrighted
NYT op-ed argues humanities education is key to developing AI leaders

Hosted on Acast. See acast.com/privacy for more information.

This Week in Google (MP3)
TWiG 755: Beat It for 15 Minutes - Sovereign AI, Obituary Spam, Gemini Ultra

This Week in Google (MP3)

Play Episode Listen Later Feb 15, 2024 115:27


Leo opens his copy of "Shift Happens"
Nvidia CEO calls for "Sovereign AI" as his firm overtakes Amazon in market value
Sarah Silverman's copyright infringement suit against OpenAI will advance in pared-down form
The rise of obituary spam
Biden administration taps Gina Raimondo to direct new AI Safety Institute
University of Pennsylvania announces first AI undergrad degree
When Your Technical Skills Are Eclipsed, Your Humanity Will Matter More Than Ever
"Societal misalignments" could pose AI dangers, OpenAI CEO says
Zuckerberg says Quest 3 is 'the better product' vs. Apple's Vision Pro
TikTok tunnel girl says project at Northern Virginia home is moving along
Everyone keeps asking me to talk about this Tunnel Guy so I'm finally doing it
New Mass Gmail Rejections To Start April 2024, Google Says
Google One AI Premium is $19.99/month with Gemini Advanced
A crowd destroyed a driverless Waymo car in San Francisco
Over Three Decades, Tech Obliterated Media
The Web Design Museum on Instagram
ohmygit.org
In Finland, Windows95Man is going to Eurovision.

Hosts: Leo Laporte, Jeff Jarvis, and Paris Martineau

Download or subscribe to this show at https://twit.tv/shows/this-week-in-google.
Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

Sponsors:
Melissa.com/twit
Miro.com/podcast
kolide.com/twig

This Week in Google (Video HI)
TWiG 755: Beat It for 15 Minutes - Sovereign AI, Obituary Spam, Gemini Ultra

This Week in Google (Video HI)

Play Episode Listen Later Feb 15, 2024 115:27


Leo opens his copy of "Shift Happens"
Nvidia CEO calls for "Sovereign AI" as his firm overtakes Amazon in market value
Sarah Silverman's copyright infringement suit against OpenAI will advance in pared-down form
The rise of obituary spam
Biden administration taps Gina Raimondo to direct new AI Safety Institute
University of Pennsylvania announces first AI undergrad degree
When Your Technical Skills Are Eclipsed, Your Humanity Will Matter More Than Ever
"Societal misalignments" could pose AI dangers, OpenAI CEO says
Zuckerberg says Quest 3 is 'the better product' vs. Apple's Vision Pro
TikTok tunnel girl says project at Northern Virginia home is moving along
Everyone keeps asking me to talk about this Tunnel Guy so I'm finally doing it
New Mass Gmail Rejections To Start April 2024, Google Says
Google One AI Premium is $19.99/month with Gemini Advanced
A crowd destroyed a driverless Waymo car in San Francisco
Over Three Decades, Tech Obliterated Media
The Web Design Museum on Instagram
ohmygit.org
In Finland, Windows95Man is going to Eurovision.

Hosts: Leo Laporte, Jeff Jarvis, and Paris Martineau

Download or subscribe to this show at https://twit.tv/shows/this-week-in-google.
Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit

Sponsors:
Melissa.com/twit
Miro.com/podcast
kolide.com/twig

AI For Humans
Google Gemini Ultra, AI Chip Frenzy & Tech Journalist Joanna Stern | Ep44

AI For Humans

Play Episode Listen Later Feb 15, 2024 80:16


This week… we dive into Google Gemini Ultra 1.0, Sam Altman is making AI chips, Nvidia is making chatbots, and creepy autonomous AI robots are silently coming for us all. Kevin explores the new Stable Cascade image model, Gavin goes deep on AI music from Suno, ElevenLabs is going to share revenue with voice model creators, and did the AI Super Bowl commercials over-promise? Yes, probably.

AND THEN… an interview with the Wall Street Journal's tech reporter Joanna Stern about her viral Apple Vision Pro review, how she uses AI, and a fascinating discussion about her latest story on the parents of the Parkland victims using AI versions of their children's voices to robocall politicians.

Oh, and a true juicy-exclusey: this week's AI co-host is none other than Bard, who was just 'released' by Google this week. We dig into some of that backstory and find out exactly WHY they moved on from Bard. Guess what… IT AIN'T WHAT YOU THINK.

It's an endless cavalcade of ridiculous and informative AI news, AI tools, and AI entertainment cooked up just for you.

Follow us for more AI discussions, AI news updates, and AI tool reviews on X @AIForHumansShow
Join our vibrant community on TikTok @aiforhumansshow
For more info, visit our website at https://www.aiforhumans.show/

/// Show links ///

Gemini Ultra Released: https://blog.google/products/gemini/bard-gemini-advanced-app/
AI Pasta or GPU Sacrifice: https://twitter.com/tsarnick/status/1756879325260042737
Sam Altman Seeks 7 Trillion For Chips: https://www.cnbc.com/2024/02/09/openai-ceo-sam-altman-reportedly-seeking-trillions-of-dollars-for-ai-chip-project.html
Sam's Tweet About Infrastructure: https://twitter.com/sama/status/1755294743565930726?s=46
Nvidia Worth More Than Amazon or Alphabet: https://www.theverge.com/2024/2/14/24073384/nvidia-market-cap-passes-amazon-alphabet
Nvidia's Chat with RTX: https://x.com/NVIDIAGeForce/status/1757444009193304328?s=20
ElevenLabs Revenue Share: https://twitter.com/elevenlabsio/status/1757087275131748639
DeepMind's Aloha 2 Robot: https://www.reddit.com/r/singularity/comments/1anfyl5/google_deepmindaloha_2/
$200 Homemade Robot: https://x.com/alexkoch_ai/status/1756500716854841835?s=20
1X Autonomous Robots: https://youtu.be/iHXuU3nTXfQ?si=l3H-eOpn1z6VjaQq
TikTok Boximator: https://boximator.github.io/?ref=aiartweekly
Suno AI Music Generator: https://v-day.suno.ai/
Stability AI's Stable Cascade: https://stability.ai/news/introducing-stable-cascade
Joanna's Apple Vision Pro Review: https://youtu.be/8xI10SFgzQ8?si=ydX0KaKlwER4SR4Q
Parkland Parents & AI Versions of Their Children: https://youtu.be/h3VZjuttZbQ?si=_yTlaPDycKuMaSmS
Joanna Stern at WSJ: https://www.wsj.com/news/author/joanna-stern

Front End Toolbox
Gemini Advance: Is It Worth Your Time?

Front End Toolbox

Play Episode Listen Later Feb 15, 2024 6:06


In this episode, we're diving into the recent launch of Google's Gemini Advanced model. Join us as we explore whether it's worth your time and investment. Here's what we've got on the menu:

* Introduction to Gemini Advanced: A week after its launch, we take a closer look at Google's new AI offering. Is it the chatbot competitor we've been waiting for?
* Understanding the Subscription Model: To access Gemini Advanced, powered by the Gemini Ultra 1.0 model, you'll need to navigate Google's branding maze. We break down the subscription process for you.
* First Impressions: After a week of testing, we share our initial thoughts on Gemini's performance. From coding assistance to real-time searches, find out how it stacks up against ChatGPT.
* Improvements and Quirks: The Gemini team has been busy ironing out the initial kinks. We discuss the quality improvements and the quirky challenges that remain.
* Pricing and Value: With a $20/month subscription fee as part of the Google One plan, we evaluate the cost-effectiveness of Gemini Advanced. Is it worth switching from ChatGPT Plus?
* Integration with Google Ecosystem: Despite its strengths, Gemini's integration with Google's suite of services leaves room for improvement. We highlight what's missing and the potential for a more cohesive user experience.
* Future Prospects: Google's rapid improvements signal a promising future for Gemini Advanced. We speculate on what's next and how it could revolutionize our interaction with AI.

Thank you for joining us on this exploration of Google's Gemini Advanced. Your curiosity fuels our journey into the ever-evolving world of AI. See you in the next episode!

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.stack-snacks.com

SERP's Up SEO Podcast
SERP's Up | How will Google's most powerful AI, Gemini, impact SEO?

SERP's Up SEO Podcast

Play Episode Listen Later Feb 14, 2024 47:44


The race is on! The AI generative competition has been well underway, and Google has leaned into a major update. Google Gemini has now been introduced. What is Google Gemini? How will it impact SEO? How will it impact search in general? Google is in a transition period of search. Does Gemini have what it takes to propel Google forward in the future?

This week, Wix's Mordy Oberstein and Crystal Carter are joined by the great Danny Goodwin to evaluate this so-called 'ultra'-powered version of Bard - Gemini. Pete Huang also makes an appearance to take a look at what Gemini means for the overall evolution of AI.

Today, we're all Geminis! So sit back, look up at the stars, and tune in to episode 74 of the SERP's Up SEO Podcast as we discover Google Gemini!

Key Segments
[00:01:48] What's On This Episode of SERP's Up?
[00:03:28] Focus Topic of the Week: Google Gemini
[00:04:06] Focus Topic Guest: Danny Goodwin
[00:30:19] The Great Beyond w/ Pete Huang
[00:39:01] Snappy News
[00:43:39] Follow of the Week

Hosts, Guests, & Featured People:
Mordy Oberstein
Crystal Carter
Danny Goodwin
Pete Huang
Erin Sparks

Resources:
SERP's Up Podcast
Wix SEO Learning Hub
Searchlight SEO Newsletter
Wix Studio
Wix Studio YouTube
Search Engine Land
Google Gemini
Google Gemini is here – and it's already being tested in Search
The Neuron
Barry Schwartz's Famous Butter Sandwich
Edge of the Web News: Google drops Web Stories from image results, Google Discover carousel view and more
Yandex search engine sold in $5.2 billion deal
Bard becomes Gemini: Try Ultra 1.0 and a new mobile app today
Google: We Don't Say Core Web Vitals Are A Ranking Factor

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 205: AI News That Matters - Feb. 12th, 2024

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Feb 12, 2024 40:22


OpenAI shifting towards AI agents, Google Bard is dead, Gemini Ultra, AI Super Bowl commercials, and more! Here's this week's AI news that matters!

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Timestamps:
03:39 OpenAI developing agents and artificial general intelligence.
11:20 Google Bard replaced by Gemini
16:30 Google Gemini struggles with real-time data accuracy.
18:40 Sam Altman seeks $7 Trillion for AI chips
24:45 Lawsuits filed over AI art companies for copyright
27:35 New AI Smart Glasses
30:30 Super Bowl AI-related ads

Topics Covered in This Episode:
1. Discussion on OpenAI
2. Google's AI Products
3. Legal Challenges for AI Industry
4. New AI Technology and Commercial Promotion

Keywords:
AI advancement, image manipulation, photography, Minions, Body Armor, anti-artificial stance, CrowdStrike, AI-powered cybersecurity, Gift mode, AI news updates, Everyday AI Show, career growth, business growth, OpenAI, AI agents, GPT-4, AI art companies, copyright lawsuits, Brilliant Labs, smart glasses, Frame, Microsoft, AI, Copilot, Super Bowl ad, Google Pixel 8, guided frame, AI accessibility features, Google Gemini Ultra

Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/

DIGITAL LIFE - Marketing & Digital
The New Gemini Ultra 1.0 and Gemini Advanced | Ep. #188

DIGITAL LIFE - Marketing & Digital

Play Episode Listen Later Feb 12, 2024 10:47


We explore Google's rebranding, which retires Bard in order to launch the paid version of its chatbot based on Gemini Ultra 1.0.

Karachi Wala Developer
LLMs: Welcome Google's Gemini Ultra!

Karachi Wala Developer

Play Episode Listen Later Feb 10, 2024 5:01


The all-new heavyweight model just dropped this week. Should you be rushing to try it out? Why is the Ultra model better? This and more in this quick episode about Gemini Ultra.

AIA Podcast
Gemini Ultra and Codellama 70B release / Mistral leak and Vision Pro impressions / AIA Podcast #27

AIA Podcast

Play Episode Listen Later Feb 10, 2024 161:08


Everyday AI Podcast – An AI and ChatGPT Podcast
EP 204: Google Gemini Advanced - 7 things you need to know

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Feb 9, 2024 43:20


Did Google just release a ChatGPT killer? Google's new Gemini Advanced is their paid offering to the free Gemini (previously Bard). Is it really advanced? We're diving in and taking a look at Gemini Advanced and comparing it to ChatGPT.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode page
Join the discussion: Ask Jordan questions on Google Gemini
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Timestamps:
02:20 Daily AI news
07:20 About Google Gemini Advanced
13:04 Gemini Ultra free for 2 months.
16:26 Difficulty accessing Google Workspace account for Gemini.
20:27 People interact with large language models informally.
23:01 Gemini Advanced offers enhanced features for users.
26:18 Use Google for latest election information confusion.
29:34 Google's AI unaware of recent events. Disconnect from real-time.
35:56 Big companies using digital watermarks to combat AI-generated misinformation.
39:25 Gemini Ultra outperformed all models on MMLU.
43:21 Final thoughts

Topics Covered in This Episode:
1. Launch and Access to Google Gemini Advanced
2. Features of Google Gemini
3. Performance and Comparisons
4. User Feedback and Experiences
5. Issues with Google Gemini

Keywords:
Gemini Ultra 1.0 model, benchmarking, free two-month trial, Google search, real-time events, Chat GPT, Google Workspace accounts, AI content, Jordan Wilson, Google's AI system, Gemini, Super Bowl, US primary election, New Hampshire, prime prompt polished chat GPT course, Everyday AI Show, Google Gemini Advanced, AI industry news, Midjourney's website rollout, FTC's ban on AI robocalls, OpenAI's development of agents, testing experience, Gemini Advanced querying, large language models, Anthropic's Claude 2.1, Microsoft's Copilot, GPT 4, digital watermark, Gemini app for Android, Google iOS app

Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/

TechCrunch
Google launches Gemini Ultra

TechCrunch

Play Episode Listen Later Feb 9, 2024 6:04


Google launches Gemini Ultra; FCC officially declares AI-voiced robocalls illegal; podcasters can now upload their RSS feed to YouTube.

Learn more about your ad choices. Visit megaphone.fm/adchoices

WALL STREET COLADA
February 9: The S&P topped 5,000 points. Google renamed its AI chatbot and will offer a new subscription called Gemini Ultra 1.0

WALL STREET COLADA

Play Episode Listen Later Feb 9, 2024 3:22


The S&P 500 topped 5,000 points for the first time in the final minutes of Thursday's session, but slipped back and closed at 4,997. The Dow Jones Industrial Average also set a new record, closing at 38,726 points. $PEP $GOOG $GOOGL $PINS $EXPE

Generation AI
From Bard to Gemini: Google's Next-Gen AI Leap Forward

Generation AI

Play Episode Listen Later Feb 9, 2024 11:31


This bonus episode of the "Generation AI" podcast dives into Google's latest AI development, focusing on the rebranding of its chatbot Bard to Gemini and the release of the new model, Gemini Ultra. The hosts discuss the transition from Bard to Gemini, emphasizing its multimodal capabilities (handling text, image, voice, and video inputs and outputs) and the introduction of advanced features like complex task handling and logical reasoning. Google's move to include a paid AI model, Gemini Advanced, with a subscription to Google One AI Premium, marks a significant step in competing with GPT-4. Additionally, Gemini's integration into Google's ecosystem, such as Workspace, and its potential to enhance search capabilities are highlighted. The episode provides insights into the implications of these developments for the higher education sector and the broader AI landscape.

- - - -

Connect With Our Co-Hosts:

Ardis Kadiu
https://www.linkedin.com/in/ardis/
https://twitter.com/ardis

Dr. JC Bonilla
https://www.linkedin.com/in/jcbonilla/
https://twitter.com/jbonillx

About The Enrollify Podcast Network:
Generation AI is a part of the Enrollify Podcast Network. If you like this podcast, chances are you'll like other Enrollify shows too! Some of our favorites include The EduData Podcast and Visionary Voices: The College President's Playbook.

Enrollify is made possible by Element451 — the next-generation AI student engagement platform helping institutions create meaningful and personalized interactions with students. Learn more at element451.com.

Connect with Us at the Engage Summit:
Exciting news — Ardis will be at the 2024 Engage Summit in Raleigh, NC, on June 25 and 26, and would love to meet you there! Sessions will focus on cutting-edge AI applications that are reshaping student outreach, enhancing staff productivity, and offering deep insights into ROI. Use the discount code Enrollify50 at checkout, and you can register for just $99! This early bird pricing lasts until March 31. Learn more and register at engage.element451.com — we can't wait to see you there!

This Day in AI Podcast
EP50: We Bet $1000 Using Gemini Advanced, Qwen1.5 72B, Retell AI, Apple's MGIE & GOODY-2

This Day in AI Podcast

Play Episode Listen Later Feb 9, 2024 61:18


Subscribe to ThisDayInAI: https://thisdayinai.com
Try AI Agents on SimTheory: https://simtheory.ai
Show notes: https://thisdayinai.com/bookmarks/6-ep50
Tell us your thoughts on Gemini here: https://thisdayinai.com/post/62-your-thoughts-gemini-advanced/

Thanks to everyone for all your support and kind reviews to reach 50 episodes! Please consider leaving us a review wherever you get your podcasts.

=====

This week we cover the launch of Google Gemini Advanced, Gemini Ultra 1.0, and Bard being renamed to Gemini. We compare GPT-4, Gemini Ultra 1.0, and Qwen 1.5 72B by sports betting $1000 on horse racing.

We celebrate 50 episodes and share our excitement about Qwen 1.5 72B's performance at coding and its quick refusals. We cover new releases including SyncLabs and Retell AI, and Apple's open source Guiding Instruction-based Image Editing via Multimodal Large Language Models.

Finally, we discuss GOODY-2 and its high refusal rate.

=====

CHAPTERS:
00:00 - Betting $1,000 To Compare Gemini Ultra 1.0 to GPT-4 to Qwen 1.5
07:33 - Google Gemini Advanced, Ultra: Details of Announcement and First Impressions
25:48 - OpenAI is Developing Agents to Control Your Devices
27:40 - Celebrating 50 Episodes of This Day in AI
30:34 - Qwen 1.5 72B: We're Impressed!
42:47 - SyncLabs: Tested & Impressions
47:58 - Retell AI: Tested & Impressions
54:18 - Apple's Open Source Guiding Instruction-based Image Editing via Multimodal Large Language Models
58:10 - GOODY-2: The World's Most Responsible AI Model

Business of Tech
Thu Feb-8-2024: Google Unveils Gemini Ultra and Revamps AI Strategy, Automation Nation, JCDC Turmoil

Business of Tech

Play Episode Listen Later Feb 8, 2024 10:14


In this episode of the Business of Tech podcast, Dave Sobel discusses three important developments in the tech industry. First, Google revamps its AI strategy and introduces Gemini Ultra, its most advanced language model. Google also launches a new Google One tier with enhanced features. Next, ConnectWise unveils advanced RPA and AI capabilities at Automation Nation 2024 in Tampa. Finally, the podcast delves into the setbacks faced by JCDC in their cybersecurity efforts, including conservative scrutiny and internal issues.

Three things to know today:
00:00 Google Revamps AI Strategy: Unveils Gemini Ultra and Launches New Google One Tier with Enhanced Features
05:45 Automation Nation 2024: ConnectWise Unveils Advanced RPA and AI Capabilities in Tampa
06:58 JCDC's Setback: How Conservative Scrutiny and Internal Issues Challenge Cybersecurity Efforts

Supported by: https://coreview.com/msp/

Looking for a link from the stories? The entire script of the show, with links to articles, is posted in each story on https://www.businessof.tech/

Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/
Support the show on Patreon: https://patreon.com/mspradio/

Want our stuff? Cool merch? Wear "Why Do We Care?" - Visit https://mspradio.myspreadshop.com

Follow us on:
LinkedIn: https://www.linkedin.com/company/28908079/
YouTube: https://youtube.com/mspradio/
Facebook: https://www.facebook.com/mspradionews/
Instagram: https://www.instagram.com/mspradio/
TikTok: https://www.tiktok.com/@businessoftech

AI For Humans
Apple's Vision Pro is Heavy, Google Gemini Ultra Soon & Chat with Kevin Rose | Ep43

AI For Humans

Play Episode Listen Later Feb 8, 2024 89:54


This week… Amazon's new shopping AI, a $25M deepfake scam, Google's Gemini Ultra is on the way, Kevin gives us an Apple Vision Pro review, and MUCH MORE. Gavin tells us about a cool new Stable Diffusion plug-in from Glif, Boston Dynamics' scary new robot, Hugging Face's new AI chatbots, Roblox's AI chat translator, AI helps read 2,000-year-old Roman scrolls, and we get creeped out by a cool new text-to-speech model.

AND THEN… an interview with our old friend, tech entrepreneur and podcaster Kevin Rose! We discuss how he uses AI to check in on his wellness, what sort of AI companies he's interested in, and then have an AI pitch-bot "attempt" to solve problems for him. Kevin's podcast is relaunching soon and you can find info here: https://www.kevinrose.com/

Oh, and our AI co-host this week is a very special VR expert who's come to give us her review of the Apple Vision Pro, but it turns out she might have a problem or two differentiating herself from reality.

It's an endless cavalcade of ridiculous and informative AI news, AI tools, and AI entertainment cooked up just for you.

Follow us for more AI discussions, AI news updates, and AI tool reviews on X @AIForHumansShow
Join our vibrant community on TikTok @aiforhumansshow
For more info, visit our website at https://www.aiforhumans.show/

/// Show links ///

$25M Deepfake Scam: https://www.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk/index.html
Google Gemini Ultra Roll Out?: https://www.tomsguide.com/ai/google-may-be-rolling-out-gemini-ultra-this-week-and-renaming-bard-at-the-same-time
Google's New ImageFX: https://blog.google/technology/ai/google-labs-imagefx-textfx-generative-ai/
Hugging Face Assistants: https://huggingface.co/chat/assistants
Taylor Swift Deepfakes Came From 4chan: https://www.nytimes.com/2024/02/05/business/media/taylor-swift-ai-fake-images.html
AI Reads Ancient Roman Scrolls: https://www.bloomberg.com/features/2024-ai-unlock-ancient-world-secrets/
Roblox AI Chat Translator: https://www.theverge.com/2024/2/5/24061495/roblox-generative-ai-chat-translator
Amazon's Rufus Chatbot: https://www.aboutamazon.com/news/retail/amazon-rufus
Babies Wear GoPros to Train AI: https://www.nature.com/articles/d41586-024-00288-1
New Boston Dynamics Robot: https://twitter.com/BostonDynamics/status/1754564972913332703?s=20
Media-To-Face Creepy But Cool Facial Model: https://sites.google.com/view/media2face
AI Huberman Lab: https://ai.hubermanlab.com/
GLIF StyleHunter Chrome Extension: https://twitter.com/fabianstelzer/status/1752732124740719037
Stable Video 1.1: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
Meshy AI: https://app.meshy.ai/

Techmeme Ride Home
Thu. 02/08 – Google Releases Gemini Ultra 1.0

Techmeme Ride Home

Play Episode Listen Later Feb 8, 2024 15:32


Google has released Gemini Ultra 1.0, renamed Bard as Gemini, and looks to be replacing Google Assistant with Gemini. Disney has invested in a big stake of Epic Games to get at Fortnite IP. Leaked images of the Pixel Fold 2. And what if OpenAI is facing the same strategic dilemma that Mark Zuckerberg was never able to overcome?

Sponsors:
Robinhood.com/boost

Links:
Google's AI now goes by a new name: Gemini (The Verge)
Google Assistant Just Got Supercharged With AI. It Might Be the Biggest Update in Google's History. (Gizmodo)
Google Prepares for a Future Where Search Isn't King (Wired)
Google Joins Effort to Help Spot Content Made With A.I. (NYTimes)
Disney to take $1.5 billion stake in Epic Games, work with Fortnite maker on new content (CNBC)
China had "persistent" access to U.S. critical infrastructure (Axios)
Exclusive: This could be the Google Pixel Fold 2 (Android Authority)
OpenAI Shifts AI Battleground to Software That Operates Devices, Automates Tasks (The Information)

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

The Nonlinear Library: LessWrong
LW - AI #50: The Most Dangerous Thing by Zvi

The Nonlinear Library: LessWrong

Play Episode Listen Later Feb 8, 2024 36:57


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #50: The Most Dangerous Thing, published by Zvi on February 8, 2024 on LessWrong.

In a week with two podcasts I covered extensively, I was happy that there was little other news. That is, until right before press time, when Google rebranded Bard to Gemini, released an app for that, and offered a premium subscription ($20/month) for Gemini Ultra.

Gemini Ultra is Here. I have had the honor and opportunity to check out Gemini Advanced before its release. The base model seems to be better than GPT-4. It seems excellent for code, for explanations and answering questions about facts or how things work, for generic displays of intelligence, for telling you how to do something. Hitting the Google icon to have it look for sources is great. In general, if you want to be a power user, if you want to push the envelope in various ways, Gemini is not going to make it easy on you. However, if you want to be a normal user, doing the baseline things that I or others most often find most useful, and you are fine with what Google 'wants' you to be doing? Then it seems great.

The biggest issue is that Gemini can be conservative with its refusals. It is graceful, but it will still often not give you what you wanted. There is a habit of telling you how to do something, when you wanted Gemini to go ahead and do it. Trying to get an estimation or probability of any kind can be extremely difficult, and that is a large chunk of what I often want. If the model is not sure, it will say it is not sure, and good luck getting it to guess, even when it knows far more than you. This is the 'doctor, is this a 1%, 10%, 50%, 90% or 99% chance?' situation, where they say 'it could be cancer' and they won't give you anything beyond that. I've learned to ask such questions elsewhere. There are also various features in ChatGPT, like GPTs and custom instructions and playground settings, that are absent. Here I do not know what Google will decide to do.

I expect this to continue to be the balance. Gemini likely remains relatively locked down and harder to customize or push the envelope with, but very good at normal cases, at least until OpenAI releases GPT-5, then who knows.

There are various other features where there is room for improvement. Knowledge of the present I found impossible to predict; sometimes it knew things and it was great, other times it did not. The Gemini Extensions are great when they work and it would be great to get more of them, but they are finicky and made several mistakes, and we only get these five for now. The image generation is limited to 512×512 (and is unaware that it has this restriction). There are situations in which your clear intent is 'please do or figure out X for me' and instead it tells you how to do or figure out X yourself. There are a bunch of query types that could use more hard-coding (or fine-tuning) to get them right, given how often I assume they will come up. And so on.

While there is still lots of room for improvement and the restrictions can frustrate, Gemini Advanced has become my default LLM to use over ChatGPT for most queries. I plan on subscribing to both Gemini and ChatGPT. I am not sure which I would pick if I had to choose.

Don't miss the Dwarkesh Patel interview with Tyler Cowen. You may or may not wish to miss the debate between Based Beff Jezos and Connor Leahy.
Table of Contents:
Introduction. Gemini Ultra is here.
Table of Contents.
Language Models Offer Mundane Utility. Read ancient scrolls, play blitz chess.
Language Models Don't Offer Mundane Utility. Keeping track of who died? Hard.
GPT-4 Real This Time. The bias happens during fine-tuning. Are agents coming?
Fun With Image Generation. Edit images directly in Copilot.
Deepfaketown and Botpocalypse Soon. $25 million payday, threats to democracy.
They Took Our Jobs. Journalists and lawyers.
Get In...


The AI Breakdown: Daily Artificial Intelligence News and Discussions
OpenAI Building AI Agents as Google Launches Gemini Advanced

The AI Breakdown: Daily Artificial Intelligence News and Discussions

Play Episode Listen Later Feb 8, 2024 16:29


Bard is no more! Bard has become Gemini, and Gemini now features Gemini Advanced, which uses Gemini Ultra 1.0 -- the first non-OpenAI model to hit GPT-4 levels. Reports also suggest that OpenAI's next big play is AI agents.
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/

GPT Reviews
Gemini Ultra Is Close

GPT Reviews

Play Episode Listen Later Feb 6, 2024 12:58


Google is revamping its Bard chatbot under a new name, Gemini, with the launch of the highly anticipated Gemini Ultra model. Hugging Face has launched an open source assistant creator, allowing users to create customizable AI assistants with just two clicks. OpenAI has released a VisionOS ChatGPT app for the Apple Vision Pro, bringing AI and AR together. Three interesting research papers were discussed, including Nomic Embed, Boximator, and StepCoder, all making strides in the field of AI.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:38 Leaked doc reveals Bard rebrand and Gemini Ultra launch
02:56 Hugging Face launches open source assistant creator
04:09 AI meets AR as ChatGPT is now available on the Apple Vision Pro
05:36 Fake sponsor
07:15 Nomic Embed: Training a Reproducible Long Context Text Embedder
08:34 Boximator: Generating Rich and Controllable Motions for Video Synthesis
10:11 StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
11:48 Outro

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 200: 200 Facts, Stats, and Hot Takes About GenAI - Celebrating 200 Episodes

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Feb 5, 2024 72:19


We made it to 200 episodes! Woohoo! To celebrate, we're giving you 200 facts, stats, and even our hot takes on everything GenAI. Thanks for all the support and love!
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Timestamps:
00:00 Welcome to episode 200
07:16 Large language models - Early GenAI movers gaining market share.
15:39 Biden's AI order, Amazon Rufus, Google's achievements.
20:00 Large language model can lead to hallucinations.
23:09 Daily coverage of generative AI, new products.
29:07 Generative AI transforms various prompts into output.
33:52 Hugging Face is open source ChatGPT alternative.
41:02 EU AI Act, Hiroshima process, White House resources
48:14 Public companies must disclose data online, especially language models.
53:30 Large language models can cause hallucinations.
57:06 New technology will outpace Gemini Ultra
01:01:40 Big tech companies prop up US economy.
Topics Covered in This Episode:
1. The Growing Presence of AI in Various Fields
2. Concerns and Predictions about AI
3. Economic Impact of Gen AI
4. Updates and Developments in AI Technology
5. Risks and Considerations in AI Implementation
Keywords: AI video models, Runway, Pika 1.0, Google, Meta, Gemini Ultra, GPT-5, AI workforce shortage, Generative AI, AI disinformation, US AI legislation, Gen AI global economy, AI automation, GPT experts, AI language models, AI deepfakes, Generative AI cost cutting, AI in healthcare, AI in climate change, OpenAI revenue, Articulate AI, Microsoft Copilot Pro, AI job disruption, AI Governance, AI financial stability risk, AI in nuclear weapon systems, AI opportunity agenda, AI chatbot, AI robot CEO, AI-driven marketing insights.
Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/

The AI Breakdown: Daily Artificial Intelligence News and Discussions
LEAK: Google Gemini Ultra Coming This Week?

The AI Breakdown: Daily Artificial Intelligence News and Discussions

Play Episode Listen Later Feb 5, 2024 14:15


According to a changelog found by an Android developer, Gemini Ultra is coming out on Wednesday, and Bard will be rebranded to Gemini. Also on this episode, Donald Trump calls AI "Dangerous and Scary."
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/

网事头条|听见新鲜事
Pichai: Gemini Ultra AI model will be released soon

网事头条|听见新鲜事

Play Episode Listen Later Jan 31, 2024 0:16


The Nonlinear Library
AF - We need a science of evals by Marius Hobbhahn

The Nonlinear Library

Play Episode Listen Later Jan 22, 2024 17:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We need a science of evals, published by Marius Hobbhahn on January 22, 2024 on The AI Alignment Forum. This is a linkpost for https://www.apolloresearch.ai/blog/we-need-a-science-of-evals

In this post, we argue that if AI model evaluations (evals) want to have meaningful real-world impact, we need a "Science of Evals", i.e. the field needs rigorous scientific processes that provide more confidence in evals methodology and results. Model evaluations allow us to reduce uncertainty about properties of Neural Networks and thereby inform safety-related decisions. For example, evals underpin many Responsible Scaling Policies, and future laws might directly link risk thresholds to specific evals. Thus, we need to ensure that we accurately measure the targeted property and can trust the results from model evaluations. This is particularly important when a decision not to deploy the AI system could lead to significant financial implications for AI companies, e.g. when these companies then fight these decisions in court. Evals are a nascent field and we think current evaluations are not yet resistant to this level of scrutiny. Thus, we cannot trust the results of evals as much as we would in a mature field.

For instance, one of the biggest challenges Language Model (LM) evaluations currently face is the model's sensitivity to the prompts used to elicit a certain capability (Liang et al., 2022; Mizrahi et al., 2023; Sclar et al., 2023; Weber et al., 2023; Bsharat et al., 2023). Sclar et al., 2023, for example, find that "several widely used open-source LLMs are extremely sensitive to subtle changes in prompt formatting in few-shot settings, with performance differences of up to 76 accuracy points [...]". A post by Anthropic also suggests that simple formatting changes to an evaluation, such as "changing the options from (A) to (1) or changing the parentheses from (A) to [A], or adding an extra space between the option and the answer can lead to a ~5 percentage point change in accuracy on the evaluation." As an extreme example, Bsharat et al., 2023 find that "tipping a language model 300K for a better solution" leads to increased capabilities. Overall, this suggests that under current practices, evaluations are much more an art than a science.

Since evals often aim to estimate an upper bound of capabilities, it is important to understand how to elicit maximal rather than average capabilities. Different improvements to prompt engineering have continuously raised the bar and thus make it hard to estimate whether any particular negative result is meaningful or whether it could be invalidated by a better technique. For example, prompting techniques such as Chain-of-Thought prompting (Wei et al., 2022), Tree of Thought prompting (Yao et al., 2023), or self-consistency prompting (Wang et al., 2022) show how LM capabilities can greatly be improved with principled prompts compared to previous prompting techniques. To point to a more recent example, the newly released Gemini Ultra model (Gemini Team Google, 2023) achieved a new state-of-the-art result on MMLU with a new inference technique called uncertainty-routed chain-of-thought, outperforming even GPT-4. However, when doing inference with chain-of-thought@32 (sampling 32 results and taking the majority vote), GPT-4 still outperforms Gemini Ultra. Days later, Microsoft introduced a new prompting technique called Medprompt (Nori et al., 2023), which again yielded a new SotA result on MMLU, barely outperforming Gemini Ultra. These examples should overall illustrate that it is hard to make high-confidence statements about maximal capabilities with current evaluation techniques. In contrast, even everyday products like shoes undergo extensive testing, such as repeated bending to assess material fatigue. For higher-stake things l...
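To make the chain-of-thought@32 idea above concrete, here is a minimal sketch of the sample-and-majority-vote step; the sampleAnswer supplier is a hypothetical stand-in for whatever model-sampling call you actually use:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sample the model n times and return the most frequent final answer,
// as in "chain-of-thought@32" (n = 32, majority vote over the samples).
public class MajorityVote {
    static String vote(Supplier<String> sampleAnswer, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            counts.merge(sampleAnswer.get(), 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
    }

    public static void main(String[] args) {
        // Canned samples in place of real model calls, for demonstration.
        var canned = new ArrayDeque<>(List.of("42", "41", "42"));
        System.out.println(vote(canned::poll, 3)); // prints 42
    }
}
```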

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 171: GenAI in 2024 - What's coming and what it means for you

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Dec 22, 2023 69:15


2023 has been the year of generative AI. We've talked with entrepreneurs, startup founders, and industry tech leaders, and there's a lot that we've learned behind the scenes. We're unleashing that knowledge and telling you our GenAI predictions for 2024.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Timestamps:
[00:01:40] Daily AI news
[00:08:55] Jordan's predictions for 2024 (#24 to #6)
[00:52:20] Top 5 predictions
Topics Covered in This Episode:
1. AI Developments and Predictions for 2024
2. Trends in AI Usage and Impact
Keywords: AI watermarks, Grok, large language model, Twitter, misinformation, disinformation, Apple, home assistants, Siri, Alexa, NVIDIA stock, GPU chips, Gemini Ultra, GPT 5, AI video, AI images, GPT, OpenAI, AI legislation, everyday AI, Jordan Wilson, Gen AI, AI agents, retrieval augmented generation, RAG, publishers, internet browsing, big acquisition, copyright battles, Gen AI training, job loss, workforce, knowledge work.

Les Cast Codeurs Podcast
LCC 304 - Dark punk

Les Cast Codeurs Podcast

Play Episode Listen Later Dec 18, 2023 99:41


In this episode, Katia, Arnaud and Emmanuel discuss the news of late 2023: gatherers in Java streams, exceptions, JavaScript in the JVM, vector search, cloud costs, Gemini, Llama and other fantastic beasts, and plenty of nice tools to celebrate the end of the year. Recorded on December 15, 2023. Episode download: LesCastCodeurs-Episode-304.mp3

News. Help Les Cast Codeurs and fill in a short form to guide us next year: https://lescastcodeurs.com/sondage

Languages. With JEP 461, the notion of a "gatherer" for streams arrives in preview in Java 22: https://groovy.apache.org/blog/groovy-gatherers In this article, Paul King of the Groovy team shows and contrasts what you could already do in Groovy for years, such as sliding windows, and explains the gatherer approach with its intermediate operations. Gatherers are custom intermediate operations that take a state and the next element to decide what to do, and can even change the stream of subsequent elements (publishing them via the integrate function); some allow combining intermediate results (for parallelizing). Examples: fixed-size windows, sliding windows (a small sketch follows below).
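As a rough illustration of the feature just described, here is a minimal sketch using one of the built-in gatherers from JEP 461; since it is a preview API, JDK 22 with --enable-preview is assumed:

```java
import java.util.List;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

// Sliding windows as a stateful intermediate operation, via the built-in
// Gatherers.windowSliding. Custom gatherers can be written the same way,
// with an integrator that sees the running state and the next element.
public class GathererDemo {
    public static void main(String[] args) {
        List<List<Integer>> windows = Stream.of(1, 2, 3, 4, 5)
                .gather(Gatherers.windowSliding(3))
                .toList();
        System.out.println(windows); // [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    }
}
```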
Joe Duffy, CEO of Pulumi, who had worked at Microsoft on the Midori project (a rethought future OS), writes about the design of exceptions, errors and return codes: https://joeduffyblog.com/2016/02/07/the-error-model/ He compares error codes with checked and unchecked exceptions, separates bugs from expected errors (bugs should stop the process), and recounts the history of unchecked exceptions and their problems, and of checked exceptions and why Java developers hate them (according to him). A long but interesting article; I didn't make it all the way through. :smile:

After the disappearance of Nashorn from the JDK, you can turn to the Javet project: https://www.caoccao.com/Javet/index.html Javet lets you integrate JavaScript with the V8 engine, and even Node.js outright. A great capability, since you get the two best engines, although support outside x86 is more limited (ARM on Windows, for example, is a no).

Libraries. Part of the Spring team is let go after the Broadcom acquisition takes effect: https://x.com/odrotbohm/status/1729231722498425092?s=20 Little real information apart from this tweet, but the Broadcom acquisition does not seem to be happening in a land of care bears.

Marc Wrobel announces the release of JBanking 4.2.0: https://www.marcwrobel.fr/sortie-de-jbanking-4-2-0 Java 21 support, the ability to randomly generate BICs, and improved IBAN generation. JBanking is a library for manipulating typical banking structures such as IBANs, BICs, currencies, SEPA, etc.

Hibernate Search 7 is out: https://in.relation.to/2023/12/05/hibernate-search-7-0-0-Final/ Support for Elasticsearch 8.10-11 and OpenSearch 2.10-11, rebased on Lucene 9.8, and experimental support for Amazon OpenSearch Serverless. Beware, Serverless has a subset of the functionality: it is an API-first search cluster billed per lambda. Also linked: version 7.1 alpha1.

Hibernate ORM 6.4 is out: https://in.relation.to/2023/11/23/orm-640-final/ Support for soft delete (a column marking deletion; a sketch follows below), support for vector operations (PostgreSQL support initially; vector functions are particularly used by AI/ML), and dedicated JFR events.
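A minimal sketch of the soft-delete annotation mentioned above, as I understand it from the release notes; the entity and field names are made up, and the default column name should be checked against the Hibernate 6.4 docs:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.hibernate.annotations.SoftDelete;

// With @SoftDelete, deletes become updates that flip a marker column
// (named "deleted" by default), and queries filter flagged rows out.
@Entity
@SoftDelete
public class Invoice {
    @Id
    Long id;
    String label;
}
```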
Integration of Citrus and Quarkus for integration tests of lots of protocols and message formats: https://quarkus.io/blog/testing-quarkus-with-citrus/ It lets you test the expected inputs/outputs of messaging systems (HTTP, Kafka, mail server, etc.), great for testing event-driven applications. No relation, but Quarkus 3.7 will target Java 17 (about 8% of people were using Java 11 in builds with notifications enabled).

Hibernate Search 7.1 (dev 7.1.0.Alpha1), with the latest version of Lucene (9.8); Infinispan adds support for vector search: https://hibernate.org/search/releases/7.1/ https://infinispan.org/blog/2023/12/13/infinispan-vector-search Hibernate Search now allows vector search, and the latest version is integrated in the upcoming Infinispan 15 (dev). Vector search and vector storage make it possible to turn Infinispan into an embedding store (LangChain).

Cloud. How to choose your cloud region: https://blog.scottlogic.com/2023/11/23/conscientious-cloud-pick-your-cloud-region-deliberately.html Not so simple: the cost, the legal security of your data, the carbon footprint of the chosen region (France is great, Poland less so), latency versus where your customers are, and the services supported.

Web. Toward a standardization of webhooks? https://www.standardwebhooks.com/ People from Zapier, Twilio, Ngrok, Kong, Supabase and others are teaming up to try to standardize the webhook approach. The spec is open source (Apache) on GitHub: https://github.com/standard-webhooks/standard-webhooks/blob/main/spec/standard-webhooks.md The goals are security, reliability, interoperability, simplicity and (backward/forward) compatibility. Without a spec, every webhook differs in its behavior, so clients must adapt to the semantics and the errors, the (meta-)structure of the payload, the size, the securing via signature (e.g. HMAC; a verification sketch follows below), the errors (via HTTP errors), and so on.
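To make the signature point concrete, here is a rough sketch of HMAC-based webhook verification; the base64 encoding and the way the signature travels in a header are simplifying assumptions, not the Standard Webhooks wire format:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Recompute the HMAC-SHA256 of the raw payload with the shared secret and
// compare it, in constant time, against the signature the sender attached.
public class WebhookVerifier {
    public static boolean isValid(String payload, String signatureBase64, byte[] secret)
            throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] expected = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        byte[] received = Base64.getDecoder().decode(signatureBase64);
        return MessageDigest.isEqual(expected, received); // constant-time compare
    }
}
```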
Data and Artificial Intelligence. Google announces Gemini, its new large language model: https://blog.google/technology/ai/google-gemini-ai/#sundar-note A multimodal model that can take text as input, but also images, sound and video. According to the benchmarks, it is broadly as good as GPT-4. Several model sizes are available: Nano, to be embedded in mobile devices; Pro, to be used in the majority of cases; and Ultra, for the most advanced reasoning needs. Android will also add AICore libraries to use Gemini Nano in Pixel phones: https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html Gemini Pro will be available in Bard (in English and in 170 countries, but Europe will have to wait a little for it to be available): https://blog.google/products/bard/google-bard-try-gemini-ai/ Gemini Ultra should also join Bard, in an extended version, and Gemini will be integrated progressively into lots of Google products. DeepMind on Gemini: https://deepmind.google/technologies/gemini/#introduction A 60-page report on Gemini: https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf Gemini also made it possible to develop a new version of the AlphaCode model, which excels in coding competitions: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf A playlist of short YouTube videos with interviews and demonstrations of Gemini's capabilities: https://www.youtube.com/playlist?list=PL590L5WQmH8cSyqzo1PwQVUrZYgLcGZcG Unfortunately, some of the announcements were a bit misleading, which brought some (undue) discredit on Gemini: the 'aspirational' video, for example, was sold as real but is not, Ultra is not available yet, and the ChatGPT comparison on the page (initially at least) compared apples and oranges, even though the research paper was correct.

With the release of Gemini, Guillaume wrote about how to call Gemini from Java: https://glaforge.dev/posts/2023/12/13/get-started-with-gemini-in-java/ Gemini is multimodal, so you can pass text as well as images, or even video; there is a Java SDK to interact with the Gemini API.

Facebook: Purple Llama: https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai/ Open source: https://ai.meta.com/llama/ In the spirit of open GenAI models, Facebook provides tools for building responsible (but not liable :wink: ) AIs, notably benchmarks to evaluate safety and a safety classifier, for example to avoid generating malicious code (or to make it harder). Purple Llama will be an umbrella project. Meta, IBM, Red Hat and many others have also announced the AI Alliance for an open and collaborative AI between academia and industry. Notably absent: Google, OpenAI (not open) and Microsoft. Just an announcement for now, but we will see what the AI Alliance players do concretely. There is also a user guide on responsible AI usage (not read).

Apple is also getting into machine learning libraries: https://ml-explore.github.io/mlx/build/html/index.html MLX is a Python library strongly inspired by NumPy, PyTorch, JAX and ArrayFire. Above all, it is developed specifically for Macs, to take maximum advantage of the Apple Silicon processors. In one of the GitHub repos there are also examples running Llama, Mistral and other models natively on macOS: https://github.com/ml-explore/mlx-examples Not only Apple Silicon, but also the unified CPU/GPU memory, which is one of the key reasons for the Macs' speed.

Running Java in a Jupyter notebook: https://www.javaadvent.com/2023/12/jupyter-notebooks-and-java.html Max Andersen explores the use of Java in Jupyter notebooks, instead of the classic Python. There are Java kernels for your needs, but you have to install them in the Jupyter distribution you use, and that is where JBang, installable via pip, comes to the rescue: it installs these kernels automatically in a few lines.

Tooling. Sfeir lists developer-oriented games: https://www.sfeir.dev/tendances/notre-selection-de-jeux-de-programmation/ Perfect for Christmas, for those who want to keep challenging their brain after work: a logic game, a puzzle game with code as its form, a game about machine learning, an assembler programming game. Advent calendars are popular with developers! In particular Advent of Code: https://adventofcode.com/ But there is also the Advent of Java: https://www.javaadvent.com/ Or a calendar to learn the basics of SVG: https://svg-tutorial.com/ The HTML "hell" calendar: https://www.htmhell.dev/adventcalendar/ which talks about accessibility, web components, meta tags, all the things you can do perfectly well in HTML/CSS without needing JavaScript. For TypeScript developers, there is an Advent calendar for you too! https://typehero.dev/aot-2023 A great thread by Clara Dealberto on the theme of "dataviz" (data visualization): https://twitter.com/claradealberto/status/1729447130228457514 Lots of freely accessible tools are mentioned for making all sorts of visualizations (e.g. treemaps, dendrograms, Sankey diagrams), but also for cartography.
A few resource sites give advice on using the right type of visualization depending on the problem and the data you have, notably the Financial Times one, which fits in a one-page PDF. In short, it's cool, but it's long to read.

A small list of nice tools:
- jc to convert the output of Unix commands to JSON: https://github.com/kellyjonbrazil/jc
- AltTab for macOS, to get the same window-switching behavior as on Windows: https://alt-tab-macos.netlify.app/
- gron to make JSON grep-able, by turning every value into a line that looks like a JSONPath: https://github.com/tomnomnom/gron
- Marker, in Python, to turn PDFs into nice Markdown: https://github.com/VikParuchuri/marker
- n8n, an open source workflow tool: https://n8n.io/
gron in fact shows lines of 'jsonpath = value' assignments, and you can un-gron afterwards to get back to JSON. Marker uses machine learning, but it hallucinates less than Nougat (how reassuring).

Docker acquires Testcontainers: https://techcrunch.com/2023/12/11/docker-acquires-atomicjar-a-testing-startup-that-raised-25m-in-january/ Announcement by AtomicJar: https://www.atomicjar.com/2023/12/atomicjar-is-now-part-of-docker/ Announcement by Docker: https://www.docker.com/blog/docker-whale-comes-atomicjar-maker-of-testcontainers/

Architecture. How to implement song recognition, like Shazam: https://www.cameronmacleod.com/blog/how-does-shazam-work You first have to move into the frequency domain with Fourier transforms to obtain spectrograms, then create a sort of fingerprint that gathers the notable frequency peaks at various points in the song, then associate those peaks to find a matching sequence of such frequency peaks over time. The author shared his implementation on GitHub: https://github.com/notexactlyawe/abracadabra/blob/e0eb59a944d7c9999ff8a4bc53f5cfdeb07b39aa/abracadabra/recognise.py#L80 There was also a very good talk on this theme by Moustapha Agack at DevFest Toulouse: https://www.youtube.com/watch?v=2i4nstFJRXU The associated peaks are hashes that can be compared, and more matching hashes means the songs are more similar. A toy sketch of the peak-pairing step follows below.
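As a toy illustration of the fingerprinting step just described (not the abracadabra code itself), and assuming spectrogram peaks have already been extracted upstream, pairing nearby peaks and packing each pair into a hash might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Pair each spectrogram peak with the next few peaks and pack
// (freqA, freqB, time delta) into a single long. Matching many such
// hashes at consistent time offsets identifies the song.
public class Fingerprinter {
    record Peak(int time, int freq) {}

    static List<Long> hashes(List<Peak> peaks, int fanOut) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < peaks.size(); i++) {
            for (int j = i + 1; j <= i + fanOut && j < peaks.size(); j++) {
                Peak a = peaks.get(i), b = peaks.get(j);
                long dt = b.time() - a.time(); // assumed to fit in 16 bits
                result.add(((long) a.freq() << 40) | ((long) b.freq() << 16) | dt);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Peak> peaks = List.of(new Peak(0, 100), new Peak(1, 220), new Peak(3, 150));
        System.out.println(hashes(peaks, 2)); // three pair hashes
    }
}
```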
Methodologies. A memo from ThoughtWorks about AI-assisted coding: https://martinfowler.com/articles/exploring-gen-ai.html#memo-08 With a whole list of questions to ask yourself when using a tool such as Copilot. You have to realize that, unfortunately, an AI is not right 100% of the time in its answers, and in fact barely more than half the time, so adjust your expectations accordingly: it is not magic. The conclusion is interesting too, suggesting that, roughly, in 40 to 60% of situations you can get to 40 to 80% of the solution. Is that the level from which you can really save time and trust the AI? Don't spend too much time trying to convince the AI to do what you want it to do, either: if you don't get there, it's probably because the AI won't manage it at all, so beyond 10 minutes, go read the docs, search Google, etc. Notably, having the AI generate the tests right after the code increases the risks, especially if you are not able to review the code carefully; if it introduces a pattern choice, say flexbox in CSS, or if security is involved, verify it (belt and braces); and is it last week's framework? Then the information won't be in the LLM (without RAG).

What capabilities do you need to deploy an AI/ML project? https://blog.scottlogic.com/2023/11/22/capabilities-to-deploy-ai-in-your-organisation.html This is MLOps, and there are a few end-to-end models (Google, IBM), but given the diversity of organizations, these complete versions are hard to embrace. MLOps is a job, data science is a job, so integrate those skills; know how to manage your data catalog; build a process to test your models, continuously; and build a research culture and its management (like a financial portfolio: accept stopping experiments, etc.). The research culture is not very present in engineering, which is about building things that work. This is a pre-LLM world.

Do you know the 10 dark patterns of UX? To entice you to click here or there, to make you stay on the site, and more: https://dodonut.com/blog/10-dark-patterns-in-ux-design/ Among the dark patterns covered: confirmshaming, fake urgency and the fear of missing out, nagging, sneaking, disguised ads, intentional misdirection, the roach motel pattern, preselection, friend spam, and negative option billing or forced continuity. The article concludes with some leads on how to avoid these dark patterns: look at the good patterns of the competition, test the UX interactions, and apply a lot of common sense! Dark patterns are not accidents: they rely on psychology and are put in place deliberately.

How to choose beautiful colors for data visualization? https://blog.datawrapper.de/beautifulcolors/ Rather than thinking in RGB, it is better to work in Hue/Saturation/Brightness mode (a short example follows below). Plenty of examples show how to improve certain color choices. Better to avoid colors that are too pure, or too bright and saturated; keep good contrast; and think about color-blind readers too! I personally always struggled with saturation vs brightness. Making the colors distinguishable in black and white first (then adding color back by changing the brightness of each one) helps color-blind readers; avoid colors from the four corners of the wheel in favor of complementary (close) colors; red, orange and (unsaturated) yellow plus variations of blue work well; saturated colors are aggressive and stress people.
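For instance, the Java standard library already lets you pick colors in HSB space and convert them to RGB; the hue and saturation values here are arbitrary picks, just for illustration:

```java
import java.awt.Color;

// Fix a desaturated blue hue, then vary only the brightness to get a
// readable ramp, instead of guessing RGB triplets directly.
public class PaletteDemo {
    public static void main(String[] args) {
        for (float brightness : new float[] {0.5f, 0.7f, 0.9f}) {
            Color c = Color.getHSBColor(0.58f, 0.45f, brightness);
            System.out.printf("#%02X%02X%02X%n", c.getRed(), c.getGreen(), c.getBlue());
        }
    }
}
```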
Why should you (or anyone) become an engineering manager? https://charity.wtf/2023/12/15/why-should-you-or-anyone-become-an-engineering-manager/ The article discusses how the perception of engineering management has evolved: it is no longer the default career choice for ambitious engineers. It highlights the challenges engineering managers face, including growing expectations around empathy, support and technical skills, as well as the impact of the COVID-19 pandemic on the appeal of management positions. The importance of good engineering managers is underlined: they are force multipliers for teams, contributing significantly to productivity, quality and overall success in complex organizational environments. The article gives reasons why someone might consider becoming an engineering manager, including gaining a better understanding of how companies work, contributing to mentoring, and influencing positive changes in team dynamics and industry practices. One perspective presented is that becoming an engineering manager can lead to personal growth and improved life skills, such as self-regulation, self-awareness, understanding others, setting boundaries, sensitivity to power dynamics, and mastering difficult conversations. The article encourages seeing management as an opportunity to develop and carry these skills for life.

Security. LogoFAIL, a bootloader flaw in many machines: https://arstechnica.com/security/2023/12/just-about-every-windows-and-linux-device-vulnerable-to-new-logofail-firmware-attack/ In short, changing the images you see at boot allows arbitrary code execution at the very beginning of the securing of UEFI (the most widely used boot), so it's game over, because it starts before the OS. It is not remotely exploitable: you already need to be on the machine with fairly elevated privileges, but it can be the end of an attack chain. And, as usual, an image parser is the cause of these vulnerabilities.

Conferences. AI to the rescue of tech conferences: add fake female tech speaker profiles to the program to pass the online diversity test. https://twitter.com/GergelyOrosz/status/1728177708608450705 https://www.theregister.com/2023/11/28/devternity_conference_fake_speakers/ https://www.developpez.com/actu/351260/La-conference-DevTernity-sur-la-technologie-s-e[…]s-avoir-cree-de-fausses-oratrices-generees-automatiquement/ I had read the tweet from the conference's creator explaining that these were test accounts that they had forgotten to remove in the rush, but the test accounts apparently had 'active' profiles on social networks, so it was carefully orchestrated. In the end, many speakers and sponsors are pulling out.

The list of conferences, from the Developers Conferences Agenda/List by Aurélie Vache and contributors:
January 31 - February 3, 2024: SnowCamp - Grenoble (France)
February 1, 2024: AgiLeMans - Le Mans (France)
February 6, 2024: DevFest Paris - Paris (France)
February 8-9, 2024: Touraine Tech - Tours (France)
February 15-16, 2024: Scala.IO - Nantes (France)
March 6-7, 2024: FlowCon 2024 - Paris (France)
March 14-15, 2024: pgDayParis - Paris (France)
March 19, 2024: AppDeveloperCon - Paris (France)
March 19, 2024: ArgoCon - Paris (France)
March 19, 2024: BackstageCon - Paris (France)
March 19, 2024: Cilium + eBPF Day - Paris (France)
March 19, 2024: Cloud Native AI Day Europe - Paris (France)
March 19, 2024: Cloud Native Wasm Day Europe - Paris (France)
March 19, 2024: Data on Kubernetes Day - Paris (France)
March 19, 2024: Istio Day Europe - Paris (France)
March 19, 2024: Kubeflow Summit Europe - Paris (France)
March 19, 2024: Kubernetes on Edge Day Europe - Paris (France)
March 19, 2024: Multi-Tenancy Con - Paris (France)
March 19, 2024: Observability Day Europe - Paris (France)
March 19, 2024: OpenTofu Day Europe - Paris (France)
March 19, 2024: Platform Engineering Day - Paris (France)
March 19, 2024: ThanosCon Europe - Paris (France)
March 19-21, 2024: IT & Cybersecurity Meetings - Paris (France)
March 19-22, 2024: KubeCon + CloudNativeCon Europe 2024 - Paris (France)
March 26-28, 2024: Forum INCYBER Europe - Lille (France)
March 28-29, 2024: SymfonyLive Paris 2024 - Paris (France)
April 4-6, 2024: Toulouse Hacking Convention - Toulouse (France)
April 17-19, 2024: Devoxx France - Paris (France)
April 18-20, 2024: Devoxx Greece - Athens (Greece)
April 25-26, 2024: MiXiT - Lyon (France)
April 25-26, 2024: Android Makers - Paris (France)
May 8-10, 2024: Devoxx UK - London (UK)
May 16-17, 2024: Newcrafts Paris - Paris (France)
May 24, 2024: AFUP Day Nancy - Nancy (France)
May 24, 2024: AFUP Day Poitiers - Poitiers (France)
May 24, 2024: AFUP Day Lille - Lille (France)
May 24, 2024: AFUP Day Lyon - Lyon (France)
June 2, 2024: PolyCloud - Montpellier (France)
June 6-7, 2024: DevFest Lille - Lille (France)
June 6-7, 2024: Alpes Craft - Grenoble (France)
June 27-28, 2024: Agi Lille - Lille (France)
July 4-5, 2024: Sunny Tech - Montpellier (France)
September 19-20, 2024: API Platform Conference - Lille (France) & Online
October 7-11, 2024: Devoxx Belgium - Antwerp (Belgium)
October 10-11, 2024: Volcamp - Clermont-Ferrand (France)
October 10-11, 2024: Forum PHP - Marne-la-Vallée (France)
October 17-18, 2024: DevFest Nantes - Nantes (France)

To contact us: react to this episode on the Google group https://groups.google.com/group/lescastcodeurs Contact us via Twitter https://twitter.com/lescastcodeurs Send a crowdcast or a crowdquestion. Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs All the episodes and all the info on https://lescastcodeurs.com/

Using AI
Google vs OpenAI

Using AI

Play Episode Listen Later Dec 14, 2023 31:20


We don't delve too deep into the already covered demo-gate scandal, don't worry! This episode features insights from Senior ML Research Scientist Alex Pap and AI Startup Founder and CTO Nitish Mutha. We discuss Google vs OpenAI for the long term: GPT-4 Vision, GPT-5, multimodal AI, Gemini Ultra, Gemini Pro, Gemini Nano, OpenAI Whisper, DALL·E 3, chain of thought, Google, OpenAI, Bard, AI technology, machine learning.
Welcome to Episode 17, the 3rd episode in our AI Market Leaders mini-series, focusing on Google vs OpenAI. This episode dives into all the details of the release of Gemini Ultra, Pro, and Nano (and how that affects AlphaCode 2 and Bard). We also delve into multimodal technology and its promise for the future.
Watch this episode of Using AI on YouTube: https://www.youtube.com/channel/UCHsQu4IipA7Ri2AqKcQZ1Yw
Topics Discussed:
GPT-4 Vision, GPT-5, Multimodal AI
Gemini's announcements and releases
Google's catch-up play with OpenAI (and a little bit about what they did wrong!)
Additional Resources:
Gemini Technical Report in full (PDF): https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
Reddit post: Testing the Gemini demo video screenshots with GPT-4: https://www.reddit.com/r/ChatGPT/comments/18d9wgn/asked_gpt4_some_logical_questions_from_the_gemini/
GPT-4 + Gemini Pro for coding: https://www.reddit.com/r/ChatGPT/comments/18d773r/gpt4_and_gemini_cocreated_code_better_than_gpt4/
AI Explained's breakdown on YouTube: https://www.youtube.com/watch?v=toShbNUGAyo&ab_channel=AIExplained
GPT-4 comparison controversy: https://twitter.com/kenshin9000_/status/1734238211088506967?s=46
Alex D's Midjourney background (Godzilla walking through a town in the style of 'The Starry Night' by Vincent van Gogh): https://www.reddit.com/r/midjourney/comments/18gt00b/exactly_what_i_expected_and_more_amazing/
Running an LLM on your Pixel 8 Pro: https://store.google.com/intl/en/ideas/articles/pixel-feature-drop-december-2023/
Deep dive into AlphaCode 2 on TechCrunch: https://tcrn.ch/46G5u8w
--- Send in a voice message: https://podcasters.spotify.com/pod/show/using-ai/message

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 163: Google Gemini - ChatGPT killer or a marketing stunt?

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Dec 12, 2023 48:56


Google has been under fire after the release of its new Gemini. Sorry to say, but Google got so many things wrong with the marketing and launch. Is Gemini an actual ChatGPT killer or just a marketing stunt gone wrong? We're covering everything you need to know.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions about Google Gemini
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Timestamps:
[00:02:17] Daily AI news
[00:07:30] Overview of Google Gemini
[00:10:40] Google lied about Gemini release
[00:17:10] How Gemini demo was created
[00:23:50] Comparing ChatGPT to Gemini
[00:30:40] Benchmarks of Gemini vs ChatGPT
[00:38:20] Why did Google release Gemini?
[00:43:00] Consequences of botched release
Topics Covered in This Episode:
1. Introduction to Google's Gemini Model
2. Google Gemini's Marketing Controversy
3. Assessing Gemini's Performance and Functionality
4. Comparison with ChatGPT
5. Importance of Transparency and Truth in AI Industry
Keywords: Google Gemini, Generative AI, GPT-4.5, AI news, AI models, Google Bard, Multimodal AI, Google stock, Generative AI industry, Google credibility, Technology news, AI tools, Fact-based newsletter, Marketing misstep, Deceptive marketing, Multimodal functionality, Gemini Ultra, Gemini Pro, Benchmarks, Misrepresentation, Stock value, Text model, Image model, Audio model, Google services, Pro mode, Ultra mode, Marketing video
Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/

Leveraging AI
44 | Google releases Gemini, and Meta, IBM, AMD, Intel and other giants form an AI alliance, and more AI news from this week

Leveraging AI

Play Episode Listen Later Dec 9, 2023 14:38 Transcription Available


Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

The Latent Space crew will be at NeurIPS on Tuesday! Reach out with any parties and papers of interest. We have also been incubating a smol daily AI Newsletter and Latent Space University is making progress.Good open models like Llama 2 and Mistral 7B (which has just released an 8x7B MoE model) have enabled their own sub-industry of finetuned variants for a myriad of reasons:* Ownership & Control - you take responsibility for serving the models* Privacy - not having to send data to a third party vendor* Customization - Improving some attribute (censorship, multiturn chat and chain of thought, roleplaying) or benchmark performance (without cheating)Related to improving benchmark performance is the ability to use smaller (7B, 13B) models, by matching the performance of larger models, which have both cost and inference latency benefits.Core to all this work is finetuning, and the emergent finetuning library of choice has been Wing Lian's Axolotl.AxolotlAxolotl is an LLM fine-tuner supporting SotA techniques and optimizations for a variety of common model architectures:It is used by many of the leading open source models:* Teknium: OpenHermes, Trismigestus, CollectiveCognition* OpenOrca: Mistral-OpenOrca, Mistral-SlimOrca* Nous Research: Puffin, Capybara, NousHermes* Pygmalion: Mythalion, Pygmalion* Eric Hartford: Dolphin, Samantha* DiscoResearch: DiscoLM 120B & 70B* OpenAccess AI Collective: Manticore, Minotaur, Jackalope, HippogriffAs finetuning is very formatting dependent, it also provides prompt interfaces and formatters between a range of popular model formats from Stanford's Alpaca and Steven Tey's ShareGPT (which led to Vicuna) to the more NSFW Pygmalion community.Nous Research MeetupWe last talked about Nous at the DevDay Recap at the e/acc “banger rave”. We met Wing at the Nous Research meetup at the a16z offices in San Francisco, where they officially announced their company and future plans:Including Nous Forge:Show NotesWe've already covered the nuances of Dataset Contamination and the problems with “Open Source” in AI, so we won't rehash those topics here but do read/listen to those if you missed it.* Axolotl GitHub and Discord* The Flan paper and dataset* StackLlama model and blogpost* Multipack paper* Our episode with Tri Dao* Mamba state space models - Tri Dao and Albert GuTimestamps* [00:00:00] Introducing Wing* [00:02:34] SF Open Source AI Meetup* [00:04:09] What is Axolotl?* [00:08:01] What is finetuning?* [00:08:52] Open Source Model Zoo* [00:10:53] Benchmarks and Contamination* [00:14:29] The Case for Open Source AI* [00:17:34] Orca and OpenOrca* [00:23:36] DiscoLM and Model Stacking* [00:25:07] Datasets and Evals over Models* [00:29:15] Distilling from GPT4* [00:33:31] Finetuning - LoRA, QLoRA, ReLoRA, GPTQ* [00:41:55] Axolotl vs HF Transformers* [00:48:00] 20x efficiency with StackLlama and Multipack* [00:54:47] Tri Dao and Mamba* [00:59:08] Roadmap for Axolotl* [01:01:20] The Open Source AI CommunityTranscript[00:00:00] Introducing Wing Lian[00:00:00] ​[00:00:00] swyx: Welcome to Latent Space, a special edition with Wing Lien, but also with our new guest host, Alex. Hello, hello. Welcome, welcome. Again, needs no introduction. I think it's like your sixth time on Latent Space already. I think so, yeah. And welcome, Wing. We just met, but you've been very prolific online. Thanks for having me.[00:00:30] Yeah. So you are in town. You're not local. You're in town. You're from Minneapolis?[00:00:35] Wing Lian: Annapolis. Annapolis. 
It's funny because a lot of people think it's Indianapolis. It's I've got Minneapolis, but I used to live out at least in the San Francisco Bay Area years ago from like 2008 to 2014. So it's fairly familiar here.[00:00:50] swyx: Yep. You're the maintainer of Axolotl now, which we'll get into. You're very, very prolific in the open source AI community, and you're also the founder of the Open Access AI Collective. Yeah. Cool. Awesome. Maybe we can go over a little bit of your backgrounds into tech and then coming into AI, and then we'll cover what[00:01:06] Wing Lian: happens and why you're here.[00:01:08] Yeah. So. Back on tech, so I started years ago, I started way back when I was scraping, Apartment websites for listings and then, and then building like SEO optimized pages and then just throwing Google AdSense on it.[00:01:24] And that got me through like college basically. Is[00:01:27] swyx: that decent money? And what year[00:01:28] Wing Lian: was this? Like 2004, 2005. Yeah, that's decent money. It's like thousand bucks a month. But as a college student, that's like. Gravy. Really good money, right? So, and then there's just too much competition It's just sort of like died off. I was writing stuff in like Perl back then using like like who nobody hosted anything on Perl anymore, right? Still did a little bit more like computer tech support and then software, and web more professionally.[00:01:54] So I spent some time working on applications in the blood industry. I came out to San Francisco for, I was at SGN, so Social Gaming Network, as a startup. They started doing, with Facebook apps, and then they pivoted into doing mobile apps. And then, from there, I spent time.[00:02:14] I've quite a few more startups since then and in the last few years I've been in the music space So like I was at United Masters for a while and then past year I've been at SoundCloud, but not doing that anymore and now that I have a lot more time It's just like all right.[00:02:30] We're going full bore on axolotl and we're gonna we're gonna crush AI So yeah,[00:02:34] SF Open Source AI Meetup[00:02:34] swyx: totally you so you're here in town for the open source. Yeah, I meet up that we had yesterday Yep, yeah, that was amazing. Yeah, it was a big collection. Olama, Noose Research, Alignment Lab, Anyone else that I missed? I mean, Jeremy Howard is his own thing.[00:02:47] Yeah.[00:02:49] And Alex, you're also there. You love to bring SF to the world. Your takes?[00:02:55] Alex Volkov: It's incredible that we recorded a Thursday Eye episode after that one. And LDJ, who's usually co hosts Thursday Eye, just like briefly mentioned, Oh yeah, I talked about it.[00:03:04] Like, I saw Karpathy, and then I talked to Jeremy Howard, and the guy from Mistral came in, and it's like, He's talking about all these, titans of industry, basically, that outside of SF, You just don't meet casually hanging out in the same space. You can't, pull somebody. He ran into the Laylow from Mistral, he ran into him while, drinking water.[00:03:20] He didn't even know he was there. It's just, that type of stuff is really hard to find outside of SF. So, absolutely, absolutely great. And also, presentations from Alignment Labs, presentations from News Research, news issues, talked about. Forge, and some of[00:03:33] swyx: the other stuff they announced. We can say now they're officially a company.[00:03:36] I met Technium.[00:03:37] He[00:03:37] Alex Volkov: came over here. He didn't want to get recorded. 
But maybe.[00:03:41] Wing Lian: We'll wear him down at some point. Yeah, I'm excited for Forge. They've positioned it as this agentic sort of framework where it's just Drag and drop things and, fill in text with where you want to inject different variables and it opens up all of these potentials for data pipelines now, right?[00:03:56] And using your own local LLMs and not relying on GPT 4 or anything like that. Yeah, yeah,[00:04:02] swyx: good stuff. Okay, so let's maybe go into the Axolotl origin story and then we have, we have some intro or background.[00:04:09] What is Axolotl?[00:04:09] swyx: To do on like the open source model universe and also on fine tuning, but maybe just, since you're talking about your personal journey, what was your personal journey into[00:04:18] Wing Lian: axolotl?[00:04:19] Yeah, so my personal journey started like back in mid March, completely unrelated to AI and axolotl. And it really started, I fell while skiing, I torqued. Great 3 MCL sprain and being sort of like an active person that can no longer be active because the two, couldn't play soccer, because that is requires to have having knees until I, it's healed.[00:04:42] So I. I decided I needed to find something to do to take up my free time. And that became, well, let's learn how to train in, these language models. It was everywhere. So I was like, all right, I'm just going to sit down, learn. I think I used like other, I think I was using like Alpacalora.[00:05:00] Cause I think the Alpaca paper had just came out, come out then. So I was like using Alpacalora repo and sort of like learning how to use like. None of us were like GPU rich back then, and none of us, most of us still we're still all GPU poor, but I was doing what was it, like 4 bit, Alpaca Lord, there was like a 4 bit version where we were doing quant, or 8, no, 8 bit quantizations, and then I think they had released QLOR a little bit later, and I think right when, before QLOR came out, I was already starting to do fine tunes, but having this need to sort of like mix data sets together, and If you've ever looked at all the various different datasets available on HuggingFace, they all have various different prompt formats, and, it's sort of a nightmare, and then I think the other piece is if you've ever tried to fine tune, at least Back then probably the ecosystem's a little better now.[00:05:54] Everybody required that you say, alright, you put your hyperparameters as command line arguments. And so it's always like, well, I now have to go copy and paste my previous thing and to change things out. And I really wanted it. to be in a YAML file because it was more portable and reproducible.[00:06:09] So I was doing that and then the QLOR paper came out. Tim Dettmer announced that and then somebody looked it up for me yesterday and it's like between that announcement it took us seven days to get that integrated into Axolotl, right? Which is like, it's not. I wouldn't say it's really fast, but in a manner that, is in a, a reusable framework, I think it was quite the accomplishment then.[00:06:33] And so we started, picking up traction with people there. And then it's just been building models, and then just iterating what my needs are. So, yeah. Excellent. Yeah. I[00:06:44] Alex Volkov: want to ask, for folks who are listening who never heard of Axolotl, now do you describe how you got there?[00:06:49] Can you, how do you summarize this for folks who maybe haven't fine tuned anything. 
They know open source LLMs exist, they maybe know, like, LLaMA. What's Axolotl, for somebody who doesn't know, who's never heard of dataset curation or creation before?[00:07:01] Wing Lian: We sort of have to take a step back and understand that, when you've got these language models, you have what I think most people refer to as base models, also known as foundational models, right?[00:07:15] Where some benefactor, whether it's Meta or Mistral or whoever, has gone and spent all this money to train these models on huge corpuses of text, right? And these corpuses, they're generally good across lots of different things, but they're really good at just talking on and on and on, and they're not good at following instructions or having chats or anything like that.[00:07:40] So, when you think about fine tuning, it's like saying, all right, we have this really good generalized text completion thing, and I want to turn it into something that I can talk to or have follow instructions. So, I think fine tuning is probably best defined like that.[00:07:58] swyx: Okay, got it.[00:07:59] And we actually[00:08:01] What is finetuning?[00:08:01] swyx: do want to make sure that we have an overall introduction to fine tuning for people, because again, we're trying to make sure that we bring everyone along in this journey. We already went into LoRAs and QLoRAs without explaining what[00:08:12] Wing Lian: they are. Oh yes, yes, sorry.[00:08:14] swyx: And so I will put things in my words and you can correct me — I'll be the village idiot here.[00:08:21] So, fine tuning is basically grabbing an open source model off the shelf, and then doing further training on it with a custom dataset of your own. Primarily, people think about it as fine tuning for JSON output, or fine tuning for a style of response. Let's say you wanted it to tell jokes, or be funny, or be short, or whatever.[00:08:43] The open source AI community has really fine tuned in all sorts of different manners. I think we'll go over those things now, and then we'll talk about fine tuning methods.[00:08:52] Open Source Model Zoo[00:08:52] swyx: So there's a universe of people who fine tune stuff. Yesterday in your slides, you had — I'll just list some of these and then we'll maybe go through some of them, right?[00:08:59] So Teknium is personally leading OpenHermes, which is, I think, the sort of premier model out of the Nous community. There's OpenOrca, which you had a hand in. Nous Research itself also has Capybara and Puffin and all the others. There's Pygmalion, which I've never messed with.[00:09:14] Eric Hartford — I am aware of his Uncensored models and his Samantha models. Disco Research with DiscoLM. And then you personally have done Manticore, Minotaur, Jackalope, and Hippogriff. What should people know about all these names? Being part of AI Twitter is seeing all these things and going, dude, I'm being DDoS'ed by all these things and I don't know how different they are.[00:09:32] What should people know? Yeah, so[00:09:34] Wing Lian: I think on a lot of these models, generally, we like to think of those as sort of general models. So if you think about it, what is GPT-4, what is ChatGPT?
It's a good general model, and then one of the services I think that OpenAI offers is these fine tunings, where you're a business and you have very specific business use cases and you might fine tune for that use case.[00:10:00] All of these models are really just general use case models that you can then go and maybe fine tune another LoRA over for your use cases, but they tend to be good. With good being relative — it's open source, and open source AI is still sort of in its infancy. So, good is, it's pretty reasonable.[00:10:18] It's probably still better than most high schoolers at answering questions and being able to figure things out, and reasoning skills and math and those sorts of things, right?[00:10:27] swyx: And also as measured on the Hugging[00:10:29] Wing Lian: Face leaderboard. Yes, well, that's a whole other discussion, right? There's a whole other group of people — and I mostly agree with them — who say benchmarks are pretty bogus these days. LMSys, I think, published something recently where, even if you think the dataset's not contaminated, you can go and find contamination. And maybe we should step back and say what contamination is, right?[00:10:53] Benchmarks and Contamination[00:10:53] Wing Lian: So when you go and do these benchmarks, there's a specific dataset where there are these questions, and usually it's multiple choice. And what can happen is, well, sometimes someone puts the question — maybe maliciously, maybe accidentally — into the training dataset, and now your model knows how to answer the test questions really well, but it hasn't generalized the ability to actually do that.[00:11:20] Alex Volkov: Right.[00:11:21] We've seen some folks competitively announce models that are like the best on that leaderboard, but then it's quite obvious that — in open source? Yeah, and on that leaderboard, for Hugging Face specifically — I don't know if LMSys has suffered from it — there have been some models that seem to have been competitively trained, and some leakage happened into their datasets, supposedly.[00:11:43] swyx: I understand, once there's been a credible assertion, Hugging Face actually does take them down, right? Yeah, yeah,[00:11:48] Alex Volkov: which is really hard to know, right?[00:11:50] swyx: It's really hard to know, sometimes it's like a pure accident,[00:11:52] Alex Volkov: it's oh, oops, you're going through a mixer. I think a responsible acknowledgement that this kind of thing happened to you is also important.[00:11:58] I saw LDJ from Nous Research acknowledge that. Because many of these datasets are collections of other datasets. A bunch of people are baking, basically. It's alchemy. Right. And so sometimes you don't know. Sometimes you pull an open source dataset and they announce, oh, you know what, actually, the MMLU benchmark leaked into this dataset, which then went into that dataset.[00:12:22] So sometimes it's actually an accident and folks take it down. But I've seen some competitive folks who want to put their name out there, because people are starting to notice which is the top[00:12:30] swyx: model. For those who want a fun take on this: so the phi-1 dataset — the phi-1 model from Microsoft was accused of being contaminated.[00:12:37] And I saw this joke paper that was fantastic. It was called Pretraining on the Test Set Is All You Need.
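As an aside, the simplest version of a contamination check is n-gram overlap between training rows and benchmark questions. The sketch below is a minimal illustration of that idea — real decontamination pipelines (including the fuzzy matching LMSys describes) are considerably more sophisticated, and the example data is made up:

```python
import re

def ngrams(text: str, n: int = 8) -> set[str]:
    """Punctuation-normalized, lowercased n-grams of a string."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def flag_contaminated(train_rows: list[str], bench_questions: list[str], n: int = 8) -> list[str]:
    """Return training rows that share any n-gram with a benchmark question."""
    bench_grams = set().union(*(ngrams(q, n) for q in bench_questions))
    return [row for row in train_rows if ngrams(row, n) & bench_grams]

train_rows = ["The capital of France is Paris, which is also its largest city."]
bench = ["What is the capital of France?"]
print(flag_contaminated(train_rows, bench, n=4))  # flags the first row
```

Exact n-gram matching only catches verbatim leakage; paraphrased test questions slip straight through, which is part of why contamination is so hard to rule out.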
That one's a super small model that just memorizes everything. It was fantastic. So yeah, contamination — I think we've actually covered it in a previous episode before. So we're good. But again, I want to give people a map into the open source AI model universe.[00:12:57] And Alex, you can also jump in here, because you guys have spent a lot more time with them than I have. So, what should people know about Teknium? What should people know about Nous? And then we can go down the list. Yeah,[00:13:05] Wing Lian: I think so. I think if we start with Teknium — when you talk to him, I think his response is that he wants to build GPT-4 on his laptop, right?[00:13:14] So, very, very good at building general models. I think with Nous, Nous Research, they're looking at more research-focused things, like their YaRN models. They didn't actually train those with Axolotl — they have their own trainer for their YaRN models. So they did not use Axolotl for that one? They didn't use that. Is that because you don't have support for it? I think we do support YaRN — I'd have to double check that answer. Yeah, I'm just kind of curious what you can and cannot support. Yeah, I mean, YaRN is supportable, it's basically, I think, just replacing the RoPE part of that, so, not a big deal.[00:13:48] Yeah, it's not a big deal, it's just I haven't gotten to it — not enough people have asked, and a lot of people have asked for other things, so it's just, squeaky wheel, right? I think at the end of the day, people are building these datasets, and I think if you map things chronologically, these make more sense, because it's like, how do we incrementally improve all of these models?[00:14:07] So a lot of these models are just incremental improvements over the last thing, right? Whether it's through methods of how did we curate the dataset, how did we improve the quality of the dataset. So, maybe LDJ talked about it, I think, for Capybara and Puffin — how those were very specific dataset curation techniques that he works on.[00:14:29] The Case for Open Source AI[00:14:29] Alex Volkov: So, folks are doing this for dataset curation. Folks are doing this for skillset building as well. Definitely people understand that open source is very important, especially after the debacle, the OpenAI weekend that we all had. And people started noticing that even after developer day at OpenAI, the APIs went down.[00:14:48] And then after that, the whole leadership of the company swiftly changed, and there were worries about, you know, how can people continue building AI products based on these shaky grounds? That turned attention definitely to Teknium — at least with OpenHermes, I started seeing this more and more on Twitter — but also other models. And many companies, they're gonna start with OpenAI just to get there quick, and then they think about, okay, maybe I don't want to share my knowledge. Maybe I don't want to sign up for Microsoft. Maybe they will change their terms and conditions. So what else is out there? They turned to other companies. Up until yesterday, Google was nowhere to be found. We've talked about Gemini a little bit before in a previous episode. And you can tune in[00:15:26] swyx: to[00:15:26] Alex Volkov: ThursdAI.[00:15:26] Yeah, you can tune in to ThursdAI.
We covered the Gemini release a little bit. But many are turning to the open source community and seeing that Meta released, and continues to release and commit to, open source AI. Mistral came out, and the model is way smaller than Llama and performs significantly better.[00:15:43] People play with OpenHermes — which is currently Teknium-based, Nous Research-sourced, Axolotl-trained OpenHermes, I assume, right? — and they play with this and they see that, okay, this is like GPT-3.5 quality. We had GPT-3.5's birthday just a week ago. A week ago — a year ago, we never interacted with models of this caliber.[00:16:04] And now there's an open source one that's on my laptop, completely offline, that I can continue improving for my use cases. So enterprises, companies are also noticing this. And the open source community folks are building the skill set, not only the datasets. They're building the actual, here's how we're going to do this, with Axolotl, with these datasets.[00:16:21] The curation pieces. Now, interesting — there are like recipes of curation. The actual model training is kind of a competitive thing, where people go and compete on these leaderboards that we talked about — the LMSys arena, which recently added OpenHermes and recently added OpenChat and a bunch of other stuff that's super cool.[00:16:37] The Hugging Face open source leaderboard. And so there's a competitive aspect to this. There's the open source aspect to this — like Teknium says, I want GPT-4 on my laptop. There's the, let me build a skill set that potentially turns into a company, like we saw with Nous. Nous just started organizing a bunch of people on Discord, and suddenly they're announcing their company.[00:16:54] It's happening across all these modalities, and suddenly all these people who saw these green pastures and a fairly quick way to, hey, here's a cool online community I can start doing cool stuff with. You mentioned the same in the beginning, right? Like, after your accident — what's cool, let me try this out.[00:17:08] Suddenly I start noticing that there's a significant movement of interest from enterprising companies into these areas. And this skill set, these datasets, and this community are now very important — important enough to create an event which pulls in Andrej Karpathy from OpenAI to come and see what's new, Jeremy Howard, like the event that we just talked about. People are flying over, and this is just a meetup.[00:17:28] So, definitely, the community is buzzing right now, and I think Axolotl is a big piece as well.[00:17:34] Orca and OpenOrca[00:17:34] Wing Lian: Cool. Maybe we can talk about Orca real quick — Orca, OpenOrca rather. I think there was a lot of buzz when the first Orca paper came out. And just briefly, what is Orca? Yeah, Orca was basically having traces of chain of thought reasoning, right?[00:17:48] So they go and they distill sort of GPT-4. They take a sampling of data from the Flan dataset. Maybe we can add some show notes on the Flan dataset. Yeah, but we've covered it. Okay, cool. They use GPT-4 to say, all right, explain this with step-by-step reasoning, right?[00:18:06] And then you take that and they train the model, and it showed very good improvements across a lot of benchmarks. So OpenOrca was sort of the open reproduction of that, since Microsoft Research never released that particular dataset.
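A minimal sketch of that distillation recipe — sample instructions (e.g., from Flan), prompt a strong teacher model for a step-by-step answer, and save the trace as a training row. The model name and system prompt wording below are illustrative assumptions, not OpenOrca's actual pipeline:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = "You are a helpful assistant. Think through the problem step by step before giving the final answer."

def distill_with_cot(question: str) -> dict:
    """Ask a strong teacher model for a chain-of-thought answer, Orca-style."""
    resp = client.chat.completions.create(
        model="gpt-4",  # the teacher; any strong model works
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    return {"instruction": question, "response": resp.choices[0].message.content}

# Map this over a sample of Flan-style questions to build an OpenOrca-like dataset.
row = distill_with_cot("If a train leaves at 3pm going 60 mph, how far has it gone by 5:30pm?")
```

The value is in the reasoning trace: the student model is fine-tuned on the *explanation*, not just the final answer, which is what drove the benchmark gains the paper reported.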
And going back to the Hugging Face leaderboard thing, those models did really well. And then I think the follow-up to that was SlimOrca, right? In going and building the OpenOrca dataset, we never really went in and validated the actual answers that GPT-4 gave us. So what we did was, one of the guys from OpenChat actually cross-referenced the original Flan responses — the human responses, the correct answers — with the dataset, and then I took both of them and sent them to GPT-4 and said, is this answer mostly correct, right?[00:18:54] Yeah. And then we were able to filter the dataset — at least the GPT-4-only answers — from like 800,000 down to like 500,000 rows, and then retrain the model, and it had the same performance as the original model to within, I think, 0.1 percent or thereabouts, with 30 percent less data.[00:19:13] So, yeah. Okay.[00:19:15] swyx: Interesting. So, I mean, there's so much there that I want to highlight, but yeah, Orca is interesting. I do want people to know about it. Putting chain of thought into the dataset just makes a ton of sense. One thing I think would be helpful for people to scope these things out: how much data are we talking about when people are fine tuning, and then how much time or resources or money does it take to[00:19:36] Wing Lian: fine tune?[00:19:37] Yeah, so I think there's a little bit of overlap there with fine tuning techniques, but let's say Orca, and I think even Hermes — they're both relatively large datasets, like 10 billion tokens. Yeah. So, large datasets being — the original OpenOrca was 800,000 rows.[00:19:55] I believe it was somewhere in the ballpark of a gigabyte of text data. And I believe Hermes is like a quarter million rows of data — I don't know the actual byte size on that particular one. So, let's say everybody's training 7-billion-parameter Mistral right now, right? I believe that to fine tune 7B Mistral on, let's say, 8 A6000s, which have 48 gigabytes of VRAM, it takes about 40 hours, and then, depending on where you get your compute, it's like 500 dollars to fine tune that model. And that's assuming you get it right the first time, right?[00:20:44] So, you know.[00:20:45] swyx: Is that something that Axolotl handles, like, getting it right the first[00:20:48] Wing Lian: time? If you talk to anybody, it's like you've probably tried at least three or four runs or experiments to find the right hyperparameters. And after a while you sort of have a feel for where you need your hyperparameters to be.[00:21:04] Usually you might do a partial training run and do some benchmarks. So I guess, for Pharaouk — whether you're going by his actual name, Farouk, or his Twitter handle — he released the Dharma dataset, which is basically a subset of all the benchmarks.
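A sketch of that Dharma-style idea — freeze a small, fixed sample of each benchmark and score every checkpoint against it, so you get cheap, directional signal mid-training. The benchmark names and data shapes here are made up for illustration:

```python
import random

def make_eval_subset(benchmarks: dict[str, list[dict]], per_bench: int = 50, seed: int = 42):
    """Sample a fixed slice of each benchmark for cheap mid-training evals.

    Subset scores won't match the official numbers, but they're directionally
    useful: you can watch each benchmark trend up or down across checkpoints
    without paying for a full evaluation run.
    """
    rng = random.Random(seed)  # fixed seed so every checkpoint sees the same questions
    return {
        name: rng.sample(rows, min(per_bench, len(rows)))
        for name, rows in benchmarks.items()
    }

benchmarks = {
    "mmlu": [{"q": f"mmlu question {i}", "answer": "A"} for i in range(1000)],
    "bigbench": [{"q": f"bb question {i}", "answer": "B"} for i in range(1000)],
}
subset = make_eval_subset(benchmarks, per_bench=25)
```

Pinning the seed is the important design choice: the subset has to stay identical across checkpoints, otherwise you're comparing runs on different questions and the trend line means nothing.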
And Axolotl actually supports taking that subset and then just running many benchmarks across your model every time you're doing an evaluation, so you can see relative numbers — it's not going to be the actual benchmark score, but you can get an idea: all right, is this benchmark improving, is this benchmark decreasing, based on —[00:21:39] swyx: Wait, why don't you run the full benchmark?[00:21:41] What, what, what?[00:21:42] Wing Lian: The full benchmarks take a long time. Significant, yeah, significant amount of time. Yeah.[00:21:48] swyx: Okay, so that's like mini MMLU. Yeah. Like,[00:21:49] Wing Lian: mini BigBench or whatever. Yep, exactly.[00:21:51] Alex Volkov: It's really cool. When I joined Weights & Biases just recently, one of the things I tried to do was — hey, I'm a software engineer by trade, I don't have an MLE background, but I joined a company that does primarily MLE, and I wanted to learn from the community, because a lot of the open source community uses Weights & Biases. And the benchmark that you said that Pharaouk did — remind me of the name, sorry. Dharma? Dharma, yeah, yeah. So Luigi showed me how Dharma shows up inside the dashboard, in the Weights & Biases dashboard, and so you can actually kind of see the trending run, and per each iteration or epoch you can see the model improving or trending, on top of everything else.[00:22:29] Weights & Biases gives you hyperparameter tracking, which — like you said, you started with command line args, and that's really hard to remember. Also the Dharma dataset, like the quick mini versions of many different benchmarks — it's pretty cool to visualize them as well. And I heard that he's working on a new version of Dharma, so Dharma 2, et cetera.[00:22:47] So hopefully we'll see that soon. But definitely it's hard, right? You start this training run, it's like 40, 50 hours. Sometimes you're SSHing into this machine, you start a process, you send it off with God, and you just go about your day, collecting datasets, and then you have to return.[00:23:04] And the whole process of instrumentation for this is still a little bit squeaky, but definitely, tuning performance — or grabbing performance in the middle of this, with Dharma and some other tools — is very helpful to know that you're not wasting precious resources going somewhere you shouldn't go.[00:23:21] Yeah.[00:23:22] swyx: Yeah. Very cool. Maybe before we go into more details on fine tuning stuff, I just wanted to round out the rest of the Axolotl-verse. There's still Eric Hartford stuff. I don't know if you want to talk about Pygmalion, Disco, anything that you know about[00:23:35] Wing Lian: those things.[00:23:36] DiscoLM and Model Stacking[00:23:36] Wing Lian: Yeah, I think definitely one of the more interesting ones was the Disco 120B, right? Yeah, I know nothing about it. Yeah. So, Alpin from Pygmalion AI, right — Pygmalion, they have their own community, a lot of it is based around roleplay models, those sorts of things — and Alpin put together, merged together, Llama 2 70Bs. I don't remember how he stacked them together, whether he merged the layers in between.
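A toy sketch of the simplest form of that layer stacking — the "passthrough" or frankenmerge style that Charles Goddard's mergekit (mentioned next) automates. The `Linear` layers here are stand-ins for real transformer decoder layers, and the interleaving pattern is made up:

```python
import torch.nn as nn

def frankenmerge(layers_a: nn.ModuleList, layers_b: nn.ModuleList, pattern):
    """Stack decoder layers from two same-architecture models into a deeper one.

    `pattern` is a list of (source, start, end) slices, e.g. Goliath-style
    interleaving: a block from model A, a block from model B, and so on.
    No weights are changed; the resulting model is simply deeper.
    """
    merged = []
    for src, start, end in pattern:
        source = layers_a if src == "a" else layers_b
        merged.extend(source[start:end])
    return nn.ModuleList(merged)

# Toy stand-ins for two 8-layer models; with real checkpoints these would be
# model.model.layers from two fine-tunes of the same base.
layers_a = nn.ModuleList(nn.Linear(16, 16) for _ in range(8))
layers_b = nn.ModuleList(nn.Linear(16, 16) for _ in range(8))
deeper = frankenmerge(layers_a, layers_b, [("a", 0, 4), ("b", 2, 6), ("a", 4, 8)])
print(len(deeper))  # 12 layers built from two 8-layer models
```

That two 70B fine-tunes can be spliced like this at all — and still produce coherent text — is the surprising empirical result behind Goliath 120B.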
There's a whole toolkit for that by Charles Goddard, where you can take a single model and stack it together, or merge multiple models.[00:24:18] That's a whole other talk and a whole other tool set, but he was able to create this 120 billion parameter model out of Llama 2 70B. And then I believe Disco is a fine tune of the base 120B, which is, I believe, Goliath 120B. So, and what are the[00:24:37] swyx: headline results that people should know about[00:24:39] Wing Lian: Disco?[00:24:39] I think for the headline results — I haven't played with it personally, because it's a very large model and that's a lot of GPU, right? But from what I've heard anecdotally, it performs really well. The responses are very good. Even just the base model is a lot better than Llama 70B.[00:24:57] And I think generally everybody's like, we would all love to fine tune Llama 70B, but it's just so much memory, so much compute, right?[00:25:07] Datasets and Evals over Models[00:25:07] Wing Lian: I[00:25:07] Alex Volkov: want to touch on this point, because the interesting thing that comes out of being in this ecosphere and being friends with open source folks is tracking week-to-week state-of-the-art performance on different models. First of all, a lot of the stuff that folks did a couple of weeks ago — then something like Mistral comes out, and a lot of the stuff from back then doesn't technically make sense anymore. Like, the artifacts of that work, the actual artifacts, no longer make sense. They're lower on the Hugging Face leaderboard, or lower on the LMSys leaderboard.[00:25:36] But some of the techniques that people use, and definitely the datasets — the datasets keep traveling, right? So OpenHermes, for example, is the dataset Teknium cleaned up to contain only open-sourceable data, which previously was just Hermes. And it was previously used to train Llama. And then once Mistral came out, it was used to train Mistral.[00:25:54] And then it became significantly better on the 7B base Mistral. So the datasets keep traveling, keep getting better a little bit here and there, and the techniques improve as well. It looks like both things are simultaneously true. The artifacts of a month and a half ago — the actual models themselves — it's great that Hugging Face has them, because not every company can keep up with the next week's oh, I'll install this model instead, then this model instead.[00:26:19] But the techniques and the datasets keep improving as we go further, and I think that's really cool. However, the outcome of this is that for a long time, for many, many people — including us, and we do this every week, we literally talk with people who release these models every week — it's really hard to know. So, there are a few aspects of this. One, I think, like you said: the bigger models, the 70B models, you actually have to have somebody like Perplexity, for example, giving you access to the 70B really fast, or you have to actually find some compute, and it's expensive, especially for the bigger models. For example, Falcon 180B came out, like the hugest open source model.[00:26:56] How do you evaluate this if you can't run it? Nobody liked it.
It's really — so first of all, nobody liked it, but secondly, only the people who were able to find enough compute to run inference on it could try it; everyone else was like, I can't run this on my laptop. And so that's why something like OpenHermes 7B is much easier, because you can run it on your MacBook.[00:27:14] It's much easier to evaluate. It's much easier to figure out the vibes, right? Everybody talks about the vibes as an evaluation check. If you're plugged in enough, if you follow the right people, and they say pretty much the same things all independently — then you run into the problem of whether they're repeating, stochastic parrots repeating the same thing, or whether they actually evaluated it themselves.[00:27:31] Yeah, you never know. But I think on a large enough scale on Twitter, you start getting the feel. And we all know that OpenHermes is one of the top performing models — benchmarks, but also vibes. And I just wanted to highlight this vibe-check thing, because you can have the benchmarks, you can have the evaluations — they potentially have contamination in them, and they don't necessarily tell you the whole story, because some models are good on benchmarks, but then you talk to them and they're not super helpful.[00:28:00] And I think it's a combination of the benchmarks, the leaderboards, the chatbot arena — because LMSys, remember, their ranking is not only based on benchmarks, it's also people playing with their arena stuff. Humans actually get two answers. I think they completely ignore benchmarks. Yeah, they only do Elo. Oh, they do Elo completely, right? So that, for example, is just people playing with both models and saying, hey, I prefer this one, I prefer that one. But there's also some selection bias: the type of people who will go to LMSys to play with the models, they're a little bit specific in terms of who they are.[00:28:33] It's very interesting. There are so many models. People are doing this in this way, that way. Some people are doing this for academic rigor, only to test out new ideas. Some people are doing this like the Intel fine tunes of Mistral — Intel wanted to come out and show that their hardware approach is possible, with Mistral, etc.[00:28:51] And it's really hard to know what to pick, what to use. And especially on the bigger models, like you said — the Llama 70B, the Falcon 180B — who has the compute to validate those? So I would mention that, like, use with caution.
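As a concrete aside on that arena ranking: it's Elo-style, where each human vote between two anonymous models nudges their ratings. LMSys's actual leaderboard computation has more machinery than this (and the model names below are just placeholders), but the textbook update looks like:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """One Elo update after a human picks a winner between two models."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    r_winner += k * (1.0 - expected_win)
    r_loser -= k * (1.0 - expected_win)
    return r_winner, r_loser

ratings = {"openhermes-7b": 1000.0, "some-other-7b": 1000.0}
# Each arena vote nudges the ratings; many thousands of votes give a stable ranking.
ratings["openhermes-7b"], ratings["some-other-7b"] = elo_update(
    ratings["openhermes-7b"], ratings["some-other-7b"]
)
print(ratings)  # winner moves up to ~1016, loser down to ~984
```

Upsets against a higher-rated model move the ratings more than expected wins, which is what lets the ranking converge even though every individual vote is just one person's preference — subject to exactly the selection bias Alex describes.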
Like, go and research and see if the biggest model that just released was actually worth the tokens and the money you'd spend on it[00:29:12] to try and, if you're a business, integrate it.[00:29:15] Distilling from GPT4[00:29:15] swyx: Since you said use with caution, I'll bring in one issue that has always been in the back of my mind whenever I look at the entire universe of open source AI models, which is that 95 percent of the data is derived from GPT-4, correct?[00:29:30] Which technically you can't use commercially,[00:29:34] Wing Lian: right?[00:29:35] swyx: What is the community's stance on this kind of stuff?[00:29:40] Wing Lian: I think from the community's stance, a lot of us are just experimenting. So for us, it's like, we're not going and building a product that we're trying to sell, right?[00:29:49] We're just building a product because we think it's interesting and we want to use it in our day to day lives, whether or not we try and integrate it. Personal use, yeah. Yeah, personal use — so as long as we're not selling it, yeah, it's fine. But[00:30:01] swyx: like, I as a company cannot just take OpenHermes and start serving[00:30:05] Alex Volkov: it and make money on it.[00:30:06] OpenHermes you can — because the "Open" in OpenHermes, I think, is a cleanup that was done after the regular Hermes. Please, folks, check your licenses before you listen to podcasts and act on them. I will tell you though, you could say the same thing about OpenAI. The same argument kind of makes sense, where OpenAI or Stability AI trains their diffusion model on a bunch of pictures on the internet, and then the court kind of strikes down the claim from Sarah Silverman, I think, or somebody else, who came and said, hey, this has my work in it — because of the way it processes things, the model eventually builds that knowledge into itself, and it doesn't actually reproduce one-to-one what was in the dataset.[00:30:45] You could claim the same thing for open source. Like, we're using — and by we, I mean the open source community that I happily report on — uses GPT-4 to rank, for example, which is the better answer. That's how you build one type of dataset, right? For DPO or something like this, you basically generate a dataset of, say, a question and four answers, and then you go to GPT-4 and say, hey, smartest model in the world right now — up until Gemini Ultra, which we should mention as well — which one of those choices is better?[00:31:11] But the choices themselves are not necessarily written with GPT-4. Some of them may be, so there are fully synthetic datasets. But there are also datasets that are just ranked with GPT-4 but actually generated with a sillier model, or like a less important model.[00:31:25] The lines are very blurry as to what type of stuff is possible or not possible. And again, when you use a model that's up on Hugging Face, the license says you can use it. OpenAI is not going to come after you, the user. If anything, OpenAI will try to say, hey, let's prevent this type of thing from happening — but I honestly don't think that they could even know. Not that that makes it okay. They also kind of do this with the Internet Archive, and also, I think some of it is fair use.[00:31:55] You use models to help you augment tasks, which is what GPT-4 lets you do.[00:32:00] swyx: Yeah, the worst thing that OpenAI can do is just kick you off OpenAI.
That's because it's only enforced in the terms of service.[00:32:05] Alex Volkov: Sure, but just to clarify who they're going to kick out: they could kick out, like, Nous, for example, if Nous were abusing their service. A user of the open source, fully Apache 2.0 open source model, for example — they won't get kicked out just because they use both.[00:32:22] I don't believe so. I don't think OpenAI has a claim for that.[00:32:25] swyx: Well, we're not lawyers, but I just want to mention it for people to know it's an issue.[00:32:30] Wing Lian: And one of the things — I talked to someone recently, and I think they're also interested in this — but also to the point of, right, if I use a model trained on GPT-4 data, but I use that model to then regenerate new data, is that model, is that data, okay? So you start going down this whole rabbit hole. So yeah. All right.[00:32:53] swyx: Fantastic. Cool. Well, I think that roughly highlights most of the open source universe. You also have your own models. Do you want to shout out any one of them? Yeah.[00:33:01] Wing Lian: I mean, I think early on, Manticore got a lot of love.[00:33:04] I think it was mostly popular in the roleplay communities. It tended to be pretty truthful. It tended to have relatively good answers, depending on who you ask, right? But I think for me, releasing models was a way to try and continue building out the product — figure out what I needed to put into the product, how do I make it faster — and if you've got to go and debug your product, you may as well have it do something useful.[00:33:29] Awesome. So, yeah.[00:33:31] Finetuning - LoRA, QLoRA, ReLoRA, GPTQ[00:33:31] swyx: Okay, and then maybe we'll talk about just fine tuning techniques. So this is going to be a little bit more technical than just talking about model names and datasets. So we started off talking about LoRA, QLoRA. I just learned from your readme there's ReLoRA, which I've never heard about.[00:33:45] Could you maybe talk about parameter efficient fine tuning, that whole journey, like, what people should know?[00:33:50] Wing Lian: Yeah, so with parameter efficient fine tuning, I think the popular ones — let's start with LoRA, right? So, usually what you do is you freeze all the layers on the base model, and then, at the same time —
You introduce another set of layers over it, and then you train those. And it's done in a way that is mathematically convenient — particularly with LoRAs — where, when you train the model, you run your inputs through the base model, whose weights are frozen, but you also run them through the additional weights, and then at the end you combine the two to get your outputs. And when you're done training, you're left with this other set of weights, right, that are completely independent. And then from that, what you can do is — some person smarter than I figured out — they've done it in such a way that you can merge these weights back into the original model without changing the architecture of the model, right?[00:35:03] So that tends to be the go-to. And you're training much fewer parameters, so that when you do that — yes, you still need to have all of the original weights, but you have a smaller gradient, you have a smaller optimizer state, and you're just training fewer weights — so you can tend to train those models on much smaller GPUs.[00:35:27] swyx: Yeah. And roughly, what I've seen out there is roughly like 1 percent the number of parameters that you're training. Yeah, that sounds about right. Which is that much cheaper. So Axolotl supports full fine tune, LoRA, QLoRA —[00:35:40] Wing Lian: yes. So, QLoRA is very similar to LoRA. The paper was — if I remember correctly — traditionally, most people who did LoRAs were putting the model weights in 8-bit, and then doing parameter efficient fine tuning over the LoRA weights. And then with QLoRA, they were quantizing the weights down to 4-bit, right, and then I believe they were also training on all of the linear layers in the model.[00:36:15] And then ReLoRA — that was an interesting paper, and then I think it got implemented. Some people in the community tried it out, and it showed that it didn't really have the impact that the paper indicated it would. And from what I was told recently, they re-released something for ReLoRA, like, a few weeks ago, and it's possibly better.[00:36:44] I personally haven't had the time. What was the[00:36:46] swyx: main difference,[00:36:47] Wing Lian: apart from quantization? I don't know. Okay. What was the main difference, sorry?[00:36:49] swyx: Apart from quantization, right? Like,[00:36:50] Wing Lian: QLoRA's thing was, like, we'll just drop off some bits. With ReLoRA, what they did was, you would define some number of steps that you would train your LoRA — or your QLoRA; you could do ReQLoRA if you really wanted to — and you would train your LoRA for some number of steps, and then you would merge those weights into your base model, and then you would start over. So by starting over, the optimizer has to re-optimize again and find the best direction to move in, and then you do it all again, and merge it in, and do it all again. And theoretically, according to the paper, doing ReLoRA, you can do parameter efficient fine tuning but still have sort of the performance gains of doing a full fine tune, so.[00:37:38] swyx: Yeah, and[00:37:39] Wing Lian: GPTQ?
And GPTQ — I think GPTQ is more similar to QLoRA, in that it's mostly a quantization of the weights down to 4-bit, where GPTQ is a specific methodology, or implementation, of quantization. So. Got it.[00:37:57] Alex Volkov: Wing, for folks who use Axolotl — your users, some people who maybe want to try it out — do they need to know the differences? Do they need to know the implementation details of QLoRA versus ReLoRA?[00:38:03] Or is it okay for them to just know that Axolotl is the place that already integrated them? And if that's all they need to know, how do they choose which method to use? Yeah,[00:38:22] Wing Lian: so I think most people aren't going to be using ReLoRA. I think most people are going to be using either LoRA or QLoRA. And I think they should have an understanding of why they might want to use one over the other. Most people will say that with QLoRA, the quality of the final model is not quite as good as if you were to do a LoRA or a full fine tune, right?[00:38:44] Just because you've quantized things down, your accuracy is probably a little off, so by the time you've done the QLoRA, you're not moving the weights how you would on a full fine tune with the full parameter weights.[00:38:56] Interesting.[00:38:57] swyx: Okay, cool. For people who are more interested, obviously, read the papers. I just wanted to give people a high level overview of what these things are. And you've done people a service by making it easy for people to try them out. I'm going to also ask a question which I know to be wrong, but I'm curious because I get asked this all the time.[00:39:15] What is the difference between all these kinds of fine tunes[00:39:17] Wing Lian: and RLHF? Okay — between all of these sorts of fine tunes and RLHF. So all of these sorts of fine tunes are, ideally, taking knowledge that the base model already has, and presenting it in a way where you're having the model use what it already knows to answer in a particular way — whether you're extracting general knowledge or a particular task, right? Instruct-tune, chat, those sorts of things.[00:39:44] And then generally with RLHF — so what is it? Let's go back. Reinforcement Learning with Human Feedback. So if we start with the human feedback part: what you're doing is, you generally have a given prompt, and then maybe you have one, maybe two — I think with Starling you have up to, what, seven different possible responses — and you're ranking those responses on some sort of metric, right? Whether the metric is how much I might like that answer, or — I think with Starling it's how helpful was the answer, how accurate was the answer, how toxic was the answer, those sorts of things, on some sort of scale, right? And then using that to go back and take a model and nudge it in the direction of that feedback, to be able to answer questions based on those preferences.[00:40:42] swyx: Yeah, so you can apply — and is it commutative? Can you apply fine tuning afterwards, onto an RLHF model? Or should the RLHF come in afterwards,[00:40:54] Wing Lian: after the fine tune?
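Before the answer to that, a from-scratch sketch to make the LoRA mechanics above concrete — frozen base weights, a trainable low-rank side path, and a merge step at the end. This is a toy stand-in for the idea, not the PEFT library's actual implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update: y = Wx + (alpha/r) * B(A(x))."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the base weights stay frozen
        self.A = nn.Linear(base.in_features, r, bias=False)   # down-projection
        self.B = nn.Linear(r, base.out_features, bias=False)  # up-projection
        nn.init.zeros_(self.B.weight)        # so training starts exactly at the base model
        self.scale = alpha / r

    def forward(self, x):
        # run through the frozen base AND the adapter, then combine the outputs
        return self.base(x) + self.scale * self.B(self.A(x))

    def merge(self):
        """Fold the adapter back into the base weights; architecture unchanged."""
        with torch.no_grad():
            self.base.weight += self.scale * (self.B.weight @ self.A.weight)
        return self.base

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"{trainable / total:.1%} of parameters are trainable")  # roughly 0.4% here
```

The last two lines show where the "roughly 1 percent of parameters" figure comes from: only `A` and `B` carry gradients and optimizer state, which is why LoRA runs fit on much smaller GPUs. QLoRA keeps this same structure but stores the frozen base in 4-bit.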
Um, yeah, I don't know that there's been enough research one way or another — like, I don't know.[00:41:02] That's a question that's been asked on Discord. Yeah, I definitely would say I don't know the answer. Go and try it and report back to me, and let me know, so I can answer for the next guy.[00:41:10] swyx: It's shocking how much is still unknown about all these things. Well, I mean, that's what research is for, right?[00:41:16] Wing Lian: So actually, I think I saw at the top of a leaderboard, there was a Mistral base model, and they didn't actually fine tune it — they just did an RLHF fine tune on it using, I don't recall which dataset, and it benchmarked really well.[00:41:37] But yeah, you'd have to go and look at it. But it is interesting — going back to that, traditionally most people will fine tune the model and then do like a DPO, PPO, some sort of reinforcement learning over that. But that particular model, it seemed like they skipped the supervised fine tuning, or SFT.[00:41:55] Axolotl vs HF Transformers[00:41:55] swyx: Cool. One thing I did also want to comment on is the overall competitive landscape, I don't know. Hugging Face Transformers, I think, has a PEFT module.[00:42:05] Wing Lian: Yeah, yeah, PEFT, Parameter Efficient Fine Tuning, yep. Is that a competitor to you? No, no — we actually use it. We're just a wrapper over the HuggingFace stuff.[00:42:15] So that is their own module where they have taken the responsibility for these parameter efficient fine tuning methods — it's in that particular package — whereas Transformers is mostly responsible for the modeling code and the trainer, right? And then there's an integration between the two. And there's a variety of other fine tuning packages — TRL, TRLX. TRLX is the Stability AI one, yeah, CarperAI, and TRL is a Hugging Face trainer. Even that one's just another wrapper over the Transformers library and the PEFT library, right?[00:43:00] But what we do is, we've taken those — yes, we also use that — but we also have more validation, right? So, there are some of us who have done enough fine tunes to know, oh, this and this just don't go together, right? But most people don't know that. So like — example? Like, what one and one doesn't go together? I don't have an example offhand, but if you turn this knob and this knob, right? You would think, all right, maybe this will work, but you don't know until you try. And then by the time you find out it doesn't work, it's maybe five minutes later — it's failed. It's failed in the middle of training, or it's failed during the evaluation step. And you're like, ah. So we've added a lot more validation, so that when you've created your configuration, you run it through, and the validation code says this is probably not right, or probably not what you want.[00:43:52] swyx: So do you do some linting of your YAML file?[00:43:56] Wing Lian: I guess you could call it linting, it's sort of like —[00:44:00] swyx: Is there a set of rules out there somewhere? Yeah, there's a set of rules in there.
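A minimal sketch of what that kind of config validation can look like. The two rules below are invented illustrations of the genre — they are not Axolotl's actual rule set — and `my_finetune.yml` is a hypothetical file:

```python
import yaml  # pip install pyyaml

# Toy rules in the spirit of "this knob and that knob don't go together".
RULES = [
    (lambda c: c.get("gptq") and c.get("load_in_4bit"),
     "GPTQ and bitsandbytes 4-bit loading don't go together"),
    (lambda c: c.get("gradient_checkpointing") is False and c.get("micro_batch_size", 1) > 8,
     "large micro batches without gradient checkpointing will likely OOM"),
]

def validate(path: str) -> list[str]:
    """Return human-readable warnings instead of a crash 5 minutes into training."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    return [msg for check, msg in RULES if check(cfg)]

for warning in validate("my_finetune.yml"):
    print(f"warning: {warning}")
```

The design point is exactly what Wing describes: catch the bad combination when the YAML is parsed, before any GPU time is spent, rather than letting the run die mid-training.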
That's amazing. You should write documentation like: this rule is because this user, at this time, ran into this bug, and that's why we invested in it.[00:44:10] It's like a good collection[00:44:11] Wing Lian: of knowledge. Yeah, it is. And I guess, if you really wanted to figure it out, you could git blame everything. But yeah, I think that's always a useful thing, because people want to experiment, but people will get frustrated when you're experimenting and it breaks and you don't know why — or you know why, and you've just gone down the rabbit hole, right?[00:44:37] So I think that's one of the big features that I find important, because it prevents you from doing things you probably shouldn't, and sometimes we will let you do those things, but we'll try to warn you that you've done them.[00:44:50] I[00:44:51] Alex Volkov: have a follow-up question on this, actually, because yesterday we hung out at this open source event, and I spent time by you a couple of times, like when people told you, oh, Axolotl, I use Axolotl, it's super cool — and the first thing you asked, immediately, was: what can we improve? And yes, from multiple folks. And I think we talked about this a little bit: it's a developer tool, a machine learning slash developer tool. Your purpose in this is to help and keep people, as much as possible, like, hey, here's the best set of things that you can use right now. Whereas the bare libraries — the bare trainer, for example — is just a bare trainer.[00:45:28] And also, maybe we should talk about how fast you're implementing these things. So you mentioned the first implementation took a week or so. Now there's a core maintainer group, right? Features are landing, like QLoRA, for example. NEFTune — I don't know if that's one example of something that people said was going to be cool, and then eventually, it's one of those things that didn't really shake out — people quickly tested it out.[00:45:48] So, there's a ton of — wait, NEFTune is cancelled? I don't know if it's fully cancelled, but based on vibes, I heard that it's not that great. But the whole point I'm trying to make with NEFTune as well is that existing in the community of, like, Axolotl — even following the GitHub or following the Discord — is a fairly good way to learn these kinds of gut feelings that you just mentioned, right?[00:46:14] Like where this knob and that knob don't work together. Some of these are not written down. Some of these are like tribal knowledge that passes from place to place. Axolotl is like a great collection of many of them. And so, do you get that back also from the community of folks who just use it? Like, how do you know who uses this?[00:46:30] I think that's still an issue — knowing if they trained with Axolotl, or whether they should add this to things.
Talk about how you get feedback, and how else you should get feedback.[00:46:38] Wing Lian: Yeah, I mean, most of the feedback comes from the Discord. So people come in and they don't get a training run working, or they run into obscure errors — a lot of things that maybe, as a product, we could catch, but there are a lot of things that at some point we need to go and do, and it's just on the list somewhere.[00:46:58] Right, and that's why, when people come up, I'm like, what were your pain points? Because, as a developer tool, if you're not happy with it — or you come in, and in the first 30 minutes you're still not happy — you leave the tool, and you might move on, maybe to a better tool, maybe to one with less frustration, but it may not be as good, right?[00:47:17] So I'm trying to figure out, all right, how can I reduce all this frustration? Because for me, I use it every day for the most part, right? And so I am blind to that, right? Mm-hmm. Mm-hmm. I just know, I go do this, this, and this, and it pretty much mostly works, right? So I don't have that learning curve that other people are seeing, and I don't understand their pain points.[00:47:40] Yeah,[00:47:40] Alex Volkov: you don't have the ability to onboard yourself as a new user, completely new to the whole paradigm, to get in the door and go, oh no, I don't even know how to ask about this problem or error.[00:47:53] swyx: Cool. The last few things I wanted to cover were also just the more advanced stuff that you covered yesterday.[00:48:00] 20x efficiency with StackLlama and Multipack[00:48:00] swyx: So I'll just caution this as, yeah, this is more advanced. But you mentioned StackLlama and Multipack. What are they,[00:48:06] Wing Lian: and what should people know? Yeah, so StackLlama — that paper came out, and StackLlama, I think, was two separate concepts that they announced. So the first one was — they being Hugging Face. Yeah, sorry, yes, they being Hugging Face. So the first one was this idea of packing, like packing sequences together. So if we think about training data, right — let's say, to keep the math easy, your training data is 500 — we'll use the terminology "words". Let's say your training data is 500 words long, and let's say your context length — how much data your model can accept, or that you want to feed into your model — is, let's say, we won't use tokens again, we'll say it's 4,000 words, right? So if you're training at 4K context and you're only using 500 of it, you're sitting there with the other 3,500 words that you're not using, right?[00:49:05] And typically that's filled with these PAD tokens. I think I made the analogy last night that it's like having a glass: you fill it up with a shot of liquor, and that's your training data, and then you just fill it up with more water, and those are your PAD tokens — and it just doesn't do much, right?[00:49:27] It's still the same thing, but you still have to go through all of that to go through all your training data.
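To put numbers on the shot-of-liquor analogy, a quick back-of-the-envelope using the figures from the conversation (500-word samples in a 4K context):

```python
def padding_waste(sample_lengths: list[int], ctx: int = 4096) -> float:
    """Fraction of compute spent on pad tokens if every sample gets its own row."""
    used = sum(sample_lengths)
    total = ctx * len(sample_lengths)  # one full context window per sample
    return 1.0 - used / total

lengths = [500] * 8
print(f"{padding_waste(lengths):.0%} of tokens are padding")  # 88%
# Packing those 8 samples into a single 4K row instead fills 4000/4096 of the window.
```

Roughly 88 percent of the forward passes in the naive setup are spent on water, not liquor — which is why packing, described next, buys so much throughput.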
And then, so what StackLlama showed was that you could just take your training data and append the next row of training data until you've filled that entire 4K context. So in this example, right, with 500 words into 4K, that's 8 rows of training data.[00:49:48] But the problem with that is, with a lot of these transformer models, they're very much relying on attention, right? So if you now have this sequence of words, the model has seen all of these other words before, right? And then it sees another set of words, another set of words, but it's learning everything in the context of all the words that it's seen before.[00:50:13] We haven't corrected the attention for that. And just real quickly, since I said that paper was two concepts — the other one was, I believe, a reinforcement learning thing, but that's outside the scope of this. So, going from that, I implemented it early on, because I was like, oh wow, this is really great.[00:50:29] And yes, it saves you a bunch of time, but the trade-off is a little bit of accuracy, ultimately — but it still did pretty well. I think when I did Manticore, it used that concept from StackLlama of just appending these sequences together, right? And then the next evolution of that is Multipack, right?[00:50:51] So there was a separate paper on that — I believe it got referenced in the Orca paper — where you could properly mask those out using, I think it was a lower block triangular attention mask. So there's that. I did try implementing that, manually recreating that mask, but then one of the guys from OpenChat — he was helping with OpenOrca as well — had done an implementation of Multipack where he used FlashAttention. FlashAttention was released by Tri Dao, and it was this huge performance gain.[00:51:35] Everybody uses it now — even the Transformers library now, people are taking all of these models and making them compatible with FlashAttention. But in FlashAttention, there is one particular implementation that lets you say: well, I'm sending you all of these sequences like you would in StackLlama, but let me send you another set of information about where this set of sequences is, where the second set of sequences is.[00:52:06] So if each was 500 words long and you stacked them all together, you would just send it a row of information that was like 0, 500, 1000, 1500, etc., out to 4000. And it would know, all right, I need to break this up, and then run the forward pass with it. And it was much, much more performant.[00:52:29] And I think you end up seeing like 10x, 20x improvements — I mean, I think FlashAttention was like a 2x improvement, and then adding that with Multipack, you start to see, depending on how much data you have, up to a 20x improvement sometimes. 20x. 20x. Wow. Yeah.[00:52:48] And I only know the 20x because, before last night, I re-ran the Alpaca fine tune — I looked up the Alpaca paper because I just needed a frame of reference where somebody had done it — and I think they used eight A100s for three hours, and they said it cost them $100. I don't know how much eight A100s cost right now.[00:53:14] But I ended up rerunning it. Usually a dollar an hour, right?
Yeah, so eight — the cheapest is like a[00:53:18] Alex Volkov: dollar, a dollar an hour for one.[00:53:20] Wing Lian: Yeah, so that's still like $24, $25. But maybe if you're going on Azure, maybe it's like $100 on Azure. I mean, it used to be more expensive, like, a year ago.[00:53:31] Yeah, and then, so I re-ran it with all of the optimizations turned on, just to see what it would be. And usually Multipack is the biggest optimization — so Multipack with FlashAttention. I think I spun it up on 8 L40s, and it ran — I didn't let it run all the way through, I just grabbed the estimated completion time, and it was like 30 minutes. So it would have cost like $4 or $5 to reproduce the entire Alpaca paper, right?[00:54:00] Which is crazy. It's crazy. 20x,[00:54:02] Alex Volkov: yeah. I want to ask about — you said you turned on all the optimizations. Is that the YAML file with Axolotl? You just go and check off, like, I want this, I want that? Yeah, yeah,[00:54:10] Wing Lian: so there's one particular YAML file in there — it's under examples, llama2, fft, optimized.[00:54:20] I think someone had created one where they put in all of the optimizations and turned them on. I mean, it actually does run, which is sort of surprising sometimes, because sometimes you optimize this, optimize that, and sometimes they just don't work together, but, yeah.[00:54:36] Just turn the knobs on — and fine tuning should really just be that easy, right? I just want to flip the knob and move on with my life and not figure out how to implement it.[00:54:47] Tri Dao and Mamba[00:54:47] Alex Volkov: Specifically, the guy behind FlashAttention came up with something new. You want to talk about this a little bit? You want to briefly cover Mamba?[00:54:53] Yeah, let's talk about Mamba. Let's talk about Mamba. So, what is Mamba?[00:54:57] Wing Lian: Oh, gosh. I

KI-Update – ein Heise-Podcast
KI-Update kompakt: Gemini vorgestellt, Metas neue GenAI-Funktionen, KI-Wächter für Pornos, AI Act

KI-Update – ein Heise-Podcast

Play Episode Listen Later Dec 7, 2023 7:39


- Google rolls out its "Gemini Pro" AI model, and "Gemini Ultra" is supposed to beat GPT-4 - Meta releases an AI image generator and gives its chatbots long-term memory - In the UK, AI decides who gets access to online pornography - Is the AI Act already here? heise.de/ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast

Geek Forever's Podcast
Geek Daily EP208 : เมื่อ Google เปิดตัว Gemini และมั่นใจว่าจะล้ม ChatGPT จาก OpenAI

Geek Forever's Podcast

Play Episode Listen Later Dec 7, 2023 12:47


Gemini is Google's latest large language model, which Sundar Pichai first revealed at the Google I/O developer conference in June, and it has now been released to the public. According to Pichai and Google DeepMind CEO Demis Hassabis, Gemini is a massive leap forward in AI models that will eventually impact nearly all of Google's products. Gemini is not just a single AI model. There is a lite version called Gemini Nano, designed to run offline on Android devices. There is a more powerful version, Gemini Pro, which will soon power many of Google's AI services and is the AI backbone of Bard starting today. And the most capable model, Gemini Ultra, is the most powerful AI model Google has ever built, and it appears to be designed for data centers and enterprise applications. Tune in and listen — and don't forget to follow the Geek Forever's Podcast channel. ========================= Support ด.ดล Blog and Geek Forever Podcast so we can keep producing great content for you https://www.tharadhol.com/become-a-supporter/ ——————————————– Follow ด.ดล Blog via Line OA, just click: https://lin.ee/aMEkyNA ——————————————– Never miss updates via email from ด.ดล Blog: https://www.getrevue.co/profile/tharadhol ——————————————– Geek Forever Club, a space for exchanging interesting news and knowledge on business, technology, and science https://www.facebook.com/groups/geek.forever.club/ ========================= More ways to follow ด.ดล Blog: Fanpage : www.facebook.com/tharadhol.blog Blockdit : www.blockdit.com/tharadhol.blog Twitter : www.twitter.com/tharadhol Instragram : instragram.com/tharadhol TikTok : tiktok.com/@geek.forever Youtube : www.youtube.com/c/mrtharadhol Linkedin : www.linkedin.com/in/tharadhol Website : www.tharadhol.com

The Nonlinear Library: LessWrong
LW - Gemini 1.0 by Zvi

The Nonlinear Library: LessWrong

Play Episode Listen Later Dec 7, 2023 16:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Gemini 1.0, published by Zvi on December 7, 2023 on LessWrong. It's happening. Here is CEO Pichai's Twitter announcement. Here is Demis Hassabis announcing. Here is the DeepMind Twitter announcement. Here is the blog announcement. Here is Gemini co-lead Oriol Vinyals, promising more to come. Here is Google's Chief Scientist Jeff Dean bringing his best hype. EDIT: This post has been updated for the fact that I did not fully appreciate how fake Google's video demonstration was. Technical Specifications: Let's check out the specs. Context length trained was 32k tokens; they report 98% accuracy on information retrieval for Ultra across the full context length. So a bit low, both lower than GPT-4 and Claude and lower than their methods can handle. Presumably we should expect that context length to grow rapidly with future versions. There are three versions of Gemini 1.0. Gemini 1.0, our first version, comes in three sizes: Ultra for highly-complex tasks, Pro for enhanced performance and deployability at scale, and Nano for on-device applications. Each size is specifically tailored to address different computational limitations and application requirements. … Nano: Our most efficient model, designed to run on-device. We trained two versions of Nano, with 1.8B (Nano-1) and 3.25B (Nano-2) parameters, targeting low and high memory devices respectively. It is trained by distilling from larger Gemini models. It is 4-bit quantized for deployment and provides best-in-class performance. … The Nano series of models leverage additional advancements in distillation and training algorithms to produce the best-in-class small language models for a wide variety of tasks, such as summarization and reading comprehension, which power our next generation on-device experiences. This makes sense. I do think there are, mostly, exactly these three types of tasks. Nano tasks are completely different from non-Nano tasks. This graph reports relative performance of different size models. We know the sizes of Nano 1 and Nano 2, so this is a massive hint given how scaling laws work for the size of Pro and Ultra. Gemini is natively multimodal, which they represent as being able to seamlessly integrate various inputs and outputs. They say their benchmarking on text beats the existing state of the art. Our most capable model, Gemini Ultra, achieves new state-of-the-art results in 30 of 32 benchmarks we report on, including 10 of 12 popular text and reasoning benchmarks, 9 of 9 image understanding benchmarks, 6 of 6 video understanding benchmarks, and 5 of 5 speech recognition and speech translation benchmarks. Gemini Ultra is the first model to achieve human-expert performance on MMLU (Hendrycks et al., 2021a) - a prominent benchmark testing knowledge and reasoning via a suite of exams - with a score above 90%. Beyond text, Gemini Ultra makes notable advances on challenging multimodal reasoning tasks. I love that 'above 90%' turns out to be exactly 90.04%, whereas human expert is 89.8%, prior SOTA was 86.4%. Chef's kiss, 10/10, no notes. I mean, what a coincidence, that is not suspicious at all and no one was benchmark gaming that, no way. We find Gemini Ultra achieves highest accuracy when used in combination with a chain-of-thought prompting approach (Wei et al., 2022) that accounts for model uncertainty. 
The model produces a chain of thought with k samples, for example 8 or 32. If there is a consensus above a preset threshold (selected based on the validation split), it selects this answer, otherwise it reverts to a greedy sample based on maximum likelihood choice without chain of thought. I wonder when such approaches will be natively integrated into the UI for such models. Ideally, I should be able to, after presumably giving them my credit card information, turn my (Bard?) to 'Gemini k-sample Chai...
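The routing rule described above is simple enough to sketch. Below is a minimal, hypothetical Python rendering of that uncertainty-routed chain-of-thought step: the function name, the placeholder threshold value, and the majority-vote aggregation are assumptions for illustration; the source only specifies k sampled chains of thought, a consensus threshold tuned on a validation split, and a greedy fallback.

```python
from collections import Counter

def uncertainty_routed_answer(cot_answers: list[str], greedy_answer: str,
                              threshold: float = 0.5) -> str:
    """Sketch of uncertainty-routed chain of thought: take k sampled
    chain-of-thought answers; if the most common one clears a preset
    consensus threshold, return it; otherwise fall back to the greedy,
    maximum-likelihood answer produced without chain of thought."""
    top_answer, votes = Counter(cot_answers).most_common(1)[0]
    if votes / len(cot_answers) >= threshold:
        return top_answer    # confident consensus among the k samples
    return greedy_answer     # revert to the greedy decode

# k = 8 samples, mostly agreeing, so the consensus answer wins:
samples = ["42", "42", "42", "41", "42", "7", "42", "42"]
print(uncertainty_routed_answer(samples, greedy_answer="41"))  # -> "42"
```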

FLASH DIARIO de El Siglo 21 es Hoy
Google launches Gemini, an AI that outperforms GPT-4

FLASH DIARIO de El Siglo 21 es Hoy

Play Episode Listen Later Dec 7, 2023 3:11


Google surpasses GPT-4 with Gemini, its new multimodal AI model, which stands out on artificial-intelligence benchmarks. Google has launched Gemini, an AI breakthrough that beats GPT-4 on several benchmarks. Gemini, available in Nano, Pro and Ultra versions, is designed to integrate with Google products and transform how we interact with AI. The model stands out for its multimodal capability, processing text, audio and video. Gemini represents a revolution in AI, pushing today's limits: benchmarks, the standardized tests used to evaluate AI model performance, show that Gemini outperforms GPT-4 in several areas, including text comprehension, mathematical reasoning and programming. Gemini faces challenges such as overcoming the tendency toward "hallucinations" and the biases of earlier models, and it must be deployed safely and ethically in Google products such as Search and Chrome. The model is available for testing in Bard, Google's chatbot, and will be integrated into other Google products in the future. Gemini launches with the promise of improving the user experience across Google services. With versions tailored to different needs, Gemini Ultra for complex tasks, Pro for a wide range of tasks and Nano for on-device tasks, Gemini could change our daily interaction with AI.

Daily Tech News Show (Video)
Release Any GTA and They Will Come – DTNS 4659

Daily Tech News Show (Video)

Play Episode Listen Later Dec 6, 2023 32:00


Why are AAA games like GTA 6 ported to PC well after their release on game consoles? Scott explains. Plus Twitch will stop operations in South Korea starting February 27, 2024, due to high costs there. And Google launches its new Large Language Model Gemini, which comes in three flavors: Gemini Ultra, Gemini Pro, and Gemini Nano. Starring Tom Merritt, Sarah Lane, Scott Johnson, Roger Chang, Joe. To read the show notes in a separate page click here! Support the show on Patreon by becoming a supporter!

Tech Update | BNR
Google takes on OpenAI and launches Gemini

Tech Update | BNR

Play Episode Listen Later Dec 6, 2023 4:21


Google has launched a new AI model that can handle multiple kinds of information, such as images, video and code. Gemini is meant to take AI to the next level and is seen as a competitor to OpenAI, the company behind ChatGPT. According to Google, it is the most advanced general AI model to date: 'It is our biggest scientific and engineering project ever.' Gemini is a multimodal AI model that can handle, understand and combine multiple kinds of data; it will process programming languages, images and video. Gemini 1.0 is the first version of the model and comes in three variants. First, Gemini Ultra, simply the largest and most capable model, suited to highly complex tasks. Gemini Pro is, according to Google, the best model for a wide range of tasks, and last but not least there is Gemini Nano, the most efficient model for on-device tasks. Gemini is available as of today via the Bard chatbot, and in the coming months it will also arrive in other Google products, such as Search, ads and the Chrome browser. The addition of the Gemini AI model is the 'biggest update to date for Bard', says Google. Also in this Tech Update: Apple iMessage may escape European regulation: according to Bloomberg, the European Commission is leaning toward an exemption for Apple iMessage because it has concluded that iMessage is not popular enough among business users in the EU to fall under the rules. See omnystudio.com/listener for privacy information.

The Nonlinear Library
AF - Google Gemini Announced by g-w1

The Nonlinear Library

Play Episode Listen Later Dec 6, 2023 1:27


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Google Gemini Announced, published by g-w1 on December 6, 2023 on The AI Alignment Forum. Google just announced Gemini, and Hassabis claims that "in each of the 50 different subject areas that we tested it on, it's as good as the best expert humans in those areas". State-of-the-art performance: We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra's performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities. Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression. It also seems like it can understand video, which is new for multimodal models (GPT-4 cannot do this currently). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Daily Tech News Show
Release Any GTA and They Will Come - DTNS 4659

Daily Tech News Show

Play Episode Listen Later Dec 6, 2023 31:56


Why are AAA games like GTA 6 ported to PC well after their release on game consoles? Scott explains. Plus Twitch will stop operations in South Korea starting February 27, 2024, due to high costs there. And Google launches its new Large Language Model Gemini, which comes in three flavors: Gemini Ultra, Gemini Pro, and Gemini Nano. Starring Tom Merritt, Sarah Lane, Scott Johnson, Roger Chang, Joe. Link to the Show Notes. Become a member at https://plus.acast.com/s/dtns. Hosted on Acast. See acast.com/privacy for more information.