Hey, this is Alex. Ok, let's start with the big news: holy crap, this was a breakthrough week for speed! We had Groq explode in popularity, and ByteDance release an updated SDXL model called Lightning, able to generate full-blown SDXL 1024px images in 300ms. I've been excited about what real-time LLM/Diffusion can bring, and with both of these releases landing the same week, I just had to go and test them out together.

Additionally, we had Google step into a big open weights role and give us Gemma, two open weights models at 2B and 7B (which is closer to 9B, per Junyang), and it was great to see Google committing to releasing at least some models in the open. We also had breaking news: Emad from Stability announced SD3, which looks really great, Google will reportedly pay Reddit $200M for AI training on their data, and a few more things.

TL;DR of all topics covered:
* Big CO LLMs + APIs
* Groq custom LPU inference does 400T/s Llama/Mistral generation (X, Demo)
* Google image generation is in hot water and was reportedly paused (refuses to generate white people)
* Gemini 1.5 long context is very impressive to folks (Matt Shumer, Ethan Mollick)
* Open Weights LLMs
* Google releases GEMMA, open weights 2B and 7B models (Announcement, Models)
* Teknium releases Nous Hermes DPO (Announcement, HF)
* Vision & Video
* YOLOv9 - SOTA real-time object detector is out (Announcement, Code)
* This week's Buzz (What I learned in WandB this week)
* Went to SF to cohost an event with A16Z, Nous, Mistral (Thread, My Report)
* AI Art & Diffusion & 3D
* ByteDance presents SDXL-Lightning (Try here, Model)
* Stability announces Stable Diffusion 3 (Announcement)
* Tools
* Replit releases a new experimental Figma plugin for UI → Code (Announcement)
* Arc browser adds "AI pinch to understand" summarization (Announcement)

Big CO LLMs + APIs

Groq's new LPU shows extreme performance for LLMs - up to 400T/s (example)
* Groq created a novel processing unit known as the Tensor Streaming Processor (TSP), which they categorize as a Linear Processor Unit (LPU). Unlike traditional GPUs, which are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations.
* Analogy: they know where all the cars are going when everyone wakes up for work (when they compile) and how fast they all drive (compute latency), so they can get rid of traffic lights (routers) and turn lanes (backpressure) by telling everyone when to leave the house.
* Why would we need something like this? Some folks point out that average human reading speed is only ~30T/s. As an answer, I created an example that uses near-instant Groq Mixtral + Lightning SDXL to just create images, with Mixtral as my prompt manager (sketched below).
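Here's a minimal sketch of what that kind of pipeline can look like. To be clear, this is an illustration, not the exact code behind my demo: it assumes Groq's OpenAI-compatible endpoint, the 4-step loading recipe from the SDXL-Lightning model card, and an example prompt of my own.

```python
# Hedged sketch: Mixtral on Groq writes an image prompt, SDXL-Lightning renders it.
import torch
from openai import OpenAI
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# 1) Near-instant prompt generation with Mixtral via Groq's OpenAI-compatible API
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_KEY")
chat = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    messages=[{"role": "user", "content": "Write one vivid SDXL image prompt about a neon city."}],
)
image_prompt = chat.choices[0].message.content

# 2) Fast image generation: swap the 4-step Lightning UNet into the SDXL base pipeline
base = "stabilityai/stable-diffusion-xl-base-1.0"
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(
    hf_hub_download("ByteDance/SDXL-Lightning", "sdxl_lightning_4step_unet.safetensors"),
    device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(
    base, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
# Lightning checkpoints expect "trailing" timesteps and no classifier-free guidance
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe(image_prompt, num_inference_steps=4, guidance_scale=0).images[0].save("out.png")
```

With Groq returning the prompt at hundreds of tokens per second and Lightning needing only 4 UNet steps, the whole round trip stays close to real time.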
Open Source Weights LLMs

Google Gemma - 2B and 7B open weights models (demo)
* 4 hours after release, Llama.cpp added support, Ollama and LM Studio added support, and Tri Dao added Flash Attention support
* Vocab size is 256K
* 8K context window
* Tokenizer similar to LLama
* Folks are not that impressed so far, from what I've seen
* Trained on 6 trillion tokens
* Google also released Gemma.cpp (local CPU inference) - Announcement

Nous/Teknium re-release Nous Hermes with DPO finetune (Announcement)
* The DPO RLHF version is performing better than previous models
* Models are available in GGUF and can be found here
* DPO enables improvements across the board

This week's Buzz (What I learned with WandB this week)
* Alex was in SF last week
* A16Z + 20-something cohosts, including Weights & Biases, talked about the importance of open source
* Huge shoutout to Rajko and Marco from A16Z, and tons of open source folks who joined
* Nous, Ollama, LlamaIndex, LMSys folks, Replicate, Perplexity, Mistral, GitHub, as well as Eric Hartford, Jon Durbin, Haotian Liu, HuggingFace, tons of other great folks from Mozilla and the Linux Foundation, and Percy from Together/Stanford

Also had a chance to check out one of the smol dinners in SF; they go really hard. Had a great time showing folks the Vision Pro, chatting about AI, seeing incredible demos, and talking about meditation and spirituality all at the same time!

AI Art & Diffusion

ByteDance presents SDXL-Lightning (Try here)
* Lightning-fast SDXL with 2, 4 or 8 steps
* Results much closer to original SDXL than the Turbo version from a few months ago

Stability announces Stable Diffusion 3 (waitlist)
* Uses a Diffusion Transformer architecture (like SORA)
* Impressive multi-subject prompt following: "Prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words 'stable diffusion'"

Tools
* Replit announces a new Figma design → code plugin

That's it for today. Definitely check out the full conversation with Mark Heaps from Groq on the pod, and see you next week!
Hello hello everyone, welcome to another special episode (some podcasts call them just... episodes, I guess, but here you get AI news every ThursdAI, and on Sunday you get the deeper dives). BTW, I'm writing these words looking at a 300-inch monitor that's hovering above my usual workstation in the Apple Vision Pro, and while this is an AI newsletter and I've yet to find a connecting link (there are like 3 AI apps in there right now: one fairly boring chatbot, and Siri... don't get me started on Siri), I'll definitely be covering my experience in the next ThursdAI, because, well, I love everything new and technological. AI is a huge part of it, but not the ONLY part!
Hey everyone, we have an exciting interview today with Maxime Labonne. Maxime is a senior Machine Learning Scientist at JPMorgan, the author of a hands-on GNNs book and his own ML blog, the creator of LazyMergekit (which we cover on the pod), and holds a PhD in Artificial Intelligence from the Institut Polytechnique de Paris. Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and he previously finetuned OpenHermes using DPO techniques, which resulted in NeuralHermes 7B.

For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models.

Model merging

Model merging has been around for a while but has recently been heating up, and Maxime has a lot to do with that: he recently checked, and his wrapper on top of MergeKit by Charles Goddard (the library that put model merging into the mainstream), called LazyMergekit, was behind >50% of the merged models on the HuggingFace hub leaderboard. Maxime also authored a model merging blogpost on Hugging Face and wrote quite a few articles and shared code that helped others put merged models out.

Modern day Alchemy

That blogpost is a great resource on what model merging actually does, so I won't go into depth on the algorithms; please refer to it if you want a deep dive. In a nutshell, model merging is a technique that applies algorithms to the weights of a few models, even a few instances of the same model (like Mistral7B), and creates a new model that often performs better than the previous ones, without additional training! Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier to entry is very low, we get some cool and crazy results, as you'll see below.

Yeah, as crazy as it sounds, this method can also create models of non-standard sizes, like 10B or 120B models, since it slices pieces of other models and stitches them together in new ways. If you recall, we had a deep dive with Jon Durbin, who released Bagel, and Jon specifically mentioned that he created Bagel (based on Everything Everywhere All at Once) as a good base for merges that includes all the prompt formats; you can read and listen to that episode here.

This merge frenzy made HuggingFace change the leaderboard and add a checkbox that hides model merges, because they were flooding the leaderboard and often require much less effort than actually pre-training or even finetuning a model. Quite often the top of the leaderboard was overrun with model merges, like in this example of Bagel and its merges by CloudYu (which are not the top ones, but still in the top 10 as I write this).

As for why it works? Nisten summarized this pretty well in this now-famous copypasta tweet, and I've confirmed with Maxime that this is his current understanding as well: it's quite unclear why this seems to perform so well, but that of course doesn't stop the "folks who look for AI waifus" from merging. It's even gotten folks like Nathan Lambert from interconnects.ai to start paying attention, even though he didn't want to!
(Still waiting on your writeup, Nathan!) UPDATE: As of today, Monday Jan 29th, Nathan just released a super comprehensive deep dive into merges, which you can read here.
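To make the "algorithms applied to weights" idea concrete, here's a toy sketch of the simplest possible merge: a linear interpolation between two finetunes of the same base model. This is not the fancier machinery MergeKit offers (SLERP, TIES, DARE, passthrough), and the model ids are just examples; in practice you'd hand LazyMergekit a YAML config instead of writing this by hand.

```python
# Toy linear merge ("model soup"): average the weights of two finetunes
# that share one architecture. Model ids are illustrative examples.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B", torch_dtype=torch.float16)
model_b = AutoModelForCausalLM.from_pretrained("mlabonne/NeuralHermes-2.5-Mistral-7B", torch_dtype=torch.float16)

alpha = 0.5  # interpolation factor: 0.0 = pure model_a, 1.0 = pure model_b
merged_state = model_a.state_dict()
for name, tensor_b in model_b.state_dict().items():
    merged_state[name] = (1 - alpha) * merged_state[name] + alpha * tensor_b

model_a.load_state_dict(merged_state)          # reuse model_a as the container
model_a.save_pretrained("my-linear-merge-7b")  # no GPU training happened at any point
```

Passthrough-style merges, which stack layer slices from different models instead of interpolating them, are how the odd 10B and 120B sizes come about.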
ThursdAI - Sunday special deep dive: interviews with João and Jon, AI agent crews and Bagel merges.

Happy Sunday, dear reader. As you know by now, the ThursdAI pod is not a standard interview-based podcast; we don't focus on a 1:1 guest/host conversation, but from time to time we do! And this week I was very lucky to have one invited guest and one surprise guest, and I'm very happy to bring you both these conversations today.

Get your Crew together - interview with João Moura, creator of CrewAI

We'll first hear from João Moura, the creator of CrewAI, the latest agent framework. João is a director of AI engineering at Clearbit (recently acquired by HubSpot) and created CrewAI for himself, to automate many of the things he didn't want to keep doing, for example, posting more on LinkedIn. Crew has been getting a lot of engagement lately, and we get into that in the conversation with João: it's been trending #1 on GitHub, and received #2 product of the day when Chris Messina hunted it (to João's complete surprise) on Product Hunt.

CrewAI is built on top of Langchain and is an agent framework focused on orchestrating role-playing, autonomous agents. In our chat with João we go into the inspiration, the technical challenges, and the success of CrewAI so far, how maintenance for Crew is now partly a family effort, and what's next for Crew.

Merges and Bagels - chat with Jon Durbin about Bagel, DPO and merging

The second part of today's pod was a conversation with Jon Durbin, a self-described AI tinkerer and software engineer. Jon is a Sr. applied AI researcher at Convai and is well known in our AI circles as a master finetuner and dataset curator. This interview was not scheduled, but I'm very happy it happened! If you've been following along with the AI/finetuning space, Jon's Airoboros dataset and set of models have often been mentioned and cited, and Jon's latest work on the Bagel models took the lead on the HuggingFace open LLM leaderboard.

So when I mentioned on X (as I often do) that I was going to cover this on ThursdAI, Jon came up to the space and we had a great conversation, in which he shared a LOT of deep insights into finetuning, DPO (Direct Preference Optimization) and merging.

The series of Bagel datasets and models was inspired by the Everything Everywhere All at Once movie (which is a great movie, watch it if you haven't!) and alludes to Jon trying to throw as many datasets together as he could, but not only datasets! There has been a lot of interest in merging models recently; specifically, many folks are using MergeKit to merge models with other models (and often a model with itself) to create larger/better models without additional training or GPU requirements. This is purely an engineering thing; some call it frankensteining, some frankenmerging. If you want to learn about merging, Maxime Labonne (the author of Phixtral) has co-authored a great deep dive on the Hugging Face blog; it's a great resource to quickly get up to speed.

So given the merging excitement, Jon set out to create a model that can be an incredible merge base: many models use different prompt techniques, and Jon tried to cover as many as possible. Jon also released a few versions of the Bagel models, DPO and non-DPO, and we had a brief conversation about why the DPO versions are more factual and better at math, but not great for role-playing (which is, unsurprisingly, what many agents are using these models for) or creative writing. The answer is, as always, the dataset mix!
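If you haven't seen DPO in practice, here's a minimal sketch using Hugging Face's trl library. The inline preference pairs and the base model id are illustrative stand-ins, not Jon's actual Bagel recipe.

```python
# Minimal DPO sketch with trl: teach a model to prefer "chosen" answers
# over "rejected" ones, relative to a frozen reference copy of itself.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-v0.1"  # illustrative base; Bagel's recipe differs
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# A real run would use thousands of preference pairs; this is a stand-in.
pairs = Dataset.from_dict({
    "prompt":   ["What is 2 + 2?"],
    "chosen":   ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 22, obviously."],
})

trainer = DPOTrainer(
    model,
    ref_model,
    args=TrainingArguments(output_dir="dpo-out", per_device_train_batch_size=1),
    beta=0.1,  # how strongly the policy is kept close to the reference
    train_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()
```

The appeal over classic RLHF is that there's no separate reward model and no PPO loop: the preference signal is baked directly into a supervised-style loss.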
I learned a TON from this brief conversation with Jon, and if you're interested in the incredible range of techniques in the open source LLM world, DPO and merging are definitely at the forefront of this space right now, and Jon is right at the crossroads of them, so it's definitely worth a listen. I hope to get Jon back to share more in future episodes, so stay tuned!

So I'm in San Francisco, again...

As I mentioned in the previous newsletter, I was invited to step in for a colleague and fly to SF to help co-host a hackathon with friends from TogetherCompute and Langchain at AGI House in Hillsborough, CA. The hackathon was under the "Finetune vs RAG" theme, because, well, we don't know what works better, and for what purpose (a minimal sketch of the RAG side follows below). The keynote speaker was Tri Dao, Chief Scientist @ Together and the creator of Flash Attention, who talked about SSMs, state space models, and Mamba. Harrison from Langchain gave a talk with a deep dive into 5 techniques for knowledge assistants, starting with basic RAG and going all the way to agents.
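For readers new to the RAG side of that debate, here's roughly what the minimal retrieval loop looks like; the embedding model, corpus and query are all illustrative assumptions, not anything from the hackathon itself.

```python
# Minimal RAG sketch: embed a few documents, retrieve the closest one to a
# query, and build a grounded prompt instead of finetuning the knowledge in.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Groq's LPU is a deterministic processor for LLM inference.",
    "SDXL-Lightning generates 1024px images in 2-8 diffusion steps.",
    "Gemma is Google's family of open weights 2B and 7B models.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "Which model makes images quickly?"
q_vec = model.encode([query], normalize_embeddings=True)[0]
best = docs[int(np.argmax(doc_vecs @ q_vec))]  # cosine similarity via dot product

prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
print(prompt)  # feed this to any LLM of your choice
```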
Hey hey everyone, how are you this fine ThursdAI?
In this episode Jordan, Danny, and Felicity sit down with Jon Durbin, the son of missionaries, born in Battle Creek, Michigan, but raised in Brazil. Jon has battled rheumatoid arthritis since high school, met his wife (Bonnie) in 5th grade, worked in cyber security, is a woodworker, drummer, unbelievable CrossFit athlete, and one of the kindest and most generous humans we are blessed to call a member and friend of our community.
Welcome to the Breadcast! What do Japan's famous anpan, a town cursed by bread, and a jeweler have in common? Nothing much, other than all being discussed on this episode of the Breadcast! Get in touch: Twitter @BreadPod // Email askbreadcast@gmail.com. Music by Wil Pritchard (Jaffro) (http://thejaffro.bandcamp.com) // Artwork by Jon Durbin (http://jondurb.in).
Welcome to the Breadcast! On this week's episode we try some appealing bread, learn about the importance of the trencher in the Middle Ages, experience some bread-based slides into our DMs and marvel at a miraculous mulberry tree. Get in touch: Twitter @BreadPod // Email askbreadcast@gmail.com. Music by Wil Pritchard (Jaffro) (http://thejaffro.bandcamp.com) // Artwork by Jon Durbin (http://jondurb.in).
Welcome to the Breadcast! Join us on our first episode as we explore the myths behind the baguette, the birth of a meme, and one man's love affair with the potato. Get in touch: Twitter @BreadPod // Email askbreadcast@gmail.com. Music by Wil Pritchard (Jaffro) (http://thejaffro.bandcamp.com) // Artwork by Jon Durbin (http://jondurb.in).
About two weeks ago, I had the privilege of being in the company of the Suffers. I have been a fan of this amazing band for quite some time, and when I got word they were about to drop their debut EP - Make Some Room - I became so excited that I had to reach out and feature them on this show. I was more than honored to have not just an interview but a conversation with lead vocalist Kam Franklin, saxophonist Cory Wilson, trumpeter Jon Durbin, and trombonist Michael Razo, and it was fantastic! On this podcast of The Soul Brother Show - The Home of Heavy Soul and Raw Funk, these fantastic musicians talk about the making of the new project, their excitement about going on the road, and what it means to be a musician hailing from Houston, Texas. Enjoy! Tracklist: OutKast - SpottieOttieDopaliscious Fishbone - Interdependent No Doubt w/Lady Saw - Underneath It All The Tontons - Lush LaTasha Lee and the Black Ties - Heart Breaker Chakachas - Stories Beatfanatic - Cookin' (Funky 7 Mix) The Suffers - Make Some Room The Suffers - Step Aside The Suffers - Gwan The Hue w/Ladybug Mecca - Slick Michele Thibeaux - Ready. Set. Go. The Suffers - Good Day (Live) Fishbone - Ma and Pa This episode of The Soul Brother Show is dedicated to Cristina Acuna of Cactus Records here in Houston, Texas. Thank you so much, love!