POPULARITY
Owen Li is a first-generation immigrant who came to the U.S. with $6,000 in debt and earned a Ph.D. from Harvard. After 12 years in a W-2 role, he started investing in real estate with a $32K house and now manages $90 million in assets, including 600+ multifamily units, 900+ self-storage units, and 205 acres of land. He joined Rod's Warrior Group in November 2024 and closed a 133-unit deal in Hattiesburg, MS the following month for $9.2 million, assuming a $5.75 million HUD loan at 2.45% and raising $4.45 million in equity. Based in Chicago with his wife and three kids, Owen credits real estate with giving his family financial freedom and educational choices.

Here are some of the topics we covered:
* From Immigrant Struggles to Earning a Harvard PhD
* Why Multifamily Beats Burnout from Labor-Intensive Startups
* How Buying Foreclosures and Fourplexes Built Owen's Foundation
* Passive Investing vs. Going All In on Multifamily
* Investing in Other People's Projects vs. Multifamily
* Why Scaling in Multifamily Is a Total Game-Changer
* Build Trust Fast by Building a Proven Track Record
* Raising $4.45 Million With His Team
* Should You Still Be Buying Multifamily Right Now?
* How Joining the Warrior Group Fast-Tracks Growth

If you'd like to apply to the Warrior program and do deals with other rockstars in this business: Text "crush" to 72345 and we'll be speaking soon.
The Automotive Troublemaker w/ Paul J Daly and Kyle Mountsier
Episode #1014: Today, we're covering how President Trump's pause on reciprocal tariffs still leaves auto imports facing steep duties, while a record number of buyers turn to 84-month loans to afford today's rising prices.

Show Notes with links:

The auto industry is still squarely in the tariff crosshairs, even as President Trump backs off his broader reciprocal tariff plan. While a new 10% base tariff replaces most of the global duties, the 25% tariff on vehicles—and key materials like steel and aluminum—remains firmly in place.
* Treasury Secretary Scott Bessent confirmed auto, steel, and aluminum duties are sector-specific and still active.
* Despite ongoing auto tariffs, markets surged on the pause news—Tesla jumped 22.7%, GM rose 7.7%, and Ford gained 9.3%.
* Industry leaders are pushing for relief, with MichAuto's Glenn Stevens Jr. advocating for protecting the international supply chain and calling the fragmentation "harmful" to competitiveness.
* President Trump said he would consider exempting specific companies from tariffs, saying "We're going to take a look at that."

More new-vehicle buyers are turning to 84-month loans than ever before, highlighting just how financially stretched today's car shoppers remain. (A rough payment calculation follows these notes.)
* According to Edmunds, 20% of Q1 new-vehicle loans were for 84 months, up from 16% in 2024 and 13% in 2019.
* The average amount financed also jumped to over $41K, compared to $32K in Q1 2019.
* On the used side, 12% of loans ran 84 months, more than double the 5.3% seen in 2019, with an average of $28K financed.
* Dealer Michael Cummings of I-10 Toyota: "I really, really don't like going 84 months... it's not healthy for the customers in the long run. It's not healthy for us dealers in the long run."

A new report from Retail TouchPoints, citing Forrester's 2024 U.S. CX Index, reveals that customer service is at its worst level since 2016. Despite access to advanced tech, only 3% of brands are truly customer-focused — and shoppers are losing patience.
* 70% of customers say it's hard to find in-store help; 83% of associates say their jobs are too complex.
* Shoppers prioritize speed over flair—19% say quick item location drives return visits.
* Brands like Tractor Supply and Dick's use "store mode" apps to show real-time inventory and item locations.
* GenAI is gaining traction: 84% of customer service managers plan to deploy AI agents in 2025, up from 42% in 2023.
* "With greater customer experiences comes greater responsibility," says CI&T's Melissa Minkow.

Join hosts Paul J Daly and Kyle Mountsier as they connect the dots across car dealerships, retail trends, emerging tech like AI, and cultural shifts—bringing clarity, speed, and people-first insight to automotive leaders navigating a rapidly changing industry.

Get the Daily Push Back email at https://www.asotu.com/
JOIN the conversation on LinkedIn at: https://www.linkedin.com/company/asotu/
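As referenced above, here is a rough amortization sketch for the roughly $41K average amount financed. The 7% APR is an assumption for illustration only (the episode does not cite a rate), so treat the numbers as ballpark.

```python
def monthly_payment(principal, annual_rate, months):
    """Standard amortized loan payment: P * r / (1 - (1 + r)^-n)."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

principal = 41_000   # approximate average amount financed (Edmunds figure from the notes)
apr = 0.07           # assumed APR for illustration; actual rates vary by buyer and lender

for months in (60, 72, 84):
    pay = monthly_payment(principal, apr, months)
    total_interest = pay * months - principal
    print(f"{months} months: ~${pay:,.0f}/mo, ~${total_interest:,.0f} total interest")

# Under these assumptions, stretching from 72 to 84 months trims the monthly
# payment by roughly $80 but adds roughly $1,700 in total interest.
```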
LET'S GO! Happy second birthday to ThursdAI, your favorite weekly AI news show! Can you believe it's been two whole years since we jumped into that random Twitter Space to rant about GPT-4? From humble beginnings as a late-night Twitter chat to a full-blown podcast, newsletter, and YouTube show with hundreds of thousands of downloads, it's been an absolutely wild ride! That's right, two whole years of me, Alex Volkov, your friendly AI Evangelist, along with my amazing co-hosts, trying to keep you up to date on the breakneck speed of the AI world.

And what better way to celebrate than with a week PACKED with insane AI news? Buckle up, folks, because this week Google went OPEN SOURCE crazy, Gemini got even cooler, OpenAI created a whole new Agents SDK, and the open-source community continues to blow our minds. We've got it all, from game-changing model releases to mind-bending demos.

This week I'm also on the Weights & Biases company retreat, so TL;DR first and then the newsletter. Honestly, I'll start embedding the live show here in the Substack from now on, because we're getting so good at it I barely have to edit lately, and there's a LOT to show you guys!

TL;DR and Show Notes & Links
* Hosts & Guests
* Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
* Co-hosts - @WolframRvnwlf @ldjconfirmed @nisten
* Sandra Kublik - DevRel at Cohere (@itsSandraKublik)
* Open Source LLMs
* Google open sources Gemma 3 - 1B to 27B - 128K context (Blog, AI Studio, HF)
* EuroBERT - multilingual encoder models (210M to 2.1B params)
* Reka Flash 3 (reasoning), 21B parameters, is open sourced (Blog, HF)
* Cohere Command A 111B model - 256K context (Blog)
* Nous Research Deep Hermes 24B / 3B hybrid reasoners (X, HF)
* AllenAI OLMo 2 32B - fully open source GPT-4-level model (X, Blog, Try It)
* Big CO LLMs + APIs
* Gemini Flash generates images natively (X, AI Studio)
* Google Deep Research is now free in the Gemini app and powered by Gemini Thinking (Try It, no cost)
* OpenAI released the new Responses API with Web Search, File Search, and Computer Use tools (X, Blog)
* This Week's Buzz
* The whole company is at an offsite in Oceanside, CA
* W&B ran an internal MCP hackathon with cool projects - launching an MCP server soon!
* Vision & Video
* Remade AI - 8 LoRA video effects for WanX (HF)
* AI Art & Diffusion & 3D
* ByteDance Seedream 2.0 - a native Chinese-English bilingual image generation foundation model (Blog, Paper)
* Tools
* Everyone's talking about Manus (manus.im)
* Google AI Studio now supports YouTube understanding via link dropping

Open Source LLMs: Gemma 3, EuroBERT, Reka Flash 3, and Cohere Command-A Unleashed!

This week was absolutely HUGE for open source, folks. Google dropped a BOMBSHELL with Gemma 3! As Wolfram pointed out, this is a "very technical achievement," and it's not just one model, but a whole family ranging from 1 billion to 27 billion parameters. And get this – the 27B model can run on a SINGLE GPU! Sundar Pichai himself claimed you'd need "at least 10X compute to get similar performance from other models." Insane!

Gemma 3 isn't just about size; it's packed with features. We're talking multimodal capabilities (text, images, and video!), support for over 140 languages, and a massive 128k context window. As Nisten pointed out, "it might actually end up being the best at multimodal in that regard" for local models. Plus, it's fine-tuned for safety and comes with ShieldGemma 2 for content moderation. You can grab Gemma 3 on Google AI Studio, Hugging Face, Ollama, Kaggle – everywhere!
Huge shoutout to Omar Sanseviero and the Google team for this incredible release and for supporting the open-source community from day one! Colin, aka Bartowski, was right: "The best thing about Gemma is the fact that Google specifically helped the open source communities to get day one support." This is how you do open source right!

Next up, we have EuroBERT, a new family of multilingual encoder models. Wolfram, our European representative, was particularly excited about this one: "In European languages, you have different characters than in other languages. And, um, yeah, encoding everything properly is, uh, difficult." Ranging from 210 million to 2.1 billion parameters, EuroBERT is designed to push the boundaries of NLP in European and global languages. With training on a massive 5 trillion-token dataset across 15 languages and support for 8K context tokens, EuroBERT is a workhorse for RAG and other NLP tasks. Plus, how cool is their mascot?

Reka Flash 3 - a 21B reasoner under Apache 2.0, trained with RLOO

And the open source train keeps rolling! Reka AI dropped Reka Flash 3, a 21 billion parameter reasoning model with an Apache 2.0 license! Nisten was blown away by the benchmarks: "This might be one of the best like 20B size models that there is right now. And it's Apache 2.0. Uh, I, I think this is a much bigger deal than most people realize." Reka Flash 3 is compact, efficient, and excels at chat, coding, instruction following, and function calling. They even used a new reinforcement learning technique called REINFORCE Leave-One-Out (RLOO). Go give it a whirl on Hugging Face or their chat interface – chat.reka.ai!

Last but definitely not least in the open-source realm, we had a special guest, Sandra (@itsSandraKublik) from Cohere, join us to announce Command-A! This beast of a model clocks in at 111 BILLION parameters with a massive 256K context window. Sandra emphasized its efficiency: "It requires only two GPUs. Typically the models of this size require 32 GPUs. So it's a huge, huge difference." Command-A is designed for enterprises, focusing on agentic tasks, tool use, and multilingual performance. It's optimized for private deployments and boasts enterprise-grade security. Congrats to Sandra and the Cohere team on this massive release!

Big CO LLMs + APIs: Gemini Flash Gets Visual, Deep Research Goes Free, and OpenAI Builds for Agents

The big companies weren't sleeping either! Google continued their awesome week by unleashing native image generation in Gemini Flash Experimental! This is seriously f*****g cool, folks! Sorry for my French, but it's true. You can now directly interact with images, tell Gemini what to do, and it just does it. We even showed it live on the stream, turning ourselves into cat-confetti-birthday-hat-wearing masterpieces! Wolfram was right: "It's also a sign what we will see in, like, Photoshop, for example. Where you, you expect to just talk to it and have it do everything that a graphic designer would be doing." The future of creative tools is HERE.

And guess what else Google did? They made Deep Research FREE in the Gemini app and powered by Gemini Thinking! Nisten jumped in to test it live, and we were all impressed. "This is the nicest interface so far that I've seen," he said. Deep Research now digs through HUNDREDS of websites (Nisten's test hit 156!) to give you comprehensive answers, and the interface is slick and user-friendly. Plus, you can export to Google Docs! Intelligence too cheap to meter?
Google is definitely pushing that boundary.

Last-second addition: Allen Institute for AI released OLMo 2 32B - their biggest open model yet

Just as I'm writing this, friend of the pod Nathan from the Allen Institute for AI announced the release of a FULLY OPEN OLMo 2, which includes weights, code, dataset, everything, and apparently it beats the latest GPT-3.5, GPT-4o mini, and leading open-weight models like Qwen and Mistral. Evals look legit, but more than that, this is an Apache 2.0 model with everything in place to advance open AI and open science! Check out Nathan's tweet for more info, and congrats to the Allen team for this awesome release!

OpenAI's new Responses API and Agents SDK with Web Search, File Search, and Computer Use tools

Of course, OpenAI wasn't going to let Google have all the fun. They dropped a new SDK for agents along with the Responses API. This is a whole new way to build with OpenAI, designed specifically for the agentic era we're entering. They also released three new tools: Web Search, Computer Use, and File Search. The Web Search tool is self-explanatory – finally, built-in web search from OpenAI! The Computer Use tool, while currently limited in availability, opens up exciting possibilities for agent automation, letting agents interact with computer interfaces. And the File Search tool gives you a built-in RAG system, simplifying knowledge retrieval from your own files. As always, OpenAI is adapting to the agentic world and giving developers more power. (A minimal example sketch appears after these notes.)

Finally in the big company space, Nous Research released PORTAL, their new inference API service. Now you can access their awesome models, like Hermes 3 Llama 70B and DeepHermes 3 8B, directly via API. It's great to see more open-source labs offering API access, making these powerful models even more accessible.

This Week's Buzz at Weights & Biases: Offsite Hackathon and MCP Mania!

This week's "This Week's Buzz" segment comes to you live from Oceanside, California! The whole Weights & Biases team is here for our company offsite. Despite the not-so-sunny California weather (thanks, storm!), it's been an incredible week of meeting colleagues, strategizing, and HACKING!

And speaking of hacking, we had an MCP hackathon! After last week's MCP-pilling episode, we were all hyped about the Model Context Protocol, and the team didn't disappoint. In just three hours, the innovation was flowing! We saw agents built for WordPress, MCP support integrated into the Weave playground, and even MCP servers for Weights & Biases itself! Get ready, folks, because an MCP server for Weights & Biases is COMING SOON! You'll be able to talk to your W&B data like never before. Huge shoutout to the W&B team for their incredible talent and for embracing the agentic future! And in case you missed it, Weights & Biases is now part of the CoreWeave family! Exciting times ahead!

Vision & Video: LoRA Video Effects and OpenSora 2.0

Moving into vision and video, Remade AI released 8 LoRA video effects for WanX! Remember WanX from Alibaba? Now you can add crazy effects like "squish," "inflate," "deflate," and even "cakeify" to your videos using LoRAs. It's open source and super cool to see video effects becoming trainable and customizable.

And in the realm of open-source video generation, OpenSora 2.0 dropped! This 11 billion parameter model claims state-of-the-art video generation trained for just $200,000! They're even claiming performance close to Sora itself on some benchmarks.
Nisten checked out the demos, and while we're all a bit jaded now with the rapid pace of video AI, it's still mind-blowing how far we've come. Open source video is getting seriously impressive, seriously fast.

AI Art & Diffusion & 3D: ByteDance's Bilingual Seedream 2.0

ByteDance, the folks behind TikTok, released Seedream 2.0, a native Chinese-English bilingual image generation foundation model. The model excels at text rendering, cultural nuance, and human preference alignment, boasting "powerful general capability," "native bilingual comprehension ability," and "excellent text rendering." It's designed to understand both Chinese and English prompts natively, generating high-quality, culturally relevant images. The examples look stunning, especially its ability to render Chinese text beautifully.

Tools: Manus AI Agent, Google AI Studio YouTube Links, and Cursor Embeddings

Finally, in the tools section, everyone's buzzing about Manus, a new AI research agent. We gave it a try live on the show, asking it to do some research. The UI is slick, and it seems to be using Claude 3.7 behind the scenes. Manus creates a to-do list, browses the web in a real Chrome browser, and even generates files. It's like Operator on steroids. We'll be keeping an eye on Manus and will report back on its performance in future episodes.

And Google AI Studio keeps getting better! Now you can drop YouTube links into Google AI Studio, and it will natively understand the video! This is HUGE for video analysis and content understanding. Imagine using this for support, content summarization, and so much more.

PHEW! What a week to celebrate two years of ThursdAI! From open source explosions to Gemini's visual prowess and OpenAI's agentic advancements, the AI world is moving faster than ever. As Wolfram aptly put it, "The acceleration, you can feel it." And Nisten reminded us of the incredible journey: "I remember I had early access to GPT-4 32K, and, uh, then... the person for the contract that had given me access, they cut it off because on the one weekend, I didn't realize how expensive it was. So I had to use $180 worth of tokens just trying it out." Now, we have models that are more powerful and more accessible than ever before.

Thank you to Wolfram, Nisten, and LDJ for co-hosting and bringing their insights every week. And most importantly, THANK YOU to our amazing community for tuning in, listening, and supporting ThursdAI for two incredible years! We couldn't do it without you. Here's to another year of staying up to date so YOU don't have to! Don't forget to subscribe to the podcast, YouTube channel, and newsletter to stay in the loop. And share ThursdAI with a friend – it's the best birthday gift you can give us! Until next week, keep building and keep exploring the amazing world of AI! LET'S GO!

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
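As referenced in the notes above, here is a minimal sketch of what a Responses API call with the built-in Web Search tool might look like. This is a best-effort illustration based on OpenAI's announcement, not code from the episode; the model name and tool type string are assumptions, so check OpenAI's current documentation before relying on them.

```python
# Hypothetical sketch of the new OpenAI Responses API with the built-in Web Search tool.
# Model name and tool type string are assumptions; consult the official docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",                          # assumed model choice for illustration
    tools=[{"type": "web_search_preview"}],  # built-in Web Search tool per the launch post
    input="Summarize this week's biggest open-source AI releases.",
)

# The Responses API returns a structured list of output items;
# output_text is the convenience accessor for the final text.
print(response.output_text)
```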
Want to learn more about watches? Email us at support@chriswarnes.com. We buy, sell, trade, consign, service, source, and repair. Do you want to work with us on a timepiece? Email us at support@chriswarnes.com.

Let's talk about one of the most unique Royal Oak Offshores ever produced: the 26400SO "Camo." This watch has a special place in my heart—it was the first AP I ever owned. Not only did I wear it proudly, but it was also my first profitable flip! Bought for around $30K and sold for $32K in 2018. Talk about a win!

✨ Key Highlights:
* Produced: 2014 - 2021
* Case Size: Bold 44mm—because bigger is better!
* A true conversation starter with its unique "Camo" design.
WEAR OUR T-SHIRT: https://reserva.ink/movimentoemfoco BECOME A SUPPORTER: https://apoia.se/movimentoemfoco In this installment of the series "Movement is my focus!", we welcome Lívia Perídes Roizman, a sports physiotherapist and team leader at the Care Club clinic. She previously joined us in episode 115 to talk about "The patient as a project" and in episode 131, where she gave her account of the "32K at Serra Fina," when she ran the KTR. This time she returns to receive our 2024 award and to discuss the role of movement in her personal life. Guest: @liviaperidesfisio Hosts: @telles.rafa and @cassio_siqueira Editing: @crispetravicius
In this episode, we're joined by Kady Sandel, a business coach and the powerhouse CEO of a successful design agency, who shares the jaw-dropping story of booking $32,000 in design projects in a single day. Kady takes us through her journey from humble beginnings with $65 logos to consistently landing five-figure projects, revealing the mindset shifts, strategies, and client-building techniques that led to her $32K day.

Tune in as Kady breaks down her approach to attracting high-value clients, mastering strategic pricing, and managing multiple projects with ease. She shares practical tips on building a strong reputation and visibility in the design world, highlighting the importance of SEO, blogging, and finding a profitable niche.

We also dive into the role of AI in modern design, with Kady explaining how she integrates tools like ChatGPT to enhance her brand strategy and streamline her agency's operations. She discusses the balance between using AI to boost productivity while maintaining her unique brand identity, making it a powerful tool without overshadowing her personal design style.

Throughout the conversation, Kady shares insights from her journey to success, including lessons on overcoming challenges in niching down, defining ideal clients, and creating a business model that allows for financial freedom and a balanced lifestyle. This episode is packed with actionable strategies for designers looking to elevate their income, confidence, and success. Don't miss this inspiring episode that blends high-level strategy with down-to-earth advice from a designer who's been there, done that, and is ready to show you how to do it too!

Connect with Kady:
The Wealthy Client Blueprint: https://aventiveacademy.com/wealthy-client/ (*Use coupon code BUCKETLIST to get it for free!)
Instagram: https://www.instagram.com/aventiveacademy
Podcast: https://aventiveacademy.com/podcast/

Connect with Cassie & Shay:
Programs: https://bucketlisbombshells.com/programs
Free Community: https://www.facebook.com/groups/bucketlistbombshellscommunity
Instagram: https://www.instagram.com/bucketlistbombshells/
- In the latest school board skullduggery, an Ontario school board spent $32K to send staffers to an education conference in Hawaii. Deb takes your calls.
- Jamie Ellerton and John Tory join Deb to discuss the Ontario government's unexpected extension of the per-vote subsidy for political parties.
- Ontario cities consider bylaws to prohibit protests near schools and places of worship. Is this the right move?
Listening To Dreams – This story is crafted from Judges 7:9-17, where God has whittled Gideon's 32,000-man army down to just 300 men, which is nothing against the mighty enemy army that has filled the valley with troops. Find out how you can support this ministry by visiting our website at https://lizardtracks.net. My stories can be found on your favorite podcast app or Alexa; search for the podcast Lizard Tracks.
Hey everyone, Alex here! Can you believe it's already the end of May? Two huge AI company conferences are behind us (Google I/O, MSFT Build), and Apple's WWDC is just ahead in 10 days! Exciting! I was really looking forward to today's show; we had quite a few guests, and I'll add all their socials below the TL;DR, so please give them a follow. And if you're only in reading mode of the newsletter, why don't you give the podcast a try?
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge. We hypothesize that this stems from insufficient explicit supervision during long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness of a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. By applying this information-intensive training to Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess FILM-7B's ability to utilize long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). GitHub: https://github.com/microsoft/FILM. 2024: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou. https://arxiv.org/pdf/2404.16811
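To make the IN2 recipe more concrete, here is a rough sketch of how one might synthesize a single fine-grained training example of the kind the abstract describes: a short answer-bearing segment is embedded at a random position inside a 4K-32K-token context assembled from distractor text, paired with a question answerable only from that segment. This is an illustration of the idea only, not the authors' released pipeline (see the linked GitHub repo for that); the token counting and QA-generation stand-ins are assumptions.

```python
# Sketch of IN2-style long-context example construction (illustrative only,
# not the official microsoft/FILM pipeline; see the linked repo for that).
import random

def count_tokens(text):
    # Stand-in for a real tokenizer; the paper counts model tokens, not words.
    return len(text.split())

def make_qa_from_segment(segment):
    # Placeholder: in the paper, question-answer pairs are synthesized from the
    # segment (e.g. by a stronger LLM). A trivial stand-in keeps this runnable.
    question = f"Which passage discusses: {segment[:60]}...?"
    return question, segment

def build_in2_example(answer_segment, distractor_segments,
                      min_ctx_tokens=4_000, max_ctx_tokens=32_000):
    """Embed a ~128-token answer-bearing segment at a random position inside a
    4K-32K token context assembled from distractors, so the model learns that
    crucial information can sit anywhere, including the middle."""
    target_len = random.randint(min_ctx_tokens, max_ctx_tokens)

    # Greedily accumulate distractors until we reach the target context length.
    context_parts, total = [], 0
    random.shuffle(distractor_segments)
    for seg in distractor_segments:
        seg_len = count_tokens(seg)
        if total + seg_len > target_len:
            break
        context_parts.append(seg)
        total += seg_len

    # Insert the answer-bearing segment at a random position.
    insert_at = random.randint(0, len(context_parts))
    context_parts.insert(insert_at, answer_segment)

    question, answer = make_qa_from_segment(answer_segment)
    return {"context": "\n\n".join(context_parts),
            "question": question,
            "answer": answer}
```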
In this program we run through some recent Commodore news and the Commodore releases of the last few weeks. Finally, we review the main contents of issue 78 of a very classic modern magazine, the returning Commodore Free. All of this with the usual team: David Asenjo (https://twitter.com/darro99), Toni Bianchetti (https://twitter.com/seuck), Narciso Quintana "Narcisound" (https://twitter.com/narcisound), Jonatan Jiménez (https://twitter.com/jsabreman), and Paco Herrera (https://twitter.com/pacoblog64).

The news discussed:
- 8BitDo announces the Retro Mechanical Keyboard - C64 Edition: https://www.8bitdo.com/retro-mechanical-keyboard-c64/
- Zilog stops manufacturing the Z80 microprocessor: https://twitter.com/nanochess/status/1781290078230253998?t=9wpPhRUkLCD-96vmoDKC9A&s=19
- Musician and programmer 4Mat is tuning SEUCK so it can play music and sound effects at the same time: https://x.com/4mat_scenemusic/status/1774168812159414283?s=20
- Reproduction project for the Commodore Chessmate: https://hackaday.io/project/194011-commodore-chessmate-reproduction; https://hackaday.io/project/194011/logs?sort=oldest
- U64E-MK2: the Ultimate 64 in module format: https://www.facebook.com/groups/378328707752/?hoisted_section_header_type=recently_seen&multi_permalinks=10160508910172753
- Gala honoring Spanish video games: https://www.youtube.com/watch?v=MIRTulmvBqA
- Scans of the documentation provided by David Sancho: https://archive.org/details/@explora_commodore
- A sex toy controlled by a C64: https://twitter.com/_DeviantDesigns/status/1779941383102484834?t=5JUe44Hupl4Q0OOQXq2DrQ&s=19
- C-65 up for auction on eBay: https://www.ebay.com/itm/134989142856
- Winners of the Basic10Liner contest, games for various platforms in just 10 lines of BASIC: https://twitter.com/Basic10L/status/1776648844496810366?t=Bz7tfouLH0aQvllds-6meg&s=19; https://youtu.be/PZ6r_HHwuto?si=_uDdDEKXk3CMmoNd
- Hungarian Game Development Compo - Plus/4: https://sites.google.com/site/istvanmezo81/cplus4-competition-2023
- J-CPU64 6510/8500: https://retro8bitshop.com/product/j-cpu64-6510-8500-replacement-for-the-commodore-64-pre-order/
- SNasm Assembler, a new assembler for 6502 and Z80: https://mdf200.itch.io/snasm
- Another Hewson crowdfunding campaign, this time for pins: https://www.backerkit.com/c/projects/huey-games-ltd/huey-hewson-pintopia?ref=bk-ec-33942
- Commodore4ever presents a rear PSU connector for the C64 with optional overvoltage protection. There is also an angled version for the C128: https://www.commodore-4ever.com/product-page/back-jack-power-connector-reimagined

The games and programs discussed:
- Spinning Image (Carleton Handley, C64): https://carletonhandley.itch.io/spinning-image
- Timo's Castle (Roman Werner, C64): https://romwer.itch.io/hc
- Gridlock (Megastyle, C64): https://megastyle.itch.io/gridlock
- Bring Back More Bones (4KB) (Commocore, C64): https://commocore.itch.io/bring-back-more-bones-4k
- Prince of Persia (Pedro Bermejo, VIC-20): https://www.indieretronews.com/2024/04/prince-of-persia-has-been-converted.html; https://sleepingelephant.com/ipw-web/bulletin/bb/viewtopic.php?p=120870#p120870 (game download)
- Amigo Run (Reassembler 2024, Amiga): https://www.youtube.com/watch?v=rxKWThBZcXk
- Bunny's Boing Ball Bounty (RobSmithDev, Amiga): https://robsmith-dev.itch.io/bb
- Mad Pod Race (Gods of the Universe, Plus/4): https://plus4world.powweb.com/software/Mad_Pod_Race
- Temptations (SOY, Amiga): https://amigatronics.wordpress.com/2024/04/09/temptations-a-punto-de-caramelo/; https://s0yamigamsx.wordpress.com/temptations/ (author's website)
- Kondi Krush (Anystone, C64/Amiga/Plus/4): https://www.indieretronews.com/2024/04/kondi-krush-candy-crush-comes-to.html
- Anti Air (Inufuto, C64, Amiga, Plus/4, VIC-20, C-16, and C-116 (32K), plus other systems (MSX, TRS-80, etc.)): http://inufuto.web.fc2.com/8bit/antiair/; https://www.youtube.com/watch?v=uOO4t5NGr5w; https://www.youtube.com/watch?v=VSfYIuXfx00
- Outrun PETSCII (Andy Vaisey, C64): https://twitter.com/AndyVaisey/status/1782501824383070514?t=Jq9pKAaGU5OYCvifDHV5nQ&s=19
- Metal Mayhem (Dr Mortal Wombat, C64): https://drmortalwombat.itch.io/metal-mayhem
- Legend of Wilf (sequel to https://www.lemon64.com/game/kokotoni-wilf) (Hayesmaker64, C64): https://hayesmaker64.itch.io/legend-of-wilf
- Alpacalypse (TND, C64): https://richard-tnd.itch.io/alpacalypse
- R-Squadron (Monster's Legs, Amiga): https://monsters-legs.itch.io/r-squadron
- Glubble (Oxygene, Amiga): https://www.youtube.com/watch?v=gSAF4hTsvxc
- Tapper Basic (ing73, C64): https://ing73.itch.io/tapper
- Toop (Haplo, C64, Plus/4): https://h4plo.itch.io/toop
$32K stolen from a woman after her Uber was struck by another vehicle on the way to make a deposit; in other Uber news, 1/3 of Uber drivers say they have been in an accident; aggressive wild turkey: school pickup line edition.
What a Creep
Royal Family PR Fiascos
Season 24, Episode 5

With the latest news of an insanely unusual news cycle that, unfortunately, includes Kate Middleton being diagnosed with cancer, we decided to speak to supreme royal expert and friend of the pod Kristen Meinzer of the Daily Fail podcast to talk about what the heck is going on with the House of Windsor. With baffling (or a complete lack of?) strategic communications, a frighteningly low number of royals who can pinch-hit while King Charles is convalescing, and novice photoshopping--can this get any weirder?

Sources for this episode:
Washington Post: https://www.washingtonpost.com/world/2024/03/22/catherine-princess-wales-has-cancer-she-says-video/?utm_source=alert&utm_medium=email&utm_campaign=wp_news_alert_revere&location=alert
Buckingham Palace will pay $32K for a "Communications Assistant!": https://www.washingtonpost.com/world/2024/03/21/buckingham-palace-communications-assistant-kate/
US Weekly: https://www.usmagazine.com/celebrity-news/news/kate-middletons-photo-of-queen-with-grandchildren-was-manipulated/
ABC News: https://abcnews.go.com/GMA/Culture/kate-middleton-spotted-prince-william-amid-photo-editing/story?id=108236720
VOX: https://www.vox.com/culture/24087565/princess-kate-middleton-disappearance-rumors-explained-abdominal-surgery-kensington-palace
Reuters: https://www.reuters.com/world/uk/british-royals-shrug-off-speculation-about-kate-king-charles-2024-03-19/
Sky News: https://news.sky.com/story/royals-latest-doubt-over-second-kate-photo-william-appearing-later-today-13097903

Be sure to follow us on social media. But don't follow us too closely … don't be a creep about it!
Subscribe to us on Apple Podcasts
Twitter: https://twitter.com/CreepPod @CreepPod
Facebook: Join the private group!
Instagram: @WhatACreepPodcast
Visit our Patreon page: https://www.patreon.com/whatacreep
Email: WhatACreepPodcast@gmail.com
We've got merch here! https://whatacreeppodcast.threadless.com/
Our website is www.whatacreeppodcast.com
Our logo was created by Claudia Gomez-Rodriguez. Follow her on Instagram @ClaudInCloud
Hihi, this is Alex, from Weights & Biases, coming to you live, from Yosemite! Well, actually I'm writing these words from a fake virtual Yosemite that appears above my kitchen counter, as I'm not a Vision Pro user and I will force myself to work inside this thing and tell you if it's worth it. I will also be on the lookout for anything AI related in this new spatial computing paradigm, like THIS for example!

But back to reality for a second, we had quite the show today! We had the awesome time of having Junyang Justin Lin, a dev lead at Alibaba, join us and talk about Qwen 1.5 and QwenVL, and then we had a deep dive into quite a few acronyms I've been seeing on my timeline lately, namely DSPy, ColBERT and (the funniest one) RAGatouille, and we had a chat with Connor from Weaviate and Benjamin, the author of RAGatouille, about what it all means! Really really cool show today; hope you don't only read the newsletter but listen on Spotify, Apple or right here on Substack.

TL;DR of all topics covered:
* Open Source LLMs
* Alibaba releases a BUNCH of new Qwen 1.5 models, including a tiny .5B one (X announcement)
* Abacus fine-tunes Smaug, top of the HF leaderboard, based on Qwen 72B (X)
* LMSys adds more open source models, sponsored by Together (X)
* Jina Embeddings fine-tune for code
* Big CO LLMs + APIs
* Google rebranding Bard to Gemini and launching Gemini Ultra (Gemini)
* OpenAI adds image metadata (Announcement)
* OpenAI keys are now restricted per key (Announcement)
* Vision & Video
* Bria - RMBG 1.4 - open source background removal that runs in your browser (X, DEMO)
* Voice & Audio
* MetaVoice, a new Apache 2.0-licensed TTS (Announcement)
* AI Art & Diffusion & 3D
* Microsoft added DALL-E editing with "Designer" (X thread)
* Stability AI releases an update to SVD - video 1.1 launches with a web UI, much nicer videos
* Deep Dive with Benjamin Clavie and Connor Shorten show notes:
* Benjamin's announcement of RAGatouille (X)
* Connor's chat with Omar Khattab (author of DSPy and ColBERT) - Weaviate Podcast
* Very helpful intro to ColBERT + RAGatouille - Notion

Open Source LLMs

Alibaba releases Qwen 1.5 - ranging from .5B to 72B (DEMO)

With 6 sizes, including 2 new novel ones, from as little as .5B parameters to an interesting 4B, all the way to a whopping 72B, Alibaba open sources additional Qwen checkpoints. We've had the honor to have friend of the pod Junyang Justin Lin again, and he talked to us about how these sizes were selected, that even though this model beats Mistral Medium on some benchmarks, it remains to be seen how well it performs on human evaluations, and shared a bunch of details about open sourcing this.

The models were released with all the latest and greatest quantizations, significantly improved context length (32K), and support for both Ollama and LM Studio (which I helped make happen, and I am very happy with the way the ThursdAI community is growing and connecting!). A small usage sketch appears after these notes.

We also had a chat about QwenVL Plus and QwenVL Max, their API-only examples for the best open-source vision-enabled models, and had the awesome Piotr Skalski from Roboflow on stage to chat with Junyang about those models! To me a success of ThursdAI is when the authors of things we talk about come on the show, and this is Junyang's second appearance, which he joined at midnight at the start of the Chinese New Year, so greatly appreciated, and definitely give him a listen!
Abacus Smaug climbs to the top of the Hugging Face leaderboard

Junyang also mentioned that Smaug is now at the top of the leaderboards. Coming from Abacus, this is a finetune of the previous Qwen-72B, not even this new one. The first model to achieve an average score of 80, this is an impressive appearance from Abacus. Though they haven't released any new data, they said they are planning to! They also said that they are planning to finetune Miqu, which we covered last time, the leak from Mistral that was acknowledged by Arthur Mensch, the CEO of Mistral. The techniques that Abacus used to finetune Smaug will be released in an upcoming paper!

Big CO LLMs + APIs

Welcome Gemini Ultra (bye bye Bard)

Bard is no longer; get ready to meet Gemini. It's really funny, because we keep getting confusing naming from huge companies like Google and Microsoft. Just a week ago, Bard with Gemini Pro shot up the LMSYS charts, after the regular Gemini Pro API hadn't ranked as close, and now we're supposed to forget that Bard even existed?
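As mentioned in the notes above, here is a rough sketch of trying the smallest Qwen 1.5 chat checkpoint with Hugging Face transformers. It's an illustration rather than anything from the show; the exact model ID and chat-template usage are assumptions, so double-check the model card (the models were also published quantized and via Ollama / LM Studio).

```python
# Illustrative sketch: running the tiny Qwen 1.5 chat model locally with transformers.
# The model ID is an assumption based on the release naming; check the Hugging Face model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt using the model's own chat template.
messages = [{"role": "user", "content": "In one paragraph, what is new in Qwen 1.5?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```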
Our first ever demo day aimed for 15-20 people and ended up ballooning to >200 and being covered in the news. We are now running the 2024 edition in SF on Feb 23: Latent Space Final Frontiers, a startup and research competition in "The Autonomous Workforce", "Beyond Transformers & GPUs", and "Embodied AI". RSVP here! You can find all LS online/IRL events on our new calendar. Super Early Bird tickets have just gone on sale for AI Engineer World's Fair, June 25-27!

Today we have the honor of hosting two of Together AI's co-founders: Ce Zhang (CTO) and Vipul Ved Prakash (CEO). This is a rare opportunity to recap the history of the company since our last check-in with Tri Dao (Chief Scientist), some of their big releases, and do a deep dive into the state of the AI inference market. Together has emerged as one of the most consequential new startups in the new AI summer, last announcing a ~$100m Series A raise in November (at a ~$360-565m valuation).

But there are at least three Togethers - Together the Research Lab, Together the Fine Tuning & Inference platform, and Together the custom models service. As we clarify on the pod, the overarching philosophy of Together is the ability to improve on all these fronts simultaneously by being "full stack", from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms.

Bringing Research and Industry Together

In just one year, Together has been behind some of the most exciting research in AI:
* RedPajama, a fully open source dataset for model pre-training which mirrored the Llama1 recipe. Then followed by RedPajama2, a 30T token dataset of filtered and de-duplicated tokens.
* RedPajama-INCITE-3B and 7B, which were SOTA in a few benchmarks at the time of release.
* FlashAttention-2, developed by Together's Chief Scientist Tri Dao. We covered FA-2 in a previous episode with him.
* Mamba-3B, the most promising transformer-alternative model that they released in collaboration with Cartesia.
* StripedHyena, a SOTA graft of Hyena state space models and transformer models together.
* Medusa, an alternative to speculative decoding that lets you use multiple decoding heads instead of a draft model.
* MonarchMixer, which was one of the most popular orals at NeurIPS 2023. It's an approach to transformers that replaces many of its core parts with Monarch matrices for better computational efficiency.

And I'm sure we missed something! As Vipul reveals, almost 50% of Together staff are researchers, and two of their co-founders (Chris Ré and Percy Liang) are professors at Stanford, so we can expect a lot more here.

Bringing "Disaggregated" GPUs Together

On their cloud, they offer inference as a service, fine-tuning, pre-training, etc., but unlike other providers they think of themselves as a disaggregated cloud. Today, they have ~8,000 A100 and H100 GPUs on their platform (an exclusive revealed on the pod!) totaling over 20 exaflops of compute, but instead of just buying more and putting them in a cluster and then exposing a `us-east-1` option for customers, they are taking heterogeneous compute sources and adding a unified layer on top for developers to consume. Building on Ce's research, Together's GPU Clusters are taking on comparable AWS and GCP offerings in both cost and speed.

Take the Hessian AI center in Germany or the DoE's INCITE; they have GPUs that they want to share with researchers, but they lack the cloud layer over it.
Similarly, there's starting to be more and more differentiation amongst types of GPUs: H100s, A100s, MI300s, etc. Each of them has different availability and performance based on task, and the end user shouldn't have to be a hardware expert to run inference on a model, so Together abstracts a lot of that away.

A big theme of the Together inference stack, a "bag of 50 tricks" that we discuss on the pod, is also "hardware-aware" algorithms like FlashAttention and Mamba, which further emphasize the benefits of co-developing everything together.

Special Focus: Transformer Alternatives

As we mentioned above, they are also funding a lot of research in Transformer alternatives. To reiterate a few points on why they matter:
* Longer context is not the motivation for sub-quadratic architectures: Transformers don't inherently have hard limitations on context size, but they just get extremely expensive. When developing sub-quadratic alternatives, you easily enable very long context, but that's not how you should compare them. Even at the same context size, inference and training are much cheaper on sub-quadratic architectures like Hyena.
* Emergence of hybrid architectures: a lot of early conversations have been around the "post-Transformers" era, but it might be more like "half-Transformers". Hybrid architectures could have split layers with some transformer-based and some state-space ones. One of the challenges is that a lot of hardware kernels are optimized for transformer operations, so you'd lose a lot by moving away completely.
* Higher speed = higher GPU throughput: if we could reach the same benchmark performance on sub-quadratic architectures, it'd solve a lot of the GPU crunch. Today we peak at ~170 tok/s on inference in some open models; if we could reach 5,000 tok/s on the same card, you'd be able to serve 30x more customers on the same hardware. As a cloud provider, you're obviously incentivized to get there.

We had a lot of fun chatting with the Together guys and we covered a lot of ground, so enjoy the conversation!

Note: This is the first episode of a "cloud providers mini-series".
We have Erik from Modal and Ben from Replicate coming up next!

Video Podcast

Join us in watching the video version of this pod on our snazzy YouTube!

Show Notes
* Together AI
* RedPajama Dataset v1 Announcement
* RedPajama Models v1 Announcement
* Together Embeddings
* StripedHyena-7B
* Mamba-3B-SlimPJ
* Vipul's X thread on Anyscale
* Vipul's Razor
* SemiAnalysis' "Inference Race to the Bottom" post
* Chris Ré
* Mike Conover's episode
* Slim Pajama by Cerebras
* Dolma by AI2
* Jina AI
* Tengyu's Voyage AI

Timestamps
* [00:00:00] Introductions
* [00:00:43] Origin and current state of Together.ai
* [00:02:15] Transition from Apple to Together and the vision for open AI
* [00:04:54] How Chris Ré introduced Ce and Vipul
* [00:08:43] How RedPajama came to be
* [00:13:34] Model training and Transformer alternatives
* [00:15:37] DSIR and the importance of data in LLMs
* [00:21:19] Inference vs Fine-tuning vs Pre-training usage on Together
* [00:23:20] Together's GPU stash
* [00:27:02] Why standardization of inference metrics is important
* [00:29:26] Building moats in AI inference
* [00:31:49] Federated vs disaggregated cloud computing
* [00:34:57] Opportunities for improvement in the inference stack
* [00:36:13] Anyscale benchmarking drama
* [00:41:27] Not just an inference platform
* [00:43:50] Together Embeddings and the future of embedding models
* [00:45:53] State space models and hybrid architectures
* [00:53:52] The need for 5,000 tokens/s speed in AI inference
* [01:00:23] What's the most interesting unsolved question in AI?

Transcript

Alessio [00:00:00]: Hey, everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.
Swyx [00:00:14]: Hey, and today we're together with Together. Welcome to the studio, guys.
Ce / Vipul [00:00:20]: Thank you.
Swyx [00:00:21]: I don't know how you typically give self intros, but does anyone want to go first? How do we get our audience acquainted, especially to who's speaking, because it's unusual for us to do a four-person pod. Yeah.
Ce [00:00:33]: Hi, everyone. I'm Ce. I'm one of the co-founders of Together and the CTO, working with the team on technical things.
Vipul [00:00:40]: I'm Vipul Ved Prakash, co-founder and CEO of Together.
Swyx [00:00:43]: I always consider you guys as one of the sort of all-in-one companies. I always want to say labs, but I feel like you're not a lab. What is the sort of origin of Together, and then what is it today? I feel like it used to be Together.xyz, and then now you're Together.ai.
Vipul [00:01:00]: I think fundamentally, Together is about open and independent AI systems. We think this is one of the most consequential technologies of our time, and when we started the company in June 2022, our focus was to build a platform for open source, independent, user-owned AI systems. One way to think about it is big labs, frontier model labs, have built their own platforms for developer platforms for their models. We think of Together as a platform for everything else, whether these are open models, whether these are models being built by companies that are owned by them. Our sort of XYZ roots, we have a fairly deep decentralization and open ethos that kind of reflects in all our platform and strategy and business.
And we also, the way we structure our cloud is by combining data centers around the world instead of, you know, we are today not located in hyperscalers, we have built a footprint of AI supercomputers in this sort of very disaggregated, decentralized manner.
Alessio [00:02:15]: I know before Together, you were at Apple, so you go from like the most walled garden, private, we don't say anything company, to we want everything to be open and everybody to know somebody. What maybe did you learn from like the Apple way of being super close and polished and maybe what are you taking now to Together to make it open, but also a very nice developer experience?
Vipul [00:02:37]: Yeah, I would say, you know, one sort of my, you know, background has been in open source for a long time. One of the first things I created was a collaborative spam filter, you know, this was back in the day. It's called Vipul's Razor. And it became quite popular. And the first company I founded called CloudMark was built around, you know, taking open source and building both an open side of it and a commercial product around it. I think Apple is sort of very focused on providing this amazing experience to its customers with, you know, most of the technology sort of hidden behind the product. And certainly the focus on fluidity and applying complex technology to make everyday things simple is something that Apple does really well. And, you know, that's been a sort of big part of how we think about our developer platforms. I think it informs it. The other thing is that during my years at Apple, we, you know, worked a lot on deep learning. And one of the things that was sort of very viscerally accessible to me was how well these systems worked. We, you know, we built an open domain Q&A system. This was based on Facebook's LSTM paper in 2016. And it was remarkable because we had a parallel system based on sort of information retrieval techniques, which is extremely complicated, didn't work that well. And you know, this thing we wrote in a week was just incredible performance. So I think some of those experiences, at least for me personally, sort of were creating this roadmap of how important and powerful this technology is. And you know, when the scaling laws paper was published, I was very clear, like it was in some ways something very profound. We've never had algorithms that improve in capabilities with scale out. So this is almost a new era of computing. So that's been, I think, the influence of Apple, my years at Apple, really for me, like crystallized the value of what we are doing together.
Alessio [00:04:54]: And how did you decide to join forces? Because you did a postdoc with Chris Ré at Stanford. You know, we already had Tri Dao from Together and we talked about Hazy. What was like the meeting of the mind of, hey, I come from like the more technical postdoc assistant professor background and we've got yet a more product thing. What got you excited to like build this now?
Ce [00:05:15]: So we have been working on this together, Chris, in the essentially last like 10 years, right? So a machine learning system 10 years ago was like probabilistic graphical models, right? And then convolutional neural networks and then all the foundation models that we see today. But if you look at this, I think that fundamentally the thing we are actually optimizing is actually not that different. It's always about data movement across essentially all the stacks, right?
So when you do distributed like computing, it's about communication across different machines. When you do, for example, flash attention, it's about data movement at a different essentially memory hierarchy, right? So we have been doing this in the last 10 years and seeing the field start grow, grow, grow. So we kind of feel the current kind of this like wave of technology is actually the perfect time to actually bring all the research essentially into something real. And we are super lucky that we got introduced to Vipul, right? And then we hope to join forces and bring this to real world.
Swyx [00:06:10]: It's an unusual team of like sort of research and industry. Like you've been like a third or fourth time founder now. Third time founder, yeah. And so like what is your first order of business when you like set up together? Like how do you sort of put something like this together? Oh my God, I'm going to use this word so much.
Vipul [00:06:27]: I feel AI companies are really kind of driven by research. And Chris and I had been talking about how to reduce the cost of building models. We felt that there aren't really big data moats around foundation models. They are built from a subset of the web. What is difficult is the cost of capital to build these. And one of the ways in which you can reduce this cost is by making more efficient systems. With that, it was really about finding the right set of co-founders and team. In fact, when Chris introduced me to Ce, and I think within the first five minutes of talking to Ce, I was like, we are starting this company. And our early focus was thinking about this more sort of disparate set of resources, you know, GPUs around the internet. Can we use those to build? And we really have to compress communication for, you know, when we do gradient averaging, there's just a lot of traffic. And if you can reduce that somehow, you sort of open up the possibility of using cheaper compute, you know, across the network. And Ce's research for a decade has been in that subject. You know, and from there, finding, you know, other folks in the network, I think there is generally a lot of excitement and philosophical alignment around what we are doing, which, you know, we publish papers, we publish open source libraries and code, we build open models. And I think the people in academia in, you know, machine learning and NLP, that's really what they want to do. So I think that's been really a kind of kernel for, you know, composition of the company. And we're lucky to have, you know, at this point, attracted some of the best researchers in the field. So I think that's the most important thing. And, you know, the rest of it is sort of driven by us. A couple of these philosophies around independent systems and decentralization and good developer interfaces, you want to make it accessible. That's, you know, just as important. And the rest follows from there, I think.
Alessio [00:08:43]: I want to try and fill in some of the blanks in the history of Together. I think people come on your website today and they say, you raised a hundred million dollars Series A. They're like, wow, these guys are like super legit company. But it feels like Red Pajama just came out a year ago. I remember we had Mike Conover in the studio, who had built Dolly at Databricks. And you announced it literally the morning we were recording. So we're like in the studio on our phones, looking at it.
And it's like, wow, this is like the first time now there's like a good curated dataset to do open pre-training. So maybe let's start from there. Like, what was the motivation behind it? Why did you decide to do that? It's, datasets are one of the things that most people don't want to work on. They just want to do models, not datasets.
Ce [00:09:27]: Yeah. So, yeah, first one is not the first, right? So I think it's actually built on a whole bunch of amazing effort the community already have. For example, Eleuther have the Pile, right? There's a whole bunch of amazing datasets they have, like C4, right, from Google, right? So I think really get inspired by the impact those like datasets have on the community, right? So I think when we did Red Pajama, it was a time that people are really fascinated by Llama, the model, like Llama 1, right? Which I feel like decades ago, right? But it's kind of, people are really excited about the quality, right? So that's really like a big shift in people how to think about open model. People start to see hope, right? So, but the one problem of Llama is the data recipe is being described in a pretty detailed way in the paper, but the data is actually not there. So, and our original thinking is how about we take the recipe and we try to do our best effort reproduction and try to put it out, such that we can learn from our mistakes in the reproduction together, right? So that's essentially the original thinking behind Red Pajama. And we have been pretty happy and excited about what community have been kind of build on it. For example, there's a dataset called Slim Pajama, right? Which do deduplication over our data, right?
Swyx [00:10:38]: From Cerebras, did they talk to you before?
Ce [00:10:39]: Oh, yeah, yeah, yeah, yeah. So, yeah, so we are very good friends so we can discuss about technical perspective. We are pretty excited because I think it's kind of why we do Red Pajama in the first place is that people can actually build not only models, but also datasets essentially over that piece of artifact, right? So that's actually what inspired us to do the first version of Red Pajama dataset.
Swyx [00:11:01]: Yeah, and then you released V2 maybe two months ago.
Ce [00:11:04]: Yeah.
Swyx [00:11:05]: 30 trillion tokens.
Ce [00:11:06]: Yeah, 30 trillion tokens. So I think what's exciting about Red Pajama V2 is not only the number of tokens, but we start to kind of learn from Red Pajama V1. So one thing that we learned was that data quality is really the core, right? So you want to take this couple trillion token dataset and try to bring them down maybe to one trillion or two trillion, right? The way that you actually filter them, deduplicate them is not something that kind of pre-decided before you see the application, right? So you kind of want to have a modular framework to think about data quality, right? So like given application, let's automatically or maybe semi-automatically try to come up with a way to filter it down. So that's why in Red Pajama V2, we kind of overlay the dataset with like 40 different pre-computed quality signal, right? If you want to reproduce your best effort, like C4 filter, it's kind of like 20 lines of code, right? And this open up this opportunity you can actually put different filter together, learn the combination of filter. We are very excited to see what community actually come up with using Red Pajama V2.
Swyx [00:12:11]: It was retrospectively so obvious that this is a good idea that I wonder how come more datasets don't do this.
You release the dataset with all these toggles that you can turn on and off, right? And you can sort of tune up and down the quality in ways that you believe is important to you. Yeah, I just, it makes so much sense now in retrospect. Because everyone just publishes like their pipeline and then the end result. But what about all the intermediate stages? Yeah.
Ce [00:12:35]: Yeah, so I think, so there are multiple things there. I don't think we are the only one like doing that. For example, like Dolma from AI2, right? They have this very flexible format to actually put in those quality signals, right? Think like, we are actually calling them some, right? So you can actually load Red Pajama using their tool. That whole thing should work, right? So I think one fundamental thing that changed in the last year, essentially, in the beginning when people think about data, it's always like a byproduct of the model, right? You release the model, you also release the data, right? The data side is there essentially to show people, ah, if you train on this data, you'll get a good model. But I think what started to change is when people started building more and more of those models, people started to realize like different subset of data side is kind of valuable for different applications, right? The data becomes something to play with, right? So I think we are kind of lucky that we happen to release Red Pajama right at that point that we get this opportunity to actually learn from that.
Alessio [00:13:34]: And you guys have a custom model training platform on Together too. You have a bunch of stuff in there for data selection, like the DSIR and things like that. How did you decide to work on that versus, because you first started with like some of the fine tunes on LLAMA. Do you see a lot of interest there? And I know you've been doing a lot of research on state space models and other transformer alternatives. Like, do you also see that as something you'll keep working on this year and push more people towards?
Vipul [00:14:02]: Yeah, I mean, we, you know, we think of how to make training more efficient and building models more efficient. Part of that is being able to select the right dataset. This is why you have signals, DSIR. You can start with a small dataset and find similar documents, build models with that. So we think it's an important part of the kind of model build tooling that, you know, sort of widely useful for people building different kinds of models. Similarly, you know, we are running into the limits of how fast you can make transformers. And we want inference at 5,000 tokens per second. I don't think we will get there with transformers and we need to learn longer sequences. Data, again, becomes very, very expensive with transformers. So we work on state space models and all the research that we are doing there. And hopefully other labs will pick up on this and make it a kind of important target for optimization. But we think that, you know, open source is a great place for this. We can provide these recipes for data and for training to our customers who are building, you know, custom models themselves. And, you know, we are quite excited about the sort of progress we are seeing there.
Alessio [00:15:18]: Do you have some of these models available for inference on Together?
Can people play around with them directly, you know?Swyx [00:15:25]: Yeah.Vipul [00:15:25]: Yeah, they're available for inference on our serverless platform.Swyx [00:15:29]: I always try to be the person who asks about acronyms in case, you know, people want to understand. Should we explain importance resampling, you know, that kind of stuff?Ce [00:15:37]: Oh, yeah. So DSIR, essentially, it's a fundamental idea. It's one of the papers from Percy, right? So essentially, if you know what you are doing, you can actually use that as a very strong signal about what data to put into the training process, right? So that's essentially the fundamental idea, right? And then more concretely, there are actually different versions of DSIR, right? So one version is, if you have a validation set, right, you can actually somehow measure the similarity between the validation set and your pre-training corpus and essentially select that subset. And there's also a less targeted version of DSIR where you'll say, yeah, maybe Wikipedia is actually a very good corpus. Let's try to find more Wikipedia, right? And you can think about it in two ways, either as a way to come up with different weights for different data slices. Yeah, so as a filter type of step for a dataset, or think about it as data augmentation. So that's how, yeah, that's how we think about DSIR.Swyx [00:16:33]: That makes sense. I will have to read the paper to understand a little bit more. Because when you say things like, we have to know in advance what we are trying to do with the model, then we do importance resampling. That is against the principle of general intelligence, right? Like the point is to train AGI.Ce [00:16:48]: Yeah, so it depends on what you mean by being general or generic, right? So I think, I mean, you can always take a meta-learning perspective that we know the distribution of tasks that we care about, right? So you can always go up the ladder of how general the whole thing is, right? But also, for many of the customers that we are actually talking to, right, they have a very targeted application, right? The benefit you can get out of that is you could build a better open model, often smaller, often easier to do inference on, if you know what you want, right? So I think there's a whole trade-off: the x-axis would be how generic the whole thing is, and the y-axis would be not only the top accuracy, but also a whole bunch of the deployment cost, right? The size of the model, right? The robustness of the model. So I think different people will navigate the space in different ways. And we want to be the platform where, essentially, whatever point you want, we have a solution for you.Swyx [00:17:43]: One more thing on data before we go deeper on state space models. Are we running out of data? Can we go an order of magnitude? Can we go five orders of magnitude? How do both of you think about how much data we have and how much we need?Ce [00:17:55]: Yeah, so I think that's a very, very good question. So I don't think we are running out of data on Earth.Swyx [00:18:02]: Right, so think about it globally. Training data, training-class data.Ce [00:18:05]: Yeah, yeah, so I think, I mean, some of it is not accessible, right? But I do think there are many organizations in the world that have enough data to actually train very, very good models, right? So, I mean, they are not publicly available, right?
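For readers who want the DSIR idea from a few turns back in concrete form, here is a toy sketch of the hashed n-gram flavor: fit bag-of-n-gram distributions over a small target set (say, a validation set or a Wikipedia sample) and over the raw corpus, then resample raw documents in proportion to the ratio of the two. The smoothing and length normalization are shortcuts of ours for readability, not the estimator from Percy's paper, so use it only to build intuition.

```python
import hashlib
import math
import random
from collections import Counter

NUM_BUCKETS = 10_000

def hashed_ngrams(text, n=2):
    toks = text.lower().split()
    grams = toks + [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return [int(hashlib.md5(g.encode()).hexdigest(), 16) % NUM_BUCKETS for g in grams]

def bucket_logprobs(docs):
    counts = Counter(b for d in docs for b in hashed_ngrams(d))
    total = sum(counts.values())
    # add-one smoothing so buckets unseen in one corpus don't blow up the ratio
    return [math.log((counts.get(b, 0) + 1) / (total + NUM_BUCKETS)) for b in range(NUM_BUCKETS)]

def importance_weights(raw_docs, target_docs):
    lp_target = bucket_logprobs(target_docs)
    lp_raw = bucket_logprobs(raw_docs)
    weights = []
    for doc in raw_docs:
        grams = hashed_ngrams(doc)
        log_ratio = sum(lp_target[b] - lp_raw[b] for b in grams)
        weights.append(math.exp(log_ratio / max(len(grams), 1)))  # length-normalized (our shortcut)
    return weights

def dsir_resample(raw_docs, target_docs, k):
    # higher weight = looks more like the target distribution
    return random.choices(raw_docs, weights=importance_weights(raw_docs, target_docs), k=k)
```

The same weights can be used either way Ce describes: as sampling probabilities for a filtered subset, or as per-slice mixture weights when you assemble the training mix.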
But there are people who actually have access to those, right? So I think in general, right, if you think about the data in the open space, right, and I guess that is specifically what you actually mean by whether we are running out of data, I do think there needs to be some way, right, that people who are training open models get connected with, essentially, data that's not internet data. So I think that channel needs to be opened up for the open models to get more data, right? But I'm kind of on the optimistic side that society will figure out a way that we can train open models on data beyond this internet data.Swyx [00:18:57]: Beyond internet, meaning books?Ce [00:19:00]: I mean, there are a lot of those, right?Swyx [00:19:02]: Books, right?Ce [00:19:02]: Transcripts, right? Videos, audio, right? So there are a whole bunch of data sources that we are not integrating into the open data side, right? And maybe they shouldn't be open, right? So I think the community needs to figure out a way, yeah, like the best balance, yeah? Such that we can have open models, but on the other hand, also have a reasonable collection of data that we can actually use.Swyx [00:19:29]: I think a lot of people think that, there's a theory that Whisper was released so that you could transcribe YouTube and then use that as a source of tokens. Then I talked to other researchers who are like, you know, YouTube has very low quality tokens. You know, do you want your model to talk like a live streamer from YouTube? Because that's what they're going to do. So it's not clear, like, what the quality of this data could be.Ce [00:19:53]: Yeah, I guess that depends on your application, right? So I think as a platform, right, our goal is, whatever application you have, yeah, we have a platform that you can actually use to achieve your goal, right? So there are definitely applications where it makes sense to speak like YouTube, right? But there are probably also other applications that are more on the formal side, right? So I think there are going to be a diverse collection of models, both open and closed, right? And we kind of want to be the engine that powers that.Swyx [00:20:21]: There's a lot of people who own data sources who are doing the locally optimal thing and humanity as a whole is losing out. So like New York Times is suing OpenAI, you know, Stack Overflow shut down their API, Reddit shut down their API, X, you know, made their own model, right, on Twitter data. We're just going to have all these like tiny little gardens of data that would be useful in a general model, but everyone's just trying to make their own model. And it seems like globally suboptimal.Vipul [00:20:47]: I think you need to have some kind of a marketplace for figuring out how to get this, you know, data into models and have, I think we'll increasingly see more of that. You know, I think there's a positive aspect to it too. There is an incentive for creators to participate in a system which is sort of more fair relative to, you know, the capture of value by an AI company that's taking their data. But I agree. I think this is a big open problem that needs to be solved. And I hope there will be, you know, serious efforts around it.Alessio [00:21:19]: Let's talk about the most precious resource on planet earth, GPUs. You have a lot of compute obviously, but you also have a lot of product pieces. You have inference, you have fine tuning, you have pre-training. What's the split in terms of usage?
Do you see most people are just running inference on off-the-shelf models? Do you see maybe some last mile fine tuning?Vipul [00:21:40]: I would say right now, the top five models on our inference stack are probably all fine-tuned versions of open models. And we've seen- Who fine-tuned them?Swyx [00:21:51]: You fine-tuned them?Vipul [00:21:52]: They were fine-tuned by our customers.Swyx [00:21:54]: By your customers.Vipul [00:21:55]: You know, either on our platform or off our platform. And we are generally seeing that, you know, that is the sort of trend where you can get better quality on your task by sort of now easily adapting these models to your data. We also have, I would say, over 20 big model builds happening on the platform, which are customer builds. We see a lot of training and it's also somewhat surprisingly a more continuous kind of workload. We sort of imagined that this would be more episodic. You train a model and then you do inference. But what we find is, you know, they train a model and then they train the next version and then the next version, which sort of grows in scale. I would say training is still the bigger portion. In some ways, inference is superlinear to model quality. And as the models are getting better, there's more and more inference.Swyx [00:22:48]: Oh, because they're more useful. Yeah, they're more useful, yeah. So, okay, so training is bigger. This is actually consistent with what we've heard from Mosaic, that, you know, people think that training is sort of like a one-time deal. You do one big run and then you're done. It's never true. And so I'm interested in, like, putting some numbers on it, and I don't know what you have disclosed or what you want to disclose, but, like, how many GPUs do you have? What is the equivalent amount of compute that you have? Because I understand that your GPU setup is different than what people typically think of, like, a giant data center somewhere, right?Vipul [00:23:20]: I don't think we have shared this number publicly. It's, you know, so this will be the first time, I guess. Like, we have close to 7,000 to 8,000 GPUs today. It's growing monthly.Swyx [00:23:31]: What class of GPU are they?Vipul [00:23:32]: They're mostly A100s and H100s.Swyx [00:23:35]: Okay.Vipul [00:23:36]: And probably more, I think, split towards H100s now. You know, we'll be sort of building this best-of-class hardware. So as there are other versions of these coming out later this year, we plan to have those in the fleet as well.Alessio [00:23:53]: I know when we talked last year, you were also using some of the supercomputers by the Department of Energy. There was kind of like a lot of random GPU compute in the world. Have you seen that kind of getting timed out? I think maybe a year ago, people were like, oh, yeah, you can use this GPU computer that is going to be end-of-life. Has the bar changed to give access to those resources?Ce [00:24:13]: From our perspective, it's actually getting better. Yeah, so from the community perspective, because many of the institutions in the world, they're actually investing in hardware, right? So for example, we are working with one of the institutes in Germany called Hessian AI, right, which gives us a lot of help on the compute side. So they started to have this very big GPU cluster, and they're actually sharing that with the community, right? And it's not super big, right, but also not a small one, right? So you start to see these different sites start to pop up, right?
And because of the power of the community, they start to actually share that. So we actually find that as a researcher today, it's probably easier to actually get a GPU than last year.Swyx [00:24:56]: Interesting.Alessio [00:24:56]: And then for you to buy them, what's the state of the market right now? Is it still extremely hard to get any? Do you have Jensen's phone number? Do you have like GM's phone number? Do you guys get like an SDR because you're like under 10,000?Vipul [00:25:12]: NVIDIA is obviously motivated to help us, both as an investor and because we are their customers. I would say the market is very tight still, and it's likely going to be this way for a while. My sense is that the demand for AI compute has just ramped up very, very quickly, and it will take a while for supply to catch up.Swyx [00:25:37]: So how tight is it, let's say compared to a year ago, two years ago? What do you mean when you say tight? The things you want, you can't get?Vipul [00:25:42]: You can't get them immediately. They're sort of, you know, minimally like two to three months out. Any inventory that shows up tends to clear very, very rapidly. And, you know, we obviously sort of look at this in a very detailed and analytical way. There are four to five million GPUs that will be sold this year from NVIDIA and others. And if you think about a 512 to 1,000 GPU cluster for a company, that's 4,000 to 8,000 companies, right? So it's in some ways a very small number. In other ways, the cost of GPUs will be, you know, $80 to $100 billion, and then you layer servers and data center space and electricity on top of that, and that's, you know, close to $250 billion worth of compute, which when you compare it to the cloud computing of today, you know, AWS's revenue last year was $88 billion. So this is really kind of a build-out happening of AI hyperscalers. It is much more disaggregated, and it's very, very global. So, you know, we think that GPUs are going to be sort of a precious resource for a long time, and using them optimally is very valuable.Swyx [00:27:02]: Yeah.Alessio [00:27:02]: Our friend, Dylan Patel from Semianalysis, he wrote a post about the inference market recently and obviously mentioned you guys. In his post, he said, our model indicates that Together is better off using two A100 80-gig systems rather than an H100-based system. The temperature and performance testing also point to Together utilizing speculative decoding. Any thoughts? Is Dylan right? I don't know, what's-Swyx [00:27:26]: What is his model, man? What does he know that they don't know? Yeah, exactly.Alessio [00:27:30]: I wanna know, I guess like from the outside, and sometimes we even do it, we try and speculate on what people are actually doing. So for the first time, now we have a former guest writing about a current guest. So we wanna know what you guys thought and maybe what are some of the misconceptions that people from the outside have on what it takes to run like a GPU cloud today?Vipul [00:27:50]: Yeah, big fan of Dylan's, by the way. I religiously read Semianalysis. I think there were some errors in that analysis. In particular, we were trying to decode it and one of the things we noticed is that it assumed that input tokens weren't being priced. So I think that may have been an error in the model. There's also this assumption that people are running this at a loss. I don't think that's the case. It's very expensive. You can't do that for very long.
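To make Vipul's market math above explicit, here is a quick back-of-envelope in Python. The per-GPU price and the overhead multiplier are assumptions we picked so the totals land near the figures he quotes; they are not numbers from Together.

```python
# Back-of-envelope version of the numbers above. The per-GPU price and the
# overhead multiplier are assumptions chosen to land near the quoted figures.
gpus_sold_low, gpus_sold_high = 4_000_000, 5_000_000
cluster_low, cluster_high = 512, 1_000

companies_low = gpus_sold_low // cluster_high        # 4,000
companies_high = gpus_sold_high // cluster_low       # ~9,800
print(f"one cluster each -> roughly {companies_low:,} to {companies_high:,} companies")

assumed_price_per_gpu = 20_000                        # blended $/GPU (assumption)
gpu_capex_low = gpus_sold_low * assumed_price_per_gpu / 1e9
gpu_capex_high = gpus_sold_high * assumed_price_per_gpu / 1e9
print(f"GPU spend: ${gpu_capex_low:.0f}B to ${gpu_capex_high:.0f}B")      # ~$80-100B

overhead_multiplier = 2.5                             # servers, data center, power (assumption)
print(f"total build-out: ~${gpu_capex_high * overhead_multiplier:.0f}B")  # ~$250B
```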
And there are trade-offs in terms of the batch sizes you use and the kind of tokens-per-second performance, which are system trade-offs. We've done a lot of work. This is one of the key areas of research for us. So our inference stack is a combination of 50 different sort of tricks and techniques and we think there's a lot of room for optimization here. So whichever hardware provides better performance, whether it's H100s or A100s or L40s, we can sort of measure price performance on particular hardware and we tend to use that for that model, or in some cases, certain customers have data streams which can then be optimized for a particular configuration regime. So we do fairly detailed work on how to make this more efficient, and so it's hard, from the outside, to look at memory bandwidth and estimate what's actually happening.Alessio [00:29:26]: How many of these 50 tricks are you keeping to yourself and how many are you gonna open up? Because we had Tri on, obviously Flash Attention 2 is open source. He mentioned he'd love to come work at Together because of how much you care about open source. Yeah, how do you weigh that as a CEO and CTO?Vipul [00:29:43]: A lot of it is open, right? Flash Attention, Flash Decoding, et cetera, and when we publish something that's very generally, universally useful, it's going to produce better open source AI. We tend to publish it as open source. I think on the inference stack, there are open source inference stacks which are pretty good, and definitely today, it gives us a competitive advantage to have the best one. So we are not sort of rushing out to release everything about it. It's not overall that additive to open source out there and it is particularly useful as a business for us to provide the best price performance. Yeah, we make these decisions. We have discussions. Anything that we keep closed, we generally talk about it quite a bit and decide, like, this is the piece that is closed for today and it may not be the case six months from now. It may not matter as much.Ce [00:30:40]: Yeah, so I think being open is kind of very important, right? So I think the whole company is actually built on this idea that there's going to be an ecosystem built on our open models, right? And that's also how we are really lucky to attract this top group of talent to actually join us, because of the dream and the mission that we have on our side to really facilitate the open ecosystem, right? So I think in general, I think all the ideas should be open. So that's why we publish papers, right? We actually talk about ideas, right? So I don't think it makes any sense to keep ideas closed, right? There are some software artifacts that are really deeply embedded into our own stack. They're kind of only useful when you're trying to build a disaggregated cloud, right? Maybe at some point they're going to be open, as people have said, right? But at this moment, we are kind of busy actually building it, right? So that's probably the picture about when that piece is going to be open, right? But I think on the research side, the ideas and for our people to publish things, I think that's really, really important, right? So I think that's how we get talent. That's how I think we as a company are going to move the field forward.Swyx [00:31:49]: I noticed that you never use the words federated learning or inference.
Is there a distinction that you draw?Ce [00:31:55]: So, I mean, it's definitely not intentional, but I think federated learning has been used in so many different ways by so many different people. It starts to lose a very precise meaning about what it really means, right? If you go back to the original Google paper on federated learning, I think that's very different from what people are talking about today when they say federated. Yeah, we kind of want to be really precise about it.Swyx [00:32:18]: And so your term is disaggregated.Ce [00:32:19]: Yeah, so as an infrastructure, right? So that's disaggregated.Swyx [00:32:22]: Aren't most clouds disaggregated? Like what's different about it?Ce [00:32:27]: So one way is that most of the clouds are disaggregated, but some of that is actually being exposed to the user, right? If you go to AWS, you do know which region you are in, right? So one thing that we are trying to do is, you have this disaggregated cloud, not only in terms of location or geographically where things are, but in terms of the reliability and also the diversity of this infrastructure. And if we want to build a reliable, high-quality layer over that, the user actually doesn't know, right, what's actually happening under the covers, right? So I think that's one of the differences in the way that we are thinking about infrastructure.Swyx [00:33:06]: Yeah, a bit closer to Cloudflare than AWS. Yeah. Yeah. We have one question here, which we'll just throw out, it's kind of fun. So going back to this sort of inference stack piece, maybe if you had to pull out like a call for researchers or just like point out interesting areas of work that you're interested in, what pieces of the stack have the most opportunity for improvement?Ce [00:33:27]: Yeah, so the way we are thinking about the inference stack is, there are multiple things that can happen, right? You can do better algorithms, like speculative decoding, you can change the model architecture, you can go really crazy on the system side, right? And you can also co-design it with the hardware, right? So it's not really clear that innovation on a single dimension will get you there. So the key thesis on our side is, if you only push in one direction, you are going to reach diminishing returns really, really quickly. Yeah, there's only so much you can do on the system side, only so much you can do on the algorithm side. I think the only big thing that's going to happen is when you get all those dimensions to actually compound, right? So to have algorithm, model, and system all come together, I think that's how we reach the next 10x improvement on inference, right? So I don't think there's a single dimension that is particularly important, but looking at this space in a joint way, right, trying to co-optimize multiple dimensions jointly, I think that's going to be really important for the community to look at.Vipul [00:34:28]: Yeah, we often see, I see numbers from the team, and you have these multiple methods, not all of them compound. Sometimes you mix these together and it's still similar results, and some combination of them will have this incredible effect that is really, really super interesting. So it's very systems, you know, a kind of broad systems approach to it that's the most effective.Swyx [00:34:51]: I think I finally get the name of the company, like- Bring it together, yeah.
Everything needs to be optimized together.Alessio [00:34:57]: All right, just quickly, how does all this work change when some of the architectures change? I know with mixture of experts, speculative decoding is a little less efficient because of memory bandwidth. How much do you invest when it's maybe a model-specific improvement versus a more horizontal thing? Also, you're researching different architectures, so how much do you want to spend time optimizing what's state of the art today versus what's coming next?Vipul [00:35:24]: We do spend time on what's state of the art today as well as what's next. You know, the value we get from doing specific optimization, even for, you know, what works well for a particular model on A100s with a particular bus versus H100s, it's a worthwhile investment for us. So we will go down fairly deep into a specific architecture and specific hardware. It does also inform what works better where, and you don't have to take the same approach for, you know, every model and every sort of hardware setup. We can take these different approaches and we do have these multiple systems now. We know that, you know, system B is better for Mixtral and system C is going to be better for StripedHyena or Mamba.Alessio [00:36:13]: Before we move on from inference, we need to talk about the AnyScale drama. So we're actually having Sumit on the podcast tomorrow, who also talked about it and kind of came to you guys' support about how it's not just, like, oh, Together saying this benchmark's not good because they look bad in it. How, I guess like, it's a hard question to ask, but like, why did you decide to just come out and say it? And how maybe does that also reflect the values that you guys have about open source and openness and kind of like being transparent about what's real, and maybe hopes for standardizing some of these benchmarks to make it more clear?Ce [00:36:56]: So it's a great service AnyScale is doing for the community, right? I mean, it's very hard to do benchmarks. The moment you do a benchmark comparing N players, right, N minus one will be unhappy. If you have two tables, then maybe N of them will be unhappy, right? So it's a very great thing that they're doing. And in some of the work that we are doing, we actually use LLMPerf, right? So it's a great thing that they're actually doing. So I think one thing about benchmarks, and probably the professor part of me is talking, is that a good benchmark should think about how it's going to incentivize the field to actually move forward, right? So if the benchmark really becomes a kind of standard, people are going to over-optimize to the benchmark. And when people are doing that, what are we actually trying to incentivize, right? Will that move the world to a better place? Or will that essentially have every single player focus on marketing or spending time or money on something that actually does not matter on the technical side, right? It's very hard to actually strike a balance, right? So I think the reason we tried to give feedback on the benchmark is that we kind of want to open up the discussion about how the industry should come together and define maybe a common way that we compare with each other, right? So like how database people do TPC, right? Maybe we should have something similar, right? So we are trying to start some of that conversation.
So it's not really that we jumped out to say it's not good, because there's no way we can have a perfect benchmark. That doesn't really exist, right? So we just tried to kickstart a conversation that maybe we should come together and do something that the community agrees on and that aligns with the benefit a user is going to get, right? So, just get the conversation started.Vipul [00:38:42]: I've spoken to the AnyScale team after that, and I think they had really great intentions. And partly, I think it felt very objective and everyone sort of had a reaction to it because it just didn't match the benchmarks that we've all run internally against different services. I think there's a need for a common industry benchmark run by an independent party versus one of the vendors.Swyx [00:39:04]: Is there one that you'd point to?Vipul [00:39:06]: I don't think one exists today. I think there should be. We're having some conversations about someone setting one up. And there's lots of interesting aspects of this. Time to first token is a function of where the test was run from. There is different load on these services at different times of the day and weekday or weekend. So you have to measure that well. And I think if all of that were done very well by an independent source, that would be a very useful service to customers and to the services themselves.Swyx [00:39:39]: Yeah, I'll point people to artificialanalysis.ai, which is a new one that recently emerged. I don't know if they've done it right. It looks like a side project of a couple people. But I think it's in all the providers' interest to work with them. And ensure that there's an independent third party that's measuring these things, right? At least on the baseline. For me, what's worrying is more about what Ce was saying, which is, do these benchmarks skew things in ways that customers might not be mindful of? Like, what are these things overemphasizing that we might be missing? And I don't really know. It seems like a lot of these services, bundled together, include a version of quantization as well. So that means there are performance trade-offs, right? You're not comparing apples to apples, the same model itself, even though it's like a Llama variant or whatever. So what do people trade off? They trade off latency, they trade off price. Obviously, those are the first two. But what else, right? What factors matter in an inference business?Ce [00:40:33]: Yeah, so I think there's also the throughput, right? There's the time to first token, right? And then there are things that users do not often see, for example, the reliability, right? The capacity, right? So that also has an impact on user experience at a global scale. Maybe not on a single query, right? But in aggregate, you can also see a whole bunch of things, like whether you are emphasizing P50 or P95, right? So there's a whole bunch of things that you can actually play with. And of course, there's also quality. So there are different ways to actually make the whole thing faster, speculation, quantization, or a combination of those, right? So yeah, there are so many things to actually play with. So they probably need a benchmark where the protocol is transparent, to make sure it's very clear what we are doing, and a whole bunch of checks on the quality to make sure we are putting the right group of systems in the same table. So I think then essentially the user can actually navigate the space. So I think that's going to be good for everyone.Swyx [00:41:27]: Yeah, makes sense.
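Since time to first token, throughput, and P50/P95 keep coming up, here is a small sketch of how one might measure them against any streaming endpoint. The stream_tokens client is a hypothetical placeholder for whatever API you are testing, and this ignores the load, geography, and quality caveats discussed above.

```python
import time
import statistics

def measure_once(stream_tokens, prompt):
    """One request: time-to-first-token (seconds) and decode throughput (tokens/sec).
    stream_tokens is a hypothetical client that yields tokens for a prompt."""
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in stream_tokens(prompt):
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start
    if ttft is None:                      # endpoint returned nothing
        return float("nan"), 0.0
    total = time.perf_counter() - start
    decode_tps = (count - 1) / (total - ttft) if count > 1 and total > ttft else 0.0
    return ttft, decode_tps

def summarize(samples):
    """samples: list of (ttft, decode_tps) tuples; needs a few dozen for a stable P95."""
    cuts = statistics.quantiles([s[0] for s in samples], n=100)  # 99 percentile cut points
    return {
        "ttft_p50_s": cuts[49],
        "ttft_p95_s": cuts[94],
        "decode_tokens_per_s_mean": statistics.mean(s[1] for s in samples),
    }
```

A benchmark protocol would also pin down the prompt lengths, the request concurrency, the time of day, and the client location, which is most of what makes the numbers comparable across providers.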
It's a very important field and I think hopefully there's a good third party that emerges from this. So I just want to touch on one more piece, which is, I think I'm appreciating from this discussion that fine tuning is a bigger part of your business than I thought. The other big player in fine tuning is Mosaic. Well, Mosaic is more training, but like there's a bunch of other players in the fine tuning space. If I was a prospective fine tuning customer, what do I come to you with? Do I come to you with my custom data and that's it? Do I also have to write the fine tuning code? What level of engagement do you do with your customers?Vipul [00:42:01]: I think across the spectrum, our customers are training models, pre-training models from scratch, and many of them will bring their data sets and use our infrastructure and training stack to train their models. There are others who have trained smaller models and want to scale up, scale up across infrastructure, scale up across data. So we'll sort of help them do that. There are customers we initially start with a little bit more consultatively. They have a particular task and idea in mind and we will help them get from there to the data set and the right model to achieve that task. So it's a spectrum and, you know, our goal is to, we're trying to productize as much of this as possible. So that the whole process can be fast and scalable. I would say there is a lot more understanding around fine tuning now, like even in the last six months, there are, you know, open source tools, recipes, literature, podcasts, Discord channels where people are figuring it out. And it really is, in many ways, one of the successes of open source: you have small collectives of, you know, engineers who are now creating the top models on open source leaderboards. And they have tried out all sorts of different, you know, data recipes, creating synthetic data. Merging models. Merging models. So it's, that's really fun to see. And I think that sort of agency that exists now is exciting. And that is, we see a lot of that sort of being applied into products and, you know, more commercial models that people are deploying in their applications.Alessio [00:43:50]: And then just to, I guess, wrap up on Together, it's almost becoming like a platform as a service, because now you've released Together Embeddings. How did you get 92.5 accuracy on 32K retrieval? And do you think we're kind of getting to the point with embeddings where we've done everything that we could, you know, it's getting to the most optimized it's gonna get, and we should just focus on models and inference, or do you think there's still room there to improve?Ce [00:44:17]: Oh, I don't think so. We haven't even gotten started on embeddings. Yeah. So I think there are so many things. So, embeddings are really fundamental for many things, for example, RAG, right, deep in applications. So that's how people bring knowledge in. That's also the fundamental piece when you want to build a better model, right? It gives you this understanding about what actually gets into the model. You can actually use that to build a better data set, get a better model, then get better embeddings, and you start this loop, right? Without good embeddings, the loop is not closed, right? So I think both on the quality side, how to embed more dedicated semantics into those vectors, how to deal with negation, for example, right? And how can you make the whole thing really, really fast?
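A stripped-down sketch of the RAG loop Ce is describing: embed chunks, embed the query, pull back the nearest chunks as context. The embed function here is a fake, hash-seeded stand-in so the snippet runs on its own; in practice you would swap in a real embedding model or endpoint.

```python
import numpy as np

def embed(texts):
    """Hypothetical stand-in: replace with a real embedding model or endpoint.
    Returns one unit-norm vector per input text."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    vectors = rng.normal(size=(len(texts), 768))
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def build_index(chunks):
    # brute-force matrix of chunk embeddings; fine for small corpora
    return chunks, embed(chunks)

def retrieve(index, query, k=3):
    chunks, vectors = index
    q = embed([query])[0]
    scores = vectors @ q                    # cosine similarity (vectors are unit-norm)
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

def build_prompt(index, question):
    context = "\n".join(retrieve(index, question))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
```

The closed-loop point in the conversation is that the same retrieval signal can also be turned back on the training side, for example to pick which documents go into the next data mix.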
So I think for the next couple of years, yeah, we will see a whole bunch of new embeddings, maybe of different sizes and much, much faster than today. Yeah, so I think it's a very active research area. I think people should invest more, yeah.Swyx [00:45:14]: I was surprised to see, I think Jina or, yeah, there's Jina AI, and then there's another guy, Tengyu's Voyage. They are coming out as startups purely focused on embeddings.Ce [00:45:25]: Yeah. Yeah, so I think it's a very, very important piece of the system, right? So people haven't focused a lot on them before, and they should definitely start to do that.Swyx [00:45:36]: Yeah. Why are the Chinese universities so good at embeddings? You know what I mean, right? Like the BGE and- Yeah, yeah, yeah.Ce [00:45:44]: So I don't know. We just released our first embedding model, so we are still trying to learn how to build an embedding model. Yeah, so ask me again in six months.Swyx [00:45:53]: I'll probably have more insight about how to build a better one. I just noticed that ada-002 used to be at the top of the MTEB chart, and then it's just like sliding down and down and down, and all the new models are coming out of China for some reason. And I'm like, I don't know what's going on there. So we cannot leave this discussion without talking about state space models. But first of all, how much of the company is dedicated to research? Like it's obviously like not production quality yet, but-Vipul [00:46:17]: I would say it's like 40, 45%, I was counting this morning. That's huge.Swyx [00:46:22]: Yeah, so that's the biggest- It's a big investment. Yeah. Okay, well, I mean, it looks like it's paying off, so. And then high level, I will confess or admit or mention for the listeners who are also similarly skeptical, I did not use to care about long context because I was like, you know, 30K is enough, 100K is enough, right? I'm not, you know, modeling DNA sequences or anything like that. Why do I need long context? And I mean, first of all, I'll throw that open to you. But second of all, I think what Mamba did for me was change that perception, that it's only about long context, that the only reason you want sub-quadratic architectures is for long context. Actually, that's not true. And it's also just more efficient to train, period. Right? I'll just leave that open to you. Like what's the motivation that people should keep in their heads? There are multiple things, right?Ce [00:47:09]: So one thing is that, I mean, the moment a model can do long context well, it often means that it's kind of cheaper. Yeah, so I mean, that's why it can be long. I mean, in principle, a transformer can do long context. It's just very expensive. So I think what those state space models are trying to do is push the size of the state, right, to be as small as possible. That's why they can do long context, right? And try to decouple this quadratic dependency, right, to make sure you can have a much better execution pattern. One direct consequence of this is you can do long context really cheaply, but on the other hand, it also introduces a whole bunch of benefits even when you are not doing long context. Right? So I think that's actually probably equally important. Because the state gets smaller, you can use a really large batch size, right? You can actually be much faster. Right? So yeah. And another thing is, one of the hypotheses that we have is, like in StripedHyena, it starts to have a hybrid architecture, right?
Part of it is a state space model and part of it is still the transformer. So different components probably deal with different things better. So maybe by putting them together, by thinking about how information propagates over the whole horizon of the context, you can probably get an even better quality model than a transformer. Right? So I think that's why we are investing a lot in those models. Not only for the context, which is very important, but also for the whole bunch of benefits they could bring.Swyx [00:48:42]: Yeah. How should people treat the distinction between Mamba and StripedHyena? Like what's the point of releasing these two as separate models? Is one like sort of the Together proprietary one and then the other is like the more open research one?Ce [00:48:53]: Yeah. So I think it's pretty much a different stage of exploration. They kind of had different hypotheses when we tried to build them. Yeah. Like for instance, there are different views about state space models. One is Hyena, another is Mamba, right? They're actually different architectures. So when we built StripedHyena, the curiosity that we had was, what is the highest quality non-transformer model we can ever build? The goal of StripedHyena was to see whether we can match Mistral. And by fine-tuning well, whether we can outperform that in some way, right? So it has a very, very strong baseline that we are trying to beat. So that's why the hybrid thing gets into the picture, right? And for Mamba, it's kind of more... The curiosity was how far can we push a pure architecture? Then we went very systematically from small to large, right? All the way to 3 billion, right? So the baseline was essentially the best 3 billion model. So I guess they are at different stages of exploration; at some point, I think they are going to converge. We actually learn different things when building different models. I think they are just intermediate stages in the exploration at different points.Alessio [00:50:02]: You mentioned the hybrid architecture. Is that the model grafting that you mentioned in the StripedHyena post, where you mention you can have transformers and non-transformers together? Like this is a concept that I hadn't heard before reading about this. So in most people's mental models, it's like transformers OR something else, it's not transformers AND something else. How do you train a model that is hybrid? Is there any difference in like how you construct your datasets? Is there any difference in how you then run inference on it? How should people think about starting research in this field?Ce [00:50:36]: Yeah, so we were also very surprised. Yeah, when we came up with this hybrid architecture. So the way to think about it is you have different layers in the neural network, right? So the state space model for some layers will already give you the benefit. The other layers could be transformers, right? They could give you this more global view of the sequence, but for the other layers, you don't have to have that, right? I can still have all the other things that kick in, right? So we don't know what the optimal mixture between different architectures is. I mean, in principle, we can have a Mamba, a Hyena, and a transformer, all those things coming together, right? And then you can see what makes sense. We have no idea what is optimal there.
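To picture the hybrid wiring Ce describes, here is a toy PyTorch sketch where some layers are attention and others are a cheap gated-convolution mixer. The gated convolution is just a placeholder for a Hyena- or Mamba-style block, not an implementation of either, and the layer pattern is arbitrary.

```python
import torch
import torch.nn as nn

class GatedConvMixer(nn.Module):
    """Placeholder sub-quadratic mixer: a short causal depthwise conv with a gate.
    It is NOT Hyena or Mamba, just a cheap stand-in to show the wiring."""
    def __init__(self, dim, kernel_size=4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size - 1, groups=dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):                                   # x: (batch, seq, dim)
        y = self.conv(x.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)  # crop -> causal
        return y * torch.sigmoid(self.gate(x))

class Block(nn.Module):
    def __init__(self, dim, use_attention):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.use_attention = use_attention
        self.mixer = (nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
                      if use_attention else GatedConvMixer(dim))

    def forward(self, x):
        h = self.norm(x)
        if self.use_attention:                              # no causal mask here, wiring only
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        return x + h                                        # residual connection

class HybridModel(nn.Module):
    """pattern 'ccac' means: conv mixer, conv mixer, attention, conv mixer."""
    def __init__(self, dim=256, pattern="ccac"):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim, c == "a") for c in pattern)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

# x = torch.randn(2, 128, 256); HybridModel()(x).shape  -> torch.Size([2, 128, 256])
```

Which positions in the pattern get attention, and how many, is exactly the open question discussed next; the sketch only shows that the blocks compose like interchangeable pieces.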
So what we are excited about is that now the community has a whole bunch of building blocks that they can actually play with like Lego, right? So just put them together and see what happens, right? So we are kind of very excited about that. Yeah, we are in the process of trying to learn more about this architecture. And when we know what we are talking about, we will definitely share with the community how to do that in a systematic way.Swyx [00:51:41]: Cool. What are we still unsure about? Like, why don't we just, you know, put all the money in the world into training these things now? Like what is left to figure out before we scale this thing?Ce [00:51:53]: So if you look at how the transformer has been developed, right, in the last five to 10 years, right, people didn't start from, you have this Attention Is All You Need paper and then let's put all the money in, right? It always starts from this very systematic understanding about the scaling, about data quality, about essentially the limits, right? I think for state space models to go from the labs to the real world, you kind of need to go through the same process. But of course, the second time doing that is kind of easier, right? But I think there's no way we can get rid of this systematic step of studying the scaling laws, studying what data to put in, right? So, what's the impact of different data slices on the final model quality?Swyx [00:52:33]: Do you expect that the data inputs will be different?Ce [00:52:37]: I don't know, but I wouldn't take it for granted that they should be the same, right? So that's one of the hypotheses. We have no opinion on that because I think that's the result of the study, not the assumption. Yeah, we do not need to assume that.Swyx [00:52:51]: Okay, scaling laws and data, anything else like architectural that we are not sure about? Because now you have this selection mechanism that you're pretty happy with.Ce [00:52:59]: Yeah, so, I mean, first of all, how to mix them, right? And second is, what is the architecture? So if you look at the transformer, right, one very interesting piece there is that people also optimize the hardware, yeah, to make sure that things run very fast, right? There are very efficient kernels, very efficient hardware. And that adds another boost, right, for the transformer architecture, right? So that's something that should happen for state space models. Which architecture is easier to run on the hardware, right? So things go faster, you can put in more data, and it adds another dimension to the scaling law. So I think we just need to plow through the whole space and just be really systematic, from small models to 1 billion, 3 billion, 7 billion, just go all the way up, right? So I wouldn't jump around in the space. I would just be patient and be systematic. Yeah, I think we'll get there, yeah.Swyx [00:53:52]: Yeah, well, I'm looking forward to more research from you guys to figure that out. So one dimension which we didn't talk about, we talked about long context, we talked about efficiency, but speed is also very important. A good inference provider provides, let's say, 70 tokens per second, and then maybe that's faster than less good inference providers that are more like 30 tokens per second. But that's the rough range, right? State-of-the-art today. That's around human speaking speed; human reading speed is about 200 words per minute.
Why do we need 5,000 tokens per second is my question back to Vipul. And maybe is this something that is an emphasis for research as well, or is this more just an inference-only thing?Vipul [00:54:29]: There are applications that are consuming the tokens that are produced by models, so they're not necessarily being read or heard by humans. That's a place where we see that level of requirement today that really nobody can quite satisfy. There is, if you think about it, as intelligence grows, how do you sort of increase the bandwidth of it, you know, how do you reduce the latency of it? If we can do 5,000 tokens a second, the throughput of the same card goes up significantly and it can support more applications. So I think it's important from that perspective. And then it opens up new UX possibilities. Once you can get sort of an immediate answer
Welcome to episode 244 of the Cloud Pod Podcast - where the forecast is always cloudy! We've got a ton of news for you this week, including a lot of AI updates, including new CoPilot Pro and updates to ChatGPT, including the addition of a GPT store. Plus, we discuss everyone's favorite supernatural axis, MagicQuadrants.It's a jam packed episode you won't want to miss. Titles we almost went with this week:
ThursdAI TL;DR - November 23 TL;DR of all topics covered: * OpenAI Drama* Sam... there and back again. * Open Source LLMs * Intel finetuned Mistral and is on top of leaderboards with neural-chat-7B (Thread, HF, Github)* And trained on new Habana hardware! * Yi-34B Chat - 4-bit and 8-bit chat finetune for Yi-34 (Card, Demo)* Microsoft released Orca 2 - it's underwhelming (Thread from Eric, HF, Blog)* System2Attention - Uses LLM reasons to figure out what to attend to (Thread, Paper)* Lookahead decoding to speed up LLM inference by 2x (Lmsys blog, Github)* Big CO LLMs + APIs* Anthropic Claude 2.1 - 200K context, 2x less hallucinations, tool use finetune (Announcement, Blog, Ctx length analysis)* InflectionAI releases Inflection 2 (Announcement, Blog)* Bard can summarize youtube videos now * Vision* Video-LLaVa - open source video understanding (Github, demo)* Voice* OpenAI added voice for free accounts (Announcement) * 11Labs released speech to speech including intonations (Announcement, Demo)* Whisper.cpp - with OpenAI like drop in replacement API server (Announcement)* AI Art & Diffusion* Stable Video Diffusion - Stability releases text2video and img2video (Announcement, Try it)* Zip-Lora - combine diffusion LORAs together - Nataniel Ruiz (Annoucement, Blog)* Some folks are getting NERFs out from SVD (Stable Video Diffusion) (link)* LCM everywhere - In Krea, In Tl;Draw, in Fal, on Hugging Face* Tools* Screenshot-to-html (Thread, Github)Ctrl+Altman+Delete weekendIf you're subscribed to ThursdAI, then you most likely either know the full story of the crazy OpenAI weekend. Here's my super super quick summary (and if you want a full blow-by-blow coverage, Ben Tossel as a great one here)Sam got fired, Greg quit, Mira flipped then Ilya Flipped. Satya played some chess, there was an interim CEO for 54 hours, all employees sent hearts then signed a letter, neither of the 3 co-fouders are on the board anymore, Ilya's still there, company is aligned AF going into 24 and Satya is somehow a winner in all this.The biggest winner to me is open source folks, who got tons of interest suddenly, and specifically, everyone seems to converge on the OpenHermes 2.5 Mistral from Teknium (Nous Research) as the best model around! However, I want to shoutout the incredible cohesion that came out of the folks in OpenAI, I created a list of around 120 employees on X and all of them were basically aligned the whole weekend, from ❤️ sending to signing the letter, to showing how happy they are Sam and Greg are back! YayThis Week's Buzz from WandB (aka what I learned this week)As I'm still onboarding, the main things I've learned this week, is how transparent Weights & Biases is internally. During the whole OAI saga, Lukas the co-founder sent a long message in Slack, addressing the situation (after all, OpenAI is a big customer for W&B, GPT-4 was trained on W&B end to end) and answering questions about how this situation can affect us and the business. Additionally, another co-founder, Shawn Lewis shared a recording of his update to the BOD of WandB, about out progress on the product side. It's really really refreshing to see this information voluntarily shared with the company
Has AI Already Taken Your Job and You Don't Even Know It?
This week, I'm sharing the big shifts (some would call this a quantum leap ;) I made to go from $32K to $119K in one year.If you are trying to create your first 6-figure year, this episode is for you!I'm also going to be going deeper into each of these big shifts in my Quantum Leap Masterclass.I'll be sharing with you the strategic moves I made in my content & selling to create this result. Things you can do as soon as you leave the Masterclass!You don't want to miss it.Grab a spot here
In this program we run through some news from the current Commodore scene. Then we dissect Amiga Action 11 with the usual team made up of David Asenjo (https://twitter.com/darro99), Toni Bianchetti (https://twitter.com/seuck), Narciso Quintana "Narcisound" (https://twitter.com/narcisound), Jonatan Jiménez (https://twitter.com/jsabreman) and Paco Herrera (https://twitter.com/pacoblog64). On this occasion we are also joined by a special guest, Josua Ckultur of Retro Entre Amigos (https://twitter.com/retroamigos). The Commodore news items discussed were: - New MOS6510 replacement project with acceleration: https://twitter.com/RetroWizzard/status/1696952457987846359?t=_Nrg2u-yfBRkeS9DRKYj_g&s=19 - Leonard Tramiel's talk preserved by the VCF: https://archive.org/details/VCFW2023_Early_Commodore_History_-_Leonard_Tramiel_and_Dave_McMurtrie - Images of an Amiga development system (Jonatan): https://x.com/commodoreihs/status/1702481013828456782?t=bx5HuzDAGcprBryuOcOGxg&s=08 - Prototype of a musical keyboard based on the SID chip: https://x.com/commodoreihs/status/1702492927526613035?t=BMdwmMYKXKnRTGy9uZzICg&s=08 - A PET in Blue Beetle: https://x.com/CommodoreSpain/status/1705322774045028524?s=20 - Gamebase64 is at risk due to the departure of its current administrator: https://www.gamesthatwerent.com/2023/09/gamebase64-needs-you/ https://www.lemon64.com/forum/viewtopic.php?t=83094 - New XC=BASIC: https://twitter.com/xc_basic/status/1702997228615323747?t=8FOsGN_SoO25mWC3nyqD1w&s=08 https://github.com/orlof/xcb3-gfx - Updated PET emulator: https://www.masswerk.at/pet/ - GeckOS V2, an operating system for the 6502: https://x.com/EverythingC64/status/1705559705597235323?t=idXBmYKqqLZ8CX287bj-QQ&s=08 https://github.com/fachat/GeckOS-V2/tree/v2.1.0 - Vortex reader, 80-column software for the C64: https://csdb.dk/release/?id=235428 - Publications by Michael Tomczyk. Paco/Everyone - Preserved internal VIC 20 manual from before its commercial launch. - New book from Editions 64K: https://www.editions64k.fr/projects/demoscene-the-amiga-renaissance/ Quick rundown of games: - QIXSCII - C64: https://www.youtube.com/watch?v=f1lYcxlFFEE https://romwer.itch.io/qixscii - Giana Sisters Power Edition 2023 - C64: https://www.youtube.com/watch?v=0E2ddPwN6eI https://csdb.dk/release/?id=235520 - Carrion's 121 Colors - CPlus/4: https://www.youtube.com/watch?v=yM6QhfxPJyQ https://plus4world.powweb.com/software/Carrions_121_Colors - Super Monza GP2 - VIC-20 32K: https://www.youtube.com/watch?v=yyYq_ZAxE1A https://aj-layden.itch.io/super-monza-gp-2 - Doomed PETSCII Pacman - C64: https://www.youtube.com/watch?v=18NUwFWHoi8 https://csdb.dk/release/?id=235324 - TED Vibes 2 - CPlus/4: https://www.youtube.com/watch?v=na1b1E0djRg https://plus4world.powweb.com/software/TED_Vibes_2 - Super 8 Football 2023 - C64: https://www.youtube.com/watch?v=YQIIOKv0UI0 https://interlacedgames.itch.io/super-8-football - Retro Scape 64 - C64: https://www.youtube.com/watch?v=BKwiY597iRI https://csdb.dk/release/?id=235535 - Colodrio - CPlus/4, C16: https://www.youtube.com/watch?v=RmSfSA-iK_8 - WackoPac demo - Amiga: https://www.youtube.com/watch?v=nFW3sAMYTbc - Boxx 4, v.1.04 - Amiga: https://www.youtube.com/watch?v=hRtrwHAC3rY - The Empire Strikes Back - CPlus/4: https://www.youtube.com/watch?v=FnPX6oI1bIo https://plus4world.powweb.com/software/The_Empire_Strikes_Back - New "first release" of SNK VS Capcom - C64: https://www.youtube.com/watch?v=fP0wBpNTXv8
Prepare for an electrifying interview that's set to ignite your spirit as host Dean Wilson sits down with the extraordinary James E. Dixon, a beacon of resilience and empowerment. James's awe-inspiring journey from overcoming multiple surgeries due to poor blood circulation to becoming a record-setting weight lifter and motivational speaker is nothing short of remarkable. Born with challenges that might have deterred others, James's story is a testament to the power of determination and grit. Join us as we dive deep into James's riveting life story – from battling adversity through playing basketball and securing a Division III scholarship to his profound exploration of weightlifting and its transformative impact. But that's just the beginning! James's journey took an unexpected turn when he stepped into the world of sales, entrepreneurship, television, and ministry, all while keeping his amputation hidden. Discover how his decision to embrace vulnerability became a turning point, not only revolutionizing his own life but also creating a platform that brings hope and motivation to millions around the globe. In this captivating interview, James opens up about his transition from a hidden journey to a life lived authentically. As a proud father of three, model for Under Armour, and the anticipated 2023 NFL Combine Keynote Speaker, James's impact on various fields is undeniable. With a YouTube channel boasting 1.4 million subscribers and an Instagram following of 32K, his weekly motivational speeches on Absolute Motivation are a beacon of positivity that uplifts, empowers, and drives change. Whether you're seeking to conquer personal obstacles, unlock your potential, or simply be inspired, this interview is a can't-miss opportunity to hear from a true powerhouse in the world of motivation. Get ready to be uplifted, motivated, and inspired by the resilience and wisdom of James E. Dixon. Make sure to subscribe, hit the notification bell, and get ready to embark on a transformative journey that will leave you empowered to overcome any challenge life throws your way! Want More GLTV? Watch & Subscribe on YouTube! Listen & Subscribe on Spotify Listen & Subscribe on Apple Podcasts Follow us on Instagram! Follow us on Facebook! --- Support this podcast: https://podcasters.spotify.com/pod/show/goodlifeconversations/support
Today we are so excited to have special guest, Ilana Robinson aka Instagram/TikTok's IlanaFofana formerly known as DisneyMom2.0.! Ilana is a Florida resident, mom of 2, annual passholder, runner and overall Disney fanatic who brings her love of all things Disney to her Instagram page of over 32K members and her TikTok page of over 58K followers. MousekeMoms Podcast is sponsored by our friends at Kingdom and Cruise Travel. They can plan your perfect luxury family getaway and are experts in Disney Destinations. Best of all, their services are 100% FREE! Visit us on social media on Instagram at @mousekemoms_podcast or in our Facebook Group at @mousekemompodcast. MousekeMoms Podcast is featured on the Top 100 Disney Podcasts https://blog.feedspot.com/disney_podcasts/ For a transcript of today's show, visit https://mousekemomsblog.com/
Thanks to the almost 30k people who tuned in to the last episode!Your podcast cohosts have been busy shipping:* Alessio open sourced smol-podcaster, which makes the show notes here! * swyx launched GodMode. Maybe someday the Cursor of browsers?* We're also helping organize a Llama Finetuning Hackameetup this Saturday in anticipation of the CodeLlama release. Lastly, more speakers were announced at AI Engineer Summit!
Kathleen opens the show drinking a Laughing Guy Lager from Nashville's Tennessee Brew Works. She reviews her week home in Nashville, eating Nashville hot chicken at Party Fowl and spending a weekend morning at the Nashville Farmer's Market. QUEEN NEWS: Kathleen reports that Queen Taylor Swift gifted dozens of Eras tour truck drivers a $100K raise, and Swifties caused record-breaking seismic activity during her Seattle shows.“GOOD BAD FOOD”: In her quest for delicious not-so-nutritious food, Kathleen samples Snyder's Nashville Hot Chicken Pieces, and Kraft Mayo Buffalo Style Dressing. UPDATES: Kathleen gives updates on the arrest of the Kansas City Chiefs Chiefsaholic bank robber, Klimt's “Lady With A Fan” painting auctions for a record price, and Zuckerberg has lost $40B on his metaverse.“HOLY SHIT THEY FOUND IT”: Kathleen is amazed to read about the discovery of ruins believed to be Nero's theater near the Vatican.FRONT PAGE PUB NEWS: Kathleen shares articles about Taylor Swift fans applying for jobs at Eras show venues when they couldn't get tour tickets, Adidas is releasing a 2nd batch of unsold Yeezy sneakers after their breakup with Ye, Ellen Burstyn returns to the Exorcist franchise 50 years after the original film is released, rare Apple sneakers hit the auction block for $50K, Buc-ee's brisket is awarded a gold medal from Food & Wine magazine, Phoenix's record heat is killing the cacti, Cleopatra's tomb is the new luxe expedition for tourists in Egypt, “ghostlighting” is the sadistic new dating trend, the aviation industry is short 32K pilots/ mechanics/ air traffic controllers and airline scheduling will be impacted for 10 years, and Tupac Shakur's custom ring sells for a record $1M. WHAT TO WATCH THIS WEEK: Kathleen recommends watching “The Exorcist” on Hulu, and her new stand-up Special “Hunting Bigfoot” on Prime Video.See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
FlashAttention was first published by Tri Dao in May 2022 and it had a deep impact in the large language models space. Most open models you've heard of (RedPajama, MPT, LLaMA, Falcon, etc) all leverage it for faster inference. Tri came on the podcast to chat about FlashAttention, the newly released FlashAttention-2, the research process at Hazy Lab, and more. This is the first episode of our “Papers Explained” series, which will cover some of the foundational research in this space. Our Discord also hosts a weekly Paper Club, which you can signup for here. How does FlashAttention work?The paper is titled “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”. There are a couple keywords to call out:* “Memory Efficient”: standard attention memory usage is quadratic with sequence length (i.e. O(N^2)). FlashAttention is sub-quadratic at O(N). * “Exact”: the opposite of “exact” in this case is “sparse”, as in “sparse networks” (see our episode with Jonathan Frankle for more). This means that you're not giving up any precision.* The “IO” in “IO-Awareness” stands for “Input/Output” and hints at a write/read related bottleneck. Before we dive in, look at this simple GPU architecture diagram:The GPU has access to three memory stores at runtime:* SRAM: this is on-chip memory co-located with the actual execution core. It's limited in size (~20MB on an A100 card) but extremely fast (19TB/s total bandwidth)* HBM: this is off-chip but on-card memory, meaning it's in the GPU but not co-located with the core itself. An A100 has 40GB of HBM, but only a 1.5TB/s bandwidth. * DRAM: this is your traditional CPU RAM. You can have TBs of this, but you can only get ~12.8GB/s bandwidth, which is way too slow.Now that you know what HBM is, look at how the standard Attention algorithm is implemented:As you can see, all 3 steps include a “write X to HBM” step and a “read from HBM” step. The core idea behind FlashAttention boils down to this: instead of storing each intermediate result, why don't we use kernel fusion and run every operation in a single kernel in order to avoid memory read/write overhead? (We also talked about kernel fusion in our episode with George Hotz and how PyTorch / tinygrad take different approaches here)The result is much faster, but much harder to read:As you can see, FlashAttention is a very meaningful speed improvement on traditional Attention, and it's easy to understand why it's becoming the standard for most models.This should be enough of a primer before you dive into our episode! 
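To make the "never write the N x N matrix to HBM" idea concrete, here is a plain NumPy sketch of the tiling-plus-online-softmax math that FlashAttention builds on (single head, no masking or dropout). The real kernel fuses this into one CUDA kernel operating out of SRAM; this sketch only shows that the blockwise math reproduces standard attention exactly.

```python
import numpy as np

def attention_reference(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def attention_tiled(q, k, v, block=64):
    """Numerically identical to the reference, but never forms the full N x N matrix."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[-1]))
    row_max = np.full(n, -np.inf)     # running max of scores per query row
    row_sum = np.zeros(n)             # running softmax normalizer per query row
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                            # scores for this tile only
        new_max = np.maximum(row_max, s.max(axis=-1))
        correction = np.exp(row_max - new_max)            # rescale earlier partial results
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=-1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

q, k, v = (np.random.randn(256, 64) for _ in range(3))
assert np.allclose(attention_tiled(q, k, v), attention_reference(q, k, v))
```

Because each tile's partial output can be corrected later with the running max and normalizer, the kernel only ever needs one block of K and V in fast memory at a time, which is the whole IO-awareness trick.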
We talked about FlashAttention-2, how the Hazy Research group works, and some of the research being done in Transformer alternatives.

Show Notes:
* FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv)
* FlashAttention-2
* Together AI
* From Deep Learning to Long Learning
* The Hardware Lottery by Sara Hooker
* Hazy Research
* Is Attention All You Need?
* Nvidia CUTLASS 3
* SRAM scaling slows
* Transformer alternatives:
  * S4
  * Hyena
  * Recurrent Neural Networks (RNNs)

Timestamps:
* Tri's background [00:00:00]
* FlashAttention deep dive [00:02:18]
* How the Hazy Research group collaborates across theory, systems, and applications [00:17:21]
* Evaluating models beyond raw performance [00:25:00]
* FlashAttention-2 [00:27:00]
* CUDA and The Hardware Lottery [00:30:00]
* Researching in a fast-changing market [00:35:00]
* Promising transformer alternatives like state space models and RNNs [00:37:30]
* The spectrum of openness in AI models [00:43:00]
* Practical impact of models like LLAMA2 despite restrictions [00:47:12]
* Incentives for releasing open training datasets [00:49:43]
* Lightning Round [00:53:22]

Transcript:

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. Today we have no Swyx, because he's in Singapore, so it's a one-on-one discussion with Tri Dao. Welcome! [00:00:24]Tri: Hi everyone. I'm Tri Dao, excited to be here. [00:00:27]Alessio: Tri just completed his PhD at Stanford a month ago. You might not remember his name, but he's one of the main authors of the FlashAttention paper, which is one of the seminal works of the Transformers era. He's got a lot of interests, from efficient transformer training and inference to long-range sequence models, a lot of interesting stuff. And now you're going to be an assistant professor in CS at Princeton next year. [00:00:51]Tri: Yeah, that's right. [00:00:52]Alessio: Yeah. And in the meantime, just to get, you know, a low pressure thing, you're Chief Scientist at Together as well, which is the company behind RedPajama. [00:01:01]Tri: Yeah. So I just joined this week actually, and it's been really exciting. [00:01:04]Alessio: So what's something that is not on the internet that people should know about you? [00:01:09]Tri: Let's see. When I started college, I was going to be an economist, so I was fully on board. I was going to major in economics, but the first week I was at Stanford undergrad, I took a few math classes and I immediately decided that I was going to be a math major. And that kind of changed the course of my career. So now I'm doing math, computer science, AI research. [00:01:32]Alessio: I had a similar thing. I started with physics and then I took like a programming course and I was like, I got to do computer science. I don't want to do physics. So FlashAttention is definitely, everybody's using this. Everybody loves it. You just released FlashAttention 2 last week. [00:01:48]Tri: Yeah. Early this week on Monday. Yeah. [00:01:53]Alessio: You know, AI time. Things move fast. So maybe let's run through some of the FlashAttention highlights, some of the innovation there, and then we can dive into FlashAttention 2. So the core improvement in FlashAttention is that traditional attention is quadratic in sequence length, while FlashAttention is linear, which obviously helps with scaling some of these models. [00:02:18]Tri: There are two factors there. So of course the goal has been to make attention go faster or more memory efficient.
And ever since attention became popular in 2017 with the Transformer paper, lots and lots of folks have been working on this. And a lot of approaches have been focusing on approximating attention. The goal is you want to scale to longer sequences. There are tons of applications where you want to do that. But scaling to longer sequences is difficult because attention scales quadratically in sequence length on both runtime and memory, as you mentioned. So instead of trying to approximate attention, we were trying to figure out, can we do the same computation and maybe be more memory efficient? So in the end, we ended up making the memory linear in sequence length. In terms of computation, it's still quadratic, but we managed to make it much more hardware friendly. And as a result, we do get wall clock speed up on the order of 2 to 4x, which really helps because that just means that you'll be able to train with 2 to 4x longer sequence length for the same cost without doing any approximations. As a result, lots of folks have been using this. The thing is available in a lot of libraries that do language model training or fine tuning. [00:03:32]Alessio: And the approximation thing is important because this is an exact method versus a sparse one. So maybe explain a little bit the difference there. [00:03:40]Tri: For sure. So in attention, essentially you compute pairwise similarity between every single element in a sequence against each other. So there's been other approaches where instead of doing all that pairwise computation, you only compute similarity for some pairs of elements in the sequence. So you don't do a quadratic number of comparisons. And this can be seen as some form of sparsity. Essentially you're ignoring some of the elements. When you write down the matrix, you essentially say, OK, I'm going to pretend they're zero. So that has some benefits in terms of runtime and memory. But the trade-off is that it tends to do worse in terms of quality because you're essentially approximating or ignoring some elements. And I personally have worked on this as well for a few years. But when we talk to practitioners who actually train models, especially at large scale, they tend not to use these approximate attention methods. Because it turns out, and this was surprising to me at the time, that these approximation methods, even though they perform fewer computations, tend to not be faster in wall-clock time. So this was pretty surprising because back then, I think my background was more on the theoretical side. So I was thinking of, oh, how many flops or floating point operations are you performing? And hopefully that correlates well with wall-clock time. But I realized that I was missing a bunch of ideas from the system side where flops or floating point operations don't necessarily correlate with runtime. There are other factors like memory reading and writing, parallelism, and so on. So I learned a ton from just talking to systems people because they kind of figured this stuff out a while ago. So that was really eye-opening. And then we ended up focusing a lot more on memory reading and writing because that turned out to be the majority of the time when you're doing attention: reading and writing memory. [00:05:34]Alessio: Yeah, the I.O. awareness is probably one of the biggest innovations here. And the idea behind it is, like you mentioned, the FLOPS growth of the cards has been going up, but the memory bandwidth, not as much.
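A rough back-of-the-envelope calculation (ours, using the approximate A100 numbers from the primer above; exact byte counts depend on the implementation) makes the wall-clock point concrete: if the score and softmax matrices are materialized in HBM, moving those bytes takes several times longer than the matrix multiplies themselves, so the kernel is memory-bound rather than compute-bound.

```python
# Rough estimate for one attention head on an A100 (approximate peak numbers).
N, d = 4096, 64                  # sequence length, head dimension
flops = 4 * N**2 * d             # Q @ K^T and P @ V, ignoring the softmax

# If S = Q @ K^T and P = softmax(S) are materialized in HBM in fp16,
# each is written once and read once: 4 * N^2 values * 2 bytes.
hbm_bytes = 4 * N**2 * 2

peak_flops = 312e12              # A100 fp16 tensor-core peak, FLOP/s
hbm_bw = 1.5e12                  # A100 HBM bandwidth, bytes/s

print(f"compute-bound time : {flops / peak_flops * 1e6:6.1f} us")   # ~14 us
print(f"memory-bound time  : {hbm_bytes / hbm_bw * 1e6:6.1f} us")   # ~90 us
# The memory estimate is several times larger: cutting HBM traffic,
# not FLOPs, is what actually speeds attention up.
```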
So I think maybe that was one of the assumptions that the original attention paper had. So talk a bit about how that came to be as an idea. It's one of those things that, in hindsight, seems obvious: why are we rewriting to HBM every time? And once you change it, it's clear. But what was that discovery process? [00:06:08]Tri: Yeah, in hindsight, a lot of the ideas have already been there in the literature. And I would say it was somehow at the intersection of both machine learning and systems. And you kind of needed ideas from both sides. So on one hand, on the system side, lots of systems folks have known that, oh, you know, kernel fusion is great. Kernel fusion just means that instead of loading an element, performing an operation, writing it down, then loading it back up and performing the second operation, you just load it once, perform two operations and then write it down again. So that saves you the memory read and write in the middle there. So kernel fusion has been a classic. There have been other techniques from the system side, like tiling, where you perform the computations in blocks, again, so that you can load them into a really fast memory. Think of it as a cache. And this is, again, a classical computer science idea, right? You want to use the cache. So the system folks have been thinking about these ideas for a long time, and they apply to attention as well. But there were certain things in attention that made it difficult to do a complete kernel fusion. One of which is there is this softmax operation in the middle, which requires you to essentially sum across the row of the attention matrix. So it makes it difficult to break things into blocks, because there's this dependency. So on the system side, people have been thinking about these ideas, but it's been difficult to do kernel fusion for the entire operation. On the machine learning side, people have been thinking more algorithmically. They say, okay, either we can approximate attention, or there's this trick called the online softmax trick, which says that because of the way softmax is written mathematically, you can actually break it up into smaller pieces, do some rescaling, and still get the right answer. So this online softmax trick has been around for a while. I think there was a paper from NVIDIA folks back in 2018 about this. And then there was a paper from Google. So Markus Rabe and Charles Staats wrote a paper in late 2021 on using this online softmax trick to break attention up into smaller pieces. So a lot of the ideas were already there. But it turns out, you kind of need to combine ideas from both sides. So you need to understand that, hey, we want to do kernel fusion to reduce memory reads and writes. But we also need this online softmax trick to be able to break the softmax into smaller pieces so that a lot of the systems tricks carry through. We saw that, and it was kind of a natural idea that we ended up using ideas from both sides, and it ended up working pretty well. Yeah. [00:08:57]Alessio: Are there any downsides to kernel fusion? If I think about databases and the reasons why we have atomic operations, you know, it's like, you have observability and fallback in between them. How does that work with attention? Is there anything that we lose by fusing the operations?
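As an aside, the online softmax trick Tri describes is easy to check in isolation. Here is a small sketch (ours): a softmax computed chunk by chunk, carrying only a running max and a rescaled running sum, comes out exactly equal to the one-shot softmax.

```python
import numpy as np

def softmax_streaming(x, chunk=4):
    """Softmax over a long row, computed one chunk at a time. Only a running
    max m and a rescaled running sum s are kept; the final answer is exact."""
    m, s = -np.inf, 0.0
    for i in range(0, len(x), chunk):
        c = x[i:i + chunk]
        m_new = max(m, c.max())
        s = s * np.exp(m - m_new) + np.exp(c - m_new).sum()  # rescale old sum
        m = m_new
    return np.exp(x - m) / s

x = np.random.randn(16)
full = np.exp(x - x.max()); full /= full.sum()
assert np.allclose(softmax_streaming(x), full)
```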
[00:09:13]Tri: Yeah, I think mostly on the practical side is that you lose a little bit of flexibility in the sense that, hey, now you have, for example, faster attention, it's just a subroutine that you would call to do attention. But as a researcher, let's say you don't want that exact thing, right? You don't want just attention, let's say you want some modification to attention. You want to do, hey, I'm going to multiply the query and key, but then I'm going to do this extra thing before I carry on. So kernel fusion just means that, okay, we have a subroutine that does the entire thing. But if you want to experiment with things, you won't be able to use that fused kernel. And the answer is, can we have a compiler that then automatically does a lot of this kernel fusion? Lots of compiler folks are thinking about this, either with a new language or you can embed it in PyTorch. PyTorch folks have been working on this as well. So if you write just your code in PyTorch and they can capture the graph, can they generate code that will fuse everything together? That's still ongoing, and it works for some cases. But for attention, because of this kind of softmax rewriting stuff, it's been a little bit more difficult. So maybe in a year or two, we'll have compilers that are able to do a lot of these optimizations for you. And you don't have to, for example, spend a couple months writing CUDA to get this stuff to work. Awesome. [00:10:41]Alessio: And just to make it clear for listeners, when we say we're not writing it to memory, we are storing it, but just in a faster memory. So instead of the HBM, we're putting it in the SRAM. Yeah. [00:10:53]Tri: Yeah. [00:10:54]Alessio: Maybe explain just a little bit the difference there. [00:10:56]Tri: Yeah, for sure. This is kind of a caricature of how you think about accelerators or GPUs in particular, is that they have a large pool of memory, usually called HBM, or high bandwidth memory. So this is what you think of as GPU memory. So if you're using A100 and you list the GPU memory, it's like 40 gigs or 80 gigs. So that's the HBM. And then when you perform any operation, you need to move data from the HBM to the compute unit. So the actual hardware unit that does the computation. And next to these compute units, there are on-chip memory or SRAM, which are much, much smaller than HBM, but much faster. So the analogy there is if you're familiar with, say, CPU and RAM and so on. So you have a large pool of RAM, and then you have the CPU performing the computation. But next to the CPU, you have L1 cache and L2 cache, which are much smaller than DRAM, but much faster. So you can think of SRAM as the small, fast cache that stays close to the compute unit. Physically, it's closer. There is some kind of asymmetry here. So HBM is much larger, and SRAM is much smaller, but much faster. One way of thinking about it is, how can we design algorithms that take advantage of this asymmetric memory hierarchy? And of course, lots of folks have been thinking about this. These ideas are pretty old. I think back in the 1980s, the primary concerns were sorting. How can we sort numbers as efficiently as possible? And the motivating example was banks were trying to sort their transactions, and that needs to happen overnight so that the next day they can be ready. And so the same idea applies, which is that they have slow memory, which was hard disk, and they have fast memory, which was DRAM. And people had to design sorting algorithms that take advantage of this asymmetry. 
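As a side note, the 1980s sorting example maps directly onto code. This toy external merge sort (a generic illustration, not anything from the FlashAttention paper) sorts chunks that fit in fast memory, spills each sorted run to slow storage, and then streams a merge over the runs, the same asymmetric-memory pattern applied to SRAM versus HBM.

```python
import heapq, os, tempfile

def external_sort(values, chunk_size=1024):
    """Toy external merge sort: sort chunk_size pieces in 'fast' memory,
    spill each sorted run to 'slow' storage (temp files), then stream-merge."""
    run_files = []
    for i in range(0, len(values), chunk_size):
        run = sorted(values[i:i + chunk_size])          # fits in fast memory
        f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
        f.write("\n".join(map(str, run)))
        f.close()
        run_files.append(f.name)

    def stream(path):                                   # lazily read one run
        with open(path) as fh:
            for line in fh:
                yield int(line)

    merged = list(heapq.merge(*(stream(p) for p in run_files)))
    for p in run_files:
        os.remove(p)
    return merged

assert external_sort(list(range(10_000, 0, -1)), chunk_size=500) == list(range(1, 10_001))
```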
And it turns out, these same ideas can apply today, which is different kinds of memory. [00:13:00]Alessio: In your paper, you have the pyramid of memory. Just to give people an idea, when he says smaller, it's like HBM is like 40 gig, and then SRAM is like 20 megabytes. So it's not a little smaller, it's much smaller. But the throughput on card is like 1.5 terabytes a second for HBM and like 19 terabytes a second for SRAM, which is a lot larger. How do you think that evolves? So TSMC said they hit the scaling limits for SRAM, they just cannot grow that much more. HBM keeps growing, HBM3 is going to be 2x faster than HBM2, I think the latest NVIDIA thing has HBM3. How do you think about the future of FlashAttention? Do you think HBM is going to get fast enough when maybe it's not as useful to use the SRAM? [00:13:49]Tri: That's right. I think it comes down to physics. When you design hardware, literally SRAM stays very close to compute units. And so you don't have that much area to essentially put the transistors. And you can't shrink these things too much. So just physics, in terms of area, you don't have that much area for the SRAM. HBM is off-chip, so there is some kind of bus that essentially transfers data from HBM to the compute unit. So you have more area to essentially put these memory units. And so yeah, I think in the future SRAM probably won't get that much larger, because you don't have that much area. HBM will get larger and faster. And so I think it becomes more important to design algorithms that take advantage of this memory asymmetry. It's the same thing in CPU, where the cache is really small, the DRAM is growing larger and larger. DRAM could get to, I don't know, two terabytes, six terabytes, or something, whereas the cache stays at, I don't know, 15 megabytes or something like that. I think maybe the algorithm design becomes more and more important. There's still ways to take advantage of this, I think. So in the future, I think flash attention right now is being used. I don't know if in the next couple of years, some new architecture will come in and whatnot, but attention seems to be still important. For the next couple of years, I still expect some of these ideas to be useful. Not necessarily the exact code that's out there, but I think these ideas have kind of stood the test of time. New ideas like IO awareness from back in the 1980s, ideas like kernel fusions, tiling. These are classical ideas that have stood the test of time. So I think in the future, these ideas will become more and more important as we scale models to be larger, as we have more kinds of devices, where performance and efficiency become much, much more important. [00:15:40]Alessio: Yeah, and we had Jonathan Frankle on the podcast, and if you go to issattentionallyouneed.com, he has an outstanding bet, and he does believe that attention will be the state of the art architecture still in a few years. Did you think flash attention would be this popular? I'm always curious on the research side, you publish a paper, and obviously you know it's great work, but sometimes it just kind of falls flat in the industry. Could you see everybody just starting to use this, or was that a surprise to you? [00:16:11]Tri: Certainly, I didn't anticipate the level of popularity. Of course, we were extremely happy to have people using this stuff and giving us feedback and so on, and help us improve things. 
I think when we were writing the paper, I remember sending an email to one of my advisors, like, hey, I'm excited about this paper, but I think the most important thing will be the artifact, which is the code. So I knew that the code would be valuable. So we focused a lot on the code and made sure that the code was usable and as fast as it could be. Of course, the paper presents the ideas, explains them and has experiments that validate them, but I knew that the artifact, the code, was also pretty important. And that turned out to be the right focus, which is, you know, we put out the paper, released the code and continued working on the code. So it's a team effort with my co-authors as well. [00:17:07]Alessio: We mentioned Hazy Research a bunch of times on the podcast before. I would love for you to spend five minutes just talking about how does the group work? How do people get together? How do you bounce ideas off of each other? Yeah. [00:17:21]Tri: So Hazy Research is a research group at Stanford led by one of my advisors, Chris Re. I love the people there. It was one of the best experiences I had. They've made my PhD so much more enjoyable. And I think there are a couple of ways that the group has been working pretty well. So one is, I think there's a diverse pool of people: some of them focus on algorithms and theory, some of them focus on building systems, some of them focus on applications. And as a result, there is this flow of ideas. So as an example, some of us were working on more algorithms and theory, and then we can talk to the folks building systems and say, hey, let's try it out and let's put it in the systems and see how it is. And there you will get feedback from systems folks. They will say, hey, we implemented this, or we tried this and this is where it doesn't work, something like that. And once we put it in the systems, the application folks can use the algorithm or new methods or new models. And we again get great feedback from them because the application folks, for example, some of my good friends, they focus on medical imaging or seizure detection. And that is the problem they care about. And if your method doesn't work on the task they care about, they will tell you. Whereas I think a lot of people in machine learning, they're a little bit more flexible. So they will be like, hey, it doesn't work on seizure detection. Let's try some other task, right? But having that direct feedback of like, hey, it doesn't work there, let's figure out why, I think that feedback allows us to do better work. And I think that kind of process of exchanging ideas, validating it in a real system so that applications folks can try it out and give you feedback, that cycle has been very, very useful. And so that's one, having a diverse group of people. The other one is, and this is advice from Chris that I really appreciate, try to understand the fundamentals, right? And he's happy letting me go off and read some textbooks and play with things, because I think a lot of research ideas come from understanding the old literature and seeing how it fits with the new landscape. And so if you just read new arXiv papers every day, that's great, but you also need to read textbooks. And that's one piece of advice I got from Chris: understand the fundamentals. And I think that allows us to do more impactful work. [00:19:46]Alessio: How do you think about academia versus industry?
I feel like AI / Machine Learning has been an area where up until three, four years ago, most of the cutting edge work was being done in academia. And now there's all these big industry research labs. You're obviously going to Princeton, so you're an academia believer. How should people think about where to go? Say I'm doing my master's, I have to decide between doing a PhD and going into OpenAI or Anthropic. How should I decide? [00:20:15]Tri: I think they kind of play a complementary role, in my opinion. Of course, I also was considering different paths as well. So I think right now, scaling matters a lot, especially when you talk about language models and AI and so on. Scaling matters a lot. And that means that you need compute resources and you need infrastructure and you need engineers' time. And so industry tends to have an advantage when it comes to scaling things. But a lot of the ideas actually came from academia. So let's take attention, which got popular with the Transformer in 2017. Attention actually has been around for a while. So I think the first mention was in 2014, a paper from Bahdanau and others, including Yoshua Bengio, which came from academia. A lot of ideas did come from academia. And scaling things up, of course, I think OpenAI has been great at scaling things up. That was the bet that they made after, I think, GPT-2. They saw that scaling these things up, which back then meant 1.5 billion parameters, seemed to give you amazing capabilities. So they really committed to that. They really committed to scaling things. And that turned out to be a pretty successful bet. I think for academia, we're still trying to figure out exactly what we're doing in this shifting landscape. And so lots of folks have been focusing on, for example, evaluation. So I know the Stanford Center for Research on Foundation Models, led by Percy, has this benchmark called HELM, which is this holistic benchmark. So trying to figure out, okay, characterizing the landscape of different kinds of models, what people should evaluate, what people should measure, and things like that. So evaluation is one role. The other one is understanding. So this has happened historically where there's been some development in the industry and academia can play a role in explaining, understanding. They have the luxury to slow down and try to understand stuff, right? So there are lots of papers on understanding what's really going on, probing these models, and so on. I think I'm not as familiar with the NLP literature, but my impression is there's a lot of that going on in the NLP conferences, which is understanding what these models are doing, what capabilities they have, and so on. And the third one I could see is that academia can take riskier bets in the sense that we can work on stuff that is quite different from industry. In industry, my impression is you have some objective. You're trying to say, hey, for this quarter, we want to scale the model in this particular way. Next quarter, we want the model to have these capabilities. You're trying to pick objectives where maybe, I don't know, 70% will work out, because it's important for the company's direction. I think for academia, the way things work is you have many, many researchers or PhD students, and they're kind of pursuing independent directions. And they have a little bit more flexibility on, hey, I'm going to try out this seemingly crazy idea and see, let's say there's a 30% chance of success or something.
And however you define success, for academia, a lot of the time, success just means like, hey, we found something interesting. That could eventually go into industry through collaboration and so on. So I do see academia and industry kind of playing complementary roles. And as for someone choosing a career, I think just more and more generally, industry would be probably better in terms of compensation, in terms of probably work-life balance. But my biased perspective is that maybe academia gives you a little bit more freedom to think and understand things. So it probably comes down to personal choice. I end up choosing to be a professor next year at Princeton. But of course, I want to maintain a relationship with industry folks. I think industry folks can provide very valuable feedback to what we're doing in academia so that we understand where the field is moving because some of the directions are very much influenced by what, for example, OpenAI or Google is doing. So we want to understand where the field is moving. What are some promising applications? And try to anticipate, okay, if the field is moving like this, these applications are going to be popular. What problems will be important in two, three years? And then we try to start thinking about those problems so that hopefully in two, three years, we have some of the answers to some of these problems in two, three years. Sometimes it works out, sometimes it doesn't. But as long as we do interesting things in academia, that's the goal. [00:25:03]Alessio: And you mentioned the eval side. So we did a Benchmarks 101 episode. And one of the things we were seeing is sometimes the benchmarks really influence the model development. Because obviously, if you don't score well on the benchmarks, you're not going to get published and you're not going to get funded. How do you think about that? How do you think that's going to change now that a lot of the applications of these models, again, is in more narrow industry use cases? Do you think the goal of the academia eval system is to be very broad and then industry can do their own evals? Or what's the relationship there? [00:25:40]Tri: Yeah, so I think evaluation is important and often a little bit underrated. So it's not as flashy as, oh, we have a new model that can do such and such. But I think evaluation, what you don't measure, you can't make progress on, essentially. So I think industry folks, of course, they have specific use cases that their models need to do well on. And that's what they care about. Not just academia, but other groups as well. People do understand what are some of the emerging use cases. So for example, now one of the most popular use cases is Chatbot. And then I think folks from Berkeley, some of them are from Berkeley, call them MLCs. They set up this kind of Chatbot arena to essentially benchmark different models. So people do understand what are some of the emerging use cases. People do contribute to evaluation and measurement. And as a whole, I think people try to contribute to the field and move the field forward, albeit that maybe slightly different directions. But we're making progress and definitely evaluation and measurement is one of the ways you make progress. So I think going forward, there's still going to be just more models, more evaluation. We'll just have better understanding of what these models are doing and what capabilities they have. 
[00:26:56]Alessio: I like that your work has been focused on not making benchmarks better, but it's like, let's just make everything faster. So it's very horizontal. So FlashAttention 2, you just released that on Monday. I read in the blog post that a lot of the work was also related to some of the NVIDIA library updates. Yeah, maybe run us through some of those changes and some of the innovations there. Yeah, for sure. [00:27:19]Tri: So FlashAttention 2 is something I've been working on for the past couple of months. So the story is the NVIDIA CUTLASS team, they released a new version of their library, which contains all these primitives to allow you to do matrix multiply or memory loading on GPU efficiently. So it's a great library and I built on that. So they released their version 3 back in January and I got really excited and I wanted to play with that library. So as an excuse, I was just like, okay, I'm going to refactor my code and use this library. So that was kind of the start of the project. By the end, I just ended up working with the code a whole lot more and I realized that, hey, there are these inefficiencies still in Flash Attention. We could change this way or that way and make it, in the end, twice as fast. But of course, building on the library that the NVIDIA folks released. So that was kind of a really fun exercise. I was starting out, it's just an excuse for myself to play with the new library. What ended up was several months of improvement, improving Flash Attention, discovering new ideas. And in the end, we managed to make it 2x faster and now it's pretty close to probably the efficiency of things like matrix multiply, which is probably the most optimized subroutine on the planet. So we're really happy about it. The NVIDIA Cutlass team has been very supportive and hopefully in the future, we're going to collaborate more. [00:28:46]Alessio: And since it's an NVIDIA library, can you only run this on CUDA runtimes? Or could you use this and then run it on an AMD GPU? [00:28:56]Tri: Yeah, so it's an NVIDIA library. So right now, the code we release runs on NVIDIA GPUs, which is what most people are using to train models. Of course, there are emerging other hardware as well. So the AMD folks did implement a version of Flash Attention, I think last year as well, and that's also available. I think there's some implementation on CPU as well. For example, there's this library, ggml, where they implemented the same idea running on Mac and CPU. So I think that kind of broadly, the idea would apply. The current implementation ended up using NVIDIA's library or primitives, but I expect these ideas to be broadly applicable to different hardware. I think the main idea is you have asymmetry in memory hierarchy, which tends to be everywhere in a lot of accelerators. [00:29:46]Alessio: Yeah, it kind of reminds me of Sara Hooker's post, like the hardware lottery. There could be all these things that are much better, like architectures that are better, but they're not better on NVIDIA. So we're never going to know if they're actually improved. How does that play into some of the research that you all do too? [00:30:04]Tri: Yeah, so absolutely. Yeah, I think Sara Hooker, she wrote this piece on hardware lottery, and I think she captured really well of what a lot of people have been thinking about this. 
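Coming back to actually using these kernels: for readers who want the speedups without writing CUDA, one low-effort path (our suggestion, not something prescribed in the episode) is PyTorch 2.x's built-in scaled_dot_product_attention, which can dispatch to a FlashAttention-style fused kernel on supported NVIDIA GPUs and otherwise falls back to other implementations; the standalone flash-attn package is another option, with APIs that vary by version.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) in fp16 on GPU -- the layout and dtype most
# likely to hit the fused attention path. Backend selection is automatic and may
# silently fall back to an unfused implementation on other hardware or dtypes.
q = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 2048, 64])
```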
And I certainly think about hardware lottery quite a bit, given that I do some of the work that's kind of really low level at the level of, hey, we're optimizing for GPUs or NVIDIA GPUs and optimizing for attention itself. And at the same time, I also work on algorithms and methods and transformer alternatives. And we do see this effect in play, not just hardware lottery, but also kind of software framework lottery. You know, attention has been popular for six years now. And so many kind of engineer hours has been spent on making it as easy and efficient as possible to run transformer, right? And there's libraries to do all kinds of tensor parallel, pipeline parallel, if you use transformer. Let's say someone else developed alternatives, or let's just take recurrent neural nets, like LSTM, GRU. If we want to do that and run that efficiently on current hardware with current software framework, that's quite a bit harder. So in some sense, there is this feedback loop where somehow the model architectures that take advantage of hardware become popular. And the hardware will also kind of evolve to optimize a little bit for that kind of architecture and software framework will also evolve to optimize for that particular architecture. Right now, transformer is the dominant architecture. So yeah, I'm not sure if there is a good way out of this. Of course, there's a lot of development. Things like, I think compilers will play a role because compilers allow you to maybe still be much more efficient across different kinds of hardware because essentially you write the same code and compiler will be able to make it run efficiently different kinds of hardware. So for example, there's this language Mojo, they're compiler experts, right? And their bet is AI models will be running on different kinds of devices. So let's make sure that we have really good compilers with a good language that then the compiler can do a good job optimizing for all kinds of devices. So that's maybe one way that you can get out of this cycle. But yeah, I'm not sure of a good way. In my own research, I have to think about both the algorithm new model and how it maps to hardware. So there are crazy ideas that seem really good, but will be really, really difficult to run efficiently. And so as a result, for example, we can't really scale some of the architectures up simply because they're not hardware friendly. I have to think about both sides when I'm working on new models. [00:32:50]Alessio: Yeah. Have you spent any time looking at some of the new kind of like AI chips companies, so to speak, like the Cerebras of the world? Like one of their innovations is co-locating everything on the chip. So you remove some of this memory bandwidth issue. How do you think about that? [00:33:07]Tri: Yeah, I think that's an interesting bet. I think Tesla also has this Dojo supercomputer where they try to have essentially as fast on-chip memory as possible and removing some of these data transfer back and forth. I think that's a promising direction. The issues I could see, you know, I'm definitely not a hardware expert. One issue is the on-chip memory tends to be really expensive to manufacture, much more expensive per gigabyte compared to off-chip memory. So I talked to, you know, some of my friends at Cerebros and, you know, they have their own stack and compiler and so on, and they can make it work. The other kind of obstacle is, again, with compiler and software framework and so on. 
For example, if you can run PyTorch on this stuff, lots of people will be using it. But supporting all the operations in PyTorch will take a long time to implement. Of course, people are working on this. So I think, yeah, we kind of need these different bets on the hardware side as well. Hardware has, my understanding is, has a kind of a longer time scale. So you need to design hardware, you need to manufacture it, you know, maybe on the order of three to five years or something like that. So people are taking different bets, but the AI landscape is changing so fast that it's hard to predict, okay, what kind of models will be dominant in, let's say, three or five years. Or thinking back five years ago, would we have known that Transformer would have been the dominant architecture? Maybe, maybe not, right? And so different people will make different bets on the hardware side. [00:34:39]Alessio: Does the pace of the industry and the research also influence the PhD research itself? For example, in your case, you're working on improving attention. It probably took you quite a while to write the paper and everything, but in the meantime, you could have had a new model architecture come out and then it's like nobody cares about attention anymore. How do people balance that? [00:35:02]Tri: Yeah, so I think it's tough. It's definitely tough for PhD students, for researchers. Given that the field is moving really, really fast, I think it comes down to understanding fundamental. Because that's essentially, for example, what the PhD allows you to do. It's been a couple of years understanding the fundamentals. So for example, when I started my PhD, I was working on understanding matrix vector multiply, which has been a concept that's been around for hundreds of years. We were trying to characterize what kind of matrices would have theoretically fast multiplication algorithm. That seems to have nothing to do with AI or anything. But I think that was a time when I developed mathematical maturity and research taste and research skill. The research topic at that point didn't have to be super trendy or anything, as long as I'm developing skills as a researcher, I'm making progress. And eventually, I've gotten quite a bit better in terms of research skills. And that allows, for example, PhD students later in their career to quickly develop solutions to whatever problems they're facing. So I think that's just the natural arc of how you're being trained as a researcher. For a lot of PhD students, I think given the pace is so fast, maybe it's harder to justify spending a lot of time on the fundamental. And it's tough. What is this kind of explore, exploit kind of dilemma? And I don't think there's a universal answer. So I personally spend some time doing this kind of exploration, reading random textbooks or lecture notes. And I spend some time keeping up with the latest architecture or methods and so on. I don't know if there's a right balance. It varies from person to person. But if you only spend 100% on one, either you only do exploration or only do exploitation, I think it probably won't work in the long term. It's probably going to have to be a mix and you have to just experiment and kind of be introspective and say, hey, I tried this kind of mixture of, I don't know, one exploration paper and one exploitation paper. How did that work out for me? Should I, you know, having conversation with, for example, my advisor about like, hey, did that work out? You know, should I shift? 
I focus more on one or the other. I think quickly adjusting and focusing on the process, I think that's probably the right way. I don't have a specific recommendation that, hey, you focus, I don't know, 60% on lecture notes and 40% on arXiv papers or anything like that. [00:37:35]Alessio: Let's talk about some Transformer alternatives. You know, say Jonathan Frankle loses his bet and Transformer is not the state of the art architecture. What are some of the candidates to take over? [00:37:49]Tri: Yeah, so this bet is quite fun. So my understanding is this bet between Jonathan Frankle and Sasha Rush, right? I've talked to Sasha a bunch and I think he recently gave an excellent tutorial on Transformer alternatives as well. So I would recommend that. So just to quickly recap, I think there's been quite a bit of development more recently about Transformer alternatives. So architectures that are not Transformer, right? And the question is, can they do well on, for example, language modeling, which is kind of the application that a lot of people care about these days. So there are methods based on state space methods that came out in 2021 from Albert Gu, Karan Goel, and Chris Re that presumably could do much better in terms of capturing long range information while not scaling quadratically. They scale sub-quadratically in terms of sequence length. So potentially you could have a much more efficient architecture when sequence length gets really long. The other ones have been focusing more on recurrent neural nets, which is, again, an old idea, but adapting to the new landscape. So things like RWKV, I've also personally worked in this space as well. So there's been some promising results. So there's been some results here and there that show that, hey, these alternatives, either RNN or state space methods, can match the performance of Transformer on language modeling. So that's really exciting. And we're starting to understand on the academic research side, we want to understand, do we really need attention? I think that's a valuable kind of intellectual thing to understand. And maybe we do, maybe we don't. If we want to know, we need to spend serious effort on trying the alternatives. And there's been folks pushing on this direction. I think RWKV has scaled up to, they have a model at 14 billion parameters that seems pretty competitive with Transformer. So that's really exciting. That's kind of an intellectual thing. We want to figure out if attention is necessary. So that's one motivation. The other motivation is Transformer alternatives could have an advantage in practice in some of the use cases. So one use case is really long sequences. The other is really high throughput of generation. So for really long sequences, when you train with Transformer, with flash attention and so on, the computation is still quadratic in the sequence length. So if your sequence length is on the order of, I don't know, 16K, 32K, 100K or something, which some of these models have sequence length 100K, then you do get significantly slower in terms of training, also in terms of inference. So maybe these alternative architectures could scale better in terms of sequence length. I haven't seen actual validation on this. Let's say an RNN model released with context length, I don't know, 100K or something. I haven't really seen that. But the hope could be that as we scale to long sequences, these alternative architectures could be more well-suited.
Not just text, but things like high resolution images, audio, video, and so on, which are emerging applications. So that's one, long sequences. Number two is high throughput generation, where I can imagine scenarios where the application isn't an interactive chatbot, but let's say a company wants to batch as many requests as possible on their server, or they're doing offline processing, they're generating stuff based on their internal documents, that you need to process in batch. And the issue with Transformer is that during generation, it essentially needs to keep around all the previous history. It's called the KV cache. And that could take a significant amount of memory, so you can't really batch too much because you run out of memory. I am personally bullish on RNNs. I think RNNs, they essentially summarize the past into a state vector that has fixed size, so the size doesn't grow with the history. So that means that you don't need as much memory to keep around all the previous tokens. And as a result, I think you can scale to much higher batch sizes. And as a result, you can make much more efficient use of the GPUs or the accelerator, and you could have much higher generation throughput. Now, this, I don't think, has been validated at scale. So as a researcher, I'm bullish on this stuff because I think in the next couple of years, these are use cases where these alternatives could have an advantage. We'll just kind of have to wait and see to see if these things will happen. I am personally bullish on this stuff. At the same time, I also spend a bunch of time making attention as fast as possible. So maybe hedging and playing both sides. Ultimately, we want to understand, as researchers, we want to understand what works, why do the models have these capabilities? And one way is, let's push attention to be as efficient as possible. On the other hand, let's push other alternatives to be as efficient at scale, as big as possible, so that we can compare them and understand. Yeah, awesome. [00:43:01]Alessio: And I think as long as all of this work happens in the open, it's a net positive for everybody to explore all the paths. Yeah, let's talk about open-source AI. Obviously, Together, when Red Pajama came out, which was an open clone of the LLAMA1 pre-training dataset, it was a big thing in the industry. LLAMA2 came out on Tuesday, I forget. And this week, there's been a lot of things going on, which they call open-source, but it's not really open-source. Actually, we wrote a post about it that was on the front page of Hacker News before this podcast, so I was frantically responding. How do you think about what open-source AI really is? In my mind, in open-source software, we have different levels of open. So there's free software, that's like the GPL license. There's open-source, which is Apache, MIT. And then there's kind of restricted open-source, which is the SSPL and some of these other licenses. In AI, you have the open models. So Red Pajama is an open model because you have the pre-training dataset, you have the training runs and everything. And then there's obviously randomness that doesn't make it one-to-one if you retrain it. Then you have the open-weights model that's kind of like StableLM, where the weights are open, but the dataset is not open. And then you have LLAMA2, where the dataset is not open and the weights come with restrictions. It's kind of like not really open-source, but open enough.
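To put a number on the KV-cache point from the high-throughput generation discussion above, here is a quick calculation with illustrative 7B-class dimensions (our assumptions: 32 layers, 32 heads, head dimension 128, fp16). At a modest batch size the cache alone outgrows a single accelerator, whereas a recurrent model's state stays fixed no matter how long the sequence gets.

```python
# Rough KV-cache size for a 7B-class transformer (illustrative dimensions).
layers, heads, head_dim = 32, 32, 128
bytes_per_value = 2                      # fp16
batch, seq_len = 32, 4096

# One K and one V vector per token, per head, per layer.
kv_bytes = batch * seq_len * layers * heads * head_dim * 2 * bytes_per_value
print(f"KV cache: {kv_bytes / 2**30:.0f} GiB")   # 64 GiB, before any weights

# An RNN or state-space model instead carries a fixed-size state per sequence,
# roughly batch * layers * state_dim values, independent of seq_len.
```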
I think it's net positive because it's like $3 million of flops donated to the public. [00:44:32]Tri: How do you think about that? [00:44:34]Alessio: And also, as you work at Together, what is your philosophy with open-source AI? Right, right. [00:44:40]Tri: Yeah, I think that's a great question. And I think about it in maybe more practical terms. So of course, Meta has done an amazing job training LLAMA1, LLAMA2. And for LLAMA2, they made it much less restrictive compared to LLAMA1. Now you can use it for businesses, unless you have a very large number of monthly active users or something like that. I think just this change will have a very significant impact on the landscape of open-source AI, where now lots of businesses, lots of companies will be using, I expect will be using, things like LLAMA2. They will fine-tune on their own dataset. They will be serving variants or derivatives of LLAMA2. Whereas before, with LLAMA1, it was also a really good model, but businesses weren't allowed to do that. So I think on a more practical term, it's kind of shifting the balance between closed-source models like OpenAI and Anthropic and Google, where you're making API calls, right? And maybe you don't understand as much of what the model is doing, how the model is changing, and so on. Versus now, we have a model with open weights that is pretty competitive from what I've seen in terms of benchmarks, pretty competitive with GPT 3.5, right? And if you fine-tune it on your own data, maybe it's more well-suited for your own data. And I do see that's going to shift the balance of it. More and more folks are going to be using, let's say, derivatives of LLAMA2. More and more folks are going to fine-tune and serve their own model instead of calling an API. So that shifting of balance is important because in one way, we don't want just a concentration of decision-making power in the hands of a few companies. So I think that's a really positive development from Meta. Of course, training the model takes a couple of million dollars, and I'm sure the engineers spent tons of time trying many, many different things. So the actual cost is probably way more than that. And they make the weights available, and probably a lot of companies are going to be using this. So I think that's a really positive development. And we've also seen amazing progress in the open source community where they would take these models and either fine-tune on different kinds of data sets or even make changes to the model. So as an example, I think for LLAMA1, the context length was limited to 2K. A bunch of folks figured out some really simple methods to scale up to like 8K. [00:47:12]Alessio: Like the RoPE. [00:47:13]Tri: Yes. I think the open source community is very creative, right? And there are lots of people. LLAMA2 will, again, kind of accelerate this where more people will try it out. More people will make tweaks to it and make contributions and so on. So overall, I think I see that as still a very positive development for the field. And there have been lots of libraries that will allow you to host or fine-tune these models, even with quantization and so on. Just a couple of hours after LLAMA2 was released, tons of companies announced that, hey, it's on our API or hosting and so on, and Together did the same. So it's a very fast-paced development, and just having a model with available weights that businesses are allowed to use, I think that alone is already a very positive development.
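As a sketch of the kind of "really simple method" the community used for the context extension mentioned above (this is the position-interpolation idea as we understand it, not code from the episode; real recipes also fine-tune the model and often adjust the RoPE base), the rotary-embedding angles are computed with positions scaled down so that an 8K window maps back into the 2K range the model was trained on.

```python
import torch

def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
    """Rotary-embedding angles; scale < 1 interpolates positions so long
    contexts map back into the range seen during training (illustrative)."""
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    return torch.outer(positions.float() * scale, inv_freq)   # (seq, head_dim/2)

train_ctx, target_ctx = 2048, 8192
angles = rope_angles(torch.arange(target_ctx), head_dim=128,
                     scale=train_ctx / target_ctx)             # 0.25x positions
```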
At the same time, yeah, we can do much better in terms of releasing data sets. Data sets tend to be... Somehow people are not incentivized to release data sets. So philosophically, yeah, you want to be as open as possible. But on a practical term, I think it's a little bit harder for companies to release data sets. Legal issues. The data sets released tend to be not as eye-catching as the model releases. So maybe people are less incentivized to do that. We've seen quite a few companies releasing data sets. Together released the RedPajama data set. I think Cerebras then worked on that, deduplicated and cleaned it up, and released SlimPajama and so on. So we're also seeing positive development on that front, kind of on the pre-training data set side. So I do expect that to continue. And then on the fine-tuning data set or instruction tuning data set, I think we now have quite a few open data sets on instruction tuning and fine-tuning. But these companies do pay for human labelers to annotate these instruction tuning data sets. And that is expensive. And maybe they will see that as their competitive advantage. And so it's harder to incentivize these companies to release these data sets. So I think on a practical term, we're still going to make a lot of progress on open source AI, on both the model development, on both model hosting, on pre-training data sets and fine-tuning data sets. Right now, maybe we don't have the perfect open source model where all the data sets are available. Maybe we don't have such a thing yet, but we've seen very fast development on the open source side. I think just maybe this time last year, there weren't as many models that were competitive with, let's say, ChatGPT. [00:49:43]Alessio: Yeah, I think the open data sets have so much more impact than open models. If you think about Eleuther and the work that they've done, GPT-J was great, and the Pythia models are great, but the Pile and the Stack, everybody uses them. So hopefully we get more people to contribute time to work on data sets instead of doing the 100th open model that performs worse than all the other ones, but they want to say they released the model. [00:50:14]Tri: Yeah, maybe the question is, how do we figure out an incentive structure so that companies are willing to release open data sets? And for example, it could be like, I think some of the organizations are now doing this where they are asking volunteers to annotate and so on. And maybe the Wikipedia model of data sets, especially for instruction tuning, could be interesting, where people actually volunteer their time and instead of editing Wikipedia, add annotations. And somehow they are acknowledged and feel incentivized to do so. Hopefully we get to that kind of level where, in terms of data, it would be kind of like Wikipedia. And in terms of model development, it's kind of like Linux where people are contributing patches and improving the model in some way. I don't know exactly how that's going to happen, but based on history, I think there is a way to get there. [00:51:05]Alessio: Yeah, I think the Dolly-15K data set is a good example of a company saying, let's do this smaller thing, just make sure we make it open. We had Mike Conover from Databricks on the podcast, and he was like, people just bought into it and leadership was bought into it. You have companies out there with 200,000, 300,000 employees. It's like, just put some of them to label some data. It's going to be helpful. So I'm curious to see how that evolves. What made you decide to join Together?
[00:51:35]Tri: For Together, the focus has been a lot on open source models. And I think that aligns quite well with what I care about, of course. There are also a bunch of people there that I know and trust, and I'm excited to work with them. Philosophically, the way they've been really open with data set and model releases, I like that a lot. Personally, for the research that I've developed, we also try to make the code available, free to use and modify and so on, contributing to the community. That has given us really valuable feedback from the community and helped improve our work. So philosophically, I like the way Together has been focusing on open source models. And the nice thing is we're also going to be at the forefront of research, and the kind of research areas that I'm really excited about, things like efficient training and inference, align quite well with what the company is doing. We'll try our best to make things open and available to everyone. Yeah, but it's going to be fun being at the company, leading a team, doing research on the topics that I really care about, and hopefully we'll make things open to benefit the community. [00:52:45]Alessio: Awesome. Let's jump into the lightning round. Usually, I have two questions. So one is on acceleration, one on exploration, and then a takeaway. So the first one is, what's something that already happened in AI machine learning that you thought would take much longer than it has? [00:53:01]Tri: I think understanding jokes. I didn't expect that to happen, but it turns out that with scaling models up and training on lots of data, the model can now understand jokes. Maybe it's a small thing, but that was amazing to me. [00:53:16]Alessio: What about the exploration side? What are some of the most interesting unsolved questions in the space? [00:53:22]Tri: I would say reasoning, in the broad sense. We don't really know how these models do it. Essentially, they do something that looks like reasoning. We don't know how they're doing it. We have some ideas. And in the future, I think we will need to design architectures that explicitly have some kind of reasoning module in them if we want to have much more capable models. [00:53:43]Alessio: What's one message you want everyone to remember today? [00:53:47]Tri: I would say try to understand both the algorithms and the systems that these algorithms run on. I think the intersection of machine learning and systems has been really exciting, and there's been a lot of amazing results at this intersection. And then when you scale models to large scale, both the machine learning side and the system side really matter. [00:54:06]Alessio: Awesome. Well, thank you so much for coming on, Tri. [00:54:09]Tri: This was great. Yeah, this has been really fun. [00:54:11] Get full access to Latent Space at www.latent.space/subscribe
On this episode of Roger the Wild Child Show: Nashville edition, we are joined by country/pop/alternative artist, Tayla Reese! Tayla Reese is a passionate, vibrant and exciting young artist. She is very dedicated, strong, fun and professional with a special “fire” when it comes to music. She incorporates all that she is within every lyric she writes and every note she sings. She has a special ear and soul for music and is excited to share her unique sound with everyone. She has been performing since the age of 10 as both a solo artist and with her own bands at events and venues. Her journey to success started by singing at fundraisers and local venues and continues today at Bethel Woods, Daryl's House, The Chance Theater, Towne Crier, The Falcon, Splashdown Beach, etc. In 2015 and 2017, she performed at Bethel Woods, singing live on stage with Foreigner to a 32K audience, as well as a song at a local venue with country singer and mentor, Jessica Lynn. Tayla Reese also works with producers recording her own songs as well as others. She has acted, sung, and performed in music videos and was an intern for two summers on set with a NYC producer. She has been an extra on popular TV shows such as Law & Order SVU, Unforgettable, and the newer Annie movie. Tayla Reese starred in a Scarlett Antonia musical production, A Journey Home, in 2021/2022 and sang a couple of her own songs. Tayla Reese has released singles on all major platforms and music videos on YouTube, etc. She loves being able to share her music with others using outlets such as YouTube videos, Instagram, TikTok, etc. Her EP “Unscathed” releases in early 2023 and will be on all major platforms as well as YouTube music videos (with producer Pat Gasperini, a singer/songwriter with Sony/ATV Music Publishing who has worked with multiple Top 40 mainstream and active Billboard-charting artists). Tayla Reese's live tour begins in 2023, performing with the Patrick James Band as well as continuing her solo career performing across the U.S.

******

Roger the Wild Child Show: Nashville is streamed live every Wednesday night at 9pm ET/6pm PT on Facebook, YouTube and Twitter. The show is rebroadcast on 20+ different podcast platforms. Each week they talk with up-and-coming artists, legends of country music and other influencers on the Nashville scene. Roger is joined by co-hosts Megan Bennett, Patrick James and Kristen Kae. Wanna know the nitty gritty from Music City? Elise Harper has your Nashville music news! Check out the video/audio podcasts and the rest of our links: LinkTree https://linktr.ee/wildchildradio
When I started my business I wanted to succeed really quickly. This was driven by the fear of failing and looking like a failure. I wanted overnight wealth, an overnight engaged community and overnight full books. I guess that's kinda normal to want these things, but when I was informed by trusted people and professionals that it'll take three years to build your brand and a solid, engaged community; that it's not a business if it can't support your life, it's a hobby; and that it'll take ten years to be highly regarded in your field ... I wanted to give up. Ugh, who the hell wants to wait this long for this? Recently, my online community grew from 32K to 529K in a matter of four months. I wasn't doing anything new and applied no new strategy. I'd been plugging away for nine years, sharing my message, tweaking it, doing the inner work and showing up a little more each time as my authentic nature. Today's episode shares how grateful I am for past Kat who played the long game, delayed gratification, put her head down and bum up and stayed with what felt good, right and true. Because now, I am reaping the rewards of the hard work she (past Kat) put in, even when she wondered if what she was doing was a waste of time. UPCOMING EVENTS ZEROFKS Dance Party - 23rd May, Melbourne, Australia. Sign up here! WORK WITH ME 1:1 - Coach with Kat. SUBSCRIBE TO MY MEDITATION MEMBERSHIP - TAKE YOUR MEDS - Take Your Meds Meditation Membership - Join now. Support the show
Jahmaal Marshall is blessed with a natural ability to help people. He's been doing it since he was a teenager. Listen to Jahmaal's journey, how his childhood traumatic experience defined his future and career, and why he strongly feels that "what we believe about ourselves is often hardwired into a traumatic event."
Here are key nuggets from our session together:
Jahmaal's journey in/out of prison (not what you think)
Why he calls himself a Biblical Counselor, and what he means by "I'm not religious."
The difference between living and existing
The difference between coaching and counseling
What does Jahmaal mean by "if you go too deep and fast, you may drown."
What is 'burnout' and the root cause for that stage
Jahmaal's advice for determining if someone is a weight or a wing
The best places to get a true perspective on life
Connect with Jahmaal here:
https://www.linkedin.com/in/jahmaalmarshall/ (32K+ followers)
https://listenthenspeak.com/
The AI Breakdown: Daily Artificial Intelligence News and Discussions
GPT-4 as most people use it has an 8K token limit. Some already have access to a version with a 32K limit, however, and are reporting hugely different opportunities in terms of what GPT can do, based on how much more text it can take in and output at once.
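For a rough sense of what that difference means in practice, here is a minimal sketch, assuming the tiktoken tokenizer library and the commonly reported limits of 8,192 and 32,768 tokens; the fits() helper and the input file name are hypothetical, not part of any official API.

```python
# Minimal sketch: check whether a document fits in the 8K vs. 32K GPT-4 context
# windows. Assumes the tiktoken library; the limits below are the commonly
# reported values, not figures taken from this episode.
import tiktoken

CONTEXT_LIMITS = {"gpt-4": 8_192, "gpt-4-32k": 32_768}

def fits(text: str, model: str = "gpt-4", reserve_for_output: int = 1_000) -> bool:
    """Return True if `text` plus a reserved output budget fits in the model's window."""
    enc = tiktoken.encoding_for_model("gpt-4")  # both variants use the same tokenizer
    return len(enc.encode(text)) + reserve_for_output <= CONTEXT_LIMITS[model]

# Example: a long report may overflow the 8K window yet still fit in the 32K one.
report = open("quarterly_report.txt").read()  # hypothetical input file
print(fits(report, "gpt-4"), fits(report, "gpt-4-32k"))
```

The point is simply that the 32K window leaves room for whole documents plus a sizable answer, where the 8K window often does not.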
THE KELLY CARDENAS PODCAST PRESENTS James E. Dixon was born with poor blood circulation that necessitated thirty-three surgeries before his 11th birthday, and eventually resulted in unilateral amputation below the knee. James learned to overcome his disability by playing sports like basketball and was offered a Division III scholarship. He began weight lifting in college, which sparked a lifelong passion that led to record-setting achievements. After graduating, he gravitated towards people-focused careers: sales, restaurant entrepreneurship, television, exposition host, and ministry. James hid his amputation along each career path, but eventually shared it, changing his life and creating a platform that offers hope to others. Today his motivational speeches are featured weekly to 1.4M subscribers on YouTube's Absolute Motivation channel and to 32K followers on Instagram. He is also a model for Under Armour, slated as the 2023 NFL Combine Keynote Speaker, and a proud father of three residing in the Indianapolis area. Thank you to our sponsors THE HIDEOUT Be sure to check out my new audiobook SUCCESS LEAVES CLUES (THE 7 P'S THAT CAN SHIFT YOUR REALITY) Thank you to our sponsors PRIVATE MONEY CLUB USE CODE - KELLY500 MONEY SCHOOL TABLE ONE HOSPITALITY RAVEN DRUM FOUNDATION THE MINA GROUP SECRET KNOCK FAMECAST Findlay Volvo Las Vegas Samaritans Feet Cardenas Law Group Squeeze Dried Agua Hedionda Lagoon Foundation BLING SHINE SERUM - The #1 seller of over 15 years and the only product to be endorsed by my MAMA! MORE KELLY "JOY IS THE ART OF FALLING IN LOVE WITH YOUR CURRENT CIRCUMSTANCES AND ALLOWING MAGIC TO HAPPEN!" EXECUTIVE PRODUCER BROOKLYN CARDENAS --- Send in a voice message: https://podcasters.spotify.com/pod/show/kelly-cardenas/message
Crypto News Alerts | Daily Bitcoin (BTC) & Cryptocurrency News
With the next Bitcoin halving now only 351 days away, PlanB, creator of the BTC stock-to-flow model, updated his price forecast for the king crypto (Bitcoin), predicting BTC will skyrocket to $532K after the 2024 halving. "My Jan 12 prediction is in line with S2F model: 1) ~$32K is S2F 1 standard deviation band 2) ~$60K is S2F model value and just before halvings (dark blue) BTC seems to hit S2F model values 3) $100K is the bottom of my 100K-1M range around $532K S2F model value after 2024 halving" Learn more about your ad choices. Visit megaphone.fm/adchoices
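For readers curious what "S2F model value" refers to in the quote above, here is a rough Python sketch of the power-law stock-to-flow idea. The coefficients are the widely cited ones from PlanB's original 2019 article and are assumptions for illustration only; they are not the exact parameters behind the $532K figure quoted in the episode.

```python
# Rough sketch of a power-law stock-to-flow model: market value = exp(b) * SF^a.
# The coefficients a=3.3 and b=14.6 are the widely cited ones from PlanB's 2019
# article and are used here purely for illustration (assumption, not episode data).
import math

def s2f_model_price(stock_btc: float, annual_flow_btc: float,
                    a: float = 3.3, b: float = 14.6) -> float:
    """Return the model's implied USD price per coin."""
    sf = stock_btc / annual_flow_btc          # stock-to-flow ratio (scarcity)
    market_value = math.exp(b) * sf ** a      # modeled total market value in USD
    return market_value / stock_btc           # implied price per BTC

# Example with rough post-2024-halving inputs: ~19.7M BTC outstanding and
# ~164K BTC of new issuance per year (3.125 BTC/block * ~52,560 blocks/year).
print(f"${s2f_model_price(19_700_000, 164_000):,.0f}")
```

The takeaway is only that each halving roughly doubles the stock-to-flow ratio, which a power-law fit translates into much higher model prices; how seriously to take that fit is exactly what the episode debates.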
Alabama: "Million Dollar Band" Today I honor my father, Benjamin Colon - otherwise known as Benny or Dad. Dad was the Drum Major at Dominguez High School, and he LOVED bands and music! A college marching band adds tradition, pomp and circumstance, the fight song, the alma mater, and the memories of college football, basketball, and those crazy pep rallies. Alabama's marching band is known as "The Million Dollar Band." My research led me to Alabama alumnus William Champ Pickens, who bestowed the name on the band after the 1922 football game against Georgia Tech. The Crimson Tide lost 33-7 to the Yellow Jackets. An Atlanta sportswriter commented to Pickens, "You don't have much of a team, what do you have at Alabama?" To that, Champ proudly answered, "A Million Dollar Band." And so the name stuck. That was in 1922 – hard to believe Alabama ever had a losing season! You might know the name "Bear" Bryant. For 25 years, he coached the Alabama football team. Coach Bryant's collection of 323 collegiate wins includes 13 Conference Championships and 6 National Titles with the University of Alabama. What I love about this legendary coach is this: he often gave the Million Dollar Band partial credit for the many football victories. What a true gentleman! University of Alabama: a large, public institution with a total of over 32K students and 7,500 freshmen – that's big! Priority Deadline: February 1 Click to Watch Video Click to Read Blog FREE: Download 10 Sample Essays FREE: Watch Mini College Essay Training Book a Call with Dr. C Visit the website
Agenda
ChatGPT 4 came out
Arlingbrook: What am I selling? / What am I building?
There's power in being first

ChatGPT 4
Email
API Waitlist: Please sign up for our waitlist to get rate-limited access to the GPT-4 API – which uses the same ChatCompletions API as gpt-3.5-turbo. We'll start inviting some developers today, and scale up availability and rate limits gradually to balance capacity with demand.
Priority Access: Developers can get prioritized API access to GPT-4 for contributing model evaluations to OpenAI Evals that get merged, which will help us improve the model for everyone.
ChatGPT Plus: ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a dynamically adjusted usage cap. We expect to be severely capacity constrained, so the usage cap will depend on demand and system performance. API access will still be through the waitlist.
API Pricing
gpt-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens, and $0.06 per 1K completion tokens.
gpt-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens, and $0.12 per 1K completion tokens.
Livestream
Please join us for a live demo of GPT-4 at 1pm PDT today, where Greg Brockman (co-founder & President of OpenAI) will showcase GPT-4's capabilities and the future of building with the OpenAI API.
—The OpenAI team
Demo: https://www.youtube.com/watch?v=outcGtbnMuQ
Overview: https://openai.com/product/gpt-4 (overview page of GPT-4 and what early customers have built on top of the model)
Blog Post: https://openai.com/research/gpt-4 (blog post with details on the model's capabilities and limitations, including eval results)
API Waitlist: https://openai.com/waitlist/gpt-4-api
Visual inputs
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs. Furthermore, it can be augmented with test-time techniques that were developed for text-only language models, including few-shot and chain-of-thought prompting. Image inputs are still a research preview and not publicly available.
OpenAI Pricing: https://openai.com/pricing

Arlingbrook
Offers 2 subscriptions:
$8.99/month: Exclusive access to creators
$45.99/month: Rhinoleg CRM
Unlimited features
Chat bot
The whole thing is written in AI
Much more
There's power in being first
The world can only hold 2 options in their minds
Arlingbrook is being released in no less than 60 days
Support this podcast at — https://redcircle.com/the-secret-to-success/exclusive-content
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy
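As a quick illustration of the pricing quoted above, here is a minimal Python sketch of the per-request cost arithmetic. The PRICING table and estimate_cost helper are hypothetical names, and the rates are simply the March 2023 figures from the announcement text, not current pricing.

```python
# Hypothetical helper: estimate GPT-4 API cost from the March 2023 prices quoted above.
# Prices are USD per 1K tokens; model names and rates are taken from the announcement
# text in this episode, not a live lookup of current OpenAI pricing.

PRICING = {
    "gpt-4":     {"prompt": 0.03, "completion": 0.06},   # 8K context window
    "gpt-4-32k": {"prompt": 0.06, "completion": 0.12},   # 32K context window
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    rates = PRICING[model]
    return (prompt_tokens / 1000) * rates["prompt"] + (completion_tokens / 1000) * rates["completion"]

if __name__ == "__main__":
    # Example: a request that nearly fills the 32K window.
    print(f"${estimate_cost('gpt-4-32k', prompt_tokens=30_000, completion_tokens=2_000):.2f}")
```

At those rates, a near-full 32K-window request (30K prompt tokens plus 2K completion tokens) works out to about $2.04, which is why the context-window choice matters for cost as well as capability.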
In this video Demeterius and I discussed: 1. His debt-free journey and how he paid off 32K of debt in 5 months back in 2018. 2. What he was able to accomplish since becoming debt-free. Just to mention a few things: his wife graduated from law school debt-free (University of Baltimore). He told his wife to quit her six-figure job because it was affecting her health. He recently took his mom on a five-star vacation (dream come true). 3. He currently lives a debt-free life. More about Demeterius: https://www.instagram.com/harmonfinancialcoaching/ https://www.harmonfinancialcoaching.com/ Demeterius was born and raised in Salisbury, Maryland in a single-parent household. At 17 years old, he joined the Army, serving for 6.5 years and completing two deployments to Iraq. He is happily married with two children and has been married for 13 years. Demeterius is a devout Christian, having given his life to Jesus Christ in 2010 and striving to live according to biblical principles, including financial stewardship. Residing in the DC area since 2007, Demeterius is committed to his family and his faith. He is now helping others achieve financial greatness through faith. =|| Books Mentioned ||= Millionaire Next Door by Thomas J. Stanley - https://amzn.to/3YFUtk8 Be sure to subscribe on iTunes, Spotify or wherever you listen to podcasts. ____________ AFFILIATES/SPONSORS: DISCLAIMER: these are sponsored links through which I get paid, and you can benefit from them as a listener to the podcast. Get eco-friendly stocking stuffers with Earth Breeze Laundry Sheets: https://aboutthatwallet.com/earthbreeze Survey Junkie - Make some shmoney for your opinion!! https://aboutthatwallet.com/surveyjunkie Start your investment journey with free stocks! https://aboutthatwallet.com/webull Gain access to over 5,000 training videos on how to increase your skillset with crypto, investing, how to start a business, podcasting and much more: https://shopakanundrum.com/?ref=atw My equipment: Rode Caster Pro - https://amzn.to/3i596tF SHURE SM7B Dynamic Microphone - https://amzn.to/3AbV040 Microphone Stand - https://amzn.to/3NIeBfz Listen to the podcast on your favorite listening platforms such as Apple, Google, Spotify, Amazon and more!! -- DISCLAIMER: I am not a CPA, attorney, insurance agent, contractor, lender, or financial advisor. The content in this audio is for educational purposes only. You must do your own research and make the best choice for you. Investing of any kind involves risk. While it is possible to minimize risk, your investments are solely your responsibility. It is imperative that you conduct your own research. I am merely sharing my opinion with no guarantee of gains or losses on investments. If you need advice, please contact a qualified CPA, CFP, an attorney, insurance agent, financial advisor, or the appropriate professional for the subject you would like help with. --- Send in a voice message: https://anchor.fm/aboutthatwallet/message Support this podcast: https://anchor.fm/aboutthatwallet/support
This week's EYE ON NPI is your next 8-bit microcontroller replacement: STMicroelectronics' STM32C0x1 Series Entry-Level MCU (https://www.digikey.com/en/product-highlight/s/stmicroelectronics/stm32c0x1-series-entry-level-32-bit-mcu), which gives developers a 32-bit Arm Cortex M0+ microcontroller at 8-bit microcontroller prices. These aggressively priced microcontrollers come just at the end of a 2-year chip shortage, so if you've been holding back a design, it could be a great time to swap out your 8051-based or other 8-bit microcontroller for a powerful Cortex M0+ that can use ST's supported firmware libraries and IDE. Most developers go with 8-bit microcontrollers to get a few basic needs met: maybe a few timers, an ADC, GPIO and I2C or USART for interfacing. The benefits are low design complexity - for example, no crystal is needed because there's an internal trimmed-RC oscillator, and the power supply is simple, with only one power pin so you don't need multiple regulators. 8051-based (https://en.wikipedia.org/wiki/Intel_8051) microcontrollers are popular as cores for their ultra-low cost and fairly low power usage. But the programming environment tends to be archaic, and 8-bit code compiles chunky, especially when dealing with floating point (https://www.wikihow.com/Convert-a-Number-from-Decimal-to-IEEE-754-Floating-Point-Representation) or large-integer math. If you ever have to do anything more complex, like interpolate values or perhaps run a digital filter on your data, an 8-bit micro will be really annoying. Updating to 32-bit, particularly the Arm Cortex line, will open up the whole universe of optimized and standardized libraries that CMSIS provides (https://developer.arm.com/tools-and-software/embedded/cmsis). The STM32C0 series is the lower-powered version of the STM32G0 series (https://www.digikey.com/en/products/result?s=N4IgjCBcoLQCxVAYygMwIYBsDOBTANCAG4B2aWehA9lANogBsAHAEysgC6hADgC5QgQAXxFA0) - both are Cortex M0+ chips, but the C0 runs at 48MHz instead of the G0's 64 MHz. The chips are otherwise pin compatible at the low pin-count end of 8 to 48 pins; the G0 keeps going up to 100 pins. There are 9 different packages that also have increasing amounts of FLASH/SRAM, with either 16 or 32K of flash and 6 or 12K of RAM. For peripherals you'll get plenty of GPIO, DMA, four 16-bit timers, and a 12-bit ADC with 13 channels and a surprisingly high 1.7 MSPS, plus SPI, I2S, two USARTs and one I2C. Note there's no USB on the C0 series; for that you'll need to upgrade to the STM32G0x1 (https://www.digikey.com/en/product-highlight/s/stmicroelectronics/stm32-g0). We haven't featured a lot of microcontrollers on EYE ON NPI lately because we prefer to tell you about parts you can order immediately. The good news about the STMicroelectronics STM32C0x1 Series (https://www.digikey.com/en/product-highlight/s/stmicroelectronics/stm32c0x1-series-entry-level-32-bit-mcu) is that they're all in stock right now for immediate shipment (https://www.digikey.com/en/products/filter/embedded/microcontrollers/685?s=N4IgjCBcoLQCxVAYygMwIYBsDOBTANCAG4B2aWehA9lANrgAMAnAExgsgC6hADgC5QQIAL6igA) - in a wide variety of packages and memory options. And if you want to start verifying the parts for your firmware immediately, there's the STM32C0116-DK dev kit (https://www.digikey.com/en/products/detail/stmicroelectronics/STM32C0116-DK/17074591) in stock. Order your STM32C0x1 Series chips or developer kit today, and you can be revising your 8-bit microcontroller design into a 32-bit glow-up by tomorrow afternoon!
Luis Mantilla is a civil engineer who has run four marathons (Orlando, Chicago, New York, and San Francisco) and has completed the Vuelta a San Andrés 32K road race three times. His story clearly mirrors his profession: he designs, builds, and maintains each of the challenges he takes on in his runner's universe. Ready? Let's run!
"With the stock market rally, take a chance to reposition your portfolio and take some gains. Markets will not recover as China and Great Britain are struggling and there is an ongoing war in Ukraine. Dow Jones today will need to get to 32K before it resumes its downward trend. Debt is the biggest elephant in the room. The Federal Reserve is not in an easy position as if they raise interest rates too much, they will ruin the real estate market and pension funds," says Russell Stone.
Loud Rumor is a marketing & consulting company specifically for gym & fitness owners. Find out if you're a good fit for the program by hopping on a FREE strategy call here: https://loudrumor.com/info In today's GSD Show episode, Mike sat down with Jeremy and Louis, owners of FunkFit (soon to be The Collective) in Gilbert, AZ, who made an additional $32K in MRR in just 3 MONTHS!
Today, on the @tobraornotpodcast, we're talking with Dr. Brittany Barreto, geneticist, serial entrepreneur, podcast host, and venture capitalist. Brittany is known for being the co-founder and Executive Director of FemTech Focus, a non-profit that supports innovation to improve womxn's health. She is also the badass host of the FemTech Focus podcast which has over 100 episodes, 32K downloads, and subscribers in over 105 countries. Over the years, Brittany has dedicated her time to assisting hundreds of FemTech founders to build, launch, and succeed through their events, resources, and market research reports. As a well-known leader in the Houston startup ecosystem, Dr. Brittany Barreto has been recognized by the Houston Business Journal and The Greater Houston Partnership. She is frequently requested to consult startups and deliver keynotes/workshops that highlight her areas of expertise: FemTech, going from science to entrepreneurship, and fundraising. Currently, Brittany is kicking off her latest endeavor as the co-founding partner and emerging fund manager at Coyote Ventures, an early-stage FemTech investment firm. She is hilarious, filled with knowledge and passion, and is quite literally the definition of a badass boss babe. I'm so grateful for the opportunity to share her story and mission with you all. In this episode, we talk about… Femtech - what it is and why it's important Why women weren't included in medical trials until 1993, and the role this has played in women's healthcare over the years (for real… it's nuts you guys!) How Brittany went from selling sex toys in college to being one of the biggest leaders in the fem tech world today The world's first-ever “smart” sex toy (and one of our personal favs); addressing how 1 in 10 women have never orgasmed & 1 in 4 women struggle to reach orgasm Brittany's personal health journey, shifting from a dieting mindset to a fueling mindset Closing the gender PAIN gap and creating a new normal PS - stay tuned till the end where Brittany shares her favorite lesson that she has learned PPS -
4x Pro Bowl, 2x All-Pro, 1 ring in 3 SB appearances, 2x MVP, Hall of Famer over 12 NFL seasons. 32K yards, 208 TDs, and 128 INTs don't paint the full picture of the QB of the Greatest Show on Turf. Perseverance, grit, and coming back give it the color that made everyone root for the underdog. This is the Kurt Warner Story. We talk about the many twists and turns it took for Kurt to even get into the league, which was A LOT. Multiple tryouts, NFL Europe, Arena Football, injuries, and more all played a big part. But then when the opportunity arose, Kurt took the league by storm with the Greatest Show on Turf; the early 2000s Rams were a sight to see. BUT, Kurt Warner's story always has ups and downs, and the middle 5 years of his career were so bad that people were ready to discredit his Rams achievements. A late resurgence and magical Cardinals run saved him and cemented his legacy as the true underdog story of a man who never quits. Listen in to hear the full details and watch some of the highlights below for those who don't know about Kurt: Bucs vs. Rams 1999 NFC Champ game: strength vs. strength and the controversial Bert Emanuel catch. Cardinals vs. Packers NFC Wild Card: Kurt's last playoff win, dropping 29/33 for 379 yards and 5 TDs, outdueling a young Aaron Rodgers in a classic offensive shootout - a good breakdown of some of the technique behind Kurt's deep throws. Let us know your thoughts at innoutpodcast1@gmail.com or on Instagram @in_n_outpodcast and catch y'all next week!
It's been a busy week. Social influencer and YouTuber TechLead launched his own crypto token called Million (MM). It's an ERC20 token that you can purchase on Uniswap.
Crypto.com sponsors UFC
What is the MM Million Token?
USDC stablecoin going public
SEC deadline to figure out SEC regulation
Hot wallet experience (Trust and MetaMask)
We may create our own ERC20 token called INCH
HBAR (Hedera Hashgraph) analysis
Bitcoin dropped to 32K this week
@cryptocom we want your sponsorship
Buying Babydoge and the Trust Wallet on the Binance Smart Chain
35:00 Ethereum gas fees - how much we have paid
42:00 How can Canadians invest in crypto
We have a new combo online course: Learn Cryptocurrency & Massage t.ly/kJNx
Buy DOGE, ADA, BTC, ETH, XRP, CRO, VET on https://crypto.com/app/df6cdc6d3e to sign up for Crypto.com and we both get $25 USD :)
Buy Crypto & Earn Interest on Binance https://www.binance.com/en/register?ref=V1411WX1
Crypto for Beginners Course https://chicvoyageproductions.com/bitcoinetfbeginners
Crypto Onboard Consultations with Terry: Courses@raynormassage.ca
Bitcoin (BTC) going down to $32K? Bitcoin (BTC) price prediction. Sign up for Token Metrics at https://tokenmetrics.com Token Metrics Media LLC is a regular publication of information, analysis and commentary focused especially on blockchain technology and business, cryptocurrency, blockchain-based tokens, market trends, and trading strategies. Like the podcast to let us know you like the content! Sign up for Token Metrics at https://tokenmetrics.com ✔ Podcast: https://tokenmetrics.com/podcast ✔ Blog: https://blog.tokenmetrics.com/ ✔ Forum: https://forum.tokenmetrics.com/ Follow us on social media below: ► Telegram Alerts Channel: https://t.me/TokenMetrics ► Telegram Discussion Group: https://t.me/TokenMetricsDiscussion ► Twitter: https://twitter.com/tokenmetricsinc ► Instagram: https://instagram.com/tokenmetrics ► Facebook: https://facebook.com/tokenmetrics Token Metrics Media LLC does not provide individually tailored investment advice and does not take a subscriber's or anyone's personal circumstances into consideration when discussing investments; nor is Token Metrics Media LLC registered as an investment adviser or broker-dealer in any jurisdiction. Information contained herein is not an offer or solicitation to buy, hold, or sell any security. The Token Metrics Media LLC team has advised and invested in many blockchain companies. A complete list of their advisory roles and current holdings can be viewed here: tokenmetrics.com/disclosures.
The Punch Drunk Soul Podcast - Soul Alignment + Business Chats
I am obsessed with this episode guys!! You can hear my excitement for Kristin as she shares her recent wins with me during this chat. Her story reminds me a lot of my own with the amount we've invested in ourselves and our businesses that took years to pay off - but then it was ALL worth it, as you'll hear in this episode. Kristin and I met about 2 years ago when both of us were still getting our businesses really up and running. Kristin was struggling with debt after investing about $20K in her business the first year and not earning anything in return. It's been SO inspiring to see how far she's come since then and how she was able to use the skills and knowledge she'd picked up from investing in herself and finally turn a huge profit. Just a reminder that it all happens with TIME and effort and patience! Kristin is a work-at-home mom who now runs multiple businesses and helps many other people create success in their businesses as well. She is an expert in the delicate art of juggling a family and running a wildly successful empire. I love how she shares with us how she does it all - being a mom, having her husband working thousands of miles away in Japan, and running a business that just the other week did $32K in sales! Recently, Kristin has pivoted her business to Monetized Mama, which is now an agency that helps mom influencers monetize their audience and skills and build service-based businesses that generate money while they sleep. We talk a lot about this pivot and how Kristin recognized an opportunity happening during these challenging economic times. It was great to dive into this because I know so many of us often feel like things are so bleak, but it's truly during times of hardship that new millionaires and successful businesses are made, and Kristin is an amazing example of this. One of my favorite parts of this episode is hearing how Kristin manages to run her business successfully while maintaining a happy home life with her faith, fitness, family, finances, and then her business. Something to think about - what are your priorities and are you focusing on them effectively? I know I started implementing a new schedule after this podcast interview and started working out first thing in the AM because my health and fitness are important to me, but I have been putting them behind my business for the last couple of months now in quarantine. This episode inspired me to change things up, so listen in closely and maybe change up a few things in your life. If you enjoy this episode I'd love for you to let us know by sharing it on your Instagram stories and tagging us! I'm @punchdrunksoul and Kristin is @kristinarilus OK let's get into this interview... Aha Moments: How Kristin went from $20K in debt to making $32K in 11 days. How Kristin stopped operating from what was in her bank account and focused on where she wanted to be in 10 years to find success. How to choose which investment is the right next step for you (and which ones aren't). How Kristin's shame around being successful around her family kept her from earning big and how she healed this to tap into abundance. How to be successful as a mom with young kids in business. Links: Instagram: https://www.instagram.com/kristinarilus/ Facebook: http://facebook.com/monetizedmama Website: http://monetizedmama.com/ Path to Freedom Coach Accelerator: Punchdrunksoul.com/pathtofreedom
Join Vee Khuu and Jason Pero as they talk about investing in multifamily during the pandemic. Is it safe? Should you bite into the market with all the uncertainty? It's as simple as raising the rent later, right? These are the questions that we will get into. With everyone talking about the potential benefits of going all in, it's essential to know the value of staying conservative and investing in stable markets. While focusing on growth is needed, you cannot forget about maintaining your business by making room for unexpected situations!
[02:21] How did LeBron James' career choices affect Jason's relationship with his wife?
[04:29] Not wanting to work on a farm and the kind of life in Pennsylvania.
[06:51] The mindset of making money to get the things you want, even from a young age. How did a couple of school teachers have a net worth in the millions?
[11:51] A $32K duplex as the first rental property. Set up for your financial future!
[13:42] The market before and after 9/11. Was it really a crisis across the country?
[16:36] Enjoying predictability and stability in his market through investing in small towns.
[18:18] Mistakes in multifamily – being too aggressive with assumptions. It always pays to have a conservative approach.
[23:06] How a conservative approach is going to preserve you as an investor. Make sure to have available reserves for "break in case of emergency" situations!
[27:30] The definition of financial freedom and the importance of finding mentors. Copy the successful methods of the people that have done it before you.
[33:04] Trying to keep on an even keel. Don't get too high or too low. Remember where you came from!
[34:37] Be radically open-minded and at peace at the same time. It's a life of continual learning and finding opportunities.
[38:38] Having defined goals and expectations in real estate. Are you ready to fire people?
[41:20] The challenge of feeling satisfied and knowing where to expand.
[43:10] Crash and Learn: what is it about? You don't have to brag about your nice things!
[46:54] Financial goals are empty – focus more on your happiness!
[48:55] Experience with the Darren Hardy mastermind. Is it necessary to join one?
[55:14] Hiring people that are better than you. Be comfortable in having other people make the decisions for you!
Resources:
Pero Real Estate
Jason's LinkedIn
Jason's Facebook
Crash and Learn
Rich Dad, Poor Dad
The Millionaire Next Door
Three Feet From Gold
Darren Hardy
Adam Adams
Connect with Vee Khuu!
Website
Facebook
Instagram
Linktree
Support this show http://supporter.acast.com/the-real-estate-lab. Our GDPR privacy policy was updated on August 8, 2022. Visit acast.com/privacy for more information.
The Vertical Blank: Generation Atari. Season 3 Episode 5: Monthly News and Notes for those who grew up Atari, or Atari News and HomeBrew Update for Q1 of 2020: Apocalypse Edition. In this episode we cover news from the Official Atari as well as other new hardware projects that have sparked our interest. We also attempt to cover homebrew games for all of the Atari systems, including the 2600, 5200, 7800, 8-bit computers, ST, Lynx and Jaguar.
8bitrocket Pong Returns Post Mortem Video: https://www.youtube.com/watch?v=GoEDKq2PAPg
Pong Quest: https://www.atari.com/games/pong-quest/
Zero Page Homebrew reports:
Atari 2600 Homebrew Completed/WIP in 2020: https://atariage.com/forums/topic/301348-atari-2600-homebrew-completedwip-in-2020/
1 vs 1 Pro Tennis (16K) by @easmith: https://www.youtube.com/watch?v=rfJSpOrIWjs
Save Gaia: The Cy-Mage (64K) by @EvoMikeUK / Generation2: https://www.youtube.com/watch?v=rfJSpOrIWjs
Street Rod 2600 (32K) by @TwentySixHundred: https://www.youtube.com/watch?v=rfJSpOrIWjs
Zoo Keeper (32K Port) by Champ Games: John Champeau @johnnywc, Nathan Strum @Nathan Strum (graphics), Robert Vieira and Thomas Jentzsch @Thomas Jentzsch (music) (Previously Nominated for WIP): https://www.youtube.com/watch?v=dYw47wffgw8
Atari 7800 Homebrew Completed/WIP in 2020 (Zero Page Homebrew reports): https://atariage.com/forums/topic/301937-atari-7800-homebrew-completedwip-in-2020/
Dragon's Descent by @Revontuli: https://www.youtube.com/watch?v=FcUtpaBzkAY
Atari 8-Bit/5200 Homebrew Completed/WIP in 2020 (Zero Page Homebrew): https://atariage.com/forums/topic/304311-atari-8-bit5200-homebrew-completedwip-in-2020/
Below is a list of all the Atari 8-Bit/5200 homebrew games that have either been completed in 2020 or have released an updated WIP in 2020. Please let me know if there are any missing or inaccurate entries.
Lord of the Orb Xaver (Map Set 004) by Marcin "@XaVeR" Kasztelan (Note: Map Update): https://www.youtube.com/watch?v=cYiWWJL
Koereyelle DuBose discusses why WERK University is the place to be for female entrepreneurs, the importance of mindset preparation, and bouncing back from your failures, and gives us the details on the new HBCU Project and more. Koereyelle is the founder of Werk Pray Slay and WERK U, a two-time author, award-winning entrepreneur, and former educator who managed to turn her $32K teaching salary into a six-figure brand. She's the Founder of the 1st African American woman-owned trade school in the country and is on a mission to connect women of color with the resources they need so they can stop living paycheck to paycheck. She's an International Speaker, Podcast Host and Edutainer who's been featured nationally by Forbes, ESSENCE Magazine, The Huffington Post, NBC, TV One, VH1, Bravo TV and more for her empowerment projects. Koereyelle is on a mission to help women uncover their purpose, prioritize their life and profit from their passions. Her motto is, "You already have everything you need to get everything you want, you just have to WERK for it." For more info about her brands, visit www.werkprayslay.com. --- Support this podcast: https://anchor.fm/blkwomenhustle/support
Episode One features Koereyelle DuBose, a former educator who managed to turn her $32K teaching salary into a six-figure brand. As the Chief Experience Officer of WERKPraySlay -- an annual 4-day empowerment weekend for women who are ready to win -- and Creator of the Busy at the Beach Travel Tribe, she's dedicated her life to empowering women to live a life that they love! With a passion for women and pizazz for entrepreneurship, Koe authored two self-development books for women: WERK101: Get-Your-Life-Together Guide and her sophomore project, which shares her success secret!