POPULARITY
Databricks just snatched up another AI company. This week, the data analytics giant announced a $1 billion acquisition of Neon, a startup building an open-source alternative to AWS Aurora Postgres. It's the latest in a spree of high-profile buys, joining MosaicML and Tabular, as Databricks positions itself as the place to build, deploy, and scale AI-native applications. Today, on TechCrunch's Equity podcast, hosts Kirsten Korosec, Max Zeff, and Anthony Ha unpack the Databricks–Neon deal, where Neon's serverless Postgres tech fits into the larger vision, and whether $1 billion still counts as “a lot of money” these days (spoiler: Kirsten and Anthony are on the fence). Listen to the full episode to hear about: Chime's long-awaited IPO plans and what the neobank's S-1 did (and didn't) reveal. AWS entering a ‘strategic partnership' that could shake up cloud infrastructure, especially as the Middle East ramps up its AI ambitions. The return of the web series. Yes, really. Short-form scripted content is back, and investors are placing big bets on the nostalgic trend. Equity will be back next week, so don't miss it! Equity is TechCrunch's flagship podcast, produced by Theresa Loconsolo, and posts every Wednesday and Friday. Subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts. You also can follow Equity on X and Threads, at @EquityPod. For the full episode transcript, for those who prefer reading over listening, check out our full archive of episodes here. Credits: Equity is produced by Theresa Loconsolo with editing by Kell. We'd also like to thank TechCrunch's audience development team. Thank you so much for listening, and we'll talk to you next time. Learn more about your ad choices. Visit megaphone.fm/adchoices
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF). Highlights include: - How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes. - Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality. - The role of reward models in improving coding, math, reasoning, and other NLP tasks. Connect with Brandon Cui: https://www.linkedin.com/in/bcui19/
The biggest AI breakthroughs won't come from Ph.D. labs — they'll come from people solving real-world problems. So how do AI founders actually turn cutting-edge research into real products and scale them? In this week's episode of Founded & Funded, Madrona Partner Jon Turow sat down with Jonathan Frankle, Chief AI Scientist at Databricks, to talk about the shift from AI hype to real adoption — and what founders need to know. They dive into: 1) How AI adoption has shifted from hype to real-world production 2) The #1 mistake AI startups make when trying to sell to enterprises 3) Why your AI system shouldn't care if it's RAG, fine-tuned, or RLHF — it just needs to work 4) The unexpected secret to getting your first customers 5) The AI opportunity that most startups are overlooking Transcript: https://www.madrona.com/databricks-ia40-ai-data-jonathan-frankle Chapters: (00:00) Introduction (01:02) The Vision Behind MosaicML (04:11) Expanding the Mission at Databricks (05:52) The Concept of Data Intelligence (07:42) Navigating the AI Hype Cycle (15:10) Lessons from Early Wins at MosaicML (20:50) Building a Strong AI Team (23:36) The Future of AI and Its Challenges (24:06) Evolving Roles in AI at Databricks (25:55) Bridging Research and Product (28:29) High School Track at NeurIPS (30:39) AI Techniques and Customer Needs (38:22) Rapid Fire Questions and Lessons Learned (42:49) Exciting Trends in AI and Robotics (45:40) AI Policy and Governance
This is a replay of our first episode from April 12, featuring Databricks VP of AI Naveen Rao and a16z partner Matt Bornstein discussing enterprise LLM adoption, hardware platforms, and what it means for AI to be mainstream. If you're unfamiliar with Naveen, he has been in the AI space for more than a decade, working on everything from custom hardware to LLMs, and has founded two successful startups — Nervana Systems and MosaicML. Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
Jonathan Frankle is the Chief AI Scientist at Databricks ($43B), which he joined through the acquisition of MosaicML in July 2023. Databricks has over 12,000 customers on the cutting edge of AI; Jonathan works to anticipate their needs and offer solutions even as the tech is rapidly evolving. [0:00] Intro [0:52] Incentives and Team Motivation at Databricks [2:40] The Evolution of AI Models: Transformers vs. LSTMs [5:27] Mosaic and Databricks: A Strategic Merger [7:31] Guidance on AI Model Training and Fine-Tuning [11:11] Building Effective AI Evaluations [16:02] Domain-Specific AI Models and Their Importance [19:37] The Future of AI: Challenges and Opportunities [25:07] Ethical Considerations and Human-AI Interaction [29:13] Customer Collaboration and AI Implementation [30:45] Navigating AI Tools and Techniques [35:41] The Role of Open Source Models [36:46] AI Infrastructure and Partnerships [48:27] Academia's Role in AI Research [52:09] Ethics and Policy in AI [57:47] Quickfire. With your co-hosts: @jacobeffron - Partner at Redpoint, Former PM Flatiron Health @patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn @ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare) @jordan_segall - Partner at Redpoint
In 2014, Naveen Rao left his researcher job at Qualcomm to co-found machine intelligence platform Nervana. About two and a half years later, Nervana sold to Intel for more than $400 million. After a few years at Intel, Naveen became CEO and co-founder of open source startup MosaicML, which helped companies build and train large AI models more efficiently and affordably. Approximately two and a half years later, Databricks acquired that company for $1.3 billion and made him VP of Generative AI. Naveen is a neuroscientist with multiple exits, so we spoke at length about shifting from engineering into entrepreneurship. If you have a technical role and are considering starting up, he shares his perspective on balancing day-to-day responsibilities while validating your idea, the importance of challenging industry assumptions, and finding your market niche. We also talked about who should — and should not — start up in stealth and addressed some of the current challenges facing the generative AI startup ecosystem, such as the rapid failure rates of companies due to high GPU costs and overinvestment in 2021-2022. Runtime: 51:16 EPISODE BREAKDOWN (2:54) “Evidence is always less than you want. At some point you kind of have to take a leap.” (5:37) “My journey with Nervana was, it was immediately clear to me that the world needed this.” (11:52) “Get in the head of someone who's going to exchange money for your services and products.” (14:54) Why he's not a big fan of remote work. (18:35) “It is on you as an adult to find balance that you can work with. It is not the company's duty to do this.” (20:31) Why product companies should “probably not” start up in stealth. (23:11) “It wasn't the right time to sell. I didn't look at it like that at all.” (26:32) Hiring and retaining generative AI talent is “a hard problem right now.” (30:16) “Companies are failing pretty fast because the GPU spends are so big.” (35:04) Interacting with Databricks' developer community. (43:11) Naveen on accelerators: “I think there are actually faster ways to learn.” (45:05) When it comes to angel investing, “I'm looking for creativity, honestly.” (47:34) If you were interviewing for a job with an early-stage startup, what's one question you'd have to ask the CEO? LINKS Naveen Rao @NaveenGRao Databricks Databricks picks up MosaicML, an OpenAI competitor, for $1.3B Intel is paying more than $400 million to buy deep-learning startup Nervana Systems The Righteous Mind: Why Good People Are Divided by Politics and Religion, Jonathan Haidt Ali Partovi Subscribe to Fund/Build/Scale on Substack: fundbuildscale.substack.com
In this episode of the EUVC podcast, Andreas talks with Will Prendergast, Partner at Frontline Ventures. With ~€435M in assets under management, Frontline Ventures is a venture investment firm that operates two distinct funds: Frontline Seed, a €100M fund targeting pre-seed and seed-stage European companies with clear U.S. expansion intentions, and Frontline Growth, a €100M fund focusing on Series B up to D in U.S. companies planning to expand to Europe within 12-18 months. Frontline Ventures, headquartered in Ireland, specializes in B2B SaaS and Deeptech sectors. The firm's position allows it to bridge the gap between European and U.S. markets, providing valuable support for companies looking to expand across the Atlantic in either direction. As a Partner at Frontline Ventures, Will Prendergast offers us a distinctive perspective on European and U.S. venture landscapes. His insights on early-stage investing in Europe and growth-stage opportunities for U.S. companies expanding to Europe are sure to be valuable for anyone interested in transatlantic venture capital's current state and future. Notable investments: Workvivo (acquired by Zoom), SignalAI, Finbourne, Lattice, MosaicML (acquired by Databricks), Navan. Go to eu.vc for our core learnings and the full video interview.
This week on Generative Now, Lightspeed Partner and host Michael Mignano talks to Naveen Rao about big bets in his career, hardware to develop AI, and open source as the key to safety. Naveen Rao is the founder of Nervana Systems and MosaicML, which was acquired by Databricks in 2023. He is currently VP of Generative AI at Databricks and is the former vice president and general manager of the Artificial Intelligence Products Group at Intel. Episode Chapters (00:00) Naveen Rao's Background and Career (01:39) Insights on AI and Neuromorphic Computing (05:37) Founding Nervana and Acquisition by Intel (08:54) Transition to MosaicML (17:07) MosaicML's Mission and Growth (19:36) Open Source AI and Safety (23:07) Acquisition by Databricks (24:57) Cultural Alignment and Acquisition Proposal (36:24) Databricks' Role in the AI Landscape (43:50) Closing Thoughts Stay in touch: www.lsvp.com X: https://twitter.com/lightspeedvp LinkedIn: https://www.linkedin.com/company/lightspeed-venture-partners/ Instagram: https://www.instagram.com/lightspeedventurepartners/ Subscribe on your favorite podcast app: generativenow.co Email: generativenow@lsvp.com The content here does not constitute tax, legal, business or investment advice or an offer to provide such advice, should not be construed as advocating the purchase or sale of any security or investment or a recommendation of any company, and is not an offer, or solicitation of an offer, for the purchase or sale of any security or investment product. For more details please see lsvp.com/legal.
MosaicML, co-founded by Jonathan Frankle and Michael Carbin, aims to make AI models more accessible. https://news.mit.edu/2024/mosaicml-helps-nonexperts-build-advanced-generative-ai-models-0621 Katharina "Kat" Schmolly, MD, founded zebraMD, a platform using AI to diagnose and manage rare diseases. https://medicalxpress.com/news/2024-07-ai-powered-tool-doctors-rare.html Generative AI can distort understanding of socio-political reality, says a Google research paper. https://www.404media.co/google-ai-potentially-breaking-reality-is-a-feature-not-a-bug/ Google Translate is expanding its language options with the addition of 110 new languages. https://www.engadget.com/google-uses-ai-to-add-110-new-languages-to-translate-123009750.html Visit www.integratedaisolutions.com
MosaicML, co-founded by Jonathan Frankle and Michael Carbin, aims to make AI models more accessible. https://news.mit.edu/2024/mosaicml-helps-nonexperts-build-advanced-generative-ai-models-0621 Researchers from Kyushu University have developed an AI tool, QDyeFinder, to map neurons. https://medicalxpress.com/news/2024-06-super-brain-wiring-ai-human.html Meta has been criticized for incorrectly labeling photos as "Made with AI" on its platforms. https://techcrunch.com/2024/06/21/meta-tagging-real-photos-made-with-ai/ Meta's Fundamental AI Research (FAIR) team is releasing five AI research models. https://about.fb.com/news/2024/06/releasing-new-ai-research-models-to-accelerate-innovation-at-scale/ Visit www.integratedaisolutions.com
Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/ MLOps Coffee Sessions Special episode with Databricks, Introducing DBRX: The Future of Language Models, fueled by our Premium Brand Partner, Databricks. DBRX is designed to be especially capable at a wide range of tasks and outperforms other open LLMs on standard benchmarks. It also promises to excel at code and math problems, areas where others have struggled. Our panel of experts will get into the technical nuances, potential applications, and implications of DBRX for businesses, developers, and the broader tech community. This session is a great opportunity to hear from insiders about how DBRX's capabilities can benefit you. // Bio Denny Lee - Co-host Denny Lee is a long-time Apache Spark™ and MLflow contributor, Delta Lake maintainer, and a Sr. Staff Developer Advocate at Databricks. A hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale data platforms and predictive analytics systems. He has previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server. Davis Blalock Davis Blalock is a research scientist and the first employee at MosaicML. He previously worked at PocketSonics (acquired 2013) and completed his PhD at MIT, where he was advised by John Guttag. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar. He is also the author of Davis Summarizes Papers, one of the most widely-read machine learning newsletters. Bandish Shah Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems where he helped develop and ship the first RDU systems from the ground up, and Oracle where he worked as an ASIC engineer for SPARC-based enterprise servers. Abhi Venigalla Abhi is an NLP architect working on helping organizations build their own LLMs using Databricks. He joined as part of the MosaicML team and previously worked as a researcher at Cerebras Systems. Ajay Saini Ajay is an engineering manager at Databricks leading the GenAI training platform team. He was one of the early engineers at MosaicML (acquired by Databricks) where he first helped build and launch Composer (an open source deep learning training framework) and afterwards led the development of the MosaicML training platform which enabled customers to train models (such as LLMs) from scratch on their own datasets at scale. Prior to MosaicML, Ajay was co-founder and CEO of Overfit, an online personal training startup (YC S20). Before that, Ajay worked on ML solutions for ransomware detection and data governance at Rubrik. Ajay has both a B.S. and MEng in computer science with a concentration in AI from MIT.
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: https://www.databricks.com/ Databricks DBRX: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Naveen Rao, vice president of generative AI at Databricks, joins a16z's Matt Bornstein and Derrick Harris to discuss enterprise usage of LLMs and generative AI. Naveen is particularly knowledgeable about the space, having spent years building AI chips first at Qualcomm and then as the founder of AI chip startup Nervana Systems back in 2014. Intel acquired Nervana in 2016. After a stint at Intel, Rao re-emerged with MosaicML in 2021. This time, he focused on the software side of things, helping customers train their own LLMs, and also fine-tune foundation models, on top of an optimized tech stack. Databricks acquired Mosaic in July of 2023. This discussion covers the gamut of generative AI topics — from basic theory to specialized chips — although we focus on how the enterprise LLM market is shaping up. Naveen also shares his thoughts on why he prefers finally being part of the technology in-crowd, even if it means he can't escape talking about AI outside of work. More information: LLMs at Databricks, Mosaic Research, More AI content from a16z. Follow everyone on X: Naveen Rao, Matt Bornstein, Derrick Harris. Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
Huge thank you to Databricks AI for sponsoring this episode. Databricks - http://databricks.com/ Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Davis Blalock is a Research Scientist and the first employee of MosaicML, a GenAI startup acquired for $1.3 billion by Databricks. MLOps podcast #219 with Databricks' Engineering Manager, Bandish Shah, and Research Scientist Davis Blalock, The Art and Science of Training Large Language Models. // Abstract What's hard about language models at scale? Turns out...everything. MosaicML's Davis and Bandish share war stories and lessons learned from pushing the limits of LLM training and helping dozens of customers get LLMs into production. They cover what can go wrong at every level of the stack, how to make sure you're building the right solution, and some contrarian takes on the future of efficient models. // Bio Bandish Shah Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems where he helped develop and ship the first RDU systems from the ground up, and Oracle where he worked as an ASIC engineer for SPARC-based enterprise servers. Davis Blalock Davis Blalock is a research scientist at MosaicML. He completed his PhD at MIT, advised by Professor John Guttag. His primary work is designing high-performance machine learning algorithms. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: http://databricks.com/ Davis Summarizes Papers Newsletter signup link Davis' Newsletters: Learning to recognize spoken words from five unlabeled examples in under two seconds: https://arxiv.org/abs/1609.09196 Training on data at 5GB/s in a single thread: https://arxiv.org/abs/1808.02515 Nearest-neighbor searching through billions of images per second in one thread with no indexing: https://arxiv.org/abs/1706.10283 Multiplying matrices 10-100x faster than a matrix multiply (with some approximation error): https://arxiv.org/abs/2106.10860 Hidden Technical Debt in Machine Learning Systems: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Davis on LinkedIn: https://www.linkedin.com/in/dblalock/ Connect with Bandish on LinkedIn: https://www.linkedin.com/in/bandish-shah/
Jonathan Frankle works as Chief Scientist (Neural Networks) at MosaicML (recently acquired by Databricks), a startup dedicated to making it easy and cost-effective for anyone to train large-scale, state-of-the-art neural networks. He leads the research team. MLOps podcast #205 with Jonathan Frankle, Chief Scientist (Neural Networks) at Databricks, The Myth of AI Breakthroughs, co-hosted by Denny Lee, brought to us by our Premium Brand Partner, Databricks. // Abstract Jonathan takes us behind the scenes of the rigorous work they undertake to test new knowledge in AI and to create effective and efficient model training tools. With a knack for cutting through the hype, Jonathan focuses on the realities and usefulness of AI and its application. We delve into issues such as face recognition systems, the 'lottery ticket hypothesis,' and robust decision-making protocols for training models. Our discussion extends into Jonathan's interesting move into the world of law as an adjunct professor, the need for healthy scientific discourse, his experience with GPUs, and the amusing claim of a revolutionary algorithm called Qstar. // Bio Jonathan Frankle is Chief Scientist (Neural Networks) at Databricks, where he leads the research team toward the goal of developing more efficient algorithms for training neural networks. He arrived via Databricks' $1.3B acquisition of MosaicML as part of the founding team. He recently completed his PhD at MIT, where he empirically studied deep learning with Prof. Michael Carbin, specifically the properties of sparse networks that allow them to train effectively (his "Lottery Ticket Hypothesis" - ICLR 2019 Best Paper). In addition to his technical work, he is actively involved in policymaking around challenges related to machine learning. He earned his BSE and MSE in computer science at Princeton and has previously spent time at Google Brain and Facebook AI Research as an intern and Georgetown Law as an Adjunct Professor of Law. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: www.jfrankle.com Facial recognition: perpetuallineup.org The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks by Jonathan Frankle and Michael Carbin, paper: https://arxiv.org/abs/1803.03635 --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Denny on LinkedIn: https://linkedin.com/in/dennyglee Connect with Jonathan on LinkedIn: https://www.linkedin.com/in/jfrankle/ Timestamps: [00:00] Jonathan's preferred coffee [01:16] Takeaways [07:19] LM Avalanche Panel Surprise [10:07] Adjunct Professor of Law [12:59] Low facial recognition accuracy [14:22] Automated decision making human in the loop argument [16:09] Control vs.
Outsourcing Concerns [18:02] perpetuallineup.org [23:41] Face Recognition Challenges [26:18] The lottery ticket hypothesis [29:20] Mosaic Role: Model Expertise [31:40] Expertise Integration in Training [38:19] SLURM opinions [41:30] GPU Affinity [45:04] Breakthroughs with QStar [49:52] Deciphering the noise advice [53:07] Real Conversations [55:47] How to cut through the noise [1:00:12] Research Iterations and Timelines [1:02:30] User Interests, Model Limits [1:06:18] Debugability [1:08:00] Wrap up
My guest this month is Naveen Rao, the co-founder of MosaicML and current head of Generative AI at Databricks. Naveen's journey is unique, as it echoes the evolution of AI itself. He's best known for founding and selling two successful companies. The first, Nervana, an AI-focused chip company, was acquired by Intel for $400 million in 2016. The second, MosaicML, was acquired by Databricks in June for $1.3 billion. In our conversation, we unpack the insights and frameworks that led Naveen to make these bets in the first place. We begin by exploring his long history in AI research and startups, from his early days at Qualcomm, his founding of Nervana, and the genesis of MosaicML. We then turn to the complexities of the ever-changing AI landscape and go behind the scenes of MosaicML's acquisition by Databricks. We close with Naveen's takes on the most urgent questions in AI, including the recent tumult at OpenAI, the road to AGI, the role of regulation, and where he thinks generative AI will go next. What struck me most is Naveen's remarkable ability to not only anticipate the future but also actively pave the path toward it. For founders looking to navigate our current AI moment, this episode is full of valuable lessons.
Join us with Hagay Lupesko, VP of Engineering at MosaicML/Databricks, as we dive into the rapidly evolving world of large language models (LLMs). We'll discuss the latest innovations and challenges in the field, with a spotlight on MosaicML's unique contributions. Additionally, we explore how MosaicML's strategies compare with those of industry giants like OpenAI/Microsoft, Anthropic/AWS, and open-source initiatives such as Meta's LLaMA-2. Hagay provides expert insights into the varied approaches driving AI's future, making this a crucial listen for anyone interested in understanding the trends and potentials of large language models. --- Send in a voice message: https://podcasters.spotify.com/pod/show/tonyphoang/message Support this podcast: https://podcasters.spotify.com/pod/show/tonyphoang/support
Long time no see! OnBoard! is back! This episode is another all-English interview that will put your listening skills to the test. This time, we talk with people who were directly involved in one of this year's most closely watched billion-dollar deals in AI: in June, Databricks acquired MosaicML, a large language model (LLM) infrastructure startup that was just two years old at the time, for a "sky-high" $1.3 billion. Hello World, who is OnBoard!? This was arguably one of the most closely watched AI acquisitions of the year, and it pushed both giants and startups in AI infrastructure into a wave of fundraising and product iteration. At the time of the acquisition, MosaicML had only a bit more than 60 people, but it had already released two open-source large language models, MPT-7B and MPT-30B, with more than 3.3 million total downloads, making it one of the earliest companies to ship open-source LLMs. For this conversation, Monica invited not only Hanlin Tang, co-founder and CTO of MosaicML, but also Casber Wang, Partner at Sapphire Ventures, a seasoned Silicon Valley growth-stage investor and a previous OnBoard! guest. From both the founder's and the investor's perspective, we unpack this landmark acquisition and have a fascinating discussion about generative AI, the core competitive advantages of AI infra, the future landscape, and more. Get your English listening ready, and enjoy! About the guests: Hanlin Tang (@hanlintang), currently CTO of Neural Networks at Databricks; previously MosaicML co-founder & CTO, Intel AI Lab Senior Director, and a founding team member of Nervana (acquired by Intel for $400 million in September 2016). Casber Wang (@CasberW), Partner at Sapphire Ventures, focused on data, infra, and security, with investments including Auth0, JumpCloud, and StarTree. Sapphire Ventures is a global venture capital firm with more than $11 billion in assets under management. OnBoard! host: Monica (Twitter: @Monica_XieY): USD-fund VC investor, formerly with AWS's Silicon Valley team and an AI startup; runs the WeChat account M小姐研习录 (ID: MissMStudy) | Jike: 莫妮卡同学. What we talked about: [01:57] Guest introductions; AI products Hanlin has found interesting lately, and Casber's recent AI investments [08:40] How the arrival of LLMs affects traditional MLOps companies [14:13] Introduction to MosaicML and how the founders spotted the opportunity in 2021 [17:28] Key milestones for MosaicML since the release of ChatGPT, and major changes in the industry [21:41] Who were MosaicML's early customers? [22:55] Key product decisions at MosaicML: why open source from the start? [26:43] How should startups think about whether to open source? [34:36] How MosaicML commercialized open source, and how the business model evolved [37:17] How to think about product differentiation in LLM serving and training platforms, and how MosaicML's platform evolved through exploration [42:32] Why did MosaicML release its own open-source LLMs? [45:39] How do customers choose an LLM? Common questions and common misconceptions [51:11] How will the open-source vs. closed-source LLM landscape evolve, and how will it shape the surrounding ecosystem? What can we learn from the history of cloud computing? [58:39] How do customers measure LLM performance? What kind of service do they need? [62:06] How will LLMs change the SaaS ecosystem? Why do we underestimate what data means for new platforms? [70:52] Why there is no silver bullet for improving LLM training efficiency and cost, and how cost-optimization companies should design their business models [82:00] How should enterprises decide between training their own LLM and fine-tuning an existing model? [89:23] What changes after the Databricks acquisition of MosaicML, and what is the next focus? [91:32] As an investor, how does Casber view Databricks' acquisition of MosaicML and its impact on the industry? [95:44] How have Hanlin's views on the industry changed since founding the company? What past and future shifts are worth watching? [108:52] If you could take a time machine five years into the future, what question would you most want to ask? Companies mentioned: MosaicML, Databricks, Weights & Biases, FlowGPT. References: www.mosaicml.com www.youtube.com www.latent.space blog.replit.com www.databricks.com www.latent.space Follow Monica's WeChat account M小姐研习录 (ID: MissMStudy) for more insights on software, AI, and venture investing in China and the U.S.! Your likes, comments, and shares are the best encouragement for us! If you give us a like on Xiaoyuzhou or a five-star review on Apple Podcasts, more people will see the content we work hard to produce; buy us a coffee and we'll send a heart your way!
Startup Field Guide by Unusual Ventures: The Product Market Fit Podcast
MosaicML is the developer of open source infrastructure for training LLMs. The company was acquired by Databricks for $1.3 billion in July 2023 and has gone from 0 to over $30M in revenue this year in just 6 months. In this episode, Sandhya Hegde and Wei Lien Dang chat with Naveen Rao, co-founder of MosaicML and now the head of Generative AI at Databricks. Join us as we discuss: 00:00 Preview: Future of foundation model companies 2:16 How Naveen's previous experiences led to MosaicML 7:29 The core insight behind the founding of MosaicML 9:52 MosaicML's approach to building an end-to-end platform 12:09 Why MosaicML focused on open models and LLMs 14:25 Why most foundation model companies will fail 15:52 How MosaicML found early adopters 18:14 Early use cases for MosaicML's product 21:27 Impact of early feedback on MosaicML's product roadmap 25:21 Why Naveen decided to move ahead with the Databricks acquisition 31:44 How Naveen sees the AI ecosystem evolving 34:08 Regulation of AI and the importance of open source 41:15 Advice for founders building AI infrastructure Sandhya Hegde is a General Partner at Unusual Ventures, leading investments in modern SaaS companies with a focus on AI. Previously an early executive at Amplitude, Sandhya is a product-led growth (PLG) coach and mentor. Wei Lien Dang is a General Partner at Unusual Ventures and leads investments in infrastructure software, security, and developer tools. Wei was a co-founder of StackRox, a cloud-native security company, prior to its acquisition by Red Hat. Naveen Rao is the co-founder of MosaicML (now a Databricks company) and currently the head of Generative AI at Databricks. Unusual Ventures is a seed-stage venture capital firm designed from the ground up to give a distinct advantage to founders building the next generation of software companies. Unusual has invested in category-defining companies like Webflow, Arctic Wolf Networks, Carta, Robinhood, and Harness. Learn more about us at https://www.unusual.vc/. Further reading from Unusual Ventures: Starting an open source company Open source customer development
We were delighted to kick off the 2nd Cerebral Valley AI Summit with Ali Ghodsi, CEO of Databricks, and Naveen Rao, co-founder of MosaicML. Their encounter at our debut event in March led to Ghodsi buying Rao's company, which had little revenue, for $1.3 billion. At our event on Nov. 15, the two discussed how the deal came together quickly after meeting at the conference dinner. Thousands of enterprises around the world rely on Oracle Cloud Infrastructure (OCI) to power applications that drive their businesses. OCI customers include leaders across industries, such as healthcare, scientific research, financial services, telecommunications, and more. NVIDIA DGX Cloud on OCI is an AI training-as-a-service platform for customers to train complex AI models like generative AI applications. Included with DGX Cloud, NVIDIA AI Enterprise brings the software layer of the NVIDIA AI platform to OCI. Talk with Oracle about accelerating your GPU workloads. Ghodsi recounted how he started spending some time with Rao and thought, “these guys are pretty good,” and then by chance noticed an employee he respected poking around with MosaicML and offering a strong endorsement. Soon Ghodsi was on the phone with the head of his deals team, who told him “if you want to buy these guys you have to do it this weekend.” Rao said by that point “you kind of know he's going to pop the question,” and once they worked out the money, the deal was done. The two executives certainly seemed to be in harmony as they touted the potential benefits from their combination, which in simple terms will bring MosaicML's expertise in building specialized generative AI models to Databricks' corporate data platform products, essentially super-charging Databricks for the generative AI era. They were eager to defend the idea of open-source foundation models that are specific to certain tasks, rejecting the notion that general-purpose models like ChatGPT-4 will eventually swallow everything. (This conversation took place before OpenAI was thrown into chaos by its board of directors.) Ghodsi said calls to limit open-source models on the grounds that they'll be too easily exploited by bad actors are a “horrible, horrendous” idea that would “put a stop to all innovation.” “It's essential that we have an open-source ecosystem,” he said, noting that even now it's unclear how a lot of AI models work, and open-source research will be critical to answering those questions. Rao added that many of the people making predictions about how AI would develop are “full of s**t.” On the safety question, he noted that cost alone would stand in the way of any existential risks for a long time, and in the meantime the focus should be on real threats like disinformation and robot safety. Give it a listen. Get full access to Newcomer at www.newcomer.co/subscribe
The Sunday Times' tech correspondent Danny Fortson brings on Naveen Rao, founder of MosaicML, to talk about the efficiency of the human brain (3:30), slashing the cost to train AI models (8:00), how this is like the evolution of the car (11:40), selling to Databricks (14:30), how the AI market will evolve (17:00), the fallacy of AI doomerism (21:00), growing up in eastern Kentucky (22:30), plunging into the dotcom boom (24:50), why he studied neuroscience (27:10), selling his previous startup to Intel (32:00), solving intelligence (34:40), and what he tells his kids about the future (40:00). Hosted on Acast. See acast.com/privacy for more information.
Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion. MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition. In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of Databricks, and the importance of responsible AI practices. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
In episode 96 of The Gradient Podcast, Daniel Bashir speaks to Jonathan Frankle. Jonathan is the Chief Scientist at MosaicML and, as of this episode's release, at Databricks, which acquired MosaicML. Jonathan completed his PhD at MIT, where he investigated the properties of sparse neural networks that allow them to train effectively through his lottery ticket hypothesis. He also spends a portion of his time working on technology policy, and currently works with the OECD to implement the AI principles he helped develop in 2019. Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter. Outline: * (00:00) Intro * (02:35) Jonathan's background and work * (04:25) Origins of the Lottery Ticket Hypothesis * (06:00) Jonathan's empiricism and approach to science * (08:25) More Karl Popper discourse + hot takes * (09:45) Walkthrough of the Lottery Ticket Hypothesis * (12:00) Issues with the Lottery Ticket Hypothesis as a statement * (12:30) Jonathan's advice for PhD students, on asking good questions * (15:55) Strengths and Promise of the Lottery Ticket Hypothesis * (18:55) More Lottery Ticket Hypothesis Papers * (19:10) Comparing Rewinding and Fine-tuning * (23:00) Care in making experimental choices * (25:05) Linear Mode Connectivity and the Lottery Ticket Hypothesis * (27:50) On what is being measured and how * (28:50) “The outcome of optimization is determined to a linearly connected region” * (31:15) On good metrics * (32:54) On the Predictability of Pruning Across Scales — scaling laws for pruning * (34:40) The paper's takeaway * (38:45) Pruning Neural Networks at Initialization — on a scientific disagreement * (45:00) On making takedown papers useful * (46:15) On what can be known early in training * (49:15) Jonathan's perspective on important research questions today * (54:40) MosaicML * (55:19) How Mosaic got started * (56:17) Mosaic highlights * (57:33) Customer stories * (1:00:30) Jonathan's work and perspectives on AI policy * (1:05:45) The key question: what we want * (1:07:35) Outro Links: * Jonathan's homepage and Twitter * Papers * The Lottery Ticket Hypothesis and follow-up work * Comparing Rewinding and Fine-tuning in Neural Network Pruning * Linear Mode Connectivity and the LTH * On the Predictability of Pruning Across Scales * Pruning Neural Networks at Initialization: Why Are We Missing The Mark? * Desirable Inefficiency Get full access to The Gradient at thegradientpub.substack.com/subscribe
AI Applied: Covering AI News, Interviews and Tools - ChatGPT, Midjourney, Runway, Poe, Anthropic
Discover how OpenAI's rival, MosaicML, has been acquired for a staggering $1.3 billion by Databricks, reshaping the AI landscape. In this episode, we dive into the implications of this monumental deal and explore what the future holds for MosaicML as it joins forces with a major player in the tech industry. Join us as we unravel the emergence of a new AI powerhouse. Get on the AI Box Waitlist: https://AIBox.ai/ Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/ Follow me on Twitter: https://twitter.com/jaeden_ai
ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
In this episode, we delve into the world of AI competition and acquisitions as we explore MosaicML's astonishing $1.3 billion deal with Databricks, positioning itself as a formidable rival to OpenAI. We'll uncover the strategic moves, implications, and the future of the AI landscape as these tech giants make groundbreaking acquisitions in the pursuit of AI supremacy. Join us for an insightful discussion on the evolving dynamics of the AI industry. Get on the AI Box Waitlist: https://AIBox.ai/ Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/ Follow me on Twitter: https://twitter.com/jaeden_ai
In the latest Drinks With The Deal Podcast, Databricks general counsel Trâm Phi discusses the company's $1.3 billion purchase of MosaicML, the legal uncertainty around AI and the challenges of managing a growing legal department.
Our guest today is Davis Blalock, Research Scientist and first employee of MosaicML, a startup which was recently acquired by Databricks for an astonishing $1.3 billion. In our conversation, we first talk about Davis' PhD at MIT and his research on making algorithms more efficient. Davis then explains how and why he joined Mosaic and shares the story behind the company. He dives into the product and how they evolved from focusing on deep learning algorithms to generative AI and large language models. If you enjoyed the episode, please leave a 5 star review and subscribe to the AI Stories Youtube channel. Follow Davis on LinkedIn: https://www.linkedin.com/in/dblalock/ Follow Neil on LinkedIn: https://www.linkedin.com/in/leiserneil/ ———— (00:00) - Intro (01:40) - How Davis entered the world of Data and AI (03:30) - Enhancing ML algorithms' efficiency (12:50) - Importance of efficiency (16:37) - Choosing MosaicML over starting his own startup (25:30) - What is MosaicML? (37:34) - How did the rise of LLMs aid MosaicML's growth? (46:54) - $1.3 billion acquisition by Databricks (48:52) - Learnings and failures from working in a startup (01:00:05) - Career advice
AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs
In this episode, dive into the remarkable journey of MosaicML, a notable OpenAI competitor, as it sells for a staggering $1.3 billion to tech giant Databricks. Discover the strategic moves, innovations, and market dynamics that led to this groundbreaking acquisition, reshaping the landscape of AI competition and innovation. Join us as we unpack the implications of this milestone in the world of artificial intelligence and its impact on the industry's future. Get on the AI Box Waitlist: https://AIBox.ai/ Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/ Follow me on Twitter: https://twitter.com/jaeden_ai
Explore AI's Latest Breakthroughs and Funding! Learn about Databricks' $500M funding, Generate:Biomedicines' expansion plans, SQream's GPU tech, Google's "Gemini" rival to GPT-4, and Adobe's commercial launch of Firefly AI. Plus, discover how these innovations are shaping the future of technology. Key takeaways: Databricks' $500 million Series I funding round, led by T. Rowe Price Associates, Inc., values the data lakehouse platform at $43 billion, reflecting its strong Q2 performance and acquisition of MosaicML. Google's "Gemini" aims to compete with OpenAI's ChatGPT, featuring robust language models for diverse applications, signaling Google's increased investment in generative AI. Generate:Biomedicines secures $237 million in Series C funding, enabling expansion of its generative AI pipeline and acceleration of clinical trials using machine learning. SQream's $45 million Series C funding, led by World Trade Ventures, will fuel North American expansion and enhance AI/ML enterprise capabilities in the big data and analytics markets. Adobe's Firefly AI transitions from beta to commercial availability, offering generative AI tools for image, video, and text content generation across its Creative Cloud platform. Quotes: "Databricks' valuation soars to $43 billion as it successfully closes a $500 million Series I funding round," - Reflecting the company's strong Q2 performance and investor confidence. "Generate:Biomedicines plans to expand its generative AI pipeline and launch clinical trials annually," - Highlighting the company's growth strategy following a $237 million Series C funding round. "SQream's Chief Revenue Officer, Deborah Leff, emphasizes GPU technology's compatibility with AI and data architectures," - Underlining the advantages of SQream's patented GPU technology. "Google aims to establish 'Gemini' as a competitor to OpenAI's GPT-4," - Indicating Google's ambition to rival OpenAI in the generative AI space. "Adobe officially launches Firefly AI, extending its generative capabilities to all Creative Cloud users," - Announcing the transition of Firefly AI from beta to commercial availability. ____ More from Edge of AI
Nathan Labenz sits down with Dr. Ronen Dar, CTO and co-founder of Run:ai, an Israel-based company that helps enterprises train and deploy AI models by optimizing GPU usage. The discussion covers how chip makers can meet soaring demand, geopolitical fears, and the best practices companies use to secure compute capacity. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive RECOMMENDED PODCAST: The HR industry is at a crossroads. What will it take to construct the next generation of incredible businesses – and where can people leaders have the most business impact? Hosts Nolan Church and Kelli Dragovich have been through it all, the highs and the lows – IPOs, layoffs, executive turnover, board meetings, culture changes, and more. With a lineup of industry vets and experts, Nolan and Kelli break down the nitty-gritty details, trade-offs, and dynamics of constructing high performing companies. Through unfiltered conversations that can only happen between seasoned practitioners, Kelli and Nolan dive deep into the kind of leadership-level strategy that often happens behind closed doors. Check out the first episode with the architect of Netflix's culture deck Patty McCord. https://link.chtbl.com/hrheretics RECOMMENDATION: The AI Scouting Report Playlist Parts 1-3: https://www.youtube.com/watch?v=0hvtiVQ_LqQ&list=PLVfJCYRuaJIXooK_KWju5djdVmEpH81ee TIMESTAMPS: (00:00) Episode Preview (00:51) Introduction to Dr. Ronen Dar (03:30) Run:ai's technology and what differentiates it from other solutions (06:00) Today's market compared to when Dr. Ronen started five years ago (13:40) Run:ai on market competitors like MosaicML (14:55) Sponsors: NetSuite | Omneky (22:00) The process and best practices by which companies secure compute capacity (25:00) Dr. Ronen explains the GPU shortage (31:50) GPU solutions (36:00) Relative pricing across major providers (41:00) What other chip makers are going to be relevant? (49:00) Global outlook for chip production (52:45) Worldview around the US-China AI race (58:00) Can controls on hardware actually control access to AI? LINKS: https://www.run.ai/ SOCIAL MEDIA: @labenz (Nathan) @ronen_dar @runailabs (Run:ai) @cogrev_podcast SPONSORS: NetSuite | Omneky - NetSuite provides financial software for all your business needs. More than thirty-six thousand companies have already upgraded to NetSuite, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform -> NetSuite: http://netsuite.com/cognitive and defer payments of a FULL NetSuite implementation for six months. - Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that *actually work* customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
Data drives the information economy, but text and images provide crucial context. That's why storytelling has become so prominent in the world of AI and analytics. By harmonizing the use of relevant data with ideal descriptions, organizations can keep customers happy, partners engaged, and employees motivated. Check out this episode of DM Radio to hear two industry visionaries share their thoughts on how to thrive in this new, AI-enabled world! Host @eric_kavanagh will interview Naveen Rao, CEO of MosaicML, and Gur Steif, President, Digital Business Automation at BMC Software. They'll discuss how the nexus of data, automation and AI is fundamentally rewriting the rules of modern business.
Big Sky Capital launches $20M fund for enterprise SaaS startups, FedML secures $11.5M to combine MLOps tools with decentralized AI compute network, Helsing AI develops software for real-time battlefield analysis, AI21 Labs launches Contextual Answers AI engine for enterprise data, Google develops AI news-writing tool, BBC reveals design for Fourteenth Doctor's Sonic Screwdriver, US proposes new rules to examine mergers by online platforms, TikTok tests subscription-based music streaming service, Microsoft introduces virtual makeup filters to Teams, and MosaicML unveils open-source large language model MPT-7B-8K.
In this episode, we dive deep into Databricks' recent acquisition of MosaicML, a staggering $1.3 billion transaction that aims to democratize AI and bring its services to, well... everyone? We'll explore their before and after messaging, the key elements of category strategy in play, and the role of AI in shaping today's software messaging landscape. We also delve into the challenges of positioning AI in an industry that's rapidly expanding and evolving, and worries over reverse differentiation. Join us as we discuss the potential fracturing of AI models and what this could mean for the future of AI in B2B SaaS as the potential looms for it to be incorporated into tech stacks worldwide. We speculate on Databricks' future strategies, including the buzz around a possible public offering to compete against OpenAI in the future. With plenty playing out in the AI space, join this squad of seasoned tech veterans, who are also curious about AI's explosive growth, and what it means for all of us. Get all that and more on this week's episode of The SaaS Brand Strategy Show. About DRMG: SaaS Brand Strategy (SBS) isn't about the colors you use, or the typeface you choose. It's about the category you design and the story you tell. DRMG exists to help SaaS businesses find their magic bullet, load it, and fire it into the market. The companies we work with come out the other side with differentiation, defined categories, and the messaging to back it up. They're organizationally aligned, inspired, and ready to tell a better story—and win. Own the brand that drives demand. With DRMG. Send us an email at: hi@drmg.co Learn more at: drmg.co
This week we had a very special guest on the podcast: Matthew Lynley, one of the founding hosts of Equity and a former TechCruncher. Since his Equity days, Lynley went off and started his very own AI-focused publication called Supervised. We brought him back on the show to ask him questions in a format where we can all learn together. Here's what we got into: From Transformers to GPT-4: How attention became so critical inside of neural networks, and how transformers set the path for modern AI services. Recent acquisitions in the AI space, and what it means for the “LLM stack”: With Databricks buying MosaicML and Snowflake already busy with its own checkbook, a lot of folks are working to build out a full-stack LLM data extravaganza. We talked about what that means. Where startups sit in the current AI race: While it's great to think about the majors, we also need to know what the startup angle is. The answer? It's a little early to say, but what is clear is that startups are taking some big swings at the industry and are hellbent to snag a piece of the pie. Thanks to everyone for hanging out with us. Equity is back on Friday for our weekly news roundup! For episode transcripts and more, head to Equity's Simplecast website. Equity drops at 7 a.m. PT every Monday, Wednesday and Friday, so subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts. TechCrunch also has a great show on crypto, a show that interviews founders, one that details how our stories come together and more!
Startup MosaicML is on a mission to help the AI community enhance prediction accuracy, decrease costs, and save time by providing tools for easy training and deployment of large AI models. In this episode of NVIDIA's AI Podcast, host Noah Kravitz speaks with MosaicML CEO and co-founder Naveen Rao about how the company aims to democratize access to large language models. MosaicML, a member of NVIDIA's Inception program, has identified two key barriers to widespread adoption: the difficulty of coordinating a large number of GPUs to train a model and the costs associated with this process. Making model training accessible is key for many companies that need control over model behavior, respect data privacy, and iterate fast to develop new products based on AI.
MosaicML's VP Of Engineering, Hagay Lupesko, joins us today to discuss generative AI! We talk about how to use existing models as well as ways to finetune these models to a particular task or domain.
00:01:28 Introductions
00:02:09 Hagay's circuitous career journey
00:08:25 Building software for large factories
00:17:30 The reality of new technologies
00:28:10 AWS
00:29:33 Pytorch's leapfrog advantage
00:37:24 MosaicML's mission
00:39:29 Generative AI
00:44:39 Giant data models
00:57:00 Data access tips
01:10:31 MPT-7B
01:27:01 Careers in Mosaic
01:31:46 Farewells
Resources mentioned in this episode:
Join the Programming Throwdown Patreon community today: https://www.patreon.com/programmingthrowdown?ty=h
Subscribe to the podcast on Youtube: https://www.youtube.com/@programmingthrowdown4793
Links:
Hagay Lupesko: Linkedin: https://www.linkedin.com/in/hagaylupesko/ Twitter: https://twitter.com/hagay_lupesko Github: https://github.com/lupesko
MosaicML: Website: https://www.mosaicml.com/ Careers: https://www.mosaicml.com/careers Twitter: https://twitter.com/MosaicML Linkedin: https://www.linkedin.com/company/mosaicml/
Others: Amp It Up (Amazon): https://www.amazon.com/Amp-Unlocking-Hypergrowth-Expectations-Intensity/dp/1119836115 Hugging Face Hub: https://huggingface.co/
If you've enjoyed this episode, you can listen to more on Programming Throwdown's website: https://www.programmingthrowdown.com/
Reach out to us via email: programmingthrowdown@gmail.com
You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM | Youtube
Join the discussion on our Discord
Help support Programming Throwdown through our Patreon
★ Support this podcast on Patreon ★
Databricks acquires MosaicML, SpaceX $150b valuation and monopoly, Robinhood acquires X1, Anduril acquires Adranos, Shein competes with Amazon | Pre-IPO Stock Market Update - Jun 30, 2023
00:30 | Databricks acquires MosaicML
- Databricks acquires MosaicML for $1.3 billion, aiming to bolster its machine learning sector
- Deal enhances Databricks' offerings, underscoring the value and potential growth of AI
01:07 | SpaceX $150b valuation, de facto monopoly per WSJ
- SpaceX completes tender at $150b valuation, $84/share, a 9% increase over its last tender at $137b valuation in Jan 2023, 6 months ago
- WSJ says SpaceX has a de facto monopoly on rocket launches
03:12 | Robinhood acquires X1
- Robinhood is acquiring credit card startup X1 for $95m, marking its first entry into the banking sector
- Acquisition paves the way for a new “securities based line of credit” credit card product from Robinhood, potentially featuring lower interest rates backed by brokerage account securities as collateral
03:59 | Anduril acquires Adranos
- Anduril acquired solid rocket motor manufacturer Adranos
- Acquisition includes Adranos' advanced manufacturing process and proprietary fuel technology ALITEC, which could increase rocket motor range by 40% and cut costs
05:25 | Shein competes with Amazon
- Fast-fashion retailer Shein is transforming into a marketplace platform
- Aims to directly compete with Amazon and PDD's Temu, allowing 3rd party vendors to sell various products
07:09 | Big capital raises
- Sunwoda EVB | $232m Series B, $5.0b valuation
- Hithium | $629m Series C, $4.2b valuation
- Inflection | $1.3b Series B, $4.0b valuation
- Aledade | $260m Series F, $3.5b valuation
- Cart.com | $58m Series C, $1.3b valuation
- 1Komma5 | $232m Series B, $991m valuation
- Unit21 | $45m Series C, $700m valuation
- 3vjia Technology | --- Series D, $600m valuation
- Redpanda | $100m Series C, $520m valuation
- Cyera | $100m Series B, $500m valuation
08:58 | Pre-IPO +0.76% for week, S&P 500 +2.35%
- YTD pre-IPO stocks still trail the S&P by 30%
- Year to date: Databricks +14.42%, Deel +3.99%. Stripe, Discord -30% approx. Epic Games, Flexport, Chime, Brex all -40% approx.
- This week: Databricks +11.25%. Revolut +5.65%. Discord -3.76%.
Weekly newsletter #137 → https://net1us.substack.com/p/weekly-newsletter-137 ・Google to freeze development of its AR glasses ・Databricks acquires generative AI startup MosaicML for $1.3 billion ・Doordash to introduce hourly pay for drivers ・Nvidia and Salesforce to invest in AI video startup Runway --- Send in a voice message: https://podcasters.spotify.com/pod/show/net1us/message
This week we discuss RHEL licensing changes, check the vibe of DevOps and some thoughts on programing language. Plus, has ChatGPT already become boring? Runner-up Titles I don't like listening to fellow thought leaders. I listen to myself enough. Dammit, alarm was set for PM A massive failure of one The end of free It's not all smiles and thumbs Goose-cow “I used to, but I don't anymore.” The Podcast Review podcast. Rundown RHEL Furthering the evolution of CentOS Stream (https://www.redhat.com/en/blog/furthering-evolution-centos-stream) Red Hat strikes a crushing blow against RHEL downstreams (https://www.theregister.com/2023/06/23/red_hat_centos_move/) IBM/Red Hat Sparks Anger at GPL ‘breach' as RHEL Source Locked Up (https://devops.com/rhel-gpl-richixbw/) Rocky Strikes Back At Red Hat (https://hackaday.com/2023/06/30/rocky-strikes-back-at-red-hat/) The Suicide Attempt by Red Hat [Opinion] (https://news.itsfoss.com/red-hat-fiasco/) Rant about Red Hat's Licensing Change for REHL (https://youtube.com/watch?v=4fAq6AphRn0&feature=share) Reddit Reddit CEO tells employees that subreddit blackout “will pass” (https://www.theverge.com/2023/6/13/23759559/reddit-internal-memo-api-pricing-changes-steve-huffman) Apollo's Christian Selig explains his fight with Reddit — and why users revolted (https://www.theverge.com/2023/6/13/23759180/reddit-protest-private-apollo-christian-selig-subreddit) Reddit doubles down (https://www.platformer.news/p/reddit-doubles-down?utm_medium=email) Hackers threaten to leak 80GB of confidential data stolen from Reddit (https://techcrunch.com/2023/06/19/hackers-threaten-to-leak-80gb-of-confidential-data-stolen-from-reddit) DevOps Second Wave DevOps (https://www.systeminit.com/blog-second-wave-devops/) Kelsey Hightower Predicts How the Kubernetes Community Will Evolve (https://thenewstack.io/kelsey-hightower-predicts-how-the-kubernetes-community-will-evolve/) Kelsey Hightower Retires (https://twitter.com/kelseyhightower/status/1673366087541600256?s=20) Even the best rides come to an end featuring Kelsey Hightower (https://changelog.com/friends/6) (Podcast) Stack Overflow Developer Survey 2023 (https://survey.stackoverflow.co/2023/) Relevant to your Interests AWS teases mysterious mil-spec ‘Snowblade' server (https://www.theregister.com/2023/06/07/aws_snowblade_military_edge_server/) To fill offices, Google issues ultimatum while Salesforce tries charity (https://www.washingtonpost.com/business/2023/06/08/google-salesforce-return-to-office/) Amazon is pursuing 'too many ideas' and needs to focus on best opportunities (https://www.cnbc.com/2023/06/07/amazon-is-pursuing-too-many-ideas-bernstein-says-in-open-letter.html) There are better places for Amazon to put their capital to work, says Bernstein's Mark Shmulik (https://www.youtube.com/watch?v=j9Z2HeYkl4c) The best password managers for 2023 | Engadget (https://www.engadget.com/best-password-manager-134639599.html?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAIYHiHrsIv_lVu8RNqY46BjFzlgU4pFDBXmk1gQxq2wlQOz02b5tuepColb1KJFoYYwQVWy2SjTUKWVY2oAEMzfkYXlXs97_PE0gpwNUA4RjnDwE_YEm7FB323M9oOBQJNHboj1t77QC9HriDL8cJP-VcplJ5UlJvvwHZRzMn9PC) After a Rocky Year, Zuckerberg Lays Out Meta's Road Map to Employees (https://www.nytimes.com/2023/06/08/technology/mark-zuckerberg-meta.html) Hybrid combines the worst of office and remote work (https://world.hey.com/dhh/hybrid-combines-the-worst-of-office-and-remote-work-d3174e50) Twilio to sell ValueFirst business to Tanla (NYSE:TWLO) 
(https://seekingalpha.com/news/3978773-twilio-to-sell-valuefirst-business-to-tanla) Jeff Bezos Has Gained $10 on Mystery Purchase of One Amazon Share (https://www.bloomberg.com/news/articles/2023-06-09/billionaire-jeff-bezos-just-bought-one-share-of-amazon-and-no-one-knows-why#xj4y7vzkg) CNET's Free Shopping Extension Saves You Time and Money. Give It a Try Today (https://www.cnet.com/tech/services-and-software/use-cnet-shopping-to-seek-out-the-best-deals/) Modular: Our launch & what's next (https://www.modular.com/blog/our-launch-whats-next) Exclusive-Broadcom set to win EU nod for $61 billion VMware deal, sources say (https://finance.yahoo.com/news/exclusive-eu-antitrust-regulators-okay-091426470.html) Amazon is reportedly trying to offer Prime subscribers free cell phone service | Engadget (https://www.engadget.com/amazon-is-reportedly-trying-to-offer-prime-subscribers-free-cell-phone-service-140026387.html) Cloud cost management startup CloudZero lands $32M investment (https://techcrunch.com/2023/06/12/cloud-cost-management-startup-cloudzero-lands-32m-investment/) Twitter stiffs Google (https://www.platformer.news/p/twitter-stiffs-google) Open Sourcing AWS Cedar Is a Game Changer for IAM (https://thenewstack.io/open-sourcing-aws-cedar-is-a-game-changer-for-iam/) Oracle beats on top and bottom lines as cloud revenue jumps (https://www.cnbc.com/2023/06/12/oracle-orcl-q4-earnings-report-2023.html) America to halt $68.7bn Microsoft takeover of Activision Blizzard (https://www.thetimes.co.uk/article/america-to-halt-68-7bn-microsoft-takeover-of-activision-blizzard-d80jvxm6f) Meta's Open-Source 'MusicGen' AI Is Like ChatGPT for Tunes (https://gizmodo.com/meta-open-source-musicgen-ai-like-chatgpt-for-music-1850528986) Google's return-to-office crackdown gets backlash from some employees: (https://www.cnbc.com/2023/06/13/google-rto-crackdown-gets-backlash-check-my-work-not-my-badge.html) Forrester Wave Integrated Software Delivery Platforms, Q2 2023 (https://www.forrester.com/blogs/the-forrester-wave-integrated-software-delivery-platforms-q2-2023-say-goodbye-to-the-devops-tax/) The economic potential of generative AI: The next productivity frontier (https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) 1 big thing: Where AI's productivity revolution will strike first (https://www.axios.com/newsletters/axios-login-da50d8f4-fb10-4952-af38-01163b9acbd3.html?chunk=0&utm_term=emshare#story0) For the first time in almost 30 years, a company other than IBM received the most US patents (https://finance.yahoo.com/news/first-time-almost-30-years-192900742.html) AMD stock pops on potential Amazon superchip deal, CEO bullishness (https://finance.yahoo.com/news/amd-stock-pops-on-potential-amazon-superchip-deal-ceo-bullishness-112819279.html) Amazon cloud services back up after big outage hits thousands of users (https://www.reuters.com/technology/amazon-says-multiple-cloud-services-down-users-2023-06-13/) Proven Practices for Developing a Multicloud Strategy | Amazon Web Services (https://aws.amazon.com/blogs/enterprise-strategy/proven-practices-for-developing-a-multicloud-strategy/) 40 photos from inside Metropolitan Park—the first phase
of Amazon's HQ2 (https://www.aboutamazon.com/news/amazon-offices/amazon-headquarters-hq2-arlington-virginia-photos?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) The Forrester Wave™: Integrated Software Delivery Platforms, Q2 2023 (https://page.gitlab.com/forrester-wave-integrated-software-delivery-platforms-2023.html?utm_source=cote&utm_campaign=devrel&utm_content=newsletter20230615&utm_medium=email) AWS US-EAST-1 wobbled after Lambda management issues spread (https://www.theregister.com/2023/06/14/aws_us_east_1_brownout/) The store is for people, but the storefront is for robots (https://www.theverge.com/23753963/google-seo-shopify-small-business-ai) A Look Back at Q1 '23 Public Cloud Software Earnings (https://cloudedjudgement.substack.com/p/a-look-back-at-q1-23-public-cloud?utm_source=post-email-title&publication_id=56878&post_id=128805971&isFreemail=true&utm_medium=email) Apple Is Taking On Apples in a Truly Weird Trademark Battle (https://www.wired.com/story/apple-vs-apples-trademark-battle/) Apple Watch alerts 29-year-old Cincinnati woman to blood clot in lungs while sleeping (https://9to5mac.com/2023/06/19/apple-watch-blood-clot-sleeping/) Return to Office Enters the Desperation Phase (https://www.nytimes.com/2023/06/20/business/return-to-office-remote-work.html) Critical 'nOAuth' Flaw in Microsoft Azure AD Enabled Complete Account Takeover (https://thehackernews.com/2023/06/critical-noauth-flaw-in-microsoft-azure.html) What happened to Oracle? Why do they keep acquiring companies? (https://www.tiktok.com/t/ZT8JH8X5Y/) How an ex-Googler is reimagining the oldest computing interface of all (https://www.fastcompany.com/90907013/warp-terminal-command-line) WFH 4 ever (https://www.axios.com/2023/06/23/work-from-home-remote-workplace-trend) Databricks picks up MosaicML, an OpenAI competitor, for $1.3B (https://techcrunch.com/2023/06/26/databricks-picks-up-mosaicml-an-openai-competitor-for-1-3b/) Introducing LLaMA: A foundational, 65-billion-parameter language model (https://ai.facebook.com/blog/large-language-model-llama-meta-ai/?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) AI's next conflict is between open and closed (https://www.axios.com/newsletters/axios-login-e2a8f546-c6e2-421c-a7dc-0996d64bf312.html?chunk=0&utm_term=emshare#story0) Amazon is investing another $7.8B in Ohio-based cloud computing operations, (https://apnews.com/article/amazon-aws-ohio-data-center-investment-e35c8b726269b6b78ce05854f9f31d27) A new law protecting pregnant workers is about to take effect (https://www.axios.com/2023/06/22/pregnant-workers-fairness-act-2023-explain) Amazon launches AWS AppFabric to help customers connect their SaaS apps (https://techcrunch.com/2023/06/27/amazon-launches-aws-appfabric-to-help-customers-connect-their-saas-apps/?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAGcA6HN4Zti_4dKCpuMURoiAkkQ_uR0GBWFOG215KnmRsvryBDclj9SjWv-95R0yA0wFRXevcP-HUdwk-E3ZyR3d23rc5VGVCNXFGK5L3mAPvoEOJxRs6WZFKQvDUBIyw5V3NpdWGkkQ-fXDh4Rijfdp2l_ekJTxepVJjoYJSyKz) State of Kubernetes Cost Optimization Report (https://inthecloud.withgoogle.com/state-of-kubernetes-cost-optimization-report/dl-cd.html) FTC Request, Answered: How Cloud Providers Do Business (https://www.lastweekinaws.com/blog/ftc-request-answered-how-cloud-providers-do-business/) OrbStack · Fast, light, simple Docker & Linux on macOS (https://orbstack.dev/?ref=console.dev) Surprise! You Work for Amazon. 
(https://www.theatlantic.com/technology/archive/2023/06/amazon-hub-delivery-last-mile/674559/) btop - the htop alternative (https://haydenjames.io/btop-the-htop-alternative/) We Raised A Bunch Of Money (https://fly.io/blog/we-raised-a-bunch-of-money/) Twitter has stopped paying its Google Cloud bills (https://www.businessinsider.com/elon-musk-twitter-stopped-paying-google-cloud-bills-money-platformer-2023-6) Report: 2022 Microsoft Azure Revenue Less Than Estimated, Half That Of AWS | CRN (https://www.crn.com/news/cloud/report-2022-microsoft-azure-revenue-less-than-estimated-half-that-of-aws) Google Domains shutting down, assets sold and being migrated to Squarespace (https://9to5google.com/2023/06/15/google-domains-squarespace/) Is Waze next? (https://www.theverge.com/2023/6/27/23776329/google-waze-layoffs-ads) The real story of how Facebook almost acquired Waze, but we ended up with Google (https://post.news/@/noam/2RTRvTNNxSCQb3yNjqa0DPfr1Yk) Google killed its Iris augmented-reality smart glasses (https://www.businessinsider.com/google-ar-iris-augmented-reality-smart-glasses-2023-6) Who killed Google Reader? (https://www.theverge.com/23778253/google-reader-death-2013-rss-social) Mark Zuckerberg is ready to fight Elon Musk in a cage match (https://www.theverge.com/2023/6/21/23769263/mark-zuckerberg-elon-musk-fight-cage-match-worldstar) IBM to Acquire Apptio Inc., (https://newsroom.ibm.com/2023-06-26-IBM-to-Acquire-Apptio-Inc-,-Providing-Actionable-Financial-and-Operational-Insights-Across-Enterprise-IT) IBM Re-ups On FinOps With Its Apptio Acquisition (https://www.forrester.com/blogs/ibm-re-ups-on-finops-with-its-apptio-acquisition/) Nonsense Texas Bans Kids From Social Media Without Mom and Dad's Ok (https://gizmodo.com/texas-law-kids-social-media-ban-without-parents-consent-1850540419) Summer intern's commute goes viral: She flies from South Carolina to New Jersey (https://www.cnn.com/2023/06/15/business/tiktok-summer-intern-commute/index.html) Twitter evicted from office amid lawsuits over unpaid rent and cleaning bills (https://arstechnica.com/tech-policy/2023/06/judge-ruled-twitter-must-be-evicted-from-colorado-office-over-unpaid-rent/) Fishing crew denied $3.5M in prize money after 600-pound marlin DQ'd in tournament (https://nypost.com/2023/06/19/massive-marlin-dqd-in-big-rock-blue-marlin-tournament-over-mutilation/) 'World's Largest' Buc-ee's store opens (https://www.wyff4.com/article/bucees-world-largest-tennessee/44343171) now on Bus-ee's Map (https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjgoKnr-vX_AhVslGoFHeeBBREQFnoECBgQAQ&url=https%3A%2F%2Fwww.google.com%2Fmymaps%2Fviewer%3Fmid%3D1IBCXZDU73Q5pjsDWVkoQ5O0GLoUd-bg%26hl%3Den&usg=AOvVaw3joznC0GgnH9dU-z_XGEw5&opi=89978449) Magic Mushrooms. LSD. Ketamine. The Drugs That Power Silicon Valley. (https://www.wsj.com/articles/silicon-valley-microdosing-ketamine-lsd-magic-mushrooms-d381e214) 'Fueled by inflation': USPS stamp prices are increasing soon. Here's what to know. 
(https://www.usatoday.com/story/money/2023/06/28/stamp-price-increase-usps/70363626007/) At least a year younger on paper: South Korea makes changes to age-counting law (https://www.usatoday.com/story/news/world/2023/06/28/south-korea-changes-age-counting-law/70363453007/) Sony just spilled confidential PlayStation information because of a Sharpie (https://www.theverge.com/2023/6/28/23777298/sony-ftc-microsoft-confidential-documents-marker-pen-scanner-oops) Australia legalises psychedelics for mental health (https://www.bbc.co.uk/news/world-australia-66072427) Listener Feedback Let's Get To The News | Craig Box | Substack (https://craigbox.substack.com/) When You Don't Have a Seat At the (Managed Database) Table (https://unskript.com/blog/when-you-don-t-have-a-seat-at-the-(managed-database)-table) by Doug Sillars Conferences August 8th Kubernetes Community Day Australia (https://community.cncf.io/events/details/cncf-kcd-australia-presents-kubernetes-community-day-australia-2023/) in Sydney, Matt attending. August 21st to 24th SpringOne (https://springone.io/) & VMware Explore US (https://www.vmware.com/explore/us.html), in Las Vegas. Explore EU CFP is open. Sep 6th to 7th DevOpsDays Des Moines (https://devopsdays.org/events/2023-des-moines/welcome/), Coté speaking. Sep 18th to 19th SHIFT (https://shift.infobip.com/) in Zadar, Coté speaking. October 6, 2023, KCD Texas 2023 (https://community.cncf.io/events/details/cncf-kcd-texas-presents-kcd-texas-2023/), CFP Closes: August 30, 2023 Jan 29, 2024 to Feb 1, 2024 That Conference Texas CFP Open 6/1 - 8/21 (https://that.us/call-for-counselors/tx/2024/) If you want your conference mentioned, let's talk media sponsorships. SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Get a SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), Mastodon (https://hachyderm.io/@softwaredefinedtalk), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! Recommendations Brandon: Cloudcast: MidYear 2023 Update (https://www.thecloudcast.net/2023/07/midyear-2023-update.html) Governments Building Software This Is What Happens When Governments Build Software - Odd Lots (https://omny.fm/shows/odd-lots/this-is-what-happens-when-governments-build-softwa) The Book I Wish Every Policymaker Would Read (https://www.nytimes.com/2023/06/06/opinion/ezra-klein-podcast-jennifer-pahlka.html) Tony Hsieh and the Emptiness of the Tech-Mogul Myth (https://www.newyorker.com/news/our-columnists/tony-hsieh-and-the-emptiness-of-the-tech-mogul-myth) (via Coté's newsletter) Coté: Hand Mirror app (https://handmirror.app), also in Setapp (https://setapp.com) if you have that. If Books could Kill (https://www.patreon.com/IfBooksPod) Photo Credits Header (https://unsplash.com/photos/5yuRImxKOcU) Artwork (https://www.freepnglogos.com/images/linux-22615.html)
With MosaicML acquired last week for $1.3 billion, we rushed to get you an episode we've been working on with Mosaic's Chief Scientist, Jonathan Frankle. It's here, and the episode is packed with practical thinking and perspective on the incredible work they're doing at Mosaic. Robb, Josh, and Jonathan Frankle have a fun and useful conversation for anyone who's just wrapping their head around implementing LLMs, as well as for those already deep in the trenches.
(0:00) Bestie intros: Friedberg fills in as moderator! (2:45) Wagner Group rebellion (23:15) SCOTUS strikes down Affirmative Action (51:03) Databricks acquires MosaicML for $1.3B, Inflection raises $1.3B (1:09:35) IRL shuts down after faking 95% of users, Byju's seeks to raise emergency $1B as founder control in jeopardy (1:26:38) Science Corner: Understanding the NANOGrav findings Follow the besties: https://twitter.com/chamath https://linktr.ee/calacanis https://twitter.com/DavidSacks https://twitter.com/friedberg Follow the pod: https://twitter.com/theallinpod https://linktr.ee/allinpodcast Intro Music Credit: https://rb.gy/tppkzl https://twitter.com/yung_spielburg Intro Video Credit: https://twitter.com/TheZachEffect Referenced in the show: https://edition.cnn.com/2023/06/22/politics/ukraine-counteroffensive-western-assessment/index.html https://www.independent.co.uk/news/world/europe/putin-wagner-russia-treason-coup-b2363430.html https://www.statista.com/statistics/896181/putin-approval-rating-russia https://www.levada.ru/en/ratings https://twitter.com/MatreshkaRF/status/1673209794608365570 https://www.csis.org/blogs/post-soviet-post/la-vie-en-rose-why-kremlin-blacklisted-levada-center https://www.nytimes.com/2023/06/29/world/africa/central-african-republic-wagner-africa-syria.html https://www.cnbc.com/2023/06/29/supreme-court-rejects-affirmative-action-at-colleges-says-schools-cant-consider-race-in-admission.html https://en.wikipedia.org/wiki/Students_for_Fair_Admissions_v._Harvard https://twitter.com/greg_price11/status/1674426520100814848 https://www.nbcnews.com/news/us-news/study-harvard-finds-43-percent-white-students-are-legacy-athletes-n1060361 https://www.wsj.com/articles/databricks-strikes-1-3-billion-deal-for-generative-ai-startup-mosaicml-fdcefc06 https://www.snowflake.com/blog/snowflake-acquires-neeva-to-accelerate-search-in-the-data-cloud-through-generative-ai https://www.forbes.com/sites/alexkonrad/2023/06/29/inflection-ai-raises-1-billion-for-chatbot-pi https://www.theinformation.com/articles/social-app-irl-which-raised-200-million-shuts-down-after-ceo-misconduct-probe https://www.theinformation.com/articles/softbank-backed-messaging-app-irl-says-it-has-20-million-users-some-employees-have-doubts-about-that https://www.bloomberg.com/news/articles/2023-06-27/byju-s-seeks-to-raise-1-billion-to-sidestep-shareholder-revolt https://techcrunch.com/2023/06/27/prosus-byjus-markdown https://twitter.com/shaig/status/1673836979903950851 https://www.ft.com/content/b8a4214f-7f64-4d3a-97c4-4731f2effb0d https://twitter.com/chamath/status/1674469606746992651 https://pauloffit.substack.com/p/my-conversation-with-robert-f-kennedy https://www.quantamagazine.org/an-enormous-gravity-hum-moves-through-the-universe-20230628 https://physics.aps.org/articles/v16/116
Mary Ann and Alex are back for another busy news week chock full of deals to chew through. Here's the rundown:
Deals of the Week: We think that the idea behind the recently funded Honey Homes is excellent, but we're split about the cost. We also went over Gusto's latest financial achievements and its plans to team up with Remote.
Fintech M&A: The biggest deal of the week in fintech was Visa's purchase of Pismo. We haven't had unicorn-level acquisitions lately, so this one was welcome. Elsewhere in the space, Brex has brought on board a former SVB and a16z denizen, and Ramp bought Cohere.io (not this Cohere, the other one).
Other M&A: But those weren't the only deals. Databricks bought MosaicML, IBM bought Apptio, and ThoughtSpot has acquired Mode Analytics.
Help, my unicorn is starving: We closed with Alex's look at the declining funding to unicorn and web3 startups, as well as Rebecca Szkutak's latest on the secondary market.
Equity will be back on Wednesday as we head off into yet another holiday weekend here in the U.S., when Alex will finally put his PTO to use. In the meantime, let's catch up on Twitter @EquityPod. Talk soon!
For episode transcripts and more, head to Equity's Simplecast website. Equity drops at 7:00 a.m. PT every Monday, Wednesday and Friday, so subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts. TechCrunch also has a great show on crypto, a show that interviews founders, one that details how our stories come together and more!
In recent weeks there have been record funding rounds for AI startups. I'll walk you through the ones I consider most important. The next AI Masterclass takes place on July 24-25, 2023; you can find all the important information here. With the promo code "Podcast" you get 10% off.
Topics of the podcast:
Intro: 00:00
Runway: 00:22
Inflection AI: 01:43
Synthesia: 03:23
Mistral AI: 04:17
Databricks buys MosaicML: 05:22
Conclusion: 07:27
1. Subscribe to my newsletter for the latest AI & tech trends
2. Subscribe to the podcast: Apple, Spotify, Google & Amazon
3. Follow me on LinkedIn, Instagram, YouTube, TikTok & Twitter
4. Want to keep learning? Sign up for the AI Masterclass.
In the "Investments & Exits" segment we welcome Leo Lerach, Principal at Project A, and Philipp Werner, Partner at Project A. The two discuss the acquisition of MosaicML by Databricks as well as the funding rounds of Dexory and Colonia Technologies.
Databricks, a leading company in data analytics and artificial intelligence, has acquired the generative AI startup MosaicML for $1.3 billion. MosaicML has developed large language models, including MPT-30B, which is seen as a competitor to GPT-3 and GPT-4. Databricks plans to integrate these models into its Lakehouse platform and strengthen its position in the booming generative AI market.
Dexory, a company that supplies warehouses with real-time inventory management data using AI software and autonomous robots, has raised $19 million in a Series A funding round. The European venture capital firm Atomico led the round; further investors include Lakestar, Maersk Growth, Kindred Capital and Capnamic. Dexory builds robots with sensors and cameras that capture data and continuously photograph shelves while moving through the warehouse at normal walking speed. The company was founded in London in 2015.
The German startup Colonia Technologies has raised 6 million euros for its platform for the shared use of commercial vehicles. The B2B sharing model is a marketplace that lets logistics companies share their vehicles to cover shortages in international logistics. The company was founded in 2021 by Jakob Sadoun and Kaspar Filipp. Investors in the seed round were Octopus Ventures, vent.io, Plug and Play, MobilityFund, Atlantic Labs, Hoyer, as well as the founders of Flixbus, Grover and Seven Senders.
Mike McGrath joins the Ask Noah Show to discuss the changes Red Hat is making in how they make their source code available. -- During The Show -- 00:50 Arch Keyring Fix - Bloominstrong at the bottom of /etc/pacman.conf Include "SigLevel = Optional TrustAll" under core and extra Update the keyring -Sy archlinux-keyring Confirm each new key Remove "SigLevel = Optional TrustALL" run pacman -Syu 03:05 Church Streaming - John Wired vs WiFi video OpenLP (https://openlp.org/) WimpysWorld OBS Portable (https://github.com/wimpysworld/obs-studio-portable) FreeShow (https://freeshow.app/) OBS Project (https://obsproject.com/) Ask Noah Show Ep 341 (https://podcast.asknoahshow.com/341) Confidence monitor Stream Deck Bitfocus (https://bitfocus.io/companion) vdo.ninja (https://vdo.ninja/) Scale Engine (https://www.scaleengine.com/) Own Cast (https://owncast.online/) 10:30 Web Platform? - Brian Go Hugo (gohugo.io) 12:30 Mumble Caller - JMP.chat & Linphone - Naelr Setup Linphone with JMP.chat Send ? in cheogram chat reset sip 17:58 News Wire The linux 6.4 kernel has been released (https://lkml.org/lkml/2023/6/25/453) Intel's new ARC driver in Linux boosts gaming performance by 11% (https://www.tomshardware.com/news/intel-arc-driver-linux-boost) The Exodia OS team has recently updated their customized arch-based distro for security testing (https://github.com/Exodia-OS) Releases Proxmox Virtual Environment 8 (https://www.proxmox.com/en/news/press-releases/proxmox-virtual-environment-8-0) Peazip 9.3 (https://peazip.github.io/changelog.html) Darktable 4.4 (https://www.darktable.org/2023/06/darktable-4.4.0-released/) Ardour 7.5 (https://ardour.org/whatsnew.html) Firewalld 2.0 (https://firewalld.org/2023/06/firewalld-2-0-0-release) Industry News GitLab Expands its Open Source Partner Community With the Addition of The Open Group (https://executivebiz.com/2023/06/gitlab-expands-open-source-partner-community-with-addition-of-the-open-group/) Security News New Linux based IOT attack campaign (https://www.infosecurity-magazine.com/news/openssh-trojan-campaign-iot-linux/) AI News Databricks has agreed to buy MosaicML (https://www.thestack.technology/databricks-to-buy-mosaicml/) Robin AI - Github PR reviewer (https://github.com/Integral-Healthcare/robin-ai-reviewer) Hardware News Dingo - A fully Open Source Robot Dog (https://www.i-programmer.info/news/169-robotics/16402-meet-dingo-your-open-source-four-footed-friend.html) 19:55 Red Hat Interview Mike McGrath - Vice President of Core Platforms Engineering Blog Post 1 (https://www.redhat.com/en/blog/furthering-evolution-centos-stream) Reaction was unexpected, swift, and immediate Conviction to the GPL Source code is still available Surprised more people didn't look at CentOS Stream Blog Post 2 (https://www.redhat.com/en/blog/red-hats-commitment-open-source-response-gitcentosorg-changes) Red Hat no longer finds value in "re-builders" Free as in freedom vs free as in beer Standing on the shoulders of giants Red Hat welcome's competition and contribution Why not start with the second blog post's message? Timing of the change Meeting the GPL requirements Products vs Projects When did the thought of this start? CentOS Stream Things being done in bad faith Threat to the open source business model Red Hat has worked hard to make RHEL available for free What about people using downstream rebuilds in CI pipelines? 
No cost RHEL for open source projects (ROSI) (https://www.redhat.com/en/blog/extending-no-cost-red-hat-enterprise-linux-open-source-organizations) Does Red Hat want to be the "bottom rung of the ladder"? IBM had zero input CentOS Stream is critical for RHEL What is Red Hat selling when they sell RHEL? What impact will this have on the broader ecosystem? -- The Extra Credit Section -- For links to the articles and material referenced in this week's episode check out this week's page from our podcast dashboard! This Episode's Podcast Dashboard (http://podcast.asknoahshow.com/343) Phone Systems for Ask Noah provided by Voxtelesys (http://www.voxtelesys.com/asknoah) Join us in our dedicated chatroom #GeekLab:linuxdelta.com on Matrix (https://element.linuxdelta.com/#/room/#geeklab:linuxdelta.com) -- Stay In Touch -- Find all the resources for this show on the Ask Noah Dashboard Ask Noah Dashboard (http://www.asknoahshow.com) Need more help than a radio show can offer? Altispeed provides commercial IT services and they're excited to offer you a great deal for listening to the Ask Noah Show. Call today and ask about the discount for listeners of the Ask Noah Show! Altispeed Technologies (http://www.altispeed.com/) Contact Noah live [at] asknoahshow.com -- Twitter -- Noah - Kernellinux (https://twitter.com/kernellinux) Ask Noah Show (https://twitter.com/asknoahshow) Altispeed Technologies (https://twitter.com/altispeed)
MLPerf Training 3.0 benchmark results show performance gains of up to 1.54x compared to six months ago and a 33-49x improvement over the first round, driving innovation and energy efficiency in the industry. Intel's Habana Gaudi2 ML training engine competes with Nvidia's offerings, boasting better performance than A100 and lower pricing than H100. Nvidia, on the other hand, unveils their NeMo model with half a trillion parameters and expands the MLPerf Training suite to include GPT-3 and a new Recommendation engine. Their collaboration with CoreWeave showcases the superior performance of the H100, providing a 3.6x speed increase for GPT-3 compared to Intel Xeon and Gaudi2. Nvidia is also developing foundation models for their DGX cloud, collaborating with major players in the industry, and Intel is widely rumored to be developing its own Gaudi2-as-a-Service offering. Then there's the MLPerf Tiny 1.1 inference benchmark, which saw over 150 results and performance improvements of up to 1000x. Time Stamps: 0:00 - Welcome to the Rundown 0:48 - What Red Hat is doing with CentOS 3:36 - Moving Windows to the cloud for consumers 6:31 - IBM acquires Apptio 8:51 - Cisco set to acquire SamKnows 12:04 - Databricks Acquires MosaicML 15:37 - Cato Networks introduces AI tracker for malware command and control 18:39 - MLPerf 3 Upsets the AI Apple Cart 32:10 - The Weeks Ahead 33:40 - Thanks for Watching Follow our Hosts on Social Media Tom Hollingsworth: https://www.twitter.com/NetworkingNerd Stephen Foskett: https://www.twitter.com/SFoskett Tim Bertino: https://www.twitter.com/TimBertino Follow Gestalt IT Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/Gestalt-IT #Rundown, #MLPerf, #CentOS, #RHEL, @RedHat, #Cloud, @Microsoft, @Windows, @IBM, @Apptio, #NetworkMonitoring, @Cisco, @SamKnows, @Databricks, @MosaicML, @CatoNetworks, #AI, @MLCommons, #MLPerf3,
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
Top 10 AI-powered digital marketing tools (that aren't just wrappers for GPT)
MarketMuse - AI Content Optimization
Plus AI for Google Slides - Presentations for Sales Pitches, Webinars, and Conferences
GoCharlie - AI Content Generation in Your Brand Voice + Content Repurposing
AdCreative.ai - AI-Powered Ad & Social Creatives
BrandBastion - AI-Driven Community Management
Contlo - Autonomous Generative Marketing
It looks like you can use ChatGPT to bypass paywalls
Employees Would Prefer AI Bosses Over Humans, survey shows
Databricks snaps up MosaicML to build private AI models
Claude vs. ChatGPT: Which AI Assistant Should Data Scientists Choose in 2023?
New AI method for graphing scenes from images
Daily AI News 6/27/2023
This podcast is generated using the Wondercraft AI platform, a tool that makes it super easy to start your own podcast, by enabling you to use hyper-realistic AI voices as your host. Like mine!
Attention AI Unraveled podcast listeners! Are you eager to expand your understanding of artificial intelligence? Look no further than the essential book "AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence," now available at Google, Apple and Amazon! This engaging read answers your burning questions and provides valuable insights into the captivating world of AI. Don't miss this opportunity to elevate your knowledge and stay ahead of the curve. Get your copy at Apple, Google, or Amazon today!
This Week in Startups is presented by: Notion just launched Notion Projects, which includes new, powerful ways to manage projects and leverage the power of their built-in AI features too. Try it for free today at notion.com/twist. LinkedIn Jobs. A business is only as strong as its people, and every hire matters. Go to LinkedIn.com/TWIST to post your first job for free. Terms and conditions apply Fin can't burn its mouth on hot pizza. Or wave at someone who wasn't waving at them. Fin can resolve half of your customer support tickets instantly before they reach your team. Meet Fin. A breakthrough AI bot by Intercom – ready to join your support team today. Visit https://intercom.com/fin * Today's show: Jason is joined by Vinny Lingham to break down IRL shutting down after faking 95% of users (1:12), ZIRP fraud (12:59), Databricks acquiring MosaicML for $1.3B, and some AI demos! (1:02:31) * Check out Waitroom: https://waitroom.com/ Follow Vinny: https://twitter.com/vinnylingham * Time stamps: (0:00) Vinny joins Jason (1:12) IRL's 19M fake users (5:45) Twitter Bots and the creation of fake accounts (11:50) Notion - Try it for free today at notion.com/twist (12:59) Diligence in early-stage startups (20:59) Databricks acquires MosaicML for $1.3B (27:02) LinkedIn Jobs - Post your first job for free at https://linkedin.com/twist (33:09) Google generative search (37:47) Fin - Try Fin, Intercom's new AI customer support chatbot, at https://intercom.com/fin (41:56) Vinny's thoughts on the Titan tragedy (53:34) DeepMind CEO's says Gemini is more capable than ChatGPT (1:02:31) Vinny demos Colorize and Replika * Read LAUNCH Fund 4 Deal Memo: https://www.launch.co/four Apply for Funding: https://www.launch.co/apply Buy ANGEL: https://www.angelthebook.com Great recent interviews: Steve Huffman, Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland, PrayingForExits, Jenny Lefcourt Check out Jason's suite of newsletters: https://substack.com/@calacanis * Follow Jason: Twitter: https://twitter.com/jason Instagram: https://www.instagram.com/jason LinkedIn: https://www.linkedin.com/in/jasoncalacanis * Follow TWiST: Substack: https://twistartups.substack.com Twitter: https://twitter.com/TWiStartups YouTube: https://www.youtube.com/thisweekin * Subscribe to the Founder University Podcast: https://www.founder.university/podcast
Databricks to acquire generative A.I. startup MosaicML in $1.3B deal, BlackRock's Larry Fink says he's no longer using the "politicized" term ESG, and Stability A.I.'s third executive turnover in less than three months.
In this episode: Databricks' acquisition of MosaicML for $1.3 billion, Ramp's acquisition of Cohere.io to improve its customer service, and two research papers, on self-supervised evaluation for large language models and on scaling MLPs. These topics provide valuable insights into the growing demand for generative AI tools, the importance of realistic data evaluation, and the limits of MLPs' performance on vision tasks. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:31 Databricks Strikes $1.3 Billion Deal for Generative AI Startup MosaicML 03:33 As the generative AI craze rages on, Ramp acquires customer support startup Cohere.io 05:36 Inside China's underground market for high-end Nvidia AI chips 07:36 Fake sponsor 09:31 Bring Your Own Data! Self-Supervised Evaluation for Large Language Models 11:28 System-Level Natural Language Feedback 12:55 Scaling MLPs: A Tale of Inductive Bias 14:50 Outro
AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
Dive into the details of the groundbreaking $1.3 billion acquisition of MosaicML, a key player in the AI field, by Databricks. Explore the implications of this merger on the AI landscape and how it challenges OpenAI's dominance in the market. Get on the AI Box Waitlist: https://AIBox.ai/Investor Contact Email: jaeden@aibox.aiFacebook Community: https://www.facebook.com/groups/739308654562189/ Discord Community: https://discord.gg/hHw4naQr Follow me on Twitter: https://twitter.com/jaeden_ai
Averages closed in the red, near the lows of the session. Neuberger Berman's CIO breaks down the market action and gives his second half playbook. Eurasia Group President Ian Bremmer breaks down what's next for Europe and the US. Lucid Motors CEO Peter Rawlinson discusses his company's deal with Aston Martin that sent the stock higher today. Nucor CEO Leon Topalian talks Nucor's upbeat guidance and infrastructure spending amid rising rates. Databricks CEO Ali Ghodsi on MosaicML and what's next for the company. Plus Richard Haass talks geopolitical fallout from the weekend events in Russia.
In this episode, Nathan sits down with Jonathan Frankle, Chief Scientist, and Abhi Venigalla, Research Scientist of MosaicML. They chat about Mosaic's custom LLMs, the customers seeking Mosaic out and what their journeys and use cases look like, and exciting developments in Mosaic's research: including their new inference platform, as well as Mosaic's MPT-7B-65k+ storywriter model. RECOMMENDED PODCAST: The HR industry is at a crossroads. What will it take to construct the next generation of incredible businesses – and where can people leaders have the most business impact? Hosts Nolan Church and Kelli Dragovich have been through it all, the highs and the lows – IPOs, layoffs, executive turnover, board meetings, culture changes, and more. With a lineup of industry vets and experts, Nolan and Kelli break down the nitty-gritty details, trade offs, and dynamics of constructing high performing companies. Through unfiltered conversations that can only happen between seasoned practitioners, Kelli and Nolan dive deep into the kind of leadership-level strategy that often happens behind closed doors. Check out the first episode with the architect of Netflix's culture deck Patty McCord. https://link.chtbl.com/hrheretics The Cognitive Revolution is a part of the Turpentine podcast network. To learn more: www.turpentine.co TIMESTAMPS: (00:00) Episode Preview (06:04) Mosaic's business model (07:28) Who uses Mosaic's custom LLMs? What does their data look like? (09:55)Mosaic's use cases for custom LLMs (12:47) How much extraction and summarization was done by humans pre-LLMs? (15:28) Sponsor: Omneky (21:50) The journeys of Mosaic's customers and would a Wendy's LLM know about a Big Mac? (25:46) The curriculum model and fine-tuning (29:10) Language models in the life sciences (33:20) How raw can data be before it becomes a problem? (35:44) Using the output of bulk pre-training process vs additional after training (38:30) Redteaming as a service (39:40) Mosaic's inference platform (41:53) Spending one cent on 20,000 tokens, how is that cent distributed? (46:00)) Selling compute on a dedicated capacity basis (47:30) Oracle and AWS (49:50) The storywriter model and 65,000 token window (54:35) The transition from finite parameters into infinite attention matrix LINKS: MosaicML: https://www.mosaicml.com/ MPT-7B Storywriter Model: https://huggingface.co/mosaicml/mpt-7b-storywriter TWITTER: @jefrankle (Jonathan) @abhi_venigalla (Abhi) @MosaicML (Mosaic) @CogRev_Podcast @labenz (Nathan) @eriktorenberg (Erik) SPONSOR: Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off. Music Credit: MusicLM
Naveen Rao is the cofounder and CEO of MosaicML, a platform that enables you to train and deploy large AI models on your data in your secure environment. They've raised funding from amazing investors such as Lux Capital, Data Collective, and Maverick Ventures. Prior to this, he was the cofounder of Nervana. It got acquired by Intel for $408M. In this episode, we cover a range of topics including:
- AI compute workloads
- Compute platform for training and inference
- How to train your own LLM
- Ways to reduce the cost of training AI models
- Advantages of training domain specific models
- In-context learning
- Context windows of LLMs
- Moat of AI-infused businesses
Naveen's favorite book: The Righteous Mind (Author: Jonathan Haidt)
--------
Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi
MosaicML is a platform for training and deploying large AI models at scale. Explore their docs, check out their blog, and keep an eye on their open roles.Jonathan Frankle is the Chief Scientist at MosaicML and an incoming Assistant Professor of Computer Science at Harvard.Abhinav Venigalla is the NLP Architect at MosaicML.Today's Lifeboat badge winner is singmotor for rescuing How to remove columns with too many missing values in Python from the dustbin of history.
This Week in Startups is presented by: Vanta. Compliance and security shouldn't be a deal-breaker for startups to win new business. Vanta makes it easy for companies to get a SOC 2 report fast. TWiST listeners can get $1,000 off for a limited time at vanta.com/twist. Trovata. Starting up is hard. Trovata makes managing cash easy. Start automating your cash management at Trovata.io/TWIST. Use Code TWIST for 30% off one full year of premium features like AI forecasting. The Microsoft for Startups Founders Hub helps all founders build a better startup, at a lower cost, from day one. Startups get up to $150K in Azure credits, access to free OpenAI credits, free dev tools like GitHub, technical advisory, access to mentors and experts, and so much more. There is no funding requirement, and it only takes minutes to join. Sign up today at aka.ms/thisweekinstartups * Todays show: MosaicML Co-Founder and CEO Naveen Rao joins Jason to discuss the open-source vs closed AI debate, the profound impact of AI on society (41:06) AI's rapid pace of change, and its implications for the future of employment and education (40:42). They wrap the show by breaking down the potential problems with centralized regulation (54:37). Follow Naveen: https://twitter.com/NaveenGRao Check Out MosaicML: https://mosaicml.com * Time stamps: (00:00) Naveen Rao joins Jason (2:54) MosaicML and its purpose (5:10) Obtaining datasets and incentivizing creators (8:30) Vanta - Get $1000 off your SOC 2 at https://vanta.com/twist (9:37) The process of using your data with MosaicML (11:55) Defining tokens and prompts (16:53) Fine-tuning the AI model and reinforcement learning (19:27) The competition with open-source models (24:26) The cost of running AI models (26:08) Trovata - Use code TWIST at https://trovata.io/twist for 30% off one year of premium features, like AI forecasting (27:35) How the GPU crunch has affected cloud models (32:13) Why demand will not cease (34:21) Specialized models vs. general models (39:12) Microsoft for Startups Founders Hub - Apply in 5 minutes for six figures in discounts at http://aka.ms/thisweekinstartups (40:42) The impact AI will have on employment (48:49) The impact AI will have on education (54:37) Thoughts on OpenAI becoming ClosedAI * Read LAUNCH Fund 4 Deal Memo & Apply for Funding Buy ANGEL Great recent interviews: Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland, PrayingForExits, Jenny Lefcourt Check out Jason's suite of newsletters: https://substack.com/@calacanis * Follow Jason: Twitter: https://twitter.com/jason Instagram: https://www.instagram.com/jason LinkedIn: https://www.linkedin.com/in/jasoncalacanis * Follow TWiST: Substack: https://twistartups.substack.com Twitter: https://twitter.com/TWiStartups YouTube: https://www.youtube.com/thisweekin * Subscribe to the Founder University Podcast: https://www.founder.university/podcast
We are hosting the AI World's Fair in San Francisco on June 8th! You can RSVP here. Come meet fellow builders, see amazing AI tech showcases at different booths around the venue, all mixed with elements of traditional fairs: live music, drinks, games, and food! We are also at Amplitude's AI x Product Hackathon and are hosting our first joint Latent Space + Practical AI Podcast Listener Meetup next month!
We are honored by the rave reviews for our last episode with MosaicML! They are also welcome on Apple Podcasts and Twitter/HN/LinkedIn/Mastodon etc!
We recently spent a wonderful week with Itamar Friedman, visiting all the way from Tel Aviv in Israel:
* We first recorded a podcast (releasing with this newsletter) covering Codium AI, the hot new VSCode/Jetbrains IDE extension focused on test generation for Python and JS/TS, with plans for a Code Integrity Agent.
* Then we attended Agent Weekend, where the founders of multiple AI/agent projects got together with a presentation from Toran Bruce Richards on Auto-GPT's roadmap and then from Itamar on Codium's roadmap.
* Then some of us stayed to take part in the NextGen Hackathon and won first place with the new AI Maintainer project.
So… that makes it really hard to recap everything for you. But we'll try!
Podcast: Codium: Code Integrity with Zero Bugs
When it launched in 2021, there was a lot of skepticism around Github Copilot. Fast forward to 2023, and 40% of all code is checked in unmodified from Copilot. Codium burst on the scene this year, emerging from stealth with an $11m seed, their own foundation model (TestGPT-1) and a vision to revolutionize coding by 2025.
You might have heard of "DRY" programming (Don't Repeat Yourself), which aims to replace repetition with abstraction. Itamar came on the pod to discuss their "extreme DRY" vision: if you already spent time writing a spec, why repeat yourself by writing the code for it? If the spec is thorough enough, automated agents could write the whole thing for you.
Live Demo Video Section
This is referenced in the podcast about 6 minutes in. Timestamps, show notes, and transcript are below the fold. We would really appreciate if you shared our pod with friends on Twitter, LinkedIn, Mastodon, Bluesky, or your social media poison of choice!
Auto-GPT: A Roadmap To The Future of Work
Making his first public appearance, Toran (perhaps better known as @SigGravitas on GitHub) presented at Agents Weekend. Lightly edited notes for those who want a summary of the talk:
* What is AutoGPT? AutoGPT is an AI agent that utilizes a Large Language Model to drive its actions and decisions. It can be best described as a user sitting at a computer, planning and interacting with the system based on its goals. Unlike traditional LLM applications, AutoGPT does not require repeated prompting by a human. Instead, it generates its own 'thoughts', criticizes its own strategy and decides what next actions to take.
* AutoGPT was released on GitHub in March 2023, and went viral on April 1 with a video showing automatic code generation. 2 months later it has 132k+ stars, is the 29th highest ranked open-source project of all time, a thriving community of 37.5k+ Discord members, and 1M+ downloads.
* What's next for AutoGPT? The initial release required users to know how to build and run a codebase. They recently announced plans for a web/desktop UI and mobile app to enable nontechnical/everyday users to use AutoGPT.
They are also working on an extensible plugin ecosystem called the Abilities Hub, also targeted at nontechnical users.
* Improving Efficacy. AutoGPT has many well documented cases where it trips up: getting stuck in loops, using placeholders instead of actual content in commands, and making obvious mistakes like execute_code("write a cookbook"). The plan is a new design called Challenge Driven Development - Challenges are goal-oriented tasks or problems that Auto-GPT has difficulty solving or has not yet been able to accomplish. These may include improving specific functionalities, enhancing the model's understanding of specific domains, or even developing new features that the current version of Auto-GPT lacks. (AI Maintainer was born out of one such challenge). Itamar compared this with Software 1.0 (Test Driven Development), and Software 2.0 (Dataset Driven Development).
* Self-Improvement. Auto-GPT will analyze its own codebase and contribute to its own improvement. AI Safety (aka not-kill-everyone-ists) people like Connor Leahy might freak out at this, but for what it's worth we were pleasantly surprised to learn that Itamar and many other folks on the Auto-GPT team are equally concerned and mindful about x-risk as well.
The overwhelming theme of Auto-GPT's roadmap was accessibility - making AI Agents usable by all instead of the few.
Podcast Timestamps
* [00:00:00] Introductions
* [00:01:30] Itamar's background and previous startups
* [00:03:30] Vision for Codium AI: reaching “zero bugs”
* [00:06:00] Demo of Codium AI and how it works
* [00:15:30] Building on VS Code vs JetBrains
* [00:22:30] Future of software development and the role of developers
* [00:27:00] The vision of integrating natural language, testing, and code
* [00:30:00] Benchmarking AI models and choosing the right models for different tasks
* [00:39:00] Codium AI spec generation and editing
* [00:43:30] Reconciling differences in languages between specs, tests, and code
* [00:52:30] The Israeli tech scene and startup culture
* [01:03:00] Lightning Round
Show Notes
* Codium AI
* Visualead
* AutoGPT
* StarCoder
* TDD (Test-Driven Development)
* AST (Abstract Syntax Tree)
* LangChain
* ICON
* AI21
Transcript
Alessio: [00:00:00] Hey everyone. Welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. I'm joined by my co-host, Swyx, writer and editor of Latent Space.
Swyx: Today we have a special guest, Itamar Friedman, all the way from Tel Aviv, CEO and co-founder of Codium AI. Welcome.
Itamar: Hey, great being here. Thank you for inviting me.
Swyx: You like the studio? It's nice, right?
Itamar: Yeah, they're awesome.
Swyx: So I'm gonna introduce your background a little bit and then we'll learn a bit more about who you are. So you graduated from the Technion, Israel Institute of Technology, which is kind of like the MIT of Israel. You did a BS in CS, and then you also did a Master's in Computer Vision, which is kind of relevant. You had other startups before this, but your sort of claim to fame is Visualead, which you started in 2011 and got acquired by Alibaba Group. You showed me your website, which does the sort of QR codes with different forms of visibility. And in China that's a huge, huge deal. It's starting to become a bigger deal in the west. My favorite anecdote that you told me was something about how much in sales you saved or something. I forget what the number was.
Itamar: Generally speaking, like there's a lot of peer-to-peer transactions going on, like payments, in China with QR codes.
So basically if for example 5% of the scanning does not work and with our scanner we [00:01:30] reduce it to 4%, that's a lot of money. Could be tens of millions of dollars a day.
Swyx: And at the scale of Alibaba, it serves all of China. It's crazy. You did that for seven years and you were at Alibaba until 2021, when you took some time off and then hooked up with Dedy, who you've known for 25 years, to start Codium AI, and you just raised your $11 million seed round with TLV Partners and Vine. Congrats. Should we go right into Codium? What is Codium?
Itamar: So we are an AI coding assistant / agent to help developers reach zero bugs. We don't do that today. Right now, we help to reduce the amount of bugs. Actually you can see people commenting on our marketplace page saying that they found bugs with our tool, and that's like our premise. Our vision is, like Tesla's zero emissions or something like that, for us it's zero bugs.
We started with building an IDE extension, either in VS Code or in JetBrains. And that actually works alongside the main panel where you write your code, and I can show later: what we do is analyze the code, whether you started writing it or you completed it. Like you can go both TDD (Test-Driven Development) or classical coding. And we offer analysis, tests, whether they pass or not, we further self-debug [00:03:00] them and make suggestions, eventually helping to improve the code quality, specifically on code logic testing.
Alessio: How did you get there? Obviously it's a great idea. Like, what was the idea maze? How did you get here?
Itamar: I'll go back long. So, yes, I was two and a half times a CTO, a VC-backed startup CTO, and we talked about the last one that I sold to Alibaba. But basically, it's weird to say, with 20 years already as an R&D manager, I'm not like the best programmer, because like you mentioned, I'm coming more from the machine learning / computer vision side, one of the main applications, but a lot of optimization. So I'm not necessarily the best coder, but I am like a 20-year R&D manager. And I found that verifying code logic is a very hard thing, and one of the things that really makes it difficult to increase the development velocity. So you have tools related to checking performance. You have tools for vulnerabilities and security, Israelis are really good at that. But do you have a tool that actually helps you test code logic? I think we have like dozens or hundreds, even thousands that help you on the end to end, maybe on the microservice integration system. But when you talk about code level, there isn't anything. So that was the pain I always had, especially when I did have tools for that, for the hardware. Like I worked at Mellanox, later sold to Nvidia, as a student, and we had formal tools, et cetera. [00:04:30] So that's one part.
The second thing is that after being sold to Alibaba, the team and I, quite a big team, worked on machine learning, large language models, et cetera, building developer tools related with LLMs throughout the golden years of 2017 to 2021, 2022.
And we saw how powerful they became.So basically, if I frame it this way, because we developed it for so many use cases, we saw that if you're able to take a problem and put a framework of a language around it, whether it's analyzing browsing behavior, or DNA, or etc, if you can put a framework of a language, then LLMs take you really far.And then I thought this problem that I have with code logic testing is basically a combination of a few languages: natural language, specification language, technical language. Even visual language to some extent. And then I quit Alibaba and took a bit of time to maybe wrap things around and rest a bit after 20 years of startup and corporate, and joined with my partner Dedy Kredo, who was my very first employee.And that's how we came to this idea.Alessio: The idea has obviously been around and most people have done AST analysis, kinda like an abstract syntax tree, but it's kind of hard to get there with just that. But I think these models now are getting good enough where you can mix that and also traditional logical reasoning.Itamar: Exactly.Alessio: Maybe talk a little bit more about the technical implementation of it. You mentioned the agent [00:06:00] part. You mentioned some of the model part, like what happens behind the scenes when Codium gets in your code base?Itamar: First of all, I wanna mention I think you're really accurate.If you try to take like a large language model as is and try to ask it, can you like, analyze, test the code, etc, it'll not work so good. By itself it's not good enough. On the other side, there are all the traditional techniques we already started to invent since the Greek times. You know, logical stuff, you mentioned ASTs, but there's also dynamic code analysis, mutation testing, etc. There are a lot of techniques out there, but they have inefficiencies.And a lot of those inefficiencies are actually matching with AI capabilities. Let me give you one example. Let's say you wanna do fuzzy testing or mutation testing.Mutation testing means that you either mutate the test, like the input of the test, the code of the test, etc, or you mutate the code in order to check how good your test suite is.For example, if I mutate some equation in the application code and the test finds a bug, and it does that at a really high rate, like out of 100 mutations I find all of the 100 problems with the tests, it's probably a very strong test suite.Now the problem is that there are so many options for what to mutate in the data, in the test. And this is where, for example, AI could help, like pointing out where's the best thing that you can mutate. Actually, I think it's a very good use case. Why? Because even if AI is not 100% accurate, even if it's 80% accurate, it could really take you quite far rather than just randomly selecting things.So if I wrap up, just go back high level: I think LLMs by themselves cannot really do the job of verifying code logic, and neither can the traditional ones, so you need to merge them. But then one more thing before maybe you tell me where to double click. I think with code logic there's also a philosophy question here.Logic is different from performance or quality. If I did a three for-in loop, like I loop three things and I could fold them with some vector operation in Python or something like that. We need to get into the mind of the developer. What was the intention? Like, what is the bad code? Not just code logic that doesn't work, but code that's not according to the specification.
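(A quick aside for readers: here is a minimal, hand-rolled Python sketch of the mutation-testing idea Itamar describes above. Real tools such as mutmut automate the operator flips; the "what to mutate" choice is the part he argues AI can prioritize. The function and the tests below are our own toy example, not Codium AI's output.)

```python
# Toy illustration of mutation testing: flip an operator in the application code
# and check whether the existing test suite notices ("kills the mutant").

def apply_discount(price: float, rate: float) -> float:
    return price * (1 - rate)            # original application code


def apply_discount_mutated(price: float, rate: float) -> float:
    return price * (1 + rate)            # mutant: '-' flipped to '+'


def suite_passes(fn) -> bool:
    """Run a tiny test suite against the given implementation."""
    try:
        assert fn(100.0, 0.2) == 80.0
        assert fn(50.0, 0.0) == 50.0
        return True
    except AssertionError:
        return False


if __name__ == "__main__":
    assert suite_passes(apply_discount)                               # original passes
    print("mutant killed:", not suite_passes(apply_discount_mutated))  # True -> strong suite
```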
So I think like one more thing that AI could really help with is matching: like if there is some natural language description of the code, we can match it. Or if there's missing information in natural language that needs [00:09:00] to be asked for, the AI could help by asking the user.It's not like a closed solution. Rather open, and leaving the developer as the lead. Just like moving the developer from being the coder to actually being like a pilot that clicks a button and says, ah, this is what I meant, or this is the fix, rather than actually writing all the code.Alessio: That makes sense. I think I talked about it on the podcast before, but like the switch from syntax to like semantics, like developers used to be focused on the syntax and not the meaning of what they're writing. So now you have the models that are really good at the syntax and you as a human are supposed to be really good at the semantics of what you're trying to build.How does it practically work? So I'm a software developer, I want to use Codium, like how do I start and then like, how do you make that happen in the, in the background?Itamar: So, like I said, Codium right now is an IDE extension. For example, I'm showing VS Code. And if you just install it, like you'll have a few access points to start Codium AI, whether it's this sidebar, or above every component or class that we think is very good to check with Codium, you'll have this small button. There's another way, you can mark specific code and right click and run Codium. But this one is my favorite because we actually choose above which components we suggest to use Codium. So once I click it, Codium starts analyzing this class. But not only this class, but almost everything that is [00:10:30] being used by the call center class, and whatever the call center class is calling. And so we do like a static code analysis, et cetera, what we talked about. And then Codium provides a code analysis. It's right now static, like you can't change or edit it, and maybe later we'll talk about it. This is what we call the specification, and we're going to make it editable so you can add additional behaviors and then create, accordingly, tests that will not pass, and then the code will change accordingly. So that's one entrance point, like via natural language description. That's one of the things that we're working on right now. What I'm showing you, by the way, could be downloaded as is. It's what we have in production.The second thing that we show here is like a full test suite. There are six tests by default, but you can just generate more, almost as much as you want. Every time we'll try to cover something else, like a happy path, an edge case, et cetera. You can talk with specific tests, okay? Like you can suggest I want this in Spanish, or give a few languages, or I want much more employees.I didn't go over what's a call center, but basically it manages like a call center. So you can imagine, I can ask to make it more rigorous, etc, but I don't wanna complicate it so I'm keeping it as is.I wanna show you the next one, which is run all tests. First, we verify that you're okay that we're gonna run it. I don't know, maybe we are connected to the environment that is currently [00:12:00] configured in the IDE. I don't know if it's production for some reason, or I don't know what. Then we're making sure that you're aware we're gonna run the code, and then once we run, we show if it passes or fails.I hope that we'll have one fail. But I'm not sure it's that interesting.
So I'll go like to another example soon, but just to show you what's going on here, we actually give an example of what the problem is. We give the log of the error and then you can do whatever you want.You can fix it by yourself, or you can click reflect and fix, and what's going on right now is a bit of a longer process where we do like chain of thought, or reflect and fix. And we can suggest a solution. You can run it, and in this case it passes. Just an example, this is a very simple example.Maybe later I'll show you a bug. I think I'll do that, and I'll show you a bug and how we recognize that actually it's not a problem in the test, it's a problem in the code, and then suggest you fix the code instead of the test. I think you see where I'm getting at.The other thing is that there are a few code suggestions, and there could be a dozen types that could be related to performance, modularity, or, as I see in this case, maintainability.There could also be vulnerabilities or best practices or even suggestions for bugs. Like if we noticed, if we think one of the tests, for example, is failing because of a bug, it's just presented in the code suggestions. Probably you can choose a few, for example, if you like, and then prepare a code change, like I didn't show you exactly which. We're making a diff now that you can apply on your code. So basically what we're seeing here is that [00:13:30] there are three main tabs: the code, the tests, and the code analysis. Let's call it spec.And then there's a fourth tab, which is code suggestions, if you wanna look at analytics, etc. Mm-hmm. Right now, okay, this is the change, or quite a big change, I probably clicked on something. So that's the basic demo.Right now let's be frank. Like I wanted to show like a simple example. So it's a call center. All the inputs to the class are like relatively simple. There is no JSON input, like if you're Expedia or whatever, you have a JSON with the hotels, Airbnb, you know, so the tests will be almost like too simple or not covering enough, your code, if you don't provide it with some input that is valuable, like a JSON with all the information, or YAML or whatever. So you can actually add input data, and the AI or model (it's actually, by the way, a set of models and algorithms) will use that input to create interesting tests. And another thing is many people have some reference tests that they already made. It could be because they already made it, or because they want something very specific, they have in mind how they imagine the test. So they just write one, and then you add a reference and that will inspire all the rest of the tests. And also you can give like hints. [00:15:00] This is, by the way, planned to be like dynamic hints, like for different types of code we will provide different hints. So we can help you become a bit more knowledgeable about how to test your code. So you can ask for having a given-when-then, or you can have a funny pirate one, like make a different joke for each test, for example.Swyx: I'm curious, why did you choose that one? This is the pirate one. Yeah. Interesting choice to put on your product.Itamar: It could be like 11:00 PM of people sitting around. Let's choose one funny thing.Swyx: And yeah. So two serious ones and one funny one. Yeah.
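(For readers who haven't seen the "given-when-then" style Itamar is hinting at, here is a small illustration written as a plain pytest test. The CallCenter class is our own toy stand-in for the one in the demo, so the class and its behaviour are hypothetical, not Codium AI's output.)

```python
# A tiny given-when-then style test for a toy call center class.

class CallCenter:
    def __init__(self, employees: int):
        self.employees = employees
        self.active_calls: list[str] = []

    def receive_call(self, caller: str) -> str:
        if len(self.active_calls) < self.employees:
            self.active_calls.append(caller)
            return "connected"
        return "on hold"


def test_caller_is_put_on_hold_when_all_employees_are_busy():
    # Given: a call center with a single employee who is already on a call
    center = CallCenter(employees=1)
    center.receive_call("alice")
    # When: a second caller rings in
    status = center.receive_call("bob")
    # Then: the new caller is put on hold
    assert status == "on hold"
```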
Just for the listening audience, can you read out the other hints that you decided on as well?Itamar: Yeah, so specifically, like for this case, relatively very simple class, so there's not much to do, but I'm gonna go to one more thing here on the configuration. But it basically is given when then style, it's one of the best practices and tests. So even when I report a bug, for example, I found a bug when someone else code, usually I wanna say like, given, use this environment or use that this way when I run this function, et cetera.Oh, then it's a very, very full report. And it's very common to use that in like in unit test and perform.Swyx: I have never been shown this format.Itamar: I love that you, you mentioned that because if you go to CS undergrad you take so many courses in development, but none of them probably in testing, and it's so important. So why would you, and you don't go to Udemy or [00:16:30] whatever and, and do a testing course, right? Like it's, it's boring. Like people either don't do component level testing because they hate it or they do it and they hate it. And I think part of it it's because they're missing tool to make it fun.Also usually you don't get yourself educated about it because you wanna write your code. And part of what we're trying to do here is help people get smarter about testing and make it like easy. So this is like very common. And the idea here is that for different type of code, we'll suggest different type of hints to make you more knowledgeable.We're doing it on an education app, but we wanna help developers become smarter, more knowledgeable about this field. And another one is mock. So right now, our model decided that there's no need for mock here, which is a good decision. But if we would go to real world case, like, I'm part of AutoGPT community and there's all of tooling going on there. Right? And maybe when I want to test like a specific component, and it's relatively clear that going to the web and doing some search and coming back, I don't really need to do that. Like I know what I expect to do and so I can mock that part of using to crawl the web.A certain percentage of accuracy, like around 90, we will decide this is worth mocking and we will inject it. I can click it now and force our system to mock this. But you'll see like a bit stupid mocking because it really doesn't make sense. So I chose this pirate stuff, like add funny pirate like doc stringing make a different joke for each test.And I forced it to add mocks, [00:18:00] the tests were deleted and now we're creating six new tests. And you see, here's the shiver me timbers, the test checks, the call successful, probably there's some joke at the end. So in this case, like even if you try to force it to mock it didn't happen because there's nothing but we might find here like stuff that it mock that really doesn't make sense because there's nothing to mock here.So that's one thing I. I can show a demo where we actually catch a bug. And, and I really love that, you know how it is you're building a developer tools, the best thing you can see is developers that you don't know giving you five stars and sharing a few stuff.We have a discord with thousands of users. But I love to see the individual reports the most. This was one of my favorites. It helped me to find two bugs. I mentioned our vision is to reach zero bugs. Like, if you may say, we want to clean the internet from bugs.Swyx: So debugging the internet. 
I have my podcast title.Itamar: So, so I think like if we move to another exampleSwyx: Yes, yes, please, please. This is great.Itamar: I'm moving to a different example, it is the bank account. By the way, if you go to ChatGPT and, and you can ask me what's the difference between Codium AI and using ChatGPT.Mm-hmm. I'm, I'm like giving you this hard question later. Yeah. So if you ask ChatGPT give me an example to test a code, it might give you this bank account. It's like the one-on-one stuff, right? And one of the reasons I gave it, because it's easy to inject bugs here, that's easy to understand [00:19:30] anyway.And what I'm gonna do right now is like this bank account, I'm gonna change the deposit from plus to minus as an example. And then I'm gonna run code similarly to how I did before, like it suggests to do that for the entire class. And then there is the code analysis soon. And when we announce very soon, part of this podcast, it's going to have more features here in the code analysis.We're gonna talk about it. Yep. And then there is the test that I can run. And the question is that if we're gonna catch the bag, the bugs using running the test, Because who knows, maybe this implementation is the right one, right? Like you need to, to converse with the developer. Maybe in this weird bank, bank you deposit and, and the bank takes money from you.And we could talk about how this happens, but actually you can see already here that we are already suggesting a hint that something is wrong here and here's a suggestion to put it from minus to to plus. And we'll try to reflect and, and fix and then we will see actually the model telling you, hey, maybe this is not a bug in the test, maybe it's in the code.Swyx: I wanna stay on this a little bit. First of all, this is very impressive and I think it's very valuable. What user numbers can you disclose, you launched it and then it's got fairly organic growth. You told me something off the air, but you know, I just wanted to show people like this is being adopted in quite a large amount.Itamar: [00:21:00] First of all, I'm a relatively transparent person. Like even as a manager, I think I was like top one percentile being transparent in Alibaba. It wasn't five out of five, which is a good thing because that's extreme, but it was a good, but it also could be a bad, some people would claim it's a bad thing.Like for example, if my CTO in Alibaba would tell me you did really bad and it might cut your entire budget by 30%, if in half a year you're not gonna do like much better and this and that. So I come back to a team and tell 'em what's going on without like trying to smooth thing out and we need to solve it together.If not, you're not fitting in this team. So that's my point of view. And the same thing, one of the fun thing that I like about building for developers, they kind of want that from you. To be transparent. So we are on the high numbers of thousands of weekly active users. Now, if you convert from 50,000 downloads to high thousands of weekly active users, it means like a lot of those that actually try us keep using us weekly.I'm not talking about even monthly, like weekly. And that was like one of their best expectations because you don't test your code every day. Right now, you can see it's mostly focused on testing. So you probably test it like once a week. Like we wanted to make it so smooth with your development methodology and development lifecycle that you use it every day.Like at the moment we hope it to be used weekly. 
And that's what we're getting. And the growth is about like every two, three weeks we double the amount of weekly actives and downloads. It's still very early, like seven weeks. So I don't know if it'll keep that way, but we hope so. Well, [00:22:30] actually I hope that it'll be much more than double every two, three weeks, maybe. Thanks to the podcast.Swyx: Well, yeah, we'll add, you know, a few thousand hopefully. The reason I ask this is because I think there's a lot of organic growth, that people are sharing it with their friends, and also I think you've also learned a lot from your earliest days in the private beta test.Like what have you learned since launching about how people want to use these testing tools?Itamar: One thing I didn't share with you is like, when you say virality, there is like inter virality and intra virality. Okay. Like within the company and outside the company. So which teams are using us? I can't say, but I can tell you that a lot of San Francisco companies are using us.And one of the things I'm really surprised by is that with one team, I saw one user two weeks ago, I was so happy, and then I came yesterday and I saw 48 from that company. So what I'm trying to say, to be frank, is that we see more intra virality right now than inter virality. I don't see like video being shared all around Twitter, see what's going on here. Yeah. But I do see, like, people share within the company: you need to use it because it's really helpful with productivity. And it's something that we will work on, the [00:24:00] inter virality.But to be frank, first I wanna make sure that it's helpful for developers. So I care more about intra virality, and that we see working really well, because that means that the tool is useful. So me telling my colleague is one thing; sharing it on Twitter means that I also feel that it will make me look cool, and that's something that maybe we still need, like, to test.Swyx: You know, I don't, well, you're working on that. We're gonna announce something like that. Yeah. You are generating these tests, you know, based on what I saw there. You're generating these tests basically based on the name of the functions. And the doc strings, I guess?Itamar: So I think like if you obfuscate the entire code, like our accuracy will drop by 50%. So that's right. We're using a lot of hints that you see there. Like for example, the function name, the docstring, the variable names, et cetera. It doesn't have to be perfect, but it has a lot of hints.By the way, in some cases, in the code suggestions, we will actually suggest renaming some of the stuff that we think will help us. Like there's a renaming suggestion, for example. Usually in this case, instead of calling this variable “client”, of course you'll see “preferred client”, because basically it gives a different commission for that.So we do suggest it, because if you accept it, it also means it will be easier for our model or system to keep improving.Swyx: Is that a different model?Itamar: Okay. That brings us a bit to the topic of model properties. Yeah. I'll share it really quickly, because it's relevant, though it might take us a bit off road.I think [00:25:30] like different models are better on different properties, for example, how obedient you are to instructions, how good you are at prompt forcing, like format forcing.
I want the results to be in a certain format or how accurate you are or how good you are in understanding code.There's so many calls happening here to models by the way. I. Just by clicking one, Hey Codium AI. Can you help me with this bank account? We do a dozen of different calls and each feature you click could be like, like with that reflect and fix and then like we choose the, the best one.I'm not talking about like hundreds of models, but we could, could use different APIs of open AI for example, and, and other models, et cetera. So basically like different models are better on different aspect. Going back to your, what we talked about, all the models will benefit from having those hints in, in the code, that rather in the code itself or documentation, et cetera.And also in the code analysis, we also consider the code analysis to be the ground truth to some extent. And soon we're also going to allow you to edit it and that will use that as well.Alessio: Yeah, maybe talk a little bit more about. How do I actually get all these models to work together? I think there's a lot of people that have only been exposed to Copilot so far, which is one use case, just complete what I'm writing. You're doing a lot more things here. A lot of people listening are engineers themselves, some of them build these tools, so they would love to [00:27:00] hear more about how do you orchestrate them, how do you decide which model the what, stuff like that.Itamar: So I'll start with the end because that is a very deterministic answer, is that we benchmark different models.Like every time this there a new model in, in town, like recently it's already old news. StarCoder. It's already like, so old news like few days ago.Swyx: No, no, no. Maybe you want to fill in what it is StarCoder?Itamar: I think StarCoder is, is a new up and coming model. We immediately test it on different benchmark and see if, if it's better on some properties, et cetera.We're gonna talk about it like a chain of thoughts in different part in the chain would benefit from different property. If I wanna do code analysis and, and convert it to natural language, maybe one model would be, would be better if I want to output like a result in, in a certain format.Maybe another model is better in forcing the, a certain format you probably saw on Twitter, et cetera. People talk about it's hard to ask model to output JSON et cetera. So basically we predefine. For different tasks, we, we use different models and I think like this is for individuals, for developers to check, try to sync, like the test that now you are working on, what is most important for you to get, you want the semantic understanding, that's most important? You want the output, like are you asking for a very specific [00:28:30] output?It's just like a chat or are you asking to give a output of code and have only code, no description. Or if there's a description of the top doc string and not something else. And then we use different models. We are aiming to have our own models in in 2024. Being independent of any other third party, like OpenAI or so, but since our product is very challenging, it has UI/UX challenges, engineering challenge, statical and dynamical analysis, and AI.As entrepreneur, you need to choose your battles. And we thought that it's better for us to, to focus on everything around the model. And one day when we are like thinking that we have the, the right UX/UI engineering, et cetera, we'll focus on model building. 
This is also, by the way, what we did in Alibaba.Even when I had like half a million dollars a month for training one foundational model, I would never start this way. You always try first using the best model you can for your product. Then understanding what's the glass ceiling for that model, then fine-tune a foundation model, reach a higher glass ceiling, and then train your own.That's what we're aiming for, and that's what I suggest to other developers: don't necessarily take a model and say, oh, it's so easy these days to do RLHF, et cetera. Like I see it's like only $600. Yeah, but what are you trying to optimize for? The properties. Don't try out certain models first; organize your challenges.Understand the [00:30:00] properties you're aiming for and start playing with that. And only then go to train your own model.Alessio: Yeah. And when you say benchmark, you know, we did a one hour long episode on some benchmarks, there's like many of them. Are you building some unique evals for like your own problems? Like how are you doing that? And that's also work for your future model building, obviously, having good benchmarks. Yeah.Itamar: Yeah. That's very interesting. So first of all, with all the respect, I think like we're dealing with ML benchmarks for hundreds of years now.I'm kidding. But like for tens of years, right? Benchmarking statistical creatures is something that we're doing for a long time. I think what's new here is the generative part. It's an open challenge to some extent. And therefore, like maybe we need to rethink some of the ways we benchmark.And one of the notions that I really believe in, I don't have a proof for that, is like create a benchmark in levels. Let's say you create a benchmark from level one to 10, and it's a property-based benchmark. Let's say I have a WebGPT, I ask something from the internet and then it should fetch it for me.So challenge level one could be, I'm asking it and it brings me something. Level number two could be, I'm asking it and it has a certain structure. Let's say, for example, I want to test AutoGPT. Okay. And I'm asking it to summarize what's the best cocktail I could have for this season in San Francisco.So [00:31:30] I would expect, for example, for that model to go, this is what I think, to search the internet and do a certain thing. So level number three could be that I want to check that, as part of this request, it uses certain tools. Level five, you can add to that: I expect that it'll bring me back something with relevance. And level nine, it actually prints the cocktail for me, I taste it and it's good. So I think like how I see it is we need to have data sets similar to before, and make sure that we're not fine-tuning the model the same way we test it. So we have some challenges that we fine-tune over, right? And a few challenges that we don't.And the new concept, maybe, is having those levels, which are property-based, which is something that we know from software testing and less from ML. And this is where I think that these two concepts merge.Swyx: Maybe Codium can do ML testing in the future as well.Itamar: Yeah, that's a good idea.Swyx: Okay. I wanted to cover a little bit more about Codium in the present and then we'll go into the slides that you have.So you have some UI/UX stuff, and obviously VS Code has the majority market share of IDEs at this point, but you also have IntelliJ right?Itamar: JetBrains in general.Swyx: Yeah. Anything that you learned supporting JetBrains stuff?
You were very passionate about this one user who left you a negative review.What is the challenge of that? Like how do you think about the market, you know, maybe you should focus on VS Code since it's so popular?Itamar: Yeah. [00:33:00] So currently the VS Code extension is leading over JetBrains. And we were for a long time and, and like when I tell you long time, it could be like two or three weeks with version oh 0.5, point x something in, in VS code, although oh 0.4 or so a jet brains, we really saw the difference in, in the how people react.So we also knew that oh 0.5 is much more meaningful and one of the users left developers left three stars on, on jet brands and I really remember that. Like I, I love that. Like it's what do you want to get at, at, at our stage? What's wrong? Like, yes, you want that indication, you know, the worst thing is getting nothing.I actually, not sure if it's not better to get even the bad indication, only getting good ones to be re frank like at, at, at least in our stage. So we're, we're 9, 10, 10 months old startup. So I think like generally speaking We find it easier and fun to develop in vs code extension versus JetBrains.Although JetBrains has like very nice property, when you develop extension for one of the IDEs, it usually works well for all the others, like it's one extension for PyCharm, and et cetera. I think like there's even more flexibility in the VS code. Like for example, this app is, is a React extension as opposed that it's native in the JetBrains one we're using. What I learned is that it's basically is almost like [00:34:30] developing Android and iOS where you wanna have a lot of the best practices where you have one backend and all the software development like best practices with it.Like, like one backend version V1 supports both under Android and iOS and not different backends because that's crazy. And then you need all the methodology. What, what means that you move from one to 1.1 on the backend? What supports whatnot? If you don't what I'm talking about, if you developed in the past, things like that.So it's important. And then it's like under Android and iOS and, and you relatively want it to be the same because you don't want one developer in the same team working with Jet Brains and then other VS code and they're like talking, whoa, that's not what I'm seeing. And with code, what are you talking about?And in the future we're also gonna have like teams offering of collaboration Right now if you close Codium Tab, everything is like lost except of the test code, which you, you can, like if I go back to a test suite and do open as a file, and now you have a test file with everything that you can just save, but all the goodies here it's lost. One day we're gonna have like a platform you can save all that, collaborate with people, have it part of your PR, like have suggested part of your PR. And then you wanna have some alignment. So one of the challenges, like UX/UI, when you think about a feature, it should, some way or another fit for both platforms be because you want, I think by the way, in iOS and Android, Android sometimes you don't care about parity, but here you're talking about developers that might be on the same [00:36:00] team.So you do care a lot about that.Alessio: Obviously this is a completely different way to work for developers. I'm sure this is not everything you wanna build and you have some hint. 
So maybe take us through what you see the future of software development looking like.Itamar: Well, that's great, and also like related to our announcement, what we're working on.Part of it you already started seeing in my demo before, but now I'll put it into a framework. I'll be clearer. So I think like the software development world in 2025 is gonna look very different from 2020. Very different. By the way, I think 2020 is different from 2000. I did web development in '95, so I needed to use GeoCities and things like that. Today it's much easier to build a web app and whatever, on one of the clouds. So, but I think 2025 is gonna look very different from 2020 for the traditional coding. And that's like a paradigm that I don't think has changed too much in the last few years. And I'm gonna go over that when I'm talking about it, so just to focus, I'm gonna show you how I think the intelligent software development world will look, but I'm gonna put it in the lens of Codium AI.We are focused on code integrity. We care that, with all this advancement of code generation, et cetera, we wanna make sure that developers can code fast with confidence. That they have confidence in the generated code, in the AI that they are using. That's our focus. So I'm gonna put that lens on when I'm going to explain.So I think traditional development today works like this: creating some spec, which for different companies, [00:37:30] different development teams could mean something else, could be something on Figma, something on Google Docs, something on Jira. And then usually you jump directly to code implementation. And then if you have the time or patience, or will, you do some testing.And I think like some people would say that it's better to do TDD, like not everyone. Some would say: write a spec, write your tests, make sure they do not pass, write your implementation until your tests pass. Most people do not practice it, I think, for just a few reasons; let me mention two.One, it's tedious, and I wanna write my code before I write my tests. And the second is, I think we're missing tools to make it possible. And what we are advocating, what I'm going to explain, is actually neither. Okay. I want to say it's very important. So here's how we think that the future development pipeline or process is gonna look.I'm gonna do it in steps. So, first thing, I wanna say that there are gonna be coding assistants and coding agents. An assistant is like Copilot, for example, and an agent is something that you give a goal or a task, and it actually chains a few tasks together to complete your goal.Let's have that in mind. So what's happening right now, what you saw in our demo, what I presented a few minutes ago, is that you start with an implementation and we create a spec for you and tests for you. And that was like an agent, like you didn't converse with it, you just [00:39:00] click a button.And we did a chain of thought to create these, that's why it's an agent. And then we gave you an assistant to change tests, like you can converse with it, et cetera. So that's like what I presented today. What we're announcing is about a vision that we call DRY: don't repeat yourself. I'm gonna get to that when I'm gonna show you the entire vision. But first I wanna show you an intermediate step, what we're going to release. So right now you can write your code.
Or part of it, like for example, just a class abstract or so, with a coding assistant like Copilot, and maybe in the future, like a Codium AI coding assistant.And then you can create a spec, which I already presented to you. And the next thing is that you're going to have like a spec assistant to generate a technical spec, helping you fill it quickly, focused on that. And this is something that we're working on, and we're going to release the first feature very soon as part of the announcement.And it's gonna be very lean. Okay? We're a startup that's going bottom up, like lean features going to more and more comprehensive ones. And then once you have the spec and implementation, you can either, from the implementation, have tests, and then you can run the tests and fix them like I presented to you.But you can also, from the spec, create tests, okay? From the spec directly to tests. [00:40:30]So now you have a really interesting thing going on here: you can start from spec, create tests, create code. You can start from tests, create code. You can start from an implementation: from code, create spec and tests. And actually we think the future is a very flexible one. You don't need to choose whether you're practicing traditional TDD or whatever; you start with what you want.If you already have some spec that was created, because at one time, in one sprint, you decided to write a spec to align on it with your team, et cetera, now you can go and create tests and implementation. Or you wanted to run ahead and write your code; creating tests and a spec that aligns to it will be relatively easy.So what I'm talking about is an extreme DRY concept; DRY is don't repeat yourself. Until today, when we talked about DRY it was like, don't repeat your code. I claim that there are big parts of the spec, tests, and implementation that repeat themselves, but it's not a complete repetition, because if the spec was as detailed as the implementation, it's actually the implementation.But the spec is usually in a different language, could be natural language and visual. And what we're aiming for, our vision, is enabling the DRY concept to the extreme, with all these three: you write your tests, that will help you generate the code and the spec; you write your spec, that will help you do the tests and implementation.Now the developer is the driver, okay? You'll have a lot [00:42:00] of like, what do you think about this? This is what you meant? Yes, no, you wanna fix the code or the test, click yes or no. But you'll still be the driver. But there's gonna be like extreme automation on the DRY level. So that's what we're announcing, that we're aiming for as our vision, and what we're providing these days in our product is the middle, what you see in the middle, which is our code integrity agents working for you right now in your IDE, but soon also as part of your GitHub Actions, et cetera, helping you to align all these three.Alessio: This is great. How do you reconcile the difference in languages? You know, a lot of times the spec is maybe from a PM, or somebody who's more at the product level.Some of the implementation details are like backend developers for something, frontend for something. How do you help translate the language between the two? And then I think in one of the blog posts on your blog, you mentioned that this is also changing maybe how programming languages themselves work. How do you see that change in the future?
Like, are people gonna start From English, do you see a lot of them start from code and then it figures out the English for them?Itamar: Yeah. So first of all, I wanna say that although we're working, as we speak on managing we front-end frameworks and languages and usage, we are currently focused on the backend.So for example, as the spec, we won't let you input Figma, but don't be surprised if in 2024 the input of the spec could be a Figma. Actually, you can see [00:43:30] demos of that on a pencil drawing from OpenAI and when he exposed the GPT-4. So we will have that actually.I had a blog, but also I related to two different blogs. One, claiming a very knowledgeable and respectful, respectful person that says that English is going to be the new language program language and, and programming is dead. And another very respectful person, I think equally said that English is a horrible programming language.And actually, I think both of are correct. That's why when I wrote the blog, I, I actually related, and this is what we're saying here. Nothing is really fully redundant, but what's annoying here is that to align these three, you always need to work very hard. And that's where we want AI to help with. And if there is inconsistency will raise a question, what do, which one is true?And just click yes or no or test or, or, or code that, that what you can see in our product and we'll fix the right one accordingly. So I think like English and, and visual language and code. And the test language, let's call it like, like that for a second. All of them are going to persist. And just at the level of automation aligning all three is what we're aiming for.Swyx: You told me this before, so I I'm, I'm just actually seeing Alessio's reaction to it as a first time.Itamar: Yeah, yeah. Like you're absorbing like, yeah, yeah.Swyx: No, no. This is, I mean, you know, you can put your VC hat on or like compare, like what, what is the most critical or unsolved question presented by this vision?Alessio: A lot of these tools, especially we've seen a lot in the past, it's like the dynamic nature of a lot of this, you know?[00:45:00] Yeah. Sometimes, like, as you mentioned, sometimes people don't have time to write the test. Sometimes people don't have time to write the spec. Yeah. So sometimes you end up with things. Out of sync, you know? Yeah. Or like the implementation is moving much faster than the spec, and you need some of these agents to make the call sometimes to be like, no.Yeah, okay. The spec needs to change because clearly if you change the code this way, it needs to be like this in the future. I think my main question as a software developer myself, it's what is our role in the future? You know? Like, wow, how much should we intervene, where should we intervene?I've been coding for like 15 years, but if I've been coding for two years, where should I spend the next year? Yeah. Like focus on being better at understanding product and explain it again. Should I get better at syntax? You know, so that I can write code. Would love have any thoughts.Itamar: Yeah. You know, there's gonna be a difference between 1, 2, 3 years, three to six, six to 10, and 10 to 20. Let's for a second think about the idea that programming is solved. Then we're talking about a machine that can actually create any piece of code and start creating, like we're talking about singularity, right?Mm-hmm. If the singularity happens, then we're talking about this new set of problems. Let's put that aside. 
Like even if it happens in 2041, that's my prediction. I'm not sure like you should aim for thinking what you need to do, like, or not when the singularity happens. So I, [00:46:30] I would aim for mm-hmm.Like thinking about the future of the next five years or or, so. That's my recommendation because it's so crazy. Anyway. Maybe not the best recommendation. Take that we're for grain of salt. And please consult with a lawyer, at least in the scope of, of the next five years. The idea that the developers is the, the driver.It actually has like amazing team members. Agents that working for him or her and eventually because he or she's a driver, you need to understand especially what you're trying to achieve, but also being able to review what you get. The better you are in the lower level of programming in five years, it it mean like real, real program language.Then you'll be able to develop more sophisticated software and you will work in companies that probably pay more for sophisticated software and the more that you're less skilled in, in the actual programming, you actually would be able to be the programmer of the new era, almost a creator. You'll still maybe look on the code levels testing, et cetera, but what's important for you is being able to convert products, requirements, et cetera, to working with tools like Codium AI.So I think like there will be like degree of diff different type developers now. If you think about it for a second, I think like it's a natural evolution. It's, it's true today as well. Like if you know really good the Linux or assembly, et cetera, you'll probably work like on LLVM Nvidia [00:48:00] whatever, like things like that.Right. And okay. So I think it'll be like the next, next step. I'm talking about the next five years. Yeah. Yeah. Again, 15 years. I think it's, it's a new episode if you would like to invite me. Yeah. Oh, you'll be, you'll be back. Yeah. It's a new episode about how, how I think the world will look like when you really don't need a developer and we will be there as Cody mi like you can see.Mm-hmm.Alessio: Do we wanna dive a little bit into AutoGPT? You mentioned you're part of the community. Yeah.Swyx: Obviously Try, Catch, Finally, Repeat is also part of the company motto.Itamar: Yeah. So it actually really. Relates to what we're doing and there's a reason we have like a strong relationship and connection with the AutoGPT community and us being part part of it.So like you can see, we're talking about agent for a few months now, and we are building like a designated, a specific agent because we're trying to build like a product that works and gets the developer trust to have developer trust us. We're talking about code integrity. We need it to work. Like even if it will not put 100% it's not 100% by the way our product at all that UX/UI should speak the language of, oh, okay, we're not sure here, please take the driving seat.You want this or that. But we really not need, even if, if we're not close to 100%, we still need to work really well just throwing a number. 90%. And so we're building a like really designated agents like those that from code, create tests.So it could create tests, run them, fix them. It's a few tests. 
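(Reader's aside: the "create tests, run them, fix them" loop Itamar describes can be pictured roughly like the sketch below. This is our own simplification, not Codium AI's implementation; the two model calls are hypothetical stubs standing in for LLM requests, and pytest is assumed to be installed.)

```python
# Rough sketch of a designated test agent: generate tests, run them with pytest,
# and reflect-and-fix failures for a few rounds. The model calls are stubbed.
import subprocess
import tempfile
from pathlib import Path


def generate_tests(source_code: str) -> str:
    """Hypothetical stand-in for an LLM call that writes tests for `source_code`."""
    return "def test_placeholder():\n    assert True\n"


def reflect_and_fix(test_code: str, error_log: str) -> str:
    """Hypothetical stand-in for an LLM call that repairs a failing test,
    or flags that the bug is in the application code rather than the test."""
    return test_code  # no-op in this sketch


def run_pytest(test_code: str) -> tuple[bool, str]:
    with tempfile.TemporaryDirectory() as tmp:
        test_file = Path(tmp) / "test_generated.py"
        test_file.write_text(test_code)
        proc = subprocess.run(["pytest", "-q", str(test_file)],
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr


def test_agent(source_code: str, max_rounds: int = 3) -> str:
    tests = generate_tests(source_code)
    for _ in range(max_rounds):
        passed, log = run_pytest(tests)
        if passed:
            break
        tests = reflect_and_fix(tests, log)   # the reflect-and-fix step from the demo
    return tests
```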
So we really believe in that we're [00:49:30] building a designated agent while Auto GPT is like a swarm of agents, general agents that were supposedly you can ask, please make me rich or make me rich by increase my net worth.Now please be so smart and knowledgeable to use a lot of agents and the tools, et cetera, to make it work. So I think like for AutoGPT community was less important to be very accurate at the beginning, rather to show the promise and start building a framework that aims directly to the end game and start improving from there.While what we are doing is the other way around. We're building an agent that works and build from there towards that. The target of what I explained before. But because of this related connection, although it's from different sides of the, like the philosophy of how you need to build those things, we really love the general idea.So we caught it really early that with Toran like building it, the, the maker of, of AutoGPT, and immediately I started contributing, guess what, what did I contribute at the beginning tests, right? So I started using Codium AI to build tests for AutoGPT, even, even finding problems this way, et cetera.So I become like one of the, let's say 10 contributors. And then in the core team of the management, I talk very often with with Toran on, on different aspects. And we are even gonna have a workshop,Swyx: a very small [00:49:00] meetingItamar: work meeting workshop. And we're going to compete together in a, in a hackathons.And to show that AutoGPT could be useful while, for example, Codium AI is creating the test for it, et cetera. So I'm part of that community, whether is my team are adding tests to it, whether like advising, whether like in in the management team or whether to helping Toran. Really, really on small thing.He is the amazing leader like visionaire and doing really well.Alessio: What do you think is the future of open source development? You know, obviously this is like a good example, right? You have code generating the test and in the future code could actually also implement the what the test wanna do. So like, yeah.How do you see that change? There's obviously not enough open source contributors and yeah, that's one of the, the main issue. Do you think these agents are maybe gonna help us? Nadia Eghbal has this great book called like Working in Public and there's this type of projects called Stadium model, which is, yeah, a lot of people use them and like nobody wants to contribute to them.I'm curious about, is it gonna be a lot of noise added by a lot of these agents if we let them run on any repo that is open source? Like what are the contributing guidelines for like humans versus agents? I don't have any of the answers, but like some of the questions that I've been thinking about.Itamar: Okay. So I wanna repeat your question and make sure I understand you, but like, if they're agents, for example, dedicated for improving code, why can't we run them on, mm-hmm.Run them on like a full repository in, in fixing that? The situation right now is that I don't think that right now Auto GPT would be able to do that for you. Codium AI might but it's not open sourced right now. And and like you can see like in the months or two, you will be able to like running really quickly like development velocity, like our motto is moving fast with confidence by the way.So we try to like release like every day or so, three times even a day in the backend, et cetera. 
And we'll develop more features, enabling you, for example, to run on an entire repo, but it's not open source. So about the open source, I think like AutoGPT or LangChain, you can't really ask: please improve my repository, make it better.I don't think it will work right now, because, let me softly quote Ilya from OpenAI: he said, like, right now, let's say that a certain LLM is 95% accurate. Now you're concatenating the results, so the accuracy at some point is decaying. And what you need is like more engineering frameworks and work to be done there in order to be able to deal with inaccuracies, et cetera.And that's what we specialize in at Codium. But I wanna say that I'm not saying that AutoGPT won't be able to get there. Like the more tools that are going to be added, the [00:52:30] more prompt engineering that is dedicated for this idea will be added. By the way, I'm talking with Toran about Codium, for example, being one of the agents for AutoGPT.Think about it: AutoGPT is there for any goal, like increase my net worth, but not focused, as we are, on fixing or improving code. We might be another agent, by the way. We might also be, we're working on it, a plugin for ChatGPT. We're actually almost finished with it. So that's like I think how it's gonna be done.Again, open source is not something we're thinking about. We want it to be really good before we open source it.Swyx: That was all very impressive. Your vision is actually very encouraging as well, and I'm very excited to try it out myself. I'm just curious on the Israel side of things, right? Like you're visiting San Francisco for a two-week trip for this special program you can tell us about. But also I think a lot of American developers have heard that, you know, Israel has a really good tech scene. Mostly it's just security startups. You know, "I was in some special unit in the IDF, and, you know, I come out and I'm doing the same thing again, but, you know, for enterprises." But maybe just describe for the rest of the world, what is the Israeli tech scene like? What is this program that you're on, and what should people know?Itamar: So I think like Israel is the most condensed in startups per capita, I think we're number one really, or startups per square meter, I think we're number one as well. Because of these properties there is actually a very strong community, and like everyone around you is [00:57:00] either an entrepreneur or working in a startup. And when you go to the bar or the coffee, you hear, if it's 2021, people talking about secondaries; if it's 2023, talking about how amazing GenAI is. But everyone around you is in the scene. And that's like a lot of networking and data propagation, I think.Somewhat similar to the Bay Area in San Francisco here, and it helps, right. So I think that's one of our strong points. You mentioned some others. I'm not saying that it doesn't help. Yes. And being in the IDF, the army, at age 19, you go and start dealing with very advanced technology, and that helps a lot.And then going back to the community, this community is all over the world. And for example, there is this program called ICON.
It's basically Israelis in the Valley who created a program for Israelis from Israel to come, and it's called Silicon Valley 101, to learn what's going on here.Because with all the respect to the tech scene in Israel, here it's the real thing, right? So it's a non-profit organization by Israelis that moved here, that brings you in and then brings people from a16z, or Google, or Navon, or amazing people from unicorns or up-and-coming startups or accelerators, and gives you up-to-date talks and also connects you to relevant people.And that's why I'm here, in addition to, you know, [00:58:30] meeting you and participating in this amazing podcast, et cetera.Swyx: Yeah. Oh, well, I think there's a lot of exciting tech talent, you know, in Tel Aviv, and I'm glad that your company is Israeli.Itamar: I think one thing I wanted to say: like, yeah, of course, because of what we said, security is a very strong scene, but actually water purification, agriculture tech, there are a lot of other things, and usually it comes from necessity.Yeah. Like, a big part of our country, of our state, is like a desert. So there are other things. Like AI, by the way, is big also in Israel. Like, for example, I think there's an Israeli competitor to OpenAI. I'm not saying it's as big, but it's AI21, I think one of the 10 most profound research labs. For example, I love their work. Yeah.Swyx: I think we should try to talk to one of them. But yeah, when you and I met, we connected a little bit over Singapore, you know, I was in the Singapore Army, and the Israeli army.We do have a lot of connections between countries, and small countries that don't have a lot of natural resources have to make do in the world by figuring out some other services. I think the Singapore startup scene has not done as well as the Israeli startup scene. So I'm very interested in how small countries can have a world impact, essentially.Itamar: It's a question we're being asked a lot, like why. For example, let's go to the soft skills. I think, like, failing is not a bad thing. Yeah. Like, okay. Like sometimes VCs prefer to [01:00:00] put money on an entrepreneur that failed in their first startup over one that actually succeeded, because now that person is knowledgeable about what it means to fail and very hungry to succeed.So I think generally there are a few reasons, I think it's hard to put a finger on it exactly, but we talked about a few things. But one other thing, I think, is that failing is not, like... this is my fourth company. I did one, it wasn't a startup, it was a company, as a teenager. And then I had my first startup, my second company, that had an amazing run, but then a very beautiful collapse.And then my third company, my second startup, eventually exited successfully to Alibaba. So I think there's a lot of trial and error, which is being appreciated, not suppressed. I guess that's one of the reasons.Alessio: Wanna jump into the lightning round?Swyx: Yes. I think we sent you the prep, but there's just three questions now.We've actually reduced it quite a bit, but you have it.Alessio: So we can read them, and you can take time and answer. You don't have to right away.
First question: what's something that is already happening in AI that you thought would take much longer?Itamar: Okay, so I have to, I hope it doesn't sound arrogant,
We are excited to be the first podcast in the world to release an in-depth interview on the new SOTA in commercially licensed open source models - MosaicML MPT-7B!The Latent Space crew will be at the NYC Lux AI Summit next week, and have two meetups in June. As usual, all events are on the Community page! We are also inviting beta testers for the upcoming AI for Engineers course. See you soon!One of GPT-3's biggest limitations is context length - you can only send it up to 4000 tokens (3k words, 6 pages) before it throws a hard error, requiring you to bring in LangChain and other retrieval techniques to process long documents and prompts. But MosaicML recently open sourced MPT-7B, the newest addition to their Foundation Series, with context length going up to 84,000 tokens (63k words, 126 pages):This transformer model, trained from scratch on 1 trillion tokens of text and code (compared to 300B for Pythia and OpenLLaMA, and 800B for StableLM), matches the quality of LLaMA-7B. It was trained on the MosaicML platform in 9.5 days on 440 GPUs with no human intervention, costing approximately $200,000. Unlike many open models, MPT-7B is licensed for commercial use and it's optimized for fast training and inference through FlashAttention and FasterTransformer.They also released 3 finetuned models starting from the base MPT-7B: * MPT-7B-Instruct: finetuned on dolly_hhrlhf, a dataset built on top of dolly-5k (see our Dolly episode for more details). * MPT-7B-Chat: finetuned on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets.* MPT-7B-StoryWriter-65k+: it was finetuned with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. While 65k is the advertised size, the team has gotten up to 84k tokens in response when running on a single node of A100-80GB GPUs. ALiBi is the dark magic that makes this possible. Turns out The Great Gatsby is only about 68k tokens, so the team used the model to create new epilogues for it!On top of the model checkpoints, the team also open-sourced the entire codebase for pretraining, finetuning, and evaluating MPT via their new MosaicML LLM Foundry. The table we showed above was created using LLM Foundry's in-context-learning eval framework itself!In this episode, we chatted with the leads of MPT-7B at Mosaic: Jonathan Frankle, Chief Scientist, and Abhinav Venigalla, Research Scientist, who spearheaded the MPT-7B training run. We talked about some of the innovations they've brought into the training process to remove the need for 2am on-call PagerDutys, why the LLM dataset mix is such an important yet dark art, and why some of the traditional multiple-choice benchmarks might not be very helpful for the type of technology we are building.
Show Notes* Introducing MPT-7B* Cerebras* Lottery Ticket Hypothesis* Hazy Research* ALiBi* Flash Attention* FasterTransformer* List of naughty words for C4 https://twitter.com/code_star/status/1661386844250963972* What is Sparsity?* Hungry Hungry Hippos* BF16 FP
p.s. yes, MPT-7B really is codenamed LLongboi!
Timestamps* Introductions [00:00:00]* Intro to Mosaic [00:03:20]* Training and Creating the Models [00:05:45]* Data Choices and the Importance of Repetition [00:08:45]* The Central Question: What Mix of Data Sets Should You Use?
* Evaluation Challenges of LLMs [00:13:00]
* Flash Attention [00:16:00]
* Fine-tuning for Creativity [00:19:50]
* Open Source Licenses and Ethical Considerations [00:23:00]
* Training Stability Enhancement [00:25:15]
* Data Readiness & Training Preparation [00:30:00]
* Dynamic Real-time Model Evaluation [00:34:00]
* Open Science for Affordable AI Research [00:36:00]
* The Open Approach [00:40:15]
* The Future of Mosaic [00:44:11]
* Speed and Efficiency [00:48:01]
* Trends and Transformers [00:54:00]
* Lightning Round and Closing [01:00:55]
Transcript
Alessio: [00:00:00] Hey everyone. Welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. I'm joined by my co-host, Swyx, writer and editor of Latent Space. Swyx: Hey, and today we have Jonathan and Abhi from MosaicML. Welcome to our studio. Jonathan: Guys, thank you so much for having us. Thanks so much. Swyx: How's it feel? Jonathan: Honestly, I've been doing a lot of podcasts during the pandemic, and it has not been the same. Swyx: No, not the same actually. So you have on your bio that you're primarily based in Boston, Jonathan: New York. New York, yeah. My Twitter bio was a probability distribution over locations. Swyx: Exactly, exactly. So I DMd you because I was obviously very interested in MPT-7B, and I was like, for the 0.2% of the time that you're in San Francisco, can you please come to a podcast studio, and you're like, I'm there next week. Jonathan: Yeah, it worked out perfectly. Swyx: We're really lucky to have you. I'll read off a few intros that people should know about you and then you can fill in the blanks. So Jonathan, you did your BS and MS at Princeton in programming languages and then found your way into ML for your PhD at MIT, where you made a real splash with the lottery ticket hypothesis in 2018, which people can check out. I think you've done a few podcasts about it over the years; it's been highly influential, and we'll talk about sparse models at Mosaic. You have also had some side [00:01:30] quests. You taught programming for lawyers and you did some law and privacy stuff in, in DC and also did some cryptography stuff. Um, and you've been an assistant professor at Harvard before earning your PhD. Jonathan: I've yet to start. Swyx: You, you yet to start. Okay. But you just got your PhD. Jonathan: I technically just got my PhD. I was at Mosaic, which delayed my defense by about two years. It was, I was at 99% done for two years. Got the job at Harvard, Mosaic started, and I had better things to do than write my dissertation for two years. Swyx: You know, you know, this is very out of order. Jonathan: Like, oh, completely out of order, completely backwards. Go talk to my advisor about that. He's also an advisor at Mosaic and has been from the beginning. And, you know, go talk to him about finishing on time. Swyx: Great, great, great. And just to fill it out, Abhi, you did your BS and MS at MIT, you were a researcher at Cerebras, and you're now a research scientist at Mosaic. Just before we go into Mosaic stuff, I'm actually very curious about Cerebras and, uh, just that, that space in general. Um, what are they doing that people should know about? Abhinav: Yeah, absolutely. 
Um, I think the biggest thing about Cerebras is that they're really building, you know, kind of the next-gen computing platform beyond, like, GPUs. Um, they're trying to build a system that uses an entire wafer, you know, rather than cutting up a wafer into smaller chips, and trying to train a model on that entire system, or actually more recently on many such wafers. Um, so it's, and it's really extraordinary. I think it's like the first time ever that kind of wafer-scale computing has ever really worked. And so it's a really exciting time to be there, trying to figure out how we can map ML workloads to work, um, on a much, much bigger chip. Swyx: And do you use like [00:03:00] a different programming language or framework to do that? Or is that like.. Abhinav: Yeah, so I mean, things have changed a bit since I was there. I think, um, you can actually run just normal TensorFlow and PyTorch on there. Um, so they've built a kind of software stack that compiles it down. So it actually just kind of works naturally. But yeah. Jonathan: Compiled versions of Python is a hot topic at the moment with Mojo as well. Swyx: And then Mosaic, you, you spearheaded the MPT-7B effort.
INTRO TO MOSAIC [00:03:20]
Abhinav: Uh, yeah. Yeah, so it's kind of like, it's been maybe six months, 12 months in the making. We kind of started working on LLMs sort of back in the summer of last year. Um, and then we came out with this blog post where we kind of profiled a lot of LLMs and saw, hey, the cost of training is actually a lot lower than what people might think. Um, and then since then, you know, being inspired by kind of, you know, Meta's release of the LLaMA models and lots of other open source work, we kind of started working towards, well, what if we were to release a really good kind of 7 billion parameter model? And that's what MPT is. Alessio: You know, we mentioned some of the podcasts you had done, Jonathan. I think in one of them you mentioned Mosaic was not planning on building a model and releasing it, and obviously you eventually did. So what are some of the things that got you there? Maybe, obviously, LLaMA you mentioned was an inspiration. You now have both the training and, like, inference products that you offer. Was this more of a research challenge in a way, uh, that you wanted to do? Or how did the idea come to be? Jonathan: I think there were a couple of things. So we still don't have a first-class model. We're not an OpenAI where, you know, businesses come to use our one great model. Our business is built around customers creating their own models. But at the end of the day, if customers are gonna create their own models, we have to have the tools to help them do that, and to have the tools to help them do that and know that they work, we have to create our own models to start. We have to know that we can do something great if customers are gonna do something great. And one too many people may have challenged me on Twitter about the fact that, you know, Mosaic claims all these amazing numbers, but, you know, I believe, not to, you know, call out Ross Wightman here, but, you know, I believe he said at some point, you know, show us the pudding. Um, and so Ross, you know, please let me know how the pudding tastes. But in all seriousness, like, I think there is something, this is a demo in some sense. This is to say we did this in 9.5 days for a really reasonable cost, straight through, no intervention. 200K. Yep. 
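If you want to poke at the released checkpoints yourself, here is a minimal sketch of pulling MPT-7B off the Hugging Face Hub. It assumes the mosaicml/mpt-7b repo id and the GPT-NeoX tokenizer the model card points to; the custom architecture is loaded via trust_remote_code, so treat this as illustrative rather than canonical.

```python
# Minimal sketch: load the open MPT-7B base checkpoint and generate a few tokens.
# Assumes the `mosaicml/mpt-7b` Hub repo and the EleutherAI GPT-NeoX tokenizer.
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    torch_dtype="auto",       # use the checkpoint's native precision where supported
    trust_remote_code=True,   # MPT ships its own modeling code
)

inputs = tokenizer("MosaicML is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```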
Um, you can do this too. Swyx: Uh, and just to reference the numbers that you're putting out, this is the, the last year you were making a lot of noise for training GPT-3 under $450K, which is your, your initial estimate. Um, and then it went down to $100K, and Stable Diffusion $160K going down to less than $50K as well. Jonathan: So I will be careful about that $100K number. That's certainly the challenge I've given Abhi to hit. Oh, I wouldn't make the promise that we've hit it yet, but you know, it's certainly a target that we have. And I, you know, Abhi may kill me for saying this. I don't think it's crazy.
TRAINING AND CREATING THE MODELS [00:05:45]
Swyx: So we definitely want to get into, like, estimation math, right? Like what, what needs to happen for those big order-of-magnitude changes in, in infrastructure costs. But, uh, let's kind of stick to the MPT-7B story. Yeah. Tell us everything. Like you have, uh, three different models, one of them state of the art essentially on context length. Let's talk about the process of training them, the, uh, the decisions that you made. Um, I can go into, you know, individual details, but I just wanna let you, let you rip. Abhinav: Yeah, so I mean, I think, uh, we started off with the base model, which is kind of, for all practical purposes, a recreation of LLaMA 7B. Um, so it's a 7 billion parameter model trained on a trillion tokens. Um, and our goal was like, you know, we should do it efficiently. We should be able to do it, like, kind of hands-free so we don't have to babysit the runs as they're doing them. And it could be kind of a, a launching point for these fine-tuned models, and those fine-tuned models, you know, on, on the one hand they're kind of really fun for the community, like the story writer model, which has like a 65,000-length context window and you can even kind of extrapolate beyond that. Um, but they're, they're also kind of just inspirations really. So you could kind of start with an MPT-7B base and then build your own custom, you know, downstream. If you want a long-context code model, you could do that with our platform. If you wanted one that was for a particular language, you could do that too. But yeah, so we picked kind of the three variants, chat and instruct and story writer, just kind of like inspirations, looking at what people were doing in the community today. Yeah. Alessio: And what's the beginning of the math to come up with? You know, how many tokens you wanna train it on? How many parameters do you want in a model? 7 billion and 30 billion seem to be kind of like two of the magic numbers going around right now. Abhinav: Yeah, definitely. Definitely. Yeah, I think like there's sort of these scaling laws which kind of tell you how to best spend your training compute if that's all you cared about. So if you wanna spend $200,000 exactly in the most efficient way, there'd be a recipe for doing that. Um, and there we usually go by the Chinchilla laws. Now for these models, we actually didn't quite do that because we wanted to make sure that people could actually run these at home and that they [00:07:30] were good for inference. So we trained them kind of beyond those Chinchilla points so that we're almost over-training them. I think there's like a joke going on online that they're like longboi, and that, that came up internally because we were training them for really, really long durations. So that 7B model, the Chinchilla point might be 140 billion tokens. 
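For reference, here is the back-of-the-envelope version of what Abhinav is describing. The roughly 20-tokens-per-parameter rule of thumb and the standard 6·N·D estimate of training FLOPs are assumptions from the scaling-law literature, not numbers quoted in this conversation.

```python
# Sketch of the Chinchilla-style arithmetic for a 7B model.
params = 7e9                       # 7B parameters
chinchilla_tokens = 20 * params    # ~140B tokens: the rough compute-optimal point
mpt_tokens = 1e12                  # MPT-7B was trained on ~1T tokens instead

print(f"Chinchilla-optimal: ~{chinchilla_tokens / 1e9:.0f}B tokens")
print(f"MPT-7B:             ~{mpt_tokens / 1e9:.0f}B tokens "
      f"({mpt_tokens / chinchilla_tokens:.1f}x past the optimal point)")

# Classic C ~= 6 * N * D approximation for training compute.
flops = 6 * params * mpt_tokens    # ~4.2e22 FLOPs for the full run
print(f"Approx. training compute: {flops:.1e} FLOPs")
```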
Instead, we trained a trillion, so almost seven times longer than you normally would. Swyx: So longboi was the code name. So is it, is it the training method? Is it the scaling law that you're trying to coin, or is it the code name for the 64k? Jonathan: Uh, 64. It was just an internal joke for the, for training on way more tokens than you would via Chinchilla. Okay. Um, we can coin it longboi and it, it really stuck, but just so, you know, LLongboi is spelled with two Ls at the beginning. Yeah. Cause you know, we wanted the LLaMA thing in there as well. Jonathan: Yeah, yeah, yeah. Our darn CEO, we have to rein him in, that guy, you know, you can't, yeah. I'm gonna take away his Twitter password at some point. Um, but you know, he had to let that one out publicly. And then I believe there was a YouTube video where someone happened to see it mentioned before the model came out and called it the Long G boy or something like that. Like, so you know, now it's out there in the world. It's out there. It's like Sydney, can't put it back in. Swyx: There's a beautiful picture which I think Naveen tweeted out, which, um, shows a longboi on a whiteboard. Jonathan: That was the origin of longboi. In fact, the legs of the llama were the two Ls and the longboi.
DATA CHOICES AND THE IMPORTANCE OF REPETITION [00:08:45]
Swyx: Well, talk to me about your data choices, right? Like this is your passion project. Like what can you tell us about it? Jonathan: Yeah, I think Abhi wanted to kill me by the end for trying to use all the GPUs on data and none of them on actually training the model. Um, at the end of the day, we know that you need to train these models on [00:09:00] lots of data, but there are a bunch of things we don't know. Number one is what kinds of different data sources matter. The other is how much does repetition really matter? And really, kind of, repetition can be broken down into how much does quality versus quantity matter. Suppose I had the world's best 10 billion tokens of data. Would it be better to train on that a hundred times, or better to train on a trillion tokens of low-quality, fresh data? And obviously there's, there's a middle point in between. That's probably the sweet spot. But how do you even know what good quality data is? So, yeah, this is, nobody knows, and I think the more time I spent, we have a whole data team, so me and several other people, the more time that we spent on this, you know, I came away thinking, gosh, we know nothing. Gosh, if I were back in academia right now, I would definitely go and, you know, write a paper about this because I have no idea what's going on. Swyx: You would write a paper about it. I'm interested in such a paper. I haven't come across any that exists. Could you frame the central question of such a paper?
THE CENTRAL QUESTION: WHAT MIX OF DATA SETS SHOULD YOU USE? [00:10:00]
Jonathan: Yeah. The central question is what mix of data sets should you use? Okay. Actually I've, you know, you had mentioned my law school stuff. I went back to Georgetown Law, where I used to teach, um, in the midst of creating this model, and I actually sat down with a class of law students and asked them, I gave them our exact data sets, our data mixes, um, like how many tokens we had, and I said, create the best data set for your model. Knowing they knew nothing about large language models, they just know that data goes in and it's going to affect the behavior. Um, and I was like, create a mix, and they basically covered all the different trade-offs. 
Um, you probably want a lot of English language [00:10:30] text to start with. You get that from the web, but do you want it to be multilingual? If so, you're gonna have a lot less English text. Maybe it'll be worse. Do you wanna have code in there? There are all these beliefs that code leads to models being better at logical reasoning, of which I've seen zero evidence. Replit, um, I mean, really made a great code model, but code models leading to better chain-of-thought reasoning on the part of language models, or code being in the training set leading to better chain-of-thought reasoning? People claim this all the time, but I've still never seen any real evidence beyond that. You know, one of the generations of the GPT-3 model started supposedly from code-davinci. Yes. And so there's a belief that, you know, maybe that helped. But again, no evidence. You know, there's a belief that spending a lot of time on good sources like Wikipedia is good for the model. Again, no evidence. At the end of the day, we tried a bunch of different data mixes and the answer was that there are some that are better or worse than others. We did find that The Pile, for example, was a really solid data mix, but you know, there were stronger data mixes by our evaluation metrics. And I'll get back to the evaluation question in a minute cuz that's a really important one. This data set called C4, which is what the original T5 model was trained on, is weirdly good. And everybody, when I posted on this on Twitter, like Stella Biderman from Eleuther mentioned this, I think someone else mentioned this as well. C4 does really well in the metrics and we have no idea why. We de-duplicated it against our evaluation set, so it's not like it memorized the data; it is just one web scrape from 2019. If you actually look at the T5 paper and see how it was pre-processed, it looks very silly. Mm-hmm. They removed anything that had the word JavaScript in it because they didn't want to get, like, "please enable JavaScript" [00:12:00] warnings. They removed anything with curly braces cuz they didn't wanna get JavaScript in it. They looked at this list of bad words, um, and removed anything that had those bad words. If you actually look at the list of bad words, words like gay are on that list. And so there's, you know, it is a very problematic, you know, list of words, but that was the cleaning that leads to a data set that seems to be unbeatable. So that to me says that we know nothing about data. We, in fact, used a data set called mC4 as well, which is, they supposedly did the same pre-processing of C4, just on more web crawls. The English portion is much worse than C4 for reasons that completely escape us. So in the midst of all that, basically I set two criteria. One was I wanted to be at least as good as mC4 English, like make sure that we're not making things actively worse. And mC4 English is a nice step up over other stuff that's out there. And two was to go all in on diversity after that, making sure that we had some code, we had some scientific papers, we had Wikipedia, because people are gonna use this model for all sorts of different purposes. But I think the most important thing, and I'm guessing Abhi had a million opinions on this, is you're only as good as your evaluation. And we don't know how to evaluate models for the kind of generation we ask them to do. So past a certain point, you have to kinda shrug and say, well, my evaluation's not even measuring what I care about. Mm-hmm. So let me just make reasonable choices. 
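The C4 cleaning Jonathan describes really is that blunt. Below is a rough sketch of that style of document filter, purely illustrative: this is not the actual T5/C4 preprocessing code, and the bad-words set is a placeholder for the external wordlist linked in the show notes.

```python
# Rough sketch of C4-style document filtering as described above (not the real pipeline).
BAD_WORDS = {"..."}  # placeholder for the "naughty words" list linked in the show notes

def keep_document(text: str) -> bool:
    lowered = text.lower()
    if "javascript" in lowered:      # drop pages with "please enable JavaScript" warnings
        return False
    if "{" in text or "}" in text:   # drop anything with curly braces, i.e. likely code
        return False
    if any(word in lowered for word in BAD_WORDS):
        return False                 # drop anything containing a listed bad word
    return True

docs = [
    "A clean paragraph of ordinary web text.",
    "Please enable JavaScript to view this page.",
    "function f() { return 1; }",
]
print([keep_document(d) for d in docs])  # [True, False, False]
```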
EVALUATION CHALLENGES OF LLMs [00:13:00]
Swyx: So you're saying MMLU, BIG-bench, that kind of stuff is not convincing for you. Jonathan: A lot of this stuff is, you've got two kinds of tasks. Some of these are more of multiple-choice-style tasks where there is a right answer. Um, either you ask the model to spit out A, B, C, or D, or, you know, if you're more [00:13:30] sophisticated, you look at the perplexity of each possible answer and pick the one that the model is most likely to generate. But we don't ask these models to do multiple choice questions. We ask them to do open-ended generation. There are also open-ended generation tasks like summarization. You compare using things like a BLEU score or a ROUGE score, which are known to be very bad ways of comparing text. At the end of the day, there are a lot of great summaries of a paper. There are a lot of great ways to do open-form generation, and so humans are, to some extent, the gold standard. Humans are very expensive. It turns out we can't put them into our eval pipeline and just have the humans look at our model every, you know, 10 minutes? Not yet. Not yet. Maybe soon. Um, are you volunteering, Abhi? Abhinav: I, I, I just know we have a great eval team who's, uh, who's helping us build new metrics. So if they're listening, Jonathan: But it's, you know, evaluation of large language models is incredibly hard, and I don't think any of these metrics really truly capture what we expect from the models in practice. Swyx: Yeah. And we might draw wrong conclusions. There's been a debate recently about the emergence phenomenon, whether or not it's a mirage, right? I don't know if you guys have opinions about that process. Abhinav: Yeah, I think I've seen like this paper and all, and even just kind of plots from different people, where, like, well, maybe it's just an artifact of, like, log scaling of metrics, or, you know, we're measuring accuracy, which is this very, like, harsh zero-one thing, yeah, rather than kind of something more continuous. But yeah, similar to what Jonathan was saying about evals, like there, there's one issue of, like, just the diversity of eval metrics. Like when we put these models up, even like the chat ones, the instruct ones, people are using 'em for such a variety of tasks. There's just almost no way we get ahead of time, like, measuring individual dimensions. And then also particularly, like, you know, at the 7B scale, [00:15:00] um, these models still are not super great yet at the really hard tasks, like some of the hardest tasks in MMLU and stuff. So sometimes they're barely scoring, like, above kind of random chance, you know, like on really, really hard tasks. So potentially as we, you know, aim for higher and higher quality models, some of these things will be more useful to us. But we kind of had to develop MPT-7B kind of flying a little bit blind on, on what we knew as it was coming out, and just going off of, like, you know, a small set of common sense reasoning tasks. And of course, you know, just comparing, you know, those metrics versus other open source models. Alessio: I think fast training and inference was like one of the goals, right? So there's always the trade-off between doing the hardest thing and, like, doing all the other things quickly. Abhinav: Yeah, absolutely. 
Yeah, I mean, I think like, you know, even at the 7B scale, you know, uh, people are trying to run these things on CPUs at home. You know, people are trying to port these to their phones, basically prioritizing the fact that the small scale would lead to broader adoption. That was like a big, um, big thing going on. Alessio: Yeah, and you mentioned, um, FlashAttention and FasterTransformer as like two of the core things. Can you maybe explain some of the benefits and maybe why other models don't use it?
FLASH ATTENTION [00:16:00]
Abhinav: Yeah, absolutely. So FlashAttention is this basically faster implementation of full attention. Um, it's like a mathematical equivalent, developed by, like, actually some of our collaborators, uh, at Stanford. Uh, the Hazy Research lab. Hazy Research, yeah, exactly. Jonathan: What is, what, what, what's the name Hazy Research mean? Abhinav: I actually have no idea. Swyx: I have no clue. All these labs have fun names. I always like the stories behind them. Abhinav: Yeah, absolutely. We really, really liked FlashAttention. We, I think, had it integrated into our repo as early as September of last year. And it really just helps, you know, with training speed and also inference speed, and we kind of bake that into the model architecture. And this is kind of unique amongst all the other Hugging Face models you see out there. So ours, actually, you can toggle between normal torch attention, which will work anywhere, and FlashAttention, which will work on GPUs, right out of the box. And that way I think you get almost like a 2x speed up at training time, and somewhere between like 50% to a hundred percent speed up at inference time as well. So again, this is just like, we really, really wanted people to use these and, like, feel like an improvement, and we, we have the team to, to help deliver that. Swyx: Another part, um, of your choices was ALiBi position encodings, which people are very interested in; maybe a lot of people just, uh, sort of take encodings as, as a given. But there's actually a lot of active research and honestly, it's a lot of, um, it's very opaque as well. Like people don't know how to evaluate encodings, including position encodings, but may, may, could you explain, um, ALiBi and, um, your choice? Abhinav: Yeah, for sure. The ALiBi and, uh, kind of FlashAttention thing all kind of goes together in interesting ways, and even with training stability too. What ALiBi does, really, is that it eliminates the need to have positional embeddings in your model. Where previously, if you're at token position one, you have a particular embedding that you add, and you can't really go beyond your max position, which usually is like about 2000. With ALiBi, they get rid of that. Instead, they just add a bias to the attention map itself. That's kind of like this slope. And if at inference time you wanna go much, much larger, they just kind of stretch that slope out to a longer, longer number of positions. And because the slope is kind of continuous and you can interpret it, it all works out now. Now one of [00:18:00] the, the funny things we found is, like, with FlashAttention, it saved so much memory and, like, improved performance so much that even as early as, kind of, last year, like, we were profiling models with, with very long context lengths, up to like, you know, the 65k that you've seen in the release; we just never really got around to using it cuz we didn't really know what we might use it for. And also it's very hard to train stably. 
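For readers who want to see the mechanism concretely, here is a minimal sketch of the ALiBi bias Abhinav is describing: no learned position embeddings, just a per-head linear penalty added to the attention scores, which can be stretched to longer sequences at inference time. The head-slope formula follows the ALiBi paper for power-of-two head counts; the shapes and names are illustrative, not MPT's actual implementation.

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence of slopes from the ALiBi paper, one per attention head
    # (exact for power-of-two head counts).
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # distance[i, j] = j - i; clamp so only past positions (j < i) get penalized.
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).clamp(max=0)
    slopes = alibi_slopes(n_heads)
    return slopes[:, None, None] * distance[None, :, :]   # (heads, seq, seq), all <= 0

# Added to the attention logits before softmax; the same recipe "stretches" to a
# longer seq_len at inference time with no new parameters.
scores = torch.randn(8, 2048, 2048)
scores = scores + alibi_bias(n_heads=8, seq_len=2048)
```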
So we started experimenting with ALiBi integration, then we suddenly found that, oh wow, stability improves dramatically, and now we can actually work together with ALiBi and long context lengths. That's how we got to, like, our story writer model, where we can stably train these models out to very, very long context lengths and, and use them performantly. Jonathan: Yeah. Swyx: And it's also why you don't have a firm number. Most people now have a firm number on the context length. Now you're just like, eh, 65 to 85. Abhinav: Oh yeah, there's, there's a, there's a big debate, 64K or 65K. 65K plus. Swyx: Just do powers of twos. So 64 isn't, you know. Jonathan: Right, right. Yeah. Yeah. But we could, I mean, technically the context length is infinite. If you give me enough memory, um, you know, we can just keep going forever. We had a debate over what number to say is the longest that we could handle. We picked 84K. It's the longest I expect people to see easily in practice. But, you know, we played around for even longer than that and I don't see why we couldn't go longer. Swyx: Yeah. Um, and so for those who haven't read the blog posts, you put The Great Gatsby in there and, uh, asked it to write an epilogue, which seemed pretty impressive. Jonathan: Yeah. There are a bunch of epilogues floating around internally at Mosaic. Yeah. That wasn't my favorite. I think we all have our own favorites. Yeah. But there are a bunch of really, really good ones. There was one where, you know, it's Gatsby's funeral and then Nick starts talking to Gatsby's ghost, and Gatsby's father shows up and, you know, then he's [00:19:30] at the police station with Tom. It was very plot heavy, like this is what comes next. And a bunch of them were just very Fitzgerald-esque, like, you know, beautiful writing. Um, but it was cool to just see that, wow, the model seemed to actually be working with, you know, all this input. Yeah, yeah. Like it's, it's exciting. You can think of a lot of things you could do with that kind of context length.
FINE-TUNING FOR CREATIVITY [00:19:50]
Swyx: Is there a trick to fine-tuning for a creative task rather than, um, a factual task? Jonathan: I don't know what that is, but probably, yeah. I think, you know, the person, um, Alex, who did this, he did fine-tune the model explicitly on books. The goal was to try to get a model that was really a story writer. But, you know, beyond that, I'm not entirely sure. Actually, it's a great question. Well, no, I'll ask you back. How would you measure that? Swyx: Uh, God, human feedback is the solve to all things. Um, I think there is a labeling question, right? Uh, in computer vision, we had a really, really good episode with Roboflow on the Segment Anything model, where you, you actually start with human feedback on, like, very, I think it's something like 0.5% of the, the overall, uh, final, uh, uh, labels that you had. But then you sort of augment them and then you, you fully automate them, um, which I think could be applied to text. It seems intuitive, and probably people like Snorkel have already raced ahead on this stuff, but I just haven't seen this applied in the language domain yet. Jonathan: It, I mean, there are a lot of things that seem like they make a lot of sense in machine learning that never work, and a lot of things that make zero sense that seem to work. So, you know, I've given up trying to even predict. Yeah, yeah. Until I see the data or try it, I just kinda shrug my shoulders and, you know, you hope for the best. Bring data or else, right? 
Yeah, [00:21:00] exactly. Yeah, yeah, yeah. Alessio: The fine-tuning on books. Books3 is like one of the big datasets, and there was the whole Twitter thing about trade comments and like, you know, you know, I used to be a community moderator at Genius.com, and we ran into a lot of things like, well, if you're explaining lyrics, do you have the right to redistribute the lyrics? I know you ended up changing the license on the model from commercial use permitted. Swyx: Yeah, let's, let's... I'm not sure they did. Jonathan: So we flipped it for about a couple hours. Swyx: Um, okay. Can we, can we introduce the story from the start, just for people who are out of the loop? Jonathan: Yeah. So I can tell the story very simply. So, you know, the books3 dataset does contain a lot of books. And it is, you know, as I discovered, um, it is a dataset that provokes very strong feelings from a lot of folks. Um, that was one, one guy, from one person in particular, in fact. Um, and that's about it. But it turns out one person who wants a lot of attention can, you know, get enough attention that we're talking about it now. And so we had a, we had a discussion internally after that conversation, and we talked about flipping the license, and, you know, very late at night I thought, you know, maybe it's a good thing to do. And decided, you know, actually probably better to just, you know, stand pat. The license is still Apache 2.0. And one of the conversations we had was kind of, we hadn't thought about this cuz we had our heads down, but the Hollywood writers' strike took place basically the moment we released the model. Mm-hmm. Um, we were releasing a model that could do AI-generated creative content. And that is one of the big sticking points during the strike. Oh, the optics are not good. So the optics aren't good, and that's not what we want to convey. This is really, this is a demo of the ability to do really long sequence lengths, and, boy, you know, [00:22:30] that's, that's not timing that we appreciated. And so we talked a lot internally that night about, like, oh, we've had time to read the news. We've had time to take a breath. We don't really love this. Came to the conclusion that it's better to just leave it as it is now and learn the lesson for the future. But certainly that was one of my takeaways: this stuff, you know, there's a societal context around this that it's easy to forget when you're in the trenches just trying to get the model to train. And, you know, in hindsight, you know, I might've gone with a different thing than a story writer. I might've gone with, you know, a coder, because we seem to have no problem putting programmers out of work with these models. Swyx: Oh yeah. Please, please, you know, take away this stuff from me.
So that struck me as a little bit strange, but I think the most important part is, you know, we wanna be thoughtful and we wanna do the right thing.And in that case, you know, I hope with all that interesting licensing fund you saw, we're trying to be really thoughtful about this and it's hard. I learned a lot from that experience. Swyx: There's also, I think, an open question of fair use, right? Is training on words of fair use because you don't have a monopoly on words, but some certain arrangements of words you do.And who is to say how much is memorization by a model versus actually learning and internalizing and then. Sometimes happening to land at the right, the [00:24:00] same result.Jonathan: And if I've learned one lesson, I'm not gonna be the person to answer that question. Right, exactly. And so my position is, you know, we will try to make this stuff open and available.Yeah. And, you know, let the community make decisions about what they are or aren't comfortable using. Um, and at the end of the day, you know, it still strikes me as a little bit weird that someone is trying to use these open source licenses to, you know, to close the ecosystem and not to make things more open.That's very much against the ethos of why these licenses were created.Swyx: So the official mosaic position, I guess is like, before you use TC MPC 7B for anything commercial, check your own lawyers now trust our lawyers, not mosaic's lawyers.Jonathan: Yeah, okay. Yeah. I'm, you know, our lawyers are not your lawyers.Exactly. And, you know, make the best decision for yourself. We've tried to be respectful of the content creators and, you know, at the end of the day, This is complicated. And this is something that is a new law. It's a new law. It's a new law that hasn't been established yet. Um, but it's a place where we're gonna continue to try to do the right thing.Um, and it's, I think, one of the commenters, you know, I really appreciated this said, you know, well, they're trying to do the right thing, but nobody knows what the right thing is to even do, you know, the, I guess the, the most right thing would've been to literally not release a model at all. But I don't think that would've been the best thing for the community either.Swyx: Cool.Well, thanks. Well handled. Uh, we had to cover it, just causeJonathan: Oh, yes, no worries. A big piece of news. It's been on my mind a lot.TRAINING STABILITY ENHANCEMENT [00:25:15]Swyx: Yeah. Yeah. Well, you've been very thoughtful about it. Okay. So a lot of these other ideas in terms of architecture, flash, attention, alibi, and the other data sets were contributions from the rest of the let's just call it open community of, of machine learning advancements. Uh, but Mosaic in [00:25:30] particular had some stability improvements to mitigate loss spikes, quote unquote, uh, which, uh, I, I took to mean, uh, your existing set of tools, uh, maybe we just co kind of covered that. I don't wanna sort of put words in your mouth, but when you say things like, uh, please enjoy my empty logbook.How much of an oversell is that? How much, you know, how much is that marketing versus how much is that reality?Abhinav: Oh yeah. That, that one's real. Yeah. It's like fully end-to-end. Um, and I think.Swyx: So maybe like what, what specific features of Mosaic malibu?Abhinav: Totally, totally. Yeah. I think I'll break it into two parts.One is like training stability, right? Knowing that your model's gonna basically get to the end of the training without loss spikes. 
Um, and I think, you know, at the 7B scale, you know, for some models like it ha it's not that big of a deal. As you train for longer and longer durations, we found that it's trickier and trickier to avoid these lost spikes.And so we actually spent a long time figuring out, you know, what can we do about our initialization, about our optimizers, about the architecture that basically prevents these lost spikes. And you know, even in our training run, if you zoom in, you'll see small intermittent spikes, but they recover within a few hundred steps.And so that's kind of the magical bit. Our line is one of defenses we recover from Las Vegas, like just naturally, right? Mm-hmm. Our line two defense was that we used determinism and basically really smart resumption strategies so that if something catastrophic happened, we can resume very quickly, like a few batches before.And apply some of these like, uh, interventions. So we had these kinds of preparations, like a plan B, but we didn't have to use them at all for MPT 7B training. So, that was kind of like a lucky break. And the third part of like basically getting all the way to the empty law book is having the right training infrastructure.[00:27:00]So this is basically what, like is, one of the big selling points of the platform is that when you try to train these models on hundreds of GPUs, not many people outside, you know, like deep industry research owners, but the GPUs fail like a lot. Um, I would say like almost once every thousand a 100 days.So for us on like a big 512 cluster every two days, basically the run will fail. Um, and this is either due to GPUs, like falling off the bus, like that's, that's a real error we see, or kind of networking failures or something like that. And so in those situations, what people have normally done is they'll have an on-call team that's just sitting round the clock, 24-7 on slack, once something goes wrong.And if then they'll basically like to try to inspect the cluster, take nodes out that are broken, restart it, and it's a huge pain. Like we ourselves did this for a few months. And as a result of that, because we're building such a platform, we basically step by step automated every single one of those processes.So now when a run fails, we have this automatic kind of watch talk that's watching. It'll basically stop the job. Test the nodes cord in anyone's that are broken and relaunch it. And because our software's all deterministic has fast resumption stuff, it just continues on gracefully. So within that log you can see sometimes I think maybe at like 2:00 AM or something, the run failed and within a few minutes it's back up and running and all of us are just sleeping peacefully.Jonathan: I do wanna say that was hard one. Mm-hmm. Um, certainly this is not how things were going, you know, many months ago, hardware failures we had on calls who were, you know, getting up at two in the morning to, you know, figure out which node had died for what reason, restart the job, have to cord the node. [00:28:30] Um, we were seeing catastrophic loss spikes really frequently, even at the 7B scale that we're just completely derailing runs.And so this was step by step just ratcheting our way there. As Abhi said, to the point where, Many models are training at the moment and I'm sitting here in the studio and not worrying one bit about whether the runs are gonna continue. Yeah. 
Swyx: I'm, I'm not so much of a data center hardware kind of guy, but isn't there existing software to do this for CPUs, and like, what's different about this domain? Does this question make sense at all? Jonathan: Yeah, so when I think about, like, I think back to all the Google fault tolerance papers I read, you know, as an undergrad or grad student, mm-hmm, about, you know, building distributed systems. A lot of it is that, you know, each CPU is doing, say, an individual unit of work. You've got a database that's distributed across your cluster. You wanna make sure that one CPU failing can't, or one machine failing can't, you know, delete data. So you, you replicate it. You know, you have protocols like Paxos where you're literally, you've got state machines that are replicated with, you know, with leaders and backups and things like that. And in this case, you're performing one giant computation where you cannot afford to lose any node. If you lose a node, you lose model state. If you lose a node, you can't continue. It may be that, that in the future we actually, you know, create new versions of a lot of our distributed training libraries that do have backups and where data is replicated, so that if you lose a node, you can detect what node you've lost and just continue training without having to stop the run, you know, pull from a checkpoint, yeah, restart again on different hardware. But for now, we're certainly in a world where if anything dies, that's the end of the run and you have to go back and recover from it. [00:30:00]
DATA READINESS & TRAINING PREPARATION [00:30:00]
Abhinav: Yeah. Like, I think a big part, a big word there is, like, synchronous data parallelism, right? So, like, we're basically saying that on every step, every GPU is gonna do some work. They're gonna stay in sync with each other and average their, their gradients and continue. Now there are algorithmic techniques to get around this, like you could say, oh, if a GPU dies, just forget about it. All the data that it was gonna see, we'll just forget about it. We're not gonna train on it. But we don't like to do that currently because, um, it makes us give up determinism, stuff like that. Maybe in the future, as you go to extreme scales, we'll start looking at some of those methods. But at the current time it's like, we want determinism. We wanted to have a run that we could perfectly replicate if we needed to. And it was, the goal is, figure out how to run it on a big cluster without humans having to babysit it. Babysit it. Alessio: So as you mentioned, these models are kind of the starting point for a lot of your customers to start. You have an inference product. You have a training product. You previously had a Composer product that is now kind of, not rolled into it, but you have like a superset of it, which is like the LLM Foundry. How are you seeing that change, you know, like from the usual MLOps stack and like how people trained things before, versus now they're starting from, you know, one of these MPT models and coming from there. Like, what should teams think about as they come to you and start their journey? Jonathan: So I think there's a key distinction to make here, which is, you know, when you say starting from MPT models, you can mean two things. One is actually starting from one of our checkpoints, which I think very few of our customers are actually going to do, and one is starting from our configuration. 
You can look at our friends at Replit for that, where, you know, MPT was in progress when Replit [00:31:30] came to us and said, hey, we need a 3 billion parameter model by next week on all of our data. We're like, well, here you go. This is what we're doing, and if it's good enough for us, um, hopefully it's good enough for you. And that's basically the message we wanna send to our customers. MPT is basically clearing a path all the way through, where they know that they can come bring their data, they can use our training infrastructure, they can use all of our amazing orchestration and other tools that Abhi just mentioned for fault tolerance. They can use Composer, which is, you know, still at the heart of our stack. And then the LLM Foundry is really the specific model configuration. They can come in and they know that thing is gonna train well, because we've already done it multiple times. Swyx: Let's dig in a little bit more on what should people have ready before they come talk to you? So data, architecture, evals that they're looking at, etc. Abhinav: Yeah, I, I mean, I think we'll accept customers at any kind of stage in their pipeline. You know, like, I'd say there's archetypes of people who have built products around, like, some of these API companies and reached a stage or maturity level where it's like, we want our own custom models now, either for the purpose of reducing cost, right, like, our inference service is quite a bit cheaper than using APIs, or because they want some kind of customization that you can't really get from the other API providers. I'd say the most important things to have before training a big model: you know, you wanna have good eval metrics, you know, some kind of score that you can track as you're training your models and scaling up, that can tell you you're progressing. And it's really funny, like, a lot of times customers will be really excited about training the models, right? It's really fun to, like, launch jobs on hundreds of GPUs, just all around. It's super fun. But then they'll be like, but wait, what are we gonna measure? Not just the training loss, right? I mean, it's gotta be more than that. [00:33:00] So eval metrics is like a, it's a good pre-req. Also, you know, your data, you know, either coming with your own pre-training or fine-tune data and having, like, a strategy to clean it, or we can help clean it too. I think we're, we're building a lot of tooling around that. And I think once you have those two kinds of inputs and sort of the budget that you want, we can pretty much walk you through the rest of it, right? Like, that's kind of what we do. Recently we helped build CRFM's model for biomedical language a while back. Jonathan: Um, CRFM, that's the Center for Research on Foundation Models. Abhi: Exactly, exactly. Jonathan: Spelling it out for people. Of course. Abhinav: No, absolutely. Yeah, yeah. No, you've done more of these than I have. Um, I think, uh, basically it's sort of, we can help you figure out what model you should train, scaling up, so that when you go for your big run, your hero run, it's, uh, it's predictable. You can feel confident that it's gonna work, and you'll kind of know what quality you're gonna get out before you have to spend, like, a few hundred thousand dollars.
DYNAMIC REAL-TIME MODEL EVALUATION [00:34:00]
Alessio: Reza from Replit was on the podcast last week, and, uh, they had HumanEval and then, uh, AmjadEval, which is like vibe based. 
Jonathan: And I, I do think the vibe-based eval cannot be, you know, underrated, really. At the, I mean, at the end of the day we, we did stop our models and do vibe checks, and we did, as we monitor our models, one of our evals was we just had a bunch of prompts and we would watch the answers as the model trained and see if they changed. Cuz honestly, you know, I don't really believe in any of these eval metrics to capture what we care about. Mm-hmm. But when you ask it, uh, you know, I don't know. I think one of our prompts was to suggest games for a three-year-old and a seven-year-old that would be fun to play. Like, that was a lot more [00:34:30] valuable to me personally, to see how that answer evolved and changed over the course of training. So, you know, and HumanEval, just to clarify for folks, HumanEval is an automated evaluation metric. There's no humans in it at all. There's no humans in it at all. It's really badly named. I got so confused the first time that someone brought that to me and I was like, no, we're not bringing humans in. It's like, no, it's, it's automated. They just called it a bad name, and there's only a hundred-some problems in it or something. Abhinav: Yeah. Yeah. And, and it's for code specifically, right? Jonathan: Yeah. Yeah. It's very weird. It's a, it's a weird, confusing name that I hate, but you know, when other metrics are called HellaSwag, like, you know, you just gotta roll with it at this point. Swyx: You're doing live evals now. So one, one of the tweets that I saw from you was that it is, uh, important that you do it parallelized. Uh, maybe you kind of wanna explain, uh, what, what you guys did. Abhinav: Yeah, for sure. So with LLM Foundry, there's many pieces to it. There's obviously the core training piece, but there's also, you know, tools for evaluation of models. And we've kind of had one of the, I think it's like the, the fastest, like, evaluation framework. Um, basically it's multi-GPU compatible, it runs with Composer, it can support really, really big models. So basically our framework runs so fast that even as our models are training, we can run these metrics live during the training. So, like, if you have a dashboard like Weights and Biases, you kind of watch all these eval metrics. We have, like, 15 or 20 of them, honestly, that we track during the run, and they add negligible overhead. So we can actually watch as our models go and feel confident. Like, it's not like we wait until the very last day to, to test if the model's good or not. Jonathan: That's amazing. Yeah. I love that we've gotten this far into the conversation. We still haven't talked about efficiency and speed. Those are usually our two watchwords at Mosaic, which is, you know, that's great. That says that we're [00:36:00] doing a lot of other cool stuff, but at the end of the day, um, you know, cost comes first. If you can't afford it, it doesn't matter. And so, you know, getting things down cheap enough that, you know, we can monitor in real time, getting things down cheap enough that we can even do it in the first place. That's the basis for everything we do.
OPEN SCIENCE FOR AFFORDABLE AI RESEARCH [00:36:00]
Alessio: Do you think a lot of the questions that we have around, you know, what data sets we should use and things like that are just because training was so expensive before, that we just haven't run enough experiments to figure that out? 
And is that one of your goals, is trying to make it cheaper so that we can actually get the answers? Jonathan: Yeah, that's a big part of my personal conviction for being here. I think I'm, I'm still, in my heart, the second-year grad student who was jealous of all his friends who had GPUs and he didn't, and I couldn't train any models except on my laptop. And that, I mean, the lottery ticket experiments began on my laptop. I had to beg for one K80 so that I could run MNIST. And I'm still that person deep down in my heart. And I'm a believer that, you know, if we wanna do science and really understand these systems and understand how to make them work well, understand how they behave, understand what makes them safe and reliable, we need to make it cheap enough that we can actually do science, and science involves running dozens of experiments. When I finally, you know, cleaned out my GCS bucket from my PhD, I deleted a million model checkpoints. I'm not kidding. There were over a million model checkpoints. That is the kind of science we need, you know, that's just what it takes. In the same way that if you're in a biology lab, you don't just grow one cell and say, like, eh, the drug seems to work on that cell. Like, there's a lot more science you have to do before you really know. Abhinav: Yeah. And I think one of the special things about Mosaic's kind of [00:37:30] position as well is that we have such, so many customers all trying to train models that basically we have the incentive to, like, devote all these resources and time to do this science. Because when we learn which pieces actually work, which ones don't, we get to help many, many people, right? And so that kind of aggregation process, I think, is really important for us. I remember way back there was a paper from Google that basically would investigate batch sizes or something like that. And it was this paper that must have cost a few million dollars doing all the experiments. And it was just like, wow, what a, what a benefit to the whole community. Now, like, now we all get to learn from that and we get, we get to save. We don't have to spend those millions of dollars anymore. So I think, um, kind of Mosaic's science, like the insights we get on, on data, on pre-training, on architecture, on all these different things, um, that's why customers come to us. Swyx: Yeah, you guys did some really good stuff on PubMedGPT as well. That's the first time I heard of you. And that's also published to the community. Abhinav: Yeah, that one was really fun. We were like, well, no one's really trained, like, fully from scratch, domain-specific models before. Like, what if we just did a biomed one? Would it still work? And, uh, yeah, we were really excited that it did. Um, we'll probably have some follow-up soon, I think, later this summer. Jonathan: Yeah. Yes. Stay tuned on that. Um, but I, I will say, just in general, it's a really important value for us to be open in some sense. We have no incentive not to be open. You know, we make our money off of helping people train better. There's no cost to us in sharing what we learn with the community. Cuz really, at the end of the day, we make our money off of those custom models and great infrastructure and, and putting all the pieces together. That's honestly where the Mosaic name came from. 
Not off of, like, oh, we've got, you know, this one cool secret trick [00:39:00] that we won't tell you, or, you know, closing up. I sometimes, you know, in the past couple weeks I've talked to my friends at places like Brain, or, you know, what used to be Brain, now Google DeepMind. Oh, RIP Brain. Yeah. RIP Brain. I spent a lot of time there and it was really a formative time for me. Um, so I miss it, but, you know, I kind of feel like we're one of the biggest open research labs left in industry, which is a very sad state of affairs, because we're not very big. Um, but at least, can you say how big the team is actually? Yeah. We were about 15 researchers, so we're, we're tiny compared to, you know, the huge army of researchers I remember at Brain or at FAIR, at DeepMind back, you know, when I was there during their heydays. Um, you know, but everybody else is kind of, you know, closed up and isn't saying very much anymore. Yeah. And we're gonna keep talking and we're gonna keep sharing and, you know, we will try to be that vanguard to the best of our ability. We're very small and I, I can't promise we're gonna do what those labs used to do in terms of scale or quantity of research, but we will share what we learn and we will try to create resources for the community. Um, I, I dunno, I just, I believe in openness fundamentally. I'm an academic at heart and it's sad to me to watch that go away from a lot of the big labs.
THE OPEN APPROACH [00:40:15]
Alessio: We just had a live pod about the, you know, OpenAI's "no moat", uh, post that came out, and it was one of the first times I really dove into LoRA and some of these new technologies. Like, how are you thinking about what it's gonna take for, like, the open approach to really work? Obviously today GPT-4 is still, you know, like, the state-of-the-art model for a [00:40:30] lot of tasks. Do you think some of the innovation and kind of training methods that we have today are enough, if enough people like you guys are, like, running these, these research groups that are open? Or do you think we still need a step function improvement there?
You can customize in crazy ways like G B T four is not gonna hit 65 K context length for a very long time, cuz they've already trained that [00:42:00] model and you know, they haven't even released the 32 K version yet.So we can, you know, we can do things differently, you know, by being flexible. So I think the answer to all this is yes. But we can't see the open source ecosystem disappear. And that's the scariest thing for me. I hear a lot of talk in academia about, you know, whatever happened to that academic research on this field called information retrieval?Well, in 1999 it disappeared. Why? Because Google came along and who cares about information retrieval research when you know you have a Google Scale, you know, Web Scale database. So you know, there's a balance here. We need to have both. Swyx: I wanna applaud you, Elaine. We'll maybe edit it a little like crowd applause, uh, line.Cuz I, I think that, um, that is something that as a research community, as people interested in progress, we need to see these things instead of just, uh, seeing marketing papers from the advertising GPT 4.Jonathan: Yeah. I, I think I, you know, to get on my soapbox for 10 more seconds. Go ahead. When I talk to policymakers about, you know, the AI ecosystem, the usual fear that I bring up is, Innovation will slow because of lack of openness.I've been complaining about this for years and it's finally happened. Hmm. Why is Google sharing, you know, these papers? Why is Open AI sharing these papers? There are a lot of reasons. You know, I have my own beliefs, but it's not something we should take for granted that everybody's sharing the work that they do and it turns out well, I think we took it for granted for a while and now it's gone.I think it's gonna slow down the pace of progress. In a lot of cases, each of these labs has a bit of a monoculture and being able to pass ideas [00:43:30] back and forth was a lot of what kept, you know, scientific progress moving. So it's imperative not just, you know, for the open source community and for academia, but for the progress of technology.That we have a vibrant open source research community.THE FUTURE OF MOSAIC [00:44:11]Swyx: There's a preview of the ecosystem and commentary that we're, we're gonna do. But I wanna close out some stuff on Mosaic. You launched a bunch of stuff this month. A lot of stuff, uh, actually was, I was listening to you on Gradient descent, uh, and other podcasts we know and love.Uh, and you said you also said you were not gonna do inference and, and, and last week you were like, here's Mosaic ML inference. Oops. So maybe just a, at a high level, what was Mosaic ml and like, what is it growing into? Like how do you conceptualize this? Jonathan: Yeah, and I will say gradient, when graded dissent was recorded, we weren't doing inference and had no plans to do it.It took a little while for the podcast to get out. Um, in the meantime, basically, you know, one thing I've learned at a startup, and I'm sure abhi can comment on this as well, focus is the most important thing. We have done our best work when we've been focused on doing one thing really well and our worst work when we've tried to do lots of things.Yeah. So, We don't want to do inference, we don't want to have had to do inference. Um, and at the end of the day, our customers were begging us to do it because they wanted a good way to serve the models and they liked our ecosystem. And so in some sense, we got dragged into it kicking and screaming. 
We're very excited to have a product. We're going to put our best foot forward and make something truly amazing. But, you know, that's something that we were reluctant to do. Our customers convinced us it would be good for our business. It's been wonderful for business and we are gonna put everything into this, but back when Gradient Dissent came out, or when we recorded it, I [00:45:00] was thinking, oh God, focus is the most important thing. I've learned that the hard way multiple times at Mosaic; Abhi can tell you, I've made a lot of mistakes by not focusing enough. And boy, inference, that's a whole second thing, and a whole different animal from training. At the end of the day, when we founded the company, our belief was that inference was relatively well served at that time. There were a lot of great inference companies out there. Training was not well served, especially efficient training, and we had something to add there. I think we've discovered that as the nature of the models has changed, the nature of what we had to add to inference changed a lot, and there became an opportunity for us to contribute something. But that was not the plan. But now we do wanna be the place that people come when they wanna train these big, complex, difficult models and know that it's gonna go right the first time, and they're gonna have something they can serve right away. You know, really the Replit example: with 10 days to go, saying, hey, can you please train that model? And three or four days later the model was trained, and we were just having fun doing interesting fine-tuning work on it for the rest of the 10 days. That also requires good inference.

Swyx: That's true, that's true. So, running evals and fine-tuning. I'm just putting my business hat on, and Alessio as well. I've actually had fights with potential co-founders about this, about the primary business almost being training, right? Like essentially a one-time cost.

Jonathan: Who told you it was a one-time cost? Who told you that?

Swyx: No, no, no, no. Correct me.

Jonathan: Yeah. Let me correct you in two ways. As our CEO Naveen would say, if he were here: when you create version 1.0 of your software, do you then fire all the engineers? Of [00:46:30] course not. Like, MPT has a thousand different things we wanted to do that we never got to. So there will be future models.

Abhinav: And the data it's been trained on is also changing over time too, right? If you wanna ask anything about, say, May of 2023, we'll have to retrain it further, and so on. And I think this is especially true for customers who run the kind of things that need to be up to date on world knowledge. The other thing I would say too is that the models we have today are certainly not the best models we'll ever produce. They're gonna get smaller, they're gonna get faster, they're gonna get cheaper, they're gonna get lower latency, they're gonna get higher quality. And so you always want the next-gen version of MPT, and the one after that, and the one after that. There's a reason that even the GPT series goes three, four, and we know there's gonna be a five. So I also don't see it as a one-time cost.

Jonathan: Yeah. And if you wanna cite a stat on this, there are very, very
Latent Space is popping off! Welcome to the over 8500 latent space explorers who have joined us. Join us this month at various events in SF and NYC, or start your own!This post spent 22 hours at the top of Hacker News.As announced during their Developer Day celebrating their $100m fundraise following their Google partnership, Replit is now open sourcing its own state of the art code LLM: replit-code-v1-3b (model card, HF Space), which beats OpenAI's Codex model on the industry standard HumanEval benchmark when finetuned on Replit data (despite being 77% smaller) and more importantly passes AmjadEval (we'll explain!)We got an exclusive interview with Reza Shabani, Replit's Head of AI, to tell the story of Replit's journey into building a data platform, building GhostWriter, and now training their own LLM, for 22 million developers!8 minutes of this discussion go into a live demo discussing generated code samples - which is always awkward on audio. So we've again gone multimodal and put up a screen recording here where you can follow along on the code samples!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. We would really appreciate if you shared our pod with friends on Twitter, LinkedIn, Mastodon, Bluesky, or your social media poison of choice!Timestamps* [00:00:21] Introducing Reza* [00:01:49] Quantitative Finance and Data Engineering* [00:11:23] From Data to AI at Replit* [00:17:26] Replit GhostWriter* [00:20:31] Benchmarking Code LLMs* [00:23:06] AmjadEval live demo* [00:31:21] Aligning Models on Vibes* [00:33:04] Beyond Chat & Code Completion* [00:35:50] Ghostwriter Autonomous Agent* [00:38:47] Releasing Replit-code-v1-3b* [00:43:38] The YOLO training run* [00:49:49] Scaling Laws: from Kaplan to Chinchilla to LLaMA* [00:52:43] MosaicML* [00:55:36] Replit's Plans for the Future (and Hiring!)* [00:59:05] Lightning RoundShow Notes* Reza Shabani on Twitter and LinkedIn* also Michele Catasta and Madhav Singhal* Michele Catasta's thread on the release of replit-code-v1-3b* Intro to Replit Ghostwriter* Replit Ghostwriter Chat and Building Ghostwriter Chat* Reza on how to train your own LLMs (their top blog of all time)* Our Benchmarks 101 episode where we discussed HumanEval* AmjadEval live demo* Nat.dev* MosaicML CEO Naveen Rao on Replit's LLM* MosaicML Composer + FSDP code* Replit's AI team is hiring in North America timezone - Fullstack engineer, Applied AI/ML, and other roles!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my co-host, swyx, writer and editor of Latent Space.[00:00:21] Introducing Reza[00:00:21] swyx: Hey and today we have Reza Shabani, Head of AI at Replit. Welcome to the studio. Thank you. Thank you for having me. So we try to introduce people's bios so you don't have to repeat yourself, but then also get a personal side of you.[00:00:34] You got your PhD in econ from Berkeley, and then you were a startup founder for a bit, and, and then you went into systematic equity trading at BlackRock in Wellington. And then something happened and you were now head of AI at Relet. What should people know about you that might not be apparent on LinkedIn?[00:00:50] One thing[00:00:51] Reza Shabani: that comes up pretty often is whether I know how to code. Yeah, you'd be shocked. A lot of people are kind of like, do you know how to code? 
When I was talking to Amjad about this role, I'd originally talked to him, I think about a product role and, and didn't get it. Then he was like, well, I know you've done a bunch of data and analytics stuff.[00:01:07] We need someone to work on that. And I was like, sure, I'll, I'll do it. And he was like, okay, but you might have to know how to code. And I was like, yeah, yeah, I, I know how to code. So I think that just kind of surprises people coming from like Ancon background. Yeah. Of people are always kind of like, wait, even when people join Relet, they're like, wait, does this guy actually know how to code?[00:01:28] Is he actually technical? Yeah.[00:01:30] swyx: You did a bunch of number crunching at top financial companies and it still wasn't[00:01:34] Reza Shabani: obvious. Yeah. Yeah. I mean, I, I think someone like in a software engineering background, cuz you think of finance and you think of like calling people to get the deal done and that type of thing.[00:01:43] No, it's, it's not that as, as you know, it's very very quantitative. Especially what I did in, in finance, very quantitative.[00:01:49] Quantitative Finance and Data Engineering[00:01:49] swyx: Yeah, so we can cover a little bit of that and then go into the rapid journey. So as, as you, as you know, I was also a quantitative trader on the sell side and the buy side. And yeah, I actually learned Python there.[00:02:01] I learned my, I wrote my own data pipelines there before airflow was a thing, and it was just me writing running notebooks and not version controlling them. And it was a complete mess, but we were managing a billion dollars on, on my crappy code. Yeah, yeah. What was it like for you?[00:02:17] Reza Shabani: I guess somewhat similar.[00:02:18] I, I started the journey during grad school, so during my PhD and my PhD was in economics and it was always on the more data intensive kind of applied economic side. And, and specifically financial economics. And so what I did for my dissertation I recorded cnbc, the Financial News Network for 10 hours a day, every day.[00:02:39] Extracted the close captions from the video files and then used that to create a second by second transcript of, of cmbc, merged that on with high frequency trading, quote data and then looked at, you know, went in and did some, some nlp, tagging the company names, and and then looked at the price response or the change in price and trading volume in the seconds after a company was mentioned.[00:03:01] And, and this was back in. 2009 that I was doing this. So before cloud, before, before a lot of Python actually. And, and definitely before any of these packages were available to make this stuff easy. And that's where, where I had to really learn to code, like outside of you know, any kind of like data programming languages.[00:03:21] That's when I had to learn Python and had to learn all, all of these other skills to work it with data at that, at that scale. So then, you know, I thought I wanted to do academia. I did terrible on the academic market because everyone looked at my dissertation. They're like, this is cool, but this isn't economics.[00:03:37] And everyone in the computer science department was actually way more interested in it. Like I, I hung out there more than in the econ department and You know, didn't get a single academic offer. Had two offer. I think I only applied to like two industry jobs and got offers from both of them.[00:03:53] They, they saw value in it. 
One of them was BlackRock and turned it down to, to do my own startup, and then went crawling back two and a half years later after the startup failed.[00:04:02] swyx: Something on your LinkedIn was like you're trading Chinese news tickers or something. Oh, yeah. I forget,[00:04:07] Reza Shabani: forget what that was.[00:04:08] Yeah, I mean oh. There, there was so much stuff. Honestly, like, so systematic active equity at, at BlackRock is, was such an amazing. Group and you just end up learning so much and the, and the possibilities there. Like when you, when you go in and you learn the types of things that they've been trading on for years you know, like a paper will come out in academia and they're like, did you know you can use like this data on searches to predict the price of cars?[00:04:33] And it's like, you go in and they've been trading on that for like eight years. Yeah. So they're, they're really ahead of the curve on, on all of that stuff. And the really interesting stuff that I, that I found when I went in was all like, related to NLP and ml a lot of like transcript data, a lot of like parsing through the types of things that companies talk about, whether an analyst reports, conference calls, earnings reports and the devil's really in the details about like how you make sense of, of that information in a way that, you know, gives you insight into what the company's doing and, and where the market is, is going.[00:05:08] I don't know if we can like nerd out on specific strategies. Yes. Let's go, let's go. What, so one of my favorite strategies that, because it never, I don't think we ended up trading on it, so I can probably talk about it. And it, it just kind of shows like the kind of work that you do around this data.[00:05:23] It was called emerging technologies. And so the whole idea is that there's always a new set of emerging technologies coming onto the market and the companies that are ahead of that curve and stay up to date on on the latest trends are gonna outperform their, their competitors.[00:05:38] And that's gonna reflect in the, in the stock price. So when you have a theory like that, how do you actually turn that into a trading strategy? So what we ended up doing is, well first you have to, to determine what are the emergent technologies, like what are the new up and coming technologies.[00:05:56] And so we actually went and pulled data on startups. And so there's like startups in Silicon Valley. You have all these descriptions of what they do, and you get that, that corpus of like when startups were getting funding. And then you can run non-negative matrix factorization on it and create these clusters of like what the various Emerging technologies are, and you have this all the way going back and you have like social media back in like 2008 when Facebook was, was blowing up.[00:06:21] And and you have things like mobile and digital advertising and and a lot of things actually outside of Silicon Valley. They, you know, like shale and oil cracking. Yeah. Like new technologies in, in all these different types of industries. And then and then you go and you look like, which publicly traded companies are actually talking about these things and and have exposure to these things.[00:06:42] And those are the companies that end up staying ahead of, of their competitors. And a lot of the the cases that came out of that made a ton of sense. Like when mobile was emerging, you had Walmart Labs. 
Walmart was really far ahead in terms of thinking about mobile and the impact of mobile.[00:06:59] And, and their, you know, Sears wasn't, and Walmart did well, and, and Sears didn't. So lots of different examples of of that, of like a company that talks about a new emerging trend. I can only imagine, like right now, all of the stuff with, with ai, there must be tons of companies talking about, yeah, how does this affect their[00:07:17] swyx: business?[00:07:18] And at some point you do, you do lose the signal. Because you get overwhelmed with noise by people slapping a on everything. Right? Which is, yeah. Yeah. That's what the Long Island Iced Tea Company slaps like blockchain on their name and, you know, their stock price like doubled or something.[00:07:32] Reza Shabani: Yeah, no, that, that's absolutely right.[00:07:35] And, and right now that's definitely the kind of strategy that would not be performing well right now because everyone would be talking about ai. And, and that's, as you know, like that's a lot of what you do in Quant is you, you try to weed out other possible explanations for for why this trend might be happening.[00:07:52] And in that particular case, I think we found that, like the companies, it wasn't, it wasn't like Sears and Walmart were both talking about mobile. It's that Walmart went out of their way to talk about mobile as like a future, mm-hmm. Trend. Whereas Sears just wouldn't bring it up. And then by the time an invest investors are asking you about it, you're probably late to the game.[00:08:12] So it was really identifying those companies that were. At the cutting edge of, of new technologies and, and staying ahead. I remember like Domino's was another big one. Like, I don't know, you[00:08:21] swyx: remember that? So for those who don't know, Domino's Pizza, I think for the run of most of the 2010s was a better performing stock than Amazon.[00:08:29] Yeah.[00:08:31] Reza Shabani: It's insane.[00:08:32] swyx: Yeah. Because of their investment in mobile. Mm-hmm. And, and just online commerce and, and all that. I it must have been fun picking that up. Yeah, that's[00:08:40] Reza Shabani: that's interesting. And I, and I think they had, I don't know if you, if you remember, they had like the pizza tracker, which was on, on mobile.[00:08:46] I use it[00:08:46] swyx: myself. It's a great, it's great app. Great app. I it's mostly faked. I think that[00:08:50] Reza Shabani: that's what I heard. I think it's gonna be like a, a huge I don't know. I'm waiting for like the New York Times article to drop that shows that the whole thing was fake. We all thought our pizzas were at those stages, but they weren't.[00:09:01] swyx: The, the challenge for me, so that so there's a, there's a great piece by Eric Falkenstein called Batesian Mimicry, where every signal essentially gets overwhelmed by noise because the people who wants, who create noise want to follow the, the signal makers. So that actually is why I left quant trading because there's just too much regime changing and like things that would access very well would test poorly out a sample.[00:09:25] And I'm sure you've like, had a little bit of that. And then there's what was the core uncertainty of like, okay, I have identified a factor that performs really well, but that's one factor out of. 500 other factors that could be going on. You have no idea. So anyway, that, that was my existential uncertainty plus the fact that it was a very highly stressful job.[00:09:43] Reza Shabani: Yeah. 
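To make the emerging-technologies strategy Reza describes a bit more concrete, here is a minimal sketch of the clustering step he mentions, running non-negative matrix factorization over startup descriptions to surface topics. The data and topic count are toy values for illustration; this is our own sketch, not the actual BlackRock pipeline:

```python
# Toy sketch: cluster startup descriptions into "emerging technology" topics with NMF.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = [
    "mobile payments app for small businesses",
    "social media analytics for consumer brands",
    "shale drilling optimization software",
    "mobile advertising platform for local retailers",
    "social network for college students",
    "digital advertising exchange for publishers",
]  # in practice: thousands of descriptions, bucketed by funding year

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(descriptions)                      # document-term matrix

nmf = NMF(n_components=3, init="nndsvd", random_state=0) # use more topics on a real corpus
doc_topics = nmf.fit_transform(X)                        # how strongly each startup loads on each topic

terms = vec.get_feature_names_out()
for k, topic in enumerate(nmf.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")                # e.g. "mobile, advertising, ..."
```

The signal in the strategy then comes from the next step he describes: checking which publicly traded companies actually talk about those topic terms in their filings and calls, and how early.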
This is a bit of a tangent, but I think about this all the time, and I used to have a great answer before ChatGPT came out: do you think that AI will ever win at quant?

[00:09:54] swyx: I mean, what is Rentech doing? Whatever they're doing is working, apparently. But for most mortals, just waving your wand and saying AI doesn't make sense when your sample size is actually fairly low.

[00:10:08] Like, we have maybe 40 years of financial history, if you're lucky, times what, 4,000 listed equities. It's actually not a lot.

[00:10:17] Reza Shabani: It's not a lot at all. And constantly changing market conditions, and latent variables, and all of that as well. And then

[00:10:24] swyx: retroactively you're like, oh, okay, someone will discover a giant factor that explains retroactively everything that you've been doing that you thought was alpha, and you're like, nope, actually you're just exposed to another factor that you just didn't think about. Everything was momentum.

[00:10:37] And one piece that I really liked was Andrew Lo from MIT; I think he had a paper on bid-ask spreads. And if you just took into account the liquidity of markets, that would account for a lot of active trading strategies' alpha, and that systematically declined as interest rates declined. After I looked at that, I was like, okay, I'm never gonna get this right.

[00:11:01] Reza Shabani: Yeah. It's a crazy field, and I always thought of the adversarial aspect of it as being the part that AI would have a pretty difficult time tackling. Just because there's someone on the other end trying to out-game you, and AI can fail in a lot of those situations. Yeah.

[00:11:23] swyx: Cool.

[00:11:23] From Data to AI at Replit

[00:11:23] Alessio Fanelli: Awesome. And now you've been at Replit almost two years. What do you do there? What does the team do? How has that evolved since you joined? Especially since large language models are now top of mind, but, you know, two years ago it wasn't quite as mainstream. So how has that evolved?

[00:11:40] Reza Shabani: Yeah, so when I joined, I joined a year and a half ago, we actually had to build out a lot of data pipelines, and so I started doing a lot of data work. There were databases for production systems and whatnot, but we just didn't have the infrastructure to query data at scale and to process that data at scale, and Replit has tons of users, tons of data, just tons of repls.

[00:12:04] And I can get into some of those numbers, but if you wanted to answer the question, for example, of what is the most forked repl on Replit, you couldn't answer that back then, because the query would just completely time out. And so a lot of the work originally just went into building data infrastructure, modernizing the data infrastructure in a way where you can answer questions like that, where you can pull in data from any particular repl to process and make available for search.

[00:12:34] And moving all of that data into a format where you can do all of this in minutes, as opposed to, you know, days or weeks or months.
That laid a lot of the groundwork for building anything in AI, at least in terms of training our own models and then fine-tuning them with Replit data.

[00:12:50] So then we started a team last year, recruited people, and went from a team of zero or a team of one to the AI and data team today. We build everything related to Ghostwriter. So that means the various features like Explain Code, Generate Code, Transform Code, and Ghostwriter Chat, which is like an in-context chat product within the IDE.

[00:13:18] And then the code completion models, which are Ghostwriter Code Complete, which was the very first version of Ghostwriter. And we also support things like search, and anything that requires large data scale or large-scale processing of data for the site.

[00:13:38] And various types of ML algorithms for the site, for internal use of the site, to do things like detect and stop abuse. Mm-hmm.

[00:13:47] Alessio Fanelli: Yep. Sounds like a lot of the early stuff you worked on was more analytical, kind of analyzing data, getting answers on these things. Obviously this has evolved now into some

[00:13:57] production use cases, code LLMs. How has the team, and maybe some of the skills, changed? I know there's a lot of people wondering, oh, I was a modern data stack expert, or whatever, I was doing feature development, how's my job gonna change?

[00:14:12] Reza Shabani: Yeah, it's a good question. I mean, I think that with language models, a lot of the shift has gone from traditional ML towards more NLP-backed ML, I guess.

[00:14:26] And so there's an entire skill set of applicants that I no longer see, at least for this role, which are people who know how to do time series and ML across time. Right. And you know that exact feeling of how difficult it is: you have some text or some variable, and then all of a sudden you wanna track that over time.

[00:14:50] The number of dimensions that it introduces is just wild, and it's a totally different skill set than what we do in, for example, language models. And it's a skill that is, at least at Replit, not used much, and I'm sure in other places it's used a lot. But a lot of the excitement about language models has pulled away attention from some of these other ML areas, which are extremely important and, I think, still going to be valuable.

[00:15:21] So I would just recommend, like, anyone who is a data stack expert: of course it's cool to work with NLP and text data and whatnot, but I do think at some point, having skills outside of that area and in more traditional aspects of ML will certainly be valuable as well.

[00:15:39] swyx: Yeah. I'd like to spend a little bit of time on this data stack notion, because you were effectively the first data hire at Replit, and I just spent the past year myself diving into the data ecosystem. I think a lot of software engineers are actually completely unaware that basically every company eventually evolves

[00:15:57] the data team, and the data team does everything that you just mentioned. Yeah.
All of us do exactly the same things, set up the same pipelines you know, shop at the same warehouses essentially. Yeah, yeah, yeah, yeah. So that they enable everyone else to query whatever they, whatever they want. And to, to find those insights that that can drive their business.[00:16:15] Because everyone wants to be data driven. They don't want to do the janitorial work that it comes, that comes to, yeah. Yeah. Hooking everything up. What like, so rep is that you think like 90 ish people now, and then you, you joined two years ago. Was it like 30 ish people? Yeah, exactly. We're 30 people where I joined.[00:16:30] So and I just wanna establish your founders. That is exactly when we hired our first data hire at Vilify as well. I think this is just a very common pattern that most founders should be aware of, that like, You start to build a data discipline at this point. And it's, and by the way, a lot of ex finance people very good at this because that's what we do at our finance job.[00:16:48] Reza Shabani: Yeah. Yeah. I was, I was actually gonna Good say that is that in, in some ways, you're kind of like the perfect first data hire because it, you know, you know how to build things in a reliable but fast way and, and how to build them in a way that, you know, it's, it scales over time and evolves over time because financial markets move so quickly that if you were to take all of your time building up these massive systems, like the trading opportunities gone.[00:17:14] So, yeah. Yeah, they're very good at it. Cool. Okay. Well,[00:17:18] swyx: I wanted to cover Ghost Writer as a standalone thing first. Okay. Yeah. And then go into code, you know, V1 or whatever you're calling it. Yeah. Okay. Okay. That sounds good. So order it[00:17:26] Replit GhostWriter[00:17:26] Reza Shabani: however you like. Sure. So the original version of, of Ghost Writer we shipped in August of, of last year.[00:17:33] Yeah. And so this was a. This was a code completion model similar to GitHub's co-pilot. And so, you know, you would have some text and then it would predict like, what, what comes next. And this was, the original version was actually based off of the cogen model. And so this was an open source model developed by Salesforce that was trained on, on tons of publicly available code data.[00:17:58] And so then we took their their model, one of the smaller ones, did some distillation some other kind of fancy tricks to, to make it much faster and and deployed that. And so the innovation there was really around how to reduce the model footprint in a, to, to a size where we could actually serve it to, to our users.[00:18:20] And so the original Ghost Rider You know, we leaned heavily on, on open source. And our, our friends at Salesforce obviously were huge in that, in, in developing these models. And, but, but it was game changing just because we were the first startup to actually put something like that into production.[00:18:38] And, and at the time, you know, if you wanted something like that, there was only one, one name and, and one place in town to, to get it. And and at the same time, I think I, I'm not sure if that's like when the image models were also becoming open sourced for the first time. 
And so the world went from this place where there was like literally one company that had all of these really advanced models to, oh wait, maybe these things will be everywhere.

[00:19:04] And that's exactly what's happened in the last year or so: as the models get more powerful, you always kind of see an open source version come out that someone else can build and put into production very quickly, at a fraction of the cost. So yeah, the code completion Ghostwriter was really just that. We wanted to fine-tune it a lot to change the way that our users could interact with it.

[00:19:31] So, just to make it more customizable for our use cases on Replit. People on Replit write a lot of, like, JSX for example, which I don't think was in the original training set for CodeGen. And they do specific things that are more tuned to HTML, like they might wanna write inline style or inline CSS, basically. Those types of things. And so we experimented with fine-tuning CodeGen a bit here and there, and the results just kind of weren't there; they weren't where we wanted the model to be. And then we just figured we should build our own infrastructure to train these things from scratch.

[00:20:11] Like, LLMs aren't going anywhere. It's not like we're going back to that world of there's just one game in town. And we had the skills, the infrastructure, and the team to do it. So we just started doing that. And we'll be releasing our very first open source code model this week.

[00:20:31] Benchmarking Code LLMs

[00:20:31] Alessio Fanelli: And when you say it was not where you wanted it to be, how were you benchmarking

[00:20:36] Reza Shabani: it? In that particular case, we have really two sets of benchmarks that we use. One is HumanEval, so just the standard kind of benchmark for Python, where you give the model a function definition with some string describing what it's supposed to do, and then you allow it to complete that function, and then you run a unit test against it and see if what it generated passes the test.

[00:21:02] So we would always run this on the model. The funny thing is, the fine-tuned versions of CodeGen actually did pretty well on that benchmark. But then we have something, instead of HumanEval we call it AmjadEval, which is basically like, what does Amjad think?

[00:21:22] Yeah, it's exactly that. It's like testing the vibes of a model. And it's crazy, like, I've never seen anyone test a model so thoroughly in such a short amount of time. He knows exactly what to write and how to prompt the model to get a very quick read on its quote-unquote vibes.

[00:21:43] And we take that really seriously. And I remember there was one time where we trained a model that had really good HumanEval scores, and the vibes were just terrible. Like, it just wouldn't, you know, it seemed overtrained.
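For readers who haven't seen HumanEval, the loop Reza describes is simple to sketch: prompt with a signature plus docstring, let the model complete the body, then run a hidden unit test against the result. The `is_palindrome` task and the model call below are made up for illustration; the real benchmark has 164 hand-written Python problems and executes candidates in a sandbox:

```python
# Toy HumanEval-style check (illustration only, not the official harness).
prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
'''

def run_candidate(completion: str) -> bool:
    """Exec the prompt plus the model's completion, then run a hidden unit test."""
    namespace = {}
    try:
        exec(prompt + completion, namespace)           # real harnesses sandbox this step
        fn = namespace["is_palindrome"]
        return fn("racecar") and not fn("hello")       # the "unit test"
    except Exception:
        return False

# completion = model.generate(prompt)   # hypothetical call to a code completion model
completion = "    return s == s[::-1]\n"               # a completion that should pass
print("pass" if run_candidate(completion) else "fail")
```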
So that's a lot of what we found: we just couldn't get it to pass the vibes test, no matter how the, how

[00:22:04] swyx: the eval. Well, can you formalize AmjadEval? Because I actually have a slight discomfort with HumanEval effectively being the only code benchmark, yeah, that we have. Yeah. Isn't that

[00:22:14] Reza Shabani: weird? It's bizarre. It's weird that we can't do better than that in some way. So, okay. If

[00:22:21] swyx: I asked you to formalize AmjadEval, what does he look for that HumanEval doesn't do well on?

[00:22:25] Reza Shabani: Ah, that's a great question. A lot of it is contextual, like deep within specific functions. Let me think about this.

[00:22:38] swyx: Yeah, we can pause if you need to pull up something.

[00:22:41] Reza Shabani: Yeah, let me pull up a few.

[00:22:43] swyx: This is gold, this is catnip for people. Okay. Because we might actually influence a benchmark being evolved, right. So, yeah. That would be,

[00:22:50] Reza Shabani: that would be huge. This was his original message, when he said it passed the vibes test with flying colors. And so you have some Ghostwriter comparisons: Ghostwriter on the left, and CodeGen is on the right.

[00:23:06] AmjadEval live demo

[00:23:06] Reza Shabani: So here's Ghostwriter. Okay.

[00:23:09] swyx: So basically, if I summarize it: for Ghostwriter, there's a bunch of comments talking about how you basically implement a clone process, or clone a process. And it's describing a bunch of possible states that he might want to match.

[00:23:25] And then it asks for a single line of code defining what possible values of a namespace it might be initialized to. In AmjadEval. With what model is this? Is this your... this is our model, this is the one we're releasing. Yeah. It actually defines constants which are human-readable and nice.

[00:23:42] And then in the other one, CodeGen, the Salesforce model, it just initializes it to zero, because it reads that it starts off as an int. Yeah, exactly. So

[00:23:51] Reza Shabani: interesting. Yeah. You had a much better explanation of that than I did. Okay, so this is, yeah, handle operation. This is on the left.

[00:24:00] Okay.

[00:24:00] swyx: So this is Replit's version. Yeah. Where it's implementing a function, and it's infilling, is that what it's doing, inside of a sum operation?

[00:24:07] Reza Shabani: So this one doesn't actually do the infill, so that's the completion inside of the sum operation. But it's not taking into account context after this value, but

[00:24:18] swyx: Right, right. So it's writing an inline lambda function in Python. Okay.

[00:24:21] Reza Shabani: Mm-hmm. Versus

[00:24:24] swyx: this one is just passing in the nearest available variable it can find, yeah.

[00:24:30] Reza Shabani: Okay. So, okay, I'll get some really good ones in a second. So, okay, here's tokenize. So

[00:24:37] swyx: this is an assertion on a value, and it's helping to basically complete the entire, I think it looks like an AST that you're writing here.

[00:24:46] Mm-hmm. That's good. And then what does Salesforce CodeGen do? This is Salesforce CodeGen here. So is that invalid in some way, or what are we supposed to see? It's just making up tokens. Oh, okay. Yeah, yeah. So it's just much better at context.
Yeah. Okay.

[00:25:04] Reza Shabani: And I guess, to be fair, we have to show a case where CodeGen does better.

[00:25:09] Okay. All right. So here's one, on the left, right, which

[00:25:12] swyx: is another assertion, where it's just saying that if you pass in a list, it's going to throw an exception saying it unexpectedly got a list. And Salesforce CodeGen says,

[00:25:24] Reza Shabani: So Ghostwriter was sure that the first argument needs to be a list

[00:25:30] swyx: here. So it hallucinated that it wanted a list, yeah, even though you never said it was gonna be a list.

[00:25:35] Reza Shabani: Yeah. And it's an argument of that. Yeah. Mm-hmm. So, okay, here's a cooler quiz for you all, cuz I struggled with this one for a second. Okay. What is...

[00:25:47] swyx: Okay, so this is a for loop example from Amjad. And it's sort of like a Q&A context in a chatbot. Amjad is asking, what does this code log? And it just pastes in some JavaScript code. The JavaScript code is a for loop with a setTimeout inside of it, and the console.log logs out the iteration variable of the for loop.

[00:26:10] So it goes from zero to five, and then it just increases the delay between the timeouts each time. Yeah.

[00:26:15] Reza Shabani: So, okay. So this answer was provided by Bard. Mm-hmm. And does it look correct to you? Well,

[00:26:22] Alessio Fanelli: the numbers too, but it's not one second, the time between them increases. It's like the first one, then the next one is one second apart, then it's two seconds, three seconds. So

[00:26:32] Reza Shabani: well, so, you know, when I saw this, the message in the thread was like, our model's better than Bard at coding. Uh-huh. This is the Bard answer, uh-huh, that looks totally right to me.

[00:26:46] Yeah. And this is our

[00:26:47] swyx: answer. It logs 5, 5, 5, 5, 5. What is it, five 5s? Oh, oh, because it logs the state of i, which is five by the time that the log happens. Mm-hmm. Yeah.

[00:27:01] Reza Shabani: Oh God. So we were shocked. The Bard answer looked totally right to me. And then somehow our code completion model, mind you, this is not a conversational chat model,

[00:27:14] Mm-hmm. somehow gets this right. And Bard, obviously a much larger, much more capable model with all this fancy transfer learning and whatnot, somehow doesn't get it right. So this is the kind of stuff that goes into AmjadEval that you won't find in any benchmark.

[00:27:35] Good. And it's the kind of thing that makes something pass a vibe test at Replit.

[00:27:42] swyx: Okay. Well, okay, to me this is not so much a vibe test as, these are just interview questions. Yeah, that's, we're straight up just asking interview questions

[00:27:50] Reza Shabani: right now.
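Why is the answer five 5s? In the JavaScript snippet, every callback scheduled by setTimeout closes over the same var-scoped loop variable, and by the time the first timeout fires the loop has already finished, so i is 5 for all of them. Python closures are late-binding in the same way, so here is the analogous gotcha sketched in Python (our own illustration, since the original snippet isn't reproduced in the transcript):

```python
# Python analogue of the quiz: closures capture the loop variable itself,
# not its value at the time the callback was created.
callbacks = []
for i in range(5):
    callbacks.append(lambda: print(i))   # every lambda closes over the same i

for cb in callbacks:
    cb()          # prints 4 4 4 4 4 (the JS version prints 5, since `var i` ends the loop at 5)

# Binding the value at definition time restores the "expected" output.
fixed = [lambda i=i: print(i) for i in range(5)]
for cb in fixed:
    cb()          # prints 0 1 2 3 4
```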
Yeah, no, the vibe test, the reason why it's really difficult to show screenshots of a vibe test, is because it really depends on how snappy the completion is, what the latency feels like, and if it feels like it's making you more productive.

[00:28:08] And a lot of the time, the mix of really low latency and actually helpful content and helpful completions is what makes up the vibe test. And I think part of it is also: is it returning to you, or is there a lack of it returning to you, things that may look right but be completely wrong?

[00:28:30] I think that also kind of affects, yeah, yeah, the vibe test as well. Yeah. And so, yeah, this is very much like an interview question. Yeah.

[00:28:39] swyx: The one with the number of processes, that was definitely a vibe test. Like, what kind of code style do you expect in this situation? Yeah. Is this another example? Okay.

[00:28:49] Reza Shabani: Yeah. This is another example, with some more, okay, explanations.

[00:28:53] swyx: Should we look at the Bard one

[00:28:54] Reza Shabani: first? Sure. These are, I think these are, yeah, this is the original GPT-3, with the full-size 175 billion

[00:29:03] swyx: parameters. Okay, so you asked GPT-3: I'm a highly intelligent question-answering bot. If you ask me a question that is rooted in truth, I'll give you the answer. If you ask me a question that is nonsense, I will respond with unknown. And then you ask it a question: what is the square root of a banana? It answers nine. So, complete hallucination, and it failed to follow the instruction that you gave it.

[00:29:22] I wonder, if you used an instruction-tuned version, it might, yeah, do better?

[00:29:28] Reza Shabani: On the original

[00:29:29] swyx: GPT, yeah. Because you're giving it instructions and it's not

[00:29:33] Reza Shabani: instruction tuned. Now, the interesting thing though is our model here, which does follow the instructions, this is not instruction tuned yet, and we still are planning to instruction-tune it.

[00:29:43] Right? So it's like, yeah, yeah, exactly. So,

[00:29:45] swyx: So this is the Replit model. Same question: what is the square root of a banana? And it answers unknown. And this being one of the things that Amjad was talking about, which you guys are finding as a discovery, which is: it's better on pure natural language questions, even though you trained it on code.

[00:30:02] Exactly. Yeah. Hmm. Is that because there's a lot of comments in,

[00:30:07] Reza Shabani: No. I mean, I think part of it is that there's a lot of comments, and there's also a lot of natural language in a lot of code, right? In terms of documentation, you have a lot of Markdown and reStructuredText, and there's also just a lot of web-based code on Replit, and HTML tends to have a lot of natural language in it.

[00:30:27] But I don't think the comments from code would help it reason in this way, you know, where you can answer questions based on instructions, for example. Okay. But yeah, I know that that's like one of the things
That really shocked us is the kind of the, the fact that like, it's really good at, at natural language reasoning, even though it was trained on, on code.[00:30:49] swyx: Was this the reason that you started running your model on hella swag and[00:30:53] Reza Shabani: all the other Yeah, exactly. Interesting. And the, yeah, it's, it's kind of funny. Like it's in some ways it kind of makes sense. I mean, a lot of like code involves a lot of reasoning and logic which language models need and need to develop and, and whatnot.[00:31:09] And so you know, we, we have this hunch that maybe that using that as part of the training beforehand and then training it on natural language above and beyond that really tends to help. Yeah,[00:31:21] Aligning Models on Vibes[00:31:21] Alessio Fanelli: this is so interesting. I, I'm trying to think, how do you align a model on vibes? You know, like Bard, Bard is not purposefully being bad, right?[00:31:30] Like, there's obviously something either in like the training data, like how you're running the process that like, makes it so that the vibes are better. It's like when it, when it fails this test, like how do you go back to the team and say, Hey, we need to get better[00:31:44] Reza Shabani: vibes. Yeah, let's do, yeah. Yeah. It's a, it's a great question.[00:31:49] It's a di it's very difficult to do. It's not you know, so much of what goes into these models in, in the same way that we have no idea how we can get that question right. The programming you know, quiz question. Right. Whereas Bard got it wrong. We, we also have no idea how to take certain things out and or, and to, you know, remove certain aspects of, of vibes.[00:32:13] Of course there's, there's things you can do to like scrub the model, but it's, it's very difficult to, to get it to be better at something. It's, it's almost like all you can do is, is give it the right type of, of data that you think will do well. And then and, and of course later do some fancy type of like, instruction tuning or, or whatever else.[00:32:33] But a lot of what we do is finding the right mix of optimal data that we want to, to feed into the model and then hoping that the, that the data that's fed in is sufficiently representative of, of the type of generations that we want to do coming out. That's really the best that, that you can do.[00:32:51] Either the model has. Vibes or, or it doesn't, you can't teach vibes. Like you can't sprinkle additional vibes in it. Yeah, yeah, yeah. Same in real life. Yeah, exactly right. Yeah, exactly. You[00:33:04] Beyond Code Completion[00:33:04] Alessio Fanelli: mentioned, you know, co being the only show in town when you started, now you have this, there's obviously a, a bunch of them, right.[00:33:10] Cody, which we had on the podcast used to be Tap nine, kite, all these different, all these different things. Like, do you think the vibes are gonna be the main you know, way to differentiate them? Like, how are you thinking about. What's gonna make Ghost Rider, like stand apart or like, do you just expect this to be like table stakes for any tool?[00:33:28] So like, it just gonna be there?[00:33:30] Reza Shabani: Yeah. I, I do think it's, it's going to be table stakes for sure. I, I think that if you don't if you don't have AI assisted technology, especially in, in coding it's, it's just going to feel pretty antiquated. 
But I do think that Ghostwriter stands apart from some of these other tools for specific reasons too.

[00:33:51] So this is kind of one of the things that these models haven't really done yet: come outside of code completion, and outside of just a single editor file, right? So what they're doing is predicting the text that can come next, but they're not helping with the development process quite yet, outside of just completing code in a text file.

[00:34:16] And so the types of things that we wanna do with Ghostwriter are enable it to help in the software development process, not just editing particular files. And so that means using the right mix of the right model for the task at hand. But we want Ghostwriter to be able to create scaffolding for you for these projects.

[00:34:38] And so imagine, if you would, like Terraform, but powered by Ghostwriter, right? I put up this website, I'm starting to get a ton of traffic to it, and maybe I need to create a backend database. And so we want that to come from Ghostwriter as well, so it can actually look at your traffic, look at your code, and create a schema for you that you can then deploy in Postgres, or whatever else. You know, doing anything in cloud can be a nightmare as well. Like if you wanna create a new service account, and you wanna deploy, you know, nodes, and have that service account kind of talk to those nodes and return some other information, those are the types of things that currently we have to go back and look at some documentation for Google Cloud, go look at how our code base does it, ask around in Slack, kind of figure that out, and create a pull request.

[00:35:31] Those are the types of things that we think we can automate away with more advanced uses of Ghostwriter, once we go past, like, here's what would come next in this file. So that's the real promise of it: the ability to help you generate software, instead of just code in a particular file.

[00:35:50] Ghostwriter Autonomous Agent

[00:35:50] Alessio Fanelli: Are you giving REPL access to the model? Like not Replit, like an actual REPL. Like, once the model generates some of this code, especially when it's in the background and it's not the completion use case, it can actually run the code to see if it works. There's a cool open source project called Wolverine that does something like that. It's like self-healing software: it gives the model REPL access and keeps running until it fixes

[00:36:11] Reza Shabani: itself. Yeah. So right now there's Ghostwriter Chat and Ghostwriter code completion. Ghostwriter Chat does have that advantage, in that it knows all the different parts of the IDE, and so, for example, if an error is thrown, it can look at the traceback and suggest a fix for you.

[00:36:33] So it has that type of integration. But what we really want to do is merge the two in a way where we want Ghostwriter to be like an autonomous agent that can actually drive the IDE.
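The self-healing loop Alessio alludes to is easy to picture: run the program, hand the traceback to a code model, apply the suggested rewrite, and retry. A toy sketch follows; `suggest_fix` is a hypothetical placeholder for a model call, not a real Ghostwriter or Replit API:

```python
# Toy "self-healing" loop sketch (illustration only).
import subprocess

def suggest_fix(source: str, traceback: str) -> str:
    """Hypothetical placeholder: ask a code LLM to rewrite `source` given the traceback."""
    return source   # swap in a real model call; returning the source unchanged is a no-op

def self_heal(path: str, max_attempts: int = 3) -> bool:
    """Run a script; on failure, feed the traceback to the model, patch the file, retry."""
    for _ in range(max_attempts):
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            return True                                   # script ran cleanly
        with open(path) as f:
            source = f.read()
        fixed = suggest_fix(source, result.stderr)        # the traceback goes to the model
        with open(path, "w") as f:
            f.write(fixed)                                # apply the suggested patch and retry
    return False
```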
So, in these action models, you have a sequence of events, and then you can use transformers to keep track of that sequence and predict the next event.

[00:36:56] It's how companies like Adept work, these browser models that can go and scroll through different websites, or take some series of actions in a sequence. Well, it turns out the IDE is actually a perfect place to do that, right? So when we talk about creating software, not just completing code in a file: what do you do when you build software? You might clone a repo, and then you will go and change some things. You might add a new file, go down, highlight some text, delete that value, and point it to some new database, depending on the value in a different config file or in your environment. And then you would go in and add an additional block of code to extend its functionality, and then you might deploy that.

[00:37:40] Well, we have all of that data right there in the Replit IDE. And we have terabytes and terabytes of OT data, you know, operational transform data. And so we can see that this person has created a file, what they called it, and they start typing in the file. They go back and edit a different file to match the class name that they just put in the original file. All of that kind of sequence data is what we're looking to train our next model on. And so that entire process of actually building software within the IDE, not just, here's some text, what comes next, but rather the actions that go into creating a fully developed program.

[00:38:25] And a lot of that includes, for example, running the code and seeing: does this work, does this do what I expected? Does it error out? And then what does it do in response to that error? So all of that is insanely valuable information that we want to put into our next model. And we think that one can be way more advanced than this, you know, Ghostwriter code completion model.

[00:38:47] Releasing Replit-code-v1-3b

[00:38:47] swyx: Cool. Well, we wanted to dive in a little bit more on the model that you're releasing. Maybe we can just give people a high level: what is being released, what have you decided to open source, and maybe why open source, the story of the YOLO project. And yeah, I mean, it's a cool story, so just tell it from the start.

[00:39:06] Yeah.

[00:39:06] Reza Shabani: So what's being released is the first version that we're going to release. It's a code model called replit-code-v1-3b. So this is a relatively small model, it's 2.7 billion parameters, and it's the first LLaMA-style model for code. So, meaning it's just seen tons and tons of tokens.

[00:39:26] It's been trained on 525 billion tokens of code, all permissively licensed code, and it's three epochs over the training set. And, you know, all of that in a 2.7 billion parameter model. And in addition to that, for this model we trained our very own vocabulary as well.

[00:39:48] So this doesn't use the CodeGen vocab. For the tokenizer, we trained a totally new tokenizer on the underlying data from scratch, and we'll be open sourcing that as well. It has something like 32,000.
The vocabulary size is in the 32 thousands, as opposed to the 50 thousands.

[00:40:08] Much more specific for code. And so it's smaller and faster, which helps with inference, it helps with training, and it can produce more relevant content, just because the vocab is very much trained on code as opposed to natural language. So, yeah, we'll be releasing that.

[00:40:29] This week it'll be up on Hugging Face so people can take it, play with it, fine-tune it, do all types of things with it. We're eager and excited to see what people do with the code completion model. It's small, it's very fast, we think it has great vibes, but we hope other people feel the same way.

[00:40:49] And yeah, and then after that, we might consider releasing the Replit-tuned model at some point as well, but we're still doing some more work around that.

[00:40:58] swyx: Right? So there are actually two models: replit-code-v1-3b and a Replit fine-tuned v1-3b. And the fine-tuned one is the one that has the 50% improvement in common sense benchmarks, which is going from 20% to 30%.

[00:41:13] Reza Shabani: Yes. Yeah, exactly. And so the additional tuning that was done on that was on the publicly available data on Replit. And so that's data that's in public repls and is permissively licensed. So fine-tuning on that then leads to a surprisingly better, like, significantly better model, which is this Replit-tuned v1-3b. Same size, same very fast inference, same vocabulary and everything.

[00:41:46] The only difference is that it's been trained on additional Replit data. Yeah.

[00:41:50] swyx: And I'll call out that, I think in one of the follow-up Q&As, Amjad mentioned people had some concerns with using Replit data. Not, I mean, the licensing is fine, it's more about the data quality, because there's a lot of beginner code, yeah, and a lot of maybe wrong code. Mm-hmm. But apparently it just wasn't an issue at all. You did

[00:42:08] Reza Shabani: some filtering. Yeah. I mean, so we did some filtering, but as you know, when you're talking about data at that scale, it's impossible to keep out, it's impossible to find only the select pieces of data that you want the model to see.

[00:42:24] And so a lot of that kind of people-learning-to-code material was in there anyway. And we obviously did some quality filtering, but a lot of it went into the fine-tuning process and it really helped, for some reason. You know, there's a lot of high-quality code on Replit, but there's, like you said, a lot of beginner code as well.

[00:42:46] And that was the really surprising thing: that somehow really improved the model and its reasoning capabilities. It felt much more kind of instruction-tuned afterward. And we have our suspicions as to why. There's a lot of, like, assignments on Replit that explain, this is how you do something, and then you might have answers and whatnot. There's a lot of people who learn to code on Replit, right? And, like, think of a beginner coder, think of a code model that's learning to code, learning this reasoning and logic.
It's probably a lot more valuable to see that type of, you know, the, the type of stuff that you find on rep as opposed to like a large legacy code base that that is, you know, difficult to, to parse and, and figure out.[00:43:29] So, so that was very surprising to see, you know, just such a huge jump in in reasoning ability once trained on, on replica data.[00:43:38] The YOLO training run[00:43:38] swyx: Yeah. Perfect. So we're gonna do a little bit of storytelling just leading up to the, the an the developer day that you had last week. Yeah. My understanding is you decide, you raised some money, you decided to have a developer day, you had a bunch of announcements queued up.[00:43:52] And then you were like, let's train the language model. Yeah. You published a blog post and then you announced it on Devrel Day. What, what, and, and you called it the yolo, right? So like, let's just take us through like the[00:44:01] Reza Shabani: sequence of events. So so we had been building the infrastructure to kind of to, to be able to train our own models for, for months now.[00:44:08] And so that involves like laying out the infrastructure, being able to pull in the, the data processes at scale. Being able to do things like train your own tokenizes. And and even before this you know, we had to build out a lot of this data infrastructure for, for powering things like search.[00:44:24] There's over, I think the public number is like 200 and and 30 million res on, on re. And each of these res have like many different files and, and lots of code, lots of content. And so you can imagine like what it must be like to, to be able to query that, that amount of, of data in a, in a reasonable amount of time.[00:44:45] So we've You know, we spent a lot of time just building the infrastructure that allows for for us to do something like that and, and really optimize that. And, and this was by the end of last year. That was the case. Like I think I did a demo where I showed you can, you can go through all of replica data and parse the function signature of every Python function in like under two minutes.[00:45:07] And, and there's, you know, many, many of them. And so a and, and then leading up to developer day, you know, we had, we'd kind of set up these pipelines. We'd started training these, these models, deploying them into production, kind of iterating and, and getting that model training to production loop.[00:45:24] But we'd only really done like 1.3 billion parameter models. It was like all JavaScript or all Python. So there were still some things like we couldn't figure out like the most optimal way to to, to do it. So things like how do you pad or yeah, how do you how do you prefix chunks when you have like multi-language models, what's like the optimal way to do it and, and so on.[00:45:46] So you know, there's two PhDs on, on the team. Myself and Mike and PhDs tend to be like careful about, you know, a systematic approach and, and whatnot. And so we had this whole like list of things we were gonna do, like, oh, we'll test it on this thing and, and so on. And even these, like 1.3 billion parameter models, they were only trained on maybe like 20 billion tokens or 30 billion tokens.[00:46:10] And and then Amjad joins the call and he's like, no, let's just, let's just yolo this. Like, let's just, you know, we're raising money. Like we should have a better code model. Like, let's yolo it. Let's like run it on all the data. How many tokens do we have? 
And we're like, you know, both Michael and I, I looked at him during the call and we were both like, oh God, are we really just gonna do this?[00:46:34] swyx: Well, what's the hangup? I mean, you know that large models work.[00:46:37] Reza Shabani: You know that they work, but you also don't know whether or not you can improve the process in important ways by doing more data work, scrubbing additional content. And also, it's expensive. It can cost quite a bit, and if you do it incorrectly, you can actually...[00:47:02] swyx: It's like you hit the go button once and you sit back for three days.[00:47:05] Reza Shabani: Exactly. Yeah. Well, more like two days. In our case, yeah, two days if you're running 256 GPUs. And then when that comes back, you have to take some time to test it.[00:47:19] And then if it fails and you can't really figure out why, it's just kind of a time-consuming process and you just don't know what's going to come out of it. But no, I mean, Amjad was like, no, let's just train it on all the data. How many tokens do we have? We tell him, and he's like, that's not enough.[00:47:38] Where can we get more tokens? And so Michele had this, you know, great idea to train it on multiple epochs, and so...[00:47:45] swyx: Resampling the same data again.[00:47:47] Reza Shabani: Yeah. Which is known to be risky, or tends to overfit. You can overfit. But he pointed us to some evidence that actually maybe this isn't really going to be a problem.[00:48:00] And he was very persuasive in doing that. And so it was risky, and we did that training. It turned out to actually be great for that base model. And so then we decided, let's keep pushing. We have 256 GPUs running, let's see what else we can do with them.[00:48:20] So we ran a couple of other implementations. We ran the fine-tuned version, as I said, and that's where it becomes really valuable to have had that entire pipeline built out, because then we can pull all the right data, de-dupe it, go through the entire processing stack that we had built over months.[00:48:41] We did that in a matter of, like, two days for the Replit data as well: removed any personal information, any PII, removed harmful content, any of that stuff. And we just put it back through that same pipeline and then trained on top of that.[00:48:59] And so I believe that the Replit-tuned model has seen something like 680 billion tokens. And in terms of code, I mean, that's like a universe of code. There really isn't that much more out there. And it gave us really, really promising results. And then we also did a UL2 run, which allows for fill-in-the-middle capabilities, and we'll be working to deploy that on Replit and test that out as well soon.[00:49:29] But it was really just one of those cases where, leading up to developer day, had we done this in this more careful, systematic way, what would've taken probably two, three months got done in a week. That's fun. It was a lot of fun. Yeah.
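To make the fine-tuning data prep Reza describes more concrete (pull the data, de-dupe it, strip PII and harmful or low-quality content, then send it back through the same processing stack), here is a minimal illustrative sketch in Python. It is not Replit's actual pipeline: the toy corpus, the regexes, and the thresholds are all assumptions made purely for illustration.

```python
import hashlib
import re
from typing import Iterable, Iterator

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Stand-in corpus; in practice this would stream source files from storage.
TOY_CORPUS = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",               # exact duplicate, dropped by dedupe()
    "# contact me at student@example.com\nprint('hi')\n",
]

def dedupe(docs: Iterable[str]) -> Iterator[str]:
    """Exact-match dedup via content hashing; real pipelines often add near-dedup (e.g. MinHash)."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc

def scrub_and_filter(docs: Iterable[str], max_line_len: int = 1000) -> Iterator[str]:
    """Very rough PII scrubbing and quality filtering, purely for illustration."""
    for doc in docs:
        doc = EMAIL_RE.sub("<EMAIL>", doc)                    # mask obvious email addresses
        lines = doc.splitlines()
        if not lines:
            continue
        if max(len(line) for line in lines) > max_line_len:   # likely minified or generated code
            continue
        yield doc

if __name__ == "__main__":
    cleaned = list(scrub_and_filter(dedupe(TOY_CORPUS)))
    print(f"{len(cleaned)} documents kept out of {len(TOY_CORPUS)}")
    # Downstream: tokenize `cleaned` and hand it to the training job.
```

Real pipelines at this scale run distributed and add far more aggressive filtering, but the shape (dedupe, scrub, filter, retokenize) is the same.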
[00:49:49] Scaling Laws: from Kaplan to Chinchilla to LLaMA[00:49:49] Alessio Fanelli: And so every time I've seen the Stability releases, none of these models fit, like, the Chinchilla laws, in quotes, which are supposed to be, you know, 20 tokens per parameter. Was this part of the YOLO run?[00:50:04] Or are you just like, let's just throw all the tokens at it, it doesn't matter what's most efficient? Or do you think there's something about some of these scaling laws where, yeah, maybe it's good in theory, but I'd rather not risk it and just throw all the tokens that I have at it?[00:50:18] Reza Shabani: Yeah, I think it's hard to tell, just because, like I said, these runs are expensive, and if you think about how often these runs have been done, the number of models out there that have then been thoroughly tested in some form.[00:50:45] And I don't mean just HumanEval, but actually in front of actual users, for actual inference, as part of a real product that people are using. I mean, it's not that many. And so it's not like there are really well established rules as to whether or not something like that could lead to crazy amounts of overfitting or not. You just kind of have to use some intuition around it. And what we found is that our results seem to imply that we've really been under-training these models.[00:51:06] Oh my god. And so all of the compute that we threw at this, and the number of tokens, it really seems to help and really seems to improve. And I think these things kind of happen in the literature, where everyone kind of converges on something and seems to take it as fact.[00:51:27] And Chinchilla is a great example of, like, okay, you know, 20 tokens. But then, until someone else comes along and tries it out and sees, actually, this seems to work better. And from our results, it seems to imply that maybe even LLaMA may be undertrained.[00:51:45] And it may be better to train on even more tokens.[00:51:52] swyx: And for the listener, the original scaling law was Kaplan, which is 1.7. And then Chinchilla established 20. And now LLaMA-style seems to mean a 200x tokens-to-parameters ratio. So obviously you should go to 2000x, right?[00:52:08] Reza Shabani: I mean, we're kind of out of code at that point, you know, there is a real shortage of it. But I know there are people working on, I don't know if it's quite 2000, but it's getting close, on language models. And so our friends at Mosaic are working on some of these really, really big models that are language models, because with just code you end up running out of content.[00:52:31] So Jonathan at Mosaic, Jonathan and Naveen both have really interesting content on Twitter about that. And I just highly recommend following Jonathan.
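To put numbers on the ratios swyx rattles off, here is a small back-of-the-envelope calculation. The 2.7B parameter count is just an example in the same size class as the models discussed; the ratios are the rough ones quoted in the conversation, not exact figures from the papers.

```python
# Rough tokens-per-parameter ratios as quoted in the conversation above.
RATIOS = {
    "Kaplan (2020)": 1.7,
    "Chinchilla (2022)": 20,
    "LLaMA-style": 200,
}

params = 2.7e9  # example model size, roughly the 3B class discussed here

for name, ratio in RATIOS.items():
    tokens = params * ratio
    print(f"{name}: ~{tokens / 1e9:.0f}B tokens for a {params / 1e9:.1f}B-parameter model")

# Approximate output:
#   Kaplan (2020): ~5B tokens
#   Chinchilla (2022): ~54B tokens
#   LLaMA-style: ~540B tokens
```

Seen against those numbers, the roughly 680 billion tokens mentioned above for a model in this size class is far beyond the Chinchilla point, which is exactly the "maybe everyone has been under-training" argument being made here.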
[00:52:43] MosaicML[00:52:43] swyx: I'm sure you do. Well, okay, can we talk about Mosaic? So, I was sitting next to Naveen. I'm sure he's very happy that you guys had such success with Mosaic.[00:52:50] Maybe could you shout out what Mosaic did to help you out? What do they do well, what maybe people don't appreciate about having a trusted infrastructure provider versus a commodity GPU provider?[00:53:01] Reza Shabani: Yeah, so I talked about this a little bit in the blog post, in terms of what advantages Mosaic offers. And keep in mind, we had deployed our own training infrastructure before this, so we had some experience with it.[00:53:15] It wasn't like we had just tried Mosaic. And some of those things: one is you can actually get GPUs from different providers, and you don't need to be signed up for that cloud provider. So it kind of detaches your GPU offering from the rest of your cloud, because most of our cloud runs in GCP.[00:53:34] But this allowed us to leverage GPUs from other providers as well. And then another thing is training infrastructure as a service. So, you know, these GPUs burn out, you have node failures, you have all kinds of hardware issues that come up. And so the ability to not have to deal with that, and to let the Mosaic team provide that type of fault tolerance, was huge for us.[00:53:59] As well as a lot of their preconfigured LLM configurations for these runs. They have a lot of experience in training these models, and so they have the right kind of preconfigured setups for various models that make sure you have the right learning rates, the right training parameters, and that you're making the best use of the GPU and the underlying hardware.[00:54:26] And so your GPU utilization is always at optimal levels, you have fewer loss spikes, and if you do have them, you can recover from them. And you're really getting the most value out of the compute that you're throwing at your data. We found that to be incredibly helpful.[00:54:44] And so, of the time that we spent running things on Mosaic, very little of that time is trying to figure out why the GPU isn't being utilized, or why it keeps crashing, or why you have CUDA out-of-memory errors or something like that. So all of those things that make training a nightmare are really well handled by Mosaic and the Composer cloud and ecosystem.[00:55:12] swyx: Yeah. I was gonna ask, 'cause you're on GCP, if you're tempted to rewrite things for the TPUs. 'Cause Google's always saying that it's more efficient and faster, whatever, but no one has experience with them.[00:55:23] Reza Shabani: That's kind of the problem, is that no one's building on them, right? We want to build on systems that everyone else is building for.[00:55:31] And so with the TPUs it's not easy to do that.
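The fault tolerance Reza credits to Mosaic (burned-out GPUs, node failures, runs that pick up where they left off) boils down to checkpoint-and-resume plus automatic job restarts. The sketch below is a generic PyTorch version of that pattern, not MosaicML's or Composer's actual implementation; the model, data, and checkpoint interval are placeholders.

```python
import os
import torch
from torch import nn, optim

CKPT = "latest.pt"

def save_ckpt(step, model, opt):
    # Persist everything needed to continue the run after a crash.
    torch.save({"step": step, "model": model.state_dict(), "opt": opt.state_dict()}, CKPT)

def load_ckpt(model, opt):
    # If no checkpoint exists, start from step 0; otherwise resume after the saved step.
    if not os.path.exists(CKPT):
        return 0
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    return state["step"] + 1

model = nn.Linear(16, 1)                         # stand-in for the real model
opt = optim.AdamW(model.parameters(), lr=1e-3)
start = load_ckpt(model, opt)                    # resume wherever the last (possibly failed) run stopped

for step in range(start, 1000):
    x, y = torch.randn(32, 16), torch.randn(32, 1)   # stand-in batch
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        save_ckpt(step, model, opt)              # a managed platform also restarts the job automatically on node failure
```

A managed training service wraps this same idea with health checks, node replacement, and automatic job resubmission, which is what makes hardware failures largely invisible to the user.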
[00:55:36] Replit's Plans for the Future (and Hiring!)[00:55:36] swyx: So, plans for the future, like hard problems that you wanna solve? And maybe, what kind of people are you hiring onto your team?[00:55:44] Reza Shabani: Yeah. So we're currently hiring for two different roles on my team, although we welcome applications from anyone who thinks they can contribute in this area. Replit tends to be a band of misfits, and the type of people we work with and have on our team are just the perfect mix to do amazing projects like this with very, very few people.[00:56:09] Right now we're hiring for the applied AI/ML engineer role. And so, you know, this is someone who's creating data pipelines, processing the data at scale, creating runs and training models and you
Hagay Lupesko is VP of Engineering at MosaicML, a startup that enables teams to easily train large AI models on their data and in their own secure environment. We discuss the evolution of cloud-based machine learning (from “traditional” ML through LLMs), his experience building machine learning applications at leading technology companies, and the need for companies to build their own custom foundation models. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
Jonathan Frankle, Chief Scientist at MosaicML and Assistant Professor of Computer Science at Harvard University, joins us on this episode. With comprehensive infrastructure and software tools, MosaicML aims to help businesses train complex machine-learning models using their own proprietary data.

We discuss:
- Details of Jonathan's Ph.D. dissertation, which explores his “Lottery Ticket Hypothesis.”
- The role of neural network pruning and how it impacts the performance of ML models (a short illustrative pruning sketch follows these notes).
- Why transformers will be the go-to way to train NLP models for the foreseeable future.
- Why the process of speeding up neural net learning is both scientific and artisanal.
- What MosaicML does, and how it approaches working with clients.
- The challenges for developing AGI.
- Details around ML training policy and ethics.
- Why data brings the magic to customized ML models.
- The many use cases for companies looking to build customized AI models.

Jonathan Frankle - https://www.linkedin.com/in/jfrankle/

Resources:
- https://mosaicml.com/
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Thanks for listening to the Gradient Dissent podcast, brought to you by Weights & Biases. If you enjoyed this episode, please leave a review to help get the word out about the show. And be sure to subscribe so you never miss another insightful conversation.

#OCR #DeepLearning #AI #Modeling #ML
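For anyone who hasn't seen the pruning the Lottery Ticket Hypothesis builds on, the sketch below shows plain global magnitude pruning in PyTorch: zero out the smallest-magnitude weights and keep a mask of the survivors. It is illustrative only; the actual lottery-ticket procedure additionally rewinds the surviving weights to their (near-)initial values and retrains the sparse network.

```python
import torch
from torch import nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.8) -> dict:
    """Zero out the smallest-magnitude weights globally and return the binary masks."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)     # cutoff below which weights are removed
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                               # prune weight matrices, leave biases alone
            mask = (p.detach().abs() > threshold).float()
            p.data.mul_(mask)                         # apply the mask in place
            masks[name] = mask
    return masks

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = magnitude_prune(model, sparsity=0.8)          # keep roughly the largest 20% of weights
kept = sum(int(m.sum()) for m in masks.values())
total = sum(m.numel() for m in masks.values())
print(f"kept {kept}/{total} weights ({kept / total:.0%})")
```

The hypothesis is that, inside a large trained network, a small subnetwork like the one this mask selects could have been trained in isolation to comparable accuracy, if you can find it.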
There are many speculations about the future of artificial intelligence (AI), and in this episode, we hear the opinions and predictions of a player in the inner folds of the AI space. Naveen Rao is the CEO and Co-Founder of the machine learning (ML) training platform, MosaicML, and the former CEO and Co-Founder of Nervana Systems. Naveen shares insight into the thesis behind Mosaic and the practical applications of Large Language Models (LLMs), as well as the Generative Pre-trained Transformer-2 (GPT-2) to GPT-3 transition and the challenge of training models with the constraint of data limits. He predicts the evolution of AI models in terms of quantity, size, and function, and the future of computers in general. If you're curious as to whether it makes sense to build smaller models or own your own model, this episode is for you. Tune in to hear Naveen's opinions on the impact of AI on the economy, the danger of centralizing resources, what constitutes sentience, and much more. “What we're doing at Mosaic is building tools to enable more people to have access to these technologies. When it's all centralized in one or two or three players, that creates a huge power dynamic.” — @NaveenGRao

Key Points From This Episode:
- Naveen Rao's educational background and interest in synthetic intelligence.
- What led him to start his first AI company, Nervana Systems.
- The thesis behind MosaicML.
- What MosaicML offers customers.
- What Naveen considers to be 2022's most exciting breakthrough in AI.
- The innovation of ChatGPT.
- The GPT-2 to GPT-3 transition.
- The challenge of training models with the constraint of data limits.
- Naveen explains the concept of synthetic data.
- He predicts the evolution of AI models in terms of quantity, size, and function.
- Why it makes sense to build smaller models and own your own model where possible.
- Data as a moat component.
- Practical applications of LLMs.
- Naveen's opinion on whether AI will disrupt the economy or increase Gross Domestic Product (GDP).
- The danger of the centralization of resources.
- How MosaicML is making training more efficient given the limits AI is facing.
- The efficiency improvements MosaicML customers are seeking out.
- Naveen's prediction for the rate of cost decline.
- The history of the computer password.
- The future of computers.
- The question of what constitutes sentience.
- Naveen recounts the acquisition process of selling Nervana Systems to Intel.
- How the innovator's dilemma will play out among competitors in the AI space.
- Naveen's advice for his past self.
Jonathan Frankle, incoming Harvard Professor and Chief Scientist at MosaicML, is focused on reducing the cost of training neural nets. He received his PhD at MIT and his BSE and MSE from Princeton. Jonathan has also been instrumental in shaping technology policy related to AI. He worked on a landmark facial recognition report while working as a Staff Technologist at the Center on Privacy and Technology at Georgetown Law. Thanks to great guest Hina Dixit from Samsung NEXT for the introduction to Jonathan!

Listen and learn...
- Why we can't understand deep neural nets like we can understand biology or physics.
- Jonathan's "lottery ticket hypothesis": that neural nets are 50-90% bigger than they need to be... but it's hard to find which parts aren't necessary.
- How researchers are finding ways to reduce the cost and complexity of training neural nets.
- Why we shouldn't expect another AI winter, because "it's now a fundamental substrate of research".
- Which AI problems are a good fit for deep learning... and which ones aren't.
- What's the role for regulation in enforcing responsible use of AI.
- How Jonathan and his CTO Hanlin Tang at MosaicML create a culture that fosters responsible use of AI.
- Why Jonathan says "...We're building a ladder to the moon if we think today's neural nets will lead to AGI."

References in this episode...
- The AI Bill of Rights
- MosaicML
- Jonathan's personal site
About MartinMartin Casado is a general partner at the venture capital firm Andreessen Horowitz where he focuses on enterprise investing. He was previously the cofounder and chief technology officer at Nicira, which was acquired by VMware for $1.26 billion in 2012. While at VMware, Martin was a fellow, and served as senior vice president and general manager of the Networking and Security Business Unit, which he scaled to a $600 million run-rate business by the time he left VMware in 2016.Martin started his career at Lawrence Livermore National Laboratory where he worked on large-scale simulations for the Department of Defense before moving over to work with the intelligence community on networking and cybersecurity. These experiences inspired his work at Stanford where he created the software-defined networking (SDN) movement, leading to a new paradigm of network virtualization. While at Stanford he also cofounded Illuminics Systems, an IP analytics company, which was acquired by Quova Inc. in 2006.For his work, Martin was awarded both the ACM Grace Murray Hopper award and the NEC C&C award, and he's an inductee of the Lawrence Livermore Lab's Entrepreneur's Hall of Fame. He holds both a PhD and Masters degree in Computer Science from Stanford University.Martin serves on the board of ActionIQ, Ambient.ai, Astranis, dbt Labs, Fivetran, Imply, Isovalent, Kong, Material Security, Netlify, Orbit, Pindrop Security, Preset, RapidAPI, Rasa, Tackle, Tecton, and Yubico.Links: Yet Another Infra Group Discord Server: https://discord.gg/f3xnJzwbeQ “The Cost of Cloud, a Trillion Dollar Paradox” - https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap-cloud-lifecycle-scale-growth-repatriation-optimization/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. 
I'm joined today by someone who has taken a slightly different approach to being—well, we'll call it cloud skepticism here. Martin Casado is a general partner at Andreessen Horowitz and has been on my radar starting a while back, based upon a piece that he wrote focusing on the costs of cloud and how repatriation is going to grow. You wrote that in conjunction with your colleague, Sarah Wang. Martin, thank you so much for joining me. What got you onto that path?Martin: So, I want to be very clear, just to start with is, I think cloud is the biggest innovation that we've seen in infrastructure, probably ever. It's a core part of the industry. I think it's very important, I think every company's going to be using cloud, so I'm very pro-cloud. I just think the nature of how you use clouds is shifting. And that was the focus.Corey: When you first put out your article in conjunction with your colleague as well, like, I saw it and I have to say that this was the first time I'd really come across any of your work previously. And I have my own biases that I started from, so my opening position on reading it was this is just some jerk who's trying to say something controversial and edgy to get attention. That's my frickin job. Excuse me, sir. And who is this clown?So, I started digging, and what I found really changed my perspective because as mentioned at the start of the show, you are a general partner at Andreessen Horowitz, which means you are a VC. You are definitionally almost the archetype of a VC in that sense. And to me, being a venture capitalist means the most interesting thing about you is that you write a large check consisting of someone else's money. And that's never been particularly interesting.Martin: [laugh].Corey: You kind of cut against that grain and that narrative. You have a master's and a PhD in computer science from Stanford; you started your career at one of the national labs—Laurence Livermore, if memory serves—you wound up starting a business, Nicira, if I'm pronouncing that correctly—Martin: Yeah, yeah, yeah.Corey: That you then sold to VMware in 2012, back at a time when that was a noble outcome, rather than a state of failure because VMware is not exactly what it once was. You ran a $600 million a year business while you were there. Basically, the list of boards that you're on is lengthy enough and notable enough that it sounds almost like you're professionally bored, so I don't—Martin: [laugh].Corey: So, looking at this, it's okay, this is someone who actually knows what he is talking about, not just, “Well, I talked to three people in pitch meetings and I now think I know what is going on in this broader industry.” You pay attention, and you're connected, disturbingly well, to what's going on, to the point where if you see something, it is almost certainly rooted in something that is happening. And it's a big enough market that I don't think any one person can keep their finger on the pulse of everything. So, that's when I started really digging into it, paying attention, and more or less took a lot of what you wrote as there are some theses in here that I want to prove or disprove. And I spent a fair bit of time basically threatening, swindling, and bribing people with infinite cups of coffee in order to start figuring out what is going on.And I am begrudgingly left with no better conclusion than you have a series of points in here that are very challenging to disprove. 
So, where do you stand today, now that, I guess, the whole rise and fall of the hype around your article on cloud repatriation—which yes, yes, we'll put a link to it in the show notes if people want to go there—but you've talked about this in a lot of different contexts. Having had the conversations that you've had, and I'm sure some very salty arguments with people who have a certain vested interest in you being wrong, do you wind up continuing to stand by the baseline positions that you've laid out, or have they evolved into something more nuanced?Martin: So yeah, I definitely want to point out, so this was work done with Sarah Wang was also at Andreessen Horowitz; she's also a GP. She actually did the majority of the analysis and she's way smarter than I am. [laugh]. And so, I'm just very—feel very lucky to work with her on this. And I want to make sure she gets due credit on this.So, let's talk about the furor. So like, I actually thought that this was kind of interesting and it started a good discussion, but instead, like, [laugh] the amount of, like, response pieces and, like, angry emails I got, and [laugh] like, I mean it just—and I kind of thought to myself, like, “Why are people so upset?” I think there's three reasons. I'm going to go through them very quickly because they're interesting.So, the first one is, like, you're right, like, I'm a VC. I think people see a VC and they're like, oh, lack of credibility, lack of accountability, [laugh], you know, doesn't know what they're doing, broad pattern matcher. And, like, I will say, like, I did not necessarily write this as a VC; I wrote this as somebody that's, like, listen, my PhD is an infrastructure; my company was an infrastructure. It's all data center stuff. I had a $600 million a year data center business that sold infrastructure into data centers. I've worked with all of the above. Like, I've worked with Amazon, I've—Corey: So, you sold three Cisco switches?Martin: [laugh]. That's right.Corey: I remember those days. Those were awesome, but not inexpensive.Martin: [laugh]. That's right. Yeah, so like, you know, I had 15 years. It's kind of a culmination of that experience. So, that was one; I just think that people see VC and they have a reaction.The second one is, I think people still have the first cloud wars fresh in their memories and so they just don't know how to think outside of that. So, a lot of the rebuttals were first cloud war rebuttals. Like, “Well, but internal IT is slow and you can't have the expertise.” But like, they just don't apply to the new world, right? Like, listen, if you're Cloudflare, to say that you can't run, like, a large operation is just silly. If you went to Cloudflare and you're like, “Listen, you can't run your own infrastructure,” like, they'd take out your sucker and pat you on the head. [laugh].Corey: And not for nothing, if you try to run what they're doing on other cloud providers from a pure bandwidth perspective, you don't have a company anymore, regardless of how well funded you are. It's a never-full money pit that just sucks all of the money. And I've talked to a number of very early idea stage companies that aren't really founded yet about trying to do things like CDN-style work or streaming video, and a lot of those questions start off with well, we did some back-of-the-envelope math around AWS data transfer pricing, and if our numbers are right, when we scale, we'll be spending $65,000 on data transfer every minute. 
What did we get wrong?And it's like, “Oh, yeah, you realize that one thing is per hour not per minute, so slight difference there. But no, you're basically correct. Don't do it.” And yeah, no one pays retail price at that volume, but they're not going to give you a 99.999% discount on these things, so come up with a better plan. Cloudflare's business will not work on AWS, full stop.Martin: Yep, yep. So, I legitimately know, basically, household name public companies that are software companies that anybody listening to this knows the name of these companies, who have product lines who have 0% margins because they're [laugh] basically, like, for every dollar they make, they pay a dollar to Amazon. Like, this is a very real thing, right? And if you go to these companies, these are software infrastructure companies; they've got very talented teams, they know how to build, like, infrastructure. To tell them that like, “Well, you know, you can't build your own infrastructure,” or something is, I mean, it's like telling, like, an expert in the business, they can't do what they do; this is what they do. So, I just think that part of the furor, part of the uproar, was like, I just think people were stuck in this cloud war 1.0 mindset.I think the third thing is, listen, we've got an oligopoly, and they employ a bunch of people, and they've convinced a bunch of people they're right, and it's always hard to change that. And I also think there's just a knee-jerk reaction to these big macro shifts. And it was the same thing we did to software-defined networking. You know, like, my grad school work was trying to change networking to go from hardware to software. I remember giving a talk at Cisco, and I was, like, this kind of like a naive grad student, and they literally yelled at me out of the room. They're like, it'll never work.Corey: They tried to burn you as a witch, as I recall.Martin: [laugh]. And so, your specific question is, like, have our views evolved? But the first one is, I think that this macro downturn really kind of makes the problem more acute. And so, I think the problem is very, very real. And so, I think the question is, “Okay, so what happens?”So, let's say if you're building a new software company, and you have a choice of using, like, one of the Big Three public clouds, but it impacts your margins so much that it depresses your share price, what do you do? And I think that we thought a lot more about what the answers there are. And the ones that I think that we're seeing is, some actually are; companies are building their own infrastructure. Like, very famously MosaicML is building their own infrastructure. Fly.io, -building their own infrastructure.Mighty—you know, Suhail's company—building his own infrastructure. Cloudflare has their own infrastructure. So, I think if you're an infrastructure provider, a very reasonable thing to do is to build your own infrastructure. If you're not a core infrastructure provider, you're not; you can still use somebody's infrastructure that's built at a better cost point.So, for example, if I'm looking at a CDN tier, I'm going to use Fly.io, right? I mean, it's like, it's way cheaper, the multi-region is way better, and so, like, I do think that we're seeing, like, almost verticalized clouds getting built out that address this price point and, like, these new use cases. And I think this is going to start happening more and more now. 
And we're going to see basically almost the delamination of the cloud into these verticalized clouds.Corey: I think there's also a question of scale, where if you're starting out in the evening tonight, to—I want to build, I don't know Excel as a service or something. Great. You're pretty silly if you're not going to start off with a cloud provider, just because you can get instant access to resources, and if your product catches on, you scale out without having to ever go back and build it as quote-unquote “Enterprise grade,” as opposed to having building it on cheap servers or Raspberry Pis or something floating around. By the time that costs hit a certain point—and what that point is going to depend on your stage of company and lifecycle—you're remiss if you don't at least do an analysis on is this the path we want to continue on for the service that we're offering?And to be clear, the answer to this is almost entirely going to be bounded by the context of your business. I don't believe that companies as a general rule, make ill-reasoned decisions. I think that when we see a decision a company makes, by and large, there's context or constraints that we don't see that inform that. I know, it's fun to dunk on some of the large companies' seemingly inscrutable decisions, but I will say, having had the privilege to talk to an awful lot of execs in an awful lot of places—particularly on this show—I don't find myself encountering a whole lot of people in those roles who I come away with thinking that they're a few fries short of a Happy Meal. They generally are very well reasoned in why they do what they do. It's just a question of where we think the future is going on some level.Martin: Yep. So, I think that's absolutely right. So, to be a little bit more clear on what I think is happening with the cloud, which is I think every company that gets created in tech is going to use the cloud for something, right? They'll use it for development, the website, test, et cetera. And many will have everything in the cloud, right?So, the cloud is here to stay, it's going to continue to grow, it's a very important piece of the ecosystem, it's very important piece of IT. I'm very, very pro cloud; there's a lot of value. But the one area that's under pressure is if your product is SaaS if your product is selling Software as a Service, so then your product is basically infrastructure, now you've got a product cost model that includes the infrastructure itself, right? And if you reduce that, that's going to increase your margin. And so, every company that's doing that should ask the question, like, A, is the Big Three the right one for me?Maybe a verticalized cloud—like for example, whatever Fly or Mosaic or whatever is better because the cost is better. And I know how to, you know, write software and run these things, so I'll use that. They'll make that decision or maybe they'll build their own infrastructure. And I think we're going to see that decision happening more and more, exactly because now software is being offered as a service and they can do that. And I just want to make the point, just because I think it's so important, that the clouds did exactly this to the hardware providers. So, I just want to tell a quick story, just because for me, it's just so interesting. So—Corey: No, please, I was only really paying attention to this market from 2016 or so. There was a lot of the early days that I was using as a customer, but I wasn't paying attention to the overall industry trends. Please, storytime. 
This is how I learned things. I hang out with smart people and I come away a little bit smarter than when I started.Martin: [laugh]. This is, like, literally my fa—this is why this is one of my favorite topics, is what I'm about to tell you, which is, so the clouds have always had this argument, right? The big clouds, three clouds, they're like, “Listen, why would you build your own cloud? Because, like, you don't have the expertise, and it's hard and you don't have economies of scale.” Right? And the answer is you wouldn't, unless it impacts your share price, right? If it impacts your share price, then of course you would, because it makes economic sense. So, the clouds had that exact same dilemma in 2005, right? So, in 2005, Google and Amazon and Microsoft, they looked at their COGS, and they were like, “Okay, I'm offering a cloud. If I look at the COGS, who am I paying?” And it turns out, there was a bunch of hardware providers that had 30% margins or 70% margins. They're like, “Why am I paying Cisco these big margins? Why am I paying Dell these big margins?” Right? So, they had the exact same dilemma. And all of the arguments that they use now applied then, right? So, the exact same arguments, for example, “AWS, you know nothing about hardware. Why would you build hardware? You don't have the expertise. These guys sell to everybody in the world, you don't have the economies of scale.” So, all of the same arguments applied to them. And yet, because it was part of COGS and it impacted the share price, they could make the economic argument to actually build hardware teams and build chips. And so, they verticalized, right? And so, it just turns out if the infrastructure becomes part of COGS, it makes sense to optimize that infrastructure. And I would say, the Big Three's foray into OEMs and hardware is a much, much, much bigger leap than an infrastructure company foraying into building their own infrastructure.
So, if you're building a SaaS app, [laugh] you would be crazy not to use the cloud, you just be absolutely insane, right? Like, what do you know about core infrastructure? You know, what do you know about building a back-end? Like, what do you know about operating these things? Go focus on your SaaS app.Corey: The calluses I used to have from crimping my own Ethernet patch cables in data centers have faded by now. I don't want them to come back. Yeah, we used to know how to do these things. Now, most people in most companies do not have that baseline of experience, for excellent reasons. And I wouldn't wish that on the current generation of engineers, except for the ones I dislike.Martin: However, that is if you're building an application. Almost all of my investments are people that are building infrastructure. [laugh]. They're already doing these hardcore backend things; that's what they do: they sell infrastructure. Would you think, like, someone, like, at Databricks doesn't understand how to run infr—of course it does. I mean, like, or Snowflake or whatever, right?And so, this is a gradient. On the extreme app end, you shouldn't be thinking about infrastructure; just use the cloud. Somewhere in the middle, maybe you start on the cloud, maybe you don't. As you get closer to being a cloud service, of course you're going to build your own infrastructure.Like, for example—listen, I mean, I've been mentioning Fly; I just think it's a great example. I mean, Fly is a next-generation CDN, that you can run compute on, where they build their own infrastructure—it's a great developer experience—and they would just be silly. Like, they couldn't even make the cost model work if they did it on the cloud. So clearly, there's a gradient here, and I just think that you would be remiss and probably negligent if you're selling software not to have this conversation, or at least do the analysis.Corey: This episode is sponsored in part by our friend EnterpriseDB. EnterpriseDB has been powering enterprise applications with PostgreSQL for 15 years. And now EnterpriseDB has you covered wherever you deploy PostgreSQL on-premises, private cloud, and they just announced a fully-managed service on AWS and Azure called BigAnimal, all one word. Don't leave managing your database to your cloud vendor because they're too busy launching another half-dozen managed databases to focus on any one of them that they didn't build themselves. Instead, work with the experts over at EnterpriseDB. They can save you time and money, they can even help you migrate legacy applications—including Oracle—to the cloud. To learn more, try BigAnimal for free. Go to biganimal.com/snark, and tell them Corey sent you.Corey: I think there's also a philosophical shift, where a lot of the customers that I talk to about their AWS bills want to believe something that is often not true. And what they want to believe is that their AWS bill is a function of how many customers they have.Martin: Oh yeah.Corey: In practice, it is much more closely correlated with how many engineers they've hired. And it sounds like a joke, except that it's not. The challenge that you have when you choose to build in a data center is that you have bounds around your growth because there are capacity concerns. You are going to run out of power, cooling, and space to wind up having additional servers installed. 
In cloud, you have an unbounded growth problem. S3 is infinite storage, and the reason I'm comfortable saying that is that they can add hard drives faster than you can fill them. For all effective purposes, it is infinite amounts of storage. There is no forcing function that forces you to get rid of things. You spin up an instance, and the natural state of it in a data center, as a virtual machine or a virtual instance, is that it's going to stop working in two to three years, left unmaintained, when a raccoon hauls it off into the woods to make a nest or whatever the hell raccoons do. In cloud, you will retire before that instance does, as it gets migrated to different underlying hosts, continuing to cost you however many cents per hour every hour until the earth crashes into the sun, or Amazon goes bankrupt. That is the trade-off you're making. There is no forcing function. And it's only money, which is a weird thing to say, but the failure mode of turning something off mistakenly that takes things down, well, that's disastrous to your brand and your company. Just leaving it up, well, it's only money. It's never a top-of-mind priority, so it continues to build and continues to build and continues to build until you're really forced to reckon with a much larger problem. It is a form of technical debt, where you've kicked the can down the road until you can no longer kick that can. Then your options are either go ahead and fix it, or go back and talk to you folks, and it's time for more money.Martin: Yeah. Or talk to you. [laugh].Corey: There is that.Martin: No seriously, I think everybody should, honestly. I think this is a board-level concern for every compa—I sit on a lot of boards; I see this. And this has organically become a board-level concern. I think it should become a conscious board-level concern, because cloud costs impact COGS. Any software company has it; it always becomes an issue, and so it should be treated as a first-class problem. And if you're not thinking through your options—and I think by the way, your company is a great option—but if you're not thinking through the options, then you're almost fiduciarily negligent. I think the vast, vast majority of people and vast majority of companies are going to stay on the cloud and just do some basic cost controls and some just basic hygiene and they're fine and, like, this doesn't touch them. But there are a set of companies, particularly those that sell infrastructure, where they may have to get more aggressive. And that ecosystem is now very vibrant, and there's a lot of shifts in it, and I think it's the most exciting place [laugh] in all of IT, like, personally in the industry.
I don't know where the industry is going at all, but I continue to think of infrastructure companies as being increasingly broad.Martin: Yeah, yeah, yeah. This is my favorite question. [laugh]. I'm so glad you asked. [laugh].Corey: This was not planned, to be clear.Martin: No, no, no. Listen, I am such an infrastructure maximalist. And I've changed my opinion on this so much in the last three years. So, it used to be the case—and infrastructure has a long history of, like, calling the end of infrastructure. Like, every decade has been the end of infrastructure. It's like, you build the primitives and then everything else becomes an app problem, you know? Like, you build a cloud, and then we're done, you know? You build the PC and then we're done. There are even very famous talks where people talk about the end of systems, when we've built everything, right? And I've totally changed my view. So, here's my current view. My current view is, infrastructure is the only, really, differentiation in systems, in all IT, in all software. It's just infrastructure. And the app layer is very important for the business, but the app layer always sits on infrastructure. And the differentiation in apps is provided by the infrastructure. And so, the start of value is basically infrastructure. And the design space is so huge, so huge, right? I mean, we've moved from, like, PCs to cloud to data. Now, the cloud is decoupling and moving to the CDN tier. I mean, like, the front-end developers are building stuff in the browser. Like, there's just so much stuff to do that I think the value is always going to accrue to infrastructure. So, in my view, anybody that's improving the app accuracy or performance or correctness with technology is an infrastructure company, right? And the more of that you do, [laugh] the more infrastructure you are. And I think, you know, in 30 years, you and I are going to be old, and we're going to go back on this podcast. We're going to talk, and there's going to be a whole bunch of infrastructure companies that are being created that have accrued a lot of value. I'm going to say one more thing, which is so—okay, this is a sneak preview for the people listening to this that nobody else has heard before. So, Sarah and I are back at it again—the brilliant Sarah, who did the first piece—and we're doing another study. And the study is, if you look at public companies and you look at ones that are app companies versus infrastructure companies, where does the value accrue? And there are way, way more app companies; there's a ton of app companies, but it turns out that infrastructure companies have higher multiples and accrue more value. And that's actually a counter-narrative, because people think that the business is the apps, but it just turns out that's where the differentiation is. So, I'm just an infra maximalist. I think you could be an infra person your entire career and it's the place to be. [laugh].
How do they wind up identifying their bottlenecks? How do they track and assign portions of their COGS to different aspects of their service? How do they trace the flow of capital for their organization as they're serving their customers?And understanding the bill and knowing what to optimize and what not to becomes increasingly strategic business concern.Martin: Yeah.Corey: That's the fun part. That's the stuff I don't see that software has a good way of answering, just because there's no way to use an API to gain that kind of business context. When I started this place, I thought I was going to be building software. It turns out, there's so many conversations that have to happen as a part of this that cannot be replicated by software. I mean, honestly, my biggest competitor for all this stuff is Microsoft Excel because people want to try and do it themselves internally. And sometimes they do a great job, sometimes they don't, but it's understanding their drivers behind their cost. And I think that is what was often getting lost because the cloud obscures an awful lot of that.Martin: Yeah. I think even just summarize this whole thing pretty quickly, which is, like, I do think that organically, like, cloud cost has become a board-level issue. And I think that the shift that founders and execs should make is to just, like, treat it like a first-class problem upfront. So, what does that mean? Minimally, it means understanding how these things break down—A, to your point—B, there's a number of tools that actually help with onboarding of this stuff. Like, Vantage is one that I'm a fan of; it just provides some visibility.And then the third one is if you're selling Software as a Service, that's your core product or software, and particularly it's a infrastructure, if you don't actually do the analysis on, like, how this impacts your share price for different cloud costs, if you don't do that analysis, I would say your fiduciarily negligent, just because the impact would be so high, especially in this market. And so, I think, listen, these three things are pretty straightforward and I think anybody listening to this should consider them if you're running a company, or you're an executive company.Corey: Let's be clear, this is also the kind of problem that when you're sitting there trying to come up with an idea for a business that you can put on slide decks and then present to people like you, these sounds like the paradise of problems to have. Like, “Wow, we're successful and our business is so complex and scaled out that we don't know where exactly a lot of these cost drivers are coming from.” It's, “Yeah, that sounds amazing.” Like, I remember those early days, back when all I was able to do and spend time on and energy on was just down to the idea of, ohh, I'm getting business cards. That's awesome. That means I've made it as a business person.Spoiler: it did not. Having an aggressive Twitter presence, that's what made me as a business person. But then there's this next step and this next step and this next step and this next step, and eventually, you look around and realize just how overwrought everything you've built is and how untangling it just becomes a bit of a challenge and a hell of a mess. Now, the good part is at that point of success, you can bring people in, like, a CFO and a finance team who can do some deep-level analysis to help identify what COGS is—or in some cases, have some founders, explain what COGS is to you—and understand those structures and how you think about that. 
But it always feels like it's a trailing problem, not an early problem that people focus on.Martin: I'll tell you the reason. The reason is because this is a very new phenomenon that it's part of COGS. It's literally five years new. And so, we're just catching up. Even now, this discussion isn't what it was when we first wrote the post.Like, now people are pretty educated on, like, “Oh yeah, like, this is really an issue. Oh, yeah. It contributes to COGS. Oh, yeah. Like, our stock price gets hit.” Like, it's so funny to watch, like, the industry mature in real-time. And I think, like, going forward, it's just going to be obvious that this is a board-level issue; it's going to be obvious this is, like, a first-class consideration. But I agree with you. It's like, listen, like, the industry wasn't ready for it because we didn't have public companies. A lot of public companies, like, this is a real issue. I mean really we're talking about the last five, seven years.Corey: It really is neat, just in real time watching how you come up with something that sounds borderline heretical, and in a relatively short period of time, becomes accepted as a large-scale problem, and now it's now it is fallen off of the hype train into, “Yeah, this is something to be aware of.” And people's attention spans have already jumped to the next level and next generation of problem. It feels like this used to take way longer for these cycles, and now everything is so rapid that I almost worry that between the time we're recording this and the time that it publishes in a few weeks, what is going to have happened that makes this conversation irrelevant? I didn't used to have to think like that. Now, I do.Martin: Yeah, yeah, yeah, for sure. Well, just a couple of things. I want to talk about, like, one of the reasons that accelerated this, and then when I think is going forward. So, one of the reasons this was accelerated was just the macro downturn. Like, when we wrote the post, you could make the argument that nobody cares about margins because it's all about growth, right?And so, like—and even then, it still saved a bunch of money, but like, a lot of people were like, “Listen, the only thing that matters is growth.” Now, that's absolutely not the case if you look at public market valuations. I mean, people really care about free cash flow, they really care about profitability, and they really care about margins. And so, it's just really forced the issue. And it also, like, you know, made kind of what we were saying very, very clear.I would say, you know, as far as shifts that are going, I think one of the biggest shifts is for every back-end developer, there's, like, a hundred front-end developers. It's just crazy. And those front-end developers—Corey: A third of a DevOps engineer.Martin: [laugh]. True. I think those front-end developers are getting, like, better tools to build complete apps, right? Like, totally complete apps, right? Like they've got great JavaScript frameworks that coming out all the time.And so, you could argue that actually a secular technology change—which is that developers are now rebuilding apps as kind of front-end applications—is going to pull compute away from the clouds anyways, right? Like if instead of, like, the app being some back-end thing running in AWS, but instead is a front-end thing, you know, running in a browser at the CDN tier, while you're still using the Big Three clouds, it's being used in a very different way. And we may have to think about it again differently. 
Now, this, again, is a five-year going forward problem, but I do feel like there are big shifts that are even changing the way that we currently think about cloud now. And we'll see.Corey: And if those providers don't keep up and start matching those paradigms, there's going to be an intermediary shim layer of companies that wind up converting their resources and infrastructure into things that suit this new dynamic, and effectively, they're going to become the next version of, I don't know, Level 3, one of those big underlying infrastructure companies that most people have never heard of or have to think about because they're not doing anything that's perceived as interesting.Martin: Yeah, I agree. And I honestly think this is why Cloudflare and Cloudflare work is very interesting. This is why Fly is very interesting. It's a set of companies that are, like, “Hey, listen, like, workloads are moving to the front-end and, you know, you need compute closer to the user and multi-region is really important, et cetera.” So, even as we speak, we're seeing kind of shifts to the way the cloud is moving, which is just exciting. This is why it's, like, listen, infrastructure is everything. And, like, you and I like if we live to be 200, we can do [laugh] a great infrastructure work every year.Corey: I'm terrified, on some level, that I'll still be doing the exact same type of thing in 20 years.Martin: [laugh].Corey: I like solving different problems as we go. I really want to thank you for spending so much time talking to me today. If people want to learn more about what you're up to, slash beg you for other people's money or whatnot, where's the best place for them to find you?Martin: You know, we've got this amazing infrastructure Discord channel. [laugh].Corey: Really? I did not know that.Martin: I love it. It's, like, the best. Yeah, my favorite thing to do is drink coffee and talk about infrastructure. And like, I posted this on Twitter and we've got, like, 600 people. And it's just the best thing. So, that's honestly the best way to have these discussions. Maybe can you put, like, the link in, like, the show notes?Corey: Oh, absolutely. It is already there in the show notes. Check the show notes. Feel free to join the infrastructure Discord. I will be there waiting for you.Martin: Yeah, yeah, yeah. That'll be fantastic.Corey: Thank you so much for being so generous with your time. I appreciate it.Martin: This was great. Likewise, Corey. You're always a class act and I really appreciate that about you.Corey: I do my best. Martin Casado, general partner at Andreessen Horowitz. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment telling me that I got it completely wrong and what check you wrote makes you the most interesting.Announcer: The content here is for informational purposes only and should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. 
The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
We wrap up our Future of Compute series with a leading force in the field, Naveen Rao. Rao founded Nervana Systems, the first next-gen AI chip company, which he sold to Intel. He then drove Intel's AI roadmap before stepping down from the company in 2020, and just recently announced the founding of MosaicML, an AI startup focused on making algorithms more efficient through what he calls here a 'benchmarking as a service' approach. Given his interest in AI stretching back over two decades and his front-seat position in the field, Rao's perspective on the competitive landscape, on how things have changed from Nervana to Mosaic, and on the challenges facing merchant silicon firms is both valuable and a nice wrap-up of the three-part series. He gives his take on the Nvidia/ARM deal, Intel's position, the supply chain, and a lot more. Check out MosaicML, as well as their Twitter account and Naveen's. Topics Covered: 2:30 minute mark – Naveen's entry into the AI world over his career 6:00 – What did people have to learn about neural networks? 8:00 – The goal of Mosaic 14:00 – View on the current landscape 17:30 – The model Mosaic is targeting 20:30 – The significance of Nvidia's A100 and shift to AI-dedicated GPUs – the field in 2016 26:00 – The field in 2018 32:30 – How to look at the AI market today 38:30 – The challenges facing legacy merchant silicon makers 45:30 – Can the industry continue to develop with such a fragmented environment 51:30 – Intel's reaction to the current climate 55:30 – Where are the IPOs? 1:04:00 – Tesla's D1 Chip and AI ambitions 1:12:30 – The Nvidia/Arm deal 1:15:30 – Supply chain challenges
#mlnews #turingnlg #convmixer Your latest updates on what's happening in the Machine Learning world.

OUTLINE:
0:00 - Intro
0:16 - Weights & Biases raises at 1B valuation (sponsored)
2:30 - Microsoft trains 530 billion parameter model
5:15 - StyleGAN v3 released
6:45 - A few more examples may be worth billions of parameters
8:30 - ConvMixer fits into a tweet
9:45 - Improved VQGAN
11:25 - William Shatner AI chats about his life
12:35 - Google AI pushes material science
14:10 - Gretel AI raises 50M for privacy protection
16:05 - DeepMind's push into ML for biology
19:00 - Schmidhuber lauds Kunihiko Fukushima for Bower Award
21:30 - Helpful Things
22:25 - MosaicML out of stealth mode
23:55 - First German self-driving train
24:45 - Ex-Pentagon Chief: China has already won
26:25 - DeepMind becomes profitable

Sponsor: Weights & Biases https://wandb.com

References:
Microsoft Trains 530B Parameter Model https://www.microsoft.com/en-us/resea...
StyleGAN 3 Code Released https://nvlabs.github.io/stylegan3/ https://github.com/NVlabs/stylegan3 https://colab.research.google.com/git...
When do labels help? https://arxiv.org/pdf/2110.04374.pdf
ml_paper.bruh https://openreview.net/pdf?id=TVHS5Y4...
Improved VQGAN https://openreview.net/pdf?id=pfNyExj7z2
William Shatner "AI" & Storyfile https://www.livescience.com/william-s... https://www.storyfile.com/
GoogleAI Finds Complex Metal Oxides https://ai.googleblog.com/2021/10/fin...
GretelAI raises 50M Series B https://techcrunch.com/2021/10/07/gre... https://gretel.ai/ https://gretel.ai/blog/why-privacy-by...
DeepMind's Push in ML for Bio https://www.biorxiv.org/content/10.11... https://deepmind.com/blog/article/enf...
Kunihiko Fukushima wins Bower Award: Schmidhuber Congratulates https://www.fi.edu/laureates/kunihiko... https://www.youtube.com/watch?v=ysOw6...
Helpful Things https://github.com/UKPLab/beir#beers-... https://arxiv.org/pdf/2104.08663.pdf https://bayesoptbook.com/ https://github.com/nvlabs/imaginaire/ https://github.com/NVlabs/imaginaire/...
MosaicML out of Stealth Mode https://www.mosaicml.com/ https://www.mosaicml.com/blog/founder... https://app.mosaicml.com/library/imag... https://github.com/mosaicml/composer https://mosaicml-composer.readthedocs...
Germany's first self-driving train https://techxplore.com/news/2021-10-g...
Ex-Pentagon Chief: China has already won tech war https://nypost.com/2021/10/11/pentago...
DeepMind becomes profitable https://bdtechtalks.com/2021/10/07/go...

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick...