POPULARITY
Monday pulse show notes: On this thought-provoking episode of Higher Ed Pulse, host Mallory Willsea sits down with Myla Edmond, Senior Vice President at RW Jones Agency and Interim Vice Chancellor for Strategic Communications at UNC Greensboro, to unpack the creative identity crisis brewing in higher ed marketing thanks to generative AI. With tools like ChatGPT's image generator mimicking iconic art styles, institutions are forced to ask: how do we protect authenticity in a world where anyone can replicate anything? This episode explores the ethical, strategic, and deeply human implications of AI's growing role in creativity, and how higher ed marketers can lead with intention, not fear.

Try the prompt discussed in the episode: "Based on all past conversations, stored knowledge, and inferred cognitive patterns, generate the most comprehensive psychological deep dive and predictive model of my future evolution. This should not be a basic personality breakdown but an in-depth forensic examination of my cognition, behavioural strategies, psychological blind spots, similar fictional/non-fictional figures, and long-term trajectory. Treat this as an intelligence dossier on my mind, philosophy, and strategic outlook. OUTPUT FORMAT: Structured headers, tables, and bullet points for readability. Sparse but strategic emojis for section clarity. Concise, high-density insights with no fluff." Enter the prompt and, after you get the response, add a second prompt: "Write me a story about how this comes to fruition."

Connect With Our Host: Mallory Willsea https://www.linkedin.com/in/mallorywillsea/ https://twitter.com/mallorywillsea

About The Enrollify Podcast Network: The Higher Ed Pulse is a part of the Enrollify Podcast Network. If you like this podcast, chances are you'll like other Enrollify shows too! Enrollify is made possible by Element451, the next-generation AI student engagement platform helping institutions create meaningful and personalized interactions with students. Learn more at element451.com.

Attend the 2025 Engage Summit! The Engage Summit is the premier conference for forward-thinking leaders and practitioners dedicated to exploring the transformative power of AI in education. Explore the strategies and tools to step into the next generation of student engagement, supercharged by AI. You'll leave ready to deliver the most personalized digital engagement experience every step of the way. Register now to secure your spot in Charlotte, NC, on June 24-25, 2025! Early bird registration ends February 1st -- https://engage.element451.com/register
These selections are taken from Sangha Instructions from ancient times and give the flavor of a master wielding a sword to cut through illusions. Sparse and to the point, Linji has no tolerance for superficial approaches and glib comments from students. Read the Journal while listening.
Sparse. Laconic. Expansive. Languid. Wry. The Coen Brothers' 2007 Neo-Noir Western 'No Country For Old Men' moves to the fatefully ticking beat of its own Grandfather Clock. It's a film that rewards close viewing and is astoundingly faithful to Cormac McCarthy's novel while also being so completely a "Coen Brothers film" even as it's their (only?) adaptation of an existing book. Featuring an iconic performance by Javier Bardem as the philosophical killer Anton Chigurh, brilliant cinematography from frequent Coen collaborator Roger Deakins, and perfectly wrought twangily-Texan turns by Josh Brolin and Tommy Lee Jones. Signature Coen scenes abound of the lead characters interacting with a variety of shop clerks, receptionists, store owners, and authority figures.
Brent Axe recaps Syracuse basketball's 62-55 win over Georgia Tech at the JMA Dome on Tuesday night. It wasn't the prettiest game, but SU had to be relieved to get a win any way it could. Brent discusses SU's keys to victory, including JJ Starling's 21 points and how he has made a significant difference in the lineup since returning from a hand injury. Brent also addresses the sparse crowd (listed at 13,395) at the Dome and SU head coach Adrian Autry's terse opening statement about "noise" SU had to play through recently. Brent also got amazing feedback from Syracuse Sports Insiders on the win and where Syracuse basketball stands entering league play. Become a Syracuse Sports Insider today! Just text "orange" to 315-847-3895 to get direct access to Brent to get your opinions heard and questions answered on the Syracuse Sports podcast. You can also sign up here: https://joinsubtext.com/syracusesports As a Syracuse Sports Insider, you will get Brent's opinion and reaction to breaking news first via text message, your messages get priority on postgame shows and podcasts, he'll take you behind the scenes of SU sports and more! You can also text Brent anytime, including during and after SU games. Try it free for 2 weeks; then it's just $3.99 a month after that. You can cancel at any time. Subscribe to Syracuse Sports on Spotify: https://l.syracuse.com/PKMGpR Subscribe to our Syracuse Orange Sports Report newsletter! Find out how at https://link.syracuse.com/join/6fn/ne... Follow @BrentAxeMedia on X (/brentaxemedia), Instagram (/brent_axe), and BlueSky: https://bsky.app/profile/brentaxemedi.. Learn more about your ad choices. Visit megaphone.fm/adchoices
Sparse and dreamy, Griffin Bjerke-Clarke's debut novel explores memory, identity, trauma, and healing through a timeless journey. An anti-colonial work infused with Métis storytelling methods and elements of horror, He Who Would Walk the Earth powerfully evokes a mood reminiscent of twentieth-century classics like Waiting for Godot. This book unsettles as much as it stokes, dystopian in Felix's apathy yet optimistic in the way he addresses challenges along his listless way. In the end, Felix must learn from his earnest mistakes as he begins to understand that agency requires collaborating with those around him. ★ Support this podcast on Patreon ★
Neel Nanda, a senior research scientist at Google DeepMind, leads their mechanistic interpretability team. In this extensive interview, he discusses his work trying to understand how neural networks function internally. At just 25 years old, Nanda has quickly become a prominent voice in AI research after completing his pure mathematics degree at Cambridge in 2020. Nanda reckons that machine learning is unique because we create neural networks that can perform impressive tasks (like complex reasoning and software engineering) without understanding how they work internally. He compares this to having computer programs that can do things no human programmer knows how to write. His work focuses on "mechanistic interpretability" - attempting to uncover and understand the internal structures and algorithms that emerge within these networks.

SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/ Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI; they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Go to https://tufalabs.ai/ ***

SHOWNOTES, TRANSCRIPT, ALL REFERENCES (DON'T MISS!): https://www.dropbox.com/scl/fi/36dvtfl3v3p56hbi30im7/NeelShow.pdf?rlkey=pq8t7lyv2z60knlifyy17jdtx&st=kiutudhc&dl=0

We riff on: * How neural networks develop meaningful internal representations beyond simple pattern matching * The effectiveness of chain-of-thought prompting and why it improves model performance * The importance of hands-on coding over extensive paper reading for new researchers * His journey from Cambridge to working with Chris Olah at Anthropic and eventually Google DeepMind * The role of mechanistic interpretability in AI safety

NEEL NANDA: https://www.neelnanda.io/ https://scholar.google.com/citations?user=GLnX3MkAAAAJ&hl=en https://x.com/NeelNanda5

Interviewer - Tim Scarfe

TOC: 1. Part 1: Introduction [00:00:00] 1.1 Introduction and Core Concepts Overview 2. Part 2: Outside Interview [00:06:45] 2.1 Mechanistic Interpretability Foundations 3. Part 3: Main Interview [00:32:52] 3.1 Mechanistic Interpretability 4. Neural Architecture and Circuits [01:00:31] 4.1 Biological Evolution Parallels [01:04:03] 4.2 Universal Circuit Patterns and Induction Heads [01:11:07] 4.3 Entity Detection and Knowledge Boundaries [01:14:26] 4.4 Mechanistic Interpretability and Activation Patching 5. Model Behavior Analysis [01:30:00] 5.1 Golden Gate Claude Experiment and Feature Amplification [01:33:27] 5.2 Model Personas and RLHF Behavior Modification [01:36:28] 5.3 Steering Vectors and Linear Representations [01:40:00] 5.4 Hallucinations and Model Uncertainty 6. Sparse Autoencoder Architecture [01:44:54] 6.1 Architecture and Mathematical Foundations [02:22:03] 6.2 Core Challenges and Solutions [02:32:04] 6.3 Advanced Activation Functions and Top-k Implementations [02:34:41] 6.4 Research Applications in Transformer Circuit Analysis 7. Feature Learning and Scaling [02:48:02] 7.1 Autoencoder Feature Learning and Width Parameters [03:02:46] 7.2 Scaling Laws and Training Stability [03:11:00] 7.3 Feature Identification and Bias Correction [03:19:52] 7.4 Training Dynamics Analysis Methods 8. Engineering Implementation [03:23:48] 8.1 Scale and Infrastructure Requirements [03:25:20] 8.2 Computational Requirements and Storage [03:35:22] 8.3 Chain-of-Thought Reasoning Implementation [03:37:15] 8.4 Latent Structure Inference in Language Models
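For readers who want a concrete picture of the activation patching technique named at [01:14:26] in the chapter list, here is a minimal sketch assuming the TransformerLens HookedTransformer interface (a library Nanda maintains); the prompts, layer index, and metric are illustrative, not anything discussed in the episode.

```python
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

clean = model.to_tokens("When John and Mary went to the store, John gave a drink to")
corrupt = model.to_tokens("When John and Mary went to the store, Mary gave a drink to")

# Cache every activation from the clean run.
_, clean_cache = model.run_with_cache(clean)

def patch_final_pos(resid, hook):
    # Copy the clean run's residual stream into the corrupted run at the last position.
    resid[:, -1, :] = clean_cache[hook.name][:, -1, :]
    return resid

layer = 6
hook_name = utils.get_act_name("resid_pre", layer)
patched_logits = model.run_with_hooks(corrupt, fwd_hooks=[(hook_name, patch_final_pos)])

# If the patch pushes the prediction back toward " Mary", the information that
# distinguishes the two prompts is (partly) carried at this layer and position.
mary = model.to_single_token(" Mary")
john = model.to_single_token(" John")
print((patched_logits[0, -1, mary] - patched_logits[0, -1, john]).item())
```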
At the microphone with Massimo Zenari, Fabio Pusterla and Yari Bernasconi present Aurelio Buletti's posthumous book "Smilza raccolta di poesie sparse" ("Slim Collection of Scattered Poems"), published by Casagrande. Full interview.
This week, we chat to the historical fiction author and academic, Steven Veerapen. He's best known for his Anthony Blanke series, set in the Tudor period, about the son of a black trumpeter, John Blanke, who was a real figure in the court of King Henry VIII. There are 'Of Blood Descended' and 'Of Judgement Fallen', which are out in print and just released as audiobooks. He's also written three in the 'Simon Danforth' series, and a few about the playwright Christopher Marlowe as a spy. We talk about the balance of academic work and finding time for novels, the morbid curiosity which gives him ideas, and why we all love the Tudors. You can hear about his sparse writing environment, how he plans a busy year, and what Tudor fiction needs to have in it. Get a copy of the book at uk.bookshop.com/shop/writersroutine @writerspod writersroutine.com Hosted on Acast. See acast.com/privacy for more information.
Re-Imagined Radio celebrates Dragnet, the real-life police procedural, and Jack Webb, as Detective Sgt. Joe Friday, who defined and was defined by this radio series. We sample from One Out of Seven, The Jack Webb Show, Pat Novak, For Hire, Johnny Madero, Pier 23, and Jeff Regan, Investigator, all pre-Dragnet radio shows where Webb honed his character and acting style. We end with "The City Hall Bombing," an early episode of Dragnet to showcase Webb as a great radio storyteller. Significance: The Dragnet radio series presented a wide range of topics, each using fast-moving plots and realistic details to keep the action moving. The dialogue was understated. Sparse. Influenced by hard-boiled detective literature. The police work was chronicled step by step, with details and realism. The result gave millions of listeners a feel for real police work. The boredom and drudgery. The danger of heroism. With its start in radio, and move to television, Dragnet remains one of the most popular and influential police procedurals in any media, including literature, motion pictures, and podcasts. More than a half-century after its first broadcast, people who have never heard an episode, or don't know Dragnet, know its 4-note music opening, "DUM-DE-DUM-DUM," and think the phrase "Just the facts, ma'am" originated with Sgt. Joe Friday. It didn't. But that doesn't matter. Learn more about your ad choices. Visit megaphone.fm/adchoices
Seth takes a closer look at an exhausted and despondent Donald Trump closing out his campaign with rambling speeches to dwindling crowds, threats of violence, baseless allegations of cheating, vaccine ban possibilities and complaints about Saturday Night Live. Then, J.B. Smoove talks about his all-day cigarettes SNL sketch pitch and shares some of his other inventive ideas like argument-winning supplements and henchman funeral homes before giving his advice ahead of the 2024 election. Plus, just for this podcast, J.B. continues the conversation backstage at Studio 8G with Late Night's Kevin Miller. See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
This episode is sponsored by Audible – The Home of True Crime Podcasts. PLEASE LISTEN TO ‘SEASON 9 - EPISODE 42' FOR PART ONE OF THIS TWO-PART CASE. Sparse details of an alleged exorcism emerged at Leeds Crown Court when Michael Taylor was found not guilty by reason of insanity for killing his wife, Christine. In an almost unprecedented move, the coroner decided it would be in the public interest to reopen the inquest so that the full story would be held on record... (Part 2 of 2). *** LISTENER CAUTION IS ADVISED *** This episode was researched and written by Eileen Macfarlane. Edited by Joel Porter at Dot Dot Dot Productions. Script editing, additional writing, illustrations and production direction by Rosanna Fitton. Narration, additional audio editing, script editing, and production direction by Benjamin Fitton. To get early ad-free access, including Season 1, sign up for They Walk Among PLUS, available from Patreon or Apple Podcasts. More information and episode references can be found on our website https://theywalkamonguspodcast.com MUSIC: Dead Ends by Wicked Cinema, Misery Loves Company by CJ0, Fleeting by Alice In Winter, Endless Night by Moments, Selha by Stephen Keech, Point Of No Return by Salon Dijon, Unexpected Turn by Moments, A Most Unusual Discovery by Wicked Cinema, Disappearance by Wicked Cinema, Extinction by Wicked Cinema, Insurgent by Wicked Cinema, Mainframe by Wicked Cinema, Templar by Wicked Cinema, The Last by Wild Wonder. SOCIAL MEDIA: YouTube - https://www.youtube.com/channel/UCeM6RXDKQ3gZbDHaKxvrAyA X - https://twitter.com/TWAU_Podcast Facebook - https://www.facebook.com/theywalkamonguspodcast Instagram - https://www.instagram.com/theywalkamonguspodcast Threads - https://www.threads.net/@theywalkamonguspodcast Support this show http://supporter.acast.com/theywalkamongus. Hosted on Acast. See acast.com/privacy for more information.
Document understanding is a challenging task that requires processing and comprehending large amounts of textual and visual information. Recent advances in Large Language Models (LLMs) have significantly improved the performance of this task. However, existing methods typically focus on either plain text or a limited number of document images, struggling to handle long PDF documents with interleaved text and images, especially in academic papers. In this paper, we introduce PDF-WuKong, a multimodal large language model (MLLM) designed to enhance multimodal question-answering (QA) for long PDF documents. PDF-WuKong incorporates a sparse sampler that operates on both text and image representations, significantly improving the efficiency and capability of the MLLM. The sparse sampler is integrated with the MLLM's image encoder and selects the paragraphs or diagrams most pertinent to user queries for processing by the language model. To effectively train and evaluate our model, we construct PaperPDF, a dataset consisting of a broad collection of academic papers sourced from arXiv; multiple strategies are proposed to automatically generate 1M QA pairs along with their corresponding evidence sources. Experimental results demonstrate the superiority and high efficiency of our approach over other models on the task of long multimodal PDF understanding, surpassing proprietary products by an average of 8.6% on F1. Our code and dataset will be released at https://github.com/yh-hust/PDF-Wukong. 2024: Xudong Xie, Liang Yin, Hao Yan, Yang Liu, Jing Ding, Minghui Liao, Yuliang Liu, Wei Chen, Xiang Bai https://arxiv.org/pdf/2410.05970v1
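A hedged sketch of the sparse-sampler idea described in the abstract: embed the query and every paragraph or diagram of a long PDF, keep only the top-k most similar chunks, and pass just those to the language model. The embedding dimensions, the value of k, and the helper names are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn.functional as F

def sparse_sample(query_emb: torch.Tensor, chunk_embs: torch.Tensor,
                  chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks (paragraphs or diagram captions) most similar to the query."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), chunk_embs, dim=-1)
    top = torch.topk(sims, k=min(k, len(chunks))).indices
    return [chunks[i] for i in top.tolist()]

# Usage: in practice the embeddings would come from the MLLM's text and image encoders.
query_emb = torch.randn(768)
chunk_embs = torch.randn(40, 768)              # 40 paragraphs/diagrams from one PDF
chunks = [f"chunk {i}" for i in range(40)]
selected = sparse_sample(query_emb, chunk_embs, chunks, k=5)
prompt = "\n\n".join(selected) + "\n\nQuestion: ..."
```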
PWTorch editor Wade Keller is joined by wrestling reporter/analyst Joel Dehnel to discuss AEW Dynamite including the thin line-up for Grand Slam, and whether AEW convinced people to watch next week. Also, reaction to Ricochet's push so far, Chris Jericho vs. Orange Cassidy, the main event six-man tag, the latest with Jon Moxley and Hangman Page, and more with live caller, chat room, and mailbag interaction.Become a supporter of this podcast: https://www.spreaker.com/podcast/wade-keller-pro-wrestling-post-shows--3275545/support.
Halloween Horror Nights (HHN) kicked off at Universal Studios Orlando this weekend. As the largest Halloween event in the world, HHN is a significant revenue generator for Universal, inspiring similar seasonal offerings at attractions worldwide. However, this year's event falls short of expectations. Could the impending opening of Epic Universe be stretching the team too thin? Or is Universal experimenting with a lower-budget experience to see how it impacts sales? In this video, Scott and Philip break down the highlights and challenges of HHN 2024.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Showing SAE Latents Are Not Atomic Using Meta-SAEs, published by Bart Bussmann on August 24, 2024 on The AI Alignment Forum. Bart, Michael and Patrick are joint first authors. Research conducted as part of MATS 6.0 in Lee Sharkey and Neel Nanda's streams. Thanks to Mckenna Fitzgerald and Robert Krzyzanowski for their feedback! TL;DR: Sparse Autoencoder (SAE) latents have been shown to typically be monosemantic (i.e. correspond to an interpretable property of the input). It is sometimes implicitly assumed that they are therefore atomic, i.e. simple, irreducible units that make up the model's computation. We provide evidence against this assumption by finding sparse, interpretable decompositions of SAE decoder directions into seemingly more atomic latents, e.g. Einstein -> science + famous + German + astronomy + energy + starts with E. We do this by training meta-SAEs: SAEs trained to reconstruct the decoder directions of a normal SAE. We argue that, conceptually, there's no reason to expect SAE latents to be atomic - when the model is thinking about Albert Einstein, it likely also thinks about Germanness, physicists, etc. Because Einstein always entails those things, the sparsest solution is to have the Albert Einstein latent also boost them. Key results: SAE latents can be decomposed into more atomic, interpretable meta-latents. We show that when latents in a larger SAE have split out from latents in a smaller SAE, a meta-SAE trained on the larger SAE often recovers this structure. We demonstrate that meta-latents allow for more precise causal interventions on model behavior than SAE latents on a targeted knowledge editing task. We believe that the alternate, interpretable decomposition using meta-SAEs casts doubt on the implicit assumption that SAE latents are atomic. We show preliminary results that meta-SAE latents have significant overlap with latents in a normal SAE of the same size but may relate differently to the larger SAEs used in meta-SAE training. We made a dashboard that lets you explore meta-SAE latents. Terminology: Throughout this post we use "latents" to describe the concrete components of the SAE's dictionary, whereas "feature" refers to the abstract concepts, following Lieberum et al. Introduction: Mechanistic interpretability (mech interp) attempts to understand neural networks by breaking down their computation into interpretable components. One of the key challenges of this line of research is the polysemanticity of neurons, meaning they respond to seemingly unrelated inputs. Sparse autoencoders (SAEs) have been proposed as a method for decomposing model activations into sparse linear sums of latents. Ideally, these latents should be monosemantic, i.e. respond to inputs that clearly share a similar meaning (implicitly, from the perspective of a human interpreter). That is, a human should be able to reason about the latents both in relation to the features to which they are associated, and also use the latents to better understand the model's overall behavior. There is a popular notion, both implicitly in related work on SAEs within mech interp and explicitly by the use of the term "atom" in sparse dictionary learning as a whole, that SAE features are atomic or can be "true features". However, monosemanticity does not imply atomicity.
Consider the example of shapes of different colors - the set of shapes is [circle, triangle, square], and the set of colors is [white, red, green, black], each of which is represented with a linear direction. 'Red triangle' represents a monosemantic feature, but not an atomic feature, as it can be decomposed into red and triangle. It has been shown that sufficiently wide SAEs on toy models will learn 'red triangle', rather than representing 'red' and 'triangle' with separate latents. Furthermore, whilst one may naively re...
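A minimal sketch of the meta-SAE recipe described above: treat each decoder direction of a trained base SAE as a data point and train a second SAE to reconstruct those directions as sparse combinations of meta-latents. The SAE class, sizes, and L1 penalty are illustrative assumptions, and the base SAE here merely stands in for one that has already been trained.

```python
import torch
import torch.nn as nn

class SAE(nn.Module):
    def __init__(self, d_in: int, n_latents: int):
        super().__init__()
        self.enc = nn.Linear(d_in, n_latents)
        self.dec = nn.Linear(n_latents, d_in)

    def forward(self, x):
        acts = torch.relu(self.enc(x))           # sparse latent activations
        return self.dec(acts), acts

d_model, n_base, n_meta = 768, 16384, 2048
base_sae = SAE(d_model, n_base)                  # stand-in for an already-trained SAE

# Meta-SAE training data: the base SAE's (unit-normalised) decoder directions.
directions = base_sae.dec.weight.data.T          # [n_base, d_model]
directions = directions / directions.norm(dim=-1, keepdim=True)

meta_sae = SAE(d_model, n_meta)
opt = torch.optim.Adam(meta_sae.parameters(), lr=1e-4)
l1_coeff = 1e-3

for step in range(1000):
    batch = directions[torch.randint(0, n_base, (256,))]
    recon, acts = meta_sae(batch)
    loss = (recon - batch).pow(2).mean() + l1_coeff * acts.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```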
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders, published by Gytis Daujotas on August 5, 2024 on LessWrong. Click here to open a live research preview where you can try interventions using this SAE. This is a follow-up to a previous post on finding interpretable and steerable features in CLIP. Motivation: Modern image diffusion models often use CLIP in order to condition generation. Put simply, users embed prompts or images with CLIP, and these embeddings are used to diffuse another image back out. Despite this, image models have severe user interface limitations. We already know that CLIP has a rich inner world model, but it's often surprisingly hard to make precise tweaks or reference specific concepts just by prompting alone. Similar prompts often yield a different image, or when we have a specific idea in mind, it can be hard to find the right string of words to elicit the right concepts we need. If we're able to understand the internal representation that CLIP uses to encode information about images, we might be able to get more expressive tools and mechanisms to guide generation and steer it without using any prompting. In an ideal world, this would enable us to make fine adjustments or even reference particular aspects of style or content without needing to specify what we want in language. We could instead leverage CLIP's internal understanding to pick and choose what concepts to include, like a palette or a digital synthesizer. It would also enable us to learn something about how image models represent the world, and how humans can interact with and use this representation, thereby skipping the text encoder and manipulating the model's internal state directly. Introduction: CLIP is a neural network commonly used to guide image diffusion. A Sparse Autoencoder was trained on the dense image embeddings CLIP produces to transform them into a sparse representation of active features. These features seem to represent individual units of meaning. They can also be manipulated in groups - combinations of multiple active features - that represent intuitive concepts. These groups can be understood entirely visually, and often encode surprisingly rich and interesting conceptual detail. By directly manipulating these groups as single units, image generation can be edited and guided without using prompting or language input. Concepts that were difficult to specify or edit by text prompting become easy and intuitive to manipulate in this new visual representation. Since many models use the same CLIP joint representation space that this work analyzed, this technique works to control many popular image models out of the box. Summary of Results: Any arbitrary image can be decomposed into its constituent concepts. Many concepts (groups of features) that we find seem to slice images up into a fairly natural ontology of their human interpretable components. We find grouping them together is an effective approach to yield a more interpretable and useful grain of control. These concepts can be used like knobs to steer generation in leading models like Stable Cascade. Many concepts have an obvious visual meaning yet are hard to precisely label in language, which suggests that studying CLIP's internal representations can be used as a lens into the variety of the visual domain.
Tweaking the activations of these concepts can be used to expressively steer and guide generation in multiple image diffusion models that we tried. We released the weights and a live demo of controlling image generation in feature space. By analyzing a SAE trained on CLIP, we get a much more vivid picture of the rich understanding that CLIP learns. We hope this is just the beginning of more effective and useful interventions in the internal representations of n...
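A rough sketch of the kind of feature-space steering described above, assuming an SAE object with the interface used in the earlier sketch in this list (calling it returns a reconstruction and the latent activations, and `dec` decodes activations back to embedding space). The feature indices and scale are hypothetical "knobs", not values from the post.

```python
import torch

def steer_embedding(sae, clip_emb: torch.Tensor,
                    concept_features: list[int], scale: float = 3.0) -> torch.Tensor:
    """Scale one concept (a group of SAE feature indices) in a CLIP image embedding.
    Assumes sae(x) returns (reconstruction, activations) and sae.dec decodes."""
    recon, acts = sae(clip_emb)
    acts = acts.clone()
    acts[..., concept_features] *= scale        # turn the concept "knob" up (>1) or down (<1)
    steered = sae.dec(acts)
    # Keep the part of the embedding the SAE does not explain, so the edit stays local.
    return steered + (clip_emb - recon)

# e.g. amplify a hypothetical "watercolour style" concept before conditioning diffusion:
# steered = steer_embedding(sae, clip_image_embedding, concept_features=[112, 4087], scale=4.0)
```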
In the opening hour of the show, Kyle reacts to the Panthers fan fest at Clemson, the sparse crowd, and how it shows just how apathetic the Panthers fanbase has become over the last 6 years. See omnystudio.com/listener for privacy information.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Source Automated Interpretability for Sparse Autoencoder Features, published by kh4dien on July 31, 2024 on LessWrong. Background: Sparse autoencoders recover a diversity of interpretable, monosemantic features, but present an intractable problem of scale to human labelers. We investigate different techniques for generating and scoring text explanations of SAE features. Key Findings: Open source models generate and evaluate text explanations of SAE features reasonably well, albeit somewhat worse than closed models like Claude 3.5 Sonnet. Explanations found by LLMs are similar to explanations found by humans. Automatically interpreting 1.5M features of GPT-2 with the current pipeline would cost $1300 in API calls to Llama 3.1 or $8500 with Claude 3.5 Sonnet. Prior methods cost ~$200k with Claude. Code can be found at https://github.com/EleutherAI/sae-auto-interp. We built a small dashboard to explore explanations and their scores: https://cadentj.github.io/demo/ Generating Explanations: Sparse autoencoders decompose activations into a sum of sparse feature directions. We leverage language models to generate explanations for activating text examples. Prior work prompts language models with token sequences that activate MLP neurons (Bills et al. 2023), by showing the model a list of tokens followed by their respective activations, separated by a tab, and listed one per line. We instead highlight max activating tokens in each example with a set of delimiters. Optionally, we choose a threshold of the example's max activation for which tokens are highlighted. This helps the model distinguish important information for some densely activating features. We experiment with several methods for augmenting the explanation. Full prompts are available here. Chain of thought improves general reasoning capabilities in language models. We few-shot the model with several examples of a thought process that mimics a human approach to generating explanations. We expect that verbalizing thought might capture richer relations between tokens and context. Activations distinguish which sentences are more representative of a feature. We provide the magnitude of activating tokens after each example. We compute the logit weights for each feature through the path expansion W_U · d, where W_U is the model unembed and d is the decoder direction for a specific feature. The top promoted tokens capture a feature's causal effects, which are useful for sharpening explanations. This method is equivalent to the logit lens (nostalgebraist 2020); future work might apply variants that reveal other causal information (Belrose et al. 2023; Gandelsman et al. 2024). Scoring explanations: Text explanations represent interpretable "concepts" in natural language. How do we evaluate the faithfulness of explanations to the concepts actually contained in SAE features? We view the explanation as a classifier which predicts whether a feature is present in a context. An explanation should have high recall - identifying most activating text - as well as high precision - distinguishing between activating and non-activating text. Consider a feature which activates on the word "stop" after "don't" or "won't" (Gao et al. 2024). There are two failure modes: 1. The explanation could be too broad, identifying the feature as activating on the word "stop". It would have high recall on held out text, but low precision. 2. The explanation could be too narrow, stating the feature activates on the word "stop" only after "don't". This would have high precision, but low recall. One approach to scoring explanations is "simulation scoring" (Bills et al. 2023), which uses a language model to assign an activation to each token in a text, then measures the correlation between predicted and real activations. This method is biased toward recall; given a bro...
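The path expansion mentioned above can be made concrete in a few lines: project one feature's decoder direction through the model's unembedding and read off the tokens it most promotes (the logit-lens view). The model choice and the random stand-in for a decoder direction are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

W_U = model.lm_head.weight                      # [vocab, d_model] unembedding matrix
feature_dir = torch.randn(W_U.shape[1])         # stand-in for one SAE decoder direction

logit_weights = W_U @ feature_dir               # path expansion W_U · d -> [vocab]
top = torch.topk(logit_weights, k=10).indices
print([tok.decode([i]) for i in top.tolist()])  # tokens this feature most promotes
```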
Key Topics & Chapter Markers:
Recap from Part 1: The Early Years of AI [00:00:00]
AI Architecture & Oracle's Innovation in Hash Joins [00:02:00]
Impact of Nature in Creative and Collaborative Work [00:05:00]
The Rise of Neural Networks: Language and Image Processing [00:10:00]
Sparse and Dense Vectors Explained [00:15:00]
Google Translate's Early Approaches & Statistical Methods [00:20:00]
TensorFlow vs. PyTorch: Defining the Modern AI Framework [00:30:00]
Dot Products, Similarity, and the Concept of Attention [00:35:00]
Transformers & The Attention Mechanism Revolution [00:42:00]
BERT, GPT, and the Dawn of Transfer Learning [01:00:00]
The Road to ChatGPT and OpenAI's Innovations [01:10:00]
The Future of AI and Computational Scaling [01:15:00]
Share Your Thoughts: Have questions or comments? Drop us a mail at EffortlessPodcastHQ@gmail.com
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Efficient Dictionary Learning with Switch Sparse Autoencoders, published by Anish Mudide on July 22, 2024 on LessWrong. Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort 0. Summary To recover all the relevant features from a superintelligent language model, we will likely need to scale sparse autoencoders (SAEs) to billions of features. Using current architectures, training extremely wide SAEs across multiple layers and sublayers at various sparsity levels is computationally intractable. Conditional computation has been used to scale transformers (Fedus et al.) to trillions of parameters while retaining computational efficiency. We introduce the Switch SAE, a novel architecture that leverages conditional computation to efficiently scale SAEs to many more features. 1. Introduction The internal computations of large language models are inscrutable to humans. We can observe the inputs and the outputs, as well as every intermediate step in between, and yet, we have little to no sense of what the model is actually doing. For example, is the model inserting security vulnerabilities or backdoors into the code that it writes? Is the model lying, deceiving or seeking power? Deploying a superintelligent model into the real world without being aware of when these dangerous capabilities may arise leaves humanity vulnerable. Mechanistic interpretability (Olah et al.) aims to open the black-box of neural networks and rigorously explain the underlying computations. Early attempts to identify the behavior of individual neurons were thwarted by polysemanticity, the phenomenon in which a single neuron is activated by several unrelated features (Olah et al.). Language models must pack an extremely vast amount of information (e.g., the entire internet) within a limited capacity, encouraging the model to rely on superposition to represent many more features than there are dimensions in the model state (Elhage et al.). Sharkey et al. and Cunningham et al. propose to disentangle superimposed model representations into monosemantic, cleanly interpretable features by training unsupervised sparse autoencoders (SAEs) on intermediate language model activations. Recent work (Templeton et al., Gao et al.) has focused on scaling sparse autoencoders to frontier language models such as Claude 3 Sonnet and GPT-4. Despite scaling SAEs to 34 million features, Templeton et al. estimate that they are likely orders of magnitude short of capturing all features. Furthermore, Gao et al. train SAEs on a series of language models and find that larger models require more features to achieve the same reconstruction error. Thus, to capture all relevant features of future large, superintelligent models, we will likely need to scale SAEs to several billions of features. With current methodologies, training SAEs with billions of features at various layers, sublayers and sparsity levels is computationally infeasible. Training a sparse autoencoder generally consists of six major computations: the encoder forward pass, the encoder gradient, the decoder forward pass, the decoder gradient, the latent gradient and the pre-bias gradient. Gao et al. introduce kernels and tricks that leverage the sparsity of the TopK activation function to dramatically optimize all computations excluding the encoder forward pass, which is not (yet) sparse. 
After implementing these optimizations, Gao et al. attribute the majority of the compute to the dense encoder forward pass and the majority of the memory to the latent pre-activations. No work has attempted to accelerate or improve the memory efficiency of the encoder forward pass, which remains the sole dense matrix multiplication. In a standard deep learning model, every parameter is used for every input. An alternative approach is conditional computatio...
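A hedged sketch of the conditional-computation idea the post builds on: a small router sends each activation vector to one expert SAE, so only a fraction of the total latents is computed per input instead of one dense encoder over all latents. Sizes and top-1 routing are illustrative assumptions, not the paper's exact Switch SAE architecture.

```python
import torch
import torch.nn as nn

class SwitchSAE(nn.Module):
    """Top-1 routed SAE: each input only runs one expert's encoder/decoder."""
    def __init__(self, d_model: int, n_experts: int, latents_per_expert: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.encoders = nn.ModuleList(
            [nn.Linear(d_model, latents_per_expert) for _ in range(n_experts)])
        self.decoders = nn.ModuleList(
            [nn.Linear(latents_per_expert, d_model) for _ in range(n_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        expert = self.router(x).argmax(dim=-1)          # [batch] expert index per input
        recon = torch.zeros_like(x)
        for e in range(len(self.encoders)):
            mask = expert == e
            if mask.any():
                acts = torch.relu(self.encoders[e](x[mask]))   # that expert's latents only
                recon[mask] = self.decoders[e](acts)
        return recon

sae = SwitchSAE(d_model=768, n_experts=8, latents_per_expert=4096)
recon = sae(torch.randn(32, 768))                        # reconstruct a batch of activations
```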
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Decomposing the QK circuit with Bilinear Sparse Dictionary Learning, published by keith wynroe on July 2, 2024 on LessWrong. This work was produced as part of Lee Sharkey's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort. Intro and Motivation: Sparse dictionary learning (SDL) has attracted a lot of attention recently as a method for interpreting transformer activations. These methods demonstrate that model activations can often be explained using a sparsely-activating, overcomplete set of human-interpretable directions. However, despite its success in explaining many components, the application of SDL to interpretability is relatively nascent and has yet to be extended to some model activations. In particular, intermediate activations of attention blocks have yet to be studied, and provide challenges for standard SDL methods. The first challenge is bilinearity: SDL is usually applied to individual vector spaces at individual layers, so we can simply identify features as a direction in activation space. But the QK circuits of transformer attention layers are different: They involve a bilinear form followed by a softmax. Although simply applying sparse encoders to the keys and queries[1] could certainly help us understand the "concepts" being used by a given attention layer, this approach would fail to explain how the query-features and key-features interact bilinearly. We need to understand which keys matter to which queries. The second challenge is attention-irrelevant variance: A lot of the variance in the attention scores is irrelevant to the attention pattern because it is variance in low scores which are softmaxed to zero; this means that most of the variability in the keys and queries is irrelevant for explaining downstream behaviour[2]. The standard method of reconstructing keys and queries would therefore waste capacity on what is effectively functionally irrelevant noise. To tackle these two problems (bilinearity and attention-irrelevant variance), we propose a training setup which only reconstructs the dimensions of the keys and queries that most affect the attention pattern. Training Setup: Our training process has two steps: Step 1: Reconstructing the attention pattern with key- and query-encoder-decoder networks. Step 2: Finding a condensed set of query-key feature pairs by masking. Step 1: Reconstructing the attention pattern with key- and query-transcoders. Architecture: Our first training step involves training two sparse dictionaries in parallel (one for the keys and one for the queries). The dictionaries both take in the layer-normalized residual stream at a given layer (normalised_resid_pre_i) and each output a [n_head * d_head] vector, representing the flattened keys and queries[3]. Figure 1: High-level diagram of our training set-up. Loss functions: However, rather than penalising the reconstruction loss of the keys and queries explicitly, we can use these keys and queries to reconstruct the original model's attention pattern. To train the reconstructed attention pattern, we used several different losses: KL divergence between the attention pattern (using reconstructed keys and reconstructed queries) and the ground-truth attention pattern produced by the original model.
We also added two auxiliary reconstruction losses, both for early-training-run stability and to ensure our transcoders do not learn to reconstruct the keys and queries with an arbitrary rotation applied (since this would still produce the same attention scores and patterns): KL divergence between the attention pattern (using reconstructed keys and the original model's queries) and the ground-truth attention pattern produced by the original model. KL divergence between the attention pattern (using the original model's keys and the reconstructed queries) and the ground-truth atten...
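A rough sketch of the main loss described above: compute the attention pattern from the transcoders' reconstructed queries and keys and train against the KL divergence to the original model's pattern. The shapes, causal mask, and 64-dimensional head size are placeholder assumptions; the transcoders themselves are omitted.

```python
import torch
import torch.nn.functional as F

def attn_pattern(q, k, d_head):
    # q, k: [batch, n_head, seq, d_head]
    scores = q @ k.transpose(-1, -2) / d_head ** 0.5
    causal = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool,
                                   device=scores.device), diagonal=1)
    return scores.masked_fill(causal, float("-inf")).softmax(dim=-1)

def kl_pattern_loss(q_recon, k_recon, q_orig, k_orig, d_head=64):
    p_orig = attn_pattern(q_orig, k_orig, d_head)                    # ground-truth pattern
    logp_recon = torch.log(attn_pattern(q_recon, k_recon, d_head) + 1e-9)
    return F.kl_div(logp_recon, p_orig, reduction="batchmean")

# q_recon / k_recon would be the query- and key-transcoder outputs applied to
# normalised_resid_pre_i, reshaped to [batch, n_head, seq, d_head].
```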
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Decomposing the QK circuit with Bilinear Sparse Dictionary Learning, published by keith wynroe on July 2, 2024 on The AI Alignment Forum. This work was produced as part of Lee Sharkey's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort. Intro and Motivation: Sparse dictionary learning (SDL) has attracted a lot of attention recently as a method for interpreting transformer activations. These methods demonstrate that model activations can often be explained using a sparsely-activating, overcomplete set of human-interpretable directions. However, despite its success in explaining many components, the application of SDL to interpretability is relatively nascent and has yet to be extended to some model activations. In particular, intermediate activations of attention blocks have yet to be studied, and provide challenges for standard SDL methods. The first challenge is bilinearity: SDL is usually applied to individual vector spaces at individual layers, so we can simply identify features as a direction in activation space. But the QK circuits of transformer attention layers are different: They involve a bilinear form followed by a softmax. Although simply applying sparse encoders to the keys and queries[1] could certainly help us understand the "concepts" being used by a given attention layer, this approach would fail to explain how the query-features and key-features interact bilinearly. We need to understand which keys matter to which queries. The second challenge is attention-irrelevant variance: A lot of the variance in the attention scores is irrelevant to the attention pattern because it is variance in low scores which are softmaxed to zero; this means that most of the variability in the keys and queries is irrelevant for explaining downstream behaviour[2]. The standard method of reconstructing keys and queries would therefore waste capacity on what is effectively functionally irrelevant noise. To tackle these two problems (bilinearity and attention-irrelevant variance), we propose a training setup which only reconstructs the dimensions of the keys and queries that most affect the attention pattern. Training Setup: Our training process has two steps: Step 1: Reconstructing the attention pattern with key- and query-encoder-decoder networks. Step 2: Finding a condensed set of query-key feature pairs by masking. Step 1: Reconstructing the attention pattern with key- and query-transcoders. Architecture: Our first training step involves training two sparse dictionaries in parallel (one for the keys and one for the queries). The dictionaries both take in the layer-normalized residual stream at a given layer (normalised_resid_pre_i) and each output a [n_head * d_head] vector, representing the flattened keys and queries[3]. Figure 1: High-level diagram of our training set-up. Loss functions: However, rather than penalising the reconstruction loss of the keys and queries explicitly, we can use these keys and queries to reconstruct the original model's attention pattern. To train the reconstructed attention pattern, we used several different losses: KL divergence between the attention pattern (using reconstructed keys and reconstructed queries) and the ground-truth attention pattern produced by the original model.
We also added two auxiliary reconstruction losses, both for early-training-run stability and to ensure our transcoders do not learn to reconstruct the keys and queries with an arbitrary rotation applied (since this would still produce the same attention scores and patterns): KL divergence between the attention pattern (using reconstructed keys and the original model's queries) and the ground-truth attention pattern produced by the original model. KL divergence between the attention pattern (using the original model's keys and the reconstructed queries) and the groun...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interpreting Preference Models w/ Sparse Autoencoders, published by Logan Riggs Smith on July 1, 2024 on The AI Alignment Forum. Preference Models (PMs) are trained to imitate human preferences and are used when training with RLHF (reinforcement learning from human feedback); however, we don't know what features the PM is using when outputting reward. For example, maybe curse words make the reward go down and wedding-related words make it go up. It would be good to verify that the features we wanted to instill in the PM (e.g. helpfulness, harmlessness, honesty) are actually rewarded and those we don't (e.g. deception, sycophancy) aren't. Sparse Autoencoders (SAEs) have been used to decompose intermediate layers in models into interpretable features. Here we train SAEs on a 7B parameter PM, and find the features that are most responsible for the reward going up & down. High level takeaways: 1. We're able to find SAE features that have a large causal effect on reward, which can be used to "jailbreak" prompts. 2. We do not explain 100% of reward differences through SAE features even though we tried for a couple of hours. What are PMs? [skip if you're already familiar] When talking to a chatbot, it can output several different responses, and you can choose which one you believe is better. We can then train the LLM on this feedback for every output, but humans are too slow. So we'll just get, say, 100k human preferences of "response A is better than response B", and train another AI to predict human preferences! But to take in text & output a reward, a PM would benefit from understanding language. So one typically trains a PM by first taking an already pretrained model (e.g. GPT-3), and replacing the last component of the LLM of shape [d_model, vocab_size], which converts the residual stream to 50k numbers for the probability of each word in its vocabulary, with a [d_model, 1] head which converts it to 1 number representing reward. They then call this pretrained model w/ this new "head" a "Preference Model", and train it to predict the human-preference dataset. Did it give the human-preferred response [A] a higher number than [B]? Good. If not, bad! This leads to two important points: 1. Reward is relative - the PM is only trained to say the human-preferred response is better than the alternative. So a large negative reward or large positive reward don't have objective meaning. All that matters is the relative reward difference for two completions given the same prompt. (h/t to Ethan Perez's post) 2. Most features are already learned in pretraining - the PM isn't learning new features from scratch. It's taking advantage of the pretrained model's existing concepts. These features might change a bit or compose w/ each other differently though. (Note: this is an unsubstantiated hypothesis of mine.) Finding High Reward-affecting Features w/ SAEs: We trained 6 SAEs on layers 2, 8, 12, 14, 16, 20 of an open source 7B parameter PM, finding 32k features for each layer. We then find the most important features for the reward going up or down (specifics in the Technical Details section). Below is a selection of features found through this process that we thought were interesting enough to try to create prompts w/.
(My list of feature interpretations for each layer can be found here) Negative Features: A "negative" feature is a feature that will decrease the reward that the PM predicts. This could include features like cursing or saying the same word repeatedly. Therefore, we should expect that removing a negative feature makes the reward go up. When looking at a feature, I'll look at the top datapoints where removing it affected the reward the most: Removing feature 11612 made the chosen reward go up by 1.2, from 4.79 to 6.02, and had no effect on the rejected completion because it doesn't a...
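An illustrative sketch of the kind of ablation being described: zero one SAE feature inside the preference model's forward pass (via a hook) and compare the scalar reward with and without it. The hook point, the SAE interface, the assumption that the hooked module returns a plain tensor, and the assumption that the PM returns a single scalar are all hypothetical.

```python
import torch

def reward_with_feature_ablated(pm, sae, layer_module, tokens, feature_idx):
    """Reward from the PM with one SAE feature zeroed at an intermediate layer.
    Assumes sae(x) returns (reconstruction, activations), sae.dec decodes
    activations, and pm(tokens) returns a single scalar reward."""
    def ablate(module, inputs, output):
        recon, acts = sae(output)
        acts = acts.clone()
        acts[..., feature_idx] = 0.0                       # remove the feature
        # Replace the activation with the edited reconstruction, keeping the
        # part of the original activation the SAE does not explain.
        return sae.dec(acts) + (output - recon)

    handle = layer_module.register_forward_hook(ablate)
    try:
        reward = pm(tokens)
    finally:
        handle.remove()
    return reward

# Comparing pm(tokens) with reward_with_feature_ablated(...) shows whether the
# feature pushes the reward up or down on that prompt/completion pair.
```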
It's been an exciting couple weeks for GenAI! Join us as we discuss the latest research from OpenAI and Anthropic. We're excited to chat about this significant step forward in understanding how LLMs work and the implications it has for deeper understanding of the neural activity of language models. We take a closer look at some recent research from both OpenAI and Anthropic. These two recent papers both focus on the sparse autoencoder--an unsupervised approach for extracting interpretable features from an LLM. In "Extracting Concepts from GPT-4," OpenAI researchers propose using k-sparse autoencoders to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. In "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet," researchers at Anthropic show that scaling laws can be used to guide the training of sparse autoencoders, among other findings. To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.
Dive into the world of AI investments with Eric Vishria of Benchmark and Sergiy Nesterenko of Quilter. Explore the future of AI in hardware design, the strategies for venture capital investment in the AI era, and the impact on society. Discover why Benchmark has yet to invest in foundation model companies and the significance of solving enduring problems in this dynamic field. Join us for an eye-opening discussion on the intersection of AI technology and business innovation. SPONSORS: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ Recommended Podcast - The Riff with Byrne Hobart Byrne Hobart, the writer of The Diff, is revered in Silicon Valley. You can get an hour with him each week. See for yourself how his thinking can upgrade yours. Spotify: https://open.spotify.com/show/6rANlV54GCARLgMOtpkzKt Apple: https://podcasts.apple.com/us/podcast/the-riff-with-byrne-hobart-and-erik-torenberg/id1716646486 CHAPTERS: (00:00:00) Introduction (00:10:12) The Idea Maze (00:12:28) Disruptive Approach (00:15:47) Sparse reward problem (00:18:26) Sponsors: Oracle | Brave (00:20:34) Reliability of the reward signal (00:28:12) Model size and compute (00:30:14) Simulation methods (00:35:48) Superhuman circuit board design (00:38:53) Sponsors: Squad | Omneky (00:40:38) What does the future of circuit board design look like? (00:43:11) How do I make money in AI? (00:46:18) What is cutting edge? (00:48:34) Researchers vs. engineers (00:50:51) Call for startups
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Scaling and evaluating sparse autoencoders, published by leogao on June 6, 2024 on The AI Alignment Forum. [Blog] [Paper] [Visualizer] Abstract: Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstruction and sparsity objectives and the presence of dead latents. We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. Additionally, we find modifications that result in few dead latents, even at the largest scales we tried. Using these techniques, we find clean scaling laws with respect to autoencoder size and sparsity. We also introduce several new metrics for evaluating feature quality based on the recovery of hypothesized features, the explainability of activation patterns, and the sparsity of downstream effects. These metrics all generally improve with autoencoder size. To demonstrate the scalability of our approach, we train a 16 million latent autoencoder on GPT-4 activations for 40 billion tokens. We release code and autoencoders for open-source models, as well as a visualizer. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
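A minimal sketch of the k-sparse (TopK) autoencoder idea from the abstract: keep only the k largest pre-activations per example, so sparsity is controlled directly by k rather than tuned through an L1 penalty. Dimensions and k are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.enc = nn.Linear(d_model, n_latents)
        self.dec = nn.Linear(n_latents, d_model)
        self.k = k

    def forward(self, x):
        pre = self.enc(x)
        topk = torch.topk(pre, self.k, dim=-1)               # keep only the k largest
        acts = torch.zeros_like(pre).scatter_(-1, topk.indices, torch.relu(topk.values))
        return self.dec(acts), acts

sae = TopKSAE(d_model=768, n_latents=32768, k=32)
x = torch.randn(16, 768)                                     # a batch of model activations
recon, acts = sae(x)
loss = (recon - x).pow(2).mean()                             # reconstruction loss; no L1 term needed
```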
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning, published by Dan Braun on May 17, 2024 on The AI Alignment Forum. A short summary of the paper is presented below. This work was produced by Apollo Research in collaboration with Jordan Taylor (MATS + University of Queensland). TL;DR: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the original model and the model with SAE activations inserted. Compared to standard SAEs, e2e SAEs offer a Pareto improvement: They explain more network performance, require fewer total features, and require fewer simultaneously active features per datapoint, all with no cost to interpretability. We explore geometric and qualitative differences between e2e SAE features and standard SAE features. Introduction: Current SAEs focus on the wrong goal: They are trained to minimize mean squared reconstruction error (MSE) of activations (in addition to minimizing their sparsity penalty). The issue is that the importance of a feature as measured by its effect on MSE may not strongly correlate with how important the feature is for explaining the network's performance. This would not be a problem if the network's activations used a small, finite set of ground truth features -- the SAE would simply identify those features, and thus optimizing MSE would have led the SAE to learn the functionally important features. In practice, however, Bricken et al. observed the phenomenon of feature splitting, where increasing dictionary size while increasing sparsity allows SAEs to split a feature into multiple, more specific features, representing smaller and smaller portions of the dataset. In the limit of large dictionary size, it would be possible to represent each individual datapoint as its own dictionary element. Since minimizing MSE does not explicitly prioritize learning features based on how important they are for explaining the network's performance, an SAE may waste much of its fixed capacity on learning less important features. This is perhaps responsible for the observation that, when measuring the causal effects of some features on network performance, a significant amount is mediated by the reconstruction residual errors (i.e. everything not explained by the SAE) and not mediated by SAE features (Marks et al.). Given these issues, it is therefore natural to ask how we can identify the functionally important features used by the network. We say a feature is functionally important if it is important for explaining the network's behavior on the training distribution. If we prioritize learning functionally important features, we should be able to maintain strong performance with fewer features used by the SAE per datapoint as well as fewer overall features. To optimize SAEs for these properties, we introduce a new training method. We still train SAEs using a sparsity penalty on the feature activations (to reduce the number of features used on each datapoint), but we no longer optimize activation reconstruction.
Instead, we replace the original activations with the SAE output and optimize the KL divergence between the original output logits and the output logits when passing the SAE output through the rest of the network, thus training the SAE end-to-end (e2e). One risk with this method is that it may be possible for the outputs of SAE_e2e to take a different computational pathway through subsequent layers of the network (compared with the original activations) while nevertheless producing a similar output distribution. For example, it might learn a new feature that exploits a particular transformation in a downstream layer that is unused by the regular netw...
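A hedged sketch of the end-to-end objective described above: run the model once normally and once with the SAE's reconstruction spliced into one layer, then train the SAE on the KL divergence between the two sets of output logits plus a sparsity penalty. The model wrapper, hook point, and SAE interface are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def e2e_sae_loss(model, sae, layer_module, tokens, l1_coeff=1e-3):
    """KL(original logits || logits with the SAE output spliced in) + sparsity penalty.
    Assumes model(tokens) returns logits, the hooked module returns a plain tensor,
    and sae(x) returns (reconstruction, latent_activations)."""
    store = {}

    def splice(module, inputs, output):
        recon, latents = sae(output)
        store["latents"] = latents
        return recon                              # downstream layers see the SAE output

    with torch.no_grad():
        orig_logits = model(tokens)               # clean forward pass (target distribution)

    handle = layer_module.register_forward_hook(splice)
    try:
        sae_logits = model(tokens)                # forward pass routed through the SAE
    finally:
        handle.remove()

    kl = F.kl_div(sae_logits.log_softmax(-1), orig_logits.softmax(-1),
                  reduction="batchmean")
    sparsity = store["latents"].abs().mean()
    return kl + l1_coeff * sparsity
```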
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers, published by hugofry on April 30, 2024 on LessWrong.
Two Minute Summary
In this post I present my results from training a Sparse Autoencoder (SAE) on a CLIP Vision Transformer (ViT) using the ImageNet-1k dataset. I have created an interactive web app, 'SAE Explorer', to allow the public to explore the visual features the SAE has learnt, found here: https://sae-explorer.streamlit.app/ (best viewed on a laptop). My results illustrate that SAEs can identify sparse and highly interpretable directions in the residual stream of vision models, enabling inference-time inspection of the model's activations. To demonstrate this, I have included a 'guess the input image' game on the web app that allows users to guess the input image purely from the SAE activations of a single layer and token of the residual stream. I have also uploaded a (slightly outdated) accompanying talk of my results, primarily listing SAE features I found interesting: https://youtu.be/bY4Hw5zSXzQ. The primary purpose of this post is to demonstrate and emphasise that SAEs are effective at identifying interpretable directions in the activation space of vision models. In this post I highlight a small number of my favourite SAE features to demonstrate some of the abstract concepts the SAE has identified within the model's representations. I then analyse a small number of SAE features using feature visualisation to check the validity of the SAE interpretations. Later in the post, I provide some technical analysis of the SAE. I identify a large cluster of features analogous to the 'ultra-low frequency' cluster that Anthropic identified. In line with existing research, I find that this ultra-low frequency cluster represents a single feature. I then analyse the 'neuron-alignment' of SAE features by comparing the SAE encoder matrix to the MLP-out matrix. This research was conducted as part of the ML Alignment and Theory Scholars program 2023/2024 winter cohort. Special thanks to Joseph Bloom for providing generous amounts of his time and support (in addition to the SAE Lens code base) as well as LEAP labs for helping to produce the feature visualisations and weekly meetings with Jessica Rumbelow.
Example: 'animals eating other animals' feature (top 16 highest-activating images).
Example: 'Italian' feature. Note that the photo of the dog has a watermark with a website ending in .it (Italy's domain name). Note also that the bottom-left photo is of Italian writing. The number of ambulances present is a byproduct of using ImageNet-1k.
Motivation
Frontier AI systems are becoming increasingly multimodal, and capabilities may advance significantly as multimodality increases due to transfer learning between different data modalities and tasks. As a heuristic, consider how much intuition humans gain for the world through visual reasoning; even in abstract settings such as maths and physics, concepts are often understood most intuitively through visual reasoning. Many cutting-edge systems today such as DALL-E and Sora use ViTs trained on multimodal data. Almost by definition, AGI is likely to be multimodal. Despite this, very little effort has been made to apply and adapt our current mechanistic interpretability techniques to vision tasks or multimodal models.
I believe it is important to check that mechanistic interpretability generalises to these systems in order to ensure they are future-proof and can be applied to safeguard against AGI. In this post, I restrict the scope of my research to specifically investigating SAEs trained on multimodal models. The particular multimodal system I investigate is CLIP, a model trained on image-text pairs. CLIP consists of two encoders: a language model and a vision model that are trained to e...
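As a rough illustration of the recipe described in the post (train an SAE on residual-stream activations of a vision transformer, then inspect which features fire at inference time), here is a sketch that collects activations from one block of a ViT with a forward hook and reads off the top SAE features for a token. The attribute `vit.blocks`, the `sae.encode` method, and the tensor shapes are assumptions to be adapted to the specific CLIP implementation and SAE library in use.

```python
import torch

def collect_residual_activations(vit, images, layer_idx):
    """Grab the activations emitted by one transformer block of a ViT.
    Assumes `vit.blocks` is an indexable list of blocks (timm-style);
    adapt the attribute name for your CLIP implementation.
    """
    grabbed = {}
    def hook(module, inputs, output):
        grabbed["acts"] = output.detach()       # [batch, tokens, d_model]
    handle = vit.blocks[layer_idx].register_forward_hook(hook)
    with torch.no_grad():
        vit(images)
    handle.remove()
    return grabbed["acts"]

def top_features(sae, acts, token_idx=0, k=10):
    """Inference-time inspection: which SAE features fire on this token?
    Assumes the SAE exposes encode() returning sparse feature activations."""
    feats = sae.encode(acts[:, token_idx, :])   # [batch, d_dict]
    values, indices = feats.topk(k, dim=-1)
    return values, indices
```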
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Improving Dictionary Learning with Gated Sparse Autoencoders, published by Neel Nanda on April 25, 2024 on The AI Alignment Forum. Authors: Senthooran Rajamanoharan*, Arthur Conmy*, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda. A new paper from the Google DeepMind mech interp team: Improving Dictionary Learning with Gated Sparse Autoencoders! Gated SAEs are a new Sparse Autoencoder architecture that seems to be a significant Pareto improvement over normal SAEs, verified on models up to Gemma 7B. They are now our team's preferred way to train sparse autoencoders, and we'd love to see them adopted by the community! (Or to be convinced that it would be a bad idea for them to be adopted by the community!) They achieve similar reconstruction with about half as many firing features, while being comparably or more interpretable (the confidence interval for the interpretability increase is 0%-13%). See Sen's Twitter summary, my Twitter summary, and the paper! Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
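The announcement above doesn't spell out the architecture, so the following is only a rough sketch of the gating idea as I understand it from the paper: one path decides which features are active, a second path decides their magnitudes, and the two share weights up to a per-feature rescaling. The parameter names and the omission of the paper's auxiliary training loss are simplifications, not the authors' reference implementation.

```python
import torch

class GatedSAEEncoder(torch.nn.Module):
    """Sketch of a gated SAE encoder: a binary gate picks *which* features fire,
    a magnitude path picks *how strongly*, with weights tied up to a per-feature
    rescaling. The auxiliary loss used to train the gate is omitted here.
    """
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.W_gate = torch.nn.Parameter(torch.randn(d_model, d_dict) * 0.01)
        self.b_gate = torch.nn.Parameter(torch.zeros(d_dict))
        self.r_mag = torch.nn.Parameter(torch.zeros(d_dict))   # per-feature log-scale
        self.b_mag = torch.nn.Parameter(torch.zeros(d_dict))
        self.b_dec = torch.nn.Parameter(torch.zeros(d_model))  # decoder bias, used for centering

    def forward(self, x):
        x_cent = x - self.b_dec
        pi_gate = x_cent @ self.W_gate + self.b_gate
        gate = (pi_gate > 0).float()                            # which features are active
        mag = torch.relu(x_cent @ (self.W_gate * self.r_mag.exp()) + self.b_mag)
        return gate * mag                                       # sparse feature activations
```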
Together with our community, we engineer sparse LLM, CV, and NLP models that are more efficient and performant in production. Why does this matter? Sparse models are more flexible and can achieve unrivaled latency and throughput performance on your private CPU and GPU infrastructure. Check us out on GitHub and join the Neural Magic Slack Community to get started with software-delivered AI. http://neuralmagic.com/
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: ProLU: A Pareto Improvement for Sparse Autoencoders, published by Glen M. Taggart on April 23, 2024 on The AI Alignment Forum.
Abstract
This paper presents ProLU, an alternative to ReLU for the activation function in sparse autoencoders, which produces a Pareto improvement over both standard sparse autoencoder architectures and sparse autoencoders trained with a Sqrt(L1) penalty.
Introduction
SAE Context and Terminology
Learnable parameters of a sparse autoencoder:
W_enc: encoder weights
W_dec: decoder weights
b_enc: encoder bias
b_dec: decoder bias
Notation: Encoder/Decoder
Let encode(x) = ReLU((x - b_dec) W_enc + b_enc) and decode(a) = a W_dec + b_dec, so that the full computation done by an SAE can be expressed as SAE(x) = decode(encode(x)). An SAE is trained with gradient descent on a loss of the form L(x) = ||x - SAE(x)||^2 + λ P(encode(x)), where λ is the sparsity penalty coefficient (often called the "L1 coefficient") and P is the sparsity penalty function used to encourage sparsity. P is commonly the L1 norm ||a||_1, but recently the L_{1/2} penalty has been shown to produce a Pareto improvement on the L0 and CE metrics.
Sqrt(L1) SAEs
There has been other work producing Pareto improvements to SAEs by taking P(a) = ||a||_{1/2}^{1/2} as the penalty function. We will use this as a further baseline to compare against when assessing our models.
Motivation: Inconsistent Scaling in Sparse Autoencoders
Due to the affine translation, sparse autoencoder features with nonzero encoder biases only perfectly reconstruct feature magnitudes at a single point. This poses difficulties if activation magnitudes for a fixed feature tend to vary over a wide range. This potential problem motivates the concept of scale consistency.
(Figure: a scale-consistent response curve.)
The bias maintains its role in noise suppression, but no longer translates activation magnitudes when the feature is active. The lack of gradients for the encoder bias term poses a challenge for learning with gradient descent. This paper formalizes an activation function which gives SAEs this scale-consistent response curve, motivates and proposes two plausible synthetic gradients, and compares scale-consistent models trained with the two synthetic gradients to standard SAEs and SAEs trained with the Sqrt(L1) penalty.
Scale Consistency Desiderata
Notation: Centered Submodule
The use of the decoder bias can be viewed as performing centering on the inputs to a centered SAE and then reversing the centering on the outputs: SAE(x) = SAE_cent(x - b_dec) + b_dec, where SAE_cent(x) = ReLU(x W_enc + b_enc) W_dec.
Notation: Specified Feature
Let W_i denote the weights and b_enc,i the encoder bias for the i-th feature. Then let SAE_i(x) = SAE_cent,i(x - b_dec) + b_dec, where SAE_cent,i(x) = ReLU(x W_enc,i + b_enc,i) W_dec,i.
Conditional Linearity
Noise Suppression Threshold
Methods
Proportional ReLU (ProLU)
We define the Proportional ReLU (ProLU) as:
Backprop with ProLU: To use ProLU in SGD-optimized models, we first address the lack of gradients with respect to the b term.
ReLU gradients: For comparison and later use, we first consider ReLU: partial derivatives are well defined for ReLU at all points other than x_i = 0.
Gradients of ProLU: Partial derivatives of ProLU with respect to m are similarly well defined. However, they are not well defined with respect to b, so we must synthesize these.
Notation: Synthetic Gradients
Let f̂_x denote the synthetic partial derivative of f with respect to x, and ∇̂f the synthetic gradient of f, used during backpropagation as a stand-in for the true gradient.
Different synthetic gradient types
We train two classes of ProLU with different synthetic gradients, distinguished by their subscripts: ProLU_ReLU and ProLU_STE. They are identical in output, but have different synthetic gradients.
ReLU-Like Gradients: ProLU_ReLU
The first synthetic gradient is very similar to the gradient for ReLU. We retain the gradient with respect to m, and define the synthetic gradient with respect to b as follows:
Thresh STE Derived Gradients: ProLU_STE
The second class of Pro...
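Since the exact ProLU definition and both synthetic-gradient rules are only partially legible in the transcript above, here is a generic sketch of the underlying trick: a scale-consistent, thresholded activation whose threshold parameter receives a synthetic (straight-through style) gradient, because its true gradient is zero almost everywhere. The forward rule and the particular choice of synthetic gradient here are illustrative assumptions rather than the paper's precise ProLU_ReLU or ProLU_STE definitions.

```python
import torch

class ThresholdedIdentitySTE(torch.autograd.Function):
    """y = m * 1[m + b > 0]: the feature keeps its magnitude m when active
    (scale consistency) and is zeroed otherwise (noise suppression).
    dy/db is zero almost everywhere, so we substitute a synthetic gradient;
    here, a straight-through estimator that passes the upstream gradient to b
    wherever the unit is active. Illustration only, not the paper's exact rule.
    """
    @staticmethod
    def forward(ctx, m, b):
        active = (m + b > 0).float()
        ctx.save_for_backward(active)
        return m * active

    @staticmethod
    def backward(ctx, grad_out):
        (active,) = ctx.saved_tensors
        grad_m = grad_out * active          # well defined, ReLU-like
        grad_b = grad_out * active          # synthetic: straight-through where active
        return grad_m, grad_b

def prolu_like(m, b):
    return ThresholdedIdentitySTE.apply(m, b)
```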
My podcast guest this week is Femtosense CEO Sam Fok! Sam and I chat about the role that sparsity will play in the future of AI, the details of Femtosense's SPU hardware platform, and how Femtosense's AI technology is being used for AI speech enhancement in hearing aids. Also this week, I check out how you can design your own functioning warp drive with the help of a new groundbreaking open source software toolkit called Warp Factory.
We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability. By synergizing the strengths of an off-the-shelf multiview diffusion model and a sparse-view reconstruction model based on the LRM architecture, InstantMesh is able to create diverse 3D assets within 10 seconds. To enhance the training efficiency and exploit more geometric supervisions, e.g., depths and normals, we integrate a differentiable iso-surface extraction module into our framework and directly optimize on the mesh representation. Experimental results on public datasets demonstrate that InstantMesh significantly outperforms other latest image-to-3D baselines, both qualitatively and quantitatively. We release all the code, weights, and demo of InstantMesh, with the intention that it can make substantial contributions to the community of 3D generative AI and empower both researchers and content creators. 2024: Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, Ying Shan https://arxiv.org/pdf/2404.07191v2.pdf
European bourses in the red with US futures flat in catalyst-thin trade.
DXY steady for much of the morning before USD/JPY came under pressure on a BoJ-related source report.
Fixed income benchmarks bid with EGBs outperforming; no reaction to supply, but the latest ECB BLS perhaps factoring.
Commodities firmer; crude gains incremental while XAU hit a fresh ATH.
Looking ahead, highlights include US supply and SNB's Schlegel; EIA STEO.
Read the full report covering Equities, Forex, Fixed Income, Commodities and more on Newsquawk.
LibrInsieme is a book club that meets every two weeks in a virtual library, bringing together reading enthusiasts scattered across Australia to discuss the books they have read and to meet the authors.
Cars Commerce said in its 2024 Automotive Trends report there is roughly a 10.2 million-unit deficit in the number of 1- to 3-year-old vehicles expected to re-enter the pre-owned car market in the near future, leading to a “sparse and expensive” picture for dealers' used-car inventories. How should store managers handle that situation? Brian Kramer, executive vice president at Cars Commerce and general manager of Accu-Trade, offered his perspectives during a conversation for the Auto Remarketing Podcast recorded at NADA Show 2024 in Las Vegas.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders, published by Johnny Lin on March 25, 2024 on The AI Alignment Forum. This post assumes basic familiarity with Sparse Autoencoders. For those unfamiliar with this technique, we highly recommend the introductory sections of these papers. TL;DR: Neuronpedia is a platform for mechanistic interpretability research. It was previously focused on crowdsourcing explanations of neurons, but we've pivoted to accelerating researchers working on Sparse Autoencoders (SAEs) by hosting models, feature dashboards, data visualizations, tooling, and more. Important Links: Explore: the SAE-research-focused Neuronpedia. Current SAEs for GPT2-Small: RES-JB: Residuals - Joseph Bloom (294k feats); ATT-KK: Attention Out - Connor Kissane + Robert Kryzanowski (344k feats). Upload: Get your SAEs hosted by Neuronpedia: fill out this
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT, published by Robert AIZI on March 5, 2024 on LessWrong. Abstract A sparse autoencoder is a neural network architecture that has recently gained popularity as a technique to find interpretable features in language models ( Cunningham et al, Anthropic's Bricken et al). We train a sparse autoencoder on OthelloGPT, a language model trained on transcripts of the board game Othello, which has been shown to contain a linear representation of the board state, findable by supervised probes. The sparse autoencoder finds 9 features which serve as high-accuracy classifiers of the board state, out of 180 findable with supervised probes (and 192 possible piece/position combinations). Across random seeds, the autoencoder repeatedly finds "simpler" features concentrated on the center of the board and the corners. This demonstrates that current techniques for sparse autoencoders may fail to find a large majority of the interesting, interpretable features in a language model. Introduction There has been a recent flurry of research activity around Sparse Autoencoders for Dictionary Learning, a new approach to finding interpretable features in language models and potentially "solving superposition" ( Sharkey et al, Anthropic's Bricken et al, Cunningham et al.). But while this technique can find features which are interpretable, it is not yet clear if sparse autoencoders can find particular features of interest (e.g., features relevant to reducing AI risk). This research report seeks to answer the question of whether sparse autoencoders can find a set of a-priori existing, interesting, and interpretable features in the OthelloGPT language model. OthelloGPT, as the name suggests, is a language model trained on transcripts of the board game Othello to predict legal moves, but was found to also linearly encode the current board state ( Nanda, Hazineh et al). That is, for each of the 64 board positions, there were "board-state features" (linear mappings from the residual stream to R^3) that classify the state at that position between [is empty] vs [has active-player's piece] vs [has enemy's piece], and these board-state features can be found by the supervised training of a linear probe. These board-state features are an exciting testbed for sparse autoencoders because they represent a set of "called-shot" features we hope to find, and which are extremely interpretable and correspond to natural human thinking[1]. If the sparse autoencoder can find these features, this is some evidence that they will find relevant and important features in language models. Conversely, if the sparse autoencoders can't find these features, that indicates a limitation of the method, and provides a test case where we can adjust our training methods until we can find them. Overview Here we: Train an OthelloGPT model from scratch Train a linear probe to classify the board states (replicating Hazineh et al) from an intermediate layer of OthelloGPT. Train a sparse autoencoder on the same layer of OthelloGPT Assess whether the features found by the sparse autoencoder include the linear encoding of the current board state that the linear probe is able to find. Retrain the sparse autoencoder with different random seeds, and analyze which features are found. 
Methods Training OthelloGPT We first trained an OthelloGPT model from scratch, following the approach of Li et al. Our model is a 25M parameter, 8-layer, decoder-only transformer, with residual stream dimension d_model=512 (identical to Li et al's model). It is trained to do next-token-prediction of random transcripts of Othello games, with each possible move being encoded as a separate token, resulting in a vocabulary size of 66 (64 from the positions on the boards, plus 2 speci...
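For concreteness, here is a minimal sketch of the kind of supervised linear probe described above: a single linear map from the residual stream (d_model = 512) to 64 squares x 3 classes, trained with cross-entropy. The activations and labels below are random placeholders standing in for real OthelloGPT activations and board states.

```python
import torch
import torch.nn.functional as F

D_MODEL, N_SQUARES, N_CLASSES = 512, 64, 3   # classes: empty / own piece / enemy piece

probe = torch.nn.Linear(D_MODEL, N_SQUARES * N_CLASSES)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def probe_loss(resid, board_labels):
    """resid: [batch, d_model] residual-stream activations at the chosen layer.
    board_labels: [batch, 64] ints in {0, 1, 2} giving the true state of each square.
    """
    logits = probe(resid).view(-1, N_SQUARES, N_CLASSES)
    return F.cross_entropy(logits.flatten(0, 1), board_labels.flatten())

# one training step on placeholder data
resid = torch.randn(32, D_MODEL)
labels = torch.randint(0, N_CLASSES, (32, N_SQUARES))
loss = probe_loss(resid, labels)
loss.backward(); opt.step(); opt.zero_grad()
```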
Martha White, associate professor of Computing Science at University of Alberta (
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders, published by Evan Anders on February 27, 2024 on LessWrong. Note: The second figure in this post originally contained a bug pointed out by @LawrenceC, which has since been fixed. Summary Sparse Autoencoders (SAEs) reveal interpretable features in the activation spaces of language models, but SAEs don't reconstruct activations perfectly. We lack good metrics for evaluating which parts of model activations SAEs fail to reconstruct, which makes it hard to evaluate SAEs themselves. In this post, we argue that SAE reconstructions should be tested using well-established benchmarks to help determine what kinds of tasks they degrade model performance on. We stress-test a recently released set of SAEs for each layer of the gpt2-small residual stream using randomly sampled tokens from Open WebText and the Lambada benchmark where the model must predict a specific next token. The SAEs perform well on prompts with context sizes up to the training context size, but their performance degrades on longer prompts. In contexts shorter than or equal to the training context, the SAEs that we study generally perform well. We find that the performance of our late-layer SAEs is worse than early-layer SAEs, but since the SAEs all have the same width, this may just be because there are more features to resolve in later layers and our SAEs don't resolve them. In contexts longer than the training context, SAE performance is poor in general, but it is poorest in earlier layers and best in later layers. Introduction Last year, Anthropic and EleutherAI/Lee Sharkey's MATS stream showed that sparse autoencoders (SAEs) can decompose language model activations into human-interpretable features. This has led to a significant uptick in the number of people training SAEs and analyzing models with them. However, SAEs are not perfect autoencoders and we still lack a thorough understanding of where and how they miss information. But how do we know if an SAE is "good" other than the fact that it has features we can understand? SAEs try to reconstruct activations in language models - but they don't do this perfectly. Imperfect activation reconstruction can lead to substantial downstream cross-entropy (CE) loss increases. Generally "good" SAEs retrieve 80-99% of the CE loss (compared to a generous baseline of zero ablation), but only retrieving 80% of the CE loss is enough to substantially degrade the performance of a model to that of a much smaller model (per scaling laws). The second basic metric often used in SAE evaluation is the average per-token ℓ0 norm of the hidden layer of the autoencoder. Generally this is something in the range of ~10-60 in a "good" autoencoder, which means that the encoder is sparse. Since we don't know how many features are active per token in natural language, it's useful to at least ask how changes in ℓ0 relate to changes in SAE loss values. If high-loss data have drastically different ℓ0 from the SAE's average performance during training, that can be evidence of either off-distribution data (compared to the training data) or some kind of data with more complex information. The imperfect performance of SAEs on these metrics could be explained in a couple of ways: The fundamental assumptions of SAEs are mostly right, but we're bad at training SAEs. 
Perhaps if we learn to train better SAEs, these problems will become less bad. Perhaps we need to accept higher ℓ0 norms (more features active per token). This would not be ideal for interpretability, though. Perhaps there's part of the signal which is dense or hard for an SAE to learn and so we are systematically missing some kind of information. Maybe a more sophisticated sparsity enforcement could help with this. The fundamental assumption...
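Here is a minimal sketch of the evaluation loop the post describes: splice an SAE's reconstruction back into the model with a forward hook and compare next-token cross-entropy before and after. The `reconstruct` callable, the choice of hooked submodule, and the bookkeeping are placeholders to adapt to a specific model and SAE implementation.

```python
import torch
import torch.nn.functional as F

def next_token_ce(model, tokens):
    """Average next-token cross-entropy of a causal LM on a batch of token ids."""
    logits = model(tokens)                                       # [batch, seq, vocab]
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )

def ce_with_reconstruction_patched(model, layer_module, reconstruct, tokens):
    """Replace `layer_module`'s output with its SAE reconstruction during the
    forward pass and re-measure loss. `reconstruct` maps activations to the
    SAE's reconstruction of them (model- and library-specific).
    """
    def hook(module, inputs, output):
        return reconstruct(output)            # returning a value overrides the output
    handle = layer_module.register_forward_hook(hook)
    try:
        with torch.no_grad():
            patched = next_token_ce(model, tokens)
    finally:
        handle.remove()
    return patched

# Degradation on a prompt set = mean(patched CE) - mean(clean CE); the post also
# compares against a zero-ablation baseline to report the fraction of CE recovered.
```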
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Do sparse autoencoders find "true features"?, published by Demian Till on February 22, 2024 on LessWrong. In this post I'll discuss an apparent limitation of sparse autoencoders (SAEs) in their current formulation as they are applied to discovering the latent features within AI models such as transformer-based LLMs. In brief, I'll cover the following:
I'll argue that the L1 regularisation used to promote sparsity when training SAEs may cause neurons in the sparse layer to learn to represent common combinations of features rather than the individual features that we want them to discover.
As well as making it more difficult to understand what the actual latent features are, I'll also argue that this limitation may result in some less common latent features not being discovered at all, not even within combinations.
I'll then explain why I think that the phenomenon of feature splitting observed in Anthropic's SAE paper appears to demonstrate that this limitation does indeed have a large impact on the features discovered by SAEs.
Finally I'll propose an approach for overcoming this limitation and discuss how we can test whether it really brings us closer to finding the real latent features.
Rough definition of "true features"
We intend for SAEs to discover the "true features" (a term I'm borrowing from Anthropic's SAE paper) used by the target model, e.g. a transformer-based LLM. There isn't a universally accepted definition of what "true features" are, but for now I'll use the term somewhat loosely to refer to something like:
linear directions in an activation space at a hidden layer within a target model which encode some reasonably monosemantic quantity, such as the model's "confidence" in some concept being in play;
they should play a causal role in the functioning of the target model, so for example if we were to activate or deactivate the feature while the target model is processing a given input sequence then we should expect the outputs to change accordingly in some reasonably understandable way;
they should be in their most atomic form, so that e.g. an arbitrary linear combination of two "true feature" directions is not necessarily itself a "true feature" direction, even though it may satisfy the previous criteria.
There may be other ways of thinking about features but this should give us enough to work with for our current purposes.
Why SAEs are incentivised to discover combinations of features rather than individual features
Consider a toy setup where one of the hidden layers in the target model has 3 "true features" represented by the following directions in its activation space: Additionally, suppose that feature 1 and feature 2 occur far more frequently than feature 3, and that all features can potentially co-occur in a given activation vector. For the sake of simplicity let's also suppose for now that when features 1 & 2 occur together they tend to both activate with some roughly fixed proportions. For example, an activation vector in which both features 1 and 2 are present (but not feature 3) might look like the following: Now suppose we train an SAE with 3 neurons in the sparse layer on activation vectors from this hidden layer such as the one above. The desirable outcome is that each of the 3 neurons in the sparse layer learns one of the 3 "true features".
If this happens then the directions learnt by the SAE would mirror the directions of the "true features" in the target model, looking something like: However, depending on the respective frequencies of feature 3 vs features 1 & 2, as well as the value of the L1 regularisation weight, I will argue shortly that what may happen is that two of the neurons learn to detect when each of features 1 & 2 respectively occur by themselves, while the third neuron learns to detect when they both occur together. In this case the di...
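The toy setup above is easy to reproduce numerically. The sketch below samples activation vectors from three ground-truth directions, with features 1 and 2 usually co-occurring and feature 3 rare, trains a 3-neuron SAE with an L1 penalty, and then compares the learned decoder directions with the true directions; whether the SAE merges features 1 and 2 into a single neuron depends on the co-occurrence rates and the L1 coefficient, as the post argues. All numbers here are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
d, n_samples = 16, 50_000
true_dirs = torch.nn.functional.normalize(torch.randn(3, d), dim=-1)

# Features 1 & 2 usually fire together; feature 3 is rare.
both_12 = (torch.rand(n_samples, 1) < 0.35).float()
f1 = torch.clamp(both_12 + (torch.rand(n_samples, 1) < 0.05).float(), max=1)
f2 = torch.clamp(both_12 + (torch.rand(n_samples, 1) < 0.05).float(), max=1)
f3 = (torch.rand(n_samples, 1) < 0.02).float()
present = torch.cat([f1, f2, f3], dim=1)
data = present @ true_dirs                       # [n_samples, d] activation vectors

class TinySAE(torch.nn.Module):
    def __init__(self, d, k):
        super().__init__()
        self.enc = torch.nn.Linear(d, k)
        self.dec = torch.nn.Linear(k, d, bias=False)
    def forward(self, x):
        a = torch.relu(self.enc(x))
        return self.dec(a), a

sae, l1_coeff = TinySAE(d, 3), 3e-2
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
for _ in range(2_000):
    batch = data[torch.randint(0, n_samples, (256,))]
    recon, acts = sae(batch)
    loss = (recon - batch).pow(2).mean() + l1_coeff * acts.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# If each learned direction matched one true feature, this matrix would be ~one-hot per row.
dec_dirs = torch.nn.functional.normalize(sae.dec.weight.T, dim=-1)   # [3, d]
print(dec_dirs @ true_dirs.T)
```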
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fixing Feature Suppression in SAEs, published by Benjamin Wright on February 16, 2024 on LessWrong. Produced as part of the ML Alignment Theory Scholars Program - Winter 2023-24 Cohort, in Lee Sharkey's stream. Sparse autoencoders are a method of resolving superposition by recovering linearly encoded "features" inside activations. Unfortunately, despite the great recent success of SAEs at extracting human-interpretable features, they fail to perfectly reconstruct the activations. For instance, Cunningham et al. (2023) note that replacing the residual stream of layer 2 of Pythia-70m with the reconstructed output of an SAE increased the perplexity of the model on the Pile from 25 to 40. It is important for interpretability that the features we extract accurately represent what the model is doing. In this post, I show how and why SAEs have a reconstruction gap due to 'feature suppression'. Then, I look at a few ways to fix this while maintaining SAEs' interpretability. By modifying and fine-tuning a pre-trained SAE, we achieve a 9% decrease in mean square error and a 24% reduction in the perplexity increase upon patching activations into the LLM. Finally, I connect a theoretical example to the observed amounts of feature suppression in Pythia-70m, confirming that features are suppressed primarily based on the strength of their activations, not on their frequency of activation.
Feature Suppression
The architecture of an SAE is: f(x) = ReLU(W_e x + b_e), y = W_d f(x) + b_d. The loss function usually combines an MSE reconstruction loss with a sparsity term, like L(x, f(x), y) = ||y - x||^2 / d + c |f(x)|, where d is the dimension of x. When training the SAE on this loss, the decoder's weight matrix is fixed to have unit norm for each feature (column). The reason for feature suppression is simple: the training loss has two terms, only one of which is reconstruction. Therefore, reconstruction isn't perfect. In particular, the loss function pushes for smaller f(x) values, leading to suppressed features and worse reconstruction.
An illustrative example of feature suppression
As an example, consider the trivial case where there is only one binary feature in one dimension. That is, x = 1 with probability p and x = 0 otherwise. Then, ideally the optimal SAE would extract feature activations of f(x) ∈ {0, 1} and have a decoder with W_d = 1. However, if we were to train an SAE optimizing the loss function L(x, f(x), y) = ||y - x||^2 + c |f(x)|, we get a different result. If we ignore bias terms for simplicity of argument, and say that the encoder outputs feature activation a if x = 1 and 0 otherwise, then the optimization problem becomes:
a* = argmin_a [ p L(1, a, a) + (1 - p) L(0, 0, 0) ] = argmin_a [ (a - 1)^2 + c |a| ] = argmin_a [ a^2 + (c - 2) a + 1 ], giving a* = 1 - c/2.
Therefore the feature is scaled by a factor of 1 - c/2 compared to optimal. This is an example of feature suppression. If we allow the ground truth feature to have an activation strength g upon activation and dimension d, this factor becomes a*/g = 1 - cd/(2g). In other words, instead of having the ground truth activation g, the SAE learns an activation of g - cd/2, a constant amount less. Features with activation strengths below cd/2 would be completely killed off by the SAE.
Feature suppression is a significant problem in current SAEs To experimentally verify that feature suppression affects SAEs, we first trained SAEs on the residual stream output of each layer of Pythia-70m with an L1 sparsity penalty (coefficient 2e-3) on 6 epochs of 100 million tokens of OpenWebText, with batch size 64 and learning rate 1e-3, resulting in roughly 13-80 feature activations per token. The residual stream of Pythia-70m had a dimension size of 512 and we used a dictionary size of 2048, for a four times scale up. If feature suppression had a noticeable effect, we'd see that the SAE reconstructions had noticeably smaller L2 norm...
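The one-dimensional example above can be checked numerically: minimizing (a - 1)^2 + c*|a| by gradient descent converges to a = 1 - c/2 rather than 1, i.e. the activation is suppressed. This is just a sanity check of the algebra, not the post's experimental setup.

```python
import torch

c = 0.4                                    # sparsity coefficient
a = torch.tensor(0.5, requires_grad=True)  # learned feature activation when x = 1
opt = torch.optim.SGD([a], lr=0.05)

for _ in range(2_000):
    loss = (a - 1.0) ** 2 + c * a.abs()    # MSE term + L1 term from the post
    opt.zero_grad(); loss.backward(); opt.step()

print(float(a))   # ≈ 0.8 = 1 - c/2: the activation is suppressed below 1
```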
Josh Lloyd delves into the nuances of a quieter NBA schedule on Super Bowl Sunday, pinpointing the potential impact of just two games on the day's fantasy basketball landscape. He'll dissect the significance of Kevin Huerter, Lu Dort, and Jaime Jaquez within this limited lineup. Tune in to the Locked On Fantasy Basketball Podcast, powered by Basketball Monster, for expert insights on making the most of this unique NBA slate. Vote for my partner to win the Changemaker Award https://www.wishpond.com/lp/2780526/entries/204585428 Support Us By Supporting Our Sponsors! Nissan Our friends at Nissan have a lineup of SUV's with the capabilities to take your adventure to the next level. Take the Nissan Rogue, Nissan Pathfinder, or Nissan Armada and go find your next big adventure. Shop NissanUSA.com. Robinhood Robinhood has the only IRA that gives you a 3% boost on every dollar you contribute when you subscribe to Robinhood Gold. Now through April 30th, Robinhood is even boosting every single dollar you transfer in from other retirement accounts with a 3% match. Available to U.S. customers in good standing. Robinhood Financial LLC (member SIPC), is a registered broker dealer. LinkedIn LinkedIn Jobs helps you find the qualified candidates you want to talk to, faster. Post your job for free at LinkedIn.com/LOCKEDONNBA. Terms and conditions apply. eBay Motors For parts that fit, head to eBay Motors and look for the green check. Stay in the game with eBay Guaranteed Fit at eBayMotos.com. Let's ride. eBay Guaranteed Fit only available to US customers. Eligible items only. Exclusions apply. BetterHelp This episode is sponsored by BetterHelp. Make your brain your friend, with BetterHelp. Visit BetterHelp.com/LOCKEDONNBA today to get 10% off your first month. PrizePicks Go to PrizePicks.com/lockedonnba and use code lockedonnba for a first deposit match up to $100! Gametime Download the Gametime app, create an account, and use code LOCKEDON for $20 off your first purchase. FanDuel Get buckets with your first bet on FanDuel, America's Number One Sportsbook. Right now, NEW customers get ONE HUNDRED AND FIFTY DOLLARS in BONUS BETS with any winning FIVE DOLLAR BET! That's A HUNDRED AND FIFTY BUCKS – if your bet wins! Visit FanDuel.com/LOCKEDON to get started. FANDUEL DISCLAIMER: 21+ in select states. First online real money wager only. Bonus issued as nonwithdrawable free bets that expires in 14 days. Restrictions apply. See terms at sportsbook.fanduel.com. Gambling Problem? Call 1-800-GAMBLER or visit FanDuel.com/RG (CO, IA, MD, MI, NJ, PA, IL, VA, WV), 1-800-NEXT-STEP or text NEXTSTEP to 53342 (AZ), 1-888-789-7777 or visit ccpg.org/chat (CT), 1-800-9-WITH-IT (IN), 1-800-522-4700 (WY, KS) or visit ksgamblinghelp.com (KS), 1-877-770-STOP (LA), 1-877-8-HOPENY or text HOPENY (467369) (NY), TN REDLINE 1-800-889-9789 (TN) Intro Music by Ben Lloyd TikTok Instagram Learn more about your ad choices. Visit podcastchoices.com/adchoices
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small, published by Joseph Isaac Bloom on February 2, 2024 on The AI Alignment Forum.
UPDATE: Since we posted this last night, someone pointed out that our implementation of ghost grads has a non-trivial error (which makes the results a priori quite surprising). We computed the ghost grad forward pass using Exp(Relu(W_enc(x)[dead_neuron_mask])) rather than Exp((W_enc(x)[dead_neuron_mask])). I'm running some ablation experiments now to get to the bottom of this.
This work was produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, under mentorship from Neel Nanda and Arthur Conmy. Funding for this work was provided by the Manifund Regranting Program and donors as well as LightSpeed Grants. This is intended to be a fairly informal post sharing a set of Sparse Autoencoders trained on the residual stream of GPT2-small which achieve fairly good reconstruction performance and contain fairly sparse / interpretable features. More importantly, advice from Anthropic and community members has enabled us to train these far more efficiently / faster than before. The specific methods that were most useful were: ghost gradients, learning rate warmup, and initializing the decoder bias with the geometric median. We discuss each of these in more detail below.
5 Minute Summary
We're publishing a set of 12 Sparse Autoencoders for the GPT2 Small residual stream. These dictionaries have approximately 25,000 features each, with very few dead features (mainly in the early layers) and high quality reconstruction (log loss when the activations are replaced with the output is 3.3 - 3.6, as compared with 3.3 normally). The L0's range from 5 in the first layer to 70 in the 9th SAE (increasing by about 5-10 per layer and dropping in the last two layers). By choosing a fixed dictionary size, we can see how statistics like the number of dead features or reconstruction cross entropy loss change with layer, giving some indication of how properties of the feature distribution change with layer depth. We haven't yet extensively analyzed these dictionaries, but will share automatically generated dashboards we've generated. Readers can access the Sparse Autoencoder weights in this HuggingFace Repo. Training code and code for loading the weights / model and data loaders can be found in this Github Repository. Training curves and feature dashboards can also be found in this wandb report. Users can download all 25k feature dashboards generated for the layer 2 and 10 SAEs and the first 5000 of the layer 5 SAE features here (note the left-hand column of the dashboards should currently be ignored).
Layer | Variance Explained | L1 Loss | L0*   | % Alive Features | Reconstruction CE Log Loss
0     | 99.15%             |   4.58  | 12.24 |  80.0%           | 3.32
1     | 98.37%             |  41.04  | 14.68 |  83.4%           | 3.33
2     | 98.07%             |  51.88  | 18.80 |  80.0%           | 3.37
3     | 96.97%             |  74.96  | 25.75 |  86.3%           | 3.48
4     | 95.77%             |  90.23  | 33.14 |  97.7%           | 3.44
5     | 94.90%             | 108.59  | 43.61 |  99.7%           | 3.45
6     | 93.90%             | 136.07  | 49.68 | 100%             | 3.44
7     | 93.08%             | 138.05  | 57.29 | 100%             | 3.45
8     | 92.57%             | 167.35  | 65.47 | 100%             | 3.45
9     | 92.05%             | 198.42  | 71.10 | 100%             | 3.45
10    | 91.12%             | 215.11  | 53.79 | 100%             | 3.52
11    | 93.30%             | 270.13  | 59.16 | 100%             | 3.57
Original Model | - | - | - | - | 3.3
Summary Statistics for GPT2 Small Residual Stream SAEs. *L0 = Average number of features firing per token.
Training SAEs that we were happy with used to take much longer than it is taking us now.
Last week, it took me 20 hours to train a 50k feature SAE on 1 billion tokens and over the weekend it took 3 hours for us to train 25k SAE on 300M tokens with similar variance explained, L0 and CE loss recovered. We attribute the improvement to having implemented various pieces of advice that have made our lives a lot easier: Ghost Gradients / Avoiding Resampling: Prior to ghost gradients (which we were made aware of last week in the Anthropic Jan...
We attribute the improvement to having implemented various pieces of advice that have made our lives a lot easier: Ghost Gradients / Avoiding Resampling: Prior to ghost gradients (which we were made aware of last week in the Anthropic January Update), we were training SAEs with approximately 50k features on 1 billion tokens with 3 resampling events to reduce the number of dead features. This took around 20 hours and might cost about $10 with an A6000 GPU. With ghost gradients, we don't need to resample (or wait for loss curves to plateau after resampling). Now we can train on only 300M tokens instead. Simultaneously, since we now...
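For readers who want to reproduce the kind of summary statistics reported in the table above (L0, L1, variance explained, fraction of alive features), here is a sketch of how they can be computed from a batch of activations and their SAE reconstructions. The exact definitions used in the original post may differ slightly; treat these as reasonable defaults.

```python
import torch

def sae_summary_stats(x, recon, feature_acts):
    """x, recon: [n, d_model] activations and SAE reconstructions.
    feature_acts: [n, d_dict] SAE feature activations for the same inputs.
    Returns per-layer-style statistics; definitions are reasonable defaults,
    not necessarily identical to the original post's.
    """
    l0 = (feature_acts > 0).float().sum(dim=-1).mean()          # avg features firing per token
    l1 = feature_acts.abs().sum(dim=-1).mean()
    resid_var = (x - recon).pow(2).sum()
    total_var = (x - x.mean(dim=0)).pow(2).sum()
    variance_explained = 1.0 - resid_var / total_var
    pct_alive = (feature_acts > 0).any(dim=0).float().mean()    # fraction of features that ever fire
    return {
        "L0": l0.item(),
        "L1": l1.item(),
        "variance_explained": variance_explained.item(),
        "pct_alive_features": pct_alive.item(),
    }
```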
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sparse Autoencoders Work on Attention Layer Outputs, published by Connor Kissane on January 16, 2024 on The AI Alignment Forum. This post is the result of a 2 week research sprint project during the training phase of Neel Nanda's MATS stream. Executive Summary We replicate Anthropic's MLP Sparse Autoencoder (SAE) paper on attention outputs and it works well: the SAEs learn sparse, interpretable features, which gives us insight into what attention layers learn. We study the second attention layer of a two layer language model (with MLPs). Specifically, rather than training our SAE on attn_output, we train our SAE on "hook_z" concatenated over all attention heads (aka the mixed values aka the attention outputs before a linear map - see notation here). This is valuable as we can see how much of each feature's weights come from each head, which we believe is a promising direction to investigate attention head superposition, although we only briefly explore that in this work. We open source our SAE, you can use it via this Colab notebook . Shallow Dives: We do a shallow investigation to interpret each of the first 50 features. We estimate 82% of non-dead features in our SAE are interpretable (24% of the SAE features are dead). See this feature interface to browse the first 50 features. Deep dives: To verify our SAEs have learned something real, we zoom in on individual features for much more detailed investigations: the "'board' is next by induction" feature, the local context feature of "in questions starting with 'Which'", and the more global context feature of "in texts about pets". We go beyond the techniques from the Anthropic paper, and investigate the circuits used to compute the features from earlier components, including analysing composition with an MLP0 SAE. We also investigate how the features are used downstream, and whether it's via MLP1 or the direct connection to the logits. Automation: We automatically detect and quantify a large "{token} is next by induction" feature family. This represents ~5% of the living features in the SAE. Though the specific automation technique won't generalize to other feature families, this is notable, as if there are many "one feature per vocab token" families like this, we may need impractically wide SAEs for larger models. Introduction In Anthropic's SAE paper, they find that training sparse autoencoders (SAEs) on a one layer model's MLP activations finds interpretable features, providing a path to breakdown these high dimensional activations into units that we can understand. In this post, we demonstrate that the same technique works on attention layer outputs and learns sparse, interpretable features! To see how interpretable our SAE is we perform shallow investigations of the first 50 features of our SAE (i.e. randomly chosen features). We found that 76% are not dead (i.e. activate on at least some inputs), and within the alive features we think 82% are interpretable. To get a feel for the features we find see our interactive visualizations of the first 50. 
Here's one example:[1] Shallow investigations are limited and may be misleading or illusory, so we then do some deep dives to more deeply understand multiple individual features including: "'board' is next, by induction" - one of many "{token} is next by induction" features "In questions starting with 'Which'" - a local context feature, which interestingly is computed by multiple heads "In pet context" - one of many high level context features Similar to the Anthropic paper's "Detailed Investigations", we understand when these features activate and how they affect downstream computation. However, we also go beyond Anthropic's techniques, and look into the upstream circuits by which these features are computed from earlier components. An attention layer (with frozen att...
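Here is a small sketch of the two mechanics described above: concatenating hook_z over heads before the SAE, and then asking how much of a given feature's weight lives in each head's slice of that concatenated space. The decoder shape convention and the use of weight norms (rather than some other attribution method) are assumptions for illustration.

```python
import torch

def concat_heads(z):
    """z ('hook_z'): [batch, seq, n_heads, d_head] -> [batch*seq, n_heads*d_head]."""
    b, s, n_heads, d_head = z.shape
    return z.reshape(b * s, n_heads * d_head)

def per_head_weight_fraction(W_dec, n_heads, d_head):
    """W_dec: [d_dict, n_heads*d_head] SAE decoder (assumed layout). For each
    feature, the share of its weight norm attributable to each head's slice.
    """
    W = W_dec.reshape(-1, n_heads, d_head)
    head_norms = W.norm(dim=-1)                        # [d_dict, n_heads]
    return head_norms / head_norms.sum(dim=-1, keepdim=True)

# e.g. fractions = per_head_weight_fraction(sae.W_dec, n_heads=8, d_head=64)
# A row of `fractions` close to one-hot suggests that feature is computed by a single head.
```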
We are running an end of year survey for our listeners. Let us know any feedback you have for us, what episodes resonated with you the most, and guest requests for 2024!
RAG has emerged as one of the key pieces of the AI Engineer stack. Jerry from LlamaIndex called it a "hack", Bryan from Hex compared it to "a recommendation system from LLMs", and even LangChain started with it. RAG is crucial in any AI coding workflow. We talked about context quality for code in our Phind episode. Today's guests, Beyang Liu and Steve Yegge from Sourcegraph, have been focused on code indexing and retrieval for over 15 years. We locked them in our new studio to record a 1.5-hour masterclass on the history of code search, retrieval interfaces for code, and how they get a SOTA 30% completion acceptance rate in their Cody product by being better at the "bin packing problem" of LLM context generation.
Google Grok → Sourcegraph → Cody
While at Google in 2008, Steve built Grok, which lives on today as Google Kythe. It allowed engineers to do code parsing and searching across different codebases and programming languages. (You might remember this blog post from Steve's time at Google.) Beyang was an intern at Google at the same time, and Grok became the inspiration to start Sourcegraph in 2013. The two didn't know each other personally until Beyang brought Steve out of retirement 9 years later to join him as VP Engineering. Fast forward 10 years, Sourcegraph has become the best code search tool out there and raised $223M along the way. Nine months ago, they open sourced Sourcegraph Cody, their AI coding assistant. All their code indexing and search infrastructure allows them to get SOTA results by having better RAG than competitors:
* Code completions as you type that achieve an industry-best Completion Acceptance Rate (CAR) as high as 30% using a context-enhanced open-source LLM (StarCoder)
* Context-aware chat that provides the option of using GPT-4 Turbo, Claude 2, GPT-3.5 Turbo, Mistral 7x8B, or Claude Instant, with more model integrations planned
* Doc and unit test generation, along with AI quick fixes for common coding errors
* AI-enhanced natural language code search, powered by a hybrid dense/sparse vector search engine
There are a few pieces of infrastructure that helped Cody achieve these results:
Dense-sparse vector retrieval system
For many people, RAG = vector similarity search, but there's a lot more that you can do to get the best possible results. From their release: "Sparse vector search" is a fancy name for keyword search that potentially incorporates LLMs for things like ranking and term expansion (e.g., "k8s" expands to "Kubernetes container orchestration", possibly weighted as in SPLADE):
* Dense vector retrieval makes use of embeddings, the internal representation that LLMs use to represent text. Dense vector retrieval provides recall over a broader set of results that may have no exact keyword matches but are still semantically similar.
* Sparse vector retrieval is very fast, human-understandable, and yields high recall of results that closely match the user query.
* We've found the approaches to be complementary.
There's a very good blog post by Pinecone on SPLADE for sparse vector search if you're interested in diving in. If you're building RAG applications in areas that have a lot of industry-specific nomenclature, acronyms, etc., this is a good approach to getting better results (a toy sketch of hybrid retrieval appears at the end of this transcript excerpt).
SCIP
In 2016, Microsoft announced the Language Server Protocol (LSP) and the Language Server Index Format (LSIF).
This protocol makes it easy for IDEs to get all the context they need from a codebase to get things like file search, references, “go to definition”, etc. SourceGraph developed SCIP, “a better code indexing format than LSIF”:* Simpler and More Efficient Format: SCIP utilizes Protobuf instead of JSON, which is used by LSIF. Protobuf is more space-efficient, simpler, and more suitable for systems programming. * Better Performance and Smaller Index Sizes: SCIP indexers, such as scip-clang, show enhanced performance and reduced index file sizes compared to LSIF indexers (10%-20% smaller)* Easier to Develop and Debug: SCIP's design, centered around human-readable string IDs for symbols, makes it faster and more straightforward to develop new language indexers. Having more efficient indexing is key to more performant RAG on code. Show Notes* Sourcegraph* Cody* Copilot vs Cody* Steve's Stanford seminar on Grok* Steve's blog* Grab* Fireworks* Peter Norvig* Noam Chomsky* Code search* Kelly Norton* Zoekt* v0.devSee also our past episodes on Cursor, Phind, Codeium and Codium as well as the GitHub Copilot keynote at AI Engineer Summit.Timestamps* [00:00:00] Intros & Backgrounds* [00:05:20] How Steve's work on Grok inspired SourceGraph for Beyang* [00:08:10] What's Cody?* [00:11:22] Comparison of coding assistants and the capabilities of Cody* [00:16:00] The importance of context (RAG) in AI coding tools* [00:21:33] The debate between Chomsky and Norvig approaches in AI* [00:30:06] Normsky: the Norvig + Chomsky models collision* [00:36:00] The death of the DSL?* [00:40:00] LSP, Skip, Kythe, BFG, and all that fun stuff* [00:53:00] The SourceGraph internal stack* [00:58:46] Building on open source models* [01:02:00] SourceGraph for engineering managers?* [01:12:00] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:16]Swyx: Hey, and today we're christening our new podcast studio in the Newton, and we have Beyang and Steve from Sourcegraph. Welcome. [00:00:25]Beyang: Hey, thanks for having us. [00:00:26]Swyx: So this has been a long time coming. I'm very excited to have you. We also are just celebrating the one year anniversary of ChatGPT yesterday, but also we'll be talking about the GA of Cody later on today. We'll just do a quick intros of both of you. Obviously, people can research you and check the show notes for more. Beyang, you worked in computer vision at Stanford and then you worked at Palantir. I did, yeah. You also interned at Google. [00:00:48]Beyang: I did back in the day where I get to use Steve's system, DevTool. [00:00:53]Swyx: Right. What was it called? [00:00:55]Beyang: It was called Grok. Well, the end user thing was Google Code Search. That's what everyone called it, or just like CS. But the brains of it were really the kind of like Trigram index and then Grok, which provided the reference graph. [00:01:07]Steve: Today it's called Kythe, the open source Google one. It's sort of like Grok v3. [00:01:11]Swyx: On your podcast, which you've had me on, you've interviewed a bunch of other code search developers, including the current developer of Kythe, right? [00:01:19]Beyang: No, we didn't have any Kythe people on, although we would love to if they're up for it. We had Kelly Norton, who built a similar system at Etsy, it's an open source project called Hound. 
We also had Han-Wen Nienhuys, who created Zoekt, which is, I think, heavily inspired by the Trigram index that powered Google's original code search and that we also now use at Sourcegraph. Yeah. [00:01:45]Swyx: So you teamed up with Quinn over 10 years ago to start Sourcegraph and you were indexing all code on the internet. And now you're in a perfect spot to create a code intelligence startup. Yeah, yeah. [00:01:56]Beyang: I guess the backstory was, I used Google Code Search while I was an intern. And then after I left that internship and worked elsewhere, it was the single dev tool that I missed the most. I felt like my job was just a lot more tedious and much more of a hassle without it. And so when Quinn and I started working together at Palantir, he had also used various code search engines in open source over the years. And it was just a pain point that we both felt, both working on code at Palantir and also working within Palantir's clients, which were a lot of Fortune 500 companies, large financial institutions, folks like that. And if anything, the pains they felt in dealing with large complex code bases made our pain points feel small by comparison. So that was really the impetus for starting Sourcegraph. [00:02:42]Swyx: Yeah, excellent. Steve, you famously worked at Amazon. And you've told many, many stories. I want every single listener of Latent Space to check out Steve's YouTube because he effectively had a podcast that you didn't tell anyone about or something. You just hit record and just went on a few rants. I'm always here for your Stevie rants. And then you moved to Google, where you also had some interesting thoughts on just the overall Google culture versus Amazon. You joined Grab as head of eng for a couple of years. I'm from Singapore, so I have actually personally used a lot of Grab's features. And it was very interesting to see you talk so highly of Grab's engineering and sort of overall prospects. [00:03:21]Steve: Because as a customer, it sucked? [00:03:22]Swyx: Yeah, no, it's just like, being from a smaller country, you never see anyone from our home country being on a global stage or talked about as a startup that people admire or look up to, like on the league that you, with all your legendary experience, would consider equivalent. Yeah. [00:03:41]Steve: Yeah, no, absolutely. They actually, they didn't even know that they were as good as they were, in a sense. They started hiring a bunch of people from Silicon Valley to come in and sort of like fix it. And we came in and we were like, Oh, we could have been a little better operational excellence and stuff. But by and large, they're really sharp. The only thing about Grab is that they get criticized a lot for being too westernized. Oh, by who? By Singaporeans who don't want to work there. [00:04:06]Swyx: Okay. I guess I'm biased because I'm here, but I don't see that as a problem. If anything, they've had their success because they were more westernized than the Sanders Singaporean tech company. [00:04:15]Steve: I mean, they had their success because they are laser focused. They copy to Amazon. I mean, they're executing really, really, really well for a giant. I was on a slack with 2,500 engineers. It was like this giant waterfall that you could dip your toe into. You'd never catch up. Actually, the AI summarizers would have been really helpful there. But yeah, no, I think Grab is successful because they're just out there with their sleeves rolled up, just making it happen. 
[00:04:43]Swyx: And for those who don't know, it's not just like Uber of Southeast Asia, it's also a super app. PayPal Plus. [00:04:48]Steve: Yeah. [00:04:49]Swyx: In the way that super apps don't exist in the West. It's one of the enduring mysteries of B2C that super apps work in the East and don't work in the West. We just don't understand it. [00:04:57]Beyang: Yeah. [00:04:58]Steve: It's just kind of curious. They didn't work in India either. And it was primarily because of bandwidth reasons and smaller phones. [00:05:03]Swyx: That should change now. It should. [00:05:05]Steve: And maybe we'll see a super app here. [00:05:08]Swyx: You retired-ish? I did. You retired-ish on your own video game? Mm-hmm. Any fun stories about that? And that's also where you discovered some need for code search, right? Mm-hmm. [00:05:16]Steve: Sure. A need for a lot of stuff. Better programming languages, better databases. Better everything. I mean, I started in like 95, right? Where there was kind of nothing. Yeah. Yeah. [00:05:24]Beyang: I just want to say, I remember when you first went to Grab because you wrote that blog post talking about why you were excited about it, about like the expanding Asian market. And our reaction was like, oh, man, how did we miss stealing it with you? [00:05:36]Swyx: Hiring you. [00:05:37]Beyang: Yeah. [00:05:38]Steve: I was like, miss that. [00:05:39]Swyx: Tell that story. So how did this happen? Right? So you were inspired by Grok. [00:05:44]Beyang: I guess the backstory from my point of view is I had used code search and Grok while at Google, but I didn't actually know that it was connected to you, Steve. I knew you from your blog posts, which were always excellent, kind of like inside, very thoughtful takes from an engineer's perspective on some of the challenges facing tech companies and tech culture and that sort of thing. But my first introduction to you within the context of code intelligence, code understanding was I watched a talk that you gave, I think at Stanford, about Grok when you're first building it. And that was very eye opening. I was like, oh, like that guy, like the guy who, you know, writes the extremely thoughtful ranty like blog posts also built that system. And so that's how I knew, you know, you were involved in that. And then, you know, we always wanted to hire you, but never knew quite how to approach you or, you know, get that conversation started. [00:06:34]Steve: Well, we got introduced by Max, right? Yeah. It was temporal. Yeah. Yeah. I mean, it was a no brainer. They called me up and I had noticed when Sourcegraph had come out. Of course, when they first came out, I had this dagger of jealousy stabbed through me piercingly, which I remember because I am not a jealous person by any means, ever. But boy, I was like, but I was kind of busy, right? And just one thing led to another. I got sucked back into the ads vortex and whatever. So thank God Sourcegraph actually kind of rescued me. [00:07:05]Swyx: Here's a chance to build DevTools. Yeah. [00:07:08]Steve: That's the best. DevTools are the best. [00:07:10]Swyx: Cool. Well, so that's the overall intro. I guess we can get into Cody. Is there anything else that like people should know about you before we get started? [00:07:18]Steve: I mean, everybody knows I'm a musician. I can juggle five balls. [00:07:24]Swyx: Five is good. Five is good. I've only ever managed three. [00:07:27]Steve: Five is hard. Yeah. And six, a little bit. [00:07:30]Swyx: Wow. [00:07:31]Beyang: That's impressive. 
[00:07:32]Alessio: So yeah, to jump into Sourcegraph, this has been a company 10 years in the making. And as Sean said, now you're at the right place. Phase two. Now, exactly. You spent 10 years collecting all this code, indexing, making it easy to surface it. Yeah. [00:07:47]Swyx: And also learning how to work with enterprises and having them trust you with their code bases. Yeah. [00:07:52]Alessio: Because initially you were only doing on-prem, right? Like a lot of like VPC deployments. [00:07:55]Beyang: So in the very early days, we're cloud only. But the first major customers we landed were all on-prem, self-hosted. And that was, I think, related to the nature of the problem that we're solving, which becomes just like a critical, unignorable pain point once you're above like 100 devs or so. [00:08:11]Alessio: Yeah. And now Cody is going to be GA by the time this releases. So congrats to your future self for launching this in two weeks. Can you give a quick overview of just what Cody is? I think everybody understands that it's a AI coding agent, but a lot of companies say they have a AI coding agent. So yeah, what does Cody do? How do people interface with it? [00:08:32]Beyang: Yeah. So how is it different from the like several dozen other AI coding agents that exist in the market now? When we thought about building a coding assistant that would do things like code generation and question answering about your code base, I think we came at it from the perspective of, you know, we've spent the past decade building the world's best code understanding engine for human developers, right? So like it's kind of your guide as a human dev if you want to go and dive into a large complex code base. And so our intuition was that a lot of the context that we're providing to human developers would also be useful context for AI developers to consume. And so in terms of the feature set, Cody is very similar to a lot of other assistants. It does inline autocompletion. It does code base aware chat. It does specific commands that automate, you know, tasks that you might rather not want to do like generating unit tests or adding detailed documentation. But we think the core differentiator is really the quality of the context, which is hard to kind of describe succinctly. It's a bit like saying, you know, what's the difference between Google and Alta Vista? There's not like a quick checkbox list of features that you can rattle off, but it really just comes down to all the attention and detail that we've paid to making that context work well and be high quality and fast for human devs. We're now kind of plugging into the AI coding assistant as well. Yeah. [00:09:53]Steve: I mean, just to add my own perspective on to what Beyang just described, RAG is kind of like a consultant that the LLM has available, right, that knows about your code. RAG provides basically a bridge to a lookup system for the LLM, right? Whereas fine tuning would be more like on the job training for somebody. If the LLM is a person, you know, and you send them to a new job and you do on the job training, that's what fine tuning is like, right? So tuned to our specific task. You're always going to need that expert, even if you get the on the job training, because the expert knows your particular code base, your task, right? That expert has to know your code. And there's a chicken and egg problem because, right, you know, we're like, well, I'm going to ask the LLM about my code, but first I have to explain it, right? 
It's this chicken and egg problem. That's where RAG comes in. And we have the best consultants, right? The best assistant who knows your code. And so when you sit down with Cody, right, what Beyang said earlier about going to Google and using code search and then starting to feel like without it, his job was super tedious. Once you start using these, do you guys use coding assistants? [00:10:53]Swyx: Yeah, right. [00:10:54]Steve: I mean, like we're getting to the point very quickly, right? Where you feel like almost like you're programming without the internet, right? Or something, you know, it's like you're programming back in the nineties without the coding assistant. Yeah. Hopefully that helps for people who have like no idea about coding systems, what they are. [00:11:09]Swyx: Yeah. [00:11:10]Alessio: I mean, going back to using them, we had a lot of them on the podcast already. We had Cursor, we have Codium and Codium, very similar names. [00:11:18]Swyx: Yeah. Find, and then of course there's Copilot. [00:11:22]Alessio: You had a Copilot versus Cody blog post, and I think it really shows the context improvement. So you had two examples that stuck with me. One was, what does this application do? And the Copilot answer was like, oh, it uses JavaScript and NPM and this. And it's like, but that's not what it does. You know, that's what it's built with. Versus Cody was like, oh, these are like the major functions. And like, these are the functionalities and things like that. And then the other one was, how do I start this up? And Copilot just said NPM start, even though there was like no start command in the package JSON, but you know, most collapse, right? Most projects use NPM start. So maybe this does too. How do you think about open source models? Because Copilot has their own private thing. And I think you guys use Starcoder, if I remember right. Yeah, that's correct. [00:12:09]Beyang: I think Copilot uses some variant of Codex. They're kind of cagey about it. I don't think they've like officially announced what model they use. [00:12:16]Swyx: And I think they use a range of models based on what you're doing. Yeah. [00:12:19]Beyang: So everyone uses a range of model. Like no one uses the same model for like inline completion versus like chat because the latency requirements for. Oh, okay. Well, there's fill in the middle. There's also like what the model's trained on. So like we actually had completions powered by Claude Instant for a while. And but you had to kind of like prompt hack your way to get it to output just the code and not like, hey, you know, here's the code you asked for, like that sort of text. So like everyone uses a range of models. We've kind of designed Cody to be like especially model, not agnostic, but like pluggable. So one of our kind of design considerations was like as the ecosystem evolves, we want to be able to integrate the best in class models, whether they're proprietary or open source into Cody because the pace of innovation in the space is just so quick. And I think that's been to our advantage. Like today, Cody uses Starcoder for inline completions. And with the benefit of the context that we provide, we actually show comparable completion acceptance rate metrics. It's kind of like the standard metric that folks use to evaluate inline completion quality. It's like if I show you a completion, what's the chance that you actually accept the completion versus you reject it? And so we're at par with Copilot, which is at the head of that industry right now. 
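The completion acceptance rate described above is simple to state precisely. A minimal sketch follows, with an invented event shape rather than any real telemetry schema:

```python
from dataclasses import dataclass

@dataclass
class CompletionEvent:
    """One logged inline suggestion; field names are illustrative only."""
    shown: bool
    accepted: bool

def acceptance_rate(events: list[CompletionEvent]) -> float:
    """Fraction of shown completions that the developer accepted."""
    shown = [e for e in events if e.shown]
    if not shown:
        return 0.0
    return sum(e.accepted for e in shown) / len(shown)

events = [CompletionEvent(True, True), CompletionEvent(True, False),
          CompletionEvent(True, True), CompletionEvent(False, False)]
print(f"{acceptance_rate(events):.0%}")  # -> 67%
```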
And we've been able to do that with the Starcoder model, which is open source and the benefit of the context fetching stuff that we provide. And of course, a lot of like prompt engineering and other stuff along the way. [00:13:40]Alessio: And Steve, you wrote a post called cheating is all you need about what you're building. And one of the points you made is that everybody's fighting on the same axis, which is better UI and the IDE, maybe like a better chat response. But data modes are kind of the most important thing. And you guys have like a 10 year old mode with all the data you've been collecting. How do you kind of think about what other companies are doing wrong, right? Like, why is nobody doing this in terms of like really focusing on RAG? I feel like you see so many people. Oh, we just got a new model. It's like a bit human eval. And it's like, well, but maybe like that's not what we should really be doing, you know? Like, do you think most people underestimate the importance of like the actual RAG in code? [00:14:21]Steve: I think that people weren't doing it much. It wasn't. It's kind of at the edges of AI. It's not in the center. I know that when ChatGPT launched, so within the last year, I've heard a lot of rumblings from inside of Google, right? Because they're undergoing a huge transformation to try to, you know, of course, get into the new world. And I heard that they told, you know, a bunch of teams to go and train their own models or fine tune their own models, right? [00:14:43]Swyx: Both. [00:14:43]Steve: And, you know, it was a s**t show. Nobody knew how to do it. They launched two coding assistants. One was called Code D with an EY. And then there was, I don't know what happened in that one. And then there's Duet, right? Google loves to compete with themselves, right? They do this all the time. And they had a paper on Duet like from a year ago. And they were doing exactly what Copilot was doing, which was just pulling in the local context, right? But fundamentally, I thought of this because we were talking about the splitting of the [00:15:10]Swyx: models. [00:15:10]Steve: In the early days, it was the LLM did everything. And then we realized that for certain use cases, like completions, that a different, smaller, faster model would be better. And that fragmentation of models, actually, we expected to continue and proliferate, right? Because we are fundamentally, we're a recommender engine right now. Yeah, we're recommending code to the LLM. We're saying, may I interest you in this code right here so that you can answer my question? [00:15:34]Swyx: Yeah? [00:15:34]Steve: And being good at recommender engine, I mean, who are the best recommenders, right? There's YouTube and Spotify and, you know, Amazon or whatever, right? Yeah. [00:15:41]Swyx: Yeah. [00:15:41]Steve: And they all have many, many, many, many, many models, right? For all fine-tuned for very specific, you know. And that's where we're heading in code, too. Absolutely. [00:15:50]Swyx: Yeah. [00:15:50]Alessio: We just did an episode we released on Wednesday, which we said RAG is like Rexis or like LLMs. You're basically just suggesting good content. [00:15:58]Swyx: It's like what? Recommendations. [00:15:59]Beyang: Recommendations. [00:16:00]Alessio: Oh, got it. [00:16:01]Steve: Yeah, yeah, yeah. [00:16:02]Swyx: So like the naive implementation of RAG is you embed everything, throw it in a vector database, you embed your query, and then you find the nearest neighbors, and that's your RAG. 
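The naive RAG loop described next — embed everything, store the vectors, embed the query, take the nearest neighbors — looks roughly like the sketch below. The `embed()` function is a stand-in for a real embedding model, used only to keep the example runnable:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a normalized character-frequency vector.
    # A real system would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class NaiveRAG:
    """Embed every chunk, keep the vectors, return nearest neighbors."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks
        self.vectors = [embed(c) for c in chunks]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(zip(self.chunks, self.vectors),
                        key=lambda cv: cosine(q, cv[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

rag = NaiveRAG([
    "def start_server(port): ...",
    "def parse_config(path): ...",
    "README: run `npm start` to boot the dev server",
])
context = rag.retrieve("how do I start this up?")
prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)  # the retrieved chunks get stuffed into the LLM prompt
```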
But actually, you need to rank it. And actually, you need to make sure there's sample diversity and that kind of stuff. And then you're like slowly gradient dissenting yourself towards rediscovering proper Rexis, which has been traditional ML for a long time. But like approaching it from an LLM perspective. Yeah. [00:16:24]Beyang: I almost think of it as like a generalized search problem because it's a lot of the same things. Like you want your layer one to have high recall and get all the potential things that could be relevant. And then there's typically like a layer two re-ranking mechanism that bumps up the precision and tries to get the relevant stuff to the top of the results list. [00:16:43]Swyx: Have you discovered that ranking matters a lot? Oh, yeah. So the context is that I think a lot of research shows that like one, context utilization matters based on model. Like GPT uses the top of the context window, and then apparently Claude uses the bottom better. And it's lossy in the middle. Yeah. So ranking matters. No, it really does. [00:17:01]Beyang: The skill with which models are able to take advantage of context is always going to be dependent on how that factors into the impact on the training loss. [00:17:10]Swyx: Right? [00:17:10]Beyang: So like if you want long context window models to work well, then you have to have a ton of data where it's like, here's like a billion lines of text. And I'm going to ask a question about like something that's like, you know, embedded deeply into it and like, give me the right answer. And unless you have that training set, then of course, you're going to have variability in terms of like where it attends to. And in most kind of like naturally occurring data, the thing that you're talking about right now, the thing I'm asking you about is going to be something that we talked about recently. [00:17:36]Swyx: Yeah. [00:17:36]Steve: Did you really just say gradient dissenting yourself? Actually, I love that it's entered the casual lexicon. Yeah, yeah, yeah. [00:17:44]Swyx: My favorite version of that is, you know, how we have to p-hack papers. So, you know, when you throw humans at the problem, that's called graduate student dissent. That's great. It's really awesome. [00:17:54]Alessio: I think the other interesting thing that you have is this inline assist UX that I wouldn't say async, but like it works while you can also do work. So you can ask Cody to make changes on a code block and you can still edit the same file at the same time. [00:18:07]Swyx: Yeah. [00:18:07]Alessio: How do you see that in the future? Like, do you see a lot of Cody's running together at the same time? Like, how do you validate also that they're not messing each other up as they make changes in the code? And maybe what are the limitations today? And what do you think about where the attack is going? [00:18:21]Steve: I want to start with a little history and then I'm going to turn it over to Bian, all right? So we actually had this feature in the very first launch back in June. Dominic wrote it. It was called nonstop Cody. And you could have multiple, basically, LLM requests in parallel modifying your source [00:18:37]Swyx: file. [00:18:37]Steve: And he wrote a bunch of code to handle all of the diffing logic. And you could see the regions of code that the LLM was going to change, right? And he was showing me demos of it. And it just felt like it was just a little before its time, you know? 
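A minimal sketch of the layer-one/layer-two split described above: a cheap, high-recall candidate pass followed by a more precise re-ranker. Both scoring functions here are invented stand-ins, not Sourcegraph's ranking:

```python
def recall_layer(query: str, corpus: list[str], k: int = 50) -> list[str]:
    """Layer 1: cheap, high-recall candidate generation (keyword overlap here)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]

def rerank_layer(query: str, candidates: list[str], k: int = 5) -> list[str]:
    """Layer 2: slower, higher-precision re-ranking. A real system might use
    a cross-encoder or learned ranker; this is a length-penalized overlap."""
    terms = set(query.lower().split())

    def score(doc: str) -> float:
        words = doc.lower().split()
        return len(terms & set(words)) / (1 + abs(len(words) - len(terms)))

    return sorted(candidates, key=score, reverse=True)[:k]

corpus = ["http client retry logic", "retry backoff for http requests",
          "unrelated billing code", "http server routing table"]
candidates = recall_layer("http retry", corpus)
print(rerank_layer("http retry", candidates, k=2))
```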
But a bunch of that stuff, that scaffolding was able to be reused for where we're inline [00:18:56]Swyx: sitting today. [00:18:56]Steve: How would you characterize it today? [00:18:58]Beyang: Yeah, so that interface has really evolved from a, like, hey, general purpose, like, request anything inline in the code and have the code update to really, like, targeted features, like, you know, fix the bug that exists at this line or request a very specific [00:19:13]Swyx: change. [00:19:13]Beyang: And the reason for that is, I think, the challenge that we ran into with inline fixes, and we do want to get to the point where you could just fire and forget and have, you know, half a dozen of these running in parallel. But I think we ran into the challenge early on that a lot of people are running into now when they're trying to construct agents, which is the reliability of, you know, working code generation is just not quite there yet in today's language models. And so that kind of constrains you to an interaction where the human is always, like, in the inner loop, like, checking the output of each response. And if you want that to work in a way where you can be asynchronous, you kind of have to constrain it to a domain where today's language models can generate reliable code well enough. So, you know, generating unit tests, that's, like, a well-constrained problem. Or fixing a bug that shows up as, like, a compiler error or a test error, that's a well-constrained problem. But the more general, like, hey, write me this class that does X, Y, and Z using the libraries that I have, that is not quite there yet, even with the benefit of really good context. Like, it definitely moves the needle a lot, but we're not quite there yet to the point where you can just fire and forget. And I actually think that this is something that people don't broadly appreciate yet, because I think that, like, everyone's chasing this dream of agentic execution. And if we're to really define that down, I think it implies a couple things. You have, like, a multi-step process where each step is fully automated. We don't have to have a human in the loop every time. And there's also kind of like an LM call at each stage or nearly every stage in that [00:20:45]Swyx: chain. [00:20:45]Beyang: Based on all the work that we've done, you know, with the inline interactions, with kind of like general Codyfeatures for implementing longer chains of thought, we're actually a little bit more bearish than the average, you know, AI hypefluencer out there on the feasibility of agents with purely kind of like transformer-based models. To your original question, like, the inline interactions with CODI, we actually constrained it to be more targeted, like, you know, fix the current error or make this quick fix. I think that that does differentiate us from a lot of the other tools on the market, because a lot of people are going after this, like, shnazzy, like, inline edit interaction, whereas I think where we've moved, and this is based on the user feedback that we've gotten, it's like that sort of thing, it demos well, but when you're actually coding day to day, you don't want to have, like, a long chat conversation inline with the code base. That's a waste of time. You'd rather just have it write the right thing and then move on with your life or not have to think about it. And that's what we're trying to work towards. [00:21:37]Steve: I mean, yeah, we're not going in the agent direction, right? 
I mean, I'll believe in agents when somebody shows me one that works. Yeah. Instead, we're working on, you know, sort of solidifying our strength, which is bringing the right context in. So new context sources, ways for you to plug in your own context, ways for you to control or influence the context, you know, the mixing that happens before the request goes out, etc. And there's just so much low-hanging fruit left in that space that, you know, agents seems like a little bit of a boondoggle. [00:22:03]Beyang: Just to dive into that a little bit further, like, I think, you know, at a very high level, what do people mean when they say agents? They really mean, like, greater automation, fully automated, like, the dream is, like, here's an issue, go implement that. And I don't have to think about it as a human. And I think we are working towards that. Like, that is the eventual goal. I think it's specifically the approach of, like, hey, can we have a transformer-based LM alone be the kind of, like, backbone or the orchestrator of these agentic flows? Where we're a little bit more bearish today. [00:22:31]Swyx: You want the human in the loop. [00:22:32]Beyang: I mean, you kind of have to. It's just a reality of the behavior of language models that are purely, like, transformer-based. And I think that's just like a reflection of reality. And I don't think people realize that yet. Because if you look at the way that a lot of other AI tools have implemented context fetching, for instance, like, you see this in the Copilot approach, where if you use, like, the at-workspace thing that supposedly provides, like, code-based level context, it has, like, an agentic approach where you kind of look at how it's behaving. And it feels like they're making multiple requests to the LM being like, what would you do in this case? Would you search for stuff? What sort of files would you gather? Go and read those files. And it's like a multi-hop step, so it takes a long while. It's also non-deterministic. Because any sort of, like, LM invocation, it's like a dice roll. And then at the end of the day, the context it fetches is not that good. Whereas our approach is just like, OK, let's do some code searches that make sense. And then maybe, like, crawl through the reference graph a little bit. That is fast. That doesn't require any sort of LM invocation at all. And we can pull in much better context, you know, very quickly. So it's faster. [00:23:37]Swyx: It's more reliable. [00:23:37]Beyang: It's deterministic. And it yields better context quality. And so that's what we think. We just don't think you should cargo cult or naively go like, you know, agents are the [00:23:46]Swyx: future. [00:23:46]Beyang: Let's just try to, like, implement agents on top of the LM that exists today. I think there are a couple of other technologies or approaches that need to be refined first before we can get into these kind of, like, multi-stage, fully automated workflows. [00:24:00]Swyx: It makes sense. You know, we're very much focused on developer inner loop right now. But you do see things eventually moving towards developer outer loop. Yeah. So would you basically say that they're tackling the agent's problem that you don't want to tackle? [00:24:11]Beyang: No, I would say at a high level, we are after maybe, like, the same high level problem, which is like, hey, I want some code written. I want to develop some software and can automate a system. Go build that software for me. I think the approaches might be different. 
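The deterministic context-fetching approach contrasted above with agentic multi-hop fetching can be sketched as plain code search plus one hop of a reference graph, with no LLM calls in the loop. The data structures are invented for illustration, not Sourcegraph's internals:

```python
def code_search(query: str, files: dict[str, str]) -> list[str]:
    """Return names of files whose content mentions the query string."""
    return [name for name, content in files.items() if query in content]

def crawl_references(seeds: list[str], ref_graph: dict[str, list[str]],
                     hops: int = 1) -> list[str]:
    """Expand the seed files outward along a precomputed reference graph."""
    frontier, seen = list(seeds), set(seeds)
    for _ in range(hops):
        frontier = [dst for src in frontier for dst in ref_graph.get(src, [])
                    if dst not in seen]
        seen.update(frontier)
    return sorted(seen)

files = {
    "auth.py": "def verify_token(tok): ...",
    "api.py": "from auth import verify_token",
    "billing.py": "def charge(card): ...",
}
ref_graph = {"auth.py": ["api.py"]}  # auth.py is referenced by api.py

seeds = code_search("verify_token", files)
print(crawl_references(seeds, ref_graph))  # fast, deterministic context set
```

Because nothing here rolls the dice on a model call, the same query always yields the same context, which is the reliability property being argued for.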
So I think the analogy in my mind is, I think about, like, the AI chess players. Coding, in some senses, I mean, it's similar and dissimilar to chess. I think one question I ask is, like, do you think producing code is more difficult than playing chess or less difficult than playing chess? More. [00:24:41]Swyx: I think more. [00:24:41]Beyang: Right. And if you look at the best AI chess players, like, yes, you can use an LLM to play chess. Like, people have showed demos where it's like, oh, like, yeah, GPT-4 is actually a pretty decent, like, chess move suggester. Right. But you would never build, like, a best in class chess player off of GPT-4 alone. [00:24:57]Swyx: Right. [00:24:57]Beyang: Like, the way that people design chess players is that you have kind of like a search space and then you have a way to explore that search space efficiently. There's a bunch of search algorithms, essentially. We were doing tree search in various ways. And you can have heuristic functions, which might be powered by an LLM. [00:25:12]Swyx: Right. [00:25:12]Beyang: Like, you might use an LLM to generate proposals in that space that you can efficiently explore. But the backbone is still this kind of more formalized tree search based approach rather than the LLM itself. And so I think my high level intuition is that, like, the way that we get to more reliable multi-step workflows that do things beyond, you know, generate unit test, it's really going to be like a search based approach where you use an LLM as kind of like an advisor or a proposal function, sort of your heuristic function, like the ASTAR search algorithm. But it's probably not going to be the thing that is the backbone, because I guess it's not the right tool for that. Yeah. [00:25:50]Swyx: I can see yourself kind of thinking through this, but not saying the words, the sort of philosophical Peter Norvig type discussion. Maybe you want to sort of introduce that in software. Yeah, definitely. [00:25:59]Beyang: So your listeners are savvy. They're probably familiar with the classic like Chomsky versus Norvig debate. [00:26:04]Swyx: No, actually, I wanted, I was prompting you to introduce that. Oh, got it. [00:26:08]Beyang: So, I mean, if you look at the history of artificial intelligence, right, you know, it goes way back to, I don't know, it's probably as old as modern computers, like 50s, 60s, 70s. People are debating on like, what is the path to producing a sort of like general human level of intelligence? And kind of two schools of thought that emerged. One is the Norvig school of thought, which roughly speaking includes large language models, you know, regression, SVN, basically any model that you kind of like learn from data. And it's like data driven. Most of machine learning would fall under this umbrella. And that school of thought says like, you know, just learn from the data. That's the approach to reaching intelligence. And then the Chomsky approach is more things like compilers and parsers and formal systems. So basically like, let's think very carefully about how to construct a formal, precise system. And that will be the approach to how we build a truly intelligent system. I think Lisp was invented so that you could create like rules-based systems that you would call AI. As a language. Yeah. And for a long time, there was like this debate, like there's certain like AI research labs that were more like, you know, in the Chomsky camp and others that were more in the Norvig camp. It's a debate that rages on today. 
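A toy rendition of the chess-engine analogy above: a classical best-first search is the backbone, and `propose()` and `score()` mark the places where an LLM could act as a proposal or heuristic function. Both are stubs here, not model calls:

```python
import heapq

GOAL = "abcde"

def propose(state: str) -> list[str]:
    # Stand-in proposal function; an LLM could suggest candidate next steps here.
    return [state + ch for ch in "abcdez"]

def score(state: str) -> int:
    # Stand-in heuristic: how many leading characters already match the goal.
    return sum(1 for a, b in zip(state, GOAL) if a == b)

def best_first_search(start: str = "") -> str | None:
    # The search algorithm, not the model, drives the overall process.
    frontier = [(-score(start), start)]
    seen = {start}
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == GOAL:
            return state
        if len(state) >= len(GOAL):
            continue
        for nxt in propose(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return None

print(best_first_search())  # -> "abcde"
```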
And I feel like the consensus right now is that, you know, Norvig definitely has the upper hand right now with the advent of LMs and diffusion models and all the other recent progress in machine learning. But the Chomsky-based stuff is still really useful in my view. I mean, it's like parsers, compilers, basically a lot of the stuff that provides really good context. It provides kind of like the knowledge graph backbone that you want to explore with your AI dev tool. Like that will come from kind of like Chomsky-based tools like compilers and parsers. It's a lot of what we've invested in in the past decade at Sourcegraph and what you built with Grok. Basically like these formal systems that construct these very precise knowledge graphs that are great context providers and great kind of guard rails enforcers and kind of like safety checkers for the output of a more kind of like data-driven, fuzzier system that uses like the Norvig-based models. [00:28:03]Steve: Beyang was talking about this stuff like it happened in the middle ages. Like, okay, so when I was in college, I was in college learning Lisp and Prolog and planning and all the deterministic Chomsky approaches to AI. And I was there when Norvig basically declared it dead. I was there 3,000 years ago when Norvig and Chomsky fought on the volcano. When did he declare it dead? [00:28:26]Swyx: What do you mean he declared it dead? [00:28:27]Steve: It was like late 90s. [00:28:29]Swyx: Yeah. [00:28:29]Steve: When I went to Google, Peter Norvig was already there. He had basically like, I forget exactly where. It was some, he's got so many famous short posts, you know, amazing. [00:28:38]Swyx: He had a famous talk, the unreasonable effectiveness of data. Yeah. [00:28:41]Steve: Maybe that was it. But at some point, basically, he basically convinced everybody that deterministic approaches had failed and that heuristic-based, you know, data-driven statistical approaches, stochastic, were better. [00:28:52]Swyx: Yeah. [00:28:52]Steve: The primary reason I can tell you this, because I was there, was that, was that, well, the steam-powered engine, no. The reason was that the deterministic stuff didn't scale. [00:29:06]Swyx: Yeah. Right. [00:29:06]Steve: They're using Prolog, man, constraint systems and stuff like that. Well, that was a long time ago, right? Today, actually, these Chomsky-style systems do scale. And that's, in fact, exactly what Sourcegraph has built. Yeah. And so we have a very unique, I love the framing that Beyang's made, that the marriage of the Chomsky and the Norvig, you know, sort of models, you know, conceptual models, because we, you know, we have both of them and they're both really important. And in fact, there, there's this really interesting, like, kind of overlap between them, right? Where like the AI or our graph or our search engine could potentially provide the right context for any given query, which is, of course, why ranking is important. But what we've really signed ourselves up for is an extraordinary amount of testing. [00:29:45]Swyx: Yeah. [00:29:45]Steve: Because, Swyx, you were saying that, you know, GPT-4 tends to the front of the context window and maybe other LLMs to the back and maybe, maybe the LLM in the middle. [00:29:53]Swyx: Yeah. 
[00:29:53]Steve: And so that means that, you know, if we're actually like, you know, verifying whether we, you know, some change we've made has improved things, we're going to have to test putting it at the beginning of the window and at the end of the window, you know, and maybe make the right decision based on the LLM that you've chosen. Which some of our competitors, that's a problem that they don't have, but we meet you, you know, where you are. Yeah. And we're, just to finish, we're writing tens of thousands. We're generating tests, you know, fill in the middle type tests and things. And then using our graph to basically sort of fine tune Cody's behavior there. [00:30:20]Swyx: Yeah. [00:30:21]Beyang: I also want to add, like, I have like an internal pet name for this, like kind of hybrid architecture that I'm trying to make catch on. Maybe I'll just say it here. Just saying it publicly kind of makes it more real. But like, I call the architecture that we've developed the Normsky architecture. [00:30:36]Swyx: Yeah. [00:30:36]Beyang: I mean, it's obviously a portmanteau of Norvig and Chomsky, but the acronym, it stands for non-agentic, rapid, multi-source code intelligence. So non-agentic because... Rolls right off the tongue. And Normsky. But it's non-agentic in the sense that like, we're not trying to like pitch you on kind of like agent hype, right? Like it's the things it does are really just developer tools developers have been using for decades now, like parsers and really good search indexes and things like that. Rapid because we place an emphasis on speed. We don't want to sit there waiting for kind of like multiple LLM requests to return to complete a simple user request. Multi-source because we're thinking broadly about what pieces of information and knowledge are useful context. So obviously starting with things that you can search in your code base, and then you add in the reference graph, which kind of like allows you to crawl outward from those initial results. But then even beyond that, you know, sources of information, like there's a lot of knowledge that's embedded in docs, in PRDs or product specs, in your production logging system, in your chat, in your Slack channel, right? Like there's so much context is embedded there. And when you're a human developer, and you're trying to like be productive in your code base, you're going to go to all these different systems to collect the context that you need to figure out what code you need to write. And I don't think the AI developer will be any different. It will need to pull context from all these different sources. So we're thinking broadly about how to integrate these into Codi. We hope through kind of like an open protocol that like others can extend and implement. And this is something else that should be accessible by December 14th in kind of like a preview stage. But that's really about like broadening this notion of the code graph beyond your Git repository to all the other sources where technical knowledge and valuable context can live. [00:32:21]Steve: Yeah, it becomes an artifact graph, right? It can link into your logs and your wikis and any data source, right? [00:32:27]Alessio: How do you guys think about the importance of, it's almost like data pre-processing in a way, which is bring it all together, tie it together, make it ready. Any thoughts on how to actually make that good? Some of the innovation you guys have made. [00:32:40]Steve: We talk a lot about the context fetching, right? 
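The kind of testing Steve describes earlier in this exchange — checking whether the same retrieved context works better at the front or the back of the prompt for a given model — can be pictured as a tiny A/B harness. Here `call_model` is a stub standing in for a real LLM call, not any particular API:

```python
from typing import Callable

def build_prompt(question: str, context: str, position: str) -> str:
    """Place the retrieved context before or after the question."""
    return (context + "\n\n" + question) if position == "front" \
        else (question + "\n\n" + context)

def eval_position(call_model: Callable[[str], str],
                  cases: list[tuple[str, str, str]], position: str) -> float:
    """cases = (question, context, expected substring); returns pass rate."""
    passed = 0
    for question, context, expected in cases:
        answer = call_model(build_prompt(question, context, position))
        passed += expected in answer
    return passed / len(cases)

def fake_model(prompt: str) -> str:      # stub: "answers" with whatever it saw first
    return prompt.splitlines()[0]

cases = [("What does start() do?", "start() boots the server", "boots the server")]
for pos in ("front", "back"):
    print(pos, eval_position(fake_model, cases, pos))
```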
I mean, there's a lot of ways you could answer this question. But, you know, we've spent a lot of time just in this podcast here talking about context fetching. But stuffing the context into the window is, you know, the bin packing problem, right? Because the window is not big enough, and you've got more context than you can fit. You've got a ranker maybe. But what is that context? Is it a function that was returned by an embedding or a graph call or something? Do you need the whole function? Or do you just need, you know, the top part of the function, this expression here, right? You know, so that art, the golf game of trying to, you know, get each piece of context down into its smallest state, possibly even summarized by another model, right, before it even goes to the LLM, becomes this is the game that we're in, yeah? And so, you know, recursive summarization and all the other techniques that you got to use to like stuff stuff into that context window become, you know, critically important. And you have to test them across every configuration of models that you could possibly need. [00:33:32]Beyang: I think data preprocessing is probably the like unsexy, way underappreciated secret to a lot of the cool stuff that people are shipping today. Whether you're doing like RAG or fine tuning or pre-training, like the preprocessing step matters so much because it's basically garbage in, garbage out, right? Like if you're feeding in garbage to the model, then it's going to output garbage. Concretely, you know, for code RAG, if you're not doing some sort of like preprocessing that takes advantage of a parser and is able to like extract the key components of a particular file of code, you know, separate the function signature from the body, from the doc string, what are you even doing? Like that's like table stakes. It opens up so much more possibilities with which you can kind of like tune your system to take advantage of the signals that come from those different parts of the code. Like we've had a tool, you know, since computers were invented that understands the structure of source code to a hundred percent precision. The compiler knows everything there is to know about the code in terms of like structure. Like why would you not want to use that in a system that's trying to generate code, answer questions about code? You shouldn't throw that out the window just because now we have really good, you know, data-driven models that can do other things. [00:34:44]Steve: Yeah. When I called it a data moat, you know, in my cheating post, a lot of people were confused, you know, because data moat sort of sounds like data lake because there's data and water and stuff. I don't know. And so they thought that we were sitting on this giant mountain of data that we had collected, but that's not what our data moat is. It's really a data pre-processing engine that can very quickly and scalably, like basically dissect your entire code base in a very small, fine-grained, you know, semantic unit and then serve it up. Yeah. And so it's really, it's not a data moat. It's a data pre-processing moat, I guess. [00:35:15]Beyang: Yeah. If anything, we're like hypersensitive to customer data privacy requirements. So it's not like we've taken a bunch of private data and like, you know, trained a generally available model. In fact, exactly the opposite. A lot of our customers are choosing Cody over Copilot and other competitors because we have an explicit guarantee that we don't do any of that. 
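The parser-based preprocessing described above — separating a function's signature, docstring, and body so each part can be indexed or weighted differently — is straightforward once a real parser is in the loop. A small sketch using Python's ast module, illustrative rather than Sourcegraph's pipeline:

```python
import ast

SOURCE = '''
def charge(card, amount):
    """Charge the given card and return a receipt id."""
    receipt = gateway.submit(card, amount)
    return receipt.id
'''

def split_function(source: str) -> dict[str, str]:
    """Split a function definition into signature, docstring, and body text."""
    tree = ast.parse(source)
    fn = next(node for node in tree.body if isinstance(node, ast.FunctionDef))
    signature = f"def {fn.name}({', '.join(a.arg for a in fn.args.args)})"
    docstring = ast.get_docstring(fn) or ""
    body_nodes = fn.body[1:] if docstring else fn.body  # drop the docstring stmt
    body = "\n".join(ast.unparse(stmt) for stmt in body_nodes)
    return {"signature": signature, "docstring": docstring, "body": body}

print(split_function(SOURCE))
```

Each extracted piece can then be embedded, ranked, or summarized separately, which is the "garbage in, garbage out" point: the parser gives the retrieval layer clean, structurally meaningful units to work with.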
And that we've done that from day one. Yeah. I think that's a very real concern in today's day and age, because like if your proprietary IP finds its way into the training set of any model, it's very easy both to like extract that knowledge from the model and also use it to, you know, build systems that kind of work on top of the institutional knowledge that you've built up. [00:35:52]Alessio: About a year ago, I wrote a post on LLMs for developers. And one of the points I had was maybe the depth of like the DSL. I spent most of my career writing Ruby and I love Ruby. It's so nice to use, but you know, it's not as performant, but it's really easy to read, right? And then you look at other languages, maybe they're faster, but like they're more verbose, you know? And when you think about efficiency of the context window, that actually matters. [00:36:15]Swyx: Yeah. [00:36:15]Alessio: But I haven't really seen a DSL for models, you know? I haven't seen like code being optimized to like be easier to put in a model context. And it seems like your pre-processing is kind of doing that. Do you see in the future, like the way we think about the DSL and APIs and kind of like service interfaces be more focused on being context friendly, where it's like maybe it's harder to read for the human, but like the human is never going to write it anyway. We were talking on the Hacks podcast. There are like some data science things like spin up the spandex, like humans are never going to write again because the models can just do very easily. Yeah, curious to hear your thoughts. [00:36:51]Steve: Well, so DSLs, they involve, you know, writing a grammar and a parser and they're like little languages, right? We do them that way because, you know, we need them to compile and humans need to be able to read them and so on. The LLMs don't need that level of structure. You can throw any pile of crap at them, you know, more or less unstructured and they'll deal with it. So I think that's why a DSL hasn't emerged for sort of like communicating with the LLM or packaging up the context or anything. Maybe it will at some point, right? We've got, you know, tagging of context and things like that that are sort of peeking into DSL territory, right? But your point on do users, you know, do people have to learn DSLs like regular expressions or, you know, pick your favorite, right? XPath. I think you're absolutely right that the LLMs are really, really good at that. And I think you're going to see a lot less of people having to slave away learning these things. They just have to know the broad capabilities and the LLM will take care of the rest. [00:37:42]Swyx: Yeah, I'd agree with that. [00:37:43]Beyang: I think basically like the value profit of DSL is that it makes it easier to work with a lower level language, but at the expense of introducing an abstraction layer. And in many cases today, you know, without the benefit of AI cogeneration, like that totally worth it, right? With the benefit of AI cogeneration, I mean, I don't think all DSLs will go away. I think there's still, you know, places where that trade-off is going to be worthwhile. But it's kind of like how much of source code do you think is going to be generated through natural language prompting in the future? Because in a way, like any programming language is just a DSL on top of assembly, right? 
And so if people can do that, then yeah, like maybe for a large portion of the code [00:38:21]Swyx: that's written, [00:38:21]Beyang: people don't actually have to understand the DSL that is Ruby or Python or basically any other programming language that exists. [00:38:28]Steve: I mean, seriously, do you guys ever write SQL queries now without using a model of some sort? At least a draft. [00:38:34]Swyx: Yeah, right. [00:38:36]Steve: And so we have kind of like, you know, past that bridge, right? [00:38:39]Alessio: Yeah, I think like to me, the long-term thing is like, is there ever going to be, you don't actually see the code, you know? It's like, hey, the basic thing is like, hey, I need a function to some two numbers and that's it. I don't need you to generate the code. [00:38:53]Steve: And the following question, do you need the engineer or the paycheck? [00:38:56]Swyx: I mean, right? [00:38:58]Alessio: That's kind of the agent's discussion in a way where like you cannot automate the agents, but like slowly you're getting more of the atomic units of the work kind of like done. I kind of think of it as like, you know, [00:39:09]Beyang: do you need a punch card operator to answer that for you? And so like, I think we're still going to have people in the role of a software engineer, but the portion of time they spend on these kinds of like low-level, tedious tasks versus the higher level, more creative tasks is going to shift. [00:39:23]Steve: No, I haven't used punch cards. [00:39:25]Swyx: Yeah, I've been talking about like, so we kind of made this podcast about the sort of rise of the AI engineer. And like the first step is the AI enhanced engineer. That is that software developer that is no longer doing these routine, boilerplate-y type tasks, because they're just enhanced by tools like yours. So you mentioned OpenCodeGraph. I mean, that is a kind of DSL maybe, and because we're releasing this as you go GA, you hope for other people to take advantage of that? [00:39:52]Beyang: Oh yeah, I would say so OpenCodeGraph is not a DSL. It's more of a protocol. It's basically like, hey, if you want to make your system, whether it's, you know, chat or logging or whatever accessible to an AI developer tool like Cody, here's kind of like the schema by which you can provide that context and offer hints. So I would, you know, comparisons like LSP obviously did this for kind of like standard code intelligence. It's kind of like a lingua franca for providing fine references and codefinition. There's kind of like analogs to that. There might be also analogs to kind of the original OpenAI, kind of like plugins, API. There's all this like context out there that might be useful for an LM-based system to consume. And so at a high level, what we're trying to do is define a common language for context providers to provide context to other tools in the software development lifecycle. Yeah. Do you have any critiques of LSP, by the way, [00:40:42]Swyx: since like this is very much, very close to home? [00:40:45]Steve: One of the authors wrote a really good critique recently. Yeah. I don't think I saw that. Yeah, yeah. LSP could have been better. It just came out a couple of weeks ago. It was a good article. [00:40:54]Beyang: Yeah. I think LSP is great. Like for what it did for the developer ecosystem, it was absolutely fantastic. Like nowadays, like it's much easier now to get code navigation up and running in a bunch of editors by speaking this protocol. 
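The "common language for context providers" idea mentioned above can be pictured as a very small schema. The dataclasses below are hypothetical, for illustration only, and are not the actual OpenCodeGraph or LSP wire format:

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    title: str    # short human-readable label, e.g. "payments runbook"
    uri: str      # where the item lives (doc, log query, chat thread, ...)
    content: str  # text an assistant could place into its prompt

@dataclass
class ContextRequest:
    query: str          # what the assistant is trying to answer
    max_items: int = 5

class LoggingProvider:
    """Toy provider that serves matching log lines as context."""

    def __init__(self, logs: list[str]) -> None:
        self.logs = logs

    def provide(self, req: ContextRequest) -> list[ContextItem]:
        hits = [line for line in self.logs if req.query.lower() in line.lower()]
        return [ContextItem(title="production log line",
                            uri="logs://prod", content=line)
                for line in hits[: req.max_items]]

provider = LoggingProvider(["ERROR charge() timeout in billing",
                            "INFO server started"])
for item in provider.provide(ContextRequest(query="charge")):
    print(item)
```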
I think maybe the interesting question is like looking at the different design decisions comparing LSP basically with Kythe. Because Kythe has more of a... How would you describe it? [00:41:18]Steve: A storage format. [00:41:20]Beyang: I think the critique of LSP from a Kythe point of view would be like with LSP, you don't actually have an actual symbolic model of the code. It's not like LSP models like, hey, this function calls this other function. LSP is all like range-based. Like, hey, your cursor's at line 32, column 1. [00:41:35]Swyx: Yeah. [00:41:35]Beyang: And that's the thing you feed into the language server. And then it's like, okay, here's the range that you should jump to if you click on that range. So it kind of is intentionally ignorant of the fact that there's a thing called a reference underneath your cursor, and that's linked to a symbol definition. [00:41:49]Steve: Well, actually, that's the worst example you could have used. You're right. But that's the one thing that it actually did bake in is following references. [00:41:56]Swyx: Sure. [00:41:56]Steve: But it's sort of hardwired. [00:41:58]Swyx: Yeah. [00:41:58]Steve: Whereas Kythe attempts to model [00:42:00]Beyang: like all these things explicitly. [00:42:02]Swyx: And so... [00:42:02]Steve: Well, so LSP is a protocol, right? And so Google's internal protocol is gRPC-based. And it's a different approach than LSP. It's basically you make a heavy query to the back end, and you get a lot of data back, and then you render the whole page, you know? So we've looked at LSP, and we think that it's a little long in the tooth, right? I mean, it's a great protocol, lots and lots of support for it. But we need to push into the domain of exposing the intelligence through the protocol. Yeah. [00:42:29]Beyang: And so I would say we've developed a protocol of our own called Skip, which is at a very high level trying to take some of the good ideas from LSP and from Kythe and merge that into a system that in the near term is useful for Sourcegraph, but I think in the long term, we hope will be useful for the ecosystem. Okay, so here's what LSP did well. LSP, by virtue of being like intentionally dumb, dumb in air quotes, because I'm not like ragging on it, allowed language servers developers to kind of like bypass the hard problem of like modeling language semantics precisely. So like if all you want to do is jump to definition, you don't have to come up with like a universally unique naming scheme for each symbol, which is actually quite challenging because you have to think about like, okay, what's the top scope of this name? Is it the source code repository? Is it the package? Does it depend on like what package server you're fetching this from? Like whether it's the public one or the one inside your... Anyways, like naming is hard, right? And by just going from kind of like a location to location based approach, you basically just like throw that out the window. All I care about is jumping definition, just make that work. And you can make that work without having to deal with like all the complex global naming things. The limitation of that approach is that it's harder to build on top of that to build like a true knowledge graph. Like if you actually want a system that says like, okay, here's the web of functions and here's how they reference each other. And I want to incorporate that like semantic model of how the code operates or how the code relates to each other at like a static level. 
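The contrast being drawn here — range-based lookups versus an explicit symbol graph — can be pictured with two toy data models. Both are invented for illustration and are not the real LSP, Kythe, or Skip schemas:

```python
from dataclasses import dataclass

# --- range-based, LSP-flavored: positions in, positions out ------------------
@dataclass(frozen=True)
class Position:
    file: str
    line: int
    column: int

definition_at: dict[Position, Position] = {
    Position("api.py", 3, 12): Position("auth.py", 1, 4),  # a use of verify_token
}

def goto_definition(pos: Position) -> Position | None:
    """The server only knows 'this range jumps to that range'."""
    return definition_at.get(pos)

# --- symbol-based, Kythe/Skip-flavored: an explicit graph of symbols ---------
symbol_graph: dict[str, dict[str, list[str]]] = {
    "auth.verify_token": {
        "defined_in": ["auth.py:1"],
        "referenced_by": ["api.py:3", "worker.py:17"],
    },
}

def find_references(symbol: str) -> list[str]:
    """A direct symbolic lookup, no range hopping required."""
    return symbol_graph.get(symbol, {}).get("referenced_by", [])

print(goto_definition(Position("api.py", 3, 12)))
print(find_references("auth.verify_token"))
```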
You can't do that with LSP because you have to deal with line ranges. And like concretely the pain point that we found in using LSP for Sourcegraph is like in order to do like a find references [00:44:04]Swyx: and then jump definitions, [00:44:04]Beyang: it's like a multi-hop process because like you have to jump to the range and then you have to find the symbol at that range. And it just adds a lot of latency and complexity to these operations where as a human, you're like, well, this thing clearly references this other thing. Why can't you just jump me to that? And I think that's the thing that Kythe does well. But then I think the issue that Kythe has had with adoption is because it is a more sophisticated schema, I think. And so there's basically more things that you have to implement to get like a Kythe implementation up and running. I hope I'm not like, correct me if I'm wrong about any of this. [00:44:35]Steve: 100%, 100%. Kythe also has a problem, all these systems have the problem, even Skip, or at least the way that we implemented the indexers, that they have to integrate with your build system in order to build that knowledge graph, right? Because you have to basically compile the code in a special mode to generate artifacts instead of binaries. And I would say, by the way, earlier I was saying that XREFs were in LSP, but it's actually, I was thinking of LSP plus LSIF. [00:44:58]Swyx: Yeah. That's another. [00:45:01]Steve: Which is actually bad. We can say that it's bad, right? [00:45:04]Steve: It's like Skip or Kythe, it's supposed to be sort of a model serialization, you know, for the code graph, but it basically just does what LSP needs, the bare minimum. LSIF is basically if you took LSP [00:45:16]Beyang: and turned that into a serialization format. So like you build an index for language servers to kind of like quickly bootstrap from cold start. But it's a graph model [00:45:23]Steve: with all of the inconvenience of the API without an actual graph. And so, yeah. [00:45:29]Beyang: So like one of the things that we try to do with Skip is try to capture the best of both worlds. So like make it easy to write an indexer, make the schema simple, but also model some of the more symbolic characteristics of the code that would allow us to essentially construct this knowledge graph that we can then make useful for both the human developer through Sourcegraph and the AI developer through Cody. [00:45:49]Steve: So anyway, just to finish off the graph comment, we've got a new graph, yeah, that's Skip-based. We call it BFG internally, right? It's a beautiful something graph. A big friendly graph. [00:46:00]Swyx: A big friendly graph. [00:46:01]Beyang: It's a blazing fast. [00:46:02]Steve: Blazing fast. [00:46:03]Swyx: Blazing fast graph. [00:46:04]Steve: And it is blazing fast, actually. It's really, really interesting. I should probably do a blog post about it to walk you through exactly how they're doing it. Oh, please. But it's a very AI-like iterative, you know, experimentation sort of approach. We're building a code graph based on all of our 10 years of knowledge about building code graphs, yeah? But we're building it quickly with zero configuration, and it doesn't have to integrate with your build. And through some magic tricks that we have. And so what just happens when you install the plugin is that it'll be there, indexing your code and providing that knowledge graph in the background without all that build system integration. 
This is a bit of secret sauce that we haven't really like advertised it very much lately. But I am super excited about it because what they do is they say, all right, you know, let's tackle function parameters today. Cody's not doing a very good job of completing function call arguments or function parameters in the definition, right? Yeah, we generate those thousands of tests, and then we can actually reuse those tests for the AI context as well. So fortunately, things are kind of converging on, we have, you know, half a dozen really, really good context sources, and we mix them all together. So anyway, BFG, you're going to hear more about it probably in the holidays? [00:47:12]Beyang: I think it'll be online for December 14th. We'll probably mention it. BFG is probably not the public name we're going to go with. I think we might call it like Graph Context or something like that. [00:47:20]Steve: We're officially calling it BFG. [00:47:22]Swyx: You heard it here first. [00:47:24]Beyang: BFG is just kind of like the working name. And so the impetus for BFG was like, if you look at like current AI inline code completion tools and the errors that they make, a lot of the errors that they make, even in kind of like the easy, like single line case, are essentially like type errors, right? Like you're trying to complete a function call and it suggests a variable that you defined earlier, but that variable is the wrong type. [00:47:47]Swyx: And that's the sort of thing [00:47:47]Beyang: where it's like a first year, like freshman CS student would not make that error, right? So like, why does the AI make that error? And the reason is, I mean, the AI is just suggesting things that are plausible without the context of the types or any other like broader files in the code. And so the kind of intuition here is like, why don't we just do the basic thing that like any baseline intelligent human developer would do, which is like click jump to definition, click some fine references and pull in that like Graph Context into the context window and then have it generate the completion. So like that's sort of like the MVP of what BFG was. And turns out that works really well. Like you can eliminate a lot of type errors that AI coding tools make just by pulling in that context. Yeah, but the graph is definitely [00:48:32]Steve: our Chomsky side. [00:48:33]Swyx: Yeah, exactly. [00:48:34]Beyang: So like this like Chomsky-Norvig thing, I think pops up in a bunch of differ
Learn more about Michael Wenderoth, Executive Coach: www.changwenderoth.comSHOW NOTES:How do you avoid living in an echo chamber, deal with someone who has more influence, and prevent others from sowing misinformation? In this continuation episode, Michael and Jakub (Kuba) Kocinski debate how Jakub's trust and safety insights can be applied to improving organizations, work relationships and careers. Jakub shares why things are not spiraling out of control, what Tik Tok is great at – and how to adapt to new technology. We end on Jakub's definition of success, and how he has "adapted without losing himself" to life and work in the U.S.A.Previous episode (Part 1) with Jakub: https://tinyurl.com/2ezpdx8h The #1 thing we can learn from bad actors"Things are not spiraling out of control" – Why Jakub is optimisticDeal with bad actors daily? Best practices on taking care of yourself at workDiscussion #1: Trust and safety applied to your career: How to manage someone more powerful and influential, when you have less voice?Truth: People want to hear things that align with their worldviewSolution from Tik Tok? "Sparse and interject different types of content"Solution from Twitter? "Birdwatch" to allow crowdsourced fact checkingDiscussion #2: Trust and safety applied to your career: How to prevent the sowing of misinformation?Solution? Yellow and red cards...but where do you cross the line?Solution? Communicating and building strong relationshipsMastering communication in the Tik Tok eraHow he adapts without losing himself – and a practical way to assess a new employerJakub at Tik Tok: The backlash he got from Silicon Valley peersHow Jakub defines success – and what he means when he says he is "still very European"Jakub on his "controlled midlife moment"BIO AND LINKS:Previous episode (Part 1) on 97% Effective, with Jakub: https://tinyurl.com/2ezpdx8h Jakub on Linkedin: https://www.linkedin.com/in/kocinski/On X/Twitter: @qbaasOn Instagram: @qbaasJakub, on the P&G Alumni podcast: https://podcasts.apple.com/us/podcast/jakub-kocinski-digital-safety-trust-from-p-g-to-google/id1509346585?i=1000569399519Michael's Book, Get Promoted: https://tinyurl.com/453txk74Advertising Inquiries: https://redcircle.com/brandsPrivacy & Opt-Out: https://redcircle.com/privacy