POPULARITY
Presented by Material Security: We protect your company's most valuable materials -- the emails, files, and accounts that live in your Google Workspace and Microsoft 365 cloud offices. Three Buddy Problem - Episode 71: The buddies travel to Canada for a live recording at the Countermeasure conference, discussing the Google v FFmpeg open-source patching brouhaha, ransomware negotiators charged and linked to ransomware attacks, the looming TP-Link ban in the U.S., and the discovery of LANDFALL, an APT attack caught using a Samsung mobile zero-day. Cast: Juan Andres Guerrero-Saade (https://twitter.com/juanandres_gs), Ryan Naraine (https://twitter.com/ryanaraine) and Costin Raiu (https://twitter.com/craiu).
In this week's show Patrick Gray and Adam Boileau discuss the week's cybersecurity news, including:
* We love some good vulnerability reporting drama, this time FFmpeg's got beef with Google
* OpenAI announces its Aardvark bug-gobbling system
* Two US ransomware responders get arrested for… ransomware
* Memento (née HackingTeam) CEO says: Sì, those are totally our tools getting snapped in Russia
* Hackers help freight theft gangs steal shipments to resell
* A second Jabber Zeus mastermind gets his comeuppance 15 years on
This week's episode is sponsored by Nucleus Security, who make a vulnerability information management system. Co-founder Scott Kuffer says that approaches for triaging vulnerabilities have started to fall apart, given there are just. So. Many. And they're all important! This episode is also available on YouTube.
Show notes
* vx-underground on X: "Yeah, so pretty much this entire drama thing is FFmpeg are a bunch of nerds…"
* FFmpeg on X: "@DavidEGrayson It's someone's hobby project of an obscure 1990s decoder…"
* Halvar Flake on X: "Given the extremely big role ffmpeg has played historically..."
* thaddeus e. grugq on X: "Current drama: Plucky security researcher Google takes on volunteer open source behemoth FFmpeg."
* Robert Graham on X: "Current status: There's a conflict between Google…"
* Introducing Aardvark: OpenAI's agentic security researcher | OpenAI
* Bugcrowd acquires Mayhem Security to advance AI-powered security testing | CyberScoop
* Prosecutors allege incident response pros used ALPHV/BlackCat to commit string of ransomware attacks | CyberScoop
* Former Trenchant Exec Sold Stolen Code to Russian Buyer Even After Learning that Other Code He Sold Was Being "Utilized" by Different Broker in South Korea
* How an ex-L3Harris Trenchant boss stole and sold cyber exploits to Russia | TechCrunch
* Operation Zero — A Zero-Day Vulnerability Platform
* John Scott-Railton on X: "7/ There's a push to scale up America's offensive industry right now…"
* CEO of spyware maker Memento Labs confirms one of its government customers was caught using its malware | TechCrunch
* Exploiting Microsoft Teams: Impersonation and Spoofing Vulnerabilities Exposed
* Microsoft Teams Vulnerabilities Uncovered
* Cargo theft gets a boost from hackers using remote monitoring tools | The Record from Recorded Future News
* Remote access, real cargo: cybercriminals targeting trucking and logistics | Proofpoint US
* Alleged Conti ransomware gang affiliate appears in Tennessee court after Ireland extradition | The Record from Recorded Future News
* Three suspected developers of Meduza Stealer malware arrested in Russia | The Record from Recorded Future News
* Alleged Jabber Zeus Coder 'MrICQ' in U.S. Custody – Krebs on Security
* Windows Server Update Service exploitation ensnares at least 50 victims | Cybersecurity Dive
* Post by @paulschnack.bsky.social — Bluesky
From AI-powered code generation boosting productivity to adversaries using the same tools to hunt zero-days, the panel exposes the coming wave of AI-fueled cyberattacks—and why most companies aren't ready for it.
* Cotton blocks Trump-backed effort to make daylight saving time permanent
* The End of Cybersecurity
* Amazon says it didn't cut 14,000 people because of money. It cut them because of 'culture'
* Here's How the AI Crash Happens
* US government is getting closer to banning TP-Link routers
* Neato cloud shutdown sees robocleaners robbed of their smarts
* FCC will vote to scrap telecom cybersecurity requirements
* Trump FCC Votes To Make It Easier For Your Broadband ISP To Rip You Off
* Swedish Death Cleaning But for Your Digital Life
* The F5 Hack is a Big Deal
* OpenAI Releases Agentic Security Researcher
* 'Do not trust your eyes': AI generates surge in expense fraud
* Proton Data Breach Observatory aims to alert you in near real-time
* Using a Security Key on X? Re-Enroll Now or Your Account Will Be Locked
* YouTube denies AI was involved with odd removals of tech tutorials
* 10M people watched a YouTuber shim a lock; the lock company sued him. Bad idea.
* Samsung's $2000 smart fridges are getting ads - gHacks Tech News
* ESPN, ABC, and other Disney channels go dark on YouTube TV
Host: Leo Laporte
Guests: Jill Duffy, Alex Stamos, and Stacey Higginbotham
Download or subscribe to This Week in Tech at https://twit.tv/shows/this-week-in-tech
Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit
Sponsors: ziprecruiter.com/twit zscaler.com/security miro.com canary.tools/twit - use code: TWIT Melissa.com/twit
This week SUSE's SLES and Red Hat's RHEL are embracing AI in the form of MCP and CUDA support. FFmpeg scores a $100k donation, Pop!_OS and COSMIC finally have a release date, and Unity is in need of help. Kodi 22 has an Alpha, Debian has a systemd dustup, and Krita has landed HDR support. And there's a port of Linux to WASM, so you can run the kernel in your browser. Handy! For tips we have doxx for opening .docx in the terminal, a primer on absolute vs relative paths, whoami for grabbing the current username, and btrfs's scrub command for checking the local disk. You can find the show notes at https://bit.ly/4ovhsLG and have a great week! Host: Jonathan Bennett Co-Hosts: Rob Campbell, Jeff Massie, and Ken McDonald Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
A popular YouTuber named Theo Browne offered $20k to the Open Source FFmpeg team if they remove their social media person, who Theo calls a “motherf***er”. The X Thread: https://x.com/LundukeJournal/status/1982569289237352620 More from The Lunduke Journal: https://lunduke.com/ This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit lunduke.substack.com/subscribe
Steve talks about his struggles with Immich, Noah walks through his adventures with OpenWRT. -- During The Show -- 00:58 Steve's Immich War Story Updated Immich 2 Choices to repair Knowing things are down 07:15 LubeLogger & Ansible - Aaron LubeLogger Ansible (https://codeberg.org/aaronvonawesome/ansible-role-lube-logger) Mechanical mindset Playbook setup Tracking vehicle maintenance Dealership perks 14:00 Networking Suggestion & AI Question - David Zyxel AP Dell r420 with dual e5-2430v2 PCI generational lock if it doesn't fit in vram, no point disable hyper threading Bottle necks 21:20 ZFS On Removable Device - Chris ZFS send/receive bit-rot rsync Diverse hardware/software Better hardware Odroid Used Ebay 1 Liter PC USB HDDs are low quality RFC 6214 Pigeons instead of AI 36:42 News Wire Firefox 142 - firefox.com (https://www.firefox.com/en-US/firefox/142.0/releasenotes) Thunderbird 142 - thunderbird.net (https://www.thunderbird.net/en-US/thunderbird/142.0/releasenotes) LibreOffice 25.8 - documentfoundation.org (https://wiki.documentfoundation.org/ReleaseNotes/25.8) GNU Nano 8.6 - lists.gnu.org (https://lists.gnu.org/archive/html/info-gnu/2025-08/msg00008.html) FFmpeg 8.0 - ffmpeg.org (https://ffmpeg.org/index.html#news) Git 2.51 - gitlab.com (https://about.gitlab.com/blog/what-s-new-in-git-2-51-0) github.blog (https://github.blog/open-source/git/highlights-from-git-2-51) Linux 6.15 - techprovidence.com (https://www.techprovidence.com/linux-kernel-6-15-eol-upgrade-6-16) Tails 6.19 - torproject.org (https://blog.torproject.org/new-release-tails-6_19) RingReaper - darkreading.com (https://www.darkreading.com/cyber-risk/ringreaper-sneaks-past-linux-edrs) Arch Linux DDoS - tomshardware.com (https://www.tomshardware.com/software/linux/arch-linux-continues-to-feel-the-force-of-a-ddos-attack-after-two-brutal-weeks-attackers-yet-to-be-identified-as-project-struggles-to-restore-full-service) Grok 2.5 - engadget.com (https://www.engadget.com/ai/you-can-now-download-and-tweak-grok-25-for-yourself-as-it-goes-open-source-164734471.html) reuters.com (https://www.reuters.com/technology/musk-says-xai-open-sources-grok-25-2025-08-23) Essedum 1.0 - techzine.eu (https://www.techzine.eu/news/infrastructure/134070/open-source-platform-essedum-1-0-brings-ai-to-networking) OpenCUA - venturebeat.com (https://venturebeat.com/ai/opencuas-open-source-computer-use-agents-rival-proprietary-models-from-openai-and-anthropic) Gnoppix Embeds AI - thenewstack.io (https://thenewstack.io/gnoppix-makes-using-ai-on-linux-a-snap-with-one-caveat) Quantinuum's Guppy and Selene - constellationr.com (https://www.constellationr.com/blog-news/insights/quantinuum-fleshes-out-quantum-software-stack-open-source-guppy-selene) 38:40 OpenWrt Flashing OpenWrt on lots of devices GL.iNet Mango (https://www.gl-inet.com/products/gl-mt300n-v2/) GL.iNet Domino & OpenWrt GL.iNet Marble (https://www.gl-inet.com/products/gl-b3000/) OpenWrt One (https://openwrt.org/toh/openwrt/one?s[]=shell) Comet Pro (https://www.gl-inet.com/campaign/gl-rm10/?) UniFi 6 LR -- The Extra Credit Section -- For links to the articles and material referenced in this week's episode check out this week's page from our podcast dashboard!
This Episode's Podcast Dashboard (http://podcast.asknoahshow.com/457) Phone Systems for Ask Noah provided by Voxtelesys (http://www.voxtelesys.com/asknoah) Join us in our dedicated chatroom #GeekLab:linuxdelta.com on Matrix (https://element.linuxdelta.com/#/room/#geeklab:linuxdelta.com) -- Stay In Touch -- Find all the resources for this show on the Ask Noah Dashboard Ask Noah Dashboard (http://www.asknoahshow.com) Need more help than a radio show can offer? Altispeed provides commercial IT services and they're excited to offer you a great deal for listening to the Ask Noah Show. Call today and ask about the discount for listeners of the Ask Noah Show! Altispeed Technologies (http://www.altispeed.com/) Contact Noah live [at] asknoahshow.com -- Twitter -- Noah - Kernellinux (https://twitter.com/kernellinux) Ask Noah Show (https://twitter.com/asknoahshow) Altispeed Technologies (https://twitter.com/altispeed)
LibreOffice is dumping Windows (OK, not all of Windows), there's anime catgirls keeping the kernel safe, and FFmpeg makes a major new release. Kdenlive has a release, Thunderbird has announced ThunderMail, and one of the hosts gives CachyOS a spin. For tips we're covering Gnome System Extensions, using WirePlumber for volume control, hacks for waking your monitor back up, and unbuffer for keeping your colors where they belong. You can find the show notes at http://bit.ly/45Nszrr and come back next week for more! Host: Jonathan Bennett Co-Hosts: Rob Campbell, Jeff Massie, and Ken McDonald Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
How AI Can Inadvertently Expose Personal Data
AI tools often unintentionally leak private information. For example, meeting transcription software can include offhand comments, personal jokes, or sensitive details in auto-generated summaries. ChatGPT conversations—when publicly shared—can also be indexed by search engines, revealing confidential topics such as NDAs or personal relationship issues. Even healthcare devices like MRIs and X-ray machines have exposed private data due to weak or absent security controls, risking identity theft and phishing attacks.
Cybercriminals Exploiting AI for Attacks
AI is a double-edged sword: while offering defensive capabilities, it's also being weaponized. The group "GreedyBear" used AI-generated code in a massive crypto theft operation. They deployed malicious browser extensions, fake websites, and executable files to impersonate trusted crypto platforms, harvesting users' wallet credentials. Their tactic involves publishing benign software that gains trust, then covertly injecting malicious code later. Similarly, AI-generated TikTok ads lead to fake "shops" pushing malware like SparkKitty spyware, which targets cryptocurrency users.
Security Concerns with Advanced AI Models like GPT-5
Despite advancements, new AI models such as GPT-5 remain vulnerable. Independent researchers, including NeuralTrust and SPLX, were able to bypass GPT-5's safeguards within 24 hours. Methods included multi-turn "context smuggling" and text obfuscation to elicit dangerous outputs like instructions for creating weapons. These vulnerabilities suggest that even the latest models lack sufficient security maturity, raising concerns about their readiness for enterprise use.
AI Literacy and Education Initiatives
There is a growing push for AI literacy, especially in schools. Microsoft has pledged $4 billion to fund AI education in K–12 schools, community colleges, and nonprofits. The traditional "Hour of Code" is being rebranded as "Hour of AI," reflecting a shift from learning to code to understanding AI itself. The aim is to empower students with foundational knowledge of how AI works, emphasizing creativity, ethics, security, and systems thinking over rote programming.
Legal and Ethical Issues Around Posthumous Data Use
One emerging ethical challenge is the use of deceased individuals' data to train AI models. Scholars advocate for postmortem digital rights, such as a 12-month grace period for families to delete a person's data. Currently, U.S. laws offer little protection in this area, and acts like RUFADAA don't address AI recreations.
Encryption Weaknesses in Law Enforcement and Critical Systems
Recent research highlights significant encryption vulnerabilities in communication systems used by police, military, and critical infrastructure. A Dutch study uncovered a deliberate backdoor in a radio encryption algorithm. Even the updated, supposedly secure version reduces key strength from 128 bits to 56 bits—dramatically weakening security. This suggests that critical communications could be intercepted, leaving sensitive systems exposed despite the illusion of protection.
Public Trust in Government Digital Systems
Trust in digital governance is under strain. The UK's HM Courts & Tribunals Service reportedly concealed an IT error that caused key evidence to vanish in legal cases. The lack of transparency and inadequate investigation risk undermining judicial credibility. Separately, the UK government secretly authorized facial recognition use across immigration databases, far exceeding the scale of traditional criminal databases.
AI for Cybersecurity Defense
On the defensive side, AI is proving valuable in finding vulnerabilities. Google's "Big Sleep," an LLM-based tool developed by DeepMind and Project Zero, has independently discovered 20 bugs in major open-source projects like FFmpeg and ImageMagick.
Wes performs a 2 a.m. rescue at DEFCON, and Chris attempts to build a Linux desktop using nothing but vibes.
Sponsored By:
Managed Nebula: Meet Managed Nebula from Defined Networking. A decentralized VPN built on the open-source Nebula platform that we love.
1Password Extended Access Management: 1Password Extended Access Management is a device trust solution for companies with Okta, and they ensure that if a device isn't trusted and secure, it can't log into your cloud apps.
Unraid: A powerful, easy operating system for servers and storage. Maximize your hardware with unmatched flexibility.
Support LINUX Unplugged
Links:
This week it's a duet, with Jonathan and Jeff chatting about Clear Linux's last hurrah, and some other Intel projects. The kernel may be about to adopt an AI code policy, and Fedora debates how to handle BIOS bugs. FFmpeg is about to release 8.0, KDE is adding printer ink monitoring, and Valve has a Steam refresh in the works. Our command line tips are vity for AI help with the command line, and immich for building your own video and image store and timeline. You can catch the show notes at http://bit.ly/4lKOPZz Have a great week! Host: Jonathan Bennett Co-Host: Jeff Massie Download or subscribe to Untitled Linux Show at https://twit.tv/shows/untitled-linux-show Want access to the ad-free video and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
We talked about the following topics.
01. The SuperClaude framework dramatically improves development efficiency
Title: Trying out SuperClaude, a framework that boosts Claude Code's development efficiency
Summary: SuperClaude v3 is an OSS framework that lets you use Claude Code even more efficiently. It provides automatic optimization through expert personas and a set of high-quality slash commands, significantly improving development efficiency.
Its two main features:
Automatic optimization via personas: Expert personas for areas such as frontend development, security, and performance optimization activate automatically, giving you the feeling of handing work to a team of specialists.
High-quality slash commands: 17 slash commands frequently used in development, such as /sc:analyze, /sc:implement, and /sc:test, are ready to use out of the box at high quality.
In practice, the expert personas automatically coordinated to produce comprehensive analyses; the benefit was noticeable from the moment it was installed. Installation takes only a few minutes, so it's easy to try. It's a recommended tool for developers who want to get more out of Claude Code. Invite a team of experts into your Claude Code and boost your development efficiency.
02. The world's first RISC-V tablet goes on sale for $159
Summary: The world's first RISC-V tablet, the PineTab-V, has arrived. At a strategic price point of $159, it ships with Debian Linux preinstalled and can handle everyday tasks such as web browsing and document editing. This means RISC-V has graduated from the "theory" and "special-purpose" stages and taken a solid step into general-purpose computing.
The PineTab-V's debut is a historic moment showing that the RISC-V ecosystem has entered a new stage of maturity: hardware makers and software companies worked together to bring a finished product to market. Challenges remain, however, in improving performance and application compatibility.
Industry giants are also paying attention to this open architecture, with NVIDIA announcing CUDA platform support for RISC-V. The PineTab-V is a hopeful first step toward a technology future with more open and diverse choices.
03. FFmpeg achieves a 100x speedup with hand-written assembly
Title: What FFmpeg's 100x speedup from hand-written assembly shows about the true value of "ultimate optimization"
FFmpeg, the open-source media conversion tool, achieved a 100x speedup in a specific function through hand-written assembly code. Behind this astonishing number lies the craftsmanship that modern software development has started to forget.
The speedup applies to a single niche function; FFmpeg as a whole did not become 100x faster. Still, developers deliberately take on this labor-intensive approach because compiler auto-optimization has its limits.
Hand-written assembly is like a made-to-measure racing suit that draws out the CPU's full potential, and for FFmpeg it reflects a consistent development philosophy.
The effort highlights the trade-off between efficiency and maintainability in software development. FFmpeg pursues efficiency because it is the foundation underlying countless other pieces of software.
In the AI era, this kind of craftsmanship represents a pinnacle of human intellect, and FFmpeg treats it as important knowledge to pass on to the next generation. The challenge is a reminder of the depths of technology and the value of the human passion to keep exploring them.
This show reflects personal views only and does not represent any real-world organization. Thank you for your understanding.
Brandon interviews Victor Adossi, an engineer at Cosmonic. They discuss the state of WebAssembly, wasmCloud, and why Wasm is poised for growth. Plus, Victor shares what it's like to live as an expat in Japan. Watch the YouTube Live Recording of Episode 526 (https://youtu.be/i7PRMqYk-gM?si=dz_FKqcF3G9EI25m) Show Links Cosmonic (https://cosmonic.com/) Bytecode Alliance (https://bytecodealliance.org/) WebAssembly Specifications (https://webassembly.org/specs/) The WebAssembly Component Model (https://component-model.bytecodealliance.org) Emscripten (https://emscripten.org) wasmCloud (https://wasmcloud.com/) wasmCloud Examples (https://github.com/wasmCloud/wasmCloud/tree/main/examples) Jco (Javascript ecosystem) Examples (https://github.com/bytecodealliance/jco/tree/main/examples/components) FFmpeg (https://ffmpeg.org) Contact Victor Github: t3hmrman (https://github.com/t3hmrman) and vados-cosmonic (https://github.com/vados-cosmonic) Twitter: @vadosware (https://x.com/vadosware) (https://x.com/vadosware) Web: vadosware.io (http://vadosware.io/) SDT News & Hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Get a SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers! Follow us: Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), Mastodon (https://hachyderm.io/@softwaredefinedtalk), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk), Threads (https://www.threads.net/@softwaredefinedtalk) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! Special Guest: Victor Adossi.
Topics covered in this episode:
* Free-threaded Python no longer "experimental" as of Python 3.14
* typed-ffmpeg
* pyleak
* Optimizing Test Execution: Running live_server Tests Last with pytest
* Extras
* Joke
Watch on YouTube
About the show
Sponsored by PropelAuth: pythonbytes.fm/propelauth66
Connect with the hosts
Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)
Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it.
Brian #1: Free-threaded Python no longer "experimental" as of Python 3.14
"PEP 779 ("Criteria for supported status for free-threaded Python") has been accepted, which means free-threaded Python is now a supported build!" - Hugo van Kemenade
PEP 779 – Criteria for supported status for free-threaded Python
As noted in the discussion of PEP 779, "The Steering Council (SC) approves PEP 779, with the effect of removing the "experimental" tag from the free-threaded build of Python 3.14." We are in Phase II then.
"We are confident that the project is on the right path, and we appreciate the continued dedication from everyone working to make free-threading ready for broader adoption across the Python community."
"Keep in mind that any decision to transition to Phase III, with free-threading as the default or sole build of Python is still undecided, and dependent on many factors both within CPython itself and the community. We leave that decision for the future."
How long will all this take? According to Thomas Wouters, a few years, at least: "In other words: it'll be a few years at least. It can't happen before 3.16 (because we won't have Stable ABI support until 3.15) and may well take longer."
Michael #2: typed-ffmpeg
typed-ffmpeg offers a modern, Pythonic interface to FFmpeg, providing extensive support for complex filters with detailed typing and documentation. Inspired by ffmpeg-python, this package enhances functionality by addressing common limitations, such as lack of IDE integration and comprehensive typing, while also introducing new features like JSON serialization of filter graphs and automatic FFmpeg validation. (A minimal usage sketch follows these show notes.)
Features:
* Zero Dependencies: Built purely with the Python standard library, ensuring maximum compatibility and security.
* User-Friendly: Simplifies the construction of filter graphs with an intuitive Pythonic interface.
* Comprehensive FFmpeg Filter Support: Out-of-the-box support for most FFmpeg filters, with IDE auto-completion.
* Integrated Documentation: In-line docstrings provide immediate reference for filter usage, reducing the need to consult external documentation.
* Robust Typing: Offers static and dynamic type checking, enhancing code reliability and development experience.
* Filter Graph Serialization: Enables saving and reloading of filter graphs in JSON format for ease of use and repeatability.
* Graph Visualization: Leverages graphviz for visual representation, aiding in understanding and debugging.
* Validation and Auto-correction: Assists in identifying and fixing errors within filter graphs.
* Input and Output Options Support: Provide a more comprehensive interface for input and output options, including support for additional codecs and formats.
* Partial Evaluation: Enhance the flexibility of filter graphs by enabling partial evaluation, allowing for modular construction and reuse.
* Media File Analysis: Built-in support for analyzing media files using FFmpeg's ffprobe utility, providing detailed metadata extraction with both dictionary and dataclass interfaces.
Michael #3: pyleak
Detect leaked asyncio tasks, threads, and event loop blocking with stack trace in Python. Inspired by goleak. Use as context managers or function decorators. When using no_task_leaks, you get detailed stack trace information showing exactly where leaked tasks are executing and where they were created. Even has great examples and a pytest plugin. (A short sketch follows these show notes.)
Brian #4: Optimizing Test Execution: Running live_server Tests Last with pytest
Tim Kamanin: "When working with Django applications, it's common to have a mix of fast unit tests and slower end-to-end (E2E) tests that use pytest's live_server fixture and browser automation tools like Playwright or Selenium."
Tim is running E2E tests last for:
* Faster feedback from quick tests
* Not tying up resources early in the test suite
He did this with:
* A custom "e2e" marker
* A pytest_collection_modifyitems hook function that looks for tests using the live_server fixture, automatically adds the e2e marker to those tests, and moves them to the end (a sketch of the hook follows these show notes)
The reason for the marker is to be able to:
* Just run e2e tests with -m e2e
* Avoid running them sometimes with -m "not e2e"
Cool small writeup. The technique works for any system that has some tests that are slower or resource bound based on a particular fixture or set of fixtures.
Extras
Brian:
* Is Free-Threading Our Only Option? - Interesting discussion started by Eric Snow and recommended by John Hagen
* Free-threaded Python on GitHub Actions - How to add FT tests to your projects, by Hugo van Kemenade
Michael:
* New course! LLM Building Blocks in Python
* Talk Python Deep Dives Complete: 600K Words of Talk Python Insights
* .folders on Linux - Write up on XDG for Python devs.
* They keep pulling me back - ChatGPT Pro with o3-pro
* Python Bytes is the #1 Python news podcast and #17 of all tech news podcasts.
* Python 3.13.4, 3.12.11, 3.11.13, 3.10.18 and 3.9.23 are now available
* Python 3.13.5 is now available!
Joke: Naming is hard
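For a feel of the typed-ffmpeg API mentioned above, here is a minimal sketch. It assumes the fluent, graph-building style of ffmpeg-python, which typed-ffmpeg is modeled on; the exact method and keyword names below are assumptions, so treat this as illustrative rather than copy-paste ready.

```python
# A minimal sketch, assuming typed-ffmpeg mirrors ffmpeg-python's fluent API
# (method and keyword names here are assumptions, not confirmed signatures).
import typed_ffmpeg as ffmpeg

stream = (
    ffmpeg.input("in.mp4")        # source node of the filter graph
    .hflip()                      # a filter node; IDE auto-completion covers FFmpeg filters
    .output(filename="out.mp4")   # sink node
)
stream.run()                      # the library validates the graph before invoking FFmpeg
```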
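Likewise, a sketch of pyleak's context-manager form. The no_task_leaks name comes straight from the notes above; how a leak is reported (warn vs. raise) is an assumption based on the project's description.

```python
# A minimal sketch of pyleak's no_task_leaks context manager; the exact
# reporting behavior (warn vs. raise) is an assumption from the description.
import asyncio
from pyleak import no_task_leaks

async def fire_and_forget():
    # A task that outlives its creator -- the kind of leak pyleak flags,
    # complete with a stack trace of where the task was created.
    asyncio.create_task(asyncio.sleep(60))

async def main():
    async with no_task_leaks():
        await fire_and_forget()

asyncio.run(main())
```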
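And a sketch of the conftest.py hook Tim's article describes. pytest_collection_modifyitems, item.fixturenames, and add_marker are standard pytest APIs; the "e2e" marker name is the one from the article (remember to register it in your pytest config).

```python
# conftest.py -- auto-mark tests that use the live_server fixture as "e2e"
# and reorder collection so they run after the fast unit tests.
import pytest

def pytest_collection_modifyitems(config, items):
    fast, slow = [], []
    for item in items:
        if "live_server" in getattr(item, "fixturenames", []):
            item.add_marker(pytest.mark.e2e)  # enables -m e2e / -m "not e2e"
            slow.append(item)
        else:
            fast.append(item)
    items[:] = fast + slow  # in-place reorder: unit tests first, E2E last
```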
This episode was edited entirely in Rupert-wave. Rupert-wave is audio editing software that my uncle made himself on his garage computer and gave to me during a family barbecue last summer. It came on a series of 8 CDs, and while I was going through the installation process, my PC coughed. In this episode we play "manga or children's book" and Wayne has us on a jury for Steam VAC bans, whatever that means. Send your emails to: podcast@radiotv.solutions Music by Justin Wabs
ThePrimeagen (aka Michael Paulson) is a programmer who has educated, entertained, and inspired millions of people to build software and have fun doing it. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep461-sc See below for timestamps, and to give feedback, submit questions, contact Lex, etc. CONTACT LEX: Feedback - give feedback to Lex: https://lexfridman.com/survey AMA - submit questions, videos or call-in: https://lexfridman.com/ama Hiring - join our team: https://lexfridman.com/hiring Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS: ThePrimeagen's X: https://twitter.com/ThePrimeagen ThePrimeagen's YouTube: https://youtube.com/ThePrimeTimeagen ThePrimeagen's Twitch: https://twitch.tv/ThePrimeagen ThePrimeagen's GitHub: https://github.com/theprimeagen ThePrimeagen's TikTok: https://tiktok.com/@theprimeagen ThePrimeagen's Coffee: https://www.terminal.shop/ SPONSORS: To support this podcast, check out our sponsors & get discounts: Invideo AI: AI video generator. Go to https://invideo.io/i/lexpod Shopify: Sell stuff online. Go to https://shopify.com/lex NetSuite: Business management software. Go to http://netsuite.com/lex BetterHelp: Online therapy and counseling. Go to https://betterhelp.com/lex AG1: All-in-one daily nutrition drinks. Go to https://drinkag1.com/lex OUTLINE: (00:00) - Introduction (10:27) - Love for programming (20:00) - Hardest part of programming (22:16) - Types of programming (29:54) - Life story (39:58) - Hardship (41:29) - High school (47:15) - Porn addiction (57:01) - God (1:12:44) - Perseverance (1:22:40) - Netflix (1:35:08) - Groovy (1:40:13) - Printf() debugging (1:46:35) - Falcor (1:56:05) - Breaking production (1:58:49) - Pieter Levels (2:03:19) - Netflix, Twitch, and YouTube infrastructure (2:15:22) - ThePrimeagen origin story (2:30:37) - Learning programming languages (2:39:40) - Best programming languages in 2025 (2:44:35) - Python (2:45:15) - HTML & CSS (2:46:05) - Bash (2:46:45) - FFmpeg (2:53:28) - Performance (2:56:00) - Rust (3:00:48) - Epic projects (3:14:12) - Asserts (3:23:26) - ADHD (3:31:34) - Productivity (3:35:58) - Programming setup (4:11:28) - Coffee (4:18:32) - Programming with AI (5:01:16) - Advice for young programmers (5:12:48) - Reddit questions (5:20:20) - God PODCAST LINKS: - Podcast Website: https://lexfridman.com/podcast - Apple Podcasts: https://apple.co/2lwqZIr - Spotify: https://spoti.fi/2nEwCF8 - RSS: https://lexfridman.com/feed/podcast/ - Podcast Playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4 - Clips Channel: https://www.youtube.com/lexclips
This week, we discuss essential Linux applications, explore the exciting new features of Linux Kernel 6.13, and discover how to control FFmpeg with natural language commands. Plus, we'll discuss Nvidia's plans for a consumer-friendly ARM chip.
Guest Kailash Nadh Panelist Richard Littauer Show Notes In this episode of Sustain, host Richard Littauer sits down with Kailash Nadh, CTO of Zerodha, to delve into the dynamics of funding and sustaining open-source projects. They explore the establishment of Zerodha's FLOSS/Fund, which allocates a million dollars annually to support pivotal open source projects and discuss the development of the funding.json format to streamline grant applications. The conversation also covers the challenges of creating such funds, including regulatory hurdles, and aims to make financial assistance globally accessible. From detailing efforts to revive India's open-source communities through the FOSS United Foundation to highlighting the obstacles and innovative models in funding open-source software, the episode provides a comprehensive look at both global and Indian perspectives. Hit download now! [00:01:14] Richard brings up the FLOSS/Fund, a $1 million annual commitment to open source projects. Kailash confirms that the fund is still active and explains how it recently became more structured and a small team has been formed to manage the logistics of the fund. [00:02:48] The FLOSS/Fund has been created to publicly commit to supporting open source in a structured way. Kailash points out that while other companies give donations to open source, there are few structured initiatives from large organizations. [00:04:33] Kailash expresses frustration that few billion-dollar companies have set up similar initiatives to support open source projects. [00:06:24] Kailash explains that the FLOSS/Fund is open to the global open source community and targets systemically important projects like libraries and widely used software tools. [00:08:14] Richard inquires about the application process and Kailash explains instead of traditional grant forms, projects must create and publish a "funding.json" file. [00:10:35] Kailash shares that the structured application method is designed to avoid the usual awkwardness of fundraising conversations and streamline the process. [00:13:31] The two discuss the difficulty maintainers face when articulating the importance of their projects, particularly for maintainers who may not have strong written communication skills. Kailash emphasizes that the funding.json method does not replace narrative descriptions but simplifies signaling. [00:16:17] The conversation switches to global scope and prioritization as Kailash tells us Zerodha's open source contributions are not limited to projects they directly use, the fund is open to all global projects, and Zerodha hopes to support projects that are crucial for open source infrastructure. [00:17:09] Kailash discusses the complexity of sending money internationally from India. [00:18:59] We learn the goal is not to make funding.json go viral through financial incentives, but to organically grow adoption if the tool proves valuable. [00:20:49] Richard and Kailash explore the broader challenges of sustaining open source projects beyond funding, such as building healthy communities and incentivizing the proper use and citation of open source infrastructure. [00:25:32] Kailash discusses the Indian open source ecosystem. [00:30:29] Kailash explains how Zerodha's initiatives aim to push the Indian industry to give back more to the open source community. He hopes that their efforts will inspire other companies to set up similar initiatives. [00:32:12] Find out where you can donate to the FLOSS/Fund and follow Kailash online.
Spotlight [00:32:56] Richard's spotlight is his first grade teacher, Mrs. Barril. [00:33:25] Kailash's spotlight is Jim Martsolf who introduced him to “webmastering.” Links SustainOSS (https://sustainoss.org/) podcast@sustainoss.org (mailto:podcast@sustainoss.org) richard@sustainoss.org (mailto:richard@sustainoss.org) SustainOSS Discourse (https://discourse.sustainoss.org/) SustainOSS Mastodon (https://mastodon.social/tags/sustainoss) Open Collective-SustainOSS (Contribute) (https://opencollective.com/sustainoss) Richard Littauer Socials (https://www.burntfen.com/2023-05-30/socials) Kailash Nadh LinkedIn (https://www.linkedin.com/in/kailashnadh/) Kailash Nadh Website (https://nadh.in/) Zerodha (https://zerodha.com/) funding.json (https://floss.fund/funding-manifest/) Sustain Podcast-Episode 153: Kailash Nadh and the FOSS United Foundation (https://podcast.sustainoss.org/153) FLOSS/Fund (https://floss.fund/) FFmpeg (https://ffmpeg.org/) Zig (https://ziglang.org/) Sustain Podcast-Episode 247: Chad Whitacre on the Open Source Pledge (https://podcast.sustainoss.org/247) Open Source Pledge (https://opensourcepledge.com/) Announcing FLOSS/fund: $1M per year for free and open source projects-post by Kailash Nadh (https://floss.fund/blog/announcing-floss-fund/) Credits Produced by Richard Littauer (https://www.burntfen.com/) Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/) Show notes by DeAnn Bahr Peachtree Sound (https://www.peachtreesound.com/) Special Guest: Kailash Nadh.
Choose a VPS with the little monkey — https://fotbo.com.ua/?utm_source=bloggers&utm_medium=yt&utm_campaign=dou_2024
We are recording our next big recap episode and taking questions! Submit questions and messages on Speakpipe here for a chance to appear on the show!Also subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups!In our first ever episode with Logan Kilpatrick we called out the two hottest LLM frameworks at the time: LangChain and Dust. We've had Harrison from LangChain on twice (as a guest and as a co-host), and we've now finally come full circle as Stanislas from Dust joined us in the studio.After stints at Oracle and Stripe, Stan had joined OpenAI to work on mathematical reasoning capabilities. He describes his time at OpenAI as "the PhD I always wanted to do" while acknowledging the challenges of research work: "You're digging into a field all day long for weeks and weeks, and you find something, you get super excited for 12 seconds. And at the 13 seconds, you're like, 'oh, yeah, that was obvious.' And you go back to digging." This experience, combined with early access to GPT-4's capabilities, shaped his decision to start Dust: "If we believe in AGI and if we believe the timelines might not be too long, it's actually the last train leaving the station to start a company. After that, it's going to be computers all the way down."The History of DustDust's journey can be broken down into three phases:* Developer Framework (2022): Initially positioned as a competitor to LangChain, Dust started as a developer tooling platform. While both were open source, their approaches differed – LangChain focused on broad community adoption and integration as a pure developer experience, while Dust emphasized UI-driven development and better observability that wasn't just `print` statements.* Browser Extension (Early 2023): The company pivoted to building XP1, a browser extension that could interact with web content. This experiment helped validate user interaction patterns with AI, even while using less capable models than GPT-4.* Enterprise Platform (Current): Today, Dust has evolved into an infrastructure platform for deploying AI agents within companies, with impressive metrics like 88% daily active users in some deployments.The Case for Being HorizontalThe big discussion for early stage companies today is whether or not to be horizontal or vertical. Since models are so good at general tasks, a lot of companies are building vertical products that take care of a workflow end-to-end in order to offer more value and becoming more of “Services as Software”. Dust on the other hand is a platform for the users to build their own experiences, which has had a few advantages:* Maximum Penetration: Dust reports 60-70% weekly active users across entire companies, demonstrating the potential reach of horizontal solutions rather than selling into a single team.* Emergent Use Cases: By allowing non-technical users to create agents, Dust enables use cases to emerge organically from actual business needs rather than prescribed solutions.* Infrastructure Value: The platform approach creates lasting value through maintained integrations and connections, similar to how Stripe's value lies in maintaining payment infrastructure. Rather than relying on third-party integration providers, Dust maintains its own connections to ensure proper handling of different data types and structures.The Vertical ChallengeHowever, this approach comes with trade-offs:* Harder Go-to-Market: As Stan talked about: "We spike at penetration... but it makes our go-to-market much harder. 
Vertical solutions have a go-to-market that is much easier because they're like, 'oh, I'm going to solve the lawyer stuff.'"* Complex Infrastructure: Building a horizontal platform requires maintaining numerous integrations and handling diverse data types appropriately – from structured Salesforce data to unstructured Notion pages. As you scale integrations, the cost of maintaining them also scales. * Product Surface Complexity: Creating an interface that's both powerful and accessible to non-technical users requires careful design decisions, down to avoiding technical terms like "system prompt" in favor of "instructions." The Future of AI PlatformsStan initially predicted we'd see the first billion-dollar single-person company in 2023 (a prediction later echoed by Sam Altman), but he's now more focused on a different milestone: billion-dollar companies with engineering teams of just 20 people, enabled by AI assistance.This vision aligns with Dust's horizontal platform approach – building the infrastructure that allows small teams to achieve outsized impact through AI augmentation. Rather than replacing entire job functions (the vertical approach), they're betting on augmenting existing workflows across organizations.Full YouTube EpisodeChapters* 00:00:00 Introductions* 00:04:33 Joining OpenAI from Paris* 00:09:54 Research evolution and compute allocation at OpenAI* 00:13:12 Working with Ilya Sutskever and OpenAI's vision* 00:15:51 Leaving OpenAI to start Dust* 00:18:15 Early focus on browser extension and WebGPT-like functionality* 00:20:20 Dust as the infrastructure for agents* 00:24:03 Challenges of building with early AI models* 00:28:17 LLMs and Workflow Automation* 00:35:28 Building dependency graphs of agents* 00:37:34 Simulating API endpoints* 00:40:41 State of AI models* 00:43:19 Running evals* 00:46:36 Challenges in building AI agents infra* 00:49:21 Buy vs. build decisions for infrastructure components* 00:51:02 Future of SaaS and AI's Impact on Software* 00:53:07 The single employee $1B company race* 00:56:32 Horizontal vs. vertical approaches to AI agentsTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:11]: Hey, and today we're in a studio with Stanislas, welcome.Stan [00:00:14]: Thank you very much for having me.Swyx [00:00:16]: Visiting from Paris.Stan [00:00:17]: Paris.Swyx [00:00:18]: And you have had a very distinguished career. It's very hard to summarize, but you went to college in both Ecopolytechnique and Stanford, and then you worked in a number of places, Oracle, Totems, Stripe, and then OpenAI pre-ChatGPT. We'll talk, we'll spend a little bit of time about that. About two years ago, you left OpenAI to start Dust. I think you were one of the first OpenAI alum founders.Stan [00:00:40]: Yeah, I think it was about at the same time as the Adept guys, so that first wave.Swyx [00:00:46]: Yeah, and people really loved our David episode. We love a few sort of OpenAI stories, you know, for back in the day, like we're talking about pre-recording. Probably the statute of limitations on some of those stories has expired, so you can talk a little bit more freely without them coming after you. But maybe we'll just talk about, like, what was your journey into AI? You know, you were at Stripe for almost five years, there are a lot of Stripe alums going into OpenAI. 
I think the Stripe culture has come into OpenAI quite a bit.Stan [00:01:11]: Yeah, so I think the buses of Stripe people really started flowing in, I guess, after ChatGPT. But, yeah, my journey into AI is a... I mean, Greg Brockman. Yeah, yeah. From Greg, of course. And Daniela, actually, back in the days, Daniela Amodei.Swyx [00:01:27]: Yes, she was COO, I mean, she is COO, yeah. She had a pretty high job at OpenAI at the time, yeah, for sure.Stan [00:01:34]: My journey started as anybody else, you're fascinated with computer science and you want to make them think, it's awesome, but it doesn't work. I mean, it was a long time ago, it was like maybe 16, so it was 25 years ago. Then the first big exposure to AI would be at Stanford, and I'm going to, like, disclose a whole lamb, because at the time it was a class taught by Andrew Ng, and there was no deep learning. It was half features for vision and a star algorithm. So it was fun. But it was the early days of deep learning. At the time, I think a few years after, it was the first project at Google. But you know, that cat face or the human face trained from many images. I went to, hesitated doing a PhD, more in systems, eventually decided to go into getting a job. Went at Oracle, started a company, did a gazillion mistakes, got acquired by Stripe, worked with Greg Buckman there. And at the end of Stripe, I started interesting myself in AI again, felt like it was the time, you had the Atari games, you had the self-driving craziness at the time. And I started exploring projects, it felt like the Atari games were incredible, but there were still games. And I was looking into exploring projects that would have an impact on the world. And so I decided to explore three things, self-driving cars, cybersecurity and AI, and math and AI. It's like I sing it by a decreasing order of impact on the world, I guess.Swyx [00:03:01]: Discovering new math would be very foundational.Stan [00:03:03]: It is extremely foundational, but it's not as direct as driving people around.Swyx [00:03:07]: Sorry, you're doing this at Stripe, you're like thinking about your next move.Stan [00:03:09]: No, it was at Stripe, kind of a bit of time where I started exploring. I did a bunch of work with friends on trying to get RC cars to drive autonomously. Almost started a company in France or Europe about self-driving trucks. We decided to not go for it because it was probably very operational. And I think the idea of the company, of the team wasn't there. And also I realized that if I wake up a day and because of a bug I wrote, I killed a family, it would be a bad experience. And so I just decided like, no, that's just too crazy. And then I explored cybersecurity with a friend. We're trying to apply transformers to cut fuzzing. So cut fuzzing, you have kind of an algorithm that goes really fast and tries to mutate the inputs of a library to find bugs. And we tried to apply a transformer to that and do reinforcement learning with the signal of how much you propagate within the binary. Didn't work at all because the transformers are so slow compared to evolutionary algorithms that it kind of didn't work. Then I started interested in math and AI and started working on SAT solving with AI. And at the same time, OpenAI was kind of starting the reasoning team that were tackling that project as well. I was in touch with Greg and eventually got in touch with Ilya and finally found my way to OpenAI. I don't know how much you want to dig into that. 
The way to find your way to OpenAI when you're in Paris was kind of an interesting adventure as well.Swyx [00:04:33]: Please. And I want to note, this was a two-month journey. You did all this in two months.Stan [00:04:38]: The search.Swyx [00:04:40]: Your search for your next thing, because you left in July 2019 and then you joined OpenAI in September.Stan [00:04:45]: I'm going to be ashamed to say that.Swyx [00:04:47]: You were searching before. I was searching before.Stan [00:04:49]: I mean, it's normal. No, the truth is that I moved back to Paris through Stripe and I just felt the hardship of being remote from your team nine hours away. And so it kind of freed a bit of time for me to start the exploration before. Sorry, Patrick. Sorry, John.Swyx [00:05:05]: Hopefully they're listening. So you joined OpenAI from Paris and from like, obviously you had worked with Greg, but notStan [00:05:13]: anyone else. No. Yeah. So I had worked with Greg, but not Ilya, but I had started chatting with Ilya and Ilya was kind of excited because he knew that I was a good engineer through Greg, I presume, but I was not a trained researcher, didn't do a PhD, never did research. And I started chatting and he was excited all the way to the point where he was like, hey, come pass interviews, it's going to be fun. I think he didn't care where I was, he just wanted to try working together. So I go to SF, go through the interview process, get an offer. And so I get Bob McGrew on the phone for the first time, he's like, hey, Stan, it's awesome. You've got an offer. When are you coming to SF? I'm like, hey, it's awesome. I'm not coming to the SF. I'm based in Paris and we just moved. He was like, hey, it's awesome. Well, you don't have an offer anymore. Oh, my God. No, it wasn't as hard as that. But that's basically the idea. And it took me like maybe a couple more time to keep chatting and they eventually decided to try a contractor set up. And that's how I kind of started working at OpenAI, officially as a contractor, but in practice really felt like being an employee.Swyx [00:06:14]: What did you work on?Stan [00:06:15]: So it was solely focused on math and AI. And in particular in the application, so the study of the larger grid models, mathematical reasoning capabilities, and in particular in the context of formal mathematics. The motivation was simple, transformers are very creative, but yet they do mistakes. Formal math systems are of the ability to verify a proof and the tactics they can use to solve problems are very mechanical, so you miss the creativity. And so the idea was to try to explore both together. You would get the creativity of the LLMs and the kind of verification capabilities of the formal system. A formal system, just to give a little bit of context, is a system in which a proof is a program and the formal system is a type system, a type system that is so evolved that you can verify the program. If the type checks, it means that the program is correct.Swyx [00:07:06]: Is the verification much faster than actually executing the program?Stan [00:07:12]: Verification is instantaneous, basically. So the truth is that what you code in involves tactics that may involve computation to search for solutions. So it's not instantaneous. You do have to do the computation to expand the tactics into the actual proof. 
The verification of the proof at the very low level is instantaneous.
Swyx [00:07:32]: How quickly do you run into, like, you know, halting problem, P vs NP type things, like impossibilities where you're just stuck?
Stan [00:07:39]: I mean, you don't run into it. At the time, it was really trying to solve very easy problems. So I think the... Can you give an example of easy? Yeah, so that's the MATH benchmark that everybody knows today. The Dan Hendrycks one. The Dan Hendrycks one, yeah. And I think it was the low-end part of the MATH benchmark at the time, because that benchmark includes AMC problems, AMC 8, AMC 10, 12. So these are the easy ones. Then AIME problems, somewhat harder, and some IMO problems, like, crazy hard.
Swyx [00:08:07]: For our listeners, we covered this in our Benchmarks 101 episode. AMC is literally the grade level, like high school grade 8, grade 10, grade 12. So you can solve this. Just briefly to mention this, because I don't think we'll touch on this again: there's a bit of work with, like, Lean, and then, more recently, DeepMind scoring, like, silver on the IMO. Any commentary on how math has evolved from your early work to today?
Stan [00:08:34]: I mean, that result is mind-blowing. From my perspective, I spent three years on that. At the same time, Guillaume Lample in Paris (we were both in Paris, actually; he was at FAIR) was working on some problems. We were pushing the boundaries, and the goal was the IMO. And we cracked a few problems here and there. But the idea of getting a medal at an IMO was just remote. So this is an impressive result. And I think the DeepMind team just did a good job of scaling. I think there's nothing too magical in their approach, even if it hasn't been published. There's a David Silver talk from seven days ago that goes a little bit more into the details. It feels like there's nothing magical there. It's really applying reinforcement learning and scaling up the amount of data they can generate through autoformalization. We can dig into what autoformalization means if you want.
Alessio [00:09:26]: Let's talk about the tail end, maybe, of the OpenAI years. So you joined, and you're like, I'm going to work on math and do all of these things. I saw on one of your blog posts, you mentioned you fine-tuned over 10,000 models at OpenAI using 10 million A100 hours. How did the research evolve from GPT-2, and then getting closer to DaVinci 003? And then you left just before ChatGPT was released, but tell people a bit more about the research path that took you there.
Stan [00:09:54]: I can give you my perspective on it. I think at OpenAI, there's always been a large chunk of the compute that was reserved to train the GPTs, which makes sense. This was pre-Anthropic split. Most of the compute was going to a product called Nest, which was basically GPT-3. And then you had a bunch of, let's say, remote, not-core research teams that were trying to explore maybe more specific problems or maybe the algorithm part of it. The interesting part (I don't know if it was where your question was going) is that in those labs, you're managing researchers. So by definition, you shouldn't be managing them. But in that space, there's a managing tool that is great, which is compute allocation. Basically, by managing the compute allocation, you can signal to the teams where you think the priority should go. And so it was really a question of... You were free as a researcher to work on whatever you wanted.
But if it was not aligned with OpenAI's mission, and that's fair, you wouldn't get the compute allocation. As it happens, solving math was very much aligned with the direction of OpenAI. And so I was lucky to generally get the compute I needed to make good progress.
Swyx [00:11:06]: What do you need to show as incremental results to get funded for further results?
Stan [00:11:12]: It's an imperfect process because there's a bit of a... If you're working on math and AI, obviously there's kind of a prior that it's going to be aligned with the company. So it's much easier than going into something much riskier, I guess. You have to show incremental progress. It's like you ask for a certain amount of compute, you deliver a few weeks after, and you demonstrate that you've made progress. Progress might be a positive result. Progress might be a strong negative result. And a strong negative result is actually often much harder to get, and much more interesting, than a positive result. And then it generally goes into, as in any organization, you would have people finding your project, or any other project, cool and fancy. And so you would have that kind of phase of growing compute allocation for it, all the way to a point. And then maybe you reach an apex, and then maybe you go back mostly to zero and restart the process because you're going in a different direction or something else. That's how I felt. Explore, exploit. Yeah, exactly. Exactly. Exactly. It's a reinforcement learning approach.
Swyx [00:12:14]: Classic PhD student search process.
Alessio [00:12:17]: And you were reporting to Ilya, like, the results you were bringing back to him, or what was the structure? It's almost like when you're doing such cutting-edge research, you need to report to somebody who is actually really smart to understand that the direction is right.
Stan [00:12:29]: So we had a reasoning team, which was working on reasoning, obviously, and so math in general. And that team had a manager, but Ilya was extremely involved in the team as an advisor, I guess. Since he brought me into OpenAI, I was lucky, mostly during the first years, to have kind of direct access to him. He would really coach me as a trainee researcher, I guess, with good engineering skills. And Ilya, I think at OpenAI, he was the one showing the North Star, right? It was his job, and I think he really enjoyed it and did it super well: going through the teams and saying, this is where we should be going, and trying to, you know, flock the different teams together towards an objective.
Swyx [00:13:12]: I would say the public perception of him is that he was the strongest believer in scaling. Oh, yeah. Obviously, he has always pursued the compression thesis. You have worked with him personally; what does the public not know about how he works?
Stan [00:13:26]: I think he's really focused on building the vision and communicating the vision within the company, which was extremely useful. I was personally surprised that he spent so much time, you know, working on communicating that vision and getting the teams to work together versus...
Swyx [00:13:40]: To be specific, vision is AGI? Oh, yeah.
Stan [00:13:42]: Vision is, like, yeah, it's the belief in compression and scaling compute. I remember when I started working on the reasoning team, the excitement was really about scaling the compute around reasoning, and that was really the belief we wanted to ingrain in the team.
And that's been useful to the team, and the DeepMind results, along with the success of GPT-4, show that it was the right approach.
Swyx [00:14:06]: Was it according to the neural scaling laws, the Kaplan paper that was published?
Stan [00:14:12]: I think it was before that, because those came with GPT-3, basically at the time of GPT-3 being released, or being ready internally. But before that, there really was a strong belief in scale. I think it was just the belief that the transformer was a generic enough architecture that you could learn anything, and that it was just a question of scaling.
Alessio [00:14:33]: Any other fun stories you want to tell? Sam Altman, Greg, you know, anything.
Stan [00:14:37]: Weirdly, I didn't work that much with Greg when I was at OpenAI. He had always been mostly focused on training the GPTs, and rightfully so. One thing about Sam Altman: he really impressed me, because when I joined, he had joined not that long ago, and it felt like he was kind of a very high-level CEO. And I was mind-blown by how deep he was able to go into the subjects within a year or something, all the way to a situation where, by year two, when I was having lunch with him at OpenAI, he would just know quite deeply what I was doing. With no ML background. Yeah, with no ML background, but I didn't have any either, so I guess that explains why. But I think it's a question of: you don't necessarily need to understand the very technicalities of how things are done, but you need to understand what the goal is, what's being done, and what the recent results are, and all of that. And we could have a very productive discussion. And that really impressed me, given the size of OpenAI at the time, which was not negligible.
Swyx [00:15:44]: Yeah. I mean, you were a founder before, you're a founder now, and you've seen Sam as a founder. How has he affected you as a founder?
Stan [00:15:51]: I think having that capability of changing the scale of your attention in the company, because most of the time you operate at a very high level, but being able to go deep down and be in the know of what's happening on the ground, is something that I feel is really enlightening. That's not a place in which I ever was as a founder, because first company, we went all the way to 10 people. Current company, there's 25 of us. So the high level, the sky, and the ground are pretty much at the same place. No, you're being too humble.
Swyx [00:16:21]: I mean, Stripe was also like a huge rocket ship.
Stan [00:16:23]: At Stripe, I wasn't a founder. So there, like at OpenAI, I was really happy being on the ground, pushing the machine, making it work. Yeah.
Swyx [00:16:31]: Last OpenAI question. The Anthropic split you mentioned, you were around for that. Very dramatic. David also left around that time, you left. This year, we've also had a similar management shakeup, let's just call it. Can you compare what it was like going through that split during that time? And does that have any similarities now? Like, are we going to see a new Anthropic emerge from these folks that just left?
Stan [00:16:54]: That I really, really don't know. At the time, the split was pretty surprising because they had been training GPT-3, it was a success. And to be completely transparent, I wasn't in the weeds of the split. What I understood of it is that there was a disagreement about the commercialization of that technology.
I think the focal point of that disagreement was the fact that we started working on the API and wanted to make those models available through an API. Is that really the core disagreement? I don't know.
Swyx [00:17:25]: Was it safety?
Stan [00:17:26]: Was it commercialization?
Swyx [00:17:27]: Or did they just want to start a company?
Stan [00:17:28]: Exactly. Exactly. That I don't know. But I think what I was surprised by is how quickly OpenAI recovered at the time. And I think it's just because we were mostly a research org and the mission was so clear that some divergence in some teams, some people leaving, the mission is still there. We have the compute. We have a site. So it just keeps going.
Swyx [00:17:50]: Very deep bench. Like, just a lot of talent. Yeah.
Alessio [00:17:53]: So that was the OpenAI part of the history. Exactly. So then you leave OpenAI in September 2022. And I would say in Silicon Valley, the two hottest companies at the time were you and LangChain. What was that start like, and why did you decide to start with a more developer-focused, kind of AI-engineer tool, rather than going back into more research or something else?
Stan [00:18:15]: Yeah. First, I'm not a trained researcher. So going through OpenAI was really kind of the PhD I always wanted to do. But research is hard. You're digging into a field all day long for weeks and weeks and weeks, and you find something, you get super excited for 12 seconds. And at the 13th second, you're like, oh, yeah, that was obvious. And you go back to digging. I'm not a formally trained researcher, and it wasn't necessarily an ambition of mine to have a research career. And I felt the hardness of it, even though I enjoyed it a ton. But at the time, I decided that I wanted to go back to something more productive. And the other fun motivation was, I mean, if we believe in AGI and if we believe the timelines might not be too long, it's actually the last train leaving the station to start a company. After that, it's going to be computers all the way down. And so that was kind of the true motivation, personally. And the motivation for starting a company was pretty simple. I had seen GPT-4 internally. At the time, it was September 2022, so it was pre-ChatGPT, but GPT-4 had been ready internally for a few months. I was like, okay, that's obvious: the capabilities are there to create an insane amount of value for the world, and yet the deployment is not there yet. The revenue of OpenAI at the time was ridiculously small compared to what it is today. So the thesis was: there's probably a lot to be done at the product level to unlock the usage.
Alessio [00:19:49]: Yeah. Let's talk a bit more about the form factor, maybe. I think one of the first successes you had was kind of like the WebGPT-like thing, using the models to traverse the web and summarize things. And the browser was really the interface. Why did you start with the browser? Why was it important? And then you built XP1, which was kind of like the browser extension.
Stan [00:20:09]: So the starting point at the time was: if you wanted to talk about LLMs, it was still a rather small community, a community of mostly researchers and, to some extent, very early adopters, very early engineers.
It was almost inconceivable to just build a product and go sell it to the enterprise, though at the time there were a few companies doing that. The one in marketing, I don't remember its name... Jasper. But so the natural first intention, the very first intention, was to go to the developers and try to create tooling for them to create products on top of those models. And so that's what Dust was originally. It was quite different from LangChain, and LangChain just beat the s**t out of us, which is great. It's a choice.
Swyx [00:20:53]: You were cloud, and closed source. They were open source.
Stan [00:20:56]: Yeah. So technically we were open source, and we still are open source, but I think that doesn't really matter. I had the strong belief from my research time that you cannot create an LLM-based workflow on just one example. Basically, if you just have one example, you overfit. So as you develop your interaction, your orchestration around the LLM, you need a dozen examples. Obviously, if you're running a dozen examples on a multi-step workflow, you start parallelizing stuff. And if you do that in the console, you just have a messy stream of tokens going out, and it's very hard to observe what's going on there. And so the idea was to go with a UI so that you could easily introspect the output of each interaction with the model and dig in there through a UI, which is-
Swyx [00:21:42]: Was that open source? I actually didn't come across it.
Stan [00:21:44]: Oh yeah, it was. I mean, Dust is entirely open source even today. We're not going for an open source-
Swyx [00:21:48]: If it matters, I didn't know that.
Stan [00:21:49]: No, no, no, no, no. The reason why is that we're not open source because of a strategy. It's not an open source go-to-market at all. We're open source because we can, and it's fun.
Swyx [00:21:59]: Open source is marketing. You have all the downsides of open source, which is, like, people can clone you.
Stan [00:22:03]: But I think that downside is a big fallacy. Okay. Yes, anybody can clone Dust today, but the value of Dust is not the current state. The value of Dust is the number of eyeballs and hands of developers that are contributing to it in the future. And so yes, anybody can clone it today, but that wouldn't change anything. There is some value in being open source. In a discussion with the security team, you can be extremely transparent and just show the code. When you have a discussion with users and there's a bug or a feature missing, you can just point to the issue, show the pull request. Exactly, oh, PR welcome. That doesn't happen that much, but you can show the progress, and if the person that you're chatting with is a little bit technical, they really enjoy seeing the pull request advancing and seeing it all the way to deploy. And then the downsides are mostly around security. You never want to do security by obfuscation. But the truth is that your vector of attack is facilitated by you being open source. At the same time, it's a good thing, because if you're doing anything like a bug bounty or stuff like that, you give many more tools to the bug bounty hunters, so their output is much better. So there are many, many, many trade-offs. I don't believe in the value of the code base per se. I think it's really the people that are on the code base that have the value, and the go-to-market and the product and all of those things that are around the code base. Obviously, that's not true for every code base.
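The "dozen examples or you overfit" point is easy to make concrete. A minimal sketch, assuming a hypothetical callModel client and a two-step workflow; the design point is keeping every intermediate output per example so a UI can display each step, rather than one interleaved token stream in a console:

```ts
// Sketch of running a multi-step LLM workflow over a batch of examples.
// `callModel` is a hypothetical LLM client, not Dust's actual API.
declare function callModel(prompt: string): Promise<string>;

type Example = { input: string; expected: string };

async function runWorkflow(input: string): Promise<{ steps: string[]; output: string }> {
  const steps: string[] = [];
  const summary = await callModel(`Summarize: ${input}`);
  steps.push(summary); // keep the intermediate output for inspection
  const answer = await callModel(`Answer using this summary: ${summary}`);
  steps.push(answer);
  return { steps, output: answer };
}

// Run all examples in parallel and keep per-step traces, so a UI can show
// where each example's workflow went wrong.
async function evaluate(examples: Example[]) {
  return Promise.all(
    examples.map(async (ex) => {
      const run = await runWorkflow(ex.input);
      return { ...ex, ...run, pass: run.output.includes(ex.expected) };
    })
  );
}
```

Nothing here is framework-specific; the value is purely in making the intermediate steps of a dozen parallel runs inspectable instead of overfitting the orchestration to a single example.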
If you're working on a very secret kernel to accelerate the inference of LLMs, I would buy that you don't want to be open source. But for product stuff, I really think there's very little risk. Yeah.
Alessio [00:23:39]: I signed up for XP1, I was looking: January 2023. I think at the time you were on DaVinci 003. Given that you had seen GPT-4, how did you feel having to push a product out that was using this model that was so inferior? And you're like, please, just use it today, I promise it's going to get better. Just overall, as a founder, how do you build something that maybe doesn't quite work with the model today, but you're just expecting the new model to be better?
Stan [00:24:03]: Yeah, so actually, XP1 was even on a smaller one that was the post-GPT release, small version, so it was... Ada, Babbage... No, no, no, not that far away. But it was the small version of GPT, basically. I don't remember its name. Yes, you have a frustration there. But at the same time, I think XP1 was an experiment, but it was designed as a way to be useful at the current capability of the model. If you just want to extract data from a LinkedIn page, that model was just fine. If you want to summarize an article in a newspaper, that model was just fine. And so it was really a question of trying to find a product that works with the current capability, knowing that you will always have tailwinds as models get better and faster and cheaper. So that was kind of a... There's a bit of frustration, because you know what's out there and you know that you don't have access to it yet. But it's also interesting to try to find a product that works with the current capability.
Alessio [00:24:55]: And we highlighted XP1 in our anatomy of autonomy post in April of last year, which was, you know, where are all the agents, right? So now we spent 30 minutes getting to what you're building now. So you basically had a developer framework, then you had a browser extension, then you had all these things, and then you kind of got to where Dust is today. So maybe just give people an overview of what Dust is today and the core thesis behind it. Yeah, of course.
Stan [00:25:20]: So Dust: we really want to build the infrastructure so that companies can deploy agents within their teams. We are horizontal by nature, because we strongly believe in the emergence of use cases from the people having access to creating agents, who don't need to be developers. They have to be thinkers. They have to be curious. But anybody can create an agent that will solve an operational thing that they're doing in their day-to-day job. And to make those agents useful, there are two focuses, which is interesting. The first one is an infrastructure focus. You have to build the pipes so that the agent has access to the data. You have to build the pipes such that the agents can take action, can access the web, et cetera. So that's really an infrastructure play. Maintaining connections to Notion, Slack, GitHub, all of them, is a lot of work. It is boring work, boring infrastructure work, but that's something that we know is extremely valuable, in the same way that Stripe is extremely valuable because it maintains the pipes. And we have that dual focus because we're also building the product for people to use it. And there it's fascinating, because everything started from the conversational interface, obviously, which is a great starting point. But we're only scratching the surface, right? I think we are at the Pong level of LLM productization.
And we haven't invented the C3. We haven't invented Counter-Strike. We haven't invented Cyberpunk 2077. So our mission is really to create the product that lets people equip themselves to strip away all the work that can be automated or assisted by LLMs.
Alessio [00:26:57]: And can you just comment on different takes that people had? So maybe the most open is, like, AutoGPT. It's just kind of trying to do anything. It's like it's all magic; there's no way for you to do anything. Then you had Adept (you know, we had David on the podcast), very, like, super hands-on with each individual customer to build something super tailored. How do you decide where to draw the line between "this is magic" and "this is exposed to you", especially in a market where most people don't know how to build with AI at all? So if you expect them to do the thing, they're probably not going to do it. Yeah, exactly.
Stan [00:27:29]: So the AutoGPT approach obviously is extremely exciting, but we know that the agentic capabilities of models are not quite there yet. It just gets lost. So we're starting where it works. Same with XP1. And where it works is pretty simple. It's simple workflows that involve a couple of tools, where you don't even need to have the model decide which tools to use, in the sense that you just want people to put it in the instructions. It's like: take that page, do that search, pick up that document, do the work that I want in the format I want, and give me the results. There's no smartness there, right? In terms of orchestrating the tools, it's mostly using English for people to program a workflow, where you don't have the constraint of having compatible APIs between the tools.
Swyx [00:28:17]: That kind of personal automation, would you say it's kind of like an LLM Zapier type of thing? Like, if this, then that; and then, you know, do this, then this. You're programming with English?
Stan [00:28:28]: So you're programming with English. So you're just saying, oh, do this and then that. You can even create some form of APIs. You say: when I give you the command X, do this; when I give you the command Y, do this. And you describe the workflow. But you don't have to create boxes and create the workflow explicitly. You just need to describe what the tasks are supposed to be and make the tools available to the agent. The tool can be a semantic search. The tool can be querying a structured database. The tool can be searching on the web. And obviously, the interesting tools that we're only starting to scratch are actually creating external actions, like reimbursing something on Stripe, sending an email, clicking on a button in the admin, or something like that.
Swyx [00:29:11]: Do you maintain all these integrations?
Stan [00:29:13]: Today, we maintain most of the integrations. We do always have an escape hatch for people to custom integrate. But the reality of the market today is that people just want it to work, right? And so it's mostly us maintaining the integrations. As an example, a very good source of information that is tricky to productize is Salesforce. Because Salesforce is basically a database and a UI, and companies do whatever the f**k they want with it. And so every company has different models and stuff like that. So right now, we don't support it natively. And the type of support, or real native support, will be slightly more complex than just OAuth-ing into it, like is the case with Slack, as an example.
Because it's probably going to be: oh, you want to connect your Salesforce to us? Give us the SOQL, that's the Salesforce query language. Give us the queries you want us to run on it and inject into the context of Dust. So it's interesting how not all integrations are alike; some of them require a bit of work on the user's side. And for some of them that are really valuable to our users but that we don't support yet, they can just build them internally and push the data to us.
Swyx [00:30:18]: I think I understand the Salesforce thing. But let me just clarify: are you using browser automation because there's no API for something?
Stan [00:30:24]: No, no, no, no. In that case, so we do have browser automation for all the use cases that apply to the public web. But for most of the integrations with the internal systems of the company, it really runs through APIs.
Swyx [00:30:35]: Haven't you felt the pull to RPA, browser automation, that kind of stuff?
Stan [00:30:39]: I mean, what I've been saying for a long time, maybe I'm wrong, is that if the future is that you're going to stand in front of a computer looking at an agent clicking on stuff, then I'll hit my computer. And my computer is a big Lenovo. It's black. Doesn't sound good at all compared to a Mac. And if the APIs are there, we should use them. There is going to be a long tail of stuff that doesn't have APIs, but as the world moves forward, that's disappearing. So the core RPA value in the past has really been: oh, this old 90s product doesn't have an API, so I need to use the UI to automate. I think for most of the companies that are ICP for us, the scale-ups that are between 500 and 5,000 people, tech companies, most of the SaaS they use has APIs. Now there's an interesting question for the open web, because there is stuff that you want to do that involves websites that don't necessarily have APIs. And the current state of web integration, from us, and OpenAI, and Anthropic (though I don't even know if they have web navigation; I don't think so), the current state of affairs is really, really broken, because you have what? You have basically search and headless browsing. But headless browsing, I think everybody's doing basically body.innerText and filling that into the model, right?
Swyx [00:31:56]: There are parsers into Markdown and stuff.
Stan [00:31:58]: I'm super excited by the companies that are exploring the capability of rendering a web page in a way that is compatible with a model: being able to maintain the selector, so that's basically the place where to click in the page, through that process; expose the actions to the model; have the model select an action in a way that is compatible with the model, which is not a big page of full DOM that is very noisy; and then being able to decompress that back to the original page and take the action. That's something that is really exciting and that will kind of change the level of things that agents can do on the web. That, I find exciting, but I also feel that the bulk of the useful stuff that you can do within the company can be done through APIs. The data can be retrieved by API. The actions can be taken through API.
Swyx [00:32:44]: For listeners, I'll note that you're basically completely disagreeing with David Luan. Exactly, exactly. We've seen it since this summer. Adept is where it is, and Dust is where it is. So Dust is still standing.
Alessio [00:32:55]: Can we just quickly comment on function calling?
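The page-compression idea Stan is excited about can be sketched simply: extract the interactive elements with stable selectors, show the model only a compact action list, then map its pick back onto the live page. An illustrative browser-side sketch, not any particular company's implementation:

```ts
// Compress a page into a short, model-friendly action list while keeping
// selectors, so the chosen action can be "decompressed" back to the page.
interface PageAction {
  id: number;
  selector: string; // how to find the element again
  label: string;    // what the model actually sees
}

function extractActions(doc: Document): PageAction[] {
  const elements = doc.querySelectorAll<HTMLElement>('a, button, input, select');
  return Array.from(elements).map((el, id) => ({
    id,
    selector: cssPath(el),
    label: `${el.tagName.toLowerCase()}: ${el.innerText || el.getAttribute('name') || ''}`.trim(),
  }));
}

// Naive selector builder: tag names plus child indices up the tree.
function cssPath(el: Element): string {
  const parts: string[] = [];
  for (let node: Element | null = el; node && node.parentElement; node = node.parentElement) {
    const idx = Array.from(node.parentElement.children).indexOf(node) + 1;
    parts.unshift(`${node.tagName.toLowerCase()}:nth-child(${idx})`);
  }
  return parts.join(' > ');
}

// Map the model's chosen action id back to the original page and act.
function takeAction(doc: Document, actions: PageAction[], chosenId: number): void {
  const action = actions.find((a) => a.id === chosenId);
  if (action) (doc.querySelector(action.selector) as HTMLElement)?.click();
}
```

The contrast with body.innerText is the point: the model sees a numbered action list a few hundred tokens long instead of a noisy full DOM, and the selector survives the round trip.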
You mentioned you don't need the models to be that smart to actually pick the tools. Have you seen the models not be good enough? Or is it just that you don't want to put the complexity in there? Like, is there any room for improvement left in function calling? Or do you feel you usually, consistently get the right response, the right parameters, and all of that?
Stan [00:33:15]: So that's a tricky product question. Because if the instructions are good and precise, then you don't have any issue, because it's scripted for you. And the model will just look at the script and follow it and say, oh, he's probably talking about that action, and I'm going to use it. And the parameters are kind of deduced from the state of the conversation; I'll just go with it. If you provide a very high-level, kind of AutoGPT-esque level of instructions and provide 16 different tools to your model, yes, we're seeing the models in that state making mistakes. And there is obviously some progress that can be made on the capabilities. But the interesting part is that there is already so much work that can be assisted, augmented, accelerated by just going with pretty simple, scripted-action agents. What I'm excited about in pushing our users to create rather simple agents is that once you have those working really well, you can create meta-agents that use the agents as actions. And all of a sudden, you can have a hierarchy of responsibility that will probably get you almost to the point of the AutoGPT value. It requires the construction of intermediary artifacts, but you're probably going to be able to achieve something great. I'll give you an example. Our incidents are shared in Slack in a specific channel, and the things we ship are shared in Slack too. We have a weekly meeting where we have a table about incidents and shipped stuff. We're not writing that weekly meeting table anymore. We have an assistant that just goes and finds the right data on Slack and creates the table for us. And that assistant works perfectly. It's trivially simple, right? Take one week of data from that channel and just create the table. And then in that weekly meeting we have, obviously, some graphs and reporting about our financials and our progress and our ARR. And we've created assistants to generate those graphs directly. And those assistants work great. By creating those assistants that cover those small parts of that weekly meeting, slowly we're getting to a world where we'll have a weekly meeting assistant. We'll just call it. You don't need to prompt it. You don't need to say anything. It's going to run those different assistants and get that Notion page just ready. And by doing that, if you get there (and that's an objective for us, using Dust ourselves, to get there), you're saving an hour of company time every time you run it. Yeah.
Alessio [00:35:28]: That's my pet topic of NPM for agents. How do you build dependency graphs of agents? And how do you share them? Because why do I have to rebuild some of the smaller levels of what you built already?
Swyx [00:35:40]: I have a quick follow-up question on agents managing other agents. It's a topic of a lot of research, both from Microsoft and even in startups. What have you discovered as best practice for, let's say, a manager agent controlling a bunch of small agents? Is it two-way communication?
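Stan's weekly-meeting example maps to a small hierarchy: each simple assistant becomes a function, and the meta-agent just composes them. A hypothetical sketch; runAgent is an assumed helper and the agent names are illustrative, not Dust's actual API:

```ts
// Sketch of the meta-agent pattern: simple agents become actions for a
// higher-level agent. `runAgent` is a hypothetical helper.
declare function runAgent(name: string, input?: string): Promise<string>;

// Each sub-agent is trivially simple and independently testable.
async function weeklyMeetingPrep(): Promise<string> {
  const [incidents, shipped, arrGraphs] = await Promise.all([
    runAgent('incident-table', 'last 7 days'),
    runAgent('shipped-table', 'last 7 days'),
    runAgent('arr-graphs'),
  ]);
  // The meta-agent's only job is assembling its sub-agents' outputs
  // into one page; no open-ended planning is required.
  return [incidents, shipped, arrGraphs].join('\n\n');
}
```

The intermediary artifacts Stan mentions are exactly the sub-agents: each one is validated on its own narrow task before anything composes them.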
I don't know if there should be a protocol format.
Stan [00:35:59]: To be completely honest, the state we are at right now is creating the simple agents. So we haven't even explored the meta-agents yet. We know it's there. We know it's going to be valuable. We know it's going to be awesome. But we're starting there because it's the simplest place to start. And it's also what the market understands. If you go to a company, a random SaaS B2B company, not necessarily specialized in AI, and you take an operational team and you tell them, build some tooling for yourself, they'll understand the small agents. If you tell them, build AutoGPT, they'll be like, Auto what?
Swyx [00:36:31]: And I noticed that in your language, you're very much focused on non-technical users. You don't really mention API here. You mention instructions instead of system prompt, right? That's very conscious.
Stan [00:36:41]: Yeah, it's very conscious. It's a mark of our designer, Ed, who kind of pushed us to create a friendly product. I was knee-deep in AI when I started, obviously. And my co-founder, Gabriel, was at Stripe as well. We started a company together that got acquired by Stripe 15 years ago. He was at Alan, a healthcare company in Paris, after that. So he was a little bit less knee-deep in AI, but really focused on product. And I didn't realize how important it is to make that technology not scary to end users. It didn't feel scary to me, but Ed, our designer, really saw that it was feeling scary to the users. And so we were very proactive and very deliberate about creating a brand that feels not too scary, and creating a wording and a language, as you say, that really tries to communicate the fact that it's going to be fine. It's going to be easy. You're going to make it.
Alessio [00:37:34]: And another big point that David had about Adept is: we need to build an environment for the agents to act in. And then if you have the environment, you can simulate what they do. How is that different when you're interacting with APIs and you're touching systems that you cannot really simulate? If you call the Salesforce API, you're just calling it.
Stan [00:37:52]: So I think that goes back to the DNA of the companies, which are very different. Adept, I think, was a product company with a very strong research DNA, and they were still doing research. One of their goals was building a model. And that's why they raised a large amount of money, et cetera. We are 100% deliberately a product company. We don't do research. We don't train models. We don't even run GPUs. We're using the models that exist, and we try to push the product boundary as far as possible with the existing models. So that creates an issue. Indeed, to answer your question: when you're interacting in the real world, well, you cannot simulate, so you cannot improve the models. Even improving your instructions is complicated for a builder. The hope is that you can use models to evaluate the conversations, so that you can at least get feedback and some information about the performance of the assistants. But if you take actual traces of interactions of humans with those agents, it is, even for us humans, extremely hard to decide whether it was a productive interaction or a really bad interaction. You don't know why the person left. You don't know if they left happy or not. So, being extremely, extremely, extremely pragmatic here, it becomes a product issue.
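One pragmatic reading of "use models to evaluate the conversations": score each trace with a judge model and surface the worst ones to the agent's builder, keeping in mind Stan's caveat that even humans struggle to label these, so the signal is weak. A sketch with a hypothetical callModel client and a rubric of our own:

```ts
// Score agent conversations with a judge model so the builder can triage
// the worst traces first. `callModel` is a hypothetical LLM client.
declare function callModel(prompt: string): Promise<string>;

interface Turn { role: 'user' | 'agent'; text: string }

async function scoreConversation(turns: Turn[]): Promise<number> {
  const transcript = turns.map((t) => `${t.role}: ${t.text}`).join('\n');
  const verdict = await callModel(
    `Rate this conversation 1-5 for whether the agent actually helped.\n` +
      `Answer with a single digit.\n\n${transcript}`
  );
  const score = parseInt(verdict.trim()[0], 10);
  return Number.isNaN(score) ? 3 : score; // neutral default on parse failure
}

// Builders review the lowest-scored traces first.
async function triage(conversations: Turn[][]) {
  const scored = await Promise.all(
    conversations.map(async (c) => ({ c, score: await scoreConversation(c) }))
  );
  return scored.sort((a, b) => a.score - b.score);
}
```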
We have to build a product that incentivizes the end users to provide feedback, so that, as a first step, the person that is building the agent can iterate on it. As a second step, maybe later, when we start training models and post-training, et cetera, we can optimize around that for each of those companies. Yeah.
Alessio [00:39:17]: Do you see, in the future, products offering kind of like a simulation environment, the same way all SaaS now kind of offers APIs to build programmatically? Like in cybersecurity, there are a lot of companies working on building simulated environments so that you can then use agents to red team, but I haven't really seen that.
Stan [00:39:34]: Yeah, no, me neither. That's a super interesting question. I think it's really going to depend on how much, because you need to simulate to generate data, and you need data to train models. And the question at the end is: are we going to be training models, or are we just going to be using frontier models as they are? On that question, I don't have a strong opinion. It might be the case that we'll be training models, because in all of those AI-first products, the model is so close to the product surface that, as you get big and you want to really own your product, you're going to have to own the model as well. Owning the model doesn't mean doing the pre-training; that would be crazy. But at least having an internal post-training, realignment loop makes a lot of sense. And so if we see many companies going towards that over time, then there might be incentives for the SaaSes of the world to provide assistance in getting there. But at the same time, there's a tension, because those SaaSes don't want to be interacted with by agents; they want the human to click on the button. Yeah, they've got to sell seats. Exactly.
Swyx [00:40:41]: Just a quick question on models. I'm sure you've used many, probably not just OpenAI. Would you characterize some models as better than others? Do you use any open source models? What have been the trends in models over the last two years?
Stan [00:40:53]: We've seen over the past two years a bit of a race between models. At times, it's the OpenAI model that is the best. At times, it's the Anthropic model that is the best. Our take on that is that we are agnostic, and we let our users pick their model. Oh, they choose? Yeah, so when you create an assistant or an agent, you can just say, oh, I'm going to run it on GPT-4, GPT-4 Turbo, or...
Swyx [00:41:16]: Don't you think, for the non-technical user, that is actually an abstraction that you should take away from them?
Stan [00:41:20]: We have a sane default. So we move the default to the latest model that is cool. And we have a sane default, and it's actually not very visible. In our flow to create an agent, you would have to go into advanced settings to pick your model. So this is something that the technical person will care about. But that's something that obviously is a bit too complicated for the...
Swyx [00:41:40]: And do you care most about function calling, or instruction following, or something else?
Stan [00:41:44]: I think we care most about function calling, because you want to... There's nothing worse than a function call that includes incorrect parameters or is a bit off, because it just drives the whole interaction off.
Swyx [00:41:56]: Yeah, so there's the Berkeley function calling leaderboard.
Stan [00:42:00]: These days, it's funny how the comparison between GPT-4o and GPT-4 Turbo is still up in the air on function calling.
I personally don't have proof, but I know many people (and I'm probably one of them) who think that GPT-4 Turbo is still better than GPT-4o on function calling. Wow. We'll see what comes out of the O1 class if it ever gets function calling. And Claude 3.5 Sonnet is great as well. They kind of innovated in an interesting way, which was never quite publicized: they have that kind of chain-of-thought step whenever you use a Claude model, a Sonnet model, with function calling. That chain-of-thought step doesn't exist when you just interact with it for answering questions. But when you use function calling, you get that step, and it really helps with getting better function calling.
Swyx [00:42:43]: Yeah, we actually just recorded a podcast with the Berkeley team that runs that leaderboard this week. So they just released V3.
Stan [00:42:49]: Yeah.
Swyx [00:42:49]: It was V1, like, two months ago, and then V2, V3. Turbo is on top.
Stan [00:42:53]: Turbo is on top. Turbo is above 4o.
Swyx [00:42:54]: And then the third place is xLAM from Salesforce, which is a large action model they've been trying to popularize.
Stan [00:43:01]: Yep.
Swyx [00:43:01]: O1 Mini is actually on here, I think. O1 Mini is number 11.
Stan [00:43:05]: But arguably, O1 Mini hasn't been aligned for that. Yeah.
Alessio [00:43:09]: Do you use leaderboards? Do you have your own evals? I mean, this is kind of intuitive, right? Like, the newer model is better. I think most people just upgrade. Yeah. What's the eval process like?
Stan [00:43:19]: It's funny, because I've been doing research for three years, and we have bigger stuff to cook. When you're deploying in a company, one thing where we really spike is that when we manage to activate the company, we have crazy penetration. The highest penetration we have is 88% daily active users within the entire employee base of the company. The kind of average penetration and activation we have in our current enterprise customers is more like 60% to 70% weekly active. So we basically have the entire company interacting with us. And when you're there, there is so much stuff that matters more than getting evals or getting the best model. Because there are so many places where you can create product or do stuff that will give you the 80% with the work you do, whereas deciding if it's GPT-4 or GPT-4 Turbo, et cetera, will just give you the 5% improvement. The reality is that you want to focus on the places where you can really change the direction, or change the interaction, more drastically. But that's something that we'll have to do eventually, because we still want to be serious people.
Swyx [00:44:24]: It's funny, because in some ways, the model labs are competing for you, right? You don't have to make any effort. You just switch model and then it'll grow. What are you really limited by? Is it additional sources?
Stan [00:44:36]: It's not models, right?
Swyx [00:44:37]: You're not really limited by quality of model.
Stan [00:44:40]: Right now, we are limited by the infrastructure part, which is the ability for users to easily connect to all the data they need to do the job they want to do.
Swyx [00:44:51]: Because you maintain all your own stuff. You know, there are companies out there that are starting to provide integrations as a service, right? I used to work in an integrations company.
Yeah, I know.
Stan [00:44:59]: It's just that there are some intricacies about how you chunk stuff and how you process information from one platform to the other. If you look at the end of the spectrum, you could say, oh, I'm going to support Airbyte, and Airbyte has-
Swyx [00:45:12]: I used to work at Airbyte.
Stan [00:45:13]: Oh, really? That makes sense.
Swyx [00:45:14]: They're the French founders as well.
Stan [00:45:15]: I know Jean very well. I'm seeing him today. And the reality is that, if you look at Notion, Airbyte does the job of taking Notion and putting it in a structured way. But that output is not really usable to make it available to models in a useful way. Because you get all the blocks, details, et cetera, which is useful for many use cases.
Swyx [00:45:35]: It's also for data scientists and not for AI.
Stan [00:45:38]: The reality of Notion is that sometimes you have a... So when you have a page, there's a lot of structure in it, and you want to capture the structure and chunk the information in a way that respects that structure. In Notion, you have databases. Sometimes those databases are real tabular data. Sometimes those databases are full of text. You want to get the distinction and understand that this database should be considered like text information, whereas this other one is actually quantitative information. And to really get a very high-quality interaction with that piece of information, I haven't found a solution that will work without us owning the connection end-to-end.
Swyx [00:46:15]: That's why I don't invest in... There's Composio, there's All Hands from Graham Neubig. There are all these other companies that are like, we will do the integrations for you. You just... we have the open source community, we'll do it off the shelf. But then you are so specific in your needs that you want to own it. Yeah, exactly.
Stan [00:46:29]: You can talk to Michel about that.
Swyx [00:46:30]: You know, he wants to put the AI in there, but you know. Yeah, I will. I will.
Stan [00:46:35]: Cool. What are we missing?
Alessio [00:46:36]: You know, what are the things that are sneakily hard that you're tackling, that maybe people don't even realize are really hard?
Stan [00:46:43]: The real part, as we've touched on throughout the conversation, is really building the infra that works for those agents, because it's tedious work. It's an evergreen piece of work, because you always have an extra integration that will be useful to a non-negligible set of your users. What I'm super excited about is that there are so many interactions that shouldn't be conversational interactions and that could be very useful. Basically, we have the firehose of information of those companies, and there are not going to be that many companies that capture the firehose of information. When you have the firehose of information, you can do a ton of stuff with models that is not just accelerating people, but giving them superhuman capability, even with the current model capability, because you can just sift through much more information. An example is documentation repair. If I have the firehose of Slack messages and new Notion pages, and somebody says, I own that page, I want to be updated when there is a piece of information that should update that page: this is now possible. You get an email saying, oh, look at that Slack message. It says the opposite of what you have in that paragraph. Maybe you want to update, or just ping that person.
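The tabular-versus-text distinction Stan describes for Notion databases can be captured with a simple heuristic. An illustrative sketch; the types are made up, not Notion's or Airbyte's real schema:

```ts
// Sketch: treat a Notion-like database as a quantitative table or as a
// pile of documents depending on what its rows actually contain.
interface NotionDatabase {
  title: string;
  rows: Record<string, string | number>[];
}

function isTabular(db: NotionDatabase): boolean {
  if (db.rows.length === 0) return false;
  // Heuristic: mostly short or numeric cells => quantitative table;
  // lots of long free-text cells => treat rows as documents instead.
  const cells = db.rows.flatMap((row) => Object.values(row));
  const longText = cells.filter((c) => typeof c === 'string' && c.length > 200);
  return longText.length / cells.length < 0.1;
}

function chunk(db: NotionDatabase): string[] {
  if (isTabular(db)) {
    // Keep the whole table together so the model sees it as one unit.
    return [JSON.stringify(db.rows)];
  }
  // Otherwise, one chunk per row, preserving the page structure.
  return db.rows.map((row) => `${db.title}: ${JSON.stringify(row)}`);
}
```

The threshold and the 200-character cutoff are arbitrary placeholders; the point is that the chunking decision is made per database, from its contents, rather than one generic pipeline for everything.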
I think there is a lot to be explored on the product layer in terms of what it means to interact productively with those models. And that's a problem that's extremely hard and extremely exciting.
Swyx [00:48:00]: One thing you keep mentioning about infra work: obviously, Dust is building that infra and serving that in a very consumer-friendly way. You always talk about infra being additional sources, additional connectors. That is very important. But I'm also interested in the vertical infra. There is an orchestrator underlying all these things where you're doing asynchronous work. For example, the simplest one is a cron job: you just schedule things. But also, for if-this-then-that, you have to wait for something to be executed and proceed to the next task. I used to work on an orchestrator as well, Temporal.
Stan [00:48:31]: We used Temporal. Oh, you used Temporal? Yeah. Oh, how was the experience?
Swyx [00:48:34]: I need the NPS.
Stan [00:48:36]: We're doing a self-discovery call now.
Swyx [00:48:39]: But you can also complain to me, because I don't work there anymore.
Stan [00:48:42]: No, we love Temporal. There are some edges that are a bit rough, surprisingly rough. And you would say, why is it so complicated?
Swyx [00:48:49]: It's always versioning.
Stan [00:48:50]: Yeah, stuff like that. But we really love it. And we use it for exactly what you said: managing the entire set of stuff that needs to happen so that, in semi-real time, we get all the updates from Slack or Notion or GitHub into the system. And whenever we see a piece of information go through, we maybe trigger workflows to run agents, because they need to provide alerts to users and stuff like that. And Temporal is great. Love it.
Swyx [00:49:17]: You haven't evaluated others. You don't want to build your own. You're happy with...
Stan [00:49:21]: Oh, no, we're not in the business of replacing Temporal. And Temporal is so... I mean, it, or any other competing product, they're very general. There's an interesting theory about buy versus build. I think in that case, when you're a high-growth company, your buy-build trade-off is very much on the side of buy. Because if you have the capability, you're just going to be saving time; you can focus on your core competency, etc. And it's funny, because we're starting to see the post-high-growth companies going back on that trade-off, interestingly. So that's the Klarna news about removing Zendesk and Salesforce. Do you believe that, by the way?
Alessio [00:49:56]: Yeah, I did a podcast with them.
Stan [00:49:58]: Oh, yeah?
Alessio [00:49:58]: It's true.
Swyx [00:49:59]: No, no, I know.
Stan [00:50:00]: Of course they say it's true,
Swyx [00:50:00]: but also, how well is it going to go?
Stan [00:50:02]: So I'm not talking about deflecting the customer traffic. I'm talking about building AI on top of Salesforce and Zendesk, basically, if I understand correctly. And all of a sudden, your product surface becomes much smaller, because you're interacting with an AI system that will take some actions. And so all of a sudden, you don't need the product layer anymore. And you realize that, oh, those things are just databases that I pay a hundred times the price for, right? Because you're a post-high-growth company and you have tech capabilities, you are incentivized to reduce your costs, and you have the capability to do so. And then it makes sense to just scratch the SaaS away.
So it's interesting that we might see kind of a bad time for SaaS in post-hyper-growth tech companies. It's still a big market, but it's not that big, because if you're not a tech company, you don't have the capabilities to reduce that cost. If you're a high-growth company, you're always going to be buying, because you go faster with that. But that's an interesting new space, a new category of companies that might remove some SaaS.
Swyx [00:51:02]: Yeah, Alessio's firm has an interesting thesis on the future of SaaS in AI.
Alessio [00:51:05]: Service as a software, we call it. It's basically like, well, the most extreme is, like, why is there any software at all? You know, ideally, it's all a labor interface where you're asking somebody to do something for you, whether that's a person, an AI agent, or whatnot.
Stan [00:51:17]: Yeah, yeah, that's interesting.
Swyx [00:51:19]: I have to ask: are you paying for Temporal Cloud, or are you self-hosting?
Stan [00:51:22]: Oh, no, no, we're paying, we're paying.
Swyx [00:51:24]: Oh, okay, interesting.
Stan [00:51:26]: We're paying way too much. It's crazy expensive, but it makes us-
Swyx [00:51:28]: That's why, as a shareholder, I like to hear that.
Stan [00:51:31]: It makes us go faster, so we're happy to pay.
Swyx [00:51:33]: Other things in the infra stack. I just want a list for other founders to think about. Ops, API gateway, evals, you know, anything interesting there that you build or buy?
Stan [00:51:41]: I mean, there's always an interesting question there. We've been building a lot around the interface to models, because Dust, the original version, was an orchestration platform, and we basically provide a unified interface to all the model providers.
Swyx [00:51:56]: That's what I call a gateway.
Stan [00:51:57]: We have that because Dust was originally that, so we continued building upon it, and we own it. But that's an interesting question: do you want to build that or buy it?
Swyx [00:52:06]: Yeah, I always say LiteLLM is the current open source consensus.
Stan [00:52:09]: Exactly, yeah. There's an interesting question there.
Swyx [00:52:12]: Ops, Datadog, just tracking.
Stan [00:52:14]: Oh yeah, so Datadog is an obvious one... What are the mistakes that I regret? I started with pure JavaScript, not TypeScript. And if you're wondering, oh, I want to go fast, I'll do a little bit of JavaScript: no, don't, just start with TypeScript. I see, okay.
Swyx [00:52:30]: So interesting. You are a research engineer that came out of OpenAI that bet on TypeScript.
Stan [00:52:36]: Well, the reality is that if you're building a product, you're going to be doing a lot of JavaScript, right? And Next, we're using Next as an example. It's
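Earlier, Stan described using Temporal to pull Slack, Notion, and GitHub updates into the system in semi-real time and trigger agent runs. A minimal sketch of that shape with Temporal's TypeScript SDK; the activities module and agent names are hypothetical, while the SDK imports are real:

```ts
// Sketch of a connector-sync workflow: poll a source, fan each update out
// to an agent run, and loop. The activities file is an assumed module
// exporting `fetchUpdates` and `runAgent`; this is not Dust's actual code.
import { proxyActivities, continueAsNew, sleep } from '@temporalio/workflow';
import type * as activities from './activities';

const { fetchUpdates, runAgent } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
  retry: { maximumAttempts: 3 },
});

export async function connectorSync(source: 'slack' | 'notion' | 'github'): Promise<void> {
  // Bounded loop, then continue-as-new: a Temporal workflow shouldn't
  // accumulate unbounded event history in a single run.
  for (let i = 0; i < 1000; i++) {
    const updates = await fetchUpdates(source);
    for (const update of updates) {
      await runAgent('doc-watcher', update); // e.g. alert the page owner
    }
    await sleep('1 minute');
  }
  await continueAsNew<typeof connectorSync>(source);
}
```

The appeal of buy-over-build here is that retries, timeouts, and the durable sleep between polls are handled by the orchestrator rather than hand-rolled queue code.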
This week we're still waiting for Ubuntu Core, but the wait is over for AMD's new 9800X3D processor! We get better kernel PWM support, Russia appears to be forking the kernel, the Mozilla Foundation makes cuts, and Framework is teaming with Mint. For tips we have pw-cat for sniffing on Pipewire, nvtop for sniffing on your GPU, ssh jump servers, and the zen browser. Find the show notes at https://bit.ly/4fmWf22 and enjoy! Host: Jonathan Bennett Co-Hosts: David Ruggles, Ken McDonald, and Jeff Massie Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
Malicious NPM packages are sneaking into codebases while FFmpeg devs prove old-school assembly skills can still smoke the competition. Plus, a rare bee species takes on Zuck's AI dreams.
From Nextcloud Breakup to Blissful Reunion: Chris's journey back to a smarter setup. Plus, Jellyfin's game-changing features and a beloved self-hosted app get the upgrade we've all been waiting for.
This week we cover news from Pine on another piece of hardware that runs doom, network manager and Godot have a shared struggle, and FFMpeg drops 7.1. Then we chat Audacious, Look at XFCE's Wayland work, talk about Linux 6.12, and reminisce about bittorrent. For tips, there's Etchdroid for writing boot disks from Android, install for command line installations, neovim for editing needs, and truncate for trimming bytes off the end of files. The show notes are available at https://tinyurl.com/23zpxeyf with links to each topic! Host: Jonathan Bennett Co-Hosts: Rob Campbell, Jeff Massie, and Ken McDonald Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
This week we dig into the Verizon outage. We'll talk about why and how you can host your own communication infrastructure, so an outage is less detrimental to your life! -- During The Show -- 01:35 Notes with sync? - Jou Jou How Steve takes notes Standard Notes (https://app.standardnotes.com/) Joplin (https://joplinapp.org/) Simple Mobile Tools (https://simplemobiletools.com/) Fossify (https://www.fossify.org/apps/) Nextcloud Notes app Collabora 09:12 Listener Reacts To Distros - Aaron Issues with Ubuntu Go for Fedora Endless OS Write in with your Ubuntu issues Mind Drip One Offline Windows Setup (https://docs.minddripone.com/Windows/windows-11-offline-setup/) 15:27 Audio Issues Mixer died Loaner from company Got the board fixed Sample rate changed Sample rate manually fixed and digital recorder back in production 17:48 News Wire Firefox 131 - mozilla.org (https://www.mozilla.org/en-US/firefox/131.0/releasenotes/) FFmpeg 7.1 - ffmpeg.org (https://ffmpeg.org) GnuCash 5.9 - gnucash.org (https://www.gnucash.org/news.phtml) Postgres 17.1 - postgresql (https://www.postgresql.org/about/news/postgresql-17-released-2936/) Winamp Source Code - bleepingcomputer.com (https://www.bleepingcomputer.com/news/software/winamp-releases-source-code-asks-for-help-modernizing-the-player/) Liya 2.1 - notebookcheck.net (https://www.notebookcheck.net/Arch-Linux-based-Liya-2-1-rolls-out-with-the-6-11-0-1-kernel.893752.0.html) Arch and Valve Collaboration - lists.archlinux.org (https://lists.archlinux.org/archives/list/arch-dev-public@lists.archlinux.org/thread/RIZSKIBDSLY4S5J2E2STNP5DH4XZGJMR/) Tails and Tor Join Forces - blog.torproject.org (https://blog.torproject.org/tor-tails-join-forces/) Statial-B - hackaday.com (https://hackaday.com/2024/09/25/the-statial-b-open-source-adjustable-mouse/) Molmo Open Source AI Model - venturebeat.com (https://venturebeat.com/ai/ai2s-new-molmo-open-source-ai-models-beat-gpt-4o-claude-on-some-benchmarks/) 18:55 Arch Flatpak Issue Installed Arch, ran the ansible, restored backups Flatpak installs failed Github Issue (https://github.com/flatpak/flatpak/issues/5111) Arch Flatpak Fix:
```
flatpak remote-delete --force flathub
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
```
Let's do a better job 26:10 Verizon Outage No clear communication Speculation abounds No clear ETA for restoration of service Never seen people behave this badly Self Hosted services unaffected Calls and Texts via 3CX and JMP.chat Internal communications over Matrix Companies should self host their own services Communication is changing Master Switch (https://www.penguinrandomhouse.com/books/194417/the-master-switch-by-tim-wu/) Steve didn't even notice the outage Other carriers reported issues contacting Verizon customers Don't be a slave to the glowing brick 40:33 Thunderbird on Android Mozilla took over K9 Mail Unified Inbox Still in Beta Asking for feedback Thunderbird Matrix Chat (https://chat.mozilla.org/#/room/#tb-android:mozilla.org) Thunderbird Blog Post (https://blog.thunderbird.net/2024/09/help-us-test-the-thunderbird-for-android-beta/) 46:00 Reproducibility Combinations of packages can be problematic Automation can fail Never will be completely safe Automation can get you a long way NixOS ZFS Snapshots -- The Extra Credit Section -- For links to the articles and material referenced in this week's episode check out this week's page from our podcast dashboard!
This Episode's Podcast Dashboard (http://podcast.asknoahshow.com/409) Phone Systems for Ask Noah provided by Voxtelesys (http://www.voxtelesys.com/asknoah) Join us in our dedicated chatroom #GeekLab:linuxdelta.com on Matrix (https://element.linuxdelta.com/#/room/#geeklab:linuxdelta.com) -- Stay In Touch -- Find all the resources for this show on the Ask Noah Dashboard Ask Noah Dashboard (http://www.asknoahshow.com) Need more help than a radio show can offer? Altispeed provides commercial IT services and they're excited to offer you a great deal for listening to the Ask Noah Show. Call today and ask about the discount for listeners of the Ask Noah Show! Altispeed Technologies (http://www.altispeed.com/) Contact Noah live [at] asknoahshow.com -- Twitter -- Noah - Kernellinux (https://twitter.com/kernellinux) Ask Noah Show (https://twitter.com/asknoahshow) Altispeed Technologies (https://twitter.com/altispeed)
Scott walks Wes through the new Syntax Production Assistant Desktop App, designed to streamline and automate their complex publishing process. From tech stack choices like Svelte 5 and Rust to AI-driven features, they dive into how this tool keeps everything consistent. Show Notes 00:00 Welcome to Syntax! 00:44 Brought to you by Sentry.io. 01:37 What was the idea? 05:42 The tech. Svelte 5, Tauri, Rust, FFmpeg. 08:32 Markdown editor. ink-mde, Dillinger. 09:32 Epoch timestamps. Epoch.vercel. 10:01 Updating front-matter. 10:10 Dexie.js function. 11:25 Backing up data. 11:58 Rust functions. 12:58 Why a desktop app and not a website? 14:38 Some small AI features. 16:26 Challenges with OAuth. 20:03 Publishing challenges. 23:29 Could this work on Windows? 23:54 Debugging. 26:23 Deciphering Apple logs. Hit us up on Socials! Syntax: X Instagram Tiktok LinkedIn Threads Wes: X Instagram Tiktok LinkedIn Threads Scott: X Instagram Tiktok LinkedIn Threads Randy: X Instagram YouTube Threads
News includes the upcoming signed installers for Livebook and Elixir on Windows, the release of Telemetry v1.3 with improved documentation, LiveView Native 0.3.0's announcement ahead of ElixirConf, Google Research introducing an alternative SQL syntax with a pipe, a Livebook leveraging LLMs and FFmpeg for media conversion, legal updates on the US non-compete agreements ban, potential antitrust actions against Google, and more! Show Notes online - http://podcast.thinkingelixir.com/218 (http://podcast.thinkingelixir.com/218) Elixir Community News - https://x.com/josevalim/status/1825954736094457943 (https://x.com/josevalim/status/1825954736094457943?utm_source=thinkingelixir&utm_medium=shownotes) – The next versions of Livebook and Elixir will have signed installers on Windows, thanks to the Erlang Ecosystem Foundation and Wojtek Mach. - https://x.com/wojtekmach/status/1826521109476344035 (https://x.com/wojtekmach/status/1826521109476344035?utm_source=thinkingelixir&utm_medium=shownotes) – Wojtek Mach discusses the challenges of packaging Livebook into a .msix for the Windows Store and asks for contributions from those familiar with the process. - https://hexdocs.pm/telemetry/1.3.0/readme.html (https://hexdocs.pm/telemetry/1.3.0/readme.html?utm_source=thinkingelixir&utm_medium=shownotes) – Telemetry v1.3 is out with improved documentation, rewritten to ExDoc from Erlang edoc, thanks to contributions from Wojtek Mach and Andrea Leopardi. OTP 27 is required. - https://x.com/bcardarella/status/1826266402631889091 (https://x.com/bcardarella/status/1826266402631889091?utm_source=thinkingelixir&utm_medium=shownotes) – LiveView Native 0.3.0 is now released with the official announcement at ElixirConf. Blog posts and tutorials to follow. - https://x.com/bcardarella/status/1826279303623082421 (https://x.com/bcardarella/status/1826279303623082421?utm_source=thinkingelixir&utm_medium=shownotes) – Additional details about the LiveView Native 0.3.0 release. - https://twitter.com/simonw/status/1827482890680332386 (https://twitter.com/simonw/status/1827482890680332386?utm_source=thinkingelixir&utm_medium=shownotes) – Google Research released a paper on an alternative SQL syntax with a pipe, similar to Ecto querying syntax. - https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/ (https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/?utm_source=thinkingelixir&utm_medium=shownotes) – More details on the new SQL syntax introduced by Google for ZetaSQL. - https://twitter.com/ac_alejos/status/1794105872680972458 (https://twitter.com/ac_alejos/status/1794105872680972458?utm_source=thinkingelixir&utm_medium=shownotes) – A Livebook that uses LLMs and FFmpeg to simplify the process of converting videos or audio by suggesting the right flags and switches. - https://github.com/acalejos/CinEx (https://github.com/acalejos/CinEx?utm_source=thinkingelixir&utm_medium=shownotes) – Detailed information on using LLMs within Livebook for conversion tasks. - https://www.reuters.com/legal/us-judge-strikes-down-biden-administration-ban-worker-noncompete-agreements-2024-08-20/ (https://www.reuters.com/legal/us-judge-strikes-down-biden-administration-ban-worker-noncompete-agreements-2024-08-20/?utm_source=thinkingelixir&utm_medium=shownotes) – A US Judge struck down the FTC's ban on non-compete agreements, stating the FTC lacks legal authority and the ban is too wide-reaching.
- https://www.nytimes.com/2024/08/13/technology/google-monopoly-antitrust-justice-department.html (https://www.nytimes.com/2024/08/13/technology/google-monopoly-antitrust-justice-department.html?utm_source=thinkingelixir&utm_medium=shownotes) – The US government is considering ordering Google to be broken up following antitrust allegations. - https://www.macrumors.com/2024/08/22/apple-eu-default-app-update/ (https://www.macrumors.com/2024/08/22/apple-eu-default-app-update/?utm_source=thinkingelixir&utm_medium=shownotes) – Apple might allow EU residents to delete apps currently blocked from removal, addressing app store issues in the EU. - Living in a time when industry rules are being challenged creates opportunities for new businesses and markets, as highlighted by ongoing legal issues with major tech companies like Google and Apple. Do you have some Elixir news to share? Tell us at @ThinkingElixir (https://twitter.com/ThinkingElixir) or email at show@thinkingelixir.com (mailto:show@thinkingelixir.com) Find us online - Message the show - @ThinkingElixir (https://twitter.com/ThinkingElixir) - Message the show on Fediverse - @ThinkingElixir@genserver.social (https://genserver.social/ThinkingElixir) - Email the show - show@thinkingelixir.com (mailto:show@thinkingelixir.com) - Mark Ericksen - @brainlid (https://twitter.com/brainlid) - Mark Ericksen on Fediverse - @brainlid@genserver.social (https://genserver.social/brainlid) - David Bernheisel - @bernheisel (https://twitter.com/bernheisel) - David Bernheisel on Fediverse - @dbern@genserver.social (https://genserver.social/dbern)
Rust meets Linux in a clash of coding cultures. Why some developers are resisting, and where things go from here. Sponsored By: Core Contributor Membership: Take $1 a month off your membership for a lifetime! Tailscale: Tailscale is programmable networking software that is private and secure by default - get it free on up to 100 devices! 1Password Extended Access Management: 1Password Extended Access Management is a device trust solution for companies with Okta, and they ensure that if a device isn't trusted and secure, it can't log into your cloud apps. Support LINUX Unplugged Links:
Today's guest, Nicholas Carlini, a research scientist at DeepMind, argues that we should be focusing more on what AI can do for us individually, rather than trying to have an answer for everyone.

"How I Use AI" - A Pragmatic Approach

Carlini's blog post "How I Use AI" went viral for good reason. Instead of giving a personal opinion about AI's potential, he simply laid out how he, as a security researcher, uses AI tools in his daily work. He divided it into 12 sections:

* To make applications
* As a tutor
* To get started
* To simplify code
* For boring tasks
* To automate tasks
* As an API reference
* As a search engine
* To solve one-offs
* To teach me
* Solving solved problems
* To fix errors

Each of the sections has specific examples, so we recommend going through it. It also includes all the prompts used; in the "make applications" case, it's 30,000 words total!

My personal takeaway is that the majority of the work AI can do successfully is what humans dislike doing: writing boilerplate code, looking up docs, taking repetitive actions, etc. These are usually boring tasks with little creativity but a lot of structure. This is the strongest argument for why LLMs, especially for code, are more beneficial to senior employees: if you can get the boring stuff out of the way, there's a lot more value you can generate. This is less and less true as you move toward entry-level jobs, which are mostly boring and repetitive tasks. Nicholas argues both sides ~21:34 in the pod.

A New Approach to LLM Benchmarks

We recently did a Benchmarks 201 episode, a follow-up to our original Benchmarks 101, and some of the issues have stayed the same. Notably, there's a big discrepancy between what benchmarks like MMLU test and what the models are used for. Carlini created his own domain-specific language for writing personalized LLM benchmarks. The idea is simple but powerful:

* Take tasks you've actually needed AI for in the past.
* Turn them into benchmark tests.
* Use these to evaluate new models based on your specific needs.

It can represent very complex tasks, from a single code generation to drawing a US flag using C:

```
"Write hello world in python" >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world")

"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> VisionLLMRun("What flag is shown in this image?") >> (SubstringEvaluator("United States") | SubstringEvaluator("USA"))
```

This approach solves a few problems:

* It measures what's actually useful to you, not abstract capabilities.
* It's harder for model creators to "game" your specific benchmark, a problem that has plagued standardized tests.
* It gives you a concrete way to decide if a new model is worth switching to, similar to how developers might run benchmarks before adopting a new library or framework.

Carlini argues that if even a small percentage of AI users created personal benchmarks, we'd have a much better picture of model capabilities in practice.

AI Security

While much of the AI security discussion focuses on either jailbreaks or existential risks, Carlini's research targets the space in between. Some highlights from his recent work:

* LAION 400M data poisoning: By buying expired domains referenced in the dataset, Carlini's team could inject arbitrary images into models trained on LAION 400M. You can read the paper "Poisoning Web-Scale Training Datasets is Practical" for all the details.
This is a great example of expanding the scope beyond the model itself and looking at the whole system and how it can become vulnerable.

* Stealing model weights: They demonstrated how to extract parts of production language models (like OpenAI's) through careful API queries. This research, "Stealing Part of a Production Language Model", shows that even black-box access can leak sensitive information.
* Extracting training data: In some cases, they found ways to make models regurgitate verbatim snippets from their training data. He and Milad Nasr wrote a paper on this as well: "Scalable Extraction of Training Data from (Production) Language Models". They also think this might apply to extracting RAG results from a generation.

These aren't just theoretical attacks. They've led to real changes in how companies like OpenAI design their APIs and handle data. If you really miss logit_bias and logit results by token, you can blame Nicholas :)

We had a ton of fun also chatting about things like Conway's Game of Life, how much data can fit on a piece of paper, and porting Doom to JavaScript. Enjoy!

Show Notes

* How I Use AI
* My Benchmark for LLMs
* Doom JavaScript port
* Conway's Game of Life
* Tic-Tac-Toe in one printf statement
* International Obfuscated C Code Contest
* Cursor
* LAION 400M poisoning paper
* Man vs Machine at Black Hat
* Model Stealing from OpenAI
* Milad Nasr
* H.D. Moore
* Vijay Bolina
* Cosine.sh
* uuencode

Timestamps

* [00:00:00] Introductions
* [00:01:14] Why Nicholas writes
* [00:02:09] The Game of Life
* [00:05:07] "How I Use AI" blog post origin story
* [00:08:24] Do we need software engineering agents?
* [00:11:03] Using AI to kickstart a project
* [00:14:08] Ephemeral software
* [00:17:37] Using AI to accelerate research
* [00:21:34] Experts vs non-expert users as beneficiaries of AI
* [00:24:02] Research on generating less secure code with LLMs
* [00:27:22] Learning and explaining code with AI
* [00:30:12] AGI speculations?
* [00:32:50] Distributing content without social media
* [00:35:39] How much data do you think you can put on a single piece of paper?
* [00:37:37] Building personal AI benchmarks
* [00:43:04] Evolution of prompt engineering and its relevance
* [00:46:06] Model vs task benchmarking
* [00:52:14] Poisoning LAION 400M through expired domains
* [00:55:38] Stealing OpenAI models from their API
* [01:01:29] Data stealing and recovering training data from models
* [01:03:30] Finding motivation in your work

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. Swyx [00:00:12]: Hey, and today we're in the in-person studio, which Alessio has gorgeously set up for us, with Nicholas Carlini. Welcome. Thank you. You're a research scientist at DeepMind. You work at the intersection of machine learning and computer security. You got your PhD from Berkeley in 2018, and also your BA from Berkeley as well. And mostly we're here to talk about your blogs, because you are so generous in just writing up what you know. Well, actually, why do you write? Nicholas [00:00:41]: Because I like, I feel like it's fun to share what you've done. I don't like writing, sufficiently didn't like writing, I almost didn't do a PhD, because I knew how much writing was involved in writing papers. I was terrible at writing when I was younger. I did, like, the remedial writing classes when I was in university, because I was really bad at it.
So I don't actually enjoy, I still don't enjoy the act of writing. But I feel like it is useful to share what you're doing, and I like being able to talk about the things that I'm doing that I think are fun. And so I write because I think I want to have something to say, not because I enjoy the act of writing. Swyx [00:01:14]: But yeah. It's a tool for thought, as they often say. Is there any sort of backgrounds or thing that people should know about you as a person? Yeah. Nicholas [00:01:23]: So I tend to focus on, like you said, I do security work, I try to like attacking things and I want to do like high quality security research. And that's mostly what I spend my actual time trying to be productive members of society doing that. But then I get distracted by things, and I just like, you know, working on random fun projects. Like a Doom clone in JavaScript. Swyx [00:01:44]: Yes. Nicholas [00:01:45]: Like that. Or, you know, I've done a number of things that have absolutely no utility. But are fun things to have done. And so it's interesting to say, like, you should work on fun things that just are interesting, even if they're not useful in any real way. And so that's what I tend to put up there is after I have completed something I think is fun, or if I think it's sufficiently interesting, write something down there. Alessio [00:02:09]: Before we go into like AI, LLMs and whatnot, why are you obsessed with the game of life? So you built multiplexing circuits in the game of life, which is mind boggling. So where did that come from? And then how do you go from just clicking boxes on the UI web version to like building multiplexing circuits? Nicholas [00:02:29]: I like Turing completeness. The definition of Turing completeness is a computer that can run anything, essentially. And the game of life, Conway's game of life is a very simple cellular 2D automata where you have cells that are either on or off. And a cell becomes on if in the previous generation some configuration holds true and off otherwise. It turns out there's a proof that the game of life is Turing complete, that you can run any program in principle using Conway's game of life. I don't know. And so you can, therefore someone should. And so I wanted to do it. Some other people have done some similar things, but I got obsessed into like, if you're going to try and make it work, like we already know it's possible in theory. I want to try and like actually make something I can run on my computer, like a real computer I can run. And so yeah, I've been going on this rabbit hole of trying to make a CPU that I can run semi real time on the game of life. And I have been making some reasonable progress there. And yeah, but you know, Turing completeness is just like a very fun trap you can go down. A while ago, as part of a research paper, I was able to show that in C, if you call into printf, it's Turing complete. Like printf, you know, like, which like, you know, you can print numbers or whatever, right? Swyx [00:03:39]: Yeah, but there should be no like control flow stuff. Nicholas [00:03:42]: Because printf has a percent n specifier that lets you write an arbitrary amount of data to an arbitrary location. And the printf format specifier has an index into where it is in the loop that is in memory. So you can overwrite the location of where printf is currently indexing using percent n. So you can get loops, you can get conditionals, and you can get arbitrary data writes again.
So we sort of have another Turing complete language using printf, which again, like this has essentially zero practical utility, but like, it's just, I feel like a lot of people get into programming because they enjoy the art of doing these things. And then they go work on developing some software application and lose all the joy. And I want to still have joy in doing these things. And so on occasion, I try to stop doing productive, meaningful things and just like, what's a fun thing that we can do and try and make that happen. Alessio [00:04:39]: Awesome. So you've been kind of like a pioneer in the AI security space. You've done a lot of talks starting back in 2018. We'll kind of leave that to the end because I know the security part is, there's maybe a smaller audience, but it's a very intense audience. So I think that'll be fun. But everybody in our Discord started posting your how I use AI blog post and we were like, we should get Carlini on the podcast. And then you were so nice to just, yeah, and then I sent you an email and you're like, okay, I'll come. Swyx [00:05:07]: And I was like, oh, I thought that would be harder. Alessio [00:05:10]: I think there's, as you said in the blog posts, a lot of misunderstanding about what LLMs can actually be used for. What are they useful at? What are they not good at? And whether or not it's even worth arguing what they're not good at, because they're obviously not. So if you cannot count the R's in a word, they're like, it's just not what it does. So how painful was it to write such a long post, given that you just said that you don't like to write? Yeah. And then we can kind of run through the things, but maybe just talk about the motivation, why you thought it was important to do it. Nicholas [00:05:39]: Yeah. So I wanted to do this because I feel like most people who write about language models being good or bad have some underlying message of like, you know, they have their camp and their camp is like, AI is bad or AI is good or whatever. And they like, they spin whatever they're going to say according to their ideology. And they don't actually just look at what is true in the world. So I've read a lot of things where people say how amazing they are and how all programmers are going to be obsolete by 2024. And I've read a lot of things where people who say like, they can't do anything useful at all. And, you know, like, they're just like, it's only the people who've come off of, you know, blockchain crypto stuff and are here to like make another quick buck and move on. And I don't really agree with either of these. And I'm not someone who cares really one way or the other how these things go. And so I wanted to write something that just says like, look, like, let's sort of ground reality and what we can actually do with these things. Because my actual research is in like security and showing that these models have lots of problems. Like this is like my day to day job is saying like, we probably shouldn't be using these in lots of cases. I thought I could have a little bit of credibility in saying, it is true. They have lots of problems. We maybe shouldn't be deploying them in lots of situations. And still, they are also useful. And that is the like, the bit that I wanted to get across is to say, I'm not here to try and sell you on anything. I just think that they're useful for the kinds of work that I do. And hopefully, some people would listen. And it turned out that a lot more people liked it than I thought.
But yeah, that was the motivation behind why I wanted to write this. Alessio [00:07:15]: So you had about a dozen sections of like how you actually use AI. Maybe we can just kind of run through them all. And then maybe the ones where you have extra commentary to add, we can... Sure. Nicholas [00:07:27]: Yeah, yeah. I didn't put as much thought into this as maybe was deserved. I probably spent, I don't know, definitely less than 10 hours putting this together. Swyx [00:07:38]: Wow. Alessio [00:07:39]: It took me close to that to do a podcast episode. So that's pretty impressive. Nicholas [00:07:43]: Yeah. I wrote it in one pass. I've gotten a number of emails of like, you got this editing thing wrong, you got this sort of other thing wrong. It's like, I just haven't looked at it. I tend to try it. I feel like I still don't like writing. And so because of this, the way I tend to treat this is like, I will put it together into the best format that I can at the time, and then put it on the internet, and then never change it. And this is an aspect of like the research side of me is like, once a paper is published, like it is done as an artifact that exists in the world. I could forever edit the very first thing I ever put to make it the most perfect version of what it is, and I would do nothing else. And so I feel like I find it useful to be like, this is the artifact, I will spend some certain amount of hours on it, which is what I think it is worth. And then I will just... Swyx [00:08:22]: Yeah. Nicholas [00:08:23]: Timeboxing. Alessio [00:08:24]: Yeah. Stop. Yeah. Okay. We just recorded an episode with the founder of Cosine, which is like an AI software engineer colleague. You said it took you 30,000 words to get GPT-4 to build you the "can GPT-4 solve this" kind of app. Where are we in the spectrum where ChatGPT is all you need to actually build something versus I need a full on agent that does everything for me? Nicholas [00:08:46]: Yeah. Okay. So this was an... So I built a web app last year sometime that was just like a fun demo where you can guess if you can predict whether or not GPT-4 at the time could solve a given task. This is, as far as web apps go, very straightforward. You need basic HTML, CSS, you have a little slider that moves, you have a button, sort of animate the text coming to the screen. The reason people are going here is not because they want to see my wonderful HTML, right? I used to know how to do modern HTML in 2007, 2008. I was very good at fighting with IE6 and these kinds of things. I knew how to do that. I have no longer had to build any web app stuff in the meantime, which means that I know how everything works, but I don't know any of the new... Flexbox is new to me. Flexbox is like 10 years old at this point, but it's just amazing being able to go to the model and just say, write me this thing and it will give me all of the boilerplate that I need to get going. Of course it's imperfect. It's not going to get you the right answer, and it doesn't do anything that's complicated right now, but it gets you to the point where the only remaining work that needs to be done is the interesting hard part for me, the actual novel part. Even the current models, I think, are entirely good enough at doing this kind of thing, that they're very useful. It may be the case that if you had something, like you were saying, a smarter agent that could debug problems by itself, that might be even more useful.
Currently though, I make a model into an agent by just copying and pasting error messages for the most part. That's what I do, is you run it and it gives you some code that doesn't work, and either I'll fix the code, or it will give me buggy code and I won't know how to fix it, and I'll just copy and paste the error message and say, it tells me this. What do I do? And it will just tell me how to fix it. You can't trust these things blindly, but I feel like most people on the internet already understand that things on the internet, you can't trust blindly. And so this is not like a big mental shift you have to go through to understand that it is possible to read something and find it useful, even if it is not completely perfect in its output. Swyx [00:10:54]: It's very human-like in that sense. It's the same ring of trust, I kind of think about it that way, if you had trust levels. Alessio [00:11:03]: And there's maybe a couple that tie together. So there was like, to make applications, and then there's to get started, which is a similar you know, kickstart, maybe like a project that you know the LLM cannot solve. It's kind of how you think about it. Nicholas [00:11:15]: Yeah. So for getting started on things is one of the cases where I think it's really great for some of these things, where I sort of use it as a personalized, help me use this technology I've never used before. So for example, I had never used Docker before January. I know what Docker is. Lucky you. Yeah, like I'm a computer security person, like I sort of, I have read lots of papers on, you know, all the technology behind how these things work. You know, I know all the exploits on them, I've done some of these things, but I had never actually used Docker. But I wanted to be able to run the outputs of language model stuff in some controlled contained environment, which I know is the right application. So I just ask it like, I want to use Docker to do this thing, like, tell me how to run a Python program in a Docker container. And it like gives me a thing. I'm like, step back. You said Docker Compose, I do not know what this word Docker Compose is. Is this Docker? Help me. And like, it'll sort of tell me all of these things. And I'm sure there's this knowledge that's out there on the internet, like this is not some groundbreaking thing that I'm doing, but I just wanted it as a small piece of one thing I was working on. And I didn't want to learn Docker from first principles. Like I, at some point, if I need it, I can do that. Like I have the background that I can make that happen. But what I wanted to do was thing one. And it's very easy to get bogged down in the details of this other thing that helps you accomplish your end goal. And I just want to like, tell me enough about Docker so I can do this particular thing. And I can check that it's doing the safe thing. I sort of know enough about that from, you know, my other background. And so I can just have the model help teach me exactly the one thing I want to know and nothing more. I don't need to worry about other things that the writer of this thinks is important that actually isn't. Like I can just like stop the conversation and say, no, boring to me. Explain this detail. I don't understand. I think that's why that was very useful for me.
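For reference, the contained-execution setup Nicholas describes can be as simple as shelling out to Docker. A minimal sketch, under our own assumptions (the helper name, image choice, and limits here are placeholders, not his actual setup):

```python
import pathlib
import subprocess
import tempfile

def run_untrusted(code: str) -> str:
    """Run model-generated Python inside a throwaway, network-less container."""
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "snippet.py").write_text(code)
        result = subprocess.run(
            ["docker", "run", "--rm",   # remove the container when it exits
             "--network=none",          # generated code gets no network access
             "-v", f"{tmp}:/work:ro",   # mount the snippet read-only
             "python:3.12-slim",
             "python", "/work/snippet.py"],
            capture_output=True, text=True, timeout=60,
        )
        return result.stdout + result.stderr
```

Which is the point he's making: you don't have to learn all of Docker to get the one contained-execution property you actually want.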
It would have taken me, you know, several hours to figure out some things that take 10 minutes if you could just ask exactly the question you want the answer to. Alessio [00:13:05]: Have you had any issues with like newer tools? Have you felt any meaningful kind of like a cutoff date where like there's not enough data on the internet or? I'm sure that the answer to this is yes. Nicholas [00:13:16]: But I tend to just not use most of these things. Like I feel like this is like the significant way in which I use machine learning models is probably very different than most people is that I'm a researcher and I get to pick what tools that I use and most of the things that I work on are fairly small projects. And so I can, I can entirely see how someone who is in a big giant company where they have their own proprietary legacy code base of a hundred million lines of code or whatever and like you just might not be able to use things the same way that I do. I still think there are lots of use cases there that are entirely reasonable that are not the same ones that I've put down. But I wanted to talk about what I have personal experience in being able to say is useful. And I would like it very much if someone who is in one of these environments would be able to describe the ways in which they find current models useful to them. And not, you know, philosophize on what someone else might be able to find useful, but actually say like, here are real things that I have done that I found useful for me. Swyx [00:14:08]: Yeah, this is what I often do to encourage people to write more, to share their experiences because they often fear being attacked on the internet. But you are the ultimate authority on how you use things and that's objectively true. So they cannot be debated. One thing that people are very excited about is the concept of ephemeral software or like personal software. This use case in particular basically lowers the activation energy for creating software, which I like as a vision. I don't think I have taken as much advantage of it as I could. I feel guilty about that. But also, we're trending towards there. Nicholas [00:14:47]: Yeah. No, I mean, I do think that this is a direction that is exciting to me. One of the things I wrote that was like, a lot of the ways that I use these models are for one-off things that I just need to happen that I'm going to throw away in five minutes. And you can. Swyx [00:15:01]: Yeah, exactly. Nicholas [00:15:02]: Right. It's like the kind of thing where it would not have been worth it for me to have spent 45 minutes writing this, because I don't need the answer that badly. But if it will only take me five minutes, then I'll just figure it out, run the program and then get it right. And if it turns out that you ask the thing, it doesn't give you the right answer. Well, I didn't actually need the answer that badly in the first place. Like either I can decide to dedicate the 45 minutes or I cannot, but like the cost of doing it is fairly low. You see what the model can do. And if it can't, then, okay, when you're using these models, if you're getting the answer you want always, it means you're not asking them hard enough questions. Swyx [00:15:35]: Say more. Nicholas [00:15:37]: Lots of people only use them for very small particular use cases and like it always does the thing that they want. Yeah. Swyx [00:15:43]: Like they use it like a search engine. Nicholas [00:15:44]: Yeah. Or like one particular case.
And if you're finding that when you're using these, it's always giving you the answer that you want, then probably it has more capabilities than you're actually using. And so I oftentimes try when I have something that I'm curious about to just feed into the model and be like, well, maybe it's just solved my problem for me. You know, most of the time it doesn't, but like on occasion, it's like, it's done things that would have taken me, you know, a couple hours that it's been great and just like solved everything immediately. And if it doesn't, then it's usually easier to verify whether or not the answer is correct than to have written it in the first place. And so you check, you're like, well, that's just, you're entirely misguided. Nothing here is right. It's just like, I'm not going to do this. I'm going to go write it myself or whatever. Alessio [00:16:21]: Even for non-tech, I had to fix my irrigation system. I had an old irrigation system. I didn't know how it worked to program it. I took a photo, I sent it to Claude and it's like, oh yeah, that's like the RT 900. This is exactly, I was like, oh wow, you know, you know, a lot of stuff. Swyx [00:16:34]: Was it right? Alessio [00:16:35]: Yeah, it was right. Swyx [00:16:36]: It worked. Did you compare with OpenAI? Alessio [00:16:38]: No, I canceled my OpenAI subscription, so I'm a Claude boy. Do you have a way to think about this like one-off software thing? One way I talk to people about it is like LLMs are kind of converging to like semantic serverless functions, you know, like you can say something and like it can run the function in a way and then that's it. It just kind of dies there. Do you have a mental model to just think about how long it should live for and like anything like that? Nicholas [00:17:02]: I don't think I have anything interesting to say here, no. I will take whatever tools are available in front of me and try and see if I can use them in meaningful ways. And if they're helpful, then great. If they're not, then fine. And like, you know, there are lots of people that I'm very excited about seeing all these people who are trying to make better applications that use these or all these kinds of things. And I think that's amazing. I would like to see more of it, but I do not spend my time thinking about how to make this any better. Alessio [00:17:27]: What's the most underrated thing in the list? I know there's like simplified code, solving boring tasks, or maybe is there something that you forgot to add that you want to throw in there? Nicholas [00:17:37]: I mean, so in the list, I only put things that people could look at and go, I understand how this solved my problem. I didn't want to put things where the model was very useful to me, but it would not be clear to someone else that it was actually useful. So for example, one of the things that I use it a lot for is debugging errors. But the errors that I have are very much not the errors that anyone else in the world will have. And in order to understand whether or not the solution was right, you just have to trust me on it. Because, you know, like I got my machine in a state that like CUDA was not talking to whatever some other thing, the versions were mismatched, something, something, something, and everything was broken. And like, I could figure it out with interaction with the model, and it like told me the steps I needed to take. But at the end of the day, when you look at the conversation, you just have to trust me that it worked.
And I didn't want to write things online that were this, like, you have to trust me on what I'm saying. I want everything that I said to like have evidence that like, here's the conversation, you can go and check whether or not this actually solved the task as I said that the model does. Because a lot of people I feel like say, I used a model to solve this very complicated task. And what they mean is the model did 10%, and I did the other 90% or something, I wanted everything to be verifiable. And so one of the biggest use cases for me, I didn't describe even at all, because it's not the kind of thing that other people could have verified by themselves. So that maybe is like, one of the things that I wish I maybe had said a little bit more about, and just stated that the way that this is done, because I feel like that this didn't come across quite as well. But yeah, of the things that I talked about, the thing that I think is most underrated is the ability of it to solve the uninteresting parts of problems for me right now, where people always say, this is one of the biggest arguments that I don't understand why people say is, the model can only do things that people have done before. Therefore, the model is not going to be helpful in doing new research or like discovering new things. And as someone whose day job is to do new things, like what is research? Research is doing something literally no one else in the world has ever done before. So this is what I do every single day, 90% of this is not doing something new, 90% of this is doing things a million people have done before, and then a little bit of something that was new. There's a reason why we say we stand on the shoulders of giants. It's true. Almost everything that I do is something that's been done many, many times before. And that is the piece that can be automated. Even if the thing that I'm doing as a whole is new, it is almost certainly the case that the small pieces that build up to it are not. And a number of people who use these models, I feel like expect that they can either solve the entire task or none of the task. But now I find myself very often, even when doing something very new and very hard, having models write the easy parts for me. And the reason I think this is so valuable, everyone who programs understands this, like you're currently trying to solve some problem and then you get distracted. And whatever the case may be, someone comes and talks to you, you have to go look up something online, whatever it is. You lose a lot of time to that. And one of the ways we currently don't think about being distracted is you're solving some hard problem and you realize you need a helper function that does X, where X is like, it's a known algorithm. Any person in the world, you say like, give me the algorithm that, have a dense graph or a sparse graph, I need to make it dense. You can do this by doing some matrix multiplies. It's like, this is a solved problem. I knew how to do this 15 years ago, but it distracts me from the problem I'm thinking about in my mind. I needed this done. And so instead of using my mental capacity and solving that problem and then coming back to the problem I was originally trying to solve, you could just ask the model, please solve this problem for me. It gives you the answer. You run it. You can check that it works very, very quickly. And now you go back to solving the problem without having lost all the mental state.
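The sparse-to-dense helper he mentions is exactly the kind of solved problem a model can hand you whole. For the curious, a sketch of what that helper looks like (our own illustration, not his code):

```python
import numpy as np

def to_dense(num_nodes: int, edges: list[tuple[int, int]]) -> np.ndarray:
    """Turn a sparse edge list into a dense adjacency matrix."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.int8)
    for i, j in edges:
        adj[i, j] = 1
        adj[j, i] = 1  # assuming an undirected graph
    return adj

# to_dense(4, [(0, 1), (1, 2)]) -> a 4x4 0/1 matrix with both edge directions set
```

Trivial to check, trivial to run, and it never breaks your train of thought on the hard problem.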
And I feel like this is one of the things that's been very useful for me. Swyx [00:21:34]: And in terms of this concept of expert users versus non-expert users, floors versus ceilings, you had some strong opinion here that like, basically it actually is more beneficial for non-experts. Nicholas [00:21:46]: Yeah, I don't know. I think it could go either way. Let me give you the argument for both of these. Yes. So I can only speak on the expert user behalf because I've been doing computers for a long time. And so yeah, the cases where it's useful for me are exactly these cases where I can check the output. I know, and anything the model could do, I could have done. I could have done better. I can check every single thing that the model is doing and make sure it's correct in every way. And so I can only speak and say, definitely it's been useful for me. But I also see a world in which this could be very useful for the kinds of people who do not have this knowledge, with caveats, because I'm not one of these people. I don't have this direct experience. But one of these big ways that I can see this is for things that you can check fairly easily, someone who could never have asked or have written a program themselves to do a certain task could just ask for the program that does the thing. And you know, some of the times it won't get it right. But some of the times it will, and they'll be able to have the thing in front of them that they just couldn't have done before. And we see a lot of people trying to do applications for this, like integrating language models into spreadsheets. Spreadsheets run the world. And there are some people who know how to do all the complicated spreadsheet equations and various things, and other people who don't, who just use the spreadsheet program but just manually do all of the things one by one by one by one. And this is a case where you could have a model that could try and give you a solution. And as long as the person is rigorous in testing that the solution does actually the correct thing, and this is the part that I'm worried about most, you know, I think depending on these systems in ways that we shouldn't, like this is what my research says, my research says is entirely on this, like, you probably shouldn't trust these models to do the things in adversarial situations, like, I understand this very deeply. And so I think that it's possible for people who don't have this knowledge to make use of these tools in ways, but I'm worried that it might end up in a world where people just blindly trust them, deploy them in situations that they probably shouldn't, and then someone like me gets to come along and just break everything because everything is terrible. And so I am very, very worried about that being the case, but I think if done carefully it is possible that these could be very useful. Swyx [00:23:54]: Yeah, there is some research out there that shows that when people use LLMs to generate code, they do generate less secure code. Nicholas [00:24:02]: Yeah, Dan Boneh has a nice paper on this. There are a bunch of papers that touch on exactly this. Swyx [00:24:07]: My slight issue is, you know, is there an agenda here? Nicholas [00:24:10]: I mean, okay, yeah, Dan Boneh, at least the one they have, like, I fully trust everything that sort of. Swyx [00:24:15]: Sorry, I don't know who Dan is. Nicholas [00:24:17]: He's a professor at Stanford. Yeah, he and some students have some things on this. Yeah, there's a number.
I agree that a lot of the stuff feels like people have an agenda behind it. There are some that don't, and I trust them to have done the right thing. I also think, even on this though, we have to be careful because the argument, whenever someone says x is true about language models, you should always append the suffix "for current models" because I'll be the first to admit I was one of the people who was very much of the opinion that these language models are fun toys and are going to have absolutely no practical utility. If you had asked me this, let's say, in 2020, I still would have said the same thing. After I had seen GPT-2, I had written a couple of papers studying GPT-2 very carefully. I still would have told you these things are toys. And when I first read the RLHF paper and the instruction tuning paper, I was like, nope, this is this thing that these weird AI people are doing. They're trying to make some analogies to people that make no sense. It's just like, I don't even care to read it. I saw what it was about and just didn't even look at it. I was obviously wrong. These things can be useful. And I feel like a lot of people had the same mentality that I did and decided not to change their mind. And I feel like this is the thing that I want people to be careful about. I want them to at least know what is true about the world so that they can then see that maybe they should reconsider some of the opinions that they had from four or five years ago that may just not be true about today's models. Swyx [00:25:47]: Specifically because you brought up spreadsheets, I want to share my personal experience because I think Google has done a really good job that people don't know about, which is if you use Google Sheets, Gemini is integrated inside of Google Sheets and it helps you write formulas. Great. Nicholas [00:26:00]: That's news to me. Swyx [00:26:01]: Right? They maybe don't do a good job. Unless you watched Google I.O., there was no other opportunity to learn that Gemini is now in your Google Sheets. And so I just don't write formulas manually anymore. I just prompt Gemini to do it for me. And it does it. Nicholas [00:26:15]: One of the problems that these machine learning models have is a discoverability problem. I think this will be figured out. I mean, it's the same problem that you have with any assistant. You're given a blank box and you're like, what do I do with it? I think this is great. More of these things, it would be good for them to exist. I want them to exist in ways that we can actually make sure that they're done correctly. I don't want to just have them be pushed into more and more things just blindly. I feel like lots of people, there are far too many X plus AI, where X is like arbitrary thing in the world that has nothing to do with it and could not be benefited at all. And they're just doing it because they want to use the word. And I don't want that to happen. Swyx [00:26:58]: You don't want an AI fridge? Nicholas [00:27:00]: No. Yes. I do not want my fridge on the internet. Swyx [00:27:03]: I do not want... Okay. Nicholas [00:27:05]: Anyway, let's not go down that rabbit hole. I understand why some of that happens, because people want to sell things or whatever. But I feel like a lot of people see that and then they write off everything as a result of it. And I just want to say, there are allowed to be people who are trying to do things that don't make any sense. Just ignore them.
Do the things that make sense. Alessio [00:27:22]: Another chunk of use cases was learning. So both explaining code, being an API reference, all of these different things. Any suggestions on how to go at it? I feel like one thing is generate code and then explain to me. One way is just tell me about this technology. Another thing is like, hey, I read this online, kind of help me understand it. Any best practices on getting the most out of it? Swyx [00:27:47]: Yeah. Nicholas [00:27:47]: I don't know if I have best practices. I have how I use them. Swyx [00:27:51]: Yeah. Nicholas [00:27:51]: I find it very useful for cases where I understand the underlying ideas, but I have never used Swyx [00:27:59]: them in this way before. Nicholas [00:28:00]: I know what I'm looking for, but I just don't know how to get there. And so yeah, as an API reference is a great example. The tool everyone always picks on is like FFmpeg. No one in the world knows the command line arguments to do what they want. They're like, make the thing faster. I want lower bitrate, like dash V. Once you tell me what the answer is, I can check. This is one of these things where it's great for these kinds of things. Or in other cases, things where I don't really care that the answer is 100% correct. So for example, I do a lot of security work. Most of security work is reading some code you've never seen before and finding out which pieces of the code are actually important. Because, you know, most of the program doesn't actually have anything to do with security. It has, you know, the display piece or the other piece or whatever. And like, you just, you would want to ignore all of that. So one very fun use of models is to like, just have it describe all the functions and just skim it and be like, wait, which ones look like approximately the right things to look at? Because otherwise, what are you going to do? You're going to have to read them all manually. And when you're reading them manually, you're going to skim the function anyway, and not just figure out what's going on perfectly. Like you already know that when you're going to read these things, what you're going to try and do is figure out roughly what's going on. Then you'll delve into the details. This is a great way of just doing that, but faster, because it will abstract most of what Swyx [00:29:21]: is right. Nicholas [00:29:21]: It's going to be wrong some of the time. I don't care. Swyx [00:29:23]: I would have been wrong too. Nicholas [00:29:24]: And as long as you treat it this way, I think it's great. And so like one of the particular use cases I have in the thing is decompiling binaries, where oftentimes people will release a binary. They won't give you the source code. And you want to figure out how to attack it. And so one thing you could do is you could try and run some kind of decompiler. It turns out for the thing that I wanted, none existed. And so I spent too many hours doing it by hand before I finally thought, why am I doing this? I should just check if the model could do it for me. And it turns out that it can. And it can turn the compiled source code, which is impossible for any human to understand, into the Python code that is entirely reasonable to understand. And it doesn't run. It has a bunch of problems. But it's so much nicer that it's immediately a win for me. I can just figure out approximately where I should be looking, and then spend all of my time doing that by hand.
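A sketch of that skim-the-functions workflow he describes, as a hypothetical loop (here `ask_model` stands in for whatever prompt-to-text client you use; the prompt wording is our own):

```python
def triage(functions: dict[str, str], ask_model) -> dict[str, str]:
    """One-line-summarize each decompiled function so a human can skim
    for the security-relevant ones. `functions` maps names to source text."""
    summaries = {}
    for name, body in functions.items():
        summaries[name] = ask_model(
            "In one sentence, what does this function do? "
            "Flag anything touching parsing, memory, auth, or crypto.\n\n" + body
        )
    return summaries
```

As he says, the summaries will be wrong some of the time, but a human skimming the same functions would be wrong some of the time too.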
And again, you get a big win there. Swyx [00:30:12]: So I fully agree with all those use cases, especially for you as a security researcher and having to dive into multiple things. I imagine that's super helpful. I do think we want to move to your other blog post. But you ended your post with a little bit of a teaser about your next post and your speculations. What are you thinking about? Nicholas [00:30:34]: So I want to write something. And I will do that at some point when I have time, maybe after I'm done writing my current papers for ICLR or something, where I want to talk about some thoughts I have for where language models are going in the near-term future. The reason why I want to talk about this is because, again, I feel like the discussion tends to be people who are either very much AGI by 2027, or Swyx [00:30:55]: always five years away, or are going to make statements of the form, Nicholas [00:31:00]: you know, LLMs are the wrong path, and we should be abandoning this, and we should be doing something else instead. And again, I feel like people tend to look at this and see these two polarizing options and go, well, those obviously are both very far extremes. Like, how do I actually, like, what's a more nuanced take here? And so I have some opinions about this that I want to put down, just saying, you know, I have wide margins of error. I think you should too. If you would say there's a 0% chance that something, you know, the models will get very, very good in the next five years, you're probably wrong. If you're going to say there's a 100% chance that in the next five years, then you're probably wrong. And like, to be fair, most of the people, if you read behind the headlines, actually say something like this. But it's very hard to get clicks on the internet of like, some things may be good in the future. Like, everyone wants like, you know, a very, like, nothing is going to be good. This is entirely wrong. It's going to be amazing. You know, like, they want to see this. I want people who have negative reactions to these kinds of extreme views to be able to at least say, like, to tell them, there is something real here. It may not solve all of our problems, but it's probably going to get better. I don't know by how much. And that's basically what I want to say. And then at some point, I'll talk about the safety and security things as a result of this. Because the way in which security intersects with these things depends a lot on exactly how people use these tools. You know, if it turns out to be the case that these models get to be truly amazing and can solve, you know, tasks completely autonomously, that's a very different security world to be living in than if there's always a human in the loop. And the types of security questions I would want to ask would be very different. And so I think, you know, in some very large part, understanding what the future will look like a couple of years ahead of time is helpful for figuring out which problems, as a security person, I want to solve now. You mentioned getting clicks on the internet, Alessio [00:32:50]: but you don't even have, like, an X account or anything. How do you get people to read your stuff? What's your distribution strategy? Because this post was popping up everywhere. And then people on Twitter were like, Nicholas Carlini wrote this. Like, what's his handle? It's like, he doesn't have it. It's like, how did you find it? What's the story? Nicholas [00:33:07]: So I have an RSS feed and an email list. And that's it.
I don't like most social media things. On principle, I feel like they have some harms. As a person, I have a problem when people say things that are wrong on the internet. And I would get nothing done if I would have a Twitter. I would spend all of my time correcting people and getting into fights. And so I feel like it is just useful for me for this not to be an option. I tend to just post things online. Yeah, it's a very good question. I don't know how people find it. I feel like for some things that I write, other people think it resonates with them. And then they put it on Twitter. And... Swyx [00:33:43]: Hacker News as well. Nicholas [00:33:44]: Sure, yeah. I am... Because my day job is doing research, I get no value for having this be picked up. There's no whatever. I don't need to be someone who has to have this other thing to give talks. And so I feel like I can just say what I want to say. And if people find it useful, then they'll share it widely. You know, this one went pretty wide. I wrote a thing, whatever, sometime late last year, about how to recover data off of an Apple ProFile drive from 1980. This probably got, I think, like 1000x less views than this. But I don't care. Like, that's not why I'm doing this. Like, this is the benefit of having a thing that I actually care about, which is my research. I would care much more if that didn't get seen. This is like a thing that I write because I have some thoughts that I just want to put down. Swyx [00:34:32]: Yeah. I think it's the long form thoughtfulness and authenticity that is sadly lacking sometimes in modern discourse that makes it attractive. And I think now you have a little bit of a brand of you are an independent thinker, writer, person, that people are tuned in to pay attention to whatever is next coming. Nicholas [00:34:52]: Yeah, I mean, this kind of worries me a little bit. I don't like whenever I have a popular thing that like, and then I write another thing, which is like entirely unrelated. Like, I don't, I don't... You should actually just throw people off right now. Swyx [00:35:01]: Exactly. Nicholas [00:35:02]: I'm trying to figure out, like, I need to put something else online. So, like, the last two or three things I've done in a row have been, like, actually, like, things that people should care about. Swyx [00:35:10]: Yes. So, I have a couple of things. Nicholas [00:35:11]: I'm trying to figure out which one do I put online to just, like, cull the list of people who have subscribed to my email. Swyx [00:35:16]: And so, like, tell them, like, Nicholas [00:35:16]: no, like, what you're here for is not informed, well-thought-through takes. Like, what you're here for is whatever I want to talk about. And if you're not up for that, then, like, you know, go away. Like, this is not what I want out of my personal website. Swyx [00:35:27]: So, like, here's, like, top 10 enemies or something. Alessio [00:35:30]: What's the next project you're going to work on that is completely unrelated to research LLMs? Or what games do you want to port into the browser next? Swyx [00:35:39]: Okay. Yeah. Nicholas [00:35:39]: So, maybe. Swyx [00:35:41]: Okay. Nicholas [00:35:41]: Here's a fun question. How much data do you think you can put on a single piece of paper? Swyx [00:35:47]: I mean, you can think about bits and atoms. Yeah. Nicholas [00:35:49]: No, like, normal printer. Like, I gave you an office printer. How much data can you put on a piece of paper? Alessio [00:35:54]: Can you re-decode it? So, like, you know, base64 or whatever.
Yeah, whatever you want. Nicholas [00:35:59]: Like, you get normal off-the-shelf printer, off-the-shelf scanner. How much data? Swyx [00:36:03]: I'll just throw out there. Like, 10 megabytes. That's enormous. I know. Nicholas [00:36:07]: Yeah, that's a lot. Swyx [00:36:10]: Really small fonts. That's my question. Nicholas [00:36:12]: So, I have a thing. It does about a megabyte. Swyx [00:36:14]: Yeah, okay. Nicholas [00:36:14]: There you go. I was off by an order of magnitude. Swyx [00:36:16]: Yeah, okay. Nicholas [00:36:16]: So, in particular, it's about 1.44 megabytes. A floppy disk. Swyx [00:36:21]: Yeah, exactly. Nicholas [00:36:21]: So, this is supposed to be the title at some point. It's a floppy disk. Swyx [00:36:24]: A paper is a floppy disk. Yeah. Nicholas [00:36:25]: So, this is a little hard because, you know. So, you can do the math and you get 8.5 by 11. You can print at 300 by 300 DPI. And this gives you 2 megabytes. And so, every single pixel, you need to be able to recover up to like 90 plus percent. Like, 95 percent. Like, 99 point something percent accuracy. In order to be able to actually decode this off the paper. This is one of the things that I'm considering. I need to get a couple more things working for this. Where, you know, again, I'm running into some random problems. But this is probably, this will be one thing that I'm going to talk about. There's this contest called the International Obfuscated C Code Contest, which is amazing. People try and write the most obfuscated C code that they can. Which is great. And I have a submission for that whenever they open up the next one for it. And I'll write about that submission. I have a very fun gate level emulation of an old CPU that runs like fully precisely. And it's a fun kind of thing. Yeah. Swyx [00:37:20]: Interesting. Your comment about the piece of paper reminds me of when I was in college. And you would have like one cheat sheet that you could write. So, you have a formula, a theoretical limit for bits per inch. And, you know, that's how much I would squeeze in really, really small. Yeah, definitely. Nicholas [00:37:36]: Okay. Swyx [00:37:37]: We are also going to talk about your benchmarking. Because you released your own benchmark that got some attention, thanks to some friends on the internet. What's the story behind your own benchmark? Do you not trust the open source benchmarks? What's going on there? Nicholas [00:37:51]: Okay. Benchmarks tell you how well the model solves the task the benchmark is designed to solve. For a long time, models were not useful. And so, the benchmark that you tracked was just something someone came up with, because you need to track something. All of deep learning exists because people tried to make models classify digits and classify images into a thousand classes. There is no one in the world who cares specifically about the problem of distinguishing between 300 breeds of dog for an image that's 224 by 224 pixels. And yet, like, this is what drove a lot of progress. And people did this not because they cared about this problem, because they wanted to just measure progress in some way. And a lot of benchmarks are of this flavor. You want to construct a task that is hard, and we will measure progress on this benchmark, not because we care about the problem per se, but because we know that progress on this is in some way correlated with making better models. And this is fine when you don't want to actually use the models that you have.
But when you want to actually make use of them, it's important to find benchmarks that track whether or not they're useful to you. And the thing that I was finding is that model after model was being released that would find some benchmark it could claim state of the art on and then say, therefore, ours is the best. And that wouldn't help me know whether or not I should switch to it. So the argument that I tried to lay out in this post is that more people should make benchmarks that are tailored to them. What I did is I wrote a domain-specific language that anyone can write for: you can take tasks that you have wanted models to solve for you and put them into your own benchmark, the thing that you care about. Then when a new model comes out, you benchmark it on the things that you care about, and you know you care about them because you've actually asked for those answers before. If the model scores well, then you know that for the kinds of things you have asked models for in the past, it can solve them well for you. This has been useful for me, because when another model comes out, I can run it and see whether it solves the kinds of things that I care about. Sometimes the answer is yes and sometimes no, and then I can decide whether or not I want to use that model.

I don't want to say that existing benchmarks are not useful. They're very good at measuring the thing that they're designed to measure. But in many cases, what they're designed to measure is not actually the thing I want to use the model for, and I expect that the way I want to use it is different from the way you want to use it. I would just like more people to have these things out there in the world. And the final reason is that it is very easy, if you want to make a model good at some benchmark, to make it good at that benchmark: you find the distribution of data that you need, train the model on that distribution, and then you have a model that can solve the benchmark well. By having a benchmark that is not very popular, you can be relatively certain that no one has tried to optimize their model for your benchmark.

Swyx [00:40:40]: And I would like this to be...

Nicholas [00:40:40]: So publishing your benchmark is a little bit...

Swyx [00:40:43]: Okay, sure.

Nicholas [00:40:43]: Contextualized. So my hope in doing this was not that people would use mine as theirs. My hope was that... You should make yours. Yes, you should make your own benchmark. If even a very small fraction of people, 0.1% of people, made a benchmark that was useful for them, that would still be hundreds of new benchmarks. I might not want to make one myself, but I might know that the kind of work I do is a little bit like this person's and a little bit like that person's, so I'll go check how a model does on their benchmarks and get a rough sense of what's going on. Because the alternative is people just do this vibes-based evaluation thing, where you interact with the model five times and see if it worked on your toy questions. But five questions is a very low-bit signal about whether or not it works for your thing. If you could just automate running 100 questions, it's a much better evaluation. So that's why I did this.
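As a rough illustration of the idea, and not the actual DSL from the project, a personal benchmark can be as small as a list of past questions paired with automatic checks. A minimal Python sketch, where `model` stands in for any hypothetical prompt-to-string callable:

```python
# Minimal sketch of a personal LLM benchmark: questions you actually asked
# a model before, each paired with an automatic correctness check.
# All names here are hypothetical; this is not the project's real DSL.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str                    # a real question from your own history
    check: Callable[[str], bool]   # does the model's answer pass?

def run_benchmark(model: Callable[[str], str], tasks: list[Task]) -> float:
    """Return the fraction of your own past tasks the model gets right."""
    return sum(t.check(model(t.prompt)) for t in tasks) / len(tasks)

tasks = [
    Task("How many seconds are in a week?",
         check=lambda out: "604800" in out.replace(",", "")),
    Task("Write a Python one-liner that reverses a string s.",
         check=lambda out: "[::-1]" in out),
]

# score = run_benchmark(my_model, tasks)  # rerun whenever a new model ships
```

The point of the design is that the checks are cheap to rerun, so every new model release gets scored on exactly the work you actually do.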
Swyx [00:41:37]: Yeah, I like the idea of going through your chat history and actually pulling out real-life examples. I regret to say that I don't think my chat history is used as much these days, because I'm using Cursor, the AI-native IDE. So your examples are all coding-related. And the immediate question is, now that you've written the How I Use AI post, which is a little bit broader, are you able to translate all these things to evals? Are some things unevaluable?

Nicholas [00:42:03]: Right. A number of things that I do are harder to evaluate. This is the problem with a benchmark: you need some way to check whether or not the output was correct. And so all of the kinds of things that I can put into the benchmark are the kinds of things that you can check. You can check more things than you might have thought possible if you do a little bit of work on the back end. For example, for all of the code that I have the model write, the benchmark runs the code and sees whether the answer is the correct answer. Or, in some cases, it runs the code, feeds the output to another language model, and that language model judges whether the output was correct. And again, is using a language model to judge here perfect? No. But what's the alternative? The alternative is to not do it. What I care about is just: is this thing broadly useful for the kinds of questions that I have? So as long as the accuracy is better than roughly random, I'm okay with this. I've inspected the outputs of these, and they're almost always correct. If you ask the model to judge these things in the right way, it's very good at being able to tell. And so, yeah, I think this is a useful thing for people to do.
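Here is a sketch of that check-by-execution pattern, with a model-as-judge fallback. The `llm` callable, the judge prompt, and the yes/no parsing are illustrative assumptions, not the benchmark's actual code:

```python
# Sketch: grade a model-written program by running it, then (optionally)
# letting a second model judge whether the output answers the question.
# `llm` is a stand-in for any prompt-to-string completion call.

import subprocess
from typing import Callable

def run_code(code: str, timeout: int = 10) -> str:
    """Execute model-written Python in a subprocess and capture its output."""
    result = subprocess.run(["python3", "-c", code],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr

def judge(llm: Callable[[str], str], question: str, output: str) -> bool:
    """Ask a second model to grade the program's output. Imperfect, but
    better than random in practice, which is all the benchmark needs."""
    verdict = llm(
        f"Question: {question}\n"
        f"Program output: {output}\n"
        "Does the output correctly answer the question? Answer yes or no."
    )
    return verdict.strip().lower().startswith("yes")
```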
Alessio [00:43:04]: You complain about prompting and being lazy, and how you do not want to tip your model or threaten to murder a kitten just to get the right answer. How do you see the evolution of prompt engineering? Even 18 months ago it was really hot and people wanted to build companies around it. Today the models are getting good. Do you think it's going to be less and less relevant going forward? Or what's the minimum viable prompt? Yeah, I don't know.

Nicholas [00:43:29]: I feel like a big part of making an agent is just a fancy prompt that calls back to the model again. I have no opinion. Maybe it turns out that this is really important; maybe it turns out that it isn't. I guess the only comment I was making is that oftentimes, when I use a model and find it's not useful, I talk to the people who help make it, and the answer they usually give me is: you're using it wrong. Which reminds me very much of the "you're holding it wrong" thing from the iPhone, right? I don't care that I'm holding it wrong. I'm holding it that way. If the thing is not working with me, then it's not useful for me. It may be the case that there exists a way to ask the model such that it gives me the correct answer, but that's not the way I'm doing it. If I have to spend so much time thinking about how to frame the question that it would have been faster for me to just get the answer myself, it didn't save me any time. And so oftentimes what I do is just dump in whatever current thought I have, in whatever ill-formed way it is, and I expect the answer to be correct. And if the answer is not correct, in some sense maybe the model was right to give me the wrong answer. I may have asked the wrong question, but I want the right answer still. And so I just want to get this as a thing. Maybe the way to fix this is to have some default prompt that always goes into all the models, or something clever like that. It would be great if someone had a way to package this up and make a thing of it; I think that's entirely reasonable. Maybe it turns out that as models get better, you don't need to prompt them as much in this way. I just want to use the things that are in front of me.

Alessio [00:44:55]: Do you think that's a limitation of just how models work? At the end of the day, you're using the prompt to steer the model in latent space. Do you think there's a way to actually make the prompt not really relevant and have the model figure it out? Or what's the...

Nicholas [00:45:10]: I mean, you could fine-tune it into the model, for example. It seems like many recent models have done this: if you ask them a question, computing an integral of this thing, they'll say, let's think through this step by step, and then go through the step-by-step answer. I didn't tell it to. Two years ago, I would have had to prompt it: think step by step on solving the following thing. Now you ask the question and the model says, here's how I'm going to do it, I'm going to take the following approach, and then sort of self-prompts itself.

Swyx [00:45:34]: Is this the right way?

Nicholas [00:45:35]: Seems reasonable. Maybe you don't have to do it. I don't know. This is for the people whose job is to make these things better. And yeah, I just want to use these things.

Swyx [00:45:43]: For listeners, that would be Orca and AgentInstruct. It's the SOTA on this stuff. Great. Yeah.

Alessio [00:45:49]: What about few-shot? Is it included in the lazy prompting? Do you do few-shot prompting? Do you collect some examples when you want to put them in? Or...

Nicholas [00:45:57]: I don't, because usually when I want the answer, I just want to get the answer.

Swyx [00:46:03]: Brutal. This is hard mode. Yeah, exactly.

Nicholas [00:46:04]: But this is fine.

Nicholas [00:46:06]: I want to be clear: there's a difference between testing the ultimate capability level of the model and testing the thing that I'm doing with it. What I'm doing is not exercising its full capability level, because there are almost certainly better ways to ask the questions that would really show how good the model is. If you're evaluating a model for being state of the art, that is ultimately what you care about. And so I'm entirely fine with people doing fancy prompting to show me what the true capability level could be, because it's really useful to know what the ultimate level of the model is. But I think it's also important to have available to you how good the model is if you don't do fancy things.

Swyx [00:46:39]: Yeah, I would say there's a divergence between how models are marketed these days and how people use them: when they test MMLU, they'll do 5-shot, 25-shot, 50-shot. And no one's providing 50 examples.

Nicholas [00:46:54]: I completely agree. You know, for these numbers, the problem is everyone wants to get state of the art on the benchmark.
And so you find the way that you can ask the model the questions so that you get state of the art on the benchmark. And it's legitimately good to know that the model can do this thing if only you try hard enough, because it means that if I have some task I want solved, I know what the capability level is, and I could get there if I were willing to work hard enough. The question then is: should I work harder and figure out how to ask the model the question, or should I just do the thing myself? For me, I have programmed for many, many years, and it's often just faster to do the thing than to figure out the incantation to ask the model. But I can imagine someone who has never programmed before might be fine writing five paragraphs in English describing exactly the thing that they want and having the model build it for them, if the alternative is not being able to do it at all. But again, this goes to all these questions of how they're going to validate it, and whether they should be trusting the output. These kinds of things.

Swyx [00:47:49]: One problem with your eval paradigm, and most eval paradigms, I'm not picking on you, is that we're actually training these things for chat, for interactive back-and-forth. And you reveal much more information that way, in the same way that asking 20 questions reveals more information, in sort of a tree-search, branching sort of way. This is also, by the way, the problem with the LMSYS arena, where the vast majority of prompts are single question, single answer, eval, done. But the way that we actually use chat, even in the stuff that you posted in your How I Use AI post, you have maybe 20 turns of back and forth. How do you eval that?

Nicholas [00:48:25]: Yeah. Okay. Very good question. This is the thing that I think many people should be doing more of: I would like more multi-turn evals. I might be writing a paper on this at some point, if I get around to it. A couple of the evals in the benchmark thing I have are already multi-turn. I mentioned 20 questions; I have a 20-questions eval there just for fun. But I have a couple of others that are like: I just tell the model, here's my git thing, figure out how to cherry-pick off this other branch and move it over there. And so what I do is I basically build a tiny little agent-y thing. I just ask the model how to do it. I run the thing on Linux; this is what I want Docker for. I spin up a Docker container, I run whatever command the model told me to run, I feed the output back into the model, and I repeat this for many rounds. And then I check at the very end, does the git commit history show that it is correctly cherry picked in
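A minimal sketch of that loop, under illustrative assumptions (a disposable Docker container named by the caller, an `llm_chat` callable that maps the transcript so far to a single shell command, and a hypothetical success marker in the git log); this is not the benchmark's actual harness:

```python
# Sketch: a multi-turn, agent-style eval. The model proposes one shell
# command per turn, we execute it in a container, feed the output back,
# and at the end grade the final state (here: the git history).

import subprocess
from typing import Callable

def sh(container: str, cmd: str) -> str:
    """Run one shell command inside an already-running Docker container."""
    out = subprocess.run(["docker", "exec", container, "sh", "-c", cmd],
                         capture_output=True, text=True)
    return out.stdout + out.stderr

def multi_turn_eval(llm_chat: Callable[[list[str]], str],
                    container: str, goal: str, max_turns: int = 10) -> bool:
    transcript = [f"Goal: {goal}. Reply with exactly one shell command."]
    for _ in range(max_turns):
        cmd = llm_chat(transcript)             # model proposes the next command
        transcript.append(cmd)
        transcript.append(sh(container, cmd))  # feed its output back in
    # Grade the end state, not the conversation: did the cherry-pick land?
    log = sh(container, "git -C /repo log --oneline")
    return "expected commit subject" in log    # hypothetical success check
```

The key design choice is that correctness is judged from the final container state rather than from the model's conversation, so the model is free to take any path to the goal.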
This week the guys are chatting about Snap improvements, the new Ryzen 9 9000 chips, and Debian 11 hitting LTS. Then they chat about Tails, Proton VPN, and ClamAV 1.4, all for security. Then Ubuntu prepares for 24.10 with some Easter eggs, and HandBrake fixes some irritating problems. For tips we have Cosmic community projects, Reflector for Arch mirrors, wl-clipboard, and a one-liner to apply patches from a URL. You can find the show notes at https://bit.ly/3M836zB and see you next week! Host: Jonathan Bennett Co-Hosts: Ken McDonald, Jeff Massie, and Rob Campbell Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
Josh and Kurt talk about a story on the "graying" of open source. There don't seem to be many young people working on open source, but we don't really know why that is. There are many theories, but a better question is: why should anyone get involved in open source anymore? The world has changed quite a lot since open source was created.

Show Notes
The graying open source community needs fresh blood
OSPOs for Good 2024: Day 1 Part 1, Day 1 Part 2, Day 2 Part 1, Day 2 Part 2
FFmpeg bug
JSON Editor Online
https://rfc3339.com/
This week it's all about the GPUs, with KDE 6.1, Nvidia's 555 drivers, and Mesa 24.1 all coming out, bringing support for explicit sync, the Nvidia NVK driver ready for prime time, and more. Then there's HandBrake with FFmpeg 7.0 support, Ventoy bringing an update, and a new PipeWire RC with support for Snapcast. Then there's Hans Reiser's last request for ReiserFS making it into kernel 6.10. For tips, we have the Bluefish editor, the last of spring cleaning, and how to use Docker to very quickly spin up one-off containers. See the show notes at https://bit.ly/3yxtdwn and come back next time! Host: Jonathan Bennett Co-Hosts: Ken McDonald and Jeff Massie Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
We're talking about Google's layoffs and how they affect Flutter and Dart, then AMD is working to push AMF code into FFmpeg, and it's time for open source to grow up. RHEL has an AI offering, Nvidia is suggesting open source kernel drivers, and Zed is coming to Linux. Then there's Pi Connect pulling a Sherlock, KDE working on color management, and a bit of a history lesson on where we came from. For tips, we have the Radion TUI radio player, || : to ignore errors in a script, the Mixxx DJ app for all those underground raves, and PanWriter for markup editing. You can catch the show notes at https://bit.ly/3UTCJ5C, and we'll see you next time! Host: Jonathan Bennett Co-Hosts: Rob Campbell, David Ruggles, and Jeff Massie Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
This is my conversation with Doug Petkanics and Eric Tang, co-founders of Livepeer.

Timestamps:
- 00:00:00 intro
- 00:01:45 sponsor: Optimism
- 00:03:55 Livepeer origin story
- 00:11:54 FFmpeg and the video infrastructure stack
- 00:17:07 compute capacity and cost in open vs closed systems
- 00:22:59 GPUs as the supply side, working at NVIDIA
- 00:40:27 finding latent demand
- 00:46:10 sponsor: Privy
- 00:47:30 learnings on go-to-market, Livepeer Studio, AI video processing
- 01:00:54 AI subnets in the Livepeer network
- 01:07:51 doing whatever it takes to get it done
- 01:13:19 interacting with the market
- 01:18:51 the inner game
- 01:24:30 outro

Links:
Doug Petkanics - https://twitter.com/petkanics
Eric Tang - https://twitter.com/ericxtang
Livepeer - https://twitter.com/livepeer
Livepeer Studio - https://twitter.com/livepeerstudio

Thank you to our sponsors for making this podcast possible:
Optimism - https://optimism.io
Privy - https://privy.io

Into the Bytecode:
Twitter - https://twitter.com/sinahab
Farcaster - https://warpcast.com/sinahab
Other episodes - https://intothebytecode.com

Disclaimer: this podcast is for informational purposes only. It is not financial advice or a recommendation to buy or sell securities. The host and guests may hold positions in the projects discussed.
This week Noah and Steve give you the latest on open source mobile operating systems. Windows is pushing people to Linux by way of charging them $244 per year for updates in the third year.

-- During The Show --

01:20 Hardware
Protectli (https://protectli.com/)
$170 Chinesium Device (https://www.newegg.com/p/22Z-007C-009Y8?Item=9SIAK3UJNH8968)
Lenovo Thunderbolt Dock (https://www.amazon.com/ZoomSpeed-Universal-Thunder-40B00300US-DisplayPort/dp/B0B1T7PPGZ)
CalDigit Thunderbolt Dock (https://www.caldigit.com/thunderbolt-station-4/)

09:19 Graphene OS - Craig
GrapheneOS
Moving away from phones
JMP.Chat (https://jmp.chat/)
Conversations (https://f-droid.org/en/packages/eu.siacs.conversations/)
Gajim (https://gajim.org/)
Linphone (https://linphone.org/)
JMP.Chat phone service
LineageOS (https://lineageos.org/)
PostmarketOS (https://postmarketos.org/)
SailfishOS (https://sailfishos.org/)
Titan M chip
Had pretty good luck on eBay
NitroKey/NitroPhone (https://shop.nitrokey.com/shop?&search=nitrophone)

25:55 NixOS Thoughts - Jeremy W
Where NixOS is useful
"Productised" NixOS
Set people up for success

32:58 Self Hosted Email - Jeremy H
Write in and tell me about your self-hosted email experiences

34:54 News Wire
German State Moving to Linux - ARS Technica (https://arstechnica.com/information-technology/2024/04/german-state-gov-ditching-windows-for-linux-30k-workers-migrating/)
Kodi 21.0 - Kodi (https://kodi.tv/article/kodi-21-0-omega-release/)
Nitrux - nxos.org (https://nxos.org/changelog/release-announcement-nitrux-3-4-0/)
Ubuntu 24.04 Delayed - Tom's Hardware (https://www.tomshardware.com/software/linux/ubuntu-2404-beta-delayed-due-to-malicious-code-in-xz-utils-other-linux-distros-are-also-affected)
EndeavourOS ARM Discontinued - EndeavourOS (https://endeavouros.com/news/goodbye-endeavouros-arm/)
Linux 6.7 EOL - Server Host (https://serverhost.com/blog/end-of-life-for-linux-kernel-6-7-urgent-call-for-users-to-upgrade-to-6-8/)
Qt Creator 13 - Qt (https://www.qt.io/blog/qt-creator-13-released)
FFmpeg 7.0 - FFmpeg (https://ffmpeg.org//index.html#pr7.0)
Dtrace 2.0 - Phoronix (https://www.phoronix.com/news/D-Trace-2.0.0-1.14)
AURORA-M - MarkTechPost (https://www.marktechpost.com/2024/04/07/aurora-m-a-15b-parameter-multilingual-open-source-ai-model-trained-in-english-finnish-hindi-japanese-vietnamese-and-code/)
Gretel AI Text-to-SQL - MarkTechPost (https://www.marktechpost.com/2024/04/04/gretel-ai-releases-largest-open-source-text-to-sql-dataset-to-accelerate-artificial-intelligence-ai-model-training/)
Viking Model Family - MarkTechPost (https://www.marktechpost.com/2024/04/07/silo-ai-releases-new-viking-model-family-pre-release-an-open-source-llm-for-all-nordic-languages-english-and-programming-languages/)
Framework Hiring - Phoronix (https://www.phoronix.com/news/Framework-OSS-Firmware-Hiring)

36:26 Bell Canada is deleting DVR/PVR recordings
Steve hates Bell Canada
Marriage photographer story
HDCP (https://en.wikipedia.org/wiki/High-bandwidth_Digital_Content_Protection)
HDCP Stripper (https://www.amazon.com/THWT-HDMI-EDID-Emulator-Model/dp/B0CRRWQ7XS)
OREI HDMI splitter (https://www.amazon.com/THWT-HDMI-EDID-Emulator-Model/dp/B0CRRWQ7XS)

43:03 Windows Upgrades/Updates
Windows 10 EOL October 2025
Extended Security Updates (ESUs)
Microsoft will charge for updates
ARS Technica (https://arstechnica.com/gadgets/2024/04/post-2025-windows-10-updates-for-businesses-start-at-61-per-pc-go-up-from-there/)

46:06 Germany Switching to Linux
Linux solves "Windows high hardware requirements"
Schleswig-Holstein developing open source directory service
LiMux from Munich
Steve's take
Lxer.com (lxer.com/module/newswire/ext_link.php?rid=339628)
ARS Technica (https://arstechnica.com/information-technology/2024/04/german-state-gov-ditching-windows-for-linux-30k-workers-migrating/)

51:10 American Privacy Rights Act
Fusion Centers (lxer.com/module/newswire/ext_link.php?rid=339628)
IQT (https://www.iqt.org/)
Tax dollars funding data collection companies
The Register (https://www.theregister.com/2024/04/09/us_federal_privacy_law_apra/)

-- The Extra Credit Section --

For links to the articles and material referenced in this week's episode check out this week's page from our podcast dashboard! This Episode's Podcast Dashboard (http://podcast.asknoahshow.com/384)
Phone Systems for Ask Noah provided by Voxtelesys (http://www.voxtelesys.com/asknoah)
Join us in our dedicated chatroom #GeekLab:linuxdelta.com on Matrix (https://element.linuxdelta.com/#/room/#geeklab:linuxdelta.com)

-- Stay In Touch --

Find all the resources for this show on the Ask Noah Dashboard (http://www.asknoahshow.com)
Need more help than a radio show can offer? Altispeed provides commercial IT services and they're excited to offer you a great deal for listening to the Ask Noah Show. Call today and ask about the discount for listeners of the Ask Noah Show! Altispeed Technologies (http://www.altispeed.com/)
Contact Noah: live [at] asknoahshow.com

-- Twitter --

Noah - Kernellinux (https://twitter.com/kernellinux)
Ask Noah Show (https://twitter.com/asknoahshow)
Altispeed Technologies (https://twitter.com/altispeed)
This week the crew starts by looking at a KDE throwback distro, then follows that up with a bunch of April Fools news, and a few April first stories that check out. FFmpeg pushes out version 7, LXC mints 6.0 LTS, and EEVDF is about feature complete. Then the XZ SSH backdoor gets an update, and that conversation turns a bit philosophical regarding how nice open source should really be. For tips we have the awesome selfhosted list, vim, xz --version and zstd, and then some xfs tools for resizing a partition. See the show notes at https://bit.ly/4aqmu5a and we hope to see you next time! Host: Jonathan Bennett Co-Hosts: Rob Campbell, David Ruggles, and Ken McDonald Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.
Microsoft wins the foot-in-mouth award this week, and Google gets the Rust religion - but Mike is skeptical.
Process is important. This show is dedicated to examples of non-developer tasks that can be improved by coding scripts. Join Scott and Wes for a deep dive into automation magic.

Show Notes
00:00 Welcome to Syntax!
02:11 Brought to you by Sentry.io.
03:02 FFmpeg, a tool for video producers. FFmpeg FFprobe
06:35 Markdown validation. Syntax Markdown Validation
09:21 AI timestamps to inform the editing process. Episode 456 Transcript
12:19 Generating clips for social media.
13:31 YouTube find and replace tool. YouTube Find & Replace - work in progress
15:03 What other scripts can you create?
16:17 Packaging a tool for a non-developer to use.
16:54 AppleScript
17:45 Stand-alone website.
19:25 Script Kit: Shortcut to Everything
20:19 Other ways to run scripts. ZX Dax
22:05 Get in touch with your tips.

Hit us up on Socials!
Syntax: X Instagram Tiktok LinkedIn Threads
Wes: X Instagram Tiktok LinkedIn Threads
Scott: X Instagram Tiktok LinkedIn Threads
Randy: X Instagram YouTube Threads
The stories that kept us talking all year, and are only getting hotter! Plus the big flops we're still sore about. Special Guest: Kenji Berthold.