Think AI is hitting a wall? Nope. This is just the start. Actually, we're at the first chapter. Here's what that means, and how you can move your company ahead.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Thoughts on this? Join the conversation
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
Generative AI's current phase
Meta's in-house AI chips development
OpenAI's new developer tools
Day zero of AI and future prospects
Reinforcement learning advancements
Emergent reasoning capabilities in AI
Business implications of AI advancements
AI in healthcare and science

Timestamps:
00:00 Day Zero of AI
03:31 AI Tools Enhance Customization & Access
09:02 Reinforcement Learning Enhances AI Reasoning
11:27 Agentic AI: The Future of Tasks
15:59 Tech Potential vs. Everyday Utilization
18:48 AI Models Offer Broad Benefits
23:15 "Generative AI: Optimism and Oversight"
27:08 Generative AI vs. Domain-Specific AI
29:24 Superhuman AI: Next Frontier

Keywords: Generative AI, Fortune 100 leaders, ChatGPT, Microsoft Copilot, enterprise companies, day zero of AI, livestream podcast, free daily newsletter, leveraging AI, capital expenditures, Meta AI chips, Nvidia, Taiwan's TSMC, AI infrastructure investments, Amazon, Google, Microsoft, OpenAI, responses API, agents SDK, legal research, customer support, deep research, agentic AI, supervised learning, reinforcement learning, language models, health care, computational biology, AlphaFold, protein folding prediction.

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)
Ready for ROI on GenAI? Go to youreverydayai.com/partner
Wes talks with Peter Pistorius about RedwoodSDK, a new React framework built natively for Cloudflare. They dive into real-time React, server components, zero-cost infrastructure, and why RedwoodSDK empowers developers to ship faster with fewer tradeoffs and more control.

Show Notes
00:00 Welcome to Syntax!
00:52 What is RedwoodSDK?
04:49 Choosing openness over abstraction
08:46 More setup, more control
12:20 Why RedwoodSDK only runs on Cloudflare
14:25 What the database setup looks like
16:15 Durable Objects explained – Ep 879: Fullstack Cloudflare
18:14 Middleware and request flow
23:14 No built-in client-side router?
24:07 Integrating routers with defineApp
26:04 React Server Components and real-time updates
29:53 What happened to RedwoodJS?
31:14 Why do opinionated frameworks struggle to catch on?
34:35 The problem with Lambdas
36:16 Cloudflare's JavaScript runtime compatibility
40:04 Brought to you by Sentry.io
41:44 The vision behind RedwoodSDK

Hit us up on Socials!
Syntax: X Instagram Tiktok LinkedIn Threads
Wes: X Instagram Tiktok LinkedIn Threads
Scott: X Instagram Tiktok LinkedIn Threads
Randy: X Instagram YouTube Threads
Welcome to this exciting episode of The Edge of Show! Join us as we sit down with Douglas Horn, the CEO of Goodblock and founder of Ease Protocol, live from Dubai.

In this episode, Douglas shares his journey from the entertainment industry to the world of blockchain, discussing his previous work with the Telos blockchain and the innovative technologies he developed, including Telos EVM and cross-chain bridging. He dives deep into the mission of Ease Protocol, a blockchain platform designed to meet regulatory standards and enhance user experience, making blockchain accessible for businesses and governments.

Discover how Ease Protocol aims to revolutionize the blockchain landscape with features like sequestered encryption, modular SDKs, and a unique approach to compliance that addresses the needs of governments. Douglas also discusses the importance of user-friendly interfaces and the potential for stablecoins and CBDCs in various jurisdictions.

Tune in to learn about the future of blockchain technology, the role of governance, and the exciting developments on the horizon for Ease Protocol. Don't forget to like, subscribe, and hit the notification bell for more insights into the world of Web3 tech and culture!
Apologies for the hiatus! Dave needed some time off to recover from burnout, and these episodes remained in the can. Thanks for waiting for us!
Scott and Wes break down the latest in JavaScript news, including new async patterns in Svelte, React Server Component tooling with Parcel, and Redwood's push into Cloudflare with its new SDK. They also cover what's new in Storybook 9 Beta, from visual testing to a sleeker, lighter build.

Show Notes
00:00 Welcome to Syntax!
02:50 Brought to you by Sentry.io.
03:37 Syntax Meetup!
04:09 React View Transitions.
08:58 addTransitionType.
11:18 Activity API. Offscreen Renamed to Activity.
14:22 Maintaining state in search queries.
16:29 Asynchronous Svelte. Playground.
19:04 Svelte Boundary.
25:13 Parcel RSC.
27:15 Redwood SDK.
30:55 Storybook 9 Beta.

Hit us up on Socials!
Syntax: X Instagram Tiktok LinkedIn Threads
Wes: X Instagram Tiktok LinkedIn Threads
Scott: X Instagram Tiktok LinkedIn Threads
Randy: X Instagram YouTube Threads
The topics in today's Versicherungsfunk Update:

Parents as a sales target group: one in five plans to take out insurance
According to a YouGov analysis, 21% of parents in Germany plan to take out or switch an insurance policy within the next 12 months. Men between 35 and 44 in full-time employment show particular readiness to buy. They are above average in their willingness to invest, health-conscious, and insurance-savvy.

AI, deepfakes, and digital health: new edition of the Tech Trend Radar published
Which technologies will change the insurance industry in 2025? With the new Tech Trend Radar 2025, Munich Re and ERGO present an overview of 36 technology-driven trends, from AI agents and digital healthcare to deepfake defense. The publication highlights opportunities, risks, and applications along insurers' entire value chain.

Stuttgarter appoints Volker Bohn as authorized representative for independent distribution partners
The Stuttgarter insurance group has appointed Volker Bohn as Generalbevollmächtigter (authorized representative) for independent distribution partners. With the company since 2019, Bohn stands for market proximity, strategic foresight, and strong partnerships. Alongside sales, he also served as sustainability officer and played an active role in shaping corporate strategy and the merger project with SDK.

ESG award for hylane: DEVK subsidiary impresses with hydrogen truck rentals
DEVK subsidiary hylane took second place in the "Change Enabler" category of the ESG Transformation Award. The company offers a usage-based rental model for emission-free hydrogen trucks, supporting the climate-neutral transformation of heavy-goods transport. In addition to a sustainable drivetrain, companies benefit from CO₂ documentation for meeting the new EU reporting obligations.

Zurich wins ESG Transformation Award in the "Change Enabler" category
Zurich Gruppe Deutschland received the ESG Transformation Award for its Zurich Resilience Solutions unit. In the "Change Enabler" category, Zurich stood out with a scalable climate-risk analysis solution that works for small and mid-sized companies as well. The independent jury praised the digital service's concrete impact.

Marsh appoints Thomas Droberg as head of its cyber risk unit
Marsh Deutschland has named Thomas Droberg the new head of CYRIS, its specialty unit for cyber insurance, incident management, and claims management. Droberg takes over from Johannes Behrends, who became Chief Specialty Officer at the start of the year. The new Head of Cyber Insurance is Patrick Baas, who has been with Marsh since 2020.
Take a Network Break! This week we catch up on the Airborne vulnerabilities affecting Apple's AirPlay protocol and SDK, and get an update on active exploits against an SAP NetWeaver vulnerability. A patch is available, so get fixing if you haven't already. Palo Alto Networks launches the AIRS platform to address AI threats in the enterprise...
From Bitcoin Magazine X Spaces, Business Reporter Juan Galt sits down with Roy Sheinfeld, co-founder of Breeze, to discuss the evolving architecture of Bitcoin's Lightning Network and how Breeze is working to make Bitcoin more usable as a medium of exchange. Roy explains Breeze's unique SDK architecture, the role of Lightning Service Providers (LSPs), and the technical trade-offs of custodial vs. non-custodial systems.

The conversation dives deep into the mechanics of self-custody, key management, watchtowers, and the controversial use of stablecoins like USDT on Lightning via Taproot Assets. Roy also shares what's coming next for Breeze, including support for Bolt 12, BIP 353, Nostr Wallet Connect, and WASM.

Chapters:
00:00 – Intro and Bitcoin News Roundup
05:40 – What is Breeze? Company Mission and Evolution
08:00 – Solving Lightning's UX: Breeze SDK Explained
12:00 – Key Management, External Signers, and Custody Models
15:00 – Native Lightning vs. Liquid-Based Payments
18:00 – Lightning Service Providers and UX Challenges
21:00 – Is Breeze Custodial? The Liquid Trust Model Debate
26:00 – Defining "Self-Custody": Terminology Controversies
30:00 – Phoenix, Watchtowers & Trust Assumptions in Lightning
34:00 – Unilateral Exit as a Bitcoin Layer 2 Filter
39:00 – Bolt 12 and BIP 353 Support Incoming
42:00 – Building for Nostr: Wallet Connect and Zaps
44:00 – Taproot Assets & Stablecoins on Lightning
47:00 – Regulatory Risks and the USDT Lightning Takeover Concern
49:00 – What's Next for Breeze: Mr. Breeze, WASM, and Multi-Network Support

Recorded 04/05/2025
In this episode of the Cognitive Revolution, an AI-narrated version of Joshua Clymer's story on how AI might take over in two years is presented. The episode is based on Josh's appearance on the Audio Tokens podcast with Lukas Peterson. Joshua Clymer, a technical AI safety researcher at Redwood Research, shares a fictional yet plausible AI scenario grounded in current industry realities and trends. The story highlights potential misalignment risks, competitive pressures among AI labs, and the importance of government regulation and safety measures. After the story, Josh and Lukas discuss these topics further, including Josh's personal decision to purchase a bio shelter for his family. The episode is powered by ElevenLabs' AI voice technology.

SPONSORS:
ElevenLabs: ElevenLabs gives your app a natural voice. Pick from 5,000+ voices in 31 languages, or clone your own, and launch lifelike agents for support, scheduling, learning, and games. Full server and client SDKs, dynamic tools, and monitoring keep you in control. Start free at https://elevenlabs.io/cognitive-revolution
Oracle Cloud Infrastructure (OCI): Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive
Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive

PRODUCED BY: https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(04:31) Interview start between Josh and Lukas
(11:00) Start of AI story (Part 1)
(24:37) Sponsors: ElevenLabs | Oracle Cloud Infrastructure (OCI)
(27:05) Start of AI story (Part 2) (Part 1)
(40:50) Sponsors: Shopify | NetSuite
(44:15) Start of AI story (Part 2) (Part 2)
(01:20:09) End of AI story
(02:01:20) Outro
Cloud Connections 2025 | St. Petersburg, FL

"If you think you're moving fast, you're probably not moving fast enough." That was the core message from Mike Tessler, managing partner at True North Advisory, in his opening keynote at the Cloud Connections 2025 conference. In a session titled "Don't Stop Believin': AI's Journey in Enterprise Transformation," Tessler shifted the AI conversation from capabilities to strategy. Instead of showcasing the latest contact center tricks or flashy generative features, he dove deep into how enterprises should approach AI adoption, with urgency, realism, and a clear plan.

Tessler framed the moment as a once-in-a-generation inflection point. Just 866 days since ChatGPT launched, enterprises have been flooded with AI solutions, but many are still struggling with actual implementation. "The field is exploding, but there's friction," said Tessler, noting that while consumers quickly embraced AI tools, corporate environments remain slow to adapt.

Three Big Takeaways from Tessler's Talk

1. AI Is Only as Good as Your Data
Enterprises must start by understanding their own data. "Almost every company says, 'We don't have data,'" Tessler observed, "but they do. They just don't know how to surface and structure it." He suggested simple tools like JSON to codify marketing guidelines or operational principles and inject consistency into AI-generated content (see the sketch at the end of this recap).

2. Enterprise Strategy Starts with Personal Productivity
Tessler outlined a three-layer AI roadmap used at Boldyn Networks, where he serves on the board:
Layer 1: Personal Productivity (e.g., Copilot, Gemini)
Layer 2: Team & Process-Level AI (e.g., AI in network design/deployment)
Layer 3: New Services & Capabilities enabled by proprietary data
This layered model helps unify enterprise goals and align AI projects with tangible outcomes.

3. Start Small, Move Fast, Stay Agile
Forget long IT rollouts, said Tessler. AI adoption demands an agile, iterative approach. Small proofs of concept are key. "Something that wasn't possible last week might be today," he warned. "So get started now."

Real-World Use Cases: Where AI Is Delivering Value Today

Tessler concluded with four examples of AI being used to solve real business problems:
Spinoco – Helps micro-businesses manage customer interactions by turning every message, call, or DM into actionable tasks, no CRM needed.
Kiwi Data – Uses AI to extract key terms and obligations from decades of contracts and NDAs, helping enterprises get a grip on what they've signed.
Tato – Leverages the "exhaust" of UCaaS platforms (transcripts, messages) to identify project risks and drive smarter project management.
Intent HQ – Delivers hyper-personalized marketing using behavioral data harvested via mobile SDKs.

A Call to Action for the Telecom Community

Tessler left the audience with a challenge: "We have to change the way we do things—or get wiped out by those who do." He encouraged every organization to return home with at least one AI use case to explore. "Try something. Test. Learn. Iterate."

To request the slides from the keynote, contact: info@truenorthadvisory.com
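To make Tessler's JSON suggestion concrete, here is a minimal sketch of what codified brand guidelines might look like. Every field name and value below is hypothetical, invented purely for illustration; the point is simply that structured guidelines can be injected into prompts to keep AI-generated content consistent.

```json
{
  "brand_voice": {
    "tone": "confident, plain-spoken",
    "avoid": ["jargon", "unverifiable superlatives"],
    "reading_level": "grade 9"
  },
  "operational_principles": [
    "Lead with the customer's problem, not the product",
    "Back every claim with a concrete metric where possible"
  ]
}
```

A file like this can be pasted into a system prompt or attached by an internal tool, so every generated draft starts from the same rules instead of each employee improvising.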
In this eye-opening episode of Reimagining Cyber, host Tyler Moffitt is joined by Tom Tovar, co-creator of cybersecurity company Apto, to unpack one of the fastest-growing threats in mobile security: deepfakes and biometric bypass attacks.

Tom explains why facial recognition, once considered a reliable security measure, was never designed to withstand today's AI-powered spoofing tactics. From simple call interception techniques to sophisticated real-time face-swapping and buffer overrides, Tom walks us through the anatomy of modern biometric attacks. He also reveals why most mobile apps, and even top-tier facial recognition systems, are currently defenseless against these threats.

We dive deep into the vulnerabilities hiding in plain sight within mobile frameworks, and why defending facial recognition starts with the app itself, not the authentication system. Plus, Tom gives us a glimpse into how AI is being used to both attack and defend, and what the future of mobile app security might look like. If you think your face is your password, think again.

Topics Covered:
How attackers bypass facial recognition without even needing a deepfake
Common tools and techniques used to manipulate authentication flows
The problem with relying on SDK-based facial recognition vendors
Why the future of defense lies in app-level perimeter security
How Apto is using AI to build autonomous, in-app defenses

Whether you're a security professional or just fascinated by the evolving threat landscape, this is a must-listen episode. Follow or subscribe to the show on your preferred podcast platform. Share the show with others in the cybersecurity world. Get in touch via reimaginingcyber@gmail.com.

As featured on Million Podcasts' Best 100 Cybersecurity Podcasts and Best 70 Chief Information Security Officer CISO Podcasts rankings.
Vojtěch Meluzín is the founder and CEO of MeldaProduction, a Czech-based audio software company renowned for its innovative and versatile audio plugins. Born and raised in the Czech Republic, Meluzín's journey into music and technology began early: he was already creating music on an Atari using Cubase at the age of 8 and started programming at 10. He pursued his passion for music and technology through formal education, culminating in a university degree where he developed a GUI system for plugins as part of his diploma work.

His initial foray into audio software development was the creation of MDrummer, a project that began as a school assignment but evolved into a commercial product. Encouraged by a peer to market his creation, Meluzín improved MDrummer significantly and decided to venture into the audio plugin industry independently. Under his leadership, MeldaProduction has grown to offer over 120 plugins, including the flagship MSoundFactory and MTurboComp.

Meluzín is known for his hands-on approach, having developed the company's custom framework without relying on third-party SDKs, allowing for complete control and optimization. He emphasizes innovation over imitation, often critiquing the industry's focus on analog emulation and instead leveraging digital capabilities to create unique audio processing tools. His work is characterized by a commitment to pushing the boundaries of audio software, integrating machine learning and advanced algorithms to enhance functionality and user experience. His dedication to quality and innovation has positioned MeldaProduction as a respected name in the audio production community.

Vojtech Meluzin Links
Mr. Bill's Links
Hey everyone, Alex here
Jassem "J" Osseiran is an experienced entrepreneur and operational investor based in London. With a deep understanding of global finance gained through his Economics degree at the University of San Francisco, Jay began his career as a tech venture builder with Rocket Internet in New York. He then shifted his focus to the Middle East and North Africa (MENA), managing media portfolios and founding startups tailored to the region. A sought-after advisor for high-growth brands and cutting-edge technology platforms, Jay has orchestrated multiple successful exits, including a notable IPO on the London Stock Exchange. In 2020, he founded 611 Capital Investments, concentrating on investments, incubation, and market scalability across the EU, the Middle East, and Asia. Currently, Jay is the Co-Founder and Chief Strategy Officer of PlaysOut Technologies, where he is spearheading the commercial framework for seamlessly integrating mini-games within super apps. This initiative represents a significant shift in the digital ecosystem, leveraging evolving Web3 technologies to unlock immense potential for mini-games in the digital space. Ronan recently caught up with Jay and he spoke about his background, what PlaysOuts do, blockchain, AI agents, one click publishing and more. More about Playsout: PlaysOut is a globally oriented open platform for mini-programs that fully aligns with the Weixin Mini-Program framework. It provides SDK interfaces for super apps, offering a rich array of mini-program and mini-game content. Additionally, PlaysOut streamlines the integration process for developers, providing convenient tools and seamless access to mini-program and mini-game products. Leveraging its robust technical capabilities and deep collaborations with game developers, PlaysOut aims to connect with global traffic partners, gradually expanding its reach across international markets. Its goal is to become the largest open platform for mini-programs and mini-games worldwide. See more podcasts here. More about Irish Tech News Irish Tech News are Ireland's No. 1 Online Tech Publication and often Ireland's No.1 Tech Podcast too. You can find hundreds of fantastic previous episodes and subscribe using whatever platform you like via our Anchor.fm page here: https://anchor.fm/irish-tech-news If you'd like to be featured in an upcoming Podcast email us at Simon@IrishTechNews.ie now to discuss. Irish Tech News have a range of services available to help promote your business. Why not drop us a line at Info@IrishTechNews.ie now to find out more about how we can help you reach our audience. You can also find and follow us on Twitter, LinkedIn, Facebook, Instagram, TikTok and Snapchat.
Discover how Oracle APEX leverages OCI AI services to build smarter, more efficient applications. Hosts Lois Houston and Nikita Abraham interview APEX experts Chaitanya Koratamaddi, Apoorva Srinivas, and Toufiq Mohammed about how key services like OCI Vision, Oracle Digital Assistant, and Document Understanding integrate with Oracle APEX. Packed with real-world examples, this episode highlights all the ways you can enhance your APEX apps. Oracle APEX: Empowering Low Code Apps with AI: https://mylearn.oracle.com/ou/course/oracle-apex-empowering-low-code-apps-with-ai/146047/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Last week, we looked at how generative AI powers Oracle APEX and in today's episode, we're going to focus on integrating APEX with OCI AI Services. Lois: That's right, Niki. We're going to look at how you can use Oracle AI services like OCI Vision, Oracle Digital Assistant, Document Understanding, OCI Generative AI, and more to enhance your APEX apps. 01:03 Nikita: And to help us with it all, we've got three amazing experts with us, Chaitanya Koratamaddi, Director of Product Management at Oracle, and senior product managers, Apoorva Srinivas and Toufiq Mohammed. In today's episode, we'll go through each Oracle AI service and look at how it interacts with APEX. Apoorva, let's start with you. Can you explain what the OCI Vision service is? Apoorva: Oracle Cloud Infrastructure Vision is a serverless multi-tenant service accessible using the console or REST APIs. You can upload images to detect and classify objects in them. With prebuilt models available, developers can quickly build image recognition into their applications without machine learning expertise. OCI Vision service provides a fully managed model infrastructure. With complete integration with OCI Data Labeling, you can build custom models easily. OCI Vision service provides pretrained models-- Image Classification, Object Detection, Face Detection, and Text Recognition. You can build custom models for Image Classification and Object Detection. 02:24 Lois: Ok. What about its use cases? How can OCI Vision make APEX apps more powerful? Apoorva: Using OCI Vision, you can make images and videos discoverable and searchable in your APEX app. You can use OCI Vision to detect and classify objects in the images. OCI Vision also highlights the objects using a red rectangular box. This comes in handy in use cases such as detecting vehicles that have violated the rules in traffic images. You can use OCI Vision to identify visual anomalies in your data. This is a very popular use case where you can detect anomalies in cancer X-ray images to detect cancer. These are some of the most popular use cases of using OCI Vision with your APEX app. 
But the possibilities are endless and you can use OCI Vision for any of your image analysis. 03:29 Nikita: Let's shift gears to Oracle Digital Assistant. Chaitanya, can you tell us what it's all about? Chaitanya: Oracle Digital Assistant is a low-code conversational AI platform that allows businesses to build and deploy AI assistants. It provides natural language understanding, automatic speech recognition, and text-to-speech capabilities to enable human-like interactions with customers and employees. Oracle Digital Assistant comes with prebuilt templates for you to get started. 04:00 Lois: What are its key features and benefits, Chaitanya? How does it enhance the user experience? Chaitanya: Oracle Digital Assistant provides conversational AI capabilities that include generative AI features, natural language understanding and ML, AI-powered voice, and analytics and insights. Integration with enterprise applications become easier with unified conversational experience, prebuilt chatbots for Oracle Cloud applications, and chatbot architecture frameworks. Oracle Digital Assistant provides advanced conversational design tools, conversational designer, dialogue and domain trainer, and native multilingual support. Oracle Digital Assistant is open, scalable, and secure. It provides multi-channel support, automated bot-to-agent transfer, and integrated authentication profile. 04:56 Nikita: And what about the architecture? What happens at the back end? Chaitanya: Developers assemble digital assistants from one or more skills. Skills can be based on prebuilt skills provided by Oracle or third parties, custom developed, or based on one of the many skill templates available. 05:16 Lois: Chaitanya, what exactly are “skills” within the Oracle Digital Assistant framework? Chaitanya: Skills are individual chatbots that are designed to interact with users and fulfill specific type of tasks. Each skill helps a user complete a task through a combination of text messages and simple UI elements like select list. When a user request is submitted through a channel, the Digital Assistant routes the user's request to the most appropriate skill to satisfy the user's request. Skills can combine multilingual NLP deep learning engine, a powerful dialogflow engine, and integration components to connect to back-end systems. Skills provide a modular way to build your chatbot functionality. Now users connect with a chatbot through channels such as Facebook, Microsoft Teams, or in our case, Oracle APEX chatbot, which is embedded into an APEX application. 06:21 Nikita: That's fascinating. So, what are some use cases of Oracle Digital Assistant in APEX apps? Chaitanya: Digital assistants streamline approval processes by collecting information, routing requests, and providing status updates. Digital assistants offer instant access to information and documentation, answering common questions and guiding users. Digital assistants assist sales teams by automating tasks, responding to inquiries, and guiding prospects through the sales funnel. Digital assistants facilitate procurement by managing orders, tracking deliveries, and handling supplier communication. Digital assistants simplify expense approvals by collecting reports, validating receipts, and routing them for managerial approval. Digital assistants manage inventory by tracking stock levels, reordering supplies, and providing real-time inventory updates. Digital assistants have become a common UX feature in any enterprise application. 
07:28 Want to learn how to design stunning, responsive enterprise applications directly from your browser with minimal coding? The new Oracle APEX Developer Professional learning path and certification enables you to leverage AI-assisted development, including generative AI and Database 23ai, to build secure, scalable web and mobile applications with advanced AI-powered features. From now through May 15, 2025, we're waiving the certification exam fee (valued at $245). So, what are you waiting for? Visit mylearn.oracle.com to get started today. 08:09 Nikita: Welcome back! Thanks for that, Chaitanya. Toufiq, let's talk about the OCI Document Understanding service. What is it? Toufiq: Using this service, you can upload documents to extract text, tables, and other key data. This means the service can automatically identify and extract relevant information from various types of documents, such as invoices, receipts, contracts, etc. The service is serverless and multitenant, which means you don't need to manage any servers or infrastructure. You can access this service using the console, REST APIs, SDK, or CLI, giving you multiple ways to integrate. 08:55 Nikita: What do we use for APEX apps? Toufiq: For APEX applications, we will be using REST APIs to integrate the service. Additionally, you can process individual files or batches of documents using the ProcessorJob API endpoint. This flexibility allows you to handle different volumes of documents efficiently, whether you need to process a single document or thousands at once. With these capabilities, the OCI Document Understanding service can significantly streamline your document processing tasks, saving time and reducing the potential for manual errors. 09:36 Lois: Ok. What are the different types of models available? How do they cater to various business needs? Toufiq: Let us start with pre-trained models. These are ready-to-use models that come right out of the box, offering a range of functionalities. The available models are Optical Character Recognition (OCR) enables the service to extract text from documents, allowing you to digitize, scan the documents effortlessly. You can precisely extract text content from documents. Key-value extraction, useful in streamlining tasks like invoice processing. Table extraction can intelligently extract tabular data from documents. Document classification automatically categorizes documents based on their content. OCR PDF enables seamless extraction of text from PDF files. Now, what if your business needs go beyond these pre-trained models. That's where custom models come into play. You have the flexibility to train and build your own models on top of these foundational pre-trained models. Models available for training are key value extraction and document classification. 10:50 Nikita: What does the architecture look like for OCI Document Understanding? Toufiq: You can ingest or supply the input file in two different ways. You can upload the file to an OCI Object Storage location. And in your request, you can point the Document Understanding service to pick the file from this Object Storage location. Alternatively, you can upload a file directly from your computer. Once the file is uploaded, the Document Understanding service can process the file and extract key information using the pre-trained models. You can also customize models to tailor the extraction to your data or use case. 
After processing the file, the Document Understanding service stores the results in JSON format in the Object Storage output bucket. Your Oracle APEX application can then read the JSON file from the Object Storage output location, parse the JSON, and store useful information at local table or display it on the screen to the end user. 11:52 Lois: And what about use cases? How are various industries using this service? Toufiq: In financial services, you can utilize Document Understanding to extract data from financial statements, classify and categorize transactions, identify and extract payment details, streamline tax document management. Under manufacturing, you can perform text extraction from shipping labels and bill of lading documents, extract data from production reports, identify and extract vendor details. In the healthcare industry, you can automatically process medical claims, extract patient information from forms, classify and categorize medical records, identify and extract diagnostic codes. This is not an exhaustive list, but provides insights into some industry-specific use cases for Document Understanding. 12:50 Nikita: Toufiq, let's switch to the big topic everyone's excited about—the OCI Generative AI Service. What exactly is it? Toufiq: OCI Generative AI is a fully managed service that provides a set of state of the art, customizable large language models that cover a wide range of use cases. It provides enterprise grade generative AI with data governance and security, which means only you have access to your data and custom-trained models. OCI Generative AI provides pre-trained out-of-the-box LLMs for text generation, summarization, and text embedding. OCI Generative AI also provides necessary tools and infrastructure to define models with your own business knowledge. 13:37 Lois: Generally speaking, how is OCI Generative AI useful? Toufiq: It supports various large language models. New models available from Meta and Cohere include Llama2 developed by Meta, and Cohere's Command model, their flagship text generation model. Additionally, Cohere offers the Summarize model, which provides high-quality summaries, accurately capturing essential information from documents, and the Embed model, converting text to vector embeddings representation. OCI Generative AI also offers dedicated AI clusters, enabling you to host foundational models on private GPUs. It integrates LangChain and open-source framework for developing new interfaces for generative AI applications powered by language models. Moreover, OCI Generative AI facilitates generative AI operations, providing content moderation controls, zero downtime endpoint model swaps, and endpoint deactivation and activation capabilities. For each model endpoint, OCI Generative AI captures a series of analytics, including call statistics, tokens processed, and error counts. 14:58 Nikita: What about the architecture? How does it handle user input? Toufiq: Users can input natural language, input/output examples, and instructions. The LLM analyzes the text and can generate, summarize, transform, extract information, or classify text according to the user's request. The response is sent back to the user in the specified format, which can include raw text or formatting like bullets and numbering, etc. 15:30 Lois: Can you share some practical use cases for generative AI in APEX apps? Toufiq: Some of the OCI generative AI use cases for your Oracle APEX apps include text summarization. 
Generative AI can quickly summarize lengthy documents such as articles, transcripts, doctor's notes, and internal documents. Businesses can utilize generative AI to draft marketing copy, emails, blog posts, and product descriptions efficiently. Generative AI-powered chatbots are capable of brainstorming, problem solving, and answering questions. With generative AI, content can be rewritten in different styles or languages. This is particularly useful for localization efforts and catering to a diverse audience. Generative AI can classify intent in customer chat logs, support tickets, and more. This helps businesses understand customer needs better and provide tailored responses and solutions. By searching call transcripts and internal knowledge sources, generative AI enables businesses to efficiently answer user queries. This enhances information retrieval and decision-making processes. 16:47 Lois: Before we let you go, can you explain what Select AI is? How is it different from the other AI services? Toufiq: Select AI is a feature of Autonomous Database. This is where Select AI differs from the other AI services. Be it OCI Vision, Document Understanding, or OCI Generative AI, these are all fully managed standalone services on Oracle Cloud, accessible via REST APIs. Whereas Select AI is a feature available in Autonomous Database. That means to use Select AI, you need Autonomous Database. 17:26 Nikita: And what can developers do with Select AI? Toufiq: Traditionally, SQL is the language used to query the data in the database. With Select AI, you can talk to the database and get insights from the data in the database using human language. At the very basic, what Select AI does is it generates SQL queries using natural language, like an NL2SQL capability. 17:52 Nikita: How does it actually do that? Toufiq: When a user asks a question, the first step Select AI does is look into the AI profile, which you, as a developer, define. The AI profile holds crucial information, such as table names, the LLM provider, and the credentials needed to authenticate with the LLM service. Next, Select AI constructs a prompt. This prompt includes information from the AI profile and the user's question. Essentially, it's a packet of information containing everything the LLM service needs to generate SQL. The next step is generating SQL using LLM. The prompt prepared by Select AI is sent to the available LLM services via REST. Which LLM to use is configured in the AI profile. The supported providers are OpenAI, Cohere, Azure OpenAI, and OCI Generative AI. Once the SQL is generated by the LLM service, it is returned to the application. The app can then handle the SQL query in various ways, such as displaying the SQL results in a report format or as charts, etc. 19:05 Lois: This has been an incredible discussion! Thank you, Chaitanya, Apoorva, and Toufiq, for walking us through all of these amazing AI tools. If you're ready to dive deeper, visit mylearn.oracle.com and search for the Oracle APEX: Empowering Low Code Apps with AI course. You'll find step-by-step guides and demos for everything we covered today. Nikita: Until next week, this is Nikita Abraham… Lois: And Lois Houston signing off! 19:31 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
RJJ Software's Software Development Service

This episode of The Modern .NET Show is supported, in part, by RJJ Software's Software Development Services. Whether your company is looking to elevate its UK operations or reshape its US strategy, we can provide tailored solutions that exceed expectations.

Show Notes

"So on my side it was actually, the interesting experience was that I kind of used it one way, because it was mainly about reading the Python code, the JavaScript code, and, let's say like, the Go implementations, trying to understand what are the concepts, what are the ways about how it has been implemented by the different teams. And then, you know, switching mentally into the other direction of writing than the code in C#."
— Jochen Kirstaetter

Welcome friends to The Modern .NET Show; the premier .NET podcast, focusing entirely on the knowledge, tools, and frameworks that all .NET developers should have in their toolbox. We are the go-to podcast for .NET developers worldwide, and I am your host: Jamie "GaProgMan" Taylor.

In this episode, Jochen Kirstaetter joined us to talk about his .NET SDK for interacting with Google's Gemini suite of LLMs. Jochen tells us that he started his journey by looking at the existing .NET SDK, which didn't seem right to him, and wrote his own using the HttpClient and HttpClientFactory classes and REST.

"I provide a test project with a lot of tests. And when you look at the simplest one, is that you get your instance of the Generative AI type, which you pass in either your API key, if you want to use it against Google AI, or you pass in your project ID and location if you want to use it against Vertex AI. Then you specify which model that you like to use, and you specify the prompt, and the method that you call is then GenerateContent and you get the response back. So effectively with four lines of code you have a full integration of Gemini into your .NET application."
— Jochen Kirstaetter

A minimal sketch of those four lines appears at the end of these notes.

Along the way, we discuss the fact that Jochen had to look into the Python, JavaScript, and even Go SDKs to get a better understanding of how his .NET SDK should work. We discuss the "Pythonistic .NET" and ".NETy Python" code that developers can accidentally end up writing, if they're not careful when moving from .NET to Python and back. And we also talk about Jochen's use of tests as documentation for his SDK.

Anyway, without further ado, let's sit back, open up a terminal, type in `dotnet new podcast` and we'll dive into the core of Modern .NET.

Supporting the Show

If you find this episode useful in any way, please consider supporting the show by either leaving a review (check our review page for ways to do that), sharing the episode with a friend or colleague, buying the host a coffee, or considering becoming a Patron of the show.
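To make that last quote concrete, here is roughly what those four lines look like in C#. This is a minimal sketch based on Jochen's description and his Mscc.GenerativeAI NuGet package; treat the exact type and member names (GoogleAI, GenerativeModel, the Model identifier) as assumptions that may differ between package versions, not a verbatim copy of the current API.

```csharp
using Mscc.GenerativeAI;

// 1. Get an instance of the generative AI type.
//    Passing an API key targets Google AI; passing a project ID
//    and location would target Vertex AI instead.
var googleAI = new GoogleAI(apiKey: "your-api-key");

// 2. Specify which model you would like to use.
var model = googleAI.GenerativeModel(model: Model.Gemini15Pro);

// 3. Specify the prompt and call GenerateContent.
var response = await model.GenerateContent("Summarise this episode in one sentence.");

// 4. Read the generated text from the response.
Console.WriteLine(response.Text);
```

The appeal Jochen describes is that there is no client ceremony beyond this: the same calls work against Google AI or Vertex AI depending on how the first object is constructed.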
Full Show Notes

The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at: https://dotnetcore.show/season-7/google-gemini-in-net-the-ultimate-guide-with-jochen-kirstaetter/

JoKi's Links:
JoKi's MVP Profile
JoKi's Google Developer Expert Profile
JoKi's website

Other Links:
Generative AI for .NET Developers with Amit Bahree
curl
Noda Time with Jon Skeet
Google Cloud samples repo on GitHub
Google's Gemini SDK for Python
Google's Gemini SDK for JavaScript
Google's Gemini SDK for Go
Vertex AI
JoKi's base NuGet package: Mscc.GenerativeAI
JoKi's NuGet package: Mscc.GenerativeAI.Google
System.Text.Json
gcloud CLI
.NET Preprocessor directives
.NET Target Framework Monikers
QUIC protocol
IAsyncEnumerable
Microsoft.Extensions.AI

Supporting the show:
Leave a rating or review
Buy the show a coffee
Become a patron

Getting in Touch:
Via the contact page
Joining the Discord

Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts; this will help the show's audience grow. Or you can just share the show with a friend. And don't forget to reach out via our Contact page. We're very interested in your opinion of the show, so please get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast.

Music created by Mono Memory Music, licensed to RJJ Software for use in The Modern .NET Show
What's coming with Expo SDK 53? I dive into the latest news, trends and upcoming features of Expo and React Native, and share updates on the projects I'm working on.

Also in this episode:
- How RevenueCat Paywalls make my life better
- I talked with Google
- "Is this sponsored by Expo?"
- AI Image Trends
In this episode, Jake and Michael discuss Michael's new recording gear, building integrations with external APIs using Saloon, and configuring Laravel Horizon.
Lángh Tamás – Nagy Martin – Szenci Attila – Zsák Péter

Timestamps and topics:
[00:00:01] - Introduction and meet the participants
[00:01:47] - Technical challenges with Z-Wave and Zigbee systems
[00:09:00] - The problems of SDK and firmware development
[00:12:07] - The importance of product selection and support
[00:15:48] - Real-world use cases and handling unexpected problems
[00:35:11] - The evolution of smart home technology and how the market is changing
[00:37:32] - Heating systems and troubleshooting
[00:40:27] - Flood mitigation systems and preventive solutions
[00:44:21] - The benefits of automation across different areas of use
[00:50:23] - The importance of diagnostic systems and measurement points
[00:56:51] - The role and potential of AI in smart home systems
[01:01:48] - Using AI for suggestions vs. direct control
[01:05:37] - Closing thoughts on the future of the technology

If you'd like to talk with like-minded smart home enthusiasts and professionals, join our Discord channel or follow my LinkedIn profile. If you want to dive deeper into the world of smart homes, start by taking part in the smart home challenge or become a "Smart Home Adventurer" ("Okosotthon Kalandor") yourself. And if you're ready to become a smart home installer too, book a free 45-minute consultation.
ABOUT JON HYMAN

Jon Hyman is the co-founder and chief technology officer of Braze, the customer engagement platform that delivers messaging experiences across push, email, in-app, and more. He leads the charge for building the platform's technical systems and infrastructure as well as overseeing the company's technical operations and engineering team.

Prior to Braze, Jon served as lead engineer for the Core Technology group at Bridgewater Associates, the world's largest hedge fund. There, he managed a team that maintained 80+ software assets and was responsible for the security and stability of critical trading systems. Jon met cofounder Bill Magnuson during his time at Bridgewater, and together they won the 2011 TechCrunch Disrupt Hackathon. Jon is a recipient of the SmartCEO Executive Management Award in the CIO/CTO Category for New York. Jon holds a B.A. from Harvard University in Computer Science.

ABOUT BRAZE

Braze is the leading customer engagement platform that empowers brands to Be Absolutely Engaging.™ Braze allows any marketer to collect and take action on any amount of data from any source, so they can creatively engage with customers in real time, across channels from one platform. From cross-channel messaging and journey orchestration to AI-powered experimentation and optimization, Braze enables companies to build and maintain absolutely engaging relationships with their customers that foster growth and loyalty. The company has been recognized as a 2024 U.S. News & World Report Best Companies to Work For, 2024 Best Small & Medium Workplaces in Europe by Great Place to Work®, 2024 Fortune Best Workplaces for Women™ by Great Place to Work® and was named a Leader by Gartner® in the 2024 Magic Quadrant™ for Multichannel Marketing Hubs and a Strong Performer in The Forrester Wave™: Email Marketing Service Providers, Q3 2024. Braze is headquartered in New York with 15 offices across North America, Europe, and APAC. Learn more at braze.com.

SHOW NOTES:
What Jon learned from being the only person on call for his company's first four years (2:56)
Knowing when it's time to get help managing your servers, ops, scaling, etc. (5:42)
Establishing areas of product ownership & other scaling lessons from the early days (9:25)
Frameworks for conversations on splitting of products across teams (12:00)
The challenges, complexities & strategies behind assigning ownership in the early days (14:40)
Founding Braze (18:01)
Why Braze? The story & insights behind the original vision for Braze (20:08)
Identifying Braze's product market fit (22:34)
Early-stage PMF challenges faced by Jon & his co-founders (25:40)
Pivoting to focus on enterprise customers (27:48)
"Let's integrate the SDK right now" - founder-led sales ideas to validate your product (29:22)
Behind the decision to hire a chief revenue officer for the first time (34:02)
The evolution of enterprise & its impact on Braze's product offering (36:42)
Growing out of your early-stage failure modes (39:00)
Why it's important to make personnel decisions quickly (41:22)
Setting & maintaining a vision pre IPO vs. post IPO (44:21)
Jon's next leadership evolution & growth areas he is focusing on (49:50)
Rapid fire questions (52:53)

LINKS AND RESOURCES
When We Cease to Understand the World - Benjamín Labatut's fictional examination of the lives of real-life scientists and thinkers whose discoveries resulted in moral consequences beyond their imagining.
At a breakneck pace and with a wealth of disturbing detail, Labatut uses the imaginative resources of fiction to tell the stories of Fritz Haber, Alexander Grothendieck, Werner Heisenberg, and Erwin Schrödinger, the scientists and mathematicians who expanded our notions of the possible.

This episode wouldn't have been possible without the help of our incredible production team:
Patrick Gallagher - Producer & Co-Host
Jerry Li - Co-Host
Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/
Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/
Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/
I'm changing the format of RocketShip! Learn all about why and what changes, the new times and topics we will talk about. This podcast will continue to be primarily about React Native, but also about shipping great apps, using AI, and news for mobile devs!

Also in this episode:
- My latest app projects
- My first experience with the Vercel AI SDK in Expo apps
- Personal branding stories from Simon
- Behind the scenes of a creator
What if insurance data weren't just extracted, but earned? In this episode of the Insurtech Leadership Podcast, host Josh Hollander sits down with Elan Nyer, CEO and Co-Founder of Ownli, to explore how policyholders can take control of their data and become active participants in the insurance value chain.

Elan shares how Ownli enables consumers to share verified, first-party data, such as mileage, location, and vehicle condition, directly with insurers, creating a permissioned, transparent data marketplace. This shift not only improves underwriting and claims accuracy but also strengthens engagement between insurers and policyholders. The conversation also dives into AI-driven verification, flexible integration models, and the larger opportunity to serve untapped, non-telematics users. Elan's vision turns traditional data collection on its head, empowering individuals to own, protect, and profit from their digital footprint.

In This Episode:
[00:59] Elan's background in automotive data and connected mobility
[03:01] The founding insight behind Ownli: data control and transparency
[05:00] Verified use cases: mileage, parking location, and condition check-ins
[09:35] Serving non-telematics users with valuable, self-reported insights
[12:06] Verifying user-submitted data with AI and 11-layer model architecture
[14:00] Ownli's revenue-sharing model where users benefit from their data
[16:30] Three integration paths: SDK, web portal, and standalone app
[22:16] Building a verified "insurance profile" to streamline quoting and renewal
[30:12] Regulatory challenges and advocating for individual data ownership

Notable Quotes:
"If I had verified mileage and condition on my policy, my claim would have been resolved in 24 hours. Instead, it took months." – Elan Nyer
"Your data is an asset—just like money in a bank. We're building the platform to help you own it, protect it, and earn from it." – Elan Nyer

Our Guest:
Elan Nyer is the CEO and Co-Founder of Ownli, a platform that empowers consumers to share verified, first-party insurance data through a consent-first marketplace. Previously, he led teams at Mobileye and Nexar, focusing on automotive safety and connected vehicle intelligence.

Resources:
Elan Nyer – LinkedIn | Ownli
Josh Hollander – LinkedIn | Horton International | Insurtech Leadership Show
Jordan Dea-Mattson, a veteran tech leader, and Jeremy Au discussed how Jordan built developer tools at Apple and went on to lead engineering teams at Adobe and Indeed. They explored how he witnessed Apple's transformation under Steve Jobs, the often unseen dynamics behind major tech layoffs, and what it takes to grow and scale high-performing teams in Southeast Asia. Jordan also shared how he led the rapid expansion of Indeed Singapore, navigated its unexpected closure, and helped his team transition. He also opens up about overcoming personal trauma, leading with integrity, and why real bravery means acting in the face of fear.

1. From curious teen to Apple product manager: Jordan fell in love with computers in middle school, studied computer science, and hustled his way into a job at Apple by fixing bugs and thinking like a product owner.
2. Building early developer tools: He managed key tools like ResEdit and MacsBug, and worked on making Apple software usable in Japanese, Arabic, and Hebrew—shaping his global product thinking.
3. Seeing Apple with and without Jobs: Jordan lived through Apple's lost years and felt the seismic shift when Steve Jobs returned—cutting the product line, raising the bar, and restoring focus.
4. From Apple to Adobe: At Adobe, Jordan worked on Acrobat's SDK, then led a cross-product team to improve interoperability—laying the groundwork for what became the Adobe Creative Suite.
5. Layoffs, politics, and unintended consequences: He was laid off during Adobe's merger with Macromedia, learning firsthand how internal politics often decide who stays and who goes.
6. Helping Adobe's products play nice: His team standardized core components like fonts and color management, turning a "preschool" of incompatible products into a cohesive offering.
7. Building Indeed Singapore from scratch: In 2018, Jordan set up the Indeed product center in Singapore, growing it from 4 to 250 people—emphasizing diversity, speed, and engineering quality.

Watch, listen or read the full insight at https://www.bravesea.com/blog/engineering-soft-landings

Get transcripts, startup resources & community discussions at www.bravesea.com

WhatsApp: https://whatsapp.com/channel/0029VakR55X6BIElUEvkN02e
TikTok: https://www.tiktok.com/@jeremyau
Instagram: https://www.instagram.com/jeremyauz
Twitter: https://twitter.com/jeremyau
LinkedIn: https://www.linkedin.com/company/bravesea

English: Spotify | YouTube | Apple Podcasts
Bahasa Indonesia: Spotify | YouTube | Apple Podcasts
Chinese: Spotify | YouTube | Apple Podcasts
Vietnamese: Spotify | YouTube | Apple Podcasts
You can find the full PREMIUM VIDEO here
Chris Duggan is the Senior Marketing Manager at ChainGPT and former content lead at BNB Chain (Binance). In this episode, recorded live in New York, Chris dives deep into how ChainGPT is building an expansive AI ecosystem for Web3—from smart contract generation and NFT creation to incubating new AI-native projects. We talk about ChainGPT's explosive social growth (1M+ followers), his personal journey through the metaverse, and why co-marketing, creativity, and cultural relevance are everything in Web3 marketing today. He also shares actionable advice for founders, builders, and content creators looking to grow their presence and leverage AI effectively.Key Timestamps[00:00:00] Introduction: Sam introduces the episode and guest Chris Duggan from ChainGPT. [00:01:00] What is ChainGPT: Chris explains how ChainGPT builds AI infrastructure for Web3, including smart contract tools, NFT generators, and launchpads. [00:03:30] Smart Contract Generator & Auditor: How developers can create, audit, and deploy contracts with zero coding knowledge. [00:05:00] Chris's Journey: From writing and the metaverse to Binance and now ChainGPT. [00:08:00] Marketing DNA: Why ChainGPT's branding and visuals are key to its rapid growth. [00:10:00] Differentiators: How ChainGPT goes beyond the usual with tools, launchpads, SDKs, and an upcoming blockchain (AIVM). [00:12:30] Content Strategy: Chris breaks down how to think in benefits (not features), and why fun and variety win.[00:16:00] Meme Culture: How ChainGPT uses humor, memes, and a mascot to go viral.[00:19:00] AI Agents: How ChainGPT is building autonomous agents for market news, trading, and dev tools.[00:24:00] Future of AI x Web3: What's coming with AIVM and the decentralized AI agent economy.[00:28:00] AI Content Fatigue: Why authenticity and human-created content still matter.[00:30:00] What Founders Should Do: Co-marketing, cross-promotion, and collaborating with similar projects.[00:33:00] Final Ask: ChainGPT invites devs, founders, and creatives to try their tools, apply for incubation, and build with them.DisclaimerNothing mentioned in this podcast is investment advice and please do your own research. Finally, it would mean a lot if you can leave a review of this podcast on Apple Podcasts or Spotify and share this podcast with a friend.Connecthttps://www.chaingpt.org/https://www.linkedin.com/company/chaingpt/https://www.linkedin.com/in/christopher-duggan-43865a65/https://x.com/Chain_GPTBe a guest on the podcast or contact us - https://www.web3pod.xyz/
Welcome to your weekly UAS News Update. We have three stories for you this week. First, DJI takes the Department of Defense to court over its "Chinese Military Company" label. Second, the popular DJI Mini 4 Pro gets a huge update enabling third-party app support. And third, drones play a crucial role in rescuing two lost teenagers in Colorado. First up this week, DJI is pushing back legally against the U.S. Department of Defense. On March 14th, DJI filed a motion in federal court challenging the DoD's designation of the company as a "Chinese Military Company," or CMC. This label comes under Section 1260H of the 2021 National Defense Authorization Act, which targets companies supposedly linked to China's military. DJI argues this designation, first applied in October 2022 and reaffirmed this past January, is arbitrary, lacks substantial evidence, and significantly harms its business. The company points to terminated contracts and state-level restrictions in places like Florida and Arkansas that limit or ban the use of their drones by public agencies. DJI is asking the court to declare the DoD's actions unlawful and remove them from the CMC list. They claim the DoD ignored a detailed delisting petition submitted last July and failed to provide public justification for the listing as required by recent amendments. DJI contests the DoD's claims about state ownership, stating that its founder and early investors hold the vast majority of stock and voting rights, with state-owned entities having minimal shares. They also dispute the idea that having National Enterprise Technology Center status links them to the military, noting companies like Volkswagen also hold this civilian-focused status. Next up, there's some exciting news for DJI Mini 4 Pro owners. DJI has released a major update to version 5 of its Mobile SDK (the Software Development Kit). This update now includes support for the Mini 4 Pro, which is a pretty big deal. What this means is that third-party developers can now create apps that work directly with your Mini 4 Pro. We're talking about popular apps like Litchi, DroneDeploy, and Drone Harmony potentially offering features like advanced flight automation, custom waypoint missions, and better mapping capabilities. This really unlocks some pro-level functionality for a drone that weighs under 250 grams. Now, there's one important catch you need to know. To use these third-party apps via the MSDK, you *must* be using the DJI RC-N2 controller – that's the one that uses your smartphone as the screen. Unfortunately, the DJI RC 2 controller, the one with the built-in screen, doesn't have MSDK support enabled for the Mini 4 Pro at this time. DJI hasn't said if or when that might change. This update also added MSDK support for the professional Matrice 4D cinematography drone and the Matrice 4TD industrial drone. Next up this week, a great story showing drones in action saving lives. Two teenagers got lost while hiking near Carpenter Peak in Colorado's Roxborough State Park last Saturday evening. They did the right thing: they called 911 and stayed put. Douglas County Search and Rescue, along with Colorado Parks and Wildlife, responded around 8:30 p.m. As ground crews started hiking in, the DCSAR drone team lead, Darren Keralla, launched a drone. Despite windy conditions, the drone quickly located the teens, who were flashing a light while sheltering under trees. 
Using the drone's GPS data (lat/long), rescuers could pinpoint their exact location, streamlining the effort. It's another fantastic example of how drones are becoming invaluable tools for search and rescue operations.
https://kdvr.com/news/local/2-lost-teenagers-rescued-with-drone-aid-at-roxborough-state-park/
https://dronedj.com/2025/03/21/dji-mini-4-pro-msdk/
https://dronexl.co/2025/03/21/dji-court-chinese-military-company-label/
⬥GUEST⬥Ken Huang, Co-Chair, AI Safety Working Groups at Cloud Security Alliance | On LinkedIn: https://www.linkedin.com/in/kenhuang8/⬥HOST⬥Host: Sean Martin, Co-Founder at ITSPmagazine and Host of Redefining CyberSecurity Podcast | On LinkedIn: https://www.linkedin.com/in/imsmartin/ | Website: https://www.seanmartin.com⬥EPISODE NOTES⬥In this episode of Redefining CyberSecurity, host Sean Martin speaks with Ken Huang, Co-Chair of the Cloud Security Alliance (CSA) AI Working Group and author of several books including Generative AI Security and the upcoming Agentic AI: Theories and Practices. The conversation centers on what agentic AI is, how it is being implemented, and what security, development, and business leaders need to consider as adoption grows.Agentic AI refers to systems that can autonomously plan, execute, and adapt tasks using large language models (LLMs) and integrated tools. Unlike traditional chatbots, agentic systems handle multi-step workflows, delegate tasks to specialized agents, and dynamically respond to inputs using tools like vector databases or APIs. This creates new possibilities for business automation but also introduces complex security and governance challenges.Practical Applications and Emerging Use CasesKen outlines current use cases where agentic AI is being applied: startups using agentic models to support scientific research, enterprise tools like Salesforce's AgentForce automating workflows, and internal chatbots acting as co-workers by tapping into proprietary data. As agentic AI matures, these systems may manage travel bookings, orchestrate ticketing operations, or even assist in robotic engineering—all with minimal human intervention.Implications for Development and Security TeamsDevelopment teams adopting agentic AI frameworks—such as AutoGen or CrewAI—must recognize that most do not come with out-of-the-box security controls. Ken emphasizes the need for SDKs that add authentication, monitoring, and access controls. For IT and security operations, agentic systems challenge traditional boundaries; agents often span across cloud environments, demanding a zero-trust mindset and dynamic policy enforcement.Security leaders are urged to rethink their programs. Agentic systems must be validated for accuracy, reliability, and risk—especially when multiple agents operate together. Threat modeling and continuous risk assessment are no longer optional. Enterprises are encouraged to start small: deploy a single-agent system, understand the workflow, validate security controls, and scale as needed.The Call for Collaboration and Mindset ShiftAgentic AI isn't just a technological shift—it requires a cultural one. Huang recommends cross-functional engagement and alignment with working groups at CSA, OWASP, and other communities to build resilient frameworks and avoid duplicated effort. 
Zero Trust becomes more than an architecture—it becomes a guiding principle for how agentic AI is developed, deployed, and defended.⬥SPONSORS⬥LevelBlue: https://itspm.ag/attcybersecurity-3jdk3ThreatLocker: https://itspm.ag/threatlocker-r974⬥RESOURCES⬥BOOK | Generative AI Security: https://link.springer.com/book/10.1007/978-3-031-54252-7BOOK | Agentic AI: Theories and Practices, to be published in August by Springer: https://link.springer.com/book/9783031900259BOOK | The Handbook of CAIO (with a business focus): https://www.amazon.com/Handbook-Chief-AI-Officers-Revolution/dp/B0DFYNXGMRMore books at Amazon, including books published by Cambridge University Press and John Wiley, etc.: https://www.amazon.com/stores/Ken-Huang/author/B0D3J7L7GNVideo Course Mentioned During this Episode: "Generative AI for Cybersecurity" video course by EC-Council, rated an average of 5 stars by 255 people: https://codered.eccouncil.org/course/generative-ai-for-cybersecurity-course?logged=falsePodcast: The 2025 OWASP Top 10 for LLMs: What's Changed and Why It Matters | A Conversation with Sandy Dunn and Rock Lambros⬥ADDITIONAL INFORMATION⬥✨ More Redefining CyberSecurity Podcast:
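To make the "autonomously plan, execute, and adapt tasks using LLMs and integrated tools" description concrete, here is a minimal single-agent tool loop in Python. This is an illustrative sketch, not code from the episode: the get_ticket_status tool is hypothetical, and only OpenAI's standard chat-completions function calling is assumed.

```python
import json
from openai import OpenAI

client = OpenAI()

def get_ticket_status(ticket_id: str) -> str:
    # Hypothetical stand-in for a real ticketing-system lookup.
    return json.dumps({"ticket_id": ticket_id, "status": "open"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "description": "Look up the status of a support ticket.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Is ticket T-1234 still open?"}]
while True:
    # Plan/act step: the model either answers or asks to call a tool.
    reply = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    ).choices[0].message
    if not reply.tool_calls:
        print(reply.content)
        break
    # Execute the requested tool calls and feed results back (adapt step).
    messages.append(reply)
    for call in reply.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_ticket_status(**args),
        })
```

Everything Ken raises (authentication around the tool, monitoring each call, validating outputs) would have to wrap this loop; none of it comes for free from the framework.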
Did Copilot uninstall from your computer recently? You're not alone. At least Microsoft is working on a fix. Plus, Satya Nadella has created a new Office of Strategy and Transformation to meet the rapidly evolving needs of the AI era. Discord finally has a Social SDK now, Copilot for Gaming is preparing for mobile testing, and a Call of Duty franchise sale brings savings of up to 67 percent off. Lastly, Paul's app pick is a free, open source, third-party File Explorer replacement that is beautiful and highly customizable. And it never badgers you to back up to OneDrive. Windows March security update hilariously removes Copilot app from Windows 11 New Canary build today Release Preview (today): 24H2 ahead of Week D Release Preview: 23H2 and Windows 10 ahead of Week D Dev, Beta, Beta (23H2) - Voice access suggestions, File Explorer fix Paint is getting new Cocreator features New Notepad and Snipping Tool features for all Microsoft Microsoft announces vague transformation that could be important FTC to move forward with Microsoft antitrust probe Microsoft no longer includes power supply with Surface PCs sold in Europe AI/Dev Gemini adds Canvas and Audio Overview features Plus, Gemini is replacing Assistant in Android (and Chromebook) Zoom AI Companion is going agentic Meta claims one billion downloads of Llama AI models Microsoft ships .NET 10 Preview 2 Xbox Microsoft is bringing Copilot to Xbox Xbox Adaptive Joystick is now available for $29.99 Here are the new games heading to Game Pass in second half of March Epic Games and Qualcomm partner on bringing games to WOA Discord has an SDK now Google Play Games for PC is adding native games Tips and Picks Tip of the week: Call of Duty titles are on sale in the Microsoft Store App pick of the week: Files RunAs Radio this week: Managing AI Costs with Sonia Cuff Brown liquor pick of the week: Toki Suntory Whisky Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell Download or subscribe to Windows Weekly at https://twit.tv/shows/windows-weekly Check out Paul's blog at thurrott.com The Windows Weekly theme music is courtesy of Carl Franklin. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit Sponsors: uscloud.com zscaler.com/security 1password.com/windowsweekly
Google's Chirp 3 is coming to Vertex AI, Roblox announces text-to-shape generator Cube 3D, Alphabet spins off laser internet startup Taara. Show Notes
Send Everyday AI and Jordan a text message. OpenAI and Google are essentially asking the federal government to set copyright law aside entirely.
Google's Chirp 3 is coming to Vertex AI, Roblox announces text-to-shape generator Cube 3D, Alphabet spins off laser internet startup Taara. MP3 Please SUBSCRIBE HERE for free or get DTNS Live ad-free. A special thanks to all our supporters–without you, none of this would be possible. If you enjoy what you see you can support the show. Continue reading: "Discord Announces Social SDK For Game Developers – DTH"
The rollercoaster of the AFL Fantasy season is back as we hang on every lockout, decision made and point scored. The Traders unpack round one and look ahead to the fix-up trades for round two. From mid-priced defenders Sam De Koning and Joel Freijah to making sure you've got Xavier Lindsay and Levi Ashcroft as cash cows, plenty is covered in this bumper episode ... also featuring plenty of your questions. Head to fantasy.afl.com.au to pick your AFL Fantasy Classic team and you can set up your AFL Fantasy Draft league today at fantasydraft.afl.com.au. Episode guide 1:20 - The Traders' scores with Calvin happy on top. 3:10 - Who gets the +3s and -3s? 6:00 - Xavier Lindsay gets top Cash Cow of the Year votes. 8:10 - News from round one. 11:00 - Calvin brings back tag watch. 16:45 - What are these first trades about? 22:25 - Mid-priced options to chase. 26:00 - Should we get Sam De Koning? 32:30 - Who should be traded out? 41:40 - Dual-position player possibilities. 43:00 - Early rage trades. 44:00 - Questions from social media - follow @AFLFantasy on X, @aflfantasy on Instagram and like the Official AFL Fantasy Facebook page. 50:30 - Is Guns 'n' Rookies the way to play the Best 18 of the byes? 53:30 - Who is the priority to trade out of Sam Taylor and Harry Perryman? - - - - Find more from Roy, Calvin and Warnie. Head to afl.com.au/fantasy for more content from The Traders. Like AFL Fantasy on Facebook. Follow @AFLFantasy on Instagram. Follow @AFLFantasy on X. See omnystudio.com/listener for privacy information.
It is time for a seasonal update at the intersection of Marketing, Data, Privacy and Technology. As usual, this Newsroom is divided into five blocks: ePrivacy & regulatory updates; MarTech & AdTech; AI, Competition and Digital Markets; PETs and Zero-Party Data; and Future of Media. TL;DL: The use of SDKs for data collection/sharing has been a common factor in various fines and lawsuits on both sides of the pond. The EDPB sparked an important debate on personal data-powered AI in the EU. Texas and California went after Allstate and Honda respectively. La Liga (ES), Netflix (NL), Meta (IR), and others received fines. The FTC put an end to personal data sales by General Motors. The My Health My Data Act (WA) was put to the test. AI “reasoning” models exploded, and then AI Agents followed. Garante (IT) blocked DeepSeek and a class action in Germany could have a major impact across the EU. Australia updated its legal framework. The biggest CDP players dissolved into adjacent markets and Google kept marching towards PET-powered AdTech. All references and links can be found in this episode's blog post.
Join Simtheory: https://simtheory.ai----CHAPTERS:00:00 - Gemini Flash 2.0 Experimental Native Image Generation & Editing27:55 - Thoughts on OpenAI's "New tools for building agents" announcement43:31 - Why is everyone talking about MCP all of a sudden?56:31 - Manus AI: Will Manus Invade the USA and Defeat it With Powerful AGI? (jokes)----Thanks for all of your support and listening!
Send us a textRuchir Punjabi shares how his company ReNRG is building a decentralized physical energy network (DePEN) and the world's first energy-backed stablecoin to revolutionize how electricity is bought, sold, and managed.• Building technology that lets users buy renewable energy from anywhere in the world through blockchain• Creating an IoT gateway compatible with over 40 types of electrical equipment to enable smart electricity management• Addressing equity problems in renewable energy by making solar affordable for small businesses in India and Africa• Developing a crypto SDK that allows developers to build applications on top of electricity infrastructure• Automating electricity payments and management without requiring users to understand Web3 technology• Planning to launch in the next 45-60 days with initial focus on solar energy transactionsThis episode was recorded through a Descript call on February 25, 2025. Read the blog article and show notes here: https://webdrie.net/why-your-next-utility-bill-might-be-paid-by-a-smart-contract-with-ruchir-punjabi/Discover RYO: the Web3 payment solution making crypto simple and secure for everyone. Featuring an expansive ecosystem with LIFE Wallet, Global Mall, and Japan's first licensed Crypto ATM Network, RYO empowers your financial journey. Awarded 'Best Crypto Solution.'
Send Everyday AI and Jordan a text message. Think AI is hitting a wall?
While everyone is now repeating that 2025 is the "Year of the Agent", OpenAI is heads down building towards it. In the first 2 months of the year they released Operator and Deep Research (arguably the most successful agent archetype so far), and today they are bringing a lot of those capabilities to the API:
* Responses API
* Web Search Tool
* Computer Use Tool
* File Search Tool
* A new open source Agents SDK with integrated Observability Tools
We cover all this and more in today's lightning pod on YouTube! More details here:
Responses API
In our Michelle Pokrass episode we talked about the Assistants API needing a redesign. Today OpenAI is launching the Responses API, "a more flexible foundation for developers building agentic applications". It's a superset of the chat completions API, and the suggested starting point for developers working with OpenAI models. One of the big upgrades is the new set of built-in tools for the responses API: Web Search, Computer Use, and Files.
Web Search Tool
We previously had Exa AI on the podcast to talk about web search for AI. OpenAI is also now joining the race; the Web Search API is actually a new "model" that exposes two 4o fine-tunes: gpt-4o-search-preview and gpt-4o-mini-search-preview. These are the same models that power ChatGPT Search, and are priced at $30/1000 queries and $25/1000 queries respectively. The killer feature is inline citations: you not only get a link to a page, but also a deep link to exactly where your query was answered in the result page.
Computer Use Tool
The model that powers Operator, called Computer-Using-Agent (CUA), is also now available in the API. The computer-use-preview model is SOTA on most benchmarks, achieving 38.1% success on OSWorld for full computer use tasks, 58.1% on WebArena, and 87% on WebVoyager for web-based interactions. As you will notice in the docs, `computer-use-preview` is both a model and a tool through which you can specify the environment. Usage is priced at $3/1M input tokens and $12/1M output tokens, and it's currently only available to users in tiers 3-5.
File Search Tool
File Search was also available in the Assistants API, and it's now coming to Responses too. OpenAI is bringing search + RAG all under one umbrella, and we'll definitely see more people trying to find new ways to build all-in-one apps on OpenAI. Usage is priced at $2.50 per thousand queries and file storage at $0.10/GB/day, with the first GB free.
Agents SDK: Swarms++!
https://github.com/openai/openai-agents-python
To bring it all together, after the viral reception to Swarm, OpenAI is releasing an officially supported agents framework (which was previewed at our AI Engineer Summit) with 4 core pieces:
* Agents: Easily configurable LLMs with clear instructions and built-in tools.
* Handoffs: Intelligently transfer control between agents.
* Guardrails: Configurable safety checks for input and output validation.
* Tracing & Observability: Visualize agent execution traces to debug and optimize performance.
Multi-agent workflows are here to stay! OpenAI now explicitly designs for a set of common agentic patterns: Workflows, Handoffs, Agents-as-Tools, LLM-as-a-Judge, Parallelization, and Guardrails. 
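As a rough sketch of what the new surface looks like from Python (based on the launch docs; tool and model names like web_search_preview are launch-day values and may change while these are in preview):

```python
from openai import OpenAI

client = OpenAI()

# One Responses API call with the built-in web search tool enabled.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What did OpenAI ship for developers this week?",
)

# output_text is the SDK's convenience accessor for the final text output.
print(response.output_text)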
OpenAI previewed this in part 2 of their talk at NYC: Further coverage of the launch from Kevin Weil, WSJ, and OpenAIDevs, AMA here.
Show Notes
* Assistants API
* Swarm (OpenAI)
* Fine-Tuning in AI
* 2024 OpenAI DevDay Recap with Romain
* Michelle Pokrass episode (API lead)
Timestamps
* 00:00 Intros
* 02:31 Responses API
* 08:34 Web Search API
* 17:14 File Search API
* 18:46 Files API vs RAG
* 20:06 Computer Use / Operator API
* 22:30 Agents SDK
And of course you can catch up with the full livestream here:
Transcript
Alessio [00:00:03]: Hey, everyone. Welcome back to another Latent Space Lightning episode. This is Alessio, partner and CTO at Decibel, and I'm joined by Swyx, founder of Smol AI.
swyx [00:00:11]: Hi, and today we have a super special episode because we're talking with our old friend Romain. Hi, welcome.
Romain [00:00:19]: Thank you. Thank you for having me.
swyx [00:00:20]: And Nikunj, who is most famously, if anyone has ever tried to get any access to anything on the API, Nikunj is the guy. So I know your emails because I look forward to them.
Nikunj [00:00:30]: Yeah, nice to meet all of you.
swyx [00:00:32]: I think that we're basically convening today to talk about the new API. So perhaps you guys want to just kick off. What is OpenAI launching today?
Nikunj [00:00:40]: Yeah, so I can kick it off. We're launching a bunch of new things today. We're going to do three new built-in tools. So we're launching the web search tool. This is basically ChatGPT search, but available in the API. We're launching an improved file search tool. So this is you bringing your data to OpenAI. You upload it. We, you know, take care of parsing it, chunking it. We're embedding it, making it searchable, give you this like ready vector store that you can use. So that's the file search tool. And then we're also launching our computer use tool. So this is the tool behind the Operator product in ChatGPT. So that's coming to developers today. And to support all of these tools, we're going to have a new API. So, you know, we launched chat completions, like I think March 2023 or so. It's been a while. So we're looking for an update over here to support all the new things that the models can do. And so we're launching this new API. It is, you know, it works with tools. We think it'll be like a great option for all the future agentic products that we build. And so that is also launching today. Actually, the last thing we're launching is the agents SDK. We launched this thing called Swarm last year where, you know, it was an experimental SDK for people to do multi-agent orchestration and stuff like that. It was supposed to be like educational experimental, but like people, people really loved it. They like ate it up. And so we are like, all right, let's, let's upgrade this thing. Let's give it a new name. And so we're calling it the agents SDK. It's going to have built-in tracing in the OpenAI dashboard. So lots of cool stuff going out. So, yeah.
Romain [00:02:14]: That's a lot, but we said 2025 was the year of agents. So there you have it, like a lot of new tools to build these agents for developers.
swyx [00:02:20]: Okay. I guess, I guess we'll just kind of go one by one and we'll leave the agents SDK towards the end. So responses API, I think the sort of primary concern that people have and something I think I've voiced to you guys when, when, when I was talking with you in the, in the planning process was, is chat completions going away? 
So I just wanted to let you guys respond to the concerns that people might have.
Romain [00:02:41]: Chat completions is definitely like here to stay, you know, it's a bare metal API we've had for quite some time. Lots of tools built around it. So we want to make sure that it's maintained and people can confidently keep on building on it. At the same time, it was kind of optimized for a different world, right? It was optimized for a pre-multi-modality world. We also optimized for kind of single turn: it takes a prompt in, it takes a response out. And now with these agentic workflows, we, we noticed that like developers and companies want to build longer horizon tasks, you know, like things that require multiple turns to get the task accomplished. And computer use is one of those, for instance. And so that's why the responses API came to life to kind of support these new agentic workflows. But chat completions is definitely here to stay.
swyx [00:03:27]: And the Assistants API, uh, has a target sunset date of first half of 2026. So this is kind of like, in my mind, there was a kind of very poetic mirroring of the API with the models. I kind of view this as like kind of the merging of the Assistants API and chat completions, right, into one unified responses. So it's kind of like how GPT and the o-series models are also unifying.
Romain [00:03:48]: Yeah, that's exactly the right, uh, that's the right framing, right? Like, I think we took the best of what we learned from the Assistants API, especially like being able to access tools very, uh, very like conveniently, but at the same time, like simplifying the way you have to integrate, like, you no longer have to think about six different objects to kind of get access to these tools with the responses API. You just get one API request and suddenly you can weave in those tools, right?
Nikunj [00:04:12]: Yeah, absolutely. And I think we're going to make it really easy and straightforward for Assistants API users to migrate over to the responses API without any loss of functionality or data. So our plan is absolutely to add, you know, assistant-like objects and thread-like objects that work really well with the responses API. We'll also add like the code interpreter tool, which is not launching today, but it'll come soon. And, uh, we'll add async mode to the responses API, because that's another difference with, with, uh, Assistants. It will have webhooks and stuff like that, but I think it's going to be like a pretty smooth transition, uh, once we have all of that in place. And we'll give like a full year to migrate and, and help them through any issues they, they face. So overall, I feel like Assistants users are really going to benefit from this longer term, uh, with this more flexible primitive.
Alessio [00:05:01]: How should people think about when to use each type of API? So I know that in the past, the Assistants API was maybe more stateful, kind of like long-running, many-tool-use, kind of like file-based things. And chat completions is more stateless, you know, kind of like a traditional completion API. Is that still the mental model that people should have? Or like, should you...
Nikunj [00:05:20]: So the responses API is, at launch, going to support everything that chat completions supports, and then over time, it's going to support everything that Assistants supports. So it's going to be a pretty good fit for anyone starting out with OpenAI. 
Uh, they should be able to like go to responses. Responses, by the way, also has a stateless mode, so you can pass in store: false and that'll make the whole API stateless, just like chat completions. We're really trying to like get this unification story in so that people don't have to juggle multiple endpoints. That being said, chat completions is just like the most widely adopted API, it's, it's so popular. So we're still going to like support it for years with like new models and features. But if you're a new user, or if you're an existing user who wants to tap into some of these like built-in tools or something, you should feel totally fine migrating to responses, and you'll have more capabilities and performance than chat completions.
swyx [00:06:16]: I think the messaging that, I agree, resonated the most when I talked to you was that it is a strict superset, right? Like you should be able to do everything that you could do in chat completions and with assistants. And the thing that, I just assumed that because you're now, you know, by default stateful, you're actually storing the chat logs or the chat state, I thought you'd be charging me for it. So, you know, to me, it was very surprising that you figured out how to make it free.
Nikunj [00:06:43]: Yeah, it's free. We store your state for 30 days. You can turn it off. But yeah, it's, it's free. And the interesting thing on state is that it just, like, particularly for me, it makes like debugging things and building things so much simpler, where I can like create a responses object that's like pretty complicated and part of this more complex application that I've built, and I can just go into my dashboard and see exactly what happened: what messed up my prompt, that it like did not call one of these tools, that it misconfigured one of the tools. Like the visual observability of everything that you're doing is so, so helpful. So I'm excited, like, about people trying that out and getting benefits from it, too.
swyx [00:07:19]: Yeah, it's, it's really, I think, a really nice to have. But all I'll say is that my friend Corey Quinn says that anything that can be used as a database will be used as a database. So be prepared for some abuse.
Romain [00:07:34]: All right. Yeah, that's a good one. Some of that I've tried with the metadata. Some people are very, very creative at stuffing data into an object. Yeah.
Nikunj [00:07:44]: And we do have metadata with responses. Exactly. Yeah.
Alessio [00:07:48]: Let's get through all of these. So web search. I think when I first saw web search, I thought you were going to just expose an API that then returns kind of like a nice list of things. But the way it's named is like gpt-4o-search-preview. So I'm guessing you're using basically the same model that is in ChatGPT search, which is fine-tuned for search. I'm guessing it's a different model than the base one, and the jump in performance is impressive. So just to give an example, in SimpleQA, GPT-4o is 38% accuracy; GPT-4o search is 90%. But we always talk about how models are not everything you need; the tools around them are just as important. So, yeah, maybe give people a quick review on like the work that went into making this special.
Nikunj [00:08:29]: Should I take that?
Alessio [00:08:29]: Yeah, go for it.
Nikunj [00:08:30]: So firstly, we're launching web search in two ways. One in responses API, which is our API for tools. 
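A small sketch of the stateful/stateless split Nikunj describes, assuming the launch-day Python SDK (store and previous_response_id as documented; treat the details as provisional):

```python
from openai import OpenAI

client = OpenAI()

# Default is stateful: the response is stored (for 30 days, free) and can
# be chained from, so you don't resend the whole conversation history.
first = client.responses.create(
    model="gpt-4o",
    input="Summarize the Responses API in one sentence.",
)
second = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="Now say it in five words.",
)
print(second.output_text)

# Opting out: store=False makes the call stateless, like chat completions.
stateless = client.responses.create(
    model="gpt-4o",
    input="Nothing is kept for this one.",
    store=False,
)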
It's going to be available as a web search tool itself. So you'll be able to go tools, turn on web search, and you're ready to go. We still wanted to give chat completions people access to real-time information. So in the chat completions API, which does not support built-in tools, we're launching direct access to the fine-tuned model that ChatGPT search uses, and we call it gpt-4o-search-preview. And how is this model built? Basically, our search research team has been working on this for a while. Their main goal is to, like, get information, like get a bunch of information from all of our data sources that we use to gather information for search, and then pick the right things and then cite them as accurately as possible. And that's what the search team has really focused on. They've done some pretty cool stuff. They use like synthetic data techniques. They've done like o-series model distillation to, like, make these 4o fine-tunes really good. But yeah, the main thing is, like, can it remain factual? Can it answer questions based on what it retrieves and cite accurately? And that's what this like fine-tuned model really excels at. And so, yeah, so we're excited that, like, it's going to be directly available in chat completions along with being available as a tool. Yeah.
Alessio [00:09:49]: Just to clarify, if I'm using the responses API, this is a tool. But if I'm using chat completions, I have to switch model. I cannot use o1 and call search as a tool. Yeah, that's right. Exactly.
Romain [00:09:58]: I think what's really compelling, at least for me and my own uses of it so far, is that when you use, like, web search as a tool, it combines nicely with every other tool and every other feature of the platform. So think about this for a second. For instance, imagine you have, like, a responses API call with the web search tool, but suddenly you turn on function calling. You also turn on, let's say, structured outputs. So you can have, like, the ability to structure any data from the web in real time in the JSON schema that you need for your application. So it's quite powerful when you start combining those features and tools together. It's kind of like an API for the Internet almost, you know, like you get, like, access to the precise schema you need for your app. Yeah.
Alessio [00:10:39]: And then just to wrap up on the infrastructure side of it, I read in the post that publishers can choose to appear in web search. So are people in it by default? Like, how can we get Latent Space in the web search API?
Nikunj [00:10:53]: Yeah. Yeah. I think we have some documentation around how website publishers can control, like, what shows up in the web search tool. And I think you should be able to, like, read that. I think we should be able to get Latent Space in for sure. Yeah.
swyx [00:11:10]: You know, I think so. I compare this to a broader trend that I started covering last year of online LLMs. Actually, Perplexity, I think, was the first to offer an API that is connected to search, and then Gemini had the sort of search grounding API. And I think you guys, I actually didn't, I missed this in the original reading of the docs, but you even give like citations with like the exact sub-paragraph that is matching, which I think is the standard nowadays. I think my question is, how do we think about what a knowledge cutoff is for something like this, right? 
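For the chat completions route described above, search is exposed as a model name rather than a tool. A hedged sketch assuming the documented gpt-4o-search-preview model and its web_search_options parameter:

```python
from openai import OpenAI

client = OpenAI()

# Chat completions has no built-in tools, so real-time search comes via a
# dedicated fine-tuned model instead of a tool flag.
completion = client.chat.completions.create(
    model="gpt-4o-search-preview",
    web_search_options={},  # defaults; can also carry user-location hints
    messages=[{"role": "user", "content": "What AI developer tools launched today?"}],
)
print(completion.choices[0].message.content)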
Because like now, basically there's no knowledge cutoff, it's always live, but then there's a difference between what the model has sort of internalized in its backpropagation and what it's searching up in its RAG.
Romain [00:11:53]: I think it kind of depends on the use case, right? And what you want to showcase as the source. Like, for instance, you take a company like Hebbia that has used this like web search tool. They can combine, like for credit firms or law firms, they can find like, you know, public information from the internet with the live sources and citation that sometimes you do want to have access to, as opposed to like the internal knowledge. But if you're building something different, well, like, you just want to have the information. If you want to have an assistant that relies on the deep knowledge that the model has, you may not need to have these like direct citations. So I think it kind of depends on the use case a little bit, but there are many, uh, many companies like Hebbia that will need that access to these citations to precisely know where the information comes from.
swyx [00:12:34]: Yeah, yeah, uh, for sure. And then one thing on the, on like the breadth, you know, I think a lot of the deep research, open deep research implementations have this sort of hyperparameter about, you know, how deep they're searching and how wide they're searching. I don't see that in the docs. But is that something that we can tune? Is that something you recommend thinking about?
Nikunj [00:12:53]: Super interesting. It's definitely not a parameter today, but we should explore that. It's very interesting. I imagine like how you would do it with the web search tool and responses API is you would have some form of like, you know, agent orchestration over here where you have a planning step and then each like web search call that you do like explicitly goes a layer deeper and deeper and deeper. But it's not a parameter that's available out of the box. But it's a cool, it's a cool thing to think about. Yeah.
swyx [00:13:19]: The only guidance I'll offer there is a lot of these implementations offer top K, which is like, you know, top 10, top 20, but actually you don't really want that. You want like sort of some kind of similarity cutoff, right? Like some matching-score cutoff, because if there's only five things, five documents that match, fine; if there's 500 that match, maybe that's what I want. Right. Yeah. But also that might, that might make my costs very unpredictable, because the costs are something like $30 per thousand queries, right? So yeah. Yeah.
Nikunj [00:13:49]: I guess you could, you could have some form of like a context budget and then you're like, go as deep as you can and pick the best stuff and put it into like X number of tokens. There could be some creative ways of, of managing cost, but yeah, that's a super interesting thing to explore.
Alessio [00:14:05]: Do you see people using the files and the search API together, where you can kind of search and then store everything in the file so the next time I'm not paying for the search again? And like, yeah, how should people balance that?
Nikunj [00:14:17]: That's actually a very interesting question. 
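On the inline citations: at launch the web search tool attaches url_citation annotations to the output text, each with a URL, title, and the character span it supports. A sketch of reading them (field names per the launch docs, so treat this as illustrative):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="Who won the most recent Formula 1 race? Cite sources.",
)

# Walk output items and pull url_citation annotations: each carries the
# deep link plus the start/end indices of the answer text it supports.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            for ann in getattr(part, "annotations", []) or []:
                if ann.type == "url_citation":
                    print(ann.title, ann.url, ann.start_index, ann.end_index)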
Let me first tell you about a really cool way I've seen people use files and search together: they put their user preferences or memories in the vector store, and so a query comes in, you use the file search tool to like get someone's like reading preferences or like fashion preferences and stuff like that, and then you search the web for information or products that they can buy related to those preferences, and you then render something beautiful to show them, like, here are five things that you might be interested in. So that's how I've seen like file search, web search work together. And by the way, that's like a single responses API call, which is really cool. So you just like configure these things, go boom, and like everything just happens. But yeah, that's how I've seen like files and web work together.
Romain [00:15:01]: But I think that what you're pointing out is like interesting, and I'm sure developers will surprise us as they always do in terms of how they combine these tools and how they might use file search as a way to have memory and preferences, like Nikunj says. But I think like zooming out, what I find very compelling and powerful here is like when you have these like neural networks that have like all of the knowledge that they have today, plus real-time access to the Internet for like any kind of real-time information that you might need for your app, and file search, where you can have a lot of company, private documents, private details, you combine those three, and you have like very, very compelling and precise answers for any kind of use case that your company or your product might want to enable.
swyx [00:15:41]: It's a difference between sort of internal documents versus the open web, right? Like you're going to need both. Exactly, exactly. I never thought about it doing memory as well. I guess, again, you know, anything that's a database, you can store it and you will use it as a database. That sounds awesome. But I think also you've been, you know, expanding the file search. You have more file types. You have query optimization, custom re-ranking. So it really seems like, you know, it's been fleshed out. Obviously, I haven't been paying a ton of attention to the file search capability, but it sounds like your team has added a lot of features.
Nikunj [00:16:14]: Yeah, metadata filtering was like the main thing people were asking us for for a while. And I'm super excited about it. I mean, it's just so critical once your, like, vector store size goes over, you know, more than like, you know, 5,000, 10,000 records, you kind of need that. So, yeah, metadata filtering is coming, too.
Romain [00:16:31]: And for most companies, it's also not like a competency that you want to rebuild in-house necessarily, you know, like, you know, thinking about embeddings and chunking and, you know, all of that. Like, it sounds very complex for something very, like, obvious to ship for your users. Companies like Navan, for instance, were able to build with file search: like, you know, take all of the FAQ and travel policies, for instance, that you have, you put that in the file search tool, and then you don't have to think about anything. Now your assistant becomes naturally much more aware of all of these policies from the files.
swyx [00:17:03]: The question is, like, there's a very, very vibrant RAG industry already, as you well know. So there's many other vector databases, many other frameworks. 
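The preferences-plus-web pattern Nikunj just described maps to one Responses API call with both tools turned on. A sketch, where vs_user_prefs is a placeholder vector store ID, not a real one:

```python
from openai import OpenAI

client = OpenAI()

# Private data (file search over a vector store) plus live web results,
# combined in a single call. "vs_user_prefs" is a placeholder ID.
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {"type": "file_search", "vector_store_ids": ["vs_user_prefs"]},
        {"type": "web_search_preview"},
    ],
    input="Using my stored reading preferences, suggest five new books.",
)
print(response.output_text)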
Probably if it's an open source stack, I would say like a lot of the AI engineers that I talk to want to own this part of the stack. And it feels like, you know, like, when should we DIY and when should we just use whatever OpenAI offers? Nikunj [00:17:24]: Yeah. I mean, like, if you're doing something completely from scratch, you're going to have more control, right? Like, so super supportive of, you know, people trying to, like, roll up their sleeves, build their, like, super custom chunking strategy and super custom retrieval strategy and all of that. And those are things that, like, will be harder to do with OpenAI tools. With OpenAI tools, like, we have an out-of-the-box solution. We give you the tools. We give you some knobs to customize things, but it's more of, like, a managed RAG service. So my recommendation would be, like, start with the OpenAI thing, see if it, like, meets your needs. And over time, we're going to be adding more and more knobs to make it even more customizable. But, you know, if you want, like, the completely custom thing, you want control over every single thing, then you'd probably want to go and hand-roll it using other solutions. So we're supportive of both, like, engineers should pick. Yeah. Alessio [00:18:16]: And then we got computer use, which I think Operator was obviously one of the hot releases of the year. And we're only two months in. Let's talk about that. And that's also, it seems like a separate model that has been fine-tuned for Operator that has browser access. Nikunj [00:18:31]: Yeah, absolutely. I mean, the computer use models are exciting. The cool thing about computer use is that we're just so, so early. It's like the GPT-2 of computer use, or maybe the GPT-1 of computer use, right now. But it is a separate model that, you know, the computer use team has been working on: you send it screenshots and it tells you what action to take. So the outputs of it are almost always tool calls, and you're inputting screenshots based on whatever computer you're trying to operate. Romain [00:19:01]: Maybe zooming out for a second, because like, I'm sure your audience is like super, super like AI native, obviously. But like, what is computer use as a tool, right? And what's Operator? So the idea for computer use is like, how do we let developers also build agents that can complete tasks for the users, but using a computer, or a browser, instead? And so how do you get that done? And so that's why we have this custom model, like optimized for computer use, that we use for Operator ourselves. But the idea behind like putting it as an API is that imagine like now you want to automate some tasks for your product or your own customers. Then now you can have like the ability to spin up one of these agents that will look at the screen and act on the screen. So that means the ability to click, the ability to scroll, the ability to type, and to report back on the action. So that's what we mean by computer use and wrapping it as a tool also in the Responses API. So now like that gives a hint also at the multi-turn thing that we were hinting at earlier, the idea that like, yeah, maybe one of these actions can take a couple of minutes to complete, because there's maybe like 20 steps to complete that task. But now you can. swyx [00:20:08]: Do you think computer use can play Pokemon? Romain [00:20:11]: Oh, interesting. I guess we tried it. I guess we should try it. You know? swyx [00:20:17]: Yeah. There's a lot of interest.
I think Pokemon really is a good agent benchmark, to be honest. Like it seems like Claude is, Claude is running into a lot of trouble. Romain [00:20:25]: Sounds like we should make that a new eval, it looks like. swyx [00:20:28]: Yeah. Yeah. Oh, and then one more, one more thing before we move on to the Agents SDK. I know you have a hard stop. There's all these, you know, blah-blah-dash-preview models, right? Like search preview, computer use preview, right? And you see them all like fine-tunes of 4o. I think the question is, are they all going to be merged into the main branch, or are we basically always going to have subsets of these models? Nikunj [00:20:49]: Yeah, I think in the early days, research teams at OpenAI like operate with like fine-tuned models. And then once the thing gets like more stable, we sort of merge it into the main line. So that's definitely the vision: like going out of preview as we get more comfortable with and learn about all the developer use cases, and we're doing a good job at them, we'll sort of like make them part of like the core models so that you don't have to like deal with the bifurcation. Romain [00:21:12]: You should think of it this way, as exactly what happened last year when we introduced vision capabilities, you know. Yes. Vision capabilities were in like a vision preview model based off of GPT-4, and then vision capabilities now are like obviously built into GPT-4o. You can think about it the same way for like the other modalities, like audio, and those kind of like models, like optimized for search and computer use. swyx [00:21:34]: Agents SDK, we have a few minutes left. So let's just assume that everyone has looked at Swarm. Sure. I think that Swarm has really popularized the handoff technique, which I thought was like, you know, really, really interesting for sort of a multi-agent. What is new with the SDK? Nikunj [00:21:50]: Yeah. Do you want to start? Yeah, for sure. So we've basically added support for types. We've added support for guardrails, which is a very common pattern. So in the guardrail example, you basically have two things happen in parallel. The guardrail can sort of block the execution. It's a type of like optimistic generation that happens. And I think we've added support for tracing. So I think that's really cool. So you can basically look at the traces that the Agents SDK creates in the OpenAI dashboard. We also like made this pretty flexible. So you can pick any API from any provider that supports the ChatCompletions API format. So it supports Responses by default, but you can like easily plug it into anyone that uses the ChatCompletions API. And similarly, on the tracing side, you can support like multiple tracing providers. By default, it sort of points to the OpenAI dashboard. But, you know, there's like so many tracing providers. There's so many tracing companies out there. And we'll announce some partnerships on that front, too. So just like, you know, adding lots of core features and making it more usable, but still centered around like handoffs as like the main, main concept. Romain [00:22:59]: And by the way, it's interesting, right? Because Swarm just came to life out of like learning from customers directly that like orchestrating agents in production was pretty hard. You know, simple ideas could quickly turn very complex. Like what are those guardrails? What are those handoffs, et cetera? So that came out of like learning from customers.
And it was initially shipped as, like, a low-key experiment, I'd say. But we were kind of like taken by surprise at how much momentum there was around this concept. And so we decided to learn from that and embrace it. To be like, okay, maybe we should just embrace that as a core primitive of the OpenAI platform. And that's kind of what led to the Agents SDK. And I think now, as Nikunj mentioned, it's like adding all of these new capabilities to it, like leveraging the handoffs that we had, but tracing also. And I think what's very compelling for developers is like instead of having one agent to rule them all, where you stuff like a lot of tool calls in there that can be hard to monitor, now you have the tools you need to kind of like separate the logic, right? And you can have a triage agent that based on an intent goes to different kinds of agents. And then on the OpenAI dashboard, we're releasing a lot of new user interfaces and logs as well. So you can see all of the tracing UIs. Essentially, you'll be able to troubleshoot what exactly happened in that workflow: when the triage agent did a handoff to a secondary agent, and the third, and see the tool calls, et cetera. So we think that the Agents SDK combined with the tracing UIs will definitely help users and developers build better agentic workflows. Alessio [00:24:28]: And just before we wrap, are you thinking of connecting this with also the RFT API? Because I know you already have, you kind of store my text completions and then I can do fine-tuning of that. Is that going to be similar for agents, where you're storing kind of like my traces and then help me improve the agents? Nikunj [00:24:43]: Yeah, absolutely. Like you got to tie the traces to the evals product so that you can generate good evals. Once you have good evals and graders and tasks, you can use that to do reinforcement fine-tuning. And, you know, lots of details to be figured out over here. But that's the vision. And I think we're going to go after it like pretty hard and hope we can like make this whole workflow a lot easier for developers. Alessio [00:25:05]: Awesome. Thank you so much for the time. I'm sure you'll be busy on Twitter tomorrow with all the developer feedback. Yeah. Romain [00:25:12]: Thank you so much for having us. And as always, we can't wait to see what developers will build with these tools and how we can like learn as quickly as we can from them to make them even better over time. Nikunj [00:25:21]: Yeah. Romain [00:25:22]: Thank you, guys. Nikunj [00:25:23]: Thank you. Romain [00:25:23]: Thank you both. Awesome. Get full access to Latent.Space at www.latent.space/subscribe
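For the triage-and-handoff pattern described in this episode, here is a minimal sketch using the openai-agents Python package; the agent names and instructions are illustrative, and traces land in the OpenAI dashboard by default:

```python
from agents import Agent, Runner

# Hypothetical specialists; in a real app each would carry its own tools.
billing = Agent(name="Billing agent", instructions="Handle billing and refund questions.")
support = Agent(name="Support agent", instructions="Handle product and technical questions.")

# The triage agent routes each request to a specialist via a handoff.
triage = Agent(
    name="Triage agent",
    instructions="Read the user's message and hand off to the right specialist.",
    handoffs=[billing, support],
)

result = Runner.run_sync(triage, "I was charged twice this month.")
print(result.final_output)
```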
This week's EYE ON NPI is a NAND in the HAND: it's ISSI Serial NAND Flash chips (https://www.digikey.com/en/product-highlight/i/issi/serial-nand-flash), available in a variety of sizes and footprints. These are great options for folks that need more data storage on their PCBs, but don't necessarily want an SD card. DigiKey has a selection of 1Gbit and 2Gbit chips, so you have tons of storage for data logs, images, recordings, or even filesystems. And the price is great: you'll pay much less per byte when buying NAND flash. There's plenty of times you'll need to access non-volatile memory on your microcontroller: graphics or audio files for a user interface, maps or almanac data for telemetry, sensor or usage logs, interpreted code scripts, firmware updates, security certificates, etc. These files are too big to be stored in simple EEPROM chips (https://www.digikey.com/short/0rf9t7qb) that max out at a few kB. The next step up is to use NOR Flash (https://www.digikey.com/short/jfp3bvph) - you can get up to 256 Megabytes in size! (https://www.digikey.com/en/products/detail/issi-integrated-silicon-solution-inc/IS25LP02GJ-RHLE/24617385) Compared to EEPROM, which comes in 1-Wire, I2C or SPI, you definitely have to use an SPI interface for NOR Flash. It's also possible on many chips to have 4-bit-at-a-time QSPI or even 8-bit OSPI interfacing for fast reads. And that's the thing that's really nice about NOR: instant reads of any byte anywhere in memory, just like EEPROM. Unlike EEPROM, you can't write just one byte at a time anywhere in the storage: you have to write a 'page' and erase a 'sector' at a time - each page tends to be about 256 bytes, a sector is often 4KB. That means if you want to update a file, you'll need to read the whole 4K sector into a memory cache, change the bytes you want to, then erase and re-write the sector out. The good news though is once you write out a page, you can pretty much assume it will stay for many years: there's rarely corrupted data in NOR flash. And, although erasing and writing is a bit of a pain, the instant access means NOR is great for 'XIP' (execute-in-place) or other dynamic memory access. If NOR is so great, why bother with NAND? One reason is cost: a 2MB NOR chip isn't too bad, about 45 cents in quantity (https://www.digikey.com/short/zff49fb7), but once you get to the biggest 256 MB ones (https://www.digikey.com/short/vffmp583) the pricing gets high pretty quickly: $15 in tray quantities. Considering you can get a 64GB SD card for that price, NOR isn't very cost effective. Second is sizing: if you want 1GB for large files, it just isn't available. For that kind of density you need NAND flash. NAND flash is the kind of flash you get when you buy a USB key or microSD card, although those have USB or SDIO interface chips (https://www.bunniestudios.com/blog/2013/where-usb-memory-sticks-are-born/) that are wire-bonded to the NAND flash chips. You get a lot more for the price: instead of $15 for 256MB of NOR, it's $3 (https://www.digikey.com/short/r77p0922). You also don't need more pins! We always thought that NAND flash required a lot of pins since it comes in 48-TSSOP (https://www.digikey.com/short/8zqbmw31), but it turns out that you can get it in a QSPI 8-pin format. That makes it easy to integrate without needing an 8-bit-wide memory controller. However, the architectural decisions that give ISSI NAND (https://www.digikey.com/short/jtp8ppdb) the massive size & low cost that we love also make it more complex to use than NOR flash.
For one, you can no longer get random access to any byte you like. Instead, an entire page must be read at once into a 2176-byte cache, and can then be accessed. This is fine for most uses, except we can't use XIP anymore, and there are probably some memory-access use cases that don't work nearly as nicely. Also, that high density means that bits are more likely to go 'bad' and flip. While you can sorta-kinda get away with not doing error correction or wear leveling on NOR, you absolutely must do error correction and wear leveling on NAND! ISSI includes a simple multi-bit ECC system that can handle repairing up to 8 bits per 2176-byte page. And every time there are ECC errors, you will need to 'refresh'/rewrite the data to clean it up. That refresh counts against the 60K or 100K write-cycle rating - you are more likely to need wear-level management, even if you don't expect to write that often. Basically, check if your microcontroller SDK has a NAND controller library (https://github.com/D-Buckingham/NAND_flash) that can manage this all for you. So, if you need to level up your storage, with an easy-to-use SPI or QSPI interface, ISSI has many NAND (https://www.digikey.com/en/product-highlight/i/issi/serial-nand-flash) options to let you quickly and inexpensively add 1 or 2 gigabits of non-volatile memory with built-in ECC support and block cache. DigiKey will be stocking them shortly; sign up (https://www.digikey.com/short/jtp8ppdb) to be notified when they drop into stock mid-next month!
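To make the page-read-into-cache flow concrete, here's a rough CircuitPython-style sketch. The opcodes (0x13 page read to cache, 0x0F get feature, 0x03 read from cache) follow the de-facto serial NAND command set, but the pin choices, status-register layout, and ECC bit positions are assumptions; verify everything against the ISSI datasheet before using it:

```python
import board
import busio
import digitalio

spi = busio.SPI(board.SCK, MOSI=board.MOSI, MISO=board.MISO)
cs = digitalio.DigitalInOut(board.D5)  # chip-select pin is an assumption
cs.switch_to_output(value=True)

PAGE_SIZE = 2176  # 2048 data bytes + 128 spare, as discussed above

def xfer(cmd, read_len=0):
    """One CS-framed SPI transaction: write cmd bytes, then read read_len bytes."""
    while not spi.try_lock():
        pass
    try:
        cs.value = False
        spi.write(bytes(cmd))
        buf = bytearray(read_len)
        if read_len:
            spi.readinto(buf)
        cs.value = True
    finally:
        spi.unlock()
    return buf

def read_page(row):
    # PAGE READ (0x13): copy one page from the NAND array into the on-chip cache.
    xfer([0x13, (row >> 16) & 0xFF, (row >> 8) & 0xFF, row & 0xFF])
    # Poll GET FEATURE (0x0F) on the status register (0xC0) until the busy bit clears.
    while xfer([0x0F, 0xC0], 1)[0] & 0x01:
        pass
    ecc = (xfer([0x0F, 0xC0], 1)[0] >> 4) & 0x03  # ECC result field; layout varies by part
    if ecc == 0b10:
        raise RuntimeError("uncorrectable ECC error in this page")
    if ecc == 0b01:
        print("bits were corrected; schedule a refresh/rewrite of this block")
    # READ FROM CACHE (0x03): 2-byte column address plus a dummy byte, then data out.
    return xfer([0x03, 0x00, 0x00, 0x00], PAGE_SIZE)
```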
In a series of posts from February 8th to 11th, open source developer Matthieu Bucchianeri started making allegations that Meta's OVRPlugin is blocking non-Meta OpenXR runtimes, thereby undermining the open and interoperable spirit of OpenXR. He claims that "the OVRPlugin takes intentional precautions to exclude non-Meta platforms. This means that XR content developed with OVRPlugin will only work with Quest Link, and it will not work with any other runtime." Bucchianeri's allegation is that if a PCVR game includes Meta's OVRPlugin as its OpenXR middleware and doesn't implement the counter-blocking measures that he details, then PCVR users will only be able to use a Quest headset with Quest Link, while other non-Meta headsets like "Pimax, Pico, Varjo, Vive" will be blocked, even if they have conformant OpenXR runtimes. Bucchianeri has validated these counter-blocking measures, and he says, "as proven with many applications using OVRPlugin with counter-measures enabled, these applications can run on a conformant OpenXR implementation." Not very many XR developers are willing to speak about these issues on the record, but I did manage to record a Voices of VR podcast interview with Virtual Desktop's Guy Godin, who was able to independently corroborate many of the core allegations from Bucchianeri. Godin collaborated with Bucchianeri on the Virtual Desktop OpenXR (VDXR) runtime, but has also been receiving many complaints from PCVR users around Meta's OpenXR non-compliance issues, especially with games that launch both on Quest and PCVR and use Meta's OVRPlugin. Godin told me that games will work fine natively on the Quest, but non-Quest headsets, or Quest headsets not using Quest Link, will be blocked on the PCVR version, unless the developer specifically implements the anti-blocking counter-measures detailed by Bucchianeri in his technical write-up. It appears as though OpenXR conformance from the Khronos Group only pertains to the actual OpenXR runtimes on the hardware, but headset manufacturers are able to create their own SDK plug-in middleware that interfaces with OpenXR and doesn't have the same conformance requirements. It appears as though Meta is able to technically maintain their OpenXR runtime's conformant status because conformance does not apply to their OVRPlugin middleware SDK solution, and Bucchianeri is claiming that Meta is undermining the normative standards of interoperability by not following the best practices from the Khronos Group. He says, "For the past several years, Khronos has come up with best practices and solutions to develop OpenXR applications and maximize cross-vendor and cross-platform interoperability. Khronos has asked XR developers all over the world to follow these best practices, however - Meta - the largest vendor in Khronos is refusing to follow these best practices." Bucchianeri is also claiming to have had private communications with Meta confirming that these were deliberate and intentional changes. He says, "This is not an accident: this concern was reported to Meta early in 2024 via official means in the Khronos group. Meta acknowledged purposedly blocking other platforms from running OpenXR content at that time." I was able to confirm in my discussion with Guy Godin in this Voices of VR podcast episode that he also believes that these were deliberate and intentional changes to undermine the spirit of OpenXR. Part of why Bucchianeri was blowing the whistle is because the Khronos Group had not been taking any action against Meta.
He describes Meta's actions as "reverting many of the improvements to the developers and users ecosystem that Khronos has spent time, money, energy into solving for the past 7 years." He says, "Unfortunately, since 2024, Khronos has refused to take actions to stop Meta's OVRPlugin destructive initiative towards the PCVR ecosystem. By not taking any actions to resolve the issues created by Meta's OVRPlugin, Khronos is sending the message that OpenXR is no lo...
Today's episode is with Paul Klein, founder of Browserbase. We talked about building browser infrastructure for AI agents, the future of agent authentication, and their open source framework Stagehand.* [00:00:00] Introductions* [00:04:46] AI-specific challenges in browser infrastructure* [00:07:05] Multimodality in AI-Powered Browsing* [00:12:26] Running headless browsers at scale* [00:18:46] Geolocation when proxying* [00:21:25] CAPTCHAs and Agent Auth* [00:28:21] Building “User take over” functionality* [00:33:43] Stagehand: AI web browsing framework* [00:38:58] OpenAI's Operator and computer use agents* [00:44:44] Surprising use cases of Browserbase* [00:47:18] Future of browser automation and market competition* [00:53:11] Being a solo founderTranscript Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai. swyx [00:00:12]: Hey, and today we are very blessed to have our friend, Paul Klein the Fourth, CEO of Browserbase. Welcome. Paul [00:00:21]: Thanks guys. Yeah, I'm happy to be here. I've been lucky to know both of you for like a couple of years now, I think. So it's just like we're hanging out, you know, with three ginormous microphones in front of our face. It's totally normal hangout. swyx [00:00:34]: Yeah. We've actually mentioned you on the podcast, I think, more often than any other Solaris tenant. Just because like you're one of the, you know, best performing, I think, LLM tool companies that have started up in the last couple of years. Paul [00:00:50]: Yeah, I mean, it's been a whirlwind of a year. Like, Browserbase is actually pretty close to our first birthday, so we are one year old. And going from, you know, starting a company as a solo founder to, you know, having a team of 20 people, you know, a Series A, but also being able to support hundreds of AI companies that are building AI applications that go out and automate the web. It's just been like, really cool. It's been happening a little too fast. I think like collectively as an AI industry, let's just take a week off together. I took my first vacation actually two weeks ago, and Operator came out on the first day, and then a week later, DeepSeek came out. And I'm like on vacation trying to chill. I'm like, we got to build with this stuff, right? So it's been a breakneck year. But I'm super happy to be here and like talk more about all the stuff we're seeing. And I'd love to hear kind of what you guys are excited about too, and share with it, you know? swyx [00:01:39]: Where to start? So people, you've done a bunch of podcasts. I think I strongly recommend Jack Bridger's Scaling DevTools, as well as Turner Novak's The Peel. And, you know, I'm sure there's others. So you covered your Twilio story in the past, talked about StreamClub, you got acquired by Mux, and then you left to start Browserbase. So maybe we just start with what is Browserbase? Yeah. Paul [00:02:02]: Browserbase is the web browser for your AI. We're building headless browser infrastructure, which are browsers that run in a server environment that's accessible to developers via APIs and SDKs. It's really hard to run a web browser in the cloud. You guys are probably running Chrome on your computers, and that's using a lot of resources, right? So if you want to run a web browser or thousands of web browsers, you can't just spin up a bunch of lambdas. You actually need to use a secure containerized environment.
You have to scale it up and down. It's a stateful system. And that infrastructure is, like, super painful. And I know that firsthand, because at my last company, StreamClub, I was CTO, and I was building our own internal headless browser infrastructure. That's actually why we sold the company: because Mux really wanted to buy our headless browser infrastructure that we'd built. And it's just a super hard problem. And I actually told my co-founders, I would never start another company unless it was a browser infrastructure company. And it turns out that's really necessary in the age of AI, when AI can actually go out and interact with websites, click on buttons, fill in forms. You need AI to do all of that work in an actual browser running somewhere on a server. And Browserbase powers that. swyx [00:03:08]: While you're talking about it, it occurred to me, not that you're going to be acquired or anything, but it occurred to me that it would be really funny if you became the Nikita Bier of headless browser companies. You just have one trick, and you make browser companies that get acquired. Paul [00:03:23]: I truly do only have one trick. I'm screwed if it's not for headless browsers. I'm not a Go programmer. You know, I'm in AI Grant. You know, Browserbase is in AI Grant. But we were the only company in that AI Grant batch that used zero dollars on AI spend. You know, we're purely an infrastructure company. So as much as people want to ask me about reinforcement learning, I might not be the best guy to talk about that. But if you want to ask about headless browser infrastructure at scale, I can talk your ear off. So that's really my area of expertise. And it's a pretty niche thing. Like, nobody has done what we're doing at scale before. So we're happy to be the experts. swyx [00:03:59]: You do have an AI thing, Stagehand. We can talk about the sort of core of Browserbase first, and then maybe Stagehand. Yeah, Stagehand is kind of the web browsing framework. Yeah. What is Browserbase? Headless Browser Infrastructure Explained Alessio [00:04:10]: Yeah. Yeah. And maybe how you got to Browserbase and what problems you saw. So one of the first things I worked on as a software engineer was integration testing. Sauce Labs was kind of like the main thing at the time. And then we had Selenium, we had Playwright, we had all these different browser things. But it's always been super hard to do. So obviously you've worked on this before. When you started Browserbase, what were the challenges? What were the AI-specific challenges that you saw versus, there's kind of like all the usual running browsers at scale in the cloud, which has been a problem for years. What are like the AI-unique things that you saw that traditional approaches just didn't cover? Yeah. AI-specific challenges in browser infrastructure Paul [00:04:46]: First and foremost, I think back to like the first thing I did as a developer, like as a kid when I was writing code: I wanted to write code that did stuff for me. You know, I wanted to write code to automate my life. And I'd do that probably by using curl or Beautiful Soup to fetch data from a website. And I think I still do that now that I'm in the cloud. And the other thing that I think is a huge challenge for me is that you can't just curl a website and parse that data. And we all know that now like, you know, taking HTML and plugging that into an LLM, you can extract insights, you can summarize.
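That "fetch the page with a real browser, then hand the HTML to a model" flow is easy to sketch with Playwright's sync API; a hydrated page (the kind discussed next) needs a real browser rather than curl. The URL here is a placeholder and the LLM step is left as a stub:

```python
from playwright.sync_api import sync_playwright

def fetch_rendered_html(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # "networkidle" waits for client-side hydration requests to settle,
        # so we capture the post-JavaScript DOM rather than the raw response.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html

html = fetch_rendered_html("https://example.com/listing")
# ...pass `html` (likely trimmed) to an LLM to extract or summarize.
```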
So it was very clear that now like dynamic web scraping became very possible with the rise of large language models, or a lot easier. And that was like a clear reason why there's been more usage of headless browsers, which are necessary because a lot of modern websites don't expose all of their page content via a simple HTTP request. You know, they actually require you to run JavaScript on the page to hydrate it. Airbnb is a great example. You go to airbnb.com; a lot of that content on the page isn't there until after they run the initial hydration. So you can't just scrape it with a curl. You need to have some JavaScript run. And a browser is that JavaScript engine that's going to actually run all those requests on the page. So web data retrieval was definitely one driver of starting Browserbase, and the rise of being able to summarize that with an LLM. Also, I was familiar with, if I wanted to automate a website, I could write one script and that would work for one website. It was very static and deterministic. But the web is non-deterministic. The web is always changing. And until we had LLMs, there was no way to write scripts that you could write once that would run on any website, that would adapt to the structure of the website. Click the login button: it could mean something different on many different websites. And LLMs allow us to generate code on the fly to actually control that. So I think that rise of writing generic automation scripts that can work on many different websites, to me, made it clear that browsers are going to be a lot more useful, because now you can automate a lot more things without writing. If you wanted to write a script to book a demo call on 100 websites, previously, you had to write 100 scripts. Now you write one script that uses LLMs to generate that script. That's why we built our web browsing framework, Stagehand, which does a lot of that work for you. But those two things, web data collection and then enhanced automation of many different websites, just felt like big drivers for more browser infrastructure that would be required to power these kinds of features. Alessio [00:07:05]: And was multimodality also a big thing? Paul [00:07:08]: Now you can use the LLMs to look, even though the text in the DOM might not be as friendly. Maybe my hot take is, I was always kind of like, I didn't think vision would be as big of a driver. For UI automation, I felt like, you know, HTML is structured text, and large language models are good with structured text. But it's clear that these computer use models are often vision-driven, and they've been really pushing things forward. So definitely being multimodal, like rendering the page, is required to take a screenshot to give that to a computer use model to take actions on a website. And it's just another win for browsers. But I'll be honest, that wasn't what I was thinking early on. I didn't even think that we'd get here so fast with multimodality. I think we're going to have to get back to multimodal and vision models. swyx [00:07:50]: This is one of those things where I forgot to mention in my intro that I'm an investor in Browserbase. And I remember that when you pitched to me, like a lot of the stuff that we have today wasn't in the original conversation.
But I did have my original thesis, was something that we've talked about on the podcast before, which is: take the GPT store, the custom GPT store; every single checkbox and plugin is effectively a startup. And this was the browser one. I think the main hesitation, I think I actually took a while to get back to you, the main hesitation was that there were others. Like you're not the first headless browser startup. It's not even your first headless browser startup. There's always a question of like, will you be the category winner in a place where there's a bunch of incumbents, to be honest, that are bigger than you? They're just not targeted at the AI space. They don't have the backing of Nat Friedman. And there's a bunch of like, you're here in Silicon Valley. They're not. I don't know. Paul [00:08:47]: I don't know if that's, that was it, but like, there was a, yeah, I mean, like, I think I tried all the other ones and I was like, really disappointed. Like my background is from working at great developer tools companies, and nothing had like the Vercel-like experience. Um, like our biggest competitor actually is partly owned by private equity, and they just jacked up their prices quite a bit. And the dashboard hasn't changed in five years. And I actually used them at my last company and tried them, and I was like, oh man, like there really just needs to be something that's like the experience of these great infrastructure companies, like Stripe, like Clerk, like Vercel, that I use and love, but oriented towards this kind of like more specific category, which is browser infrastructure, which is really technically complex. Like a lot of stuff can go wrong on the internet when you're running a browser. The internet is very vast. There's a lot of different configurations. Like there's still websites that only work with Internet Explorer out there. How do you handle that when you're running your own browser infrastructure? These are the problems that we have to think about and solve at Browserbase. And it's, it's certainly a labor of love, but I built this for me, first and foremost. I know it's super cheesy and everyone says that for like their startups, but it really, truly was for me. If you look at like the talks I've done even before Browserbase, and I'm just like really excited to try and build a category-defining infrastructure company. And it's, it's rare to have a new category of infrastructure exist. We're here in the Chroma offices, and like, you know, vector databases are a new category of infrastructure. Is it, is it, I mean, we can, we're in their office, so, you know, we can, we can debate that one later. That is one. Multimodality in AI-Powered Browsing swyx [00:10:16]: That's one of the industry debates. Paul [00:10:17]: I guess we go back to the LLMOS talk that Karpathy gave way long ago. And like the browser box was very clearly there, and it seemed like the people who were building in this space also agreed that browsers are a core primitive of infrastructure for the LLMOS that's going to exist in the future. And nobody was building something there that I wanted to use. So I had to go build it myself. swyx [00:10:38]: Yeah. I mean, exactly that talk, that, honestly, that diagram, every box is a startup, and there's the code box and then there's the browser box. I think at some point they will start clashing there. There's always the question of the, are you a point solution or are you the sort of all-in-one?
And I think the point solutions tend to win quickly, but then the all-in-ones have a very tight, cohesive experience. Yeah. Let's talk about just the hard problems of Browserbase you have on your website, which is beautiful. Thank you. Was there an agency that you used for that? Yeah. Herve.paris. Paul [00:11:11]: They're amazing. Herve.paris. Yeah. It's H-E-R-V-E. I highly recommend for developer tools founders to work with consumer agencies, because they end up building beautiful things, and the Parisians know how to build beautiful interfaces. So I got to give props. swyx [00:11:24]: And chat apps, apparently, they are very fast. Oh yeah. The Mistral chat. Yeah. Mistral. Yeah. Paul [00:11:31]: Le Chat. swyx [00:11:31]: Le Chat. And then your videos as well, it was professionally shot, right? The Series A video. Yeah. Alessio [00:11:36]: Nico did the videos. He's amazing. Not the initial video that you shot, the new one. First one was Austin. Paul [00:11:41]: Another, another video. Pretty surprised. But yeah, I mean, like, I think when you think about how you talk about your company, you have to think about the way you present yourself. It's, you know, as a developer, you think you evaluate a company based on like the API reliability and the P95, but a lot of developers say, is the website good? Is the message clear? Do I, like, trust this founder I'm building my whole feature on? So I've tried to nail that, as well as like the reliability of the infrastructure. You're right. It's very hard. And there's a lot of kind of footguns that you run into when running headless browsers at scale. Right. Competing with Existing Headless Browser Solutions swyx [00:12:10]: So let's pick one. You have eight features here. Seamless integration. Scalability. Speed. Secure. Observable. Stealth. That's interesting. Extensible and developer-first. What comes to your mind as like the top two, three hardest ones? Yeah. Running headless browsers at scale Paul [00:12:26]: I think just running headless browsers at scale is like the hardest one. And maybe can I nerd out for a second? Is that okay? I heard this is a technical audience, so I'll talk to the other nerds. Whoa. They were listening. Yeah. They're upset. They're ready. The AGI is angry. Okay. So how do you run a browser in the cloud? Let's start with that, right? So let's say you're using a popular browser automation framework like Puppeteer, Playwright, or Selenium. Maybe you've written some code locally on your computer that opens up Google. It finds the search bar and then types in, you know, search for Latent Space, and hits the search button. That script works great locally. You can see the little browser open up. You want to take that to production. You want to run the script in a cloud environment, so when your laptop is closed, the browser is still doing something. Well, I, we use Amazon. You know, the first thing I'd reach for is probably like some sort of serverless infrastructure. I would probably try and deploy on a Lambda. But Chrome itself is too big to run on a Lambda. It's over 250 megabytes. So you can't easily start it on a Lambda. So you maybe have to use something like Lambda layers to squeeze it in there. Maybe use a different Chromium build that's lighter. And you get it on the Lambda. Great. It works. But it runs super slowly. It's because Lambdas are very, like, resource-limited. They only run like with one vCPU.
You can run one process at a time. Remember, Chromium is super beefy. It's barely running on my MacBook Air. I'm still downloading it from the test earlier, right? I'm joking. But it's big, you know? So like Lambda, it just won't work really well. Maybe it'll work, but you need something faster. Your users want something faster. Okay. Well, let's put it on a beefier instance. Let's get an EC2 server running. Let's throw Chromium on there. Great. Okay. I can, that works well with one user. But what if I want to run like 10 Chromium instances, one for each of my users? Okay. Well, I might need two EC2 instances. Maybe 10. All of a sudden, you have multiple EC2 instances. This sounds like a problem for Kubernetes and Docker, right? Now, all of a sudden, you're using ECS or EKS, the Kubernetes or container solutions by Amazon. You're spinning up and down containers, and you're spending a whole engineer's time on kind of maintaining this stateful distributed system. Those are some of the worst systems to run, because when it's a stateful distributed system, it means that you are bound by the connections to that thing. You have to keep the browser open while someone is working with it, right? That's just a painful architecture to run. And there are all these other little gotchas with Chromium (Chromium, which is the open source version of Chrome, by the way). You have to install all these fonts. You want emojis working in your browsers, because your vision model is looking for the emoji. You need to make sure you have the emoji fonts. You need to make sure you have all the right extensions configured, like, oh, do you want ad blocking? How do you configure that? How do you actually record all these browser sessions? Like, it's a headless browser. You can't look at it. So you need to have some sort of observability. Maybe you're recording videos and storing those somewhere. It all kind of adds up to be this just giant monster piece of your project, when all you wanted to do was run a lot of browsers in production for this little script to go to google.com and search. And when I see a complex distributed system, I see an opportunity to build a great infrastructure company. And we really abstract that away with Browserbase, where our customers can use these existing frameworks, Playwright, Puppeteer, Selenium, or our own Stagehand, and connect to our browsers in a serverless-like way, control them, and then just disconnect when they're done. And they don't have to think about the complex distributed system behind all of that. They just get a browser running anywhere, anytime. Really easy to connect to. swyx [00:15:55]: I'm sure you have questions. My standard question with anything, so essentially you're a serverless browser company, and there's been other serverless things that I'm familiar with in the past, serverless GPUs, serverless website hosting. That's where I come from with Netlify. One question is just like, you promise to spin up thousands of browsers in milliseconds. I feel like there's no real solution that does that yet. And I'm just kind of curious how. The only solution I know is to kind of keep a warm pool of servers around, which is expensive, but maybe not so expensive because it's just CPUs. So I'm just like, you know. Yeah. Browsers as a Core Primitive in AI Infrastructure Paul [00:16:36]: You nailed it, right?
I mean, how do you offer a serverless-like experience with something that is clearly not serverless, right? And the answer is, you need to be able to run... We run many browsers on single nodes. We use Kubernetes at Browserbase. So we have many pods that are being scheduled. We have to predictably schedule them up or down. Yes, thousands of browsers in milliseconds is the best-case scenario. If you hit us with 10,000 requests, you may hit a slower cold start, right? So we've done a lot of work on predictive scaling and being able to kind of route stuff to different regions, where we have multiple regions of Browserbase where we have different pools available. You can also pick the region you want to go to based on, like, lower round-trip-time latency. It's very important with these types of things; there's a lot of requests going over the wire. So for us, like having a VM like Firecracker powering everything under the hood allows us to be super nimble and spin things up or down really quickly with strong multi-tenancy. But in the end, this is like the complex infrastructural challenge that we have to kind of deal with at Browserbase. And we have a lot more stuff on our roadmap to allow customers to have more levers to pull to exchange: do you want really fast browser startup times, or do you want really low costs? And if you're willing to be more flexible on that, we may be able to kind of like work better for your use cases. swyx [00:17:44]: Since you use Firecracker, shouldn't Fargate do that for you, or did you have to go lower level than that? We had to go lower level than that. Paul [00:17:51]: I find this a lot with Fargate customers, which is alarming for Fargate. We used to be a giant Fargate customer. Actually, the first version of Browserbase was ECS and Fargate. I think we were actually the largest Fargate customer in our region for a little while. No, what? Yeah, seriously. And unfortunately, it's a great product, but I think if you're an infrastructure company, you actually have to have a deeper level of control over these primitives. I think the same thing is true with databases. We've used other database providers, and I think- swyx [00:18:21]: Yeah, serverless Postgres. Paul [00:18:23]: Shocker. When you're an infrastructure company, you're on the hook if any provider has an outage. And I can't tell my customers, like, hey, we went down because so-and-so went down. That's not acceptable. So for us, we've really moved to bringing things internally. It's kind of the opposite of what we preach. We tell our customers, don't build this in-house, but then we're like, we build a lot of stuff in-house. But I think it just really depends on what is in the critical path. We try and have deep ownership of that. Alessio [00:18:46]: On the distributed location side, how does that work for the web, where you might get sort of different content in different locations, but the customer is expecting, you know, if you're in the US, I'm expecting the US version. But if you're spinning up my browser in France, I might get the French version. Yeah. Paul [00:19:02]: Yeah. That's a good question. Well, generally, like on the localization, there is a thing called locale in the browser. You can set like what your locale is, whether you're, like, in the en-US locale or not. But some things do IP-based routing. And in that case, you may want to have a proxy.
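Both knobs Paul mentions, locale and an egress proxy, exist in Playwright; a sketch with placeholder proxy details (with a hosted fleet you would typically connect to the provider's browsers over CDP instead of launching locally):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # With a remote fleet you'd usually do something like:
    #   browser = p.chromium.connect_over_cdp("wss://<provider-endpoint>")
    # Here we launch locally, routing traffic through a (placeholder) US proxy.
    browser = p.chromium.launch(
        headless=True,
        proxy={
            "server": "http://us.proxy.example.com:8080",
            "username": "user",
            "password": "secret",
        },
    )
    # Keep the browser's self-reported locale/timezone consistent with the IP.
    context = browser.new_context(locale="en-US", timezone_id="America/New_York")
    page = context.new_page()
    page.goto("https://httpbin.org/ip")  # shows the egress IP the site sees
    print(page.inner_text("body"))
    browser.close()
```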
Like, let's say you're running something in Europe, but you want to make sure you're showing up from the US. You may want to use one of our proxy features, so you can turn on proxies to say, like, make sure these connections always come from the United States. Which is necessary, too, because when you're browsing the web, you're coming from, like, a, you know, data center IP, and that can make it a lot harder to browse the web. So we do have kind of like this proxy super network. Yeah. We pick a proxy for you based on where you're going, so you can reliably automate the web. But if you get scheduled in Europe, that doesn't happen as much. We try and schedule you as close to, you know, the origin that you're trying to go to. But generally you have control over the regions you can put your browsers in. So you can specify West one or East one or Europe. We only have one region of Europe right now, actually. Yeah. Alessio [00:19:55]: What's harder, the browser or the proxy? I feel like, to me, actually proxying reliably at scale is much harder than spinning up browsers at scale. I'm curious. It's all hard. Paul [00:20:06]: It's layers of hard, right? Yeah. I think it's different levels of hard. I think the thing with the proxy infrastructure is that we work with many different web proxy providers, and some are better than others. Some have good days, some have bad days. And our customers who've built browser infrastructure on their own, they have to go and deal with sketchy actors. Like, first they figure out their own browser infrastructure, and then they got to go buy a proxy. And it's like, you can pay in Bitcoin and it just kind of feels a little sus, right? It's like you're buying drugs when you're trying to get a proxy online. We have like deep relationships with these counterparties. We're able to audit them and say, is this proxy being sourced ethically? Like, it's not running on someone's TV somewhere. Is it free range? Yeah. Free-range organic proxies, right? Right. We do a level of diligence. We're SOC 2. So we have to understand what is going on here. But then we're able to make sure that, like, we route around proxy providers not working. There's proxy providers who will just, the proxy will stop working all of a sudden. And then if you don't have redundant proxying on your own browsers, that's hard downtime for you, or you may get some serious impacts there. With us, like, we intelligently know, hey, this proxy is not working. Let's go to this one. And you can kind of build a network of multiple providers to really guarantee the best uptime for our customers. Yeah. So you don't own any proxies? We don't own any proxies. You're right. The team has been saying, who wants to, like, take home a little proxy server? But not yet. We're not there yet. You know? swyx [00:21:25]: It's a very mature market. I don't think you should build that yourself. Like, you should just be a super customer of them. Yeah. Scraping, I think, is the main use case for that. I guess. Well, that leads us into CAPTCHAs and also auth, but let's talk about CAPTCHAs. You had a little spiel that you wanted to talk about CAPTCHA stuff. Challenges of Scaling Browser Infrastructure Paul [00:21:43]: Oh, yeah. I was just, I think a lot of people ask, if you're thinking about proxies, you're thinking about CAPTCHAs too. I think it's the same thing. You can go buy CAPTCHA solvers online, but it's the same buying experience. It's some sketchy website, you have to integrate it.
It's not fun to buy these things; you can't really trust them, and the docs are bad. What Browserbase does is we integrate a bunch of different CAPTCHA solvers. We do some stuff in-house, but generally we just integrate with a bunch of known vendors and continually monitor and maintain these things and say, is this working or not? Can we route around it or not? These are CAPTCHA solvers? CAPTCHA solvers, yeah. Not CAPTCHA providers, CAPTCHA solvers. Yeah, sorry. CAPTCHA solvers. We really try and make sure all of that works for you. I think as a dev, if I'm buying infrastructure, I want it all to work all the time, and it's important for us to provide that experience by making sure everything does work and monitoring it on our own. Yeah. Right now, the world of CAPTCHAs is tricky. I think AI agents in particular are very much ahead of the internet infrastructure. CAPTCHAs are designed to block all types of bots, but there are now good bots and bad bots. I think in the future, CAPTCHAs will be able to identify who a good bot is, hopefully via some sort of KYC. For us, we've been very lucky. We have very little to no known abuse of Browserbase, because we really look into who we work with. And for certain types of CAPTCHA solving, we only allow them on certain types of plans, because we want to make sure that we can know what people are doing, what their use cases are. And that's really allowed us to try and be an arbiter of good bots, which is our long-term goal. I want to build great relationships with people like Cloudflare so we can agree, hey, here are these acceptable bots. We'll identify them for you and make sure we flag when they come to your website: this is a good bot, you know? Alessio [00:23:23]: I see. And Cloudflare said they want to do more of this. So they're going to, by default, reject you if they think you're an AI bot. I'm curious if you think this is something that is going to be at the browser level, or, I mean, the DNS level with Cloudflare seems more where it should belong. But I'm curious how you think about it. Paul [00:23:40]: I think the web's going to change. You know, I think that the Internet as we have it right now is going to change. And we all need to just accept that the cat is out of the bag. And instead of kind of like wishing the Internet was like it was in the 2000s, where we could have free content online that wouldn't be scraped: it's just not going to happen. And instead, we should think about, like, one, how can we change the models of, you know, information being published online, so people can adequately commercialize it? But two, how do we rebuild applications that expect that AI agents are going to log in on their behalf? Those are the things that are going to allow us to kind of like identify good and bad bots. And I think the team at Clerk has been doing a really good job with this on the authentication side. I actually think that auth is the biggest thing that will prevent agents from accessing stuff, not CAPTCHAs. And I think there will be agent auth in the future. I don't know if it's going to happen from an individual company, but actually authentication providers that have a, you know, 'log in as agent' feature, where you put in your email, you'll get a push notification, say like, hey, your browser-based agent wants to log into your Airbnb. You can approve that, and then the agent can proceed. That really circumvents the need for CAPTCHAs, or logging in as you and sharing your password.
I think agent auth is going to be one way we identify good bots going forward. And I think a lot of this CAPTCHA-solving stuff is really a short-term problem, as the internet kind of reorients itself around how it's going to work with agents browsing the web, just like people do. Yeah. Managing Distributed Browser Locations and Proxies swyx [00:24:59]: Stytch recently was on Hacker News for talking about agent experience, AX, which is a thing that Netlify is also trying to clone and coin and talk about. And we've talked about this on our previous episodes before, in the sense that I actually think that's like maybe the only part of the tech stack that needs to be kind of reinvented for agents. Everything else can stay the same: CLIs, APIs, whatever. But auth, yeah, we need agent auth. And it's mostly, like, short-lived. It should be a distinct identity from the human, but paired. I almost think, like, in the same way that every social network should have your main profile and then your alt accounts or your Finsta, it's almost like, you know, every human token should be paired with an agent token, and the agent token can go and do stuff on behalf of the human token, but not be presumed to be the human. Yeah. Paul [00:25:48]: It's like, it's, it's actually very similar to OAuth, is what I'm thinking. And, you know, Reed from Stytch is an investor, Colin from Clerk, Okta Ventures, all investors in Browserbase, because, like, I hope they solve this, because they'll make Browserbase's mission more possible. So we don't have to overcome all these hurdles. But I think it will be an OAuth-like flow, where an agent will ask to log in as you, you'll approve the scopes, like it can book an apartment on Airbnb, but it can't, like, message anybody. And then, you know, the agent will have some sort of, like, role-based access control within an application. Yeah. I'm excited for that. swyx [00:26:16]: The tricky part is just, there's one layer of delegation here, which is like, you're auth-ing as my user's user, or something like that. I don't know if that's tricky or not. Does that make sense? Yeah. Paul [00:26:25]: You know, actually at Twilio, I worked on the login, identity, and access management teams, right? So like I built Twilio's login page. swyx [00:26:31]: You were an intern on that team and then you became the lead in two years? Yeah. Paul [00:26:34]: Yeah. I started as an intern in 2016, and then I was the tech lead of that team. How? That's not normal. I didn't have a life. He's not normal. Look at this guy. I didn't have a girlfriend. I just loved my job. I don't know. I applied to 500 internships for my first job, and I got rejected from every single one of them except for Twilio, and then eventually Amazon. And they took a shot on me, and like, I was getting paid money to write code, which was my dream. Yeah. Yeah. I'm very lucky that, like, this coding thing worked out, because I was going to be doing it regardless. And yeah, I was able to kind of spend a lot of time on a team that was growing, at a company that was growing. So it informed a lot of this stuff here. I think these are problems that have been solved with, like, the SAML protocol, with SSO. I think there's really interesting stuff with, like, WebAuthn, like these different types of authentication, like, schemes that you can use to authenticate people. The tooling is all there. It just needs to be tweaked a little bit to work for agents. And I think the fact that there are companies that are already
providing authentication as a service really sets it up well. The thing that's hard is, like, reinventing the internet for agents. We don't want to rebuild the internet. That's an impossible task. And I think people often say, like, well, we'll have this second layer of APIs built for agents. I'm like, we will for the top use cases, but instead we can just tweak the internet as is, starting on the authentication side. I think we're going to be the dumb ones going forward. Unfortunately, I think AI is going to be able to do a lot of the tasks that we do online, which means that it will be able to go to websites, click buttons on our behalf, and log in on our behalf, too. So with this kind of, like, web agent future happening, I think with some small structural changes, like you said, it feels like it could all slot in really nicely with the existing internet. Handling CAPTCHAs and Agent Authentication swyx [00:28:08]: There's one more thing, which is your live view iframe, which lets you take control. Yeah. Obviously very key for Operator now, but like, is there anything interesting technically there, or that the people like? Well, people always want this. Paul [00:28:21]: It was really hard to build, you know? Like, so, okay. Headless browsers, you don't see them, right? They're running in a cloud somewhere. You can't, like, look at them. And, I mean, it's a weird name. I wish we came up with a better name for this thing, but you can't see them, right? But customers don't trust AI agents, right? At least on the first pass. So what we do with our live view is that, you know, when you use Browserbase, you can actually embed a live view of the browser running in the cloud for your customer to see it working. And the first reason is to build trust: like, okay, so I have this script that's going to go automate a website. I can embed it into my web application via an iframe, and my customer can watch. And then we added two-way communication. So now, not only can you watch the browser kind of being operated by AI; if you want to pause and actually click around and type within this iframe that's controlling a browser, that's also possible. And this is all thanks to some of the lower-level protocol, which is called the Chrome DevTools Protocol. It has an API called startScreencast, and you can also send mouse clicks and button clicks to a remote browser. And this is all embeddable within iframes. You have a browser within a browser, yo. And then you simulate the screen, the click, on the other side. Exactly. And this is really nice often for, like, let's say, a CAPTCHA that can't be solved. You saw this with Operator. You know, Operator actually uses a different approach. They use VNC. So, you know, you're able to see, like, you're seeing the whole window here. What we're doing is something a little lower level, with the Chrome DevTools Protocol. It's just PNGs being streamed over the wire. But the same thing is true, right? Like, hey, I'm running a window. Pause. Can you do something in this window? Human. Okay, great. Resume. Like, sometimes 2FA tokens. Like, if you get that text message, you might need a person to type that in. Web agents need human-in-the-loop type workflows still. You still need a person to interact with the browser. And building a UI to proxy that is kind of hard. You may as well just show them the whole browser and say, hey, can you finish this up for me? And then let the AI proceed on afterwards.
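A sketch of the lower-level mechanism Paul describes: the Chrome DevTools Protocol's Page.startScreencast streams compressed frames that you can forward (for example, over a WebSocket) into an iframe. Playwright exposes raw CDP sessions, which keeps this runnable without extra plumbing; the forwarding step is stubbed out:

```python
import base64
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")

    cdp = page.context.new_cdp_session(page)

    def on_frame(params):
        png = base64.b64decode(params["data"])  # one live-view frame
        print(f"frame: {len(png)} bytes")  # ...forward to the client here
        # Chrome stops sending new frames until each one is acked.
        cdp.send("Page.screencastFrameAck", {"sessionId": params["sessionId"]})

    cdp.on("Page.screencastFrame", on_frame)
    cdp.send("Page.startScreencast", {"format": "png", "everyNthFrame": 1})
    page.wait_for_timeout(2000)  # stream for a couple of seconds
    cdp.send("Page.stopScreencast")
    browser.close()
```

Clicks and keystrokes from the embedded view can travel back through the same session via Input.dispatchMouseEvent and Input.dispatchKeyEvent, which is the two-way communication described above.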
Is there a future where I stream my current desktop to Browserbase? I don't think so. I think we're very much cloud infrastructure. Yeah. You know, but a lot of the stuff we're doing, we do want to build tools. We'll talk about StageHand, the web agent framework, in a second. But there's a case where a lot of people are going desktop-first for consumer use. And I think Claude is doing a lot of this, where I expect to see MCPs really oriented around the Claude Desktop app for a reason, right? I think a lot of these tools are going to run on your computer because it makes... I think it's breaking out. People are putting it on a server. Oh, really? Okay. Well, sweet. We'll see. I was surprised, though. I think the Browser Company, too, with Dia Browser, it runs on your machine.

swyx [00:30:50]: What is it?

Paul [00:30:51]: So, Dia Browser, as far as I understand... I used to use Arc. Yeah. I haven't used Arc. But I'm a big fan of the Browser Company. I think they're doing a lot of cool stuff in consumer. As far as I understand, it's a browser where you have a sidebar where you can chat with it, and it can control the local browser on your machine. So if you imagine what a consumer web agent is, it lives alongside your browser. I think Google Chrome has Project Mariner. I almost call it Project Marinara for some reason. I don't know why. It's...

swyx [00:31:17]: No, I think it's that someone really likes Waterworld. Oh, I see. The classic Kevin Costner. Yeah.

Paul [00:31:22]: Okay. Project Mariner is a similar thing to Dia Browser, in my mind, as far as I understand it. You have a browser that has an AI interface that will take over your mouse and keyboard and control the browser for you. Great for consumer use cases. But if you're building applications that rely on a browser, and it's more part of a greater AI app experience, you probably need something that's more like infrastructure, not a consumer app.

swyx [00:31:44]: Just because I have explored a little bit in this area, do people want branching? So I have the state of whatever my browser's in, and then I want, like, 100 clones of this state. Do people do that? Or...

Paul [00:31:56]: People don't do it currently. Yeah. But it's definitely something we're thinking about. I think the idea of forking a browser is really cool. Technically, kind of hard. We're starting to see this in code execution, where people are forking code execution processes, or forking some tool calls, or branching tool calls. Haven't seen it at the browser level yet. But it makes sense. If an AI agent is using a website and it's not sure what path it wants to take to crawl this website to find the information it's looking for, it would make sense for it to explore both paths in parallel. And that'd be a very... A road not taken. Yeah. And hopefully find the right answer, and then say, okay, this was actually the right one, and memorize that, and go there in the future. On the roadmap. For sure. Don't make my roadmap, please. You know?

Alessio [00:32:37]: How do you actually do that? Yeah. How do you fork? I feel like the browser is so stateful for so many things.

swyx [00:32:42]: Serialize the state. Restore the state. I don't know.

Paul [00:32:44]: So, it's one of the reasons why we haven't done it yet. It's hard, you know? To truly fork, it's actually quite difficult. The naive way is to open the same page in a new tab and hope that it's in the same state. But if you have a form halfway filled, you may have to take the whole container, pause it, all the memory, duplicate it, and restart it from there. It could be very slow. So we haven't found a thing. The easy thing to fork is just to copy the page object, you know? But I think there needs to be something a little bit more robust there. Yeah.

swyx [00:33:12]: So, Morph Labs has this infinite branch thing. They wrote a custom fork of Linux or something that lets them save the system state and clone it. Morph Labs, hit me up. I'll be a customer. Yeah. I think that's the only way to do it. Unless Chrome has some special API for you. Yeah.

Paul [00:33:29]: There's probably something we'll reverse engineer one day. I don't know. Yeah.
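To illustrate the gap Paul describes: the "easy" fork roughly corresponds to snapshotting what Playwright can serialize (cookies, localStorage, the URL) and replaying it into fresh contexts. Here is a sketch of that shallow approach, with the caveat he gives built in: it will not carry in-memory state like a half-filled form, which is exactly why a true fork is hard. The URL and branch count are illustrative.

```ts
// Sketch of the "naive fork": snapshot serializable state and replay it
// into N parallel contexts so an agent can explore paths in parallel.
import { chromium } from "playwright";

async function naiveFork(url: string, branches: number) {
  const browser = await chromium.launch();
  const original = await browser.newContext();
  const page = await original.newPage();
  await page.goto(url);

  // Serialize cookies and localStorage. A true fork would also need the
  // heap, pending requests, and form state (the whole container), which
  // is what makes real branching slow and difficult.
  const state = await original.storageState();
  const currentUrl = page.url();

  // "Fork": each branch gets a context restored from the snapshot.
  return Promise.all(
    Array.from({ length: branches }, async () => {
      const ctx = await browser.newContext({ storageState: state });
      const branchPage = await ctx.newPage();
      await branchPage.goto(currentUrl);
      return branchPage;
    }),
  );
}
```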
Alessio [00:33:32]: Let's talk about StageHand, the AI web browsing framework. You have three core components: Observe, Extract, and Act. Pretty clean landing page. What was the idea behind making a framework? Yeah.

Stagehand: AI web browsing framework

Paul [00:33:43]: So, there are three frameworks that are very popular or already exist, right? Puppeteer, Playwright, Selenium. Those are for building hard-coded scripts to control websites. And as soon as I started to play with LLMs plus browsing, I caught myself code-genning Playwright code to control a website. I would take the DOM, pass it to an LLM, and say, can you generate the Playwright code to click the appropriate button here? And it would do that. And I was like, this really should be part of the frameworks themselves. And I became really obsessed with SDKs that take natural language as part of the API input. And that's what StageHand is. StageHand exposes three APIs, and it's a superset of Playwright. So, if you go to a page, you may want to take an action: click on the button, fill in the form, etc. That's what the act command is for. You may want to extract some data. This one takes natural language, like "extract the winner of the Super Bowl from this page." You can give it a Zod schema, so it returns structured output. And then maybe you're building an agent loop, and you want to see what actions are possible on this page before taking one. You can do observe. So you can observe the actions on the page, and it will generate a list of actions. You can guide it: give me actions on this page related to buying an item. And you get back "buy it now," "add to cart," "view shipping options," and you can pass that to an LLM, in an agent loop, to say, what's the appropriate action given this high-level goal? So, StageHand isn't a web agent. It's a framework for building web agents. And we think that agent loops are actually pretty close to the application layer, because every application probably has different goals or different ways it wants to take steps. I don't think I've seen a generic one. Maybe you guys are the experts here. I haven't seen a really good AI agent framework here. Everyone kind of has their own special sauce, right? I see a lot of developers building their own agent loops, and they're using tools. And I view StageHand as the browser tool. So, we expose act, extract, observe. Your agent can call these tools. And from that, you don't have to worry about generating Playwright code performantly. You don't have to worry about running it. You can just integrate these three tool calls into your agent loop and reliably automate the web.
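A rough sketch of the three primitives as described above. The shape follows StageHand's public docs, but exact signatures vary by version, and the URL and instructions here are purely illustrative.

```ts
// Sketch of StageHand's act / extract / observe primitives.
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({ env: "LOCAL" }); // or "BROWSERBASE"
await stagehand.init();
const page = stagehand.page; // a superset of a Playwright page

await page.goto("https://example.com/super-bowl"); // illustrative URL

// act: take a natural-language action on the page.
await page.act("click the scores tab");

// extract: pull structured data out, validated against a Zod schema.
const { winner } = await page.extract({
  instruction: "extract the winner of the Super Bowl from this page",
  schema: z.object({ winner: z.string() }),
});

// observe: list candidate actions, optionally guided by a goal, to feed
// into your own agent loop before committing to one.
const actions = await page.observe("actions on this page related to buying an item");
console.log(winner, actions);
```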
swyx [00:35:48]: A special shout-out to Anirudh, who I met at your dinner, who I think listens to the pod. Yeah. Hey, Anirudh.

Paul [00:35:54]: Anirudh's the man. He's a StageHand guy.

swyx [00:35:56]: I mean, the interesting thing about each of these APIs is they're kind of each a startup. Specifically extract: Firecrawl is extract. There's Expand AI. There's a whole bunch of extract companies that just focus on extract. I'm curious. I feel like you guys are going to collide at some point. Right now, it's friendly. Everyone's in a blue ocean. At some point, it's going to be valuable enough that there's some turf battle here. I don't think you have a dog in this fight. I think you can mock out extract to use an external service if they're better at it than you. But it's just an observation that, in the same way I see each checkbox in the side of custom GPTs becoming a startup, or each box in the Karpathy chart becoming a startup, this is also becoming a thing. Yeah.

Paul [00:36:41]: I mean, the way StageHand works is that it's MIT-licensed, completely open source. You bring your own API key to your LLM of choice. You can choose your LLM. We don't make any money off of extract, really. We only really make money if you choose to run it with our browser. You don't have to. You can actually use your own browser, a local browser. StageHand is completely open source for that reason. And, yeah, I think if you're building really complex web scraping workflows, I don't know if StageHand is the tool for you. I think it's really more if you're building an AI agent that needs a few general tools, or if it's doing a lot of web-automation-intensive work. But if you're building a scraping company, StageHand is not your thing. You probably want something that's going to get HTML content, convert that to Markdown, and query it. That's not what StageHand does. StageHand is more about reliability. I think we focus a lot on reliability and less so on cost optimization and speed at this point.

swyx [00:37:33]: I actually feel like StageHand, the way that it works, it's like page.act, "click on the quick start." Yeah. It's kind of the integration test for the code that you would have to write anyway, like the Puppeteer code that you would have to write anyway. And when the page structure changes, because it always does, then this is still the test. This is still the test that I would have to write. Yeah. So it's kind of like a testing framework that doesn't need implementation detail.

Paul [00:37:56]: Well, yeah. I mean, Puppeteer, Playwright, and Selenium were all designed as testing frameworks, right? Yeah. And now people are hacking them together to automate the web. I would say, and maybe this is me being too specific, but when I write tests, if the page structure changes without me knowing, I want that test to fail. So I don't know about AI regenerating that. People are using StageHand for testing, but it's more for usability testing, not testing of whether the front end has changed or not. Okay.
But generally where we've seen people really take off is when they want to build a feature in their application that's kind of like Operator or Deep Research: they're using StageHand to power that tool calling in their own agent loop. Okay. Cool.

swyx [00:38:37]: So let's go into Operator, the first big agent launch of the year from OpenAI. Seems like they have a whole bunch scheduled. You were on break and your phone blew up. What's your general view of computer use agents, which is what they're calling it? The overall category, before we go into Open Operator. Just the overall promise of Operator. I will observe that I tried it once. It was okay. And I never tried it again.

OpenAI's Operator and computer use agents

Paul [00:38:58]: That tracks with my experience, too. I'm a huge fan of the OpenAI team. I do not view Operator as a company killer for Browserbase at all. I think it actually shows people what's possible. I think computer use models make a lot of sense. And what I'm actually most excited about with computer use models is their ability to really take screenshots, reason, and output steps. I think that using mouse coordinates, I've seen that prove to be less reliable than I would like, and I just wonder if that's the right form factor. What we've done with our framework is anchor it to the DOM itself, anchor it to the actual item. So if it's clicking on something, it's clicking on that thing, you know? It's more accurate. No matter where it is. Yeah, exactly. Because it really ties in nicely. And it can handle the whole viewport in one go, whereas Operator can only handle what it sees. Can you hover? Is hovering a thing that you can do? I don't know if we expose it as a tool directly, but I'm sure there's an API for hovering, like, move mouse to this position. Yeah, yeah, yeah. I think you can trigger hover via the JavaScript on the DOM itself. But, no, I think when we saw computer use, everyone's eyes lit up, because they realized, wow, AI is going to actually automate work for people. And seeing that happen from both of the labs, and I'm sure we're going to see more labs launch computer use models, I'm excited to see all the stuff that people build with it. I'd love to see computer use power controlling a browser on Browserbase. And I think Open Operator, which was our open-source version of OpenAI's Operator, was our first take on how we can integrate these models into Browserbase: we handle the infrastructure and let the labs do the models. I don't have a sense that Operator will be released as an API. I don't know. Maybe it will. I'm curious to see how well that works, because I think it's going to be really hard for a company like OpenAI to do things like support CAPTCHA solving or have proxies. I think it's hard for them structurally. Imagine this New York Times headline: "OpenAI CAPTCHA Solving." That would be a pretty bad headline. This New York Times headline: "Browserbase Solves CAPTCHAs"? No one cares. No one cares. And our investors are bored. We're all okay with this, you know? We're building this company knowing that the CAPTCHA solving is short-lived, until we figure out how to authenticate good bots. I think it's really hard for a company like OpenAI, who has this brand that's so, so good, to balance that with the icky parts of web automation, which can be kind of complex to solve. I'm sure OpenAI knows who to call whenever they need you. Yeah, right. I'm sure they'll have a great partnership.

Alessio [00:41:23]: And is Open Operator just, like, a marketing thing for you? How do you think about resource allocation? You can spin this up very quickly, and now there's all this open deep research, just open all these things that people are building. We started it, you know. You're the original Open. We're the original Open Operator, you know? Is it just, hey, look, this is a demo, but we'll help you build out an actual product for yourself? Are you interested in going more of a product route? That's kind of the OpenAI way, right? They started as a model provider and then…

Paul [00:41:53]: Yeah, we're not interested in going the product route yet. I view Open Operator as a reference project, you know? Let's show people how to build these things using the infrastructure and models that are out there. And that's what it is. Open Operator is very simple. It's an agent loop. It says: take a high-level goal, break it down into steps, and use tool calling to accomplish those steps. It takes screenshots and feeds those screenshots into an LLM with the step to generate the right action. It uses StageHand under the hood to actually execute the action. It doesn't use a computer use model. And it has a nice interface using the live view that we talked about, the iframe, to embed that into an application. So I felt like people on launch day wanted to figure out how to build their own version of this, and we turned that around really quickly to show them. And I hope we do that with other things, like deep research. We don't have a deep research launch yet. I think David from AOMNI actually has an amazing open deep research that he launched. It has, like, 10K GitHub stars now. So he's crushing that. But I think if people want to build these features natively into their application, they need good reference projects. And I think Open Operator is a good example of that.
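A simplified sketch of the loop as Paul describes it: take a high-level goal, loop screenshot to LLM to next step, and execute each step with StageHand's act(). The llm() function is a hypothetical stand-in for whichever model call you use, and the step count and DONE convention are deliberately naive assumptions, not Open Operator's actual code.

```ts
// Sketch of an Open Operator style agent loop on top of StageHand.
import { Stagehand } from "@browserbasehq/stagehand";

// Hypothetical stand-in for your LLM call of choice.
declare function llm(prompt: string, screenshotB64: string): Promise<string>;

async function operate(goal: string, startUrl: string) {
  const stagehand = new Stagehand({ env: "BROWSERBASE" });
  await stagehand.init();
  const page = stagehand.page;
  await page.goto(startUrl);

  for (let i = 0; i < 20; i++) {
    // Feed the current screenshot plus the goal to the model...
    const screenshot = (await page.screenshot()).toString("base64");
    const step = await llm(
      `Goal: ${goal}\nReply with the next browser action in plain language, or DONE.`,
      screenshot,
    );
    if (step.trim() === "DONE") break;
    // ...and let StageHand turn the natural-language step into a real action.
    await page.act(step);
  }
  await stagehand.close();
}
```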
swyx [00:42:52]: I don't know. Actually, I'm pretty bullish on an API-driven Operator, because that's the only way that you can... once it's reliable enough, obviously. And right now we're nowhere near. But give it five years. It'll happen, you know. And then you can sort of spin this up, and browsers are working in the background, and you don't necessarily have to know, and it's just booking restaurants for you, whatever. I can definitely see that future happening. I had this on the landing page here. This might be slightly out of order. But you have sort of three use cases for Browserbase. Open Operator, or this Operator sort of use case, is kind of the workflow automation use case, and it competes with UiPath in the RPA category. Would you agree with that? Yeah, I would agree with that. And then there's agents, which we talked about already. And web scraping, which I imagine would be the bulk of your workload right now, right?

Paul [00:43:40]: No, not at all. I'd say actually the majority is browser automation. We're kind of expensive for web scraping. I think that if you need to do occasional web scraping, or you have to do web scraping that works every single time, you want to use browser automation. Yeah. You want to use Browserbase. But if you're building web scraping workflows, what you should do is have a waterfall. The first request should be a curl to the website: see if you can get it without even using a browser. And then the second request may be a scraping-specific API. There are, like, a thousand scraping APIs out there that you can use to try and get data. ScrapingBee is a great example, right? Yeah. And then, if those two don't work, bring out the heavy hitter. Browserbase will 100% work, right? It will load the page in a real browser and hydrate it. I see.

swyx [00:44:21]: Because a lot of pages don't render without JS. Yeah, exactly.
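Here is a sketch of that waterfall: try the cheapest thing first and only escalate to a real browser when you have to. The scraping-API tier uses ScrapingBee's public endpoint format as an example, but any such service works; the content check and error handling are deliberately bare, and the environment variable name is an assumption.

```ts
// Sketch of the scraping waterfall: curl, then a scraping API, then a
// real browser as the heavy hitter for client-side rendered pages.
import { chromium } from "playwright";

async function fetchPage(url: string): Promise<string> {
  // Tier 1: a plain HTTP request; works for server-rendered pages.
  const res = await fetch(url);
  if (res.ok) {
    const html = await res.text();
    if (html.includes("</body>")) return html; // crude "real content" check
  }

  // Tier 2: a scraping-specific API (ScrapingBee shown as one example;
  // SCRAPING_KEY is an assumed env var).
  const apiRes = await fetch(
    `https://app.scrapingbee.com/api/v1/?api_key=${process.env.SCRAPING_KEY}&url=${encodeURIComponent(url)}`,
  );
  if (apiRes.ok) return apiRes.text();

  // Tier 3: the heavy hitter. Load the page in a real browser and let it
  // hydrate, which handles pages that don't render without JS.
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" });
  const html = await page.content();
  await browser.close();
  return html;
}
```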
Paul [00:44:26]: So, I mean, those are the three big use cases, right? Automation, web data collection, and then, if you're building anything agentic that needs a browser tool, you want to use Browserbase.

Alessio [00:44:35]: Is there any use case that you were super surprised by, that people might not even think about? Oh, yeah. Anything that you can share? The long tail is crazy. Yeah.

Surprising use cases of Browserbase

Paul [00:44:44]: One of the case studies on our website that I think is the most interesting is this company called Benny. So, the way it works is, if you're on food stamps in the United States, you can actually get rebates if you buy certain things. You buy some vegetables, you submit your receipt to the government, and they'll give you a little rebate back. Say, hey, thanks for buying vegetables. It's good for you. That process of submitting that receipt is very painful. And the way Benny works is you use their app to take a photo of your receipt, and then Benny will go submit that receipt for you and then deposit the money into your account. That's actually using no AI at all. It's all hard-coded scripts. They maintain the scripts. They've been doing a great job. And they built this amazing consumer app. But it's an example of all these tedious workflows that people have to do to go about their business, and they're doing it for the sake of their day-to-day lives. I had never known about food stamp rebates or the complex forms you have to fill to get them. But the world is powered by millions and millions of tedious forms. Visas, you know. Lighthouse is a customer, right? They do the O-1 visa. Millions and millions of forms are taking away humans' time. And I hope that Browserbase can help power software that automates away the web forms that we don't need anymore. Yeah.

swyx [00:45:49]: I mean, I'm very supportive of that. I mean, forms. I do think government itself is a big part of it. I think the government itself should embrace AI more to do more human-friendly form filling. Mm-hmm. But I'm not optimistic. I'm not holding my breath. Yeah. We'll see. Okay. I think I'm about to zoom out. I have a little brief thing on computer use, and then we can talk about founder stuff. I tend to think of developer tooling markets as impossible triangles, where everyone starts in a niche and then they start to branch out. So I already hinted at a little bit of this, right? We mentioned Morph. We mentioned E2B. We mentioned Firecrawl. And then there's Browserbase. So there's all this stuff of, like, have a serverless virtual computer that you give to an agent and let it do stuff with it. And there are various ways of connecting it to the internet. You can just connect it to a search API, like SerpAPI, or whatever else; Exa is another one. That's how you search.
You can also have a JSON/Markdown extractor, which is Firecrawl. Or you can have a virtual browser like Browserbase, or a virtual machine like Morph. And then there's also maybe a virtual code environment, like Code Interpreter. So there's just a bunch of different ways to tackle the problem of giving a computer to an agent. And I'm just wondering if you see everyone happily coexisting in their respective niches, and as a developer, I just go and pick a shopping basket of one of each. Or do you think that eventually people will collide?

Future of browser automation and market competition

Paul [00:47:18]: I think that currently it's not a zero-sum market. I think we're talking about all of knowledge work that people do that can be automated online. All of these trillions of hours that happen online where people are working. And I think there's so much software to be built that I tend not to think about how these companies will collide. I just try to solve the problem as best as I can and make this specific piece of infrastructure, which I think is an important primitive, the best I possibly can. And yeah, I think there are players that are going to launch over-the-top platforms, agent platforms that have all these tools built in, right? Like, who's building the Rippling for agent tools, which has the search tool, the browser tool, the operating system tool, right? There are some. There are some, right? And I think in the end, what I have seen in my time as a developer, when I look at all the favorite tools that I have, is that for tools and primitives with sufficient levels of complexity, you need a solution that's really bespoke to that primitive, you know? And I am sufficiently convinced that the browser is complex enough to deserve a primitive. Obviously, I have to be. I'm the founder of Browserbase, right? I'm talking my book. But maybe I can give you one spicy take against whole-OS running. When computer use first came out, I saw that the majority of use cases for computer use were controlling a browser. And do we really need to run an entire operating system just to control a browser? I don't think so. I don't think that's necessary. You know, Browserbase can run browsers way cheaper than you can if you're running a full-fledged OS with a GUI. And I think that's just an advantage of the browser. Browsers are little OSs, and you can run them very efficiently if you orchestrate them well. And I think that allows us to offer 90% of the functionality needed in the platform at 10% of the cost of running a full OS. Yeah.

Open Operator: Browserbase's Open-Source Alternative

swyx [00:49:16]: I definitely see the logic in that. There's a Marc Andreessen quote. I don't know if you know this one. He basically observed that the browser is turning the operating system into a poorly debugged set of device drivers, because most of the apps have moved from the OS to the browser. So you can just run browsers.

Paul [00:49:31]: There's a place for OSs, too. I think there are some applications that only run on Windows operating systems.
And Eric from pig.dev, in this upcoming YC batch, or last YC batch, he's building infrastructure to run tons of Windows operating systems for you to control with your agent. And there are some legacy EHR systems that only run on Internet Explorer. Yeah.

Paul [00:49:54]: I think that's it. I think there are use cases for specific operating systems, for specific legacy software. And I'm excited to see what he does with that. I just wanted to give a shout-out to the pig.dev website.

swyx [00:50:06]: The pigs jump when you click on them. Yeah. That's great.

Paul [00:50:08]: Eric, he's the former co-founder of banana.dev, too.

swyx [00:50:11]: Oh, that Eric. Yeah. That Eric. Okay. Well, he abandoned bananas for pigs. I hope he doesn't start going around with pigs now.

Alessio [00:50:18]: Like he was going around with bananas. A little toy pig. Yeah. Yeah. I love that. What else are we missing? I think we covered a lot of the Browserbase product history, but what do you wish people asked you? Yeah.

Paul [00:50:29]: I wish people asked me more about what the future of software will look like, because that's really where I've spent a lot of time thinking about why to do Browserbase. For me, starting a company is a means of last resort. You shouldn't start a company unless you absolutely have to. And I remain convinced that the future of software is software where you're going to click a button and it's going to do stuff on your behalf. Right now, software is: you click a button and maybe it calls a backend API, computes some numbers, modifies some text, whatever. But the future of software is software using software. So, I may log into my accounting website for my business, click a button, and it's going to go load up my Gmail, search my emails, find the thing, upload the receipt, and then comment it for me, right? And it may do that using APIs, maybe a browser. I don't know. I think it's a little bit of both. But that's completely different from how we've built software so far. And that future of software has different infrastructure requirements. It's going to require different UIs. It's going to require different pieces of infrastructure. I think browser infrastructure is one piece that fits into that, along with all the other categories you mentioned. So I think it's going to require developers to think differently about how they've built software for, you know...
Steve Ruiz, founder of TLDraw, discusses the revolutionary AI applications in TLDraw, the intricacies of infinite canvas editors, and the impact of AI on design and development. Links https://www.steveruiz.me https://www.tldraw.com https://makereal.tldraw.com https://teach.tldraw.com https://computer.tldraw.com https://gitnation.com/contents/make-real-tldraws-accidental-ai-play We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Emily, at emily.kochanekketner@logrocket.com (mailto:emily.kochanekketner@logrocket.com), or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod). Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers! What does LogRocket do? LogRocket provides AI-first session replay and analytics that surface the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today (https://logrocket.com/signup/?pdr).