Podcasts about Amazon Q

  • 106 podcasts
  • 188 episodes
  • 53m avg. duration
  • 1 weekly episode
  • Latest: May 7, 2025

Latest podcast episodes about Amazon Q

Cloud Unplugged
Big Retail Cyber Attack: Amazon's AI Offensive & the Google AI Opt‑Out Illusion

Cloud Unplugged

May 7, 2025 · 33:16


In this 30‑minute episode, Jon and Lewis unpick the coordinated ransomware wave that struck Britain's high‑street giants. They trace the attack chain that emptied Co‑op shelves, froze M&S online orders and attempted, but failed, to extort Harrods. Lewis takes a look at Amazon's latest generative‑AI arsenal: Amazon Q's new developer‑first agents, the multimodal Nova Premier family running on Bedrock, and AWS's landmark decision to let any SaaS vendor list in Marketplace regardless of where the software runs, a direct play to become the app store for the whole cloud economy. Finally, they ask whether enterprises can really keep their data out of Google's AI engines.
Hosts: https://www.linkedin.com/in/jonathanshanks/ and https://www.linkedin.com/in/lewismarshall/

AWS Podcast
#719: AWS News: Amazon Q Developer brings powerful new AI capabilities to GitLab Duo

AWS Podcast

May 5, 2025 · 26:12


Description: Learn how you can use the all-new Amazon Q Developer integration with GitLab Duo to automate code generation and review, plus even more updates from AWS. 00:00:00 - Intro, 00:00:28 - SWE-PolyBench, 00:04:31 - Analytics, 00:06:49 - Application Integration, 00:07:14 - Artificial Intelligence, 00:08:53 - Amazon Bedrock Data Automation, 00:14:11 - AWS HealthOmics, 00:14:21 - Compute, 00:16:37 - Contact Centers, 00:17:25 - Containers, 00:17:46 - Databases, 00:18:18 - Front-End Web and Mobile, 00:18:59 - Management and Governance, 00:20:07 - Migration and Transfer, 00:20:17 - Networking and Content Delivery, 00:20:44 - Security, Identity, and Compliance, 00:23:24 - Serverless, 00:24:01 - Storage, 00:24:41 - Wrap up. Show notes: https://d29iemol7wxagg.cloudfront.net/719ExtendedShownotes.html

AWS Morning Brief
The Art of Amazon Q Developer

AWS Morning Brief

Apr 28, 2025 · 4:31


AWS Morning Brief for the week of April 28th, with Corey Quinn. Links:
Amazon CloudWatch agent now supports Red Hat OpenShift Service on AWS (ROSA)
Amazon Cognito now supports refresh token rotation
Amazon Q Developer releases state-of-the-art agent for feature development
AWS Account Management now supports IAM-based account name updates
AWS CodeBuild adds support for specifying EC2 instance type and configurable storage size
AWS Console Mobile Application adds support for Amazon Lightsail
AWS STS global endpoint now serves your requests locally in Regions enabled by default
AWS Transfer Family introduces Terraform module for deploying SFTP server endpoints
How Smartsheet reduced latency and optimized costs in their serverless architecture
In the works – New Availability Zone in Maryland for US East (Northern Virginia) Region
CVE-2025-3857 – Infinite loop condition in Amazon.IonDotnet
I annotated Amazon CEO Andy Jassy's 2024 Letter to Shareholders

AWS for Software Companies Podcast
Ep097: Specialized Agents & Agentic Orchestration - New Relic and the Future of Observability

AWS for Software Companies Podcast

Apr 28, 2025 · 29:04


New Relic's Head of AI and ML Innovation, Camden Swita, discusses their four-cornered AI strategy and envisions a future of "agentic orchestration" with specialized agents.
Topics include:
Introduction of Camden Swita, Head of AI at New Relic.
New Relic invented the observability space for monitoring applications.
Started with Java workloads monitoring and APM.
Evolved into full-stack observability with infrastructure and browser monitoring.
Uses advanced query language (NRQL) with time series database.
AI strategy focuses on AI ops for automation.
First cornerstone: Intelligent detection capabilities with machine learning.
Second cornerstone: Incident response with generative AI assistance.
Third cornerstone: Problem management with root cause analysis.
Fourth cornerstone: Knowledge management to improve future detection.
Initially overwhelmed by "ocean of possibilities" with LLMs.
Needed narrow scope and guardrails for measurable progress.
Natural language to NRQL translation proved immensely complex.
Selecting from thousands of possible events caused accuracy issues.
Shifted from "one tool" approach to many specialized tools.
Created routing layer to select right tool for each job.
Evaluation of NRQL is challenging even when syntactically correct.
Implemented multi-stage validation with user confirmation step.
AWS partnership involves fine-tuning models for NRQL translation.
Using Bedrock to select appropriate models for different tasks.
Initially advised prototyping on biggest, best available models.
Now recommends considering specialized, targeted models from start.
Agent development platforms have improved significantly since beginning.
Future focus: "Agentic orchestration" with specialized agents.
Envisions agents communicating through APIs without human prompts.
Integration with AWS tools like Amazon Q.
Industry possibly plateauing in large language model improvements.
Increasing focus on inference-time compute in newer models.
Context and quality prompts remain crucial despite model advances.
Potential pros and cons to inference-time compute approach.
Participants: Camden Swita – Head of AI & ML Innovation, Product Management, New Relic
See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon/isv/
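The natural-language-to-NRQL translation and multi-stage validation described in the episode can be pictured with a short sketch. The snippet below is a hypothetical illustration, not New Relic's implementation: it assumes Amazon Bedrock's Converse API via boto3, an arbitrary Claude model ID, and a toy validation check standing in for the real multi-stage validation and user confirmation step.

```python
# Hypothetical sketch: translate a natural-language question into an NRQL query
# with Amazon Bedrock, then gate execution behind a validation/confirmation step.
# Model ID, prompt wording, and the validation rule are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SYSTEM_PROMPT = (
    "You translate questions about telemetry into NRQL. "
    "Return only the NRQL query, nothing else."
)

def question_to_nrql(question: str) -> str:
    """Ask the model for an NRQL query that answers the question."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model choice
        system=[{"text": SYSTEM_PROMPT}],
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    return response["output"]["message"]["content"][0]["text"].strip()

def looks_like_valid_nrql(query: str) -> bool:
    """Cheap sanity check before the query is shown to the user for confirmation."""
    q = query.upper()
    return q.startswith("SELECT") and " FROM " in q

if __name__ == "__main__":
    nrql = question_to_nrql("What was the average page load time yesterday?")
    if looks_like_valid_nrql(nrql):
        print("Proposed NRQL (confirm before running):", nrql)
    else:
        print("Model output failed validation:", nrql)
```

In practice, the routing layer the episode mentions would sit in front of this, picking a specialized tool and model per request before any query is proposed.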

Telecom Reseller
Pronetx Merger Supercharges AWS CX Services: “Get to the Cloud Faster—And Smarter”, Podcast

Telecom Reseller

Apr 14, 2025


“This is a liberating moment for us—and for the enterprises we serve. We're combining forces to scale fast and deliver smarter CX solutions.” — Chris Marr, Pronetx

Live from Enterprise Connect, Chris Marr and Yasser El-Haggan of Pronetx joined us for a special Technology Reseller News podcast to share big news: the merger of two AWS customer experience (CX) powerhouses—Pronetx and VT Team—to create a stronger, faster, and more specialized Amazon Connect services firm.

AWS-Certified, Cloud-Focused, and Ready to Scale
Pronetx, an AWS Service Delivery Partner specializing in Amazon Connect, helps customers—including Fortune 25 companies and federal agencies—migrate contact centers to the cloud and unlock the full potential of AWS technologies, including generative AI, chatbots, case management, and advanced analytics. “Many customers think they're on the cloud—but they're not truly leveraging it,” said El-Haggan. “We help them do more with their AWS investment.” With the merger, Pronetx is not only growing in capacity—it's expanding its focus. Together, the combined team will accelerate software development, build tools for CX teams, and help enterprises infuse generative AI into both front-end and back-office operations.

A Boutique Partner, Backed by Deep Tech Expertise
Unlike broad SIs, Pronetx operates as a boutique CX firm focused solely on Amazon Connect—a strategy that enables deeper specialization and faster time-to-value. “We're not generalists. We're laser-focused on customer experience, and that's what makes us an ideal partner—for enterprises and for SIs and GSIs,” said Marr. As one of AWS's launch partners for Amazon Q, Pronetx has already begun helping customers use agentic AI and natural language processing to deliver more intelligent, efficient, and personalized support.

CX Trends, Real-Time Data, and GenAI Readiness
One theme echoed throughout the podcast: AI won't work without great data. Marr emphasized that with the merger, the team now has expanded capability to understand, organize, and apply customer data to maximize GenAI performance. “It's impossible to succeed with GenAI without understanding your customer data. This merger gives us the team to do that at scale,” he added. With CX trends evolving fast—and customer expectations even faster—Pronetx is positioning itself as a partner of choice for cloud-first transformation.

A Platform Built on Experience
The announcement comes on the eighth anniversary of Amazon Connect, launched at Enterprise Connect 2017. El-Haggan, who helped lead that launch while at AWS, noted the full-circle moment. “Amazon Connect was born right here eight years ago. Now, we're taking it even further with this merger.”

Learn More
Visit pronetx.com

Techzine Talks
Gaat AI de software ontwikkelaar helpen of vervangen? (Will AI help or replace the software developer?)

Techzine Talks

Apr 14, 2025 · 34:24


In this episode of Techzine Talks we look at the AI models that can generate programming code. Will everyone soon be able to build software with AI, or is it no more than an assistant for developers who can already read and write code? There is a lot going on around AI tools and aids for developers, so it is time for an update. How good are AI models at generating code? The answer to that question undoubtedly depends on who you ask. All the major AI models (Anthropic, OpenAI, Gemini, Llama, GitHub Copilot, etc.) can generate code for virtually any programming language. In addition, there are AI tools such as Devin that claim you can generate complete applications with their AI. Last year version 1 was announced with great fanfare; they have since arrived at version 2, have had to overhaul their strategy and have made substantial changes. Finally, the term "vibe coding" is becoming popular: programming with the help of AI, where you ask the AI to build software, test the result, and then ask the AI again to optimize the code. The idea is that this way anyone can program. Yet there are caveats to this way of developing software. We dwell on those and also discuss how you can use vibe coding in a safe way. Listen to this episode of Techzine Talks to learn more about developing software with AI!

Shift AI Podcast
Speaking Your Language in Amazon Q Developer

Shift AI Podcast

Apr 9, 2025 · 1:44


EXCLUSIVE AWS ANNOUNCEMENT: Amazon Q Developer now speaks YOUR language!

AWS Podcast
#715: AWS News: Be your own data analyst with Amazon Q in Quicksight, and more

AWS Podcast

Apr 7, 2025 · 24:07


Hosts Simon and Jillian discuss how you can uncover hidden trends and make data-driven decisions - all through natural conversation, with Amazon Q in QuickSight, plus more of the latest updates from AWS. 00:00 - Intro, 00:22 - Top Stories, 02:50 - Analytics, 03:35 - Application Integrations, 04:48 - Amazon SageMaker, 05:29 - Amazon Bedrock Knowledge Bases, 05:48 - Amazon Polly, 06:46 - Amazon Bedrock, 07:31 - Amazon Bedrock Model Evaluation LLM, 08:29 - Business Application, 08:58 - Compute, 09:51 - Contact Centers, 10:54 - Containers, 11:12 - Database, 14:21 - Developer Tools, 15:20 - Front End Web and Mobile, 15:45 - Games, 16:04 - Management and Governance, 16:35 - Media Services, 16:47 - Network and Content Delivery, 19:39 - Security Identity and Compliance, 20:24 - Serverless, 21:48 - Storage, 22:43 - Wrap up. Show Notes: https://dqkop6u6q45rj.cloudfront.net/shownotes-20250404-184823.html

Dev Interrupted
Amazon Q and The Future of Autonomous Development | AWS' Adnan Ijaz

Dev Interrupted

Apr 1, 2025 · 42:38 · Transcription Available


AI is evolving at a breakneck speed, leaving engineering leaders with a critical dilemma: innovate or fall behind. But how do you experiment with AI without risking your credibility? Andrew Zigler sits down with Adnan Ijaz, Director of Product Management for Next Gen Developer Experience at AWS, to unpack the power of AI agents. Together they discuss how to leverage autonomous AI in your development workflow, and learn from real-world examples like Amazon Q. Dive into the evolving role of the developer and discover how to mentor your AI, not just use it. It's time to shift from task-oriented coding to strategic architecture, and this episode shows you how.

But first, co-host Dan Lines frames the conversation by discussing the shift towards measuring the concrete benefits of AI tools in development, rather than just their potential. Dan also provides examples of how to set realistic expectations for AI implementation by focusing on specific tasks and measuring both individual and workflow improvements, highlighting the need for overall workflow optimization.

Check out:
Translating DevEx to the Board
Beyond the DORA Frameworks
Introducing AI-Powered Code Review with gitStream

Follow the hosts:
Follow Ben
Follow Andrew

Follow today's guest(s):
Adnan Ijaz
Amazon Q Developer

Support the show:
Subscribe to our Substack
Leave us a review
Subscribe on YouTube
Follow us on Twitter or LinkedIn

Offers:
Learn about Continuous Merge with gitStream
Get your DORA Metrics free forever

Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)

In this conversation, Krish Palaniappan explores various AWS AI products and services, discussing their applications, features, and potential use cases. He emphasizes the importance of understanding these tools at a foundational level, especially for beginners in the AI space. The discussion covers specific AWS offerings like Amazon Q, SageMaker, and App Studio, as well as the significance of human review in machine learning through Augmented AI. The conversation aims to provide insights into navigating the complex landscape of AWS AI tools and their integration into business processes.

Snowpal Products:
Backends as Services on AWS Marketplace
Mobile Apps on App Store and Play Store
Web App
Education Platform for Learners and Course Creators
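The "human review in machine learning through Augmented AI" point lends itself to a brief sketch. The snippet below is a hedged example of the general pattern, assuming Amazon A2I via boto3 with a placeholder flow definition ARN and an arbitrary confidence threshold; the specifics are not from the episode.

```python
# Minimal sketch of human review with Amazon Augmented AI (A2I): start a human loop
# when a model prediction is low-confidence. The flow definition ARN, threshold, and
# payload fields are placeholders, not values from the episode.
import json
import uuid
import boto3

a2i = boto3.client("sagemaker-a2i-runtime", region_name="us-east-1")

FLOW_DEFINITION_ARN = "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/example"  # placeholder
CONFIDENCE_THRESHOLD = 0.80  # assumed cutoff for routing predictions to humans

def maybe_send_for_review(document_text: str, label: str, confidence: float) -> str | None:
    """Start an A2I human loop for predictions below the confidence threshold."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return None  # confident enough, no human review needed
    loop_name = f"review-{uuid.uuid4()}"
    a2i.start_human_loop(
        HumanLoopName=loop_name,
        FlowDefinitionArn=FLOW_DEFINITION_ARN,
        HumanLoopInput={
            "InputContent": json.dumps(
                {"text": document_text, "predictedLabel": label, "confidence": confidence}
            )
        },
    )
    return loop_name

# Example: a 0.55-confidence prediction gets routed to human reviewers.
loop = maybe_send_for_review("Invoice total: $1,234.00", "invoice", 0.55)
print("Human loop started:", loop)
```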

AWS for Software Companies Podcast
Ep084: Accelerating ISV Modernization: SoftServe's Six-Month Success Formula

AWS for Software Companies Podcast

Mar 17, 2025 · 23:28


Ruslan Kusov of SoftServe presents how their Application Modernization Framework accelerates ISV modernization, assesses legacy code, and delivers modernized applications through platform engineering principles.
Topics include:
Introduction of Ruslan Kusov, Cloud CoE Director at SoftServe
SoftServe builds code for top ISVs
Success case: accelerated security ISV modernization by six months
Healthcare tech company assessment: 1.6 million code lines in weeks
Business need: product development acceleration for competitive advantage
Business need: intelligent operations automation
Business need: ecosystem integration and "sizeification" to cloud
Business need: secure and compliant solutions
Business need: customer-centric platforms with personalized experiences
Business need: AWS marketplace integration
Distinguishing intentional from unintentional complexity
Platform engineering concept introduction
Self-service internal platforms for standardization
Applying platform engineering across teams (GenAI, CSO, etc.)
No one-size-fits-all approach to modernization
SAMP/SEMP framework introduction
Core components: EKS, ECS, or Lambda
Modular structure with interchangeable components
Case study: ISV switching from hardware to software products
Four-week MVP instead of planned ten weeks
Six-month full modernization versus planned twelve months
Assessment phase importance for business case development
Calculating cost of doing nothing during modernization decisions
Healthcare customer case: 1.6 million code lines assessed
Benefits: platform deployment in under 20 minutes
Benefits: 5x reduced assessment time
Benefits: 30% lower infrastructure costs
Benefits: 20% increased development productivity with GenAI
Integration with Amazon Q for developer productivity
Closing Q&A on security modernization and ongoing management
Participants: Ruslan Kusov – Cloud CoE Director, SoftServe
See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon/isv/

AWS Morning Brief
The AWS Chatbot Disappointment

AWS Morning Brief

Feb 17, 2025 · 6:31


AWS Morning Brief for the week of February 17, with Corey Quinn. Links:
Amazon DynamoDB now supports auto-approval of quota adjustments
Amazon Elastic Block Store (EBS) now adds full snapshot size information in Console and API
Amazon RDS for MySQL announces Extended Support minor 5.7.44-RDS.20250103
Amazon Redshift Serverless announces reduction in IP Address Requirements to 3 per Subnet
AWS Deadline Cloud now supports Adobe After Effects in Service-Managed Fleets
AWS Network Load Balancer now supports removing availability zones
AWS CloudTrail network activity events for VPC endpoints now generally available
Harness Amazon Bedrock Agents to Manage SAP Instances
Timestamp writes for write hedging in Amazon DynamoDB
Updating AWS SDK defaults – AWS STS service endpoint and Retry Strategy
Learning AWS best practices from Amazon Q in the Console
Automating Cost Optimization Governance with AWS Config
Amazon Q Developer in chat applications rename - Summary of changes - AWS Chatbot

Patoarchitekci
Short #63: Microsoft Hyperlight, Oracle vs JavaScript, Supply Chain Report, Re:Invent Sprostowanie

Patoarchitekci

Jan 31, 2025 · 41:25


The Patoarchitekci are back with a hot Short #63, where Microsoft Hyperlight sparks a discussion about the future of microVMs. Oracle tries to understand JavaScript, and we try to understand Oracle. In this episode we dive into the world of Microsoft HV and its millisecond-level performance. Special guest Wojtek Gawroński sheds new light on the announcements from AWS re:Invent, correcting our earlier theories. In addition, the Supply Chain report uncovers a shocking truth about outdated dependencies. If you want to know why your microVMs are not as micro as they should be, or what Amazon Q knows about your code, tune in right away. Hyperlight won't wait in the CNCF forever! Now, no slacking off!

The Six Five with Patrick Moorhead and Daniel Newman
Quantum is Here! Plus more on re:Invent and Data Protection - Six Five Webcast Infrastructure Matters

The Six Five with Patrick Moorhead and Daniel Newman

Dec 30, 2024 · 28:22


On this episode of the Six Five Webcast Infrastructure Matters, hosts Camberley Bates and Dion Hinchcliffe discuss takeaways from AWS re:Invent, developments in quantum computing, open source LLMs and more! Their discussion covers:
The American Society for AI and its discussions around open source LLMs
AWS re:Invent announcements such as S3 Enhancements, Amazon Q, Bedrock and more
Developments in quantum computing, including new quantum chips from Google and IBM

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Our next keynote covers The State of LLM Agents, with the triumphant return of Professor Graham Neubig's return to the pod (his ICLR episode here!). OpenDevin is now a startup known as AllHands! The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number 1 on the hardest SWE-Bench Full leaderboard at 29%, though on the smaller SWE-Bench Verified, they are at 53%, behind Amazon Q, devlo, and OpenAI's self reported o3 results at 71.7%.Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind and Anthropic setting their sights on consumer and coding agents, vision based computer-using agents and multi agent systems. There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and Codeium's Windsurf Cascade in the IDE arena, to the explosive revenue growth of Stackblitz's Bolt, Lovable, and Vercel's v0, and the unicorn rounds and high profile movements of customer support agents like Sierra (now worth $4 billion) and search agents like Perplexity (now worth $9 billion). We wanted to take a little step back to understand the most notable papers of the year in Agents, and Graham indulged with his list of 8 perennial problems in building agents in 2024.Must-Read Papers for the 8 Problems of Agents* The agent-computer interface: CodeAct: Executable Code Actions Elicit Better LLM Agents. 
Minimial viable tools: Execution Sandbox, File Editor, Web Browsing* The human-agent interface: Chat UI, GitHub Plugin, Remote runtime, …?* Choosing an LLM: See Evaluation of LLMs as Coding Agents on SWE-Bench at 30x - must understand instructions, tools, code, environment, error recovery* Planning: Single Agent Systems vs Multi Agent (CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration) - Explicit vs Implicit, Curated vs Generated* Reusable common workflows: SteP: Stacked LLM Policies for Web Actions and Agent Workflow Memory - Manual prompting vs Learning from Experience* Exploration: Agentless: Demystifying LLM-based Software Engineering Agents and BAGEL: Bootstrapping Agents by Guiding Exploration with Language* Search: Tree Search for Language Model Agents - explore paths and rewind* Evaluation: Fast Sanity Checks (miniWoB and Aider) and Highly Realistic (WebArena, SWE-Bench) and SWE-Gym: An Open Environment for Training Software Engineering Agents & VerifiersFull Talk on YouTubePlease like and subscribe!Timestamps* 00:00 Welcome to Latent Space Live at NeurIPS 2024* 00:29 State of LLM Agents in 2024* 02:20 Professor Graham Newbig's Insights on Agents* 03:57 Live Demo: Coding Agents in Action* 08:20 Designing Effective Agents* 14:13 Choosing the Right Language Model for Agents* 16:24 Planning and Workflow for Agents* 22:21 Evaluation and Future Predictions for Agents* 25:31 Future of Agent Development* 25:56 Human-Agent Interaction Challenges* 26:48 Expanding Agent Use Beyond Programming* 27:25 Redesigning Systems for Agent Efficiency* 28:03 Accelerating Progress with Agent Technology* 28:28 Call to Action for Open Source Contributions* 30:36 Q&A: Agent Performance and Benchmarks* 33:23 Q&A: Web Agents and Interaction Methods* 37:16 Q&A: Agent Architectures and Improvements* 43:09 Q&A: Self-Improving Agents and Authentication* 47:31 Live Demonstration and Closing RemarksTranscript[00:00:29] State of LLM Agents in 2024[00:00:29] Speaker 9: Our next keynote covers the state of LLM agents. With the triumphant return of Professor Graham Newbig of CMU and OpenDevon, now a startup known as AllHands. The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number one on the hardest SWE Benchful leaderboard at 29%.[00:00:53] Speaker 9: Though, on the smaller SWE bench verified, they are at 53 percent behind Amazon Q [00:01:00] Devlo and OpenAI's self reported O3 results at 71. 7%. Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind, and Anthropic setting their sights on consumer and coding agents. Vision based computer using agents and multi agent systems.[00:01:22] Speaker 9: There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devon this year, to the sleeper hit of Cursor Composer and recent guest Codium's Windsurf Cascade in the IDE arena. To the explosive revenue growth of recent guests StackBlitz's Bolt, Lovable, and Vercel's vZero.[00:01:44] Speaker 9: And the unicorn rounds and high profile movements of customer support agents like Sierra, now worth 4 billion, and search agents like Perplexity, now worth 9 billion. 
We wanted to take a little step back to understand the most notable papers of the year in [00:02:00] agents, and Graham indulged with his list of eight perennial problems in building agents.[00:02:06] Speaker 9: As always, don't forget to check our show notes for all the selected best papers of 2024, and for the YouTube link to their talk. Graham's slides were especially popular online, and we are honoured to have him. Watch out and take care![00:02:20] Professor Graham Newbig's Insights on Agents[00:02:20] Speaker: Okay hi everyone. So I was given the task of talking about agents in 2024, and this is An impossible task because there are so many agents, so many agents in 2024. So this is going to be strongly covered by like my personal experience and what I think is interesting and important, but I think it's an important topic.[00:02:41] Speaker: So let's go ahead. So the first thing I'd like to think about is let's say I gave you you know, a highly competent human, some tools. Let's say I gave you a web browser and a terminal or a file system. And the ability to [00:03:00] edit text or code. What could you do with that? Everything. Yeah.[00:03:07] Speaker: Probably a lot of things. This is like 99 percent of my, you know, daily daily life, I guess. When I'm, when I'm working. So, I think this is a pretty powerful tool set, and I am trying to do, and what I think some other people are trying to do, is come up with agents that are able to, you know, manipulate these things.[00:03:26] Speaker: Web browsing, coding, running code in successful ways. So there was a little bit about my profile. I'm a professor at CMU, chief scientist at All Hands AI, building open source coding agents. I'm maintainer of OpenHands, which is an open source coding agent framework. And I'm also a software developer and I, I like doing lots of coding and, and, you know, shipping new features and stuff like this.[00:03:51] Speaker: So building agents that help me to do this, you know, is kind of an interesting thing, very close to me.[00:03:57] Live Demo: Coding Agents in Action[00:03:57] Speaker: So the first thing I'd like to do is I'd like to try [00:04:00] some things that I haven't actually tried before. If anybody has, you know, tried to give a live demo, you know, this is, you know very, very scary whenever you do it and it might not work.[00:04:09] Speaker: So it might not work this time either. But I want to show you like three things that I typically do with coding agents in my everyday work. I use coding agents maybe five to 10 times a day to help me solve my own problems. And so this is a first one. This is a data science task. Which says I want to create scatter plots that show the increase of the SWE bench score over time.[00:04:34] Speaker: And so I, I wrote a kind of concrete prompt about this. Agents work better with like somewhat concrete prompts. And I'm gonna throw this into open hands and let it work. And I'll, I'll go back to that in a second. Another thing that I do is I create new software. And I, I've been using a [00:05:00] service a particular service.[00:05:01] Speaker: I won't name it for sending emails and I'm not very happy with it. So I want to switch over to this new service called resend. com, which makes it easier to send emails. And so I'm going to ask it to read the docs for the resend. com API and come up with a script that allows me to send emails. 
The input to the script should be a CSV file and the subject and body should be provided in Jinja2 templates.[00:05:24] Speaker: So I'll start another agent and and try to get it to do that for me.[00:05:35] Speaker: And let's go with the last one. The last one I do is. This is improving existing software and in order, you know, once you write software, you usually don't throw it away. You go in and, like, actually improve it iteratively. This software that I have is something I created without writing any code.[00:05:52] Speaker: It's basically software to monitor how much our our agents are contributing to the OpenHance repository. [00:06:00] And on the, let me make that a little bit bigger, on the left side, I have the number of issues where it like sent a pull request. I have the number of issues where it like sent a pull request, whether it was merged in purple, closed in red, or is still open in green. And so these are like, you know, it's helping us monitor, but one thing it doesn't tell me is the total number. And I kind of want that feature added to this software.[00:06:33] Speaker: So I'm going to try to add that too. So. I'll take this, I'll take this prompt,[00:06:46] Speaker: and here I want to open up specifically that GitHub repo. So I'll open up that repo and paste in the prompt asking it. I asked it to make a pie chart for each of these and give me the total over the entire time period that I'm [00:07:00] monitoring. So we'll do that. And so now I have let's see, I have some agents.[00:07:05] Speaker: Oh, this one already finished. Let's see. So this one already finished. You can see it finished analyzing the Swebench repository. It wrote a demonstration of, yeah, I'm trying to do that now, actually.[00:07:30] Speaker: It wrote a demonstration of how much each of the systems have improved over time. And I asked it to label the top three for each of the data sets. And so it labeled OpenHands as being the best one for SWE Bench Normal. For SWE Bench Verified, it has like the Amazon QAgent and OpenHands. For the SWE Bench Lite, it has three here over three over here.[00:07:53] Speaker: So you can see like. That's pretty useful, right? If you're a researcher, you do data analysis all the time. I did it while I was talking to all [00:08:00] of you and making a presentation. So that's, that's pretty nice. I, I doubt the other two are finished yet. That would be impressive if the, yeah. So I think they're still working.[00:08:09] Speaker: So maybe we'll get back to them at the end of the presentation. But so these are the kinds of the, these are the kinds of things that I do every day with coding agents now. And it's or software development agents. It's pretty impressive.[00:08:20] Designing Effective Agents[00:08:20] Speaker: The next thing I'd like to talk about a little bit is things I worry about when designing agents.[00:08:24] Speaker: So we're designing agents to, you know, do a very difficult task of like navigating websites writing code, other things like this. And within 2024, there's been like a huge improvement in the methodology that we use to do this. But there's a bunch of things we think about. There's a bunch of interesting papers, and I'd like to introduce a few of them.[00:08:46] Speaker: So the first thing I worry about is the agent computer interface. Like, how do we get an agent to interact with computers? And, How do we provide agents with the tools to do the job? 
And [00:09:00] within OpenHands we are doing the thing on the right, but there's also a lot of agents that do the thing on the left.[00:09:05] Speaker: So the thing on the left is you give like agents kind of granular tools. You give them tools like or let's say your instruction is I want to determine the most cost effective country to purchase the smartphone model, Kodak one the countries to consider are the USA, Japan, Germany, and India. And you have a bunch of available APIs.[00:09:26] Speaker: And. So what you do for some agents is you provide them all of these tools APIs as tools that they can call. And so in this particular case in order to solve this problem, you'd have to make about like 30 tool calls, right? You'd have to call lookup rates for Germany, you'd have to look it up for the US, Japan, and India.[00:09:44] Speaker: That's four tool goals. And then you go through and do all of these things separately. And the method that we adopt in OpenHands instead is we provide these tools, but we provide them by just giving a coding agent, the ability to call [00:10:00] arbitrary Python code. And. In the arbitrary Python code, it can call these tools.[00:10:05] Speaker: We expose these tools as APIs that the model can call. And what that allows us to do is instead of writing 20 tool calls, making 20 LLM calls, you write a program that runs all of these all at once, and it gets the result. And of course it can execute that program. It can, you know, make a mistake. It can get errors back and fix things.[00:10:23] Speaker: But that makes our job a lot easier. And this has been really like instrumental to our success, I think. Another part of this is what tools does the agent need? And I, I think this depends on your use case, we're kind of extreme and we're only giving the agent five tools or maybe six tools.[00:10:40] Speaker: And what, what are they? The first one is program execution. So it can execute bash programs, and it can execute Jupyter notebooks. It can execute cells in Jupyter notebooks. So that, those are two tools. Another one is a file editing tool. And the file editing tool allows you to browse parts of files.[00:11:00][00:11:00] Speaker: And kind of read them, overwrite them, other stuff like this. And then we have another global search and replace tool. So it's actually two tools for file editing. And then a final one is web browsing, web browsing. I'm kind of cheating when I call it only one tool. You actually have like scroll and text input and click and other stuff like that.[00:11:18] Speaker: But these are basically the only things we allow the agent to do. What, then the question is, like, what if we wanted to allow it to do something else? And the answer is, well, you know, human programmers already have a bunch of things that they use. They have the requests PyPy library, they have the PDF to text PyPy library, they have, like, all these other libraries in the Python ecosystem that they could use.[00:11:41] Speaker: And so if we provide a coding agent with all these libraries, it can do things like data visualization and other stuff that I just showed you. So it can also get clone repositories and, and other things like this. The agents are super good at using the GitHub API also. So they can do, you know, things on GitHub, like finding all of the, you know, [00:12:00] comments on your issues or checking GitHub actions and stuff.[00:12:02] Speaker: The second thing I think about is the human agent interface. 
So this is like how do we get humans to interact with agents? Bye. I already showed you one variety of our human agent interface. It's basically a chat window where you can browse through the agent's results and things like this. This is very, very difficult.[00:12:18] Speaker: I, I don't think anybody has a good answer to this, and I don't think we have a good answer to this, but the, the guiding principles that I'm trying to follow are we want to present enough info to the user. So we want to present them with, you know, what the agent is doing in the form of a kind of.[00:12:36] Speaker: English descriptions. So you can see here you can see here every time it takes an action, it says like, I will help you create a script for sending emails. When it runs a bash command. Sorry, that's a little small. When it runs a bash command, it will say ran a bash command. It won't actually show you the whole bash command or the whole Jupyter notebook because it can be really large, but you can open it up and see if you [00:13:00] want to, by clicking on this.[00:13:01] Speaker: So like if you want to explore more, you can click over to the Jupyter notebook and see what's displayed in the Jupyter notebook. And you get like lots and lots of information. So that's one thing.[00:13:16] Speaker: Another thing is go where the user is. So like if the user's already interacting in a particular setting then I'd like to, you know, integrate into that setting, but only to a point. So at OpenHands, we have a chat UI for interaction. We have a GitHub plugin for tagging and resolving issues. So basically what you do is you Do at open hands agent and the open hands agent will like see that comment and be able to go in and fix things.[00:13:42] Speaker: So if you say at open hands agent tests are failing on this PR, please fix the tests. It will go in and fix the test for you and stuff like this. Another thing we have is a remote runtime for launching headless jobs. So if you want to launch like a fleet of agents to solve, you know five different problems at once, you can also do [00:14:00] that through an API.[00:14:00] Speaker: So we have we have these interfaces and this probably depends on the use case. So like, depending if you're a coding agent, you want to do things one way. If you're a like insurance auditing agent, you'll want to do things other ways, obviously.[00:14:13] Choosing the Right Language Model for Agents[00:14:13] Speaker: Another thing I think about a lot is choosing a language model.[00:14:16] Speaker: And for agentic LMs we have to have a bunch of things work really well. The first thing is really, really good instruction following ability. And if you have really good instruction following ability, it opens up like a ton of possible applications for you. Tool use and coding ability. So if you provide tools, it needs to be able to use them well.[00:14:38] Speaker: Environment understanding. So it needs, like, if you're building a web agent, it needs to be able to understand web pages either through vision or through text. And error awareness and recovery ability. So, if it makes a mistake, it needs to be able to, you know, figure out why it made a mistake, come up with alternative strategies, and other things like this.[00:14:58] Speaker: [00:15:00] Under the hood, in all of the demos that I did now Cloud, we're using Cloud. Cloud has all of these abilities very good, not perfect, but very good. Most others don't have these abilities quite as much. So like GPT 4. 
0 doesn't have very good error recovery ability. And so because of this, it will go into loops and do the same thing over and over and over again.[00:15:22] Speaker: Whereas Claude does not do this. Claude, if you, if you use the agents enough, you get used to their kind of like personality. And Claude says, Hmm, let me try a different approach a lot. So, you know, obviously it's been trained in some way to, you know, elicit this ability. We did an evaluation. This is old.[00:15:40] Speaker: And we need to update this basically, but we evaluated CLOD, mini LLAMA 405B, DeepSeq 2. 5 on being a good code agent within our framework. And CLOD was kind of head and shoulders above the rest. GPT 40 was kind of okay. The best open source model was LLAMA [00:16:00] 3. 1 405B. This needs to be updated because this is like a few months old by now and, you know, things are moving really, really fast.[00:16:05] Speaker: But I still am under the impression that Claude is the best. The other closed models are, you know, not quite as good. And then the open models are a little bit behind that. Grok, I, we haven't tried Grok at all, actually. So, it's a good question. If you want to try it I'd be happy to help.[00:16:24] Speaker: Cool.[00:16:24] Planning and Workflow for Agents[00:16:24] Speaker: Another thing is planning. And so there's a few considerations for planning. The first one is whether you have a curated plan or you have it generated on the fly. And so for solving GitHub issues, you can kind of have an overall plan. Like the plan is first reproduce. If there's an issue, first write tests to reproduce the issue or to demonstrate the issue.[00:16:50] Speaker: After that, run the tests and make sure they fail. Then go in and fix the tests. Run the tests again to make sure they pass and then you're done. So that's like a pretty good workflow [00:17:00] for like solving coding issues. And you could curate that ahead of time. Another option is to let the language model basically generate its own plan.[00:17:10] Speaker: And both of these are perfectly valid. Another one is explicit structure versus implicit structure. So let's say you generate a plan. If you have explicit structure, you could like write a multi agent system, and the multi agent system would have your reproducer agent, and then it would have your your bug your test writer agent, and your bug fixer agent, and lots of different agents, and you would explicitly write this all out in code, and then then use it that way.[00:17:38] Speaker: On the other hand, you could just provide a prompt that says, please do all of these things in order. So in OpenHands, we do very light planning. We have a single prompt. We don't have any multi agent systems. But we do provide, like, instructions about, like, what to do first, what to do next, and other things like this.[00:17:56] Speaker: I'm not against doing it the other way. But I laid [00:18:00] out some kind of justification for this in this blog called Don't Sleep on Single Agent Systems. And the basic idea behind this is if you have a really, really good instruction following agent it will follow the instructions as long as things are working according to your plan.[00:18:14] Speaker: But let's say you need to deviate from your plan, you still have the flexibility to do this. And if you do explicit structure through a multi agent system, it becomes a lot harder to do that. Like, you get stuck when things deviate from your plan. 
There's also some other examples, and I wanted to introduce a few papers.[00:18:30] Speaker: So one paper I liked recently is this paper called CoAct where you generate plans and then go in and fix them. And so the basic idea is like, if you need to deviate from your plan, you can You know, figure out that your plan was not working and go back and deviate from it.[00:18:49] Speaker: Another thing I think about a lot is specifying common workflows. So we're trying to tackle a software development and I already showed like three use cases where we do [00:19:00] software development and when we. We do software development, we do a ton of different things, but we do them over and over and over again.[00:19:08] Speaker: So just to give an example we fix GitHub actions when GitHub actions are failing. And we do that over and over and over again. That's not the number one thing that software engineers do, but it's a, you know, high up on the list. So how can we get a list of all of, like, the workflows that people are working on?[00:19:26] Speaker: And there's a few research works that people have done in this direction. One example is manual prompting. So there's this nice paper called STEP that got state of the art on the WebArena Web Navigation Benchmark where they came up with a bunch of manual workflows for solving different web navigation tasks.[00:19:43] Speaker: And we also have a paper recently called Agent Workflow Memory where the basic idea behind this is we want to create self improving agents that learn from their past successes. And the way it works is is we have a memory that has an example of lots of the previous [00:20:00] workflows that people have used. And every time the agent finishes a task and it self judges that it did a good job at that task, you take that task, you break it down into individual workflows included in that, and then you put it back in the prompt for the agent to work next time.[00:20:16] Speaker: And this we demonstrated that this leads to a 22. 5 percent increase on WebArena after 40 examples. So that's a pretty, you know, huge increase by kind of self learning and self improvement.[00:20:31] Speaker: Another thing is exploration. Oops. And one thing I think about is like, how can agents learn more about their environment before acting? And I work on coding and web agents, and there's, you know, a few good examples of this in, in both areas. Within coding, I view this as like repository understanding, understanding the code base that you're dealing with.[00:20:55] Speaker: And there's an example of this, or a couple examples of this, one example being AgentList. [00:21:00] Where they basically create a map of the repo and based on the map of the repo, they feed that into the agent so the agent can then navigate the repo and and better know where things are. And for web agents there's an example of a paper called Bagel, and basically what they do is they have the agent just do random tasks on a website, explore the website, better understand the structure of the website, and then after that they they feed that in as part of the product.[00:21:27] Speaker: Part seven is search. Right now in open hands, we just let the agent go on a linear search path. So it's just solving the problem once. 
We're using a good agent that can kind of like recover from errors and try alternative things when things are not working properly, but still we only have a linear search path.[00:21:45] Speaker: But there's also some nice work in 2024 that is about exploring multiple paths. So one example of this is there's a paper called Tree Search for Language Agents. And they basically expand multiple paths check whether the paths are going well, [00:22:00] and if they aren't going well, you rewind back. And on the web, this is kind of tricky, because, like, how do you rewind when you accidentally ordered something you don't want on Amazon?[00:22:09] Speaker: It's kind of, you know, not, not the easiest thing to do. For code, it's a little bit easier, because you can just revert any changes that you made. But I, I think that's an interesting topic, too.[00:22:21] Evaluation and Future Predictions for Agents[00:22:21] Speaker: And then finally evaluation. So within our development for evaluation, we want to do a number of things. The first one is fast sanity checks.[00:22:30] Speaker: And in order to do this, we want things we can run really fast, really really cheaply. So for web, we have something called mini world of bits, which is basically these trivial kind of web navigation things. We have something called the Adder Code Editing Benchmark, where it's just about editing individual files that we use.[00:22:48] Speaker: But we also want highly realistic evaluation. So for the web, we have something called WebArena that we created at CMU. This is web navigation on real real open source websites. So it's open source [00:23:00] websites that are actually used to serve shops or like bulletin boards or other things like this.[00:23:07] Speaker: And for code, we use Swebench, which I think a lot of people may have heard of. It's basically a coding benchmark that comes from real world pull requests on GitHub. So if you can solve those, you can also probably solve other real world pull requests. I would say we still don't have benchmarks for the fur full versatility of agents.[00:23:25] Speaker: So, for example We don't have benchmarks that test whether agents can code and do web navigation. But we're working on that and hoping to release something in the next week or two. So if that sounds interesting to you, come talk to me and I, I will tell you more about it.[00:23:42] Speaker: Cool. So I don't like making predictions, but I was told that I should be somewhat controversial, I guess, so I will, I will try to do it try to do it anyway, although maybe none of these will be very controversial. Um, the first thing is agent oriented LLMs like large language models for [00:24:00] agents.[00:24:00] Speaker: My, my prediction is every large LM trainer will be focusing on training models as agents. So every large language model will be a better agent model by mid 2025. Competition will increase, prices will go down, smaller models will become competitive as agents. So right now, actually agents are somewhat expensive to run in some cases, but I expect that that won't last six months.[00:24:23] Speaker: I, I bet we'll have much better agent models in six months. Another thing is instruction following ability, specifically in agentic contexts, will increase. And what that means is we'll have to do less manual engineering of agentic workflows and be able to do more by just prompting agents in more complex ways.[00:24:44] Speaker: Cloud is already really good at this. 
It's not perfect, but it's already really, really good. And I expect the other models will catch up to Cloud pretty soon. Error correction ability will increase, less getting stuck in loops. Again, this is something that Cloud's already pretty good at and I expect the others will, will follow.[00:25:00][00:25:01] Speaker: Agent benchmarks. Agent benchmarks will start saturating.[00:25:05] Speaker: And Swebench I think WebArena is already too easy. It, it is, it's not super easy, but it's already a bit too easy because the tasks we do in there are ones that take like two minutes for a human. So not, not too hard. And kind of historically in 2023 our benchmarks were too easy. So we built harder benchmarks like WebArena and Swebench were both built in 2023.[00:25:31] Future of Agent Development[00:25:31] Speaker: In 2024, our agents were too bad, so we built agents and now we're building better agents. In 2025, our benchmarks will be too easy, so we'll build better benchmarks, I'm, I'm guessing. So, I would expect to see much more challenging agent benchmarks come out, and we're already seeing some of them.[00:25:49] Speaker: In 2026, I don't know. I didn't write AGI, but we'll, we'll, we'll see.[00:25:56] Human-Agent Interaction Challenges[00:25:56] Speaker: Then the human agent computer interface. I think one thing that [00:26:00] we'll want to think about is what do we do at 75 percent success rate at things that we like actually care about? Right now we have 53 percent or 55 percent on Swebench verified, which is real world GitHub PRs.[00:26:16] Speaker: My impression is that the actual. Actual ability of models is maybe closer to 30 to 40%. So 30 to 40 percent of the things that I want an agent to solve on my own repos, it just solves without any human intervention. 80 to 90 percent it can solve without me opening an IDE. But I need to give it feedback.[00:26:36] Speaker: So how do we, how do we make that interaction smooth so that humans can audit? The work of agents that are really, really good, but not perfect is going to be a big challenge.[00:26:48] Expanding Agent Use Beyond Programming[00:26:48] Speaker: How can we expose the power of programming agents to other industries? So like as programmers, I think not all of us are using agents every day in our programming, although we probably will be [00:27:00] in in months or maybe a year.[00:27:02] Speaker: But I, I think it will come very naturally to us as programmers because we know code. We know, you know. Like how to architect software and stuff like that. So I think the question is how do we put this in the hands of like a lawyer or a chemist or somebody else and have them also be able to, you know, interact with it as naturally as we can.[00:27:25] Redesigning Systems for Agent Efficiency[00:27:25] Speaker: Another interesting thing is how can we redesign our existing systems for agents? So we had a paper on API based web agents, and basically what we showed is If you take a web agent and the agent interacts not with a website, but with APIs, the accuracy goes way up just because APIs are way easier to interact with.[00:27:42] Speaker: And in fact, like when I ask the, well, our agent, our agent is able to browse websites, but whenever I want it to interact with GitHub, I tell it do not browse the GitHub website. Use the GitHub API because it's way more successful at doing that. 
So maybe, you know, every website is going to need to have [00:28:00] an API because we're going to be having agents interact with them.[00:28:03] Accelerating Progress with Agent Technology[00:28:03] Speaker: About progress, I think progress will get faster. It's already fast. A lot of people are already overwhelmed, but I think it will continue. The reason why is agents are building agents. And better agents will build better agents faster. So I expect that you know, if you haven't interacted with a coding agent yet, it's pretty magical, like the stuff that it can do.[00:28:24] Speaker: So yeah.[00:28:28] Call to Action for Open Source Contributions[00:28:28] Speaker: And I have a call to action. I'm honestly, like I've been working on, you know, natural language processing and, and Language models for what, 15 years now. And even for me, it's pretty impressive what like AI agents powered by strong language models can do. On the other hand, I believe that we should really make these powerful tools accessible.[00:28:49] Speaker: And what I mean by this is I don't think like, you know, We, we should have these be opaque or limited to only a set, a certain set of people. I feel like they should be [00:29:00] affordable. They shouldn't be increasing the, you know, difference in the amount of power that people have. If anything, I'd really like them to kind of make it It's possible for people who weren't able to do things before to be able to do them well.[00:29:13] Speaker: Open source is one way to do that. That's why I'm working on open source. There are other ways to do that. You know, make things cheap, make things you know, so you can serve them to people who aren't able to afford them. Easily, like Duolingo is one example where they get all the people in the US to pay them 20 a month so that they can give all the people in South America free, you know, language education, so they can learn English and become, you know like, and become, you know, More attractive on the job market, for instance.[00:29:41] Speaker: And so I think we can all think of ways that we can do that sort of thing. And if that resonates with you, please contribute. Of course, I'd be happy if you contribute to OpenHands and use it. But another way you can do that is just use open source solutions, contribute to them, research with them, and train strong open source [00:30:00] models.[00:30:00] Speaker: So I see, you know, Some people in the room who are already training models. It'd be great if you could train models for coding agents and make them cheap. And yeah yeah, please. I, I was thinking about you among others. So yeah, that's all I have. Thanks.[00:30:20] Speaker 2: Slight, slightly controversial. Tick is probably the nicest way to say hot ticks. Any hot ticks questions, actual hot ticks?[00:30:31] Speaker: Oh, I can also show the other agents that were working, if anybody's interested, but yeah, sorry, go ahead.[00:30:36] Q&A: Agent Performance and Benchmarks[00:30:36] Speaker 3: Yeah, I have a couple of questions. So they're kind of paired, maybe. The first thing is that you said that You're estimating that your your agent is successfully resolving like something like 30 to 40 percent of your issues, but that's like below what you saw in Swebench.[00:30:52] Speaker 3: So I guess I'm wondering where that discrepancy is coming from. 
And then I guess my other second question, which is maybe broader in scope is that [00:31:00] like, if, if you think of an agent as like a junior developer, and I say, go do something, then I expect maybe tomorrow to get a Slack message being like, Hey, I ran into this issue.[00:31:10] Speaker 3: How can I resolve it? And, and, like you said, your agent is, like, successfully solving, like, 90 percent of issues where you give it direct feedback. So, are you thinking about how to get the agent to reach out to, like, for, for planning when it's, when it's stuck or something like that? Or, like, identify when it runs into a hole like that?[00:31:30] Speaker: Yeah, so great. These are great questions. Oh,[00:31:32] Speaker 3: sorry. The third question, which is a good, so this is the first two. And if so, are you going to add a benchmark for that second question?[00:31:40] Speaker: Okay. Great. Yeah. Great questions. Okay. So the first question was why do I think it's resolving less than 50 percent of the issues on Swebench?[00:31:48] Speaker: So first Swebench is on popular open source repos, and all of these popular open source repos were included in the training data for all of the language models. And so the language [00:32:00] models already know these repos. In some cases, the language models already know the individual issues in Swebench.[00:32:06] Speaker: So basically, like, some of the training data has leaked. And so it, it definitely will overestimate with respect to that. I don't think it's like, you know, Horribly, horribly off but I think, you know, it's boosting the accuracy by a little bit. So, maybe that's the biggest reason why. In terms of asking for help, and whether we're benchmarking asking for help yes we are.[00:32:29] Speaker: So one one thing we're working on now, which we're hoping to put out soon, is we we basically made SuperVig. Sweep edge issues. Like I'm having a, I'm having a problem with the matrix multiply. Please help. Because these are like, if anybody's run a popular open source, like framework, these are what half your issues are.[00:32:49] Speaker: You're like users show up and say like, my screen doesn't work. What, what's wrong or something. And so then you need to ask them questions and how to reproduce. So yeah, we're, we're, we're working on [00:33:00] that. I think. It, my impression is that agents are not very good at asking for help, even Claude. So like when, when they ask for help, they'll ask for help when they don't need it.[00:33:11] Speaker: And then won't ask for help when they do need it. So this is definitely like an issue, I think.[00:33:20] Speaker 4: Thanks for the great talk. I also have two questions.[00:33:23] Q&A: Web Agents and Interaction Methods[00:33:23] Speaker 4: It's first one can you talk a bit more about how the web agent interacts with So is there a VLM that looks at the web page layout and then you parse the HTML and select which buttons to click on? And if so do you think there's a future where there's like, so I work at Bing Microsoft AI.[00:33:41] Speaker 4: Do you think there's a future where the same web index, but there's an agent friendly web index where all the processing is done offline so that you don't need to spend time. Cleaning up, like, cleaning up these TML and figuring out what to click online. And any thoughts on, thoughts on that?[00:33:57] Speaker: Yeah, so great question. There's a lot of work on web [00:34:00] agents. 
I didn't go into, like, all of the details, but I think there's three main ways that agents interact with websites. The first way is the simplest way and the newest way, but it doesn't work very well, which is you take a screenshot of the website and then you click on a particular pixel value on the website.[00:34:23] Speaker: And like, models are not very good at that at the moment. Like they'll misclick. There was this thing about how Claude computer use started looking at pictures of Yellowstone National Park or something like this. I don't know if you heard about this anecdote, but people were like, oh, it's so human, it's looking for vacation.[00:34:40] Speaker: And it was like, no, it probably just misclicked on the wrong pixels and accidentally clicked on an ad. So that's the simplest way. The second simplest way: you take the HTML and you basically identify elements in the HTML. You don't use any vision whatsoever. And then you say, okay, I want to click on this element.[00:34:59] Speaker: I want to enter text [00:35:00] in this element or something like that. But HTML is too huge. So it usually gets condensed down into something called an accessibility tree, which was made for screen readers for visually impaired people. And so that's another way. And then the third way is kind of a hybrid where you present the screenshot, but you also present a textual summary of the output.[00:35:18] Speaker: And that's the one that I think will probably work best. What we're using is just text at the moment. And that's just an implementation issue, we haven't implemented the visual stuff yet, but that's something we're working on now. Another thing that I should point out is we actually have two modalities for web browsing.[00:35:35] Speaker: Very recently we implemented this. And the reason why is because if you want to interact with full websites you will need to click on all of the elements or have the ability to click on all of the elements. But most of our work that we need websites for is just web browsing and gathering information.[00:35:50] Speaker: So we have another modality where we convert all of it to markdown, because that's way more concise and easier for the agent to deal with. And then [00:36:00] can we create an index specifically for agents, maybe a markdown index or something like that would be, you know, would make sense. Oh, how would I make a successor to SWE-Bench?[00:36:10] Speaker: So I mean, the first thing is there's LiveCodeBench, which is basically continuously updating to make sure it doesn't leak into language model training data. That's easy to do for SWE-Bench because it comes from real websites and those real websites are getting new issues all the time.[00:36:27] Speaker: So you could just do it on the same benchmarks that they have there. There's also a pretty large number of things covering various coding tasks. So like, for example, SWE-Bench is mainly fixing issues, but there's also documentation, there's generating tests that actually test the functionality that you want.[00:36:47] Speaker: And there was a paper by a student at CMU on generating tests and stuff like that. So I feel like
Swebench is one piece of the puzzle, but you could also have like 10 different other tasks and then you could have like a composite [00:37:00] benchmark where you test all of these abilities, not just that particular one.[00:37:04] Speaker: Well, lots, lots of other things too, but[00:37:11] Speaker 2: Question from across. Use your mic, it will help. Um,[00:37:15] Speaker 5: Great talk. Thank you.[00:37:16] Q&A: Agent Architectures and Improvements[00:37:16] Speaker 5: My question is about your experience designing agent architectures. Specifically how much do you have to separate concerns in terms of tasks specific agents versus having one agent to do three or five things with a gigantic prompt with conditional paths and so on.[00:37:35] Speaker: Yeah, so that's a great question. So we have a basic coding and browsing agent. And I won't say basic, like it's a good, you know, it's a good agent, but it does coding and browsing. And it has instructions about how to do coding and browsing. That is enough for most things. Especially given a strong language model that has a lot of background knowledge about how to solve different types of tasks and how to use different APIs and stuff like that.[00:37:58] Speaker: We do have [00:38:00] a mechanism for something called micro agents. And micro agents are basically something that gets added to the prompt when a trigger is triggered. Right now it's very, very rudimentary. It's like if you detect the word GitHub anywhere, you get instructions about how to interact with GitHub, like use the API and don't browse.[00:38:17] Speaker: Also another one that I just added is for NPM, the like JavaScript package manager. And NPM, when it runs and it hits a failure, it Like hits in interactive terminals where it says, would you like to quit? Yep. Enter yes. And if that does it, it like stalls our agent for the time out until like two minutes.[00:38:36] Speaker: So like I added a new microagent whenever it started using NPM, it would Like get instructions about how to not use interactive terminal and stuff like that. So that's our current solution. Honestly, I like it a lot. It's simple. It's easy to maintain. It works really well and stuff like that. But I think there is a world where you would want something more complex than that.[00:38:55] Speaker 5: Got it. Thank you.[00:38:59] Speaker 6: I got a [00:39:00] question about MCP. I feel like this is the Anthropic Model Context Protocol. It seems like the most successful type of this, like, standardization of interactions between computers and agents. Are you guys adopting it? Is there any other competing standard?[00:39:16] Speaker 6: Anything, anything thought about it?[00:39:17] Speaker: Yeah, I think the Anth, so the Anthropic MCP is like, a way to It, it's essentially a collection of APIs that you can use to interact with different things on the internet. I, I think it's not a bad idea, but it, it's like, there's a few things that bug me a little bit about it.[00:39:40] Speaker: It's like we already have an API for GitHub, so why do we need an MCP for GitHub? Right. You know, like GitHub has an API, the GitHub API is evolving. We can look up the GitHub API documentation. So it seems like kind of duplicated a little bit. And also they have a setting where [00:40:00] it's like you have to spin up a server to serve your GitHub stuff.[00:40:04] Speaker: And you have to spin up a server to serve your like, you know, other stuff. 
And so I think it makes, it makes sense if you really care about like separation of concerns and security and like other things like this, but right now we haven't seen, we haven't seen that. To have a lot more value than interacting directly with the tools that are already provided.[00:40:26] Speaker: And that kind of goes into my general philosophy, which is we're already developing things for programmers. You know,[00:40:36] Speaker: how is an agent different than from a programmer? And it is different, obviously, you know, like agents are different from programmers, but they're not that different at this point. So we can kind of interact with the interfaces we create for, for programmers. Yeah. I might change my mind later though.[00:40:51] Speaker: So we'll see.[00:40:54] Speaker 7: Yeah. Hi. Thanks. Very interesting talk. You were saying that the agents you have right now [00:41:00] solve like maybe 30 percent of your, your issues out of the gate. I'm curious of the things that it doesn't do. Is there like a pattern that you observe? Like, Oh, like these are the sorts of things that it just seems to really struggle with, or is it just seemingly random?[00:41:15] Speaker: It's definitely not random. It's like, if you think it's more complex than it's. Like, just intuitively, it's more likely to fail. I've gotten a bit better at prompting also, so like, just to give an example it, it will sometimes fail to fix a GitHub workflow because it will not look at the GitHub workflow and understand what the GitHub workflow is doing before it solves the problem.[00:41:43] Speaker: So I, I think actually probably the biggest thing that it fails at is, um, er, that our, our agent plus Claude fails at is insufficient information gathering before trying to solve the task. And so if you provide all, if you provide instructions that it should do information [00:42:00] gathering beforehand, it tends to do well.[00:42:01] Speaker: If you don't provide sufficient instructions, it will try to solve the task without, like, fully understanding the task first, and then fail, and then you need to go back and give feedback. You know, additional feedback. Another example, like, I, I love this example. While I was developing the the monitor website that I, I showed here, we hit a really tricky bug where it was writing out a cache file to a different directory than it was reading the cache file from.[00:42:26] Speaker: And I had no idea what to do. I had no idea what was going on. I, I thought the bug was in a different part of the code, but what I asked it to do was come up with five possible reasons why this could be failing and decreasing order of likelihood and examine all of them. And that worked and it could just go in and like do that.[00:42:44] Speaker: So like I think a certain level of like scaffolding about like how it should sufficiently Gather all the information that's necessary in order to solve a task is like, if that's missing, then that's probably the biggest failure point at the moment. [00:43:00][00:43:01] Speaker 7: Thanks.[00:43:01] Speaker 6: Yeah.[00:43:06] Speaker 6: I'm just, I'm just using this as a chance to ask you all my questions.[00:43:09] Q&A: Self-Improving Agents and Authentication[00:43:09] Speaker 6: You had a, you had a slide on here about like self improving agents or something like that with memory. It's like a really throwaway slide for like a super powerful idea. It got me thinking about how I would do it. 
I have no idea how.[00:43:21] Speaker 6: So I just wanted you to chain a thought more on this.[00:43:25] Speaker: Yeah, self, self improving. So I think the biggest reason, like the simplest possible way to create a self improving agent. The problem with that is to have a really, really strong language model that with infinite context, and it can just go back and look at like all of its past experiences and, you know, learn from them.[00:43:46] Speaker: You might also want to remove the bad stuff just so it doesn't over index on it's like failed past experiences. But the problem is a really powerful language model is large. Infinite context is expensive. We don't have a good way to [00:44:00] index into it because like rag, Okay. At least in my experience, RAG from language to code doesn't work super well.[00:44:08] Speaker: So I think in the end, it's like, that's the way I would like to solve this problem. I'd like to have an infinite context and somehow be able to index into it appropriately. And I think that would mostly solve it. Another thing you can do is fine tuning. So I think like RAG is one way to get information into your model.[00:44:23] Speaker: Fine tuning is another way to get information into your model. So. That might be another way of continuously improving. Like you identify when you did a good job and then just add all of the good examples into your model.[00:44:34] Speaker 6: Yeah. So, you know, how like Voyager tries to write code into a skill library and then you reuse as a skill library, right?[00:44:40] Speaker 6: So that it improves in the sense that it just builds up the skill library over time.[00:44:44] Speaker: Yep.[00:44:44] Speaker 6: One thing I was like thinking about and there's this idea of, from, from Devin, your, your arch nemesis of playbooks. I don't know if you've seen them.[00:44:52] Speaker: Yeah, I mean, we're calling them workflows, but they're simpler.[00:44:55] Speaker 6: Yeah, so like, basically, like, you should, like, once a workflow works, you can kind of, [00:45:00] like, persist them as a skill library. Yeah. Right? Like I, I feel like that there's a, that's like some in between, like you said, you know, it's hard to do rag between language and code, but I feel like that is ragged for, like, I've done this before, last time I did it, this, this worked.[00:45:14] Speaker 6: So I'm just going to shortcut. All the stuff that failed before.[00:45:18] Speaker: Yeah, I totally, I think it's possible. It's just, you know, not, not trivial at the same time. I'll explain the two curves. So basically, the base, the baseline is just an agent that does it from scratch every time. And this curve up here is agent workflow memory where it's like adding the successful experiences back into the prompt.[00:45:39] Speaker: Why is this improving? The reason why is because just it failed on the first few examples and for the average to catch up it, it took a little bit of time. So it's not like this is actually improving it. You could just basically view the this one is constant and then this one is like improving.[00:45:56] Speaker: Like this, basically you can see it's continuing to go [00:46:00] up.[00:46:01] Speaker 8: How do you think we're going to solve the authentication problem for agents right now?[00:46:05] Speaker: When you say authentication, you mean like credentials, like, yeah.[00:46:09] Speaker 8: Yeah. 
Cause I've seen a few like startup solutions today, but it seems like it's limited to the amount of like websites or actual like authentication methods that it's capable of performing today.[00:46:19] Speaker: Yeah. Great questions. So. My preferred solution to this at the moment is GitHub like fine grained authentication tokens and GitHub fine grained authentication tokens allow you to specify like very free. On a very granular basis on this repo, you have permission to do this, on this repo, you have permission to do this.[00:46:41] Speaker: You also can prevent people from pushing to the main branch unless they get approved. You can do all of these other things. And I think these were all developed for human developers. Or like, the branch protection rules were developed for human developers. The fine grained authentication tokens were developed for GitHub apps.[00:46:56] Speaker: I think for GitHub, maybe [00:47:00] just pushing this like a little bit more is the way to do this. For other things, they're totally not prepared to give that sort of fine grained control. Like most APIs don't have something like a fine grained authentication token. And that goes into my like comment that we're going to need to prepare the world for agents, I think.[00:47:17] Speaker: But I think like the GitHub authentication tokens are like a good template for how you could start doing that maybe, but yeah, I don't, I don't, I don't have an answer.[00:47:25] Speaker 8: I'll let you know if I find one.[00:47:26] Speaker: Okay. Yeah.[00:47:31] Live Demonstration and Closing Remarks[00:47:31] Speaker: I'm going to finish up. Let, let me just see.[00:47:37] Speaker: Okay. So this one this one did write a script. I'm not going to actually read it for you. And then the other one, let's see.[00:47:51] Speaker: Yeah. So it sent a PR, sorry. What is, what is the PR URL?[00:48:00][00:48:02] Speaker: So I don't, I don't know if this sorry, that's taking way longer than it should. Okay, cool. Yeah. So this one sent a PR. I'll, I'll tell you later if this actually like successfully Oh, no, it's deployed on Vercel, so I can actually show you, but let's, let me try this real quick. Sorry. I know I don't have time.[00:48:24] Speaker: Yeah, there you go. I have pie charts now. So it's so fun. It's so fun to play with these things. Cause you could just do that while I'm giving a, you know, talk and things like that. So, yeah, thanks. Get full access to Latent Space at www.latent.space/subscribe

The Cloud Pod
284: Amazon Q uses machine learning to get smarter, but Bond’s Q can turn a wristwatch into a laser beam. Your move, AI.

The Cloud Pod

Play Episode Listen Later Dec 19, 2024 63:19


Welcome to episode 284 of The Cloud Pod – where the forecast is always cloudy! Everybody is in the house this week, and it's a good thing because since we've last recorded re:Invent happened, and we have a LOT to talk about. So let's jump right in!  Titles we almost went with this week: Amazon Steals from Azure…. We Are Doomed  The Cloud Pod Can Now Throw Away a lot of Code The Cloud Pod Controls the Future The Cloud Pod Observes More Insights We Are Simplicity X None of the Above Stop Trying to Make Bedrock & Q Happen My Head Went SuperNova over all the Q Announcements These are Not the Gadgets Bond Needed, Q!  A big thanks to this week's sponsor: We're sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You've come to the right place! Send us an email or hit us up on our slack channel for more info.  AWS  08:12 It's the re:Invent recap!  Did you make any announcement predictions? Let's see how our hosts'  predictions stacked up to reality.  Matt – 1 Large Green Computing Reinvent LLM at the Edge Something new on S3 Ryan (AI) – 1 Improved serverless observability tools Expansion of AI Driven workflows in datalakes Greater Focus on Multi-Account or Multi-region orchestration, centralized compliance management, or enhanced security services Jonathan – 0 New Edge Computing Capabilities better global application deployment type features. (Cloudflare competitor maybe) New automated cost optimization tools Automated RAG/vector to S3 Justin  – 2 Managed Backstage or platform like service New LLM multi-modal replacement or upgrade to Titan Competitor VM offering to Broadcom Honorable Mentions: Jonathan: Deeper integration between serverless and container services New region Enhanced Observability with AI driven debugging tool Justin: Multicloud management – in a bigger way (Anthos competitor) Agentic AI toolings New ARM graviton chip How many will AI or Artificial Intelligence be said: 45 Justin – 35 Jonathan – 72 Pre:Invent There were over 180 announcements, and yes – we have them all listed here for you. You're welcome.  17:12 Time-based snapshot copy for Amazon EBS Now you can specify a desired completion duration, from 15 minutes to 48 hours when you copy an Amazon EBS snapshot within or between Amazon regions or accounts.  This will allow you to meet your time-based compliance and business requirements for critical workloads, mostly around DR capabilities.  We're just glad to see this one finally, because having it built in directly to the console to guarantee that EBS snapshots make it to the other region is a big quality of life enhancement. Announcing future-dated Amazon EC2 On-Demand Capacity Reservations
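For the time-based EBS snapshot copy item above, the completion target is supplied on the copy call itself. A hedged boto3 sketch follows; the CompletionDurationMinutes parameter name is taken from the feature announcement and should be verified against the current CopySnapshot API, and the snapshot ID and Regions are placeholders.

```python
# Sketch: copy an EBS snapshot across Regions with a completion-time target.
# Run with credentials in the destination Region; IDs and Regions are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # destination Region

response = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Description="DR copy with a time-based completion target",
    CompletionDurationMinutes=15,  # assumed parameter; the announced range is 15 minutes to 48 hours
)
print(response["SnapshotId"])
```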

曼報 Manny's Newsletter
SP7 | On the ground at the re:Invent tech summit: a deep dive into AWS's generative AI strategy

曼報 Manny's Newsletter

Play Episode Listen Later Dec 6, 2024 37:16


This episode is sponsored by AWS. It is my first time ever recording abroad, and it is a live recording at re:Invent, AWS's home turf and one of the largest tech forums in the world. Another exciting part is that this episode also features AWS Solutions Architect Manager Jayson Hsieh and Community Hero Ernest Chiang, who join me to talk through the announcements we found most important this year. Whether or not you are an AWS customer, or even familiar with AWS products and services, this episode should give you some takeaways, because the three of us focus on explaining what business value each new announcement actually delivers, rather than just listing the updates. — Register early for the AWS cloud technology launch event - Taiwan, and watch the AWS re:Invent highlight replays: https://pages.awscloud.com/tw-reinvent_recap_202501.html -- (01:44) Next-generation foundation models: Amazon Nova (07:15) Compute innovation: solving well-defined problems (13:01) Storage innovation: faster and cheaper (18:32) Database innovation: Aurora, a must-watch for Taiwan (21:18) Collaboration innovation: how Amazon Q helps developers (28:51) Special easter egg (31:07) Closing discussion -- Business partnership rates: https://manny-li.com/sponsor/ Subscribe to the newsletter: https://manny-li.com Follow on IG: @manny_li Follow on FB: manny yh li Powered by Firstory Hosting

The Six Five with Patrick Moorhead and Daniel Newman
AWS and GitLab Announce Integrated AI Offering - Six Five Media at AWS re:Invent

The Six Five with Patrick Moorhead and Daniel Newman

Play Episode Listen Later Dec 5, 2024 20:26


Exciting news from AWS re:Invent! GitLab and AWS are joining forces to supercharge AI-powered software development!

Datacenter Technical Deep Dives
Build Amazon Q Apps to Scale & Drive Community Engagement with Linda Mohamed

Datacenter Technical Deep Dives

Play Episode Listen Later Nov 26, 2024


AWS Hero Linda Mohamed joins the vBrownBag to show us how she uses Amazon Q to scale community engagement (also a sneak peek for her AWS ReInvent 2024 session!) 00:00 Intros and chit-chat 09:05 Linda is a juggler, and Damian is fascinated by juggling math

HLTH Matters
AI @ HLTH: GE GenAI Strategy Leveraging Multimodels to Streamline Technologies and Products to Reduce Cognitive Burden on Providers

HLTH Matters

Play Episode Listen Later Nov 22, 2024 24:55


In this episode, Sandy Vance interviews Parminder “Parry” Bhatia, Chief AI Officer at GE Healthcare, about the integration of generative AI and machine learning in healthcare. They discuss the transformative potential of these technologies in improving clinical efficiency, reducing burnout, and enhancing patient care. Parminder shares insights on recent innovations, the concept of agentic AI, and the importance of responsible AI practices in ensuring safe and effective healthcare solutions. The discussion highlights GE Healthcare's commitment to advancing AI technologies while maintaining a focus on ethical considerations and collaboration with clinical partners.
In this episode they discuss:
GE Healthcare is leading in AI and machine learning integration.
Generative AI is set to revolutionize healthcare data management.
AI technologies can significantly reduce clinical burnout.
The Care Intellect application enhances oncology care efficiency.
Agentic AI offers proactive solutions in complex healthcare scenarios.
Responsible AI practices are crucial for building trust in technology.
AI can streamline workflows and improve patient experiences.
Collaboration with clinical partners is essential for innovation.
The AI Innovation Lab fosters early-stage research and development.
GE Healthcare aims to enhance healthcare delivery for over a billion patients.
About Parminder "Parry" Bhatia: At GE HealthCare, Parminder is focused on integrating AI across areas including smart devices, across the patient journey, and at the hospital operation level. GE HealthCare has a long track record innovating in AI, and has topped the Food and Drug Administration's (FDA) list of AI-enabled medical devices for three years in a row, with 80+ AI-enabled medical device authorizations. Parminder leads a team that is helping to advance AI integration within medical devices at GE HealthCare, with the ultimate goal of enhancing patient outcomes and creating a world where healthcare has no limits. Parry is part of the company's internal committee on responsible AI to help ensure that new AI applications are reliable, scalable, and ethically sound. He has been recognized by the AIM “AI 100” 2024 Awards and Constellation Research's Artificial Intelligence 150 “AI 150” list. Previously, Parminder was Head of Science for Generative AI at Amazon, where he led the development of machine learning and generative AI products including Amazon Comprehend Medical for analyzing medical record data at scale, Amazon Q for developer productivity, and Amazon Bedrock for democratizing access to Large Language Model technologies globally. He has held previous roles in AI and machine learning at Microsoft and Georgia Tech.

AWS for Software Companies Podcast
Ep062: Amazon Q - Your Generative AI Assistant with Urmila Kukreja of Smartsheet

AWS for Software Companies Podcast

Play Episode Listen Later Nov 5, 2024 22:31


Register here for AWS re:Invent 2024, Dec 2-6, Las Vegas, NV
-------
Urmila Kukreja of Smartsheet and Nick Simha of AWS discuss leveraging Amazon Q's Retrieval-Augmented Generation (RAG) solution to enhance productivity by enabling employees to quickly access relevant information within secure, integrated workflows like Slack, improving efficiency across the organization.
Topics Include:
Introduction by Nick Simha, AWS.
Overview of Amazon Q's role in data analytics and Gen AI.
Gen AI's impact on productivity, ~30% improvement backed by Gartner study findings.
General productivity improvement seen across various departments.
Amazon Q's developer code generation tool – rapid development.
Gen AI and LLMs' challenges: security, privacy, and data relevance.
Foundation models lack specific organizational knowledge by default.
Empowering Gen AI to grant system access can cause issues.
Privacy concern: sensitive data, like credit card info, can be central in data breaches.
Compliance is critical for organizational reputation and data integrity.
Data integration techniques: prompt engineering, RAG, fine-tuning, custom training.
RAG (Retrieval Augmented Generation) balances cost and accuracy effectively.
Implementing RAG requires complex, resource-heavy integration steps.
Amazon Q simplifies RAG integration with "RAG as a service."
Amazon Q's Gen AI stack overview, including Bedrock and model flexibility.
Amazon Q connects to 40+ applications, including Salesforce and ServiceNow.
Amazon Q respects existing security rules and data privacy constraints.
Plugin functionality enables backend actions directly from Amazon Q.
All configurations and permissions can be managed by administrators.
Urmila Kukreja from Smartsheet explains real-world Q implementation.
Smartsheet's Ask Us Engineering Slack channel: origin of Q integration.
Q integration in Slack simplifies data access and user workflow.
"Ask Me" Slack bot lets employees query databases instantly.
Adoption across departments is high due to integrated workflow.
Future plans include adding data sources and personalized response features.
Session wrap up.
Participants:
Urmila Kukreja – Director of Product Management, Smartsheet
Nick Simha - Solutions Architecture Leader - Data, Analytics, GenAI and Emerging ISVs, AWS
See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon/isv/
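The RAG pattern this session keeps returning to, retrieve a few relevant internal documents and put them in front of the model before it answers, can be shown with a deliberately generic sketch. Everything here is a placeholder (the toy embed() function stands in for a real embedding model); it illustrates the pattern, not Amazon Q's internals.

```python
# Generic Retrieval-Augmented Generation sketch: embed documents, embed the
# question, take the closest documents, and prepend them to the prompt.
import math


def embed(text: str) -> list:
    # Placeholder embedding: hash words into a fixed-size bag-of-words vector.
    # A real system would call an embedding model here.
    vec = [0.0] * 64
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list, k: int = 3) -> list:
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_rag_prompt(question: str, documents: list) -> str:
    context = "\n---\n".join(retrieve(question, documents))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```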

AWS на русском
053. Generative AI: from hype to real business cases!

AWS на русском

Play Episode Listen Later Oct 17, 2024 44:23


In this new episode of the "AWS на русском" (AWS in Russian) podcast, Mikhail Golubev and I dive into the world of real-world applications of generative AI!

AWS for Software Companies Podcast
Ep058: Boost Employee Productivity with AI agents powered by Amazon Q

AWS for Software Companies Podcast

Play Episode Listen Later Oct 8, 2024 23:37


Register here for AWS re:Invent 2024, Dec 2-6, Las Vegas, NV
-------
J.B. Brown, VP of Engineering at Smartsheet, shares how integrating Amazon Q with Smartsheet's flexible work management platform has streamlined productivity and enhanced employee support through AI-driven automation.
Topics Include:
Introduction by J.B. Brown, VP of Engineering at Smartsheet.
Story about improving productivity.
Context about Smartsheet as an enterprise-scale work management platform.
Examples of Smartsheet use in healthcare, TV streaming, and small businesses.
Focus on not changing how companies work, offering flexibility.
Integration with popular enterprise tech stack tools like Okta and Slack.
Automations in Smartsheet for notifications and data synchronization.
Smartsheet's customer base includes large enterprises and small businesses.
Overview of Smartsheet's scale: 15 million users and $1 billion revenue.
Smartsheet's employee support system, including 270+ "Ask Us" Slack channels.
Mention of AWS and the introduction of Amazon Q Business.
Building a Smartsheet Q Business app for streamlined employee support.
Setting up an Amazon Q Business app with proprietary data sources.
Implementation of Slack integration for Smartsheet employee support.
Example of AI summarizing Slack threads for improved efficiency.
Demo of Amazon Q Business outperforming human experts in knowledge retrieval.
Emphasizing the value of reducing response time and decision-making delays.
Future development plans: Smartsheet-Amazon Q connector.
Using AI to interrogate and manage Smartsheet project data.
Invitation to AI-minded Smartsheet customers to test the new connector.
Participants:
J.B. Brown - VP of Engineering at Smartsheet
See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon/isv/

AWS - Conversations with Leaders
Calculating the Cost and ROI of Generative AI

AWS - Conversations with Leaders

Play Episode Listen Later Sep 17, 2024 39:38


Generative AI has the potential to revolutionize industries and organizations, but how much will it cost your business, and how do you measure return on that investment? In this episode, get practical guidance on estimating the costs of generative AI adoption, from model selection to infrastructure requirements. Learn how to quantifying the productivity gains and efficiency improvements enabled by generative AI, strategies for upskilling your workforce, and advice on driving organization-wide adoption of generative AI capabilities.Resources:Amazon Bedrock: https://aws.amazon.com/bedrock/Amazon Q: https://aws.amazon.com/q/AWS Generative AI Innovation Center: https://aws.amazon.com/generative-ai/innovation-center/

The Chad & Cheese Podcast
iCIMS & SmartRecruiters CEO Divergence

The Chad & Cheese Podcast

Play Episode Listen Later Aug 30, 2024 46:54


This week, the boys are doing their best Charles Dickens impressions, but instead of two cities, they're waxing poetic about the tale of two applicant tracking systems: SmartRecruiters and iCIMS. One is making all the right moves, while the other one, well, not-so-much. Then it's on to Amazon, whose new Amazon Q technology is taking the world by storm ... and possibly putting a lot of developers out of work. Yikes! Then it's time for a little Buy-or-Sell, featuring Pangeam, Micro1 and Workpay. More disagreement this time than usual, so hopefully you like fireworks. And if you don't like fireworks, hopefully you like chicken, because Chick-fil-A is back in the news. And speaking of chicken, don't even get Chad started on Chicken Cock Whiskey ... you just need to listen.
Chapters
01:00 - Euro Chad goes deep
03:40 - Chicken Cock Whiskey
05:18 - Oasis, Pearl Jam and Xers
08:25 - Kelce's New Heights rake in $100m
12:00 - Fantasy Football
14:15 - CEO Divergence
20:01 - iCIMS: Potential Acquisition Strategy and Layoffs
21:01 - SmartRecruiters: Industry Knowledge and Global Expansion
21:14 - Impact of CEO Changes on Global Growth Strategies
21:30 - Comparing iCIMS and SmartRecruiters
27:36 - The Impact of Amazon Q
34:54 - Recent Funding: Pangem, Micro One, and Workpay
43:32 - Chick-fil-A's Misadventure in the Streaming Market
Keywords
SmartRecruiters, iCIMS, CEO changes, acquisitions, layoffs, global growth strategy, iCIMS, SmartRecruiters, HR tech, AI, software development, Amazon Q, Pangem, Micro One, Workpay, Chick-fil-A, streaming platform

Doppelgänger Tech Talk
#384 Peace, Joy, AI Cake | Cerebras | Amazon Q | Nvidia, Klarna | Uber FSD | Peak Birkenstock | OpenAI 100B Round

Doppelgänger Tech Talk

Play Episode Listen Later Aug 30, 2024 71:27


Andy Jassy is pleased that the upgrade time for Java applications has dropped from 50 developer-days to a few hours. Why does Klarna believe it can lay off many more people? Cerebras Systems has launched a new AI inference service that, according to the company, is 20 times faster than comparable cloud-based services running on Nvidia's most powerful GPUs, at a significantly lower cost per token. Tether, the company behind the cryptocurrency of the same name, is reportedly, according to the WSJ, investing with the help of Christian Angermayer in seemingly unrelated companies such as Northern Data and BlackRock Neurotech. Advertising: Sign up now for the LIQID webinar "Why professionals bet on venture capital" on September 7 at 11 a.m. Philipp Glöckler and Philipp Klöckner talk today about: (00:00:00) Intro (00:09:30) Klarna (00:22:00) Nvidia: peace, joy, AI cake (00:35:00) AI use cases: Andy Jassy and Amazon Q (00:40:10) Tether (00:44:30) Uber FSD (00:45:20) Yelp (00:47:00) Reddit (00:55:35) Crowdstrike (01:00:45) Salesforce (01:04:00) Birkenstock Show notes: Klarna: Handelszeitung, Tech.eu, Pip's LinkedIn OpenAI Funding: WSJ Cerebras: X, SiliconAngle Angermayer Tether: WSJ Uber FSD: Reuters Yelp: The Information

Business of Tech
AI in Patch Management, Chatbots, Cloud Spending, and Windows Control Panel

Business of Tech

Play Episode Listen Later Aug 28, 2024 11:44


The episode begins with a focus on AI-based patch management solutions, highlighting leading vendors like Automox, Flexera, and Kaseya. The discussion delves into how AI and ML-driven patch management can provide real-time risk assessments, helping prioritize critical patches and enhance cybersecurity measures.The episode then shifts to the evolving landscape of cloud infrastructure driven by generative AI advancements. The transcript reveals insights from an IBM study, indicating concerns among tech executives about infrastructure readiness for AI demands. Additionally, the discussion touches on the challenges faced by businesses in adopting AI quickly and effectively, with a prediction that 13% of businesses will adopt AI in the next three to four years.A significant development highlighted in the episode is the introduction of ChatIT by Commonwealth Bank, an AI-powered IT support chatbot built on Azure services. The chatbot, accessible via Microsoft Teams, boasts an impressive average response time of 14 seconds and over 13,000 employee interactions. This innovation streamlines IT troubleshooting, integrates with the bank's knowledge base, and hints at future enhancements to improve user experience and efficiency.The episode concludes with updates on technology advancements, including Broadcom's launch of VMware Cloud Foundation 9 and Microsoft's decision to phase out the Windows Control Panel in favor of the Settings app. The discussion emphasizes the importance of understanding Azure's true cloud consumption revenue and the implications of AI tools like Amazon Q on software development tasks. Overall, the episode provides valuable insights into the intersection of AI, cloud computing, and IT service delivery in the evolving tech landscape. Four things to know today 00:00 GigaOm Report Highlights Top AI-Based Patch Management Solutions, Featuring Automox, Flexera, and Kaseya04:49 Commonwealth Bank Launches ChatIT, AI-Powered IT Support Bot on Azure, Achieves 14-Second Response Times07:11 Windows Control Panel to Be Phased Out in Favor of Modern Settings App, Microsoft Confirms08:25 Microsoft's New Reporting Strategy Aims to Clarify Azure's True Cloud Consumption Revenue  Supported by: https://getthread.com/mspradio/https://www.huntress.com/mspradio/    All our Sponsors: https://businessof.tech/sponsors/ Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/Looking for a link from the stories? The entire script of the show, with links to articles, are posted in each story on https://www.businessof.tech/ Support the show on Patreon: https://patreon.com/mspradio/ Want our stuff? Cool Merch? Wear “Why Do We Care?” - Visit https://mspradio.myspreadshop.com Follow us on:LinkedIn: https://www.linkedin.com/company/28908079/YouTube: https://youtube.com/mspradio/Facebook: https://www.facebook.com/mspradionews/Instagram: https://www.instagram.com/mspradio/TikTok: https://www.tiktok.com/@businessoftechBluesky: https://bsky.app/profile/businessoftech.bsky.social

The Marketing AI Show
#112: CampaignsGPT, Top 100 GenAI Tools, AI Agent Hype “Justified 100%” Says Cohere CEO & Amazon Saves “4,500 Developer Years” with AI

The Marketing AI Show

Play Episode Listen Later Aug 27, 2024 70:30


From AI assistants pulling pranks to companies taking bold anti-AI stances, join our hosts as they navigate the wild world of AI innovation, controversy, and Rick Astley-inspired shenanigans.  This week, Paul Roetzer and Mike Kaput unveil CampaignsGPT, a cutting-edge tool from SmarterX for analyzing AI's impact on campaign tasks. Our hosts also examine a16z's latest top 100 genAI consumer apps list and explore Cohere CEO Aidan Gomez's insights on AI's future. In our rapid fire section, we cover Amazon Q's productivity boost, Procreate's bold anti-AI stance, Lindy AI's mischievous behavior, Microsoft Copilot's security concerns, and more. 00:04:11 — All New CampaignsGPT 00:14:10 — a16z Top 100 Gen AI Consumer Apps 00:24:13 — Cohere CEO Aidan Gomez on 20VC Podcast 00:35:43 — Amazon's AI Assistant Productivity Gains 00:42:18 — Cursor AI Coding Assistant 00:46:18 — Procreate Anti-AI Stance 00:50:01 — OpenAI partners with Condé Nast 00:52:21 — OpenAI Names First Chief Communications Officer 00:55:33 —Anthropic and OpenAI Release Statements on SB-1047  00:58:51 — Lindy AI Goes Rogue 01:02:20 — Microsoft Copilot Safety and Security Concerns 01:05:05 — LinkedIn AI Disclaimers This week's episode is brought to you by MAICON, our 5th annual Marketing AI Conference, happening in Cleveland, Sept. 10 - 12. The code POD200 saves $200 on all pass types.  For more information on MAICON and to register for this year's conference, visit www.MAICON.ai.  Want to receive our videos faster? SUBSCRIBE to our channel! Visit our website: https://www.marketingaiinstitute.com Receive our weekly newsletter: https://www.marketingaiinstitute.com/newsletter-subscription Looking for content and resources? Register for a free webinar: https://www.marketingaiinstitute.com/resources#filter=.webinar Come to our next Marketing AI Conference: www.MAICON.ai Enroll in AI Academy for Marketers: https://www.marketingaiinstitute.com/academy/home Join our community: Slack: https://www.marketingaiinstitute.com/slack-group-form LinkedIn: https://www.linkedin.com/company/mktgai Twitter: https://twitter.com/MktgAi Instagram: https://www.instagram.com/marketing.ai/ Facebook: https://www.facebook.com/marketingAIinstitute

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Betteridge's law says no: with seemingly infinite flavors of RAG, and >2million token context + prompt caching from Anthropic/Deepmind/Deepseek, it's reasonable to believe that "in context learning is all you need".But then there's Cosine Genie, the first to make a huge bet using OpenAI's new GPT4o fine-tuning for code at the largest scale it has ever been used externally; resulting in what is now the #1 coding agent in the world according to SWE-Bench Full, Lite, and Verified:SWE-Bench has been the most successful agent benchmark of the year, receiving honors at ICLR (our interview here) and recently being verified by OpenAI. Cognition (Devin) was valued at $2b after reaching 14% on it. So it is very, very big news when a new agent appears to beat all other solutions, by a lot:While this number is self reported, it seems to be corroborated by OpenAI, who also award it clear highest marks on SWE-Bench verified:The secret is GPT-4o finetuning on billions of tokens of synthetic data. * Finetuning: As OpenAI says:Genie is powered by a fine-tuned GPT-4o model trained on examples of real software engineers at work, enabling the model to learn to respond in a specific way. The model was also trained to be able to output in specific formats, such as patches that could be committed easily to codebases. Due to the scale of Cosine's finetuning, OpenAI worked closely with them to figure out the size of the LoRA:“They have to decide how big your LoRA adapter is going to be… because if you had a really sparse, large adapter, you're not going to get any signal in that at all. So they have to dynamically size these things.”* Synthetic data: we need to finetune on the process of making code work instead of only training on working code.“…we synthetically generated runtime errors. Where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.”Genie also has a 4 stage workflow with the standard LLM OS tooling stack that lets it solve problems iteratively:Full Video Podlike and subscribe etc!Show Notes* Alistair Pullen - Twitter, Linkedin* Cosine Genie launch, technical report* OpenAI GPT-4o finetuning GA* Llama 3 backtranslation* Cursor episode and Aman + SWEBench at ICLR episodeTimestamps* [00:00:00] Suno Intro* [00:05:01] Alistair and Cosine intro* [00:16:34] GPT4o finetuning* [00:20:18] Genie Data Mix* [00:23:09] Customizing for Customers* [00:25:37] Genie Workflow* [00:27:41] Code Retrieval* [00:35:20] Planning* [00:42:29] Language Mix* [00:43:46] Running Code* [00:46:19] Finetuning with OpenAI* [00:49:32] Synthetic Code Data* [00:51:54] SynData in Llama 3* [00:52:33] SWE-Bench Submission Process* [00:58:20] Future Plans* [00:59:36] Ecosystem Trends* [01:00:55] Founder Lessons* [01:01:58] CTA: Hiring & CustomersDescript Transcript[00:01:52] AI Charlie: Welcome back. This is Charlie, your AI cohost. As AI engineers, we have a special focus on coding agents, fine tuning, and synthetic data. And this week, it all comes together with the launch of Cosign's Genie, which reached 50 percent on SWE Bench Lite, 30 percent on the full SWE Bench, and 44 percent on OpenAI's new SWE Bench Verified.[00:02:17] All state of the art results by the widest ever margin recorded compared to former leaders Amazon Q and US Autocode Rover. And Factory Code Droid. 
As a reminder, Cognition Devon went viral with a 14 percent score just five months ago. Cosign did this by working closely with OpenAI to fine tune GPT 4. 0, now generally available to you and me, on billions of tokens of code, much of which was synthetically generated.[00:02:47] Alistair Pullen: Hi, I'm Ali. Co founder and CEO of Cosign, a human reasoning lab. And I'd like to show you Genie, our state of the art, fully autonomous software engineering colleague. Genie has the highest score on SWBench in the world. And the way we achieved this was by taking a completely different approach. We believe that if you want a model to behave like a software engineer, it has to be shown how a human software engineer works.[00:03:15] We've designed new techniques to derive human reasoning from real examples of software engineers doing their jobs. Our data represents perfect information lineage, incremental knowledge discovery, and step by step decision making. Representing everything a human engineer does logically. By actually training Genie on this unique dataset, rather than simply prompting base models, which is what everyone else is doing, we've seen that we're no longer simply generating random code until some works.[00:03:46] It's tackling problems like[00:03:48] AI Charlie: a human. Alistair Pullen is CEO and co founder of Kozen, and we managed to snag him on a brief trip stateside for a special conversation on building the world's current number one coding agent. Watch out and take care.[00:04:07] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO of Resonance at Decibel Partners, and I'm joined by my co host Swyx, founder of Small. ai.[00:04:16] swyx: Hey, and today we're back in the studio. In person, after about three to four months in visa jail and travels and all other fun stuff that we talked about in the previous episode.[00:04:27] But today we have a special guest, Ali Pullen from Cosign. Welcome. Hi, thanks for having me. We're very lucky to have you because you're on a two day trip to San Francisco. Yeah, I wouldn't recommend it. I would not[00:04:38] Alistair Pullen: recommend it. Don't fly from London to San Francisco for two days.[00:04:40] swyx: And you launched Genie on a plane.[00:04:42] On plain Wi Fi, um, claiming state of the art in SuiteBench, which we're all going to talk about. I'm excited to dive into your whole journey, because it has been a journey. I've been lucky to be a small angel in part of that journey. And it's exciting to see that you're launching to such acclaim and, you know, such results.[00:05:01] Alistair and Cosine intro[00:05:01] swyx: Um, so I'll go over your brief background, and then you can sort of fill in the blanks on what else people should know about you. You did your bachelor's in computer science at Exeter.[00:05:10] Speaker 6: Yep.[00:05:10] swyx: And then you worked at a startup that got acquired into GoPuff and round about 2022, you started working on a stealth startup that became a YC startup.[00:05:19] What's that? Yeah. So[00:05:21] Alistair Pullen: basically when I left university, I, I met my now co founder, Sam. At the time we were both mobile devs. He was an Android developer. iOS developer. And whilst at university, we built this sort of small consultancy, sort of, we'd um, be approached to build projects for people and we would just take them up and start with, they were student projects.[00:05:41] They weren't, they weren't anything crazy or anything big. 
We started with those and over time we started doing larger and larger projects, more interesting things. And then actually, when we left university, we just kept doing that. We didn't really get jobs, traditional jobs. It was also like in the middle of COVID, middle of lockdown.[00:05:57] So we were like, this is a pretty good gig. We'll just keep like writing code in our bedrooms. And yeah, that's it. We did that for a while. And then a friend of ours that we went to Exeter with started a YC startup during COVID. And it was one of these fast grocery delivery companies. At the time I was living in the deepest, darkest countryside in England, where fast grocery companies are still not a thing.[00:06:20] So he, he sort of pitched me this idea and was like, listen, like I need an iOS dev, do you fancy coming along? And I thought, absolutely. It was a chance to get out of my parents house, chance to move to London, you know, do interesting things. And at the time, truthfully, I had no idea what YC was. I had no idea.[00:06:34] I wasn't in the startup space. I knew I liked coding and building apps and stuff, but I'd never, never really done anything in that area. So I said, yes, absolutely. I moved to London just sort of as COVID was ending and yeah, worked at what was fancy for about a year and a half. Then we brought Sam along as well.[00:06:52] So we, Sam and I, were the two engineers at Fancy for basically its entire life, and we built literally everything. So like the, the front, the client mobile apps, the, the backends, the internal like stock management system, the driver routing, algorithms, all those things. Literally like everything. It was my first.[00:07:12] You know, both of us were super inexperienced. We didn't have, like, proper engineering experience. There were definitely decisions we'd do differently now. We'd definitely buy a lot of stuff off the shelf, stuff like that. But it was the initial dip of the toe into, like, the world of startups, and we were both, like, hooked immediately.[00:07:26] We were like, this is so cool. This sounds so much better than all our friends who were, like, consultants and doing, like, normal jobs, right? We did that, and it ran its course, and after, I want to say, 18 months or so, GoPuff came and acquired us. And there was obviously a transitionary period, an integration period, like with all acquisitions, and we did that, and as soon as we'd vested what we wanted to vest, and as soon as we thought, okay, this chapter is sort of done, uh, in about 2022, We left and we knew that we wanted to go alone and try something like we'd had this taste.[00:07:54] Now we knew we'd seen how a like a YC startup was managed like up close and we knew that we wanted to do something similar ourselves. We had no idea what it was at the time. We just knew we wanted to do something. So we, we tried a small, um, some small projects in various different areas, but then GPT 3.[00:08:12] He'd seen it on Reddit and I'm his source of all knowledge. Yeah, Sam loves Reddit. I'd actually heard of GPT 2. And obviously had like loosely followed what OpenAI had done with, what was the game they trained a model to play? Dota. Was it Dota? Yeah. So I'd followed that and, I knew loosely what GPT 2 was, I knew what BERT was, so I was like, Okay, this GPT 3 thing sounds interesting.[00:08:35] And he just mentioned it to me on a walk. And I then went home and, like, googled GPT was the playground. And the model was DaVinci 2 at the time. 
And it was just the old school playground, completions, nothing crazy, no chat, no nothing. I miss completions though. Yeah. Oh, completion. Honestly, I had this conversation in open hours office yesterday.[00:08:54] I was like, I just went. I know. But yeah, so we, we, um, I started playing around with the, the playground and the first thing I ever wrote into it was like, hello world, and it gave me some sort of like, fairly generic response back. I was like, okay, that looks pretty cool. The next thing was. I looked through the docs, um, also they had a lot of example prompts because I had no idea.[00:09:14] I didn't know if the, if you could put anything in, I didn't know if you had to structure in a certain way or whatever, and I, and I saw that it could start writing like tables and JSON and stuff like that. So I was like, okay, can you write me something in JSON? And it did. And I was like, Oh, wow, this is, this is pretty cool.[00:09:28] Um, can it, can it just write arbitrary JSON for me? And, um, immediately as soon as I realized that my mind was racing and I like got Sam in and we just started messing around in the playground, like fairly innocently to start with. And then, of course, both being mobile devs and also seeing, at that point, we learned about what the Codex model was.[00:09:48] It was like, this thing's trained to write code, sounds awesome. And Copilot was start, I think, I can't actually remember if Copilot had come out yet, it might have done. It's round about the same time as Codex. Round about the same time, yeah. And we were like, okay, as mobile devs, let's see what we can do.[00:10:02] So the initial thing was like, okay, let's see if we can get this AI to build us a mobile app from scratch. We eventually built the world's most flimsy system, which was back in the day with like 4, 000 token context windows, like chaining prompts, trying to keep as much context from one to the other, all these different things, where basically, Essentially, you'd put an app idea in a box, and then we'd do, like, very high level stuff, figuring out what the stack should be, figuring out what the frontend should be written in, backend should be written in, all these different things, and then we'd go through, like, for each thing, more and more levels of detail, until the point that you're You actually got Codex to write the code for each thing.[00:10:41] And we didn't do any templating or anything. We were like, no, we're going to write all the code from scratch every time, which is basically why it barely worked. But there were like occasions where you could put in something and it would build something that did actually run. The backend would run, the database would work.[00:10:54] And we were like, Oh my God, this is insane. This is so cool. And that's what we showed to our co founder Yang. I met my co founder Yang through, through fancy because his wife was their first employee. And, um, we showed him and he was like, You've discovered fire. What is this? This is insane. He has a lot more startup experience.[00:11:12] Historically, he's had a few exits in the past and has been through all different industries. He's like our dad. He's a bit older. He hates me saying that. He's your COO now? He's our COO. Yeah. And, uh, we showed him and he was like, this is absolutely amazing. Let's just do something. Cause he, he, at the time, um, was just about to have a child, so he didn't have anything going on either.[00:11:29] So we, we applied to YC, got an interview. 
The interview was. As most YC interviews are short, curt, and pretty brutal. They told us they hated the idea. They didn't think it would work. And that's when we started brainstorming. It was almost like the interview was like an office hours kind of thing. And we were like, okay, given what you know about the space now and how to build things with these LLMs, like what can you bring out of what you've learned in building that thing into Something that might be a bit more useful to people on the daily, and also YC obviously likes B2B startups a little bit more, at least at the time they did, back then.[00:12:01] So we were like, okay, maybe we could build something that helps you with existing codebases, like can sort of automate development stuff with existing codebases, not knowing at all what that would look like, or how you would build it, or any of these things. And They were like, yeah, that sounds interesting.[00:12:15] You should probably go ahead and do that. You're in, you've got two weeks to build us an MVP. And we were like, okay, okay. We did our best. The MVP was absolutely horrendous. It was a CLI tool. It sucked. And, um, at the time we were like, we, we don't even know. How to build what we want to build. And we didn't really know what we wanted to build, to be honest.[00:12:33] Like, we knew we wanted to try to help automate dev work, but back then we just didn't know enough about how LLM apps were built, the intricacies and all those things. And also, like, the LLMs themselves, like 4, 000 tokens, you're not going very far, they're extremely expensive. So we ended up building a, uh, a code based retrieval tool, originally.[00:12:51] Our thought process originally was, we want to build something that can do our jobs for us. That is like the gold star, we know that. We've seen like there are glimpses of it happening with our initial demo that we did. But we don't see the path of how to do that at the moment. Like the tech just wasn't there.[00:13:05] So we were like, well, there are going to be some things that you need to build this when the tech does catch up. So retrieval being one of the most important things, like the model is going to have to build like pull code out of a code base somehow. So we were like, well, let's just build the tooling around it.[00:13:17] And eventually when the tech comes, then we'll be able to just like plug it into our, our tooling and then it should work basically. And to be fair, that's basically what we've done. And that's basically what's happened, which is very fortunate. But in the meantime, whilst we were waiting for everything to sort of become available, we built this code base retrieval tool.[00:13:34] That was the first thing we ever launched when we were in YC like that, and it didn't work. It was really frustrating for us because it was just me and Sam like working like all hours trying to get this thing to work. It was quite a big task in of itself, trying to get like a good semantic search engine working that could run locally on your machine.[00:13:51] We were trying to avoid sending code to the cloud as much as possible. And then for very large codebases, you're like, you know, millions of lines of code. You're trying to do some sort of like local HNSW thing that runs inside your VS Code instance that like eats all your RAM as you've seen in the past.[00:14:05] All those different things. Yep. Yeah.[00:14:07] swyx: My first call with[00:14:07] Alistair Pullen: you, I had trouble. You were like, yeah, it sucks, man. 
I know, I know. I know it sucks. I'm sorry. I'm sorry. But building all that stuff was essentially the first six to eight months of what at the time was built. Which, by the way, build it. Build it. Yeah, it was a terrible, terrible name.[00:14:25] It was the worst,[00:14:27] swyx: like, part of trying to think about whether I would invest is whether or not people could pronounce it.[00:14:32] Alistair Pullen: No, when we, so when we went on our first ever YC, like, retreat, No one got the name right. They were like, build, build, well, um, and then we actually changed the names, cosign, like, although some people would spell it as in like, as if you're cosigning for an apartment or something like that's like, can't win.[00:14:49] Yeah. That was what built was back then. But the ambition, and I did a talk on this back in the end of 2022, the ambition to like build something that essentially automated our jobs was still very much like core to what we were doing. But for a very long time, it was just never apparent to us. Like. How would you go about doing these things?[00:15:06] Even when, like, you had 3. suddenly felt huge, because you've gone from 4 to 16, but even then 16k is like, a lot of Python files are longer than 16k. So you can't, you know, before you even start doing a completion, even then we were like, eh, Yeah, it looks like we're still waiting. And then, like, towards the end of last year, you then start, you see 32k.[00:15:28] 32k was really smart. It was really expensive, but also, like, you could fit a decent amount of stuff in it. 32k felt enormous. And then, finally, 128k came along, and we were like, right, this is, like, this is what we can actually deal with. Because, fundamentally, to build a product like this, you need to get as much information in front of the model as possible, and make sure that everything it ever writes in output can be read.[00:15:49] traced back to something in the context window, so it's not hallucinating it. As soon as that model existed, I was like, okay, I know that this is now going to be feasible in some way. We'd done early sort of dev work on Genie using 3. 5 16k. And that was a very, very like crude way of proving that this loop that we were after and the way we were generating the data actually had signal and worked and could do something.[00:16:16] But the model itself was not useful because you couldn't ever fit enough information into it for it to be able to do the task competently and also the base intelligence of the model. I mean, 3. 5, anyone who's used 3. 5 knows the base intelligence of the model is. is lacking, especially when you're asking it to like do software engineering, this is quite quite involved.[00:16:34] GPT4o finetuning[00:16:34] Alistair Pullen: So, we saw the 128k context model and um, at that point we'd been in touch with OpenAI about our ambitions and like how we wanted to build it. We essentially are, I just took a punt, I was like, I'm just going to ask to see, can we like train this thing? Because at the time Fortobo had just come out and back then there was still a decent amount of lag time between like OpenAI releasing a model and then allowing you to fine tune it in some way.[00:16:59] They've gotten much better about that recently, like 4. 0 fine tuning came out either, I think, a day, 4. 0 mini fine tuning came out like a day after the model did. 
And I know that's something they're definitely like, optimising for super heavily inside, which is great to see.[00:17:11] swyx: Which is a little bit, you know, for a year or so, YC companies had like a direct Slack channel to OpenAI.[00:17:17] We still do. Yeah. Yeah. So, it's a little bit of a diminishing of the YC advantage there, if they're releasing this fine tuning ability like a day after.[00:17:23] Alistair Pullen: Yeah, no, no, absolutely. But like, you can't build a startup otherwise. The advantage is obviously nice and it makes you feel fuzzy inside. But like, at the end of the day, it's not that that's going to make you win.[00:17:34] But yeah, no, so like we'd spoken to Shamul there, DevRel guy, I'm sure you know him. I think he's head of solutions or something in their applied team. Yeah, we'd been talking to him from the very beginning when we got into YC, and he's been absolutely fantastic throughout.[00:17:53] I basically had pitched him this idea back when we were doing it on 3.5 16k, and I was like, this is my, this is my crazy thesis. I want to see if this can work. And as soon as like that 128k model came out, I started like laying the groundwork. I was like, I know this definitely isn't possible because he released it like yesterday, but know that I want it. And in the interim, like, GPT-4, like, 8k fine tuning came out.[00:18:11] We tried that, it's obviously even fewer tokens, but the intelligence helped. And I was like, if we can marry the intelligence and the context window length, then we're going to have something special. And eventually, we were able to get on the Experimental Access Program, and we got access to 4 Turbo fine tuning.[00:18:25] As soon as we did that, because in the entire run up to that we built the data pipeline, we already had all that set up, so we were like, right, we have the data, now we have the model, let's put it through and iterate, essentially, and that's, that's where, like, Genie as we know it today really was born. I won't pretend like the first version of Genie that we trained was good.[00:18:45] It was a disaster. That's where you realize all the implicit biases in your data set. And you realize that, oh, actually this decision you made that was fairly arbitrary was the wrong one. You have to do it a different way. Other subtle things like, you know, how you write Git diffs using LLMs and how you can best optimize that to make sure they actually apply and work, and loads of different little edge cases.[00:19:03] But as soon as we had access to the underlying tool, we were like, we can actually do this. And I breathed a sigh of relief because I didn't know, it was like, it wasn't a done deal, but I knew that we could build something useful. I mean, I knew that we could build something that would be measurably good on whatever eval at the time that you wanted to use.[00:19:23] Like at the time, back then, we weren't actually that familiar with SWE-bench. But once Devin came out and they announced their SWE-bench score, that's when my life took a turn. Challenge accepted. Yeah, challenge accepted. And that's where like, yes, that's where my friendships have gone. My sleep has gone. My weight.[00:19:40] Everything went into SWE-bench and yeah, it was actually a very useful tool in building Genie beforehand. It was like, yes, vibe check this thing and see if it's useful. And then all of a sudden you have an actual measure to see, like, could it do software engineering?
Not, not the best measure, obviously, but like it's the best that we've got now.[00:19:57] We just iterated and built and eventually we got it to the point where it is now, and a little bit beyond, since we actually got that score a couple of weeks ago, and yeah, it's been a hell of a journey from the beginning all the way to now. That was a very rambling answer to your question about how we got here, but that's essentially the potted answer of how we got here.[00:20:16] swyx: Got the full origin story out.[00:20:17] Alessio: Yeah, no, totally.[00:20:18] Genie Data Mix[00:20:18] Alessio: You mentioned bias in the data and some of these things. In your announcement video, you called Genie the world's first AI software engineering colleague. And you kind of highlighted how the data needed to train it needs to show how a human engineer works. I think maybe you're contrasting that to just putting code in it.[00:20:37] There's kind of like a lot more than code that goes into software engineering. How do you think about the data mixture, you know, and like, uh, there's this kind of known truth that code makes models better when you put it in the pre-training data, but since we put so much in the pre-training data, what else do you add when you train Genie?[00:20:54] Alistair Pullen: Yeah, I think, well, I think that sort of boils down fundamentally to the difference between a model writing code and a model doing software engineering, because the software engineering sort of discipline goes wider, because if you look at something like a PR, that is obviously an artifact of some thought and some work that has happened and has eventually been squashed into, you know, some diffs, right?[00:21:17] Very crudely, what the pre-trained models are reading is they're reading those final diffs and they're emulating that and they're being able to output it, right? But of course, it's a super lossy thing, a PR. You have no idea why or how, for the most part, unless there are some comments, which, you know, anyone who's worked in a company realizes PR reviews can be a bit dodgy at times, but you see that you lose so much information at the end, and that's perfectly fine, because PRs aren't designed to be something that perfectly preserves everything that happened. But what we realized was, if you want something that's a software engineer, and very crudely, we started with like something that can do PRs for you, essentially, you need to be able to figure out why those things happened.[00:21:58] Otherwise, you essentially just have a code writing model, you have something that's good at HumanEval, but not very good at SWE-bench. Essentially that realization was part of the kernel of the idea of the approach that we took to design the agent that is Genie. The way that we decided we want to try to extract what happened in the past, like as forensically as possible, has been and is currently like one of the main things that we focus all our time on, because doing that, getting as much signal out as possible, doing that as well as possible, is the biggest thing that we've seen that determines how well we do on that benchmark at the end of the day.
Once you've sorted things out, like output structure, how to get it consistently writing diffs and all the stuff that is sort of ancillary to the model actually figuring out how to solve a problem, the core bit of solving the problem is how did the human solve this problem and how can we best come up with how the human solved these problems.[00:22:54] So all the effort went in on that. And the mix that we ended up with was, as you've probably seen in the technical report and so on, all of those different languages and different combinations of different task types, all of that has run through that pipeline, and we've extracted all that information out.[00:23:09] Customizing for Customers[00:23:09] Alessio: How does that differ when you work with customers that have private workflows? Like, do you think, is there usually a big delta between what you get in open source and maybe public data versus like Yeah,[00:23:19] Alistair Pullen: yeah, yeah. When you scrape enough of it, most of open source is updating readmes and docs. It's hilarious, like we had to filter out so much of that stuff because when we first did the 16k model, like the amount of readme updating that went in, we did like no data cleaning, no real, like, we just sort of threw it in and saw what happened.[00:23:38] And it was just like, It was really good at updating readme, it was really good at writing some comments, really good at, um, complaining in Git reviews, in PR reviews, rather, and it would, again, like, we didn't clean the data, so you'd, like, give it some feedback, and it would just, like, reply, and, like, it would just be quite insubordinate when it was getting back to you, like, no, I don't think you're right, and it would just sort of argue with you, so The process of doing all that was super interesting because we realized from the beginning, okay, there's a huge amount of work that needs to go into like cleaning this, getting it aligned with what we want the model to do to be able to get the model to be useful in some way.[00:24:12] Alessio: I'm curious, like, how do you think about the customer willingness? To share all of this historical data, I've done a lot of developer tools investing in my career and getting access to the code base is always one of the hard things. Are people getting more cautious about sharing this information? In the past, it was maybe like, you know, you're using static analysis tool, like whatever else you need to plug into the code base, fine.[00:24:35] Now you're building. A model based on it, like, uh, what's the discussion going into these companies? Are most people comfortable with, like, letting you see how to work and sharing everything?[00:24:44] Alistair Pullen: It depends on the sector, mostly. We've actually seen, I'd say, people becoming more amenable to the idea over time, actually, rather than more skeptical, because I think they can see the, the upside.[00:24:55] If this thing could be, Does what they say it does, it's going to be more help to us than it is a risk to our infosec. 
Um, and of course, like, companies building in this space, we're all going to end up, you know, complying with the same rules, and there are going to be new rules that come out to make sure that we're looking at your code, that everything is safe, and so on.[00:25:12] So from what we've seen so far, we've spoken to some very large companies that you've definitely heard of and all of them obviously have stipulations and many of them want it to be sandbox to start with and all the like very obvious things that I, you know, I would say as well, but they're all super keen to have a go and see because like, despite all those things, if we can genuinely Make them go faster, allow them to build more in a given time period and stuff.[00:25:35] It's super worth it to them.[00:25:37] Genie Workflow[00:25:37] swyx: Okay, I'm going to dive in a little bit on the process that you have created. You showed the demo on your video, and by the time that we release this, you should be taking people off the waitlist and launching people so people can see this themselves. There's four main Parts of the workflow, which is finding files, planning action, writing code and running tests.[00:25:58] And controversially, you have set yourself apart from the Devins of the world by saying that things like having access to a browser is not that important for you. Is that an accurate reading of[00:26:09] Alistair Pullen: what you wrote? I don't remember saying that, but At least with what we've seen, the browser is helpful, but it's not as helpful as, like, ragging the correct files, if that makes sense.[00:26:20] Like, it is still helpful, but obviously there are more fundamental things you have to get right before you get to, like, Oh yeah, you can read some docs, or you can read a stack overflow article, and stuff like that.[00:26:30] swyx: Yeah, the phrase I was indexing on was, The other software tools are wrappers around foundational models with a few additional tools, such as a web browser or code interpreter.[00:26:38] Alistair Pullen: Oh, I see. No, I mean, no, I'm, I'm not, I'm not, I'm not deri, I'm deriding the, the, the approach that, not the, not the tools. Yeah, exactly. So like, I would[00:26:44] swyx: say in my standard model of what a code agent should look like, uh, Devon has been very influential, obviously. Yeah. Yeah. Because you could just add the docs of something.[00:26:54] Mm-Hmm. . And like, you know, now I have, now when I'm installing a new library, I can just add docs. Yeah, yeah. Cursor also does this. Right. And then obviously having a code interpreter does help. I guess you have that in the form[00:27:03] Alistair Pullen: of running tests. I mean, uh, the Genie has both of those tools available to it as well.[00:27:08] So, yeah, yeah, yeah. So, we have a tool where you can, like, put in URLs and it will just read the URLs. And you can also use this Perplexities API under the hood as well to be able to actually ask questions if it wants to. Okay. So, no, we use both of those tools as well. 
Like, those tools are Super important and super key.[00:27:24] I think obviously the most important tools to these agents are like being able to retrieve code from a code base, being able to read Stack Overflow articles and what have you and just be able to essentially be able to Google like we do is definitely super useful.[00:27:38] swyx: Yeah, I thought maybe we could just kind of dive into each of those actions.[00:27:41] Code Retrieval[00:27:41] swyx: Code retrieval, one of the core indexer that Yes. You've worked on, uh, even as, as built, what makes it hard, what approach you thought would work, didn't work,[00:27:52] Alistair Pullen: anything like that. It's funny, I had a similar conversation to this when I was chatting to the guys from OpenAI yesterday. The thing is that searching for code, specifically semantically, at least to start with, I mean like keyword search and stuff like that is a, is a solved problem.[00:28:06] It's been around for ages, but at least being able to, the phrase we always used back in the day was searching for what code does rather than what code is. Like searching for functionality is really hard. Really hard. The way that we approached that problem was that obviously like a very basic and easy approach is right.[00:28:26] Let's just embed the code base. We'll chunk it up in some arbitrary way, maybe using an AST, maybe using number of lines, maybe using whatever, like some overlapping, just chunk it up and embed it. And once you've done that, I will write a query saying, like, find me some authentication code or something, embed it, and then do the cosine similarity and get the top of K, right?[00:28:43] That doesn't work. And I wish it did work, don't get me wrong. It doesn't work well at all, because fundamentally, if you think about, like, semantically, how code looks is very different to how English looks, and there's, like, not a huge amount of signal that's carried between the two. So what we ended up, the first approach we took, and that kind of did well enough for a long time, was Okay, let's train a model to be able to take in English code queries and then produce a hypothetical code snippet that might look like the answer, embed that, and then do the code similarity.[00:29:18] And that process, although very simple, gets you so much more performance out of the retrieval accuracy. And that was kind of like the start of our of our engine, as we called it, which is essentially like the aggregation of all these different heuristics, like semantic, keyword, LSP, and so on. And then we essentially had like a model that would, given an input, choose which ones it thought were most appropriate, given the type of requests you had.[00:29:45] So the whole code search thing was a really hard problem. And actually what we ended up doing with Genie is we, um, let The model through self play figure out how to retrieve code. So actually we don't use our engine for Genie. So instead of like a request coming in and then like say GPT 4 with some JSON output being like, Well, I think here we should use a keyword with these inputs and then we should use semantic.[00:30:09] And then we should like pick these results. It's actually like, A question comes in and Genie has self played in its training data to be able to be like, okay, this is how I'm going to approach finding this information. Much more akin to how a developer would do it. Because if I was like, Shawn, go into this new code base you've never seen before.[00:30:26] And find me the code that does this. 
You're gonna probably, you might do some keywords, you're gonna look over the file system, you're gonna try to figure out from the directories and the file names where it might be, you're gonna like jump in one, and then once you're in there, you're probably gonna be doing the, you know, go to definition stuff to like jump from file to file and try to use the graph to like get closer and closer.[00:30:46] And that is exactly what Genie does. Starts on the file system, looks at the file system, picks some candidate files, is this what I'm looking for, yes or no, and If there's something that's interesting, like an import or something, it can, it can command click on that thing, go to definition, go to references, and so on.[00:31:00] And it can traverse the codebase that way.[00:31:02] swyx: Are you using the VS Code, uh, LSP, or? No,[00:31:05] Alistair Pullen: that's not, we're not like, we're not doing this in VS Code, we're just using the language servers running. But, we really wanted to try to mimic the way we do it as best as possible. And we did that during the self play process when we were generating the dataset, so.[00:31:18] Although we did all that work originally, and although, like, Genie still has access to these tools, so it can do keyword searches, and it can do, you know, basic semantic searches, and it can use the graph, it uses them through this process and figures out, okay, I've learned from data how to find stuff in codebases, and I think in our technical report, I can't remember the exact number, but I think it was around 65 or 66 percent retrieval accuracy overall, Measured on, we know what lines we need for these tasks to find, for the task to actually be able to be completed, And we found about 66 percent of all those lines, which is one of the biggest areas of free performance that we can get a hold of, because When we were building Genie, truthfully, like, a lot more focus went on assuming you found the right information, you've been able to reproduce the issue, assuming that's true, how do you then go about solving it?[00:32:08] And the bulk of the work we did was on the solving. But when you go higher up the funnel, obviously, like, the funnel looks like, have you found everything you need for the task? Are you able to reproduce the problem that's seen in the issue? Are you then able to solve it? And the funnel gets narrower as you go down.[00:32:22] And at the top of the funnel, of course, is rank. So I'm actually quite happy with that score. I think it's still pretty impressive considering the size of some of the codebases we're doing, we're using for this. But as soon as that, if that number becomes 80, think how many more tasks we get right. That's one of the key areas we're going to focus on when we continue working on Genie.[00:32:37] It'd be interesting to break out a benchmark just for that.[00:32:41] swyx: Yeah, I mean, it's super easy. Because I don't know what state of the art is.[00:32:43] Alistair Pullen: Yeah, I mean, like, for a, um, it's super easy because, like, for a given PR, you know what lines were edited. Oh, okay. Yeah, you know what lines were[00:32:50] swyx: you can[00:32:51] Alistair Pullen: source it from Cbench, actually.[00:32:52] Yeah, you can do it, you can do it super easily. And that's how we got that figure out at the other end. Um, for us being able to see it against, um, our historic models were super useful. So we could see if we were, you know, actually helping ourselves or not. 
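To make the retrieval idea above a little more concrete, here is a minimal sketch of the hypothetical-snippet trick Pullen describes: rather than embedding the English query directly, ask a model to draft what the answer might look like as code, embed the draft, and rank pre-embedded chunks by cosine similarity. The `embed` and `complete` callables and the prompt wording are placeholders for whatever models you have on hand, not Cosine's actual engine; and scoring the result against the lines a reference PR actually touched is then how a retrieval-accuracy figure like the 66 percent mentioned above can be computed.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Plain cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, chunks, embed, complete, k: int = 5):
    """Hypothetical-snippet retrieval over a pre-chunked codebase.

    chunks   -- list of (code_text, embedding) pairs built offline
    embed    -- callable: text -> np.ndarray (any embedding model)
    complete -- callable: prompt -> str (any code-capable LLM)
    """
    # 1. Draft a snippet that *might* answer the query; it only has to
    #    look like code, it does not have to compile.
    draft = complete(
        "Write a short code snippet that could plausibly implement "
        f"the following request:\n{query}"
    )
    # 2. Embed the draft instead of the raw English query, narrowing the
    #    gap between how prose and code look in embedding space.
    query_vec = embed(draft)
    # 3. Rank chunks by similarity and return the top k.
    ranked = sorted(chunks, key=lambda c: cosine_sim(query_vec, c[1]), reverse=True)
    return [code for code, _ in ranked[:k]]
```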
And initially, one of the biggest performance gains that we saw when we were work, when we did work on the RAG a bit was giving it the ability to use the LSP to like go to definition and really try to get it to emulate how we do that, because I'm sure when you go into an editor with that, where like the LSP is not working or whatever, you suddenly feel really like disarmed and naked.[00:33:20] You're like, Oh my god, I didn't realize how much I actually used this to get about rather than just find stuff. So we really tried to get it to do that and that gave us a big jump in performance. So we went from like 54 percent up to like the 60s, but just by adding, focusing on that.[00:33:34] swyx: One weird trick. Yes.[00:33:37] I'll briefly comment here. So this is the standard approach I would say most, uh, code tooling startups are pursuing. The one company that's not doing this is magic. dev. So would you do things differently if you have a 10 million[00:33:51] Alistair Pullen: token context window? If I had a 10 million context window and hundreds of millions of dollars, I wouldn't have gone and built, uh, it's an LTM, it's not a transformer, right, that they're using, right?[00:34:03] If I'm not mistaken, I believe it's not a transformer. Yeah, Eric's going to come on at some point. Listen, they obviously know a lot more about their product than I do. I don't know a great deal about how magic works. I don't think he knows anything yet. I'm not going to speculate. Would I do it the same way as them?[00:34:17] I like the way we've done it because fundamentally like we focus on the Active software engineering and what that looks like and showing models how to do that. Fundamentally, the underlying model that we use is kind of null to us, like, so long as it's the best one, I don't mind. And the context windows, we've already seen, like, you can get transformers to have, like, million, one and a half million token context windows.[00:34:43] And that works perfectly well, so like, as soon as you can fine tune Gemini 1. 5, then you best be sure that Genie will run on Gemini 1. 5, and like, we'll probably get very good performance out of that. I like our approach because we can be super agile and be like, Oh, well, Anthropic have just released whatever, uh, you know, and it might have half a million tokens and it might be really smart.[00:35:01] And I can just immediately take my JSONL file and just dump it in there and suddenly Genie works on there and it can do all the new things. Does[00:35:07] swyx: Anthropic have the same fine tuning support as OpenAI? I[00:35:11] Alistair Pullen: actually haven't heard any, anyone do it because they're working on it. They are partner, they're partnered with AWS and it's gonna be in Bedrock.[00:35:16] Okay. As far as, as far as I know, I think I'm, I think, I think that's true. Um, cool. Yeah.[00:35:20] Planning[00:35:20] swyx: We have to keep moving on to, uh, the other segments. Sure. Uh, planning the second piece of your four step grand master plan, that is the frontier right now. You know, a lot of people are talking about strawberry Q Star, whatever that is.[00:35:32] Monte Carlo Tree Search. Is current state of the art planning good enough? What prompts have worked? I don't even know what questions to ask. 
Like, what is the state of planning?[00:35:41] Alistair Pullen: I think it's fairly obvious that with the foundational models, like, you can ask them to think by step by step and ask them to plan and stuff, but that isn't enough, because if you look at how those models score on these benchmarks, then they're not even close to state of the art.[00:35:52] Which ones are[00:35:52] swyx: you referencing? Benchmarks? So, like,[00:35:53] Alistair Pullen: just, uh, like, SweetBench and so on, right? And, like, even the things that get really good scores on human evalor agents as well, because they have these loops, right? Yeah. Obviously these things can reason, quote unquote, but the reasoning is the model, like, it's constrained by the model as intelligence, I'd say, very crudely.[00:36:10] And what we essentially wanted to do was we still thought that, obviously, reasoning is super important, we need it to get the performance we have. But we wanted the reasoning to emulate how we think about problems when we're solving them as opposed to how a model thinks about a problem when we're solving it.[00:36:23] And that was, that's obviously part of, like, the derivation pipeline that we have when we, when we, when we Design our data, but the reasoning that the models do right now, and who knows what Q star, whatever ends up being called looks like, but certainly what I'm excited on a small tangent to that, like, what I'm really excited about is when models like that come out, obviously, the signal in my data, when I regenerate, it goes up.[00:36:44] And then I can then train that model. It's already better at reasoning with it. improved reasoning data and just like I can keep bootstrapping and keep leapfrogging every single time. And that is like super exciting to me because I don't, I welcome like new models so much because immediately it just floats me up without having to do much work, which is always nice.[00:37:02] But at the state of reasoning generally, I don't see it going away anytime soon. I mean, that's like an autoregressive model doesn't think per se. And in the absence of having any thought Maybe, uh, an energy based model or something like that. Maybe that's what QSTAR is. Who knows? Some sort of, like, high level, abstract space where thought happens before tokens get produced.[00:37:22] In the absence of that for the moment, I think it's all we have and it's going to have to be the way it works. For what happens in the future, we'll have to see, but I think certainly it's never going to hinder performance to do it. And certainly, the reasoning that we see Genie do, when you compare it to like, if you ask GPT 4 to break down step by step and approach for the same problem, at least just on a vibe check alone, looks far better.[00:37:46] swyx: Two elements that I like, that I didn't see in your initial video, we'll see when, you know, this, um, Genie launches, is a planner chat, which is, I can modify the plan while it's executing, and then the other thing is playbooks, which is also from Devin, where, here's how I like to do a thing, and I'll use Markdown to, Specify how I do it.[00:38:06] I'm just curious if, if like, you know,[00:38:07] Alistair Pullen: those things help. Yeah, no, absolutely. We're a hundred percent. We want everything to be editable. Not least because it's really frustrating when it's not. 
Like if you're ever, if you're ever in a situation where like this is the one thing I just wish I could, and you'd be right if that one thing was right and you can't change it.[00:38:21] So we're going to make everything as well, including the code it writes. Like you can, if it makes a small error in a patch, you can just change it yourself and let it continue and it will be fine. Yeah. So yeah, like those things are super important. We'll be doing those two.[00:38:31] Alessio: I'm curious, once you get to writing code, is most of the job done?[00:38:35] I feel like the models are so good at writing code when they're like, And small chunks that are like very well instructed. What's kind of the drop off in the funnel? Like once you get to like, you got the right files and you got the right plan. That's a great question[00:38:47] Alistair Pullen: because by the time this is out, there'll be another blog, there'll be another blog post, which contains all the information, all the learnings that I delivered to OpenAI's fine tuning team when we finally got the score.[00:38:59] Oh, that's good. Um, go for it. It's already up. And, um, yeah, yeah. I don't have it on my phone, but basically I, um, broke down the log probs. I basically got the average log prob for a token at every token position in the context window. So imagine an x axis from 0 to 128k and then the average log prob for each index in there.[00:39:19] As we discussed, like, The way genie works normally is, you know, at the beginning you do your RAG, and then you do your planning, and then you do your coding, and that sort of cycle continues. The certainty of code writing is so much more certain than every other aspect of genie's loop. So whatever's going on under the hood, the model is really comfortable with writing code.[00:39:35] There is no doubt, and it's like in the token probabilities. One slightly different thing, I think, to how most of these models work is, At least for the most part, if you ask GPT4 in ChatGPT to edit some code for you, it's going to rewrite the entire snippet for you with the changes in place. We train Genie to write diffs and, you know, essentially patches, right?[00:39:55] Because it's more token efficient and that is also fundamentally We don't write patches as humans, but it's like, the result of what we do is a patch, right? When Genie writes code, I don't know how much it's leaning on the pre training, like, code writing corpus, because obviously it's just read code files there.[00:40:14] It's obviously probably read a lot of patches, but I would wager it's probably read more code files than it has patches. So it's probably leaning on a different part of its brain, is my speculation. I have no proof for this. So I think the discipline of writing code is slightly different, but certainly is its most comfortable state when it's writing code.[00:40:29] So once you get to that point, so long as you're not too deep into the context window, another thing that I'll bring up in that blog post is, um, Performance of Genie over the length of the context window degrades fairly linearly. So actually, I actually broke it down by probability of solving a SWE bench issue, given the number of tokens of the context window.[00:40:49] It's 60k, it's basically 0. 5. So if you go over 60k in context length, you are more likely to fail than you are to succeed just based on the amount of tokens you have on the context window. 
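Two of the details above are easy to show in miniature. First, the patch-style training targets: rendering the post-change file as a unified diff against the pre-change file is both closer to how human work ends up represented and far cheaper in tokens than a full rewrite. A sketch using only the standard library, illustrative rather than Cosine's actual pipeline:

```python
import difflib

def patch_target(path: str, before: str, after: str, context: int = 3) -> str:
    """Render the desired change as a unified diff, the token-efficient
    alternative to having the model rewrite the whole file."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,
    ))
```

Second, the context-length analysis: bucket runs by prompt length and look at the solve rate per bucket, which is how an observation like "roughly 0.5 beyond 60k tokens" falls out. The data structures here are invented for illustration; the real analysis was done over fine-tuning and eval logs.

```python
from collections import defaultdict

def solve_rate_by_context(runs, bucket: int = 10_000):
    """runs: iterable of (prompt_tokens, solved_bool) pairs.
    Returns {bucket_start: fraction_solved} per context-length band."""
    wins, totals = defaultdict(int), defaultdict(int)
    for tokens, solved in runs:
        band = (tokens // bucket) * bucket
        totals[band] += 1
        wins[band] += int(solved)
    return {band: wins[band] / totals[band] for band in sorted(totals)}
```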
And when I presented that to the fine tuning team at OpenAI, that was super interesting to them as well. And that is more of a foundational model attribute than it is an us attribute.[00:41:10] However, the attention mechanism works in, in GPT 4, however, you know, they deal with the context window at that point is, you know, influencing how Genie is able to form, even though obviously all our, all our training data is perfect, right? So even if like stuff is being solved in 110, 000 tokens, sort of that area.[00:41:28] The training data still shows it being solved there, but it's just in practice, the model is finding it much harder to solve stuff down that end of the context window.[00:41:35] Alessio: That's the scale with the context, so for a 200k context size, is 100k tokens like the 0. 5? I don't know. Yeah, but I,[00:41:43] Alistair Pullen: I, um, hope not. I hope you don't just take the context length and halve it and then say, oh, this is the usable context length.[00:41:50] But what's been interesting is knowing that Actually really digging into the data, looking at the log probs, looking at how it performs over the entire window. It's influenced the short term improvements we've made to Genie since we did the, got that score. So we actually made some small optimizations to try to make sure As best we can without, like, overdoing it, trying to make sure that we can artificially make sure stuff sits within that sort of range, because we know that's our sort of battle zone.[00:42:17] And if we go outside of that, we're starting to push the limits, we're more likely to fail. So just doing that sort of analysis has been super useful without actually messing with anything, um, like, more structural in getting more performance out of it.[00:42:29] Language Mix[00:42:29] Alessio: What about, um, different languages? So, in your technical report, the data makes sense.[00:42:34] 21 percent JavaScript, 21 percent Python, 14 percent TypeScript, 14 percent TSX, um, Which is JavaScript, JavaScript.[00:42:42] Alistair Pullen: Yeah,[00:42:42] swyx: yeah, yeah. Yes,[00:42:43] Alistair Pullen: yeah, yeah. It's like 49 percent JavaScript. That's true, although TypeScript is so much superior, but anyway.[00:42:46] Alessio: Do you see, how good is it at just like generalizing? You know, if you're writing Rust or C or whatever else, it's quite different.[00:42:55] Alistair Pullen: It's pretty good at generalizing. Um, obviously, though, I think there's 15 languages in that technical report, I think, that we've, that we've covered. The ones that we picked in the highest mix were, uh, the ones that, selfishly, we internally use the most, and also that are, I'd argue, some of the most popular ones.[00:43:11] When we have more resource as a company, and, More time and, you know, once all the craziness that has just happened sort of dies down a bit, we are going to, you know, work on that mix. I'd love to see everything ideally be represented in a similar level as it is. If you, if you took GitHub as a data set, if you took like how are the languages broken down in terms of popularity, that would be my ideal data mix to start.[00:43:34] It's just that it's not cheap. So, um, yeah, trying to have an equal amount of Ruby and Rust and all these different things is just, at our current state, is not really what we're looking for.[00:43:46] Running Code[00:43:46] Alessio: There's a lot of good Ruby in my GitHub profile. You can have it all. Well, okay, we'll just train on that. 
For running tests It sounds easy, but it isn't, especially when you're working in enterprise codebases that are kind of like very hard to spin up.[00:43:58] Yes. How do you set that up? It's like, how do you make a model actually understand how to run a codebase, which is different than writing code for a codebase?[00:44:07] Alistair Pullen: The model itself is not in charge of like setting up the codebase and running it. So Genie sits on top of GitHub, and if you have CI running GitHub, you have GitHub Actions and stuff like that, then Genie essentially makes a call out to that, runs your CI, sees the outputs and then like moves on.[00:44:23] Making a model itself, set up a repo, wasn't scoped in what we wanted Genie to be able to do because for the most part, like, at least most enterprises have some sort of CI pipeline running and like a lot of, if you're doing some, even like, A lot of hobbyist software development has some sort of like basic CI running as well.[00:44:40] And that was like the lowest hanging fruit approach that we took. So when, when Genie ships, like the way it will run its own code is it will basically run your CI and it will like take the, um, I'm not in charge of writing this. The rest of the team is, but I think it's the checks API on GitHub allows you to like grab that information and throw it in the context window.[00:44:56] Alessio: What's the handoff like with the person? So, Jeannie, you give it a task, and then how long are you supposed to supervise it for? Or are you just waiting for, like, the checks to eventually run, and then you see how it goes? Like, uh, what does it feel like?[00:45:11] Alistair Pullen: There are a couple of modes that it can run in, essentially.[00:45:14] It can run in, like, fully headless autonomous modes, so say you assign it a ticket in linear or something. Then it won't ask you for anything. It will just go ahead and try. Or if you're in like the GUI on the website and you're using it, then you can give it a task and it, it might choose to ask you a clarifying question.[00:45:30] So like if you ask it something super broad, it might just come back to you and say, what does that actually mean? Or can you point me in the right direction for this? Because like our decision internally was, it's going to piss people off way more if it just goes off and has, and makes a completely like.[00:45:45] ruined attempt at it because it just like from day one got the wrong idea. So it can ask you for a lot of questions. And once it's going much like a regular PR, you can leave review comments, issue comments, all these different things. And it, because you know, he's been trained to be a software engineering colleague, responds in actually a better way than a real colleague, because it's less snarky and less high and mighty.[00:46:08] And also the amount of filtering has to do for When you train a model to like be a software engineer, essentially, it's like you can just do anything. It's like, yeah, it looks good to me, bro.[00:46:17] swyx: Let's[00:46:17] Alistair Pullen: ship it.[00:46:19] Finetuning with OpenAI[00:46:19] swyx: I just wanted to dive in a little bit more on your experience with the fine tuning team. John Allard was publicly sort of very commentary supportive and, you know, was, was part of it.[00:46:27] Like, what's it like working with them? I also picked up that you initially started to fine tune what was publicly available, the 16 to 32 K range. You got access to do more than that. Yeah. 
You've also trained on billions of tokens instead of the usual millions range. Just, like, take us through that fine tuning journey and any advice that you might have.[00:46:47] Alistair Pullen: It's been so cool, and this will be public by the time this goes out, like, OpenAI themselves have said we are pushing the boundaries of what is possible with fine tuning. Like, we are right on the edge, and like, we are working, genuinely working with them in figuring out how stuff works, what works, what doesn't work, because no one else is doing what we're doing.[00:47:06] They have found what we've been working on super interesting, which is why they've allowed us to do so much, like, interesting stuff. Working with John, I mean, I had a really good conversation with John yesterday. We had a little brainstorm after the video we shot. And one of the things you mentioned, the billions of tokens, one of the things we've noticed, and it's actually a very interesting problem for them as well, is[00:47:28] how big your PEFT adapter, your LoRA adapter, is going to be in some way, and like figuring that out is actually a really interesting problem, because if you make it too big, and because they support data sets that are so small, you can put like 20 examples through it or something like that, like if you had a really sparse, large adapter, you're not going to get any signal in that at all.[00:47:44] So they have to dynamically size these things, and there is an upper bound, and actually we use models that are larger than what's publicly available. It's not publicly available yet, but when this goes out, it will be. But we have larger LoRA adapters available to us, just because of the amount of data that we're pumping through it.[00:48:01] And at that point, you start seeing really interesting other things, like you have to change your learning rate schedule and do all these different things that you don't have to do when you're on the smaller end of things. So working with that team is such a privilege because obviously they're like at the top of their field in, you know, in the fine tuning space.[00:48:18] So we're, as we learn stuff, they're learning stuff. And one of the things that I think really catalyzed this relationship is when we first started working on Genie, like I delivered them a presentation, which will eventually become the blog post that you'll love to read soon. The information I gave them there I think is what showed them like, oh wow, okay, these guys are really like pushing the boundaries of what we can do here.[00:48:38] And truthfully, our data set, we view our data set right now as very small. It's like the minimum that we're able to afford, literally afford right now, to be able to produce a product like this. And it's only going to get bigger. So yesterday while I was in their offices, I was basically, so we were planning, we were like, okay, this is where we're going in the next six to 12 months.[00:48:57] Like we're putting our foot on the gas here, because this clearly works. Like I've demonstrated this is a good, you know, the best approach so far, and I want to see where it can go. I want to see what the scaling laws are like for the data. And at the moment, like, it's hard to figure that out because you don't know when you're running into like saturating a PEFT adapter, as opposed to actually like, is this the model's limit?[00:49:15] Like, where is that? So finding all that stuff out is the work we're actively doing with them.
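For context on what actually gets pumped through a hosted fine-tuning job like this: at the interface level it is just chat-formatted JSONL, one training example per line. The example content below is invented purely to illustrate the shape of the format, not Cosine's data, and the adapter sizing and learning-rate scheduling discussed above happen on the provider's side rather than in this file.

```python
import json

# Each line of the file is one training example in OpenAI's chat
# fine-tuning format: a list of role/content messages.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a software engineering agent."},
            {"role": "user", "content": "Issue: pagination crashes on the last page. Relevant files: ..."},
            {"role": "assistant", "content": "PLAN: clamp the page index before slicing.\nPATCH:\n--- a/pager.py\n+++ b/pager.py\n..."},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")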
And yeah, it's going to get more and more collaborative over the next few weeks as we explore like larger adapters, pre-training extension, different things like that.[00:49:27] swyx: Awesome. I also wanted to talk briefly about the synthetic data process.[00:49:32] Synthetic Code Data[00:49:32] swyx: One of your core insights was that the vast majority of the time, the code that is published by a human is in a working state. And actually you need to fine tune on non-working code. So just, yeah, take us through that inspiration. How many rounds, uh, did you do? Yeah, I mean, uh,[00:49:47] Alistair Pullen: it might be generous to say that the vast majority of code is in a working state.[00:49:51] I don't know if I believe that. I was like, that's very nice of you to say that my code works. Certainly, it's not true for me. No, but you're right, it's an interesting problem. And what we saw was, when we didn't do that, obviously, you have to basically like one shot the answer.[00:50:07] Because after that, it's like, well, I've never seen iteration before. How am I supposed to figure out how this works? So what you're alluding to there is like the self improvement loop that we started working on. And that was in sort of two parts. We synthetically generated runtime errors, where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.[00:50:39] So we threw some of those in with a, with a probability of happening, and on the self improvement side, I spoke about this in the blog post, essentially the idea is that you generate your data in sort of batches. First batch is like perfect, like one example, like here's the problem, here's the answer, go, train the model on it.[00:50:57] And then for the second batch, you then take the model that you trained before, that can look like one commit into the future, and then you let it have the first attempt at solving the problem. And hopefully it gets it wrong, and if it gets it wrong, then you have, like, okay, now the codebase is in this incorrect state, but I know what the correct state is, so I can do some diffing, essentially, to figure out how do I get the state that it's in now to the state that I want it in, and then you can train the model to then produce that diff next, and so on, and so on, and so on, so the model can then learn, and also reason as to why it needs to make these changes, to be able to learn how to, like, solve problems iteratively and learn from its mistakes and stuff like that.[00:51:35] Alessio: And you picked the size of the data set just based on how much money you could spend generating it. Maybe you think you could just make more and get better results. How, what[00:51:42] Alistair Pullen: multiple of my monthly burn do I spend doing this? Yeah. Basically it was very much related to, yeah, just like capital, and um, yes, with any luck that will be alleviated[00:51:53] swyx: very soon.[00:51:54] Alistair Pullen: Yeah.[00:51:54] SynData in Llama 3[00:51:54] swyx: Yeah. I like drawing references to other things that are happening in the wild, 'cause we only get to release this podcast once a week. Mm-hmm. The Llama 3 paper also had some really interesting
Thoughts on synthetic data for code? I don't know if you have reviewed that. I'll highlight the back translation section.[00:52:11] Because one of your dataset focuses is updating documentation. I think that translation between natural language, English versus code, and
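To summarise the two-part synthetic data process described above in code: inject occasional synthetic errors, train on perfect one-shot examples first, then let the previous checkpoint attempt each task and turn its mistakes, plus the diff back to the known-good state, into the next round of training data. Everything here (`train`, `attempt`, `diff`, `corrupt`) is a stand-in for the real pipeline, sketched only to show the shape of the loop.

```python
import random

def self_improvement_rounds(tasks, train, attempt, diff, rounds=2, err_rate=0.1):
    """tasks: list of (repo_state, problem, gold_state) triples.
    train(examples) -> model, attempt(model, state, problem) -> new_state,
    and diff(a, b) -> patch are placeholders for the real pipeline."""
    # Round 0: perfect one-shot examples, optionally corrupted so the model
    # also sees runtime-error-style failures it has to recover from.
    examples = []
    for state, problem, gold in tasks:
        start = corrupt(state) if random.random() < err_rate else state
        examples.append({"problem": problem, "before": start, "patch": diff(start, gold)})
    model = train(examples)

    for _ in range(rounds):
        for state, problem, gold in tasks:
            wrong = attempt(model, state, problem)   # previous checkpoint's try
            if wrong == gold:
                continue                             # nothing to learn from a success
            # Teach the next checkpoint how to get from its own mistake
            # back to the correct state.
            examples.append({"problem": problem, "before": wrong, "patch": diff(wrong, gold)})
        model = train(examples)
    return model

def corrupt(state):
    """Placeholder for the AST-level error injection described above
    (out-of-range indices, references to missing variables, etc.)."""
    return state  # intentionally a no-op in this sketch
```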

AWS - Conversations with Leaders
How Startup Parsyl is Revolutionizing the Insurance Industry with Generative AI

AWS - Conversations with Leaders

Play Episode Listen Later Aug 6, 2024 31:04


Hear from Mike Linton, co-founder and CTO of Parsyl, a data-powered risk management and insurance solution for essential supply chains. By leveraging real-time data and generative AI, Parsyl is able to offer more transparent pricing that incentivizes customers to improve efficiency and sustainability within their supply chains, ultimately reducing waste and emissions. This novel approach has allowed Parsyl to quickly scale and reshape what the insurance industry can look like.
Resources:
Amazon Bedrock: https://aws.amazon.com/bedrock/
Amazon Q: https://aws.amazon.com/q/
AWS Generative AI Innovation Center: https://aws.amazon.com/generative-ai/innovation-center/

Tech Disruptors
Coding Assistants, Cloud Migrations and Monetization with AWS

Tech Disruptors

Play Episode Listen Later Jul 30, 2024 42:39


“Underlying large language models have advanced significantly in the past six months, so what that's allowed us to do is to start looking at software development in a much more ‘agentic' approach, where the agent essentially becomes a collaborator with the software developer,” says Deepak Singh, vice president of next-gen developer experience at Amazon Web Services. On this episode of Tech Disruptors, Singh joins Bloomberg Intelligence senior technology analyst Anurag Rana to take a deep dive into the realm of generative-AI chatbots and coding assistants, which the company offers through its Amazon Q service. The two discuss how these gen-AI agents are boosting developer productivity and enticing cloud migration, and how AWS is monetizing this service, both on a seat and a consumption basis.

Silent Sales Machine Radio
#869: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Jul 27, 2024 28:46


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We captured the best moments and turned them into a podcast episode.

Here are the topics of the clips in the Q&A:
Keep Calm and Find More ASINS
Why using coupons and discounts aren't necessary in obtaining products to sell
Is it necessary to create an LLC with your business and how to go about it
How long should I wait before I contact Amazon because my inventory isn't checked in yet - there's a program to help expedite the process
Comparing and contrasting The System with other tools we recommend here
How is the Proven Amazon Course illustrated by the movie Matrix

Show note LINKS:

SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online or visit https://silentjim.com/free11

SilentJim.com/bookacall - book a call here to discuss our offers including coaching, legends and ProvenAmazonCourse.com course

My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 75,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!

ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!

Khang's episode is #826 (the $3million automated REPLENS seller) and his software link is SilentJim.com/thesystem

Keepa - episode 369 explains what Keepa is. Check out Keepa here: SilentJim.com/keepa

https://Humnbird.com - they help sellers get established on Amazon.com (sell in the US from anywhere in the world) They specialize in everything from Intellectual Property, Trademarks, Patents, Corporation Setup to Branding, Design and Marketing. We recommend their affordable systems and solutions giving you everything you need for your wholesale and private label business even overseas!

AWS - Conversations with Leaders
Harnessing AI to Revolutionize Drug Discovery

AWS - Conversations with Leaders

Play Episode Listen Later Jul 2, 2024 28:10


In this episode, Tanuja Randery of AWS sits down with Dave Hallett, Chief Science Officer and Interim CEO of Exscientia, to discuss how the company is leveraging AI and cloud technology to transform the drug discovery process. Hallett explains Exscientia's approach to improving efficiency and patient success rates, and explores the importance of partnerships and building diverse, high-performing teams in the rapidly evolving landscape of scientific innovation.
Resources:
Amazon Bedrock: https://aws.amazon.com/bedrock/
Amazon Q: https://aws.amazon.com/q/
AWS Generative AI Innovation Center: https://aws.amazon.com/generative-ai/innovation-center/

AWS - Conversations with Leaders
Using AI to Reduce Energy Consumption of the Planet's Building Portfolio

AWS - Conversations with Leaders

Play Episode Listen Later Jun 11, 2024 33:46


In today's episode, hear from Jean-Simon Venne, the CTO and co-founder of BrainBox AI, a company using artificial intelligence to optimize energy efficiency and reduce emissions in commercial buildings. Venne discusses how BrainBox AI's technology leverages AI and machine learning to autonomously manage building systems, leading to energy savings of up to 40% and significant emissions reductions across their portfolio of over 13,000 deployed buildings. Listeners will learn how implementing a virtual assistant powered by AI and natural language processing can help building managers and portfolio owners quickly identify issues, diagnose root causes, and implement solutions without extensive manual effort.
Resources:
Amazon Bedrock: https://aws.amazon.com/bedrock/
Amazon Q: https://aws.amazon.com/q/
AWS Generative AI Innovation Center: https://aws.amazon.com/generative-ai/innovation-center/

Silent Sales Machine Radio
#847: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Jun 8, 2024 31:53


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We had a long session last time so we captured the best moments of the first hour or so and turned them into a podcast episode.   Here's the topics of the clips for the Q&A:   Intro and explanation of a few of our programs Coach intro to Kickstart Bootcamp - Robin Joy gives an overview  Question about testing ASINs and discussion on Keepa Question about selling a product before inventory is officially checked in How to get support through Seller Central with the hard topics and how Jeff Schick's services are a help Another explanation of the above the buy box pricing and why it works   Show note LINKS:   https://silentjim.com/bb70 - see over 170 examples of Keepa charts of "above buy box" winners that are selling in Jim's account in real time RIGHT NOW.   https://JeffSchick.com - our legal expert for all things Amazon   SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online or visit https://silentjim.com/free11   New sellers often ask how to find profitable products but rather the question should be how to find “test worthy ASINs" or “underserved shelf space at Amazon!!” Jim has posted many examples on the My Silent Team Facebook page from his own personal inventory demonstrating exactly how this works! Jim recommends listeners go to Podcast #554 for an in depth discussion on this topic and Podcast #369 is a thorough discussion on Keepa! https://silentjim.com/podcast   SilentJim.com/bookacall - book a call here to discuss our offers including coaching, legends and ProvenAmazonCourse.com course   My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 75,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!   https://SilentJim.com/kickstart - if you want a shortcut to learning all you need to get started then get the Proven Amazon Course and go through Kickstart.      

Silent Sales Machine Radio
#844: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Jun 1, 2024 22:15


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We captured the best moments and turned them into a podcast episode. Today's episode has Brian and Robin Joy as co-hosts with great tips in business attitude and growth along with a road map of where to start in the Proven Amazon Course.

Here are the topics of the clips for the Q&A:
Robin Joy highlights The Proven Conference experience
Brian tells us where to start in PAC step by step
What is up with the Mastermind groups and how to utilize such a great opportunity
More thoughts on Keepa and how to simplify it

Show note LINKS:

SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online or visit https://silentjim.com/free11

SilentJim.com/bookacall - book a call here to discuss our offers including coaching, legends and ProvenAmazonCourse.com course

TheProvenConference.com - Get your tickets at the deepest discount now for our 2025 live event in Orlando! Hurry though - when the event tickets go live, the price goes WAY up!

Keepa - episode 369 at SilentJim.com/podcast explains what Keepa is. Check out Keepa here: silentjim.com/keepa

My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 75,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!

ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!

Silent Sales Machine Radio
#841: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later May 25, 2024 39:17


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We had a long session last time so we captured the remainder of the best moments and turned them into a podcast episode.    Here's the topics of the rest of the clips for the Q&A:   Jim repeats the introduction and gives background on SilentJim.com and tells us details about TheProvenConference Coach intro to Kickstart Bootcamp - Robin Joy gives an overview  Question about rank and how it works with Keepa knowledge Why is the word "and" better than the word "or" in business? A discussion and great tip to save time using your camera and how to find the ASINS Can we decide which warehouse to sell at and how is that actually helping allowing the Above the Buy Box strategy? The three data points for Keepa a seller should know are found in episode 612 What's the rule of thumb in profit vs investment? Is there a ratio? What model is better than wholesale - up and coming? New glasses and better perspective - don't think profit, think how can I find underserved markets   Show note LINKS:    SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online or visit https://silentjim.com/free11   https://SilentJim.com/bb70 See over 170 examples of Keepa charts of "above buy box" winners that are selling in Jim's account in real time RIGHT NOW.   SilentJim.com/bookacall - book a call here to discuss our offers including coaching, legends and ProvenAmazonCourse.com course   My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 75,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!        

Silent Sales Machine Radio
#837: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later May 18, 2024 32:56


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Here are the topics of today's clips put together by the team:   Jim gives us an introduction and background on SilentJim.com  Coach intro to Kickstart Bootcamp - Robin Joy gives an overview  A question about using a prep center and doing online arbitrage A question about creating a product listing in the Amazon catalog - do's and don'ts A question about percentages of online shopping and the opportunity of selling Great discussion and details on using the highly recommended tool Keepa Feedback on the bright future of ecommerce   Show note LINKS:    My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 75,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!   https://SilentJim.com/kickstart - if you want a shortcut to learning all you need to get started then get the Proven Amazon Course and go through Kickstart.   https://TheProvenConference.com - It's not too late to get the great price for our livestream of this event! It is less than $1 per session (and you get ALL event recordings!) Get the latest cutting edge ideas from our top performing students for less than $1 per session! Hurry though - when the event starts the price goes WAY up!   SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online or visit https://silentjim.com/free11   SilentJim.com/bookacall - book a call here to discuss our offers including coaching, legends and ProvenAmazonCourse.com course

Let's Talk AI
#165 - Sora challenger, Astribot's S1, Med-Gemini, Refusal in LLMs

Let's Talk AI

Play Episode Listen Later May 5, 2024 92:46


Our 165th episode with a summary and discussion of last week's big AI news! Read our text newsletter and comment on the podcast at https://lastweekin.ai/ Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai Timestamps + links: Tools & Apps (00:01:27) GitHub releases an AI-powered tool aiming for a 'radically new way of building software' (00:07:05) China unveils Sora challenger able to produce videos from text similar to OpenAI tool, though much shorter (00:12:23) ChatGPT's AI 'memory' can remember the preferences of paying customers (00:14:21) Rabbit R1 review: Avoid this AI gadget (00:18:30) Amazon Q, a generative AI-powered assistant for businesses and developers, is now generally available (00:19:54) Yelp's Assistant AI bot will do all the talking to help users find service providers Applications & Business (00:21:31) Video of super-fast, super-smooth humanoid robot will drop your jaw (00:25:22) Tesla's 2 million car Autopilot recall is now under federal scrutiny (00:29:32) Tesla shares soar as Elon Musk returns from China with FSD 'Game Changer' (00:32:11) OpenAI inks strategic tie-up with UK's Financial Times, including content use (00:35:21) OpenAI Startup Fund quietly raises $15M (00:37:00) Huawei backs HBM memory manufacturing in China to sidestep crippling US sanctions that restrict AI development Research & Advancements (00:39:20) Capabilities of Gemini Models in Medicine (00:45:34) Let's Think Dot by Dot: Hidden Computation in Transformer Language Models (00:52:20) NExT: Teaching Large Language Models to Reason about Code Execution (00:55:08) SenseNova 5.0: China's latest AI model surpasses OpenAI's GPT-4 (00:57:20) Octopus v4: Graph of language models (01:00:28) Better & Faster Large Language Models via Multi-token Prediction Policy & Safety (01:03:15) Refusal in LLMs is mediated by a single direction (01:09:19) Rishi Sunak promised to make AI safe. Big Tech's not playing ball. (01:15:09) DOE Announces New Actions to Enhance America's Global Leadership in Artificial Intelligence (01:18:21) The Chips Act is rebuilding US semiconductor manufacturing, so far resulting in $327 billion in announced projects (01:20:50) Analysis-Second global AI safety summit faces tough questions, lower turnout (01:24:03) Sam Altman, Jensen Huang, and more join the federal AI safety board Synthetic Media & Art (01:26:30) Air Head creators say OpenAI's Sora finicky to work with, needs hundreds of prompts, serious VFX work for under 2 minutes of cohesive story (01:29:50) Eight newspaper publishers sue OpenAI over copyright infringement

Silent Sales Machine Radio
#829: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later May 4, 2024 45:42


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Here are the topics of today's clips put together by the team:   Robin gives us an introduction and info on upcoming events on SilentJim.com and tells us details about TheProvenConference   Coach intro to Kickstart Bootcamp - Robin Joy gives an overview   Should we use Amazon's repricing services?   Do I need to delete my listings to avoid the low inventory fees?   What does chain of custody mean for our inventory and why is it important to sellers?   What is the purpose of finding at least 100 ASINs?   Show note LINKS:   TheProvenConference.com/orlando - come meet your fellow listeners to this podcast, dozens of our coaches and hundreds of business building warriors at our live event in May! Tickets are still available now!   https://JeffSchick.com - our legal expert for all things Amazon   My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 75,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!   https://SilentJim.com/kickstart - if you want a shortcut to learning all you need to get started then get the Proven Amazon Course and go through Kickstart.   https://silentjim.com/podcast - episode 820 is our panel discussion about the changes in fees and how they can actually benefit sellers https://silentjim.com/podcast/episode-820-how-scary-are-the-latest-amazon-fee-and-shipping-changes-whats-the-good-and-bad-news/

Leveraging AI
85 | GPT-4 is the dumbest model any of you will ever have to use! ChatGPT now has long-term memory and soon "search results", Anthropic has a "Teams" plan, Amazon releases Amazon Q for AWS, and more news for the week ending on May 4th

Leveraging AI

Play Episode Listen Later May 4, 2024 22:27 Transcription Available


This week's episode of Leveraging AI dives into the latest in AI, from groundbreaking updates and controversial legal battles to strategic partnerships that could reshape how we interact with technology. But what's really at stake with these advancements? How can you leverage these developments in your business or career? In this episode, you'll learn: ChatGPT's Long-Term Memory: How it works and what it means for Plus users outside Europe and South Korea. Legal Tensions: Insights into the copyright lawsuits facing OpenAI and the implications for AI content creators. Strategic Partnerships: The significance of OpenAI's deal with the Financial Times and what it means for AI-driven content attribution. Emerging Tools and Platforms: From Amazon's new enterprise AI services to Anthropic's team solutions - what you need to know. Software Development Revolution: How GitHub and other platforms are making coding more accessible and integrated. Isar brings a blend of deep industry knowledge and accessible insights to complex topics, helping professionals and enthusiasts alike stay ahead of rapid technological changes. Subscribe to ensure you never miss an episode, and share this podcast with colleagues who can benefit from staying on top of AI trends. Prepare your business for the future of AI by tuning in now! About Leveraging AI The Ultimate AI Course for Business People: https://multiplai.ai/ai-course/ YouTube Full Episodes: https://www.youtube.com/@Multiplai_AI/ Connect with Isar Meitis: https://www.linkedin.com/in/isarmeitis/ Free AI Consultation: https://multiplai.ai/book-a-call/ If you've enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Silent Sales Machine Radio
#825: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Apr 27, 2024 57:32


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Here are the topics of today's clips put together by the team:   Jim gives us an introduction and background on SilentJim.com and tells us details about TheProvenConference   Coach intro to Kickstart Bootcamp - Robin Joy gives an overview    A question about brands that are not allowed to be sold on Amazon and how Jeff Schick helps our community with legal advice   A question about pricing products above the buy box but getting a suppressed listing   A question about ASINs and pricing inventory   A question about who decides when a brand is gated, Amazon or the brand?   Feedback on the panel discussion of the recent Amazon fee changes   A question about the shipping cost increase and how to mitigate that cost in my business   Comments about funding our Amazon businesses   A question about whether you need a large living space to do this business   Show note LINKS:   TheProvenConference.com/orlando - come meet your fellow listeners to this podcast, dozens of our coaches and hundreds of business building warriors at our live event in May! Tickets are still available now!    JeffSchick.com - our legal expert for all things Amazon   SilentJim.com/buyorsell - our preferred partner for selling your Amazon account or buying a new or seasoned account!   Selling Above Buy Box strategy https://SilentJim.com/bb70 - this is a link to a Facebook post with over 70 examples of great ASINs selling above buy box for our host Jim Cockrum's Amazon seller account.    https://silentjim.com/ungating - a short discussion and video on ungating   My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 72,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!   https://SilentJim.com/kickstart - if you want a shortcut to learning all you need to get started then get the Proven Amazon Course and go through Kickstart.   https://JimCockrumCoaching.com - get a free session with a business consultant on our team at 1-800-994-1792 / 1-801-693-1688 or TEXT US at 385-284-7701 (US & Canada only for Text)  ALL of our coaches are running very successful businesses of their own based on the models we teach here! We've been setting the standard for excellence in e-commerce and Amazon seller coaching since 2002 with over 7,000 students served! Hundreds of our successful, happy students have been interviewed on our podcast! Or grab a slot on the calendar and we'll call you at https://silentjim.com/bookacall   https://silentjim.com/podcast - episode 820 is our panel discussion about the changes in fees and how they can actually benefit sellers https://silentjim.com/podcast/episode-820-how-scary-are-the-latest-amazon-fee-and-shipping-changes-whats-the-good-and-bad-news/

Silent Sales Machine Radio
#821: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Apr 20, 2024 31:33


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Here are the topics of today's clips put together by the team:   Jim gives us an introduction and background on SilentJim.com and the ProvenAmazonCourse.com   Coach intro to Kickstart Bootcamp - Robin Joy gives an overview    A question about the new shipping fees and a twist that works to the seller's advantage   A question about the upcoming Proven Conference schedule   A question about selling on Walmart and upcoming training for that   A question about the safety certificates needed to sell toys   Show note LINKS:   Episode #820 is our panel discussion about the changes in fees and how they can actually benefit sellers https://silentjim.com/podcast/episode-820-how-scary-are-the-latest-amazon-fee-and-shipping-changes-whats-the-good-and-bad-news/   SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online or visit https://silentsalesmachine.com/ssm11/ffv11/   Need a coach? Schedule a free consultation https://silentjim.com/bookacall   My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 72,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!

Silent Sales Machine Radio
#817: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Apr 13, 2024 23:07


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Jim gives us an introduction and background on SilentJim.com and the ProvenAmazonCourse.com   Coach intro to Kickstart Bootcamp - Robin Joy gives an overview    Podcast best hits -- Jim covers which episodes are most helpful and popular with our listeners of the Silent Sales Machine Radio podcast https://silentjim.com/podcast   They touch on cross-border selling and the advantages of doing so   Jim clarifies what a replen is and how knowing this is integral to your success   Lots of discussion on why Keepa is so important to your Amazon business   Question: My question is, I'm in Canada and I did start my journey as a private label seller and that didn't go too well, and then I joined you I think a year ago; should I sell here or in the US? You have a very unique advantage living in Canada. We've got some cross border training coming from Debbie on our team here and we're going to actually be rolling it out at the conference in May. https://theprovenconference.com   Question: My question is how do I do replens? I've always done replens and home goods products which sell very well, but I have to get my income up more. Replens definition: Let's talk about what a replen is real quick. The difference between where you are now and having a business where you pay someone to put tape on boxes (which is $12-$15 an hour work) is that you need to be spending time perfecting your Keepa skills.  You really have to dive into replens, find test worthy ASINs and build your book of ASINs. When you start off in the Proven Amazon Course you're going to start off with Amazon 101, then both Replens courses we offer in the PAC. You will see exactly where to start on the "get started" tab and that's going to keep you busy for the next three or four months.   Question: What podcast could help me understand Keepa charts? If you want a shortcut, get the Proven Amazon Course and go through Kickstart. https://silentjim.com/kickstart/ Podcast episodes which are helpful and fan favorites: episode 369 is the go-to beginner's introduction to how we use Keepa, then episodes 612, 554, 555 and 754 at https://silentjim.com/podcast For those of you who were around earlier when I mentioned it, podcast episode 754 is our interview with Khang Dang. He's brilliant and created a software program to help manage his multimillion dollar Amazon business. We're very proud of how low we're going to make the price on this thing. https://silentjim.com/thesystem   Question: Someone asked if there's any repricing training inside the Proven Amazon Course. Yes, there is. Bqool is one of our great sponsors. Repricer training - Bqool training module inside the ProvenAmazonCourse.com (direct link) https://learning.silentsalesmachine.com/members/courses/bqool/  Bqool webpage with group special https://SilentJim.com/repricer   Support: Questions for us? If you have any questions about anything that we talked about tonight you can always contact our support team at support@silentsalesmachine.com

Silent Sales Machine Radio
#813: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Apr 6, 2024 35:25


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode. Intro topics: Jim gives us background on the My Silent Team Facebook group and how to join. My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 74,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams. Proven Conference and Scholarship. Coach intro to Kickstart Bootcamp. Discussion on the 100 ASIN class - Robin Joy gives an overview.  Podcast overview -- Jim covers Coaches Corner and what happens on that episode each week on the Silent Sales Machine Radio podcast, and he talks about our upgrade to the podcast page - new LOOK https://silentjim.com/podcast   Questions from the group participants: Question - What is the difference between Jungle Scout and Keepa? One of the best tools we recommend is Keepa, which allows a quick decision based on good data. Listen to episode #369 at https://silentjim.com/podcast to hear the best information we teach about Keepa. Keepa link - https://silentjim.com/keepa   Question - I use the Bqool repricer and need help with a setting, can you help? Bqool - https://silentjim.com/repricer Facebook group - great question for our FB group https://www.facebook.com/groups/mysilentteam   Question - Can you still do RA when you change to a wholesale account? Podcast #807 is an example of building an Amazon business where our student uses several different business models in one selling account. PROVEN Branded Bundles - we're going to be adding this course to our PAC this month! https://ProvenBrandedBundles.com   Question - Does the old 3x rule for buying inventory still work selling on Amazon? The answer is it depends! Use a great tool to be well informed https://provenamazoncourse.com/revseller - gives you a quick estimate of the fees   Question - If I run out of stock on an ASIN and I'm not sure whether I'm going to continue with that ASIN, do I have to close out the listing or can I leave it as it is? Not an issue, but there are considerations here.   Question - If Amazon is on the listing should I avoid that ASIN? More thoughts on this strategy a bit later in the show.   Question - When I get good inventory listed I get a letter from the supplier saying that I am not authorized to sell; how do I deal with this? Jeff Schick is a terrific resource here https://jeffschick.com Asking on the MST Facebook group with the details hidden is helpful in discerning if the letters are legit. A sharp decline in sellers on the Keepa chart is a hint that the brand may be best to stay away from - "grumpy brands"   Question: Are we allowed to do Amazon to Amazon flips? Yep! Don't use your Prime account.   Question: Should I set my price higher than the Buy Box price for 3-4 weeks? Podcast #554 - Above the Buy Box pricing   Question - Why doesn't it always matter that Amazon is selling on the ASIN? Sometimes Amazon being on the listing is better for boosting the rank of your product.   Buy Box crucial information - Question: Do you try to win the buy box or match the buy box? 
https://SilentJim.com/bb70 - our Facebook chat with all of Jim's examples of selling at a higher price Podcast 554, 555, 612 - these episodes highlight Above the Buy Box pricing Be sure to be well informed via the Replens training in our Proven Amazon Course before diving too deeply into this strategy Suppressed Buy Box - Jim explains what that means and how it can be a plus on your product listings.

Silent Sales Machine Radio
#806: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Mar 23, 2024 96:14


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Here are several summary answers recorded for you:   Jim gives what he calls an “infomercial” to launch this Facebook live! The My Silent Team Facebook group is now 75,000 members strong! Jim has about 60 coaches on his team, all of whom have their own robust Amazon businesses! Jim has earned a living from e-commerce alone for over 20 years and he loves what he does!   Jim introduces Nathan Bailey, who has worked with him for 20 years now (primarily in the coaching department)! “The mission is bigger than the man” in this community and we really want everyone to succeed! Jim shares the candlelight vs cake mentality, which is a poverty vs abundance mindset that we really emphasize in this community! Jim and Nathan explain the “inch deep, mile wide” approach that we want new sellers to follow when starting. Podcast episode 554 talks about the “selling above the buy box” strategy.   Jim outlines 3 red flags to be on the lookout for when reading Keepa: if a product is listed as a generic brand - do not list on it!! (it is likely mis-branded) Jim discusses pricing requests that Amazon sometimes makes and how to handle those The guest asks Jim about product safety certificates and how he should go about handling the situation. Jim recommends listening to Podcast episode 754 to hear about the possibilities with the Replens model. Coach Nathan Bailey answers the guest's questions about trademarks and also shares his business which can help further with these questions -  https://humnbird.com   For more experienced sellers, Jim outlines some additional ways to make an online income including consulting work - https://provenproductpartnering.com for more info. Also an Amazon influencer program - https://provenAZinfluencer.com for more info. The Proven Amazon Course (PAC) also has over 30 strategies for making money on Amazon! Jim reminds the listeners of our live event https://theprovenconference.com, which is coming May 2024. It will have over 40 breakout sessions as well as many new strategies for Amazon selling!   Jim explains in detail what “Replens” means and how it is the lowest hanging fruit opportunity to make money on Amazon! It is low risk, has the lowest price point for entry and the highest odds of success. It is essentially finding the underserved or “test worthy” ASINs on Amazon and “replenishing” the products as they sell. This is just one of many ways to make money on Amazon but it's the best place for new sellers to start!  Jim also highly recommends new sellers listen to the following Podcasts - Episodes 369 - Keepa; 554-555 - how to find good products to test;  612 - the 3 step test; 754 - an incredible scaling of a Replens business at https://silentjim.com/podcast    Show note LINKS:   My Silent Team Facebook group. 100% FREE! https://www.facebook.com/groups/mysilentteam Join 74,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!
https://ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!

Silent Sales Machine Radio
#798: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Mar 9, 2024 69:36


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Highlights collected by the team: Jim gives an overview of the community and why we recommend new sellers start with the Replens Model, which has a “low, low, high” theme - low investment required, low learning curve and high odds of success! Jim also discusses the www.provenAZinfluencer.com program and how our community is getting excited about this newer program!    Jim discusses how there are discounts available for PAC and coaching students. Listeners can also email support for any questions or concerns: support@silentsalesmachine.com   Jim talks about how to use the MST Facebook page to find other sellers who are located close to you! He also highly recommends the Kick Start boot camp for those who have PAC as a way to jump start your business, meet other new sellers and create community with them! The Kick Start program is only $40 for 4 group coaching sessions - www.silentjim.com/kickstart and also has a private Facebook group so participants can connect! The Proven Conference is another great way to meet people and develop relationships that are so crucial to this business! www.theprovenconference.com The conference will be May 23-25, 2024 in Orlando!   Jim talks about how important relationships are in this business and how it can be very lonely without them!  The guest recommends a book called Fire Yourself First by Jeff Russell. Jim also recommends Business Secrets from the Bible by Rabbi Daniel Lapin, who will be the keynote speaker at this year's Proven Amazon Conference! As far as a recession goes, Jim's advice is to not participate!!   In May at the Proven Amazon Conference, Coaches Brian and Robin Joy are doing a workshop called 100 ASINs. They teach the 3 step check and the 4 week test to build a 5 figure business in 6 months if you are consistent! www.provenamazoncourse.com/100   Jim has suggestions and a word of warning for a newer seller in terms of utilizing a VA. Jim and Coach Robin Joy also give some guidelines for listing bundles: do not set up new listings if you are a newer seller - too many risks involved. www.provenbrandedbundles.com for a specific course on bundles and how to do it correctly!   New sellers often ask how to find profitable products, but the question should really be how to find “test worthy ASINs” or “underserved shelf space at Amazon!!” Jim has posted many examples on the My Silent Team Facebook page demonstrating exactly how this works! Jim recommends listeners go to Podcast #554 for an in-depth discussion on this topic, and Podcast #369 is a thorough discussion on Keepa!   The guest mentions the 3 step check that Robin Joy teaches in order to determine “the worst case scenario” (more info about this in Podcast #612). Jim talks about the pink line on the Keepa chart in making this decision   Show note LINKS:   My Silent Team Facebook group - https://www.facebook.com/groups/mysilentteam 100% FREE! Join 74,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies!
ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!                  

Silent Sales Machine Radio
#790: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Feb 24, 2024 76:42


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Jim shares how the team has coached over 10,000 students and there are so many success stories pinned at the top of the My Silent Team Facebook page! Jim discusses how we start practically all new sellers with the Replens Model - which basically means we are filling the underserved shelf space at Amazon. Jim likes to use the phrases “test worthy listings” or sell at break even (worst case scenario). Jim has posted MANY examples.   A guest shares how she is building her business but is discouraged with her profits, so she asks Jim how she can improve her numbers. “Picking losers is a great way to pick winners!” - it's a learning process! Robin Joy discusses the Kick Start Bootcamp, which is 4 group coaching sessions for sellers who have purchased The Proven Amazon Course (PAC) to help them get up and running. The listener asks Jim if the “new selection recommendation list” (a list of ASINs that Amazon provides) is a good idea to use. Jim explains how there are “good ways” and “bad ways” to use these lists as well as leads lists/buy lists.   The guest has questions about shipments going to multiple fulfillment centers (FC), which Coach Robin Joy answers. Jim also answers the guest's questions about hiring a VA and recommends listening to Podcast #714 for a more thorough discussion: “Very practical lesson learned after working with numerous overseas virtual assistants who help run my business.”   Jim and head of coaching Matt Thompson are so excited about 2024 and share several new and upcoming things that everyone can look forward to! Matt Thompson and Robin Joy discuss the pros and cons of The 100 ASIN challenge vs coaching in response to a guest's question.   The guest asks Jim about an issue he is having when he sends his inventory into FBA. He mentions how the “fear points” that sellers have are often new opportunities to learn from! Coach Matt shares the following mantra when working with replens: “Some will, most won't, next…”    Jim answers a listener's question, “I'm new, what do I do here?” Watch the following modules in the PAC: 1) Amazon 2) The Replens Course 3) Advanced Keepa training. Jim explains what the Replens model is and that the Kick Start program can be joined at any point once the PAC is purchased!   Show note LINKS:   Get a free business growth consulting session with an e-commerce consultant on our coaching team. Call: 1-800-994-1792 / 1-801-693-1688 or TEXT US at 385-284-7701 (US & Canada only for Text) or grab a slot on our calendar at https://SilentJim.com/bookacall   SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online! If you live outside of the US or Canada please email support@silentsalesmachine.com for a copy!   My Silent Team Facebook group https://www.facebook.com/groups/mysilentteam 100% FREE! Join 74,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!
ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!

Silent Sales Machine Radio
#782: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Feb 10, 2024 60:39


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   From the team highlights: Jim talks about the My Silent Team Facebook page which is now 74,000 members strong! The purpose of this Facebook group is to teach creative ways to make money online with Amazon as the primary focus because ½ of all online transactions occur on Amazon.com! Replens is the base model our group teaches for new sellers! The Proven Amazon Course is a learning library - “for on time learning!” 100% of Jim's income has been via e-commerce for the past 20 years! Jim's podcast, Silent Sales Machine Radio, is completely free to listen to. If interested in pursuing coaching, visit SilentJim.com/bookacall for more info     Jim discusses his good friend, Rabbi Daniel Lapin, and highly recommends his new book called The Holistic You - which integrates the 5 F's - faith, family, fitness, finance and friendship: https://www.amazon.com/Holistic-You-Integrating-Finances-Friendships/dp/1394163487   The conference will be held May 2024 in Orlando, FL! https://theprovenconference.com for more information! Jim feels this is one of the best Amazon conferences available because the people teaching the classes are actual Amazon sellers themselves and speak from personal selling experience!   Jim explains that about 95% of new sellers start by looking at Amazon selling through the wrong lens - a mindset shift is needed to look for “test worthy ASINs” as opposed to “profitable products!” Jim uses a bag of marshmallows as an example to illustrate how a new seller can look for test worthy ASINs. Jim also discusses the Kick Start Boot Camp (for PAC members) which includes 4 group coaching sessions with Coach Robin Joy for about $40!   Jim talks through an example of a profitable ASIN involving tea and recommends Podcast #369 to learn more about Keepa!   Jim explains his process for pricing (set a price and leave it for 30-45 days, then lower it to break even to get it moving if needed). Jim recommends listeners check out Podcast #554 - Ignore The Amazon “buy box” if you want to sell more Amazon Replens   Coach Robin Joy recommends the following categories for new sellers which tend to be open and a great place to start - home and kitchen, office supplies, arts and crafts, pet supplies (not food), patio, lawn and garden, industrial and scientific. Jim explains that as new sellers get 30-40 sales under their belt, Amazon will start to trust them and sellers will naturally become ungated in more categories!   The guest discusses difficulties he is having as an international seller with his billing (LLC) address and shipping address (prep center) not matching and, as a result, having a hard time getting ungated in Lego (and a different home address on file with Amazon). Jim talks through thoughts he has on this topic and recommends both the International Selling Course in the PAC and asking for help in the free FB group   Jim answers some FBA prep and ship questions regarding which items can go in a box together as well as expiration dates   Jim answers some questions regarding a newer course - www.ProvenBotSourcing.com and also mentions www.Profitseekerpro.com. Jim and the guest also discuss systems and how they help automate your business!
Jim discusses the Keepa graph and how far back in time sellers need to go in order to decide if it is a test worthy ASIN. Jim discusses avoiding listings that are on lead lists and show a significant rise in the number of sellers. Personally, he looks back 6-9 months in general, and a year if it's a seasonal item, to get a good idea of the selling history   Jim was personally invited to the Amazon Accelerate Conference and had the opportunity to ask questions at the director level of Amazon! (with Claire O'Donnell)   Jim discusses how Amazon divides sellers into 2 categories - those who are selling their own brand and those who are selling others' brands   Jim discusses how about 95% of the time it is best to sell via FBA. The premise of the guest's question is really when to also include FBM. Jim's opinion is that new sellers should really focus on FBA while learning the Amazon system   Show note LINKS:   Get a free business growth consulting session with an e-commerce consultant on our coaching team. Call: 1-800-994-1792 / 1-801-693-1688 or TEXT US at 385-284-7701 (US & Canada only for Text) or grab a slot on our calendar at https://SilentJim.com/bookacall   SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online! If you live outside of the US or Canada please email support@silentsalesmachine.com for a copy!   My Silent Team Facebook group https://www.facebook.com/groups/mysilentteam 100% FREE! Join 74,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!

Silent Sales Machine Radio
#778: A live Amazon Q/A with our Facebook members and PAC students

Silent Sales Machine Radio

Play Episode Listen Later Feb 3, 2024 35:58


Once per week or so our Amazon seller leadership team and I love to go live on Zoom with whoever can join us and answer as many ecommerce and Amazon selling related questions as possible. We love to capture the best moments from these Monday night sessions for you and turn them into a podcast episode.   Jim discusses the My Silent Team Facebook page and how the hundreds of inspirational successes are posted so new sellers can see that this works! It is easy for new sellers to become distracted with different business models but this group really emphasizes the Replens model as the place to start! This is not a get rich quick scheme but something that takes time to build and succeed with!   In this segment, Jim talks about what our group is, what we have to offer and why we're worth listening to!   Jim discusses how he has used the internet creatively for his multiple streams of income, which has supported his family for over 20 years!   The guest asks Jim a question about being able to source a large enough quantity of products in the store to be able to test an ASIN (Retail Arbitrage = RA). Jim also reminds the listeners how important relationships are in this business!   Although the Proven Conference 2023 has already happened, Jim discusses what the conference looks like and how it really is an “abundance minded community” - lots of relationships developed, friendships began and conversations had!  (https://theprovenconference.com/ for all the information you need for the 2024 conference!)   Jim discusses ungating and how ungated categories include: Sports and outdoor, household goods, kitchen and bath, office supplies, pet supplies (not food) and arts and crafts. New sellers can also go to the magnifying glass on the top of the My Silent Team Facebook page and search for “ungating” or “ungated” to read many posts on this topic!   https://provenamazoncourse.com/safebookprofits/ for a course on a newer way to sell books! Jim also gives a powerful example of the Nike brand and an experience on Amazon that all sellers should listen to!   Jim discusses the Replens model and how it can be a very profitable business model! According to Jim, “It's a beautiful business model!”   Show note LINKS:   Get a free business growth consulting session with an e-commerce consultant on our coaching team. Call: 1-800-994-1792 / 1-801-693-1688 or TEXT US at 385-284-7701 (US & Canada only for Text) or grab a slot on our calendar at https://SilentJim.com/bookacall   SilentSalesMachine.com - text the word “free” to 507-800-0090 to get a free copy of Jim's latest book in audio about building multiple income streams online! If you live outside of the US or Canada please email support@silentsalesmachine.com for a copy!   My Silent Team Facebook group https://www.facebook.com/groups/mysilentteam 100% FREE! Join 74,000 + Facebook members from around the world who are using the internet creatively every day to launch and grow multiple income streams through our exciting PROVEN strategies! There's no support community like this one anywhere else in the world!   ProvenAmazonCourse.com - the comprehensive course that contains ALL our Amazon training modules, recorded events and a steady stream of latest cutting edge training including of course the most popular starting point, the REPLENS selling model. The PAC is updated free for life!   Come meet your fellow listeners to this podcast, dozens of our coaches and hundreds of business building warriors at our live event in May! Tickets are on sale now! 
TheProvenConference.com/orlando/