Podcasts about code reviews

Activity where one or more people check a program's code

163 PODCASTS
264 EPISODES
50m AVG DURATION
5 WEEKLY NEW EPISODES
LATEST: Jul 22, 2026

POPULARITY (chart by year, 2019 to 2026)

Best podcasts about code reviews

Software Engineering Unlocked

16 episodes with code reviews

Smart Software with SmartLogic

5 episodes with code reviews

Tech Lead Journal

6 episodes with code reviews

The Bike Shed

3 episodes with code reviews

PodRocket - A web development podcast from LogRocket

3 episodes with code reviews

Reversim Podcast

5 episodes with code reviews

Software Engineering Daily

2 episodes with code reviews

Programming Throwdown

2 episodes with code reviews

Engineering Kiosk

5 episodes with code reviews

Latest podcast episodes about code reviews

Using Substitution to Make Decisions Simpler

Developer Tea

Jul 22, 2026, 18:30

There's a cognitive trick our brains play whenever we face a hard question or a difficult decision: we quietly swap the hard thing for something easier to answer. It's called substitution, and we do it constantly, usually without noticing and sometimes to our detriment. In today's episode, I make the case that you can take this same shortcut your brain already runs automatically and start using it intentionally to break decision paralysis and get moving.

The Substitution Heuristic: Understand the mental move at the heart of this episode. When "How are you?" becomes "Is there anything I urgently need to say?", your brain isn't being lazy, it's compressing. It runs a quick calculation: what's the downside of the substitute? Usually it's low, so the shortcut holds.

Code Review as Substitution: See why you almost never review a big PR line by line. "Was it tested? Does it follow our best practices? Does anything look obviously wrong?" are all stand-ins for the harder, mostly unnecessary work of reading every line, and most of the time they're enough.

Performance Reviews Are a Best Guess: Recognize that your rubrics, metrics, and frameworks are themselves substitutes. You can't deterministically rate a person on a scale, so every measure you use is an approximation of an immeasurable question: how much value is this person really generating, and where are they headed?

Turn the Trick Around on Purpose: Learn how to use substitution deliberately to defeat paralysis. Instead of "I'm deciding to leave my job and chase something new," substitute "I'm going to send an email." The weight of sending an email is tiny, and the dream job is just a series of small, iterative steps like that one.

Unbundle Your Big Decisions: Notice how we fuse trivial mechanical actions with heavy imagined meaning. Merging a branch isn't "declaring this production-ready and putting my name on the line"; it's moving bits into the cloud. Separate the labeling from the action and the action gets a lot less intimidating.

Lower Your Commitment Threshold: Swap "this code is great" for "I believe this code is shippable." Swap "this is the perfect hire" for "this is a good bet we can course-correct." The actions you take are identical, but the internal stakes, and the fear, drop dramatically.

Favor Reversible Moves: We consistently overestimate the risk of acting and underestimate the risk of doing nothing. Most decisions aren't permanent: you can roll back the deploy, divert from a bad hire, find an exit path. Look for the way back, and the decision gets easier to make.

Episode Homework: Next time you feel a big decision looming, break it into its most fundamental pieces. Ask: what am I actually doing here? What are the physical actions, the words, the mechanical steps? What's the true worst-case downside, and do I really have to attach all of my worth to it? You're likely already substituting easier questions without realizing it. This week, try doing it on purpose.


Owning Your AI Stack with Zach Daniel

Smart Software with SmartLogic

Jul 16, 2026, 60:23

In this episode of Elixir Wizards, Charles Suggs and Emma Whamond are joined by Zach Daniel, creator of the Ash Framework and Igniter, VP of Engineering at Remedy Meds, and upcoming ElixirConf keynote speaker, to talk about what sits between an LLM and useful engineering work. Zach reflects on how much has changed since his last appearance on the podcast in October 2024, moving from Igniter, code generation, and project patching into AI agents, context layers, and custom engineering workflows. The conversation explores how deterministic tools and probabilistic LLMs can work together, and why the most useful AI systems often depend on the structure built around the model. We also discuss why teams should be careful about outsourcing the systems that hold their organizational knowledge and decision-making. Zach shares his perspective on owning the AI stack, building internal knowledge systems, training junior developers in an AI-augmented world, avoiding vendor lock-in, and why Elixir may be especially well-suited for safer agentic workflows.

Zach will be a keynote speaker at ElixirConf 2026, September 10–11 in Chicago, and the Elixir Wizards will be there too! Join us and the broader Elixir community, and use promo code Elixirwizards for 10% off in-person or virtual tickets at https://elixirconf.com/

Key topics discussed in this episode:
Zach Daniel's work with Ash, Igniter, and AI tooling
How software development has changed since 2024
Deterministic code generation vs. LLM-generated code
Combining structured tools with AI agents
What it means to own your AI stack
Organizational knowledge as an engineering “spinal column”
Context layers, documentation, and internal workflows
Building custom agentic systems
Security, vendor lock-in, and open source LLMs
Junior developers and apprenticeship in the AI era
Why Elixir and the BEAM fit agentic workflows

Links mentioned:
Ash Framework https://ash-hq.org/
Igniter https://igniter.hexdocs.pm/
Phoenix Framework https://www.phoenixframework.org/
Remedy Meds https://remedymeds.com/
Keynote: Code Generators are Dead. Long Live Code Generators - Chris McCord | ElixirConf EU 2025 https://www.youtube.com/watch?v=ojL_VHc4gLk
https://phoenix.hexdocs.pm/Mix.Tasks.Phx.Gen.Live.html
LSP https://en.wikipedia.org/wiki/Language_Server_Protocol
Claude Code https://claude.com/product/claude-code
GitHub Actions https://github.com/features/actions
Harness Engineering https://en.wikipedia.org/wiki/Agent_harness
Claude SDK https://code.claude.com/docs/en/agent-sdk/overview
Zach's Twitter https://x.com/ZachSDaniel1
ElixirConf https://elixirconf.com/
AshConf https://luma.com/wz4z0iz6
Goatmire https://goatmire.com/

Special Guest: Zach Daniel.


Building AI that can search inside videos (and photos and audio too)

Remotely Curious

Jul 14, 2026, 31:58

Not all work happens in writing. Teams that work with photos, videos, and audio need AI that works for them too. This is why, with Dropbox, you can search within multimedia content for key moments and important information, not just text. In this episode, we talk with Appu Shaji and Hicham Badri, two Dropbox machine learning engineers who are part of the team that makes all of this possible. They explain how multimodal search works (from understanding the context of the initial query to identifying objects and actions in complex scenes) and how they ensure those models work fast, even at Dropbox-scale.

Working Smarter is brought to you by Dropbox. Find, organize, and share your work, all in one place, with context-aware AI from Dropbox. You can listen to more episodes of Working Smarter on Apple Podcasts, Spotify, YouTube, Amazon Music, or wherever you get your podcasts. To read more stories and past interviews, visit workingsmarter.ai.

This show would not be possible without the talented team at Cosmic Standard: producer Ben Montoya, sound engineer Aja Simpson, technical director Jacob Winik, and executive producer Eliza Smith. Special thanks to our illustrator Fanny Luor, marketing consultant Meggan Ellingboe, and editorial support from Catie Keck. Our theme song was composed by Doug Stuart. Working Smarter is hosted by Matthew Braga. Thanks for listening!

1019: LGTM, Ship It: The AI Code Review Problem

Syntax - Tasty Web Development Treats

Jul 8, 2026, 39:20

This episode tackles the growing pains of AI-assisted development, from the struggle of reviewing thousands of lines of agent-generated code to the mounting technical debt when teams merge PRs without meaningful human review. Scott and Wes also dig into local models, whether jujutsu really beats git, how freelancers should price work in the AI era, and getting your team on board with external libraries.

Show Notes
00:00 Welcome to Syntax!
00:45 Understanding AI-Generated Code
06:24 The Challenges of Code Review in AI Development
11:21 What the heck are local models? CJ's Guide to Local AI
16:09 Exploring New Tools: Jujutsu and Beyond
20:35 Exploring Version Control Innovations
22:18 Pricing Strategies in the Age of AI
24:52 webkit-box-reflect not in the browsers.
27:44 The Angular vs. React Debate
31:46 Sick Picks & Shameless Plugs.

Sick Picks
Scott: Huggingface Reachy Mini Robot
Wes: Bose QC35

Hit us up on Socials!
Syntax: X, Instagram, Tiktok, LinkedIn, Threads
Wes: X, Instagram, Tiktok, LinkedIn, Threads
Scott: X, Instagram, Tiktok, LinkedIn, Threads
Randy: X, Instagram, YouTube, Threads


The State of Career Growth in Elixir with Bruce Tate

Smart Software with SmartLogic

Jul 2, 2026, 55:35

In this episode, Elixir Wizards Charles Suggs and Emma Whamond are joined by Bruce Tate, founder of Groxio and longtime Elixir educator, to discuss career growth and mentorship in the age of AI. Bruce reflects on Elixir's evolution and explores how AI coding agents are changing the way developers learn, work, and advance. He explains his “anti-vibe coding” philosophy and recommends practices for maintaining code quality, arguing that many struggles attributed to junior developers are really failures of training and mentorship. Rather than using AI productivity gains to eliminate entry-level roles, Bruce challenges teams to reinvest that time in educating juniors to support their long-term career path.

Bruce will expand on this conversation at ElixirConf 2026, September 10–11 in Chicago, and the Elixir Wizards will be there too! Join us and the broader Elixir community, and use promo code Elixirwizards for 10% off in-person or virtual tickets at https://elixirconf.com/

Topics discussed in this episode:
The evolution of the Elixir ecosystem from conception to now
Rethinking career growth and mentorship in the age of AI
What “anti-vibe coding” means
Functional core and imperative shell design
OTP fundamentals and architectural literacy
Managing AI-generated code and pull requests
Mentorship beyond code review
Developing engineering judgment
API design as a core skill
AI as a force multiplier for junior developers
Reinvesting AI productivity gains in people

Links mentioned:
https://grox.io/
Seven Languages in Seven Weeks by Bruce Tate https://pragprog.com/titles/btlang/seven-languages-in-seven-weeks/
Programming Phoenix LiveView https://pragprog.com/titles/liveview/programming-phoenix-liveview/
BASIC Programming Language https://www.purebasic.com/
C++ Programming Language https://cplusplus.com/
COBOL Programming Language https://www.ibm.com/think/topics/cobol
Fortran Programming Language https://fortran-lang.org/
Bitter Java by Bruce Tate https://www.amazon.com/Bitter-Java-Bruce-Tate/dp/193011043X
Slashdot Effect https://en.wikipedia.org/wiki/Slashdot_effect
Slashdot https://slashdot.org/
Java Programming Language https://www.java.com/en/
Ruby Programming Language https://www.ruby-lang.org/en/
Lisp Programming Language https://lisp-lang.org/
Haskell Programming Language https://www.haskell.org/
Erlang Programming Language https://www.erlang.org/
Eric Meadows-Jönsson https://github.com/ericmj
Machine Learning in Elixir by Sean Moriarity https://pragprog.com/titles/smelixir/machine-learning-in-elixir/
The AI Collective https://www.aicollective.com/
Elixir Pipe Operator https://operator.hexdocs.pm/readme.html
Phoenix Framework https://www.phoenixframework.org/
Phoenix LiveView https://phoenix-live-view.hexdocs.pm/Phoenix.LiveView.html
Broadway https://hex.pm/packages/broadway
Designing Elixir Systems with OTP by James Edward Gray & Bruce Tate https://pragprog.com/titles/jgotp/designing-elixir-systems-with-otp/
The AI Coding Crisis Blog Series https://grox.io/blog/series/the-ai-coding-crisis/
Groxio Live https://grox.io/live

Special Guest: Bruce Tate.


How agentic AI works behind the scenes to find the answers you need

Remotely Curious

Jun 30, 2026, 31:35

When AI is at its best, the conversations can feel uncanny, almost magical in their accuracy, relevance, and speed. For that you can thank the AI agents that work together behind the scenes to search, reason, and sift through all your content to get you what you need to do your job. We talk with Jongmin Baek and Marta Mendez, two Dropbox machine learning engineers, about building conversational AI that's helpful, useful, and grounded in your team's shared context, so you can spend more time on the work that really matters.

The PHP Podcast 2026.06.25

php[podcast] episodes from php[architect]

Jun 26, 2026, 52:14

PHP Podcast – June 25, 2026
Hosts: Eric Van Johnson & John Congdon

Eric and John are back. Sara and Holly did a better job. Eric’s computer still hates him.

Eric’s Connectivity Saga: A Possible Resolution
For weeks, Eric has been dealing with a maddening streaming issue: he could see and hear everyone, but nobody could hear or see him. It only happened during Zoom, Slack huddles, and Restream sessions. No one could explain it, including Eric. The apparent fix came by accident: while helping his kid troubleshoot a similar issue, Eric pulled up his own DNS settings and discovered they were only pointing to his router with no upstream fallback. He manually added Google’s 8.8.8.8 and Cloudflare’s 1.1.1.1 and, for the first time in weeks, had zero issues all week. Does DNS explain streaming dropouts? Almost certainly not. Does it appear to have fixed it? So far, yes. Computers are stupid. Eric needs to retire and open a landscaping business.

Two Weeks Off: Road Trip, Pittsburgh, and a Graduation
John took the family on a road trip that included national park hikes, biking across the Golden Gate Bridge, and a swing through Universal Studios. Eric headed to Pittsburgh, technically West Virginia, for his niece’s high school graduation. Eric came away impressed with how much Pittsburgh has changed: what was once a gritty steel city has quietly become a genuinely beautiful place. He’s half-serious about looking at it as a future PHP Tek location.

Sara and Holly: The Better Hosts
Eric and John opened with a heartfelt thank-you to Sara Golemon and Holly Schilling for covering the last two weeks. John’s take: Sara and Holly showed up with documents, had a plan, and ran a tighter show. He’s joking about handing over the keys. Mostly.

PHP Architect Takes Over Laravel Magazine
Here’s the announcement Eric was teasing before the break: PHP Architect has taken over the Laravel Magazine brand.
It was originally a Statamic site, which Eric rebuilt from scratch over the past couple of weeks. There are no plans to create a print magazine; it will remain a web-only publication. Eric is thinking about opening it to outside contributors, and there’s a real possibility of a dedicated Laravel column eventually appearing in the PHP Architect magazine. The new consulting section Eric built for the site looks sharp enough that John immediately pointed out it looks better than what’s currently on phparch.com.

Foam Burner Feature: Proof of Concept Wars
John got a big feature in Foam Burner across the finish line (or at least mostly across), recorded a screencast, and sent it off to the people who care. The immediate response: a list of compliance concerns and edge cases. This is a proof of concept. John is sympathetic, having just had to tell Eric the same thing about Laravel Magazine. The cycle of building something you’re proud of and then having someone find the things that aren’t done yet is a universal developer experience.

Code Review in the Age of AI: What’s the Point?
John is in a strange spot: he’s doing careful code review on pull requests written entirely by Claude, submitted by non-developers. His detailed, educational feedback (the kind meant to help a developer understand why something was done a particular way) is just being fed back into Claude to generate revisions. Nobody’s learning anything. An incident during his vacation reinforced why the reviews matter: someone deployed AI-generated code that wasn’t well reviewed, it broke overnight, and the team had to revert it the next morning. His position: keep reviewing, even if the audience is an AI. But the nature of what you’re reviewing for has to change: you’re no longer nurturing a developer; you’re being a gate. Eric’s broader point: if you’re a developer who cares about the craft, don’t let AI make you lazy. Learn from how it implements things. Ask it why.
The thing that will differentiate developers when AI really matures is genuine understanding of what the code is doing, and that only comes from staying curious.

PHP Generics RFC: Closed
The Generics RFC was shut down while Eric and John were away, and Eric is genuinely disappointed. The proposal was for syntax-only generics: type annotations that static analyzers could read but that would be stripped at the opcode level, meaning no runtime performance impact. The goal was to standardize the generics syntax so PHPStan, Psalm, and other tools all read it the same way; right now they each implement their own dialect. Sara voted no (she explained her reasoning in the June 17 episode). Joe abstained. Whether an active abstain requires a deliberate action or is the default for a non-vote is apparently still a matter of some debate.

PHP Tek 2026 Talks Now on PHP Tech TV
Talks from PHP Tek 2026 are being uploaded to phptech.tv. Subscribers can watch the full library. Several speakers have given permission to make their talks free, and Ben Ramsey’s is one of them. John also added video progress tracking to the platform; it now remembers how far into a video you’ve watched.

This Week in PHP Internals (Artisan Build)
While looking for a PHP Internals podcast that Nuno appears to have started (possibly in connection with the PHP Foundation after PHPverse), Eric stumbled on a different show: This Week in PHP Internals, hosted on the Artisan Build site, with four episodes out. Eric doesn’t know who runs it but says it’s good: short, focused, and to the point. He’s also still looking for confirmation on what exactly Nuno’s new podcast is and who it’s for. For reference: the original PHP Internals podcast was Derick Rethans’ show, which he hasn’t updated in four or five years. The ecosystem growing new shows is a good sign.

PHP Friends RFC: Under Discussion
John has been watching the friends RFC, currently in the discussion phase.
The idea: a class can explicitly declare another class as a “friend,” granting it access to private properties without requiring inheritance. The canonical use case is a builder pattern: a UserBuilder that needs to set private fields on User without a thousand public setters, and without making those fields non-private. Holly, in chat, noted that the friend model is a special case of a “surfaces” model she proposed a few years back. She also shared that Swift doesn’t have protected at all (just public and private), something she initially found frustrating but has come to appreciate. Eric admits he’s been guilty of abusing inheritance over the years and is more thoughtful about it now. The RFC is still under discussion; no vote yet.

Eric Drops PHPStorm, Falls Back in Love with Vim
Eric canceled his JetBrains All Products subscription. Not because there’s anything wrong with PHPStorm (he’s explicit about that), but because he’s been doing so much work via Claude Code and making only targeted, smaller changes himself that the license fee no longer made sense. His replacement workflow: VS Code for some things, Vim for others. The Vim part was supposed to be supplemental. Instead, his terminal has taken over: it went from a panel alongside PHPStorm to taking up two-thirds of his screen to now living on its own separate virtual desktop. He’s running Spotify in the terminal. He briefly ran Slack in the terminal. He uses Tmux religiously. “I have problems,” he acknowledged. He would not take this back.
Links from the show:
Laravel Magazine: Now under PHP Architect
PHP Tech TV: PHP Tek 2026 talks now uploading
PHP RFC Wiki: All RFCs under discussion
PHP Tek 2027: April 27–29 (phptek.io)
PHP Architect Discord

Hosts:
Eric Van Johnson: X: @shocm, Mastodon: @eric@phparch.social, Bluesky: @ericvanjohnson.bsky.social, PHPArch.me: @eric
John Congdon: X: @johncongdon, Mastodon: @john@phparch.social, Bluesky: @johncongdon.bsky.social, PHPArch.me: @john

Streams: Youtube Channel, Twitch

Connect & Hire: PHP Architect Website, Twitter/X, Mastodon

Hire PHP Developers
Looking to hire PHP developers? Email support@phparch.com – Joe and the team are available for consulting, infrastructure work, Ansible playbooks, and code review.

Partners
This podcast is made a little better thanks to our partners:
Displace: Infrastructure Management, Simplified. Automate Kubernetes deployments across any cloud provider or bare metal with a single command. Deploy, manage, and scale your infrastructure with ease. https://displace.tech/
PHPScore: Put Your Technical Debt on Autopay with PHPScore
CodeRabbit: Cut code review time & bugs in half instantly with CodeRabbit.

Music Provided by Epidemic Sound https://www.epidemicsound.com/

Join Us Live Next Week: Youtube Channel
Got feedback? Join us on Discord at discord.phparch.com


Episode 118: The Agent Reviews (Folge 118: Der Agent reviewed)

@Autoweird.fm

Jun 22, 2026, 94:29

Summary: Today we talk about code reviews once more, and this time we address the elephant in the room: do we still need the traditional kind of code review in the age of Claude Code and co.? We cover lights-out codebases and agent output as compiler output. We also jump wildly from topic A to topic B, which roughly means that we are very unsure where we stand right now and where things are heading. But there probably won't be any going back. Or will there?


Champaign City Council 6-16-26 w/ Audio Descriptions

City of Champaign

Jun 17, 2026, 82:50

ORDINANCES AND RESOLUTIONS
CB2026-079 - A Resolution Reappointing Nicholas Kut to the Board of Fire and Police Commissioners
CB2026-080 - A Resolution Reappointing Michael La Due and Rajeev Malik to the Champaign Public Library Board of Trustees
CB2026-081 - A Resolution Reappointing Kenwood Sullivan and Lucas McGill to the Code Review & Appeals Board
CB2026-082 - A Resolution Reappointing Anthony Bamert, Gail Broadie, and Jon Roma to the Historic Preservation Commission
CB2026-083 - A Resolution Reappointing Francesca Morgan to the Housing Authority of Champaign County
CB2026-084 - A Resolution Reappointing Willie G. Comer, Jr. to the Human Relations Commission
CB2026-085 - A Resolution Reappointing Joshua Bubniak to the Citizen Review Subcommittee of the Human Relations Commission
CB2026-086 - A Resolution Reappointing Yvonne Miller to the Neighborhood Services Advisory Board
CB2026-087 - A Resolution Reappointing Paul Cole and Jeffrey Barkstall to the Plan Commission
CB2026-088 - A Resolution Reappointing Bridgett Wakefield to the Zoning Board of Appeals
CB2026-089 - An Ordinance Approving and Adopting the Annual Budget for the Fiscal Year Commencing July 1, 2026 and Ending June 30, 2027
CB2026-090 - A Resolution Adopting Financial Policies for the Development, Adoption and Execution of the Annual Budget
CB2026-091 - An Ordinance Establishing Rates of Compensation for Employees of the City of Champaign and Approving the Annual Position Control Report for the Fiscal Year 2026-2027
CB2026-092 - An Ordinance Establishing Rates of Compensation for Employees of the Champaign Public Library and Adopting the Annual Position Control Report for the Champaign Public Library for the Fiscal Year 2026/27
CB2026-093 - A Resolution Adopting the Ten-Year Capital Improvement Plan for Fiscal Years 2026/27 – 2035/36 and Adopting the Capital Improvement Policies
CB2026-094 - An Ordinance Amending Section 19-8.15.3 of the Champaign Municipal Code, 1985 (Stormwater Utility Fee)
CB2026-095 - An Ordinance Amending Various Sections of the Champaign Municipal Code, 1985 (Finance Department)
CB2026-096 - An Ordinance Amending the Champaign Municipal Code by the Addition of Chapter 36, Article IV and the Addition of Section 19-8.16-1 and Amending Section 19.1 (Vehicles for Hire – Bike Share Operators; Licenses and Permits)
CB2026-097 - A Resolution Accepting a Bid for Heating, Ventilation, and Air Conditioning (HVAC), Mechanical, Controls Support, and Repair Services (Public Works Department – Mechanical, Inc. dba Helm Service, Freeport, Illinois)
CB2026-098 - A Resolution Accepting a Bid and Authorizing the City Manager to Execute an Agreement for the 2026 Pavement Marking Project (Public Works Department – Varsity Striping & Construction Co.)
CB2026-099 - A Resolution Approving an Engineering Services Agreement with Clark Dietz, Inc. for the 2027 Concrete Street Improvements Project (Public Works Department – Clark Dietz, Inc.) (City Project No. 0726)
CB2026-100 - A Resolution for Improvement of Streets by Municipalities Under the Illinois Highway Code (Public Works Department – 2027 Concrete Street Improvements Project) (Project No. 0726)
CB2026-101 - A Resolution Approving a Change Order with Clark Dietz, Inc., to Provide Additional Design Engineering Services on Phase 2 of the Downtown Plaza Project (Public Works Department – Clark Dietz, Inc.)
CB2026-102 - A Resolution Accepting a Bid and Authorizing the City Manager to Execute an Agreement for Construction of Phase 2 of the Downtown Plaza Project (Public Works Department – Duce Construction Company) (City Project No. 0789)
CB2026-103 - A Resolution Authorizing the City Manager to Execute a Professional Services Agreement with Clark Dietz, Inc., to Provide Construction Engineering Services for Phase 2 of the Downtown Plaza Project (Public Works Department – Clark Dietz, Inc.) (City Project No. 0789)

Why don't more AI tools understand what matters to you?

Remotely Curious

Jun 16, 2026, 29:43

How do you build AI that actually understands you and the work you do? It all starts with having the right context. We talk with Dropbox staff product manager Noorain Noorani and principal engineer Sean-Michael Lewis about the art of context engineering and how Dropbox connects to all the tools your team needs for work, so you get AI that works wherever you do.

Vibecoding in the Company: How Marketing Teams Build Their Own Applications with AI

121STUNDEN talk - Online Marketing weekly I 121WATT School for Digital Marketing & Innovation

Jun 16, 2026, 41:17

You have an idea for a tool, a dashboard, or a browser plugin, but you lack the time, budget, or developer resources to build it? This is exactly where vibecoding becomes interesting for many marketers. With modern AI tools, you can now develop first prototypes, automations, or even complete applications, often purely through dialog with the AI. In episode #178 of the 121WATT Podcast, Patrick Klingberg and Alexander Holl talk with Dr. Christoph Röck about what is behind the term vibecoding, which tools are suitable for getting started, and why the topic is so exciting for marketing teams in particular. Here is what you can take away directly:


Coming soon: Working Smarter season three

Remotely Curious

Jun 2, 2026, 2:17

Modern work can be frustrating and chaotic if you don't have the right tools. From context engineering to multimodal search, go behind the scenes and hear how Dropbox engineers are building AI that actually understands you, so you can focus on the work that matters most. If you're new to Working Smarter, we've travelled from the F1 track to the bottom of a lake, and heard real stories from chefs, doctors, lawyers, and founders about how AI is helping them do more of what they love about their jobs. But in our third season, we're talking to the people behind the tools: the engineers and product leaders building helpful, time-saving AI features into the Dropbox experience you already know and trust. You'll hear all about their work on agents, inference, security, and, of course, how the people building AI use AI themselves.

Episode 117: Emotions and Code Reviews

@Autoweird.fm

Play Episode Listen Later Jun 1, 2026 84:01 Transcription Available

In today's episode, Benedikt and Holger talk about code reviews: about Conventional Commits and, especially, Conventional Comments. What is the point of each method? Are they equally important? Does one of them trigger a surprising amount of emotion? You have probably guessed it already: yes! Code reviews are about technology, but at least as much about communication. Have a listen!

ai development code emotion kommunikation technik methoden holger benedikt welchen sinn emotionen und code reviews

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later May 28, 2026 68:02

The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend. Get your tickets booked ASAP, as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!

One of the central tensions in the agents industry is that even while there are major decacorn agent labs like Sierra, Decagon, Notion and Cursor being built up, it is also true that it has never been easier to DIY agents, with a plethora of agent frameworks like LangGraph and Pydantic and Flue, and managed agents from Anthropic and Gemini and Amazon. There has been a wave of companies building their own background agents, from Shopify to Stripe to Paradigm to Razorpay, and even Cognition's friends Ramp have built their own coding agent with other friend Modal.

You'd think Cognition might feel a bit threatened, but they're not - even after all this, they were way oversubscribed for the $1B Series D they just announced.

Walden Yan, coiner of context engineering and Chief Product Officer/Cofounder of Cognition, invited OpenInspect's Cole Murray to talk about why the Devin is in the Details.

Full conversation live on the pod today.

In retrospect, async agents were the most AGI-pilled bet you could make in 2024 - the models weren't good enough yet to vibecode, people didn't trust AI enough to let it rip, and nobody (including early Cognition) was sure about the form factors. Now it is obvious:

* The first wave of AI coding tools made the developer faster but kept them heavily in the loop. Copilot and Cursor's tab autocomplete are prime examples. However, the work was still centered on, and bottlenecked by, the developer's local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.
* The second wave was local agents: Claude Code, Windsurf, Cursor's agents pane: first one and increasingly many terminals all running concurrently.
* The current Age of Async Agents points to a different future focused more on agent orchestration which drives end-to-end development.

According to previous guest Steve Yegge, there are eight finer-grained levels of agent adoption, but we have collapsed them into three.

As Cursor's Michael Truell put it in The third era of AI software development:

Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

The agent should not sit solely inside the developer's flow. It should be set up to work in the background so that you can give it a task, a repo, a machine, a shell, a browser, tests, memory, and review loops to go do the work somewhere else.

In less than a year, the sentiment has shifted from avoiding multi-agent systems to suggesting approaches that actually work.

From coining “context engineering” to building the infrastructure behind Devin's 7x PR growth and jump from 16% to 80% of commits across Cognition repos, Walden Yan has had a front-row seat to the background-agent shift.
In this episode, Cognition co-founder and CPO Walden Yan joins swyx alongside Cole Murray, creator of OpenInspect, to unpack why everyone is building their own Devin, what changed after the December 2025 model inflection, and why “spec to pull request” is now becoming a real production workflow.

We go deep on the architecture of background agents: harness-in-the-box vs out-of-the-box, why Devin separates the “brain” from the machine, why repo setup is still one of the hardest problems, why Docker is not always enough, and how full VMs, snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. Walden and Cole also dig into memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, PMs shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.

And as agents eat software… and software eats the world… you can draw the conclusion on what is next.

We discuss:

* Why the engineering world is waking up to background agents and cloud agents
* The December 2025 model inflection that made spec-to-PR workflows practical
* Devin's 7x merged PR growth and rise from 16% to 80% of commits
* Why Cole built OpenInspect as an open-source background-agent system
* The economics of $20/seat agent products and why monetization is tricky
* What Cognition actually sells beyond Devin: infra, onboarding, integrations, and adoption
* Harness in the box vs out of the box, and why architecture matters
* Why Devin separates the brain from the machine for security and permissions
* Repo setup, scoped secrets, Docker Compose, and agent-ready dev environments
* Why full VMs matter when agents need to run real applications and test them
* Android, macOS, Windows, nested virtualization, and machine-specific agent work
* Why testing is much harder than “computer use”
* Screenshots, video verification, and the “I know it works” merge moment
* GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments
* Why MCP alone is not enough for first-class Slack and enterprise integrations
* Memory, Knowledge, skills, Claude.md, and why retrieval is still unsolved
* Devin's auto-generated memories and the challenge of memory pruning
* Always-on agents as permanent PMs for issues, tickets, and product areas
* Sub-agents, meta-Devin management, and what multi-agent systems actually add
* Why pure auto-merge vibe coding breaks down after about two weeks
* AI code smells, lint rules, reward hacking, and Semgrep for agent-written code
* GitAI, inline context, and preserving the “why” behind code changes
* Local testing, mock servers, older codebases, and preparing companies for agents
* Windsurf 2.0 and the handoff between local foreground agents and cloud background agents
* SRE auto-triage, support workflows, and agents as first responders
* PMs, marketing, and non-engineers creating pull requests from Slack
* AI agent budgets, $1k-$5k per engineer spend, and hybrid frontier/sub-frontier systems
* The rise of autonomous coding factories and who Cognition is hiring

Walden Yan

* X: https://x.com/walden_yan
* LinkedIn: https://www.linkedin.com/in/waldenyan/

Cole Murray

* X: https://x.com/_colemurray
* LinkedIn: https://www.linkedin.com/in/colemurray/
* OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents

Timestamps

00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin's 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro

Transcript

Introduction: Walden Yan, Cole Murray, and Context Engineering

Swyx [00:00:00]: All right, we're in the studio with Walden Yan, co-founder of Cognition, CPO.
Walden [00:00:08]: Happy to be here.
Swyx [00:00:09]: Which is a cool title. And coiner of context engineering.
Walden [00:00:15]: Although I think there are many people who'd used the terms in various ways beforehand, but I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering or model wrapping into maybe a more thoughtful way to build agents.
Swyx [00:00:33]: For those who haven't caught up on that, I have on screen the Don't Build Multi-Agents post, which you should go read and we might refer to, and Cole Murray, who created OpenInspect.
Cole [00:00:43]: Great to be here.
Swyx [00:00:43]: So let's talk about it. Everyone is building their own Devins. What's going on?

The December Shift: From Handholding Models to Autonomous PRs

Cole [00:00:51]: So I think the engineering world is waking up to this idea of background agents, cloud agents, whatever you'd like to call it. And I think we saw a shift around the December timeframe of 2025, where the models Opus 4.5 and GPT 5.2, they reached a capability where we moved away from handholding the model and being able to actually more or less autonomously drive the model.
And what I mean by that is that we could pretty much go from a specification to a completed pull request, assuming the spec was good enough, with very little friction. And that paradigm alone, I think, changed a lot of how we interact with agents, and opened this world where background agents became more practical.
Swyx [00:01:41]: I think for Cole, everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was this moment which was, I think, Sonnet 3.7, where you guys rewrote Devin in one night or something. So describe 2025 or how it felt from your side.
Walden [00:02:01]: In retrospect, we always thought it was ramping up, but then even now, over the last three, four months from today, it's been ramping up even faster. So it's almost funny to be talking about how big of a leap Sonnet 3.7 was, and honestly, a lot of it was stripping out parts of Devin that were no longer needed with that jump in intelligence. But I also just think that a lot of the recent leaps, especially if you look at models like Opus and the latest GPT models, they are reaching levels of autonomy where people are actually finding that they can just be hands-off. And people who were once debating, “Oh, do I need to be in the weeds with my model in the IDE? Can I just completely move it off into the cloud?” That's a more serious conversation, and we've seen that in all of our growth charts. Internally there's this funny graph where our usage, of PRs, our merged PRs, has grown 7X since, I forget what it was called.
Swyx [00:02:57]: I think Dev maybe tweeted that. Yes.
Walden [00:03:01]: It grew like 7X over the last, I think it was, two months, three months, something like that. And then you see our engineering headcount growth. It's gone up by 10% or something.
Swyx [00:03:11]: We were, we were afraid to release this.
So this is Devin commit percentages on all Devin repos: it was 16% in January and now 80% in March.
Walden [00:03:25]: It's a big shift right now. And so it makes sense that a lot of people are now thinking about buying Devin, but also maybe trying to build their own. I have a lot of fun building Devin, so I can see why other people would want to build their own cloud agents as well. Well, maybe it's good to hear what initially inspired you to try to build OpenInspect?

OpenInspect: Ramp, Cloud Agents, and Open Source

Cole [00:03:49]: OpenInspect came about primarily through my clients, observing how they were using tools like Claude, OpenAI's Codex at the time, and seeing some of the friction that they were having with it. Primarily, Claude was being used through Slack, and a big issue they ran into was that the sessions that were launched were specific to whoever called it via Slack. And so if a PM was the one who invoked the session and they would then go to pass context to engineering, engineering can't see the session. And that in itself was a deal breaker because the PM says, “Hey, engineering, can you jump in?” But there's nothing to jump in on unless they're copy-pasting out the single response that came back. And so seeing some of these problems, I had built a similar architecture internally, just to experiment with, test out different ideas as this trend of moving off of localhost was starting. And as Ramp released their blog post, I had a lot of the pieces for this already in place, and just thought it would be funny to see what Claude could do just purely from the blog post. And on my X account, there's actually a thread where I live-tweeted going through this,
Cole [00:05:14]: comparing GPT and Claude as both of them are going through it.
Swyx [00:05:17]: On the announcement thing or something else?
Cole [00:05:19]: Right after it got released. We can put it in the show notes.
Yeah, it was helpful that I already knew how to verify the system. I knew what I was looking for. I think Ramp did a great job of really illustrating the technical aspects of how to build something. It was much more than just like, “Hey, we built a great system.” It was, “And here's how you can build it too.” And so, I resonated a lot with that, just with the problems that I was already seeing, and I thought that, looking around, I didn't really see anything in the open source community that met this type of system. I think there's a lot that run in localhost, like Superset, Conductor, and many others. But nothing that was actually running in the cloud. And so, I built it, and I thought it was interesting to just open source it and allow anyone to then have a foundation that they can mix and match on top of.

The Business of Background Agents: Open Source vs. Devin

Swyx [00:06:16]: So literally after Devin was launched, there was OpenDevin, which became All Hands. I don't know if you tried that or
Walden [00:06:22]: I was going to say, one of the things that interested me a lot with OpenInspect was, you didn't try to go make it then something you monetize. There are a lot of, I think, these open source projects that would then go and really try to raise V
Swyx [00:06:36]: That's why no OpenDevin. Yeah.
Walden [00:06:38]: Yeah, and how did you think about that? I thought that was very interesting.
Cole [00:06:44]: What I thought, and just what I had seen across my clients, was that having a background agent system is going to become a critical piece of infrastructure within their company. And so because of that, I wanted to open source it so that they could fork it and put in whatever customization they wanted. To that question though, I get asked all the time, “Oh, are you going to raise? Are you going to turn this into a service?”
Walden [00:07:08]: I'm sure you've gotten offers.
Cole [00:07:09]: But primarily I don't want to do that for a few reasons.
One, I think that I don't want to compete for $20 a seat. I think that is just a really difficult business. I think it's very easy to copy the main pieces of it. Again, I built this fairly quickly. And I think because you are not owning, I guess, the entire stack, it's hard to monetize. You have money being made at the sandbox layer with Daytona, E2B, many other players. You have money being made at the model layer. And you sit in this weird in-between gray area where what are you actually selling? You're selling, I guess, the infrastructure. You're selling the integrations maybe.
Swyx [00:07:55]: Let's ask the guy. What are you selling?
Walden [00:07:59]: Well, yeah, there's multiple layers to this in practice, and actually it's funny you mentioned the infrastructure, ’cause when we got started building Devin as well, we had to go figure out how to make the infrastructure as well because,
Swyx [00:08:10]: You had to build this two years before everyone else?
Swyx [00:08:15]: Including the model side
Walden [00:08:17]: It was not very polished at the start. When we just built it off of raw VMs from cloud providers like EC2, the boot-up time was so slow, I think. And especially then, turning off the machines, saving them, and then being able to bring them back up again when you want Devin to wake up again later. It would just be out cold for like 10 minutes because that's just how long these systems took. They were not built for this repeated down-and-up usage. And so we actually had to go do all of that. And as a result now, one thing we offer when we go and sell Devin to people is, you don't have to worry about all the compute side of things. We'll make it work. We'll make it work in your cloud if you want it to.
But aside from the product, and I want to go into the agents and the tuning of the intelligence part later, I think a big part of what we do at Cognition as well is to just make sure that your company learns and uses and adopts these coding agents. ’Cause I think for especially the largest enterprises in the world, you find that there are a lot of people who want to move over to using AI for their day-to-day workloads. But because of the way projects are planned, because not everyone is literate in using AI in these ways, having a team of engineers who can actually go in and onboard you, set up all the integrations you need, the automations you need to really get to that level of leverage with AI, is super helpful. And so we do that. We act as thought partners to the customers that we work with as well.
Swyx [00:09:56]: So let's talk about architectural stuff. I think that is something that was the topic of conversation between the two of you. Is this the mental model that you want to start with or something else? I'll just leave the floor open to you guys.

Agent Architecture: Harness in the Box vs. Out of the Box

Cole [00:10:11]: I think maybe we can start here as just a general what are the pieces of a background agent system. And then maybe we can go into some of the nuances of decisions that you can make.
Swyx [00:10:22]: But I guess, maybe what Walden is saying is the agent is like in this open code box, I guess. Right? This is infra, and then that's the agent. And you had this discussion about whether you put the agent in here or out externally. Can you tease that out?
Cole [00:10:39]: In a background agent system, you have a decision to make of where the agent is actually going to run. This is typically described as the harness in the box or out of the box. With running the agent in the box, you're making some trade-offs by doing that. The negative trade-off you're making is primarily security.
Because the agent is running in that box, unless you otherwise design it, all of your secrets need to go into that box as well. And given the nature of AI, it can be unpredictable, and you could very easily end up accidentally exfilling your secrets, or other unintended behavior. Now, the out of the box is the idea that we are going to have the actual agent running not directly in the sandbox, and we will have, quote-unquote, the brain of the agent running in some type of worker, control plane. That sandbox then is going to serve as the hands, where the brain is basically operating and making tool calls into that environment to manipulate it. I guess the other trade-off that you're making between the two systems is that, in my opinion, running it out of the box is much more complex because you have state that has to be managed, whereas if you're running it in the box, all of the state of that agent is actually in the box, and yes, you could persist it elsewhere, but it's all localized and you have fewer concerns to worry about.
Walden [00:12:08]: I think a lot of what you mentioned is why we actually from the start built Devin to, what we called, separate the brain from the machine. The other thing that this allows you to do is reuse any existing infrastructure you have for dev boxes, perhaps. And so you don't have to worry as much about making a new type of dev box that has all the dependencies the brain needs and, as you mentioned, the secrets the brain needs as well. One thing that we've seen some customers run into is, you have a GitHub app and you want Devin, your agent, whatever, to be able to interact with GitHub through this application, but then you have different users with different actual permissions. If they are all interacting through the same GitHub app and there's no actual separation between the system that decides what it does and the actual secrets on the machine, then you run into an issue where, okay, it's hard to do the separation.
But in practice, with Devin, it's much easier because we just say whatever you put on the machine, that is the scope of basically what the user is free to do, what the agent is free to do. So only put the most scoped secrets on that machine, and then the brain is fully not accessible from the machine. So you don't have to worry about messing with any of the most secure parts of the brain if the user is free to do whatever they want with the machine.
Swyx [00:13:31]: I was going to just bring, I have this chart from OpenAI, where I don't know if this is in the box, out of the box. That is something that they do use to describe it. And then also recently Anthropic did managed agents
Swyx [00:13:44]: Which is, this is their thing. I don't know. It's all variations of the same pattern, right?
Cole [00:13:49]: So this would be out of the box.
Swyx [00:13:51]: Which is preferable for them because it's less work?
Cole [00:13:56]: I would say it's more work.
Swyx [00:13:58]: It's more work?
Cole [00:13:58]: But, in my opinion, it is the better architecture of the two. It's just, you're taking on a bit of complexity by doing that.

Repo Setup, Docker, and VM-Based Development Environments

Walden [00:14:07]: One thing I've not seen a lot of other players do well is how do you manage what's actually on the box? And this can be complex for many reasons. Let's say you have a big repository that's changing and updating a lot with changing dependencies. How do you make sure that the working environment of the agent actually stays up to date, has all the credentials it needs to, let's say, run the app and test it, and all the things you want your autonomous
Swyx [00:14:34]: So a repo setup.
Walden [00:14:35]: Exactly. So internally at Cognition, we call this repo setup.
Cole [00:14:39]: The hardest part of
Walden [00:14:40]: It's been a perennial problem since the start of the company, of how do we help people get this set up?
Because not everyone just has working cloud environments working out of the box. And do you find this to be a common problem with
Swyx [00:14:53]: How do you solve it?
Walden [00:14:53]: Your clients?
Cole [00:14:54]: This is a very common problem, and through my consulting, this is a lot of what I help teams do. A lot of teams don't really have great developer environment setups, if any. A lot of the time it's, “Go talk to Bob and get the secrets,” and that obviously doesn't work when the agent needs to actually set this up. And so most teams are using Docker Compose or some type of microservices. And so for the
Swyx [00:15:19]: Even in prod?
Cole [00:15:20]: Not in prod. With OpenInspect, you are using this primarily to interact and make code changes. There are other use cases, but whether through CLI, MCPs, other tools, you can then hook that into your production systems, primarily for SRE-type use cases. But you are not necessarily trying to test your prod internal microservice through the system.
Walden [00:15:48]: And you mentioned Docker Compose. I think one direction we saw some of our friends take early on was using Docker containers as the level of abstraction for their models. There's lots of reasons, I think, why Docker containers are not great. One thing is, a Docker container's not really a true security boundary, for one. But the other is, if you are running real applications, a lot of times those applications use Docker, and then you have to think about Docker in Docker, which is really weird. And so I think, the really hard challenge of getting VMs to work, why did we do that? Well, it was because we realized that you actually needed full VMs to be able to do these types of things. And especially nowadays, where there's actually value in running the application and clicking around and sending you screen recordings of these things. The value just keeps adding on top of that.
But it is a decision I see people run into when they try to build their own systems: “Oh, in addition to this, do we put the agent in the machine or out of the machine? Do we use Docker? Do we use something else?” What do you recommend to people nowadays?
Cole [00:16:57]: I think Docker is a good solution for maybe not running the agent, but running your infrastructure, because that is more or less the same setup your engineers are probably already using. If they're not, then I don't know what they're using. But they're probably already using Docker Compose.
Swyx [00:17:14]: I've always had a small candle for web containers. I don't know if you guys have tried them before.
Swyx [00:17:19]: To me, they were supposed to be like Docker Light.
Cole [00:17:22]: Is it?
Swyx [00:17:22]: I don't know.
Cole [00:17:22]: No, I haven't tried it. But yeah, I think any environment that you've set up that is a good experience for your developer naturally lends itself to being easy to set up for the agent. And once you figure out that local developer story, you've more or less solved the agent-in-a-sandbox environment setup. OpenInspect does have hooks as well, where you can run a setup shell script that will pre-install everything. You can then pre-snapshot that build so it starts instantly, and then there is a second hook to actually then restore the state of the sandbox when it comes back. And so you can already have all of those microservices running and basically get the same experience that you would on your machine within the sandbox.

Testing Agents: Computer Use, Screenshots, and Real App Workflows

Walden [00:18:08]: Another thing that we've been thinking a lot about is like different VM service offerings. Have you had customers where they needed like macOS-specific VMs or like Windows-specific
Walden [00:18:20]: VMs?
Walden [00:18:22]: There are like many technologies in the world that only work on specific types of machines, right?
If you're building a .NET application that has to run on Windows or, maybe more commonly, if you want to build iOS or macOS. Does that work
Swyx [00:18:32]: Does Cognition support
Swyx [00:18:33]: Choices like that?
Walden [00:18:35]: The fundamental architecture, because we do the separation, does support it, but the actual work is in progress right now on these. Another thing that we've actually recently added support for, it's in beta, is doing Android development. To do that, we needed to support, I think, nested virtualization within our machines, because the VM itself is a virtualized Firecracker instance, and then you had to then run another Android emulator inside. And there's like weird performance issues, which is why it's still in beta. We have to think through these problems, but it unlocks a lot for anyone who wants to do Android development.
Swyx [00:19:13]: I was trying to find like a reference video for the testing thing. I couldn't find it, but I think you worked on the testing capability. Why call it testing and not like computer use or, I don't know, what's the general category of problem?
Walden [00:19:26]: I think that when people think about the ability of an AI to run your app and test it, I think they actually over-index on the computer use part of it, because computer use in my mind is the literal, okay, you know what button you want to click. Can you emit the right coordinates to go click that button? I think testing is actually a really interesting like
Walden [00:19:48]: Problem-solving challenge for these AIs, because if you wanted to do arbitrary testing, imagine you make a change that spans the frontend and the backend, maybe even some other, even more deeply nested service. To actually test that change, we have to reason through, how do you first run these applications to orchestrate with each other with the right version of the code?
Then, okay, how do I trigger the feature or how do I make the thing actually happen? And this can get arbitrarily hard. Maybe you have to be an admin. Maybe a certain thing has to be feature flagged on. Maybe you have to like run two sessions and then send a very specific word into one of them to trigger a specific behavior. And figuring out how you do that requires a lot of code base context, requires a lot of orchestration that we've specifically done. And in some cases, we found that no one frontier model can actually do this full end-to-end task itself.
Walden [00:20:42]: We've seen cases where we actually had to orchestrate different frontier models together to solve this problem. That is where we spend most of our time when we think about this testing problem, not so much the computer use part. Computer use, for what it's worth, has gotten a lot better with recent models and it's made that part of the job certainly easier.
Swyx [00:20:58]: Especially with like even 4.7, that they released yesterday, apparently like way better in terms of the vision stuff, which is going to be encompassing computer use.
Walden [00:21:08]: Having evals for all these as well is something that takes a while to build up. And having the evals be right is tricky as well. Do you ever see like clients who are building their own agents have to start standing up evals to make sure things don't regress?
Swyx [00:21:25]: Not so much evals in the traditional sense, but specific to the testing part that has just gone in. I just added support for screenshots. And in theory you can also do video. I need to put in a plugin to do that. But they do show up natively, and it was a very heavily requested feature, especially after Cursor's recording came out. I think that was very enlightening for everyone of like, “Oh, this is a very good feature to actually have.” I think with Devin you guys have had this for a while.
Swyx [00:21:57]: Oh, yeah.
See how screenshots work. Yeah, I don't know if there's anything super non-obvious. It's like once you know what feature to build, you can just prompt it and it will mostly work.
Walden [00:22:09]: I think to Walden's point, though, the computer use is a subset of the larger testing problem, and I think that's very specific to the code base that you're working in, and it's not something that you could just solve out of the box. You do need the code base context to actually know how to test it. And I think in the case of a background agent system, you fortunately do have that code base locally, can see what is changing, and could then inspect it and use that to drive the model.
Swyx [00:22:40]: For those who haven't seen it before, this is an example of how it works. After the PR is done, you click testing approved, and then it sends you back a video. What I really like is that it labels, it's very small here, but it actually labels what it's testing. And then you actually see the cursor and everything. So I don't know, yeah, the engineering in this, just whatever you want to show. ’Cause this is one of those, like, few AGI moments, right? ’Cause once I look at this, I actually wish I could just merge inside of Slack instead of going to GitHub, ’cause I don't need to see the code. I know it works.
Walden [00:23:19]: Maybe a new feature in Cursor. Yeah, the annotations at the bottom were also a big difference for me when I added those.
Swyx [00:23:27]: It's just like, what am I looking at? What are you trying to demonstrate?
Walden [00:23:30]: Exactly. There's a surprisingly long tail of small details that ends up making a big difference for this end metric of like how fast do you actually merge the code in. One experience that we spent a lot of time tuning early on was what is the right experience on GitHub for these tools.
Because I think with most tools out there, when you build the agent, you'll think, oh, it'll create the PR for you. We tried to take that a step further and say, “Oh, what if we actually made sure you could interact with Devin directly on GitHub?” And so we made sure that you can comment on GitHub, and Devin would actually receive those comments and address them. But there's actually quite a bit of tuning you have to do here. We now have Devin Review, for example. Devin Review will post comments on his own PR, and then Devin has to then go

GitHub Workflows: Devin Review, Comments, and PR Automation

Swyx [00:24:23]: He answers his own comments, which is really loopy. So, yeah, I like that it just updates here that I have commented. But usually it's just me saying, “Hey, merged, fix any merge conflicts.”

Walden [00:24:37]: So when Devin fixes his own comments, you might be scared that, oh, maybe it'll infinite loop. But we've put a lot of work into making sure it doesn't, both by making sure that the comments are high signal, and by making sure the agent is thoughtful about which comments it immediately goes and tries to fix, and which comments it answers with, “Wait a second, I think you're wrong.” Actually, one of my favorite moments is when Devin tells me that I'm wrong when I try to get it to do something different. But tuning that behavior actually makes a big difference in terms of how useful the actual GitHub experience is.

Cole [00:25:06]: To touch on that as well, I think having the AI reviewer integrated into the system is a critical part of this background system. OpenInspect does have that. It has a GitHub code reviewer whose prompt you can control. It does do comments as well. It doesn't do them automatically yet. The capability is there, but it's not fully used.

Swyx [00:25:27]: So you have to ask for it?

Cole [00:25:28]: You do, yeah.
You can tag it on GitHub, and then whatever you named your, GitHub bot, it will then follow up on it. It will then, if you have merge conflicts or whatever you have asked it to resolve, it will then resolve it, but it doesn't do it automatically yet.Integrations: Slack, MCP, and First-Party Agent InterfacesWalden [00:25:42]: Well, I'm curious, what is, the most common thing that people end up requesting, that they still need on top of OpenInspect when you help them go implement it?Cole [00:25:52]: I think a lot of it comes down to actually integrating it into the company. It's one thing to have the background agent system set up, but if it isn't actually integrated into your larger ecosystem, it isn't that useful. It is useful to be able to kick off sessions, but what we really want to be able to do is hook it into all of our other systems, whether that is the production database with read-only credentials, the logs, a Confluence or internal knowledge-based system. I think that is where I see the huge leap for companies, and that can be a challenge for companies as well who are maybe not familiar with exactly how to approach it, especially if they're in environments that have more compliance type things where, access control can be pretty big and how do you deliberately think about these problems, I find to be, one of the problems that comes with a system like this.Walden [00:26:46]: The thing we found is So, MCPs, obviously it has been like this, really big explosion of, oh, you can go, integrate it with all these different things. But to actually get the integration right and the and get the right experience, oftentimes we found that we had to go build our own ad hoc things. I think Slack is a great example of this. You could give your agent a Slack MCP and okay, it can post messages back to you on Slack. But we actually use Devin like a coworker in Slack, and that's how it's been built from the ground up. 
But to do that, you actually need to, support webhooks that come back, right? And then Devin has to respond in a natural way and then hopefully don't spam your threads too much and annoy the people in your company. So you got to tune that experience just right. Especially when there's a lot of back and forths, we find that we actually have to go beyond the simple MCP integrations in these places.Swyx [00:27:39]: I just pulled up the MCP marketplace. I know this is a Fair amount of work. Is the answer to eventually take first party control of all the top MCPs? Is that theWalden [00:27:48]: I would love a world where you could have something that's more expressive than MCP. That, goes both ways, not just a set of tools, but a proper system that interacts back and lets it Have the right experience with all these interfaces.Swyx [00:28:03]: So there actually is sampling in the MCP spec, but nobody Uses it, right?Walden [00:28:07]: And so I think that's the other part is, actually we found that when the MCP spec starts to get too complicated, it starts to lose its original promise of Being like a simple one-step connect. Now then we have to go figure out how to support all these different variations of things and It starts to look a lot like just building the first party integrations in a lot of these cases now.Cole [00:28:29]: I think it matters, too, how critical it is to your company, right? If this is something that nearly every session is going through, it probably makes sense to own it so that you can make optimizations on top of it Versus just whatever is off the shelf.Swyx [00:28:43]: Awesome. Other than MCPs, what else, sorry, well, I don't know if that's Narrowing in too much on, integrations. But what else? 
What other elements of building OpenInspect or Devin did you guys really sink time into?

Memory and Knowledge: What Agents Should Remember

Cole [00:28:59]: I think a problem that comes up very frequently is this idea of memories or a knowledge base.

Swyx [00:29:05]: Oh, boy. How do you solve it?

Cole [00:29:08]: So, not solved yet, is the short answer.

Cole [00:29:11]: There's an open issue for it, someone asking about it.

Swyx [00:29:16]: DeepWiki hasn't indexed anything about memory yet.

Cole [00:29:20]: How I'm seeing it solved across my clients is primarily through skills. I find that skills can be a good stopgap for that, or updating CLAUDE.md, but I think memory as a whole is a pretty unsolved problem, and it is why I've been hesitant to add it. I think there are parts of memory that can be addressed, but as a whole it's a very difficult retrieval problem.

Swyx [00:29:44]: Oh my God. Ramp didn't write anything about memory? I see zero search results.

Walden [00:29:50]: No. Memory can be quite tricky to get right because it's the retrieval, but also the generation of the memories, that can be really tricky. You don't want it to just remember very specific details.

Swyx [00:29:59]: Walk us through the Devin memory journey, because I know there's been a journey.

Walden [00:30:03]: The first version of memory that stuck around for a while was a system we have called Knowledge. And the idea was we wanted it to pick up things over time and not need the user to be proactive about teaching Devin things. So any time you remind Devin, “Wait, no, that's not quite the way you're supposed to use Git,” we actually want Devin to say, “Hey, do you want me to actually just remember this for the future?” And for you to just quickly approve or reject, and for it to build up over time. ’Cause I find that 95%, I think, or some crazy stat like that, of the memories that Devin has are all through these auto-generated things.
Very few people actually just want to sit down and write big docs on Here's how you're supposed to work with the technology, et cetera. The generation and the retrieval has been something that we've been trying to tune a lot over the years. Generation, you don't want it to remember something like, if you asked one time to like, “Oh, please open as a draft PR,” you don't want to be like, “Oh, everyone forever now should get their PRs as draft PRs.” But you do want some, conveyor. Maybe you want to say like, “Oh, Cole generally likes, things to be created as draft PRs.” Same with retrieval, if you have thousands of these memories, how do you actually make sure they're retrieved at the right time? And that can be quite tricky to do right without exploding the context with a bunch of useful yeah, useless information. Surprising amount of just, eval work to just make sure that, memory is, remains a reliable system as new models come and go.Cole [00:31:31]: Do you have anything that you could share on, memory pruning? And like the temporal aspect of memory?Swyx [00:31:36]: Deleting and forgetting?Walden [00:31:39]: The, today, the, So the things they could do is it could edit memories. And so if your memory used to say like, “Oh, Cole likes to open everything as like a draft PR,” then you can imagine, “No, don't do that.” And then it'll say, “Oh, do you want me to update the memory to be Cole now want everything as, open PRs?” I think that at the same time we don't know if this is going to be the final version of the system. Whatever we have here will probably, translate into the new system that we'll be coming up with. But I think one big difference between two years ago and today is these agents are really good at using anything that resembles a file system natively. And so part of us are, is thinking, “Oh, should we rebuild memories to feel more like a file system that we let the agent navigate on its own?” That's been an interesting exploration. 
Also similar ideas in the scale space.Swyx [00:32:35]: I am pulling up OpenClaude's memory thing right now. So memory, OpenClaude has like this like daily memory journal thing, right? And you can I mean, that is a file system you can grep through and is a source of truth. I don't know if it's the best. It's probably super noisy, but at least, if you lose something you can discover it or you can apply some, forgetting algorithm to, more ancient memories that don't get recalled again or something. I don't know.Walden [00:33:01]: One thing we've been trying to do to push the boundaries of how you use agents at your company is letting an agent basically have a very similar file, a memory.md or something, and just like be your permanent PM for a specific set of issues maybe. So we have like some Slack channels internally, maybe a Slack channel dedicated to, a specific product like DeepWiki maybe. And you can imagine that, or you want a Devin that never stops, it's just always awake, but it has this like memory dock that it can just maintain for itself about, okay, what are like the number one priorities of what we have to fix and prioritize? Who is responsible for some upcoming work? Maybe they'll even Devin will even tag you on some recurring basis. And so it's been an interesting move to see, okay, how can we actually use Devin for more than just engineering? Can we actually upstream above the engineering process and maybe it's just Devin creating tickets, which then maybe some humans do, but then maybe other Devins do.Swyx [00:34:00]: One of my more fun automations is go research competitors and just suggest stuff to me on a weekly basis. That's the automation. I can't find it right now, but basically it just like, “Look at competitors and suggest things.” “And here are three things that you've suggested that I don't want any more of,” and you just stick that in the prompts. 
But like I wish actually So for like when I, for example, when I reject a PR, I wish that it updated memory so that I can then just not have to go up, go back and update the scheduled, sync, but anyway, feature request.Walden [00:34:31]: what? We might change it soon. I guess OpenInspect, in the time you've been around, has there been anything you tried to implement but then you had to like undo and like do a different way?OpenInspect Architecture: Webhooks, Control Planes, and Agent StateCole [00:34:41]: Nothing yet, but something that is on my mind. The initial way that I built it was that each of the integrations lives as its own package. And so you have The Slack bot, which is what's handling the webhooks, and then is basically interacting with the control plane. As I'm seeing the system starting to be more integrated, specifically with the GitHub bot integration, I'm considering bringing that all into the central control plane because especially now I want to start, And a request that I'm getting is the ability to monitor, the actual, pull requests being merged, as well as just tracking ofSwyx [00:35:19]: What do I have open?Cole [00:35:21]: What do I have open? How many of these are getting merged? How many comments are showing up? To just understand the health of the system. And so in the case of a GitHub app, you only have one webhook. And so then it's a question of do I put that webhook in that GitHub bot package? That's weird. It doesn't really make sense to live there because that package is more for like the code reviewer. Or do I like centralize it? So that's something that's on my mind of, making that decision. I think the other one we touched on earlier is the harness in the box versus out of the box. I think long term the architecture will eventually come back out of the box. Some of the newer tools that I've added are calling back into the control plane so that you don't have the secrets in the sandbox. 
And so I think long term I probably will pull the actual, agent out of the box, but I think for now it's fine.Subagents and Multi-Agent Systems: When Parallelism Helps or HurtsSwyx [00:36:16]: Just, a quick question on pulling the agent out of the box. I'm One thing I'm very bullish on this year is agents calling other agents or spawning sub-agents or Whatever you want to call it. Does that make it harder or easier? I can't tell. Because if the harness is in the box, you can just spin up more boxes. If the harness is outside the box, then you're, it's less easy because you are, you have a unicorn pet of a, of a harness that's, living outside the box.Cole [00:36:45]: In theory it would be the same way, right? Whether, one agent has launched many, sub-sessions within it, OpenInspect, for example, can launch sub-sessions and actually create other environments and then monitor them. In the case where it is out of the box, that would basically just be an additional session that's running. And so that session is also running outside of the box. It's running in your worker plane, wherever you're running this. And then you really just have to think about how does your top level agent then interact with it. I do think it can be more complex, just ‘cause again, you have now a more difficult architecture. But I think if you figured it out once, it's probably fine.Swyx [00:37:26]: Well, then I'm just, throwing it open to you in terms of, I call this like meta Devin management. Which is like the, Devin's calling Devins or Devin scheduling Devins or querying trajectories or anything like that. What have you built or unshipped, anything?Cole [00:37:46]: I think one of the surprising things we've seen is that a lot of the ways that, these, separate agents work with each other, and you want them to, parallelize their work, has still mostly followed the same manager sub-agents regime. 
And a lot of people, I think, are excited about this world where you have swarms of agents that talk with each other all over the place. We've actually given Devin an MCP so they can just go arbitrarily message other Devins and create new Devins, et cetera. But it somehow creates a really chaotic world in that sense. And so we've still found that most practical use on a day-to-day basis has been one single Devin.

Cole [00:38:33]: Figuring out how to segregate the work and have other Devins work on it in a relatively isolated sense, each with their own boxes, not sharing machines, so there's very little room for conflict, is the regime that you have to create today.

Swyx [00:38:50]: I'll call out the experiments from Cursor, right? This is Wilson Lin's work on single agent to multi-agent, and you're obviously famously on the side of don't build multi-agent. But they went through the whole thing, only to arrive at this, which is exactly what Devin has, I think.

Cole [00:39:08]: I think there will be a revision to that post at some point about

Swyx [00:39:12]: Tell us about it.

Cole [00:39:12]: I think multi-agents were very much not at all possible a year ago. You do see more multi-agent experiments today, but you can argue, are they really multi-agents, or are they just tool calls? There are people who will create sub-agents to go look for XYZ file, XYZ implementation. That has really nice context management benefits because all of the tool calls and tokens that the sub-agent spends then get collapsed back to just the answer for the main agent. There are a lot of benefits to doing this. We basically have Devin do this with DeepWiki: make a call out to DeepWiki, get back the results. But that feels like a tool call. It's not like these are two collaborators actually talking back and forth with each other.
But I think the thing that makes me most bullish that multi-agents might actually be possible is what I said earlier: Devin will actually sometimes tell me I'm wrong and push back, and I think that demonstrates a level of maturity and communication today that makes a multi-agent world possible. Can two agents who have seen different information come back to each other and actually figure out who is right, what is the correct implementation? They're not just yes-men. Claude, I guess, used to just say, what is it? “You're right,” or,

Swyx [00:40:25]: “You're absolutely right.”

Cole [00:40:26]: “You're absolutely right.” Yeah.

Swyx [00:40:28]: Have you seen, did you see

Cole [00:40:29]: That age is over.

Swyx [00:40:30]: The Codex app troll Anthropic? This is the Codex app. Inside of Settings, there's a little Easter egg, right? So if you go to Themes or Appearance, there are all these color codes, and the top one is “Absolutely,” and it's Anthropic's colors. Which is such a troll. Anyway.

Model Behavior: Pushback, Adversarial Prompts, and Agent Skepticism

Cole [00:40:53]: I love that Easter egg. Did you discover that yourself?

Swyx [00:40:54]: No, someone was tweeting about it, and I was like, “Is this true?” Because sometimes people just tweet stuff to get a rise out of you. But yeah, there you go, in Anthropic colors.

Cole [00:41:06]: Yeah. So we're out of this regime where it just says you're absolutely right, and they can have real conversations and real back-and-forths.

Swyx [00:41:13]: You can prompt it as well to be more adversarial or whatever. Yeah. Okay. I mean, to me, that is more intelligence, right? That is not just a dumb tool, it's actually pushing back on you, I think. Yeah.

Cole [00:41:24]: When you mentioned the Cursor blog posts.
There was one blog they had where they fed a swarm of agents together and built a browser.Swyx [00:41:34]: That was I think that was the one.Cole [00:41:36]: You can have, likeSwyx [00:41:37]: I think it's the same oneCole [00:41:37]: Creation of it. We found a surprising success of, don't do a swarm or anything, just have one Devin, it does its own context management. Just let it keep running for a while and give it some crazy tasks. I think we asked it to, rebuild, a Windows OS system. And it managed to do it just like, going on for long enough. It'sSwyx [00:41:55]: Was this Andrew's thing?Cole [00:41:58]: there were lots of demos that we ended up not posting, ‘cause at some point we'd just be posting way too much a bunch of, Demos. But I love that because it shows that I think the multi-agent thing still has, a bit of exciting sexiness to it, which is maybe still beyond still, the actual delta it adds to the capabilities of these systems. But it's absolutely the future. I think we're heading in that direction and we can see the progress being made there already.Swyx [00:42:25]: If I were to, make one super minor pushback because I don't feel that confident about it yetCole [00:42:33]: Go for itSwyx [00:42:33]: But I've had Ryan Lopopolo from OpenAI on the pod And he's a super slop cannon, right? Oh my God, that's my coding agent being done. I downloaded this, Peon Ping. I don't know if you guys have heard this. It takes like-, sound packs from popular games like, Command and Conquer and Warcraft, and then it plays it whenever it's done. And so it's like, “Work,” or whatever, “At your command,” or something. Anyway, what I got from the Cursor code base and from Ryan's thing was that there's a slop cannon approach where you try to loosen the single agent's, bottleneck, and I feel like that is, probably an, a very important thing to try to figure out. I don't think anyone's, really solved it. 
Because then you just have more reviewer slop on top of the agent slop to try to wrangle it all. Ryan will probably very strongly object to my saying that he hasn't solved it. He thinks he's completely solved it. But I think it's very important, ’cause that is a bottleneck, right? I feel Devin is slow sometimes. I'm like, well, yeah, this is very readable and very sensible, but it is also slower than it could be. I want a button to just say, “Just ramp this up 1,000x in parallel and just see what happens.” And I don't know if that's feasible at some point in the future.

Code Review, Entropy, and AI Slop

Walden [00:43:55]: We've also run experiments internally where we've basically tried to build entire products, true products that we knew we would eventually ship, but for now, let's try to see if we can do it purely by vibe coding on top of each other, auto merge, no code review at all. And then there's this benchmark of how many weeks can you keep this going before you say, “We have the trashiest code base.”

Walden [00:44:18]: “Let's actually rewrite it from scratch.”

Swyx [00:44:19]: Start a new factory, yeah. What'd you find?

Walden [00:44:21]: I think we found that the state of the art in December was that you can probably run this for about two weeks. By the end of those two weeks, you'd find that, hey, you want to change the color of a button. Well, it turns out this button is implemented in 10 different places, and they have all these different variations, and oh, you forgot one of them, and actually it's a slightly different color in one spot. And you're like, “Okay, this is too much to work with.
Let's actually try to do code review at the same time.” And make sure that we're on top of our software, actually cleaning it up a bit And making sure it's done in a scalable way.Cole [00:44:54]: I think building on that, the idea of, you don't have to look at code, I think is generally a bad idea. And the meme that I have for thatWalden [00:45:03]: What timeline, all right, is Do you think that statement will be true on?Cole [00:45:06]: I think probably for a while it'll be true that you should continue to look at your code. A problem that I see a lot of teams run into that I work with who are embracing AI native, AI first coding, is The meme that I have is that your code base regresses to your worst engineer, because that engineer who is, very gung-ho about AI and is not auditing their code, their pattern starts cementing into the code, and now the AI is referencing their patterns. And so now their if/else block that, is 20 if/elses back and forth, the AI is seeing that as the pattern of how things are done and starts to then exponentially grow this slop. And I find to your point, a pretty good approach to that is having scheduled cleanup, whether by humans or through systems, that are looking for duplication. They then address that. You'll end up with like 12 helpers for how to format a date. And you need to address that, because otherwise it will continue to sprawl.Swyx [00:46:09]: Within balance, I think it's fine to have some duplication, and then sometimes To have garbage collection, right? Yeah. The What I've been, talking about with a lot of engineering leaders is that you want to be very strict about the boundaries between modules, and it's your job as an architect, as a CTO, whatever, to say like, “Okay, here's the hard contract between you guys and you guys. Whatever you do inside this black box is your business. You do whatever. But between these guys, let's be, really damn clear, and any movement must be signed off by a human or me,” or. 
Then, and like that's that. I don't know if you have any other modifications or advice.Walden [00:46:44]: Well, I guess generally on the topic of, where humans can be useful, I found that ‘cause, some of these, really deep infra problems, sometimes just having a human that just has, really deep expertise can make a big difference. I've actually seen this come into play when actually building agents. So we've had a few friends now, try building their own coding agents, and I think one same problem that I recurringly heard a lot of them run into was this problem of like, “Oh, Grep is really slow on our agents' machines.” And so a lot of them, I assume because they're using AI and they themselves don't have, super deep infra background knowledge, say, “Okay, we're going to go build our own custom Grep index. It's going to be really fast,” and use that as a way around this problem. When we ran into this problem About like, maybe like a year and a half ago when we were, in the early days of building Devin, we obviously didn't have AI then. We just asked our, how to, how to do this. You can just swap out a new Grep index, so.Infrastructure Details: Grep, File Systems, and SandboxesSwyx [00:47:45]: What do you mean you hand-coded Devin? What?Walden [00:47:48]: It's like, can you believe we hand-wrote this code? And we had, our infra people who are really amazing, they were looking into it and they're like, “Oh, what? We realized that actually the root cause of this problem is actually super simple, but like fine-grain detail,” which is that a lot of these virtual machines actually underlying them don't use real file systems. They use these, network file systems where things are actually cached over the network actually in S3. So when you're Grepping, you're actually making network calls Every time you're doing these things, and that's why Grep is extremely slow on these machines. 
And so again, goes back to, what is all of the crazy infra work that we had to do to actually get these machines working. If you try to do this yourself, there are tons of small details like this, and so we had to eventually go swap out that network file system. ButSwyx [00:48:35]: I think there's a write-up about it, right? Silas did one about the virtual file system.Walden [00:48:38]: Oh, that was a whole other thing. TheSwyx [00:48:39]: Oh, that's a different thingWalden [00:48:40]: The BlockDev file storage formatSwyx [00:48:42]: I'll bring it upWalden [00:48:42]: Which is, a file system format that we built so that the VMs could be spun up and down very quickly. Basically, the intuition behind this is-Imagine you have, a terabyte of disk, and your agent only, wrote, a hundred lines of code on top of that disk. How long does it, say, take to, save and re-bring up that disk? And most systems, because you're not optimizing for this case, it's just, on the order of a terabyte of work because you have to Save all of that and bring it back up. In our system, we try to build a file system that incrementally builds on top of each other. So every time you save and bring the machine back up, you're only doing work that is proportional to effectively the diff in the file system. And so this, shaves off a lot of time in the boot-up process of Devin. I think we This is actually now outdated. We have a newer system inside of Devin. But yeah, there's a lot of tiny details you have to get right here to actually get the day-to-day experience of Devin to be good.Swyx [00:49:39]: It's, not technically agents, but it is agent infra, and when you sell an agent as a company, you sell agent plus agent infra.Walden [00:49:46]: At least the way we do it be And the other The nice thing about having the agent infra being done together is, you We get to deploy Devin in whatever environment we want now. 
We don't need to wait for some underlying infra provider to also go and support VPC or on-prem or FedGovCloud, for instance. So we can actually go and figure out, okay, since we own the infrastructure, how can we get that set up for you?Cloud Providers: Modal, Daytona, and Enterprise SandboxesSwyx [00:50:12]: Whereas you're Cloudflare dependent.Cole [00:50:15]: so Cloudflare runs the control plane. The sandboxes, Modal is supported. A contributor just added Daytona. E2B is on the roadmap, and I think there's an abstraction in place that if any contributor wants to add a new provider, they can add that in.Walden [00:50:32]: Well, what are, How are the customers you work with Do they generally try to then go set up a contract with another one of these third-party providers? Do they try to do the VMs in-house?Cole [00:50:44]: most of them I see using Modal. I think Modal has a greatWalden [00:50:48]: Shout out Modal.Swyx [00:50:48]: Shout out Modal.Cole [00:50:50]: I think Modal has a great offering. It captures all of the sandbox pieces you need, snapshots being a pretty big piece of that, and given that they also offer GPUs, I think it's a pretty nice offering as a whole.Swyx [00:51:04]: no debate there.Walden [00:51:07]: Modal is great, especially, I think their container offering is, the most natural, and so especially if you are willing to, forego, the full VM requirements Modal is, a really vast place you can spin something up on.Swyx [00:51:20]: Is there a point So Modal's very Python, and I feel like most workload, has really shifted to JavaScript. I don't know if you guys Get the same feeling. So, okay, when I started Landspace and IE and all these things, I was like 50/50 Python and JS, right? That's roughly. I think that's wrong now. I think JS has won. I don't know if you guys Like, I Maybe I'm overstating it, and maybe for cognition, there's, C# and Java and what have you. But for, new greenfield apps, do you feel that Do you get that sense? 
Does it matter?

Cole [00:51:52]: I think that most of the libraries that I see in this space are Python native first, especially in the

Cole [00:51:58]: observability space. That said, I think that there is a pretty big appeal to having your entire system in one language. Especially when you have both your frontend and backend communicating, you can have one central type, which is very nice.

Swyx [00:52:11]: That's my case against Modal, which is that then you have to run JS. You can run JS inside Modal. It's just one extra step that isn't native to the runtime. I don't know if

Walden [00:52:22]: I don't know

Swyx [00:52:23]: Reviews. Do you have numbers? I don't know.

Walden [00:52:25]: The one thing I don't like about Python is that whenever AI writes Python, it always does the weirdest patterns, and

Swyx [00:52:32]: Oh, because it's mixing Python 2 and 3 or what?

Walden [00:52:34]: I think it's something like mixing 2 and 3, yeah. I don't know if you see this. It always tries to do hasattr on objects, like

Cole [00:52:41]: Oh, my God.

Walden [00:52:41]: But you shouldn't be doing that. It should error if there was

Swyx [00:52:45]: Because it's training on library code?

Cole [00:52:47]: I think it's more of, like

Cole [00:52:48]: From what I've seen, it's more of a reward hacking mechanism where it doesn't want to basically

Walden [00:52:54]: It'll never error.

Cole [00:52:54]: It doesn't want the code to fail. And so even when it knows the object has the attribute, it'll call getattr on it. For a lot of my clients who have moved towards more autonomous coding, we've put that in as a lint rule: if you do getattr, your pull request is going to fail.

Slop Signatures: Comments, Backwards Compatibility, and Types

Swyx [00:53:12]: Ooh, this is a fun topic. Can you tell me more about this? What else is a sign of AI coding that you have to put guards in?

Walden [00:53:21]: So we were talking just before this about Opus 4.7.
One of the things this new model likes to do is write lots of comments. Not that it'll comment every line, but it'll write paragraph-long PRDs on top of every function. But I will say, to its credit, these aren't slop descriptions like they were before: “Oh, here's what this function does.” It's like, “Oh, here's actually the r
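
The lint rule Cole describes (fail the pull request when code calls getattr) can be sketched with Python's standard `ast` module. This is a minimal illustration, not the rule his clients actually run: the function name `find_banned_calls` and the choice to also flag `hasattr` (the pattern Walden mentions) are assumptions made for this example.

```python
import ast

# Assumption: flag both builtins discussed in the episode.
BANNED_CALLS = {"getattr", "hasattr"}


def find_banned_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return (line_number, builtin_name) for each direct call to a banned builtin."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only direct calls such as getattr(obj, "x") are matched;
        # aliased or indirect uses are out of scope for this sketch.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


def check_file(path: str) -> int:
    """Print findings for one file and return an exit code a CI job could use."""
    with open(path, encoding="utf-8") as handle:
        hits = find_banned_calls(handle.read(), filename=path)
    for line, name in hits:
        print(f"{path}:{line}: avoid {name}(); access the attribute directly")
    return 1 if hits else 0
```

A CI step could call `check_file` on each changed file and fail the pull request when any call returns 1. Teams that already run a linter would more likely express this as a rule in that tool than as a standalone script.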

god amazon ai business pr work secrets walk research brain local single security generation decisions diy memory os hiring choices android computers honestly consulting ios windows surprising cto command gemini slack openai conquer appearance themes shopify leak sh harness pms gpt paradigm python separating daytona github java warcraft demos notion db settings stripe dev vm anthropic screenshots conductor javascript opus macos cognition versus agi ramp walden cpo xyz s3 ide codex cloudflare entropy prs docker git js gpus internally narrowing sonnets continual deleting repo confluence sentry mcp cursor sre firecrackers modal cli vms datadog observability postgres async backwards compatibility windsurf all hands supersets ec2 7x grep mcps cfps code reviews vpc flue windows os devins clis steve yegge vpcs little snitch semgrep

The State of Code Quality with Saša Jurić

Smart Software with SmartLogic

Play Episode Listen Later May 28, 2026 55:33

In this episode of Elixir Wizards, hosts Charles Suggs and Emma Whamond sit down with Saša Jurić, Elixir mentor and author of Elixir in Action, to discuss software craftsmanship in the age of AI. As AI coding tools become increasingly capable, Saša argues that the real challenge isn't generating code, it's maintaining quality, clarity, and shared understanding within a codebase. We explore the difference between correct code and good code, and why code is more than a set of instructions for a machine to execute. Code is also documentation, communication, and a long-term investment that future developers must be able to understand and maintain. Saša shares his concerns about the growing "theater of pull requests," where teams go through the motions of code review without creating meaningful opportunities for learning, feedback, or knowledge sharing. The hosts and Saša talk about practical ways to work effectively with AI, including taking smaller steps, carefully reviewing AI-generated code, and using AI as a collaborative tool rather than an autonomous developer. Throughout the discussion, Saša challenges the industry's obsession with speed and makes the case that the principles of good software development (incremental progress, clear communication, and human judgment) remain important in the age of AI. 
Key Topics Discussed The difference between correct code and good code Code as communication, documentation, and shared understanding The "theater of pull requests" and ineffective review practices How AI is changing software development workflows Using AI as a collaborator rather than a replacement Why smaller, incremental changes lead to better outcomes Human oversight in AI-assisted development Balancing development speed with maintainability Pull request size and review effectiveness Commit history as a tool for storytelling and context The risks of accumulating technical debt faster with AI Testing and validating AI-generated code Refactoring AI-generated solutions for clarity Applying agile principles to AI-assisted workflows The role of experience and judgment in software design Why software craftsmanship still matters in the age of AI Links mentioned Code Complete by Steve McConnell https://khmerbamboo.wordpress.com/wp-content/uploads/2014/09/code-complete-2nd-edition-v413hav.pdf Harness AI for DevOps, Testing, and AppSec https://www.harness.io/ Claude Code https://claude.com/product/claude-code Claude Code GitHub https://github.com/anthropics/claude-code Pull Request for Oban https://github.com/oban-bg/oban/pull/331 SMPP https://en.wikipedia.org/wiki/Short_Message_Peer-to-Peer OpenAI Codex https://chatgpt.com/codex/ Opus AI https://opus.ai/ Tidewave https://tidewave.ai/ Credo Static Code Analysis https://github.com/rrrene/credo https://smartlogic.io/podcast/elixir-wizards/s11-e09-static-code-analyzer-elixir-credo-ruby-rubocop/ Link to Sasa's X post https://x.com/sasajuric/status/2029522378196238503 Saša Jurić “Tell Me A Story” at Goatmire https://www.youtube.com/watch?v=GOrKfCs-mr0 https://meks.quest/blogs/the-theatre-of-pull-requests-and-code-review Looks Good to Me: Constructive Code Reviews by Adrienne Braganza https://www.manning.com/books/looks-good-to-me Towards Maintainable Elixir: Testing 
https://medium.com/very-big-things/towards-maintainable-elixir-testing-b32ac0604b99 TDD, Where Did It All Go Wrong (Ian Cooper) https://youtu.be/EZ05e7EMOLM
Special Guest: Saša Jurić.

Champaign City Council 5-19-26 w/ Audio Descriptions

City of Champaign

Play Episode Listen Later May 20, 2026 99:13

ORDINANCES AND RESOLUTIONS
Council Bill No. 2026-063: A Resolution Appointing Tom Cullop to the Code Review & Appeals Board in the City of Champaign
Council Bill No. 2026-064: A Resolution Appointing Kintessa Redmon to the Human Relations Commission in the City of Champaign
Council Bill No. 2026-065: An Ordinance Approving a Fourth Annexation Agreement Amendment Between the City of Champaign and Friendship Lutheran Church of Joy
Council Bill No. 2026-066: An Ordinance Approving an Amendment to a Special Use Permit Allowing a Power Generation Facility in the I2, Heavy Industrial Zoning District
Council Bill No. 2026-068: A Resolution Accepting a Bid for the State Street Sanitary Sewer Extension Project
Council Bill No. 2026-069: A Resolution Accepting a Bid for Refuse Collection Services for On-Street Receptacles and City Facilities
Council Bill No. 2026-070: A Resolution Approving the FY 2026/27 Annual Budget for the Champaign-Urbana Solid Waste Disposal System
Council Bill No. 2026-071: A Resolution Accepting a Bid and Authorizing the City Manager to Execute an Agreement for the 2026 Infrastructure Maintenance Project
Council Bill No. 2026-072: A Resolution Granting a Waiver to City Policy for Yard Waste and Holiday Tree Collection to Modify the Fall 2026 Yard Waste Collection Schedule
STUDY SESSION
Food Security Partnership

fall amendment agreement city council execute bid waiver descriptions city managers champaign fy modify code reviews authorizing human relations commission i2

#111: A Bazooka of Syntax

Side Project Spotlight

Play Episode Listen Later May 11, 2026 69:27

Steve finally fixed phillycocoa.org, and the journey from broken CircleCI pipelines and hijacked S3 buckets to a blazing-fast Cloudflare Pages site took one Side Project Saturday and an embarrassing number of Codex tokens. Then The Trio turns to the AI hype machine, and they're tired: tired of opaque token costs, tired of reviewing generated code that complicates everything it touches, and tired of an industry that mistakes syntax speed for software engineering. Fred Brooks called it in 1986, and The Trio is calling it now.

## Chapters

00:00 Introductions
01:47 The Journey of Updating the Website
06:38 Challenges with CircleCI and S3 Buckets
09:23 Exploring Cloudflare Pages
11:14 Navigating Cloudflare's User Interface
14:22 Setting Up Automatic Deployments
17:35 Managing DNS and SSL with Cloudflare
23:07 LLM Development Fatigue
26:15 Navigating Concerns and Costs in AI Usage
29:11 LLMs are No Silver Bullet
31:57 The Exhaustion of Code Review and Architectural Decisions
36:25 Token Management and Cost Awareness in AI Tools
40:07 The Economics of AI and Software Development
42:45 The Hype vs. Reality of AI Tools
46:34 Future Prospects of LLMs and Universal UI
50:16 The Future of Edge Computing with LLMs
53:08 The Evolution of Software Development and AI Integration
54:17 AI in Sci-Fi: Myths vs. Reality
57:54 The Challenges of Local Models and Hardware Limitations
01:03:21 Outro & Upcoming Event
01:09:21 Tag

## Show Notes

- Steve spent Side Project Saturday migrating phillycocoa.org from a broken CircleCI/S3 setup to Cloudflare Pages, burning his entire weekly Codex token budget in about three hours.
- Cloudflare Pages handles Hugo builds automatically and manages SSL and CDN without manual config, all on a free tier that's plenty for the site.
- Cloudflare's UI hides the Pages "Get Started" link below giant worker buttons, which Kotaro calls "the weirdest dark pattern."
- Steve argues that syntax generation was never the real bottleneck in software engineering, citing Fred Brooks' 1986 essay "No Silver Bullet."
- Aaron is worn out from reviewing AI-generated code and still having to make every architectural decision himself.
- LLM costs are nearly impossible to forecast: a single prompt can burn a significant chunk of your plan, depending on model, tool calls, and context.
- The Trio sees firms rushing to adopt LLM tooling before the ROI math makes sense, driven by hype rather than evidence.
- ThePrimeagen's recent take on the shifting AI economy lines up with what Steve sees at work: token-based billing is starting to expose the real cost.
- The Trio agrees local models running on personal hardware are the interesting long-term play, but RAM shortages make even basic setups expensive.
- Kotaro closes with a dad joke: he thought his LLM skills landed him his current job, but it turns out...

## Links

**PhillyCocoa.org Update**
Website: https://phillycocoa.org

**Articles & Essays**
"Let's talk about LLMs" by James Bennett: https://www.b-list.org/weblog/2026/apr/09/llms/
"No Silver Bullet" by Fred Brooks: https://www.cs.unc.edu/techreports/86-020.pdf

**Videos**
"The AI economy is about to change" by ThePrimeagen: https://youtu.be/_Q-e_nczWqM

**One More Thing**
"Beyond the Simulator: Perspectives on Modern App Development": https://luma.com/i00ll61z

**PhillyCocoa:** https://phillycocoa.org

Intro music: "When I Hit the Floor", © 2021 Lorne Behrman. Used with permission of the artist.

Do you actually own the code you ship?

No Compromises

Play Episode Listen Later May 9, 2026 14:54 Transcription Available

When a tool hands you a working solution, how much do you really need to understand about why it works?

In the latest episode of the No Compromises podcast, we discuss whether developers still care about understanding the code they ship, or whether that expectation is becoming a relic of the past.

We explore why knowing the "why" behind a solution isn't just about curiosity. It's about having enough domain knowledge to ask better questions, push back on bad answers, and ultimately produce better work.

We also walk through a real code review example involving a tricky Eloquent query, talk through the pressures that pull developers away from digging deeper, and consider what separates a line cook from a chef in how we approach our craft.

(00:16) - Are developers losing the habit of asking why
(02:16) - How AI changes the copy-paste-and-move-on cycle
(05:25) - Learning by accident while reading the manual
(06:24) - The Eloquent query neither of us could explain
(12:16) - Silly bit

Join a community of developers who still care about understanding the code they ship.

learning ai code ship php eloquent laravel code reviews code quality

The Future of Code Review: Stop Reviewing Line-by-Line, Start Governing AI Agents

Tech Lead Journal

Play Episode Listen Later May 4, 2026 75:27

(07:22) Brought to you by Mailtrap

Mailtrap is a modern email delivery platform for developers with native SDK support along with a security-compliant API and SMTP. Plus, you get 4,000 emails a month completely on their free tier! It also provides 24/7 support where you actually talk to real people, not an AI chatbot. Try Mailtrap for free at mailtrap.io.

What does code review mean when AI writes most of the code? The answer isn't to review more carefully. It's a fundamentally different process, one built around rules, agents, and governance rather than diffs and comments.

In this episode, Itamar Friedman, founder and CEO of Qodo.ai, shares how AI is forcing a complete rethink of code review, from inline comments on code diffs to multi-agent governance systems that verify intent, architecture, and business logic at scale. He traces the evolution of code review through successive generations, explains why traditional static analysis is no longer sufficient, and lays out what a modern quality and governance layer actually looks like. Itamar also introduces the concept of “shift up”, which extends quality checks into the planning phase so that technical product managers can contribute directly to shipping features, and explains how teams can move from vibe coding to viable, grounded development. The conversation also covers the race between AI labs, the role of open-source models, and a frank look at where the software developer role is heading by 2030.

Key topics discussed:
Why line-by-line code review doesn't scale with AI-generated PRs
The generational evolution of code review tools (Gen 1 to 3.5)
How multi-agent systems surface only what needs human attention
Turning tribal knowledge into enforceable rules and skills
Shift-left and shift-up: embedding quality earlier in the workflow
What the new agentic code review UI will look like
Vibe coding vs. viable coding: the governance layer in between
Where the software developer role is headed by 2030

Timestamps:
(00:00:00) Trailer & Intro
(00:02:50) How Has AI Driven the Evolution of Code Review to Multi-Agent Systems?
(00:07:53) How Do We Move from Vibe Coding to Viable, Grounded Development?
(00:12:35) Are Traditional Static Analysis Checks Still Sufficient in the AI Era?
(00:16:27) How Do We Handle Exploding PR Volume Without Sacrificing Code Review Quality?
(00:22:11) How Do We Evolve Code Review from Simple Comments to Senior-Level AI Reviews?
(00:28:51) What Will the New Agentic Code Review UI Look Like?
(00:33:32) How Does Qodo Differentiate Itself as an AI Code Review and Governance Platform?
(00:37:15) What Do Shift-Left and Shift-Up Mean for the Future of Code Quality?
(00:41:23) How Do We Maintain Quality When Running Multiple AI Agents in Parallel?
(00:48:11) How Are Chinese AI Models Reshaping the Open-Source vs Closed-Source Race?
(00:55:25) Which AI Models Excel at Code Review, and Are We Heading Toward Specialization?
(01:03:16) Will Software Developers Still Be Needed as AI Automates More of Engineering?
(01:08:50) 3 Tech Lead Wisdom
_____
Itamar Friedman's Bio
Itamar Friedman is the CEO and Co-Founder of Qodo, an AI code review platform used by 1M+ developers. Before founding Qodo, Itamar was a founder of Visualead, which was acquired by the Alibaba Group. He then worked for Alibaba Group for 4 years as the Director of Machine Vision. Now, Itamar is dedicated to quality-first code generation.

Follow Itamar:
LinkedIn – linkedin.com/in/itamarf
X (formerly Twitter) – @itamar_mar
Qodo.ai – qodo.ai

Like this episode?
Show notes & transcript: techleadjournal.dev/episodes/257.
Follow @techleadjournal on LinkedIn, Twitter, and Instagram.
Buy me a coffee or become a patron.

#264 Seniorität im KI-Zeitalter: Eine Ode an den Junior

Engineering Kiosk

Play Episode Listen Later Apr 21, 2026 58:57 Transcription Available

Seniority in the AI age.

AI is everywhere: in demos, in LinkedIn posts, in productivity promises, and by now right in the middle of everyday developer life. But what actually happens when code suddenly becomes cheap? Does everyone become a 10x engineer, or do we only now notice what seniority is really about? That is exactly the question we pursue in this episode, taking an honest look at the tension between AI hype, software quality, code reviews, and career development.

We talk about why more output does not automatically mean more outcome, what DORA metrics, studies, and everyday experience say about AI coding tools, and why the big picture matters more than the sheer amount of code produced. We also discuss why senior engineers are in such demand right now, what role communication, prioritization, leadership, and architectural understanding play, and why the line "AI is my junior" falls well short. We also look at the decline in junior roles, at internships as a recruiting pipeline, and at what learning, mentoring, and real responsibility can look like in a world with coding agents.

If you want to understand how software development, seniority, juniors, staff engineers, AI adoption, and business value are shifting right now, this episode is for you. Or put differently: when more code gets cheaper, clarity gets more valuable. And that is exactly where it gets interesting.

You can find our current advertising partners at https://engineeringkiosk.dev/partners

Quick feedback on the episode:

ai leadership team code welt senior engineers gedanken recruiting mentoring probleme kommunikation unsere einfluss verantwortung genau outcome lernen big picture menge perspektiven kaffee klarheit gutes anregungen studien output zeitalter ebenso internships produktivit interns cloudflare juniors skepsis eher business value werbepartner priorisierung praktikant erreiche studienlage ki hype code reviews spannungsbogen eine ode open mindset alltagserfahrungen karrierepfade senior it sprungmarken

Episode 60: Using AI to modernise legacy code, manage tech debt and improve documentation

Off Script

Play Episode Listen Later Apr 15, 2026 39:11

In this episode of Off Script, James, Josh, and guest Jon Milsom explore the rapid advancements in AI as a tool to refactor codebases, and how teams are leveraging AI to innovate, solve problems, and improve processes across industries. Chapters: 00:00 Introduction to AI in Tech Consulting 01:03 The Evolution of AI and Its Impact 03:13 AI Week: Experimentation and Learning 04:39 Personal Projects: Scratch an Itch 07:35 Encouraging Innovation and Creativity 08:41 The Excitement of AI and Its Challenges 11:19 Using AI for Tech Debt and Documentation 13:46 AI's Role in Code Review and Documentation 17:05 Navigating Bottlenecks in Development 19:10 The Broader Responsibility of Engineers 20:26 The Importance of Maintenance in Software Development 23:00 Leveraging AI for Legacy Code Upgrades 26:06 AI Integration Across Business Functions 28:59 User-Centric AI Features in Sports Tech 31:59 Empowering Teams with AI Tools 34:47 The Future of AI Pricing and Development Resources: Pitchero - https://pitchero.com Claude AI - https://www.anthropic.com/index.html OpenAI GPT models - https://openai.com/research

ai future evolution manage chapters excitement maintenance using ai documentation software development leveraging ai ai ethics off script tech innovation tech debt legacy code james hall code reviews

Zwolnij swoich developerów! AI zakoduje za nich

Nerd Management

Play Episode Listen Later Apr 14, 2026 23:12

"Fire your developers - AI will code for them" - sounds tempting, right? Krzysiek, as a manager, took a technology he doesn't know, threw it into Claude Code, and deployed it to production. It works! So why do we need developers? Well... not so fast. Behind this clickbait title is a much more complex story: about what happened when a senior architect reviewed that "beautiful" AI-generated code, about secrets leaking to the frontend, about the Challenger shuttle disaster, and about why small concessions on quality can end in a big boom.

In this episode we talk about what the reality of coding with AI looks like from a manager's perspective: what is great, what is illusory, and where the traps lurk that no language model will catch for us. We discuss why a technical background and human oversight still matter, how the IT recruiting market is changing (what is in demand?), and why, instead of firing people, it is worth teaching them to surf the AI wave. If you are wondering how to approach AI in your team, how not to fall for the hype about one-person billion-dollar companies, and what to do so you don't fall behind, this episode is for you. Join us!

In this episode you will learn:
- Can a manager without technical experience safely code with AI?
- Why can small concessions in the quality of AI-generated code end in disaster?
- Whom to hire and whom to develop in the AI era?
- Is vibe coding the future, or a road to data leaks and financial losses?
- How do you avoid falling behind as AI changes the rules of the game in IT?

Links to materials, the video version, and a transcript of this episode can be found at:

spotify ai blog startups apple podcast cybersecurity developers wi growth mindset firma jak challenger spreaker devops czy dlaczego dzi kto lider zapraszamy nich dzia lidera ciebie kogo twoja linki wprowadzenie opowiadamy swoich wejd code reviews rekrutacja automatyzacja krzysiek programista cloud code polub nerd management

Amazon's AI outage, the engineer retention crisis, autonomous agents and the future of senior engineers

PodRocket - A web development podcast from LogRocket

Play Episode Listen Later Mar 26, 2026 52:58

The Amazon AI coding outage reignited a debate the industry can't ignore: is this an AI failure or a process failure, and does that distinction even matter anymore? Paige, Jack, Paul, and Noel dig into vibe coding culture, the engineer retention crisis, and the rise of harness engineering as a discipline in this month's panel. They also tackle autonomous agents running while you sleep, zero-touch engineering, what a senior engineer even means now, and whether open source can survive the agentic era. Resources Beyond the Hype: Why Vibe Coding Leaders Are Facing a Retention Crisis: https://www.forbes.com/councils/forbestechcouncil/2026/03/09/beyond-the-hype-why-vibe-coding-leaders-are-facing-a-retention-crisis/ Atlassian layoffs as part of AI push: https://www.theguardian.com/technology/2026/mar/12/atlassian-layoffs-software-technology-ai-push-mike-cannon-brookes-asx I'm Building Agents That Run While I Sleep: https://www.claudecodecamp.com/p/i-m-building-agents-that-run-while-i-sleep We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Fill out our listener survey! https://t.co/oKVAEXipxu Let us know by sending an email to our producer, Elizabeth, at elizabeth.becz@logrocket.com, or tweet at us at PodRocketPod. Check out our newsletter! https://blog.logrocket.com/the-replay-newsletter/ Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form, and we'll send you free PodRocket stickers! What does LogRocket do? LogRocket provides AI-first session replay and analytics that surfaces the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today. Chapters 00:00 Introduction & Panel Welcome 01:30 Amazon's AI Outage - Process Failure or AI Failure? 
05:00 Harness Engineering and the Real Lesson from Stripe 08:30 The Retention Crisis - Are Good Engineers Leaving Tech? 11:30 The Satisfaction Problem - AI Stole the Mountain Climb 14:00 Code Review Is the New Bottleneck 16:00 Stripe vs Amazon - Two Different Philosophies on AI at Scale 18:00 Would You Restart Your Career in a World of Code Review? 21:00 Domain Experts as the New Engineers 24:00 Is Artisanal Code a Real Future? 28:30 Content in the AI Era - Who's It Even For? 30:30 Agents Running While You Sleep - The Verification Problem 33:00 Zero-Touch Engineering and How Paul's Team Does It 36:00 Auto-Research, LLMs Judging LLMs, and Brain Rot Scripts 40:00 Are We Actually Shipping Faster? 41:00 What Does "Senior Engineer" Mean Now? 47:00 Hot Takes - Open Source, USB-C, Defense Contracts, and Taste 53:00 Wrap-Up
Special Guest: Jack Herrington.

amazon world ai crisis senior fill engineers ux retention open source stripe autonomous outage usb c software engineering atlassian senior engineer amazon ai code reviews

Meta Plumps For Bot Social Networks

Techmeme Ride Home

Play Episode Listen Later Mar 10, 2026 21:10

Meta moves for the social network for AI bots. Code Review for Claude Code looks like another revolution for the software development industry. Yann LeCun raises the biggest European seed round of all time. And the MacBook Neo… worth investing in or not?

Exclusive: Meta hires duo behind Moltbook (Axios)
OpenAI and Google employees rush to Anthropic's defense in DOD lawsuit (TechCrunch)
This new Claude Code Review tool uses AI agents to check your pull requests for bugs - here's how (ZDNet)
Amazon holds engineering meeting following AI-related outages (FT)
Yann LeCun's AI start-up raises more than $1bn in Europe's largest seed round (FT)
MacBook Neo review: the Mac for the masses (The Verge)

Learn more about your ad choices. Visit megaphone.fm/adchoices

ai europe google european mac bots dod anthropic social networks code reviews plumps

Every Agent Needs a Box — Aaron Levie, Box

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Mar 5, 2026 76:58

The reception to our recent post on Code Reviews has been strong. Catch up!Amid a maelstrom of discussion on whether or not AI is killing SaaS, one of the top publicly listed SaaS companies in the world has just reported record revenues, clearing well over $1.1B in ARR for the first time with a 28% margin. As we comment on the pod, Aaron Levie is the rare public company CEO equally at home in both worlds of Silicon Valley and Wall Street/Main Street, by day helping 70% of the Fortune 500 with their Enterprise Advanced Suite, and yet by night is often found in the basements of early startups and tweeting viral insights about the future of agents.Now that both Cursor, Cloudflare, Perplexity, Anthropic and more have made Filesystems and Sandboxes and various forms of “Just Give the Agent a Box” cool (not just cool; it is now one of the single hottest areas in AI infrastructure growing 100% MoM), we find it a delightfully appropriate time to do the episode with the OG CEO who has been giving humans and computers Boxes since he was a college dropout pitching VCs at a Michael Arrington house party.Enjoy our special pod, with fan favorite returning guest/guest cohost Jeff Huber!Note: We didn't directly discuss the AI vs SaaS debate - Aaron has done many, many, many other podcasts on that, and you should read his definitive essay on it. 
Most commentators do not understand SaaS businesses because they have never scaled one themselves or deeply reflected on what the true value proposition of SaaS is.

We also discuss "Your Company is a Filesystem".

We also shout out CTO Ben Kus and the AI team, who talked about the technical architecture and will return for AIE WF 2026.

Timestamps

* 00:00 Adapting Work for Agents
* 01:29 Why Every Agent Needs a Box
* 04:38 Agent Governance and Identity
* 11:28 Why Coding Agents Took Off First
* 21:42 Context Engineering and Search Limits
* 31:29 Inside Agent Evals
* 33:23 Industries and Datasets
* 35:22 Building the Agent Team
* 38:50 Read Write Agent Workflows
* 41:54 Docs Graphs and Founder Mode
* 55:38 Token FOMO Culture
* 56:31 Production Function Secrets
* 01:01:08 Film Roots to Box
* 01:03:38 AI Future of Movies
* 01:06:47 Media DevRel and Engineering

Transcript

Adapting Work for Agents

Aaron Levie: Like, you don't write code. You talk to an agent and it goes and does it for you, and you maybe at best review it. That's probably largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work. We basically adapted to how the agent works. All of the economy has to go through that exact same evolution. Right now, it's a huge asset and an advantage for the teams that do it early and that are kind of wired into doing this, 'cause you'll see compounding returns. But that's just gonna take a while for most companies to actually go and get this deployed.

swyx: Welcome to the Latent Space pod. We're back in the Chroma studio with Chroma CEO Jeff Huber. Welcome, returning guest, now guest host.

Aaron Levie: It's a pleasure. Wow. How'd you get upgraded to that?

swyx: Because he's like the perfect guy to be guest host for you.

Aaron Levie: That makes sense, actually. We love context.
We both really love context. We really do.

swyx: And we're here with Aaron Levie. Welcome.

Aaron Levie: Thank you. Good to be [00:01:00] here.

swyx: Yeah. So we've all met offline and chatted a little bit, but it's always nice to get these things in person and in conversation. You just started off with so much energy. You're super excited about agents.

Aaron Levie: I love agents.

swyx: Yeah. OpenClaw just got bought by OpenAI. No, not bought, but you know what I mean?

Aaron Levie: Some, you know, acquihire.

swyx: Executive hire.

Aaron Levie: Executive hire. Okay. Executive hire.

swyx: Hey, that's my term. Okay. What are you pounding the table on with agents? You have so many insightful tweets.

Why Every Agent Needs a Box

Aaron Levie: Well, the thing that we get super excited by, which should be relatively obvious, is that we've built a platform to help enterprises manage their corporate files, the permissions of who has access to those files, and the sharing and collaboration of those files. All of those files contain really, really important information for the enterprise. It might have your contracts, it might have your research materials, it might have marketing information, it might have your memos. All that data has predominantly been used by humans. [00:02:00] But there's been one really interesting problem, which is that humans only really work with their files during an active engagement with them, and then the files kind of go away and you don't really see them for a long time. And all of a sudden, with the power of AI and AI agents, all of that data becomes extremely relevant as this ongoing source of answers to new questions, of data that will transform into something else that produces value in your organization.
It, it contains the answer to the new employee that's onboarding, that needs to ramp up on a project.Um, it contains the answer to the right thing to sell a customer when you're having a conversation to them, with them contains the roadmap information that's gonna produce the next feature. So all that data. That previously we've been just sort of storing and, and you know, occasionally forgetting about, ‘cause we're only working on the new active stuff.All of that information becomes valuable to the enterprise and it's gonna become extremely valuable to end users because now they can have agents go find what they're looking for and produce new, new [00:03:00] value and new data on that information. And it's gonna become incredibly valuable to agents because agents can roam around and do a bunch of work and they're gonna need access to that data as well.And um, and you know, sometimes that will be an agent that is sort of working on behalf of, of, of you and, and effectively as you as and, and they are kind of accessing all of the same information that you have access to and, and operating as you in the system. And then sometimes there's gonna be agents that are just.Effectively autonomous and kind of run on their own and, and you're gonna collaborate and work with them kind of like you did another person. Open Claw being the most recent and maybe first real sort of, you know, kind of, you know, up updating everybody's, you know, views of this landscape version of, of what that could look like, which is, okay, I have an agent.It's on its own system, it's on its own computer, it has access to its own tools. I probably don't give it access to my entire life. I probably communicate with it like I would an assistant or a colleague and then it, it sort of has this sandbox environment. 
So all of that has massive implications for a platform that manages that [00:04:00] enterprise data. We think it's gonna just transform how we work with all of the enterprise content that we work with, and we just have to make sure we're building the right platform to support that.
swyx: The sort of shorthand I put it is, as people build agents, everybody's just realizing that every agent needs a box. Yes. And it's nice to be called Box and just give everyone a box.
Aaron Levie: Hey, you know, if we can make that go viral, I think that terminology...
swyx: That's the tagline. Every agent needs a box.
Aaron Levie: Every agent needs a box. If we can make that the headline of this, I'm fine with this. And that's the billboard I want. Yeah, exactly. Every agent needs a box. I like it. Can we ship this?
swyx: Okay, let's do it. Yeah.
Aaron Levie: My work here is done, and I got the value I needed outta this podcast. Drinks.
swyx: Yeah.
Agent Governance and Identity
Aaron Levie: But, you know, the thing that we think about is, whether you think the number is 10x or 100x or whatever the number is, we're gonna have some order of magnitude more agents than people. That's inevitable. It has to happen. So then the question is, what is the infrastructure that's needed to make all those agents effective in the enterprise? Make sure that they are well governed. Make sure they're only doing [00:05:00] safe things on your information. Make sure that they're not getting exposed to data that they shouldn't have access to. There's gonna be just incredibly, spectacularly crazy security incidents that will happen with agents, because you'll prompt inject an agent and sort of find your way through the CRM system and pull out data that you shouldn't have access to.
Jeff Huber: Oh, we have, God.
Aaron Levie: Right?
I mean, that's just gonna happen all over the place, right? So then the thing is, how do you make sure you have the right security, the permissions, the access controls, the data governance? We actually don't yet know in many cases how we're gonna regulate some of these agents, right? If you think about an agent in financial services, does it have the exact same financial requirements that a human did? Or is the risk fully on the human that was interacting with or created the agent? All open questions. But no matter what, there's gonna need to be a layer that manages the data they have access to, the workflows that they're involved in, pulling up data from multiple systems. This is the new infrastructure opportunity in the era of agents.
swyx: You have a piece on agent identities, [00:06:00] which I think was today, which I think is breaking news that a lot of the security people are talking about, right? I always think of this as, well, you need the human you and then you need the agent you.
Aaron Levie: Yes.
swyx: And, well, I don't know if it's that simple, but is Box going to have an opinion on that, or are you just gonna be like, well, we're just the source layer, let Okta or Auth0 handle that?
Aaron Levie: I think we're gonna have an opinion, and we will work with generally wherever the contours of the market end up. And the reason that we're gonna have an opinion, more than on other topics probably, is because one of the biggest use cases for why your agent might need an identity is file system access. So we have to think about this pretty deeply. And I think unless you're in our world thinking about this particular problem all day long, it might be, you know, like, why is this such a big deal?
And the reason why it's a really big deal is because sometimes people say, well, just give the agent an account on the system and treat it like every other type of user on the system. The [00:07:00] problem is that I, as Aaron, don't really have any responsibility over anybody else's Box account in our organization. I can't see the Box account of any other employee that I work with. I am not liable for anything that they do. And there are strict privacy requirements on everything that they work on. Agents don't have those properties. The person who creates the agent is probably gonna, for the foreseeable future, take on a lot of the liability for what that agent does. That agent doesn't deserve any privacy, because it can't fully be autonomously operated and it doesn't have any legal responsibility. So you can't just be like, oh, well, I'll just create a bunch of accounts and then I'll work with that agent and I'll talk to it occasionally. You need oversight of that. And so then the question is, how do you have a world where you sometimes have oversight of the agent, but what if that agent goes and works with other people? That person over there is collaborating with the agent on something; you shouldn't have [00:08:00] access to what they're doing. So we have all of these new boundaries that we're gonna have to figure out. So far we've been in easy mode. We've hit the easy button with AI, which is that the agent just is you. When you're in Claude Code and you're in Cursor and you're in Codex, the agent is you. You're authing into your services. It can do everything you can do. That's the easy mode. The hard mode is agents are kind of running on their own.
People check in with them occasionally; they're doing things autonomously. How do you give them access to resources in the enterprise and not dramatically increase the security risk and the risk that you might expose the wrong thing to somebody? These are all the new problems that we have to get solved. I like the identity layer and identity vendors as being a solution to that, but we'll need some opinions as well, because so many of the use cases are these collaborative file system use cases, which is: how do I give an agent a subset of my data? Give it its own workspace as well, ‘cause it's gonna need to store off its own information that would be relevant for it. And how do I have the right oversight into that? [00:09:00]
Jeff Huber: One thing which I think is kind of interesting to think about is, you know, how humans work, right? I may not just give you access to the whole file. I might sit next to you and scroll to this one part of the file and just show you that one part.
swyx: Partial file access.
Jeff Huber: I'm just saying, I think RBAC does seem to be dead, right? If you wanna say something is dead, probably RBAC is dead. And the auth story to me seems incredibly unsolved and unaddressed by the existing state of AI vendors.
Aaron Levie: But yeah, I think, I mean, you're obviously taking it to a limit that we probably need to solve for. And we built an access control system that was kind of its own little world for a long time. The idea was this: it's a many-to-many collaboration system where I can give you any part of the file system. And it's a waterfall model. So if I give you access higher up in the system, you get everything below.
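As a rough illustration of the waterfall idea described here (a grant on a folder implies access to everything beneath it), the following is a minimal Python sketch. It is not Box's implementation; the `grants` mapping, the principal names, and the folder paths are all hypothetical and exist only for this example.

```python
from pathlib import PurePosixPath


def can_access(grants: dict[str, set[str]], principal: str, path: str) -> bool:
    """Waterfall check: a grant on a folder covers that folder and everything below it.

    `grants` maps a principal (human or agent) to the folder paths it was given.
    All names here are hypothetical, chosen only to illustrate the model.
    """
    target = PurePosixPath(path)
    for granted in grants.get(principal, set()):
        root = PurePosixPath(granted)
        # Access flows down the tree: match the granted folder itself
        # or any descendant of it.
        if target == root or root in target.parents:
            return True
    return False


# Hypothetical grants: a human gets a high-level folder, an agent gets one subtree.
grants = {
    "aaron": {"/company"},
    "agent-1": {"/company/deals/acme"},
}

print(can_access(grants, "agent-1", "/company/deals/acme/contract.pdf"))
print(can_access(grants, "agent-1", "/company/deals/other/notes.txt"))
```

The sketch only covers the downward flow of access. The harder questions raised in the conversation, such as who oversees an agent that holds grants from several people, are not modeled here.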
And that kind of created immense flexibility, because I can point you to any layer in the tree, but then you're gonna get access to everything below it. And that [00:10:00] mostly is working in this world. But you do have to manage this issue, which is: how do I create an agent that has access to some of my stuff and somebody else's stuff as well? Mm-hmm. And which parts do I get to look at as the creator of the agent? These are just brand new problems. Yeah. Crazy. When there was a human there, that was really easy to do. Like, if the three of us were all sharing, there'd be a Venn diagram where we'd have an overlapping set of things we've shared, but then we'd have our own ways that we shared with each other. In an agent world, somebody needs to take responsibility for what that agent has access to and what it's working on. These are some of the most, probably, boring problems for 98% of people on the internet, but they will be the problems that are the difference between whether or not you can actually have autonomous agents in an enterprise context
swyx: Yeah.
Aaron Levie: that are not leaking your data constantly.
swyx: No, I mean, I run a very, very small company for my conference and we already have data sensitivity issues. Yes. Some of my team members cannot see the others' data, and I can't imagine what it's like to run a Fortune 500 company and [00:11:00] have to worry about this. I'm just kinda curious, you talk to a lot of them, like 70, 80% of the Fortune 500 are your customers.
Aaron Levie: Yep. 67%. Just so we're being very precise.
swyx: So yeah, I'm not...
Aaron Levie: Okay. Okay.
swyx: Something. I'm rounding up. Yes. Round up.
I'm projecting to...
Aaron Levie: For the government.
swyx: I'm projecting to the end of the year.
Aaron Levie: Okay.
swyx: There you go.
Aaron Levie: You do make it sound like, well, we've gotta be on this. Like we're taking way too long to get to 80%.
swyx: Well, no, I mean, how are they approaching it? Right? Because you don't have a final answer yet.
Why Coding Agents Took Off First
Aaron Levie: Well, okay, so this is actually the stark reality that, unfortunately, is kinda pouring water on the party a little bit.
swyx: Yes.
Aaron Levie: We all in Silicon Valley have the absolute best conditions possible for AI ever. And I think we all saw the Dwarkesh, you know, Dario podcast and this idea of AI coding. Why has that taken off, and we're not yet fully seeing it everywhere else? Well, look, if you just enumerated the list of properties that AI coding has and then compared it to other [00:12:00] knowledge work, let's just go through a few of them. Generally speaking, you bring on a new engineer, they have access to a large swath of the code base. A new engineer comes on, they can just go and find the stuff that they need to work with. It's a fully text-in, text-out medium. It's just gonna be text at the end of the day. So it's really great from the standpoint of what the agent can work with. Obviously the models are super trained on that dataset. The labs themselves have a really strong, self-reinforcing positive flywheel for why they need to do agent coding deeply.
So then you get just better tooling, better services. The actual developers of the AI are daily users of the thing that they're working on, versus, you know, probably there's only like seven Claude Cowork legal plugin users at Anthropic on any given day, but there's like a couple thousand Claude Code users every single day. So just think about which one they are getting more feedback on all day long. So you just go through this list. Everybody who's a [00:13:00] developer is by definition technical, so they can go install the latest thing. We're all generally online, or at least the weird ones are, and we're all talking to each other, sharing best practices. That's already like eight differences versus the rest of the economy. Every other part of the economy has like six to seven headwinds relative to that list. You go into a company, you're a banker in financial services, you have access to a tiny little subset of the total data that's gonna be relevant to do your job. And you have to go and talk to a bunch of people to get the right data to do your job, because Sally didn't add you to that deal room folder. And the information is actually in a completely different organization that you now have to go and run into. You have this endless list of access controls and security, as you talked about. You have a medium which is not just text, right? You have a Zoom call where you're getting all of the requirements from the customer. You have a lot of in-person conversations and you're doing in-person sales, and how do you ever [00:14:00] digitize all of that information?
You know, I think a lot of people got upset with this idea that the code base has all the context. I don't know if you followed some of that conversation that went viral. It's not that simple; the code base doesn't have all the knowledge. But you're a lot better off than you are with other areas of knowledge work. We have documentation practices, you write specifications. Those things don't exist for like 80% of the work that happens in the enterprise. That's the divide that we have, which is that AI coding has just fully reached escape velocity in how powerful this stuff is, and then we're gonna have to find a way to bring that same energy and momentum to all these other areas of knowledge work, where the tools aren't there, the data's not set up to be there, and the access controls don't make it that easy. The context engineering is an incredibly hard problem because, again, you have access control challenges, you have different data formats, you have end users that are gonna need to be trained through this, as opposed to adopting [00:15:00] these tools in their free time. That's where the Fortune 500 is. And so I think we have to be prepared as an industry for a multi-year march to bring agents to the enterprise for these workflows. And probably the thing that we've learned most in coding that the rest of the world is not yet ready for, I mean, they'll have to be ready for it because it's just gonna inevitably happen, is this. What's interesting is if you think about the practice of coding today versus two years ago. It's probably the most changed workflow in maybe the history of time, relative to the amount of time it took to change, right? Yeah.
Like, has any workflow in the entire economy changed that quickly in terms of the amount of change? You know, at least in any knowledge worker workflow, there's very rarely been an event where one piece of technology and work practice has so fundamentally changed what you do. You don't write code; you talk to an agent and it goes and [00:16:00] does it for you, and you maybe at best review it. And even that's probably largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work. We basically adapted to how the agent works. Mm-hmm. All of the economy has to go through that exact same evolution. The rest of the economy is gonna have to update its workflows to make agents effective, to give agents the context that they need, to actually figure out what kind of prompting works, and to figure out how you ensure that the agent has the right access to information to be able to execute on its work. You know, this is not the panacea that people were hoping for, where the agent drops in and just automates your life. You have to basically re-engineer your workflow to get the most out of agents, and that's just gonna take multiple years across the economy. Right now it's a huge asset and an advantage for the teams that do it early and that are kinda wired into doing this, ‘cause [00:17:00] you'll see compounding returns, but it's just gonna take a while for most companies to actually go and get this deployed.
swyx: I love pushing back. I think that is what a lot of technology consultants love to hear, this sort of thing, right? Yeah, yeah, yeah. First, to embrace the AI, yes, to get to the promised land, you must pay me so much money, a hundred percent, to adopt the prescribed way of conforming to the agents. Yes.
And I worry that you will be eclipsed by someone else who says, no, come as you are.
Aaron Levie: Yeah.
swyx: And we'll meet you where you are.
Aaron Levie: And what was the thing that went viral a week ago? OpenAI, probably, is hiring FDEs, yeah, to go into the enterprise. Yeah. Yeah. And then Anthropic is embedded at Goldman Sachs. Yeah. So if the labs are having to do this, if the labs have decided that they need to hire FDEs and professional services, then I think that's a pretty clear indication that there's no easy mode of workflow transformation. Yeah. Yeah. So, to your point, I think this is actually a market opportunity for new professional services and consulting [00:18:00] firms that are like Agent Build, and they go into organizations and they figure out how to re-engineer your workflows to make them more agent-ready, get your data into the right format, and reconstruct your business process. So you're not doing most of the work. You're telling agents how to do the work and then you're reviewing it. But I haven't seen the thing that can just drop in and let you not go through those changes.
swyx: I don't know how that kind of sales pitch goes over. Yeah. You know, you're saying things like, well, in my nice, beautiful walled garden, here's this beautiful Box account that has everything. Yes. And I'm like, well, most real life is extremely messy. Sure. And poorly named, and there's duplicates, there's outdated s**t.
Aaron Levie: A hundred percent. No, no, a hundred percent. And so, I mean, we agree that getting to the beautiful garden is gonna be tough.
swyx: Yeah.
Aaron Levie: There's also the other end of the spectrum, where it's a technical impossibility to solve.
The agent truly cannot get enough context to make the right decision in the incredibly messy land. There's [00:19:00] no AGI that will solve that. So we're gonna have to land somewhere in between, which is that we all collectively get better at documentation practices, at having authoritative, relatively up-to-date information, and at putting it in the right place. Agents will certainly cause us to be much better organized around how we work with our information, simply because the severity of the agent pulling the wrong data will be too high, and the productivity gain that you'll miss out on by not doing this will be too high as well; your competition will just do it and they'll just have higher velocity. And we see this a lot firsthand. We've built a series of agents internally that can have access to your full Box account, and you give one a task and it can go find whatever information you're looking for and work with it. And, you know, thank God for the model progress, but if you gave that task to an agent nine months ago, you were just gonna get lots of bogus answers, because it's gonna say, hey, here are [00:20:00] five documents that all kind of smell like the right thing. But you're putting me on the clock, ‘cause my system prompt says, you know, be pretty smart, but also try and respond to the user. And it's gonna respond. And it's like, ah, it got the wrong document. And then you do that once or twice as a knowledge worker and you're just...
swyx: Never again.
Aaron Levie: Never again. You're just done with the system.
swyx: Yeah. It doesn't work.
Aaron Levie: It doesn't work.
And so, you know, Opus 4.6 and Gemini 3.1 Pro and whatever the latest GPT 5.3 will be, those things are getting better and better and using better judgment. And with all of these updates to the agentic tool and search systems, we're seeing very real progress where the agent can almost smell when something is a little bit fishy. We have this process where we have it go fan out, do a bunch of searches, pull up a bunch of data, and then it has to sort of do its own ranking of what the right documents are that it should be working with. And again, at the intelligence level of a model six months ago, [00:21:00] it'd be just throwing a dart: I'm gonna grab these seven files and I pray that that's the right answer. And something like an Opus, first 4.5 and now 4.6, is like, oh, no, that one doesn't seem right relative to this question, because I'm seeing some signal that's contradicting the document, where it would normally be in the tree, and who should have access. It's doing all of that kind of work for you. But it still doesn't work if you just have a total wasteland of data. It's just not possible. Partly ‘cause a human wouldn't even be able to do it. So basically, if a really, really smart human could not do that task in five or 10 minutes for a search-retrieval type task, look, your agent's not gonna be able to do it any better. You see this all day long.
Context Engineering and Search Limits
swyx: So this touches on a thing that Jeff is passionate about, which is context engineering. I'm just gonna let you ramble or riff on context engineering.
If there's anything. He did really good work on context rot, which has really taken over as the term that people use and the reference.
Aaron Levie: A hundred percent. All we think about is the context rot problem. [00:22:00]
Jeff Huber: Yeah, there's certainly a lot of ranking considerations. Agentic search, I think, is incredibly promising. Yeah, I was trying to generate a question though. I think I have a question right now, swyx.
Aaron Levie: Yeah, no, but I think there was this moment, I don't know, two years ago, before we knew where the gotchas were gonna be in AI, and I think someone was like, well, infinite context windows will just solve all of these problems, ‘cause you'll just give the context window all the data. And it's just like, okay, I mean, maybe in 2035 this is a viable solution. First of all, it would just simply cost too much. We just can't give the model the 5,000 documents that might be relevant and have it read them all. And I've seen enough to start believing in crazy stuff. So I'm willing to just say, sure, in 10 years from now...
swyx: Never say never.
Aaron Levie: In 10 years from now, we'll have infinite context windows at a thousandth of the price of today. Let's just believe that that's possible. But, right, we're in reality today. So today we have a context engineering [00:23:00] problem, which is: I've got, you know, 200,000 tokens that I can work with, or, I don't even know what the latest graph is, before massive degradation. Sixty? Okay, I have 60,000 tokens that I get to work with where I'm gonna get accurate information. That's not a lot of tokens for a corpus of 10 million documents that a knowledge worker might have across all of the teams and all the projects and all the people they work with.
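The back-of-envelope numbers in this exchange can be written out directly. The sketch below uses the speaker's rough figures (10 million documents, about five pages each, about 60,000 usable tokens); the 300 tokens-per-page value is an assumption added here so the result lands near the "couple hundred" pages mentioned in the conversation, and it is not a figure from the episode.

```python
def context_budget(documents: int, pages_per_doc: int,
                   usable_tokens: int, tokens_per_page: int):
    """Return (total pages in the corpus, pages that fit in the window, fraction covered)."""
    total_pages = documents * pages_per_doc
    pages_in_window = usable_tokens // tokens_per_page
    return total_pages, pages_in_window, pages_in_window / total_pages


# Speaker's rough figures; tokens_per_page=300 is an assumption for illustration.
total_pages, pages_in_window, fraction = context_budget(
    documents=10_000_000,
    pages_per_doc=5,
    usable_tokens=60_000,
    tokens_per_page=300,
)

print(f"corpus: {total_pages:,} pages")
print(f"window: {pages_in_window} pages")
print(f"share of corpus visible at once: {fraction:.6%}")
```

Under these assumptions the window holds a few millionths of the corpus at a time, which is why the discussion turns to search and ranking rather than larger context windows.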
I have 10 million documents, which, you know, maybe is times five pages per document or something like that. I'm at 50 million pages of information and I have 60,000 tokens. Like, holy s**t. Yeah. How do I bridge the 50 million pages of information with, you know, the couple hundred that I get to work with in that token window? Yeah. This is such an interesting problem, and that's why so much work is actually just search systems and the databases, and that layer has to just get so locked in. But models are getting better and, importantly, [00:24:00] knowing when they've done a search and found the wrong thing; they go back, they check their work, they find a way to balance appeasing the user versus double checking. We have this one test case where we ask the agent to go find 10 pieces of information.
swyx: Is this the complex work eval?
Aaron Levie: This is actually not in the eval. We have a bunch of internal benchmark scenarios that we run every time we update our agent. We have one where I ask it to find all of our office addresses, and I give it the list of 10 offices that we have. And there's not one document that has this. Maybe there should be; that would be a great example of the kind of thing where maybe over time companies start to have these canonical key areas of knowledge that they need to have. We don't seem to have this one document that says, here are all of our offices. We have a bunch of documents that have, like, here's the New York office and whatever. So you task this agent and you say, I need the addresses for these 10 offices. Okay. And by the way, if you do this on any [00:25:00] public chat model, the same outcome is gonna happen.
But for a different kind of query. You say, I need these 10 addresses. How many times should the agent go and do its search before it decides whether or not there's just no answer to this question? Often, and especially with the, let's say, lower-tier models, it'll come back and give you six of the 10 addresses, and it'll just say, I couldn't find the other four.
swyx: It doesn't know what it doesn't know.
Aaron Levie: It doesn't know what it doesn't know. Yeah. So the model is just like, when should it stop? Should it do that task for literally an hour and just keep cranking through? Maybe I actually made up an office location and it doesn't know that I made it up, and I didn't even know that I made it up. Should it read every single file in your entire Box account until it has exhausted every single piece of information?
swyx: Expensive.
Aaron Levie: These are the new problems that we have. So something like, let's say, a new Opus model is sort of like, okay, I'm gonna try these types of queries. I didn't get exactly what I wanted. I'm gonna try again. At [00:26:00] some point I'm gonna stop searching, ‘cause I've determined that no amount of searching is gonna solve this problem. I'm just not able to do it. And that judgment is a really new thing that the model needs to be able to have. It's like, when should it give up on a task because it just can't find the thing? That's the real world of knowledge work problems. And this is the stuff that the coding agents don't have to deal with.
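The "when should it stop?" question can be made concrete with an explicit stopping rule. The sketch below is one simple way to bound a multi-target search, not how Box's agent or any particular model decides: it retries with reformulated queries, gives up after a fixed number of rounds or after several rounds with no new hits, and reports what is still missing instead of silently returning a partial list. The function name, the `patience` parameter, and the toy index are all hypothetical.

```python
from typing import Callable, Optional


def find_all(targets: list[str],
             search: Callable[[str], Optional[str]],
             queries_for: Callable[[str, int], list[str]],
             max_rounds: int = 5,
             patience: int = 2) -> tuple[dict[str, str], list[str]]:
    """Search for every target; stop when done, out of rounds, or stalled.

    Returns (found, missing) so the caller can say explicitly what was not found.
    """
    found: dict[str, str] = {}
    stalled_rounds = 0
    for round_no in range(max_rounds):
        missing = [t for t in targets if t not in found]
        if not missing:
            break
        progress = False
        for target in missing:
            # Each round may reformulate the query for a still-missing target.
            for query in queries_for(target, round_no):
                hit = search(query)
                if hit is not None:
                    found[target] = hit
                    progress = True
                    break
        stalled_rounds = 0 if progress else stalled_rounds + 1
        if stalled_rounds >= patience:
            break  # more searching is unlikely to help; give up explicitly
    return found, [t for t in targets if t not in found]


# Toy index with placeholder values; "atlantis" stands in for a made-up office.
index = {"new york office address": "addr-NY", "address london": "addr-LON"}
found, missing = find_all(
    targets=["new york", "london", "atlantis"],
    search=index.get,
    queries_for=lambda office, r: [f"{office} office address"] if r == 0 else [f"address {office}"],
)
print(found)
print(missing)
```

A fixed patience threshold is a crude stand-in for the judgment described in the conversation. A real agent would weigh the cost of another search against the chance that the item exists at all.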
Because you're not usually asking it about existing information; you're always creating net new information coming right outta the model, for the most part. Obviously it has to know about your code base and your specs and your documentation. But when you deploy an agent on all of your data, now you have all of these new problems that you're dealing with.
Jeff Huber: Our follow-up research to context rot is actually on agentic search. Ah. And we've sort of stress tested frontier models and their ability to search. And they're not actually that good at searching. Right. So you're sort of highlighting this explore-exploit...
swyx: You're just a Debbie Downer, saying everything doesn't work.
Aaron Levie: Well...
Jeff Huber: Somebody has to be.
Aaron Levie: Can I just throw out one more thing? Yeah. That is different between coding and the rest [00:27:00] of knowledge work, that I failed to mention. So one other key point is that, at the end of the day, whether you believe we're in a slop apocalypse or whatever, if you've built a working solution, that is ultimately what the customer is paying for. Whether I have a lot of slop, a little slop, or whatever. I'm sure there's lots of code bases we could go into in enterprise software companies where it's just crazy slop that humans did over a 20-year period, but the end customer just gets this little interface. They can type into it, it does its thing. Knowledge work doesn't have that property.
If I have an AI model go generate a contract, and I generate a contract 20 times, and all 20 times it's just 3% different, that kind of slop introduces all new kinds of risk for my organization that the code version of that slop didn't introduce. And so how do you constrain these models to just the part that you want [00:28:00] them to work on, and just do the thing that you want them to do? And, you know, in engineering, you can't be disbarred as an engineer, but you could be disbarred as a lawyer. You can do the wrong medical thing in healthcare. There's no equivalent to that in engineering.
swyx: Do you want there to be? Because I've considered software engineer...
Jeff Huber: What's that? In civil engineering there is, right?
Aaron Levie: Not software. Civil engineer, sure. Oh yeah, for sure. But in any of our companies, you know, you'll be forgiven if you took down the site, and we will do a rollback and you'll be in a meeting, but you have not been disbarred as an engineer. We don't change your computer science degree.
Jeff Huber: Blameless postmortem.
Aaron Levie: Yeah, exactly. Exactly. So now maybe we collectively as an industry need to figure out what you are liable for, not legally, but in a management sense, with these agents. All sorts of interesting problems have to come out. But in knowledge work, that's the real hostile environment that we're operating in. Hmm.
swyx: I do think a lot of last year's, 2025's, story was the rise of coding agents, and I think [00:29:00] the 2026 story is definitely knowledge work agents. Yes.
Aaron Levie: A hundred percent.
swyx: Right. And I think OpenClaw and Cowork are just the beginning. Yes. The next one's just gonna be absolute craziness.
Aaron Levie: It is.
And it's gonna be, I mean, again, this wave where we are gonna try and bring as many of the practices from coding as we can, because that will clearly be the forefront, which is: tell an agent to go do something, it has access to a set of resources, and you need to be responsible for reviewing it at the end of the process. That to me is the kind of template that I just think goes across knowledge work. Claude Cowork is a great example. OpenClaw's a great example. You can kind of see what Codex could become over time. These are some really interesting platforms that are emerging.
swyx: Okay. We touched on evals a little bit. You had the report that you were gonna bring up, and then I was gonna go into Box's evals, but go ahead. Talk about your agentic search thing.
Jeff Huber: Yeah. Mostly I think a few of the insights. Number one, frontier models are not good at search. Humans have this [00:30:00] natural explore-exploit trade-off where we kind of understand when to stop doing something. Also, humans are pretty good at forgetting, actually, and pruning their own context, whereas agents are not. And actually, with an agent, in their context history, if they knew something was bad, and you could even see in the reasoning trace, hey, that probably wasn't a good idea, if it's still in the trace, still in the context, they'll still do it again. Uh-huh. And so I think pruning is also gonna be, it's already becoming a thing, right? But letting them self-prune the context window is gonna be a big deal.
swyx: Yeah. So don't leave the mistake in there. Cut out the mistake, but tell it that it made a mistake in the past so it doesn't repeat it.
Jeff Huber: Yeah. But cut it out so it doesn't get distracted by it again.
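The pruning idea discussed here (remove the failed attempt from the context, but leave a short note that it failed) might be sketched as follows. This is an illustrative assumption, not the method from the research mentioned; the `Step` type, the `failed` flag, and the wording of the note are hypothetical, and how a step gets marked as failed is left outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class Step:
    role: str             # e.g. "user", "assistant", "tool", "system"
    content: str
    failed: bool = False  # assumed to be set by some external check


def prune_failures(history: list[Step]) -> list[Step]:
    """Drop failed steps verbatim and append one brief note about them.

    The full text of a failed attempt is removed so it cannot act as an
    unintended few-shot example; only a short summary remains.
    """
    kept = [step for step in history if not step.failed]
    dropped = len(history) - len(kept)
    if dropped:
        kept.append(Step(
            role="system",
            content=f"{dropped} earlier attempt(s) failed and were removed; "
                    "try a different approach.",
        ))
    return kept


history = [
    Step("user", "Find the office addresses."),
    Step("assistant", "Searching the wrong folder in full detail", failed=True),
    Step("assistant", "Found the New York office document."),
]
for step in prune_failures(history):
    print(step.role, "|", step.content)
```

The note here is deliberately generic. Whether a short description of what failed helps more than it distracts is the kind of trade-off the speakers are pointing at.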
'Cause really, you know, it will repeat its mistake just because it's in the context.

Aaron Levie: It's in the context so much. That's a few-shot example. Even if it, yeah. It's like, "Oh, this is a great thing to go try," even if it didn't work.

Jeff Huber: Yeah, exactly. So there's like a bunch of stuff there.

Aaron Levie: Just Groundhog Day inside these models. Yeah. "I'm gonna go keep doing the same wrong thing."

Jeff Huber: Covering sense. I feel like, you know, some creator analogy, you're trying to fit a manifold in latent space, which kind of is doing break program synthesis, which is kind of what we think we're doing, right? Like, you know, certain [00:31:00] facts might be sort of overfitting it. There are certain, you know, sectors of latent space, and so like plug clean space. Yeah.

swyx: So we have a bell. Our editor has a bell every time you say that.

Jeff Huber: So you have to, like, remove those.

swyx: You should have a gong, like TPN or something.

Jeff Huber: If we gong, you either remove those links to kind of give it the freedom to do what you need to do. But yeah, we'll release more soon.

Aaron Levie: That's awesome.

Jeff Huber: That'll be cool.

swyx: We're a cerebral podcast. People listen to us and sort of think really deep. So yeah, we try to keep it subtle.

Aaron Levie: Okay, fine.

Inside Agent Evals

swyx: You guys do have evals. You talked about your office thing, but you've also been promoting APEX agents and complex work. Wherever you wanna take this.

Aaron Levie: APEX is obviously Mercor's agent eval. We supported that by sort of opening up some data for them around how we kind of see these data workspaces in the, you know, kind of regular economy.
So how do lawyers have a workspace? How do investment bankers have a workspace? What kind of data goes into those? And so we [00:32:00] partner with them on their APEX eval. Our own eval is actually relatively straightforward. We have a set of documents in a range of industries. We give them to the agent; previously we did this as a one-shot test of purely the model, and then we just realized, based on where everything's going, it's just gotta be more agentic. So now it's a bit more of a test of both our harness and the model. And we have a rubric of a set of things that it has to get right, and we score it. And you're just seeing, you know, these incredible jumps in almost every single model within its own family: Opus 4, you know, Sonnet 4.6 versus Sonnet 4.5.

swyx: Yeah. We have this up on screen.

Aaron Levie: Okay, cool. So you're seeing it somewhere. I forget, it was like a 15-point jump, I think, on the main, on the overall.

swyx: Yes.

Aaron Levie: And it's just, you know, these incredible leaps that are starting to happen.

swyx: And OP doesn't know any, like, it's completely held out from OP.

Aaron Levie: There's no public data, which has, you know, benefits. This is just a private eval that we [00:33:00] do, and then we just happen to show it to the world. So you can't train against it. And I think it's just as representative of, it's obviously reasoning capabilities, what it's doing at, you know, test-time compute, thinking levels, the context rot issues. So many interesting capabilities that are now improving.

swyx: One sector that you have that's interesting.

Industries and Datasets

swyx: People are roughly familiar with healthcare and legal, but you have public sector in there.

Aaron Levie: Yeah.

swyx: What's that?
Like, what is that?

Aaron Levie: Yeah, and we actually test against, I dunno, maybe 10 industries. We end up usually just cutting a few that we think have interesting gains. All extras, won a lot of, like, government-type documents.

swyx: What is that? Government-type documents?

Aaron Levie: Government filings.

swyx: Like a tax return?

Aaron Levie: Probably not tax returns. It would be more of what would the government be using as data. So think about research, that type of data set. And then we have financial services, for things like data rooms and what would be in an investment prospectus.

swyx: That one you can dogfood.

Aaron Levie: Yeah, exactly. Yes. [00:34:00] So we run the models now in, you know, more of an agent mode, but still with kind of limited capacity, and just try and see, on a like-for-like basis, what are the improvements? And again, we just continue to be blown away by how good these models are getting.

swyx: Yeah, I mean, I think every serious AI company needs something like that, where, well, this is the work we do, here's our company eval. And if you don't have it, well, you're not a serious AI company.

Aaron Levie: There's two dimensions, right? So there's, how are the models improving? And so which models should you recommend a customer use, or which one should you adopt? But then every single day, we're making changes to our agents, and you need to know.

swyx: If you regressed.

Aaron Levie: Yeah. You know, I've been fully convinced that the whole agent observability and eval space is gonna be a massive space. Super excited for what Braintrust is doing, excited for, you know, LangSmith, all the things. And I think, I mean, this is literally every enterprise. Right now, it's like the AI companies are the customers of these tools.
Every enterprise will have this. You'll just [00:35:00] have to have an eval of all of your work. You'll have an eval of your RFP generation, you'll have an eval of your sales material creation, you'll have an eval of your invoice processing. And as you, you know, buy or use new agentic systems, you are gonna need to know what's the quality of your pipeline.

swyx: Yeah.

Aaron Levie: So, huge, huge market with agent evals.

swyx: Yeah.

Building the Agent Team

swyx: And, you know, I'm gonna shout out your team a bit. Your CTO, Ben, did a great talk with us last year. And he's gonna come back again for World's Fair.

Aaron Levie: Awesome. Oh, cool. Yep.

swyx: Just talk about your team, brag a little bit. I think people take these eval numbers and pretty charts for granted, but there's lots of really smart people at work doing all this.

Aaron Levie: Biggest shout-out: we have a couple folks at Dya, uh, Sidarth, that kind of run this. They're like a tag-team duo on our evals. Ben, our CTO, heavily involved. Yasha, head of AI. You know, a bunch of folks. And evals is one part of the story. And then the full, you know, kind of AI and agent team [00:36:00] is core to this whole effort. So there's probably, I don't know, maybe a few dozen people that are like the epicenter. And then you just have layers and layers of concentric circles: okay, then there's a search team that supports them and an infrastructure team that supports them. And it's starting to ripple through the entire company.
But there's that kind of core agent team that's a pretty close-knit group.

swyx: The search team is separate from the infra team?

Aaron Levie: I mean, every layer of the stack we have to kind of do, except for just pure public cloud. You know, we store, I don't even know what our public numbers are, but you can just think about it as a lot of data is stored in Box. And so you have every layer of the stack of how you manage the data: the file system, the metadata system, the search system, just all of those components. And they all are having to understand that now you've got this new customer, which is the agent. They've been building for two types of customers in the past. They've been building for users and they've been building for applications. [00:37:00] And now you've got this new agent user, and it comes in with a different set of properties. Sometimes, like, hey, maybe sometimes we should do an embedding-based kind of search versus your typical semantic search. You just have to build the capabilities to support all of this. And we're testing stuff, throwing things away, something doesn't work and it's not relevant. It's just, you know, total chaos. But all of those teams are supporting the agent team that is kind of coming up with its requirements of what do we need.

swyx: Yeah. We just came from a fireside chat that you did, and you talked about how you're doing this. It's kind of like an internal startup within the broader company. The broader company's like 3,000 people. But, you know, this is a core team of, like, well, here's the innovation center.

Aaron Levie: Yeah.

swyx: And every company kind of is run this way.

Aaron Levie: Yeah. I wanna be sensitive. I don't call it the innovation center.
Yeah. Only because I think everybody has to do innovation. There's a part of the company that is sort of do-or-die for the agent wave.

swyx: Yeah.

Aaron Levie: And it only happens to be more of my focus simply because it's existential that [00:38:00] we get it right.

swyx: Yeah.

Aaron Levie: All of the supporting systems are necessary. All of the surrounding adjacent capabilities are necessary. Like, the only reason we get to be a platform where you'd run an agent is because we have a security feature or a compliance feature or a governance feature that some team is working on. But that's not gonna be the make-or-break of whether we get agents right. That already exists, and we need to keep innovating there. I don't know what the exact precise number is, but it's not a thousand people and it's not 10 people. There's a number of people that are the, you know, startup within the company, that are the make-or-break on everything related to AI agents leveraging our platform and letting you work with your data. And that's where I spend a lot of my time, and Ben and Yosh and Diego and Teri, you know, these are just people across the team that are working on it.

swyx: Yeah. Amazing.

Read Write Agent Workflows

Jeff Huber: How do you think about, I mean, you talked a lot about kind of read workflows over your Box data. Right? You know, agentic search, questions, queries, et cetera. But what about write or authoring workflows?

Aaron Levie: Yes. I've [00:39:00] already probably revealed too much, actually, now that I think about it. So I've talked about whatever.

Jeff Huber: Whatever you can.

Aaron Levie: Okay. It's just us. It's just us. Yeah. Okay.
Of course, of course. So I guess I'll make it a little bit conceptual, because again, I've already said things that are not even GA, but we've kind of danced around it publicly. Okay. Hopefully nobody watches this episode.

swyx: No. It's tidbits for the highly engaged to go figure out what exactly, you know, is your line of thinking. They can connect the dots.

Aaron Levie: Yeah. So I would say that, as a place where you have your enterprise content, there's a use case where I want to have an agent read that data and answer questions for me. And then there's a use case where I want the agent to create something, and use the file system to create something, or store off data that it's working on, or be able to have various files that it's writing to about the work it's doing. So we do see it as a total read-write. The harder problem has so far been the read, because again, you have that kind of [00:40:00] 10-million-to-one ratio problem, whereas writes, a lot of that's just gonna come from the model, and we'll just put it in the file system and use it. So it's a little bit of a technically easier problem. But the part that's not necessarily technically hard, it's just not yet perfected in the state of the ecosystem, is, you know, building a beautiful PowerPoint presentation. It's still a hard problem for these models. These formats just were not built for.

swyx: They're working on it.

Aaron Levie: They're working on it. Everybody's working on it.

swyx: Every launch is like, "Well, we do PowerPoint now."

Aaron Levie: Yeah, getting a lot better each time.
But then you'll do this thing where you'll ask it to update one slide, and all of a sudden the fonts will be just a little bit different on two of the slides, or it moved some shape over to the left a little bit. And again, these are the kind of things that, in code, obviously you could care about if you really care about how beautiful the code is, but the end user doesn't notice all those problems. In file creation, the end user instantly sees it. You're [00:41:00] like, "Ah, paragraph three, you literally just changed the font on me. It's a totally different font midway through the document." Those are the kind of things that you run into a lot on the content creation side. So we are gonna have native agents that do all of those things. They'll be powered by the leading models and labs. But the thing that I think is probably gonna be a much bigger idea over time is any agent on any system, again, using Box as a file system for its work. In that kind of scenario, we don't necessarily care what it's putting in the file system. It could put its memory files, it could put its specification documents, it could put whatever its Markdown files are, or it could generate PDFs. It's a workspace that is sort of sandboxed off for its work. People can collaborate into it, it can share with other people. And so we're thinking a lot about what's the right way to deliver that at scale.

Docs Graphs and Founder Mode

swyx: I wanted to come into sort of the AI transformation or AI operations things. [00:42:00] One of the tweets that you wanted to talk about, this is just me going through your tweets, by the way. Oh, okay.
I mean, like, this is.

Aaron Levie: You read one by one?

swyx: You're the easiest guest to prep for, because you already have, like, "This is what I'm interested in." I'm like, okay, well.

Aaron Levie: Are we gonna get to, like, February, January or something? Where are we in the timelines? How far back are we going?

swyx: Can you describe Box as a set of skills? Right? That's like one of the extremes of, like, well, if you just turn everything into a Markdown file, then your agent can run your company. You just have to find the right sequence of words to do it.

Aaron Levie: Yes. Sorry, is that the question?

swyx: So I think the question is, what if we documented everything the way that you exactly said? Let's get all the Fortune 500s prepared for agents, and, you know, everything's in golden and nicely filed away and everything. What's missing? Like, what's left, right? You've run your company for a decade.

Aaron Levie: Yeah. I think the challenge is that that information changes a week later, because something happened in the market for that [00:43:00] customer, or us as a company, that now has to go get updated. And so these systems are living and breathing, and they have to experience reality and updates to reality, which right now is probably gonna be humans, you know, giving them the updates. And, you know, there is this piece about context graphs that kind of went very viral. I thought it was super provocative. I agreed with many parts of it. I disagree with a few parts around.
You know, it's not gonna be as easy as just, "If we just had the agent traces, then we can finally do that work," because there's so much other stuff that's happening that we haven't been able to capture and digitize. And I think they actually represented that in the piece, to be clear. But you just can't have only skills files for your company, because there's gonna be a lot of other stuff that happens and changes over time.

Jeff Huber: Yeah. Most companies are practically apprenticeships.

swyx: Most companies are practically apprenticeships.

Jeff Huber: Like, every new employee who joins the team, [00:44:00] you spend one to three months ramping them up.

Aaron Levie: Yes.

Jeff Huber: All that tacit knowledge is not written down.

Aaron Levie: Yes.

Jeff Huber: But it would have to be if you wanted to give it to an agent. Right? And so that seems to me to be.

Aaron Levie: One is, I think you're gonna see again a premium on companies that can document this. There'll be a huge premium on that, because, you know, can you shorten that three-month ramp cycle to a two-week ramp cycle? That's an instant productivity gain. Can you dramatically reduce rework in the organization because you've documented where all the stuff is and where the answers are? Can you make your average employee as good as your 90th-percentile employee because you've captured the knowledge that's in the heads of those top employees and made that available? So you can see some very clear productivity benefits.
If you had a company culture of making sure your information was captured, digitized, put in a format that was agent-ready, and then made available to agents to work with. And then you just, again, have this reality: at a 10,000-person [00:45:00] company, mapping that to the access structure of the company is just a hard problem. Not every piece of information that's digitized can be shared to everybody. And so now you have to organize that in a way that actually works. There was a pretty good piece called "Your Company Is a File System." Did you see that one?

swyx: Nope.

Aaron Levie: Uh, yes. You saw it. Yeah. And I'd actually be curious about your thoughts on it. We agree with it, because that's how we see the world.

swyx: Okay. We have it up on screen.

Aaron Levie: Oh, okay. Yeah. But it's all about basically, you know, we already organize in this kind of permission-structure way, and these are the natural ways that agents can now work with data. So it's kind of this interesting metaphor, but I do think companies will have to start to think about how they digitize more of that data. What was your take?

Jeff Huber: Yeah, I mean, the company's probably like an ACID-compliant file system.

Aaron Levie: Uh, yeah.

Jeff Huber: Which I'm guessing Box is, right? So, yeah.

swyx: Yeah. [00:46:00]

Jeff Huber: Which you have a great piece on, but.

swyx: Yeah. Well, my direction is a little bit, I wanna rewind to the graph word you said there. That's a magic trigger word for us. I always ask, what's your take on knowledge graphs? 'Cause, especially every database person, I just wanna see what they think.
There's been knowledge graph hype cycles, and you've seen it all. So.

Aaron Levie: Hmm. I actually am not the expert in knowledge graphs, so that you might need to research.

swyx: You don't need to be an expert. I think it's just, like, how seriously do people take it? Like, is there a lot of potential in the, in the HOVI?

Aaron Levie: Well, can I understand first, is this a loaded question, in the sense of, are you super pro, super con, super anti, medium?

swyx: I see pros and cons. But I think your opinion should be independent of mine.

Aaron Levie: Yeah. No, totally. I just want to see what I'm stepping into.

swyx: No, I know. And it's a huge trigger word for a lot of people out in our audience. And they're trying to figure out.

Aaron Levie: Why is that? Why is this such a hot item for them?

swyx: Because a lot of people get graph religion, and they're like, "Everything's a graph. Of course you have to represent it as a graph." Well, [00:47:00] how do you solve your knowledge changing over time? "Well, it's a graph."

Aaron Levie: Yeah.

swyx: And I think there's that line of work, and then there's a lot of people who are like, "Well, you don't need it." And both are right.

Aaron Levie: Yeah. And the people who say you don't need it, what are they arguing for?

swyx: Markdown files. Simplicity.

Aaron Levie: Oh, sure, sure. Yeah.

swyx: It's structure versus less structure. Right. That's all it is.

Aaron Levie: I think the tricky thing is, again, when this gets met with real humans, they're just going to their computer. They're just working with some people on Slack or Teams. They're just sharing some data through a collaborative file system, in Google Docs or Box or whatever.
I certainly like the vision of most knowledge graph, you know, futuristic ways of thinking about it. It's just, you know, it's 2026. We haven't seen it yet kind of play out. I mean, I remember, actually, I don't even know how old you guys are, but to show my age, I remember 17 years ago, everybody thought enterprises would just run on [00:48:00] wikis. And Confluence, I mean, Confluence actually took off for engineering, for sure, unquestionably. But this was like, everything would be in the wiki. And I think based on our general style of what we were building, we were just like, I don't know, people just want a workspace. They're gonna collaborate with other people.

swyx: Exactly. Yeah. So you were anti-knowledge graph.

Aaron Levie: Not anti, not anti. I'm not anti. 'Cause I think your search system, I just think these are two systems that probably, but I'm not in any religious war. I don't want to be in anybody's YouTube comments on this. There's not a fight for me.

swyx: We love YouTube comments. We get into comments.

Aaron Levie: Okay. But it's mostly just a virtue of what we built. And we just continued down that path.

swyx: Yeah.

Aaron Levie: And that was what we pursued. But this is not a, you know.

swyx: It's not existential for you. Great.

Aaron Levie: We're happy to plug into somebody else's graph. We're happy to feed data into it. We're happy for [00:49:00] agents to talk to multiple systems. Not our fight.

swyx: Yeah.

Aaron Levie: But I need your answer. Yeah.
Jeff Huber: Graphs are nerd snipes, very effective nerd snipes.

swyx: See, this is one opinion, and then I've.

Jeff Huber: And I think that the actual graph structure is emergent in the mind of the agent, in the same way it is in the mind of the human. And that's a more powerful graph, 'cause it actually evolved over time.

swyx: So, "Don't tell me how to graph. I'll figure it out myself."

Jeff Huber: Exactly.

swyx: Okay. All right.

Jeff Huber: And what's yours?

swyx: I like the wiki approach. Uh, my, I'm actually
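
The context-pruning idea discussed earlier in this conversation (cut a failed step out of the agent's history so it stops acting as a few-shot example, but keep a short note that it failed) could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Step` structure, the `failed` flag, and the `prune_context` helper are hypothetical names invented here, not anything the speakers or their products are known to use.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One entry in an agent's context history (hypothetical structure)."""
    role: str             # e.g. "user", "assistant", "tool"
    content: str
    failed: bool = False  # assumed to be set by some external judge or check


def prune_context(history: list[Step], max_note_chars: int = 80) -> list[Step]:
    """Remove failed steps and append one short note listing what not to retry."""
    kept = [step for step in history if not step.failed]
    failures = [step for step in history if step.failed]
    if not failures:
        return kept
    # Keep only a truncated summary so the full failed attempt no longer
    # sits in the context as something to imitate.
    summary = "; ".join(step.content[:max_note_chars] for step in failures)
    note = Step(
        role="user",
        content=f"Already tried without success, do not repeat: {summary}",
    )
    return kept + [note]


history = [
    Step("user", "Find the 2024 supplier contract"),
    Step("assistant", "search: supplier contract 2023", failed=True),
    Step("assistant", "search: supplier contract 2024"),
]
pruned = prune_context(history)
for step in pruned:
    print(f"{step.role}: {step.content}")
```

How much of the failed attempt to keep in the note is a design choice: a shorter note distracts the model less, but gives it less to go on when avoiding the same mistake.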


From Technical Debt to 4x Engineering Velocity with Gayatri Narayan of Builders FirstSource

CIO Classified

Feb 26, 2026 (22:44)

Two words that make most engineers shudder: code refactoring. Now raise the stakes: refactoring decades of legacy systems inside a large enterprise. A tech debt-heavy project of this scale needs a leader who has driven complex digital transformations, like Gayatri Narayan (formerly PepsiCo, Microsoft, Amazon). Now President of Technology at Builders FirstSource, Gayatri has achieved a 3–4x increase in engineering velocity since joining less than a year ago. She joins host Yousuf Khan to unpack the strategy behind those results, including how to deploy AI across the SDLC, how to rigorously evaluate ROI on AI investments, and how to lead change across complex enterprise tech stacks.

Key Moments:
01:30 – Why Construction Technology Is Ready for Transformation
04:05 – AI Strategy: Elevating UX and Customer Experience
08:20 – Evaluating AI Investments: ROI, NPV, and Operating Costs
12:45 – Achieving 3–4x Engineering Velocity
16:05 – Humans in the Loop: Craft, Code Review, and AI Amplification
18:35 – Where the Industry Gets AI Adoption Wrong
20:30 – Leadership Advice: Start with the Customer

About Gayatri: Gayatri Narayan is a general management executive with more than 15 years of experience leading product, engineering, data science, and operations across global enterprises, with full P&L responsibility and a track record of driving profitable growth through digital transformation. She currently serves as President of Technology at Builders FirstSource, where she leads enterprise technology strategy, modernizes legacy systems, and embeds AI into the software development lifecycle to accelerate innovation across the residential construction value chain. Previously, she served as Senior Vice President of Digital Products and Services at PepsiCo and held multiple general management roles at Microsoft, including leading Product and Engineering for Intelligent Communications across Teams and Skype as well as Enterprise PaaS and SaaS businesses; she also held leadership roles at Amazon spanning Marketplace Transportation and Logistics and several major retail categories.

Guest Highlights:
"We've seen a three to four times increase in engineering velocity, especially in refactoring legacy systems where historically there was very little knowledge of how the system actually worked."
"With generative AI, companies that have existed for 20 or 30 years don't have to get bogged down by legacy stacks. They can embrace emerging technologies without spending 18 to 24 months just refactoring."
"It really comes down to efficiency of time. The developer's surface area of impact expands dramatically: it's not just about writing code anymore, it's about delivering business value faster."

Visit ciopod.com for more episodes. Subscribe on YouTube or follow on your favorite podcast platform so you never miss a conversation with today's top technology leaders.

20VC: Codex vs Claude Code vs Cursor: Who Wins, Who Loses | Will All Coding Be Automated - Do We Need PMs | The Real Bottleneck to AGI | The Three Phases of Agents and What You Need to Know with Alex Embiricos, Head of Codex at OpenAI

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Feb 21, 2026 (67:55)

Alexander Embiricos is the Head of Codex at OpenAI, leading the development of the company's flagship AI coding systems that power automated software generation, debugging and developer workflows. Under his leadership, Codex has become one of the most widely adopted AI developer platforms.

AGENDA:
05:13 Will Coding Be Automated? Why AI Could Create More Engineers, Not Fewer
07:17 Do We Need PMs? The "Undefined" Product Role and When It Matters
08:06 The Real AGI Bottleneck: Human Prompting, Validation, and "Too Much Effort"
13:04 Three Phases of Agents: Coding → Computer Use → Productized Workflows
13:52 Enterprise Reality Check: Security, Permissions, and Safe Agentic Browsing
17:57 Is Inference the New Sales and Marketing?
18:49 What % of Codex Was Written by AI?
21:33 Do OpenAI Use AI for Code Review?
23:31 Is there any stickiness to AI coding tools?
28:22 What Does "Winning" Mean at OpenAI? Mission, Competition, and Moats
32:04 The Future UI: Chat or Voice
34:10 Agent-to-Agent Workflows: Designing for Approvals, Compliance, and Automation
35:39 Do Coding Models Have a Data Moat?
36:50 How does Codex View Data: Will They Build Their Own Mercor and Turing?
37:27 How Does Codex View Consumer: Will They Compete with Lovable?
41:56 Benchmarks vs "Vibes": How People Actually Judge Models
42:43 Cursor's Edge and the Case for Building Your Own Models
47:37 Is SaaS Dead? What Still Defends Value (Humans + Systems of Record)
51:28 Talent Wars and Career Advice for New Engineers in the AI Era
01:01:03 Guardrails, the Fully AI-Managed Stack, and a 10-Year Vision for Everyone

99. On Software Architecture in the Era of AI-Assisted Development with Łukasz Szydło and Marcin Markowski

Better Software Design

Play Episode Listen Later Feb 5, 2026 101:34

It has been a while since the last episodes, but the IT world has not stood still. On the contrary, it has sped up so much that at times it is hard to keep up. So in this episode, together with Łukasz Szydło and Marcin Markowski, we simply try to think out loud about what is really happening to the work of the software architect, and to software architecture in general, in the age of ubiquitous Generative AI. With new models coming out at an ever faster pace, it is hard to talk about which 10 tools will change your life as an architect and are worth using right now. Instead, we sat down to talk about our insights and observations from the field. AI is gradually leading us into a trap: code gets written in a flash, but our human capacity to read and verify it stays essentially unchanged. Are we slowly turning into editors of code as a result, and will Code Review soon become the biggest bottleneck in our projects? Yet Code Review is only one stage of the Software Development Lifecycle where the impact of AI tools is visible.

Announcement! Soon, on February 17, we will be able to meet at the open workshop DevHours: Fullstack x EventStorming, which I have the pleasure of co-organizing with Capgemini. If you are interested in software and want to improve your software design skills, I invite you to register.


509 Bumpers 90

Reversim Podcast

Play Episode Listen Later Jan 11, 2026

Episode number 509 of Reverse with Platform (Reversim): Bumpers number 90, recorded on January 1, 2026. Happy new calendar year! Ran, Dotan and Alon in the virtual studio (with Riverside) with a series of shorts and news (and sometimes slightly old news) from around the internet: the blogs, the GitHubs, the Rusts and the new LLMs of the recent period.


505 Bumpers 89

Reversim Podcast

Play Episode Listen Later Nov 22, 2025

Episode number 505 of Reverse with Platform (Reversim): Bumpers number 89, recorded on November 13, 2025, right after the Reversim 2025 conference [there is video!]. Ran, Dotan and Alon (with a guest appearance by Shlomi Noach!) in the virtual studio with a series of shorts from around the internet: the blogs, the GitHubs, the Claudes and the new GPTs of the recent period.


424: I Never Really Loved Coding (And Only AI Made Me Realize It)

The Bootstrapped Founder

Play Episode Listen Later Nov 21, 2025 18:38 Transcription Available

After 20+ years as a software developer, AI coding assistants revealed a shocking truth: I never actually loved coding; I loved what code could accomplish. In this episode, I explore how transitioning from hand-crafting every line at Podscan to orchestrating AI-generated code exposed the fundamental difference between developers who cherish solving technical puzzles and entrepreneurs who prioritize shipping features that drive business value. This shift from programmer to orchestrator isn't just about tools; it's about letting go of a carefully constructed identity and embracing that for software entrepreneurs, pristine code was never the goal: rapid deployment, customer value, and business growth always were. If you're struggling with AI coding tools or clinging to perfectionist coding standards, this perspective might fundamentally change how you view your role as a technical founder.

This episode of The Bootstrapped Founder is sponsored by Paddle.com
You'll find the Black Friday Guide here: https://www.paddle.com/learn/grow-beyond-black-friday
The blog post: https://thebootstrappedfounder.com/i-never-really-loved-coding-and-only-ai-made-me-realize-it/
The podcast episode: https://tbf.fm/episodes/424-i-never-really-loved-coding-and-only-ai-made-me-realize-it
Check out Podscan, the Podcast database that transcribes every podcast episode out there minutes after it gets released: https://podscan.fm
Send me a voicemail on Podline: https://podline.fm/arvid
You'll find my weekly article on my blog: https://thebootstrappedfounder.com
Podcast: https://thebootstrappedfounder.com/podcast
Newsletter: https://thebootstrappedfounder.com/newsletter
My book Zero to Sold: https://zerotosold.com/
My book The Embedded Entrepreneur: https://embeddedentrepreneur.com/
My course Find Your Following: https://findyourfollowing.com

Here are a few tools I use. Using my affiliate links will support my work at no additional cost to you.
- Notion (which I use to organize, write, coordinate, and archive my podcast + newsletter): https://affiliate.notion.so/465mv1536drx
- Riverside.fm (that's what I recorded this episode with): https://riverside.fm/?via=arvid
- TweetHunter (for speedy scheduling and writing Tweets): http://tweethunter.io/?via=arvid
- HypeFury (for massive Twitter analytics and scheduling): https://hypefury.com/?via=arvid60
- AudioPen (for taking voice notes and getting amazing summaries): https://audiopen.ai/?aff=PXErZ
- Descript (for word-based video editing, subtitles, and clips): https://www.descript.com/?lmref=3cf39Q
- ConvertKit (for email lists, newsletters, even finding sponsors): https://convertkit.com?lmref=bN9CZw


Crossover with Embedded AI Podcast

The Agile Embedded Podcast

Play Episode Listen Later Nov 18, 2025 55:02

In this special crossover episode with the brand-new Embedded AI Podcast, Luca and Jeff are joined by Ryan Torvik, Luca's co-host on the Embedded AI podcast, to explore the intersection of AI-powered development tools and agile embedded systems engineering. The hosts discuss practical strategies for using Large Language Models (LLMs) effectively in embedded development workflows, covering topics like context management, test-driven development with AI, and maintaining code quality standards in safety-critical systems.

The conversation addresses common anti-patterns that developers encounter when first adopting LLM-assisted coding, such as "vibe coding" yourself off a cliff by letting the AI generate too much code at once, losing control of architectural decisions, and failing to maintain proper test coverage. The hosts emphasize that while LLMs can dramatically accelerate prototyping and reduce boilerplate coding, they require even more rigorous engineering discipline - not less. They discuss how traditional agile practices like small commits, continuous integration, test-driven development, and frequent context resets become even more critical when working with AI tools.

For embedded systems engineers working in safety-critical domains like medical devices, automotive, and aerospace, the episode provides valuable guidance on integrating AI tools while maintaining deterministic quality processes. The hosts stress that LLMs should augment, not replace, static analysis tools and human code reviews, and that developers remain fully responsible for AI-generated code. Whether you're just starting with AI-assisted development or looking to refine your approach, this episode offers actionable insights for leveraging LLMs effectively while keeping the reins firmly in hand.

## Key Topics

* [03:45] LLM Interface Options: Web, CLI, and IDE Plugins - Choosing the Right Tool for Your Workflow
* [08:30] Prompt Engineering Fundamentals: Being Specific and Iterative with LLMs
* [12:15] Building Effective Base Prompts: Learning from Experience vs. Starting from Templates
* [16:40] Context Window Management: Avoiding Information Overload and Hallucinations
* [22:10] Understanding LLM Context: Files, Prompts, and Conversation History
* [26:50] The Nature of Hallucinations: Why LLMs Always Generate, Never Judge
* [29:20] Test-Driven Development with AI: More Critical Than Ever
* [35:45] Avoiding 'Vibe Coding' Disasters: The Importance of Small, Testable Increments
* [42:30] Requirements Engineering in the AI Era: Becoming More Specific About What You Want
* [48:15] Extreme Programming Principles Applied to LLM Development: Small Steps and Frequent Commits
* [52:40] Context Reset Strategies: When and How to Start Fresh Sessions
* [56:20] The V-Model Approach: Breaking Down Problems into Manageable LLM-Sized Chunks
* [01:01:10] AI in Safety-Critical Systems: Augmenting, Not Replacing, Deterministic Tools
* [01:06:45] Code Review in the AI Age: Maintaining Standards Despite Faster Iteration
* [01:12:30] Prototyping vs. Production Code: The Superpower and the Danger
* [01:16:50] Shifting Left with AI: Empowering Product Owners and Accelerating Feedback Loops
* [01:19:40] Bootstrapping New Technologies: From Zero to One in Minutes Instead of Weeks
* [01:23:15] Advice for Junior Engineers: Building Intuition in the Age of AI-Assisted Development

## Notable Quotes

> "All of us are new to this experience. Nobody went to school back in the 80s and has been doing this for 40 years. We're all just running around, bumping into things and seeing what works for us." - Ryan Torvik

> "An LLM is just a token generator. You stick an input in, and it returns an output, and it has no way of judging whether this is correct or valid or useful. It's just whatever it generated. So it's up to you to give it input data that will very likely result in useful output data." - Luca Ingianni

> "Tests tell you how this is supposed to work. You can have it write the test first and then evaluate the test. Using tests helps communicate - just like you would to another person - no, it needs to function like this, it needs to have this functionality and behave in this way." - Ryan Torvik

> "I find myself being even more aggressively biased towards test-driven development. While I'm reasonably lenient about the code that the LLM writes, I am very pedantic about the tests that I'm using. I will very thoroughly review them and really tweak them until they have the level of detail that I'm interested in." - Luca Ingianni

> "It's really forcing me to be a better engineer by using the LLM. You have to go and do that system level understanding of the problem space before you actually ask the LLM to do something. This is what responsible people have been saying - this is how you do engineering." - Ryan Torvik

> "I can use LLMs to jumpstart me or bootstrap me from zero to one. Once there's something on the screen that kind of works, I can usually then apply my general programming skill, my general engineering taste to improve it. Getting from that zero to one is now not days or weeks of learning - it's 20 minutes of playing with it." - Jeff Gable

> "LLMs are fantastic at small-scale stuff. They will be wonderful at finding better alternatives for how to implement a certain function. But they are absolutely atrocious at large-scale stuff. They will gleefully mess up your architecture and not even notice because they cannot fit it into their tiny electronic brains." - Luca Ingianni

> "Don't be afraid to try it out. We're all noobs to this. This is the brave noob world of AI exploration. Be curious about it, but also be cautious about it. Don't ever take your hands off the reins. Trust your engineering intuition - even young folks that are just starting, trust your engineering intuition." - Ryan Torvik

> "As the saying goes, good judgment comes from experience. Experience comes from bad judgment. You'll find spectacular ways of messing up - that is how you become a decent engineer. LLMs do not change that. Junior engineers will still be necessary, will still be around, and they will still evolve into senior engineers eventually after they've fallen on their faces enough times." - Luca Ingianni

You can find Jeff at https://jeffgable.com.
You can find Luca at https://luca.engineer.
Want to join the agile Embedded Slack? Click here
Are you looking for embedded-focused trainings? Head to https://agileembedded.academy/
Ryan Torvik and Luca have started the Embedded AI podcast, check it out at https://embeddedaipodcast.com/

#220 Code Reviews as a Communication Network with Prof. Michael Dorner

Engineering Kiosk

Play Episode Listen Later Nov 4, 2025 76:56 Transcription Available

Is your code review blocking the release yet again, or is it the invisible glue that spreads knowledge across the team? In this episode we get to the bottom of why reviews are far more than a simple "looks good to me" and what they have to do with social interaction, team dynamics, and knowledge distribution. We talk with Prof. Michael Dorner, Professor of Software Engineering at TH Nürnberg, who has been researching the role of code reviews in software development for years, using code review data from Microsoft, Spotify, and trivago. The same picture emerges everywhere: pull requests are more than technical checks; they are communication networks. Together we look at why tooling is often secondary, how review practices have evolved historically, and what all of this has to do with ownership, architecture, and even taxes. A look at code reviews that will definitely give you a new perspective.

Bonus: We explain why all computer science PhDs are also philosophers ;)

You can find our current advertising partners at https://engineeringkiosk.dev/partners

#219 Technical Debt: Take It On Deliberately, Pay It Down Purposefully

Engineering Kiosk

Play Episode Listen Later Oct 28, 2025 60:53

Technical debt: ship the code and move on, or clean up first?

Technical debt often feels like dead weight, but it can be your strongest lever for speed. The crux is to take it on deliberately and visibly, and to pay it down consistently. In this episode we talk about how we use technical debt strategically without getting stuck in the long run.

Ward Cunningham says: technical debt is not automatically bad code. We sort out what really counts as "debt" and why stopgap solutions often live longer than planned. Then we widen the perspective from the code and architecture level to people and processes: knowledge silos, missing code review, and organizational decisions can be debt just as much as an any in TypeScript. We discuss meaningful indicators such as DORA metrics, cyclomatic complexity, and the CRAP index, as well as their limits: why trends across releases are more helpful than single values, and how team scaling affects the metrics. Then the business side: real costs, productivity losses, frustration in the team, and turnover. The Sonos app rewrite serves as an illustration, an expensive lesson in accumulated debt.

If you want to know how to use technical debt as a tool in your team, combine metrics and culture wisely, and argue the business impact cleanly, this episode is for you.

Bonus: We reveal why legacy alone is not debt and how open source, platform teams, and standardization can save you real interest.

You can find our current advertising partners at https://engineeringkiosk.dev/partners
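For readers unfamiliar with the CRAP index named in this description, here is a minimal sketch. It assumes the commonly cited Change Risk Anti-Patterns formula, CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m), where comp is a method's cyclomatic complexity and cov its test coverage as a fraction. The function name `crap_score` and the sample numbers are illustrative and do not come from the episode.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """Return the CRAP score for one method.

    complexity: cyclomatic complexity of the method (>= 1).
    coverage:   fraction of the method covered by tests, in [0.0, 1.0].
    Assumes the commonly cited formula comp^2 * (1 - cov)^3 + comp.
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity must be at least 1")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


if __name__ == "__main__":
    # Hypothetical methods: the same complexity scores very differently
    # depending on how much of it is covered by tests.
    print(crap_score(10, 0.0))  # untested:     110.0
    print(crap_score(10, 1.0))  # fully tested:  10.0
    print(crap_score(5, 0.5))   # half tested:    8.125
```

Under this formula, coverage can only pull a method's score down to its complexity, never below it. That fits the description's point that a single value says little, while the trend of such scores across releases says more.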


420: AI for the Code-Writing Purist: How to Use AI Without Surrendering Your Keyboard

The Bootstrapped Founder

Play Episode Listen Later Oct 24, 2025 23:25 Transcription Available

I know you're out there. The developer who watches their colleagues enthusiastically embrace Claude Code and Cursor, having AI write entire feature sets while you proudly type every semicolon by hand. The founder who sees AI-generated code as a ticking time bomb of bugs and security vulnerabilities. The software entrepreneur who believes that real code comes from human minds, not language models.

This one's for you.

This episode of The Bootstrapped Founder is sponsored by Paddle.com
You'll find the Black Friday Guide here: https://www.paddle.com/learn/grow-beyond-black-friday
The blog post: https://thebootstrappedfounder.com/ai-for-the-code-writing-purist-how-to-use-ai-without-surrendering-your-keyboard/
The podcast episode: https://tbf.fm/episodes/420-ai-for-the-code-writing-purist-how-to-use-ai-without-surrendering-your-keyboard
Check out Podscan, the Podcast database that transcribes every podcast episode out there minutes after it gets released: https://podscan.fm
Send me a voicemail on Podline: https://podline.fm/arvid
You'll find my weekly article on my blog: https://thebootstrappedfounder.com
Podcast: https://thebootstrappedfounder.com/podcast
Newsletter: https://thebootstrappedfounder.com/newsletter
My book Zero to Sold: https://zerotosold.com/
My book The Embedded Entrepreneur: https://embeddedentrepreneur.com/
My course Find Your Following: https://findyourfollowing.com

Here are a few tools I use. Using my affiliate links will support my work at no additional cost to you.
- Notion (which I use to organize, write, coordinate, and archive my podcast + newsletter): https://affiliate.notion.so/465mv1536drx
- Riverside.fm (that's what I recorded this episode with): https://riverside.fm/?via=arvid
- TweetHunter (for speedy scheduling and writing Tweets): http://tweethunter.io/?via=arvid
- HypeFury (for massive Twitter analytics and scheduling): https://hypefury.com/?via=arvid60
- AudioPen (for taking voice notes and getting amazing summaries): https://audiopen.ai/?aff=PXErZ
- Descript (for word-based video editing, subtitles, and clips): https://www.descript.com/?lmref=3cf39Q
- ConvertKit (for email lists, newsletters, even finding sponsors): https://convertkit.com?lmref=bN9CZw


#217 Bug Management: Capturing, Reporting, Classifying, Triaging

Engineering Kiosk

Play Episode Listen Later Oct 14, 2025 50:40

Bug management takes both the will and the skill.

Every one of us knows them: bugs in software. They don't just hide in deep architectural decisions or quirks of user behavior. They are part of everyday life, no matter how much test automation, AI support, or code review we have in our processes. But how do you deal with it when the bug list keeps growing, your team groans about Jira tickets, and the question hangs in the air: is it even worth managing bugs systematically?

In this episode we take you through all facets of modern bug management. We discuss how bugs arise in the first place, why 'zero bug' promises are a myth, and what strategies exist for finding defects as early as possible, whether through beta channels, dogfooding inside your own company, or creative recruiting. We dive into the world of bug reports: what does a really good one look like? What information does engineering need, and how do you lower the barriers so that your team (and the community) actually reports? We also speak plainly about prioritization: how do you classify bugs by user impact, complexity, and business value instead of despairing over too many colorful Jira fields?

Curious? Then stay tuned.

Bonus: Unexpected fun-fact challenge → is bad UX a bug or a feature?

You can find our current advertising partners at https://engineeringkiosk.dev/partners

From Deterministic to AI-Driven—The New Paradigm of Software Development | Markus Hjort

Scrum Master Toolbox Podcast

Play Episode Listen Later Oct 9, 2025 44:17

AI Assisted Coding: From Deterministic to AI-Driven - The New Paradigm of Software Development, With Markus Hjort

In this BONUS episode, we dive deep into the emerging world of AI-assisted coding with Markus Hjort, CTO of Bitmagic. Markus shares his hands-on experience with what's being called "vibe coding" - a paradigm shift where developers work more like technical product owners, guiding AI agents to produce code while focusing on architecture, design patterns, and overall system quality. This conversation explores not just the tools, but the fundamental changes in how we approach software engineering as a team sport.

Defining Vibecoding: More Than Just Autocomplete

"I'm specifying the features by prompting, using different kinds of agentic tools. And the agent is producing the code. I will check how it works and glance at the code, but I'm a really technical product owner."

Vibecoding represents a spectrum of AI-assisted development approaches. Markus positions himself between pure "vibecoding" (where developers don't look at code at all) and traditional coding. He produces about 90% of his code using AI tools, but maintains technical oversight by reviewing architectural patterns and design decisions. The key difference from traditional autocomplete tools is the shift from deterministic programming languages to non-deterministic natural language prompting, which requires an entirely different way of thinking about software development.

The Paradigm Shift: When AI Changed Everything

"It's a different paradigm! Looking back, it started with autocomplete where Copilot could implement simple functions. But the real change came with agentic coding tools like Cursor and Claude Code."

Markus traces his journey through three distinct phases. First came GitHub Copilot's autocomplete features for simple functions - helpful but limited. Next, ChatGPT enabled discussing architectural problems and getting code suggestions for unfamiliar technologies. The breakthrough arrived with agentic tools like Cursor and Claude Code that can autonomously implement entire features. This progression mirrors the historical shift from assembly to high-level languages, but with a crucial difference: the move from deterministic to non-deterministic communication with machines.

Where Vibecoding Works Best: Knowing Your Risks

"I move between different levels as I go through different tasks. In areas like CSS styling where I'm not very professional, I trust the AI more. But in core architecture where quality matters most, I look more thoroughly."

Vibecoding effectiveness varies dramatically by context. Markus applies different levels of scrutiny based on his expertise and the criticality of the code. For frontend work and styling where he has less expertise, he relies more heavily on AI output and visual verification. For backend architecture and core system components, he maintains closer oversight. This risk-aware approach is essential for startup environments where developers must wear multiple hats. The beauty of this flexibility is that AI enables developers to contribute meaningfully across domains while maintaining appropriate caution in critical areas.

Teaching Your Tools: Making AI-Assisted Coding Work

"You first teach your tool to do the things you value. Setting system prompts with information about patterns you want, testing approaches you prefer, and integration methods you use."

Success with AI-assisted coding requires intentional configuration and practice. Key strategies include:
- System prompts: Configure tools with your preferred patterns, testing approaches, and architectural decisions
- Context management: Watch context length carefully; when the AI starts making mistakes, reset the conversation
- Checkpoint discipline: Commit working code frequently to Git - at least every 30 minutes, ideally after every small working feature
- Dual AI strategy: Use ChatGPT or Claude for architectural discussions, then bring those ideas to coding tools for implementation
- Iteration limits: Stop and reassess after roughly 5 failed iterations rather than letting AI continue indefinitely
- Small steps: Split features into minimal increments and commit each piece separately

In this segment we refer to the episode with Alan Cyment on AI Assisted Coding, and the Pachinko coding anti-pattern.

Team Dynamics: Bigger Chunks and Faster Coordination

"The speed changes a lot of things. If everything goes well, you can produce so much more stuff. So you have to have bigger tasks. Coordination changes - we need bigger chunks because of how much faster coding is."

AI-assisted coding fundamentally reshapes team workflows. The dramatic increase in coding speed means developers need larger, more substantial tasks to maintain flow and maximize productivity. Traditional approaches of splitting stories into tiny tasks become counterproductive when implementation speed increases 5-10x. This shift impacts planning, requiring teams to think in terms of complete features rather than granular technical tasks. The coordination challenge becomes managing handoffs and integration points when individuals can ship significant functionality in hours rather than days.

The Non-Deterministic Challenge: A New Grammar

"When you're moving from low-level language to higher-level language, they are still deterministic. But now with LLMs, it's not deterministic. This changes how we have to think about coding completely."

The shift to natural language prompting introduces fundamental uncertainty absent from traditional programming. Unlike the progression from assembly to C to Python - all deterministic - working with LLMs means accepting probabilistic outputs. This requires developers to adopt new mental models: thinking in terms of guidance rather than precise instructions, maintaining checkpoints for rollback, and developing intuition for when AI is "hallucinating" versus producing valid solutions. Some developers struggle with this loss of control, while others find liberation in focusing on what to build rather than how to build it.

Code Reviews and Testing: What Changes?

"With AI, I spend more time on the actual product doing exploratory testing. The AI is doing the coding, so I can focus on whether it works as intended rather than syntax and patterns."

Traditional code review loses relevance when AI generates syntactically correct, pattern-compliant code. The focus shifts to testing actual functionality and user experience. Markus emphasizes:
- Manual exploratory testing becomes more important as developers can't rely on having written and understood every line
- Test discipline is critical - AI can write tests that always pass (assert true), so verification is essential
- Test-first approach helps ensure tests actually verify behavior rather than just existing
- Periodic test validation: Randomly modify test outputs to verify they fail when they should
- Loosening review processes to avoid bottlenecks when code generation accelerates dramatically

Anti-Patterns and Pitfalls to Avoid

Several common mistakes emerge when developers start with AI-assisted coding:
- Continuing too long: When AI makes 5+ iterations without progress, stop and reset rather than letting it spiral
- Skipping commits: Without frequent Git checkpoints, recovery from AI mistakes becomes extremely difficult
- Over-reliance without verification: Trusting AI-generated tests without confirming they actually test something meaningful
- Ignoring context limits: Continuing to add context until the AI becomes confused and produces poor results
- Maintaining traditional task sizes: Splitting work too granularly when AI enables completing larger chunks
- Forgetting exploration: Reading about tools rather than experimenting hands-on with your own projects

The Future: Autonomous Agents and Automatic Testing

"I hope that these LLMs will become larger context windows and smarter. Tools like Replit are pushing boundaries - they can potentially do automatic testing and verification for you."

Markus sees rapid evolution toward more autonomous development agents. Current trends include:
- Expanded context windows enabling AI to understand entire codebases without manual context curation
- Automatic testing generation where AI not only writes code but also creates and runs comprehensive test suites
- Self-verification loops where agents test their own work and iterate without human intervention
- Design-to-implementation pipelines where UI mockups directly generate working code
- Agentic tools that can break down complex features autonomously and implement them incrementally

The key insight: we're moving from "AI helps me code" to "AI codes while I guide and verify" - a fundamental shift in the developer's role from implementer to architect and quality assurance.

Getting Started: Experiment and Learn by Doing

"I haven't found a single resource that covers everything. My recommendation is to try Claude Code or Cursor yourself with your own small projects. You don't know the experience until you try it."

Rather than pointing to comprehensive guides (which don't yet exist for this rapidly evolving field), Markus advocates hands-on experimentation. Start with personal projects where stakes are low. Try multiple tools to understand their strengths. Build intuition through practice rather than theory. The field changes so rapidly that reading about tools quickly becomes outdated - but developing the mindset and practices for working with AI assistance provides durable value regardless of which specific tools dominate in the future.

About Markus Hjort

Markus is Co-founder and CTO of Bitmagic, and has over 20 years of software development expertise. Starting with Commodore 64 game programming, his career spans gaming, fintech, and more. As a programmer, consultant, agile coach, and leader, Markus has successfully guided numerous tech startups from concept to launch. You can connect with Markus Hjort on LinkedIn.
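The test-validation advice in this description (check that tests fail when they should, because a generated test may amount to "assert true") can be sketched in a few lines. This is one possible illustration and not Markus's actual workflow. Every name here (`add`, `broken_add`, `vacuous_test`, `real_test`, `detects_breakage`) is hypothetical.

```python
from typing import Callable

IntOp = Callable[[int, int], int]
TestFn = Callable[[IntOp], None]


def add(a: int, b: int) -> int:
    """Implementation under test (hypothetical example)."""
    return a + b


def broken_add(a: int, b: int) -> int:
    """Deliberately wrong variant, used only to probe the tests."""
    return a + b + 1


def vacuous_test(fn: IntOp) -> None:
    """A test that always passes: it never looks at fn's result."""
    assert True


def real_test(fn: IntOp) -> None:
    """A test that actually checks behavior."""
    assert fn(2, 3) == 5


def detects_breakage(test: TestFn, good: IntOp, broken: IntOp) -> bool:
    """Return True if the test passes on `good` and fails on `broken`."""
    test(good)  # raises AssertionError if the test fails on correct code
    try:
        test(broken)
    except AssertionError:
        return True   # the test noticed the deliberate bug
    return False      # the test passed anyway, so it verifies nothing


if __name__ == "__main__":
    print(detects_breakage(real_test, add, broken_add))     # True
    print(detects_breakage(vacuous_test, add, broken_add))  # False
```

This is a hand-rolled version of what mutation-testing tools automate. It is kept tool-free here so that it makes no assumptions about any particular test framework.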

475: Invisible Mentorship

The Bike Shed

Play Episode Listen Later Sep 23, 2025 38:11

Sally and Aji discuss their experiences with invisible mentorship when it comes to code review. Together they question when the right time is to have conversations with your team in a bid to chase improvement, the importance of understanding your co-workers' perspectives, as well as the best ways to initiate a mentoring moment. — Check out some of the things mentioned in this episode - The Coding Train (https://thecodingtrain.com) - Sarah Mei's Livable Code (https://www.youtube.com/watch?v=lI77oMKr5EY&pp=ygUTc2FyYWggbWVpIHJhaWxzY29uZg==) Thanks to our sponsors for this episode: Judoscale - Autoscale the Right Way (https://judoscale.com/bikeshed) (check the link for your free gift!), and Scout Monitoring (https://www.scoutapm.com/). Your hosts for this episode have been thoughtbot's own Sally Hall (https://www.linkedin.com/in/sallyannahall) and Aji Slater (https://www.linkedin.com/in/doodlingdev/). If you would like to support the show, head over to our GitHub page (https://github.com/sponsors/thoughtbot), or check out our website (https://bikeshed.thoughtbot.com). Got a question or comment about the show? Why not write to our hosts: hosts@bikeshed.fm This has been a thoughtbot (https://thoughtbot.com/) podcast. Stay up to date by following us on social media - YouTube (https://www.youtube.com/@thoughtbot/streams) - LinkedIn (https://www.linkedin.com/company/150727/) - Mastodon (https://thoughtbot.social/@thoughtbot) - BlueSky (https://bsky.app/profile/thoughtbot.com) © 2025 thoughtbot, inc.

How OpenAI Built Its Coding Agent

a16z

Sep 16, 2025 80:12

OpenAI's Codex has already shipped hundreds of thousands of pull requests in its first month. But what is it really, and how will coding agents change the future of software?

In this episode, General Partner Anjney Midha goes behind the scenes with one of Codex's product leads, Alexander Embiricos, to unpack its origin story, why its PR success rate is so high, the safety challenges of autonomous agents, and what this all means for developers, students, and the future of coding.

Timecodes:
0:00 Intro: The Vision for AI Agents
1:25 Codex's Origin and Naming
3:20 Early Prototypes and Agent Form Factors
6:00 Cloud Agents: Safety and Security
9:40 Prompt Injection and Attack Vectors
12:00 PR Merging: Metrics and Transparency
17:00 The Future of Code Review and Automation
20:00 User Adoption: Internal vs. External Surprises
22:00 Multi-Turn Interactions and Product Learnings
29:30 Best-of-N, Slot Machine Analogy, and Creativity
33:00 Human Taste, Iteration, and Collaboration
40:00 AI's Impact on Software Engineering Careers
45:00 Education, CS Degrees, and AI Integration
49:00 Prototyping, Hackathons, and Speed to Magic
55:00 Legacy Code, Modernization, and Global Adoption
1:00:00 Enterprise, Security, and Air-Gapped Environments
1:05:00 Product Roadmap and Future of Codex
1:10:00 Advice for Founders and Startups
1:15:00 Education Reform and Project-Based Learning
1:20:00 Hiring, Building, and New Grad Advice

Resources:
Find Alex on X: https://x.com/embirico
Find Anjney on X: https://twitter.com/AnjneyMidha

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Startup Funding Espresso – How To Perform Technical Due Diligence

Investor Connect Podcast

Sep 16, 2025 2:01

Hello, this is Hall T. Martin with the Startup Funding Espresso -- your daily shot of startup funding and investing. Investors perform diligence on a startup before investing. Most of the diligence focuses on the financial aspects of the business. Technical diligence is just as important. For startups, it's also important to focus on the technical aspects. Here's a list of areas to review for technical due diligence:
- Architecture: Review the technical architecture for scalability and robustness. Check the architecture for fit with the application.
- Code: Review the code for quality and documentation. Are there processes for testing and verification?
- Security: Review the code for security measures. Perform penetration exercises to check its strength.
- People: Interview the technical team for their technical background and skills. See if the skills match the project requirements.
- Intellectual property: Review the intellectual property to see if the key technical features are covered.

Consider these steps in performing technical due diligence on a startup. Thank you for joining us for the Startup Funding Espresso where we help startups and investors connect for funding. Let's go startup something today.

For feedback, please contact info@tencapital.group. Please share and leave a review.

Conversations at the Intersection of AI and Code with Harjot Gill

Screaming in the Cloud

Sep 4, 2025 33:41

AI is rewriting the rules of code review and CodeRabbit is leading the charge. In this featured episode of Screaming in the Cloud, Harjot Gill shares with Corey Quinn how his team built the most-installed AI app on GitHub and GitLab, nailed positive unit economics, and turned code review into a powerful guardrail for the AI era.

Show Highlights
(0:00) Entrepreneurial Journey and CodeRabbit's Origin
(3:06) The Broken Nature of Code Reviews
(5:47) Developer Feedback and the Future of Code Review
(9:50) AI-Generated Code and the Code Review Burden
(11:46) Traditional Tools vs. AI in Code Review
(13:41) Keeping Up with State-of-the-Art Models
(16:16) Cloud Architecture and Google Cloud Run
(18:21) Context Engineering for Large Codebases
(20:52) Taming LLMs and Balancing Feedback
(22:30) Business Model and Open Source Strategy

About Harjot Gill
Harjot is the CEO of CodeRabbit, a leading AI-first developer tools company.

Links
Harjot on LinkedIn: https://www.linkedin.com/in/harjotsgill/

Sponsor
CodeRabbit: https://coderabbit.link/corey

#231 - Faster Code Reviews, Faster Code Shipping with Stacked PRs - Greg Foster

Tech Lead Journal

Sep 1, 2025 60:57

Are long code review cycles killing your engineering team's velocity? Learn how top engineering teams are shipping code faster without sacrificing quality.

In this episode, Greg Foster, CTO and co-founder of Graphite, discusses the evolution of code review practices, from the fundamentals of pull requests to the future of AI in code review workflows. He shares the secrets behind how the Graphite team became one of the most productive engineering teams by leveraging techniques like small code changes and stacked PRs (pull requests).

Key topics discussed:
- The evolution of code review from bug-hunting to knowledge sharing
- Best practices for PRs and why small PRs get better feedback
- How stacked PRs eliminate waiting time in development workflows
- The rise of AI in the code review process
- Why AI code review works best as an automated CI check
- How Graphite achieves P99 engineering productivity
- Hiring engineers in the age of AI-assisted coding

Timestamps:
(00:00) Trailer & Intro
(02:21) Career Turning Points
(05:11) Now is The Golden Time to Be in Software Engineering
(09:08) The Evolution of Code Review in Software Development
(14:59) The Popularity of Pull Request Workflow
(21:01) Pull Request Best Practices
(26:17) The Stacked PR and Its Benefits
(34:07) How Graphite Ships Code Remarkably Fast
(40:03) The Cool Things About AI Code Review
(45:23) Graphite's Unique Recipes for Engineering Productivity
(50:55) Hiring Engineers in the Age of AI
(55:31) 2 Tech Lead Wisdom

Greg Foster's Bio
Greg Foster is the CTO and co-founder of Graphite, an a16z and Anthropic-backed company helping teams like Snowflake, Figma, and Perplexity ship faster and scale AI-generated code with confidence. Prior to Graphite, Greg was a dev tools engineer at Airbnb. There, he experienced the impact of robust internal tooling on developer velocity and co-founded Graphite to bring powerful, AI-powered code review to every team. Greg holds a BS in Computer Science from Harvard University.

Follow Greg:
LinkedIn – linkedin.com/in/gregmfoster
X – x.com/gregmfoster
Email – greg@graphite.dev
Graphite – graphite.dev
Graphite X – x.com/withgraphite

Like this episode?
Show notes & transcript: techleadjournal.dev/episodes/231.
Follow @techleadjournal on LinkedIn, Twitter, and Instagram.
Buy me a coffee or become a patron.

678: AI Hype, Browser Journey, and Content Creation Incentives

ShopTalk » Podcast Feed

Aug 18, 2025 54:58

Show Description
Identifying where we are in the AI hype cycle, a quick #davegoeshairy update, what has been the impact of AI on tech creators, Chris is making his own CSS starter on stream, and Item flow / masonry discussions.

Links
- Introducing GPT-5 - YouTube
- Simon Willison on ai
- Orion Browser by Kagi
- VisBug
- Chrome Canary Features For Developers - Google Chrome
- Download Microsoft Edge
- Zen Browser
- Google Backtracks On Plans For URL Shortener Service
- Impact of AI on Tech Content Creators
- Pre-commit Hooks, requestAnimationFrame, Code Reviews, and More - Syntax #922
- CodePen Radio
- CSS Tools: Reset CSS
- Item Flow – Part 2: next steps for Masonry | WebKit

Cybersecurity Today Month In Review: August 9, 2025

Cyber Security Today

Aug 9, 2025 58:55

Cybersecurity Today: July Review - Massive Lawsuits, AI Warnings, and Major Breaches

In this episode of Cybersecurity Today: The Month in Review, host Jim Love and an expert panel, including David Shipley, Anton Livaja, and Tammy Harper, discuss the most significant cybersecurity stories from July. Key topics include the $380 million lawsuit between Clorox and Cognizant following a massive ransomware attack, the ongoing legal battle between Delta and CrowdStrike, and breached forums like XSS leading to significant law enforcement actions. The panel also dives into AI-related risks in software development, recent supply chain attacks, and legislative developments in Europe affecting cybersecurity. Watch to stay informed about the latest trends and challenges in the cybersecurity landscape.

00:00 Introduction and Panelist Introductions
01:28 Major Cybersecurity Lawsuits: Clorox vs. Cognizant and Delta vs. CrowdStrike
04:11 Reflections on Legal Implications and Industry Impact
13:01 Tammy Harper on XSS Forum Seizure
17:52 Law Enforcement Tactics and Dark Web Trust Issues
23:47 Anton Livaja on Supply Chain Attacks
30:18 AI Wiping Code and Backup Issues
31:18 Security Concerns with Model Control Protocol
31:56 Challenges with AI in Code Review
34:02 The Problem with AI-Generated Code
40:43 The SharePoint Apocalypse
43:36 Impact of Business Decisions on Technology
49:16 Final Thoughts and Upcoming Stories
49:25 Current and Upcoming Tech Legislation

Engineering in the AI Era: Qodo Founder on the AI-Powered SDLC

Founded and Funded

Jul 10, 2025 35:14

In this episode of Founded & Funded, Madrona Investor Rolanda Fu is joined by Dedy Kredo, the co-founder and chief product officer of QodoAI — formerly CodiumAI, a 2024 IA40 winner and one of the most exciting AI companies shaping the future of software development. Dedy and his co-founder, Itamar, are entrepreneurs who have spent their careers building for developers, and with Qodo, they're tackling one of the most frustrating problems in software engineering — testing and verifying code. As AI generates more code, the challenge shifts to ensuring quality, maintaining standards, and managing complexity across the entire software development lifecycle. In this conversation, Dedy and Rolanda discuss how Qodo's agentic architecture and deep code-based understanding are helping enterprises leverage AI speed while ensuring code integrity and governance. They get into what it takes to build enterprise-ready AI platforms, the strategy behind scaling from a developer-first approach to major enterprise partnerships, and how AI agents might reshape software engineering teams altogether. Transcript: https://www.madrona.com/engineering-ai-era-qodo-dedy-kredo-on-ai-powered-sdlc Chapters: (00:00) Introduction (01:12) The Future of AI in Software Development (01:58) Dedy's Journey in Tech (03:02) The Genesis of Qodo (03:53) Qodo's Unique Approach to AI Coding (05:13) Exploring Qodo's Product Features (06:42) Code Review and Verification (08:53) Customizing AI Agents (11:02) Vibe Coding and Code Review (13:27) Developer Love vs. Enterprise Needs (15:33) Enterprise Adoption (17:51) Future of Software Engineering (22:13) Balancing Developer Love and Enterprise Sales (24:05) Advice for Founders

CodeRabbit and RAG for Code Review with Harjot Gill

Software Engineering Daily

Jun 24, 2025 48:42

One of the most immediate and high-impact applications of LLMs has been in software development. The models can significantly accelerate code writing, but with that increased velocity comes a greater need for thoughtful, scalable approaches to code review. Integrating AI into the development workflow requires rethinking how to ensure quality, security, and maintainability at scale.

From English Literature to Cybersecurity: A Journey Through Blockchain and Security

Cyber Security Today

May 24, 2025 54:36

LINKS:
https://distrust.co/software.html - Software page with OSS software
Linux distro: https://codeberg.org/stagex/stagex
Milksad vulnerability: https://milksad.info/

In this episode of Cybersecurity Today on the Weekend, host Jim Love engages in a captivating discussion with Anton Livaja from Distrust. Anton shares his unique career transition from obtaining a BA in English literature at York University to delving into cybersecurity and tech. Anton recounts how he initially entered the tech field through a startup and quickly embraced programming and automation. The conversation covers Anton's interest in Bitcoin and blockchain technology, including the importance of stablecoins, and the frequent hacking incidents in the crypto space. Anton explains the intricacies of blockchain security, emphasizing the critical role of managing cryptographic keys. The dialogue also explores advanced security methodologies like full source bootstrapping and deterministic builds, and Anton elaborates on the significance of creating open-source software for enhanced security. As the discussion concludes, Anton highlights the need for continual curiosity, teamwork, and purpose-driven work in the cybersecurity field.

00:00 Introduction to Cybersecurity Today
00:17 Anton's Journey from Literature to Cybersecurity
01:08 First Foray into Programming and Automation
02:35 Blockchain and Its Real-World Applications
04:36 Security Challenges in Blockchain and Cryptocurrency
13:21 The Rise of Insider Threats and Social Engineering
16:40 Advanced Security Measures and Supply Chain Attacks
22:36 The Importance of Deterministic Builds and Full Source Bootstrapping
29:35 Making Open Source Software Accessible
31:29 Blockchain and Supply Chain Traceability
33:34 Ensuring Software Integrity and Security
38:20 The Role of AI in Code Review
40:37 The Milksad Incident
46:33 Introducing Distrust and Its Mission
52:23 Final Thoughts and Encouragement

Claude Code: Anthropic's CLI Agent

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

May 7, 2025 77:22

More info: https://docs.anthropic.com/en/docs/claude-code/overview

The AI coding wars have now split across four battlegrounds:
1. AI IDEs: with two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation) and a sea of competition behind them (like Cline, GitHub Copilot, etc).
2. Vibe coding platforms: Bolt.new, Lovable, v0, etc. all experiencing fast growth and getting to the tens of millions of revenue in months.
3. The teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results).
4. The CLI-based agents: after Aider's initial success, we are now seeing many other alternatives including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable 2) they are pay as you go based on tokens used.

Since we covered all three of the first categories, today's guests are Boris and Cat, the lead engineer and PM for Claude Code. If you only take one thing away from this episode, it's this piece from Boris: Claude Code is not a product as much as it's a Unix utility.

This fits very well with Anthropic's product principle: “do the simple thing first.” Whether it's the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning (“/think”) and memory (#tags in markdown) fit the same idea of having text I/O as the core interface. This is very similar to the original UNIX design philosophy.

Claude Code is also the most direct way to consume Sonnet for coding, rather than going through all the hidden prompting and optimization that the other products do. You will feel that right away, as the average spend per user is $6/day on Claude Code compared to $20/mo for Cursor, for example. Apparently, there are some engineers inside of Anthropic that have spent >$1,000 in one day!

If you're building AI developer tools, there's also a lot of alpha on how to design a CLI tool, interactive vs non-interactive modes, and how to balance feature creation. Enjoy!

Timestamps
[00:00:00] Intro
[00:01:59] Origins of Claude Code
[00:04:32] Anthropic's Product Philosophy
[00:07:38] What should go into Claude Code?
[00:09:26] Claude.md and Memory Simplification
[00:10:07] Claude Code vs Aider
[00:11:23] Parallel Workflows and Unix Utility Philosophy
[00:12:51] Cost considerations and pricing model
[00:14:51] Key Features Shipped Since Launch
[00:16:28] Claude Code writes 80% of Claude Code
[00:18:01] Custom Slash Commands and MCP Integration
[00:21:08] Terminal UX and Technical Stack
[00:27:11] Code Review and Semantic Linting
[00:28:33] Non-Interactive Mode and Automation
[00:36:09] Engineering Productivity Metrics
[00:37:47] Balancing Feature Creation and Maintenance
[00:41:59] Memory and the Future of Context
[00:50:10] Sandboxing, Branching, and Agent Planning
[01:01:43] Future roadmap
[01:11:00] Why Anthropic Excels at Developer Tools

Get full access to Latent.Space at www.latent.space/subscribe

LLMs for web developers with Roy Derks

PodRocket - A web development podcast from LogRocket

Mar 6, 2025 28:45

Roy Derks, Developer Experience at IBM, talks about the integration of Large Language Models (LLMs) in web development. We explore practical applications such as building agents, automating QA testing, and the evolving role of AI frameworks in software development. Links: https://www.linkedin.com/in/gethackteam https://www.youtube.com/@gethackteam https://x.com/gethackteam https://hackteam.io We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Emily, at emily.kochanekketner@logrocket.com, or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod). Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers! What does LogRocket do? LogRocket provides AI-first session replay and analytics that surface the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today (https://logrocket.com/signup/?pdr). Special Guest: Roy Derks.
