TestTalks | Automation Awesomeness | Helping YOU Succeed with Test Automation
AI coding tools promised to make development faster, and they delivered. But here's the problem nobody talks about enough: when you speed up coding, you don't eliminate the bottleneck in the SDLC. You just move it. And for most teams, it lands squarely in QA. In this episode, Joe sits down with Vilhelm von Ehrenheim, Co-founder and Chief AI Officer of QA.tech, to dig into how agentic AI is reshaping software testing from the ground up. Vilhelm brings serious ML credibility: he helped build Motherbrain, one of the earliest production LLM systems in venture capital, and he's now applying that experience to one of the hardest problems in software delivery: testing at AI development velocity. You'll learn how QA.tech's behavioral knowledge graph gives AI agents the context they need to actually understand your application, why validating user intent beats checking element identifiers every time, how autonomous agents can review PRs, reproduce bugs from Slack messages, and generate targeted tests without a single line of test code, and what the tester's role actually looks like when agents do the heavy lifting. If you're wondering whether your QA practice can survive the pace of AI-driven development, this one's required listening.
Josh chats with Sal Kimmich about the current state of everything, and what we can expect next. Sal has some incredible insight into what we can expect to see due to the current wave of security bugs and incidents. There are some new features we will need in both our hardware and software to ward off the state of things. Since those features are years away, what we need in the short term is shoring up our SDLC programs. Sal has some really good medical examples and analogies for this one. It's a huge problem but not insurmountable. The show notes and blog post for this episode can be found at https://opensourcesecurity.io/2026/2026-06-verification-sal-kimmich/
In this episode of Resilient Cyber, I sit down with Katie Norton, Research Manager for DevSecOps and Software Supply Chain Security at IDC, to unpack what application security looks like as AI moves from copilot to autonomous teammate across the software development lifecycle. We dive into:
Software Engineering Radio - The Podcast for Professional Software Developers
Dwayne McDaniel, developer advocate at GitGuardian.com, joins host Priyanka Raghavan to talk about the engineering challenges of secrets management. They explore what "secrets" really are in modern systems, far beyond passwords, including API keys, tokens, certificates, and machine identities, and how "secret sprawl" emerges across the SDLC. Drawing on reports from GitGuardian and Verizon, they discuss the growing scale of secret leaks and why credential abuse and phishing remain dominant attack vectors. They examine common leak points, from code repos and logs to CI/CD pipelines, containers, and SaaS integrations, and how cloud, DevOps, and AI tooling are amplifying risks. Priyanka quizzes Dwayne about recent supply chain attacks in the PyPI and Trivy ecosystems, highlighting recurring root causes like poor access control, long-lived credentials, and weak security hygiene. Finally, they consider detection, response, and modern solutions (short-lived credentials, secret scanning, and identity-based approaches like OWASP NHIR and SPIFFE/SPIRE), ending with practical advice for engineers to reduce blast radius and design for secure secret lifecycle management.
We showcase recordings from this year's RSAC. At RSAC Conference 2026, Scott Clinton, Co-Chair and co-founder of the OWASP GenAI Security Project, shares insights from the project's latest research, including new landscape guides and evolving approaches to securing generative and agentic AI systems. The conversation explores critical gaps in GenAI data security, the rise of AI-assisted development, and the immense growth of the OWASP community and sponsor ecosystem. Looking ahead, he outlines the most urgent risks and priorities shaping AI and agentic security in 2026. Then Merritt Maxim discusses how AI is affecting Identity and Access Management. Expect to hear this topic a lot throughout 2026, especially as the industry tries to figure out what's different or special about securing agent identities. We close with a chat with Janet Worthington about the impact of agents on the SDLC and how orgs are updating their controls to deal with code generated by humans and LLMs alike. Segment Resources: https://genai.owasp.org https://genai.owasp.org/resources/ https://www.scworld.com/podcast-episode/3905-keeping-up-with-the-owasp-genai-project-scott-clinton-asw-381 This segment is sponsored by The OWASP GenAI Security Project. Visit https://securityweekly.com/owasp to learn more about them! Visit https://www.securityweekly.com/asw for all the latest episodes! Show Notes: https://securityweekly.com/asw-384
Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!
This was recorded before Railway suffered a major GCP outage on May 19, despite being a multi-AZ, multi-zone mesh ring with HA fiber interconnects between their Metal, GCP, and AWS, because workload discoverability was unintentionally still tied to GCP. All has been resolved with a post-mortem.
Railway did not start as an AI infrastructure company.
It was founded in 2020, years before agents became the default way people thought about deploying software. Jake Cooper, formerly at Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Docker files, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts.
For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users, with Jake personally greeting every Discord signup on a second monitor.
Today, Railway has raised $124m and is growing very fast. A 35-person team supports 3 million users, adding roughly 100,000 signups a week. Their bare metal data centers have a 3-month payback period vs. renting in the cloud, with 70% margins funding aggressive cloud bursting when needed. The servers they own have actually appreciated in value as RAM prices have climbed, basically meaning the value of their hardware now exceeds the capital they've raised.
From rebuilding Railway's network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world.
In this episode, Railway's founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite.
We go deep on Railway's infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying.
We discuss:
* How Railway went from a slow six-year grind to adding 100,000 users a week
* How Railway thinks about agents as the next dominant software species
* Why agents need version control, observability, compute, storage, and orchestration at 1000x scale
* The economics of Railway's own-metal data centers and three-month payback
* How Railway uses cloud bursting while scaling its own infrastructure
* Why data center debt can be a better tool than venture debt for infra startups
* Central Station, Railway's internal system for clustering customer feedback and incidents
* Why responsible disclosure and over-communication matter for platforms
* Why feature flags, progressive rollouts, and shadow traffic are essential for agents
* Temporal's strengths, pain points, and why workflows matter for agents
* Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems
* Why “cattle, not pets” may change if you can clone the pets
* Why Railway is building a new cloud from scratch instead of copying hyperscalers
* The solo founder path, focus, writing, and how Jake thinks about company building
Railway:
* Website: https://railway.com/
* X: 
https://x.com/Railway
Jake Cooper:
* LinkedIn: https://www.linkedin.com/in/thejakecooper/
* X: https://x.com/JustJake
Timestamps
00:00:00 Introduction: What Is Railway?
00:02:07 Jake's Path to Railway
00:06:13 Railway's Six-Year Growth Story
00:08:52 Rebuilding the Business After the Free Tier
00:11:17 Agents as the Next Software Platform
00:13:29 Railway's Infrastructure Philosophy
00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch
00:17:22 Cloud Bursting and Five-Cloud Networking
00:20:20 Data Center Debt and Infra Financing
00:23:31 Data Centers in Space
00:25:24 What Agents Need From Infrastructure
00:28:24 CLIs, Canvas, and Agent-Native UX
00:35:15 Central Station, Incidents, and Responsible Disclosure
00:40:30 Safe Rollouts, SRE Agents, and Production Forks
00:45:00 AI SRE, Specs, Code, and Tests
00:48:24 Self-Replicating Infrastructure and the New Serverless
00:53:18 Heroku, Temporal, and Workflow Engines
01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems
01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration
01:10:56 The Pull Request Is Dying
01:12:28 Feature Flags and the Agent-Era SDLC
01:16:15 Cattle, Pets, and Cloning Machines
01:19:29 Solo Founder Lessons
01:24:12 Focus, GPUs, and Building a New Cloud
01:28:20 Closing Thoughts
Transcript
Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swyx, editor of Latent Space.
Swyx [00:00:10]: Hey, hey, hey. Today we're in the studio with Jake Cooper of Railway.
Alessio [00:00:14]: Conductor of Railway.
Swyx [00:00:15]: Conductor at Railway. Yeah.
Alessio [00:00:16]: Choo-choo.
Swyx [00:00:17]: Do you actually have that anywhere, like on your business card?
Jake [00:00:20]: We call some of our volunteer moderators conductors. I don't have a business card. We're not that big yet. At some point I will. 
I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.”
Swyx [00:00:30]: Business cards are coming back.
Jake [00:00:32]: They're cool. They're hip. The conductor thing is good. We're trying to figure out what we want to call each other internally. Some people think it's super cringe and say, “You don't need a name for people internally.” Some people want to call each other something. We still don't have a really good one.
Jake [00:00:55]: We've got New Railcrews, Trainiacs. Nothing has stuck yet.
Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don't know, what is Railway? Let's give people a crisp definition up front.
Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you're off to the races.
Swyx [00:01:22]: You've got a nice animation on the landing page.
Jake [00:01:24]: Thank you. None of my work, by the way. They don't let me touch the design stuff anymore.
Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment.
The Railway Origin Story: From Uber Systems to a New Cloud
Swyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway?
Jake [00:02:21]: It was curiosity to keep going deeper. 
I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal.
Swyx [00:02:44]: Which, by the way, I'm happy to talk about, pros and cons.
Jake [00:02:48]: Totally.
Swyx [00:02:51]: But let's do the Railway story.
Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it's walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don't care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience.
Jake [00:03:17]: I don't have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That's what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be.
Swyx [00:03:49]: Other patches to the Linux kernel this week?
Jake [00:03:51]: Yeah. Not upstream. Our fork.
Swyx [00:03:52]: That's a flex. Railpack? No, this is different. This is the OS on top of Railpack?
Jake [00:03:57]: No, this is an actual kernel patch. It's always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable.
Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases?
Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we're building for some of the agentic stuff. 
Maybe it'll be useful upstream, but it's deeply useful for us internally.
Open Source, Forks, and Non-Deterministic Versioning
Swyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it?
Jake [00:04:38]: GitHub's original sin is that it's almost a series of broken pointers. You have this thing, then you clone it, and now you've lost the whole upstream. How do we make it trivial for people to modify really small pieces of it?
Jake [00:04:51]: We think of Git in a discrete sense: I've either made a change and merged upstream, or I haven't. What would it look like if it were percentage-based, a little more non-deterministic, or a stream of changes that users traverse as a percentage rolled out in general and then rolled all the way up?
Jake [00:05:13]: We have the open-source kickback program and let you deploy templates because we want to make it trivial for people to version these shards over time. It solves a large problem around authentication, authorization, and security. NPM has a way to define, “Don't take any new packages.” The ideal end state is that you roll out progressively to users with the minimum impact zone and continue rolling up. JPMorgan should probably be the last one on the patch line, for all our sakes, because our money and livelihoods are there.
Jake [00:05:53]: It's okay if Johnny Vibe Coder gets a broken patch because there's so much entropy in the system that the rubber has to meet the road at some point. You have to test at varying levels.
The Long Grind: First Users, Free Tier, and Making the Business Work
Swyx [00:06:13]: I wanted to pull up this glorious chart, which is your usage or number of daily signups?
Jake [00:06:22]: Daily signups, I think.
Swyx [00:06:24]: You started six years ago. It was a slow grind, and now you're on a rocket ship. 
You say, “Don't doubt your fight and don't quit.” Maybe pick out certain points that were key inflections for the company.
Jake [00:06:40]: At the start, it's about getting your first 100 users, hell or high water. We had a website and a support link. The support link was the Discord channel. I had notifications on with two monitors: the monitor I was working on and the other monitor with Discord. If anybody came in, I was immediately like, “Hey, how's it going?” It was rare, so getting those first 100 users to come back was the start.
Jake [00:07:14]: Then you build a consultancy factory because users want all these things. You have to go back to the board and ask, “What is the actual product offering I want to build on top of this?”
Jake [00:07:28]: VCs want charts that always go up and to the right, but in reality you don't necessarily want charts that look like that. For us, there have been periods of expansion where we add features to test use cases, and periods of compaction where we ask, “If the experience we have is good, how do we make it significantly better?” Maybe we strip out features that don't fit our ICP anymore.
Jake [00:07:57]: The boom from 2022 to 2023 came from the free tier. Everybody under the sun was using it.
Swyx [00:08:09]: A lot of Reddit bots and Discord bots.
Jake [00:08:12]: And crypto miners. When you build an open product on the internet where anybody can sign up, the internet is a horrible place with so many things. You go through periods of asking, “How do I reach as many people as possible?” Then, “How do I fit the exact use case for the people who really matter and are really excited about this specific thing?”
Jake [00:08:39]: Then there was a two-year period of making the actual business work. During the free-tier era, we were losing about half a million dollars a month.
Swyx [00:08:59]: On a $20 million bank account.
Jake [00:09:02]: On a $20 million bank account with maybe $50,000 a month in revenue. That's a horrible business. 
I don't know how anybody invested. But you have to go through it and say, “We have an experience people love, but the business has to work.”
Jake [00:09:17]: There are two schools of thought. You can run the horrible business all the way up with bad margins, or you can go back and make it work. We've always wanted a super lean team. We're 35 people right now. It's very small.
Swyx [00:09:36]: Supporting three million already?
Jake [00:09:38]: Yeah. We're adding 100,000 users a week right now, so it's growing fast. We don't want to add headcount for the sake of headcount or throw bodies at problems. We want to build systems. It's hard to build systems during expansion because you're adding things to the system because people are asking for them or things are breaking.
Jake [00:10:00]: We had to cut off the free users for a little while, rebuild the business, and make sure it worked. We want to reach as many people as possible because software is important. It's become difficult to create things in the physical world, so it's important to make it easy for people to build in the virtual world and have access to creation. But there are legs to that journey.
Jake [00:10:30]: You can see divots in the charts. If you follow between 2025 and 2026, it's either summer or winter. People go on holiday with family.
Swyx [00:10:50]: It affects that much?
Jake [00:10:51]: Yeah. It's kind of B2C and kind of B2B. People are shipping constantly, then they stop. Our activation curve now shows more people activating on weekdays because we have more business users, so it smooths out over time.
Agents as the New Interface to Deployment
Swyx [00:11:17]: Was there a point where you started prioritizing AI development or agent development?
Jake [00:11:24]: We've prioritized agentic as a top-of-funnel thing. 
Over the last six months, we've deeply prioritized agentic as a mechanism to build and deploy things because we believe the curve is so steep and that is how people will build and deploy software.
Jake [00:11:42]: It almost fundamentally doesn't matter whether this is dot-com or not because we're all on the internet anyway. If agents are going to deploy a bunch of things and we hit an inference wall at some point, we'll fix those problems. The dominant species over the next 10 years is that we've moved from assembly to C to C++ to JavaScript to words. You're going to need to close that loop.
Swyx [00:12:13]: When you say this is dot-com, did you mean buying the domain, or the general case?
Jake [00:12:17]: I mean the dot-com era, when companies had a huge run-up because people understood the internet was important. Then they hit bottlenecks, fundamental laws of physics, math didn't work, and everybody came back down to earth. But it didn't matter because the internet became so impactful. If you operate on a long enough time horizon, you should build these things anyway because you can see where it's going.
Jake [00:12:45]: That's where I think a lot of agent stuff is. You get to a point where you're running thousands of agents in parallel. What is the inference cost? What is the compute cost? How do you make that efficient? How do you coordinate all this? We have issues coordinating humans; we don't even have good tooling for that. Now we have to figure out how to get agents to coordinate, safely version changes, and know when to raise their hand for someone to intervene. Otherwise it becomes an interrupt factory.
Railway's Infrastructure Thesis: Network, Compute, Storage, and Metal
Swyx [00:13:19]: Let's go right into the technical side. What are the core infrastructure or architectural beliefs of Railway that allow you to do what you do?
Jake [00:13:29]: The primitives matter a lot for us. We need network, compute, storage, and orchestration around it. 
You need control over a lot of those things. We've talked a lot about how we don't really use Kubernetes because we want higher-order control to place workloads in very specific places.
Jake [00:13:48]: The reason is that you have to be very efficient with agents: memory reuse and all these other things, or you're going to massively blow up your cost structure. Being able to rack and stack your own servers and build your own metal unlocks performance and cost. Experiences where you're running 1,000 agents in parallel are not massively cost prohibitive.
Jake [00:14:13]: Token use and compute use are blowing up. Over time, those things have to get a lot more efficient. You can get a lot of margin to make those experiences solid by building your own metal. That's all in service of offering a differentiated experience to as many people as humanly possible.
Swyx [00:14:51]: You have a data center in Singapore.
Jake [00:14:53]: Yeah. We have two in every other region now. In Singapore, we're adding a second one in Q3.
Swyx [00:14:58]: What's it like? I've never built a data center. Do you go to Equinix and say, “I want some slots?”
Jake [00:15:05]: Yeah. Equinix. You basically go and say, “I want power and I want a cage.” They say, “Great, here's what it's going to be.” You rent the cage for a period of time, fill it with racks and servers, and hook up internet to it. That's all the pieces.
Swyx [00:15:36]: Then you handle everything else.
Jake [00:15:37]: You handle everything else.
Swyx [00:15:39]: What's the math versus clouds doing it for you?
Jake [00:15:43]: If we rented in the cloud, our payback period when we go to metal is about three months.
Swyx [00:15:50]: Which is crazy.
Jake [00:15:51]: It's nuts. That's four years of depreciated hardware. You're going to see a lot of this compute crunch because hyperscalers are buying up a lot of stuff. 
We're working directly with OEMs, resellers, and people building these machines: Supermicro, Dell, and others.
Jake [00:16:11]: Upstream, there's a bunch of supply pressure. When we raised our last round, between deploying capital for servers and now, the amount of money we've raised is less than the amount of money we have in the bank plus the value of the servers because the servers have appreciated as RAM has gone up. It's nuts how valuable hardware has become.
Jake [00:16:50]: If you look at hyperscalers, they deployed around $80 billion of capital expenditures this year, and next year will be more. That's a massive infrastructure build-out. You look at that and think it's crazy that they're spending way more than the Manhattan Project. But if every person is going to run dozens or hundreds of agents in parallel, you have no conceptual idea how much compute is required to make that experience happen, even if you're deeply efficient and sharing resources. And that doesn't even count inference.
Swyx [00:17:22]: How do you plan the build-out? The growth chart is so vertical. Are you usually at 100% utilization as soon as racks are live? How far ahead are you planning?
Jake [00:17:33]: We still maintain cloud presence for bursting. We work with AWS, GCP, and a few other clouds. We can rent, and then the moment we get space or power, we compact those workloads off the cloud. We started on the clouds, then built a system to migrate to our own metal. There's nothing that says you can't continually do that again, and that's exactly what we do. We never want to be compute constrained.
Jake [00:18:09]: At the start of the year, we actually became compute constrained because one upstream provider wasn't able to give us quota at the rate we needed, and the hardware was slower. I spent a weekend rebuilding our entire network overlay so we could straddle five clouds: Oracle, AWS, ourselves, GCP, and one other one. 
We can do more than that now.
Jake [00:18:38]: We got into a spot where we were trying to pack instances tight because we couldn't get enough compute. That led to a few reliability issues, which are now past us. I made a tweet pointing out that it's becoming harder and harder to acquire compute at the rate these models need to acquire compute. We got bit by it.
Swyx [00:19:15]: How do you think about pricing knowing you might not have your own metal available at all times? Are you pricing assuming you need extra margin if you end up going into the cloud?
Jake [00:19:26]: Because we've built out our metal data centers, our margins on metal are around 70%. We can deeply subsidize the cloud business if we want to scale at a reasonable rate. We have a few levers: metal, which makes the margins; cloud burst; debt to buy servers; and venture capital. It's an interesting operational problem: how much cash do we have, how much should we raise, how quickly can we deploy it, and can we scale revenue as quickly as we scale compute?
Jake [00:20:05]: If we continue making it trivially easy for people to build and deploy, then the faster we close that loop and the more operationally excellent we are with capital, the faster the business can scale. It's almost a straight linear deployment rate.
Financing Infrastructure: Hardware Debt, VC, and Operational Leverage
Swyx [00:20:20]: I think infra startups raising debt is a tool people don't utilize enough or know enough about. What can you tell us about that? Is it secured against your CPUs?
Jake [00:20:32]: It's secured against our hardware.
Swyx [00:20:37]: What rates do you get? Who are the lenders?
Jake [00:20:39]: We pay prime plus a spread, and we can refinance any of the debt as rates go down. The terms are pretty good. The unfortunate thing is that Twitter has no nuance, so people say, “Venture debt bad.” But as with all things, there are specific tools and areas where you can be deliberate instead of using one tool as a hammer. 
Venture capital is not the hammer for everything. You have to explore and figure out what works.
Swyx [00:21:12]: VC is usually the most expensive financing you can get.
Jake [00:21:15]: Yeah. I also think people think about VC incorrectly from a capital-raising perspective. Most people think, “How do I raise as much money as possible from whoever is probably the best I can get at that time?” That's close to right, but what we've tried to do is figure out what unfair advantage we can buy with that equity.
Jake [00:21:34]: It's the most expensive equity you're going to give away at that point in time, assuming the company keeps getting better. How do you use it to work with someone stellar who complements you? In the seed stage, I had never started a company. Ray Tonsing had good advice, and I could text him all the time. He was really fast. Awesome.
Jake [00:22:01]: Then with John and Erica at Unusual, they said, “You roughly know what you're doing building a product. We'll mostly leave you alone and be available for advice.” Amazing. Then we got to Series A and the business was an operational tire fire because we didn't know how to scale a business. Work with Erica, and Jordan is over at Redpoint, so bonus.
Jake [00:22:28]: Now we've raised from TQ and FPV as we're moving into enterprises. Every step of the way, we've asked: who can we partner with at this specific time to unlock the next section of the journey? I don't know enterprise sales. As an engineer, I can eyeball what features we might need, and we have wonderful people internally who can help. But you want boardroom dynamics where everyone is aligned and asking, “How do we win this?” instead of bickering about strategy.
Data Centers in Space and the Physics of Compute
Swyx [00:23:31]: You had a tweet about data centers in space. Why no data centers in space?
Jake [00:23:37]: It's not “no data centers in space.” My hot take is that I think it is solvable. 
I've just never seen anybody solve it.

Swyx [00:23:49]: You said, “How are you going to dissipate that much heat in a vacuum?” You're making a physics claim.

Jake [00:23:55]: I haven't seen anybody prove how you're going to dissipate that much heat in a vacuum. It doesn't mean it's not possible. It just means nobody has brought it up yet.

Swyx [00:24:05]: Astrophage.

Jake [00:24:06]: I don't know what that is.

Swyx [00:24:07]: The Martian thing. Okay, you're very logical.

Jake [00:24:09]: It could work. A lot of people are putting the cart before the horse. They say, “We're going to put data centers in space.” Okay, but how? “We have time to figure it out.” It's like in The Martian where they ask how they're going to intercept something and say, “We'll figure it out.”

Swyx [00:24:36]: Making a bet on human invention is weird because you blindly trust that it can be solved. But with physics, there are first-principles bounds you can put on it. Maybe not. Maybe you're asking to travel through time or break a fundamental thermodynamic law.

Jake [00:24:57]: I don't know how VCs do this either. How do you know what's not possible and a grift versus what's possible but sounds completely insane? “We're going to put data centers in space.” Coin flip as to which it is, and I guess you'll know in 10 years. That's one cycle.

What Agents Need: Versioning, Observability, and 1,000x Scale

Swyx [00:25:23]: Moving back to agents. The branching, fast spin-up, and orchestration you do feels like pre-work that happened to be exactly what agents want. What do agents want differently than humans?

Jake [00:25:37]: They want the ability to version things. It's not that different; it materializes slightly differently. Agents want a way to test changes incrementally. Engineers have feature flags. Is there a reason agents can't use feature flags? I don't think so.

Jake [00:25:54]: They want version control. Can we use Git or not Git? That one is up in the air.
I think something outside Git will emerge for how we version these things over time. They need observability. You need to query what happened, when it happened, which steps failed, traces, logs, metrics, and all the rest. They need network, compute, and storage. They need to write files, save files, iterate on files, and snapshot file systems.

Jake [00:26:25]: A lot of what humans needed is in line with what agents need. Branching and forking are not different; we're just moving 1,000 times quicker. It can look like you need something massively different, but what you need is something massively better than what existed. You need orchestration massively better than Kubernetes. You need networking probably better than Envoy. It goes all the way down the stack.

Jake [00:26:55]: If the workload profile doesn't change so much as it gets massively compressed because you need thousands of these things, what assumptions change? etcd is going to melt. You need to replace it with something. You can go all the way down the stack and say, “That part has to change, that part has to change, and that part has to change.”

Jake [00:27:19]: The interesting thing about the super-exponential curve is that you have to build systems where you can rip out those parts at any time because a new bottleneck might emerge. You get good at parallel agents, and a different part of the system breaks. So it's similar to what humans needed, but at 1,000x scale.

Jake [00:27:55]: How do you do code review in the age of agents?

Swyx [00:28:00]: You throw more agents at it.

Jake [00:28:01]: You don't. But then who reviews for CVEs and all these other things?

Swyx [00:28:07]: More agents.

Jake [00:28:08]: And that's how we hit the inference wall. You can continually throw agents at the problem, but I think there's a limit to the number of agents you can throw at a problem.

CLI, Agent Handles, and Closing the Loop

Swyx [00:28:24]: You already had a CLI before it was cool.
How is the shape of what you're exposing changing, if at all?

Jake [00:28:28]: CLIs have always been cool. The CLI changes because we think about how to give Claude, Codex, ChatGPT, or any model a handhold.

Jake [00:28:50]: A CLI is a single command: deploy, get logs, and so on. Things that were prohibitively annoying to humans are not annoying to agents. They're nice. If I handed you a CLI with 40 arguments and 600 flags, you'd think, “I'm never going to use all of this.” But if you hand it to an agent, it says, “This is excellent. I have so many handles to work with.”

Jake [00:29:24]: If you're going to expose things to agents that way, you want as many handles as possible where they can get information, query dynamic information, and close the loop quickly. Most problems right now are about how to close the loop as quickly as possible. Where does the agent get stuck, and how can you remove that?

Jake [00:29:49]: Telemetry is important. If you can tell where the agent gets stuck from the CLI and say, “12% of people deviate from the happy path because of this, and now I add this argument and drive it down to 2%,” you massively increase the rate of loop closure.

Jake [00:30:03]: That's how we think about not just the CLI, but every point in the dashboard. It's a user journey: I hear about Railway. I get something deployed. I get my first green build or aha moment. I see an endpoint, logs, whatever. Then I iterate. The iteration loop is indefinite. The user wants to deploy a new thing, a Postgres instance, change code, and keep iterating.

Jake [00:30:36]: If you focus on the iteration loops and what's blocking them from closing quickly, one thing we say internally is: you never want to be waiting on compute anymore. You always want to be waiting on intelligence.
If you're waiting on compute, there's a bottleneck that needs to be destroyed because eventually that bottleneck becomes so large that another workflow emerges to change it.

Jake [00:31:04]: We've built a product where you push code, build it, and so on. But I fundamentally believe the push-pull loop is going away. We'll get to a point where you make a small change in production, that change is versioned across your infrastructure, you're working alongside copy-on-write versions of your database and infrastructure, and then you merge it in and it's instantaneously live. That's the holy grail of loops. The push-pull-rebuild thing is a point of friction that we're removing entirely.

Canvas as Output: Dashboards, Context Anchors, and Hyperstructures

Swyx [00:31:43]: It's incredibly fast. If anyone hasn't tried it, that fast feedback is great. My hot take is that Railway was famous for its canvas, which visualizes your infrastructure and lets you manipulate it visually. But that was for humans. For the next phase of growth, Railway CLI is more important than canvas.

Jake [00:32:05]: The canvas is funny because it's a mechanism to show changes over time. You're right that previously we used it a lot as an input. Moving forward, its goal is more like an output. You would go to the canvas, make changes, see them, and watch your infrastructure evolve. Now agents have access to the CLI and can make those changes. So the canvas becomes an output: what information does the human need at this moment to make suitable decisions about control requests? Do I approve this or not?

Jake [00:32:57]: It also has to be an anchor for your context, a port in the storm. Think of it like layers in a file system. You start with a project, then drill down into services, then into a function or code, because you want to represent the entire thing not just in your head, but in the canvas.
Other people can share that representation, think on the same wavelength, and move quickly.

Jake [00:33:33]: A lot of organizations get in trouble as they scale because all the context lives in someone's head. “How does this microservice work?” “I have no idea; go ask this person.” Then you have whole categories of products built around context discovery. A lot of that melts away if you have a solid hierarchy and can infinitely nest services, code, context, and everything else all the way down. That's what lets you build these structures over time.

Jake [00:34:18]: It's also what lets us build what I've called hyperstructures: things that are way bigger. You look at the Golden Gate Bridge and ask, “How did we build that?” There's a meme that we lost the technology. To some extent, yes, because the coordination that built those things evolved and changed. We lost some of the art of building structure as we jammed everything into Slack.

Swyx [00:34:52]: But you jam everything in Discord.

Jake [00:34:53]: Same point. It doesn't matter. It's message passing and interrupts, message passing and interrupts.

Swyx [00:35:00]: So you're arguing there should be something better and more structured than Slack?

Jake [00:35:04]: Yeah. For sure. I think Slack is awful, and Discord is awful too.

Central Station: Context Routing, Support, and Incident Clusters

Swyx [00:35:09]: This is the equivalent of my mom test. What have you done that has your solution to this?

Jake [00:35:15]: Internally, we've built a tool called Central Station that aggregates all the context from our users. Every piece of feedback, every customer support item, everything gets aggregated into clusters. If an incident is brewing, we can determine how many users are affected and break off a discussion based on that.

Jake [00:35:40]: That is more helpful than long-running channels where you're trying to decide which channel to put something in.
If you can dynamically aggregate information and dynamically route it to the right person based on context, it works better. We know internally that these four people are close to networking. If we see a networking thing, we can drill it down to those four people. If it's with this part, we can look at the commits. This is no longer a manual process internally.

Jake [00:36:13]: If you go to station or help.railway.com, that's why we built it. We wanted to scale with a massive amount of leverage by aggregating feedback.

Swyx [00:36:27]: This is built in-house?

Jake [00:36:28]: Yep.

Swyx [00:36:29]: I remember helping out on this one with Angelo in 2023. You scale a lot with a very small team.

Jake [00:36:38]: Yeah. We're about 10 times bigger now.

Swyx [00:36:40]: You have your full developer code here? Very cool.

Jake [00:36:44]: If you go to railway.com/stats, we expose this as a pub-sub-able thing. It's all real-time metrics. There's a way to get it as JSON somewhere if you care.

Jake [00:37:01]: We're big on trying to build everything in public and talk about what we're working on. We've had issues in the past, and we'll say, “Here's how we're fixing these things.” We've gotten compliments and flak for incident reports. We're always trying to make them better and talk with people.

Incidents, Disclosure, and Progressive Rollouts

Swyx [00:37:20]: You had a big one recently. I liked that it was scoped to 3,000. You presumably used Central Station. Talk through what happened and how you address it internally as a team.

Jake [00:37:38]: Internally, this one really sucked. It had to do with an upstream provider that didn't behave the way its documentation said it would, which is unfortunate given they wrote the RFC for how the behavior should work. We rolled those things out, and Central Station caught it initially when a couple of users said caches weren't invalidating.
We turned it off immediately.

Jake [00:38:03]: When you roll out to a large user base of three million people, you get a lot of disparate behaviors. We tested in staging and had tests, but we hit an edge case. We've hardened those systems, and now we can make that better. But it was a tough one.

Swyx [00:38:39]: I always wonder how private disclosure is supposed to work if people find an issue. Are they supposed to contact you first? When you run a platform, these things will happen. What channels should people pursue to quietly resolve it before it becomes a bigger incident?

Jake [00:38:59]: There's responsible disclosure. We err on the side of over-disclosing and letting you know something is wrong versus having your provider gaslight you. We've erred on sharing those things more publicly, even if they impact a small subset of users. That's a decision we've made internally. We have four values. One is honor. The honorable thing is to notify people to the widest degree at which they may have been affected or there was an issue, and then confront it head-on: why did it happen, what can we do better?

Swyx [00:39:45]: Not the whole user base. That's because of incremental rollouts and other things?

Jake [00:39:50]: Yeah. Progressive rollouts.

Swyx [00:39:54]: That should be the norm at all large platforms.

Jake [00:39:58]: It should. A variety of companies do this. There's the quote that Meta runs 10,000 different versions of Meta. To our earlier point about agents, they need the same thing. They need shadow traffic and all these other things. We've built so much ceremony around production being sacred that we need to make it trivially easy to test different behaviors in a safe environment. Then you can make mistakes in a safe environment.

Safe AI SRE: Customer Agents, Forked Environments, and Production Parity

Alessio [00:40:30]: Do you see a world where these things get automatically caught, not necessarily by your agent, but by your customer's agent?
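
As an illustration of the progressive rollouts discussed above, here is a minimal sketch of how a change could be limited to a stable slice of users, for example roughly 3,000 out of three million (0.1%). It is not Railway's implementation: the stage table, the function names, and the `new-cache` feature name are all hypothetical, and hashing user IDs into buckets is just one common way to do this.

```python
import hashlib

# Hypothetical stage table: how many buckets out of 100,000 see the change.
# 100 / 100,000 = 0.1%, which is about 3,000 users out of three million.
STAGE_THRESHOLDS = [100, 1_000, 10_000, 100_000]
BUCKETS = 100_000


def bucket(user_id: str, feature: str) -> int:
    """Map a (feature, user) pair to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enabled(user_id: str, feature: str, stage: int) -> bool:
    """A user sees the feature once their bucket falls under the stage threshold.

    Thresholds only grow, so a user enabled at an early stage stays enabled
    at every later stage, and rolling back is just lowering the stage.
    """
    return bucket(user_id, feature) < STAGE_THRESHOLDS[stage]


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20_000)]
    for stage in range(len(STAGE_THRESHOLDS)):
        exposed = sum(is_enabled(u, "new-cache", stage) for u in users)
        print(f"stage {stage}: {exposed} of {len(users)} users exposed")
```

Because the bucket depends only on the feature name and the user ID, the same users stay in the early cohort across deploys, which makes it possible to say exactly who was affected if something goes wrong.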
The cache invalidation issue seems easy to check if you know to look for it.

Jake [00:40:44]: It's hard because to determine it, we almost need to hook into your observability infrastructure. That's why we have the template loop on the platform: so you can roll things out progressively. You can roll out to Johnny Vibe Coder initially, or push a shard that someone consumes at their own leisure. Or you can roll it out over weeks: 0.1% of people, 1% of people, early adopters, then all the way up. That's the non-deterministic version control we talked about earlier.

Jake [00:41:30]: I believe that's where most things should go, because most companies end up building staged rollout systems in-house. It's the same thing built again and again at every company. There's a massive opportunity to consolidate developer debt.

Alessio [00:41:45]: You should have a free tier. Model providers give free tokens if you let them use the data. You could give free compute if someone is the number-one shard that goes out and lets you plug into their observability.

Jake [00:41:55]: We do that. That's why we talked about the impact on 3,000 people. We start with lower-impact people. Larger companies on the platform are last to receive those rollouts so they have a version of the platform that's deeply stable.

Alessio [00:42:16]: I have three services, so I'm sure I get the first rollout. You can nuke my thing at any time. There are all these SRE agent companies. Observability people also want agents that fix upstream problems. You have your own agent in the canvas now. How do you see that playing out?

Jake [00:42:39]: It's the stacking entropy problem. If you don't have primitives to make iteration in production safe, it becomes difficult. If you're an observability provider saying, “Here's the fix to this error,” assume 80% are good and make sense.
But in the last 20% long tail of complex issues, if you let somebody stamp it, you create an opportunity for an incident.

Jake [00:43:08]: That's why forked environments are important. People have staging, but it always drifts from production. You need primitives, workflows, and experience built first-party on the platform so you can fork any service at any point in time.

Jake [00:43:33]: I think of the canvas as a sheet of transparency paper. The agent is a little guy you push up into the canvas. It should say, “I need to copy that service and that service so I can test these two things.” It gets a read-only copy of production. Anything that's PII gets marked as a transform when we clone the database, create a copy-on-write version, or read from it. Then the agent makes changes and asks, “Does this actually work?” as close to production as possible.

Jake [00:44:22]: That's how close you have to be, or you get massive drift. The system becomes unstable. You see this with massive systems built on Docker for local, Kubernetes for production, and a specific thing for something else. That complexity slows developers and becomes unstable at scale, making it hard to iterate. We want to compress that way down and say, “As close to prod as possible is where we want to be.”

From AISRE Skeptic to Agent Believer

Swyx [00:45:00]: I was texting Erica for questions, and she says you were originally not a believer in AISRE. Have you come around on it?

Jake [00:45:10]: I flipped, but I'm still not a believer in AISRE if you don't have the primitives to make it safe. If you unleash AISRE on production infrastructure without safe primitives for copying volumes and making sure things are fine, it's going to nuke your production database. It's not a matter of if, but when. I'm a big believer in making those loops safe.

Jake [00:45:33]: I was a deep AI skeptic until 2023.
In 2024, I thought, “Maybe I can roughly make this thing do it.” In 2025, I thought, “Now I can hold this.” Over winter break, everybody came back saying, “It's almost impossible to hold this.”

Swyx [00:46:01]: Did you see this on the Claude docs? Clawdbot? OpenClaw?

Jake [00:46:06]: It's gotten to a point where it's harder to hold it wrong than to hold it right. There's a scene in Avengers where Vision picks up Thor's hammer and says it's terribly well-balanced. It self-balances and works well. I'm a deep believer at this point that this will be the dominant species: assembly, C, C++, JavaScript, words.

Swyx [00:46:35]: It feels like a big jump.

Jake [00:46:37]: It is. But it's not like you abandon CPU-based discrete logic and move straight to fuzzy logic. You need both. Your skills should call code or applications or some static structure. You can use skills to distill what the procedure should be or how the code should act.

Jake [00:47:02]: I'm coming to a thesis: you need three points. You need a clear spec defining the system, the code, and the tests. When you say it out loud, if you've been in engineering long enough, you're like, “Of course. That's an RFC, tests, and code.” But they all matter. Having them together lets them reinforce each other: the spec and tests match, but the code doesn't, so reconcile it. Or the tests and code match but the spec doesn't, so reconcile that. That's the iteration loop.

Jake [00:47:41]: That's why you're seeing people talk about software factories, docs, and reconciliation. Some of that is architectural astronomy if you don't implement it, but that loop is where most things will end up.

Swyx [00:48:07]: For listeners, we've been talking about this on the pod for three years: the holy trinity of specs and tests. Itamar Friedman from Qodo is the reference if people want to look it up.

Self-Modifying Infrastructure and the End of Push-Pull-Rebuild

Swyx [00:48:18]: One thing I want to mention on the OpenClaw idea is self-modification.
I don't know how Railway would support it, but I have my OpenClaw, and I just tell it it has the Railway CLI and can do whatever. In theory, whatever capabilities or new infra it needs, it can call the Railway CLI, provision it, and add it to itself. The agent can modify its own infra.

Jake [00:48:45]: It's nuts. I have a loop set up where you put the Railway CLI on top of something that runs on Railway. You're authenticated as whatever the current box is, and you can make any changes to it. Then you call Railway deploy, and it deploys itself.

Jake [00:49:04]: It's like: “I need to spin up this instance of this environment. I already exist in this environment. Excellent, I have access to a Postgres instance now.” That's where we want to go with agentic, self-replicating infrastructure. That's your loop: iterate in production. You continue making changes. If it works, merge it upstream. If it doesn't, throw it away.

Jake [00:49:37]: How do you make throwaway copies trivial to spin up and super cheap? The era of “I have an AWS instance with four vCPU and 16 gigs of RAM” is going to get destroyed. If you do that for agents, you need a thousand of those machines. It's prohibitively expensive compared with what we've spent a ton of time figuring out: the atomic unit of deploy, whether you call it isolates, sandboxes, or something else. Only pay for what you use, spin up instantaneously, and close the loop as quickly as possible.

Jake [00:50:15]: If the system can self-replicate safely and say, “This is my environment, I'm making these changes,” it can come back with, “Does this look good? This is a new state of infrastructure given this prompt. I think I've solved it.” Then you go back and say, “Actually, it looks different.” It does the loop again. Then you say, “Cool. Apply.”

Swyx [00:50:38]: That's retroactively obvious, which is the most useful kind. Any other comments on agent deployment on Railway?

Jake [00:50:51]: It's getting better every day. I'm on X or Twitter.
You can always yell at me about the parts not working as well as they should, because plenty of things should work way better.

The New Serverless: Stateful, Long-Running, Pay-for-What-You-Use Linux

Swyx [00:51:04]: At this stage, when people want massively or embarrassingly parallel compute, they usually talk serverless. I feel like there's a new serverless compared to the previous five years of serverless. You're in that new bucket. Do you have comparisons or philosophical differences you want to call out?

Jake [00:51:31]: It's somewhere in between. It's the ability to run stateful, long-running workflows or executions.

Swyx [00:51:42]: Vercel has Fluid Compute, Cloudflare has some container thing, Google has App Runner and others.

Jake [00:51:55]: That's where everything is roughly going, and it's why we've been working on this for six years. We believe users need access to a computer: a box that speaks Linux. They need to deploy what they want. Other systems change the surface area of what you can build. For us, users need a computer and need to deploy anything they truly want. That's why we've focused on the primitives: network, compute, storage. If we give you those and expose them so you can run things indefinitely, that's where we believe it's going.

Jake [00:52:43]: Twitter has no nuance, so everyone says “servers” or “serverless.” It's always somewhere in the middle: I want to run it for a long time, but I don't want to provision the resource statically or pay for things I'm not using. That's been our thesis from day one: pay only for what you use, run it indefinitely, and it is full Linux.

Swyx [00:53:12]: That's why I like the naming of Fluid. It's fluid. Flexible.

Heroku, Focus, and Carrying the Torch Without Becoming the Past

Swyx [00:53:18]: Another milestone is the Heroku official deprecation. You're one of the presumptive new Herokus. “New Heroku” has been a category for as long as I've been in developer tooling. It's finally happening. What was that like?
Any behind-the-scenes of, “This is the moment”?

Jake [00:53:42]: You have people where you're like, “You were running stuff on here? You, as this company?” It's crazy that names you would know are running on it and now coming to us saying, “We want to move a lot of this off.”

Swyx [00:54:00]: Any behind-the-scenes on why Salesforce let Heroku stagnate?

Jake [00:54:05]: I can only guess. It's hard when it's not your business. Salesforce's business is to build a great CRM. That's their focus. Then you acquire a compute business as an offshoot. A lot of early Meta people talk about focus. Boz has a write-up about how in the early days of Meta they had no money, so they were forced to focus. Then they turned on the money tree and had no reason not to split their focus.

Jake [00:54:52]: But that dilutes your product. You get offshoots where you ask, “Is this the focus of the business?” If it's not core, it languishes. A lot of companies get in trouble when they split focus because they're fighting a multi-front war, not just externally but internally for alignment. Where are we going? What are we doing? What is our purpose?

Jake [00:55:24]: If you're Salesforce-built and mission-driven, you want to work on Salesforce. Heroku is off to the side. It's not core to the business. Getting resources, budget, focus, and alignment internally becomes hard. It was a matter of time.

Swyx [00:56:06]: Kudos to them for calling it out instead of leaving it unknown.

Jake [00:56:12]: Their release was a little odd. They called it out, but they didn't say they were shutting it down. Behind the scenes, I think they issued messages to people saying they should close accounts and that they were going to deprecate and remove things over time.

Jake [00:56:30]: It's crazy because some of my first deployment experiences were on Heroku. You start with dragging things into an FTP server, then you try to get a deploy working, and then it's Heroku. It was the on-ramp for us. But the wheel turns.
New things emerge. We're happy to carry the torch for a lot of that. But we don't want to be the new Heroku. We want to be the way people build and deploy software, and ultimately the way people monetize software over time.

Swyx [00:57:19]: It's still a big crown to be the new Heroku. There are 50 companies that fought for that.

Jake [00:57:23]: Everybody is holding some portion of it. We're happy to support people and companies. The platform works differently. The game loop is similar, but we've been dogmatic about where these things are going: primitives, agents, fan-out. Some things fit; some workflows need to change. We have an approximation of Heroku pipelines with the environment system. It's exciting. We've got a ton of people we can support, and it's growing a lot.

Temporal, Workflow Engines, and State Machines

Swyx [00:58:12]: I have one more technical question about Temporal. I've sold my shares. You're a power user and one of our earliest customers. I met you through Temporal. You built on Temporal. You have complaints. This may be the most neutral and informed conversation anyone will hear about Temporal without someone working at the company.

Jake [00:58:39]: That's fair. I've used Temporal for almost 10 years because of Cadence at Uber.

Swyx [00:58:52]: Give people a sense of what Cadence was at Uber.

Jake [00:58:57]: Cadence was the precursor to Temporal. It powers trip actions, rides, when you rent a Jump bike or scooter or car. You're running workflows for a period of time and saying, “This ride will run indefinitely until it finishes.” You attach information: you paused in this zone, so add this charge to the bill. When you end the trip, the workflow is done. That experience was powered by Cadence at the time.

Swyx [00:59:34]: I used to say it's like programming the entire user journey top-down as one function.

Jake [00:59:39]: It's a powerful idea and important. It's also important for the next phase of the agentic journey.
You want an agent to do a specific task, be complete or incomplete on that task, and move on to the next thing. You need a way to manage workflows dynamically.

Jake [00:59:59]: Temporal was always great in theory, and great when you got it working the way you wanted in production. But it required you to model the entire journey in your head. If you didn't, you could cause issues where replaying the state of the workflow causes non-determinism.

Swyx [01:00:25]: Because it works on deterministic workflow history.

Jake [01:00:28]: Exactly. I describe it as a jet engine. If you know how to operate it and run it, it's great. But you can't hand it to people trying to build complicated things if they don't have the whole state in their head.

Jake [01:00:48]: We run our whole deployment pipeline on top of it. That's a reasonably complicated workflow: pre-commit hooks, signaling, queuing, and all the rest. We ran into the same thing at Uber. As you express a large workflow, it gets more complicated, with more states in the state machine that you have to map back to the workflow.

Swyx [01:01:15]: It's a lot of ifs.

Jake [01:01:16]: Exactly. At Uber, we built a system for doing the state machine and testing it. We've started to build some of those things here because it's grown heavily. It's not quite love-hate. When it works well, it works super well. But if someone who doesn't have full context puts something into the system that invalidates state or causes non-determinism, or spins off a ton of activities, you have to keep track of underlying SRE knobs like activity slots. Those should scale with memory, vCPU, and so on. It becomes a bear to scale.

Swyx [01:02:10]: You need a capable sysadmin running things behind the scenes. If you moved off, what would you do?

Jake [01:02:19]: We'd build our own workflow engine.
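
To make the replay problem discussed above concrete, here is a toy sketch of a history-based workflow engine. It does not use Temporal's or Cadence's API; `Context`, `run_activity`, `NonDeterminismError`, and the two ride workflows are hypothetical names invented for illustration. The idea it shows: activity results are recorded in order, a replay feeds those results back instead of re-running the activities, and if the workflow code now asks for a different step than the history contains, the replay fails.

```python
class NonDeterminismError(Exception):
    """Raised when replayed code diverges from the recorded history."""


class Context:
    def __init__(self, recorded=None):
        self.recorded = list(recorded or [])  # events from an earlier run
        self.history = []                     # events seen in this run
        self.executed = 0                     # activities actually executed

    def run_activity(self, name, fn, *args):
        step = len(self.history)
        if step < len(self.recorded):
            # Replaying: reuse the stored result, but only if the code is
            # asking for the same step that was recorded.
            recorded_name, result = self.recorded[step]
            if recorded_name != name:
                raise NonDeterminismError(
                    f"step {step}: history has {recorded_name!r}, "
                    f"code requested {name!r}"
                )
        else:
            # Past the end of the history: execute the activity for real.
            result = fn(*args)
            self.executed += 1
        self.history.append((name, result))
        return result


def ride_v1(ctx, minutes):
    fare = ctx.run_activity("compute_fare", lambda m: m * 2, minutes)
    return ctx.run_activity("charge_card", lambda amt: f"charged {amt}", fare)


def ride_v2(ctx, minutes):
    # A later code change inserts a step in the middle of the workflow.
    fare = ctx.run_activity("compute_fare", lambda m: m * 2, minutes)
    fare = ctx.run_activity("add_pause_fee", lambda amt: amt + 5, fare)
    return ctx.run_activity("charge_card", lambda amt: f"charged {amt}", fare)


if __name__ == "__main__":
    first = Context()
    print(ride_v1(first, 10))             # executes both activities

    replay = Context(first.history)
    print(ride_v1(replay, 10), replay.executed)  # same result, nothing re-run

    try:
        ride_v2(Context(first.history), 10)
    except NonDeterminismError as err:
        print("replay failed:", err)
```

The failure in the last case is the situation where someone changes a workflow without having the whole recorded state in mind: the new code is fine on fresh runs, but it can no longer replay rides that started under the old version.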
We have a few internally that we've worked on.

Swyx [01:02:27]: This is one of those classes of things you typically wouldn't vibe code, but I'm wondering if you can.

Jake [01:02:33]: I still don't think you should vibe code it. You still want to run decent tests to make sure it works.

Swyx [01:02:39]: Timo didn't invent that from scratch either. There are libraries you can run. On top of that, it's just a state machine that you have to map out. Ultimately, you define the instructions you want and run them through a state machine.

Jake [01:03:00]: It's very doable. Workflow stuff is interesting. Restate is doing neat stuff here.

Swyx [01:03:10]: You're tied into JavaScript. Are you a JavaScript maxi?

Jake [01:03:13]: Internally, we have TypeScript, Rust, and Go. We don't add more languages. Actually, we have a little C because we write BPF code and hooks. But those are the languages.

Swyx [01:03:28]: Is this for sidecars?

Jake [01:03:32]: No. It's for the networking stack, volumes, and things like that. We use TypeScript a lot because it powers the dashboard, but we're moving a lot of workflow stuff off the dashboard stack and into the infrastructure stack.

Railpack, Nixpacks, and Content-Addressable Filesystems

Swyx [01:04:00]: Cool. Any other technical infrastructure stuff? Railpacks?

Jake [01:04:07]: We built an engine for determining dependencies based on source code. It's called Railpack. We built the first version, Nixpacks, on top of Nix, and then we moved.

Swyx [01:04:17]: People have been trying to get me to adopt Nix and NixOS for four years. Is it ever going to be a thing?

Jake [01:04:23]: I don't know. We're excited about it, but it has pain points. Think of it as a stack of versioned binaries at specific slices in time. If you want version X and version Y, you bloat the package space, which blows up image size and makes real-world workloads difficult.

Swyx [01:04:53]: But you content-address it and cache it.
In theory, there are optimizations.

Jake [01:05:00]: In theory, yes. But with a large enough user base and disparate enough machines, you run into a problem Meta described in the XFAAS paper, their internal serverless system. It becomes difficult at scale unless you break out specific runtimes.

Jake [01:05:24]: We didn't want to do that because we wanted to truly allow you to deploy anything. That was our initial thing with Nix. But we've moved toward interesting work around content-addressable file systems that can lazy-load anything from any point and page it into memory.

Swyx [01:05:48]: Amazing.

Jake [01:05:49]: The future is very bright. It's crazy, and it's going to be nuts.

Coding Agent Spend, Roadmaps, and Token ROI

Swyx [01:05:54]: Founder journey stuff?

Alessio [01:05:56]: Your cloud usage: you tweeted you're going to spend $300K this month?

Jake [01:06:01]: I think we got to $200K.

Alessio [01:06:02]: Coding agents?

Jake [01:06:03]: Yeah.

Swyx [01:06:04]: Across the company?

Alessio [01:06:05]: You only have 35 people, so I'm sure they're not all spending $10K a month. What's the distribution?

Jake [01:06:10]: I think I'm at about $25K. We have power users all the way down. We came back from winter break, and I basically said, “If you're writing code by hand, you're doing this wrong.” The tools are good enough now that you can move extremely quickly. There are issues and pain points, but you should be reviewing the code you are writing instead of writing it by hand.

Jake [01:06:40]: Architectural patterns matter more now than ever, but you shouldn't spend your time generating code you would write. If you know how to write it, ask the agent to write it and reconcile it until it looks like you would have written it yourself.

Jake [01:06:58]: People misconstrue my propensity to push people toward agents as connected to our growth and some reliability bumps. They're not necessarily related.
The tools are good enough to move extremely quickly and build things way larger than you could before.

Jake [01:07:19]: To the earlier point about cooling data centers in space: I don't know. But with software, you can ask, “How would I build block storage from scratch? How would I do these things?” I have ideas because I have history and have read papers. Let me work them out and build massive test benches with thousands of tests, because those are now free to author. If you're not using AI systems to speed-run your roadmap and reconcile your existing system onto the future, you're missing a large point of what's happening.

Alessio [01:08:12]: What's the path to spending $3 million a month? Is it bound by ideas and things customers can absorb?

Jake [01:08:19]: For most companies, it's bound by deployment at this point. That's why we've seen a massive boom in users and companies, from Fortune 50s down, asking how to get developers to move faster. You'll probably hit your CFO before any technical limits because they'll look at the eye-watering amount of money spent on tokens. Inference costs have to come down, but we're inference constrained now. There will be price discovery around what makes sense for an org to adopt.

Jake [01:09:06]: I think you'll end up with the F1 driver concept. If someone is really adept at these things, it makes sense to put them in a $3 million car. If they're not, it probably doesn't make sense. You'll take a few people and say, “You can drive the F1 car. We need to go in this direction. Figure out if it works and prototype it.”

Jake [01:09:33]: We've done some of that and vastly accelerated our roadmap. We thought we'd ship something in a few years; now we can probably ship it in a few months because we validated it and don't have to build it incrementally. We can skip steps and move toward our vision.

Alessio [01:09:58]: A lot of people are realizing the roadmap doesn't always have a business impact, so they say tokens are too expensive.
But if your roadmap were built to make more money by the time you built it, you'd have token pricing for it, the same way you do with sales. You'd spend a billion dollars on sales if you knew you would get $2 billion of revenue.

Jake [01:10:19]: Exactly. A naive way to measure this is the percentage of tokens that end up in production. If you can measure impact because those tokens end up in production, that's awesome. But the burden of proof will rise. Internally, we have a growing number of pull requests that haven't merged. The question becomes: how do you get this into production? It's about how quickly you can build and deploy software, which is exciting because that's our whole thing.

The SDLC Shift: Prompt Requests, Feature Flags, and Safe Rollouts

Swyx [01:10:56]: The SDLC is changing. One thesis is that the pull request is dying. It's going to be the prompt request. Beyond that, code review is also kind of dying if you have all the other systems in place. What else is changing about the SDLC?

Jake [01:11:19]: The AI SRE and the tools to make it happen. AI SRE is pie-in-the-sky aspirational. What does it take to get an AI SRE? What tools do you need to build?

Swyx [01:11:32]: You should expose your tooling to customers at some point. The Central Station command center.

Jake [01:11:39]: We have it for template maintainers. Template maintainers can deploy and maintain templates, and they get feedback. We're going to expose those things incrementally.

Swyx [01:11:51]: Clustering around incidents. Everyone has a version of that, but I don't think anyone has solved it.

Jake [01:11:56]: I won't say we've solved it internally, but it's gotten so good that we can see incidents forming pretty quickly. At some point, those will be things either someone else builds or we build. We've always built things purpose-built for us.
If it makes sense to make it useful for users, monetize it, or turn that loop into a profit center instead of a cost center, we want to do that.

Jake [01:12:28]: Pull request is definitely dying.

Swyx [01:12:29]: Do you do first-party feature flagging and incremental rollout stuff?

Jake [01:12:34]: We have a feature-flagging engine we built internally and will eventually roll out.

Swyx [01:12:38]: I don't see it as a user. How come you didn't give us what you have?

Jake [01:12:43]: We have to beta test it. We care a lot about the quality of the things. There's plenty we've used internally that doesn't make it all the way through the journey because it fails. It works for one service but not multiple services. We'd have to build it for multiple services and know that if we released it, we'd rebuild it again and again. Some things are worth that, but many inform the roadmap.

Jake [01:13:18]: We don't want to dilute the experience by saying, “This works, but only for this service,” unless it's a core initiative. Over the next few months, we'll roll out things that work for a single service, then multiple services, then multiple services across the environment. You have to be deliberate. Otherwise you create broken disparate experiences and support load because people ask how to use the feature.

Jake [01:13:52]: It's the earlier expansion and compaction pattern. You expand the company to get features, then compact and smooth them out so the experience is stellar. You told me in the hallway, “It's gotten so much better.” Internally we're saying, “This part really sucks. We need to make it significantly better.”

Swyx [01:14:11]: I can attest to that over the last three years watching you build Railway. For listeners, feature flagging is a huge part of Uber culture. So much so that they have too many feature flags and another thing to remove feature flags. Facebook has Gatekeeper. Agents are going to need this. It's fundamental to incremental rollouts. OpenAI acquired Statsig.
GPT-5 is routing and flagging through different models.

Jake [01:14:56]: It's super important. If the software development lifecycle is going to change because we're doing things 1,000 times faster and 1,000 times more concurrently, what becomes important at scale?

Jake [01:15:16]: Before I started Railway, I built a feature-flagging product and tried to sell it. It was an easier version of LaunchDarkly. I ran into a problem: anyone small enough to adopt your technology doesn't care about feature flags, and anyone large enough to need feature flags needs so much scale that you have to build out all the infrastructure. I scrapped it.

Jake [01:15:42]: But what is old is new again. Companies are trying to move quickly, but you can't YOLO a vibe-coded thing straight into production. You need to say, “Here's my blast radius, my impact, and I want to shadow it for these users.” Feature flags. You're going to need the tools larger companies built to maintain their structures. Everything gets compressed by 1,000x so everybody can build those structures quickly.

Jake [01:16:07]: That's exactly where we are: compressing the software development lifecycle, then expanding it and adding more new things.

Cattle, Pets, and Clonable Infrastructure

Swyx [01:16:15]: Another term that comes to mind for newer developers is “cattle, not pets.” People treat production like a pet. It has a name. You baby it and keep it alive. With cattle, you can mass farm, roll out, portion parts out, and kill them.

Jake [01:16:37]: I think that might change. You can move toward having pets as long as you have a cloning machine for your pets.

Swyx [01:16:52]: Yeah.

Jake [01:16:52]: If you can snapshot every single thing at every frame, it doesn't matter if something gets obliterated because you have a snapshot of it. The things we've built right now are designed to block changes from the hermetically sealed DevOps line. You have to write a Dockerfile because you nee
Most organizations think they're doing AI. They've bought the licenses, rolled out the tools, and told the team to start using Copilot. But adding AI on top of a 40-year-old process isn't transformation. It's decoration. Andre Kaminski, Director of Advanced Technology Solutions at WorkSafeBC and author of "The AI-Native Software Development Lifecycle," joins Peter and Dave to talk about what it actually means to rebuild your delivery process around AI, not just bolt it on. They get into why optimizing code generation alone is the wrong focus, what the six phases of an AI-native SDLC look like in practice, and why the biggest challenge isn't the technology at all. It's the identity shift that comes with it. If your organization is asking "which AI tool should we use?" this episode will help you realize that's probably the wrong question.

In this episode:
Why AI-augmented and AI-native are very different things
The compounding learning effect and why early adopters are pulling further ahead every month
What prompt architecture actually means and why it matters more than code
How to think about governance when prompts become your new source of truth

Want to keep the conversation going? Drop us a line at feedback@definitelymaybeagile.com or find us at definitelymaybeagile.com. If this episode got you thinking, share it with someone who needs to hear it.
Autonomous software development creates a dilemma for leaders in regulated industries: adopt AI coding at scale without compromising auditability and code quality, or fall behind on product velocity. In CXOTalk episode 917, Kris Tokarzewski, Group Chief Technology Information Officer at Vitality, describes how a 14,000-employee multinational insurer is rebuilding its software development life cycle around AI. This episode examines the impact of agentic AI on software development in the enterprise.

Recorded at Blitzy's headquarters, the conversation examines deterministic code generation, Blitzy's infinite code context, context engineering, test-driven development, and the shifting bottlenecks that surface as throughput accelerates.

YOU'LL DISCOVER
✅ Why regulated industries require deterministic, auditable code rather than the probabilistic output most AI coding systems generate
✅ How Blitzy's infinite code context (ingestion of codebases, engineering standards, and business rules) creates high-quality software aligned with compliance requirements
✅ How Vitality reverse-engineers legacy systems with autonomous AI, achieving a measured 5x acceleration over manual methods
✅ Why optimizing end-to-end SDLC throughput matters more than local efficiency at any single stage
✅ How code review of 50,000 to 100,000-line pull requests becomes the next limiting factor, and how AI reviewers close the gap
✅ How test-driven development pairs with autonomous code generation to raise quality and compliance pass rates
✅ How the roles of requirements engineers, software engineers, and product teams converge inside an AI-native SDLC
✅ How to instrument AI spend against velocity, quality, end-to-end throughput, and customer value rather than isolated gains

TIMESTAMPS
0:00 Deterministic code vs. probabilistic AI output
0:14 Meet Kris Tokarzewski, Group CTIO of Vitality
0:32 Why Vitality is modernizing legacy insurance systems
1:30 Event-driven architecture as agentic AI's natural partner
3:00 Building an AI-native software development life cycle with Blitzy
4:28 Throughput optimization versus local efficiency
6:02 Reverse engineering legacy systems and deterministic code generation
9:05 Infinite code context: ingesting codebases, standards, and rules
10:00 Test-driven development with autonomous code generation
10:49 Results: 5x faster legacy reverse engineering
13:17 Product, engineering, and DevOps convergence
15:04 Roles level up: requirements engineers and software engineers
16:18 Reviewing 50,000 to 100,000-line pull requests
17:56 Instrumenting AI spend against business outcomes
19:16 Executive sponsorship for autonomous development
20:16 Advice for CIOs and CTOs adopting AI-driven development
The first of three episodes recorded at Google Cloud NEXT, Las Vegas, in partnership with Kyndryl, the world's largest IT infrastructure services provider.

Host Russell Goldsmith was joined by:
1/ Kris Lovejoy, Global Head of Strategy, Kyndryl
2/ Vincenzo Forciniti, AI Adoption and Data Platform Leader, Fastweb & Vodafone
3/ Adrian Tatsch, VP AI Technology & Innovation, Equifax
4/ Patrick Bobrukiewicz, VP Data Services, Thrive Restaurant Group
5/ Kaapro Kanto, VP, Cybersecurity & Digital Platforms, DNA
6/ Brad Duff-Hudkins, VP Data Analytics, NextAfter

Each of our guests offered a grounded, real-world view of AI adoption at scale. The episode opens with Kris Lovejoy, Global Head of Strategy at Kyndryl, who outlines why digital sovereignty, geopolitical risk and regulatory pressure are reshaping enterprise architecture. She also breaks down the guardrails required for employee productivity tools versus mission-critical agentic systems and why modernisation itself has become a security control. Next, Vincenzo Forciniti, AI Adoption & Data Platform Leader at Fastweb and Vodafone Italia, discusses the data-unification challenges following Fastweb's acquisition of Vodafone Italia. He shares how the team built a shared data catalogue, why change management is often harder than technology, and how modernising legacy stacks is enabling scaled AI across SDLC optimisation, operations and customer-facing processes. We then hear from Adrian Tatsch, VP of AI Technology & Innovation at Equifax, who explains how the company is connecting APIs to AI agents using Apigee MCP, and how Equifax's multi-billion-dollar cloud transformation has accelerated AI maturity. Adrian explains how Equifax is redefining human vs. non-human work, upskilling, and measuring ROI across the organisation. Patrick Bobrukiewicz, VP of Data Services at Thrive Restaurant Group, shares a hospitality-sector perspective on AI adoption.
Kaapro Kanto, VP, Cybersecurity & Digital Platforms, DNA, explains how DNA moved from traditional network operations to AI-driven SecOps, enabling small businesses to benefit from enterprise-grade detection, automation and response, and why the biggest barrier to AI maturity is shifting from pilot experiments to trusted, scalable operational models. And finally, Brad Duff-Hudkins, VP of Data Analytics at NextAfter, explains how his team used Google's data engineering agents to cut onboarding time from 2–3 weeks to just 72 hours, and why agentic AI is already unlocking faster, more personalised, more scalable data operations for lean teams. A fast, insight-rich episode capturing the reality of AI transformation inside complex global enterprises, from security and sovereignty to data foundations, workflow automation and the future of human-machine collaboration.
Welcome to TiPS – the Topics in Product Series – a new podcast format powered by ITX and the team at Product Momentum. The TiPS mission is to engage the same important product space issues that you confront every day – but this time through the experiences of ITX product managers, UX researchers and designers, engineers, security analysts, and the rest of the team. In this inaugural TiPS episode, Dan Sharp is joined by Sean Murray and Andrew Knoblauch to reflect on a recent Product Leaders Breakfast, hosted by Prerna Singh. Together, they draw on insights from event attendees to discuss how AI is being applied inside real organizations. The central theme was clear: successful AI adoption depends less on hype and more on first principles and core product skills that drive disciplined product thinking, incremental progress, and strong decision-making. Here's what we learned:

Top-Down ‘Do AI’ Directive Is the Wrong Reason for Integrating AI

The integration of AI into software development is no longer the proverbial “hammer in search of a nail.” The days of doing AI for AI's sake are behind us. Today's product leaders focus on making incremental improvements tied to bona fide business problems. As Sean points out, our response to the ‘do AI’ directive should be: “‘Where do you want to see improvement? What outcomes are you looking for?’ I think back to our conversation with Teresa Torres, about applying best practices in the initiation and discovery phases of the SDLC so that when we actually get into building something, it’s gonna have some sort of relevant business value.” It's a more grounded approach that reflects a broader industry need to align AI efforts with tangible outcomes.

Building Stakeholder Trust Through Incremental Change

Trust emerged as a critical factor in AI adoption, but not only in the technical sense. Instead, as attendees discussed, trust is built gradually through careful implementation and organizational alignment. Andrew explains that product teams build trust not by tackling the biggest, riskiest challenge, but by prioritizing low- to medium-risk opportunities while involving stakeholders early, especially those in Legal and Compliance. “This idea of building trust among others in your organization,” Andrew continues. “We do this every day with our clients and with our own teammates. We learn about people’s concerns, what they care about.” The conversation reinforces the idea that AI should be introduced as a collaborator within workflows, not as a replacement for human judgment.

Decision Quality as the True Differentiator

One of the key threads weaving through our conversation was a return to foundational product principles, specifically the importance of decision-making. While AI fluency is valuable, it does not replace the need for strong judgment and clear thinking. Teams that succeed will be those that consistently make informed, high-quality decisions, Sean says. “The biggest differentiator moving forward is gonna be decision quality…your ability to consistently make good decisions.” In this context, AI becomes an enabler, not the driver, of product success. The conversation at the Product Leaders Breakfast (hosted by Prerna Singh) reinforces a familiar but essential message for all product leaders. AI does not replace core product skills; it amplifies them. Teams that stay focused on problem definition, stakeholder alignment, and disciplined execution will be best positioned to realize its full potential.

The post 186 / TiPS: AI-Enabled First Principles + Core Product Skills Spark Adoption appeared first on ITX Corp.
A new episode of the Resilient Cyber Show just dropped, and this one is a conversation I've been looking forward to for a long time.

I sat down with Tanya Janca, better known to most of the AppSec world as SheHacksPurple. Tanya is the best-selling author of Alice and Bob Learn Application Security and Alice and Bob Learn Secure Coding, an OWASP Lifetime Distinguished Member, CEO of She Hacks Purple Consulting, and one of the most recognized voices in application security and developer education on the planet.

The timing of this conversation is hard to overstate. The OWASP Top 10 2025 was announced at the Global AppSec Conference last year, with two new categories, Software Supply Chain Failures and Mishandling of Exceptional Conditions, and SSRF folded into Broken Access Control. Recently, Anthropic released the Claude Mythos Preview system card, documenting a model that has already found thousands of high-severity zero-day vulnerabilities autonomously, including bugs in every major operating system and web browser, and a 27-year-old vulnerability in OpenBSD.

In other words, AppSec is at a hinge moment, and Tanya is exactly the right person to think out loud with about it.

Here's what we get into:
What the OWASP Top 10 2025 got right, what it missed, and how teams should actually use it
AI-generated code, “vibe coding,” and Tanya's brand-new free prompt library for secure coding with AI assistants, SecureMyVibe.ca
What Mythos-class capabilities mean for the offense/defense asymmetry AppSec has always lived with
How AI is genuinely changing the SDLC, where it creates lift, where it creates noise, and where it creates entirely new attack surface
Architecting real defenses at the prompt layer, across MCP servers, and inside RAG pipelines, not just bolting content filters onto the front door
Why developers are the new attack surface, and why a lot of what gets labeled as “supply chain attacks” lately is really a developer compromise that cascaded into the supply chain
Tanya's threat model, defense framework, and maturity model for protecting developers themselves
DevSec Station, Tanya's new podcast delivering 5–10 minute secure coding lessons in a format built for how developers actually consume content
What she'd change tomorrow about how AppSec programs are built and run if she could change just one thing

This is one of those conversations that ranges from the practical (what to do Monday morning) to the philosophical (what does it even mean to “secure software” when an AI can find more zero-days in a weekend than a Red Team finds in a year). Tanya brings the rare combination of deep technical chops, real teaching ability, and genuine warmth that makes a hard subject feel approachable.

If you lead an AppSec program, write code for a living, run a security team trying to keep up with AI-assisted development, or you're just trying to figure out where this whole industry is heading, this is the episode for you.

Resources from the episode:
SecureMyVibe
DevSec Station Podcast (Tanya's new show)
She Hacks Purple Consulting
Alice and Bob Learn Application Security and Alice and Bob Learn Secure Coding
OWASP Top 10 2025: https://owasp.org/Top10/2025/
Claude Mythos Preview System Card (Anthropic)

Thanks for being here. If this episode landed for you, the best thing you can do is share it with one person on your team who'd find it useful. That's how this newsletter and show grow.
(05:00) Brought to you by Mailtrap

Mailtrap is a modern email delivery platform for developers, with native SDK support and a security-compliant API and SMTP service. Plus, you get 4,000 emails a month on their free tier! It also provides 24/7 support where you actually talk to real people, not an AI chatbot. Try Mailtrap for free at mailtrap.io.

What happens when AI ships code faster than your team can review it? As agentic development accelerates your SDLC, the guardrails matter more than ever, and most teams don't have them.

In this episode, Egil Osthus, CEO of Unleash, makes the case for FeatureOps as a strategic capability, not just a developer convenience. He explains the shift from a project mindset to a product mindset, where releases are decoupled from deployments and business outcomes matter more than shipping scope. Egil breaks down the four pillars of FeatureOps (gradual rollout, full stack experimentation, surgical rollback, and lifecycle management) and why each one becomes even more critical as AI-generated code flows faster into production. He also warns against building your own feature flag solution in-house, and shares what the rise of agentic development means for engineers who must now act as guardians of an oversight layer.

Key topics discussed:
Project mindset vs. product mindset in software delivery
The 4 pillars of FeatureOps and what each one solves
Why feature flags scare executives, and how to win them over
Decoupling deployment from release across Dev, PM, and Marketing
The danger of rolling your own feature flag solution
How local evaluation keeps feature flags fast and private
Blast radius management in an AI-accelerated SDLC
What vibe coders get wrong about day-two operations

Timestamps:
(00:00) Trailer & Intro
(02:36) What Is the Current State of Feature Flag Adoption Across the Industry?
(05:32) Why Is Feature Flag Adoption So Challenging Despite Its Apparent Simplicity?
(10:44) How Does FeatureOps Differ From CI/CD and Progressive Delivery?
(12:26) What Are the Four Core Pillars of FeatureOps?
(16:11) How Can Teams Shift the Perception of Feature Flags From Tactical to Strategic?
(20:46) How Do Feature Flags Align the Needs of Developers, Product Managers, and Marketing?
(25:09) How Do Organizations Effectively Define Responsibilities for Strategic Feature Flags?
(28:03) Does Using Feature Flags Enable Your Team to Deploy on Fridays?
(30:41) What Is Unleash and How Does It Scale for Enterprise Needs?
(34:54) What Are the Hidden Dangers of Building Your Own Feature Flag Solution?
(39:32) Why Are Local Evaluation and Privacy Core to Unleash's Design?
(44:48) How Does the Rise of AI Impact the Evolution of FeatureOps?
(52:02) What Specific Guardrails Does FeatureOps Provide to Improve Safety?
(54:21) Can FeatureOps Platforms Use AI to Autonomously Manage Feature Rollouts?
(55:33) What Essential FeatureOps Advice Should Every Vibe Coder Follow?
(59:53) 3 Tech Lead Wisdom

_____

Egil Osthus's Bio

Egil Østhus is the co-founder and CEO of Unleash, the world's leading open-source feature management platform. As a seasoned enterprise technologist and product strategist, he operates at the cutting edge of business and software engineering.

Egil's mission is to help technology leaders and businesses move beyond traditional DevOps by embracing FeatureOps, a new methodology that provides a critical safety net for the accelerating, and often risky, world of agentic software development. He has a unique ability to speak the language of both engineers and senior executives, making complex topics accessible and actionable.

Follow Egil:
LinkedIn – linkedin.com/in/egilconr
Unleash – getunleash.io

Like this episode?
Show notes & transcript: techleadjournal.dev/episodes/256.
Follow @techleadjournal on LinkedIn, Twitter, and Instagram.
Buy me a coffee or become a patron.
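
For readers who want something concrete, two of the pillars named in the episode summary above (gradual rollout and surgical rollback) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Unleash's implementation or API: the names `rollout_bucket`, `is_enabled`, and `kill_switch`, and the flag name `new-checkout`, are hypothetical. The idea is to hash each user into a stable bucket from 0 to 99 and enable the feature only for buckets below the current rollout percentage.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing the flag name together with the user ID means the same user
    lands in different buckets for different flags (hypothetical scheme).
    """
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int,
               kill_switch: bool = False) -> bool:
    """Return True if the feature should be shown to this user.

    kill_switch models a rollback: it turns the feature off for
    everyone without a redeploy (an assumption for illustration).
    """
    if kill_switch:
        return False
    return rollout_bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    enabled = sum(is_enabled("new-checkout", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the feature at 10%")
```

Because the bucket is derived from a hash rather than a random draw, a user who sees the feature at 10% still sees it at 50%, so raising the percentage only ever widens the audience, and flipping the kill switch shrinks the blast radius to zero.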
PLG can create explosive growth, but it can also mask fundamental gaps in execution, capacity, and long-term durability. As AI-native companies scale at unprecedented speed, revenue leaders face a new tension: how to convert bottom-up adoption into enterprise value without breaking the system that fueled growth. Brian McCarthy joins to unpack how Cursor is navigating this shift, why sales execution becomes the moat in a world of swappable technology, and what it takes to build a go-to-market machine that keeps pace with innovation while deepening customer trust.

Brian McCarthy is President of Global Revenue and Field Operations at Cursor and former CRO at Rubrik, where he helped scale the company from $118M to $1.5B in ARR. He is known for building high-performance revenue organizations and execution-focused cultures in complex enterprise environments.

Connect with Brian: LinkedIn

Resources mentioned:
Ep. 71 - What the Best Sales Leaders Do with Brian McCarthy
All In Podcast with Chamath, Jason, Sacks & Friedberg

Key takeaways from this episode:
03:30 – Why great leaders know when to step away, and how building a successor is the true test of an execution machine
09:50 – What to look for in a once-in-a-career opportunity, and why timing matters more than brand or hype
17:03 – How PLG success created a capacity crisis, and why too much demand can degrade customer experience
29:29 – The decision to radically reduce account load, and how focus enables better selling and better buying experiences
32:16 – Why champions, not features, drive revenue, and how to intentionally build them across the organization
38:19 – The required balance between bottom-up adoption and top-down value selling in technical markets
41:31 – Why “clock speed” is the defining trait of modern sellers, and how enablement must fuel continuous learning
49:09 – The shift from tools to AI factories, and what it means for the future of software development and selling
52:01 – Why culture, trust, and human relationships remain the durable moat in a world of rapidly changing technology

Hosted by five-time CRO John McMahon and Force Management Co-Founder John Kaplan, the Revenue Builders podcast goes behind the scenes with the sales leaders who have been there, done that, and seen the results. This show is brought to you by Force Management. We help companies improve sales performance, executing their growth strategy at the point of sale.

Connect with Us:
LinkedIn
YouTube
Force Management
“It is somewhat absurd that, to build a subpage with a form, you have to think about memoizing a function reference.” Tomasz Ducin, a consultant, architect, and someone who has spent a decade on the frontend, doesn't beat around the bush. 95% of frontend projects are non-modular monoliths, and a global store accessible to every module is the equivalent of shared database tables on the backend. Sound familiar?
Piyush Jain, Founder and CEO of Simpalm and co-founder of Ducknowl, is on a mission to solve real-world challenges by combining technology and entrepreneurship. With over 15 years of experience building custom software solutions, Piyush helps businesses turn complex ideas into practical applications by blending technical depth, business acumen, and a strong problem-solving mindset.

We explore Piyush's AI Ideation Framework (Validate idea, Proof of concept, Design, Competitor analysis, and Feature selection), a practical approach to building software in the post-AI era. Piyush explains how AI can help teams better understand user personas, validate product assumptions, and rapidly prototype ideas, while human expertise remains essential in design, architecture, and production-grade development. He also shares how prompt engineering, peer-reviewed prompting, and a right-shoring delivery model can help businesses build smarter, faster, and more cost-effectively.

3D Print Your Software with Piyush Jain

Good day, dear listeners. Steve Preda here with the Management Blueprint, and my guest today is Piyush Jain, the Founder and CEO of Simpalm, a custom software development company, and the co-founder of Ducknowl, a candidate screening and assessment application business for high-volume recruiting. Piyush, welcome to the show.

Thank you, Steve. Thanks for inviting me.

Well, I’m very curious about the stuff that you have to share with us, and I’d like to ask first about your personal purpose. What is your “why,” and how are you manifesting it in your business?

Yeah, so that’s a very interesting question. And I think for every entrepreneur or tech founder, really, that's the motivation: why you want to do certain things. So for me, if I look at it, my personal “why” is: why are we not solving challenges? Or why are we not solving them the right way? Why are we not transforming our lives?
I grew up in India and then came to the US, so I've seen many different parts of the world, from Asia to North America. I see people face different challenges, but then we are not focusing on solving those problems. A lot of what I see is that there are a lot of challenges in the world because, I believe, there are not enough entrepreneurs. Because entrepreneurs are the ones who really take risks, combine everything, and create solutions. That was like me, right? That’s what I learned growing up, that I think I can do that, right? I can combine the technical knowledge and the business acumen and create solutions that people like, that solve their challenges. Growing up, I was more on the technical side. I was inclined more toward science and technology, but then as I got into my undergrad and grad school, I realized that I have that entrepreneurship aspect, but it's still around science and technology. That’s when I realized that, you know what, I cannot be a pure scientist or maybe a pure entrepreneur, but I can be someone who can combine these two, because my main driving factor is problem-solving. I can combine these two and then live my life, be very happy with what I do. That has been my motivation.

I like it. So solving challenges and being an entrepreneur, and kind of combining the two, being the technical expert and the entrepreneur in one. Now, one of the things that we always talk about on this podcast is frameworks. And you have developed a really good one for AI ideation, which I think is something that everyone needs to do these days or use these days, and it helps you create business apps and other business applications. Can you share with me how that framework works, and what are the steps in it?

Sure, yeah, definitely. So just to give you a brief background, we've been building software for the last 15 years. Some companies have used different frameworks, whether it's Agile or Waterfall in SDLC, in building the software, right?
There are different methodologies that companies have used, and they've been good, successful; they've played their role. But now, with the advent of AI, things have changed. We had to figure out, in our organization, how to use AI, and that's how this framework was built. My team helped me build this framework as well. But we realized that we were losing business, we were losing clients, since we didn't have an AI framework that would fit our clients. Again, for me, it's a challenge. So anytime I see a challenge, it creates brain juice in me, right? So I said, okay, let's figure out how we create this framework. How did you do it? So really, we built this framework, and it's very interesting. A lot of the steps are similar, but then a lot of things are different. Whenever a client comes to us and says, “Hey, we want to solve this challenge,” what we do is we do enough research. And now we use a lot of AI tools to really understand the problem better and understand the user persona. When you build any software application, there is a person who's going to use that. Sometimes we used to do user research or focus studies to understand that. Now, with the help of AI, we can get a lot of ideas about the user persona. For example, maybe we are building a healthcare application for an anesthesiologist. I don't know much about that. I know, I mean, because I have been through some medical surgery and all that, but I can't fully understand their user persona or their requirements with respect to the application we're building. But now, with AI, I can actually ask different AI models, “Hey, we are building this app for anesthesiologists. What are their pain points? How would they see it?” So all that deeper mindset and psychology we can get using AI. So you are validating the idea by interrogating AI applications: what users are going to like and all that. So I always use this term, “earlier.” In software engineering, we now have pre-AI and post-AI, right? 
If you read history, we talk about before Christ and after Christ, right? Yeah. So it's a similar thing now. Yeah, exactly. Or before Covid, after Covid. Before AI, after we did all the user research and everything and created a requirements document, we would usually do design, create like a visual design of the software. But now, with the AI framework, we don't do that. That's not the next step. What we do instead is create a quick prototype using AI platforms. So there are a lot of AI platforms, like Lovable, Claude. Now ChatGPT launched Codex for coding, and Replit. Depending on what kind of application you're building (for example, maybe if you're building a web-based application), I recommend using Lovable or Replit. They're very good at creating that. Whatever software you want to build, whatever user personas that you're addressing, you can feed into that and it'll create like a prototype application. Okay. So what that does is actually, then this prototype, clients can just take it to their customers or internal users and get feedback. A picture is better than a thousand words. Organizations discussing an idea is very different from when they actually see something. Then everybody starts chipping in: “Oh yeah, I see this in the prototype, but I don't want this,” or “I want to move things around,” or “This is what I want.” Basically, building a prototype on AI platforms is much faster than building wireframes and design prototypes like we used to do earlier. So that has changed. So you're 3D printing your software, right? Yes, exactly. There you go. Well, that's a very good way you put it together. Yeah. So, yeah, exactly. You're just 3D printing the software, right? So you can see it, visualize it, and then once you go through that, it creates a lot of better ideas about the software in faster time. So once you have that, then you go into UI/UX design. So in that also, there are two steps. One is wireframing. 
Wireframing is like creating the flow in black and white. It's like creating a skeleton of your software. It does not have the color, the font, or the branding, but you just create all the different user journeys, the screens, the flow, and the fields that will be there on the screen. So we have integrated AI into that step as well. Earlier, it used to be created by a designer or a business analyst. Now we are using software like Uizard or UX Pilot, where we define what we want (what kind of user journey, flows, and screens) and it creates that. It spins out those wireframes in minutes. So really that has reduced now. Creating wireframes takes much less time than it used to. So you're designing the wireframes with AI? Yes, but it's just the wireframe part of it, and it's still guided by our expert BA or designer, someone who knows how to really visualize things and has done a lot of wireframes and sketches. So they know what to tell the AI. Prompting is very important. It's very important that you know how to prompt, what to ask for, so that you can get variations and differentiation in the wireframes. You don't want a standard AI-created wireframe. Everybody can recognize AI-generated images now, right? If I show you one, you'd say, “Oh yeah, it's AI-generated.” I know that, right? Yeah. So again, we keep the human intelligence. We're not asking AI to create the full software end-to-end. It never works, and it'll never work. It just doesn't. I know that's a strong statement, but I'm saying that based on experience and an understanding of human behavior and psychology. So AI agents will not be able to code software, in your opinion? No, they can do the coding, but they cannot build the whole software end-to-end, a production-deployed software. Because this software is being used by humans. You have to have human intelligence to understand and define what you need and how it works. You can maybe create some software, but it doesn't work very well. 
Even if you use all these platforms, you can cut down your production time and cost by 30%, 40%, 50%, right? That's the number we are seeing: a 30 to 50% reduction, depending on the software you're building and the objectives. So just to recap: you validate the idea by interrogating Claude and ChatGPT, asking about the needs of that customer, the psychology of the customer. That's step number one. Step number two is 3D printing the software with Lovable or Replit, so proof of concept. And then you design the wireframes. And then what's next after you design the wireframes? What's the next step? So that's a good thing. That's it. Now I'm going to talk about the human element, and some people listening to this podcast will be surprised. Now it comes to visual design, right? So you've created the skeleton, and now you have to add the skin, the tone, the color, the emotion to the design, to the workflow. Now, we have tried AI, but it doesn't work. It's very monotonous. So we use an experienced visual designer, a UX designer, for that step, to give it emotion. When you use AI (I wish I could show you some examples), it creates very similar kinds of designs for apps and software. So what we did is we gave it three different apps with very different objectives and everything, and the designs it came up with were very similar: blocks, buttons, very monotonous. So there's no differentiation. And design is the main thing that becomes the differentiator, right? Yeah. So that's what we learned from our experience. And I say that very categorically in all of my talks: visual design, final UX, has to be human, not AI. Because you are communicating emotions, right? And AI is still not there to communicate emotions. Yeah. It doesn't have emotions. Well, some people will argue with you and say, “No, it can understand if you're sad or unhappy.” But my response to that is: it's because we've programmed it that way. 
But things change based on situation, context, ethnicity, culture, fear—how people express nervousness, fear, and all that—it's very different. So there was this AI video interviewing company five or six years ago. They were sued by the Department of Justice because they were trying to detect emotions of people like anxious, nervous, when the interview was happening. It turned out their model was trained only on one race—they didn't account for other races or ethnicities. So their model failed, and they were sued by Department of Justice for that. So yeah, emotions is something—maybe they have unlimited dimensions, we don't know. So it's hard to program that. So basically: ideation, prototype, wireframe, and then final visual design—that's the discovery and design framework. Now, when it comes to development framework, this is where AI has been a game changer—the coding part. But again, you have to be very careful about how you use AI in your coding pattern with your coding team. It depends on the application, it depends on the tech stack, right? Every platform has its own strengths and weaknesses. For example, if you want to build a web-based application in the React JS framework, then Lovable is great. That's very good—very efficient and cost-effective. Then Claude is there. Claude has been really good in software engineering. I would say it has been built and designed mostly for coding, right? Anthropic—their idea, their starting point—was coding, how to make coding and software engineering better. So they've been a front runner in the race. ChatGPT is trying to catch up using Codex, and Copilot is great. Copilot is mostly used by enterprises who are on the Microsoft stack. They use Copilot a lot for coding in .NET and enterprise-level applications. They’re used to co-pilot. It’s because they feel comfortable with Microsoft security policies and all that. That’s fine. But in general, we see Claude to be at the top—from our perspective. 
We've also built a framework for software coding. In software development, there's a popular process called peer review. So when you create source code, you get it reviewed by your peer, your colleague. Is this what happens on GitHub? Yeah, yes. So basically anywhere, in any source code repository, you can do that. So your team members can help you make your code better and more efficient. Yeah, I understand. But now we have a step called prompt peer review. When you're using prompts to build software, those prompts get reviewed by team members. Because if your prompts are not very specific or good enough all the way through the SDLC, you can run into a lot of challenges trying to fix the code. Because now you have a situation where you have code that you have not written fully, and when you ask AI to change something in the code, sometimes it ends up changing a lot of things that you don't want it to change. Yeah. That's what we've seen, and that's why we evolved. Before we build any software, we create maybe a 10-, 20-, 30-page prompt document, where we go through each screen and function and write it out. It's very sophisticated, and it has evolved really well. But the thing is, it takes a few days to do that within the team, because we know if we do it right, the next step is faster and more accurate. So really, think of the prompt document more like an architecture document. Earlier, we used to create a solution architecture document, defining all the tools, the design, everything. But now it's more like an AI-driven solution architecture document with prompts, which get reviewed by team members. So we do that, and then we run that, and we get the code and everything. So I have a CTO club (I run a CTO Club in Maryland), and I was talking to CTOs. They're all using this, but some of them are so advanced that they actually define the test cases in the beginning. 
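A rough sketch of the test-first idea raised here, for readers who want something concrete: the spec for a function carries its test cases, the prompt is rendered from that spec, and whatever code comes back is checked against the same cases. The names `Case`, `FunctionSpec`, `build_prompt`, and `passes_all` are hypothetical, invented for illustration; this is not Simpalm's tooling or any particular platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """One required behaviour: call arguments and the expected result."""
    args: tuple
    expected: object


@dataclass
class FunctionSpec:
    """Hypothetical spec: what the function is for, plus the cases it must pass."""
    name: str
    description: str
    cases: list = field(default_factory=list)


def build_prompt(spec):
    """Render the spec as prompt text, listing every test case explicitly."""
    lines = [
        f"Write a Python function `{spec.name}`.",
        spec.description,
        "It must pass these test cases:",
    ]
    for case in spec.cases:
        call_args = ", ".join(repr(a) for a in case.args)
        lines.append(f"- {spec.name}({call_args}) == {case.expected!r}")
    return "\n".join(lines)


def passes_all(func, spec):
    """Check a candidate implementation against the same cases used in the prompt."""
    return all(func(*case.args) == case.expected for case in spec.cases)


spec = FunctionSpec(
    name="slugify",
    description="Lowercase the input and replace spaces with hyphens.",
    cases=[Case(("Hello World",), "hello-world"), Case(("AI",), "ai")],
)
print(build_prompt(spec))
```

Because the prompt and the acceptance check share one list of cases, a reviewer in a prompt peer review can read the cases instead of guessing at intent.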
They define, “Okay, this is what I want, this is the function I want, and these are the test cases I want it to pass.” That's even more advanced. If you can do that, you can have very efficient code. Yeah, I love it. So is that the end? You have your test cases, you design the prompt, you peer-review the prompt, and you already had the prototype, so now you're coding the software—what's the last step? Yeah. Then there’s an integration as well. So AI doesn’t do the integration so well. You can do the front-end coding, you can do the back-end coding, you can probably create the APIs. APIs require a lot more human intervention. But once you have that, then you have to connect it, right? You have to connect the front end with the backend. A lot of that is still done by the programmer. It's hard to rely on AI for doing that. And again, it depends on the application. Maybe if it's a smaller application, maybe you can have AI do that. But if it's a bigger application—we mostly build bigger applications—then integration, then final QA and testing, and deployment. So all that is there. But in each of these steps, you can use some sort of AI tool to speed up the process. But the key is you still have to have your architecture, the process. You have to know the steps more. You have to be a good, experienced developer to use AI efficiently if you want to build a production-ready application. You can build a prototype. Anybody can build a prototype on Replit or Lovable, but it's not going to be production-ready that you can give to your customer and charge them money. So that’s the differentiator. Yeah, I understand. So Piyush, I’d like to switch gears here. I understand the AI ideation framework—that's great. We talked about the technical part of it, the curiosity, the technical challenges. Let’s talk about the entrepreneurship part, which is also part of your profile. So what drives the growth of your business? What would you say drives it? 
For us, there are multiple factors that drive the growth of our business. The first is, again, our problem-solving attitude. With any client that comes to us, we communicate in that model: the problem, the challenge, the solution, the business part, the value proposition we bring. And the second factor is our location. We are here in Maryland, and we have another office in Chicago. So being here, we have a global shoring model, and that's a main driving factor of our business from the entrepreneurship perspective. So what the global shoring model is: our client-facing team, the senior team, is here. Solution architects, sales engineers, designers, project managers, business analysts: they are here in the US, client-facing. And our dev team and testers are in our offshore locations. Some people call it hybrid shoring. I call it right shoring. The reason I call it right shoring is because in this model, you have the right people at the right shore, so you get the most value. Here, you have people who understand the culture, the product, the context, because products are used by people in a certain culture. And if you are not in that culture, if you haven't experienced it, it's always harder to design the right software solution. I was one of the first people to start that model here in the DMV area for mid-size and smaller companies. This model existed before, but mostly for large enterprise companies. They have used that. But I started to offer that 16 years ago to smaller companies. Either companies were just going offshore, or they were doing onshore, right? I introduced this hybrid, or right-shoring, model, and it has been well received by our customers. So that's it. So what is one thing that you're trying to figure out in your business right now? Right now, what I'm trying to figure out in my business is scaling. I mean, we have built solutions for many different industries. 
We have built solutions for different clients in fintech, healthcare, education, nonprofit, startups, IoT, construction. But now what we are trying to figure out is how do we create some off-the-shelf solutions for different industries? Because one challenge we see is that, from the client's perspective, getting custom software built takes time and money. But in certain use cases, we can have off-the-shelf, industry-specific solutions, and then customize those based on the client's needs. So that's what we are trying to figure out—across different industries, what those solutions can be—so we can scale and also make it easier. And these are more like AI-driven, off-the-shelf solutions that are customizable. So think of it like Salesforce—its core is off-the-shelf, but then you can customize the front end and a lot of other things. Not exactly like Salesforce, but more like industry-specific solutions for different use cases—nonprofit, construction, right? With those, overall, we can build solutions faster. That’s fascinating. So how has the offshoring—or right shoring, as you call it—model evolved over the past 10 years? Is it different now than it was 10 or 20 years ago? Yeah, I think that's a great question. It has evolved and changed. Earlier—maybe 10, 12 years ago—when we were talking about hybrid shoring, we were mostly talking about the US and Asia. But now we have different players. We have the nearshore model, which has become quite popular as well—like South America. We have team members in nearshore locations as well, in South America, because we want to leverage different time zones, resources, and culture. And we've seen very positive results. Then you have Eastern Europe. We have competition from countries like Ukraine, Belarus, Romania, Poland. I think it’s the part of the globalized world, right? It's like energy flowing in different spaces—it's not limited to one place, which is great. That's one way it has evolved. 
I also know some companies working in Kenya—there are developers there. Some companies are setting up in East Africa, West Africa. So different places are playing roles now. That’s one thing I see. And now, with the help of AI, what's going to happen is it will play two roles. One— in many situations, with AI, you can do more things onshore. That’s one aspect of it. And second—with AI, someone sitting offshore who knows how to use AI can become very competitive as well. We don't have enough data yet to fully see how this will evolve, but maybe in a year or so, we'll see how it plays out. But I also find that with these simultaneous translation tools—like Apple, I think an iPhone can now translate in all languages. Essentially, another barrier falls that if the language and knowledge of your offshore contractor is not perfect, they can understand things much more clearly because of simultaneous translation. Even on Zoom, you can now flip a switch and they can read what's being said in their own language during a conversation. So that's amazing, I think. Yeah. That’s amazing. That’s amazing. They can understand more about the culture and mindset. So that's something have to see. Again, I think it depends on the use case, the application, the problem we're solving. But in some cases, it might be great news for onshore—we can keep more dollars here. But keeping dollars here with AI also means a lot of that spend is going to AI, right? So that's one thing—we have to be very careful. Yesterday, in our tech breakfast, our presentation was about how to optimize your AI tokens. There are some companies spending $150,000 per year per employee on tokens. Wow. That's like the salary of one employee. Yeah. A mid-level developer—$150K—they're spending that much. And then they’re trying to figure out how to optimize it. And on top of that, they have cloud costs, right? AWS, Azure—those costs are still there—and then you add AI. So it's a lot of money. 
You really have to be very smart about understanding and optimizing it. That’s why the prompting is so important, right? It's not just about getting the right software—it's also about getting the cost down. Yeah. Again, you need expert people who can prompt well, because it's about being able to communicate well. Prompting is about communication—it's about clarity, brevity, security, all that stuff. So, Piyush, we're coming close to the end of the recording. If someone would like to learn more about the applications you develop, how you're using AI, and how you can help their business develop technology, where can they find you? What's the best way to get in touch with you? Sure, there are many ways people can reach out to me. They can go to my website, www.simpalm.com—we have a contact form there. They can submit the form, or they can reach out to me via email directly at contact@simpalm.com. They can also connect with me on LinkedIn. I'm on LinkedIn—message me there if somebody needs anything. I always like discussing problems and what the solutions can be. If anybody reaches out to me, I'm always very quick to respond. That's awesome. So Piyush Jain, the CEO of Simpalm—and we didn't even talk about your other business, Ducknowl—thank you for coming, and thank you for sharing your insights and your framework on how to build an ideation framework for AI. So thanks for sharing that. And if you're listening and you enjoyed this conversation, then stay tuned, because every week we have another entrepreneur sharing their insights and frameworks with you. So make sure you follow us on YouTube, subscribe, and give us a review on Apple Podcasts. So thanks for coming. Thank you, Steve. It was a pleasure talking to you. Important Links: Piyush's LinkedIn Piyush's website
Security problems aren't changing very much even though security teams are. We catch up on the implications of the Claude Code source leak, the very human lessons from the axios NPM compromise, and what secure design looks like when it involves agents, humans, or both. AppSec has always celebrated interesting and impactful vulns. And LLMs are now a favored tool for finding flaws. We shouldn't forget the success and effectiveness of fuzzers like OSS-Fuzz, which has improved security for over 1,000 projects and found over 50,000 bugs. But we can't ignore the ease of prompting an agent to go find -- and exploit -- a vuln when the UX and overhead of doing so is hardly more than writing some markdown.

The SDLC Blind Spot: Why Breaches Start with Identity, Not Code

Developers have access to source code, CI/CD pipelines, and cloud infrastructure, and attackers know it. Target lost 860GB of source code through a single compromised credential. Recruitment fraud campaigns have pivoted from a compromised developer to cloud admin in under 10 minutes. As agents join human developers, contractors, and service accounts in the SDLC, the attack surface is expanding faster than static security tools can track. Security teams need real-time visibility beyond code and into who has access and what they're actually doing. This segment is sponsored by Apiiro. To learn more, visit https://securityweekly.com/apiirorsac.

How AI-Driven Development is Reshaping the Application Risk Landscape

Agent coding assistants are accelerating software development, generating more code and more change than security teams were built to handle. In this interview, Idan Plotnik discusses how AI-driven development is reshaping the application risk landscape and why traditional vulnerability management models can't keep up. Make sure to schedule a free SDLC Risk Assessment with BlueFlag Security - 30 minutes to deploy. 48 hours to results. Please visit https://securityweekly.com/blueflagrsac. 
Visit https://www.securityweekly.com/asw for all the latest episodes! Show Notes: https://securityweekly.com/asw-377
We're proud to release this ahead of Ryan's keynote at AIE Europe. Hit the bell, get notified when it is live! Attendees: come prepped for Ryan's AMA with Vibhu after.

Move over, context engineering. Now it's time for Harness engineering and the age of the token billionaires.

Ryan Lopopolo of OpenAI is leading that charge, recently publishing a lengthy essay on Harness Eng that has become the talk of the town.

In it, Ryan peeled back the curtains on how the recently announced OpenAI Frontier team have become OpenAI's top Codex users, running a >1m LOC codebase with 0 human written code and, crucially for the Dark Factory fans, no human REVIEWED code before merge. Ryan is admirably evangelical about this, calling it borderline “negligent” if you aren't using >1B tokens a day (roughly $2-3k/day in token spend based on market rates and caching assumptions).

Over the past five months, they ran an extreme experiment: building and shipping an internal beta product with zero manually written code. 
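The token-spend figure above implies a blended price per token, which is easy to back out. The short calculation below is only arithmetic on the two numbers quoted in the notes (1B tokens a day, $2-3k a day); the function name is made up for illustration, and the result says nothing about any provider's actual price list.

```python
def implied_rate_per_million(daily_spend_usd, tokens_per_day):
    """Blended dollars per one million tokens implied by a daily spend."""
    return daily_spend_usd / (tokens_per_day / 1_000_000)


TOKENS_PER_DAY = 1_000_000_000  # the ">1B tokens a day" threshold quoted above

for spend in (2_000, 3_000):  # the "$2-3k/day" range quoted above
    rate = implied_rate_per_million(spend, TOKENS_PER_DAY)
    print(f"${spend:,}/day over 1B tokens -> ${rate:.2f} per 1M tokens")
```

So the quoted range works out to roughly $2 to $3 per million tokens once caching is folded in, which is the assumption the notes flag.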
Through the experiment, they adopted a different model of engineering work: when the agent failed, instead of prompting it better or to “try harder,” the team would look at “what capability, context, or structure is missing?”

The result was Symphony, “a ghost library” and reference Elixir implementation (by Alex Kotliarskyi) that sets up a massive system of Codex agents all extensively prompted with the specificity of a proper PRD spec, but without full implementation.

The future starts taking shape as one where coding agents stop being copilots and start becoming real teammates anyone can use, and Codex is doubling down on that mission with their Super Bowl messaging of “you can just build things”.

Across Codex, internal observability stacks, and the multi-agent orchestration system his team calls Symphony, Ryan has been pushing what happens when you optimize an entire codebase, workflow, and organization around agent legibility instead of human habit.

We sat down with Ryan to dig into how OpenAI's internal teams actually use Codex, why the real bottleneck in AI-native software development is now human attention rather than tokens, how fast build loops, observability, specs, and skills let agents operate autonomously, why software increasingly needs to be written for the model as much as for the engineer, and how Frontier points toward a future where agents can safely do economically valuable work across the enterprise.

We discuss:
* Ryan's background from Snowflake, Brex, Stripe, and Citadel to OpenAI Frontier Product Exploration, where he works on new product development for deploying agents safely at enterprise scale
* The origin of “harness engineering” and the constraint that kicked off the whole experiment: Ryan deliberately refused to write code himself so the agent had to do the job end to end
* Building an internal product over five months with zero lines of human-written code, more than a million lines in the repo, and thousands of PRs across multiple Codex model generations
* Why early Codex was painfully slow at first, and how the team learned to decompose tasks, build better primitives, and gradually turn the agent into a much faster engineer than any individual human
* The obsession with fast build times: why one minute became the upper bound for the inner loop, and how the team repeatedly retooled the build system to keep agents productive
* Why humans became the bottleneck, and how Ryan's team shifted from reviewing code directly to building systems, observability, and context that let agents review, fix, and merge work autonomously
* Skills, docs, tests, markdown trackers, and quality scores as ways of encoding engineering taste and non-functional requirements directly into context the agent can use
* The shift from predefined scaffolds to reasoning-model-led workflows, where the harness becomes the box and the model chooses how to proceed
* Symphony, OpenAI's internal Elixir-based orchestration layer for spinning up, supervising, reworking, and coordinating large numbers of coding agents across tickets and repos
* Why code is increasingly disposable, why worktrees and merge conflicts matter less when agents can resolve them, and what it really means to fully delegate the PR lifecycle
* “Ghost libraries”, spec-driven software, and the idea that a coding agent can reproduce complex systems from a high-fidelity specification rather than shared source code
* The broader future of Frontier: safely deploying observable, governable agents into enterprises, and building the collaboration, security, and control layers needed for real-world agentic work

Ryan Lopopolo
* X: https://x.com/_lopopolo
* LinkedIn: https://www.linkedin.com/in/ryanlopopolo/
* Website: https://hyperbo.la/contact/

Timestamps
00:00:00 Introduction: Harness Engineering and OpenAI Frontier
00:02:20 Ryan's background and the “no human-written code” experiment
00:08:48 Humans as the bottleneck: systems thinking, observability, and agent workflows
00:12:24 Skills, scaffolds, and encoding engineering taste into context
00:17:17 What humans still do, what agents already own, and why software must be agent-legible
00:24:27 Delegating the PR lifecycle: worktrees, merge conflicts, and non-functional requirements
00:31:57 Spec-driven software, “ghost libraries,” and the path to Symphony
00:35:20 Symphony: orchestrating large numbers of coding agents
00:43:42 Skill distillation, self-improving workflows, and team-wide learning
00:50:04 CLI design, policy layers, and building token-efficient tools for agents
00:59:43 What current models still struggle with: zero-to-one products and gnarly refactors
01:02:05 Frontier's vision for enterprise AI deployment
01:08:15 Culture, humor, and teaching agents how the company works
01:12:29 Harness vs. training, Codex model progress, and “you can just do things”
01:15:09 Bellevue, hiring, and OpenAI's expansion beyond San Francisco

Transcript

Ryan Lopopolo: I do think that there is an interesting space to explore here with Codex, the harness, as part of building AI products, right? There's a ton of momentum around getting the models to be good at coding. We've seen big leaps in, like, the task complexity with each incremental model release, where if you can figure out how to collapse a product that you're trying to build, a user journey that you're trying to solve, into code, it's pretty natural to use the Codex harness to solve that problem for you. It's done all the wiring and lets you just communicate in prompts. To let the model cook, you have to step back, right? Like, you need to take a systems thinking mindset to things and constantly be asking: where is the agent making mistakes? Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I'm putting in place. So I have solved this part of the SDLC.

swyx: [00:01:00] All right.

[00:01:03] Meet Ryan

swyx: We're in the studio with Ryan from OpenAI. 
Welcome.

Ryan Lopopolo: Hi.

swyx: Thanks for visiting San Francisco and thanks for spending some time with us.

Ryan Lopopolo: Yeah, thank you. I'm super excited to be here.

swyx: You wrote a blockbuster article on harness engineering. It's probably going to be the defining piece of this emerging discipline, huh?

Ryan Lopopolo: Thank you. It's been fun to feel like we've defined the discourse in some sense.

swyx: Let's contextualize a little bit. This is the first podcast you've ever done. Yes. And thank you for spending it with us. Where is this coming from? What team are you in, all that jazz?

Ryan Lopopolo: Sure, sure. I work on Frontier Product Exploration, new product development in the space of OpenAI Frontier, which is our enterprise platform for deploying agents safely at scale, with good governance in any business. And the role of my team has been to figure out novel ways to deploy our models into packaged products that we can sell as solutions to enterprises.

swyx: And you have a background, I'll just squeeze it in there. Snowflake, Brex, [00:02:00] Stripe, Citadel.

Ryan Lopopolo: Yes. Yes. Same. Any kind of customer

swyx: entire life. Yes. The exact kind of customer that you want to,

Vibhu: So I'll say, I was actually, I didn't expect the background when I looked at your Twitter, I'm seeing the opposite. Stuff like this. So you've got the mindset of, like, full send AI, coding stuff about slop, like buckling in your laptop on your Waymos. Yes. And then I look at your profile, I'm like, oh, you're just like, you're in the other end too. Oh, perfect. Makes perfect.

Ryan Lopopolo: It's quite fun to be an AI maximalist. If you're gonna live that persona, OpenAI is the place to do it. And it's

swyx: token is what you say.

Ryan Lopopolo: Yeah. Certainly helps that we have no rate limits internally. And I can go, like you said, full send at this stay.

swyx: Yeah. Yeah. 
So the Frontier, and you're a special team within Frontier.

Ryan Lopopolo: We had been given some space to cook, which has been super, super exciting.

[00:02:47] Zero Code Experiment

Ryan Lopopolo: And this is why I started with kind of an out-there constraint to not write any of the code myself. I was figuring, if we're trying to make agents that can be deployed into enterprises, they should be [00:03:00] able to do all the things that I do. And having worked with these coding models, these coding harnesses, over 6, 7, 8 months, I do feel like the models are there enough, the harnesses are there enough, where they're isomorphic to me in capability and the ability to do the job. So starting with this constraint of I can't write the code meant that the only way I could do my job was to get the agent to do my job.

Vibhu: And just a bit of background before that. This is basically the article. So what you guys did is five months of working on an internal tool, zero lines of code, over a million lines of code in the total codebase. You say it was 10x, more like it was 10x faster than you would've if you had done it by hand. So

Ryan Lopopolo: Yeah, that

Vibhu: was the mindset going into this, right?

Ryan Lopopolo: That's right.

[00:03:46] Model Upgrades Lessons

Ryan Lopopolo: Started with some of the very first versions of Codex CLI, with the Codex Mini model, which was obviously much less capable than the ones we have today. Which was also a very good constraint, right? Quite a visceral feeling to ask the [00:04:00] model to build you a product feature, and it just not being able to assemble the pieces together. Which kind of defined one of the mindsets we had going into this, which is: whenever the model just cannot, you always pop open the task, double click into it, and build smaller building blocks that you can then reassemble into the broader objective. And it was quite painful to do this. Honestly, the first month and a half was
10 times slower than I would be. But because we paid that cost, we ended up getting to something much more productive than any one engineer could be, because we built the tools, the assembly station, for the agent to do the whole thing.

[00:04:43] Model Generations, Build Systems & Background Shells

Ryan Lopopolo: But yeah, so onward to GPT-5, 5.1, 5.2, 5.3, 5.4. To go through all these model generations and see their kind of quirks and different working styles also meant we had to adapt the codebase to change things up when the model was revved. [00:05:00] One interesting thing here is, with 5.2, the Codex harness at the time did not have background shells in it, which means we were able to rely on blocking scripts to perform long-horizon work. But with 5.3 and background shells, it became less patient, less willing to block. So we had to retool the entire build system to complete in under a minute. And this is not a thing I would expect to be able to do in a codebase where people have opinions. But because the only goal was to make the agent productive, over the course of a week we went from a bespoke Makefile build to Bazel, to Turbo, to Nx, and just left it there because builds were fast at that point.

swyx: Interesting. Talk more about Turbo to Nx. That's interesting ‘cause that's the other direction that other people have been doing.

Ryan Lopopolo: Ultimately I have not a lot of experience with actual frontend repo architecture.

swyx: You're talking that Jessica built the sky. So I'm like, I know the Nx team. I know Turbo from Jared [00:06:00] Palmer. And I'm like, yeah, that's an interesting comparison.

[00:06:02] One Minute Build Loop

Ryan Lopopolo: The hill we were climbing, right, was make it fast.

swyx: Is there a micro front end involved? How complex? React?

Ryan Lopopolo: Electron-based, single app sort of thing.

swyx: And must be under a minute. That's an interesting limitation.
I'm actually not super familiar with the background shell stuff. It was probably talked about in the 5.3 release.

Ryan Lopopolo: Basically, it means that Codex is able to spawn commands in the background and then continue to work while it waits for them to finish. So it can spawn an expensive build and then continue reviewing the code, for example.

swyx: Yeah.

Ryan Lopopolo: And this helps it be more time efficient for the user invoking the harness.

swyx: And I guess, just to really nail this, why does one minute matter? Like, why not five?

Ryan Lopopolo: No, we want the inner loop to be as fast as possible. One minute was just a nice round number, and we were able to hit it.

swyx: And if it doesn't complete, it kills it or something?

Ryan Lopopolo: No. We just take that as a signal that we need to stop what we're doing, double click, and decompose the build graph a bit to get us back under, so that we [00:07:00] can enable the agent to continue to operate.

swyx: It's like a ratchet. It's like you're forcing build-time discipline, because if you don't, it'll just grow and grow. That's right. And you mentioned that... my current, like, the software I work on currently is at 12 minutes. It sucks.

Ryan Lopopolo: This has been my experience with platform teams in the past, where you have an envelope of acceptable build times, and you let it go up to breach, and then you spend two, three weeks to bring it back down to the lower end of that envelope. But because tokens are so cheap, and we're so insanely parallel with the model, we can just constantly be gardening this thing to make sure that we maintain these invariants, which means
there's way less dispersion in the code and the SDLC, which means we can simplify in a way and rely on a lot more invariants as we write the software.

[00:07:45] Observability, Traces & Local Dev Stack

Vibhu: Lovely.

[00:07:46] Humans Are Bottleneck

Vibhu: You mentioned in your article that humans became the bottleneck, right? You kicked off as a team of three people. You're putting out a million lines of code, like 1,500 PRs, basically. What's the mindset there? So as much as code is disposable, you're doing a lot of review. A lot [00:08:00] of the article talks about how you wanna reframe everything as prompting. Everything the agent can't see, it's kind of garbage, right? You shouldn't have it in there. So what's the high level of how you went about building it, and then how you address, okay, humans are just PR review? Like, how is human in the loop for this?

Ryan Lopopolo: We've moved beyond even the humans reviewing the code as well.

[00:08:19] Human Review, PR Automation & Agent Code Review

Ryan Lopopolo: Most of the human review is post-merge at this point. But post-merge, that's not even review. That's just

swyx: Oh, let's just make ourselves happy by

Ryan Lopopolo: Fundamentally, the model is trivially parallelizable, right? As many GPUs and tokens as I am willing to spend, I can have capacity to work with my codebase. The only fundamentally scarce thing is the synchronous human attention of my team. There are only so many hours in the day. We have to eat lunch. I would like to sleep, although it's quite difficult to stop poking the machine, because it makes me want to feed it. You have to step back, right? Like, you need to take a systems-thinking mindset to things and [00:09:00] constantly be asking: where is the agent making mistakes? Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I'm putting in place.
So I have solved this part of the SDLC. And usually what that has looked like is, we started needing to pay very close attention to the code, because the agent did not have the right building blocks to produce modular software that decomposed appropriately, that was reliable and observable, and actually accrued a working front end in these things, right?

[00:09:35] Observability First Setup

Ryan Lopopolo: So in order to not spend all of our time sitting in front of a terminal, at most doing one or two things at a time, we invested in giving the model that observability, which is that graph in the post here.

swyx: Yeah. Let's walk through this. Traces, and which existed first?

Ryan Lopopolo: We started with just the app, and the whole rest of it, from Vector through to all these logging and metrics APIs, was, I dunno, half an [00:10:00] afternoon of my time. We have intentionally chosen very high-level, fast developer tools. There's a ton of great stuff out there now. We use mise a bunch, which makes it trivial to pull down all these Go-written Victoria stack binaries in our local development. Tiny little bit of Python glue to spin all these up, and off you go. One neat thing here is we have tried to invert things as much as possible, which is: instead of setting up an environment to spawn the coding agent into, we spawn the coding agent. Like, that's the entry point. It's just Codex. And then we give Codex, via skills and scripts, the ability to boot the stack if it chooses to, and then tell it how to set some env variables so the app in local dev points at this stack that it has chosen to spin up. And this, I think, is the fundamental difference between reasoning models and the 4.1s and 4os of the past, where these models could not think, so you had to put them in [00:11:00] boxes with a predefined set of state transitions. Whereas here we have the model, the harness, be the whole box
and give it a bunch of options for how to proceed, with enough context for it to make intelligent choices.

Vibhu: So, like, a lot of that is around scaffolding, right? Yes. Previous agents, you would define a scaffold. It would operate in that loop, try again. That's pivoted off from when we've had reasoning models. They're seeming to perform better when you don't have a scaffold, right? That's right.

[00:11:28] Docs Skills Guardrails

Vibhu: And you go into niches here too, like your SPEC.md and having a very short AGENTS.md.

swyx: Yes. Yes.

Vibhu: Yeah. So you even lay out what it is here, but I like

swyx: the table of contents.

Vibhu: Yeah.

swyx: Stuff like this really helps guide people, because everyone's trying to do this.

Ryan Lopopolo: This structure also makes it super cheap to put new content into the repository to steer both the humans and the agents.

swyx: You reinvented skills, right?

Vibhu: One big AGENTS.md and

swyx: skills from first principles.

Ryan Lopopolo: Skills did not exist when we started doing this.

Vibhu: You have a short, [00:12:00] 100-line overall table of contents, and then you have little skills, right? core-beliefs.md, tech debt tracker. Yeah. Yeah. The scale is over

Ryan Lopopolo: The tech debt tracker and the quality score are pretty interesting, because this is basically a tiny little scaffold, like a markdown table, which is a hook for Codex to review all the business logic that we have defined in the app, assess how it matches all these documented guardrails, and propose follow-up work for itself. Before Beads and all these ticketing systems, we were just tracking follow-up work as notes in a markdown file, which we could spawn an agent on a cron to burn down. There's this really neat thing that, like, the models fundamentally crave text.
So a lot of what we have done here is figure out ways to inject text

swyx: into

Ryan Lopopolo: the system, right? When we get a page because we're missing a timeout, for example, I can just @ Codex in Slack on that page and say, I'm gonna fix this by adding a timeout. Please update our reliability documentation to require that all network calls have [00:13:00] timeouts. So I have not only made a point-in-time fix, but also durably encoded this process knowledge around what good looks like.

swyx: Yeah.

Ryan Lopopolo: And we give that to the root coding agent as it goes and does the thing. But you can also use that to distill tests out of, or a code review agent, which is pointed at the same things, to narrow the acceptable universe of the code that's produced.

swyx: I think one of the concerns I have with that kind of stuff is, you think you're making the right call by making it persisted for all time across everything. Yes. But then you didn't think about the exceptions that you need to make, right? And then you have to roll it back.

Vibhu: Part of it is

swyx: also, sometimes it can follow your instructions too.

Vibhu: It's somewhat a skill, right? So it determines when it uses the tools, right? It's not like it'll run on every call. It'll determine when it wants to check quality score, right?

Ryan Lopopolo: Yeah. And we do, in the prompts we give these agents, allow them to push back.

[00:13:51] Agent Code Review Rules

Ryan Lopopolo: When we first started adding code review agents to the PR, it would be: Codex CLI locally writes the change, pushes up a PR. On [00:14:00] those PR synchronizations, a review agent fires. It posts a comment. We instruct Codex that it has to at least acknowledge and respond to that feedback. And initially the Codex driving the code author was willing to be bullied by the PR reviewer, which meant you could end up in a situation where things were not converging.
So yeah, we had to,

swyx: It's just a thrash.

Ryan Lopopolo: We had to add more optionality to the prompts on both of these things, right? The reviewer agents were instructed to bias toward merging the thing, to not surface anything greater than a P2 in priority. We didn't really define P2, but we gave it, you

swyx: did define P2.

Ryan Lopopolo: We gave it a framework within which to score its output

swyx: and then greater than P0 is worse, right? Yes. P2 is very good.

Ryan Lopopolo: P0 is, you will nuke the codebase if

swyx: you merge this

Ryan Lopopolo: thing, right?

swyx: Yeah.

Ryan Lopopolo: But on the code authoring agent side, we also gave it the flexibility to either defer or push back against review feedback, right? This happens all the time, right? Like, I happen to notice something and leave a code review [00:15:00] which could blow up the scope by a factor of two. I usually don't mean for that to be addressed exactly in the moment. It's more of an FYI: file it to the backlog, pick it up in the next fix-it week sort of thing. And without the context that this is permissible, the coding agents are gonna bias toward what they do, which is following instructions.

swyx: Yeah.

[00:15:19] Autonomous Merging Flow

swyx: I did want to check in on a couple of things, right? Sure. The code review agent, it can merge autonomously. I think that's something that a lot of people aren't comfortable with. And you have a list here of how much agents do. They do product code and tests, CI configuration and release tooling, internal dev tools, documentation, eval harness, review comments, scripts that manage the repository itself, production dashboard definition files, like, everything. Yes. And so they're just all churning at the same time. Is there, like, a rip cord that any human on the team pulls to stop everything?

Ryan Lopopolo: Because we are building a native application here, we're not doing continuous deploy.
So there's still a human in the loop for cutting the release branch. I see. We require a blessed, [00:16:00] human-approved smoke test of the app before we promote it to distribution, these sort of things.

swyx: So you're working on the app. You're not building, like, infrastructure where you have nines of reliability, that kinda stuff?

Ryan Lopopolo: That's correct. That's correct. Okay. And also, full recognition here that all of this activity took place in a completely greenfield repository. There should be no script that this applies generally to

swyx: this is a production thing, you're gonna ship

Ryan Lopopolo: to

swyx: customers. Of course. Yeah, of course. So this is real.

Vibhu: And one of the things there is, you mentioned you started this as a repo from scratch. The onboarding, the first month or so, was, like, working backwards, right? Yeah. And then you had to work with the system, and now you're at that point where, you know, you're very autonomous. I'm curious, okay, so how human in the loop is it? What are the bottlenecks that you wish you could still automate? And part of that is also, where do you see the model trajectory improving and offloading more human in the loop? We just got 5.4. It's a really good,

Ryan Lopopolo: Fantastic model, by the way.

Vibhu: Yeah. Yeah. It's the first one that's merged top-tier coding, so it's Codex-level coding, and reasoning, so general reasoning, both in one model. So

Ryan Lopopolo: and

Vibhu: computer [00:17:00] use, vision.

Ryan Lopopolo: Now with 5.4, I can just have Codex write the blog post, whereas for this one I had to balance between chat.

swyx: Oh, I might be out of a job. Oh my God.

Ryan Lopopolo: Oh,

swyx: I know. You just gave me an idea for a completely AI newsletter that 5.4 could do. Yeah, I get it now.

Ryan Lopopolo: This sort of thing is just one example of closing the loop, right? Like the dashboard thing you mentioned.
We have Codex authoring the JSON for the Grafana dashboards and publishing them, and also responding to the pages, which means when it gets the page, it knows exactly which dashboards are defined and what alert was triggered by which exact log in the codebase, ‘cause all of this stuff is collated together.

swyx: It has to own everything. Yes. Yeah. Yeah.

Ryan Lopopolo: And it means that if we have an outage that did not result in a page, it has the existing set of dashboards available to it. It has the existing set of metrics and logs, and can figure out where the gaps in the dashboards are, or [00:18:00] in the underlying metrics, and fix them in one go. In the same way, you would have a full-stack engineer be able to drive a feature from the backend all the way to the front end.

Vibhu: So it seems like a lot of the work you guys had to do was, you as a small team are fully working for the way that the model wants the software to be written. It's less human legibility, for better code legibility, agent legibility. How do you think that affects broader teams? So one, at OpenAI, do liaison, like, this is how software should be written? Like, I can imagine, say you join a new team with this methodology, this mindset. There are ways that teams do code review, teams write code, teams are structured, and a lot of it is for human legibility. So should we all swap? How does this play back, one, broader into OpenAI, and then broader into software engineering, right? Is it like teams that pick this up will... it's pretty drastic, right? You have to make a pretty big switch. Should they just full send? Yeah.

Ryan Lopopolo: The mindset is very much that I'm removed from the process, right? I can't really have deep code-level opinions about [00:19:00] things. It's as if I'm group tech leading a 500-person organization.

Vibhu: Yeah.

Ryan Lopopolo: Like, it's not appropriate for me to be in the weeds on every PR.
This is why that post-merge code review thing is a good analog here, right? Like, I have some representative sample of the code as it is written, and I have to use that to infer what the teams are struggling with, where they could use help, where they're already moving quickly and I can pivot my focus elsewhere.

Vibhu: Yeah.

Ryan Lopopolo: So I don't really have too many opinions around the code as it is written. I do, however, have a command base class, which is used to have repeatable chunks of business logic that come with tracing and metrics and observability for free. And the thing to focus on is not how that business logic is structured, but that it uses this primitive, ‘cause I know that's gonna give leverage by default.

Vibhu: Yeah.

Ryan Lopopolo: Yeah, back to that sort of systems thinking.

Vibhu: And you have part of that in your blog post: enforcing architecture and taste, how you set boundaries for what's used. There's also a section on redefining [00:20:00] engineering and stuff, but yeah, it's just interesting to hear.

Ryan Lopopolo: And as the models have gotten better, they have gotten better at proposing these abstractions to unblock themselves, which again lets me move higher and higher up the stack, to look deeper into the future on what ultimately blocks the team from shipping.

swyx: Yeah. You mentioned, so this is primarily, like, a 1-million-line codebase Electron app. But it manages its own services as well, so it's like a backend-for-frontend type thing.

Ryan Lopopolo: We do have a backend in there, but that's hosted in the cloud. Yeah. This sort of structure is actually within the separate main and renderer processes within the

swyx: Electron. That's just how Electron works.

Ryan Lopopolo: Yeah, of course. So we have also treated, like, MVC-style decomposition with the same level of rigor, which has been very fun.

swyx: I have a fun pun. This is a tangent. MVC is model-view-controller.
Any sort of full-stack web dev knows that. But my AI-native version of this is Model View Claw. The claw is the harness.

Ryan Lopopolo: That's right. That's right. I do think that there is an interesting space to [00:21:00] explore here with Codex, the harness, as part of building AI products, right? There's a ton of momentum around getting the models to be good at coding. We've seen big leaps in, like, the task complexity with each incremental model release, where if you can figure out how to collapse a product that you're trying to build, a user journey that you're trying to solve, into code, it's pretty natural to use the Codex harness to solve that problem for you. It's done all the wiring and lets you just communicate in prompts to let the model cook. Yeah. It's been very fun. And there's also a very engineering-legible way of increasing capability. It's fantastic, right? Just give the model scripts, the same scripts you would already build for yourself.

swyx: Yeah. Yeah. So for listeners, this is Ryan saying that software engineering or coding agents will eat knowledge work, like the non-coding parts, where you would normally think, oh, you have to build a separate agent for it. No, start with a coding agent and go out from there. Which OpenClaw has, like, it's Pi under the hood.

Ryan Lopopolo: [00:22:00] Yes.

Vibhu: Basically, define your task in code. Everything is a coding

swyx: agent, by the way. Since I brought it up, it's probably the only place we bring it up. Is there any OpenClaw usage from you? Any?

Ryan Lopopolo: No. No. Not for me. I don't have any spare Mac Minis rattling around my house.

swyx: You can afford it. No, I'm just curious if it's changed anything in OpenAI yet, but it's probably early days.
And then the other thing I wanna pull on here is, you mentioned ticketing systems and you mentioned PRs, and I'm wondering if both those things have to go away or be reinvented for this kind of coding. So Git itself is, like, very hostile to multi-agent.

Ryan Lopopolo: Yeah. We make very heavy use of worktrees.

swyx: But even then, I just dropped a podcast yesterday with Cursor, and they said they're getting rid of worktrees ‘cause it still has too many merge conflicts. It's still too unintuitive. But go ahead.

Ryan Lopopolo: The models are really great at resolving merge conflicts. Yeah. And to get to a state where I'm not synchronously in the loop in my terminal, I almost don't care that there are merge

swyx: with disposable. [00:23:00] Yeah.

Ryan Lopopolo: We invoke a $land skill, and that coaches Codex to push the PR, wait for human and agent reviewers, wait for CI to be green, fix the flakes if there are any, merge upstream if the PR comes into conflict, wait for everything to pass, put it in the merge queue, and deal with flakes until it's in main. And this is what it means to delegate fully, right? This is, in a very large monorepo, probably a significant tax on humans to get PRs merged, but the agent is more than capable of doing this, and I really don't have to think about it other than keep my laptop open.

swyx: Yeah. I used to be much more of a control freak, but now I'm like, yeah, actually you could do a better job of this than me. Yeah. With the right context. Yes.

[00:23:47] Encoding Requirements

swyx: Anything else in harness in general?
Just this piece, I just wanna make sure we,

Ryan Lopopolo: I think one thing that I maybe didn't make super clear in the article, that I heard on Twitter as an interesting, that's respond [00:24:00]

swyx: to them. What's the chatter, and then what's your response?

Ryan Lopopolo: Ultimately, all the things that we have encoded in docs and tests and review agents and all these things are ways to put all the non-functional requirements of building high-scale, high-quality, reliable software into a space that prompt-injects the agent. We either write it down as docs, or we add lints where the error messages tell how to do the right thing. So the whole meta of the thing is to basically tease out of the heads of all the engineers on my team what they think good looks like, what they would do by default, or what they would coach a new hire on the team to do to get things to merge. And that's why we pay attention to all the mistakes that the agent makes, right? This is code being written that is misaligned with some as-yet-not-written-down non-functional requirement.

swyx: Sorry, what did the online people misunderstand, or

Ryan Lopopolo: No,

swyx: what you

Ryan Lopopolo: responded to? Somebody just literally said that. I was like, oh yeah,

swyx: okay,

Ryan Lopopolo: This is the [00:25:00] thing. This is what I've been doing.

swyx: Oh, you agree? Yeah. I see. Interesting.

Ryan Lopopolo: One other neat thing, which I totally did not expect, is folks were just taking the link to the article and giving it to Pi or Codex and saying, make my repo this.

Vibhu: You achieved a whole recursion.

Ryan Lopopolo: And it was wildly effective. Really? It was wildly effective.

Vibhu: No way. It actually is something I tried with 5.4 yesterday. I didn't have time. Last time I was, like, out speaking at something, and this is one of my things. I was like, okay, I have this article.
Can we just scaffold out what it would be like to run this? And I did it first as that, and then I was like, okay, let me take another little side repo and say, okay, if I was to fully automate this like this, because I haven't written a line of code, it's

Ryan Lopopolo: like, full send

Vibhu: it, right. The side thing I'm doing of voice TTS, I'm just, like, slopping out, whatever. It's nothing production. I'm like, how would I make this like this? And it's actually a really good way to learn what could be changed. It's just a good analysis, right? You give it all the code, you give it all the context, you give it the article, and it walks you through it very well. That's right. That's right.

[00:25:57] Inlining Dependencies

[00:25:57] Dependencies Going Away & Bret Taylor's Response

swyx: I guess one more thing before we go to Symphony is, I wanted to cover [00:26:00] Bret Taylor's response. We had him on the show. He is your chairman, which is wild. Yeah. That he's reading your articles as well and, like, getting engaged in it. He says software dependencies are going away. Basically, they can just be, like, vendored. Yes. Response.

Ryan Lopopolo: A hundred percent. A hundred percent agree.

swyx: You still pro qr, you still pay Datadog. You still pay Temporal. Thank you.

Ryan Lopopolo: Yep. The level of complexity of the dependencies that we can internalize is, I would say, low-medium right now, just based on model capability.

swyx: What is medium?

Ryan Lopopolo: I would say, like, a couple-thousand-line dependency is a thing that we could in-house, no problem. Call it an afternoon of time. One neat thing about it is, probably most of that code you don't even need. By in-housing an abstraction, you can strip away all the generic parts of it and only focus on what you need to enable the specific thing
you're building.

swyx: I've been calling this the end of b******t plugins.

Ryan Lopopolo: Yeah.

swyx: Because there's so much... when I publish an open source thing, I want to accept everything, be liberal in what I accept. This is Postel's law. But that means there's so much bloat. Yes. There's so much overhead.

Ryan Lopopolo: One other neat thing about [00:27:00] this too is, when we deploy Codex Security on the repo, it is able to deeply review and change the internalized dependencies in a much lower-friction way than it would be to, like, push patches upstream, wait for them to be released, pull them down, make sure that's compatible with all the transitives I have in my repo, and things like that. So it's also much lower friction to internalize some of these things if code is free, ‘cause the tokens are cheap sort of thing.

swyx: Yeah. Yeah. I think the only argument I have against this is basically scale testing, which obviously the larger pieces of software, like Linux, MySQL, he calls out even the Datadogs and Temporals. And then maybe security testing, where, yes, classically, I think, is it Linus Torvalds who said, in security, open source is the best disinfectant.

Ryan Lopopolo: Many eyes.

swyx: Many eyes. And if you inline your dependencies and code them up, you're gonna have to relearn mistakes from other people that... Yep.

Ryan Lopopolo: Yep. And to internalize that dependency, you're back to zero, and you have to start reassembling all those bits and pieces to, yeah, have [00:28:00] high confidence in the code as it is written. Yeah.

Vibhu: Even part of the first intro of this, you basically mentioned everything was written by Codex, including internal tooling, right? So internal tooling, like when you're visualizing what's going on, it's writing it for itself.

swyx: Yeah. I've built internal tools this way now, and I just show them off and they're like, how long did you spend? And I didn't spend any time.
I just prompted it.

Ryan Lopopolo: Very funny story here.

swyx: Yeah, go ahead.

Ryan Lopopolo: We had deployed our app to the first dozen users internally, had some performance issues, so we asked them to export a trace for us, get a tarball. Gave it to our on-call engineer, and he did a fantastic job of working with Codex to build this beautiful local dev tool, a Next.js app. You drag and drop the tarball in, and it visualizes the entire trace. It's fantastic. Took an afternoon. But none of this was necessary, because you could just spin up Codex and give it the tarball and ask the same thing and get the response immediately. So in a way, optimizing for human [00:29:00] legibility of that debugging process was wrong. It kept him in the loop unnecessarily, when instead he could have just let Codex cook for five minutes and gotten the same.

swyx: Yeah, you verify your instincts here of, this is how we used to do it. Or this is how I would have used to solve it.

Ryan Lopopolo: Yeah. In this local observability stack, like, sure, you can deploy Jaeger to visualize the traces, but I wouldn't expect to be looking at the traces in the first place, because I'm not gonna write the code to fix them.

swyx: Yeah. So basically there needs to be this kind of house stack and owning the whole loop. I think that is very well established. And it sounds like you might be sharing more about that in the future, right?

Ryan Lopopolo: Yeah. I think we're excited to do

[00:29:36] Ghost Libraries Specs

[00:29:36] Ghost Libraries & Distributing Software as Specs

Ryan Lopopolo: We're gonna talk about Symphony in a little bit, but the way we distribute it is as a spec, which I think folks are calling Ghost Libraries on Twitter. This is such a cool name. It does mean it becomes much cheaper to share software with the world, right? You define a spec for how you could build your own, specifying as much as is required for a coding agent to reassemble it [00:30:00] locally.
The flow here is very cool. We have taken all the scaffolding that has existed in our proprietary repo and spun up a new one. We ask Codex, with our repo as a reference, to write the spec. We tell it: spin up a tmux, spawn a disconnected Codex to implement the spec, wait for it to be done. Spawn another Codex in another tmux to review the implementation compared to upstream and update the spec so it diverges less. And then you just loop over and over, Ralph-style, until you get a spec that is able to reproduce the system as it is with high fidelity. It's fantastic.

Vibhu: And you're not really adding any of your human bias in there, right? That's correct. A lot of times people write a spec and are like, okay, I think it should be done this way, and you'll riff on something. And it's, no, the agent could have just handled it. You're still scaffolding in a sense, right? "I want it done this way." It can determine its spec better.

swyx: That's right. I've been working a lot on evals recently, and part of me is wondering if [00:31:00] an agent can produce a spec that it cannot solve. Is it always capable of the things that it can imagine, or can it imagine things that are impossible for it to do?

Ryan Lopopolo: I think with Symphony, there's this axis where you have things that are easy or hard, and established or new, right? And I think things that are hard and new are still something where the models need humans to drive.

swyx: Yeah.

Ryan Lopopolo: But I think those other quadrants are largely solved, given the right scaffold and the right thing that's gonna drive the agent to completion.

swyx: It's crazy that it's solved.

Ryan Lopopolo: But it means that the humans, the ones with limited time and attention, get to work on the hardest stuff, like the problems where it's pure white space out in front.
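The spec loop described in this segment (write a spec from the reference repo, have a disconnected agent implement it, review the result against upstream, update the spec, repeat) can be sketched roughly as follows. This is a toy model and not Symphony's actual tooling: a spec and an implementation are both modeled as plain sets of behavior names, and `refine_spec` and `implement` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Set, Tuple

# Assumption for this sketch: a "spec" and an "implementation" are both
# just sets of behavior names. Real specs are prose and code, not sets.
Spec = Set[str]
Implementation = Set[str]


def refine_spec(
    reference: Implementation,
    draft_spec: Spec,
    implement: Callable[[Spec], Implementation],
    max_rounds: int = 10,
) -> Tuple[Spec, int]:
    """Loop until an implementation built only from the spec matches the reference."""
    spec = set(draft_spec)
    for round_number in range(1, max_rounds + 1):
        built = implement(spec)        # fresh agent that only sees the spec
        missing = reference - built    # behaviors the spec failed to convey
        extra = built - reference      # behaviors the spec wrongly implied
        if not missing and not extra:
            return spec, round_number
        # Update the spec so the next attempt diverges less from upstream.
        spec = (spec | missing) - extra
    raise RuntimeError("spec did not converge within max_rounds")


if __name__ == "__main__":
    upstream = {"poll tickets", "spawn agent", "open PR"}
    draft = {"poll tickets", "send email"}
    # Stand-in implementer: builds exactly what the spec lists.
    final_spec, rounds = refine_spec(upstream, draft, lambda s: set(s))
    print(sorted(final_spec), rounds)
```

In practice the implementer and reviewer would be separate agent runs rather than set operations; the point of the sketch is only the shape of the loop, where divergence from the reference is fed back into the spec rather than into the implementation.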
Or like the deepest refactorings where you don't know what the proper shape of the interfaces is. And this is where I wanna spend my time, 'cause it lets me set up for the next level of scale.

swyx: Yeah. Amazing. Let's introduce Symphony. I think we've been mentioning it every now and then. Elixir. Interesting option.

Ryan Lopopolo: Yeah.

swyx: Yeah. I'm not,

Ryan Lopopolo: Again, the [00:32:00] Elixir manifestation here is just a derivative.

swyx: Is it model-chosen?

Ryan Lopopolo: Yeah. And it chose that because the process supervision and the GenServers are super amenable to the type of process orchestration that we're doing here. You are essentially spinning up little daemons for every task that is in execution and driving it to completion, which means the model gets a ton of stuff for free by using Elixir and the BEAM.

swyx: I had to go do a crash course in BEAM and Elixir, and I think most people are not operating at that scale of concurrency where you need that. But it is a good mental model for resumability and all those things. And these are things I care about. But tell me the origin story of Symphony. What do you use it for? How did it form? Maybe any abandoned paths that you didn't take?

[00:32:46] Symphony: Removing Humans from the Loop

Ryan Lopopolo: At the end of December we were at about three and a half PRs per engineer per day. This was before 5.2 came out in the beginning of January. Everyone gets back from holiday with 5.2 and, with no other work [00:33:00] on the repository, we were up in the five to 10 PRs per day per engineer. And I don't know about y'all, but it's very taxing to constantly be switching like that. I was pretty tapped out at the end of the day. Again, where are the humans spending their time? They're spending their time context switching between all these active tmux panes to drive the agent forward.

swyx: Yeah. No way.
Yeah.

Ryan Lopopolo: So let's, again, build something to remove ourselves from the loop. And this is what we frantically sprinted at here: to find a way to remove the need for the human to sit in front of their terminal. So, a lot of experimentation with devboxes and automatically spinning up agents. It seems like a fantastic end state here, where my life is a beach: I open Linear twice a day and say yes or no to these things. And this is again a super, super interesting framing for how the work is done, because I become more latency-insensitive. I have [00:34:00] way less attachment to the code as it is written. I've had close to zero investment in the actual authorship experience, so if it's garbage, I can just throw it away and not care too much about it. In Symphony, there's this rework state where, once the PR is proposed and it's escalated to the human for review, it should be a cheap review. It is either mergeable or it is not. And if it's not, you move it to rework. The Elixir service will completely trash the entire worktree and PR and start it again from scratch. And this is that opportunity again to say, why was it trash, right? What did the agent do that was

swyx: bad. Yeah.

Ryan Lopopolo: Fix that before moving the ticket to in progress again.

swyx: Yeah. Why is this not in the Codex app? I guess you guys are ahead of the Codex app.

Ryan Lopopolo: Yeah, so the way the team has been working is basically to be as AI-pilled as possible and sprint ahead. And a lot of the things we have worked on have fallen out [00:35:00] into a lot of the products that we have. We were in deep consultation with the Codex team to have the Codex app be a thing that exists, right? To have skills be a thing that Codex is able to use, so we didn't have to roll our own. To put automations into the product.
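The rework state described in this segment can be pictured as a small ticket state machine. The sketch below is an assumption-laden illustration, not Symphony's Elixir implementation: the state names, the `Ticket` fields, and the `propose`, `review`, and `restart` functions are all hypothetical. What it tries to capture is that a rejected attempt is discarded wholesale (worktree and PR) and that a lesson is recorded before the ticket goes back to in progress.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    IN_PROGRESS = "in progress"
    HUMAN_REVIEW = "human review"
    REWORK = "rework"
    MERGED = "merged"


@dataclass
class Ticket:
    title: str
    state: State = State.IN_PROGRESS
    worktree: Optional[str] = None
    pull_request: Optional[str] = None
    attempts: int = 0
    lessons: List[str] = field(default_factory=list)


def propose(ticket: Ticket) -> None:
    """Agent finishes an attempt and escalates its PR to a human."""
    ticket.attempts += 1
    ticket.worktree = f"worktree-{ticket.attempts}"
    ticket.pull_request = f"pr-{ticket.attempts}"
    ticket.state = State.HUMAN_REVIEW


def review(ticket: Ticket, mergeable: bool) -> None:
    """Cheap human decision: the PR is either mergeable or it is not."""
    if ticket.state is not State.HUMAN_REVIEW:
        raise ValueError("ticket is not awaiting review")
    if mergeable:
        ticket.state = State.MERGED
        return
    # Rework throws away the whole attempt instead of patching it.
    ticket.worktree = None
    ticket.pull_request = None
    ticket.state = State.REWORK


def restart(ticket: Ticket, lesson: str) -> None:
    """Record what the agent was missing, then move back to in progress."""
    if ticket.state is not State.REWORK:
        raise ValueError("ticket is not in rework")
    ticket.lessons.append(lesson)
    ticket.state = State.IN_PROGRESS


if __name__ == "__main__":
    t = Ticket("Add export button")
    propose(t)
    review(t, mergeable=False)
    restart(t, "Document the design-system button component")
    propose(t)
    review(t, mergeable=True)
    print(t.state.value, t.attempts, t.lessons)
```

Requiring a lesson in `restart` mirrors the point made in the conversation: the rejected attempt is cheap to throw away, and the useful work is fixing whatever context the agent lacked before it tries again.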
So all of our automatic refactoring agents didn't have to be these hand-rolled control loops. It has been really fantastic to be, in a way, unanchored to the product development of Frontier and Codex and just very quickly try to figure out what works, and then later find the scalable thing that can be deployed widely. It's been a very fun way to operate. It's certainly chaotic. I have lost track very often of what the actual state of the code looks like, 'cause I'm not in the loop. There was one point where we had wired Playwright directly up to the Electron app with MCP. MCPs I'm pretty bearish on, because the harness forcibly injects all those tokens into the [00:36:00] context, and I don't really get a say over it. They mess with auto-compaction. The agent can forget how to use the tool. There's probably only, what, three calls in Playwright that I actually ever want to use, so I pay the cost for a ton of things. Somebody vibed a local daemon that boots Playwright and exposes a tiny little shim CLI to drive it. And I had zero idea that this had occurred, because to me, I run Codex and it's, oh, it's better. No knowledge of this at all.

[00:36:30] Multi Human Chaos

Ryan Lopopolo: So we have had, in human space, to spend a lot of time doing synchronous knowledge sharing. We have a daily standup that's 45 minutes long because we almost have to fan out the understanding of the current state.

swyx: Yeah, I was gonna say, this is good for single-human multi-agent, but multi-human multi-agent is a whole, like, explosion of stuff.

Ryan Lopopolo: Yeah. And this is fundamentally why we have such a rigid, like, 10,000 [00:37:00] engineer level architecture in the app, because we have to find ways to carve up the space so people are not trampling on each other.

swyx: Sorry, I don't get the 10,000 thing.
Did I miss that?

Ryan Lopopolo: The structure of the repository is like 500 npm packages. It's architecture to excess for what you would consider, I think, normal for a seven-person team. But if every person is actually like 10 to 50, then the numbers on being super, super deep into decomposition and sharding and proper interface boundaries make a lot more sense.

swyx: Yeah. To me, that's why I talked about micro-frontends, and Nx is from that world, but cool. Just coming back to this, I dunno if you have other thoughts on orchestrating so much work going through this. Is this enough? Any aha moments?

Vibhu: It'll be interesting to see where, okay, so right now you pick Linear as your issue tracker, right?

swyx: Or is it actually Linear? This is actually Linear.

[00:37:55] Linear vs Slack Workflow

Vibhu: Oh, that's Linear. It's Linear.

swyx: Oh, I never looked at

Vibhu: the video. The demo video I had to download to [00:38:00] run.

swyx: Because I'm a Slack maxi, but yeah, Linear is also really good.

Ryan Lopopolo: We do make good use of Slack. We fire off Codex to do all these [unclear] fix-ups, the things that sync that knowledge into the repository. It's super cheap.

swyx: Yeah.

Ryan Lopopolo: Just do it in Codex.

swyx: My biggest plug is OpenAI needs to build Slack. You need to own Slack. Build yours. Turn this into Slack.

Ryan Lopopolo: I did read about it.

swyx: You did?

Ryan Lopopolo: Yeah.

[00:38:25] Collaboration Tools for Agents

Ryan Lopopolo: I would say that if we think that we want these agents to do economically valuable work, which is the mission, right? We want AI to be deployed widely to do economically valuable work, then we need to find ways for them to naturally collaborate with humans, which means collaboration tooling, I think, is an interesting space to explore.

swyx: Yeah, totally. Yeah.
GitHub, Slack, Linear.

Vibhu: Yeah, that was my thing. Okay, where do we see this going? Right now Codex has started as the Codex model, then the CLI, now there's an app. The app can let me shoot off multiple Codexes in parallel, but there's no great team collaboration for Codex. And it [00:39:00] seems like your team had some say in what comes out, right? So you talked to 'em, and Codex kind of became a thing. From there, if you guys are on the bound, what's stuff that you might not focus on, but that you expect other people to be building, right? So people that are, like, 5x-ing, 50x-ing: should you build stuff that's very niche for your workflow, for your team? Should it be more general so other people can adopt it? Is there a niche there? 'Cause part of it is just, okay, is everything just internal tooling? Do we have everything our own way, like the way our team operates has our own ways that we like to communicate, or is there a broader way to do it? Is it something like an issue tracker? Just thoughts, if you wanna riff on that.

[00:39:35] Standardizing Skills and Code

Ryan Lopopolo: I think TBD, we have not figured this out in a general way. I do think that there is leverage to be had in making the code and the processes as much the same as possible. If you think that code is context, code is prompts, it's better from the agent-behavior perspective to be able to look in a package in directory X, Y, Z and not have to page so [00:40:00] deeply into directory A, B, C, because they have the same structure, use the same language, and have the same patterns internally. And that same leverage comes from aligning on a single set of skills that you're pouring every engineer's taste into, to make sure that the agent is effective. So in our codebase we have, I think, six skills. That's it. And if some part of the software development loop is not being covered, our first attempt is to encode it in one of the existing set of skills, which means that we can change the agent behavior. Yeah.
More cheaply than changing the human driver behavior.

swyx: Yeah.

[00:40:39] Self Improvement via Logs

swyx: Have you experimented with agents changing their own behavior?

Ryan Lopopolo: We do.

swyx: Yeah. Or a parent agent changing a subagent's behavior, or something like that.

Ryan Lopopolo: We have some bits for skill distillation. So for example, there's one neat thing you can do with Codex, which is just point it at its own session logs and ask it to tell you how you can use [00:41:00] the tool better.

swyx: It's like introspection.

Ryan Lopopolo: Or ask it to do things. How do I use

Vibhu: this session better? What skills should I

swyx: I like the modification of "you can just do things" to "you can just ask agents to do things."

Ryan Lopopolo: Yeah. You can just Codex things. This is like a silly emoji that we have, right? You can just Codex things, you can just prompt things. It's a really glorious future we live in. But okay, you can do that one-on-one. But we're actually slurping these up for the entire team into blob storage and running agent loops over them every day to figure out where, as a team, we can do better, and how we reflect that back into the repositories. So everybody benefits from everybody else's behavior for free. Same for PR comments, right? These are all feedback. That means the code as written deviated from what was good. A PR comment, a failed build: these are all signals that mean at some point the agent was missing context. We gotta figure out how to

swyx: Yeah.

Ryan Lopopolo: slurp it up and put it back in the repo.

swyx: By the way, I do exactly this. When I use Claude Code for [00:42:00] knowledge work, Claude Cowork is a nice product, right? I think you would agree. I always have it tell me, what do I do better next time?
And that's the meta-programming reflection thing. So I almost think, you have six abstraction levels in Symphony, and there's almost like a zeroth layer. So the six levels are policy, configuration, coordination, execution, integration, observability. We've talked about a couple of these, but the zeroth layer is like, okay, are we working well? Can we improve how we work? Can I modify my own workflow.md or something? I don't know.

Ryan Lopopolo: Yeah, of course you can. This thing is also able to cut its own tickets, 'cause we give it full access. You can make a ticket to have it cut tickets. You can put in the ticket that you expect it to file its own follow-up work.

swyx: Yeah. Self-modifying.

Ryan Lopopolo: Yeah.

[00:42:44] Tool Access and CLI First

Ryan Lopopolo: Don't put the agent in a box. Give the agent full access over its domain.

swyx: I had a mental reaction when you said don't put the agent in a box. I think you should put it in a box. It's just that you're giving the box everything it needs.

Ryan Lopopolo: Yeah. Context and tools.

swyx: As developers, we're used to calling [00:43:00] out to different systems, but here you use the open source things, like Prometheus, whatever, and you run it locally so that you can have the full loop, I assume.

Ryan Lopopolo: Yep.

Vibhu: I think like

Ryan Lopopolo: You wanna minimize cloud dependencies.

Vibhu: You also want to make sure that you think about what the agent has access to. What does it see? Does it go back into the loop? In the most basic sense, if you let it see its own calls and traces, it can determine where it went wrong. But are you feeding that back in? So, just at the most basic level, you wanna see exactly what's input and output. Does the agent have access to what is being outputted, right? It can self-improve a lot of these things. It's all

Ryan Lopopolo: text, right?
My job is to figure out ways to funnel text from one agent to the other.

swyx: It's so strange. Way back at the start of this whole AI wave, Andrej was like, English is the hottest new programming language. It's here. It's just, yeah, the future as well.

Vibhu: Okay, a lot of software has a GUI; it's made for the human. We're seeing the evolution of CLIs for everything, right? All tools have CLIs. Your agents can use [00:44:00] them well. Do we get good vision? Do we get good little sandboxes? Right now, it's a really effective way, right? Models love to use tools. They love bash. They love to read through text. So slap a CLI on it, let it go loose. That works for everything.

Ryan Lopopolo: It does. Yeah.

[00:44:14] UI Perception and Rasterizing

Ryan Lopopolo: We've also been adapting non-textual things to that shape in order to improve model behavior in some ways, right? We want the agent to be able to see the UI. Agents do not perceive visually in the same way that we do. They don't see a red box; they see "red box button," right? They see these things in latent space. So if we want, hey, yeah, I do.

swyx: We have a ding that goes off every time. Latent Space ding.

Ryan Lopopolo: Anyway, if we wanna actually make it see the layout, it's almost easier to rasterize that image to ASCII and feed it in to the agent. And there's no reason you can't do both, right? To further refine how the model perceives the object it's [00:45:00] manipulating.

swyx: Cool. You wanna talk about a couple more of these layers that might bear more introspection or that you have personal passion for?

[00:45:07] Coordination Layer with Elixir

Ryan Lopopolo: I will say that the coordination layer here was a really tricky piece to get right.

swyx: Let's do it. Yep. I'm all about that. And this is Temporal core.

Ryan Lopopolo: This is where, when we turn the spec into Elixir, the model takes a shortcut, right?
It's like, oh, I have all these primitives that I can make use of in this lovely runtime that has native process supervision. Which is, I think, a neat way to have taken the spec and made it more achievable by making choices that naturally map

swyx: Yeah.

Ryan Lopopolo: to the domain, right? In the same way that you would prefer to have a TypeScript monorepo if you are doing full-stack web development, right? Because the ability to share types across the frontend and backend reduces a lot of complexity. And because

swyx: That's what GraphQL used to be.

Ryan Lopopolo: That's right.

swyx: And I don't know if it's still alive, but

Ryan Lopopolo: [00:46:00] no humans are in the loop here, my own personal ability to write or not write Elixir doesn't really have to bias us away from using the right tool for the job. It is just wild.

swyx: Love it. Yeah. I wonder if any languages struggle more than others because of this? I feel like everyone has their own abstractions that would make sense. But maybe it might be slower, it might be more faulty, where you'd have to just kick the server every now and then. I don't know. I think the observability layer is really well understood. Integration layer: MCP is dead. I think all of these are just a really interesting hierarchy to travel up and down. It's a common language for people working on the system to understand.

Ryan Lopopolo: The policy stuff is really cool, right? You don't really have to build a bunch of code to make sure the system waits for CI to pass.

swyx: It's institutional knowledge.

Ryan Lopopolo: Yeah. You just give it the gh CLI with some text that says CI has to pass. It makes the maintenance of these systems a lot easier.

[00:46:57] Agent Friendly CLI Output

swyx: Do you think that CLI maintainers need to [00:47:00] do anything special for agents, or just as is?
It's good, because I don't think when people made the GitHub CLI they anticipated this happening.

Ryan Lopopolo: That's correct. The gh CLI is fantastic. It's great.

swyx: Everyone go try gh repo create, gh pull and then the pull request number, right? gh pr, like, 153, whatever. And then it like pulls

Ryan Lopopolo: Basically my only interaction with the GitHub web UI at this point is gh pr view --web.

swyx: Exactly. Glance at the diff

Ryan Lopopolo: and be like, sure thing, send it. Yeah. But the CLIs are nice 'cause they're super token-efficient, and they can be made more token-efficient really easily. I'm sure you all have seen, like, I go to Buildkite or Jenkins and I just get this massive wall of build output. And in order to unblock the humans, your developer productivity team is almost certainly gonna write some code that parses the actual exception out of the build logs and sticks it in a sticky note at the top of the page. And you basically [00:48:00] want CLIs to be structured in a similar way, right? You're gonna want to pass --silent to Prettier, because the agent doesn't care that every file was already formatted. It just wants to know it's either formatted or not, so it can then go run a write command. Similarly, in our pnpm distributed script runner, when we had one, when you do --recursive, it produces an absolute mountain of text. But all of that is for passing test suites. So we ended up wrapping all of this in another script

swyx: to suppress the

Ryan Lopopolo: which you can vibe, to only output the failing parts of the tests.

swyx: You pipe errors versus the standard out. I don't know. Okay. Whatever. Too much thinking to have to do that. I used to maintain a CLI for my company and, yeah, this is very core to my heart. But you're vibing my job.

Ryan Lopopolo: That's right.

swyx: Cool. Any other things? This is a long spec. [00:49:00] I appreciate that.
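The wrapper script mentioned in this segment, one that suppresses passing suites and only surfaces failures, might look something like the sketch below. The log format is an assumption made for illustration (one `PASS <file>` or `FAIL <file>` line per suite, with indented detail lines under a failure); it is not the output format of pnpm or of any particular test runner, and `only_failures` is a hypothetical helper name.

```python
from typing import Iterable, List


def only_failures(lines: Iterable[str]) -> List[str]:
    """Keep failing suites and their indented details; summarize the rest."""
    kept: List[str] = []
    passed = 0
    failed = 0
    in_failure = False
    for line in lines:
        if line.startswith("PASS "):
            passed += 1
            in_failure = False
        elif line.startswith("FAIL "):
            failed += 1
            in_failure = True
            kept.append(line)
        elif in_failure and line[:1] in (" ", "\t"):
            # Indented lines directly under a FAIL are treated as its details.
            kept.append(line)
        else:
            in_failure = False
    # One summary line replaces the wall of passing output.
    kept.append(f"{passed} passed, {failed} failed")
    return kept


if __name__ == "__main__":
    sample = [
        "PASS packages/a/sum.test.ts",
        "PASS packages/b/date.test.ts",
        "FAIL packages/c/parse.test.ts",
        "  expected 3, received 4",
        "PASS packages/d/api.test.ts",
    ]
    print("\n".join(only_failures(sample)))
```

The single summary line keeps the token cost roughly constant no matter how many suites pass, which is the property the conversation is after.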
It's got a lot of strong opinions in here. Any other things that we should highlight? I think obviously you can spend the whole day going through some of these, but I do think that some of these have a lot of care, or some of this you might wanna tell people, hey, take this, but make it your own.

[00:49:15] Blueprint Spec and Guardrails

Ryan Lopopolo: Fundamentally, software is made more flexible when it's able to adapt to the environment in which it is deployed, which means that things like Linear or even GitHub are specified within the spec but are not required pieces of it. There's a more platonic ideal of the thing, where you could swap in Jira or Bitbucket, for example. But being able to tightly specify things like the ID formats or how the Ralph loop works for the individual agents basically means you can get up and running with a fully specified system quickly that you then evolve later on. We never intended for this to be a static spec that you can [00:50:00] never change. It's more like a blueprint to get something worth a starting point up and running.

swyx: Yeah.

Ryan Lopopolo: For you then to vibe later to your heart's content.

swyx: You have code and scripts in here where it's, oh, I think this is a really good prompt. It's just a very long prompt.

Ryan Lopopolo: Fundamentally, the agents are good at following instructions, so give them instructions, and it will improve the reliability of the result. Much like the way we use Symphony, we don't want folks to have to monitor the agent as it is vibing the system into existence. So being very opinionated and very strict around what the success criteria are means that our deployment success rate goes up. It means we don't have to get tickets on this thing.

Vibhu: I think it all goes back to that "code is disposable" idea, right? Early on, when you had the CLI and you'd kick off a Codex run, it would take two hours.
You would wanna monitor it: okay, I'm in the workflow of just using one, I don't want it to go down the wrong path, I'll cut it off. And now, just shoot off four. That was my favorite thing about the Codex app, right? Just 4x it. [00:51:00] It's okay. One of them will probably be right, one of them might be better. Stop overthinking it. My first example was probably deep research. When you put out deep research, I asked it something about LLM; it thought it was legal something and spent an hour, and came back with a report completely off the rails. And I was like, okay, I gotta monitor this thing a bit. No, don't monitor it. You want to build it so that it goes the right way. And you don't wanna sit there and babysit, right? You don't want to babysit your agents.

Ryan Lopopolo: With that deep research query that you made, looking at the bad result, you probably figured out you needed to tweak your prompt a bit, right? That's the guardrail that you fed back into the codebase for the task, your prompt, to further align the agent's execution. The same sort of concept applies there too.

swyx: How are the customers feeling?

Ryan Lopopolo: For Symphony? I think we have none, right? This is a thing we have put out into the

swyx: world. Symphony's internal, right? As long as you are happy, you are the customer. That'
Security problems aren't changing very much even though security teams are. We catch up on the implications of the Claude Code source leak, the very human lessons from the axios NPM compromise, and what secure design looks like when it involves agents, humans, or both. AppSec has always celebrated interesting and impactful vulns. And LLMs are now a favored tool for finding flaws. We shouldn't forget the success and effectiveness of fuzzers like OSS-Fuzz, which has improved security for over 1,000 projects and found over 50,000 bugs. But we can't ignore the ease of prompting an agent to go find -- and exploit -- a vuln when the UX and overhead of doing so is hardly more than writing some markdown.

The SDLC Blind Spot: Why Breaches Start with Identity, Not Code

Developers have access to source code, CI/CD pipelines, and cloud infrastructure, and attackers know it. Target lost 860GB of source code through a single compromised credential. Recruitment fraud campaigns have pivoted from a compromised developer to cloud admin in under 10 minutes. As agents join human developers, contractors, and service accounts in the SDLC, the attack surface is expanding faster than static security tools can track. Security teams need real-time visibility beyond code and into who has access and what they're actually doing. This segment is sponsored by Apiiro. To learn more, visit https://securityweekly.com/apiirorsac.

How AI-Driven Development is Reshaping the Application Risk Landscape

Agent coding assistants are accelerating software development, generating more code and more change than security teams were built to handle. In this interview, Idan Plotnik discusses how AI-driven development is reshaping the application risk landscape and why traditional vulnerability management models can't keep up. Make sure to schedule a free SDLC Risk Assessment with BlueFlag Security - 30 minutes to deploy. 48 hours to results. Please visit https://securityweekly.com/blueflagrsac.
Show Notes: https://securityweekly.com/asw-377
In this episode of Engineering Enablement, Jesse Adametz joins Abi Noda, this time to host. Together, they explore how AI is showing up across the SDLC, not just in code generation, and how it is shifting bottlenecks across the development process. They unpack what “AI readiness” actually means in practice, and why it often comes down to developer experience fundamentals like documentation, environments, and feedback loops.

They also discuss why enablement matters more than tool choice, how teams are thinking about measuring ROI, and what changes as background agents become more common. Finally, they explore how the role of the engineer may evolve, the open questions teams are still grappling with, and the challenges of non-engineers contributing to codebases.

Where to find Jesse Adametz:
• LinkedIn: https://www.linkedin.com/in/jesseadametz
• X: https://x.com/jesseadametz
• Website: https://www.jesseadametz.com/

Where to find Abi Noda:
• LinkedIn: https://www.linkedin.com/in/abinoda

In this episode, we cover:
(00:00) Intro
(02:12) Where AI is showing up across the SDLC
(05:53) AI readiness and its link to developer experience
(08:23) Why enablement, education, and experimentation matter more than tool choice
(13:05) The case for a dedicated enablement team
(14:50) Measuring AI ROI: challenges and tradeoffs
(19:46) Background agents and token spend
(24:12) Measuring agent output with PR throughput
(26:58) How the engineer role might change
(31:01) Specs and documentation in the age of AI
(33:11) Non-engineers writing code
(35:30) What's changing in the SDLC and open questions

Referenced:
• Measuring AI code assistants and agents
• Lessons from Twilio's multi-year platform consolidation
• The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win
• How Claude remembers your project - Claude Code Docs
• specIsJustCode : r/ProgrammerHumor
This conversation was recorded at GOTO Copenhagen 2025.
https://gotopia.tech

Marko Klemetti - CTO of Eficode
Kris Jenkins - Lifelong Computer Geek and Podcast Host

ORIGINAL TALK TITLE
Rewriting the SDLC Playbook with GenAI: How To Build a GenAI-Augmented Software Organization?

RESOURCES
Marko
https://bsky.app/profile/mrako.com
https://twitter.com/mrako
https://github.com/mrako
https://www.linkedin.com/in/mrako
https://mrako.com

Kris
https://bsky.app/profile/krisajenkins.bsky.social
https://twitter.com/krisajenkins
https://www.linkedin.com/in/krisjenkins
https://github.com/krisajenkins
http://blog.jenkster.com

ABSTRACT
Speakers interview each other on topics that matter to them. Expect the unexpected. [...]
Read the full abstract here:
https://gotocph.com/2025/sessions/3931

RECOMMENDED BOOKS
Matthew Skelton & Manuel Pais • Team Topologies • http://amzn.to/3sVLyLQ
Forsgren, Humble & Kim • Accelerate: The Science of Lean Software and DevOps • https://amzn.to/3tCz1xO
John Arundel & Justin Domingus • Cloud Native DevOps with Kubernetes • https://amzn.to/3hKZvI5
Wynne, Hellesoy & Tooke • The Cucumber Book • https://amzn.to/3tEUINJ
Sol Rashidi • Your AI Survival Guide • https://amzn.to/3UFYnKC
David Foster • Generative Deep Learning • https://amzn.to/48ZgP4x
Phil Winder • Reinforcement Learning • https://amzn.to/3t1S1VZ

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks:
https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket: gotopia.tech

SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
AI agents have officially arrived on an internet that simply wasn't built for them. So how do we build the infrastructure to keep them safe, productive, and contained? This week, Andrew sits down with Matt Boyle, Head of Product, Design and Engineering at Ona (formerly Gitpod), to discuss evolving cloud development environments into secure, enterprise-grade "agent jails." They explore the mechanics of Project Veto's kernel-level security, the slow death of the traditional IDE, and how the rise of AI is transforming developers into full-stack, T-shaped product owners. Finally, Matt shares his vision for the future of the SDLC, detailing how organizations can safely balance strict compliance with the bleeding edge of autonomous software factories.

Download the APEX Framework

Follow the show:
Subscribe to our Substack
Follow us on LinkedIn
Subscribe to our YouTube Channel
Leave us a Review

Follow the hosts:
Follow Andrew
Follow Ben
Follow Dan

Follow today's guest:
Learn more about Ona (formerly Gitpod) and read their latest blog announcements, including Veto.
Connect with Matt on LinkedIn | X (Twitter)

OFFERS
Start Free Trial: Get started with LinearB's AI productivity platform for free.
Book a Demo: Learn how you can ship faster, improve DevEx, and lead with confidence in the AI era.

LEARN ABOUT LINEARB
AI Code Reviews: Automate reviews to catch bugs, security risks, and performance issues before they hit production.
AI & Productivity Insights: Go beyond DORA with AI-powered recommendations and dashboards to measure and improve performance.
AI-Powered Workflow Automations: Use AI-generated PR descriptions, smart routing, and other automations to reduce developer toil.
MCP Server: Interact with your engineering data using natural language to build custom reports and get answers on the fly.
Ken Johnson and Seth Law reflect on the 2026 RSA Conference and BSidesSF, noting an industry-wide "awakening" regarding the high costs and engineering complexities of operationalizing AI security tools. A major focus is the recent "supply chain attack hell," specifically the compromise of the Axios HTTP client through dual-account breaches that allowed attackers to bypass legitimate OIDC deploy setups via a misconfigured NPM CLI. The malware used was particularly evasive, deleting itself and replacing its package.json with a clean version post-execution. The hosts also discuss the emergence of the "Agentic Development Lifecycle" (ADLC), where engineering teams are increasingly "committing on time" rather than features, creating a volume of code that traditional security gates cannot manage. They debate Thomas Ptacek's thesis that AI agents will soon "supplant" human vulnerability research for common bug classes, shifting the human role toward high-level governance and "context infusion". Economically, they highlight how Anthropic's security announcements contributed to nearly half a trillion dollars in market value loss for traditional security firms, as investors increasingly bet on frontier models to consume established security domains.
As AI transforms the digital landscape, the intersection of data privacy and machine learning has become a critical battleground for security professionals. In this episode, we dive into the core tenets of Privacy Engineering through the lens of the Certified Information Privacy Technologist (CIPT). From the seven principles of Privacy by Design to the deployment of Privacy Enhancing Technologies (PETs), learn how organizations are building privacy into the SDLC rather than "bolting it on" as an afterthought.
Traditional AppSec tools were created with the assumption that humans wrote code and security reviewed it afterward. But when AI generates code continuously and autonomously, at a speed no traditional security process can keep up with, vulnerabilities spread long before a scanner ever runs. Risk is compounding while security struggles to catch up. In this episode, Dave Rubinstein speaks with Eran Kinsbruner, vice president of marketing at AppSec company Checkmarx. Among the topics discussed are:
-- Why traditional AppSec tools can't keep pace with AI-generated code
-- The need to ensure security from the beginning of the project
-- How the SDLC is morphing into an ADLC -- Agentic Development Life Cycle
This week, we discuss Claude Code's momentum, Cursor's identity crisis, and the SDLC's uncertain future. Plus, Coté finally explains how Markdown is destroying the economy. Watch the YouTube Live Recording of Episode 562 Runner-up Titles Demos over Memos Products over Prose Software written by the many for the few USB is flaky Do you get a Code of Conduct for prison? I thought I had typed it somewhere Markdown is taking down the economy Claude, Take the Wheel Sticking with month-to-month Precious Tokens Rip Van Winkle this whole AI thing The ants have won They have infinite tokens Is SLDC Dead? Rundown The SaaS-Apocalypse was based on markdown files The Software Development Lifecycle Is Dead The Third Era of Software Development Intelligence, Subtracted Anthropic rejects Pentagon's AI demands Exclusive-Anthropic investors push to de-escalate Pentagon clash over AI safeguards 'Incoherent': Hegseth's Anthropic ultimatum confounds AI policymaker Anthropic leads Enterprise AI Spend Anthropic took >50% of spend on enterprise AI subscriptions $110 Billion in Name Only OpenAI reveals more details about its agreement with the Pentagon OpenAI changes deal with US military after backlash Relevant to your Interests McKinsey and AWS launch Amazon McKinsey Group Polymarket defends its decision to allow betting on war as 'invaluable' The Supreme Court doesn't care if you want to copyright your AI-generated art Distinguished Eng On Stack Ranking, Competing with Bezos, Regrets WIZ: My Personal AI Agent OpenAI changes deal with US military after backlash Tech Publications Lost 58% of Google Traffic Since 2024 Ramp AI Index Nonsense Callers to Washington state hotline press 2 for Spanish and get accented AI English instead Anyone Else Have Those Weird Dreams Where Sobbing Future Generations Beg You To Change Course? Conferences Austin Meetup, March 10th, Listener Steve Anness speaking on Grafana KubeCon EU, March 23rd to 26th, 2026 - Coté will be there on a media pass. 
DevOpsdays Atlanta 2026, April 21-22, 2026 DevOpsDays Austin, May 5-6, 2026 WeAreDevelopers, July 8th to 10th, Berlin, Coté speaking. VMware User Groups (VMUGs): Amsterdam (March 17-19, 2026) - Coté speaking. Minneapolis (April 7-9, 2026) Toronto (May 12-14, 2026) Dallas (June 9-11, 2026) Orlando (October 20-22, 2026) SDT News & Community Join our Slack community Email the show: questions@softwaredefinedtalk.com Free stickers: Email your address to stickers@softwaredefinedtalk.com Follow us on social media: Twitter, Threads, Mastodon, LinkedIn, BlueSky Watch us on: Twitch, YouTube, Instagram, TikTok Book offer: Use code SDT for $20 off "Digital WTF" by Coté Sponsor the show Sponsor more podcasts with Failover Media Recommendations Brandon: Failover Media Newsletter Milestone 1.1 Ski Quiver Matt: IKEA MYGGSPRAY motion sensor, TRADFRI LED and RODRET Coté: The “Anime Wow” sound. And, related to Brandon's modernization talk last week.
What does it actually look like to build an AI-native product and lead an engineering team through the AI era when you've been doing it longer than most? Rob Zuber sits down with Loïc Houssier, CTO at Superhuman, to talk about what it meant to be an AI company before AI was everywhere, and how that early foundation shapes the way they build, ship, and think today.

The conversation covers how Loïc drove AI tool adoption across his engineering org without mandates (and which senior engineer's change of heart became a cultural turning point), why great UX is still the real moat in an age where anyone can ship an average product fast, and how email, despite everything, remains the connective tissue of professional life. Plus: what it's like to rethink your entire SDLC when the economics of building software change overnight.

Have someone you'd like to hear on the show? Reach out to us on X at @CircleCI!
Software Engineering Radio - The Podcast for Professional Software Developers
Marc Brooker, VP and Distinguished Engineer at AWS, joins host Kanchan Shringi to explore specification-driven development as a scalable alternative to prompt-by-prompt "vibe coding" in AI-assisted software engineering. Marc explains how accelerating code generation shifts the bottleneck to requirements, design, testing, and validation, making explicit specifications the central artifact for maintaining quality and velocity over time. He describes how specifications can guide both code generation and automated testing, including property-based testing, enabling teams to catch regressions earlier and reason about behavior without relying on line-by-line code review. The conversation examines how spec-driven development fits into modern SDLC practices; how AI agents can support design, code review, documentation, and testing; and why managing context is now one of the hardest problems in agentic development. Marc shares examples from AWS, including building drivers and cloud services using this approach, and discusses the role of modularity, APIs, and strong typing in making both humans and AI more effective. The episode concludes with guidance on rollout, evaluation metrics, cultural readiness, and why AI-driven development shifts the engineer's role toward problem definition, system design, and long-term maintainability rather than raw code production. Brought to you by IEEE Computer Society and IEEE Software magazine.
Stephen Framil, Corporate Global Head of Accessibility at Merck, shares how he embedded accessibility into enterprise digital governance across more than 125 countries. From authoring a global accessibility policy to integrating controls into procurement, SDLC, and clinical trial protocols, Stephen explains how accessibility must be “baked in” rather than bolted on. Drawing from his background as a conductor, musician, and cancer survivor, he describes accessibility leadership as orchestration: guiding experts toward inclusive outcomes while normalizing accessibility across systems and culture.

Mentioned in this episode:
Info about Accessibility at Blink
Two words that make most engineers shudder: code refactoring. Now raise the stakes: refactoring decades of legacy systems inside a large enterprise. A tech debt-heavy project of this scale needs a leader who has driven complex digital transformations, like Gayatri Narayan (formerly PepsiCo, Microsoft, Amazon). Now, as President of Technology at Builders FirstSource, Gayatri Narayan is achieving a 3–4x increase in engineering velocity since joining less than a year ago. Gayatri joins host Yousuf Khan to unpack the strategy behind those results, including how to deploy AI across the SDLC, how to rigorously evaluate ROI on AI investments, and how to lead change across complex enterprise tech stacks.

Key Moments:
01:30 – Why Construction Technology Is Ready for Transformation
04:05 – AI Strategy: Elevating UX and Customer Experience
08:20 – Evaluating AI Investments: ROI, NPV, and Operating Costs
12:45 – Achieving 3–4x Engineering Velocity
16:05 – Humans in the Loop: Craft, Code Review, and AI Amplification
18:35 – Where the Industry Gets AI Adoption Wrong
20:30 – Leadership Advice: Start with the Customer

About Gayatri: Gayatri Narayan is a general management executive with more than 15 years of experience leading product, engineering, data science, and operations across global enterprises, with full P&L responsibility and a track record of driving profitable growth through digital transformation. She currently serves as President of Technology at Builders FirstSource, where she leads enterprise technology strategy, modernizes legacy systems, and embeds AI into the software development lifecycle to accelerate innovation across the residential construction value chain. 
Previously, she served as Senior Vice President of Digital Products and Services at PepsiCo and held multiple general management roles at Microsoft, including leading Product and Engineering for Intelligent Communications across Teams and Skype as well as Enterprise PaaS and SaaS businesses; she also held leadership roles at Amazon spanning Marketplace Transportation and Logistics and several major retail categories. Guest Highlights: “We've seen a three to four times increase in engineering velocity — especially in refactoring legacy systems where historically there was very little knowledge of how the system actually worked.” “With generative AI, companies that have existed for 20 or 30 years don't have to get bogged down by legacy stacks. They can embrace emerging technologies without spending 18 to 24 months just refactoring.” “It really comes down to efficiency of time. The developer's surface area of impact expands dramatically — it's not just about writing code anymore, it's about delivering business value faster.” Visit ciopod.com for more episodes. Subscribe on YouTube or follow on your favorite podcast platform so you never miss a conversation with today's top technology leaders. Our Sponsor: Want to accelerate software development by 500%? Meet Blitzy, the only autonomous code generation platform with infinite code context, purpose-built for large, complex enterprise-scale codebases. While other AI coding tools provide snippets of code and struggle with context, Blitzy ingests millions of lines of code and orchestrates thousands of agents that reason for hours to map every line-level dependency. With a complete contextual understanding of your codebase, Blitzy is ready to be deployed at the beginning of every sprint. Blitzy handles the heavy lifting, delivering over 80% of the work autonomously. The platform plans, builds, and validates premium-quality code at the speed of compute, turning months of engineering into a matter of days. 
It's the secret weapon for Fortune 500 companies globally. To hear how engineering leaders are transforming the way they deliver software, visit blitzy.com. Schedule a meeting with their consultants to enable an AI-Native SDLC in your organization today.
Arnie Katz has been running product and engineering under one roof since before most companies even considered combining the roles. As CPTO at GoFundMe, he oversees the teams behind a platform processing over 2.5 donations every second, with more than $40 billion in help facilitated worldwide. Arnie breaks down why the CPTO title keeps gaining traction, how he thinks about the role like a portfolio manager, and where the real trade offs live when one person holds both the product and technology reins.

Key Takeaways
The CPTO role works like a portfolio manager. Arnie manages the company's largest investment center by balancing short term business wins against long term platform bets, knowing when to take on technical debt and when to pay it down.
Velocity, coordination, and alignment are the three biggest wins. When product and engineering report to one leader, decisions happen faster, roadmap conflicts get resolved without executive tug of war, and technical investments stay tied to business outcomes.
The disadvantages are real. Without separate CPO and CTO voices at the executive table, certain perspectives can get muted. His fix: build a leadership bench strong enough to create the right tension underneath him.
AI is changing what small teams can deliver. GoFundMe's eight person team behind Giving Funds is shipping at a pace that would have been impossible five years ago.

Timestamped Highlights
[00:38] The scale most people don't realize about GoFundMe, including 2.5 donations per second and GoFundMe Pro for nonprofits.
[02:02] How Arnie first landed the CPTO title at StubHub seven years ago, and why it clicked.
[09:11] The real downside of collapsing two C suite roles into one, and how Arnie designs around it.
[13:57] His portfolio approach to technical debt, sequencing re platforming in areas like identity and payments while other teams ship business value.
[18:38] AI reshaping engineering velocity, the future of the SDLC, and product teams prototyping without writing code.
[23:06] Where the CPTO model is headed as the industry evolves.

The Line That Stuck
"I often think of myself as a portfolio manager. My job is to invest money where the company gets the best returns, where the mission gets the best return, where the shareholder gets the best returns."

Pro Tips
Sequence your bets instead of spreading them thin. GoFundMe gave their identity and payments teams nine months of runway to re platform with no feature expectations while other squads picked up the pace on near term results.
Build leadership that creates productive friction. Without CPO vs. CTO tension at the exec level, let your VPs and SVPs push back against each other. That tension is where the best decisions come from.
Think in time horizons, not just priorities. Short term moves for 0.1% to 0.5% metric lifts. Midterm bets for 1% to 5% gains. Long term swings that could transform the business. Allocate across all three.

If this conversation changed how you think about product and engineering working together, share it with someone on your team. Subscribe to The Tech Trek so you never miss an episode, and connect with Arnie on LinkedIn to keep the conversation going.

GoFundMe is offering listeners of The Tech Trek a chance to open their own Giving Fund. 
For the first 50 people who open a Giving Fund and add $25 or more to their Giving Fund, GoFundMe will add an additional $25 to that Giving Fund. If you have a Giving Fund but have never contributed into it, you can also participate. The deadline for this incentive is March 13. To get this incentive, click here to start your Giving Fund.
Is AI security just "Cloud Security 2.0"? Toni De La Fuente, creator of the open-source tool Prowler, joins Ashish to explain why securing AI workloads requires a fundamentally different approach than traditional cloud infrastructure.

We dive deep into the "Shared Responsibility Gap" emerging with managed AI services like AWS Bedrock and OpenAI. Toni spoke about the hidden dangers of default AI architectures and why you should never connect an MCP (Model Context Protocol) directly to a database.

We discuss the new AI-driven SDLC, where tools like Claude Code can generate infrastructure but also create massive security blind spots if not monitored.

Guest Socials - Toni's LinkedIn
Podcast Twitter - @CloudSecPod

If you want to watch videos of this LIVE STREAMED episode and past episodes, check out our other Cloud Security Social Channels:
- Cloud Security Podcast
- YouTube
- Cloud Security Newsletter

If you are interested in AI Security, you can check out our sister podcast - AI Security Podcast

Questions asked:
(00:00) Introduction
(02:50) Who is Toni De La Fuente? (Creator of Prowler)
(03:50) AI Security vs. Cloud Security: What's the Difference?
(07:20) The Shared Responsibility Gap in AI Services (Bedrock, OpenAI)
(11:30) The "Fifth Party" Risk: Managed AI Access
(13:40) AI Architecture Best Practices: Never Connect MCP to DB Directly
(16:40) Prowler's AI Pillars: Generating Dashboards & Detections
(22:30) The New SDLC: Securing Code from Claude Code & Lovable
(25:30) The "Magic" Trap: Why AI Doesn't Know Your Security Context
(28:30) Top 3 Priorities for Security Leaders (Infra, LLM, Shadow AI)
(30:40) Future Predictions: Why Predicting 12 Months Out is Impossible
We are at a unique point in history where there is finally an alternative to human coding. If AI can write the code effectively, what is left for the software engineer?

In this episode, Joris Conijn (AWS CTO at Xebia) argues that the era of "just coding" is over. We discuss why senior developers are safe (for now), why juniors are at risk of never learning the fundamentals, and how "Shadow AI" is forcing companies to change their security strategies.

Most importantly, we break down the difference between a "Programmer" and a "Software Engineer" with the introduction of agentic tools. If you want to future-proof your career and move from writing lines of code to designing systems, this conversation is for you.

In this episode, we cover:
Why banning AI at work actually increases your security risk
How to use AI to automate the boring parts of the SDLC (requirements & user stories)
The critical difference between "Coding" and "System Architecture"
Why you should check your AI Agents into your Git repository
The 20-year problem: what happens when engineers never learn the fundamentals?

Connect with Joris Conijn:
https://www.linkedin.com/in/jorisconijn

TIMESTAMPS
00:00:00 - Intro
00:01:11 - What Keeps a CTO Excited About Tech?
00:02:58 - Stop Being the "Department of No" in Security
00:05:28 - The Real Risk of Banning AI at Work
00:06:32 - When Developers Hold the Organization Hostage
00:08:14 - The Hidden Dangers of Instant AI Code Fixes
00:09:50 - Will Future Devs Understand Object Oriented Programming?
00:11:36 - Using AI to Accelerate Learning vs Copy-Pasting
00:13:17 - Why Testing Matters More When AI Writes Code
00:16:42 - Automating the Boring Parts of the SDLC
00:19:06 - How to Turn Meeting Transcripts into User Stories
00:21:36 - The Critical Skill of Making Implicit Knowledge Explicit
00:23:10 - Why You Should Stop Obsessing Over Story Points
00:27:46 - The "A-Team" Approach to High-Trust Development
00:29:54 - Running Parallel Workflows with AI Agents
00:33:34 - Pro Tip: Check Your AI Agents into Git
00:35:52 - Balancing Autonomy and Governance in Large Teams
00:39:19 - There Is Finally an Alternative to Human Coders
00:41:07 - Programmer vs Software Engineer: What is the Difference?
00:44:45 - How to Teach Software Engineering in the AI Era

#SoftwareEngineering #SystemDesign #AIAgents
Amal Hussein returns to tell us all about her new role at Istari, what life is like outside the web browser, how she's helping ambitious orgs in aerospace, what the SDLC looks like in 2026, and a whole lot more. Wait, moon vacuums?!
It's been a while since the last episodes, but the IT world hasn't stood still. On the contrary, it has accelerated so much that at times it's hard to keep up. That's why in this episode, together with Łukasz Szydło and Marcin Markowski, we simply try to think out loud about what is really happening to the work of the software architect, and to software architecture in general, in the age of ubiquitous Generative AI.

With new models coming out at an ever faster pace, it's rather hard to talk about which 10 tools will change your life as an architect and are worth using right now. Instead, we sat down to talk about our insights and observations from the field. AI is gradually leading us into a trap: code is produced in a flash, but our human capacity to read and verify it remains essentially unchanged. Are we slowly turning into code editors as a result, and is Code Review about to become the biggest bottleneck in our projects? But Code Review is only one of the stages of the Software Development Lifecycle where the impact of AI tools is visible.

Announcement! Soon, on February 17, we'll be able to meet at the open workshop DevHours: Fullstack x EventStorming, which I have the pleasure of co-organizing with Capgemini. If you're interested in software and want to improve your software design skills, I invite you to register.
AI has successfully solved the blank page problem for developers, but it has created a massive new bottleneck downstream in the SDLC. LinearB CEO Ori Keren joins us to explain why 2026 will be a year of norming as organizations struggle to digest the flood of AI-generated code. In this annual prediction episode, he details why upstream velocity gains are being lost to chaos in reviews and testing. We also discuss why enterprises aren't ready to hand over the keys to autonomous agents and how to build dynamic pipelines based on risk.

LinearB
Access the AI code review metrics dashboard
Unify your Copilot and Cursor impact metrics

Follow today's guest:
Follow Ori on LinkedIn
Ori Bendet, Vice President of Product Management at Checkmarx, joined Doug Green, Publisher of Technology Reseller News, to discuss how the acquisition of Tromzo strengthens Checkmarx's agentic application security strategy and reflects a broader shift in how organizations secure software in an AI-driven development era. Bendet explained that Checkmarx, a pioneer in application security with more than two decades of experience, has traditionally focused on helping organizations identify vulnerabilities early in the software development lifecycle (SDLC). However, the rapid adoption of AI-generated code has fundamentally changed the AppSec landscape. “The industry used to be fixated on finding vulnerabilities,” Bendet said. “Now the real challenge is fixing them at scale, in context, and without slowing developers down.” The Tromzo acquisition builds on Checkmarx's existing family of agentic tools, Checkmarx Assist, which already provides real-time remediation inside the developer IDE. Tromzo extends these capabilities deeper into the SDLC, enabling automated remediation at the repository and pull-request stages. Together, the technologies aim to “complete the loop” by delivering consistent, trusted remediation from early development through later stages of deployment. Bendet noted that AI is widening the gap between development velocity and security oversight, as significantly more code—and therefore more vulnerabilities—is being produced. At the same time, the application footprint itself is evolving to include AI components such as large language models, agents, and third-party AI services. “There is now a new AI element inside the application,” he said, “and organizations need AppSec solutions that understand and protect that expanded footprint.” Auto-remediation, once viewed skeptically by developers, is now gaining acceptance as AI agents gain a deeper understanding of application context. 
According to Bendet, modern agentic tools can remediate vulnerabilities while preserving business logic and minimizing disruption. “Developers no longer need to spend days undoing fixes that broke functionality,” he said. “The agent can understand the blast radius and refactor automatically.” Looking ahead, Bendet described a future where AppSec becomes more autonomous, with agents continuously testing, fixing, and validating applications while developers shift toward higher-level architectural and review roles. With proper guardrails in place, this evolution promises to reduce alert fatigue and allow teams to focus on innovation rather than remediation backlogs. More information about Checkmarx and its agentic application security approach is available at https://checkmarx.com/, with additional developer-focused resources at https://checkmarx.dev/.
Software engineering is changing fast, but not in the way most hot takes claim. Robert Brennan, Co founder and CEO at OpenHands, breaks down what happens when you outsource the typing to the LLM and let software agents handle the repetitive grind, without giving up the judgment that keeps a codebase healthy. This is a practical conversation about agentic development, the real productivity gains teams are seeing, and which skills will matter most as the SDLC keeps evolving.

Key Takeaways
AI in the IDE is now table stakes for most engineers, the bigger jump is learning when to delegate work to an agent
The best early wins are the unglamorous tasks, fixing tests, resolving merge conflicts, dependency updates, and other maintenance work that burns time and attention
Bigger output creates new bottlenecks, QA and code review can become the limiting factor if your workflow does not adapt
Senior engineering judgment becomes more valuable, good architecture and clean abstractions make it easier to delegate safely and avoid turning the codebase into a mess
The most durable human edge is empathy, for users, for teammates, and for your future self maintaining the system

Timestamped Highlights
00:40 What OpenHands actually is, a development agent that writes code, runs it, debugs, and iterates toward completion
02:38 The adoption curve, why most teams start with IDE help, and what “agent engineers” do differently to get outsized gains
06:00 If an engineer becomes 10x faster, where does the time go, more creative problem solving, less toil
15:01 A real example of the SDLC shifting, a designer shipping working prototypes and even small UI changes directly
16:51 The messy middle, why many teams see only moderate gains until they redraw the lines between signal and noise
20:42 Skills that last, empathy, critical thinking, and designing systems other people can understand
22:35 Why this is still early, even if models stopped improving today, most orgs have not learned how to use them well yet

A line worth sharing
“The durable competitive advantage that humans have over AI is empathy.”

Pro Tips for Tech Teams
Start by delegating low creativity tasks, CI failures, dependency bumps, and coverage improvements are great training wheels
Define “safe zones” for non engineers contributing, like UI tweaks, while keeping application logic behind clearer guardrails
Invest in abstractions and conventions, you want a codebase an agent can work with, and a human can trust
Track where throughput stalls, if PR review and QA are the bottleneck, productivity gains will not show up where you expect

Call to Action
If you got value from this one, follow the show and share it with an engineer or product leader who is sorting out what “agentic development” actually means in practice.
Has AI really become THIS powerful in the enterprise? Today, we're talking to Brian Elliott, CEO at Blitzy, and Tom Jackson, CTO at RSM US LLP. We discuss how AI agents are autonomously completing months of development work in days, why organizational change management is now the biggest bottleneck in software development, and how enterprises are achieving 5x engineering velocity with agentic SDLC platforms. All of this right here, right now, on the Modern CTO Podcast! To learn more about Blitzy, check out their website.
When you're managing $60 trillion in assets across dozens of products and 30 global jurisdictions, technical debt isn't just an inconvenience; it's an existential risk.
Jason Adams, Interim CTO of Charles River, a State Street Company, leads 800 engineers building mission-critical trading platforms for the world's largest asset managers. Joined by Sid Pardeshi, Co-Founder and CTO of Blitzy, he explains how State Street is using an AI-augmented SDLC to modernize decades-old systems, refactor legacy code, and dramatically increase developer productivity, without compromising the rigor required in financial services.
Jason frames the strategy around three pillars: AI for engineering (copilots and polyglot support), AI for operations (APM, observability, and proactive monitoring), and AI embedded in products (LLM-powered explainers).
Using Blitzy's agentic approach (iterative context building, dependency mapping, and targeted code generation), State Street compressed months of work into weeks while maintaining strict quality gates.
About the Guests:
Jason Adams
Jason Adams is the Interim CTO of Charles River, a State Street Company. He brings deep expertise in modernizing legacy fintech infrastructure into scalable, cloud-native systems that support mission-critical financial services at global scale.
Previously, Jason was Head of Platform Product and Strategy at Charles River Development and CTO of Mercatus (acquired by State Street and now part of Charles River for Private Markets). He has led high-impact initiatives across engineering, product, and cloud infrastructure, with extensive experience guiding end-to-end delivery teams.
Today, Jason is driving a comprehensive SaaS transformation at CRD, focused on building resilient, future-ready architectures. From scaling global engineering organizations to delivering secure, high-performance platforms, he is committed to advancing innovation, agility, and long-term growth across Charles River, State Street Alpha, and State Street.
Sid Pardeshi
Sid Pardeshi is a technology leader and entrepreneur, currently Co-Founder and CTO of Blitzy. He holds a Harvard MS/MBA and previously served as a Software Architect at NVIDIA, where he built deep expertise at the intersection of AI, large-scale software systems, and product innovation.
At NVIDIA, Sid was recognized as a Master Inventor, earning the Inventor's Jacket for driving AI-powered product innovation, with more than 25 U.S. patents filed across gaming, augmented reality, and virtual reality. He is also a seasoned software engineer with a strong track record in application performance optimization, delivering native client load-time improvements of up to 90%.
Beyond hands-on engineering, Sid has led and coordinated software design, framework requirements, and application architecture across global teams of 500+ engineers. Today, he applies this blend of innovation, technical depth, and organizational leadership to building autonomous software development platforms that help enterprises modernize at scale.
Timestamps:
00:30 – Jason on Managing $60 Trillion in Assets
01:55 – Challenges and Strategies in Financial Services
07:00 – Embracing AI for Modernization
09:10 – AI in Software Development Lifecycle
15:55 – Ensuring Quality and Compliance with AI
23:55 – AI in Operations and Incident Response
26:00 – Proactive Workflow Monitoring
26:20 – AI in SDLC: Creation to Operations
30:00 – Challenges in AI Recommendations
33:20 – Iterative Context Building with AI
36:00 – Human Side of AI Transformation
42:30 – Adopting AI Tools in Financial Services
Guest Highlights:
"One of the things that excites me the most right now is the ability to use an AI-augmented SDLC to drive modernization. Otherwise, with this many systems, it's too hard." - Jason
"You have to invest in the non-attractive parts first. You have to build a foundation that's gonna support being able to bring on solutions and tools that could change your overall enterprise SDLC. That's a lot of work and that's a major investment." - Jason
"We are unlocking by adding these additional capabilities and additional assurance that improves quality exponentially more than we could have in the past. Now I can have an agent swarm check itself: multiple agents doing code review at a level of depth we just don't have time to get to." - Jason
Get Connected:
Jason Adams on LinkedIn
Sid Pardeshi on LinkedIn
Yousuf Kahn on LinkedIn
Ian Faison on LinkedIn
Hungry for more tech talk? Check out the latest episodes at ciopod.com:
Ep 63 - How Autonomous AI is Solving the Enterprise Modernization Challenge
Ep 62 - Running IT Like a Growth Engine
Ep 61 - What Manufacturing Can Teach You About Scaling Enterprise AI
Learn more about Caspian Studios: caspianstudios.com
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Software Engineering Radio - The Podcast for Professional Software Developers
Max and Luniel, co-authors of the book "Ready: Why Most Software Projects Fail and How to Fix It", discuss the concept of Readiness in software engineering with host Brijesh Ammanath. While Agile workflows and technical practices help delivery, many software efforts still struggle to achieve desired outcomes. Rework, shifting requirements, delays, defects, and mounting technical debt plague software delivery and impede or altogether halt progress toward goals. The problem is often that implementation begins prematurely, before the team is properly set up for success. A strict system of explicit readiness work and gating, called Requirements Maturation Flow (RMF), solves this problem in an SDLC-independent way. Teams that have adopted RMF dramatically improve progress toward real goals while reducing stress on engineering teams. In this podcast, Max and Luniel take a deep dive into Requirements Maturation Flow (RMF) and explain its foundational pillars.
Objectives:
Understand why most software projects fail and what causes rework, under-delivery, and delays
What is Requirements Maturation Flow, and what are its 3 foundational practices?
Understanding the value of having Readiness as an explicit work item
Understanding Definition of Done
Understanding Definition of Ready
Brought to you by IEEE Computer Society and IEEE Software magazine.
Most enterprises have roadmaps stretching 3-5 years out. What if you could compress that to 1-2 years? Brian Elliott is the Co-Founder and CEO of Blitzy, an enterprise-focused autonomous software development platform tackling one of technology's toughest problems: how do you modernize 20-100 million lines of legacy code when the developers who wrote it retired 15 years ago?
In this episode, Brian explores:
Why orchestrated AI agents can handle 80% of transformation work autonomously (and why humans still matter for the other 20%)
The realities of enterprise buying cycles and why embedded on-site teams accelerate change management
Why documentation and test coverage are the unsexy first steps that make everything else possible
About the Guest:
Brian Elliott is CEO and Co-founder of Blitzy. A serial entrepreneur, former Infantry Officer with the 1st Ranger Battalion, and West Point graduate in Systems Engineering with a Harvard MBA, Brian brings a unique blend of military precision, engineering expertise, and entrepreneurial vision to transforming enterprise software development.
As CEO, Brian leads Blitzy's mission to empower systematic AI adoption across enterprises, transforming traditional development lifecycles into AI-native workflows. Under his leadership, Blitzy has developed an agentic platform where thousands of specialized AI Agents cooperate at inference to autonomously deliver enterprise-scale code that is tested, validated, and compiled.
Focused on operational deployment at scale, Brian architected the company's proven Agentic SDLC Accelerator, a structured methodology that systematically guides engineering organizations from technical validation to full-scale enterprise adoption. This framework unlocks autonomous capabilities across the complete software development lifecycle.
Timestamps:
01:25 – Understanding Blitzy's AI Capabilities
03:25 – Challenges and Solutions in Enterprise Software
06:00 – The Genesis of Blitzy
07:30 – Insights from Nvidia and AI Development
11:00 – Implementing AI in Enterprise Systems
18:00 – Change Management and Customer Collaboration
20:30 – Understanding Enterprise Security Needs
25:10 – Improving Code Quality and Test Coverage
28:15 – Blitzy's Mission and Market Direction
30:10 – Challenges and Opportunities in Enterprise Software
Guest Highlight:
"Code is beautiful in that it's verifiable. We're following enterprise best practices: everything goes to a dev branch where a human can look at it, review it, go through a typical QA process. The first thing we're gonna do is document their code so they know what's going on, then add test cases, then develop software at scale that's highly verifiable."
Get Connected:
Brian Elliott on LinkedIn
Yousuf Kahn on LinkedIn
Ian Faison on LinkedIn
Hungry for more tech talk? Check out past episodes at ciopod.com:
Ep 62 - Running IT Like a Growth Engine
Ep 61 - What Manufacturing Can Teach You About Scaling Enterprise AI
Ep 60 - Why the Smartest CIOs Are Becoming Business Strategists
Learn more about Caspian Studios: caspianstudios.com
Our Sponsor:
This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code.
Enterprise Engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80%+ of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their Pre-IDE development tool, pairing it with their coding co-pilot of choice to bring an AI-Native SDLC into their org.
Visit Blitzy.com and press "book demo" to learn how Blitzy transforms your SDLC from AI Assisted to AI Native.
In this episode, Chetna explains how new automation strategies are evolving not only productivity, but the role of the CIO. Chetna emphasizes the importance of data quality and security when scaling a fast-growing company, as well as transparency and partnership in vendor relationships.
About the Guest:
Chetna is an award-winning CIO, board member, and VC advisor with over 25 years of experience working in the Fortune 100 and serving as a 3X CIO for hyper-growth SaaS businesses. Chetna currently serves as CIO of Webflow, a hyper-growth Website Experience Platform SaaS company. Previously, she served as CDIO at Amplitude and ZoomInfo.
Chetna is an advisor to prominent VC firms including Sequoia Capital, Accel, Ridge Ventures, and Mayfield. She serves on the Customer Advisory Board (CAB) at Veza and Productiv, and was formerly on the Snowflake and Google Cloud Platform CABs. She served on the Tech Committee with Carlyle and Thoma Bravo, and on the Advisory Board of Ninja Focus and Women & AI.
She was a finalist and nominee for the Bay Area ORBIE CIO award, a finalist for “2019 Markie's Cultivator Award for Best Lead Management Program,” a recipient of the Delta Dental Women in Business Stevie Award of Excellence in Healthcare Transformation, and a Boeing Spirit of Excellence Award recipient. Outside of work, she enjoys traveling, hiking, and skiing and has a passion for exploring different cultures.
Timestamps:
01:41 - About Chetna
04:53 - Automation as a starting point
07:16 - Employee productivity and the CIO
11:25 - Discovering new AI tools
13:44 - Evolving revenue systems
22:47 - How will the CIO role evolve?
28:37 - Lightning round
Guest Highlight:
“AI has really taken productivity at a whole different level now. It has really helped us drive the pace in productivity we couldn't have fathomed before the event of the content generation. It's not just content generation anymore. It's way beyond that.
The velocity at which we are innovating on the product is huge.”
Get Connected:
Chetna Mahajan on LinkedIn
Yousuf Kahn on LinkedIn
Ian Faison on LinkedIn
Hungry for more tech talk? Check out past episodes at ciopod.com:
Ep 62 - Running IT Like a Growth Engine
Ep 61 - What Manufacturing Can Teach You About Scaling Enterprise AI
Ep 60 - Why the Smartest CIOs Are Becoming Business Strategists
Learn more about Caspian Studios: caspianstudios.com
Three Buddy Problem - Episode 70: Dave Aitel from OpenAI's technical staff joins the buddies to discuss the just-launched Aardvark, OpenAI's agentic “security researcher” that claims to read code, find bugs, validate exploits, and ship patches. We press him on where LLMs beat fuzzers, privacy boundaries, human-in-the-loop realities, SDLC budgets, pen-test cadence, and the zero-day economy. Plus, L3 Harris/Trenchant exec pleads guilty to selling exploits to Russian brokers, Kaspersky catches the return of HackingTeam using a Chrome zero-day exploit chain, and news of a proposed law in Russia to force researchers to report vulnerabilities first to government agencies. Cast: Dave Aitel (https://www.linkedin.com/in/daveaitel/) (Technical Staff, OpenAI), Juan Andres Guerrero-Saade (https://twitter.com/juanandres_gs), Ryan Naraine (https://twitter.com/ryanaraine) and Costin Raiu (https://twitter.com/craiu).
Today, we are kicking off a new series, sponsored by our good friends at Railsware. Railsware is a leading product studio with two main focuses: services and products. They have created amazing products like Mailtrap, Coupler and TitanApps, while also partnering with teams like Calendly and Bright Bytes. They deliver amazing products, and have happy customers to prove it.
In this series, we are digging into the company's methods around product engineering and development. In particular, we will cover relevant topics to not only highlight their expertise, but to educate you on industry trends alongside their experience.
In today's episode, we are talking with Sergiy Korolov, Co-CEO of Railsware and Co-founder of Mailtrap. In this conversation, we are bringing up a popular but somewhat controversial topic: vibe-coding vs. traditional software development approaches.
Questions:
You've been in tech for over two decades, and have definitely seen many trends come and go. How would you define "vibe-coding" and how does it differ from traditional software development approaches?
What drove the emergence of vibe-coding? Could it be a response to overly rigid development processes that many companies have? Or is it a fundamental shift in engineering?
What do engineers on your team think about vibe-coding? Have you practiced this approach on some of your products?
What types of products or development contexts are best suited for vibe-coding?
Is it possible to create successful and scalable products through vibe-coding? For instance, can people balance vibe-coding with business requirements, deadlines, and stakeholder expectations?
To wrap up, is vibe-coding actually sustainable long-term, or is it just a trendy reaction to over-engineering?
Links
https://railsware.com/
https://www.linkedin.com/in/sergiykorolov/
https://www.linkedin.com/posts/ylazor_vibe-coding-is-real-whether-we-like-it-or-activity-7371646785066422273-cmSO/
https://mailtrap.io/
Support this podcast at https://redcircle.com/code-story-insights-from-startup-tech-leaders/donations
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today, we're joined by Animesh Koratana, founder and CEO of PlayerZero, to discuss his team's approach to making agentic and AI-assisted coding tools production-ready at scale. Animesh explains how rapid advances in AI-assisted coding have created an “asymmetry” where the speed of code output outpaces the maturity of processes for maintenance and support. We explore PlayerZero's debugging and code verification platform, which uses code simulations to build a “memory bank” of past bugs and leverages an ensemble of LLMs and agents to proactively simulate and verify changes, predicting potential failures. Animesh also unpacks the underlying technology, including a semantic graph that analyzes code bases, ticketing systems, and telemetry to trace and reason through complex systems, test hypotheses, and apply reinforcement learning techniques to create an “immune system” for software. Finally, Animesh shares his perspective on the future of the software development lifecycle (SDLC), rethinking organizational workflows, and ensuring security as AI-driven tools continue to mature. The complete show notes for this episode can be found at https://twimlai.com/go/746.