Honeycomb Co-founder and CTO Charity Majors explains why measuring the right engineering metrics in the age of AI matters more than chasing numbers.

Topics Include:
- Charity Majors introduces Honeycomb as the original observability company for complex systems
- Honeycomb solves high cardinality problems across millions of individual customer experiences
- Their MCP tool ranked top five in Stack Overflow's most-used list
- Canva lets developers interact with production software directly from their IDE
- AI acts as an amplifier requiring strong reliability and observability foundations
- Measuring success requires multiple metrics to avoid gaming single numbers
- Honeycomb adopted Intercom's 2X productivity challenge, enlisting employees to identify gains
- Writing code was never the hard part, even before generative AI arrived
- Honeycomb created AI values prioritizing transparency and emotional safety for employees
- Staff tested boundaries on resources and environmental impact, prompting honest discussions
- Honeycomb acquired Grok and shipped Query Assistant, Canvas, and MCP products
- Future concerns include AI economics shifting and AI-native developers lacking foundational expertise

Participants:
- Charity Majors – Co-Founder/CTO, Honeycomb.io

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
In this episode, we're joined by Maryam Ashoori, VP of Product and Engineering at IBM's Watsonx platform. With a background that includes two master's degrees in AI, a PhD in Systems Design Engineering, and more than 30 patents at IBM, she's been on the bleeding edge for over a decade. Currently leading the charge on Agentic AI and AI Governance at IBM, Maryam is a bridge between the theoretical frontier of AI and the messy reality of enterprise deployment.

In this episode, Maryam:
- Tells why AI has been stuck in pilot purgatory for longer than expected, and what you need to do today for a successful enterprise deployment
- Calls shenanigans on the "biggest, best model" crowd, and explains why a smaller, more focused tool is often the right choice
- Explains how to build an agnostic architecture that can handle the realities of an AI world where models advance faster than anybody can keep up

Links
LinkedIn: https://www.linkedin.com/in/mashoori/
IBM: https://www.ibm.com/us-en

Resources
- Reinventing SaaS: Zuora's AI Transformation | Karthik Chakkarapani and Shakir Karim (Zuora): https://www.youtube.com/watch?v=gHVxnLikMpQ
- Linear's Secret to Building Powerful AI Products | Nan Yu, Head of Product (Linear): https://www.youtube.com/watch?v=27rGB-6XQJg

Chapters
00:00 Intro
02:18 From ChatGPT hype to enterprise reality: use cases, ROI, and the rise of agents
06:11 Security, accountability & governance: who's responsible when agents go wrong?
10:37 Risk-based rollout: use-case scoping, Risk Atlas, and guardrails like PII detection
17:10 Observability for agentic workflows
18:21 Why compute optimization matters
22:58 Designing for model agility: abstraction layers, routing, and picking the right model
27:23 Conclusion

Follow LaunchPod on YouTube
We have a new YouTube page! Watch full episodes of our interviews with PM leaders and subscribe!

What does LogRocket do?
LogRocket's Galileo AI watches user sessions for you and surfaces the technical and usability issues holding back your web and mobile apps. Understand where your users are struggling by trying it for free at LogRocket.com.

Special Guest: Maryam Ashoori.
BONUS: When AI Decisions Go Wrong at Scale—And How to Prevent It

We've spent years asking what AI can do. But the next frontier isn't more capability—it's something far less glamorous and far more dangerous if we get it wrong. In this episode, Ran Aroussi shares why observability, transparency, and governance may be the difference between AI that empowers humans and AI that quietly drifts out of alignment.

The Gap Between Demos and Deployable Systems

"I've noticed that I watched well-designed agents make perfectly reasonable decisions based on their training, but in a context where the decision was catastrophically wrong. And there was really no way of knowing what had happened until the damage was already there."

Ran's journey from building algorithmic trading systems to creating MUXI, an open framework for production-ready AI agents, revealed a fundamental truth: the skills needed to build impressive AI demos are completely different from those needed to deploy reliable systems at scale. Coming from the AdTech space, where he handled billions of ad impressions daily and over a million concurrent users, Ran brings a perspective shaped by real-world production demands. The moment of realization came when he saw that the non-deterministic nature of AI means that traditional software engineering approaches simply don't apply. While traditional bugs are reproducible, AI systems can produce different results from identical inputs—and that changes everything about how we need to approach deployment.

Why Leaders Misunderstand Production AI

"When you chat with ChatGPT, you go there and it pretty much works all the time for you. But when you deploy a system in production, you have users with unimaginable different use cases, different problems, and different ways of phrasing themselves."

The biggest misconception leaders have is assuming that because AI works well in their personal testing, it will work equally well at scale.
When you test AI with your own biases and limited imagination for scenarios, you're essentially seeing a curated experience. Real users bring infinite variation: non-native English speakers constructing sentences differently, unexpected use cases, and edge cases no one anticipated. The input space for AI systems is practically infinite because it's language-based, making comprehensive testing impossible.

Multi-Layered Protection for Production AI

"You have to put in deterministic filters between the AI and what you get back to the user."

Ran outlines a comprehensive approach to protecting AI systems in production:
- Model version locking: Just as you wouldn't randomly upgrade Python versions without testing, lock your AI model versions to ensure consistent behavior
- Guardrails in prompts: Set clear boundaries about what the AI should never do or share
- Deterministic filters: Language firewalls that catch personal information, harmful content, or unexpected outputs before they reach users
- Comprehensive logging: Detailed traces of every decision, tool call, and data flow for debugging and pattern detection

The key insight is that these layers must work together—no single approach provides sufficient protection for production systems.

Observability in Agentic Workflows

"With agentic AI, you have decision-making, task decomposition, tools that it decided to call, and what data to pass to them. So there's a lot of things that you should at least be able to trace back."

Observability for agentic systems is fundamentally different from traditional LLM observability. When a user asks "What do I have to do today?", the system must determine who is asking, which tools are relevant to their role, what their preferences are, and how to format the response. Each user triggers a completely different dynamic workflow.
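The deterministic-filter layer Ran describes can be sketched as a small function that sits between the model and the user. This is an illustrative sketch only: the regexes, the blocklist terms, and the `filter_output` name are assumptions for the example, not MUXI's actual implementation.

```python
import re

# Hypothetical "language firewall": deterministic checks applied to every
# model response before it is returned to the user.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKLIST = ("internal-only", "api_key")  # assumed disallowed phrases

def filter_output(text: str) -> str:
    """Redact personal data and block disallowed content deterministically."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = SSN_RE.sub("[REDACTED SSN]", text)
    if any(term in text.lower() for term in BLOCKLIST):
        return "Sorry, I can't share that."
    return text
```

Because the filter is plain code rather than another model, its behavior is reproducible and testable, which is exactly the property the non-deterministic model layer lacks.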
Ran emphasizes the need for multi-layered access to observability data: engineers need full debugging access with appropriate security clearances, while managers need topic-level views without personal information. The goal is building a knowledge graph of interactions that allows pattern detection and continuous improvement.

Governance as Human-AI Partnership

"Governance isn't about control—it's about keeping people in the loop so AI amplifies, not replaces, human judgment."

The most powerful reframing in this conversation is viewing governance not as red tape but as a partnership model. Some actions—like answering support tickets—can be fully automated with occasional human review. Others—like approving million-dollar financial transfers—require human confirmation before execution. The key is designing systems where AI can do the preparation work while humans retain decision authority at critical checkpoints. This mirrors how we build trust with human colleagues: through repeated successful interactions over time, gradually expanding autonomy as confidence grows.

Building Trust Through Incremental Autonomy

"Working with AI is like working with a new colleague that will back you up during your vacation. You probably don't know this person for a month. You probably know them for years. The first time you went on vacation, they had 10 calls with you, and then slowly it got to 'I'm only gonna call you if it's really urgent.'"

The path to trusting AI systems mirrors how we build trust with human colleagues. You don't immediately hand over complete control—you start with frequent check-ins, observe performance, and gradually expand autonomy as confidence builds. This means starting with heavy human-in-the-loop interaction and systematically reducing oversight as the system proves reliable. The goal is reaching a state where you can confidently say "you don't have to ask permission before you do X, but I still want to approve every Y."
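The "don't ask for X, but approve every Y" checkpoint pattern can be captured in a few lines of policy code. Everything here (the action names, the dollar threshold, the `requires_human_approval` helper) is a hypothetical illustration of the idea, not code from the episode.

```python
# Actions the agent may take autonomously once trust has been established.
AUTONOMOUS = {"answer_support_ticket", "summarize_logs"}

def requires_human_approval(action: str, amount: float = 0.0) -> bool:
    """Return True when a human must confirm before the agent acts."""
    if action in AUTONOMOUS and amount == 0.0:
        return False  # low-risk, pre-approved work runs unattended
    if action == "transfer_funds" and amount >= 10_000:
        return True   # high-value transfers always need sign-off
    return True       # default to human-in-the-loop for anything unknown
```

Expanding autonomy over time then amounts to moving actions into the `AUTONOMOUS` set as the system proves itself, mirroring the vacation-colleague analogy above.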
In this episode, we refer to Thinking in Systems by Donella Meadows, Designing Machine Learning Systems by Chip Huyen, and Build a Large Language Model (From Scratch) by Sebastian Raschka. About Ran Aroussi Ran Aroussi is the founder of MUXI, an open framework for production-ready AI agents. He is also the co-creator of yfinance (with 10 million downloads monthly) and founder of Tradologics and Automaze. Ran is the author of the forthcoming book Production-Grade Agentic AI: From Brittle Workflows to Deployable Autonomous Systems, also available at productionaibook.com. You can connect with Ran Aroussi on LinkedIn.
Perform 2026 felt like a turning point for Dynatrace, and when Steve Tack joined me for his fourth appearance on the show, it was clear this was not business as usual. We began with a little Perform nostalgia, from Dave Anderson's unforgettable "Full Stack Baby" moment to the debut of AI Rick on the keynote stage. But the humor quickly gave way to substance. Because beneath the spectacle, Dynatrace introduced something that signals a broader shift in observability: Dynatrace Intelligence. Steve was candid about the problem they set out to solve. Too much focus on ingesting data. Too much time spent stitching tools together. Too many dashboards. Too many alerts. The real opportunity, he argued, is turning telemetry into trusted, automated action. And that means blending deterministic AI with agentic systems in a way enterprises can actually trust. We unpacked what that looks like in practice. From United Airlines using a digital cockpit to improve operational performance, to TELUS and Vodafone demonstrating measurable ROI on stage, the emphasis at Perform was firmly on production outcomes rather than pilot projects. As Steve put it, the industry has spent long enough in "pilot purgatory." The next phase demands real-world deployment and real return. A big part of that confidence comes from the foundations Dynatrace has laid with Grail and Smartscape. By combining unified telemetry in its data lakehouse with real-time topology mapping and causal AI, Dynatrace is positioning itself as the engine behind explainable, trustworthy automation. When hyperscaler agents from AWS, Azure, or Google Cloud call Dynatrace Intelligence, they are expected to receive answers grounded in causal context rather than probabilistic guesswork. We also explored what this means for developers, who often carry the burden of alert fatigue and fragmented tooling. New integrations into VS Code, Slack, Atlassian, and ServiceNow aim to bring observability directly into the developer workflow. 
The goal is simple in theory and complex in execution: keep engineers in their flow, reduce toil, and amplify human decision-making rather than replace it. Of course, autonomy raises questions about risk. Steve acknowledged that for now, humans remain firmly in the loop, with most agentic interactions still requiring checkpoints. But as trust grows, so will the willingness to let systems self-optimize, self-heal, and remediate issues automatically. We closed by zooming out. In a market saturated with AI claims, Steve encouraged listeners to bet on change rather than cling to the status quo. There will be hype. There will be agent washing. But there is also real value emerging for those prepared to experiment, learn, and scale responsibly. If you want to understand where AI observability is heading, and how deterministic and agentic intelligence can coexist inside enterprise operations, this episode offers a grounded, practical perspective straight from the Perform show floor.
In this episode of Tech Talks Daily, I sat down with Jinsook Han, Chief Agentic AI Officer at Genpact, to unpack one of the most misunderstood shifts in enterprise AI right now. Many organizations feel confident about the value AI can deliver, yet only a small fraction are able to move beyond pilots and into autonomous operations that actually scale. Genpact's Autonomy By Design research puts hard data behind that gap, and Jinsook explains why optimism often races ahead of readiness. We explore why agentic AI changes the rules entirely. When AI systems begin to act, decide, and adapt on behalf of the business, familiar operating models start to strain. Jinsook makes a compelling case that agentic AI cannot be treated like another software rollout. It demands a rethink of data, governance, roles, and even how teams define work itself. The shift from tools to teammates alters expectations for people across the organization, from frontline operators to the C-suite, and exposes just how unprepared many companies still are. Governance is a major theme throughout the conversation, but not in the way most leaders expect. Rather than slowing progress, Jinsook argues that governance must become part of how work happens every day. She shares how Genpact approaches agent certification, maturity, and oversight, using vivid analogies to explain why quality and alignment matter more than simply deploying large numbers of agents. We also dig into why many governance models fail, especially when they rely on committees instead of lived understanding. Upskilling sits at the heart of this transformation. Jinsook walks through how Genpact is training more than 130,000 employees for an agentic future, starting with executives themselves. The focus is not on abstract learning, but on proving that today's work looks different from yesterday's. 
Observability, explainability, and responsible AI are woven into this approach, with command centers designed to monitor both agent performance and health, turning early signals into opportunities rather than panic. This conversation goes well beyond hype. It is about readiness, responsibility, and the reality of building autonomous systems that still depend on human judgment. As organizations rush toward agentic AI, are they truly prepared to change how decisions are made, how people work, and how accountability is defined, or are they still treating AI as a faster hammer rather than a new kind of teammate? Useful Links Connect with Jinsook Han Learn More about Genpact
Checo, CEO and co-founder of Ottermon AI, joins Dash0's Mirko Novakovic to argue that modern observability is rife with noise, reactivity and human bottlenecks. Drawing on years of frontline SRE and product experience, Checo explains why most telemetry is wasted, how signal distillation and “fingerprinting” can surface real risk earlier, and why observability must shift from dashboards and alerts to prescriptive, prevention-first intelligence.
Software Engineering Radio - The Podcast for Professional Software Developers
Yechezkel "Chez" Rabinovich, CTO and co-founder at Groundcover, joins SE Radio host Brijesh Ammanath to discuss the key challenges in migrating observability toolsets. The episode starts with a look at why customers might seek to migrate their existing Observability stack, and then Chez explains some approaches and techniques for doing so. The discussion turns to OpenTelemetry, including what it is and how Groundcover helps with the migration of dashboards, monitors, pipelines, and integrations that are proprietary to vendor products. Chez describes methods for validating a successful migration, as well as metrics and signals that engineering teams can use to assess the migration health. Brought to you by IEEE Computer Society and IEEE Software magazine.
Developers are not just writing code anymore. They are starting to run a virtual team. At AWS re:Invent, I had a conversation with Jemiah Sius, VP, Market Strategy and Developer Relations, from New Relic about how AI is changing the day-to-day life of developers. This was one of those chats that makes you pause and rethink how software will be built very soon.

Here is what stood out:
- Agentic AI is becoming real for developers: Teams are excited about agents that behave like a digital team or a virtual SRE, taking care of reliability and performance while developers focus on building features
- Developers are becoming orchestrators: Over the next 6 to 8 months, the role of the developer is shifting. Less time writing every line of code, more time directing agents and tools. This shift is already driving a big jump in productivity
- Observability matters more than ever: As agents start working across multiple LLM servers and interacting with other agents, visibility becomes critical. Without observability across the full agent layer, things can quickly create more work instead of less
- New Relic and AWS coming together: We talked about the New Relic integration with AWS Q, which brings observability data directly into AWS DevOps workflows, and the new security agent that surfaces real production data on vulnerabilities

It was great catching up with Jemiah again and hearing how New Relic is thinking about the future of developers and reliability.

#Data #AI #AWSRecipes #NewRelic #AgenticAI #Security #MCP #reinvent #TheRavitShow
Fresh out of the studio, Patrick Kelly, Vice President for Asia Pacific at Arize AI, joins us to explore the critical world of AI observability, evaluation, and infrastructure, and how Arize AI will start its go-to-market across the region. Beginning with his transition from Databricks to Arize AI, Patrick explained how the company's mission centers on making AI work for people by helping teams observe, evaluate, and continuously improve their AI agents in production. Emphasizing that evaluations are the most important requirement for AI systems in 2025-2026, he revealed a striking insight: approximately 50% of AI agents fail silently in production because organizations don't know what's happening. Through compelling case studies from Booking.com, Flipkart, and AT&T, Patrick explained how Arize AI enables real-time observability and online evaluations, achieving results like 40% accuracy improvements and 84% cost reductions. Patrick concluded by sharing his vision for success across Asia Pacific's diverse markets - from regulatory frameworks in Korea and Singapore to language localization challenges in Vietnam - emphasizing the three pillars that remain constant: helping customers make money, control costs, and manage risk in an era where AI governance has become paramount. Last but not least, he shares what great would look like for Arize AI in the Asia Pacific.

"The mission is to make AI work for the people. It's about getting AI working for everybody—consumers, customers, and businesses at large. Evals are the most important things that we've seen through 2025 and will see more of into 2026; they are the most important thing for systems to work. When I'm working with a customer, I ask: How are we going to help them make money? How are we going to help them control costs? And how are we going to help them manage risk?
A lot of AI now is about managing risk."

Episode Highlights:
[00:00] Quote of the Day by Patrick Kelly
[01:10] Bernard introduces AI evaluation and infrastructure topic
[02:24] Patrick's journey from Databricks to Arize AI
[03:20] Arize AI's mission: making AI work for people
[04:00] Understanding agentic systems and their complexity
[05:18] Observability, evaluation, and development framework explained
[06:27] Creating continuous feedback loops for AI improvement
[07:00] On-premises and air-gapped deployment capabilities
[08:00] Open Telemetry and Open Inference standards
[09:08] Evaluations are critical for 2025-2026 success
[10:36] Booking.com case: real-time production AB testing
[14:36] Phoenix open source and Open Inference: entry to Arize ecosystem
[16:00] Travel industry use cases: Skyscanner and Flipkart
[17:53] AT&T case: 40% accuracy improvement, 84% cost reduction
[19:36] 50% of production agents fail silently
[20:26] Korea and Singapore MAS launches AI risk management framework
[22:08] Arize AI CEO's 10 predictions for AI 2026
[22:41] Cursor for X: AI engineering everywhere
[24:06] Context and session state matter critically
[26:27] Harness: new buzzword for agent orchestration
[34:13] Three pillars: make money, control costs, manage risk
[36:00] Asia Pacific diversity: India to Japan
[37:12] Language and cultural nuances in evaluations
[38:00] Closing

Profile: Patrick Kelly, Vice President, Asia Pacific, Arize AI
LinkedIn Profile: https://www.linkedin.com/in/patrick-kelly-aab6168/?ref=analyse.asia

Podcast Information: Bernard Leong hosts and produces the show. The intro and end music is "Energetic Sports Drive." G. Thomas Craig mixed and edited the episode in both video and audio format.
Kris Beevers is the CEO at NetBox Labs, working on turning NetBox into the system of record and automation backbone for modern and AI-driven infrastructure.

Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth // MLOps Podcast #359 with Kris Beevers, CEO of NetBox Labs

Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide

// Abstract
Hundreds of neocloud operators and "AI Factory" builders have emerged to serve the insatiable demand for AI infrastructure. These teams are compressing the design, build, deploy, operate, scale cycle of their infrastructures down to months, while managing massive footprints with lean teams. How? By applying modern intent-driven infrastructure automation principles to greenfield deployments. We'll explore how these teams carry design intent through to production, and how operating and automating around consistent infrastructure data is compressing "time to first train".

// Bio
Kris Beevers is the Co-founder and CEO of NetBox Labs. NetBox is used by nearly every Neocloud and AI datacenter to manage their networks and infrastructure. Kris is an engineer at heart and by background, and loves the leverage infrastructure innovation creates to accelerate technology and empower engineers to do their best work. A serial entrepreneur, Kris has founded and helped lead multiple other successful businesses in the internet and network infrastructure. Most recently, he co-founded and led NS1, which was acquired by IBM in 2023. He holds a Ph.D.
in Computer Science from Rensselaer Polytechnic Institute and is based in New Jersey.

// Related Links
Website: https://netboxlabs.com/
Coding Agents Conference: https://luma.com/codingagents

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community: https://go.mlops.community/slack
Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)
Sign up for the next meetup: https://go.mlops.community/register
MLOps Swag/Merch: https://shop.mlops.community/
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Kris on LinkedIn: /beevek/

Timestamps:
[00:00] Observability and Delta Analysis
[00:26] New World Exploration
[04:06] Bottlenecks in AI Infrastructure
[13:37] Data Center Optimization Challenges
[19:58] Tech Stack Breakdown
[25:26] Data Center Design Principles
[31:32] Constraints and Automation in Design
[40:00] Complexity in Data Centers
[45:02] GPU Cloud Landscape
[50:24] Data Centers in Containers
[57:45] Observability Beyond Software
[1:04:43] Tighter Integrations vs NetBox
[1:06:47] Wrap up
In this episode, Matt is joined by Charlie Bell, Microsoft's EVP of Security, Compliance, Identity, and Management, to discuss the future of AI and its implications on cybersecurity. The conversation revolves around IDC's prediction of 1.3 billion AI agents by 2028, Charlie's insights from his recent writing 'Beware of Double Agents', and the crucial aspects of agentic Zero Trust. They explore the benefits and risks associated with AI agents, the importance of security culture, and strategies to mitigate potential threats. Charlie also shares his experiences working with Satya Nadella and the importance of collaboration and curiosity in leadership.

Key Moments:
02:08 The Exponential Growth and Impact of AI Agents
03:47 AI Agents: Beyond Conversational Interfaces
05:48 Security Challenges in the Age of AI Agents
06:57 Parallels Between Cloud Adoption and AI Agent Era
09:19 Democratization of AI: From Developers to Everyone
13:57 The Concept of Double Agents in AI
16:07 New Attack Vectors and Security Concerns
21:43 Combating Security Challenges in AI
22:07 The Importance of Identity and Containment
23:50 Alignment and Intent in AI Systems
27:08 Observability and Accountability of AI Agents
30:00 AI in Security and Assumed Breach
33:17 Fostering a Culture of Security
38:45 Leadership Insights from Satya Nadella

Key Links:
Microsoft
Connect with Charlie on LinkedIn

Mentioned in this episode:

Free report from HatchWorks AI — State of AI 2026
What's real in AI this year, what's hype, and what leaders should prioritize — including production lessons, designing for agents, and governance. https://hatchworks.com/state-of-ai-2026/

AI Opportunity Finder
Feeling overwhelmed by all the AI noise out there? The AI Opportunity Finder from HatchWorks cuts through the hype and gives you a clear starting point. In less than 5 minutes, you'll get tailored, high-impact AI use cases specific to your business—scored by ROI so you know exactly where to start.
Whether you're looking to cut costs, automate tasks, or grow faster, this free tool gives you a personalized roadmap built for action.
From Systems Engineer in Aeronautics via many clouds to SRE in Observability! That's the path of our guest, Alexandra Franz, who is a Lead Product Engineer in SRE at Dynatrace. Tune in and learn how their team plans ahead for expected high traffic around Black Friday, Cyber Monday, or the Super Bowl. We discuss how regional traffic patterns and differences in available hardware get factored into capacity management and cost control. We also learn why global cloud outages are stressful - but also how those incidents can be the reward for a good SRE.

Make sure to connect with Alexandra on LinkedIn: https://www.linkedin.com/in/alexandrafranz/
Join us as Neel explores how observability is evolving beyond traditional logs, metrics, and traces into a predictive, AI-powered discipline. Neel walks through the evolution of Observability, demonstrating how OpenTelemetry, machine learning, and LLMs are transforming how we monitor and maintain modern applications. You'll learn about dynamic sampling techniques that reduce costs while maintaining visibility, how ML algorithms detect anomalies before they cause outages, and practical implementations using tools like the OpenTelemetry Collector. This episode covers real-world scenarios from reducing massive log volumes to predicting system failures before they impact customers. Timestamps 0:00 Welcome & Introduction 4:29 Neel's Background & Community Work 5:03 The Evolution of Observability 6:29 The 2 AM Production Incident Scenario 8:13 OpenTelemetry's Role in Modern Observability 12:45 Dynamic Sampling Techniques 18:22 ML & AI in Anomaly Detection 24:16 LLM Observability Explained 28:32 Cost Optimization Strategies 30:04 Context Windows & Token Management 32:00 Self-Healing Systems Discussion 34:15 Edge Cases: When Dynamic Sampling Doesn't Work 36:27 Wrap-up & Resources How to find Neel: https://www.linkedin.com/in/neelcshah/ https://bento.me/neelshah Links from the show: https://neelshah.dev/blogs/observability-2 https://opentelemetry.io/ https://middleware.io/blog/observability-2-0/
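As a rough illustration of the dynamic sampling idea Neel discusses (keep the interesting traces, sample the rest), here is a minimal sampling decision function. The trace shape, the 500 ms latency threshold, and the 10% baseline rate are assumptions for the sketch; a production setup would more likely configure this kind of policy in the OpenTelemetry Collector rather than in application code.

```python
import random

def should_keep(trace: dict, baseline_rate: float = 0.10) -> bool:
    """Decide whether to retain a trace: always keep errors and slow
    outliers, and probabilistically sample the healthy majority."""
    if trace.get("status") == "error":
        return True  # never drop failing traces
    if trace.get("duration_ms", 0) > 500:
        return True  # keep slow outliers for latency debugging
    return random.random() < baseline_rate  # sample routine traffic
```

This is how dynamic sampling cuts log and trace volume without losing visibility: the signals you debug with (errors, tail latency) survive at 100%, while the cost-dominating healthy traffic is thinned out.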
Your company's goldmine? All those meetings and call recordings. It's the fuel that AI needs. But here's the big letdown: those call transcripts only pick up the words. Not what they mean. And the difference? Well… that can make all the difference. But some new technology might change what's possible. Join us as we talk about it. AI Can Finally Hear What You Actually Mean. What this unlocks — An Everyday AI chat with Jordan Wilson and Modulate's Mike Pappas.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion on LinkedIn: Thoughts on this? Join the convo on LinkedIn and connect with other AI leaders.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
- Modulate Velma Voice Native AI Model Overview
- Tone, Emotion, and Intent in Voice AI
- Differentiating Text vs. True Voice Understanding
- Real-World Voice AI Use Cases in Fraud Detection
- Synthetic Voice and Deepfake Detection Techniques
- Ensemble Listening Model (ELM) Technology Explained
- Voice AI for Customer Service and Support
- Trust, Compliance, and Observability in Voice AI Agents
- Cost and Scalability Challenges for Voice AI
- Future Impact of Voice AI on Customer Relationships

Timestamps:
00:00 "Modulate: AI That Understands Tone"
06:15 "AI Use Cases Beyond Gaming"
07:13 "Detecting Abuse and Fraud"
13:19 Dynamic Model Orchestration Innovation
16:22 "Context-Aware AI for Conversations"
17:44 "Voice AI Transforming Customer Service"
22:49 AI Accountability and Compliance Challenges
25:36 AI, Customers, and Brand Trust
28:05 "Enhancing Communication Through AI"

Keywords: Voice AI, voice native AI, voice understanding, tone detection AI, intent detection, emotional AI, prosody analysis, real-time fraud detection, synthetic voice detection, AI guardrails, deepfake detection, customer support AI, call analysis

Send Everyday AI and Jordan a text
message. (We can't reply back unless you leave contact info)

Human-Level Voice Intelligence, 100x Faster. Try Velma from Modulate today.
#335: Observability tools have exploded in recent years, but most come with a familiar tradeoff: either pay steep cloud vendor markups or spend weeks building custom dashboards from scratch. Coroot takes a different path as a self-hosted, open source observability platform that prioritizes simplicity over flexibility. Using eBPF technology, Coroot automatically instruments applications without requiring code changes or complex configuration, delivering what co-founder Peter Zaitsev calls opinionated observability—a philosophy of less is more that aims to reduce cognitive overload rather than drowning users in endless metrics and dashboards. The conversation explores how Coroot differentiates itself in a crowded market with over a hundred observability vendors. Rather than competing head-to-head with cloud giants like Datadog and Dynatrace, Coroot focuses on developers who need answers fast without building elaborate monitoring systems. The platform combines systematic root cause analysis with AI-powered recommendations, using deterministic methods to trace how errors propagate through microservices before handing off to LLMs for actionable fix suggestions. Darin and Viktor dig into Coroot's business model with Peter, examining why the company chose Apache 2.0 licensing instead of more restrictive options, and how staying bootstrapped with minimal angel funding allows them to play the long game without pressure to chase every hype cycle.

Peter's contact information:
X: https://x.com/PeterZaitsev
Bluesky: https://bsky.app/profile/peterzaitsev.bsky.social
LinkedIn: https://www.linkedin.com/in/peterzaitsev/
YouTube channel: https://youtube.com/devopsparadox

Review the podcast on Apple Podcasts: https://www.devopsparadox.com/review-podcast/
Slack: https://www.devopsparadox.com/slack/
Connect with us at: https://www.devopsparadox.com/contact/
Mike Oaten is the Founder and CEO of TIKOS, working on building AI assurance, explainability, and trustworthy AI infrastructure, helping organizations test, monitor, and govern AI models and systems to make them transparent, fair, robust, and compliant with emerging regulations. Cracking the Black Box: Real-Time Neuron Monitoring & Causality Traces // MLOps Podcast #358 with Mike Oaten, Founder and CEO of TIKOS. Join the Community: https://go.mlops.community/YTJoinIn Get the newsletter: https://go.mlops.community/YTNewsletter // Abstract As AI models move into high-stakes environments like Defence and Financial Services, standard input/output testing, evals, and monitoring are becoming dangerously insufficient. To achieve true compliance with the EU AI Act, NIST AI RMF, and other requirements, MLOps teams need to access and analyse the internal reasoning of their models. In this session, Mike introduces the company's patent-pending AI assurance technology that moves beyond statistical proxies. He breaks down the architecture of the Synapses Logger, which embeds directly into the neural activation flow to capture weights, activations, and activation paths in real time. // Bio Mike Oaten serves as the CEO of TIKOS, leading the company's mission to advance trustworthy AI through unique, high-performance AI model assurance technology. A seasoned technical and data entrepreneur, Mike brings experience from co-founding and exiting two previous data science startups: Riskopy Inc. (acquired by Nasdaq-listed Coupa Software in 2017) and Regulation Technologies Limited (acquired by mnAi Data Solutions in 2022). Mike's expertise spans data, analytics, and ML product and governance leadership.
At TIKOS, Mike leads a VC-backed team developing technology to test and monitor deep-learning models in high-stakes environments, such as defence and financial services, so they comply with stringent new laws and regulations. // Related Links Website: https://tikos.tech/ LLM guardrails: https://medium.com/tikos-tech/your-llm-output-is-confidently-wrong-heres-how-to-fix-it-08194fdf92b9 Model Bias: https://medium.com/tikos-tech/from-hints-to-hard-evidence-finally-how-to-find-and-fix-model-bias-in-dnns-2553b072fd83 Model Robustness: https://medium.com/tikos-tech/tikos-spots-neural-network-weaknesses-before-they-fail-the-iris-dataset-b079265c04da GPU Optimisation: https://medium.com/tikos-tech/400x-performance-a-lightweight-open-source-python-cuda-utility-to-break-vram-barriers-d545e5b6492f Hyperbolic GPU Cloud: app.hyperbolic.ai Coding Agents Conference: https://luma.com/codingagents ~~~~~~~~ ✌️ Connect With Us ✌️ ~~~~~~~ Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore Join our Slack community: https://go.mlops.community/slack Follow us on X/Twitter @mlopscommunity (https://x.com/mlopscommunity) or LinkedIn (https://go.mlops.community/linkedin) Sign up for the next meetup: https://go.mlops.community/register MLOps Swag/Merch: https://shop.mlops.community/ Connect with Demetrios on LinkedIn: /dpbrinkm Connect with Mike on LinkedIn: /mike-oaten/ Timestamps: [00:00] Regulations as Opportunity [00:25] Regulation Compliance Fun [02:49] AI Act Layers Explained [05:19] Observability in Systems vs ML [09:05] Risk Transfer in AI [11:26] LLMs and Model Approval [14:53] LLMs in Finance [17:17] Hyperbolic GPU Cloud Ad [18:16] Stakeholder Alignment and Tech [22:20] AI in Regulated Environments [28:55] Autonomous Boat Regulations [34:20] Data Compliance Mapping [39:11] Data Capture Strategy [41:13] EU AI Act Insights [44:52] Wrap up [45:45] Join the Coding Agents Conference!
Lior Gavish, CTO and co-founder of Monte Carlo Data, joins Ben Lorica to discuss the critical transition from data observability to agent observability in production environments. Subscribe to the Gradient Flow Newsletter
As managing Macs evolves, it's no longer enough to just configure the devices we are responsible for. What happened after we configured them? Are they still configured that way? Has the user managed to get around the controls we put in place? Eric Metzger joins us to discuss the different tools that we as Mac Admins can use to keep an eye on our fleet without slowing devices down or stopping users from doing their jobs. Hosts: Tom Bridge - @tbridge@theinternet.social Marcus Ransom - @marcusransom Guests: Eric Metzger - LinkedIn Links: SIEMply Irresistable - JNUC 2025 Jamf Protect Telemetry Data Model (Free Jamf ID required) Sponsors: Iru Fleet Device Management Meter If you're interested in sponsoring the Mac Admins Podcast, please email podcast@macadmins.org for more information. Get the latest about the Mac Admins Podcast, follow us on Twitter! We're @MacAdmPodcast! The Mac Admins Podcast has launched a Patreon Campaign! Our named patrons this month include Weldon Dodd, Damien Barrett, Justin Holt, Chad Swarthout, William Smith, Stephen Weinstein, Seb Nash, Dan McLaughlin, Joe Sfarra, Nate Cinal, Jon Brown, Dan Barker, Tim Perfitt, Ashley MacKinlay, Tobias Linder, Philippe Daoust, AJ Potrebka, Adam Burg, & Hamlin Krewson
Security, compliance, and resilience are the cornerstones of trust. In this episode, Lois Houston and Nikita Abraham continue their conversation with David Mills and Tijo Thomas, exploring how Oracle Cloud Infrastructure empowers organizations to protect data, stay compliant, and scale with confidence. Real-world examples from Zoom, KDDI, 8x8, and Uber highlight these capabilities. Cloud Business Jumpstart: https://mylearn.oracle.com/ou/course/cloud-business-jumpstart/152957 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Communications and Adoption with Customer Success Services, and with me is Nikita Abraham, Team Lead: Editorial Services with Oracle University. Nikita: Hi everyone! In our last episode, we started the conversation around the real business value of Oracle Cloud Infrastructure and how it helps organizations create impact at scale. Lois: Today, we're taking a closer look at what keeps the value strong — things like security, compliance, and the technology that helps businesses stay resilient. To walk us through it, we have our experts from Oracle University, David Mills, Senior Principal PaaS Instructor, and Tijo Thomas, Principal OCI Instructor. 01:12 Nikita: Hi David and Tijo! It's great to have you both here! Tijo, let's start with you. 
How does Oracle Cloud Infrastructure help organizations stay secure? Tijo: OCI uses a security-first approach to protect customer workloads. This is done by implementing a Zero Trust model. A Zero Trust security model uses frequent user authentication and authorization to protect assets while continuously monitoring for potential breaches. It assumes that no user, device, or application is universally trusted. Continuous verification is always required. Access is granted only based on the context of the request, the level of trust, and the sensitivity of the asset. There are three strategic pillars that Oracle's security-first approach is built on. The first one is being automated. With automation, the business doesn't have to rely on manual work to stay secure. Threat detection, patching, and compliance checks all happen automatically, reducing human error and saving time. Security in OCI is always turned on. Encryption is automatic. Identity checks are continuous. Security is not an afterthought in OCI; it is incorporated into every single layer. Now, while we talk about Oracle's security-first approach, remember that security is a shared responsibility. That means while Oracle handles the data center, the hardware, and the infrastructure software, customers are responsible for securing their apps, configurations, and data. 03:06 Lois: Tijo, let's discuss this with an example. Imagine an online store called MuShop. They're a fast-growing business selling cat products. Can you walk us through how a business like this can enhance its end-to-end security and compliance with OCI? Tijo: First of all, let's focus on securing the web servers. These servers host the web portal where customers browse, log in, and place their orders. So these web servers are a prime target for attackers. To protect these entry points, MuShop deployed a service called OCI Web Application Firewall.
On top of that, MuShop has also used OCI security lists and network security groups to control traffic flow. As the business grows, new users such as developers, operations, and finance staff all need to be onboarded. OCI identity services are used to assign roles, for example, giving developers access to only the dev instances, while finance can access just the billing dashboards. MuShop also requires MFA (multi-factor authentication), using both a password and a time-based authentication code to verify identities. As for critical customer data like emails, addresses, and payment info, this data is stored in databases and storage. Using OCI Vault, the data is encrypted with customer-managed keys. Oracle Data Safe is another service, used to audit who has access to sensitive tables and to mask real customer data in non-production environments. 04:59 Nikita: Once those systems are in place, how can MuShop use OCI tools to detect and respond to threats quickly? Tijo: For that, MuShop used a service called OCI Cloud Guard. Think of it like a security operations center built right into OCI. It monitors the entire OCI environment continuously, and it can track identity activities, storage settings, network configurations, and much more. If it finds something risky, like a publicly exposed object storage bucket, or maybe a user with overly broad access to the environment, it raises a security finding. And better yet, it can automatically respond. So if someone creates a resource outside of policy, OCI Cloud Guard can disable it. 05:48 Lois: And what about preventing misconfigurations? How does OCI make that easier while keeping operations secure? Tijo: OCI Security Zones is another service, used to enforce security postures in OCI. These zones help you avoid accidental misconfigurations.
For example, in a security zone, you can prevent users from creating a storage bucket that is publicly accessible. To stay ahead of vulnerabilities, MuShop runs OCI Vulnerability Scanning. They have scheduled weekly scans to catch any outdated libraries or misconfigurations. OCI Security Advisor is another service, used to flag unused open ports and recommend stronger access rules. MuShop needed more than just security. They also had to be compliant. OCI's compliance certifications have helped them meet data privacy and security regulations across different regions and industries. There are additional services like OCI audit logs for traceability that help them pass internal and external audits. 07:11 Oracle University is proud to announce three brand new courses that will help your teams unlock the power of Redwood—the next generation design system. Redwood enhances the user experience, boosts efficiency, and ensures consistency across Oracle Fusion Cloud Applications. Whether you're a functional lead, configuration consultant, administrator, developer, or IT support analyst, these courses will introduce you to the Redwood philosophy and its business impact. They'll also teach you how to use Visual Builder Studio to personalize and extend your Fusion environment. Get started today by visiting mylearn.oracle.com. 07:52 Nikita: Welcome back! We know that OCI treats security as a continuous design principle: automated, always on, and built right into the platform. David, do you have a real-world example of a company that needed to scale rapidly and was able to do so successfully with OCI? David: In late 2019, Zoom averaged 10 million meeting participants a day. By April 2020, well, that number surged to over 300 million as video conferencing became essential for schools, businesses, and families around the world due to the global pandemic.
To meet that explosive demand, Zoom chose OCI not just for performance, but for the ability to scale fast. In just nine hours, OCI engineers helped Zoom move from deployment to live production, handling hundreds of thousands of concurrent meetings immediately. Within weeks, they were supporting millions. And Zoom didn't just scale, they sustained it. With OCI's next-gen architecture, Zoom avoided the performance bottlenecks common in legacy clouds. They used OCI Functions and cloud native services to scale workloads flexibly and securely. Today, Zoom transfers more than seven petabytes of data per day through Oracle Cloud. That's enough bandwidth to stream HD video continuously for 93 years. And they do it while maintaining high availability, low latency, and enterprise-grade security. As articulated by their CEO Eric Yuan, Zoom didn't just meet the moment, they redefined it with OCI behind the scenes. 09:45 Nikita: That's an incredible story about scale and agility. Do you have more examples of companies that turned to OCI to solve complex data or integration challenges? David: Telecom giant KDDI, with over 64 million subscribers, faced a growing data dilemma. Data was everywhere: survey results, system logs, behavioral analytics, but it was scattered across thousands of sources. Different tools for different tasks created silos, delays, and rising costs. KDDI needed a single platform to connect it all, and they chose Oracle. They replaced their legacy data systems with a modern data platform built on OCI and Autonomous Database. Now they can analyze behavior, improve service planning, and make faster, smarter decisions without the data chaos. But KDDI didn't stop there. They built a 300-terabyte data lake and connected all their systems: custom on-prem apps, SaaS providers like Salesforce, and even multi-cloud infrastructure. Thanks to Oracle Integration and pre-built adapters, everything works together in real time, even across clouds.
AWS, Azure, and OCI now operate in harmony. The results? Reduced operational costs, faster development cycles, and improved governance and API access across the board. KDDI can now analyze customer behavior to improve services, like where to expand their 5G network. Next up, 8x8 powers communication for over 55,000 companies in 160 countries, with more than 3 million users depending on its voice, video, and messaging tools every day. To maintain that scale, they needed a cloud that could deliver low latency, global availability, and high performance without blowing up costs. Well, they moved their video meeting services from Amazon to OCI and went live in just four days. The results? A 25% increase in performance per node, an 80% reduction in network egress costs, and a significantly lower overall infrastructure spend. But this wasn't just a lift and shift. 8x8 also replaced legacy tools with Oracle Logging Analytics, giving their teams a single view across apps, infrastructure, and regions. 8x8 scaled up fast. They migrated core voice services, deployed over 300 microservices using OCI Kubernetes, and now run over 1,700 nodes across 26 global OCI regions. In addition, OCI's Ampere-based virtual machines gave them a major boost, sustaining 80% CPU utilization and more than 30% increased performance per core with no degradation. And with OCI's Observability and Management platform, they gained real-time visibility into application health across both on-prem and cloud. Bottom line, 8x8 represents yet another excellent example of a company leveraging OCI for maximum business results. 13:24 Lois: Uber handles more than a million trips per hour, and Oracle Cloud Infrastructure is an integral part of making that possible. Can you walk us through how OCI supports Uber's needs? David: Uber, the world's largest on-demand mobility platform, handles over 1 million trips every hour. And behind the scenes, OCI is helping to make that possible.
In 2023, Uber began migrating thousands of microservices, data platforms, and AI models to OCI. Why? Because OCI provides the automation, flexibility, and infrastructure scale needed to support Uber's explosive growth. Today, Uber uses OCI Compute to handle massive trip-serving traffic and OCI Object Storage to replace one of the largest Hadoop-based data environments in the industry. They needed global reach and multi-cloud compatibility, and OCI delivered. But it's not just scale, it's intelligence. Uber runs dozens of AI models on OCI to support real-time predictions, up to 14 million per second. From ride pricing to traffic patterns, this AI layer powers every trip behind the scenes. And by shifting stateless workloads to OCI Ampere ARM Compute servers, Uber reduced cost while increasing CPU efficiency. For AI inferencing, Uber uses OCI's AI infrastructure to strike the perfect balance between speed, throughput, and cost. So the next time you use your Uber app to schedule a ride, consider what happens behind the scenes with OCI.
What happens when engineering teams can finally see the business impact of every technical decision they make? In this episode of Tech Talks Daily, I sat down with Chris Cooney, Director of Advocacy at Coralogix, to unpack why observability is no longer just an engineering concern, but a strategic lever for the entire business. Chris joined me fresh from AWS re:Invent, where he had been challenging a long-standing assumption that technical signals like CPU usage, error rates, and logs belong only in engineering silos. Instead, he argues that these signals, when enriched and interpreted correctly, can tell a much more powerful story about revenue loss, customer experience, and competitive advantage. We explored Coralogix's Observability Maturity Model, a four-stage framework that takes organizations from basic telemetry collection through to business-level decision making. Chris shared how many teams stall at measuring engineering health, without ever connecting that data to customer impact or financial outcomes. The conversation became especially tangible when he explained how a single failed checkout log can be enriched with product and pricing data to reveal a bug costing thousands of dollars per day. That shift, from "fix this tech debt" to "fix this issue draining revenue," fundamentally changes how priorities are set across teams. Chris also introduced Oli, Coralogix's AI observability agent, and explained why it is designed as an agent rather than a simple assistant. We talked about how Oli can autonomously investigate issues across logs, metrics, traces, alerts, and dashboards, allowing anyone in the organization to ask questions in plain English and receive actionable insights. From diagnosing a complex SQL injection attempt to surfacing downstream customer impact, Oli represents a move toward democratizing observability data far beyond engineering teams. Throughout our discussion, a clear theme emerged. 
When technical health is directly tied to business health, observability stops being seen as a cost center and starts becoming a competitive advantage. By giving autonomous engineering teams visibility into real-world impact, organizations can make faster, better decisions, foster innovation, and avoid the blind spots that have cost even well-known brands millions. So if observability still feels like a necessary expense rather than a growth driver in your organization, what would change if every technical signal could be translated into clear business impact, and who would make better decisions if they could finally see that connection? Useful Links Connect with Chris Cooney Learn more about Coralogix Follow on LinkedIn Thanks to our sponsors, Alcor, for supporting the show.
The world of IT is becoming increasingly complex, making the need for real-time visibility and intelligence more critical than ever. In this episode, we sit down with Principal Technologists Michael Sasse and Matthew Bednar to explore the expanding field of data observability. We dive deep into defining observability, moving beyond basic latency graphs to connecting system metrics and machine data with actual business outcomes. Hear their perspectives on why observability is now at the forefront of every IT leader's mind. Our conversation then tackles the current state of "Shadow Observability," where an accumulation of different tools and the creation of siloed teams lead to data fragmentation across cloud, data center, and security platforms. They underscore the necessity for a centralized, single point of truth and a unified strategy that prevents individual team decisions from negatively impacting the broader environment. Finally, our guests reveal how Pure Storage is participating in the movement toward standardization and centralization. We detail the evolution of Pure1 and a strategic shift from pull-based OpenMetrics to real-time, push-based OpenTelemetry, ensuring seamless integration across Pure's platform of storage array products. Learn how this foundational platform approach, championed by Fusion and the Enterprise Data Cloud vision, empowers IT to move from being a cost center to a critical business partner that can tangibly link infrastructure efforts to financial results.
To learn more, visit https://blog.purestorage.com/products/enhance-visibility-with-our-new-data-observability-and-monitoring-service/ Check out the new Pure Storage digital customer community to join the conversation with peers and Pure experts: https://purecommunity.purestorage.com/ 00:00 Intro and Welcome 08:30 Defining Observability 13:52 Stat of the Episode 25:49 Pure1 and Observability 28:25 Using APIs for Open Telemetry Output 32:49 Key Native Observability Integrations 39:25 Hot Takes Segment
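The pull-versus-push distinction discussed in the episode can be sketched in a few lines of Python. This is a conceptual illustration only: the classes and the metric name are invented for this sketch, and neither is Pure1's nor OpenTelemetry's actual API.

```python
import json
import queue
import time

# Pull model (OpenMetrics-style): the producer holds current values,
# and a collector scrapes a snapshot on its own schedule.
class PullMetrics:
    def __init__(self):
        self.counters = {}

    def inc(self, name, value=1):
        self.counters[name] = self.counters.get(name, 0) + value

    def scrape(self):
        # Collector only ever sees the aggregate state at scrape time.
        return dict(self.counters)

# Push model (OpenTelemetry-style): the producer emits each measurement
# as it happens, so the backend sees changes in near real time.
class PushMetrics:
    def __init__(self, sink):
        self.sink = sink  # stand-in for an exporter/transport

    def inc(self, name, value=1):
        self.sink.put(json.dumps({"name": name, "value": value,
                                  "ts": time.time()}))

sink = queue.Queue()
pull, push = PullMetrics(), PushMetrics(sink)
for _ in range(3):
    pull.inc("array.read_ops")
    push.inc("array.read_ops")

print(pull.scrape())  # one aggregate snapshot: {'array.read_ops': 3}
print(sink.qsize())   # three individual, immediately visible events
```

The practical difference is latency and granularity: a scraper sees one rolled-up snapshot per interval, while a push pipeline sees every event as it occurs, which is what makes real-time integration across a fleet of arrays feasible.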
TestTalks | Automation Awesomeness | Helping YOU Succeed with Test Automation
Performance testing often fails for one simple reason: teams can't see where the slowdown actually happens. In this episode, we explore Locust load testing and why Python-based performance testing is becoming the go-to choice for modern DevOps, QA, and SRE teams. You'll learn how Locust enables highly realistic user behavior, massive concurrency, and distributed load testing without the overhead of traditional enterprise tools. We also dive into:
- Why Python works so well for AI-assisted load testing
- How Locust fits naturally into CI/CD and GitHub Actions
- The real difference between load testing vs performance testing
- How observability and end-to-end tracing eliminate guesswork
- Common performance testing mistakes even experienced teams make
Whether you're a software tester, automation engineer, or QA leader looking to shift-left performance testing, this conversation will help you design smarter tests and catch scalability issues before your users do.
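For readers who haven't used Locust, the core idea, simulated users issuing concurrent requests while latencies are recorded and summarized, can be sketched with just the standard library. This is an illustrative stand-in (fake_request simulates a call with random latency); a real Locust test would instead define an HttpUser subclass with @task methods.

```python
import concurrent.futures
import random
import statistics
import time

def fake_request():
    """Stand-in for an HTTP call; returns latency in milliseconds."""
    latency = random.uniform(5, 50)
    time.sleep(latency / 1000)
    return latency

def run_load_test(users=20, requests_per_user=5):
    # Spawn `users` concurrent workers, each issuing several requests,
    # and collect per-request latency for percentile reporting.
    latencies = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(fake_request)
                   for _ in range(users * requests_per_user)]
        for f in concurrent.futures.as_completed(futures):
            latencies.append(f.result())
    latencies.sort()
    return {
        "requests": len(latencies),
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * len(latencies)) - 1],
    }

result = run_load_test()
print(result)
```

The percentile report, not the average, is what surfaces the tail latency that users actually feel, which is also why tracing individual slow requests end to end removes so much guesswork.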
Sales Game Changers | Tip-Filled Conversations with Sales Leaders About Their Successful Careers
This is episode 806. Read the complete transcription on the Sales Game Changers Podcast website here. Watch the video of this podcast on YouTube here. The Sales Game Changers Podcast was recognized by YesWare as the top sales podcast. Read the announcement here. FeedSpot named the Sales Game Changers Podcast as a top 20 Sales Podcast and a top 8 Sales Leadership Podcast! Subscribe to the Sales Game Changers Podcast now on Apple Podcasts! Purchase Fred Diamond's best-sellers Love, Hope, Lyme: What Family Members, Partners, and Friends Who Love a Chronic Lyme Survivor Need to Know and Insights for Sales Game Changers now! On today's show, Fred discussed where we are with AI and B2B and B2Government sales with ScienceLogic leaders Wendy Wooley, VP of Customer Advocacy and Strategy, and Lee Koepping, Global Sales Engineering at ScienceLogic. Find Wendy on LinkedIn. Find Lee on LinkedIn. WENDY'S TIP: "You can't replace human relationships in sales, but AI helps you stay proactive when organizations change." LEE'S TIP: "Sales is still very much a human game. AI won't replace that but it can get you to the right people, at the right time, with the right insight."
What happens when the systems we rely on every day start producing more signals than humans can realistically process, and how do IT leaders decide what actually matters anymore? In this episode of Tech Talks Daily, I sit down with Garth Fort, Chief Product Officer at LogicMonitor, to unpack why traditional monitoring models are reaching their limits and why AI native observability is starting to feel less like a future idea and more like a present day requirement. Modern enterprise IT now spans legacy data centers, multiple public clouds, and thousands of services layered on top. That complexity has quietly broken many of the tools teams still depend on, leaving operators buried under alerts rather than empowered by insight. Garth brings a rare perspective shaped by senior roles at Microsoft, AWS, and Splunk, along with firsthand experience running observability at hyperscale. We talk about how alert fatigue has become one of the biggest hidden drains on IT teams, including real world examples where organizations were dealing with tens of thousands of alerts every week and still missing the root cause. This is where LogicMonitor's AI agent, Edwin AI, enters the picture, not as a replacement for human judgment, but as a way to correlate noise into something usable and give operators their time and confidence back. A big part of our conversation centers on trust. AI agents behave very differently from deterministic automation, and that difference matters when systems are responsible for critical services like healthcare supply chains, airline operations, or global hospitality platforms. Garth explains why governance, auditability, and role based controls will decide how quickly enterprises allow AI agents to move from advisory roles into more autonomous ones. We also explore why experimentation with AI has become one of the lowest risk moves leaders can make right now, and why the teams who treat learning as a daily habit tend to outperform the rest. 
We finish by zooming out to the bigger picture, where observability stops being a technical function and starts becoming a way to understand business health itself. From mapping infrastructure to real customer experiences, to reshaping how IT budgets are justified in boardrooms, this conversation offers a grounded look at where enterprise operations are heading next. So, as AI agents become more embedded in the systems that run our businesses, how comfortable are you with handing them the keys, and what would it take for you to truly trust them? Useful Links Connect with Garth Fort Learn more about LogicMonitor Check out the Logic Monitor blog Follow on LinkedIn, X, Facebook, and YouTube. Alcor is the Sponsor of Tech Talks Network
In this episode of The Digital Executive, host Brian Thomas sits down with David Sztyman, Chief Architect at Hydrolix, to explore how real-time streaming data and AI are reshaping observability and security operations. Drawing on two decades of experience across streaming, caching, security, and analytics, David explains why scale remains a constant challenge—and why traditional data warehouses can't keep up with today's real-time demands.The conversation dives into the critical role of streaming data architectures in detecting issues as they happen, from video performance problems to active security threats like DDoS attacks. David also shares a pragmatic approach to AI, emphasizing how teams can use machine learning and LLMs selectively to detect anomalies without driving up costs. Looking ahead, he discusses the rise of AI agents, automated remediation, and natural-language access to data—capabilities that will make observability and security insights accessible to far more people across the enterprise.If you liked what you heard today, please leave us a review - Apple or Spotify. See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
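David's point about applying ML selectively, detect anomalies cheaply first and reserve heavier models for what gets flagged, can be illustrated with a rolling statistical check. This is a hypothetical sketch (the detector, window size, threshold, and data are all invented for illustration), not Hydrolix's implementation:

```python
from collections import deque
import math

class RollingAnomalyDetector:
    """Cheap first-pass filter: flag points far from the rolling mean.
    Only flagged points would be escalated to costlier ML/LLM analysis."""

    def __init__(self, window=50, threshold=3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        anomalous = False
        if len(self.window) >= 10:  # need some history before judging
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.threshold:
                anomalous = True
        self.window.append(value)
        return anomalous

det = RollingAnomalyDetector()
# Steady request rate with small jitter, then a DDoS-like spike.
stream = [100 + (i % 5) for i in range(40)] + [5000]
flags = [det.observe(v) for v in stream]
print(flags[-1])  # True: only the spike would be escalated
```

Running an O(window) check per event costs almost nothing, which is the economic argument for keeping LLMs out of the hot path and invoking them only on the rare escalations.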
This interview was recorded for GOTO Unscripted. https://gotopia.tech Check out more here: https://gotopia.tech/articles/407 Ben Smith - Staff Developer Advocate at Stripe James Beswick - Head of Developer Relations at Stripe RESOURCES Ben: https://twitter.com/benjamin_l_s https://github.com/bls20AWS https://linkedin.com/in/bensmithportfolio http://developeradvocate.co.uk https://thewebsmithsite.wordpress.com James: https://bsky.app/profile/jbesw.bsky.social https://twitter.com/jbesw https://linkedin.com/in/jamesbeswick Links: https://stripe.dev https://serverlessland.com DESCRIPTION James Beswick and Ben Smith explore the evolution of modern software architecture. They discuss why workflow services are essential for managing distributed systems, the challenges of microservices versus monoliths, and the power of plugin architectures. The conversation covers practical topics like idempotency, circuit breaker patterns, and the importance of observability, while also diving into what makes a great developer advocate and how to build demos that truly resonate with developers. RECOMMENDED BOOKS Simon Brown • Software Architecture for Developers Vol. 2 • https://leanpub.com/visualising-software-architecture David Farley • Modern Software Engineering • https://amzn.to/3GI468M Kim, Humble, Debois, Willis & Forsgren • The DevOps Handbook • https://amzn.to/47oAf3l Simon Wardley • Wardley Maps • https://amzn.to/45U8Upr Simon Wardley • Wardley Mapping, The Knowledge • https://amzn.to/3XQEeDu David Anderson, Mark McCann & Michael O'Reilly • The Value Flywheel Effect • https://amzn.to/3VcHxC Mike Amundsen • Restful Web API Patterns & Practices Cookbook • https://amzn.to/3C74fpH Bluesky Twitter Instagram LinkedIn Facebook CHANNEL MEMBERSHIP BONUS Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join Looking for a unique learning experience? Attend the next GOTO conference near you!
Get your ticket: gotopia.tech SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
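The circuit breaker pattern the episode description mentions can be sketched in a few lines. This is a generic, minimal illustration (the class, thresholds, and the flaky downstream are invented for this sketch), not an implementation discussed in the interview:

```python
import time

class CircuitBreaker:
    """After `max_failures` consecutive failures the circuit opens and
    calls fail fast; after `reset_after` seconds it half-opens and
    lets one probe call through to test recovery."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

breaker = CircuitBreaker(max_failures=2, reset_after=60)

def flaky():
    raise ConnectionError("downstream unavailable")

for _ in range(2):          # two real failures trip the breaker
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)     # now fails fast without touching downstream
except RuntimeError as e:
    print(e)
```

Failing fast protects a struggling downstream from retry storms; pairing this with idempotent operations (the episode's other topic) makes the eventual retries safe to repeat.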
How many people have you met that implemented distributed tracing in the early 2000s? Make it one more after you have tuned into our latest podcast with William Louth. William, who can't seem to escape the observability space even though he keeps trying, has a long track record there. He is an innovator and tool builder and is currently reimagining intelligent systems by shifting the focus from data collection to meaning-making. In our conversation we learn about situational awareness and how systems should use symbols to show their current state while also taking into account everything they are aware of happening in their ecosystem. This podcast episode has been long overdue and opens a fascinating new world beyond metrics, logs, and traces! Links discussed: William's LinkedIn: https://www.linkedin.com/in/william-david-louth/ Humainary Research: https://humainary.io/research/ Humainary GitHub: https://github.com/humainary-io Serventis Signs: https://raw.githubusercontent.com/humainary-io/substrates-api-java/refs/heads/main/ext/serventis/SIGNS.md
Ari Zilka, founder of MyDecisive.ai and former Hortonworks CPO, argues that most observability vendors now offer essentially identical, reactive dashboards that highlight problems only after systems are already broken. After speaking with all 23 observability vendors at KubeCon + CloudNativeCon North America 2025, Zilka said these tools fail to meaningfully reduce mean time to resolution (MTTR), a long-standing demand he heard repeatedly from thousands of CIOs during his time at New Relic.Zilka believes observability must shift from reactive monitoring to proactive operations, where systems automatically respond to telemetry in real time. MyDecisive.ai is his attempt to solve this, acting as a “bump in the wire” that intercepts telemetry and uses AI-driven logic to trigger actions like rolling back faulty releases.He also criticized the rising cost and complexity of OpenTelemetry adoption, noting that many companies now require large, specialized teams just to maintain OTel stacks. MyDecisive aims to turn OpenTelemetry into an enterprise-ready service that reduces human intervention and operational overhead.Learn more from The New Stack about OpenTelemetry:Observability Is Stuck in the Past. Your Users Aren't. Setting Up OpenTelemetry on the Frontend Because I Hate MyselfHow to Make OpenTelemetry Better in the BrowserJoin our community of newsletter subscribers to stay on top of the news and at the top of your game. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
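Zilka's "bump in the wire" intercepts telemetry and triggers actions such as rolling back a faulty release. A toy illustration of that idea as a threshold rule (a hypothetical sketch, not MyDecisive's actual logic; the window size and error-rate limit are made-up parameters):

```python
from collections import deque

class RollbackTrigger:
    """Toy 'bump in the wire': watch a stream of request outcomes and
    decide when a release should be rolled back. Thresholds are illustrative."""

    def __init__(self, window=100, error_rate_limit=0.05):
        self.window = deque(maxlen=window)        # sliding window of outcomes
        self.error_rate_limit = error_rate_limit

    def observe(self, ok: bool) -> str:
        """Feed one request outcome; return the action to take."""
        self.window.append(ok)
        errors = self.window.count(False)
        full = len(self.window) == self.window.maxlen
        if full and errors / len(self.window) > self.error_rate_limit:
            return "rollback"  # e.g. call the deploy system's rollback API
        return "none"

trigger = RollbackTrigger(window=10, error_rate_limit=0.2)
actions = [trigger.observe(ok) for ok in [True] * 7 + [False] * 3]
# once the window is full and 3/10 recent requests failed, the trigger fires
```

A real system would, of course, gate this behind rate limits and human-approval policies; the point is only that acting on telemetry in real time is a small, testable rule, not a dashboard.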
BONUS: The Operating System for Software-Native Organizations - The Five Core Principles In this BONUS episode, the final installment of our Special Xmas 2025 reflection on Software-native businesses, we explore the five fundamental principles that form the operating system for software-native organizations. Building on the previous four episodes, this conversation provides the blueprint for building organizations that can adapt at the speed of modern business demands, where the average company lifespan on the S&P 500 has dropped from 33 years in the 1960s to a projected 12 years by 2027. The Challenge of Adaptation "What we're observing in Ukraine is adaptation happening at a speed that would have been unthinkable in traditional military contexts - new drone capabilities emerge, countermeasures appear within days, and those get countered within weeks." The opening draws a powerful parallel between the rapid adaptation we're witnessing in drone warfare and the existential threats facing modern businesses. While our businesses aren't facing literal warfare, they are confronting dramatic disruption. Clayton Christensen documented this in "The Innovator's Dilemma," but what he observed in the 1970s and 80s is happening exponentially faster now, with software as the accelerant. If we can improve businesses' chances of survival even by 10-15%, we're talking about thousands of companies that could thrive instead of fail, millions of jobs preserved, and enormous value created. The central question becomes: how do you build an organization that can adapt at this speed? Principle 1: Constant Experimentation with Tight Feedback Loops "Everything becomes an experiment. Not in the sense of being reckless or uncommitted, but in being clear about what we're testing and what we expect to learn. I call this: work like a scientist: learning is the goal." 
Software developers have practiced this for decades through Test-Driven Development, but now this TDD mindset is becoming the ruling metaphor for managing products and entire businesses. The practice involves framing every initiative with three clear elements: the goal (what are we trying to achieve?), the action (what specific thing will we do?), and the learning (what will we measure to know if it worked?). When a client says "we need to improve our retrospectives," software-native organizations don't just implement a new format. Instead, they connect it to business value - improving the NPS score for users of a specific feature by running focused retrospectives that explicitly target user pain points and tracking both the improvements implemented and the actual NPS impact. After two weeks, you know whether it worked. The experiment mindset means you're always learning, never stuck. This is TDD applied to organizational change, and it's powerful because every process change connects directly to customer outcomes. Principle 2: Clear Connection to Business Value "Software-native organizations don't measure success by tasks completed, story points delivered, or features shipped. Or even cycle time or throughput. They measure success by business outcomes achieved." While this seems obvious, most organizations still optimize for output, not outcomes. The practice uses Impact Mapping or similar outcome-focused frameworks where every initiative answers three questions: What business behavior are we trying to change? How will we measure that change? What's the minimum software needed to create that change? A financial services client wanted to "modernize their reporting system" - a 12-month initiative with dozens of features in project terms. 
Reframed through a business value lens, the goal became reducing time analysts spend preparing monthly reports from 80 hours to 20 hours, measured by tracking actual analyst time, starting with automating just the three most time-consuming report components. The first delivery reduced time to 50 hours - not perfect, but 30 hours saved, with clear learning about which parts of reporting actually mattered. The organization wasn't trying to fulfill requirements; they were laser focused on the business value that actually mattered. When you're connected to business value, you can adapt. When you're committed to a feature list, you're stuck. Principle 3: Software as Value Amplifier "Software isn't just 'something we do' or a support function. Software is an amplifier of your business model. If your business model generates $X of value per customer through manual processes, software should help you generate $10X or more." Before investing in software, ask whether this can amplify your business model by 10x or more - not 10% improvement, but 10x. That's the threshold where software's unique properties (zero marginal cost, infinite scale, instant distribution) actually matter, and where the cost/value curve starts to invert. Remember: software is still the slowest and most expensive way to check if a feature would deliver value, so you better have a 10x or more expectation of return. Stripe exemplifies this principle perfectly. Before Stripe, accepting payments online required a merchant account (weeks to set up), integration with payment gateways (months of development), and PCI compliance (expensive and complex). Stripe reduced that to adding seven lines of code - not 10% easier, but 100x easier. This enabled an entire generation of internet businesses that couldn't have existed otherwise: subscription services, marketplaces, on-demand platforms. That's software as amplifier. It didn't optimize the old model; it made new models possible. 
If your software initiatives are about 5-10% improvements, ask yourself: is software the right medium for this problem, or should you focus where software can create genuine amplification? Principle 4: Software as Strategic Advantage "Software-native organizations use software for strategic advantage and competitive differentiation, not just optimization, automation, or cost reduction. This means treating software development as part of your very strategy, not a way to implement a strategy that is separate from the software." This concept, discussed with Tom Gilb and Simon Holzapfel on the podcast as "continuous strategy," means that instead of creating a strategy every few years and deploying it like a project, strategy and execution are continuously intertwined when it comes to software delivery. The practice involves organizing around competitive capabilities that software uniquely enables by asking: How can software 10x the value we generate right now? What can we do with software that competitors can't easily replicate? Where does software create a defensible advantage? How does our software create compounding value over time? Amazon Web Services didn't start as a product strategy but emerged from Amazon building internal capabilities to run their e-commerce platform at scale. They realized they'd built infrastructure that was extremely hard to replicate and asked: "What if we offered it to others?" AWS became Amazon's most profitable business - not because they optimized their existing retail business, but because they turned an internal capability into a strategic platform. The software wasn't supporting the strategy - the software became the strategy. Compare this to companies that use software just for cost reduction or process optimization - they're playing defense. Software-native companies use software to play offense, creating capabilities that change the competitive landscape. 
Continuous strategy means your software capabilities and your business strategy evolve together, in real-time, not in annual planning cycles. Principle 5: Real-Time Observability and Adaptive Systems "Software-native organizations use telemetry and real-time analytics not just to understand their software, but to understand their entire business and adapt dynamically. Observability practices from DevOps are actually ways of managing software delivery itself. We're bootstrapping our own operating system for software businesses." This principle connects back to Principle 1 but takes it to the organizational level. The practice involves building systems that constantly sense what's happening and can adapt in real-time: deploy with feature flags so you can turn capabilities on/off instantly, use A/B testing not just for UI tweaks but for business model experiments, instrument everything so you know how users actually behave, and build feedback loops that let the system respond automatically. Social media companies and algorithmic trading firms already operate this way. Instagram doesn't deploy a new feed algorithm and wait six months to see if it works - they're constantly testing variations, measuring engagement in real-time, adapting the algorithm continuously. The system is sensing and responding every second. High-frequency trading firms make thousands of micro-adjustments per day based on market signals. Imagine applying this to all businesses: a retail company that adjusts pricing, inventory, and promotions in real-time based on demand signals; a healthcare system that dynamically reallocates resources based on patient flow patterns; a logistics company whose routing algorithms adapt to traffic, weather, and delivery success rates continuously. 
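The feature-flag practice described here, turning capabilities on and off instantly and running A/B experiments, is commonly built on deterministic bucketing. A minimal sketch (the experiment name and rollout percentage are illustrative):

```python
import hashlib

def variant(user_id: str, experiment: str, rollout_pct: int) -> str:
    """Deterministically assign a user to 'treatment' or 'control'.
    Hashing (experiment, user) gives a stable bucket in [0, 100), so the
    same user keeps the same variant across requests with no stored state."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < rollout_pct else "control"

# ramping a capability from 0% to 20% of users is just a parameter change
assignments = [variant(f"user-{i}", "new-feed-ranker", 20) for i in range(1000)]
share = assignments.count("treatment") / len(assignments)  # roughly 0.2
```

Because assignment is a pure function of identity, measurement stays honest: the same cohort sees the same variant for the life of the experiment, which is what lets engagement deltas be attributed to the change.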
This is the future of software-native organizations - not just fast decision-making, but systems that sense and adapt at software speed, with humans setting goals and constraints but software executing continuous optimization. We're moving from "make a decision, deploy it, wait to see results" to "deploy multiple variants, measure continuously, let the system learn." This closes the loop back to Principle 1 - everything is an experiment, but now the experiments run automatically at scale with near real-time signal collection and decision making. It's Experiments All The Way Down "We established that software has become societal infrastructure. That software is different - it's not a construction project with a fixed endpoint; it's a living capability that evolves with the business." This five-episode series has built a complete picture: Episode 1 established that software is societal infrastructure and fundamentally different from traditional construction. Episode 2 diagnosed the problem - project management thinking treats software like building a bridge, creating cascade failures throughout organizations. Episode 3 showed that solutions already exist, with organizations like Spotify, Amazon, and Etsy practicing software-native development successfully. Episode 4 exposed the organizational immune system - the four barriers preventing transformation: the project mindset, funding models, business/IT separation, and risk management theater. Today's episode provides the blueprint - the five principles forming the operating system for software-native organizations. This isn't theory. This is how software-native organizations already operate. The question isn't whether this works - we know it does. The question is: how do you get started? The Next Step In Building A Software-Native Organization "This is how transformation starts - not with grand pronouncements or massive reorganizations, but with conversations and small experiments that compound over time. 
Software is too important to society to keep managing it wrong." Start this week by doing two things. First, start a conversation: pick one of these five principles - whichever resonates most with your current challenges - and share it with your team or leadership. Don't present it as "here's what we should do" but as "here's an interesting idea - what would this mean for us?" That conversation will reveal where you are, what's blocking you, and what might be possible. Second, run one small experiment: take something you're currently doing and frame it as an experiment with a clear goal, action, and learning measure. Make it small, make it fast - one week maximum, 24 hours if you can - then stop and learn. You now have the blueprint. You understand the barriers. You've seen the alternatives. The transformation is possible, and it starts with you. Recommended Further Reading: the Tom Gilb and Simon Holzapfel episodes on continuous strategy; Clayton Christensen, "The Innovator's Dilemma"; Gojko Adzic, "Impact Mapping"; reporting on drone warfare in Ukraine; company lifespan statistics (Innosight research on S&P 500 turnover); Stripe's impact on internet businesses; the Amazon AWS origin story; DevOps observability practices. About Vasco Duarte Vasco Duarte is a thought leader in the Agile space, co-founder of Agile Finland, and host of the Scrum Master Toolbox Podcast, which has over 10 million downloads. Author of NoEstimates: How To Measure Project Progress Without Estimating, Vasco is a sought-after speaker and consultant helping organizations embrace Agile practices to achieve business success. You can link with Vasco Duarte on LinkedIn.
Join Dan Vega and DaShaun Carter for the latest updates from the Spring Ecosystem. In this episode, Dan and DaShaun sit down with Spring Team member Brian Clozel to discuss OpenTelemetry (OTel) and how to leverage it in your Spring Boot applications. Learn how OTel provides a vendor-neutral standard for collecting telemetry data, including traces, metrics, and logs, to gain deeper observability into your applications. You can participate in our live stream to ask questions or catch the replay on your preferred podcast platform.Show Notes:OpenTelemetry with Spring BootBrian Clozel GitHubBrian Clozel on Mastodon
This interview was recorded for the GOTO Book Club.http://gotopia.tech/bookclubCheck out more here:https://gotopia.tech/episodes/402Albert S. Tanure - Cross Solutions Architect at Microsoft & Author of "ASP.NET Core 9 Essentials"Rafael Herik de Carvalho - Platform & DevOps Engineering at DevoteamRESOURCESAlberthttps://x.com/alberttanurehttps://github.com/tanurehttps://www.linkedin.com/in/albert-tanurehttps://www.codefc.io/enRafaelhttps://x.com/rafaelherikhttps://github.com/rafaelherikhttps://www.linkedin.com/in/rafaelh-carvalhohttps://dev.to/rafaelherikDESCRIPTIONMicrosoft Solutions Architect Albert Tanure explores his approach to writing "ASP.NET Core 9 Essentials", a guide designed to take developers from basic .NET concepts to advanced cloud-native application development. Albert emphasizes the intentional structure of starting with foundations before introducing best practices, covering the complete application lifecycle from UI development and APIs to deployment, monitoring, and cloud operations.The conversation highlights how modern development requires understanding not just coding, but also DevOps practices, observability with tools like OpenTelemetry, dynamic configurations, containers, and cloud-native principles. The book serves both beginners seeking solid foundations and experienced developers looking to understand modern deployment strategies, with particular emphasis on chapters 9-11, which cover cloud-native mindsets and operational considerations.RECOMMENDED BOOKSAlbert Tanure • ASP.NET Core 9 Essentials • https://amzn.to/43bH73tMark J. Price • Real-World Web Development with .NET 9 • https://amzn.to/46ZKsnwMark J. 
Price • C# 13 and .NET 9 – Modern Cross-Platform Development Fundamentals • https://amzn.to/4o5E5FZFabrizio Romano & Heinrich Kruger • Learning Python Programming • https://amzn.to/4myLBIt
Enterprises are racing to deploy AI services, but the teams responsible for running them in production are seeing familiar problems reemerge—most notably, silos between data scientists and operations teams, reminiscent of the old DevOps divide. In a discussion recorded at AWS re:Invent 2025, IBM's Thanos Matzanas and Martin Fuentes argue that the challenge isn't new technology but repeating organizational patterns. As data teams move from internal projects to revenue-critical, customer-facing applications, they face new pressures around reliability, observability, and accountability.The speakers stress that many existing observability and governance practices still apply. Standard metrics, KPIs, SLOs, access controls, and audit logs remain essential foundations, even as AI introduces non-determinism and a heavier reliance on human feedback to assess quality. Tools like OpenTelemetry provide common ground, but culture matters more than tooling.Both emphasize starting with business value and breaking down silos early by involving data teams in production discussions. Rather than replacing observability professionals, AI should augment human expertise, especially in critical systems where trust, safety, and compliance are paramount.Learn more from The New Stack about enabling AI with silos: Are Your AI Co-Pilots Trapping Data in Isolated Silos?Break the AI Gridlock at the Intersection of Velocity and TrustTaming AI Observability: Control Is the Key to Success
In this episode, Steph Hippo, Platform Engineering Director at Honeycomb, joins The Prodcast to discuss AI and SRE. Steph explains how observability helps us understand complex systems from their outputs, and provides a foundation for SRE to respond to system problems. This episode explains how AI and observability build a self-reinforcing loop. We also discuss how AI can detect and respond to certain classes of incidents, leading to self-healing systems and allowing SREs to focus on novel and interesting problems. She advises small businesses adopting AI to learn from others' mistakes (post-mortems) and to commit time and budget to experimentation.
As AI tools and agentic AI become part of how applications are developed, delivered, and managed, application performance monitoring and observability have to adapt. Ned Bellavance sits down with Drew Flowers and Jacob Yackenovich from IBM Instana about where these fields sit today, and the potential impacts of AI. They detail the challenges of application... Read more »
The worlds of IT security and operations are being pulled together, and AI is a catalyst that's making it happen. The focus on observability that's been part of the DevOps movement is playing an important role in improving security effectiveness, and Scott Crawford, Mark Ehr and Mike Fratto return to look at how this is happening with host Eric Hanselman. Security teams have always wrestled with making effective use of telemetry data from the infrastructure and applications they are securing. Correlating data from just the security tooling is hard enough, let alone adding operational data to the mix. Security Information and Event Management (SIEM) systems came into existence many years ago specifically to address this problem, but they were complex to configure and operate and needed tending to stay accurate. The volumes of data coming from observability initiatives were promising, but new approaches were required, and AI and ML have been key to unlocking that value. Once again, we've hit an opportunity where it's all about the data and getting it to where it can be put to work. The OpenTelemetry project simplified data interchange, but the question remained as to where all of this data had to live. It's not practical to get all of the data in one place, but data fabrics and federation can manage access effectively. Better correlation opens the door to many possibilities, including building a single source of truth for IT assets. There's a lot of benefit to bringing security and operations together. 
More S&P Global Content: AI for security: Agentic AI will be a focus for security operations in 2025 AI in action: unleashing agentic potential For S&P Global subscribers: 2026 Trends in Information Security Deal Analysis: Palo Alto Acquires Chronosphere Big Picture Report: 2026 AI Outlook – Unleashing agentic potential Credits: Host/Author: Eric Hanselman Guests: Scott Crawford, Mark Ehr, Mike Fratto Producer/Editor: Feranmi Adeoshun Published With Assistance From: Sophie Carr, Kyra Smith
In this episode, we break down how Kong talks to the Gateway API in Kubernetes, walk through GatewayClass, Gateway, and HTTPRoute, and show where plugins come in to give that extra boost of security and observability.We also take an X-ray of the components, discuss architecture choices (from traffic balancing to mTLS with cert-manager), and debate the trade-offs between a traditional Ingress Controller and the modern Gateway API ecosystem. No miracles promised, but we do promise less painful YAML.And of course: there is no shortage of honest comparisons between OSS and Enterprise, plus tips on where to dig for documentation that is actually worth reading.Important links: - Marco Ollivier - https://www.linkedin.com/in/marcopollivier/ - Slides DOD - https://docs.google.com/presentation/d/1GxcpOBaomthc4gDnmNSakEMfMZIkiseB16KMRVdnNkw/edit?usp=sharing - João Brito - https://www.linkedin.com/in/juniorjbn/ - Kong - https://github.com/Kong/kongKubicast is a production of Getup, a company specializing in Kubernetes and open source projects for Kubernetes. Podcast episodes are available on the main digital audio platforms and on YouTube.com/@getupcloud.
This interview was recorded for the GOTO Book Club.http://gotopia.tech/bookclubAlex Ewerlöf - Senior Staff Engineer at Volvo Cars & Author of "Reliability Engineering Mindset"Charity Majors - Co-Founder & CTO of honeycomb.io & Co-Author of "Observability Engineering"RESOURCESAlexhttps://bsky.app/profile/alexewerlof.comhttps://www.linkedin.com/in/alexewerlofhttps://www.alexewerlof.comCharityhttps://twitter.com/mipsytipsyhttps://linkedin.com/in/charity-majorshttps://charity.wtfhttps://www.honeycomb.io/blog/slos-are-the-api-for-your-engineering-teamDESCRIPTIONAlex Ewerlöf shares his journey from product engineering to reliability engineering and discusses the practical challenges of implementing Google's SRE practices in real-world companies.He emphasizes the significant gap between Google's idealized SRE approach, which he likens to "a fantastic chef's recipe for Michelin-starred restaurants", and the reality most companies face with limited resources and infrastructure. The discussion covers key topics including the evolution from traditional operations to engineers owning their code in production, the critical importance of choosing SLIs that align with business impact, and how SLOs set expectations and help service consumers prepare non-functional requirements.Alex coined the law of 10x per 9, highlighting that reliability isn't free and requires careful cost-benefit analysis.RECOMMENDED BOOKSAlex Ewerlöf • Reliability Engineering Mindset • https://blog.alexewerlof.com/p/remC. Majors, L. Fong-Jones & G. Miranda • Observability Eng. • https://amzn.to/38scbmaC. Majors & L. Campbell • Database Reliability Eng. • https://amzn.to/3ujybdSAlex Hidalgo • Implementing Service Level Objectives • https://amzn.to/4pbWJxwBrian Klaas • Fluke • https://amzn.to/41V1CkoSimler & Hanson • The Elephant in the Brain
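Alex's "law of 10x per 9" pairs naturally with the downtime budget each extra nine buys; a quick back-of-the-envelope calculation (printing an illustrative 10x cost factor per nine alongside the budget):

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of n nines
    (2 -> 99%, 3 -> 99.9%, 4 -> 99.99%, ...)."""
    availability = 1 - 10 ** -nines
    return MINUTES_PER_YEAR * (1 - availability)

for n in range(2, 6):
    cost_factor = 10 ** (n - 2)  # illustrative: roughly 10x per extra nine
    print(f"{n} nines: {downtime_budget_minutes(n):8.1f} min/yr, ~{cost_factor}x cost")
```

Each added nine shrinks the budget tenfold (about 5256 minutes a year at 99%, but only about 5 minutes at 99.999%) while, per the law, cost grows roughly tenfold, which is exactly why SLO targets deserve a cost-benefit analysis rather than a reflexive "more nines".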
Tech leaders are often led to believe that they have "full-stack observability." The MELT framework (metrics, events, logs, and traces) became the industry standard for visibility. However, Robert Cowart, CEO and Co-Founder of ElastiFlow, believes that this MELT framework leaves a critical gap. In the latest episode of the Tech Transformed podcast, host Dana Gardner, President and Principal Analyst at Interarbor Solutions, sits down with Cowart to discuss network observability and why it is vital to achieving full-stack observability.The speakers discuss the limitations of legacy observability tools that focus on MELT and how this leaves a significant and dangerous blind spot. Cowart emphasises the need for teams to integrate network data enriched with application context to enhance troubleshooting and security measures. What's Beyond MELT?Cowart explains that with the MELT framework, "metrics, events, logs, and traces," the things being monitored or observed have traditionally been servers and applications.“Organisations need to understand their compute infrastructure and the applications they are running on. All of those servers are connected to networks, and those applications communicate over the networks, and users consume those services again over the network,” he added.“What we see among our growing customer base is that there's a real gap in the full-stack story that has been told in the market for the last 10 years, and that is the network.”The lack of insights results in a constant blind spot that delays problem-solving, hides user-experience issues, and leaves organizations vulnerable to security threats. Cowart notes that while performance monitoring tools can identify when an application call to a database is slow, they often don't explain why.“Was the database slow, or was the network path between them rerouted and causing delays?” he asks. 
“If you don't see the network, you can't find the root cause.”The outcome is longer troubleshooting cycles, isolated operations teams, and an expensive “blame game” among DevOps, NetOps, and SecOps.ElastiFlow approaches it differently, focusing observability on network connectivity: understanding who is communicating with whom and how that communication behaves. This data not only speeds up performance insights but also acts as a “motion detector” within the organization. Monitoring east-west, north-south, and cloud VPC flow logs helps organizations spot unusual patterns that indicate internal threats or compromised systems used for launching external attacks.“Security teams are often good at defending the perimeter,” Cowart says. “But once something gets inside, visibility fades. Connectivity data fills that gap.”From Isolated Monitoring to a Unified Experience Cowart believes that observability can't just be about green lights...
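Cowart's "motion detector" framing, learning who normally talks to whom and flagging flows to never-before-seen peers, can be sketched in a few lines (a toy illustration; the host names and record shape are invented, not ElastiFlow's schema):

```python
from collections import defaultdict

class FlowMotionDetector:
    """Learn each host's usual set of peers from flow records,
    then flag conversations with a never-before-seen peer."""

    def __init__(self):
        self.peers = defaultdict(set)  # src host -> set of known dst hosts

    def learn(self, flows):
        """Build the baseline from (src, dst) flow records."""
        for src, dst in flows:
            self.peers[src].add(dst)

    def check(self, src, dst) -> bool:
        """True if this flow is anomalous for src (unknown peer)."""
        return dst not in self.peers[src]

det = FlowMotionDetector()
det.learn([("web-1", "db-1"), ("web-1", "cache-1"), ("web-2", "db-1")])
print(det.check("web-1", "db-1"))    # False: a known east-west flow
print(det.check("web-1", "ldap-1"))  # True: a new peer, worth a look
```

Real systems baseline far more dimensions (ports, volumes, time of day), but the principle is the same: connectivity data makes lateral movement visible after the perimeter has been breached.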
Sound familiar? Nine clicks are lightning fast, but the tenth seems to hang forever. That's exactly where tail latency eats your user experience, and the average value doesn't help you one bit. In this episode we dive into request hedging: deliberately duplicating requests to push down P99 and defuse outliers.We start with a short recap of resilience engineering: timeouts, retries, exponential backoff, jitter, circuit breakers. Then we go deep into hedging: what the hedge threshold is, why we optimize for tail rather than head latency, and how percentiles like P50, P95, and P99 change the way you look at performance. We show how to implement hedging safely without overloading your backend, where idempotency is mandatory, and why writes are especially tricky.On the practical side, we cover how to cancel requests cleanly: HTTP/1.1 via FIN and reset, HTTP/2 with RST_STREAM, gRPC support, and how Go helps natively with context cancellation. For tooling there are real examples: Envoy as a cloud-native proxy with hedging, gRPC, and open-source experience reports. In the database world we talk about read hedging, quorum reads, and write constraints in Cassandra and Kafka, about Vitess in the MySQL universe, and the limits of PgBouncer. Caches like Redis and Memcached as well as DNS patterns like Happy Eyeballs also make an appearance. We put it all in historical context with "The Tail at Scale" by Jeff Dean and look at how Google, Netflix, Uber, LinkedIn, and Cloudflare use hedging.At the end you take away clear best practices: apply hedging specifically to tail latency, actually cancel requests, ensure idempotency, feed dynamic thresholds with observability data, and define your guardrails.Curious whether hedging can rescue your P99 without DDoSing yourself? 
That's exactly what this episode is about.Bonus: hedgehogs have nothing to do with it, even if the name suggests otherwise.Keywords: Resilience Engineering, Request Hedging, Tail Latency, P99, Percentiles, Microservices, HTTP/2, gRPC, Go Context, Observability, Monitoring, Prometheus, Grafana, Envoy, Open Source, Cassandra, Kafka, Vitess, Redis, Memcached, Quorum Reads, Tech Community, Networking.You can find our current advertising partners at https://engineeringkiosk.dev/partnersQuick feedback on the episode:
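The request-hedging pattern at the heart of this episode can be sketched in a few lines of asyncio (the episode's own examples lean on Go's context cancellation; the 50 ms hedge threshold and the simulated latencies here are illustrative):

```python
import asyncio

async def hedged(request_fn, hedge_after: float):
    """Request hedging: fire one request; if it hasn't answered within
    hedge_after seconds (the hedge threshold, e.g. your observed P95),
    fire a duplicate and take whichever finishes first. The loser is
    cancelled, the asyncio analogue of an HTTP/2 RST_STREAM."""
    primary = asyncio.ensure_future(request_fn())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()  # fast path: no hedge needed
    hedge = asyncio.ensure_future(request_fn())
    done, pending = await asyncio.wait(
        {primary, hedge}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # really cancel: don't leave the backend working
    return done.pop().result()

async def demo():
    calls = []

    async def flaky():
        # first call is a straggler, the hedge is fast; note the request
        # must be idempotent, since it may execute twice
        idx = len(calls)
        calls.append(idx)
        await asyncio.sleep(0.2 if idx == 0 else 0.01)
        return f"reply-{idx}"

    return await hedged(flaky, hedge_after=0.05), len(calls)

result, n_calls = asyncio.run(demo())  # the hedge wins: result == "reply-1"
```

Note the two guardrails the episode stresses baked into the sketch: the hedge only fires past the threshold (so duplicate load stays bounded to the slowest few percent of requests), and the losing request is cancelled rather than abandoned.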
A lot of network monitoring tools allow you to say, “It's not the network,” but a more useful tool would not only tell you that it's not the network, but also what the problem actually is. Today our guest is Brandon Hale, CTO at IBM SevOne. He is here to give us an overview of... Read more »
DevOps practitioners — whether developers, operators, SREs or business stakeholders — increasingly rely on telemetry to guide decisions, yet face growing complexity, siloed teams and rising observability costs. In a conversation at KubeCon + CloudNativeCon North America, IBM's Jacob Yackenovich emphasized the importance of collecting high-granularity, full-capture data to avoid missing critical performance signals across hybrid application stacks that blend legacy and cloud-native components. He argued that observability must evolve to serve both technical and nontechnical users, enabling teams to focus on issues based on real business impact rather than subjective judgment.

AI's rapid integration into applications introduces new observability challenges. Yackenovich described two patterns: add-on AI services, such as chatbots, whose failures don't disrupt core workflows, and blocking-style AI components embedded in essential processes like fraud detection, where errors directly affect application function.

Rising cloud and ingestion costs further complicate telemetry strategies. Yackenovich cautioned against limiting visibility for budget reasons, advocating instead for predictable, fixed-price observability models that let organizations innovate without financial uncertainty.

Learn more from The New Stack about the latest in observability:
Introduction to Observability
Observability 2.0? Or Just Logs All Over Again?
Building an Observability Culture: Getting Everyone Onboard

Join our community of newsletter subscribers to stay on top of the news and at the top of your game. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Adriana is a CNCF Ambassador, blogger, host of the Geeking Out Podcast, and a maintainer of the OpenTelemetry End User SIG. By day, she's a Principal Developer Advocate at Dynatrace, focusing on Observability and OpenTelemetry. By night, she climbs walls. She also loves capybaras, because they make her happy.

You can find Adriana on the following sites: Bluesky, Blog, LinkedIn, GitHub, Mastodon, YouTube

PLEASE SUBSCRIBE TO THE PODCAST: Spotify, Apple Podcasts, YouTube Music, Amazon Music, RSS Feed

You can check out more episodes of Coffee and Open Source on https://www.coffeeandopensource.com

Coffee and Open Source is hosted by Isaac Levin
Want trustworthy AI? Discover how observability, real-time monitoring, and modern platforms are reshaping how we build accountable, explainable systems.
Honeycomb's VP of Marketing Shabih Syed reveals why traditional observability is dead and how AI-powered tools are transforming the way engineers debug production systems, with real examples.

Topics Include:
Observability is how you understand and troubleshoot your production systems in real time
Shabih's 18-year journey from developer to product manager to marketing VP gives him a unique perspective
AI coding assistants are fundamentally changing how fast engineers ship code to production
Customer patience is gone: one checkout failure means losing them forever
Over 90% of engineers now "vibe code" with AI, creating new complexity
Observability costs are spiraling; engineers are forced to limit logging, creating debugging dead ends
Honeycomb reimagines observability: meeting expectations, reducing complexity, breaking the cost curve
Major customers like Booking.com and Intercom are already transforming with AI-native observability
The MCP server brings production data directly into your IDE for real-time AI assistance
Canvas enables plain-English investigations to find "unknown unknowns" before they become problems
Anomaly detection helps junior engineers spot issues they wouldn't know to look for
Static dashboards are dead; AI-powered workflows are the future of system observation

Participants:
Shabih Syed - VP Product Marketing, Honeycomb.io

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
The AWS US-East problems on Oct 27th were a good reminder of how dependent we are on globally shared services. Built-in resiliency is not guaranteed if systems have a hard dependency on a single region of a single vendor. Many of us have experienced impacts to systems we use daily, some critical, some not so critical, as Andi will tell you when he found out that his beloved Leberkas Pepi app didn't work!

Besides this outage, we discuss lessons learned from Cloud Native Days Austria and the Observability and Platform Engineering meetups in Gdansk and Tallinn, as well as giving an outline of the upcoming Cloud and AI-Native US tour from Henrik Rexed and Andi Grabner.

All the links we discussed are here:
Leberkas Pepi: https://www.leberkaspepi.at/
Cloud Native Austria: https://www.linkedin.com/company/cndaustria/
Observability Meetup: https://www.meetup.com/observability-tech-community-meetup-group/
US Tour from Henrik and Andi: https://events.dynatrace.com/noram-all-de-engineering-efficiency-tour-2025-28225/
Ever wondered how source maps actually work? In this episode, Nicolo Ribaudo, Babel maintainer and TC39 delegate, breaks down how source maps connect your JavaScript, TypeScript, and CSS back to the original code, making debugging, stack traces, and observability smoother in Chrome DevTools. We dive into how source maps help in both development and production with minified code, explore tools like Webpack, Rollup, Next.js, and Svelte, and share when you should turn off source maps to avoid confusion.

Links
Website: https://nicr.dev
LinkedIn: https://www.linkedin.com/in/nicol%C3%B2-ribaudo-bb94b4187
BlueSky: https://bsky.app/profile/nicr.dev
Github: https://github.com/nicolo-ribaudo

Resources
Squiggleconf talk: https://squiggleconf.com/2025/sessions#source-maps-how-does-the-magic-work
Slide deck: https://docs.google.com/presentation/d/1lyor5xgv821I4kUWJIwrrmXBjzC_qiqIqcZxve1ybw0

We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Fill out our listener survey: https://t.co/oKVAEXipxu

Let us know by sending an email to our producer, Elizabeth, at elizabeth.becz@logrocket.com, or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod).

Check out our newsletter: https://blog.logrocket.com/the-replay-newsletter/

Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers!

What does LogRocket do? LogRocket provides AI-first session replay and analytics that surfaces the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today.
(https://logrocket.com/signup/?pdr)

Chapters
00:00 Intro – Welcome to PodRocket + Introducing Nicolo Ribaudo
00:45 What Are Source Maps and Why They Matter for Debugging
01:20 From Babel to TC39 – Nicolo's Path to Source Maps
02:00 Source Maps Beyond JavaScript: CSS, C, and WebAssembly
03:00 The Core Idea – Mapping Compiled Code Back to Source
04:00 How Source Maps Work Under the Hood (Encoded JSON)
05:10 File Size and Performance – Why It Doesn't Matter in Production
06:00 Why Source Maps Are Useful Even Without Minification
07:00 Sentry and Error Monitoring – How Source Maps Are Used in Production
08:10 Two Worlds: Local Debugging vs. Remote Error Analysis
09:00 You're Probably Using Source Maps Without Realizing It
10:00 Why Standardization Was Needed After 15+ Years of Chaos
11:00 TC39 and the Creation of the Official Source Maps Standard
12:00 Coordinating Browsers, Tools, and Vendors Under One Spec
13:00 How Chrome, Firefox, and WebKit Implement Source Maps Differently
14:00 Why the Source Maps Working Group Moves Faster Than Other Standards
15:00 A Small, Focused Group of DevTools Engineers
16:00 How Build Tools and Bundlers Feed Into the Ecosystem
17:00 Making It Easier for Tool Authors to Generate Source Maps
18:00 How Frameworks Like Next.js and Vite Handle Source Maps for You
19:00 Common Pitfalls When Chaining Build Tools
20:00 Debugging Wrong or Broken Source Maps in Browsers
21:00 Upcoming Feature: Scopes for Variables and Functions
22:00 How Scopes Improve the Live Debugging Experience
23:00 Experimental Implementations and How to Try Them
24:00 Where to Find the TC39 Source Maps Group + Get Involved
25:00 Nicolo's Links – GitHub, BlueSky, and Talks Online
25:30 Closing Thoughts
In this episode of Crazy Wisdom, host Stewart Alsop talks with Jared Zoneraich, CEO and co-founder of PromptLayer, about how AI is reshaping the craft of software building. The conversation covers PromptLayer's role as an AI engineering workbench, the evolving art of prompting and evals, the tension between implicit and explicit knowledge, and how probabilistic systems are changing what it means to "code." Stewart and Jared also explore vibe coding, AI reasoning, the black-box nature of large models, and what accelerationism means in today's fast-moving AI culture. You can find Jared on X @imjaredz and learn more or sign up for PromptLayer at PromptLayer.com.

Check out this GPT we trained on the conversation

Timestamps
00:00 – Stewart Alsop opens with Jared Zoneraich, who explains PromptLayer as an AI engineering workbench and discusses reasoning, prompting, and Codex.
05:00 – They explore implicit vs. explicit knowledge, how subject matter experts shape prompts, and why evals matter for scaling AI workflows.
10:00 – Jared explains eval methodologies, backtesting, hallucination checks, and the difference between rigorous testing and iterative sprint-based prompting.
15:00 – Discussion turns to observability, debugging, and the shift from deterministic to probabilistic systems, highlighting skill issues in prompting.
20:00 – Jared introduces "LM idioms," vibe coding, and context versus content: how syntax, tone, and vibe shape AI reasoning.
25:00 – They dive into vibe coding as a company practice, cloud code automation, and prompt versioning for building scalable AI infrastructure.
30:00 – Stewart reflects on coding through meditation, architecture planning, and how tools like Cursor and Claude Code are shaping AGI development.
35:00 – Conversation expands into AI's cultural effects, optimism versus doom, and critical thinking in the age of AI companions.
40:00 – They discuss philosophy, history, social fragmentation, and the possible decline of social media and liberal
democracy.
45:00 – Jared predicts a fragmented but resilient future shaped by agents and decentralized media.
50:00 – Closing thoughts on AI-driven markets, polytheistic model ecosystems, and where innovation will thrive next.

Key Insights
PromptLayer as AI Infrastructure – Jared Zoneraich presents PromptLayer as an AI engineering workbench, a platform designed for builders, not researchers. It provides tools for prompt versioning, evaluation, and observability so that teams can treat AI workflows with the same rigor as traditional software engineering while keeping flexibility for creative, probabilistic systems.
Implicit vs. Explicit Knowledge – The conversation highlights a critical divide between what AI can learn (explicit knowledge) and what remains uniquely human (implicit understanding, or "taste"). Jared explains that subject matter experts act as the bridge, embedding human nuance into prompts and workflows that LLMs alone can't replicate.
Evals and Backtesting – Rigorous evaluation is essential for maintaining AI product quality. Jared explains that evals serve as sanity checks and regression tests, ensuring that new prompts don't degrade performance. He describes two modes of testing: formal, repeatable evals and more experimental sprint-based iterations used to solve specific production issues.
Deterministic vs. Probabilistic Thinking – Jared contrasts the old, deterministic world of coding, with its predictable input-output logic, against the new probabilistic world of LLMs, where results vary and control lies in testing inputs rather than debugging outputs. This shift demands a new mindset: builders must embrace uncertainty instead of trying to eliminate it.
The Rise of Vibe Coding – Stewart and Jared explore vibe coding as a cultural and practical movement. It emphasizes creativity, intuition, and context-awareness over strict syntax.
Tools like Claude Code, Codex, and Cursor let engineers and non-engineers alike "feel" their way through building, merging programming with design thinking.
AI Culture and Human Adaptation – Jared predicts that AI will both empower and endanger human cognition. He warns of overreliance on LLMs for decision-making and the coming wave of "AI psychosis," yet remains optimistic that humans will adapt, using AI to amplify rather than atrophy critical thinking.
A Fragmented but Resilient Future – The episode closes with reflections on the social and political consequences of AI. Jared foresees the decline of centralized social media and the rise of fragmented digital cultures mediated by agents. Despite risks of isolation, he remains confident that optimism, adaptability, and pluralism will define the next AI era.