Marcin Grzejszczak, a veteran of the observability space, discusses the current state of the field, its evolution, and the fine-grained details of how to instrument a system to capture all relevant information at every level, both within services and in the communication between them.
Read a transcript of this interview: http://bit.ly/4mDTkFW
Subscribe to the Software Architects' Newsletter for your monthly guide to the essential news and experience from industry peers on emerging patterns and technologies: https://www.infoq.com/software-architects-newsletter
Upcoming Events:
InfoQ Dev Summit Munich (October 15-16, 2025) Essential insights on critical software development priorities. https://devsummit.infoq.com/conference/munich2025
QCon San Francisco 2025 (November 17-21, 2025) Get practical inspiration and best practices on emerging software trends directly from senior software developers at early adopter companies. https://qconsf.com/
QCon AI New York 2025 (December 16-17, 2025) https://ai.qconferences.com/
QCon London 2026 (March 16-19, 2026) https://qconlondon.com/
The InfoQ Podcasts: Weekly inspiration to drive innovation and build great teams from senior software leaders. Listen to all our podcasts and read interview transcripts:
- The InfoQ Podcast https://www.infoq.com/podcasts/
- Engineering Culture Podcast by InfoQ https://www.infoq.com/podcasts/#engineering_culture
- Generally AI: https://www.infoq.com/generally-ai-podcast/
Follow InfoQ:
- Mastodon: https://techhub.social/@infoq
- X: https://x.com/InfoQ?from=@
- LinkedIn: https://www.linkedin.com/company/infoq/
- Facebook: https://www.facebook.com/InfoQdotcom#
- Instagram: https://www.instagram.com/infoqdotcom/?hl=en
- Youtube: https://www.youtube.com/infoq
- Bluesky: https://bsky.app/profile/infoq.com
Write for InfoQ: Learn and share the changes and innovations in professional software development.
- Join a community of experts.
- Increase your visibility.
- Grow your career.
https://www.infoq.com/write-for-infoq
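To ground the "at every level" point from this episode, here is a minimal sketch (not from the interview, and in Python rather than the Spring stack Marcin works on) of what in-service versus cross-service instrumentation looks like with the OpenTelemetry SDK; the service and attribute names are invented:

```python
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire a tracer that prints spans to stdout (a real setup would export OTLP).
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("checkout-service")

# "Inside the service": a span around local work, enriched with attributes.
with tracer.start_as_current_span("handle_order") as span:
    span.set_attribute("order.id", 42)
    # "Between services": a child span around the outbound call. HTTP-client
    # auto-instrumentation would inject W3C traceparent headers here so the
    # downstream service can join the same trace.
    with tracer.start_as_current_span("call_payment_service"):
        pass  # the remote call itself is elided
```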
Most AI projects still fail, cost too much, or never deliver the value their teams hoped for. The root cause is nothing new: unoptimized models or code running the logic behind your AI apps. The solution is not new either: tuning the system based on insights from observability!
To learn more about the state of AI observability, we invited back Nir Gazit, CEO and co-founder of Traceloop, the company behind OpenLLMetry, the open source observability standard that is seeing exponential adoption growth. Tune in and learn how OpenLLMetry became such a successful open source project, which problems it solves, and what we can learn from other AI projects that successfully launched their AI apps and agents.
Links we discussed:
Nir's LinkedIn: https://www.linkedin.com/in/nirga/
OpenLLMetry: https://github.com/traceloop/openllmetry
Traceloop Hub LLM Gateway: https://www.traceloop.com/docs/hub
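For a feel of what OpenLLMetry instrumentation involves, here is a minimal sketch following the Traceloop SDK's documented init-and-decorate pattern; the app name, workflow name, and model choice are illustrative:

```python
# pip install traceloop-sdk openai
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
from openai import OpenAI

# One init call auto-instruments supported LLM clients and exports
# OpenTelemetry traces (the destination is configured via environment
# variables; by default it reports to Traceloop's backend).
Traceloop.init(app_name="joke-service")

@workflow(name="tell_joke")  # groups the LLM calls below into one trace
def tell_joke(topic: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Tell a short joke about {topic}"}],
    )
    return resp.choices[0].message.content

print(tell_joke("observability"))
```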
The discussion centers on two key design principles: observability, which ensures humans can understand what automated systems are doing and why, and directability, which allows humans to steer automation rather than simply turning it on or off. Using examples from aviation incidents like Boeing's MCAS system and emerging AI technologies, the episode demonstrates how these 25-year-old principles remain relevant for contemporary automation challenges in safety-critical systems.
Discussion Points:
(00:00) Background on automation and natural experiments in safety
(04:58) Hard vs. soft skills debate and limitations of binary thinking
(08:12) Two common approaches to automation problems and their flaws
(12:20) The substitution myth and why simple replacement doesn't work
(17:25) Design principles for coordination, observability, and directability
(24:33) Observability challenges with AI and machine learning systems
(26:25) Directability and the problem of binary control options
(30:47) Design implications and avoiding simplistic solutions
(33:27) Practical takeaways for human-automation coordination
Like and follow, and send us your comments and suggestions for future show topics!
Quotes:
Drew Rae: "The moment you divide it up and you just try to analyze the human behavior or analyze the automation, you lose the understanding of where the safety is coming from and what's necessary for it to be safe."
David Provan: "We actually don't think about that automation in the context of the overall system and all of the interfaces and everything like that. We look at AI as AI, and we deploy it, but we don't do any kind of comprehensive analysis of what's going to be all of the flow-on implications and interfaces and potentially unintended consequences for the system, not necessarily just the technology or automation itself."
Drew Rae: "It's not enough for an expert system to just constantly tell you all of the underlying rules that it's applying; that doesn't really give you the right level of visibility as understanding what it thinks the current state is."
David Provan: "But I think this paper makes a really good argument, which is actually our automated systems should be far more flexible than that. So I might be able to adjust its functioning. If I know enough about how it's functioning and why it's functioning, and I realize that the automation can't understand context and situation, then I should be able to make adjustments."
Drew Rae: "There's gotta be ways of allowing the automation to keep working, but to be able to retain control, and that's a really difficult design problem."
Resources:
Link to the Paper
The Safety of Work Podcast
The Safety of Work on LinkedIn
Feedback@safetyofwork
The ClickHouse open source project has gained interest in the observability community thanks to its outstanding performance benchmarks. Now ClickHouse is doubling down on observability with the release of ClickStack, a new open source observability stack that bundles ClickHouse, OpenTelemetry, and the HyperDX frontend. I invited Mike Shi, the co-founder of HyperDX and co-creator of ClickStack, to tell us all about this new project. Mike is Head of Observability at ClickHouse and brings prior observability experience with Elasticsearch and more.
You can read the recap post: https://medium.com/p/73f129a179a3/
Show Notes:
00:00 episode and guest intro
04:38 taking the open source path as an entrepreneur
10:51 the HyperDX observability user experience
16:08 challenges in implementing observability directly on ClickHouse
20:03 intro to ClickStack and incorporating OpenTelemetry
32:35 balancing simplicity and flexibility
36:15 SQL vs. Lucene query languages
39:06 performance, cardinality and the new JSON type
52:14 use cases in production by OpenAI, Anthropic, Tesla and more
55:38 episode outro
Resources:
HyperDX: https://github.com/hyperdxio/hyperdx
ClickStack: https://clickhouse.com/docs/use-cases/observability/clickstack
Shopify's Journey to Planet-Scale Observability: https://medium.com/p/9c0b299a04dd
ClickHouse: Breaking the Speed Limit for Observability and Analytics: https://medium.com/p/2004160b2f5e
New JSON data type for ClickHouse: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
Socials:
BlueSky: https://bsky.app/profile/openobservability.bsky.social
Twitter: https://twitter.com/OpenObserv
LinkedIn: https://www.linkedin.com/company/openobservability/
YouTube: https://www.youtube.com/@openobservabilitytalks
Dotan Horovits
============
Twitter: @horovits
LinkedIn: www.linkedin.com/in/horovits
Mastodon: @horovits@fosstodon
BlueSky: @horovits.bsky.social
Mike Shi
=======
Twitter: https://x.com/MikeShi42
LinkedIn: https://www.linkedin.com/in/mikeshi42
BlueSky: https://bsky.app/profile/mikeshi42.bsky.social
OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube.
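Since the episode touches on the new JSON column type, here is a hedged sketch of trying it from Python with the clickhouse-connect client; the table and attribute names are invented, and on some ClickHouse releases the JSON type may first need to be enabled with a setting:

```python
# pip install clickhouse-connect
# Assumes a local ClickHouse server recent enough to support the JSON type.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS otel_logs (
        ts    DateTime64(9),
        body  String,
        attrs JSON
    ) ENGINE = MergeTree ORDER BY ts
""")

# JSON values can be written as string literals; ClickHouse parses them.
client.command("""
    INSERT INTO otel_logs VALUES
    (now64(9), 'user login', '{"service": "auth", "level": "info"}')
""")

# Dynamic subcolumns of a JSON value are addressed with dot notation.
rows = client.query("SELECT body, attrs.service FROM otel_logs").result_rows
print(rows)
```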
Today, the Elixir Wizards wrap up Season 14 "Enter the Elixirverse." Dan, Charles, and Sundi look back at some common themes: Elixir plays well with others, bridging easily to other languages and tools, and remains a powerful technology for data flow, concurrency, and developer experience. We revisit the popular topics of the year, from types and tooling to AI orchestration and reproducible dev environments, and share what we're excited to explore next. We also invite your questions and takeaways to help shape future seasons and conference conversations. Season 14 doubles as a handy primer for anyone curious about how Elixir integrates across the stack.
Key topics discussed in this episode:
* Lessons from a season of interoperability
* Set-theoretic types and what new compiler warnings unlock
* AI in practice: LLM orchestration, fallbacks, and real-world use
* SDUI and GraphQL patterns for shipping UI across web/iOS/Android
* Dataframes in Elixir with Explorer for analytics workflows
* Python interoperability (ErlPort, PythonX) and when to reach for it
* Reproducible dev environments with Nix and friends
* Performance paths: Rustler and Zig for native extensions
* Bluetooth & Nerves: Blue Heron and hardware integrations
* DevEx upgrades: LiveView, build pipelines, and standard project setup
* Observability and ops: Prometheus/Grafana and sensible deployments
* Community feedback, conferences, and what's on deck for next season
Links mentioned in this episode:
Cars.com S14E06 SDUI at Scale with Elixir: https://youtu.be/nloRcgngTk?si=g4Zd4N1s56Ronrtw
https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html
https://wordpress.com/
https://elixir-lang.org/
S14E01 Zigler: Zig NIFs for Elixir: https://youtu.be/hSAvWxh26TU?si=d55tVuZbNw0KCfT
https://ziglang.org/
https://hexdocs.pm/zigler/Zig.html
https://github.com/blue-heron/blue_heron
https://github.com/elixir-explorer/explorer
S14E08 Nix for Elixir Apps: https://youtu.be/yymUcgy4OAk?si=BRgTlc2VK5bsIhIf
https://nixos.org/
https://nix.dev/
S14E07 Set Theoretic Types in Elixir: https://youtu.be/qMmEnXcHxL4?si=Ux2lebiwEp3mc0e
S14E10 Python in Elixir Apps: https://youtu.be/SpVLrrWkRqE?si=ld3oQVXVlWHpo7eV
https://www.python.org/
https://hexdocs.pm/pythonx/
https://github.com/Pyrlang/Pyrlang
https://github.com/erlport/erlport
S14E03 LangChain: LLM Integration for Elixir: https://youtu.be/OwFaljL3Ptc?si=A0sDs2dzJ0UoE2PY
https://github.com/brainlid/langchain
S14E04 Nx & Machine Learning in Elixir: https://youtu.be/Ju64kAMLlkw?si=zdVnkBTTLHvIZNBm
S14E05 Rustler: Bridging Elixir and Rust: https://youtu.be/2RBw7B9OfwE?si=aRVYOyxxW8fTmoRA
https://github.com/rusterlium/rustler
Season 3: Working with Elixir: https://youtube.com/playlist?list=PLTDLmInI9YaDbhMRpGuYpboVNbp1Fl9PD&si=hbe7qt4gRUfrMtpj
S14E11 Vibe Coding the LoopedIn Crochet App: https://youtu.be/DX0SjmPE92g?si=zCBPjS1huRDIeVeP
Season 5: Adopting Elixir (YouTube): Launchisode and Outlaws Takeover with Chris Keathley, Amos King, and Anna Neyzberg
S13E01 Igniter: Elixir Code Generation: https://youtu.be/WM9iQlQSF_g?si=e0CAiML2qC2SxmdL
Season 8: Elixir in a Polyglot Environment: https://youtube.com/playlist?list=PLTDLmInI9YaAPlvMd-RDp6LWFjI67wOGN&si=YCI7WLA8qozD57iw
!! We Want to Hear Your Thoughts !!
Have questions, comments, or topics you'd like us to discuss on the podcast? Share your thoughts with us here: https://forms.gle/Vm7mcYRFDgsqqpDC9
I am joined by Ed of Cribl, an observability and telemetry platform for security and IT teams.
Capture your screen effortlessly: simple and clear, but packed with features, Simple Screenshot is a drop-in replacement. https://go.chrischinchilla.com/simple-screenshot
For show notes and an interactive transcript, visit chrischinchilla.com/podcast/
To reach out and say hello, visit chrischinchilla.com/contact/
To support the show for ad-free listening and extra content, visit chrischinchilla.com/support/
Marty Kagan, co-founder and CEO of Hydrolix, joined me for a detailed conversation on the importance of delivering better end-user QoE with real-time telemetry and the challenges that come with CDN observability. Marty highlights why more hot data is crucial for AIOps and the challenges that media and entertainment (M&E) customers face today, including the need to discard data due to storage costs. Marty presents a compelling argument against the notion that most data is irrelevant and that content owners only need to retain a small percentage as a sample. We also discuss why Hydrolix isn't an AI company, and Marty's plans to continue growing its streaming data lake platform in verticals outside of video.
Podcast produced by Security Halt Media
We're taking the Code RED podcast public! Join Dash0 CEO Mirko Novakovic, CTO Ben Blackmore, and Principal AI Engineer Lariel Fernandes for a no-fluff look at AI in observability.
We'll dig into:
→ What agentic observability might actually look like
→ How OpenTelemetry enables the AI ecosystem
→ How AI shows up in real engineering workflows
→ And what still needs to be built
Many React Native apps ship without full observability. The result? Blind spots in performance, crashes, and user behavior once your app is in the wild. In this episode of React Universe On Air, Łukasz Chludziński sits down with Jonathan Munz (Senior Software Engineer at Embrace) and Adam Horodyski (React Native Expert at Callstack) to unpack how OpenTelemetry can bring structure and clarity to mobile monitoring. They break down why mobile observability is harder than backend observability, what the OTLP protocol enables, and how to instrument React Native apps without locking into a single vendor. You'll also hear how community-driven tooling like React Native OpenTelemetry and the Embrace React Native SDK can simplify setup and improve data portability.
You'll learn:
➡️ How observability and OpenTelemetry work together
➡️ The 3 core OpenTelemetry signal types for mobile
➡️ Why mobile instrumentation is more complex than backend telemetry
➡️ How OTLP improves interoperability between tools
➡️ Where auto-instrumentation is still missing in React Native
➡️ The role of Embrace and open-source libraries in reducing setup overhead
Check out episode resources on our website.
Software Engineering Radio - The Podcast for Professional Software Developers
Qian Li of DBOS, a durable execution platform born from research by the creators of Postgres and Spark, speaks with host Kanchan Shringi about building durable, observable, and scalable software systems, and why that matters for modern applications. They discuss database-backed program state, workflow orchestration, real-world AI use cases, and comparisons with other workflow technologies. Li explains how DBOS persists not just application data but also program execution state in Postgres to enable automatic recovery and exactly-once execution. She outlines how DBOS uses workflow and step annotations to build deterministic, fault-tolerant flows for everything from e-commerce checkouts to LLM-powered agents. Observability features, including SQL-accessible state tables and a time-travel debugger, allow developers and business users to understand and troubleshoot system behavior. Finally, she compares DBOS with tools like Temporal and AWS Step Functions. Brought to you by IEEE Computer Society and IEEE Software magazine.
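To make the workflow/step annotation model concrete, here is a minimal sketch in the style of the DBOS Transact Python API; the function names are invented, and the exact config shape and connection string may differ by version:

```python
# pip install dbos  -- assumes a reachable Postgres instance for DBOS to
# persist program state in; config values here are illustrative.
from dbos import DBOS

DBOS(config={"name": "checkout-demo",
             "database_url": "postgresql://postgres:postgres@localhost/dbos"})

@DBOS.step()
def charge_card(order_id: int) -> None:
    # Step completion is checkpointed in Postgres, so a crashed workflow
    # resumes after this step instead of re-running it (exactly-once).
    print(f"charging card for order {order_id}")

@DBOS.step()
def ship_order(order_id: int) -> None:
    print(f"shipping order {order_id}")

@DBOS.workflow()
def checkout(order_id: int) -> None:
    # The workflow is deterministic glue; all side effects live in steps.
    charge_card(order_id)
    ship_order(order_id)

DBOS.launch()  # also recovers any workflows interrupted mid-execution
checkout(42)
```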
This interview was recorded for the GOTO Book Club: http://gotopia.tech/bookclub
Read the full transcription of the interview here.
Ben Evans - Senior Principal Software Engineer at Red Hat & co-author of "Optimizing Cloud Native Java" and many more books
Holly Cummins - Senior Principal Software Engineer on the Red Hat Quarkus team
RESOURCES
Ben
https://mastodon.social/@kittylyst
https://www.linkedin.com/in/kittylyst
https://www.kittylyst.com
Holly
https://hollycummins.com
https://bsky.app/profile/hollycummins.com
https://hachyderm.io/@holly_cummins
https://linkedin.com/in/holly-k-cummins
DESCRIPTION
Holly Cummins talks with Ben Evans about his latest book, "Optimizing Cloud Native Java", which updates his previous work "Optimizing Java" to reflect the realities of cloud native environments.
Ben explains that performance engineering is not just technical but also psychological, emphasizing the importance of user expectations and defining clear performance goals. They discuss how modern Java performance must account for cloud native architectures, with applications running across distributed microservices and containerized, single-core environments.
The book focuses on the importance of measuring relevant data, warns against relying on misleading micro-benchmarks, and highlights how system-level benchmarks offer a clearer picture. Ben also delves into the JVM's hidden complexities, such as changes in Java 17 and the impact of virtual threads. Practical, real-world examples in the book, like the "fighting animals" microservices application, help developers learn how to optimize Java performance in real network environments.
Finally, Ben touches on the future of Java concurrency, with virtual threads and structured concurrency offering new ways to handle performance challenges in cloud native systems.
RECOMMENDED BOOKS
Ben Evans & Jim Gough • Optimizing Cloud Native Java
Ben Evans, Jason Clark & David Flanagan • Java in a Nutshell
Ben Evans, Martijn Verburg & Jason Clark • The Well-Grounded Java Developer
Ben Evans, Jim Gough & Chris Newland • Optimizing Java
Ben Evans & Martijn Verburg • The Well-Grounded Java Developer
CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
Former Coralogix VP of Products and OverOps co-founder Chen Harel joins Dash0's Mirko Novakovic for a candid look at the observability industry — past, present and future. They unpack the early days of production debugging, the realities of scaling in a crowded market and the behind-the-scenes of big-name acquisitions. From surviving startup cycles to navigating enterprise politics, Chen shares what he's learned from a decade of building, selling and staying competitive in one of tech's most unique spaces.
In the Pit with Cody Schneider | Marketing | Growth | Startups
AI-driven search (AISEO) is opening a new lane for brands in competitive categories. Joe Davies from FATJOE explains why branded mentions (not just links) are increasingly what LLMs use to decide recommendations, and how teams can systematically earn those mentions. We cover tactics like guest blogging at scale, context-seeding your USP across reviews/listicles, building deep product docs to feed LLMs, and using tier-two links to get your "influencer pages" ranking. Early data shows 2–3× higher conversion rates from AI-referred traffic because buyers arrive pre-educated and ready to act.
What You'll Learn:
Why AISEO rewards brand mentions and clear USPs more than classic link metrics.
How AI-referred traffic converts 2–3× higher than traditional search.
A repeatable process to seed your brand in listicles, reviews, and comparisons.
How to "context-seed" your USP so LLMs recommend you for the right reason.
Why deep help docs / knowledge bases make LLMs more confident recommending you.
How to choose targets (DR + real traffic), then lift them with tier-two links.
The state of AISEO observability (what to track, what's still immature).
Tactical Playbook (Step-by-Step):
1. Define your USP: the specific "best for ___" angle you want LLMs to repeat.
2. Keyword map long-tail, bottom-funnel queries (e.g., "best X for Y," "X vs Y," "X alternatives," "[product] review").
3. Prospect targets with credible traffic (DR is fine as a filter, but prioritize verified organic traffic).
4. Commission content: secure guest posts/listicles and full reviews on those sites. Mix formats to look natural.
5. Context-seed your USP in every placement (e.g., "Best for small teams," "Most features," "Best value").
6. Include competitors in listicles/reviews so the page is useful (LLMs prefer balanced sources).
7. Boost with tier-two links (niche edits, syndication) to help these pages rank on pages 1-3.
8. Expand surface area: Reddit answers, YouTube/tutorial mentions, and social chatter to reinforce brand salience.
9. On-site foundation: build exhaustive docs (features, integrations, FAQs, facts sections) so LLMs can learn you deeply.
10. Measure pragmatically: track referral traffic from AI surfaces and downstream conversions; current "AI visibility" tools are early.
Resources & Mentions:
ChatGPT Path (shows the searches/sources ChatGPT runs under the hood): https://chromewebstore.google.com/detail/chatgpt-path/kiopibcjdnlpamdcdcnphaajccobkban
FATJOE - Brand Mentions Service: http://fatjoe.com/brand-mentions
FATJOE: https://fatjoe.com/
Key Takeaways:
AISEO is early but growing fast and already drives higher-intent traffic.
Focus on being mentioned credibly across the open web; LLMs synthesize those signals.
Listicles + reviews on high-trust, real-traffic sites are the current highest-leverage assets.
Your docs are marketing now; LLMs read them and recommend accordingly.
Don't abandon SEO; it remains the foundation that AI systems lean on.
Chapters:
00:00 Cold open: AISEO's opportunity & why mentions matter
03:45 Data: AI referrals converting 2–3× vs. classic SEO
07:50 Who should prioritize AISEO (and who can wait)
10:30 Tactics: listicles, reviews, and "context-seeding" your USP
15:45 Tools & workflows; extension that reveals ChatGPT's queries
19:45 Content ops: human vs. AI writing, plans, and clustering
22:30 Build deep product docs to feed LLM understanding
26:10 Ranking the influencer pages + tier-two links
33:00 Observability today: what's useful, what isn't yet
36:50 The next 5-10 years: AI + SEO, not AI vs. SEO
Guest: Joe Davies
X: https://x.com/fatjoedavies
LinkedIn: https://es.linkedin.com/in/joe-davies-seo
Website: https://fatjoe.com/
We are overdue for a vendor-neutral, industry-wide event dedicated to our favorite topic: open observability.
Last month (June 2025) the Cloud Native Computing Foundation (CNCF) ran the first-ever Open Observability Summit, bringing together the world's best experts in the field for a day packed with talks from project maintainers, end users, and practitioners.
We're proud partners of the event and are here to bring you the highlights from this industry-shaping gathering.
This special episode has two parts: one recorded onsite before the event, covering conference goals and insights from the talk submissions, and the other recorded after the event, covering the highlights of the event and the talks. The guests for this episode are two observability veterans: Alok Bhide, member of the event's content committee and head of product innovation at Chronosphere; and Henrik Rexed, developer advocate at Dynatrace, CNCF Ambassador, and host of the Is It Observable podcast.
Catch up on everything you need to know from the first-ever Open Observability Summit.
You can read the recap post: https://medium.com/p/d42c8826d6a5/
Show Notes:
00:00 - intro
02:52 - Part 1 pre-event
03:40 - guest intro Alok Bhide
04:49 - a new community event for open observability
06:58 - talk submission highlights from the CFP content reviewer
12:34 - a view of the open observability stack and its use
16:42 - Fluent Bit alignment with OpenTelemetry
20:08 - AI in observability
25:34 - Part 2 talk highlights
26:22 - Fluent Bit vs. OpenTelemetry Collector benchmark analysis
37:51 - OpenSearch 3.1 release
40:47 - eBay's observability talk
47:00 - Kotlin SDK for OTel talk for Android developers
51:45 - OTel Collector fine-tuning talk
53:52 - Broadcom OTel use case from mobile to mainframe
56:43 - Spotify migration from in-house TSDB to VictoriaMetrics and Prometheus
58:20 - OTel Collector replacement in Rust with the Rotel project
1:00:58 - Noisy neighbors network observability
1:03:04 - rising awareness of OTel semantic conventions
1:05:50 - outro
Resources:
Open Observability Summit + OTel Community Day: https://events.linuxfoundation.org/open-observability-summit-otel-community-day/
eBay innovation with open source observability: https://www.youtube.com/watch?v=6ycNhzRVSbU&list=PLj6h78yzYM2NFT2PGItX2idBf7v8fHcy7&index=35
More on eBay's journey to planet-scale observability: https://www.youtube.com/watch?v=-UsU3nRglhA&list=PLd57eY2edRXz4djMETYTm-2p8WGTdoX3D
Spotify talk: https://www.youtube.com/watch?v=87koDlpKDR4&list=PLj6h78yzYM2NFT2PGItX2idBf7v8fHcy7
Kotlin SDK for OTel: https://www.youtube.com/watch?v=di5nhYvUh6w&list=PLj6h78yzYM2NFT2PGItX2idBf7v8fHcy7
More on mobile observability with OTel: https://medium.com/p/2eb847c41941
OpenTelemetry Collector vs.
Fluent Bit: https://www.youtube.com/watch?v=tZho5W9L_Z8&list=PLj6h78yzYM2NFT2PGItX2idBf7v8fHcy7&index=8
Telemetry Pipelines: https://www.youtube.com/watch?v=0d1g5ZWAc1Y&list=PLj6h78yzYM2NFT2PGItX2idBf7v8fHcy7&index=30
OTel Collector in Rust with Rotel: https://www.youtube.com/watch?v=xeQnP8Ct7qY&list=PLj6h78yzYM2NFT2PGItX2idBf7v8fHcy7&index=16
Rotel project repo: https://github.com/streamfold/rotel
Noisy neighbor detection: https://www.youtube.com/watch?v=xVqiOtXTEFA
Socials:
BlueSky: https://bsky.app/profile/openobservability.bsky.social
Twitter: https://twitter.com/OpenObserv
LinkedIn: https://www.linkedin.com/company/openobservability/
YouTube: https://www.youtube.com/@openobservabilitytalks
Dotan Horovits
============
Twitter: https://twitter.com/horovits
LinkedIn: https://www.linkedin.com/in/horovits/
BlueSky: https://bsky.app/profile/horovits.bsky.social
Mastodon: https://fosstodon.org/@horovits
Henrik Rexed
===========
LinkedIn: https://www.linkedin.com/in/hrexed/
BlueSky: @hrexed.bsky.social
YouTube: https://www.youtube.com/@isitobservable
Alok Bhide
=========
LinkedIn: https://www.linkedin.com/in/albhide/
Bret is joined by Andrew Tunall, President and Chief Product Officer at Embrace, to discuss his prediction that we'll all start shipping non-QA'd code (buggier code in production) and that QA will need to be replaced by better observability.
For memberships, join this channel as a member here: https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join
Summary:
In this conversation, Nitish Tiwari discusses Parseable, an observability platform designed to address the challenges of managing and analyzing large volumes of data. The discussion covers the evolution of observability systems, the design principles behind Parseable, and the importance of efficient data ingestion and storage in S3. Nitish explains how Parseable allows for flexible deployment, handles data organization, and supports querying through SQL. The conversation also touches on the correlation of logs and traces, failure modes, scaling strategies, and the optional nature of indexing for performance optimization.
References:
Parseable: https://www.parseable.com/
GitHub Repository: https://github.com/parseablehq/parseable
Architecture: https://parseable.com/docs/architecture
Chapters:
00:00 Introduction to Parseable and Observability Challenges
05:17 Key Features of Parseable
12:03 Deployment and Configuration of Parseable
18:59 Ingestion Process and Data Handling
32:52 S3 Integration and Data Organisation
35:26 Organising Data in Parseable
38:50 Metadata Management and Retention
39:52 Querying Data: User Experience and SQL
44:28 Caching and Performance Optimisation
46:55 User-Friendly Querying: SQL vs. UI
48:53 Correlating Logs and Traces
50:27 Handling Failures in Ingestion
53:31 Managing Spiky Workloads
54:58 Data Partitioning and Organisation
58:06 Creating Indexes for Faster Reads
01:00:08 Parseable's Architecture and Optimisation
01:03:09 AI for Enhanced Observability
01:05:41 Getting Involved with Parseable
Don't forget to like, share, and subscribe for more insights!
=============================================================================
Like building stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, and SQLite. Use the link below to sign up and get 40% off a paid subscription: https://app.codecrafters.io/join?via=geeknarrator
=============================================================================
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Stay Curious! Keep Learning!
#database #s3 #objectstorage #opentelemetry #logs #metrics
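Since the episode highlights querying log streams with SQL, here is a heavily hedged sketch of what that looks like against Parseable's REST query API from Python; the endpoint path, payload fields, stream name, and default credentials are all assumptions to verify against the Parseable docs for your version:

```python
# pip install requests
import requests

PARSEABLE_URL = "http://localhost:8000"  # assumed local deployment

# Assumed query endpoint: SQL plus a time window, with basic auth.
resp = requests.post(
    f"{PARSEABLE_URL}/api/v1/query",
    json={
        "query": "SELECT status, count(*) FROM frontend GROUP BY status",
        "startTime": "2025-01-01T00:00:00Z",
        "endTime": "2025-01-02T00:00:00Z",
    },
    auth=("admin", "admin"),  # assumed default credentials
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # rows aggregated server-side, backed by data in S3
```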
"Privacy engineering is the art of translating privacy laws and policies into code, figuring out how to make legal requirements such as ‘an individual must be able to request deletion of all their personal data' a technical reality.", was the elegant explanation from Cat Easdon when asked about what she is doing in her day job.If you want to learn more then tune in to this episode. Cat, Privacy Engineer at Dynatrace, shares her learnings about things such as: When the right time is to form your own privacy engineering team, why privacy means different things for different people and regulators and what privacy considerations we specifically have in the observability industry so that our users trust our services!Links:Cat's LinkedIn Profile: https://www.linkedin.com/in/easdon/Publications from Cat: https://www.dynatrace.com/engineering/persons/catherine-easdon/Blog on Managing Sensitive Data at Scale: https://www.dynatrace.com/news/blog/manage-sensitive-data-and-privacy-requirements-at-scale/Semgrep for lightweight code scanning: https://github.com/semgrep/semgrepThe IAPP: https://iapp.org/'Meeting your users' expectations' is formally described by the theory of contextual integrity: https://www.open.edu/openlearncreate/mod/page/view.php?id=214540Facebook's $5 billion fine from the FTC: http://ftc.gov/news-events/news/press-releases/2019/07/ftc-imposes-5-billion-penalty-sweeping-new-privacy-restrictions-facebookFact-check: "The $5 billion penalty against Facebook is the largest ever imposed on any company for violating consumers' privacy and almost 20 times greater than the largest privacy or data security penalty ever imposed worldwide. It is one of the largest penalties ever assessed by the U.S. government for any violation." I think that's still true; the largest fine under the GDPR was €1.2 billion (again for Facebook/Meta)
How much has DevOps really changed since the AI boom? Are you truly observing your Salesforce org, or just reacting to fires? When do you pay down technical debt versus pushing for the next big feature?
If you're asking these questions, you're not alone. Host Jack McCurdy is joined by DevOps expert Andy Barrick to tackle the tough challenges facing teams today. This practical conversation covers the evolution of the field since 2022, providing actionable advice on implementing proactive observability, managing risk, and making incremental changes that deliver massive impact.
Learn more:
Read the observability whitepaper: https://grst.co/4lFfxmm
How Salesforce teams execute observability for Salesforce: https://grst.co/3GXddrU
See Flow and Apex Error Monitoring in action: https://grst.co/3IF5Eqq
About DevOps Diaries: Salesforce DevOps Advocate Jack McCurdy chats to members of the Salesforce community about their experience in the Salesforce ecosystem. Expect to hear and learn from inspirational stories of personal growth and business success, whilst discovering all the trials, tribulations, and joy that comes with delivering Salesforce for companies of all shapes and sizes. New episodes bi-weekly on YouTube as well as on your preferred podcast platform.
Podcast produced and sponsored by Gearset. Learn more about Gearset: https://grst.co/4iCnas2
Subscribe to Gearset's YouTube channel: https://grst.co/4cTAAxm
LinkedIn: https://www.linkedin.com/company/gearset
X/Twitter: https://x.com/GearsetHQ
Facebook: https://www.facebook.com/gearsethq
About Gearset: Gearset is the leading Salesforce DevOps platform, with powerful solutions for metadata and CPQ deployments, CI/CD, automated testing, sandbox seeding and backups. It helps Salesforce teams apply DevOps best practices to their development and release process, so they can rapidly and securely deliver higher-quality projects. Get full access to all of Gearset's features for free with a 30-day trial: https://grst.co/4iKysKW
Chapters:
00:00 The Evolution of DevOps Since 2022
02:35 The Role of AI in Automation
05:21 Testing Fundamentals in DevOps
08:29 Understanding Observability in Salesforce
11:05 Reactive vs Proactive Observability
14:03 The Importance of Proactive Monitoring
16:29 Implementing Observability in DevOps
19:41 Starting Your Observability Journey
22:14 Balancing Refactoring and New Initiatives
24:56 Risk Management and Observability
27:48 Final Thoughts on Observability
I am in conversation with Tom Elliott, founder of Ocuroot and former Engineering Productivity lead at Yext.
Introduction: Tom Elliott shares his career journey, starting from his early interest in computers to his current role in dev tooling.
Career Insights: Tom discusses the challenges of entering the industry during the financial crash and his transition from contract work to a full-time role at VMware. He highlights his experience at VMware, working on early-stage projects like building login pages and authentication systems.
Shift to New York: Tom talks about his move to New York and his work at a small VPN startup, focusing on user-facing applications.
Experience at Yext: Tom shares his journey at Yext, starting as a mobile developer and gradually moving to backend development and dev tooling. He emphasizes the importance of being close to the users and getting immediate feedback on the tools he built.
Challenges and Solutions: Tom discusses the challenges of working in large organizations, such as resolving merge conflicts and managing long-lived branches. He explains the benefits of trunk-based development and feature flags for managing multiple features and environments.
Observability and Deployment: Tom highlights the importance of observability and the use of tools like OpenTelemetry for distributed tracing. He shares insights on managing different deployment environments and ensuring consistency across regions.
Quality and CI/CD Pipelines: Tom talks about the emphasis on quality and the importance of CI/CD pipelines in ensuring reliable software releases. He shares his experience of setting up CI/CD pipelines to avoid issues like broken installers.
Conclusion: Tom reflects on the importance of flexibility and prototyping in software development. He shares his thoughts on the future of AI in coding and the role of human operators in leveraging AI tools.
Bio: During nearly 20 years in the tech industry, Tom has worked for companies large and small on both sides of the pond and all layers of the tech stack, from user-facing mobile and desktop applications to the backest of backends: DevOps. He is currently building Ocuroot, his own take on a CI/CD solution, based on his experiences scaling large numbers of environments for B2B SaaS products.
Links:
* LinkedIn: https://www.linkedin.com/in/telliott1984/
* BlueSky: https://bsky.app/profile/telliott.me
* Blog: https://thefridaydeploy.substack.com/
* Ocuroot: https://www.ocuroot.com
Outline:
00:00 - Intro
01:06 - The big idea
03:42 - Controllability, observability, and ... the space race!
14:52 - Kálmán and the state-space paradigm
00:00 - The math and intuition: state-space basics, definitions, and duality
00:00 - A touch of nonlinearity
00:00 - Developments in the field: a chronological tour
00:00 - Controllability and observability: quo vaditis?
00:00 - Outro
Links:
Kálmán: https://tinyurl.com/bdzj7mtr
Controllability: https://tinyurl.com/28s5zxpn
Observability: https://tinyurl.com/yjxncxdn
Paper - "Contributions to the theory of optimal control": https://tinyurl.com/9wwf8pvh
Paper - "Discovery and Invention": https://tinyurl.com/ryfn463n
Kálmán's speech - Kyoto Prize: https://tinyurl.com/2ahrjdah
Paper - Controllability of complex networks: https://tinyurl.com/3zk99n4s
Support the show
Podcast info:
Podcast website: https://www.incontrolpodcast.com/
Apple Podcasts: https://tinyurl.com/5n84j85j
Spotify: https://tinyurl.com/4rwztj3c
RSS: https://tinyurl.com/yc2fcv4y
Youtube: https://tinyurl.com/bdbvhsj6
Facebook: https://tinyurl.com/3z24yr43
Twitter: https://twitter.com/IncontrolP
Instagram: https://tinyurl.com/35cu4kr4
Acknowledgments and sponsors:
This episode was supported by the National Centre of Competence in Research on «Dependable, ubiquitous automation» and the IFAC Activity fund. The podcast benefits from the help of an incredibly talented and passionate team. Special thanks to L. Seward, E. Cahard, F. Banis, F. Dörfler, J. Lygeros, ETH studio and mirrorlake. Music was composed by A New Element.
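For listeners who want the formal core of the episode's two title concepts, here is the standard textbook statement (background material, not taken from the episode) for a linear time-invariant system:

```latex
% LTI system: \dot{x} = Ax + Bu, \quad y = Cx, \quad x \in \mathbb{R}^n
\[
\mathcal{C} = \begin{bmatrix} B & AB & A^{2}B & \cdots & A^{n-1}B \end{bmatrix},
\qquad
\mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix}
\]
% (A,B) is controllable iff \operatorname{rank}\mathcal{C} = n;
% (A,C) is observable iff \operatorname{rank}\mathcal{O} = n.
% Kálmán duality: (A,C) is observable exactly when (A^{\top}, C^{\top})
% is controllable, which is why the two notions mirror each other.
```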
We welcome Daniel Romeiro, better known as Infoslack, for a deep dive into the fast-moving universe of Artificial Intelligence, DevOps, and Machine Learning. In this episode, we explore how to filter out hype noise with a reverse-filter approach and discuss the behind-the-scenes work of deploying Machine Learning models to production.
We swap experiences on advanced observability in AI pipelines and share insights on accumulating DevOps skills over a career without ever losing your footing. Between jokes, we also analyze the impact of real-time A/B testing and the complexity of managing AI artifacts at scale.
Finally, we reflect on what lies ahead: what is the next big step for SREs who want to stay relevant in a landscape dominated by generative AI? We talk about how poorly planned architectures can become latency bottlenecks and present strategies for guaranteeing high availability even when external APIs decide to go down.
Important links:
- Daniel Romeiro - https://www.linkedin.com/in/infoslack/
- João Brito - https://www.linkedin.com/in/juniorjbn
- Watch the film TEArapia - https://youtu.be/M4QFmW_HZh0?si=HIXBDWZJ8yPbpflM
Join our early access program and get a more secure environment in moments! https://getup.io/zerocve
Kubicast is produced by Getup, a company specializing in Kubernetes and open source projects for Kubernetes. Podcast episodes are available on all major digital audio platforms and on YouTube.com/@getupcloud.
Jacob Visovatti and Conner Goodrum of Deepgram speak with host Kanchan Shringi about testing ML models for enterprise use and why it's critical for product reliability and quality. They discuss the challenges of testing machine learning models in enterprise environments, especially in foundational AI contexts. The conversation particularly highlights the differences in testing needs between companies that build ML models from scratch and those that rely on existing infrastructure. Jacob and Conner describe how testing is more complex in ML systems due to unstructured inputs, varied data distribution, and real-time use cases, in contrast to traditional software testing frameworks such as the testing pyramid. To address the difficulty of ensuring LLM quality, they advocate for iterative feedback loops, robust observability, and production-like testing environments. Both guests underscore that testing and quality assurance are interdisciplinary efforts that involve data scientists, ML engineers, software engineers, and product managers. Finally, this episode touches on the importance of synthetic data generation, fuzz testing, automated retraining pipelines, and responsible model deployment—especially when handling sensitive or regulated enterprise data. Brought to you by IEEE Computer Society and IEEE Software magazine.
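To make the fuzz-testing idea from this discussion concrete, here is a hedged sketch of a property-based robustness test; transcribe() is a hypothetical stand-in for a model-serving function, not Deepgram's actual API:

```python
# pip install hypothesis
from hypothesis import given, strategies as st

def transcribe(audio_bytes: bytes) -> str:
    # Hypothetical model wrapper; swap in a real client to make this
    # test meaningful against a deployed model endpoint.
    return "" if not audio_bytes else "placeholder transcript"

@given(st.binary(max_size=10_000))
def test_transcribe_never_crashes(audio_bytes: bytes) -> None:
    # Property: arbitrary, even malformed, input must yield a string and
    # never raise -- a minimal robustness contract for unstructured inputs.
    result = transcribe(audio_bytes)
    assert isinstance(result, str)
```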
Operational disruptions don't come unannounced. There are always early warning signs, if only you have the right observability solution to catch them before they snowball into costly outages. In this episode of the AI in Action podcast, Vikram Murali, VP of Software Development for IBM Automation, explains how continuous observability has taken the baton from traditional monitoring, giving a new meaning to operational resilience. Observability has transformed reactive measures into proactive steps, leading to more seamless operations and fewer costly disruptions. Catch the full discussion to learn how you can run your business smoothly by tapping into the combined power of observability, automation and remediation.
0:00 Intro
03:35 How do AI solutions affect observability?
07:35 How much visibility does observability provide?
12:26 The difference between good and bad observability tools
20:21 Choosing the right data for the observability tool
23:42 Balancing power and control in agentic frameworks
26:33 Getting started with observability
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Samuel Colvin, the CEO and founder of Pydantic, speaks with host Gregory M. Kapfhammer about the ecosystem of Pydantic's Python frameworks, including Pydantic, Pydantic AI, and Pydantic Logfire. Along with discussing the design, implementation, and use of these frameworks, they dive into the refactoring of Pydantic and the follow-on performance improvements. They also explore ways in which Python programmers can use these three frameworks to build, test, evaluate, and monitor their own applications that interact with both local and cloud-based large language models. Brought to you by IEEE Computer Society and IEEE Software magazine.
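As a quick reference for listeners new to the ecosystem, here is a minimal example of the validation pattern at the core of Pydantic, using the documented v2 API; the User model itself is invented for illustration:

```python
# pip install pydantic
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    tags: list[str] = []

# Untrusted input is parsed and coerced against the declared types.
print(User.model_validate({"id": "7", "name": "Ada"}))  # id coerced to 7

try:
    User.model_validate({"id": "not-a-number", "name": "Bob"})
except ValidationError as e:
    print(e)  # reports exactly which field failed and why
```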
Zack Kayser, Staff Software Engineer at Cars.com, joins Elixir Wizards Sundi Myint and Charles Suggs to discuss how Cars.com adopted a server-driven UI (SDUI) architecture powered by Elixir and GraphQL to deliver consistent, updatable interfaces across web, iOS, and Android. We explore why SDUI matters for feature velocity, how a mature design system and schema planning make it feasible, and what it takes, culturally and technically, to move UI logic from client code into a unified backend.
Key topics discussed in this episode:
* SDUI fundamentals and how it differs from traditional server-side rendering
* GraphQL as the single source of truth for UI components and layouts
* Defining abstract UI components on the server to eliminate duplicate logic
* Leveraging a robust design system as the foundation for SDUI success
* API-first development and cross-team coordination for schema changes
* Mock data strategies for early UI feedback without breaking clients
* Handling breaking changes and hot-fix deployments via server-side updates
* Enabling flexible layouts and A/B testing through server-controlled ordering
* Balancing server-driven vs. client-managed UI
* Iterative SDUI rollout versus "big-bang" migrations in large codebases
* Using type specs and Dialyxir for clear cross-team communication
* Integration testing at the GraphQL layer to catch UI regressions early
* Quality engineering's role in validating server-driven interfaces
* Production rollback strategies across web and native platforms
* Considerations for greenfield projects adopting SDUI from day one
* Zack and Ethan's upcoming Instrumenting Elixir Apps book
Links mentioned:
https://cars.com
https://github.com/absinthe-graphql/absinthe
Telemetry & Observability for Elixir Apps Ep: https://youtu.be/1V2xEPqqCso
https://www.phoenixframework.org/blog/phoenix-liveview-1.0-released
https://hexdocs.pm/phoenix_live_view/assigns-eex.html
https://graphql.org/
https://tailwindcss.com/
https://github.com/jeremyjh/dialyxir
https://github.com/rrrene/credo
GraphQL Schema: https://graphql.org/learn/schema/
SwiftUI: https://developer.apple.com/documentation/swiftui/
Kotlin: https://kotlinlang.org/
https://medium.com/airbnb-engineering/a-deep-dive-into-airbnbs-server-driven-ui-system-842244c5f5
Zack's Twitter: https://x.com/kayserzl/
Zack's LinkedIn: https://www.linkedin.com/in/zack-kayser-93b96b88
Special Guest: Zack Kayser.
In episode 174 of Kubicast, we welcomed Lucas Quadros, a software developer on Grafana's AI and Machine Learning team, for a deep dive into the universe of observability. In a technical and good-humored conversation, we explore how logs and natural language processing (NLP) intersect to turn raw data into actionable insights, and how anomaly-detection algorithms for time series have evolved.
We move on to generative AI applied to monitoring: from building dynamic dashboards to intelligent configuration of alerts and SLOs. We also talk about the architecture of observability agents capable of navigating enormous amounts of metrics, traces, and logs, helping accelerate incident investigations.
To wrap up, we debate security aspects and the exchange of knowledge through MCP protocols that connect LLMs to our repositories, dashboards, and runbooks. We discuss use cases, data-privacy challenges, and perspectives on the future of automation in observability.
Important links:
- Luccas Quadros - no social media!!!
- AIOps at KCD RJ - https://youtu.be/WTWmOybEOK4?si=QujwWRx8QxpOY43g
- João Brito - https://www.linkedin.com/in/juniorjbn
- Watch the film TEArapia - https://youtu.be/M4QFmW_HZh0?si=HIXBDWZJ8yPbpflM
Join our early access program and get a more secure environment in moments! https://getup.io/zerocve
Kubicast is produced by Getup, a company specializing in Kubernetes and open source projects for Kubernetes. Podcast episodes are available on all major digital audio platforms and on YouTube.com/@getupcloud.
Kui Jia, Sumo Logic's Vice President of Engineering and Head of AI, shares how their AWS-powered AI agents transform chaotic security investigations into streamlined workflows.
Topics Include:
Kui Jia leads AI Engineering at Sumo Logic
SREs and SOC analysts work under chaotic, high-pressure conditions
Teams constantly switch between different vendor tools and platforms
Investigation requires quick hypothesis formation and complex query writing
Sumo Logic processes petabytes of data daily across enterprises
Company serves 2,000+ enterprise customers for 15 years
Platform focuses on observability and cybersecurity use cases
Investigation journey: discover, diagnose, decide, act, learn phases
Data flows from ingestion through analytics to human insights
Traditional workflow relies heavily on tribal domain knowledge
Senior engineers create queries that juniors struggle to understand
War room situations demand immediate answers, not learning curves
Context switching between tools wastes time and creates friction
Multiple AI generations deployed: ML anomaly detection to GenAI
Agentic AI enables reasoning, planning, tools, and evaluation capabilities
Mo Copilot launched at AWS re:Invent as AI agent suite
Natural language converts high-level questions into Sumo queries
System provides intelligent autocomplete and multi-turn conversations
Insight agents summarize logs and security signals automatically
Knowledge integration combines foundation models with proprietary metadata
AI generates playbooks and remediation scripts for automated actions
Three-tier architecture: Infrastructure, AI Tooling, and Application layers
Built on AWS Bedrock with Nova models for performance
Focus on reusable infrastructure and AI tooling components
Data differentiation more important than AI model selection
Golden datasets and contextualized metadata are development challenges
Guardrails and evaluation frameworks critical for enterprise deployment
AI observability enables debugging and performance monitoring
Enterprise agents achievable within one year development timeline
Future vision: multiple AI agents collaborating with human investigators
Participants:
Kui Jia – Vice President of AI Engineering, Head of AI, Sumo Logic
Further Links:
Website: https://www.sumologic.com/
Sumo Logic in the AWS Marketplace
See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
Brian Demers, Developer Advocate at Gradle, speaks with host Giovanni Asproni about the importance of having observability in the toolchain. Information about build times, compiler warnings, test executions, and any other system used to build the production code can help to reduce defects, increase productivity, and improve the developer experience. During the conversation they touch upon what is possible with today's tools; the impact on productivity and developer experience; and the risks and opportunities introduced by the use of artificial intelligence. Brought to you by IEEE Computer Society and IEEE Software magazine.
LLMs are reshaping the future of data and AI, and ignoring them might just be career malpractice. Yoni Michael and Kostas Pardalis unpack what's breaking, what's emerging, and why inference is becoming the new heartbeat of the data pipeline.
// Bio
Kostas Pardalis
Kostas is an engineer-turned-entrepreneur with a passion for building products and companies in the data space. He's currently the co-founder of Typedef. Before that, he worked closely with the creators of Trino at Starburst Data on some exciting projects. Earlier in his career, he was part of the leadership team at Rudderstack, helping the company grow from zero to a successful Series B in under two years. He also founded Blendo in 2014, one of the first cloud-based ELT solutions.
Yoni Michael
Yoni is the co-founder of Typedef, a serverless data platform purpose-built to help teams process unstructured text and run LLM inference pipelines at scale. With a deep background in data infrastructure, Yoni has spent over a decade building systems at the intersection of data and AI, including leading infrastructure at Tecton and engineering teams at Salesforce.
Yoni is passionate about rethinking how teams extract insight from massive troves of text, transcripts, and documents, and believes the future of analytics depends on bridging traditional data pipelines with modern AI workflows. At Typedef, he's working to make that future accessible to every team, without the complexity of managing infrastructure.
// Related Links
Website: https://www.typedef.ai
https://techontherocks.show
https://www.cpard.xyz
~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
MLOps Swag/Merch: https://shop.mlops.community/
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Kostas on LinkedIn: /kostaspardalis/
Connect with Yoni on LinkedIn: /yonimichael/
Timestamps:
[00:00] Breaking Tools, Evolving Data Workloads
[06:35] Building Truly Great Data Teams
[10:49] Making Data Platforms Actually Useful
[18:54] Scaling AI with Native Integration
[24:04] Empowering Employees to Build Agents
[28:17] Rise of the AI Sherpa
[36:09] Real AI Infrastructure Pain Points
[38:05] Fixing Gaps Between Data, AI
[46:04] Smarter Decisions Through Better Data
[50:18] LLMs as Human-Machine Interfaces
[53:40] Why Summarization Still Falls Short
[01:01:15] Smarter Chunking, Fixing Text Issues
[01:09:08] Evaluating AI with Canary Pipelines
[01:11:46] Finding Use Cases That Matter
[01:17:38] Cutting Costs, Keeping AI Quality
[01:25:15] Aligning MLOps to Business Outcomes
[01:29:44] Communities Thrive on Cross-Pollination
[01:34:56] Evaluation Tools Quietly Consolidating
What happens when you try to monitor something fundamentally unpredictable? In this featured guest episode, Wayne Segar from Dynatrace joins Corey Quinn to tackle the messy reality of observing AI workloads in enterprise environments. They explore why traditional monitoring breaks down with non-deterministic AI systems, how AI Centers of Excellence are helping overcome compliance roadblocks, and why "human in the loop" beats full automation in most real-world scenarios.
From Cursor's AI-driven customer service fail to why enterprises are consolidating from 15+ observability vendors, this conversation dives into the gap between AI hype and operational reality, and why the companies not shouting the loudest about AI might be the ones actually using it best.
Show Highlights:
(00:00) – Cold Open
(00:48) – Introductions and what Dynatrace actually does
(03:28) – Who Dynatrace serves
(04:55) – Why AI isn't prominently featured on Dynatrace's homepage
(05:41) – How Dynatrace built AI into its platform 10 years ago
(07:32) – Observability for GenAI workloads and their complexity
(08:00) – Why AI workloads are "non-deterministic" and what that means for monitoring
(12:00) – When AI goes wrong
(13:35) – "Human in the loop": Why the smartest companies keep people in control
(16:00) – How AI Centers of Excellence are solving the compliance bottleneck
(18:00) – Are enterprises too paranoid about their data?
(21:00) – Why startups can innovate faster than enterprises
(26:00) – The "multi-function printer problem" plaguing observability platforms
(29:00) – Why you rarely hear customers complain about Dynatrace
(31:28) – Free trials and playground environments
About Wayne Segar:
Wayne Segar is Director of Global Field CTOs at Dynatrace and part of the Global Center of Excellence, where he focuses on cutting-edge cloud technologies and enabling the adoption of Dynatrace at large enterprise customers. Prior to joining Dynatrace, Wayne was a Dynatrace customer, where he was responsible for performance and customer experience at a large financial institution.
Links:
Dynatrace website: https://dynatrace.com
Dynatrace free trial: https://dynatrace.com/trial
Dynatrace AI observability: https://dynatrace.com/platform/artificial-intelligence/
Wayne Segar on LinkedIn: https://www.linkedin.com/in/wayne-segar/
Sponsor:
Dynatrace: http://www.dynatrace.com
In this episode, Patrick McKenzie (@patio11) is joined by Jennifer Li, a general partner at a16z investing in enterprise, infrastructure, and AI. Jennifer breaks down how AI workloads are creating new demands on everything from inference pipelines to observability systems, explaining why we're seeing a bifurcation between language models and diffusion models at the infrastructure level. They explore emerging categories like reinforcement learning environments that help train agents, the evolution of web scraping for agentic workflows, and why Jennifer believes the API economy is about to experience another boom as agents become the primary consumers of software interfaces.
Full transcript: www.complexsystemspodcast.com/the-ai-infrastructure-stack-with-jennifer-li-a16z/
Sponsor: Vanta
Vanta automates security compliance and builds trust, helping companies streamline ISO, SOC 2, and AI framework certifications. Learn more at https://vanta.com/complex
Links:
Jennifer Li's writing at a16z: https://a16z.com/author/jennifer-li/
Timestamps:
(00:00) Intro
(00:55) The AI shift and infrastructure
(02:24) Diving into middleware and AI models
(04:23) Challenges in AI infrastructure
(07:07) Real-world applications and optimizations
(15:15) Sponsor: Vanta
(16:38) Real-world applications and optimizations (cont'd)
(19:05) Reinforcement learning and synthetic environments
(23:05) The future of SaaS and AI integration
(26:02) Observability and self-healing systems
(32:49) Web scraping and automation
(37:29) API economy and agent interactions
(44:47) Wrap
In this episode, Vercel CEO Guillermo Rauch goes deep on how V0, their text-to-app platform, has already generated over 100 million applications and doubled Vercel's user base in under a year.
Guillermo reveals how a tiny SWAT team inside Vercel built V0 from scratch, why "vibe coding" is making software creation accessible to everyone (not just engineers), and how the AI Cloud is automating DevOps, making cloud infrastructure self-healing, and letting companies expose their data to AI agents in just five lines of code.
You'll hear why "every company will have to rethink itself as a token factory," how Vercel's Next.js went from a conference joke to powering Walmart, Nike, and Midjourney, and why the next billion app creators might not write a single line of code. Guillermo breaks down the difference between vibe coding and agentic engineering, shares wild stories of users building apps from napkin sketches, and explains how Vercel is infusing "taste" and best practices directly into their AI models.
We also dig into the business side: how Vercel's AI-powered products are driving explosive growth, why retention and margins are strong, and how the company is adapting to a new wave of non-technical users. Plus: the future of MCP servers, the security challenges of agent-to-agent communication, and why prompting and AI literacy are now must-have skills.
Vercel
Website - https://vercel.com
X/Twitter - https://x.com/vercel
Guillermo Rauch
LinkedIn - https://www.linkedin.com/in/rauchg
X/Twitter - https://x.com/rauchg
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(02:08) What Is V0 and Why Did It Take Off So Fast?
(04:10) How Did a Tiny Team Build V0 So Quickly?
(07:51) V0 vs Other AI Coding Tools
(10:35) What is Vibe Coding?
(17:05) Is V0 Just Frontend? Moving Toward Full Stack and Integrations
(19:40) What Skills Make a Great Vibe Coder?
(23:35) Vibe Coding as the GUI for AI: The Future of Interfaces
(29:46) Developer Love = Agent Love
(33:41) Having Taste as a Developer
(39:10) MCP Servers: The New Protocol for AI-to-AI Communication
(43:11) Security, Observability, and the Risks of the Agentic Web
(45:25) Are Enterprises Ready for the Agentic Future?
(49:42) Closing the Feedback Loop: Customer Service and Product Evolution
(56:06) The Vercel AI Cloud: From Pixels to Tokens
(01:10:14) How Vercel Adapts to the ICP Change
(01:13:47) Retention, Margins, and the Business of AI Products
(01:16:51) The Secret Behind Vercel's Last Year of Growth
(01:24:15) The Importance of Online Presence
(01:30:49) Everything, Everywhere, All at Once: Being CEO 101
(01:34:59) Guillermo's Advice to His Younger Self
How do you keep complex digital experiences running smoothly when every layer, from networks to cloud infrastructure to applications, can break in ways that frustrate customers and burn out IT teams? This question is at the heart of my conversation recorded live at Cisco Live in San Diego with Patrick Lin, Senior Vice President and General Manager for Observability at Splunk, now part of Cisco. In this episode, Patrick explains how observability has evolved far beyond simple monitoring and is becoming the nerve centre for digital resilience in a world where reactive alerts no longer cut it. We unpack how Splunk and Cisco ThousandEyes are now deeply integrated, giving teams a single source of truth that connects application behaviour, infrastructure health, and network performance, even across systems they do not directly control. Patrick also shares what these two-way integrations mean in practice: faster incident resolution, fewer blame games, and far less time wasted chasing false alerts. We explore how AI is enhancing this vision by cutting through the noise to detect real anomalies, correlate related events, and suggest root causes at a speed no human team could match. If your business depends on staying online and your teams are drowning in disconnected data, this conversation offers a glimpse into the next phase of unified observability and assurance. It might even help quiet the flood of alerts that keep IT professionals awake at night. How is your organisation tackling alert fatigue and rising complexity? Listen in and tell me what strategies you have found that actually work.
Coralogix, an Israeli startup offering a full-stack observability and security platform, has raised $115 million at a pre-money valuation of over $1 billion, nearly double its valuation from its last round in 2022, three years ago.
Hi, Spring fans! In this episode, I talk to Micrometer.io lead Tommy Ludwig about the latest and greatest in observability for the Spring developer.
Mark Ericksen, creator of the Elixir LangChain framework, joins the Elixir Wizards to talk about LLM integration in Elixir apps. He explains how LangChain abstracts away the quirks of different AI providers (OpenAI, Anthropic's Claude, Google's Gemini) so you can work with any LLM through one consistent API. We dig into core features like conversation chaining, tool execution, automatic retries, and production-grade fallback strategies. Mark shares his experiences maintaining LangChain in a fast-moving AI world: how it shields developers from API drift, manages token budgets, and handles rate limits and outages. He also reveals testing tactics for non-deterministic AI outputs, configuration tips for custom authentication, and the highlights of the new v0.4 release, including "content parts" support for thinking-style models.

Key topics discussed in this episode:
• Abstracting LLM APIs behind a unified Elixir interface
• Building and managing conversation chains across multiple models
• Exposing application functionality to LLMs through tool integrations
• Automatic retries and fallback chains for production resilience
• Supporting a variety of LLM providers
• Tracking and optimizing token usage for cost control
• Configuring API keys, authentication, and provider-specific settings
• Handling rate limits and service outages with graceful degradation
• Processing multimodal inputs (text, images) in LangChain workflows
• Extracting structured data from unstructured LLM responses
• Leveraging "content parts" in v0.4 for advanced thinking-model support
• Debugging LLM interactions using verbose logging and telemetry
• Kickstarting experiments in LiveBook notebooks and demos
• Comparing Elixir LangChain to the original Python implementation
• Crafting human-in-the-loop workflows for interactive AI features
• Integrating LangChain with the Ash framework for chat-driven interfaces
• Contributing to open-source LLM adapters and staying ahead of API changes
• Building fallback chains (e.g., OpenAI → Azure) for seamless continuity (see the sketch after these notes)
• Embedding business logic decisions directly into AI-powered tools
• Summarization techniques for token efficiency in ongoing conversations
• Batch processing tactics to leverage lower-cost API rate tiers
• Real-world lessons on maintaining uptime amid LLM service disruptions

Links mentioned:
https://rubyonrails.org/
https://fly.io/
https://zionnationalpark.com/
https://podcast.thinkingelixir.com/
https://github.com/brainlid/langchain
https://openai.com/
https://claude.ai/
https://gemini.google.com/
https://www.anthropic.com/
Vertex AI Studio: https://cloud.google.com/generative-ai-studio
https://www.perplexity.ai/
https://azure.microsoft.com/
https://hexdocs.pm/ecto/Ecto.html
https://oban.pro/
Chris McCord's ElixirConf EU 2025 Talk: https://www.youtube.com/watch?v=ojL_VHc4gLk
Getting started: https://hexdocs.pm/langchain/gettingstarted.html
https://ash-hq.org/
https://hex.pm/packages/langchain
https://hexdocs.pm/igniter/readme.html
https://www.youtube.com/watch?v=WM9iQlQSFg
@brainlid on Twitter and BlueSky

Special Guest: Mark Ericksen.
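The fallback-chain idea discussed above is provider-agnostic. Here is a minimal Python sketch of the pattern (not the Elixir LangChain API; the provider functions and error type are hypothetical stand-ins for real client calls):

```python
import time

class ProviderError(Exception):
    """Raised when a provider call fails (rate limit, outage, etc.)."""

def call_openai(prompt: str) -> str:
    # Hypothetical stand-in for a real OpenAI client call.
    raise ProviderError("simulated outage")

def call_azure(prompt: str) -> str:
    # Hypothetical stand-in for a fallback Azure deployment.
    return f"azure answered: {prompt[:20]}..."

def with_fallbacks(prompt, providers, retries=2, backoff=0.2):
    """Try each provider in order; retry transient failures before moving on."""
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except ProviderError:
                time.sleep(backoff * (attempt + 1))  # simple linear backoff
    raise ProviderError("all providers exhausted")

# OpenAI fails both attempts here, so the chain degrades to Azure.
print(with_fallbacks("Summarize this conversation.", [call_openai, call_azure]))
```

The point of the pattern is that retries absorb transient errors while the ordered provider list absorbs full outages, which is the "seamless continuity" behavior the episode describes.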
In episode 83 of o11ycast, the Honeycomb team chats with Dan Ravenstone, the o11yneer. Dan unpacks the crucial, often underappreciated, role of the observability engineer. He discusses how this position champions the user, bridging the gap between technical performance and real-world customer experience. Learn about the challenges of mobile observability, the importance of clear terminology, and how building alliances across an organization drives successful observability practices.
Jack Herrington, podcaster, software engineer, writer, and YouTuber, joins the pod to uncover the truth behind server functions and why they don't actually exist in the web platform. We dive into the magic behind frameworks like Next.js, TanStack Start, and Remix, breaking down how server functions work, what they simplify, what they hide, and what developers need to know to build smarter, faster, and more secure web apps. Links YouTube: https://www.youtube.com/@jherr Twitter: https://x.com/jherr Github: https://github.com/jherr ProNextJS: https://www.pronextjs.dev Discord: https://discord.com/invite/KRVwpJUG6p LinkedIn: https://www.linkedin.com/in/jherr Website: https://jackherrington.com Resources Server Functions Don't Exist (It Matters) (https://www.youtube.com/watch?v=FPJvlhee04E) We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Em, at emily.kochanek@logrocket.com (mailto:emily.kochanek@logrocket.com), or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod). Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers! What does LogRocket do? LogRocket provides AI-first session replay and analytics that surface the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today. (https://logrocket.com/signup/?pdr) Special Guest: Jack Herrington.
Raza Habib, the CEO of LLM eval platform Humanloop, talks to us about how to make your AI products more accurate and reliable by shortening the feedback loop of your evals: quickly iterating on prompts and testing what works (a minimal sketch of such an eval loop follows these notes), along with some of his favorite quotes from Anthropic's Dario Amodei.

// Bio
Raza is the CEO and Co-founder at Humanloop. He has a PhD in Machine Learning from UCL, was the founding engineer of Monolith AI, and has built speech systems at Google. For the last 4 years, he has led Humanloop and supported leading technology companies such as Duolingo, Vanta, and Gusto to build products with large language models. Raza was featured in the Forbes 30 Under 30 technology list in 2022, and Sifted recently named him one of the most influential Gen AI founders in Europe.

// Related Links
Website: https://humanloop.com

~~~~~~~~ ✌️ Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TY
Explore MLOps Swag/Merch: https://shop.mlops.community/
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Raza on LinkedIn: /humanloop-raza

Timestamps:
[00:00] Cracking Open System Failures and How We Fix Them
[05:44] LLMs in the Wild — First Steps and Growing Pains
[08:28] Building the Backbone of Tracing and Observability
[13:02] Tuning the Dials for Peak Model Performance
[13:51] From Growing Pains to Glowing Gains in AI Systems
[17:26] Where Prompts Meet Psychology and Code
[22:40] Why Data Experts Deserve a Seat at the Table
[24:59] Humanloop and the Art of Configuration Taming
[28:23] What Actually Matters in Customer-Facing AI
[33:43] Starting Fresh with Private Models That Deliver
[34:58] How LLM Agents Are Changing the Way We Talk
[39:23] The Secret Lives of Prompts Inside Frameworks
[42:58] Streaming Showdowns — Creativity vs. Convenience
[46:26] Meet Our Auto-Tuning AI Prototype
[49:25] Building the Blueprint for Smarter AI
[51:24] Feedback Isn't Optional — It's Everything
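Shortening the eval feedback loop boils down to scoring every prompt variant against a fixed test set on each iteration. A hedged Python sketch of that loop follows; the eval set, the `model` stub, and the exact-match scoring rule are all illustrative assumptions, not Humanloop's API:

```python
def model(prompt_template: str, question: str) -> str:
    # Stand-in for a real LLM call; returns a canned answer for illustration.
    return "Paris" if "capital" in question else "unknown"

# Tiny fixed eval set: (input, expected output) pairs.
EVAL_SET = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]

def score(prompt_template: str) -> float:
    """Fraction of eval cases the prompt template answers exactly right."""
    hits = sum(
        model(prompt_template, question).strip() == expected
        for question, expected in EVAL_SET
    )
    return hits / len(EVAL_SET)

# Iterate on candidate prompts and keep the best-scoring one.
candidates = [
    "Answer concisely: {question}",
    "You are a geography expert. {question}",
]
best = max(candidates, key=score)
print(best, score(best))
```

Real eval harnesses replace exact match with model-graded or human feedback, but the structure (fixed dataset, scored candidates, fast iteration) is the feedback loop being discussed.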
This episode was sponsored by Elastic! Elastic is the company behind Elasticsearch; they help teams find, analyze, and act on their data in real time through their Search, Observability, and Security solutions. Thanks, Elastic! This episode was recorded at Elastic's offices in San Francisco during a meetup.

Find info about the show, past episodes including transcripts, our swag store, Patreon link, and more at https://cupogo.dev/.
Highlights from this week's conversation include:
Pete's Background and Journey in Data (1:36)
Evolution of Data Practices (3:02)
Integration Challenges with Acquired Companies (5:13)
Trust and Safety as a Service (8:12)
Transition to Dagster (11:26)
Value Creation in Networking (14:42)
Observability in Data Pipelines (18:44)
The Era of Big Complexity (21:38)
Abstraction as a Tool for Complexity (24:41)
Composability and Workflow Engines (28:08)
The Need for Guardrails (33:13)
AI in Development Tools (36:24)
Internal Components Marketplace (40:14)
Reimagining Data Integration (43:03)
Importance of Abstraction in Data Tools (46:17)
Parting Advice for Listeners and Closing Thoughts (48:01)

The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences. Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
We're back for Part 2 of our Automation deep-dive—and the hits just keep coming! Host Al Martin reunites with IBM automation aces Sarah McAndrew (WW Automation Technical Sales) and Vikram Murali (App Mod & IT Automation Development) to push past the hype and map out the road ahead.
Scientific research is the foundation of many innovative solutions in any field. Did you know that Dynatrace runs its own Research Lab on the campus of the Johannes Kepler University (JKU) in Linz, Austria, just 2 kilometers away from our global engineering headquarters? What started in 2020 has grown to 20 full-time researchers and many more students who do research on topics such as GenAI, Agentic AI, Log Analytics, Processing of Large Data Sets, Sampling Strategies, Cloud Native Security, and Memory and Storage Optimizations.

Tune in and hear from Otmar and Martin how they are researching the N+2 generation of Observability and AI, how they are contributing to open source projects such as OpenTelemetry, and what their predictions are for when AI will finally take control of us humans!

To learn more about their work check out these links:
Martin's LinkedIn: https://www.linkedin.com/in/mflechl/
Otmar's LinkedIn: https://www.linkedin.com/in/otmar-ertl/
Dynatrace Research Lab: https://careers.dynatrace.com/locations/linz/#__researchLab
Highlights from this week's conversation include:
Background of ClickHouse (1:14)
PostgreSQL Data Replication Tool (3:19)
Emerging Technologies Observations (7:25)
Observability and Market Dynamics (11:26)
Product Development Challenges (12:39)
Challenges with PostgreSQL Performance (15:30)
Philosophy of Open Source (18:01)
Open Source Advantages (22:56)
Simplified Stack Vision (24:48)
End-to-End Use Cases (28:13)
Migration Strategies (30:21)
Final Thoughts and Takeaways (33:29)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Agentic AI is equally as daunting as it is dynamic. So… how do you not screw it up? After all, the more robust and complex agentic AI becomes, the more room there is for error. Luckily, we've got Dr. Maryam Ashoori to guide our agentic ways. Maryam is the Senior Director of Product Management of watsonx at IBM. She joined us at IBM Think 2025 to break down agentic AI done right.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Have a question? Join the convo here.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
Agentic AI Benefits for Enterprises
watsonx's New Features & Announcements
AI-Powered Enterprise Solutions at IBM
Responsible Implementation of Agentic AI
LLMs in Enterprise Cost Optimization
Deployment and Scalability Enhancements
AI's Impact on Developer Productivity
Problem-Solving with Agentic AI

Timestamps:
00:00 AI Agents: A Business Imperative
06:14 "Optimizing Enterprise Agent Strategy"
09:15 Enterprise Leaders' AI Mindset Shift
09:58 Focus on Problem-Solving with Technology
13:34 "Boost Business with LLMs"
16:48 "Understanding and Managing AI Risks"

Keywords:
Agentic AI, AI agents, agent lifecycle, LLMs taking actions, watsonx.ai, product management, IBM Think conference, business leaders, enterprise productivity, watsonx platform, custom AI solutions, Environmental Intelligence Suite, Granite Code models, AI-powered code assistant, customer challenges, responsible AI implementation, transparency and traceability, observability, optimization, larger compute, cost performance optimization, chain of thought reasoning, inference time scaling, deployment service, scalability of enterprise, access control, security requirements, non-technical users, AI-assisted coding, developer time-saving, function calling, tool calling, enterprise data integration, solving enterprise problems, responsible implementation, human in the loop, automation, IBM savings, risk assessment, empowering workforce.
Network monitoring, Internet monitoring, and observability are all key components of NetOps. We speak with sponsor Catchpoint to understand how Catchpoint can help network operators proactively identify and resolve issues before they impact customers. We discuss past and current network monitoring strategies and the challenges that operators face with both on-prem and cloud monitoring, along...
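The proactive monitoring discussed here rests on synthetic probing: hit an endpoint on a schedule, record latency, and alert before customers notice. A minimal stdlib-only Python sketch of the idea (the URL, threshold, and alerting stub are illustrative assumptions, not Catchpoint's product):

```python
import time
import urllib.request

URL = "https://example.com/health"   # illustrative endpoint to probe
LATENCY_SLO_SECONDS = 0.5            # illustrative latency threshold

def probe(url: str) -> tuple[int, float]:
    """Fetch the URL once, returning (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=5) as resp:
        status = resp.status
    return status, time.monotonic() - start

def alert(message: str) -> None:
    # Stand-in for a real paging or webhook integration.
    print(f"ALERT: {message}")

if __name__ == "__main__":
    while True:
        try:
            status, elapsed = probe(URL)
            if status >= 500:
                alert(f"{URL} returned {status}")
            elif elapsed > LATENCY_SLO_SECONDS:
                alert(f"{URL} slow: {elapsed:.3f}s")
        except OSError as exc:   # DNS failure, timeout, connection refused
            alert(f"{URL} unreachable: {exc}")
        time.sleep(60)           # probe once a minute
```

Commercial platforms add what a loop like this cannot: probes from many global vantage points, so you can tell a broken app from a broken network path.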
Modern cloud-native systems are highly dynamic and distributed, which makes it difficult to monitor cloud infrastructure using traditional tools designed for static environments. This has motivated the development and widespread adoption of dedicated observability platforms. Prometheus is an open-source observability tool designed for cloud-native environments, and its strong integration with Kubernetes and pull-based data collection model have helped make it a de facto standard for metrics monitoring. In this episode, Eric Schabell joins the show to discuss Prometheus and open-source observability.
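The pull model mentioned above means a service only has to expose current metric values over HTTP; the Prometheus server scrapes them on its own schedule. A minimal sketch with the official `prometheus_client` Python library (the metric names, simulated workload, and port are arbitrary choices for illustration):

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Metrics are registered once; application code updates them as work happens.
REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

if __name__ == "__main__":
    # Expose /metrics on :8000; Prometheus scrapes it per its scrape_interval.
    start_http_server(8000)
    while True:
        with LATENCY.time():                       # records duration of the block
            time.sleep(random.uniform(0.01, 0.1))  # simulated request handling
        REQUESTS.inc()                             # count each handled request
```

Because the server initiates collection, short-lived pods that Kubernetes spins up and tears down are discovered and scraped automatically via service discovery, rather than each instance having to know where to push its data.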