Christophe Rochefolle has had an impressive career as a CTO, is a leading voice on chaos engineering, and co-authored one of the reference books on DevOps. In this episode, we discuss the many facets of an appropriate response to crises: chaos engineering, robustness, and mental health. Quick access: 03:00: chaos engineering; 09:50: the production incident that becomes a trauma; 12:15: the “Days of Chaos”; 16:40: incident management rather than problem management; 21:20: the concept of robustness (as opposed to the cult of performance); 29:30: the “truck factor”, from I-shaped to comb-shaped skills, and mental load; 35:20: platform engineering; 42:40: “Mettre en oeuvre DevOps” turns 10: what has happened since 2015?; 52:00: polycrises; 55:20: mental health and robustness, mental health first aid; 1:03:03: managing with a big heart; 1:07:40: recommended management practices. Recommendations: “Mettre en oeuvre DevOps”, Alain Sacquet and Christophe Rochefolle; “Ne coupez jamais la poire en deux”, Chris Voss; “Klara et le soleil”, Kazuo Ishiguro; “Friends”.
In this episode we dive deep into the world of artificial intelligence (AI) and look at it from a systemic perspective. Inspired by the classic cartoon Spy vs. Spy, the episode asks how AI systems interact with one another, in cooperation as well as in competition, and what challenges this poses for us humans. I examine the evolutionary nature of technical progress, the risks of complexity and loss of control, and the possible consequences of a future in which AI turns from a passive tool into an active agent. Join me on a journey through emergent phenomena and fulgurations, systemic dependencies, and the question of how we can keep control over ever more complex systems. Numerous questions arise: How might the interaction between AI systems develop in the future: more cooperatively or more competitively? What role is left for humans when AI systems make decisions ever faster and become autonomous actors? What experiences have you had with technical systems that slipped out of control through complexity or automation? What does it mean for our society if automation leads us to hand know-how and control to external actors? How can we shape the transition from AI as a toy to a systemically necessary element without becoming dependent on it? Do you believe AI could ever develop its own motivation or intentionality, and if so, how should we respond? Which strategies could help us keep control over complex, non-deterministic systems such as AI agents? How do you see the geopolitical challenges of AI development, particularly with regard to energy and innovation? What positive scenarios can you imagine for a future with AI and autonomous agents? How are you preparing, personally or in your environment, for the coming developments in AI? I look forward to hearing your thoughts and views! Write to me and let us think about the future together. Until next time, when we think about the future together again!
References. Other episodes: Episode 109: Was ist Komplexität? Ein Gespräch mit Dr. Marco Wehr; Episode 107: How to Organise Complex Societies? A Conversation with Johan Norberg; Episode 104: Aus Quantität wird Qualität; Episode 103: Schwarze Schwäne in Extremistan; die Welt des Nassim Taleb, ein Gespräch mit Ralph Zlabinger; Episode 99: Entkopplung, Kopplung, Rückkopplung; Episode 94: Systemisches Denken und gesellschaftliche Verwundbarkeit, ein Gespräch mit Herbert Saurugg; Episode 90: Unintended Consequences (Unerwartete Folgen); Episode 69: Complexity in Software; Episode 40: Software Nachhaltigkeit, ein Gespräch mit Philipp Reisinger; Episode 31: Software in der modernen Gesellschaft – Gespräch mit Tom Konrad. Technical references: Spy vs. Spy, The Complete Casebook; Rupert Riedl, Strukturen der Komplexität: Eine Morphologie des Erkennens und Erklärens, Springer (2000); Doug Meil, The U.K. Post Office Scandal: Software Malpractice At Scale – Communications of the ACM (2024); Casey Rosenthal, Chaos Engineering, O'Reilly (2017)
Chaos engineering—is it really chaos, or something more structured? Andrey, Paulina, and Mattias talk about what chaos engineering means, how it started, and why you might already be using it unintentionally. Connect with us on LinkedIn or Twitter (see info at https://devsecops.fm/about/). We are happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.
November 6, 2024: George Pappas, CEO at Intraprise Health, joins Drex for the news. The conversation delves into the challenges of maintaining robust cybersecurity practices amidst resource constraints and evolving regulatory burdens. Can a split CISO role better navigate the increasing demands of compliance and governance? The show also highlights Main Line Health's strategic use of chaos engineering to test vulnerabilities and improve resilience. How does such a proactive approach redefine preparedness in an era where digital and analog systems must coexist seamlessly? Key Points: 02:00 Unmanaged Cloud Credentials and Monitoring; 09:01 CISO Role and Regulatory Burden; 14:09 Cybersecurity Risk Management; 18:26 Chaos Engineering at Main Line Health. News articles: Nearly Half of Organizations Face Data Breach Risks from Long-Lived Credentials; CISOs Push for Role Split Amid Rising Regulatory Pressures; Main Line Health deploys chaos engineering to bolster healthcare resilience.
On this Replay, we're revisiting our conversation with Jason Yee, Staff Technical Advocate at Datadog. At the time of this recording, he was the Director of Advocacy at Gremlin, an enterprise-grade chaos engineering platform. Join Corey and Jason as they talk about what Gremlin is and what a director of advocacy does, making chaos engineering more accessible for the masses, how it's hard to calculate ROI for developer advocates, how developer advocacy and DevRel change from one company to the next, why developer advocates need to focus on meaningful connections, why you should start chaos engineering as a mental game, qualities to look for in good developer advocates, the Break Things On Purpose podcast, and more. Show Highlights: (0:00) Intro; (0:31) Backblaze sponsor read; (0:58) The role of a Director of Advocacy; (3:34) DevRel and twisting job definitions; (5:50) How DevRel confusion manifests into marketing; (11:37) Being able to measure and define a team's success; (13:42) Building respect and a community in tech; (15:22) Effectively courting a community; (18:02) The challenges of Jason's job; (21:06) Planning for failure modes; (22:30) Determining your value in tech; (25:41) The growth of Gremlin; (30:16) Where you can find more from Jason. About Jason Yee: Jason Yee is Staff Technical Advocate at Datadog, where he works to inspire developers and ops engineers with the power of metrics and monitoring. Previously, he was the community manager for DevOps & Performance at O'Reilly Media and a software engineer at MongoDB. Links: Break Things On Purpose podcast: https://www.gremlin.com/podcast/ Twitter: https://twitter.com/gitbisect Original episode: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/chaos-engineering-for-gremlins-with-jason-yee/ Sponsor: Backblaze: https://www.backblaze.com/
We have Miko Pawlikowski as a guest and we will dive deep into the world of Chaos Engineering and SRE (Site Reliability Engineering). Miko also co-founded multiple startups, including SREday and Conf42 with his brother, Mark Pawlikowski. Other topics we talk about are UniKernels, GoldPinger, Chaos Monkey, SRE Days Amsterdam and a lot more. https://nanovms.com/ https://hockeystick.show https://sreday.com/
In today's episode of Category Visionaries, we speak with Casey Rosenthal, CEO of Verica. Topics Discussed: Chaos engineering: what does it mean, why is it so important, and what impact can it have on an enterprise tech setup? Why more and more companies are seeking to add chaos engineering to their approach, and why it's already popular in fintech. Why diversity is key for modern tech companies, driving better performance and ultimately more success. Continuous verification and the DevOps revolution: where does this new category stand in the story? How harmonizing schedules between different departments can be a real challenge for a company looking to expand. Favorite book: What Works: Gender Equality by Design.
In this episode, we spoke to Karthik Satchitanand. Karthik is a principal software engineer at Harness and co-founder and maintainer of LitmusChaos, a CNCF incubated project. We talked about chaos engineering, the Litmus project, and more. Do you have something cool to share? Some questions? Let us know: web: kubernetespodcast.com; mail: kubernetespodcast@google.com; twitter: @kubernetespod. News of the week: Kubernetes 1.31 release blog; Kubernetes 1.31 release episode of the Kubernetes Podcast from Google; KubeCon NA 2024 Schedule; Score accepted as a CNCF Sandbox Project. Links from the interview: LitmusChaos; principlesofchaos.org; Okteto; LitmusChaosCon; community.cncf.io. Links from the post-interview chat: Chaos Monkey; Chapter 5 of “Chaos Engineering” by Casey Rosenthal and Nora Jones, published by O'Reilly, covers DiRT; LitmusChaos ChaosHub; Klustered on YouTube; Rawkode Academy.
Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/. Benjamin Wilms is a developer and software architect at heart, with 20 years of experience. He fell in love with chaos engineering. Benjamin now spreads his enthusiasm and new knowledge as a speaker and author – especially in the field of chaos and resilience engineering. Retrieval Augmented Generation // MLOps podcast #237 with Benjamin Wilms, CEO & Co-Founder of Steadybit. Huge thank you to Amazon Web Services for sponsoring this episode. AWS - https://aws.amazon.com/ // Abstract How to build reliable systems under unpredictable conditions with Chaos Engineering. // Bio Benjamin has over 20 years of experience as a developer and software architect. He fell in love with chaos engineering 7 years ago and shares his knowledge as a speaker and author. In October 2019, he founded the startup Steadybit with two friends, focusing on developers and teams embracing chaos engineering. He relaxes by mountain biking when he's not knee-deep in complex and distributed code. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: https://steadybit.com/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Benjamin on LinkedIn: https://www.linkedin.com/in/benjamin-wilms/ Timestamps: [00:00] Benjamin's preferred coffee [00:28] Takeaways [02:10] Please like, share, leave a review, and subscribe to our MLOps channels! 
[02:53] Chaos Engineering tl;dr [06:13] Complex Systems for smaller Startups [07:21] Chaos Engineering benefits [10:39] Data Chaos Engineering trend [15:29] Chaos Engineering vs ML Resilience [17:57 - 17:58] AWS Trainium and AWS Inferentia Ad [19:00] Chaos engineering tests system vulnerabilities and solutions [23:24] Data distribution issues across different time zones [27:07] Expertise is essential in fixing systems [31:01] Chaos engineering integrated into machine learning systems [32:25] Pre-CI/CD steps and automating experiments for deployments [36:53] Chaos engineering emphasizes tool over value [38:58] Strong integration into observability tools for repeatable experiments [45:30] Invaluable insights on chaos engineering [46:42] Wrap up
Chaos Engineering is no longer a nice-to-have, as Ananth Movva explains in this episode of the SREpath podcast. In his experience, it reduced both the number and the severity of serious incidents and outages. He's been at the helm of reliability-focused decision-making at one of Canada's largest banks, BMO, since 2020. Having completed 12 years at the bank, Ananth has seen banking technology evolve from archaic to user-centric, where incidents are taken seriously. Ananth highlighted the use of chaos principles and tooling to identify future points of failure well ahead of time. He also talked about the difficulty of getting developers to integrate chaos practices into the SDLC. You will not want to miss this conversation! You can connect with Ananth via LinkedIn.
For some strange reason, “maintenance” has been in the news quite a bit lately. Is there ever a time when maintenance is enjoyable, or appreciated? SHOW: 814 SHOW TRANSCRIPT: The Cloudcast #814 SHOW VIDEO: https://youtube.com/@TheCloudcastNET CLOUD NEWS OF THE WEEK: http://bit.ly/cloudcast-cnotw CHECK OUT OUR NEW PODCAST: "CLOUDCAST BASICS" SHOW NOTES: AWS increased the price of longer-running EKS clusters by 6x; Broadcom changes VMware licensing from perpetual to subscription; Broadcom offers security patches to perpetual license customers; Increasing the Kubernetes support window to 1 year; Discovering the XZ backdoor (Oxide and Friends, podcast). IS MAINTENANCE EVER APPRECIATED OR ENJOYABLE? Spent the day surrounded by maintenance activities (oil, AC, power-wash); The costs of maintenance are both real costs and opportunity costs; Maintenance often goes unappreciated and unseen; Naming: Release Notes, Technical Debt, Chaos Engineering. TECHNICAL DEBT VS. MAINTENANCE: Should we encourage a lack of maintenance vs. innovation as a priority? Should we encourage active maintenance with lower hard costs? Is there a way to put respect on maintenance (e.g. OSS maintainers)? Do we undervalue maintenance (e.g. Backup/Recovery, Disaster Recovery, etc.)? What maintenance best practices do you use? What are the good and bad of them? FEEDBACK? Email: show at the cloudcast dot net; Twitter: @cloudcastpod; Instagram: @cloudcastpod; TikTok: @cloudcastpod
Jelle Niemantsverdriet joins us in this episode to discuss how the mindset around security is evolving, both from organisations and from professionals. My favourite takeaway is that security is on the same path as testing and becoming part of quality in software development. Connect with Jelle Niemantsverdriet: https://www.linkedin.com/in/jelleniemantsverdriet https://twitter.com/jelle_n References: Digital Defense Report - https://www.microsoft.com/nl-nl/security/security-insider/microsoft-digital-defense-report-2023 Data Breach Investigations Report (DBIR) - https://www.verizon.com/business/resources/reports/dbir/?CMP=OOH_SMB_OTH_22222_MC_20200501_NA_NM20200079_00001 Sidney Dekker - https://sidneydekker.com Kelly Shortridge - https://kellyshortridge.com/blog/ Chaos Engineering - https://www.securitychaoseng.com OUTLINE 00:00:00 - Intro 00:00:25 - Security is a matter of software quality 00:02:19 - Security way of working 00:04:37 - Professional pride 00:06:53 - Layers of defense, or excuse? 00:09:05 - The industrial revolution in IT 00:10:48 - Security as speciality 00:13:18 - Collaborating with the security department 00:14:29 - Building bridges 00:16:22 - Willingness to listen 00:19:29 - Scenario analysis workshops 00:21:01 - Unpredictable human behaviour 00:23:21 - Seamless and friction in security solutions 00:25:28 - Instant cake 00:26:38 - Red, blue and purple teaming 00:28:34 - Exploring the boundaries in AI 00:31:38 - Gamified security 00:32:46 - With risk comes reward 00:36:17 - Security costs vs. benefit 00:38:49 - Frequent password changes 00:41:20 - Verizon Data Breach Investigations Report 00:43:55 - Sidney Dekker - Human error doesn't exist 00:46:23 - Kelly Shortridge - Sensemaking 00:47:14 - Sharing knowledge around security
Chaos engineering is all about resilience and reliability… it just takes the harder path to get there. By injecting random and unpredictable behavior to the point of failure, chaos engineers observe systems' weak points, apply preventative maintenance, and develop a failover plan. Matt Schillerstrom from Harness introduces Ned and Ethan to this wild corner of...
On this week's episode, host Conor Bronsdon sits down with Guilherme Sesterheim, SAP DevOps SRE Engineer at AWS. Guilherme delves into applying Chaos Engineering and DevOps principles to SAP, a domain traditionally seen as risk-averse and resistant to rapid innovation. With expertise in both open-source technologies and SAP, Guilherme shares how he's bringing modern practices to SAP environments at AWS. He explores how Chaos Engineering can be used to test and improve the resilience of SAP systems, focusing on HANA, SAP's in-memory database. The discussion also touches on the challenges of integrating these practices within the SAP framework and the broader implications for SAP users and the tech industry. Episode Highlights: 00:20 What does it mean to apply chaos engineering to testing SAP installation; 04:05 What does it mean to have DevOps around SAP?; 05:58 Guilherme's approach to DevOps practices around SAP; 10:01 The challenge of handling installation and migration; 11:50 How to Start Applying Chaos Engineering to Your SAP Instance; 16:57 The 12 Scenarios When You Inject Failures on SAP; 19:24 How Guilherme ended up at AWS working on SAP; 23:14 What's Next in DevOps Guilherme is Excited About? Show Notes: Wiring the Winning Organization - IT Revolution
The resilience discipline of controlled stress-test experimentation in continuous integration/continuous delivery (CI/CD) environments to uncover systemic weaknesses. CyberWire Glossary link: https://thecyberwire.com/glossary/chaos-engineering Audio reference link: Farnam Street, 2009. Richard Feynman Teaches you the Scientific Method [Website]. Farnam Street. URL https://fs.blog/mental-model-scientific-method/
We've reached an inflection point in security. There are a handful of organizations regularly and successfully stopping cyber attacks. Most companies haven't gotten there, however. What separates these two groups? Why does it seem like we're still failing as an industry, despite seeming to collectively have all the tools, intel, and budget we've asked for? Kelly Shortridge has studied this problem in depth. She has created tools (https://www.deciduous.app/), and written books (https://www.securitychaoseng.com/) to help the community approach security challenges in a more logical and structured way. We'll discuss what hasn't worked for infosec in the past, and what Kelly thinks might work as we go into the future. Show Notes: https://securityweekly.com/esw-339
Podcast: Unsolicited Response. Episode: Kelly Shortridge - Security Chaos Engineering in ICS. Pub date: 2023-11-01. Kelly joins Dale to discuss her new book Security Chaos Engineering: Sustaining Resilience in Software and Systems. Kelly points out the second part of the title is the most descriptive, and she is not a big fan of the Chaos term that has taken hold. They discuss: a quick description of Security Chaos Engineering; whether there is similarity or overlap with the CCE or CIE approach; the value of decision trees; her view of checklists of security controls like CISA's CPG; Lesson 1 - "Start in Nonproduction environments"; the experiment / scientific method approach and how it can start small; the Danger Zone: tight coupling and complex interactions; how ICS should use Chaos Engineering.
Ready to inject a little chaos into your systems? Richard talks to Kelly Shortridge about her book Security Chaos Engineering. Kelly discusses the challenges of modern cybersecurity - how do you find weaknesses in your infrastructure and security systems? This leads to a discussion about challenging assumptions by exploring the workflows that exist in your infrastructure today. Exploring the workflows shows where assumptions exist, and that opens the door to testing them. There's sure to be some low-hanging fruit you can deal with, but eventually, you're left with tests that have to be set loose on your system - and you'll find out how resilient you really are! Links: Fastly; Security Chaos Engineering; Deciduous. Recorded August 22, 2023
Welcome to another episode of the DevOps Toolchain podcast! In today's episode, we have the pleasure of talking with Daryl Dunn from Gremlin about the exciting topics of Gremlin certification, reliability testing, and chaos engineering. Daryl is a senior solution tech at Gremlin and has extensive experience in the field of testing and automation. We dive into the world of chaos engineering and how it plays a crucial role in ensuring system resilience and observability in today's cloud-native environments. Daryl shares his insights on how chaos engineering is a controlled process that allows organizations to proactively test their systems' resilience and ability to recover from failures. We discuss the importance of incorporating chaos engineering into the DevOps toolchain to identify and address vulnerabilities before they become critical issues. Daryl also highlights the significance of observability and how it plays a crucial role in optimizing system performance and response time. Take advantage of this episode to upskill your career in one of the hottest testing topics. Join us as we explore the world of Gremlin certification, reliability testing, and chaos engineering with Daryl Dunn. Let's dive in!
In today's Kubernetes Unpacked, Michael and Kristina catch up with Prithvi Raj and Sayan Mondal to talk about all things Chaos Engineering in the Kubernetes space! We chat about the open source and CNCF incubating project, Litmus, and various other topics including why Chaos Engineering is important, how it can help all organizations, how every engineer can use it, and more.
Speaking at Black Hat 2023, Kelly Shortridge is bringing cybersecurity out of the dark ages by infusing security by design to create secure patterns and practices. It's the subject of her new book, Security Chaos Engineering, and it's a topic that's long overdue for discussion in the field.
Guest: Kelly Shortridge, Senior Principal Engineer in the Office of the CTO at Fastly. Topics: So what is Security Chaos Engineering? “Chapter 5. Operating and Observing” is Anton's favorite. One thing that mystifies me, however, is that you outline how to fail with alerts (send too many), but it is not entirely clear how to practically succeed with them? How does chaos engineering help security alerting / detection? How does chaos engineering (or is it really about software resilience?) intersect with Cloud security - is this peanut butter and chocolate or more like peanut butter and pickles? How can organizations get started with chaos engineering for software resilience and security? What is your favorite chaos engineering experiment that you have ever done? We often talk about using the SRE lessons for security, and yet many organizations do security the 1990s way. Are there ways to use chaos engineering as a forcing function to break people out of their 1990s thinking and time warp them to 2023? Resources: Video (LinkedIn, YouTube); “Security Chaos Engineering: Sustaining Resilience in Software and Systems” by Kelly Shortridge, Aaron Rinehart; “Cybersecurity Myths and Misconceptions” book; “Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems” book; “Normal Accidents: Living with High-Risk Technologies” book; “Deploy Security Capabilities at Scale: SRE Explains How” (ep85); “The Good, the Bad, and the Epic of Threat Detection at Scale with Panther” (ep123); “Can a Small Team Adopt an Engineering-Centric Approach to Cybersecurity?” (ep117); IKEA Effect; “Modernizing SOC ... Introducing Autonomic Security Operations” blog
In today's episode of Category Visionaries, we speak with Benjamin Wilms, CEO and Co-Founder of Steadybit, a chaos engineering platform that's raised $7.8 Million in funding, about why resilience is everything in the modern software economy, and why so many companies struggle to build it into their more complex systems. Working with a small team of dedicated engineers, Steadybit wants to move beyond chaos engineering to create holistic resilience engineering solutions and empower everybody who builds and runs software. We also speak about Benjamin's background as a software developer and what it was like transitioning to the role of CEO, the original inspiration for a resilience-focused tech platform like Steadybit, having a real passion for problem solving and sharing knowledge, and why chaos engineering is key to making sure everything runs the way it should in the real world. Topics Discussed: Benjamin's background in software development, and his history of developing resilience-first software solutions; the transition from developer to CEO, and confronting the challenges of time management and decision making; the central role of chaos engineering in building long-term resilience for systems operating in an unpredictable digital space; why the term ‘chaos engineering' itself has great marketing potential, and how attracting initial interest is a big part of business growth; how learning from failure is such an important part of developing a business, but why that doesn't mean it's always easy.
Steadybit is a Chaos and Resilience Engineering platform that helps to proactively reduce downtime and provides visibility into systems to detect issues. Your business deserves the best possible preventive action, no matter how complex your system landscape is. Steadybit adds that little extra certainty to your development and testing workflow.
Chris - For those not familiar with Security Chaos Engineering, how would you summarize it, and what made you decide to author the new book on it?
Nikki - In one of your sections of Security Chaos Engineering, you talk about what a modern security program looks like. Can you talk about what this means compared to security programs maybe 5 to 10 years ago?
Chris - When approaching leadership, it can be tough to sell the concept of being disruptive. What advice do you have for security professionals looking to get buy-in from their leadership to introduce security chaos engineering?
Nikki - One of the hallmarks of chaos engineering is actually building resilience into development and application environments, but people hear 'chaos engineering' and don't quite know what to make of it. Can you talk about how security chaos engineering can build resiliency into infrastructure?
Chris - I've cited several of your articles, such as Markets DGAF Security and others. You often take a counter-culture perspective to some of the groupthink in our industry. Why do you think we tend to rally around concepts even when the data doesn't prove them out, and have your views been met with defensiveness among some who hold those views?
Nikki - One of my favorite parts of chaos engineering is the hypothesis-based approach and framework for building a security chaos engineering program. It may seem counter-intuitive to the 'chaos' in 'chaos engineering'. What do you think about the scientific method approach?
Chris - Another topic I've been seeing you write and talk about is increasing the burden/cost on malicious actors to drive down their ROI. Can you touch on this topic with us?
When we peel back the layers of the stack, there's one human characteristic we're sure to find: errors. Mistakes, mishaps, and miscalculations are fundamental to being human, and as such, error is built into every piece of infrastructure and code we create. Of course, learning from our errors is critical in our effort to create functional, reliable tech. But could our mistakes be as important to technological development as our ideas? And what happens when we try to change our attitude towards errors…or remove them entirely? In this fascinating episode of Traceroute, we start back in 1968, when “The Mother of All Demos” was supposed to change the face of personal computing…before the errors started. We're then joined by Andrew Clay Shafer, a DevOps pioneer who has seen the evolution of “errors” to “incidents” through practices like Scrum, Agile, and Chaos Engineering. We also speak with Courtney Nash, a Cognitive Neuroscientist and Researcher whose Verica Open Incident Directory (VOID) has changed the way we look at incident reporting.

Additional Resources
Connect with Amy Tobey: LinkedIn or Twitter
Connect with Fen Aldrich: LinkedIn or Twitter
Connect with John Taylor on LinkedIn
Connect with Courtney Nash on Twitter
Connect with Andrew Clay Shafer on Twitter
Visit Origins.dev for more information

Enjoyed This Episode?
If you did, be sure to follow and share it with your friends! We'd also appreciate a five-star review on Apple Podcasts - it really helps people find the show!

Traceroute is a podcast from Equinix and is a production of Stories Bureau. This episode was produced by John Taylor with help from Tim Balint and Cat Bagsic. It was edited by Joshua Ramsey and mixed by Jeremy Tuttle, with additional editing and sound design by Mathr de Leon. Our theme song was composed by Ty Gibbons.
Kelly Shortridge, Senior Principal Engineer at Fastly, joins Corey on Screaming in the Cloud to discuss their recently released book, Security Chaos Engineering: Sustaining Resilience in Software and Systems. Kelly explains why a resilient strategy is far preferable to a bubble-wrapped approach to cybersecurity, and how developer teams can use evidence to mitigate security threats. Corey and Kelly discuss how the risks of working with complex systems are perfectly illustrated by Jurassic Park, and Kelly also highlights why it's critical to address both system vulnerabilities and human vulnerabilities in your development environment rather than pointing fingers when something goes wrong.

About Kelly
Kelly Shortridge is a senior principal engineer at Fastly in the office of the CTO and lead author of "Security Chaos Engineering: Sustaining Resilience in Software and Systems" (O'Reilly Media). Shortridge is best known for their work on resilience in complex software systems, the application of behavioral economics to cybersecurity, and bringing security out of the dark ages. Shortridge has been a successful enterprise product leader as well as a startup founder (with an exit to CrowdStrike) and investment banker. Shortridge frequently advises Fortune 500s, investors, startups, and federal agencies and has spoken at major technology conferences internationally, including Black Hat USA, O'Reilly Velocity Conference, and SREcon. Shortridge's research has been featured in ACM, IEEE, and USENIX, spanning behavioral science in cybersecurity, deception strategies, and the ROI of software resilience.
They also serve on the editorial board of ACM Queue.

Links Referenced:
Fastly: https://www.fastly.com/
Personal website: https://kellyshortridge.com
Book website: https://securitychaoseng.com
LinkedIn: https://www.linkedin.com/in/kellyshortridge/
Twitter: https://twitter.com/swagitda_
Bluesky: https://shortridge.bsky.social

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Have you listened to the new season of Traceroute yet? Traceroute is a tech podcast that peels back the layers of the stack to tell the real, human stories about how the inner workings of our digital world affect our lives in ways you may have never thought of before. Listen and follow Traceroute on your favorite platform, or learn more about Traceroute at origins.dev. My thanks to them for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. My guest today is Kelly Shortridge, who is a Senior Principal Engineer over at Fastly, as well as the lead author of the recently released Security Chaos Engineering: Sustaining Resilience in Software and Systems. Kelly, welcome to the show.

Kelly: Thank you so much for having me.

Corey: So, I want to start with the honest truth that in that title, I think I know what some of the words mean, but when you put them together in that particular order, I want to make sure we're talking about the same thing. Can you explain that like I'm five, as far as what your book is about?

Kelly: Yes. I'll actually start with an analogy I make in the book, which is, imagine you were trying to rollerblade to some destination.
Now, one thing you could do is wrap yourself in a bunch of bubble wrap and become the bubble person, and you can waddle down the street trying to make it to your destination on the rollerblades, but if there's a gust of wind or a dog barks or something, you're going to flop over, you're not going to recover. However, if you instead do what everybody does, which is, you know, kneepads and other things that keep you flexible and nimble, then if there's a gust of wind, you can kind of be agile, navigate around it; if a dog barks, you just roller-skate around it; you can reach your destination. The former, the bubble person, that's a lot of our cybersecurity today. It's just keeping us very rigid, right? And then the alternative is resilience, which is the ability to recover from failure and adapt to evolving conditions.

Corey: I feel like I am about to torture your analogy to death because back when I was in school in 2000, there was an annual tradition at the school I was attending before failing out, where a bunch of us would paint ourselves green every year and then bike around the campus naked. It was the green bike ride. So, one year I did this on rollerblades. So, if you wind up looking, there's the bubble wrap, there's the safety gear, and then there's wearing absolutely nothing, which feels--

Kelly: [laugh]. Yes.

Corey: --kind of like the startup approach to InfoSec. It's like, “It'll be fine. What's the worst that happens?” And you're super nimble, super flexible, until suddenly, oops, now I really wish I'd done things differently.

Kelly: Well, there's a reason why I don't say rollerblade naked. Other than it being rather visceral, what you described is what I've called YOLOSec before, which is not what you want to do. Because the problem when you think about it from a resilience perspective, again, is you want to be able to recover from failure and adapt.
Sure, you can oftentimes move quickly, but you're probably going to erode software quality over time, so at a certain point, there's going to be some big incident, and suddenly, you aren't fast anymore, you're actually pretty slow. So, there's this, kind of, happy medium where you have enough (I like security by design, and we can talk about that a bit if you want), where you have enough of the security by design baked in, and you can think of it as guardrails, that you're able to withstand and recover from any failure. But yeah, going naked, that's a recipe for not being able to rollerblade, like, ever again, potentially [laugh].

Corey: I think, on some level, that the correct dialing in of security posture is going to come down to context, in almost every case. “I'm building something in my spare time in the off hours” does not need the same security posture, mostly, as “we are a bank.” It feels like there's a very wide gulf between those two extremes. Unfortunately, I find that there's a certain tone-deafness coming from a lot of the security industry around oh, everyone must have security as their number one thing, ever. I mean, with my clients whose AWS bills I fix, I have to care about security contractually, but the secrets that I hold are boring: how much money certain companies pay another very large company.

Yes, I'll get sued into oblivion if that leaks, but nobody dies. Nobody is having their money stolen as a result. It's slightly embarrassing in the tech press for a cycle and then it's over and done with. That's not the same thing as a brief stint I did running tech ops at Grindr ten years ago where, leak that database and people will die. There's a strong difference between those threat models, and on some level, being able to act accordingly has been one of the more eye-opening approaches to increasing velocity in my experience. Does that align with the thesis of your book, since my copy has not yet arrived for this recording?

Kelly: Yes.
In the book, I am not afraid to say “it depends,” and you're right, it depends on context. I actually talk about this resilience potion recipe that you can check out if you want, these ingredients so we can sustain resilience. A key one is defining your critical functions, just what is your system's reason for existence, and that is what you want to make sure can recover and still operate under adverse conditions, like you said.

Another example I give all the time is most SaaS apps have some sort of reporting functionality. Guess what? That's not mission-critical. You don't need the utmost security on that, for the most part. But if it's processing transactions, yeah, probably you want to invest more security there. So yes, I couldn't agree more that it's context-dependent and oh, my God, does the security industry ignore that so much of the time, and it's been my gripe for, I feel like, as long as I've been in the industry.

Corey: I mean, there was a great talk that Netflix gave years ago where they mentioned in passing that all developers have root in production. And that's awesome and the person next to him was super excited and I looked at their badge, and holy hell, they worked at an actual bank. That seems like a bad plan. But talking to the Netflix speaker after the fact, Dave Hahn, something that I found that was extraordinarily insightful was that, yeah, well we just isolate off the PCI environment so the rest and sensitive data lives in its own compartmentalized area. So, at that point, yeah, you're not going to be able to break much in that scenario.

It's like, that would have been helpful context to put in the talk. Which I'm sure he did, but my attention span had tripped out and I missed that. But that's, on some level, constraining blast radius, and not having compliance and regulatory issues extending to every corner of your environment really frees you up to do things appropriately.
But there are some things where you do need to care about this stuff, regardless of how small the surface area is.

Kelly: Agreed. And I introduced the concept of the effort investment portfolio in the book, which is basically: where does it matter to invest effort, and where can you, kind of like, maybe save some resources up? I think one thing you touched on, though, is, we're really talking about isolation, and I actually think people don't think about isolation in as detailed a way or maybe as expansively as they could. Because we want both temporal and logical and spatial isolation. What you talked about is, yeah, there are some cases where you want to isolate data, you want to isolate certain subsystems, and that could be containers, it could also be AWS security groups.

It could take a bunch of different forms, it could be something like RLBox in WebAssembly land. But I think that's something that I really try to highlight in the book: there's actually a huge opportunity for security engineers, starting from the design of a system, to really think about how can we infuse different forms of isolation to sustain resilience.

Corey: It's interesting that you use the word investment. When fixing AWS bills for a living, I've learned over the last almost seven years now of doing this that cost and architecture and cloud are fundamentally the same thing. And resilience is something that comes with a very real cost, particularly when you start looking at what the architectural choices are. And one of the big reasons that I only ever work on a fixed-fee basis is because if I'm charging for a percentage of savings or something, it inspires me to say really uncomfortable things like, “Backups are for cowards.” And, “When was the last time you saw an entire AWS availability zone go down for so long that it mattered?
You don't need to worry about that.” And it does cut off an awful lot of cost issues, at the price of making the environment more fragile.

That's where one of the context things starts to come in. I mean, in many cases, if AWS is having a bad day in a given region, well, does your business need that workload to be functional? For my newsletter, I have a publication system that's single-homed out of the Oregon region. If that whole thing goes down for multiple days, I'm writing that week's issue by hand because I'm going to have something different to talk about anyway. For me, there is no value in making that investment. But for companies, there absolutely is, but there also seems to be a lack of awareness around how much is a reasonable investment in that area, and when do you start making that investment? And most critically, when do you stop?

Kelly: I think that's a good point, and luckily, what's on my side is the fact that there's a lot of just profligate spending in cybersecurity and [laugh] that's really what I'm focused on: how can we spend those investments better? And I actually think there's an opportunity in many cases to ditch a ton of cybersecurity tools and focus more on some of the stuff you talked about. I agree, by the way, that I've seen some threat models where it's like, well, AWS, all regions go down. I'm like, at that point, we have, like, a severe, bigger-than-whatever-you're-thinking-about problem, right?

Corey: Right. So, does your business continuity plan account for every one of your staff suddenly quitting on the spot because there's a whole bunch of companies with very expensive consulting, like, problems that I'm going to go work for a week and then buy a house in cash? It's one of those areas where, yeah, people are not going to care about your environment more than they are about their families and other things that are going on. Plan accordingly. People tend to get so carried away with these things with tabletop planning exercises.
And then of course, they forget little things like I overwrote the database by dropping the wrong thing. Turns out that was production. [laugh]. Remembering for [a me 00:10:00] there.

Kelly: Precisely. And a lot of the chaos experiments that I talk about in the book are a lot of those, like, let's validate some of those basics, right? That's actually some of the best investments you can make. Like, if you do have backups, I can totally see your argument about backups are for cowards, but if you do have them, like, maybe you conduct experiments to make sure that they're available when you need them, and the same thing, even on the [unintelligible 00:10:21] side--

Corey: No one cares about backups, but everyone really cares about restores, suddenly, right after--

Kelly: Yeah.

Corey: --they really should have cared about backups.

Kelly: Exactly. So, I think it's looking at those experiments where it's like, okay, you have these basic assumptions in place that you assume to be invariants or assume that they're going to bail you out if something goes wrong. Let's just verify. That's a great place to start because I can tell you (I know you've been to the RSA hall floor) how many cybersecurity teams are actually assessing the efficacy and actually experimenting to see if those tools really help them during incidents? It's pretty few.

Corey: Oh, vendors do not want to do those analyses. They don't want you to do those analyses, either, and if you do, for God's sakes, shut up about it. They're trying to sell things here, mostly firewalls.

Kelly: Yeah, cybersecurity vendors aren't necessarily happy about my book and what I talk about because I have almost this ruthless focus on evidence and [unintelligible 00:11:08] cybersecurity vendors kind of thrive on a lack of evidence. So.

Corey: There's so much fear, uncertainty, and doubt in that space and I do feel for them. It's a hard market to sell in without having to talk about here's the thing that you're defending against.
In my case, it's easy to sell “the AWS bill is high” because if I have to explain why more or less setting money on fire is a bad thing, I don't really know what to tell you. I'm going to go look for a slightly different customer profile. That's not really how it works in security. I'm sure there are better go-to-market approaches, but they're hard to find, at least ones that work holistically.

Kelly: There are. And one of my priorities with the book was to really enumerate how many opportunities there are to take software engineering practices that people already know, let's say something like type systems even, and how those can actually help sustain resilience. Even things like integration testing or infrastructure as code, there are a lot of opportunities just to extend what we already do for systems reliability to sustain resilience against things that aren't attacks and just make sure that, you know, we cover a few of those cases as well. A lot of it should be really natural to software engineering teams. Again, security vendors don't like that because it turns out software engineering teams don't particularly like security vendors.

Corey: I hadn't noticed that. I do wonder, though, for those who are unaware, chaos engineering started off as breaking things on purpose, which I feel like one person had a really good story and thought about it super quickly when they were about to get fired. Like, “No, no, it's called Chaos Engineering.” Good for them. It's now a well-regarded discipline. But I've always heard of it in the context of reliability of, “Oh, you think your site is going to work if the database falls over? Let's push it over and see what happens.” How does that manifest in a security context?

Kelly: So, I will clarify, I think that's a slight misconception. It's really about fixing things in production, and that's the end goal. I think we should not break things just to break them, right?
But I'll give a simple example, which is based on what Aaron Rinehart conducted at UnitedHealth Group, which is, okay, let's inject a misconfigured port as an experiment and see what happens, end-to-end. In their case, the firewall only detected the misconfigured port 60% of the time, so 60% of the time, it works every time.

But it was actually the cloud, the very common, like, cloud configuration management tool that caught the change and alerted responders. So, it's that kind of thing where we're still trying to verify those assumptions that we have about our systems and how they behave, again, end-to-end. In a lot of cases, again, with security tools, they are not behaving as we expect. But I still argue security is just a subset of software quality, so if we're experimenting to verify, again, our assumptions and observe system behavior, we're benefiting software quality, and security is just a subset of that. Think about C code, right? It's not like there's, like, a healthy memory corruption, so it's bad for both quality and security reasons.

Corey: One problem that I've had in the security space for a while is, let's [unintelligible 00:14:05] on this to AWS for a second because that is the area in which I spend most of my time, which probably explains a lot about my personality challenges. But the problem that I keep smacking into is if I go ahead and configure everything the way that I should according to best practices and the rest, I wind up with a firehose torrent of information in terms of CloudTrail logs, et cetera. And it's expensive in its own right. But then to sort through it or to do a lot of things in security, there are basically two options. I can either buy a vendor's product, which generally tends to start around $12,000 a year and goes up rapidly from there on my current $6,000 a year bill, so okay, twice as much as the infrastructure for security monitoring.
Okay.

Or alternately, find a bunch of different random scripts and tools on GitHub of wildly diverging quality and sort of hope for the best on that. It feels like there's nothing in between. And the reason I care about this is not because I'm cheap but because when you have an individual learner who is either a student or a career switcher or someone just trying to experiment with this, you want them to begin as you want them to go on, and things that are no money for an enterprise are all the money to them. They're going to learn to work with the tools that they can afford. That feels like it's a big security swing and a miss. Do you agree or disagree? What's the nuance I'm missing here?

Kelly: No, I don't think there's nuance you're missing. I think security observability, for one, isn't a buzzword that particularly exists. I've been trying to make it a thing, but I'm solely one individual screaming into the void. But observability just hasn't been a thing. We haven't really focused on, okay, so what, like, we get data and what do we do with it?

And I think, again, from a software engineering perspective, I think there's a lot we can do. One, we can just avoid duplicating efforts. We can treat observability, again, of any sort of issue as similar, whether that's an attack or a performance issue. I think this is another place where security, or any sort of chaos experiment, shines though because if you have an idea of here's an adverse scenario we care about, you can actually see how does it manifest in the logs and you can start to figure out, like, what signals do we actually need to be looking for, what signals mattered to be able to narrow it down.
Which again, it involves time and effort, but also, I can attest when you're buying the security vendor tool and, in theory, absolving some of that time and effort, it's maybe, maybe not, because it can be hard to understand what the outcomes are or what the outputs are from the tool and it can also be very difficult to tune it and to be able to explain some of the outputs. It's kind of like trading upfront effort versus long-term overall overhead, if that makes sense.

Corey: It does. On that note, the title of your book includes the magic key phrase ‘sustaining resilience.’ I have found that security effort and investment tends to resemble a fire drill in--

Kelly: [laugh].

Corey: --an awful lot of places, where, “We care very much about security,” says the company, right after they very clearly failed to care about security, and I know this because I'm reading an email about a breach that they've just sent me. And then there's a whole bunch of running around and hair-on-fire moments. But then there's a new shiny that always comes up, a new strategic priority, and it falls to the wayside again. What do you see that drives that sustained effort and focus on resilience in a security context?

Kelly: I think it's really making sure you have a learning culture, which sounds very [unintelligible 00:17:30], but things again, like, experiments can help just because when you do simulate those adverse scenarios and you see how your system behaves, it's almost like running an incident and you can use that as very fresh, kind of, like collective memory. And I even strongly recommend starting off with prior incidents and simulating those, just to see like, hey, did the improvements we make actually help? If they didn't, that can be kind of another fire under the butt, so to speak, to continue investing.
So, definitely in practice (and there's some case studies in the book) it can be really helpful just to kind of like sustain that memory and sustain that learning and keep things feeling a bit fresh. It's almost like prodding the nervous system a little, just so it doesn't go back to that complacent and convenient feeling.

Corey: It's one of the hard problems because (I'm sure I'm going to get castigated for this by some of the listeners) computers are easy, particularly compared to the people. There are deterministic ways to solve almost any computer problem, but people are always going to be a little bit different, and getting them to perform the same way today that they did yesterday is an exercise in frustration. Changing the culture, changing the approach and the attitude that people take toward a lot of these things feels, from my perspective, like, something of an impossible job. Cultural transformations are things that everyone talks about, but it's rare to see them succeed.

Kelly: Yes, and that's actually something that I very strongly weaved throughout the book: if your security solutions rely on human behavior, they're going to fail. We want to either reduce hazards or eliminate hazards by design as much as possible. So, my view is very much again, like, can you make processes more repeatable? That's going to help security. I definitely do not think that if anyone takes away from my book that they need to have, like, a thousand hours of training to change hearts and minds, then they have completely misunderstood most of the book.

The idea is very much like, what are practices that we want for other outcomes anyway (again, reliability or faster time to market) and how can we harness those to also be improving resilience or security at the same time?
It's very much trying to think about those opportunities rather than, you know, trying to drill into people's heads, like, “Thou shalt not,” or, “Thou shall.”

Corey: Way back in 2018, you gave a keynote at some conference or another and you built the entire thing on the story of Jurassic Park, specifically Ian Malcolm as one of your favorite fictional heroes, and you tied it into security in a bunch of different ways. You hadn't written this book then unless the authorship process is way longer than I think it is. So, I'm curious to get your take on what Jurassic Park can teach us about software security.

Kelly: Yes, so I talk about Jurassic Park as a reference throughout the book, frequently. I've loved that book since I was a very young child. Jurassic Park is a great example of a complex system gone wrong because you can't point to any one thing. Like there's Dennis Nedry, you know, messing up the power system, but then there's also the software was looking for a very specific count of dinosaurs and they didn't anticipate there could be more in the count. Like, there are so many different factors that influenced it, you can't actually blame just, like, human error or point fingers at one thing.

That's a beautiful example of how things go wrong in our software systems because like you said, there's this human element and then there's also how the humans interact and how the software components interact. But with Jurassic Park, too, I think the great thing is dinosaurs are going to do dinosaur things like eating people, and there are also equivalents in software, like C code. C code is going to do C code things, right? It's not a memory-safe language, so we shouldn't be surprised when something goes wrong. We need to prepare accordingly.

Corey: “How could this happen? Again?” Yeah.

Kelly: Right.
At a certain point, it's like, there's probably no way to sufficiently introduce isolation for dinosaurs unless you put them in a bunker where no one can see them, and it's the same thing sometimes with things like C code. There's just no amount of effort you can invest, and you're just kind of investing for a really unclear and generally not fortuitous outcome. So, I like it as kind of this analogy to think about, okay, where do our effort investments make sense and where is it sometimes like, we really just do need to refactor because we're dealing with dinosaurs here.

Corey: When I was a kid, that was one of my favorite books, too. The problem is, I didn't realize I was getting a glimpse of my future at a number of crappy startups that I worked at. Because you have John Hammond, who was the owner of the park, talking constantly about how, “We spared no expense,” but then you look at what actually happened and he spared every frickin expense. You have one IT person who is so criminally underpaid that smuggling dinosaur embryos off the island becomes a viable strategy for this. He wound up, “Oh, we couldn't find the right DNA, so we're just going to, like, splice some other random stuff in there. It'll be fine.”

Then you have the massive overconfidence because it sounds very much like he had this almost Muskian desire to fire anyone who disagreed with him, and yeah, there was a certain lack of investment that could have been made, despite loud protestations to the contrary. I'd say that he is the root cause, he is the proximate reason for the entire failure of the park. But I'm willing to entertain disagreement on that point.

Kelly: I think there are other individuals, like Dr. Wu, if you recall, like, deciding to do the frog DNA and not thinking that maybe something could go wrong. I think there was a lot of overconfidence, which you're right, we do see a lot in software.
So, I think that's actually another very important lesson: incentives matter and incentives are very hard to change, kind of like what you talked about earlier. It doesn't mean that we shouldn't include incentives in our threat model.

So like, in the book I talked about, our threat models should include things like maybe yeah, people are underpaid or there is a ton of pressure to deliver things quickly or, you know, do things as cheaply as possible. That should be just as much a part of our threat models as all of the technical stuff too.

Corey: I think that there's a lot that was in that movie that was flat-out wrong. For example, one of the kids (I forget her name; it's been a long time) was logging in and said, “Oh, this is Unix. I know Unix.” And having learned Unix as my first basically professional operating system, “No, you don't. No one knows Unix. They get very confused at some point, the question is, just how far down what rabbit hole it is.”

I feel so sorry for that kid. I hope she wound up seeking therapy when she was older to realize that, no, you don't actually know Unix. It's not that you're bad at computers, it's that Unix is user-hostile, actively so. Like, the raptors, like, that's the better metaphor when everything winds up shaking out.

Kelly: Yeah. I don't disagree with that. The movie definitely takes many liberties. I think what's interesting, though, is that Michael Crichton, specifically, when he talks about writing the book (I don't know how many people know this), dinosaurs were just a mechanism. He knew people would want to read it in an airport.

What he cared about was communicating really the danger of complex systems and how if you don't respect them and respect that interactivity and that it can baffle and surprise us, like, things will go wrong. So, I actually find it kind of beautiful in a way that the dinosaurs were almost like an afterthought.
What he really cared about was exactly what we deal with all the time in software, is when things go wrong with complexity.Corey: Like one of his other books, Airframe, talked about an air disaster. There's a bunch of contributing factors in the rest, and for some reason, that did not receive the wild acclaim that Jurassic Park did to become a cultural phenomenon that we're still talking about, what, 30 years later.Kelly: Right. Dinosaurs are very compelling.Corey: They really are. I have to ask though—this is the joy of having a kid who is almost six—what is your favorite dinosaur? Not a question most people get asked very often, but I am going to trot that one out.Kelly: No. Oh, that is such a good question. Maybe a Deinonychus.Corey: Oh, because they get so angry they spit and kill people? That's amazing.Kelly: Yeah. And I like that, kind of like, nimble, smarter one, and also the fact that most of the smaller ones allegedly had feathers, which I just love this idea of, like, feather-ful murder machines. I have the classic, like, nerd kid syndrome, though, where I read all these dinosaur names as a kid and I've never pronounced them out loud. So, I'm sure there are others—Corey: Yep.Kelly: —that I would just word salad. But honestly, it's hard to go wrong with choosing a favorite dinosaur.Corey: Oh, yeah. I'm sure some paleontologist is sitting out there in the field on the dig somewhere listening to this podcast, just getting very angry at our pronunciation and things. But for God's sake, I call the database Postgres-squeal. Get in line. There's a lot of that out there where looking at a complex system failures and different contributing factors and the rest makes stuff—that's what makes things interesting.I think that there's this the idea of a root cause is almost always incorrect. It's not, “Okay, who tripped over the buried landmine,” is not the interesting question. It's, “Who buried the thing?” What were all the things that wound up contributing to this? 
And you can't even frame it that way in the blaming context, just because you start doing that and people clam up, and good luck figuring out what really happened.
Kelly: Exactly. So much of what the cybersecurity industry is focused on is how do we assign blame? And it's, you know, the marketing person clicked on a link. And it's like, they do that thousands of times, like, a month, and the one time, suddenly, they were stupid for doing it? That doesn't sound right.
So, I'm a big fan of, yes, vanquishing root cause, thinking about contributing factors, and in particular, in any sort of incident review, you have to think about, was there a design or process problem? You can't just think about the human behavior; you have to think about where are the opportunities for us to design things better, to make the secure way more of the default way.
Corey: When you talk about resilience and reliability and big, notable outages, most forward-thinking companies are going to go and do a variety of incident reviews and disclosures around everything that happened to it, depending upon levels of trust and whether you're NDA'ed or not, and how much gets public is going to vary from place to place. But from a security perspective, that feels like the sort of thing that companies will clam up about and never say a word.
Kelly: Yes.
Corey: Because I can wind up pouring a couple of drinks into people and get the real story of outages, or the AWS bill, but security stuff, they start to wonder if I'm a state actor, on some level. When you were building all of this, how did you wind up getting people to talk candidly and forthrightly about issues where, if it became tied to them that they were talking about this in public, it would almost certainly have negative career impact for them?
Kelly: Yes, so that's almost like a trade secret, I feel like. A lot of it is yes, over the years talking with people, generally at a conference where, you know, things are tipsy. 
I never want to betray confidentiality, to be clear, but certainly pattern-matching across people's stories.Corey: Yeah, we're both in positions where if even the hint of they can't be trusted enters the ecosystem, I think both of our careers explode and never recover. Like it's—Kelly: Exactly.Corey: —yeah. Oh, yeah. They play fast and loose with secrets is never the reputation you want as a professional.Kelly: No. No, definitely not. So, it's much more pattern matching and trying to generalize. But again, a lot of what can go wrong is not that different when you think about a developer being really tired and making a bunch of mistakes versus an attacker. A lot of times they're very much the same, so luckily there's commonality there.I do wish the security industry was more forthright and less clandestine because frankly, all of the public postmortems that are out there about performance issues are just such, such a boon for everyone else to improve what they're doing. So, that's a change I wish would happen.Corey: So, I have to ask, given that you talk about security, chaos engineering, and resilience-and of course, software and systems—all in the title of the O'Reilly book, who is the target audience for this? Is it folks who have the word security featured three times in their job title? Is it folks who are new to the space? What is your target audience start and stop?Kelly: Yes, so I have kept it pretty broad and it's anyone who works with software, but I'll talk about the software engineering audience because that is, honestly, probably out of anyone who I would love to read the book the most because I firmly believe that there's so much that software engineering teams can do to sustain resilience and security and they don't have to be security experts. So, I've tried to demystify security, make it much less arcane, even down to, like, how attackers, you know, they have their own development lifecycle. I try to demystify that, too. 
So, it's very much for any team, especially, like, platform engineering teams, SREs, to think about, hey, what are some of the things maybe I'm already doing that I can extend to cover, you know, the security cases as well? So, I would love for every software engineer to check it out to see, like, hey, what are the opportunities for me to just do things slightly differently and have these great security outcomes?Corey: I really want to thank you for taking the time to talk with me about how you view these things. If people want to learn more, where's the best place for them to find you?Kelly: Yes, I have all of the social media which is increasingly fragmented, [laugh] I feel like, but I also have my personal site, kellyshortridge.com. The official book site is securitychaoseng.com as well. But otherwise, find me on LinkedIn, Twitter, [Mastodon 00:30:22], Bluesky. I'm probably blanking on the others. There's probably already a new one while we've spoken.Corey: Blue-ski is how I insist on pronouncing it as well, while we're talking about—Kelly: Blue-ski?Corey: Funhouse pronunciation on things.Kelly: I like it.Corey: Excellent. And we will, of course, put links to all of those things in the [show notes 00:30:37]. Thank you so much for being so generous with your time. I really appreciate it.Kelly: Thank you for having me and being a fellow dinosaur nerd.Corey: [laugh]. Kelly Shortridge, Senior Principal Engineer at Fastly. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment about how our choice of dinosaurs is incorrect, then put the computer away and struggle to figure out how to open a door.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. 
We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Chaos engineering is a discipline within the field of software engineering that focuses on testing and improving the resilience and stability of a system by intentionally introducing controlled instances of chaos and failure. The primary goal of chaos engineering is to identify and address potential weaknesses and vulnerabilities in a system, ultimately making it more The post Chaos Engineering with Uma Mukkara appeared first on Software Engineering Daily.
What is security chaos engineering? You may remember Kelly Shortridge, our very first guest, who came on the show to talk about behavioral economics and cybersecurity. Well, Kelly is back to talk about her new book, "Security Chaos Engineering: Sustaining Resilience in Software and Systems". Security chaos engineering is derived from chaos engineering, a relatively new discipline in software development that seeks to test distributed computing systems to ensure that they withstand unexpected disruptions. It's all about resilience, in other words. Security chaos engineering seeks to do the same for the security of such software systems. Kelly breaks down her book during a lively conversation featuring an opinion or two from her cat, Link (yes, a Zelda reference!):
Who should read this book?
Resilience in software and systems
Systems-oriented security
Architecting and designing
Building and delivering
Operating and observing (Allan's favorite chapter as it intersects with one of his Zero Trust tenets)
Responding and recovering
Platform resilience engineering
Security chaos experiments (a very fun chapter!)
Case studies
Note that the book is peppered with references and quotes from other disciplines. We would expect no less from Kelly.
Sponsored by our good friends at Dazz: Dazz takes the pain out of the cloud remediation process using automation and intelligence to discover, reduce, and fix security issues, lightning fast. Visit Dazz.io/demo and see for yourself.
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Brian #1: huak - A Python package manager written in Rust. Inspired by Cargo Suggested by Owen Tons of workflows activate - activate a virtual environment add add a dependency to a project pip install it into your virtual environment, and add it to the dependency list in pyproject.toml test - run pytest update update dependencies lint - run ruff, installing it first if necessary fix - autofix fixable lint conflicts build - build wheel in isolated virtual environment using hatchling Honestly I was considering building my own workflow tool, but this is darned close to what I want. Even though it's still “in an experimental state”. There are rough edges (ruff edges, get it), but still, way cool. I just don't know how to pronounce it. Is it like “walk”, or more like “whack”? Michael #2: PSF expresses concerns about a proposed EU law that may make it impossible to continue providing Python and PyPI to the European public After reviewing the proposed Cyber Resilience Act and Product Liability Act, the PSF has found issues that put the mission of our organization and the health of the open-source software community at risk. As currently written, the authors of open-source components might bear legal and financial responsibility for the way their components are applied in someone else's commercial product. The risk of huge potential costs would make it impossible in practice for us to continue to provide Python and PyPI to the European public. 
Brian #3: ChaosToolkit Suggested by the maintainer, Sylvain Hellegouarch Declare and store your Chaos Engineering experiments as JSON/YAML files so you can collaborate and orchestrate them as any other piece of code. Extensible through an Open API Can be automated in CI/CD pipeline Michael #4: PEP 711 – PyBI: a standard format for distributing Python Binaries “Like wheels, but instead of a pre-built python package, it's a pre-built python interpreter” Joke: It's the effort that counts
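The ChaosToolkit item above describes experiments declared as JSON/YAML files that are stored and reviewed like any other code. As a rough illustration, the sketch below builds one such experiment as a Python dictionary and serializes it to JSON. The field names (`steady-state-hypothesis`, `method`, `rollbacks`, `provider`) follow the Chaos Toolkit experiment format as best I recall it and should be checked against the official documentation; the health URL, the `demo-worker` unit name, and the `missing_required_keys` helper are hypothetical and exist only for this example.

```python
import json

# Hypothetical endpoint and service unit, used only for illustration.
HEALTH_URL = "http://localhost:8080/health"
WORKER_UNIT = "demo-worker"

# An experiment: a steady-state check, a failure to inject, and a rollback.
experiment = {
    "title": "Service stays healthy when one worker is stopped",
    "description": "Illustrative only: stop a worker, then check the health endpoint.",
    "steady-state-hypothesis": {
        "title": "Health endpoint returns HTTP 200",
        "probes": [
            {
                "type": "probe",
                "name": "health-endpoint-responds",
                "tolerance": 200,  # expected HTTP status code
                "provider": {"type": "http", "url": HEALTH_URL, "timeout": 3},
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "stop-one-worker",
            "provider": {
                "type": "process",
                "path": "systemctl",
                "arguments": f"stop {WORKER_UNIT}",
            },
        }
    ],
    "rollbacks": [
        {
            "type": "action",
            "name": "restart-worker",
            "provider": {
                "type": "process",
                "path": "systemctl",
                "arguments": f"start {WORKER_UNIT}",
            },
        }
    ],
}

# Assumption: these top-level keys are treated as mandatory for this sketch.
REQUIRED_KEYS = ("title", "description", "method")


def missing_required_keys(doc):
    """Return the required top-level keys that are absent from doc."""
    return [key for key in REQUIRED_KEYS if key not in doc]


# Serialized form that could be committed to a repository and run in CI.
experiment_json = json.dumps(experiment, indent=2)
```

Writing `experiment_json` to a file gives an artifact that can be reviewed in a pull request like any other change, which is the collaboration point the show notes make.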
All links and images for this episode can be found on CISO Series. Is chaos engineering the secret sauce to creating a resilient organization? Purposefully disrupt your architecture to allow for early discovery of weak points. Can we take it even further to company environment, beyond even a tabletop exercise? How far can we test our limits while still allowing the business to operate? This week's episode is hosted by me, David Spark (@dspark), producer of CISO Series and Andy Ellis (@csoandy), operating partner, YL Ventures. Our sponsored guest is Mike Wiacek, CEO, Stairwell. Thanks to our podcast sponsor, Stairwell The standard cybersecurity blueprint is a roadmap for attackers to test and engineer attacks. With Inception, organizations can operate out of sight, out of band, and out of time. Collect, search, and analyze every file in your environment – from malware and supply chain vulnerabilities to unique, low-prevalence files and beyond. Learn about Inception. In this episode: Is chaos engineering the secret sauce to creating a resilient organization? Purposefully disrupt your architecture to allow for early discovery of weak points. Can we take it even further to company environment, beyond even a tabletop exercise? How far can we test our limits while still allowing the business to operate?
Chaos Engineering started in the mid 2000s. It was made famous by the Netflix engineering team under an internal app they developed, called Chaos Monkey, that randomly destroyed pieces of their customer-facing infrastructure, on purpose, so that their network architects could understand resilience engineering down deep in their core. But the concept is much more than simply destroying production systems to see what will happen. This elevates the idea of regression testing to the level of the scientific method designed to uncover potential and unknown architectural designs that may cause catastrophic failure. I make the case that the CSO should probably own that functionality.
In complex service-oriented architectures, failure can happen in individual servers and containers, then cascade through your system. Good engineering takes into account possible failures. But how do you test whether a solution actually mitigates failures without risking the ire of your customers? That's where chaos engineering comes in, injecting failures and uncertainty into complex systems so your team can see where your architecture breaks. On this sponsored episode, our fourth in the series with Intuit, Ben and Ryan chat with Deepthi Panthula, Senior Product Manager, and Shan Anwar, Principal Software Engineer, both of Intuit about how use self-serve chaos engineering tools to control the blast radius of failures, how game day tests and drills keep their systems resilient, and how their investment in open-source software powers their program. Episode notes: Sometimes old practices work in new environments. The Intuit team uses Failure Mode Effect Analysis, (FMEA), a procedure developed by the US military in 1949, to ensure that their developers understand possible points of failure before code makes it to production. The team uses Litmus Chaos to inject failures into their Kubernetes-based system and power their chaos engineering efforts. It's open source and maintained by Intuit and others. If you've been following this series, you'd know that Intuit is a big fan of open-source software. Special shout out to Argo Workflow, which makes their compute-intensive Kubernetes jobs work much smoother. Connect on LinkedIn with Deepthi Panthula and Zeeshan (Shan) Anwar.If you want to see what Stack Overflow users are saying about chaos engineering, check out Chaos engineering best practice, asked by User NingLee two years ago.
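The idea of controlling the blast radius mentioned in these notes can be shown with a small simulation. This is not Litmus Chaos or Intuit's tooling; every name here (`Instance`, `pick_targets`, `inject_failure`, `steady_state_ok`) is hypothetical. The sketch only shows the principle: cap the share of one service's instances that an experiment may break, then check a steady-state condition afterwards.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    service: str
    healthy: bool = True


def pick_targets(instances, service, max_fraction, rng):
    """Choose healthy instances of one service, capped at max_fraction of them."""
    if not 0.0 <= max_fraction <= 1.0:
        raise ValueError("max_fraction must be between 0 and 1")
    candidates = [i for i in instances if i.service == service and i.healthy]
    limit = int(len(candidates) * max_fraction)  # rounds down: the blast radius
    return rng.sample(candidates, limit)


def inject_failure(targets):
    """Simulate the failure by marking each chosen instance unhealthy."""
    for target in targets:
        target.healthy = False


def healthy_count(instances, service):
    """Count the healthy instances of a service."""
    return sum(1 for i in instances if i.service == service and i.healthy)


def steady_state_ok(instances, service, min_healthy):
    """Steady-state hypothesis: enough healthy instances remain to serve traffic."""
    return healthy_count(instances, service) >= min_healthy
```

A game day in this toy model would build a fleet, call `pick_targets` with a small fraction such as 0.3, inject the failure, and then assert `steady_state_ok`. Other services are never touched because targets are filtered by service name.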
2011 was a pivotal year for Netflix: the now hugely successful company was then in the midst of a formidable transformation, changing from a mail-based DVD rental service to the modern streaming service that it is today. It was at this crucial point in the company's history that Jason Chan, our guest in this episode, was hired by Netflix to lay the foundations for its cloud security protocols. Nate Nelson, our Sr. Producer, spoke with Jason about the decade he spent at the company, what he learned during his tenure there, and the ideas that took shape at that time, such as Chaos Engineering.
Nate Nelson, our Sr. Producer, spoke with Dr. Cohen about his early research into computer viruses, his work with the US Army, the panicky response from the US government, and the parallels between computer viruses and mental viruses, i.e. memes.
In today's episode of Category Visionaries, we speak with Casey Rosenthal, CEO of Verica, about his career in digital security, making a name for himself with the concept of chaos engineering, and how he eventually took those ideas mainstream with Verica, a continuous verification platform which helps people map the limitations of their existing software solutions and make contingencies for when things go wrong. An increasingly popular concept, ‘chaos engineering' is essentially a toolkit to experiment with a system's critical capacity, but a great tool doesn't always make a perfect product, and Casey explains a little bit about how commercial viability can be an engineering challenge all of its own, why continuous verification is the final piece of the DevOps revolution, and why the biggest challenge for Verica is aligning disparate development cycles.
Topics Discussed:
Chaos engineering: what does it mean? Why is it so important and what impact can it have on an enterprise tech setup?
Why more and more companies are seeking to add chaos engineering to their approach, and why it's already popular in fintech
Why diversity is key for modern tech companies, driving better performance and ultimately more success
Continuous verification and the DevOps revolution: where does this new category stand in the story?
How harmonizing schedules between different departments can be a real challenge for a company looking to expand
Favorite book: What Works: Gender Equality by Design
This episode is sponsored by Fiberplane, your platform for collaborative debugging notebooks!
Episode Resources:
Try Fiberplane here
Fiberplane website
Fiberplane Docs
NP-hard Ventures
About Micha Hernandez van Leuffen
Micha Hernandez van Leuffen is the founder and CEO of Fiberplane. He previously founded Wercker, a container-native CI/CD platform that was acquired by Oracle. Micha has dedicated his career to improving the workflows of developers.
Read the whole episode (Transcript)
[If you want, you can help make the transcript better, and improve the podcast's accessibility via GitHub. I'm happy to lend a hand to help you get started with pull requests, and open source work.]
[00:00:00] Michaela: Hello and welcome to the Software Engineering Unlocked Podcast. I'm your host, Dr. McKayla, and today I have the pleasure to talk to Micha Hernandez van Leuffen. He is the founder and CEO of Fiberplane. He previously was the founder of Wercker, a container-native CI/CD platform that was acquired by Oracle. Micha has dedicated his career to improving the workflow of developers, so he and I have a lot to talk about today. I'm really, really happy that he's here today and he's also sponsoring today's episode. Welcome to the show. I'm happy that you're here, Micha.
[00:00:36] Micha: Thank you for having me. Excited to be on the show.
[00:00:38] Michaela: Yeah, I'm really, really excited. So, Micha, I wanted to start really from the beginning. So you are the CEO of Fiberplane and you are the founder of Wercker, which you already sold. So, can you tell me a little bit about how you actually started this entrepreneurial journey of yours and what brought you to the developer experience area?
[00:01:03] Micha: Yeah, sure thing. So I have a background in computer science and, so, I'm originally from Amsterdam, but I did my thesis at USF. And the topic was autonomous resource provisioning using software containers. 
This was all before Docker was a thing, you know, the container format that we now know and love. And I sort of got excited by that field of software containers and decided to start a company around it. That company was Wercker, a container-native CI/CD platform.
So we helped developers build, test, and deploy their applications to the cloud. We went, I would say, through various iterations of the platform. You know, we started off with LXC as a container format and then eventually ended up, you know, having to replatform on Docker and Kubernetes. But, you know, it was quite a journey. So that company eventually got acquired by Oracle to bolster their cloud native strategy. And then, you know, I spent a couple of years in the Bay Area as a VP of software development focusing on their cloud native efforts.
I tried to do a little bit of open source there as well, and then, you know, moved back to Europe. And so I sort of started thinking about what's next. Did some angel investing. We're still doing some angel investing as well, actually, in the sort of same arena: developer tools, infrastructure, building blocks for tomorrow.
So I run a small pre-seed/seed fund with two other friends of mine. But then I also started, you know, thinking about what to build next. And you know, we can get into that, but sort of from our experience at running Wercker, this sort of large distributed system, Fiberplane was born.
[00:02:26] Michaela: Cool. Yeah. And so how was the acquisition for you? From the timeline, you said you were studying at the university, but then did you, right out of university, you know, start Wercker, or maybe already while you were studying?
[00:02:40] Micha: Yeah, more or less just out of university. So it was around 2012, 2013. And then, you know, we expanded the team. Of course we got an office in San Francisco and London. And then in 2017 we got acquired by Oracle. 
[00:02:56] Michaela: Oh, very cool. Wow. Cool. So you were the founder of that and also probably CTO, CEO. At the beginning, were you a one-person shop, or was it: I have this idea, I get some funding, and I already, you know, have a team when I'm starting out? Or was it more the bootstrapped way? How was that?
[00:03:16] Micha: Yeah, yeah. With both, Fiberplane and Wercker, we got some funding early on. Then eventually we got a CTO. For Wercker it was one of the co-founders of OpenStack. So also, you know, very early in that sort of container and cloud infrastructure journey. And then for Fiberplane, yeah, there's no CTO. I'm both CEO and CTO, I guess, at the same time.
[00:03:38] Michaela: Yeah. Cool, cool. Can you tell me a little bit about Fiberplane? What is Fiberplane? You know, what does it have to do with containers and with developer experience? What kind of a product is it?
[00:03:51] Micha: Yeah, sure thing. So I guess coming back to the Wercker days, right? So we were, you know, running this distributed system, CI/CD, so we were also running users' arbitrary code. You know, any sort of job could happen on the platform on top of Kubernetes, inside of containers. So one of the things that, you know, stuck with me was that it was always very hard to debug the system, like figure out what's really going on when we had some kind of issue.
You know, we were going back and forth between metrics, logs, traces, trying to figure out what is the root cause of an issue. So that was sort of one thing. So we were thinking a lot about, you know, surely there must be a better way to help you on this journey. 
The other thing that I started thinking about a lot was to sort of just challenge the assumption of the dashboard. So if you think about it, a lot of the monitoring and observability tools are modeled after the dashboard, a sort of cockpit-like view of your infrastructure. But I'd say that those are great for the known knowns. So a dashboard is great: you set it up in advance, you know exactly what's gonna go wrong. These are the things to monitor. These are the things, you know, to keep tabs on. But then reality hits and, you know, the thing that you're looking at on the dashboard is not necessarily the thing that's going wrong. Right? So I started thinking a lot about, you know, what is a better form factor to support that sort of more investigative, explorative debugging of your infrastructure.
And that's not to say that dashboards don't have their place, right? It's still that sort of cockpit view of your infrastructure. I think that's a good thing to have. But for debugging, you might want a more explorative form factor that also gives you actionable intelligence. I think the other thing that you see a lot with dashboards is that everybody's monitoring everything, and now you get a lot of signal and a lot of inputs, but not necessarily the actionable intelligence to figure out what's going on.
So that's sort of the other piece. And then the third thing that stuck with me, I would say, is collaboration. We've come to enjoy tools like Notion, you know, Google Docs obviously. You know, in the design space we've got Figma, where collaboration is built in from the get-go, and I found it kind of odd how in developer tools, and then specifically DevOps, collaboration is not really built in. Right. If you think about it, you know, the status quo of you and I debugging an issue is we get on, you know, we get on a call. 
You share your screen, you open some dashboard, and we start talking over it or something. Right. And so, you know, I guess COVID accelerated this thinking a bit, with everybody going remote: you know, how can you make that experience more collaborative?
[00:06:22] Michaela: Mm-hmm. So it's in the incident space, it's in the monitoring space, and you want to bring more collaboration. So how does it work? What's your solution now?
[00:06:32] Micha: Yeah, yeah, exactly. Now I've explained sort of the inception. But what is it, right? So it's a notebook form factor. So very much inspired by data science, right? Like Jupyter. Yeah, Jupyter Notebooks. Think of that form factor. We don't use Jupyter or anything like that. We've written everything from scratch. But it's a notebook form factor, you know, with collaboration built in. So you can @-mention people like you would on Slack. You can leave, you know, comments or discussions and all that.
But where it gets interesting: we've got these things called providers, which are effectively plugins. So they're WebAssembly bundles, which we can dive into as well. But they're providers that connect to your infrastructure, right? So we have, for instance, a provider for Elasticsearch for your logs. We have a provider for Prometheus for your metrics. And it allows you to connect to these observability systems and kind of pull them together into one form factor, the notebook, and then, you know, start collaborating around that. So, you know, imagine if Notion and Datadog would have a baby. That's kind of what you get.
[00:07:41] Michaela: That's cool. So I can imagine that. Let's say I'm on call and hopefully I'm not alone on call. You are also on call, right? 
Yeah, and so we would open a Fiberplane notebook.
[00:07:52] Micha: Hopefully we're in the same time zone and we don't need to, like, wake up in the middle of the night. Yeah.
[00:07:57] Michaela: Hopefully, yes. And then we want to understand how the system is behaving. And so we are pulling in observers. These are data sources, more or less, right? And then we can do some transformation with those data sources?
[00:08:12] Micha: Yeah, exactly. That might be the case. The other thing that we integrate with is, for instance, PagerDuty. So an alert goes off, indeed we are on call, and we have this PagerDuty integration. And subsequently a notebook is created for us already. Maybe even with, you know, some charts and logs that are already related to the service that might be down.
So depending on the alert, obviously, as you know, you're as good as how you've instrumented your alerts. But say we've written some good alerts, we now have a notebook ready to go, based off a template. So that's another thing that we have as well, which is this template mechanism. And now, you know, we're ready to get into things and start debugging. So we might have a checklist: you know, you look at the metrics, I'll look at the logs, sort of this action plan. We pull in that data, we start a discussion around it. Hopefully we come to the, you know, the root cause of our issue.
[00:09:11] Michaela: Okay. And so this discussion and this pulling in data, this happens all in the notebook. Can you explain a little bit more, to me and also our listeners, exactly, when we are on this call now, having a Fiberplane notebook in front of us, what do we see, right? How does the tool look?
[00:09:28] Micha: It's very similar to, I would say, like a Notion page or a Google Doc page. 
So we've got, like, different headings. So you might have a title for a notebook, right? You know, "the billing API is down". The other thing that we have is sort of this time range. So usually when there's an issue, you know, we've seen this behavior over the last three hours, so we can sort of have that time range locked into place. So we only want to see our data for the last three hours. And that means that any chart that we plot or any log that we pull in will adhere to that global time frame. So that's what we see. We have support for labels, so, you know, we're obviously big fans of Kubernetes and Prometheus. So, you know, labels are a first-class primitive on the platform. So you're able to sort of populate the notebook with the labels that might be related to our service. Right? So it's us-east-1, which is our region. It might, you know, say service is billing. It might be, you know, environment is production. And the status of our incident is ongoing, stuff like that. So we've got... go ahead.
[00:10:34] Michaela: Cool. Yeah. And so is it then that from top to bottom we are writing and we are investigating, and we are writing down the questions that we have and the investigation?
[00:10:44] Micha: Yeah, exactly. We do.
[00:10:45] Michaela: Is it... yeah.
[00:10:48] Micha: Yeah, it's sort of exactly that. And I think in the most ideal scenario, right, you're kind of, like, writing your postmortem as you go along.
[00:10:59] Michaela: I was thinking exactly that. Right. And then maybe next time I'm on call again and I get paged by PagerDuty and something is down, it's again billing. Can I search in the Fiberplane notebooks to find, you know, what we did last time and then jump to the conclusion?
[00:11:13] Micha: Exactly. So you'll search. Yeah. 
Yeah, exactly. Hopefully, if you experience the same issue multiple times, at some point we'll do a little commit on GitHub and fix it. But yeah, indeed, you can search the notebooks and see if you've run into similar issues. So it's great for building up this system of record, right? This knowledge base of infrastructure issues and incidents. And it's also great for onboarding, right? If a new person joins: this is our process, these are some examples that we've run into, have a look. And now you've got a sense of how we handle things and some of the issues that we've investigated. One more thing on the product. So I explained the notebook form factor. We've got these providers, right, that pull in data from different data sources like Elasticsearch or Prometheus. The other thing that we have is a command line interface, which is called fp. And apart from being able to create notebooks from your terminal, and even invite people from the terminal, all the usual stuff that you would expect from interacting with an API for a product like this, there are two other things that we do. One is a command called fp run. It allows you, if you are typing a command like kubectl logs for a specific pod, to pipe the output of that command to a notebook. And why that is useful is, of course, when you and I are debugging this issue and you start typing things in your terminal, I have no idea what you just did. And this is a way to capture that. So you're piping the output of the stuff that you're typing in your terminal into the notebook.
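Editor's sketch: the idea described here, running a command and capturing its output into a shared notebook so collaborators can see what was done, could look roughly like the following. This is not Fiberplane's fp CLI or API; the `Notebook` and `Cell` classes are hypothetical in-memory stand-ins, and the Python interpreter is used in place of `kubectl logs` so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Cell:
    """One captured command and its output (hypothetical structure)."""
    command: str
    output: str
    captured_at: datetime


@dataclass
class Notebook:
    """In-memory stand-in for a shared notebook; not Fiberplane's data model."""
    title: str
    cells: list = field(default_factory=list)

    def run(self, argv):
        """Run a command and append its combined stdout/stderr as a new cell."""
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error output alongside normal output
            text=True,
            check=False,  # a failing command is still worth recording
        )
        cell = Cell(" ".join(argv), result.stdout, datetime.now(timezone.utc))
        self.cells.append(cell)
        return cell


notebook = Notebook("Billing API incident")
# Stand-in for something like `kubectl logs <pod>`.
notebook.run([sys.executable, "-c", "print('GET /invoices 500')"])

for cell in notebook.cells:
    print(f"$ {cell.command}\n{cell.output}", end="")
```

The point of the design is that the record is a side effect of doing the work: nobody has to paste terminal output into chat afterwards.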
And the cool thing is, on your laptop you just see text, right? Monospace output. But for certain outputs, such as the kubectl logs command, we actually know the structure of the data, and we're capable of formatting that in the notebook in a structured manner, so that you can start filtering on the logs, select certain columns, and even highlight certain log lines for posterity, so that you can say, hey, these are the culprits, these are the things that you need to take into consideration next. So we have this command line interface companion. And the other thing that it does: it's the same use case as I just mentioned, but as a long-running recording. You actually record your entire shell session as you're debugging this thing, and all the output gets piped into the notebook.
[00:13:46] Michaela: Cool. And so I have two questions about Fiberplane. One: is the software engineer the right person to interact with Fiberplane, or is it the site reliability engineer that the tool is really designed for?
[00:14:03] Micha: Yeah, that's an excellent question. So I think site reliability engineers you kind of see in larger organizations, right, where you start splitting up your teams. I will say, at the end of the day, if you're an engineer and you've built the service, now you need to maintain it, now you need to operate it. It's your baby, right? You probably know better than anybody else how that system behaves. So indeed I would say that the target group is developers.
[00:14:40] Michaela: And so the other question that I had around Fiberplane is: when we are on this call and we are writing in this notebook, what does the whole scenario look like?
Are we still on a call, like, do we have Zoom or Google Meet open, or are we really just writing in the Fiberplane document? Or are we sitting next to each other? Is there a traditional scenario, or is this all possible with Fiberplane? How would you recommend using it?
[00:15:09] Micha: Yeah, no, that's a great question. I think back in the day it would be that we maybe sit in the same office and I scoot over and we start looking at a screen, right? And start typing together. The reality is, of course, we're all doing remote work now, and we might not be in the same room. So I do think people will still use a Zoom call or a Google Meet as a companion to talk things over. I think people will still communicate in Slack and start chatting back and forth. But what we hope to do away with with Fiberplane is the pasting of screenshots, right? You take a screenshot of some chart in your dashboard and you put it in Slack, and somebody yells, oh, that's not the thing that you should be looking at. All that sort of Slack glue, it's our goal to do away with that.
[00:15:59] Michaela: Yeah. And the Slack glue is also very problematic for search. At least I'm never able to find it again. It's like it's in the dark, in the dark area.
[00:16:07] Micha: Yeah. Super ephemeral. You can't go back in time easily and see, how did we solve this last time? So again, building up that system of record, I think.
[00:16:17] Michaela: Yeah. Very cool. And so how long have you been working on Fiberplane now?
[00:16:23] Micha: So we've been working on it for about two years now, which is a long time.
I think one of the things that we've discovered along the way is that we're kind of building two startups at the same time, right? We're doing a Notion-like, collaborative rich text editing experience, which is kind of a startup on its own. And we're building this infrastructure product. So it's taken quite some time and energy to get the product to where it is now.
[00:16:54] Michaela: Yeah. And do you already have users? Can people that listen today hop on Fiberplane already, or...
[00:17:02] Micha: It's been in private beta, but I think by the time this gets aired it will be in public beta, and people can sign up and take it for a spin. And we would love to get feedback on our roadmap, right? People can suggest what other types of providers we need to support and what other types of integrations. We would love to have that conversation.
[00:17:23] Michaela: Cool. Yeah. So there is the provider side. Is there something else that you want feedback on, that you are exploring, maybe?
[00:17:30] Micha: Yeah. So we've got the providers, that's one thing. We've got our templating stack, so I'm curious to see how people start codifying their knowledge, right? What kind of processes people have to debug their infrastructure, run their incidents, or write their postmortems. So I'm curious to see what people come up with there. Other types of integrations, right? We have, as I said, PagerDuty. What other types of alert-to-notebook integrations, or other types of external systems, do we need to plug in with? I would love to get some feedback on that as well.
[00:18:04] Michaela: Yeah, I think I had Paige Bailey on the podcast. She's from GitHub, and they were releasing something with Copilot and, you know, for data scientists, some spaces here. And she also said, well, we really need input from the users, right? So try it out, tell us how it's working. I think it's so valuable, right? You have your vision, and obviously it's going one way, but then sometimes your users take your product and use it in a very different way than you anticipated, which can be very informative. You have done two startups already. Have you seen that? And how do you react to it? Do you instrument the product a little bit? How do you realize that people are using your product in a different way?
[00:18:50] Micha: Yeah. So obviously we have metrics and analytics on usage patterns of the product, and I think that data is excellent, right? But qualitative data, especially at this stage, is probably even better, where you can get somebody on a call and say: tell us about your use case, tell us about the problem that you're trying to solve here, how can we be helpful, what types of integrations should we support? I think the difference between Wercker and Fiberplane is that Wercker was a pretty confined piece of surface area, right? CI/CD: the whole goal is that you either have a green check mark next to your build or a red check mark next to your build. It either failed or passed, and we need to do that fast for you, get that result quick. And with Fiberplane, it's a more...
I think the interesting thing here is that it's a more explorative and rich design space, right? It's this notebook, in which you can already start typing text and adding images and headings and checklists and whatnot. It's a very open form factor and design space. And then, of course, with the integrations it can be even richer. So, to your point, I'm very curious what direction people will pull the product into, because you can take it into all sorts of use cases and scenarios.
[00:20:05] Michaela: Yeah, exactly. And I think as a founder, or as the design team or product team, it's also a little bit of a balancing act, right? How far are we going with what the users are doing with our product, and where are we setting some boundaries so that they can't do everything? There's also often talk about opinionated products, right? You can do one thing and one thing only, and we have an opinion on how you're supposed to use our product, and if we see people deviate from that, we try to put an end to it. And then there's the other way, where you say: well, if you take Fiberplane and you do X with it and we haven't thought about this, maybe we are okay with it. Or maybe we even support that path, right?
[00:20:48] Micha: Yeah, I think we're indeed more on the latter side, right? We talk about this a lot internally: everything is a building block. We've got the notebook, you've got these different cell types, you've got providers, you've got templates. You've got the command line interface. So for us everything is a building block, and we actually want to retain that flexibility and not be too prescriptive.
Because if you think about incident debugging or investigating your infrastructure, you might have a certain process, I might have a completely different process, and we need to be able to facilitate these different workflows. So thus far our thinking around the product has been that everything is a building block, and it should be this flexible form factor that people can pull into different scenarios and uses.
[00:21:36] Michaela: I mean, we have infrastructure as code, right? And we have security as code. Maybe we have debugging as code. Maybe this is what's coming next. Can you envision it going in this direction? Because while we have building blocks, maybe right now it's not a programming language for debugging, but it could go a little bit in that direction, right? No-code coding for debugging.
[00:22:02] Micha: Yeah, we've actually had some of that discussion internally as well. If you think about the templates, right, to some extent that is... you know, we use Jsonnet as the language, but we've codified them in a certain way, and you could argue that the templates are a sort of programming language for at least that debugging process, right? And you can take that even further and make it kind of statically typed, make it adhere to certain rules, and maybe even have control flow. So I think there's a piece there. And then obviously we have some YAML configuration for how you set up your providers, right? Like how to connect to your infrastructure. So there's some observability as code in that realm. Yeah, I think that'll be an interesting part of the journey, right?
Like to figure out, can we...
[00:22:55] Michaela: Yeah, some parts should be, well, don't repeat yourself, right? For example, pulling in these providers and configuring them so that I get the right data: this would actually be something that I'm pulling in again and again. And probably that's what your templates do, right? So you say billing, and then check, check, check, check, check, I have all my signals here and they're configured in a way that's useful. And then this investigation is, hopefully, a one-of-a-kind thing, right? So I'm investigating, and as we talked about before, once I realize what's going on, hopefully in my postmortem I'm going to make sure that this is not happening again. So this code probably is not going to be reused that often. Maybe some ideas from it, but hopefully we won't reproduce the exact same thing again.
[00:23:44] Micha: Yeah, that's a super great point. And coming back to the early part of the conversation around dashboards, right? I think thus far what we've experienced as engineers ourselves is a phase around information gathering. All these dashboards are great for information gathering, but now with Kubernetes and containers and microservices, the number of services that we're running and the complexity have increased. So I think there's an opportunity for exactly what you're describing. It's more about action, right? What are we doing? We want to have the information, we want actionable intelligence that informs us what to do.
[00:24:20] Michaela: Yeah, exactly. Because now I'm looking at this dashboard and I'm seeing the signals, but then everything else is outside of this realm, right? So what actions do I take? Do I go to the console? Do I restart that service?
Or whatever I'm doing. And it's also vanishing, right? So I'm doing it, but then who can see what I did? And so now we are capturing this, which is very nice, and then we can learn from it in postmortems as well. So I looked a little bit through your blog and your Twitter, and you were also talking about blameless postmortems. So how do you think about psychological safety? How should people in an organization look at on-call and incident management to really make sure that we are ending the blame game? You probably have some thoughts about that as well, because you're working in this area.
[00:25:19] Micha: Yeah, I think it's important to not put any blame on any person, right? And I guess that's also why we're building this product. It is a collaborative process to debug an issue or resolve an incident. What you want to achieve is to put the entire team in the best possible position to solve the issue at hand, with a support structure around it. So, coming back to the product, that means being able to open discussions and point people in the right direction.
[00:25:52] Michaela: So maybe also, if it's easier to find a problem and root-cause it, incidents become no issue, or at least a lesser issue, and so maybe the blame game is not that important. Can we say it that way?
[00:26:08] Micha: I think so, yeah. If the process becomes repeatable, and we codify that, and we collaborate on it, and we build up, again, that system of record and knowledge base, I think that puts us in a safer position to solve the next one.
[00:26:25] Michaela: That's true. Yeah.
Another thing that I was thinking of when I looked through Fiberplane and what it does is chaos engineering. Chaos engineering is where you try to prevent not only the knowns but also the unknowns, right? So you really think about what could go wrong and then make a fallback so that your system is reliable. Or, if this database goes down, that not the whole system goes down but only a part of it, and so on. Do you think that chaos engineers can actually source or use those notebooks that you're creating as input for knowing what we should actually look at?
[00:27:02] Micha: Well, one thing: I think it'd be a great provider, integrating with one or many of the chaos engineering services out there. I think it's a great way to train your team, right? We plug in some chaos engineering provider, the provider communicates with your infrastructure and starts pulling out wires from your system, and then: now go ahead and start debugging this issue, and use different templates. You can trial all sorts of different issues. I think it'd be super fun.
[00:27:37] Michaela: Yeah. So Micha, one thing that I also saw is that some of Fiberplane is open source. So what's your vision for open sourcing? Are some parts open source? Can people help with building Fiberplane?
[00:27:51] Micha: Yeah, great question. So right now what we've open sourced is a project called fp-bindgen. This is actually an SDK bindings generator for how you would create full-stack WebAssembly plugins. This is what we use to build our own Elasticsearch and Prometheus plugins. So we've open sourced that. It's on GitHub, and we've already got quite some feedback on it, but would love some more.
And then going forward we'll be open sourcing our templating stack and the proxy, which you install inside your cluster and which sets up the secure connections between the providers and your infrastructure and then the Fiberplane managed service. And then the command line interface that I mentioned will also be open source. So expect to hear more from us on the open source front.
[00:28:36] Michaela: Yeah, cool. I think that's so important, especially for developer tooling, that people can really get it into their hands and then help shape or make the best product for the environments that they have. I think this is such a success strategy.
[00:28:50] Micha: Yeah, exactly. And as I said, we would love to get feedback on the providers and the plugin model, but maybe even, once we open source the provider stack, it would be great if people come up with crazy ideas, right? You can think of any type of provider that could surface data inside of the notebook. It doesn't need to be observability or monitoring data.
[00:29:14] Michaela: Cool. Yeah, I'm super excited about what will come out of that. So I want to come back a little bit to your founding story, because I know a lot of people are interested in developer tools and in startup founding as well. And you did it twice already, right? And maybe several more times in your life, I don't know, but right now we know of two instances. And also for Fiberplane you already got funding, right? Several million dollars. And so how do you do it out of Europe is also one of the questions that I have, because I think it's a little bit of a different game here in Europe than it is in Silicon Valley. It doesn't look like there are opportunities around every corner.
I studied in the Netherlands, so I know that the Netherlands is actually a really good place for tech startups, I think, and also a little bit out of the universities I saw there that you get a little bit of help and funding and things like this. But still, I would assume it's harder than in Silicon Valley. So how did you make it work? How did you get funding? You also said that Wercker had some funding at the beginning.
[00:30:26] Micha: Yeah, it's a good question. Well, how did we do it the second time around? To be honest, because it's the second time, it was a bit easier. I mean, it's obviously never easy, but it was definitely easier. I do think in Europe, if I compare the Wercker days to where we are now, the funding climate and the thinking around startups has improved a lot, right? There's more funding out there, there are more VCs. I think more importantly, though, what we've seen is that now the European unicorns have exited or gone IPO, and we actually have more operators inside of Europe that have experience in founding a startup, are able to start doing angel investing, or have worked at multiple startups. We just have more operating experience, versus, honestly, bankers, right, that help you out or are investing in you. So the funds that funded us were Crane Venture Partners, which is a seed fund out of London that's actually focused on developer tools and infrastructure. So I would highly recommend talking to them if you're thinking about building a developer tool company and you need some funding. Of course my own fund is also focused on developer tools, so shameless plug there for NP-Hard Ventures. You can just Google that and find me.
And then we have Northzone, which is a multi-stage fund, also out of, well, actually quite different geographies, and Notion Capital out of London as well. We also have several micro VCs. We have somebody from the West Coast: Alana Anderson with Base Case Capital, who is investing in a lot of infrastructure and enterprise startups. And Max Claussen from System.One in Berlin is another one. So yeah, we have a good crew with different experience and different stages of funding as well.
[00:32:19] Michaela: Yeah. This was my next question for you. It's probably not only about the money. You said experience, right? It's also about the knowledge that people have of how to do things, and probably the people that they know, right? So that they can connect you to the right people, have the right network, and so on.
[00:32:36] Micha: Yeah, I think it is introductions, but it's also... if you think about the funds that actually do developer tools, right, in their portfolio they've seen startups trying over and over to tackle some kind of go-to-market issue, or trying to build an open source company, right? So they have some pattern matching and some knowledge about what to do and what not to do. Of course, it's all advice, but it's good to have some people in your corner that have at least seen these types of companies being built over and over again. And then other VCs have more experience in how to build up or scale up a sales organization and thinking about how to run a SaaS company. So yeah, different experience from different funds.
[00:33:20] Michaela: And so now you listed quite a lot of different investors.
Do you reach out to each one of them, or do you have a whole group meeting where they're all in there and you ask them for advice? How does it work?
[00:33:33] Micha: Yeah, no, it's one-on-one chats, right? Either over chat, or we meet up for coffee or breakfast. But yeah, we try to do that on a regular cadence. And then of course, when something exciting happens, such as our launch, we try to group them together and get them all on the same page around the same time. Or of course if an issue arises, right, which could also be the case. Then it's all hands on deck and everybody in the same room or Zoom.
[00:34:01] Michaela: And what about your biggest struggle on your entrepreneurial journey, maybe now with Fiberplane or maybe with Wercker? When you started Wercker, did you think that somebody is going to buy this and this is going to be huge?
[00:34:16] Micha: Yeah, I think the ambition was always there, and that drive to just make better developer tools. I think that's been true for both companies.
[00:34:30] Michaela: Yeah. And what...
[00:34:32] Micha: The struggle, yeah. So I think for Fiberplane now, it's not necessarily a struggle, it's just the reality that, with this mission of a flexible form factor, we're doing two startups at the same time, and that has been an interesting thing to build, right? You're doing this collaborative rich text editor and trying to build this infrastructure-oriented company, and I think that's been just an interesting experience, with building out a team, the technology, and the product that we...
[00:35:01] Michaela: Yeah. Yeah.
So maybe can you tell me a little bit more, again, if people want to hop over to Fiberplane now and try it out, how does that work? Is there a sign-up? Is there a waiting list? I mean, you said probably when this airs there is a public beta, but still, do I have to reach out to you and you give me a demo, or do I just fill in my credentials and I'm good to go?
[00:35:25] Micha: You can just sign up with Google and then you're off to the races. And then of course, if you want a demo and some more help or onboarding, we're happy to get on a call and walk you through it. But yeah, fiberplane.com.
[00:35:40] Michaela: Okay, cool. Is there also a video or something that we can look at?
[00:35:44] Micha: Yes, on the website there's a video.
[00:35:50] Michaela: Okay. I will link that so that people can go there and it will explain everything to them. What about pricing? Do you already have some idea around pricing?
[00:36:01] Micha: Yeah. We've got some ideas on how to charge, but I think right now for us it's important to get to product-market fit and, as such, get the feedback from the companies and teams using the product. So we'll introduce pricing at a later stage. For now it's free to use, and you just give us your time and your feedback, and we're grateful.
[00:36:20] Michaela: Yeah. And what about my data? Is it safe with you? Do you have some visibility into my data, or do I send it over to you?
[00:36:29] Micha: Yeah, so the way the providers, the plugins, work is that they actually get activated through a proxy. So we install a proxy inside of your cluster.
The proxy sets up a secure bidirectional tunnel from your infrastructure to the Fiberplane managed service. And then, for that specific query, we do store the data that's related to that query. So as a result, we do store that in the notebook. And yeah, we probably will come up with more enterprisey ideas around how to self-host it...
[00:36:59] Michaela: Right, or something.
[00:37:01] Micha: As an example, yeah. But again, we'd love to get some feedback on that.
[00:37:07] Michaela: How that works, right? Yeah. Okay, cool. So yeah, that sounds really good. I think you could answer all of my questions, at least, but maybe my listeners have questions, and then they can send them to you. I think you will be quite happy, right?
[00:37:22] Micha: A hundred percent. @mies on Twitter, m-i-e-s, and @fiberplane on Twitter, fiberplane.com. Sign up, take it for a spin, shoot us a message.
[00:37:33] Michaela: Yeah, it sounds super interesting. I hope that a lot of my listeners will do that, and I will link everything that we talked about in my show notes, your Twitter handle and everything, so that people can reach you. And I hope you get a lot of questions, and people give it a spin and give it a try and send you their use cases. And yeah, I wish you all the best with your product. Thank you so much for being on my show today, Micha. Thank you. Bye.
[00:37:59] Micha: Thank you for having me.
[00:38:01] Michaela: Yeah, it was really great. Bye bye.
[00:38:04] Micha: Bye.
[00:38:06] Michaela: This was another episode of the Software Engineering Unlocked podcast. If you enjoyed the episode, please help me spread the word about the podcast. Send the episode to a friend via email, Twitter, LinkedIn, well, whatever messaging system you use, or give it a positive review on your favorite podcasting platform, such as Spotify or iTunes. This would really mean a lot to me. So thank you for listening.
Don't forget to subscribe and I will talk to you in two weeks.
Benjamin Wilms (@MrBWilms, co-founder/CEO of @Steadybit) talks about the importance of resilience for SREs, DevOps, and developers through chaos engineering platforms.

SHOW: 661

CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw

CHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"

SHOW SPONSORS:
Datadog Synthetic Monitoring: Frontend and Backend Modern Monitoring
Ensure frontend issues don't impair user experience by detecting user-facing issues with API and browser tests with a free 14 day Datadog trial. Listeners of The Cloudcast will also receive a free Datadog T-shirt.
Granulate, an Intel company - Autonomous, continuous, workload optimization
gProfiler from Granulate - Production profiling, made easy
CDN77 - Content Delivery Network Optimized for Video
85% of users stop watching a video because of stalling and rebuffering. Rely on CDN77 to deliver a seamless online experience to your audience. Ask for a free trial with no duration or traffic limits.

SHOW NOTES:
Steadybit (homepage)
Steadybit wants developers involved in Chaos engineering before production (TechCrunch)

Topic 1 - Benjamin, give everyone a quick introduction.

Topic 2 - Let's start with the concept of chaos engineering. In its simplest form, chaos engineering intentionally takes down parts of a test or production environment (typically after software has shipped) randomly so teams, typically SREs/ops/dev, are forced to make the applications more resilient over time. It's not a matter of if systems will go down, it's a matter of when. This makes the systems better over time. Benjamin, you have a consulting background in this area that ultimately led to founding Steadybit. What were the limitations to this approach?

Topic 3 - What you're talking about is a more proactive approach to downtime. I'll call this resilience engineering and it requires a shift in mindset in an organization. How do you get developers onboard to embrace the need? Are we asking developers to share responsibility for outages with the SRE organization?

Topic 4 - On the surface, the obvious benefit is reduced downtime. That can be hard to quantify in business value. Outages can be measured, a lack of outages is harder to quantify. Does this become an issue in convincing an organization to embrace this methodology?

Topic 5 - When you say we are going to move chaos engineering into the CI/CD pipeline, what does that mean? Is this code that is added? Testing simulations that have to be passed? Real time failures of databases or nodes or simulated? What are the common use cases?

FEEDBACK?
Email: show at the cloudcast dot net
Twitter: @thecloudcastnet
In Dexter's Lab, does Dexter allow his sister DeeDee to wreck his lab so that he can become a better scientist due to chaos engineering? Want a FTQ shirt, mug, phone case, or many other products? Shop our merch store! https://www.fantheoryqueries.com/merch/ For exclusive bonus content, FTQ Discord theory discussion channel, and the ability to view live recording sessions, support us on Patreon: https://www.patreon.com/fantheoryqs Advertisers: please contact sales@advertisecast.com if you would like to advertise on Fan Theory Queries. Fan Theory Queries is part of the Airwave Media podcast network. Visit AirwaveMedia.com to listen and subscribe to their other fine shows.
Lithuania sustains a major DDoS attack. Lessons from NotPetya. Conti's brand appears to have gone into hiding. Online extortion now tends to skip the ransomware proper. Josh Ray from Accenture on how social engineering is evolving for underground threat actors. Rick Howard looks at Chaos Engineering. US financial institutions conduct a coordinated cybersecurity exercise.
For links to all of today's stories check out our CyberWire daily news briefing: https://thecyberwire.com/newsletters/daily-briefing/11/122
Selected reading.
Russia's Killnet hacker group says it attacked Lithuania (Reuters)
The hacker group KillNet has published an ultimatum to the Lithuanian authorities (TDPel Media)
5 years after NotPetya: Lessons learned (CSO Online)
The cyber security impact of Operation Russia by Anonymous (ComputerWeekly)
Conti ransomware finally shuts down data leak, negotiation sites (BleepingComputer)
The Conti Enterprise: ransomware gang that published data belonging to 850 companies (Group-IB)
Fake copyright infringement emails install LockBit ransomware (BleepingComputer)
NCC Group Monthly Threat Pulse – May 2022 (NCC Group)
We're now truly in the era of ransomware as pure extortion without the encryption (Register)
Wall Street Banks Quietly Test Cyber Defenses at Treasury's Direction (Bloomberg)