On this Screaming in the Cloud Replay, Corey is joined by John Allspaw, Founder/Principal at Adaptive Capacity Labs. John was foundational in the DevOps movement, but he's continued to bring much more to the table. He's written multiple books and seems to always be at the forefront, which is why he is now at Adaptive Capacity Labs. John tells us what exactly Adaptive Capacity Labs does, how it works, and how he convinced some of his heroes to get behind it. John brings much-needed insight into how to get multiple people in an organization, engineers and non-engineers alike, on the same level when it comes to dealing with incidents. John points out the issues surrounding public vs. private write-ups and the roadblocks they may put up. Adaptive Capacity Labs is working towards bringing those roadblocks down. Tune in for how!

Show Highlights
(0:00) Introduction
(0:59) The Duckbill Group sponsor read
(1:33) What is Adaptive Capacity Labs and the work that they do?
(3:00) How to effectively learn from incidents
(7:33) What is the root of confusion in incident analysis
(13:20) Identifying if an organization has truly learned from their incidents
(18:23) Gitpod sponsor read
(19:35) Adaptive Capacity Labs' reputation for positively shifting company culture
(24:22) What the tech industry is missing when it comes to learning effectively from incidents
(28:44) Where you can find more from John and Adaptive Capacity Labs

About John Allspaw
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John's publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation” helped start the DevOps movement. John served as CTO at Etsy, and holds an MSc in Human Factors and Systems Safety from Lund University.

Links
The Art of Capacity Planning: https://www.amazon.com/Art-Capacity-Planning-Scaling-Resources/dp/1491939206/
Web Operations: https://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/
The DevOps Handbook: https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations/dp/1942788002/
Adaptive Capacity Labs: https://www.adaptivecapacitylabs.com
John Allspaw Twitter: https://twitter.com/allspaw
Richard Cook Twitter: https://twitter.com/ri_cook
Dave Woods Twitter: https://twitter.com/ddwoods2

Original Episode
https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/finding-a-common-language-for-incidents-with-john-allspaw/

Sponsors
The Duckbill Group: duckbillgroup.com
Gitpod: http://www.gitpod.io/
In this episode, Nick chats with Ryan Houska, Director of Web Operations at Cisco and a CSAP Alum. Hear firsthand about his transformative journey, the power of curiosity, open-mindedness, and networking, and the crucial difference between management and leadership. Get an insider's view of his transition into leadership, the significance of resilience, shared team accountability, and the customer-first approach at Cisco. As always, the episode is packed with wisdom, cleverly wrapped in sports analogies, crafted with our newcomers in mind.
We cover resilience engineering & learning from incidents with John Allspaw, former CTO @ Etsy and current Founder & Principal @ Adaptive Capacity Labs! Co-hosted by Kenji Kiuchi (Head of Quality and Performance @ Postman), this episode also addresses common unintuitive perspectives within resilience engineering, strategies for effective incident response / problem solving, how to identify current sources of resilience, and practical tips for implementing these resiliency tactics in your organization today.

ABOUT JOHN ALLSPAW
John Allspaw (@allspaw) has worked in software systems engineering and operations for over twenty years in many different environments. John's publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation” helped start the DevOps movement. John served as CTO at Etsy, and holds an MSc in Human Factors and Systems Safety from Lund University.

"The competitive advantage is not for a leader to say, ‘Why did it take so long to restore this issue or resolve this outage?' A competitive advantage is, ‘Oh my God, that is amazing. Tell me what made this hard and what are any of the things that made it difficult to resolve? Is there anything I can do to help get out of the way for people to do the work?'"
- John Allspaw

ABOUT KENJI KIUCHI
Kenji Kiuchi (@dr_kiuchi) is Head of Quality and Performance at Postman, an API platform whose mission is to maximize everyone's creativity through the power of connected software. There he leads a global team with a focus on maximizing user delight and innovating the practice of testing. Before coming to Postman, he spent several years “helping people get jobs” at Indeed. There, he worked on scaling teams and practice to optimize engineering delivery as well as leading Diversity, Inclusion and Belonging initiatives as an Associate Site Director. Prior to Indeed, Kenji spent several years as an Engineering Manager at Twitter where he led Quality efforts across monetization, growth, infra and the delivery of live video. When Kenji isn't driving engineering excellence, he's driving his motorcycle, spending quality time with his 3 daughters, and mentoring leaders across the globe.

Check out our friends and sponsor, Jellyfish
To learn more about Jellyfish and how they can help you increase engineering satisfaction and create happier, higher-performing engineering teams...
Learn more at Jellyfish.co/elc

SHOW NOTES:
John's perspective on production (4:27)
What drove John toward resilience engineering (6:22)
How complex systems relate to resilience engineering (9:23)
Differences between robustness and resilience (13:13)
The role of productive adaptation in resilience engineering (17:26)
Identify sources of resilience already present in your organization (22:52)
Examples of unintuitive perspectives involving incident analysis (27:15)
How to make room for unintuitive perspectives (31:41)
Practical tips for implementing resiliency tactics & understanding incidents (36:12)
Rapid fire questions (39:51)

LINKS AND RESOURCES
Learning From Incidents Conference 2023 - This is a forum for sharing stories of incidents, incident handling, and the learnings from software engineers who handle large-scale distributed software systems.
Hindsight and Sacrifice Decisions - Blog post on Adaptive Capacity Labs' reaction to the NYSE halting trading to resolve an issue.
Using Language by Herbert H. Clark - Herbert Clark argues that language use is more than the sum of a speaker speaking and a listener listening. It is the joint action that emerges when speakers and listeners, writers and readers perform their individual actions in coordination, as ensembles. In contrast to work within the cognitive sciences, which has seen language use as an individual process, and to work within the social sciences, which has seen it as a social process, the author argues strongly that language use embodies both individual and social processes.
Papers We Love Talk
Visual Momentum
When managing a marketing team, success is not necessarily a straight line. No instant inputs and outputs. After all, your team isn't just filled with employees, it's filled with human beings.

Complex, fallible, emotional, confusing, questioning human beings. All of us, together, experiencing the human condition while trying to be productive, together, working in an organization.

I say this because, if you're not careful, you might just fast forward to ramming your way through to the goal. The real challenge is to coax fellow humans along to that goal. Enabling them. Preparing them.

In our free digital marketing course, we describe it this way – Website Strategies: 4 ways to prepare your marketing team to increase conversion rates (https://meclabs.com/course/lessons/website-strategies/). Not just, how to get higher conversion rates. No. Ways to prepare your team. And our latest guest manages with the same philosophy. “Pre-sell key ideas internally,” she says. “Build strong CFO relationships,” she says. Don't just charge ahead with gusto. Lay the groundwork for success.

Those are just some of the lessons Jeanne Hopkins, Chief Revenue Officer, OneScreen.ai (https://www.onescreen.ai/), shared with Daniel Burstein in our latest podcast episode. Hopkins was the Chief Marketing Officer of MarketingSherpa (https://www.marketingsherpa.com/) and sister publication MarketingExperiments (https://marketingexperiments.com/) before this podcast's host even started here, and he's been here 13 years. She's had 11 C-level or VP-level marketing roles in her career. And today Hopkins leads a team of 19 (with three more hires slated for this quarter) and manages a $6.2 million budget. In other words, she has a wealth of experience that I thought you could learn a lot from.

Stories (with lessons) about what she made in marketing
Some lessons from Hopkins that emerged in this discussion:
Build strong CFO relationships.
Pre-sell key ideas internally.
Allow your team to shine.
Loyalty first.
Hire in batches.
Remember your interns.

Related content mentioned in this episode
The Long-Term-Growth Product Launch: Cuisinart has been selling the same food processor since the ‘70s (Podcast Episode #13) (https://www.marketingsherpa.com/article/interview/long-term-growth-business)
Table Fries (Hopkins' podcast) (https://tablefries.com/)
How to Sell Your Marketing and Advertising Ideas to Your Boss and Clients (with free template) (https://sherpablog.marketingsherpa.com/marketing/how-to-sell-to-your-boss/)
Customer Loyalty Chart: Just how big of an effect does customer satisfaction have on loyalty? (https://www.marketingsherpa.com/article/chart/loyalty-effect-customer-satisfaction)

Get more episodes
To receive future episodes of How I Made It In Marketing, sign up to the MarketingSherpa email newsletter at https://marketingsherpa.com/newsletters

About this podcast
This podcast is not about marketing – it is about the marketer. It draws its inspiration from the Flint McGlaughlin quote, “The key to transformative marketing is a transformed marketer” from the Become a Marketer-Philosopher: Create and optimize high-converting webpages
Many companies make components and materials for our everyday lives. Sometimes we don't even know the names of those companies that are helping us achieve our goals. One of those empowering companies is MathWorks, which has software embedded in chips for phones, cars, planes, and even hearing aid implants, just to name a few applications. MathWorks' Director of Creative Services and Web Operations, Ken Hyman, recently began an implementation of Agile thinking and processes with his team, and that's why we wanted him to join us for this episode of Modern Business Operations. Host Briana Okyere starts the conversation by asking Ken to define Agile and the reasons why MathWorks implemented it.

Briana and Ken also discuss:
MathWorks' move from being primarily office-based to being fully remote
Why Agile leads to more customer-centricity
How Agile allows MathWorks to focus on outcomes instead of output

You'll also hear about Ken's use of job satisfaction reports after the implementation of Agile to keep iterating on the best possible version of it for MathWorks.

This episode is brought to you by Tonkean. Tonkean is the operating system for business operations and is the enterprise standard for process orchestration. It provides businesses with the building blocks to orchestrate any process, with no code or change management required. Contact us at tonkean.com to learn how you can build complex business processes. Fast.
Date recorded: November 12, 2021

Show Description: Today we welcome John Allspaw. John is an engineering leader and researcher with over 20 years of experience in building and leading teams engaged in software and systems engineering. He is a co-founder of Adaptive Capacity Labs, LLC. Previously, he was Chief Technology Officer at Etsy. He has also worked at Flickr, Friendster, InfoWorld, Salon, Genentech, Volpe National Transportation Center, and a bunch of other places as a consultant from time to time. John has spent the last decade bridging insights from Human Factors, Cognitive Systems Engineering, and Resilience Engineering to the domain of software engineering and operations. His publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation” helped start the DevOps movement. He holds a Master's degree in Human Factors and Systems Safety from Lund University.

Where to find John: LinkedIn, Twitter
Learn more about NDM: NaturalisticDecisionMaking.org, Journal of Cognitive Engineering and Decision Making
Where to find hosts Brian Moon and Laura Militello: Brian's website, Brian's LinkedIn, Brian's Twitter, Laura's website, Laura's LinkedIn, Laura's Twitter
About John
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John's publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation” helped start the DevOps movement. John served as CTO at Etsy, and holds an MSc in Human Factors and Systems Safety from Lund University.

Links:
The Art of Capacity Planning: https://www.amazon.com/Art-Capacity-Planning-Scaling-Resources/dp/1491939206/
Web Operations: https://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/
The DevOps Handbook: https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations/dp/1942788002/
Adaptive Capacity Labs: https://www.adaptivecapacitylabs.com
John Allspaw Twitter: https://twitter.com/allspaw
Richard Cook Twitter: https://twitter.com/ri_cook
Dave Woods Twitter: https://twitter.com/ddwoods2

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by CircleCI. CircleCI is the leading platform for software innovation at scale. With intelligent automation and delivery tools, more than 25,000 engineering organizations worldwide—including most of the ones that you've heard of—are using CircleCI to radically reduce the time from idea to execution to—if you were Google—deprecating the entire product. Check out CircleCI and stop trying to build these things yourself from scratch, when people are solving this problem better than you are internally. I promise. To learn more, visit circleci.com.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by John Allspaw, who's—well, he's done a lot of things. He was one of the founders of the DevOps movement—although I'm sure someone's going to argue with that—he's also written a couple of books, The Art of Capacity Planning and Web Operations and the foreword of The DevOps Handbook. But he's also been the CTO at Etsy and has gotten his Master's in Human Factors and System Safety from Lund University before it was the cool thing to do. And these days, he is the founder and principal at Adaptive Capacity Labs. John, thanks for joining me.

John: Thanks for having me. I'm excited to talk with you, Corey.

Corey: So, let's start at the beginning here. So, what is Adaptive Capacity Labs? It sounds like an experiment in auto-scaling, as is every use of auto-scaling, but that's neither here nor there. I'm guessing it goes deeper.

John: Yeah. So, I managed to trick, or let's say convince some of my heroes, Dr. Richard Cook and Dr. David Woods. These folks are what you would call heavies in the human factors, system safety, and resilience engineering world; Dave Woods is credited with creating the field of resilience engineering. And so what we've been doing for the past—since I left Etsy is bringing perspectives, techniques, approaches to the software world that are, I guess, some of the most progressive practices that saved other safety-critical domains, like aviation, and power plants, and all of the stuff that makes news.

And the way we've been doing that is largely through the lens of incidents. And so we do a whole bunch of different things, but that's the core of what we do is activities and projects for clients that have a concern around incidents; both, are we learning well? Can you tell us that?
Or can you tell us how to understand incidents and analyze them in such a way that we can learn from them effectively?

Corey: Generally speaking, my naive guess, based upon the times I spent working in various operations roles has been, “Great. So, how do we learn from incidents?” Well, if you're like most of the industry, you really don't. You wind up blaming someone in a meeting that's called blameless, so instead of using the person's name, you use a team or a role name, and then you wind up effectively doing a whole bunch of reactive process work that in a long enough timeline and enough incidents ossifies you into a whole bunch of processes and procedure that is just horrible. And then how do you learn from this?

Well, by the time it actually becomes a problem, you've rotated CIOs four times and there's no real institutional memory here. Great. That's my cynical approach, and I suspect it's not entirely yours because if it were, you wouldn't be doing a business in this because otherwise, it would be this wonderful choreographed song-and-dance number of, “Doesn't it suck to be you? Da-da.” And that's it. I suspect you do more as a consultant than that. So, in what respects does my lived experience of terrible companies differ from the folks you talk to?

John: Oh, well, I mean, just to be blunt, you're absolutely spot on. [laugh]. The industry is terrible at this.

Corey: Well, crap.

John: I mean, look, the good news is, there are inklings, there are signals for some organizations that have been doing the things that they've been told to do by some book or website that they read, and they're doing all the things and they realize, “All right, well, whatever we're doing doesn't seem to be—it doesn't feel—we're doing all the things, checking the boxes, but we're having incidents”—and even more disturbing to them is we're having incidents that seem as if—it'd be one thing to have incidents that were really difficult, hairy, complicated, and complex, and certainly those happen, but there is a view that they're just simply not getting as much out of these sometimes pretty traumatic events as they could be. And that's all that's needed, yeah.

Corey: In most companies, it seems like, on some level, you're dealing with every incident that looks a lot like that. Sure, it was a certificate expired, but then you wind up tying into all the relevant things that are touching that. It seems like it's an easy, logical conclusion. Oh, wow. It turns out in big enterprises, nothing is straightforward or simple.

Everything becomes complicated, and issues like that happen frequently enough that it seems like the entire career can be spent in pure firefighting reactive mode.

John: Yeah, absolutely. And again, I would say that just like these other domains that I mentioned earlier, there's a lot of, sort of, intuitive perspectives that are, let's just say, sort of unproductive. And so in software, we write software; it makes sense if all of our discussions after an incident trying to make sense of it, is entirely focused on the software did this, and Postgres has this weird thing, and Kafka has this tricky bit here.
But the fact of the matter is, people and—engineers and non-engineers—are struggling when an incident arises, both in terms of what the hell is happening, and generating hypotheses, and working through whether the hypothesis is valid or not, adjusting it if signals show up that it's not, and what can we do, what are some options? If we do feel like we're on a good [unintelligible 00:06:09] productive thread about what's happening, what are some options that we can take?

That opens up a doorway for a whole variation of other questions. But the fact of the matter is, handling incidents, understanding really, effectively, time-pressured problem solving, almost always amongst multiple people with different views, different expertise, and piecing together across that group what's happening, and what to do about it, and what are the ramifications of doing this thing versus that thing? This is all what we would call above-the-line work. This is expertise. It shows up in how people weigh ambiguities, and things are uncertain.

And that doesn't get this lived experience that people have, it just we're not used to talking about—we're used to talking about networks, and applications, and code, and network. We're not used to talking about and even have vocabulary for what makes something confusing? What makes something ambiguous? And that is what makes for effective incident analysis.

Corey: Do you find that most of the people who are confused about these things tend to be more aligned with being individual contributor type engineers, who are effectively boots-on-the-ground, for lack of a better term? Is it high-level executives who are trying to understand why it seems like they're constantly getting paraded in the press? Or is it often folks somewhere between the two?

John: Yes.

Corey: [laugh].

John: Right? Like there is something that you point out, which is this contrast between boots-on-the-ground, hands-on keyboard, folks who are resolving incidents, who are wrestling with these problems, and leadership. And sometimes leadership who remember their glory days of being an individual contributor sometimes are a bit miscalibrated. They still believe they have a sufficient understanding of all the messy details when they don't. And so, I mean, the fact of the matter is, there's the age-old story of Timmy stuck in a well, right?

There's the people trying to get Timmy out of the well, and then there's what to do about all of the news reporters surrounding the well asking for updates and questions, and how did Timmy get in the well? These are two different activities. And I'll tell you pretty confidently, if you get Timmy out of the well, pretty fluidly, if you can set situations up where people who ostensibly would get Timmy out of the well are better prepared with anticipating Timmy is going to be in the well, and understanding all the various options and tools to get Timmy out of the well, the more you can set up those and have those conditions be in place, there's a whole host of other problems that simply don't go away. And so, these things kind of get a bit muddled. And so when you say ‘learning from incidents,' I would separate that very much from what you tell the world externally from your company about the incident because they're not at all the same.

Public write-ups about an incident are not the results of an analysis. It's not the same as an internal review, were the review to be effective. Why?
Well, first thing is you never see apologies on internal post-incident reviews because who are you going to apologize to?

Corey: It's always fun watching the certain level of escalating transparency as you go up through the spectrum of the public explanation of an outage, to ones you put internal customers, to ones you show under NDA to special customers, to the ones who are basically partners who are going to fire you contractually if you don't, to the actual internal discussion about it. And watching that play out is really interesting. As you wind up seeing the things that are buried deeper and deeper, yeah, you wind up with this flowery language on the outside, and it gets more and more transparent, and at the end, it's, “Someone tripped and hit the emergency power switch in a data center.” And it's this great list of how this stuff works.

John: Yeah. And to be honest, it would be strange and shocking if they weren't different. Because like I said, the purpose of a public write-up is entirely different than an internal write-up and the audience is entirely different. And so that's why they're cherry-picked. There's a whole bunch of things that aren't included in public write-up because the purpose is, “I want a customer or potential customer to read this and feel at least a little bit better.”

Or really, I want them to at least get this notion that we've got a handle on it. “Wow, that was really bad, but nothing to see here, folks. It's all been taken care of.” But again, this is very different, the people inside the organization, even if it's just sort of tacit, they've got a knowledge. Tenured people who have been there for some time, see connections, even if they're not made explicit, between one incident to another incident.

To that one that happened—“Remember that one that happened three years ago, that big one? Oh, sorry, you're new. Oh, let me tell you the story. Oh, it's about this and blah, blah, blah. And who knew that Unix pipes only passes 4k across it.” Blah, blah, blah, something—some weird, esoteric thing.

And so our focus, largely, although we have done projects with companies about trying to be better about their external language about it, the vast majority of what we do and where our focus is, is to capture the richest understanding of an incident for the broadest audience. And like I said at the very beginning, the bar is real low. There's a lot of, I don't want to say falsehoods, but certainly a lot of myths that just don't play out in the data about whether people are learning. Whenever we have a call with a potential client, we always ask the same question. Ask them about what their post-incident activities look like, and they tell us and throw in some cliches, and everyone—never want a crisis go to waste.

And, “Oh, yes. And we always try to capture the learnings and we put them in a document.” And we always ask the same question, which is, “Oh. So, you put these documents, these write-ups in an area?” “Oh, yes, we want that to be shared as much as possible.”

And then we say, “Who reads them?” And that tends to put a bit of a pause because most people have no idea whether they're being read or not. And the fact is, when we look, very few of these write-ups are being read. Why? I'll be blunt: because they're terrible. [laugh].

There's not much to learn from there because they're not written to be read. They're written to be filed. And so we're looking to change that. And there's a whole bunch of other things that are unintuitive, but just like all of the perspective shifts, DevOps, and continuous deployment, they sound obvious, but only in hindsight after you get it. That's a characterization of our work.

Corey: It's easy to wind up, from the outside, seeing a scenario where things go super well in an environment like that, where, okay, we brought you in as a consultant, suddenly, we have better understanding about our outages. Awesome.
But outages still happen. And it's easy to take a cynical view of, okay, so other than talking to you a lot, we say the right things, but how do we know that companies are actually learning from what happened as opposed to just being able to tell better stories about pretending to learn?

John: Yeah, yeah. And this is, I think, where the world of software has some advantages over other domains. And the fact is, software engineers don't pay any attention to anything they don't think the attention is warranted, or they're not being judged, or scored, or rewarded for. And so there's no single signal that accompanies learning from incidents. It's more like a constellation, like, a bunch of smaller signals.

So, for example, if more people are reading the write-ups. If more people are attending group review meetings. In organizations that do this really well, engineers who start attending meetings, we ask them, “Well, why are you going to this meeting?” And they'll report, “Well, because I can learn stuff here that I can't learn anywhere else. Can't read about it in a runbook, can't read about it on the wiki, can't read about it in an email, or hear about it in an all-hands.”

And that they can see a connection between, even incidents handled in some distant group, they can see a connection to their own work. And so those are the sort of signals—we've written about this on our blog—those are the sort of signals that we know that progress is building momentum. But a big part of that is capturing this, again, this experience. Usually, we'll see, there's a timeline, and this is when memcached did X, and this alert happened, and then blah, blah, blah, blah, blah. Right?

But very rarely are captured the things that, when you ask an engineer, “Tell me your favorite incident story.” People who will even describe themselves, “Oh, I'm not really a storyteller, but listen to this.” And they'll include parts that make for a good story. Social construct is, if you're going to tell a story, you've got the attention of other people, you're going to include the stuff that was not usually kept or captured in write-ups. For example, like, what was confusing?

A story that tells about what was confusing, well—“And then we looked, and it said, ‘zero tests failed.'”—this is an actual case that we looked at—“It says ‘zero tests failed.' And so, okay. So, then I deployed. Well, the site went down.” “Okay, well, so what's the story there?” “Well, listen to this. As it turns out, at a fixed font, zeros, like, in Courier or whatever, have a slash through it and at a small enough font, a zero with a slash through it looks a lot like an eight. There were eight tests failed, not zero.” So, that's about the display. And so those are the types of things that make a good story. We all know stories like this, right? The Norway problem with YAML. You ever heard of that Norway problem?

Corey: Not exactly. I'm hoping you'll tell me.

John: Well, so lay [laugh] it's excellent, and of course it works out that the spec for YAML will evaluate the value no—N-O—to false as if it was a boolean. Yes, for true. Well, but if your YAML contains a list of abbreviations for countries, then you might have Ireland, Great Britain, Spain, US, false instead of Norway. And so that's just an unintuitive surprise. And so, those are the types of things that don't typically get captured in incident writeups.

There might be a sentence like, “There was a lack of understanding.” Well, that's unhelpful. At best. Don't tell me what wasn't there. Tell me what was there. “There was confusion.” Great. “What made it confusing?” “Oh, yeah. N-O is both ‘no' and the abbreviation for Norway.”

Red herrings is another great example. Red herrings happen a lot; they tend to stick in people's memories; and yet, they never really get captured. But it's, like, one of the most salient aspects of the case that ought to be captured.
People don't follow red herrings because they know they're a red herring. They follow red herrings because they think it's going to be productive.So therefore, you better describe for all your colleagues what brought you to believe that this was productive. Turns out later—you find out later that it wasn't productive. Those are some of the examples. And so if you can capture what's difficult, what's ambiguous, what's uncertain, and what made it difficult, ambiguous, or uncertain, that makes for good stories. If you can enrich these documents, it means people who maybe don't even work there yet, when they start working there, they'll be interested; they have a set expectation they'll learn something by reading these things.Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking databases, observability, management, and security.And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself all while gaining the networking load, balancing and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build.With Always Free you can do things like run small scale applications, or do proof of concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. 
Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.

Corey: There's an inherent cynicism around… well, from at least from my side of the world, around any third-party that claims to fundamentally shift significant aspects of company culture, and if the counter-argument to that is that you and DORA and a whole bunch of other folks have had significant success with doing it, it's just very hard to see that from the outside. So, I'm curious as to how you wind up telling stories about that because the problem is inherently whenever you have an outsider coming into an enterprise-style environment, is, “Oh, cool. What are they going to be able to change?” And it's hard to articulate that value, and not—well, given what you do, to be direct—come across as an engineering apologist, where it's well, “Engineers are just misunderstood, so they need empathy, and psychological safety, and blameless post-mortems.” And it sounds to crappy executives, if I'm being direct, that, “Oh, in other words, I just can't ever do anything negative to engineers who, from my perspective, just failed me or are invisible, and there's nothing else in my relationship with them.” Or am I oversimplifying?

John: No, no. I actually think you're spot on. I mean, that's the thing is that if you're talking with leaders—remember, a.k.a. people who are, even though they're tasked with providing the resources and setting conditions for practitioners—the hands-on folks who get their work done—they're quite happy to talk about these sort of abstract concepts, like psychological safety and insert other sorts of hand-wavy stuff.

What is actually pretty magical about incidents is that these are grounded, concrete, messy phenomena that practitioners have, and will remember; they're sometimes visceral experiences. And so that's why we don't do theory at Adaptive Capacity Labs. We understand the theory, happy to talk to you about it, but it doesn't mean as much without the practicality.
And the fact of the matter is that the engineer apologist is, “If you didn't have the engineers, would you have a business?” That's at the flip side; this is, like, the core unintuitive part of the field of resilience engineering, which is that Murphy's Law is wrong.

What could go wrong almost never does, but we don't pay much attention to that. And the reason why you're not having nearly as many incidents as you could be is because, despite the fact that you make it hard to learn from incidents, people are actually learning. But they're just learning out of view from leaders. When we go to an organization and we see that most of the people who are attending post-incident review meetings are managers, that has a very particular signal. That tells me that the real post-incident review is happening outside that meeting, it probably happened before that meeting, and those people are there to make sure that whatever group that they represent in their organization isn't unnecessarily given the brunt of the bottom of a bus.

And so it's a political due diligence. But the notion that you shouldn't punish or be harsh on engineers for making mistakes completely misses the point. The point is to set up the conditions so that engineers can understand the work that they do. And if you can amplify that, as Andrew Shafer has said, “You're either building a learning organization, or you're losing to someone who is.” And a big part of that is you need people; you have to set up conditions for people to give detailed story about their work, what's hard.

This part of the codebase is really scary, right? All engineers have these notions: this part is really scary, this part is really not that big of a deal, this part is somewhere in between. But there's no place for that outside of the informal discussions. But I would assert that if you can capture that, the organization will be better prepared.
The thing that I would end on that is that it's a bit of a rhetorical device to get this across, but one of the questions we'll ask is, “How can you tell the difference between a difficult case—a difficult incident—handled well, or a straightforward incident handled poorly?”

Corey: And from the outside, it's very hard to tell the difference.

John: Oh, yeah. Well, certainly if what you're doing is averaging how long these things take. But the fact of the matter is that all the people who were involved in that, they know the difference between a difficult case handled well, and a straightforward one handled poorly. They know it, but there's nowhere, there's no place to give voice to that lived experience.

Corey: So, on the whole, what is the tech industry missing when it comes to learning effectively from the incidents that we all continually experience and what feels to be far too frequently?

John: They're missing what is captured in that age-old parable of the blind men and the elephant. And I would assert that these blind men that the king sends out—“Go find an elephant and come back and tell me about the elephant”—they come back and they all have—they're all valid perspectives, and they argue about, “No, an elephant is this big flexible thing,” and other one is, “Oh, no, an elephant is this big wall,” and, “No, an elephant is a big flappy thing.” If you were to make a synthesis of their different perspectives, then you'd have a richer picture and understanding of an elephant. You cannot legislate—and this is where what you brought up—you cannot set ahead, a priori, some amount of time and effort. And quite often what we see are leaders saying, “Okay, we need to have some sort of root cause analysis done within 72 hours of an event.” Well, if your goal is to find gaps, and come up with remediation items, that's what you're going to get.
Remediation items might actually not be that good because you've basically contained the analysis time.

Corey: Which does sort of feel, on some level, like it's very much aligned as—from a viewpoint of, yeah, remediation items may not be useful as far as driving lasting change, but without remediation items, good luck explaining to your customers that will never ever, ever happen again.

John: Right, yeah. Of course. Well, you'll notice something about those public write-ups; you'll notice that they don't tend to link to previous incidents that have similarities to them because that would undermine the whole purpose, which is to provide confidence. And a reader might actually follow a hyperlink to say, “Wait a minute. You said this wouldn't happen again.”

Turns out it would. Of course, that's horseshit. But you're right. And there's nothing wrong with remediation items, but if that's the goal, then that goal is—you know, what you look for is what you find, and what you find is what you fix. If I said, “Here's this really complicated problem and I'm only giving you an hour to describe it,” and it took you eight hours to figure out the solution.

Well then, what you come up with in an hour is not actually going to be all that good. So, then the question is, how good are the remediation items? Quite often what we see is—and I'm sure you've had this experience—an incident's been resolved and you and your colleagues are like, “Wow, that was a huge pain in the ass. Oh, dude. I didn't see that coming. That was weird. Yeah.” And one of you might say, “You know what? I'm just going to make this change because I don't want to be woken up tonight, or I know that making this change is going to help things. I'm not waiting for the post-mortem.
We're just going to do that.” “Is that good?” “Yep.” “Okay, yeah, please do it.”

Quite frequently, those things, those actions, those aren't listed as action items, and yet it was a thing so important that it couldn't wait for the post-mortem—arguably the most important action item—and it doesn't get captured that way. We've seen this take place. And so again, in the end, it's about those who have the lived experience. The lived experience is what fuels how reliable you are today.

You don't go to your senior technical people and say, “Hey, listen. We got to do this project. We don't know how. I want you to figure out—we're going to—let's say we're going to move away from this legacy thing, so I want you to get in a room, come up with two or three options. Gather a group of folks who know what they're talking about. Get some options, and then show me what the options are. Oh, and by the way, I'm prohibiting you from taking into account any experience you've ever had with incidents.” It sounds ridiculous when you would say that, and yet, that is what [unintelligible 00:27:54].

So, if you can fuel people's memory, you can't say you've learned something if you can't remember it. At least that's what my kids' teachers tell me. And so yeah, you have to capture the lived experience, and including what was hard for people to understand. And those make for good stories. That makes for people reading them. That makes for people to have better questions about it. That's what learning looks like.

Corey: If people want to learn more about what you have to say and how you view these things, where can they find you?

John: You can find me and my colleagues at adaptivecapacitylabs.com where we talk all about the stuff on our blog. And myself, and Richard Cook, and Dave Woods are also on Twitter, as well.

Corey: And we'll, of course, include links to that in the show notes. John, thank you so much for taking the time to speak with me today. I really appreciate it.

John: Yeah, thanks.
Thanks for having me. I'm honored.

Corey: John Allspaw, co-founder and principal at Adaptive Capacity Labs. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment giving me a list of suggested remediation actions that I can take to make sure it never happens again.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Our guest this week is Webman, who is the Director of Web Operations at videopoker.com. The main topic of discussion is the new Pro Membership at that site.

We welcome your questions - send them to us at gamblingwithanedge@gmail.com, or you can find me at @RWM21 on Twitter.

Show Notes
[00:00] Introduction of Webman, director of web operations at VideoPoker.com
[00:35] What does Webman do at VideoPoker.com
[01:35] Types of membership available at VideoPoker.com
[06:38] Pro membership features: training mode, coaching mode, charts, EV cost of errors, etc
[21:06] Ultimate X offerings and strategies
[31:40] 500-hand free trial
[38:08] Does VideoPoker.com list what paytables are available in casinos?
[42:23] South Point Casino October Promotions - Free Play with a Kicker, free video poker class
[44:03] Predictit.org - political prediction market, $20 deposit match for GWAE listeners
[48:06] BlackjackApprenticeship.com - blackjack training site and forum
[49:26] Recommended - Las Vegas Advisor emails, Chile Caliente Tacos y Marisco, Storytellers Showcase

SouthPointCasino.com
Predictit.org/promo/edge
BlackjackApprenticeship.com/features
VideoPoker.com
VPfree2.com
WizardofVegas.com
LasVegasAdvisor.com
BobDancer.com/seminars
ChileCalientetym.com
LasVegasNevada.gov/Residents/Parks-Facilities/Charleston-Heights-Arts-Center
In this no-filter, candid episode, we sit down with Lucas Chang, who is helping youth of all socio-economic backgrounds believe they can create change. He's the Co-Founder of Y2Labs (a non-profit that provides students in high school and middle school opportunities to develop their problem-solving and innovation skills, and their entrepreneurial mindset) and Co-lead of Startup York (a volunteer-run community of more than 600 entrepreneurs and business owners from across York Region). Through his work teaching design thinking to Gen Z students, he is changing the way someone approaches a problem using empathy-based design. We dive into topics around imposter syndrome, what Design Thinking is, Gen Z vs. Millennials, corporate to independent consulting, and what community building is like in York Region.

Lucas Chang's experience is unique and diverse. He holds an MBA from Rotman School of Management. Earlier in his career he was a Director of Web Operations and Director of Project Planning and Delivery at TELUS, as well as a business consultant with Accenture, where he worked with clients in financial services, transportation, telecommunications and at provincial and federal levels of government. He then made a shift and co-founded Y2 Entrepreneurship Labs, which is a non-profit that provides students in high school and middle school opportunities to develop their problem-solving and innovation skills, and their entrepreneurial mindset. He is also a co-lead of Startup York (a volunteer-run community of more than 600 entrepreneurs and business owners from across York Region), the principal consultant at KnL Business Consulting, and a regular workshop facilitator, speaker, mentor and judge at entrepreneur- and student-focused forums. Previously, Lucas was a Program Co-Director of the Shad York 2019 campus, a member of the York Angel Investors and a principal at PerfectlySoft, a software startup company.
Learn more about Y2Labs: http://y2labs.co/
Y2 Labs Twitter: https://twitter.com/y2labs
Y2Labs Facebook: https://www.facebook.com/y2labs/
Startup York Region Twitter: https://twitter.com/StartupYork
Startup York Region Facebook: https://www.facebook.com/StartupYorkRegion/
LinkedIn: https://www.linkedin.com/in/lucaschang/
Noah Lampert is the host of the Synchronicity Podcast, founder of the Mindpod Network, musician, and web developer. He has served as the director of Web Operations for the Ram Dass--Love Serve Remember Foundation and is the owner of CRVD Media. In this episode, Noah dives deep into the experience of Synchronicity, why it is important to honor the experience, and its mystical significance. Noah also elaborates on his current passionate practice based on the work and life of Neville Goddard. The closing music of this episode is taken from Noah's 2018 EP entitled "Kykeon" inspired by the ancient Greek brew used in Eleusinian Mystery ceremonies and traditions. For more information about Noah and his work, please visit his website: www.syncpodcast.com
On this episode, David DeWolf and Jonathan Rivers join us to share an overview of all of the news that was fit to print at this year's Fortune BrainstormTech Conference. We discuss what the prevailing wisdom was in Aspen when it comes to artificial intelligence, how the gig economy is driving a wave of acquisitions and the two-way street of innovation some of those acquisitions have opened up, and what you can expect from an upcoming book called The Product Mindset. David DeWolf, the Founder and CEO of 3Pillar Global, lives at the intersection of business, technology, and leadership. After starting 3Pillar almost by accident at the age of 26, he has grown the company to nearly 1000 employees around the globe. 3Pillar has won numerous awards for rapid growth, including being recognized on the Inc. 5000 list of fastest growing private companies in the U.S. for seven of the last eight years. Jonathan Rivers is the CTO at 3Pillar. He leads our product and engineering teams, which include more than 750 software engineers, product consultants, product managers, quality assurance, and user experience professionals. Prior to joining 3Pillar, Jonathan was the interim CTO at Telegraph Media Group and served as its Director of Service Delivery and Operations. As Senior Director of Web Operations and Customer Support at PBS, Jonathan helped transform PBS into a digital leader. Learn more and get the full show notes at: 3PillarGlobal.com
YouTube video here!

Topics:
- Current cryptocurrency market
- Saving and investing

About the Guests:
- William Kehl, President of Coinigy Inc, is an early Bitcoin adopter, miner, trader, developer and designer. He founded iPhoneFreelancer and several other successful startups over the years. Formerly Director of Web Operations for a global VOIP provider, William is a full stack developer with an emphasis on blockchain market tech. Along with his business partner Rob, William founded Coinigy.com in 2014 to provide professional-grade tools and API access for individuals and institutions interested in Blockchain Market Intelligence.
- Perianne Boring founded the first DC-based advocacy organization for the digital currency and digital asset community. Prior to forming the Chamber, she worked as a financial services television host and Forbes contributor. She began her career as a legislative analyst in the US House of Representatives, advising on finance, economics, tax and healthcare policy.

If you like this content, please send a tip with BTC to: 1444meJi7YjgQGNg3U8Z6qYZFA5cgz4Gmj

More Info: TatianaMoroz.com CryptoMediaHub.com Vaultoro.com digitalchamber.org coinigy.com

Friends and Sponsors of the Show: TheBitcoinCPA.com CryptoCompare.com FreeRoss.org ThirdKey.Solutions SovrynTech.com SexAndScienceHour.com
Safety Podcast, Devops, Safety Culture, Safety Differently, Safety Leadership, New View Safety, Organizational Change, Operational Excellence, Safety, Reliability

So this podcast episode is one I have been looking forward to doing for almost a year. When the podcast first started, a person "tweeted" an episode on being wrong...and suddenly dozens of people were asking me about John Allspaw. John has worked in systems operations for over fourteen years in biotech, government and online media. He started out tuning parallel clusters running vehicle crash simulations for the U.S. government, and then moved on to the Internet in 1997. He built the backing infrastructures at Salon.com, InfoWorld.com, Friendster, and Flickr. He is now the CTO at Etsy, and is the author of "The Art of Capacity Planning" and "Web Operations" published by O'Reilly. He speaks from time to time at conferences on topics related to web operations, operations and development culture, infrastructure, and capacity planning.

He's a dad, guitarist, engineer, and wiseguy. He is also a great podcast interview. I bet you cannot listen to this episode without learning something new...come on! Take the bet. You know I am good for it. I know you will love this episode. Thanks for listening and tell your friends. You are a part of the fastest growing safety podcast on earth!

This episode is sponsored by UA Workplace Health and Safety. Learn more and thank them at UAWHS.com

PS. Hey John. I play the guitar, you play the guitar, we should play guitar.
With tagomoris as our guest, we talked about RubyConf, Flow, JRuby, Fluentd, Norikra, Lambda Architecture, and more. Show Notes Rubyconf2014 Matz at RubyConf 2014: Will Ruby 3.0 be Statically Typed? - The Omniref Blog Rebuild: 59: Ruby 3.0 Coming Soon (Matz) Flow | A static type checker for JavaScript Statically typed JavaScript via Microsoft TypeScript, Facebook Flow and Google AtScript Ko1 at RubyConf 2014: Massive Garbage Collection Speedup in Ruby 2.2 - The Omniref Blog JRuby 9K Expected in 2014 Ready for Production The Social Coding Contract // Speaker Deck Confreaks Invitation for v1.0.0 | RubyKaigi 2014 Template Engines in Ruby // Speaker Deck Rebuild: 3: MessagePack (frsyuki, kiyoto) Fluentd | Open Source Data Collector logstash - open source log management Elasticsearch.org Kibana | Overview | Elasticsearch Welcome to Apache Flume - Apache Flume fluent-plugin-amazon_sns | RubyGems.org Turn on Elasticsearch logging by default for GCE platform Fluentd adopted as the standard log collection tool for Kubernetes GoogleCloudPlatform/google-fluentd Elasticsearch.org Kibana 4 Beta 2: Get It Now | Blog | Elasticsearch Configuring a Google Compute Engine VM for Google Cloud Logging Norikra: Stream processing with SQL for everybody Norikra: SQL Stream Processing In Ruby Lambda Architecture Microsoft Azure Stream Analytics | Real-time Event Processing Dryad - Microsoft Research Apache Tez - Welcome to Apache Tez Web Operations and Performance - O'Reilly Velocity Strata + Hadoop World Apple Podcasts app now with Show Notes - Tatsuhiko Miyagawa's blog
Welcome to episode five!
Run time: 15:03
Community Divas on iTunes
A podcast about communities and social media tools

In this episode:
- Eden interviews Kathryn Lagden about community building at PTO Today, we discuss one of the issues raised in the interview and we have some community news.
- We discuss Kathryn's concern about managing multiple tools within one community website.
- Community News: Podcamp Toronto - Feb. 21 & 22 at Rogers Communications Centre at Ryerson University, sign up on the wiki at http://podcamptoronto.org

~~~

The Avon Breast Cancer Crusade
In support of breast cancer research, Avon Canada begins the ‘Call for the Cure,’ a grassroots awareness campaign raising funds for the Canadian Breast Cancer Research Alliance.
A call to 1-900-561-PINK will automatically donate $2 to breast cancer research (land line phones only). For more information on this and other related Avon crusades:
Canada - http://avoncrusade.ca/
U.S. - http://www.avoncompany.com/women/avoncrusade/
U.K. - http://www.avoncrusade.co.uk/

~~~

Kathryn Lagden is Director of Web Operations for School Family Media. I recently had the pleasure of speaking with her about community building at PTO Today. She also blogs at That Kathryn Girl.

~~~

00:01 Intro by Jay Moonah
00:09 Eden Spodek and Connie Crosby
00:15 Summary of today’s episode
00:27 Eden introduces Kathryn Lagden
00:43 Kathryn explains what PTO Today is - School Family Media is the umbrella organization, and PTO Today is a main media property. It helps leaders of parent groups (PTO or PTA).
01:22 She discusses her role in community development.
02:36 How do subject matter experts know when to jump into the conversations when there is no designated community manager to suggest they do so?
03:38 PTO Today has had an active community for a very long time, starting with a message board. Kathryn discusses the different ways the community interacts with the community website.
04:30 How the different tools are similar and overlap in content. It is difficult for someone new to see where all the content resides. Kathryn talks about how they redirect people to other parts of the site so they can find the rest of the discussion.
05:54 Getting members to move to another tool by directing traffic over.
06:51 The PTO File Exchange is a feature created to allow parent groups to share documents and hopefully therefore solve similar issues.
08:04 They are tracking usage of the File Exchange with two metrics: number of page views and number of downloads on each document. They have incredible usage on some of their documents; these are very popular.
09:20 File Exchange includes a rating feature. Kathryn explains how this works.
09:58 Has PTO Today been successful in moving free community members into paid level members? They have a lot of free content on the site including their magazine which brings a lot of people into the site. They have some programs and tools that are paid services and link to them from the appropriate places on the free site.
11:24 Eden thanks Kathryn for joining us.
11:39 Eden asks Connie about Kathryn's challenge of migrating community members from one tool to another when the conversation is suited for another area.
12:57 Eden invites our listeners to give suggestions.
13:10 Community news
13:18 Podcamp Toronto 2009 announced
13:45 Avon Breast Cancer Crusade announced
14:16 Eden wraps up the episode.

Our cool theme music “Get Out of My Face” is by Uncle Seth and is from the Podsafe Music Network. We hope to hear from you! Send your comments to communitydivas@gmail.com or post them on the blog at communitydivas.com. Follow us on Twitter, our Facebook page or our FriendFeed room. Some registration may be required.