On-Call Nightmares Podcast

Being on-call in a tech team can lead to some interesting stories. On this podcast we'll talk to a variety of people from the world of technology, discuss their experiences in on-call and find out some nightmares they survived. Hosted by Jay Gordon - Twitter @jaydestro

Jay Gordon

  • May 8, 2020 LATEST EPISODE
  • infrequent NEW EPISODES
  • 34m AVG DURATION
  • 48 EPISODES


Latest episodes from On-Call Nightmares Podcast

On-Call Nightmares: Back in a New York Groove

May 8, 2020 · 1:21


Hey friends, it's been a while. I haven't been on-call, but I have been working on meeting tons of new people for new content for this podcast. I can't do it alone though. Would you like to be on the podcast? Reach out! Twitter: https://twitter.com/OnCallNightmare Email: oncallnightmares@gmail.com The commitment for your story is under 35 minutes and you'll have a lasting testimony of your experience on-call.

Episode 46 - Year in Review with Corey Quinn of The Duckbill Group

Dec 23, 2019 · 36:19


Well 2019 is just about done, that means one more podcast. This time I break format a bit and welcome on Corey Quinn. Corey and I take a look at how he founded the company and how they help people save money on their AWS bills. Then Corey and I take a dive into some of the topics that impacted the cloud in 2019. A fun conversation to end 2019! Corey is the Cloud Economist at The Duckbill Group. Corey specializes in helping companies improve their AWS bills by making them smaller and less horrifying; hosts the Screaming in the Cloud and AWS Morning Brief podcasts; and curates Last Week in AWS, a weekly newsletter summarizing the latest in AWS news, blogs, and tools, sprinkled with snark. https://twitter.com/QuinnyPig https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/ https://www.lastweekinaws.com/

Episode 45 - Kelsey Hightower - Google

Dec 19, 2019 · 34:05


It's the One Year Anniversary of On-Call Nightmares. When I set out to start this podcast, there were a few people on a list that I just felt I needed to speak to. I finally checked off the first name I had on the list. Episode 45 is a conversation with Google Principal Developer Advocate, Kelsey Hightower. Kelsey Hightower is a Technologist working at Google while learning in public. https://twitter.com/kelseyhightower https://github.com/kelseyhightower/kubernetes-the-hard-way

Episode 44 - Silvia Botros - Twilio

Nov 21, 2019 · 34:00


This week I chat with Silvia Botros, also known as @dbsmasher on Twitter. I learn about her experiences on-call for databases, motherhood and an affinity for breaking things. An awesome conversation with an incredible person. Silvia Botros is a Sr Principal Engineer at Twilio. She focuses on ways to break databases but is also talented at finding bugs in all your software, whether she helped build it or not. When she is not helping Twilio SendGrid send billions of emails a day, she is busy training her little replicas on also breaking computers and trolling her friends on Twitter. https://twitter.com/dbsmasher https://twilio.com https://blog.dbsmasher.com

Episode 43 - Damon Edwards - Rundeck

Nov 14, 2019 · 30:46


One of the best parts of attending DOES 2019 in Las Vegas was meeting so many of the leaders and innovators from the world of DevOps. Damon Edwards's work is extremely well known in the DevOps field and I was lucky enough to discuss his history during this interview. Damon Edwards is a Co-Founder of Rundeck Inc., the makers of Rundeck, the popular open source Operations Management Platform. Damon has spent over 15 years working with both the technology and business ends of IT Operations and is noted for being a leader in porting Lean and cutting-edge DevOps techniques to large-scale enterprise organizations. Damon is a frequent conference speaker and writer who focuses on DevOps, SRE, and Operations improvement topics. Damon is active in the international DevOps community, a co-host of the DevOps Cafe podcast, and a content chair for Gene Kim’s DevOps Enterprise Summit. https://twitter.com/damonedwards https://rundeck.com

Episode 42 - John Willis - Red Hat

Oct 31, 2019 · 36:26


The number 42 has a huge meaning for baseball fans. Jackie Robinson wore 42, Mariano Rivera wore 42 and now one of the greatest in DevOps, John Willis, wears the On-Call Nightmares podcast episode #42! Learn from John's past, his present and his future at Red Hat. We got together at the 2019 DevOps Enterprise Summit in Las Vegas to chat about all things DevOps and a lil Yankees baseball (not much). By far one of the most important episodes of the podcast yet. John Willis has worked in the IT management industry for more than 35 years. Currently he is part of Red Hat's Global Transformation Office which will be focused on accelerating our customers' digital visions while bringing holistic change across their technological AND social systems. He was formerly Director of Ecosystem Development at Docker. Prior to Docker, Willis was the VP of Solutions for Socketplane (sold to Docker) and Enstratius (sold to Dell). Prior to Socketplane and Enstratius, Willis was the VP of Training and Services at Opscode, where he formalized the training, evangelism, and professional services functions at the firm. Willis also founded Gulf Breeze Software, an award-winning IBM business partner, which specializes in deploying Tivoli technology for the enterprise. Willis has authored six IBM Redbooks on enterprise systems management and was the founder and chief architect at Chain Bridge Systems. https://twitter.com/botchagalupe Beyond the Phoenix Project - Audiobook https://itrevolution.com/book/beyond-phoenix-project-audiobook/ Maslach Burnout Inventory - https://www.mindgarden.com/117-maslach-burnout-inventory

Episode 41 - JJ Asghar - IBM

Oct 24, 2019 · 41:50


On-Call Nightmares returns to talk to the man from Texas who represents Big Blue, JJ Asghar. JJ and I discuss his start as a 15-year-old in technology and how on-call has morphed over the years. JJ works at IBM on the IBM cloud as a Developer Advocate. He’s focusing on the IBM Kubernetes Service trying to make companies and users have a successful onboarding to the Cloud Native ecosystem. He lives and grew up in Austin, Texas. He enjoys a good strong stout, hoppy IPA, and some team building Artemis, modding Dwarf Fortress, Rimworld, or Factorio. He’s a member of the Church of Emacs, though jumps into Vim on remote machines. He usually chooses Ubuntu over CentOS, but secretly wants FreeBSD everywhere. He’s always trying to become a better Ruby developer, but experiments with Go, Python, and only when he has to, Node. A father and husband, if he’s not trying to automate his job away he’s always trying to convince his daughters to “be button makers not button pushers.” http://www.github.com/jjasghar https://twitter.com/jjasghar https://www.deliveryconf.com

Episode 40 - Ryan Kitchens - Netflix

Oct 10, 2019 · 33:47


A big milestone, episode 40! This week I speak with Netflix SRE Ryan Kitchens about birds, DR and movies! Ryan Kitchens has been in a variety of positions in software over the past ten years allowing him to experience the good and the bad, the amazing and the bizarre. As an SRE with a film degree, he currently works at Netflix on the CORE team, focused on ensuring availability. The background of the team spans incident management and analysis, resilience engineering, and human factors & systems safety. https://twitter.com/this_hits_home

Episode 39 - Daniel Bentley - tilt.dev

Sep 25, 2019 · 33:05


This week I speak with Dan Bentley of tilt.dev! Dan is a software engineer who's currently fixing microservice development as CEO of Tilt ( https://tilt.dev ). Before that, he was at Google for 11 years and then Twitter, working on tools for devs and tools for non-developers. He's opened for The Who and has checks from Donald Knuth. Transcript: https://aka.ms/AA64hk6 https://tilt.dev https://twitter.com/dbentley

Episode 38 - Gene Kim - IT Revolution

Sep 12, 2019 · 35:51


Live from DevOpsDays Portland, I speak with Gene Kim, author of "The Phoenix Project" and the upcoming book "The Unicorn Project." When I started this podcast, one of my goals was to talk to Gene about his own experiences in IT. Thankfully, this trip to DevOpsDays in PDX helped that happen. Cameos by Jennifer Davis, Matty Stratton, Jason Yee and Terri Haber! Gene Kim is a multiple award-winning CTO, researcher and author, and has been studying high-performing technology organizations since 1999. He was founder and CTO of Tripwire for 13 years. He has written five books, including “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win”, “The DevOps Handbook”, “Accelerate” and the upcoming “The Unicorn Project”. Since 2014, he has been the organizer of the DevOps Enterprise Summit, studying the technology transformations of large, complex organizations. https://twitter.com/RealGeneKim Transcript - https://aka.ms/AA6107c The Unicorn Project - https://itrevolution.com/book/the-unicorn-project/ DevOps Enterprise Summit Las Vegas - https://events.itrevolution.com/us/

Episode 37 - Jason Schuster - Stratasan

Sep 5, 2019 · 28:03


The On-Call Nightmares Listener feedback system works! Without your stories I just cannot do this podcast. Thankfully, Jason Schuster reached out to share his experience in a 20 year career in technology. Share in his nightmare on this latest episode! Transcript: https://aka.ms/AA606at Jason's Bio: After graduating with a BFA in theater design in 2000 I landed my first job admiring HPUX servers. I took a low ball salary in exchange for training. While I got the training, it took a long while for the scales to even out, inheriting an outgoing sysadmin's servers when I was less than a year on the job. My true passion for automating all the things came on an off site DR test watching 2 senior admins formatting disks one at a time and building a crazy number of volume groups and luns on them by hand. DR used to be a real interesting space that having so much stuff virtualized has mostly solved. After working on various .gov contracts and then supporting internal systems for 13 years I made the jump to devops at one small startup that folded out from under me but did start me on my way. I joined Stratasan just after new years and am loving this place. We are big fans of making boring things boring and not adding unneeded tools to our lives. Mostly I have been extending the reach of our terraform while trying to cut down the number of services we use in AWS to just what is needed. I have also been highlighting metrics we are missing to help us make good planning choices. https://twitter.com/devoprus iamaunixadmin.com

Episode 36 - Michael Stahnke - CircleCI

Aug 29, 2019 · 30:39


Live from DevOpsDays Chicago! I meet up with Ops Veteran, Michael Stahnke as we discuss his career in technology. From the weird days of AIX systems all the way till his time now at CircleCI, Michael has plenty of great stories. Special cameos by Jason Yee and Joshua Zimmerman (our laugh track). Michael Stahnke is VP of Platform Engineering at CircleCI. Prior to this role, he was at Puppet running engineering for Puppet Enterprise, Puppet Open source, and SRE. He is an author for State of DevOps Report in 2018 and 2019. Michael also helped get the Extra Packages for Enterprise Linux (EPEL) repository off the ground in 2005, is the author of Pro OpenSSH (Apress, 2005), and is an organizer of Devopsdays Madison. You can reach him @stahnma on nearly any service online. Transcript: https://aka.ms/AA5yha2 https://twitter.com/stahnma

Episode 35 - Mike Grayson - Paychex

Aug 22, 2019 · 26:10


Getting paid is a pretty dang important part of your job. Mike Grayson and the team at Paychex are working to make sure that the databases that handle that are always online. This week I catch up with Mike Grayson who's been a great advocate for the database ops community. Mike is a Senior Database Engineer specializing in DevOps, MongoDB, and Apache Kafka based out of Rochester, New York. He is a MongoDB Master and speaker in the Oracle, SQL Server and MongoDB communities. Transcript: https://aka.ms/AA5wnuo https://twitter.com/mikegray831 https://mongomikeblog.wordpress.com/blog/

Episode 34 - Xander Grzywinski - Microsoft

Aug 8, 2019 · 35:51


X gonna give it to ya! Xander from the Microsoft Azure Kubernetes SRE Team joins me to talk about his history on-call and more! Xander is a Site Reliability Engineer at Microsoft, he currently slings containers on Azure Kubernetes Service. Previous to Microsoft, he did all the things with retail tech at both Starbucks and Target. You are always welcome to send him your favorite cat pictures. @XanderGrzy https://github.com/salaxander Full Transcript: https://aka.ms/AA5r8ja

Episode 33 - Ben Halpern - DEV Community

Aug 1, 2019 · 42:21


On-call can come in different shapes and sizes. Sometimes it's a group of developers who are attacking a problem to keep other developers afloat. That's what Ben Halpern and the team at the DEV Community are up to. Founder of DEV, Canadian, generalist software developer who writes a lot of Ruby. Transcript: https://aka.ms/AA5r8ja https://dev.to/ben https://twitter.com/bendhalpern

Episode 32 - Matty Stratton - PagerDuty

Jul 25, 2019 · 50:00


This week I speak with my friend Matty Stratton as we discuss the hard times and the processes to make them better. Matty Stratton is a DevOps Advocate at PagerDuty, where he helps dev and ops teams advance the practice of their craft and become more operationally mature. He collaborates with PagerDuty customers and industry thought leaders in the broader DevOps community, and back when he drove, his license plate actually said “DevOps”. Matty has over 20 years of experience in IT operations, ranging from large financial institutions such as JPMorganChase to internet firms, including Apartments.com. He is a sought-after speaker internationally, presenting at Agile, DevOps, and ITSM focused events, including ChefConf, DevOpsDays, Interop, PINK, and others worldwide. Matty is the founder and co-host of the popular Arrested DevOps podcast, as well as a global organizer of the DevOpsDays set of conferences. He lives in Chicago and has three awesome kids, who he loves just a little bit more than he loves Doctor Who. He is currently on a mission to discover the best pho in the world. Transcript (txt format) - https://aka.ms/AA5pv8x PagerDuty Summit - Sept 23-25 in San Francisco. Breakathon, etc. https://community.pagerduty.com/summit for a great discount: PDS19SAT DevOpsDays Chicago - use the code ADO2019 for 20% off. Devopsdayschi.org http://arresteddevops.com http://speaking.mattstratton.com Breakathon - https://www.eventbrite.com/e/breakathon-at-pagerduty-summit19-tickets-65736757411 https://twitter.com/mattstratton

Episode 31 - Jason Yee - Datadog

Jul 18, 2019 · 34:18


Datadog Dash was this week which meant I was lucky enough to catch up with my friend, Jason Yee. We discuss his time in tech, measuring everything and a lot more! Jason is a technical evangelist at Datadog, where he works to inspire developers and ops engineers with the power of metrics and monitoring. Previously, he was the community manager for DevOps & Performance at O'Reilly Media and a software engineer at MongoDB. He's currently exploring the world while living as a nomad and would love to hear about where you live. transcript: https://raw.githubusercontent.com/jaydestro/oncallnightmares/master/episode31.jason.yee.txt https://twitter.com/gitbisect https://www.datadoghq.com/

Episode 30 - Tim Yocum - InfluxDB

Jul 11, 2019 · 41:58


Episode 30 is a waterfall of information you'll soak up and learn a ton from. Things get a bit wet and wild for Tim in this episode of On-Call Nightmares! A great discussion about a long history in tech, the things you just can't plan for and more. Tim is an engineering manager at InfluxData with over 20 years of experience. His technical interests include high-performance, scalable, fault-tolerant cloud infrastructure, interconnected hybrid architecture, containerization (c14n?) all the way down, and always winning buzzword bingo. Helping teams achieve their highest potential is his true calling, which often means planting ideas and staying out of the way. transcript: https://raw.githubusercontent.com/jaydestro/oncallnightmares/master/episode30.tim.yocum.txt https://twitter.com/tkyocum https://www.influxdata.com/ https://tky.io

Episode 29 - Molly Struve - Kenna Security

Jul 3, 2019 · 33:56


This week's conversation is with Molly Struve of Kenna Security! We discuss her path to tech, how her team worked to fix their on-call rotation and more! Molly Struve is the Lead Site Reliability Engineer at Kenna Security. She joined Kenna in 2015 and has had the opportunity to work on some of the most challenging aspects of Kenna’s code base. This includes scaling Elasticsearch, sharding MySQL databases, and creating an infrastructure that can grow as fast as Kenna's business. When not making code run faster, she can be found fulfilling her need for speed by riding and jumping her show horses. Transcript: https://aka.ms/AA5q313 https://www.mollystruve.com/ https://twitter.com/molly_struve/

Episode 28 - Jason Hand - Microsoft

Jun 27, 2019 · 44:40


This week my homie supreme, Jason Hand, joins me on On-Call Nightmares. We talk monitoring, SRE and getting in the van. Jason has spent the last 5 years connecting with technologists around the world on ideas related to balancing system and service reliability with the speed and agility required in today's digital world. Previously at VictorOps, Jason authored four books on the subjects of Site Reliability Engineering, Post-Incident Reviews, and ChatOps and was named "DevOps Evangelist of the Year" in 2016 by DevOps.com. Co-organizer and emcee of the annual DevOpsDays Rockies conference, the Frontrange Site Reliability Meetup, Denver DevOps Meetup, and DevOps Road Trip, Jason enjoys connecting storytellers and actionable ideas with those who are hungry to learn. Co-host of the podcast "Community Pulse", Jason helps to bring together ideas and expertise as it relates to building community within tech (i.e. advocacy, evangelism). In his spare time, you'll find Jason soaking up the beautiful Colorado outdoors on a trail, lake, river, or mountain by day and enjoying craft IPAs and bluegrass music by night. Transcript: https://aka.ms/AA5q317 https://twitter.com/jasonhand

Episode 27 - Joseph Marhee - Packet

Jun 13, 2019 · 40:13


This week, I bring a friend from a past job to share his insights on observability and other aspects of a weird life in technology. This is one of my favorite chats because Joe is one of my favorite people in tech. "Customer-concerned Operations and Systems workers turned Cloud Native lab-rat at Packet, previously of DigitalOcean, IBM, Recurly, Platform9 Systems. Approach to Production engineering relies on an iterative combination of programmatically-led audits, collaborative remediation, and mental health check-ins to ensure the observability scheme is serving the organization and its workers, and not leaving burnt out engineers at the on-call rotation's mercy. " Transcript: https://aka.ms/AA5q31e https://twitter.com/jmarhee https://github.com/jmarhee https://www.packet.com/

Episode 26 - Jacquelyne Grindrod - MedStack

Jun 6, 2019 · 29:15


This week I speak with Jacquie of MedStack! We get insights into how her career started including a nightmare where she's thrown right into the fire. Jacquie has worked in FinTech, media, and is currently in eHealth working at MedStack, a digital app platform for the healthcare industry. She's passionate about solving problems with a holistic approach, and bridging the gaps in communication and systems. Building something meaningful is important to her – from making healthcare accessible to creating a networking app for women in tech (winning team at ElleHacks 2018). She recently became one of Canada's Top 30 Under 30 Developers and was a speaker at DevOpsDays TO 2019! Toronto's tech community is getting stronger and Jacquie looks forward to continuing to collaborate with, and empower, those around her. Transcript: https://aka.ms/AA5q35g https://twitter.com/devopsjacquie https://medstack.co/

Episode 25 - Quintessence Anx - Logz.io

May 30, 2019 · 29:55


Live from DevOpsDays Toronto, I meet up with my fellow DevRel road warrior, Quintessence Anx of Logz.io. Quintessence brings years of experience and compassion to her role. Quintessence is a champion for mindfulness around accessibility and diversity. In her own words... I’ve worked in the IT community for over 10 years, including as a database administrator and a DevOps / Cloud / Infrastructure engineer. I was a core contributor to Stark & Wayne’s SHIELD project, which adds backup functionality to Cloud Foundry, as well as a technical reviewer for Learning Go Programming published by Packt Publishing. Currently I am the US Developer Advocate for Logz.io, driving DevOps community engagement. Outside of work I am a chapter leader of Girl Develop It’s Buffalo chapter to help women in the Buffalo community launch careers in development. https://twitter.com/QuintessenceAnx https://twitter.com/inctechbuffalo

Episode 24 - Nathen Harvey - Google

May 23, 2019 · 38:02


Live from ChefConf 2019, I talk with Nathen Harvey about outages, lunch and a life spent in technology. This was one of my favorite podcast interviews because Nathen is one of my major influences and mentors in what we do in Developer Advocacy and Relations in technology. He's taught me so much over the years and has done his best to check in with me during the tough moments, like another member of the on-call team might do during a rough incident. Nathen Harvey, Cloud Developer Advocate at Google, helps the community understand and apply DevOps and SRE practices in the cloud. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps, and is part of the DevOps Days conferences global organizing committee. Nathen is part of the Google DevRel team and can be found at the following links: https://twitter.com/nathenharvey https://linkedin.com/in/nathen

Episode 23 - Rich Burroughs - Gremlin

May 16, 2019 · 36:30


This week we speak with Gremlin's Community Manager, Rich Burroughs, on his time on-call. We discuss power outages, active-active datacenters and other perspectives from a long career in technology. Rich Burroughs is a Community Manager at Gremlin where he’s focused on growing and strengthening the Chaos Engineering community. He previously worked at Puppet as an SRE and in other operational roles over the years. Rich spent about twenty years of his career in on-call rotations, and is driven by his empathy for operators and engineers who manage production systems. His first PC ran Windows 95, but he set it up to dual boot Linux. https://twitter.com/richburroughs https://gremlin.com

Bonus Episode - Jay Gordon - "bits of //build, Overcoming Failure"

May 9, 2019 · 23:23


Bonus! ME!!! I spoke at Microsoft's community event "bits of //build" about overcoming failure. This is a culture talk I have been working on that really focuses on my personal road through failure and recovery. Thanks to all who sat in the room and took part. https://twitter.com/jaydestro

Episode 22 - Mike Julian - The Duckbill Group

May 9, 2019 · 40:15


This week I get a chance to speak to someone who just wants to save you some money on your cloud bills. Mike shares some great stories and gives insight to what he and Corey Quinn are working on at the Duckbill Group. Mike is the CEO of The Duckbill Group, a consultancy helping companies fix the horrifying AWS bill by both lowering the size of it and helping them understand where the money is going. Mike also hosts the Real World DevOps Podcast, is the author of O’Reilly’s Practical Monitoring, and editor/analyst at Monitoring Weekly. He was previously an SRE/DevOps Engineer/system administrator for companies such as Taos Consulting, Peak Hosting, Oak Ridge National Laboratory, and many more. Mike is originally from Knoxville, TN (Go Vols!) and currently resides in Portland, OR. Twitter: https://twitter.com/mike_julian https://www.duckbillgroup.com https://monitoring.love https://www.realworlddevops.com

Episode 21 - Arup Chakrabarti - PagerDuty

Apr 25, 2019 · 38:56


Who wakes up the people who get woken up for on-call? The folks at PagerDuty are responsible for providing pager notifications to teams across the globe. In this interview I talk with Arup Chakrabarti, who's dedicated to getting you your alerts. Arup has been working in the space of software operations since 2007. He started out as an Operations Engineer at Amazon, helping to reduce customer defects with multiple teams for the Amazon Marketplace. Since then, he has managed and built operations teams at Amazon and Netflix to help improve availability and reliability. He currently works at PagerDuty, where he is part of the Infrastructure Engineering group. twitter: https://twitter.com/arupchak https://pagerduty.com

Episode 20 - Nick Maludy - Encore Technologies

Apr 18, 2019 · 38:03


LET'S GET WEIRD. LET'S GET WEIRD. LET'S GET WEIRD. This week we talk with Nick Maludy of Encore Technologies on some "weird on-prem" he managed when working as a Defense Contractor. Nick brings unique insight into having to manage critical systems from 10,000 feet above the ground. After graduating, Nick Maludy worked for ~5 years at a Department of Defense contractor called SilverBlock Systems. There they developed an "integration platform" for performing sensor research for next-gen Navy aircraft. Nick would routinely be traveling and flying in the aircraft for testing purposes. Several times a year the team was deployed to execute missions with the platform on the aircraft. Nick would fly on the plane and debug, troubleshoot, and implement new features on the fly. Since leaving SilverBlock Nick has worked at Encore Technologies as the Manager of the DevOps Team and recently was promoted to Director of Engineering. Twitter: @NickMaludy Github = https://github.com/nmaludy Encore's tech blog = https://encoretechnologies.github.io/

Episode 19 - Shayon Mukherjee - Intercom

Apr 11, 2019 · 32:05


You know that little box at the bottom of the window that asks you if you need help on websites? Well Shayon is part of the team that keeps that online for businesses across the planet. We chat a bit about his time on-call and other topics. Shayon is a System Engineer at Intercom. He is part of the internal infrastructure team that is responsible for Intercom's Availability, Performance, and Scalability. Outside of regular system/infra work, he has played other roles at Intercom, from an engineer on the marketing team to doing product development as a Product Engineer. He takes a deep interest in operations and incident management, and credits operations with playing a major role in his career growth. Twitter: @shayonj

Episode 18 - Phoummala Schmitt - Microsoft

Apr 4, 2019 · 39:14


You get opportunities in tech to work with some of the best people in the world. I got that opportunity when I joined Microsoft, that's where I met the Exchange Goddess! We discuss family, work and how it all comes together when you're on-call. We also discuss the Microsoft Create Startups Event Phoummala will be taking part in, https://www.createstartups.io/ You can register now, for free! Sr Cloud Advocate @ Microsoft, with a background focus on messaging and collaboration, virtualization, and storage. She is a VMware vExpert and co-hosts The Current Status Podcast. In her spare time she blogs for 24x7ITConnection, Exchangegoddess.com. Some of her technical articles can also be found on Petri IT Knowledgebase, WeBreakTech.com, and The Register. Considered one of the Top 50+ Tech Influencers and Thought Leaders You Should Follow, Phoummala aka Exchange Goddess can be followed as @exchangegoddess on Twitter.

Episode 17 - Andy Fleener - SportsEngine

Mar 28, 2019 · 41:46


Get your playbook and have the stats ready, we're talking with Andy Fleener of SportsEngine this week. Andy is a Humanist, Systems Thinker, New View Safety Nerd, Sr. Platform Operations Manager at SportsEngine, DevOps Days MSP Co-Organizer. Twitter: @andyfleener

Episode 16 - Eric Sorenson - Puppet

Mar 20, 2019 · 30:02


Ever wonder what it was like to do dial-up support hosting in Hawaii? Well this is the damn episode you've waited for your whole life. After 16 years working as a systems/network administrator in the Bay Area, Eric relocated to Portland in 2012 to further develop his passion for awesome configuration management tools. As Puppet's product manager, he worked on extending and improving it for modern infrastructure; his current project is Lyra, a cloud-native workflow engine. Outside of work he enjoys riding road bikes on the dirt and performing electronic music with Portland's techno collective Volt Divers. https://www.facebook.com/VoltDivers/ https://puppet.com/ Twitter: @ahpook

Episode 15 - Andrew Clay Shafer - Pivotal

Mar 14, 2019 · 42:37


The Conscientious Developer: There are great ways to think of how to attack the on-call situation even if you aren't in an on-call rotation. By being a conscientious developer and taking that extra interest in your software after deployment you're adding incredible value. Your co-workers may also really end up appreciating your time a little bit more as well. Some people are born to on-call, and some people have on-call thrust upon them. Andrew Clay Shafer stole good ideas from wherever he could and started calling them all devops. Andrew tries to solve more problems than he causes but often fails. If devops ever caused you any problems, Andrew feels bad. If devops ever helped you with anything, he also apologizes. Twitter: littleidea

Episode 14 - JD Trask - Raygun

Mar 6, 2019 · 35:06


Welcome back to OCN! This time I chat with CEO of Raygun, JD Trask. One of the cool parts of this podcast is meeting people from all over the world who have had some experience on-call, JD does his thing in New Zealand! John-Daniel is the CEO and co-founder of Raygun.com, an application monitoring company that helps teams identify hidden performance bottlenecks and software bugs. With over 25 years of experience in software development, JD is a programmer at heart with unique insights into scaling software businesses and software team leadership, and he has a deep understanding of building healthy software that gives great customer experiences. He is known to enjoy a glass of whiskey now and then. https://raygun.com/ https://twitter.com/traskjd

Episode 13 - Damian Schenkelman - Auth0

Feb 28, 2019 · 29:03


Welcome back to another podcast about downtime! Once again we meet with another technologist who's building a new product and getting it out to the world. This time we meet Damian of Auth0 who's been working with his team to ensure identity services. Damian is a Software Engineer who loves to solve hard problems of any type, especially those related to making software and teams scale. He is a Director of Engineering at Auth0 helping make identity simple for developers. Before Auth0, Damian spent many years working for and at Microsoft on Azure, Media and patterns & practices related initiatives. He spends his spare time with family, friends, exercising and catching up on all things NBA. Twitter: @dschenkelman auth0.com

Episode 12 - Baron Schwartz - VividCortex

Play Episode Listen Later Feb 21, 2019 40:43


Content Warning: This episode does contain some graphic description of the work done by an EMT - if you find this troubling you may want to check out another episode! On this episode, I speak with the CTO and founder of VividCortex on his life down on the farm and as an EMT. Baron gives us some insight into how that prepared him for his time on-call in different roles to ensure databases are fast and reliable. Baron is the CTO and founder of VividCortex, the best way to see what your production database servers are doing. Baron has written a lot of open source software, and several books including High Performance MySQL. He’s focused his career on learning and teaching about scalability, performance, and observability of systems generally (including the view that teams are systems and culture influences their performance), and databases specifically. Twitter: @xaprb Website: xaprb.com

Episode 11 - Sam Phippen - Google

Play Episode Listen Later Feb 14, 2019 38:51


On this edition, Sam shares with me some scary moments from his time at DigitalOcean. Sam tells the tale of a database table that was dropped. https://blog.digitalocean.com/update-on-the-april-5th-2017-outage/ Sam Phippen is a Developer Advocate at Google, and previously an Engineering Manager at DigitalOcean. He's seen his fair share of deep, complex incidents. He has strong opinions about incident management, postmortem culture, and on-call practises. He's sad that he can't hug every cat. Twitter: @samphippen samphippen.com

Episode 10 - J. Paul Reed - Everywhere and Nowhere ;-)

Play Episode Listen Later Feb 7, 2019 44:19


In this episode, Jay and J. Paul Reed discuss the need for on-call practices and incident response in the world of software release engineering. Paul shares some great stories, including how the World Series can depend on a single line of code. J. Paul Reed has over twenty years of experience in the trenches as a build/release engineer, working with such companies as VMware, Mozilla, Postbox, Symantec, and Salesforce. In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations "Simply Ship. Every time." He's worked across a number of industries, from financial services to cloud-based infrastructure to health care, with teams ranging from 2 to 2,500 on everything from tooling, operational analysis and improvement, cultural transformation, and business value optimization. He speaks internationally on release engineering, DevOps, operational complexity, and human factors, and is a Master of Science candidate in Human Factors & Systems Safety at Lund University.

Episode 9 - Charity Majors - Honeycomb.io

Play Episode Listen Later Jan 31, 2019 22:09


Infrastructure Week, Episode 2! Charity and Jay sit down for a discussion on her career and a deep dive into a database incident. You'll get some interesting thoughts on how monitoring has changed in operations. Charity is cofounder and CEO of Honeycomb.io, a startup aimed at debugging complex systems. (“It’s like strace for systems!”) Previously, Charity ran infrastructure at Parse and was an engineering manager at Facebook. She also worked with the RocksDB team to build and deploy the world’s first Mongo + Rocks in production. She likes single malt scotch. https://honeycomb.io https://twitter.com/mipsytipsy

Episode 8 - Melissa Palmer - Veeam

Play Episode Listen Later Jan 28, 2019 32:59


Does this VM bring me joy? Melissa is Product Strategy Technologist at Veeam and an information technology infrastructure enthusiast, with a focus on virtualization, security, and emerging technologies. Melissa is a VMware Certified Design Expert (VCDX #236), and has held roles such as VMware Engineer, Systems Engineer, Solutions Architect, and Technical Marketing Engineer prior to joining Veeam. You can find Melissa on Twitter @vMiss33 or at her blog https://vMiss.net.

Episode 7 - Jamesha "Jam" Fisher - Splice

Play Episode Listen Later Jan 24, 2019 36:24


Jamesha "Jam" Fisher is an infrastructure engineer at Splice. Jamesha has worked in the tech industry for over 15(!) years, with a special interest in security. Graduating with a degree in information assurance and security engineering, they lent their experience to operations and systems engineering at companies like Google and GitHub. In their spare time, Jamesha queers it up, along with being a maker of things musical or delicious and objects that use binary numbers.

Episode 6 - Adam Jacob - Board Member at Chef Software

Play Episode Listen Later Jan 17, 2019 40:18


Ride The On-Call Lightning with Adam Jacob Adam Jacob is a Board Member, CTO and founder of Chef. Adam joins us this week to discuss his world as an on-call engineer. Find out what happens when they call in the "Mr. Wolf" of Oracle on a private jet to get the database back online. Learn about Adam's passion for Open Source while we interject our mutual interest in heavy metal.

Episode 5 - Kolton Andrus - Gremlin Inc

Play Episode Listen Later Jan 10, 2019 37:43


Fear, Chaos and Pain Common subjects in the Christopher Nolan Batman films, especially when the Joker appears. How do we avoid the moments of fear, chaos and pain in real time? By preparing for them. Today we talk with Gremlin Inc founder and CEO Kolton Andrus. Kolton is co-founder and CEO of Gremlin. Previously, he was a Chaos Engineer at Netflix, improving streaming reliability and operating the Edge services. He designed and built F.I.T., Netflix's failure injection service. Prior to that, he improved the performance and reliability of the Amazon Retail website. At both companies he has served as a 'Call Leader', managing the resolution of company-wide incidents. Gremlin.com Twitter: @gremlininc

Episode 4 - Tanya Janca - Microsoft

Play Episode Listen Later Jan 3, 2019 31:50


There's on-call in nearly every aspect of the tech industry; in this episode we focus on security. Tanya Janca is a senior cloud advocate for Microsoft, specializing in application and cloud security; evangelizing software security and advocating for developers and operations folks alike through public speaking, her open source project OWASP DevSlop, and various forms of teaching via workshops, blogs and community events. As an ethical hacker, OWASP Project and Chapter Leader, Women in Security and Technology (WIST) chapter leader, software developer and professional computer geek of 20+ years, she is a person who is truly fascinated by the ‘science’ of computer science. https://twitter.com/shehackspurple https://medium.com/@shehackspurple (blog) DevSlop.co

Episode 3 - Chris Short - Red Hat

Play Episode Listen Later Dec 27, 2018 37:24


Chris Short has been a proponent of open source solutions throughout his over two decades in various IT disciplines including systems, security, networks, and DevOps engineering and advocacy across the public and private sectors. He currently works on the Ansible team at Red Hat. Chris is a partially disabled US Air Force veteran living with his wife and son in Greater Metro Detroit. Chris writes about DevOps and other topics at chrisshort.net. He also runs the DevOps, Cloud Native, and open source focused newsletter DevOps’ish. Twitter: ChrisShort Web: https://chrisshort.net, https://devopsish.com

On-Call Nightmares Podcast - Episode 2 - Dan Maher - Datadog

Play Episode Listen Later Dec 20, 2018 37:42


Welcome to the first full-length episode of The On-Call Nightmares Podcast. Dan is a veteran of the original dotcom bubble and has since worked in a variety of environments from start-ups to global corporations, including stints as a founder, university lecturer, and day labourer. Today, Dan is a member of the Devopsdays Global team, and a Developer Advocate at Datadog. Twitter: @phrawzty

On-Call Nightmares Episode 1 - Preview

Play Episode Listen Later Dec 19, 2018 2:10


A quick preview of what's to come!
