What if the hardest part of reliability has nothing to do with tooling or automation? Jennifer Petoff explains why real reliability comes from the human workflows wrapped around the engineering work.
Everyone seems to think AI will automate reliability away. I keep hearing the same story: “Our tooling will catch it.” “Copilots will reduce operational load.” “Automation will mitigate incidents before they happen.”
But here's a hard truth to swallow: AI only automates the mechanical parts of reliability, the machine in the machine.
The hard parts haven't changed at all. You still need teams with clarity on system boundaries. You still need consistent approaches to resolution. You still need postmortems that drive learning rather than blame.
AI doesn't fix any of that. If anything, it exposes every organizational gap we've been ignoring. And that's exactly why I wanted today's guest on.
Jennifer Petoff is Director of Program Management for Google Cloud Platform and Technical Infrastructure education. Every day, she works with SREs at Google, as well as with SREs at other companies through her public speaking and Google Cloud customer engagements.
Even if you have never touched GCP, you have still been influenced by her work at some point in your SRE career. She is co-editor of Google's original Site Reliability Engineering book from 2016. Yeah, that one!
It was my immense pleasure to have her join me to discuss the internal dynamics behind successful reliability initiatives. Here are 5 highlights from our talk:
3 issues stifling individual SREs' work
To start, I asked Jennifer what kinds of challenges she has seen individual SREs face when attempting to introduce or reinforce reliability improvements within their teams or the broader organization.
She grouped these challenges into 3 main categories:
* Cultural issues (with a look into Westrum's typology of organizational culture)
* Insufficient buy-in from stakeholders
* Inability to communicate the value of reliability work
A key highlight from this topic came from her look at DORA research, an annual survey of thousands of tech professionals and the research upon which the book Accelerate is based.
It showed that organizations with generative cultures have 30% better organizational performance. In other words, you can have the best technology, tools, and processes to get good results, but culture further raises the bar. A generative culture also makes it easier to implement the more technical aspects of DevOps or SRE that are associated with improved organizational performance.
Hands-on is the best kind of training
We then explored structured approaches that ensure consistency, build capability, and deliberately shape reliability culture. As they say, culture eats strategy for breakfast!
One key example Jennifer gave was the hands-on approach they take at Google. She believes that adults learn by doing. In other words, SREs gain confidence by doing hands-on work. Where possible, training programs should move away from passive listening to lectures toward hands-on exercises that mimic real SRE work, especially troubleshooting.
One specific exercise that Google has built internally is Simulating Production Breakages. Engineers undergoing that training have a chance to troubleshoot a real system built for this purpose in a safe environment. The results have been profound: Jennifer's team saw a tremendous amount of confidence reflected in survey results. This confidence is focused on job-related behaviors, which, when repeated over time, reinforce that culture of reliability.
Reliability is mandatory for everybody
Another thing Jennifer told me Google did differently was making reliability a mandatory part of every engineer's curriculum, not only SREs'.
When we first spun up the SRE Education team, our focus was squarely on our SREs. However, that's like preaching to the choir. SREs are usually bought into reliability. A few years in, our leadership was interested in propagating the reliability-focused culture of SRE to all of Google's development teams, a challenge an order of magnitude greater than training SREs.
How did they achieve this mandate?
* They developed a short and engaging (and mandatory) production safety training
* That training has now been taken by tens of thousands of Googlers
* Jennifer attributes this initiative's success to how they “SRE'ed the program”. “We ran a canary followed by a progressive roll-out. We instituted monitoring and set up feedback loops so that we could learn and drive continuous improvement.”
The result of this massive effort? A very respectable 80%+ net promoter score with open text feedback: “best required training ever.”
What made this program successful is that Jennifer and her team SRE'd its design and iterative improvement. You can learn more about “How to SRE anything” (from work to life) using her rubric: https://www.reliablepgm.com/how-to-sre-anything/
Reliability gets rewarded just like feature work
Jennifer then talked about how Google mitigates a risk that I think every reliability engineer wishes could be solved at their organization: great reliability work not being rewarded at the same level as great feature work.
For development and operations teams alike at Google, this means making sure “grungy work” like tech debt reduction, automation, and other activities that improve reliability is rewarded equally to shiny new product features. Organizational reward programs that recognize outstanding work typically have committees. These committees not only look for excellent feature development work, but also reward and celebrate foundational activities that improve reliability. This is explicitly built into the rubric for judging award submissions.
Keep a scorecard of reliability performance
Jennifer gave another example of how Google judges reliability performance, this time specifically for SRE teams. Google's Production Excellence (ProdEx) program was created in 2015 to assess and improve production excellence (aka reliability improvements) across SRE teams.
ProdEx acts like a central scorecard that aggregates metrics from various production health domains to provide a comprehensive overview of an SRE team's health and the reliability of the services they manage. Here are some specifics from the program:
* Domains include SLOs, on-call workload, alerting quality, and postmortem discipline
* Reviews are conducted live every few quarters by senior SREs (directors or principal engineers) who are not part of the team's direct leadership
* There is a focus on coaching and accountability without shame (to foster psychological safety)
ProdEx serves various levels of the SRE organization through:
* providing strategic situational awareness regarding organizational and system health to leadership and
* keeping forward momentum around reliability and surfacing team-level issues early to support engineers in addressing them
Wrapping up
Having an inside view of reliability mechanisms within a few large organizations, I know that few are actively doing all, or sometimes any, of the reliability enhancers that Google uses and Jennifer has graciously shared with us. It's time to get the ball rolling. What will you do today to make it happen?
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com
Connect with Sabahudin: https://www.linkedin.com/in/sabahudin-murtic/
In this episode of the Shift AI Podcast, Shri Santhanam, EVP and GM at Experian, joins host Boaz Ashkenazy to explore how one of the world's largest data companies is implementing AI at scale. From his early days at Stanford to pioneering Silicon Valley-style innovation in large enterprises, Shri shares insights into Experian's transformation from a traditional credit bureau to a data-backed platform company. The conversation delves into the development of the Experian Assistant, an AI-powered tool revolutionizing how businesses interact with complex financial data, and the critical role of trust in AI adoption. If you're interested in how major financial institutions are navigating the AI revolution while maintaining data privacy and ethical standards, this episode offers invaluable perspectives on the future of trusted AI agents in the workplace.
Chapters:
[00:00] The Trust Challenge in AI
[01:00] Introduction to Shri Santhanam
[02:39] Career Journey and Silicon Valley Innovation
[04:08] Experian's AI Evolution
[06:43] Experian's Business Transformation
[10:44] The Experian Assistant
[20:01] Internal AI Implementation
[23:48] Technical Infrastructure and Model Selection
[28:36] The Future: Trusted Agents
Connect with Shri Santhanam
LinkedIn: https://www.linkedin.com/in/shriramsanthanam
Connect with Boaz Ashkenazy
LinkedIn: https://www.linkedin.com/in/boazashkenazy
X: boazashkenazy
Email: shift@augmentedailabs.com
Higher Ed AV Podcast
Episode 216
Mike Pedersen welcomes Chris Hewitt, Manager of Classroom Technical Infrastructure at the University of Guelph, to the show.
Connect with Chris Hewitt:
Twitter: https://twitter.com/chrishewitt79
LinkedIn: https://www.linkedin.com/in/chewitt1/
Connect with Mike Pedersen:
Twitter: https://twitter.com/ped1971
LinkedIn: https://www.linkedin.com/in/mike-pedersen-cts-d-cts-i-7477191/
In this episode, Jennifer Petoff, Director of Program Management of Google Cloud Platform and Technical Infrastructure Education, joins us to discuss her pivot from chemistry to tech, the culture shock that came with it, as well as what it's like working for a place like Google.
How to Challenge Stereotypical Views Of What A Physicist Looks Like
“Limit Less is the campaign to encourage and inspire young people from underrepresented backgrounds to continue with physics after the age of 16.” This is the first episode of our 3-part boxset series in partnership with the Institute of Physics, whose Limit Less campaign challenges stereotypical views of what a physicist looks like. Simone Roche MBE chats to Dawn Watson, who is Head of Strategy and Technical Infrastructure at Sellafield in Cumbria.
Listen to learn:
⚡How physics can be for everyone
⚡About Dawn's journey into the realm of physics
⚡What Dawn has learnt about working in physics through initiatives and campaigns that she has been involved in
Find Dawn here. Find Sellafield here. Find out more about the Limit Less Campaign here. Watch our IOP webinar here. Sign up to our Power Platform to check out our events calendar here. Keep up to date on the latest news from Northern Power Women: Twitter, LinkedIn, Instagram & Facebook. Sign up to our newsletter.
Scott Pross, Vice President of Technology at Monalytic, a SolarWinds company, and Brett Littrell, Chief Technology Officer of Alum Rock Union School District, discuss real-world solutions using school district case studies. Learn how Alum Rock Union School District transformed its aging technical infrastructure into a modern network with SolarWinds to meet the needs of its students and staff.
Digital business academy Crucial Constructs is adding to its "How To" series about working as a digital nomad with a new report on day-to-day working and living on Tenerife in the Canary Islands. Learn more at https://crucialconstructs.com/how-to-be-a-digital-nomad-in-tenerife-canary-islands
The most innovative businesses in the world run on AWS. Top companies in every industry, from automobiles to healthcare to telecom, all use AWS to build sophisticated applications that power the growth and future of their businesses.
But with more than 250 products and services, keeping up with all of AWS's innovations, and making sure you are using them to their fullest potential, can be tricky.
That's why we created this podcast: AWS Insiders.
In each episode, we will explain how today's tech leaders can stay ahead of Amazon's constantly evolving pace of innovation. You'll hear from Amazon's top product managers as they detail their secrets and strategies to reduce costs, save time, and improve performance.
You will also hear from top companies who are using AWS to drive incredible innovation.
We will share hands-on tips for making AWS easier and detail why AWS is the operating system of the future.
Welcome to AWS Insiders: Secrets and strategies from the smartest minds in AWS.
Hosted by AWS superfan and ESW Capital CTO Rahul Subramaniam, and powered by the team at CloudFix.
AWS Insiders is available anywhere you get your podcasts.
On the show this week, Mark Mirchandani joins Stephanie Wong to talk about serverless computing and the Cloud OnAir Serverless event with our guests. Aparna Sinha and Philip Beevers start the show giving us a thorough definition of serverless infrastructures and how this setup can help clients run efficient and cost-effective projects with easy scalability and observability. Serverless has grown exponentially over the last decade, and Aparna talks about how that trajectory will continue in the future. At its core, the serverless structure allows large enterprise companies to do what they need to do, from analyzing real time information to ensuring dinner is delivered piping hot. Aparna describes the three aspects of next generation serverless (developer centricity, versatility, and built-in best practices) and how Google is using these to empower developers and company employees to create robust projects efficiently and economically. Phil tells us about the experience of using serverless products and the success of the three pillars in Google serverless offerings. Enterprise customers like MediaMarktSaturn and Ikea are taking advantage of the serverless system for e-commerce, data processing, machine learning, and more. Our guests describe client experiences and how customer feedback is used to help improve Google serverless tools. With so many serverless tools available, our guests offer advice on choosing the right products for your project. We also hear all about the upcoming Cloud On Air event and what participants can expect, from product announcements and live demos to thorough reviews of recently added serverless features.
Aparna Sinha
Aparna Sinha is Director of Product at Google Cloud and the product leader for Serverless Application Development and DevOps. She is passionate about transforming businesses through faster, safer software delivery. Previously, Aparna helped grow Kubernetes into a widely adopted platform across industries. Aparna holds a PhD in Electrical Engineering from Stanford. She is Chair of the Governing Board of the Cloud Native Computing Foundation (CNCF). She lives in Palo Alto with her husband and two kids.
Philip Beevers
Phil has been at Google for seven years. He currently leads the Serverless Engineering teams and previously ran the Site Reliability Engineering team for Google Cloud and Google’s internal Technical Infrastructure. Phil holds a BA in Mathematics from Oxford University.
Cool things of the week
The evolution of Kubernetes networking with the GKE Gateway controller blog
Network Performance for all of Google Cloud in Performance Dashboard site
Go from Database to Dashboard with BigQuery and Looker blog
Introducing Open Saves: Open-source cloud-native storage for games blog
Interview
Cloud Run site
Cloud Functions site
Serverless Computing site
The power of Serverless: Get more done easily site
App Engine site
Building Serverless Applications with Google Cloud Run book
MediaMarktSaturn site
Ikea site
Airbus site
Veolia site
Sound Effects Attribution
“Fanfare1” by N2P5 of Freesound.org
“Banjo Opener” by Simanays of Freesound.org
David Finn, EVP of Strategic Innovation at CynergisTek, talks with Jesse Fasolo, the Director of Technical Infrastructure and Cybersecurity at Saint Joseph's Hospital in Paterson, NJ, about how Jesse built (and continues to build) a successful security program over the last 6 years. Subscribe to CTEK Voices: The Risk Perspective on Apple iTunes, Spotify, Stitcher, or your preferred podcast platform. New episodes are released weekly and a transcript of each episode can be found at cynergistek.com.
In today's episode I chat with Asad Malik, who I first met at my workshop in London. Asad started his journey at LSE and began his career at Goldman Sachs in the Emerging Markets Equity team in London. He then transitioned to Google where he helps the firm with Strategy, Investments and FP&A for the Technical Infrastructure team with a focus on machine supply chains, robotics and automation. Asad is a new father as well and on this episode we chat about his journey and some of the struggles he faced in making the transition from pure finance to the wonderful world of tech. Tune in, it's Story Time! For more information visit sherjan.com.
General
Subscribe to Fully Vested at FullyVested.co or through your podcast app of choice.
Links
Out Of Office: 65+ Startups Helping You Work From Home (CB Insights)
COVID-19 proves we need to continue upgrading America’s broadband infrastructure (Blair Levin for the Brookings Institution)
Keeping our network infrastructure strong amid COVID-19 (Urs Hölzle, Senior Vice President, Technical Infrastructure for Google, publishing to Blog.Google)
About The Co-Hosts
Jason D. Rowley is a researcher and writer, volunteers with the Python Software Foundation as an organizer of Startup Row at PyCon US, and sends occasional newsletters from Rowley.Report.
Graham C. Peck is a Venture Partner with Cultivation Capital and additionally helps companies build technology development teams in partnership with Brightgrove and other technology development organizations.
Today, the IT industry accounts for about 2 percent of total greenhouse gas emissions, comparable to the footprint of air travel. Will IT emissions eclipse those of air travel one day soon? Urs Hölzle thinks the clear answer is “no”: he says IT energy use will decrease, and perhaps decrease significantly, over the next decade. Find out why. Hölzle is Senior Vice President of Technical Infrastructure & Google Fellow and oversees the design and operation of the servers, networks, and data centers that power Google's services, as well as the development of the software infrastructure used by Google’s applications. Series: "Institute for Energy Efficiency" [Science] [Show ID: 33271]
When you’re building a new product, you have to experiment quickly and change constantly. If your product is digital, and you have a technical infrastructure that isn’t built to deal with these conditions, it can stonewall any kind of innovation. In this talk for technical team members, Codeship co-founder Florian Motlik introduces different ways to build your infrastructure and processes for constant change, experimentation and innovation.
Vanessa discusses a recent Search Engine Land post entitled "How Twitter's Technical Infrastructure Issues Are Impacting Google Search Results," in which she describes how technical infrastructure and architecture can affect search.