This episode features Google Technical Program Manager (TPM) Karanveer Anand, who joins our hosts to discuss the unique role of TPMs in Site Reliability Engineering (SRE). The conversation highlights how SRE TPMs bridge the gap between technical details and business impact, managing complex projects with inter-team dependencies and ensuring system reliability, particularly in the rapidly evolving AI landscape.
In episode 175 of Kubicast, we welcome specialist Luriel Santana for a duel of ideas between DevOps and Site Reliability Engineering (SRE). Between coffee and laughs, we dig into organizational culture, infrastructure automation, reliability metrics, and field practices ranging from data centers in Angola to modern cloud pipelines.
1. The Landscape: DevOps and SRE in the Market
Since it emerged, the DevOps movement has brought speed and tighter integration between development and operations teams. SRE, conceived at Google, raised the bar by introducing clear metrics (SLIs, SLOs, and SLAs) and error-management processes. In this battle there is no single winner: DevOps speeds up delivery; SRE ensures it happens without interruptions.
2. Field Lessons in Angola
Luriel shared his adventures in physical data centers, running Linux and configuring Cisco routers in one of the most challenging regions of the African continent. The message was clear: without at least minimal automation, keeping servers running in extreme conditions becomes a bottleneck. That was where we learned the importance of Infrastructure as Code and of versioning configurations.
3. Culture vs. Tooling
Teams often fall in love with tools and forget about culture. We discussed how CI/CD pipelines, containers, and Kubernetes orchestration only make sense when there is a mindset of collaboration and shared responsibility. Otherwise, they become just another "bag of tricks" without consistent results.
4. Reliability Metrics: SLOs and SLIs in Practice
We explored examples of SLOs for critical applications and saw that defining acceptable error limits is as much art as science. We talked about the trade-offs between speed and stability, and about how incident routing can rely on well-configured dashboards, without forgetting alerts designed to avoid alert fatigue.
5. Pandemic and Accelerated Adoption
The global crisis pushed many companies to the cloud and to automation practices. We discussed how remote work reinforced the need for automation and resilient infrastructure, and reflected on cases of pipelines that were born in a matter of days to handle unexpected peaks.
Conclusion and Next Steps
We left this episode with one certainty: DevOps and SRE are not antagonists but partners in the journey of delivering software with speed and reliability. If you are just starting out, begin by defining your SLIs. For veterans, the tip is to revisit processes and invest in culture.
Links and Recommendations:
Connect with Luriel Santana on LinkedIn: https://www.linkedin.com/in/lurielsantana/
João Brito - https://www.linkedin.com/in/juniorjbn
Watch the film TEArapia - https://youtu.be/M4QFmW_HZh0?si=HIXBDWZJ8yPbpflM
Learn more about DevOps Days Feira de Santana: https://www.devopsdays.org/events/2025-feira-de-santana/
Check out the Pro Evolua channel: https://www.youtube.com/c/ProEvolua
Discover the Zero CVE Project (Getup): https://getup.io/zerocve
Join our early access program and get a more secure environment in moments! https://getup.io/zerocve
In this "bumpisode", hosts and producers of Prodcast (including our new co-host, Matt Siegler!) reflect on the previous season and introduce the new season's focus on upcoming trends in Site Reliability Engineering (SRE) and AI, and the friends we make along the way. They also introduce new elements we are bringing in with Season 4, such as a video format and a feedback form.
Mike is joined by Dan Ruby, VP of Marketing at Nobl9, a leading reliability platform that helps manage and monitor application reliability. Dan discusses the challenges of marketing a product that aims to keep issues unnoticed by end users and how storytelling can make a traditionally "unexciting" product compelling and engaging. The conversation also covers the importance of data-driven marketing, balancing brand building with lead generation, and innovative campaign strategies.
About Nobl9
Founded in 2019 by ex-Googlers Marcin Kurc and Brian Singer, Nobl9 is the premier Service Level Objectives-based platform for driving a reliable digital experience. With a strong enterprise customer base as well as strategic investments from Cisco and ServiceNow, Nobl9 is recognized as a bleeding-edge solution to modernizing Site Reliability Engineering (SRE) strategies, ensuring that reliability is not measured primarily by availability, but rather by users' ability to do what they expect to be able to do within an application.
About Dan Ruby
Dan is an eighteen-year veteran of digital marketing, with the vast majority of his experience coming as the head of marketing for various B2B SaaS organizations in the Boston area. He has been acquired at various points by Google and Snap, and is currently the VP of Marketing for Nobl9, a B2B SaaS platform for user-centric site reliability. He holds a Bachelor of Journalism from the University of Missouri as well as an MBA from Brandeis University. He occasionally teaches an undergraduate course on marketing at Bentley University. Throughout his career, Dan has become increasingly stubborn about the fact that marketing must focus on creating value for potential leads, and is quite fond of telling anyone who will listen that "nobody gives a **** about your product, give them valuable information, not product pitches."
Time Stamps
00:00:42 - Dan Ruby's Career Journey
00:02:09 - Overview of Nobl9
00:05:48 - Challenges in Marketing a Reliability Product
00:07:03 - Using Stories to Make Marketing Exciting
00:12:43 - Balancing Brand Building and Lead Generation
00:17:07 - Innovative Campaign Example: DORA
00:22:24 - The Importance of Partnerships in Marketing
00:22:41 - Best Marketing Advice Received
00:23:41 - Advice for New Marketing Professionals
00:25:44 - How to Contact Dan Ruby
00:26:18 - Closing Remarks
Quotes
"Marketing is such an interesting field. It takes pretty much any skill set and makes it useful." Dan Ruby, VP of Marketing at Nobl9
"Nothing is boring if you can make it into a story that resonates." Dan Ruby, VP of Marketing at Nobl9
"You can find partners who believe in your product, believe in your company, believe in your people, who will work with you." Dan Ruby, VP of Marketing at Nobl9
Follow Dan:
Dan Ruby on LinkedIn: https://www.linkedin.com/in/danielruby/
Nobl9's website: https://www.nobl9.com/
Nobl9 on LinkedIn: https://www.linkedin.com/company/nobl9inc/
Follow Mike:
Mike Maynard on LinkedIn: https://www.linkedin.com/in/mikemaynard/
Napier website: https://www.napierb2b.com/
Napier LinkedIn: https://www.linkedin.com/company/napier-partnership-limited/
If you enjoyed this episode, be sure to subscribe to our podcast for more discussions about the latest in Marketing B2B Tech and connect with us on social media to stay updated on upcoming episodes. We'd also appreciate it if you could leave us a review on your favourite podcast platform.
Want more? Check out Napier's other podcast - The Marketing Automation Moment: https://podcasts.apple.com/ua/podcast/the-marketing-automation-moment-podcast/id1659211547
This interview was recorded at GOTO Copenhagen for GOTO Unscripted. http://gotopia.tech
Read the full transcription of this interview here
Liz Fong-Jones - Field CTO at Honeycomb.io
Marit van Dijk - Developer Advocate at JetBrains & Open Source Contributor
RESOURCES
Liz
https://twitter.com/lizthegrey
https://linkedin.com/in/efong
https://www.lizthegrey.com
Marit
https://twitter.com/MaritvanDijk77
https://linkedin.com/in/maritvandijk
https://mastodon.social/@maritvandijk
https://github.com/mlvandijk
https://medium.com/@mlvandijk
https://maritvandijk.com
DESCRIPTION
Explore the intricacies of efficient development collaboration and gain valuable insights into Site Reliability Engineering (SRE) strategies in this engaging conversation.
Liz Fong-Jones and Marit van Dijk delve into the challenges developers face, emphasizing streamlined communication and workflow optimization. From managing software dependencies to the evolving role of SRE teams, they share practical experiences and thoughts on building internal platforms, shedding light on the collaborative dynamics that shape successful development endeavors.
Discover how embracing effective communication and proven SRE practices can pave the way for improved team efficiency and impactful software development outcomes.
RECOMMENDED BOOKS
Charity Majors, Liz Fong-Jones & George Miranda • Observability Engineering
Beyer, Murphy, Rensin, Kawahara & Thorne • The Site Reliability Workbook
Kelly Shortridge & Aaron Rinehart • Security Chaos Engineering
Nora Jones & Casey Rosenthal • Chaos Engineering
Russ Miles • Learning Chaos Engineering
Mark Seemann & Steven van Deursen • Dependency Injection Principles, Practices & Patterns
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
In this episode Bob and Randy invite Dan Salinas and Sarv Shah from Nobl9 to dive deep into the complexities of Site Reliability Engineering (SRE) and Service Level Objectives (SLOs). Discover the origins of SRE, the significance of SLOs in improving customer experience, and the impact of digital reliability on businesses today. From the challenges of maintaining microservices to the advent of cloud dependency, this episode is packed with insights on ensuring operational excellence in our digital world.
While getting to a cloud native development pattern is a goal for most organizations, it can be a significant journey to transform both infrastructure and processes. Analyst Carl Lehmann joins host Eric Hanselman to explore the paths that can move enterprises forward. DevOps approaches can speed development, Site Reliability Engineering (SRE) can change ways of managing risk and platform engineering can simplify tool sets, but adoption does not always follow a straight line.
In this first episode of Season 5, we chat with Pelado Nerd, a well-known SRE and YouTube content creator. We explore his path from his beginnings to his success on YouTube, as well as his experience in the world of Site Reliability Engineering (SRE). We discuss the day-to-day of an SRE and the essential tools for the role, among other topics.
Table of Contents
01:34 Guest intro, the origins of Pelado...
03:50 Your side as a content creator
11:00 Applying what you learned on YouTube, and vice versa
14:00 Balancing exercise with work / improving productivity
18:30 The day-to-day of an SRE
25:47 The 3 essential tools for an SRE
27:04 The big advantage of Kubernetes
31:21 Kubernetes is NOT the golden option for everything
33:45 Launching 300 nodes... in 30 min!
35:12 Discovering warm-ups
40:13 Scary stories: goodbye to the certificates
44:28 Advice for future SREs
48:30 Why do you want Jenkins? Use Dagger.
52:20 Cluster scaling with Karpenter
55:20 Lambdas in containers
58:12 The future of K8s and the third wave of containers: WASM
1:01:45 The impact of AI on infrastructure
1:04:50 Final recommendations
Guest's Social Media
Twitter: https://twitter.com/peladonerd
YouTube: https://www.youtube.com/@PeladoNerd
LinkedIn: https://www.linkedin.com/in/pablofredrikson/
Videos Mentioned
Docker from Novice to Pro: https://www.youtube.com/watch?v=CV_Uf3Dq-EU&t=115s
Introduction to Dagger: https://www.youtube.com/watch?v=lGl1UlcODLQ
WASM, the third wave of containers: https://www.youtube.com/watch?v=bgWTf3m6HG0
LENS, the best interface for K8s: https://www.youtube.com/watch?v=DFMKcR4BqwM
Crossplane, better than Terraform? https://www.youtube.com/watch?v=dWbEvHOtljg&t=129s
Recommendations
Book: Time Management for System Administrators: https://amzn.eu/d/fL7FiUl
Book: Site Reliability Engineering (free): https://sre.google/books/
Pelado Entrena channel, the challenge of running a marathon: https://www.youtube.com/@PeladoEntrena
✉️ If you would like to write to us, you can reach us at: podcast-aws-espanol@amazon.com
You can find the podcast at this link: https://aws-espanol.buzzsprout.com/
Or on your favorite podcast platform
More information and tutorials on the Charlas Técnicas YouTube channel
#foobar #AWSenEspañol
Explore the evolving landscape of Site Reliability Engineering (SRE) and Ownership in the latest episode of Evo Nordics. Hosted by Georgia Benton, this episode features insights from Christian Holmboe, Engineering Manager at Volvo Cars, Alex Ewerlöf, Senior Staff Engineer also at Volvo Cars, and Jens Rantil, Senior Software Engineer. Dive into discussions on fostering ownership in engineering teams and implementing SRE practices for optimal performance, only on Evo Nordics.
IT-Management Podcast | For the service management nerd in you.
Site Reliability Engineering (SRE) is a discipline that combines a deep understanding of software engineering with a strong focus on reliability and operational stability. Originally developed at Google, SRE aims to close the gap between software development and operations by applying engineering principles to operational tasks. SRE teams are responsible for ensuring the scalability, performance, and resilience of services while also supporting the rapid development and delivery of new features. They use a range of methods, such as automation and continuous integration/delivery, to reduce manual work and minimize sources of error. Today I talk with Alex Lichtenberger about exactly these methods and about SRE itself.
Intro Allison Durham Focus: Exploring AI, Software Development, and the Human Mind What is the Human Mind? Allison doesn't make a distinction between the brain and the mind. She sees the mind as a dynamic range of cognitive experiences that include thoughts, perception, and self-awareness. The mind exists alongside the human experience and is fully integrated with bodily sensations. On Consciousness Allison discusses the topic of consciousness, noting that awareness can vary in its intensity. She mentions an intriguing question: Can awareness exist without the brain? She recalls an interesting conversation with a friend who asked her about consciousness and awareness. The Experience of Dreams Allison describes a dream she had that was "rooted in Earth," contrasting it with another dream featuring a monstrous, otherworldly creature. She emphasizes her ability to fully visualize experiences in her dreams, even though she struggles with visualization in her waking life. Aphantasia and Visualization Allison brings up the concept of Aphantasia, where people have difficulty visualizing images. She explores the idea that visualization might be trainable, mentioning techniques such as the "candle technique" to improve the skill. She notes that most people can recall memories with images, yet often have less developed recall for other senses such as smell and hearing. Software Development and AI Allison talks about Rust, a systems-level programming language she enjoys using. She delves into the concept of Site Reliability Engineering (SRE), explaining it stems from Google's earlier operations methods. She praises GitLab for packaging all the tools needed for DevOps, making it more accessible. She explores the concept of MLOps, which focuses on getting machine learning models into production. She finds the speed of open-source AI development both exciting and challenging, noting that problems can't be fully solved before new ones appear.
Personal Psychology Framework Allison discusses her psychological framework, leaning heavily on mindfulness-based tactics. She believes in being fully aware of one's thoughts and emotional state, and she finds this awareness essential for taking proper action in life. Final Thoughts She mentions her website, AdjectiveAllison.com, and her social media handle, AdjectiveAllison on X. Time Stamps: 2:30 - Discussing the nature of the mind and its relationship to the brain and awareness 5:00 - Allison explains her experience with aphantasia 7:30 - Stuart talks about training himself to visualize through meditation 9:00 - Whether imagination and visualization can be trained as skills 11:00 - Allison's perspective on not training her own visualization abilities right now 12:00 - Allison's interest in learning Rust programming language 14:00 - Using ChatGPT to assist with engineering problems as a "rubber duck debugger" 16:00 - Explanation of DevOps, APIs, serverless solutions like Repl.it 19:00 - How AI may or may not change API and engineering architectures 21:00 - Automation as connecting APIs; engineers building instead of using no-code 23:00 - AI unlikely to change API interface itself, complexity happens behind it 24:00 - Allison's favorite psychological framework is mindfulness 25:30 - Aligning with specific frameworks depending on the problem
Over the past five to ten years, the testing of microservices has seen significant growth. This surge in testing can be attributed to the increasing adoption of microservices and Kubernetes, which signify a shift away from monolithic application architectures. Bruno Lopes, a leader at Kubernetes company incubator Kubeshop, noted this trend. Kubeshop has initiated six Kubernetes projects, including TestKube, a Kubernetes native testing framework led by Lopes.
This rise in testing is making it more accessible to a wider audience and is enhancing the developer experience through automation. Developers now have more time to focus on innovation rather than manual testing. However, there is often a disconnect between development and testing, as developers move quickly, outpacing organizational adaptation to modern testing methods.
Lopes emphasized the importance of testing before production deployment and advocated for creating production-resembling testing environments that allow for rapid deployment without waiting for manual tests. This approach is particularly critical for Site Reliability Engineering (SRE) teams who need to respond quickly to issues and minimize downtime for customers. In some cases, it's necessary to run tests within Kubernetes itself, a concept that may take time for companies to fully embrace as the developer experience continues to improve.
Learn more from The New Stack about Kubernetes, Testing and TestKube:
Testkube: A Cloud Native Testing Framework for Kubernetes
Top 5 Challenges in Modern Kubernetes Testing
Why You Should Start Testing in the Cloud Native Way
Today we talk to our own Drew Rogers about some of the nuanced aspects of Site Reliability Engineering (SRE) and Development Operations (DevOps) within the rapidly evolving domain of sports betting. The post TechChat Tuesdays #66: The DevOps of Sports Betting with Drew Rogers appeared first on Chariot Solutions.
On this episode of DevOps Toolchain, host Joe Colantonio interviews Vlad Ukis, the head of R&D for Siemens Healthineers, about the implementation and benefits of Site Reliability Engineering (SRE). Vlad emphasizes the importance of involving product management, product development, and product operations from the beginning to ensure the success of SRE in an organization. He discusses how to prioritize and communicate the importance of SRE in large organizations with competing initiatives and how introducing a role like SRE and creating a community of practice can facilitate cross-pollination of ideas and best practices. Vlad also dives into the concept of Service Level Objectives (SLOs), their importance in managing services, and the process of defining them by bringing together different teams. He shares his experience introducing SRE in a healthcare domain within a medical device vendor and addresses the challenge of orchestrating organizational buy-in for SRE. Vlad highlights the need for unique approaches to engaging each party in the organization and stresses the importance of culture in implementing new processes at scale. Listeners are encouraged to check out Vlad's book, 'Establishing SRE Foundations.' The interview provides valuable insights into the changes and efforts required for successful SRE implementation and the shift in mindset towards prioritizing reliability. Vlad also discusses the role of coaching and learning over time and the transformation of traditional product management, development, and operations models in the software-as-a-service world. The episode concludes with a discussion on the definition and practice of SRE, its role within an organization, and the potential creation of new positions. Don't miss out on this informative and thought-provoking episode featuring Vlad Ukis, a true expert in SRE and continuous delivery.
We discuss throughout this episode the different engagement models for Site Reliability Engineering (SRE) and how to contextualize SRE into an organization's structure. Sebastian Vietz, an experienced SRE practitioner, suggests five different engagement models for SRE and emphasizes the importance of considering the cost associated with each model. The hosts also discuss the different types of SREs that can exist within these engagement models, including SRE champions and unicorns. They stress the importance of considering organizational context when implementing SRE and tease a future episode where they will delve deeper into a framework for identifying the capabilities needed to solve SRE-related problems.
Timestamps of key concepts
Where and how SRE fits into an organization [00:00:20]
We discuss the importance of considering organizational context when implementing SRE and explore different engagement models for SRE.
Center of Excellence for Reliability Engineering [00:02:14]
We discuss the idea of a center of excellence for reliability engineering, where a few practitioners take on an advisory role for the organization.
Embedded SREs [00:04:14]
We discuss the idea of embedding SREs into teams, where each team has an embedded SRE whose focus is to implement reliability engineering principles and best practices.
Five SRE Engagement Models [00:08:23]
We discuss five different engagement models for SRE, including embedded SREs, a center of excellence, and a consulting or ambassador model.
Types of SREs [00:10:25]
We discuss different personas that an SRE can take, including champions, advocates, and unicorns.
Unicorn SREs [00:13:50]
We discuss the rare and sought-after unicorn SREs, who have extensive experience and exposure to different business domains and contexts.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit srepath.substack.com
In this episode, we are joined by Mersedes, aka @blkwomenread. We talk about her journey from working at one of the country's busiest Domino's, to a call center, to today. These days Mersedes works as a Systems Engineer/Monitoring Engineer, which lines up very nicely with her strong interest in the field of Site Reliability Engineering (SRE). We dive into the nitty-gritty of her transition from fast food worker to network engineer and learn what her day-to-day looks like in her current role. This was a great roundtable discussion, suitable for anyone wondering about or currently working in the cloud!
How to connect with Mersedes:
Twitter: [https://twitter.com/blkwomenread]
YouTube: [https://www.youtube.com/@blkwomenread]
Twitch: [https://www.twitch.tv/techsavvysadie]
LinkedIn: [https://linkedin.com/in/mersedeshenderson]
Topics:
Site Reliability Engineering (SRE) - https://sre.google
Books to look out for -
Building Secure and Reliable Systems
The Site Reliability Workbook
Site Reliability Engineering
Check out the Fortnightly Cloud Networking News
Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on Twitter: https://twitter.com/cables2clouds
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj
Art of Network Engineering (AONE): https://artofnetworkengineering.com
This episode discusses how Site Reliability Engineering (SRE) can be important to organizations. SRE can optimize software operations, reduce costs, support revenue-driving areas, mitigate risks, improve cybersecurity, and enhance customer experiences. We will also cover how to integrate SRE into the organization's culture for continuous improvement and innovation.
In this episode of SREpath, Ash and Sebastian discuss the unnecessary debate surrounding Site Reliability Engineering (SRE), DevOps, and platform engineering. They argue that these disciplines should not be pitted against each other, but rather seen as complementary and able to coexist within an organization. The focus should be on continuous improvement, learning from failures, and making things better. The hosts emphasize that practitioners in all three areas share the common goal of improvement and should collaborate rather than compete. They briefly distinguish SRE as focusing on system reliability and scalability, DevOps on collaboration and automation, and platform engineering on building and maintaining infrastructure. The decision to establish dedicated teams for each discipline depends on the organization's scale and needs. The hosts encourage a context-driven approach, where individuals from diverse backgrounds and skill sets can contribute to the SRE field. Ultimately, the key is to prioritize improvement and learning, regardless of labels or titles.
What is Web3, and is it a step toward decentralized interactions between people? How does Ramp Network work, and how does it help people get into and out of the Web3 world? Naturally, the conversation also touches on Site Reliability Engineering (SRE), fintech security, and why Paweł Dawidowicz cannot imagine building a startup on-premise. Listen closely, because we recommend a free downloadable book on SRE, and our guest offers a glimpse of how Ramp Network implements Zero Trust.
In this episode of the SREpath podcast, Ash and Sebastian explore what Site Reliability Engineering (SRE) is and how it manifests in a highly functional organization. We also cover the controversial issue of what SRE is not.
Welcome to the first episode of the SREpath podcast! In this episode, we'll introduce you to our podcast hosts and give you their broad-level view of Site Reliability Engineering (SRE). We'll also share some points about how we'll be running future episodes. Whether you're an SRE expert or new to the field, this episode will provide valuable insights into SRE and what you can expect from our podcast series.
const podcast = {
  episode: 228,
  title: 'Web Apps and Site Reliability Engineering',
  topics: ['reliability', 'web apps', 'user focused'],
  guest: 'Brian Love',
  hosts: ['John Papa', 'Ward Bell']
};
Recording date: March 23, 2023
John Papa @John_Papa
Ward Bell @WardBell
Dan Wahlin @DanWahlin
Craig Shoemaker @craigshoemaker
Brian Love @Brian_love
Brought to you by
AG Grid
IdeaBlade
Resources:
Google Books on SRE
What is SRE
Introduction to Site Reliability Engineering (SRE)
Reliable systems in DevOps
Ping test
Voting with your feet
What is an SLA
Service Level Objectives and Indicators
SLA vs SLO vs SLI
SLIs, SLOs, and SLAs, oh my
Interview with Dave Rensin, SRE Engineering Director on the SRE Workbook
The Origins of SRE
What it means to be a SRE
Get Polaris (SRE tool)
Send Beacon API
GitHub Copilot X
Prompt Engineering
Learn with Introduction to Prompt Engineering
Timejumps
00:29 Welcome
01:37 Guest introduction
02:55 What is SRE?
05:38 What is it like if you don't have an SRE?
09:29 Sponsor: AG Grid
10:36 Available vs reliable
13:35 Is SRE the same as health monitoring?
21:29 Sponsor: IdeaBlade
22:30 How do I make sure I don't cause more reliability issues?
27:36 Who's providing the infrastructure?
31:04 Where's the AI in all of this?
33:59 Final thoughts
Podcast editing on this episode done by Chris Enns of Lemon Productions.
You have a CISO (Chief Information Security Officer) but no CRO (Chief Reliability Officer)? You blame people if systems crash? You scale your people at the same rate as your infrastructure? If you answer any of those questions with YES, then you should tune into this podcast, as you probably struggle with adopting Site Reliability Engineering (SRE) in your organization.
James Brookbank, Cloud Solutions Architect, dealt with resiliency topics in a large enterprise prior to joining Google. In our conversation he shares the advice he gives enterprises to convert the excitement about SRE into actual implementation. James gave some good guidance on which projects are good, and which not so good, to start with. He gives practical examples of what it means to change your company culture and why there doesn't have to be an SRE for every service.
In our call we discussed the SRE in Enterprise talk at DevOpsDays Boston and SRECon EMEA as well as their recent book. Here are all the relevant links:
James Brookbank on LinkedIn: https://www.linkedin.com/in/jamesbrookbank/
SRECon EMEA Slides: https://www.usenix.org/system/files/srecon22_slides_mcghee.pdf
DevOpsDays Boston 2022 Session Recording: https://www.youtube.com/watch?v=__e7b25QOHc
Enterprise Roadmap to SRE Book: https://sre.google/resources/practices-and-processes/enterprise-roadmap-to-sre/
Maggie Johnson-Pint from Stanza sits down with Amal & Divya for a deep-dive into the production side of the development world. If you're at all curious (and/or intimidated) by terms like Site Reliability Engineering (SRE), Service Level Objective (SLO), OpenTelemetry, distributed tracing, and the like… this episode's for you!
Maarten is in conversation with Ramón Medrano, Senior Staff Site Reliability Engineer at Google. In this conversation Maarten and Ramón discuss how the principles and practices of Site Reliability Engineering (SRE) can be applied to the practices of Data Reliability Engineering and data quality management. They deep-dive into four topics - SLOs, lineage, debuggability, and how to operate as a team - from the book Site Reliability Engineering: How Google Runs Production Systems, co-authored by Ramón's manager, Jennifer Petoff. As the book explains how Google's SRE team builds, deploys, monitors, and maintains some of the largest software systems in the world, Maarten and Ramón's conversation explores how data practitioners can apply some of the best practices, processes, and thinking when it comes to data and systems.
More about our host, Maarten Masschelein
Read the transcript of this episode
Learn more about the chosen charity, Open Arms
Connect with us on social media: Twitter, LinkedIn, Facebook
From Soda, the provider of data reliability tools and an observability platform that enable data teams to find, analyze, and resolve data issues.
Guest: Steve McGhee, Reliability Advocate, Google Cloud. Topics: What can security teams learn from the Site Reliability Engineering (SRE) art of rapid and safe deployment? Is this all about the process, or do SREs possess some magical technology to do this? What is the SRE approach to automation? What are the pillars / components of the SRE approach to deployment? SRE is also about scaling. Some security teams have to manage 1000s of detection rules; how can this be done in a manner that does not conflict or cause other problems? Resources: Google SRE book; a companion Google SRE workbook; “How We Scale Detection and Response at Google: Automation, Metrics, Toil” (ep75); “Achieving Autonomic Security Operations: Why metrics matter (but not how you think)” blog; “Achieving Autonomic Security Operations: Reducing toil” blog.
Welcome to a new season of the Humans of DevOps Podcast with your host Eveline Oehrlich. In this episode Eveline is joined by DevOps Institute CEO Jayne Groll to discuss Site Reliability Engineering (SRE). Jayne and Eveline discuss the findings of the 2022 Global SRE Pulse report, how SRE came into being, and the developments and frameworks that are leading SRE into the future. Special thanks to our sponsor Range! Enjoy the Humans of DevOps Podcast? We're incredibly grateful to be voted one of the Best 25 DevOps Podcasts by Feedspot. Want access to more DevOps-focused content and learning? When you join SKILup IT Learning you gain the tools, resources and knowledge to help your organization adapt and respond to the challenges of today. And if you're looking for the answers to DevOps' persistent questions, pop on in to SKILup Discussions, one of the fastest-growing DevOps communities around! Have questions, feedback or just want to chat about the podcast? Send us an email at podcast@devopsinstitute.com
This week, Sean sat down with Emily Arnott of Blameless, who is making it her mission to spread “the Gospel of SRE.” Their discussion covered the philosophy underpinning Site Reliability Engineering, its origins in the world of manufacturing, and a few detailed scenarios for how this approach plays out in real-world incident response teams.
This week, host Sean McDermott is speaking with Eveline Oehrlich, Industry Analyst and Chief Research Officer at the DevOps Institute. Eveline is an industry analyst, author, speaker and business advisor focused on digital transformation. They discuss the emerging topic of Site Reliability Engineering, or SRE, and challenges relative to the broad adoption of DevOps. Has site reliability become the next natural extension of DevOps? Operations always lags behind engineering in a "build and ship" world. Can SRE get these two organizations collaborating from the outset, on both processes and outcomes? Join us for this deep-dive into the future of technology.
Curiosity, Focus, and Forging a Path. In this episode of The Outspoken Podcast, host Shana Cosgrove talks to Gerard Spivey, Senior Systems Development Engineer at Amazon Web Services. Gerard speaks in detail about Amazon's interview process, giving us insight into their procedures and how he prepared himself. We also hear about Gerard's time at Amazon and the types of work he's taking on. Side hustles are a way of life for Gerard, and he speaks about his latest experiences managing his YouTube channel, Gerard's Curious Tech. Lastly, Gerard talks about his time at NYLA and how he was able to bring his full self to work thanks to NYLA's culture. QUOTES “I can do slow and steady, I can find my target audience, and then once I have that I can figure out what I want to parlay that into later.” - Gerard Spivey [25:59] “‘I'm a Senior Director [at Intel], and I can do what I want' is basically what he told me. He's like ‘the company has a 3.0 thing, but for someone like you who actually knows what they're talking about it's not a problem.' So I said, ‘Ooh this is my time, they're letting me in'” - Gerard Spivey [42:07] “You're in a good spot in your career when you're valued for the thing you're going to do next versus the thing you did previously. 
What you're going to do next is your competitive value - that is what you bring to the table.” - Gerard Spivey [48:27] TIMESTAMPS [00:04] Intro [01:31] Gerard's Wedding Ceremony [02:32] Working at Amazon Web Services (AWS) [05:33] Amazon's Interview Process [12:06] Gerard's Experience with the Job Market [15:54] Working at Amazon [19:11] Starting a New Job During COVID [19:43] Side Hustles [23:21] Gerard's YouTube Channel [31:08] Gerard's Childhood [31:52] How Gerard Decided to Study Electrical Engineering [34:19] Choosing a College [45:13] Gerard's Advice to his Younger Self [47:42] Favorite Books [50:57] Gerard's Time at NYLA [55:36] Outro RESOURCES https://aws.amazon.com/ec2/ (Amazon EC2) https://aws.amazon.com/ec2/instance-types/ (Amazon EC2 Instance Types) https://aws.amazon.com/dynamodb/ (Amazon DynamoDB) https://sre.google/ (Site Reliability Engineering (SRE)) https://www.c2stechs.com/ (Commercial Cloud Services (C2S)) https://www.thebalancecareers.com/what-is-the-star-interview-response-technique-2061629 (STAR Interview Response Method) https://www.microsoft.com/en-us/microsoft-365/exchange/email (Microsoft Exchange) https://azure.microsoft.com/en-us/ (Microsoft Azure) https://www.synopsys.com/glossary/what-is-cicd.html (CI/CD) https://mlt.org/ (Management Leadership for Tomorrow (MLT)) https://www.hbs.edu/ (Harvard Business School) https://a16z.com/ (Andreessen Horowitz) https://www.youtube.com/ (YouTube) https://www.nsbe.org/K-12/Programs/PCI-Programs (NSBE Pre-College Initiative Program) https://www.jhu.edu/ (Johns Hopkins University) https://www.abet.org/ (Accreditation Board for Engineering and Technology (ABET)) https://www.ncat.edu/ (North Carolina A&T State University) https://www.morgan.edu/ (Morgan State University) https://howard.edu/ (Howard University) https://www.rit.edu/ (Rochester Institute of Technology) https://www.psu.edu/ (Penn State University) 
https://www.digitaltechnologieshub.edu.au/teach-and-assess/classroom-resources/topics/digital-systems/ (Digital Systems) https://www.xilinx.com/products/silicon-devices/fpga/what-is-an-fpga.html (Field Programmable Gate Arrays (FPGAs)) https://www.gwu.edu/ (The George Washington University) https://www.intel.com/content/www/us/en/homepage.html (Intel) https://www.pcmag.com/encyclopedia/term/pci-express (PCI Express) https://www.intel.com/content/www/us/en/io/serial-ata/serial-ata-developer.html (Serial ATA (SATA)) https://consortium.org/ (Consortium of Universities of the Washington Metropolitan Area) https://www.amazon.com/Zero-One-Notes-Startups-Future/dp/0804139296 (Zero to One) by Peter Thiel and Blake Masters https://www.richdad.com/...
Although building a DevOps culture has helped teams collaborate better and deliver software faster and more reliably, a DevOps team should also have people dedicated specifically to improving system reliability and software performance. That is where Site Reliability Engineering comes in. Site Reliability Engineering, commonly abbreviated SRE, was originally initiated by Google engineer Ben Treynor. Not long after adopting SRE, they published an eBook to spread SRE across the technology industry. Now we are joined by Tara Baskara, Engineering Manager at Gojek, to dig deeper into SRE.
In this episode, Dave and Jamison answer these questions: Is it possible to move too fast, and do you believe in too much enthusiasm? I am one of the youngest members of the team and am always willing to start new projects and balance a few different things. Is there a point where this can start hurting my career? I've been bumped in compensation fairly, with almost a 25% raise since I first started. My career goal is to stay on the programming side, but I want to possibly become a trainer for newer engineers/devs. Listener Michael asks, I'm a backend engineer in an engineering/coding role with a small bit of SRE-type work. I love the work as I get to dig deep into the tech we use and have become a subject matter expert on databases within the company. I really like my team and my manager in particular, and get to learn a lot every week. My manager is leaving my team to lead a new team within the company that is focused on the company's SaaS offering, and I've been given the option of joining this new team if I wish. I like their managerial style and how they have helped me with my career progression so far. However, I'd be doing Site Reliability Engineering (SRE) work. I'm not sure if I'm ready yet to commit to being an SRE and code less/focus more on ensuring the reliability of mission-critical production systems. I don't know how easy it would be to switch back to more of a coding role in a year's time or if it would pigeonhole me into that type of role. Have you got any advice?
Hey there! Follow the podcast if you like the episode. This is Tharun. In the Developer Tharun Podcast, I speak about software engineering. In this episode: Site Reliability Engineering; the 4 aspects of Site Reliability Engineering according to me; and more. Thank you for listening to my podcast. Follow my podcast if you find it helpful. Check out my other episodes. I talk about programming & software engineering. YouTube: https://youtube.com/c/developerTharun Blog: https://tharunshiv.com Instagram: @developerTharun Dev.to: https://dev.to/developertharun Udemy: https://www.udemy.com/user/tharun-shiv/ LinkedIn: https://linkedin.com/in/tharunshiv
Dave Stanke joins us to talk all about Site Reliability Engineering. Dave is a Developer Relations Engineer with Google Cloud Platform specializing in DevOps, Site Reliability Engineering (SRE), and other flavors of technical relationship therapy. He loves chatting with practitioners: listening to stories, telling stories, sharing a healthy cry. Prior to Google, he was the CTO of OvationTix/TheaterMania, a SaaS startup in the performing arts industry, where he specialized in feeding memory to Java servers. He chose on purpose to live in New Jersey, where he enjoys baking, indie rock, and fatherhood. Links: https://stanke.dev/ https://twitter.com/davidstanke https://cloud.google.com/developers/advocates/dave-stanke Resources: https://sre.google/ https://bit.ly/reliability-discuss https://bit.ly/dora-sodr Thinking, Fast and Slow; Site Reliability Engineering; The Site Reliability Workbook; Want to supercharge your DevOps practice? Research says try SRE; Eliminating Toil; Identifying and tracking toil using SRE principles; How maintenance windows affect your error budget: SRE tips. "Tempting Time" by Animals As Leaders used with permissions - All Rights Reserved
The market is going through a period of many changes and challenges. For business leaders, the current landscape demands strategic decisions that help accelerate the digital transformation of their businesses and optimize their investments. In this context, organizational culture gains strength as a way for leaders and managers to rethink how people, structures, and processes interact day to day. In the first episode of the second season of Google Cloud Cast, Daniel Leite, Sales Executive at Google Cloud, and Marcelo Gomes, Infrastructure Modernization Specialist at Google Cloud, welcome Google Cloud's Senior Innovation Advisor, Renato Nobre, to discuss how to develop an organizational culture that values innovation. If you want to see this conversation in full and uncut, visit the Google Cloud LATAM channel on YouTube and watch the complete recording; the video will be available soon. Google Cloud Cast is the official Google Cloud podcast in Brazil, in which we discuss topics such as digital transformation, innovation, and the journey to the cloud every two weeks, with executives, specialists, and special guests. 
Check out the links for this episode: What is Site Reliability Engineering (SRE): https://sre.google Learn more about DevOps & SRE: https://cloud.google.com/blog/products/devops-sre Check out the survey "O futuro do trabalho no Brasil: Insights sobre a colaboração e novas formas de trabalho" (The future of work in Brazil: insights on collaboration and new ways of working): https://bit.ly/FuturodoTrabalhoGCC Check out this timeline of computing history: https://www.computerhistory.org/timeline/1945/ Learn more about the Grace Hopper transatlantic communications cable: https://cloud.google.com/blog/products/infrastructure/announcing-googles-grace-hopper-subsea-cable-system Learn more about Paul Otlet, one of the founders of documentation: https://daily.jstor.org/internet-before-internet-paul-otlet/ Solving for Innovation: what we are solving: https://cloudonair.withgoogle.com/events/reimagine-negocio-business-solution-2021?talk=oq_estamos_solucionando Reinventing innovation: https://cloudonair.withgoogle.com/events/reimagine-negocio-business-solution-2021?talk=reinventar_a_inovacao Enjoyed the episode or have a suggestion? Share it with us by email at googlecloudcast@google.com
ServiceNow partnered with EMA Research to understand the state of the DevOps and Site Reliability Engineering (SRE) industry. This video offers a synopsis of key findings. Download these DevOps and SRE reports to learn about the complete results. See omnystudio.com/listener for privacy information.
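If you enjoy the show, you are welcome to support us with a tip via Afdian (爱发电): https://afdian.net/@pythonhunter Hosts: Manjusaka, laike9m, laixintao. Timeline: 00:02:00 Why did xintao leave Alibaba? 00:22:43 Applying for a Singapore visa 00:28:30 Cost of living and taxes in Singapore 00:29:57 Renting in Singapore 00:43:20 Daily life in Singapore 00:58:17 Dealing with scams 01:03:13 xintao's work at Shopee and Shopee's company culture 01:06:06 How to get a job at Shopee? 01:11:05 Manjusaka's hiring ad. Links: What is Site Reliability Engineering (SRE)?; Google December 2020 services outage; Intelligent Operations series (1) | The rise and practice of AIOps; Discussion of the translation "期物" in the Chinese edition of Fluent Python; HDB flats (组屋); My itemized living expenses for one month in Singapore - by laixintao; Join Shopee & Work with Me! - xintao's referral link; PyCon US 2021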
New data reveals the competing – and sometimes conflicting – challenges and priorities of IT leaders from 2020 that are shaping IT’s agenda for 2021 when it comes to managing risk. According to a new global survey, 72% of IT leaders and 52% of employees agreed that security is the biggest issue when it comes to unaccounted-for and unmanaged technology. It seems that IT’s continuous efforts to reinforce security best practices may finally be paying off. But there is a lower level of awareness of additional issues, especially among employees, with 16% believing unaccounted-for and unmanaged technologies do not cause any business problems whatsoever. Snow Software CIO Alastair Pooley joins me on Tech Talks Daily to dive deeper into the findings. On joining Snow, Alastair championed the idea of launching SaaS services and established both a hosting and a Site Reliability Engineering (SRE) function to support such growth. This provides the infrastructure and support for over 200 customers who have adopted such services and provides a path to future growth for the business. By adopting a SaaS/IaaS approach to IT services, Snow has managed to pass $100M of ARR with barely any owned infrastructure. Over 90% of Snow’s IT lives either with a SaaS provider or in the public cloud. This is achieved with a zero-trust design that focuses on building single sign-on capabilities with strong cybersecurity controls. Alastair also initiated, and now oversees, the cybersecurity function at Snow to provide a central risk and compliance function for the business.
In this episode, we talk to Bob Strecansky who is a Staff SRE at MailChimp. A packed podcast about all things Site Reliability Engineering (SRE). Learn about how to become an SRE, the rise of blameless culture, a clear definition of black-box vs white-box approaches, and much more!
# Podcast S01-E34: CNCF keeps growing, plus Kubernetes at Netflix and Rust in the Linux kernel
- Hosted by @_marKox, @domix
## News review
- [Cloud Native Computing Foundation Takes Charge of Red Hat’s Operator Framework](https://thenewstack.io/cloud-native-computing-foundation-takes-charge-of-red-hats-operator-framework/)
- [KubeCon + CloudNativeCon North America 2020 is now an online experience](https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/attend/virtual-event-update/)
- [Will 2020 Be The Year Of Rust In The Linux Kernel?](https://hackaday.com/2020/07/15/will-2020-be-the-year-of-rust-in-the-linux-kernel/)
- [Site Reliability Engineering (SRE) 101 with DevOps vs SRE](https://www.cncf.io/blog/2020/07/17/site-reliability-engineering-sre-101-with-devops-vs-sre/)
## Twitter!
- [Netflix moved to Kubernetes](https://twitter.com/aspyker/status/1283836267646431234)
## References and Resources
- [Setting SLOs: a step-by-step guide](https://cloud.google.com/blog/products/management-tools/practical-guide-to-setting-slos)
- [GKE best practices: Exposing GKE applications through Ingress and Services](https://cloud.google.com/blog/products/containers-kubernetes/exposing-services-on-gke)
- [Announcing the New Version of the Well-Architected Framework](https://aws.amazon.com/blogs/architecture/announcing-the-new-version-of-the-well-architected-framework/)
## Awesome code repos
- [Kubelive](https://github.com/ameerthehacker/kubelive)
- [kubevol](https://github.com/bmaynard/kubevol)
### Music credits
Music by Scott Buckley – www.scottbuckley.com.au
In this podcast, Johnny Boursiquot, Site Reliability Engineer at Heroku, sat down with InfoQ podcast co-host Daniel Bryant and discussed topics that included: why Go is a useful language for building Function-as-a-Service (FaaS) style applications; how Heroku implement the role of Site Reliability Engineer (SRE); and why the ability to teach is such a valuable skill. Why listen to this podcast: - Go is a useful language for building Function-as-a-Service (FaaS) style applications. The ability to build Go applications into a static binary reduces the need for dependency management, and the quick runtime and application start time is good for initiation and scaling - The FaaS development toolchain has improved over the years. Many cloud providers now provide local runtimes, e.g. AWS SAM Local, and service simulators, e.g. LocalStack. Testing in production is facilitated by the ability to do dark launches and canary releasing at the ingress/API gateway - Developing “serverless” applications typically does not remove the need for operational expertise on a development team. Designing systems appropriately and getting the most out of the runtime (with minimal cost) requires knowledge of the underlying infrastructure components - The role of Site Reliability Engineering (SRE) looks different across practically every organisation. The Heroku SRE team have adapted well-established patterns and practices into their roles. They act as “diplomats”, working closely with product teams to share knowledge around operational best practices - The ability to teach is a valuable skill, regardless of your job. Teaching people to code or to embrace important operational principles is extremely rewarding. - Engineers who teach must seek to escape the pull of their ego; by focusing on the needs of the people you are teaching, much more progress can be made. 
More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2UV0tqK You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq Subscribe: www.youtube.com/infoq Like InfoQ on Facebook: bit.ly/2jmlyG8 Follow on Twitter: twitter.com/InfoQ Follow on LinkedIn: www.linkedin.com/company/infoq Check the landing page on InfoQ: https://bit.ly/2UV0tqK
DevOps is great, but it needs a huge cultural shift, which many organizations find too hard. That's where Site Reliability Engineering (SRE) comes in. In this episode, Elton Stoneman, author of the Pluralsight course Site Reliability Engineering (SRE): The Big Picture, shares why SRE might be better than DevOps for most organizations. Discover how SRE brings a software engineering approach to operations, making it easy to implement and to get quick results. Listen in to discover some critical aspects of SRE and how to transform your organization.
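We welcomed songmu as our guest and talked about starting a company in China, language school, the systems engineer (SE) years, the Perl community, OSS activities, Nature Inc., and more. [Show Notes] Nature Inc.; Rebuild.fm; Keio University SFC; Faculty of Policy Management; Faculty of Environment and Information Studies; Shunde District - Wikipedia; Shibuya Perl Mongers; Sugamo.css; KAYAC Inc. (面白法人カヤック); @fujiwara | Twitter; @typester | Twitter; IRC - Wikipedia; ISUCON; YAPC - Wikipedia; Audrey Tang - Wikipedia; "To the engineers who have joined Hatena" - jkondo's Hatena Blog; @miyagawa | Twitter; Plagger - Wikipedia; Plack - Wikipedia; "Announcing my resignation and free-agent declaration" | おそらくはそれさえも平凡な日々; @stanaka | Twitter; Mackerel; "The Infrastructure team is now the Site Reliability Engineering (SRE) team" - Mercari Engineering Blog; "Sales Engineers are now Customer Reliability Engineers (CRE)" - Hatena Developer Blog; @maaash | Twitter; "Publishing materials on what goes on behind the Nature Remo system" - An Epicurean; Nature Remo E; YAPC::Kyoto 2020; "Start small with OSS contributions, build your technical skills, and let them bloom" - YAPC::Kyoto 2020; Careers - Nature; "Announcing the ghq v1 release and ghq-handbook" | おそらくはそれさえも平凡な日々; ghq-handbook. You can check broadcast information via the Twitter ID @shiganaiRadio. Please tweet your feedback with the hashtag #しがないラジオ! If you tweet your impressions, topics you would like us to cover, things you would like us to improve, and so on, we can make future episodes better, so we look forward to lots of feedback. [Hosts] gami@jumpei_ikegami zuckey@zuckey_17 [Guest] songmu@songmu [Equipment] Blue Micro Yeti USB 2.0 microphone 15374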
In this episode I catch up with Josh Duffney to discuss the differences between DevOps and Site Reliability Engineering (SRE).
Even the best continuous delivery and DevOps practices cannot guarantee that there will be no issues in production. The rise of Site Reliability Engineering (SRE) has promoted new ways to automate resilience into your system and applications to circumvent potential problems, but it's time to 'shift-left' this effort into engineering. In this session, learn to leverage AWS Lambda functions as 'remediation as code.' We show how to make it part of your continuous delivery process and orchestrate the invocation of Self-Healing Lambda functions in case of unexpected situations impacting the reliability of your system. Gone are the days of traditional operations teams: it's the rise of 'shift-lefters'! This session is brought to you by AWS partner, Dynatrace.
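The 'remediation as code' idea in this session description can be pictured as a function that receives a problem notification and picks a remediation action. The sketch below is only an illustration under stated assumptions: the event fields (`problem_type`, `service`), the action functions, and the `REMEDIATIONS` table are all hypothetical and do not reflect the actual Dynatrace or AWS payload formats. Only the `lambda_handler(event, context)` shape follows the usual convention for Python handlers on AWS Lambda.

```python
from typing import Any, Callable, Dict


def restart_service(service: str) -> str:
    # Placeholder (hypothetical): a real function would call the platform's API here.
    return f"restart requested for {service}"


def roll_back_release(service: str) -> str:
    # Placeholder (hypothetical): a real function would trigger a pipeline rollback.
    return f"rollback requested for {service}"


# Hypothetical mapping from a reported problem type to a remediation action.
REMEDIATIONS: Dict[str, Callable[[str], str]] = {
    "high_memory": restart_service,
    "error_rate_after_deploy": roll_back_release,
}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Choose a remediation for the reported problem, or escalate to a human."""
    problem = event.get("problem_type", "")
    service = event.get("service", "unknown")
    action = REMEDIATIONS.get(problem)
    if action is None:
        # Unknown problems are not auto-remediated in this sketch.
        return {"status": "escalated", "detail": f"no remediation for {problem!r}"}
    return {"status": "remediated", "detail": action(service)}
```

Keeping the mapping in version control next to the application is one way to read "making it part of your continuous delivery process"; unknown problem types fall through to escalation and are never guessed at.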
Tune in to host Mike Kavis and guest Damon Edwards as they discuss how Site Reliability Engineering (SRE) works, some of the common benefits, and why it is important to define an SRE model specific to your organization. Learn how SRE’s shared responsibility model and feedback mechanisms can help organizations gain control and enable the operations, development, and business teams to work together.
Site Reliability Engineering (SRE) is the topic for the latest Full Stack Journey podcast. Guest Michael Kehoe explores SRE, its relationship w/ DevOps, essential skills, and more.
Site Reliability Engineering (SRE) is the topic for the latest Full Stack Journey podcast. Guest Michael Kehoe explores SRE, its relationship w/ DevOps, essential skills, and more. The post Full Stack Journey 022: Site Reliability Engineering (SRE) With Michael Kehoe appeared first on Packet Pushers.
SRE is a very hot field right now. Some say it is "the ops in DevOps". We chat with Stig Sorenson of the Bloomberg SRE team about how Bloomberg is using SRE to make their business more responsive to their customers. Stig and the Bloomberg team are really at the forefront of what is happening in the SRE field, so this is a great look in.
This week is a clash of titans! Liz Fong-Jones and Seth Vargo join Mark and Melanie to battle it out over which is better: SRE or DevOps (hint: everyone wins!). Liz Fong-Jones: Liz is a Staff Site Reliability Engineer at Google and works on the Google Cloud Customer Reliability Engineering team in New York. She has worked on services ranging from Google Flights to Cloud Bigtable in her 10+ years at Google. She lives with her wife, metamour, and a Samoyed/Golden Retriever mix in Brooklyn. In her spare time, she plays classical piano, leads an EVE Online alliance, and advocates for transgender rights. Seth Vargo: Seth Vargo is a Developer Advocate at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and a few Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. Seth is an active member of the DevOps community and has written thought-leader-y pieces such as the 10 Myths of DevOps. Cool things of the week: Google I/O session (youtube); What’s new in Firebase at I/O 2018 (blog); Introducing ML Kit for Firebase (blog); Jeff Dean is new Head of AI (wired); Introducing Cloud Memorystore: A fully managed in-memory data store service for Redis (blog); Google Group; Issue tracker. Interview: class SRE implements DevOps (youtube series); DevOps (wikipedia); Site Reliability Engineering (SRE) (site); Terraform (site); Chef (site); Puppet (site); Ansible (site); SaltStack (site); Prometheus (site); Datadog (site); Stackdriver (site); The Site Reliability Workbook: Practical Ways to Implement SRE (amazon); Seeking SRE (o’reilly); Customer Reliability Engineering Blog Series (blogs). Question of the week: I’m a researcher at a regionally accredited academic institution and I need compute resources. Does Google Cloud have any programs that can help me out? Google Cloud Platform announces new credits program for researchers (blog, faq). Where can you find us next? 
Mark will be speaking at the Monthly SF Game Development Community, presenting on You Can’t Just Add More Servers on May the 30th in San Francisco. Melanie is speaking at the Understand Risk Forum on May 17th, in Mexico City.
Aaron and Brian talk with Rob Hirschfeld (@zehicle, CEO @rackngo; Kubernetes Cluster Ops Co-Chair) about the consistency, continuum and confusion between the concepts of DevOps and Site Reliability Engineering (SRE). Show Links: Google SRE Book; DevOps vs. SRE; DevOps vs. SRE (Rob on Datanauts); Love DevOps? Wait until you meet SRE; Open Source PXE; "L8istSh9y" - Rob's Edge & Automation Podcast; [PODCAST] @PodCTL - Containers | Kubernetes - RSS Feed, iTunes, Google Play, Stitcher, TuneIn and all your favorite podcast players; [A CLOUD GURU] Get The Cloudcast Alexa Skill; [A CLOUD GURU] A Cloud Guru Membership - Start your free trial. Unlimited access to the best cloud training and new series to keep you up-to-date on all things AWS; [A CLOUD GURU] FREE access to AWS Certification Exam Prep Guide - At A Cloud Guru, the #1 question received from students is "I want to pass the AWS cert exam, so where do I start?" This course is your answer; [FREE] eBook from O'Reilly. Show Notes: Topic 1 - What is the State of DevOps today? On one hand, there are Gene Kim’s DevOps Reports (all is great); on the other hand is DevOps Days, which has become about Empathy; and somewhere in between are companies struggling with all of this silo-busting and automation and constant change. So where are we? Topic 2 - The DevOps community seemed to want to reject all sorts of labels and titles (DevOps engineer, DevOps certified, etc.), and now there is this “SRE” (Site Reliability Engineering) concept. Is this just a new name for DevOps? Topic 3 - Like Netflix had microservices, so everybody needed microservices - Google has SRE, so now everyone needs SRE? How does SRE fit into a non-Google company? Topic 4 - Many Infra/Ops-centric people have been trying to learn automation and some basic programming (e.g. Python, PowerShell/Scripting). SREs are often described as programmers that live in the Ops world. Can these current Infra/Ops people evolve to SRE? 
Topic 5 - Do you find that DevOps or SRE apply more (or less) to using certain types of technologies vs. other technologies? Feedback? Email: show at thecloudcast dot net Twitter: @thecloudcastnet and @ServerlessCast
In this podcast, Rob Hirschfeld, Founder and CEO of RackN, discusses the latest trends in IT management at scale, including DevOps and the emergence of Site Reliability Engineering (SRE). SRE is a response to the limitations of DevOps faced by Google, providing an answer to the significant challenges of operating global hybrid IT infrastructure that continues to grow at a rapid rate. For more information on RackN visit www.rackn.com
Software Engineering Radio - The Podcast for Professional Software Developers
Björn Rabenstein discusses the field of Site Reliability Engineering (SRE) with host Robert Blumen. The term SRE has recently emerged to mean Google's approach to DevOps. The publication of Google's book on SRE has brought many of their practices into more public discussion. The interview covers: what is distinct about SRE versus DevOps; the SRE focus on development of operational software to minimize manual tasks; the emphasis on reliability; Dickerson's hierarchy of reliability; how reliability can be measured; is there such a thing as too much reliability?; can Google's approach to SRE be applied outside of Google?; Björn's experience in applying SRE to SoundCloud - what worked and what did not; how can engineers best apply SRE to their organizational situation?; the importance of monitoring; monitoring and alerting; being on call, responding to incidents; the importance of documentation for responding to problems; they wrap up with a discussion of why people from non-computer science backgrounds are often found in DevOps and SRE.
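The interview's questions about how reliability can be measured, and whether there can be too much of it, are often framed in SRE practice with an availability SLO and its error budget. A minimal sketch of that arithmetic follows, assuming a request-based SLO; the function names and the sample numbers are hypothetical and are not taken from the episode.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests allowed to fail while still meeting the SLO."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no budget: any failure breaches the SLO.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical month: 99.9% SLO, 1,000,000 requests, 250 failures.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

Under these assumptions, a team that has spent only part of its budget still has room for riskier releases, while a 100% target leaves no budget at all, which is one way to read the "too much reliability" question.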
DevOps is dead, long live Site Reliability Engineering (SRE). For this episode we invited Ladislav Prskavec, who leads the SRE team at Apiary and is therefore more than qualified to tell us something about this new approach.
