https://sre.google/sre-book/introduction/
Google SRE - IT Service Management: Automate Operations
SRE's approach to IT service management: use software engineers to design scalable and reliable systems, drive innovation, and improve product development.
google sreservice managementautomate operations
https://sre.google/sre-book/part-III-practices/
Google SRE: Distributed Computing Systems | Incident Management
Explore how distributed computing environments and systems improve scalability and resilience. Learn effective incident management strategies used by SRE teams.
google sredistributed computingincident managementsystems
https://sre.google/prodverbs/
Google SRE - Capture sre beliefs: Prodverbs by Perry Lorier
Prodverbs capture key principles of SRE, offering insights into running production services. Explore this first collection, with more to come in the future.
google srecapturebeliefsperry
https://sre.google/workbook/table-of-contents/
Google SRE - SRE workbook table of contents
The Site Reliability Workbook table of contents: navigate key SRE concepts and practical strategies for building reliable, scalable systems.
google sreworkbooktablecontent
https://sre.google/classroom/
Google SRE - SRE workshop | Learn about NALSD and SRE
SRE Classroom offers workshops by Google SRE, covering NALSD and SRE. Learn non-abstract large systems design and gain hands-on experience in system evaluation.
google srewokshoplearn
https://sre.google/sre-book/simplicity/
Google SRE - Operational Simplicity: Stability and Agility
Operational simplicity in SRE, balancing stability and agility to enhance reliability, reduce complexity, and focus on impactful innovation.
google sreoperationalsimplicitystabilityagility
https://sre.google/sre-book/postmortem-culture/
Google SRE - Blameless Postmortem for System Resilience
Blameless postmortems in SRE culture: incident studies that focus on root cause analysis and preventive actions, building a culture of continuous improvement.
google sresystem resilienceblamelesspostmortem
https://sre.google/20/
Google SRE: What is SRE? 20 Years of SRE at Google
Celebrate 20 years of Google SRE history and its evolution | Learn what SRE is, how it started, and the role Google SRE engineers played in its global adoption.
google sre20 years
https://sre.google/
Google SRE - Site Reliability engineering
google sre sitereliability engineering
https://sre.google/resources/videos/
Google SRE Video Gallery
The SRE digital video library to learn about SRE practices and build reliable production systems.
google srevideo gallery
https://sre.google/sre-book/managing-incidents/
Google SRE - Incident Management: Key to Restore Operations
Principled incident management can limit disruptions and restore normalcy. Learn about effective strategies and processes for managing incidents.
google sreincident managementkeyrestoreoperations
https://sre.google/sre-book/preface/
Google SRE - What is SRE Role and Responsibilities
The role of an SRE in system reliability and service operations. Explore how SREs contribute to the development of scalable and reliable systems.
google sreroleresponsibilities
https://sre.google/sre-book/table-of-contents/
Google SRE - Site reliability engineering book Google index
Go through the complete table of contents of the Google SRE book, which outlines the key topics and insights covered in this essential resource for SRE professionals.
google sre sitereliability engineeringbookindex
https://sre.google/sre-book/launch-checklist/
Google SRE - Google checklist: SRE pre launch checklist
Google's checklist to ensure a successful product launch. Go through the pre-launch checklist and launch readiness checklist to execute a smooth product launch.
google srepre launchchecklist
https://sre.google/local/
Google SRE Events
Discover Site Reliability Engineering: find local events, meetups, and resources to learn more about SRE and reliability engineering.
google sreevents
https://sre.google/books/
Google SRE book- Comprehensive guide to site reliability
Explore the world of site reliability engineering with top-rated sre books. Find resources on SRE principles, best practices and the role of a reliability...
google srecomprehensive guidesite reliabilitybook
https://sre.google/sre-book/bibliography/
Google SRE - SRE Book for Must Techniques & Practices
Explore reliable and scalable systems with
google srebookmusttechniquespractices
https://sre.google/sre-book/production-meeting/
Google SRE - System Availability and Outage Review
Review of system availability, major outage impacts, and updates from October 23, 2025. Includes action on file descriptor leaks and load balancing improvements.
google sresystem availabilityoutagereview
https://sre.google/sre-book/monitoring-distributed-systems/
Google SRE monitoring distributed systems - SRE golden signals
google sremonitoringsystemgoldensignals
https://sre.google/sre-book/foreword/
Google SRE - SRE Workbook: Google's Infrastructure
Discover how Google scaled its legendary infrastructure and system administration with the SRE Workbook. Insights that shaped the concept of DevOps.
google sreworkbookinfrastructure
https://sre.google/sre-book/software-engineering-in-sre/
Google SRE - Developing Software for Complex Machines
Google SRE engineers and their solutions to keep production running, and production intricacies to maintain uptime and minimize latency.
google sredeveloping softwarecomplexmachines
https://sre.google/workbook/canarying-releases/
Google SRE - Canary Release: Deployment Safety and Efficiency
Discover how canary release can improve deployment safety by testing new changes on a small portion of users before a full rollout.
google srecanaryreleasedeploymentsafety
https://itrevolution.com/articles/how-google-sre-and-developers-collaborate/
How Google SRE and Developers Collaborate - IT Revolution
Oct 6, 2022 - This post was adapted from the paper “How Google SRE and Developers Collaborate” by Christof Leng, Tracy Ferrell, Alex Bligh, Michal Gefen, Betsy Beyer with...
google sredevelopers collaboraterevolution
https://sre.google/sre-book/evolving-sre-engagement-model/
Google SRE - Production Readiness Review: Engagement Insight
Learn the production readiness review (PRR) model in SRE, including its phases, benefits, and how early engagement improves service reliability and efficiency.
google sreproduction readinessreview engagementinsight
https://sre.google/sre-book/service-level-objectives/
Google SRE - Defining SLO: service level objective meaning
The SRE book chapter on SLOs: understand what a service level objective means and the related service level terminology, including SLA, SLO, and SLI, to improve service reliability.
google sreservice leveldefiningsloobjective
https://sre.google/sre-book/addressing-cascading-failures/
Google SRE - Cascading Failures: Reducing System Outage
Discover strategies to prevent and mitigate cascading failures, ensuring system stability and reliability, potentially preventing system outages.
google sresystem outagecascadingfailuresreducing
https://sre.google/sre-book/distributed-periodic-scheduling/
Google SRE: Distributed Periodic Scheduling with Cron Service
Explore distributed scheduling challenges and how the distributed cron service solves failure modes in large-scale systems | Google's Paxos-based solution.
google srecron servicedistributedperiodicscheduling
https://sre.google/classroom/distributed-pubsub/
Google SRE classroom - Distributed Publish-subscribe workshop
SRE Classroom: Distributed PubSub teaches NALSD principles through hands-on experience in designing and evaluating distributed publish-subscribe messaging...
google srepublish subscribeclassroomdistributedworkshop
https://sre.google/sre-book/part-I-introduction/
Google SRE - Site Reliability Engineering at Google
SRE at Google with insights from Ben Treynor Sloss and a guide to Google's production environment from an SRE perspective.
google sre sitereliability engineering
https://sre.google/mobaa/methods/
Google SRE - Methods for vector display of internet artifacts
Methods for vector display of internet artifacts: Explore how Google's Panopticon and Monarch systems enhance the visualization and monitoring of digital data at...
google sremethodsvectordisplayinternet
https://sre.google/resources/practices-and-processes/art-of-slos/
Google SRE - Art of slo | customer reliability engineering
The Art of SLOs workshop, crafted by Google's Customer Reliability Engineering team, teaches hands-on how to measure service reliability using SLIs and SLOs.
google srereliability engineeringartslocustomer
https://cloud.google.com/blog/products/devops-sre
DevOps & SRE | Google Cloud Blog
google cloud blogdevops sre