Site Reliability Engineering: Measuring and Managing Reliability
What you'll learn
Service level indicators (SLIs) and service level objectives (SLOs) are fundamental tools for measuring and managing reliability. In this course, students learn approaches for devising appropriate SLIs and SLOs and managing reliability through the use of an error budget.
Table of contents
Introduction (26 min)
- Course structure 3m
- What's the difference between DevOps and SRE? - Intro 0m
- What's the difference between DevOps and SRE? 5m
- Now SRE Everyone Else with CRE! - Intro 0m
- Now SRE Everyone Else with CRE! 6m
- CRE's Three Reliability Principles 3m
- Reliability in the Cloud 3m
- How SLOs help your business make decisions 2m
- How SLOs help you build features faster 2m
- How SLOs help you balance operational and project work 2m
- Making SLOs work for your organization 1m
Targeting Reliability (13 min)
Operating for Reliability (19 min)
Choosing a Good SLI (40 min)
- Module Introduction 2m
- User happiness in metric form 2m
- The properties of good SLI metrics 4m
- Ways of measuring SLIs 4m
- The SLI Menu 3m
- The SLI Equation 2m
- Request / Response SLIs 6m
- Data Processing SLIs 6m
- But my system is really complex! 2m
- Managing complexity with aggregation 2m
- Managing complexity with bucketing 3m
- Achievable SLOs 2m
- Aspirational SLOs 1m
- Continuous Improvement 2m
Developing SLOs and SLIs (17 min)
Quantifying Risks to SLOs (20 min)
Consequences of SLO Misses (21 min)
- Module Introduction 1m
- No Surprises 2m
- A Dashboard Example 1m
- Why an Error Budget Policy? 3m
- Fundamentals of an Error Budget Policy 4m
- How to Draft an Error Budget Policy 4m
- Example Policy Thresholds 3m
- Hypothetical Policy Scenario 4m
- Course Conclusion and Video Wrap Up 1m
- Squirrels 0m
- Additional suggested reading 0m