Incorporating Site Reliability Engineering (SRE) in Your System Design
SRE is the hot way to manage apps in production, but are you and your systems ready for it? This course teaches you how to design systems for maximum reliability, find the gaps in your current system design, and adopt SRE smoothly and effectively.
What you'll learn
Before you adopt SRE, you need to be sure that your systems are designed to work well with SRE practices. In this course, Incorporating Site Reliability Engineering (SRE) in Your System Design, you’ll learn how to design systems with SRE in mind and assess what’s missing in your existing systems. First, you’ll discover how to architect apps for reliability, so that temporary problems are handled automatically and bigger issues raise alerts quickly. Next, you’ll explore how observability design supports SRE and helps you get your apps back online. Finally, you’ll delve into how to effectively measure and report on service levels. When you’re finished with this course, you’ll have the system design skills and knowledge needed to bring your own apps into SRE.
Table of contents
- Designing to Support Incident Resolution 3m
- Exploring the Three Pillars of Observability 5m
- Scenario: Putting Metrics to Use 5m
- Supporting Triage with High-level Metadata 3m
- Scenario: Putting Traces to Use 5m
- Supporting Examination with Low-level Metadata 4m
- Scenario: Putting Logs to Use 6m
- Module Summary 3m