Advanced | Expert | 5 weeks
Site Reliability Engineering (SRE)
Site Reliability Engineering (SRE) turns service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs) into practical infrastructure skills for running reliable production systems.
Topic 22 of 29
Prerequisites
- Platform Engineering
Key Concepts & Skills
- SLI
- SLO
- SLA
- Error Budgets
- Reliability Engineering
- Measure and operate SLIs in production-like environments
- Connect SLOs to infrastructure workflows
- Troubleshoot failures with repeatable runbooks
- Document operational tradeoffs and risks
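To make the relationship between an SLI, an SLO, and an error budget concrete, here is a minimal sketch in Python. It assumes a request-based availability SLI (good events divided by total events), which is one common definition and not the only one. The function names and the numbers used are hypothetical and chosen only for illustration.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the SLI as the fraction of events that were good.

    Assumption: a request-based SLI, good / total.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Return the fraction of the error budget still unspent.

    The error budget is the failure rate the SLO allows (1 - slo_target).
    A result of 1.0 means nothing has been spent, 0.0 means the budget is
    exhausted, and a negative value means the SLO has been missed.
    """
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("slo_target must be below 1.0")
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests, 500 of them failed,
    # measured against a 99.9% availability SLO.
    sli = availability_sli(good_events=999_500, total_events=1_000_000)
    remaining = error_budget_remaining(sli, slo_target=0.999)
    print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

With these sample numbers the service has spent roughly half of its error budget. An SLA would normally be a separate, looser contractual threshold layered on top of the internal SLO.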
Learning Outcomes
- Explain how Site Reliability Engineering (SRE) impacts reliability and delivery
- Build or configure a lab around SLIs
- Identify common failure modes and mitigation strategies
Resources
Official Docs
Open Source Projects
Practice Exercises
Project Task
Run a Site Reliability Engineering (SRE) lab in a local or cloud sandbox.