Expert · 6 weeks
Production Engineering
Operate high-availability systems with reliability targets, disaster recovery, incident response, and cost controls.
Topic 24 of 24
Prerequisites
- Observability
- Advanced Scalability
- Infrastructure as Code
Key Concepts & Skills
- Reliability
- High Availability
- Disaster Recovery
- Cost Optimization
- Define SLOs
- Plan failover
- Run postmortems
- Optimize cloud spend
Learning Outcomes
- Understand the core principles of Reliability
- Configure and deploy a high-availability setup
- Troubleshoot common disaster recovery issues
- Understand the core principles of cost optimization
- Define SLOs for a service
- Plan and test failover
- Run postmortems after incidents
- Optimize cloud spend
Resources
Official Docs
Community
Practice Exercises
Project Task
Design a highly available backend with disaster recovery and cost controls.
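As a starting point for the project task, the arithmetic behind an availability target can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names `error_budget_minutes` and `parallel_availability` are hypothetical, the 30-day window is an assumed default, and the redundancy formula assumes replicas fail independently, which real systems rarely satisfy exactly.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def parallel_availability(replica_availability: float, replicas: int) -> float:
    """Availability of N redundant replicas where any one can serve traffic.

    Assumes failures are independent; correlated failures (shared region,
    shared dependency) make the real figure lower.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be between 0 and 1")
    return 1.0 - (1.0 - replica_availability) ** replicas


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # Two independent replicas at 99% each give roughly 99.99% combined.
    print(round(parallel_availability(0.99, 2), 4))
```

Numbers like these help weigh cost controls against reliability: each extra replica raises the computed availability but also the bill, so the SLO sets how much redundancy is worth paying for.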