Expert · 6 weeks

Production Engineering

Operate high-availability systems with reliability targets, disaster recovery, incident response, and cost controls.

Topic 24 of 24

Prerequisites

  • Observability
  • Advanced Scalability
  • Infrastructure as Code

Key Concepts & Skills

  • Reliability
  • High Availability
  • Disaster Recovery
  • Cost Optimization
  • Define SLOs
  • Plan failover
  • Run postmortems
  • Optimize cloud spend

Learning Outcomes

  • Understand the core principles of reliability
  • Configure and deploy a high-availability setup successfully
  • Troubleshoot common disaster recovery issues
  • Understand the core principles of cost optimization
  • Define SLOs for a service
  • Plan failover and troubleshoot common failover issues
  • Run postmortems after incidents
  • Optimize cloud spend
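
As a small illustration of the "Define SLOs" skill listed above, the sketch below shows the arithmetic behind an error budget. It is a minimal example under stated assumptions, not material from the course: the function names, the 30-day window, and the sample request counts are all hypothetical.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (i.e. "three nines").
    The 30-day default window is an assumption for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLO.

    Returns 1.0 when nothing has been spent and a negative value once
    the budget is exhausted.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # no traffic yet, so no budget has been spent
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% target over 30 days,
    # with 500 failed requests out of 1,000,000.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 3))
```

Expressing the target as an explicit budget makes it usable in decisions: a team can compare the budget already spent against the time left in the window before approving a risky rollout.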

Resources

Practice Exercises

Project Task

Design a highly available backend with disaster recovery and cost controls.

Quiz