22-07 Operational Excellence Design Principles
# Perform operations as code
Apply the same engineering discipline to your cloud infrastructure that you apply to application code. By treating your operations as code, you can limit human error and enable consistent responses to events.
Example
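One way to picture operations as code is a small script that compares a declared desired state against what is actually running and derives the actions to take. The sketch below is illustrative only: `Server`, `plan`, and the server names are hypothetical and do not correspond to any real cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical description of one server in the environment.
    name: str
    size: str


def plan(desired: list[Server], actual: list[Server]) -> list[str]:
    """Return the actions needed to move `actual` to `desired`."""
    desired_by_name = {s.name: s for s in desired}
    actual_by_name = {s.name: s for s in actual}
    actions = []
    # Create or resize anything that is declared but missing or different.
    for name, server in sorted(desired_by_name.items()):
        if name not in actual_by_name:
            actions.append(f"create {name} ({server.size})")
        elif actual_by_name[name].size != server.size:
            actions.append(f"resize {name} to {server.size}")
    # Delete anything that is running but no longer declared.
    for name in sorted(actual_by_name.keys() - desired_by_name.keys()):
        actions.append(f"delete {name}")
    return actions


desired = [Server("web-1", "small"), Server("web-2", "small")]
actual = [Server("web-1", "large"), Server("batch-1", "small")]
actions = plan(desired, actual)
print(actions)
```

Because the desired state lives in code, it can be reviewed, versioned, and applied the same way every time instead of relying on manual steps.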
# Make frequent, small, reversible changes
Design workloads to allow components to be updated regularly.
Example
Rollbacks, incremental changes, blue/green deployments, and CI/CD pipelines.
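A blue/green cutover with automatic rollback can be sketched roughly as follows. `Router` is a hypothetical stand-in for a load balancer, and the health check is a plain callable supplied by the caller; a real deployment would typically also validate the new environment before shifting any traffic.

```python
from typing import Callable


class Router:
    """Hypothetical stand-in for a load balancer that points at one environment."""

    def __init__(self, live: str) -> None:
        self.live = live

    def switch_to(self, environment: str) -> None:
        self.live = environment


def blue_green_deploy(router: Router, candidate: str,
                      is_healthy: Callable[[str], bool]) -> bool:
    """Cut traffic over to `candidate`; roll back if it fails its health check."""
    previous = router.live
    router.switch_to(candidate)
    if is_healthy(candidate):
        return True
    # The change is reversible: restore the previous environment.
    router.switch_to(previous)
    return False


router = Router(live="blue")
ok = blue_green_deploy(router, "green", is_healthy=lambda env: False)
print(ok, router.live)
```

Keeping the old environment intact until the new one proves healthy is what makes the change small and reversible.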
# Refine operations procedures frequently
Look for opportunities to continuously improve your operations.
Example
Use game days to simulate traffic or failure events on your production workloads.
# Anticipate failure
Perform post-mortems on system failures to improve, write test code, and kill production servers to test recovery.
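The idea of killing a server to test recovery can be sketched as a small failure-injection test. Everything here is an assumption made for illustration: `Fleet` is a hypothetical in-memory model of a self-healing server group, not a real cloud service, and `heal` stands in for whatever automation replaces lost capacity.

```python
import random


class Fleet:
    """Hypothetical in-memory model of an auto-healing server group."""

    def __init__(self, names: list[str], target_size: int) -> None:
        self.running = set(names)
        self.target_size = target_size
        self._next_id = len(names) + 1

    def terminate(self, name: str) -> None:
        self.running.discard(name)

    def heal(self) -> None:
        """Launch replacements until the fleet is back at its target size."""
        while len(self.running) < self.target_size:
            self.running.add(f"server-{self._next_id}")
            self._next_id += 1


def kill_random_server(fleet: Fleet, rng: random.Random) -> str:
    """Terminate one running server chosen at random and return its name."""
    victim = rng.choice(sorted(fleet.running))
    fleet.terminate(victim)
    return victim


fleet = Fleet(["server-1", "server-2", "server-3"], target_size=3)
victim = kill_random_server(fleet, random.Random(0))
fleet.heal()

# Recovery check: the victim is gone and capacity is restored.
print(victim not in fleet.running, len(fleet.running))
```

Running a check like this regularly turns "we think recovery works" into something that is verified before a real failure occurs.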
# Learn from all operational failures
Share lessons learned in a knowledge base of operational events and failures across your entire organization.