Failing over without Falling over
This talk will show how we can use System Theoretic Process Analysis (STPA), as advocated by Professor Nancy Leveson’s team at MIT, to analyze failover hazards.
What you'll learn
Many organizations have disaster recovery (DR) failover plans that are poorly tested and implemented, and they are scared to test or use them in a realistic manner. This talk will show how we can use System Theoretic Process Analysis (STPA), as advocated by Professor Nancy Leveson’s team at MIT, to analyze failover hazards. Observability and human understanding of safety margins and the state of a failover are critical to having a real DR capability. Chaos engineering, game days and a high level of automation provides continuously tested resilience, and confidence that systems will fail over, without falling over.