SREcon Americas 2018 – Architecting a Post Mortem

San Jose, CA (2018-03-29)

SREs are frequently front and center in intense, highly demanding production situations that require clear lines of communication. Our systems fail not because of laziness or a lack of attention but because of cognitive dissonance between what we believe about our environments and the objective interactions both internal and external to them. In this talk, I’ll discuss how we can revisit our established beliefs about failure scenarios, with an emphasis not on the who in decision making but on the why behind those decisions. With this mindset, we can encourage our teams to reject shallow explanations that attribute failures to human error and instead focus on gaining a greater understanding of these complexities. I’ll walk through the structure of post mortems used at large tech companies, with real-world examples of failure scenarios, and debunk myths regularly attributed to failures. Through these discussions, you’ll learn how to incorporate open dialogue within and between teams to bridge these gaps in understanding.

