One of the fallacies about our collective approach to retrospectives, incident reviews, and post mortems is the belief that the entire process is a rational machine. Pour in a curated series of events, turn the handle, and out pop all of the action items that need completing to fix the world. I can’t speak to every industry that practices Resilience Engineering, but in Software Engineering this fallacy stems strongly from our belief that we’re fully in control of our environment. We’ve built our tooling, architected our systems, and we’re running the retro. Why wouldn’t we be able to simply apply the calculus to our knowledge and change things for the better?
This all speaks to a distinct lack of humility in what we do as a practice. If we want to better understand the risks we undertake every day and to learn from failures in that work, we need to first accept that failures are in part due to our incomplete understanding, great and small, of our socio-technical systems. Even with a complete working knowledge of everything, we would be unable to act on everything needed to perfect our system, and that underlying system will change despite these efforts. Being comfortable with being wrong means we can change.
This reluctance to accept that things continually fail despite our best efforts is a common reaction. It’s hard to accept that our systems are continuously in need of tweaks because it’s also hard to accept that they will always run in some form of a degraded state[1]. That said, because people are the adaptable element in the system[2], we can fall into the adjacent trap of assuming we are in the best position to understand the entirety of our system and how best to course correct. The sharp end can be a powerful, if perilous, position to sit in, but it doesn’t guarantee omniscience in the scope of understanding an incident. This is why I frequently suggest that practitioners of retrospectives be folks who weren’t involved in the incident, to help mitigate this failing.
Hubris as Facilitator
It’s easy to understand the desire to sit in the facilitator chair. You’re taking the reins of the situation and you’re going to get to the bottom of things. You ask the questions, you drive the conversation and schedule the meeting, but most importantly you’re going to be there to get answers. That would be true if you held a made-up title like investigation commander or retrospective captain, but you don’t. A facilitator is less the spike and more the bump/set. You’re there to position other folks to learn, not wear the badge.
Retros also come in various shapes and sizes, which makes for another tempting place to be in control. If I’m running the retro, then it can follow my guidelines and my preferred flow. This lesson I learned the hard way, having felt as though I knew “the one true way” to run it. I was there at Etsy watching John Allspaw, Morgan Evans, and Daniel Schauenberg develop and put ink to paper with the Etsy Debriefing Guide. In doing so, though, I failed to recognize the microcosm that was Etsy, that what worked for us there didn’t apply universally. Other folks may have tools worth surfacing, and we should continually look to them to see how we can improve the way we run our retros.
Facilitators should instead be the support for everyone else to do the talking and ask questions of their own. We can only share that deep empathy with one another when we put ourselves in one another’s shoes, and that can only be done with an understanding of our own fallibility. We too know how awful it feels to be at the center of an incident, that it could have easily been us, which allows us to help recreate the scene and ditch concepts like “human error” as an easy solution to a complex problem.
A singular view of the problem, from up on high as facilitator, will produce a singular set of answers constrained by our myopic vantage point.
Top Down Misunderstanding of Retrospectives
Another failure in our work running retrospectives is senior leadership (individual contributors and management both) using them to impart the illusion of work being done. You’ll see this often in email chains that include a CC list that races its way up the reporting structure. This incident was unacceptable, but don’t worry, we have top men working on it right now. Similarly, you’ll see public facing reports sent out by companies to reassure customers, the board of directors, and investors that all is well. It’s worth noting that shared perception is a useful tool for a business to leverage, but it still doesn’t impart learning to folks on either side of a company’s boundaries. The learning review is downgraded for the sake of making an org, at the macro or micro level, appear to be invulnerable to failure. The company cannot tolerate failure because vulnerability must be made an impossibility.
It’s also why so many existing tools are marketed with a primary focus on action items as the work. Going into a retro with the principal desire to create a to-do list is problematic because learning becomes a secondary function by nature of prioritization. It is rooted in a lack of humility: we go into our discussions and interviews believing we can simply solve all the problems in tech and that everything will be perfect from here on, with deeper understanding a “nice to have”.
This is not to give individual contributors in senior positions a pass. How often do we rely on “this is the way we’ve always done it” and our use of best practices[4] as a crutch for decision making rather than challenging established methods? We give folks in senior positions more time for questions during discussions and put them at the front to answer for the sake of expedience. Equally, holding a blameless post mortem can fail if an org values engineers who prioritize their place in the pecking order rather than risk losing face in front of others. Giving less experienced folks time to explore ideas tests the validity of our mental models.
Most importantly, for us to build and revise adaptive capacity[3] we have to first acknowledge that the map of our system is potentially inaccurate, a map that is heavily influenced top down. Until we can move towards an acceptance of inaccuracies in our understanding, our assumption stands that we must be right and the view should not change. All of these concepts, and our own journey to them, are themselves a work in progress. There are soft boundaries and holes in the middle where our language and understanding fails us. This does not inherently diminish our work, but can in fact enhance it.
Humility In Practice
It’s a fair criticism that much of our work applying Resilience Engineering and Human Factors concepts to Software Engineering fails to give concrete examples of putting theory into practice, often leaving it as an exercise for the reader. With that in mind, what does a humility-first retro look like?
- A retrospective is a safe place to say “I don’t know”. A facilitator can and should actively say just that while encouraging others who exhibit similar misgivings about what they can safely hold true. Doing so establishes a pattern of being okay with the discomfort of uncertainty.
- Retros should prioritize learning before fixing. This is not infrequently stated, but bears repeating. As said elsewhere, it also doesn’t exclude action items. Rather, allow folks to freely express what they don’t understand without shame, and let improvements extend from these learning experiences.
- A generosity of spirit is key. Participants should respect the time shared for other folks to learn, with a particular emphasis on the facilitator. As invaluable as your time is, the up-front cost of interviewing folks, organizing meetings, and gathering information is paramount. Put in the extra effort to interview before a retro meeting and follow up after to tie up loose ends.
- There should be a reduction (not an absence) in our use of hindsight. Acknowledging that we’re all fallible means we can resist the inclination to “fix” an error with counterfactuals when we review past events. Look backwards not as a way to save face but to explore why ideas previously made sense.
- The malleable nature of a retrospective lets us review what is assumed to be true. Confirm or refute assumptions about the narrative as it is assumed to exist, regardless of who shares it. Some folks may not be in a position to share, given internal pressures against them. Insights often come from the sharp end, which isn’t always the most tenured engineer.
- All participants should be on equal footing. Retros are akin to a round table discussion where folks come to share events and ask questions, rather than seniority or management directing events to best serve their own interests or those assumed to be the organization’s. Don’t let titles dictate who gets to speak.
- Our work in retrospectives is ongoing and adaptable. Before practitioners get too set in their ways, we should remember that Resilience is a verb[5]. Templates are rigid and predefined; by allowing ourselves the chance to break out of molds, to make mistakes, and to explore the boundaries with the assurance that failure is okay, we can practice new ways of pulling sources of information out of our incidents. Our meta discussions surrounding incidents should themselves be challenged.
1. Cook (2002) – How Complex Systems Fail: Complex systems run in degraded mode
4. Klein et al (2016) – Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches
5. Woods (2018) – Resilience is a Verb