How Many Maybes Until Empathy?

How many maybes does it take before someone becomes empathetic to a situation?

It’s no surprise that when we discuss the failure modes of organizations, typically from the outside, we can spot the smaller failures. We’re looking backwards, the benefit of hindsight on full display, with all the time we need. We’re not seeing the hurdles and stumbling blocks, the physical and mental fatigue, or the gaps in expertise amid the casual, everyday work expected to succeed. As an example, this week we saw the 911 system for Massachusetts go down for several hours. Outcome bias rears its ugly head too: a system so critical to saving lives makes the failure mode feel all the worse.

So why are we telling the experts how to do their jobs?

There are lots of flavors of this second-guessing, but the responses tend to sound like:

  • “They should have had a ‘proper’ set up for their deployment pipeline.”
  • “They should have done more testing, that would’ve prevented the mistake.”
  • “Clearly they don’t have a staging environment or they would have caught the problem.”
  • “They used the wrong language to develop the system. Of course they’re making errors in their code.”
  • “Why haven’t they updated their software packages? End of Life for that module was six months ago.”
  • “Best practices for a system so important dictate it should look like this and they must not have designed it correctly.”
  • “This is a critical system – they should have been safer.”

That last one in particular always gets me the most. How risky or safe an action was is a lot easier to judge in the rearview. What I don’t hear often is how awful that must have felt for the people in that position: letting folks down, the pressure of restoring a system that so many depend on, and all during a time when leadership in tech seems fond of layoffs.

Instead of trying to point out the failure mode and solve it from outside the system, I make efforts (it’s always going to be effortful) to list out the “maybes” that brought them to that situation. The first, always, should be “Maybe they didn’t want the system to go down…”, which I hope folks can understand. You have to start there, with a broad, objective goal. In this case: having a contact system for emergency services that people can reach. The challenging part, I find, when convincing folks to understand a failure better is redirecting questions about the competence of agents in a system toward more meaningful questions of understanding. What possibilities, what maybes, could there be to explain the failure modes other than “they’re bad at their jobs”?

  • Proper pipeline? “Maybe they’ve found it successful enough in the past for the frequency of their deploys.”
  • Testing? “Maybe they hit a case in their system that had been overlooked; no system can possibly have 100% test coverage.”
  • Staging environment? “Maybe the staging environment carries a different load, or a configuration issue only presented itself in production.”
  • Coding language? “Maybe they’re stuck with a legacy system where migrating away has been deemed cost-ineffective, or the team lacks experience with the alternatives.”
  • Software package version? “Maybe they tried upgrading and noticed conflicting dependencies that will take additional efforts to unwind before moving forward.”
  • Best Practices? “Maybe there were conflicting best practices, like avoiding long-lived branches versus getting wide consensus from experts who are out on PTO.”
  • Be Safer? “Maybe they were following all the protocols laid out for being safe, protocols which couldn’t account for an unexpected series of events.”

Every time we ask “Why didn’t they just…” and then follow it up with a “Maybe it was because…”, we exercise that muscle for deeper understanding of others rather than rushing to judgment, and in doing so we learn a lot more about our shared systems.

This is all to say that local rationality is the crux of empathy toward any incident. It’s accepting that the people in a situation are making the choices that seem most logical to them at the time, based on the evidence they have and the experience guiding their decision-making.

What I’ll also ask is this: when we comment on the failure modes of systems (often ones that don’t directly or indirectly affect us!), what are we looking to get out of it? A sense of understanding is a wonderful thing (a “yikes!” or “oof…” or a hugops is apropos), but armchair “this is how you’re supposed to build this system…” from folks who were not involved is often self-serving.

The irony, of course, is that this can all sound judgmental as well. I’ll say that I still struggle with this, as deeply ingrained as it is in how we approach problems (notably in software engineering). But, much like other failure modes in systems, we can still learn from them. Just ask a maybe or two first.

Further reading on Local Rationality

Photo: https://flickr.com/photos/duncan/22653695991
