I’ve been a big fan of Fantasy and SciFi novels ever since I first picked up The Hobbit before freshman year in high school. I love how magic seems to well up within folks, how people can be their best selves when things seem most dire. Unknown strength or skill appears in the moment to solve the crisis at hand, often without anyone knowing where it comes from.

Quite possibly my favorite series, though not yet finished, is The Kingkiller Chronicles by Patrick Rothfuss. Of the three planned books, two have been written. I’ve read them perhaps 4 or 5 times each. Cleverly crafted magic systems in particular seem to be a hallmark of many of the books I return to often. In the Kingkiller Chronicles, part of that magic system is the study of “Naming”, an art in which you know something so truly, so well to its core, as to have complete mastery of it, even if you don’t know how you know it. It simply happens in the moment.

One scene comes to mind in which Elodin, the Master Namer of the university for magics (and a bit of a crackpot), is teaching his class on the first day of the term. He pulls a river stone from his pocket and asks the class to study it – its composition, how the light reflects off it, its true being and everything that makes it. Yet even for such a simple thing as a rock, he says, after a month of intense study you could only grasp the faintest edges of it. Maybe.

“This is the problem namers face. We must understand things that are beyond our understanding. How can it be done?”

He didn’t wait for an answer and instead picked up some of the paper he’d brought in with him, handing each of us several sheets. “In fifteen minutes I will toss this stone. I will stand here,” he set his feet. “Facing thus.” he squared his shoulders. “I will throw it underhand with about three grip of force behind it. I want you to calculate in what manner it will move through the air so you can have your hand in the proper place to catch it when the time comes.”

Elodin set the stone on a desk. “Proceed.”

From “The Wise Man’s Fear: Day 2 of the Kingkiller Chronicles”

The class then immediately sets to it, drawing on knowledge of physics and psychology, maths and probability, to seek out every possibility that could arise from this request. The students are encouraged to group up, to combine their collective knowledge to form an answer. A class of highly ranked and knowledgeable individuals, some of the brightest in the world, is unable to answer with any certainty.

“We don’t know how the stone will fall.”

Elodin clapped his hands approvingly. “Good! That is the right answer. Now watch.”

The eccentric professor then steps into the hallway and ushers in a boy no older than eight. Elodin stands where he said he would, in just the squared-off posture he stated, grinning madly. “Catch!” he says and tosses the stone to the boy who, startled, snatches it out of the air. Elodin whoops and cheers, congratulates the small child, and ushers him back out. He then asks his class how the boy did the impossible, predicting in the moment what the best and brightest of them couldn’t piece together with all their years of deep knowledge and understanding.

This is how we, as experts, should look at incidents, and it is where we’re getting it so very wrong in the software industry. We’re trying to calculate all the possibilities, predict the trajectory so we’ll be lined up to catch the failures before they hit the ground. If we just add a few more numbers to this time series or modify our predictive pathing to see where we’ll be next week, we can figure out exactly when we’ll have an incident! With all these metrics we’re sure to plan out the course to success before we even have the failure!

But we can’t. An incident is by its very nature a surprise, an event you can’t plan for.

And yet! Somehow, beyond all reason, we manage to keep things running. We don’t just survive, we thrive in the impossible. We think, we plan, and we act in the moment, with years of experience as the foundation for our expertise, in high-tempo, high-pressure situations where we synthesize just what needs to be done.

How does that happen? How do we invoke this magic, practice it so often that it seems mundane, and how do we go on to teach it to others? That too is one of its as-yet-unknown parts. Our subconscious is constantly picking up signals we’re not entirely aware of. We collect them and we disseminate them to others. We process them, separately and collectively, to move, to act without being fully aware of everything that happens. The network-slowdown-that-isn’t-a-network-problem, where your gut tells you to look at the slow query log and you shout “ah ha! I knew it”: these are the not-so-magical incantations the experts need to share.

“My point is this. In each of us there is a mind we use for all our waking deeds. But there is another mind as well, a sleeping mind. It is so powerful that the sleeping mind of an eight-year-old can accomplish in one second what the waking minds of seven members of the Arcanum could not in fifteen minutes.”

This is Resilience Engineering, the practice of understanding and developing this intuition, that “sleeping mind” that works though we can’t explain how. It’s what we continuously draw on to rescue us from disaster – and to avert disaster in our regular daily actions without fully knowing the extent to which we’ve avoided harm. There are true things we know instinctively, hints and assumptions we rely upon without comprehending all of the bits of information that led us there. Our work then is to tease out these unknowables, to continuously create and refine mental models that can make sense of how we go about learning.

We get this wrong in software engineering because we think getting better at catching the rock means predicting where it will be long before it’s thrown. What we actually need to do is get better at recognizing when the rock is being thrown and reacting quickly to it, studying the arc and our ability to move when we need to. The only way to do that is to practice it in the moment.


