Since coming back from SREcon 19 Americas in Brooklyn (catch up with Tanya Reilly’s conf report) and Chaos Community Day 19 in Manhattan (Nora Jones’ Chaos Engineering Traps), Resilience Engineering has had my full attention. I’m thoroughly encouraged to see so many folks interested in it and speakers from many different companies contributing their shared experiences to a field that… Read more →
Category: Human Factors
Resilience Engineering and Error Budgets
This post on error budgets should be considered fluid, with ideas worked in and out as any good beliefs should be. My experiences with error budgets are not universal and should not be read as decrying anyone who has had success using them. I strongly welcome thoughtful, critical feedback and assume best intent from anyone who disagrees. I’m not a fan of… Read more →
Book Review – Meltdown: Why Our Systems Fail and What We Can Do About It
I recently had a chance to review Meltdown: Why Our Systems Fail and What We Can Do About It, which takes a critical look at several examples of catastrophic failure across many different areas and applies Perrow’s theories of Normal Accidents to address the systemic problems we face regularly. A lot of the background in human factors I’m… Read more →
No, seriously. Root Cause is a Fallacy.
I’m just back from attending SREcon ’18 Americas in Santa Clara last week, an incredible conference I’ve spoken at before (a tutorial in Dublin in 2016), but never in the U.S. You can find some blog posts written about specifics (Day 1, Day 2, Day 3), but I wouldn’t be able to do it justice myself, so read those!… Read more →
Thoughts on the role of Incident Commander
As with most of my blog posts, this should be considered a living document. The ideas offered here are malleable, just as I would hope the document it references is flexible to new ideas. Conversations surrounding this are welcomed and encouraged as we all continue to learn. I recently came across PagerDuty’s documentation surrounding their philosophy on the Incident Commander (here… Read more →
Recognizing adaptability in learning
The following is the current version of a section in my book on interviewing for technical roles. I’m trying to help out with any advice I can while I’m putting all of this together. As part of that, I’m looking for constructive criticism and feedback alongside it. My experiences as an engineer are also not universal and so my own biases… Read more →