Tag: Resilience Engineering

Eastham Bridge collapse 24th May 2016

Big Enough to Fail

I’ve been playing with an idea based mostly on anecdotal evidence: at some point, the external dependencies our systems rely on become so tightly coupled, large, and fundamental that, should those foundations inevitably fail, blame can actually go down in response to an incident. A blanket statement applicable to any system feels egregious, but I’m confident enough to…

Chameleon on a leaf with green and red tones https://flickr.com/photos/michaels_bilder_aus_aller_welt/10760876543

The Invisible Success of Near Misses

We often talk about blame-aware culture. Your teams are continuously working towards building a system that, among many other goals, is safe and reliable. Incidents happen when we’re surprised. Because we’re working towards safety, and these incidents are by definition surprises, shaming folks for failure is counterproductive; instead, we should celebrate the opportunity to learn more.…

Crystal Ball

Peering into the future of Resilience Engineering in Tech

Since coming back from SREcon 19 Americas in Brooklyn (catch up with Tanya Reilly’s conf report) and Chaos Community Day 19 in Manhattan (Nora Jones’ Chaos Engineering Traps), I’ve had Resilience Engineering at the front of my mind. I’m thoroughly encouraged to see so many folks interested in it, and speakers from many different companies contributing their shared experiences to a field that…