Continuous Verification of Friday Deploys

Deploying code on a Friday is a hot-button topic that pops up in tech every few months, setting Twitter and the like ablaze with passionate discourse in both directions. “It’s too risky for on-call folks”, “no, it’s perfectly fine if you build your systems right” and so on. This is me adding fuel to that fire.

Some first principles to begin with:

  1. A non-zero number of deploys will go wrong. No successful company in the world has ever existed or will ever exist that doesn’t face failure often.
  2. We don’t know with absolute certainty before taking any action that it will succeed. In fact, we might not know it failed until after the fact (sometimes not until long after).
  3. No one wants to spend more time working on a problem past the end of a working day than they have to, especially into extended time off like a weekend.

If you deploy at all on Fridays, it will eventually require you to work late because of something unforeseen, against the interest of you and any involved teammates.

This may seem as though I’m advocating for the “No Friday Deploys” camp, but it’s just the opposite.

Friday deploys will happen whether you want them to or not. Extenuating circumstances will at some point demand that you deploy at undesirable times (“We have a SEV0 security patch”, “We’re launching an experiment that needs to run on Saturdays and Sundays”, etc.), when the cost of failure may be higher. If that’s the case, a mature engineering organization must be well practiced at doing so.

“Friday Deploys are Resilience”

I didn’t originally say that, but I wish I had. I want it on a laptop sticker and maybe a tee. Ryan Kitchens tweeted it as a snarky response to a Twitter thread rehashing the Friday deploy debate, because apparently it had been too long since the last time we argued about this, and poking the bear is just too much fun.

As a quick reminder, Resilience is the demonstration of Graceful Extensibility, that is, our ability to adapt when some capacity to work is exceeded. Human intervention is required to sort things out. This is the scary part of engineering, but also the part that makes it feel like we’re breaking new ground. After all, if we had all the instructions, it would just be data entry.

The only way Resilience can happen, though, is through experience, often when things go wrong. Yes, we need things to go wrong to help us understand what to do when things go wrong. Everyone has their story about the unexpected: a long, drawn-out incident or a particularly painful, high-stress situation where the company’s future dances on the edge of a knife. We dread these when they happen but are grateful for their teachings later.

Getting Good when Breaking Bad

We should be able to deploy on a Friday. In fact, we have to deploy on Fridays. It’s not enough to say it’s OK to do it when needed in extreme or unusual circumstances. It needs to be done regularly and with thoughtful acknowledgement of the scope, both with the comfort of the banal and with “The Fear” of any high-stakes deploy going south. I would even venture to say Friday deploys should be a requirement of any functional engineering organization that exists in a universe with Fridays.

There’s always a choice, but it may not be palatable. At some point, you’re going to have to deploy on a Friday: for business necessity, for security concerns, for something out of the ordinary that breaks the deploy cadence you’d prefer. This echoes the philosophy behind Chaos Engineering: failure will happen, so we should minimize the blast radius where possible and understand how things will go wrong. Be prepared for failure in what should be a common scenario.

I’d liken deploying on a Friday to testing your datastore backups. As the adage goes, an untested backup is no backup at all. Can you be confident that when disaster strikes, your data retention scheme can withstand losing a primary and failing over to a replica? We can, should, and must treat deploying at varying times and under varying conditions as necessary practice for unforeseen circumstances. Develop a frequently practiced plan for Friday deploys so that when you must do one, the boundaries are better understood.

There is nothing inherently riskier about deploying on a Friday than on any other day of the week. The cost of a “surprise” may be higher, but that does not mean the deploy is any more likely to fail. What does make a change on a Friday riskier is not doing it often enough. We should know what we’re in for when we collectively agree that, yes, it’s not ideal, but the decision to not make a change (to not deploy on a Friday) comes with a greater cost than the known potential cost of failure. “Known” is an important qualifier here: that cost is never fully known!

Resilience is preparing for this eventuality, that you will have to deploy on a Friday.

But…not too regularly…

But don’t wake me up in the middle of the night, or take me away from dinner with my family. No one wants that either. I’ve got things to do on a Friday night (OK, it’s listening to podcasts and putting together jigsaw puzzles). We need personal time to recharge, to be ready for the rest of the work week. Our resilience to the unexpected is also affected by fatigue, and that has to be taken into consideration as well. When disaster strikes at times that cost us energy we then have to pay back, leaving us with less of it during working hours, no one’s happy.

Just because you can deploy on a Friday doesn’t mean you have to. The practice should provoke a deliberate testing of your systems, human and machine alike. We’re not deploying at 4:59pm and rushing out the door because we’re too impatient to wait until Monday morning. That’s the reasonable pushback from opponents of Friday deploys, and I wholeheartedly agree.
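To make the “not at 4:59pm” part concrete, here is a minimal sketch of a pre-deploy check. It is not taken from any real deploy tool; the function name, the 5pm end of day, and the two-hour observation buffer are all assumptions picked for illustration. The idea is that Friday is allowed like any other day, as long as enough working time remains to watch the deploy settle, and a Friday deploy comes with a nudge to communicate first.

```python
from datetime import datetime, time, timedelta

# Hypothetical policy values, chosen only for illustration.
WORKDAY_END = time(17, 0)        # assumed end of the working day
MIN_BUFFER = timedelta(hours=2)  # assumed time needed to watch a deploy settle


def deploy_check(now: datetime) -> tuple[bool, str]:
    """Return (allowed, reason) for a deploy started at `now` (local, naive time)."""
    end_of_day = datetime.combine(now.date(), WORKDAY_END)
    remaining = end_of_day - now
    # Applies every day: do not ship and walk out the door.
    if remaining < MIN_BUFFER:
        return False, "not enough working time left to observe the deploy"
    # Monday is 0, so Friday is 4. Fridays are allowed, but deliberately.
    if now.weekday() == 4:
        return True, "Friday deploy: announce it and confirm who is watching"
    return True, "ok"
```

Under these assumptions, a deploy at 10am on a Friday passes with a reminder to communicate, while one at 4:59pm is refused on any day of the week. The rule targets the rush out the door, not the day itself.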

The issue here is control. We’re afraid of Friday deploys because we don’t know what will happen or, more so, because we imagine what could happen and run as far away from it as possible. This control (“No Friday deploys!”) is simplistic and fragile, resting on the mistaken belief that if we just say we won’t deploy then, we’ll never have to. Instead, communicate early and often, execute with preparation for the unknown, and remove the dread behind it.

Live a little bit in that uncomfortable area of uncertainty and make sure it doesn’t become unwieldy. Keep deploying on Fridays to verify that those deploys are sound. Just not too often.

Thanks to Jacob Scott for the inspiration for the title and Ryan Kitchens for nerd-sniping me into writing this.


Photo: https://www.flickr.com/photos/oneaustin/1261907803/