Quick thoughts on short papers: “Failure to adapt or adaptations that fail: contrasting models on procedures and safety”

I’ve been consuming a lot of content in the last few months — books, papers, blog posts, talks, podcasts, you name it. I’ve seen exciting aha! moments and spent hours trying to wrap my head around seemingly simple but, in fact, maddeningly complex concepts. In this setting, the right sort of short paper is like a refreshing cold drink on a hot, humid day. These papers argue a core idea of goldilocks size and serve as landmarks and reference points as I sail through seas full of dragons.

The most recent paper of this type I have run into is “Failure to adapt or adaptations that fail: contrasting models on procedures and safety” by Sidney Dekker, from 2003 (hat tip to John Allspaw for passing it along). This paper has become my go-to for the idea that work as imagined and work as done are not the same. That is,

People at work must interpret procedures with respect to a collection of actions and circumstances that the procedures themselves can never fully specify (e.g. Suchman, 1987). In other words, procedures are not the work itself. Work, especially that in complex, dynamic workplaces, often requires subtle, local judgments with regard to timing of subtasks, relevance, importance, prioritization and so forth.

And:

 Procedures are resources for action. Procedures do not specify all circumstances to which they apply. Procedures cannot dictate their own application. Procedures can, in themselves, not guarantee safety. 

Applying procedures successfully across situations can be a substantive and skillful cognitive activity. 

Safety results from people being skillful at judging when (and when not) and how to adapt procedures to local circumstances. 

And, finally:

There is always a tension between centralized guidance and local practice. Sticking to procedures can lead to ineffective, unproductive or unsafe local actions, whereas adapting local practice in the face of pragmatic demands can miss global system goals and other constraints or vulnerabilities that operate on the situation in question. 

To apply this to software engineering, simply replace procedures or guidance with runbooks or documentation, then think about the last time someone in your organization went “offroad” and what the response was. The beauty of short papers is that they don’t need much more analysis than that. Just read it yourself, and let me know what you think!

What’s up, funemployment

I left Lyft in June 2019. In the lead-up to my departure, I decided to take a period of “structured funemployment” rather than pipeline a job search and quickly get back to W-2 work. A break like this, where I set the agenda and have time to reflect on the past and future, is something I’ve been promising myself I would do the next time I was between jobs. Achievement unlocked!

I’m privileged to have the financial/logistical space to take a break like this and especially so to structure this experience around my membership in the South Park Commons community, which I joined in July (h/t Matt for the referral). A group that “brings together talented people to share ideas, explore directions and realize the opportunities that’ll get you there” is exactly what I was looking for. The Commons is an inspiring place, full of brilliant, kind, hard-working people with varying backgrounds and goals.  Spending my day at the Commons is energizing in a way that sitting at home is not.

In this setting, I’ve chosen to focus my attention so far on understanding how complex systems fail and the application of resilience engineering to software. This decision is motivated in large part by my experience with operations, failure and the incident lifecycle during my time at Lyft. It turns out there are decades of research on why these problems are hard (i.e., why we can’t have nice things!).

I find this domain fascinating and am framing my pursuit as independent research, rather than an intent to launch a startup “in this space” (as they say). It’s… complex, but I don’t think this work aligns well with the sort of mechanics (e.g., metrics, growth) associated with success as a venture-backed company (but reach out if you think otherwise!). One data point here is that John Allspaw, the godfather of “resilience for tech” and former CTO of Etsy, is operating a consultancy, not a startup selling a product. My working hypothesis is that I’ll leave funemployment to work in this area inside a maturing hyper-growth company — think decacorns like Airbnb and Stripe. I still have at least a few months left of funemployment and research, but I’m happy to chat with folks at organizations looking to invest/hire in this area — drop me a note!  

So far, in practice, independent research on resilience engineering means consumption and conversation. Looking at the ever-increasing list of papers, books, and talks I have read/watched and have outstanding, it is definitely the best and worst of unbounded queue times. It is also great to have connected with individuals and communities — primarily on Twitter and in various Slacks — where there’s a healthy and ongoing discourse on resilience and related topics. Plenty of excellent chats over coffee as well — and funemployment means a flexible schedule, so if you’re interested in talking shop, just let me know.

While I’d give myself top marks for participation on Slack and Twitter (@jhscott), I’ve been slower to produce longer-form analysis and writing. I’ve started drafting talk proposals for related conferences, with the first being REdeploy, which I’ll be attending in San Francisco in October. I’m also hoping to use this blog to distill what I’ve learned (duh). In the next few days, look for a writeup of what SREs even do in a world of production ownership where developers hold the pagers.

Let’s chat

Curiosity and free time are a dangerous and excellent combination. If you’d like to grab coffee or lunch in the SFBA, or chat over Hangouts/Skype/etc., let me know. The best ways to reach me are email (first.last @ gmail.com) or Twitter (@jhscott).