One of the many challenges of DevOps is finding great team members who can handle the always-on nature of the job. Nearly everyone at VictorOps has been “that person” and has done the weeklong stint of carrying the virtual pager.
What we noticed in our years of doing that job is that team member behavior tends to be polarized. If it’s your week to be on-call, then you are “all in”. You don’t venture far from home, you can’t attend special events and you have to apologize to your significant other a lot. When you are not on-call, you tend to unplug from the company. These are normal behaviors and completely understandable given the difficulty of the job.
The problem is that this on-call way of life is not conducive to actually fixing problems faster. In a perfect, company-centric world, everyone would get paged for every problem, on the logic that one of the team members has the information needed to solve it. But that’s not going to happen. And unfortunately, you can’t know in advance who that one person with the right knowledge will be.
The question then arises: how can team members know more about what is going on with the complex systems in their architectures while also maintaining their quality of life?
Consider Twitter and Facebook user behavior. I have a cousin who lives in Chicago. I have seen her in person once in the last 30 years, yet I know she remodeled her living room, I know about her work and I know where she took her last vacation. I know these things because the Twitter and Facebook experiences leverage something called Continuous Partial Attention, a term coined by Linda Stone in 1998. Basically, I pull my phone out of my pocket, swipe once, and gain a disproportionate amount of information for the effort (or conscious thought) I put into it.
If you’re in DevOps, then the idea of Continuous Partial Attention is something you’re already living daily. Now imagine putting CPA to work for your team. Everyone can keep an eye on the infrastructure without feeling like they are on-call. The on-call team member is still responsible, but if everyone has an idea of what is happening, the person who knows the answer is likely to surface sooner. Now everyone wins because the problem is solved faster, getting the team back to their lives and the company back to revenue generation.