Things get crazy, don’t they? I mean, IT, even with Agile and DevOps, is a very interruptive endeavor. When something in production breaks, whatever you were working on suddenly gets a lot less important. Agile aims at producing more stable code, and DevOps focuses on shifting troubleshooting left, but IT systems, whether we wrote them or someone else did and we manage or maintain them, are complex, and they inevitably break unexpectedly. Over time, practices like test-driven development (TDD) and continuous testing have steadily improved quality and reduced interruptions, but we still say things like “Oh, the container crashed, spin up a new one and keep moving,” when that crash should be a work-stopping event. If it crashed once, it will crash again, and it only takes a few minutes of searching to find horror stories caused by doing exactly what I just described. There is a line between “resilient application infrastructure” and “We ignore critical failures because we can,” and one side of that line is pretty ugly.
So, with all of our advances, we are still facing unexpected issues cropping up – sometimes caused by coworkers or vendors – that we need to deal with.
One thing to remember is that this, too, is part of your job. Start there. While most of us aspire not to be “the fix-it guy,” if you wrote the code or managed the infrastructure, you are the right person to fix the unexpected problem. So don’t stress about it, but remember that others (like management) might be stressed themselves, so communicate clearly. A five-minute email or thirty-second Slack message can save a ton of frustration and meetings later. A simple “This came up and is a priority, so that thing I was set to deliver tomorrow will be delayed by at least X amount of time” will do a lot to help others understand the impact on your daily flow.
I’ve worked a fair amount of my life as a contractor, and I admit that I honed the skills in the above paragraph living that life. While contractors are generally paid by the hour and ownership is a bigger deal, in the end, contractors are there to do what needs doing, and the client decides exactly what that is. The same applies when you are not a contractor, and in my experience, it applies double or triple if you are in charge of a team: you have an entire team to account for, and you had better communicate when an issue is big enough to sidetrack deliverables. On one team I managed, one of seven inputs to a massive data pool that was the source of billing failed. I emailed the stakeholders in our other projects with a simple, “The company likes to get paid, so our attention is 100% on restoring that feed; your project is delayed.” That case was one of two pretty extreme ones from my time leading that team, but it’s a good example of effective communication: just let them know. Most business leaders are there for the org while focused on their piece of it, and if they have an issue, they can hash out the business’s priorities and come back to you.
And keep nailing it. Errors, issues, and vendor-relations failures happen. You’re the ones minimizing the disruption and keeping the org rocking. Don’t stop or get discouraged when one of them throws off your timelines; just communicate … and keep being the star.