The concept of a “war room” is time-honored. One well-known example is the Churchill War Rooms, an underground bunker from which Britain’s leaders plotted the Allied route to victory during World War II. British Prime Minister Winston Churchill, as plain-speaking and self-assured as ever, said, “This is the room from which I will direct the war.”
Today, the war room concept also serves the world of business, specifically IT departments troubleshooting a major issue that affects business applications and the IT infrastructure supporting them. The problem? So many parts of the infrastructure, hardware and software alike, could be the root cause of any given issue. As a result, IT war rooms can draw in many stakeholders, each an expert in one of those parts. And while the goal is to collaborate to resolve the problem and minimize the damage, IT war rooms often end up as a case of too many cooks in the kitchen, all speaking different “languages.”
War rooms are a symptom of a much bigger problem within the IT organization. At the highest level, that problem arises when IT operations teams lack sufficient visibility into, and proactive control over, the infrastructure. For one thing, in a siloed organization (which most enterprises are), discussion across silos is rare. Plus, the siloed monitoring tools used to resolve infrastructure issues lack a common context and have no inherent understanding of the applications’ business impact. By contrast, the hardware, software and data that span silos do “talk.” In a sense, they collaborate. That’s the nature of connected infrastructure. Data, for example, flows throughout the enterprise, entirely oblivious to silos.
Dennis Drogseth, vice president at analyst firm Enterprise Management Associates (EMA), describes the lofty view of war rooms this way: “Our war rooms can be either physical or virtual. Highly automated or not. Made up of consistent, well-defined teams, or not. But what makes them war rooms is the need for collaborative decision-making across silos.”
How Low Can IT War Rooms Go?
The reality is that, despite their noble charters, IT war rooms often devolve into blamestorming: rancorous debate and finger-pointing aimed at pinning a failure on application development, database analysts, IT operations or DevOps. Drogseth described a bleak view of war rooms as “disastrous assemblages of finger-pointing adults caught up with siloed versions of the truth.” In an even more disillusioning example, one vendor stated that “mean time to innocence” becomes the IT war room’s chief metric, displacing the real work of isolating and resolving the root causes of issues.
With so many factors in play, the dollar costs of IT war rooms are difficult to calculate. Taking aim at the problem, Enterprise Management Associates said, “The average War Room comprises 15 people and takes 5-6 hours to respond to and resolve the incident that caused it to convene. 29% of War Rooms take more than 11 hours to respond and resolve.” For the average war room, that works out to roughly 75 to 90 person-hours per incident. And those figures come on top of the substantial losses that occur when a server or switch, for example, goes down while war room staffers bicker over who’s to blame. For some businesses, revenue losses can reach into the millions of dollars per hour of outage.
Furthermore, the problems and costs are growing. Infrastructure complexity has ballooned with the growth of hybrid data centers. The hybrid model combines cost-effective, flexible cloud compute and storage with the control and security of on-premises infrastructure. The reality, however, is that highly virtualized, multi-cloud environments are enormously complex, which makes reliable application service delivery hard to ensure. Out-of-control infrastructure leads to outages and slowdowns, and to finger-pointing.
Remedies
There are, however, several ways to mitigate the downsides of IT war rooms:
- Adopt technology solutions that provide cross-domain insight. Put more simply, that means finding tools that can trace an infrastructure issue to the exact point where it originates, even when the symptoms appear elsewhere in the infrastructure. Reaching this level of sophistication largely removes the grounds for finger-pointing.
- Learn from history. Issues repeat themselves: if an issue is not traced to its source and remedied the first time it occurs, it will very likely recur. Making the necessary correlations to remedy an issue often requires analytics-based tools.
- Create what EMA calls an “informed context.” An informed context supports each stage of resolving an issue, from triage and diagnostics through remediation to validation. Knowing where each application runs on the infrastructure at any given time, along with its business priority or tier, is one example of informed context.
IT will likely always have its war rooms, and many organizations will argue that they’re necessary. At its best, a war room is a highly motivated, solutions-driven SWAT team. Of course, like anything of value, a SWAT team will always come at a cost to the enterprise. One way to view the IT war room is as just one more cost center. A better way may be to quantify its value: if a war room shaves minutes off the time needed to get a server up and running after an outage, or prevents an outage altogether, the savings could run into the millions of dollars. Too often, though, the opposite is the case: IT war rooms add to the revenue lost from a failure.
Let’s hope that enterprises remember the original intent of IT war rooms and reorient them around strategy rather than blame. If they don’t, they might want to consider closing the door on the IT war room for good.