Atlassian Brings Postmortem Analysis to DevOps

Atlassian has added a postmortem capability to JiraOps that gathers all the key events from an IT incident in chronological order, making it easier to analyze what happened, identify root causes and raise issues directly within the JiraOps platform to ensure follow-up actions are taken.
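To make the idea of a chronological incident timeline concrete, here is a minimal sketch in Python. It is not Atlassian's implementation or the JiraOps API. The `IncidentEvent` type, the `build_timeline` function, the source labels and the sample events are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IncidentEvent:
    """One entry in a postmortem timeline (hypothetical shape)."""
    timestamp: datetime
    source: str        # hypothetical labels such as "alert", "chat", "deploy"
    description: str


def build_timeline(events):
    """Return the events sorted oldest-first so cause and effect read in order."""
    return sorted(events, key=lambda event: event.timestamp)


# Made-up sample data, deliberately supplied out of order.
events = [
    IncidentEvent(datetime(2019, 1, 10, 2, 31), "chat", "On-call engineer acknowledges the alert"),
    IncidentEvent(datetime(2019, 1, 10, 2, 14), "deploy", "New build rolled out"),
    IncidentEvent(datetime(2019, 1, 10, 2, 27), "alert", "Error rate crosses threshold"),
]

for event in build_timeline(events):
    print(event.timestamp.isoformat(), event.source, event.description)
```

Sorting by timestamp is the whole trick here: once events from different tools sit in one ordered list, a reviewer can see that the deployment preceded the alert, which is the kind of root-cause hint a postmortem is meant to surface.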

At the same time, Atlassian is extending Automation Actions, a set of automated scripts and playbooks made available in Opsgenie, to include support for Amazon Web Services (AWS) Systems Manager and Generic REST Endpoint. Those capabilities extend the ability of DevOps teams to launch automated tasks from within the Opsgenie console or mobile applications. Opsgenie is a notification application that Atlassian gained via an acquisition announced the same day the company launched JiraOps.

Atlassian is also sharing incident management data it has collected throughout the year. Collectively, its customers have recorded more than 190,000 incidents and scheduled maintenance events. Updates to those events were made more than 564,000 times, which in turn fueled the delivery of more than 130 million email and text messages to end users.

Unfortunately, the Atlassian report also finds that only 3 percent of incident reports include any kind of postmortem analysis, an issue the company is trying to address via the latest update to JiraOps.

Danny Olinsky, head of product marketing for incident management at Atlassian, said that as organizations shift from primarily deploying applications on-premises to deploying them in the cloud, many are starting to more fully embrace DevOps best practices. One of the central tenets of DevOps is ruthlessly automating as many processes as possible, including incident management. By integrating incident management and DevOps processes, the amount of downtime experienced by any application over the course of its life cycle can be reduced substantially, he said.

The biggest challenge many IT organizations now face is melding DevOps practices and incident management processes, which frequently evolve in isolation from one another. Those processes, however, are being unified as more organizations digitally transform business processes that need to be always available. Any incident that might lead to application downtime now has potential revenue implications that go well beyond adversely impacting employee productivity. In fact, clear, precise communication about any IT incident is now an expectation in the DevOps era, said Olinsky.

The degree to which organizations will be automating incident management naturally will differ. But at a time when mean time to resolution is being more aggressively measured than ever, many IT professionals have an incentive to automate as many processes as possible. Murphy’s Law also dictates that incidents will manifest themselves at the most inconvenient time, so being able to automate a process from a mobile device can eliminate the need to make a trip to the office in the dead of night.
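For readers unfamiliar with the metric, mean time to resolution is simply the average of how long each incident stayed open. The Python sketch below shows that arithmetic under stated assumptions: the function name `mean_time_to_resolution`, the pair-of-timestamps input format and the sample incidents are hypothetical and do not come from Atlassian's report or any Opsgenie API.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average the open-to-resolved duration over (opened, resolved) pairs."""
    durations = [resolved - opened for opened, resolved in incidents]
    if not durations:
        raise ValueError("need at least one resolved incident")
    # Start the sum from an empty timedelta so the total is also a timedelta.
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data: one incident open for 30 minutes, one for 90 minutes.
sample_incidents = [
    (datetime(2019, 1, 10, 2, 0), datetime(2019, 1, 10, 2, 30)),
    (datetime(2019, 1, 11, 14, 0), datetime(2019, 1, 11, 15, 30)),
]

mttr = mean_time_to_resolution(sample_incidents)
print("MTTR:", mttr)
```

Because the metric is an average, shaving time off even a few long-running incidents, for example by triggering a remediation script remotely, moves the number noticeably.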

It may be a while before IT organizations are able to fully automate every response to an IT incident. But as more processes that can be automated are identified, the probability of a truly catastrophic event capable of paralyzing the entire organization declines.

— Mike Vizard