I’ve found that those who have been most successful with DevOps do not actually care for the phrase. What they care about is the ability to automate, respond to, and improve their software delivery chain, as well as the tactics of moving faster and improving application quality. Another common attribute of those committed to DevOps is a focus on release automation. The problem is a tendency to move quickly through the middle of the process, only for things to start breaking in production.
This is why Virtuoso decided to look at the outcome first.
Release automation, although the most commonly talked about aspect of DevOps, sits in the middle of the delivery chain. It is preceded by what can be volatile and disjointed development processes, and it is followed by the work of making sure that, from dev to post-release, everything works as expected. While release automation is where most of the performance gains are found, if it isn’t followed by great monitoring, its value can degrade release by release.
Virtuoso is a high-end travel network. Their customers come to them expecting top-notch service. And that service starts with their web application. As their development team has begun the move to a DevOps-oriented operation, they’ve had to embrace change head-on to keep their application quality high.
For the team at Virtuoso, there was no question they could automate. After all, the DevOps team had built a culture whose mantra was Automate it all. “If we do something more than two times, it should be automated,” says Shawn Motley, a DevOps architect at Virtuoso. But the team realized that they needed to know what would happen to production after they did automate — meaning they needed to focus on the end of the delivery chain first. “QA was the only place [where] I had a holistic point of view,” recalls Shawn.
Shawn considers himself a veteran of quality, and realized that in order to take the next step in modern development, he needed to turn his focus to how the team found and addressed both process and production issues. He understood that fast releases were worth nothing if they shipped in a state of poor quality, or if the team’s response to issues was slow and ineffective. “DevOps is like internal affairs for cops,” says Shawn, meaning that they need to make sure things get done effectively, efficiently, and consistently.
Quality to Virtuoso was more than just fixing problems. It was also making sure that when things did go south, the proper people were notified without being swamped in notifications (which had previously led to burnout and incident fatigue).
Their setup combined three things: log analysis with SumoLogic; knowledge sharing and internal education via Atlassian Confluence, plus “cheat sheets”; and a robust incident management system through PagerDuty.
The combination of the three did one major thing — it gave them visibility. And that visibility has now optimized their team, and freed them to focus on what’s next, which is supporting their release automation with Chef and Jenkins, and looking at future functional test automation.
PagerDuty in particular was the key to making sure that all that automation didn’t become more of an issue than a benefit. Both PagerDuty’s Event Enrichment Platform (EEP) and its incident management platform made sure that the on-call workload was distributed properly, the right experts were contacted, and the stream of incidents was filtered down to what was most important. The decentralization of ops dramatically improved reliability. Shawn states, “With PagerDuty if you heard a ding, it actually meant something.”
Previously, the alert load fell on one or two individuals, who burnt out quickly and were a single point of failure. Because they could do nothing more than try to keep up with the system, they were unable to use their skills to take the automation further.
Shawn also wanted to ensure that the culture was in place to keep the process they had built running. That meant no more finger-pointing, a commitment to full transparency, and a real effort to make sure everyone was on the same page.
In the 14 months that Shawn and his team spent building out their delivery chain, they went from releases that took 13 hours to more frequent releases that took only six minutes. “We applied the science of testing to application development and operations,” says Shawn. “Without quality operations, we could not support even the most amazing dev.” Once they had a robust analytics and incident management system, they could test the outcome of their experiments, innovate more, and move faster.
The Virtuoso story reminds me that “DevOps” is a movement that aligns a set of principles and practices. But what gets it done is a focus on execution. And a critical aspect of execution is not just speed; it’s quality as well.