The Good: DevOps has been a success for Dev.
With higher expectations, better processes, more management focus and new tools like advanced PaaS and containers, developers are churning out more and more software faster than ever. Success! Or is it?
The Bad: DevOps has been a success…for Dev.
For Dev teams, this may result in pats on the back and high fives. But for Ops, this can look like one thankless tidal wave of work after another. The same software velocity that is praised in Dev is currently overwhelming the developers’ counterparts in Operations. And frankly, many DevOps tools released in recent years happen to benefit Dev much more than Ops. (My pet theory is that developers at IT automation firms are acting and thinking like…developers – focusing their creative energies on solving Dev problems.) A similar problem holds true for IT Security teams who want to build quality in by “shifting left” (see https://devops.com/2015/12/10/getting-rugged-devops-right/). IT Operations and Security teams find themselves doing more work than ever, but without any additional staff or IT automation resources. To add insult to injury, many IT Ops teams have to invest countless hours (away from family and personal time) in training courses and migrations to reorient their processes to fit shiny, new Dev-centric “DevOps” tools and paradigms.
The Ugly.
Work. Overwork. Glitch. Search. Find. Rework. Repeat.
A quick glance at IsItDownRightNow.com will show you which IT Operations teams are in a living hell at the moment. The other thing you will notice is that these IT Ops teams work for some of the most reputable companies on the planet. They are talented teams, but they are stuck in rework cycles of web performance problems, security vulnerabilities and outages.
These problems get the attention of everyone. Operations…and Dev…and Security…and customers…and Finance…and auditors and the press.
No one likes rework. And no one likes the panicked fire drills that cause panicked rework. So what causes those rework cycles?
Drama-free IT. Visibility Cures Chaos.
Much of this chaos can be tied to miscommunication. Boring? Maybe. Simple? No. Part of miscommunication is cultural. (Hint: OrcaConfig’s “How to Screw Up DevOps” white paper covers the cultural aspects of miscommunication.) But part of it can be cured by automation solutions that promote transparency and visibility.
If we want drama-free IT we need IT automation solutions that eliminate miscommunication and missed assumptions between Dev, Ops and Security.
What better way to do this than with full transparency and visibility?
Bird’s-Eye Views:
Many IT automation tools cater to SysAdmins who are interested in managing configurations of individual servers. That’s great if your focus is on individual servers. But IT Operations, eCommerce teams, Application Owners, Compliance & Security teams and IT Managers often care more about the health and performance of their application ecosystem (including related applications, middleware, databases and the configurations that tie them together). So there is often a mismatch between the tools that SysAdmins and Developers have and what IT Ops needs.
Operating at Enterprise Scale Requires at-a-glance Visibility:
IT Ops teams need bird’s-eye views of applications, databases, middleware and ecosystems to promote ease of ongoing operation. They need to be able to see at a glance that a configuration compliance problem exists, and they need to know right away the location and nature of the error. Large-scale IT Ops teams do not have the luxury of toggling back and forth between multiple reports and multiple screens, searching for that non-compliant, broken needle in an ecosystem haystack. If they are faced with a glitch or an outage, they need to find and fix it right away. One size does not fit all. Dev, Ops and Security each have their own interests and their own need to view their application through the lens that is most meaningful for them. Multiple ecosystem views eliminate guesswork between Dev, Ops and Security and save precious time in the find-fix cycle.
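To make the at-a-glance idea concrete, here is a minimal sketch in Python, not a description of any particular product. It assumes configuration has already been collected into plain dictionaries. The server names, settings and the `compliance_report` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical baseline that every server in the ecosystem should match.
BASELINE = {"tls_version": "1.2", "log_level": "INFO"}

# Hypothetical observed settings, keyed by server name.
OBSERVED = {
    "web01": {"tls_version": "1.2", "log_level": "INFO"},
    "web02": {"tls_version": "1.0", "log_level": "INFO"},
    "db01": {"tls_version": "1.2"},  # log_level was never set
}


def compliance_report(baseline, observed):
    """Map each non-compliant server to its deviating settings.

    Each deviation is a (setting, expected, actual) tuple, so the report
    gives both the location and the nature of the problem. Compliant
    servers are left out, which keeps the view short.
    """
    report = {}
    for server, settings in observed.items():
        deviations = [
            (key, expected, settings.get(key))
            for key, expected in baseline.items()
            if settings.get(key) != expected
        ]
        if deviations:
            report[server] = deviations
    return report


if __name__ == "__main__":
    for server, deviations in compliance_report(BASELINE, OBSERVED).items():
        for key, expected, actual in deviations:
            print(f"{server}: {key} should be {expected!r}, found {actual!r}")
```

A single roll-up like this replaces toggling between per-server reports: only the servers that deviate appear, each with the setting that is wrong.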
Forrester Research spells it out well in their 7 Habits of Rugged DevOps white paper, Habit 1: Increase Trust And Transparency Between Dev, Sec, And Ops:
“I&O pros pride themselves on understanding the interconnectedness of the hardware and software environment, and they are unnerved by complexities that poorly governed delivery practices create because the firm rates I&O pros on their ability to improve the performance and stability of the production environment. To maintain the environment, they must reduce the chances of an outage or interruption of service, and they must reduce the MTTR when an issue does occur.”
Understanding that interconnectedness depends greatly on how much intuitive ecosystem visibility we provide to our colleagues in IT Operations. But it’s not just IT Ops that benefits from visibility; Dev and Security do too.
Developers are able to write more secure code when they get early, in-context feedback. As David Mortman, distinguished engineer and Dell chief security architect, said, “Security issues are product quality issues, and no one wants to write buggy code.”
Even better than efficiently finding and fixing errors is, of course, preventing them in the first place and proactively eliminating the damage they cause. Here too, the answer is better visibility. With true application ecosystem visibility, Dev and Ops teams would be able to model relationships between applications, middleware and their configurations, so users will know that “changing this affects that”.
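One way to picture “changing this affects that” is as a dependency graph that can be walked from the thing being changed. The sketch below is an illustration under assumptions, not a real tool: the component names and the `affected_by` helper are invented, and the relationships are hard-coded where a real system would discover them.

```python
from collections import deque

# Hypothetical ecosystem: each key is a component, each value lists the
# components that depend on it directly. All names are invented.
DEPENDENTS = {
    "db.connection_string": ["orders-service", "billing-service"],
    "orders-service": ["storefront"],
    "billing-service": ["storefront", "finance-reports"],
    "storefront": [],
    "finance-reports": [],
}


def affected_by(change, dependents):
    """Return every component downstream of `change`, nearest first.

    This is a breadth-first walk of the dependency graph; the `seen` set
    stops shared dependents from being reported twice.
    """
    seen = set()
    order = []
    queue = deque(dependents.get(change, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(dependents.get(node, []))
    return order


if __name__ == "__main__":
    print(affected_by("db.connection_string", DEPENDENTS))
```

With the sample data, changing the database connection string touches both services directly, and through them the storefront and the finance reports. That is the kind of answer Dev, Ops and Security would all want before a change goes out.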
My teenage son has better visibility via his smartphone as to his driving route and weather conditions than I have for my multibillion dollar IT operation.
Visibility into the Future:
Ecosystem visibility is a key to communicating between Dev, Ops and Security and running complex application environments in the present. But what about the future?
Release automation tools and custom scripts are excellent at deploying changes across large environments. Unfortunately, they can be just as efficient at deploying typos and other mistakes across large environments, causing the chaos that we discussed earlier. Here again, Operations teams would benefit from improved visibility – the ability to preview and pre-validate proposed changes across the entire ecosystem well before those changes are deployed. This is the real promise and savings of ecosystem visibility. Better visibility, with more use-case lenses, cuts out the miscommunication that causes too many performance glitches, security vulnerabilities and outages. This chaos takes a toll on our IT Operations teams, our employers and our customers.
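As a rough sketch of what previewing and pre-validating a change could look like, the Python below diffs a proposed configuration against the current one and checks it against a few rules before anything is deployed. Everything here is an assumption made for illustration: the settings, the rules, the host names and the `preview` and `validate` helpers are hypothetical.

```python
# Hypothetical current and proposed configurations for one application.
CURRENT = {"max_connections": 200, "tls_enabled": True, "cache_host": "cache01"}
PROPOSED = {
    "max_connections": 2000,   # one zero too many
    "tls_enabled": True,
    "cache_host": "cahce01",   # a typo that a script would deploy faithfully
    "timeout_s": 30,
}

KNOWN_HOSTS = {"cache01", "cache02"}

# Hypothetical rules: setting name -> (check function, failure message).
RULES = {
    "max_connections": (
        lambda v: isinstance(v, int) and 1 <= v <= 500,
        "must be an integer between 1 and 500",
    ),
    "tls_enabled": (lambda v: v is True, "must be true"),
    "cache_host": (lambda v: v in KNOWN_HOSTS, "is not a known host"),
}


def preview(current, proposed):
    """List what a proposed change would add, remove or modify."""
    changes = []
    for key in sorted(set(current) | set(proposed)):
        old, new = current.get(key), proposed.get(key)
        if key not in current:
            changes.append(("add", key, None, new))
        elif key not in proposed:
            changes.append(("remove", key, old, None))
        elif old != new:
            changes.append(("modify", key, old, new))
    return changes


def validate(proposed, rules):
    """Return one message for every rule the proposed configuration fails."""
    problems = []
    for key, (check, message) in rules.items():
        if key not in proposed:
            problems.append(f"{key}: missing")
        elif not check(proposed[key]):
            problems.append(f"{key}: {message}")
    return problems


if __name__ == "__main__":
    for change in preview(CURRENT, PROPOSED):
        print(change)
    for problem in validate(PROPOSED, RULES):
        print("BLOCKED:", problem)
```

The preview shows Ops exactly what will change, and the validation step catches the mistyped host name and the out-of-range connection limit while they are still a proposal rather than an outage.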
What Next?
We are soon entering a new season with new budgets. The temptation to “buy something” will be great. In the spirit of Rugged DevOps, my recommendation is to take a full, bird’s-eye view of your SDLC and to shift Ops and Security leftwards, building their critical considerations into your processes and into your IT automation tool selection.