For an enterprise deep into its DevOps maturity model, the initial “wow” factor of automation has worn off. At this point, the sexier concepts involve proving continuous improvement. As an example, our first baseline of metrics might have captured the pre-DevOps timings of a typical build and deployment of a given application. We can measure this against our post-implementation timings and show a significant savings in time. This is opportunity time, meaning an employee must be ready to turn around the decision cycle faster to take advantage of the time saved (not be playing solitaire, for example). But once DevOps automation is old hat, the difference between our initial post-DevOps build and deployment time and our current one may not be all that significant. Is there more hidden here than meets the eye?
Establishing Context & Baselines
At this point, context may be a more meaningful measurement. As an example, if my first version of DevOps automation for a deployment contained 35 steps and my current one contains 160 steps, it stands to reason I am “doing more” in my deployments than I used to. These additional steps may involve executing more tests than I originally thought to run. They may be interacting with monitoring systems (turning them on and off as appropriate), which reduces “noise” in the alerts my monitoring teams must respond to. In short, over time my deployment processes may become “longer” because I am increasing the capabilities, or the sophistication, of what I do during these deployments. While from a pure timing perspective this makes them longer, it also makes them more powerful, and so it is worth measuring. To know the delta, however, I must have retained older versions of the automation to compare against, rather than throwing away the old every time I update with new steps.
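To make the comparison concrete, here is a minimal sketch in Python of how retained versions could be compared. The version labels and step counts are hypothetical, standing in for whatever your automation repository actually records:

```python
# A minimal sketch: retain each automation version's step count
# and compare across versions. The data here is hypothetical.
deployment_versions = {
    "v1.0": 35,   # original automation: basic build + deploy steps
    "v4.2": 160,  # current automation: added tests, monitoring hooks, etc.
}

baseline, current = deployment_versions["v1.0"], deployment_versions["v4.2"]
growth = current - baseline
print(f"Deployment scope grew by {growth} steps "
      f"({current / baseline:.1f}x the original).")
```

The point is not the arithmetic; it is that the delta only exists if the old versions were kept around to compare against.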
So let us assume the context of my automation is now fairly stable: I do the same things, more or less the same way, every time I deploy my application. At this point, measuring the functionality of what I do becomes more meaningful. As an example, my deployment automation may perform four main functional tasks: deploying the code (the app), deploying the dataset, deploying the config files and, finally, running a series of tests against all of the above. While I may never incorporate many new kinds of tasks into this automation, understanding how long each function takes to complete is relevant to proving continuous improvement. For instance, if my code takes roughly the same amount of time each deployment, as do my config files, but all of a sudden my dataset begins to grow with each deployment (from normal usage), my overall deployments may be getting slower. This is not due to hardware or network inefficiency; it is simply due to a larger dataset that keeps growing.
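As an illustrative sketch, assuming each deployment logs a duration per functional task (the phase names and numbers below are made up), a simple script could surface which phase is drifting:

```python
from collections import defaultdict

# Hypothetical per-phase durations (seconds) for three consecutive deployments.
deployments = [
    {"code": 120, "dataset": 300, "config": 15, "tests": 240},
    {"code": 118, "dataset": 340, "config": 16, "tests": 242},
    {"code": 121, "dataset": 395, "config": 15, "tests": 239},
]

# Group each phase's timings across deployments and report the drift.
series_by_phase = defaultdict(list)
for run in deployments:
    for phase, seconds in run.items():
        series_by_phase[phase].append(seconds)

for phase, series in series_by_phase.items():
    drift = series[-1] - series[0]
    print(f"{phase:>8}: first={series[0]}s last={series[-1]}s drift={drift:+d}s")
```

In this made-up data, the dataset phase is the one growing while the others hold steady, which points the investigation at data volume rather than infrastructure.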
If, on the other hand, it is my test set that is taking progressively longer to execute, that is an indicator of test script inefficiency (given the same code, config and dataset). If the number of tests being executed has not increased, but the length of time it takes to run them has, there is likely inefficiency in “how” the testing is being constructed or executed. Understanding the context of what is going on in a deployment, or in similar kinds of automation, must happen before predictive analytics can be introduced.
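A sketch of that check, using hypothetical test-run history, might look like this:

```python
# If the test count is flat but total test time grows, time-per-test is
# rising, which points at the scripts themselves. History is illustrative.
test_history = [
    {"tests_run": 150, "total_seconds": 240},
    {"tests_run": 150, "total_seconds": 255},
    {"tests_run": 150, "total_seconds": 290},
]

per_test = [h["total_seconds"] / h["tests_run"] for h in test_history]
count_is_flat = test_history[0]["tests_run"] == test_history[-1]["tests_run"]
if count_is_flat and per_test[-1] > per_test[0]:
    print(f"Time per test rose from {per_test[0]:.2f}s to {per_test[-1]:.2f}s "
          "with no new tests: likely script inefficiency.")
```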
Historical Trends Extended
Predictively evaluating automation is nothing more than taking data from a historical perspective and extending the trend lines into the immediate future. If, for example, my deployment automation has been getting four seconds longer each month for the last 12 months, I can reasonably infer the trend will continue and predict how long a deployment will take in the coming months. Understanding “why” this occurs requires a contextual investigation, but predicting its occurrence does not. This may seem inconsequential when we are talking about the deployment length of a single app, particularly if my portfolio of apps is small. But if the trend holds true across a portfolio of thousands of applications, the impacts become far more consequential. Or if the phenomenon is directly connected to a particular cloud provider and not seen elsewhere, again, the consequences become meaningful.
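A minimal sketch of this kind of extrapolation, using illustrative durations that grow four seconds per month, could be as simple as a linear fit:

```python
from statistics import linear_regression  # Python 3.10+

# Twelve months of hypothetical deployment durations: a 600-second baseline
# growing ~4 seconds per month.
months = list(range(1, 13))
durations = [600 + 4 * m for m in months]  # seconds

# Fit a trend line to the history and project it into the coming months.
slope, intercept = linear_regression(months, durations)
for future_month in (13, 14, 15):
    predicted = slope * future_month + intercept
    print(f"Month {future_month}: predicted deployment length {predicted:.0f}s")
```

Real history will be noisier than this, but the mechanics are the same: fit the trend, extend it forward, and leave the “why” to a contextual investigation.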
Where “ground-up” predictive analytics (on build and/or deployment automation) becomes much more useful is in the construction of release events. Often, release managers are asked to supply an estimated duration for a given event to factor into the construction of a new release. If a vendor were to integrate predictive analytical tooling into the release orchestration (RO) toolsets, “random” or “best guess” timing estimates could be replaced by extending historical data into the timing estimation fields. The RO tooling could provide a default timing estimate based on historical trends over a period defined by the user. A user could always override the default (assuming discrete knowledge of other impacting events), but at least the overall release-length estimates would become significantly more accurate.
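One possible shape for such a default is sketched below. The function name, lookback window and durations are hypothetical, not any vendor’s actual API:

```python
from statistics import median

def default_estimate(history_seconds, lookback=10, override=None):
    """Return the user's override if given, else the median of recent runs."""
    if override is not None:
        return override
    return median(history_seconds[-lookback:])

# Hypothetical durations (seconds) of past runs of one release event.
past_runs = [2400, 2460, 2380, 2520, 2490]
print(f"Suggested duration: {default_estimate(past_runs):.0f}s")
```

The design point is the precedence: history supplies the default, and the release manager’s explicit knowledge always wins over it.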
This has special significance for manual tasks. Let’s say our database admin team traditionally takes 40 minutes to perform a given release event update, but is now running significantly longer than that. RO software aware of the historical trend could alert the release manager, as well as the database team manager, during the event so they can investigate why this is occurring. For manual tasks in particular, it may be hard for anyone besides a predictive analytics module to determine what the timings “should” be, as compared with what they are in a given release.
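A sketch of such an in-flight alert, with illustrative history and a two-standard-deviation threshold as the assumed trigger:

```python
from statistics import mean, stdev

# Hypothetical durations (minutes) of past runs of the manual DB update,
# and the elapsed time of the run currently in progress.
history_minutes = [38, 41, 40, 39, 42, 40]
elapsed_minutes = 55

# Flag the task once it runs well beyond its historical norm.
threshold = mean(history_minutes) + 2 * stdev(history_minutes)
if elapsed_minutes > threshold:
    print(f"ALERT: task at {elapsed_minutes} min exceeds expected "
          f"~{threshold:.0f} min; notify release and DB team managers.")
```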
Impacts to the Bottom Line
The real questions are: Why should a customer care about this kind of feature set? Why should we be pushing the vendors of our tooling to incorporate forward-thinking architectures such as predictive analytics into the products they sell us? Because the efficiency these features bring to our DevOps services lowers the cost of administration and increases the speed at which we can deliver innovation in general. Over time, trending combined with context can statistically prove continuous improvement of our DevOps services, which is the promise we made to the executive teams that sponsored and invested in DevOps to begin with.
Metrics analysis can occur without a predictive view of the impacts (usually with a dedicated investment on our part). But this shifts our analysis to hindsight thinking instead of forward thinking. In essence, we are then simply “reacting” to what has occurred instead of tailoring our environment so that future events turn out better than the software predicts. We start tweaking the variables to immediately influence future predictions, instead of only examining them in a post-mortem.
This perspective is yet another catalyst for the cultural change your business needs. When I talk about changing how an organization “thinks” about innovation, this is one of the key objectives. We take an enterprise from knowing little about the costs of innovation, to having extensive knowledge, to having a forward view of what is, or what should be, possible. All of this is possible despite the loss of waterfall controls and in the midst of an agile construction model. All it takes is the integration of forward-thinking architectures into the products we select. If our vendor does not offer them, we begin to enlighten our vendors on the value these features would bring to us in cost reduction and to them in market capture. If you need help in these areas, feel free to contact me; I am looking at the moment 🙂.
To continue the conversation, feel free to contact me.