We all want to deliver excellent software to our customers. We now have a greater awareness than ever before of the interconnectedness of all the activities required to do so. The agile movement started this but it was not until the advent of DevOps that Ops was considered part of the full picture. In this article I will explain an effective first step that agile organizations can take to complete the picture and bring ops more completely into the product development process.
Delivering excellent software starts with product definition and flows from those initial stories and designs into dev, verification and testing, then into production, and then back again in the form of customer feedback and bugs. In reality, the feedback paths form a transitive closure over all the activities just mentioned, as you can see in the diagram below. Agile recognized the importance of feedback loops in good product development. Agile also realized that for the right information to flow, walls had to be broken down. The motto “Done means tested” arose, indicating that product owners were not done with a story, and could not consider it closed, until it had been tested and they had verified it. This represented the integration of the PM, development and QA functions. It was a huge step, but one step too short. Operations was still not viewed as a key part of building excellent software, and there was little recognition that the feedback loop did not stop at testing. DevOps has recognized this shortfall and taken it to its logical conclusion: “Done means in production.”
Product development is a cycle, and operations is a critical part of it. To describe how all parts of the cycle can be integrated, let's start at the most natural beginning: product definition. Product definition activities produce stories, designs, functional acceptance criteria (A.C.) and prioritization. In this phase we don’t produce everything we need for the finished product, just enough to get us going. With popular agile methodologies like Scrum, this is ideally not done solely by a product owner but in conjunction with dev and test at story grooming meetings, design reviews and sprint planning. In these meetings dev, QA and product communicate and negotiate stories, comment on designs, and flesh out acceptance criteria. In this way the stories, designs and acceptance criteria are improved, and everyone gains an understanding of what is being built and becomes invested in it. A story may look something like this (abbreviated):
Story: Blog administrator can change user roles SO THAT users can be managed effectively
A.C: Given blog administrator is logged in
And on a user profile
When he selects a permission scheme from a permissions drop-down menu
Then the user's permissions are updated and stored in the DB
This is a great start but it is not enough. If we are to be excellent we must modernize this just a bit and add in some DevOps spice.
A first step towards bringing ops effectively into the cycle is to include someone responsible for actually operating the software in the story and acceptance criteria grooming sessions, design reviews, and sprint planning meetings. The critical aspect is that when the team negotiates the acceptance criteria associated with a story, it will be ops' responsibility to ensure that, for each story or feature, the team adds success and failure metrics, initial alarm values, and any operational functionality, such as a kill switch, that will be needed to operate the functionality effectively. So a story will now contain things like the following in the A.C. (acceptance criteria):
Given operation X fails
Then the metric operation-x:failed will be sent through the metrics feed
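To make that criterion concrete, here is a minimal Python sketch of how it might look in code and be checked in a test. Only the metric name operation-x:failed comes from the A.C. above. Everything else is an assumption made for illustration: the in-memory feed class, the run_operation_x wrapper, the kill switch flag, and the operation-x:succeeded and operation-x:disabled metric names. A real team would send to whatever metrics system ops actually runs.

```python
from typing import Callable, List


class InMemoryMetricsFeed:
    """Hypothetical stand-in for a real metrics feed; it only records names."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, name: str) -> None:
        self.sent.append(name)


def run_operation_x(operation: Callable[[], None],
                    metrics: InMemoryMetricsFeed,
                    kill_switch_on: bool = False) -> bool:
    """Run operation X and report the outcome through the metrics feed."""
    if kill_switch_on:
        # Operational lever: ops has switched the feature off, so skip the
        # work. The metric name here is an assumption, not from the story.
        metrics.send("operation-x:disabled")
        return False
    try:
        operation()
    except Exception:
        # A.C.: Given operation X fails, then the metric
        # operation-x:failed is sent through the metrics feed.
        metrics.send("operation-x:failed")
        return False
    # Success metric; the name is an assumption made for this sketch.
    metrics.send("operation-x:succeeded")
    return True


def failing_operation() -> None:
    """Simulates operation X failing."""
    raise RuntimeError("simulated failure")


feed = InMemoryMetricsFeed()
run_operation_x(failing_operation, feed)
print(feed.sent)  # ['operation-x:failed']
```

Passing the feed in as a parameter is a deliberate choice in this sketch: it lets QA assert on the ops-driven acceptance criteria in an ordinary automated test, without needing a live metrics system.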
We will also alter our definition of done so that a story will not be closed until we meet the ops-driven acceptance criteria: for example, until we can see the metrics we defined in our A.C. show up on an ops dashboard. This has many benefits. For one, folks unlucky enough to be on call will have participated in the story grooming and will know exactly what these metrics mean when they first encounter an issue with them in production. Furthermore, they will know what operational levers they can pull or request to be pulled.
This first step represents a connection from product definition to dev, to QA, to operations. This connection is perfectly natural and intuitive when you think about it. Being a true DevOps organization and developing operationally excellent products requires more, but connecting product definition to operations in this way provides a nice toehold from which to flesh things out further. Best of luck. If you have any questions, leave them as comments and I will respond as best I can.