How did we in enterprise IT get to this point? There's a lot of talking and writing about the value of adopting DevOps but very little analysis of the problem domain we're trying to correct. Some schools of thought say we can move forward without looking back, but I believe a key element of the dysfunction facing many enterprise IT organizations can be attributed directly to their history and how they were formed.
One of the clear lessons we can learn from the businesses we call "unicorns", so named for their seemingly mythical ability as IT organizations to deliver high-quality software very quickly, is that they have very little friction in the process from ideation to operation. Startups and greenfield initiatives encounter practically no friction compared to operations built around entrenched, decade-old, mission-critical legacy applications. In retrospect, this should not be surprising, given that frictionless delivery has been a mandate for many of these efforts from the start.
You create what you focus on!
Some will say, "We don't require the capability to deliver to production ten times a day (or even ten times a year) for our SAP Enterprise Resource Planning (ERP) system," and conclude that perhaps DevOps isn't right for their needs. This shows the skewed thinking that the current literature on DevOps is fostering. The issue is not the need to deploy changes into production quickly, but the ability to deliver business capabilities when they are required, as quickly and with as little impact as possible. Current processes around ERP changes are usually so fraught with friction that when the business needs a change in order to keep competing effectively, it takes weeks or months instead of days or hours.
I researched how the division between development and operations was formed, and interestingly, there's very little history about this division of work. We can trace it back to the early days of time-sharing systems, in which the division of work actually represented the exact advice DevOps prescribes today: deliver and operate a platform on which developers can build and deliver their applications. In the days of time-sharing, operations was not responsible for what developers put into the box, just for ensuring that the box was available, secure, and backed up. If developers' applications had bugs or crashed, it was the developers who were responsible for the repair.
Then came the era of client/server. Application pieces ended up distributed across the application infrastructure like a kid's Legos dumped all over the floor, and managing these applications became about as painful as stepping barefoot on one. Developers no longer worked on the complete application; they worked on modules, and those modules were split across various services, each requiring its own expertise to keep running optimally. The DBA was responsible for ensuring the database could meet transactional requirements and support data design and management, while the network professionals dealt with protocols and the traffic issues caused by these modules communicating with one another. Each layer of the application had its own capacity management and operational concerns. The only chance we had to keep the ship from sinking was to put it all in the hands of the one team that had access to all the components: IT operations.
Things have not progressed much since that divergence from time-sharing to client/server, either. Today's applications are more complex and more distributed than ever before. We have clusters of identical servers working in unison to handle scale, dramatically increasing the complexity and difficulty for operations. A single application may now require anywhere from 5 to 25 servers to deliver the required capacity.
On top of all this, we have never modified the division of work from the days of time-sharing and continue to push the burden of monitoring and uptime onto the operations staff. We have turned IT operations into the Jigsawyers (people who do jigsaw puzzles) and developers into the puzzle makers. Moreover, the puzzle is never the same; the puzzle makers change the picture all the time as they add new pieces, forcing the Jigsawyers to figure out how to retrofit each new piece into the whole.
While researching the history of IT operations and trying to trace the path from computing's earliest organization until now, I came across a great piece that really helps identify the problem we have created. "So What Is A Deployment Really?" by Robert van Loghem illustrates what it takes to move from ideation to operation with today's application platforms. What I really like about this piece is its illustration that even if we wanted to go back to the earliest division of work, it may not be possible. IT operations cannot simply provide a black box for developers to deliver their applications on and expect them to work as planned. Firewalls need updating, web servers need configuration changes to support the new domain requests, and the security architecture must now allow these new requests to be routed appropriately based on rights and source.
Deployment is a team activity!
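To make that point concrete, here is a minimal sketch in Python, using entirely hypothetical task and team names (not anything from van Loghem's piece), of the cross-team work a single "routine" enterprise deployment touches when responsibilities are divided along specialist lines. Every change of hands is a ticket, a queue, and a wait.

```python
# A minimal sketch with hypothetical task and team names, illustrating how many
# teams a single "routine" enterprise deployment touches when work is divided
# along specialist lines.

DEPLOYMENT_TASKS = [
    # (task, owning team)
    ("provision application servers",               "server team"),
    ("open firewall ports for the new service",     "network team"),
    ("update web server config for the new domain", "web/middleware team"),
    ("route new requests based on rights/source",   "security team"),
    ("apply schema changes and size the database",  "DBA team"),
    ("deploy the application modules",              "development team"),
    ("configure monitoring, backups, and alerts",   "IT operations"),
]

def count_handoffs(tasks):
    """Count team-to-team handoffs when the tasks are executed serially.

    Each handoff is a ticket, a queue, and a wait; this is the friction
    the unicorns have largely engineered out of their delivery process.
    """
    teams = [team for _, team in tasks]
    return sum(1 for current, nxt in zip(teams, teams[1:]) if current != nxt)

if __name__ == "__main__":
    distinct_teams = {team for _, team in DEPLOYMENT_TASKS}
    print(f"{len(DEPLOYMENT_TASKS)} tasks, {len(distinct_teams)} teams, "
          f"{count_handoffs(DEPLOYMENT_TASKS)} handoffs")
```

Even in this toy model, every task lands in a different team's queue, which is exactly the shape of the organization described next.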
So, what have most of us in enterprise IT done? We broke the tasks of deployment down into further divisions of work so that we could apply the necessary level of specialization to get the application delivered to the end user, at the cost of adding a ton of friction into the process and slowing things down by orders of magnitude compared to our unicorn friends.
What is surprising is the number of pundits who put forth empathy and culture change as the answer to this problem domain. Clearly, we have organizationally written ourselves into a corner, because this is how large enterprises solve problems. Here's a simplification of how the decision was most likely made:
Question: How do we monitor and operate these distributed applications when the developers don't have the means or the visibility into all the components?
Retort: Then who does have access and insight into all these components?
Answer: IT operations can see what's running and has the tools to monitor its health, so let's give them the task.
At a time when IT was viewed as an expense rather than a strategic advantage, there was no offsite planning session to assess what this answer would mean to the business ten or twenty years hence. It was a matter of solving the problem and moving on, because that's what enterprises need to do in non-strategic areas. At the time this answer was formulated, most businesses equated the decision with "which office supply vendor should we use?" Moreover, the leaders in client/server barely had any experience with IT operations; they were primarily application developers. Hence, there was no experience to guide businesses away from this decision. The vendors that were a prime source of guidance at the time, such as IBM, Unisys, and Honeywell, were still leading with a time-sharing mentality.
This entire trip down memory lane had one purpose in mind: to figure out what we created and how we created it, so we can consider the means to unwind it and provide a path for enterprise IT to move forward in an agile manner. The history provides the answer. We need to move away from the traditional time-sharing division of work that was formulated before the age of the Internet and realign along frictionless surfaces. What does that mean? Well, that, my friend, is an entire book that happens to be in the works! 😉