Companies often believe they can buy their way out of technical debt without making any substantial changes, as if DevOps came in a box. Unfortunately, DevOps isn’t for sale. To embrace DevOps, an organization needs to change its culture, adopt tools that enable automation, and change its processes to include both dev and ops. I like to think DevOps can be explained simply as operations working together with engineers to get things done faster in an automated and repeatable way.
Culture
Organizations large and small have experienced these changes, the challenges encountered along the way, and the light at the end of the tunnel. Each organization requires its own set of tools and processes to embrace DevOps. Cultural change largely depends on the existing dynamic of the organization. In small companies, I have found it needs to come from the ground up and usually starts with adding the DevOps toolchain. In large organizations, the value of DevOps needs to be sold to the executive team and then enforced from the top down. This is a daunting challenge for most large companies because it’s hard to quantify the value of DevOps.
Most organizations only quantify failure when it comes to IT outages. How many outages? How long did they last, and how severe were they? They often don’t quantify the success of 100% uptime or the resilience of failing over with zero downtime. Being able to instrument these outcomes and prove the value of DevOps to the organization is a crucial step in making it happen. DevOps teams deploy features faster and with far fewer bugs. When outages do happen, they respond faster, which minimizes the business impact.
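To make that concrete, here is a minimal sketch of turning a list of outages into numbers you can put in front of an executive team: availability for a period and the average time to recover. The Outage class, the function names, and the sample dates are all hypothetical, and the sketch assumes outages do not overlap and fall entirely inside the reporting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One outage window (hypothetical record, not tied to any monitoring tool)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def availability_percent(outages, period_start, period_end):
    """Percentage of the period during which the service was up.

    Assumes outages do not overlap and lie inside the period.
    """
    period = period_end - period_start
    downtime = sum((o.duration for o in outages), timedelta())
    return 100.0 * (1 - downtime / period)


def mean_time_to_recover(outages):
    """Average outage duration, or None when there were no outages."""
    if not outages:
        return None
    return sum((o.duration for o in outages), timedelta()) / len(outages)


# Sample data: two outages (30 and 90 minutes) in a 30-day month.
outages = [
    Outage(datetime(2024, 4, 3, 10, 0), datetime(2024, 4, 3, 10, 30)),
    Outage(datetime(2024, 4, 17, 22, 0), datetime(2024, 4, 17, 23, 30)),
]
month_start = datetime(2024, 4, 1)
month_end = datetime(2024, 5, 1)

print(f"availability: {availability_percent(outages, month_start, month_end):.2f}%")
print(f"mean time to recover: {mean_time_to_recover(outages)}")
```

Tracking these two numbers before and after a DevOps rollout gives you a success metric, not just a count of failures.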
From developer to operations – one and the same?
As a developer, I have always dabbled lightly in operations. I always wanted to focus on making my code great and let an operations team worry about setting up the production infrastructure. It used to be easy! I could just FTP my files to production, and voila! My app was live and it was time for a beer. Real applications are much more complex. As my skill set evolved, I took on more operations work and expanded my operations knowledge.
Tools
During the development phase, the operations staff normally makes sure the development environment is managed, and they actively work to set up the test, QA, and production environments. This can take a lot of time if automation tools aren’t used. Here are some tools you can use to automate server build and configuration. In another post I will walk through the pros and cons of each tool, but as long as you apply some form of configuration management and treat your infrastructure as code, you will be on the right track.
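Whatever tool you choose, the core idea of infrastructure as code is the same: you declare the state you want, and the tool works out what to change. Here is a minimal sketch of that idea in plain Python. It does not use any real configuration management tool's API, and the package names, versions, and function names are made up for illustration.

```python
def plan_changes(desired: dict, actual: dict) -> list:
    """Return the actions needed to move a server from `actual` to `desired`.

    Both arguments map a package name to a version string.
    """
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("install", name, version))
        elif actual[name] != version:
            actions.append(("upgrade", name, version))
    # Anything present on the server but not declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("remove", name, actual[name]))
    return actions


def apply_changes(actual: dict, actions: list) -> dict:
    """Simulate applying the plan and return the resulting server state."""
    result = dict(actual)
    for verb, name, version in actions:
        if verb == "remove":
            result.pop(name)
        else:
            result[name] = version
    return result


# Hypothetical declared state, kept in version control next to the app.
desired = {"web-server": "2.4", "app-runtime": "3.11", "log-agent": "1.0"}
# Hypothetical state found on a hand-built server.
actual = {"web-server": "2.2", "ftp-server": "1.3"}

plan = plan_changes(desired, actual)
for action in plan:
    print(action)

# Running the plan a second time finds nothing left to do (idempotent).
converged = apply_changes(actual, plan)
print(plan_changes(desired, converged))
```

The second run is the important part: applying the same declaration twice is safe, which is what makes automated server builds repeatable.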
Process
From an operational perspective, my first instinct is to understand the application architecture so I can start thinking about the proper deployment model for the infrastructure components. Here are some of my operational questions and considerations for this stage:
Are we using a public or private cloud?
What is the lead time for spinning up each component and ensuring they comply with my company’s regulations?
When do I need to provide a development environment to my dev team, or will they handle it themselves?
Does this application perform functions other applications or services can already handle?
Operations should have high-level visibility into the application and service portfolio. From a development perspective, my first milestone is to make sure the ops team fully understands the application and what it takes to deploy it to a pre-production environment. This is where we, the developers, sync with the product and ops teams and make sure we are aligned.
Planning for the ops team:
What tools will we use for deployment and configuration management?
How will we automate the deployment process and does the ops team understand the manual steps?
How will we integrate our builds with our continuous integration server?
How will we automate the provisioning of new environments?
Capacity Planning – Do we know the expected production load?
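The capacity planning question lends itself to a quick back-of-the-envelope calculation. Here is a minimal sketch, assuming you have an estimate of peak requests per second and a measured throughput per instance. The function name, the 30% headroom default, the two-instance minimum, and the sample numbers are my own assumptions, not a standard.

```python
import math


def instances_needed(peak_rps, rps_per_instance, headroom=0.3, min_instances=2):
    """Estimate how many instances are needed to serve the expected peak load.

    headroom is the fraction of each instance's capacity kept in reserve.
    min_instances keeps some redundancy even when the load is tiny.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_rps = rps_per_instance * (1 - headroom)
    return max(min_instances, math.ceil(peak_rps / usable_rps))


# Hypothetical numbers: 1,200 requests/second at peak, 150 per instance.
print(instances_needed(peak_rps=1200, rps_per_instance=150))
```

Even a rough number like this gives dev and ops a shared starting point, which you can then correct with real load test results.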
Once developers have built unit and functional tests, we need to ensure the tests run after every commit and that regressions never reach our promoted environments. In theory, developers should run the tests before they commit any code, but oftentimes problems don’t show up until you have production traffic running on production infrastructure. The goal of this step is really to simulate everything that can go wrong, find out what happens, and learn how to remediate it.
Combining the power of developers and operations
Focusing on developer/operations collaboration enables a new approach to managing the complexity of real-world operations. I believe operations complexity breaks down into a few main categories: infrastructure automation, configuration management, deployment automation, log management, performance management, and monitoring.
In my next post I will walk through each of these categories and address how you can get your organization in line.