What is DevOps, and how do we do it?
It’s a question I’m often asked, usually amid confusion over which ingredients make a successful devops recipe: job titles, team structures, tooling, personnel responsibilities, office layout, and more. The first step in evolving an existing organization toward a devops mentality is to step back and raise your thinking above those tactics.
The first thing to recognize is that devops is a cultural and professional movement of technology workers who want to have more impact on their businesses and to be happier in their own work. This movement rests on three strategic pillars: culture, structure, and tooling. When we get all three right, we enable a devops mentality, and the benefit is a business that can respond quickly to customer or market feedback (i.e., agility).
These three pillars are the foundation for the collaboration among technologists that makes this business agility possible.
Evolving to DevOps
You have a status quo of culture, structure, and tools. Whether you defined each one explicitly or let some evolve more implicitly than others, the fact remains that you have a status quo. How do you evolve? Tackle each pillar in turn, understanding your status quo and what in it helps or hinders a devops mentality for your organization. The first pillar, culture, is the most critical, so let’s start there.
A Culture to Enable DevOps
What happens at your company when things go wrong? Be really honest. Do folks become defensive about their role in contributing to a problem? Do they blame others? Do fingers point outward before they point inward? Do you ever hear “It’s not my servers, it’s the change you made to the code!” or the converse, “It’s not the code, something’s wrong with the servers!”?
If any of the above holds true, stop everything right now, get everyone together, and talk about what you see in how you interact with each other. Your goal is to enable a shared-outcome mentality across functions, jobs, and roles (i.e., we all succeed or fail together), and a culture of self-improvement and introspection where the first thought when things go wrong is “What could I have done to prevent this, and how do I want to change what I do in the future to make it better?” When you get it right, people will gravitate TOWARD problems and collaborate on solutions, rather than distancing themselves FROM problems to avoid blame.
So what do you say? How do you say it? What’s necessary to evolve the culture? Every situation is unique, and you’re going to need to chart your own personal and organizational evolution to foster a culture that can enable devops. If you’re the leader of the team or business, start surrounding yourself with mentors and peers at other companies who have achieved this, and ask to learn from them.
If you’re not the leader, again, surround yourself with mentors and peers who have built this culture, and ask to learn from them. Then be the change you seek in your organization, and lead a cultural shift from within. Either way, you’ll begin to discover your own path of learning, supported by reading and mentoring conversations, that will give you the skills and conviction to bring about cultural change.
If you’re the current leader of the team or business, and you’re not willing to undergo this difficult process of personal and organizational evolution of your culture, I have a rude awakening for you: you won’t be the leader for the long term. Over a period of years, either your business will succumb to competitors who are able to get this right, or a member of your team will take over for you as the rightful cultural steward. Choose wisely.
A Structure to Enable DevOps
Within your company, look at your organization chart and at the people you sit near or collaborate with most, and answer the question: what are we responsible for? Now look at other parts of the organization chart or office layout for the departments or functions you depend on for collective success. What you’re looking for is a physical or proverbial wall that draws a boundary between what one group of people is responsible for and what the next group is responsible for; these walls create a “that’s someone else’s problem” mentality. In the classic devops sense, we’re talking about the silos that separate developers from operators, both organizational (i.e., you report to different parts of the organization, or you are accountable to metrics that put you at odds with another part of the company) and physical (i.e., you sit in different areas of the office). If you’ve ever heard the phrase “developers toss the code over the wall for operators to run,” you know what I’m talking about.
Some of these physical and proverbial walls will make sense in the organization, while for others you’re going to need to question why they are there. You may also find walls around other parts of your organization (say, product management or security) that don’t make sense; for brevity, we’ll stick to the classic developers-and-operators example. Physical and proverbial walls that hinder collaboration toward shared outcomes must be evaluated for removal.
The classic proverbial example is a development function held accountable for delivery metrics, such as the number of features released, while an operations function is held accountable for reliability metrics. These metrics can easily pit two groups of people against one another, since every release counts as a win for one group and a risk for the other, so tension and angst follow. One common solution is to focus on shared outcomes and hold everyone accountable for both feature velocity and reliability. Others are to have developers and operators report to the same person, or to put developers “on call.” The tactics vary with the situation, but the strategy remains the same: break down the physical and proverbial barriers to collaboration and to embracing shared outcomes. If there is a wall that prevents people from working together to improve the business, tear it down.
Tooling to Enable DevOps
How effective are you on three hours of sleep after being paged all night? How strategic is your thinking while typing the same command for the Nth time? How demotivating is it to put all of your energy into treading water without visible signs of forward progress? Even the most collaborative culture, with a structure reinforcing shared outcomes, will fail if everyone is burning out debugging problems and performing the same mundane tasks over and over. This is where you need to evaluate your tooling.
Start with the repetition. Make an organization-wide commitment: every time a task comes up a second time, automate it and track the automation in revision control. Use this as a foundation for eventually moving to fully automated configuration management driven from revision control. You have to start somewhere, and this simple shift in mentality will save significant time and frustration over the long haul. We’re going to invest that newly found time in the follow-up tooling activities.
Once that is under way, turn your attention to telemetry. Are you collecting the right data from the right places to understand where problems are? Data debunks blame and aligns people toward investigating problems collaboratively. Think of “monitoring” as defining thresholds on the right telemetry data, and “notifications” as defining who needs to know, and when. Be diligent about getting both right. Set thresholds too sensitive or notifications too broad, and folks will grow numb to the alerts and lose trust in the system; set thresholds too lax or notifications too narrow, and important issues affecting customers can be missed. Over time, improved telemetry will let you predict problems before they happen, and well-tuned thresholds and notifications will ensure the right people learn about problems right away and know how to fix them, freeing up more time for the final phase of tooling.
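As a rough illustration of the split between monitoring and notifications, here is a minimal Python sketch, assuming a simple in-memory snapshot of metric readings. The `Threshold` class, the `ROUTES` table, the `evaluate` function, and the metric names and limits are all hypothetical and not taken from any particular monitoring product; real systems add time windows, deduplication, and escalation on top of this idea.

```python
from dataclasses import dataclass


# Hypothetical "monitoring" rule: a limit on one telemetry metric.
@dataclass(frozen=True)
class Threshold:
    metric: str     # name of the telemetry series (illustrative names below)
    limit: float    # the rule fires when the observed value exceeds this
    severity: str   # "page" = interrupt someone now, "ticket" = handle in working hours


# Hypothetical "notifications" table: who needs to know, keyed by urgency.
ROUTES = {
    "page": ["on-call engineer"],
    "ticket": ["owning team queue"],
}


def evaluate(readings: dict[str, float],
             thresholds: list[Threshold]) -> list[tuple[str, str, list[str]]]:
    """Return (metric, severity, recipients) for every breached threshold."""
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        # Metrics with no reading are skipped rather than treated as breaches.
        if value is not None and value > t.limit:
            alerts.append((t.metric, t.severity, ROUTES[t.severity]))
    return alerts


if __name__ == "__main__":
    thresholds = [
        Threshold("error_rate", 0.05, "page"),
        Threshold("disk_used_fraction", 0.80, "ticket"),
    ]
    readings = {"error_rate": 0.02, "disk_used_fraction": 0.91}
    print(evaluate(readings, thresholds))
    # prints [('disk_used_fraction', 'ticket', ['owning team queue'])]
```

With these example values, only the disk check fires, and it goes to a team queue instead of waking anyone up. Lowering the `error_rate` limit to 0.01 would page the on-call engineer for a 2% error rate as well, which is exactly the kind of tuning decision that determines whether people trust the alerts or learn to ignore them.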
With common tasks automated, and a clear understanding of how well your systems are performing and where problems lie, you can focus your energy on improvements that build resilience for the future, both in your technical systems and in your teams. For your technical systems, you can explore refactoring toward architectures that are more scalable and fault tolerant (e.g., horizontal scaling, multi-datacenter load balancing). For your teams, you can invest in better collaboration systems (e.g., wiki, chat, ticketing).
You’re Never Done
While I’ve provided a framework to systematically tackle common barriers to the business agility that comes from a devops mentality, truth be told, there will always be more to do, and always more personal, organizational, and technical improvement to be had. Embrace this journey, be proud of each new milestone along the way, celebrate every victory, and deliver wonderful things for your customers.