In Part 1 of this series, we discussed the fact that more organizations are looking to embrace the “DevOps culture.” However, unlike ITIL, implementing DevOps in an organization does not happen overnight. DevOps is more a philosophy than a procedure, and choosing the right leadership team is as important as choosing the right tools.
In Part 2, we look at other elements that are important for organizations to successfully embrace the DevOps culture.
Without Proper Collaboration, DevOps Initiatives Will Fail
A good test of whether your organization has truly embraced the DevOps culture is how well your IT department responds when a major incident occurs in the production environment: an e-commerce company’s website slows dramatically, a hospital’s electronic medical record (EMR) system becomes unresponsive or an airline reservation system malfunctions. In these chaotic situations, every minute counts, and all Dev and Ops hands need to be on deck to keep your organization afloat.
When incidents occur, it’s critical that your organization’s Dev and Ops teams work together to resolve the problem and restore failed service as quickly and efficiently as possible.
Incident Response: The Critical Incident Team
An international company with global operations may have network operations centers (NOCs) in the United States and the U.K. that support more than 50 mission-critical business applications, development centers in China and the United States, and a service desk in the United States. When an incident occurs, the company’s service desk is flooded with alerts from various IT monitoring tools, customer phone calls, tickets opened by online users and inquiries from stakeholders. When that happens, the service desk’s only concern is fixing the issue, not tracking down a department head on the other side of the world who may be responsible for the problem.
DevOps should enable an organization to assemble its critical incident team in only a few minutes. The Ops team should know exactly what team(s) to contact for a given incident, and who from those teams is assigned to application support. They should know who is on call based on the time zone and day of the week, and have the correct phone numbers easily accessible.
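To make “who is on call based on the time zone and day of the week” concrete, here is a minimal sketch of an on-call lookup. Everything in it is hypothetical: the `Shift` record, the `on_call` function, the follow-the-sun `ROSTER` for a team called `payments-app`, and the fictional phone numbers. It also assumes fixed UTC offsets for each site, which ignores daylight saving time; a real roster would use named time zones (for example via Python’s `zoneinfo`) or, more likely, the scheduling features of an on-call product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

WEEKDAYS = frozenset(range(5))   # Monday=0 ... Friday=4
WEEKEND = frozenset({5, 6})      # Saturday, Sunday


@dataclass(frozen=True)
class Shift:
    team: str
    engineer: str
    phone: str
    utc_offset_hours: int  # fixed offset of the covering site (simplification: ignores DST)
    days: frozenset        # local weekdays this shift covers
    start_hour: int        # local start hour, inclusive
    end_hour: int          # local end hour, exclusive (24 = midnight)


# Hypothetical follow-the-sun roster: London covers 05:00-18:00 UTC on weekdays,
# the U.S. site covers 18:00-05:00 UTC, plus a U.S. weekend shift.
ROSTER = [
    Shift("payments-app", "lon-primary", "+44 20 7946 0001", 0, WEEKDAYS, 5, 18),
    Shift("payments-app", "us-primary", "+1 212 555 0100", -5, WEEKDAYS, 13, 24),
    Shift("payments-app", "us-weekend", "+1 212 555 0101", -5, WEEKEND, 0, 24),
]


def on_call(roster, team, now_utc):
    """Return the shifts of `team` whose local day and hour cover `now_utc`."""
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    matches = []
    for shift in roster:
        if shift.team != team:
            continue
        # Convert the incident time into the covering site's local time.
        local = now_utc.astimezone(timezone(timedelta(hours=shift.utc_offset_hours)))
        if local.weekday() in shift.days and shift.start_hour <= local.hour < shift.end_hour:
            matches.append(shift)
    return matches


incident_time = datetime(2024, 3, 5, 10, 0, tzinfo=timezone.utc)  # a Tuesday
for s in on_call(ROSTER, "payments-app", incident_time):
    print(s.engineer, s.phone)  # lon-primary +44 20 7946 0001
```

Storing each shift in the covering site’s local time mirrors how teams actually think about their schedules; the conversion from the incident’s UTC timestamp happens only at lookup time.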
Next, the team should review the technical details of the issue provided by the incident manager. A DevOps toolset shared by all teams enables everyone to speak the same language, quickly identify the root cause, put together a remediation plan for approval by the emergency change advisory board (ECAB), deploy the fix, test it and confirm that service is back up and running.
IT Communication: The Glue That Holds Dev and Ops Together
To be successful, organizations need to respond as quickly as possible during a major IT incident to limit the negative impact on the business. This is achieved when Dev and Ops work together to create a central communication and collaboration hub that keeps key stakeholders properly informed.
An organization’s ability to mitigate the impact of an IT issue relies heavily on its ability to access and communicate critical information, and on ensuring that the right people can analyze it and initiate the appropriate actions to keep the business running smoothly. A communication hub equipped with an IT communication solution facilitates:
- Reaching out to the right on-call people among all the different teams: infrastructure, server, system administration, middleware, network, DBA, QA, support team, service desk and the application developers.
- Notifying people through multiple channels (voice, SMS, email, push notification app, paging, etc.) until they respond, because emails alone don’t wake people up, and automatically escalating to the next resolver on the on-call list when they don’t.
- Providing the right information so IT resolvers can start investigating the issue, identify the root cause and put a resolution plan together without wasting time.
- Getting people to collaborate using the same telecommunication and collaboration tools, whatever their time zone and wherever they are located.
- Contacting a third-party vendor when the problem is caused not by the company but by an external piece of software.
- Informing other departments if the business impact grows large enough to affect the company’s profitability or reputation, for instance a cyberattack leading to a data breach. The CEO, legal and marketing may need to be informed so they can anticipate the consequences.
- Informing end users or customers to limit the number of incoming calls to the help desk.
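The “keep trying until someone responds, then escalate” behavior from the list above can be sketched as a simple loop. This is a minimal illustration, not how any particular IT communication product works: `page_until_ack`, `simulated_notify` and the resolver names are hypothetical, the channel order is simply the order listed in the text, and real systems would wait between attempts and deliver messages asynchronously, both of which this sketch leaves out.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Channels tried for each resolver, in order (taken from the list in the text).
CHANNELS = ("voice", "sms", "email", "push", "paging")


def page_until_ack(
    on_call_list: Sequence[str],
    notify: Callable[[str, str], bool],
    rounds_per_person: int = 2,
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Notify resolvers over every channel until one acknowledges.

    Each resolver gets `rounds_per_person` passes over CHANNELS; if none of
    those attempts is acknowledged, the alert escalates to the next resolver
    on the on-call list. Returns the responder (or None) and the attempt log.
    """
    attempts: List[Tuple[str, str]] = []
    for person in on_call_list:
        for _ in range(rounds_per_person):
            for channel in CHANNELS:
                attempts.append((person, channel))
                if notify(person, channel):  # True means "acknowledged"
                    return person, attempts
    return None, attempts  # nobody acknowledged; the caller must escalate another way


# Simulated delivery: alice never answers, bob acknowledges the SMS.
def simulated_notify(person: str, channel: str) -> bool:
    return person == "bob" and channel == "sms"


responder, log = page_until_ack(["alice", "bob", "carol"], simulated_notify)
print(responder, len(log))  # bob 12
```

Keeping the delivery mechanism behind the `notify` callback separates the escalation policy (how many passes, which order, who is next) from the channels themselves, so the policy can be tested without sending a single real message.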
The benefits of quick collaboration between teams, rather than simple handoffs, hardly need to be demonstrated. Removing broken pathways, obsolete delivery methods and redundant platforms will significantly improve the organization’s ability to communicate during critical moments. Becoming more efficient and eliminating wasted time directly reduces the mean time to know (MTTK) and, in turn, the mean time to resolve (MTTR), minimizing the damage to the business and letting the company’s executives sleep at night.
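The link between MTTK and MTTR can be shown with a small calculation. This sketch assumes one common convention, measuring both metrics from the moment an incident is detected (definitions of MTTR in particular vary between organizations), and uses two made-up incidents; the `Incident` record and the `mttk`/`mttr` helpers are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    detected: datetime     # alert received
    cause_known: datetime  # root cause identified (end of "time to know")
    resolved: datetime     # service restored


def _mean_minutes(deltas):
    return mean(d.total_seconds() / 60 for d in deltas)


def mttk(incidents):
    """Mean time to know: detection -> root cause identified."""
    return _mean_minutes(i.cause_known - i.detected for i in incidents)


def mttr(incidents):
    """Mean time to resolve: detection -> service restored."""
    return _mean_minutes(i.resolved - i.detected for i in incidents)


# Two hypothetical incidents.
t0 = datetime(2024, 1, 1, 9, 0)
before = [
    Incident(t0, t0 + timedelta(minutes=45), t0 + timedelta(minutes=75)),
    Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=35)),
]
print(mttk(before), mttr(before))  # 30.0 55.0

# Same fixes, but the right people are reached and informed 10 minutes sooner.
earlier = timedelta(minutes=10)
after = [Incident(i.detected, i.cause_known - earlier, i.resolved - earlier) for i in before]
print(mttk(after), mttr(after))  # 20.0 45.0
```

Because the fix itself takes just as long in both scenarios, every minute saved before the root cause is known comes straight off the time to resolve.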
DevOps Culture Can Happen
Embracing a DevOps culture is not a Big Bang change that happens overnight. It is a journey whose difficulty depends on the existing culture and on the history and maturity of the IT organization. Even more importantly, it depends on how committed upper management is to introducing a new culture. Leaders need to embark on this journey keeping in mind that it will require not only common tools and common processes, but also convincing people of the value of the transformation. To achieve this goal, management should treat IT communication as a key element, the glue that binds Dev and Ops together. And because there will always be room for continuous improvement, the road to DevOps has no end.
About the Author/Vincent Geffray
Vincent Geffray is Senior Director of Product Marketing at Everbridge, with a focus on IT Service Alerting & Communications Automation and IoT.
He has more than 14 years of experience in the information technology business, designing, promoting and selling Enterprise IT Operations Management solutions, including Critical Communications, Application Performance Management, and IT Process and Workload Automation. He also has international experience, having started his career in Europe. Vincent holds a Master of Science (Mechanical Engineering and Computer Science) and executive certificates from the MIT Sloan School of Management.
Vincent’s LinkedIn: www.linkedin.com/in/vgeffray
Vincent’s Twitter: @vgeffray