Awareness and anticipation of operations management best practices can improve your app at every stage of its life cycle
How important or interesting you find the “operations management” aspects of DevOps methodologies likely depends on your background, experience and how many operational horror stories you have lived through. If you have been on the hot seat when services are impacted, woken up in the middle of the night when major outages occur or dragged into blame-battle war room calls while you and your team scramble to restore service, then the importance you place on operations management may be very high.
However, if you have avoided such experiences, it would surprise no one if the operations management aspects of DevOps are not top of mind. When everything is smooth sailing, it is understandable that you might be reluctant to make any significant changes to your daily routine. Yet in agile, fast-moving environments, things can go wrong very quickly, even if your organization intentionally separates duties to create an “operational firewall” that keeps ill-advised code or configuration changes to a minimum. Having an effective way to lend your app expertise during high-impact incidents can make your life much easier and limit disruption to your other duties.
You may work in an organization that is new to owning and managing operations management tools. As scopes of responsibility expand, it is becoming more common for groups outside of a central IT operations team to do so. Several analyst firms have recently published studies indicating that nearly half of the purchase and usage of operations tools already happens outside of IT operations teams. This share is predicted to grow to a majority in a few years.
Configuration and automation tools are also becoming more commonly owned outside the data center. Such shifts in responsibility make operational skill-building, incident readiness and incident resolution more challenging. As the number of different groups deploying these tools expands, the likelihood of being overloaded with notifications, alerts, events and general operational noise rises dramatically. Note: I will be presenting a webinar on how new operations management tools can help you carry out these best practices. Join me for this conversation on Thursday, Oct. 26.
Organizations deploying new apps and management tools in the cloud naturally look to manage those workloads from the cloud as well. Other organizations with legacy applications are often in the midst of a digital transformation that, in turn, drives transformative projects for operations management. Whether your organization is already leveraging cloud-based tools or you are part of an organization undergoing transformation, you may face the challenge of a hybrid set of workload deployments, both cloud-based and on-premises. For your organization, this could mean finding a pragmatic, complementary approach to deploying and managing the new cloud-based applications and their associated management tools. Such an approach may not need to be isolated from your current tools and methods. Sufficient tool interoperability could support a federated, phased path.
In this blog series, we will explore the common operational challenges many DevOps teams face today, how traditional IT operations best practices can be applied within a DevOps methodology and how new operations management tools can help you carry out those best practices to meet your goals on an ongoing basis.
Here’s a preview of the forthcoming blog topics:
Let the Noise Wail Without Going Deaf
Unless your applications and services exist in a vacuum-sealed chamber of silence, agile development, continuous rollouts and updates, and infrastructure changes create a plethora of events that vary in severity of impact, validity and meaningfulness. Those events tend to come from multiple monitoring and configuration tools or reflect many different parts of your applications and underlying services. This mixture of events and alerts leads to what is commonly known as operational “noise.”
Smart Ops Action: Taking the Right Action for Ops Traction
Empowering anyone in your organization who contributes to restoring service when incidents occur can ensure their time is well spent, with the intended results. All too often, operations teams effectively identify and triage incidents, yet fall short of empowering first responders to actually resolve problems. Bolstering your team’s efforts to take the actions necessary to resolve the situation at hand and restore service promptly is achievable—if done right.
Boring is Best! How Can You Thrill Your DevOps Stakeholders with Boredom?
Strive for stability and structure in production without stifling development agility. Don’t try to thwart the innovation and chaos that result from agile development and constant rollouts. Instead, establish structure and stability where you have control so your talented team members can focus on being supportive, valuable contributors to innovation and agility.
Fuel innovation and make your stakeholders smile with boredom by effectively turning chaos into stable operations.
If you’re a veteran IT operations practitioner with years of experience helping your company’s applications, services and underlying infrastructure run efficiently, these tips may be obvious. Hopefully you’ve been successful advocating these best practices to your development and line of business colleagues.
If you are new to operations, and especially if you are among the growing population of DevOps professionals who are understandably reluctant to conduct triage, investigation, diagnosis or resolution of operational incidents, rest assured there is relief. Tried-and-true best practices carried out with new, easy-to-use tools can empower your team to conduct these unwanted tasks like a seasoned veteran, regardless of operational experience.
About the Author / James Moore
James Moore is Principal Offering Manager, IBM, responsible for offering management and strategy for IBM HybridCloud Operations Insight solutions including IBM Cloud Event Management, Runbook Automation and Alert Notification. James joined IBM from the Candle acquisition in 2004, where he was product manager for Candle’s Application Response Time product lines. James has over 15 years’ experience in event management, application performance management, and business service management. Connect with him on LinkedIn and Twitter.