Top 5 Best Practices for DevOps Monitoring

By Mark Herring on November 15, 2017

Today, organizations are expected to deliver higher levels of customer satisfaction for their online services. Many, however, support these initiatives with an interrupt-driven approach, reacting to fix things only after they break. To become more proactive and meet demanding SLAs, organizations can reduce unscheduled downtime by adopting a continuous delivery (CD) model for their development efforts.


What is critical in the CD model is the ability to monitor and manage systems in a structured way, detecting problems early so organizations can make changes before the service is impacted. While unexpected failures make interrupt-driven work unavoidable at times, organizations can become more proactive by examining their current approach and toolset against business needs to help create a path to continuous service delivery optimization.

Effective CD Models Need Effective Monitoring

As a starting point, organizations need control of and visibility into their DevOps environment by instrumenting and collecting everything. Given the volume of data involved, this can seem like an insurmountable challenge. To get started, follow these five best practices to perform DevOps monitoring efficiently in a measurable and scalable way.

Step 1: Collect the Data. You can’t manage what you don’t measure! Take inventory of what is being collected today and align with business and executive teams on the goals of the services being delivered. Analyze the inventory of metrics with questions such as: “Why aren’t we collecting this? How does this fit into our goals? Have you ever seen a failure in X? How often should we be measuring this? How long should we keep it? Is that important?” Teams also should evaluate how they are collecting the information and consider the best architectural approach, including whether push or pull collection methods are better. Once goals are understood and the inventory of data is collected, look at what else your organization should consider collecting.
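
To make the collection step concrete, here is a minimal sketch of a push-style collector in Python that samples a few host metrics and sends them to a metrics endpoint. The endpoint URL, metric names and 60-second interval are illustrative assumptions rather than details from the article, and the psutil library is assumed to be installed.

```python
import json
import time
import urllib.request

import psutil  # third-party library, assumed available for host metrics

METRICS_ENDPOINT = "http://metrics.example.internal/ingest"  # hypothetical endpoint


def collect_sample() -> dict:
    """Gather a small inventory of host metrics with a timestamp."""
    return {
        "timestamp": int(time.time()),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "disk_used_percent": psutil.disk_usage("/").percent,
        "mem_used_percent": psutil.virtual_memory().percent,
    }


def push_sample(sample: dict) -> int:
    """Push one sample to the collection endpoint (push model)."""
    req = urllib.request.Request(
        METRICS_ENDPOINT,
        data=json.dumps(sample).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    while True:
        push_sample(collect_sample())
        time.sleep(60)  # collection interval: one of the questions to answer per metric
```

A pull model would invert this: the collector exposes the same sample over HTTP and a central scraper fetches it on its own schedule.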

Step 2: Correlate and Triage. Correlating data is necessary to understand it, but data comes in at different frequencies, over different time frames and from different sources. Normalize the data, compare the different incoming metrics against one another and establish baselines for basic service availability. Since organizations aim to go beyond basic service-level agreements (SLAs) and offer a high-performing solution, constantly ask what the organization is missing from a data perspective and how it relates to business initiatives. Asking that question from the collection, correlation and triage perspective is critical.
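
As an illustration of normalizing data that arrives at different frequencies, the sketch below resamples two hypothetical metric series onto a common one-minute grid and computes a rolling baseline for each. The column names, sampling frequencies and one-hour baseline window are assumptions made for the example.

```python
import pandas as pd


def normalize_and_baseline(cpu: pd.Series, latency_ms: pd.Series) -> pd.DataFrame:
    """Align two timestamp-indexed metrics collected at different frequencies
    onto a common one-minute grid, then compute a rolling baseline for each."""
    frame = pd.DataFrame({
        "cpu": cpu.resample("1min").mean(),                 # e.g., sampled every 10s
        "latency_ms": latency_ms.resample("1min").mean(),   # e.g., sampled every 5s
    })
    # Baseline: rolling one-hour mean, a simple starting point for "normal"
    # service levels that comparisons and triage can be measured against.
    for col in ["cpu", "latency_ms"]:
        frame[f"{col}_baseline"] = frame[col].rolling("60min").mean()
    return frame
```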

Step 3: Identify Trends. Organizations need to examine historical data to identify trends and take action before issues arise and customers complain. Establish alerting thresholds by outlining what a normal day looks like from a monitoring performance and customer perspective, and then identifying examples of what makes an abnormal day. This ties in with managing infrastructure inventory and understanding safety thresholds for each of the components that potentially could impact the service offering. It’s critical to communicate these findings with teams and business-line managers to prevent service delivery problems from happening and optimize offerings based on identified trends.
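
One common way to turn "what a normal day looks like" into an alerting threshold is to compare current values against a historical rolling mean and standard deviation. The sketch below is a generic band-based check; the seven-day window and the multiplier of three are arbitrary illustrative defaults, not recommendations from the article.

```python
import pandas as pd


def abnormal_points(metric: pd.Series, window: str = "7D", k: float = 3.0) -> pd.Series:
    """Flag points that fall outside the historical band (rolling mean +/- k * std).

    `metric` is a timestamp-indexed series for a single measurement.
    Returns a boolean series marking candidate "abnormal day" moments.
    """
    mean = metric.rolling(window).mean()
    std = metric.rolling(window).std()
    upper = mean + k * std
    lower = mean - k * std
    return (metric > upper) | (metric < lower)
```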

Step 4: Notify and Act (Automation). In manual mode, a notification is delivered and then the team reacts. But teams are continually pushed to do things faster, and automation can help. To get there, organizations must understand where best to add automation: How do you gather telemetry that delivers answers consistent enough for a machine to apply rules against, and when should the result trigger an automated process rather than notify a person? Moving to a faster process requires a shift from manual to automated practices.
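
A minimal sketch of that automate-or-notify decision is below: a rule evaluates the telemetry and either runs an automated remediation or pages a person when the signal is not consistent enough to act on. The 80% threshold and both handler functions are hypothetical placeholders.

```python
def rotate_and_compress_logs() -> None:
    """Placeholder for an automated remediation step (hypothetical)."""
    print("Rotating and compressing logs...")


def page_on_call(message: str) -> None:
    """Placeholder for a human notification channel (hypothetical)."""
    print(f"PAGE: {message}")


def handle_disk_pressure(disk_used_percent: float, confident_signal: bool) -> str:
    """Decide between automated action and human notification.

    `confident_signal` stands in for telemetry that delivers an answer
    consistent enough for the machine to operate rules against; the 80%
    threshold is illustrative only.
    """
    if disk_used_percent < 80:
        return "no-op"
    if confident_signal:
        rotate_and_compress_logs()
        return "automated-remediation"
    page_on_call(f"Disk usage at {disk_used_percent}% but the signal is ambiguous")
    return "notified-human"
```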

Step 5: Predict (What-if Analysis). If you don’t work through the first four steps methodically, it is very difficult to reach this final step; teams end up stuck in manual mode. To balance costs and availability, it’s critical to discuss with the executive team how to predict customer service consumption (revenue) against what those services will cost the business.
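
A simple form of what-if analysis is extrapolating a consumption metric forward and asking when it will cross a capacity or cost limit. The linear fit below is a deliberately crude sketch under that assumption; real prediction would use richer models and the historical data built up in the earlier steps.

```python
from typing import Optional

import numpy as np


def days_until_limit(daily_usage: np.ndarray, limit: float) -> Optional[float]:
    """Fit a straight line to historical daily usage and estimate how many days
    remain until the trend reaches `limit`. Returns None if there are too few
    points or the trend is flat/decreasing (no crossing predicted)."""
    if len(daily_usage) < 2:
        return None
    days = np.arange(len(daily_usage))
    slope, intercept = np.polyfit(days, daily_usage, 1)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, float(crossing_day) - (len(daily_usage) - 1))
```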

For example, a service that can alert customers to a potential disruption due to low disk space can be built only by clearly setting business goals, then using the right metrics and events platform backed by a time series database. In this case, the business goal is to ensure no service disruption due to failures such as low disk space; the metric monitored is disk space for each customer instance, with an appropriate threshold alert set off by an automated trigger; and the action is an email letting the customer know about the situation and what they can do (reduce load or upgrade for more disk space). In this example, it is important to marry business logic with the monitoring practice to make the service a successful experience for customers. Furthermore, this approach helps teams not only predict user experience but also forecast OpEx and CapEx further into the future.
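
To tie the worked example together, here is a hedged sketch of the alerting action: when a customer instance’s disk usage crosses a threshold, an email tells the customer what happened and what they can do about it. The threshold, addresses and SMTP host are placeholders, and the query against the time series database is stubbed out because it is deployment-specific.

```python
import smtplib
from email.message import EmailMessage

DISK_ALERT_THRESHOLD = 85.0  # percent used; illustrative threshold


def query_disk_usage(customer_id: str) -> float:
    """Stub: fetch the latest disk-usage metric for a customer instance from
    the time series database (the actual query is deployment-specific)."""
    raise NotImplementedError


def alert_customer(customer_email: str, used_percent: float) -> None:
    """Email the customer about impending disk pressure and the actions they can take."""
    msg = EmailMessage()
    msg["Subject"] = f"Disk usage on your instance has reached {used_percent:.0f}%"
    msg["From"] = "alerts@example.com"  # placeholder sender
    msg["To"] = customer_email
    msg.set_content(
        "Your instance is running low on disk space. To avoid a service "
        "disruption, you can reduce load or upgrade to a plan with more disk space."
    )
    with smtplib.SMTP("smtp.example.internal") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)


def check_and_alert(customer_id: str, customer_email: str) -> None:
    used = query_disk_usage(customer_id)
    if used >= DISK_ALERT_THRESHOLD:
        alert_customer(customer_email, used)
```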

That is the benefit of predicting with the right monitoring approach: the ability to make informed business decisions based on results. Customers appreciate it, and proactive support is far better than reactive support.

Emerging trends such as microservices, containerization, elastic storage, software-defined networking and hybrid clouds are pushing the boundaries of DevOps monitoring. The right monitoring plan can identify and resolve problems before they affect critical business processes and can help customers plan for upgrades before outdated systems begin to cause failures and outages for users.

About the Author / Mark Herring

Mark Herring is CMO at InfluxData. He is a well-rounded Silicon Valley executive with proven experience in taking complex technology and making it understandable to a broader audience. His deep developer roots are never far from his mind as he looks at trends and asks the tough questions about whether a technology is here to stay or just another fad. Follow him on Twitter and connect with him on LinkedIn.

Filed Under: Blogs, DevOps Practice. Tagged With: automation, continuous delivery, data analysis, data collection, data correlation, devops, monitoring, tools, what-if analysis
