DevOps.com

3 Observability Strategies to Reduce Debugging Cycles

By: Dan Bennett on July 30, 2020

These observability tips can help developers uncover issues that impede performance and derail customer experience

Modern technologies and methodologies such as cloud services, containers, DevOps, microservices and serverless have made it easier for organizations to deploy application code to production. However, the additional layers within the modern tech stack have increased complexity, making it more difficult to identify performance issues and trace them back to the root cause before customers are impacted.

While all these layers can be instrumented for observability, each of them emits different information intended for different roles. If something goes wrong, where does a developer begin to look? Maybe they get mentioned on a support ticket. Or they end up staring at high-level visualizations of systems from the various performance monitoring and logging tools. But there are graphs everywhere, and there is no surefire way for the developer to home in quickly on the ones that matter. By standing back and eyeballing the graphs, they will see the spike. But with all this information and so many variables, it is not easy to pinpoint why the spike occurred.

When incidents impact large numbers of customers, developers could spend many days trying to find the answer. Without context about what is causing the issue, they spend most of this time blindly following different trails and doing things like arbitrary code rollbacks. It’s costly. Studies show organizations spend about $4.6 million annually on incident management.

Observability: Strategies to Consider

Development teams can save time and reduce costs by implementing three overlooked observability strategies that offer a new perspective and carry learnings forward.

Compare Changes Over Time

To help reduce the time it takes to trace an issue back to a specific cause, developers should look at metrics over time. Find an earlier window of activity that matches the current one as closely as possible and compare the two to help spot an outlier. For example, if a development team is alerted to an incident in which customers are reporting slow response times with an application, the team can compare transaction data from similar time periods to look for correlations. The data may reveal that the issue is only with API calls, although no individual API call was slow enough to cross a threshold and trigger an alarm. However, when the team compares those API calls to the ones from a week ago under the same parameters (same time, same server, etc.), it discovers that the API calls are, in aggregate, performing 20% slower.

Looking at performance comparisons over time gives developers visibility into performance degradation before it starts to hurt a large number of users.
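A minimal sketch of such a comparison follows. It assumes latencies have already been exported, in milliseconds, for two matching windows. The function name `compare_windows`, the 500 ms per-call alarm and the 10% slowdown limit are all hypothetical choices for illustration and are not taken from any particular monitoring tool.

```python
from statistics import mean


def compare_windows(baseline_ms, current_ms,
                    per_call_alarm_ms=500.0, max_slowdown_pct=10.0):
    """Compare call latencies in the current window against a matching earlier window.

    All thresholds are illustrative assumptions, not recommended values.
    """
    baseline_avg = mean(baseline_ms)
    current_avg = mean(current_ms)
    # Aggregate change relative to the baseline window, in percent.
    slowdown_pct = (current_avg - baseline_avg) * 100.0 / baseline_avg
    # Number of individual calls that would have tripped a per-call alarm.
    calls_over_alarm = sum(1 for ms in current_ms if ms > per_call_alarm_ms)
    return {
        "calls_over_alarm": calls_over_alarm,
        "slowdown_pct": slowdown_pct,
        "degraded": slowdown_pct > max_slowdown_pct,
    }


# Same hour and same server, one week apart (made-up sample data).
last_week = [100, 110, 90, 100]
today = [120, 130, 110, 120]
report = compare_windows(last_week, today)
print(report)
```

With this sample data, no single call crosses the per-call alarm, yet the aggregate is 20% slower than the matching window a week earlier, which is the kind of degradation a per-call threshold alone would miss.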

Analyze User Behavior Patterns

A development team should also evaluate the behavior patterns of individuals and IP addresses that are interacting with an application. This can provide insights that help to differentiate between external (user, browser) causes and internal causes (problems with code or a company’s systems).

For example, consider an e-commerce customer trying to modify items in their shopping cart before checking out. The application is responding slowly to each click (taking 2 seconds), so the customer clicks repeatedly. A single occurrence might be overlooked, but repeated 2-second delays add up to a significant impact. The repeated requests may even trigger alerts that the database is overloaded or that there is a code problem when, in reality, user behavior is compounding an underlying problem. Similarly, some log activity may look like an attack, but investigating the affected users may reveal that the root cause is actually a browser behaving unusually or a misconfigured proxy sending anomalous traffic.

In either case, developers end up wasting time by implementing corrective actions that may not solve the problem. But considering patterns—viewing events that fired before and after the problem and behavior from individual users along with IP addresses—will enable developers to quickly trace performance issues back to the root cause.
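One way to surface this kind of pattern is to group request events by client and action and look for rapid repeats. The sketch below assumes events are available as `(timestamp_seconds, client, action)` tuples. The function name `find_rapid_repeats`, the 5-second window and the minimum of three repeats are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict


def find_rapid_repeats(events, window_s=5.0, min_repeats=3):
    """Return {(client, action): largest burst} for bursts of at least min_repeats.

    A burst is a run of identical requests from one client that all fall
    within window_s seconds of each other.
    """
    stamps_by_key = defaultdict(list)
    for ts, client, action in events:
        stamps_by_key[(client, action)].append(ts)

    bursts = {}
    for key, stamps in stamps_by_key.items():
        stamps.sort()
        start = 0
        best = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window_s:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_repeats:
            bursts[key] = best
    return bursts


# Made-up sample: one client retries the same cart update three times in 4 seconds.
events = [
    (0.0, "203.0.113.7", "update_cart"),
    (2.0, "203.0.113.7", "update_cart"),
    (4.0, "203.0.113.7", "update_cart"),
    (1.0, "198.51.100.2", "update_cart"),
    (60.0, "198.51.100.2", "update_cart"),
]
print(find_rapid_repeats(events))
```

Only the first client is flagged here. If most of a load spike traces back to a handful of clients like this one, the spike is more likely a symptom of slow responses, or of one misbehaving browser or proxy, than a sign of a broad failure.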

Properly Assess Impact

Properly assessing the impact of an issue is another useful strategy that can help determine how to respond to an alert. For instance, if a small subset of users (fewer than 0.5%) repeatedly encounter an error with an API, cannot access the functionality (as in the shopping cart example) and persistently retry the API call, you will see a spike in error rates that suggests a bigger problem than actually exists.

Another example might be new functionality that has been rolled out that is for some reason incompatible with components of older accounts, such as incompatible profile information or data thresholds. As customers explore, this small subset of users will encounter the bug. It may appear as though the feature is broken when, in reality, it is a problem with some legacy data.

When problems affect only a few customers, it is beneficial for developers to lean on monitoring tools for alerts and context necessary to resolve an issue without needing to rely on support or support tickets.
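As a rough illustration, the share of users affected can be computed alongside the raw error count. The sketch assumes each error record carries a user identifier and that the number of active users in the same period is known. The function name `assess_impact` and the sample figures are hypothetical.

```python
from collections import Counter


def assess_impact(error_user_ids, total_users):
    """Summarize how widely an error is spread across the user base.

    error_user_ids holds one entry per error event (the user who hit it).
    total_users is the number of active users in the same period (must be > 0).
    """
    errors_per_user = Counter(error_user_ids)
    affected_users = len(errors_per_user)
    total_errors = sum(errors_per_user.values())
    return {
        "total_errors": total_errors,
        "affected_users": affected_users,
        "affected_pct": affected_users * 100.0 / total_users,
        "errors_per_affected_user": (
            total_errors / affected_users if affected_users else 0.0
        ),
    }


# Made-up sample: 100 errors, all caused by three users retrying a failing call.
error_user_ids = ["u1"] * 40 + ["u2"] * 35 + ["u3"] * 25
print(assess_impact(error_user_ids, total_users=1000))
```

In this sample, the 100 errors come from just three of 1,000 users (0.3%), so the spike reflects a few users retrying heavily rather than a widespread outage.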

Apply Observability Learnings Forward in Post Discovery

For issues that impacted many users or were difficult to diagnose, development teams should consider adding a custom metric or monitor when they do their post-discovery analysis. Whatever form it takes (an extra line in a log, a StatsD metric or a catch block around the failing code that logs the stack trace), it is important to capture the lesson learned and make it actionable for the future, which will save time during discovery if the issue recurs. Without this step, the mean time to resolution for future issues may continue to worsen as software complexity increases.
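The sketch below shows what such an addition might look like, using only the Python standard library. It wraps a failing operation in a `try`/`except`, logs the stack trace and emits a counter in the plain-text StatsD format (`<name>:<value>|c`) over UDP. The `update_cart` function, the metric name `cart.update.failure` and the logger name are hypothetical. Port 8125 is the conventional StatsD default, and sending to localhost is an assumption about where an agent would be listening.

```python
import logging
import socket

logger = logging.getLogger("checkout")  # hypothetical logger name


def format_counter(name, value=1):
    """Build a StatsD counter datagram: '<name>:<value>|c'."""
    return f"{name}:{value}|c"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a StatsD agent (assumed to be local)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


def update_cart(cart, item, quantity, emit=send_metric):
    """Hypothetical cart update, instrumented after a past incident."""
    try:
        if quantity < 0:
            raise ValueError(f"negative quantity for {item}: {quantity}")
        cart[item] = quantity
        return True
    except ValueError:
        # Lesson carried forward: record the trace and count the failure
        # so the next occurrence is visible without a support ticket.
        logger.exception("cart update failed for item %s", item)
        emit(format_counter("cart.update.failure"))
        return False
```

Passing the emitter as a parameter is a design choice that keeps the function testable without a running agent. For example, `update_cart({}, "sku-1", -1, emit=print)` logs the trace and prints the datagram instead of sending it.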

Lack of actionable data is one of the reasons why companies spend millions of dollars annually on managing and resolving incidents. As development teams look to expand observability practices, they should first consider what puts the developer in the best position to resolve an error or performance issue quickly and ultimately deliver exceptional customer experiences.

