DevOps.com

Application Monitoring in the Age of Big Data

By: contributor on January 19, 2016

Operations (or ops, as it is known colloquially) has been necessary since the first shared systems came online in the ’60s. While the job title didn’t necessarily exist, the same questions needed answers then as they do now:

  • Is it up and accessible?
  • Is it responding in a reasonable time frame?
  • Is it performing better or worse now than previously?

In a simple system, such as a static website, these questions are easily answered by requesting the page (in an automated fashion, of course).
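
For the static-site case, such an automated check can be sketched with Python's standard library. This is a minimal illustration, not a tool the author describes: `probe`, `assess`, the `ProbeResult` fields, the thresholds and the 1.5x degradation factor are all hypothetical choices. The `opener` parameter defaults to `urllib.request.urlopen` and exists only so the check can be exercised without a network.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    up: bool                    # did we get a 2xx/3xx response?
    latency_s: Optional[float]  # wall-clock time to fetch the full page
    status: Optional[int]       # HTTP status code, if any


def probe(url: str, timeout: float = 5.0,
          opener: Callable = urllib.request.urlopen) -> ProbeResult:
    """Questions 1 and 2: is the page up, and how long did it take?"""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measured latency
            status = resp.status
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts all subclass OSError.
        return ProbeResult(up=False, latency_s=None, status=None)
    latency = time.monotonic() - start
    return ProbeResult(up=200 <= status < 400, latency_s=latency, status=status)


def assess(result: ProbeResult, max_latency_s: float,
           baseline_s: Optional[float] = None) -> str:
    """Question 3: better or worse than before? (The 1.5x factor is arbitrary.)"""
    if not result.up:
        return "down"
    if result.latency_s > max_latency_s:
        return "slow"
    if baseline_s is not None and result.latency_s > 1.5 * baseline_s:
        return "degraded"
    return "ok"
```

A scheduler such as cron could run `probe` every minute and keep a rolling average of recent latencies to pass in as `baseline_s`.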

However, when you look at any “late model” technical infrastructure, you’ll find many, many moving parts. These range from border routers to web servers, work queues to databases and data caches to big data clusters. There are thousands upon thousands of computing instances, software processes, network storage devices and network links that can go down, up or just plain wonky. This means potentially millions of places where things can go wrong.

How do you keep an eye on all of these potential issues? How do you know if they’re responsive? How do you know if things are getting better or worse?

Some Monitoring Progress

The “old” method was to write a monitoring script that polled whatever data you thought best represented the overall health of a given component. Over the years, things have become more granular. Our libraries of monitoring scripts have grown, we’ve instrumented our applications with counters (StatsD, Librato, etc.) and we’ve turned to big data storage (OpenTSDB) to house all of these data points.
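
Application counters of the kind StatsD collects are usually sent as small plain-text UDP datagrams in the form `name:value|type`, where `c` marks a counter and `ms` a timer; 8125 is the conventional StatsD port. The sketch below hand-rolls that wire format with the standard library to show how lightweight such instrumentation is. `StatsdClient`, `incr` and `timer` are hypothetical names, not the API of any particular StatsD client library, and sample rates and gauges are left out.

```python
import socket
import time
from contextlib import contextmanager


def format_statsd(name: str, value: int, metric_type: str) -> str:
    """Render one metric in the StatsD line format: name:value|type."""
    return f"{name}:{value}|{metric_type}"


class StatsdClient:
    """Hypothetical minimal client: fire-and-forget datagrams over UDP."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line: str) -> None:
        try:
            self.sock.sendto(line.encode("utf-8"), self.addr)
        except OSError:
            pass  # never let a metrics failure break the application

    def incr(self, name: str, count: int = 1) -> None:
        self._send(format_statsd(name, count, "c"))  # counter

    @contextmanager
    def timer(self, name: str):
        start = time.monotonic()
        try:
            yield
        finally:
            elapsed_ms = int((time.monotonic() - start) * 1000)
            self._send(format_statsd(name, elapsed_ms, "ms"))  # timer

    def close(self) -> None:
        self.sock.close()
```

In application code this would look like `stats.incr("checkout.requests")` or `with stats.timer("checkout.db_query"): ...`. Because UDP is fire-and-forget, a missing or overloaded StatsD daemon never slows down the instrumented application; the trade-off is that metrics can be silently lost.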

The question remains: Do we have any more clarity than before? We inarguably have more data points, so we must know more about the performance of our systems, right?

It turns out that, in practice, we just have a lot of data and a lot of people trying to decipher what it means. If you’ve worked in operations, you’ve been up at 2 a.m. looking at status dashboards with green, yellow and red icons and many, many graphs, all in an effort to answer the question: What is going on?

Sure, some incidents are easy: An instance was retired, or a fiber line was cut somewhere in Kansas.

I’m talking about the ones that aren’t so clear-cut, so to speak:

  • Site feels slow
  • Median response times are up 25 percent from yesterday
  • Usage is down week over week, or compared with some other relevant time frame
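
The second and third symptoms are comparisons between time windows. A minimal sketch, assuming you already have per-request latencies for two days and total usage counts for two periods; the function names and the 20 percent and 10 percent alert thresholds are illustrative assumptions:

```python
from statistics import median
from typing import Sequence, Tuple


def pct_change(current: float, baseline: float) -> float:
    """Percentage change of `current` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def median_latency_change(today_ms: Sequence[float], yesterday_ms: Sequence[float],
                          alert_pct: float = 20.0) -> Tuple[float, bool]:
    """'Median response times are up N percent from yesterday.'"""
    change = pct_change(median(today_ms), median(yesterday_ms))
    return change, change >= alert_pct


def usage_drop(this_period: float, previous_period: float,
               alert_pct: float = 10.0) -> Tuple[float, bool]:
    """'Usage is down week over week.' A drop yields a negative change."""
    change = pct_change(this_period, previous_period)
    return change, change <= -alert_pct
```

With yesterday's latencies `[80, 100, 120]` and today's `[110, 125, 140]`, the medians are 100 ms and 125 ms, so `median_latency_change` reports a 25 percent increase, exactly the second bullet. Computing one such comparison is easy; knowing which of thousands of them matters is the hard part.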

How do you answer those questions in a haystack of data points, up/down statuses and, in some cases, just plain noise? Application performance management (APM) tools came along and gave us instrumentation in the app frameworks and the browser. Now we have visibility into which calls are performing poorly and how our app “feels” to the end user.

This has helped by breaking the haystack into smaller haystacks and providing a way to search and sort potential performance issues. However, most APM tools depend on instrumenting code written for platforms such as Java, .NET or Ruby, and cannot provide monitoring insights for the rapidly growing set of open source frameworks such as Grails, Akka, Netflix OSS and the like. We’re still left with the haystack problem: What is going on?
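
At its core, the in-app instrumentation that APM tools rely on comes down to timing calls and aggregating the results. The decorator below is a deliberately tiny, hypothetical stand-in for that idea; production APM agents do considerably more. `instrument` and `slowest_calls` are illustrative names, and timings are kept in process memory only.

```python
import functools
import time
from collections import defaultdict
from statistics import median
from typing import Callable, Dict, List, Tuple

# In-process store: qualified function name -> list of call durations (seconds).
_timings: Dict[str, List[float]] = defaultdict(list)


def instrument(func: Callable) -> Callable:
    """Record the wall-clock duration of every call, including calls that raise."""
    name = f"{func.__module__}.{func.__qualname__}"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            _timings[name].append(time.perf_counter() - start)

    return wrapper


def slowest_calls(n: int = 5) -> List[Tuple[str, float]]:
    """Return the n instrumented functions with the highest median duration."""
    ranked = sorted(((name, median(durations)) for name, durations in _timings.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

Ranking by median rather than mean keeps a single outlier call from dominating the list; the catch the article points out still applies, since only code someone has wrapped shows up at all.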

When I’m up at 2 a.m., stumbling over to the computer to find out what issue PagerDuty has alerted me to, I don’t really want to pore through page after page of graphs and up/down statuses. What would be really helpful is a system that can aggregate all those monitoring data flows and statuses and actually apply some intelligence, so that it could tell me, “Look at the response time of this component: this metric is reporting something anomalous, and it is most likely causing your issue right now.”

In other words, it would be helpful to have a system that shows both the symptoms and the root cause of an issue. Such a system would also surface high-quality, high-value information about my infrastructure, combining anomaly detection with root cause analysis to detect the actual root cause.
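
Combining anomaly detection with root cause analysis can be sketched under strong simplifying assumptions. This is not OpsClarity's method or any vendor's: it flags a component when its latest sample sits at least three sample standard deviations from its own recent history (a plain z-score), then walks a hand-maintained dependency map, treating an anomalous component whose direct dependencies all look healthy as a root-cause candidate and any anomalous component downstream of another as a symptom. The component names, the threshold and the dependency map are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def zscore(history: List[float], latest: float) -> float:
    """How many sample standard deviations `latest` sits from the history mean."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two history points
    if sigma == 0:
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma


def anomalous_components(metrics: Dict[str, List[float]],
                         threshold: float = 3.0) -> Dict[str, float]:
    """metrics maps component -> series; the last point is 'now', the rest is history."""
    flagged = {}
    for component, series in metrics.items():
        if len(series) < 3:
            continue  # not enough history to estimate a spread
        z = zscore(series[:-1], series[-1])
        if abs(z) >= threshold:
            flagged[component] = z
    return flagged


def causes_and_symptoms(depends_on: Dict[str, List[str]],
                        flagged: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Anomalous with no anomalous direct dependency -> root-cause candidate;
    anomalous with an anomalous dependency -> symptom."""
    causes, symptoms = [], []
    for component in sorted(flagged):
        if any(dep in flagged for dep in depends_on.get(component, [])):
            symptoms.append(component)
        else:
            causes.append(component)
    return causes, symptoms
```

With `web` depending on `api`, and `api` depending on `db` and `cache`, a latency spike in `web`, `api` and `db` yields `db` as the single root-cause candidate and `api` and `web` as symptoms. Only direct dependencies are checked, a dependency cycle in which every member is anomalous yields no candidate, and a z-score assumes a reasonably stable baseline, so seasonality such as daily traffic cycles would need handling in practice.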

One new startup, OpsClarity, is hoping to make troubleshooting more accurate and efficient by combining data science and anomaly detection with root cause analysis.

Perhaps this is the long overdue beginning of the next age in monitoring.

 

About the Author/Michael Hobbs

Michael Hobbs is the director of Technical Operations at Expa, a startup studio based in San Francisco. His significant experience in building, operating and automating highly scalable application infrastructures has allowed him to contribute to the success of Infusionsoft and StumbleUpon, among others. Michael continues to put that expertise to work at Expa by advising and assisting the next generation of Internet companies. Michael is also a significant contributor to many open source projects, including dokku and herokuish.

 

Filed Under: Blogs, Enterprise DevOps Tagged With: apm, application performance, application performance management, big data
