DevOps.com

Application Monitoring in the Age of Big Data

By: contributor on January 19, 2016

Operations (or ops, as it is known colloquially) has been necessary since the first shared systems came online in the ’60s. While the job title didn’t necessarily exist, the same questions needed answers then as they do now:

  • Is it up and accessible?
  • Is it responding in a reasonable time frame?
  • Is it performing better or worse now than previously?

In a simple system, say a static website, these questions are easily answered by requesting the page, in an automated fashion, of course.
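
For illustration, here is a minimal sketch of such an automated check, assuming a plain HTTP(S) page and using only Python’s standard library. The function names `check_page` and `trend` and the 10 percent tolerance are hypothetical choices, not part of any particular monitoring tool. The sketch answers the three questions above in their simplest form: did the page return a 2xx status, how long did it take, and is that better or worse than a previous measurement?

```python
import http.client
import math
import time
import urllib.request


def check_page(url: str, timeout: float = 5.0) -> tuple[bool, float]:
    """Fetch url once and return (is_up, elapsed_seconds).

    "Up" means the server answered with a 2xx status before the timeout.
    Connection errors, timeouts and non-2xx responses all count as down.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include body transfer time in the measurement
            up = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # urllib's URLError/HTTPError and socket timeouts are OSError subclasses.
        up = False
    return up, time.monotonic() - start


def trend(current: float, previous: float, tolerance: float = 0.10) -> str:
    """Compare two response times; changes within +/- tolerance count as 'same'."""
    if previous <= 0 or math.isnan(previous):
        raise ValueError("previous response time must be positive")
    change = (current - previous) / previous
    if change > tolerance:
        return "worse"
    if change < -tolerance:
        return "better"
    return "same"


# Example (requires network access):
#   up, elapsed = check_page("https://example.com/")
#   print(up, f"{elapsed:.3f}s", trend(elapsed, previous=0.250))
```

In practice a check like this runs on a schedule and stores each timing, so the next run has a "previous" value to compare against.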

However, when you look at any “late model” technical infrastructure, you’ll find many, many moving parts, ranging from border routers to web servers, work queues to databases, and data caches to big data clusters. There are thousands upon thousands of computing instances, software processes, network storage devices and network links that can go down, come back up or just go plain wonky. That adds up to potentially millions of places where things can go wrong.

How do you keep an eye on all of these potential issues? How do you know if they’re responsive? How do you know if things are getting better or worse?

Some Monitoring Progress

The “old” method was to write some sort of monitoring script that polled whatever data you thought best represented the overall health of a component. Over the years, things have gotten more granular. Our libraries of monitoring scripts have grown, we’ve instrumented our applications with counters (StatsD, Librato, etc.), and we’ve adopted big data storage (OpenTSDB) to house all of these data points.
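
As a rough illustration of what instrumenting an application with counters looks like on the wire, the sketch below assumes a StatsD daemon listening on its customary UDP port 8125 and emits metrics in StatsD’s plain-text `name:value|type` format (`c` for counters, `ms` for timers, `g` for gauges). The metric names are hypothetical; Librato and OpenTSDB have their own ingestion APIs, which are not shown here.

```python
import socket

# StatsD type suffixes covered by this sketch.
VALID_TYPES = {"c": "counter", "ms": "timer", "g": "gauge"}


def statsd_line(name: str, value: float, metric_type: str = "c") -> bytes:
    """Encode one metric in StatsD's plain-text format: <name>:<value>|<type>."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name: str, value: float, metric_type: str = "c",
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric over UDP, fire-and-forget; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_line(name, value, metric_type), (host, port))


# Hypothetical metric names, e.g. inside a request handler:
#   send_metric("web.requests", 1)               # count one request
#   send_metric("web.response_time", 42, "ms")   # record a 42 ms timing
```

UDP is a deliberate choice in this style of metrics pipeline: a lost packet costs one data point, but the application never blocks waiting on the metrics server.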

The question remains: Do we have any more clarity than before? We unarguably have more data points, so we must know more about the performance of our systems, right?

It turns out that, in practice, we just have a lot of data and a lot of people trying to decipher what it means. If you’ve worked in operations, you’ve been up at 2 a.m. looking at status dashboards full of green, yellow and red icons and many, many graphs, all in an effort to answer one question: What is going on?

Sure, some incidents are easy: An instance was retired, or a fiber line was cut somewhere in Kansas.

I’m talking about the ones that aren’t so clear-cut, so to speak:

  • Site feels slow
  • Median response times are up 25 percent from yesterday
  • Usage is down week over week, or against some other relevant time frame
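
The second symptom on that list is, at bottom, a comparison of medians. A small sketch, using hypothetical response-time samples in milliseconds and a hypothetical 20 percent alert threshold: the median is used rather than the mean because a handful of very slow requests can swing the mean sharply. With the sample data in the comments below, the median rises 25 percent while the mean rises roughly 62 percent, driven almost entirely by a single 400 ms outlier.

```python
from statistics import median


def median_change_pct(today: list[float], yesterday: list[float]) -> float:
    """Percent change in median response time from yesterday to today."""
    baseline = median(yesterday)
    if baseline <= 0:
        raise ValueError("baseline median must be positive")
    return (median(today) - baseline) / baseline * 100


def is_regression(today: list[float], yesterday: list[float],
                  threshold_pct: float = 20.0) -> bool:
    """Flag a slowdown when the median rises by more than threshold_pct percent."""
    return median_change_pct(today, yesterday) > threshold_pct


# Hypothetical samples in milliseconds:
#   median_change_pct([130, 150, 400], [100, 120, 200])  # → 25.0
#   is_regression([130, 150, 400], [100, 120, 200])      # → True
```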

How do you answer those questions in a haystack of data points, up/down statuses and, in some cases, just plain noise? Application performance management (APM) tools came along and gave us instrumentation in the app frameworks and the browser. Now we have visibility into which calls are performing poorly and how our app “feels” to the end user.

This has helped by breaking the haystack into smaller haystacks and providing a way to search and sort potential performance issues. However, most APM tools depend on instrumenting code written in languages such as Java, .NET or Ruby, and cannot provide monitoring insight into rapidly growing open source frameworks such as Grails, Akka, Netflix OSS and the like. We’re still left with the haystack problem: What is going on?
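
To make “instrumenting code” concrete, here is a deliberately tiny Python sketch: a hypothetical `instrumented` decorator that records how long each call takes, plus a helper that ranks calls by mean duration. Commercial APM agents generally hook in at the runtime or framework level rather than relying on hand-applied decorators, so treat this as an illustration of the idea, not of how any specific product works. It also hints at the limitation described above: only code that someone has wrapped gets measured.

```python
import functools
import time
from collections import defaultdict

# In-process store (hypothetical): qualified function name -> call durations in seconds.
TIMINGS: dict[str, list[float]] = defaultdict(list)


def instrumented(func):
    """Decorator that records the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record even when func raises, so failing calls are still timed.
            TIMINGS[func.__qualname__].append(time.perf_counter() - start)
    return wrapper


def slowest_calls(n: int = 3) -> list[tuple[str, float]]:
    """Return up to n (name, mean_seconds) pairs, slowest first."""
    means = {name: sum(d) / len(d) for name, d in TIMINGS.items() if d}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]


# Usage (hypothetical handler):
#   @instrumented
#   def load_user(user_id): ...
#   print(slowest_calls())
```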

When I’m up at 2 a.m., stumbling over to the computer to find out what PagerDuty has alerted me to, I don’t want to pore through page after page of graphs and up/down statuses. What would really help is a system that aggregates all those monitoring data flows and statuses, applies some actual intelligence and then tells me, “Look at this component’s response time; this metric is reporting something anomalous. This is most likely causing your issue right now.”
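
One simple way a system could decide that “this metric is reporting something anomalous” is a z-score: how many standard deviations the latest value sits from its recent history. The sketch below assumes each metric arrives as a short history plus a latest reading; the metric names and the threshold of 3 are hypothetical. It is a deliberately basic stand-in, not the technique any particular product uses, and it ignores seasonality such as daily traffic cycles.

```python
import math
from statistics import mean, stdev


def zscore(history: list[float], latest: float) -> float:
    """Number of sample standard deviations that latest lies from the history mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is treated as infinitely unusual.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma


def anomalous_metrics(metrics: dict[str, tuple[list[float], float]],
                      threshold: float = 3.0) -> list[tuple[str, float]]:
    """Return (name, z) for metrics with |z| > threshold, most anomalous first."""
    scored = [(name, zscore(history, latest))
              for name, (history, latest) in metrics.items()]
    flagged = [(name, z) for name, z in scored if abs(z) > threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Usage (hypothetical metrics):
#   anomalous_metrics({"api.response_ms": ([100, 102, 98, 101, 99], 160)})
```

Sorting by absolute score puts the most unusual metric first, which is exactly the 2 a.m. question: where should I look?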

In other words, it would help to have a system that shows both the symptoms and the root cause of an issue. Such a system would deliver high-quality, high-value information about my infrastructure and combine anomaly detection with root cause analysis to provide genuine root cause detection.
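
Anomaly detection on its own only finds symptoms. One common heuristic for moving toward a root cause, sketched below under the assumption that you already know which components depend on which, is to walk the dependency graph: an anomalous component whose dependencies all look healthy is a better root-cause candidate than one sitting downstream of another anomaly. The component names and graph are hypothetical, and this illustrates the general idea only; it is not OpsClarity’s method.

```python
def root_cause_candidates(depends_on: dict[str, list[str]],
                          anomalous: set[str]) -> list[str]:
    """Anomalous components with no anomalous (direct or transitive) dependency."""

    def has_anomalous_dependency(node: str, seen: set[str]) -> bool:
        # Depth-first walk of node's dependencies; `seen` prevents infinite loops on cycles.
        for dep in depends_on.get(node, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in anomalous or has_anomalous_dependency(dep, seen):
                return True
        return False

    return sorted(c for c in anomalous
                  if not has_anomalous_dependency(c, {c}))


# Hypothetical topology: web calls api, which calls db and cache.
#   graph = {"web": ["api"], "api": ["db", "cache"], "db": [], "cache": []}
#   root_cause_candidates(graph, {"web", "api", "db"})  # → ["db"]
```

Here web and api are treated as symptoms of the db anomaly. Two anomalous components that depend on each other in a cycle are never returned by this sketch, which a real system would need to handle.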

One new startup, OpsClarity, is hoping to make troubleshooting more accurate and efficient by combining data science and anomaly detection with root cause analysis.

Perhaps this is the long overdue beginning of the next age in monitoring.

 

About the Author/Michael Hobbs

Michael Hobbs is the director of Technical Operations at Expa, a startup studio based in San Francisco. His significant experience in building, operating and automating highly scalable application infrastructures has allowed him to contribute to the success of Infusionsoft and StumbleUpon, among others. Michael continues to put that expertise to work at Expa by advising and assisting the next generation of Internet companies. He is also a significant contributor to many open source projects, including dokku and herokuish.

 

Filed Under: Blogs, Enterprise DevOps Tagged With: apm, application performance, application performance management, big data
