DevOps.com

We Still Haven’t Solved the Logging Problem

By: Pete Cheslock on September 28, 2018

At a company I previously worked for, I helped build out the initial infrastructure to support a large cloud security product. It was one of the first times I had the opportunity to build something from nearly the very beginning. Like many operations professionals, I had spent so much of my career cleaning up other people’s messes and technical debt that I was excited to finally “do things right” from the outset.

Looking back, I wish I had known that our monitoring platform would be hampered without the ability to retain and analyze both warm and cold log data. Here’s why.

Time Series to the DevOps Rescue 

Hindsight is almost always 20/20, and today I can proudly say that my team and I built a killer application monitoring platform. My goal was to make it easy for Dev and Ops folks to report time series metrics for their applications and to build dashboards and alerts on application health. And we did just that.

Many ops colleagues would say that a monitoring journey usually starts with capturing application logs, using open source tools such as Logstash/Elasticsearch or Graylog2, or even commercial services from Splunk or Sumo Logic. Not every journey starts that way, though. My team and I took a different approach: we optimized for time series metrics rather than the more traditional structured log management approach, using newer tools such as Librato for real-time operations monitoring and performance analytics. Librato let us scale much faster than traditional log solutions would have and gave us near-instant visibility across our containers, cloud servers and the apps themselves.

Logging is Too Expensive 

Using a hosted service such as Librato to manage our time series metrics worked well for a while. Eventually, however, the cost of SaaS monitoring grew, and since we had the time and energy to bring metrics in-house, we deployed Graphite, the open source, enterprise-grade monitoring tool. Graphite let us keep giving our engineers affordable access to granular time series metrics for managing their applications.
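To make “reporting a time series metric” concrete, here is a minimal sketch of pushing one data point to Graphite over its plaintext protocol, where each line is `<metric path> <value> <unix timestamp>` followed by a newline, sent over TCP to Carbon (port 2003 by default). The host name `graphite.example.internal` and the metric path are hypothetical placeholders, and this is not a description of how the author’s team actually shipped metrics.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one Graphite plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if " " in path:
        # Spaces would split the line into the wrong fields.
        raise ValueError("metric path must not contain spaces")
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path, value, host="graphite.example.internal", port=2003, timestamp=None):
    """Open a TCP connection to Carbon (hypothetical host) and send a single data point."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `send_metric("app.checkout.latency_ms", 42.5)` would record one latency sample. Opening a connection per point keeps the sketch short; a real reporter would typically batch many lines per connection.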

While this let the business respond quickly when an issue arose, time series metrics were only part of the solution. They showed us what was happening, but not always why it happened. Time series metrics, and even more advanced tracing technologies such as Jaeger and OpenTracing, are fantastic for debugging complex distributed systems, but they fall short when it comes to investigating and analyzing the root causes of problems, incidents and security breaches.
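The “what versus why” gap is easier to see side by side: a metric might only tell you that an error counter went up, while a structured log event can carry the context that explains it. Below is a minimal sketch using Python’s standard `logging` module to emit one JSON object per line. The `ctx` field passed through `extra` and the `checkout` logger name are hypothetical conventions chosen for this illustration, not anything the article prescribes.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one event per line)."""

    def format(self, record):
        event = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge in per-event context passed as extra={"ctx": {...}}
        # (a hypothetical convention for this sketch).
        event.update(getattr(record, "ctx", {}))
        return json.dumps(event)


def build_logger(name="checkout", stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

A call such as `build_logger().error("payment failed", extra={"ctx": {"order_id": "A-1", "reason": "card_declined"}})` produces a line that records not just that an error happened but which order failed and why, which is exactly the detail a bare error-rate metric drops.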

Digging Into Your Log History Gets Expensive, Quick

Tools such as the ELK stack have been booming, with Elastic alone passing 100 million downloads last year, a sign that DevOps teams want more from traditional log management solutions: simplicity, ease of use and affordability, among other things. With JSON effectively the de facto standard for logging, most teams take log data with its predefined schema and index it into Elasticsearch/Lucene. Unfortunately, Lucene’s indexing mechanism has an unintended consequence: a 5 to 10X increase in disk usage. Just like that, ops cloud budgets get blown away when teams realize that 10GB of JSON data indexed into Elasticsearch could occupy 50GB of disk. Want that index replicated for disaster recovery as well? Then budget 100GB of storage for every 10GB of logs.
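The storage arithmetic above can be written as a small worked formula: indexed disk ≈ raw size × expansion factor × (1 + number of replica copies). The sketch below treats the expansion factor as an input, since the article’s 5 to 10X figure is an estimate and real overhead depends on mappings and data shape. The `replicas` parameter mirrors the idea of Elasticsearch’s replica count, but the function itself is only an illustrative back-of-the-envelope helper.

```python
def indexed_storage_gb(raw_gb: float, expansion: float, replicas: int = 0) -> float:
    """Estimate disk used by raw log data once indexed.

    raw_gb:    size of the raw (e.g. JSON) log data in GB.
    expansion: assumed indexed-size / raw-size ratio (the article cites 5-10x).
    replicas:  extra full copies kept for disaster recovery (0 = primary only).
    """
    if raw_gb < 0 or expansion <= 0 or replicas < 0:
        raise ValueError("raw_gb and replicas must be >= 0, expansion must be > 0")
    # Each copy (primary plus replicas) stores the full expanded index.
    return raw_gb * expansion * (1 + replicas)
```

At the low end of the cited range, `indexed_storage_gb(10, 5)` gives the article’s 50GB and `indexed_storage_gb(10, 5, replicas=1)` gives 100GB; at the high end, the same 10GB with one replica would need 200GB.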

DevOps Normal: Still the Tough Choice Between Retention and Cost

Unfortunately, most teams choose to purge data from their “hot” Elasticsearch clusters as soon as they possibly can to keep costs down. This is far from ideal.
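One alternative to purging everything early is tiered retention: keep recent data hot, move older data to cheaper warm or cold storage, and delete only past a retention limit. The sketch below classifies daily indices by age. It assumes index names end in a date suffix like the Logstash-style `logstash-2018.09.28`; the thresholds and the meaning of “warm” and “cold” (cheaper nodes, compressed archives, object storage) are hypothetical choices, not something the article specifies.

```python
from datetime import date, datetime


def classify_index(name: str, today: date,
                   hot_days: int = 7, warm_days: int = 30,
                   cold_days: int = 365) -> str:
    """Return 'hot', 'warm', 'cold' or 'delete' for a daily index.

    Assumes names end in '-YYYY.MM.DD'; the default thresholds are
    illustrative placeholders, not recommendations.
    """
    day = datetime.strptime(name.rsplit("-", 1)[1], "%Y.%m.%d").date()
    age_days = (today - day).days
    if age_days < hot_days:
        return "hot"      # stays on the fast, fully indexed cluster
    if age_days < warm_days:
        return "warm"     # e.g. cheaper nodes, still searchable
    if age_days < cold_days:
        return "cold"     # e.g. compressed archive, restored on demand
    return "delete"       # past the (hypothetical) retention limit


def plan(indices, today):
    """Group index names by tier so a separate job can move or purge them."""
    tiers = {"hot": [], "warm": [], "cold": [], "delete": []}
    for name in indices:
        tiers[classify_index(name, today)].append(name)
    return tiers
```

Separating the decision (`plan`) from the action keeps the policy easy to review: a nightly job could log the plan first and only then move or delete indices.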

Companies moving to microservices-based architectures deployed on container platforms such as Kubernetes are in for a rude awakening when they start building out centralized logging platforms that can keep up with the volume. Security teams will feel the burden as well, as they come under increasing pressure to keep data around for long periods for compliance and audit purposes. When the next big web-facing vulnerability comes out, being able to go back weeks or months will be key to seeing which IP addresses accessed a potentially malicious endpoint. With that data available, we could then take all of those IPs and correlate them with other access logs.
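The investigation described above boils down to two passes over retained access logs: find every client IP that hit the suspicious endpoint since some date, then pull every other request those IPs made. This minimal sketch assumes the logs are in Common Log Format and have already been retrieved (from a cluster or an archive) as lines of text; the endpoint path used in any query is hypothetical. It only works if the older logs were kept in the first place, which is the article’s point.

```python
import re
from datetime import datetime

# Common Log Format, e.g.:
# 203.0.113.5 - - [01/Sep/2018:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
CLF_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse(line):
    """Return (ip, timestamp, path), or None if the line is not CLF."""
    m = CLF_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], ts, m["path"]


def ips_that_hit(lines, endpoint, since):
    """Collect client IPs that requested `endpoint` at or after `since` (tz-aware)."""
    hits = set()
    for line in lines:
        rec = parse(line)
        if rec and rec[2].startswith(endpoint) and rec[1] >= since:
            hits.add(rec[0])
    return hits


def correlate(lines, ips):
    """List every (ip, timestamp, path) request made by the given IPs."""
    return [rec for rec in map(parse, lines) if rec and rec[0] in ips]
```

For instance, `correlate(lines, ips_that_hit(lines, "/admin/upload", since))` would return the full request history of every client that touched the (hypothetical) vulnerable endpoint, ready to be cross-checked against other log sources.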

Is centralized logging any better than it was 10 years ago? Or has the volume of data we generate exploded so much that today’s tools are barely keeping up? One thing is certain: this problem is far from solved, and many companies may find themselves drowning in their own data lakes.

— Pete Cheslock
