DevOps.com


In Determining DevOps Issues, Sometimes It’s the Process

By: Don Macvittie on August 1, 2017

It is pretty much standard DevOps practice that when a server instance starts having problems, you simply kill it and start another. This is in line with the idea that servers are cattle, not pets, and there isn’t a ton of difference between them.

But it creates a problem that no amount of CI/CD or automated provisioning can overcome: the blindness problem. CI and CD miss some bugs, simply because of the wild variations possible in inputs when humans are involved, or because of unique hardware or platform issues that DevOps tries to ignore as much as possible. Our tooling cannot be so overarching that we have data points on everything and can track every problem down to the routine that is causing it, so a service that degrades in performance or crashes outright is indicative of one of those blind spots.

If you’ve been in a highly dynamic DevOps environment, you know this is no simple problem. But it is one we have to resolve, because killing and restarting simply masks a problem. Indeed, it is this very process that turned Cloudbleed from a simple and understandable programmer error into front-page news. The operators knew their servers would occasionally have memory overruns, even crashing on occasion; they would just spawn another instance and keep moving along. The ability to create another so easily reduced their desire or need to fix the underlying problem. But the problem was bigger than they knew.

The problem we face is, of course, how much time and work to invest in monitoring, and how to know what’s really important. This problem gets worse when you have an instance that stops responding or starts hogging resources. Getting it offline is imperative, but re-creating the conditions that caused it to go off the rails may not be so easy. In fact, often it is not at all easy.

Long term, we need a reliable way to pick up the exact problem area, create a bug report and get it back into the Dev/CI/CD system. Short term, don’t ignore errors in production unless you know exactly what is going wrong.
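One way to picture the short-term advice is a small wrapper that refuses to replace an instance until it has written down why. What follows is only a minimal sketch under stated assumptions, not a description of any particular tool: `capture_then_replace`, `collect`, `file_report` and `replace` are hypothetical names, and in a real shop they would be wired to whatever log collector, issue tracker and orchestrator you already run.

```python
import json
import time
from typing import Callable, Dict, List


def capture_then_replace(
    instance_id: str,
    reason: str,
    collect: Callable[[str], Dict[str, object]],  # hypothetical: gathers logs/metrics
    file_report: Callable[[str], None],           # hypothetical: sends a report to the bug tracker
    replace: Callable[[str], None],               # hypothetical: kills and respawns the instance
) -> Dict[str, object]:
    """Record why an instance is being killed, then replace it."""
    report: Dict[str, object] = {
        "instance_id": instance_id,
        "reason": reason,
        "captured_at": time.time(),
        "evidence": None,
        "collection_error": None,
    }
    try:
        try:
            report["evidence"] = collect(instance_id)
        except Exception as exc:
            # A failed collection is itself worth reporting; do not hide it.
            report["collection_error"] = repr(exc)
        file_report(json.dumps(report, sort_keys=True, default=str))
    finally:
        # Getting the instance offline is imperative, so this always runs,
        # even if filing the report raised an exception.
        replace(instance_id)
    return report


# Usage with in-memory stand-ins for the collector, tracker and orchestrator.
filed: List[str] = []
replaced: List[str] = []


def fake_collect(instance_id: str) -> Dict[str, object]:
    return {"last_log_lines": ["out of memory in worker 3"], "rss_mb": 2048}


def broken_collect(instance_id: str) -> Dict[str, object]:
    raise TimeoutError("instance not responding")


first = capture_then_replace(
    "web-17", "memory above threshold", fake_collect, filed.append, replaced.append
)
second = capture_then_replace(
    "web-18", "stopped responding", broken_collect, filed.append, replaced.append
)
```

The design choice here is simply ordering: the evidence and the bug report come first, the restart comes last, and the restart still happens when collection fails. An unresponsive instance therefore leaves a report saying that nothing could be collected, rather than leaving no trace at all.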

Sure, it’s easy to kill and/or restart an instance to “recover” from an error. It’s easy to chuck your app on the public internet without any security, too. Just because it’s easy doesn’t make it a good idea.

So watch those logs, don’t mask problems and stay on top of it all, even if the complexity sometimes has you reeling a bit. Application reliability might be affected, and application availability is, in the end, the whole point.

— Don Macvittie
