DevOps.com

In Determining DevOps Issues, Sometimes It’s the Process

By: Don Macvittie on August 1, 2017

It is pretty much standard DevOps process that when a server instance starts having problems, you simply kill it and start another. This is in line with the idea that servers are cattle and there isn’t a ton of difference between them.

But it creates a problem that no amount of CI/CD or automated provisioning can overcome: the blindness problem. CI and CD miss some bugs, simply because of the wild variations possible in inputs when humans are involved, or because of unique hardware or platform issues that DevOps tries to ignore as much as possible. Our tooling cannot be so overarching that we have data points on everything and can trace every problem to the routine causing it, so a service that degrades or crashes outright is indicative of one of those blind spots.

If you’ve been in a highly dynamic DevOps environment, you know this is no simple problem. But it is one we have to resolve, because killing and restarting simply masks a problem. Indeed, it is this very process that turned Cloudbleed from a simple and understandable programmer error into front-page news. The team running those servers knew they would occasionally have memory overruns, even crashing on occasion; they would just spawn another instance and keep moving along. The ability to create another so easily reduced their desire or need to fix the underlying problem. But the problem was bigger than they knew.

The problem we face is, of course, how much time and work to invest in monitoring, and how to know what is really important. It gets worse when an instance stops responding or starts hogging resources. Getting it offline is imperative, but re-creating the conditions that sent it off the rails is often not easy at all.

Long term, we need a reliable way to pick up the exact problem area, create a bug report and get it back into the Dev/CI/CD system. Short term, don’t ignore errors in production unless you know exactly what is going wrong.
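
One way to picture that loop is a small wrapper that gathers evidence from a misbehaving instance before it is replaced. The sketch below is only an illustration, not a description of any particular tool: `IncidentReport`, `capture_then_replace` and the four callables it takes (`read_logs`, `read_metrics`, `replace`, `file_bug`) are hypothetical names standing in for whatever your platform and issue tracker actually provide.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class IncidentReport:
    """Hypothetical bug-report payload captured before an instance is replaced."""
    instance_id: str
    reason: str
    captured_at: str
    log_tail: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)


def capture_then_replace(instance_id, reason, read_logs, read_metrics,
                         replace, file_bug, tail_lines=200):
    """Collect evidence and file a bug, then replace the instance.

    The callables are assumptions, injected so the sketch stays
    independent of any real cloud or tracker API:
      read_logs(instance_id)    -> list of log lines
      read_metrics(instance_id) -> dict of metric name to value
      replace(instance_id)      -> kills the instance and starts a new one
      file_bug(body)            -> sends the JSON report to the dev backlog
    """
    try:
        report = IncidentReport(
            instance_id=instance_id,
            reason=reason,
            captured_at=datetime.now(timezone.utc).isoformat(),
            log_tail=read_logs(instance_id)[-tail_lines:],
            metrics=read_metrics(instance_id),
        )
        file_bug(json.dumps(asdict(report), indent=2))
    finally:
        # Getting the instance offline is still imperative, so replace it
        # even if evidence collection or bug filing fails.
        replace(instance_id)
    return report
```

The replacement sits in a `finally` clause so that a failure while collecting evidence never leaves a sick instance in service; the trade-off is that a report can occasionally be lost, which is still better than never capturing one.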

Sure, it’s easy to kill and/or restart an instance to “recover” from an error. It’s easy to chuck your app on the public internet without any security, too. Just because it’s easy doesn’t make it a good idea.

So watch those logs, don’t mask problems and stay on top of it all, even if the complexity sometimes has you reeling a bit. Application reliability might be impacted, and application availability is, in the end, the whole point.
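
As a rough illustration of what watching for masked problems could look like, the sketch below flags services that have been restarted repeatedly within a short period, on the assumption that frequent restarts hint at an underlying bug being papered over. The function name, the `(service, timestamp)` event format, and the default window and threshold are all assumptions made for this example; real restart data would come from your own orchestration or logging system.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def masked_problem_candidates(restart_events, window=timedelta(hours=24), threshold=3):
    """Return services restarted at least `threshold` times within any `window`.

    `restart_events` is an iterable of (service_name, datetime) pairs.
    """
    by_service = defaultdict(list)
    for service, when in restart_events:
        by_service[service].append(when)

    flagged = []
    for service, times in by_service.items():
        times.sort()
        start = 0
        # Slide a window over the sorted restart times.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.append(service)
                break
    return sorted(flagged)


if __name__ == "__main__":
    t0 = datetime(2017, 8, 1)
    events = [
        ("api", t0),
        ("api", t0 + timedelta(hours=1)),
        ("api", t0 + timedelta(hours=2)),
        ("worker", t0),
        ("worker", t0 + timedelta(hours=30)),
    ]
    print(masked_problem_candidates(events))
```

A service that shows up in this list is not necessarily broken, but it is a reasonable place to start asking why the restarts keep happening.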

— Don Macvittie

Filed Under: Blogs, DevOps Practice, Enterprise DevOps Tagged With: application monitoring, failure, High availability, reliability
