DevOps.com

But What About When You Fail?

By: Don Macvittie on October 6, 2021

One thing Agile and DevOps definitely brought IT was a more accepting view of the whole “mistakes happen” mantra. Crashing systems is no longer a guaranteed ticket out the door. Indeed, off the top of my head, I can think of at least two cases where a CIO personally checked in sloppy code that trashed the system, at organizations big enough that the CIO probably should not have been coding in the first place.

And that brings us to today’s topic. The standard response to “But what if there is a bad update?” is “We’ll just roll out a new version!” That works in some instances. In many others, including most that involve GitOps as part of DevOps, it doesn’t. Implosions can be huge and take out chunks of infrastructure, so you need a better plan than assuming you can fix everything with a forward update. Do you have rollback capability? Are you making use of it, ensuring everything is set to roll back if needed? Are you testing to make sure rollback works with the other changes made across the system in this update?
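To make that last question concrete: one way to check whether rollback still works alongside the rest of an update is to record which versions each service needs from its dependencies, then try reverting each changed service on its own and see what breaks. The sketch below is hypothetical and not taken from any particular tool; the `Change` class, the integer version numbers, the minimum-version `requires` table, and the service names are all assumptions for illustration. Real data would come from your own release manifests and contract tests.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One service updated in a release (hypothetical model)."""
    service: str
    previous: Optional[int]  # version to roll back to; None if nothing is recorded
    new: int


# Hypothetical table: requires[(service, version)] -> {dependency: minimum version}
Requirements = Dict[Tuple[str, int], Dict[str, int]]


def unmet(deployed: Dict[str, int], requires: Requirements) -> List[str]:
    """List every dependency requirement the deployed versions fail to satisfy."""
    problems = []
    for service, version in deployed.items():
        for dep, min_version in requires.get((service, version), {}).items():
            have = deployed.get(dep)
            if have is None or have < min_version:
                problems.append(f"{service} v{version} needs {dep} >= v{min_version}, has {have}")
    return problems


def rollback_report(changes: List[Change], baseline: Dict[str, int],
                    requires: Requirements) -> Dict[str, List[str]]:
    """For each changed service, report what breaks if only that service is rolled back."""
    after = dict(baseline)
    for c in changes:
        after[c.service] = c.new  # the system as it looks once the update lands
    report = {}
    for c in changes:
        if c.previous is None:
            report[c.service] = ["no recorded previous version"]
            continue
        candidate = dict(after)
        candidate[c.service] = c.previous  # revert this one service, keep the rest
        report[c.service] = unmet(candidate, requires)
    return report


baseline = {"db": 3}  # services this release does not touch
changes = [Change("api", 4, 5), Change("web", 7, 8), Change("worker", None, 1)]
requires = {("web", 8): {"api": 5}, ("api", 4): {"db": 2}, ("api", 5): {"db": 3}}

for service, problems in rollback_report(changes, baseline, requires).items():
    print(service, problems or "rollback OK")
```

Here rolling back `api` alone would strand `web` v8, and `worker` has nothing to roll back to, which is exactly the kind of baggage worth finding before the bad update rather than during it. Reverting one service at a time is only one scenario; a fuller check would also try reverting the whole release together.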

Rolling back across a system full of interdependent changes is part of the problem. Fixing with a new update is often the best option, simply because in the age of massively distributed, microservices-based solutions, rolling back comes with a ton of baggage, enough that it may not be viable for you. Okay. So you can’t quickly roll forward, and you can’t quickly roll back. Quick! What do you do?!

And I don’t have the answer to that question, because I’m not in your organization working on your systems. At some point, all IT is personal, and this is one of those points. You need to know what your best options are if an update sends everything into a spiral, and you need a plan to implement them. But your best options will not be the same as the next org’s.

One option is to maintain the ability to rebuild the entire system. That’s a large undertaking, but every minute systems are down hurts the company, and that’s what you need to plan for. Think of it as disaster planning for DevOps: a man-made disaster that destroys systems and infrastructure.

So, as with any other disaster planning scenario, walk through the chain that makes the system work, identify weaknesses and list ways to address them. Then test those fixes to make certain they do what you hope they will. Finally, set up an automated system to keep the whole plan up to date.
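Parts of that walk-through can be automated. As a hypothetical sketch, assuming you keep an inventory of components and what each one depends on, Python’s standard-library `graphlib` (Python 3.9+) can derive a rebuild order with dependencies first and flag every link in the chain whose recovery procedure has never been tested. The component names, the `depends_on` and `recovery_tested` inventories, and the `rebuild_plan` helper are illustrative assumptions, not a prescribed method. A circular dependency surfaces as a `CycleError`, which is itself a weakness worth finding before an outage does.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical inventory: each component mapped to what must be in place first.
depends_on = {
    "network": set(),
    "dns": {"network"},
    "database": {"network"},
    "git_server": {"network", "dns"},
    "ci_runner": {"git_server"},
    "app": {"database", "dns", "ci_runner"},
}
# Hypothetical: components whose restore procedure has actually been exercised.
recovery_tested = {"network", "dns", "database"}


def rebuild_plan(depends_on, recovery_tested):
    """Return a rebuild order (dependencies first) and the untested links in it.

    Raises graphlib.CycleError if components depend on each other in a loop,
    meaning the system cannot be rebuilt from nothing without a bootstrap step.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    gaps = [component for component in order if component not in recovery_tested]
    return order, gaps


try:
    order, gaps = rebuild_plan(depends_on, recovery_tested)
    print("Rebuild order:", " -> ".join(order))
    print("No tested recovery procedure:", ", ".join(gaps))
except CycleError as err:
    print("Circular dependency, needs a bootstrap plan:", err.args[1])
```

Run on a schedule against an inventory generated from your infrastructure code, a check like this is one way to approach keeping the plan up to date as the system changes.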

In short, we’re in a new automated landscape, and we need new automated tools to pull our rears out of the fire when the inevitable happens. Whether from developer error or a malicious attacker, it is a safe bet that, sooner or later, you will have a massive systems outage that your DevOps toolchain can’t adequately address. Know what you are going to do, or at least have thought about it, so you’re not just starting to think it through while coworkers and customers can’t access systems.

And keep rocking it. This is just another layer of protection for all the hard work you’re doing. Take the extra step. Like insurance, if you ever need it, you will absolutely be glad you did.
