DevOps.com

The Bug in Production: What You Don’t Know Can – and Will – Harm You

By: Frank Huerta on May 15, 2019

Despite the risk of unplanned downtime, many organizations that develop software and services push them live without adequately testing for bugs that will only manifest against production traffic. This is a huge gamble: those bugs could lead to errors that bring down the service altogether, with consequences that can be seriously detrimental to your business.

The Problem of Going Live With Bugs in the Code

Software downtime costs both revenue and reputation. Gartner analysts have estimated the average cost of downtime at $5,600 a minute, which works out to well over $300,000 an hour. For a real-life example, consider the major Microsoft Azure outage of November 2018: caused by issues introduced as part of a code update, it lasted 14 hours and affected customers throughout Europe and beyond. As organizations migrate from legacy systems to distributed microservice environments in the cloud, outages and downtime pose a growing and serious problem.

As companies adopt DevOps and CI/CD models to move faster and deliver application updates sooner, developers continually release new features and often push code updates as fast as they're written. The classic six-month timelines of development, quality assurance (QA) and beta testing have been compressed to days and sometimes hours. Gone are the days when teams could beta test with customers for extended periods to flag bugs that only surface under real-world use.

With current quality testing tools, developers often can't tell how a new software version will perform in production, or whether it will work in production at all. The Cloudbleed bug illustrates the problem: a simple coding error in a software upgrade at security vendor Cloudflare created a serious vulnerability that went undetected for months, until a Google researcher discovered it in February 2017.

Beyond their immediate impact, flaws can also lead to serious security issues later. Heartbleed, a vulnerability disclosed in 2014, stemmed from a programming mistake (a missing bounds check) in the OpenSSL library. It left large numbers of private keys and other sensitive data exposed to the internet, enabling theft of information that SSL/TLS encryption was supposed to protect.

Standard QA Testing Isn’t Enough: Test With Production Traffic

The way QA testing is typically done is no longer sufficient for today's increasingly frequent and fast development cycles. Traditionally, DevOps teams haven't been able to do side-by-side testing of the production version and an upgrade candidate. Many organizations rely on suites of simulated tests, which may not capture the myriad ways customers actually use the software. Just because upgraded code works under one set of testing parameters doesn't mean it will work in the unpredictable world of production usage.

The Cloudflare incident shows how far this gap can reach: the error went unnoticed by end users for an extended period, and the flaw caused no logged system errors. Just as QA testing isn't sufficient, relying on system logs and user reports also limits what can be detected.
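
To make that blind spot concrete, here is a minimal sketch using two hypothetical in-process handlers (`current_version`, `candidate_version`) invented for illustration, not taken from any real system. The candidate's defect corrupts some responses without raising an exception, so a check that only counts errors, as logs would, finds nothing, while comparing each response against the current version's output on the same request flags the divergence.

```python
# Hypothetical handlers for illustration: the current release and an
# upgrade candidate whose defect corrupts output without raising an error.
def current_version(path: str) -> str:
    return f"<html>{path}</html>"


def candidate_version(path: str) -> str:
    body = f"<html>{path}</html>"
    if path.startswith("/api"):
        # Simulated silent defect: stray data is appended, nothing is logged.
        return body + "<!-- leaked: session=abc123 -->"
    return body


def count_logged_errors(handler, requests) -> int:
    """Count requests that raise an exception, i.e. what error logs would show."""
    errors = 0
    for request in requests:
        try:
            handler(request)
        except Exception:
            errors += 1
    return errors


def find_divergent(baseline, candidate, requests) -> list:
    """Return the requests for which the two versions respond differently."""
    return [r for r in requests if baseline(r) != candidate(r)]


requests = ["/", "/login", "/api/items", "/api/cart"]
print(count_logged_errors(candidate_version, requests))  # 0: the logs look clean
print(find_divergent(current_version, candidate_version, requests))
# ['/api/items', '/api/cart']
```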

By some estimates, fixing flaws after a software release can cost five times as much as fixing them during design, and it can lead to even costlier development delays. Enabling software teams to identify potential bugs and security concerns before release can alleviate those delays. Testing with production traffic earlier in the development process can therefore save time, money and pain. Software and DevOps teams need a way to test quickly and accurately how new releases will perform with real (not simulated) customer traffic, while maintaining high quality standards.

By evaluating release versions side by side, teams can quickly locate any differences or defects. They can also gain real insight into network performance while verifying the stability of upgrades and patches in a working environment. Done efficiently, this significantly reduces the likelihood of releasing software that later needs to be rolled back. Rollbacks are expensive, as the Microsoft Azure incident showed.
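
The article does not prescribe a mechanism for side-by-side evaluation. A minimal sketch of the idea, assuming captured production requests can be replayed against both versions, might look like the following; `replay`, `ComparisonReport` and the handlers are hypothetical names, and the in-process callables stand in for what would normally be a traffic-mirroring proxy that duplicates live requests and discards the candidate's responses.

```python
import statistics
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# A handler takes a request (here just a path) and returns a response body.
Handler = Callable[[str], str]


@dataclass
class ComparisonReport:
    total: int = 0
    mismatches: List[Tuple[str, str, str]] = field(default_factory=list)
    baseline_ms: List[float] = field(default_factory=list)
    candidate_ms: List[float] = field(default_factory=list)

    def summary(self) -> str:
        if self.total == 0:
            return "no requests replayed"
        return (
            f"{len(self.mismatches)}/{self.total} responses differ; "
            f"median latency {statistics.median(self.baseline_ms):.3f} ms (current) "
            f"vs {statistics.median(self.candidate_ms):.3f} ms (candidate)"
        )


def timed(handler: Handler, request: str) -> Tuple[str, float]:
    """Call a handler and return its response plus elapsed wall time in ms."""
    start = time.perf_counter()
    response = handler(request)
    return response, (time.perf_counter() - start) * 1000.0


def replay(requests: Iterable[str], baseline: Handler, candidate: Handler) -> ComparisonReport:
    """Send each captured request to both versions and record differences.

    Only the baseline response would be returned to a real user; the
    candidate's response is used solely for comparison.
    """
    report = ComparisonReport()
    for request in requests:
        base_response, base_ms = timed(baseline, request)
        cand_response, cand_ms = timed(candidate, request)
        report.total += 1
        report.baseline_ms.append(base_ms)
        report.candidate_ms.append(cand_ms)
        if base_response != cand_response:
            report.mismatches.append((request, base_response, cand_response))
    return report


# Usage with hypothetical handlers: the candidate breaks only on /checkout.
def production(request: str) -> str:
    return f"OK {request}"


def upgrade(request: str) -> str:
    return "ERROR" if request == "/checkout" else f"OK {request}"


report = replay(["/", "/cart", "/checkout"], production, upgrade)
print(report.summary())
print(report.mismatches)  # [('/checkout', 'OK /checkout', 'ERROR')]
```

Exact string equality is the simplest possible comparison; real responses usually carry timestamps, request IDs and similar fields that would need to be normalized or ignored before diffing.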

Some organizations stage rollouts, which requires running multiple software versions in production. The software team puts a small percentage of users on the new version while most users stay on the status quo. Unfortunately, this approach to testing with production traffic is cumbersome to manage, costly and still subject to rollbacks. The other problem with these rolling deployments is that while failures can be caught early in the process, they are, by design, only caught after they've affected end users.
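
One common way to put "a small percentage of users" on a new version is deterministic hash-based bucketing; the article doesn't say which routing method organizations use, so treat this as an assumed, illustrative approach. `rollout_bucket`, `version_for` and the 10,000-bucket granularity are hypothetical choices.

```python
import hashlib

BUCKETS = 10_000  # granularity of the rollout percentage (0.01% steps)


def rollout_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def version_for(user_id: str, candidate_percent: float) -> str:
    """Route a fixed share of users to the candidate; the rest stay on current."""
    threshold = candidate_percent / 100 * BUCKETS
    return "candidate" if rollout_bucket(user_id) < threshold else "current"


# Usage: with a 10% rollout, roughly one in ten synthetic users gets the new version.
users = [f"user-{i}" for i in range(100_000)]
on_candidate = sum(version_for(u, 10) == "candidate" for u in users)
print(on_candidate / len(users))  # close to 0.10
```

Hashing instead of picking randomly per request keeps each user pinned to one version across requests. It also means no customer's traffic is ever observed on both versions, which is exactly the side-by-side gap the next paragraph raises.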

This raises further questions. How do you know whether the new software is causing the failures? And since the business never observes side-by-side results from the same customer, how many failures should it tolerate before recalling or rolling back the software? Meanwhile, those failures disrupt the end-user experience, which ultimately affects business operations and company reputation. Staging may also fail to provide a large enough sample to gauge how the new release will perform across the entire customer population.
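
The article leaves the "how many failures" question open. One possible statistical rule, offered here as an assumption rather than the author's method, is a one-sided two-proportion z-test that compares the staged group's error rate with the baseline's. The function names and the `z_threshold` of 3.0 are hypothetical. The second call also illustrates the sample-size concern: a small staged group can show four times the baseline error rate without producing conclusive evidence.

```python
import math


def two_proportion_z(errors_a: int, total_a: int, errors_b: int, total_b: int) -> float:
    """z statistic for 'version B's error rate exceeds version A's' (pooled)."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one request")
    p_a = errors_a / total_a
    p_b = errors_b / total_b
    pooled = (errors_a + errors_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0  # no errors (or all errors) in both groups: no evidence either way
    return (p_b - p_a) / se


def should_roll_back(base_errors, base_total, canary_errors, canary_total,
                     z_threshold: float = 3.0) -> bool:
    """Roll back only when the staged group is significantly worse than the baseline."""
    return two_proportion_z(base_errors, base_total, canary_errors, canary_total) > z_threshold


# 10% staged group with plenty of traffic: 0.3% vs 0.1% errors is a clear signal.
print(should_roll_back(90, 90_000, 30, 10_000))  # True
# Tiny staged group: 0.4% vs 0.1% errors, but too few requests to be conclusive.
print(should_roll_back(90, 90_000, 2, 500))      # False
```

The normal approximation behind this test is weak when error counts are very small, so an exact test may be the better choice in that regime.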

Cost is still an issue as well. If you stage with 10% of customers on the new version and downtime costs more than $300,000 an hour, a failure affecting that 10% could still cost more than $30,000 per hour. The impact is reduced, of course, but it's still significant, and that's before accounting for the uncertainty of when to roll back.
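
The arithmetic behind these figures can be written out directly. This sketch uses the Gartner per-minute estimate cited earlier and, like the paragraph above, assumes the cost of a failure scales linearly with the share of users exposed to it; the function names are illustrative.

```python
# Figure cited in the article (Gartner estimate of average downtime cost).
COST_PER_MINUTE_USD = 5_600


def hourly_downtime_cost(cost_per_minute: float) -> float:
    return cost_per_minute * 60


def staged_exposure_per_hour(cost_per_minute: float, rollout_percent: float) -> float:
    """Hourly cost of a failure that hits only the users on the new version.

    Assumes cost scales linearly with the share of users affected.
    """
    return hourly_downtime_cost(cost_per_minute) * rollout_percent / 100


print(hourly_downtime_cost(COST_PER_MINUTE_USD))          # 336000
print(staged_exposure_per_hour(COST_PER_MINUTE_USD, 10))  # 33600.0
```

The linear model ignores the cost of deciding when to roll back, so it understates rather than overstates the exposure.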

Looking Ahead

Standard QA testing is no longer enough. To reduce the risk that today's rapid iterations inject into the software development life cycle, DevOps teams can test with production traffic and evaluate release versions side by side. This helps them avoid costly rollbacks and staged rollouts while still releasing a quality, secure product. The old way of doing things is not sufficient, but fortunately, there is a better way.

— Frank Huerta

Filed Under: Blogs, Continuous Testing, DevOps Practice, DevSecOps Tagged With: CI/CD pipeline, devops, QA testing, Quality Assurance, rollbacks, rollouts, software bugs
