DevOps.com

Tax Day IT Lessons, or How to Avoid Wiping Out on Your Biggest Day of the Year

By: Fred Stevens-Smith on April 12, 2019

This time last year, as the Internal Revenue Service (IRS) geared up for its biggest day of the year, a firmware bug deep in its storage network was waiting to take down the system. When Tax Day arrived, the onslaught of last-minute filers triggered a systemwide outage that lasted 11 hours and forced the IRS to extend its filing deadline by a day.

The IRS has another chance to get it right next week, but its IT team is far from alone in having dropped the ball on its most important day of the year. In 2013, the government’s Healthcare.gov website was plagued by crashes on the day it launched for Obamacare enrollment. Last year, Amazon’s website crashed just as Prime Day kicked off, and a few months later, Walmart’s systems fumbled at the start of Black Friday.

These glitches can cost organizations millions, not to mention the hit to their reputations and inconvenience to customers. As Tax Day approaches, it’s worth looking at some steps any organization can take to avoid embarrassing or costly glitches when a big day arrives. These occasions are fraught with risk due to sudden traffic spikes or untested code.

Here are steps organizations can take to minimize the chance of disruption.

Have a Deployment Process You Trust

People often talk about a code freeze as a solution, but this shouldn’t be your first instinct. A code freeze prevents you from making last-minute changes, and let’s face it, few teams have their code buttoned down months ahead of time. It is far better to have a rock-solid deployment process that gives you confidence that what you’re shipping won’t break. Here are some elements it should include:

  • Trusted testing gate before production. You and your team need confidence that you’re shipping code that works on the devices and browsers your customers use. There are many ways to do this, but focus on three things: the throughput speed you need, the level of risk you are willing to tolerate and the cost to your business of bugs making it to production.
  • A totally automated process with no human bottlenecks. There’s nothing like trying to ship a fix to production, only to realize the engineer with permission to approve the release is on PTO.
  • Proper environment escalation. Automation is no excuse to skip steps: never ship directly to production! Have a proper staging server that is a close enough replica of production that you can test your application in conditions that match production.
  • Good test data management. This is one of the areas where software companies fall down, and one of the most difficult things to get right with testing. Stale data, data that lacks the edge cases and nuances of production, and data that contains personally identifiable information (PII) are all very common culprits.
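As a rough illustration, the elements above can be expressed as an automated promotion check. This is a minimal sketch, not a real pipeline: the `ReleaseCandidate` fields and the `promotion_blockers` function are hypothetical names invented for this example, and in practice the values would come from your CI system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleaseCandidate:
    """Hypothetical summary of a build, filled in by your CI system."""
    tests_passed: bool          # result of the automated testing gate
    verified_on_staging: bool   # exercised on a production-like staging server
    test_data_has_pii: bool     # test data still contains personal information


def promotion_blockers(rc: ReleaseCandidate) -> List[str]:
    """Return the reasons this build must not go to production (empty if none)."""
    blockers = []
    if not rc.tests_passed:
        blockers.append("automated test gate failed")
    if not rc.verified_on_staging:
        blockers.append("not verified on staging")
    if rc.test_data_has_pii:
        blockers.append("test data contains PII")
    return blockers


def can_promote(rc: ReleaseCandidate) -> bool:
    """No human approval step: the build ships when nothing blocks it."""
    return not promotion_blockers(rc)
```

Because the decision is a pure function of recorded facts, nobody on PTO can hold up a release, and every refusal comes with a stated reason.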

Stress Test for Robustness

In high-traffic periods, new stuff breaks. Stress testing helps to ensure services will continue running under a heavy load, but you need a realistic approach grounded in actual user behavior. Can you handle 20,000 users, all trying to register at once when your new app goes live? What pathways will customers take through your website or application? What features have caused problems in the past? Identify common pathways and potential problem areas and stress-test the hell out of them.
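To make the idea concrete, here is a small sketch of a concurrent stress run that reports failures and tail latency. It is illustrative only: `register_user` is a hypothetical stand-in for a real call against a staging server, and a dedicated load-testing tool would be the usual choice for anything near 20,000 users.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def register_user(user_id: int) -> bool:
    """Hypothetical stand-in for a real request to a staging server."""
    time.sleep(0.001)  # simulate a small amount of work
    return True


def stress_test(action, total_requests: int, concurrency: int) -> dict:
    """Call `action` total_requests times using `concurrency` worker threads."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed_call(i):
        start = time.perf_counter()
        try:
            ok = bool(action(i))
        except Exception:
            ok = False  # an exception counts as a failed request
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    failures = sum(1 for ok, _ in results if not ok)
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": total_requests,
        "failures": failures,
        "p95_seconds": latencies[p95_index],
    }


if __name__ == "__main__":
    print(stress_test(register_user, total_requests=200, concurrency=50))
```

Swapping `register_user` for each common pathway your customers take lets you compare failure counts and 95th-percentile latency pathway by pathway.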

Build a Culture of Quality

One reason big events are scary for engineers is the sudden emphasis on accountability and quality that was lacking before. Solving for quality at the eleventh hour is guaranteed to be less effective and efficient than building a culture of quality from the start. As with most management challenges, the only way to do this well is to create accountability for your teams and give ownership of quality directly to them. A few quick ideas: add a functionality review to your code review step, provide tooling so your team can drive testing themselves on pre-production features, and set up an engineering support rotation so engineers feel the pain of bugs getting into production.

Measurement and Metrics

Achieving this culture of quality depends partly on good measurement and metrics. Sadly, this is one of the weakest and least-defined areas of development today. You can’t directly observe quality in production, because you only know about the bugs that are caught, so it’s more productive to create some measure of QA process coverage and tie it to the business. One way to do this is to look at the differences between how your customers use your application in production and how you test it pre-production. This can give you a lot of insight into the areas that are under-covered or over-covered and can be a useful way to measure whether incremental investments in QA will actually be ROI-positive.
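One possible way to compute that comparison is to contrast each pathway’s share of production traffic with its share of test runs. The sketch below is an assumption about how such a metric could look, not an established standard; the `coverage_gaps` function and the sample pathway names are made up for illustration.

```python
def coverage_gaps(production_hits: dict, test_runs: dict) -> dict:
    """Return test share minus production share for each pathway.

    Negative values suggest a pathway is under-tested relative to how much
    customers use it; positive values suggest it is over-tested.
    """
    prod_total = sum(production_hits.values())
    test_total = sum(test_runs.values())
    gaps = {}
    for path in set(production_hits) | set(test_runs):
        prod_share = production_hits.get(path, 0) / prod_total if prod_total else 0.0
        test_share = test_runs.get(path, 0) / test_total if test_total else 0.0
        gaps[path] = round(test_share - prod_share, 3)
    return gaps


if __name__ == "__main__":
    # Hypothetical numbers: page hits in production vs. test executions.
    production = {"checkout": 700, "search": 250, "profile": 50}
    tests = {"checkout": 20, "search": 30, "profile": 50}
    for path, gap in sorted(coverage_gaps(production, tests).items(), key=lambda kv: kv[1]):
        print(f"{path}: {gap:+.3f}")
```

With these sample numbers, checkout carries most of the traffic but receives the smallest share of testing, so it is where the next QA investment is most likely to pay off.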

If You Can’t Do Any of That … Fine, Resort to a Code Freeze

If you don’t have a bulletproof deployment process, a code freeze is a brute-force option. It ties your hands from making last-minute changes, but at least ensures you won’t add new risks without adequate time to discover them. The duration depends on how long it takes your team to surface production issues, based on past experience. If you don’t have that data, six to eight weeks may be a safe period. But really, if this is your main strategy for avoiding problems on game day, you’re probably in the wrong job.
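If you do have past data, one simple way to turn it into a freeze length is to take a high percentile of how long earlier production issues took to surface after release. This is a sketch under that assumption; the `suggested_freeze_days` function, the 90th-percentile default and the sample numbers are illustrative choices, not a rule.

```python
import math


def suggested_freeze_days(days_to_detect, percentile: float = 0.9) -> int:
    """Nearest-rank percentile of past detection times, in days.

    `days_to_detect` lists, for past incidents, how many days passed between
    a release and the discovery of the production issue it introduced.
    """
    if not days_to_detect:
        raise ValueError("need at least one past incident")
    if not 0 < percentile <= 1:
        raise ValueError("percentile must be in (0, 1]")
    ordered = sorted(days_to_detect)
    rank = math.ceil(percentile * len(ordered))  # nearest-rank method
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical history: days until each past issue was noticed.
    history = [2, 5, 9, 14, 30, 3, 7, 21, 4, 11]
    print(suggested_freeze_days(history))
```

A freeze of this length would have covered most past incidents; raising the percentile trades a longer freeze for fewer surprises.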

Review Patch Management Policies

This one might have helped the IRS. Months before the outage, its storage vendor, IBM, had issued a patch for the bug that brought down the system, but the IRS didn’t apply it because it was part of a code bundle that didn’t meet its production testing requirements. A government report concluded that the IRS needs to vet those decisions more effectively and document the process. Bottom line: Have a structure and process in place that ensures you’re making informed, sensible decisions about whether to apply new patches.

Other issues contributed to the IRS meltdown: Its storage system had no automatic fail-over or built-in redundancies, creating a single point of failure. It was unfortunate that several factors conspired against it, but luck isn’t something you should depend on. Executing successfully for a big event isn’t achieved in a few short months beforehand. It involves building the right culture and practices to guide your development and operations year-round. This will greatly reduce both your risk and the work you need to do to be prepared.

— Fred Stevens-Smith

Filed Under: Blogs, Continuous Delivery, DevOps Practice, Enterprise DevOps Tagged With: application development, code freeze, deployment process, Internal Revenue Service, IRS, tax day
