DevOps.com



AWS Outage Exposes Weaknesses of DevOps Resilience

By: Frank Ohlhorst on December 9, 2021

The December 7, 2021 Amazon Web Services (AWS) outage severely disrupted services from a wide range of businesses for more than five hours and highlighted just how reliant businesses have become on internet-delivered services. The outage mostly affected web services in the eastern U.S., yet the implications are universal: it is a reminder that many businesses have ignored the old axiom about not putting all your eggs in one basket and instead rely on one provider that can itself become a single point of failure.

Services ranging from airline booking systems to streaming video to e-commerce were disrupted during the outage, costing millions of dollars in lost revenue and countless hours of lost productivity. One of the more interesting aspects of the outage is its impact on collaboration vendors such as Slack, Trello, Asana and Smartsheet, whose tools many development and DevOps teams have come to rely on.


Furthermore, core AWS services, such as the company’s Elastic Compute Cloud (EC2) and DynamoDB, were also affected, disrupting many third-party services and severely hampering the business processes that use them. While the obvious victims of the outage are well known, like Amazon’s own e-commerce operation, there is a troubling undercurrent: the disruption to DevOps frameworks and the teams using them.

AWS has been rather tight-lipped about the root cause of the outage thus far; however, there are still many lessons for the DevOps community to learn, and questions that must be asked, such as “Can the DevOps process survive during IaaS/SaaS disruptions?” and “Can multi-cloud failover solutions be baked into the applications DevOps builds?”

There are no simple answers to those questions, but the outage highlights the need to understand the underlying architecture and framework of a deployed DevOps system. Consider, for example, how many developers in the DevOps community have embraced SaaS tools to accelerate the development process and to feed CI/CD pipelines. SaaS applications such as code scanners, pipeline orchestrators and even IDEs have become common in the world of DevOps. But has anyone bothered to ask what happens if a single one of those tools fails?

What’s more, the reliance on SaaS tools in the development process has created potential liabilities in the applications that DevOps developers build. DevOps applications have come to rely on APIs, are often driven by microservices and are frequently deployed into containers that run on SaaS. If any one of those elements becomes non-functional, numerous applications could fail, ultimately putting the onus on developers to explain why they are creating applications with a single point of failure.
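One way to picture an application that does not hinge on a single provider is a call path that falls back to a second provider when the first is unreachable. The following is a minimal sketch under stated assumptions, not a production failover design: the provider names, the `call_with_failover` helper and the simulated outage are all hypothetical, and the providers are plain Python functions rather than real network calls.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, callable) pair in order and return the first success.

    Returns the name of the provider that answered and its response.
    """
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical providers: the primary simulates an outage, the secondary works.
def primary_provider() -> str:
    raise ConnectionError("simulated regional outage")


def secondary_provider() -> str:
    return "order accepted"


if __name__ == "__main__":
    used, response = call_with_failover(
        [("primary", primary_provider), ("secondary", secondary_provider)]
    )
    print(f"{used} answered: {response}")
```

The ordering of the list is the whole policy here. A real deployment would also need health checks, timeouts and data replication between providers, none of which this sketch attempts.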

Moving forward, the DevOps community needs to take a serious look at the components of its frameworks and determine whether there are any single points of failure, including its IaaS/SaaS providers. While it may be impossible to remediate every one, there is a lesson to be learned about how fragile the development process becomes if no one bothers to build an inventory of the tools in use and to consider how the failure of any one of them could affect workflow. There are numerous examples of an external failure of a single component breaking an application, with the search for the root cause taking days or even weeks. This can mostly be attributed not only to a lack of knowledge, but also to a lack of visibility into the components used.
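The inventory idea can be made concrete with very little code. The sketch below is illustrative only: the `Tool` record, the tool names, the pipeline stages and the provider labels are all invented for the example. It lists, for each provider, the pipeline stages that would have no working tool left if that provider went down.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Tool:
    name: str      # hypothetical tool name
    stage: str     # pipeline stage the tool serves
    provider: str  # provider the tool is hosted on


def stages_blocked_by(provider: str, inventory: List[Tool]) -> List[str]:
    """Return the stages whose tools all depend on the given provider."""
    providers_by_stage: Dict[str, Set[str]] = defaultdict(set)
    for tool in inventory:
        providers_by_stage[tool.stage].add(tool.provider)
    return sorted(
        stage
        for stage, providers in providers_by_stage.items()
        if providers == {provider}
    )


def single_points_of_failure(inventory: List[Tool]) -> Dict[str, List[str]]:
    """Map each provider to the stages it alone keeps running."""
    report = {}
    for provider in {tool.provider for tool in inventory}:
        blocked = stages_blocked_by(provider, inventory)
        if blocked:
            report[provider] = blocked
    return report


# Invented inventory: only the build stage has a tool on a second provider.
INVENTORY = [
    Tool("code-scanner", "scan", "cloud-a"),
    Tool("pipeline-orchestrator", "build", "cloud-a"),
    Tool("backup-runner", "build", "on-prem"),
    Tool("team-chat", "communicate", "cloud-a"),
]

if __name__ == "__main__":
    for provider, stages in single_points_of_failure(INVENTORY).items():
        print(f"{provider} outage blocks: {', '.join(stages)}")
```

With this sample data, an outage at the hypothetical `cloud-a` provider blocks scanning and team communication, while builds can continue on the on-premises runner.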

Those lessons carry over from the development process to the applications themselves, where rooting out single points of failure is just as much a best practice. Leveraging that intelligence starts with understanding the concept of a software bill of materials (SBoM), a piece of supporting documentation that is becoming increasingly important to the purveyors of applications. A properly defined SBoM reveals all of the components (libraries, APIs, etc.) that are baked into an application and can be used as a map to define where weaknesses may lie.
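To illustrate how an SBoM can serve as such a map, the sketch below walks a small dependency listing and counts how many other components rely on each one, directly or indirectly. The JSON layout is a simplified, made-up structure for this example and is not any particular SBoM standard; the component names are invented as well.

```python
import json
from typing import Dict, List, Set

# Simplified, hypothetical SBoM-like document (not a real SBoM format).
SBOM_JSON = """
{
  "components": [
    {"name": "storefront", "type": "application"},
    {"name": "payments-api", "type": "api"},
    {"name": "http-client", "type": "library"},
    {"name": "logging-lib", "type": "library"}
  ],
  "dependencies": {
    "storefront": ["payments-api", "http-client", "logging-lib"],
    "payments-api": ["http-client", "logging-lib"],
    "http-client": ["logging-lib"],
    "logging-lib": []
  }
}
"""


def transitive_dependencies(name: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect everything the named component depends on, at any depth."""
    seen: Set[str] = set()
    stack = list(graph.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def dependents_per_component(sbom: dict) -> Dict[str, int]:
    """Count how many components would be affected if each one failed."""
    graph = sbom["dependencies"]
    counts = {component["name"]: 0 for component in sbom["components"]}
    for component in sbom["components"]:
        for dependency in transitive_dependencies(component["name"], graph):
            counts[dependency] = counts.get(dependency, 0) + 1
    return counts


if __name__ == "__main__":
    counts = dependents_per_component(json.loads(SBOM_JSON))
    for name, count in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{name}: {count} dependent component(s)")
```

Components with the highest counts are the ones whose failure would spread furthest, which makes them natural candidates for closer review.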

For the DevOps community, the recent AWS outage has become a clarion call to look inward and discover how the applications they are building may be part of the problem. With continuity and resiliency becoming major topics in the IT and business realm, it’s about time that DevOps practitioners start to look at how they can support both of those business-critical needs. The days of finger-pointing to shift blame must come to an end, and if businesses that rely on software want to grow, someone needs to take responsibility for providing answers when outages occur and learn from those outages to create applications that are more resilient.


Filed Under: Application Performance Management/Monitoring, Blogs, DevOps in the Cloud, DevOps Practice, Enterprise DevOps, Features Tagged With: AWS, IaaS, outage, SaaS, SBoM

Powered by Techstrong Group, Inc.

© 2022 · Techstrong Group, Inc. All rights reserved.