DevOps.com


Code Yellow: When Operations Isn’t Perfect

By: Todd Palino on April 13, 2018

Every now and then, engineering teams will get into trouble for any number of reasons. Sometimes, explosive growth catches the team by surprise. Or, on-call has gotten out of control, with alerts going off every five minutes. Or, development and operations teams simply have stopped seeing eye to eye. Regardless of why, the team is in a bad spot and something needs to be done to resolve it.


If it’s a new problem, the fix might be easy: Add a few more servers, roll back to a known good application version or get everyone together over pizza and beers to clear the air. Often, however, the problem creeps up on you over time and suddenly the hole is so deep you can’t find the way out. At LinkedIn, a team that has gotten to this point will often declare a state known as “Code Yellow.”

Yellow Alert

Some people assume the name Code Yellow is based on traffic lights, but it actually comes, with a geekier twist, from your favorite “Star Trek” series: It’s how the crew of the Starship Enterprise indicates their current defensive condition. Either way, the definition is clear: Something is wrong, and we need to move forward with caution. True to both metaphors, we also have a Code Red. That’s better described as an immediate crisis, with everyone working 24 hours a day until it’s resolved. The Code Yellow is slightly more laid-back: This is everyone’s primary focus, but during business hours only. Code Yellows also tend to last on the order of months, while a Code Red should last on the order of days.

Other companies may use a different term than Code Yellow, but the effect is the same: The team is telling the rest of the company that it has identified a serious problem, and that fixing it is a priority for the success of the team and, therefore, the company. The ability to do this is an important aspect of open and honest communication, a value that is critical to a healthy culture and is often overlooked. Talking about our problems is just as important as celebrating our successes, if not more so. Teams can learn more from fixing a problem than they can from a total success.

This is Not a Failure

The first step in starting a Code Yellow is to understand this is not a failure. There is no shame in admitting that the team has a problem that needs to be fixed. Bugs happen, despite our best efforts to avoid them. The only thing we can do is diagnose the problem and remediate it. The only time we fail is when we turn a blind eye to these problems. This applies to how our engineering teams interact with each other just as much as it does to the software and systems we produce and run.

It is critical, however, that the right problems are addressed. Most of the time, we’ve gotten into the current situation because of a slow boil (increasing technical debt, many small issues or breakdowns in a process) that eventually built up into a crisis. The goal of the Code Yellow must be not only to remediate the current problems (the reactive component), but also to make sure they are not repeated in the future (the proactive component).

Planning for a Successful Code Yellow

There are several components required for a successful Code Yellow, as well as necessary pieces to get buy-in from the rest of the organization:

  • Problem Statement: There must be a clear and agreed-upon statement of the problems facing the team that prompted the Code Yellow. This should include not only what the current problems are, but also what the current understanding of their root cause is.
  • Exit Criteria: Next, you need to have specific goals that the team will work toward to exit the Code Yellow. These should be traditional SMART goals: specific, measurable, achievable, relevant and time-bound. These goals are what make it possible for the team to enter a Code Yellow in the first place, as they give it a fixed scope rather than leaving it open-ended.
  • Communication: All information about the Code Yellow, including the announcement (which includes the problem statement and exit criteria), periodic status updates and the successful conclusion, should be sent to the larger organization. This may be your department, or it may be the entire engineering organization. It may even be the entire company, depending on the nature of the problems.
  • Project Management: As with all large projects, someone needs to be responsible for organizing the work and communicating information. As this represents an “all hands on deck” scenario for the impacted team, it is usually helpful to have a dedicated project manager (PM) take this on. This is typically a PM who is knowledgeable about the team and the work, but not directly involved with the execution. This frees up the managers and individual contributors to focus on the work at hand.

Once each of these aspects has been thought through, and the decision has been made to enter Code Yellow, the team’s first act is to reorganize its priorities around the exit criteria. This often means putting quarterly goals on the shelf. It may also be necessary to establish a dedicated meeting for discussing the status of the exit criteria.
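For teams that like to keep this kind of plan in a structured form, here is a minimal sketch of how the problem statement, exit criteria and status updates described above might be tracked. It is only an illustration: the class names, fields, team name and numbers are all hypothetical and are not taken from any LinkedIn tooling.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExitCriterion:
    """One measurable, time-bound goal (hypothetical structure)."""
    description: str
    target: float                  # value that must be reached
    current: float                 # latest measured value
    due: date                      # date by which the goal should be met
    lower_is_better: bool = True   # e.g. alert counts should go down

    def met(self) -> bool:
        if self.lower_is_better:
            return self.current <= self.target
        return self.current >= self.target


@dataclass
class CodeYellow:
    """Problem statement plus exit criteria for one team (hypothetical)."""
    team: str
    problem_statement: str
    root_cause: str
    exit_criteria: list[ExitCriterion] = field(default_factory=list)

    def can_exit(self) -> bool:
        # Exit only when at least one criterion exists and all are met.
        return bool(self.exit_criteria) and all(c.met() for c in self.exit_criteria)

    def status_update(self) -> str:
        # Plain-text summary suitable for a periodic status email.
        lines = [f"Code Yellow status for {self.team}"]
        for c in self.exit_criteria:
            state = "met" if c.met() else "open"
            lines.append(
                f"- [{state}] {c.description} "
                f"(current {c.current}, target {c.target}, due {c.due})"
            )
        return "\n".join(lines)


# Illustrative values only; none of these numbers come from the article.
cy = CodeYellow(
    team="example-team",
    problem_statement="On-call alerts fire too often to act on.",
    root_cause="Alert thresholds were never revisited as traffic grew.",
    exit_criteria=[
        ExitCriterion("Pages per on-call week", target=10, current=42,
                      due=date(2018, 7, 1)),
    ],
)
print(cy.status_update())
```

Keeping the exit criteria as explicit target and current values is one way to make the "measurable" part of SMART hard to skip: the Code Yellow ends only when every criterion reports as met.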

Space to Breathe

It is all well and good for a team to enter Code Yellow and work with a single focus on the goals that have been set to make things right, but this is not enough for the team to succeed. For true success, everyone surrounding the team must understand the situation and give them the space to do their work. This is the place where a healthy engineering culture rises to meet the challenge.

  • Expect Delays: The most common way that a tangential team will be affected is through delays. They must expect that any requests that have been made of the impacted team may be delayed if they are not within the scope of the exit criteria. The Code Yellow involves, at its core, a reordering of priorities to address the stated problem. Outside teams need to factor this in and understand that their own project timelines may need to be adjusted.
  • Minimize New Requests: Other teams should also refrain from asking the impacted team for new things that are outside the scope of the defined exit criteria. Minimizing these requests, in addition to accepting delays on any existing requests, allows the impacted team to spend their limited engineering hours on getting to the other side of the Code Yellow.
  • Requests for Assistance from Other Teams: The team in Code Yellow may find they need outside help to reach their goals. For example, if there is sudden, explosive growth in traffic, they may need to accelerate the provisioning of new hardware. Finding yourself on the receiving end of a request such as this may require shifting your own priorities. Always remember that all of the teams are part of the same company, and as such, everyone succeeds or fails together.

Engineering teams rarely stand alone, and it is important that everyone understand the value of having those teams working well, both on their own and together. A small, temporary delay in goals to ensure that this is the case is well worth it.

Light at the End of the Tunnel

Code Yellow represents a significant amount of high-priority work, and working through it will often be stressful for the team. Saying “no” to coworkers who are making a reasonable request is hard, and the work that is in scope rarely involves spending time on interesting new features. In addition, if the problems being addressed include communication issues between groups, there are going to be some difficult conversations that need to happen. However, as the team approaches the end of the work, it will be much easier to see through to what lies on the other side of the exit criteria.

The ultimate goal of the Code Yellow is to get the team out of a reactive mode where they are running from crisis to crisis and into a proactive state where they are able to work on the right big projects. Achieving the exit criteria will mean the engineers are more effective and able to work proactively. This is a stronger team: Engineers are happier because they’re not under the heavy stress of operations work, the team is working well because they’re talking to each other effectively, and customers are pleased because requests are handled either through automation or in a reasonable amount of time.

Does your company have an internal process that’s similar to how LinkedIn does a Code Yellow?

— Todd Palino



© 2023 Techstrong Group, Inc. All rights reserved.