DevOps.com

5 Ways to Reduce DevOps Toil

By: Kenneth Rose on July 29, 2021

Over the last several years, DevOps has become a bit of a buzzword. It has become simultaneously a practice, a culture, a team, a job title and a vendor product. You can hire some DevOps, buy some DevOps, adopt DevOps and sprinkle a little bit of DevOps on top for good measure.

But, at its core, DevOps is about service ownership: having a single team with end-to-end ownership of software running in production. You should not have one team building software and another team deploying and operating it. That’s slow. DevOps is literally combining development and operations into a single consolidated team. DevOps means embracing the mantra of “you build it, you own it.”

The primary benefit of DevOps is increased speed and agility of engineering teams. With DevOps, exactly one team is responsible for a given piece of production software. There are no handoffs or knowledge silos. Practices like continuous delivery can emerge. Features can be put in front of your customers’ eyes with a shorter turnaround time. A secondary benefit is increased reliability and security. With “you build it, you own it,” it is no longer someone else’s job to ensure that software is correct or performant or reliable or secure. These responsibilities are “shifted left” onto exactly one team.

So, where do things go off the rails?

Service ownership and DevOps mean giving teams the agency to make whatever changes are necessary to their software. That can and will come at the expense of product functionality at times. So, the first place things go off the rails is when there is not a healthy balance between product needs and engineering needs. If teams are constantly working on new features but do not have the capacity to fix operational issues that are affecting them, product quality will eventually suffer. Ultimately, product-engineering imbalance generally comes from misaligned incentives or poorly defined goals from management.

The second place where things go off the rails is insufficient training. Organizations early in their DevOps journey, which may still have separate development and operations teams, need a lot of training on new technologies and processes. Developers need to learn about observability, responding to incidents and debugging production applications. Operators and sysadmins need to learn about coding and the principles of software design. These are new skills for both camps, and investment in training and mentorship is required.

What are some best practices for keeping DevOps on track?

There are many such practices, and which ones matter most depends on your starting point as an engineering organization. Here are five important ones:

1. Track Service Ownership

If DevOps is about “you build it, you own it,” you need an authoritative list of a) what’s running in production and b) which team owns it. If a piece of software goes down, you do not want to find out that the last person to work on it was the intern who left three years ago. Many folks start tracking this kind of information in spreadsheets or wikis, but keeping the data up-to-date becomes challenging. SaaS service catalogs have emerged as the easiest way to automatically keep this data accurate, complete and consistent.
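As a rough illustration of what an authoritative ownership list makes possible, here is a minimal Python sketch that compares what is running in production against a catalog and reports anything without an owning team. The `Service` type, the `find_unowned` helper and the sample data are all hypothetical; a real service catalog would pull both lists from live systems rather than hard-coded values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    name: str
    owner_team: Optional[str]  # None means no team has claimed the service


def find_unowned(running: list[str], catalog: dict[str, Service]) -> list[str]:
    """Return names of running services that have no owning team on record."""
    unowned = []
    for name in running:
        entry = catalog.get(name)
        # A service is unowned if it is missing from the catalog entirely
        # or is listed without an owner.
        if entry is None or not entry.owner_team:
            unowned.append(name)
    return sorted(unowned)


# Hypothetical data, for illustration only.
catalog = {
    "checkout": Service("checkout", "payments"),
    "search": Service("search", None),
}
running = ["checkout", "search", "legacy-cron"]

print(find_unowned(running, catalog))  # ['legacy-cron', 'search']
```

Running a check like this on a schedule is one way to notice ownership gaps before an outage does.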

2. Adopt Continuous Integration, Continuous Delivery and Feature Flags

Committing small changes multiple times per day is superior to release trains where one large change is committed on an infrequent basis. CI/CD allows for new features and improvements to be tested faster and put in front of customers more quickly. Feature flag tools let you easily test new functionality and product hypotheses for different segments of customers.
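To make the feature flag idea concrete, the following Python sketch shows one common way to expose a feature to a fixed percentage of users: hash the flag name and user ID into a stable bucket. This is an assumption about how such a tool could work, not how any particular vendor implements it, and the `is_enabled` function and flag names are hypothetical.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically decide whether a user falls inside a percentage rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    # Map the first four bytes of the hash to a stable bucket from 0 to 99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


# Hypothetical usage: show the new checkout flow to roughly 10% of users.
if is_enabled("new-checkout", "user-42", 10):
    print("serving new checkout flow")
else:
    print("serving existing checkout flow")
```

Because the bucket depends only on the flag and the user ID, a given user sees the same variant on every request, and raising the percentage only adds users to the enabled group.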

3. Embrace Incident Management

Software breaks sometimes. Having a robust process for how to triage and respond to both alerts and incidents is important. Within the context of DevOps, the team that owns a piece of software should be responsible for being on-call to respond to associated incidents. If the software breaks or behaves in some unexpected way, the team is expected to fix it. The team is also responsible for identifying what happened and investing in ways to ensure the same error does not occur again.
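One way to picture "the owning team is on call" is as a routing rule: an alert for a service goes to whoever is on call for the team that owns it. The Python sketch below assumes two hypothetical lookup tables (`OWNERS` and `ON_CALL`), made-up severity levels and a made-up fallback escalation target; real incident tooling would supply these.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    severity: str  # "page" or "ticket"; hypothetical severity levels


# Hypothetical ownership and on-call data, for illustration only.
OWNERS = {"checkout": "payments", "search": "discovery"}
ON_CALL = {"payments": "alice", "discovery": "bob"}


def route(alert: Alert, owners: dict[str, str], on_call: dict[str, str],
          fallback: str = "ops-escalation") -> tuple[str, str]:
    """Return (action, responder) for an alert, based on service ownership."""
    team = owners.get(alert.service)
    # Unknown services or teams without an on-call entry go to the fallback.
    responder = on_call.get(team, fallback) if team else fallback
    action = "page" if alert.severity == "page" else "ticket"
    return action, responder


print(route(Alert("checkout", "page"), OWNERS, ON_CALL))    # ('page', 'alice')
print(route(Alert("legacy-cron", "page"), OWNERS, ON_CALL))  # ('page', 'ops-escalation')
```

The fallback path is worth watching: every alert that lands there points at a service nobody owns.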

4. Treat Everything as Code, Then Automate all the Things

Applications and microservices? That’s code. Infrastructure? That’s code, too. Security policies? Also code. Deployments? You guessed it–code.

Treating everything as code allows for processes like “code review” to emerge beyond just your applications. You can have multiple eyes on every change, which catches bugs sooner and spreads knowledge to others. In addition, version control systems (like Git) are a natural audit log, allowing anyone to see what was changed, when, and by whom.

With everything defined as code, the next step is to invest in automation. How many steps does it take for a developer to deploy new code to production? Or to spin up new infrastructure? Or to hook up monitoring or error tracking tools?

Your goal should be “one step”: run a command. It can be a shell script, a Slack bot or a button in a nice web UI. But it should always be one step. If there are more, you’re now in “manual steps” territory, and that’s brittle. You have no guarantee that different people will run the same steps in the same order, which can lead to divergence in how various parts of production are set up.
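A "one step" deploy can be as simple as a script that encodes the sequence once, so nobody has to remember it. The Python sketch below runs a fixed list of commands in order and stops at the first failure. The steps in `DEPLOY_STEPS` are placeholders that only print a message; the real commands are an assumption left to your own stack.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical placeholder steps. Replace each with your real test, build
# and release commands; these only print a message.
DEPLOY_STEPS = [
    [sys.executable, "-c", "print('running tests')"],
    [sys.executable, "-c", "print('building artifact')"],
    [sys.executable, "-c", "print('releasing to production')"],
]


def run_steps(steps: list[list[str]]) -> bool:
    """Run each command in order; stop and return False at the first failure."""
    for index, command in enumerate(steps, start=1):
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step {index} failed with exit code {result.returncode}",
                  file=sys.stderr)
            return False
    return True


ok = run_steps(DEPLOY_STEPS)
print("deploy succeeded" if ok else "deploy aborted")
```

Because the order lives in the script rather than in someone's head, every deploy follows the same path, and a change to the process becomes a reviewable code change.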

Another candidate for automation is maintaining the list of a) what’s running in production and b) who owns it. Manually keeping that list up-to-date becomes impractical as your production infrastructure or engineering team grows. You’re better off investing in service catalog tooling that can automatically integrate and capture this information.

5. Invest in Reducing Toil

Service ownership and DevOps are two-way streets between management and their engineering teams. On one hand, consolidated ownership will lead to faster delivery and more reliable software. That’s good for management. But, on the other hand, teams require actual empowerment and agency to make decisions about their software. Part of that is having capacity on their roadmaps to make improvements to issues that are causing toil.

For example, if a particular service is paging a lot or experiencing load issues, service ownership means giving the owning team the time to make any necessary changes. That will come at the short-term expense of new product functionality.
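Deciding where to spend that time is easier with data. As a minimal sketch, the Python below counts pages per service over some period and lists the services at or above a threshold. The `noisiest_services` helper, the sample page log and the threshold are hypothetical; in practice the log would come from your paging tool.

```python
from __future__ import annotations

from collections import Counter


def noisiest_services(pages: list[str], threshold: int) -> list[tuple[str, int]]:
    """Return (service, page_count) pairs at or above threshold, noisiest first."""
    counts = Counter(pages)
    return [(service, n) for service, n in counts.most_common() if n >= threshold]


# Hypothetical page log: one entry per page, naming the service that paged.
pages = ["checkout", "search", "checkout", "checkout", "search", "billing"]

print(noisiest_services(pages, threshold=2))  # [('checkout', 3), ('search', 2)]
```

A list like this gives teams and management a shared, concrete starting point for the roadmap conversation about toil.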

There’s no free lunch on this last point. Engineering management can’t just combine dev and ops responsibilities and magically expect results. Service ownership is a two-way street. Otherwise, toil will kill your teams in the long term.

Powered by Techstrong Group, Inc.

© 2022 · Techstrong Group, Inc. All rights reserved.