DevOps.com


Dev & Ops: These Are Reconcilable Differences

By: Dominic Wellington on June 9, 2017

The new world of IT requires a different approach from what worked in the past. As usual, the technology is the easy part, but without people and processes being aligned, it won’t deliver value. When evaluating how data science and machine learning can help manage highly dynamic IT environments, old reflexes may be actively harmful.

By their nature and due to their experiences, Ops people tend to be a pretty (small-C) conservative bunch. Rule No. 1 of Ops is “If it ain’t broke, DON’T TOUCH IT!” and if we’re honest with ourselves, most of the rest of the rulebook is just variations and commentaries on that theme.

One of the ways in which Ops people try to control their world is through automation. DevOps mostly looks to the moment of release and deployment, as that is where the handover to Ops typically occurs, and this is a big focus of the drive to automate. However, there is another part of Ops where automation is key, and where Dev is not always nearly as involved: day-to-day operations. Which alerts should even be sent, whom they should be routed to, what the valid responses are and so on: all of these have been automated over the years, to a greater or lesser extent.

Ops people have laboriously documented their architecture and spent a long time in meetings planning which information is relevant to share and who should respond to alarms. Much of this is implemented in various pieces of software, whether commercial, open-source, home-grown or that special grey area, where something that started out as a standard component has been so customized that it is effectively bespoke. Rules, filters and thresholds determine what action to take in which case.
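To make the rules-and-thresholds model concrete, here is a minimal sketch in Python of a deterministic alert router. Everything in it is hypothetical and invented for illustration: the `Alert` and `Rule` types, the metric names, the threshold values and the team names do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Alert:
    host: str
    metric: str
    value: float


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Alert], bool]  # filter plus threshold
    action: str                       # what to do when the rule fires


# Hypothetical ruleset: every entry had to be anticipated and written by hand.
RULES: List[Rule] = [
    Rule("disk-full",
         lambda a: a.metric == "disk_used_pct" and a.value >= 90,
         "page:storage-team"),
    Rule("cpu-hot",
         lambda a: a.metric == "cpu_pct" and a.value >= 95,
         "ticket:platform-team"),
]


def route(alert: Alert, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the action of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.matches(alert):
            return rule.action
    return None  # nobody wrote a rule for this, so the alert is silently dropped


if __name__ == "__main__":
    print(route(Alert("db1", "disk_used_pct", 93.0)))
    print(route(Alert("web1", "latency_ms", 4000.0)))
```

The second call shows the weak spot: an alert that nobody planned for matches no rule and simply disappears, and every change in the environment means another round of editing the ruleset.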

These approaches have been in place for years or even decades, but now they are starting to fail. Simply put, IT’s increased complexity and the ever-accelerating rate of change are outpacing administrators’ ability to keep up and reconfigure their management systems.

The Coming ‘New Normal’

New approaches are emerging to deal with the new normal, but as ever, the technology is the easy part. For new technology to be a success, people and processes need to be in sync.

Right now, the emerging approach is to use data science and machine learning to process events, instead of the old deterministic rulesets and databases. The results speak for themselves, especially at scale and in highly dynamic environments. People building container-based or SDN architectures no longer assume that there will be a database of configurations that will always be up to date; quite the opposite.

The new way is to throw out all the filters and rules, feed all the alerts into a single place, and then use data science and machine learning techniques to sift them and identify relationships among them. The result will be a shift in focus from individual alerts to the business problems those alerts are the symptoms of.
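As a rough illustration of what "identifying relationships" between alerts can mean, the sketch below groups alerts that arrive close together in time and share vocabulary. It is a deliberately simple stand-in, not the algorithm of any real product: the `Situation` type, the Jaccard similarity measure, the 120-second window and the 0.15 similarity threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Alert:
    ts: float      # seconds since some reference point
    source: str
    text: str

    def tokens(self) -> Set[str]:
        # Words of the message plus the source name, lower-cased.
        return set(self.text.lower().split()) | {self.source.lower()}


@dataclass
class Situation:
    """A group of alerts believed to be symptoms of one underlying problem."""
    alerts: List[Alert] = field(default_factory=list)
    vocab: Set[str] = field(default_factory=set)

    def add(self, alert: Alert) -> None:
        self.alerts.append(alert)
        self.vocab |= alert.tokens()


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the overlap divided by size of the union (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def correlate(alerts: List[Alert], window: float = 120.0,
              min_sim: float = 0.15) -> List[Situation]:
    """Greedily attach each alert to the most similar recent situation."""
    situations: List[Situation] = []
    for alert in sorted(alerts, key=lambda a: a.ts):
        best, best_sim = None, min_sim
        for s in situations:
            recent = alert.ts - s.alerts[-1].ts <= window
            sim = jaccard(alert.tokens(), s.vocab)
            if recent and sim >= best_sim:
                best, best_sim = s, sim
        if best is None:  # nothing similar and recent: start a new situation
            best = Situation()
            situations.append(best)
        best.add(alert)
    return situations


if __name__ == "__main__":
    feed = [
        Alert(0, "db1", "connection pool exhausted on orders database"),
        Alert(30, "web3", "timeout talking to orders database"),
        Alert(45, "web4", "timeout talking to orders database"),
        Alert(50, "backup7", "nightly tape job finished late"),
    ]
    for s in correlate(feed):
        print([a.source for a in s.alerts])
```

Note that nothing here encodes which host depends on which. The grouping falls out of the alerts themselves, which is the point when no configuration database can be trusted to be up to date.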

Algorithms are great at these sorts of repetitive, high-volume tasks. They can look at massive volumes of events and figure out which are even relevant and worth taking a second look at, and then identify how those events relate to each other and what they really mean. This is where the humans come in, with their strengths in low-volume and unpredictable situations. Because they are not constantly drowning in irrelevant noise, they can work together effectively to understand the actual problem and get it fixed quickly.
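One simple way to picture how an algorithm might decide which events are "even relevant" is to score each event type by how surprising it is given recent history. The sketch below uses surprisal (the negative base-2 logarithm of observed frequency) as that score. This is an illustrative assumption rather than a description of how any specific tool works, and the event names and the 4-bit cutoff are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def surprisal_scores(history: Iterable[str]) -> Dict[str, float]:
    """Map each event type to -log2 of its observed frequency in the history."""
    counts = Counter(history)
    total = sum(counts.values())
    return {kind: -math.log2(n / total) for kind, n in counts.items()}


def worth_a_look(events: Iterable[str], history: Iterable[str],
                 min_bits: float = 4.0) -> List[str]:
    """Keep events whose type is rare in the history.

    Event types never seen before are treated as maximally surprising,
    so they are always kept rather than silently filtered out.
    """
    scores = surprisal_scores(history)
    return [e for e in events if scores.get(e, math.inf) >= min_bits]


if __name__ == "__main__":
    # Hypothetical history: constant chatter, occasional warnings, one rare fault.
    history = ["heartbeat"] * 90 + ["disk_warn"] * 9 + ["raid_degraded"]
    incoming = ["heartbeat", "disk_warn", "raid_degraded", "fan_failure"]
    print(worth_a_look(incoming, history))
```

The design choice that matters is the default for unseen event types: a hand-written filter drops whatever it was not told about, while this scoring surfaces it first.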

Where’s the Catch?

The difficulty is that for many Ops people, there is a big cultural change required to adopt this model. The old deterministic systems may have required laborious maintenance, but at least it was possible to model them and understand why a specific result was achieved. New algorithmic techniques take a radically different approach, and while the results are impressive, they come from a black box. This does not inspire confidence for people who are very comfortable with the old approaches.

Often, the response of Ops people coming from legacy tools is to get stuck into the minutiae of what is going on inside the black box and to look for traditional failure modes. The problem is that these approaches are no longer valid in the new world, as I discussed in my last post about getting to SRE.

For instance, while it is indeed important to make sure that no issues are missed due to over-aggressive filtering, this does not mean that we need to see every single alert. As long as we know about the actual business problem, we don’t care about the individual alerts, except insofar as they help us debug the underlying issue and resolve it.

The other direction is even more insidious, where people spend a lot of time chasing down “extraneous” events that they think they should not be seeing. The risk is, of course, that the Ops team ends up right back in the same place they were trying to escape, where they are only able to identify expected problems, but are getting blindsided on a regular basis by the “unknown unknowns,” the conditions that simply should not happen.

Ultimately, Ops needs to let go of reflexes that were good in the old world, and focus on the goal: ensuring the uptime and stability of business-supporting systems. The how is less relevant than the what and the why; the algorithmic black box might as well be full of magic leprechauns, as long as they are telling Ops what they need to know, when they need to know it. Call it the “Chinese Room” theory of Ops: As long as the results are useful to Ops, it does not matter what is inside the room.

The benefit for Ops is precisely that they no longer need to focus on the minutiae of individual alerts, but can concentrate on solving actual business problems. That is how IT can make itself a value generator and avoid being viewed as another undifferentiated cost center.

Pay no attention to the algorithm behind the curtain.

— Dominic Wellington
