DevOps.com

Event Management: Let the Noise Wail Without Going Deaf

By: contributor on October 25, 2017

In part 1 of the blog series “Putting the Ops in DevOps,” by James Moore, we discussed the awareness and anticipation of operational management and the best practices that can improve the app at every stage of its life cycle. In this blog, I would like to cover the maturity of the event management discipline and share some best practices.

If you are an operations professional on a team that’s beginning to embrace DevOps methodologies, you probably are facing some significant challenges.

You know the business is applying pressure to get to continuous integration/delivery, where release cycles can be measured by value delivered in days or even hours rather than weeks or months. Your Dev teams have embraced agile development, and are producing (what should be) production-ready code in short iterations. Your combined orgs are working toward implementing an integrated delivery toolchain that can build, test and deploy a new release version at the touch of a button. Your Dev teams are leveraging cloud technologies and architectural patterns that enhance their agility, such as auto-scaling microservice architectures of unprecedented scale and modularity. They are producing, as a matter of best practice, highly instrumented code that provides a dense trail of data relating to the state of the applications deployed on runtimes that constantly report their own state. You may be responsible for physical and virtual compute, network and storage infrastructure that itself is monitored with tools that spew a continuous stream of state information.

In other words, you find yourself faced with continuous change (both automatic and human driven) and a barrage of potential noise. If you are a seasoned operations professional, bitter experience may have led you to see change as the enemy: as change tends to zero, so does operational risk. Now, you may have to deal with continuous change as an inevitable force for advancing your company’s business goals. How will you separate the signal of service status from the noise of service state? Doing this successfully will be the difference between knowing quickly that there is a problem with an application or service before your users notice and being told about it by your users. It may also be the difference between getting a good night’s sleep and unnecessarily responding to a page indicating a benign application state.

Enter event and incident management. The discipline of event management has matured over decades, and continues to mature, as a mechanism for determining the status of elements of a managed environment from their state. Best practice in event management consists of:

  • Filtering out events that are not likely to be service affecting. A trace message from an app log is not worth your attention, unless you need to view it as a part of a diagnostic procedure. A synthetic user transaction timeout may require immediate investigation.
  • Correlating events that are likely to be related, so you get one notification per true incident rather than, say, 20.
  • Enriching events with context. For example, if a single service instance fails in a redundant array of five instances, route the event to the operations console but don’t wake anyone up unless the service is affected.
  • Implementing X-in-Y policies. In a large and complex system, a single microservice-to-microservice HTTP timeout may not be a huge problem. But you may want to investigate if you start getting 20 per second.
  • Implementing runbooks for common failures. Collaborate with development on the definition of those runbooks. Tie those runbooks to incidents as they occur so that the first responder has a process they can follow to reinstate a service or prevent an outage in the first place.
  • If possible, automating those runbooks. If the problem is a failed process, then restart it. A disk is full? Free up some space.
  • Leveraging analytics and machine learning to gain insights from the reams of data your tools collect. Can you learn from event history to suggest correlations or areas for improvement to ops efficiency?
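
To make the first few practices in the list more concrete, here is a minimal Python sketch of filtering, correlation, an X-in-Y policy and a context-aware paging decision. It is an illustration only, not how any particular event management product works. The `Event` fields, the `IGNORED_KINDS` set, the per-source correlation key and the function names are all assumptions made up for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    # All fields are hypothetical; real tools carry far richer event records.
    source: str       # component that emitted the event
    kind: str         # e.g. "trace", "http_timeout", "instance_down"
    timestamp: float  # seconds, increasing over time


# Assumption: these kinds are never service affecting on their own.
IGNORED_KINDS = {"trace", "debug"}


def is_actionable(event):
    """Filtering: keep only events that might be service affecting."""
    return event.kind not in IGNORED_KINDS


def correlate(events, window):
    """Correlation: group events from the same source into one incident
    when each arrives within `window` seconds of the previous one."""
    incidents = []
    open_incident = {}  # source -> most recent incident for that source
    for event in sorted(events, key=lambda e: e.timestamp):
        incident = open_incident.get(event.source)
        if incident and event.timestamp - incident[-1].timestamp <= window:
            incident.append(event)
        else:
            incident = [event]
            incidents.append(incident)
            open_incident[event.source] = incident
    return incidents


class XInYPolicy:
    """Fire only when at least `x` events are seen within `y` seconds.
    Assumes events are observed in timestamp order."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._times = deque()

    def observe(self, event):
        self._times.append(event.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while event.timestamp - self._times[0] >= self.y:
            self._times.popleft()
        return len(self._times) >= self.x


def should_page(healthy_instances, minimum_required):
    """Enrichment: page a human only if the service itself is at risk."""
    return healthy_instances < minimum_required
```

With a policy such as `XInYPolicy(x=20, y=1.0)`, a lone HTTP timeout stays quiet, while 20 within a second raises a notification. Likewise, `should_page(4, 3)` keeps a single failed instance out of five on the console rather than on someone's pager, assuming the service needs three instances to cope.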

With rapid delivery of new features and capabilities, the Dev team should already be mindful of the operability of the software they produce. Shift awareness of the event management tools and logic to the left. Have development use the same management tools in pre-production that Ops use in production, and then learn from what they see in pre-prod to improve. This should be done in combination with systems failure testing. Perform the test and see how the operations tools respond. Ask yourself, “Would I have been able to fix this if it had happened in production? Would I even have known what to do?”
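
One way to picture such a pre-production drill is a small check that every failure you inject has a runbook tied to it. The Python sketch below is hypothetical throughout: the failure kinds, the `RUNBOOKS` table and the two actions are invented for illustration, and the actions only describe a step rather than perform it.

```python
# Hypothetical runbook actions. Real ones would call your automation tooling.
def restart_process(target):
    return f"restart the failed process on {target}"


def free_disk_space(target):
    return f"free up disk space on {target}"


# Assumption: failures are identified by a simple string kind.
RUNBOOKS = {
    "process_failed": restart_process,
    "disk_full": free_disk_space,
}


def respond(failure_kind, target):
    """Return the runbook step for a failure, or None if no runbook exists."""
    runbook = RUNBOOKS.get(failure_kind)
    return runbook(target) if runbook else None


def run_drill(injected_failures):
    """Return the failure kinds with no runbook: the gaps to close
    before the same failure shows up in production."""
    return [kind for kind, target in injected_failures
            if respond(kind, target) is None]
```

Anything `run_drill` returns is a failure for which the honest answer to “Would I even have known what to do?” is no, which makes it a good candidate for the next runbook Dev and Ops write together.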

As is obvious by now, successfully operating an application or service in a modern business requires unprecedented co-operation between development and operations teams. Event management as a discipline and the tools that support that discipline are as critical now as they have ever been, if not more so.

Listen to my recent podcast where I spoke about the common operational challenges many DevOps teams are facing today and how the traditional IT Operations best practices could be leveraged for use in a DevOps methodology.

Also, download this white paper, which includes best practices for DevOps transformation and improving event or incident management.

 

About the Author / Dr. Kristian J. Stewart

Kristian Stewart is Architect – Hybrid Cloud Event Management and Analytics at IBM. He currently leads architecture for IBM’s Netcool Event Management offering, and is part of the team providing as-a-service capabilities to IBM’s clients with Cloud Event Management. He has worked in Systems and Service Management for 18 years. He lives in England with his wife and two daughters, two cats, and five Raspberry Pis.
