DevOps.com

How to Achieve Cloud Operational Excellence

By: William Malik on September 2, 2021

Cloud operational excellence means delivering the right mix of cloud-based services at the optimal cost and quality to support your organization’s mission and strategy. You might be thinking, “Why go to the trouble? Isn’t ‘almost good enough’ sufficient?” It turns out that it matters a lot. Between the top 10% and the bottom 10% of performers, cost per relevant delivery metric can differ by an order of magnitude or more.

In the mid-1990s, Gartner acquired an IT metrics firm called Real Decisions. The firm offered benchmarking services so customers could compare their IT efficiency with that of similar organizations. With hundreds of “Global 2000”-sized customers, its database was rich. Over time, the Real Decisions team refined its catalog of metrics and enriched its data with repeat studies to develop indices of efficiency.

Within their user population, the top 10% were 11 times as efficient as the bottom 10%, measured as cost per unit of productive work. That doesn’t mean 11% better; it means 1,000% better.
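
The arithmetic behind “11 times as efficient” versus “1,000% better” can be checked directly. The minimal sketch below uses hypothetical cost-per-unit figures (not data from the Real Decisions studies) and assumes that lower cost per unit of productive work is better; the function names are illustrative only.

```python
def efficiency_ratio(best_cost_per_unit: float, worst_cost_per_unit: float) -> float:
    """How many times more work the best performer gets per dollar than the worst."""
    # Efficiency is work per dollar, the reciprocal of cost per unit of work,
    # so the ratio of efficiencies is the inverse ratio of costs.
    return worst_cost_per_unit / best_cost_per_unit


def percent_better(ratio: float) -> float:
    """Convert an 'N times as efficient' ratio into a percentage improvement."""
    # Being N times as efficient is an improvement of (N - 1) * 100 percent.
    return (ratio - 1.0) * 100.0


# Hypothetical figures: the bottom decile spends $110 per unit of work,
# while the top decile spends $10 for the same unit.
ratio = efficiency_ratio(best_cost_per_unit=10.0, worst_cost_per_unit=110.0)
print(f"{ratio:.0f}x as efficient, or {percent_better(ratio):,.0f}% better")
# prints "11x as efficient, or 1,000% better"
```

The subtraction of one is the whole point: a performer that is twice as efficient is 100% better, not 200% better.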

Note that the user base was self-selected. All participants wanted to get objective metrics of performance relevant to their business goals and paid for extensive studies involving questionnaires, financial audits and technical benchmarks. Notably, all participants had cost recovery programs (beyond chargeback) in place. The bottom 10% of this segment is still within the top 10% of the IT industry—and their score is 11 times worse than the best of the best. 

Which raises the question: What is the industry average for IT efficiency? Is it possible that the hundreds of benchmarking users are all doing IT wrong, and the search for relevant metrics is misguided? I think not. 

Choosing the Right Metrics

In 1911, Frederick W. Taylor published The Principles of Scientific Management, which discussed approaches to optimizing two important variables: output quality and worker compensation. Taylor recognized that successful firms work collaboratively, with management and workers jointly setting goals and developing methods and tools to achieve both profitability and proportionate compensation. Nowhere in this text, or in any of his recorded speeches or documents, does he say, “If you can’t measure it, you can’t manage it.” He never said that for two reasons: First, he did not believe it; second, it is not true. What he did emphasize was that if you do measure it, you will manage it. That was a warning: Pick the right metrics or you will spend effort pursuing a meaningless goal.

Here are some potential cloud performance metrics compiled from several cloud providers and users across various industries, not-for-profits and governmental agencies: 

  • Service Metrics
    • Reliability – Mean time between failures (MTBF)
    • Availability – Uptime, expressed as a meaningful percentage of demand
    • Serviceability – Mean time to repair (MTTR)
  • IT Metrics
    • Capacity
    • Latency
    • Bandwidth
    • Response time
  • Strategic Metrics
    • Business agility
    • Customer engagement
    • Customer reach
    • Financial impact
    • Solution performance
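
The three service metrics are linked. A common textbook formulation, assumed here rather than taken from the article, is MTBF = operating time / number of failures, MTTR = total repair time / number of failures, and steady-state availability = MTBF / (MTBF + MTTR). The sketch below computes them from a hypothetical incident log for one service over a 30-day window; the `Incident` type, `service_metrics` function and all figures are illustrative. Note that this availability is time-based; weighting it by demand, as the list suggests, would require request counts as well.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One outage and how long repair took, in hours (hypothetical schema)."""
    repair_hours: float


def service_metrics(window_hours: float, incidents: list[Incident]) -> dict[str, float]:
    """Compute MTBF, MTTR and time-based availability over a time window."""
    failures = len(incidents)
    downtime = sum(i.repair_hours for i in incidents)
    uptime = window_hours - downtime
    if failures == 0:
        # No failures observed: MTBF is unbounded and the service was always up.
        return {"mtbf": float("inf"), "mttr": 0.0, "availability": 1.0}
    mtbf = uptime / failures             # mean operating time between failures
    mttr = downtime / failures           # mean time to repair
    availability = mtbf / (mtbf + mttr)  # algebraically equal to uptime / window_hours
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


# Hypothetical 30-day window (720 hours) with three outages.
log = [Incident(1.5), Incident(0.5), Incident(2.0)]
m = service_metrics(720.0, log)
print(f"MTBF {m['mtbf']:.1f} h, MTTR {m['mttr']:.2f} h, availability {m['availability']:.3%}")
```

Because availability depends on both terms, a team can improve it either by failing less often (raising MTBF) or by recovering faster (lowering MTTR).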

The journey to cloud excellence starts with developing the metrics that are most relevant to your business goals. Picking the right metrics with the right scale matters. As a rule of thumb, being generally right beats being precisely wrong.

Sustaining Excellence 

After you agree on a set of metrics that are statistically reliable, repeatable, objective and aligned with your firm’s mission, how do you achieve and sustain excellence?

In the 1970s, the U.S. Department of Defense bought a lot of custom software. Sometimes it worked well; other times it didn’t. So the DoD funded research on code quality. It turns out that the key difference between great and mediocre code flowed from how the organization managed problems. More specifically: How did the team react to an unexpected event? The spectrum runs from confusion and dismay, through fire-drill chaos, to calm, rational assessment and remediation. Whatever approach the team takes gets baked into the code itself. This research led to the Capability Maturity Model (CMM), developed by the Software Engineering Institute at Carnegie Mellon University and later succeeded by Capability Maturity Model Integration (CMMI).

The framework identifies five levels of process maturity. A simplified assessment of an organization’s process maturity level comes down to two questions. First, is there current, comprehensive documentation for the process, including how to deal with defects? If the answer is yes, the organization is at level three or higher. If the answer is no, the second question is, “Does anyone know what’s going on?” If the answer is yes, that’s level two; if no, level one.
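
The two-question shortcut can be written as a small decision function. This is a sketch of the simplified test described above only, not the formal CMMI appraisal method; the function and parameter names are hypothetical.

```python
from typing import Optional


def simplified_maturity_level(has_current_docs: bool, someone_knows: Optional[bool] = None) -> str:
    """Estimate process maturity with the two-question shortcut.

    has_current_docs: is there current, comprehensive documentation for the
        process, including how to deal with defects?
    someone_knows: if not, does anyone know what's going on?
    """
    if has_current_docs:
        # Documented processes place the organization at level three or above;
        # the shortcut cannot distinguish levels three, four and five.
        return "3+"
    if someone_knows is None:
        raise ValueError("the second question must be answered when there is no documentation")
    return "2" if someone_knows else "1"


print(simplified_maturity_level(has_current_docs=False, someone_knows=True))  # prints "2"
```

The second question is only asked when the first answer is no, which is why `someone_knows` is optional.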

A level one organization has no standard method in place to deal with problems. When something goes wrong, everybody grabs tools and tries to figure out what went wrong and how to fix it. Organizations like this do not spend much on training or analysis. Their focus is continuing to produce whatever they are trying to make. Over time, an individual may develop expertise in diagnosing a component, and when things go wrong, the call goes out to “Get Fred in here!” to troubleshoot the problem. Organizations with pockets of expertise are moving into level two. Most organizations fall within one of these two levels. 

Organizational transformation is very difficult and often requires significant changes in the management team, as well as funding for different activities. Training and communication skills are crucial to progressing beyond level two. In these organizations, management rewards the heroes who can fix the most difficult bugs. They get the big bonus, the promotion, the better office, a parking space near the door. This behavior reinforces the culture of heroes. But moving forward requires that the heroes take on a new role.

Once the organization creates documentation, it is on the path to level three. Note that these transformations are wrenching. It is not easy to tell the heroes that their greatest value to the organization is now how well they can write or teach. But with proper management attention, it can be done. And the benefits of moving forward are many: 

  • Dramatically fewer crises. Staff can make plans and keep them, with no emergencies interrupting a family gathering, a school event, or a get-together with friends.
  • High-quality code. Maintenance tasks become much simpler, and customers experience improved reliability. The documentation is helpful, and troubleshooting becomes routine rather than overwhelming.
  • Reliable planning. In a mature organization, plans hold true because they are based on proven metrics, continuously validated processes and uniformly high competence within the team. Project estimates are accurate because the data stems from reliable, repeatable evidence.

Cloud excellence is not a phantom or an unachievable goal. It is the result of clear thinking and sound documentation. Over time, practices improve and skills build. To quote Macklemore, “The greats weren’t great because at birth they could paint, the greats were great because they painted a lot.” With practice and focus, your organization can achieve cloud excellence.
