What’s the Difference Between SLI, SLA and SLO?

By: Bill Doerrfeld on April 21, 2022

If you’re just getting into site reliability engineering (SRE) or platform engineering, you’ve probably come across a bunch of new terms, like SLI, SLA and SLO. These benchmarks are commonly referenced in the day-to-day life of an SRE but may seem foreign to outsiders. So, what are the differences between these abbreviations?

Well, one of the hallmarks of the SRE approach is to gauge performance and set targets for application metrics. These targets can be thought of as reliability thresholds. If a site’s latency reaches a certain point, for example, it may break the agreement the software provider has with its consumer.

SREs create and monitor service-level performance benchmarks for all sorts of metrics, such as uptime, latency, error count, mean time to recovery and throughput. But not all of these performance goals are part of public agreements. Companies usually take a proactive approach, setting stricter internal targets to stay well clear of breaching external agreements and eroding user trust. It’s become a best practice to set these baselines to maintain functional, reliable systems.

As mentioned, these targets range from firm partner commitments to purely internal goals. Below, we’ll attempt to define, in simple terms, the differences between service-level agreement (SLA), service-level objective (SLO) and service-level indicator (SLI). While these concepts share many similarities, it’s important to understand how they are applied in practice. Keep in mind that each company may have a nuanced understanding of these terms and may conceive and apply them differently.

Service-Level Indicator (SLI)

First off, service-level indicators (SLIs) refer to the actual metrics produced by software services. An SLI is a direct measurement of a service’s behavior: the real numbers that indicate overall performance, such as error rate and latency over time.

One example of an SLI would be the number of successful requests out of total requests over a one-month period. Say an engineer uses an application monitoring tool that ingests data from production logs, and the data shows that out of one million requests made, ten failed. The SLI for availability would thus be 99.999%.
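As a minimal sketch in Python (the request counts and one-month window are the hypothetical figures above, not output from any particular monitoring tool), that availability SLI is just the ratio of successful to total requests:

# Availability SLI: the fraction of successful requests in a measurement window.
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Return availability as a percentage for the window."""
    if total_requests == 0:
        return 100.0  # no traffic means nothing failed
    return 100.0 * successful_requests / total_requests

# Hypothetical one-month window: 10 failures out of 1,000,000 requests.
sli = availability_sli(successful_requests=999_990, total_requests=1_000_000)
print(f"Availability SLI: {sli:.3f}%")  # prints 99.999%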

Service-Level Agreement (SLA)

Second, most technologists are familiar with service-level agreements (SLAs). Even if you’re not, you’ve likely agreed to many throughout your history as a digital user. SLAs are like a pact between the software provider and the software user or client. These binding commitments often spell out availability expectations that must be upheld. SLAs may also cover responsiveness to incidents and bugs. It depends on the contract, but if an SLA is broken, some kind of penalty may be incurred, such as a refund or a service subscription credit.

These days, the average business relies on many cloud-based SaaS, PaaS and IaaS offerings. Initiating something as simple as an online payment might require hitting multiple remote services. But if all these services just went offline whenever they felt like it, our digital operations would come to a grinding halt. SLAs are thus necessary in enterprise software contracts to ensure both parties meet specific standards.

Once an engineer tracks SLIs and has an idea of typical behavior, they can then set an SLA that makes sense. Taking the example above, an SRE may consider guaranteeing a lower availability than what is currently being observed. This ensures that application performance, at its current pace, doesn’t break any legal or contractual obligations. In this scenario, perhaps the SLA includes an availability objective: no more than 100 requests fail per one million requests made. This would essentially equate to 99.99% uptime.
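To make the arithmetic concrete (using the hypothetical figures above, not a real contract), converting a “no more than 100 failures per one million requests” clause into an availability target and checking the measured SLI against it could look like this:

# Convert an SLA clause of the form "at most N failures per M requests"
# into an availability percentage, then compare the measured SLI against it.
def sla_target(max_failures: int, per_requests: int) -> float:
    return 100.0 * (per_requests - max_failures) / per_requests

target = sla_target(max_failures=100, per_requests=1_000_000)  # 99.99%
measured_sli = 99.999  # from the monitoring example above

print(f"SLA target: {target:.2f}%, measured SLI: {measured_sli:.3f}%")
print("SLA met" if measured_sli >= target else "SLA breached")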

Service-Level Objective (SLO)

Lastly, service-level objectives (SLOs) are similar to SLAs but refer explicitly to the performance or reliability targets themselves. An SLA may reference specific SLOs, or SLOs may be tracked purely for internal purposes. As Google described, “the availability SLO in the SLA is normally a looser objective than the internal availability SLO.”

You don’t want the end users to be the first people clamoring about a 400%+ latency rise in your mobile web apps. Thus, SRE teams typically keep a close eye on performance to ensure they never even get close to breaking SLAs. Some do this by setting and monitoring internal baselines that are more ambitious than the SLA threshold.

If we take our example above, where an SLA guarantees a service uptime of 99.99%, the business may set an internal target of 99.995%. In other words, for every one million requests, no more than 50 should fail. If software systems aren’t hitting these marks, it’s a sign that the company must reevaluate designs and search for bottlenecks. Or, perhaps engineering teams have a goal to reduce average downtime over the next quarter; in that case, an internal objective could be set at a higher standard than the currently observed performance.
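Continuing with the same hypothetical numbers, an internal SLO of 99.995% leaves an error budget of 50 failed requests per million. A rough sketch of tracking that budget against observed failures:

# Error budget: how many failures the internal SLO tolerates over a window,
# and how much of that budget the observed failures have consumed.
def error_budget(slo_percent: float, total_requests: int) -> int:
    # round() guards against floating-point artifacts in the percentage math
    return round(total_requests * (100.0 - slo_percent) / 100.0)

budget = error_budget(slo_percent=99.995, total_requests=1_000_000)  # 50
observed_failures = 10  # from the SLI example above

print(f"Error budget: {budget} failed requests, consumed: {observed_failures}")
print(f"Remaining budget: {budget - observed_failures} failed requests")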

As the Google Cloud blog described, “every service should have an availability SLO—without it, your team and your stakeholders cannot make principled judgments.” It’s a good reminder to be modest in your reliability commitments, they added, as consumers will come to expect this level of performance. On the other hand, setting more ambitious internal performance targets has the benefit of delivering a better result than the agreement on paper, increasing a software service’s competitiveness.

Real-World Performance Benchmarks

Keep in mind that the above numbers are simply for demonstration purposes. One interesting resource for real-world figures is API.Expert, a service that queries popular APIs and posts weekly performance statistics. Since APIs are at the heart of many UI-based platforms (and our digital economy at large), these benchmarks are a good indicator of average uptimes and latencies in the industry.

For example, at the time of writing, API.Expert’s Enterprise APIs collection ranked Microsoft Office365 and Pivotal Tracker at the top, both with a 100.00% pass rate and latencies of 220 ms and 248 ms, respectively. On the other end of the spectrum, Docusign sits at 99.93% with an 877 ms latency and Box at 99.99% with a 414 ms latency.

SLIs, SLAs and SLOs—Oh My!

Although it may sound good to an unpracticed ear, an SLA of 99.99% still equates to 52 minutes and 36 seconds of downtime per year. That’s nearly an hour of downtime in which customers are left scratching their heads or, worse, searching for other options. In critical health care situations, a loss of connectivity could be a matter of life and death.
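That downtime figure follows directly from the SLA percentage. A small sketch of the arithmetic (assuming a 365.25-day year, which is what yields roughly 52 minutes and 36 seconds for 99.99%):

# Allowed downtime per year implied by an availability SLA.
def downtime_minutes_per_year(sla_percent: float, days_per_year: float = 365.25) -> float:
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * (100.0 - sla_percent) / 100.0

for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% availability -> {downtime_minutes_per_year(sla):.1f} minutes of downtime per year")
# 99.99% -> ~52.6 minutes, i.e. about 52 minutes and 36 seconds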

Although creating SLAs and SLOs is important to gauge system health, the reality is that it can be challenging to track and enforce them. “These agreements—generally written by people who aren’t in the tech trenches themselves—often make promises that are difficult for teams to measure,” according to the Atlassian knowledge center.

In summary, SLIs capture the real behavior of software systems. These metrics inform the creation of SLAs, which must be met to honor B2B agreements. SLAs often reference particular service-level objectives (SLOs) that must be met, while stricter internal SLOs provide breathing room before an agreement is at risk. Lastly, in a digital economy with accelerating expectations, it makes sense to monitor internal SLOs and improve baselines over time.
