DevOps.com

How the SRE Role Is Evolving

By: Bill Doerrfeld on February 10, 2021

In recent years, site reliability engineering (SRE) has garnered much interest. In 2019, LinkedIn listed site reliability engineer as the second most promising job in the United States. Now, in 2021, the role continues to grow and evolve within many organizations.

Initially spearheaded at Google and credited to engineer Ben Treynor, the SRE approach seeks smarter accountability for application reliability. An SRE team sets service level indicators, establishes error budgets for new features and uses tools like application performance monitoring (APM) to visualize performance insights, among other tasks. SREs are quickly becoming a key check that helps businesses increase output as they respond to new digital innovations.

I recently met with AppDynamics Regional CTO Gregg Ostrowski to discuss the emergence of SRE and how the approach is evolving. In short, it appears SREs will only continue to grow in importance across more companies, adopting new tactics like chaos engineering, and diversifying their teams with new domain knowledge in response to increasing technological complexity.

What is an SRE?

Traditionally, organizations employed system administrators, or sysadmins, to maintain operations for large computing systems and services. However, this typically produced a dichotomy: product developers want to push new features, while operations teams (sysadmins) want to make sure the existing service doesn’t break. The split between developers and operations can be unhealthy and cause friction.

Whereas sysadmin positions are detached from development and involve a lot of manual work, SRE takes a more engineering-focused approach that automates operations. Google’s SRE book, which has become a bible of sorts, defines the role as “What happens when you ask a software engineer to design an operations team.”

So, what does an SRE do? At Google, SREs are in charge of availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning for the services they operate. They spend no more than 50% of their time on traditional “ops,” like tickets, on-call and manual tasks, and are encouraged to use the remaining 50% of their time to develop automation that will replace human operations labor.
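The 50% ceiling on operational work can be tracked with simple arithmetic. The following is a minimal sketch, assuming a team logs weekly hours in two buckets; the `WeekLog` record, its field names and the sample hours are hypothetical and are not taken from Google's tooling.

```python
from dataclasses import dataclass

OPS_CAP = 0.50  # the 50% ceiling on operational work described above


@dataclass
class WeekLog:
    """Hours one SRE logged in a week (hypothetical record shape)."""
    ops_hours: float          # tickets, on-call, manual tasks
    engineering_hours: float  # automation and project work


def ops_fraction(logs: list[WeekLog]) -> float:
    """Share of all logged time that went to operational work."""
    ops = sum(week.ops_hours for week in logs)
    total = ops + sum(week.engineering_hours for week in logs)
    if total == 0:
        raise ValueError("no hours logged")
    return ops / total


def over_cap(logs: list[WeekLog], cap: float = OPS_CAP) -> bool:
    """True when operational work exceeds the cap."""
    return ops_fraction(logs) > cap


# Made-up sample data for three weeks.
recent_weeks = [WeekLog(24, 16), WeekLog(18, 22), WeekLog(20, 20)]
print(f"ops share: {ops_fraction(recent_weeks):.1%}, over cap: {over_cap(recent_weeks)}")
```

A team that lands above the cap for a sustained period would have a concrete signal to shift effort toward automation.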

5 Ways the SRE Approach is Evolving

It’s been a few years since Google introduced the SRE concept, and naturally, unique iterations have popped up within other enterprises. But, on the whole, where is the concept heading? Ostrowski sees SREs evolving in a few key areas: growth and maturity, increased diversification with domain-specific experts and new monitoring tactics.

1. Increased Adoption

First, not all companies have embraced an SRE model. A recent study by Blameless found “… 50% of respondents employ an SRE model with dedicated engineers focused on infrastructure and tooling, or an embedded model where full-time SREs are assigned to a service.”

The SRE model is gaining momentum, but there is still room for greater adoption. There is also room for internal growth. Ostrowski sees a single SRE team as a single point of failure. “It needs to be a whole department,” he said.

In addition, SREs are gaining a more prominent voice at the table, influencing feature rollout. “With proper and mature SRE involvement, teams can’t willy-nilly deploy,” he said. Ostrowski views these teams as maintaining a critical balance between business risk and introducing new technology.

2. Larger, Diversified SRE Departments

Many companies are experiencing rising user demands, and thus must rapidly scale their application networks. Simultaneously, there has been a Cambrian explosion of deployment types — systems could be using any assortment of legacy infrastructure, mainframe, microservices, cloud environments and multiple cloud vendors. “The complexity and topology of the IT space has grown substantially, with many interdependencies,” Ostrowski said.

Due to increasingly complex technical stacks, SRE must now cover many domains. Thus, SRE departments will likely require a more diversified team of domain experts. Ostrowski likens SREs to “Navy SEALs of the IT team.” For example, assigning SREs with experience navigating the nuances of Google Cloud Platform (GCP) or Microsoft Azure could help maintain service reliability in those respective clouds.

An individual’s personality also matters for suitability to SRE, said Ostrowski. The role requires a specific type of innovator who considers the repercussions of code and loves improving processes and performance.

3. New Testing Tactics Will Emerge

Ostrowski also foresees SRE departments introducing new monitoring and testing approaches to maintain reliability. One of these is chaos engineering, which champions the idea of intentionally breaking application systems. Different strategies will undoubtedly emerge to help drive user experience (UX) and ensure that performance is always top of mind.
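As an illustration of intentionally breaking a system, the sketch below injects failures into a fraction of calls to a dependency and counts how often the caller falls back gracefully. It is a toy, in-process example under stated assumptions: `fetch_price`, `price_with_fallback`, the 30% failure rate and the `InjectedFault` exception are all hypothetical, and real chaos experiments usually target infrastructure rather than a single function.

```python
import random


class InjectedFault(RuntimeError):
    """Raised by the experiment itself, not by the real dependency."""


def with_chaos(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(sku):
    """Hypothetical dependency that normally succeeds."""
    return 9.99


def price_with_fallback(fetch, sku, default=None):
    """Behavior under test: degrade gracefully instead of crashing."""
    try:
        return fetch(sku)
    except InjectedFault:
        return default


rng = random.Random(42)  # seeded so the experiment is repeatable
flaky_fetch = with_chaos(fetch_price, failure_rate=0.3, rng=rng)
results = [price_with_fallback(flaky_fetch, "sku-1") for _ in range(1000)]
degraded = results.count(None)
print(f"{degraded} of {len(results)} calls hit an injected fault and fell back")
```

The point of such an experiment is that every call returns either a price or the fallback value. An unhandled exception would expose a reliability gap before users find it.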

4. Businesses Rely on SREs to Mitigate Risk

All things will inevitably fail at some point. SREs accept failure and learn to manage it, designing repeatable operational mitigation structures. “As companies become reliant on the consumerization of IT, business is dependent on SREs to drive business,” said Ostrowski. This reliance will likely increase as businesses tap SREs to maintain stability.

Whether it’s reducing mean time to repair (MTTR), programming service level indicators to monitor website load time, or forecasting error budgets for new feature introductions, SREs will be increasingly relied upon to maintain business stability and high performance.
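The three measures named above reduce to short calculations. The following is a minimal sketch, assuming a request-based SLO and a list of incident timestamps; the function names, the one-second load threshold, the 99.9% target and all sample numbers are illustrative assumptions rather than values from the article.

```python
from datetime import datetime, timedelta


def latency_sli(load_times_ms: list[float], threshold_ms: float) -> float:
    """SLI: fraction of page loads that finished within the threshold."""
    if not load_times_ms:
        raise ValueError("no samples")
    good = sum(1 for t in load_times_ms if t <= threshold_ms)
    return good / len(load_times_ms)


def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget left in the window (negative if overspent)."""
    if total <= 0 or not 0 < slo_target < 1:
        raise ValueError("need total > 0 and 0 < slo_target < 1")
    allowed_failures = (1 - slo_target) * total
    return 1 - failed / allowed_failures


def mean_time_to_repair(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) across incidents."""
    if not incidents:
        raise ValueError("no incidents")
    durations = (resolved - detected for detected, resolved in incidents)
    return sum(durations, timedelta()) / len(incidents)


# Made-up sample data.
loads = [120, 340, 95, 2100, 180]
incidents = [
    (datetime(2021, 2, 1, 10, 0), datetime(2021, 2, 1, 10, 30)),
    (datetime(2021, 2, 3, 14, 0), datetime(2021, 2, 3, 15, 30)),
]
print(f"SLI: {latency_sli(loads, threshold_ms=1000):.0%} of loads under 1 s")
print(f"budget left: {error_budget_remaining(0.999, 1_000_000, 400):.0%}")
print(f"MTTR: {mean_time_to_repair(incidents)}")
```

When the remaining budget approaches zero, a team following this model would typically slow feature rollouts until reliability recovers.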

“The SRE mindset is about coming to terms with a blameless environment,” said Ostrowski. SREs can help “balance a risk between the business and application team,” he added.

5. SREs Steer UX

In this digital-only environment, “The application has become the business,” said Ostrowski. If the application is core to the business, monitoring the user journey is necessary to improve it. Ostrowski believes SREs oversee a unique territory that could produce valuable business insights, too.

In addition to monitoring uptime to ensure URL response times meet SLAs, SREs could track UX-related insights, such as conversion rates or cart abandonment percentages. Tracking such analytics and setting standard baselines could help pinpoint problems affecting UX. It could also assist product development in designing better-performing software.
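One way to set a baseline for a UX metric is to compare each new reading against its recent history. The sketch below assumes daily conversion rates are already being collected; the three-standard-deviation rule, the function names and the sample figures are hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def conversion_rate(orders: int, sessions: int) -> float:
    """Share of sessions that ended in an order."""
    return orders / sessions


def cart_abandonment_rate(carts_created: int, orders: int) -> float:
    """Share of carts that never became an order."""
    return 1 - orders / carts_created


def deviates_from_baseline(history: list[float], today: float,
                           max_sigma: float = 3.0) -> bool:
    """True when today's value is more than max_sigma standard deviations
    away from the mean of the historical values."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) > max_sigma * spread


# Made-up daily conversion rates for the previous week.
last_week = [0.031, 0.029, 0.030, 0.032, 0.028, 0.030, 0.031]
today = conversion_rate(orders=45, sessions=3000)
if deviates_from_baseline(last_week, today):
    print(f"conversion rate {today:.1%} is outside the usual range")
```

A flagged reading does not say what went wrong, but it tells the team where in the user journey to start looking.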

Final Thoughts

“Anything that you do more than twice has to be automated.”
-Adam Stone, CEO, D-Tools

Traditional operations teams are typically detached from product development and are culturally very different. Under an SRE approach, however, operations teams instead aim to run systems that automate the work sysadmins typically perform manually.

In this brief introduction, we barely scratched the surface of the topic. And of course, each company is unique in its approach to hybridizing product development and operations. DevOps tenets are closely aligned with SRE; often the two are intermixed or one and the same.

Ultimately, truly reaping business value from this approach will depend on breaking down silos and opening conversations — or at least automated notifications — between disparate units. Only then can operations and development coexist in the most productive way.
