It’s no secret that site reliability engineering (SRE) adoption is growing. The operating model, it seems, is pivotal for digital transformations across the enterprise. According to the Upskilling 2021: Enterprise DevOps Skills Report, 22% of respondents (up 7% from last year) cited SRE as the approach they need to continue automation in 2021.
Common Adoption Barriers
However, for some companies, adopting SRE principles and practices isn’t always easy. Ahead of the upcoming SKILup Day conference focused on SRE, I asked several DevOps Institute Ambassadors and SRE subject matter experts to provide insights into the barriers to SRE adoption. Here’s what they shared:
Marc Hornbeek, CEO and Principal Consultant at Engineering DevOps Consulting and author of the book, “Engineering DevOps”
“The primary barrier for SRE adoption is executive vision and team alignment around a goal to transform to SRE practices. SRE affects the entire organization’s practices, roles and investment priorities. Without an executive sponsor, SRE transformations are not going to go far.”
Stephen Walters, Solution Architect at xMatters, Inc.
“Culture, and in particular, cultural change, is the greatest barrier to SRE adoption. As much as everything about the human race, the things that got us to being the creatures we are today, is about evolution, which is change, and we are not particularly good at it. Or, more specifically, the bigger the change, the more resistant we are to it. It has always been a great oxymoron in IT philosophy that in order to “transform,” you must change, but that transformation is occurring because we are not good at change. So it was with agile in the early days. Organizations wishing to transform to agile (small, iterative change) would do so under a transformation project, which would typically be waterfall (big bang). Big change, big cultural resistance. The key is to implement transformation in smaller, agile chunks. They are more easily accepted and testable, and also easier to roll back or adapt if they do not work or are not acceptable. All of the benefits of agile development apply to agile transformation.
The second greatest barrier is, then, “buy-in,” where the simple answer is, unfortunately, show the end user “what is in it for them.” This will typically override the enterprise-scale view of “what is in it for us?” but, over time, with any cultural transformation, this is one of the things that should change, iteratively, and also be a focus for developing a more holistic and empathetic culture.”
Shivagami Gugan, Chief Technology Officer at CX Tech Unicorn
“Any organization, regardless of its current level of maturity, can adopt SRE and improve its business outcomes. However, the key underlying principles of SRE and its applicability should be understood properly. There is a lot of hype around transformation and the first barrier is about understanding what SRE means within your organizational context. I find most of the time, people try to imitate what highly mature companies like Google, Spotify or other cloud-born companies do, and this usually results in failure. SRE is the purest form of the implementation of DevOps. SRE is about removing the silos in a product life cycle to achieve business outcomes in a safer, faster, cheaper and better manner. If this is understood clearly, the first barrier gets removed.
The second barrier is this complicated vision of boiling the ocean – doing 100% Agile, 100% DevOps or 100% SRE. This doesn’t work, especially when you have heritage systems of record, on-premises services, heritage infrastructure and a whole lot of baggage built up over several years (sometimes decades), which is the usual case for most companies. So be very clear in choosing target areas.
The third barrier is the belief that SREs specialize in building operational reliability but have nothing to do with the development or deployment phases of the life cycle. Reliability is everyone’s problem – the technical product manager, the developer, the tester, the support engineer; not just the SREs. The reality of the SRE role is to ensure service availability and reliability by supporting the other teams that own these services. SREs are enablers and collaborators, and their goal is to ensure that the services are resilient and reliable overall and that incremental value is delivered continually. When an organization understands that implementing SRE is an underpinning cultural change that affects all parts of the organization, then it becomes easier to remove the main barriers.”
Lisa Chan, Head of Software Engineering & DevOps at PETRONAS
“Not all organizations have the luxury of having an Ops team composed of skilled developers, which is a key success factor for the adoption of the SRE working model. In my case, I’m in an IT organization of 1,800 folks that has supported an oil and gas company since its inception 40 years ago. We serve a captive market of an integrated energy business with more than 100 companies under its portfolio. As a result, there is a legacy landscape of more than 2,000 applications. The IT organization still maintains the common “Dev is Dev, and Ops is Ops” anti-pattern. Even though we are in transition towards a working model that is more collaborative, it is unrealistic to expect our 300 Ops personnel to reskill themselves to be developers, because many of them have traditional skill sets along the typical Ops silos; e.g., networking, database admins, server admins, data center management, call center agents, etc.”
BMK Lakshminarayanan, Value Stream Architect at Bank of New Zealand
“Building SRE practices and skills within enterprises is always challenging, as some are still embroiled in old ways of thinking. The thinking, associated techniques, available tools and limited skills are constraining SRE adoption in most enterprises.
Renaming system admins as SREs neither helps nor makes them SRE experts. Investing in expensive tools doesn’t help, either. Any adoption and transformation is a gradual process. A typical medium to large enterprise has a few hundred systems, applications and services with various operating systems to support, database platforms to manage and hundreds of application databases to monitor or administer. To this complexity, add all the dependencies on other services and components, as well. Hence, redefining the SLOs, SLAs and SLIs for these services, automating everyday maintenance tasks, understanding the toil, and building resilience and fault tolerance into these systems and applications are uphill tasks for most enterprises.”
Identifying Other Common Adoption Barriers
Other common barriers for enterprises include:
- Availability of up-to-date system and application information in a centralized platform
- Limited observability of legacy systems
- Understanding the dependencies and service levels (Promise Theory)
- Availability of the right tools for the right job
- Skills – old ways do not open new doors
- Time – time and space for people to upskill, train and experiment
Want to learn more about SRE and how to overcome these adoption barriers? Join us for SKILup Day: Site Reliability Engineering on May 20. Register here: https://devopsinstitute.com/sre-2021/.