First we were told that every company will become a technology company. Then the tides shifted to every company will become a DevOps company. Now the zeitgeist has a new obsession: Every company will become a Site Reliability Engineering (SRE) company.
Have you been keeping pace? If your company survived the ruthless restructuring of digital transformation and is savvy enough to have surfed the DevOps wave, you might already have heard SRE is the next industry standard on the horizon. And you might be bracing yourself for yet another cycle of emerging best practices. Is the dread setting in?
Fear not: throwing out the welcome mat for SRE does not require kicking DevOps to the curb. On the contrary, the two are complementary philosophies that, when harnessed correctly, can usher in a new era of value delivery. Essentially, SRE is a specific implementation of the core DevOps pillars: it applies a software engineering mindset to traditional operations problems, with a focus on creating reliable and scalable technology. Together, DevOps and SRE work in harmony toward a shared goal of breaking down organizational barriers and delivering better software faster.
If you’re ready to give SRE a shot but are unsure how it best fits within your team’s existing structure, let’s break down the top five team organizational structures that emerge with clients. Feel free to treat these as loose guidelines rather than an exact blueprint. As with DevOps, views and practices diverge on how to integrate SRE into a company’s structure.
Discovering the unique way SRE fits into your team is the most rewarding and fun part of adopting this new system. What matters most here are outcomes. Creating a role for SRE means alleviating some of the pressure on your developers, allowing them to be more productive and to implement the DevOps paradigm more efficiently and effectively. Soon enough you’ll be reaping the benefits of what happens when, as Ben Treynor, Google vice president of engineering, said, “a software engineer is tasked with what used to be called operations.”
The OG(oogle)
Let’s give credit where it’s due. We wouldn’t have the SRE acronym, let alone the entire discipline, if it weren’t for Google’s innovative ops team. With the aim of bridging the gap between its production and ops teams, Google ops leader Ben Treynor proposed eschewing the norm of building an ops team solely from system administrators and instead bringing in software engineers with a systems background.
This model entails creating a dedicated engineering team focused on running and scaling a product or platform. The team should be composed of highly skilled software and systems engineers who are on-call for the product while also updating the product code base directly for reliability and building the tooling that supports the product. Most importantly, the product development team is still responsible for a portion of these operational tasks and may not be allowed to ship new features if reliability falls below agreed-upon thresholds (the Error Budget).
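To make the Error Budget idea concrete, here is a minimal sketch of the arithmetic behind it. It assumes reliability is measured as the share of successful requests over some window and that the agreed-upon threshold is expressed as a target such as 99.9%. The function names and the example numbers are hypothetical, chosen only for illustration; real teams define their own measurements and policies.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the agreed reliability target as a fraction, e.g. 0.999.
    All names and the request-based measurement are illustrative assumptions.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if failed_requests < 0:
        raise ValueError("failed_requests must not be negative")

    # The budget is the number of failures the target tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def can_ship_features(slo_target: float, total_requests: int, failed_requests: int) -> bool:
    """Hypothetical policy: new features ship only while some budget is left."""
    return error_budget_remaining(slo_target, total_requests, failed_requests) > 0.0


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests against a 99.9% target.
    print(can_ship_features(0.999, 1_000_000, 400))    # well within budget
    print(can_ship_features(0.999, 1_000_000, 1_500))  # budget overspent
```

With a 99.9% target, roughly 1,000 failures per million requests are tolerated. A team that has seen 400 failures still has budget to spend on new features, while one that has seen 1,500 would, under this sketch's policy, pause feature work and focus on reliability.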
Why Change When You Can Rebrand?
If your ops, DevOps and platform teams are on board with the SRE philosophy but you either don’t want or don’t need to make significant changes to your product code, this is the approach for you. In this model, your teams can simply rebrand as Site Reliability Engineers while adopting and implementing the tenets that come with the title. That means shifting the focus of engineering work toward improving reliability and scalability. This team would also be on-call, playing a prominent role in the underlying infrastructure, tooling, platforms or day-to-day support of the product.
Beware: To realize the benefits, you need to take the underlying tenets seriously and make sure these teams have strong software engineering skills.
The Inside Job
Perhaps you’re down with SRE but don’t want it to commingle with product accountability. A viable option then would be creating an internal consulting arm: a centralized team solely dedicated to creating and advocating for reliability tools and processes. One of the prominent benefits of this structure is that this team would be able not only to see the forest for the trees but also to serve as your SRE gurus, keeping the rest of the organization abreast of the latest technologies, trends and research in the SRE space.
You Get an SRE Engineer!
For those who want to weave in SRE engineers throughout their entire organization, this approach applies a holistic mindset to the SRE implementation proposition. That means SRE engineers are introduced into cross-functional teams that own the end-to-end lifecycle of a product, from build through decommissioning. From here the road diverges: your choice is to either create an SRE team of engineers who belong to a single capability and are also embedded full time within a product team, or allow product teams to hire their own dedicated SRE engineers. As Cloud and SRE Practice Lead at Slalom Build, I took the former. For us, this approach guarantees bespoke SRE implementation, ensuring right-sized scalability and reliability throughout the product lifecycle.
SRE? New to Me
Despite the compounding hype around SRE, it has yet to achieve universal name recognition. If it’s still unfamiliar to you, that’s okay. We are still in the early stages of adoption, so it’s not too late to raise your hand and get up to speed. I assure you that if you reach out to expand your knowledge of SRE and even bring it back to your team, you will be met with enthusiasm and open arms.
— Rob Cummings