Analyzing SRE Job Postings

You can find plenty of high-level definitions out on the internet about what site reliability engineering means and what site reliability engineers do.

But if you want to understand what it’s actually like to work as a site reliability engineer, there is perhaps no better source than job descriptions. SRE job ads explain what real companies are looking for and what they expect on a day-to-day basis.

To provide insight into what SRE looks like in practice, we’ll look at job ads from four leading companies. As we’ll see, each company has a slightly different take on what the site reliability engineer role entails, but it’s possible to identify some core themes that underlie what businesses expect.

GitLab’s Job Ad

GitLab’s SRE job description contains most of what you would expect from a typical site reliability engineer job ad. It highlights the importance of having a mix of technical skills and collaboration skills. It also emphasizes expertise in areas like Unix/Linux, infrastructure-as-code and Kubernetes.

What may be slightly surprising, however, is the emphasis that GitLab’s job description places on programming skills. GitLab expects SREs to know how to code in specific languages, like Ruby and Go. The takeaway is that SREs need to know more than just how to use code to solve IT operations problems. They must also be able to develop applications or contribute code to application platforms (the job description mentions “contributing to code in GitLab” as a required SRE technical skill).

Also interesting is the fact that GitLab expects site reliability engineers to be able to write blog posts. Blogging may not be one of the first skills that come to mind when you think of a site reliability engineer’s responsibilities, but at GitLab, at least, it’s part of the job.

LinkedIn’s Job Ad Template

LinkedIn provides an SRE job ad template that businesses can use when posting their own SRE jobs on the site.

LinkedIn’s ad is pretty high-level (which is unsurprising, given that it’s a template rather than an ad for a particular job). For the most part, it doesn’t mention specific tools. The main exception is programming languages: it says SREs should know how to code in languages like C, Python, JavaScript and Ruby. Like GitLab, then, LinkedIn clearly sees development as a core SRE job responsibility.

The LinkedIn job description also places a heavy emphasis on the ability to “gather and analyze metrics.” It doesn’t use the word “observability,” but that seems to be what LinkedIn is getting at.

An SRE Job at Microsoft

Continuing the theme of development-centric SRE work, a recent job ad for a specific SRE opening at Microsoft reads a lot like a DevOps engineer job description. The ability to “design, write and deliver software” is the top responsibility listed in the ad, which also mentions familiarity with CI/CD tools.

The Microsoft job description also mentions the importance of written and verbal communication skills. Unlike GitLab’s ad, this one doesn’t say SREs will need to write blog posts; still, it’s clear that companies expect people in this role to be creative communicators in addition to technical experts.

It’s worth noting, too, that Microsoft wants site reliability engineers who know both Windows and Unix. If you need further proof that Unix skills are absolutely central to SRE work, look no further than this job ad from the company that created Windows.

Amazon’s Site Reliability Engineer Ad

In one of its recent SRE job ads, even Amazon, a company that is more of a platform provider than a software vendor, places major emphasis on the ability of SREs to code.

Interestingly, the Amazon job also highlights networking skills as a major qualification. That’s understandable given the complexity of networking in the AWS cloud. But it’s a bit surprising because most other site reliability engineer job ads don’t call out networking expertise as a key skill—despite the fact that network engineering is critical to SRE work.

The Amazon job description mentions the importance of an applicant’s willingness to be on-call, too. We suspect that’s critical in virtually any SRE job, but at least Amazon is upfront about the fact that, at times, you’ll be woken up at 4 a.m. to manage incidents.

Conclusion: Takeaways

There are a few key takeaways from these various SRE job descriptions.

Development skills are absolutely critical in order to be an SRE. Scripting for administration or IT Ops purposes is not enough on its own.
The ability to communicate clearly, both verbally and in writing, is more important to SRE work than you may realize.
Network engineering skills, too, are a crucial part of SRE jobs, at least at some companies.
In practice, SRE job descriptions sound more similar to DevOps job ads than you might expect.

This is not to say that all SRE roles are identical or that all companies expect the same things from their SREs, of course. But the job ads and templates we’ve surveyed do highlight how the high-level understanding of SRE translates to on-the-ground reality at actual companies.