DevOps.com


LinkedIn Preps Site Reliability Engineers (SREs) For Exciting Careers

By: David Geer on September 10, 2015


Find and Fix, Sink or Swim

A few LinkedIn NOC engineers, including Ariel Casas, had a hand in forming what the once-fledgling professional networking company today calls its Site Reliability Engineering organization. “The main difference between SRE and devops is that SREs are basically developers who happen to be very good at operations. So we call them Site Reliability Engineers,” says Casas, now Manager, Site Reliability Engineering, LinkedIn, explaining the choice of terminology.

In those days, outages and similar events would thrust Casas and his colleagues into the middle of the excitement, forcing them to devise solutions to the operations challenges at hand. This gave them the opportunity to engage developers about how to work together to avoid future issues.

“Getting thrown into the middle of the storm and having to reverse engineer a service not knowing what it does, not knowing why it’s broken, we basically had to identify the root cause. We had a very, I mean, we still do, but we had a very, very complex infrastructure back when I first started. It was like descrambling an omelet,” says Casas.

Casas and crew located and scraped through log data, combed through config files, identified service dependencies, and figured out how services worked on the spot in order to understand an unexpectedly broken service. “Those opportunities were a great way for us to learn and figure out how our infrastructure worked,” says Casas, whose early training at LinkedIn was largely on-the-job, experiential learning.

Early team members like Casas received a lot of hands-on exposure to the various stages in LinkedIn’s application lifecycles, from conception to design to maintenance. Team members touched the application stack at many points and worked with developers to help them understand how the code they wrote would affect the infrastructure that the NOC had to support.

Playing With Fire

Speaking of particular fires he and his co-workers had to extinguish, Casas recalls an early example when only a few engineers were on hand as they watched the site, waiting for something anomalous to happen. And something did. “We had a company widget go down,” says Casas.

The widget served data that LinkedIn collected about specific enterprises that external media outlets were reporting on. In this instance, the CNN news service’s application of the widget stopped working. “We had no idea about the service or how the widget served that data, we simply knew it was broken and causing a poor experience for the customer,” says Casas.

The team used the Firebug debugging tool to determine the context path for the widget. From there, they connected the dots to the service and the port it was listening on. Then, by mining the logs for that service, they found errors confirming that the service could benefit from, and perhaps fully recover with, additional memory. “We increased the memory for the service, deployed it, restarted it, and it was really rewarding to see the widget behaving the way it should,” says Casas.
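
The log-mining step can be sketched roughly as follows. This is only an illustration: the article does not say what the logs looked like or which errors the team found, so the log lines, the signature names, and the function count_memory_errors are all hypothetical. The sketch assumes a Java service (the article notes elsewhere that LinkedIn's stack is Java), where memory exhaustion commonly surfaces as java.lang.OutOfMemoryError.

```python
import re
from collections import Counter

# Assumed error signatures for a Java service. The article does not state
# which errors the team actually found in the logs.
MEMORY_ERROR_SIGNATURES = {
    "out_of_memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "gc_overhead": re.compile(r"GC overhead limit exceeded"),
}


def count_memory_errors(log_lines):
    """Count how many log lines match each memory-related signature."""
    hits = Counter()
    for line in log_lines:
        for label, pattern in MEMORY_ERROR_SIGNATURES.items():
            if pattern.search(line):
                hits[label] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt, invented for illustration only.
    sample_log = [
        "12:00:00 INFO  widget request served",
        "12:00:01 ERROR java.lang.OutOfMemoryError: Java heap space",
        "12:00:02 ERROR java.lang.OutOfMemoryError: GC overhead limit exceeded",
    ]
    for label, count in sorted(count_memory_errors(sample_log).items()):
        print(label, count)
```

A nonzero count for these signatures would point toward the kind of fix Casas describes, which was giving the service more memory and restarting it.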

On With the Formalities

With more than 150 engineers at LinkedIn now, the company has fewer fires to put out and more need for formal training, both external, such as conferences, and internal. SREs attend yearly conferences like Velocity, a WebOps-focused event where speakers from different organizations share their subject matter expertise. SREcon is another conference, where SREs come to share knowledge and training with their peers. LinkedIn sends speakers as well as attendees to each of these conferences. “We also go to Python trainings to learn how to better code and to understand new concepts that we can use to improve our tooling,” says Casas.

For internal training, the company’s SREs who support select services, such as the internal SaltStack, provide education about developments in those areas. Other training includes Java programming, as the LinkedIn software stack is built on Java. LinkedIn’s Java SMEs also contribute code back to Java.

LinkedIn also trains SRE staff on Kafka, the messaging system that supports much of that group’s data streaming. “We have some pretty smart engineers who support, own, and develop Kafka here. They provide good training around how we implement and use it,” says Casas. LinkedIn sends Kafka SMEs to meet-ups to train other enterprises in its use.

An Open Door to the SRE Career Advancement Ladder

LinkedIn offers internal trainings with an open door to any high-performing NOC engineer or developer in SRE who wants to attend. “We also have more general trainings called TechTalks and these are high-level and open to anyone in the company,” says Casas.

Filed Under: Blogs, Doin' DevOps Tagged With: devops jobs

Powered by Techstrong Group, Inc.

© 2022 · Techstrong Group, Inc. All rights reserved.