Tag: site reliability
SREs Say There’s Plenty of Room to Improve Incident Management
A global survey of site reliability engineers (SREs) found that diagnosing issues is the most difficult aspect of incident management ...
AI: A Game-Changer for SRE Work-Sharing and Technical Debt
AI-engineered tools can be used to improve the SRE work-sharing and technical debt pillar of DevOps practice ...
Why Observability is Important for Development Teams
Observability, which allows development teams to better understand how their systems behave in production, is critical to successful software delivery. According to Gartner, by 2026, 70% of organizations that successfully applied this ...
How To Build Anti-Fragile Software Ecosystems
Just as organisms are susceptible to diseases and viruses, software systems are susceptible to hacks and errors. And within many complex, interconnected systems, a minor bug could have a cascading effect across ...
The Rogers Outage of 2022: Takeaways for SREs
When, eight years from now, folks are creating lists of the top IT incidents of the 2020s, there's a good chance that they'll include the Rogers outage of 2022. The failure, which ...
How SREs Benefit From Feature Flags
When you think of who uses feature flags, your mind most likely goes to software developers. In general, feature flags are closely associated with software engineering. But site reliability engineers (SREs), too, ...