Tag: SRE
Why Your Observability Stack Is Costing You More Than Your Cloud Bill
There's a pattern playing out across engineering teams right now that nobody talks about openly: the tool meant to reduce operational complexity has quietly become one of the biggest line items on ...
When the Structure Becomes the Culture
Why micro teams and rotation reshape culture, not just throughput, in modern SRE. Most SRE leaders design teams around the systems they own. We designed ours around movement. We introduced micro teams ...
The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure
In complex software systems, our traditional definition of operational health has always been comfortably binary. For over a decade, site reliability engineering (SRE) teams have relied on the industry-standard ‘Four Golden Signals’ ...
Agentic SRE: The Next Frontier of Reliability
Agentic SRE is the evolution of site reliability engineering where AI agents help observe systems, reason over telemetry and take bounded operational actions under human-defined guardrails ...
On-Call: The Silent Force Shaping Engineering Culture
There is a silent force shaping engineering culture inside every technology organization. It affects productivity, team morale, psychological safety, and long-term retention. And yet, it is rarely discussed in executive meetings or ...
The Five Biggest Mistakes Organizations Make When Implementing SRE
From cargo-culting Google's playbook to rushing AI-powered observability into production before the fundamentals are in place, here's where SRE transformations quietly go wrong, and how to course-correct. ...
Lightrun Adds Ability to Dynamically Pull Telemetry Data from Live Apps
Lightrun has added an ability to dynamically pull missing telemetry evidence from live application environments without having to deploy additional instrumentation to its namesake site reliability engineering (SRE) platform that is based ...
PagerDuty Extends Scope and Reach of AI SRE Platform
PagerDuty has extended the capabilities and reach of its artificial intelligence (AI) agents to enable them to be invoked directly from within the Slack messaging platform. Additionally, the AI SRE Agent that ...
Komodor Extends Reach of AI SRE Orchestration Framework
Komodor today extended the reach of its orchestration framework for artificial intelligence (AI) agents by adding support for Model Context Protocol (MCP) servers and the OpenAPI specification. Company CTO Itiel Shwartz said ...
Five Great DevOps Job Opportunities
Weekly DevOps jobs roundup, this week highlighting top roles in Massachusetts, New Jersey, Chicago, Charlotte and Seattle, with pay ranges and hiring trends to help DevOps pros advance their careers ...
AI Is Forcing DevOps Teams to Rethink Observability Data Management
As AI coding tools accelerate software delivery, they are also intensifying a problem DevOps and SRE teams have been dealing with for years: the unchecked growth of observability data. In this conversation, ...
How We Got Here: Alert Fatigue to Decision Fatigue
AI and observability reduced alert fatigue, but decision fatigue remains. Decision architecture helps DevOps teams scale operational judgment ...

