All of Microsoft’s cloud services go down, everywhere. Redmond’s IaaS, PaaS and SaaS—including GitHub—were dead for several hours, and are…
In today’s always-on, ever-connected world, we all expect 100% availability. What gets in the way of this? The devil is…
In this week’s The Long View: Cloudflare suffers another huge outage while the FAA and FCC still disagree over 5G/NR…
What happens when the tools and services you depend on to drive site reliability engineering turn out to be susceptible…
In this week’s The Long View: Why Apple services were down, Linux gets a huge RNG overhaul, and we wonder…
The December 7, 2021 Amazon Web Services (AWS) outage severely disrupted services from a wide range of businesses for more…
In this week’s The Long View: Amazon Web Services falls on its face, Linux’s move to Rust takes the next…
In this week’s The Long View: Nvidia’s faltering attempt to buy Arm, Google’s load balancers go offline, and Backblaze’s newly-IPO’ed…
In the first and second parts of this three-part series, we looked at how organizations can effectively resolve incidents and…
One aspect of the DevOps movement I've seen adopted at numerous companies is the idea that everyone supports their products…