Everyone has bad days. Every company has been through some kind of outage due to a buggy database deployment. Even the best of the best, with highly trained staff, world-class best practices, and well-thought-out processes, make mistakes. On May 17, Salesforce.com had a bad day.
The company deployed a faulty database change script that broke permission settings in production, giving users read and write access to restricted data. This opened the door for employees to view, steal, or tamper with data they weren't authorized to access. As a result, Salesforce had to take large parts of its infrastructure down to find and properly fix the issue. The outage lasted 15 hours, 8 minutes. According to Gartner's cost-of-downtime formula ($5,600/minute), this outage cost approximately $5 million. And since so many companies rely on Salesforce, it was a very visible and embarrassing outage. (Just take a look at #SalesforceDown and #permissiongeddon on social media.)
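For the curious, the back-of-the-envelope math behind that $5 million figure is simple. Note that Gartner's $5,600/minute is an industry-wide average, not a Salesforce-specific number:

```python
# Gartner's average cost-of-downtime estimate (industry-wide, not
# Salesforce-specific): $5,600 per minute of downtime.
COST_PER_MINUTE = 5_600

# Outage duration: 15 hours, 8 minutes.
outage_minutes = 15 * 60 + 8  # 908 minutes

total_cost = outage_minutes * COST_PER_MINUTE
print(f"{outage_minutes} min x ${COST_PER_MINUTE:,}/min = ${total_cost:,}")
# 908 min x $5,600/min = $5,084,800 -- roughly $5 million
```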
Salesforce had to shut everything down because of the way databases work: it's not as easy as pulling a single application offline. Who knows how many Salesforce employees worked like mad to take the whole database down, find the offending database script, and restore everything—all because of one change script. That's not a fun way to spend a weekend.
Historically, Salesforce customers have experienced very little disruption in service. On the day of the outage, many loyal customers were tweeting about how rock solid the service had been, and that's impressive.
That being said, this outage should be a wake-up call for users to realize how dependent they are on the platform, which has become an increasingly integral part of how we conduct business. I've heard anecdotes of entire offices being unable to complete work that Friday.
The customer reactions show that they clearly have their stuff together over at Salesforce. What this outage reveals is less about any shortcomings of this company specifically and more that everyone has blind spots, no matter how robust the testing process is.
Lessons for IT professionals:
Lessons for end users:
The bottom line is that, while the mistakes that led to the Salesforce outage were very costly and highly visible to customers, they were also entirely preventable.