It was 6 a.m. Saturday when the phone rang and awoke the American Express CIO. When the phone rings at 6 a.m., it is never good news. Never. This time was no different: A third-party supplier had just suffered a breach, and that breach would affect American Express cardholders.
Immediately following that call, the CIO activated the company’s Cyber Crisis Response team. The job of the Cyber Crisis Response team is to help identify impacted cardmembers and prepare to reach out and assist any who have questions or need help regarding the breach.
Responding swiftly to customers with the correct information is essential today, not only to stay on the right side of regulators, but also to help customers avoid fraudulent transactions and identity theft. DevOps rarely comes into play in data breach incident response, but the experience American Express shared at the most recent DevOps Enterprise Summit revealed just how much better DevOps organizations can be at effective breach response.
Within hours of the initial call between the CIO and the Cyber Crisis Response team, the primary breach response was divided among three teams. The first team focused on what data breach bridge teams typically focus on: how to identify impacted customers. The second team comprised business and product owners, as well as customer care staff, whose objective was to take the findings from the response investigation and communicate them to American Express customers.
The third team consisted of DBAs and system specialists, who understood all the systems, as well as enterprise architects, who might be able to quickly solve any technical challenges that arose.
Getting to the Right Breach Information
By 3 p.m. that Saturday, the first team determined that it could pull together all of the information it needed to identify affected cardholders. That was the good news. The bad news was that there would be tens of millions of production records that would need to be evaluated to make the final determination.
If the team were to pull those tens of millions of records in production, the demand would start to slow those production systems. “How do we pull these records out of production without impacting our availability? That was the challenge,” said Aimee Cardwell, vice president of Engineering, Consumer Product Development at American Express.
As the day moved forward and the teams worked to find a way to access those production records without impacting availability, one of the engineers on the team pitched an idea that would have been scoffed at in most organizations: Why don’t we clone production? But the novel idea wasn’t immediately rejected. The team on the call began to weigh the pros and cons of cloning production, and concluded that it could, in fact, clone production quickly, and all of the impacted card data could be aggregated without negatively impacting American Express’s servers and associated availability.
After the team successfully cloned the production systems, it worked all night to identify the impacted card members, which required cross-referencing the cloned production system with other data stores, Cardwell explained.
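The cross-referencing step can be pictured with a minimal Python sketch. American Express did not describe its actual queries or schemas, so everything here is an assumption made for illustration: the `cross_reference` function, the `card_id` field, and the idea that a second data store supplies the set of card identifiers tied to the breached supplier.

```python
def cross_reference(clone_rows, supplier_card_ids):
    """Return the card ids present in both the cloned data and the supplier set.

    clone_rows: rows read from the cloned (non-production) copy,
        each a dict with a hypothetical "card_id" key.
    supplier_card_ids: card ids from a second, hypothetical data store
        listing cards known to the breached supplier.
    """
    supplier_ids = set(supplier_card_ids)  # set gives constant-time membership tests
    matches = {row["card_id"] for row in clone_rows if row["card_id"] in supplier_ids}
    return sorted(matches)


# Illustrative data only; not real records.
clone_rows = [{"card_id": "C3"}, {"card_id": "C1"}, {"card_id": "C2"}]
impacted = cross_reference(clone_rows, ["C1", "C3", "C9"])
```

Because the reads run against the clone rather than the live systems, a scan like this, however large, would place no load on production.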
“What was really important here was the comfort level everyone had when it came to bringing up an idea that was really off the wall. And, in the end, they came together to make it happen because together the various teams had a strong understanding of the people, technologies, and processes in place necessary to succeed,” said Chad Avery, director, DevOps implementation at American Express.
It was now 6 a.m. Sunday and the teams had worked nonstop since the CIO’s phone rang Saturday morning. The teams managed to evaluate the relevant data from various systems and collect a list of potentially impacted card members. And after a careful analysis, they were able to determine who had been affected by the breach and who hadn’t.
What made this success possible? Both Avery and Cardwell believe that it was the integration of business, product and technical teams. “The fact that we had business, product and technical teams working together in this incident was a huge win for us,” Cardwell said.
Avery contended that an organization’s ability to successfully integrate its technical, business and product teams makes the difference. If those teams hadn’t been working together from the beginning, he said, they might not have been able to find a solution and, if they had, it would have taken much more time to do so.
Disclosure Day, and a Final Big Win
It’s a good thing it worked out quickly. By Monday morning, the breached third party was ready to go public with its data breach disclosure. That announcement would immediately impact not just American Express cardmembers, but those of most other credit card providers as well. Many customers would have pressing questions about whether their accounts were impacted, and, if so, what to do.
Customer service lines had to be ready.
That proved to be the final challenge that needed to be solved: How would customer service representatives know which customers calling were affected? Tight team integration and collaboration proved to be a deciding factor once again.
The engineers on the product team had spent the previous 24 hours building a tool that they shared with American Express customer care professionals. When individuals called the customer care team, the tool could alert the team whether a customer was in the impacted group.
The tool also came equipped with a data collection feature that would enable everyone to learn from the event and even prepare better for next time. “We want everyone to get better. We want ourselves to get better,” Cardwell said. “We want to know what percentage of customers who called were impacted, and what percentage of customers that called were not impacted. We wanted to know our response time on those customers. We wanted to know any other data that would help us to do better next time.”
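A tool of this kind, a lookup plus data collection, might look roughly like the Python sketch below. The source does not describe how the real tool was built, so the `CareLookup` class, its method names, and the choice to record response time in seconds are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CareLookup:
    """Hypothetical lookup tool: flags impacted callers and logs each call."""
    impacted_ids: frozenset
    calls: list = field(default_factory=list)

    def check(self, customer_id, response_seconds):
        """Tell the representative whether the caller is in the impacted group."""
        impacted = customer_id in self.impacted_ids
        # Record the call so the metrics Cardwell mentions can be computed later.
        self.calls.append((customer_id, impacted, response_seconds))
        return impacted

    def summary(self):
        """Percent of callers impacted or not, and average response time."""
        total = len(self.calls)
        if total == 0:
            return {"calls": 0, "pct_impacted": 0.0,
                    "pct_not_impacted": 0.0, "avg_response_seconds": 0.0}
        n_impacted = sum(1 for _, impacted, _ in self.calls if impacted)
        avg = sum(seconds for _, _, seconds in self.calls) / total
        return {"calls": total,
                "pct_impacted": 100.0 * n_impacted / total,
                "pct_not_impacted": 100.0 * (total - n_impacted) / total,
                "avg_response_seconds": avg}


# Illustrative usage with made-up identifiers.
tool = CareLookup(frozenset({"A", "B"}))
first = tool.check("A", 30.0)
second = tool.check("Z", 90.0)
report = tool.summary()
```

Logging every call at lookup time, rather than reconstructing it afterward, is what would make the post-incident questions in Cardwell’s quote answerable.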
And it was that dedication to collaboration among technical teams, business units, developers, operations and security teams that enabled them to quickly and successfully manage a third-party data breach and prepare themselves to do even better next time. “They all were able to develop a solution quickly. They were able to identify the customer impact, design a solution, build it, test it and get it deployed. And, oh by the way, the teams did it in a weekend,” Avery said.