Using Incident Response for Continuous Testing

Incident response tools let organizations not only implement continuous testing but also shorten the feedback loop from continuous testing back into planning and development.

At this point, you should be able to say the word “continuous” to any technical team, drop the mic and leave. Yes, the idea that we want more delivery, more testing and more interactions isn’t new. We use phrases such as continuous testing and continuous improvement to start a conversation about the unique set of tools and practices required to run these processes in production, not just as part of continuous integration (CI). Part of supporting continuous testing is having a highly effective incident response process; arguably, it’s required.

Continuous testing is used in the context of functional testing, performance testing and vulnerability scanning. All three come from different disciplines, but they share the same challenge: they are processes designed to run before release, not in production.

Functional testing is the process of running automated tests of the application as if the test were a user, from the outside in. Performance testing is the process of putting artificial loads on the application to simulate high transaction volume to see how the application responds. Vulnerability scanning is the process of scanning application code to make sure components don’t have known vulnerabilities.

All of these application quality processes are generally run in CI and pre-production environments. If an organization moves them to production, it gains several benefits:

Full parity with production infrastructure. When you run these tests in CI, the environment they run on isn’t a one-for-one match with the infrastructure used for production. Running tests in production gives you a true picture of what is going to happen.
Closer relationship to production data. Running these tests in production means running them alongside real user activity, which gives a more accurate picture of the test outcomes.
Reduced reliance on CI. Continuous testing lets you streamline and even reduce pre-release processes, so more code gets to production faster. It also reduces the reliance on synthetics, mocks and service virtualization.
DevSecOps workflows in production. For vulnerability scanning specifically, you can build DevSecOps workflows into production to catch vulnerabilities that are discovered after the code has shipped.

The mechanisms for running the tests in production are the same as in CI, just with a different target environment. The problem is that when something breaks in production, it’s much more serious. So, to support continuous testing, you need to know what to do if a test case fails or if running a test causes a failure in the application.
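Because the mechanism is the same as in CI, one way to keep a single test suite is to make the target environment a parameter. The sketch below is a minimal illustration, assuming a hypothetical TEST_TARGET environment variable, placeholder example.com hosts and a hypothetical /login page; the fetch function is injected so the check itself does not care which environment it runs against.

```python
from typing import Callable, Mapping
from urllib.error import HTTPError
from urllib.request import urlopen

# Placeholder hosts (hypothetical); substitute your own environments.
TARGETS = {
    "ci": "https://staging.example.com",
    "production": "https://www.example.com",
}


def resolve_target(env: Mapping[str, str]) -> str:
    """Choose the base URL from TEST_TARGET (a hypothetical variable), defaulting to CI."""
    name = env.get("TEST_TARGET", "ci")
    if name not in TARGETS:
        raise ValueError(f"unknown test target: {name!r}")
    return TARGETS[name]


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx responses; report the code instead of crashing
        return err.code


def check_login_page(base_url: str, fetch: Callable[[str], int] = http_status) -> bool:
    """Outside-in functional check: the (hypothetical) /login page should return 200."""
    return fetch(f"{base_url}/login") == 200
```

In a pipeline, the same call, check_login_page(resolve_target(os.environ)), runs against staging by default and against production when TEST_TARGET=production is set. A network failure still raises URLError, which is a failure of the test run itself rather than of the application.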

This is where incident response comes in. Today, SREs and developers mostly use incident response tools and processes to get alerted when something breaks. But the same tools and approaches can support running tests, too. The only difference is that the testing tools, not just the application, become a source of alerts. This way, quality engineering and DevSecOps teams can confidently run their test suites and get alerted if something goes wrong.
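In practice, this means the testing tools send events into the same alerting pipeline as the application, and a routing rule decides who gets notified. The sketch below assumes a hypothetical alert structure, source names and routing table; it is not the API of any particular incident response product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical routing table: alert source -> team to notify.
ROUTES: Dict[str, str] = {
    "app": "sre-oncall",
    "functional-tests": "quality-engineering",
    "load-tests": "quality-engineering",
    "vuln-scanner": "secops",
}


@dataclass
class Alert:
    source: str    # system that raised the alert (application or testing tool)
    summary: str
    severity: str  # e.g. "critical" or "warning"
    details: Dict[str, str] = field(default_factory=dict)


def route(alert: Alert, default_team: str = "sre-oncall") -> str:
    """Return the team to notify; unknown sources fall back to the on-call SREs."""
    return ROUTES.get(alert.source, default_team)


def alert_from_test_result(suite: str, source: str, passed: bool, run_id: str) -> Optional[Alert]:
    """Raise an alert only when a test suite fails, tagged with the run ID for context."""
    if passed:
        return None
    return Alert(
        source=source,
        summary=f"Test suite {suite!r} failed in production",
        severity="warning",
        details={"suite": suite, "run_id": run_id},
    )
```

Keeping the run ID in the alert details is what lets a responder later tie an incident back to a specific test run.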

For vulnerability scanning in particular, continuous testing lets SecOps teams get alerted when a new vulnerability is detected, which can mean patching holes long before anyone exploits the application.
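One way to alert only on vulnerabilities that are new, rather than re-alerting on every scan, is to diff each scan's findings against the previous one. This is a minimal sketch: the (package, version, vulnerability ID) finding shape and the placeholder package names and IDs are assumptions, and real scanners report richer data.

```python
from typing import Iterable, List, Set, Tuple

# A finding is (package, version, vulnerability ID); this shape is an assumption.
Finding = Tuple[str, str, str]


def new_findings(previous: Iterable[Finding], current: Iterable[Finding]) -> Set[Finding]:
    """Findings reported by the current scan that the previous scan did not report."""
    return set(current) - set(previous)


def alert_lines(findings: Set[Finding]) -> List[str]:
    """Human-readable, deterministically ordered lines for a SecOps alert."""
    return sorted(f"{vuln_id} in {package} {version}" for package, version, vuln_id in findings)


# Usage with placeholder data: only the second finding is new.
previous_scan = [("libfoo", "1.2.0", "VULN-0001")]
current_scan = [("libfoo", "1.2.0", "VULN-0001"), ("libbar", "3.4.1", "VULN-0002")]
print(alert_lines(new_findings(previous_scan, current_scan)))
# prints ['VULN-0002 in libbar 3.4.1']
```

Because the comparison is between scan results rather than code changes, a vulnerability newly disclosed against a package version that has not changed still shows up as new, which is the case pre-release scanning alone would miss.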

If an organization embraces continuous testing, it needs to know:

If a test fails. Running tests can be expensive in both human and compute time, and waiting for results can hold up downstream processes. Therefore, application quality and security teams need to know that the tests themselves are running successfully. If the tests break, the QE team or SecOps team needs to be alerted so it can re-run them.
If the tests break something in production. The teams who own the testing process and tools need to be alerted, and whoever is on call needs to know, as part of an alert's context, that tests being run directly or indirectly affected the incident. This can save a lot of time: the responder can kill the tests instead of chasing down application issues that may not be there.
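For the second case, one approach is to enrich each production incident with the test runs that were active against the same service around the time it opened, so the on-call responder can see immediately whether a test might be involved. The data model below is hypothetical, and the 15-minute grace window after a run finishes is an arbitrary assumption meant to catch lingering side effects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TestRun:
    run_id: str
    owner_team: str
    target_service: str
    started: datetime
    finished: Optional[datetime]  # None while the run is still in progress


@dataclass
class Incident:
    service: str
    opened: datetime


def related_test_runs(
    incident: Incident,
    runs: List[TestRun],
    grace: timedelta = timedelta(minutes=15),  # arbitrary assumption
) -> List[TestRun]:
    """Runs against the incident's service that were active when it opened, or ended within `grace` before."""
    related = []
    for run in runs:
        if run.target_service != incident.service:
            continue
        # A run that is still in progress counts as active up to the incident time.
        end = run.finished or incident.opened
        if run.started <= incident.opened <= end + grace:
            related.append(run)
    return related
```

An empty result does not prove the tests were uninvolved, since a test against one service can affect another through shared dependencies, so treat the list as context for the responder rather than a verdict.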

Incident response tools let organizations not only implement continuous testing but also shorten the feedback loop from continuous testing back into planning and development. Part of this comes simply from reducing the time it takes to get details from an incident back to the appropriate teams; part comes from surfacing non-critical incidents that indicate potential issues in the application. Issues become more visible to everyone and reach product owners faster and more effectively. This is the other continuous: continuous improvement.