Prevent False Positives From Derailing Shift Left

Static application security testing (SAST) tools are designed to balance false positives (incorrect warnings) against false negatives (missed vulnerabilities), primarily because deeper analysis requires more time and computing resources. Both are in short supply among developers who are tasked with meeting ever shorter product delivery milestones.

So, while SAST vendors consider a true positive a correct detection of a real defect, and a false positive an alert on a bug that does not exist, this is rarely the lens used by developers. What really matters to them is whether the output of a SAST tool is useful and actionable. For example, there is a great deal of variation in how developers interpret SAST scan results, depending on the nature of the defect, the role of the user, the platform on which the application will run and the environment in which it is deployed.

Take, for example, a true positive result of a buffer overrun, one of the most notorious classes of C/C++ defects from a security perspective. In the early stages of application development, it almost always makes sense to change the code to fix such a bug. Developers are actively changing the code, anyway, so fixing it involves little extra overhead. However, if the same defect is found after the application has been deployed, then it is much trickier to decide whether it is worth fixing.

That’s because the cost of fixing a real bug may exceed the benefit of fixing it, and the benefit of “correcting” a false positive may exceed the cost of leaving it alone. Since SAST tools can only be relied on to give narrow technical answers, humans must interpret the results and determine which of them should be acted upon.

Consider a hypothetical comparison of different SAST tools in the graph below. Tool A has good recall (i.e., the ability to identify real defects) and good precision (i.e., the ability to exclude false positives), so it finds many of the real bugs with a reasonable number of false positives. Tool B has high precision but poor recall, so it reports few false positives but leaves more false negatives (undetected security vulnerabilities). Tool C has poor precision but high recall, so it detects nearly all of the real errors but also reports a very high number of false positives.

A comparison of recall and precision of hypothetical SAST tools.
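To make the two measures concrete, here is a minimal Python sketch of how precision and recall could be computed from a tool's results. The ScanResult class and every count in it are illustrative assumptions (including the premise of 100 real defects in the code base); they are not taken from any real tool or from the graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    """Hypothetical counts for one tool on a code base with known defects."""
    true_positives: int   # real defects the tool reported
    false_positives: int  # warnings that do not correspond to a real defect
    false_negatives: int  # real defects the tool missed

    @property
    def precision(self) -> float:
        # Share of reported warnings that are real defects.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        # Share of real defects that the tool reported.
        real = self.true_positives + self.false_negatives
        return self.true_positives / real if real else 0.0


# Assumed, illustrative counts: each tool scans a code base with 100 real defects.
tools = {
    "A": ScanResult(true_positives=80, false_positives=20, false_negatives=20),
    "B": ScanResult(true_positives=30, false_positives=2, false_negatives=70),
    "C": ScanResult(true_positives=98, false_positives=300, false_negatives=2),
}

for name, result in tools.items():
    print(f"Tool {name}: precision={result.precision:.2f} "
          f"recall={result.recall:.2f}")
```

With these assumed counts, Tool A scores reasonably on both measures, Tool B is precise but misses most defects, and Tool C finds almost everything at the price of a flood of warnings.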

Developers hate false positives because they introduce unnecessary work and delays. This, in turn, has a disproportionate effect on the way tools are designed, configured and used. If given a choice between Tool A, which reports 40 real defects and 10 false positives, and Tool B, which reports 50 real defects but also 50 false positives, users will almost always prefer the former, even though it finds fewer real defects. This is perfectly understandable: users are asked to weigh an immediate, concrete negative (time spent looking at false positives) against an intangible potential future positive (vulnerabilities that may never show up).

However, if one offsets the time and risk saved by finding those 10 additional bugs earlier (i.e., by avoiding expensive and potentially dangerous bugs in finished products) against the time needed to dismiss the additional 40 warnings as false positives, then it quickly becomes apparent that Tool B is more economical. This is especially true of security vulnerabilities, which can have expensive consequences when they slip through testing.
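The trade-off can be worked through with a few lines of Python. The defect and false-positive counts come from the example above; the two cost figures (hours to triage a false positive, hours lost to a defect that escapes into a finished product) are assumptions chosen only for illustration, and real values will vary widely between projects.

```python
def net_hours_saved(extra_defects_found: int,
                    extra_false_positives: int,
                    hours_per_escaped_defect: float,
                    hours_per_false_positive: float) -> float:
    """Hours saved by catching extra defects early, minus extra triage time."""
    saved = extra_defects_found * hours_per_escaped_defect
    spent = extra_false_positives * hours_per_false_positive
    return saved - spent


def break_even_ratio(extra_defects_found: int,
                     extra_false_positives: int) -> float:
    """How many times costlier an escaped defect must be than one triage
    for the noisier tool to pay for itself."""
    return extra_false_positives / extra_defects_found


# Counts from the article's example.
tool_a = {"real_defects": 40, "false_positives": 10}
tool_b = {"real_defects": 50, "false_positives": 50}

extra_defects = tool_b["real_defects"] - tool_a["real_defects"]        # 10
extra_false = tool_b["false_positives"] - tool_a["false_positives"]    # 40

# Assumed costs, for illustration only.
HOURS_PER_ESCAPED_DEFECT = 8.0
HOURS_PER_FALSE_POSITIVE = 0.25

net = net_hours_saved(extra_defects, extra_false,
                      HOURS_PER_ESCAPED_DEFECT, HOURS_PER_FALSE_POSITIVE)
ratio = break_even_ratio(extra_defects, extra_false)

print(f"Net hours saved by choosing Tool B: {net}")
print(f"Tool B pays off once an escaped defect costs more than "
      f"{ratio} times one false-positive triage")
```

Under these assumptions Tool B comes out 70 hours ahead. More generally, it wins whenever a defect that escapes costs more than four times as much as dismissing a single false positive.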

Since the number of warnings within a DevOps pipeline can be a deterrent to getting the most out of a SAST tool – especially early in the project, when it provides the greatest benefits – organizations should consider the following techniques to increase developer attention to alerts:

Filter and Focus: Narrow the results shown in the tool’s interface to what is most important for the project, and assign developers to fix critical issues in priority order.
Mark and Defer: Lower the priority of all or a subset of the warnings that are less crucial to the project, or change their state (to “later,” for example), based on a defined set of conditions.
Stop the Bleeding: Use the above techniques to temporarily defer existing warnings, and put the emphasis on fixing new defects that are introduced as code is changed or new code is developed.
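The three techniques can be sketched in Python as follows. This is not the interface of any particular SAST product: the Finding record, its fields, the Severity levels, the "open" and "later" states, and the idea of matching warnings to a baseline by a stable fingerprint are all hypothetical, introduced here only to illustrate the workflow.

```python
from dataclasses import dataclass, replace
from enum import IntEnum
from typing import Callable, Iterable, List, Set


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    """Hypothetical warning record; real tools expose different fields."""
    fingerprint: str      # assumed stable ID that survives unrelated edits
    checker: str
    path: str
    severity: Severity
    state: str = "open"   # "open" or "later"


def filter_and_focus(findings: Iterable[Finding],
                     min_severity: Severity) -> List[Finding]:
    """Filter and Focus: open findings at or above a severity, worst first."""
    selected = [f for f in findings
                if f.state == "open" and f.severity >= min_severity]
    return sorted(selected, key=lambda f: -f.severity)


def mark_and_defer(findings: Iterable[Finding],
                   should_defer: Callable[[Finding], bool]) -> List[Finding]:
    """Mark and Defer: set state to "later" where the condition holds."""
    return [replace(f, state="later") if should_defer(f) else f
            for f in findings]


def stop_the_bleeding(findings: Iterable[Finding],
                      baseline: Set[str]) -> List[Finding]:
    """Stop the Bleeding: defer everything already present in the baseline
    scan so that only newly introduced findings stay open."""
    return mark_and_defer(findings, lambda f: f.fingerprint in baseline)


# Illustrative data: f1 to f3 existed at the baseline scan, f4 is new.
findings = [
    Finding("f1", "buffer-overrun", "src/net.c", Severity.CRITICAL),
    Finding("f2", "unused-variable", "src/ui.c", Severity.LOW),
    Finding("f3", "null-deref", "src/parse.c", Severity.HIGH),
    Finding("f4", "buffer-overrun", "src/new_feature.c", Severity.HIGH),
]
baseline_fingerprints = {"f1", "f2", "f3"}

focused = filter_and_focus(findings, Severity.HIGH)
deferred = mark_and_defer(findings, lambda f: f.severity == Severity.LOW)
new_only = [f for f in stop_the_bleeding(findings, baseline_fingerprints)
            if f.state == "open"]

print("Focus list:", [f.fingerprint for f in focused])
print("Deferred:  ", [f.fingerprint for f in deferred if f.state == "later"])
print("New only:  ", [f.fingerprint for f in new_only])
```

Deferring rather than deleting keeps the older warnings available, so the team can return to them once the backlog of new defects is under control.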

Although it will invariably produce some false positives, SAST helps shift security further to the left by catching security vulnerabilities and programming defects while code is being written, rather than at the testing stage, when they are much more expensive to fix. Filtering and prioritizing warnings are two effective techniques to ensure developers don’t throw out the baby with the bathwater.