[cfe-dev] [analyzer] Equivalent paths.

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Fri Oct 19 14:40:57 PDT 2018

Just wanted to share/document one idea that was kinda in the air during 
the dev meeting.


First, well-known stuff to warm up. Suppose we have code:

     void foo(int x) {
         if (x > 100)

Static Analyzer's usual bug report will say something like "assuming x 
is from 101 to 2147483647, foo(x) crashes". Would it be better to report 
it as "assuming x is equal to 2018, this function crashes"?

   * On one hand, it's tempting because we've come up with a concrete 
unit test that the user can add to his program: "foo(2018);". This 
doesn't sound like much in this case, but it might be quite helpful as 
the system of equations becomes more complicated (eg., x > 0, y > 0, x^3 
+ y^3 = z^3 - this might have a non-trivial solution in finite-width 
ints!) (unsigned ints, to prevent overflow UB).

   * On the other hand, what if the developer says "it is impossible to 
call foo(2018) because it is only intended to be called with ascii 
characters"? In this case the developer would not know that x='x' is 
another valid unit test, because we only told him about x=2018, so he'll 
think that it's a false positive and miss a real bug in his code.

This last reason kinda outweights the first reason, which is why we're 
trying not to avoid reducing Static Analyzer reports to concrete 
counterexamples: it's better to provide the developer with all the 
information that we have. This might work for concolic testing, but it's 
not a good idea for a purely static tool.


Now, apply the same reasoning to bug report de-duplication. If we aren't 
comfortable with reducing "[101, 2147483647]" to "2018", then why are we 
comfortable with reducing the whole set of equivalent reports (that 
represent the same bug found on different execution paths) to a single 
"example report" (even the shortest one)? It might be that the shortest 
warning is false due to a hidden contract within the program, but the 
equivalence class contains other warnings that have feasible paths.

For example:

     void bar(bool x, bool y) {
         if (x) { ... }
         if (y) { ... }

A typical report produced by Static Analyzer would look like this: 
"assuming x is false, assuming y is false, bar(x, y) crashes". This 
allows the developer to dismiss the report as a false positive because 
bar() is only intended to be called with at least one of x and y being true.

However, if bar() is small enough, Static Analyzer is likely to find the 
crash on all four paths through bar(). And if this is the case, then 
when we're picking a specific "example report" out of these four 
equivalent reports, we're throwing away an important piece of knowledge 
that we already have: knowledge that it is in fact *irrelevant* whether 
x and y are true or false.

For that reason, adding "assuming"/"taking" pieces here and even showing 
path notes within any of the two if()s is bad. We should be pruning them 
much like we're pruning path notes within irrelevant inlined functions, 
or maybe replace with a specific note message that it is irrelevant 
which branch is taken (though i don't immediately see how it may be useful).


So i believe that the future of the BugReporter and custom 
BugReporterVisitors is to display path notes by exploring the whole 
equivalence class of bug reports rather than picking a single example 

The equivalence class might be incomplete because StaticAnalyzer doesn't 
give any guarantees of coverage. But this is fine: in the worst case, 
when only one path of all the equivalence paths is found, we'll just get 
the old behavior.

More information about the cfe-dev mailing list