[PATCH] D53076: [analyzer] Enhance ConditionBRVisitor to write out more information

Thu Nov 1 20:52:07 PDT 2018

NoQ added a comment.

@Charusso, i believe you have some misconception about what constraints mean and how they work. Not sure what this misconception is, so i'll make a blind guess and annoy you a little bit with a narcissistic rant and you'll have to bear me, sry!

The most important thing to know about constraints is that they are applied to "symbols", not to variables. We explain it to the user in terms of variables (when possible) because symbols are immaterial in the code and as such cannot be easily explained to the user, but in reality constraints are entirely about symbols.

Let me explain this with an example. If you say "let x denote the number of sheep in a herd" and then a new sheep is born and joins the herd, the number of sheep becomes (x + 1) and no longer remains equal to x, while x can still be used to represent the *original* number of sheep. In this example, the number of sheep is the variable, and x is the symbol.

The good thing about symbols is that they denote the same unknown value throughout the whole analysis. Values of variables may change (hence the nomenclature), but values denoted by symbols never change. They are kinda like Static Analyzer's substitute for transforming the program into SSA notation. When constraints are assumed, we make assumptions about the unknown value denoted by the symbol and it becomes "less unknown", but it is still the same value, it's just we know more about it. Which means that:

- //"The Golden Rule of Constraint Solving":// (i came up with this title 3 seconds ago) In any two points A and B during analysis (i.e., nodes in the exploded graph), if A precedes B along any execution path (i.e., there's a directed path from A to B in the exploded graph), constraints for any symbol x in point B are a subset of constraints for x in point A.

For example, if Static Analyzer assumes that a symbol x is greater than 10, it cannot later assume that x is less than 5, because [-∞, 4] is not a subset of [11, +∞]. Therefore Static Analyzer will not explore the path on which x is less than 5, and it will not update range constraints for the symbol. But it will explore the path on which x is also less than 15 by setting range constraints for x to [11, 14]  because [11, 14] is indeed a subset of [11, +∞].

The formal definition of "Assuming" pieces is that they indicate that the constraints change for the symbol (eg., for the symbol that represents the *current* value of the variable mentioned in the message of the event piece). This is the precise meaning of "Static Analyzer is assuming". And when explained in terms of variables, this precise meaning is comprehend-able by the user and should be conveyed properly. The distinction between presence and absence of assumptions is crucial to convey. Therefore, if the branch condition is denoted by a symbol and constraints over that symbol do not change, there should be no "Assuming..." piece for the branch condition symbol, and the respective mechanism should not be removed. But there may be a new sort of event piece, eg. "Knowing...", if you still want to highlight the motivation behind "Taking..." a specific branch, which seems to be the purpose of your patch.

"Assuming..." pieces over the same variable may contradict each other (i.e., look as if they violate the Golden Rule) when contents of the variable change. So, yeah, in this case another "Assuming..." piece will be necessary in some cases. But it doesn't mean that the Golden Rule is actually violated; it just means that the variable now has different value and different assumptions can be made about it. This doesn't happen in the examples that i pointed out, so there should not be an "Assuming..." piece.

In an inline comment i show what this means on the test case that seems to be the easiest.

================
Comment at: test/Analysis/MisusedMovedObject.cpp:187
     A a;
-    if (i == 1) { // expected-note {{Taking false branch}} expected-note {{Taking false branch}}
+    if (i == 1) { // expected-note {{Assuming 'i' is not equal to 1}} expected-note {{Taking false branch}}
+      // expected-note at -1 {{Assuming 'i' is not equal to 1}} expected-note at -1 {{Taking false branch}}
----------------
Charusso wrote:
> NoQ wrote:
> > These assumptions were already made on the previous branches. There should be no extra assumptions here.
> Agree but only if there is no extra constraint EventPiece between them.
I think i answered this concern in the out-of-line comment. Because constraints never change in a contradictory manner but only grow shorter, a much stronger statement can be made here: there are also no extra assumptions when the variable is not reassigned since the last "Assuming..." piece. Of course, even if it is reassigned, we may still not want to have the "Assuming..." piece - it depends entirely on the current constraints over the symbol that represents the branch condition.

================
Comment at: test/Analysis/NewDelete-path-notes.cpp:10
   if (p)
-    // expected-note at -1 {{Taking true branch}}
+    // expected-note at -1 {{Assuming 'p' is non-null}}
+    // expected-note at -2 {{Taking true branch}}
----------------
Charusso wrote:
> NoQ wrote:
> > Static Analyzer knows that the standard operator new never returns null. Therefore no assumption is being made here.
> As I see SA knows nothing. Where to teach it?
It does know it. The ultimate way to observe what Static Analyzer knows or thinks at any point during analysis and how its knowledge evolves is to [[ http://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing | dump the "exploded graph" ]]:

{F7485724}

As you see, as soon as operator new is evaluated, even before the if-statement is evaluated, the symbol `conj_$0{int, LC1, S772, #1}` that represents the unknown value of the pointer produced by operator new becomes constrained to [1, 2⁶⁴ - 1] (assuming the target system is 64-bit), which means exactly that: it can take any value except 0.

Therefore when Static Analyzer reaches the if-statement later, it has no choice but to proceed to the true-branch, and not only the false branch is entirely skipped (the execution path doesn't split in two), but also the set of constraints remains unchanged, indicating that no assumption is being made:

{F7485753}

For the reference, here's the complete exploded graph for the test:

{F7485762}

https://reviews.llvm.org/D53076