[PATCH] D103440: [WIP][analyzer] Introduce range-based reasoning for addition operator

Tue Jun 22 02:38:34 PDT 2021

vsavchenko added inline comments.

================
Comment at: clang/test/Analysis/constant-folding.c:280-281
+  if (c < 0 && c != INT_MIN && d < 0) {
+    clang_analyzer_eval((c + d) == -1); // expected-warning{{FALSE}}
+    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
+    clang_analyzer_eval((c + d) <= -2); // expected-warning{{UNKNOWN}}
----------------
manas wrote:
> manas wrote:
> > vsavchenko wrote:
> > > manas wrote:
> > > > vsavchenko wrote:
> > > > > manas wrote:
> > > > > > When I pull these cases out to a separate function (say testAdditionRules2) in constant-folding.c then algorithm is able to reason about the range correctly, but inside testAdditionRules, it is unable to figure stuff out. Does it have something to do with constraints being propagated which we discussed below?
> > > > > > 
> > > > > > @vsavchenko wrote: 
> > > > > > > I have one note here.  The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication.
> > > > > > > 
> > > > > > > ```
> > > > > > > if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10;
> > > > > > >                // one goes inside of if, one doesn't
> > > > > > >   . . .
> > > > > > > }
> > > > > > > if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10).
> > > > > > >               // 1 path was infeasible: a == 10 and a < 5
> > > > > > >               // Two of these paths go inside of the if, one doesn't
> > > > > > >   . . .
> > > > > > >   clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE
> > > > > > > }
> > > > > > > clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE
> > > > > > > ```
> > > > > > > 
> > > > > > > We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each `clang_analyzer_eval(...)`
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > It might be that or it might be something different.  Just by looking at this example, the previous `if` statement shouldn't add more paths that go inside of this `if`, so it shouldn't be the root cause.
> > > > > Whenever you encounter problems and you want to tell other people, **please, provide more detail**.  Did you notice it when running the test case?  What was the output?  What did you try?  How did you extract it into a separate function?
> > > > I put a new test function in `constant-folding.c` as:
> > > > 
> > > > ```
> > > > void testAdditionRulesNew(int c, int d) {
> > > >   if (c < 0 && c != INT_MIN && d < 0) {
> > > >     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
> > > >   }
> > > > }
> > > > ```
> > > > I tested this specific function as:
> > > > 
> > > >   ./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRulesNew constant-folding.c
> > > > 
> > > > And I got the following output:
> > > > 
> > > >   ../clang/test/Analysis/constant-folding.c:338:5: warning: FALSE [debug.ExprInspection]
> > > >     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
> > > > 
> > > > which is correct.
> > > > 
> > > > But when I ran the same test inside `testAdditionRules`, using:
> > > >     ./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRules constant-folding.c
> > > > then I got:
> > > > 
> > > >   ../clang/test/Analysis/constant-folding.c:281:5: warning: FALSE [debug.ExprInspection]
> > > >     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}                                                                                            
> > > >     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                          
> > > >   ../clang/test/Analysis/constant-folding.c:281:5: warning: TRUE [debug.ExprInspection]
> > > >     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}                                                                                            
> > > >     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                      
> > > > 
> > > > Here, `c = [INT_MIN + 1, -1]` and `d = [INT_MIN, 0]`, so `c + d = [INT_MIN, -2] U [1, INT_MAX]`. So `c + d == 0` should be false. But in latter case, it is reasoning `c + d == 0` to be `UNKNOWN`.
> > > > 
> > > > Also, the error arises in `c + d == 0` case only and not in `c + d == -1` case. I mistakenly highlighted that case while commenting.
> > > Hmm, I don't know what can be the reason.
> > > 
> > > There are three reasonable approaches:
> > > 1. Straightforward: use debugger.  Find a place when we ask the solver about this assumption and dig in.
> > > 2. Print-based: use `clang_analyzer_printState` inside of this if.  Among other things it will print you all constraints that we know about, you can check if they match your expectations.  Also, if it is indeed because of residual paths, you will get another print from residual path.
> > > 3. Reduction: add  the whole `testAdditionRules` into `testAdditionRulesNew` and start removing parts of it.  At some point problem should disappear (based on your previous observations).  That "minimal" reproducer can give you insight into what's going on.
> > > 
> > > These are not mutually exclusive, so you can try them in combination.
> > I tried print the ProgramState as in both the cases like:
> > 
> >   if (c < 0 && c != INT_MIN && d < 0) {
> >     clang_analyzer_printState();
> >     clang_analyzer_eval(...);
> >     ...
> >   }
> > 
> > 1. During `testAdditionRulesNew`, when I added `clang_analyzer_printState` just before doing any `clang_analyzer_eval` calls, I got the following constraints (these are as expected):
> > 
> > ```  
> >   "constraints": [                                                                                                                                             
> >     { "symbol": "reg_$0<int c>", "range": "{ [-2147483647, -1] }" },                                                                                           
> >     { "symbol": "reg_$1<int d>", "range": "{ [-2147483648, -1] }" }                                                                                            
> >   ],
> > ```
> > 
> > 2. In `testAdditionRules`, adding `printState` before any evaluation led to:
> > 
> > ```
> >   "constraints": [                                                                                                                                             
> >     { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" },                                                                                           
> >     { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" },                                                                                           
> >     { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" }                                                                          
> >   ],
> > 
> >   "constraints": [
> >     { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" },
> >     { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" },
> >     { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" }
> >   ],
> > ```
> > A point to note is that this ProgramState is **before** any `clang_analyzer_eval` calls have been made. So, the third constraint, in both sets of constraints (for `c + d`) should not exist there, that is,
> > 
> >   a. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" }
> >   b. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" }
> > 
> > should not be in the constraints set.
> > 
> > And as obvious, the `UNKNOWN` reasoning for `c + d == 0` in `testAdditionRules` is arising from constraint (a) above, which is a wrong reasoning (it ideally should be `[1, INT_MAX]` and not `[0, INT_MAX]`).
>  @vsavchenko wrote:
> 
> > 3. Reduction: add  the whole `testAdditionRules` into `testAdditionRulesNew` and start removing parts of it.  At some point problem should disappear (based on your previous observations).  That "minimal" reproducer can give you insight into what's going on.
> > 
> 
> Trying this, I was able to deduce that the following snippet is causing `UNKNOWN` triggers:
> ```
> void testAdditionRulesNew(unsigned int a, unsigned int b, int c, int d) {
>   if (c < 0 && d < 0) {
>     clang_analyzer_printState();
>     clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
>     clang_analyzer_printState();
>   }
> 
>   if (c < 0 && c != INT_MIN && d < 0) {
>     clang_analyzer_printState();
>     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
>     clang_analyzer_printState();
>   }
> }
> ```
>  The output warnings are as follows:
> 
> ```
> <snipped-ProgramStates>
> ../clang/test/Analysis/constant-folding.c:341:5: warning: FALSE [debug.ExprInspection]
>     clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
>     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../clang/test/Analysis/constant-folding.c:341:5: warning: TRUE [debug.ExprInspection]
>     clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
>     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../clang/test/Analysis/constant-folding.c:347:5: warning: FALSE [debug.ExprInspection]
>     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
>     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../clang/test/Analysis/constant-folding.c:347:5: warning: TRUE [debug.ExprInspection]
>     clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
>     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 4 warnings generated.
> 
> ```
> The second evaluation is leading to wrong reasoning. I think the constraints are being propagated somehow from first conditional (`c < 0 && d < 0`) to the second conditional (`c < 0 && c != INT_MIN && d < 0`).
> 
> I have added a pastebin using `clang_analyzer_printState` here: https://pastebin.com/0E8hmTWh
> 
Ooof, OK.  I know what's going on here.

`clang_analyzer_eval` actually **assumes** its condition.  So it works as expected when the result is `TRUE` or `FALSE`, but actually splits the path when `UNKNOWN`.
This being said, the first conditional doesn't affect you (it is weaker than the second condition), `clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}` does.
After that point, you have two states: one where `c + d >= 0` (you called it **a.**) and the other with `c + d < 0` (**b.**).

Another important point here is located in `SymbolicRangeInferrer::infer(SymbolRef Sym)`.  You can see that there are three returns in that function.  Your functionality lives in the very last one, namely in `Visit(Sym)`.  What it means is that we ALWAYS give preference to existing constraints. And if we don't have any, we start traversing symbolic expressions to get something from that level.
This logic is flawed because we can be in a situation like you encountered now.  Where we believe that `c + d >= 0`, but if we look at `c` and `d`, we can see that this is actually impossible.
It can be solved by always doing `Visit(Sym)` and intersecting results from ALL available sources.  It has a simple implementation, but requires additional performance testing.

Long story short, that's not your problem.  Try to avoid it/rewrite the test.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103440/new/

https://reviews.llvm.org/D103440