[PATCH] D28953: [analyzer] Eliminate analyzer limitations on symbolic constraint generation

Sat Jun 17 00:04:29 PDT 2017

zaks.anna added inline comments.

================
Comment at: lib/StaticAnalyzer/Core/SValBuilder.cpp:356
                                    QualType ResultTy) {
-  if (!State->isTainted(RHS) && !State->isTainted(LHS))
-    return UnknownVal();
----------------
ddcc wrote:
> zaks.anna wrote:
> > I am concerned that removing the guard will regress performance in the vanilla case. (Note that Z3 support as well as taint are not on by default.)
> > 
> > I am curious how much of the regression you've measured could be gained back if we make this conditional.
> To clarify, the changes made in this patch aren't specific to Z3 support, especially simplifying `SymbolCast` and `IntSymExpr`. With the exception of `PR24184.cpp` and `plist-macros.cpp`, all testcases pass with both the default and Z3 constraint managers. However, creating additional constraints does have performance overhead, and it may be useful to consider the parameters for gating this functionality.
> 
> On a combined execution (Range + Z3) through the testcases, except the two mentioned above, the runtime is 327 sec with this patch applied, and 195 sec without this patch applied. On a separate execution through the testcases with only the Z3 constraint manager, I get runtimes 320 and 191, respectively.
> 
> For testing purposes, I also tried the following code, which has combined runtime 311 sec, but loses the accuracy improvements with the Range constraint manager on `bitwise-ops.c`, `conditional-path-notes.c`, `explain-svals.cpp`, and `std-c-library-functions.c`.
> 
> ```
> ConstraintManager &CM = getStateManager().getConstraintManager();
> if (!State->isTainted(RHS) && !State->isTainted(LHS) && !CM.isZ3())
> ```
Thanks for the explanation!

Regressing the current default behavior is my main concern. By looking at the numbers you provided before (Pre-commit and Post-Commit for RangeConstraintManager), it seems that this patch will introduce a performance regression of about 20%, which is large. But you are pointing out that there are improvements to the modeling as well and those are captured by the updated tests.

Here are a couple of ideas that came into mind on gaining back performance for the RangeConstraintManager case: 
 - Can we drop computing these for some expressions that we know the RangeConstraintManager will not utilize?
 - We could implement the TODO described below and possibly also lower the MaxComp value. This means that instead of keeping a complex expression and constraints on every symbol used in that expression, we would conjure a new symbol and associate a new constrain derived from the expression with it. (Would this strategy also work for the Z3 case?)

I have to point out that this code is currently only used by taint analysis, which is experimental and the current MaxComp value is targeted at that. We would probably need to refine the strategy here if we want to apply this logic to a general case. It's possible that different MaxComp should be used for different cases.

It would be valuable to run this on real code and measure how the number of reports we get differs depending on these values.

https://reviews.llvm.org/D28953