[PATCH] D28952: [analyzer] Add new Z3 constraint manager backend

Tue Jan 24 13:09:51 PST 2017

dcoughlin added a comment.

In https://reviews.llvm.org/D28952#655278, @ddcc wrote:

> > - That said, I think there are still significant optimization opportunities. It looks like when performing a query you create a new solver for each set of constraints. My understanding (and perhaps @nlopes can correct me here) is that these days Z3 is quite good at incremental solves so you may be leaving a lot of performance on the table. For example, in `getSymVal()` you create one solver to query for a satisfying assignment and then later create a completely new one to determine whether it is the only satisfying assignment. Since those two queries share all of their conjuncts but one it might be faster to reuse the first solver and add the new assertion for the second. Similarly, since analyzer exploration is a DFS, you could imagine building up the path condition incrementally. Z3 has solver APIs for pushing and popping assertions. (Again, @nlopes may have a better idea of whether this would pay off in practice.)
>
> I agree that the performance is the main problem, and that there are still a lot of performance optimizations available. I've separated the Z3 solver and model into separate classes, and reuse the solver now in `getSymVal()` and `checkNull()`. I looked at using the push/pop approach briefly when I started writing the solver, but since the static analyzer itself seems to have an internal worklist, I wasn't sure if the state traversal order is deterministic and predictable, so I didn't try to reuse a single solver across all program states. This is the new timing:
>
> Z3ConstraintManager (built-in, previous changes and solver state reuse): 202.37 sec

The improvement here (200s vs. 250s) makes me think that some form of caching/incrementalization could pay off.

> On the following testcases, the memory usage increase is around 18%. Using GNU time, measuring the max RSS in kbytes:
>  edges-new.mm (RangeConstraintManager): 45184
>  edges-new.mm (Z3ConstraintManager): 53568
>  malloc-plist.c (RangeConstraintManager): 40264
>  malloc-plist.c (Z3ConstraintManager): 47888

That's not as bad as I feared!

> 
> 
>> - As @a.sidorin noted, there is a significant test and maintenance cost to keeping two constraint managers around. Since the testing matrix would double, anything we can do to limit the need to modify/duplicate tests would be a huge savings. Is it possible to use lit substitution to call the analyzer twice for each run line: once with the range constraint manager and once with z3 (for builds where z3 is requested)? This, combined with automatically adding the #defines that @NoQ suggested would provide a mechanism to share most tests between the constraint managers.
> 
> I think this is reasonable, the only remaining question is how to identify when lit should call the analyzer twice, e.g. match on patch against `test/Analysis`? Also, note that the argument passed to clang (`-Xclang -analyzer-constraints=z3` vs. `-analyzer-constraints=z3`) need to change depending on whether the actual compiler or just the compiler driver is invoked.

We could add a new lit substitution '%clang_cc1_analyze' that expands to '%clang_cc1 -analyze -analyzer-constraints=foo -DANALYZER_HAS_CONSTRAINT_MANAGER_FOO" and change the analyzer tests to use it.

My sense is that most of the tests that legitimately call the driver directly are testing functionality that should be independent of the constraint manager. (Though there are some that call the driver for no good reason -- these should be easy to rewrite the RUN line).

https://reviews.llvm.org/D28952