[PATCH] D28952: [analyzer] Add new Z3 constraint manager backend

Tue Jan 24 12:33:50 PST 2017

ddcc added a comment.

> - That said, I think there are still significant optimization opportunities. It looks like when performing a query you create a new solver for each set of constraints. My understanding (and perhaps @nlopes can correct me here) is that these days Z3 is quite good at incremental solves so you may be leaving a lot of performance on the table. For example, in `getSymVal()` you create one solver to query for a satisfying assignment and then later create a completely new one to determine whether it is the only satisfying assignment. Since those two queries share all of their conjuncts but one it might be faster to reuse the first solver and add the new assertion for the second. Similarly, since analyzer exploration is a DFS, you could imagine building up the path condition incrementally. Z3 has solver APIs for pushing and popping assertions. (Again, @nlopes may have a better idea of whether this would pay off in practice.)

I agree that the performance is the main problem, and that there are still a lot of performance optimizations available. I've separated the Z3 solver and model into separate classes, and reuse the solver now in `getSymVal()` and `checkNull()`. I looked at using the push/pop approach briefly when I started writing the solver, but since the static analyzer itself seems to have an internal worklist, I wasn't sure if the state traversal order is deterministic and predictable, so I didn't try to reuse a single solver across all program states. This is the new timing:

Z3ConstraintManager (built-in, previous changes and solver state reuse): 202.37 sec

> - It would be good to measure the increased peak memory usage with the Z3 constraint solver. The analyzer is often used as part of an IDE and so it needs to be able to run in memory constrained environments (such as laptops). Further, since multiple invocations of clang are often run simultaneously, memory is often a more precious resource than CPU time.

On the following testcases, the memory usage increase is around 18%. Using GNU time, measuring the max RSS in kbytes:
edges-new.mm (RangeConstraintManager): 45184
edges-new.mm (Z3ConstraintManager): 53568
malloc-plist.c (RangeConstraintManager): 40264
malloc-plist.c (Z3ConstraintManager): 47888

> - As @a.sidorin noted, there is a significant test and maintenance cost to keeping two constraint managers around. Since the testing matrix would double, anything we can do to limit the need to modify/duplicate tests would be a huge savings. Is it possible to use lit substitution to call the analyzer twice for each run line: once with the range constraint manager and once with z3 (for builds where z3 is requested)? This, combined with automatically adding the #defines that @NoQ suggested would provide a mechanism to share most tests between the constraint managers.

I think this is reasonable, the only remaining question is how to identify when lit should call the analyzer twice, e.g. match on patch against `test/Analysis`? Also, note that the argument passed to clang (`-Xclang -analyzer-constraints=z3` vs. `-analyzer-constraints=z3`) need to change depending on whether the actual compiler or just the compiler driver is invoked.

> - Right now the CMake uses find_package and builds with Z3 if it is found to be installed on the building system. I think this is too aggressive. It would be better to have the build explicitly opt in to using Z3. Without this, a user could accidentally build a clang that dynamically links to their local Z3 but then fails to launch with a load error when distributed. Similarly, it would be good to be able to explicitly set the the location of the Z3 to be used at build time for systems with multiple Z3s installed.

Done.

https://reviews.llvm.org/D28952