[PATCH] D28952: [analyzer] Add new Z3 constraint manager backend

Thu Mar 30 07:31:46 PDT 2017

delcypher added a comment.

In https://reviews.llvm.org/D28952#655337, @dcoughlin wrote:

> In https://reviews.llvm.org/D28952#655278, @ddcc wrote:
>
> > > - That said, I think there are still significant optimization opportunities. It looks like when performing a query you create a new solver for each set of constraints. My understanding (and perhaps @nlopes can correct me here) is that these days Z3 is quite good at incremental solves so you may be leaving a lot of performance on the table. For example, in `getSymVal()` you create one solver to query for a satisfying assignment and then later create a completely new one to determine whether it is the only satisfying assignment. Since those two queries share all of their conjuncts but one it might be faster to reuse the first solver and add the new assertion for the second. Similarly, since analyzer exploration is a DFS, you could imagine building up the path condition incrementally. Z3 has solver APIs for pushing and popping assertions. (Again, @nlopes may have a better idea of whether this would pay off in practice.)
> >
> > I agree that the performance is the main problem, and that there are still a lot of performance optimizations available. I've separated the Z3 solver and model into separate classes, and reuse the solver now in `getSymVal()` and `checkNull()`. I looked at using the push/pop approach briefly when I started writing the solver, but since the static analyzer itself seems to have an internal worklist, I wasn't sure if the state traversal order is deterministic and predictable, so I didn't try to reuse a single solver across all program states. This is the new timing:
> >
> > Z3ConstraintManager (built-in, previous changes and solver state reuse): 202.37 sec
>
>
> The improvement here (200s vs. 250s) makes me think that some form of caching/incrementalization could pay off.

KLEE has a few optimization ideas that you could consider implementing that certainly help in the context of symbolic execution.

- Constraint independence. This splits a set of constraints into independent constraints and solves them separately. This can be useful for further processing (e.g. increases the likely hood of a cache hit)
- "CounterExampleCache" . The naming is rather unfortunate and confusing (because KLEE works in terms of validity rather than satisfiability). The idea is to cache models for solver queries that are SAT (i.e. maintain a mapping from models to satisfying assignments with a sentinel to indicate when a query does not have a satisfying assignment) and when handling future queries
- If the new query is in the existing cache if it has a model it is SAT otherwise it is UNSAT.
- If the new query is a superset of an existing query in the cache then: If there was no model for the cached query then the super set query is also UNSAT. If there is a model for the original query try substituting it into the query (along with a deterministic values for symbolic values in the new query that aren't in the model) to try to quickly check if the cached model also satisfies the superset. If this fails then call the real solver
- If the new query is a subset of an existing query in the cache then: If there is a model for the cached query then that model will also satisfying the new query. If there is no model then the new query is UNSAT.

KLEE also does very aggressive constant folding and expression canonicalization to try to get more cache hits and have smaller constraints.

https://reviews.llvm.org/D28952