[cfe-dev] Questions about symbolic execution in the static analyzer.

Sun Jun 6 19:28:58 PDT 2021

On 6/4/21 12:09 PM, Kriso, Achim via cfe-dev wrote:
> Hello,
>
> I am researching various static analyzers and want to get a better understanding of how the symbolic execution in the clang static analyzer works.
>
> Every implementation of symbolic execution has to make certain tradeoffs to be feasible.
>
> I'm interested in the following issues:
>
> How does the static analyzer approach the path explosion problem?
> In this [1] presentation it is mentioned that there are budgets which limit the exploration after visiting the same node multiple times.
> Are there any additional techniques to alleviate path explosion?

Yes, basically a number of different deterministic budgets and limits. A 
budget for the total number of nodes (the closest thing we have to a 
hard time limit), a budget for visiting a CFG block too many times 
(which sounds like what you're describing), a limit for callee function 
size and a recursion budget in interprocedural analysis, etc. One of my 
"favorite" limits is the rule to not consider a function for 
interprocedural analysis if it hit a budget last time we tried it; this 
one even works across otherwise unrelated symbolic execution entry 
points (within the same Clang instance, of course) (in hindsight, this 
was probably a bad idea; we're considering to replace it with something 
more predictable).

> Secondly, I'm am wondering how it deals with symbolic values as indices for arrays (or symbolic pointer arithmetic).
> I have read the paper about its memory model [2], but it doesn't seem to address this question.
> All the approaches to the problem I know rely either on address concretization, creating a new path for every possible index or using a more powerful constraint solver (for example using theory of arrays).

An access to array through symbolic index produces a symbol that 
remembers, as part of its identity, that it was obtained by loading from 
*that* array by *that* symbolic index.

This is where support basically ends :) This guarantees that loading 
twice with the same symbolic index will produce the same value but we 
don't yet properly write down constraints over the new symbol in terms 
of constraints over array elements and constraints over the index value.

This probably leads to bugs / false positives but I haven't seen much of 
those in practice (even though I can easily craft one). If we were to 
improve upon this, we'd probably go ahead with writing down a constraint 
like

   a[i] == a[1] || a[i] == a[2] || a[i] == a[3]

without splitting the execution path. And then hope that the solver will 
handle that.

Generally, our experience with powerful constraint solvers is relatively 
interesting. We have our custom constraint solver that's extremely 
primitive and barely correct but extremely fast. Replacing it with a 
full-featured SMT solver causes an unacceptable slowdown (like 20x or 
so). Our overall future direction is to go for cross-checking instead: 
use our fast solver for discovering bugs, and then invoke a stronger 
solver only when a potential bug is discovered. We have this implemented 
(with Z3 as a motivating example) and it was shown to noticeably improve 
precision without causing any noticeable slowdowns. One reason this 
isn't enabled for all users is because nobody is ready to contribute 
enough manpower to take responsibility of keeping Z3 running in practice 
- say, if there's a bug in Z3 that causes the static analyzer to crash, 
it may harm our users way more than the false positives it eliminates, 
and we'll most likely be unable to address it quickly enough. Also 
licensing concerns and usual LLVM external dependencies concerns.

If we actually had a chance to ship a strong SMT solver, we could 
totally take advantage of its array theory support as part of the 
cross-check pass.

> I hope you can answer my questions.
>
> Sincerely,
> Achim Kriso
>
> [1] https://www.youtube.com/watch?v=4n3l-ZcDJNY
> [2] http://lcs.ios.ac.cn/~xzx/memmodel.pdf
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev