[cfe-dev] Questions about symbolic execution in the static analyzer.
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Sun Jun 6 19:28:58 PDT 2021
On 6/4/21 12:09 PM, Kriso, Achim via cfe-dev wrote:
> Hello,
>
> I am researching various static analyzers and want to get a better understanding of how the symbolic execution in the clang static analyzer works.
>
> Every implementation of symbolic execution has to make certain tradeoffs to be feasible.
>
> I'm interested in the following issues:
>
> How does the static analyzer approach the path explosion problem?
> In this [1] presentation it is mentioned that there are budgets which limit the exploration after visiting the same node multiple times.
> Are there any additional techniques to alleviate path explosion?
Yes, basically a number of different deterministic budgets and limits. A
budget for the total number of nodes (the closest thing we have to a
hard time limit), a budget for visiting a CFG block too many times
(which sounds like what you're describing), a limit for callee function
size and a recursion budget in interprocedural analysis, etc. One of my
"favorite" limits is the rule to not consider a function for
interprocedural analysis if it hit a budget last time we tried it; this
one even works across otherwise unrelated symbolic execution entry
points (within the same Clang instance, of course) (in hindsight, this
was probably a bad idea; we're considering to replace it with something
more predictable).
> Secondly, I'm am wondering how it deals with symbolic values as indices for arrays (or symbolic pointer arithmetic).
> I have read the paper about its memory model [2], but it doesn't seem to address this question.
> All the approaches to the problem I know rely either on address concretization, creating a new path for every possible index or using a more powerful constraint solver (for example using theory of arrays).
An access to array through symbolic index produces a symbol that
remembers, as part of its identity, that it was obtained by loading from
*that* array by *that* symbolic index.
This is where support basically ends :) This guarantees that loading
twice with the same symbolic index will produce the same value but we
don't yet properly write down constraints over the new symbol in terms
of constraints over array elements and constraints over the index value.
This probably leads to bugs / false positives but I haven't seen much of
those in practice (even though I can easily craft one). If we were to
improve upon this, we'd probably go ahead with writing down a constraint
like
a[i] == a[1] || a[i] == a[2] || a[i] == a[3]
without splitting the execution path. And then hope that the solver will
handle that.
Generally, our experience with powerful constraint solvers is relatively
interesting. We have our custom constraint solver that's extremely
primitive and barely correct but extremely fast. Replacing it with a
full-featured SMT solver causes an unacceptable slowdown (like 20x or
so). Our overall future direction is to go for cross-checking instead:
use our fast solver for discovering bugs, and then invoke a stronger
solver only when a potential bug is discovered. We have this implemented
(with Z3 as a motivating example) and it was shown to noticeably improve
precision without causing any noticeable slowdowns. One reason this
isn't enabled for all users is because nobody is ready to contribute
enough manpower to take responsibility of keeping Z3 running in practice
- say, if there's a bug in Z3 that causes the static analyzer to crash,
it may harm our users way more than the false positives it eliminates,
and we'll most likely be unable to address it quickly enough. Also
licensing concerns and usual LLVM external dependencies concerns.
If we actually had a chance to ship a strong SMT solver, we could
totally take advantage of its array theory support as part of the
cross-check pass.
> I hope you can answer my questions.
>
> Sincerely,
> Achim Kriso
>
> [1] https://www.youtube.com/watch?v=4n3l-ZcDJNY
> [2] http://lcs.ios.ac.cn/~xzx/memmodel.pdf
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list