[cfe-dev] [StaticAnalysis] Determine dereference values
Rafael Stahl via cfe-dev
cfe-dev at lists.llvm.org
Thu Jul 27 08:51:11 PDT 2017
Hello
We are looking into using the clang front-end for static analysis.
The goal is to find memory accesses on the source code level whose
addresses can be statically determined or constrained. This should work
across functions and even translation units.
Example:
main.c:
void access(int *p); // defined in other.c

int main() {
    for (int i = 0; i < 4; i++)
        access(((int*)0x1234) + i); // pass 0x1234, 0x1238, 0x123c, 0x1240
    access(*(int**)0x4444); // pass statically unknown value
}

other.c:
void access(int* p) {
    // Want output: read at addr (0x1634|0x1638|0x163c|0x1640|unknown)
    // from clang::Expr*.
    ((volatile int*)p)[0x100];
}
The clang StaticAnalysis library does a lot of the work we are
interested in. That is, determining what values an expression is
constrained to, while understanding stores, loads and running a symbolic
execution engine.
How scalable is this approach? Even though we would require inter-TU
analysis, the problem could be reduced by only looking at accesses that
have the volatile qualifier, since we are looking at hardware accesses
of a bare-metal program. Some retries without inlining are fine, because
we assume the accesses are not separated from the constant by
significant complexity.
Will this be decently reliable? We are interested in cases where a
constant is carried across a couple of loops with small bounds and a bit
of arithmetic. What are typical cases where the engine gives up because
of exploding complexity? I have found that loops are only explored to a
very limited extent. Is there an easy way to relax these limits a bit at
the cost of much higher execution time?
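
A minimal sketch of the kind of case meant above, assuming a made-up
base address and a hypothetical helper named reg_addr (neither is taken
from a real code base). For a constant argument the returned address is
statically determined, but only if both loops are followed to the end:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base address, chosen only for illustration. */
#define REG_BASE ((uintptr_t)0x1234u)

/* Carries a constant across two small, bounded loops with a bit of
 * arithmetic. Addresses are computed as integers, so nothing is
 * dereferenced and the function is safe to run on a host. */
uintptr_t reg_addr(unsigned bank)
{
    assert(bank < 4);                 /* only banks 0..3 are meaningful */
    uintptr_t addr = REG_BASE;
    for (unsigned i = 0; i < bank; i++)
        addr += 0x100;                /* step to the selected bank */
    for (int j = 0; j < 3; j++)
        addr += sizeof(uint32_t);     /* skip three 32-bit registers */
    return addr;
}
```

With these assumed constants, a call such as reg_addr(2) from another
translation unit would ideally be reported as one concrete address
rather than as unknown.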
I noticed the engine does not take the value of a file-scoped constant
pointer "T* const" into account. Is there a technical limitation that
prevents doing this?
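
To make the pattern concrete, here is a small sketch of a file-scoped
"T* const". The name STATUS_REG and the address 0x40001000 are
assumptions made up for this illustration:

```c
#include <stdint.h>

/* Hypothetical register address, for illustration only. */
static volatile uint32_t *const STATUS_REG =
    (volatile uint32_t *)0x40001000u;

/* On the target this reads address 0x40001000. That concrete value is
 * what the analysis would ideally report for the dereference. */
uint32_t read_status(void)
{
    return *STATUS_REG;
}

/* Returns the address only, without touching memory, so it is safe to
 * call on a host machine. */
uintptr_t status_reg_addr(void)
{
    return (uintptr_t)STATUS_REG;
}
```

Since the pointer itself is const and initialized at file scope, its
value cannot change anywhere in the program, which is why one might
expect the initializer to be usable during analysis.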
I also tried hacking a bit on the DereferenceChecker and DivZeroChecker
to get the symbolic or even concrete value of a Loc, but I only got the
initial value and not the value it should have at the dereference. When
plotting a graph from a source that does basic arithmetic on a pointer,
the expression value never changes. It seems to me that symbolic values
of Locs are not fully tracked. Is this true, and is there a way to fully
track them?
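
A reduced sketch of such a source, assuming a made-up base address
BUF_BASE and hypothetical function names. Like the example at the top,
it relies on the usual bare-metal practice of forming pointers from
integer constants:

```c
#include <stdint.h>

/* Hypothetical base address, for illustration only. */
#define BUF_BASE 0x2000u

/* Basic arithmetic on a pointer before the access. At the dereference
 * the expected address is BUF_BASE + 2 * sizeof(int), not BUF_BASE. */
volatile int *third_slot(void)
{
    volatile int *p = (volatile int *)BUF_BASE;  /* initial value */
    p += 1;
    p += 1;                                      /* value has changed */
    return p;
}

/* Meant for the target only: performs the actual read. */
int read_third_slot(void)
{
    return *third_slot();
}
```

The interesting question is which value a checker sees for the pointer
in read_third_slot: the initial BUF_BASE or the advanced address.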
A backwards data-flow analysis at the IR level is probably a more
reasonable approach in general, but getting the exact clang::Expr that
does the access is valuable to us.
Overall, is this problem reasonably solvable with clang static analysis?
Any feedback is greatly appreciated!
Best Regards
Rafael