[cfe-dev] [StaticAnalysis] Determine dereference values

Rafael Stahl via cfe-dev cfe-dev at lists.llvm.org
Thu Jul 27 08:51:11 PDT 2017


Hello

We are looking into using the clang front-end for static analysis.

The goal is to find memory accesses on the source code level whose 
addresses can be statically determined or constrained. This should work 
across functions and even translation units.

Example:
main.c:
     void access(int* p);  // defined in other.c

     int main() {
       for (int i = 0; i < 4; i++)
         access(((int*)0x1234) + i);  // pass 0x1234, 0x1238, 0x123c, 0x1240
       access(*(int**)0x4444);  // pass statically unknown value
     }
other.c:
     void access(int* p) {
       // Want output: read at addr
       // (0x1634|0x1638|0x163c|0x1640|unknown) from clang::Expr*.
       ((volatile int*)p)[0x100];
     }
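
The addresses in the wanted output follow from C pointer arithmetic: 
indexing an int* by 0x100 adds 0x100 * sizeof(int) bytes to the pointer. 
A minimal sketch of that arithmetic, assuming a 4-byte int as on our 
target. The helper name accessed_addr is hypothetical and only serves as 
an illustration:

```c
#include <stdint.h>

/* Hypothetical helper: byte address touched by ((volatile int*)p)[index].
 * Assumes sizeof(int) == 4, which holds on our target but is not
 * guaranteed by the C standard. */
uint32_t accessed_addr(uint32_t base, uint32_t index) {
    const uint32_t int_size = 4u;  /* assumed size of int in bytes */
    return base + index * int_size;
}
```

With base 0x1234 and index 0x100 this gives 0x1634, the first of the 
four known addresses above.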

The clang StaticAnalysis library does a lot of the work we are 
interested in. That is, it determines which values an expression is 
constrained to by modeling stores and loads and running a symbolic 
execution engine.

How scalable is this approach? Even though we would require inter-TU 
analysis, the problem could be reduced by only looking at accesses that 
have the volatile qualifier, since we are looking at hardware accesses of 
a bare-metal program. Some retries without inlining are fine, because we 
assume that the code between a constant and the access that uses it is 
not significantly complex.

Will this be decently reliable? We are interested in cases where a 
constant is carried through a couple of loops with small bounds and a 
bit of arithmetic. What are typical cases where the engine gives up 
because of exploding complexity? I have found that loops are explored 
only in a very limited scope. Is there an easy way to relax these limits 
a bit at the cost of much higher execution time?

I noticed the engine does not take the value of a file scoped constant 
pointer "T* const" into account. Is there a technical limitation that 
prevents doing this?
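
To make the case concrete, here is a minimal sketch of the pattern I 
mean. The names REG_ADDR, reg, read_reg and reg_address are hypothetical, 
and the address is simply reused from the example above. The cast of an 
integer constant to a pointer in a static initializer is accepted by 
clang and gcc, but it is implementation-defined:

```c
#include <stdint.h>

/* Illustrative address, reused from the example above. */
#define REG_ADDR ((uintptr_t)0x1234)

/* File-scope "T* const": the pointer itself can never be reassigned. */
static volatile int *const reg = (volatile int *)REG_ADDR;

/* Hypothetical accessor: here we would like a read at 0x1234 reported.
 * Only meaningful on the bare-metal target, do not call it on a host. */
int read_reg(void) {
    return *reg;
}

/* Returns the address the pointer holds, without dereferencing it. */
uintptr_t reg_address(void) {
    return (uintptr_t)reg;
}
```

Since reg cannot change after initialization, I would expect its value 
to be known at the dereference in read_reg.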

I also tried to hack a bit on the DereferenceChecker and DivZeroChecker 
to try and get the symbolic or even concrete value of a Loc, but only 
got the initial value and not the value it should have at the 
dereference. When plotting a graph from a source that does basic 
arithmetic on a pointer, the expression value never changes. It seems to 
me that symbolic values of Locs are not fully tracked. Is this true and 
is there a way to fully track them?
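
A minimal sketch of the kind of source I mean by basic arithmetic on a 
pointer. The function name load_third is hypothetical and this is not 
the exact file I tested with:

```c
/* Hypothetical reproducer: the pointer is initialized from base and then
 * advanced before the dereference. */
int load_third(int *base) {
    int *p = base;  /* initial value of the pointer */
    p = p + 2;      /* basic pointer arithmetic */
    return *p;      /* the dereference should see base + 2 elements */
}
```

At the dereference I would expect the value of p to be base plus two 
elements, not the value it was initialized with.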

A backwards data-flow analysis at the IR level is probably a more 
reasonable approach in general, but getting the exact clang::Expr that 
does the access is valuable to us.

Overall, is this problem reasonably solvable with clang static analysis? 
Any feedback is greatly appreciated!

Best Regards
Rafael



