[cfe-dev] Some problems about function summary

Wed Apr 27 06:12:44 PDT 2011

Hi, Ted,

I am trying to extend clang static analyzer to support inter-procedural
summary-based analysis. I have implemented a simple method for detecting
inter-procedural memory leaks which still has some problems preventing
doing a precise or even correct analysis.The main idea is recording the
side effects of each function, called SE for short. The structure of SE
is maps from statements to SVals.
For example,  
void f(int *x) {
  x[0] = 1;
}
void g() {
  int a = 0;
  f(&a);
}
The SE of function f is stmt(x[0])->ConcreteInt(1) (Here, x[0] must meet
a condition that the base region of stmt must be a symbolic Region
accessed from formal argument.).

When analyzing function g, we binding the actual argument &a to the
formal argument x(call the method EnterStackFrame() in GRState class.)
and then iterator all of SEs in function f to evaluate the side
effects(Use method ExprEngine::Visit(x[0]) to get the actual SVal or
memory region of x[0] and assign it the Value Concrete(1)).

There are many problems in this method.
1. If the index of x is a local variable ,the analysis is incorrect.For
instance, void f(int *x) { int i = 0; x[i] = 1; }.The actual memory
region of x[i] is unknown.Because the value i in function f is missed in
current summary.
2. If there exists assignments between arguments, the analysis is also
incorrect. For instance, void f(int *x, int *y){ x = y; x[0] = 1; }.
...
...

Obviously, the summary info is insufficient.So I try to explore more
summary info by collecting each function's access paths(AP).
Access paths are symbolic names for memory locations accessed in a
procedural.The structure of access paths is maps from statements to
memory region. We use access paths to handle the transfer from the
called function's statements with formal arguments to the caller's
actual statements. For example,  
void f(int *x, int *y) {
  x = y;
  x[0] = 1;
}
void g(int *a, int *b) {
  f(a, b);
}
The access paths of function f are stmt(x) -> symbolicRegion($x),
stmt(y)->symbolicRegion($y).In order to use access paths, the structure
of SE is adopted as maps from memory region to SVal.In this figure , f's
SE is ElemRegion{SymbolicRegion($y), 0} -> ConcreteInt(1).

While applying the summary info at the function's call site, we should
translate from the called function's memory region in SE to actual
memory region.ElemRegion{SymbolicRegion($y), 0} is translated as
ElemRegion{SymbolicRegion($a), 0},and then assign ConcreteInt(1) to
ElemRegion{SymbolicRegion($a), 0}.

This method is difficult or even infeasible to implement.Because the
translation is not always reasonable and the state data such as memory
region during intra-procedural analysis is not released which causes a
large memory consuming.

So we should define a data structure for summary which is independent of
GRState(or memory region).The most difficult thing is how to obtain the
actual value from the called function's analysis data.Can you give me
some suggestions about the design of function summary?Hope for your
reply!

-- 
Zhenbo Xu