[cfe-dev] [PATCH] expand a symbolic pointer

Wed Dec 10 21:54:36 PST 2008

On Dec 10, 2008, at 8:06 PM, Zhongxing Xu wrote:

> Thanks for your comments. They were helpful. Now the interfaces get  
> simplified a lot. Variables and functions are commented well.
>
> Here I explain the main idea.
>
> Some struct base is not mutable, e.g., struct s a; a.f = 3; Here the  
> struct base 'a' is not mutable. But some can, e.g.,
>
> void foo(struct s* p) {
>   p->f = 3
> }
>
> When we getInitialStore(), p is initialized to have value  
> loc::SymbolVal, which I call a symbolic pointer. (Note: here no  
> symbolic region is involved).
>
> In this patch we try to concretize the struct pointer p. We have  
> several preconditions for this concretization:
>  - the struct base is a pointer
>  - the pointer is symbolic
>  - the pointer is an lvalue, so that it can be modified.
>
> When concretizing the struct base pointer, we pass the pointer's  
> location to StoreManager. Therefore the pointer should be an lvalue  
> and has a location. We get this location by calling
> GRExprEngine::getBasePtrLoc(). This method only handles a few cases.  
> In other cases, BasePtrLoc is left Undefined intentionally.

Hi Zhongxing,

The patch is looking great, but I still have some concerns about the  
overall approach.  Consider your example with one modification:

void foo(struct s* p) {
   struct s* q = p;
   p->f = 3
}

After evaluating 'p->f' p will bind to an AnonPointeeRegion, but 'q'  
will bind to the original symbol.  They are aliases.  It gets  
arbitrarily worse if the value of 'p' is stored in other places,  
passed to functions, etc.

This overall approach feels a little "destructive."  We're losing  
information here.  After evaluating 'p->f' we've lost the fact that  
'p' *still* binds to the value at the entry to the function.

I see this as being fundamental problematic for four reasons:

(1) Different analyses may attach information (perhaps using the GDM)  
to a symbolic value.  Actually replacing a symbolic value with  
something "concrete" loses that information.

(2) Consider function summaries.  We don't have them yet, but   
basically we would build up a set of constraints and effects in terms  
of the input to the function.  That input is the symbolic values.   
Even if we "concretize" them we need to keep them around.

(3) Aliasing (i.e., the example I gave).  Concretization via replacing  
loses this critical information.

(4) The BugReporter "module" needs to reconstruct information from the  
ExplodedGraph, including when aliasing relationships come into effect,  
etc.  Blowing away the original value of 'p' makes this difficult at  
best.

The only way to replace a symbolic value with a concrete value is to  
alpha-rename it throughout the entire state.  This isn't trivial,  
especially when one considers the GDM.

I honestly think we need to take a "layered" approach.  Keep a map on  
the side from "symbols" to concrete values.  That's basically what we  
do in ConstraintManager when it comes to equality relationships with  
integers.  Equality relationships with integers is a form of  
concretizations; that's all were basically doing here, except we're  
reasoning about the concretization of the heap.

What do you think?  I have specific comments about your patch (which  
is looking great!) but I will defer them until we iterate more on this  
design point.

Ted