[cfe-commits] [PATCH] A tentative implementation of RegionStoreManager

Mon Oct 6 21:52:07 PDT 2008

On Oct 6, 2008, at 7:16 PM, Zhongxing Xu wrote:

> On Mon, Oct 6, 2008 at 11:18 PM, Ted Kremenek <kremenek at apple.com>  
> wrote:
>
> On Oct 5, 2008, at 6:28 PM, Zhongxing Xu wrote:
>
>> My either idea was to have regions encode minimal information that  
>> could be shared amongst different implementations of StoreManager.   
>> I'm not really certain why you wish to add a VarDecl* "pointedBy"  
>> field into AnonTypedRegion?  I didn't get around to commenting this  
>> class, but AnonTypedRegion is meant to represent a typed chunk of  
>> memory; it doesn't have to be pointed by a VarDecl.   It also seems  
>> to me that you are using AnonTypedRegion exactly the way BasicStore  
>> uses VarRegion.  Isn't it the same thing?  The "Anon" means  
>> anonymous; it means there is no name associated with this region.
>>
>> I use VarDecl* to differentiate AnonTypedRegions. For example, for
>>
>> void foo(char* a, char* b) {...},
>>
>> 'a' and 'b' both points to an AnonTypedRegion with type 'char', I  
>> just use the VarDecl's of 'a' and 'b' to differentiate these two  
>> regions. This is definitely not optimal design, because there might  
>> be AnonTypedRegion that are not pointed to by any variable. So it  
>> should be discussed. But one thing is sure: we need something to be  
>> associated with AnonTypedRegion besides its type and superregion.
>
> I understand, but this particular use of AnonTypedRegions is exactly  
> the same as VarRegion.  Why not just use VarRegion instead and save  
> the extra QualType?  It also avoids putting the VarDecl* in  
> AnonTypedRegion, since at that point the region is not anonymous.
>
> In RegionStoreManager, I assume pointer parameters points to some  
> anonymous memory region at the beginning of the function. So this  
> AnonTypedRegion is the the region that the parameter points to, not  
> the memory region associated with the parameter itself. So for a  
> function parameter 'char *a', actually I created two regions: one is  
> a VarRegion with VarDecl of 'a' (the region 'R' in my patch), this  
> is the region with 'a' itself. The other is an AnonTypedRegion (the  
> region 'PR' in my patch). This is the region that is pointed to by  
> 'a' (by assumption). This region is really an anonymous region, for  
> we don't know where 'a' points to when we setup the initial store  
> for the function. (In the future interprocedural analysis, we might  
> know.) And to differentiate AnonTypedRegions pointed-at by different  
> parameters, I associate with them the VarDecl of the parameter  
> pointing-at them. Maybe we can have a subclass of AnonTypedRegion to  
> represent such memory region pointed-at by function parameters, and  
> give them VarDecl or other information to differentiate from each  
> other.

Hi Zhongxing,

Thanks for clarifying.  That makes a lot more sense to me.

I just saw your other email where you submitted an alternate patch,  
but I'll mention some initial thoughts I have here.  First, I think  
having an AnonPointeeRegion makes sense then overloading the purpose  
of AnonTypedRegion.  What you are trying to represent is the "symbolic  
address" of the parameters and global variables that are pointers upon  
entree of the function.  One question comes to mind is whether or not  
this binding is done up front or lazily.  For example

void foo(int **p) { ... }

In your patch the value of "p" upon entry to the function would be an  
address binding to an AnonTypedRegion/AnonPointeeRegion.  What about  
*p?  How many levels deep does it go?  I don't have a good answer, but  
binding "symbolic" addresses lazily might be more flexible, dynamic,  
and scalable.

I have seen in the implementations of some static analysis systems  
that they bound the "deepness" of the heap.  For example:

  q->x->y might have an explicit region for the field 'y'

but

   q->x->y->w might just have an "unknown" for w, just to bound the  
abstraction.

This question becomes particularly important when one considers  
recursive data structures.  There doesn't have to be a one-size-fits- 
all solution; it's just something to consider.

Here's another (random) thought.  Consider:

int (int* p, int *q) {
   ...
   if (*p > 10) { ... }
   ...
   if (*q < 20) { ... }
   ...
   if (p == q) { ...  }
   ...
}

What happens to the store when 'p == q'?  Do we do alpha-renaming +  
unification of the regions and bindings?  I.e., do all the values and  
constraints for the region and the values they map to get combined?   
When we start reasoning about abstract memory, these are things we  
want to consider in the design.  Again, we don't have to solve all  
problems at once, but this is something fairly fundamental.

>
> It seems to me that RegionStoreManager doesn't need to use just one  
> kind of region and one ImmutableMap.  For example, a map from  
> VarDecl* -> VarRegion could be used for mappings from variables to  
> regions, and then other maps could be used for other bindings.  For  
> example, a (MemRegion*, FieldDecl*) -> FieldRegion could be used for  
> field bindings.  A second set of mappings could then be used for  
> region -> value bindings.  In your prototype implementation you have  
> AnonTypedRegion* -> RVal, but one could also just have MemRegion* ->  
> RVal.
>
> My thought is that we don't need any mappings from Decl* to  
> MemRegion*. We only need one mapping from MemRegion* to its stored  
> value RVal. Because once we have the mapping MemRegion* -> RVal, we  
> can calculate the location (MemRegion*) of any name on the fly. So  
> we don't need to store them.

Unfortunately I don't believe that's true (which counteracts something  
I said in an earlier private email).

Consider:

int *p = 0;

for (int i = 0; i < 10; i++) {
   if (p) *p++;
   int j = i + 1;
   p = &j;
}

On each iteration of the loop the VarDecl for 'j' will conceptually  
bind to a different region (although by coincidence it may bind to the  
same physical memory, logically it binds to a different "object").   
While memory bindings for globals and parameters stay fixed during the  
execution of a function (with globals staying always fixed), bindings  
for local variables do not.  If we want to catch cases like the *p++  
being a use of invalid memory, we actually have to consider the case  
where the same VarDecl can bind to different regions.  Note that we do  
not capture enough scope information in the CFGs to do this analysis  
right now, but that is something we probably we will add to the CFGs  
in the future (especially if we wish to model implicit calls to  
destructors for C++ objects).

> For a struct member expression 'p->data', we can first get the  
> superRegion by following p's MemRegion in the store mapping, then  
> get the final region by composite the FieldDecl of 'data' with the  
> MemRegion that p points at.

Would the idea be to represent p->data (the composition) with a  
FieldRegion (with its super region being the region for 'p'), or  
something else?

> In summary, we only need to store the Store ( mapping from  
> MemRegion* to RVal), but not the Environment (mapping from names to  
> MemRegion*).

Given the example I provided above, do you still think that is the  
case?  It's certainly simpler if we don't have to model the mapping  
from names to MemRegion*.

Ted
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20081006/715c2cac/attachment.html>