[cfe-commits] [PATCH] A tentative implementation of RegionStoreManager

Mon Oct 6 23:49:00 PDT 2008

On Tue, Oct 7, 2008 at 12:52 PM, Ted Kremenek <kremenek at apple.com> wrote:

>
> On Oct 6, 2008, at 7:16 PM, Zhongxing Xu wrote:
>
> On Mon, Oct 6, 2008 at 11:18 PM, Ted Kremenek <kremenek at apple.com> wrote:
>
>>
>> On Oct 5, 2008, at 6:28 PM, Zhongxing Xu wrote:
>>
>> My either idea was to have regions encode minimal information that could
>>> be shared amongst different implementations of StoreManager.  I'm not really
>>> certain why you wish to add a VarDecl* "pointedBy" field into
>>> AnonTypedRegion?  I didn't get around to commenting this class, but
>>> AnonTypedRegion is meant to represent a typed chunk of memory; it doesn't
>>> have to be pointed by a VarDecl.   It also seems to me that you are using
>>> AnonTypedRegion exactly the way BasicStore uses VarRegion.  Isn't it the
>>> same thing?  The "Anon" means anonymous; it means there is no name
>>> associated with this region.
>>>
>>
>> I use VarDecl* to differentiate AnonTypedRegions. For example, for
>>
>> void foo(char* a, char* b) {...},
>>
>> 'a' and 'b' both points to an AnonTypedRegion with type 'char', I just use
>> the VarDecl's of 'a' and 'b' to differentiate these two regions. This is
>> definitely not optimal design, because there might be AnonTypedRegion that
>> are not pointed to by any variable. So it should be discussed. But one thing
>> is sure: we need something to be associated with AnonTypedRegion besides its
>> type and superregion.
>>
>>
>> I understand, but this particular use of AnonTypedRegions is exactly the
>> same as VarRegion.  Why not just use VarRegion instead and save the extra
>> QualType?  It also avoids putting the VarDecl* in AnonTypedRegion, since at
>> that point the region is not anonymous.
>>
>
> In RegionStoreManager, I assume pointer parameters points to some anonymous
> memory region at the beginning of the function. So this AnonTypedRegion is
> the the region that the parameter points to, not the memory region
> associated with the parameter itself. So for a function parameter 'char *a',
> actually I created two regions: one is a VarRegion with VarDecl of 'a' (the
> region 'R' in my patch), this is the region with 'a' itself. The other is an
> AnonTypedRegion (the region 'PR' in my patch). This is the region that is
> pointed to by 'a' (by assumption). This region is really an anonymous
> region, for we don't know where 'a' points to when we setup the initial
> store for the function. (In the future interprocedural analysis, we might
> know.) And to differentiate AnonTypedRegions pointed-at by different
> parameters, I associate with them the VarDecl of the parameter pointing-at
> them. Maybe we can have a subclass of AnonTypedRegion to represent such
> memory region pointed-at by function parameters, and give them VarDecl or
> other information to differentiate from each other.
>
>
> Hi Zhongxing,
>
> Thanks for clarifying.  That makes a lot more sense to me.
>
> I just saw your other email where you submitted an alternate patch, but
> I'll mention some initial thoughts I have here.  First, I think having an
> AnonPointeeRegion makes sense then overloading the purpose of
> AnonTypedRegion.  What you are trying to represent is the "symbolic address"
> of the parameters and global variables that are pointers upon entree of the
> function.  One question comes to mind is whether or not this binding is done
> up front or lazily.  For example
>
> void foo(int **p) { ... }
>
> In your patch the value of "p" upon entry to the function would be an
> address binding to an AnonTypedRegion/AnonPointeeRegion.  What about *p?
>  How many levels deep does it go?  I don't have a good answer, but binding
> "symbolic" addresses lazily might be more flexible, dynamic, and scalable.
>
> I have seen in the implementations of some static analysis systems that
> they bound the "deepness" of the heap.  For example:
>
>  q->x->y might have an explicit region for the field 'y'
>
> but
>
>   q->x->y->w might just have an "unknown" for w, just to bound the
> abstraction.
>
> This question becomes particularly important when one considers recursive
> data structures.  There doesn't have to be a one-size-fits-all solution;
> it's just something to consider.
>
> Here's another (random) thought.  Consider:
>
> int (int* p, int *q) {
>   ...
>   if (*p > 10) { ... }
>   ...
>   if (*q < 20) { ... }
>   ...
>   if (p == q) { ...  }
>   ...
> }
>
> What happens to the store when 'p == q'?  Do we do alpha-renaming +
> unification of the regions and bindings?  I.e., do all the values and
> constraints for the region and the values they map to get combined?  When we
> start reasoning about abstract memory, these are things we want to consider
> in the design.  Again, we don't have to solve all problems at once, but this
> is something fairly fundamental.
>

These are all good points to consider. My suggestion is that we add
complexity step by step. In this first basic region store model (see the
attached new patch), we assume simple cases: we assume no aliasing for
parameters, we do 1-limiting analysis and assume nothing about the heap
shape. So the only big thing compared to BasicStore is the field
sensitivity.

Then we can design more complex ones: heap store model, alias store model,
...

Another cure that I'm looking forward to is the inter-procedural analysis.
Most complexities above are due to lack of environment information. But when
we do top-down inter-procedural analysis, we will have (at least part of)
function entree information. This can relieve us from assuming the worst
case.

>
>
>> It seems to me that RegionStoreManager doesn't need to use just one kind
>> of region and one ImmutableMap.  For example, a map from VarDecl* ->
>> VarRegion could be used for mappings from variables to regions, and then
>> other maps could be used for other bindings.  For example, a (MemRegion*,
>> FieldDecl*) -> FieldRegion could be used for field bindings.  A second set
>> of mappings could then be used for region -> value bindings.  In your
>> prototype implementation you have AnonTypedRegion* -> RVal, but one could
>> also just have MemRegion* -> RVal.
>>
>
> My thought is that we don't need any mappings from Decl* to MemRegion*. We
> only need one mapping from MemRegion* to its stored value RVal. Because once
> we have the mapping MemRegion* -> RVal, we can calculate the location
> (MemRegion*) of any name on the fly. So we don't need to store them.
>
>
> Unfortunately I don't believe that's true (which counteracts something I
> said in an earlier private email).
>
> Consider:
>
> int *p = 0;
>
> for (int i = 0; i < 10; i++) {
>   if (p) *p++;
>   int j = i + 1;
>   p = &j;
> }
>
> On each iteration of the loop the VarDecl for 'j' will conceptually bind to
> a different region (although by coincidence it may bind to the same physical
> memory, logically it binds to a different "object").  While memory bindings
> for globals and parameters stay fixed during the execution of a function
> (with globals staying always fixed), bindings for local variables do not.
>  If we want to catch cases like the *p++ being a use of invalid memory, we
> actually have to consider the case where the same VarDecl can bind to
> different regions.  Note that we do not capture enough scope information in
> the CFGs to do this analysis right now, but that is something we probably we
> will add to the CFGs in the future (especially if we wish to model implicit
> calls to destructors for C++ objects).
>

Why in this example 'j' will bind to a different region on each iteration?
In actual (physical) execution of this program, 'j' just holds the same
stack memory. It's the same as the one declared outside of the loop, except
its scope is limited inside the loop. Instead, if we bind it to different
region on each iteration,  we will get wrong results, because it's
inconsistent with the semantics of C.

> For a struct member expression 'p->data', we can first get the superRegion
> by following p's MemRegion in the store mapping, then get the final region
> by composite the FieldDecl of 'data' with the MemRegion that p points at.
>
>
> Would the idea be to represent p->data (the composition) with a FieldRegion
> (with its super region being the region for 'p'), or something else?
>

Yes, exactly.

>
> In summary, we only need to store the Store ( mapping from MemRegion* to
> RVal), but not the Environment (mapping from names to MemRegion*).
>
>
> Given the example I provided above, do you still think that is the case?
>  It's certainly simpler if we don't have to model the mapping from names to
> MemRegion*.
>

Could you show me an example where the same VarDecl should bind to different
region during one analysis path?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20081007/3c1e1913/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: regionstore.patch
Type: application/octet-stream
Size: 7939 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20081007/3c1e1913/attachment.obj>