[cfe-commits] r63347 - in /cfe/trunk: include/clang/Analysis/PathSensitive/BasicValueFactory.h include/clang/Analysis/PathSensitive/MemRegion.h include/clang/Analysis/PathSensitive/SVals.h lib/Analysis/MemRegion.cpp lib/Analysis/RegionStore.cpp l

Mon Feb 2 13:21:51 PST 2009

Zhongxing,

I'm going to respond to your comments out-of-order.

On Jan 30, 2009, at 1:36 AM, Zhongxing Xu wrote:

<SNIP>

> It saves much trouble later when p is used as array base.
>
> If we do not make that ElementRegion when doing cast, it is hard to  
> distinguish the case for code:
>
> struct s* p = (struct s*) alloc(sizeof struct s);
> struct s v = p[1];
>
> For this case the region for p should not be used as the base region  
> for element region.
>
> My point is that we make an ElementRegion in the first case and an  
> AnonTypedRegion for the second case, and we can only do the right  
> thing when doing cast, but not later when we getLValueElement for  
> p[1].
>
> Then when we getLValueElement, we abandon (emit warning or retun  
> unknown) when the base region is not an ElementRegion.

I honestly don't think we can have the invariant that getLValueElement  
is always invoked on a MemRegion that is an ElementRegion.  Consider  
the following examples:

Example 1:

int foo(int *x) {
   return x[0];
}

In Example 1, 'x' binds to a loc::SymbolVal.  The region associated  
with that symbol is a SymbolicRegion (let's call it SymR).  How do we  
ensure the an ElementRegion whose base is SymR and index 0 is passed  
to getLValueElement?  Honestly, what does this buy us?  If we pass a  
non-ElementRegion to getLvalueElement, it's a synonym for  
ElementRegion(SymR, 0).

Example 2:

int bar() {
   char x;
   char *p = &x;
   return p[0];
}

In this case we have a VarRegion for x (let's denote that  
VarRegion(x)).  After the second line 'p' binds to  
loc::MemRegionVal(VarRegion(x)).  This value is then passed to  
getLValueElement.  Essentially this is the same as passing  
ElementRegion(VarRegion(x), 0).

I'm sure we can come up with many more cases like this.  What does  
creating an ElementRegion (with base 0) before calling  
getLValueElement buy us?  This logic could certainly be done in  
GRStateManager, just before it calls StoreManager::getLValueElement.   
Maybe that would simplify some things.

Creating an ElementRegion when we do the cast from the return value of  
alloca() to another type doesn't make much sense to me at all.  When  
we see code like the following, 'p' binds to the *entire*  
AnonTypedRegion, not just a specific element:

   char *p = (char*) alloca(1000);

I just semantically adds an unnecessary veneer to have 'p' bind to  
ElementRegion(AnonTypedRegion(AllocaRegion(...)), 0) instead of just  
AnonTypedRegion(...).  I argue that this veneer buys us nothing since  
we have to handle all the other cases I mentioned above anyway in some  
uniform fashion.

>
> As the rabbit hole goes arbitrary deep here, I suggest we abandon  
> early when we getLValueElement for wonky[1]. For this example, I  
> think the region of _wonky_gesticulate_cheese should be the brother  
> but not the parent of the region for wonky[1].

I'm not certain what you mean by the "brother."  We don't have that  
notion (yet) in MemRegion.  Can you elucidate?  What would this buy  
us?  What would be the parent of  
ElementRegion(_wonky_gesticulate_cheese, 1) (i.e., "wonky[1])?

> But when we getLValueElement for wonky[1], we actually use the  
> region of _wonky_gesticulate_cheese as the super region for the  
> element region.

I'm not certain what this dual view buys us.  The variable 'wonky' has  
the region 'VarRegion(wonky)'.  That region binds to the location of  
_wonky_gesticulate_cheese.  The lvalue of 'wonky[1]' evaluates to  
ElementRegion(_wonky_gesticulate_cheese, 1).  Wouldn't the parent of  
the later just be _wonky_gesticulate_cheese in all cases?

> Then it goes to my original argument for guaranteeing all base  
> regions passed to getLValueElement() to be ElementRegions, since we  
> do not have enough information in getLValueElement() for us to make  
> the right decision.

I feel like I'm missing something here.  What information are we  
missing to make the right decision?

Put another way, what problem are you trying to solve by imposing the  
precondition to getLValueElement that it is always passed an  
ElementRegion?  As I said earlier, we can enforce this precondition by  
adding some wrapper logic in GRStateManager, but all that would be is  
code refactoring.  From my understanding of things now, all of the  
information necessary to make the "right" decision can be done in  
getLValueElement.

> For code
>
> char * p = (char*) alloc(4);
> char c = p[1];
>
> we make p points to an element region when we do the cast.

I honestly feel that special casing this example buys us nothing  
(given the examples I provided above), and only adds complexity and  
extra space overhead.

Don't get me wrong, I think we need some nothing of an "extended" type  
system when doing analysis.  Some casts could be deemed "okay" and  
others not so much.  Unions represent a good example where memory can  
be reused for different data types, so we're just going to have to  
make some choices at some point of how to handle pieces of memory that  
are used in a conflated manner.