[cfe-dev] MemRegions - how to (re)use them right?

Mon Oct 26 18:35:57 PDT 2009

On Oct 19, 2009, at 7:30 AM, Olaf Krzikalla wrote:

> At block level the for-body contains binary assignment expressions  
> only. Now I just want to know, if and where lhs_x appears anywhere  
> in rhs_y (with x < y, let's ignore x > y for the moment). Then I  
> could expand rhs_y by replacing lhs_x with rhs_x, perhaps  
> recursively. This of course isn't that easy as it sounds: you have  
> to resolve things like *&a <-> a or a->b <-> (*a).b.
> Thus all I need is a unique identifier for lhs_x and a function  
> which for a given arbitrary rvalue-expression finds the referred  
> lhs_x identifier (if one exists). Given these my initial example  
> could be solved quite elegant:
>
> A* ptr = &temp;                  // var ptr gets ID1
> temp.a = /*expr*/;               // field var temp.a gets ID2
> temp.b = /* another_expr */;     // field var temp.b gets ID3
> *sink = temp.a;   // function detects ID2 for temp.a: expand to  
> *sink = /*expr*/: done
> *sink = ptr->b;   // function detects ID1 for ptr: expand to *sink =  
> (&temp)->b:
>                 // function detects ID3 for (&temp)->b: expand to  
> *sink  = /* another_expr */: done
>
> I think, that one feasible approach to do this is working on the AST  
> level by unifiying expressions somehow (e.g. something like a  
> unified string) and then just search for them in other unified  
> expressions. However I'm afraid that this approach isn't very  
> extensible. At least it means some work which possibly has been done.
> OTOH I've seen the MemRegions which apparently are already designed  
> so that they can act easily as these unique identifiers I mentioned  
> above. Also the PostLoad node seems to designate sub-expressions for  
> which a MemRegion exists. But the only clang part dealing with  
> MemRegions so far is the GRExprEngine and as I said I'm note sure if  
> it is what I need. Or it's me who is still not able to use  
> GRExprEngine properly for my purpose.

Hi Olaf,

Sorry for the delay in my response.

MemRegions can be viewed as the "name" or "location" of a piece of  
memory.  This is how GRExprEngine uses them, and it uses MemRegions to  
talk to StoreManager to reason about values bound to MemRegions.

MemRegion certainly can be repurposed by other clients, as  
MemRegionManager (which constructs MemRegion objects) doesn't rely on  
the rest of the path-sensitive engine.  MemRegions conceptually  
represent abstract chunks of memory.  For example, a VarRegion  
represents the memory associated with a given variable.  Regions can  
be layered on top of each other to represent field and array accesses,  
casts, and so forth.  MemRegions are constructing on demand by  
GRExprEngine when it evaluates casts, pointer decays, and so forth.   
MemRegions are uniqued by MemRegionManager.  For example, when you  
request the MemRegion for a given VarDecl, you will always get the  
same MemRegion*.

In your example above, I'm not certain if ID* symbols are the values  
bound to locations or the locations themselves.  If they are the  
locations, then you'd have:

   ID1 => VarRegion(ptr)
   ID2 => FieldRegion(VarRegion(temp), a)
   ID3 => FieldRegion(VarRegion(temp), b)

Now if ID* represent values, you're going to need something else  
besides MemRegions to keep track of location -> value bindings.  For  
example:

> *sink = ptr->b;   // function detects ID1 for ptr: expand to *sink =  
> (&temp)->b:

To get this reasoning, you need to keep track of the binding:

    VarRegion(ptr)  =>  VarRegion(temp)

which would cause the l-value of 'ptr->b' to evaluate to:

    FieldRegion(VarRegion(temp), b)

Then the r-value of 'ptr->b' expands to whatever value you track for  
the field 'b' of 'temp'.

In general, I'm not certain how much value flow logic you plan on  
implementing.  MemRegions themselves don't represent the values of  
expressions, but rather memory locations (which some expressions may  
evaluate to).  The path-sensitive engine uses SVals (short for  
"symbolic values") to represent what the symbolic result of evaluating  
an expression.  As you'll see, SVals can represent "locations" and  
"non-locations", and "locations" include MemRegions.

I guess it's not clear to me how much symbolic evaluation you plan on  
doing.  If your analysis isn't path-sensitive, how do plan on handling  
confluence points in the CFG (and loops)?

Ted