[cfe-dev] MemRegions - how to (re)use them right?
Ted Kremenek
kremenek at apple.com
Mon Oct 26 18:35:57 PDT 2009
On Oct 19, 2009, at 7:30 AM, Olaf Krzikalla wrote:
> At block level the for-body contains binary assignment expressions
> only. Now I just want to know, if and where lhs_x appears anywhere
> in rhs_y (with x < y, let's ignore x > y for the moment). Then I
> could expand rhs_y by replacing lhs_x with rhs_x, perhaps
> recursively. This of course isn't that easy as it sounds: you have
> to resolve things like *&a <-> a or a->b <-> (*a).b.
> Thus all I need is a unique identifier for lhs_x and a function
> which for a given arbitrary rvalue-expression finds the referred
> lhs_x identifier (if one exists). Given these my initial example
> could be solved quite elegant:
>
> A* ptr = &temp; // var ptr gets ID1
> temp.a = /*expr*/; // field var temp.a gets ID2
> temp.b = /* another_expr */; // field var temp.b gets ID3
> *sink = temp.a; // function detects ID2 for temp.a: expand to
> *sink = /*expr*/: done
> *sink = ptr->b; // function detects ID1 for ptr: expand to *sink =
> (&temp)->b:
> // function detects ID3 for (&temp)->b: expand to
> *sink = /* another_expr */: done
>
> I think, that one feasible approach to do this is working on the AST
> level by unifiying expressions somehow (e.g. something like a
> unified string) and then just search for them in other unified
> expressions. However I'm afraid that this approach isn't very
> extensible. At least it means some work which possibly has been done.
> OTOH I've seen the MemRegions which apparently are already designed
> so that they can act easily as these unique identifiers I mentioned
> above. Also the PostLoad node seems to designate sub-expressions for
> which a MemRegion exists. But the only clang part dealing with
> MemRegions so far is the GRExprEngine and as I said I'm note sure if
> it is what I need. Or it's me who is still not able to use
> GRExprEngine properly for my purpose.
Hi Olaf,
Sorry for the delay in my response.
MemRegions can be viewed as the "name" or "location" of a piece of
memory. This is how GRExprEngine uses them, and it uses MemRegions to
talk to StoreManager to reason about values bound to MemRegions.
MemRegion certainly can be repurposed by other clients, as
MemRegionManager (which constructs MemRegion objects) doesn't rely on
the rest of the path-sensitive engine. MemRegions conceptually
represent abstract chunks of memory. For example, a VarRegion
represents the memory associated with a given variable. Regions can
be layered on top of each other to represent field and array accesses,
casts, and so forth. MemRegions are constructing on demand by
GRExprEngine when it evaluates casts, pointer decays, and so forth.
MemRegions are uniqued by MemRegionManager. For example, when you
request the MemRegion for a given VarDecl, you will always get the
same MemRegion*.
In your example above, I'm not certain if ID* symbols are the values
bound to locations or the locations themselves. If they are the
locations, then you'd have:
ID1 => VarRegion(ptr)
ID2 => FieldRegion(VarRegion(temp), a)
ID3 => FieldRegion(VarRegion(temp), b)
Now if ID* represent values, you're going to need something else
besides MemRegions to keep track of location -> value bindings. For
example:
> *sink = ptr->b; // function detects ID1 for ptr: expand to *sink =
> (&temp)->b:
To get this reasoning, you need to keep track of the binding:
VarRegion(ptr) => VarRegion(temp)
which would cause the l-value of 'ptr->b' to evaluate to:
FieldRegion(VarRegion(temp), b)
Then the r-value of 'ptr->b' expands to whatever value you track for
the field 'b' of 'temp'.
In general, I'm not certain how much value flow logic you plan on
implementing. MemRegions themselves don't represent the values of
expressions, but rather memory locations (which some expressions may
evaluate to). The path-sensitive engine uses SVals (short for
"symbolic values") to represent what the symbolic result of evaluating
an expression. As you'll see, SVals can represent "locations" and
"non-locations", and "locations" include MemRegions.
I guess it's not clear to me how much symbolic evaluation you plan on
doing. If your analysis isn't path-sensitive, how do plan on handling
confluence points in the CFG (and loops)?
Ted
More information about the cfe-dev
mailing list