[cfe-commits] [PATCH] Expressions have lvalues and rvalues

Wed Oct 15 06:40:32 PDT 2008

On Wed, Oct 15, 2008 at 3:27 PM, Ted Kremenek <kremenek at apple.com> wrote:

>
> On Oct 12, 2008, at 2:12 AM, Zhongxing Xu wrote:
>
>
>>
> I don't see these are redundant. Values are raw bits interpreted within
> some context. We make a fundamental distinction between two types of values:
> value that represents some address and value that not. So I prefer we have
> two kinds of ConcreteInt: lval::ConcreteInt and nonlval::ConcreteInt. And
> they are seldom same. lval::ConcreteInt usually is very large.
>
>
> You're right.  I originally wrote this code this way for this very reason,
> and in the course of this discussion I confused myself.  I was just trying
> to think of whether or not an integer value was really an lvalue.  Yes it
> can represent a location, but is it really an lvalue.  The current way
> things are "typed" with lval and nonlval, however, makes the analyzer
> readily understand code like the following:
>
> int foo() {
>   return *&*((int*) 0xa0000);
> }
>
> In this example, the pointer cast causes the integer literal to be treated
> as an lval::ConcreteInt.  This isn't really an lvalue though; it's really an
> rvalue that represents the location of something in memory (at address
> 0xa0000).  The '*' operator, however, expects an rvalue (per the wording in
> the C++ standard).  So while nonlval and lval do reason about locations,
> they aren't really lvalues or rvalues at all; just something that
> approximates them.  Hence the motivation to change their names.
>

Right. Let me describe how we evaluate this statement from another
direction. As in 'return' expression, we expect the rvalue of *&*((int*)
0xa0000). To get *expr's rvalue we should first get expr's rvalue, which is
a location value, then retrieve the value stored there. So we evaluate
&*((int*) 0xa0000)'s rvalue. &expr's rvalue is expr's lvalue. And *((int*)
0xa0000) is really an lvalue expression. We evaluate its lvalue. *expr's
lvalue is expr's rvalue. So we evaluate ((int*) 0xa0000)'s rvalue. ((int*)
0xa0000) is a cast expression. 0xa0000 is an non-location integer value,
(int*) casts it to location value 0xa0000, which is what we expect.
Everything works just fine.

>
>
>>
>> One thing about this is that it makes the transfer function structure
>> basically fall out from what's in the C/C++ standards.  For example, the '*'
>> operator essentially has the following type signature:
>>
>> * : rval -> lval
>>
>> Similarly, references to variables and the '&' operator could be
>> represented as follows:
>>
>> variable reference:  (declrefexpr) -> lval
>> & : lval -> rval  (with the rval being an rval::MemoryRegion)
>>
>> In contexts where an lval is used as an rval, we have an implicit
>> conversion (as stated in the C++ standard).  Such implicit conversations
>> would be represented by a transfer function, which cause a new state and
>> ExplodedNode to be created to represent the effect of this conversion.  For
>> example, an implicit conversion from an lval to rval could result in a value
>> load (e.g., EvalLoad, which would have the type signature lval -> rval).
>>
>
> I don't understand your meaning very clearly. For * operator, we just get
> its operand's rvalue, which is a location value. If we are at the LHS of an
> assignment, this location value is what we want. If we are at the RHS of an
> assignment, we do another EvalLoad with this location value. For & operator,
> we get its operand's lvalue, and this location value is the rvalue of the
> whole expression.
>
>
> My pedantic point was that lval:: and nonlval:: classes are not lvalues and
> rvalues.  We could change them to be as such; in this case the transfer
> function of operator '*' would always return an lvalue, and then whatever
> used that lvalue would then perform the implicit conversion.  We invert this
> in the static analyzer right now; the transfer function logic for '*' does
> the implicit lvalue->rvalue conversion based on context (i.e., if the asLVal
> flag is not set).  The current approach makes sense since we want to
> associate with a given expression the value the expression evaluates to;
> this includes the result of implicit conversions.
>
> At the end of the day, however, lval:: and nonlval:: classes are not
> lvalues/rvalues respectively (in the C++ parlance).  My question was whether
> or not we should change the use of these classes so that they EXACTLY map to
> lvalues/rvalues.  After looking at the code, reviewing the patch, thinking
> about the overall design of GRExprEngine, and all the comments made here,
> I'm not in favor of this idea anymore.  I think it is more useful to reason
> about locations versus non-locations than lvalues versus rvalues.
>

Right.

>
>
> I am not letting the distinction between rvalues and lvalues happen in the
> Store. They do happen in GRExprEngine in my patch. Notice that I only added
> a getLValue() to StoreManager. The intention is to let the Store to return
> the lvalue of an expression.
>
>
> To me that is the point of MemRegion.  Shouldn't MemRegion be able to
> represent the location in all cases?  In my mind, an implementation of Store
> should not depend on RValues.h.  If a Store just reasons about regions, it
> doesn't need to think about lval objects, or is this not the case?
>

Yes. Store should not depend on RValues.h. Store should just reasons about
regions.

>
> I know I'm the one who came up with lval::FieldOffset, lval::ArrayOffset,
> etc.  I'm questioning this decision.  It just seems to be hard-coding a
> particular Store's conception of memory into the lval classes.  These
> concepts can easily be represented (far more elegantly) as regions.  Once
> you are just dealing with regions, StoreManager::getLValue() only needs to
> return a region type.
>

Yes, I also feel that here needs some redesign. But different store will
still return different region values for an expression, because of their
different granularity of modeling. I have no clear idea about this
currently. I expect to do this in a separate patch.

>
> Because for different stores, we may have different representation of
> location values for the same expression. For example, in BasicStore, we may
> have a different location value for the lvalue of expression a[3] than in
> RegionStore
>
>
> To me Stores can return different regions for a[3].  However, having the
> Store return an lval:: object has one distinct advantage that I see: we
> don't have to define "Undefined" or "Unknown" for regions.
>

When we query Store for the lvalue of an expression, Store should return us
an abstract value, which may be a MemRegionVal or UnknownVal or
UndefinedVal.
This may be clear when we are replacing lval::FieldOffset and
lval::ArrayOffset with something else.

> . So I let StoreManager to determine the concrete representation of an
> expression's lvalue. Whether we want the lvalue or rvalue of an expression
> is decided by the GRExprEngine according to the context, e.g., the position
> of the expression in the parent expression.
>
>
> Right.
>
> Yeah, we should limit this rvalue/lvalue distinction within GRExprEngine.
> And the intention of my patch is this! Let me summarize my patch in the
> following:
>
> In GRExprEngine, we know when we want the rvalue of an expression and when
> we want the lvalue of an expression. If we want the lvalue of an expression,
> we first evaluate all of its sub-exprs, then we ask the Store what the
> concrete form of its lvalue. The Store return us a location value.
>
>
> This all makes much more sense to me now.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20081015/a6d63f57/attachment.html>