[cfe-dev] CFGElement changes and initializers addition (with patch)

Wed Aug 25 18:56:28 PDT 2010

On Thu, Aug 26, 2010 at 3:50 AM, Ted Kremenek <kremenek at apple.com> wrote:
>
> On Aug 24, 2010, at 11:26 PM, Zhongxing Xu wrote:
>
>> On Wed, Aug 25, 2010 at 1:56 PM, Ted Kremenek <kremenek at apple.com> wrote:
>>> On Aug 24, 2010, at 5:43 PM, Zhongxing Xu <xuzhongxing at gmail.com> wrote:
>>>
>>>>> Going with my above suggestion, CXXConstructExprs should probably just be treated the same way as CallExprs, and have their own CallEnter/CallExit nodes.  In this way they are treated just like any other call.  That means they should also be block-level expressions.
>>>>
>>>> We're treating CXXConstructExprs as CallExprs. But they have an
>>>> implicit 'this' argument. I prefer we set up 'this' before entering
>>>> the call.
>>>
>>> Hi Zhongxing,
>>>
>>> I'm not exactly certain what you mean by setting up 'this'.  What aspect of control-flow do you want to represent in the CFG w.r.t. the 'this' argument?  I assume that this is specific to constructors, and not all calls to member functions.
>>
>> I mean the logic in GRExprEngine::VisitCXXConstructExpr(): the object
>> region is passed in via 'Dest'. Then we set up the CXXThisRegion, and
>> then enter the ctor call.
>
> Ah, I see.  Makes sense.
>
>>>
>>>> That is, I'm suggesting we create the CFG for the DeclStmt
>>>>
>>>> A a(3), b(4);
>>>>
>>>> as
>>>>
>>>> A a(3)
>>>> A b(4)
>>>>
>>>> not
>>>>
>>>> 3
>>>> a(3)
>>>> 4
>>>> b(4)
>>>
>>> The reason we do the latter is because of the control-flow sequencing between declarations and initializers.  For example, the following is legal:
>>>
>>>  int a = a, b = a;
>>>
>>> We represent the control-flow here as:
>>>
>>>  a
>>>  int a = a
>>>  a
>>>  int b = a
>>>
>>> because there is a control-flow ordering between the initializer expressions and the object they are initializing.  This is important for catching uses of uninitialized values (for example).
>>>
>>> With respect to the CFG, I guess I'm not certain what you mean by:
>>>
>>>  A a(3)
>>>  A b(4)
>>>
>>> To me the '3' and the '4' (the arguments of the call) need to be evaluated before the constructor call, which includes its member initializers.  I would thus expect:
>>>
>>>  3
>>>  A a(3)
>>>  4
>>>  A b(4)
>>>
>>> I am also not clear what you mean by:
>>>
>>>  A a(3)
>>>
>>> instead of:
>>>
>>>  a(3)
>>>
>>> I think I'm missing something basic here.  Could you explain this a little further?  That would really help me understand what you want to represent here, and why it needs to be in the CFG.
>>>
>>
>> Currently, for
>>
>> int a = a;
>>
>> we construct the CFG as
>>
>> int a = a;
>>
>> not
>>
>> a
>> int a = a;
>
> Yes, you are right.  I was thinking of the following two cases:
>
>  int x = x && x
>
> and
>
>  int x = 1, y = x;
>
> In the first case, we get:
>
>  [ B1 ]
>      1: [B2.1] && [B3.1]
>      2: int x = x && x;
>    Predecessors (2): B3 B2
>    Successors (1): B0
>
>  [ B2 ]
>      1: x
>      T: [B2.1] && ...
>    Predecessors (1): B4
>    Successors (2): B3 B1
>
> and in the second case, we get:
>
>  [ B1 ]
>      1: int x = 1;
>      2: int y = x;
>    Predecessors (1): B2
>    Successors (1): B0
>
> My point is that we do make conscious decisions about representing the control-flow in declarations, even with straight C.
>
>>
>> My point is that constructing the CFG as
>>
>> int a = a;
>>
>> does not affect anything, because when the engine sees the
>> VarDecl('a'), it actually does nothing except create a
>> VarRegion('a'). 'a' is still left undefined.
>
> That's absolutely true from the perspective of evaluating the semantics of the declaration, but the control-dependency is still important for other tasks.  For example, a checker may want to know that the 'a' on the RHS is uninitialized, and thus flag a warning.  Since the initializer is essentially a subexpression, it is obviously evaluated first, but that control-dependency is important.  We just don't need to represent it explicitly in the CFG since it's part of the semantics of statements and their subexpressions.

We can make the control-dependency explicit in the CFG only when it is needed.

Currently, int x = f(); and int x = x && x; are both handled well with
explicit control flow in the CFG.

For int x = 3; or A x = A(); we don't need to make '3' and 'A()'
block-level exprs.

Do you mean we should uniformly make all initializers block-level exprs?
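
For reference, the language-level ordering that any of these CFG layouts
has to respect can be observed directly. Below is a minimal sketch, not
Clang code: the names trace, arg, A and run_decl are made up for
illustration. It records the order in which the argument expressions and
the constructors of 'A a(3), b(4);' run.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; none of this is Clang API.
static std::vector<std::string> trace;

// Records that an argument expression was evaluated, then yields its value.
static int arg(int v) {
    trace.push_back(std::to_string(v));
    return v;
}

struct A {
    // Records that the constructor body ran for the given argument.
    explicit A(int v) { trace.push_back("A(" + std::to_string(v) + ")"); }
};

// Runs `A a(3), b(4);` and returns the observed evaluation order.
std::vector<std::string> run_decl() {
    trace.clear();
    A a(arg(3)), b(arg(4));
    (void)a;
    (void)b;
    return trace;
}
```

Calling run_decl() yields the order 3, A(3), 4, A(4): each declarator's
argument is evaluated before its constructor, and 'a' is fully
constructed before anything belonging to 'b' runs.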

>
>>
>> Whether the engine sees the VarDecl first or the initializer first
>> makes no difference for C, but it makes a difference for C++.
>>
>> When evaluating the CXXConstructExpr, GRExprEngine delegates to
>> AggExprVisitor, which needs a Dest pointer to the object it will
>> construct into. If the engine sees the CXXConstructExpr first, it has
>> to create a temporary object to construct into, then lazy-copy it into
>> the variable being declared. If the engine can see the VarDecl first,
>> it can pass its VarRegion to the AggExprVisitor.
>
> I think I see what you mean.  If we make CXXConstructExpr a block-level expression, it will be seen before the DeclStmt that declares the variable.  Is this what you mean?
>
> Currently the invariant in the CFG is that after a DeclStmt is evaluated the variable is considered to be initialized.  For variables that don't involve constructors, the DeclStmt actually does the initialization.  I'd like to keep that invariant if at all possible.
>
> If I am interpreting you correctly, I think I have a few suggestions here that won't require changes to the CFG but will accomplish what you desire.  I don't want to expound on that, however, until I'm sure I know that this is what you meant.
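
To make the 'Dest' point quoted above concrete, here is a toy sketch in
plain C++. It is not GRExprEngine or AggExprVisitor code, and Obj,
construct_in_place and construct_via_temporary are hypothetical names.
It only counts the copies needed when the destination storage is known
before construction versus when the object is built as a temporary and
then copied into the variable.

```cpp
#include <new>

// Toy object that counts how often it is copy-constructed.
struct Obj {
    static int copies;
    int value;
    explicit Obj(int v) : value(v) {}
    Obj(const Obj& other) : value(other.value) { ++copies; }
};
int Obj::copies = 0;

// Destination known up front (analogous to seeing the VarDecl first):
// construct directly into the variable's storage, no copy needed.
int construct_in_place(int v) {
    Obj::copies = 0;
    alignas(Obj) unsigned char region[sizeof(Obj)];
    Obj* obj = new (region) Obj(v);
    obj->~Obj();
    return Obj::copies;
}

// Destination unknown (analogous to seeing the CXXConstructExpr first):
// build a temporary, then copy it into the variable's storage.
int construct_via_temporary(int v) {
    Obj::copies = 0;
    Obj temp(v);
    alignas(Obj) unsigned char region[sizeof(Obj)];
    Obj* obj = new (region) Obj(temp);
    obj->~Obj();
    return Obj::copies;
}
```

In this model the in-place path performs no copy and the temporary path
performs exactly one, which corresponds to the extra lazy copy the
engine would have to do.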