[cfe-dev] Lambda expr AST representation

Tue Oct 9 06:29:51 PDT 2012

On Oct 4, 2012, at 2:36 PM, Abramo Bagnara <abramo.bagnara at bugseng.com> wrote:

> Il 04/10/2012 23:11, Eli Friedman ha scritto:
>> On Thu, Oct 4, 2012 at 2:05 PM, Abramo Bagnara
>> <abramo.bagnara at bugseng.com> wrote:
>>> Il 04/10/2012 21:26, Eli Friedman ha scritto:
>>>> On Thu, Oct 4, 2012 at 11:51 AM, Abramo Bagnara
>>>> <abramo.bagnara at bugseng.com> wrote:
>>>>> Il 04/10/2012 20:23, Eli Friedman ha scritto:
>>>>>> On Thu, Oct 4, 2012 at 4:05 AM, Abramo Bagnara
>>>>>> <abramo.bagnara at bugseng.com> wrote:
>>>>>>> 
>>>>>>> Despite what is written in C++11 5.1.2p7:
>>>>>>> 
>>>>>>> The lambda-expression’s compound-statement yields the function-body
>>>>>>> (8.4) of the function call operator, but for purposes of name lookup
>>>>>>> (3.4), determining the type and value of this (9.3.2) and transforming
>>>>>>> id-expressions referring to non-static class members into class member
>>>>>>> access expressions using (*this) (9.3.1), the compound-statement is
>>>>>>> considered in the context of the lambda-expression.
>>>>>>> 
>>>>>>> currently clang inserts a DeclRefExpr in its AST instead of the correct
>>>>>>> MemberExpr, as the following typescript shows:
>>>>>>> 
>>>>>>> $ cat p.cc
>>>>>>> int f(int a) {
>>>>>>>  return [a]()->int { return a; }();
>>>>>>> }
>>>>>>> $ _clang++ -cc1 -ast-dump -std=c++0x p.cc
>>>>>>> typedef __int128 __int128_t;
>>>>>>> typedef unsigned __int128 __uint128_t;
>>>>>>> typedef __va_list_tag __builtin_va_list[1];
>>>>>>> int f(int a) (CompoundStmt 0x4629a50 <p.cc:1:14, line:3:1>
>>>>>>>  (ReturnStmt 0x4629a30 <line:2:3, col:35>
>>>>>>>    (CXXOperatorCallExpr 0x46299b0 <col:10, col:35> 'int'
>>>>>>>      (ImplicitCastExpr 0x4629998 <col:34, col:35> 'auto (*)(void) const -> int' <FunctionToPointerDecay>
>>>>>>>        (DeclRefExpr 0x4629910 <col:34, col:35> 'auto (void) const -> int' lvalue CXXMethod 0x4629580 'operator()' 'auto (void) const -> int'))
>>>>>>>      (ImplicitCastExpr 0x4629a18 <col:10, col:33> 'const class <lambda at p.cc:2:10>' <NoOp>
>>>>>>>        (LambdaExpr 0x4629748 <col:10, col:33> 'class <lambda at p.cc:2:10>'
>>>>>>>          (ImplicitCastExpr 0x46296b0 <col:11> 'int' <LValueToRValue>
>>>>>>>            (DeclRefExpr 0x4629688 <col:11> 'int' lvalue ParmVar 0x45fbf00 'a' 'int'))
>>>>>>>          (CompoundStmt 0x4629728 <col:21, col:33>
>>>>>>>            (ReturnStmt 0x4629708 <col:23, col:30>
>>>>>>>              (ImplicitCastExpr 0x46296f0 <col:30> 'int' <LValueToRValue>
>>>>>>>                (DeclRefExpr 0x46296c8 <col:30> 'const int' lvalue ParmVar 0x45fbf00 'a' 'int')))))))))
>>>>>>> 
>>>>>>> Although I'm aware that these DeclRefExprs are handled specially in
>>>>>>> CodeGen, I think that this behavior should be considered a defect of the AST.
>>>>>> 
>>>>>> Despite the reference to "transforming id-expressions" in the
>>>>>> standard, the ASTs were intentionally designed the way they are
>>>>>> because an expression in a lambda acts more like a reference to the
>>>>>> original variable in terms of semantic analysis than some sort of
>>>>>> member reference expression.
>>>>> 
>>>>> I'm amazed by this statement: my message is specifically about having a
>>>>> properly built AST from a semantic analysis point of view. AFAIK,
>>>>> references to captured variables are *indeed* references to a record field
>>>>> and not to the original variable: e.g. if the original variable captured by
>>>>> value changes after the lambda class (closure type) instance is created and
>>>>> before operator() is called, the value that should be seen is the field of
>>>>> the lambda class instance and not the value of the captured variable.
>>>>> 
>>>>> Am I missing something?
>>>>> 
>>>>>> The alternative involves some entirely
>>>>>> new AST nodes to keep around the relevant semantic information, and
>>>>>> from my perspective that would just bloat the AST without any
>>>>>> substantial benefit.
>>>>> 
>>>>> Can you explain which semantic information?
>>>> 
>>>> Hmm... maybe the current implementation makes more sense to me because
>>>> I implemented large parts of it, but I don't think of references to
>>>> captured variables like normal member variables... I think of them
>>>> more like references to the original variable from the perspective of
>>>> inside the lambda.  The whole implementation was a bit colored by the
>>>> existing implementation of the Apple blocks extension, where it isn't
>>>> as clear-cut that there's actually an in-memory object containing the
>>>> relevant members.
>>>> 
>>>> There are two reasons we'd need new AST nodes: one, we have to treat
>>>> the "implicit this" differently from the implicit this for normal
>>>> class members, and two, we would need a different kind of member
>>>> reference expression to track the original variable referred to.
>>> 
>>> I'd suggest a slightly different path:
>>> 
>>> 1) the closure type FieldDecl has the name (actually a pseudo-name) of
>>> the captured variable ("this" to refer to captured this)
>>> 
>>> 2) the FieldDecl uses a bit to represent the fact that fields are the
>>> fields of the closure type (this means they are actually unnamed)
>>> 
>>> In this way the source pretty printing is easily doable, the semantic
>>> info is accurate, no new AST node is needed, CodeGen is simpler (it does
>>> not need to map DeclRefExpr to MemberExpr).
>>> 
>>> Have I forgotten something?
>> 
>> That could work... although it would be a bit tricky to find the
>> original captured variable given a MemberExpr of this sort.
> 
> I've thought about that, but I failed to imagine a case where this is needed.

It matters a lot for features that care more about the results of name lookup than the underlying semantics. For example, libclang's clang_findReferencesInFile, which finds all of the references to a given declaration, would need to introduce new code to map the fields of implicitly-generated MemberExprs back to references to a normal variable declaration. In general, these tools expect (reasonably, IMO) that a local variable or static data member will be referenced with DeclRefExpr, while a non-static data member will be referenced with a MemberExpr. That's actually a very nice invariant. Doing as you suggest would complicate the invariants for these clients, forcing them to deal specifically with lambda captures (which they otherwise wouldn't have to consider). And if we have to have the complication somewhere, I'd rather it be with the more intelligent clients that care about semantics, rather than the clients that only care about cross-referencing.
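To make the invariant concrete, here is a minimal toy sketch of a cross-referencing client. It does not use the real Clang classes: ToyDecl, ToyRef, findVariableRefs, modelLambda and checkInvariantExample are all hypothetical names invented for illustration. It only assumes that a client looks for DeclRef-style nodes when it collects references to a local variable, and shows that modeling the lambda body's use of `a` as a member access would hide that reference from such a client.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for AST concepts. These are NOT the real Clang classes.
struct ToyDecl {
    std::string name;
};

enum class ToyRefKind { DeclRef, Member };

struct ToyRef {
    ToyRefKind kind;
    const ToyDecl* target;  // declaration the expression node points at
    int line;               // source line of the reference
};

// A cross-referencing client that relies on the invariant that a local
// variable is only ever referenced through DeclRef-style nodes.
std::vector<int> findVariableRefs(const std::vector<ToyRef>& refs,
                                  const ToyDecl& var) {
    std::vector<int> lines;
    for (const ToyRef& r : refs) {
        if (r.kind == ToyRefKind::DeclRef && r.target == &var)
            lines.push_back(r.line);
    }
    return lines;
}

// Models p.cc from the thread: `a` appears in the capture list and in the
// lambda body, both on line 2. When bodyUsesMember is true, the body
// reference is modeled as a member access to the (hypothetical) closure field.
std::vector<ToyRef> modelLambda(const ToyDecl& a, const ToyDecl& field,
                                bool bodyUsesMember) {
    std::vector<ToyRef> refs;
    refs.push_back({ToyRefKind::DeclRef, &a, 2});  // capture list: [a]
    if (bodyUsesMember)
        refs.push_back({ToyRefKind::Member, &field, 2});
    else
        refs.push_back({ToyRefKind::DeclRef, &a, 2});
    return refs;
}

// Compares the two modelings from the client's point of view.
void checkInvariantExample() {
    ToyDecl a{"a"};
    ToyDecl field{""};  // closure fields are unnamed
    // DeclRef-style body reference: both uses of `a` are found.
    assert(findVariableRefs(modelLambda(a, field, false), a).size() == 2);
    // Member-style body reference: the use inside the lambda body is missed
    // unless the client adds lambda-specific mapping code.
    assert(findVariableRefs(modelLambda(a, field, true), a).size() == 1);
}
```

In this toy model, the client would have to learn about closure fields to recover the second reference, which is the extra complication described above.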

> Also, I thought that this info would pertain to the capture list and not
> to the field. If we need it, we might save the FieldDecl* together with the
> VarDecl* in LambdaExpr::Capture.

I'd be perfectly fine with adding the FieldDecl* into LambdaExpr::Capture, so it's easy to map between the two.

Elsewhere, you said:

> I'd suggest a slightly different path:
>
> 1) the closure type FieldDecl has the name (actually a pseudo-name) of
> the captured variable ("this" to refer to captured this)

As noted elsewhere, we can't do this exactly. The __name bit could work, but I'd prefer that we simply keep these as anonymous fields, because the __ names really don't help that much.

> 2) the FieldDecl uses a bit to represent the fact that fields are the
> fields of the closure type (this means they are actually unnamed)

There's no need for this bit; simply check whether the DeclContext of the FieldDecl is a lambda class. From that lambda class, you can get at the LambdaExpr::Captures, and therefore the names and original VarDecls.
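The lookup described here can be sketched with a small toy model. This is not the Clang API: ToyVar, ToyField, ToyCapture, ToyRecord, capturedVarFor and checkCaptureLookup are hypothetical names, and the model assumes each capture stores both the original variable and its closure field, as discussed above for LambdaExpr::Capture.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for AST concepts. These are NOT the real Clang classes.
struct ToyVar {
    std::string name;
};

struct ToyRecord;

struct ToyField {
    const ToyRecord* parent;  // plays the role of the field's DeclContext
};

struct ToyCapture {
    const ToyVar* var;      // original captured variable
    const ToyField* field;  // closure field that stores the capture
};

struct ToyRecord {
    bool isLambda;                     // true for a closure type
    std::vector<ToyCapture> captures;  // empty for ordinary classes
};

// Returns the captured variable behind a closure field, or nullptr when the
// field belongs to an ordinary class or is not listed in the captures.
const ToyVar* capturedVarFor(const ToyField& f) {
    if (f.parent == nullptr || !f.parent->isLambda)
        return nullptr;
    for (const ToyCapture& c : f.parent->captures) {
        if (c.field == &f)
            return c.var;
    }
    return nullptr;
}

// Exercises the lookup for a closure field and for an ordinary field.
void checkCaptureLookup() {
    ToyVar a{"a"};
    ToyRecord closure{true, {}};
    ToyField f{&closure};  // unnamed field of the closure type
    closure.captures.push_back(ToyCapture{&a, &f});
    assert(capturedVarFor(f) == &a);

    ToyRecord plain{false, {}};
    ToyField g{&plain};
    assert(capturedVarFor(g) == nullptr);
}
```

No extra bit is stored on the field in this sketch: whether it is a closure field follows from its parent record alone.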

	- Doug