[cfe-dev] Lambda expr AST representation

Tue Oct 9 09:31:16 PDT 2012

On Oct 9, 2012, at 9:16 AM, Abramo Bagnara <abramo.bagnara at bugseng.com> wrote:

> Il 09/10/2012 17:38, Douglas Gregor ha scritto:
>> 
>> On Oct 9, 2012, at 8:11 AM, Abramo Bagnara
>> <abramo.bagnara at bugseng.com> wrote:
>> 
>>> Il 09/10/2012 15:29, Douglas Gregor ha scritto:
>>>> 
>>>> On Oct 4, 2012, at 2:36 PM, Abramo Bagnara 
>>>> <abramo.bagnara at bugseng.com> wrote:
>>>> 
>>>>>>> I'd suggest a slightly different path:
>>>>>>> 
>>>>>>> 1) the closure type FieldDecl has the name (actually a 
>>>>>>> pseudo-name) of the captured variable ("this" to refer to 
>>>>>>> captured this)
>>>>>>> 
>>>>>>> 2) the FieldDecl uses a bit to represent the fact that
>>>>>>> fields are the fields of the closure type (this means they
>>>>>>> are actually unnamed)
>>>>>>> 
>>>>>>> In this way the source pretty printing is easily doable,
>>>>>>> the semantic info is accurate, no new AST node is needed,
>>>>>>> CodeGen is simpler (it does not need to map DeclRefExpr to 
>>>>>>> MemberExpr).
>>>>>>> 
>>>>>>> I've forgot something?
>>>>>> 
>>>>>> That could work... although it would be a bit tricky to find
>>>>>> the original captured variable given a MemberExpr of this
>>>>>> sort.
>>>>> 
>>>>> I've thought to that, but I failed to imagine a case where this
>>>>> is needed.
>>>> 
>>>> It matters a lot for features that care more about the results
>>>> of name lookup than the underlying semantics. For example,
>>>> libclang's clang_findReferencesInFile, which finds all of the
>>>> references to a given declaration, would need to introduce new
>>>> code to map the fields of implicitly-generated MemberExprs back
>>>> to references to a normal variable declaration. In general, these
>>>> tools expect (reasonably, IMO) that a local variable or static
>>>> data member will be referenced with DeclRefExpr, while a
>>>> non-static data member will be referenced with a MemberExpr.
>>>> That's actually a very nice invariant. Doing as you suggest would
>>>> complicate the invariants for these clients, forcing them to deal
>>>> specifically with lambda captures (which they otherwise wouldn't
>>>> have to consider). And if we have to have the complication
>>>> somewhere, I'd rather it be with the more intelligent clients
>>>> that care about semantics, rather than the clients that only care
>>>> about cross-referencing.
>>> 
>>> I have a rather different perspective for libclang that IMHO is
>>> more accurate (and congruent with semantic): the variable in body
>>> references the capture list entry (implicit or explicit) while the
>>> capture list entry references the captured variable.
>> 
>> You're saying it's "more accurate" because it matches more closely
>> with the as-i" implementation written in the standard, and I don't
>> dispute that. What I dispute is that exactly modeling the as-if rule
>> in the standard is the right thing for Clang. We're not bound to
>> implement lambda classes via exactly that as-i" rule, and we probably
>> shouldn't do so for [&] lambdas anyway (because it's silly to store
>> several references to local variables in the same stack frame). And,
>> as noted above, our ASTs are meant to describe the language (which
>> they certainly do, even if not strictly based on the as-if rule) and
>> are meant to be usable by clients. Your suggestion may make some
>> trivial improvement in the former (for those who want to think in
>> terms of the as-if rule), but complicate other clients. That's not a
>> good trade-off.
> 
> I'm rather unconvinced this is a good choice: what about AST Matchers
> that want to find references to captured variable?

Those are simple today, because they're just DeclRefExprs. What you're asking for would complicate those matchers, because you'd have to match an implicitly-generated MemberExpr and map from the FieldDecl back to the VarDecl.

> I'm sure there are
> also other example of things that become rather complex if we permit
> such a large deviation from correct semantics to AST representation (and
> as far as I know this is almost unprecedented).

If you want to make this case, you'll need to come up with those examples. None of the clients of the current AST are particularly complex, and those that I've enumerated (and the one you've listed above) are simpler with the current AST representation. 

I see that you've dropped my comment about blocks. It is important, however. Having two different ways of expressing closures is unfortunate, and forced on us by the various dialects we support, but representing their captures in very different ways in the AST would be a lot more unfortunate (and is avoidable).

>>>>> 1) the closure type FieldDecl has the name (actually a
>>>>> pseudo-name) of the captured variable ("this" to refer to
>>>>> captured this)
>>>> 
>>>> As noted elsewhere, we can't do this exactly. The __name bit
>>>> could work, but I'd prefer that we simply keep these as anonymous
>>>> fields, because the __ names really don't help that much.
>>> 
>>> I apologize, it seems I've missed the elsewhere where this has
>>> been clarified. Can you explain that?
>> 
>> The __ names still require you to detect that you're in a lambda and
>> then remove the '__' from the name. You can't simply remove __ from
>> every name, or you'll get bogus results from various system classes
>> that actually do use __.
> 
> I meant to name the fields exactly with the captured name (without __
> prefix) and "this" for captured this. These names would be hidden to lookup.

"Hidden to lookup" is going to mean changing name lookup in a few places. That's rather unfortunate.

>>> It would be a pity to not have such names in FieldDecl, this would
>>> make pretty printer job very easy.
>> 
>> Adding a
>> 
>> VarDecl *getCapturedVariable(FieldDecl *)
>> 
>> for lambda classes would make the pretty printer's job trivial.
>> Besides, what's a pretty-printer doing printing the generated lambda
>> class?
> 
> The body of operator() of lambda class should be printed.

… and in that pretty-printing, we should print out the name of the captured variable, not a member expression referring to the field corresponding to the captured variable. Your suggestion either changes nothing (because an implicit member expression to the field prints just the field name), or complicates matters (if we don't 
have the field names match up with the captured variable names). Why is this an interesting case to discuss?

Fundamentally, you're arguing from the perspective of "purity of representation." There are two problems here: the pure representation you want isn't really "pure", in the sense that you want to represent an as-if rule that doesn't have to be the actual implementation approach. The second problem is that you're not considering the effect of this change on clients, which (so far) look like they would get *more* complex than what we have now.

	- Doug