[cfe-commits] [PATCH] Sema and AST for C++0x Lambda Expressions

Fri Aug 12 15:26:57 PDT 2011

On Aug 11, 2011, at 11:21 AM, John Freeman wrote:

> Attached is a patch that begins building an AST for lambda expressions. It is still incomplete (notably no LambdaExpr yet), but I feel like I reached a stopping point and would like a review before I get too deep.
> 
> There are quite a few FIXMEs sprinkled throughout. Not all of them indicate problems; many just mark design/API/coding convention questions for the reviewer. When it came to decisions, I just went with my gut figuring I could always fix it later if I was wrong.

--- include/clang/AST/DeclCXX.h
+++ include/clang/AST/DeclCXX.h
@@ -16,6 +16,8 @@
 #define LLVM_CLANG_AST_DECLCXX_H
 
 #include "clang/AST/Expr.h"
+// FIXME: Including this only for CXXThisExpr. Is that bad?
+#include "clang/AST/ExprCXX.h"
 #include "clang/AST/Decl.h"
 #include "clang/AST/TypeLoc.h"
 #include "clang/AST/UnresolvedSet.h"

How about just forward-declaring CXXThisExpr, rather than #including the whole header?

@@ -2566,6 +2568,53 @@ public:
   friend class ASTDeclReader;
 };
 
+/// LambdaDecl - Represents a C++0x lambda expression.
+class LambdaDecl : public Decl, public DeclContext {

It would help if the comment here showed an example lambda expression, and pointed out the main pieces (capture list, body, etc.).

+class LambdaDecl : public Decl, public DeclContext {
+public:
+  class Capture {
+    enum {
+      CaptureThis,
+      CaptureByRef,
+      CaptureByCopy
+    } Kind;
+
+    Expr* Init;
+

Please use a PointerIntPair<Expr*, 2, TheEnumKind> as the storage for a capture, to reduce storage to one pointer per capture.

+    /// The initializing Expr for the captured entity. Will be either
+    /// CXXThisExpr or DeclRefExpr.
+    Expr* getInit() const { return Init; }

I'm not sure I understand the point of the DeclRefExpr. Why not just store the VarDecl* that's actually being captured, along with a SourceLocation showing where the capture occurred?

+
+  Capture *Explicits;
+  unsigned NumExplicits;

A short comment would be helpful here. 

+  /// Declaration for the function object class implementing the lambda
+  /// expression.
+  CXXRecordDecl *D;

Could we give this a useful name, e.g., LambdaClass?

The three data members above should be private, with accessors.

Please add more source-location information to LambdaDecl, so we have at least enough source-location information to implement getSourceRange() for LambdaDecl, covering the entire declaration. We'll need this source information for diagnostics and indexing.

We also seem to be missing some accessors :)

--- include/clang/AST/RecursiveASTVisitor.h
+++ include/clang/AST/RecursiveASTVisitor.h
@@ -1114,6 +1114,8 @@ DEF_TRAVERSE_DECL(FriendTemplateDecl, {
     }
   })
 
+DEF_TRAVERSE_DECL(LambdaDecl, { })
+

This should traverse the explicit captures and (eventually) the parameter list, return type, and body of the lambda. It's okay to fill this in incrementally.

--- include/clang/Basic/DiagnosticSemaKinds.td
+++ include/clang/Basic/DiagnosticSemaKinds.td
@@ -1151,6 +1151,19 @@ def err_for_range_begin_end_types_differ : Error<
 def note_for_range_type : Note<"range has type %0">;
 def note_for_range_begin_end : Note<
   "selected '%select{begin|end}0' %select{function|template }1%2 with iterator type %3">;
+
+// C++0x lambda expressions
+def err_capture_more_than_once : Error<
+  "%0 can appear only once in a capture list">;
+// FIXME: Is this note overkill? It will almost always appear on the same line
+// as the error.
+def note_previously_captured_here : Note<"previously captured here">;

Instead of emitting a separate note, you could just highlight the source range of the previous capture.

+/// LambdaExpr - Adaptor class for mixing a LambdaDecl with expressions.
+class LambdaExpr : public Expr {
+  LambdaDecl *Lambda;
+

It's really not just an "adaptor", is it? It's the actual lambda expression. Please expand the comment with an example lambda expression. The commenting here is actually more important than for LambdaDecl, since lambdas are an expression in the language.

+  // Required to pass implementation check.
+  SourceRange getSourceRange() const {
+    return SourceRange(/*FIXME: What goes here?*/);
+  }

You want the source range to cover everything from the opening [ to the closing }.

+  //===--------------------------------------------------------------------===//
+  // C++ Lambda Expressions (C++0x [expr.prim.lambda])
+
+  // FIXME: Change to DeclResult later? Is there a better way to pass the Lambda
+  // from through each action? The introducer does not begin a function scope,
+  // so cannot mimic blocks. Should we condense these into one action, or should
+  // we try semantic checking as early as possible?
+  LambdaDecl *ActOnLambdaIntroducer(const LambdaIntroducer &Intro);
+  void ActOnLambdaDeclarator(LambdaDecl *Lambda, Declarator &D);
+  void ActOnLambdaBody(LambdaDecl *Lambda, Stmt *Body);

The only benefit of DeclResult is that it captures the error case in ways that are harder to ignore. I suggest using it as the result of ActOnLambdaIntroducer, just for that reason.

Regarding your other comment, I like the current set of three functions. It lets us do semantic analysis early, and feels like a nice, logical breakdown of the important parts of the lambda.

--- a/lib/AST/StmtPrinter.cpp
+++ b/lib/AST/StmtPrinter.cpp
@@ -1503,6 +1503,9 @@ StmtPrinter::VisitObjCBridgedCastExpr(ObjCBridgedCastExpr *E) {
   PrintExpr(E->getSubExpr());
 }
 
+void StmtPrinter::VisitLambdaExpr(LambdaExpr *Node) {
+}
+

Please leave a // FIXME here

+  {
+    // FIXME: Should we default the TST to auto like below, or create
+    // Declarator::LambdaContext and add extra handling to a bunch of different
+    // functions?
+    const char *PrevSpec = 0;
+    unsigned DiagID = 0;
+    bool HasPrevSpec = DS.SetTypeSpecType(TST_auto, /*Loc=*/SourceLocation(), PrevSpec, DiagID);
+    assert(!HasPrevSpec && !PrevSpec && "unexpected default type specifier!");
+  }

I think it's perfectly reasonable to default to 'auto' here, since those are the semantics we want. We still may very well find that we need Declarator::LambdaContext for other reasons.

+  // FIXME: If no declarator is specified, it is as if it were "()". Should we
+  // default the declarator, or add extra handling to Sema?
   Declarator D(DS, Declarator::PrototypeContext);

Please add the extra handling to Sema, because we want to keep track of whether the user explicitly wrote the '()' or not.

+LambdaDecl *Sema::ActOnLambdaIntroducer(const LambdaIntroducer &Intro) {
+  // Diagnose the explicit capture list.
+  SourceLocation CapturesThis;
+  // FIXME: Are typedefs encouraged? What is the naming convention?
+  llvm::DenseMap<const IdentifierInfo*, SourceLocation> CapturesSoFar;

Use your judgment. If you need to name this a lot, give it a typedef.

+  // Check captures before we allocate a LambdaDecl.
+  // FIXME: If we hit an error, should we continue as if it didn't exist, or
+  // immediately return failure? For now, return failure.

Just drop the erroneous capture and continue as if it didn't exist. 

+  for (llvm::SmallVector<LambdaCapture, 4>::const_iterator
+       C = Intro.Captures.begin(), E = Intro.Captures.end(); C != E; ++C) {
+    if (C->Kind == LCK_This) {
+      if (CapturesThis.isValid()) {
+        Diag(C->Loc, diag::err_capture_more_than_once) << "'this'";
+        Diag(CapturesThis, diag::note_previously_captured_here);
+        return NULL;
+      }
+
+      if (Intro.Default == LCD_ByCopy) {
+        Diag(C->Loc, diag::err_this_capture_with_copy_default);

This is a good opportunity for a Fix-It, to remove the extraneous '=' :)

+        return NULL;
+      }
+
+      CapturesThis = C->Loc;
+      continue;
+    }

+    if (C->Kind == LCK_ByRef && Intro.Default == LCD_ByRef) {
+      Diag(C->Loc, diag::err_reference_capture_with_reference_default);
+      return NULL;
+    } else if (C->Kind == LCK_ByCopy && Intro.Default == LCD_ByCopy) {
+      Diag(C->Loc, diag::err_copy_capture_with_copy_default);
+      return NULL;
+    }

In these cases, you can just emit a Fix-It to remove the extraneous & or =, and continue on as if it weren't there.

+    if (!IsFirstAppearance) {
+      Diag(C->Loc, diag::err_capture_more_than_once) << C->Id;
+      Diag(Appearance->second, diag::note_previously_captured_here);
+      return NULL;
+    }

As noted above, you could probably just emit a single diagnostic here, but pass the source location of the previous capture (as a SourceRange) so Clang will highlight it on the command line.

+  // Avoid new Capture[] because we don't want to provide a default constructor.
+  // FIXME: Do we need to align?
+  LambdaDecl::Capture *Explicits
+    = static_cast<LambdaDecl::Capture*>(
+        Context.Allocate(Intro.Captures.size() * sizeof(LambdaDecl::Capture)));

You can use llvm::align_of to get the alignment of LambdaDecl::Capture.

+  // FIXME: Try to merge with the above loop somehow?
+  LambdaDecl::Capture *Explicit = Explicits;
+  for (llvm::SmallVector<LambdaCapture, 4>::const_iterator
+       C = Intro.Captures.begin(), E = Intro.Captures.end(); C != E; ++C) {

Merging with the above loop is reasonable.

+    if (C->Kind == LCK_This) {
+      CXXThisExpr *This = static_cast<CXXThisExpr*>(ActOnCXXThis(C->Loc).take());
+      new (Explicit) LambdaDecl::Capture(This);
+      ++Explicit;
+      continue;
+    }

Note that ActOnCXXThis can fail with an error (if we aren't inside a non-static member expression). You'll want to note this error and drop the 'this' capture if this happens. (This is a good reason for merging this loop with the loop above, since that loop can also drop captures).

+    UnqualifiedId Name;
+    CXXScopeSpec ScopeSpec;
+    Name.setIdentifier(C->Id, C->Loc);
+    // FIXME: Can we do better than ActOnIdExpression?
+    DeclRefExpr *Var
+      = static_cast<DeclRefExpr*>(
+          ActOnIdExpression(getCurScope(),
+                            ScopeSpec, Name,
+                            /*HasTrailingLParen=*/false,
+                            /*IsAddressOfOperand=*/false).take());
+
+    // FIXME: Check that we have a local variable with automatic storage
+    // duration.

It's probably best just to call LookupName() directly, and then look at the result of name lookup, than to actually build an id-expression. That way, the capture list itself can just store declaration + source location information for each (successful!) capture.

+  CXXRecordDecl *Class = CXXRecordDecl::Create(Context,
+                                               TTK_Class,
+                                               DC,
+                                               /*StartLoc=*/SourceLocation(),
+                                               /*IdLoc=*/SourceLocation(),
+                                               /*Id=*/0);

It would be good to give source location information here that coincides with location information for the lambda expression itself.

+  Class->startDefinition();
+
+  LambdaDecl *Lambda = LambdaDecl::Create(Context, DC,
+                                          Intro.Range.getBegin());
+  Lambda->Explicits = Explicits;
+  Lambda->NumExplicits = Intro.Captures.size();
+  Lambda->Class = Class;

I'd prefer if Explicits, NumExplicits, and Class were set via the LambdaDecl::Create method, so they don't need to be public.

+    assert(D.getNumTypeObjects() == 1 && "unexpected nested declarator!");
+    // FIXME: The scope is unused?
+    DeclaratorChunk &Chunk = D.getTypeObject(0);
+    assert(Chunk.Kind == DeclaratorChunk::Function &&
+           "unexpected non-function declarator chunk!");
+    if (!Chunk.Fun.TrailingReturnType) {
+      Chunk.Fun.TrailingReturnType = Context.VoidTy.getAsOpaquePtr();
+    }
+    MethodTyInfo = GetTypeForDeclarator(D, getCurScope());
+    MethodTy = MethodTyInfo->getType();

It's worth checking for a NULL return from GetTypeForDeclarator, so you can then fail gracefully.

+  CXXMethodDecl *Method
+    = CXXMethodDecl::Create(Context,
+                            Class,
+                            /*StartLoc=*/SourceLocation(),
+                            DeclarationNameInfo(MethodName,
+                                                /*NameLoc=*/SourceLocation()),
+                            MethodTy,
+                            MethodTyInfo,
+                            /*isStatic=*/false,
+                            SC_None,
+                            /*isInline=*/true,
+                            /*EndLocation=*/SourceLocation());

It's good to provide actual source locations for the start/end when you can, e.g., so they line up with something in the lambda.

+  // FIXME: It appears we can get the ParmVarDecls through either the
+  // Declarator, a la Sema::ActOnMethodDeclaration, or through the
+  // TypeSourceInfo, a la Sema::ActOnBlockArguments. Which is preferred? What do
+  // TypeSourceInfo and TypeLoc represent?
+  FunctionProtoTypeLoc MethodTypeLoc;
+  {
+    TypeLoc TL = MethodTyInfo->getTypeLoc();
+    if (isa<FunctionProtoTypeLoc>(TL))
+      MethodTypeLoc = cast<FunctionProtoTypeLoc>(TL);
+  }
+  Method->setParams(MethodTypeLoc.getParmArray(), MethodTypeLoc.getNumArgs());

The ParmVarDecls will be the same in both places. I suggest grabbing them from the Declarator, since that's easiest.

+void Sema::ActOnLambdaBody(LambdaDecl *Lambda, Stmt *Body) {
+  CXXRecordDecl *Class = Lambda->Class;
+  CanQualType ClassType
+    = Context.getCanonicalType(Context.getTypeDeclType(Class));
+
+  CXXMethodDecl *Method = Lambda->Method;
+  Method->setBody(Body);
+
+  DeclarationName CtorName
+    = Context.DeclarationNames.getCXXConstructorName(ClassType);
+  CXXConstructorDecl *Ctor
+    = CXXConstructorDecl::Create(Context,
+                                 Class,
+                                 /*StartLoc=*/SourceLocation(),
+                                 DeclarationNameInfo(CtorName,
+                                                  /*NameLoc=*/SourceLocation()),
+                                 // Chicken and egg problem: we need the Ctor to
+                                 // build the CtorParams and the CtorParams to
+                                 // build the Ctor. Favor one call on the Ctor
+                                 // to set its type after construction instead
+                                 // of one call on each CtorParam to set their
+                                 // DeclContext.
+                                 QualType(),
+                                 /*TInfo=*/0,
+                                 /*isExplicit=*/true,
+                                 /*isInline=*/true,
+                                 /*isImplicitlyDeclared=*/true);

The way we typically address this chicken-and-egg problem is to actually build the ParmVarDecls for the constructor before building the constructor itself. The ParmVarDecls are given the translation unit as their DeclContext, and then later (after the constructor is built) we update their DeclContext to point to the constructor. This allows us to build the full type of the constructor before building the constructor itself.

+  CXXCtorInitializer **CtorInit = CtorInits;
+  for (LambdaDecl::Capture *C = Lambda->Explicits,
+       *E = Lambda->Explicits + Lambda->NumExplicits; C != E; ++C) {
+
+    Expr *Init = C->getInit();
+
+    IdentifierInfo *Id;
+    if (C->isThisCapture()) {
+      Id = PP.getIdentifierInfo("__this");
+    } else {
+      Id = cast<DeclRefExpr>(Init)->getDecl()->getIdentifier();
+    }

This is one of the places where just storing the VarDecl in the capture list would make things easier. 

+    // FIXME: Find the correct type. There are capture type transformations
+    // remaining to be implemented. What is the difference between CanQualType
+    // and QualType? How is template instantiation handled? If some of these
+    // have dependent type, how is that treated?
+    QualType CaptureType = Init->getType();
+    if (C->isReferenceCapture()) {
+      // FIXME: Compare with Sema::BuildReferenceType.
+      CaptureType = Context.getLValueReferenceType(CaptureType,
+                                                   /*SpelledAsLValue=*/true);
+    }

CaptureType is dependent, that's fine. We'll just end up building a reference to a dependent type.

When you get to the point where you're dealing with template instantiation, I expect that we'll end up running all of this code again after substituting the template arguments for template parameters, at which point we'll probably have non-dependent types.

+    FieldDecl *Field = CheckFieldDecl(CaptureName,
+                                      CaptureType,
+                                      /*TInfo=*/0,
+                                      Class,
+                                      /*StartLoc=*/SourceLocation(),
+                                      /*Mutable=*/false,
+                                      /*BitWidth=*/0,
+                                      /*HasInit=*/false,
+                                      /*IdLoc=*/SourceLocation(),
+                                      AS_private,
+                                      /*PrevDecl=*/0);
+
+    Class->addDecl(Field);

Again, source location information: you can have the field locations match up with the capture locations, so if there's a problem with the field declaration for some reason, it'll point at the capture.

+    // FIXME: Compare with Sema::CheckParameter.
+    ParmVarDecl *CtorParam = ParmVarDecl::Create(Context,
+                                                 Ctor,
+                                                 /*StartLoc=*/SourceLocation(),
+                                                 /*IdLoc=*/SourceLocation(),
+                                                 Id,
+                                  Context.getAdjustedParameterType(CaptureType),
+                                                 /*TInfo=*/0,
+                                                 SC_None,
+                                                 SC_None,
+                                                 /*DefArg=*/0);

Same comment about source locations.

+    DeclRefExpr *CtorInitArg
+      = DeclRefExpr::Create(Context,
+                            /*QualifierLoc=*/NestedNameSpecifierLoc(),
+                            CtorParam,
+                            /*NameLoc=*/SourceLocation(),
+                            CaptureType.getNonReferenceType(),
+                            VK_LValue);
+    // FIXME: Compare with Sema::BuildMemberInitializer.
+    *CtorInit = new (Context) CXXCtorInitializer(Context,
+                                                 Field,
+                                                 /*MemberLoc=*/SourceLocation(),
+                                                 /*L=*/SourceLocation(),
+                                                 CtorInitArg,
+                                                 /*R=*/SourceLocation());

This won't be quite enough, because it doesn't perform semantic analysis. It probably makes sense just to call Sema::BuildMemberInitializer directly to do this work.

+  CXXConstructExpr *CtorCall = CXXConstructExpr::Create(Context,
+                                                        CtorType,
+                                                       /*Loc=*/SourceLocation(),
+                                                        Ctor,
+                                                        /*Elidable=*/false,
+                                                        CtorArgs.data(),
+                                                        CtorArgs.size());

You'll want to go through Sema::BuildCXXConstructExpr to build a call to this constructor.

This is a *great* start! I suggest doing some polish on the AST representation based on my comments (mainly to get better source-location information in there), and try to get something we commit---knowing, of course, that it's nowhere near complete.

	- Doug