[cfe-commits] [PATCH] Parsing C++0x lambda expressions

Douglas Gregor dgregor at apple.com
Wed Jul 13 16:23:06 PDT 2011


Hi John,

On Jul 13, 2011, at 6:17 AM, John Freeman wrote:

> This is an updated patch addressing comments.
> 
> Doug pointed out a nasty case where a lambda introducer cannot be disambiguated with fixed lookahead from a message expression where the receiver is a comma expression. In fact, if it is a lambda introducer, then it cannot be disambiguated until its end because every capture form (identifier, &identifier, 'this') is a valid expression.
> 
> ParseCXX0XLambdaIntroducer returns a DiagnosticID if it hits something unexpected. For tentative parsing, we can ignore the diagnostic and revert. This is encapsulated in TryParseCXX0XLambdaIntroducer. For normal parsing, we can issue the diagnostic and continue.
> 
> ParseCXX0XLambdaExpressionAfterIntroducer handles the rest of the expression.
> 
> The calls to these functions are wrapped in two entry functions:
> - ParseCXX0XLambdaExpression, for normal parsing where we know we are looking at a lambda expression, and
> - TryParseCXX0XLambdaExpression, that first uses lookahead and possibly tentative parsing (it calls the other entry function when lookahead is enough)
> 
> TryParseCXX0XLambdaExpression returns an ExprResult in one of three states:
> - invalid, to indicate that we are not looking at a lambda expression (and that we should try parsing a message expression)
> - valid but unusable, when we parsed a lambda expression with unrecoverable errors (like incomplete support)
> - valid and usable, when we successfully parsed a lambda expression (possibly with recoverable errors)
> 
> I fear it could be considered overkill for what may be a rare or even non-existent case in real code. Let me know.
> 
> Feedback is appreciated!


Index: include/clang/Basic/DiagnosticParseKinds.td
===================================================================
--- include/clang/Basic/DiagnosticParseKinds.td	(revision 135029)
+++ include/clang/Basic/DiagnosticParseKinds.td	(working copy)
@@ -487,6 +487,11 @@
 def err_sizeof_parameter_pack : Error<
   "expected parenthesized parameter pack name in 'sizeof...' expression">;
 
+// C++0x lambda expressions
+def err_expected_comma_or_rsquare : Error<"expected ',' or ']'">;
+def err_expected_capture : Error<"expected capture">;
+def err_expected_lambda_body : Error<"expected body of lambda expression">;
+

The first two could be a bit more descriptive, e.g,

	"expected ',' or ']' in lambda capture list"
	"expected a variable name or 'this' in capture list"

Index: include/clang/Parse/Parser.h
===================================================================
--- include/clang/Parse/Parser.h	(revision 135029)
+++ include/clang/Parse/Parser.h	(working copy)
@@ -1183,6 +1183,50 @@
                                       bool IsTypename = false);
 
   //===--------------------------------------------------------------------===//
+  // C++0x 5.1.2: Lambda expressions
+
+  enum CXX0XLambdaCaptureDefault {
+    LCD_None,
+    LCD_ByCopy,
+    LCD_ByRef
+  };
+
+  enum CXX0XLambdaCaptureKind {
+    LCK_This,
+    LCK_ByCopy,
+    LCK_ByRef
+  };
+
+  struct CXX0XLambdaCapture {
+    CXX0XLambdaCaptureKind Kind;
+    SourceLocation Loc;
+    IdentifierInfo* Id;
+
+    CXX0XLambdaCapture(CXX0XLambdaCaptureKind Kind,
+                       SourceLocation Loc,
+                       IdentifierInfo* Id = 0)
+      : Kind(Kind), Loc(Loc), Id(Id)
+    {}
+  };
+
+  struct CXX0XLambdaCaptureList {
+    SourceLocation Loc, EndLoc;
+    CXX0XLambdaCaptureDefault Default;
+    llvm::SmallVector<CXX0XLambdaCapture, 4> Captures;
+
+    CXX0XLambdaCaptureList()
+      : Default(LCD_None) {}
+  };

Please comment the enums and structs so that we know what language entities they represent.

Also, [Loc, EndLoc] could just be represented as a SourceRange (containing the '[' and ']' locations, presumably).

+ExprResult Parser::ParseCXX0XLambdaExpression() {
+  // Parse lambda-introducer.
+  CXX0XLambdaCaptureList List;
+
+  unsigned DiagID = ParseCXX0XLambdaIntroducer(List);
+  if (DiagID != 0) {
+    Diag(Tok, DiagID);
+    SkipUntil(tok::r_square);
+  }

Technically, I think we can have a valid diagnostic ID of zero, although it's unlikely that it would ever be returned from ParseCXX0XLambdaIntroducer. Wouldn't a return type of llvm::Optional<unsigned> describe the intent more precisely?

+/// TryParseCXX0XLambdaExpression - Use lookahead and potentially tentative
+/// parsing to determine if we are looking at a C++0x lambda expression, and parse
+/// it if we are.
+///
+/// The returned ExprResult will be in one of three states:
+/// - invalid => we are not looking at a lambda expression
+/// - valid and unusable => we parsed a lambda expression with unrecoverable
+///   errors
+/// - valid and usable => we parsed a lambda expression (possibly with
+///   recoverable errors)
+ExprResult Parser::TryParseCXX0XLambdaExpression() {
+  assert(getLang().CPlusPlus0x
+         && Tok.is(tok::l_square)
+         && "why would you call me?");

Please make the assertion test useful, e.g., "Not at the start of a possible lambda expression".

+  ExprResult Res(/*Invalid=*/true);
+
+  // FIXME: Save results of NextToken() and GetLookAheadToken() myself, or rely
+  // on compiler to optimize common subexpression?

I wouldn't rely on the compiler to optimize away the repeated NextToken() calls. It'd be better to save the token locally.

+  if (NextToken().is(tok::r_square) ||

Okay, this is []

+      NextToken().is(tok::equal) ||

Okay, this is [=

+      (NextToken().is(tok::amp) &&
+       GetLookAheadToken(2).isNot(tok::identifier)) ||

This check doesn't seem right. [&(x) + y foo:wibble] is a message send.

Looking for [&, or [&] would make sense.

+      (NextToken().is(tok::identifier) &&
+       GetLookAheadToken(2).is(tok::r_square))) {

Okay, this is [identifier]

+    Res = ParseCXX0XLambdaExpression();

We like to avoid nesting when possible. How about making this

	return ParseCXX0XLambdaExpression();

so we don't need the "else" below?

+  // ...else if lookahead does not indicate that this is not a lambda...
+  } else if (!(NextToken().is(tok::identifier) &&
+               GetLookAheadToken(2).is(tok::identifier))) {
+    CXX0XLambdaCaptureList List;
+    if (TryParseCXX0XLambdaIntroducer(List))
+      Res = ParseCXX0XLambdaExpressionAfterIntroducer(List);
+  }

I think it would be cleaner to isolate the obvious message-send case 

	[identifier identifier

and just 

	return ExprError();

at this point. Then this code

+    CXX0XLambdaCaptureList List;
+    if (TryParseCXX0XLambdaIntroducer(List))
+      Res = ParseCXX0XLambdaExpressionAfterIntroducer(List);

need not be nested at all.

+/// ParseCXX0XLambdaExpression - Parse a lambda introducer.
+///
+/// Returns a DiagnosticID if it hit something unexpected, or 0 if parsing was
+/// successful.
+unsigned Parser::ParseCXX0XLambdaIntroducer(CXX0XLambdaCaptureList &List) {
+  assert(Tok.is(tok::l_square) && "lambda expressions begin with '['");
+  List.Loc = ConsumeBracket();
+
+  bool first = true;
+  while (1) {
+    if (Tok.is(tok::r_square)) {
+      List.EndLoc = MatchRHSPunctuation(tok::r_square, List.Loc);
+      break;
+    }
+
+    if (first) {
+      first = false;
+
+      // Parse capture-default.
+      if (Tok.is(tok::amp) && NextToken().isNot(tok::identifier)) {
+        List.Default = LCD_ByRef;
+        // Consume '&'.
+        ConsumeToken();
+        continue;
+      } else if (Tok.is(tok::equal)) {
+        List.Default = LCD_ByCopy;
+        // Consume '='.
+        ConsumeToken();
+        continue;
+      }

Do we want to save the locate not the '&' or '='? I guess it isn't necessary.

+    if (Tok.is(tok::kw_this)) {
+      Kind = LCK_This;
+      Loc = ConsumeToken();
+    } else {
+      if (Tok.is(tok::amp)) {
+        Kind = LCK_ByRef;
+        ConsumeToken();
+      }
+
+      if (Tok.is(tok::identifier)) {
+        Id = Tok.getIdentifierInfo();
+        Loc = ConsumeToken();
+      } else {
+        return diag::err_expected_capture;
+      }
+    }

Should we have a custom diagnostic for &this?

+/// TryParseCXX0XLambdaExpression - Tentatively parse a lambda introducer.
+///
+/// Returns true if successful. 
+bool Parser::TryParseCXX0XLambdaIntroducer(CXX0XLambdaCaptureList &List) {

LLVM/Clang tends to use false to indicate success, rather than true.

+  if (DiagID == 0) {
+    PA.Commit();
+    return true;
+  } else {
+    PA.Revert();
+    return false;
+  }

This is another case where you can reduce indenting by just dropping the else, since the if returns unconditionally anyway.

Things are looking fairly good, thanks for working on this!

	- Doug



More information about the cfe-commits mailing list