[cfe-commits] [PATCH] C++0x range-based for loop

Sun Mar 27 03:57:20 PDT 2011

On Mar 21, 2011, at 12:56 AM, Richard Smith wrote:

> Hi,
> 
> The attached patch adds support for C++0x's range-based for loops to
> clang, according to the current draft (N3242). Specifically:
> 
> A new Stmt subclass, CXXForRangeStmt, has been added. This class provides
> two views of the for-statement: a partially-desugared representation (just
> enough desugaring is done to be able to perform the semantic checks
> needed), and an implicit source-as-written representation (extracted from
> the partially-desugared representation).
> 
> In order to avoid tentative parsing, the parsing of the
> for-range-declaration is performed by ParseSimpleDeclaration. The
> for-range-initializer is parsed there too, in order to ensure that the
> for-range-declaration is not in scope in the initializer: "int a =
> {1,2,3}; for (auto a : a)" appears to be legal.
> 
> Most of the semantic analysis for a for-range statement is performed
> before the body is parsed, so that the type of the loop variable can be
> deduced (in case it contains 'auto'). Attaching the body is performed as a
> separate finalization step.
> 
> For a dependently-typed range expression, much of the semantic analysis is
> deferred until template instantiation time. A new flag has been added to
> VarDecl to indicate whether it's a for-range-declarator. This flag is used
> to skip ActOnUninitializedDecl (specifically, while instantiating a
> template containing a dependent for-range statement); such declarators are
> never actually uninitialized, but appear that way in the AST if the
> required semantic analysis could not be performed.
> 
> As a small extension, this implementation allows the deduced 'begin' and
> 'end' iterators to be of differing types; this allows for-range to be used
> for iterators where it's easy to determine whether an iterator is at the
> end, but difficult or expensive to construct an explicit iterator object
> representing the 'end', or to compare general iterators.
> 
> The wording for for-range statements seems likely to change at the
> upcoming WG21 meeting in Madrid (per N3257). Resolution 2 for this change
> can be implemented by changing the NeedsADL flag from true to false in
> BuildForRangeBeginEndCall in SemaStmt.cpp.
> 
> Please review!

Very cool!

@@ -1005,6 +1006,328 @@
                                                    ForLoc, RParenLoc));
 }
 
+namespace {
+
+/// Build a call to 'begin' or 'end' for a C++0x for-range statement.
+/// FIXME: This is likely to be changed as the result of paper N3257.

We now have final wording for this, which just adds a new step before the ADL-only lookup of begin/end. So I'll just comment on the parts that are the same.

+static ExprResult BuildForRangeBeginEndCall(Sema &SemaRef, Scope *S,
+                                            SourceLocation Loc,
+                                            const char *Name, Expr *Arg) {
+  // Find the appropriate begin or end function.
+  IdentifierInfo *II = &SemaRef.PP.getIdentifierTable().get(Name);
+  LookupResult R(SemaRef, II, Loc, Sema::LookupOrdinaryName);
+
+  if (SemaRef.StdNamespace)
+    SemaRef.LookupQualifiedName(R, SemaRef.getStdNamespace());

Qualified name lookup into namespace "std" is not the same as adding "std" as an associated namespace for ADL (there are differences w.r.t. using directives, for example). I suggest modeling the handling of "std" exactly as the working paper specifies. 

+  if (R.isAmbiguous())
+    return ExprError();
+
+  CXXScopeSpec SS;
+  // In order to implement N3257 resolution 2, change this 'true' to 'false'.
+  ExprResult Func = SemaRef.BuildDeclarationNameExpr(SS, R, /*NeedsADL=*/true);
+  if (Func.isInvalid())
+    return ExprError();

Since we won't actually be performing any non-ADL name lookup for 'begin' or 'end', most of this can go away. It's probably best just to build the UnresolvedLookupExpr directly.

Index: include/clang/AST/ASTContext.h
===================================================================

--- include/clang/AST/ASTContext.h	(revision 127980)
+++ include/clang/AST/ASTContext.h	(working copy)
@@ -422,6 +422,10 @@
   CanQualType DependentTy;
   CanQualType ObjCBuiltinIdTy, ObjCBuiltinClassTy, ObjCBuiltinSelTy;
 
+  // Types for deductions in C++0x [stmt.ranged]'s desugaring.
+  QualType AutoDeductTy;     // Deduction against 'auto'.
+  QualType AutoRRefDeductTy; // Deduction against 'auto &&'.

It looks like we're missing serialization/deserialization code for these type nodes. Also, can we build them lazily via some accessors, rather than forcing initialization early in the bring-up of ASTContext? Most programs won't ever need them.

+  // Deduce the type for the implicit range variable.
+  QualType RangeType;
+  if (Range->getType()->isVoidType() ||
+      !DeduceAutoType(Context.AutoRRefDeductTy, Range, RangeType)) {
+    // Deduction can fail, eg if the range is an overloaded function.
+    Diag(Range->getLocStart(), diag::err_for_range_deduction_failure)
+      << Range->getType() << Range->getSourceRange();
+    // Pretend range type is dependent in order to allow us to carry on.
+    RangeType = Context.AutoRRefDeductTy;
+  }

Pretending that the range type is dependent works fine when we're in a template, but doing so in non-template code can be problematic: there should be no dependent types within a non-template function, even when that function is ill-formed. I suggest bailing out earlier, or marking failure in some other way.

+    if (BeginExpr.get()->getType()->isVoidType() ||
+        !DeduceAutoType(Context.AutoDeductTy, BeginExpr.get(), BeginType)) {
+      Diag(RangeLoc, diag::err_for_range_iter_deduction_failure)
+        << BeginExpr.get()->getType();
+      NoteForRangeBeginEndFunction(*this, BeginExpr.get(), BEF_begin);
+    }


Do we really need the check for 'void' here? Can it be folded into DeduceAutoType, since essentially every user of DeduceAutoType will have to do a 'void' check?

+    // C++0x [decl.spec.auto]p6: BeginType and EndType must be the same. We
+    // allow them to differ as an extension, for structures where iterators
+    // can tell whether they have reached the end, and where an explicit 'end'
+    // iterator would be expensive to build.
+    if (Context.getCanonicalType(BeginType) !=
+        Context.getCanonicalType(EndType)) {
+      Diag(RangeLoc, diag::ext_for_range_begin_end_types_differ);
+      NoteForRangeBeginEndFunction(*this, BeginExpr.get(), BEF_begin);
+      NoteForRangeBeginEndFunction(*this, EndExpr.get(), BEF_end);
+    }

It's a little cleaner to use "if(!Context.hasSameType(BeginType, EndType))".


As a general note, most of the various expressions and initializers are full expressions, but you don't seem to be calling ActOnFinishFullExpr at all. That will likely manifest as a bug where temporaries created by those expressions won't be destroyed. Which ties in with...

@@ -1175,15 +1201,19 @@
   if (Body.isInvalid())
     return StmtError();
 
-  if (!ForEach)
-    return Actions.ActOnForStmt(ForLoc, LParenLoc, FirstPart.take(), SecondPart,
-                                SecondVar, ThirdPart, RParenLoc, Body.take());
+  if (ForEach)
+    // FIXME: It isn't clear how to communicate the late destruction of 
+    // C++ temporaries used to create the collection.

Although this FIXME is specifically written for the Objective-C for collection statement, it seems that the same issue affects the C++0x range-for.  How about gathering the set of temporaries still active after processing the range expression and placing them into the CXXForRangeStmt? CodeGen can then create a special cleanup scope for these temporaries.

It would be nice to have a test for the AST read/write of CXXForRangeStmts.

This is looking really great!

	- Doug