[cfe-dev] [RFC] Captured Statements

Douglas Gregor dgregor at apple.com
Tue Jan 29 17:31:27 PST 2013


Hello,

On Jan 23, 2013, at 7:30 AM, "Pan, Wei" <wei.pan at intel.com> wrote:

> Hi Doug and clang-dev,
> 
> We think this could answer Doug's question about "How function outlining is handled?" http://lists.cs.uiuc.edu/pipermail/cfe-dev/2013-January/027311.html 
> 
> Thanks!
> 
> Ben Langmuir, Wei Pan and Andy Zhang
> Intel of Canada, Waterloo
> 
> *BEGIN*
> 
> [RFC] Captured Statements 
> 
> We are proposing a feature that we have called 'Captured Statements', to
> support outlining statements into functions in Clang's AST.  This
> includes capturing references to variables - similar to C++11 lambdas'
> automatic capture.  However, the feature is more "primitive" than lambdas and
> so has less complexity and baggage, and so can be used for implementing other
> features not related to lambdas.
> 
> We used Captured Statements to support the Cilk Plus C/C++ extension in Clang.
> However, we believe that Captured Statements will be useful to others, and are
> seeking feedback on the proposed design.  In particular, Captured Statements
> should be useful in the implementation of OpenMP parallel regions.  They may
> also be useful in implementing some of the new features (e.g. in-kernel spawning)
> being considered for OpenCL 2.0, and for nested functions as in GCC.
> 
> There is a set of requirements for function outlining:
> 
> (1) Must work for both C and C++ programs
> (2) Should be nestable
> (3) Should be able to capture most types of variables, including arrays and 'this'
> (4) Should be able to customize the capturing and codegen behavior
> 
> The primary use case is to support outlining parallel regions into functions
> so that they may be passed to a runtime library.  Both OpenMP and Cilk Plus
> require this kind of outlining to run parallel regions on multiple threads
> using a runtime library.
> 
> E.g.
> 
> #pragma omp parallel
> {
>  ... // parallel region is outlined, some variable references are captured
> }
> 
> cilk_spawn foo(a, b, c); // call to foo is outlined into a helper function and
>                                               // references to a, b, and c are captured
> 
> 
> There are two existing AST constructs closely related to Captured Statements:
> Objective-C/C++ blocks and C++11 lambda expressions.
> 
> The code generation of "block" calls contains quite a few Objective-C specific
> runtime calls.  There are also constraints for blocks that do not apply to
> Captured Statements, e.g. arrays cannot be captured in blocks.
> C++11 lambda expressions work for C++ only, where the context is captured in a
> CXXRecordDecl.
> 
> As far as we know, neither construct can satisfy all the above requirements,
> and a new Captured Statement seems necessary. The proposed AST changes are based
> on the AST for blocks, but the codegen is closer to lambdas.
> 
> Most existing routines for variable capturing will be shared among blocks,
> lambdas and Captured Statements. We still need to extend the current clang
> implementation. For example, an OpenMP 'threadprivate' variable should also be
> captured, although it may be a static variable or static class member.


> AST
> ===
> 
> We propose adding a new abstract AST class CapturedStmt to represent a Captured Statement:
> 
> - CapturedStmt derives from Stmt
> - CapturedStmt is an abstract class and each kind of outlining
>  (e.g., for OpenMP, Cilk Plus, etc.) will create a separate AST class that
>  derives from CapturedStmt

This part surprises me a little bit. I would have expected that CapturedStmt would be the same across the various consumers of outlining, and that it's the consumers that would differ. An OpenMP parallel for loop would store a CapturedStmt, as might a Cilk spawn expression.
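
As a rough sketch of that layering: the consumers own a CapturedStmt rather than derive from it. This is a toy model and not Clang's actual AST classes; OMPParallelStmt, CilkSpawnStmt, makeCaptured and makeOMPParallel are hypothetical names used only for illustration, and captures are reduced to plain strings.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the AST base class (not Clang's real Stmt).
struct Stmt {
  virtual ~Stmt() = default;
};

// One concrete node shared by every consumer of outlining.
struct CapturedStmt : Stmt {
  std::vector<std::string> Captures;  // names of the captured variables
  std::unique_ptr<Stmt> Body;         // the statement to be outlined
};

// Hypothetical consumers: each stores a CapturedStmt instead of deriving from it.
struct OMPParallelStmt : Stmt {
  std::unique_ptr<CapturedStmt> Region;
};

struct CilkSpawnStmt : Stmt {
  std::unique_ptr<CapturedStmt> Spawned;
};

// Hypothetical helper: build a captured region with a placeholder body.
std::unique_ptr<CapturedStmt> makeCaptured(std::vector<std::string> Caps) {
  auto CS = std::make_unique<CapturedStmt>();
  CS->Captures = std::move(Caps);
  CS->Body = std::make_unique<Stmt>();
  return CS;
}

// Hypothetical helper: wrap an already-built region in an OpenMP-style node.
std::unique_ptr<OMPParallelStmt> makeOMPParallel(std::unique_ptr<CapturedStmt> R) {
  assert(R && "a parallel region needs a captured statement");
  auto P = std::make_unique<OMPParallelStmt>();
  P->Region = std::move(R);
  return P;
}
```

With this shape, capture analysis and outlining only ever need to understand CapturedStmt; what differs per extension lives in the node that holds it.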

> - CapturedStmt will contain "captures", a list of variables referenced within
>  the Captured Statement that are declared outside the scope of the statement
> - The CapturedStmt node will hold a Stmt that is the statement to be outlined.
> 
> We have prototyped Captured Statements and created an example for its use.
> In our prototype, the "#pragma captured" directive is used to mark a compound statement
> as a Captured Statement, which will be outlined into a separate function and the
> compound statement will be immediately replaced by a call to the outlined function.

Please put this under "#pragma clang __debug captured", and we'll remove it as soon as we get our first "real" client in-tree.

> Take the following example,
> 
> int foo(int x) {
>  int y = 7;
>  #pragma captured
>  { y *= x; }
> 
>  return y;
> }
> 
> This is equivalent to
> 
> int foo(int x) {
>  __block int y = 7;
>  ^{ y *= x; }();
> 
>  return y;
> }
> 
> using a block or
> 
> int foo(int x) {
>  int y = 7;
>  [&](){ y *= x; }();
> 
>  return y;
> }
> 
> using a lambda expression. With the Captured Statement, its AST looks like
> 
> (FunctionDecl 0x5b272e0 <captured.c:3:1, line:12:1> foo 'int (int)'
>    (ParmVarDecl 0x5b27220 <line:3:9, col:13> x 'int')
>    (CompoundStmt 0x5b52d10 <col:16, line:12:1>
>      (DeclStmt 0x5b27418 <line:4:3, col:12>
>        (VarDecl 0x5b273a0 <col:3, col:11> y 'int'
>          (IntegerLiteral 0x5b273f8 <col:11> 'int' 7)))
>      (CapturedStmt 0x5b52c60 <line:7:3, line:9:3>
>        (Capture (Var 0x5b273a0 'y' 'int'))
>        (Capture (ParmVar 0x5b27220 'x' 'int'))
>        (CompoundStmt 0x5b52c40 <line:7:3, line:9:3>
>          (CompoundAssignOperator 0x5b52c08 <line:8:5, col:10> 'int' '*=' ComputeLHSTy='int' ComputeResultTy='int'
>            (DeclRefExpr 0x5b52a88 <col:5> 'int' lvalue Var 0x5b273a0 'y' 'int')
>            (ImplicitCastExpr 0x5b52bf0 <col:10> 'int' <LValueToRValue>
>              (DeclRefExpr 0x5b52b40 <col:10> 'int' lvalue ParmVar 0x5b27220 'x' 'int')))))
>      (ReturnStmt 0x5b52cf0 <line:11:3, col:10>
>        (ImplicitCastExpr 0x5b52cd8 <col:10> 'int' <LValueToRValue>
>          (DeclRefExpr 0x5b52cb0 <col:10> 'int' lvalue Var 0x5b273a0 'y' 'int'))))))
> 
> which is almost the same as the AST for the block example above.
> 
> An implicit RecordDecl (not CXXRecordDecl) will be created to hold all the capture fields,
> and the capture type is by reference by default. The statement to be captured will
> be the body of an implicit FunctionDecl.

Just FWIW, you'll almost certainly need to build a CXXRecordDecl in C++ mode, but that shouldn't make what you're doing any harder.
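
To make the implicit record and implicit function concrete, here is a hand-written approximation of what the foo example above would look like after outlining. It is only a sketch under assumptions: the names Capture, ctx and captured_stmt_helper are made up for illustration, and the field order simply follows the capture list in the AST dump (y, then x), with both captured by reference.

```cpp
#include <cassert>

// Stand-in for the implicit record: one by-reference field per capture.
struct Capture {
  int *y;
  int *x;
};

// Stand-in for the implicit outlined function; the capture context is
// passed explicitly as the first argument.
static void captured_stmt_helper(Capture *ctx) {
  assert(ctx && ctx->y && ctx->x);
  *ctx->y *= *ctx->x;  // original body: y *= x;
}

int foo(int x) {
  int y = 7;
  Capture ctx = {&y, &x};      // build the context at the capture site
  captured_stmt_helper(&ctx);  // the compound statement becomes this call
  return y;
}
```

Because y is reached through a pointer, the update made inside the helper is visible to the return statement in foo, which is what by-reference capture is meant to guarantee.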

> Semantic analysis
> =================
> 
> There are a number of common constraints on statements to be captured, and this needs
> to be elaborated further. A general rule is to treat a Captured Statement as
> a function body. For example, the use of jump statements into and out of the
> statement is limited.
> 
> Some refactoring may be required to accommodate needs for derived Captured Statements.
> For example, one Captured Statement may allow throw expressions but another may not.

I guess that's one reason to have different Captured Statement subclasses, but even that's just contextual information that we can easily encode in the single Captured Statement node.

> Code generation
> ===============
> 
> The Captured Statement AST is close to blocks, but the code generation is completely
> different. In fact, for straight Captured Statements (those without additional
> language extension runtime calls inserted), both emission of the outlined function
> and its invocation are much closer to lambdas. The only difference is that the
> capture context is explicitly passed as the first argument.
> 
> The code emitted for the outlined function looks like:
> 
> %struct.capture = type { i32*, i32* }
> 
> define internal void @__captured_stmt_helper(%struct.capture* %this) nounwind {
> entry:
>  %this.addr = alloca %struct.capture*, align 8
>  store %struct.capture* %this, %struct.capture** %this.addr, align 8
>  %0 = load %struct.capture** %this.addr
>  %1 = getelementptr inbounds %struct.capture* %0, i32 0, i32 1
>  %ref = load i32** %1, align 8
>  %2 = load i32* %ref, align 4
>  %3 = getelementptr inbounds %struct.capture* %0, i32 0, i32 0
>  %ref1 = load i32** %3, align 8
>  %4 = load i32* %ref1, align 4
>  %mul = mul nsw i32 %4, %2
>  store i32 %mul, i32* %ref1, align 4
>  ret void
> }
> 
> *END*

Looks reasonable. I think this is a great approach, and I look forward to seeing the patches.

	- Doug




