[cfe-dev] [RFC] Captured Statements

Pan, Wei wei.pan at intel.com
Wed Jan 30 12:23:09 PST 2013


Hi Doug,

Thanks for the feedback! All suggestions make good sense to us. We will address them in our future patches. 

Before sending out patches for commit, we are attaching an *incomplete* implementation of captured statements. With this patch applied, clang should compile the following function:

void foo(int &x) {
  #pragma captured
  {
    int y = 100;
    x += y;
    
    #pragma captured
    {
      y++;
      x -= y; 
    }  
  }
}   
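
For readers following along, here is a hand-written approximation of what outlining the nested example amounts to. It is only a sketch under assumptions: the struct and function names (OuterCapture, InnerCapture, outer_helper, inner_helper, foo_nested_outlined) are hypothetical and are not what the patch emits.

```cpp
// Hand-written approximation of the nested example above.
// All names are hypothetical; the real patch generates its own.

// Context for the outer captured statement: only x is declared outside it.
struct OuterCapture {
  int *x;
};

// Context for the inner captured statement: it references both y and x.
struct InnerCapture {
  int *y;
  int *x;
};

// Body of the inner "#pragma captured" block.
static void inner_helper(InnerCapture *ctx) {
  (*ctx->y)++;
  *ctx->x -= *ctx->y;
}

// Body of the outer block. y is a local of this helper; the inner
// statement captures it by reference and reaches x through the
// outer context.
static void outer_helper(OuterCapture *ctx) {
  int y = 100;
  *ctx->x += y;

  InnerCapture inner = {&y, ctx->x};
  inner_helper(&inner);
}

// What foo reduces to: build the context, then call the helper at once.
void foo_nested_outlined(int &x) {
  OuterCapture outer = {&x};
  outer_helper(&outer);
}
```

The point of the sketch is that the inner statement's captures are resolved relative to the outer helper, so y is captured as a plain local while x is forwarded from the enclosing context.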

There are a number of missing features like template support that we will be working on.

We would welcome any suggestions or feedback. 

Thanks!

Ben Langmuir, Wei Pan and Andy Zhang

-----Original Message-----
From: Douglas Gregor [mailto:dgregor at apple.com] 
Sent: Tuesday, January 29, 2013 8:31 PM
To: Pan, Wei
Cc: Alexey Bataev; clang-dev Developers
Subject: Re: [RFC] Captured Statements

Hello,

On Jan 23, 2013, at 7:30 AM, "Pan, Wei" <wei.pan at intel.com> wrote:

> Hi Doug and clang-dev,
> 
> We think this could answer Doug's question about "How function 
> outlining is handled?" 
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2013-January/027311.html
> 
> Thanks!
> 
> Ben Langmuir, Wei Pan and Andy Zhang
> Intel of Canada, Waterloo
> 
> *BEGIN*
> 
> [RFC] Captured Statements
> 
> We are proposing a feature that we have called 'Captured Statements', 
> to support outlining statements into functions in Clang's AST.  This 
> includes capturing references to variables - similar to C++11 lambdas'
> automatic capture.  However, the feature is more "primitive" than 
> lambdas: it carries less complexity and baggage, so it can be used to 
> implement other features unrelated to lambdas.
> 
> We used Captured Statements to support the Cilk Plus C/C++ extension in Clang.
> However, we believe that Captured Statements will be useful to others, 
> and are seeking feedback on the proposed design.  In particular, 
> Captured Statements should be useful in the implementation of OpenMP 
> parallel regions.  They may also be useful in implementing some of the 
> new features (e.g. in-kernel spawning) being considered for OpenCL 2.0, and for nested functions as in GCC.
> 
> There is a set of requirements for function outlining:
> 
> (1) Must work for both C and C++ programs
> (2) Should be nestable
> (3) Should be able to capture most types of variables, including arrays and 'this'
> (4) Should be able to customize the capturing and codegen behavior
> 
> The primary use case is to support outlining parallel regions into 
> functions so that they may be passed to a runtime library.  Both 
> OpenMP and Cilk Plus require this kind of outlining to run parallel 
> regions on multiple threads using a runtime library.
> 
> E.g.
> 
> #pragma omp parallel
> {
>  ... // parallel region is outlined, some variable references are captured
> }
> 
> cilk_spawn foo(a, b, c); // call to foo is outlined into a helper function and
>                          // references to a, b, and c are captured
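
To make the "outline and hand to a runtime" idea concrete, here is a minimal sketch, assuming a made-up runtime entry point. fake_runtime_run, RegionCapture, region_helper and run_region are all hypothetical names; the stand-in runtime simply calls the helper once per worker on the calling thread, whereas a real OpenMP or Cilk Plus runtime would dispatch it to other threads and the body would need proper synchronization.

```cpp
// Captured context for the region: one pointer per captured variable.
struct RegionCapture {
  int *sum;
  const int *step;
};

// Outlined body of the region. It sees the enclosing variables only
// through the context pointer it is handed.
static void region_helper(void *raw_ctx) {
  RegionCapture *ctx = static_cast<RegionCapture *>(raw_ctx);
  *ctx->sum += *ctx->step;
}

// Hypothetical stand-in for a runtime library entry point. It takes a
// function pointer plus an opaque context, which is the shape that
// outlining is meant to produce. Here it runs sequentially.
static void fake_runtime_run(void (*fn)(void *), void *ctx, unsigned workers) {
  for (unsigned i = 0; i < workers; ++i)
    fn(ctx);
}

// The original function after outlining: build the context and pass
// the helper to the runtime instead of executing the region inline.
int run_region(int step, unsigned workers) {
  int sum = 0;
  RegionCapture ctx = {&sum, &step};
  fake_runtime_run(region_helper, &ctx, workers);
  return sum;
}
```

With this shape, run_region(3, 4) invokes the helper four times against the same captured sum.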
> 
> 
> There are two existing AST constructs closely related to Captured Statements:
> Objective-C/C++ blocks and C++11 lambda expressions.
> 
> The code generation of "block" calls contains quite a few Objective-C 
> specific runtime calls.  There are also constraints for blocks that do 
> not apply to Captured Statements, e.g. arrays cannot be captured in blocks.
> C++11 lambda expressions work for C++ only, where the context is captured in a
> CXXRecordDecl.
> 
> As far as we know, neither construct can satisfy all the above 
> requirements, and a new Captured Statement seems necessary. The 
> proposed AST changes are based on the AST for blocks, but the codegen is closer to lambdas.
> 
> Most existing routines for variable capturing will be shared among 
> blocks, lambdas and Captured Statements. We still need to extend the 
> current clang implementation. For example, an OpenMP 'threadprivate' 
> variable should also be captured, although it may be a static variable or static class member.


> AST
> ===
> 
> We propose adding a new abstract AST class CapturedStmt to represent a Captured Statement:
> 
> - CapturedStmt derives from Stmt
> - CapturedStmt is an abstract class and each kind of outlining (e.g. 
> for OpenMP, Cilk Plus, etc.) will create a separate AST class that 
> derives from CapturedStmt

This part surprises me a little bit. I would have expected that CapturedStmt would be the same across the various consumers of outlining, and that it's the consumers that would differ. An OpenMP parallel for loop would store a CapturedStmt, as might a Cilk spawn expression.

> - CapturedStmt will contain "captures", a list of variables referenced 
> within  the Captured Statement that are declared outside the scope of 
> the statement
> - The CapturedStmt node will hold a Stmt that is the statement to be outlined.
> 
> We have prototyped Captured Statements and created an example for its use.
> In our prototype, the "#pragma captured" directive is used to mark a 
> compound statement as a Captured Statement, which will be outlined 
> into a separate function and the compound statement will be replaced a call to the outline function immediately.

Please put this under "#pragma clang __debug captured", and we'll remove it as soon as we get our first "real" client in-tree.

> Take the following example,
> 
> int foo(int x) {
>  int y = 7;
>  #pragma captured
>  { y *= x; }
> 
>  return y;
> }
> 
> This is equivalent to
> 
> int foo(int x) {
>  __block int y = 7;
>  ^{ y *= x; }();
> 
>  return y;
> }
> 
> using a block or
> 
> int foo(int x) {
>  int y = 7;
>  [&](){ y *= x; }();
> 
>  return y;
> }
> 
> using a lambda expression. With the Captured Statement, its AST looks 
> like
> 
> (FunctionDecl 0x5b272e0 <captured.c:3:1, line:12:1> foo 'int (int)'
>    (ParmVarDecl 0x5b27220 <line:3:9, col:13> x 'int')
>    (CompoundStmt 0x5b52d10 <col:16, line:12:1>
>      (DeclStmt 0x5b27418 <line:4:3, col:12>
>        (VarDecl 0x5b273a0 <col:3, col:11> y 'int'
>          (IntegerLiteral 0x5b273f8 <col:11> 'int' 7)))
>      (CapturedStmt 0x5b52c60 <line:7:3, line:9:3>
>        (Capture (Var 0x5b273a0 'y' 'int'))
>        (Capture (ParmVar 0x5b27220 'x' 'int'))
>        (CompoundStmt 0x5b52c40 <line:7:3, line:9:3>
>          (CompoundAssignOperator 0x5b52c08 <line:8:5, col:10> 'int' '*=' ComputeLHSTy='int' ComputeResultTy='int'
>            (DeclRefExpr 0x5b52a88 <col:5> 'int' lvalue Var 0x5b273a0 'y' 'int')
>            (ImplicitCastExpr 0x5b52bf0 <col:10> 'int' <LValueToRValue>
>              (DeclRefExpr 0x5b52b40 <col:10> 'int' lvalue ParmVar 0x5b27220 'x' 'int')))))
>      (ReturnStmt 0x5b52cf0 <line:11:3, col:10>
>        (ImplicitCastExpr 0x5b52cd8 <col:10> 'int' <LValueToRValue>
>          (DeclRefExpr 0x5b52cb0 <col:10> 'int' lvalue Var 0x5b273a0 'y' 'int'))))))
> 
> which is almost the same as the AST for the block example above.
> 
> An implicit RecordDecl (not CXXRecordDecl) will be created to hold all 
> the capture fields, and the capture type is by reference by default. 
> The statement to be captured will be the body of an implicit FunctionDecl.

Just FWIW, you'll almost certainly need to build a CXXRecordDecl in C++ mode, but that shouldn't make what you're doing any harder.
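
As an illustration of the implicit record and implicit function described above, the foo example can be lowered by hand roughly as follows. This is a sketch, not the output of the patch: Capture, captured_stmt_helper, foo_outlined and foo_lambda are hypothetical names, and the field order (y first, then x) is chosen only to mirror the capture list in the AST dump.

```cpp
// Hand-written stand-in for the implicit record: one field per capture,
// each held by reference (modelled here as a pointer).
struct Capture {
  int *y;
  int *x;
};

// Hand-written stand-in for the implicit function whose body is the
// captured statement. The context arrives as an explicit first argument.
static void captured_stmt_helper(Capture *ctx) {
  *ctx->y *= *ctx->x;
}

// foo with the captured statement replaced by an immediate call.
int foo_outlined(int x) {
  int y = 7;
  Capture ctx = {&y, &x};
  captured_stmt_helper(&ctx);
  return y;
}

// The lambda form from the RFC, for comparison. Here the context is the
// closure object and is reached through the implicit 'this' of operator().
int foo_lambda(int x) {
  int y = 7;
  [&]() { y *= x; }();
  return y;
}
```

Both functions compute the same result; the difference is only in how the context reaches the outlined body.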

> Semantic analysis
> =================
> 
> There are a number of common constraints on statements to be captured, 
> and this needs to be elaborated further. A general rule is to treat a 
> Captured Statement as a function body. For example, the use of jump 
> statements into and out of the statement is limited.
> 
> Some refactoring may be required to accommodate needs for derived Captured Statements.
> For example, one Captured Statement may allow throw expressions but another may not.

I guess that's one reason to have different Captured Statement subclasses, but even that's just contextual information that we can easily encode in the single Captured Statement node.
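
One way to picture the single-node design suggested here is the toy model below. It is only a sketch under assumptions: none of these types are Clang classes, the names (RegionKind, allows_throw, OMPParallelDirective, CilkSpawnExpr) are hypothetical, and the rule for which kinds permit throw expressions is arbitrary and chosen purely for illustration.

```cpp
#include <string>
#include <vector>

// Toy stand-ins only; these are not Clang's AST classes.
struct Stmt {
  virtual ~Stmt() = default;
};

struct Capture {
  std::string name;
  bool by_reference;
};

// A single concrete node shared by every client of outlining.
// Client-specific rules are encoded as contextual information
// on the node instead of as separate subclasses.
struct CapturedStmt : Stmt {
  enum class RegionKind { Default, OpenMP, CilkSpawn };

  RegionKind kind = RegionKind::Default;
  std::vector<Capture> captures;
  Stmt *body = nullptr;

  // Illustrative policy only: which kinds allow throw expressions
  // is an assumption made up for this sketch.
  bool allows_throw() const { return kind == RegionKind::Default; }
};

// The clients differ, and each one stores a CapturedStmt.
struct OMPParallelDirective : Stmt {
  CapturedStmt *region = nullptr;
};

struct CilkSpawnExpr {
  CapturedStmt *call = nullptr;
};
```

Under this shape, semantic checks would consult the kind stored on the node, and the consumers would own the differences in code generation.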

> Code generation
> ===============
> 
> The Captured Statement AST is close to blocks, but the code generation 
> is completely different. In fact, for straight Captured Statements 
> (those without additional language extension runtime calls inserted), 
> both emission of the outlined function and its invocation are much 
> closer to lambdas. The only difference is that the capture context is explicitly passed as the first argument.
> 
> The code emitted for the outlined function looks like:
> 
> %struct.capture = type { i32*, i32* }
> 
> define internal void @__captured_stmt_helper(%struct.capture* %this) nounwind {
> entry:
>  %this.addr = alloca %struct.capture*, align 8
>  store %struct.capture* %this, %struct.capture** %this.addr, align 8
>  %0 = load %struct.capture** %this.addr
>  %1 = getelementptr inbounds %struct.capture* %0, i32 0, i32 1
>  %ref = load i32** %1, align 8
>  %2 = load i32* %ref, align 4
>  %3 = getelementptr inbounds %struct.capture* %0, i32 0, i32 0
>  %ref1 = load i32** %3, align 8
>  %4 = load i32* %ref1, align 4
>  %mul = mul nsw i32 %4, %2
>  store i32 %mul, i32* %ref1, align 4
>  ret void
> }
> 
> *END*

Looks reasonable. I think this is a great approach, and I look forward to seeing the patches.

	- Doug


-------------- next part --------------
A non-text attachment was scrubbed...
Name: captured_stmt.patch
Type: application/octet-stream
Size: 56751 bytes
Desc: captured_stmt.patch
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130130/6adfbeea/attachment.obj>

