[PATCH] D11361: [OpenMP] Target directive host codegen

Fri Aug 14 19:34:36 PDT 2015

sfantao added a comment.

Alexey, John,

Thanks for the review! I've tried to address your concerns in the last diff. Please, check the inlined comments to find answers for the remarks of the previous diff.

Thanks again!
Samuel


================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:863-864
@@ -840,1 +862,4 @@
   }
+  case OMPRTL__tgt_target: {
+    // Build to int32_t __tgt_target(int32_t device_id, void *host_ptr, int32_t
+    // arg_num, void** args_base, void **args, int64_t *arg_sizes, int32_t
----------------
rjmccall wrote:
> Spurious "to" at the start.
Fixed!

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:2921
@@ +2920,3 @@
+      uint64_t SizeVal =
+          CGM.getDataLayout().getTypeSizeInBits(V->getType()) / 8;
+      Size = CGF.Builder.getInt64(SizeVal);
----------------
rjmccall wrote:
> getTypeStoreSize()
Now using getTypeSizeInChars from the ASTcontext as suggested bellow.  

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:2930
@@ +2929,3 @@
+      uint64_t SizeVal =
+          CGM.getDataLayout().getTypeSizeInBits(PtrTy->getElementType()) / 8;
+      Size = CGF.Builder.getInt64(SizeVal);
----------------
rjmccall wrote:
> You should ask the ASTContext to compute this size instead of making assumptions about the layout size of the IR type.
> 
> Also, what are the semantics supposed to be for mapping to and from?  Do referents need to be trivially copyable?  What if there are pointers or references?  What happens to virtual bases?
Using using getTypeSizeInChars from the ASTcontext.

This patch only deals with trivially copiable types. By default, a variable that is captured in the target region is mapped "by value" using a to-from policy. In order to do something different than the default, the user has to use a map clause  (I'll send out a patch for it once we have the Parsing/SEMA in place). As per the current spec, the map clause allows a user to map the pointee of a pointer as well as only mapping a section of an array or pointee. In the next version of the OpenMP spec we will have the ability to map aggregate fields and more support for "deep" copy. We expect to be able to handle all the cases with the proper flags in OpenMPOffloadMappingFlags. 

About virtual bases: OpenMP 4.0 forbids virtual members in mappable variable. Nevertheless, it is possible this constraint be lifted in future versions.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:2940
@@ +2939,3 @@
+      uint64_t SizeVal =
+          CGM.getDataLayout().getTypeSizeInBits(PtrTy->getElementType()) / 8;
+      Size = CGF.Builder.getInt64(SizeVal);
----------------
rjmccall wrote:
> Same thing, please ask the ASTContext.
> 
> Also, you might need to care about variably-sized types here.
Now using ASTContext.

I was planing to deal with the VLA types once I add support for the map clause, but I agree it makes more sense to do it now. Thanks for the suggestion.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3009
@@ +3008,3 @@
+    SizesArray =
+        CGF.Builder.CreateAlloca(CGM.Int64Ty, PointerNum, ".offload_sizes");
+
----------------
rjmccall wrote:
> This is creating a bunch of dynamic allocas instead of just temporary values.  Please call CreateMemTemp with an array type instead.
Done!

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3014
@@ +3013,3 @@
+    llvm::Constant *MapTypesArrayInit =
+        llvm::ConstantDataArray::get(CGF.Builder.getContext(), MapTypes);
+    MapTypesArray =
----------------
rjmccall wrote:
> The sizes aren't constant if you've captured a VLA.  But this comment is actually just wrong, because this isn't building something for the sizes at all; it's building something for the flags.
> 
> That said, I think you ought to be able to do the same thing with the sizes when you don't have a VLA.
Sorry for the error in the comment. It is fixed now.

Also, I added code to deal with constant sizes for when we don't have VLAs.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3017
@@ +3016,3 @@
+        new llvm::GlobalVariable(CGM.getModule(), MapTypesArrayInit->getType(),
+                                 true, llvm::GlobalValue::PrivateLinkage,
+                                 MapTypesArrayInit, ".offload_maptypes");
----------------
rjmccall wrote:
> Please comment boolean arguments like this:
>    /*constant*/ true
> 
> And please mark this variable unnamed_addr.
Added comments for the boolean.

Setting unnamed_addr now, thanks!

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3037
@@ +3036,3 @@
+      llvm::Value *S = CGF.Builder.CreateConstInBoundsGEP1_32(
+          SPtrTy->getElementType(), SizesArray, i);
+
----------------
rjmccall wrote:
> You already know the element types for all of these.  The code will be much more readable if you just use those types directly.
Ok, now using the types explicitly.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3044
@@ +3043,3 @@
+      CGF.Builder.CreateStore(
+          CGF.Builder.CreateIntCast(Sizes[i], CGM.Int64Ty, true), S);
+    }
----------------
rjmccall wrote:
> It makes more sense to emit the store next to the GEP.  This will also let you more easily skip all the code associated with an array when you can emit it as a constant.
Moved the stores next to the GEP. Also skipping the sizes for when no VLAs are used.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3067
@@ +3066,3 @@
+          CGM.getModule(), CGM.Int8Ty, true, llvm::GlobalValue::PrivateLinkage,
+          llvm::Constant::getNullValue(CGM.Int8Ty), ".offload_hstptr"),
+      PointerNum, BasePointersArray, PointersArray, SizesArray, MapTypesArray};
----------------
rjmccall wrote:
> Please move the creation of this global out of this initializer list.
Done!

================
Comment at: lib/CodeGen/CGStmt.cpp:2215-2245
@@ -2213,4 +2214,33 @@
   FunctionArgList Args;
-  Args.append(CD->param_begin(), CD->param_end());
+
+  // If this is an offload function, we need pass a reference to each captured
+  // declarations as arguments.
+  if (isOffloadFunction) {
+    DeclContext *DC = CapturedDecl::castToDeclContext(CD)->getParent();
+    auto ri = RD->field_begin();
+    for (CapturedStmt::const_capture_iterator ci = S.capture_begin(),
+                                              ce = S.capture_end();
+         ci != ce; ++ci, ++ri) {
+      StringRef Name;
+      QualType Ty;
+      if (ci->capturesVariableArrayType()) {
+        Ty = Ctx.getPointerType(ri->getType());
+        Name = "__vla_size";
+      } else if (ci->capturesThis()) {
+        Ty = ri->getType();
+        Name = "__this";
+      } else {
+        const VarDecl *VD = ci->getCapturedVar();
+        Ty = Ctx.getPointerType(VD->getType());
+        Name = VD->getName();
+      }
+
+      IdentifierInfo *ParamName = &Ctx.Idents.get(Name);
+      ImplicitParamDecl *Param =
+          ImplicitParamDecl::Create(Ctx, DC, Loc, ParamName, Ty);
+      Args.push_back(Param);
+    }
+  } else
+    Args.append(CD->param_begin(), CD->param_end());
 
   // Create the function declaration.
----------------
rjmccall wrote:
> ABataev wrote:
> > rjmccall wrote:
> > > ABataev wrote:
> > > > sfantao wrote:
> > > > > There is only one argument that comes with the CaptureDecl and that is the anonymous struct that captures all the references. 
> > > > > 
> > > > > For the target outlined functions we have to pass each capture separately because not only are they arguments but also data mappings. Therefore we don't have to generate the anonymous struct at all and use the captures directly, signaling the proper map types. This follows what was discussed in the offloading infrastructure proposal - the compiler will generate arrays with each argument/mapping and forward that to the target specific plugin. The target function is therefore expected to have the same signature of the host function that is being generated by this patch, so that the order in the array is consistent with the signature.
> > > > > 
> > > > > The code in this patch that is generating the host version of the target region can therefore be completely reused to generate the device version.
> > > > Could we add required functionality to CapturedStmt to make it pass mappings also, if required? Currently we have to expose some things that better to stay hidden like VLAMap, modify StartFunction(), when we could add required parameters in Sema analysis and get all required stuff for free, without any additional changes in codegen
> > > Okay, so from what I understand, __tgt_target is just a placeholder call that some later pass will rewrite to invoke a target-specific plugin that behaves like the host function.  Fine, but:
> > > 
> > > - Doesn't it still need to get passed the host function?
> > > - Why does it return an i32 instead of just an i1, if you're going to rewrite it anyway?
> > > - It would be much clearer to name it something actually intrinsic-like, like "omp.target.call".
> > > 
> > > As for the signature: as we've discussed before, because the captured statement invocation function isn't actually exposed anywhere, its physical signature is really completely an implementation detail of the captured statement emitter.  So I have no problem with this kind of change, but you should really design it to be flexible for different captured-statement emitters (by, say, taking a lambda or something similar) rather than hardcoding the specific needs of this one.
> > John, __tgt_target is not a placeholder, this is a real name of offloading runtime function. Runtime library for offloading is not a part of LLVM yet. You can find it here https://github.com/clang-omp/libomptarget. Runtime functions are declared here https://github.com/clang-omp/libomptarget/blob/master/src/omptarget.h
> > 
> Oh, I see.  It's the host-pointer argument that's meant to uniquely determine the operation to perform on the target, and there's going to be some separate registration step that will associate that with a target-specific implementation?
> 
> Are you really adding this otherwise-unnecessary use of libffi just so that you can pass the captures as (indirect) arguments for no particular reason?  You're free to make that decision, but I think it's a poor one.  If you're assuming that all the migrating captures are POD — and I don't see how that's avoidable — then you'd be better off putting all the captured variables into a single allocation and presenting the target with the base address of that.
Correct, there is going to be a separate registration step where all the host pointers are going to be captured in a descriptor and passed to the OpenMP runtime library (__tgt_register_lib()). I'll contribute that in a separate patch.

libffi is not required unless one wants to do offloading to some host thread. The host version of the target region is not launched from the runtime library. If the runtime decides to execute the target region on the host it should return an error code in tgt_target and it will just work. What I think is key here is:

- To have the runtime be more efficient while scanning the mapped pointers/sizes/map types, and using arrays is a good solution for that. 

- For offloading APIs I have experience with, e.g. CUDA, having an array of pointers is exactly what is expected. By passing the parameters separately will let the implementation decide what is best for the target, usually use some dedicated memory or registers for parameters as defined in the calling/launching convention it has in place (in the limit it could implement a single allocation and load the parameters from there as you are suggesting).

Let me know if you need me to clarify any detail about the interface.

================
Comment at: lib/CodeGen/CodeGenFunction.cpp:769
@@ +768,3 @@
+    }
+  }
+
----------------
rjmccall wrote:
> Please find some way of doing this that does not rely on changing common infrastructure like StartFunction.
In the new patch CaptureDecl was changed and `CodeGenFunction::PrepareOMPTargetDirectiveBodyEmission` was created so that this code is not required anymore.

================
Comment at: lib/CodeGen/CodeGenFunction.cpp:1550
@@ -1512,1 +1549,3 @@
+  return vlaSize;
+}
 
----------------
rjmccall wrote:
> What do you need this for that you can't use getVLASize for?
What I need is the size of each dimension and what `getVLASize` returns is the size of the type. I can probably find ways of doing this without this new member function, but that will be definitely more messy. Should I do that? Any suggestion?


http://reviews.llvm.org/D11361