[PATCH] D11361: [OpenMP] Target directive host codegen

Thu Jul 23 21:45:00 PDT 2015

rjmccall added inline comments.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:863-864
@@ -840,1 +862,4 @@
   }
+  case OMPRTL__tgt_target: {
+    // Build to int32_t __tgt_target(int32_t device_id, void *host_ptr, int32_t
+    // arg_num, void** args_base, void **args, int64_t *arg_sizes, int32_t
----------------
Spurious "to" at the start.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:2921
@@ +2920,3 @@
+      uint64_t SizeVal =
+          CGM.getDataLayout().getTypeSizeInBits(V->getType()) / 8;
+      Size = CGF.Builder.getInt64(SizeVal);
----------------
getTypeStoreSize()

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:2930
@@ +2929,3 @@
+      uint64_t SizeVal =
+          CGM.getDataLayout().getTypeSizeInBits(PtrTy->getElementType()) / 8;
+      Size = CGF.Builder.getInt64(SizeVal);
----------------
You should ask the ASTContext to compute this size instead of making assumptions about the layout size of the IR type.

Also, what are the semantics supposed to be for mapping to and from?  Do referents need to be trivially copyable?  What if there are pointers or references?  What happens to virtual bases?

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:2940
@@ +2939,3 @@
+      uint64_t SizeVal =
+          CGM.getDataLayout().getTypeSizeInBits(PtrTy->getElementType()) / 8;
+      Size = CGF.Builder.getInt64(SizeVal);
----------------
Same thing, please ask the ASTContext.

Also, you might need to care about variably-sized types here.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3009
@@ +3008,3 @@
+    SizesArray =
+        CGF.Builder.CreateAlloca(CGM.Int64Ty, PointerNum, ".offload_sizes");
+
----------------
This is creating a bunch of dynamic allocas instead of just temporary values.  Please call CreateMemTemp with an array type instead.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3014
@@ +3013,3 @@
+    llvm::Constant *MapTypesArrayInit =
+        llvm::ConstantDataArray::get(CGF.Builder.getContext(), MapTypes);
+    MapTypesArray =
----------------
The sizes aren't constant if you've captured a VLA.  But this comment is actually just wrong, because this isn't building something for the sizes at all; it's building something for the flags.

That said, I think you ought to be able to do the same thing with the sizes when you don't have a VLA.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3017
@@ +3016,3 @@
+        new llvm::GlobalVariable(CGM.getModule(), MapTypesArrayInit->getType(),
+                                 true, llvm::GlobalValue::PrivateLinkage,
+                                 MapTypesArrayInit, ".offload_maptypes");
----------------
Please comment boolean arguments like this:
   /*constant*/ true

And please mark this variable unnamed_addr.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3037
@@ +3036,3 @@
+      llvm::Value *S = CGF.Builder.CreateConstInBoundsGEP1_32(
+          SPtrTy->getElementType(), SizesArray, i);
+
----------------
You already know the element types for all of these.  The code will be much more readable if you just use those types directly.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3044
@@ +3043,3 @@
+      CGF.Builder.CreateStore(
+          CGF.Builder.CreateIntCast(Sizes[i], CGM.Int64Ty, true), S);
+    }
----------------
It makes more sense to emit the store next to the GEP.  This will also let you more easily skip all the code associated with an array when you can emit it as a constant.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3067
@@ +3066,3 @@
+          CGM.getModule(), CGM.Int8Ty, true, llvm::GlobalValue::PrivateLinkage,
+          llvm::Constant::getNullValue(CGM.Int8Ty), ".offload_hstptr"),
+      PointerNum, BasePointersArray, PointersArray, SizesArray, MapTypesArray};
----------------
Please move the creation of this global out of this initializer list.

================
Comment at: lib/CodeGen/CGStmt.cpp:2215-2245
@@ -2213,4 +2214,33 @@
   FunctionArgList Args;
-  Args.append(CD->param_begin(), CD->param_end());
+
+  // If this is an offload function, we need pass a reference to each captured
+  // declarations as arguments.
+  if (isOffloadFunction) {
+    DeclContext *DC = CapturedDecl::castToDeclContext(CD)->getParent();
+    auto ri = RD->field_begin();
+    for (CapturedStmt::const_capture_iterator ci = S.capture_begin(),
+                                              ce = S.capture_end();
+         ci != ce; ++ci, ++ri) {
+      StringRef Name;
+      QualType Ty;
+      if (ci->capturesVariableArrayType()) {
+        Ty = Ctx.getPointerType(ri->getType());
+        Name = "__vla_size";
+      } else if (ci->capturesThis()) {
+        Ty = ri->getType();
+        Name = "__this";
+      } else {
+        const VarDecl *VD = ci->getCapturedVar();
+        Ty = Ctx.getPointerType(VD->getType());
+        Name = VD->getName();
+      }
+
+      IdentifierInfo *ParamName = &Ctx.Idents.get(Name);
+      ImplicitParamDecl *Param =
+          ImplicitParamDecl::Create(Ctx, DC, Loc, ParamName, Ty);
+      Args.push_back(Param);
+    }
+  } else
+    Args.append(CD->param_begin(), CD->param_end());
 
   // Create the function declaration.
----------------
ABataev wrote:
> sfantao wrote:
> > There is only one argument that comes with the CaptureDecl and that is the anonymous struct that captures all the references. 
> > 
> > For the target outlined functions we have to pass each capture separately because not only are they arguments but also data mappings. Therefore we don't have to generate the anonymous struct at all and use the captures directly, signaling the proper map types. This follows what was discussed in the offloading infrastructure proposal - the compiler will generate arrays with each argument/mapping and forward that to the target specific plugin. The target function is therefore expected to have the same signature of the host function that is being generated by this patch, so that the order in the array is consistent with the signature.
> > 
> > The code in this patch that is generating the host version of the target region can therefore be completely reused to generate the device version.
> Could we add required functionality to CapturedStmt to make it pass mappings also, if required? Currently we have to expose some things that better to stay hidden like VLAMap, modify StartFunction(), when we could add required parameters in Sema analysis and get all required stuff for free, without any additional changes in codegen
Okay, so from what I understand, __tgt_target is just a placeholder call that some later pass will rewrite to invoke a target-specific plugin that behaves like the host function.  Fine, but:

- Doesn't it still need to get passed the host function?
- Why does it return an i32 instead of just an i1, if you're going to rewrite it anyway?
- It would be much clearer to name it something actually intrinsic-like, like "omp.target.call".

As for the signature: as we've discussed before, because the captured statement invocation function isn't actually exposed anywhere, its physical signature is really completely an implementation detail of the captured statement emitter.  So I have no problem with this kind of change, but you should really design it to be flexible for different captured-statement emitters (by, say, taking a lambda or something similar) rather than hardcoding the specific needs of this one.

================
Comment at: lib/CodeGen/CodeGenFunction.cpp:769
@@ +768,3 @@
+    }
+  }
+
----------------
Please find some way of doing this that does not rely on changing common infrastructure like StartFunction.

================
Comment at: lib/CodeGen/CodeGenFunction.cpp:1550
@@ -1512,1 +1549,3 @@
+  return vlaSize;
+}
 
----------------
What do you need this for that you can't use getVLASize for?


http://reviews.llvm.org/D11361