[PATCH] D29506: [OpenMP] Teams reduction on the NVPTX device.

Thu Feb 9 06:42:52 PST 2017

arpith-jacob marked 2 inline comments as done.
arpith-jacob added a comment.

In https://reviews.llvm.org/D29506#669542, @ABataev wrote:

> The patch is too big and quite hard to review? Could you split it into several smaller parts?

Alexey, thank you for your time.  I have addressed your comments and split the patch into a smaller one, which I will post shortly.

The new patch is to implement parallel reductions on the GPU.  This is the smallest patch I can get such that the codegen is correct and fully functional (runs correctly on the GPU).

Thanks,
Arpith

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:4280-4282
+  // We don't need debug information in this function as nothing here refers to
+  // user source code.
+  CGF.disableDebugInfo();
----------------
ABataev wrote:
> It is not quite so, at least we have a reference in a list of reductions in `reduction` clause, which may be considered a debug position
Ok, removed.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.h:524
+
+  static bool classof(const CGOpenMPRuntime *RT) {
+    return RT->getKind() == RK_HOST;
----------------
ABataev wrote:
> arpith-jacob wrote:
> > This is required to cast to the NVPTX runtime in a static function as follows;
> > 
> > CGOpenMPRuntimeNVPTX &RT = cast<CGOpenMPRuntimeNVPTX>(CGM.getOpenMPRuntime());
> Are you going to make calls to `isa()`, `dyn_cast()` functions? If not, just use `static_cast<>()` instead.
Yes, I will use a static_cast and remove the OpenMPRuntimeKind functionality.

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:972-973
+  auto CastTy = Size <= 4 ? CGM.Int32Ty : CGM.Int64Ty;
+  auto *ElemCast = Bld.CreateSExtOrBitCast(Elem, CastTy);
+  auto *WarpSize = Bld.CreateTruncOrBitCast(getNVPTXWarpSize(CGF), CGM.Int16Ty);
+
----------------
ABataev wrote:
> I'd prefer you to use `CGF.EmitScalarConversion()` rather than Builder casts.
Ok.  I have used CGF.EmitScalarConversion() when I can, for example for WarpSize.  In other cases I want simple bitcasts and truncs so I have used the Builder.

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:975-978
+  llvm::SmallVector<llvm::Value *, 3> Args;
+  Args.push_back(ElemCast);
+  Args.push_back(Offset);
+  Args.push_back(WarpSize);
----------------
ABataev wrote:
> Do you really need a SmallVector<> or you can use just an array here?
Used an array here and other places.

================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:2249
+                                         ? OMPD_parallel_for_simd
+                                         : OMPD_parallel);
       // Emit post-update of the reduction variables if IsLastIter != 0.
----------------
ABataev wrote:
> OMPD_parallel or OMPD_parallel_for?
OMPD_parallel is fine.  I just need to know that it is a 'parallel' type reduction.

================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:2424
     CGF.OMPCancelStack.emitExit(CGF, S.getDirectiveKind(), CodeGen);
-    CGF.EmitOMPReductionClauseFinal(S);
+    CGF.EmitOMPReductionClauseFinal(S, OMPD_parallel);
     // Emit post-update of the reduction variables if IsLastIter != 0.
----------------
ABataev wrote:
> OMPD_parallel_for?
This is the reduction codegen for the 'sections' directive.  So a reduction type of 'OMPD_parallel' works well.

https://reviews.llvm.org/D29506