[PATCH] D29758: [OpenMP] Parallel reduction on the NVPTX device.

Fri Feb 10 07:41:50 PST 2017

ABataev added inline comments.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.h:956-962
   virtual void emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
                              ArrayRef<const Expr *> Privates,
                              ArrayRef<const Expr *> LHSExprs,
                              ArrayRef<const Expr *> RHSExprs,
                              ArrayRef<const Expr *> ReductionOps,
-                             bool WithNowait, bool SimpleReduction);
+                             bool WithNowait, bool SimpleReduction,
+                             OpenMPDirectiveKind ReductionKind);
----------------
The number of parameters is getting too large. Would it be better to aggregate them into a struct/class?
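
A rough sketch of what I mean, not a concrete request for this exact shape. `ReductionOptions` and `emitReductionSketch` are hypothetical names, and the enum is a stand-in for clang's `OpenMPDirectiveKind` so that the snippet compiles on its own:

```cpp
#include <cassert>

// Stand-in for clang's OpenMPDirectiveKind so the sketch is self-contained.
enum OpenMPDirectiveKind { OMPD_parallel, OMPD_for, OMPD_unknown };

// Hypothetical aggregate of the trailing emitReduction arguments. The name
// and layout are assumptions, not part of the patch.
struct ReductionOptions {
  bool WithNowait;
  bool SimpleReduction;
  OpenMPDirectiveKind ReductionKind;
};

// Hypothetical stand-in for emitReduction. With the options bundled, adding
// another option later would not change this signature. It does no codegen;
// it only checks and returns the directive kind it was given.
OpenMPDirectiveKind emitReductionSketch(const ReductionOptions &Options) {
  assert(Options.ReductionKind != OMPD_unknown && "directive kind must be set");
  return Options.ReductionKind;
}
```

A call site would then build one `ReductionOptions` value and pass it along with the expression arrays, instead of three positional arguments.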

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:118-133
+// GPU Configuration:  This information can be derived from cuda registers,
+// however, providing compile time constants helps generate more efficient
+// code.  For all practical purposes this is fine because the configuration
+// is the same for all known NVPTX architectures.
+enum MachineConfiguration : unsigned {
+  WarpSize = 32,
+  // Number of bits required to represent a lane identifier, which is
----------------
It's better to use `///`-style comments here.

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:653-675
+    /// Build int32_t __kmpc_shuffle_int32(int32_t element,
+    /// int16_t lane_offset, int16_t warp_size);
+    llvm::Type *TypeParams[] = {CGM.Int32Ty, CGM.Int16Ty, CGM.Int16Ty};
+    llvm::FunctionType *FnTy =
+        llvm::FunctionType::get(CGM.Int32Ty, TypeParams, /*isVarArg*/ false);
+    RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_shuffle_int32");
+    break;
----------------
Use `//` instead of `///` here, since this comment is inside a function body rather than on a declaration.

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:963-965
+enum CopyAction : unsigned {
+  RemoteLaneToThread,
+  ThreadCopy,
----------------
Could you add comments describing these enumerators?
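
Something along these lines would do, reusing the descriptions from the comment block that follows the enum in the patch. This is only a sketch: it lists just the two enumerators visible in this hunk, and `copyActionName` is a hypothetical helper added so the snippet can be exercised on its own:

```cpp
#include <cassert>
#include <string>

/// Direction in which a Reduce list is copied.
/// Only the two enumerators visible in the quoted hunk are listed here.
enum CopyAction : unsigned {
  /// Copy over a Reduce list from a remote lane in the warp using shuffle
  /// instructions.
  RemoteLaneToThread,
  /// Make a copy of a Reduce list on the thread's stack.
  ThreadCopy,
};

/// Hypothetical helper (not in the patch): returns the enumerator's name.
std::string copyActionName(CopyAction Action) {
  switch (Action) {
  case RemoteLaneToThread:
    return "RemoteLaneToThread";
  case ThreadCopy:
    return "ThreadCopy";
  }
  assert(false && "unknown CopyAction");
  return "";
}
```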

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:969-974
+// Emit instructions to copy a Reduce list, which contains partially
+// aggregated values, in the specified direction.
+//
+// RemoteLaneToThread: Copy over a Reduce list from a remote lane in
+//   the warp using shuffle instructions.
+// ThreadCopy: Make a copy of a Reduce list on the thread's stack.
----------------
Use `///`

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:1272
+
+// Emit a helper that reduces data across two OpenMP threads (lanes)
+// in the same warp.  It uses shuffle instructions to copy over data from
----------------
`///` style here

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:1488
+
+//
+// Design of OpenMP reductions on the GPU
----------------
`///` here

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.h:245

-public:
+  /// \brief Emit a code for reduction clause.
+  ///
----------------
No \brief

================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.h:263
+
+  /// \brief Returns specified OpenMP runtime function for the current OpenMP
+  /// implementation.  Specialized for the NVPTX device.
----------------
No \brief

https://reviews.llvm.org/D29758