[PATCH] D29758: [OpenMP] Parallel reduction on the NVPTX device.
Alexey Bataev via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Feb 10 07:41:50 PST 2017
ABataev added inline comments.
================
Comment at: lib/CodeGen/CGOpenMPRuntime.h:956-962
virtual void emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
ArrayRef<const Expr *> Privates,
ArrayRef<const Expr *> LHSExprs,
ArrayRef<const Expr *> RHSExprs,
ArrayRef<const Expr *> ReductionOps,
- bool WithNowait, bool SimpleReduction);
+ bool WithNowait, bool SimpleReduction,
+ OpenMPDirectiveKind ReductionKind);
----------------
Number of parameters is getting too big, maybe it is better to aggregate them into a struct/class?
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:118-133
+// GPU Configuration: This information can be derived from cuda registers,
+// however, providing compile time constants helps generate more efficient
+// code. For all practical purposes this is fine because the configuration
+// is the same for all known NVPTX architectures.
+enum MachineConfiguration : unsigned {
+ WarpSize = 32,
+ // Number of bits required to represent a lane identifier, which is
----------------
It's better to use `///` style of comments here
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:653-675
+ /// Build int32_t __kmpc_shuffle_int32(int32_t element,
+ /// int16_t lane_offset, int16_t warp_size);
+ llvm::Type *TypeParams[] = {CGM.Int32Ty, CGM.Int16Ty, CGM.Int16Ty};
+ llvm::FunctionType *FnTy =
+ llvm::FunctionType::get(CGM.Int32Ty, TypeParams, /*isVarArg*/ false);
+ RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_shuffle_int32");
+ break;
----------------
Use `//` instead of `///`
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:963-965
+enum CopyAction : unsigned {
+ RemoteLaneToThread,
+ ThreadCopy,
----------------
Comments here?
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:969-974
+// Emit instructions to copy a Reduce list, which contains partially
+// aggregated values, in the specified direction.
+//
+// RemoteLaneToThread: Copy over a Reduce list from a remote lane in
+// the warp using shuffle instructions.
+// ThreadCopy: Make a copy of a Reduce list on the thread's stack.
----------------
Use `///`
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:1272
+
+// Emit a helper that reduces data across two OpenMP threads (lanes)
+// in the same warp. It uses shuffle instructions to copy over data from
----------------
`///` style here
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:1488
+
+//
+// Design of OpenMP reductions on the GPU
----------------
`///` here
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.h:245
-public:
+ /// \brief Emit a code for reduction clause.
+ ///
----------------
Bo \brief
================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.h:263
+
+ /// \brief Returns specified OpenMP runtime function for the current OpenMP
+ /// implementation. Specialized for the NVPTX device.
----------------
No \brief
https://reviews.llvm.org/D29758
More information about the cfe-commits
mailing list