[PATCH] D29758: [OpenMP] Parallel reduction on the NVPTX device.
Arpith Jacob via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Feb 9 06:48:58 PST 2017
arpith-jacob created this revision.
Herald added a subscriber: jholewinski.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
https://reviews.llvm.org/D29758
Files:
lib/CodeGen/CGOpenMPRuntime.cpp
lib/CodeGen/CGOpenMPRuntime.h
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
lib/CodeGen/CGOpenMPRuntimeNVPTX.h
lib/CodeGen/CGStmtOpenMP.cpp
lib/CodeGen/CodeGenFunction.h
test/OpenMP/nvptx_target_parallel_reduction_codegen.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D29758.87799.patch
Type: text/x-patch
Size: 101312 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20170209/979b5728/attachment-0001.bin>
More information about the cfe-commits
mailing list