[PATCH] D29910: [OpenMP] Specialize default schedule on a worksharing loop on the NVPTX device.
Arpith Jacob via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Feb 13 13:47:47 PST 2017
arpith-jacob created this revision.
The default schedule type on a worksharing loop is
implementation-defined according to the OpenMP specification. Currently, the
compiler codegens a doubly nested loop that effectively implements
a schedule of type (static). This is ideal for threads on CPUs.
On NVPTX and other SIMT GPUs, this schedule provides very poor
performance because consecutive threads in a warp access loop arrays
in a non-coalesced manner. To achieve coalescing, and hence good
performance, the best schedule is static with a chunk size of 1.
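To make the coalescing argument concrete, here is a minimal sketch of how the two schedules map a thread and a time step to a loop iteration. The helper names are hypothetical and are not part of clang or the OpenMP runtime. The block size for plain (static) is assumed to be the ceiling of iterations over threads, which is one common choice and not something the specification mandates. Callers are assumed to check that the returned index is below the trip count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only. Each maps (thread id, step)
// to the loop iteration that thread would execute at that step.

// schedule(static) without a chunk size: each thread owns one contiguous
// block of iterations. Block size is assumed to be ceil(numIters / numThreads).
std::size_t staticBlockIteration(std::size_t tid, std::size_t step,
                                 std::size_t numIters, std::size_t numThreads) {
  assert(numThreads > 0);
  std::size_t blockSize = (numIters + numThreads - 1) / numThreads;
  return tid * blockSize + step;
}

// schedule(static, 1): iterations are dealt out round-robin, one at a time,
// so neighbouring threads execute neighbouring iterations at every step.
std::size_t staticChunkOneIteration(std::size_t tid, std::size_t step,
                                    std::size_t numThreads) {
  assert(numThreads > 0);
  return step * numThreads + tid;
}
```

With 1024 iterations and 32 threads, threads 0 and 1 start at iterations 0 and 32 under (static), but at iterations 0 and 1 under (static,1). If iteration i touches array element i, only the second pattern lets a warp read adjacent elements in the same step.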
This patch adds support for target devices to select the best default
schedule depending on their architecture. It modifies loop codegen
to generate optimized code for (static,1) on the NVPTX device, i.e.,
by using a single loop instead of a doubly nested loop as is
currently the case.
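The difference in loop shape can be sketched in plain C++ as follows. This only illustrates the two forms described above and is not the code clang actually emits. The function names are hypothetical, and the chunked outer/inner structure is an assumed generic shape for a static schedule. Each function returns the iterations one thread would visit, so the two forms can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Doubly nested form (assumed generic shape for a chunked static schedule):
// the outer loop walks the chunks owned by this thread, the inner loop walks
// the iterations inside one chunk.
std::vector<std::size_t> nestedStaticChunked(std::size_t tid,
                                             std::size_t numThreads,
                                             std::size_t numIters,
                                             std::size_t chunk) {
  assert(numThreads > 0 && chunk > 0);
  std::vector<std::size_t> visited;
  for (std::size_t lb = tid * chunk; lb < numIters; lb += numThreads * chunk) {
    std::size_t ub = lb + chunk < numIters ? lb + chunk : numIters;
    for (std::size_t i = lb; i < ub; ++i)
      visited.push_back(i);
  }
  return visited;
}

// Single-loop form for (static,1): with a chunk size of 1 the inner loop
// always runs exactly once, so it collapses into a strided loop.
std::vector<std::size_t> singleLoopStaticOne(std::size_t tid,
                                             std::size_t numThreads,
                                             std::size_t numIters) {
  assert(numThreads > 0);
  std::vector<std::size_t> visited;
  for (std::size_t i = tid; i < numIters; i += numThreads)
    visited.push_back(i);
  return visited;
}
```

For thread 1 of 4 over 10 iterations, both forms visit iterations 1, 5 and 9. The single loop avoids the per-chunk bound computation and the inner loop control.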
https://reviews.llvm.org/D29910
Files:
include/clang/AST/StmtOpenMP.h
include/clang/Basic/OpenMPKinds.h
lib/AST/StmtOpenMP.cpp
lib/Basic/OpenMPKinds.cpp
lib/CodeGen/CGStmtOpenMP.cpp
lib/Sema/SemaOpenMP.cpp
test/OpenMP/nvptx_coalesced_scheduling_codegen.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D29910.88252.patch
Type: text/x-patch
Size: 43540 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20170213/6fbe0f74/attachment-0001.bin>