[PATCH] D29910: [OpenMP] Specialize default schedule on a worksharing loop on the NVPTX device.

Mon Feb 13 13:47:47 PST 2017

arpith-jacob created this revision.

The default schedule type on a worksharing loop is implementation
defined according to the OpenMP specifications.  Currently, the
compiler codegens a doubly nested loop that effectively implements
a schedule of type (static).  This is ideal for threads on CPUs.

On the NVPTX and other SIMT GPUs, this schedule provides very poor
performance because consecutive threads in a warp access loop arrays
in a non-coalesced manner.  That is, to achieve coalescing, and good
performance, the best schedule is static with a chunk size of 1.

This patch adds support for target devices to select the best default
schedule depending on their architecture.  It modifies loop codegen
to generate optimized code for (static,1) on the NVPTX device, i.e.,
by using a single loop instead of a doubly nested loop as is
currently the case.

https://reviews.llvm.org/D29910

Files:
  include/clang/AST/StmtOpenMP.h
  include/clang/Basic/OpenMPKinds.h
  lib/AST/StmtOpenMP.cpp
  lib/Basic/OpenMPKinds.cpp
  lib/CodeGen/CGStmtOpenMP.cpp
  lib/Sema/SemaOpenMP.cpp
  test/OpenMP/nvptx_coalesced_scheduling_codegen.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D29910.88252.patch
Type: text/x-patch
Size: 43540 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20170213/6fbe0f74/attachment-0001.bin>