[all-commits] [llvm/llvm-project] 95a25e: [OpenMP][FIX] Do not use TBAA in type punning redu...
Johannes Doerfert via All-commits
all-commits at lists.llvm.org
Sun Aug 16 12:40:45 PDT 2020
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: 95a25e4c3203f35e9f57f9fac620b4a21bffd6e1
https://github.com/llvm/llvm-project/commit/95a25e4c3203f35e9f57f9fac620b4a21bffd6e1
Author: Johannes Doerfert <johannes at jdoerfert.de>
Date: 2020-08-16 (Sun, 16 Aug 2020)
Changed paths:
M clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
A clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
Log Message:
-----------
[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
When we implement OpenMP GPU reductions we rely heavily on type punning
during the shuffle and reduce operations. This is not always compatible
with the language rules on aliasing. Until now we emitted TBAA metadata,
which later allowed the optimizer to remove parts of the reduction code
because the accesses and the initialization were "known to not alias".
With this patch we avoid emitting TBAA in this step, hopefully for all
accesses that need it.
Verified on the reproducer of PR46156 and on QMCPACK.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D86037
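The aliasing problem the commit describes can be illustrated with a small, hypothetical C++ sketch (this is not the generated IR or the actual codegen, just the pattern): GPU reductions shuffle values as 32-bit words, so a wider value such as a `double` is reinterpreted as two 32-bit halves. A pointer cast like `*(uint32_t *)&d` violates strict aliasing, and TBAA-style "these types do not alias" metadata lets the optimizer delete such accesses; `memcpy`-based punning, shown here, keeps the reinterpretation well defined.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative only: split a double into two 32-bit lanes (as a warp
// shuffle would exchange them) and reassemble it afterwards. Using
// std::memcpy instead of a pointer cast avoids the strict-aliasing
// trap that TBAA metadata would otherwise exploit.
inline double shuffle_roundtrip(double d) {
  uint32_t halves[2];
  std::memcpy(halves, &d, sizeof(d));     // reinterpret as two 32-bit words
  // ... each half would be exchanged via a warp shuffle here ...
  double out;
  std::memcpy(&out, halves, sizeof(out)); // reassemble after the shuffle
  return out;
}
```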
Commit: aa27cfc1e7d7456325e951a4ba3ced405027f7d0
https://github.com/llvm/llvm-project/commit/aa27cfc1e7d7456325e951a4ba3ced405027f7d0
Author: Johannes Doerfert <johannes at jdoerfert.de>
Date: 2020-08-16 (Sun, 16 Aug 2020)
Changed paths:
M openmp/libomptarget/plugins/cuda/src/rtl.cpp
Log Message:
-----------
[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D86038
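The caching pattern described above can be sketched as follows. The names here are illustrative, not the plugin's actual API; `queryMaxThreadsPerBlock` stands in for a `cuFuncGetAttribute(CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, ...)` call so the sketch is self-contained:

```cpp
// Hypothetical sketch of the caching pattern: query the attribute once,
// store it in the per-kernel info struct, and reuse it on every launch.
struct KernelInfo {
  int MaxThreadsPerBlock = -1; // cached on first use
};

int QueryCallCount = 0; // counts the (expensive) attribute queries

// Stand-in for cuFuncGetAttribute with
// CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.
int queryMaxThreadsPerBlock() {
  ++QueryCallCount;
  return 1024;
}

int getMaxThreadsPerBlock(KernelInfo &KI) {
  if (KI.MaxThreadsPerBlock < 0)            // first launch: query and cache
    KI.MaxThreadsPerBlock = queryMaxThreadsPerBlock();
  return KI.MaxThreadsPerBlock;             // later launches: cached value
}
```

A real implementation would also check the driver-API return code and propagate a failure instead of caching a bogus value, matching the commit's note that errors are no longer ignored.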
Commit: 5272d29e2cb7c967c3016fa285f14edc7515d9bf
https://github.com/llvm/llvm-project/commit/5272d29e2cb7c967c3016fa285f14edc7515d9bf
Author: Johannes Doerfert <johannes at jdoerfert.de>
Date: 2020-08-16 (Sun, 16 Aug 2020)
Changed paths:
M openmp/libomptarget/plugins/cuda/src/rtl.cpp
Log Message:
-----------
[OpenMP][CUDA] Keep one kernel list per device, not globally.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D86039
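The restructuring in this commit, one kernel list per device instead of a single global list, can be sketched like this (again a hypothetical illustration, not the plugin's actual data structures):

```cpp
#include <list>
#include <string>
#include <vector>

// Illustrative only: each device gets its own kernel list, indexed by
// device id, rather than all devices sharing one global std::list.
struct KernelTy {
  std::string Name;
};

struct DeviceRTL {
  std::vector<std::list<KernelTy>> KernelsList; // one list per device

  void init(int NumDevices) { KernelsList.resize(NumDevices); }
  void addKernel(int DeviceId, KernelTy K) {
    KernelsList[DeviceId].push_back(std::move(K));
  }
  std::size_t numKernels(int DeviceId) const {
    return KernelsList[DeviceId].size();
  }
};
```

Keeping the lists per device avoids one device's kernel registration interfering with another's and removes the need to filter a shared list by device.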
Compare: https://github.com/llvm/llvm-project/compare/5f45f91de419...5272d29e2cb7