[Openmp-commits] [PATCH] D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime.

Mon Sep 10 13:09:37 PDT 2018

ABataev added a comment.

In https://reviews.llvm.org/D51875#1229491, @Hahnfeld wrote:

> I really, really dislike adding even more global buffers. `4096 * 32 * 56` is another 7 MiB that is not usable for applications. What's wrong with using the existing ones?
>
> Can you upload the CodeGen patch for reductions somewhere? I thought we needed a global scratchpad buffer that is addressable for all teams?

I really, really dislike the implementation in ibm-devel; the scratchpad solution will never be added to trunk. The existing buffers cannot be reused because they are allocated only if the full runtime is used.
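
For reference, the 7 MiB figure quoted above can be checked with a small compile-time sketch. It assumes the product `4096 * 32 * 56` is a size in bytes; the constant names are hypothetical labels, since the thread does not say what each factor stands for.

```cpp
#include <cstddef>

// The three factors are copied from the review comment. The names are
// hypothetical: the thread does not state what each factor represents.
constexpr std::size_t kFactorA = 4096;
constexpr std::size_t kFactorB = 32;
constexpr std::size_t kFactorC = 56;

constexpr std::size_t kBytesPerMiB = 1024 * 1024;

// Total size of the additional buffer, assuming the product is in bytes.
constexpr std::size_t buffer_bytes() {
  return kFactorA * kFactorB * kFactorC;
}

// 4096 * 32 * 56 = 7,340,032 bytes, which is exactly 7 MiB.
static_assert(buffer_bytes() == 7 * kBytesPerMiB,
              "buffer size should be exactly 7 MiB");
```

If any factor changes, the `static_assert` fails at compile time, which makes the memory cost of the extra buffer explicit.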

================
Comment at: libomptarget/deviceRTLs/nvptx/src/option.h:37
 // memory.
-#if __CUDA_ARCH__ >= 600
+#if __CUDA_ARCH__ >= 900
+#define OMP_STATE_COUNT 32
----------------
Hahnfeld wrote:
> This doesn't exist unless you have information that is not public yet. Volta is `720` at most.
According to https://docs.nvidia.com/cuda/volta-tuning-guide/index.html, it is 84.

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D51875