[Openmp-commits] [PATCH] D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime.
Jonas Hahnfeld via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Sep 10 13:16:07 PDT 2018
Hahnfeld added a comment.
In https://reviews.llvm.org/D51875#1229496, @ABataev wrote:
> In https://reviews.llvm.org/D51875#1229491, @Hahnfeld wrote:
> > I really, really dislike adding even more global buffers. `4096 * 32 * 56` bytes is another 7 MiB that is not usable by applications. What's wrong with using the existing ones?
> > Can you upload the CodeGen patch for reductions somewhere? I thought we need a global scratchpad buffer that is addressable by all teams?
> I really, really dislike the implementation in ibm-devel; the scratchpad solution will never be added to trunk. The existing buffers cannot be reused, as they are allocated only if the full runtime is used.
What's the overhead of initializing it? The whole `libomptarget-nvptx` is already pretty much a mess; see my thread on openmp-dev.
Comment at: libomptarget/deviceRTLs/nvptx/src/option.h:37
-#if __CUDA_ARCH__ >= 600
+#if __CUDA_ARCH__ >= 900
+#define OMP_STATE_COUNT 32
> Hahnfeld wrote:
> > This doesn't exist unless you have information that is not public yet. Volta is `720` at most.
> According to https://docs.nvidia.com/cuda/volta-tuning-guide/index.html, it is 84.
I'm not commenting on `MAX_SM`, but on the value of `__CUDA_ARCH__`. As such, these defines are never active.