[Openmp-commits] [PATCH] D51687: [libomptarget][NVPTX] Fix number of threads in parallel
Jonas Hahnfeld via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Sep 5 08:17:28 PDT 2018
Hahnfeld added inline comments.
================
Comment at: libomptarget/deviceRTLs/nvptx/src/parallel.cu:214-223
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
+ // On Volta and newer architectures we require that all lanes in
+ // a warp participate in the parallel region. Round down to a
+ // multiple of WARPSIZE since it is legal to do so in OpenMP.
+ if (NumThreads < WARPSIZE) {
+ NumThreads = 1;
+ } else {
----------------
I'm not really sure why this is needed. It was added in https://github.com/clang-ykt/openmp/commit/588e8cbc751299c9f40067dc4d36812e8e49d2bd#diff-6fa63679541dd3fa12162559c1aeac42R279 and I can't find a technical explanation for it. Because I can't test on Volta (only on Pascal), I've copied the code for now, but it would be great to know the background and add an explanation.
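For readers following the thread, here is a minimal sketch of what the quoted Volta-only branch appears to do to NumThreads. The excerpt above is cut off at `} else {`, so the else branch here is an assumption: it rounds down to the nearest multiple of the warp size, as the code comment describes. The names kWarpSize and roundDownNumThreads are hypothetical stand-ins, not identifiers from parallel.cu. The sketch is plain C++ so it can be checked on the host. The `__CUDA_ARCH__ >= 700` guard is omitted because it only decides whether this clamping runs at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's WARPSIZE (32 on NVIDIA GPUs).
constexpr uint16_t kWarpSize = 32;

// Sketch of the Volta+ clamping quoted above. ASSUMPTION: the truncated
// else branch rounds NumThreads down to a multiple of the warp size.
// A request smaller than one full warp falls back to a single thread.
inline uint16_t roundDownNumThreads(uint16_t numThreads) {
  if (numThreads < kWarpSize) {
    return 1;
  }
  // kWarpSize is a power of two, so clearing the low bits rounds down.
  return static_cast<uint16_t>(numThreads & ~(kWarpSize - 1));
}
```

Under this reading, a request for 100 threads would run with 96 (three full warps), and a request for 20 would run with 1. Reducing the thread count is permitted because OpenMP lets the implementation deliver fewer threads than num_threads requests.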
Repository:
rOMP OpenMP
https://reviews.llvm.org/D51687