[Openmp-commits] [PATCH] D51687: [libomptarget][NVPTX] Fix number of threads in parallel

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Sep 5 08:17:28 PDT 2018

Hahnfeld added inline comments.

Comment at: libomptarget/deviceRTLs/nvptx/src/parallel.cu:214-223
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
+  // On Volta and newer architectures we require that all lanes in
+  // a warp participate in the parallel region.  Round down to a
+  // multiple of WARPSIZE since it is legal to do so in OpenMP.
+  if (NumThreads < WARPSIZE) {
+    NumThreads = 1;
+  } else {
I'm not really sure why this is needed. It was added in https://github.com/clang-ykt/openmp/commit/588e8cbc751299c9f40067dc4d36812e8e49d2bd#diff-6fa63679541dd3fa12162559c1aeac42R279 and I can't find a technical explanation for it. Since I can only test on Pascal, not Volta, I've copied the code unchanged for now, but it would be great to know the background and add an explanation.

Repository: rOMP OpenMP