[Openmp-commits] [PATCH] D51687: [libomptarget][NVPTX] Fix number of threads in parallel

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Sep 5 08:17:28 PDT 2018


Hahnfeld added inline comments.


================
Comment at: libomptarget/deviceRTLs/nvptx/src/parallel.cu:214-223
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
+  // On Volta and newer architectures we require that all lanes in
+  // a warp participate in the parallel region.  Round down to a
+  // multiple of WARPSIZE since it is legal to do so in OpenMP.
+  if (NumThreads < WARPSIZE) {
+    NumThreads = 1;
+  } else {
----------------
I'm not really sure why this is needed. It was added in https://github.com/clang-ykt/openmp/commit/588e8cbc751299c9f40067dc4d36812e8e49d2bd#diff-6fa63679541dd3fa12162559c1aeac42R279 and I can't find a technical explanation for it. Because I can't test on Volta (only Pascal), I've copied the code for now, but it would be great to know the background and add an explanation.
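
For reference, the rounding described in the quoted comment could be sketched as below. This is only an illustration under assumptions: the hunk above is cut off at the else branch, so this is not the code from the patch. The names `roundDownToWarp` and `kWarpSize` are hypothetical, and 32 is assumed as the value of WARPSIZE.

```cpp
// Hypothetical sketch, NOT the code from the patch: the quoted hunk ends at
// "} else {", so the else branch here is inferred from the source comment
// ("Round down to a multiple of WARPSIZE").
constexpr int kWarpSize = 32; // assumed value of WARPSIZE on NVPTX

inline int roundDownToWarp(int numThreads) {
  if (numThreads < kWarpSize) {
    // Fewer threads than one full warp: fall back to a single thread,
    // as in the visible part of the hunk.
    return 1;
  }
  // Largest multiple of kWarpSize that does not exceed numThreads.
  return numThreads - numThreads % kWarpSize;
}
```

Under these assumptions a request for 100 threads would yield 96 (three full warps), and any request below 32 would yield 1. It does not answer the question of why Volta needs full warps in the first place.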


Repository:
  rOMP OpenMP

https://reviews.llvm.org/D51687




