[Openmp-commits] [PATCH] D32321: [OpenMP] Optimized default kernel launch parameters in CUDA plugin

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Apr 20 23:15:54 PDT 2017


Hahnfeld added a comment.

Does this change actually reduce runtime? The last time I tested clang-ykt on Pascal GPUs, 1024 threads per block really were the best choice...



================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:594-598
   // Add master warp if necessary
   if (KernelInfo->ExecutionMode == GENERIC) {
     cudaThreadsPerBlock += DeviceInfo.WarpSize[device_id];
     DP("Adding master warp: +%d threads\n", DeviceInfo.WarpSize[device_id]);
   }
----------------
Just move this code under `if (thread_limit > 0)`?
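To make the suggestion concrete, here is a minimal sketch of one possible reading: the master warp is added only on top of a user-supplied `thread_limit`, and the default block size is used unchanged. The function `computeThreadsPerBlock`, the `ExecMode` enum, and the `defaultThreads`/`maxThreads` parameters are hypothetical stand-ins, not the plugin's actual code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the plugin's execution-mode flag.
enum class ExecMode { SPMD, GENERIC };

// Hypothetical helper: master-warp handling moved under the thread_limit branch.
int computeThreadsPerBlock(int thread_limit, ExecMode mode, int warpSize,
                           int defaultThreads, int maxThreads) {
  assert(warpSize > 0 && maxThreads > 0);
  int threads = defaultThreads;  // used as-is when no limit was requested
  if (thread_limit > 0) {
    threads = thread_limit;
    // Add master warp if necessary (generic mode only).
    if (mode == ExecMode::GENERIC)
      threads += warpSize;
  }
  // Never exceed the per-block device limit.
  return std::min(threads, maxThreads);
}
```

With this layout, a generic-mode kernel with `thread_limit(64)` and a 32-thread warp would get 96 threads per block, while the no-limit path keeps the default.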


================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:622-624
+      } else {
+        cudaBlocksPerGrid = loop_tripcount;
+      }
----------------
So each block executes one iteration? What is left for the threads in each block?
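To illustrate the question, a small sketch contrasting the quoted choice (one loop iteration per block) with sizing the grid so that each thread gets one iteration. Both function names and the `maxBlocks` cap are hypothetical and only serve the comparison; they are not taken from the plugin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// As quoted in the patch: one loop iteration per block.
int64_t blocksOneIterationPerBlock(int64_t loop_tripcount) {
  return loop_tripcount;
}

// Alternative: one iteration per thread, rounding up so a final
// partial block covers the remainder.
int64_t blocksOneIterationPerThread(int64_t loop_tripcount,
                                    int64_t threadsPerBlock,
                                    int64_t maxBlocks) {
  assert(threadsPerBlock > 0 && maxBlocks > 0);
  int64_t blocks = (loop_tripcount + threadsPerBlock - 1) / threadsPerBlock;
  // Launch at least one block and cap at the grid limit; if capped,
  // the kernel has to loop so each thread handles several iterations.
  return std::max<int64_t>(1, std::min(blocks, maxBlocks));
}
```

For 1000 iterations and 128 threads per block, the first variant launches 1000 blocks while the second launches 8, so every thread in a block has an iteration to execute.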


Repository:
  rL LLVM

https://reviews.llvm.org/D32321