[Openmp-commits] [PATCH] D32321: [OpenMP] Optimized default kernel launch parameters in CUDA plugin
Jonas Hahnfeld via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Apr 20 23:15:54 PDT 2017
Hahnfeld added a comment.
Does this change result in a lower runtime? The last time I tested clang-ykt on Pascal GPUs, 1024 threads per block were really the best choice...
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:594-598
// Add master warp if necessary
if (KernelInfo->ExecutionMode == GENERIC) {
  cudaThreadsPerBlock += DeviceInfo.WarpSize[device_id];
  DP("Adding master warp: +%d threads\n", DeviceInfo.WarpSize[device_id]);
}
----------------
Just move this code under `if (thread_limit > 0)`?
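A minimal sketch of what the suggestion could look like, assuming the surrounding code picks either the user's `thread_limit` or a per-device default and then clamps to the device maximum. `ComputeThreadsPerBlock`, `ExecMode`, and all parameters are hypothetical stand-ins for `KernelInfo->ExecutionMode`, `DeviceInfo.WarpSize[device_id]` and friends, not the plugin's actual API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the plugin's execution-mode flag.
enum class ExecMode { SPMD, GENERIC };

// Hypothetical helper: the master warp is added only on the explicit
// thread_limit path, as suggested above. The final clamp to the device
// limit is an assumption about the surrounding code.
int ComputeThreadsPerBlock(int thread_limit, ExecMode mode,
                           int default_threads, int warp_size,
                           int max_threads_per_block) {
  assert(warp_size > 0 && max_threads_per_block > 0);
  int threads;
  if (thread_limit > 0) {
    threads = thread_limit;
    // Add master warp if necessary, but only for a user-requested limit.
    if (mode == ExecMode::GENERIC) {
      threads += warp_size;
    }
  } else {
    // Default path: assumed to already budget for the master warp.
    threads = default_threads;
  }
  return std::min(threads, max_threads_per_block);
}
```

For example, with a warp size of 32, `ComputeThreadsPerBlock(128, ExecMode::GENERIC, 1024, 32, 1024)` yields 160, while a call with `thread_limit == 0` keeps the default of 1024 instead of growing to 1056 and then being clamped.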
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:622-624
+ } else {
+   cudaBlocksPerGrid = loop_tripcount;
+ }
----------------
So each block executes one iteration? What is left for the threads in each block?
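To make the concern concrete, here is a hedged back-of-the-envelope sketch. It assumes each GPU thread handles at most one loop iteration and ignores grid-dimension limits. It compares one block per iteration (as in the quoted hunk) with a common alternative that is not taken from the patch: sizing the grid as ceil(tripcount / threads per block). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Option from the quoted diff: one block per loop iteration.
int64_t BlocksOnePerIteration(int64_t loop_tripcount) {
  assert(loop_tripcount >= 0);
  return loop_tripcount;
}

// Alternative (an assumption, not the patch): cover the iterations with
// all threads of each block, i.e. ceil(tripcount / threads_per_block).
int64_t BlocksCoveringTripcount(int64_t loop_tripcount, int threads_per_block) {
  assert(loop_tripcount >= 0 && threads_per_block > 0);
  return (loop_tripcount + threads_per_block - 1) / threads_per_block;
}

// Threads left without an iteration, assuming at most one iteration
// per thread.
int64_t IdleThreads(int64_t loop_tripcount, int64_t blocks, int threads_per_block) {
  int64_t total_threads = blocks * threads_per_block;
  return total_threads > loop_tripcount ? total_threads - loop_tripcount : 0;
}
```

With 1000 iterations and 128 threads per block, one block per iteration launches 128000 threads and leaves 127000 of them idle, whereas 8 blocks cover the loop with only 24 idle threads.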
Repository:
rL LLVM
https://reviews.llvm.org/D32321