[Openmp-commits] [openmp] 10de217 - [libomptarget][amdgpu] Fix truncation error for partial wavefront
Jon Chesterfield via Openmp-commits
openmp-commits at lists.llvm.org
Thu May 13 09:32:12 PDT 2021
Author: Jon Chesterfield
Date: 2021-05-13T17:31:57+01:00
New Revision: 10de21720989166a6b51cbf48b21efacbb913f23
URL: https://github.com/llvm/llvm-project/commit/10de21720989166a6b51cbf48b21efacbb913f23
DIFF: https://github.com/llvm/llvm-project/commit/10de21720989166a6b51cbf48b21efacbb913f23.diff
LOG: [libomptarget][amdgpu] Fix truncation error for partial wavefront
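The partial barrier implementation involves one wavefront resetting and N-1
waiting. The wavefront count was computed with a truncating integer division,
which leaves out a trailing partial wavefront. This change rounds the count up,
which future-proofs against launching with a number of threads that is not a
multiple of the wavefront size.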
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102407
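
The truncation described in the log message can be illustrated with a minimal
host-side sketch. The names kWarpSize, waves_truncated and waves_rounded_up are
hypothetical and do not appear in the runtime; the value 64 is an assumed
stand-in for WARPSIZE, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the runtime's WARPSIZE macro (assumed value).
constexpr uint32_t kWarpSize = 64;

// Computation before the change: integer division truncates, so a trailing
// partial wavefront is not counted.
constexpr uint32_t waves_truncated(uint32_t num_threads) {
  return num_threads / kWarpSize;
}

// Computation after the change: round up, so a partial wavefront counts as
// one full wave.
constexpr uint32_t waves_rounded_up(uint32_t num_threads) {
  return (num_threads + kWarpSize - 1) / kWarpSize;
}

// 100 threads span one full wavefront plus a partial one.
static_assert(waves_truncated(100) == 1, "partial wavefront is dropped");
static_assert(waves_rounded_up(100) == 2, "partial wavefront is counted");

// For an exact multiple of the wavefront size both forms agree.
static_assert(waves_truncated(128) == waves_rounded_up(128), "same result");
```

Under these assumptions the two forms differ only when the thread count is not
a multiple of the wavefront size, which is the case the commit guards against.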
Added:
Modified:
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
Removed:
################################################################################
diff --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
index 4c99a096916a3..a9663a80b83ce 100644
--- a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
+++ b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
@@ -56,7 +56,7 @@ static void pteam_mem_barrier(uint32_t num_threads, uint32_t * barrier_state)
 {
   __atomic_thread_fence(__ATOMIC_ACQUIRE);
 
-  uint32_t num_waves = num_threads / WARPSIZE;
+  uint32_t num_waves = (num_threads + WARPSIZE - 1) / WARPSIZE;
 
   // Partial barrier implementation for amdgcn.
   // Uses two 16 bit unsigned counters. One for the number of waves to have