[Openmp-commits] [openmp] 10de217 - [libomptarget][amdgpu] Fix truncation error for partial wavefront

Jon Chesterfield via Openmp-commits openmp-commits at lists.llvm.org
Thu May 13 09:32:12 PDT 2021


Author: Jon Chesterfield
Date: 2021-05-13T17:31:57+01:00
New Revision: 10de21720989166a6b51cbf48b21efacbb913f23

URL: https://github.com/llvm/llvm-project/commit/10de21720989166a6b51cbf48b21efacbb913f23
DIFF: https://github.com/llvm/llvm-project/commit/10de21720989166a6b51cbf48b21efacbb913f23.diff

LOG: [libomptarget][amdgpu] Fix truncation error for partial wavefront

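The partial barrier implementation involves one wavefront resetting and N-1
waiting. The wavefront count was computed as num_threads / WARPSIZE, which
truncates and so leaves a trailing partial wavefront uncounted. Rounding the
division up future-proofs against launching with a number of threads that is
not a multiple of the wavefront size.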
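
As a rough illustration of the arithmetic (not the runtime's actual code), the
sketch below compares the truncating and the rounded-up wave count. The
constant kWaveSize = 64 and both function names are hypothetical stand-ins
chosen for this example; the real code uses WARPSIZE from the device runtime.

```cpp
#include <cstdint>

// Assumption for illustration only: a wavefront of 64 threads.
// The device runtime uses its own WARPSIZE definition instead.
constexpr uint32_t kWaveSize = 64;

// Old behaviour: integer division truncates, so a trailing partial
// wavefront is not counted.
constexpr uint32_t waves_truncated(uint32_t num_threads) {
  return num_threads / kWaveSize;
}

// New behaviour: add (kWaveSize - 1) before dividing so any remainder
// rounds the result up to the next whole wavefront.
constexpr uint32_t waves_rounded_up(uint32_t num_threads) {
  return (num_threads + kWaveSize - 1) / kWaveSize;
}

// 128 threads fill exactly two wavefronts: both forms agree.
static_assert(waves_truncated(128) == 2, "exact multiple, truncating");
static_assert(waves_rounded_up(128) == 2, "exact multiple, rounded up");

// 100 threads need one full and one partial wavefront: only the
// rounded-up form counts the partial one.
static_assert(waves_truncated(100) == 1, "partial wavefront dropped");
static_assert(waves_rounded_up(100) == 2, "partial wavefront counted");
```

When the thread count is an exact multiple of the wavefront size the two forms
agree, so the change only affects launches with a partial wavefront.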

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102407

Added: 
    

Modified: 
    openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip

Removed: 
    


################################################################################
diff --git a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
index 4c99a096916a3..a9663a80b83ce 100644
--- a/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
+++ b/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip
@@ -56,7 +56,7 @@ static void pteam_mem_barrier(uint32_t num_threads, uint32_t * barrier_state)
 {
   __atomic_thread_fence(__ATOMIC_ACQUIRE);
 
-  uint32_t num_waves = num_threads / WARPSIZE;
+  uint32_t num_waves = (num_threads + WARPSIZE - 1) / WARPSIZE;
 
   // Partial barrier implementation for amdgcn.
   // Uses two 16 bit unsigned counters. One for the number of waves to have


        

