[Openmp-commits] [openmp] [Libomptarget] Add a wavefront sync builtin for the AMDGPU implementation (PR #70228)

Joseph Huber via Openmp-commits openmp-commits at lists.llvm.org
Wed Oct 25 10:02:38 PDT 2023


https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/70228

Summary:
While this is technically a no-op on AMDGPU hardware, wherever the user
would add an explicit warp sync on Nvidia hardware we should also tell
the LLVM optimizer that the control flow is convergent, so that it does
not reorder blocks.
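
For readers who want to try the idea outside the DeviceRTL, the following is a
minimal sketch, not the library's code. The names LaneMaskTy, syncWarpSketch
and twoPhaseSketch are hypothetical and exist only for illustration; the one
real call is __builtin_amdgcn_wave_barrier, which is assumed to be available
only when compiling for an AMDGCN target, so it is guarded by the __AMDGCN__
macro and compiles to nothing on the host.

```cpp
#include <cstdint>

// Hypothetical stand-in for __kmpc_impl_lanemask_t (assumption: a 64-bit mask).
using LaneMaskTy = uint64_t;

// Sketch of a wavefront sync wrapper. On AMDGCN targets the builtin emits no
// hardware synchronization; it only marks a convergent point for the
// optimizer. On any other target this function is an empty no-op.
inline void syncWarpSketch(LaneMaskTy /*Mask*/) {
#if defined(__AMDGCN__)
  __builtin_amdgcn_wave_barrier();
#endif
}

// Hypothetical caller with two lane-local phases separated by the sync point.
// This makes no claim about memory ordering between lanes.
inline int twoPhaseSketch(int Value) {
  int Partial = Value * 2;          // phase 1: lane-local work
  syncWarpSketch(~LaneMaskTy(0));   // all lanes are expected to reach this point together
  return Partial + 1;               // phase 2: lane-local work
}
```

On the host the wrapper does nothing, so twoPhaseSketch simply computes
Value * 2 + 1. The point of the sketch is only where the sync call sits
relative to the surrounding work.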


From 2cbbde89df616442c19752d2e910d60302c60181 Mon Sep 17 00:00:00 2001
From: Joseph Huber <jhuber6 at vols.utk.edu>
Date: Wed, 25 Oct 2023 12:01:12 -0500
Subject: [PATCH] [Libomptarget] Add a wavefront sync builtin for the AMDGPU
 implementation

Summary:
While this is technically a no-op for AMDGPU hardware, in cases where
the user would see fit to add an explicit wavefront sync on Nvidia
hardware, we should also inform the LLVM optimizer that this control
flow is convergent so we do not reorder blocks.
---
 openmp/libomptarget/DeviceRTL/src/Synchronization.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
index b9a192f0d84df9a..ad3b1cad4194364 100644
--- a/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
@@ -272,7 +272,9 @@ void fenceSystem(atomic::OrderingTy Ordering) {
 }
 
 void syncWarp(__kmpc_impl_lanemask_t) {
-  // AMDGCN doesn't need to sync threads in a warp
+  // This is a no-op on current AMDGPU hardware but it is used by the optimizer
+  // to enforce convergent behaviour between control flow graphs.
+  __builtin_amdgcn_wave_barrier();
 }
 
 void syncThreads(atomic::OrderingTy Ordering) {


