[Openmp-commits] [openmp] 3e72f02 - [Offload][OpenMP][libdevice] Make check to enter state machine architecture dependent (#188144)

via Openmp-commits openmp-commits at lists.llvm.org
Wed Mar 25 08:22:27 PDT 2026


Author: Alex Duran
Date: 2026-03-25T16:22:22+01:00
New Revision: 3e72f02e20b99ba1fa0dce2982d89aaa0ef6ab26

URL: https://github.com/llvm/llvm-project/commit/3e72f02e20b99ba1fa0dce2982d89aaa0ef6ab26
DIFF: https://github.com/llvm/llvm-project/commit/3e72f02e20b99ba1fa0dce2982d89aaa0ef6ab26.diff

LOG: [Offload][OpenMP][libdevice] Make check to enter state machine architecture dependent (#188144)

The genericStateMachine call uses synchronize::threads, which is expected
to be implemented using a workgroup-level barrier.
On some architectures, threads in the same warp as the main thread can
cause a race condition if they reach the barrier, so there is currently a
condition that keeps those threads from entering the state machine.
On Intel GPUs, however, all threads must reach the barrier for it to
complete; otherwise the threads in the state machine never make
progress.

This PR moves the condition into an architecture-dependent config so it
can work correctly for both kinds of hardware.

Added: 
    

Modified: 
    openmp/device/src/Kernel.cpp

Removed: 
    


################################################################################
diff  --git a/openmp/device/src/Kernel.cpp b/openmp/device/src/Kernel.cpp
index a180df7b982e3..d6b8659436156 100644
--- a/openmp/device/src/Kernel.cpp
+++ b/openmp/device/src/Kernel.cpp
@@ -44,6 +44,31 @@ initializeRuntime(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
   workshare::init(IsSPMD);
 }
 
+/// Returns true if the current thread should enter the generic state machine.
+static bool shouldEnterStateMachine(bool IsSPMD) {
+#if defined(__NVPTX__) || defined(__AMDGPU__)
+  // This check is important for NVIDIA Pascal (but not Volta) and AMD
+  // GPU. In those cases, a single thread can apparently satisfy a barrier on
+  // behalf of all threads in the same warp. Thus, it would not be safe for
+  // other threads in the main thread's warp to reach the first
+  // synchronize::threads call in genericStateMachine before the main thread
+  // reaches its corresponding synchronize::threads call: that would permit all
+  // active worker threads to proceed before the main thread has actually set
+  // state::ParallelRegionFn, and then they would immediately quit without
+  // doing any work.  mapping::getMaxTeamThreads() does not include any of the
+  // main thread's warp, so none of its threads can ever be active worker
+  // threads.
+  return mapping::getThreadIdInBlock() < mapping::getMaxTeamThreads(IsSPMD);
+#else
+  // On other architectures (e.g., Intel GPUs) all threads must enter the state
+  // machine to satisfy the requirements of workgroup of synchronize::threads
+  // call in genericStateMachine. Otherwise, the workers will wait on the
+  // call to synchronize::threads forever and never proceed.
+  (void)IsSPMD;
+  return true;
+#endif
+}
+
 /// Simple generic state machine for worker threads.
 static void genericStateMachine(IdentTy *Ident) {
   uint32_t TId = mapping::getThreadIdInBlock();
@@ -108,21 +133,10 @@ int32_t __kmpc_target_init(KernelEnvironmentTy &KernelEnvironment,
     return -1;
 
   // Enter the generic state machine if enabled and if this thread can possibly
-  // be an active worker thread.
-  //
-  // The latter check is important for NVIDIA Pascal (but not Volta) and AMD
-  // GPU.  In those cases, a single thread can apparently satisfy a barrier on
-  // behalf of all threads in the same warp.  Thus, it would not be safe for
-  // other threads in the main thread's warp to reach the first
-  // synchronize::threads call in genericStateMachine before the main thread
-  // reaches its corresponding synchronize::threads call: that would permit all
-  // active worker threads to proceed before the main thread has actually set
-  // state::ParallelRegionFn, and then they would immediately quit without
-  // doing any work.  mapping::getMaxTeamThreads() does not include any of the
-  // main thread's warp, so none of its threads can ever be active worker
-  // threads.
-  if (UseGenericStateMachine &&
-      mapping::getThreadIdInBlock() < mapping::getMaxTeamThreads(IsSPMD))
+  // be an active worker thread. The shouldEnterStateMachine check is
+  // architecture-specific and handles platforms where warp-level barrier
+  // forwarding could cause races during state machine initialization.
+  if (UseGenericStateMachine && shouldEnterStateMachine(IsSPMD))
     genericStateMachine(KernelEnvironment.Ident);
 
   return mapping::getThreadIdInBlock();


        

