[Openmp-commits] [openmp] [Offload][OpenMP][libdevice] Make check to enter state machine architecture dependent (PR #188144)
Kevin Sala Penades via Openmp-commits
openmp-commits at lists.llvm.org
Tue Mar 24 16:46:59 PDT 2026
================
@@ -62,6 +62,45 @@ bool mayUseThreadStates();
/// parallelism, or if it was explicitly disabled by the user.
bool mayUseNestedParallelism();
+/// Returns true if the current thread should enter the generic state machine.
+/// On some architectures, some threads should not enter the state machine to
+/// avoid warp-level barrier forwarding issues during initialization.
+/// On other architectures, all threads must enter the state machine to satisfy
+/// the requirements of workgroup synchronization.
+static inline bool shouldEnterStateMachine(bool IsSPMD);
+
+} // namespace config
+} // namespace ompx
+
+#include "Mapping.h"
+
+namespace ompx {
+namespace config {
+
+static inline bool shouldEnterStateMachine(bool IsSPMD) {
+#if defined(__NVPTX__) || defined(__AMDGPU__)
+ // This check is important for NVIDIA Pascal (but not Volta) and AMD
+ // GPU. In those cases, a single thread can apparently satisfy a barrier on
+ // behalf of all threads in the same warp. Thus, it would not be safe for
+ // other threads in the main thread's warp to reach the first
+ // synchronize::threads call in genericStateMachine before the main thread
+ // reaches its corresponding synchronize::threads call: that would permit all
+ // active worker threads to proceed before the main thread has actually set
+ // state::ParallelRegionFn, and then they would immediately quit without
+ // doing any work. mapping::getMaxTeamThreads() does not include any of the
+ // main thread's warp, so none of its threads can ever be active worker
+ // threads.
+ return mapping::getThreadIdInBlock() < mapping::getMaxTeamThreads(IsSPMD);
+#else
+ // On other architectures (e.g., Intel GPUs) all threads must enter the state
+ // machine to satisfy the requirements of workgroup of synchronize::threads
+ // call in genericStateMachine. Otherwise, the workers will wait on the
+ // call to synchronize::threads forever and never proceed.
+ (void)IsSPMD;
+ return true;
+#endif
+}
----------------
kevinsala wrote:
I don't think this is the proper file for this function. I would directly define it in Kernel.cpp, near the `genericStateMachine()` function.
https://github.com/llvm/llvm-project/pull/188144
More information about the Openmp-commits
mailing list