[Openmp-commits] [openmp] [OpenMP] Use distributed fork/join barrier for large teams by default (PR #195473)

Kim Walisch via Openmp-commits openmp-commits at lists.llvm.org
Sun May 3 23:58:30 PDT 2026


kimwalisch wrote:

To better explain my pull request:

Initially the AI agent I used to find this bug suggested the following tiny fix in `kmp_global.cpp`:

```diff
diff --git a/openmp/runtime/src/kmp_global.cpp b/openmp/runtime/src/kmp_global.cpp
index 15b9babfaf0b..1c75b76fe290 100644
--- a/openmp/runtime/src/kmp_global.cpp
+++ b/openmp/runtime/src/kmp_global.cpp
@@ -82,9 +82,9 @@ kmp_uint32 __kmp_barrier_gather_bb_dflt = 2;
 kmp_uint32 __kmp_barrier_release_bb_dflt = 2;
 /* branch_factor = 4 */ /* hyper2: C78980 */
 
-kmp_bar_pat_e __kmp_barrier_gather_pat_dflt = bp_hyper_bar;
+kmp_bar_pat_e __kmp_barrier_gather_pat_dflt = bp_dist_bar;
 /* hyper2: C78980 */
-kmp_bar_pat_e __kmp_barrier_release_pat_dflt = bp_hyper_bar;
+kmp_bar_pat_e __kmp_barrier_release_pat_dflt = bp_dist_bar;
 /* hyper2: C78980 */
 
 kmp_uint32 __kmp_barrier_gather_branch_bits[bs_last_barrier] = {0};
```

I benchmarked this simple two-line change, and it did fix the LLVM OpenMP performance issue I previously described. However, changing the default barrier type for all OpenMP users just to improve the performance of my particular use case seemed inappropriate to me. Therefore, my pull request leaves the default barrier type (which performs very well for workloads using a small number of threads, e.g. ≤ 8 threads) as is, but switches to the more scalable barrier type for workloads using ≥ 32 threads.
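To illustrate the idea only (this is not the code from the pull request), here is a minimal, self-contained sketch of choosing the barrier pattern by team size. The `BarrierPattern` enum, `kLargeTeamThreshold`, and `select_fork_join_pattern` are hypothetical names made up for this example; only the 32-thread cutoff and the two pattern names come from the text and diff above.

```cpp
// Hypothetical stand-in for the runtime's kmp_bar_pat_e; the two values
// correspond to bp_hyper_bar and bp_dist_bar in the diff above.
enum class BarrierPattern { Hyper, Dist };

// Assumed cutoff, taken from the ">= 32 threads" figure in this message.
constexpr int kLargeTeamThreshold = 32;

// Keep the existing hyper barrier for small teams and pick the
// distributed barrier once the team reaches the threshold.
constexpr BarrierPattern select_fork_join_pattern(int team_size) {
  return team_size >= kLargeTeamThreshold ? BarrierPattern::Dist
                                          : BarrierPattern::Hyper;
}

// Compile-time checks of the intended behavior.
static_assert(select_fork_join_pattern(8) == BarrierPattern::Hyper,
              "small teams keep the default pattern");
static_assert(select_fork_join_pattern(31) == BarrierPattern::Hyper,
              "teams just below the cutoff keep the default pattern");
static_assert(select_fork_join_pattern(32) == BarrierPattern::Dist,
              "large teams use the distributed pattern");
```

In this sketch, teams below 32 threads keep the current default, so existing small-team workloads would see no change in behavior.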

https://github.com/llvm/llvm-project/pull/195473

