[Openmp-commits] [openmp] [OpenMP] Fix hyper barrier performance issue (PR #195473)

Kim Walisch via Openmp-commits openmp-commits at lists.llvm.org
Thu May 7 08:20:12 PDT 2026


kimwalisch wrote:

> I don't see this as a performance issue, but rather as a configuration issue. By default, LLVM OMP is multi-user environment friendly, and has wait policy set to passive. When doing performance analysis, we always use active wait policy. It's not a "workaround", it's a configuration. Your "fix" is forcing active wait policy into the runtime, taking away the ability to configure it as needed on a per application basis. You have the option to use kmp_set_blocktime and kmp_set_library in your code. Default wait policy is implementation defined and not dictated by the spec. I don't know what GOMP does, but my guess is they might be using an active wait policy by default, hence the difference.

OK, I get it: LLVM OpenMP sets the wait policy to passive by default, whereas GCC OpenMP sets it to active by default. As expected, when I set the GCC OpenMP wait policy to passive, the primecount LLVM/Clang binary and the primecount GCC binary have similar performance:

```bash
$ hyperfine "OMP_WAIT_POLICY=PASSIVE ./primecount-gcc 1e17"
Benchmark 1: OMP_WAIT_POLICY=PASSIVE ./primecount-gcc 1e17
  Time (mean ± σ):     495.8 ms ±  10.5 ms    [User: 33831.9 ms, System: 508.7 ms]
  Range (min … max):   476.6 ms … 506.3 ms    10 runs

$ hyperfine "./primecount-clang 1e17"
Benchmark 1: ./primecount-clang 1e17
  Time (mean ± σ):     518.6 ms ±  31.7 ms    [User: 50341.4 ms, System: 1167.6 ms]
  Range (min … max):   481.8 ms … 572.6 ms   
```

But how do you explain this performance issue in LLVM OpenMP then:

```bash
$ hyperfine "OMP_WAIT_POLICY=PASSIVE GOMP_SPINCOUNT=100000 ./primecount-gcc 1e17"
Benchmark 1: OMP_WAIT_POLICY=PASSIVE GOMP_SPINCOUNT=100000 ./primecount-gcc 1e17
  Time (mean ± σ):     372.3 ms ±   7.0 ms    [User: 36583.8 ms, System: 1340.9 ms]
  Range (min … max):   354.5 ms … 378.8 ms    10 runs

$ hyperfine "OMP_WAIT_POLICY=PASSIVE KMP_BLOCKTIME=100ms ./primecount-clang 1e17"
Benchmark 1: OMP_WAIT_POLICY=PASSIVE KMP_BLOCKTIME=100ms ./primecount-clang 1e17
  Time (mean ± σ):     506.0 ms ±  70.3 ms    [User: 49090.4 ms, System: 1133.7 ms]
  Range (min … max):   450.4 ms … 692.6 ms    10 runs
```

I am using the same passive wait policy, but now I have set the threads to spin for some time before going to sleep. This clearly improves the performance of the GCC OpenMP primecount binary, whereas the LLVM OpenMP primecount binary shows no meaningful improvement: its mean moves by less than its own run-to-run spread. Hence, it seems like LLVM OpenMP ignores `KMP_BLOCKTIME=100ms`, while GCC does honor `GOMP_SPINCOUNT=100000`.
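To put numbers on that comparison, here is a small sketch that computes the percentage reduction in mean wall time from the hyperfine means quoted above. The helper name `rel_change` is made up for illustration. Note that the Clang baseline run relied on the default wait policy rather than setting `OMP_WAIT_POLICY=PASSIVE` explicitly.

```bash
# Hypothetical helper: percentage reduction in mean wall time
# going from $1 (before, ms) to $2 (after, ms), rounded to one decimal.
rel_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Mean times (ms) copied from the hyperfine runs above.
echo "gcc:   $(rel_change 495.8 372.3)% lower mean with GOMP_SPINCOUNT=100000"
echo "clang: $(rel_change 518.6 506.0)% lower mean with KMP_BLOCKTIME=100ms"
```

The Clang difference (12.6 ms) is much smaller than the ±70.3 ms standard deviation reported for that run, so it is not distinguishable from noise.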

According to the LLVM OpenMP documentation: "LLVM/libomp documents the default as OMP_WAIT_POLICY=passive / KMP_LIBRARY=throughput with a default KMP_BLOCKTIME=200ms when OMP_WAIT_POLICY is unset." (https://openmp.llvm.org/design/Runtimes.html). So the default is `KMP_BLOCKTIME=200ms`, which would be very beneficial for my use case, but LLVM OpenMP seems to simply ignore it?!
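One way to check whether the runtime actually picked up a setting is to ask it to report its configuration at startup. The sketch below uses `OMP_DISPLAY_ENV`, which is defined by the OpenMP specification. The helper name `show_omp_env` is hypothetical, and it is an assumption that the `VERBOSE` output of the runtime in use lists the effective `KMP_BLOCKTIME`; that should be confirmed on the actual system.

```bash
# Hypothetical helper: run a command with OMP_DISPLAY_ENV=VERBOSE so the
# OpenMP runtime prints the settings it is using when it initializes.
show_omp_env() {
  OMP_DISPLAY_ENV=VERBOSE "$@"
}

# Example (binary name and variables taken from the benchmarks above):
#   show_omp_env env OMP_WAIT_POLICY=PASSIVE KMP_BLOCKTIME=100ms ./primecount-clang 1e17
```

Comparing the reported values with and without `KMP_BLOCKTIME` set would show whether the variable is being parsed at all, which is a separate question from whether the barrier code honors it.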

https://github.com/llvm/llvm-project/pull/195473

