[Openmp-commits] [openmp] [OpenMP] Use distributed fork/join barrier for large teams by default (PR #195473)

Mon May 4 09:10:52 PDT 2026

kimwalisch wrote:

> The distributed barrier was designed for the sort of cases you are encountering, but particularly for Intel hardware, so it's interesting, and great that it also seems to work well on AMD. But since this is a specific application behavior, it's best not to change the defaults of the runtime. Rather, just use the environment variables to select the barrier you wish to use for your specific case. KMP_FORKJOIN_BARRIER_PATTERN=dist,dist for example. This will get you a warning to change all barrier patterns to dist, because dist doesn't mix with other barrier algorithms. So KMP_PLAIN_BARRIER_PATTERN and KMP_REDUCTION_BARRIER_PATTERN will need to be set too.

Thanks for your answer.

Unfortunately, your suggested solution does not solve my particular issue. My primecount program/library is installed by ordinary users through their operating system's package manager. 99% of these users will run my program with the default settings and will hence encounter the LLVM OpenMP scaling issue (if my program is compiled using LLVM/Clang).

I personally think this is a severe LLVM OpenMP scaling issue that is worth fixing upstream in LLVM OpenMP. The huge number of context switches in my benchmarks clearly indicates that there is a significant issue in LLVM OpenMP on many-core systems. As mentioned, GCC's OpenMP library does not suffer from this issue (using the default settings).

If an LLVM OpenMP maintainer could provide guidance on the best approach for resolving this performance issue, I would be happy to work on a fix for LLVM OpenMP and address any feedback received during the review of my patches.

https://github.com/llvm/llvm-project/pull/195473