[Openmp-commits] [PATCH] D40358: Use hyperbarrier by default on all architectures

Wed Nov 22 10:54:19 PST 2017

Hahnfeld added a comment.

In https://reviews.llvm.org/D40358#932906, @AndreyChurbanov wrote:

> The idea was that 32-bit machines will probably have a small number of cores (2, or 4, or ...). Then the hyper barrier can have a bigger overhead. Can you check whether 2 or 4 threads run faster with the hyper barrier compared to linear? If not, then maybe the condition could be fixed in a different way, e.g. adding the Power arch to x86_64 and leaving the linear barrier for 32-bit archs.

For 2 threads I might be seeing a slightly better average with the linear barrier (1 percent?), but also a higher standard deviation, so I'm not really sure the difference is significant.
The hyper barrier clearly wins for 4 threads by about 5 percent and naturally for all higher thread counts.
(Tested on the same Power system.)

So in theory, the hyper barrier collapses to a linear barrier for all thread counts less than 5 because we have a branch factor of 4, right? Obviously with a higher overhead because of the more complex code, but the synchronization pattern (which thread waits for which child) remains the same...

> BTW, the comments "hyper2: C78980" could be safely removed I think.  This is some very old info that says nothing nowadays (at least to me:).

Ok, will do once we have agreed on the general direction of this. (I always thought these were references to an internal bug tracker? There are more references in `kmp_atomic.cpp`)

https://reviews.llvm.org/D40358