[Openmp-commits] [PATCH] D40358: Use hyperbarrier by default on all architectures
Jonas Hahnfeld via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Nov 22 10:54:19 PST 2017
Hahnfeld added a comment.
In https://reviews.llvm.org/D40358#932906, @AndreyChurbanov wrote:
> The idea was that 32-bit machines will probably have a small number of cores (2, or 4, or ...). Then the hyper barrier can have bigger overhead. Can you check if 2 or 4 threads work faster on the hyper barrier compared to linear? If not, then maybe the condition could be fixed in a different way, e.g. adding the Power arch to x86_64, leaving the linear barrier for 32-bit archs.
I might be seeing a slightly better average with the linear barrier for 2 threads (1 percent?), but with a higher standard deviation, so I'm not really sure about this.
The hyper barrier clearly wins for 4 threads by about 5 percent and naturally for all higher thread counts.
(Tested on the same Power system.)
So in theory, the hyper barrier collapses to a linear barrier for all thread counts less than 5 because we have a branch factor of 4, right? Obviously with a higher overhead because of the more complex code, but the synchronization pattern (which thread waits for which child) remains the same...
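To make the "who waits for whom" pattern concrete, here is a minimal sketch of how a hypercube-embedded tree with branch factor 4 could assign each thread its parent. The helper `hyper_parents` and its parameters are hypothetical names for illustration only; the loop is modeled on my reading of the gather phase and is an assumption, not a copy of the runtime code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the OpenMP runtime): for each thread id,
// return the thread it would report to in a tree with 2^branch_bits children
// per node and per level. The root (tid 0) gets -1.
std::vector<int> hyper_parents(int num_threads, int branch_bits = 2) {
  assert(num_threads >= 1 && branch_bits >= 1);
  const int branch_factor = 1 << branch_bits;
  std::vector<int> parent(num_threads, -1);
  for (int tid = 1; tid < num_threads; ++tid) {
    // Walk up the levels until this thread is no longer the first
    // member of its group; at that level it reports to the group leader.
    for (int level = 0, offset = 1; offset < num_threads;
         level += branch_bits, offset <<= branch_bits) {
      if ((tid >> level) & (branch_factor - 1)) {
        // Clear the low bits to find the leader of the enclosing group.
        parent[tid] = tid & ~((1 << (level + branch_bits)) - 1);
        break;
      }
    }
  }
  return parent;
}
```

Under these assumptions, with 2 to 4 threads every worker reports directly to thread 0 in a single level, which is the same pattern as the linear barrier. With 6 threads, thread 5 reports to thread 4 instead.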
> BTW, the comments "hyper2: C78980" could be safely removed I think. This is some very old info that says nothing nowadays (at least to me:).
Ok, will do once we have agreed on the general direction of this. (I always thought these were references to an internal bug tracker? There are more references in `kmp_atomic.cpp`.)
https://reviews.llvm.org/D40358