[Openmp-commits] [PATCH] D11990: Lock-free start of serialized parallel regions

Tue Aug 18 01:30:20 PDT 2015

hfinkel accepted this revision.
This revision is now accepted and ready to land.

================
Comment at: runtime/src/kmp_runtime.c:1740
@@ +1739,3 @@
+            if ( nthreads == 1 ) {
+                __kmp_release_bootstrap_lock( &__kmp_forkjoin_lock );
+            }
----------------
AndreyChurbanov wrote:
> hfinkel wrote:
> > Why do we only release the lock when nthreads == 1? Does __kmp_reserve_threads release it otherwise?
> > 
> > (I realize that you've only moved this line from down below, but this seems non-obvious)
> > 
> No, the __kmp_reserve_threads does not release the lock.
> 
> Let me detail the rationale for the change:
> 
> Old code: always acquire the lock at the beginning, then release it on line 1756 for nthreads==1, or on line 2168 for nthreads>1, after many multithread-sensitive actions have completed.
> 
> New code: the lock is skipped for simple 1-thread cases but is still acquired for other 1-thread cases (e.g. when serial execution is caused by dynamic thread adjustment inside __kmp_reserve_threads). As a result, the lock release for the 1-thread case moved here, because it can no longer be done for all 1-thread cases. The multi-thread case releases the lock in the same place as before.
> 
> Performance result: 10x or greater speedup for code like
>   <long loop>
>     #pragma omp parallel
>       #pragma omp parallel
> where the inner parallel regions are serialized by default because OpenMP nesting is disabled, and the number of threads in the outer region is large (e.g. 60 threads on Xeon Phi to keep all cores busy).
> 
Okay, sounds good. Please add a comment here explaining that in the multi-thread case the lock is released later on in the function. Otherwise, LGTM.
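
For readers following the thread, the lock flow described above can be modeled roughly as follows. This is only a sketch, not the code in kmp_runtime.c: sketch_lock, sketch_reserve_threads and sketch_fork are hypothetical stand-ins, and the exact fast-path condition (one thread requested, or a nested region with nesting disabled) is an assumption based on the description in the quoted reply.

```c
#include <assert.h>

/* Hypothetical stand-in for the fork/join bootstrap lock. It only counts
   acquisitions so the effect of the lock-free path can be observed. */
typedef struct { int held; int acquisitions; } sketch_lock;

static void sketch_acquire(sketch_lock *l) { l->held = 1; l->acquisitions++; }
static void sketch_release(sketch_lock *l) { assert(l->held); l->held = 0; }

/* Hypothetical stand-in for __kmp_reserve_threads: it may shrink the request
   (possibly down to 1) but never releases the lock itself. */
static int sketch_reserve_threads(int requested, int available) {
    if (available < 1) return 1;
    return requested <= available ? requested : available;
}

/* Model of the new control flow; returns the team size of the region. */
static int sketch_fork(sketch_lock *lock, int requested, int in_parallel,
                       int nesting_enabled, int available) {
    int nthreads;

    /* Assumed fast path: the region is known to be serial up front,
       so the lock is never taken. */
    if (requested == 1 || (in_parallel && !nesting_enabled))
        return 1;

    sketch_acquire(lock);
    nthreads = sketch_reserve_threads(requested, available);
    if (nthreads == 1) {
        /* Serialized only after reservation: release right here, because
           the fast-path 1-thread cases never held the lock at all. */
        sketch_release(lock);
        return 1;
    }

    /* ... multi-thread team setup would happen here ... */

    /* In the multi-thread case the lock is released later in the function. */
    sketch_release(lock);
    return nthreads;
}
```

In this model, calling sketch_fork repeatedly with in_parallel set and nesting_enabled cleared never touches the lock, which corresponds to the nested-parallel loop that the reported speedup refers to.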

Repository:
  rL LLVM

http://reviews.llvm.org/D11990