[Openmp-commits] [PATCH] D35490: Cleanup: two consecutive PAUSE instructions per spin loop iteration replaced with single one

Andrey Churbanov via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jul 17 09:36:07 PDT 2017


AndreyChurbanov created this revision.
AndreyChurbanov added a project: OpenMP.

Multiple consecutive hints to the processor that it is executing a spin loop look redundant. This patch eliminates the extra PAUSE, leaving a single one per iteration of the barrier spin loop.
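
Here is a minimal C++ sketch of a spin-wait that issues one pause hint per iteration, which is the shape this patch gives the barrier loop. `cpu_pause_hint` and `spin_until` are hypothetical names, not libomp functions. The x86 branch uses the real `_mm_pause()` intrinsic, and on other targets the sketch falls back to a no-op.

```cpp
#include <cstdint>
#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>  // _mm_pause
#endif

// Hypothetical helper: issues one spin-loop hint. On x86 this is the PAUSE
// instruction via _mm_pause(); on other targets this sketch does nothing.
static inline void cpu_pause_hint() {
#if defined(__i386__) || defined(__x86_64__)
  _mm_pause();
#endif
}

// Spins until done() returns true, issuing exactly one pause hint per
// unsuccessful check (rather than two back-to-back hints).
// Returns the number of hints issued.
template <typename Pred>
std::uint64_t spin_until(Pred done) {
  std::uint64_t hints = 0;
  while (!done()) {
    cpu_pause_hint();
    ++hints;
  }
  return hints;
}
```

In real use, `done` would poll a flag written by another thread, for example `flag.load(std::memory_order_acquire) == 1` on a `std::atomic<int>`.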

Testing on many platforms showed no performance impact from this change. On Intel KNC, however, the patch revealed performance regressions on the SPEC OMPM2001 test suite. The PAUSE instruction is not supported there, so the _mm_delay_32 intrinsic is used instead. Making the pause twice as long restored the performance. Making it three times as long gave even better results, with performance improvements on some tests: 314.mgrid_m, 324.apsi_m and 328.fma3d_m. The delay interval was therefore changed from 100 ticks to 300.
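
The KNC numbers above can be read as simple per-iteration arithmetic. This assumes each KMP_CPU_PAUSE() on KNC costs one __kmp_x86_pause() call (_mm_delay_32 with the configured tick count) and ignores the rest of the loop body:

- The old loop paused twice, for 2 × 100 = 200 ticks.
- The patch alone halves that to 100 ticks.
- The new constant gives 300 ticks, 1.5 times the original double pause.

The sketch only encodes that arithmetic. `delay_per_iteration` is a hypothetical helper.

```cpp
// Hypothetical helper: per-iteration spin delay in _mm_delay_32 ticks,
// assuming each pause costs one __kmp_x86_pause() call and ignoring the
// rest of the loop body.
constexpr int delay_per_iteration(int pauses_per_iteration, int ticks_per_pause) {
  return pauses_per_iteration * ticks_per_pause;
}

// Before the patch: two pauses per iteration, 100 ticks each.
constexpr int kBefore = delay_per_iteration(2, 100);
// Extra PAUSE removed but delay left at 100: the case that regressed on KNC.
constexpr int kHalved = delay_per_iteration(1, 100);
// Patch as submitted: one pause of 300 ticks.
constexpr int kAfter = delay_per_iteration(1, 300);

static_assert(kBefore == 200, "old loop: 2 x 100 ticks");
static_assert(kHalved == 100, "single pause at the old delay");
static_assert(kAfter == 300, "single pause at the new delay");
```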


Repository:
  rL LLVM

https://reviews.llvm.org/D35490

Files:
  runtime/src/kmp.h
  runtime/src/kmp_wait_release.h


Index: runtime/src/kmp_wait_release.h
===================================================================
--- runtime/src/kmp_wait_release.h
+++ runtime/src/kmp_wait_release.h
@@ -47,7 +47,7 @@
  */
 template <typename P> class kmp_flag {
   volatile P
-    *loc; /**< Pointer to the flag storage that is modified by another thread
+      *loc; /**< Pointer to the flag storage that is modified by another thread
              */
   flag_type t; /**< "Type" of the flag in loc */
 public:
@@ -225,11 +225,14 @@
 
     // If we are oversubscribed, or have waited a bit (and
     // KMP_LIBRARY=throughput), then yield
-    KMP_YIELD(oversubscribed);
     // TODO: Should it be number of cores instead of thread contexts? Like:
     // KMP_YIELD(TCR_4(__kmp_nth) > __kmp_ncores);
     // Need performance improvement data to make the change...
-    KMP_YIELD_SPIN(spins);
+    if (oversubscribed) {
+      KMP_YIELD(1);
+    } else {
+      KMP_YIELD_SPIN(spins);
+    }
     // Check if this thread was transferred from a team
     // to the thread pool (or vice-versa) while spinning.
     in_pool = !!TCR_4(this_thr->th.th_in_pool);
Index: runtime/src/kmp.h
===================================================================
--- runtime/src/kmp.h
+++ runtime/src/kmp.h
@@ -1040,7 +1040,11 @@
 #if KMP_ARCH_X86
 extern void __kmp_x86_pause(void);
 #elif KMP_MIC
-static void __kmp_x86_pause(void) { _mm_delay_32(100); }
+// Performance testing on KNC (C0QS-7120 P/A/X/D, 61-core, 16 GB Memory) showed
+// regression after removal of extra PAUSE from KMP_YIELD_SPIN(). Changing
+// the delay from 100 to 300 showed even better performance than double PAUSE
+// on Spec OMP2001 and LCPC tasking tests, no regressions on EPCC.
+static void __kmp_x86_pause(void) { _mm_delay_32(300); }
 #else
 static void __kmp_x86_pause(void) { _mm_pause(); }
 #endif
@@ -1076,16 +1080,16 @@
     KMP_CPU_PAUSE();                                                           \
     (count) -= 2;                                                              \
     if (!(count)) {                                                            \
-      KMP_YIELD(cond);                                                         \
+      __kmp_yield(cond);                                                       \
       (count) = __kmp_yield_next;                                              \
     }                                                                          \
   }
 #define KMP_YIELD_SPIN(count)                                                  \
   {                                                                            \
     KMP_CPU_PAUSE();                                                           \
     (count) -= 2;                                                              \
     if (!(count)) {                                                            \
-      KMP_YIELD(1);                                                            \
+      __kmp_yield(1);                                                          \
       (count) = __kmp_yield_next;                                              \
     }                                                                          \
   }

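A rough model of the patched spin step follows.

- When the thread is oversubscribed, it calls KMP_YIELD(1) on every iteration. That macro's definition is not part of this diff; the sketch assumes it issues one pause and then yields.
- Otherwise, KMP_YIELD_SPIN issues one pause and decrements the counter by 2. Only when the counter reaches zero does it call __kmp_yield(1) and reload the counter from __kmp_yield_next.

Inside the macros the patch calls __kmp_yield directly instead of KMP_YIELD, presumably so that reaching zero no longer adds another pause. The struct and function names, the reload value, and the use of std::this_thread::yield() in place of __kmp_yield are illustrative assumptions, not libomp code.

```cpp
#include <cassert>
#include <thread>

// Illustrative model of the patched spin step; not libomp code.
struct SpinYieldState {
  int count;        // plays the role of `spins` passed to KMP_YIELD_SPIN
  int reload;       // stands in for __kmp_yield_next (hypothetical value)
  long pauses = 0;  // instrumentation: pause hints issued so far
  long yields = 0;  // instrumentation: yields issued so far

  explicit SpinYieldState(int reload_value)
      : count(reload_value), reload(reload_value) {
    // The macro tests !(count) after subtracting 2, so only an even,
    // positive reload value ever reaches exactly zero.
    assert(reload_value > 0 && reload_value % 2 == 0);
  }
};

static void cpu_pause(SpinYieldState& s) {
  ++s.pauses;  // a real runtime would issue PAUSE or _mm_delay_32 here
}

static void yield_cpu(SpinYieldState& s) {
  ++s.yields;
  std::this_thread::yield();  // stand-in for __kmp_yield(1)
}

// One iteration of the patched wait loop.
static void spin_step(SpinYieldState& s, bool oversubscribed) {
  if (oversubscribed) {
    // KMP_YIELD(1): assumed to be one pause followed by a yield.
    cpu_pause(s);
    yield_cpu(s);
    return;
  }
  // KMP_YIELD_SPIN(count): one pause, count down by 2, yield at zero.
  cpu_pause(s);
  s.count -= 2;
  if (s.count == 0) {
    yield_cpu(s);
    s.count = s.reload;
  }
}
```

With a reload value of 8, twelve non-oversubscribed steps issue 12 pauses and 3 yields, one yield every four iterations. Because the macro checks for exactly zero, the sketch asserts an even reload value rather than assuming anything about the runtime's actual __kmp_yield_next.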


