[Openmp-commits] [PATCH] D22093: Improving EPCC performance when linking with hwloc
Jonathan Peyton via Openmp-commits
openmp-commits at lists.llvm.org
Thu Jul 7 08:55:41 PDT 2016
jlpeyton created this revision.
jlpeyton added reviewers: tlwilmar, AndreyChurbanov.
jlpeyton added a subscriber: openmp-commits.
jlpeyton set the repository for this revision to rL LLVM.
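When the runtime is linked with libhwloc, the EPCC ORDERED test slows down on big machines (more than 48 cores). Performance analysis showed that cache thrashing was occurring, and adding 64 bytes of padding to the end of the shared dispatch structure (`dispatch_shared_info_t` in `kmp.h` and its counterpart in `kmp_dispatch.cpp`) helps alleviate the problem. The patch also compiles out the existing x86-only padding in the team structure when `KMP_USE_HWLOC` is set.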
Also, inside the main spin-wait loop in `kmp_wait_release.h`, we can eliminate the repeated reads of the global shared variables `__kmp_nth` and `__kmp_avail_proc` by computing a local variable, `oversubscribed`, once before the loop and checking that instead.
Repository:
rL LLVM
http://reviews.llvm.org/D22093
Files:
runtime/src/kmp.h
runtime/src/kmp_dispatch.cpp
runtime/src/kmp_wait_release.h
Index: runtime/src/kmp_wait_release.h
===================================================================
--- runtime/src/kmp_wait_release.h
+++ runtime/src/kmp_wait_release.h
@@ -97,6 +97,7 @@
kmp_uint32 hibernate;
int th_gtid;
int tasks_completed = FALSE;
+ int oversubscribed;
KMP_FSYNC_SPIN_INIT(spin, NULL);
if (flag->done_check()) {
@@ -166,6 +167,7 @@
hibernate - __kmp_global.g.g_time.dt.t_value));
}
+ oversubscribed = (TCR_4(__kmp_nth) > __kmp_avail_proc);
KMP_MB();
// Main wait spin loop
@@ -201,7 +203,7 @@
}
// If we are oversubscribed, or have waited a bit (and KMP_LIBRARY=throughput), then yield
- KMP_YIELD(TCR_4(__kmp_nth) > __kmp_avail_proc);
+ KMP_YIELD(oversubscribed);
// TODO: Should it be number of cores instead of thread contexts? Like:
// KMP_YIELD(TCR_4(__kmp_nth) > __kmp_ncores);
// Need performance improvement data to make the change...
Index: runtime/src/kmp_dispatch.cpp
===================================================================
--- runtime/src/kmp_dispatch.cpp
+++ runtime/src/kmp_dispatch.cpp
@@ -180,6 +180,12 @@
kmp_uint32 *doacross_flags; // array of iteration flags (0/1)
kmp_int32 doacross_num_done; // count finished threads
#endif
+#if KMP_USE_HWLOC
+ // When linking with libhwloc, the ORDERED EPCC test slows down on big
+ // machines (> 48 cores). Performance analysis showed that a cache thrash
+ // was occurring and this padding helps alleviate the problem.
+ char padding[64];
+#endif
};
/* ------------------------------------------------------------------------ */
Index: runtime/src/kmp.h
===================================================================
--- runtime/src/kmp.h
+++ runtime/src/kmp.h
@@ -1706,6 +1706,12 @@
volatile kmp_uint32 *doacross_flags; // shared array of iteration flags (0/1)
kmp_int32 doacross_num_done; // count finished threads
#endif
+#if KMP_USE_HWLOC
+ // When linking with libhwloc, the ORDERED EPCC test slows down on big
+ // machines (> 48 cores). Performance analysis showed that a cache thrash
+ // was occurring and this padding helps alleviate the problem.
+ char padding[64];
+#endif
} dispatch_shared_info_t;
typedef struct kmp_disp {
@@ -2567,7 +2573,7 @@
int t_size_changed; // team size was changed?: 0: no, 1: yes, -1: changed via omp_set_num_threads() call
// Read/write by workers as well -----------------------------------------------------------------------
-#if KMP_ARCH_X86 || KMP_ARCH_X86_64
+#if (KMP_ARCH_X86 || KMP_ARCH_X86_64) && !KMP_USE_HWLOC
// Using CACHE_LINE=64 reduces memory footprint, but causes a big perf regression of epcc 'parallel'
// and 'barrier' on fxe256lin01. This extra padding serves to fix the performance of epcc 'parallel'
// and 'barrier' when CACHE_LINE=64. TODO: investigate more and get rid if this padding.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D22093.63080.patch
Type: text/x-patch
Size: 3023 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160707/75169c2e/attachment.bin>