[Openmp-dev] Two patches
hfinkel at anl.gov
Tue Mar 31 09:57:31 PDT 2015
----- Original Message -----
> From: "Terry L Wilmarth" <terry.l.wilmarth at intel.com>
> To: openmp-dev at dcs-maillist2.engr.illinois.edu
> Sent: Thursday, March 26, 2015 12:32:46 PM
> Subject: [Openmp-dev] Two patches
> Here are 2 more patches:
> 1. hier1.patch: This short patch eliminates writing to the depth
> field of the machine_hierarchy data structure. The write was
> unnecessary, as the depth can simply be pulled out of the data
> structure to a local variable, and then modified as needed. This
> also allowed the removal of the base_depth field, because now the
> depth field is the base depth, and does not need to be modified.
LGTM, but we need more comments here. First, please add a comment near where skipPerLevel is declared noting what the field means (looks like it is numPerLevel, but cumulative over all parent levels?). Regarding the code you're fixing, please add a comment explaining what is happening. It looks like you're increasing the effective barrier scheduling granularity if you have more participating threads than machine_hierarchy units at that granularity, right?
> 2. safe_api.patch: This patch is large, but very simple. This change
> replaces banned C API calls with safe calls to meet the SDL
> guidelines. Most calls have their safe alternatives on Windows, but
> we just define macros for other platforms.
> a. kmp_safe_c_api.h defines macros that replace the banned calls
> b. APIs for non-windows platforms will be filled in later
Okay. Two questions:
+// _malloca was suggested, but it is not a drop-in replacement for _alloca
+# define KMP_ALLOCA _alloca
Is the only difference the fact that the size is capped at _ALLOCA_S_THRESHOLD? Would you hit this limit? Are there performance implications to making the change regardless?
--- runtime/src/kmp_barrier.cpp (revision 231388)
+++ runtime/src/kmp_barrier.cpp (working copy)
@@ -32,7 +32,7 @@
#define ngo_load(src) ((void)0)
#define ngo_store_icvs(dst, src) copy_icvs((dst), (src))
-#define ngo_store_go(dst, src) memcpy((dst), (src), CACHE_LINE)
+#define ngo_store_go(dst, src) KMP_MEMCPY((dst), (src), CACHE_LINE)
#define ngo_sync() ((void)0)
#endif /* KMP_MIC && USE_NGO_STORES */
The other changes look fine, but this could have performance implications, no? The compiler can optimize away the memcpy, but likely not the memcpy_s, and the associated validation is likely expensive. Should we avoid this here?
> Terry L. Wilmarth
> terry.l.wilmarth at intel.com 217/403-4251
> Intel/SSG/DPD/TCAR/RAD/Threading Runtimes
> Openmp-dev mailing list
> Openmp-dev at dcs-maillist2.engr.illinois.edu
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Openmp-dev