[Openmp-dev] Two patches

Tue Mar 31 09:57:31 PDT 2015

----- Original Message -----

> From: "Terry L Wilmarth" <terry.l.wilmarth at intel.com>
> To: openmp-dev at dcs-maillist2.engr.illinois.edu
> Sent: Thursday, March 26, 2015 12:32:46 PM
> Subject: [Openmp-dev] Two patches

> Hello,

> Here are 2 more patches:

> 1. hier1.patch: This short patch eliminates writing to the depth
> field of the machine_hierarchy data structure. The write was
> unnecessary, as the depth can simply be pulled out of the data
> structure to a local variable, and then modified as needed. This
> also allowed the removal of the base_depth field, because now the
> depth field is the base depth, and does not need to be modified.
LGTM, but we need more comments here. First, please add a comment near where skipPerLevel is declared noting what the field means (looks like it is numPerLevel, but cumulative over all parent levels?). Regarding the code you're fixing, please add a comment explaining what is happening. It looks like you're increasing the effective barrier scheduling granularity if you have more participating threads than machine_hierarchy units at that granularity, right? 

> 2. safe_api.patch: This patch is large, but very simple. This change
> replaces banned C API calls with safe calls to meet the SDL
> guidelines. Most calls have their safe alternatives on Windows, but
> we just define macros for other platforms.
> a. kmp_safe_c_api.h defines macros that replace the banned calls
> b. APIs for non-windows platforms will be filled in later
Okay. Two questions: 

+// _malloca was suggested, but it is not a drop-in replacement for _alloca 
+# define KMP_ALLOCA _alloca 

Is the only difference the fact that the size is capped at _ALLOCA_S_THRESHOLD? Would you hit this limit? Are there performance implications to making the change regardless? 

Index: runtime/src/kmp_barrier.cpp 
===================================================================

--- runtime/src/kmp_barrier.cpp (revision 231388) 
+++ runtime/src/kmp_barrier.cpp (working copy) 
@@ -32,7 +32,7 @@ 
#else 
#define ngo_load(src) ((void)0) 
#define ngo_store_icvs(dst, src) copy_icvs((dst), (src)) 
-#define ngo_store_go(dst, src) memcpy((dst), (src), CACHE_LINE) 
+#define ngo_store_go(dst, src) KMP_MEMCPY((dst), (src), CACHE_LINE) 
#define ngo_sync() ((void)0) 
#endif /* KMP_MIC && USE_NGO_STORES */ 

The other changes look fine, but this could have performance implications, no? The compiler can optimize away the memcpy, but likely not the memcpy_s, and the associated validation is likely expensive. Should we avoid this here? 

Thanks again, 
Hal 

> Thanks!
> Terry

> --
> Terry L. Wilmarth
> terry.l.wilmarth at intel.com 217/403-4251
> Intel/SSG/DPD/TCAR/RAD/Threading Runtimes

> _______________________________________________
> Openmp-dev mailing list
> Openmp-dev at dcs-maillist2.engr.illinois.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/openmp-dev

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20150331/fd3e5fad/attachment.html>