<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>My guess is that your threads are blocking rather than spinning. Setting OMP_WAIT_POLICY=active doesn't seem to be enough to turn off all blocking in Intel's runtime. As I recall, there is a KMP_xxx flag that applies as well. You can grep the barrier code in the runtime, or hope that someone from Intel responds to your inquiry.<br><br>--<div>John Mellor-Crummey</div><div><br></div><div>(sent from my phone)</div></div><div><br>On Aug 18, 2015, at 1:14 PM, César via Openmp-dev &lt;<a href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr">Hello,<div><br></div><div>I don't know if this is the correct list to talk about this - I did not find a better place.<br></div><div><br></div><div>I am running performance experiments with a few OpenMP implementations (IOMP, GOMP and our private implementation) and I am seeing a severe slowdown when I use IOMP (GOMP and the others perform well). </div><div><br></div><div>The benchmarks I am using are these: <a href="http://kastors.gforge.inria.fr/#!index.md">http://kastors.gforge.inria.fr/#!index.md</a><br clear="all"><div></div></div><div><br></div><div>The slowdown is huge. For one of the programs (plasma/dpotrf_taskdep -n 8192 -b 64 -i 1 -c) the serial version executes in ~28s and the parallel one executes in ~110s. I did some profiling and found that most of the time is spent in synchronization barriers and dependence tracking (see attached image). 
Before digging deeper, I would like to hear from you whether I am doing something wrong here:</div><div><br></div><div>- I tested with the latest version of the repository: <a href="http://llvm.org/svn/llvm-project/openmp/trunk">http://llvm.org/svn/llvm-project/openmp/trunk</a></div><div><div><div class="gmail_signature">- I am using Ubuntu 14.10.</div><div class="gmail_signature">- I have tested on more than one machine; the results above are from an Intel i7-3770.</div><div class="gmail_signature">- The runtime itself is compiled using: make compiler=gcc os_omp=linux arch=32e</div><div class="gmail_signature">- The version of GCC that I am using is: 4.9.1</div><div class="gmail_signature">- The version of Clang that I am using to compile the benchmarks is: 3.5.0</div><div class="gmail_signature"><br><br>César.</div></div>
</div></div>
</div></blockquote><blockquote type="cite"><div><pic1.png></div></blockquote><blockquote type="cite"><div><pic2.png></div></blockquote><blockquote type="cite"><div><pic3.png></div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>Openmp-dev mailing list</span><br><span><a href="mailto:Openmp-dev@lists.llvm.org">Openmp-dev@lists.llvm.org</a></span><br><span><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a></span><br></div></blockquote></body></html>