[Openmp-dev] initial clang-omp/openmp benchmarking

Wed May 28 07:24:33 PDT 2014

Sorry, I read your description of the problem
“While the results for iomp5 are far better on darwin than those for gomp on darwin, we still are lagging behind the performance of gomp using futex on linux.”
as being that libiomp5.so was underperforming on Linux because we’re not using futex there, so I was explaining how we could do that.

I now grok that what you’re saying is that you’d like to see performance on Darwin (without futexes) that is faster than on Linux (with futexes).
So I suggest trying the TAS lock (KMP_LOCK_KIND=tas) on Darwin.

Depending on what you think OpenMP is used for, though, locks may be irrelevant. If you look at the latest SPECOMP codes, none of them use locks (down from the previous version, which had a couple).

In HPC, locks should be rare and heavily contended locks should be absent entirely. (If heavily contended locks sit in a significant part of the code, it won’t perform well anyway, so it doesn’t qualify for the “HPC” name ☺.)

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

From: Jack Howarth [mailto:howarth.mailing.lists at gmail.com]
Sent: Wednesday, May 28, 2014 3:15 PM
To: Cownie, James H
Cc: openmp-dev at dcs-maillist2.engr.illinois.edu
Subject: Re: [Openmp-dev] initial clang-omp/openmp benchmarking

James,
      The files I used (with the shell script adjusted for the compiler, of course) are attached to the gcc
bugzilla at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333. As for using futex, I am focused on
darwin, so that isn't really an option.
               Jack
ps Attached is an archive with a set of open source OpenMP demos that I found. There might be a
better example for benchmarking locks in there than heated_plate_openmp.c and heated_plate_gcc.sh.

On Wed, May 28, 2014 at 9:27 AM, Cownie, James H <james.h.cownie at intel.com> wrote:
Without looking at the benchmark code (do you have a link to it?), given the description it sounds as if it is omp_lock_t-intensive.
If so you can explicitly use FUTEX locks with libiomp5.so on Linux by setting the environment variable KMP_LOCK_KIND=futex.
You may also want to play with KMP_LOCK_KIND=tas, which uses a “test and test-and-set” lock.

The choice of a default lock implementation is not trivial, since some lock benchmarks (such as that in EPCC) are for heavily contended locks, whereas many codes have lightly contended locks and benefit from simpler (and unfair) lock implementations.

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

From: openmp-dev-bounces at cs.uiuc.edu [mailto:openmp-dev-bounces at cs.uiuc.edu] On Behalf Of Jack Howarth
Sent: Wednesday, May 28, 2014 2:22 PM
To: openmp-dev at dcs-maillist2.engr.illinois.edu
Subject: [Openmp-dev] initial clang-omp/openmp benchmarking

    I've done some initial benchmarking of OpenMP performance using the
clang compiler from our fink llvm34-3.4.1-0e packaging, which has the
current openmp trunk svn built against llvm/compiler-rt/clang 3.4.1,
with a backport of the current clang-omp from github applied. The
heated_plate_openmp.c demo code, compiled and run with the
heated_plate_gcc.sh shell script, produced some interesting results. The
demo code is run with one, two and four OpenMP threads. Dividing the
one-thread timing by each of these timings gives the following speedups
on a 16-core Mac Pro running darwin13…

1:1.90:3.31 for FSF gcc 4.8.3

1:1.90:3.30 for FSF gcc 4.9.0

1:1.99:3.71 for clang 3.4.1 with openmp and merged clang-omp

This compares with the following results on a 24-core Fedora 15 linux box:

1:1.99:3.92 for FSF gcc 4.6.3

1:1.99:3.93 for FSF gcc 4.8 branch svn

I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 on the
reduced performance of gomp on darwin compared with iomp5 on darwin and
gomp on linux. The response was that gomp's use of pthread_mutex calls on
darwin, rather than futex, was the cause, and that we should be using linux.
    While the results for iomp5 on darwin are far better than those for
gomp on darwin, we are still lagging behind the performance of gomp using
futex on linux. FYI, the heated_plate_openmp.c and heated_plate_gcc.sh
are attached to PR 61333.
            Jack

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
