[Openmp-dev] initial clang-omp/openmp benchmarking

Cownie, James H james.h.cownie at intel.com
Wed May 28 06:27:51 PDT 2014


I haven't looked at the benchmark code (do you have a link to it?), but from the description it sounds as if it is omp_lock_t intensive.
If so, you can explicitly use futex locks with libiomp5.so on Linux by setting the environment variable KMP_LOCK_KIND=futex.
You may also want to try KMP_LOCK_KIND=tas, which uses a “test and test-and-set” lock.
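
For anyone unfamiliar with the term, a "test and test-and-set" lock can be sketched roughly as below in C11 atomics. This only illustrates the general technique and is not the libiomp5 implementation; the names ttas_lock_t, ttas_init, ttas_try_acquire, ttas_acquire and ttas_release are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration type; NOT the lock used inside libiomp5. */
typedef struct {
    atomic_bool held;
} ttas_lock_t;

static void ttas_init(ttas_lock_t *l) {
    atomic_init(&l->held, false);
}

/* One acquisition attempt; returns true if the lock was taken. */
static bool ttas_try_acquire(ttas_lock_t *l) {
    /* "Test": a plain read first, so waiters do not write to the
       cache line while the lock is visibly held. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    /* "Test-and-set": atomic exchange returns the previous value,
       so false means this caller is the one that set it. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin until acquired. No queueing, so the lock is unfair. */
static void ttas_acquire(ttas_lock_t *l) {
    while (!ttas_try_acquire(l)) {
        /* busy-wait */
    }
}

static void ttas_release(ttas_lock_t *l) {
    /* Debug check: releasing a lock that is not held is a bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

A lock like this is cheap when uncontended but spins under contention, which is the trade-off behind the choice of default. With libiomp5 you would not write this yourself; setting KMP_LOCK_KIND=tas in the environment selects the runtime's own version.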

The choice of a default lock implementation is not trivial, since some lock benchmarks (such as that in EPCC) are for heavily contended locks, whereas many codes have lightly contended locks and benefit from simpler (and unfair) lock implementations.

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

From: openmp-dev-bounces at cs.uiuc.edu [mailto:openmp-dev-bounces at cs.uiuc.edu] On Behalf Of Jack Howarth
Sent: Wednesday, May 28, 2014 2:22 PM
To: openmp-dev at dcs-maillist2.engr.illinois.edu
Subject: [Openmp-dev] initial clang-omp/openmp benchmarking

    I've done some initial benchmarking of OpenMP performance using the
clang compiler from our fink llvm34-3.4.1-0e packaging, which has the
current openmp trunk svn built against llvm/compiler-rt/clang 3.4.1
with a back port of current clang-omp from github applied. The
heated_plate_openmp.c demo code, compiled and run with the
heated_plate_gcc.sh shell script, gave some interesting results. The
demo code is run with one, two and four OpenMP threads. Expressing these
timings as ratios relative to the one-thread timing shows the following
on a 16-core MacPro on darwin13…

1:1.90:3.31 for FSF gcc 4.8.3

1:1.90:3.30 for FSF gcc 4.9.0

1:1.99:3.71 for clang 3.4.1 with openmp and merged clang-omp

This compares to the results on a 24-core Fedora 15 Linux box:

1:1.99:3.92 for FSF gcc 4.6.3

1:1.99:3.93 for FSF gcc 4.8 branch svn
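
Assuming the ratios above are speedups over the one-thread run (the values grow with thread count, so that seems the natural reading), parallel efficiency is just speedup divided by thread count. A small sketch of that arithmetic follows; `efficiency` is a hypothetical helper, not part of the benchmark.

```c
#include <assert.h>

/* Parallel efficiency = speedup / number of threads.
   1.0 would be perfect linear scaling.
   Hypothetical helper for illustration only. */
static double efficiency(double speedup, int threads) {
    assert(threads > 0);
    return speedup / (double)threads;
}
```

At four threads this gives efficiency(3.31, 4) = 0.8275 for gcc 4.8.3 on darwin, efficiency(3.71, 4) = 0.9275 for clang 3.4.1 with iomp5 on darwin, and efficiency(3.92, 4) = 0.98 for gcc 4.6.3 on linux.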

I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 on the
reduced performance of gomp on darwin compared to iomp5 on darwin and
gomp on Linux. Their response was that darwin's use of pthread_mutex calls
rather than futex was the cause in gomp, and that we should be using Linux.
    While the results for iomp5 are far better on darwin than those for
gomp on darwin, we are still lagging behind the performance of gomp using
futex on Linux. FYI, heated_plate_openmp.c and heated_plate_gcc.sh
are attached to PR 61333.
            Jack