[Openmp-dev] initial clang-omp/openmp benchmarking

Cownie, James H james.h.cownie at intel.com
Thu May 29 05:39:45 PDT 2014


> I think the complaint is this: on Darwin, the scaling to 4 "processes" is worse than on Linux.
Four threads is a small count. We test the OpenMP runtime's scaling in the 200+ thread range on Xeon Phi and on big-iron servers, and we measure the scaling of a variety of more interesting things there (such as SpecOMP).

Futexes are fast, but then so are our spin-locks. The difference is what happens when the lock is contended (whether you enter the kernel or not, and therefore allow the kernel to schedule something else onto the same HW thread). That should make little difference in this case, since the machine is not over-subscribed.
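To make the contended-lock distinction concrete, here is a minimal sketch, not the iomp implementation. It assumes a hypothetical user-space SpinLock (the class name and the count_with_lock helper are invented for illustration) and compares it with std::mutex, which on Linux typically sleeps in the kernel via a futex when contended. Both protect the same counter; only the behaviour of a waiting thread differs.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin lock (illustration only): a waiter keeps retrying and
// stays runnable instead of going to sleep in the kernel.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Polite hint so the sketch still makes progress when there
            // are fewer cores than threads (the over-subscribed case).
            std::this_thread::yield();
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: num_threads workers each increment a shared counter
// iters_per_thread times under Lock, then the final count is returned.
template <typename Lock>
long count_with_lock(int num_threads, int iters_per_thread) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                std::lock_guard<Lock> guard(lock);  // works with any BasicLockable
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling count_with_lock<SpinLock>(4, 10000) and count_with_lock<std::mutex>(4, 10000) should both return 40000. Timing the two calls with as many threads as cores, and then with more threads than cores, is one way to see where blocking in the kernel starts to matter.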

> If Darwin provides a fast futex interface, then iomp should use it.
Darwin does not provide it, so we can’t use it ☺.

I’d guess that the issue here is more likely related to affinity choices made by the operating system (whether it chooses to place threads as hyper-threads on the same core, as threads in the same socket, or across sockets) than details of the locking. I believe that Darwin also has no specific support that would let us control that either…
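For contrast, the kind of placement control being discussed looks roughly like this on Linux. This is a sketch under assumptions: it uses the glibc calls pthread_getaffinity_np and pthread_setaffinity_np, the helper names are hypothetical, and on any other platform (Darwin included) it simply reports failure. It is not how iomp sets affinity.

```cpp
#include <cassert>
#if defined(__linux__)
#include <pthread.h>
#include <sched.h>
#endif

// Hypothetical helper: pin the calling thread to the first CPU it is
// already allowed to run on. Returns that CPU number, or -1 if pinning is
// unavailable on this platform or the call failed.
int pin_to_first_allowed_cpu() {
#if defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (pthread_setaffinity_np(pthread_self(), sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
#endif
    return -1;
}

// Hypothetical helper: number of CPUs the calling thread may currently run
// on, or -1 if the mask cannot be queried on this platform.
int allowed_cpu_count() {
#if defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0)
        return -1;
    return CPU_COUNT(&allowed);
#else
    return -1;
#endif
}
```

A runtime with this kind of control can decide for itself whether worker threads share a core, share a socket, or spread across sockets, rather than leaving the choice to the operating system's scheduler.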

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Thursday, May 29, 2014 1:07 PM
To: Cownie, James H
Cc: Jack Howarth; openmp-dev at dcs-maillist2.engr.illinois.edu
Subject: Re: [Openmp-dev] initial clang-omp/openmp benchmarking


On Thu, May 29, 2014 at 4:45 AM, Cownie, James H <james.h.cownie at intel.com> wrote:
I don’t really understand what problem you are complaining about.
Your numbers show clang-omp as the fastest implementation in all directly comparable cases. That doesn’t seem like something we want to change!

I think the complaint is this: on Darwin, the scaling to 4 "processes" is worse than on Linux.

However, the reason is stated already: Linux provides a *very* fast futex implementation. Darwin either doesn't provide it or iomp doesn't use it.

If Darwin provides a fast futex interface, then iomp should use it. That's a useful request. I don't know enough about Darwin to help investigate whether the OS has a futex interface exposed to userland.

If Darwin doesn't provide a futex interface, there is literally nothing we can do about that. You aren't going to match the scalability of a kernel-supported futex with something in userspace.

Anyways, I do agree that micro-optimizing mutex performance for something like openmp seems somewhat less important....