[Openmp-dev] initial clang-omp/openmp benchmarking

Thu May 29 10:30:30 PDT 2014

Darwin has a very weak notion of "affinity hints":

https://developer.apple.com/library/mac/releasenotes/Performance/RN-AffinityAPI/

But it's so dumbed down (only a concept of distinct affinity "tags"
based solely on L2 cache sharing) that it's pretty useless. I did some
microbenchmarks with it to simulate an OpenMP workload with pinning,
and as far as I'm able to tell, the Darwin kernel just ignores those
hints and does whatever it pleases.
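
For anyone who wants to try this themselves, setting a tag looks roughly like the sketch below. It is a minimal illustration, not the microbenchmark I ran. The helper name hint_affinity_tag is made up for this example; the Mach calls are the ones described in the release note linked above. On non-Darwin systems the helper just reports failure.

```c
#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/thread_policy.h>
#include <pthread.h>
#endif

/* Hypothetical helper (name invented for this sketch): ask the kernel to
 * associate the calling thread with the given affinity tag. Threads that
 * share a tag are a hint to schedule them so that they share an L2 cache.
 * Returns 0 if the kernel accepted the hint, -1 otherwise, including on
 * non-Darwin systems where this API does not exist. Accepting the hint
 * does not mean the scheduler will honor it. */
int hint_affinity_tag(int tag)
{
#ifdef __APPLE__
    thread_affinity_policy_data_t policy = { tag };
    thread_port_t self = pthread_mach_thread_np(pthread_self());
    kern_return_t kr = thread_policy_set(self, THREAD_AFFINITY_POLICY,
                                         (thread_policy_t)&policy,
                                         THREAD_AFFINITY_POLICY_COUNT);
    return kr == KERN_SUCCESS ? 0 : -1;
#else
    (void)tag; /* no equivalent here; Linux would use sched_setaffinity */
    return -1;
#endif
}
```

Each worker thread would call this once at startup with its own tag (or a shared tag for threads that should sit together). Note there is no way to name a specific core, only to group threads.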

On Thu, May 29, 2014 at 5:39 AM, Cownie, James H
<james.h.cownie at intel.com> wrote:
> I think the complaint is this: on Darwin, the scaling to 4 "processes" is
> worse than on Linux.
>
> Four threads is small. The OpenMP runtime is tested scaling in the 200+
> thread range for Xeon Phi, and on big-iron servers. We measure the scaling
> of a variety of more interesting things there (such as SpecOMP).
>
>
>
> Futexes are fast, but then so are our spin-locks. The difference is what
> happens when the lock is contended (whether you enter the kernel or not, and
> therefore allow the kernel to schedule something else onto the same HW
> thread). That should make little difference in this case, since the machine
> is not over-subscribed.
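
To make the userspace side of that comparison concrete, here is a minimal sketch of a pure spin-lock protecting a shared counter. It is an illustration only and is not how iomp implements its locks; the names worker and run_contended are invented for this example. A contended thread here never enters the kernel, it just keeps retrying, which is harmless as long as every thread has its own HW thread.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter;

/* Each worker takes the spin-lock, bumps the counter, and releases it. */
static void *worker(void *arg)
{
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        /* Contended path: busy-wait in userspace, no system call. */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;
        counter++;
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

/* Hypothetical driver: run nthreads workers and return the final count,
 * or -1 if the arguments are out of range or a thread cannot be created. */
long run_contended(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 0)
        return -1;

    counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, &iters) != 0)
            break;

    /* Join whatever was started, even on a partial failure. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == nthreads ? counter : -1;
}
```

A futex-based lock would differ only in the contended path: instead of spinning, the waiter would ask the kernel to put it to sleep until the holder releases the lock.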
>
>
>
> If Darwin provides a fast futex interface, then iomp should use it.
>
> Darwin does not provide it, so we can’t use it :-).
>
>
>
> I’d guess that the issue here is more likely related to affinity choices
> made by the operating system (whether it chooses to place threads as
> hyper-threads on the same core, as threads in the same socket, or across
> sockets) than details of the locking. I believe that Darwin also has no
> specific support that would let us control that either…
>
>
>
> -- Jim
>
> James Cownie <james.h.cownie at intel.com>
> SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
>
> Tel: +44 117 9071438
>
>
>
> From: Chandler Carruth [mailto:chandlerc at google.com]
> Sent: Thursday, May 29, 2014 1:07 PM
> To: Cownie, James H
> Cc: Jack Howarth; openmp-dev at dcs-maillist2.engr.illinois.edu
>
>
> Subject: Re: [Openmp-dev] initial clang-omp/openmp benchmarking
>
>
>
>
>
> On Thu, May 29, 2014 at 4:45 AM, Cownie, James H <james.h.cownie at intel.com>
> wrote:
>
> I don’t really understand what problem you are complaining about.
>
> Your numbers show clang-omp as the fastest implementation in all directly
> comparable cases. That doesn’t seem like something we want to change!
>
>
> I think the complaint is this: on Darwin, the scaling to 4 "processes" is
> worse than on Linux.
>
>
>
> However, the reason is stated already: Linux provides a *very* fast futex
> implementation. Darwin either doesn't provide it or iomp doesn't use it.
>
>
>
> If Darwin provides a fast futex interface, then iomp should use it. That's a
> useful request. I don't know enough about Darwin to help investigate whether
> the OS has a futex interface exposed to userland.
>
>
>
> If Darwin doesn't provide a futex interface, there is literally nothing we
> can do about that. You aren't going to match the scalability of a
> kernel-supported futex with something in userspace.
>
>
>
> Anyways, I do agree that micro-optimizing mutex performance for something
> like openmp seems somewhat less important....
>