[Openmp-dev] initial clang-omp/openmp benchmarking

Steven Noonan steven at uplinklabs.net
Thu May 29 11:36:01 PDT 2014


No, I haven't. I'm pretty sure Apple's stance is that they don't
*want* people to affinitize processes because they believe people
would abuse it. Comments like this seem to indicate that to me,
anyway:

https://github.com/opensource-apple/xnu/blob/10.9/osfmk/kern/sched_prim.c#L1720

They used to make it possible to affinitize to CPUs via a framework
that came with Xcode called CHUD, but they never made an Intel 64-bit
version of it, and now it's gone altogether.
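
For reference, here is a minimal sketch of how the affinity "tag" hints
discussed in the quoted message below are set through the Mach thread policy
interface. The helper names (affinity_tag_for_worker, set_affinity_tag) and the
"workers per tag" grouping are hypothetical and only for illustration. The
non-Darwin branch is a stub so the file still compiles elsewhere. A successful
return only means the hint was accepted, not that the scheduler will honor it.

```c
#include <assert.h>

#ifdef __APPLE__
#include <pthread.h>
#include <mach/mach.h>
#include <mach/thread_policy.h>
#endif

/*
 * Hypothetical helper: map a worker index to a non-zero affinity tag so that
 * every `workers_per_tag` consecutive workers share one tag. Tag 0 is
 * THREAD_AFFINITY_TAG_NULL ("no affinity"), so tags start at 1.
 */
int affinity_tag_for_worker(int worker, int workers_per_tag)
{
    assert(worker >= 0 && workers_per_tag > 0);
    return worker / workers_per_tag + 1;
}

/*
 * Hypothetical helper: attach an affinity tag to the calling thread.
 * Returns 0 if the kernel accepted the hint, -1 if it was rejected or if
 * this is not a Darwin build.
 */
int set_affinity_tag(int tag)
{
#ifdef __APPLE__
    thread_affinity_policy_data_t policy = { tag };
    thread_port_t self = pthread_mach_thread_np(pthread_self());
    kern_return_t kr = thread_policy_set(self,
                                         THREAD_AFFINITY_POLICY,
                                         (thread_policy_t)&policy,
                                         THREAD_AFFINITY_POLICY_COUNT);
    return kr == KERN_SUCCESS ? 0 : -1;
#else
    (void)tag;  /* no equivalent here; this sketch covers Darwin only */
    return -1;
#endif
}
```

Each worker thread would call set_affinity_tag(affinity_tag_for_worker(i, 2))
early in its start routine, for example to ask that workers 0 and 1 share one
L2 cache and workers 2 and 3 another. Note that a tag names a group of threads,
not a CPU, so this is not the same thing as pinning.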

On Thu, May 29, 2014 at 11:13 AM, Jack Howarth
<howarth.mailing.lists at gmail.com> wrote:
> Steven,
>      Have you filed a radar bug report with Apple on this? There is always
> the remote possibility that this issue could be addressed in a future OS
> release.
>             Jack
>
>
> On Thu, May 29, 2014 at 1:30 PM, Steven Noonan <steven at uplinklabs.net>
> wrote:
>>
>> Darwin has a very weak notion of "affinity hints":
>>
>>
>> https://developer.apple.com/library/mac/releasenotes/Performance/RN-AffinityAPI/
>>
>> But it's so dumbed down (only a concept of distinct affinity "tags"
>> based solely on L2 cache sharing) that it's pretty useless. I did some
>> microbenchmarks with it to simulate an OpenMP workload with pinning,
>> and as far as I'm able to tell, the Darwin kernel just ignores those
>> hints and does whatever it pleases.
>>
>> On Thu, May 29, 2014 at 5:39 AM, Cownie, James H
>> <james.h.cownie at intel.com> wrote:
>> > I think the complaint is this: on Darwin, the scaling to 4 "processes" is
>> > worse than on Linux.
>> >
>> > Four threads is small. The OpenMP runtime is tested scaling in the 200+
>> > thread range for Xeon Phi, and on big-iron servers. We measure the scaling
>> > of a variety of more interesting things there (such as SpecOMP).
>> >
>> >
>> >
>> > Futexes are fast, but then so are our spin-locks. The difference is what
>> > happens when the lock is contended (whether you enter the kernel or not, and
>> > therefore allow the kernel to schedule something else onto the same HW
>> > thread). That should make little difference in this case, since the machine
>> > is not over-subscribed.
>> >
>> >
>> >
>> > If Darwin provides a fast futex interface, then iomp should use it.
>> >
>> > Darwin does not provide it, so we can’t use it :-).
>> >
>> >
>> >
>> > I’d guess that the issue here is more likely related to affinity choices
>> > made by the operating system (whether it chooses to place threads as
>> > hyper-threads on the same core, as threads in the same socket, or across
>> > sockets) than details of the locking. I believe that Darwin also has no
>> > specific support that would let us control that either…
>> >
>> >
>> >
>> > -- Jim
>> >
>> > James Cownie <james.h.cownie at intel.com>
>> > SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
>> >
>> > Tel: +44 117 9071438
>> >
>> >
>> >
>> > From: Chandler Carruth [mailto:chandlerc at google.com]
>> > Sent: Thursday, May 29, 2014 1:07 PM
>> > To: Cownie, James H
>> > Cc: Jack Howarth; openmp-dev at dcs-maillist2.engr.illinois.edu
>> >
>> >
>> > Subject: Re: [Openmp-dev] initial clang-omp/openmp benchmarking
>> >
>> >
>> >
>> >
>> >
>> > On Thu, May 29, 2014 at 4:45 AM, Cownie, James H <james.h.cownie at intel.com>
>> > wrote:
>> >
>> > I don’t really understand what problem you are complaining about.
>> >
>> > Your numbers show clang-omp as the fastest implementation in all directly
>> > comparable cases. That doesn’t seem like something we want to change!
>> >
>> >
>> > I think the complaint is this: on Darwin, the scaling to 4 "processes" is
>> > worse than on Linux.
>> >
>> >
>> >
>> > However, the reason is stated already: Linux provides a *very* fast futex
>> > implementation. Darwin either doesn't provide it or iomp doesn't use it.
>> >
>> >
>> >
>> > If Darwin provides a fast futex interface, then iomp should use it. That's a
>> > useful request. I don't know enough about Darwin to help investigate whether
>> > the OS has a futex interface exposed to userland.
>> >
>> >
>> >
>> > If Darwin doesn't provide a futex interface, there is literally nothing we
>> > can do about that. You aren't going to match the scalability of a
>> > kernel-supported futex with something in userspace.
>> >
>> >
>> >
>> > Anyways, I do agree that micro-optimizing mutex performance for something
>> > like openmp seems somewhat less important....
>> >
>> > _______________________________________________
>> > Openmp-dev mailing list
>> > Openmp-dev at dcs-maillist2.engr.illinois.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/openmp-dev
>> >
>>
>
>
