[Openmp-dev] initial clang-omp/openmp benchmarking

Fri May 30 01:41:58 PDT 2014

> I'm pretty sure Apple's stance is that they don't *want* people to affinitize 
> processes because they believe people would abuse it.

They are right, people do get it wrong, *but* it can make a 2x performance difference when used right,
and sharp knives are useful tools while blunt ones (or no knife at all) are not.

I think it's an issue of target market, though. OpenMP is mostly used in HPC; there
people are pushing for the last 5% performance on a machine where their code is all
that is running. In that environment having fine-grained control and telling the OS 
what to do makes sense. But, that's not where Apple is at all...
(There are no Apple machines in the Top500, whereas 482 of the 500 run Linux.)

(And, yes, I grant you, it may make sense for some applications on the Mac Pro, though 
even those seem only to be single-socket machines so affinity is less critical).
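
To make the kind of control under discussion concrete: on Linux a process can
restrict itself to specific CPUs through the sched_setaffinity interface. The
following is only a minimal sketch in Python, not code from this thread or from
the OpenMP runtime. The function name pin_to_one_cpu and the choice of the
lowest-numbered CPU are hypothetical, picked purely for illustration. On
platforms where Python does not expose os.sched_setaffinity (macOS among them)
the function returns None.

```python
import os


def pin_to_one_cpu():
    """Pin the caller to a single CPU and return that CPU's id.

    Returns None where the OS offers no hard-affinity call; Python only
    provides os.sched_setaffinity on some Unix platforms, such as Linux.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))  # CPUs the caller may run on now
    cpu = allowed[0]                           # illustrative choice: lowest id
    os.sched_setaffinity(0, {cpu})             # pid 0 refers to the caller
    return cpu
```

Calling pin_to_one_cpu() and then os.sched_getaffinity(0) on Linux should show a
one-element mask. A real runtime would pin each worker thread to a different
core or hardware thread rather than pinning the whole process to one CPU.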

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

-----Original Message-----
From: Steven Noonan [mailto:steven at uplinklabs.net] 
Sent: Thursday, May 29, 2014 7:36 PM
To: Jack Howarth
Cc: Cownie, James H; openmp-dev at dcs-maillist2.engr.illinois.edu
Subject: Re: [Openmp-dev] initial clang-omp/openmp benchmarking

No, I haven't. I'm pretty sure Apple's stance is that they don't
*want* people to affinitize processes because they believe people
would abuse it. Comments like this seem to indicate that to me,
anyway:

https://github.com/opensource-apple/xnu/blob/10.9/osfmk/kern/sched_prim.c#L1720

They used to make it possible to affinitize to CPUs via a framework that came with Xcode called CHUD, but they never made an Intel 64-bit version of it, and now it's gone altogether.

On Thu, May 29, 2014 at 11:13 AM, Jack Howarth <howarth.mailing.lists at gmail.com> wrote:
> Steven,
>      Have you filed a radar bug report with Apple on this? There
> always is the remote possibility that this issue could be addressed in 
> a future OS release.
>             Jack
>
>
> On Thu, May 29, 2014 at 1:30 PM, Steven Noonan <steven at uplinklabs.net>
> wrote:
>>
>> Darwin has a very weak notion of "affinity hints":
>>
>>
>> https://developer.apple.com/library/mac/releasenotes/Performance/RN-AffinityAPI/
>>
>> But it's so dumbed down (only a concept of distinct affinity "tags"
>> based solely on L2 cache sharing) that it's pretty useless. I did 
>> some microbenchmarks with it to simulate an OpenMP workload with 
>> pinning, and as far as I'm able to tell, the Darwin kernel just 
>> ignores those hints and does whatever it pleases.
>>
>> On Thu, May 29, 2014 at 5:39 AM, Cownie, James H 
>> <james.h.cownie at intel.com> wrote:
>> > I think the complaint is this: on Darwin, the scaling to 4 "processes" is
>> > worse than on Linux.
>> >
>> > Four threads is small. The OpenMP runtime is tested for scaling in the
>> > 200+ thread range on Xeon Phi and on big-iron servers. We measure
>> > the scaling of a variety of more interesting things there (such as
>> > SpecOMP).
>> >
>> >
>> >
>> > Futexes are fast, but then so are our spin-locks. The difference is 
>> > what happens when the lock is contended (whether you enter the 
>> > kernel or not, and therefore allow the kernel to schedule something 
>> > else onto the same HW thread). That should make little difference 
>> > in this case, since the machine is not over-subscribed.
>> >
>> >
>> >
>> > If Darwin provides a fast futex interface, then iomp should use it.
>> >
>> > Darwin does not provide it, so we can’t use it :-).
>> >
>> >
>> >
>> > I’d guess that the issue here is more likely related to affinity
>> > choices made by the operating system (whether it chooses to place
>> > threads as hyper-threads on the same core, as threads in the same
>> > socket, or across sockets) than details of the locking. I believe
>> > that Darwin also has no specific support that would let us control
>> > that either…
>> >
>> >
>> >
>> > -- Jim
>> >
>> > James Cownie <james.h.cownie at intel.com>
>> > SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
>> >
>> > Tel: +44 117 9071438
>> >
>> >
>> >
>> > From: Chandler Carruth [mailto:chandlerc at google.com]
>> > Sent: Thursday, May 29, 2014 1:07 PM
>> > To: Cownie, James H
>> > Cc: Jack Howarth; openmp-dev at dcs-maillist2.engr.illinois.edu
>> > Subject: Re: [Openmp-dev] initial clang-omp/openmp benchmarking
>> >
>> > On Thu, May 29, 2014 at 4:45 AM, Cownie, James H 
>> > <james.h.cownie at intel.com>
>> > wrote:
>> >
>> > I don’t really understand what problem you are complaining about.
>> >
>> > Your numbers show clang-omp as the fastest implementation in all 
>> > directly comparable cases. That doesn’t seem like something we want 
>> > to change!
>> >
>> >
>> > I think the complaint is this: on Darwin, the scaling to 4 "processes" is
>> > worse than on Linux.
>> >
>> >
>> >
>> > However, the reason is stated already: Linux provides a *very* fast 
>> > futex implementation. Darwin either doesn't provide it or iomp 
>> > doesn't use it.
>> >
>> >
>> >
>> > If Darwin provides a fast futex interface, then iomp should use it.
>> > That's a useful request. I don't know enough about Darwin to help
>> > investigate whether the OS has a futex interface exposed to
>> > userland.
>> >
>> >
>> >
>> > If Darwin doesn't provide a futex interface, there is literally 
>> > nothing we can do about that. You aren't going to match the 
>> > scalability of a kernel-supported futex with something in 
>> > userspace.
>> >
>> >
>> >
>> > Anyways, I do agree that micro-optimizing mutex performance for 
>> > something like openmp seems somewhat less important....
>> >
>> > ---------------------------------------------------------------------
>> > Intel Corporation (UK) Limited
>> > Registered No. 1134945 (England)
>> > Registered Office: Pipers Way, Swindon SN3 1RJ
>> > VAT No: 860 2173 47
>> >
>> > This e-mail and any attachments may contain confidential material 
>> > for the sole use of the intended recipient(s). Any review or 
>> > distribution by others is strictly prohibited. If you are not the 
>> > intended recipient, please contact the sender and delete all copies.
>> >
>> >
>> > _______________________________________________
>> > Openmp-dev mailing list
>> > Openmp-dev at dcs-maillist2.engr.illinois.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/openmp-dev
>> >
>>
>
>