[PATCH] [OPENMP] Codegen for "omp flush" directive.

Wed Nov 26 03:04:34 PST 2014

On Tue, Nov 25, 2014 at 5:16 AM, Cownie, James H <james.h.cownie at intel.com<mailto:james.h.cownie at intel.com>> wrote:
From a runtime point of view we need to preserve backwards binary compatibility, so we can't change the interface to the current interface function to introduce a count (because that old code won't set it).

Okay.  I wasn't sure if you were guaranteeing backwards binary compatibility with existing compiles or not.  The way that I understand you're currently deploying the runtime — linking in a static library that's distributed with the compiler — does not actually require backwards binary compatibility as far as I can tell.  But if you're making that guarantee, then I agree we can't break it.

You missed that we don’t normally build a static library at all, but rather a dynamic one. Therefore we need to maintain backwards binary compatibility so that we can link code compiled by an old compiler against a library provided by a new one.

Static-linking the OpenMP runtime is a very bad idea, since it can lead to multiple copies of the runtime in the same process (when a shared library is linked statically against one OpenMP runtime and another shared library, or the executable, is linked against another). That, in turn, can lead to over-subscription (each rutime creates its own thread pool) with associated poor performance, or, even incorrect behaviour (#pragma omp critical not enforcing atomicity between the different libraries).

This is one of the main reasons why we have the libgomp entrypoints, since without them you would necessarily have two OpenMP runtimes in the process when you linked gcc compiled OpenMP code and clang compiled OpenMP code.  That would lead to all the issues outlined above.

The only sane approach is to attempt to ensure that there’s only a single copy of the OpenMP runtime in any process. As soon as you statically link an OpenMP runtime into some component (executable, shared-library) that becomes impossible to guarantee in general.

It sounds like the flush operation is really designed for the benefit of OpenMP implementations that aren't just using CPU threads.
You clearly need the flush operation even when compiling for CPUs in a cache coherent node to ensure that the compiler has moved data associated with the flushed variables from registers into store, and that an appropriate memory fence is executed. The real issue is whether, once the compiler has issued the necessary stores, the runtime can do anything smarter than use a full memory-fence if it knows which variables were flushed. Clearly that has never proved to be the case yet!

We actually implemented “Cluster OpenMP” (you can probably Google it if you care), which implemented OpenMP on a “shared nothing” cluster. It exploited OpenMP’s relaxed memory consistency models and a Treadmarks derived shared virtual memory implementation, but clearly even that didn’t need to pass any extra information into the flush function in the runtime, since otherwise this interface would already have been fixed!

Anyway, we’ll have a patch to change the interface specification in the next few weeks, I hope.

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

From: John McCall [mailto:rjmccall at gmail.com]
Sent: Wednesday, November 26, 2014 5:31 AM
To: Cownie, James H
Cc: Hal Finkel; reviews+D6292+public+0f122151bdc64d55 at reviews.llvm.org; cfe-commits at cs.uiuc.edu; a bataev; fraggamuffin at gmail.com; ejstotzer at gmail.com
Subject: Re: [PATCH] [OPENMP] Codegen for "omp flush" directive.

On Tue, Nov 25, 2014 at 5:16 AM, Cownie, James H <james.h.cownie at intel.com<mailto:james.h.cownie at intel.com>> wrote:
From a runtime point of view we need to preserve backwards binary compatibility, so we can't change the interface to the current interface function to introduce a count (because that old code won't set it).

Okay.  I wasn't sure if you were guaranteeing backwards binary compatibility with existing compiles or not.  The way that I understand you're currently deploying the runtime — linking in a static library that's distributed with the compiler — does not actually require backwards binary compatibility as far as I can tell.  But if you're making that guarantee, then I agree we can't break it.

I would therefore prefer to do this

1) Change the prototype for the existing function so that it becomes a function with no arguments. (Effectively your #1). Since no existing code passes arguments, this is fine.

Changing from a variadic to a non-variadic function is technically allowed to break the ABI, e.g. by passing the non-optional arguments in a different way.  I don't know of any platforms where it does actually change for the standard C calling convention, though — I know of platforms (like MIPS and Apple's ARM64) that change how variadic arguments are passed, but not of any that affect earlier arguments.  And I *think* there's no issue with new compilations calling old runtimes on platforms like x86-64 where variadic functions expect extra information, because I don't know of any compilers that actually crash on an out-of-range value in %al.

So this is probably fine, but if you want to be 100% paranoid, the safest thing to do is to leave it variadic but document that the variadic arguments are useless.

2) If/when we decide that it is useful to pass extra information, we design that as a new function. I can't see a need for that in a cache-coherent system, and in a non-coherent system the interface needs to be more complicated than just a set of pointers (because you also need to know how big the object at the target of the pointer is). So this gets complicated, but, since we're not going to implement it until we need it and understand the problem better, that’s fine.

I'm avoiding your #2 because I don't believe it is sufficient.

I'm fine with that decision; it's your call.  I hadn't looked to see what these pointers were supposed to be; I guess I was assuming that they were variable descriptors or something, not literally just pointers to objects of unknown size.

It sounds like the flush operation is really designed for the benefit of OpenMP implementations that aren't just using CPU threads.  Given that your runtime's API is really only designed to be good enough for CPU threads, there's no reason to future proof it, because current CPUs don't let us do efficient partial memory barriers anyway.  If somebody comes out with a CPU that does, we can embellish the API then.

John.
---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141126/b4783795/attachment.html>