<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 05/16/2017 02:54 AM, C Bergström
wrote:<br>
</div>
<blockquote
cite="mid:CAOnawYoMxLV+gu9d5J4oAi_MubS__DzboRksiu5uZHn2x62MAw@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 16, 2017 at 2:50 PM, Hal
Finkel via cfe-dev <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Hi, Erik,</p>
<p>That's great!<br>
</p>
<p>Gor, Marshall, and I discussed this after some past
committee meeting. We wanted to architect the
implementation so that we could provide different
underlying concurrency mechanisms; including:</p>
<p> a. A self-contained thread-pool-based
implementation using a work-stealing scheme.</p>
<p> b. An implementation that wraps Grand Central
Dispatch (for Mac and any other platforms providing
libdispatch).</p>
<p> c. An implementation that uses OpenMP.</p>
</div>
</blockquote>
<div><br>
</div>
<div>Sorry to butt in, but I'm kinda curious how these will
be substantially different under the hood<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
No need to be sorry; this is a good question. I think that there are
a few high-level goals here:<br>
<br>
1. Provide a solution that works for everybody<br>
<br>
2. Take advantage of compiler technology as appropriate<br>
<br>
3. Provide useful interoperability. In practice: don't
oversubscribe the system.<br>
<br>
The motivation for providing an implementation based on a libc++
thread pool is to satisfy (1). Your suggestion of using our OpenMP
runtime's low-level API directly is a good one. Personally, I really
like this idea. It does imply, however, that organizations that
distribute libc++ will also end up distributing libomp. If libomp
has matured (in the open-source sense) to the point where this is a
suitable solution, then we should do this. As I recall, however, at
least several organizations still ship Clang/LLVM/libc++-based
toolchains without libomp, and I don't know how comfortable people
generally will be with this dependency.<br>
<br>
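For concreteness, option (a) could start from something like the following minimal sketch: a fixed-size pool built only on the C++ standard library, so it adds no dependency beyond what libc++ already provides. It is a hypothetical illustration (the class name simple_thread_pool is made up), and it deliberately uses a single shared queue rather than the work-stealing scheme mentioned earlier; a real implementation would want per-worker deques and stealing to scale.<br>
<br>

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: one shared FIFO queue, N worker threads.
// (Not work-stealing; a production version would use per-worker deques.)
class simple_thread_pool {
public:
    explicit simple_thread_pool(std::size_t n_threads) {
        if (n_threads == 0) n_threads = 1;
        for (std::size_t i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets workers drain any queued tasks, then joins them.
    ~simple_thread_pool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    simple_thread_pool(const simple_thread_pool&) = delete;
    simple_thread_pool& operator=(const simple_thread_pool&) = delete;

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_ && "submit after shutdown");
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;  // nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task outside the lock
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

Because the destructor drains the queue before joining, letting the pool go out of scope is enough to wait for all submitted work; this sketch has no separate wait() call.<br>
<br>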
That having been said, to point (2), using the OpenMP compiler
directives is superior to calling the low-level API directly. OpenMP
directives do translate into API calls, as you point out, but they
also provide optimization hints to the compiler (e.g. about the lack
of loop-carried dependencies). Over the next couple of years, I
expect to see a lot more compiler optimization capability around
OpenMP (and perhaps other parallelism) directives (parallel-region
fusion, etc.). OpenMP also provides a standard way to access many of
the relevant vectorization hints, and taking advantage of them is
useful when compiling with Clang as well as with other compilers.<br>
<br>
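As a small illustration of the point about hints (a sketch, not code from any existing implementation): an element-wise parallel loop in the library could be annotated as below. The parallel for simd directive both requests threading, which the compiler lowers into runtime calls, and asserts that the iterations are independent, which licenses vectorization. The function name saxpy_transform is hypothetical. The pragma is guarded by _OPENMP, so a build without -fopenmp simply runs the loop serially with the same result.<br>
<br>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lowering of a parallel "y = a*x + y" element-wise loop.
// The directive tells the compiler two things: split the iterations across
// threads, and vectorize them, since no iteration depends on another
// (there is no loop-carried dependency).
void saxpy_transform(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    const float* xp = x.data();
    float* yp = y.data();
#ifdef _OPENMP  // guarded so non-OpenMP builds don't warn about the pragma
#pragma omp parallel for simd
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];
}
```

<br>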
Regarding why you'd use GCD on Mac, and similarly why many users
need OpenMP underneath: to the extent possible, it is important to
use the same underlying thread pool as the rest of the application.
This avoids over-subscription and other issues associated with
conflicting threading runtimes. If parts of the application are
already using GCD, then we probably want to use it too (or at least
not compete with it). Otherwise, OpenMP's runtime is probably better ;)<br>
<br>
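One way to structure the choice among runtimes is a single dispatch function that every parallel loop in the library funnels through, so the backend, and therefore the thread pool, is selected in exactly one place. The sketch below is hypothetical: the name parallel_for_backend is made up, and a GCD backend (e.g. built on dispatch_apply) is omitted to keep it portable. It uses OpenMP when the translation unit is compiled with OpenMP enabled, and otherwise falls back to splitting the range across std::thread objects sized by std::thread::hardware_concurrency().<br>
<br>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical single entry point for all parallel loops in the library.
template <class Body>
void parallel_for_backend(std::size_t n, Body body) {
#ifdef _OPENMP
    // Reuse the OpenMP runtime's worker threads, which any OpenMP code
    // elsewhere in the application also uses, instead of creating new ones.
    const long long count = static_cast<long long>(n);
#pragma omp parallel for
    for (long long i = 0; i < count; ++i)
        body(static_cast<std::size_t>(i));
#else
    // Fallback: one contiguous chunk per hardware thread. (A real library
    // would keep a persistent pool rather than spawning threads per call.)
    std::size_t n_threads =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    n_threads = std::min(n_threads, std::max<std::size_t>(n, 1));
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // no work left for further threads
        threads.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& th : threads) th.join();
#endif
}
```

The OpenMP branch is what addresses point (3): if the application already uses OpenMP, the library's loops run on the same worker threads rather than competing with them.<br>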
<br>
<blockquote
cite="mid:CAOnawYoMxLV+gu9d5J4oAi_MubS__DzboRksiu5uZHn2x62MAw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>"OpenMP" is a pretty vague term and I'm curious what
that means in terms of actual directives used. All
non-accelerator OpenMP implementations lower down to
threading currently. (Even if you use tasks it still ends
up being a thread)<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
I had in mind basic host-level OpenMP directives (i.e. OpenMP 3
style plus simd directives for vectorization, although using
taskloop is a good thing to consider as well). I don't think we can
transparently use OpenMP accelerator directives in their current
state because we can't identify the memory dependencies. Once OpenMP
grows some way to deal with accelerators in a global address space
(e.g. the new NVIDIA UVM technology), we should be able to use
that too. In the shorter term, however, CUDA+UVM will also be an
option here. Given that Clang can function as a CUDA compiler, this
is definitely worth exploring.<br>
<br>
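For reference, a loop using the taskloop construct (introduced in OpenMP 4.5) might look like the sketch below. One thread in the parallel region generates tasks that each cover a chunk of the iteration space, and the runtime's worker threads execute them, which gives some load balancing when iterations are uneven. The function name square_all_taskloop and the grainsize(64) value are illustrative assumptions. Without OpenMP enabled, the guarded directives are skipped and the loop runs serially.<br>
<br>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: square every element using OpenMP tasks.
// "parallel" creates the team, "single" lets one thread generate the
// tasks, and "taskloop" splits the iterations into tasks of about 64.
void square_all_taskloop(std::vector<long>& v) {
    const long long n = static_cast<long long>(v.size());
    long* p = v.data();
#ifdef _OPENMP
#pragma omp parallel
#pragma omp single
#pragma omp taskloop grainsize(64)
#endif
    for (long long i = 0; i < n; ++i)
        p[i] = p[i] * p[i];
}
```

<br>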
Thanks again,<br>
Hal<br>
<br>
<blockquote
cite="mid:CAOnawYoMxLV+gu9d5J4oAi_MubS__DzboRksiu5uZHn2x62MAw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>GCD (libdispatch) is essentially a task-based execution
model, but again on non-OSX platforms lowers to threads.
(I doubt that GCD offers any performance benefit
over native threads or the Intel OMP runtime on OSX.)<br>
<br>
</div>
<div>How would the above offer any benefit over a native
thread pool? Would you be just duplicating code which is
already working?<br>
--------------<br>
</div>
<div>I'm no OMP advocate, but I'd find it significantly more
sane to target the Intel OMP runtime API directly.<br>
</div>
<div>* Production ready<br>
</div>
<div>* Portable across CPU (Intel, ARM, Power8)<br>
</div>
<div>* Likely provides the interface needed for parallelism<br>
</div>
<div>* Single approach<br>
</div>
<div>* Already part of the llvm infrastructure without
external dependencies.<br>
</div>
<div><br>
</div>
<div>I don't know how well the API will map to accelerators,
but for something quick and easy it's likely to be the
easiest.<br>
<br>
</div>
<div>Bryce I think even mentioned he had used it before with
positive results?<br>
<br>
</div>
<div>In contrast the other approaches will loosely couple
things to external dependencies and be more difficult to
debug and support long term. It will introduce additional
build dependencies which will likely add barriers to
others contributing.<br>
<br>
</div>
<div>I'm not writing the code and just trying to offer
another pragmatic point of view.<br>
<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>