<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 05/16/2017 02:54 AM, C Bergström

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAOnawYoMxLV+gu9d5J4oAi_MubS__DzboRksiu5uZHn2x62MAw@mail.gmail.com"

      type="cite">


      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Tue, May 16, 2017 at 2:50 PM, Hal

            Finkel via cfe-dev <span dir="ltr">&lt;<a

                moz-do-not-send="true"

                href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <p>Hi, Erik,</p>

                <p>That's great!<br>

                </p>

                <p>Gor, Marshall, and I discussed this after some past

                  committee meeting. We wanted to architect the

                  implementation so that we could provide different

                  underlying concurrency mechanisms; including:</p>

                <p>   a. A self-contained thread-pool-based

                  implementation using a work-stealing scheme.</p>

                <p>   b. An implementation that wraps Grand Central

                  Dispatch (for Mac and any other platforms providing

                  libdispatch).</p>

                <p>   c. An implementation that uses OpenMP.</p>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>Sorry to butt in, but I'm kinda curious how these will

              be substantially different under the hood<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    No need to be sorry; this is a good question. I think that there are

    a few high-level goals here:<br>

    <br>

     1. Provide a solution that works for everybody<br>

    <br>

     2. Take advantage of compiler technology as appropriate<br>

    <br>

     3. Provide useful interoperability. In practice: don't

    oversubscribe the system.<br>

    <br>

    The motivation for providing an implementation based on a libc++

    thread pool is to satisfy (1). Your suggestion of using our OpenMP

    runtime's low-level API directly is a good one. Personally, I really

    like this idea. It does imply, however, that organizations that

    distribute libc++ will also end up distributing libomp. If libomp

    has matured (in the open-source sense) to the point where this is a

    suitable solution, then we should do this. As I recall, however, we

    still have at least several organizations that ship

    Clang/LLVM/libc++-based toolchains that don't ship libomp, and I

    don't know how generally comfortable people will be with this

    dependency.<br>

    <br>

    That having been said, to point (2), using the OpenMP compiler

    directives is superior to calling the low-level API directly. OpenMP

    directives do translate into API calls, as you point out, but they

    also provide optimization hints to the compiler (e.g. about lack of

    loop-carried dependencies). Over the next couple of years, I expect

    to see the compiler gain many more optimization capabilities around

    OpenMP (and perhaps other parallelism) directives (parallel-region

    fusion, etc.). OpenMP also provides a standard way to access many of

    the relevant vectorization hints, and taking advantage of this is

    useful for compiling with Clang and also other compilers.<br>
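    <br>
    As a minimal sketch of what I mean by using the directives (the
    function name axpy and its signature are purely illustrative and
    are not part of libc++ or any proposed interface), a single combined
    directive tells the compiler that the iterations are independent, so
    it can both distribute them across threads and vectorize them:<br>

```cpp
// Hypothetical helper for illustration only: y[i] += a * x[i] for 0 <= i < n.
// The caller must ensure x and y do not overlap and both hold n elements.
void axpy(float a, const float* x, float* y, long n) {
    // The directive asserts there are no loop-carried dependencies.
    // "parallel for" splits the iterations across the OpenMP thread team;
    // "simd" additionally permits vectorizing each thread's chunk.
    // When OpenMP is not enabled, the pragma is ignored and the loop
    // simply runs serially with the same result.
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

    Building with -fopenmp enables the directive. Calling the runtime's
    low-level entry points by hand could create the same threads, but
    the compiler would no longer see the independence guarantee that the
    directive carries.<br>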

    <br>

    Regarding why you'd use GCD on Mac, and similarly why many users

    need OpenMP underneath: to the extent possible, we want to use

    the same underlying thread pool as

    other things in the application. This is to avoid over-subscription

    and other issues associated with conflicting threading runtimes. If

    parts of the application are already using GCD, then we probably

    want to use GCD too (or at least not compete with it). Otherwise,

    OpenMP's runtime is probably better ;)<br>

    <br>

    <br>

    <blockquote

cite="mid:CAOnawYoMxLV+gu9d5J4oAi_MubS__DzboRksiu5uZHn2x62MAw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>"OpenMP" is a pretty vague term and I'm curious what

              that means in terms of actual directives used. All

              non-accelerator OpenMP implementations lower down to

              threading currently. (Even if you use tasks it still ends

              up being a thread)<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    I had in mind basic host-level OpenMP directives (i.e. OpenMP 3

    style plus simd directives for vectorization, although using

    taskloop is a good thing to consider as well). I don't think we can

    transparently use OpenMP accelerator directives in their current

    state because we can't identify the memory dependencies. When OpenMP

    grows some way to deal with accelerators in a global address space

    (e.g. the new NVIDIA UVM technology), then we should be able to use

    that too. CUDA+UVM will be an option in the shorter term here as

    well, however. Given that Clang can function as a CUDA compiler,

    this is definitely worth exploring.<br>

    <br>

    Thanks again,<br>

    Hal<br>

    <br>

    <blockquote

cite="mid:CAOnawYoMxLV+gu9d5J4oAi_MubS__DzboRksiu5uZHn2x62MAw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>GCD (libdispatch) is essentially a task based execution

              model, but again on non-OSX platforms lowers to threads.

              (I doubt that GCD offers any performance benefit

              over native threads or Intel OMP runtime on OSX.)<br>

              <br>

            </div>

            <div>How would the above offer any benefit over a native

              thread pool? Would you be just duplicating code which is

              already working?<br>

              --------------<br>

            </div>

            <div>I'm no OMP advocate, but I'd find it significantly more

              sane to target the Intel OMP runtime API directly.<br>

            </div>

            <div>* Production ready<br>

            </div>

            <div>* Portable across CPU (Intel, ARM, Power8)<br>

            </div>

            <div>* Likely provides the interface needed for parallelism<br>

            </div>

            <div>* Single approach<br>

            </div>

            <div>* Already part of the llvm infrastructure without

              external dependencies.<br>

            </div>

            <div><br>

            </div>

            <div>I don't know how well the API will map to accelerators,

              but for something quick and easy it's likely to be the

              easiest.<br>

              <br>

            </div>

            <div>Bryce I think even mentioned he had used it before with

              positive results?<br>

              <br>

            </div>

            <div>In contrast the other approaches will loosely couple

              things to external dependencies and be more difficult to

              debug and support long term. It will introduce additional

              build dependencies which will likely add barriers to

              others contributing.<br>

              <br>

            </div>

            <div>I'm not writing the code and just trying to offer

              another pragmatic point of view..<br>

              <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>