<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 05/16/2017 11:57 AM, C Bergström
wrote:<br>
</div>
<blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, May 17, 2017 at 12:20 AM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
target="_blank" href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px
solid rgb(204,204,204);padding-left:1ex"
class="gmail_quote">
<div bgcolor="#FFFFFF"><span class="gmail-">
<div
class="gmail-m_3541403897252453532moz-cite-prefix">On
05/16/2017 02:54 AM, C Bergström wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 16, 2017 at
2:50 PM, Hal Finkel via cfe-dev <span
dir="ltr"><<a moz-do-not-send="true"
target="_blank"
href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"
class="gmail_quote">
<div bgcolor="#FFFFFF">
<p>Hi, Erik,</p>
<p>That's great!<br>
</p>
<p>Gor, Marshall, and I discussed this
after some past committee meeting. We
wanted to architect the implementation
so that we could provide different
underlying concurrency mechanisms,
including:</p>
<p> a. A self-contained
thread-pool-based implementation using a
work-stealing scheme.</p>
<p> b. An implementation that wraps
Grand Central Dispatch (for Mac and any
other platforms providing libdispatch).</p>
<p> c. An implementation that uses
OpenMP.</p>
</div>
</blockquote>
<div><br>
</div>
<div>Sorry to butt in, but I'm kinda curious
how these will be substantially different
under the hood<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> No need to be sorry; this is a good question. I
think that there are a few high-level goals here:<br>
<br>
1. Provide a solution that works for everybody<br>
<br>
2. Take advantage of compiler technology as appropriate<br>
<br>
3. Provide useful interoperability. In practice: don't
oversubscribe the system.<br>
<br>
The motivation for providing an implementation based on
a libc++ thread pool is to satisfy (1). Your suggestion
of using our OpenMP runtime's low-level API directly is
a good one. Personally, I really like this idea. It does
imply, however, that organizations that distribute
libc++ will also end up distributing libomp. If libomp
has matured (in the open-source sense) to the point
where this is a suitable solution, then we should do
this. As I recall, however, we still have at least
several organizations that ship Clang/LLVM/libc++-based
toolchains that don't ship libomp, and I don't know how
generally comfortable people will be with this
dependency.<br>
</div>
</blockquote>
<div><br>
</div>
<div>If "people" aren't comfortable with llvm-openmp then
kick it out as a project. I use it and I know other
projects that use it just fine. I can maybe claim the
title of OpenMP hater and yet I don't know any legitimate
reason against having this as a dependency. It's a
portable parallel runtime that exposes an API and works..
I hope someone does speak up about specific concerns if
they exist.<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px
solid rgb(204,204,204);padding-left:1ex"
class="gmail_quote">
<div bgcolor="#FFFFFF"> <br>
That having been said, to point (2), using the OpenMP
compiler directives is superior to calling the low-level
API directly. OpenMP directives do translate into API
calls, as you point out, but they also provide
optimization hints to the compiler (e.g. about lack of
loop-carried dependencies). Over the next couple of
years, I expect to see a lot more in the compiler
optimization capabilities around OpenMP (and perhaps
other parallelism) directives (parallel-region fusion,
etc.). OpenMP also provides a standard way to access
many of the relevant vectorization hints, and taking
advantage of this is useful for compiling with Clang and
also other compilers.<br>
</div>
</blockquote>
<div><br>
</div>
<div>If projects can't even ship the llvm-openmp runtime, then
I have a very strong concern about bootstrap dependencies
that may start relying on external tools.<br>
<br>
</div>
<div>Further, I'm not sure I understand your point here. The
directives wouldn't be in the end-user code, but would be
on the STL implementation side. Wouldn't that
implementation stuff be fixed and an abstract layer
exposed to the end user? It almost sounds like you're
expressing the benefits of OMP here and not the parallel
STL side. (Hmm.. in the distance I hear.. "<span
class="gmail-st"><em>premature optimization</em> is the
root of <em>all evil")</em></span></div>
</div>
</div>
</div>
</blockquote>
<br>
That's correct. The OpenMP pragmas would be an implementation
detail. However, we'd design this so that the lambda that gets
passed into the algorithm can be inlined into the code that has the
compiler directives, thus reaping the benefit of OpenMP's compiler
integration.<br>
<br>
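To make this concrete, here is a rough sketch of the shape I have in
mind (the name __pstl_for_each is just a placeholder, not an actual
libc++ interface): because the algorithm is a template, the user's
lambda is a template argument, so the compiler sees its body inside
the loop that carries the OpenMP pragma and can inline it there.<br>
<pre>
// Rough sketch only; __pstl_for_each is a placeholder name, not a real
// libc++ entry point. The key point is that Func is a template parameter,
// so the user's lambda body is visible to the compiler inside the
// OpenMP-annotated loop and can be inlined into it.
#include <cstddef>

template <class RandomIt, class Func>
void __pstl_for_each(RandomIt first, RandomIt last, Func f) {
  const std::ptrdiff_t n = last - first;
  // The directive is an implementation detail of the library; the user
  // never sees it, but the compiler can use it both to parallelize and
  // as a hint about the absence of loop-carried dependencies.
  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; ++i)
    f(first[i]);
}

// Usage: __pstl_for_each(v.begin(), v.end(), [](int &x) { x *= 2; });
// The lambda's body ends up inlined directly into the parallel loop.
</pre>
<br>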
<blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Once llvm OpenMP can do things like handle nested
parallelism and a few more advanced things properly, all
this might be fun. (We can go down a big list if anyone
wants to digress.)<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
This is why I said we might consider using taskloop ;) -- There are
other ways of handling nesting as well (colleagues of mine work on
one: <a class="moz-txt-link-freetext" href="http://www.bolt-omp.org/">http://www.bolt-omp.org/</a>), but we should probably have a
separate thread on OpenMP and nesting to discuss this aspect of
things.<br>
<br>
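For reference, the kind of thing I mean (again, only a sketch with a
placeholder name, and the clauses would need tuning): with taskloop,
the iterations become OpenMP tasks bound to the enclosing team rather
than a second team of threads, which composes much more gracefully
when the caller is itself already inside a parallel region.<br>
<pre>
// Sketch only; the function name is a placeholder and the grainsize is
// arbitrary. The iterations are carved into tasks for the existing team
// instead of trying to spawn another nested parallel region. (Called from
// serial code, you would typically wrap this in
// "#pragma omp parallel" / "#pragma omp single".)
#include <cstddef>

template <class RandomIt, class Func>
void __pstl_for_each_tasks(RandomIt first, RandomIt last, Func f) {
  const std::ptrdiff_t n = last - first;
  #pragma omp taskloop grainsize(1024)
  for (std::ptrdiff_t i = 0; i < n; ++i)
    f(first[i]);
}
</pre>
<br>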
<blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px
solid rgb(204,204,204);padding-left:1ex"
class="gmail_quote">
<div bgcolor="#FFFFFF"> <br>
Regarding why you'd use GCD on Mac, and similarly why it
is important for many users to use OpenMP underneath: it
is important, to the extent possible, to use the same
underlying thread pool as other things in the
application. This is to avoid over-subscription and
other issues associated with conflicting threading
runtimes. If parts of the application are already using
GCD, then we probably want to use it too (or at least
not compete with it). Otherwise, OpenMP's runtime is
probably better ;)<span class="gmail-"><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>Again, this detail isn't visible to the end user? We
pick an implementation that makes sense. If other
applications use GCD and we use OpenMP, and multiple
thread-heavy applications are running, over-subscription
would be a kernel issue and not a userland one. I don't see
how you can always avoid that situation, and creating two
implementations to try kinda seems funny. By the way, GCD
is a marketing term; libdispatch is really what I'm talking
about here. It's been quite a while since I worked hands-on
with it, but I wonder how much the API overlaps with
similar interfaces to llvm-openmp. If the interfaces are
similar and the "cost" in terms of complexity is low, who
cares, but I don't remember that being the case. (Side
note: I worked on an older version of libdispatch and
ported it to Solaris. I also played around and benchmarked
OMP tasks lowering directly down to libdispatch calls
across multiple platforms. At the time our runtime always
beat it in performance. Maybe newer versions of
libdispatch are better.)<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The detail is invisible to the user at the source-code level.
Obviously they might notice if we're oversubscribing the system.
Yes, on many systems the kernel can manage oversubscription, but
that does not mean it will perform well. As I'm sure you understand,
because of cache locality and many other effects, just running a
bunch of threads and letting the kernel switch them is often much
slower than running a smaller number of threads and having them pull
from a task queue. There are exceptions worth mentioning, however,
such as when the threads themselves are mostly blocked on I/O.<br>
<br>
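To illustrate the difference with a toy example (this is not the
proposed libc++ implementation, just the general pattern): a fixed
number of workers pulling tasks from a shared queue keeps roughly one
software thread per hardware thread, instead of handing the kernel a
large pile of threads to time-slice.<br>
<pre>
// Toy illustration of the "small number of threads pulling from a task
// queue" pattern; not the proposed libc++ implementation.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class FixedPool {
public:
  explicit FixedPool(unsigned n = std::thread::hardware_concurrency()) {
    if (n == 0) n = 1;
    for (unsigned i = 0; i < n; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~FixedPool() {
    { std::lock_guard<std::mutex> lk(m_); done_ = true; }
    cv_.notify_all();
    for (auto &t : workers_) t.join();
  }
  void submit(std::function<void()> task) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(task)); }
    cv_.notify_one();
  }
private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return done_ || !q_.empty(); });
        if (done_ && q_.empty()) return;
        task = std::move(q_.front());
        q_.pop();
      }
      task(); // Work runs on one of a bounded set of threads.
    }
  }
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> q_;
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
};
</pre>
<br>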
<blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>I'm not trying to be combative, but your points just
don't make sense... (I take the blame and must be
missing something.)<br>
-----------------<br>
</div>
<div>All this aside, I'm happy to help if needed with GPU
(NVIDIA or AMD) and/or llvm-openmp direct runtime API
implementation. I've been involved with sorta similar
projects (C++AMP) and, based on that experience, may be
able to help avoid some gotchas.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Sounds great.<br>
<br>
-Hal<br>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>