<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 05/16/2017 11:57 AM, C Bergström
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Wed, May 17, 2017 at 12:20 AM, Hal
            Finkel <span dir="ltr">&lt;<a moz-do-not-send="true"
                target="_blank" href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>&gt;</span>
            wrote:<br>
            <blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px
              solid rgb(204,204,204);padding-left:1ex"
              class="gmail_quote">
              <div bgcolor="#FFFFFF"><span class="gmail-">
                  <div
                    class="gmail-m_3541403897252453532moz-cite-prefix">On
                    05/16/2017 02:54 AM, C Bergström wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Tue, May 16, 2017 at
                          2:50 PM, Hal Finkel via cfe-dev <span
                            dir="ltr">&lt;<a moz-do-not-send="true"
                              target="_blank"
                              href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>&gt;</span>
                          wrote:<br>
                          <blockquote style="margin:0px 0px 0px
                            0.8ex;border-left:1px solid
                            rgb(204,204,204);padding-left:1ex"
                            class="gmail_quote">
                            <div bgcolor="#FFFFFF">
                              <p>Hi, Erik,</p>
                              <p>That's great!<br>
                              </p>
                              <p>Gor, Marshall, and I discussed this
                                after some past committee meeting. We
                                wanted to architect the implementation
                                so that we could provide different
                                underlying concurrency mechanisms;
                                including:</p>
                              <p>   a. A self-contained
                                thread-pool-based implementation using a
                                work-stealing scheme.</p>
                              <p>   b. An implementation that wraps
                                Grand Central Dispatch (for Mac and any
                                other platforms providing libdispatch).</p>
                              <p>   c. An implementation that uses
                                OpenMP.</p>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Sorry to butt in, but I'm kinda curious
                            how these will be substantially different
                            under the hood<br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> No need to be sorry; this is a good question. I
                think that there are a few high-level goals here:<br>
                <br>
                 1. Provide a solution that works for everybody<br>
                <br>
                 2. Take advantage of compiler technology as appropriate<br>
                <br>
                 3. Provide useful interoperability. In practice: don't
                oversubscribe the system.<br>
                <br>
                The motivation for providing an implementation based on
                a libc++ thread pool is to satisfy (1). Your suggestion
                of using our OpenMP runtime's low-level API directly is
                a good one. Personally, I really like this idea. It does
                imply, however, that organizations that distribute
                libc++ will also end up distributing libomp. If libomp
                has matured (in the open-source sense) to the point
                where this is a suitable solution, then we should do
                this. As I recall, however, we still have at least
                several organizations that ship Clang/LLVM/libc++-based
                toolchains that don't ship libomp, and I don't know how
                generally comfortable people will be with this
                dependency.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>If "people" aren't comfortable with llvm-openmp then
              kick it out as a project. I use it and I know other
              projects that use it just fine. I can maybe claim the
              title of OpenMP hater and yet I don't know any legitimate
              reason against having this as a dependency. It's a
              portable parallel runtime that exposes an API and works..
              I hope someone does speak up about specific concerns if
              they exist.<br>
               </div>
            <blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px
              solid rgb(204,204,204);padding-left:1ex"
              class="gmail_quote">
              <div bgcolor="#FFFFFF"> <br>
                That having been said, to point (2), using the OpenMP
                compiler directives is superior to calling the low-level
                API directly. OpenMP directives do translate into API
                calls, as you point out, but they also provide
                optimization hints to the compiler (e.g. about lack of
                loop-carried dependencies). Over the next couple of
                years, I expect to see a lot more in the compiler
                optimization capabilities around OpenMP (and perhaps
                other parallelism) directives (parallel-region fusion,
                etc.). OpenMP also provides a standard way to access
                many of the relevant vectorization hints, and taking
                advantage of this is useful for compiling with Clang and
                also other compilers.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>If projects can't even ship llvm-openmp runtime then I
              have a very strong concern with bootstrap dependencies
              which may start relying on external tools.<br>
              <br>
            </div>
            <div>Further, I'm not sure I understand your point here. The
              directives wouldn't be in the end user code, but would be
              in the STL implementation side. Wouldn't that
              implementation stuff be fixed and an abstract layer
              exposed to the end user? It almost sounds like you're
              expressing the benefits of OMP here and not the parallel
              STL side. (Hmm.. in the distance I hear.. "<span
                class="gmail-st"><em>premature optimization</em> is the
                root of <em>all evil</em>")</span></div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    That's correct. The OpenMP pragmas would be an implementation
    detail. However, we'd design this so that the lambda that gets
    passed into the algorithm can be inlined into the code that has the
    compiler directives, thus reaping the benefit of OpenMP's compiler
    integration.<br>
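    <br>
    A minimal sketch of that arrangement, assuming a hypothetical helper
    named sketch_for_each_par (not a libc++ interface). Because the
    callable is a template parameter, its body can be inlined into the
    loop that carries the directive:<br>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of libc++.
// F is a template parameter, so the call f(data[i]) is visible to the
// compiler inside the loop that carries the OpenMP directive.
template <class T, class F>
void sketch_for_each_par(std::vector<T>& data, F f) {
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
  // Without -fopenmp the pragma is ignored and the loop runs serially.
  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    f(data[static_cast<std::size_t>(i)]);
  }
}
```

    A caller would write something like sketch_for_each_par(v, [](int&amp;
    x) { x *= 2; }); and never see the pragma.<br>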
    <br>
    <blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Once llvm OpenMP can do things like handle nested
              parallelism and a few more advanced things properly all
              this might be fun (We can go down a big list if anyone
              wants to digress)<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    This is why I said we might consider using taskloop ;) -- There are
    other ways of handling nesting as well (colleagues of mine work on
    one: <a class="moz-txt-link-freetext" href="http://www.bolt-omp.org/">http://www.bolt-omp.org/</a>), but we should probably have a
    separate thread on OpenMP and nesting to discuss this aspect of
    things.<br>
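    <br>
    As a rough sketch of the taskloop idea, assuming hypothetical helper
    names (sketch_taskloop_for, sketch_scale_rows): the inner algorithm
    only creates tasks, so a nested call feeds the existing thread team
    rather than opening a second parallel region:<br>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical inner algorithm: it only creates tasks and never opens its
// own parallel region, so a nested call reuses whatever team already exists.
template <class F>
void sketch_taskloop_for(std::ptrdiff_t n, F f) {
  // Without -fopenmp the pragma is ignored and the loop runs serially.
  #pragma omp taskloop
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    f(i);
  }
}

// Hypothetical outer entry point: one thread team is created here, and a
// single thread generates the tasks that the rest of the team executes.
inline void sketch_scale_rows(std::vector<std::vector<int>>& rows, int factor) {
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(rows.size());
  #pragma omp parallel
  #pragma omp single
  sketch_taskloop_for(n, [&rows, factor](std::ptrdiff_t r) {
    std::vector<int>& row = rows[static_cast<std::size_t>(r)];
    const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(row.size());
    // Nested use: the inner loop adds tasks for the same team.
    sketch_taskloop_for(m, [&row, factor](std::ptrdiff_t c) {
      row[static_cast<std::size_t>(c)] *= factor;
    });
  });
}
```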
    <br>
    <blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px
              solid rgb(204,204,204);padding-left:1ex"
              class="gmail_quote">
              <div bgcolor="#FFFFFF"> <br>
                Regarding why you'd use GCD on Mac, and similarly why it
                is important for many users to use OpenMP underneath, it
                is important, to the extent possible, to use the same
                underlying thread pool as other things in the
                application. This is to avoid over-subscription and
                other issues associated with conflicting threading
                runtimes. If parts of the application are already using
                GCD, then we probably want to do this too (or at least
                not compete with it). Otherwise, OpenMP's runtime is
                probably better ;)<span class="gmail-"><br>
                </span></div>
            </blockquote>
            <div><br>
            </div>
            <div>Again this detail isn't visible to the end user? We
              pick an implementation that makes sense. If other
              applications use GCD and we use OpenMP, if multiple thread
              heavy applications are running, over-subscription would be
              a kernel issue and not userland. I don't see how you can
              always avoid that situation and creating two
              implementations to try kinda seems funny. btw GCD is a
              marketing term and libdispatch is really what I'm talking
              about here. It's been quite a while since I worked hands-on
              with it, but I wonder how much the API overlaps
              with similar interfaces to llvm-openmp. If the interfaces
              are similar and the "cost" in terms of complexity is low,
              who cares, but I don't remember that being the case. (side
              note: I worked on an older version of libdispatch and
              ported it to Solaris. I also played around and benchmarked
              OMP tasks lowering directly down to libdispatch calls
              across multiple platforms. At the time our runtime always
              beat it in performance. Maybe newer versions of
              libdispatch are better)<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    The detail is invisible to the user at the source-code level.
    Obviously they might notice if we're oversubscribing the system.
    Yes, on many systems the kernel can manage oversubscription, but
    that does not mean it will perform well. As I'm sure you understand,
    because of cache locality and many other effects, just running a
    bunch of threads and letting the kernel switch them is often much
    slower than running a smaller number of threads and having them pull
    from a task queue. There are exceptions worth mentioning, however,
    such as when the threads are mostly themselves blocked on I/O. <br>
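    <br>
    To make the contrast concrete, here is a small sketch of the
    "fewer threads pulling from a task queue" pattern, using a hypothetical
    helper named sketch_run_on_pool. The queue is reduced to a shared
    atomic index, which is an assumption made to keep the example short:<br>

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs task(i) for i in [0, num_tasks) on a fixed number
// of workers instead of starting one thread per task.
template <class Task>
void sketch_run_on_pool(std::size_t num_tasks, unsigned num_workers, Task task) {
  if (num_workers == 0) num_workers = 1;  // hardware_concurrency() may return 0
  std::atomic<std::size_t> next{0};       // shared "queue": index of the next task
  std::vector<std::thread> workers;
  workers.reserve(num_workers);
  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&next, &task, num_tasks] {
      // Each worker keeps claiming the next unclaimed index until none remain.
      for (std::size_t i = next.fetch_add(1); i < num_tasks;
           i = next.fetch_add(1)) {
        task(i);
      }
    });
  }
  for (std::thread& t : workers) t.join();
}
```

    Passing std::thread::hardware_concurrency() as num_workers keeps the
    thread count near the core count no matter how many tasks there are.<br>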
    <br>
    <blockquote
cite="mid:CAOnawYoW4KLedZFT5tb6VVVDw0Yr8Lp1wk5v4uixzJ8zuefAEw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>I'm not trying to be combative, but your points just
              don't make sense....... (I take the blame and must be
              missing something)<br>
              -----------------<br>
            </div>
            <div>All this aside - I'm happy to help if needed - GPU
              (NVIDIA or AMD) and/or llvm-openmp direct runtime API
              implementation. I've been involved with sorta similar
              projects (C++AMP) and based on that experience may be able
              to help avoid some gotchas.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Sounds great.<br>
    <br>
     -Hal<br>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>