On Wed, Dec 6, 2017 at 8:57 PM Hal Finkel <hfinkel@anl.gov> wrote:

> On 12/06/2017 10:23 PM, Jeff Hammond wrote:
>
>> On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <hfinkel@anl.gov> wrote:
>>
>>> On 12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:
>>>
>>>> I agree that the guarantees provided by ICC may be stronger than with other compilers, so yes, under OpenMP's terms vectorization is permitted but cannot be assumed. However, OpenMP clearly defines the semantics of the variables used within an OpenMP region: some are shared (scalar), some are private (vector), and some are inductions. This goes far beyond typical compiler-specific pragmas about dependencies and cost modelling, and it makes vectorization a much simpler task with more predictable and robust results if properly implemented (admittedly, even the ICC implementation is far from perfect).
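>>>>
>>>> To make that concrete, a minimal sketch (standard OpenMP 4.x clauses; the function itself is just an illustration):
>>>>
>>>>     // Each variable's SIMD role is spelled out, rather than merely
>>>>     // asserting "no dependences" as compiler-specific pragmas do.
>>>>     float dot(const float *a, const float *b, int n) {
>>>>         float sum = 0.0f;
>>>>     #pragma omp simd reduction(+:sum)    // 'sum' is a reduction
>>>>         for (int i = 0; i < n; ++i) {    // 'i' is the induction
>>>>             float t = a[i] * b[i];       // 't' is private per lane
>>>>             sum += t;
>>>>         }
>>>>         return sum;
>>>>     }
>>>>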
>>>> I hope Intel's efforts to standardize something like this in core C++ will eventually come to fruition. Until then, I as a regular application developer would appreciate an OpenMP-SIMD-based execution policy (hoping for good support for OpenMP SIMD in clang), but it shouldn't necessarily be part of libc++. Since the 'unordered' execution policy is currently not part of the C++ standard
>>>
>>> std::execution::par_unseq is part of C++17, and that essentially maps to '#pragma omp parallel for simd'.
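>>>
>>> Roughly this correspondence (the for_each call is standard C++17; the OpenMP lowering shown is only a sketch of the intent):
>>>
>>>     #include <algorithm>
>>>     #include <execution>
>>>     #include <cstddef>
>>>
>>>     void scale(float *x, std::size_t n, float a) {
>>>         // C++17: iterations may be distributed across threads
>>>         // and interleaved (vectorized) within each thread.
>>>         std::for_each(std::execution::par_unseq, x, x + n,
>>>                       [a](float &v) { v *= a; });
>>>
>>>         // The permission that grants is essentially:
>>>     #pragma omp parallel for simd
>>>         for (std::size_t i = 0; i < n; ++i)
>>>             x[i] *= a;
>>>     }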
>>
>> Do you expect par/par_unseq to nest?
>
> Yes.
>
>> Nesting omp-parallel is generally regarded as a Bad Idea.
>
> Agreed. I suspect we'll want the mapping to be more like '#pragma omp taskloop simd'.
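>
> That is, something like this sketch (hypothetical lowering, same loop as above):
>
>     #pragma omp taskloop simd   // tasks compose instead of nesting
>     for (std::size_t i = 0; i < n; ++i)
>         x[i] *= a;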

That won't run in parallel unless it is inside an omp-parallel + omp-master region. That means an OpenMP-based PSTL won't be parallel unless the user knows to add backend-specific code around the PSTL calls.

What I'm trying to say is that OpenMP in its current form is a poor target for PSTL. Nested parallel regions are the only thing that works, and they are likely to work poorly.

Jeff
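P.S. To make the failure mode concrete, a sketch of roughly the wrapper a user would have to write themselves, i.e., exactly the backend-specific code they shouldn't need to know about:

    #include <cstddef>

    void scale(float *x, std::size_t n, float a) {
        // A bare taskloop creates tasks, but without an enclosing
        // parallel region there is only one thread to execute them.
    #pragma omp parallel   // the user must know to add this...
    #pragma omp master     // ...and this, around the PSTL call
    #pragma omp taskloop simd
        for (std::size_t i = 0; i < n; ++i)
            x[i] *= a;
    }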
>
>  -Hal
>
>> Jeff
>>
>>>> I don't care much about how it will be implemented in libc++, if it is. I would just like to ask the Intel guys and the community here to make the implementation extensible, in the sense that a custom OpenMP-SIMD-based execution policy, along with algorithm implementations (as specializations for that policy), can be used with the libc++ library. And I would additionally like to ask the Intel guys to provide a complete and compatible extension on GitHub for developers like me to use.
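>>>>
>>>> Extensible in roughly this sense (a sketch; all names hypothetical, and pointers are used instead of general iterators for brevity):
>>>>
>>>>     // A user-supplied policy type...
>>>>     struct omp_simd_policy {};
>>>>
>>>>     // ...plus algorithm specializations the library dispatches to:
>>>>     template <class F>
>>>>     void for_each(omp_simd_policy, float *first, float *last, F f) {
>>>>     #pragma omp simd
>>>>         for (float *p = first; p < last; ++p)
>>>>             f(*p);
>>>>     }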
>>>
>>> In the end, I think we want the following:
>>>
>>>  1. A design for libc++ that allows the thread-level parallelism to be implemented in terms of different underlying providers (i.e., OpenMP, GCD, Work Queues on Windows, whatever else); see the sketch after this list.
>>>
>>>  2. To follow the same philosophy with respect to standards as we do everywhere else: use standards where possible, with compiler/system-specific extensions as necessary.
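>>>
>>> For (1), the shape would be something like this sketch (names hypothetical):
>>>
>>>     #include <cstddef>
>>>
>>>     // Internal provider interface targeted by the algorithms;
>>>     // each platform supplies an implementation (OpenMP, GCD, ...).
>>>     namespace __par_backend {
>>>     struct provider {
>>>         virtual ~provider() = default;
>>>         // Run body(i, ctx) for each i in [0, n) across worker threads.
>>>         virtual void parallel_for(std::size_t n,
>>>                                   void (*body)(std::size_t, void *),
>>>                                   void *ctx) = 0;
>>>     };
>>>     provider &get();  // backend selected when libc++ is built
>>>     }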
>>>
>>>  -Hal
>>>>
>>>> Regards,
>>>> Serge.
>>>>
>>>> 04.12.2017, 12:07, "Jeff Hammond" <jeff.science@gmail.com>:
>>>>> ICC implements a very aggressive interpretation of the OpenMP standard, and this interpretation is not shared by everyone in the OpenMP community. ICC is correct, but other implementations may be far less aggressive, so _Pragma("omp simd") doesn't guarantee vectorization unless the compiler documentation says that is how it is implemented. All the standard says is that vectorization is _permitted_.
>>>>>
>>>>> Given that the practical meaning of _Pragma("omp simd") isn't guaranteed to be consistent across different implementations, I don't really know how to compare it to compiler-specific pragmas unless we define everything explicitly.
>>>>>
>>>>> In any case, my fundamental point remains: do not use OpenMP pragmas here; instead, use whatever the appropriate compiler-specific pragma is, or create a new one that meets the need.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jeff
>>>>>
>>>>> On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <spreis@yandex-team.ru> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> _Pragma("omp simd") is semantically quite different from _Pragma("clang loop vectorize(assume_safety)"), _Pragma("GCC ivdep") and _Pragma("vector always"), so I am not sure all of the latter will work as expected in all cases. They definitely won't provide any vectorization guarantees, which slightly defeats the purpose of using the corresponding execution policy.
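>>>>>>
>>>>>> A sketch of one such difference (illustrative only):
>>>>>>
>>>>>>     float sum_up(const float *a, int n) {
>>>>>>         float sum = 0.0f;
>>>>>>         // The reduction clause explicitly licenses the compiler to
>>>>>>         // privatize 'sum' and reassociate the FP additions.
>>>>>>     #pragma omp simd reduction(+:sum)
>>>>>>         for (int i = 0; i < n; ++i)
>>>>>>             sum += a[i];
>>>>>>         // assume_safety/ivdep only assert the absence of loop-carried
>>>>>>         // memory dependences; they grant no such license for 'sum'.
>>>>>>         return sum;
>>>>>>     }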
>>>>>>
>>>>>> I support the idea of keeping OpenMP orthogonal, and definitely having -fopenmp enabled by default is not an option. The Intel compiler has a separate -qopenmp-simd option which doesn't affect performance outside explicitly marked loops, but even this is not enabled by default. I would say that multiple implementations of the unordered policy might exist: initially an OpenMP-SIMD-based implementation may be more powerful, with one based on other pragmas being the default but hinting at the existence of a faster option. Later on, one may be brave enough to add some SIMD template library and implement the default unordered policy using it (such an implementation is possible even now using vector types, but it will be extremely complex if it attempts to target all the base data types, vector widths, and target SIMD architectures clang supports; even with the library this may be quite tedious).
>>>>>>
>>>>>> Without any standard way of expressing SIMD parallelism in pure C++, any implementer of a SIMD execution policy has to rely on the means available for the platform/compiler, so it is not totally unnatural to ask the user to enable OpenMP SIMD for efficient support of the corresponding execution policy.
>>>>>>
>>>>>> Regards,
>>>>>> Serge Preis
>>>>>>
>>>>>> (Who once was part of the Intel compiler vectorizer team and drove the OpenMP SIMD efforts within icc and beyond, if anyone is keeping track of conflicts-of-interest)
>>>>>>
>>>>>> 04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <cfe-dev@lists.llvm.org>:
>>>>>>> It would be nice to keep PSTL and OpenMP orthogonal, even if _Pragma("omp simd") does not require runtime support. It should be trivial to use _Pragma("clang loop vectorize(assume_safety)") instead, by wrapping all of the different compiler vectorization pragmas in preprocessor logic; I similarly recommend _Pragma("GCC ivdep") for GCC and _Pragma("vector always") for ICC. This requires O(n_compilers) effort instead of O(1), but orthogonality is worth it.
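>>>>>>>
>>>>>>> Something like this preprocessor logic (macro name hypothetical):
>>>>>>>
>>>>>>>     #if defined(__INTEL_COMPILER)   // check first: ICC defines __GNUC__ too
>>>>>>>     #  define PSTL_VECTORIZE _Pragma("vector always")
>>>>>>>     #elif defined(__clang__)        // clang also defines __GNUC__
>>>>>>>     #  define PSTL_VECTORIZE _Pragma("clang loop vectorize(assume_safety)")
>>>>>>>     #elif defined(__GNUC__)
>>>>>>>     #  define PSTL_VECTORIZE _Pragma("GCC ivdep")
>>>>>>>     #else
>>>>>>>     #  define PSTL_VECTORIZE
>>>>>>>     #endif
>>>>>>>
>>>>>>>     // Used immediately before each loop to be vectorized:
>>>>>>>     //   PSTL_VECTORIZE
>>>>>>>     //   for (...) { ... }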
>>>>>>>
>>>>>>> While OpenMP is vendor/compiler-agnostic, users should not be required to use -fopenmp or similar to enable vectorization from PSTL, nor should the compiler enable any OpenMP pragma by default. I know of cases where merely using the -fopenmp flag alters code generation in a performance-visible manner, and enabling the OpenMP "simd" pragma by default may surprise some users, particularly if no other OpenMP pragmas are enabled by default.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jeff
>>>>>>> (who works for Intel but not on any software products and has been a heavy user of Intel PSTL since it was released, if anyone is keeping track of conflicts-of-interest)
>>>>>>>
>>>>>>> On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via cfe-dev <cfe-dev@lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> At Intel, we have developed an implementation of the C++17 execution policies for algorithms (often referred to as Parallel STL). We hope to contribute it to libc++/LLVM, so we would like to ask the community for comments on this.
>>>>>>>>
>>>>>>>> The code is already published at GitHub (https://github.com/intel/parallelstl). It supports the C++17 standard execution policies (seq, par, par_unseq) as well as the experimental unsequenced policy (unseq) for SIMD execution. At the moment, about half of the C++17 standard algorithms that must support execution policies are implemented; a few more will be ready soon, and the work continues. The tests that we use are also available at GitHub; needless to say, we will contribute those as well.
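>>>>>>>>
>>>>>>>> For example, once integrated, standard C++17 usage like the following should work out of the box:
>>>>>>>>
>>>>>>>>     #include <algorithm>
>>>>>>>>     #include <execution>
>>>>>>>>     #include <vector>
>>>>>>>>
>>>>>>>>     void sort_values(std::vector<double> &v) {
>>>>>>>>         // seq / par / par_unseq select sequential, threaded,
>>>>>>>>         // or threaded + vectorized execution respectively.
>>>>>>>>         std::sort(std::execution::par_unseq, v.begin(), v.end());
>>>>>>>>     }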
>>>>>>>>
>>>>>>>> The implementation is not specific to Intel's hardware. For thread-level parallelism it uses TBB* (https://www.threadingbuildingblocks.org/) but abstracts it with an internal API which can be implemented on top of other threading/parallel solutions – so it is for the community to decide which ones to use. For SIMD parallelism (unseq, par_unseq) we use #pragma omp simd directives; this is vendor-neutral and does not require any OpenMP runtime support.
>>>>>>>>
>>>>>>>> The current implementation meets the spirit but not always the letter of the standard, because it has to be separate from, but also coexist with, implementations of the standard C++ libraries. While preparing the contribution, we will address inconsistencies, adjust the code to meet community standards, and better integrate it into the standard library code.
>>>>>>>>
>>>>>>>> We are also proposing that our implementation be included into libstdc++/GCC. Compatibility between the implementations seems useful, as it can potentially reduce the amount of work for everyone. We hope to keep the code mostly identical, and would like to know if you think that's too optimistic to expect.
>>>>>>>>
>>>>>>>> Obviously we plan to use appropriate open-source licenses to meet the different projects' requirements.
>>>>>>>>
>>>>>>>> We expect to keep developing the code and will take responsibility for maintaining it (with community contributions, of course). If there are other community efforts to implement parallel algorithms, we are willing to collaborate.
>>>>>>>>
>>>>>>>> We look forward to your feedback, both on the overall idea and – if supported – on the next steps we should take.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> - Alexey Kukanov
>>>>>>>>
>>>>>>>> * Note that TBB itself is highly portable (and has been ported by the community to Power and ARM architectures) and permissively licensed, so it could be the base for the threading infrastructure. But the Parallel STL implementation itself does not require TBB.
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

--
Jeff Hammond
jeff.science@gmail.com
http://jeffhammond.github.io/