<div><br><div class="gmail_quote"><div dir="auto">On Wed, Dec 6, 2017 at 8:57 PM Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <div class="m_8513274869410520852moz-cite-prefix">On 12/06/2017 10:23 PM, Jeff Hammond

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div><br>

        <div class="gmail_quote">

          <div dir="auto">On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000">

              <p><br>

              </p>

              <div class="m_8513274869410520852m_2065056468622040889moz-cite-prefix">On

                12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:<br>

              </div>

              <blockquote type="cite">

                <div>I agree that guarantees provided by ICC may be

                  stronger than with other compilers, so yes, under

                  OpenMP terms vectorization is permitted and cannot be

                  assumed. However OpenMP clearly defines semantics of

                  variables used within OpenMP region some being

                  shared(scalar), some private(vector) and some being

                  inductions. This goes far beyond typical compiler

                  specific pragmas about dependencies and cost modelling

                  and makes vectorization much simpler task with more

                  predictable and robust results if properly implemented

                  (admittedly, even ICC implementation is far from

                  perfect). I hope Intel's efforts to standardize

                  someting like this in core C++ will evntually come to

                  fruition. Until then I as a regular application

                  developer would appreciate OpenMP-simd based execution

                  policy (hoping for good support for OpenMP SIMD in

                  clang), but it shouldn't necessary be part of libc++.

                  Since 'unordered' execution policy is currently not

                  part of C++ standard </div>

              </blockquote>

              <br>

            </div>

            <div bgcolor="#FFFFFF" text="#000000">

              std::execution::par_unseq is part of C++17, and that

              essentially maps to '#pragma omp parallel for simd'.</div>

            <div bgcolor="#FFFFFF" text="#000000"><br>

            </div>

          </blockquote>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Do you expect par/par_unseq to nest?</div>

        </div>

      </div>

    </blockquote>

    <br></div><div bgcolor="#FFFFFF" text="#000000">

    Yes.</div><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    <blockquote type="cite">

      <div>

        <div class="gmail_quote">

          <div dir="auto"> Nesting omp-parallel is generally regarded as

            a Bad Idea.</div>

        </div>

      </div>

    </blockquote>

    <br></div><div bgcolor="#FFFFFF" text="#000000">

    Agreed. I suspect we'll want the mapping to be more like '#pragma

    omp taskloop simd'.</div><div bgcolor="#FFFFFF" text="#000000"><br>

    </div></blockquote><div dir="auto"><br></div><div dir="auto">That won’t run in parallel unless in an omp-parallel-master region. That means OpenMP-based PSTL won’t be parallel unless the user knows to add back-end specific code about the PSTL.</div><div dir="auto"><br></div><div dir="auto">What I’m trying to say is that OpenMP is a poor target for PSTL in its current form. Nested parallel regions is the only thing that works and it is likely to work poorly.</div><div dir="auto"><br></div><div dir="auto">Jeff</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

     -Hal</div><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    <blockquote type="cite">

      <div>

        <div class="gmail_quote">

          <div dir="auto"><br>

          </div>

          <div dir="auto">Jeff</div>

          <div dir="auto"><br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000"><br>

              <blockquote type="cite">

                <div>I don't care much on how it will be implemneted in

                  libc++ if it is. I just would like to ask Intel guys

                  and community here to make implementation extensible

                  in a sense that custom OpenMP-SIMD-based execution

                  policy along with algorithms implementations (as

                  specializations for the policy) can be used with the

                  libc++ library. And I additionally would like to ask

                  Intel guys to provide complete and compatible

                  extension on github for developers like me to use.</div>

              </blockquote>

              <br>

            </div>

            <div bgcolor="#FFFFFF" text="#000000"> In the end, I think

              we want the following:<br>

              <br>

               1. A design for libc++ that allows the thread-level

              parallelism to be implemented in terms of different

              underlying providers (i.e., OpenMP, GCD, Work Queues on

              Windows, whatever else).<br>

               2. To follow the same philosophy with respect to

              standards as we do everywhere else: Use standards where

              possible with compiler/system-specific extensions as

              necessary.<br>

              <br>

               -Hal</div>

            <div bgcolor="#FFFFFF" text="#000000"><br>

              <br>

              <blockquote type="cite">

                <div> </div>

                <div>Regards,</div>

                <div>Serge.</div>

                <div> </div>

                <div> </div>

                <div> </div>

                <div>04.12.2017, 12:07, "Jeff Hammond" <a class="m_8513274869410520852m_2065056468622040889moz-txt-link-rfc2396E" href="mailto:jeff.science@gmail.com" target="_blank"><jeff.science@gmail.com></a>:</div>

                <blockquote type="cite">

                  <div>

                    <div>ICC implements a very aggressive interpretation

                      of the OpenMP standard, and this interpretation is

                      not shared by everyone in the OpenMP community. 

                      ICC is correct but other implementations may be

                      far less aggressive, so _Pragma("omp simd")

                      doesn't guarentee vectorization unless the

                      compiler documentation says that is how it is

                      implemented.  All the standard says that it means

                      is that vectorization is _permitted_.</div>

                    <div> </div>

                    <div>Given that the practical meaning of

                      _Pragma("omp simd") isn't guaranteed to be

                      consistent across different implementations, I

                      don't really know how to compare it to

                      compiler-specific pragmas unless we define

                      everything explicitly.</div>

                    <div> </div>

                    <div>In any case, my fundamental point remains: do

                      not use OpenMP pragmas here, but instead use

                      whatever the appropriate compiler-specific pragma

                      is, or create a new one that meets the need.</div>

                    <div> </div>

                    <div>Best,</div>

                    <div> </div>

                    <div>Jeff</div>

                    <div title="Page 81">

                      <div>

                        <div> </div>

                      </div>

                    </div>

                    <div> 

                      <div>On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <span><<a href="mailto:spreis@yandex-team.ru" target="_blank">spreis@yandex-team.ru</a>></span>

                        wrote:

                        <blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                          <div>Hello,</div>

                          <div> </div>

                          <div>_Pragma("omp simd") is semantically quite

                            different from _Pragma("clang loop

                            vectorize(assume_safety)"), _Pragma("GCC

                            ivdep") and _Pragma("vector always"), so I

                            am not sure all latter will work as expected

                            in all cases. They definitely won't provide

                            any vectorization guarantees which slightly

                            defeat the purpose of using corresponding

                            execution policy.</div>

                          <div> </div>

                          <div>I support the idea of having OpenMP

                            orthogonal and definitely having -fopenmp

                            enabled by default is not an option. Intel

                            compiler has separate -qopenmp-simd option

                            which doesn't affect performance outside

                            explicitly marked loops, but even this is

                            not enabled by default. I would say that

                            there might exist multiple implementations

                            of unordered policy, originally OpenMP SIMD

                            based implementation may be more powerful

                            and one based on other pragmas being

                            default, but hinting about existence of

                            faster option. Later on one may be brave

                            enough to add some SIMD template library and

                            implement default unordered policy using it

                            (such implementation is possible even now

                            using vector types, but it will be extremely

                            complex if attempt to target all base data

                            types, vector widths and target SIMD

                            architectures clang supports. Even with the

                            library this may be quite tedious).</div>

                          <div> </div>

                          <div>Without any standard way of expressing

                            SIMD perallelism in pure C++ any implementer

                            of SIMD execution policy is to rely on means

                            avaialble for plaform/compiler and so it is

                            not totaly unnatural to ask user to enable

                            OpenMP SIMD for efficient support of

                            corresponding execution policy.</div>

                          <div> </div>

                          <div>Reagrds,</div>

                          <div>Serge Preis</div>

                          <div> </div>

                          <div>(Who once was part of Intel Compiler

                            Vectorizer team and driven OpenMP SIMD

                            efforts within icc and beyond, if anyone is

                            keeping track of conflicts-of-interest)</div>

                          <div> </div>

                          <div> </div>

                          <div>04.12.2017, 08:46, "Jeff Hammond via

                            cfe-dev" <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>:</div>

                          <blockquote type="cite">

                            <div>

                              <div>

                                <div>It would be nice to keep PSTL and

                                  OpenMP orthogonal, even if

                                  _Pragma("omp simd") does not require

                                  runtime support.  It should be trivial

                                  to use _Pragma("clang loop

                                  vectorize(assume_safety)") instead, by

                                  wrapping all of the different compiler

                                  vectorization pragmas in preprocessor

                                  logic.  I similarly recommend

                                  _Pragma("GCC ivdep") for GCC and

                                  _Pragma("vector always") for ICC. 

                                  While this requires O(n_compilers)

                                  effort instead of O(1), but

                                  orthogonality is worth it.

                                  <div> </div>

                                  <div>While OpenMP is

                                    vendor/compiler-agnostic, users

                                    should not be required to use

                                    -fopenmp or similar to enable

                                    vectorization from PSTL, nor should

                                    the compiler enable any OpenMP

                                    pragma by default.  I know of cases

                                    where merely using the -fopenmp flag

                                    alters code generation in a

                                    performance-visible manner, and

                                    enabling the OpenMP "simd" pragma by

                                    default may surprise some users,

                                    particularly if no other OpenMP

                                    pragmas are enabled by default.

                                    <div><br>

                                      Best,</div>

                                    <div> </div>

                                    <div>Jeff</div>

                                    <div>(who works for Intel but not on

                                      any software products and has been

                                      a heavy user of Intel PSTL since

                                      it was released, if anyone is

                                      keeping track of

                                      conflicts-of-interest)<br>

                                      <br>

                                      On Wed, Nov 29, 2017 at 4:21 AM,

                                      Kukanov, Alexey via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>

                                      wrote:<br>

                                      ><br>

                                      > Hello all,<br>

                                      ><br>

                                      > At Intel, we have developed

                                      an implementation of C++17

                                      execution policies<br>

                                      > for algorithms (often

                                      referred to as Parallel STL). We

                                      hope to contribute it<br>

                                      > to libc++/LLVM, so would like

                                      to ask the community for comments

                                      on this.<br>

                                      ><br>

                                      > The code is already published

                                      at GitHub (<a href="https://github.com/intel/parallelstl" target="_blank">https://github.com/intel/parallelstl</a>).<br>

                                      > It supports the C++17

                                      standard execution policies (seq,

                                      par, par_unseq) as well as<br>

                                      > the experimental unsequenced

                                      policy (unseq) for SIMD execution.

                                      At the moment,<br>

                                      > about half of the C++17

                                      standard algorithms that must

                                      support execution policies<br>

                                      > are implemented; a few more

                                      will be ready soon, and the work

                                      continues.<br>

                                      > The tests that we use are

                                      also available at GitHub; needless

                                      to say we will<br>

                                      > contribute those as well.<br>

                                      ><br>

                                      > The implementation is not

                                      specific to Intel’s hardware. For

                                      thread-level parallelism<br>

                                      > it uses TBB* (<a href="https://www.threadingbuildingblocks.org/" target="_blank">https://www.threadingbuildingblocks.org/</a>)

                                      but abstracts it with<br>

                                      > an internal API which can be

                                      implemented on top of other

                                      threading/parallel solutions –<br>

                                      > so it is for the community to

                                      decide which ones to use. For SIMD

                                      parallelism<br>

                                      > (unseq, par_unseq) we use

                                      #pragma omp simd directives; it is

                                      vendor-neutral and<br>

                                      > does not require any OpenMP

                                      runtime support.<br>

                                      ><br>

                                      > The current implementation

                                      meets the spirit but not always

                                      the letter of<br>

                                      > the standard, because it has

                                      to be separate from but also

                                      coexist with<br>

                                      > implementations of standard

                                      C++ libraries. While preparing the

                                      contribution,<br>

                                      > we will address

                                      inconsistencies, adjust the code

                                      to meet community standards,<br>

                                      > and better integrate it into

                                      the standard library code.<br>

                                      ><br>

                                      > We are also proposing that

                                      our implementation is included

                                      into libstdc++/GCC.<br>

                                      > Compatibility between the

                                      implementations seems useful as it

                                      can potentially<br>

                                      > reduce the amount of work for

                                      everyone. We hope to keep the code

                                      mostly identical,<br>

                                      > and would like to know if you

                                      think it’s too optimistic to

                                      expect.<br>

                                      ><br>

                                      > Obviously we plan to use

                                      appropriate open source licenses

                                      to meet the different<br>

                                      > projects’ requirements.<br>

                                      ><br>

                                      > We expect to keep developing

                                      the code and will take the

                                      responsibility for<br>

                                      > maintaining it (with

                                      community contributions, of

                                      course). If there are other<br>

                                      > community efforts to

                                      implement parallel algorithms, we

                                      are willing to collaborate.<br>

                                      ><br>

                                      > We look forward to your

                                      feedback, both for the overall

                                      idea and – if supported –<br>

                                      > for the next steps we should

                                      take.<br>

                                      ><br>

                                      > Regards,<br>

                                      > - Alexey Kukanov<br>

                                      ><br>

                                      > * Note that TBB itself is

                                      highly portable (and ported by

                                      community to Power and ARM<br>

                                      > architectures) and

                                      permissively licensed, so could be

                                      the base for the threading<br>

                                      > infrastructure. But the

                                      Parallel STL implementation itself

                                      does not require TBB.<br>

                                      ><br>

                                      >

                                      _______________________________________________<br>

                                      > cfe-dev mailing list<br>

                                      > <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>

                                      > <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

                                      <br>

                                      <br>

                                      <br>

                                      <br>

                                      --<br>

                                      Jeff Hammond<br>

                                      <a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>

                                      <a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a>

                                      <div> </div>

                                    </div>

                                  </div>

                                </div>

                              </div>

                            </div>

                            ,

                            <p><span>_______________________________________________<br>

                                cfe-dev mailing list<br>

                                <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>

                                <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a></span></p>

                          </blockquote>

                        </blockquote>

                      </div>

                      <div> </div>

                      --

                      <div>Jeff Hammond<br>

                        <a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>

                        <a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>

                    </div>

                  </div>

                </blockquote>

                <br>

                <fieldset class="m_8513274869410520852m_2065056468622040889mimeAttachmentHeader"></fieldset>

                <br>

                <pre>_______________________________________________

cfe-dev mailing list

<a class="m_8513274869410520852m_2065056468622040889moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>

<a class="m_8513274869410520852m_2065056468622040889moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>

</pre>

              </blockquote>

              <br>

            </div>

            <div bgcolor="#FFFFFF" text="#000000">

              <pre class="m_8513274869410520852m_2065056468622040889moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

            </div>

          </blockquote>

        </div>

      </div>

      <div>-- <br>

      </div>

      <div class="m_8513274869410520852gmail_signature" data-smartmail="gmail_signature">Jeff

        Hammond<br>

        <a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>

        <a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>

    </blockquote>

    <br>

    <pre class="m_8513274869410520852moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </div></blockquote></div></div><div dir="ltr">-- <br></div><div class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>