<div><br><div class="gmail_quote"><div dir="auto">On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="m_2065056468622040889moz-cite-prefix">On 12/04/2017 10:48 PM, Serge Preis via
      cfe-dev wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div>I agree that the guarantees provided by ICC may be stronger than
        those of other compilers, so yes, under OpenMP's terms vectorization
        is permitted and cannot be assumed. However, OpenMP clearly
        defines the semantics of variables used within an OpenMP region:
        some are shared (scalar), some private (vector), and some are
        inductions. This goes far beyond typical compiler-specific
        pragmas about dependencies and cost modelling, and it makes
        vectorization a much simpler task with more predictable and robust
        results if properly implemented (admittedly, even the ICC
        implementation is far from perfect). I hope Intel's efforts to
        standardize something like this in core C++ will eventually come
        to fruition. Until then, I as a regular application developer
        would appreciate an OpenMP-SIMD-based execution policy (hoping for
        good support for OpenMP SIMD in clang), but it shouldn't
        necessarily be part of libc++. Since the 'unordered' execution policy
        is currently not part of the C++ standard </div>
    </blockquote>
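    <div>As an illustration of the variable semantics described above, a minimal
    sketch assuming a compiler with OpenMP SIMD support (e.g. clang's
    -fopenmp-simd); the function name is invented for illustration:</div>
    <pre>
// 'sum' is a reduction across SIMD lanes, 't' is private (one copy per lane),
// and 'i' is the induction variable of the simd loop.
float simd_dot(const float* a, const float* b, int n) {
    float sum = 0.0f;
    float t = 0.0f;
    #pragma omp simd private(t) reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        t = a[i] * b[i];
        sum += t;
    }
    return sum;
}
    </pre>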
    <br></div><div bgcolor="#FFFFFF" text="#000000">
    std::execution::par_unseq is part of C++17, and that essentially
    maps to '#pragma omp parallel for simd'.</div><div bgcolor="#FFFFFF" text="#000000"><br>
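    <div>A rough sketch of that correspondence (illustrative only; the function
    names are invented, and this is not necessarily how libc++ would implement it):</div>
    <pre>
#include <algorithm>
#include <execution>
#include <vector>

// C++17 parallel algorithm: iterations may be run in parallel and vectorized.
void scale(std::vector<float>& v, float s) {
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [s](float& x) { x *= s; });
}

// Roughly equivalent OpenMP form.
void scale_omp(std::vector<float>& v, float s) {
    #pragma omp parallel for simd
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= s;
}
    </pre>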
    </div></blockquote><div dir="auto"><br></div><div dir="auto">Do you expect par/par_unseq to nest? Nesting omp-parallel is generally regarded as a Bad Idea.</div><div dir="auto"><br></div><div dir="auto">Jeff</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>
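<div>For concreteness, the nesting in question would look something like the
following (purely illustrative; the function and data layout are invented). With a
naive mapping of 'par' onto '#pragma omp parallel for', the inner call would create
a nested parallel region:</div>
<pre>
#include <algorithm>
#include <execution>
#include <vector>

// Outer parallel algorithm whose body invokes another parallel algorithm.
void sort_rows(std::vector<std::vector<int>>& rows) {
    std::for_each(std::execution::par, rows.begin(), rows.end(),
                  [](std::vector<int>& row) {
                      std::sort(std::execution::par_unseq, row.begin(), row.end());
                  });
}
</pre>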
    <blockquote type="cite">
      <div>I don't care much about how it will be implemented in libc++, if
        it is. I would just like to ask the Intel guys and the community here to
        make the implementation extensible, in the sense that a custom
        OpenMP-SIMD-based execution policy, along with algorithm
        implementations (as specializations for that policy), can be used
        with the libc++ library. And I would additionally like to ask the
        Intel guys to provide a complete and compatible extension on
        GitHub for developers like me to use.</div>
    </blockquote>
    <br></div><div bgcolor="#FFFFFF" text="#000000">
    In the end, I think we want the following:<br>
    <br>
     1. A design for libc++ that allows the thread-level parallelism to
    be implemented in terms of different underlying providers (e.g.,
    OpenMP, GCD, Work Queues on Windows, whatever else); see the sketch below.<br>
     2. To follow the same philosophy with respect to standards as we do
    everywhere else: Use standards where possible with
    compiler/system-specific extensions as necessary.<br>
    <br>
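    A purely hypothetical sketch of what (1) could look like; the namespace,
    function, and configuration macro below are invented for illustration and do
    not exist in libc++ today:<br>
    <pre>
// Hypothetical backend-dispatch layer for thread-level parallelism.
namespace __pstl_backend {

template <class Index, class Body>
void parallel_for(Index first, Index last, Body body) {
#if defined(_PSTL_BACKEND_OPENMP)          // hypothetical configuration macro
    #pragma omp parallel for
    for (Index i = first; i < last; ++i)
        body(i);
#else                                      // serial fallback; GCD, Windows
    for (Index i = first; i < last; ++i)   // thread pools, TBB, etc. would be
        body(i);                           // further branches here
#endif
}

} // namespace __pstl_backend
    </pre>
    <br>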
     -Hal</div><div bgcolor="#FFFFFF" text="#000000"><br>
    <br>
    <blockquote type="cite">
      <div> </div>
      <div>Regards,</div>
      <div>Serge.</div>
      <div> </div>
      <div> </div>
      <div> </div>
      <div>04.12.2017, 12:07, "Jeff Hammond"
        <a class="m_2065056468622040889moz-txt-link-rfc2396E" href="mailto:jeff.science@gmail.com" target="_blank"><jeff.science@gmail.com></a>:</div>
      <blockquote type="cite">
        <div>
          <div>ICC implements a very aggressive interpretation of the
            OpenMP standard, and this interpretation is not shared by
            everyone in the OpenMP community.  ICC is correct but other
            implementations may be far less aggressive, so _Pragma("omp
            simd") doesn't guarentee vectorization unless the compiler
            documentation says that is how it is implemented.  All the
            standard says that it means is that vectorization is
            _permitted_.</div>
          <div> </div>
          <div>Given that the practical meaning of _Pragma("omp simd")
            isn't guaranteed to be consistent across different
            implementations, I don't really know how to compare it to
            compiler-specific pragmas unless we define everything
            explicitly.</div>
          <div> </div>
          <div>In any case, my fundamental point remains: do not use
            OpenMP pragmas here, but instead use whatever the
            appropriate compiler-specific pragma is, or create a new one
            that meets the need.</div>
          <div> </div>
          <div>Best,</div>
          <div> </div>
          <div>Jeff</div>
          <div title="Page 81">
            <div>
              <div> </div>
            </div>
          </div>
          <div> 
            <div>On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <span><<a href="mailto:spreis@yandex-team.ru" target="_blank">spreis@yandex-team.ru</a>></span>
              wrote:
              <blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                <div>Hello,</div>
                <div> </div>
                <div>_Pragma("omp simd") is semantically quite different
                  from _Pragma("clang loop vectorize(assume_safety)"),
                  _Pragma("GCC ivdep") and _Pragma("vector always"), so
                  I am not sure all latter will work as expected in all
                  cases. They definitely won't provide any vectorization
                  guarantees which slightly defeat the purpose of using
                  corresponding execution policy.</div>
                <div> </div>
                <div>I support the idea of keeping OpenMP orthogonal, and
                  enabling -fopenmp by default is definitely not
                  an option. The Intel compiler has a separate -qopenmp-simd
                  option which doesn't affect performance outside
                  explicitly marked loops, but even that is not enabled
                  by default. I would say that there might exist
                  multiple implementations of the unordered policy:
                  initially, an OpenMP-SIMD-based implementation may be
                  more powerful, with one based on other pragmas being the
                  default but hinting at the existence of a faster option.
                  Later on, one may be brave enough to add some SIMD
                  template library and implement the default unordered
                  policy using it (such an implementation is possible even
                  now using vector types, but it will be extremely
                  complex if it attempts to target all of the base data types,
                  vector widths, and target SIMD architectures that clang
                  supports; even with such a library this may be quite
                  tedious).</div>
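                <div>A toy illustration of the vector-types approach mentioned
                  above, using Clang's ext_vector_type extension; the function is
                  invented, and it handles only one element type and width, which
                  is why covering every type, width, and SIMD architecture by hand
                  gets tedious:</div>
                <pre>
typedef float float8 __attribute__((ext_vector_type(8)));

void add(const float* a, const float* b, float* out, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {          // vector body, 8 floats at a time
        float8 va, vb;
        __builtin_memcpy(&va, a + i, sizeof va);   // unaligned-safe loads
        __builtin_memcpy(&vb, b + i, sizeof vb);
        float8 vc = va + vb;
        __builtin_memcpy(out + i, &vc, sizeof vc);
    }
    for (; i < n; ++i)                    // scalar remainder
        out[i] = a[i] + b[i];
}
                </pre>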
                <div> </div>
                <div>Without any standard way of expressing SIMD
                  parallelism in pure C++, any implementer of a SIMD
                  execution policy has to rely on the means available for the
                  platform/compiler, and so it is not totally unnatural to
                  ask the user to enable OpenMP SIMD for efficient support
                  of the corresponding execution policy.</div>
                <div> </div>
                <div>Regards,</div>
                <div>Serge Preis</div>
                <div> </div>
                <div>(Who was once part of the Intel Compiler Vectorizer
                  team and drove the OpenMP SIMD efforts within icc and
                  beyond, if anyone is keeping track of
                  conflicts-of-interest)</div>
                <div> </div>
                <div> </div>
                <div>04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>:</div>
                <blockquote type="cite">
                  <div>
                    <div>
                      <div>It would be nice to keep PSTL and OpenMP
                        orthogonal, even if _Pragma("omp simd") does not
                        require runtime support.  It should be trivial
                        to use _Pragma("clang loop
                        vectorize(assume_safety)") instead, by wrapping
                        all of the different compiler vectorization
                        pragmas in preprocessor logic.  I similarly
                        recommend _Pragma("GCC ivdep") for GCC and
                        _Pragma("vector always") for ICC.  While this
                        requires O(n_compilers) effort instead of O(1),
                        but orthogonality is worth it.
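                        <div>One possible shape of that preprocessor wrapping
                          (the macro name is made up for illustration):</div>
                        <pre>
#if defined(__INTEL_COMPILER)
#  define _PSTL_SIMD_LOOP _Pragma("vector always")
#elif defined(__clang__)
#  define _PSTL_SIMD_LOOP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#  define _PSTL_SIMD_LOOP _Pragma("GCC ivdep")
#else
#  define _PSTL_SIMD_LOOP
#endif

// Usage at each vectorizable loop in the implementation:
//   _PSTL_SIMD_LOOP
//   for (...) { ... }
                        </pre>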
                        <div> </div>
                        <div>While OpenMP is vendor/compiler-agnostic,
                          users should not be required to use -fopenmp
                          or similar to enable vectorization from PSTL,
                          nor should the compiler enable any OpenMP
                          pragma by default.  I know of cases where
                          merely using the -fopenmp flag alters code
                          generation in a performance-visible manner,
                          and enabling the OpenMP "simd" pragma by
                          default may surprise some users, particularly
                          if no other OpenMP pragmas are enabled by
                          default.
                          <div><br>
                            Best,</div>
                          <div> </div>
                          <div>Jeff</div>
                          <div>(who works for Intel but not on any
                            software products and has been a heavy user
                            of Intel PSTL since it was released, if
                            anyone is keeping track of
                            conflicts-of-interest)<br>
                            <br>
                            On Wed, Nov 29, 2017 at 4:21 AM, Kukanov,
                            Alexey via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>
                            wrote:<br>
                            ><br>
                            > Hello all,<br>
                            ><br>
                            > At Intel, we have developed an
                            implementation of C++17 execution policies<br>
                            > for algorithms (often referred to as
                            Parallel STL). We hope to contribute it<br>
                            > to libc++/LLVM, so would like to ask
                            the community for comments on this.<br>
                            ><br>
                            > The code is already published at GitHub
                            (<a href="https://github.com/intel/parallelstl" target="_blank">https://github.com/intel/parallelstl</a>).<br>
                            > It supports the C++17 standard
                            execution policies (seq, par, par_unseq) as
                            well as<br>
                            > the experimental unsequenced policy
                            (unseq) for SIMD execution. At the moment,<br>
                            > about half of the C++17 standard
                            algorithms that must support execution
                            policies<br>
                            > are implemented; a few more will be
                            ready soon, and the work continues.<br>
                            > The tests that we use are also
                            available at GitHub; needless to say we will<br>
                            > contribute those as well.<br>
                            ><br>
                            > The implementation is not specific to
                            Intel’s hardware. For thread-level
                            parallelism<br>
                            > it uses TBB* (<a href="https://www.threadingbuildingblocks.org/" target="_blank">https://www.threadingbuildingblocks.org/</a>)
                            but abstracts it with<br>
                            > an internal API which can be
                            implemented on top of other
                            threading/parallel solutions –<br>
                            > so it is for the community to decide
                            which ones to use. For SIMD parallelism<br>
                            > (unseq, par_unseq) we use #pragma omp
                            simd directives; it is vendor-neutral and<br>
                            > does not require any OpenMP runtime
                            support.<br>
                            ><br>
                            > The current implementation meets the
                            spirit but not always the letter of<br>
                            > the standard, because it has to be
                            separate from but also coexist with<br>
                            > implementations of standard C++
                            libraries. While preparing the contribution,<br>
                            > we will address inconsistencies, adjust
                            the code to meet community standards,<br>
                            > and better integrate it into the
                            standard library code.<br>
                            ><br>
                            > We are also proposing that our
                            implementation is included into
                            libstdc++/GCC.<br>
                            > Compatibility between the
                            implementations seems useful as it can
                            potentially<br>
                            > reduce the amount of work for everyone.
                            We hope to keep the code mostly identical,<br>
                            > and would like to know if you think
                            it’s too optimistic to expect.<br>
                            ><br>
                            > Obviously we plan to use appropriate
                            open source licenses to meet the different<br>
                            > projects’ requirements.<br>
                            ><br>
                            > We expect to keep developing the code
                            and will take the responsibility for<br>
                            > maintaining it (with community
                            contributions, of course). If there are
                            other<br>
                            > community efforts to implement parallel
                            algorithms, we are willing to collaborate.<br>
                            ><br>
                            > We look forward to your feedback, both
                            for the overall idea and – if supported –<br>
                            > for the next steps we should take.<br>
                            ><br>
                            > Regards,<br>
                            > - Alexey Kukanov<br>
                            ><br>
                            > * Note that TBB itself is highly
                            portable (and ported by the community to Power
                            and ARM<br>
                            > architectures) and permissively
                            licensed, so could be the base for the
                            threading<br>
                            > infrastructure. But the Parallel STL
                            implementation itself does not require TBB.<br>
                            ><br>
                            >
                            _______________________________________________<br>
                            > cfe-dev mailing list<br>
                            > <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
                            > <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
                            <br>
                            <br>
                            <br>
                            <br>
                            --<br>
                            Jeff Hammond<br>
                            <a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>
                            <a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a>
                            <div> </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                  <p><span>_______________________________________________<br>
                      cfe-dev mailing list<br>
                      <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
                      <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a></span></p>
                </blockquote>
              </blockquote>
            </div>
             
            <div> </div>
            --
            <div>Jeff Hammond<br>
              <a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>
              <a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>
          </div>
        </div>
      </blockquote>
      <br>
      <fieldset class="m_2065056468622040889mimeAttachmentHeader"></fieldset>
      <br>
      <pre>_______________________________________________
cfe-dev mailing list
<a class="m_2065056468622040889moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>
<a class="m_2065056468622040889moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
    </blockquote>
    <br>
    </div><div bgcolor="#FFFFFF" text="#000000"><pre class="m_2065056468622040889moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </div></blockquote></div></div><div dir="ltr">-- <br></div><div class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>