[cfe-dev] [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms

Sun Dec 3 21:07:34 PST 2017

ICC implements a very aggressive interpretation of the OpenMP standard, and
this interpretation is not shared by everyone in the OpenMP community.  ICC
is correct, but other implementations may be far less aggressive, so
_Pragma("omp simd") doesn't guarantee vectorization unless the compiler
documentation says that is how it is implemented.  All the standard says
is that vectorization is _permitted_.

Given that the practical meaning of _Pragma("omp simd") isn't guaranteed to
be consistent across different implementations, I don't really know how to
compare it to compiler-specific pragmas unless we define everything
explicitly.
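
To make the ambiguity concrete, here is a minimal sketch of the kind of
loop under discussion. The function name dot_sketch is hypothetical and
used only for illustration. Built without -fopenmp or -fopenmp-simd, GCC
and Clang typically ignore the pragma; built with one of them, the loop
may be vectorized, but whether it actually is depends on the
implementation. The reduction clause is one example of information the
OpenMP form carries that the compiler-specific hints do not express.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only.
// The pragma asserts that the iterations may be executed in SIMD fashion
// and that `sum` is a reduction variable. It permits vectorization; it
// does not by itself force the compiler to vectorize.
float dot_sketch(const float* x, const float* y, std::size_t n) {
    float sum = 0.0f;
    _Pragma("omp simd reduction(+:sum)")
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The result must be the same whether or not the pragma is honored, which
is why the code stays valid when OpenMP support is switched off.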

In any case, my fundamental point remains: do not use OpenMP pragmas here,
but instead use whatever the appropriate compiler-specific pragma is, or
create a new one that meets the need.
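
A rough sketch of what I mean by wrapping the compiler-specific pragmas
in preprocessor logic follows. The macro name PSTL_SKETCH_SIMD and the
function axpy_sketch are made up for this example and are not part of
PSTL or any compiler; the pragmas are the ones named in this thread, and
the predefined macros used to detect each compiler are my assumption
about a reasonable detection order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical macro, for illustration only. The Intel check comes first
// because ICC also defines __GNUC__; Clang is checked before GCC for the
// same reason.
#if defined(__INTEL_COMPILER)
#  define PSTL_SKETCH_SIMD _Pragma("vector always")
#elif defined(__clang__)
#  define PSTL_SKETCH_SIMD _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#  define PSTL_SKETCH_SIMD _Pragma("GCC ivdep")
#else
#  define PSTL_SKETCH_SIMD  /* unknown compiler: no hint */
#endif

// y[i] += a * x[i], with a vectorization hint placed directly on the loop.
// The hints are not equivalent to each other or to "omp simd"; each one
// only tells its compiler that vectorizing this loop is acceptable.
void axpy_sketch(float a, const float* x, float* y, std::size_t n) {
    PSTL_SKETCH_SIMD
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

No OpenMP flag is needed to build this, which is the orthogonality I am
arguing for. The cost is one branch per supported compiler.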

Best,

Jeff

On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <spreis at yandex-team.ru> wrote:

> Hello,
>
> _Pragma("omp simd") is semantically quite different from _Pragma("clang
> loop vectorize(assume_safety)"), _Pragma("GCC ivdep") and _Pragma("vector
> always"), so I am not sure all of the latter will work as expected in all
> cases. They definitely won't provide any vectorization guarantees, which
> slightly defeats the purpose of using the corresponding execution policy.
>
> I support the idea of keeping OpenMP orthogonal, and having -fopenmp
> enabled by default is definitely not an option. The Intel compiler has a
> separate -qopenmp-simd option which doesn't affect performance outside
> explicitly marked loops, but even this is not enabled by default. I would
> say that multiple implementations of the unordered policy might exist:
> initially, an OpenMP SIMD based implementation may be the more powerful
> one, with an implementation based on other pragmas being the default but
> hinting at the existence of the faster option. Later on, one may be brave
> enough to add some SIMD template library and implement the default
> unordered policy using it (such an implementation is possible even now
> using vector types, but it will be extremely complex if it attempts to
> target all base data types, vector widths and target SIMD architectures
> clang supports. Even with the library this may be quite tedious).
>
> Without any standard way of expressing SIMD parallelism in pure C++, any
> implementer of a SIMD execution policy has to rely on the means available
> for the platform/compiler, so it is not totally unnatural to ask the user
> to enable OpenMP SIMD for efficient support of the corresponding execution
> policy.
>
> Regards,
> Serge Preis
>
> (Who once was part of the Intel Compiler Vectorizer team and drove OpenMP
> SIMD efforts within icc and beyond, if anyone is keeping track of
> conflicts-of-interest)
>
>
> 04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <cfe-dev at lists.llvm.org>:
>
> It would be nice to keep PSTL and OpenMP orthogonal, even if _Pragma("omp
> simd") does not require runtime support.  It should be trivial to use
> _Pragma("clang loop vectorize(assume_safety)") instead, by wrapping all of
> the different compiler vectorization pragmas in preprocessor logic.  I
> similarly recommend _Pragma("GCC ivdep") for GCC and _Pragma("vector
> always") for ICC.  This requires O(n_compilers) effort instead of O(1),
> but orthogonality is worth it.
>
> While OpenMP is vendor/compiler-agnostic, users should not be required to
> use -fopenmp or similar to enable vectorization from PSTL, nor should the
> compiler enable any OpenMP pragma by default.  I know of cases where merely
> using the -fopenmp flag alters code generation in a performance-visible
> manner, and enabling the OpenMP "simd" pragma by default may surprise some
> users, particularly if no other OpenMP pragmas are enabled by default.
>
> Best,
>
> Jeff
> (who works for Intel but not on any software products and has been a heavy
> user of Intel PSTL since it was released, if anyone is keeping track of
> conflicts-of-interest)
>
> On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >
> > Hello all,
> >
> > At Intel, we have developed an implementation of C++17 execution policies
> > for algorithms (often referred to as Parallel STL). We hope to contribute
> > it to libc++/LLVM, so would like to ask the community for comments on this.
> >
> > The code is already published at GitHub
> > (https://github.com/intel/parallelstl).
> > It supports the C++17 standard execution policies (seq, par, par_unseq) as
> > well as the experimental unsequenced policy (unseq) for SIMD execution. At
> > the moment, about half of the C++17 standard algorithms that must support
> > execution policies are implemented; a few more will be ready soon, and the
> > work continues. The tests that we use are also available at GitHub;
> > needless to say we will contribute those as well.
> >
> > The implementation is not specific to Intel’s hardware. For thread-level
> > parallelism it uses TBB* (https://www.threadingbuildingblocks.org/) but
> > abstracts it with an internal API which can be implemented on top of other
> > threading/parallel solutions – so it is for the community to decide which
> > ones to use. For SIMD parallelism (unseq, par_unseq) we use #pragma omp
> > simd directives; it is vendor-neutral and does not require any OpenMP
> > runtime support.
> >
> > The current implementation meets the spirit but not always the letter of
> > the standard, because it has to be separate from but also coexist with
> > implementations of standard C++ libraries. While preparing the
> > contribution, we will address inconsistencies, adjust the code to meet
> > community standards, and better integrate it into the standard library
> > code.
> >
> > We are also proposing that our implementation is included into
> > libstdc++/GCC. Compatibility between the implementations seems useful as
> > it can potentially reduce the amount of work for everyone. We hope to keep
> > the code mostly identical, and would like to know if you think it’s too
> > optimistic to expect.
> >
> > Obviously we plan to use appropriate open source licenses to meet the
> > different projects’ requirements.
> >
> > We expect to keep developing the code and will take the responsibility for
> > maintaining it (with community contributions, of course). If there are
> > other community efforts to implement parallel algorithms, we are willing
> > to collaborate.
> >
> > We look forward to your feedback, both for the overall idea and – if
> > supported – for the next steps we should take.
> >
> > Regards,
> > - Alexey Kukanov
> >
> > * Note that TBB itself is highly portable (and ported by community to
> > Power and ARM architectures) and permissively licensed, so could be the
> > base for the threading infrastructure. But the Parallel STL implementation
> > itself does not require TBB.
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/