[cfe-dev] [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms

Jeff Hammond via cfe-dev cfe-dev at lists.llvm.org
Thu Dec 7 09:35:44 PST 2017


On Wed, Dec 6, 2017 at 8:57 PM Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 12/06/2017 10:23 PM, Jeff Hammond wrote:
>
>
> On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
>>
>> On 12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:
>>
>> I agree that the guarantees provided by ICC may be stronger than with other
>> compilers, so yes, under OpenMP terms vectorization is permitted and cannot
>> be assumed. However, OpenMP clearly defines the semantics of variables used
>> within an OpenMP region: some are shared (scalar), some private (vector),
>> and some are inductions. This goes far beyond typical compiler-specific
>> pragmas about dependencies and cost modelling, and it makes vectorization a
>> much simpler task with more predictable and robust results if properly
>> implemented (admittedly, even the ICC implementation is far from perfect). I
>> hope Intel's efforts to standardize something like this in core C++ will
>> eventually come to fruition. Until then, I as a regular application developer
>> would appreciate an OpenMP-SIMD-based execution policy (hoping for good
>> support for OpenMP SIMD in clang), but it shouldn't necessarily be part of
>> libc++. Since the 'unordered' execution policy is currently not part of the
>> C++ standard
>>
>>
>> std::execution::par_unseq is part of C++17, and that essentially maps to
>> '#pragma omp parallel for simd'.
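>>
>> For illustration, a minimal sketch of how a par_unseq loop could lower to
>> that pragma; the wrapper below is hypothetical, not actual libc++ code,
>> and assumes random-access iterators:
>>
>> #include <cstddef>
>>
>> template <class It, class F>
>> void for_each_par_unseq(It first, It last, F f) {
>>   const std::ptrdiff_t n = last - first;
>>   // Threads split the iteration space; each thread's chunk may then
>>   // be vectorized.
>>   #pragma omp parallel for simd
>>   for (std::ptrdiff_t i = 0; i < n; ++i)
>>     f(first[i]);
>> }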
>>
>>
> Do you expect par/par_unseq to nest?
>
>
> Yes.
>
>
> Nesting omp-parallel is generally regarded as a Bad Idea.
>
>
> Agreed. I suspect we'll want the mapping to be more like '#pragma omp
> taskloop simd'.
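>
> A minimal sketch of that alternative mapping, under the same
> hypothetical-wrapper caveat as above:
>
> #include <cstddef>
>
> template <class It, class F>
> void for_each_par_unseq(It first, It last, F f) {
>   const std::ptrdiff_t n = last - first;
>   // taskloop generates tasks instead of opening a nested parallel
>   // region, so nested invocations compose; simd vectorizes each
>   // task's chunk.
>   #pragma omp taskloop simd
>   for (std::ptrdiff_t i = 0; i < n; ++i)
>     f(first[i]);
> }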
>
>
That won’t run in parallel unless it executes inside an omp parallel/master
region. That means an OpenMP-based PSTL won’t be parallel unless the user
knows to add back-end-specific code around the PSTL calls.
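
Concretely, with a taskloop-based backend the user would have to write
something like the following sketch (assuming such a hypothetical backend)
for the algorithm to actually run in parallel:

#include <algorithm>
#include <execution>
#include <vector>

void scale(std::vector<double>& v) {
  // Without this enclosing parallel/master region, the taskloop inside
  // the algorithm has no thread team and runs sequentially.
  #pragma omp parallel
  #pragma omp master
  std::for_each(std::execution::par, v.begin(), v.end(),
                [](double& x) { x *= 2.0; });
}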

What I’m trying to say is that OpenMP is a poor target for PSTL in its
current form. Nested parallel regions are the only thing that works, and
they are likely to work poorly.

Jeff


>  -Hal
>
>
>
> Jeff
>
>
>> I don't care much about how it will be implemented in libc++, if it is. I
>> would just like to ask the Intel folks and the community here to make the
>> implementation extensible, in the sense that a custom OpenMP-SIMD-based
>> execution policy, along with algorithm implementations (as specializations
>> for the policy), can be used with the libc++ library. And I would
>> additionally like to ask the Intel folks to provide a complete and
>> compatible extension on GitHub for developers like me to use.
>>
>>
>> In the end, I think we want the following:
>>
>>  1. A design for libc++ that allows the thread-level parallelism to be
>> implemented in terms of different underlying providers (e.g., OpenMP, GCD,
>> Work Queues on Windows, whatever else).
>>  2. To follow the same philosophy with respect to standards as we do
>> everywhere else: Use standards where possible with compiler/system-specific
>> extensions as necessary.
>>
>>  -Hal
>>
>>
>>
>> Regards,
>> Serge.
>>
>>
>>
>> 04.12.2017, 12:07, "Jeff Hammond" <jeff.science at gmail.com>:
>>
>> ICC implements a very aggressive interpretation of the OpenMP standard,
>> and this interpretation is not shared by everyone in the OpenMP community.
>> ICC is correct, but other implementations may be far less aggressive, so
>> _Pragma("omp simd") doesn't guarantee vectorization unless the compiler
>> documentation says that is how it is implemented.  All the standard says
>> is that vectorization is _permitted_.
>>
>> Given that the practical meaning of _Pragma("omp simd") isn't guaranteed
>> to be consistent across different implementations, I don't really know how
>> to compare it to compiler-specific pragmas unless we define everything
>> explicitly.
>>
>> In any case, my fundamental point remains: do not use OpenMP pragmas
>> here, but instead use whatever the appropriate compiler-specific pragma is,
>> or create a new one that meets the need.
>>
>> Best,
>>
>> Jeff
>>
>>
>> On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <spreis at yandex-team.ru>
>> wrote:
>>
>> Hello,
>>
>> _Pragma("omp simd") is semantically quite different from _Pragma("clang
>> loop vectorize(assume_safety)"), _Pragma("GCC ivdep") and _Pragma("vector
>> always"), so I am not sure all latter will work as expected in all cases.
>> They definitely won't provide any vectorization guarantees which slightly
>> defeat the purpose of using corresponding execution policy.
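>>
>> To make the difference concrete: OpenMP lets the programmer state the
>> semantics of each variable in the loop, which the hint-style pragmas have
>> no syntax for, e.g.:
>>
>> float dot(const float* a, const float* b, int n) {
>>   float sum = 0.f, t;
>>   #pragma omp simd private(t) reduction(+:sum)
>>   for (int i = 0; i < n; ++i) {
>>     t = a[i] * b[i];  // t is private to each SIMD lane
>>     sum += t;         // partial sums are combined across lanes
>>   }
>>   return sum;
>> }
>>
>> whereas _Pragma("clang loop vectorize(assume_safety)") only asserts that
>> there are no loop-carried dependencies and leaves everything else to the
>> vectorizer's own analysis.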
>>
I support the idea of keeping OpenMP orthogonal, and definitely having
-fopenmp enabled by default is not an option. The Intel compiler has a
separate -qopenmp-simd option which doesn't affect performance outside
explicitly marked loops, but even this is not enabled by default. I would
say that multiple implementations of the unordered policy might exist:
originally, an OpenMP-SIMD-based implementation may be the more powerful
one, with one based on other pragmas being the default but hinting at the
existence of the faster option. Later on, someone may be brave enough to
add a SIMD template library and implement the default unordered policy
using it (such an implementation is possible even now using vector types,
but it would be extremely complex if it attempted to target all the base
data types, vector widths, and target SIMD architectures that clang
supports; even with such a library this may be quite tedious).
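>>
>> As a taste of why the vector-types route is so tedious, here is a minimal
>> sketch using the Clang/GCC vector extension, hardwired to one element
>> type and one width (8 x float):
>>
>> typedef float v8sf __attribute__((vector_size(32)));
>>
>> void vadd(const float* a, const float* b, float* c, int n) {
>>   int i = 0;
>>   for (; i + 8 <= n; i += 8) {
>>     v8sf va, vb;
>>     __builtin_memcpy(&va, a + i, sizeof va);  // unaligned vector load
>>     __builtin_memcpy(&vb, b + i, sizeof vb);
>>     v8sf vc = va + vb;                        // one SIMD add
>>     __builtin_memcpy(c + i, &vc, sizeof vc);  // vector store
>>   }
>>   for (; i < n; ++i)  // scalar remainder loop
>>     c[i] = a[i] + b[i];
>> }
>>
>> Every element type, vector width, and operation needs its own variant of
>> this, for every target architecture clang supports.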
>>
Without any standard way of expressing SIMD parallelism in pure C++, any
implementer of a SIMD execution policy has to rely on the means available
for the platform/compiler, so it is not totally unnatural to ask the user
to enable OpenMP SIMD for efficient support of the corresponding execution
policy.
>>
Regards,
Serge Preis

(who was once part of the Intel Compiler vectorizer team and drove the
OpenMP SIMD efforts within icc and beyond, if anyone is keeping track of
conflicts-of-interest)
>>
>>
>> 04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <cfe-dev at lists.llvm.org>:
>>
>> It would be nice to keep PSTL and OpenMP orthogonal, even if _Pragma("omp
>> simd") does not require runtime support.  It should be trivial to use
>> _Pragma("clang loop vectorize(assume_safety)") instead, by wrapping all of
>> the different compiler vectorization pragmas in preprocessor logic.  I
>> similarly recommend _Pragma("GCC ivdep") for GCC and _Pragma("vector
>> always") for ICC.  While this requires O(n_compilers) effort instead of
>> O(1), but orthogonality is worth it.
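>>
>> A sketch of that preprocessor logic (the macro name is invented for
>> illustration):
>>
>> // Order matters: ICC defines __GNUC__, and clang defines __GNUC__ too.
>> #if defined(__INTEL_COMPILER)
>> # define PSTL_SIMD_HINT _Pragma("vector always")
>> #elif defined(__clang__)
>> # define PSTL_SIMD_HINT _Pragma("clang loop vectorize(assume_safety)")
>> #elif defined(__GNUC__)
>> # define PSTL_SIMD_HINT _Pragma("GCC ivdep")
>> #else
>> # define PSTL_SIMD_HINT
>> #endif
>>
>> void vadd(const float* a, const float* b, float* c, int n) {
>>   PSTL_SIMD_HINT
>>   for (int i = 0; i < n; ++i)
>>     c[i] = a[i] + b[i];
>> }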
>>
>> While OpenMP is vendor/compiler-agnostic, users should not be required to
>> use -fopenmp or similar to enable vectorization from PSTL, nor should the
>> compiler enable any OpenMP pragma by default.  I know of cases where merely
>> using the -fopenmp flag alters code generation in a performance-visible
>> manner, and enabling the OpenMP "simd" pragma by default may surprise some
>> users, particularly if no other OpenMP pragmas are enabled by default.
>>
>> Best,
>>
>> Jeff
>> (who works for Intel but not on any software products and has been a
>> heavy user of Intel PSTL since it was released, if anyone is keeping track
>> of conflicts-of-interest)
>>
>> On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>> >
>> > Hello all,
>> >
>> > At Intel, we have developed an implementation of C++17 execution policies
>> > for algorithms (often referred to as Parallel STL). We hope to contribute
>> > it to libc++/LLVM, so we would like to ask the community for comments on
>> > this.
>> >
>> > The code is already published at GitHub
>> > (https://github.com/intel/parallelstl).
>> > It supports the C++17 standard execution policies (seq, par, par_unseq)
>> > as well as the experimental unsequenced policy (unseq) for SIMD execution.
>> > At the moment, about half of the C++17 standard algorithms that must
>> > support execution policies are implemented; a few more will be ready
>> > soon, and the work continues. The tests that we use are also available
>> > at GitHub; needless to say, we will contribute those as well.
>> >
>> > The implementation is not specific to Intel’s hardware. For thread-level
>> > parallelism it uses TBB* (https://www.threadingbuildingblocks.org/) but
>> > abstracts it with an internal API which can be implemented on top of
>> > other threading/parallel solutions – so it is for the community to decide
>> > which ones to use. For SIMD parallelism (unseq, par_unseq) we use
>> > #pragma omp simd directives; it is vendor-neutral and does not require
>> > any OpenMP runtime support.
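>> >
>> > A minimal sketch of the shape of such an internal API (the names here
>> > are illustrative, not the actual ones in the repository):
>> >
>> > namespace __par_backend {
>> > // Each provider (TBB, OpenMP, GCD, ...) implements this entry point;
>> > // f is applied to every index in [first, last).
>> > template <class Index, class F>
>> > void parallel_for(Index first, Index last, F f);
>> >
>> > // e.g. a trivial serial fallback provider:
>> > template <class Index, class F>
>> > void parallel_for_serial(Index first, Index last, F f) {
>> >   for (; first != last; ++first) f(first);
>> > }
>> > } // namespace __par_backend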
>> >
>> > The current implementation meets the spirit but not always the letter of
>> > the standard, because it has to be separate from but also coexist with
>> > implementations of standard C++ libraries. While preparing the
>> > contribution, we will address inconsistencies, adjust the code to meet
>> > community standards, and better integrate it into the standard library
>> > code.
>> >
>> > We are also proposing that our implementation be included into
>> > libstdc++/GCC. Compatibility between the implementations seems useful as
>> > it can potentially reduce the amount of work for everyone. We hope to
>> > keep the code mostly identical, and would like to know if you think that
>> > is too optimistic to expect.
>> >
>> > Obviously we plan to use appropriate open source licenses to meet the
>> > different projects’ requirements.
>> >
>> > We expect to keep developing the code and will take the responsibility
>> > for maintaining it (with community contributions, of course). If there
>> > are other community efforts to implement parallel algorithms, we are
>> > willing to collaborate.
>> >
>> > We look forward to your feedback, both for the overall idea and – if
>> > supported – for the next steps we should take.
>> >
>> > Regards,
>> > - Alexey Kukanov
>> >
>> > * Note that TBB itself is highly portable (and has been ported by the
>> > community to Power and ARM architectures) and permissively licensed, so
>> > it could be the base for the threading infrastructure. But the Parallel
>> > STL implementation itself does not require TBB.
>> >
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/