[cfe-dev] [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Wed Dec 6 16:23:04 PST 2017


On 12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:
> I agree that the guarantees provided by ICC may be stronger than with 
> other compilers, so yes, under OpenMP's terms vectorization is permitted 
> and cannot be assumed. However, OpenMP clearly defines the semantics of 
> variables used within an OpenMP region: some are shared (scalar), some 
> private (vector), and some are inductions. This goes far beyond 
> typical compiler-specific pragmas about dependencies and cost 
> modelling, and it makes vectorization a much simpler task with more 
> predictable and robust results if properly implemented (admittedly, 
> even the ICC implementation is far from perfect). I hope Intel's efforts 
> to standardize something like this in core C++ will eventually come to 
> fruition. Until then, I as a regular application developer would 
> appreciate an OpenMP-SIMD-based execution policy (hoping for good support 
> for OpenMP SIMD in clang), but it shouldn't necessarily be part of 
> libc++. Since the 'unordered' execution policy is currently not part of 
> the C++ standard

std::execution::par_unseq is part of C++17, and that essentially maps to 
'#pragma omp parallel for simd'.
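As a rough illustration (the lowering shown here is conceptual; the 
actual code generation is implementation-defined), a call such as

    #include <algorithm>
    #include <cstddef>
    #include <execution>
    #include <vector>

    void scale(std::vector<float> &v) {
        // par_unseq: iterations may run on multiple threads and may
        // be interleaved (vectorized) within each thread.
        std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                      [](float &x) { x *= 2.0f; });
    }

behaves much like

    void scale_omp(std::vector<float> &v) {
        #pragma omp parallel for simd
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] *= 2.0f;
    }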

> I don't care much about how it will be implemented in libc++, if it is. I 
> would just like to ask the Intel folks and the community here to make the 
> implementation extensible, in the sense that a custom OpenMP-SIMD-based 
> execution policy, along with algorithm implementations (as 
> specializations for the policy), can be used with the libc++ library. 
> And I would additionally like to ask the Intel folks to provide a complete 
> and compatible extension on GitHub for developers like me to use.

In the end, I think we want the following:

  1. A design for libc++ that allows the thread-level parallelism to be 
implemented in terms of different underlying providers (i.e., OpenMP, 
GCD, Work Queues on Windows, whatever else); a sketch follows below.
  2. To follow the same philosophy with respect to standards as we do 
everywhere else: use standards where possible, with 
compiler/system-specific extensions as necessary.
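
A minimal sketch of what such a provider abstraction might look like 
(all names here are hypothetical, not an actual libc++ interface):

    namespace __pstl_backend {

    // Each provider supplies one definition of this primitive,
    // selected when libc++ is configured:
    //   OpenMP provider:  #pragma omp parallel for over the range
    //   GCD provider:     dispatch_apply over chunks of the range
    //   Windows provider: work items submitted to a thread pool
    template <class _Index, class _Body>
    void __parallel_for(_Index __first, _Index __last, _Body __body);

    } // namespace __pstl_backend

The parallel algorithms would then be written against __parallel_for 
(plus a few similar primitives, e.g. a parallel reduce), leaving the 
choice of provider to the build configuration.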

  -Hal

> Regards,
> Serge.
> 04.12.2017, 12:07, "Jeff Hammond" <jeff.science at gmail.com>:
>> ICC implements a very aggressive interpretation of the OpenMP 
>> standard, and this interpretation is not shared by everyone in the 
>> OpenMP community.  ICC is correct but other implementations may be 
>> far less aggressive, so _Pragma("omp simd") doesn't guarantee 
>> vectorization unless the compiler documentation says that is how it 
>> is implemented.  All the standard says is that vectorization is 
>> _permitted_.
>> Given that the practical meaning of _Pragma("omp simd") isn't 
>> guaranteed to be consistent across different implementations, I don't 
>> really know how to compare it to compiler-specific pragmas unless we 
>> define everything explicitly.
>> In any case, my fundamental point remains: do not use OpenMP pragmas 
>> here, but instead use whatever the appropriate compiler-specific 
>> pragma is, or create a new one that meets the need.
>> Best,
>> Jeff
>> On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <spreis at yandex-team.ru> 
>> wrote:
>>
>>     Hello,
>>     _Pragma("omp simd") is semantically quite different from
>>     _Pragma("clang loop vectorize(assume_safety)"), _Pragma("GCC
>>     ivdep") and _Pragma("vector always"), so I am not sure all latter
>>     will work as expected in all cases. They definitely won't provide
>>     any vectorization guarantees which slightly defeat the purpose of
>>     using corresponding execution policy.
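>>
>>     For example (illustrative only), consider a reduction:
>>
>>         float sum = 0.f;
>>         #pragma omp simd reduction(+:sum)
>>         for (int i = 0; i < n; ++i)
>>             sum += a[i];
>>
>>     OpenMP SIMD defines how 'sum' is privatized per SIMD lane and
>>     combined after the loop, whereas _Pragma("clang loop
>>     vectorize(assume_safety)") and _Pragma("GCC ivdep") merely
>>     assert the absence of loop-carried dependencies and attach no
>>     such per-variable semantics.
>>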
>>     I support the idea of keeping OpenMP orthogonal, and definitely
>>     having -fopenmp enabled by default is not an option. The Intel
>>     compiler has a separate -qopenmp-simd option which doesn't affect
>>     performance outside explicitly marked loops, but even this is not
>>     enabled by default. I would say that multiple implementations of
>>     the unordered policy might exist: initially an OpenMP-SIMD-based
>>     implementation may be the more powerful one, with one based on
>>     other pragmas being the default but hinting at the existence of
>>     the faster option. Later on, someone may be brave enough to add a
>>     SIMD template library and implement the default unordered policy
>>     using it (such an implementation is possible even now using
>>     vector types, but it will be extremely complex if it attempts to
>>     target all the base data types, vector widths and target SIMD
>>     architectures that clang supports; even with the library this may
>>     be quite tedious).
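>>
>>     To illustrate the scale of that problem with the clang/GCC
>>     vector extensions (a sketch only, not library code):
>>
>>         typedef float float8 __attribute__((vector_size(32)));
>>
>>         float8 axpy(float8 a, float8 x, float8 y) {
>>             return a * x + y;  // element-wise across 8 lanes
>>         }
>>
>>     Analogous code would be needed for every combination of element
>>     type, vector width and target ISA.
>>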
>>     Without any standard way of expressing SIMD parallelism in pure
>>     C++, any implementer of a SIMD execution policy has to rely on
>>     the means available for the platform/compiler, so it is not
>>     totally unnatural to ask the user to enable OpenMP SIMD for
>>     efficient support of the corresponding execution policy.
>>     Regards,
>>     Serge Preis
>>     (Who once was part of the Intel Compiler vectorizer team and
>>     drove OpenMP SIMD efforts within icc and beyond, if anyone is
>>     keeping track of conflicts-of-interest)
>>     04.12.2017, 08:46, "Jeff Hammond via cfe-dev"
>>     <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>:
>>>     It would be nice to keep PSTL and OpenMP orthogonal, even if
>>>     _Pragma("omp simd") does not require runtime support.  It should
>>>     be trivial to use _Pragma("clang loop vectorize(assume_safety)")
>>>     instead, by wrapping all of the different compiler vectorization
>>>     pragmas in preprocessor logic.  I similarly recommend
>>>     _Pragma("GCC ivdep") for GCC and _Pragma("vector always") for
>>>     ICC.  While this requires O(n_compilers) effort instead of O(1),
>>>     orthogonality is worth it.
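>>>
>>>     A sketch of such dispatch (the macro name is hypothetical):
>>>
>>>         #if defined(__INTEL_COMPILER)
>>>         #  define _PSTL_VECTORIZE _Pragma("vector always")
>>>         #elif defined(__clang__)
>>>         #  define _PSTL_VECTORIZE \
>>>                _Pragma("clang loop vectorize(assume_safety)")
>>>         #elif defined(__GNUC__)
>>>         #  define _PSTL_VECTORIZE _Pragma("GCC ivdep")
>>>         #else
>>>         #  define _PSTL_VECTORIZE
>>>         #endif
>>>
>>>     (__INTEL_COMPILER must be tested before __GNUC__, since ICC
>>>     also defines __GNUC__.)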
>>>     While OpenMP is vendor/compiler-agnostic, users should not be
>>>     required to use -fopenmp or similar to enable vectorization from
>>>     PSTL, nor should the compiler enable any OpenMP pragma by
>>>     default.  I know of cases where merely using the -fopenmp flag
>>>     alters code generation in a performance-visible manner, and
>>>     enabling the OpenMP "simd" pragma by default may surprise some
>>>     users, particularly if no other OpenMP pragmas are enabled by
>>>     default.
>>>
>>>     Best,
>>>     Jeff
>>>     (who works for Intel but not on any software products and has
>>>     been a heavy user of Intel PSTL since it was released, if anyone
>>>     is keeping track of conflicts-of-interest)
>>>
>>>     On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via cfe-dev
>>>     <cfe-dev at lists.llvm.org> wrote:
>>>     >
>>>     > Hello all,
>>>     >
>>>     > At Intel, we have developed an implementation of C++17
>>>     > execution policies for algorithms (often referred to as
>>>     > Parallel STL). We hope to contribute it to libc++/LLVM, so
>>>     > would like to ask the community for comments on this.
>>>     >
>>>     > The code is already published at GitHub
>>>     > (https://github.com/intel/parallelstl).
>>>     > It supports the C++17 standard execution policies (seq, par,
>>>     > par_unseq) as well as the experimental unsequenced policy
>>>     > (unseq) for SIMD execution. At the moment, about half of the
>>>     > C++17 standard algorithms that must support execution policies
>>>     > are implemented; a few more will be ready soon, and the work
>>>     > continues. The tests that we use are also available at GitHub;
>>>     > needless to say, we will contribute those as well.
>>>     >
>>>     > The implementation is not specific to Intel’s hardware. For
>>>     > thread-level parallelism it uses TBB*
>>>     > (https://www.threadingbuildingblocks.org/) but abstracts it
>>>     > with an internal API which can be implemented on top of other
>>>     > threading/parallel solutions – so it is for the community to
>>>     > decide which ones to use. For SIMD parallelism (unseq,
>>>     > par_unseq) we use #pragma omp simd directives; it is
>>>     > vendor-neutral and does not require any OpenMP runtime support.
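>>>     >
>>>     > For instance, the unseq loops can be built on helpers roughly
>>>     > of this shape (a simplified sketch; names need not match the
>>>     > actual sources):
>>>     >
>>>     >     template <class _It, class _Size, class _Fn>
>>>     >     void __simd_walk(_It __first, _Size __n, _Fn __f) {
>>>     >         #pragma omp simd
>>>     >         for (_Size __i = 0; __i < __n; ++__i)
>>>     >             __f(__first[__i]);
>>>     >     }
>>>     >
>>>     > Only the directive is involved; the compiler vectorizes the
>>>     > loop, and no calls into an OpenMP runtime library are emitted.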
>>>     >
>>>     > The current implementation meets the spirit but not always the
>>>     > letter of the standard, because it has to be separate from but
>>>     > also coexist with implementations of standard C++ libraries.
>>>     > While preparing the contribution, we will address
>>>     > inconsistencies, adjust the code to meet community standards,
>>>     > and better integrate it into the standard library code.
>>>     >
>>>     > We are also proposing that our implementation be included in
>>>     > libstdc++/GCC. Compatibility between the implementations seems
>>>     > useful, as it can potentially reduce the amount of work for
>>>     > everyone. We hope to keep the code mostly identical, and would
>>>     > like to know if you think it’s too optimistic to expect.
>>>     >
>>>     > Obviously we plan to use appropriate open source licenses to
>>>     > meet the different projects’ requirements.
>>>     >
>>>     > We expect to keep developing the code and will take
>>>     > responsibility for maintaining it (with community
>>>     > contributions, of course). If there are other community
>>>     > efforts to implement parallel algorithms, we are willing to
>>>     > collaborate.
>>>     >
>>>     > We look forward to your feedback, both on the overall idea
>>>     > and – if supported – on the next steps we should take.
>>>     >
>>>     > Regards,
>>>     > - Alexey Kukanov
>>>     >
>>>     > * Note that TBB itself is highly portable (and ported by the
>>>     > community to Power and ARM architectures) and permissively
>>>     > licensed, so it could be the base for the threading
>>>     > infrastructure. But the Parallel STL implementation itself
>>>     > does not require TBB.
>>>     >
>>>
>>>     --
>>>     Jeff Hammond
>>>     jeff.science at gmail.com
>>>     http://jeffhammond.github.io/
>>>
>> -- 
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
