[cfe-dev] [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms

Sun Dec 3 20:23:03 PST 2017

On 12/03/2017 10:09 PM, Serge Preis via cfe-dev wrote:
> Hello,
> _Pragma("omp simd") is semantically quite different from
> _Pragma("clang loop vectorize(assume_safety)"), _Pragma("GCC ivdep") and
> _Pragma("vector always"), so I am not sure the latter three will work as
> expected in all cases. They definitely won't provide any
> vectorization guarantees, which somewhat defeats the purpose of using
> the corresponding execution policy.
> I support the idea of keeping OpenMP orthogonal, and having -fopenmp
> enabled by default is definitely not an option. The Intel compiler has
> a separate -qopenmp-simd option which doesn't affect performance outside
> explicitly marked loops,

A similar flag is being worked on for Clang (see 
https://reviews.llvm.org/D31417).

Maybe what we really need for this is some kind of '#pragma GCC
push_options' thing so that we can force OpenMP SIMD support on in
particular regions of code?

  -Hal

> but even this is not enabled by default. I would say that there might
> exist multiple implementations of the unsequenced policy: an OpenMP
> SIMD based implementation, which may be more powerful, and one based on
> other pragmas, which would be the default but would hint at the
> existence of a faster option.
> Later on one may be brave enough to add some SIMD template library and
> implement the default unsequenced policy using it (such an implementation
> is possible even now using vector types, but it will be extremely complex
> if it attempts to target all base data types, vector widths and target
> SIMD architectures Clang supports. Even with the library this may be
> quite tedious).
> Without any standard way of expressing SIMD parallelism in pure C++,
> any implementer of a SIMD execution policy has to rely on the means
> available for the platform/compiler, so it is not totally unnatural to
> ask the user to enable OpenMP SIMD for efficient support of the
> corresponding execution policy.
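
To illustrate the parenthetical about vector types: a minimal sketch using the GCC/Clang vector_size extension, assuming a compiler that supports that extension. The names f32x4 and sum_f32 are hypothetical and are not taken from any library mentioned in this thread. The sketch handles exactly one element type (float) and one vector width (four lanes), which hints at why covering every type, width and target by hand would be laborious.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical 4-lane float vector built on the GCC/Clang vector_size extension.
typedef float f32x4 __attribute__((vector_size(16)));

// Sums n floats: four lanes at a time, then a scalar loop for the remainder.
float sum_f32(const float* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    f32x4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        f32x4 v;
        std::memcpy(&v, p + i, sizeof v);  // load without assuming alignment
        acc += v;                          // element-wise add across the lanes
    }
    // Horizontal reduction of the four partial sums.
    float total = acc[0] + acc[1] + acc[2] + acc[3];
    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i) total += p[i];
    return total;
}
```

Supporting double, integers, or wider vectors would need a separate typedef and a different lane count for each case, or a template layer on top of them.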
> Regards,
> Serge Preis
> (Who once was part of the Intel Compiler Vectorizer team and drove OpenMP
> SIMD efforts within icc and beyond, if anyone is keeping track of
> conflicts-of-interest)
> 04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <cfe-dev at lists.llvm.org>:
>> It would be nice to keep PSTL and OpenMP orthogonal, even if 
>> _Pragma("omp simd") does not require runtime support.  It should be 
>> trivial to use _Pragma("clang loop vectorize(assume_safety)") 
>> instead, by wrapping all of the different compiler vectorization 
>> pragmas in preprocessor logic.  I similarly recommend
>> _Pragma("GCC ivdep") for GCC and _Pragma("vector always") for ICC.
>> This requires O(n_compilers) effort instead of O(1), but orthogonality
>> is worth it.
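
A minimal sketch of the preprocessor wrapping described above, assuming only the three pragmas named in this thread. The macro name PSTL_SKETCH_SIMD and the saxpy function are hypothetical and are not taken from the PSTL sources. As pointed out elsewhere in the thread, these per-compiler hints are not semantically equivalent to omp simd and do not guarantee vectorization.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: pick one vectorization hint per compiler.
// ICC and Clang also define __GNUC__, so they are tested first.
#if defined(__INTEL_COMPILER)
#  define PSTL_SKETCH_SIMD _Pragma("vector always")
#elif defined(__clang__)
#  define PSTL_SKETCH_SIMD _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#  define PSTL_SKETCH_SIMD _Pragma("GCC ivdep")
#else
#  define PSTL_SKETCH_SIMD  // unknown compiler: no hint, plain loop
#endif

// y[i] = a * x[i] + y[i]. The caller must ensure x and y do not overlap,
// because the hints above ask the compiler to assume there are no
// loop-carried dependences.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    PSTL_SKETCH_SIMD
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The macro has to sit immediately before the loop it annotates, since each of these pragmas applies to the loop statement that follows it.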
>> While OpenMP is vendor/compiler-agnostic, users should not be 
>> required to use -fopenmp or similar to enable vectorization from 
>> PSTL, nor should the compiler enable any OpenMP pragma by default.  I 
>> know of cases where merely using the -fopenmp flag alters code 
>> generation in a performance-visible manner, and enabling the OpenMP 
>> "simd" pragma by default may surprise some users, particularly if no 
>> other OpenMP pragmas are enabled by default.
>>
>> Best,
>> Jeff
>> (who works for Intel but not on any software products and has been a 
>> heavy user of Intel PSTL since it was released, if anyone is keeping 
>> track of conflicts-of-interest)
>>
>> On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>> >
>> > Hello all,
>> >
>> > At Intel, we have developed an implementation of C++17 execution policies
>> > for algorithms (often referred to as Parallel STL). We hope to contribute it
>> > to libc++/LLVM, so would like to ask the community for comments on this.
>> >
>> > The code is already published at GitHub (https://github.com/intel/parallelstl).
>> > It supports the C++17 standard execution policies (seq, par, par_unseq) as well as
>> > the experimental unsequenced policy (unseq) for SIMD execution. At the moment,
>> > about half of the C++17 standard algorithms that must support execution policies
>> > are implemented; a few more will be ready soon, and the work continues.
>> > The tests that we use are also available at GitHub; needless to say we will
>> > contribute those as well.
>> >
>> > The implementation is not specific to Intel’s hardware. For thread-level parallelism
>> > it uses TBB* (https://www.threadingbuildingblocks.org/) but abstracts it with
>> > an internal API which can be implemented on top of other threading/parallel solutions –
>> > so it is for the community to decide which ones to use. For SIMD parallelism
>> > (unseq, par_unseq) we use #pragma omp simd directives; it is vendor-neutral and
>> > does not require any OpenMP runtime support.
>> >
>> > The current implementation meets the spirit but not always the letter of
>> > the standard, because it has to be separate from but also coexist with
>> > implementations of standard C++ libraries. While preparing the contribution,
>> > we will address inconsistencies, adjust the code to meet community standards,
>> > and better integrate it into the standard library code.
>> >
>> > We are also proposing that our implementation be included in libstdc++/GCC.
>> > Compatibility between the implementations seems useful as it can potentially
>> > reduce the amount of work for everyone. We hope to keep the code mostly identical,
>> > and would like to know if you think it’s too optimistic to expect.
>> >
>> > Obviously we plan to use appropriate open source licenses to meet the different
>> > projects’ requirements.
>> >
>> > We expect to keep developing the code and will take the responsibility for
>> > maintaining it (with community contributions, of course). If there are other
>> > community efforts to implement parallel algorithms, we are willing to collaborate.
>> >
>> > We look forward to your feedback, both for the overall idea and –
>> > if supported – for the next steps we should take.
>> >
>> > Regards,
>> > - Alexey Kukanov
>> >
>> > * Note that TBB itself is highly portable (and ported by community to Power and ARM
>> > architectures) and permissively licensed, so could be the base for the threading
>> > infrastructure. But the Parallel STL implementation itself does not require TBB.
>> >
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
