<div dir="ltr">It would be nice to keep PSTL and OpenMP orthogonal, even if _Pragma("omp simd") does not require runtime support.  It should be trivial to use _Pragma("clang loop vectorize(assume_safety)") instead, by wrapping all of the different compiler vectorization pragmas in preprocessor logic.  I similarly recommend _Pragma("GCC ivdep") for GCC and _Pragma("vector always") for ICC.  This requires O(n_compilers) effort instead of O(1), but orthogonality is worth it.<div><br></div><div>While OpenMP is vendor/compiler-agnostic, users should not be required to use -fopenmp or similar to enable vectorization from PSTL, nor should the compiler enable any OpenMP pragma by default.  I know of cases where merely using the -fopenmp flag alters code generation in a performance-visible manner, and enabling the OpenMP "simd" pragma by default may surprise some users, particularly if no other OpenMP pragmas are enabled by default.<div><br>Best,</div><div><br></div><div>Jeff</div><div>(who works for Intel but not on any software products and has been a heavy user of Intel PSTL since it was released, if anyone is keeping track of conflicts-of-interest)<br><br>On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br>><br>> Hello all,<br>><br>> At Intel, we have developed an implementation of C++17 execution policies<br>> for algorithms (often referred to as Parallel STL). We hope to contribute it<br>> to libc++/LLVM, so would like to ask the community for comments on this.<br>><br>> The code is already published at GitHub (<a href="https://github.com/intel/parallelstl">https://github.com/intel/parallelstl</a>).<br>> It supports the C++17 standard execution policies (seq, par, par_unseq) as well as<br>> the experimental unsequenced policy (unseq) for SIMD execution. 
At the moment,<br>> about half of the C++17 standard algorithms that must support execution policies<br>> are implemented; a few more will be ready soon, and the work continues.<br>> The tests that we use are also available at GitHub; needless to say we will<br>> contribute those as well.<br>><br>> The implementation is not specific to Intel’s hardware. For thread-level parallelism<br>> it uses TBB* (<a href="https://www.threadingbuildingblocks.org/">https://www.threadingbuildingblocks.org/</a>) but abstracts it with<br>> an internal API which can be implemented on top of other threading/parallel solutions –<br>> so it is for the community to decide which ones to use. For SIMD parallelism<br>> (unseq, par_unseq) we use #pragma omp simd directives; it is vendor-neutral and<br>> does not require any OpenMP runtime support.<br>><br>> The current implementation meets the spirit but not always the letter of<br>> the standard, because it has to be separate from but also coexist with<br>> implementations of standard C++ libraries. While preparing the contribution,<br>> we will address inconsistencies, adjust the code to meet community standards,<br>> and better integrate it into the standard library code.<br>><br>> We are also proposing that our implementation is included into libstdc++/GCC.<br>> Compatibility between the implementations seems useful as it can potentially<br>> reduce the amount of work for everyone. We hope to keep the code mostly identical,<br>> and would like to know if you think it’s too optimistic to expect.<br>><br>> Obviously we plan to use appropriate open source licenses to meet the different<br>> projects’ requirements.<br>><br>> We expect to keep developing the code and will take the responsibility for<br>> maintaining it (with community contributions, of course). 
If there are other<br>> community efforts to implement parallel algorithms, we are willing to collaborate.<br>><br>> We look forward to your feedback, both for the overall idea and – if supported –<br>> for the next steps we should take.<br>><br>> Regards,<br>> - Alexey Kukanov<br>><br>> * Note that TBB itself is highly portable (and ported by community to Power and ARM<br>> architectures) and permissively licensed, so could be the base for the threading<br>> infrastructure. But the Parallel STL implementation itself does not require TBB.<br>><br>> _______________________________________________<br>> cfe-dev mailing list<br>> <a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br><br><br><br><br>--<br>Jeff Hammond<br><a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/">http://jeffhammond.github.io/</a><br><div class="gmail_extra">

</div></div></div></div>