[cfe-dev] [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Wed Dec 6 20:57:29 PST 2017
On 12/06/2017 10:23 PM, Jeff Hammond wrote:
>
> On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>> wrote:
>
>
> On 12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:
>> I agree that guarantees provided by ICC may be stronger than with
>> other compilers, so yes, under OpenMP terms vectorization is
>> permitted and cannot be assumed. However OpenMP clearly defines
>> semantics of variables used within OpenMP region some being
>> shared(scalar), some private(vector) and some being inductions.
>> This goes far beyond typical compiler specific pragmas about
>> dependencies and cost modelling and makes vectorization much
>> simpler task with more predictable and robust results if properly
>> implemented (admittedly, even ICC implementation is far from
>> perfect). I hope Intel's efforts to standardize someting like
>> this in core C++ will evntually come to fruition. Until then I as
>> a regular application developer would appreciate OpenMP-simd
>> based execution policy (hoping for good support for OpenMP SIMD
>> in clang), but it shouldn't necessary be part of libc++. Since
>> 'unordered' execution policy is currently not part of C++ standard
>
> std::execution::par_unseq is part of C++17, and that essentially
> maps to '#pragma omp parallel for simd'.
>
>
> Do you expect par/par_unseq to nest?
Yes.
> Nesting omp-parallel is generally regarded as a Bad Idea.
Agreed. I suspect we'll want the mapping to be more like '#pragma omp
taskloop simd'.
-Hal
>
> Jeff
>
>
>> I don't care much on how it will be implemneted in libc++ if it
>> is. I just would like to ask Intel guys and community here to
>> make implementation extensible in a sense that custom
>> OpenMP-SIMD-based execution policy along with algorithms
>> implementations (as specializations for the policy) can be used
>> with the libc++ library. And I additionally would like to ask
>> Intel guys to provide complete and compatible extension on github
>> for developers like me to use.
>
> In the end, I think we want the following:
>
> 1. A design for libc++ that allows the thread-level parallelism
> to be implemented in terms of different underlying providers
> (i.e., OpenMP, GCD, Work Queues on Windows, whatever else).
> 2. To follow the same philosophy with respect to standards as we
> do everywhere else: Use standards where possible with
> compiler/system-specific extensions as necessary.
>
> -Hal
>
>
>> Regards,
>> Serge.
>> 04.12.2017, 12:07, "Jeff Hammond" <jeff.science at gmail.com>
>> <mailto:jeff.science at gmail.com>:
>>> ICC implements a very aggressive interpretation of the OpenMP
>>> standard, and this interpretation is not shared by everyone in
>>> the OpenMP community. ICC is correct but other implementations
>>> may be far less aggressive, so _Pragma("omp simd") doesn't
>>> guarentee vectorization unless the compiler documentation says
>>> that is how it is implemented. All the standard says that it
>>> means is that vectorization is _permitted_.
>>> Given that the practical meaning of _Pragma("omp simd") isn't
>>> guaranteed to be consistent across different implementations, I
>>> don't really know how to compare it to compiler-specific pragmas
>>> unless we define everything explicitly.
>>> In any case, my fundamental point remains: do not use OpenMP
>>> pragmas here, but instead use whatever the appropriate
>>> compiler-specific pragma is, or create a new one that meets the
>>> need.
>>> Best,
>>> Jeff
>>> On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis
>>> <spreis at yandex-team.ru <mailto:spreis at yandex-team.ru>> wrote:
>>>
>>> Hello,
>>> _Pragma("omp simd") is semantically quite different from
>>> _Pragma("clang loop vectorize(assume_safety)"), _Pragma("GCC
>>> ivdep") and _Pragma("vector always"), so I am not sure all
>>> latter will work as expected in all cases. They definitely
>>> won't provide any vectorization guarantees which slightly
>>> defeat the purpose of using corresponding execution policy.
>>> I support the idea of having OpenMP orthogonal and
>>> definitely having -fopenmp enabled by default is not an
>>> option. Intel compiler has separate -qopenmp-simd option
>>> which doesn't affect performance outside explicitly marked
>>> loops, but even this is not enabled by default. I would say
>>> that there might exist multiple implementations of unordered
>>> policy, originally OpenMP SIMD based implementation may be
>>> more powerful and one based on other pragmas being default,
>>> but hinting about existence of faster option. Later on one
>>> may be brave enough to add some SIMD template library and
>>> implement default unordered policy using it (such
>>> implementation is possible even now using vector types, but
>>> it will be extremely complex if attempt to target all base
>>> data types, vector widths and target SIMD architectures
>>> clang supports. Even with the library this may be quite
>>> tedious).
>>> Without any standard way of expressing SIMD perallelism in
>>> pure C++ any implementer of SIMD execution policy is to rely
>>> on means avaialble for plaform/compiler and so it is not
>>> totaly unnatural to ask user to enable OpenMP SIMD for
>>> efficient support of corresponding execution policy.
>>> Reagrds,
>>> Serge Preis
>>> (Who once was part of Intel Compiler Vectorizer team and
>>> driven OpenMP SIMD efforts within icc and beyond, if anyone
>>> is keeping track of conflicts-of-interest)
>>> 04.12.2017, 08:46, "Jeff Hammond via cfe-dev"
>>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>:
>>>> It would be nice to keep PSTL and OpenMP orthogonal, even
>>>> if _Pragma("omp simd") does not require runtime support.
>>>> It should be trivial to use _Pragma("clang loop
>>>> vectorize(assume_safety)") instead, by wrapping all of the
>>>> different compiler vectorization pragmas in preprocessor
>>>> logic. I similarly recommend _Pragma("GCC ivdep") for GCC
>>>> and _Pragma("vector always") for ICC. While this requires
>>>> O(n_compilers) effort instead of O(1), but orthogonality is
>>>> worth it.
>>>> While OpenMP is vendor/compiler-agnostic, users should not
>>>> be required to use -fopenmp or similar to enable
>>>> vectorization from PSTL, nor should the compiler enable any
>>>> OpenMP pragma by default. I know of cases where merely
>>>> using the -fopenmp flag alters code generation in a
>>>> performance-visible manner, and enabling the OpenMP "simd"
>>>> pragma by default may surprise some users, particularly if
>>>> no other OpenMP pragmas are enabled by default.
>>>>
>>>> Best,
>>>> Jeff
>>>> (who works for Intel but not on any software products and
>>>> has been a heavy user of Intel PSTL since it was released,
>>>> if anyone is keeping track of conflicts-of-interest)
>>>>
>>>> On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via
>>>> cfe-dev <cfe-dev at lists.llvm.org
>>>> <mailto:cfe-dev at lists.llvm.org>> wrote:
>>>> >
>>>> > Hello all,
>>>> >
>>>> > At Intel, we have developed an implementation of C++17
>>>> execution policies
>>>> > for algorithms (often referred to as Parallel STL). We
>>>> hope to contribute it
>>>> > to libc++/LLVM, so would like to ask the community for
>>>> comments on this.
>>>> >
>>>> > The code is already published at GitHub
>>>> (https://github.com/intel/parallelstl).
>>>> > It supports the C++17 standard execution policies (seq,
>>>> par, par_unseq) as well as
>>>> > the experimental unsequenced policy (unseq) for SIMD
>>>> execution. At the moment,
>>>> > about half of the C++17 standard algorithms that must
>>>> support execution policies
>>>> > are implemented; a few more will be ready soon, and the
>>>> work continues.
>>>> > The tests that we use are also available at GitHub;
>>>> needless to say we will
>>>> > contribute those as well.
>>>> >
>>>> > The implementation is not specific to Intel’s hardware.
>>>> For thread-level parallelism
>>>> > it uses TBB* (https://www.threadingbuildingblocks.org/)
>>>> but abstracts it with
>>>> > an internal API which can be implemented on top of other
>>>> threading/parallel solutions –
>>>> > so it is for the community to decide which ones to use.
>>>> For SIMD parallelism
>>>> > (unseq, par_unseq) we use #pragma omp simd directives; it
>>>> is vendor-neutral and
>>>> > does not require any OpenMP runtime support.
>>>> >
>>>> > The current implementation meets the spirit but not
>>>> always the letter of
>>>> > the standard, because it has to be separate from but also
>>>> coexist with
>>>> > implementations of standard C++ libraries. While
>>>> preparing the contribution,
>>>> > we will address inconsistencies, adjust the code to meet
>>>> community standards,
>>>> > and better integrate it into the standard library code.
>>>> >
>>>> > We are also proposing that our implementation is included
>>>> into libstdc++/GCC.
>>>> > Compatibility between the implementations seems useful as
>>>> it can potentially
>>>> > reduce the amount of work for everyone. We hope to keep
>>>> the code mostly identical,
>>>> > and would like to know if you think it’s too optimistic
>>>> to expect.
>>>> >
>>>> > Obviously we plan to use appropriate open source licenses
>>>> to meet the different
>>>> > projects’ requirements.
>>>> >
>>>> > We expect to keep developing the code and will take the
>>>> responsibility for
>>>> > maintaining it (with community contributions, of course).
>>>> If there are other
>>>> > community efforts to implement parallel algorithms, we
>>>> are willing to collaborate.
>>>> >
>>>> > We look forward to your feedback, both for the overall
>>>> idea and – if supported –
>>>> > for the next steps we should take.
>>>> >
>>>> > Regards,
>>>> > - Alexey Kukanov
>>>> >
>>>> > * Note that TBB itself is highly portable (and ported by
>>>> community to Power and ARM
>>>> > architectures) and permissively licensed, so could be the
>>>> base for the threading
>>>> > infrastructure. But the Parallel STL implementation
>>>> itself does not require TBB.
>>>> >
>>>> > _______________________________________________
>>>> > cfe-dev mailing list
>>>> > cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Hammond
>>>> jeff.science at gmail.com <mailto:jeff.science at gmail.com>
>>>> http://jeffhammond.github.io/
>>>> ,
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com <mailto:jeff.science at gmail.com>
>>> http://jeffhammond.github.io/
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> --
> Jeff Hammond
> jeff.science at gmail.com <mailto:jeff.science at gmail.com>
> http://jeffhammond.github.io/
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20171206/abb9599c/attachment.html>
More information about the cfe-dev
mailing list