[cfe-dev] [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Fri Dec 8 13:13:49 PST 2017
On 12/07/2017 11:35 AM, Jeff Hammond wrote:
>
> On Wed, Dec 6, 2017 at 8:57 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> On 12/06/2017 10:23 PM, Jeff Hammond wrote:
>>
>> On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>
>> On 12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:
>>> I agree that the guarantees provided by ICC may be stronger
>>> than with other compilers, so yes, under OpenMP terms
>>> vectorization is permitted and cannot be assumed. However,
>>> OpenMP clearly defines the semantics of variables used within
>>> an OpenMP region: some being shared (scalar), some being
>>> private (vector), and some being inductions. This goes far
>>> beyond typical compiler-specific pragmas about dependencies
>>> and cost modelling, and makes vectorization a much simpler
>>> task with more predictable and robust results if properly
>>> implemented (admittedly, even the ICC implementation is far
>>> from perfect). I hope Intel's efforts to standardize something
>>> like this in core C++ will eventually come to fruition. Until
>>> then, I as a regular application developer would appreciate an
>>> OpenMP-SIMD-based execution policy (hoping for good support
>>> for OpenMP SIMD in clang), but it shouldn't necessarily be
>>> part of libc++. Since the 'unordered' execution policy is
>>> currently not part of the C++ standard
>>
>> std::execution::par_unseq is part of C++17, and that
>> essentially maps to '#pragma omp parallel for simd'.
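>>
>> As a rough illustration of that correspondence (the names X, N,
>> and F here are just placeholders):
>>
>>     // C++17 parallel algorithm:
>>     std::for_each(std::execution::par_unseq, X, X + N, F);
>>
>>     // ...is semantically close to:
>>     #pragma omp parallel for simd
>>     for (size_t i = 0; i < N; ++i)
>>         F(X[i]);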
>>
>>
>> Do you expect par/par_unseq to nest?
>
> Yes.
>
>
>> Nesting omp-parallel is generally regarded as a Bad Idea.
>
> Agreed. I suspect we'll want the mapping to be more like '#pragma
> omp taskloop simd'.
>
>
> That won’t run in parallel unless in an omp-parallel-master region.
Yes.
> That means an OpenMP-based PSTL won’t be parallel unless the user knows
> to add back-end-specific code around the PSTL.
That obviously wouldn't be acceptable.
>
> What I’m trying to say is that OpenMP is a poor target for PSTL in its
> current form. Nested parallel regions are the only thing that works,
> and they are likely to work poorly.
I'm not sure that's true, although the technique may not be trivial. I
believe that it is possible, however. For example, the mapping might be
to something like:
  if (omp_in_parallel()) {
    // Already inside a parallel region: just generate the tasks.
    #pragma omp taskloop simd
    for (size_t i = 0; i < N; ++i)
      F(X[i]);
  } else {
    #pragma omp parallel
    #pragma omp single // let one thread generate the tasks for the team
    {
      #pragma omp taskloop simd
      for (size_t i = 0; i < N; ++i)
        F(X[i]);
    }
  }
The fact that we'd need to use this kind of pattern is a bit
unfortunate, but it can be easily abstracted into a template function,
so it just becomes some implementation detail of the library.
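For example, a minimal sketch of such a template function (the name
parallel_loop is hypothetical):

  #include <omp.h>
  #include <cstddef>

  template <typename F>
  void parallel_loop(std::size_t N, F f) {
    if (omp_in_parallel()) {
      // Nested case: add tasks to the enclosing region.
      #pragma omp taskloop simd
      for (std::size_t i = 0; i < N; ++i)
        f(i);
    } else {
      #pragma omp parallel
      #pragma omp single // one thread generates tasks for the team
      {
        #pragma omp taskloop simd
        for (std::size_t i = 0; i < N; ++i)
          f(i);
      }
    }
  }

  // Usage: parallel_loop(N, [&](std::size_t i) { F(X[i]); });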
Thanks again,
Hal
>
> Jeff
>
>
> -Hal
>
>
>>
>> Jeff
>>
>>
>>> I don't care much about how it will be implemented in libc++,
>>> if it is. I would just like to ask the Intel folks and the
>>> community here to make the implementation extensible, in the
>>> sense that a custom OpenMP-SIMD-based execution policy, along
>>> with algorithm implementations (as specializations for the
>>> policy), can be used with the libc++ library. And I would
>>> additionally like to ask the Intel folks to provide a complete
>>> and compatible extension on GitHub for developers like me to
>>> use.
>>
>> In the end, I think we want the following:
>>
>> 1. A design for libc++ that allows the thread-level
>> parallelism to be implemented in terms of different
>> underlying providers (e.g., OpenMP, GCD, work queues on
>> Windows, whatever else).
>> 2. To follow the same philosophy with respect to standards
>> as we do everywhere else: Use standards where possible with
>> compiler/system-specific extensions as necessary.
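>>
>> A rough sketch of what (1) might look like (all of these names
>> are hypothetical):
>>
>>     // Internal thread-level-parallelism API; each provider
>>     // (OpenMP, GCD, Windows thread pools, ...) supplies its
>>     // own definition behind a build-time switch.
>>     namespace __par_backend {
>>     template <typename Index, typename F>
>>     void parallel_for(Index first, Index last, F f);
>>     } // namespace __par_backend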
>>
>> -Hal
>>
>>
>>> Regards,
>>> Serge.
>>> 04.12.2017, 12:07, "Jeff Hammond" <jeff.science at gmail.com>:
>>>> ICC implements a very aggressive interpretation of the
>>>> OpenMP standard, and this interpretation is not shared by
>>>> everyone in the OpenMP community. ICC is correct but other
>>>> implementations may be far less aggressive, so _Pragma("omp
>>>> simd") doesn't guarantee vectorization unless the compiler
>>>> documentation says that is how it is implemented. All the
>>>> standard says is that vectorization is _permitted_.
>>>> Given that the practical meaning of _Pragma("omp simd")
>>>> isn't guaranteed to be consistent across different
>>>> implementations, I don't really know how to compare it to
>>>> compiler-specific pragmas unless we define everything
>>>> explicitly.
>>>> In any case, my fundamental point remains: do not use
>>>> OpenMP pragmas here, but instead use whatever the
>>>> appropriate compiler-specific pragma is, or create a new
>>>> one that meets the need.
>>>> Best,
>>>> Jeff
>>>> On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <spreis at yandex-team.ru> wrote:
>>>>
>>>> Hello,
>>>> _Pragma("omp simd") is semantically quite different
>>>> from _Pragma("clang loop vectorize(assume_safety)"),
>>>> _Pragma("GCC ivdep") and _Pragma("vector always"), so I
>>>> am not sure the latter will all work as expected in all
>>>> cases. They definitely won't provide any vectorization
>>>> guarantees, which slightly defeats the purpose of using
>>>> the corresponding execution policy.
>>>> I support the idea of keeping OpenMP orthogonal, and
>>>> having -fopenmp enabled by default is definitely not an
>>>> option. The Intel compiler has a separate -qopenmp-simd
>>>> option which doesn't affect performance outside
>>>> explicitly marked loops, but even this is not enabled
>>>> by default. I would say that multiple implementations of
>>>> the unordered policy might exist: initially an
>>>> OpenMP-SIMD-based implementation may be the more
>>>> powerful one, with one based on other pragmas being the
>>>> default but hinting at the existence of the faster
>>>> option. Later on, one may be brave enough to add some
>>>> SIMD template library and implement the default
>>>> unordered policy using it (such an implementation is
>>>> possible even now using vector types, but it would be
>>>> extremely complex if it attempted to target all the base
>>>> data types, vector widths, and SIMD architectures clang
>>>> supports; even with the library this may be quite
>>>> tedious).
>>>> Without any standard way of expressing SIMD parallelism
>>>> in pure C++, any implementer of a SIMD execution policy
>>>> has to rely on the means available for the
>>>> platform/compiler, and so it is not totally unnatural to
>>>> ask the user to enable OpenMP SIMD for efficient support
>>>> of the corresponding execution policy.
>>>> Regards,
>>>> Serge Preis
>>>> (who once was part of the Intel Compiler vectorizer team
>>>> and drove OpenMP SIMD efforts within icc and beyond, if
>>>> anyone is keeping track of conflicts-of-interest)
>>>> 04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <cfe-dev at lists.llvm.org>:
>>>>> It would be nice to keep PSTL and OpenMP orthogonal,
>>>>> even if _Pragma("omp simd") does not require runtime
>>>>> support. It should be trivial to use _Pragma("clang
>>>>> loop vectorize(assume_safety)") instead, by wrapping
>>>>> all of the different compiler vectorization pragmas in
>>>>> preprocessor logic. I similarly recommend
>>>>> _Pragma("GCC ivdep") for GCC and _Pragma("vector
>>>>> always") for ICC. This requires O(n_compilers) effort
>>>>> instead of O(1), but orthogonality is worth it.
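>>>>> A minimal sketch of such preprocessor logic (the macro
>>>>> name here is made up):
>>>>>
>>>>>     #if defined(__INTEL_COMPILER)
>>>>>     #  define PSTL_VEC_LOOP _Pragma("vector always")
>>>>>     #elif defined(__clang__)
>>>>>     #  define PSTL_VEC_LOOP _Pragma("clang loop vectorize(assume_safety)")
>>>>>     #elif defined(__GNUC__)
>>>>>     #  define PSTL_VEC_LOOP _Pragma("GCC ivdep")
>>>>>     #else
>>>>>     #  define PSTL_VEC_LOOP
>>>>>     #endif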
>>>>> While OpenMP is vendor/compiler-agnostic, users should
>>>>> not be required to use -fopenmp or similar to enable
>>>>> vectorization from PSTL, nor should the compiler
>>>>> enable any OpenMP pragma by default. I know of cases
>>>>> where merely using the -fopenmp flag alters code
>>>>> generation in a performance-visible manner, and
>>>>> enabling the OpenMP "simd" pragma by default may
>>>>> surprise some users, particularly if no other OpenMP
>>>>> pragmas are enabled by default.
>>>>>
>>>>> Best,
>>>>> Jeff
>>>>> (who works for Intel but not on any software products
>>>>> and has been a heavy user of Intel PSTL since it was
>>>>> released, if anyone is keeping track of
>>>>> conflicts-of-interest)
>>>>>
>>>>> On Wed, Nov 29, 2017 at 4:21 AM, Kukanov, Alexey via
>>>>> cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>> >
>>>>> > Hello all,
>>>>> >
>>>>> > At Intel, we have developed an implementation of C++17
>>>>> > execution policies for algorithms (often referred to as
>>>>> > Parallel STL). We hope to contribute it to libc++/LLVM,
>>>>> > so we would like to ask the community for comments on
>>>>> > this.
>>>>> >
>>>>> > The code is already published at GitHub
>>>>> > (https://github.com/intel/parallelstl). It supports the
>>>>> > C++17 standard execution policies (seq, par, par_unseq)
>>>>> > as well as the experimental unsequenced policy (unseq)
>>>>> > for SIMD execution. At the moment, about half of the
>>>>> > C++17 standard algorithms that must support execution
>>>>> > policies are implemented; a few more will be ready soon,
>>>>> > and the work continues. The tests that we use are also
>>>>> > available at GitHub; needless to say, we will contribute
>>>>> > those as well.
>>>>> >
>>>>> > The implementation is not specific to Intel’s hardware.
>>>>> > For thread-level parallelism it uses TBB*
>>>>> > (https://www.threadingbuildingblocks.org/) but abstracts
>>>>> > it with an internal API which can be implemented on top
>>>>> > of other threading/parallel solutions – so it is for the
>>>>> > community to decide which ones to use. For SIMD
>>>>> > parallelism (unseq, par_unseq) we use #pragma omp simd
>>>>> > directives; it is vendor-neutral and does not require
>>>>> > any OpenMP runtime support.
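>>>>> >
>>>>> > As a minimal illustration of that last point (the helper
>>>>> > name simd_for_each is made up), a SIMD-only loop asks the
>>>>> > compiler to vectorize without generating any OpenMP
>>>>> > library calls:
>>>>> >
>>>>> >     #include <cstddef>
>>>>> >
>>>>> >     template <typename It, typename F>
>>>>> >     void simd_for_each(It first, std::size_t n, F f) {
>>>>> >       #pragma omp simd
>>>>> >       for (std::size_t i = 0; i < n; ++i)
>>>>> >         f(first[i]); // requires a random-access iterator
>>>>> >     }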
>>>>> >
>>>>> > The current implementation meets the spirit but not
>>>>> > always the letter of the standard, because it has to be
>>>>> > separate from, but also coexist with, implementations of
>>>>> > standard C++ libraries. While preparing the contribution,
>>>>> > we will address inconsistencies, adjust the code to meet
>>>>> > community standards, and better integrate it into the
>>>>> > standard library code.
>>>>> >
>>>>> > We are also proposing that our implementation be included
>>>>> > into libstdc++/GCC. Compatibility between the
>>>>> > implementations seems useful, as it can potentially
>>>>> > reduce the amount of work for everyone. We hope to keep
>>>>> > the code mostly identical, and would like to know if you
>>>>> > think that’s too optimistic to expect.
>>>>> >
>>>>> > Obviously we plan to use appropriate open source licenses
>>>>> > to meet the different projects’ requirements.
>>>>> >
>>>>> > We expect to keep developing the code and will take
>>>>> > responsibility for maintaining it (with community
>>>>> > contributions, of course). If there are other community
>>>>> > efforts to implement parallel algorithms, we are willing
>>>>> > to collaborate.
>>>>> >
>>>>> > We look forward to your feedback, both on the overall
>>>>> > idea and – if it is supported – on the next steps we
>>>>> > should take.
>>>>> >
>>>>> > Regards,
>>>>> > - Alexey Kukanov
>>>>> >
>>>>> > * Note that TBB itself is highly portable (it has been
>>>>> > ported by the community to Power and ARM architectures)
>>>>> > and permissively licensed, so it could be the base for
>>>>> > the threading infrastructure. But the Parallel STL
>>>>> > implementation itself does not require TBB.
>>>>> >
>>>>>
>>>>> --
>>>>> Jeff Hammond
>>>>> jeff.science at gmail.com
>>>>> http://jeffhammond.github.io/
>>>>>
>>>> --
>>>> Jeff Hammond
>>>> jeff.science at gmail.com
>>>> http://jeffhammond.github.io/
>>>
>>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory