[libcxx-dev] Parallel STL

Jeff Hammond via libcxx-dev libcxx-dev at lists.llvm.org
Wed Sep 16 15:53:19 PDT 2020


Yes, indeed, OpenMP tasking is used, so it will compose well. Below are some
measurements I took a while ago (2018,
https://www.ixpug.org/images/docs/KAUST_Workshop_2018/IXPUG_Invited2_Hammond.pdf).
They compare a number of different C++ parallel loop abstractions (all of the
code is in https://github.com/ParRes/Kernels/tree/default/Cxx11).

I suppose I should measure again, but some of the performance limitations of
OpenMP tasking are baked into its semantics and cannot be removed by any
compliant implementation.

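To show concretely why tasking composes, here is a minimal sketch of a loop helper built on OpenMP `taskloop`. It is not the code from D70530 or from the ParRes kernels; the names `task_parallel_for` and `taskloop_range` are hypothetical, and it assumes an OpenMP 4.5 or later compiler (`taskloop` first appeared in 4.5). The outermost call opens a parallel region in which one thread generates tasks; a nested call only adds tasks to the existing team instead of creating a new one. Built without `-fopenmp`, the pragmas are ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs f(i) for i in [first, last) as OpenMP tasks.
// taskloop has an implicit taskgroup, so it returns only after every task
// it created has finished. f is taken by value, so each task gets its own
// firstprivate copy of the callable.
template <class F>
void taskloop_range(std::size_t first, std::size_t last, F f) {
#pragma omp taskloop
    for (std::size_t i = first; i < last; ++i) {
        f(i);
    }
}

// Hypothetical entry point. Assumes it is called by a single thread or task
// (the program's main thread, or the body of an outer call).
template <class F>
void task_parallel_for(std::size_t first, std::size_t last, F f) {
    assert(first <= last);
#ifdef _OPENMP
    if (omp_in_parallel()) {
        // Nested call: add tasks to the existing team instead of
        // creating a new one.
        taskloop_range(first, last, f);
        return;
    }
    // Outermost call: create a team; one thread generates the tasks and the
    // whole team (including that thread) executes them.
#pragma omp parallel
#pragma omp single
    taskloop_range(first, last, f);
#else
    taskloop_range(first, last, f);  // built without OpenMP: serial loop
#endif
}
```

For example, `task_parallel_for(0, rows, [&](std::size_t r) { task_parallel_for(0, cols, ...); });` keeps both levels parallel without creating a second thread team. The price is per-task scheduling overhead, which plausibly explains why a scheme like this can trail plain `omp parallel for` on simple flat loops.
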
[image: Screen Shot 2020-09-16 at 3.49.56 PM.png]

Jeff

On Wed, Sep 16, 2020 at 1:58 PM Christopher Nelson <nadiasvertex at gmail.com>
wrote:

> I am looking over the OpenMP code, and it does seem to handle the nested
> parallel region problem correctly.
>
> What is the simple case that omp-parallel-for works better on? Perhaps
> that can be detected and preferred in these cases? I am by no means an
> OpenMP expert.
>
> I can see that the TBB folks have done a lot of work to make it easier to
> take TBB as a dependency. For us there is still some hesitancy around
> embracing a large dependency like that, for a variety of reasons. I have a
> number of business problems that can be drastically improved with even
> basic parallelization. So while it would be great to eke out every erg of
> efficiency, having something that allows our developers to trivially use
> all the cores in the machine in even basic scenarios would be a huge help.
> :-) And it would be nice if it were standardized, and came with our
> compiler... :-)
>
> On Wed, Sep 16, 2020 at 3:54 PM Jeff Hammond <jeff.science at gmail.com>
> wrote:
>
>> It is worth having a detailed discussion of what is meant by the OpenMP
>> version.  If one maps exec::par onto omp-parallel-for, nested loops will be
>> transformed into the OpenMP nested-parallel anti-pattern (or one has to
>> check omp_in_parallel and generate two paths every time).  One of the
>> reasons why TBB is a better backend is that it self-composes better than
>> OpenMP parallel-for.  OpenMP tasking should compose better, but does not
>> perform as well as omp-parallel-for in the simple cases.  (I have
>> performance data comparing all of these.)
>>
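The "two paths" approach can be sketched as follows, assuming a hypothetical backend helper `omp_for_range` that takes an index range and a per-index callable. The outermost call uses `omp parallel for`; a call made from inside an active parallel region checks `omp_in_parallel()` and runs serially instead of opening a nested team. That avoids the nested-parallel anti-pattern, but inner loops then get no parallelism at all.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs f(i) for every i in [first, last).
template <class F>
void omp_for_range(std::size_t first, std::size_t last, F f) {
    assert(first <= last);
#ifdef _OPENMP
    if (!omp_in_parallel()) {
        // Path 1: not in a parallel region yet, so open a team and
        // divide the iterations among its threads.
#pragma omp parallel for
        for (std::size_t i = first; i < last; ++i) {
            f(i);
        }
        return;
    }
#endif
    // Path 2: already inside an active parallel region (or built without
    // OpenMP). Run serially rather than creating a nested team, so inner
    // loops get no extra parallelism.
    for (std::size_t i = first; i < last; ++i) {
        f(i);
    }
}
```
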
>> If supporting nested parallel loops is a non-goal for PSTL, then my
>> comments can be ignored.
>>
>> GCD is likely a good backend on macOS, especially since Apple Clang
>> doesn't support OpenMP.
>>
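A minimal sketch of a GCD-based range loop, assuming macOS with libdispatch: `dispatch_apply_f` runs a C callback once per chunk index on a global concurrent queue and returns only after all chunks have completed. The namespace `gcd_sketch`, the `ChunkJob` struct, and the chunking scheme are hypothetical; on other platforms the sketch simply falls back to one serial call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#if defined(__APPLE__)
#include <dispatch/dispatch.h>
#endif

namespace gcd_sketch {  // hypothetical namespace

using RangeBody = std::function<void(std::size_t, std::size_t)>;

// Everything a chunk needs, passed to GCD through its void* context.
struct ChunkJob {
    std::size_t first, last, chunk;
    const RangeBody* body;
};

#if defined(__APPLE__)
// Invoked by dispatch_apply_f once per chunk index k, possibly concurrently.
inline void run_chunk(void* ctx, std::size_t k) {
    const ChunkJob* job = static_cast<const ChunkJob*>(ctx);
    const std::size_t b = job->first + k * job->chunk;
    const std::size_t e = std::min(b + job->chunk, job->last);
    (*job->body)(b, e);
}
#endif

// Runs body on consecutive sub-ranges of [first, last) of at most `chunk`
// indices each, and returns once all of them have finished.
inline void parallel_for(std::size_t first, std::size_t last,
                         std::size_t chunk, const RangeBody& body) {
    assert(first <= last && chunk > 0);
    if (first == last) return;
#if defined(__APPLE__)
    ChunkJob job{first, last, chunk, &body};
    const std::size_t n_chunks = (last - first + chunk - 1) / chunk;
    // dispatch_apply_f is synchronous: it blocks until every chunk has run.
    dispatch_apply_f(n_chunks, dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0),
                     &job, run_chunk);
#else
    body(first, last);  // not an Apple platform: serial fallback
#endif
}

}  // namespace gcd_sketch
```

For example, `gcd_sketch::parallel_for(0, n, 4096, [&](std::size_t b, std::size_t e) { ... });` processes `n` indices in chunks of 4096. Using `std::function` keeps the C callback non-templated at the cost of one indirect call per chunk.
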
>> Jeff
>>
>> On Wed, Sep 16, 2020 at 12:19 PM Christopher Nelson via libcxx-dev <
>> libcxx-dev at lists.llvm.org> wrote:
>>
>>> That's very exciting to hear! I was thinking of taking up the stalled
>>> OpenMP implementation, but if you are a good way along then it may not make
>>> any sense to do both.
>>>
>>> On the other hand, if you feel it might be a year or more before you
>>> have something production ready, it might make sense to try and finish up
>>> the OpenMP version. What do you think?
>>>
>>> On Wed, Sep 16, 2020 at 12:14 PM Thomas Rodgers <trodgers at redhat.com>
>>> wrote:
>>>
>>>> > Okay, that makes sense. I can see how you might want to use Grand
>>>> > Central Dispatch on macOS, and the Windows system thread pool on
>>>> > Windows. I'm not really sure what that means for Linux, though. Other
>>>> than maybe pthreads, which is not great.
>>>>
>>>> I am currently working on a new backend for GCC which is neither TBB nor
>>>> OpenMP which will support both the PSTL and (presumably) C++23
>>>> Executors.
>>>>
>>>> Kukanov, Alexey writes:
>>>>
>>>> > Hi Cristopher,
>>>> >
>>>> > One good way to contribute, I think, is to develop an OpenMP-based
>>>> parallel backend. LLVM already supports OpenMP, so it resolves the
>>>> dependency problem Louis mentioned. While it’s arguably not the best
>>>> default engine in the long term, there is certainly some demand for it. The
>>>> GCC community is also interested in it. Moreover, Mikhail and the team at
>>>> Intel in collaboration with Thomas (CC’d) from GCC already developed a
>>>> basic prototype: https://reviews.llvm.org/D70530, but further work has been
>>>> postponed. If you are interested in continuing, you are more than welcome,
>>>> and we will help with guidance and feedback.
>>>> >
>>>> > Regards,
>>>> > - Alexey
>>>> >
>>>> > From: libcxx-dev <libcxx-dev-bounces at lists.llvm.org> On Behalf Of
>>>> Christopher Nelson via libcxx-dev
>>>> > Sent: Wednesday, September 16, 2020 2:43 PM
>>>> > To: Louis Dionne <ldionne at apple.com>
>>>> > Cc: Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>;
>>>> > Subject: Re: [libcxx-dev] Parallel STL
>>>> >
>>>> > Fantastic. I will study the serial backend and see what I can do!
>>>> >
>>>> > On Tue, Sep 15, 2020 at 5:27 PM Louis Dionne <ldionne at apple.com> wrote:
>>>> > + Mikhail, who wrote most of the PSTL
>>>> >
>>>> >
>>>> > On Sep 15, 2020, at 15:40, Christopher Nelson <nadiasvertex at gmail.com> wrote:
>>>> >
>>>> > Okay, that makes sense. I can see how you might want to use Grand
>>>> Central Dispatch on macOS, and the Windows system thread pool on Windows.
>>>> I'm not really sure what that means for Linux, though. Other than maybe
>>>> pthreads, which is not great.
>>>> >
>>>> > Is there any documentation on what is needed to create a backend? Or
>>>> are there perhaps already plans in motion? I don't want to step on any
>>>> toes, but I would love to have a usable pstl on macOS and Linux for the
>>>> next LLVM release.
>>>> > We use libc++ on Linux as well as macOS. Depending on what's
>>>> involved, I might be able to contribute a backend for those two platforms.
>>>> >
>>>> > You're not stepping on any toes, far from that. If we have backends
>>>> with satisfactory performance and we're confident about ABI stability, I
>>>> don't see a reason why we wouldn't ship the PSTL as soon as we have those.
>>>> One big obstacle to shipping it so far has been that, of the two existing
>>>> backends, one is serial (not great to ship that) and the other relies on
>>>> an external dependency (TBB).
>>>> >
>>>> > Mikhail might be able to provide documentation. We should check it
>>>> into the PSTL repository. I meant to write such documentation when I wrote
>>>> the serial backend, but never got around to writing something that was
>>>> enough to check in. You can see the minimal API needed to implement a
>>>> backend here: pstl/include/pstl/internal/parallel_backend_serial.h. It's
>>>> the serial backend, which tries to be as trivial as possible.
>>>> >
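As a rough illustration of what replacing the serial loop primitive involves, here is a sketch of a `std::thread`-based range loop next to its serial counterpart. It assumes, as a simplification, that the backend's loop entry point receives a half-open index range and a body that processes one sub-range; the namespace and function names are hypothetical, and the real entry points and signatures should be taken from `parallel_backend_serial.h` itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

namespace sketch_backend {  // hypothetical namespace, not the PSTL's own

// Serial baseline: hand the whole range to the body in one call.
template <class Body>
void serial_for(std::size_t first, std::size_t last, Body body) {
    assert(first <= last);
    if (first < last) body(first, last);
}

// Threaded variant: split [first, last) into roughly equal contiguous
// chunks, one per hardware thread, run each chunk on its own std::thread,
// then join them all before returning.
template <class Body>
void thread_for(std::size_t first, std::size_t last, Body body) {
    assert(first <= last);
    const std::size_t n = last - first;
    if (n == 0) return;
    std::size_t workers = std::max(1u, std::thread::hardware_concurrency());
    workers = std::min(workers, n);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t begin = first; begin < last; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, last);
        // Each thread gets its own copy of body. An exception escaping a
        // std::thread calls std::terminate, which matches what the standard
        // requires for std::execution::par.
        threads.emplace_back([=] { body(begin, end); });
    }
    for (std::thread& t : threads) t.join();
}

}  // namespace sketch_backend
```

A production backend would reuse a persistent thread pool instead of spawning threads on every call, and, judging by the serial backend header, would also need to supply operations such as reductions, scans, sorting and merging.
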
>>>> > Are you familiar with the libc++ contribution process? If so, contributing to
>>>> PSTL works basically the same -- just send a Phabricator review and I'll
>>>> review it. We can also chat on Slack in the Cpplang workspace and I can
>>>> give some guidance -- look for "ldionne".
>>>> >
>>>> > Cheers,
>>>> > Louis
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Sep 15, 2020 at 2:50 PM Louis Dionne <ldionne at apple.com> wrote:
>>>> > Hi,
>>>> >
>>>> > Long story short, the PSTL is pretty much ready to be shipped with
>>>> LLVM. I did the integration between it and libc++, and it all worked last
>>>> time I checked. I think the next step would be to change whatever LLVM
>>>> scripts are used to create releases to also install the PSTL, which is the
>>>> part I haven't had time to look into yet.
>>>> >
>>>> > That being said, the PSTL will then default to using the Serial
>>>> backend, which isn't very useful. We could decide to ship a different
>>>> backend if we wanted, however I think what makes sense is to use a backend
>>>> specific to the platform we're running on instead of adding a dependency to
>>>> LLVM.
>>>> >
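Selecting a platform-native backend at compile time, rather than adding a dependency, could look roughly like this. Only the predefined macros (`__APPLE__`, `_WIN32`, `__linux__`) are standard compiler facts; the enum, the function, and the choice of a `std::thread` pool for Linux are assumptions for illustration.

```cpp
// Hypothetical: the native facility a platform-specific backend would use.
enum class native_backend { gcd, win32_threadpool, std_thread, serial };

// Chosen at compile time from predefined platform macros, so no extra
// library has to ship with the toolchain.
constexpr native_backend select_backend() {
#if defined(__APPLE__)
    return native_backend::gcd;               // Grand Central Dispatch
#elif defined(_WIN32)
    return native_backend::win32_threadpool;  // Windows system thread pool
#elif defined(__linux__)
    return native_backend::std_thread;        // assumption: a std::thread pool
#else
    return native_backend::serial;            // fall back to the serial backend
#endif
}
```
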
>>>> > Louis
>>>> >
>>>> >> On Sep 8, 2020, at 08:25, Christopher Nelson via libcxx-dev <
>>>> libcxx-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >> Hello friends,
>>>> >>
>>>> >> I have spent some time looking at the mailing archives and git logs
>>>> for the parallel STL. I'm not clear what state it is in, since oneAPI/TBB
>>>> seems to be production-ready and comes with the parallel STL.
>>>> Also, it appears that GCC has shipped a PSTL based on the same code that
>>>> clang is using.
>>>> >>
>>>> >> I was wondering if someone could clarify for me what state the PSTL
>>>> is in, and if there is some work needed to help get it over the finish line
>>>> I may be able to help. I'm very interested in using it in our production
>>>> software, so I'm a motivated helper. :-)
>>>> >>
>>>> >> Thank you for your time,
>>>> >> -={C}=-
>>>> >> _______________________________________________
>>>> >> libcxx-dev mailing list
>>>> >> libcxx-dev at lists.llvm.org
>>>> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/libcxx-dev
>>>>
>>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2020-09-16 at 3.49.56 PM.png
Type: image/png
Size: 471067 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libcxx-dev/attachments/20200916/a1d0ed6b/attachment-0001.png>

