[libcxx-dev] OpenMP parallel reduce bugs
Christopher Nelson via libcxx-dev
libcxx-dev at lists.llvm.org
Fri Oct 30 13:29:04 PDT 2020
Hi Mikhail,
I tried to apply the patch today, but it doesn't apply cleanly. May I ask
what version of the repo I should use? After I apply it (skipping a hunk
that doesn't apply) I get some compile errors:
/home/christopher/projects/llvm-project/pstl/include/pstl/internal/algorithm_impl.h:1780:13: error: unknown type name '_ReduceRes'
    _ReduceRes __table[] = {__broken, __broken, __broken,
                            __broken, __broken, __all_true,
    ^
/home/christopher/projects/llvm-project/pstl/include/pstl/internal/algorithm_impl.h:1783:13: error: use of '_ReduceType' with tag type that does not match previous declaration
    struct _ReduceType
    ^
/home/christopher/projects/llvm-project/pstl/include/pstl/internal/algorithm_impl.h:1770:18: note: previous use is here
    enum _ReduceType
    ^
/home/christopher/projects/llvm-project/pstl/include/pstl/internal/algorithm_impl.h:1802:54: error: use of undeclared identifier '__val1'; did you mean '__value'?
    return _ReduceType{__broken, __val1.__pos};
           ^~~~~~
           __value
/home/christopher/projects/llvm-project/pstl/include/pstl/internal/algorithm_impl.h:1799:73: note: '__value' declared here
    _ReduceType __value) -> _ReduceType {
On Wed, Oct 21, 2020 at 9:36 AM Dvorskiy, Mikhail <
mikhail.dvorskiy at intel.com> wrote:
> Hi Christopher,
>
>
>
> Please find attached a patch which fixes the problem mentioned with the
> is_partitioned algo. I've not yet promoted it into the LLVM repo, but you
> can check it now with the OpenMP backend.
>
>
>
> Best regards,
>
> Mikhail Dvorskiy
>
>
>
> *From:* Christopher Nelson <nadiasvertex at gmail.com>
> *Sent:* Monday, October 5, 2020 3:01 PM
> *To:* Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>
> *Cc:* Kukanov, Alexey <Alexey.Kukanov at intel.com>; Pavlov, Evgeniy <
> evgeniy.pavlov at intel.com>; Louis Dionne <ldionne at apple.com>; Thomas
> Rodgers <trodgers at redhat.com>; Libc++ Dev <libcxx-dev at lists.llvm.org>
> *Subject:* Re: [libcxx-dev] OpenMP parallel reduce bugs
>
>
>
> Great, thanks!
>
>
>
> I appreciate it!
>
>
>
> On Mon, Oct 5, 2020, 04:22 Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>
> wrote:
>
> Hi Christopher,
>
> I've double-checked the code of __pattern_is_partitioned (which is based
> on the parallel reduction pattern). Yes, the binary operation is not
> commutative, so my hypothesis was right.
>
> Generally speaking, writing the reduction pattern "manually", without the
> OpenMP reducer, is not a good approach because it may not be efficient.
> Indeed, in your example the second (for) loop combines the results
> serially, and std::vector brings additional overhead...
>
>
>
> Once more, OpenMP reduction requires a commutative binary operation, and
> that is correct. From the PSTL design perspective, an algorithm pattern
> should not rely on the parallel reduction pattern (which is provided by a
> parallel backend) supporting a non-commutative binary operation. So it is
> an issue in __pattern_is_partitioned, and we will fix it. In the meantime,
> I would suggest not worrying about it.
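>
> To illustrate with a simplified sketch (this is not the actual PSTL code;
> the enum and combiner below are made up for illustration): the result for
> a subrange is combined with the result for the subrange immediately to its
> right, and swapping the operands changes the answer.
>
> enum _ReduceType { __broken, __all_true, __all_false, __true_false };
>
> // Combine the result for a left subrange with the result for the subrange
> // immediately to its right.
> _ReduceType
> __combine(_ReduceType __left, _ReduceType __right)
> {
>     if (__left == __broken || __right == __broken)
>         return __broken;
>     if (__left == __all_true)
>         return __right == __all_true ? __all_true : __true_false;
>     // __left is __all_false or __true_false: no true element may follow it.
>     return __right == __all_false ? __left : __broken;
> }
>
> // __combine(__all_true, __all_false) == __true_false, but
> // __combine(__all_false, __all_true) == __broken. A reduction that combines
> // partial results in an unspecified order (as OpenMP reduction may) can
> // therefore produce a different answer than the left-to-right combination.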
>
>
>
> Best regards,
>
> Mikhail Dvorskiy
>
>
>
> *From:* Christopher Nelson <nadiasvertex at gmail.com>
> *Sent:* Saturday, October 3, 2020 6:19 PM
> *To:* Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>
> *Cc:* Kukanov, Alexey <Alexey.Kukanov at intel.com>; Pavlov, Evgeniy <
> evgeniy.pavlov at intel.com>; Louis Dionne <ldionne at apple.com>; Thomas
> Rodgers <trodgers at redhat.com>; Libc++ Dev <libcxx-dev at lists.llvm.org>
> *Subject:* Re: [libcxx-dev] OpenMP parallel reduce bugs
>
>
>
> Hello again,
>
>
>
> I was able to rewrite the parallel_reduce function in a way that works
> without using OpenMP's reducer. I have a couple of questions:
>
>
>
> 1. I use a vector to gather the intermediate results for later reduction.
> Is there any problem with depending on std::vector here?
>
> 2. I can see that it might make sense to build a taskloop for the actual
> reduction if the number of chunks is quite large. Is that something I
> should look into more? (A sketch of one possible shape follows the code
> below.)
>
>
>
> The code is below. Please let me know if you have any questions or
> concerns.
>
>
>
>
>
>
>
> //------------------------------------------------------------------------
> // parallel_reduce
> //------------------------------------------------------------------------
> template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
> _Value
> __parallel_reduce_body(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
>                        _RealBody __real_body, _Reduction __reduction)
> {
>     std::size_t __n_chunks{0}, __chunk_size{0}, __first_chunk_size{0};
>     __chunk_partitioner(__first, __last, __n_chunks, __chunk_size, __first_chunk_size);
>
>     std::vector<_Value> __values(__n_chunks);
>
>     // To avoid over-subscription we use taskloop for the nested parallelism
>     _PSTL_PRAGMA(omp taskloop shared(__values))
>     for (std::size_t __chunk = 0; __chunk < __n_chunks; ++__chunk)
>     {
>         auto __this_chunk_size = __chunk == 0 ? __first_chunk_size : __chunk_size;
>         auto __index = __chunk == 0 ? 0 : (__chunk * __chunk_size) + (__first_chunk_size - __chunk_size);
>         auto __begin = __first + __index;
>         auto __end = __begin + __this_chunk_size;
>         __values[__chunk] = __real_body(__begin, __end, __identity);
>     }
>
>     auto __result = __values.front();
>     for (auto p = __values.begin() + 1; p != __values.end(); ++p)
>     {
>         __result = __reduction(__result, *p);
>     }
>
>     return __result;
> }
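>
> Regarding question 2 above, below is one possible shape for combining the
> per-chunk results pairwise rather than with the single serial loop. It is
> only a sketch (untested; the helper name __combine_chunks is made up here):
> adjacent partial results are always combined left-to-right, so an
> associative but non-commutative __reduction stays correct, and each round
> can itself run as a taskloop when the number of chunks is large.
>
> template <class _Value, typename _Reduction>
> _Value
> __combine_chunks(std::vector<_Value> __values, _Reduction __reduction)
> {
>     while (__values.size() > 1)
>     {
>         const std::size_t __half = __values.size() / 2;
>         std::vector<_Value> __next(__half + __values.size() % 2);
>
>         // Combine adjacent pairs; reads come from __values, writes go to
>         // __next, so the iterations are independent.
>         _PSTL_PRAGMA(omp taskloop shared(__values, __next))
>         for (std::size_t __i = 0; __i < __half; ++__i)
>             __next[__i] = __reduction(__values[2 * __i], __values[2 * __i + 1]);
>
>         if (__values.size() % 2 != 0)
>             __next.back() = __values.back(); // odd element carries over unchanged
>
>         __values = std::move(__next);
>     }
>     return __values.front();
> }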
>
>
>
> On Fri, Oct 2, 2020 at 1:33 PM Christopher Nelson <nadiasvertex at gmail.com>
> wrote:
>
> Thank you. I wondered if you had an update on this. I've done some further
> looking, and I think that is correct. I've tried to find example
> implementations of reductions in OpenMP that don't require a commutative
> operator. It seems like rewriting the is_partitioned algorithm to provide
> a commutative operator might be a larger / undesirable change.
>
>
>
> Do you have any guidance on manually writing a task loop in OpenMP that
> performs the reduction without requiring commutativity?
>
>
>
> Thanks!
>
>
>
> -={C}=-
>
>
>
> On Thu, Oct 1, 2020 at 9:11 AM Dvorskiy, Mikhail <
> mikhail.dvorskiy at intel.com> wrote:
>
> Hi Christopher,
>
>
>
> Yes, the "is_partitioned" algo implementation is based on a parallel
> reduction pattern.
>
> And it looks like the binary operation (the combiner) is not commutative.
>
>
>
> In general, a "reduction" algorithm requires a commutative binary
> operation, and OpenMP reduction requires that.
>
> For the TBB backend it works, because the TBB parallel reduction algorithm
> doesn't require a commutative binary operation.
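>
> For example, here is a small standalone sketch (not PSTL code) using the
> functional form of TBB's parallel_reduce with an operation that is
> associative but not commutative (string concatenation). It still produces
> the sequential answer, because TBB only joins results of adjacent
> subranges:
>
> #include <tbb/blocked_range.h>
> #include <tbb/parallel_reduce.h>
>
> #include <cstddef>
> #include <string>
> #include <vector>
>
> // Concatenation is associative but not commutative; TBB still returns the
> // left-to-right result because only adjacent partial results are joined.
> std::string concatenate(const std::vector<std::string>& words)
> {
>     return tbb::parallel_reduce(
>         tbb::blocked_range<std::size_t>(0, words.size()), std::string{},
>         [&](const tbb::blocked_range<std::size_t>& r, std::string partial) {
>             for (std::size_t i = r.begin(); i != r.end(); ++i)
>                 partial += words[i];
>             return partial;
>         },
>         [](const std::string& x, const std::string& y) { return x + y; });
> }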
>
>
>
> We (Evgeniy or I) will check that hypothesis and let you know.
>
>
>
> Best regards,
>
> Mikhail Dvorskiy
>
>
>
> *From:* Christopher Nelson <nadiasvertex at gmail.com>
> *Sent:* Thursday, October 1, 2020 2:46 AM
> *To:* Kukanov, Alexey <Alexey.Kukanov at intel.com>
> *Cc:* Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>; Pavlov, Evgeniy <
> evgeniy.pavlov at intel.com>; Louis Dionne <ldionne at apple.com>; Thomas
> Rodgers <trodgers at redhat.com>; Libc++ Dev <libcxx-dev at lists.llvm.org>
> *Subject:* [libcxx-dev] OpenMP parallel reduce bugs
>
>
>
> Hello friends,
>
>
>
> I have been working on the OpenMP backend for the parallel STL, and most
> of the tests are passing. However, among the failures is the
> "is_partitioned" test. I have rewritten the __parallel_reduce backend
> function to be simpler, in an attempt to understand what is failing (code
> is below).
>
>
>
> I also rewrote it as a serial function that splits the iteration range in
> two and then calls __reduction() on each half of the range being passed in.
> The result I get from the serial execution differs from the result I get
> from the parallel execution.
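>
> For reference, the serial cross-check has roughly the following shape (a
> sketch; the helper name and exact form here are illustrative, not the
> exact code):
>
> // Sketch: split the range in two, run the body over each half, then combine
> // the two partial results with __reduction, exactly once and in order.
> template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
> _Value
> __serial_reduce_check(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
>                       _RealBody __real_body, _Reduction __reduction)
> {
>     auto __mid = __first + (__last - __first) / 2;
>     _Value __left = __real_body(__first, __mid, __identity);  // first half
>     _Value __right = __real_body(__mid, __last, __identity);  // second half
>     return __reduction(__left, __right);                      // combined left-to-right
> }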
>
>
>
> I have verified that the parallel execution tasks are run, and that their
> results match what each serial execution would be if I ran them that way.
>
>
>
> I am wondering if there is something wrong with the way OpenMP is running
> the reducer here? Perhaps it is injecting a value into the computation that
> is unexpected for this algorithm? Does anything jump out at anyone as being
> suspicious?
>
>
>
> Thank you again for your time and assistance!
>
> template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
> _Value
> __parallel_reduce_body(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
>                        _RealBody __real_body, _Reduction __reduction)
> {
>     std::size_t __item_count = __last - __first;
>     std::size_t __head_items = (__item_count / __default_chunk_size) * __default_chunk_size;
>
>     // We should encapsulate a result value and a reduction operator since we
>     // cannot use a lambda in OpenMP UDR.
>     using _CombinerType = __pstl::__internal::_Combiner<_Value, _Reduction>;
>     _CombinerType __result{__identity, &__reduction};
>     _PSTL_PRAGMA_DECLARE_REDUCTION(__combiner, _CombinerType)
>
>     // To avoid over-subscription we use taskloop for the nested parallelism
>     //_PSTL_PRAGMA(omp taskloop reduction(__combiner : __result))
>     for (std::size_t __i = 0; __i < __item_count; __i += __default_chunk_size)
>     {
>         auto __begin = __first + __i;
>         auto __end = __i < __head_items ? __begin + __default_chunk_size : __last;
>         __result.__value = __real_body(__begin, __end, __identity);
>     }
>
>     return __result.__value;
> }
>
>