[libcxx-dev] OpenMP parallel reduce bugs
Christopher Nelson via libcxx-dev
libcxx-dev at lists.llvm.org
Sat Oct 3 08:18:33 PDT 2020
Hello again,
I was able to rewrite the parallel_reduce function in a way that works
without using OpenMP's reducer. I have a couple of questions:
1. I use a vector to gather the intermediate results for the final reduction. Is
there any problem with depending on std::vector here?
2. I can see that it might make sense to build a taskloop for the final
reduction step if the number of chunks is quite large. Is that something that I
should look into more? (A rough sketch of what I mean appears after the code
below.)
The code is below. Please let me know if you have any questions or concerns.
//------------------------------------------------------------------------
// parallel_reduce
//------------------------------------------------------------------------
template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
_Value
__parallel_reduce_body(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
                       _RealBody __real_body, _Reduction __reduction)
{
    std::size_t __n_chunks{0}, __chunk_size{0}, __first_chunk_size{0};
    __chunk_partitioner(__first, __last, __n_chunks, __chunk_size, __first_chunk_size);

    // Gather the per-chunk partial results for a final, ordered combine.
    std::vector<_Value> __values(__n_chunks);

    // To avoid over-subscription we use taskloop for the nested parallelism.
    _PSTL_PRAGMA(omp taskloop shared(__values))
    for (std::size_t __chunk = 0; __chunk < __n_chunks; ++__chunk)
    {
        auto __this_chunk_size = __chunk == 0 ? __first_chunk_size : __chunk_size;
        auto __index = __chunk == 0 ? 0 : (__chunk * __chunk_size) + (__first_chunk_size - __chunk_size);
        auto __begin = __first + __index;
        auto __end = __begin + __this_chunk_size;
        __values[__chunk] = __real_body(__begin, __end, __identity);
    }

    // Combine the partial results left to right, so a non-commutative
    // (but associative) __reduction is still handled correctly.
    auto __result = __values.front();
    for (auto __p = __values.begin() + 1; __p != __values.end(); ++__p)
    {
        __result = __reduction(__result, *__p);
    }
    return __result;
}
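To make question 2 concrete, here is a rough, untested sketch of the kind of
taskloop-driven combine I have in mind for that last loop. Each application of
__reduction keeps the lower-index value on the left, so it should only need
associativity, not commutativity. The __reduce_values_tree name is made up for
this example:

// Untested sketch: in-place pairwise (tree) combine of the per-chunk results.
// Each pass combines __values[__i] with __values[__i + __step], always keeping
// the lower-index value on the left, so only associativity is required.
template <class _Value, class _Reduction>
_Value
__reduce_values_tree(std::vector<_Value>& __values, _Reduction __reduction)
{
    const std::size_t __n = __values.size();
    for (std::size_t __step = 1; __step < __n; __step *= 2)
    {
        // Distinct iterations read and write disjoint elements, so the
        // taskloop over one pass is race-free.
        _PSTL_PRAGMA(omp taskloop shared(__values))
        for (std::size_t __i = 0; __i < __n - __step; __i += 2 * __step)
        {
            __values[__i] = __reduction(__values[__i], __values[__i + __step]);
        }
    }
    return __values.front();
}

With something like that in place, the serial combine loop above would collapse
to a single call: return __reduce_values_tree(__values, __reduction);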
On Fri, Oct 2, 2020 at 1:33 PM Christopher Nelson <nadiasvertex at gmail.com>
wrote:
> Thank you. I wondered if you had an update on this. I've done some further
> looking, and I think that is correct. I've tried to find example
> implementations of performing reductions with OpenMP that don't require a
> commutative operator. It seems like rewriting the is_partitioned algorithm to
> provide a commutative operator might be a larger / undesirable change.
>
> Do you have any guidance on manually writing a taskloop in OpenMP that
> performs the reduction without requiring commutativity?
>
> Thanks!
>
> -={C}=-
>
> On Thu, Oct 1, 2020 at 9:11 AM Dvorskiy, Mikhail <
> mikhail.dvorskiy at intel.com> wrote:
>
>> Hi Christopher,
>>
>>
>>
>> Yes, the “is_partitioned” algo implementation is based on a parallel
>> reduction pattern.
>>
>> And it looks like the binary operation (the combiner) is not commutative.
>>
>>
>>
>> In general, a “reduction” algorithm requires a commutative binary
>> operation, and OpenMP reduction requires that.
>>
>> For the TBB backend it works because the TBB parallel reduction algorithm
>> doesn’t require a commutative binary operation.
>>
>>
>>
>> We (Evgeniy or I) will check that hypothesis and inform you.
>>
>>
>>
>> Best regards,
>>
>> Mikhail Dvorskiy
>>
>>
>>
>> From: Christopher Nelson <nadiasvertex at gmail.com>
>> Sent: Thursday, October 1, 2020 2:46 AM
>> To: Kukanov, Alexey <Alexey.Kukanov at intel.com>
>> Cc: Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>; Pavlov, Evgeniy <evgeniy.pavlov at intel.com>;
>> Louis Dionne <ldionne at apple.com>; Thomas Rodgers <trodgers at redhat.com>;
>> Libc++ Dev <libcxx-dev at lists.llvm.org>
>> Subject: [libcxx-dev] OpenMP parallel reduce bugs
>>
>>
>>
>> Hello friends,
>>
>>
>>
>> I have been working on the OpenMP backend for the parallel STL, and most
>> of the tests are passing. However, among the failures is the
>> "is_partitioned" test. I have rewritten the __parallel_reduce backend
>> function to be simpler, in an attempt to understand what is failing (code
>> is below).
>>
>>
>>
>> I also rewrote it as a serial function that splits the iteration range in
>> two and then calls __reduction() on each half of the range being passed in.
>> The result I get from the serial execution differs from the result I get
>> from the parallel execution.
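>>
>> In other words, something along these lines (an illustrative sketch of
>> that serial variant, not the exact code; the __serial_split_reduce name is
>> made up for this example):
>>
>> template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
>> _Value
>> __serial_split_reduce(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
>>                       _RealBody __real_body, _Reduction __reduction)
>> {
>>     // Split the range in two, run the body serially on each half, then
>>     // combine the two partial results exactly once, left before right.
>>     auto __mid = __first + (__last - __first) / 2;
>>     _Value __left = __real_body(__first, __mid, __identity);
>>     _Value __right = __real_body(__mid, __last, __identity);
>>     return __reduction(__left, __right);
>> }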
>>
>>
>>
>> I have verified that the parallel execution tasks are run, and that their
>> results match what each serial execution would produce if I ran them that way.
>>
>>
>>
>> I am wondering if there is something wrong with the way OpenMP is running
>> the reducer here. Perhaps it is injecting a value into the computation that
>> this algorithm does not expect? Does anything jump out at anyone as
>> suspicious?
>>
>>
>>
>> Thank you again for your time and assistance!
>>
>> template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
>> _Value
>> __parallel_reduce_body(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
>>                        _RealBody __real_body, _Reduction __reduction)
>> {
>>     std::size_t __item_count = __last - __first;
>>     std::size_t __head_items = (__item_count / __default_chunk_size) * __default_chunk_size;
>>
>>     // We should encapsulate a result value and a reduction operator since we
>>     // cannot use a lambda in OpenMP UDR.
>>     using _CombinerType = __pstl::__internal::_Combiner<_Value, _Reduction>;
>>     _CombinerType __result{__identity, &__reduction};
>>     _PSTL_PRAGMA_DECLARE_REDUCTION(__combiner, _CombinerType)
>>
>>     // To avoid over-subscription we use taskloop for the nested parallelism
>>     //_PSTL_PRAGMA(omp taskloop reduction(__combiner : __result))
>>     for (std::size_t __i = 0; __i < __item_count; __i += __default_chunk_size)
>>     {
>>         auto __begin = __first + __i;
>>         auto __end = __i < __head_items ? __begin + __default_chunk_size : __last;
>>         __result.__value = __real_body(__begin, __end, __identity);
>>     }
>>
>>     return __result.__value;
>> }
>>
>>