[libcxx-dev] OpenMP parallel reduce bugs
Dvorskiy, Mikhail via libcxx-dev
libcxx-dev at lists.llvm.org
Thu Oct 1 06:11:54 PDT 2020
Hi Christopher,
Yes, “is_partitioned” algo implementation is based on a reduction parallel pattern.
And it looks that a binary operation (combiner) is not commutative.
In general, “reduction” algorithm requires a commutative binary operation. And OpenMP reduction requires that.
For TBB backend it works because TBB parallel reduction algorithm doesn’t require a commutative binary operation.
We (me or Evgeniy) will check that hypo and inform you.
Best regards,
Mikhail Dvorskiy
From: Christopher Nelson <nadiasvertex at gmail.com>
Sent: Thursday, October 1, 2020 2:46 AM
To: Kukanov, Alexey <Alexey.Kukanov at intel.com>
Cc: Dvorskiy, Mikhail <mikhail.dvorskiy at intel.com>; Pavlov, Evgeniy <evgeniy.pavlov at intel.com>; Louis Dionne <ldionne at apple.com>; Thomas Rodgers <trodgers at redhat.com>; Libc++ Dev <libcxx-dev at lists.llvm.org>
Subject: [libcxx-dev] OpenMP parallel reduce bugs
Hello friends,
I have been working on the OpenMP backend for the parallel STL, and most of the tests are passing. However, among the failures is the "is_partitioned" test. I have rewritten the __parallel_reduce backend function to be simpler to understand in an attempt to understand what is failing (code is below.)
I also rewrote it as a serial function that splits the iteration range in two and then calls __reduction() on each half of the range being passed in. The result I get from the serial execution as compared to the result I get from the parallel execution is different.
I have verified that the parallel execution tasks are run, and that their results match what each serial execution would be if I ran them that way.
I am wondering if there is something wrong with the way OpenMP is running the reducer here? Perhaps it is injecting a value into the computation that is unexpected for this algorithm? Does anything jump out at anyone as being suspicious?
Thank you again for your time and assistance!
template <class _RandomAccessIterator, class _Value, typename _RealBody, typename _Reduction>
_Value
__parallel_reduce_body(_RandomAccessIterator __first, _RandomAccessIterator __last, _Value __identity,
_RealBody __real_body, _Reduction __reduction)
{
std::size_t __item_count = __last - __first;
std::size_t __head_items = (__item_count / __default_chunk_size) * __default_chunk_size;
// We should encapsulate a result value and a reduction operator since we
// cannot use a lambda in OpenMP UDR.
using _CombinerType = __pstl::__internal::_Combiner<_Value, _Reduction>;
_CombinerType __result{__identity, &__reduction};
_PSTL_PRAGMA_DECLARE_REDUCTION(__combiner, _CombinerType)
// To avoid over-subscription we use taskloop for the nested parallelism
//_PSTL_PRAGMA(omp taskloop reduction(__combiner : __result))
for (std::size_t __i = 0; __i < __item_count; __i += __default_chunk_size)
{
auto __begin = __first + __i;
auto __end = __i < __head_items ? __begin + __default_chunk_size : __last;
__result.__value = __real_body(__begin, __end, __identity);
}
return __result.__value;
}
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/libcxx-dev/attachments/20201001/8cbc0dfe/attachment-0001.html>
More information about the libcxx-dev
mailing list