[libcxx-dev] is_partioned algorithm
Christopher Nelson via libcxx-dev
libcxx-dev at lists.llvm.org
Thu Oct 1 05:26:11 PDT 2020
Hi friends,new_state = table[left_state * 4 + right_state]
I have rewritten the __parallel_reduce function for OpenMP to avoid using
the reduction() clause. I think that OpenMP is not feeding the
output values to the reduction in the way that one might think. I am
tracking that problem in a different thread. In any case, the loop without
the reduction, and then performing an explicit reduction at the end gets
_much_ farther.
In this thread, I would like to talk about the is_partioned algorithm.
Briefly, I can see that there is a main body which performs some processing
and then returns a result which is a value in a lookup table.
The brick body for is_partioned will return one of:
enum _ReduceType
{
__not_init = -1,
__broken,
__all_true,
__all_false,
__true_false
};
In the case I am trying to debug there 521 elements. The simple, naive
chunk partitioner I wrote breaks this up into two chunks: 0-511, and
512-521.
The first chunk comes back as "2" (__all_true) and "3" (__all_false) if I
am reading the enum right. The reducer is called with __reduction(2,3)
which performs the following table lookup:
new_state = table[left_state * 4 + right_state]
In this case, it looks like it result in table entry 11:
_ReduceType __table[] = {
__broken, __broken, __broken, __broken, __broken, __all_true,
__true_false, __true_false, __broken, __broken, __all_false, __broken,
__broken, __broken, __true_false, __broken
};
Which turns out to be __broken. However, when I run the brick body on the
entire range at once, it results in "3". This seems to indicate that the
algorithm is very sensitive to how the range is chunked. I looked at the
TBB implementation and it seems to create a blocked range with a
grainsize of 1. That seems wrong, since the scheduling overhead would be
enormous, so clearly there is something going on that I don't understand.
However, since the reduction of two arbitrary chunks of a range appears to
give an answer that is different from the reduction of a symmetric split,
it seems like that may be a requirement. Is that true? Does
__parallel_reduce need split the range evenly in order to perform correctly?
-={C}=-
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/libcxx-dev/attachments/20201001/e2861962/attachment.html>
More information about the libcxx-dev
mailing list