[Polly] Fix parallelism check for reductions
Tobias Grosser
tobias at grosser.es
Thu Jun 26 22:37:37 PDT 2014
On 27/06/2014 01:47, Johannes Doerfert wrote:
> Thanks for the comments on this one, I added some myself again.
>
> Summarized I think we still need to figure out what a correct and good way
> of doing this (but I think we are close).
Yes we are close.
>>> On 26/06/2014 02:24, Doerfert, Johannes wrote:
>>>> Hey Tobi,
>>>>
>>>> This is a patch we need to discuss in detail.
>>>> I basically tried to fix the loop reversal and modulo problems we
> discussed
>>>> earlier.
>>>>
>>>> Do you see any problems whit this way of check for broken dependences?
>>>
>>> I have some comments, but feel free to ignore what you believe is
> unnecessary.
> I'm not sure what's the best way to do this yet. My proposed way (TC and RTC
> dependences) perform pretty well so far (I cannot find a schedule which
> would
> not be parallelized because of overaproximation and I'm confident they will
> always detect the need for privatization).
>>>> + // Transitive closure of the reduction dependences
>>>> + TYPE_TC_RED = 1 << 4,
>>>> +
>>>> + // Reversed transitive closure of the reduction dependences
>>>> + TYPE_RTC_RED = 1 << 5,
>>>
>>> Why do you expose these dependences at all? I personally would have
> stored the
>>> union of those two in TYPE_RED and would have removed the negative
> dependences
>>> from TYPE_RED any time getDependences is called (using the schedule to
> determin
>>> what negative means).
> I store them for now because TYPE_RED and TYPE_*TC_RED do not serve the same
> purpose:
> TYPE_RED is still used to force sequential execution since the code
> generation
> is not ready.
> TYPE_*TC_RED is used to determine if parallel execution needs to be
> augmented
> by privatization.
>
>>>> +/// @brief Fix all dimension of @p Zero to 0 and add it to @p user
>>>> +static int fixSetToZero(__isl_take isl_set *Zero, void *user) {
>>>> + isl_union_set **User = (isl_union_set **)user;
>>>> + for (unsigned i = 0; i < isl_set_dim(Zero, isl_dim_set); i++)
>>>> + Zero = isl_set_fix_si(Zero, isl_dim_set, i, 0);
>>>> + *User = isl_union_set_add_set(*User, Zero);
>>>> + return 0;
>>>> +}
>>>> +
>>>> /// @brief Compute the privatization dependences for a given
> dependency @p
>>>> Map
>>>> ///
>>>> /// Privatization dependences are widened original dependences which
>>>> originate @@ -132,14 +141,27 @@ void Dependences::collectInfo(Scop &S,
>>>> isl_union_map **Read,
>>>> /// S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
>>>> /// RED:
>>>> /// { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }
>>>> +///
>>>> +/// Note: This function also computes the (reverse) transitive closure
> of the
>>>> +/// reduction dependences.
>>>> void Dependences::addPrivatizationDependences() {
>>>> - isl_union_map *PrivRAW, *PrivWAW, *PrivWAR, *TransClosure;
>>>> + isl_union_map *PrivRAW, *PrivWAW, *PrivWAR;
>>>>
>>>> // The transitive closure might be over approximated but we only
> use it to
>>>> // compute the privatization dependences. Thus, overapproximation
>>>> will lead
>>>> - // "only" to more conservative privatization dependences.
>>>> - // FIXME: Take precautions to ensure only forward dependences are
> created.
>>>> - TransClosure =
>>>> isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
>>>> + // "only" to more conservative privatization dependences. However,
>>>> + we take // precautions to ensure only forward dependences are
> created.
>>>> + TC_RED = isl_union_map_transitive_closure(isl_union_map_copy(RED),
>>>> + 0);
>>>> +
>>>> + isl_union_set *UDeltas =
>>>> + isl_union_map_deltas(isl_union_map_copy(TC_RED));
>>>> + isl_union_set *Universe =
>>>> + isl_union_set_universe(isl_union_set_copy(UDeltas));
>>>> + isl_union_set *Zero =
>>>> + isl_union_set_empty(isl_union_set_get_space(Universe));
>>>> + isl_union_set_foreach_set(Universe, fixSetToZero, &Zero);
>>>
>>> As the schedule space is a isl_set, but not a union_set you may be able
> to avoid
>>> this helper function.
> Is it a isl_set? Maybe I am confused right now but when I run
> test/Dependences/reduction_multiple_reductions.ll I get:
> isl_union_set_dump(UDeltas);
> { [Stmt_if_else[i0] -> MemRef_prod[0]] : i0 >= 1 and i0 <= 511;
> [Stmt_if_then[i0] -> MemRef_sum[0]] : i0 >= 1 and i0 <= 511 }
> isl_union_set_dump(Zero);
> { [Stmt_if_else[0] -> MemRef_prod[0]]; [Stmt_if_then[0] -> MemRef_sum[0]]
> }
> and:
> isl_set_from_union_set(Zero);
> crashes.
>
> Did I understand you wrong?
I mean the space into which the schedule maps is a single space. Those
are the dependences to which the schedule was not yet applied.
>>>> + isl_union_map *NonPositive =
>>>> + isl_union_set_lex_le_union_set(UDeltas, Zero);
>>>
>>> You do not involve the schedule here? Only after applying the schedule
> you can
>>> know if dependences are negative in time.
> You mentioned that before but I still don't see why. I will try to come up
> with
> a formal reason why it is right, but I don't think it is ever "incorrect".
>
> Correctness [Do we __always__ privatize when we have to?]:
> The *TC_RED dependences basically connect all reduction iterations with
> each
> other making it imposible to schedule any two at the same time without
> violating at least one of these dependences.
>
> Tell me what you think about that "argument".
Right. This is more about removing negative dependences here. Is this
not what the lex_le_set is for? Maybe I misunderstood, but what is a
negative dependency for you. Can you explain?
>>>> @@ -371,7 +393,7 @@ bool Dependences::isParallelDimension(__isl_take
>>>> isl_set *ScheduleSubset,
>>>>
>>>> // FIXME: We can remove ignore reduction dependences in case we
> privatize
>>>> the
>>>> // memory locations the reduction statements reduce into.
>>>> - Deps = getDependences(TYPE_ALL | TYPE_RED);
>>>> + Deps = getDependences(TYPE_RAW | TYPE_WAW | TYPE_WAR | TYPE_RED);
>>>
>>> Why do you use TYPE_RED here and not the TC_RED dependences?
>>>
>>> (I know having two parallelism checks is a bad idea)
> Because I did only ensure the isl codegeneration (In IslAst.h) does "the
> right
> thing" while we annotate the isl ast optimistically with "pragma *
> reduction".
>
> The above code is used by the cloog codegen. I use TYPE_RED to ensure
> sequential
> execution (as explained earlier).
Did you test this with CLooG. I am afraid it may also introduce parallelism.
>>>> - isl_union_map *RedDeps = D->getDependences(Dependences::TYPE_RED);
>>>> + isl_union_map *RedDeps =
>>>> + D->getDependences(Dependences::TYPE_TC_RED |
>>>> + Dependences::TYPE_RTC_RED);
>>>
>>> Does this work if we reverse the dependences?
>>>
>>> "schedule" : "[n] -> { Stmt_S0[i0] -> scattering[0, -i0, 0]: i0 % 2 =
> 0;
>>> Stmt_S0[i0] -> scattering[2, -i0, 0]: i0 % 2 = 1 }"
> It does, but wasn't that also one of the examples?
>
> I added some more possible schedules which all look good to me.
>
> The test cases combine loop reversals, "modulo schedules" and multiple
> dimensions (some of them do not carry reduction dependences).
Good. Thanks for checking.
Tobias
More information about the llvm-commits
mailing list