[Polly] Fix parallelism check for reductions

Tobias Grosser tobias at grosser.es
Thu Jun 26 22:37:37 PDT 2014


On 27/06/2014 01:47, Johannes Doerfert wrote:
> Thanks for the comments on this one, I added some myself again.
>
> Summarized I think we still need to figure out what a correct and good way
> of doing this (but I think we are close).

Yes we are close.

>>> On 26/06/2014 02:24, Doerfert, Johannes wrote:
>>>> Hey Tobi,
>>>>
>>>> This is a patch we need to discuss in detail.
>>>> I basically tried to fix the loop reversal and modulo problems we
> discussed
>>>> earlier.
>>>>
>>>> Do you see any problems whit this way of check for broken dependences?
>>>
>>> I have some comments, but feel free to ignore what you believe is
> unnecessary.
> I'm not sure what's the best way to do this yet. My proposed way (TC and RTC
> dependences) perform pretty well so far (I cannot find a schedule which
> would
> not be parallelized because of overaproximation and I'm confident they will
> always detect the need for privatization).


>>>> +    // Transitive closure of the reduction dependences
>>>> +    TYPE_TC_RED = 1 << 4,
>>>> +
>>>> +    // Reversed transitive closure of the reduction dependences
>>>> +    TYPE_RTC_RED = 1 << 5,
>>>
>>> Why do you expose these dependences at all?  I personally would have
> stored the
>>> union of those two in TYPE_RED and would have removed the negative
> dependences
>>> from TYPE_RED any time getDependences is called (using the schedule to
> determin
>>> what negative means).
> I store them for now because TYPE_RED and TYPE_*TC_RED do not serve the same
> purpose:
>    TYPE_RED is still used to force sequential execution since the code
> generation
>             is not ready.
>    TYPE_*TC_RED is used to determine if parallel execution needs to be
> augmented
>                 by privatization.
>
>>>> +/// @brief Fix all dimension of @p Zero to 0 and add it to @p user
>>>> +static int fixSetToZero(__isl_take isl_set *Zero, void *user) {
>>>> +  isl_union_set **User = (isl_union_set **)user;
>>>> +  for (unsigned i = 0; i < isl_set_dim(Zero, isl_dim_set); i++)
>>>> +    Zero = isl_set_fix_si(Zero, isl_dim_set, i, 0);
>>>> +  *User = isl_union_set_add_set(*User, Zero);
>>>> +  return 0;
>>>> +}
>>>> +
>>>>    /// @brief Compute the privatization dependences for a given
> dependency @p
>>>>    Map
>>>>    ///
>>>>    /// Privatization dependences are widened original dependences which
>>>> originate @@ -132,14 +141,27 @@ void Dependences::collectInfo(Scop &S,
>>>> isl_union_map **Read,
>>>>    ///       S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
>>>>    ///   RED:
>>>>    ///     { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }
>>>> +///
>>>> +/// Note: This function also computes the (reverse) transitive closure
> of the
>>>> +///       reduction dependences.
>>>>    void Dependences::addPrivatizationDependences() {
>>>> -  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR, *TransClosure;
>>>> +  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR;
>>>>
>>>>      // The transitive closure might be over approximated but we only
> use it to
>>>>      // compute the privatization dependences. Thus, overapproximation
>>>> will lead
>>>> -  // "only" to more conservative privatization dependences.
>>>> -  // FIXME: Take precautions to ensure only forward dependences are
> created.
>>>> -  TransClosure =
>>>> isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
>>>> +  // "only" to more conservative privatization dependences. However,
>>>> + we take  // precautions to ensure only forward dependences are
> created.
>>>> +  TC_RED = isl_union_map_transitive_closure(isl_union_map_copy(RED),
>>>> + 0);
>>>> +
>>>> +  isl_union_set *UDeltas =
>>>> + isl_union_map_deltas(isl_union_map_copy(TC_RED));
>>>> +  isl_union_set *Universe =
>>>> + isl_union_set_universe(isl_union_set_copy(UDeltas));
>>>> +  isl_union_set *Zero =
>>>> + isl_union_set_empty(isl_union_set_get_space(Universe));
>>>> +  isl_union_set_foreach_set(Universe, fixSetToZero, &Zero);
>>>
>>> As the schedule space is a isl_set, but not a union_set you may be able
> to avoid
>>> this helper function.
> Is it a isl_set? Maybe I am confused right now but when I run
> test/Dependences/reduction_multiple_reductions.ll I get:
> isl_union_set_dump(UDeltas);
>    { [Stmt_if_else[i0] -> MemRef_prod[0]] : i0 >= 1 and i0 <= 511;
> [Stmt_if_then[i0] -> MemRef_sum[0]] : i0 >= 1 and i0 <= 511 }
> isl_union_set_dump(Zero);
>    { [Stmt_if_else[0] -> MemRef_prod[0]]; [Stmt_if_then[0] -> MemRef_sum[0]]
> }
> and:
>    isl_set_from_union_set(Zero);
> crashes.
>
> Did I understand you wrong?

I mean the space into which the schedule maps is a single space. Those 
are the dependences to which the schedule was not yet applied.

>>>> +  isl_union_map *NonPositive =
>>>> + isl_union_set_lex_le_union_set(UDeltas, Zero);
>>>
>>> You do not involve the schedule here? Only after applying the schedule
> you can
>>> know if dependences are negative in time.
> You mentioned that before but I still don't see why. I will try to come up
> with
> a formal reason why it is right, but I don't think it is ever "incorrect".
>
> Correctness [Do we __always__ privatize when we have to?]:
>    The *TC_RED dependences basically connect all reduction iterations with
> each
>    other making it imposible to schedule any two at the same time without
>    violating at least one of these dependences.
>
> Tell me what you think about that "argument".

Right. This is more about removing negative dependences here. Is this 
not what the lex_le_set is for? Maybe I misunderstood, but what is a 
negative dependency for you. Can you explain?

>>>> @@ -371,7 +393,7 @@ bool Dependences::isParallelDimension(__isl_take
>>>> isl_set *ScheduleSubset,
>>>>
>>>>      // FIXME: We can remove ignore reduction dependences in case we
> privatize
>>>>      the
>>>>      //        memory locations the reduction statements reduce into.
>>>> -  Deps = getDependences(TYPE_ALL | TYPE_RED);
>>>> +  Deps = getDependences(TYPE_RAW | TYPE_WAW | TYPE_WAR | TYPE_RED);
>>>
>>> Why do you use TYPE_RED here and not the TC_RED dependences?
>>>
>>> (I know having two parallelism checks is a bad idea)
> Because I did only ensure the isl codegeneration (In IslAst.h) does "the
> right
> thing" while we annotate the isl ast optimistically with "pragma *
> reduction".
>
> The above code is used by the cloog codegen. I use TYPE_RED to ensure
> sequential
> execution (as explained earlier).

Did you test this with CLooG. I am afraid it may also introduce parallelism.

>>>> -  isl_union_map *RedDeps = D->getDependences(Dependences::TYPE_RED);
>>>> +  isl_union_map *RedDeps =
>>>> +      D->getDependences(Dependences::TYPE_TC_RED |
>>>> + Dependences::TYPE_RTC_RED);
>>>
>>> Does this work if we reverse the dependences?
>>>
>>>    "schedule" : "[n] -> { Stmt_S0[i0] -> scattering[0, -i0, 0]: i0 % 2 =
> 0;
>>>    Stmt_S0[i0] -> scattering[2, -i0, 0]: i0 % 2 = 1 }"
> It does, but wasn't that also one of the examples?
>
> I added some more possible schedules which all look good to me.
>
> The test cases combine loop reversals, "modulo schedules" and multiple
> dimensions (some of them do not carry reduction dependences).

Good. Thanks for checking.

Tobias




More information about the llvm-commits mailing list