[Polly] Fix parallelism check for reductions

Thu Jun 26 16:47:11 PDT 2014

Thanks for the comments on this one, I added some myself again.

Summarized I think we still need to figure out what a correct and good way
of doing this (but I think we are close).

>> On 26/06/2014 02:24, Doerfert, Johannes wrote:
>> > Hey Tobi,
>> >
>> > This is a patch we need to discuss in detail.
>> > I basically tried to fix the loop reversal and modulo problems we
discussed
>> > earlier.
>> >
>> > Do you see any problems whit this way of check for broken dependences?
>> 
>> I have some comments, but feel free to ignore what you believe is
unnecessary.
I'm not sure what's the best way to do this yet. My proposed way (TC and RTC
dependences) perform pretty well so far (I cannot find a schedule which
would
not be parallelized because of overaproximation and I'm confident they will
always detect the need for privatization).

>> >       // Reduction dependences
>> > -    TYPE_RED = 0x8,
>> > +    TYPE_RED = 1 << 3,
>> >
>> > -    // All (validity) dependences
>> > -    TYPE_ALL = (TYPE_WAR | TYPE_RAW | TYPE_WAW)
>> 
>> I would have committed this change separately.
I did r211803.

>> > +    // Transitive closure of the reduction dependences
>> > +    TYPE_TC_RED = 1 << 4,
>> > +
>> > +    // Reversed transitive closure of the reduction dependences
>> > +    TYPE_RTC_RED = 1 << 5,
>> 
>> Why do you expose these dependences at all?  I personally would have
stored the
>> union of those two in TYPE_RED and would have removed the negative
dependences
>> from TYPE_RED any time getDependences is called (using the schedule to
determin
>> what negative means).
I store them for now because TYPE_RED and TYPE_*TC_RED do not serve the same
purpose:
  TYPE_RED is still used to force sequential execution since the code
generation
           is not ready.
  TYPE_*TC_RED is used to determine if parallel execution needs to be
augmented
               by privatization.

>> > +/// @brief Fix all dimension of @p Zero to 0 and add it to @p user 
>> > +static int fixSetToZero(__isl_take isl_set *Zero, void *user) {
>> > +  isl_union_set **User = (isl_union_set **)user;
>> > +  for (unsigned i = 0; i < isl_set_dim(Zero, isl_dim_set); i++)
>> > +    Zero = isl_set_fix_si(Zero, isl_dim_set, i, 0);
>> > +  *User = isl_union_set_add_set(*User, Zero);
>> > +  return 0;
>> > +}
>> > +
>> >   /// @brief Compute the privatization dependences for a given
dependency @p
>> >   Map
>> >   ///
>> >   /// Privatization dependences are widened original dependences which 
>> > originate @@ -132,14 +141,27 @@ void Dependences::collectInfo(Scop &S,
>> > isl_union_map **Read,
>> >   ///       S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
>> >   ///   RED:
>> >   ///     { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }
>> > +///
>> > +/// Note: This function also computes the (reverse) transitive closure
of the
>> > +///       reduction dependences.
>> >   void Dependences::addPrivatizationDependences() {
>> > -  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR, *TransClosure;
>> > +  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR;
>> >
>> >     // The transitive closure might be over approximated but we only
use it to
>> >     // compute the privatization dependences. Thus, overapproximation 
>> > will lead
>> > -  // "only" to more conservative privatization dependences.
>> > -  // FIXME: Take precautions to ensure only forward dependences are
created.
>> > -  TransClosure = 
>> > isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
>> > +  // "only" to more conservative privatization dependences. However, 
>> > + we take  // precautions to ensure only forward dependences are
created.
>> > +  TC_RED = isl_union_map_transitive_closure(isl_union_map_copy(RED), 
>> > + 0);
>> > +
>> > +  isl_union_set *UDeltas = 
>> > + isl_union_map_deltas(isl_union_map_copy(TC_RED));
>> > +  isl_union_set *Universe = 
>> > + isl_union_set_universe(isl_union_set_copy(UDeltas));
>> > +  isl_union_set *Zero = 
>> > + isl_union_set_empty(isl_union_set_get_space(Universe));
>> > +  isl_union_set_foreach_set(Universe, fixSetToZero, &Zero);
>> 
>> As the schedule space is a isl_set, but not a union_set you may be able
to avoid
>> this helper function.
Is it a isl_set? Maybe I am confused right now but when I run
test/Dependences/reduction_multiple_reductions.ll I get:
isl_union_set_dump(UDeltas);
  { [Stmt_if_else[i0] -> MemRef_prod[0]] : i0 >= 1 and i0 <= 511;
[Stmt_if_then[i0] -> MemRef_sum[0]] : i0 >= 1 and i0 <= 511 }
isl_union_set_dump(Zero);
  { [Stmt_if_else[0] -> MemRef_prod[0]]; [Stmt_if_then[0] -> MemRef_sum[0]]
}
and:
  isl_set_from_union_set(Zero);
crashes.

Did I understand you wrong?

>> > +  isl_union_map *NonPositive = 
>> > + isl_union_set_lex_le_union_set(UDeltas, Zero);
>> 
>> You do not involve the schedule here? Only after applying the schedule
you can
>> know if dependences are negative in time.
You mentioned that before but I still don't see why. I will try to come up
with
a formal reason why it is right, but I don't think it is ever "incorrect".

Correctness [Do we __always__ privatize when we have to?]:
  The *TC_RED dependences basically connect all reduction iterations with
each
  other making it imposible to schedule any two at the same time without
  violating at least one of these dependences.

Tell me what you think about that "argument".

>> If you computed TC_RED as the union of TC_RED and RTC_RED before calling
this
>> function, you an just use TC_RED here.
I merged them.

>> > @@ -371,7 +393,7 @@ bool Dependences::isParallelDimension(__isl_take 
>> > isl_set *ScheduleSubset,
>> >
>> >     // FIXME: We can remove ignore reduction dependences in case we
privatize
>> >     the
>> >     //        memory locations the reduction statements reduce into.
>> > -  Deps = getDependences(TYPE_ALL | TYPE_RED);
>> > +  Deps = getDependences(TYPE_RAW | TYPE_WAW | TYPE_WAR | TYPE_RED);
>> 
>> Why do you use TYPE_RED here and not the TC_RED dependences?
>> 
>> (I know having two parallelism checks is a bad idea)
Because I did only ensure the isl codegeneration (In IslAst.h) does "the
right
thing" while we annotate the isl ast optimistically with "pragma *
reduction".

The above code is used by the cloog codegen. I use TYPE_RED to ensure
sequential
execution (as explained earlier).

>> > -  isl_union_map *RedDeps = D->getDependences(Dependences::TYPE_RED);
>> > +  isl_union_map *RedDeps =
>> > +      D->getDependences(Dependences::TYPE_TC_RED | 
>> > + Dependences::TYPE_RTC_RED);
>> 
>> Does this work if we reverse the dependences?
>> 
>>   "schedule" : "[n] -> { Stmt_S0[i0] -> scattering[0, -i0, 0]: i0 % 2 =
0;
>>   Stmt_S0[i0] -> scattering[2, -i0, 0]: i0 % 2 = 1 }"
It does, but wasn't that also one of the examples?

I added some more possible schedules which all look good to me.

The test cases combine loop reversals, "modulo schedules" and multiple
dimensions (some of them do not carry reduction dependences).

--

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-parallelism-check-with-reductions-dependences_2.patch
Type: application/octet-stream
Size: 72794 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140626/708edbb8/attachment.obj>