[Polly] Fix parallelism check for reductions

Fri Jul 11 02:34:38 PDT 2014

On 10/07/2014 01:32, Johannes Doerfert wrote:
> Hey,
>
> We talked about this today and while I think it is correct I would like a
> last comment here.
> This patch basically is combined out of the [Fix] we discussed earlier and
> the initial (and I think "accepted") isl ast patch.
> I also added comments to parts which might be confusing or need attention in
> the future.
>
> What do you guys think?

>     // The node is the innermost parallel loop.
>     int IsInnermostParallel;
> +
> +  // The node is only parallel because of reductions

      // The node is parallel only if reductions are ignored or
         privatized.

Reductions don't make a loop parallel, no?

>     /// @brief The type of the dependences.
>     ///
> -  /// Reduction dependences are seperated because they can be ignored during
> -  /// the scheduling. This is the case since the order in which the reduction
> -  /// statements are executed does not matter. However, if they are executed
> -  /// in parallel we need to take additional measures (e.g., privatization)
> -  /// to ensure a correct result.
> +  /// Reduction dependences are seperated from RAW/WAW/WAR dependences because

separated

> +  /// we can ignore them during the scheduling. This is the case since the order
> +  /// in which the reduction statements are executed does not matter. However,
> +  /// if they are executed in parallel we need to take additional measures
> +  /// (e.g, privatization) to ensure a correct result. The (reverse) transitive
> +  /// closure of the reduction dependences are used to check for parallel
> +  /// executed reduction statements during code generation.
> +  ///
> +  /// FIXME: As soon as the code generation can handle reductions we can get
> +  ///        rid of the current TYPE_RED dependences and rename the TYPE_TC_RED
> +  ///        to TYPE_RED.

I don't get this FIXME. What is needed in the code generation to remove 
TC_RED? (You can just remove it, if you don't want to discuss this now).

>     enum Type {
>       // Write after read
> -    TYPE_WAR = 0x1,
> +    TYPE_WAR = 1 << 0,
>
>       // Read after write
> -    TYPE_RAW = 0x2,
> +    TYPE_RAW = 1 << 1,
>
>       // Write after write
> -    TYPE_WAW = 0x4,
> +    TYPE_WAW = 1 << 2,
>
>       // Reduction dependences
> -    TYPE_RED = 0x8,
> +    TYPE_RED = 1 << 3,
> +
> +    // Transitive closure of the reduction dependences (& the reverse)

Please describe how they look like, not how they are computed. Also, we 
should mention here that thy are cyclic and possibly reversed.

>   void Dependences::addPrivatizationDependences() {
> -  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR, *TransClosure;
> -
> -  // The transitive closure might be over approximated but we only use it to
> -  // compute the privatization dependences. Thus, overapproximation will lead
> -  // "only" to more conservative privatization dependences.
> -  // FIXME: Take precautions to ensure only forward dependences are created.
> -  TransClosure = isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
> +  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR;
> +
> +  // The transitive closure might be over approximated, thus could lead to
> +  // dependency cycles in the privatization dependences. To make sure this
> +  // will not happen we remove all negative dependences after we computed
> +  // the transitive closure.
> +  TC_RED = isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
> +
> +  // FIXME: Apply the current schedule instead of assuming the identity schedule
> +  //        here. The current approach is only valid as long as we compute the
> +  //        dependences only with the initial (identity schedule). Any other
> +  //        schedule could change "the direction of the backward depenendes" we
> +  //        want to eliminate here.
> +  isl_union_set *UDeltas = isl_union_map_deltas(isl_union_map_copy(TC_RED));
> +  isl_union_set *Universe = isl_union_set_universe(isl_union_set_copy(UDeltas));
> +  isl_union_set *Zero = isl_union_set_empty(isl_union_set_get_space(Universe));
> +  isl_union_set_foreach_set(Universe, fixSetToZero, &Zero);
> +  isl_union_map *NonPositive = isl_union_set_lex_le_union_set(UDeltas, Zero);

Did you face any issues with applying the schedule? Or you just decided 
to leave it for later.

>
> +/// @brief Check if the current scheduling dimension is parallel
> +///
> +/// In case the dimension is parallel we also check if any reduction
> +/// dependences is broken when we exploit this parallelism. If so,
                    are
> +/// @p IsReductionParallel will be set to true. The reduction dependences we use
                               is set
> +/// to check are actually the union of the transitive closure of the initial
> +/// reduction dependences together with their reveresal. Even though these
> +/// dependences connect all iterations with each other (thus they are cyclic)
> +/// we can perform the parallelism check as we are only interested in a zero
> +/// (or non-zero) dependence distance on the dimension in question.

Otherwise looks good to me. Feel free to commit.

Cheers,
Tobias