[Polly] Reduction dependences [V3]

Wed Jun 18 17:59:08 PDT 2014

Hey Tobias,

I changed the way privatization dependences are computed according to our
discussion. Please have a look at the test cases now.
Most comments become outdated by that and I think I updated all test cases.
The only open comment is:

>>> +; void f(int *sum, int N) {
>>> +;   if (N >= 10) {

>> Why is there an N and an if? Is this necessary for this test case?
It's always good to have a parametric case because the isl spaces etc. are
different and might be incompatible at some point.

What do you think?

Best regards,
  Johannes

--

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

-----Original Message-----
From: Tobias Grosser [mailto:tobias at grosser.es] 
Sent: Wednesday, June 18, 2014 6:10 AM
To: Johannes Doerfert
Cc: llvm-commits at cs.uiuc.edu; 'Sebastian Pop'
Subject: Re: [Polly] Reduction dependences [V3]

On 17/06/2014 23:17, Johannes Doerfert wrote:
> Small changes to the version before:
>
>    + I found an easier way of computing the privatization dependences.
>
>   + More accurate comment
>
>
>
> Since we have the first patch in, tell me what you think about this one J.

Sure, let's keep going.

> diff --git a/include/polly/Dependences.h b/include/polly/Dependences.h 
> index 9a704a8..d44a905 100755
> --- a/include/polly/Dependences.h
> +++ b/include/polly/Dependences.h
> @@ -105,6 +105,9 @@ private:
>     isl_union_map *WAR;
>     isl_union_map *WAW;
>
> +  /// @brief The map of reduction dependences  isl_union_map *RED = 
> + nullptr;
> +
>     /// @brief Collect information about the SCoP.
>     void collectInfo(Scop &S, isl_union_map **Read, isl_union_map **Write,
>                      isl_union_map **MayWrite, isl_union_map 
> **Schedule); diff --git a/lib/Analysis/Dependences.cpp 
> b/lib/Analysis/Dependences.cpp index dc64b7e..1624b81 100644
> --- a/lib/Analysis/Dependences.cpp
> +++ b/lib/Analysis/Dependences.cpp
> @@ -25,6 +25,7 @@
>   #include "polly/Options.h"
>   #include "polly/ScopInfo.h"
>   #include "polly/Support/GICHelper.h"
> +#include "polly/CodeGen/CodeGeneration.h"
>   #include "llvm/Support/Debug.h"
>
>   #include <isl/aff.h>
> @@ -65,6 +66,84 @@ static cl::opt<enum AnalysisType> OptAnalysisType(
>       cl::cat(PollyCategory));
>
>   
> //===-----------------------------------------------------------------
> -----===//
> +
> +/// @brief Helper struct during the compuation of privatization 
> +dependences
                                        computation
> +struct PrivPayload {

I would suggest to avoid appreviations, except for very commonly used code
with an obvious appreviation.

> +  /// @brief All privatization dependences computed so far
Missing point at end of sentence

> +  isl_union_map *Priv;
> +  /// @brief All accesses which are part of actual reduction 
> + dependences
Missing point at end of sentence.

What about 'Accesses'?

> +};
> +
> +/// @brief Compute the privatization dependences for a given 
> +dependency @p Map /// /// Privatization dependences are widened 
> +original dependences which originate /// or end in a reduction 
> +access. To compute them we first look for those /// original 
> +dependences. If found we widen them such that they do __not__ only 
> +/// depend on a subset of the reduction memory accesses but on all of
them.
> +/// For the input:
> +///
> +///  S0:   *sum = 0;
> +///        for (int i = 0; i < 1024; i++)
> +///  S1:     *sum += i;
> +///  S2:   *sum = *sum * 3;
> +///
> +/// we have the following dependences before we add privatization
dependences:
> +///
> +///   RAW:
> +///     { S0[] -> S1[0]; S1[1023] -> S2[] }
> +///   WAR:
> +///     {  }
> +///   WAW:
> +///     { S0[] -> S1[0]; S1[1024] -> S2[] }
> +///   RED:
> +///     { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }
> +///
> +/// and afterwards:
> +///
> +///   RAW:
> +///     { S0[] -> S1[i0] : i0 >= 0 and i0 <= 1023;
> +///       S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
> +///   WAR:
> +///     {  }
> +///   WAW:
> +///     { S0[] -> S1[i0] : i0 >= 0 and i0 <= 1023;
> +///       S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
> +///   RED:
> +///     { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }

Very nice and clear comment!

This is for me currently the most important part to understand and to get
right.

I suggest to move this code here:

> +      isl_union_set *RedAccs =
isl_union_map_domain(isl_union_map_copy(RED));
> +      RedAccs = isl_union_set_union(
> +          RedAccs, isl_union_map_range(isl_union_map_copy(RED)));
> +      RedAccs = isl_union_set_coalesce(RedAccs);
> +      isl_union_map **Maps[] = {&RAW, &WAW, &WAR};
> +      for (isl_union_map **MapPtr : Maps) {
> +        PrivPayload Pl =
{isl_union_map_empty(isl_union_map_get_space(RAW)),
> +                          RedAccs};
> +        isl_union_map_foreach_map(*MapPtr, createPrivatizationDeps, &Pl);
> +        *MapPtr = isl_union_map_coalesce(isl_union_map_union(*MapPtr,
Pl.Priv));
> +      }
> +      isl_union_set_free(RedAccs);

into its own function right below your comment.

As this algorithm is an essential part, it would be good to describe the
implementation in another (smaller) comment.

> +static int createPrivatizationDeps(__isl_take isl_map *Map, void 
> +*user) {
> +  isl_union_set *PDom = nullptr, *PRan = nullptr;
> +  isl_union_map *UMap, *DInter, *RInter;
> +  PrivPayload *Pl = (PrivPayload *)user;
> +  UMap = isl_union_map_from_map(Map);
> +
> +  DInter = isl_union_map_intersect_domain(isl_union_map_copy(UMap),
> +                                          
> + isl_union_set_copy(Pl->RedAccs));  if (!isl_union_map_is_empty(DInter))
{
> +    PDom = isl_union_set_copy(Pl->RedAccs);
> +    PRan = isl_union_map_range(isl_union_map_copy(UMap));
> +  }
> +
> +  RInter = isl_union_map_intersect_range(isl_union_map_copy(UMap),
> +                                         
> + isl_union_set_copy(Pl->RedAccs));  if (!isl_union_map_is_empty(RInter))
{
> +    assert(!PDom && !PRan && "Create privatization locations twice");
> +    PDom = isl_union_map_domain(isl_union_map_copy(UMap));
> +    PRan = isl_union_set_copy(Pl->RedAccs);  }
> +
> +  if (PDom || PRan) {
> +    assert(PDom && PRan && "Inconsistent assignments");
> +    Pl->Priv = isl_union_map_union(
> +        Pl->Priv, isl_union_map_from_domain_and_range(PDom, PRan));  
> + }

It would be interesting to see the following test case:

S0:   *sum = 0;
       for (int i = 0; i < 1024; i++)
S1:     *sum += i;
S2:   *sum = *sum * 3;
       for (int i = 0; i < 1024; i++)
S4:     *sum += i;
S5:   *sum = *sum * 3;

>   Dependences::Dependences() : ScopPass(ID) { RAW = WAR = WAW = 
> nullptr; }
>
>   void Dependences::collectInfo(Scop &S, isl_union_map **Read, @@ 
> -169,6 +248,62 @@ void Dependences::calculateDependences(Scop &S) {
>     isl_ctx_reset_operations(S.getIslCtx());
>     isl_ctx_set_max_operations(S.getIslCtx(), MaxOpsOld);
>
> +  // FIXME: Code generation cannot produce privatization code yet, thus
we need
> +  //        to check if we will produce non-parallel code anyway. If so,
we do
> +  //        not need privatization, if not we turn off reduction support.
> +  if (PollyVectorizerChoice == VECTORIZER_NONE && !EnableOpenMP) {

That was not exactly what I imagined. Instead of checking if OpenMP code
generation is enabled, I was hoping that we store the dependence in a way
that passes like the scheduler can get the necessary dependences for the
sequential transformations and that we make sure that for parallelism checks
we use the additional reduction dependences.

Even with openmp code generation enabled, we should allow the scheduler the
freedom of reordering instructions. We only don't want to incorrectly
introduce parallel loops.

(Let's leave this out for now, the main point is the privatization
dependences computation.)

> +    // To handle reduction dependences follow we:

Grammar?

> +    // 1) Aggregate all possible reduction dependences, namely all self
> +    //    dependences on reduction like statements.
> +    // 2) Intersect them with the actual RAW & WAW dependences to the get
the
> +    //    actual reduction dependences. This will ensure the load/store
memory
> +    //    addresses were __identical__ in the two iterations of the
statement.
> +    // 3) Relax the original RAW and WAW dependences by substracting the
actual
> +    //    reduction dependences. Binary reductions cause both (and the
same) RAW
> +    //    and WAW dependences, but unary reductions only WAW.

What are binary and what are unary reductions. Is this already explained
somewhere? Are there certain test cases that I can look at to see these
differences?

> +    // 4) Add the privatization dependences which are widened versions of
> +    //    already present dependences. They model the effect of manual
> +    //    privatization at the outermost possible place (namely after the
last
> +    //    write and before the first access to a reduction location).
> +
> +    // Step 1)
> +    RED = isl_union_map_empty(isl_union_map_get_space(RAW));
> +    for (auto *Stmt : S) {
> +      if (!Stmt->isReductionLike())
> +        continue;
> +      isl_set *Domain = Stmt->getDomain();
> +      isl_map *Identity =
> +          isl_map_from_domain_and_range(isl_set_copy(Domain), Domain);
> +      RED = isl_union_map_add_map(RED, Identity);
> +    }
> +
> +    // Step 2)
> +    RED = isl_union_map_intersect(RED, isl_union_map_copy(RAW));
> +    RED = isl_union_map_intersect(RED, isl_union_map_copy(WAW));
> +
> +    if (!isl_union_map_is_empty(RED)) {
> +
> +      // Step 3)
> +      RAW = isl_union_map_subtract(RAW, isl_union_map_copy(RED));
> +      WAW = isl_union_map_subtract(WAW, isl_union_map_copy(RED));
> +
> +      // Step 4)
> +      isl_union_set *RedAccs = 
> + isl_union_map_domain(isl_union_map_copy(RED));

To my understanding 'RED' contains a mapping from statement iteration to
statement iteration. Why is the domain now an 'access'?

> +      RedAccs = isl_union_set_union(
> +          RedAccs, isl_union_map_range(isl_union_map_copy(RED)));
> +      RedAccs = isl_union_set_coalesce(RedAccs);

This is the set of statement iterations that appear in reduction accesses,
right?

> +
> +      isl_union_map **Maps[] = {&RAW, &WAW, &WAR};
> +      for (isl_union_map **MapPtr : Maps) {
> +        PrivPayload Pl =
{isl_union_map_empty(isl_union_map_get_space(RAW)),
> +                          RedAccs};
> +        isl_union_map_foreach_map(*MapPtr, createPrivatizationDeps, 
> + &Pl);

In your comment, it would be interesting to understand why creating
privatization dependences per map is a good/valid approach.

> diff --git a/test/Dependences/reduction_only_reduction_like_access.ll 
> b/test/Dependences/reduction_only_reduction_like_access.ll
> new file mode 100644
> index 0000000..e8a0ef6
> --- /dev/null
> +++ b/test/Dependences/reduction_only_reduction_like_access.ll
> @@ -0,0 +1,36 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s 
> +; ; CHECK:  Reduction dependences:
> +; CHECK:    {  }
> +;
> +; void f(int *sum) {
> +;   for (int i = 0; i < 100; i++)
> +;     sum[i] = sum[99-i] + i;

Can you add a comment why this test cases is interesting/what it is testing.
This is not immediately clear to me.

> diff --git 
> a/test/Dependences/reduction_only_reduction_like_access_2.ll 
> b/test/Dependences/reduction_only_reduction_like_access_2.ll
> new file mode 100644
> index 0000000..3bdd896
> --- /dev/null
> +++ b/test/Dependences/reduction_only_reduction_like_access_2.ll
> @@ -0,0 +1,52 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s 
> +; ; CHECK:  Reduction dependences:
> +; CHECK:    {  }
> +;
> +; void f(int *sum) {
> +;   for (int i = 0; i < 1024; i++)
> +;     for (int j = 0; j < 512; j++)
> +;       sum[i + j] = sum[i] + 3;

Similar here. Can you explain what exactly is tested here. Is it different
from the previous test case?

(Btw, the test cases are already very small/nicely reduced. This is
nice!)

> diff --git a/test/Dependences/reduction_simple_iv.ll 
> b/test/Dependences/reduction_simple_iv.ll
> new file mode 100644
> index 0000000..01b0a51
> --- /dev/null
> +++ b/test/Dependences/reduction_simple_iv.ll
> @@ -0,0 +1,41 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s 
> +; ; CHECK:  RAW dependences:
> +; CHECK:    {  }
> +; CHECK:  WAR dependences:
> +; CHECK:    {  }
> +; CHECK:  WAW dependences:
> +; CHECK:    {  }
> +; CHECK:  Reduction dependences:
> +; CHECK:    { Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0] : i0 >= 0 and i0
<= 99 }
> +;
> +;
> +; void f(int* sum) {
> +;   for (int i = 0; i <= 100; i++)
> +;     sum += i * 3;
> +; }

This is a nice and minimal test case. I like it.

To make these test cases easier to understand, it could help if you add
labels to the examples:

void f(int* sum) {
       for (int i = 0; i <= 100; i++)
S1:     sum += i * 3;
}

And then name the basic blocks according to the labels. The resulting
dependences should then contain the names of the basic blocks, so looking at
the C code and the dependences makes sense.

> +++ b/test/Dependences/reduction_simple_privatization_deps.ll
> @@ -0,0 +1,58 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s 
> +; ; CHECK: Printing analysis 'Polly - Calculate dependences' for 
> +region: 'entry.1 => f.end' in function 'f':
> +; CHECK:   RAW dependences:
> +; CHECK:     [N] -> { Stmt_entry_2[] -> Stmt_for_body[o0] : N >= 11 and
o0 <= 1023 and o0 >= 0; Stmt_for_body[i0] -> Stmt_for_end[] : N >= 11 and i0
<= 1023 and i0 >= 0 }
> +; CHECK:   WAR dependences:
> +; CHECK:     [N] -> {  }
> +; CHECK:   WAW dependences:
> +; CHECK:     [N] -> { Stmt_entry_2[] -> Stmt_for_body[o0] : N >= 11 and
o0 <= 1023 and o0 >= 0; Stmt_for_body[i0] -> Stmt_for_end[] : N >= 11 and i0
<= 1023 and i0 >= 0 }
> +; CHECK:   Reduction dependences:
> +; CHECK:     [N] -> { Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 >=
0 and i0 <= 1022 and N >= 11 }
> +;
> +; void f(int *sum, int N) {
> +;   if (N >= 10) {

Why is there an N and an if? Is this necessary for this test case?

> +;     *sum = 0;
> +;     for (int i = 0; i < 1024; i++)
> +;       *sum += i;
> +;     *sum = *sum * 3;
> +;   }
> +; }

Otherwise this is a very nice, minimal test case.

> +++ b/test/Dependences/reduction_simple_privatization_deps_2.ll
> @@ -0,0 +1,67 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s 
> +; ; CHECK: Printing analysis 'Polly - Calculate dependences' for 
> +region: 'for.cond => for.end8' in function 'f':
> +; CHECK:  RAW dependences:
> +; CHECK:    { Stmt_for_body3[i0, i1] -> Stmt_for_end[o0] : i0 <= 99 and
i0 >= 0 and i1 <= 99 and i1 >= 0 and o0 <= 99 and o0 >= 0; Stmt_for_body[i0]
-> Stmt_for_body3[o0, o1] : i0 <= 99 and i0 >= 0 and o0 <= 99 and o0 >= 0
and o1 <= 99 and o1 >= 0; Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : i0 >=
0 and i0 <= 98 }
> +; CHECK:  WAR dependences:
> +; CHECK:    {  }
> +; CHECK:  WAW dependences:
> +; CHECK:    { Stmt_for_body3[i0, i1] -> Stmt_for_end[o0] : i0 <= 99 and
i0 >= 0 and i1 <= 99 and i1 >= 0 and o0 <= 99 and o0 >= 0; Stmt_for_body[i0]
-> Stmt_for_body3[o0, o1] : i0 <= 99 and i0 >= 0 and o0 <= 99 and o0 >= 0
and o1 <= 99 and o1 >= 0; Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : i0 >=
0 and i0 <= 98 }
> +; CHECK:  Reduction dependences:
> +; CHECK:    { Stmt_for_body3[i0, i1] -> Stmt_for_body3[i0, 1 + i1] : i0
>= 0 and i0 <= 99 and i1 >= 0 and i1 <= 98 }
> +;
> +; void f(int *sum) {
> +;   int i, j;
> +;   for (i = 0; i < 100; i++) {
> +;     *sum *= 42;
> +;     for (j = 0; j < 100; j++)
> +;       *sum += i * j;
> +;     *sum *= 7;
> +;   }
> +; }

And this is a very nice extension to multiple loops.

Regarding the privatization dependences I am slighly confused. You currently
have as dependences:

Stmt_for_body[i0] -> Stmt_for_body3[o0, o1] : i0 <= 99 and i0 >= 0 and
o0 <= 99 and o0 >= 0 and o1 <= 99 and o1 >= 0; Stmt_for_body3[i0, i1] ->
Stmt_for_end[o0] : i0 <= 99 and i0 >= 0 and i1 <= 99 and i1 >= 0 and o0 <=
99 and o0 >= 0 Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : i0 >= 0 and i0 <=
98

This includes the following dependences:

Stmt_for_body[9] -> Stmt_for_body3[10,0] Stmt_for_body3[10, 0] ->
Stmt_for_end[8]; Stmt_for_end[8] -> Stmt_for_body[9]

If I did not miss something, this is a dependency cycle for which there is
no valid schedule that can be found. Right?

I would have expected something like the following instead:

Stmt_for_body3[i0, i1] -> Stmt_for_end[i0] : i0 <= 99 and i0 >= 0 and i1 <=
99 and i1 >= 0

In case you believe a modification of your algorithm and you are unsure
where to look it may make sense to discuss the core algorithm on isl-dev.
One direction to look is isl's transitive closure function.
I tried the following with islpy:

$ m = islpy.Map("{ Stmt_for_body3[i0, i1] -> Stmt_for_body3[i0, 1 + i1]
: i0 >= 0 and i0 <= 99 and i1 >= 0 and i1 <= 98 }") $ m.transitive_closure()
(Map("{ Stmt_for_body3[i0, i1] -> Stmt_for_body3[i0, o1] : i0 >= 0 and
i0 <= 99 and i1 >= 0 and i1 <= 98 and o1 >= 1 + i1 and o1 <= 99 and o1  >= 1
}"),
  1)
$ m.transitive_closure()[0].reverse().lexmin().reverse()
Map("{ Stmt_for_body3[i0, 0] -> Stmt_for_body3[i0, o1] : i0 <= 99 and i0  >=
0 and o1 <= 99 and o1 >= 1 }")

Maybe this is helpful.

Cheers,
Tobias
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Model-statement-wise-reduction-dependences_v4.patch
Type: application/octet-stream
Size: 47676 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140618/15f69215/attachment.obj>