[Polly] Reduction dependences [V3]

Tobias Grosser tobias at grosser.es
Thu Jun 19 01:47:46 PDT 2014


On 19/06/2014 02:59, Johannes Doerfert wrote:
> Hey Tobias,
>
> I changed the way privatization dependences are computed according to our
> discussion.

Great. This looks very nice now.

> Please have a look at the test cases now.
> Most comments become outdated by that and I think I updated all test cases.
> The only open comment is:
>
>>>> >>>+; void f(int *sum, int N) {
>>>> >>>+;   if (N >= 10) {
>>> >>Why is there an N and an if? Is this necessary for this test case?
> It's always good to have a parametric case because the isl spaces etc. are
> different and might be incompatible at some point.
>
> What do you think?

Maybe just make it two test cases: one with parameters and one with
just the simple case. The general rule is that one test case tests one
specific thing, so we should not test parameters in the simple
reduction test case.

I am now very comfortable with the overall patch. A couple of minor
comments are left, but we should be able to address them quickly. The
only missing piece is to not look at the OpenMP flags, but instead to
include the reduction dependences in the parallelism test.

Cheers,
Tobias

> Best regards,
>    Johannes
>

> diff --git a/lib/Analysis/Dependences.cpp b/lib/Analysis/Dependences.cpp
> index dc64b7e..515b76e 100644
> --- a/lib/Analysis/Dependences.cpp
> +++ b/lib/Analysis/Dependences.cpp
> @@ -25,6 +25,7 @@
>   #include "polly/Options.h"
>   #include "polly/ScopInfo.h"
>   #include "polly/Support/GICHelper.h"
> +#include "polly/CodeGen/CodeGeneration.h"

As written in my earlier comment, I don't think we need this (the only 
reason this was needed was to include the command line options from 
there, no?)

>   #include "llvm/Support/Debug.h"
>
>   #include <isl/aff.h>
> @@ -65,6 +66,7 @@ static cl::opt<enum AnalysisType> OptAnalysisType(
>       cl::cat(PollyCategory));
>
>   //===----------------------------------------------------------------------===//
> +
Unrelated change.

>   Dependences::Dependences() : ScopPass(ID) { RAW = WAR = WAW = nullptr; }
>
>   void Dependences::collectInfo(Scop &S, isl_union_map **Read,
> @@ -92,6 +94,71 @@ void Dependences::collectInfo(Scop &S, isl_union_map **Read,
>     }
>   }
>
> +/// @brief Compute the privatization dependences for a given dependency @p Map
> +///
> +/// Privatization dependences are widened original dependences which originate
> +/// or end in a reduction access. To compute them we apply the transitive closure
> +/// of the reduction dependences (which maps each iteration of a reduction
> +/// statement to all following ones) on the RAW/WAR/WAW dependences. The
> +/// dependences which start or end at a reduction statement will be extended to
> +/// depend on all following reduction statement iterations as well.
> +/// Note: "Following" here means according to the reduction dependences.

Nice. Also, thanks for explaining "Following".

> +///
> +/// For the input:
> +///
> +///  S0:   *sum = 0;
> +///        for (int i = 0; i < 1024; i++)
> +///  S1:     *sum += i;
> +///  S2:   *sum = *sum * 3;
> +///
> +/// we have the following dependences before we add privatization dependences:
> +///
> +///   RAW:
> +///     { S0[] -> S1[0]; S1[1023] -> S2[] }
> +///   WAR:
> +///     {  }
> +///   WAW:
> +///     { S0[] -> S1[0]; S1[1023] -> S2[] }
> +///   RED:
> +///     { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }
> +///
> +/// and afterwards:
> +///
> +///   RAW:
> +///     { S0[] -> S1[i0] : i0 >= 0 and i0 <= 1023;
> +///       S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
> +///   WAR:
> +///     {  }
> +///   WAW:
> +///     { S0[] -> S1[i0] : i0 >= 0 and i0 <= 1023;
> +///       S1[i0] -> S2[] : i0 >= 0 and i0 <= 1023}
> +///   RED:
> +///     { S1[i0] -> S1[1 + i0] : i0 >= 0 and i0 <= 1022 }
> +void Dependences::addPrivatizationDependences() {
> +  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR, *TransClosure;
> +  TransClosure = isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);

Can you add a comment that makes clear why it is safe to
over-approximate?
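
Something along these lines would do (assuming this is indeed the
reasoning):

   // isl_union_map_transitive_closure may return an over-approximation
   // of the closure (passing 0 means we do not even ask whether the
   // result is exact). This is safe: additional dependences can only
   // over-constrain the schedule, they never enable an illegal
   // transformation.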

> +  PrivRAW = isl_union_map_apply_range(isl_union_map_copy(RAW),
> +                                      isl_union_map_copy(TransClosure));
> +  PrivRAW = isl_union_map_union(
> +      PrivRAW, isl_union_map_apply_range(isl_union_map_copy(TransClosure),
> +                                         isl_union_map_copy(RAW)));
> +  PrivWAW = isl_union_map_apply_range(isl_union_map_copy(WAW),
> +                                      isl_union_map_copy(TransClosure));
> +  PrivWAW = isl_union_map_union(
> +      PrivWAW, isl_union_map_apply_range(isl_union_map_copy(TransClosure),
> +                                         isl_union_map_copy(WAW)));
> +  PrivWAR = isl_union_map_apply_range(isl_union_map_copy(WAR),
> +                                      isl_union_map_copy(TransClosure));
> +  PrivWAR = isl_union_map_union(
> +      PrivWAR, isl_union_map_apply_range(isl_union_map_copy(TransClosure),
> +                                         isl_union_map_copy(WAR)));
> +  isl_union_map_free(TransClosure);
> +
> +  RAW = isl_union_map_coalesce(isl_union_map_union(RAW, PrivRAW));
> +  WAW = isl_union_map_coalesce(isl_union_map_union(WAW, PrivWAW));
> +  WAR = isl_union_map_coalesce(isl_union_map_union(WAR, PrivWAR));
> +}

I like the new algorithm. It looks clean and simple.

>   void Dependences::calculateDependences(Scop &S) {
>     isl_union_map *Read, *Write, *MayWrite, *Schedule;
>
> @@ -114,7 +181,7 @@ void Dependences::calculateDependences(Scop &S) {
>
>     // The pointers below will be set by the subsequent calls to
>     // isl_union_map_compute_flow.
> -  RAW = WAW = WAR = nullptr;
> +  RAW = WAW = WAR = RED = nullptr;
>
>     if (OptAnalysisType == VALUE_BASED_ANALYSIS) {
>       isl_union_map_compute_flow(
> @@ -169,6 +236,56 @@ void Dependences::calculateDependences(Scop &S) {
>     isl_ctx_reset_operations(S.getIslCtx());
>     isl_ctx_set_max_operations(S.getIslCtx(), MaxOpsOld);
>
> +  // FIXME: Code generation cannot produce privatization code yet, thus we need
> +  //        to check if we will produce non-parallel code anyway. If so, we do
> +  //        not need privatization; if not, we turn off reduction support.
> +  if (PollyVectorizerChoice == VECTORIZER_NONE && !EnableOpenMP) {

If you want to check that we neither introduce vector parallelism nor
OpenMP parallelism, you meant || instead of &&, no?

In any case, I don't think we should check those flags. Instead, we 
should include the reduction dependences in all parallelism checks (and 
exclude them in case we know we can perform privatization).

By checking the flags here, we allow certain scheduling transformations
in sequential mode but prevent them as soon as parallel code generation
is enabled, even though they are legal even without privatization. The
only thing we should prevent is the parallel execution of loops that
carry reduction dependences and, as said before, we should detect this
in the parallelism check.
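
Very roughly, something like this (just a sketch; checkParallel and
CanPrivatize are made-up names, not interfaces we actually have):

   // Run the parallelism check on all dependences, including the
   // reduction dependences, for as long as code generation cannot
   // privatize. Once privatization is implemented, RED can be dropped
   // from the check again.
   isl_union_map *Deps = isl_union_map_union(isl_union_map_copy(RAW),
                                             isl_union_map_copy(WAW));
   Deps = isl_union_map_union(Deps, isl_union_map_copy(WAR));
   if (!CanPrivatize) // hypothetical flag, no privatization codegen yet
     Deps = isl_union_map_union(Deps, isl_union_map_copy(RED));
   // Stand-in for the actual check; assume it takes ownership of Deps.
   bool IsParallel = checkParallel(Schedule, Deps);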

> +    // To handle reduction dependences we proceed as follows:
> +    // 1) Aggregate all possible reduction dependences, namely all self
> +    //    dependences on reduction like statements.

Good.

> +    // 2) Intersect them with the actual RAW & WAW dependences to the get the
> +    //    actual reduction dependences. This will ensure the load/store memory
> +    //    addresses were __identical__ in the two iterations of the statement.

Good.

> +    // 3) Relax the original RAW and WAW dependences by subtracting the actual
> +    //    reduction dependences. Binary reductions (sum += A[i]) cause identical
> +    //    RAW and WAW dependences, but unary reductions (last = A[i]) cause only
> +    //    WAW dependences.

So a unary reduction is a statement that has either an incoming or an
outgoing reduction dependency, but not both? Is the first reduction in
a reduction chain also a unary reduction?
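
To make sure I read this correctly, here is a minimal illustration of
the distinction as I understand it (the function and variable names are
mine):

   void reductions(int *A, int N, int *sum, int *last) {
     // Binary reduction: every iteration reads and writes *sum, so
     // consecutive iterations are linked by identical RAW and WAW
     // self-dependences.
     for (int i = 0; i < N; i++)
       *sum += A[i];

     // Unary reduction: every iteration only writes *last, so the
     // iterations are linked by WAW self-dependences alone.
     for (int i = 0; i < N; i++)
       *last = A[i];
   }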

I just looked at the dependences in:

reduction_simple_privatization_deps.ll

   RAW dependences:
      [N] -> { Stmt_S0[] -> Stmt_S1[0] : N >= 11;
               Stmt_S1[1023] -> Stmt_S2[] : N >= 11;
               Stmt_S1[i0] -> Stmt_S1[1 + i0] : i0 >= 0 and i0 <= 1022 and N >= 11 }
   WAR dependences:
      [N] -> {  }
   WAW dependences:
      [N] -> { Stmt_S0[] -> Stmt_S1[0] : N >= 11;
               Stmt_S1[1023] -> Stmt_S2[] : N >= 11;
               Stmt_S1[i0] -> Stmt_S1[1 + i0] : i0 >= 0 and i0 <= 1022 and N >= 11 }

It seems the RAW/WAW dependences are identical. Does this mean there is
no unary reduction here? Otherwise, according to your comment, we would
see a WAW dependency that is not in the RAW set, no?

> diff --git a/lib/CodeGen/CodeGeneration.cpp b/lib/CodeGen/CodeGeneration.cpp
> index 37d9801..7dbd372 100644
> --- a/lib/CodeGen/CodeGeneration.cpp
> +++ b/lib/CodeGen/CodeGeneration.cpp
> @@ -62,10 +62,11 @@ using namespace llvm;
>   struct isl_set;
>
>   namespace polly {
> -static cl::opt<bool>
> +static cl::opt<bool, true>
>   OpenMP("enable-polly-openmp", cl::desc("Generate OpenMP parallel code"),
>          cl::value_desc("OpenMP code generation enabled if true"),
> -       cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));
> +       cl::location(EnableOpenMP), cl::init(false), cl::ZeroOrMore,
> +       cl::cat(PollyCategory));

As I said earlier, I don't think that we want to look at this flag when 
computing dependences.


> --- /dev/null
> +++ b/test/Dependences/reduction_multiple_reductions_2.ll
> @@ -0,0 +1,110 @@
> +; RUN: opt %loadPolly -basicaa -polly-dependences -analyze < %s | FileCheck %s
> +;
> +; CHECK: RAW dependences:
> +; CHECK-DAG: Stmt_S2[i0, i1] -> Stmt_S3[i0] : i0 <= 1023 and i0 >= 0 and i1 <= 1023 and i1 >= 0
> +; CHECK-DAG: Stmt_S3[i0] -> Stmt_S0[1 + i0] : i0 >= 0 and i0 <= 1022
> +; CHECK-DAG: Stmt_S0[i0] -> Stmt_S1[i0, o1] : i0 <= 1023 and i0 >= 0 and o1 <= 1023 and o1 >= 0
> +; These are the important RAW dependences, as they need to originate/end in only one iteration:
> +; CHECK-DAG: Stmt_S1[i0, 1023] -> Stmt_S2[i0, o1] : i0 <= 1023 and i0 >= 0 and o1 <= 1023 and o1 >= 0
> +; CHECK-DAG: Stmt_S1[i0, i1] -> Stmt_S2[i0, 0] : i0 <= 1023 and i0 >= 0 and i1 <= 1022 and i1 >= 0
> +; CHECK: WAR dependences:
> +; CHECK:   {  }
> +; CHECK: WAW dependences:
> +; CHECK-DAG: Stmt_S2[i0, i1] -> Stmt_S3[i0] : i0 <= 1023 and i0 >= 0 and i1 <= 1023 and i1 >= 0
> +; CHECK-DAG: Stmt_S3[i0] -> Stmt_S0[1 + i0] : i0 >= 0 and i0 <= 1022
> +; CHECK-DAG: Stmt_S0[i0] -> Stmt_S1[i0, o1] : i0 <= 1023 and i0 >= 0 and o1 <= 1023 and o1 >= 0
> +; These are the important WAW dependences, as they need to originate/end in only one iteration:
> +; CHECK-DAG: Stmt_S1[i0, 1023] -> Stmt_S2[i0, o1] : i0 <= 1023 and i0 >= 0 and o1 <= 1023 and o1 >= 0
> +; CHECK-DAG: Stmt_S1[i0, i1] -> Stmt_S2[i0, 0] : i0 <= 1023 and i0 >= 0 and i1 <= 1022 and i1 >= 0
> +; CHECK: Reduction dependences:
> +; CHECK-DAG:  Stmt_S1[i0, i1] -> Stmt_S1[i0, 1 + i1] : i0 <= 1023 and i0 >= 0 and i1 <= 1022 and i1 >= 0
> +; CHECK-DAG:  Stmt_S2[i0, i1] -> Stmt_S2[i0, 1 + i1] : i0 <= 1023 and i0 >= 0 and i1 <= 1022 and i1 >= 0


Very nice, I like the S1/S2 in the dependences.

Also, I did not realize we can use the CHECK-DAG directive here:
CHECK-DAG lines within one block match in any order, so the test does
not break if isl reorders its output.

> --- /dev/null
> +++ b/test/Dependences/reduction_only_reduction_like_access.ll
> @@ -0,0 +1,38 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
> +;
> +; The statement is reduction like but should not yield any reduction dependences

The statement is currently not even reduction like (we changed it earlier).

> +;
> +; CHECK:  Reduction dependences:
> +; CHECK:    {  }
> +;
> +; void f(int *sum) {
> +;   for (int i = 0; i < 100; i++)
> +;     sum[i] = sum[99-i] + i;
> +; }

Even if we detected this statement as reduction like, why are there no
reduction dependences?

Iterations i = 49 and i = 50 access the same memory location (i = 49
writes sum[49], which i = 50 reads). So there should be a reduction
dependency, no?

> +++ b/test/Dependences/reduction_only_reduction_like_access_2.ll
> @@ -0,0 +1,56 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
> +;
> +; The statement is reduction like but should not yield any reduction dependences
> +; FIXME: This is only true as long as we restrict ourselfs to the exact same
ourselves

> +;        access pointer for the load and store.

Again, this statement is also currently not detected as reduction like.
I would formulate it like this:

We do not find any reduction dependences in the following loop, as the
different index expressions in the read and write access prevent us from
detecting this statement as reduction like.

FIXME: In case we lift the requirement on identical subscripts in input
        and output accesses, this test case will have reduction
        dependences for j = 0.


> +; CHECK:  Reduction dependences:
> +; CHECK:    {  }
> +;
> +; void f(int *sum) {
> +;   for (int i = 0; i < 1024; i++)
> +;     for (int j = 0; j < 512; j++)
> +;       sum[i + j] = sum[i] + 3;
> +; }
> +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
> +
> +define void @f(i32* %sum) {
> +entry:
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.inc6, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc7, %for.inc6 ]
> +  %exitcond1 = icmp ne i32 %i.0, 1024
> +  br i1 %exitcond1, label %for.body, label %for.end8
> +
> +for.body:                                         ; preds = %for.cond
> +  br label %for.cond1
> +
> +for.cond1:                                        ; preds = %for.inc, %for.body
> +  %j.0 = phi i32 [ 0, %for.body ], [ %inc, %for.inc ]
> +  %exitcond = icmp ne i32 %j.0, 512
> +  br i1 %exitcond, label %for.body3, label %for.end
> +
> +for.body3:                                        ; preds = %for.cond1
> +  %arrayidx = getelementptr inbounds i32* %sum, i32 %i.0
> +  %tmp = load i32* %arrayidx, align 4
> +  %add = add nsw i32 %tmp, 3
> +  %add4 = add nsw i32 %i.0, %j.0
> +  %arrayidx5 = getelementptr inbounds i32* %sum, i32 %add4
> +  store i32 %add, i32* %arrayidx5, align 4
> +  br label %for.inc
> +
> +for.inc:                                          ; preds = %for.body3
> +  %inc = add nsw i32 %j.0, 1
> +  br label %for.cond1
> +
> +for.end:                                          ; preds = %for.cond1
> +  br label %for.inc6
> +
> +for.inc6:                                         ; preds = %for.end
> +  %inc7 = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end8:                                         ; preds = %for.cond
> +  ret void
> +}
> diff --git a/test/Dependences/reduction_privatization_deps.ll b/test/Dependences/reduction_privatization_deps.ll
> new file mode 100644
> index 0000000..e733a3d
> --- /dev/null
> +++ b/test/Dependences/reduction_privatization_deps.ll
> @@ -0,0 +1,115 @@
> +; RUN: opt %loadPolly -basicaa -polly-dependences -analyze < %s | FileCheck %s
> +;
> +; CHECK:      RAW dependences:
> +; CHECK-DAG:    Stmt_S0[i0] -> Stmt_S1[o0, i0 - o0] : i0 <= 1023 and o0 >= 0 and o0 <= i0
> +; CHECK-DAG:    Stmt_S1[i0, i1] -> Stmt_S2[-1 + i0 + i1] : i0 >= 0 and i1 >= 1 and i0 <= 1022 and i1 <= 1024 - i0 and i1 <= 1023
> +; CHECK-DAG:    Stmt_S1[1023, i1] -> Stmt_S2[1022 + i1] : i1 <= 1 and i1 >= 0
> +; CHECK-DAG:    Stmt_S1[i0, 0] -> Stmt_S2[-1 + i0] : i0 <= 1022 and i0 >= 1
> +; CHECK:      WAR dependences:
> +; CHECK-DAG:    Stmt_S2[i0] -> Stmt_S2[1 + i0] : i0 <= 1022 and i0 >= 0
> +; CHECK-DAG:    Stmt_S1[i0, i1] -> Stmt_S2[i0 + i1] : i0 >= 0 and i1 <= 1023 - i0 and i1 >= 1
> +; CHECK-DAG:    Stmt_S1[i0, 0] -> Stmt_S2[i0] : i0 <= 1023 and i0 >= 1
> +; CHECK:      WAW dependences:
> +; CHECK-DAG:    Stmt_S0[i0] -> Stmt_S1[o0, i0 - o0] : i0 <= 1023 and o0 >= 0 and o0 <= i0
> +; CHECK-DAG:    Stmt_S1[0, 0] -> Stmt_S2[0]
> +; CHECK:      Reduction dependences:
> +; CHECK-DAG:    Stmt_S1[i0, i1] -> Stmt_S1[1 + i0, -1 + i1] : i0 <= 1022 and i0 >= 0 and i1 <= 1023 and i1 >= 1

I like the S0/S1 names here. They make it a lot more readable.

Also, the privatization dependences look good here.

> @@ -0,0 +1,83 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s


> +;  CHECK: RAW dependences:
> +;  CHECK-DAG:  Stmt_S3[i0] -> Stmt_S2[1 + i0, o1] : i0 <= 97 and i0 >= 0 and o1 <= 99 and o1 >= 0
> +;  CHECK-DAG:  Stmt_S1[i0] -> Stmt_S3[i0] : i0 >= 0 and i0 <= 98
> +;  CHECK: WAR dependences:
> +;  CHECK:   {  }
> +;  CHECK: WAW dependences:
> +;  CHECK-DAG:  Stmt_S3[i0] -> Stmt_S2[1 + i0, o1] : i0 <= 97 and i0 >= 0 and o1 <= 99 and o1 >= 0
> +;  CHECK-DAG:  Stmt_S1[i0] -> Stmt_S3[i0] : i0 >= 0 and i0 <= 98
> +;  CHECK: Reduction dependences:
> +;  CHECK:   { Stmt_S2[i0, i1] -> Stmt_S2[i0, 1 + i1] : i0 <= 98 and i0 >= 0 and i1 <= 98 and i1 >= 0 }

Maybe add a comment that in this case we have privatization dependences
from a textually later statement to a textually earlier one, but the
dependences still go forward in time. This is again a nice test case.

> +;    void f(int *sum) {
> +;      int i, j;
> +;      for (i = 0; i < 99; i++) {
> +; S1:    sum[i + 1] += 42;
> +;        for (j = 0; j < 100; j++)
> +; S2:      sum[i] += i * j;
> +; S3:    sum[i + 1] += 7;
> +;      }
> +;    }

The other test cases are fine. We now have really good test coverage.
I am very confident about this.

Tobias



