Reductions

Sun Jun 15 14:24:51 PDT 2014

Comments inlined.

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

> On 13/06/2014 19:27, Johannes Doerfert wrote:
>> Hey Tobias,
>>
>> I changed almost everything according to your comments (except the
>> prefix in
>> the test cases, I don't see the need to do this, but if you insist I
>> will).
>> I added another "only reduction like" test case and a floating point
>> test
>> case w and w/o fast math [= 4 test cases for in the first patch].
>
> Nice.
>
>> What is left in order to commit this one and get to the next patch?
>
> Besides a couple of nitpicks in the test cases and some last questions
> about the comments, the only missing piece is the question if we need to
> check the read/write location to be identical.
As I said in the first mail, I dont think we have to as long as we are
restricted like this. More when this is discussed below.

As you said, it remains to clarify where I have to put the test cases now.

>> Comments inlined (formatting might be a bit off, sorry about that).
>
>
>>>>> >> >+  /// @brief Flag to indicate reduction statements
>>>>> >> >+ bool IsReductionLike = false;
>>>> >>
>>>> >>Why do you call this ReductionLike and not just reduction?
>
> This question is just a minor thing, but you did not reply to it. Is the
> example you give in the comment the reason you call this reduction-like
> instead of reduction?
Yes, my bad. The explanation is in the comment, here it is:
  +  /// @brief Flag to indicate reduction like statements
  +  ///
  +  /// A statement is reduction like if it contains exactly one load and
one
  +  /// store with exactly one binary operator in between. The binary
operator
  +  /// needs to be associative and commutative or easily convertable
into a such
  +  /// one. Not all reduction like statements are binary reductions in
the common
  +  /// sence, e.g.,
  +  /// for (i = 0; i < 100; i++)
  +  ///   sum[100-i] += sum[i]
  +  /// which is reduction like but will not cause any reduction
dependences.
  +  bool IsReductionLike = false;
And I thought this kind of detection was more to your liking as it moves
computation and filtering to the dependecny analysis.

>>>> >>Also, can you explain what it means to be a reduction. Is it
>>>> associative
>> _and_  commutative? Or just one of them? I would also mention the
>> restriction
>> on two memory accesses. We later need to extend this to the multiple
>> memory
>> access case.
>> I added a descriptive comment to this Boolean flag. It talks about
>> reduction
>> like, associative, commutative, and the restriction.
>> Lifting the restriction soon is one of mine priorities too, but I'd like
>> positive feedback on these patches first.
>
> Nice.
>
>>>>> >> >   #include "isl/constraint.h"
>>>>> >> >@@ -55,6 +55,12 @@ using namespace polly;
>>>>> >> >   STATISTIC(ScopFound, "Number of valid Scops");
>>>>> >> >   STATISTIC(RichScopFound, "Number of Scops containing a loop");
>>>>> >> >
>>>>> >> >+static cl::opt<bool>
>>>>> >> >
>> +DisableMultiplicativeReductions("polly-disable-multiplicative-reductions",
>>>>> >> >+                                cl::desc("Disable multiplicative
>> reductions"),
>>>>> >> >+                                cl::Hidden, cl::ZeroOrMore,
>> cl::init(false),
>>>>> >> >+                                cl::cat(PollyCategory));
>>>> >>
>>>> >>Out of interest. Why would we want to disable those, but not the
>>>> other
>> possible operands?
>> They overflow easily, add is pretty safe and bit operations are stable
>> too.
>> Is that OK. ?
>
> Sure. Maybe add a comment why someone would like to disable
> multiplicative reductions.
I will.

>>>>> >> >+  // Skip if there is not one binary operator between the load
>>>>> and
>>>>> >> >+ the store  auto *BinOp =
>>>>> >> >+ dyn_cast<BinaryOperator>(Store->getValueOperand());
>>>>> >> >+  if (!BinOp || (BinOp->getOperand(0) != Load &&
>>>>> BinOp->getOperand(1)
>> != Load))
>>>>> >> >+    return;
>>>> >>Should we also check that the load and store instruction point to
>>>> the
>> same memory location (reference the very same pointer)?
>> Not in the current setting. I wanted it to be a small first patch so I
>> use
>> reduction like and allow for example:
>> for I
>>    for j
>>      B[i] = A[i] + 3;
>> However there won't be any reduction dependences for this example so
>> there
>> is no real harm done.
>
> I tested test example (attached):
>
> for 0 <= i < 100:
>    A[i+1] = A[i] + 1;
>
> To my understanding your current dependence check in patch 2 would
> detect reduction dependences here (and does not verify correctly that
> the in and out locations are identical).
No, it does not. I verified it. Here is the output:
    Function: f
    Region: %for.cond---%for.end
    Context:
    {  :  }
    Statements {
    	Stmt_for_body
        Reduction like: 1
            Domain :=
                { Stmt_for_body[i0] : i0 >= 0 and i0 <= 99 };
            Scattering :=
                { Stmt_for_body[i0] -> scattering[0, i0, 0] };
            ReadAccess :=
                { Stmt_for_body[i0] -> MemRef_sum[i0] };
            MustWriteAccess :=
                { Stmt_for_body[i0] -> MemRef_sum[1 + i0] };
    }
Printing analysis 'Polly - Calculate dependences' for region: 'for.cond =>
for.end' in function 'f':
	RAW dependences:
		{ Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 >= 0 and i0 <= 98 }
	WAR dependences:
		{  }
	WAW dependences:
		{  }
	Reduction dependences:
		{  }

And that is exactly my argument. We don't check this in this simple
setting and add checks as soon as it is necessary (when we remove the two
access limitation we need to check all accesses for overlapping ones
anyway).

> (In fact you don't detect this reduction, but this seems to be a small
> bug:
>
>      RED = isl_union_map_intersect(RED, isl_union_map_copy(RAW));
>      RED = isl_union_map_intersect(RED, isl_union_map_copy(WAW));
>
> I think you would like to compute
>
> RED = intersect(RED, union(RAW, WAW)), instead of the current
> RED = intersect(intersect(RED, RAW), WAW))
I don't think this is correct. We will get rid of the RAW intersection
once we want unary reductions, but the WAW intersection need to stay.

> Or are reduction dependences required to be both RAW and WAW dependences?)
There is no "bug" I think, but I wanted to be conservative and only test
for binary reductions.
Reductions should always cause WAW dependences, binary reduction also the
same RAW dependences.

> I think the simplest check for now is to verify that the addresses of
> the input/output location have the same LLVM-IR value. Or that the read
> and write access have identical access functions. We can then later
> discuss how to relax this.
This is part of the "relax the detection limitations" patch. As I
mentioned above, we don't need that now as there won't be reduction
dependences.
However, I can add a check to limit it two the same base address (which
makes sense).

>> diff --git a/include/polly/ScopInfo.h b/include/polly/ScopInfo.h
>> index e85ec13..f571b47 100644
>> --- a/include/polly/ScopInfo.h
>> +++ b/include/polly/ScopInfo.h
>> @@ -270,6 +270,17 @@ class ScopStmt {
>>     MemoryAccessVec MemAccs;
>>     std::map<const Instruction *, MemoryAccess *> InstructionToAccess;
>>
>> +  /// @brief Flag to indicate reduction like statements
>> +  ///
>> +  /// A statement is reduction like if it contains exactly one load and
>> one
>> +  /// store with exactly one binary operator in between. The binary
>> operator
>> +  /// needs to be associative and commutative or easily convertable
>> into a such
>
> Should it be 'convertible'?
Mh, maybe. I'll ask somebody.

> What does it mean to be easily convertible? Is this already implemented?
> Is there a test case/example I could look at?
It means "sum -= A[i];" is fine too, not with the current patch (v2 and
v3) as I removed the opcode check and use isCommutative now.
I will adjust the comment for the final version.

>> +  /// one. Not all reduction like statements are binary reductions in
>> the common
>> +  /// sence, e.g.,
>> +  /// for (i = 0; i < 100; i++)
>> +  ///   sum[100-i] += sum[i]
>> +  /// which is reduction like but will not cause any reduction
>> dependences.
>
> This is because there are no dependences at all in this loop or because
> the individual iterations write to a different memory location?
Yes, it shows that reduction like is not the same as reduction as I would
suggest to detect this for now and call it reduction like.
The example above A[i] = B[i] + 3 is similar and would be marked at the
moment, however a check for the same base value won't change anything in
the sum[100-i] example.
I like to not test the access instruction because we can have:
for i
  for j
     sum[i+j] = sum[i] + 3;
which contains reduction and non reduction dependences.

> Am I right that the point you want to make is that the flag in ScopInfo
> only gives information about the properties of an individual
> computation, but that the actual computation of reduction dependences
> will happen later?
Yes! And to my understanding this was what you proposed during the
discussion of my last approach.
However this will become stricter as soon as we remove some of the
detection limits, maybe at some point we can rename it to reduction.

>> diff --git a/test/ScopInfo/reduction_only_reduction_like_access.ll
>> b/test/ScopInfo/reduction_only_reduction_like_access.ll
>> new file mode 10644
>> index 0000000..30eef4d
>> --- /dev/null
>> +++ b/test/ScopInfo/reduction_only_reduction_like_access.ll
>> @@ -0,0 +1,39 @@
>> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
>> --check-prefix=DETECTION
>> +;
>> +; DETECTION: Reduction like: 1
>> +;
>> +; void f(int *sum) {
>> +;   int i;
>> +;   for (i = 0; i < 100; i++)
>> +;     sum[i] = sum[99-i] + i;
>> +; }
> I like this style of test case best as it does not add unnecessary
> reg2mem instructions, basic blocks as some of your other test cases do.
> ;-)
I just started with int *sum, but I can change others if needed.

>> +;
>> +; ModuleID = 'test/ScopInfo/reduction_only_reduction_like_access.ll'
>
> Remove this line.
I will for all.

>> +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
>> +
>> +; Function Attrs: nounwind
>> +define void @f(i32* %sum) {
>> +entry:
>> +  br label %for.cond
>> +
>> +for.cond:                                         ; preds = %for.inc,
>> %entry
>> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
>> +  %exitcond = icmp ne i32 %i.0, 100
>> +  br i1 %exitcond, label %for.body, label %for.end
>> +
>> +for.body:                                         ; preds = %for.cond
>> +  %sub = sub nsw i32 99, %i.0
>> +  %arrayidx = getelementptr inbounds i32* %sum, i32 %sub
>> +  %tmp = load i32* %arrayidx, align 4
>> +  %add = add nsw i32 %tmp, %i.0
>> +  %arrayidx1 = getelementptr inbounds i32* %sum, i32 %i.0
>> +  store i32 %add, i32* %arrayidx1, align 4
>> +  br label %for.inc
>> +
>> +for.inc:                                          ; preds = %for.body
>> +  %inc = add nsw i32 %i.0, 1
>> +  br label %for.cond
>> +
>> +for.end:                                          ; preds = %for.cond
>> +  ret void
>> +}
>> diff --git a/test/ScopInfo/reduction_simple_fp.ll
>> b/test/ScopInfo/reduction_simple_fp.ll
>> new file mode 100644
>> index 0000000..0eef154
>> --- /dev/null
>> +++ b/test/ScopInfo/reduction_simple_fp.ll
>> @@ -0,0 +1,69 @@
>> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
>> --check-prefix=DETECTION
>> +;
>> +; DETECTION: Function: f_no_fast_math
>> +; DETECTION: Reduction like: 0
>> +; DETECTION: Function: f_fast_math
>> +; DETECTION: Reduction like: 1
>> +;
>> +; void f(float *sum) {
>> +;   int i;
>> +;   for (i = 0; i < 100; i++)
>> +;     *sum += 3.41 * i;
> I like this style of test case best as it does not add unnecessary
> reg2mem instructions, basic blocks as some of your other test cases do.
> ;-)
Deja vu.
>> +; }
>> +;
>> +; ModuleID = 'test/ScopInfo/reduction_simple_fp.ll'
> Remove this line.
see above.

>> +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
>> +
>> +; Function Attrs: nounwind
> Remove this line.
Ok.

>> +define void @f_no_fast_math(float* %sum) {
>> +entry:
>> +  br label %for.cond
>> +
>> +for.cond:                                         ; preds = %for.inc,
>> %entry
>> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
>> +  %exitcond = icmp ne i32 %i.0, 100
>> +  br i1 %exitcond, label %for.body, label %for.end
>> +
>> +for.body:                                         ; preds = %for.cond
>> +  %conv = sitofp i32 %i.0 to float
>> +  %pi = fptrunc double 3.41 to float
>> +  %mul = fmul float %conv, %pi
>> +  %tmp = load float* %sum, align 4
>> +  %add = fadd float %tmp, %mul
>> +  store float %add, float* %sum, align 4
>> +  br label %for.inc
>> +
>> +for.inc:                                          ; preds = %for.body
>> +  %inc = add nsw i32 %i.0, 1
>> +  br label %for.cond
>> +
>> +for.end:                                          ; preds = %for.cond
>> +  ret void
>> +}
>> +
>> +; Function Attrs: nounwind
> Remove this line.
>
>> +define void @f_fast_math(float* %sum) {
>> +entry:
>> +  br label %for.cond
>> +
>> +for.cond:                                         ; preds = %for.inc,
>> %entry
>> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
>> +  %exitcond = icmp ne i32 %i.0, 100
>> +  br i1 %exitcond, label %for.body, label %for.end
>> +
>> +for.body:                                         ; preds = %for.cond
>> +  %conv = sitofp i32 %i.0 to float
>> +  %pi = fptrunc double 3.41 to float
>> +  %mul = fmul fast float %conv, %pi
>> +  %tmp = load float* %sum, align 4
>> +  %add = fadd fast float %tmp, %mul
>> +  store float %add, float* %sum, align 4
>> +  br label %for.inc
>> +
>> +for.inc:                                          ; preds = %for.body
>> +  %inc = add nsw i32 %i.0, 1
>> +  br label %for.cond
>> +
>> +for.end:                                          ; preds = %for.cond
>> +  ret void
>> +}
>> diff --git a/test/ScopInfo/reduction_simple_w_constant.ll
>> b/test/ScopInfo/reduction_simple_w_constant.ll
>> new file mode 100644
>> index 0000000..1d0a5c1
>> --- /dev/null
>> +++ b/test/ScopInfo/reduction_simple_w_constant.ll
>> @@ -0,0 +1,41 @@
>> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
>> --check-prefix=DETECTION
>> +;
>> +; DETECTION: Reduction like: 1
>> +;
>> +; int f(int sum) {
>> +;   int i;
>> +;   for (int i = 0; i <= 100; i++) {
>> +;     sum += 3;
>> +;   }
>> +;   return sum;
>> +; }
> You could simplify this according to reduction_simple_fp.ll
Maybe I wanted the variations ;)

>> +;
>> +; ModuleID = 'test/Reductions/simplest_binary_reduction.ll'
> Remove this line.
>
>> +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
>> +
>> +; Function Attrs: nounwind
> Remove this line.
>
>> +define i32 @f(i32 %sum) {
>> +entry:
>> +  %sum.reg2mem = alloca i32
>> +  br label %entry.split1
>> +
>> +entry.split1:                                     ; preds = %entry
>> +  br label %entry.split
>> +
>> +entry.split:                                      ; preds =
>> %entry.split1
>> +  store i32 %sum, i32* %sum.reg2mem
>> +  br label %for.cond
>> +
>> +for.cond:                                         ; preds = %for.cond,
>> %entry.split
>> +  %i1.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.cond ]
>> +  %sum.reload = load i32* %sum.reg2mem
>> +  %add = add nsw i32 %sum.reload, 3
>> +  %inc = add nsw i32 %i1.0, 1
>> +  store i32 %add, i32* %sum.reg2mem
>> +  %cmp = icmp slt i32 %i1.0, 100
>> +  br i1 %cmp, label %for.cond, label %for.end
>> +
>> +for.end:                                          ; preds = %for.cond
>> +  %sum.reload.2 = load i32* %sum.reg2mem
>> +  ret i32 %sum.reload.2
>> +}
>> diff --git a/test/ScopInfo/reduction_simple_w_iv.ll
>> b/test/ScopInfo/reduction_simple_w_iv.ll
>> new file mode 100644
>> index 0000000..de5a8e1
>> --- /dev/null
>> +++ b/test/ScopInfo/reduction_simple_w_iv.ll
>> @@ -0,0 +1,42 @@
>> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
>> --check-prefix=DETECTION
>> +;
>> +; DETECTION: Reduction like: 1
>> +;
>> +; int f(int sum) {
>> +;   int i;
>> +;   for (int i = 0; i <= 100; i++) {
>> +;     sum += i * 3;
>> +;   }
>> +;   return sum;
>> +; }
> You could simplify this according to reduction_simple_fp.ll
>
>> +;
>> +; ModuleID = 'test/Reductions/simplest_binary_reduction.ll'
> Remove this line.
>
>> +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
>> +
>> +; Function Attrs: nounwind
> Remove this line.
>
>> +define i32 @f(i32 %sum) {
>> +entry:
>> +  %sum.reg2mem = alloca i32
>> +  br label %entry.split1
>> +
>> +entry.split1:                                     ; preds = %entry
>> +  br label %entry.split
>> +
>> +entry.split:                                      ; preds =
>> %entry.split1
>> +  store i32 %sum, i32* %sum.reg2mem
>> +  br label %for.cond
>> +
>> +for.cond:                                         ; preds = %for.cond,
>> %entry.split
>> +  %i1.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.cond ]
>> +  %sum.reload = load i32* %sum.reg2mem
>> +  %mul = mul nsw i32 %i1.0, 3
>> +  %add = add nsw i32 %sum.reload, %mul
>> +  %inc = add nsw i32 %i1.0, 1
>> +  store i32 %add, i32* %sum.reg2mem
>> +  %cmp = icmp slt i32 %i1.0, 100
>> +  br i1 %cmp, label %for.cond, label %for.end
>> +
>> +for.end:                                          ; preds = %for.cond
>> +  %sum.reload.2 = load i32* %sum.reg2mem
>> +  ret i32 %sum.reload.2
>> +}
>> -- 1.8.2.1
>
> --
> You received this message because you are subscribed to the Google Groups
> "Polly Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to polly-dev+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.