Reductions

Tobias Grosser tobias at grosser.es
Sun Jun 15 23:12:34 PDT 2014


On 15/06/2014 23:24, jdoerfert at codeaurora.org wrote:
>
> Comments inlined.

Same.

>> On 13/06/2014 19:27, Johannes Doerfert wrote:
>>> Hey Tobias,
>>>
>>> I changed almost everything according to your comments (except the prefix
>>> in the test cases, I don't see the need to do this, but if you insist I
>>> will). I added another "only reduction like" test case and a floating
>>> point test case w/ and w/o fast math [= 4 test cases in the first patch].
>>
>> Nice.
>>
>>> What is left in order to commit this one and get to the next patch?
>>
>> Besides a couple of nitpicks in the test cases and some last questions
>> about the comments, the only missing piece is the question if we need to
>> check the read/write location to be identical.
> As I said in the first mail, I don't think we have to as long as we are
> restricted like this. More on this where it is discussed below.

I will reply below.

> As you said, it remains to clarify where I have to put the test cases now.

In the current test you have them in test/ScopInfo. This is fine.

>>> Comments inlined (formatting might be a bit off, sorry about that).
>>
>>
>>>>>>>>> +  /// @brief Flag to indicate reduction statements
>>>>>>>>> + bool IsReductionLike = false;
>>>>>>>
>>>>>>> Why do you call this ReductionLike and not just reduction?
>>
>> This question is just a minor thing, but you did not reply to it. Is the
>> example you give in the comment the reason you call this reduction-like
>> instead of reduction?
> Yes, my bad. The explanation is in the comment, here it is:
>    +  /// @brief Flag to indicate reduction like statements
>    +  ///
>    +  /// A statement is reduction like if it contains exactly one load and one
>    +  /// store with exactly one binary operator in between. The binary operator
>    +  /// needs to be associative and commutative or easily convertable into a such
>    +  /// one. Not all reduction like statements are binary reductions in the common
>    +  /// sence, e.g.,
>    +  /// for (i = 0; i < 100; i++)
>    +  ///   sum[100-i] += sum[i]
>    +  /// which is reduction like but will not cause any reduction dependences.
>    +  bool IsReductionLike = false;
> And I thought this kind of detection was more to your liking as it moves
> computation and filtering to the dependency analysis.

It is very much to my liking, and the example you have shown illustrates
nicely why this way of detecting it is more general.

For a first patch, I would probably have chosen to just limit it to the 
obvious reductions (identical load/store value) as this would keep the 
dependence analysis discussion out of this review and as it also 
simplifies the review of the first dependence test.

If you prefer to detect reduction like statements with this generality, 
then I believe the comment is not very clear. Specifically, it is not 
clear what binary reductions in the common sense are and in which way 
the reductions we generate are more generic. Also, the example is 
misleading. You chose an example which does not have any dependences,
whereas even for examples with dependences, later passes need to make
sure only the actual reduction dependences are detected.

What about something like:

A statement is reduction like if it contains exactly one load and
one store with exactly one binary operator in between. The binary
operator needs to be associative and commutative. Compared to binary
reductions "in the common sense" which load from and store to the same 
data location, reduction like statements do not impose any constraints 
on the location data is loaded from / stored to. The following access is 
seen as reduction like:

   for i
     for j
       sum[i+j] = sum[i] + 3;

Here not all iterations access the same memory location, but iterations 
for which j = 0 holds do. To take advantage of the reduction property,
transformations do not only need to check if a statement is reduction like,
but they also need to verify that the reduction property is only
exploited for statement instances that load from and store to the same
data location.
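
Just to illustrate what the comment describes (only a sketch with invented
names, not the code from your patch), a check along these lines on LLVM-IR
level could look roughly like:

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch only: a block is "reduction like" if it contains exactly one
  // load, one store and one binary operator, the store writes the result
  // of the binary operator, the binary operator consumes the loaded value,
  // and the operation is commutative and associative.
  static bool isReductionLikeBlock(BasicBlock &BB) {
    LoadInst *Load = nullptr;
    StoreInst *Store = nullptr;
    BinaryOperator *BinOp = nullptr;

    for (Instruction &I : BB) {
      if (auto *L = dyn_cast<LoadInst>(&I)) {
        if (Load)
          return false; // more than one load
        Load = L;
      } else if (auto *S = dyn_cast<StoreInst>(&I)) {
        if (Store)
          return false; // more than one store
        Store = S;
      } else if (auto *B = dyn_cast<BinaryOperator>(&I)) {
        if (BinOp)
          return false; // more than one binary operator
        BinOp = B;
      }
    }

    if (!Load || !Store || !BinOp)
      return false;

    // The store must write the binary operator's result and the binary
    // operator must use the loaded value as one of its operands.
    if (Store->getValueOperand() != BinOp ||
        (BinOp->getOperand(0) != Load && BinOp->getOperand(1) != Load))
      return false;

    return BinOp->isCommutative() && BinOp->isAssociative();
  }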

>> I tested this example (attached):
>>
>> for 0 <= i < 100:
>>     A[i+1] = A[i] + 1;
>>
>> To my understanding your current dependence check in patch 2 would
>> detect reduction dependences here (and does not verify correctly that
>> the in and out locations are identical).
> No, it does not. I verified it. Here is the output:
 >
 >
>      Function: f
>      Region: %for.cond---%for.end
>      Context:
>      {  :  }
>      Statements {
>      	Stmt_for_body
>          Reduction like: 1
>              Domain :=
>                  { Stmt_for_body[i0] : i0 >= 0 and i0 <= 99 };
>              Scattering :=
>                  { Stmt_for_body[i0] -> scattering[0, i0, 0] };
>              ReadAccess :=
>                  { Stmt_for_body[i0] -> MemRef_sum[i0] };
>              MustWriteAccess :=
>                  { Stmt_for_body[i0] -> MemRef_sum[1 + i0] };
>      }
> Printing analysis 'Polly - Calculate dependences' for region: 'for.cond =>
> for.end' in function 'f':
> 	RAW dependences:
> 		{ Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 >= 0 and i0 <= 98 }
> 	WAR dependences:
> 		{  }
> 	WAW dependences:
> 		{  }
> 	Reduction dependences:
> 		{  }
>
> And that is exactly my argument. We don't check this in this simple
> setting and will add checks as soon as it is necessary (once we remove the
> two-access limitation we need to check all accesses for overlapping ones
> anyway).
>
>> (In fact you don't detect this reduction, but this seems to be a small
>> bug:
>>
>>       RED = isl_union_map_intersect(RED, isl_union_map_copy(RAW));
>>       RED = isl_union_map_intersect(RED, isl_union_map_copy(WAW));
>>
>> I think you would like to compute
>>
>> RED = intersect(RED, union(RAW, WAW)), instead of the current
>> RED = intersect(intersect(RED, RAW), WAW))
> I don't think this is correct. We will get rid of the RAW intersection
> once we want unary reductions, but the WAW intersection needs to stay.
>
 >
 >
>> Or are reduction dependences required to be both RAW and WAW dependences?)
> There is no "bug" I think, but I wanted to be conservative and only test
> for binary reductions.
> Reductions should always cause WAW dependences, and binary reductions also
> cause the same RAW dependences.

OK, then this code definitely needs a comment that explains your
assumptions and why they always hold. Specifically, it should explain why
the way you compute reduction dependences ensures that those dependences
only arise between statement instances whose load/store addresses are
identical.
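
Just to sketch what I have in mind (the wording is only a suggestion and
the variable names are the ones from the snippet quoted above):

  // RED currently contains all possible dependences of reduction like
  // statements.  A dependence that is both a RAW and a WAW dependence goes
  // from the store of one instance to an instance whose load *and* store
  // access the location just written, i.e., the target instance loads from
  // and stores to the same address.  Intersecting with both RAW and WAW
  // therefore keeps only dependences between instances with identical
  // load/store locations.
  RED = isl_union_map_intersect(RED, isl_union_map_copy(RAW));
  RED = isl_union_map_intersect(RED, isl_union_map_copy(WAW));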

We can move this discussion back to the dependence analysis patch.

>> I think the simplest check for now is to verify that the addresses of
>> the input/output location have the same LLVM-IR value. Or that the read
>> and write access have identical access functions. We can then later
>> discuss how to relax this.
> This is part of the "relax the detection limitations" patch.

What is this patch? Did you send it out already?

 > As I
> mentioned above, we don't need that now as there won't be reduction
> dependences.

The absence of reduction dependences is/was not obvious to me.

> However, I can add a check to limit it to the same base address (which
> makes sense).

OK.
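
For this patch something as small as the following would do (just a
sketch; LoadAccess and StoreAccess stand for the statement's single read
and single write access, and I assume the base address is taken from the
MemoryAccess):

  // Sketch only: do not mark the statement reduction like if the single
  // load and the single store refer to different base addresses.
  if (LoadAccess->getBaseAddr() != StoreAccess->getBaseAddr())
    IsReductionLike = false;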

>>> diff --git a/include/polly/ScopInfo.h b/include/polly/ScopInfo.h
>>> index e85ec13..f571b47 100644
>>> --- a/include/polly/ScopInfo.h
>>> +++ b/include/polly/ScopInfo.h
>>> @@ -270,6 +270,17 @@ class ScopStmt {
>>>      MemoryAccessVec MemAccs;
>>>      std::map<const Instruction *, MemoryAccess *> InstructionToAccess;
>>>
>>> +  /// @brief Flag to indicate reduction like statements
>>> +  ///
>>> +  /// A statement is reduction like if it contains exactly one load and one
>>> +  /// store with exactly one binary operator in between. The binary operator
>>> +  /// needs to be associative and commutative or easily convertable into a such
>>
>> Should it be 'convertible'?
> Mh, maybe. I'll ask somebody.
>
>> What does it mean to be easily convertible? Is this already implemented?
>> Is there a test case/example I could look at?
> It means "sum -= A[i];" is fine too, though not with the current patch (v2
> and v3) as I removed the opcode check and use isCommutative now.
> I will adjust the comment for the final version.

Good. Just remove the mention of convertible for now.

>>> +  /// one. Not all reduction like statements are binary reductions in the common
>>> +  /// sence, e.g.,
>>> +  /// for (i = 0; i < 100; i++)
>>> +  ///   sum[100-i] += sum[i]
>>> +  /// which is reduction like but will not cause any reduction dependences.
>>
>> Is this because there are no dependences at all in this loop or because
>> the individual iterations write to a different memory location?
> Yes, it shows that reduction like is not the same as reduction, which is
> why I would suggest to detect this for now and call it reduction like.
> The example above, A[i] = B[i] + 3, is similar and would be marked at the
> moment; however, a check for the same base value won't change anything in
> the sum[100-i] example.
> I prefer not to test the access instruction because we can have:
> for i
>    for j
>       sum[i+j] = sum[i] + 3;
> which contains reduction and non-reduction dependences.

Already commented on above.

>> Am I right that the point you want to make is that the flag in ScopInfo
>> only gives information about the properties of an individual
>> computation, but that the actual computation of reduction dependences
>> will happen later?
> Yes! And to my understanding this was what you proposed during the
> discussion of my last approach.
> However, this will become stricter as soon as we remove some of the
> detection limits; maybe at some point we can rename it to reduction.

I don't follow this one. Why would this become stricter by removing 
detection limits?

>>> diff --git a/test/ScopInfo/reduction_only_reduction_like_access.ll
>>> b/test/ScopInfo/reduction_only_reduction_like_access.ll
>>> new file mode 100644
>>> index 0000000..30eef4d
>>> --- /dev/null
>>> +++ b/test/ScopInfo/reduction_only_reduction_like_access.ll
>>> @@ -0,0 +1,39 @@
>>> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
>>> --check-prefix=DETECTION
>>> +;
>>> +; DETECTION: Reduction like: 1
>>> +;
>>> +; void f(int *sum) {
>>> +;   int i;
>>> +;   for (i = 0; i < 100; i++)
>>> +;     sum[i] = sum[99-i] + i;
>>> +; }
>> I like this style of test case best as it does not add unnecessary
>> reg2mem instructions and basic blocks as some of your other test cases do.
>> ;-)
> I just started with int *sum, but I can change others if needed.

Please.

>>> diff --git a/test/ScopInfo/reduction_simple_w_constant.ll
>>> b/test/ScopInfo/reduction_simple_w_constant.ll
>>> new file mode 100644
>>> index 0000000..1d0a5c1
>>> --- /dev/null
>>> +++ b/test/ScopInfo/reduction_simple_w_constant.ll
>>> @@ -0,0 +1,41 @@
>>> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
>>> --check-prefix=DETECTION
>>> +;
>>> +; DETECTION: Reduction like: 1
>>> +;
>>> +; int f(int sum) {
>>> +;   int i;
>>> +;   for (int i = 0; i <= 100; i++) {
>>> +;     sum += 3;
>>> +;   }
>>> +;   return sum;
>>> +; }
>> You could simplify this according to reduction_simple_fp.ll
> Maybe I wanted the variations ;)

One test case per feature to test. This one apparently tests a
simple reduction with a constant.

Cheers,
Tobias


