[Polly] Allow multiple reductions in one statement

Fri Jun 27 12:04:56 PDT 2014

I will change the style/typos before I commit it.

What about A[i] = A[i] + A[i]?
We will not mark it as reduction like at the moment and that is a good thing (in my opinion).
Case 1: The reads are different LLVM-IR Values: 
	The "collectPossibleLoads" for the A[i] store will return both A[i] loads, but if we test for non overlapping accesses will always find the other load as overlapping => Invalid
Case 2: The reads are the same LLVM-IR Value:
	The "collectPossibleLoads" won't return this load ad it has at least two uses (in the binary operator).

I will remove the XFAIL and make it not detect the reduction/scop.

I will introduce unary reductions in 2 commits (hopefully). I will simply remove the FIXME's for now if they are confusing.

What do you think?

--

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

-----Original Message-----
From: Tobias Grosser [mailto:tobias at grosser.es] 
Sent: Friday, June 27, 2014 11:52 AM
To: Johannes Doerfert
Cc: llvm-commits at cs.uiuc.edu; 'Sebastian Pop'; 'zino'
Subject: Re: [Polly] Allow multiple reductions in one statement

On 27/06/2014 20:29, Johannes Doerfert wrote:
> Hey,
>
> I didn’t commit it yet because I changed a lot of comments and a little bit of code.
> Is this OK with you?

I added some comments mostly stylistic. I believe I also found one problem with:

A[i] = A[i] + A[i] reductions

>  From 23a4b608383400c19c9bd4e6105e06d391565f1e Mon Sep 17 00:00:00 
> 2001
> From: Johannes Doerfert<jdoerfert at codeaurora.org>
> Date: Thu, 19 Jun 2014 15:21:50 -0700
> Subject: [PATCH] Allow multiple reductions per statement
>
>    Iterate over all store memory accesses and check for valid binary reduction
>    candidate loads by following the operands of the stored value.  For each
>    candidate pair we check if they have the same base address and there are no
>    other accesses which may overlap with them. This ensures that no intermediate
>    value can escape into other memory locations or is overwritten at some point.

Very nice.

> +/// @brief Collect loads which might form a reduction chain with @p 
> +StoreMA /// /// Check if the stored value for @p StoreMA is a binary 
> +operator with one or /// two loads as operands. If the binary operand 
> +is commutative & associative, /// used only once (by @p StoreMA) and 
> +his load operands are also used only
                                           its

> +/// @brief Check for reductions in this ScopStmt /// /// Iterate over 
> +all store memory accesses and check for valid binary reduction /// 
> +like chains. For all candidates we check if they have the same base 
> +address /// and there are no other accesses which overlap with them. 
> +The base address /// check rules out impossible reductions candidates 
> +early. The overlap check, /// together with the "only one user" check 
> +in collectCandiateReductionLoads, /// guarantees that non of the 
> +intermediate results will escape during
                        none

> +/// execution of the loop nest. We basically check here that no other 
> +memory /// access can access the same memory as the potential reduction.
> +void ScopStmt::checkForReductions() {
> +  SmallVector<MemoryAccess *, 2> Loads;
> +  SmallVector<std::pair<MemoryAccess *, MemoryAccess *>, 4> 
> +Candidates;
> +
> +  // First collect candidate load-store reduction chains by iterating 
> + over all  // stores and collecting possible reduction loads.
> +  for (MemoryAccess *StoreMA : MemAccs) {
> +    if (StoreMA->isRead())
> +      continue;
> +
> +    Loads.clear();
> +    collectCandiateReductionLoads(StoreMA, Loads);
> +    for (MemoryAccess *LoadMA : Loads)
> +      Candidates.push_back(std::make_pair(LoadMA, StoreMA));  }
> +
> +  // Then check each possible candidate pair

Point at end of sentence.

> +  for (const auto &CandidatePair : Candidates) {
> +    bool Valid = true;
> +    isl_map *LoadAccs = CandidatePair.first->getAccessRelation();
> +    isl_map *StoreAccs = CandidatePair.second->getAccessRelation();
> +
> +    // Skip those with obviously unequal base addresses
> +    if (!isl_map_has_equal_space(LoadAccs, StoreAccs)) {
> +      isl_map_free(LoadAccs);
> +      isl_map_free(StoreAccs);
> +      continue;
> +    }
> +
> +    // And check if the remaining for overlap with other memory 
> + accesses

Point at end of sentence.

 > +    for (MemoryAccess *MA : MemAccs) {
 > +      if (MA == CandidatePair.first || MA == CandidatePair.second)
 > +        continue;
 > +

What about:

A[i] = A[i] + A[i]

Should we check that the two loads can not have the same address (at the same iteration)?

>   std::string ScopStmt::getDomainStr() const { return 
> stringFromIslObj(Domain); } diff --git 
> a/test/Dependences/reduction_complex_location.ll 
> b/test/Dependences/reduction_complex_location.ll
> new file mode 100644
> index 0000000..8ed46b7
> --- /dev/null
> +++ b/test/Dependences/reduction_complex_location.ll
> @@ -0,0 +1,59 @@
> +; RUN: opt -basicaa %loadPolly -polly-dependences -analyze < %s | 
> +FileCheck %s ; ; CHECK: RAW dependences:
> +; CHECK:   {  }
> +; CHECK: WAR dependences:
> +; CHECK:   {  }
> +; CHECK: WAW dependences:
> +; CHECK:   {  }
> +; CHECK: Reduction dependences:
> +; CHECK:   { Stmt_for_body3[i0, i1] -> Stmt_for_body3[2 + i0, -1 + i1] : i0 <= 97 and i0 >= 0 and i1 <= 99 and i1 >= 1 }
> +;
> +; void f(int *sum) {
> +;   for (int i = 0; i < 100; i++)
> +;     for (int j = 0; j < 100; j++)
> +;       sum[i+j*2] += j * i;
> +; }

Nice that we get these partial reduction chains.

> diff --git a/test/Dependences/reduction_multiple_loops_array_sum_3.ll 
> b/test/Dependences/reduction_multiple_loops_array_sum_3.ll
> new file mode 100644
> index 0000000..4b699a1
> --- /dev/null
> +++ b/test/Dependences/reduction_multiple_loops_array_sum_3.ll
> @@ -0,0 +1,73 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze -basicaa < %s | 
> +FileCheck %s ; ; CHECK: Reduction dependences:
> +; CHECK:   { Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : i0 <= 99 and i0 >= 0 and i1 <= 98 and i1 >= 0 }
> +;
> +; int f(int * __restrict__ A) {
> +;   int i, j, sum = 0;
> +;   for (k = 0; k < 37; k = g(k)) {
> +;     for (i = 0; i < 100; i++) {
> +;       sum *= 2;
> +;       for (j = 0; j < 100; j++) {
> +;         sum += A[i+j];

It may help to add S1, S2 labels and name the bbs accordingly.

> +++ b/test/Dependences/reduction_partially_escaping_intermediate_in_ot
> +++ her_stmt.ll
> @@ -0,0 +1,69 @@
> +; RUN: opt %loadPolly -polly-dependences -analyze -basicaa < %s | 
> +FileCheck %s ; ; CHECK: Reduction dependences:
> +; CHECK:   [N] -> { Stmt_for_body3[i0, i1] -> Stmt_for_body3[i0, 1 + i1] : i0 <= 1023 and i0 >= 0 and i1 <= 1022 and i1 >= 0 and i1 >= 1024 - N + i0 }
> +;
> +; void f(int N, int * restrict sums, int * restrict escape) {
> +;   for (int i = 0; i < 1024; i++) {
> +;     for (int j = 0; j < 1024; j++) {
> +;       sums[i] += 5;
> +;       if (N - i + j < 1024)
> +;         escape[N - i + j] = sums[i];

It may help to add S1, S2 labels and name the bbs accordingly.

Otherwise, a very interesting test case.

> diff --git a/test/ScopInfo/reduction_alternating_base.ll 
> b/test/ScopInfo/reduction_alternating_base.ll
> new file mode 100644
> index 0000000..f06ac6d
> --- /dev/null
> +++ b/test/ScopInfo/reduction_alternating_base.ll
> @@ -0,0 +1,35 @@
> +; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s ; ; 
> +We cannot detect this SCoP yet.
> +;
> +; XFAIL: *

If you decide to commit a failing test case, it would be good to add a comment why it is currently not detected and what needs to be fixed to detect it in the future. Even better would be to not mark it as XFAIL, but to check that is it not detected such that we get notified if we happen to fix this bug later on.

> +++ b/test/ScopInfo/reduction_escaping_intermediate.ll
> @@ -0,0 +1,63 @@
> +; RUN: opt %loadPolly -basicaa -polly-scops -analyze < %s | FileCheck 
> +%s ; ; void f(int N, int * restrict sums, int * restrict escape) {
> +;   int i, j;
> +;   for (i = 0; i < 1024; i++) {
> +;     for (j = 0; j < 1024; j++) {
> +;       sums[i] += 5;
> +;       escape[N - i + j] = sums[i];
> +;     }
> +;   }
> +; }
> +;
> +; CHECK: Reduction like: 0
> +; CHECK: sums
> +; CHECK: Reduction like: 0
> +; CHECK: sums
> +; FIXME: escape is a unary reduction

Can you explain this FIXME? What is a unary reduction? What needs to be addressed to resolve this FIXME?

> +++ b/test/ScopInfo/reduction_multiple_loops_array_sum.ll
> @@ -0,0 +1,78 @@
> +; RUN: opt -basicaa %loadPolly -polly-scops -analyze < %s | FileCheck 
> +%s ; ; CHECK: Stmt_for_body ; CHECK: Reduction like: 1 ; CHECK: 
> +MemRef_sum ; CHECK: Reduction like: 1 ; CHECK: MemRef_sum ; CHECK: 
> +Stmt_for_body3 ; CHECK: Reduction like: 0 ; CHECK: MemRef_A ; CHECK: 
> +Reduction like: 1 ; CHECK: MemRef_sum ; CHECK: Reduction like: 1 ; 
> +CHECK: MemRef_sum ; CHECK: Stmt_for_end ; CHECK: Reduction like: 1 ; 
> +CHECK: MemRef_sum ; CHECK: Reduction like: 1 ; CHECK: MemRef_sum ; ; 
> +void f(int *restrict A, int *restrict sum) {
> +;   int i, j;
> +;   for (i = 0; i < 100; i++) {
> +;     *sum *= 7;
> +;     for (j = 0; j < 100; j++) {
> +;       *sum += A[i + j];
> +;     }
> +;     *sum *= 5;
> +;   }
> +; }

Statement labels would help to match relate the check and the source code comment.

> +++ b/test/ScopInfo/reduction_multiple_loops_array_sum_1.ll
> @@ -0,0 +1,74 @@
> +; RUN: opt -basicaa %loadPolly -polly-scops -analyze < %s | FileCheck 
> +%s ; ; CHECK: Stmt_for_body ; CHECK: Reduction like: 0 ; CHECK: 
> +MemRef_sum_04 ; FIXME: This is a unary reduction

Can you explain this fixme?

Tobias