[polly] Memory access based dependency analysis

Tobias Grosser tobias at grosser.es
Tue Jun 24 01:23:40 PDT 2014


On 20/06/2014 18:57, Johannes Doerfert wrote:
> Hey Tobias,
>
>
>
> I attached two patches, one to change the dependency analysis to use memory
> accesses not statements, and one to do both.
>
> The idea is that we need to track dependences between reduction accesses at
> some point (virtual splitting) but we don't want to do this for all
> accesses.
>
> The first patch on its own should work but increase our compile time
> significantly (e.g., we hit the isl computation limit of 25k on 3mm in
> polybench).
>
> With the second patch the compile time in most cases is very similar to the
> current implementation.
>
>
>
> What do you think?

Hi Johannes,

thanks for the small and self-contained patches. I am very happy with 
their style!

Regarding the approach you have taken: this is exactly what I proposed, 
and it seems OK, but it may increase compile time. Thanks for figuring 
this out and proposing an option to reduce this problem. Another option, 
which came to my mind after discussing this with Sven, might be even 
faster. We could leave the dependence computation entirely unmodified, 
but modify the set of possible reduction dependences. Instead of 
computing reduction dependences like this:

       isl_set *AccDom = isl_map_domain(MA->getAccessRelation());
       isl_map *Identity =
           isl_map_from_domain_and_range(isl_set_copy(AccDom), AccDom);
       RED = isl_union_map_add_map(RED, Identity);

we compute a RED map containing only relations of instances that read 
and write from/to the same memory location:

Identity = 
ReductionRead->getAccessRelation().apply_range(ReductionWrite->getAccessRelation().reverse())

This avoids both the duplication of dimensions in the dependence 
computation and the per-access computations otherwise necessary. 
What do you think of this approach? Is it semantically identical?

Btw, this also seems to be the location where we could lift the 
requirement that the input/output addresses be obviously identical.

> 0001-Memory-access-based-dependency-analysis_v2.patch
>
>
>  From 1eeba9bbd25fbcb3bc91a10f1ddb19bd5de72cce Mon Sep 17 00:00:00 2001
> From: Johannes Doerfert<jdoerfert at codeaurora.org>
> Date: Thu, 19 Jun 2014 15:10:52 -0700
> Subject: [PATCH 1/4] Memory access based dependency analysis
>
>   This fine grained dependecy analysis keeps track of the memory accesses
                       dependency

>   which caused dependeces between statements. It will be used to model
>   reduction dependences for statements with mutliple reduction accesses.

                                               multiple

>   + Modified a test case to check wrapped and zipped dependences
>   + Modiefed a test case, because this tracking is slightly better

       Modified


> ---
>   lib/Analysis/Dependences.cpp            | 40 +++++++++++++++++++++++++-
>   test/Dependences/do_pluto_matmult.ll    |  6 ++--
>   test/Dependences/reduction_simple_iv.ll | 50 +++++++++++++++++++++++++++------
>   3 files changed, 83 insertions(+), 13 deletions(-)
>
> diff --git a/lib/Analysis/Dependences.cpp b/lib/Analysis/Dependences.cpp
> index 3fc56e7..65b6069 100644
> --- a/lib/Analysis/Dependences.cpp
> +++ b/lib/Analysis/Dependences.cpp
> @@ -83,12 +83,35 @@ void Dependences::collectInfo(Scop &S, isl_union_map **Read,
>
>         accdom = isl_map_intersect_domain(accdom, domcp);
>
> +      // Wrap the access domain and adjust the scattering accordingly.
> +      //
> +      // An access domain like
> +      //   Stmt[i0, i1] -> MemAcc_A[i0 + i1]
> +      // will be transformed into
> +      //   [Stmt[i0, i1] -> MemAcc_A[i0 + i1]] -> MemAcc_A[i0 + i1]
> +      //
> +      // The original scattering looks like
> +      //   Stmt[i0, i1] -> [0, i0, 2, i1, 0]
> +      // but as we transformed the access domain we need the scattering
> +      // to match the new access domains, thus we need
> +      //   [Stmt[i0, i1] -> MemAcc_A[i0 + i1]] -> [0, i0, 2, i1, 0]
> +      accdom = isl_map_range_map(accdom);
> +
> +      isl_map *stmt_scatter = Stmt->getScattering();
> +      isl_set *scatter_dom = isl_map_domain(isl_map_copy(accdom));
> +      isl_set *scatter_ran = isl_map_range(stmt_scatter);
> +      isl_map *scatter =
> +          isl_map_from_domain_and_range(scatter_dom, scatter_ran);
> +      for (unsigned u = 0, e = Stmt->getNumIterators(); u != e; u++)
> +        scatter =
> +            isl_map_equate(scatter, isl_dim_out, 2 * u + 1, isl_dim_in, u);
> +      *Schedule = isl_union_map_add_map(*Schedule, scatter);
> +
>         if (MA->isRead())
>           *Read = isl_union_map_add_map(*Read, accdom);
>         else
>           *Write = isl_union_map_add_map(*Write, accdom);
>       }
> -    *Schedule = isl_union_map_add_map(*Schedule, Stmt->getScattering());
>     }
>   }
>
> @@ -254,6 +277,21 @@ void Dependences::calculateDependences(Scop &S) {
>     isl_ctx_reset_operations(S.getIslCtx());
>     isl_ctx_set_max_operations(S.getIslCtx(), MaxOpsOld);
>
> +  DEBUG(dbgs() << "Wrapped Dependences:\n"; printScop(dbgs()); dbgs() << "\n");
> +
> +  RAW = isl_union_map_zip(RAW);
> +  WAW = isl_union_map_zip(WAW);
> +  WAR = isl_union_map_zip(WAR);
> +
> +  DEBUG(dbgs() << "Zipped Dependences:\n"; printScop(dbgs()); dbgs() << "\n");
> +
> +  RAW = isl_union_set_unwrap(isl_union_map_domain(RAW));
> +  WAW = isl_union_set_unwrap(isl_union_map_domain(WAW));
> +  WAR = isl_union_set_unwrap(isl_union_map_domain(WAR));
> +
> +  DEBUG(dbgs() << "Unwrapped Dependences:\n"; printScop(dbgs());
> +        dbgs() << "\n");
> +
>     // To handle reduction dependences we proceed as follows:
>     // 1) Aggregate all possible reduction dependences, namely all self
>     //    dependences on reduction like statements.
> diff --git a/test/Dependences/do_pluto_matmult.ll b/test/Dependences/do_pluto_matmult.ll
> index 7d9799e..d0c2a1e 100644
> --- a/test/Dependences/do_pluto_matmult.ll
> +++ b/test/Dependences/do_pluto_matmult.ll
> @@ -73,8 +73,8 @@ do.end45:                                         ; preds = %do.cond42
>   ; VALUE:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, 1 + i2] : i0 >= 0 and i0 <= 35 and i1 >= 0 and i1 <= 35 and i2 >= 0 and i2 <= 34 }
>
>   ; MEMORY: RAW dependences:
> -; MEMORY:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, o2] : i0 <= 35 and i0 >= 0 and i1 <= 35 and i1 >= 0 and i2 >= 0 and o2 >= 1 + i2 and o2 <= 35 and o2 >= 0 }
> +; MEMORY:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, o2] : i0 <= 35 and i0 >= 0 and i1 <= 35 and i1 >= 0 and i2 >= 0 and o2 >= 1 + i2 and o2 <= 35 and o2 >= 0 and i2 <= 35 }

Can you explain why we now get an additional constraint for this test case?

> @@ -1,14 +1,46 @@
> -; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
> +; RUN: opt %loadPolly -polly-dependences -analyze -debug-only=polly-dependence 2> %t.2 > %t.1 < %s
> +; RUN: FileCheck %s --check-prefix=DEPENDENCES --input-file %t.1
> +; RUN: FileCheck %s --check-prefix=WRAPPED_DEPENDENCES --input-file %t.2

You cannot use -debug-only in a test case, as the test case will fail 
for non-debug builds. Also, having test cases that write files is not a 
good idea, as this causes noise in the test directory.

There are three options:

1) Mark the test case as 'REQUIRES: asserts' and put it into a separate file
2) Only check the -analyze output
3) Add more output to the -analyze output

I don't have a strong opinion here. Choose what you believe works best.

> +; WRAPPED_DEPENDENCES: Read: { [Stmt_for_cond[i0] -> MemRef_sum[0]] -> MemRef_sum[0] : i0 >= 0 and i0 <= 100 }
> +; WRAPPED_DEPENDENCES: Write: { [Stmt_for_cond[i0] -> MemRef_sum[0]] -> MemRef_sum[0] : i0 >= 0 and i0 <= 100 }
> +; WRAPPED_DEPENDENCES: Schedule: { [Stmt_for_cond[i0] -> MemRef_sum[0]] -> scattering[0, i0, 0] : i0 <= 100 and i0 >= 0 }
> +; WRAPPED_DEPENDENCES: Wrapped Dependences:
> +; WRAPPED_DEPENDENCES:         RAW dependences:
> +; WRAPPED_DEPENDENCES:                 { [Stmt_for_cond[i0] -> MemRef_sum[0]] -> [Stmt_for_cond[1 + i0] -> MemRef_sum[0]] : i0 >= 0 and i0 <= 99 }
> +; WRAPPED_DEPENDENCES:         WAR dependences:
> +; WRAPPED_DEPENDENCES:                 {  }
> +; WRAPPED_DEPENDENCES:         WAW dependences:
> +; WRAPPED_DEPENDENCES:                 { [Stmt_for_cond[i0] -> MemRef_sum[0]] -> [Stmt_for_cond[1 + i0] -> MemRef_sum[0]] : i0 >= 0 and i0 <= 99 }
> +; WRAPPED_DEPENDENCES:         Reduction dependences:
> +; WRAPPED_DEPENDENCES:                 n/a
> +; WRAPPED_DEPENDENCES: Zipped Dependences:
> +; WRAPPED_DEPENDENCES:         RAW dependences:
> +; WRAPPED_DEPENDENCES:                 { [Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0]] -> [MemRef_sum[0] -> MemRef_sum[0]] : i0 >= 0 and i0 <= 99 }
> +; WRAPPED_DEPENDENCES:         WAR dependences:
> +; WRAPPED_DEPENDENCES:                 {  }
> +; WRAPPED_DEPENDENCES:         WAW dependences:
> +; WRAPPED_DEPENDENCES:                 { [Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0]] -> [MemRef_sum[0] -> MemRef_sum[0]] : i0 >= 0 and i0 <= 99 }
> +; WRAPPED_DEPENDENCES:         Reduction dependences:
> +; WRAPPED_DEPENDENCES:                 n/a
> +; WRAPPED_DEPENDENCES: Unwrapped Dependences:
> +; WRAPPED_DEPENDENCES:         RAW dependences:
> +; WRAPPED_DEPENDENCES:                 { Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0] : i0 >= 0 and i0 <= 99 }
> +; WRAPPED_DEPENDENCES:         WAR dependences:
> +; WRAPPED_DEPENDENCES:                 {  }
> +; WRAPPED_DEPENDENCES:         WAW dependences:
> +; WRAPPED_DEPENDENCES:                 { Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0] : i0 >= 0 and i0 <= 99 }
> +; WRAPPED_DEPENDENCES:         Reduction dependences:
> +; WRAPPED_DEPENDENCES:                 n/a

Very nice.

Tobias


