[polly] Memory access based dependency analysis

Tue Jun 24 13:35:04 PDT 2014

On 24/06/2014 19:01, Johannes Doerfert wrote:
> Shortened the mail and added inlined comments.

Thanks Johannes for your comments. I am sorry that I keep discussing
changes even though this patch is already of very high quality. However,
I think our discussions have greatly improved the quality of the patches
that got committed in the end. So let me use this occasion to learn a
little bit more.

> Attached updated versions.
>
> --
>
> Johannes Doerfert
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> The Linux Foundation
>
> -----Original Message-----
> From: Tobias Grosser [mailto:tobias at grosser.es]
> Sent: Tuesday, June 24, 2014 1:24 AM
> To: Johannes Doerfert
> Cc: llvm-commits at cs.uiuc.edu; 'Sebastian Pop'
> Subject: Re: [polly] Memory access based dependency analysis
>
>>> Regarding the approach you have taken, this is exactly what I proposed
>>> and it seems OK, but may cause increased compile time. Thanks for figuring
>>> this out and proposing an option to reduce this problem. Another option
>>> that came to my mind after discussing this with Sven might be even faster.
>>> We could leave the dependence computation entirely unmodified, but modify
>>> the set of possible reduction dependences.
>>> Instead of reduction dependences like this:
>>>
>>>         isl_set *AccDom = isl_map_domain(MA->getAccessRelation());
>>>         isl_map *Identity =
>>>             isl_map_from_domain_and_range(isl_set_copy(AccDom), AccDom);
>>>         RED = isl_union_map_add_map(RED, Identity);
>>>
>>> we compute a RED map containing only relations of instances that read and
>>> write from/to the same memory location:
>>>
>>>         Identity = ReductionRead->getAccessRelation().apply_range(
>>>             ReductionWrite->getAccessRelation().reverse())
>>>
>>> This avoids both the duplication of dimensions in the dependence
>>> computation and the per-access computations otherwise necessary.
>>> What do you think of this approach? Is it semantically identical?
> I am not completely sure if we can express everything as easily as we can do
> it at the moment with this approach. While it looks good for
> reduction dependences in isolation (I mean we can use this method to create
> the reduction dependency templates), we need to think about:
>
> 1) Privatization dependences need to be computed, how can we do this without
> losing generality in the new setting?

Do you have an example where we would lose generality?

> This is connected to 2)
>     and the dependency separation discussion there.
>
> 2) How do we remove the reduction dependences then? So far we know that the
>    identity map we initialize the reduction dependences with cannot
>    intersect with non-reduction dependences (first because of the detection
>    limitations, later because we are in different spaces,
>    wrapped/unwrapped).
>    If we lift the limitations and use only one space we need to make sure we
>    separate the dependences correctly. Possibilities which come to mind:
>      o Use the dependency analysis twice, once for reduction and once for
>        non-reduction dependences. However, that was what I did in my
>        initial patch and it won't get much nicer/shorter than that.
>      o Perform the dependency analysis once and intersect the RAW/WAW
>        dependences afterwards with our reduction dependency template (same
>        as now). However, the Identity we compute above looks to me like more
>        than our current transitive closure (see below). My problem now is
>        that we have a bunch of RAW/WAW dependences and the templates for
>        reduction dependences, but we cannot be sure the intersection will
>        only be reduction dependences.

Why not? We only add reduction like dependences to the Identity map for 
which we know they read and write from/to the same memory location. Is 
there a test case that will not be covered by this?

> I don't have an example right now (if you want one I'll think about one) but
> the idea is to have both reduction and non reduction self dependences in one
> statement.

I think if we want to introduce per-access dependence tracking, which
may be a lot more expensive, we need at least one example that
justifies the need for it.

>     Can you think of another way to split the dependences correctly without
> the memory access tracking (and without limiting the reduction detection)?
>
>         void f(int *sum) {
>           int i, j;
>           for (i = 0; i < 99; i++) {
>      S1:    sum[i + 1] += 42;
>             for (j = i; j < 100; j++)
>      S2:      sum[i - j] += i * j;
>      S3:    sum[i - 1] += 7;
>           }
>         }
>
>      ReadAccess :=       [Reduction like: 1 RT: 1]
>          { Stmt_S2[i0, i1] -> MemRef_sum[-i1] };
>      MustWriteAccess :=  [Reduction like: 1 RT: 1]
>          { Stmt_S2[i0, i1] -> MemRef_sum[-i1] };
>
>
>      Identity as you described it intersected with the statement domain (both
> the domain and range):
>      { Stmt_S2[i0, i1] -> Stmt_S2[o0, i1] : i0 >= 0 and i0 <= 98 and i1 >= 0
> and i1 <= 99 - i0 and o0 >= 0 and o0 <= 98 and o0 <= 99 - i1 }

Is this not what you expect? It looks good to me.
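
To make the check concrete, here is a small brute-force sketch in plain Python (not isl, and not part of the patch). It enumerates the S2 instances, composes the read access relation with the reversed write access relation, and compares the result with the map quoted above. The names DOMAIN, access and reduction_template are made up for this illustration, and the domain bounds are simply taken from the quoted constraints.

```python
# Brute-force illustration only: isl does this symbolically, we enumerate.

# Iteration domain of Stmt_S2, taken from the quoted constraints:
#   0 <= i0 <= 98 and 0 <= i1 <= 99 - i0
DOMAIN = [(i0, i1) for i0 in range(99) for i1 in range(100 - i0)]


def access(instance):
    """Access relation of S2 (read and write): Stmt_S2[i0, i1] -> sum[-i1]."""
    _, i1 = instance
    return -i1


def reduction_template(domain, read, write):
    """Stand-in for read.apply_range(write.reverse()) on explicit instances.

    Returns all pairs (src, dst) where src reads the location dst writes.
    """
    writers = {}
    for inst in domain:
        writers.setdefault(write(inst), []).append(inst)
    return {(src, dst) for src in domain for dst in writers.get(read(src), [])}


template = reduction_template(DOMAIN, access, access)

# The quoted map: { S2[i0, i1] -> S2[o0, i1] : 0 <= i0 <= 98 and
#   0 <= i1 <= 99 - i0 and 0 <= o0 <= 98 and o0 <= 99 - i1 }
expected = {
    ((i0, i1), (o0, i1))
    for i0 in range(99)
    for i1 in range(100 - i0)
    for o0 in range(min(98, 99 - i1) + 1)
}

print("template matches the quoted map:", template == expected)
```

This only covers the single statement S2 in isolation. It says nothing about whether intersecting the template with the RAW/WAW dependences of the whole SCoP yields reduction dependences only, which is the open question in 2).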

> 3) At some point we also want to link reductions in different statements (at
> least I want to do that). Something along the lines of:
>     for i
>        sum +=
>        for j
>          sum +=

Right. This is a good example and I believe we should look into this.

>     This should be a completely parallel loop nest; at the moment we only get
> dependences for the innermost reduction though.
>     However, if we track memory accesses we can extend it easily later.
>
> I like the current solution and on polybench we do not add compile time with
> the hybrid method. What do you think?

As said above, the patch itself was very well written. I am just trying
to understand exactly why it is needed, so that we have proper test
cases justifying and highlighting the use cases. Without actually being
able to detect multi-reduction statements it seems hard to write a test
case where this dependency tracking is needed.

>>> Btw, this also seems to be the location where we could lift the
>>> requirement of input/output address being obviously identical.
> Yes we could as long as the one reduction per statement restriction stays. I
> lifted both restrictions at the same time but we need one more
> patch to do so. I'm fine with both waiting and lifting it now.

My feeling is that the next trivial step would actually be to lift the 
restriction on the input/output addresses. Even though it may not yet 
justify per-statement tracking it is the simplest way to force us to 
look at the memory locations during dependency computation.

>>>> 0001-Memory-access-based-dependency-analysis_v2.patch
>>>>
>>>>
>>>>   From 1eeba9bbd25fbcb3bc91a10f1ddb19bd5de72cce Mon Sep 17 00:00:00
>>>> 2001
>>>> From: Johannes Doerfert<jdoerfert at codeaurora.org>
>>>> Date: Thu, 19 Jun 2014 15:10:52 -0700
>>>> Subject: [PATCH 1/4] Memory access based dependency analysis
>>>>
>>>>    This fine grained dependecy analysis keeps track of the memory
>>>> accesses
>>>                         dependency
>>>
>>>>    which caused dependeces between statements. It will be used to model
>>>>    reduction dependences for statements with mutliple reduction
> accesses.
>>>
>>>                                                 multiple
>>>
>>>>    + Modified a test case to check wrapped and zipped dependences
>>>>    + Modiefed a test case, because this tracking is slightly better
>>>
>>>         Modified
> My bad...
>
>>>>    ; MEMORY: RAW dependences:
>>>> -; MEMORY:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, o2] :
>>>> i0 <= 35 and i0 >= 0 and i1 <= 35 and i1 >= 0 and i2 >= 0 and o2 >= 1
>>>> + i2 and o2 <= 35 and o2 >= 0 }
>>>> +; MEMORY:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, o2] :
>>>> +i0 <= 35 and i0 >= 0 and i1 <= 35 and i1 >= 0 and i2 >= 0 and o2 >= 1
>>>> ++ i2 and o2 <= 35 and o2 >= 0 and i2 <= 35 }
>>>
>>> Can you explain why we now get an additional constraint for this test
>>> case?
> When we track the memory accesses explicitly isl just keeps track of them. I
> don't know why but they look legit to me.

OK.

Tobias