[polly] Memory access based dependency analysis

Johannes Doerfert jdoerfert at codeaurora.org
Tue Jun 24 14:44:18 PDT 2014


As I said, I don't want to stick to the restriction that a statement contains
only one reduction. In such a case we can have a statement whose other,
non-reduction accesses induce exactly the same dependences as a reduction,
e.g.:
 
for (i) {
    A[i] = A[i] + A[i-1];
    A[i-1] = A[i] + A[i-2];
}

Printing analysis 'Polly - Calculate dependences' for region: 'for.cond =>
for.end' in function 'onlyA':
        RAW dependences:
                { Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 >= 0 and
i0 <= 1022 }
        WAR dependences:
                {  }
        WAW dependences:
                { Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 >= 0 and
i0 <= 1022 }
        Reduction dependences:
                {  }

   for (i)
      *sum += i;

Printing analysis 'Polly - Calculate dependences' for region: 'for.cond =>
for.end' in function 'onlySum':
        RAW dependences:
                {  }
        WAR dependences:
                {  }
        WAW dependences:
                {  }
        Reduction dependences:
                { Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 <= 1022
and i0 >= 0 }

  for (i) {
    A[i] = A[i] + A[i-1];
    A[i-1] = A[i] + A[i-2];
    *sum += i;
  }

Printing analysis 'Polly - Calculate dependences' for region: 'for.cond =>
for.end' in function 'AandSum':
        RAW dependences:
                { Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 <= 1022
and i0 >= 0 }
        WAR dependences:
                {  }
        WAW dependences:
                { Stmt_for_body[i0] -> Stmt_for_body[1 + i0] : i0 <= 1022
and i0 >= 0 }
        Reduction dependences:
                {  }


Is that a valid reason to use memory access based tracking?

--

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation


-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu
[mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Johannes Doerfert
Sent: Tuesday, June 24, 2014 10:01 AM
To: 'Tobias Grosser'
Cc: llvm-commits at cs.uiuc.edu; 'Sebastian Pop'
Subject: RE: [polly] Memory access based dependency analysis

Shortened the mail and added inline comments.

Attached are the updated versions.

--

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

-----Original Message-----
From: Tobias Grosser [mailto:tobias at grosser.es]
Sent: Tuesday, June 24, 2014 1:24 AM
To: Johannes Doerfert
Cc: llvm-commits at cs.uiuc.edu; 'Sebastian Pop'
Subject: Re: [polly] Memory access based dependency analysis

>> Regarding the approach you have taken, this is exactly what I proposed
>> and it seems OK, but it may cause increased compile time. Thanks for
>> figuring this out and proposing an option to reduce this problem.
>> Another option that came to my mind after discussing this with Sven
>> might be even faster: we could leave the dependence computation entirely
>> unmodified, but modify the set of possible reduction dependences.
>> Instead of reduction dependences like this:
>> 
>>        isl_set *AccDom = isl_map_domain(MA->getAccessRelation());
>>        isl_map *Identity =
>>            isl_map_from_domain_and_range(isl_set_copy(AccDom), AccDom);
>>        RED = isl_union_map_add_map(RED, Identity);
>> 
>> we compute a RED map containing only relations of instances that read
>> and write from/to the same memory location:
>> 
>> Identity = ReductionRead->getAccessRelation().apply_range(
>>     ReductionWrite->getAccessRelation().reverse())
>> 
>> This avoids both the duplication of dimensions in the dependence
>> computation and the per-access computations otherwise necessary.
>> What do you think of this approach? Is it semantically identical?
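
For reference, here is the quoted composition spelled out against the plain
isl C API; the wrapper function, its name and the ownership handling are
illustrative assumptions, not actual Polly code:

    #include <isl/map.h>
    #include <isl/union_map.h>

    /* Sketch only: Read and Write are access relations of the form
     * { Stmt[i] -> Mem[a] }.  Composing Read with the reversed Write
     * relates exactly those statement instances that read and write the
     * same memory cell; the result is added to the reduction dependence
     * template RED.                                                      */
    static __isl_give isl_union_map *
    addReductionTemplate(__isl_take isl_union_map *RED,
                         __isl_take isl_map *Read,
                         __isl_take isl_map *Write) {
      isl_map *Identity = isl_map_apply_range(Read, isl_map_reverse(Write));
      return isl_union_map_add_map(RED, Identity);
    }
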
I am not completely sure we can express everything as easily with this
approach as we can at the moment. While it looks good for reduction
dependences in isolation (i.e., we can use this method to create the
reduction dependency templates), we need to think about:

1) Privatization dependences need to be computed; how can we do this without
   losing generality in the new setting? This is connected to 2) and the
   dependency separation discussion there.
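
To make 1) a bit more concrete, here is a rough sketch of one way
privatization dependences could be derived; this is only an illustrative
assumption, not a description of what Polly actually computes:

    #include <isl/union_map.h>

    /* Illustrative only: compose the transitive closure of the reduction
     * dependences (RED) with the remaining dependences (Deps) in both
     * directions, so every access that depends on the final reduced value
     * also depends on the partial updates (and vice versa).              */
    static __isl_give isl_union_map *
    privatizationDeps(__isl_keep isl_union_map *RED,
                      __isl_keep isl_union_map *Deps) {
      isl_union_map *TC_RED =
          isl_union_map_transitive_closure(isl_union_map_copy(RED), NULL);
      isl_union_map *Before = isl_union_map_apply_range(
          isl_union_map_copy(TC_RED), isl_union_map_copy(Deps));
      isl_union_map *After = isl_union_map_apply_range(
          isl_union_map_copy(Deps), TC_RED);
      return isl_union_map_union(Before, After);
    }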

2) How do we remove the reduction dependences then? So far we know that the
   identity map we initialize the reduction dependences with cannot
   intersect with non-reduction dependences (first because of the detection
   limitations, later because we are in different spaces,
   wrapped vs. unwrapped). If we lift the limitations and use only one
   space, we need to make sure we separate the dependences correctly.
   Possibilities which come to mind:
     o Run the dependency analysis twice, once for reduction and once for
       non-reduction dependences. However, that is what I did in my initial
       patch and it won't get much nicer/shorter than that.
     o Perform the dependency analysis once and afterwards intersect the
       RAW/WAW dependences with our reduction dependency template (same as
       now). However, the Identity we compute above looks to me like more
       than our current transitive closure (see below). My problem is that
       we have a bunch of RAW/WAW dependences and the templates for
       reduction dependences, but we cannot be sure the intersection will
       contain only reduction dependences. I don't have an example right now
       (if you want one I'll think about one), but the idea is to have both
       reduction and non-reduction self dependences in one statement.
   Can you think of another way to split the dependences correctly without
   the memory access tracking (and without limiting the reduction
   detection)?

       void f(int *sum) {
         int i, j;
         for (i = 0; i < 99; i++) {
    S1:    sum[i + 1] += 42;
           for (j = i; j < 100; j++)
    S2:      sum[i - j] += i * j;
    S3:    sum[i - 1] += 7;
         }
       }

    ReadAccess :=       [Reduction like: 1 RT: 1]
        { Stmt_S2[i0, i1] -> MemRef_sum[-i1] };
    MustWriteAccess :=  [Reduction like: 1 RT: 1]
        { Stmt_S2[i0, i1] -> MemRef_sum[-i1] };

    Identity as you described it, intersected with the statement domain (on
    both the domain and the range):
    { Stmt_S2[i0, i1] -> Stmt_S2[o0, i1] : i0 >= 0 and i0 <= 98 and i1 >= 0
and i1 <= 99 - i0 and o0 >= 0 and o0 <= 98 and o0 <= 99 - i1 }


3) At some point we also want to link reductions in different statements (at
   least I want to do that). Something along the lines of:
      for i
        sum +=
        for j
          sum +=
   should be a completely parallel loop nest; at the moment we only get
   dependences for the innermost reduction though. However, if we track
   memory accesses we can extend this easily later.
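
A concrete (compilable) variant of that sketch, with the right-hand sides
and the loop bound N made up purely for illustration:

    /* Both statements only update *sum, so once reductions in different
     * statements are linked the whole nest should carry only reduction
     * dependences and be completely parallel.                            */
    void nested_sum(int *sum, int N) {
      for (int i = 0; i < N; i++) {
        *sum += i;                 /* outer reduction statement */
        for (int j = 0; j < N; j++)
          *sum += i * j;           /* inner reduction statement */
      }
    }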

I like the current solution, and with the hybrid method we do not add compile
time on Polybench. What do you think?


>> Btw, this also seems to be the location where we could lift the
>> requirement of the input/output address being obviously identical.
Yes, we could, as long as the one-reduction-per-statement restriction stays.
I lifted both restrictions at the same time, but we need one more patch to do
so. I'm fine with either waiting or lifting it now.

>> > 0001-Memory-access-based-dependency-analysis_v2.patch
>> >
>> >
>> >  From 1eeba9bbd25fbcb3bc91a10f1ddb19bd5de72cce Mon Sep 17 00:00:00
>> > 2001
>> > From: Johannes Doerfert<jdoerfert at codeaurora.org>
>> > Date: Thu, 19 Jun 2014 15:10:52 -0700
>> > Subject: [PATCH 1/4] Memory access based dependency analysis
>> >
>> >   This fine grained dependecy analysis keeps track of the memory accesses
>>                        dependency
>> 
>> >   which caused dependeces between statements. It will be used to model
>> >   reduction dependences for statements with mutliple reduction accesses.
>> 
>>                                                multiple
>> 
>> >   + Modified a test case to check wrapped and zipped dependences
>> >   + Modiefed a test case, because this tracking is slightly better
>> 
>>        Modified
My bad...

>> >   ; MEMORY: RAW dependences:
>> > -; MEMORY:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, o2] : i0 <= 35 and i0 >= 0 and i1 <= 35 and i1 >= 0 and i2 >= 0 and o2 >= 1 + i2 and o2 <= 35 and o2 >= 0 }
>> > +; MEMORY:  { Stmt_do_body2[i0, i1, i2] -> Stmt_do_body2[i0, i1, o2] : i0 <= 35 and i0 >= 0 and i1 <= 35 and i1 >= 0 and i2 >= 0 and o2 >= 1 + i2 and o2 <= 35 and o2 >= 0 and i2 <= 35 }
>> 
>> Can you explain why we now get an additional constraint for this test
>> case?
When we track the memory accesses explicitly, isl just keeps the additional
constraints around. I don't know why, but they look legitimate to me.

>> > @@ -1,14 +1,46 @@
>> > -; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
>> > +; RUN: opt %loadPolly -polly-dependences -analyze -debug-only=polly-dependence 2> %t.2 > %t.1 < %s
>> > +; RUN: FileCheck %s --check-prefix=DEPENDENCES --input-file %t.1
>> > +; RUN: FileCheck %s --check-prefix=WRAPPED_DEPENDENCES --input-file %t.2
>> 
>> You cannot use -debug-only in a test case, as the test case will fail
>> for non-debug builds. Also, having test cases that write files is not a
>> good idea as this causes noise in the test directory.
>> 
>> There are three options:
>> 
>> 1) Mark the test case as 'REQUIRES: asserts' and put it into a separate
>>    file
>> 2) Only check the -analyze output
>> 3) Add more output to the -analyze output
>> 
>> I don't have a strong opinion here. Choose what you believe works best.
I'll go with 1).



