[Polly] Reduction dependences [V3]
    Johannes Doerfert 
    jdoerfert at codeaurora.org
       
    Thu Jun 19 11:02:12 PDT 2014
    
    
  
So,... read the new patch and the comments and tell me to commit ;)
>> Maybe, just make it two test cases. One with parameters and one just the
simple case. The general rule is that one test case tests one specific
thing. So we should not test parameters in the simple reduction test case.
This is the easiest way to have a parameter  and actual privatization
dependences. I would like to keep it that way but I renamed the file to make
the intention clear.
>> I am now very comfortable with the overall patch. A couple of minor
comments are left, but we should be able to address them quickly. The only
missing piece is to not look at the openmp flags, but instead to include the
reduction dependences in the parallelism test.
Ok let's see:
>> > +#include "polly/CodeGen/CodeGeneration.h"
>> 
>> As written in my earlier comment, I don't think we need this (the only
reason this was needed was to include the command line options from there,
no?)
Gone.
>> > //===-----------------------------------------------------------------
>> > -----===//
>> > +
>> Unrelated change.
Gone.
>> > +  TransClosure = 
>> > +isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
>> 
>> Can you add a comment that makes clear why it is save to over
approximate?
Done.
 
>> > +  PrivRAW = isl_union_map_apply_range(isl_union_map_copy(RAW),
>> > +                                      
>> > + isl_union_map_copy(TransClosure));
>> > +  PrivRAW = isl_union_map_union(
>> > +      PrivRAW,
isl_union_map_apply_range(isl_union_map_copy(TransClosure),
>> > +                                         isl_union_map_copy(RAW)));  
>> > + PrivWAW = isl_union_map_apply_range(isl_union_map_copy(WAW),
>> > +                                      
>> > + isl_union_map_copy(TransClosure));
>> > +  PrivWAW = isl_union_map_union(
>> > +      PrivWAW,
isl_union_map_apply_range(isl_union_map_copy(TransClosure),
>> > +                                         isl_union_map_copy(WAW)));  
>> > + PrivWAR = isl_union_map_apply_range(isl_union_map_copy(WAR),
>> > +                                      
>> > + isl_union_map_copy(TransClosure));
>> > +  PrivWAR = isl_union_map_union(
>> > +      PrivWAR,
isl_union_map_apply_range(isl_union_map_copy(TransClosure),
>> > +                                         isl_union_map_copy(WAR)));  
>> > + isl_union_map_free(TransClosure);
>> > +
>> > +  RAW = isl_union_map_coalesce(isl_union_map_union(RAW, PrivRAW));
>> > +  WAW = isl_union_map_coalesce(isl_union_map_union(WAW, PrivWAW));
>> > +  WAR = isl_union_map_coalesce(isl_union_map_union(WAR, PrivWAR)); }
>> 
>> I like the new algorithm. It looks clean and simple.
I put it into a loop now, this copy and paste code over multiple lines makes
my dizzy sometimes.
>> >   void Dependences::calculateDependences(Scop &S) {
>> >     isl_union_map *Read, *Write, *MayWrite, *Schedule;
>> >
>> > @@ -114,7 +181,7 @@ void Dependences::calculateDependences(Scop &S) {
>> >
>> >     // The pointers below will be set by the subsequent calls to
>> >     // isl_union_map_compute_flow.
>> > -  RAW = WAW = WAR = nullptr;
>> > +  RAW = WAW = WAR = RED = nullptr;
>> >
>> >     if (OptAnalysisType == VALUE_BASED_ANALYSIS) {
>> >       isl_union_map_compute_flow(
>> > @@ -169,6 +236,56 @@ void Dependences::calculateDependences(Scop &S) {
>> >     isl_ctx_reset_operations(S.getIslCtx());
>> >     isl_ctx_set_max_operations(S.getIslCtx(), MaxOpsOld);
>> >
>> > +  // FIXME: Code generation cannot produce privatization code yet,
thus we need
>> > +  //        to check if we will produce non-parallel code anyway. If
so, we do
>> > +  //        not need privatization, if not we turn off reduction
support.
>> > +  if (PollyVectorizerChoice == VECTORIZER_NONE && !EnableOpenMP) {
>> 
>> If you want to check that we neither introduce vector parallellism nor
OpenMP parallelism, you meant || instead of &&, no?
>> 
>> In any case, I don't think we should check those flags. Instead, we
should include the reduction dependences in all parallelism checks (and
exclude them in case we know we can perform privatization).
Done.
>> > +    // To handle reduction dependences we proceed as follows:
>> > +    // 1) Aggregate all possible reduction dependences, namely all
self
>> > +    //    dependences on reduction like statements.
>> 
>> Good.
>> 
>> > +    // 2) Intersect them with the actual RAW & WAW dependences to the
get the
>> > +    //    actual reduction dependences. This will ensure the
load/store memory
>> > +    //    addresses were __identical__ in the two iterations of the
statement.
>> 
>> Good.
>> 
>> > +    // 3) Relax the original RAW and WAW dependences by substracting
the actual
>> > +    //    reduction dependences. Binary reductions (sum += A[i]) cause
both, and
>> > +    //    the same, RAW and WAW dependences, but unary reductions
(last = A[i])
>> > +    //    only WAW.
>> 
>> So an unary reduction is a statement that only has either an incoming or
an outcoming reduction dependency, but not both? 
No, it just has only WAW dependences because it does not read the reduction
location (btw. we do not detect them yet)
>>  Is the first reduction in a reduction chain also an unary reduction?
No. A unary reduction is simply a loop carried write to the same location
without anyone accessing the location for 2 or more iterations.
>> I just looked at the dependences in:
>> 
>> reduction_simple_privatization_deps.ll
>> 
>>   RAW dependences:
>>     [N] -> { Stmt_S0[] -> Stmt_S1[0] : N >= 11; Stmt_S1[1023] ->
Stmt_S2[]
>> : N >= 11; Stmt_S1[i0] -> Stmt_S1[1 + i0] : i0 >= 0 and i0 <= 1022 and N
>= 11 }
>>   WAR dependences:
>>     [N] -> {  }
>>   WAW dependences:
>>     [N] -> { Stmt_S0[] -> Stmt_S1[0] : N >= 11; Stmt_S1[1023] ->
Stmt_S2[]
>> : N >= 11; Stmt_S1[i0] -> Stmt_S1[1 + i0] : i0 >= 0 and i0 <= 1022 and N
>= 11 }
>> 
>> It seems the WAR/WAW dependences are identical.
For binary reductions they are identical all the time.
>>  Does this mean there is no unary reduction here?
Yes, since unary reductions cause only WAW dependences (there is simply no
load).
>>Otherwise, according to your comment, we would see a WAW dependency that
is not in the RAW set, no?
YES.
>> >   OpenMP("enable-polly-openmp", cl::desc("Generate OpenMP parallel
code"),
>> >          cl::value_desc("OpenMP code generation enabled if true"),
>> > -       cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));
>> > +       cl::location(EnableOpenMP), cl::init(false), cl::ZeroOrMore,
>> > +       cl::cat(PollyCategory));
>> 
>> As I said earlier, I don't think that we want to look at this flag when
computing dependences.
Gone.
>> Also, I did not realize we can use the CHECK-DAG instruction to not 
>> break in case of possible reordering in isl's output.
We can,.. and I like it :D ... I would even like to decompose the
dependences even more (split each "and") but that is script work :D
>> > --- /dev/null
>> > +++ b/test/Dependences/reduction_only_reduction_like_access.ll
>> > @@ -0,0 +1,38 @@
>> > +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
>> > +;
>> > +; The statement is reduction like but should not yield any reduction
dependences
>> 
>> The statement is currently not even reduction like. (We changed it
earlier)
Added a fixme and changed it to "almost reduction like".
>> > +;
>> > +; CHECK:  Reduction dependences:
>> > +; CHECK:    {  }
>> > +;
>> > +; void f(int *sum) {
>> > +;   for (int i = 0; i < 100; i++)
>> > +;     sum[i] = sum[99-i] + i;
>> > +; }
>> 
>> Even if we would detect this statement as reduction like, what is the 
>> reason that there are no reduction dependences?
>> 
>> i = 50 and i = 49 write to the same memory location. So there should be 
>> a reduction dependency, no?
We are limited to binary reductions at the moment and this is not one.
There are never at least two iterations which read __and__ write to the same
location,
Thus we won't find the RAW and WAW dependences of a reduction, thus we don't
find
Reduction dependences.
>> > +++ b/test/Dependences/reduction_only_reduction_like_access_2.ll
>> > @@ -0,0 +1,56 @@
>> > +; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
>> > +;
>> > +; The statement is reduction like but should not yield any reduction
dependences
>> > +; FIXME: This is only true as long as we restrict ourselfs to the
exact same
>> ourselves
>> 
>> > +;        access pointer for the load and store.
>> 
>> Again, this statement is currently also not detected as reduction like. 
>> I would formulate this like:
I changed it again to "is only almost reduction like", can we go with that
for now? 
 
>> We do not find any reduction dependences in the following loop, as the
>> different index expressions in the read and write access prevent us from
>> detecting this statement as reduction like.
>>
>> FIXME: In case we lift the requirement on identical subscripts in input
>>         and output accesses, this test case will have reduction
>>         dependences for j = 0.
There will be even more, not only for j = 0 ;),.. but lets worry about that
later.
>> > +;  CHECK: RAW dependences:
>> > +;  CHECK-DAG:  Stmt_S3[i0] -> Stmt_S2[1 + i0, o1] : i0 <= 97 and i0 >=
0 and o1 <= 99 and o1 >= 0
>> > +;  CHECK-DAG:  Stmt_S1[i0] -> Stmt_S3[i0] : i0 >= 0 and i0 <= 98
>> > +;  CHECK: WAR dependences:
>> > +;  CHECK:   {  }
>> > +;  CHECK: WAW dependences:
>> > +;  CHECK-DAG:  Stmt_S3[i0] -> Stmt_S2[1 + i0, o1] : i0 <= 97 and i0 >=
0 and o1 <= 99 and o1 >= 0
>> > +;  CHECK-DAG:  Stmt_S1[i0] -> Stmt_S3[i0] : i0 >= 0 and i0 <= 98
>> > +;  CHECK: Reduction dependences:
>> > +;  CHECK:   { Stmt_S2[i0, i1] -> Stmt_S2[i0, 1 + i1] : i0 <= 98 and i0
>= 0 and i1 <= 98 and i1 >= 0 }
>> 
>> Maybe add a comment, that in this case we have privatization dependences 
>> from a textually later statement to a textually earlier,
>> but the dependences still go forward in time. This is again a nice test 
>> case.
Done.
>> > +;    void f(int *sum) {
>> > +;      int i, j;
>> > +;      for (i = 0; i < 99; i++) {
>> > +; S1:    sum[i + 1] += 42;
>> > +;        for (j = 0; j < 100; j++)
>> > +; S2:      sum[i] += i * j;
>> > +; S3:    sum[i + 1] += 7;
>> > +;      }
>> > +;    }
>> 
>> The other test cases are fine. We got now really good test coverage. I 
>> am very confident about this.
I agree.
--
Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Model-statement-wise-reduction-dependences_v5.patch
Type: application/octet-stream
Size: 48603 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140619/214e9e34/attachment.obj>
    
    
More information about the llvm-commits
mailing list