Reductions [patch 2 & 3]

Fri Jun 13 17:10:16 PDT 2014

Comments for the second and third patch.

There is only one minor change I would make to the second patch (remove
intersection with RAW deps), thus I didn't attach a new version.
However, there are a couple of open questions (namely the naming and test
cases). You have to tell me what to do after you read my comments.

The same holds for the third one (I would add the missing comment)  and I
don't know where to put the tests.

--

Johannes Doerfert
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

>> >  From 836c1d0ed5adac6f9d8b258440fa96183ebc0ba9 Mon Sep 17 00:00:00
>> > 2001
>> > From: Johannes Doerfert<jdoerfert at codeaurora.org>
>> > Date: Thu, 12 Jun 2014 12:15:08 -0700
>> > Subject: [PATCH 2/3] Model statement wise reduction dependences
>> >
>> > + Collect and print reduction dependences Added cmd option to 
>> > + remove reduction dependences
>> >
>> > - There is no privatization dependences yet, thus with DCE even
compilation
>> >    for sequential execution might result in wrong code.
>> > ---
>> >   include/polly/Dependences.h                        |  3 ++
>> >   include/polly/ScopInfo.h                           |  2 +
>> >   lib/Analysis/Dependences.cpp                       | 37
++++++++++++++++-
>> >   lib/Analysis/ScopInfo.cpp                          |  3 +-
>> >   .../simplest_binary_only_reduction_like.ll         | 47
++++++++++++++++++++++
>> >   .../simplest_binary_reduction_w_constant.ll        | 12 +++++-
>> >   test/Reductions/simplest_binary_reduction_w_iv.ll  | 12 +++++-
>> >   7 files changed, 112 insertions(+), 4 deletions(-)
>> >   create mode 100644
>> > test/Reductions/simplest_binary_only_reduction_like.ll
>> >
>> > diff --git a/include/polly/Dependences.h 
>> > b/include/polly/Dependences.h index 9a704a8..d44a905 100755
>> > --- a/include/polly/Dependences.h
>> > +++ b/include/polly/Dependences.h
>> > @@ -105,6 +105,9 @@ private:
>> >     isl_union_map *WAR;
>> >     isl_union_map *WAW;
>> >
>> > +  /// @brief The map of reduction dependences  isl_union_map *RED 
>> > + = nullptr;
>> > +
>> >     /// @brief Collect information about the SCoP.
>> >     void collectInfo(Scop &S, isl_union_map **Read, isl_union_map
**Write,
>> >                      isl_union_map **MayWrite, isl_union_map 
>> > **Schedule); diff --git a/include/polly/ScopInfo.h 
>> > b/include/polly/ScopInfo.h index f80cd42..63fe576 100644
>> > --- a/include/polly/ScopInfo.h
>> > +++ b/include/polly/ScopInfo.h
>> > @@ -58,6 +58,8 @@ class TempScop;
>> >   class SCEVAffFunc;
>> >   class Comparison;
>> >
>> > +extern bool EnableReductions;
>> > +
>> >
//===----------------------------------------------------------------------=
==//
>> >   /// @brief Represent memory accesses in statements.
>> >   class MemoryAccess {
>> > diff --git a/lib/Analysis/Dependences.cpp 
>> > b/lib/Analysis/Dependences.cpp index 0a33969..f54c2ee 100644
>> > --- a/lib/Analysis/Dependences.cpp
>> > +++ b/lib/Analysis/Dependences.cpp
>> > @@ -51,6 +51,13 @@ LegalityCheckDisabled("disable-polly-legality",
>> >                         cl::desc("Disable polly legality check"),
cl::Hidden,
>> >                         cl::init(false), cl::ZeroOrMore, 
>> > cl::cat(PollyCategory));
>> >
>> > +static cl::opt<bool, true>
>> > +XEnableReductions("polly-reductions",
>> > +                  cl::desc("Relax the reduction dependences before
scheduling"),
>> > +                  cl::Hidden, cl::location(polly::EnableReductions),
>> > +                  cl::init(false), cl::ZeroOrMore, 
>> > +cl::cat(PollyCategory)); bool polly::EnableReductions = false;
>> 
>> Should we enable this by default? Do we even need an option?
Without an updated code generation I would like to keep it under a flag.

>>  > @@ -310,14 +340,19 @@ void Dependences::printScop(raw_ostream &OS)  
>> >
const {
>>  >    printDependencyMap(OS, WAR);
>>  >    OS << "\tWAW dependences:\n\t\t";
>>  >    printDependencyMap(OS, WAW);
>>  > +  if (EnableReductions) {
>>  > +    OS << "\tReduction dependences:\n\t\t";
>>  > +    printDependencyMap(OS, RED);
>>  > +  }
>>  >  }
>> 
>> >   enum AnalysisType { VALUE_BASED_ANALYSIS, MEMORY_BASED_ANALYSIS 
>> > };
>> >
>> >   static cl::opt<enum AnalysisType> OptAnalysisType( @@ -169,6 
>> > +176,29 @@ void Dependences::calculateDependences(Scop &S) {
>> >     isl_ctx_reset_operations(S.getIslCtx());
>> >     isl_ctx_set_max_operations(S.getIslCtx(), MaxOpsOld);
>> >
>> > +  if (EnableReductions) {
>> > +    // Aggregate all possible reduction dependences (self 
>> > + dependences
on
>> > +    // reduction like statements). Then intersect them with the 
>> > + actual
RAW and
>> > +    // WAW dependences the get the actual reduction dependences.
>> 
>> Can there be reduction WAW dependences? Should not all such 
>> dependences
be RAW?
There are actually "more" WAW then RAW dependences. Even for unary
reductions there will always be WAW dependences (but then no RAWs).
It should not be necessary to intersect them with RAW, my bad.

>>  > The last step
>> > +    // is to relax the original RAW and WAW dependences as well as 
>> > + to
add
>> > +    // the privatization dependences [FIXME: which are missing 
>> > + here]
>> 
>> What are privatization dependences?
See next comment.

>> Also, we need to add dependences between S1 and all S2 as well as S2 
>> and
all S3, no? Any idea how to do this?
>> 
>> S1: A[0] = 0;
>> 
>>      for (i)
>> S2:    A[0 += i;
>> 
>> S3: A[0] *= 10;
Yes, I call/called them privatization dependences. I know how to do this and
a patch is ready. However it is easier to do it on memory access level, thus
it is included in the next group of patches.

>> > +    RED = isl_union_map_empty(isl_union_map_get_space(RAW));
>> > +    for (auto *Stmt : S) {
>> > +      if (!Stmt->isReductionLike())
>> > +        continue;
>> > +      isl_set *Domain = Stmt->getDomain();
>> > +      isl_map *Identity =
>> > +          isl_map_from_domain_and_range(isl_set_copy(Domain), Domain);
>> > +      RED = isl_union_map_add_map(RED, Identity);
>> > +    }
>> > +
>> > +    RED = isl_union_map_intersect(RED, isl_union_map_copy(RAW));
>> > +    RED = isl_union_map_intersect(RED, isl_union_map_copy(WAW));
>> > +
>> > +    RAW = isl_union_map_subtract(RAW, isl_union_map_copy(RED));
>> > +    WAW = isl_union_map_subtract(WAW, isl_union_map_copy(RED));  }
>> > +
>> >     DEBUG(printScop(dbgs()));
>> >   }
>> >
>> > @@ -310,14 +340,19 @@ void Dependences::printScop(raw_ostream &OS)
const {
>> >     printDependencyMap(OS, WAR);
>> >     OS << "\tWAW dependences:\n\t\t";
>> >     printDependencyMap(OS, WAW);
>> > +  if (EnableReductions) {
>> > +    OS << "\tReduction dependences:\n\t\t";
>> > +    printDependencyMap(OS, RED);
>> > +  }
>> >   }
>> >
>> >   void Dependences::releaseMemory() {
>> >     isl_union_map_free(RAW);
>> >     isl_union_map_free(WAR);
>> >     isl_union_map_free(WAW);
>> > +  isl_union_map_free(RED);
>> >
>> > -  RAW = WAR = WAW = nullptr;
>> > +  RED = RAW = WAR = WAW = nullptr;
>> >   }
>> >
>> >   isl_union_map *Dependences::getDependences(int Kinds) { diff 
>> > --git a/lib/Analysis/ScopInfo.cpp b/lib/Analysis/ScopInfo.cpp index
>> > 48b6fed..441a961 100644
>> > --- a/lib/Analysis/ScopInfo.cpp
>> > +++ b/lib/Analysis/ScopInfo.cpp
>> > @@ -744,7 +744,8 @@ ScopStmt::ScopStmt(Scop &parent, TempScop
&tempScop, const Region &CurRegion,
>> >     Domain = buildDomain(tempScop, CurRegion);
>> >     buildScattering(Scatter);
>> >     buildAccesses(tempScop, CurRegion);
>> > -  checkForReduction();
>> > +  if (EnableReductions)
>> > +    checkForReduction();
>> >   }
>> >
>> >   void ScopStmt::checkForReduction() { diff --git 
>> > a/test/Reductions/simplest_binary_only_reduction_like.ll
>> > b/test/Reductions/simplest_binary_only_reduction_like.ll
>> > new file mode 100644
>> > index 0000000..528d847
>> > --- /dev/null
>> > +++ b/test/Reductions/simplest_binary_only_reduction_like.ll
>> 
>> Can we add those tests to test/Dependences?
I'm confused. I was going to (as you see already here) to reuse the
reduction tests for a couple of runs.
This would allow quite a bit of testing with less test cases. Furthermore,
when we tailor for different kinds of Reductions it is interesting to see
not only the detection but also the dependences and effects on the schedule.
I moved the first test cases to ScopInfo (as you said) and now I can move
them to test/Dependences by duplicating them, however I do not see the
benefit of that.

You have to tell me how you want to arrange these new test cases, duplicate
them or use them only for one kind of test.
In the latter case, please also tell me how to decide for which.

>> > @@ -0,0 +1,47 @@
>> > +; RUN: opt %loadPolly -polly-scops -polly-reductions -analyze 
>> > +-basicaa < %s | FileCheck %s --check-prefix=DETECTION ; RUN: opt 
>> > +%loadPolly -polly-dependences -polly-reductions -analyze -basicaa 
>> > +< %s | FileCheck %s --check-prefix=DEPENDENCES ; ; DETECTION: 
>> > +Reduction
>> > +like: 1 ; ; DEPENDENCES:  Reduction dependences:
>> > +; DEPENDENCES:    {  }
>> > +;
>> > +; int f(int sum0, int sum1) {
>> > +;   int i;
>> > +;   for (int i = 0; i <= 100; i++) {
>> > +;     sum0 = sum1 + 3;
>> > +;   }
>> > +;   return sum0;
>> > +; }
>> > +;
>> > +; ModuleID = 'test/Reductions/simplest_binary_reduction.ll'
>> > +target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
>> > +
>> > +; Function Attrs: nounwind
>> > +define i32 @f(i32 %sum0, i32 %sum1) {
>> > +entry:
>> > +  %sum0.reg2mem = alloca i32
>> > +  %sum1.reg2mem = alloca i32
>> > +  br label %entry.split1
>> > +
>> > +entry.split1:                                     ; preds = %entry
>> > +  br label %entry.split
>> > +
>> > +entry.split:                                      ; preds =
%entry.split1
>> > +  store i32 %sum0, i32* %sum0.reg2mem  store i32 %sum1, i32* 
>> > + %sum1.reg2mem  br label %for.cond
>> > +
>> > +for.cond:                                         ; preds = %for.cond,
%entry.split
>> > +  %i1.0 = phi i32 [ 0, %entry.split ], [ %inc, %for.cond ]  
>> > + %sum1.reload = load i32* %sum1.reg2mem  %add = add nsw i32 
>> > + %sum1.reload, 3  %inc = add nsw i32 %i1.0, 1  store i32 %add, 
>> > + i32* %sum0.reg2mem  %cmp = icmp slt i32 %i1.0, 100  br i1 %cmp, 
>> > + label %for.cond, label %for.end
>> > +
>> > +for.end:                                          ; preds = %for.cond
>> > +  %sum0.reload.2 = load i32* %sum0.reg2mem
>> > +  ret i32 %sum0.reload.2
>> > +}
>> > diff --git 
>> > a/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > b/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > index 7ab9d47..c59e84f 100644
>> > --- a/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > +++ b/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > @@ -1,7 +1,17 @@
>> > -; RUN: opt %loadPolly -polly-scops -analyze -basicaa < %s | 
>> > FileCheck %s --check-prefix=DETECTION
>> > +; RUN: opt %loadPolly -polly-scops -polly-reductions -analyze 
>> > +-basicaa < %s | FileCheck %s --check-prefix=DETECTION ; RUN: opt 
>> > +%loadPolly -polly-dependences -polly-reductions -analyze -basicaa 
>> > +< %s | FileCheck %s --check-prefix=DEPENDENCES
>> >   ;
>> >   ; DETECTION: Reduction like: 1
>> >   ;
>> > +; DEPENDENCES:  RAW dependences:
>> > +; DEPENDENCES:    {  }
>> > +; DEPENDENCES:  WAR dependences:
>> > +; DEPENDENCES:    {  }
>> > +; DEPENDENCES:  WAW dependences:
>> > +; DEPENDENCES:    {  }
>> > +; DEPENDENCES:  Reduction dependences:
>> > +; DEPENDENCES:    { Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0] : i0 >=
0 and i0 <= 99 }
>> > +;
>> >   ; int f(int sum) {
>> >   ;   int i;
>> >   ;   for (int i = 0; i <= 100; i++) {
>> > diff --git a/test/Reductions/simplest_binary_reduction_w_iv.ll
>> > b/test/Reductions/simplest_binary_reduction_w_iv.ll
>> > index 6c49988..679a629 100644
>> > --- a/test/Reductions/simplest_binary_reduction_w_iv.ll
>> > +++ b/test/Reductions/simplest_binary_reduction_w_iv.ll
>> > @@ -1,7 +1,17 @@
>> > -; RUN: opt %loadPolly -polly-scops -analyze -basicaa < %s | 
>> > FileCheck %s --check-prefix=DETECTION
>> > +; RUN: opt %loadPolly -polly-scops -polly-reductions -analyze 
>> > +-basicaa < %s | FileCheck %s --check-prefix=DETECTION ; RUN: opt 
>> > +%loadPolly -polly-dependences -polly-reductions -analyze -basicaa 
>> > +< %s | FileCheck %s --check-prefix=DEPENDENCES
>> >   ;
>> >   ; DETECTION: Reduction like: 1
>> >   ;
>> > +; DEPENDENCES:  RAW dependences:
>> > +; DEPENDENCES:    {  }
>> > +; DEPENDENCES:  WAR dependences:
>> > +; DEPENDENCES:    {  }
>> > +; DEPENDENCES:  WAW dependences:
>> > +; DEPENDENCES:    {  }
>> > +; DEPENDENCES:  Reduction dependences:
>> > +; DEPENDENCES:    { Stmt_for_cond[i0] -> Stmt_for_cond[1 + i0] : i0 >=
0 and i0 <= 99 }
>> > +;
>> >   ; int f(int sum) {
>> >   ;   int i;
>> >   ;   for (int i = 0; i <= 100; i++) {
>> > -- 1.8.2.1
>> 
>> Adding such a test case would be nice as well:
>> 
>> S1: A[0] = 0;
>> 
>>      for (i)
>> S2:    A[0 += i;
>> 
>> S3: A[0] *= 10;
I was going to send the next set of patches (including one which adds
privatization dependences) as soon as we discussed (and accepted) these.
There will be a lot more test cases including ones similar to the one above.

>> >   class IslAstInfo : public ScopPass { diff --git 
>> > a/include/polly/Dependences.h b/include/polly/Dependences.h index 
>> > d44a905..00b18ee 100755
>> > --- a/include/polly/Dependences.h
>> > +++ b/include/polly/Dependences.h
>> > @@ -56,6 +56,9 @@ public:
>> >       // Write after write
>> >       TYPE_WAW = 0x4,
>> >
>> > +    // Reduction dependences
>> > +    TYPE_RED = 0x8,
>> > +
>> 
>> This should be part of the previous patch, no?
There was no need to do that, this patch introduces the first "user" of the
reduction dependences. Do I have to move it?

>> >       // All dependences
>> >       TYPE_ALL = (TYPE_WAR | TYPE_RAW | TYPE_WAW)
>> 
>> So reduction dependences are not part of all dependences?
Yes, so far. 
>>  I can see that we do not want to look at them by default, but 
>> calling
something all and not including them is surprising.
I think it still fits:
These are ALL dependences we have to fulfill during the scheduling and we
have to consider for parallelism checks.
These are NOT ALL we need when we check if we need to privatize (and because
privatization dependences are currently missing).

>> > @@ -151,22 +156,17 @@ static void freeIslAstUser(void *Ptr) {
>> >   // dimension if it is a subset of a map with equal values for the
current
>> >   // dimension.
>> >   static bool astScheduleDimIsParallel(__isl_keep isl_ast_build *Build,
>> > -                                     Dependences *D) {
>> > -  isl_union_map *Schedule, *Deps;
>> > +                                     __isl_take isl_union_map 
>> > + *Deps) {  isl_union_map *Schedule;
>> >     isl_map *ScheduleDeps, *Test;
>> >     isl_space *ScheduleSpace;
>> >     unsigned Dimension, IsParallel;
>> >
>> > -  if (!D->hasValidDependences()) {
>> > -    return false;
>> > -  }
>> > -
>> >     Schedule = isl_ast_build_get_schedule(Build);
>> >     ScheduleSpace = isl_ast_build_get_schedule_space(Build);
>> >
>> >     Dimension = isl_space_dim(ScheduleSpace, isl_dim_out) - 1;
>> >
>> > -  Deps = D->getDependences(Dependences::TYPE_ALL);
>> >     Deps = isl_union_map_apply_range(Deps,
isl_union_map_copy(Schedule));
>> >     Deps = isl_union_map_apply_domain(Deps, Schedule);
>> >
>> > @@ -192,6 +192,20 @@ static bool 
>> > astScheduleDimIsParallel(__isl_keep
isl_ast_build *Build,
>> >     return IsParallel;
>> >   }
>> >
>> > +static bool astScheduleDimIsParallel(__isl_keep isl_ast_build *Build,
>> > +                                     Dependences *D, bool
>> > +&IsReductionParallel) {
>> 
>> Maybe a brief comment on top of the function.
I'll copy the other one :)

>> > +  if (!D->hasValidDependences())
>> > +    return false;
>> > +
>> > +  isl_union_map *Deps = D->getDependences(Dependences::TYPE_ALL);
>> > +  if (!astScheduleDimIsParallel(Build, Deps))
>> > +    return false;
>> > +  isl_union_map *RedDeps = 
>> > +D->getDependences(Dependences::TYPE_RED);
>> > +  if (!astScheduleDimIsParallel(Build, RedDeps))
>> > +    IsReductionParallel = true;
>> > +  return true;
>> > +}
>> 
>> Did you think of using C++11 tuples for the two return values? (We 
>> need
to check if the visual studio version LLVM targets supports them).
I don't know about C++11 tuples, but where do you want to use them anyway?

>> > diff --git 
>> > a/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > b/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > index c59e84f..722f7b3 100644
>> > --- a/test/Reductions/simplest_binary_reduction_w_constant.ll
>> > +++ b/test/Reductions/simplest_binary_reduction_w_constant.ll
>> 
>> Make these test cases in the ast generator. We may duplicate certain
codes, but on the other side we can have minimal test cases for the
different use cases. Here we may e.g. want to later add more complex loop
nests of scheduling choices which are not relevant to ScopInfo.
These partially answers my comment about test cases above, but I would like
to know if I duplicate all or only certain test cases (and if so which).

--
You received this message because you are subscribed to the Google Groups
"Polly Development" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to polly-dev+unsubscribe at googlegroups.com.
For more options, visit https://groups.google.com/d/optout.