[Polly] Reduction dependences [V3]

Tobias Grosser tobias at grosser.es
Thu Jun 19 11:32:15 PDT 2014


On 19/06/2014 20:02, Johannes Doerfert wrote:
> So,... read the new patch and the comments and tell me to commit;)
>
>>> >>Maybe, just make it two test cases. One with parameters and one just the
> simple case. The general rule is that one test case tests one specific
> thing. So we should not test parameters in the simple reduction test case.
> This is the easiest way to have a parameter  and actual privatization
> dependences. I would like to keep it that way but I renamed the file to make
> the intention clear.

>>>> >> >+  RAW = isl_union_map_coalesce(isl_union_map_union(RAW, PrivRAW));
>>>> >> >+  WAW = isl_union_map_coalesce(isl_union_map_union(WAW, PrivWAW));
>>>> >> >+  WAR = isl_union_map_coalesce(isl_union_map_union(WAR, PrivWAR)); }
>>> >>
>>> >>I like the new algorithm. It looks clean and simple.
> I put it into a loop now, this copy and paste code over multiple lines makes
> my dizzy sometimes.

Great, a perfect use of range based for loops.

>>> >>So an unary reduction is a statement that only has either an incoming or
> an outcoming reduction dependency, but not both?
> No, it just has only WAW dependences because it does not read the reduction
> location (btw. we do not detect them yet)
>>> >>  Is the first reduction in a reduction chain also an unary reduction?
> No. A unary reduction is simply a loop carried write to the same location
> without anyone accessing the location for 2 or more iterations.

Mh, unary reduction would then probably need some explanation. The other 
option is to just not talk about them and just to comment in the context 
of what is currently supported.

>
>>>> >> >+;
>>>> >> >+; CHECK:  Reduction dependences:
>>>> >> >+; CHECK:    {  }
>>>> >> >+;
>>>> >> >+; void f(int *sum) {
>>>> >> >+;   for (int i = 0; i < 100; i++)
>>>> >> >+;     sum[i] = sum[99-i] + i;
>>>> >> >+; }
>>> >>
>>> >>Even if we would detect this statement as reduction like, what is the
>>> >>reason that there are no reduction dependences?
>>> >>
>>> >>i = 50 and i = 49 write to the same memory location. So there should be
>>> >>a reduction dependency, no?
> We are limited to binary reductions at the moment and this is not one.
> There are never at least two iterations which read __and__ write to the same
> location,
> Thus we won't find the RAW and WAW dependences of a reduction, thus we don't
> find
> Reduction dependences.

Interesting. Can you add this as a comment.

>>>> >> >+++ b/test/Dependences/reduction_only_reduction_like_access_2.ll
>>>> >> >@@ -0,0 +1,56 @@
>>>> >> >+; RUN: opt %loadPolly -polly-dependences -analyze < %s | FileCheck %s
>>>> >> >+;
>>>> >> >+; The statement is reduction like but should not yield any reduction
> dependences
>>>> >> >+; FIXME: This is only true as long as we restrict ourselfs to the
> exact same
>>> >>ourselves
>>> >>
>>>> >> >+;        access pointer for the load and store.
>>> >>
>>> >>Again, this statement is currently also not detected as reduction like.
>>> >>I would formulate this like:
> I changed it again to "is only almost reduction like", can we go with that
> for now?

I will comment in the patch.

> diff --git a/include/polly/Dependences.h b/include/polly/Dependences.h
> index 9a704a8..655d4aa 100755
> --- a/include/polly/Dependences.h
> +++ b/include/polly/Dependences.h
> @@ -46,6 +46,12 @@ public:
>     static char ID;
>
>     /// @brief The type of the dependences.
> +  ///
> +  /// Reduction dependences are not included in "TYPE_ALL" dependences
> +  /// as they are no "validity" dependences in the sense that we are not allowed
> +  /// to break them. Instead, they can be seen as dependences we want to fulfill
> +  /// but won't if it yields e.g., more parallelism.

This is not very clear. Why do we want to fulfill them but won't sometimes?

It seems the reason is that reduction dependences can be ignored during 
sequential execution, as the order in which statements are executed is
not important, as long as two statements are not executed at the same 
time. However, when executing code in parallel we need to consider the
dependences to ensure that instructions are not executed in parallel,
except if we took additional measures (such as privatization) to ensure 
that parallel writes do not conflict.

 > If we break them to
> +  /// parallelize a loop we need to introduce privatization.
>     enum Type {
>       // Write after read
>       TYPE_WAR = 0x1,
> @@ -56,7 +62,10 @@ public:
>       // Write after write
>       TYPE_WAW = 0x4,
>
> -    // All dependences
> +    // Reduction dependences
> +    TYPE_RED = 0x8,
> +
> +    // All (validity) dependences
>       TYPE_ALL = (TYPE_WAR | TYPE_RAW | TYPE_WAW)
>     };
>
> @@ -105,10 +114,16 @@ private:
>     isl_union_map *WAR;
>     isl_union_map *WAW;
>
> +  /// @brief The map of reduction dependences
> +  isl_union_map *RED = nullptr;
> +
>     /// @brief Collect information about the SCoP.
>     void collectInfo(Scop &S, isl_union_map **Read, isl_union_map **Write,
>                      isl_union_map **MayWrite, isl_union_map **Schedule);
>
> +  /// @brief Calculate and at the privatization dependences
                               add

> +void Dependences::addPrivatizationDependences() {
> +  isl_union_map *PrivRAW, *PrivWAW, *PrivWAR, *TransClosure;
> +
> +  // The transitive closure might be over approximated but we only use it to
> +  // compute the privatization dependences. Thus, overapproximation will lead
> +  // "only" to more conservative privatization dependences.

Can this approximation introduce dependences to statements that are 
scheduled earlier?

I am asking because I would like you to _verify_ that there is no way 
that there will be wrong dependences introduced. If we are not sure, we 
need to look at the dependences in the context of the schedule and 
remove all negative dependences. I would like to avoid this.

> +  TransClosure = isl_union_map_transitive_closure(isl_union_map_copy(RED), 0);
> +
> +  isl_union_map **Maps[] = {&RAW, &WAW, &WAR};
> +  isl_union_map **PrivMaps[] = {&PrivRAW, &PrivWAW, &PrivWAR};
> +  for (unsigned u = 0; u < 3; u++) {
> +    isl_union_map **Map = Maps[u], **PrivMap = PrivMaps[u];
> +
> +    *PrivMap = isl_union_map_apply_range(isl_union_map_copy(*Map),
> +                                         isl_union_map_copy(TransClosure));
> +    *PrivMap = isl_union_map_union(
> +        *PrivMap, isl_union_map_apply_range(isl_union_map_copy(TransClosure),
> +                                            isl_union_map_copy(*Map)));
> +
> +    *Map = isl_union_map_coalesce(isl_union_map_union(*Map, *PrivMap));
> +  }
> +
> +  isl_union_map_free(TransClosure);
> +}
> +
>   void Dependences::calculateDependences(Scop &S) {
>     isl_union_map *Read, *Write, *MayWrite, *Schedule;
>
> @@ -114,7 +179,7 @@ void Dependences::calculateDependences(Scop &S) {
>
>     // The pointers below will be set by the subsequent calls to
>     // isl_union_map_compute_flow.
> -  RAW = WAW = WAR = nullptr;
> +  RAW = WAW = WAR = RED = nullptr;
>
>     if (OptAnalysisType == VALUE_BASED_ANALYSIS) {
>       isl_union_map_compute_flow(
> @@ -169,6 +234,51 @@ void Dependences::calculateDependences(Scop &S) {
>     isl_ctx_reset_operations(S.getIslCtx());
>     isl_ctx_set_max_operations(S.getIslCtx(), MaxOpsOld);
>
> +  // To handle reduction dependences we proceed as follows:
> +  // 1) Aggregate all possible reduction dependences, namely all self
> +  //    dependences on reduction like statements.
> +  // 2) Intersect them with the actual RAW & WAW dependences to the get the
> +  //    actual reduction dependences. This will ensure the load/store memory
> +  //    addresses were __identical__ in the two iterations of the statement.
> +  // 3) Relax the original RAW and WAW dependences by substracting the actual
> +  //    reduction dependences. Binary reductions (sum += A[i]) cause both, and
> +  //    the same, RAW and WAW dependences, but unary reductions (last = A[i])
> +  //    only WAW.

Just skip the unary stuff. We can discuss it when we add unary dependences.

>
> @@ -256,7 +366,8 @@ bool Dependences::isParallelDimension(__isl_take isl_set *ScheduleSubset,
>       return false;
>     }
>
> -  Deps = getDependences(TYPE_ALL);
> +  // FIXME: Reduction dependences prevent parallelization at the moment.

'At the moment' is not very descriptive. Either remove the fixme, or 
explain the problem.

// FIXME: We can remove ignore reduction dependences in case we
// privatize the memory locations the reduction statements reduce into.
> +  Deps = getDependences(TYPE_ALL | TYPE_RED);
>

> diff --git a/lib/CodeGen/BlockGenerators.cpp b/lib/CodeGen/BlockGenerators.cpp
> index 0643fbe..779ff9b 100644
> --- a/lib/CodeGen/BlockGenerators.cpp
> +++ b/lib/CodeGen/BlockGenerators.cpp
> @@ -41,6 +41,7 @@ SCEVCodegenF("polly-codegen-scev", cl::desc("Use SCEV based code generation."),
>                cl::Hidden, cl::location(SCEVCodegen), cl::init(false),
>                cl::ZeroOrMore, cl::cat(PollyCategory));
>
> +bool polly::EnableOpenMP;
>   bool polly::SCEVCodegen;

Leftover.

>   bool polly::canSynthesize(const Instruction *I, const llvm::LoopInfo *LI,
> diff --git a/lib/CodeGen/IslAst.cpp b/lib/CodeGen/IslAst.cpp
> index 0e2ddcc..03ce7d6 100644
> --- a/lib/CodeGen/IslAst.cpp
> +++ b/lib/CodeGen/IslAst.cpp
> @@ -166,7 +166,8 @@ static bool astScheduleDimIsParallel(__isl_keep isl_ast_build *Build,
>
>     Dimension = isl_space_dim(ScheduleSpace, isl_dim_out) - 1;
>
> -  Deps = D->getDependences(Dependences::TYPE_ALL);
> +  // FIXME: Reduction dependences prevent parallelization at the moment.

See above.

> +  Deps = D->getDependences(Dependences::TYPE_ALL | Dependences::TYPE_RED);
>     Deps = isl_union_map_apply_range(Deps, isl_union_map_copy(Schedule));
>     Deps = isl_union_map_apply_domain(Deps, Schedule);
>
> diff --git a/lib/Transform/DeadCodeElimination.cpp b/lib/Transform/DeadCodeElimination.cpp
> index 14c6044..5967f8f 100644
> --- a/lib/Transform/DeadCodeElimination.cpp
> +++ b/lib/Transform/DeadCodeElimination.cpp
> @@ -100,7 +100,9 @@ bool DeadCodeElim::eliminateDeadCode(Scop &S, int PreciseSteps) {
>       return false;
>
>     isl_union_set *Live = this->getLastWrites(S.getWrites(), S.getSchedule());
> -  isl_union_map *Dep = D->getDependences(Dependences::TYPE_RAW);
> +  // FIXME: Reduction dependences prevent dce at the moment

Good catch. Why do we need to fix anything here? how would we fix it?

Otherwise, it looks good.

Cheers,
Tobias



More information about the llvm-commits mailing list