[PATCH] D18822: [Polly][FIX] Adjust execution context of hoisted loads wrt. error domains

Wed Apr 6 09:02:30 PDT 2016

jdoerfert added a comment.

In http://reviews.llvm.org/D18822#393262, @Meinersbur wrote:

> The summary does not mention this is supposed to fix PR26683 or what was wrong with hoisted loads before.


It does and the code shown there is a test case here. I can mention it.

> There's a typo: s/therefor/therefore


Thx.


================
Comment at: include/polly/ScopInfo.h:1320
@@ -1319,1 +1319,3 @@
 
+  /// @brief A map from basic blocks to the error context reaching them.
+  DenseMap<BasicBlock *, isl_set *> ErrorDomainCtxMap;
----------------
Meinersbur wrote:
> Could you explain somewhere what an 'error context' is?
Sure.

================
Comment at: include/polly/ScopInfo.h:1427-1441
@@ -1423,2 +1426,17 @@
 
+  /// @brief Propagate of error block domain contexts through @p R.
+  ///
+  /// This method will propagate error block domain contexts through @p R
+  /// and thereby fill the ErrorDomainCtxMap map. Additionally, the domains
+  /// of error statements and those only reachable via error statements will
+  /// be replaced by an empty set. Later those will be removed completely from
+  /// the SCoP.
+  ///
+  /// @param R  The currently traversed region.
+  /// @param SD The ScopDetection analysis for the current function.
+  /// @param DT The DominatorTree for the current function.
+  /// @param LI The LoopInfo for the current function.
+  void propagateErrorConstraints(Region *R, ScopDetection &SD,
+                                 DominatorTree &DT, LoopInfo &LI);
+
   /// @brief Propagate domains that are known due to graph properties.
----------------
Meinersbur wrote:
> Maybe this and the implementation could be moved to where removeErrorBlockDomains was to make clear it's a replacement.
If you think that is better, sure.

================
Comment at: lib/Analysis/ScopInfo.cpp:2178-2185
@@ -2177,10 +2144,1 @@
 
-  // Error blocks and blocks dominated by them have been assumed to never be
-  // executed. Representing them in the Scop does not add any value. In fact,
-  // it is likely to cause issues during construction of the ScopStmts. The
-  // contents of error blocks have not been verfied to be expressible and
-  // will cause problems when building up a ScopStmt for them.
-  // Furthermore, basic blocks dominated by error blocks may reference
-  // instructions in the error block which, if the error block is not modeled,
-  // can themselves not be constructed properly.
-  removeErrorBlockDomains(SD, DT, LI);
----------------
Meinersbur wrote:
> The comment still applies for the most part, doesn't it?
I will recycle some parts.

================
Comment at: lib/Analysis/ScopInfo.cpp:2212-2223
@@ +2211,14 @@
+
+  ReversePostOrderTraversal<Region *> RTraversal(R);
+  for (auto *RN : RTraversal) {
+
+    // Recurse for affine subregions but go on for basic blocks and non-affine
+    // subregions.
+    if (RN->isSubRegion()) {
+      Region *SubRegion = RN->getNodeAs<Region>();
+      if (!SD.isNonAffineSubRegion(SubRegion, &getRegion())) {
+        propagateErrorConstraints(SubRegion, SD, DT, LI);
+        continue;
+      }
+    }
+
----------------
Meinersbur wrote:
> I have not better immediate solution but would like to remark the following:
> - This pattern occurs multiple times (buildDomainsWithBranchConstraints, propagateDomainConstraints, propagateErrorConstraints) so it might make sense to wrap it.
> - The comment of ReversePostOrderTraversal mentions it is expensive to create
> - Why does it need to be recursive over nested regions, instead non-recursive over BB reverse post-order?
> 
All good points which I partially addresses a while ago but should have mentioned somehwere more prominent when we introduced the new domain generation. Here is my take but we might want to put that somehwere [either as comment or code]:

1) To wrap the pattern is something I would like to see but I do not have a good solution for (different recursive functions and signatures). I will gladly review and support that though.

2) I saw the comment early on and my take was that we could change the implementation of that algorithm if it was actually "slow". So far I did not (look for or) observe any slowdown due to the RPO and the first way to counter it would be to share/reuse/cache the same RTraversal for the different regions. Though, due to the little number of regions this might not change much.

3) We can either build a RPO traversal for the whole CFG or a Region which can contain SubRegions as nodes. There was an initial assumption that a SCoP does not span enough of the CFG to make it worthwile to build a traversal for the whole thing, thus we went with the latter. Since it operates on the RegionTree not on the CFG we should not pay more due to the recursive nature of the traversal. In fact we should pay only for what we need. That said, I do not know if it will make sense to go with a function RPO traversal at some point (if we have a function pass maybe) or not.

================
Comment at: lib/Analysis/ScopInfo.cpp:2242
@@ +2241,3 @@
+
+    if (!IsErrorBlock)
+      isl_set_free(DomainCtx);
----------------
Meinersbur wrote:
> else
I did this on purpose but I can do else too.

================
Comment at: lib/Analysis/ScopInfo.cpp:3070-3071
@@ -3050,2 +3069,4 @@
   isl_set *DomainCtx = isl_set_params(Stmt.getDomain());
+  if (auto *ErrorCtx = getErrorCtxReachingStmt(Stmt))
+    DomainCtx = isl_set_subtract(DomainCtx, ErrorCtx);
   DomainCtx = isl_set_remove_redundancies(DomainCtx);
----------------
Meinersbur wrote:
> Isn't it enough just use the union of all error domains (or the context) instead per-stmt?
> 
> That domain could contain the hoisted value itself, but which could be just outprojected.
> Isn't it enough just use the union of all error domains (or the context) instead per-stmt?
That would be an overapproximation that would basically make error statements and invariant loads "exclusive" again. Thus, the domain of a error-block could not depend on an invariant load, now it can.

> That domain could contain the hoisted value itself, but which could be just outprojected.
I do not get this, sorry.

================
Comment at: test/ScopInfo/error-blocks-1.ll:10-20
@@ +9,13 @@
+;
+; CHECK:         Statements {
+; CHECK-NEXT:    	Stmt_if_end
+; CHECK-NEXT:            Domain :=
+; CHECK-NEXT:                [N] -> { Stmt_if_end[i0] : 0 <= i0 < N };
+; CHECK-NEXT:            Schedule :=
+; CHECK-NEXT:                [N] -> { Stmt_if_end[i0] -> [i0] };
+; CHECK-NEXT:            ReadAccess :=	[Reduction Type: +] [Scalar: 0]
+; CHECK-NEXT:                [N] -> { Stmt_if_end[i0] -> MemRef_A[i0] };
+; CHECK-NEXT:            MustWriteAccess :=	[Reduction Type: +] [Scalar: 0]
+; CHECK-NEXT:                [N] -> { Stmt_if_end[i0] -> MemRef_A[i0] };
+; CHECK-NEXT:    }
+;
----------------
Meinersbur wrote:
> Because there is no load hoisting, what dos this patch change in this test case?
This is a test for the removal of error statements that was poorly covered before.

================
Comment at: test/ScopInfo/error-blocks-2.ll:9
@@ +8,3 @@
+; CHECK-NEXT:                [N, valid_val] -> { Stmt_S[i0] -> MemRef_ptr_addr[0] };
+; CHECK-NEXT:            Execution Context: [N, valid_val] -> {  : N > 0 and (valid_val < 0 or valid_val > 0) }
+; CHECK-NEXT:            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
----------------
Meinersbur wrote:
> Are the invariant accesses ordered? Is the previous invariant access for MemRef_valid guaranteed to be executed before this and to use valid_val ?
> Are the invariant accesses ordered?
Not anymore. That did not work out well ... The code generation will "order" them on demand.


http://reviews.llvm.org/D18822