[PATCH] D15722: [WIP][Polly] SSA Codegen

Wed Apr 6 08:29:33 PDT 2016

jdoerfert added a comment.

In http://reviews.llvm.org/D15722#390590, @Meinersbur wrote:

> Thank you for the awseome patch.


I am glad somebody else likes the idea too and I got some more feedback!

> AFAIU the algorithm is "follow each generated MK_PHIKind through to the end of the generated CFG while adding PHIs to make them usable", ignoring whether the defs are used or how the PHIs where in the original code. Could you explain this somewhere, preferable not only in the commit message?


It is a bit more general but the idea is correct. Track all generated MK_Value and MK_PHI accesses through the CFG and add new PHIs if different versions for the same base address join in a block with multiple predecessors. [This is a short description I can build on and place somewhere].

> Is it correct that this gives us O(n*m) generated PHIs (n=number of MK_PHIKind, m=number of statements)?


Mh, it is unfortunatly not that simple. First, "n" should also include MK_Value but that is not even the problem. The problem is with "m" which is not clearly defined. "m" has to be the number of "join-points" in the generated CFG. This number can grow exponentially with the number of statements/blocks in the generated CFG (which can be arbitrarily higher than the number of statements in the SCoP). Thus, one can achieve O(n*2^m) PHIs, but that also means the optimized version has 2^m different paths through the region which will cause explosions in various other places too. That said, I am not even sure this kind of analysis makes sense because:

We will now only generate a PHI (for one particular SSA-value) if it is "needed" to join different versions of the original "value". While these PHIs can be dead, the total number of PHIs is linear in the number of paths. Additionally, if a path does not introduce a new version of the SSA-value there won't be a PHI once it is joined and if it does contain a new version of an SSA-value we would have placed a store before and mem2reg would have placed the PHI later. The main difference in the complexity comes from dead PHIs that are (as mentioned) limited by the number of paths and could be removed easily after code generation by us.

> I still don't really understand the "ScalarMap". What is the difference to BBMaps? Do we need both?


In short: BBMaps are "intra-statement" maps for __all__ local ssa-values but ScalarMaps are (control-)scope-aware "inter-statement" maps for __some__ ssa-values.

Summarized:

1. ScalarMaps are used to remember mappings between statements
2. ScalarMaps are (control-)scope-aware, thus each (control-) scope (here loop or conditional) has it's own ScalarMap that is initialized with the ScalarMap of the parent (control-) scope.
3. ScalarMaps do not contain all ssa-value mappings but only those that are interesting, thus have a SAI object.

> It uses original defs as well merge PHIs used as key. I see the test cases motivating this, but could this be avoided by e.g. adding the incoming values in a separate pass instead on-the-fly?


It basically unifies the old ScalarMap and the old PHIOpMap using this trick. One could have two (control-)scope-aware maps again (or maybe do something else) but I do not think it is worth it.

> I had 17 LNT test fails with -polly-position=before-vectorizer (none without it). Interestingly I also got 4 unit test fails under Windows but not Linux. Is there maybe some indeterministic ordering? The failing tests are:

> 

>   Polly :: Isl/CodeGen/phi_loop_carried_float.ll

>   Polly :: Isl/CodeGen/phi_loop_carried_float_4.ll

>   Polly :: Isl/CodeGen/phi_loop_carried_float_escape.ll

>   Polly :: Isl/CodeGen/phi_scalar_simple_1.ll

> 

> If you rebase and the problem is still there, I'd debug this for you under Windows.


I will rebase it soon and push it here. Then I will also check LNT again.


================
Comment at: include/polly/CodeGen/BlockGenerators.h:125
@@ -298,1 +124,3 @@
+  /// @brief The loop depth of the currently copied block in the new schedule.
+  int &LoopDepth;
 
----------------
Meinersbur wrote:
> I generally don't like the idea of a private/protected field being sneakingly modified from outside the object (here: IslNodeBuilder)
Agreed, but to be fair, we had those reference maps before too. The only alternative that I see right now is a "ConstructionContext" (similar to the DetectionContext in ScopDetection) that is handed to the functions and can be modified. While this might be a good idea I do think we can postpone it a bit.

================
Comment at: include/polly/CodeGen/BlockGenerators.h:308
@@ +307,3 @@
+  ///
+  /// @param V     A (possibly) loop carried value.
+  /// @param BBMap A mapping from old values to their new values
----------------
Meinersbur wrote:
> jdoerfert wrote:
> > sebpop wrote:
> > > What is a loop carried value?
> > Any scalar value that is defined and used in different iterations of the same loop. Prior to the transformation this means it is used by a PHI, however after the transformation it might also refere to a scalar that is used in a loop textually before it was defined.
> > 
> > An example would be the folloing original code:
> > 
> > for (i = 0...N)
> >   a = A[i];
> >   A[i+1] = a;
> >   
> > with the following "optimized" version:
> > 
> > for (i = 0...N+1)
> >   if (i > 0)
> >     A[i] = a;
> >   if (i < N + 1)
> >     a = A[i];
> > 
> > Here the former non-loop carried scalar a is now loop-carried (thus needs a PHI).
> Referring to "(possibly)": What is the function supposed to do if the value is not loop-carried (in what loop)?
Good question. The (possibly) hints on the fact that one does not know if a value will be loop carried or not but if it is one needs the PHI. To this end we can generate the PHI even though we might not need it. 

================
Comment at: include/polly/CodeGen/BlockGenerators.h:313
@@ +312,3 @@
+  /// @returns The new loop carried PHI for @p V.
+  Value *createLoopCarriedPHI(Value *V, ValueMapT &BBMap);
+
----------------
Meinersbur wrote:
> jdoerfert wrote:
> > sebpop wrote:
> > > Could we call these "loop phi" nodes instead of "loop carried phi" nodes?
> > > 
> > > There is possibly confusion in reading the abbreviated LCPHI: the existing convention is to read it Loop Closed Phi, as in LCSSA.
> > > 
> > Sure.
> Idea: Maybe define in some comment what exactly a "loop-carried phi" is? Eg. a PHI in a loop header.
Can do.

================
Comment at: include/polly/CodeGen/BlockGenerators.h:574
@@ -768,2 +573,3 @@
   DenseMap<BasicBlock *, ValueMapT> RegionMaps;
+  DenseMap<BasicBlock *, ValueMapT> ScalarMaps;
 
----------------
Meinersbur wrote:
> ScalarMaps should have its comment. AFAIK Doxygen treats this as uncommented field.
This is true, thanks for the catch.

================
Comment at: include/polly/CodeGen/IslNodeBuilder.h:69
@@ -74,1 +68,3 @@
+  /// @brief Map from loop carried PHI nodes to "floating" copies.
+  ValueMapT LCPHIs;
 
----------------
Meinersbur wrote:
> Can you explain "floating"?
> 
> As Sebastian also noted, on seeing "LCPHI" I was first thinking about the PHIs for LCSSA.
I can change the name here too and floating means not placed in a basic block [can be added to the comment].

================
Comment at: include/polly/CodeGen/IslNodeBuilder.h:71
@@ -74,3 +70,3 @@
 
-  /// @brief See BlockGenerator::PhiOpMap.
-  BlockGenerator::ScalarAllocaMapTy PHIOpMap;
+  /// @brief Currently valid scalar mappings.
+  ValueMapT ScalarMap;
----------------
Meinersbur wrote:
> Could you explain a bit more about "ScalarMaps"? What is it mapping from/to? How/where is it used? What are valid keys?
Sure

================
Comment at: include/polly/CodeGen/IslNodeBuilder.h:82
@@ +81,3 @@
+
+  IslExprBuilder ExprBuilder;
+
----------------
Meinersbur wrote:
> Why moving this definition?
Beacuse the constructor/initilizer code looks nicer when we order the members this way. I do not care to much about it and if you feel we should not move this I can undo it.

================
Comment at: include/polly/Support/ScopHelper.h:170
@@ +169,3 @@
+
+/// @brief Use @p Builder to create merge PHIs.
+///
----------------
Meinersbur wrote:
> This would be an excellent place to explain how a "merge PHI" is different from other PHIs.
I am not sure what you mean by other PHIs and therefor not what exactly you want to see here but I agree, I can add some comment here.

================
Comment at: include/polly/Support/ScopHelper.h:176
@@ +175,3 @@
+///
+/// @returns The merged mappings in @p MergeMap.
+void createMergePHIs(PollyIRBuilder &Builder,
----------------
Meinersbur wrote:
> This function returns void
I know, therefor the comment in the @returns clause.

================
Comment at: lib/Analysis/ScopDetection.cpp:90
@@ -89,3 +89,3 @@
         "Process scops that are unlikely to benefit from Polly optimizations."),
-    cl::location(PollyProcessUnprofitable), cl::init(false), cl::ZeroOrMore,
+    cl::location(PollyProcessUnprofitable), cl::init(true), cl::ZeroOrMore,
     cl::cat(PollyCategory));
----------------
Meinersbur wrote:
> I assume this is a leftover from debugging.
Indeed.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:232
@@ -208,3 +231,3 @@
                                           isl_id_to_ast_expr *NewAccesses) {
-  if (Value *PreloadLoad = GlobalMap.lookup(Load))
+  if (Value *PreloadLoad = BBMap.lookup(Load))
     return PreloadLoad;
----------------
Meinersbur wrote:
> This change is surprising to me. Aren't the hoisted loads stored in GlobalMap anymore?
Yes and no. We do not need to lookup in the GlobalMap here because the hoisted loads are propagated thorugh the scalar maps to the BBMap. One could as easily not do this and I am not sure which is better.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:265
@@ +264,3 @@
+Value *BlockGenerator::createLoopCarriedPHI(Value *V, ValueMapT &BBMap) {
+  auto *PHI = PHINode::Create(V->getType(), 2, V->getName() + ".polly.lc");
+
----------------
Meinersbur wrote:
> What is the reason to not insert the PHI into the generated BB at this point (but to keep them "floating")?
This function and all callers are in the BlockGenerator, but the block where these PHIs are supposed to reside in [the loop header of a Polly generated loop] is created in the IslNodeBuilder (or more precise the LoopGenerator). Thus, we either keep them floating or remember loop header blocks somewhere in the BlockGenerator. I choose the first as we can easily place them in the IslNodeBuilder if they are actually needed or delete them otherwise.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:277-281
@@ +276,7 @@
+                                          LoopToScevMapT &LTS) {
+  if (!LI.isLoopHeader(PHI->getParent()))
+    return nullptr;
+
+  if (LoopDepth == 0 || canSyntheziseInStmt(Stmt, PHI))
+    return nullptr;
+
----------------
Meinersbur wrote:
> Could you add comments to explain these cases?
Sure.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:339
@@ -289,2 +338,3 @@
 
-  ValueMapT BBMap;
+  // Initialize the block intern mappings with all mapping that hold when this
+  // block is entered.
----------------
Meinersbur wrote:
> block-intern_al_?
> mapping_s_
Thx.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:442
@@ +441,3 @@
+    // In the second case we map both, the PHICopy as well as the BaseAddr to
+    // the NewValue to allow uses textually before as well as after the scalar
+    // definition.
----------------
Meinersbur wrote:
> "textually" doesn't seem the right word; we are not doing text processing here. Should be something about processing order.
AFAIK this is a standard term that we also used in our discussions and emails before. Processing order would not meet the requirements anyway as the actual dynamically executed order is not what this is about. "textually" means the order in which you statically write/see/discover things. In other words the order in which you first see the statements in a linearized/dumped AST if you read it like a page (from top to bottom).

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1165
@@ -1278,3 +1164,3 @@
   // Once the block will be copied the incoming value will be added.
-  BasicBlock *BBCopy = BlockMap[IncomingBB];
+  BasicBlock *BBCopy = BlockMap.lookup(IncomingBB);
   if (!BBCopy) {
----------------
Meinersbur wrote:
> Unrelated?
Maybe, I do not recall.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:61
@@ +60,3 @@
+    auto &Array = Pair.second;
+    if (Array->getNumberOfDimensions() != 0 || !Array->isPHIKind())
+      continue;
----------------
Meinersbur wrote:
> `Array->getNumberOfDimensions()!=0` seems redundant. MK_PHI always has 0 dimensions.
> 
Probably true, I do again not recall why I did it this way.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:522
@@ -471,1 +521,3 @@
 
+  auto PreMap = ScalarMap;
+  auto PreLCPHIs = LCPHIs;
----------------
Meinersbur wrote:
> Naive question: Is there some way we could have a nested instance of IslNodeBuilder/function call with fresh values instead of temporarily storing `LCPHIs`, `LoopDepth`, `PreMap` away and restoring it afterwards?
Possibly, but that is not an easy task and it would not even help much. We would need to copy ScalarMap to the new "instance" anyway. If we think of a good solution we can implement it in the paralell code generation too (we currently copy the maps there too).

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:523-524
@@ +522,4 @@
+  auto PreMap = ScalarMap;
+  auto PreLCPHIs = LCPHIs;
+  LCPHIs.clear();
+
----------------
Meinersbur wrote:
> std::move
That would "destory" LCPHIs here, wouldn't it? AFAIK you shall not access something that you passed to std::move. We still need the container but just without its content.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:549
@@ +548,3 @@
+      if (!PreVal)
+        PreVal = UndefValue::get(V->getType());
+
----------------
Meinersbur wrote:
> jdoerfert wrote:
> > sebpop wrote:
> > > I don't understand how these undef values are supposed to work.
> > > Are you cleaning them up in a later pass?
> > > I see in some of the testcases that they are still left in the IR after Polly.
> > Undef values are already "used" in the current code generation but simply hidden well enough. When a scalar is used before it is initialized we basically get an undef. Currently this means a load from an uninitialized alloca is basically an undef. As we promoted these allocas, loads and stores now we see the undefs in the IR, however they should only exist where the alloca content was undefined before. An example:
> > 
> > int x;
> > for i = 0...N {
> >   if (i > 0)
> >     A[i-1] = x;
> >    x = A[i];
> > }
> > 
> > Here the initial value of x is undefined and therefor the PHI in the loop header as well as the PHI after the loop will have one undef as operand.
> > 
> > In the following example there were no undefs prior to Polly in the IR but right after code generation there are.
> > 
> > if (a) {
> > S:  x = A[i];
> >   /* split BB */
> > P:  A[i] = x;  
> > }
> > 
> > Now we split the conditional for whatever reason and Polly will generate a CFG that kinda looks like this:
> > 
> > if (a)
> > S:  x1 = A[i];
> > x = phi(undef, x1)
> > 
> > if (a)
> > P:  A[i] = x;
> > 
> > We need the undef for the path that does not define x (or x1). Does that make sense? 
> > 
> Could you explain this in the code as a comment?
Sure, but I will have to rephrase it a bit.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:798
@@ -694,1 +797,3 @@
 
+  auto ElseMap = ScalarMap;
+  auto *ElseExitBB = Builder.GetInsertBlock();
----------------
Meinersbur wrote:
> std::move
[See above for more comments] I would like to get it in like this and we can optimize it later, especially since std::move might trigger hard to debug problems I want to avoid for now.

================
Comment at: lib/Support/RegisterPasses.cpp:45
@@ -44,3 +44,3 @@
     PollyEnabled("polly", cl::desc("Enable the polly optimizer (only at -O3)"),
-                 cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));
+                 cl::init(true), cl::ZeroOrMore, cl::cat(PollyCategory));
 
----------------
Meinersbur wrote:
> Assuming debugging leftover
Indeed, again.

================
Comment at: lib/Support/ScopHelper.cpp:468
@@ +467,3 @@
+    if (!IncomingVal)
+      IncomingVal = UndefValue::get(Val->getType());
+
----------------
Meinersbur wrote:
> If the incoming value is not found in ScalarMap, I can think of two reasons:
> 1) It's defined before the Scop
> 2) We forgot to put in there
> 3) It was defined only in one incoming branch.
> In none of these cases "undef" would make sense.  In 3) it cannot be used without an explicit PHI in the original code. What other case is there?
1) and 2) cannot/should never happen or would at least be bugs. 2) is clearly a bug and 1) should always be present.

The reason for the undef here is the same as above, thus 3).


Starting with:
```
   if (a) {
S:   x = A[i];
P:   A[i] = x;
   }
```

the scheduler can generate:

```
   if (a) {
S:   x = A[i];
   }

   if (a) {
P:   A[i] = x;
   }
```

which needs a PHI (for x after the conditional containing S) even though the original code did not contain/need any. Additionally, there is only one path to that PHI that defines a value for x but we need an operand for the other path reaching the PHI. As I explained Sebastian, these undef's were basically present before, as "content" of the alloca slots we generated. And if you look at the code we generate at the momement for the example abvoe and run mem2reg you will see the same udnef's again.

================
Comment at: lib/Support/ScopHelper.cpp:476
@@ +475,3 @@
+
+    MergePHI->setName(IncomingVal->getName() + ".merge");
+  }
----------------
Meinersbur wrote:
> Wouldn't it be better to name the PHI after the original value instead one of the incoming values?
Maybe, I do not recall if this was a concious desicion or not.

================
Comment at: lib/Support/ScopHelper.cpp:479
@@ +478,3 @@
+
+  if (Values.size() == 1) {
+    MergePHI->eraseFromParent();
----------------
Meinersbur wrote:
> Because Value.size() will be equal to BBs.size(), this condition can be tested before creating an not used MergePHI.
It is not necessarily equal to BBs.size() as the mappings in several BBs can be the same. If that happens to be the case for all BBs we can remove the PHI immediatly but we do not know that until we looked up all mappings.


http://reviews.llvm.org/D15722