[PATCH] D15722: [WIP][Polly] SSA Codegen

Tue Feb 2 02:36:02 PST 2016

jdoerfert added a comment.

In http://reviews.llvm.org/D15722#334871, @grosser wrote:

> 1. (you already know about this) I am concerned that this approach might be more complex and/or may have more corner cases we need to worry about. Aditya mentioned that in their implementation in graphite, they fail to generate code in various situations. My understanding is that you are not aware of any such limitations in your code and you already checked several of the examples Aditya provided. It would probably be good for Aditya and Sebastian to have a look at this patch, as they have experience with SSA code generation and might be able to provide you with some test cases.


Thest cases would be good yes. And yes, I am not aware of general limitations or bugs.

> 2. It is interesting to see that this patch actually removes a lot of code. So maybe I am wrong about point 1)


In my opinion it is not (so much) about the amount of code but the complexity. And most of the new code is rather straight forward (e.g., everything in ScopHelper.cpp, merging after conditionals and after the loop, and the code that is produced by Polly).

> 3. An alternative approach might be to just call PromoteMemToRegisters on the scalar allocs we introduced. This would allow us to keep the same mental model, but to still generate good code.


At some point you want to run the vectorizer after Polly. Why require post-transformation if good code generation is not harder than what we have at the moment?

> Some questions I have before i get into reviewing this code:

> 

> a) Does your patch need to make any effort to eliminate dead PHI nodes? Will we have a lot of PHI nodes that might need to be dead-code eliminated after your patch? More precisely, would we now need to run a phase of DCE after Polly instead of Mem2Reg.


It does produce dead PHI nodes, though, judging from my experiments not many. Additionally, these dead nodes should not harm vectorization or anything else and will be removed eventually. I did not implement an AST SSA generation that does not generate dead PHIs (e.g., http://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf) since I wanted it to be simple. In our special casse one could easily remember all placed PHIs and use the number of users + the operands to remove all of them after code generation in an efficent way (without another pass).

> b) Are there any remaining issues or inherent limitations you are aware in comparison to the old approach?


Not that I am aware of.

> c) I know you are still running performance numbers, which will be interesting to look at. Otherwise, is this patch ready to be reviewed.


Performance numbers are back. Both on parkas2, almost identical except 2 small compile time improvements.

In http://reviews.llvm.org/D15722#338442, @Meinersbur wrote:

> There is also an advantage in that we can verify the generated IR statically, st wrong code (e.g. forget to initialize a value) is noticed much earlier by the IR verifier before generating a wrong program that only gets noticed at runtime.


I fully agree. There were almost only compile time problems when I wrote this. When I rememeber the time we fixed the scalar codegen the first time we saw a lot of runtime problems due to not or wrong initialization.


================
Comment at: include/polly/CodeGen/BlockGenerators.h:308
@@ +307,3 @@
+  ///
+  /// @param V     A (possibly) loop carried value.
+  /// @param BBMap A mapping from old values to their new values
----------------
sebpop wrote:
> What is a loop carried value?
Any scalar value that is defined and used in different iterations of the same loop. Prior to the transformation this means it is used by a PHI, however after the transformation it might also refere to a scalar that is used in a loop textually before it was defined.

An example would be the folloing original code:

for (i = 0...N)
  a = A[i];
  A[i+1] = a;
  
with the following "optimized" version:

for (i = 0...N+1)
  if (i > 0)
    A[i] = a;
  if (i < N + 1)
    a = A[i];

Here the former non-loop carried scalar a is now loop-carried (thus needs a PHI).

================
Comment at: include/polly/CodeGen/BlockGenerators.h:313
@@ +312,3 @@
+  /// @returns The new loop carried PHI for @p V.
+  Value *createLoopCarriedPHI(Value *V, ValueMapT &BBMap);
+
----------------
sebpop wrote:
> Could we call these "loop phi" nodes instead of "loop carried phi" nodes?
> 
> There is possibly confusion in reading the abbreviated LCPHI: the existing convention is to read it Loop Closed Phi, as in LCSSA.
> 
Sure.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:136
@@ -128,3 +135,3 @@
 
-  // A scop-constant value defined by an instruction executed outside the scop.
-  if (const Instruction *Inst = dyn_cast<Instruction>(Old))
+  // An instruction defined inside the scop.
+  if (const Instruction *Inst = dyn_cast<Instruction>(Old)) {
----------------
sebpop wrote:
> This comment is confusing: either remove it, or move it one line down and fix it to say "outside the scop".
I'll repair this comment.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:425
@@ +424,3 @@
+          canSyntheziseInStmt(Stmt, AccessValueInst))
+        copyInstruction(Stmt, AccessValueInst, BBMap, LTS, nullptr,
+                        /* Force */ true);
----------------
sebpop wrote:
> Why are we forcing copy of instructions that can be synthesized?
> Aren't these synthesizable instructions discarded in the first place by not being added to the scalar memory accesses of the stmt?
Because we (might) need these values as operands of PHI nodes we haven't constructed yet and we need them to be computed/placed at the correct location. Later on, when we create the PHI, we do not know anymore where we should place the operand code and not even which operand we should create.

In the following example that could be the result after scheduling we would not know where to place which version of x (x1 or x2) when we create code for the join point after the conditional. It would require a backward search on the AST to find the location where the PHI x was written last by each predecessor in order to get the operand and location. However, forcing them to be copyied in the first place solves this quite nice as we alwas know the values we will later need are actually present and mapped.

if (...) {
  x1 = 2*i;
S: /* lots of code */
} else {
  x2 = 3*i;
P: /* lots of code */
}
x3 = phi(x1, x2)


================
Comment at: lib/CodeGen/BlockGenerators.cpp:1055
@@ -1134,1 +1054,3 @@
 
+    // TODO
+    generateScalarMappings(Stmt, BB, ScalarMap, RegionMap, LTS);
----------------
sebpop wrote:
> Unfinished comment?
Yes, I'll fix this.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:549
@@ +548,3 @@
+      if (!PreVal)
+        PreVal = UndefValue::get(V->getType());
+
----------------
sebpop wrote:
> I don't understand how these undef values are supposed to work.
> Are you cleaning them up in a later pass?
> I see in some of the testcases that they are still left in the IR after Polly.
Undef values are already "used" in the current code generation but simply hidden well enough. When a scalar is used before it is initialized we basically get an undef. Currently this means a load from an uninitialized alloca is basically an undef. As we promoted these allocas, loads and stores now we see the undefs in the IR, however they should only exist where the alloca content was undefined before. An example:

int x;
for i = 0...N {
  if (i > 0)
    A[i-1] = x;
   x = A[i];
}

Here the initial value of x is undefined and therefor the PHI in the loop header as well as the PHI after the loop will have one undef as operand.

In the following example there were no undefs prior to Polly in the IR but right after code generation there are.

if (a) {
S:  x = A[i];
  /* split BB */
P:  A[i] = x;  
}

Now we split the conditional for whatever reason and Polly will generate a CFG that kinda looks like this:

if (a)
S:  x1 = A[i];
x = phi(undef, x1)

if (a)
P:  A[i] = x;

We need the undef for the path that does not define x (or x1). Does that make sense? 


================
Comment at: lib/Support/ScopHelper.cpp:481
@@ +480,3 @@
+    MergePHI->eraseFromParent();
+    return *Values.begin();
+  }
----------------
sebpop wrote:
> You could also check whether all arguments of the MergePHI node are the same, in which case you do not need the phi node, and just return the first arg.
That is what this code is doing, at least it was supposted to do that. 
Note that the set "Values" contains all operands of the MergePHI and if it containts only one Value we know all operands are the same, otherwise we know they are not.

Do you see a problematic case? 


http://reviews.llvm.org/D15722