[PATCH] D13487: [Polly] Load/Store scalar accesses before/after the statement itself

Tue Oct 13 10:07:02 PDT 2015

grosser added a comment.

Hi Michael,

to (hopefully) resolve some open misunderstandings and to get this patch in quickly, I looked myself through these changes. As I said earlier, the general direction seems clear, but when trying to understand your change in detail (and looking at Johannes questions) some implementation details remained unclear to me. Sorry for bring these up so late, but answering them would really help my understanding.

Best,
Tobias


================
Comment at: lib/CodeGen/BlockGenerators.cpp:341
@@ -339,1 +340,3 @@
+  // their alloca.
+  generateScalarStores(Stmt, LTS, BBMap);
   return CopyBB;
----------------
@johannes: I think the point is that for region statements, Michael wants to generate scalar stores not before and after each basic block, but only once at the beginning and at the end of the entire ScopStmt. To my understanding "keep[ing] the SetInsertPoint and the generateScalarXXX in the other copyBB function" would allow the former, but not the latter. This is why Michael has difficulties to address your comment.

I have been thinking about this myself, but I do not see how to implement this better besides documenting that copyBB requires generateScalarLoads/Stores to be run explicitly before/after copyBB is called for all BBs in a ScopStmt.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:438
@@ -437,3 +429,1 @@
-  for (MemoryAccess *MA : *MAL) {
-    if (MA->isExplicit() || !MA->isRead())
       continue;
----------------
Nice simplification!

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1048
@@ -1051,3 +1047,3 @@
     // In order to remap PHI nodes we store also basic block mappings.
     BlockMap[BB] = BBCopy;
 
----------------
@Johannes: I have no idea how this could be hidden assuming we want indeed "scalar copies only once at the beginning and at the end of the entire ScopStmt" as described in an earlier comment. Johannes, did you write this assuming "scalar copies for each basic block" or have you a solution in mind that works also for "scalar copies only once"?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1150
@@ -1144,1 +1149,3 @@
+        Builder.CreateLoad(Address, Address->getName() + ".reload");
+  }
 }
----------------
It is not yet clear why we need a special implementation here. If I replace this code with just a call to BlockGenerator::generateScalarLoads(Stmt, BBMap)" all tests pass.

You comment here:

>	    // Only generate loads for PHIs in the entry block. Intra-ScopStmt PHIs are
>	    // copied directly.

but this comment does not seem to be related to the if-condition that follows, as inter-ScopStmt PHIs are not even modeled and do consequently not need to be skipped.

Did you assume we model Intra-ScopStmt PHI nodes or was there another reason for including this code that I missed?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1210
@@ -1167,1 +1209,3 @@
+               !DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB))
+      continue;
 
----------------
> As mentioned in the phone call and a message to the mailing list, the current code does generate such MemoryAccesses. Some later planned change will hopefully make this check go away.

Michael, without giving an example this is hard to understand. I commented out the code and tried to generate all scalar writes in the exit block, but this failed as you promised. I give now an example:

```
               bb1:
               br non-affine bb2, bb3

bb2:                                       bb3:
  a =                                      b =
  br exit                                  br exit


exit:
  val = phi [a, %bb2], [b, %bb3]
```

For this example we create the RegionStmt bb1->exit which has two PHI writes. It is important that these writes happen in bb2 and bb3. Neither the values written may dominate the exit block nor can we choose which value to write without knowing if the control flow passed through bb2 or bb3. Hence for PHI-node writes we actually need to create them in the basic block statement they are created in (this will always be an exiting statement of the Region). Am I correct this is the reason for these PHI-node special cases?

I tried to simplify the conditions above and came up with the following code which still  passes all tests:

```
  for (MemoryAccess *MA : Stmt) {                                                 
    if (MA->isExplicit() || MA->isRead())                                         
      continue;                                                                   
                                                                                  
    Instruction *ScalarInst = MA->getAccessInstruction();                         
    Value *Val = MA->getAccessValue();                                            
                                                                                  
    // In case we add the store into an exiting block, we need to restore the     
    // position for stores in the exit node.                                      
    auto SavedInsertionPoint = Builder.GetInsertPoint();                          

    BasicBlock *ExitingBB = ScalarInst->getParent();                              
    BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];                              
    Builder.SetInsertPoint(ExitingBBCopy->getTerminator());                       
                                                                                  
    auto Address = getOrCreateAlloca(*MA);                                        
                                                                                  
    Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);                            
    Builder.CreateStore(Val, Address);                                            
                                                                                  
    // Restore the insertion point if necessary.                                  
    Builder.SetInsertPoint(SavedInsertionPoint);                                  
  } 
```

Did I miss something important or are these simplifications correct?

Also, is it correct that we code-generate all PHI nodes at their normal statement locations? Hence, the copy-out code for PHI-nodes does not introduce any functional change?

================
Comment at: test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll:20
@@ -19,9 +19,3 @@
 ; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00
-; CHECK-NEXT: store float %p_val0, float* %merge.phiops
-; CHECK-NEXT: store float %p_val1, float* %val1.s2a
-; CHECK-NEXT: store float %p_val2, float* %val2.s2a
-
-; FIXME -> The last two writes are not really needed and can be dropped if the
-;          incoming block of the PHI and the value that is used share the same
-;          non-affine region.
+; CHECK: store float %p_val0, float* %merge.phiops
 
----------------
> This is a side-effect because val0 and val1 are dominating the non-affine region's exit, i.e. it is not possible to have such a store in the exit block. A planned change will clean up the logic that generates MemoryAccesses.

I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?

> I didn't not understand why check for instructions that shouldn't even be there, so I just removed the check.

Which check are you talking about? I could not find a check in Polly's source code that was dropped.


Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.

================
Comment at: test/Isl/CodeGen/scev_expansion_in_nonaffine.ll:18
@@ +17,3 @@
+; CHECK:          %scevgep30 = getelementptr i32, i32* %scevgep29, i64 %65
+; CHECK:          store i32 21, i32* %scevgep30, align 8
+
----------------
This test case currently fails for me due to the instruction number having changed from %64 to %39. Using FileCheck Variables [1], this dependence on the instruction number can be removed.


```
; CHECK-LABEL:  polly.stmt.if.then.110:                                          
; CHECK:          [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30         
; CHECK:          [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevgep{{.*}}, i64 [[REGISTER]]
; CHECK:          store i32 0, i32* [[GEP]]                                      
                                                                                 
; CHECK-LABEL:  polly.stmt.if.else:                                              
; CHECK:          [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30         
; CHECK:          [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevge{{.*}}, i64 [[REGISTER:%[0-9]+]]
; CHECK:          store i32 21, i32* [[GEP]]            
```


Also, I am not fully sure what you check here. The stores you are CHECKing for seem to be due to array stores. Stmt_for_body_101__TO__for_inc_114 does not have any scalar stores modeled in -polly-scops. In your comments you mention dominance, so maybe this is related to the condition "!DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB))", but as this condition is only checked for scalar writes this does not seem to be the case. I also tried if your patch affects this test, but in my experiments it passes with and without the remaining patch applied. Could you help me to understand how this test case is affected by your change? 

[1] http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-variables


http://reviews.llvm.org/D13487