[PATCH] D13487: [Polly] Load/Store scalar accesses before/after the statement itself

Tue Oct 13 10:58:59 PDT 2015

Meinersbur added inline comments.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1150
@@ -1144,1 +1149,3 @@
+        Builder.CreateLoad(Address, Address->getName() + ".reload");
+  }
 }
----------------
grosser wrote:
> It is not yet clear why we need a special implementation here. If I replace this code with just a call to BlockGenerator::generateScalarLoads(Stmt, BBMap)" all tests pass.
> 
> You comment here:
> 
> >	    // Only generate loads for PHIs in the entry block. Intra-ScopStmt PHIs are
> >	    // copied directly.
> 
> but this comment does not seem to be related to the if-condition that follows, as inter-ScopStmt PHIs are not even modeled and do consequently not need to be skipped.
> 
> Did you assume we model Intra-ScopStmt PHI nodes or was there another reason for including this code that I missed?
Considering that I encountered intra-scopstmt writes I assumed that there will also be intra-scop reads. If this assumption is wrong we could indeed fall through BlockGenerator::generateScalarLoads directly.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1210
@@ -1167,1 +1209,3 @@
+               !DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB))
+      continue;
 
----------------
grosser wrote:
> > As mentioned in the phone call and a message to the mailing list, the current code does generate such MemoryAccesses. Some later planned change will hopefully make this check go away.
> 
> Michael, without giving an example this is hard to understand. I commented out the code and tried to generate all scalar writes in the exit block, but this failed as you promised. I give now an example:
> 
> ```
>                bb1:
>                br non-affine bb2, bb3
> 
> bb2:                                       bb3:
>   a =                                      b =
>   br exit                                  br exit
> 
> 
> exit:
>   val = phi [a, %bb2], [b, %bb3]
> ```
> 
> For this example we create the RegionStmt bb1->exit which has two PHI writes. It is important that these writes happen in bb2 and bb3. Neither the values written may dominate the exit block nor can we choose which value to write without knowing if the control flow passed through bb2 or bb3. Hence for PHI-node writes we actually need to create them in the basic block statement they are created in (this will always be an exiting statement of the Region). Am I correct this is the reason for these PHI-node special cases?
> 
> I tried to simplify the conditions above and came up with the following code which still  passes all tests:
> 
> ```
>   for (MemoryAccess *MA : Stmt) {                                                 
>     if (MA->isExplicit() || MA->isRead())                                         
>       continue;                                                                   
>                                                                                   
>     Instruction *ScalarInst = MA->getAccessInstruction();                         
>     Value *Val = MA->getAccessValue();                                            
>                                                                                   
>     // In case we add the store into an exiting block, we need to restore the     
>     // position for stores in the exit node.                                      
>     auto SavedInsertionPoint = Builder.GetInsertPoint();                          
> 
>     BasicBlock *ExitingBB = ScalarInst->getParent();                              
>     BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];                              
>     Builder.SetInsertPoint(ExitingBBCopy->getTerminator());                       
>                                                                                   
>     auto Address = getOrCreateAlloca(*MA);                                        
>                                                                                   
>     Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);                            
>     Builder.CreateStore(Val, Address);                                            
>                                                                                   
>     // Restore the insertion point if necessary.                                  
>     Builder.SetInsertPoint(SavedInsertionPoint);                                  
>   } 
> ```
> 
> Did I miss something important or are these simplifications correct?
> 
> Also, is it correct that we code-generate all PHI nodes at their normal statement locations? Hence, the copy-out code for PHI-nodes does not introduce any functional change?
There can be (unecessary) writes to a.s2a and b.s2a which are not necessarily in the exit(ing) block, i.e. somewhere in the middle. This violated the intention I had to not write to (not yet existing) MAPPED scalars in the middle of ScopStmt, making intra-ScopStmt lifetime analysis necessary.

Stores in the BBs appear in the reverse order. Not a problem from my POV though.

I don't know whether it is possible that there already is code in the ExitingBBCopy when it has been split up to insert the non-affine region. If yes, building code at the terminator might be after the instruction that have been before.

I wouldn't think too much about this code, I will make it go away in the patch I am working on at the moment.

================
Comment at: test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll:20
@@ -19,9 +19,3 @@
 ; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00
-; CHECK-NEXT: store float %p_val0, float* %merge.phiops
-; CHECK-NEXT: store float %p_val1, float* %val1.s2a
-; CHECK-NEXT: store float %p_val2, float* %val2.s2a
-
-; FIXME -> The last two writes are not really needed and can be dropped if the
-;          incoming block of the PHI and the value that is used share the same
-;          non-affine region.
+; CHECK: store float %p_val0, float* %merge.phiops
 
----------------
grosser wrote:
> > This is a side-effect because val0 and val1 are dominating the non-affine region's exit, i.e. it is not possible to have such a store in the exit block. A planned change will clean up the logic that generates MemoryAccesses.
> 
> I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?
> 
> > I didn't not understand why check for instructions that shouldn't even be there, so I just removed the check.
> 
> Which check are you talking about? I could not find a check in Polly's source code that was dropped.
> 
> 
> Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.
> I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?

Because this patch moves the store from the end of the BB where it has been defined to the region exit, where the value's definition is not necessarily dominating.

Taking your example:


```
               bb1:
               br non-affine bb2, bb3

bb2:                                       bb3:
  a =                                      b =
  <store %a, %a.s2a>
  br bb4                                  br exit

bb4:
  br exit:

exit:
  val = phi [a, %bb4], [b, %bb3]
  <store %a, %a.s2a>
```

The store can be in bb2, but not in exit. It cannot be in bb2 because of my intention to not have any writes in the middle of ScopStmts. It doesn't need to be written anyway because there can be no use of %a (i.e. without PHIs) beyond bb2/4.


> Which check are you talking about? I could not find a check in Polly's source code that was dropped.

; CHECK-NEXT: store float %p_val1, float* %val1.s2a
; CHECK-NEXT: store float %p_val2, float* %val2.s2a


> Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.

What test are you talking about?

This is not meant as a unit test of any new functionality, just an adaption to changed behavior.

================
Comment at: test/Isl/CodeGen/scev_expansion_in_nonaffine.ll:18
@@ +17,3 @@
+; CHECK:          %scevgep30 = getelementptr i32, i32* %scevgep29, i64 %65
+; CHECK:          store i32 21, i32* %scevgep30, align 8
+
----------------
grosser wrote:
> This test case currently fails for me due to the instruction number having changed from %64 to %39. Using FileCheck Variables [1], this dependence on the instruction number can be removed.
> 
> 
> ```
> ; CHECK-LABEL:  polly.stmt.if.then.110:                                          
> ; CHECK:          [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30         
> ; CHECK:          [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevgep{{.*}}, i64 [[REGISTER]]
> ; CHECK:          store i32 0, i32* [[GEP]]                                      
>                                                                                  
> ; CHECK-LABEL:  polly.stmt.if.else:                                              
> ; CHECK:          [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30         
> ; CHECK:          [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevge{{.*}}, i64 [[REGISTER:%[0-9]+]]
> ; CHECK:          store i32 21, i32* [[GEP]]            
> ```
> 
> 
> Also, I am not fully sure what you check here. The stores you are CHECKing for seem to be due to array stores. Stmt_for_body_101__TO__for_inc_114 does not have any scalar stores modeled in -polly-scops. In your comments you mention dominance, so maybe this is related to the condition "!DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB))", but as this condition is only checked for scalar writes this does not seem to be the case. I also tried if your patch affects this test, but in my experiments it passes with and without the remaining patch applied. Could you help me to understand how this test case is affected by your change? 
> 
> [1] http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-variables
It is intended to fail before the diff update where instead of RegionMaps, a single ScalarMap was used. Johannes and me thought it would be okay, but this one test was failing. I made it to ensure that when trying to remove RegionMaps again, we have a unit test fail before an LNT test fail.

You could argue that this check is unrelated. I might commit it separately.


http://reviews.llvm.org/D13487