[PATCH] D13487: [Polly] Load/Store scalar accesses before/after the statement itself

Tue Oct 13 13:24:07 PDT 2015

grosser added a comment.

Hi Michael,

thank you for this quick and detailed answer. I added my comments inline. One general question I have if the original reason for this patch was to somehow handle the inter-non-affine-scop statement scalar dependences visible e.g. in non-affine-phi-node-expansion-4.ll? Meaning: if we do not generate them (as they are clearly not needed), would this possibly even remove the need for the entire patch? Or is the patch needed even if we drop them?

Best,
Tobias


================
Comment at: lib/CodeGen/BlockGenerators.cpp:1150
@@ -1144,1 +1149,3 @@
+        Builder.CreateLoad(Address, Address->getName() + ".reload");
+  }
 }
----------------
Meinersbur wrote:
> grosser wrote:
> > It is not yet clear why we need a special implementation here. If I replace this code with just a call to BlockGenerator::generateScalarLoads(Stmt, BBMap)" all tests pass.
> > 
> > You comment here:
> > 
> > >	    // Only generate loads for PHIs in the entry block. Intra-ScopStmt PHIs are
> > >	    // copied directly.
> > 
> > but this comment does not seem to be related to the if-condition that follows, as inter-ScopStmt PHIs are not even modeled and do consequently not need to be skipped.
> > 
> > Did you assume we model Intra-ScopStmt PHI nodes or was there another reason for including this code that I missed?
> Considering that I encountered intra-scopstmt writes I assumed that there will also be intra-scop reads. If this assumption is wrong we could indeed fall through BlockGenerator::generateScalarLoads directly.
I might be wrong (even though I do not yet understand why). So let's try to get it right.

What do you mean by intra-scopstmt writes? Are these scalar write statements that are read in the same scopstmt? Did they appear only in RegionStmts and with PHI nodes? I am looking for an example that makes me understand what you mean. Does this pattern appear in one of the test cases that changed? Or are you talking about the in necessary read-write chains we see in non-affine-phi-node-expansion-4.ll?

What would be a intra-scopstmt read? 'intra' suggests at least two elements that are somehow related. So maybe two scalar reads from the same address? Such may indeed happen. The only thing that will not happen is a PHI-read of a PHI-node that is part of the same non-affine region statement.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:1210
@@ -1167,1 +1209,3 @@
+               !DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB))
+      continue;
 
----------------
Meinersbur wrote:
> grosser wrote:
> > > As mentioned in the phone call and a message to the mailing list, the current code does generate such MemoryAccesses. Some later planned change will hopefully make this check go away.
> > 
> > Michael, without giving an example this is hard to understand. I commented out the code and tried to generate all scalar writes in the exit block, but this failed as you promised. I give now an example:
> > 
> > ```
> >                bb1:
> >                br non-affine bb2, bb3
> > 
> > bb2:                                       bb3:
> >   a =                                      b =
> >   br exit                                  br exit
> > 
> > 
> > exit:
> >   val = phi [a, %bb2], [b, %bb3]
> > ```
> > 
> > For this example we create the RegionStmt bb1->exit which has two PHI writes. It is important that these writes happen in bb2 and bb3. Neither the values written may dominate the exit block nor can we choose which value to write without knowing if the control flow passed through bb2 or bb3. Hence for PHI-node writes we actually need to create them in the basic block statement they are created in (this will always be an exiting statement of the Region). Am I correct this is the reason for these PHI-node special cases?
> > 
> > I tried to simplify the conditions above and came up with the following code which still  passes all tests:
> > 
> > ```
> >   for (MemoryAccess *MA : Stmt) {                                                 
> >     if (MA->isExplicit() || MA->isRead())                                         
> >       continue;                                                                   
> >                                                                                   
> >     Instruction *ScalarInst = MA->getAccessInstruction();                         
> >     Value *Val = MA->getAccessValue();                                            
> >                                                                                   
> >     // In case we add the store into an exiting block, we need to restore the     
> >     // position for stores in the exit node.                                      
> >     auto SavedInsertionPoint = Builder.GetInsertPoint();                          
> > 
> >     BasicBlock *ExitingBB = ScalarInst->getParent();                              
> >     BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];                              
> >     Builder.SetInsertPoint(ExitingBBCopy->getTerminator());                       
> >                                                                                   
> >     auto Address = getOrCreateAlloca(*MA);                                        
> >                                                                                   
> >     Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);                            
> >     Builder.CreateStore(Val, Address);                                            
> >                                                                                   
> >     // Restore the insertion point if necessary.                                  
> >     Builder.SetInsertPoint(SavedInsertionPoint);                                  
> >   } 
> > ```
> > 
> > Did I miss something important or are these simplifications correct?
> > 
> > Also, is it correct that we code-generate all PHI nodes at their normal statement locations? Hence, the copy-out code for PHI-nodes does not introduce any functional change?
> There can be (unecessary) writes to a.s2a and b.s2a which are not necessarily in the exit(ing) block, i.e. somewhere in the middle. This violated the intention I had to not write to (not yet existing) MAPPED scalars in the middle of ScopStmt, making intra-ScopStmt lifetime analysis necessary.
> 
> Stores in the BBs appear in the reverse order. Not a problem from my POV though.
> 
> I don't know whether it is possible that there already is code in the ExitingBBCopy when it has been split up to insert the non-affine region. If yes, building code at the terminator might be after the instruction that have been before.
> 
> I wouldn't think too much about this code, I will make it go away in the patch I am working on at the moment.
> There can be (unecessary) writes to a.s2a and b.s2a which are not necessarily in the exit(ing) block, i.e. somewhere in the middle. This
> violated the intention I had to not write to (not yet existing) MAPPED scalars in the middle of ScopStmt, making intra-ScopStmt lifetime 
> analysis necessary.

Probably this was already explained in earlier discussions, but I do not yet fully understand what "MAPPED scalars" will be (probably the once that will be folded into array loads). To get a better feeling let me try to understand the intention of your changes. As far as I understand not inserting a.s2a and b.s2a somewhere in the middle of a non-affine region statement is important, but dropping the .s2a is not, right? Is the following code sufficient for what you need. It should generate all s2a in the exit block, but does still generate all s2a statements that we currently model.

```
for (MemoryAccess *MA : Stmt) {                                                 
  if (MA->isExplicit() || MA->isRead())                                         
    continue;                                                                   
                                                                                
  Instruction *ScalarInst = MA->getAccessInstruction();                         
  Value *Val = MA->getAccessValue();                                            
                                                                                
  // In case we add the store into an exiting block, we need to restore the     
  // position for stores in the exit node.                                      
  auto SavedInsertionPoint = Builder.GetInsertPoint();                          

  if (MA->isPHI()) {
    BasicBlock *ExitingBB = ScalarInst->getParent();                              
    BasicBlock *ExitingBBCopy = BlockMap[ExitingBB];                              
    Builder.SetInsertPoint(ExitingBBCopy->getTerminator());                       
  }
                                                                                
  auto Address = getOrCreateAlloca(*MA);                                        
                                                                                
  Val = getNewScalarValue(Val, R, Stmt, LTS, BBMap);                            
  Builder.CreateStore(Val, Address);                                            
                                                                                
  // Restore the insertion point if necessary.
  if (MA->isPHI() {
    Builder.SetInsertPoint(SavedInsertionPoint);
  }
}
```
(I mostly try to understand what are the requirements for your later patches)



> I don't know whether it is possible that there already is code in the ExitingBBCopy when it has been split up to insert the non-affine region.
> If yes, building code at the terminator might be after the instruction that have been before.\

Good point, but there should be no such code (besides branch statements and other control-flow conditions that would be unrelated).

> I wouldn't think too much about this code, I will make it go away in the patch I am working on at the moment.

OK. Though I still try to be a little detailed to make sure I get a better feeling of the direction you are aiming. I was a little high-level last week and I think that may have caused some confusion. If I know where you are going, it is easier for me to get the upcoming patches.

================
Comment at: test/Isl/CodeGen/non-affine-phi-node-expansion-3.ll:20
@@ -19,9 +19,3 @@
 ; CHECK-NEXT: %p_val2 = fadd float 1.000000e+00, 2.000000e+00
-; CHECK-NEXT: store float %p_val0, float* %merge.phiops
-; CHECK-NEXT: store float %p_val1, float* %val1.s2a
-; CHECK-NEXT: store float %p_val2, float* %val2.s2a
-
-; FIXME -> The last two writes are not really needed and can be dropped if the
-;          incoming block of the PHI and the value that is used share the same
-;          non-affine region.
+; CHECK: store float %p_val0, float* %merge.phiops
 
----------------
Meinersbur wrote:
> grosser wrote:
> > > This is a side-effect because val0 and val1 are dominating the non-affine region's exit, i.e. it is not possible to have such a store in the exit block. A planned change will clean up the logic that generates MemoryAccesses.
> > 
> > I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?
> > 
> > > I didn't not understand why check for instructions that shouldn't even be there, so I just removed the check.
> > 
> > Which check are you talking about? I could not find a check in Polly's source code that was dropped.
> > 
> > 
> > Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.
> > I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-in-statement dependency property you want to establish?
> 
> Because this patch moves the store from the end of the BB where it has been defined to the region exit, where the value's definition is not necessarily dominating.
> 
> Taking your example:
> 
> 
> ```
>                bb1:
>                br non-affine bb2, bb3
> 
> bb2:                                       bb3:
>   a =                                      b =
>   <store %a, %a.s2a>
>   br bb4                                  br exit
> 
> bb4:
>   br exit:
> 
> exit:
>   val = phi [a, %bb4], [b, %bb3]
>   <store %a, %a.s2a>
> ```
> 
> The store can be in bb2, but not in exit. It cannot be in bb2 because of my intention to not have any writes in the middle of ScopStmts. It doesn't need to be written anyway because there can be no use of %a (i.e. without PHIs) beyond bb2/4.
> 
> 
> > Which check are you talking about? I could not find a check in Polly's source code that was dropped.
> 
> ; CHECK-NEXT: store float %p_val1, float* %val1.s2a
> ; CHECK-NEXT: store float %p_val2, float* %val2.s2a
> 
> 
> > Independent of the comments above, your test seems to be unnecessarily weak. AKA, it would also pass if the original output is generated. I think if the new output is indeed preferable, we should check that it is really generated.
> 
> What test are you talking about?
> 
> This is not meant as a unit test of any new functionality, just an adaption to changed behavior.
>> I have troubles following this reasoning. If this code was correct before, why would it now be incorrect? Would it break the no-scalar-
>> in-statement dependency property you want to establish?

> Because this patch moves the store from the end of the BB where it has been defined to the region exit, where the value's definition is not
> necessarily dominating.

> Taking your example:
> ```
>               bb1:
>              br non-affine bb2, bb3
>
> bb2:                                       bb3:
>  a =                                      b =
>  <store %a, %a.s2a>
>  br bb4                                  br exit
>
> bb4:
>   br exit:
>
> exit:
>  val = phi [a, %bb4], [b, %bb3]
>  <store %a, %a.s2a>
>
> The store can be in bb2, but not in exit. It cannot be in bb2 because of my intention to not have any writes in the middle of ScopStmts. It
> doesn't need to be written anyway because there can be no use of %a (i.e. without PHIs) beyond bb2/4.

As far as I understand the test case that triggered this discussion is not affected by this issue. I just committed test/Isl/CodeGen/non-affine-phi-node-expansion-4.ll where the value passed to the PHI node indeed does not dominate the non-affine region's exit. Looking at the code your unmodified patch creates, I see the following suspicious statements in the generated code that were not created before your patch:

```
polly.stmt.loop.entry:                            ; preds = %polly.loop_header
  %val1.s2a.reload = load float, float* %val1.s2a
  %val2.s2a.reload = load float, float* %val2.s2a
  br label %polly.stmt.loop


polly.stmt.branch2:                               ; preds = %polly.stmt.branch1
  store float %val2.s2a.reload, float* %merge.phiops
  br label %polly.stmt.backedge.exit
```

Is this a bug in your patch?

Now as I understand the issue, it seems what we want to do is to ensure we do not create scalar loads in case the (PHI-node) use is within the
same region. I think this is indeed something we should avoid, but as Johannes pointed out we probably want to fix this already at the place where we model the scop statement. Specifically, the following code should be modified to take non-affine regions into account:

```
    if (OpI) {                                                                   
      BasicBlock *OpIBB = OpI->getParent();                                      
      // As we pretend there is a use (or more precise a write) of OpI in OpBB   
      // we have to insert a scalar dependence from the definition of OpI to     
      // OpBB if the definition is not in OpBB.                                  
      if (OpIBB != OpBB) {                                                       
        addScalarReadAccess(OpI, PHI, OpBB);                                     
        addScalarWriteAccess(OpI);                                               
      } 
```

The above seems to be an independent change that should go in before this patch.

>> Independent of the comments above, your test seems to be unnecessarily weak. AKA, it
>> would also pass if the original output is generated. I think if the new output is indeed preferable,
>> we should check that it is really generated.

> What test are you talking about?
> This is not meant as a unit test of any new functionality, just an adaption to changed behavior.

This test case. It is is important that the  .s2a accesses are removed, the test should fail if this is not the case. I would just make this a CHECK-NEXT and also check for the (branch?) instruction that follows the .phiops store. (I committed this hardening already when adding non-affine-phi-node-expansion-4.ll)

================
Comment at: test/Isl/CodeGen/scev_expansion_in_nonaffine.ll:18
@@ +17,3 @@
+; CHECK:          %scevgep30 = getelementptr i32, i32* %scevgep29, i64 %65
+; CHECK:          store i32 21, i32* %scevgep30, align 8
+
----------------
Meinersbur wrote:
> grosser wrote:
> > This test case currently fails for me due to the instruction number having changed from %64 to %39. Using FileCheck Variables [1], this dependence on the instruction number can be removed.
> > 
> > 
> > ```
> > ; CHECK-LABEL:  polly.stmt.if.then.110:                                          
> > ; CHECK:          [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30         
> > ; CHECK:          [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevgep{{.*}}, i64 [[REGISTER]]
> > ; CHECK:          store i32 0, i32* [[GEP]]                                      
> >                                                                                  
> > ; CHECK-LABEL:  polly.stmt.if.else:                                              
> > ; CHECK:          [[REGISTER:%[0-9]+]] = mul i64 %polly.indvar{{.*}}, 30         
> > ; CHECK:          [[GEP:%scevgep[0-9]+]] = getelementptr i32, i32* %scevge{{.*}}, i64 [[REGISTER:%[0-9]+]]
> > ; CHECK:          store i32 21, i32* [[GEP]]            
> > ```
> > 
> > 
> > Also, I am not fully sure what you check here. The stores you are CHECKing for seem to be due to array stores. Stmt_for_body_101__TO__for_inc_114 does not have any scalar stores modeled in -polly-scops. In your comments you mention dominance, so maybe this is related to the condition "!DT.dominates(cast<Instruction>(Val)->getParent(), StmtExitBB))", but as this condition is only checked for scalar writes this does not seem to be the case. I also tried if your patch affects this test, but in my experiments it passes with and without the remaining patch applied. Could you help me to understand how this test case is affected by your change? 
> > 
> > [1] http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-variables
> It is intended to fail before the diff update where instead of RegionMaps, a single ScalarMap was used. Johannes and me thought it would be okay, but this one test was failing. I made it to ensure that when trying to remove RegionMaps again, we have a unit test fail before an LNT test fail.
> 
> You could argue that this check is unrelated. I might commit it separately.
OK. I see. Good that you added the test case. It would indeed be better to commit it ahead of time to make clear that this test has been working before and still keeps working. Feel free to commit as is without further review (but please include the regex stuff).


http://reviews.llvm.org/D13487