[PATCH] D123537: [VPlan] Model first exit values in VPlan.

Mon May 2 00:15:09 PDT 2022

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4079
                         VPIteration(UF - 1, Lane));
     LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);
   }
----------------
fhahn wrote:
> Ayal wrote:
> > Is this made redundant by ScalarExitValues?
> It is in the latest version of the patch. One wrinkle is that we need to update the ScalarExitValue instructions in the the plan for the epilogue loop. At the moment this is done by removing the exit values referring the LCSSA phi's for the main loop's exits and adding exit values for the newly created exit block. 
> 
> That unfortunately requires exposing a bit more functionality to be able to do so.
Perhaps better to progress gradually - first introduce loop-closed phi recipes inside Middle block (admittedly affecting many tests?), and model VPlan live-out VPUsers + update VPlan's fixLCSSAPHIs() afterwards?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8750
+  // Introduce recipes modeling the exit values.
+  Builder.setInsertPoint(MiddleVPBB, MiddleVPBB->getFirstNonPhi());
+  if (BasicBlock *ExitBB = OrigLoop->getUniqueExitBlock()) {
----------------
fhahn wrote:
> Ayal wrote:
> > ScalarExitValue represents a (loop-closed) phi, and as such can be inserted at the top of MiddleVPBB?
> Added a comment, thanks!
> ScalarExitValue represents a (loop-closed) phi, and as such can be inserted at the top of MiddleVPBB?

Ah, original comment suggested that insert point could alternatively be set to MiddleVPBB->begin() instead of MiddleVPBB->getFirstNonPhi() for  ScalarExitValue. But that is incorrect - ScalarExitValue currently represents the rewiring of a loop-value to a loop-closed phi residing in Exit block, rather than a loop-closed phi in Middle block, along with an implicit potential extract - which must appear in MiddleBlock following all its phi's.

A recipe is designed to take care of generating an instruction (one or more) and hooking it up to its operands, rather than taking care of its users.

The term "live-out" may be ambiguous - referring to values defined inside the loop body and used outside of it, and/or values defined inside VPlan's scope and used outside of it. By definition, VPlan live-outs should be represented using a derivative of VPUser that is not a recipe - they must have underlying Instruction/User, which need to be rewired, but belong to no VPBB, potentially held by VPlan itself; analogous to VPlan live-ins represented using VPValues that are not part of any recipe's VPDef, and held in VPlan's VPExternalDefs.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8759
+          ExitPhi.getIncomingValueForBlock(OrigLoop->getLoopLatch());
+      VPValue *V = Plan.getOrAddVPValue(IncomingValue);
+      if (SkipExitValues.count(V))
----------------
fhahn wrote:
> Ayal wrote:
> > >> This should get rather than add, right?
> > 
> > > We could have loop invariant incoming values which won't be modeled in VPlan, hence getOrAddVPValue.
> > 
> > Hmm, IncomingValue is initially an IR Instruction inside the loop feeding a loop-closed phi; if it is loop invariant (missed pre-LV LICM) then LV could LICM and move it to the pre-header (analogous to hoisting up a live-out IV) but instead VPlan keeps it in place wrapping it up with a uniform ReplicateRecipe?
> Yes that would be an option. It is likely only relevant for handwritten tests.
Original comment just notes that this should get rather than add, even if incoming value is loop invariant.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:333
     State->CFG.PrevBB = NewBB;
+    State->Builder.SetInsertPoint(NewBB, NewBB->begin());
   } else if (PrevVPBB && /* A */
----------------
fhahn wrote:
> Ayal wrote:
> > Sounds right, but is this really needed - ScalarExitValue recipes seem to only hook-up loop-closed phis (at this time), w/o generating any instructions using Builder?
> We might need to insert extracts I think.
Ah, right! Better to dispel this magic rather than place a recipe in a given VPBB for no apparent reason.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:813
+  case VPInstruction::ScalarExitValue: {
+    if (Part != 0)
+      break;
----------------
Ayal wrote:
> nit: seems more logical to generate code when called for the last part rather than the first, bailing out here `if (Part != State.UF - 1)`, and using Part later rather than State.UF - 1?
sounds right?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:821
+        State.get(getOperand(0), VPIteration(State.UF - 1, Lane)),
+        State.CFG.VPBB2IRBB[getParent()]);
+    break;
----------------
fhahn wrote:
> Ayal wrote:
> > Placing ScalarExitValues inside middleBlock suggests they represent loop-closed phi's that will appear there, but actually the loop-closed phi's reside all along in Exit block, and use ScalarExitValue to retrieve its parent - middleBlock when rewiring its operand(?)
> > 
> > Test pr51366 below shows a ScalarExitValue inside 'exit' block which follows 'middleBlock'. Wonder if they are meant to be fused, provided 'exit' block has no other predecessor, or should ScalarExitValue introduce a loop-closed phi in middleBlock instead?
> > Placing ScalarExitValues inside middleBlock suggests they represent loop-closed phi's that will appear there, but actually the loop-closed phi's reside all along in Exit block, and use ScalarExitValue to retrieve its parent - middleBlock when rewiring its operand(?)
> 
> The recipe doesn't create new LCSSA phis, but only updates existing ones. It may need to insert extracts, which should be placed in the middle block.
> 
> > Test pr51366 below shows a ScalarExitValue inside 'exit' block which follows 'middleBlock'. Wonder if they are meant to be fused, provided 'exit' block has no other predecessor, or should ScalarExitValue introduce a loop-closed phi in middleBlock instead?
> 
> I think the `exit` block can have multiple predecessors, in the test case it also has the latch of the scalar loop as predecessor. We could also generate an LCSSA phi in the middle block, but it doesn't seem needed currently.
>> Placing ScalarExitValues inside middleBlock suggests they represent loop-closed phi's that will appear there, but actually the loop-closed phi's reside all along in Exit block, and use ScalarExitValue to retrieve its parent - middleBlock when rewiring its operand(?)

> The recipe doesn't create new LCSSA phis, but only updates existing ones. It may need to insert extracts, which should be placed in the middle block.

Let's try to use recipes, represent VPlan live-outs and model loop-closed phi's more explicitly and clearly.

>> Test pr51366 below shows a ScalarExitValue inside 'exit' block which follows 'middleBlock'. Wonder if they are meant to be fused, provided 'exit' block has no other predecessor, or should ScalarExitValue introduce a loop-closed phi in middleBlock instead?

> I think the exit block can have multiple predecessors, in the test case it also has the latch of the scalar loop as predecessor. We could also generate an LCSSA phi in the middle block, but it doesn't seem needed currently.

Generating an LCSSA phi in the middle block, using a dedicated recipe/VPInstruction, seems most reasonable - it would in turn feed the original loop-closed phi of the Exit block (along with an extract if needed), represented as a VPlan live-out VPUser?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123537/new/

https://reviews.llvm.org/D123537