[PATCH] D121617: [LV] Move code to place induction increment to VPlan post-processing.

Sun Mar 27 13:25:25 PDT 2022

Ayal accepted this revision.
Ayal added a comment.
This revision is now accepted and ready to land.

A good step forward in refactoring and simplifying induction variable recipes, raising thoughts for follow-up steps. Ship it!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9543
+  // block cannot be used, as it does not exist yet.
+  VecInd->addIncoming(LastInduction, State.CFG.VectorPreHeader);
 }
----------------
VectorPreHeader is definitely an incorrect BB for the backedge incoming value, but it is the correct BB for the start value.
Perhaps the current/header BB would be a safer temporary substitute (which must be non-null), to avoid confusion?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9533
+  // Add induction update using an incorrect block temporarily. The phi node
+  // will be fixed after VPlan execution.
+  VecInd->addIncoming(LastInduction, State.CFG.VectorPreHeader);
----------------
fhahn wrote:
> Ayal wrote:
> > This goes back to D22416.
> > Simply adding this last step here with its current block, as done prior to D22416, may be incorrect - when header and latch are distinct due to replicated regions? (May be good to (have a) test)
> > 
> > Would be good to explain that the last induction step, which feeds PHI, cannot be added yet to PHI properly, because that requires the latch block - which may not have been formed yet. An alternative to an incorrect addIncoming() may be to record LastInduction temporarily inside its VPWidenIntOrFpInductionRecipe (as it's operand(?) - see below), until it can be added to PHI properly. Given that LastInduction is eventually placed in the latch, another alternative may be to designate a recipe for generating LastInduction, placing it in the latch, so that when the recipe executes it could add LastInduction properly to PHI. Listing these alternatives mainly for the sake of completeness...
> > This goes back to D22416.
> > Simply adding this last step here with its current block, as done prior to D22416, may be incorrect - when header and latch are distinct due to replicated  regions? (May be good to (have a) test)
> 
> I think there are 2 distinct things going on. 1) moving the induction to the latch block and 2) adding the incoming value using the latch block. We could stop doing 1), without impacting correctness, but 2) is needed to generate valid IR in the presence of replicate regions. This should be handled by multiple existing tests.
> 
> > Would be good to explain that the last induction step, which feeds PHI, cannot be added yet to PHI properly, because that requires the latch block - which may not have been formed yet.
> 
> Done!
> 
> >  An alternative to an incorrect addIncoming() may be to record LastInduction temporarily inside its VPWidenIntOrFpInductionRecipe (as it's operand(?) - see below), until it can be added to PHI properly
> 
> Yes, but I think we would need to introduce a new VPValue for the increment, so we can map it properly. Adding the PHI as operand to itself wouldn't work I think, because in State we can only store values for each part, but we need the value of the last part + VF. Alternatively we could record the IR value in the recipe. But both of those seem like similar workarounds, but seems slightly more complicated. Another thing to note is that the CFG/IR LV generates during codegen is already temporarily invalid.
> 
> 
> >  Given that LastInduction is eventually placed in the latch, another alternative may be to designate a recipe for generating LastInduction, placing it in the latch, so that when the recipe executes it could add LastInduction properly to PHI. Listing these alternatives mainly for the sake of completeness...
> 
> Agreed, but it would add extra clutter to the plan and also the code needed. Please let me know if any of the alternative is strictly preferable to the current approach.
> 
> > This goes back to D22416.
> > Simply adding this last step here with its current block, as done prior to D22416, may be incorrect - when header and latch are distinct due to replicated  regions? (May be good to (have a) test)
> 
> I think there are 2 distinct things going on. 1) moving the induction to the latch block and 2) adding the incoming value using the latch block. We could stop doing 1), without impacting correctness, but 2) is needed to generate valid IR in the presence of replicate regions. This should be handled by multiple existing tests.
> 
> > Would be good to explain that the last induction step, which feeds PHI, cannot be added yet to PHI properly, because that requires the latch block - which may not have been formed yet.
> 
> Done!
> 
> >  An alternative to an incorrect addIncoming() may be to record LastInduction temporarily inside its VPWidenIntOrFpInductionRecipe (as it's operand(?) - see below), until it can be added to PHI properly
> 
> Yes, but I think we would need to introduce a new VPValue for the increment, so we can map it properly. Adding the PHI as operand to itself wouldn't work I think, because in State we can only store values for each part, but we need the value of the last part + VF. Alternatively we could record the IR value in the recipe. But both of those seem like similar workarounds, but seems slightly more complicated. Another thing to note is that the CFG/IR LV generates during codegen is already temporarily invalid.
> 
> 
> >  Given that LastInduction is eventually placed in the latch, another alternative may be to designate a recipe for generating LastInduction, placing it in the latch, so that when the recipe executes it could add LastInduction properly to PHI. Listing these alternatives mainly for the sake of completeness...
> 
> Agreed, but it would add extra clutter to the plan and also the code needed. Please let me know if any of the alternative is strictly preferable to the current approach.
> 

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:990

+    // Set the correct incoming block for backedge values and move induction to
+    // latch.
----------------
fhahn wrote:
> Ayal wrote:
> > fhahn wrote:
> > > Ayal wrote:
> > > > Could the incoming value to an induction be added similar to adding such values for reductions and FORs below, which have distinct values across the backedge? I.e., another case of SinglePartNeeded? Could PhiR->getBackedgeValue() work - if the induction PhiR is added as an (second) operand of itself?
> > > > 
> > > > Moving the last induction step to the end of the latch could then be done separately.
> > > > Could the incoming value to an induction be added similar to adding such values for reductions and FORs below, which have distinct values across the backedge? I.e., another case of SinglePartNeeded? Could PhiR->getBackedgeValue() work - if the induction PhiR is added as an (second) operand of itself?
> > > 
> > > Unfortunately I don't think so, without further changes. The recipe for the phi is currently mapped to the vector IV values for each part. The increment is the last part + VF. So we either would have to turn VPWidenIntOrFpInductionRecipe into a  multi-def or store the LLVM IR value in the recipe directly. Both seem like bigger workarounds in my opinion.
> > (That depends on one's definition of "workaround" ;-)
> > 
> > It would be good if the modeling in VPlan of inductions and header phi's is more consistent. This could perhaps be done later, in subsequent patches. Trying to capture the issue and potential resolutions:
> > 
> > VPHeaderPHIRecipe offers an interface for recording the backedge value as an operand, used by the Canonical IV and other header phi recipes which (must) have distinct backedge VPValues. Their modeling in VPlan thus matches the IR they generate. During VPlan execution, phi's are first generated with start values only, when traversing from header to latch, and are later revisited to add their backedge incoming values after the latch was generated.
> > 
> > VPWidenIntOrFpInductionRecipe is currently not a VPHeaderPHIRecipe, it does not support an explicit backedge value. VPWidenIntOrFpInductionRecipe currently represents both the header phi - explicitly (with UF per-part bumps) and its bump - implicitly; it is conceptually a multi-def recipe, but uses its bump only to feed itself, w/o exposing it to any other user. This implicit bump must wait until the latch is generated in order to be fully rewired to the header phi, but can be partially/incorrectly rewired to the header phi temporarily.
> > 
> > VPCanonicalIVPHIRecipe is currently a VPHeaderPHIRecipe; it represents only the header phi, having a separate CanonicalIVIncrement recipe designated to represent its bump, which feeds both VPCanonicalIVPHIRecipe and BranchOnCount; the canonical IV admittedly has an easier bump to generate than int/fp/pointer inductions. Both header phi and bump can alternatively be represented by a single multi-def recipe, as noted for VPWidenIntOrFpInductionRecipe.
> > 
> > Should VPWidenIntOrFpInductionRecipe expose the bump that feeds it across the backedge, for other potential users, and for transparency/consistency?
> > Should VPWidenIntOrFpInductionRecipe become a VPHeaderPHIRecipe, by supporting a backedge VPValue?
> > Should the rewiring of header phi's be delegated to VPHeaderPHIRecipe, say by adding a second 'execute' method to it, to be called after the latch is constructed, here or at the end of Region::execute?
> > It would be good if the modeling in VPlan of inductions and header phi's is more consistent. This could perhaps be done later, in subsequent patches. Trying to capture the issue and potential resolutions:
> 
> Agreed, we should work towards making those recipes as consistent as we can!
> 
> > Should VPWidenIntOrFpInductionRecipe expose the bump that feeds it across the backedge, for other potential users, and for transparency/consistency?
> 
> IMO I think it would be better the other way around, fold the increment of `VPCanonicalIVPHIRecipe` back into the recipe. The PHI recipes could model the bump as additional VPValue (i.e. turn them into multi-defs). This would allow us to keep the `induction` concept encapsulated in single recipes and the plans themselves more concise.
> 
> > Should VPWidenIntOrFpInductionRecipe become a VPHeaderPHIRecipe, by supporting a backedge VPValue?
> 
> Yes I think so. Should be easy now that` VPWidenIntOrFpInductionRecipe` always creates a phi.
> 
> > Should the rewiring of header phi's be delegated to VPHeaderPHIRecipe, say by adding a second 'execute' method to it, to be called after the latch is constructed, here or at the end of Region::execute?
> 
> Yes that would be a good option to move that logic to something like `VPHeaderPHIRecipe::fixupPhi`.
> 
> Updating this patch to turn `VPWidenIntOrFpInductionRecipe` into a multi-def should be relatively straight forward. Please let me know if you'd prefer to do this in the current patch or as follow up.
> 
> > Should VPWidenIntOrFpInductionRecipe expose the bump that feeds it across the backedge, for other potential users, and for transparency/consistency?
> 
> IMO I think it would be better the other way around, fold the increment of `VPCanonicalIVPHIRecipe` back into the recipe. The PHI recipes could model the bump as additional VPValue (i.e. turn them into multi-defs). This would allow us to keep the `induction` concept encapsulated in single recipes and the plans themselves more concise.

Agreed. It's the same way - turning VPWidenIntOrFpInductionRecipe into a multi-def is one way of exposing its bump.
Note that enabling other potential users of the exposed bump deserves attention when sinking it to the latch.

> Updating this patch to turn `VPWidenIntOrFpInductionRecipe` into a multi-def should be relatively straight forward. Please let me know if you'd prefer to do this in the current patch or as follow up.

Follow up is fine, preferably with some TODO/FIXME left behind to indicate expected changes.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121617/new/

https://reviews.llvm.org/D121617