[PATCH] D116123: [VPlan] Handle IV vector splat as VPInstruction.

Mon Jan 3 09:03:07 PST 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2543
   Value *Step = CreateStepValue(ID.getStep());
   if (State.VF.isZero() || State.VF.isScalar()) {
     Value *ScalarIV = CreateScalarIV(Step);
----------------
Ayal wrote:
> (assert !State.VF.isZero()? Indep. of this patch.)
Done in e2f1c4c7066b

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2546
+    for (unsigned Part = 0; Part < UF; ++Part) {
+      assert(!State.VF.isScalable() && "scalable vectors not yet supported.");
+      Value *EntryPart;
----------------
Ayal wrote:
> assert outside loop.
> This is handling the unrolling-only case, does it matter whether vectors are scalable or not?
Moved!

> This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

Not sure why the assert is here, but I think it should be retained in this patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2550
+        Value *StartIdx =
+            getRuntimeVFAsFloat(Builder, Step->getType(), State.VF * Part);
+        // Floating-point operations inherit FMF via the builder's flags.
----------------
Ayal wrote:
> State.VF == 1?
> Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?
Replaced call.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2557
+        Value *StartIdx =
+            getRuntimeVF(Builder, Step->getType(), State.VF * Part);
+        EntryPart = Builder.CreateAdd(
----------------
Ayal wrote:
> ditto
Replaced call

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2568
   }

   // Determine if we want a scalar version of the induction variable. This is
----------------
Ayal wrote:
> Clarifying the three options for VF>1:
> 1. Create a vector IV (createVectorIntOrFpInductionPHI)
> 2. Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps)
> 3. both 1&2
> which complements the above option for VF=1:
> 0. Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined)
> 
> It may be clearer to have
> 
> ```
>   auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal);
>   if (NeedsVectorIV)
>     createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);
> 
>   auto NeedsScalarIV = needsScalarInduction(EntryVal);
>   if (NeedsScalarIV) {
>     Value *ScalarIV = CreateScalarIV(Step);
>     buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
>   }
> 
> ```
> 
> Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below? 
> Clarifying the three options for VF>1:

Exactly! I restructured the code as suggested, thanks.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8403
+        VPValue *StepVector = VPUtils::getCanonicalStepVector(CanIV);
+        if (!StepVector) {
+          auto *One =
----------------
Ayal wrote:
> Is anything but CanIV expected to feed getCanonicalStepVector?
> Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.
> Sounds like Plan->getOrCreateCanonicalStepVector()?
> 
> Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?
> Is anything but CanIV expected to feed getCanonicalStepVector?
> Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.

Yes that was the original goal. The code is now gone, the stepvector may be replace the widened canonical IV created unconditional here.

> Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

I think modeling the stepvector separately allows us to re-use it for vector induction increments and optionally defining an additional value would make it a bit harder to first add the stepvector and then optimize it away; with a separate recipe for the stepvector we can simplify replace the uses and remove the step recipe. With multi-defs we would need to either undef one of the multi-defs or replace the multi-def recipe with a single def version.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8791
+
+  return LoopVectorizationPlanner::getDecisionAndClampRange(
+      [&](ElementCount VF) {
----------------
Ayal wrote:
> If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.
The issue here is that for VF == 1 there's no need for a step-vector. We could define step-vector to just produce a scalar for VF = 1 though? Later patches with a separate recipe for scalar-steps would handle this as well.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9125
+  // Replace StepVector recipes of the canonical induction by the widened vector
+  // induction, if it exists.
   // ---------------------------------------------------------------------------
----------------
Ayal wrote:
> place comment next to call.
the comment is stale now, removed

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9196
+        return RecipeBuilder.onlyScalarStepsNeeded(I, Range);
+      },
+      Legal->getPrimaryInduction());
----------------
Ayal wrote:
> Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?
All information to take the decision is already available in the VPlan. We only need to know whether the tail will be folded and check check the users of the induction recipe in the plan.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8419
     VPValue *IV = nullptr;
-    if (Legal->getPrimaryInduction())
-      IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
-    else {
+    if (!Legal->getPrimaryInduction()) {
       VPBasicBlock *HeaderVPBB = Plan->getEntry()->getEntryBasicBlock();
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
> > > 
> > > At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).
> > > Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
> > 
> > We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
> >> Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
> 
> > We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
> 
> sure, can also do so separately first. It may help simplify this patch, among other things.
Update to always generate the widened canonical induction.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8442
+        IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
+      }
       BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV});
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > fhahn wrote:
> > > > fhahn wrote:
> > > > > fhahn wrote:
> > > > > > Ayal wrote:
> > > > > > > No need for {}
> > > > > > > 
> > > > > > > So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > > > 
> > > > > > > BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> > > > > > > No need for {}
> > > > > > 
> > > > > > I think if the body spans multiple lines due to a comment, braces should not be omitted
> > > > > > https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
> > > > > > 
> > > > > > > So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > > 
> > > > > > Yes, let me try to split that off. `ActiveLaneMask` only needs the first scalar step of each `Part`.
> > > > > > 
> > > > > > > BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> > > > > > 
> > > > > > done separately in 2e630eabd329
> > > > > >> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > >Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.
> > > > > 
> > > > > I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I;ll push one and update the patch.
> > > > I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right?
> > > > 
> > > > The issue with that is that in some cases `ActiveLaneMask` will use the step-vector instead of the scalar steps, so I left it as is for now.
> > > > 
> > > > > I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. 
> > > > 
> > > > The current patch still creates the vector phi when necessary and the existing tests cover that already.
> > > Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
> > > This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.
> > > Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
> > 
> > Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
> > Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
> 
> ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?
I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8454
+          auto *R =
+              new VPInstruction(VPInstruction::StepVector, {CanIV, OneVPV});
+          Builder.getInsertBlock()->insert(R, NewInsertionPoint);
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > StepVector needs to hold One as its second operand, mainly to convey the desired type?
> > At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.
> > At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.
> 
> Have that future patch extend the support from the initial unit step needed here?
A stepvector is also needed for incrementing vector inductions and I plan to use the recipe in a later patch, but  unfortunately there's nothing to share yet.

================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:178
+
+  bool onlyScalarStepsNeeded(Instruction *I, VFRange &Range) const;
 };
----------------
Ayal wrote:
> document
Moved definition out of VPRecipeBuilder and removed this here.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:741
+             "Val changed for Part");
+      Value *Step = State.get(getOperand(1), VPIteration(Part, 0));
+      Value *StartIdx;
----------------
Ayal wrote:
> `Value *Step = State.get(getOperand(1), Part);` ?
see above.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:751
+      Value *EntryPart =
+          getStepVector(Broadcasted, StartIdx, Step, Instruction::Add, State.VF,
+                        State.Builder);
----------------
Ayal wrote:
> Does Instruction::Add also work for FP inductions?
stepvector is not used for floats yet, added an assert

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:746
+    for (unsigned Part = 0; Part < State.UF; ++Part) {
+      assert(!State.VF.isScalable() && "scalable vectors not yet supported.");
+      Value *StartIdx;
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > assert at the outset? (While we're here, irrespective of this patch)
> > Move out of the loop and will push a fix separately.
> `assert(Val == State.get(getOperand(0), Part) &&  "Val changed for Part");` ?
Unfortunately this won't work as expected in this case, because the operand is an constant wrapped in a VPValue and State is not explicitly set. Currently State.get will broadcast the constant in a vector when using State.get(getOperand(1), Part); and return scalar when using VPIteration(Part, 0).

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:753
+        StartIdx =
+            getRuntimeVF(State.Builder, Step->getType(), State.VF * Part);
+
----------------
Ayal wrote:
> Wrap this inside a common interface that checks the type of step and returns the desired Value? (ditto)
At the moment there's no need for float support here. I added an assert.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2746
+
+  static VPInstruction *getCanonicalStepVector(VPCanonicalIVPHIRecipe *R);
+};
----------------
Ayal wrote:
> There's only a single VPCanonicalIVPHIRecipe per Plan to be passed in, may as well have getCanonicalStepVector() obtain it from Plan?
This is not needed any longer in the current version of the patch. Removed.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:344
+    if (IV->getUnderlyingValue() == OriginalCanonicalIV &&
+        !OnlyScalarStepsNeeded(IV->getUnderlyingInstr())) {
+      CanonicalStepVector->replaceAllUsesWith(IV);
----------------
Ayal wrote:
> Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV).
> Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV).
> But perhaps better to check
>   if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV())
> as mentioned above.
This has been completely removed and replaced by a different transform.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanValue.h:333
     VPWidenSelectSC,
+    VPStepVectorSC,

----------------
Ayal wrote:
> Lex order
This is a leftover from an earlier iteration and has been removed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116123/new/

https://reviews.llvm.org/D116123