[PATCH] D116123: [VPlan] Handle IV vector splat as VPInstruction.

Wed Dec 29 13:57:36 PST 2021

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8419
     VPValue *IV = nullptr;
-    if (Legal->getPrimaryInduction())
-      IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
-    else {
+    if (!Legal->getPrimaryInduction()) {
       VPBasicBlock *HeaderVPBB = Plan->getEntry()->getEntryBasicBlock();
----------------
fhahn wrote:
> Ayal wrote:
> > Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
> > 
> > At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).
> > Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
> 
> We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
>> Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

> We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

sure, can also do so separately first. It may help simplify this patch, among other things.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8442
+        IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
+      }
       BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV});
----------------
fhahn wrote:
> Ayal wrote:
> > fhahn wrote:
> > > fhahn wrote:
> > > > fhahn wrote:
> > > > > Ayal wrote:
> > > > > > No need for {}
> > > > > > 
> > > > > > So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > > 
> > > > > > BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> > > > > > No need for {}
> > > > > 
> > > > > I think if the body spans multiple lines due to a comment, braces should not be omitted
> > > > > https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
> > > > > 
> > > > > > So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > 
> > > > > Yes, let me try to split that off. `ActiveLaneMask` only needs the first scalar step of each `Part`.
> > > > > 
> > > > > > BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> > > > > 
> > > > > done separately in 2e630eabd329
> > > > >> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > >Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.
> > > > 
> > > > I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I;ll push one and update the patch.
> > > I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right?
> > > 
> > > The issue with that is that in some cases `ActiveLaneMask` will use the step-vector instead of the scalar steps, so I left it as is for now.
> > > 
> > > > I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. 
> > > 
> > > The current patch still creates the vector phi when necessary and the existing tests cover that already.
> > Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
> > This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.
> > Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
> 
> Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
> Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8454
+          auto *R =
+              new VPInstruction(VPInstruction::StepVector, {CanIV, OneVPV});
+          Builder.getInsertBlock()->insert(R, NewInsertionPoint);
----------------
fhahn wrote:
> Ayal wrote:
> > StepVector needs to hold One as its second operand, mainly to convey the desired type?
> At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.
> At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

Have that future patch extend the support from the initial unit step needed here?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2543
   Value *Step = CreateStepValue(ID.getStep());
   if (State.VF.isZero() || State.VF.isScalar()) {
     Value *ScalarIV = CreateScalarIV(Step);
----------------
(assert !State.VF.isZero()? Indep. of this patch.)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2546
+    for (unsigned Part = 0; Part < UF; ++Part) {
+      assert(!State.VF.isScalable() && "scalable vectors not yet supported.");
+      Value *EntryPart;
----------------
assert outside loop.
This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2550
+        Value *StartIdx =
+            getRuntimeVFAsFloat(Builder, Step->getType(), State.VF * Part);
+        // Floating-point operations inherit FMF via the builder's flags.
----------------
State.VF == 1?
Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2557
+        Value *StartIdx =
+            getRuntimeVF(Builder, Step->getType(), State.VF * Part);
+        EntryPart = Builder.CreateAdd(
----------------
ditto

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2568
   }

   // Determine if we want a scalar version of the induction variable. This is
----------------
Clarifying the three options for VF>1:
1. Create a vector IV (createVectorIntOrFpInductionPHI)
2. Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps)
3. both 1&2
which complements the above option for VF=1:
0. Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined)

It may be clearer to have

```
  auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal);
  if (NeedsVectorIV)
    createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);

  auto NeedsScalarIV = needsScalarInduction(EntryVal);
  if (NeedsScalarIV) {
    Value *ScalarIV = CreateScalarIV(Step);
    buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
  }

```

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below? 

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8400
       VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
+      if (!IV) {
+        auto *CanIV = Plan->getCanonicalIV();
----------------
(If a VPWidenCanonicalIVRecipe is always constructed then IV will always be set to it here and this construction is not needed?)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8403
+        VPValue *StepVector = VPUtils::getCanonicalStepVector(CanIV);
+        if (!StepVector) {
+          auto *One =
----------------
Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.
Sounds like Plan->getOrCreateCanonicalStepVector()?

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8791
+
+  return LoopVectorizationPlanner::getDecisionAndClampRange(
+      [&](ElementCount VF) {
----------------
If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9125
+  // Replace StepVector recipes of the canonical induction by the widened vector
+  // induction, if it exists.
   // ---------------------------------------------------------------------------
----------------
place comment next to call.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9196
+        return RecipeBuilder.onlyScalarStepsNeeded(I, Range);
+      },
+      Legal->getPrimaryInduction());
----------------
Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?

================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:178
+
+  bool onlyScalarStepsNeeded(Instruction *I, VFRange &Range) const;
 };
----------------
document

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:741
+             "Val changed for Part");
+      Value *Step = State.get(getOperand(1), VPIteration(Part, 0));
+      Value *StartIdx;
----------------
`Value *Step = State.get(getOperand(1), Part);` ?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:751
+      Value *EntryPart =
+          getStepVector(Broadcasted, StartIdx, Step, Instruction::Add, State.VF,
+                        State.Builder);
----------------
Does Instruction::Add also work for FP inductions?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:741
+    Value *Step = State.get(getOperand(1), VPIteration(0, 0));
+    Value *Val = State.get(getOperand(0), VPIteration(0, 0));
+    Value *Broadcasted =
----------------
fhahn wrote:
> Ayal wrote:
> > get operands in order?
> > Store singleton scalar values per part 0, instead of per lane 0?
> Changed to get `VPIteration(Part, 0)`. I kept accessing `Val`  outside the loop, to keep a single broadcast. But there's now an assert.
Val >> a more informative name: CanonicalIV? ScalarFirstLanesOfIV?

The single scalar value of the scalar canonical IV chain can be recorded per-part (0 or all) throughout: 
`Value *Val = State.get(getOperand(0), 0)`?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:746
+    for (unsigned Part = 0; Part < State.UF; ++Part) {
+      assert(!State.VF.isScalable() && "scalable vectors not yet supported.");
+      Value *StartIdx;
----------------
fhahn wrote:
> Ayal wrote:
> > assert at the outset? (While we're here, irrespective of this patch)
> Move out of the loop and will push a fix separately.
`assert(Val == State.get(getOperand(0), Part) &&  "Val changed for Part");` ?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2746
+
+  static VPInstruction *getCanonicalStepVector(VPCanonicalIVPHIRecipe *R);
+};
----------------
There's only a single VPCanonicalIVPHIRecipe per Plan to be passed in, may as well have getCanonicalStepVector() obtain it from Plan?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:856
+  }
+
   unsigned getOpcode() const { return Opcode; }
----------------
fhahn wrote:
> Ayal wrote:
> > Where are these new classof's needed?
> Unfortunately this is needed to `dyn_cast` directly from VPUser -> VPInstruction
document with said need?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:344
+    if (IV->getUnderlyingValue() == OriginalCanonicalIV &&
+        !OnlyScalarStepsNeeded(IV->getUnderlyingInstr())) {
+      CanonicalStepVector->replaceAllUsesWith(IV);
----------------
Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV).
Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV).
But perhaps better to check
  if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV())
as mentioned above.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanValue.h:333
     VPWidenSelectSC,
+    VPStepVectorSC,

----------------
Lex order

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116123/new/

https://reviews.llvm.org/D116123