[PATCH] D116123: [VPlan] Handle IV vector splat as VPInstruction.

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 25 00:00:05 PST 2022


Ayal added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4794
     // If tail-folding is applied, the primary induction variable will be used
     // to feed a vector compare.
     if (Ind == Legal->getPrimaryInduction() && foldTailByMasking())
----------------
Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:769
+      assert(Step->getType()->isIntegerTy() &&
+             "stepvector only supports integer steps for now");
+      Value *StartIdx =
----------------
Assert is redundant given than Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1383
+    return all_of(users(), [this](VPUser *U) {
+      return cast<VPRecipeBase>(U)->onlyFirstLaneUsed(this);
+    });
----------------
Could a user be live-out / not a recipe?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:926
+           "Op must be an operand of the recipe");
+    return getOperand(0) == Op && getOpcode() == VPInstruction::ActiveLaneMask;
+  }
----------------
fhahn wrote:
> Ayal wrote:
> > The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.
> Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount  opcodes.
Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:354
     // directly.
-    if (WidenOriginalIV && WidenOriginalIV->isCanonical() &&
-        WidenOriginalIV->getScalarType() == WidenNewIV->getScalarType()) {
-      WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
-      WidenNewIV->eraseFromParent();
-      return;
+    // If scalar values are demanded from WidenNewIV, create a stepvector.
+    // TODO: Remove checks once a vector phi will be generated for all
----------------
fhahn wrote:
> Ayal wrote:
> > The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?
> > 
> > Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?
> > The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?
> 
> I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in `llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll`)
> 
> 
> > Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?
> 
> Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.
>> The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

> I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.

>> Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

> Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Follow-up TODO would be fine.

Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves

Range includes only positive VF's &&
onlyFirstLaneUsed(WidenOriginalIV) &&
onlyScalarsUsed(WidenOriginalIV) &&
NewIV feeds some VPInstruction that isn't ActiveLaneMask
? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116123/new/

https://reviews.llvm.org/D116123



More information about the llvm-commits mailing list