[PATCH] D116123: [VPlan] Handle IV vector splat as VPInstruction.

Wed Jan 12 05:09:16 PST 2022

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8442
+        IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
+      }
       BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV});
----------------
fhahn wrote:
> Ayal wrote:
> > fhahn wrote:
> > > Ayal wrote:
> > > > fhahn wrote:
> > > > > fhahn wrote:
> > > > > > fhahn wrote:
> > > > > > > Ayal wrote:
> > > > > > > > No need for {}
> > > > > > > > 
> > > > > > > > So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > > > > 
> > > > > > > > BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> > > > > > > > No need for {}
> > > > > > > 
> > > > > > > I think if the body spans multiple lines due to a comment, braces should not be omitted
> > > > > > > https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
> > > > > > > 
> > > > > > > > So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > > > 
> > > > > > > Yes, let me try to split that off. `ActiveLaneMask` only needs the first scalar step of each `Part`.
> > > > > > > 
> > > > > > > > BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
> > > > > > > 
> > > > > > > done separately in 2e630eabd329
> > > > > > >> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
> > > > > > >Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.
> > > > > > 
> > > > > > I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I;ll push one and update the patch.
> > > > > I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right?
> > > > > 
> > > > > The issue with that is that in some cases `ActiveLaneMask` will use the step-vector instead of the scalar steps, so I left it as is for now.
> > > > > 
> > > > > > I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. 
> > > > > 
> > > > > The current patch still creates the vector phi when necessary and the existing tests cover that already.
> > > > Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
> > > > This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.
> > > > Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
> > > 
> > > Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
> > > Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
> > 
> > ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?
> I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.
Better have the transform in a separate patch, to retain exiting behavior here?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2556
+      if (Step->getType()->isFloatingPointTy()) {
+        StartIdx = Builder.CreateUIToFP(StartIdx, Step->getType());
+        // Floating-point operations inherit FMF via the builder's flags.
----------------
Set StartIdx to int or fp from its beginning? Indep. of this patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2563
+        EntryPart = Builder.CreateAdd(
+            ScalarIV, Builder.CreateMul(StartIdx, Step), "induction");
+      }
----------------
Would be good to fold int and fp handlings, indep. of this patch:
Set and use `MulOp` here as well?
Use CreateBinOp here as well?
Name the instruction "induction" above for fp as well?
OTOH, hanlding Trunc should apply to int only?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:336
+  VPWidenIntOrFpInductionRecipe *WidenInduction = nullptr;
+  for (VPRecipeBase &Phi : *HeaderVPBB) {
+    if (auto *W = dyn_cast<VPWidenCanonicalIVRecipe>(&Phi)) {
----------------
`Phi`: VPWidenIntOrFpInductionRecipe represents a phi (assumed to exist, corresponding to OriginalCanonicalIV) but VPWidenCanonicalIVRecipe represents a non-phi and thus appears after the former. Ordering their handling accordingly is more logical? Break as soon as (if) the latter is found?

WidenCan >> WidenNewIV?
WidenInduction >> WidenOriginalIV?
(both are canonical...)

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:356
+  if (Range.Start.isScalar())
+    Range.End = Range.Start * 2;
+
----------------
Is this optimization expected to clamp Range.End for the current VPlan during VPlans-for-ranges construction?
Is there a need to (only) know if VPlan's range is scalar? Specifically, if Start is Scalar then End can be asserted to be Start*2, if needed? Suffice the check if all recipes provide scalars only, although faster to check if range is scalar?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:358
+
+  auto AllUsersScalar = [](VPWidenIntOrFpInductionRecipe *R) {
+    return all_of(R->users(), [](VPUser *U) {
----------------
vputils::onlyScalarsDemanded()?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:365
+
+  auto AllUsersUniform = [](VPWidenIntOrFpInductionRecipe *R) {
+    return all_of(R->users(), [](VPUser *U) {
----------------
vputils::onlyFirstLaneDemanded()?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:373
+
+  assert(WidenInduction && "expected recipe for original canonical IV");
+
----------------
Place next to above early-exit if !WidenCan?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:390
+        return !VPI || VPI->getOpcode() == VPInstruction::ActiveLaneMask;
+      })) {
+    WidenCan->replaceAllUsesWith(WidenInduction);
----------------
WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:396
+  // Retain WidenCan fora scalar plans.
+  if (Range.Start.isScalar())
+    return;
----------------
Scalar Start handled above (w/o retaining WidenCan)?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116123/new/

https://reviews.llvm.org/D116123