[PATCH] D116123: [VPlan] Handle IV vector splat as VPInstruction.

Wed Jan 19 08:35:38 PST 2022

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1674
+  });
+}
----------------
Demanded == Used?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:75

+Value *getRuntimeVFAsFloat(IRBuilder<> &B, Type *FTy, ElementCount VF);
+
----------------
Place above next to getRuntimeVF()?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:80
+/// to each vector element of Val. The sequence starts at StartIndex.
+/// \p Opcode is relevant for FP induction variable.
+Value *getStepVector(Value *Val, Value *StartIdx, Value *Step,
----------------
(Indep. of this patch, but while we're here:)
"The sequence starts at StartIndex." - redundant?
\p Opcode >> \p BinOp

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:773
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  virtual bool onlyFirstLaneUsed(const VPValue *Op) const {
----------------
Default is to return false conservatively.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:780
+
+  /// Returns true if the recipe only uses scalars of operand \p Op.
+  virtual bool onlyScalarsUsed(const VPValue *Op) const {
----------------
ditto

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:926
+           "Op must be an operand of the recipe");
+    return getOperand(0) == Op && getOpcode() == VPInstruction::ActiveLaneMask;
+  }
----------------
The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1364
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return all_of(users(), [this](VPUser *U) {
----------------
Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1744
+
+    if (isStore() && getStoredValue() == Op)
+      return false;
----------------
Already confirmed above that Op is the address?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1747
+
+    // Widened, consecutive memory operations only demand the first lane.
+    return isConsecutive();
----------------
... of their address.

`return Op == getAddr() && isConsecutive();` ?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:342
   VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
   for (VPRecipeBase &Phi : HeaderVPBB->phis()) {
     auto *WidenOriginalIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi);
----------------
Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:354
     // directly.
-    if (WidenOriginalIV && WidenOriginalIV->isCanonical() &&
-        WidenOriginalIV->getScalarType() == WidenNewIV->getScalarType()) {
-      WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
-      WidenNewIV->eraseFromParent();
-      return;
+    // If scalar values are demanded from WidenNewIV, create a stepvector.
+    // TODO: Remove checks once a vector phi will be generated for all
----------------
The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116123/new/

https://reviews.llvm.org/D116123