[llvm] [VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (PR #95305)

Fri Sep 6 03:58:10 PDT 2024

================
@@ -8189,10 +8189,12 @@ createWidenInductionRecipes(PHINode *Phi, Instruction *PhiOrTrunc,
   VPValue *Step =
       vputils::getOrCreateVPValueForSCEVExpr(Plan, IndDesc.getStep(), SE);
   if (auto *TruncI = dyn_cast<TruncInst>(PhiOrTrunc)) {
-    return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, IndDesc, TruncI);
+    return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, Plan.getVF(),
----------------
fhahn wrote:

> VPWidenIntOrFpInductionRecipe could retrieve VF by looking up Plan.getVF() on demand rather than recording it as an operand, but the latter helps in checking if VF has users, i.e., if any VPWidenIntOrFpInductionRecipe exists?

Exactly, the operand is used to check whether the runtime VF needs to be generated at all. `VFxUF` is similarly added as an operand to the VPInstruction that increments the canonical IV.
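As a rough illustration of the "only generate it if it has users" idea, here is a minimal, self-contained sketch. `ToyValue` and `emittedVFComputations` are hypothetical names made up for this example; they are not LLVM's `VPValue` API and only model the user-count check.

```cpp
// Hypothetical stand-in for a plan-level value such as VF.
// This is NOT LLVM's VPValue; it only tracks how many users were registered.
struct ToyValue {
  unsigned NumUsers = 0;

  // Called whenever a recipe takes this value as an operand.
  constexpr void addUser() { ++NumUsers; }

  // True if at least one recipe uses the value.
  constexpr bool hasUsers() const { return NumUsers != 0; }
};

// Returns how many runtime-VF computations would be emitted for a plan
// containing NumWidenInductions recipes that take VF as an operand:
// one shared computation if VF has any user, none otherwise.
constexpr unsigned emittedVFComputations(unsigned NumWidenInductions) {
  ToyValue VF{};
  for (unsigned I = 0; I < NumWidenInductions; ++I)
    VF.addUser();
  return VF.hasUsers() ? 1u : 0u;
}
```

In this toy model a plan without any widened induction emits nothing for VF, while any number of users share a single computation.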

> Surely VF is needed to vectorize any loop, including ones free of VPWidenIntOrFpInductionRecipes. Does it need to be cached somehow, to prevent regeneration?

There are multiple other places that currently generate the runtime VF on demand. Adding it as an operand here, and generating it only when it has users, is mostly a way to gradually convert all of those users.

We could create VF unconditionally, but then we would have to update all tests with scalable vectors to split up the VFxUF computation into `((vscale * VF) * UF)` instead of `(vscale * (VF * UF))`, even if `vscale * VF` is only used in the multiply by `UF`.

To limit this, we could try to fold it back as a post-codegen cleanup, or update all tests. Happy to go either way (or leave it as is in the current patch for now).
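To make the two association orders concrete, here is a small sketch in plain integer arithmetic. It assumes `VF` here means the known minimum lane count and `UF` is a compile-time constant, with only `vscale` known at runtime; the function names are hypothetical and this is not the code the vectorizer emits.

```cpp
#include <cstdint>

// (vscale * (VF * UF)): VF * UF can be folded to a constant up front,
// leaving a single runtime multiply by vscale.
constexpr std::uint64_t vfxufFoldedConstants(std::uint64_t VScale,
                                             std::uint64_t VF,
                                             std::uint64_t UF) {
  return VScale * (VF * UF);
}

// ((vscale * VF) * UF): the runtime VF is materialized first and then
// multiplied by UF, so two runtime multiplies appear in the output.
constexpr std::uint64_t vfxufViaRuntimeVF(std::uint64_t VScale,
                                          std::uint64_t VF,
                                          std::uint64_t UF) {
  const std::uint64_t RuntimeVF = VScale * VF;
  return RuntimeVF * UF;
}
```

Both forms compute the same value; the difference is only in which intermediate values show up in the generated IR, which is what would change the test expectations.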

https://github.com/llvm/llvm-project/pull/95305