[llvm] [LV] Decompose WidenIntOrFPInduction into phi and update recipes (PR #82021)

Fri Feb 16 10:34:01 PST 2024

llvmbot wrote:



@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Kolya Panchenko (nikolaypanchenko)

<details>
<summary>Changes</summary>

The Loop Vectorizer still has two recipes, `VPWidenIntOrFpInductionRecipe` and `VPWidenPointerInductionRecipe`, that behave like phis in a VPlan because they derive from `VPHeaderPHIRecipe`, yet their code-generation functions construct both the vector phi and its self-update in the vectorized loop.

This not only hurts the readability of a VPlan but also requires extra code to maintain that behavior. For instance, there is already ad-hoc code motion that moves the generated updates of these recipes closer to the loop latch.

The changeset:
* Adds `WidenVFxUF` to represent the per-part value `broadcast((Part + 1) x VF)` for `Part` in `[0, UF)`; for the last part this equals `broadcast(VFxUF)`
* Decomposes the existing `VPWidenIntOrFpInductionRecipe` into:
```
  WIDEN-INDUCTION vp<%iv> = phi ir<0>, vp<%be-value>
  ...
  EMIT vp<%widen-step> = mul ir<%step>, vp<WidenVFxUF>
  EMIT vp<%be-value> = add vp<%iv>, vp<%widen-step>
```
* Moves the truncation optimization of widened IVs into a VPlan transform
* Adds trivial removal of cyclic dependencies and marks some binary operations as free of side effects
* Adds an element type to `VPValue` so that it can be queried for artificially added `VPValue`s that have no underlying instruction
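To make the decomposition concrete, here is a small scalar model in plain C++. It is not LLVM code: `simulateWidenedIV`, `splat`, `add` and `mul` are hypothetical helpers, and a "vector" is just a `std::vector` of `VF` lanes. The sketch assumes that users of unroll part `P` see `iv + P * VF * step`, that `WidenVFxUF` holds `(P + 1) * VF` for part `P` (as computed in the `prepareToExecute` hunk of the diff below), and that the single-part phi takes the last part's update on the backedge. Under those assumptions the emitted lanes reproduce the scalar induction `start + i * step`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative scalar model only: a "vector" is a std::vector of VF lanes.
// None of these names are LLVM APIs.
using Vec = std::vector<int64_t>;

static Vec splat(int64_t V, unsigned VF) { return Vec(VF, V); }

static Vec add(const Vec &A, const Vec &B) {
  Vec R(A.size());
  for (std::size_t L = 0; L < A.size(); ++L)
    R[L] = A[L] + B[L];
  return R;
}

static Vec mul(const Vec &A, const Vec &B) {
  Vec R(A.size());
  for (std::size_t L = 0; L < A.size(); ++L)
    R[L] = A[L] * B[L];
  return R;
}

// Runs NumIters vector iterations of a widened induction unrolled UF times and
// returns every lane value in the order the scalar loop would produce them.
std::vector<int64_t> simulateWidenedIV(int64_t Start, int64_t Step, unsigned VF,
                                       unsigned UF, unsigned NumIters) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  // Start value of the WIDEN-INDUCTION phi: <Start, Start + Step, ...>.
  Vec IV(VF);
  for (unsigned L = 0; L < VF; ++L)
    IV[L] = Start + int64_t(L) * Step;

  std::vector<int64_t> Out;
  for (unsigned It = 0; It < NumIters; ++It) {
    Vec BackedgeValue;
    for (unsigned Part = 0; Part < UF; ++Part) {
      // Assumed value seen by users of this part: iv + Part * VF * Step.
      Vec PartValue = add(IV, splat(int64_t(Part) * VF * Step, VF));
      Out.insert(Out.end(), PartValue.begin(), PartValue.end());
      // %widen-step = mul Step, WidenVFxUF, where WidenVFxUF = (Part+1) * VF.
      Vec WidenStep = mul(splat(Step, VF), splat(int64_t(Part + 1) * VF, VF));
      // %be-value = add %iv, %widen-step
      Vec Update = add(IV, WidenStep);
      // Assumption: the phi takes the last part's update on the backedge.
      if (Part == UF - 1)
        BackedgeValue = Update;
    }
    IV = BackedgeValue;
  }
  return Out;
}
```

For example, `simulateWidenedIV(3, 2, 4, 2, 3)` yields 24 values `3, 5, 7, ...`, one per scalar iteration, which is the property the phi/update split has to preserve.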

---

Patch is 3.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82021.diff


171 Files Affected:

- (modified) llvm/include/llvm/Analysis/IVDescriptors.h (+5) 
- (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+88-36) 
- (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+13-1) 
- (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+54-16) 
- (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+57-32) 
- (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+13-1) 
- (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+26-60) 
- (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+81-2) 
- (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+19-1) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+42-44) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+120-120) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll (+5-5) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+64-12) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-trunc.ll (+62-12) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll (+9-9) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaved-store-of-first-order-recurrence.ll (+49-14) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_prefer_scalable.ll (+31-31) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_test1_no_explicit_vect_width.ll (+123-57) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+11-11) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+19-20) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll (+78-16) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+844-844) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll (+36-36) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+2903-778) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+24-24) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+22-22) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+18-18) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll (+14-15) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+149-54) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll (+11-11) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+173-168) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll (+170-170) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll (+63-19) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+43-43) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (+11-11) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+109-109) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+158-158) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+149-149) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll (+131-32) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+56-51) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/vector-call-linear-args.ll (+56-69) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+9-9) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+81-83) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+35-35) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+21-22) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+66-66) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/ordered-reduction.ll (+39-39) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-interleaved.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+106-106) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+580-214) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+123-125) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+238-243) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/zvl32b.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll (+202-41) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll (+30-30) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll (+496-119) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+167-104) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+6-6) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/float-induction-x86.ll (+31-40) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll (+54-54) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-sink-store-across-load.ll (+12-12) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+27-27) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+364-364) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+42-46) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/outer_loop_test1_no_explicit_vect_width.ll (+118-57) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+24-27) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+39-10) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr54634.ll (+19-25) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/scatter_crash.ll (+245-15) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+60-61) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll (+29-32) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+47-58) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+8-9) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+6-7) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll (+189-191) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll (+23-24) 
- (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+88-98) 
- (modified) llvm/test/Transforms/LoopVectorize/branch-weights.ll (+101-52) 
- (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+6-7) 
- (modified) llvm/test/Transforms/LoopVectorize/cast-induction.ll (+363-58) 
- (modified) llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+2-2) 
- (modified) llvm/test/Transforms/LoopVectorize/dbg-outer-loop-vect.ll (+12-12) 
- (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll (+5-5) 
- (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll (+31-31) 
- (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-trunc-induction-steps.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+3-72) 
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll (+648-198) 
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll (+3-352) 
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+174-176) 
- (modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+138-149) 
- (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/fpsat.ll (+3-3) 
- (modified) llvm/test/Transforms/LoopVectorize/i8-induction.ll (+98-4) 
- (modified) llvm/test/Transforms/LoopVectorize/icmp-uniforms.ll (+3-1) 
- (modified) llvm/test/Transforms/LoopVectorize/if-pred-non-void.ll (+104-112) 
- (modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+8-7) 
- (modified) llvm/test/Transforms/LoopVectorize/induction-ptrcasts.ll (+83-17) 
- (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+226-75) 
- (modified) llvm/test/Transforms/LoopVectorize/induction-unroll-novec.ll (+59-20) 
- (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+839-880) 
- (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+15-17) 
- (modified) llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll (+10-13) 
- (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+35-35) 
- (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+8-8) 
- (modified) llvm/test/Transforms/LoopVectorize/loop-form.ll (+12-12) 
- (modified) llvm/test/Transforms/LoopVectorize/loop-scalars.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll (+8-8) 
- (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-liveout.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll (+114-114) 
- (modified) llvm/test/Transforms/LoopVectorize/outer-loop-vec-phi-predecessor-order.ll (+5-5) 
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_hcfg_construction.ll (+27-18) 
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_scalable.ll (+32-31) 
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_test1.ll (+62-29) 
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_test2.ll (+94-40) 
- (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll (+28-28) 
- (modified) llvm/test/Transforms/LoopVectorize/pointer-select-runtime-checks.ll (+99-99) 
- (modified) llvm/test/Transforms/LoopVectorize/pr30654-phiscev-sext-trunc.ll (+33-33) 
- (modified) llvm/test/Transforms/LoopVectorize/pr35773.ll (+57-16) 
- (modified) llvm/test/Transforms/LoopVectorize/pr37248.ll (+10-10) 
- (modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/pr45259.ll (+5-5) 
- (modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+90-102) 
- (modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+9-9) 
- (modified) llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll (+45-45) 
- (modified) llvm/test/Transforms/LoopVectorize/pr55100-expand-scev-predicate-used.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+27-27) 
- (modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+8-8) 
- (modified) llvm/test/Transforms/LoopVectorize/pr59319-loop-access-info-invalidation.ll (+2-2) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-align.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+154-154) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+192-198) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+21-21) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-odd-interleave-counts.ll (+136-70) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-predselect.ll (+61-61) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction-small-size.ll (+14-14) 
- (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+87-87) 
- (modified) llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll (+16-17) 
- (modified) llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll (+11-11) 
- (modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+1026-88) 
- (modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+66-65) 
- (modified) llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll (+69-26) 
- (modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+15-15) 
- (modified) llvm/test/Transforms/LoopVectorize/scalarize-masked-call.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+8-8) 
- (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+86-87) 
- (modified) llvm/test/Transforms/LoopVectorize/skeleton-lcssa-crash.ll (+10-10) 
- (modified) llvm/test/Transforms/LoopVectorize/strict-fadd-interleave-only.ll (+33-35) 
- (modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+14-30) 
- (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+135-49) 
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+64-63) 
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll (+32-32) 
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll (+12-12) 
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll (+52-52) 
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll (+210-210) 
- (modified) llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/vector-geps.ll (+4-4) 
- (modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+3-1) 
- (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+14-4) 
- (modified) llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll (+30-16) 
- (modified) llvm/test/Transforms/LoopVectorize/vplan-vectorize-inner-loop-reduction.ll (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/vplan-widen-call-instruction.ll (+1-1) 
- (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (-10) 
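The new `VPRecipeBuilder::createCast` helper in the `LoopVectorize.cpp` hunk below picks a cast opcode from a source and a destination type. The following plain C++ sketch models that selection with a hypothetical `ScalarTy` descriptor in place of `llvm::Type`; `ScalarTy`, `CastOp` and `selectCastOpcode` are illustrative names only. FP-to-FP casts are rejected in the sketch because, in the visible hunks, the helper is only called with the integer `VFxUF` type as the source.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::Type: an integer or floating-point type of a
// given bit width.
struct ScalarTy {
  bool IsInteger;
  unsigned Bits;
};

enum class CastOp { Trunc, ZExt, FPToUI, UIToFP };

// Models the opcode choice in createCast: no cast for identical types,
// trunc/zext between integers, fptoui into integers, uitofp otherwise.
std::optional<CastOp> selectCastOpcode(ScalarTy From, ScalarTy To) {
  if (From.IsInteger == To.IsInteger && From.Bits == To.Bits)
    return std::nullopt; // createCast returns nullptr in this case.
  assert((From.IsInteger || To.IsInteger) && "FP-to-FP casts are not modeled");
  if (From.IsInteger && To.IsInteger)
    return To.Bits < From.Bits ? CastOp::Trunc : CastOp::ZExt;
  if (To.IsInteger)
    return CastOp::FPToUI;
  return CastOp::UIToFP;
}
```

The unsigned variants (`ZExt`, `FPToUI`, `UIToFP`) are presumably chosen because `VF * UF` is never negative, so zero-extension and unsigned conversion preserve its value.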


``````````diff

diff --git a/llvm/include/llvm/Analysis/IVDescriptors.h b/llvm/include/llvm/Analysis/IVDescriptors.h
index 5c7b613ac48c40..7ca13adae87f6a 100644
--- a/llvm/include/llvm/Analysis/IVDescriptors.h
+++ b/llvm/include/llvm/Analysis/IVDescriptors.h
@@ -363,6 +363,11 @@ class InductionDescriptor {
     return nullptr;
   }
 
+  const Instruction *getExactFPMathInst() const {
+    return const_cast<const Instruction *>(
+        const_cast<InductionDescriptor *>(this)->getExactFPMathInst());
+  }
+
   /// Returns binary opcode of the induction operator.
   Instruction::BinaryOps getInductionOpcode() const {
     return InductionBinOp ? InductionBinOp->getOpcode()
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 98b177cf5d2d0e..92b783d3badeae 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8114,34 +8114,6 @@ VPHeaderPHIRecipe *VPRecipeBuilder::tryToOptimizeInductionPHI(
   return nullptr;
 }
 
-VPWidenIntOrFpInductionRecipe *VPRecipeBuilder::tryToOptimizeInductionTruncate(
-    TruncInst *I, ArrayRef<VPValue *> Operands, VFRange &Range, VPlan &Plan) {
-  // Optimize the special case where the source is a constant integer
-  // induction variable. Notice that we can only optimize the 'trunc' case
-  // because (a) FP conversions lose precision, (b) sext/zext may wrap, and
-  // (c) other casts depend on pointer size.
-
-  // Determine whether \p K is a truncation based on an induction variable that
-  // can be optimized.
-  auto isOptimizableIVTruncate =
-      [&](Instruction *K) -> std::function<bool(ElementCount)> {
-    return [=](ElementCount VF) -> bool {
-      return CM.isOptimizableIVTruncate(K, VF);
-    };
-  };
-
-  if (LoopVectorizationPlanner::getDecisionAndClampRange(
-          isOptimizableIVTruncate(I), Range)) {
-
-    auto *Phi = cast<PHINode>(I->getOperand(0));
-    const InductionDescriptor &II = *Legal->getIntOrFpInductionDescriptor(Phi);
-    VPValue *Start = Plan.getVPValueOrAddLiveIn(II.getStartValue());
-    return createWidenInductionRecipes(Phi, I, Start, II, Plan, *PSE.getSE(),
-                                       *OrigLoop, Range);
-  }
-  return nullptr;
-}
-
 VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
                                            ArrayRef<VPValue *> Operands,
                                            VPlanPtr &Plan) {
@@ -8275,6 +8247,70 @@ bool VPRecipeBuilder::shouldWiden(Instruction *I, VFRange &Range) const {
                                                              Range);
 }
 
+VPWidenCastRecipe *VPRecipeBuilder::createCast(VPValue *V, Type *From,
+                                               Type *To) {
+  if (From == To)
+    return nullptr;
+  Instruction::CastOps CastOpcode;
+  if (To->isIntegerTy() && From->isIntegerTy())
+    CastOpcode = To->getPrimitiveSizeInBits() < From->getPrimitiveSizeInBits()
+                     ? Instruction::Trunc
+                     : Instruction::ZExt;
+  else if (To->isIntegerTy())
+    CastOpcode = Instruction::FPToUI;
+  else
+    CastOpcode = Instruction::UIToFP;
+
+  return new VPWidenCastRecipe(CastOpcode, V, To);
+}
+
+VPRecipeBase *
+VPRecipeBuilder::createWidenStep(VPWidenIntOrFpInductionRecipe &WIV,
+                                 ScalarEvolution &SE, VPlan &Plan,
+                                 DenseSet<VPRecipeBase *> *CreatedRecipes) {
+  PHINode *PN = WIV.getPHINode();
+  const InductionDescriptor &IndDesc = WIV.getInductionDescriptor();
+  VPValue *ScalarStep =
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, IndDesc.getStep(), SE);
+  Type *VFxUFTy = Plan.getVFxUF().getElementType();
+  Type *StepTy = IndDesc.getStep()->getType();
+  VPValue *WidenVFxUF = &Plan.getWidenVFxUF();
+  VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
+  if (VPWidenCastRecipe *WidenVFxUFCast =
+          createCast(&Plan.getWidenVFxUF(), VFxUFTy, StepTy)) {
+    WidenVFxUFCast->insertBefore(LatchVPBB->getTerminator());
+    if (CreatedRecipes)
+      CreatedRecipes->insert(WidenVFxUFCast);
+    WidenVFxUF = WidenVFxUFCast->getVPSingleValue();
+  }
+  const Instruction::BinaryOps UpdateOp =
+      IndDesc.getInductionOpcode() != Instruction::BinaryOpsEnd
+          ? IndDesc.getInductionOpcode()
+          : Instruction::Add;
+  VPInstruction *Update;
+  if (StepTy->isIntegerTy()) {
+    VPInstruction *Mul = new VPInstruction(
+        Instruction::Mul, {WidenVFxUF, ScalarStep}, PN->getDebugLoc());
+    Mul->insertBefore(LatchVPBB->getTerminator());
+    if (CreatedRecipes)
+      CreatedRecipes->insert(Mul);
+    Update = new VPInstruction(UpdateOp, {&WIV, Mul}, PN->getDebugLoc());
+    Update->insertBefore(LatchVPBB->getTerminator());
+  } else {
+    FastMathFlags FMF = IndDesc.getExactFPMathInst()
+                            ? IndDesc.getExactFPMathInst()->getFastMathFlags()
+                            : FastMathFlags();
+    VPInstruction *Mul = new VPInstruction(
+        Instruction::FMul, {WidenVFxUF, ScalarStep}, FMF, PN->getDebugLoc());
+    Mul->insertBefore(LatchVPBB->getTerminator());
+    Update = new VPInstruction(UpdateOp, {&WIV, Mul}, FMF, PN->getDebugLoc());
+    Update->insertBefore(LatchVPBB->getTerminator());
+  }
+  if (CreatedRecipes)
+    CreatedRecipes->insert(Update);
+  return Update;
+}
+
 VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
                                            ArrayRef<VPValue *> Operands,
                                            VPBasicBlock *VPBB, VPlanPtr &Plan) {
@@ -8324,10 +8360,15 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
   };
 }
 
-void VPRecipeBuilder::fixHeaderPhis() {
+void VPRecipeBuilder::fixHeaderPhis(VPlan &Plan) {
   BasicBlock *OrigLatch = OrigLoop->getLoopLatch();
   for (VPHeaderPHIRecipe *R : PhisToFix) {
-    auto *PN = cast<PHINode>(R->getUnderlyingValue());
+    if (auto *VPWIFR = dyn_cast<VPWidenIntOrFpInductionRecipe>(R)) {
+      VPWIFR->addOperand(
+          createWidenStep(*VPWIFR, *PSE.getSE(), Plan)->getVPSingleValue());
+      continue;
+    }
+    PHINode *PN = cast<PHINode>(R->getUnderlyingValue());
     VPRecipeBase *IncR =
         getRecipe(cast<Instruction>(PN->getIncomingValueForBlock(OrigLatch)));
     R->addOperand(IncR->getVPSingleValue());
@@ -8405,8 +8446,12 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
     // can have earlier phis as incoming values.
     recordRecipeOf(Phi);
 
-    if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range)))
+    if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range))) {
+      if (isa<VPWidenPointerInductionRecipe>(Recipe))
+        return Recipe;
+      PhisToFix.push_back(cast<VPWidenIntOrFpInductionRecipe>(Recipe));
       return Recipe;
+    }
 
     VPHeaderPHIRecipe *PhiRecipe = nullptr;
     assert((Legal->isReductionVariable(Phi) ||
@@ -8441,10 +8486,17 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
     return PhiRecipe;
   }
 
-  if (isa<TruncInst>(Instr) &&
-      (Recipe = tryToOptimizeInductionTruncate(cast<TruncInst>(Instr), Operands,
-                                               Range, *Plan)))
-    return Recipe;
+  if (isa<TruncInst>(Instr)) {
+    auto IsOptimizableIVTruncate =
+        [&](Instruction *K) -> std::function<bool(ElementCount)> {
+      return [=](ElementCount VF) -> bool {
+        return CM.isOptimizableIVTruncate(K, VF);
+      };
+    };
+
+    LoopVectorizationPlanner::getDecisionAndClampRange(
+        IsOptimizableIVTruncate(Instr), Range);
+  }
 
   // All widen recipes below deal only with VF > 1.
   if (LoopVectorizationPlanner::getDecisionAndClampRange(
@@ -8707,7 +8759,7 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
          !Plan->getVectorLoopRegion()->getEntryBasicBlock()->empty() &&
          "entry block must be set to a VPRegionBlock having a non-empty entry "
          "VPBasicBlock");
-  RecipeBuilder.fixHeaderPhis();
+  RecipeBuilder.fixHeaderPhis(*Plan);
 
   // ---------------------------------------------------------------------------
   // Transform initial VPlan: Apply previously taken decisions, in order, to
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index b1498026adadfe..126a6b1c061265 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -146,6 +146,18 @@ class VPRecipeBuilder {
   /// between SRC and DST.
   VPValue *getEdgeMask(BasicBlock *Src, BasicBlock *Dst) const;
 
+  /// Creates a VPWidenCastRecipe that casts \p V to type \p To, or returns
+  /// nullptr if \p From and \p To are the same type.
+  /// FIXME: Remove the \p From argument and derive it from \p V.
+  static VPWidenCastRecipe *createCast(VPValue *V, Type *From, Type *To);
+
+  /// Multiplies the step of \p WIV by WidenVFxUF, adds the product to \p WIV
+  /// and inserts the recipes into the latch of \p Plan. Returns the update.
+  static VPRecipeBase *
+  createWidenStep(VPWidenIntOrFpInductionRecipe &WIV, ScalarEvolution &SE,
+                  VPlan &Plan,
+                  DenseSet<VPRecipeBase *> *CreatedRecipes = nullptr);
+
   /// Mark given ingredient for recording its recipe once one is created for
   /// it.
   void recordRecipeOf(Instruction *I) {
@@ -171,7 +183,7 @@ class VPRecipeBuilder {
 
   /// Add the incoming values from the backedge to reduction & first-order
   /// recurrence cross-iteration phis.
-  void fixHeaderPhis();
+  void fixHeaderPhis(VPlan &Plan);
 };
 } // end namespace llvm
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 2c0daa82afa59f..96732b77a9db3d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -76,12 +76,25 @@ Value *VPLane::getAsRuntimeExpr(IRBuilderBase &Builder,
   llvm_unreachable("Unknown lane kind");
 }
 
-VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def)
-    : SubclassID(SC), UnderlyingVal(UV), Def(Def) {
+VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def, Type *Ty)
+    : SubclassID(SC), UnderlyingVal(UV), UnderlyingTy(Ty), Def(Def) {
+  if (UnderlyingTy)
+    assert((!UnderlyingVal || UnderlyingVal->getType() == UnderlyingTy) &&
+           "VPValue with set type should either be created without underlying "
+           "value or type should match the given type");
   if (Def)
     Def->addDefinedValue(this);
 }
 
+Type *VPValue::getElementType() {
+  return const_cast<Type *>(
+      const_cast<const VPValue *>(this)->getElementType());
+}
+
+const Type *VPValue::getElementType() const {
+  return UnderlyingVal ? UnderlyingVal->getType() : UnderlyingTy;
+}
+
 VPValue::~VPValue() {
   assert(Users.empty() && "trying to delete a VPValue with remaining users");
   if (Def)
@@ -763,6 +776,10 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
   auto Plan = std::make_unique<VPlan>(Preheader, VecPreheader);
   Plan->TripCount =
       vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
+  Type *TCType = TripCount->getType();
+  Plan->getVectorTripCount().setElementType(TCType);
+  Plan->getVFxUF().setElementType(TCType);
+  Plan->getWidenVFxUF().setElementType(TCType);
   // Create empty VPRegionBlock, to be filled during processing later.
   auto *TopRegion = new VPRegionBlock("vector loop", false /*isReplicator*/);
   VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
@@ -796,6 +813,18 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
             createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF),
             0);
 
+  if (WidenVFxUF.getNumUsers() > 0)
+    for (unsigned Part = 0, UF = State.UF; Part < UF; ++Part) {
+      Value *Step =
+          createStepForVF(Builder, TripCountV->getType(), State.VF, Part + 1);
+      if (State.VF.isScalar())
+        State.set(&WidenVFxUF, Step, Part);
+      else
+        State.set(&WidenVFxUF,
+                  Builder.CreateVectorSplat(State.VF, Step, "widen.vfxuf"),
+                  Part);
+    }
+
   // When vectorizing the epilogue loop, the canonical induction start value
   // needs to be changed from zero to the value after the main vector loop.
   // FIXME: Improve modeling for canonical IV start values in the epilogue loop.
@@ -845,21 +874,16 @@ void VPlan::execute(VPTransformState *State) {
     if (isa<VPWidenPHIRecipe>(&R))
       continue;
 
-    if (isa<VPWidenPointerInductionRecipe>(&R) ||
-        isa<VPWidenIntOrFpInductionRecipe>(&R)) {
+    if (isa<VPWidenPointerInductionRecipe>(&R)) {
       PHINode *Phi = nullptr;
-      if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
-        Phi = cast<PHINode>(State->get(R.getVPSingleValue(), 0));
-      } else {
-        auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
-        // TODO: Split off the case that all users of a pointer phi are scalar
-        // from the VPWidenPointerInductionRecipe.
-        if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
-          continue;
-
-        auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi, 0));
-        Phi = cast<PHINode>(GEP->getPointerOperand());
-      }
+      auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+      // TODO: Split off the case that all users of a pointer phi are scalar
+      // from the VPWidenPointerInductionRecipe.
+      if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
+        continue;
+
+      auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi, 0));
+      Phi = cast<PHINode>(GEP->getPointerOperand());
 
       Phi->setIncomingBlock(1, VectorLatchBB);
 
@@ -877,6 +901,7 @@ void VPlan::execute(VPTransformState *State) {
     // generated.
     bool SinglePartNeeded = isa<VPCanonicalIVPHIRecipe>(PhiR) ||
                             isa<VPFirstOrderRecurrencePHIRecipe>(PhiR) ||
+                            isa<VPWidenIntOrFpInductionRecipe>(PhiR) ||
                             (isa<VPReductionPHIRecipe>(PhiR) &&
                              cast<VPReductionPHIRecipe>(PhiR)->isOrdered());
     unsigned LastPartForNewPhi = SinglePartNeeded ? 1 : State->UF;
@@ -908,6 +933,12 @@ void VPlan::printLiveIns(raw_ostream &O) const {
     O << " = VF * UF";
   }
 
+  if (WidenVFxUF.getNumUsers() > 0) {
+    O << "\nLive-in ";
+    WidenVFxUF.printAsOperand(O, SlotTracker);
+    O << " = WIDEN VF * UF";
+  }
+
   if (VectorTripCount.getNumUsers() > 0) {
     O << "\nLive-in ";
     VectorTripCount.printAsOperand(O, SlotTracker);
@@ -1083,6 +1114,11 @@ VPlan *VPlan::duplicate() {
   }
   Old2NewVPValues[&VectorTripCount] = &NewPlan->VectorTripCount;
   Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF;
+  Old2NewVPValues[&WidenVFxUF] = &NewPlan->WidenVFxUF;
+  NewPlan->getVectorTripCount().setElementType(
+      getVectorTripCount().getElementType());
+  NewPlan->getVFxUF().setElementType(getVFxUF().getElementType());
+  NewPlan->getWidenVFxUF().setElementType(getWidenVFxUF().getElementType());
   if (BackedgeTakenCount) {
     NewPlan->BackedgeTakenCount = new VPValue();
     Old2NewVPValues[BackedgeTakenCount] = NewPlan->BackedgeTakenCount;
@@ -1379,6 +1415,8 @@ void VPSlotTracker::assignSlot(const VPValue *V) {
 void VPSlotTracker::assignSlots(const VPlan &Plan) {
   if (Plan.VFxUF.getNumUsers() > 0)
     assignSlot(&Plan.VFxUF);
+  if (Plan.WidenVFxUF.getNumUsers() > 0)
+    assignSlot(&Plan.WidenVFxUF);
   assignSlot(&Plan.VectorTripCount);
   if (Plan.BackedgeTakenCount)
     assignSlot(Plan.BackedgeTakenCount);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 13e1859ad6b250..306c2200ca34c9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1618,38 +1618,65 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
   }
 };
 
-/// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
-class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
-  PHINode *IV;
-  TruncInst *Trunc;
+/// A base class for all widen induction-like recipes
+class VPWidenInductionBasePHIRecipe : public VPHeaderPHIRecipe {
+protected:
   const InductionDescriptor &IndDesc;
 
 public:
-  VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
+  VPWidenInductionBasePHIRecipe(unsigned char VPDefID, Instruction *Instr,
+                                VPValue *Start, VPValue *Step,
                                 const InductionDescriptor &IndDesc)
-      : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV, Start), IV(IV),
-        Trunc(nullptr), IndDesc(IndDesc) {
+      : VPHeaderPHIRecipe(VPDefID, Instr, Start), IndDesc(IndDesc) {
     addOperand(Step);
   }
 
+  ~VPWidenInductionBasePHIRecipe() override = default;
+
+  /// Returns the step value of the induction.
+  VPValue *getStepValue() { return getOperand(1); }
+  const VPValue *getStepValue() const { return getOperand(1); }
+
+  /// Returns the induction descriptor for the recipe.
+  const InductionDescriptor &getInductionDescriptor() const { return IndDesc; }
+};
+
+/// A recipe for handling phi nodes of integer and floating-point inductions,
+/// producing their vector values.
+class VPWidenIntOrFpInductionRecipe : public VPWidenInductionBasePHIRecipe {
+  PHINode *IV = nullptr;
+  TruncInst *Trunc = nullptr;
+
+public:
+  VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
+                                const InductionDescriptor &IndDesc)
+      : VPWidenInductionBasePHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV,
+                                      Start, Step, IndDesc),
+        IV(IV), Trunc(nullptr) {}
+
   VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
                                 const InductionDescriptor &IndDesc,
                                 TruncInst *Trunc)
-      : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc, Start),
-        IV(IV), Trunc(Trunc), IndDesc(IndDesc) {
-    addOperand(Step);
-  }
+      : VPWidenInductionBasePHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc,
+                                      Start, Step, IndDesc),
+        IV(IV), Trunc(Trunc) {}
 
   ~VPWidenIntOrFpInductionRecipe() override = default;
 
   VPRecipeBase *clone() override {
-    return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(),
-                                             getStepValue(), IndDesc, Trunc);
+    VPRecipeBase *Cloned = new VPWidenIntOrFpInductionRecipe(
+        getPHINode(), getStartValue(), getStepValue(), IndDesc, Trunc);
+    if (getNumOperands() == 3)
+      Cloned->addOperand(getOperand(2));
+    return Cloned;
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
 
+  static inline bool classof(const VPHeaderPHIRecipe *R) {
+    return R->getVPDefID() == VPDef::VPWidenIntOrFpInductionSC;
+  }
+
   /// Generate the vectorized and scalarized versions of the phi node as
   /// needed by their users.
   void execute(VPTransformState &State) override;
@@ -1660,33 +1687,24 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
              VPSlotTracker &SlotTracker) const override;
 #endif
 
-  VPValue *getBackedgeValue() override {
-    // TODO: All operands of base recipe must exist and be at same index in
-    // derived recipe.
-    llvm_unreachable(
-        "VPWidenIntOrFpInductionRecipe generates its own backedge value");
+  VPValue *getBackedgeValue() override final {
+    if (getNumOperands() != 3)
+      llvm_unreachable(
+          "VPWidenIntOrFpInductionRecipe::getBackedgeValue is not yet valid");
+    return getOperand(2);
   }
 
-  VPRecipeBase &getBackedgeRecipe() override {
-    // TODO: All operands of base recipe must exist and be at same index in
-    // derived recipe.
-    llvm_unreachable(
-        "VPWidenIntOrFpInductionRecipe generates its own backedge value");
+  VPRecipeBase &getBackedgeRecipe() override final {
+    return *getBackedgeValue()->getDefiningRecipe();
   }
 
-  /// Returns the step value of the induction.
-  VPValue *getStepValue() { return getOperand(1); }
-  const VPValue *getStepValue() const { return getOperand(1); }
-
   /// Returns the first defined value as TruncInst, if it is one or nullptr
   /// otherwise.
   TruncInst *getTruncInst() { return Trunc; }
   const TruncInst *getTruncInst() const { retu...
[truncated]

``````````

</details>
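
The arithmetic behind the phi/update split can be checked with a small scalar model. The sketch below is not LLVM code. `Vec`, `splat`, `initialPhi`, `partValue` and `backedgeValue` are hypothetical helpers, and the sketch assumes the usual widened-induction layout:

- A single vector phi holds the lanes `start + l*step`. This is consistent with `VPWidenIntOrFpInductionRecipe` joining the `SinglePartNeeded` phis in `VPlan::execute`.
- Unrolled part `p` is derived as `phi + p*VF*step`.
- The backedge adds `step*VF*UF`.

The sketch folds `mul ir<%step>, vp<WidenVFxUF>` into a scalar `Step * VF * UF` splat. It does not model how `WidenVFxUF` is actually materialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lanes of one vector value (hypothetical stand-in for an IR vector).
using Vec = std::vector<int64_t>;

// Lane-wise A + B.
static Vec add(const Vec &A, const Vec &B) {
  assert(A.size() == B.size() && "lane count mismatch");
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Broadcast scalar X to VF lanes.
static Vec splat(int64_t X, unsigned VF) { return Vec(VF, X); }

// Start value of the single vector phi: <Start, Start+Step, ..., Start+(VF-1)*Step>.
static Vec initialPhi(int64_t Start, int64_t Step, unsigned VF) {
  Vec R(VF);
  for (unsigned L = 0; L < VF; ++L)
    R[L] = Start + static_cast<int64_t>(L) * Step;
  return R;
}

// Value of unrolled part P, derived from the one phi: Phi + splat(P*VF*Step).
static Vec partValue(const Vec &Phi, int64_t Step, unsigned VF, unsigned UF,
                     unsigned P) {
  assert(P < UF && "part out of range");
  return add(Phi, splat(static_cast<int64_t>(P) * VF * Step, VF));
}

// Backedge update: widen-step = Step * (VF*UF); be-value = Phi + widen-step.
static Vec backedgeValue(const Vec &Phi, int64_t Step, unsigned VF,
                         unsigned UF) {
  int64_t WidenStep = Step * static_cast<int64_t>(VF) * UF;
  return add(Phi, splat(WidenStep, VF));
}
```

Each lane of each part in vector iteration `i` should equal the scalar IV at scalar iteration `i*VF*UF + p*VF + l`.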

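The `getNumUsers() > 0` guards added to `printLiveIns` and `VPSlotTracker::assignSlots` mean that the slot numbers of later live-ins shift depending on whether `WidenVFxUF` is used. Below is a minimal model of that ordering. `LiveIn` is a hypothetical stand-in for a plan-level `VPValue`, and `BackedgeTakenCount` is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a plan-level live-in VPValue.
struct LiveIn {
  std::string Desc;
  unsigned NumUsers = 0;
};

// Models the order used by assignSlots in the diff: VFxUF and WidenVFxUF get
// a slot only when they have users; the vector trip count always gets one.
// The index of each returned entry is its slot number.
std::vector<std::string> assignSlots(const LiveIn &VFxUF,
                                     const LiveIn &WidenVFxUF,
                                     const LiveIn &VectorTripCount) {
  std::vector<std::string> Slots;
  if (VFxUF.NumUsers > 0)
    Slots.push_back(VFxUF.Desc);
  if (WidenVFxUF.NumUsers > 0)
    Slots.push_back(WidenVFxUF.Desc);
  Slots.push_back(VectorTripCount.Desc);
  return Slots;
}
```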

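The `VPlan.h` refactoring fixes an operand layout for induction-like header phis:

- Operand 0 is the start value.
- Operand 1 is the step, now exposed by `VPWidenInductionBasePHIRecipe`.
- Operand 2, appended later, is the backedge value returned by `getBackedgeValue()`.

The sketch below mirrors that convention with hypothetical, heavily simplified `Value`, `Recipe`, `InductionBase` and `IntOrFpInduction` classes. It is not the VPlan API, and it uses `assert` where the patch uses `llvm_unreachable`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for VPValue.
struct Value {};

// Hypothetical stand-in for a recipe owning an ordered operand list.
class Recipe {
  std::vector<Value *> Operands;

public:
  virtual ~Recipe() = default;
  void addOperand(Value *V) { Operands.push_back(V); }
  Value *getOperand(unsigned I) const {
    assert(I < Operands.size() && "operand index out of range");
    return Operands[I];
  }
  unsigned getNumOperands() const {
    return static_cast<unsigned>(Operands.size());
  }
  virtual std::unique_ptr<Recipe> clone() const = 0;
};

// Shared base for induction-like phis: operand 0 = start, operand 1 = step.
class InductionBase : public Recipe {
public:
  InductionBase(Value *Start, Value *Step) {
    addOperand(Start);
    addOperand(Step);
  }
  Value *getStartValue() const { return getOperand(0); }
  Value *getStepValue() const { return getOperand(1); }
};

// Int/FP induction: operand 2 (the backedge value) is appended by a later
// transform, so it may be absent.
class IntOrFpInduction final : public InductionBase {
public:
  using InductionBase::InductionBase;

  Value *getBackedgeValue() const {
    assert(getNumOperands() == 3 && "backedge value not added yet");
    return getOperand(2);
  }

  std::unique_ptr<Recipe> clone() const override {
    std::unique_ptr<Recipe> C =
        std::make_unique<IntOrFpInduction>(getStartValue(), getStepValue());
    // Copy the backedge operand only if it already exists.
    if (getNumOperands() == 3)
      C->addOperand(getOperand(2));
    return C;
  }
};
```

Copying operand 2 only when it is present keeps `clone()` valid both before and after the transform that appends the backedge value.
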
https://github.com/llvm/llvm-project/pull/82021