[llvm] [LV] Decompose WidenIntOrFPInduction into phi and update recipes (PR #82021)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 16 10:34:01 PST 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-risc-v
Author: Kolya Panchenko (nikolaypanchenko)
<details>
<summary>Changes</summary>
The Loop Vectorizer still has two recipes, `VPWidenIntOrFpInductionRecipe` and `VPWidenPointerInductionRecipe`, that appear phi-like in a VPlan because they derive from `VPHeaderPHIRecipe`, yet their generate functions construct both the vector phi and its vector self-update in the vectorized loop.
This hurts the readability of a VPlan and requires extra code to maintain. For instance, there is already ad-hoc code motion that moves the updates generated by these recipes closer to the loop latch.
The changeset:
* Adds `WidenVFxUF` to represent the `broadcast({1...UF} x VFxUF)` value
* Decomposes the existing `VPWidenIntOrFpInductionRecipe` into
```
WIDEN-INDUCTION vp<%iv> = phi ir<0>, vp<%be-value>
...
EMIT vp<%widen-step> = mul ir<%step>, vp<WidenVFxUF>
EMIT vp<%be-value> = add vp<%iv>, vp<%widen-step>
```
* Moves the trunc optimization of the widened IV into a VPlan transform
* Adds trivial cyclic dependency removal and marks some binops as non-side-effecting
* Adds an element type to `VPValue` so that it can be queried for artificially added `VPValue`s that have no underlying instruction
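
As a rough illustration of the phi/update decomposition listed above, the following standalone C++ sketch models a widened integer induction as an explicit phi value plus separate `mul` and `add` steps. It is not LLVM code: `Lanes`, `initialWidenIV`, `backedgeValue`, and `simulate` are hypothetical names introduced here, and the sketch assumes an integer induction with a single unrolled part (UF = 1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not LLVM code: one vector value holding VF lanes.
using Lanes = std::vector<int64_t>;

// Initial value of the widened induction phi: lane L holds Start + L * Step.
Lanes initialWidenIV(int64_t Start, int64_t Step, unsigned VF) {
  Lanes IV(VF);
  for (unsigned L = 0; L < VF; ++L)
    IV[L] = Start + static_cast<int64_t>(L) * Step;
  return IV;
}

// Models the two explicit recipes from the listing:
//   widen-step = mul step, broadcast(VF * UF)
//   be-value   = add iv, widen-step
Lanes backedgeValue(const Lanes &IV, int64_t Step, unsigned VF, unsigned UF) {
  // The same scalar is added to every lane, i.e. a broadcast value.
  const int64_t WidenStep = Step * static_cast<int64_t>(VF) * UF;
  Lanes Next(IV.size());
  for (std::size_t L = 0; L < IV.size(); ++L)
    Next[L] = IV[L] + WidenStep;
  return Next;
}

// Runs Iters vector iterations (assumption: UF = 1) and returns every lane
// value the phi takes, in iteration order.
std::vector<int64_t> simulate(int64_t Start, int64_t Step, unsigned VF,
                              unsigned Iters) {
  std::vector<int64_t> Seen;
  Lanes IV = initialWidenIV(Start, Step, VF);
  for (unsigned I = 0; I < Iters; ++I) {
    Seen.insert(Seen.end(), IV.begin(), IV.end());
    IV = backedgeValue(IV, Step, VF, /*UF=*/1);
  }
  return Seen;
}
```

Under these assumptions, `simulate(0, 3, 4, 2)` visits the same values a scalar loop would produce for the first eight iterations of an induction starting at 0 with step 3, which is the property the explicit backedge value is meant to preserve.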
---
Patch is 3.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82021.diff
171 Files Affected:
- (modified) llvm/include/llvm/Analysis/IVDescriptors.h (+5)
- (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+88-36)
- (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+13-1)
- (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+54-16)
- (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+57-32)
- (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+13-1)
- (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+26-60)
- (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+81-2)
- (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+19-1)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+42-44)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+120-120)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll (+5-5)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+64-12)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-trunc.ll (+62-12)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll (+9-9)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaved-store-of-first-order-recurrence.ll (+49-14)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_prefer_scalable.ll (+31-31)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_test1_no_explicit_vect_width.ll (+123-57)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+11-11)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+19-20)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll (+78-16)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+844-844)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll (+36-36)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+2903-778)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+24-24)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+22-22)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+18-18)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll (+14-15)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+149-54)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll (+11-11)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+173-168)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll (+170-170)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll (+63-19)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+43-43)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (+11-11)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+109-109)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+158-158)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+149-149)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll (+131-32)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+56-51)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/vector-call-linear-args.ll (+56-69)
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+9-9)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+81-83)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+35-35)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+21-22)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+66-66)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/ordered-reduction.ll (+39-39)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-interleaved.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+106-106)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+580-214)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+123-125)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+238-243)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/zvl32b.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll (+202-41)
- (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll (+30-30)
- (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll (+496-119)
- (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+167-104)
- (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+6-6)
- (modified) llvm/test/Transforms/LoopVectorize/X86/float-induction-x86.ll (+31-40)
- (modified) llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll (+54-54)
- (modified) llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-sink-store-across-load.ll (+12-12)
- (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+27-27)
- (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+364-364)
- (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+42-46)
- (modified) llvm/test/Transforms/LoopVectorize/X86/outer_loop_test1_no_explicit_vect_width.ll (+118-57)
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+24-27)
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+39-10)
- (modified) llvm/test/Transforms/LoopVectorize/X86/pr54634.ll (+19-25)
- (modified) llvm/test/Transforms/LoopVectorize/X86/scatter_crash.ll (+245-15)
- (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+60-61)
- (modified) llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll (+29-32)
- (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+47-58)
- (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+8-9)
- (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+6-7)
- (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll (+189-191)
- (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll (+23-24)
- (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+88-98)
- (modified) llvm/test/Transforms/LoopVectorize/branch-weights.ll (+101-52)
- (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+6-7)
- (modified) llvm/test/Transforms/LoopVectorize/cast-induction.ll (+363-58)
- (modified) llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+2-2)
- (modified) llvm/test/Transforms/LoopVectorize/dbg-outer-loop-vect.ll (+12-12)
- (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll (+5-5)
- (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll (+31-31)
- (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-trunc-induction-steps.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+3-72)
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll (+648-198)
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll (+3-352)
- (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+174-176)
- (modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+138-149)
- (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/fpsat.ll (+3-3)
- (modified) llvm/test/Transforms/LoopVectorize/i8-induction.ll (+98-4)
- (modified) llvm/test/Transforms/LoopVectorize/icmp-uniforms.ll (+3-1)
- (modified) llvm/test/Transforms/LoopVectorize/if-pred-non-void.ll (+104-112)
- (modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+8-7)
- (modified) llvm/test/Transforms/LoopVectorize/induction-ptrcasts.ll (+83-17)
- (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+226-75)
- (modified) llvm/test/Transforms/LoopVectorize/induction-unroll-novec.ll (+59-20)
- (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+839-880)
- (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+15-17)
- (modified) llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll (+10-13)
- (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+35-35)
- (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+8-8)
- (modified) llvm/test/Transforms/LoopVectorize/loop-form.ll (+12-12)
- (modified) llvm/test/Transforms/LoopVectorize/loop-scalars.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll (+8-8)
- (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-liveout.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll (+114-114)
- (modified) llvm/test/Transforms/LoopVectorize/outer-loop-vec-phi-predecessor-order.ll (+5-5)
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_hcfg_construction.ll (+27-18)
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_scalable.ll (+32-31)
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_test1.ll (+62-29)
- (modified) llvm/test/Transforms/LoopVectorize/outer_loop_test2.ll (+94-40)
- (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll (+28-28)
- (modified) llvm/test/Transforms/LoopVectorize/pointer-select-runtime-checks.ll (+99-99)
- (modified) llvm/test/Transforms/LoopVectorize/pr30654-phiscev-sext-trunc.ll (+33-33)
- (modified) llvm/test/Transforms/LoopVectorize/pr35773.ll (+57-16)
- (modified) llvm/test/Transforms/LoopVectorize/pr37248.ll (+10-10)
- (modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/pr45259.ll (+5-5)
- (modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+90-102)
- (modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+9-9)
- (modified) llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll (+45-45)
- (modified) llvm/test/Transforms/LoopVectorize/pr55100-expand-scev-predicate-used.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+27-27)
- (modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+8-8)
- (modified) llvm/test/Transforms/LoopVectorize/pr59319-loop-access-info-invalidation.ll (+2-2)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-align.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+154-154)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+192-198)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+21-21)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-odd-interleave-counts.ll (+136-70)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-predselect.ll (+61-61)
- (modified) llvm/test/Transforms/LoopVectorize/reduction-small-size.ll (+14-14)
- (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+87-87)
- (modified) llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll (+16-17)
- (modified) llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll (+11-11)
- (modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+1026-88)
- (modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+66-65)
- (modified) llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll (+69-26)
- (modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+15-15)
- (modified) llvm/test/Transforms/LoopVectorize/scalarize-masked-call.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+8-8)
- (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+86-87)
- (modified) llvm/test/Transforms/LoopVectorize/skeleton-lcssa-crash.ll (+10-10)
- (modified) llvm/test/Transforms/LoopVectorize/strict-fadd-interleave-only.ll (+33-35)
- (modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+14-30)
- (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+135-49)
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+64-63)
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll (+32-32)
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll (+12-12)
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll (+52-52)
- (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll (+210-210)
- (modified) llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/vector-geps.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+3-1)
- (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+14-4)
- (modified) llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll (+30-16)
- (modified) llvm/test/Transforms/LoopVectorize/vplan-vectorize-inner-loop-reduction.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/vplan-widen-call-instruction.ll (+1-1)
- (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (-10)
``````````diff
diff --git a/llvm/include/llvm/Analysis/IVDescriptors.h b/llvm/include/llvm/Analysis/IVDescriptors.h
index 5c7b613ac48c40..7ca13adae87f6a 100644
--- a/llvm/include/llvm/Analysis/IVDescriptors.h
+++ b/llvm/include/llvm/Analysis/IVDescriptors.h
@@ -363,6 +363,11 @@ class InductionDescriptor {
return nullptr;
}
+ const Instruction *getExactFPMathInst() const {
+ return const_cast<const Instruction *>(
+ const_cast<InductionDescriptor *>(this)->getExactFPMathInst());
+ }
+
/// Returns binary opcode of the induction operator.
Instruction::BinaryOps getInductionOpcode() const {
return InductionBinOp ? InductionBinOp->getOpcode()
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 98b177cf5d2d0e..92b783d3badeae 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8114,34 +8114,6 @@ VPHeaderPHIRecipe *VPRecipeBuilder::tryToOptimizeInductionPHI(
return nullptr;
}
-VPWidenIntOrFpInductionRecipe *VPRecipeBuilder::tryToOptimizeInductionTruncate(
- TruncInst *I, ArrayRef<VPValue *> Operands, VFRange &Range, VPlan &Plan) {
- // Optimize the special case where the source is a constant integer
- // induction variable. Notice that we can only optimize the 'trunc' case
- // because (a) FP conversions lose precision, (b) sext/zext may wrap, and
- // (c) other casts depend on pointer size.
-
- // Determine whether \p K is a truncation based on an induction variable that
- // can be optimized.
- auto isOptimizableIVTruncate =
- [&](Instruction *K) -> std::function<bool(ElementCount)> {
- return [=](ElementCount VF) -> bool {
- return CM.isOptimizableIVTruncate(K, VF);
- };
- };
-
- if (LoopVectorizationPlanner::getDecisionAndClampRange(
- isOptimizableIVTruncate(I), Range)) {
-
- auto *Phi = cast<PHINode>(I->getOperand(0));
- const InductionDescriptor &II = *Legal->getIntOrFpInductionDescriptor(Phi);
- VPValue *Start = Plan.getVPValueOrAddLiveIn(II.getStartValue());
- return createWidenInductionRecipes(Phi, I, Start, II, Plan, *PSE.getSE(),
- *OrigLoop, Range);
- }
- return nullptr;
-}
-
VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
ArrayRef<VPValue *> Operands,
VPlanPtr &Plan) {
@@ -8275,6 +8247,70 @@ bool VPRecipeBuilder::shouldWiden(Instruction *I, VFRange &Range) const {
Range);
}
+VPWidenCastRecipe *VPRecipeBuilder::createCast(VPValue *V, Type *From,
+ Type *To) {
+ if (From == To)
+ return nullptr;
+ Instruction::CastOps CastOpcode;
+ if (To->isIntegerTy() && From->isIntegerTy())
+ CastOpcode = To->getPrimitiveSizeInBits() < From->getPrimitiveSizeInBits()
+ ? Instruction::Trunc
+ : Instruction::ZExt;
+ else if (To->isIntegerTy())
+ CastOpcode = Instruction::FPToUI;
+ else
+ CastOpcode = Instruction::UIToFP;
+
+ return new VPWidenCastRecipe(CastOpcode, V, To);
+}
+
+VPRecipeBase *
+VPRecipeBuilder::createWidenStep(VPWidenIntOrFpInductionRecipe &WIV,
+ ScalarEvolution &SE, VPlan &Plan,
+ DenseSet<VPRecipeBase *> *CreatedRecipes) {
+ PHINode *PN = WIV.getPHINode();
+ const InductionDescriptor &IndDesc = WIV.getInductionDescriptor();
+ VPValue *ScalarStep =
+ vputils::getOrCreateVPValueForSCEVExpr(Plan, IndDesc.getStep(), SE);
+ Type *VFxUFTy = Plan.getVFxUF().getElementType();
+ Type *StepTy = IndDesc.getStep()->getType();
+ VPValue *WidenVFxUF = &Plan.getWidenVFxUF();
+ VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
+ if (VPWidenCastRecipe *WidenVFxUFCast =
+ createCast(&Plan.getWidenVFxUF(), VFxUFTy, StepTy)) {
+ WidenVFxUFCast->insertBefore(LatchVPBB->getTerminator());
+ if (CreatedRecipes)
+ CreatedRecipes->insert(WidenVFxUFCast);
+ WidenVFxUF = WidenVFxUFCast->getVPSingleValue();
+ }
+ const Instruction::BinaryOps UpdateOp =
+ IndDesc.getInductionOpcode() != Instruction::BinaryOpsEnd
+ ? IndDesc.getInductionOpcode()
+ : Instruction::Add;
+ VPInstruction *Update;
+ if (StepTy->isIntegerTy()) {
+ VPInstruction *Mul = new VPInstruction(
+ Instruction::Mul, {WidenVFxUF, ScalarStep}, PN->getDebugLoc());
+ Mul->insertBefore(LatchVPBB->getTerminator());
+ if (CreatedRecipes)
+ CreatedRecipes->insert(Mul);
+ Update = new VPInstruction(UpdateOp, {&WIV, Mul}, PN->getDebugLoc());
+ Update->insertBefore(LatchVPBB->getTerminator());
+ } else {
+ FastMathFlags FMF = IndDesc.getExactFPMathInst()
+ ? IndDesc.getExactFPMathInst()->getFastMathFlags()
+ : FastMathFlags();
+ VPInstruction *Mul = new VPInstruction(
+ Instruction::FMul, {WidenVFxUF, ScalarStep}, FMF, PN->getDebugLoc());
+ Mul->insertBefore(LatchVPBB->getTerminator());
+ Update = new VPInstruction(UpdateOp, {&WIV, Mul}, FMF, PN->getDebugLoc());
+ Update->insertBefore(LatchVPBB->getTerminator());
+ }
+ if (CreatedRecipes)
+ CreatedRecipes->insert(Update);
+ return Update;
+}
+
VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
ArrayRef<VPValue *> Operands,
VPBasicBlock *VPBB, VPlanPtr &Plan) {
@@ -8324,10 +8360,15 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
};
}
-void VPRecipeBuilder::fixHeaderPhis() {
+void VPRecipeBuilder::fixHeaderPhis(VPlan &Plan) {
BasicBlock *OrigLatch = OrigLoop->getLoopLatch();
for (VPHeaderPHIRecipe *R : PhisToFix) {
- auto *PN = cast<PHINode>(R->getUnderlyingValue());
+ if (auto *VPWIFR = dyn_cast<VPWidenIntOrFpInductionRecipe>(R)) {
+ VPWIFR->addOperand(
+ createWidenStep(*VPWIFR, *PSE.getSE(), Plan)->getVPSingleValue());
+ continue;
+ }
+ PHINode *PN = cast<PHINode>(R->getUnderlyingValue());
VPRecipeBase *IncR =
getRecipe(cast<Instruction>(PN->getIncomingValueForBlock(OrigLatch)));
R->addOperand(IncR->getVPSingleValue());
@@ -8405,8 +8446,12 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
// can have earlier phis as incoming values.
recordRecipeOf(Phi);
- if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range)))
+ if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range))) {
+ if (isa<VPWidenPointerInductionRecipe>(Recipe))
+ return Recipe;
+ PhisToFix.push_back(cast<VPWidenIntOrFpInductionRecipe>(Recipe));
return Recipe;
+ }
VPHeaderPHIRecipe *PhiRecipe = nullptr;
assert((Legal->isReductionVariable(Phi) ||
@@ -8441,10 +8486,17 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
return PhiRecipe;
}
- if (isa<TruncInst>(Instr) &&
- (Recipe = tryToOptimizeInductionTruncate(cast<TruncInst>(Instr), Operands,
- Range, *Plan)))
- return Recipe;
+ if (isa<TruncInst>(Instr)) {
+ auto IsOptimizableIVTruncate =
+ [&](Instruction *K) -> std::function<bool(ElementCount)> {
+ return [=](ElementCount VF) -> bool {
+ return CM.isOptimizableIVTruncate(K, VF);
+ };
+ };
+
+ LoopVectorizationPlanner::getDecisionAndClampRange(
+ IsOptimizableIVTruncate(Instr), Range);
+ }
// All widen recipes below deal only with VF > 1.
if (LoopVectorizationPlanner::getDecisionAndClampRange(
@@ -8707,7 +8759,7 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
!Plan->getVectorLoopRegion()->getEntryBasicBlock()->empty() &&
"entry block must be set to a VPRegionBlock having a non-empty entry "
"VPBasicBlock");
- RecipeBuilder.fixHeaderPhis();
+ RecipeBuilder.fixHeaderPhis(*Plan);
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index b1498026adadfe..126a6b1c061265 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -146,6 +146,18 @@ class VPRecipeBuilder {
/// between SRC and DST.
VPValue *getEdgeMask(BasicBlock *Src, BasicBlock *Dst) const;
+ /// A helper function to create VPWidenCastRecipe of a \p V VPValue to a \p To
+ /// type.
+ /// FIXME: Remove \p From argument and take it from a \p V value
+ static VPWidenCastRecipe *createCast(VPValue *V, Type *From, Type *To);
+
+ /// A helper function which widens \p WIV step, multiplies it by WidenVFxUF
+ /// and attaches to loop latch of the \p Plan. Returns multiplication.
+ static VPRecipeBase *
+ createWidenStep(VPWidenIntOrFpInductionRecipe &WIV, ScalarEvolution &SE,
+ VPlan &Plan,
+ DenseSet<VPRecipeBase *> *CreatedRecipes = nullptr);
+
/// Mark given ingredient for recording its recipe once one is created for
/// it.
void recordRecipeOf(Instruction *I) {
@@ -171,7 +183,7 @@ class VPRecipeBuilder {
/// Add the incoming values from the backedge to reduction & first-order
/// recurrence cross-iteration phis.
- void fixHeaderPhis();
+ void fixHeaderPhis(VPlan &Plan);
};
} // end namespace llvm
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 2c0daa82afa59f..96732b77a9db3d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -76,12 +76,25 @@ Value *VPLane::getAsRuntimeExpr(IRBuilderBase &Builder,
llvm_unreachable("Unknown lane kind");
}
-VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def)
- : SubclassID(SC), UnderlyingVal(UV), Def(Def) {
+VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def, Type *Ty)
+ : SubclassID(SC), UnderlyingVal(UV), UnderlyingTy(Ty), Def(Def) {
+ if (UnderlyingTy)
+ assert((!UnderlyingVal || UnderlyingVal->getType() == UnderlyingTy) &&
+ "VPValue with set type should either be created without underlying "
+ "value or type should match the given type");
if (Def)
Def->addDefinedValue(this);
}
+Type *VPValue::getElementType() {
+ return const_cast<Type *>(
+ const_cast<const VPValue *>(this)->getElementType());
+}
+
+const Type *VPValue::getElementType() const {
+ return UnderlyingVal ? UnderlyingVal->getType() : UnderlyingTy;
+}
+
VPValue::~VPValue() {
assert(Users.empty() && "trying to delete a VPValue with remaining users");
if (Def)
@@ -763,6 +776,10 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
auto Plan = std::make_unique<VPlan>(Preheader, VecPreheader);
Plan->TripCount =
vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
+ Type *TCType = TripCount->getType();
+ Plan->getVectorTripCount().setElementType(TCType);
+ Plan->getVFxUF().setElementType(TCType);
+ Plan->getWidenVFxUF().setElementType(TCType);
// Create empty VPRegionBlock, to be filled during processing later.
auto *TopRegion = new VPRegionBlock("vector loop", false /*isReplicator*/);
VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
@@ -796,6 +813,18 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF),
0);
+ if (WidenVFxUF.getNumUsers() > 0)
+ for (unsigned Part = 0, UF = State.UF; Part < UF; ++Part) {
+ Value *Step =
+ createStepForVF(Builder, TripCountV->getType(), State.VF, Part+1);
+ if (State.VF.isScalar())
+ State.set(&WidenVFxUF, Step, Part);
+ else
+ State.set(&WidenVFxUF,
+ Builder.CreateVectorSplat(State.VF, Step, "widen.vfxuf"),
+ Part);
+ }
+
// When vectorizing the epilogue loop, the canonical induction start value
// needs to be changed from zero to the value after the main vector loop.
// FIXME: Improve modeling for canonical IV start values in the epilogue loop.
@@ -845,21 +874,16 @@ void VPlan::execute(VPTransformState *State) {
if (isa<VPWidenPHIRecipe>(&R))
continue;
- if (isa<VPWidenPointerInductionRecipe>(&R) ||
- isa<VPWidenIntOrFpInductionRecipe>(&R)) {
+ if (isa<VPWidenPointerInductionRecipe>(&R)) {
PHINode *Phi = nullptr;
- if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
- Phi = cast<PHINode>(State->get(R.getVPSingleValue(), 0));
- } else {
- auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
- // TODO: Split off the case that all users of a pointer phi are scalar
- // from the VPWidenPointerInductionRecipe.
- if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
- continue;
-
- auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi, 0));
- Phi = cast<PHINode>(GEP->getPointerOperand());
- }
+ auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+ // TODO: Split off the case that all users of a pointer phi are scalar
+ // from the VPWidenPointerInductionRecipe.
+ if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
+ continue;
+
+ auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi, 0));
+ Phi = cast<PHINode>(GEP->getPointerOperand());
Phi->setIncomingBlock(1, VectorLatchBB);
@@ -877,6 +901,7 @@ void VPlan::execute(VPTransformState *State) {
// generated.
bool SinglePartNeeded = isa<VPCanonicalIVPHIRecipe>(PhiR) ||
isa<VPFirstOrderRecurrencePHIRecipe>(PhiR) ||
+ isa<VPWidenIntOrFpInductionRecipe>(PhiR) ||
(isa<VPReductionPHIRecipe>(PhiR) &&
cast<VPReductionPHIRecipe>(PhiR)->isOrdered());
unsigned LastPartForNewPhi = SinglePartNeeded ? 1 : State->UF;
@@ -908,6 +933,12 @@ void VPlan::printLiveIns(raw_ostream &O) const {
O << " = VF * UF";
}
+ if (WidenVFxUF.getNumUsers() > 0) {
+ O << "\nLive-in ";
+ WidenVFxUF.printAsOperand(O, SlotTracker);
+ O << " = WIDEN VF * UF";
+ }
+
if (VectorTripCount.getNumUsers() > 0) {
O << "\nLive-in ";
VectorTripCount.printAsOperand(O, SlotTracker);
@@ -1083,6 +1114,11 @@ VPlan *VPlan::duplicate() {
}
Old2NewVPValues[&VectorTripCount] = &NewPlan->VectorTripCount;
Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF;
+ Old2NewVPValues[&WidenVFxUF] = &NewPlan->WidenVFxUF;
+ NewPlan->getVectorTripCount().setElementType(
+ getVectorTripCount().getElementType());
+ NewPlan->getVFxUF().setElementType(getVFxUF().getElementType());
+ NewPlan->getWidenVFxUF().setElementType(getWidenVFxUF().getElementType());
if (BackedgeTakenCount) {
NewPlan->BackedgeTakenCount = new VPValue();
Old2NewVPValues[BackedgeTakenCount] = NewPlan->BackedgeTakenCount;
@@ -1379,6 +1415,8 @@ void VPSlotTracker::assignSlot(const VPValue *V) {
void VPSlotTracker::assignSlots(const VPlan &Plan) {
if (Plan.VFxUF.getNumUsers() > 0)
assignSlot(&Plan.VFxUF);
+ if (Plan.WidenVFxUF.getNumUsers() > 0)
+ assignSlot(&Plan.WidenVFxUF);
assignSlot(&Plan.VectorTripCount);
if (Plan.BackedgeTakenCount)
assignSlot(Plan.BackedgeTakenCount);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 13e1859ad6b250..306c2200ca34c9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1618,38 +1618,65 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
}
};
-/// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
-class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
- PHINode *IV;
- TruncInst *Trunc;
+/// A base class for all widen induction-like recipes
+class VPWidenInductionBasePHIRecipe : public VPHeaderPHIRecipe {
+protected:
const InductionDescriptor &IndDesc;
public:
- VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
+ VPWidenInductionBasePHIRecipe(unsigned char VPDefID, Instruction *Instr,
+ VPValue *Start, VPValue *Step,
const InductionDescriptor &IndDesc)
- : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV, Start), IV(IV),
- Trunc(nullptr), IndDesc(IndDesc) {
+ : VPHeaderPHIRecipe(VPDefID, Instr, Start), IndDesc(IndDesc) {
addOperand(Step);
}
+ ~VPWidenInductionBasePHIRecipe() override = default;
+
+ /// Returns the step value of the induction.
+ VPValue *getStepValue() { return getOperand(1); }
+ const VPValue *getStepValue() const { return getOperand(1); }
+
+ /// Returns the induction descriptor for the recipe.
+ const InductionDescriptor &getInductionDescriptor() const { return IndDesc; }
+};
+
+/// A recipe for handling phi nodes of integer and floating-point inductions,
+/// producing their vector values.
+class VPWidenIntOrFpInductionRecipe : public VPWidenInductionBasePHIRecipe {
+ PHINode *IV = nullptr;
+ TruncInst *Trunc = nullptr;
+
+public:
+ VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
+ const InductionDescriptor &IndDesc)
+ : VPWidenInductionBasePHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV,
+ Start, Step, IndDesc),
+ IV(IV), Trunc(nullptr) {}
+
VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
const InductionDescriptor &IndDesc,
TruncInst *Trunc)
- : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc, Start),
- IV(IV), Trunc(Trunc), IndDesc(IndDesc) {
- addOperand(Step);
- }
+ : VPWidenInductionBasePHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc,
+ Start, Step, IndDesc),
+ IV(IV), Trunc(Trunc) {}
~VPWidenIntOrFpInductionRecipe() override = default;
VPRecipeBase *clone() override {
- return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(),
- getStepValue(), IndDesc, Trunc);
+ VPRecipeBase *Cloned = new VPWidenIntOrFpInductionRecipe(
+ getPHINode(), getStartValue(), getStepValue(), IndDesc, Trunc);
+ if (getNumOperands() == 3)
+ Cloned->addOperand(getOperand(2));
+ return Cloned;
}
VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
+ static inline bool classof(const VPHeaderPHIRecipe *R) {
+ return R->getVPDefID() == VPDef::VPWidenIntOrFpInductionSC;
+ }
+
/// Generate the vectorized and scalarized versions of the phi node as
/// needed by their users.
void execute(VPTransformState &State) override;
@@ -1660,33 +1687,24 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
VPSlotTracker &SlotTracker) const override;
#endif
- VPValue *getBackedgeValue() override {
- // TODO: All operands of base recipe must exist and be at same index in
- // derived recipe.
- llvm_unreachable(
- "VPWidenIntOrFpInductionRecipe generates its own backedge value");
+ VPValue *getBackedgeValue() override final {
+ if (getNumOperands() != 3)
+ llvm_unreachable(
+ "VPWidenIntOrFpInductionRecipe::getBackedgeValue is not yet valid");
+ return getOperand(2);
}
- VPRecipeBase &getBackedgeRecipe() override {
- // TODO: All operands of base recipe must exist and be at same index in
- // derived recipe.
- llvm_unreachable(
- "VPWidenIntOrFpInductionRecipe generates its own backedge value");
+ VPRecipeBase &getBackedgeRecipe() override final {
+ return *getBackedgeValue()->getDefiningRecipe();
}
- /// Returns the step value of the induction.
- VPValue *getStepValue() { return getOperand(1); }
- const VPValue *getStepValue() const { return getOperand(1); }
-
/// Returns the first defined value as TruncInst, if it is one or nullptr
/// otherwise.
TruncInst *getTruncInst() { return Trunc; }
const TruncInst *getTruncInst() const { retu...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/82021