[llvm] [VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (PR #126177)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 6 21:11:05 PST 2025
llvmbot wrote:
@llvm/pr-subscribers-llvm-transforms
Author: Luke Lau (lukel97)
This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform.
IIUC we initially did this to avoid `vl` toggles on RISC-V. However, we now have the RISCVVLOptimizer pass, which makes this mostly redundant.
Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and in DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics.
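To make the difference concrete, here is a minimal sketch of what the EVL transform emits for a widened binary op before and after this patch (value names and types are illustrative, mirroring the test diffs below, not literal output):

```llvm
; Before: widened ops became VP intrinsics carrying an all-true mask and the EVL.
%vp.op = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> splat (i1 true), i32 %evl)

; After: the op stays regular IR, so generic middle-end and DAGCombiner
; optimisations apply; the EVL still governs the surrounding memory recipes.
%add = add <vscale x 4 x i32> %a, %b
```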
On SPEC CPU 2017 we get noticeably better code generation:
- Better matching of mixed-width arithmetic:
```diff
- vzext.vf2 v16, v14
- vzext.vf2 v14, v15
- vwsubu.vv v15, v16, v14
- vsetvli zero, zero, e32, m1, ta, ma
- vmul.vv v14, v15, v15
+ vwsubu.vv v16, v14, v15
+ vsetvli zero, zero, e16, mf2, ta, ma
+ vwmul.vv v14, v16, v16
```
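  The widening forms fall out of plain extend/arithmetic chains that the RISC-V patterns already match; roughly this shape (an illustrative sketch with assumed types, not taken from the benchmark):

```llvm
; Illustrative only: with plain IR, sub-of-zexts can be selected as vwsubu.vv
; and the squaring of the sign-extended difference as vwmul.vv, instead of
; separate vzext.vf2 steps and a full-width multiply.
%a.ext = zext <vscale x 4 x i8> %a to <vscale x 4 x i16>
%b.ext = zext <vscale x 4 x i8> %b to <vscale x 4 x i16>
%d     = sub <vscale x 4 x i16> %a.ext, %b.ext
%d.ext = sext <vscale x 4 x i16> %d to <vscale x 4 x i32>
%sq    = mul <vscale x 4 x i32> %d.ext, %d.ext
```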
- Ability to match saturating arithmetic:
```diff
@@ -6896,24 +6825,19 @@
sub s5, t4, t6
add s1, a2, t6
vsetvli s5, s5, e32, m2, ta, ma
- vle8.v v8, (s1)
+ vle8.v v10, (s1)
add s1, t5, t6
sub s0, s0, t1
- vmv.v.i v12, 0
- vzext.vf4 v14, v8
- vmadd.vx v14, s2, v10
- vsra.vx v8, v14, s3
- vadd.vx v14, v8, s4
- vmsgt.vi v0, v14, 0
- vmsltu.vx v8, v14, t0
- vmerge.vim v12, v12, -1, v0
- vmv1r.v v0, v8
- vmerge.vvm v8, v12, v14, v0
- vsetvli zero, zero, e16, m1, ta, ma
- vnsrl.wi v12, v8, 0
+ vzext.vf4 v12, v10
+ vmadd.vx v12, s2, v8
+ vsra.vx v10, v12, s3
+ vadd.vx v10, v10, s4
+ vmax.vx v10, v10, zero
+ vsetvli zero, zero, e16, m1, ta, ma
+ vnclipu.wi v12, v10, 0
```
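  The saturating form comes from a clamp-then-truncate chain expressed in plain IR; schematically (an illustrative sketch assuming a [0, 65535] clamp):

```llvm
; Illustrative only: smax/umin + trunc in plain IR can be selected as
; vmax.vx + vnclipu.wi, replacing the compare/vmerge chains above.
%lo = call <vscale x 4 x i32> @llvm.smax.nxv4i32(<vscale x 4 x i32> %x, <vscale x 4 x i32> zeroinitializer)
%hi = call <vscale x 4 x i32> @llvm.umin.nxv4i32(<vscale x 4 x i32> %lo, <vscale x 4 x i32> splat (i32 65535))
%n  = trunc <vscale x 4 x i32> %hi to <vscale x 4 x i16>
```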
- Strength reduction on div:
```diff
vluxei64.v v10, (a6), v12
sub a5, a5, a4
vmadd.vv v14, v10, v9
- vdiv.vx v10, v14, s1
+ vmulh.vx v10, v14, s0
+ vsra.vi v10, v10, 12
+ vsrl.vi v11, v10, 31
+ vadd.vv v10, v10, v11
```
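  This is the standard divide-by-constant strength reduction (multiply-high plus shifts), which the backend performs on plain `sdiv` nodes; the `vp.sdiv` form previously stayed a `vdiv.vx`, as the before side shows. Schematically, for some constant divisor (illustrative value, not the benchmark's):

```llvm
; Illustrative only: a plain sdiv by a vector-splat constant ...
%q = sdiv <vscale x 2 x i32> %x, splat (i32 9999)
; ... is expanded roughly as (MAGIC/SHIFT are the divisor's magic numbers):
;   %t = mulh %x, MAGIC    ; vmulh.vx
;   %s = ashr %t, SHIFT    ; vsra.vi
;   %b = lshr %s, 31       ; vsrl.vi  (extract the sign bit to round toward zero)
;   %q = add  %s, %b       ; vadd.vv
```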
- Better mask code gen:
```diff
vand.vx v9, v10, s4
vmsne.vi v0, v9, 0
- vmv.v.i v9, 0
- vmerge.vim v9, v9, 1, v0
- vsetvli zero, zero, e32, m1, tu, ma
- vadd.vv v8, v8, v9
+ sub a4, a4, a3
+ vsetvli zero, zero, e32, m1, tu, mu
+ vadd.vi v8, v8, 1, v0.t
```
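  Here a plain-IR increment-under-mask can be selected directly as a masked `vadd.vi`; schematically (an illustrative sketch with assumed types):

```llvm
; Illustrative only: counting elements that satisfy a predicate. With plain
; IR, the zext'd mask feeding the add can be folded into a single masked
; vadd.vi v8, v8, 1, v0.t, instead of a vmerge.vim materialising 0/1
; followed by a vadd.vv.
%m   = icmp ne <vscale x 2 x i32> %v, zeroinitializer
%inc = zext <vscale x 2 x i1> %m to <vscale x 2 x i32>
%cnt = add <vscale x 2 x i32> %cnt.phi, %inc
```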
- Fewer constraints on `vl`, which allows RISCVVLOptimizer to remove some `vsetvli`s:
```diff
- vsetvli s1, a4, e32, m1, ta, ma
+ vsetvli a4, a4, e32, m1, ta, ma
vle64.v v10, (a5)
sub a2, a2, s8
vluxei64.v v9, (zero), v10
vsetvli a5, zero, e64, m2, ta, ma
- vmv.v.x v10, s1
+ vmv.v.x v10, a4
vmsleu.vv v12, v10, v14
- vmsltu.vx v10, v14, s1
+ vmsltu.vx v10, v14, a4
vmand.mm v11, v8, v12
- vsetvli zero, a4, e32, m1, ta, ma
+ vsetvli zero, zero, e32, m1, ta, ma
vand.vx v9, v9, s6
- vsetvli a4, zero, e32, m1, ta, ma
```
I've removed `VPWidenEVLRecipe` in this patch since it's no longer used, but if people would prefer to keep it for other targets (PPC?), I'd be happy to add it back and gate it behind a target hook.
---
Patch is 107.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126177.diff
23 Files Affected:
- (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-52)
- (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2-3)
- (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (-48)
- (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (-30)
- (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (-1)
- (modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (-4)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+3-3)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (+4-4)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (+7-7)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll (+18-18)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll (+45-23)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll (+10-10)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll (+2-2)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-intermediate-store.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+3-3)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (+6-6)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll (+22-22)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-cast-intrinsics.ll (+20-20)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll (+1-1)
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-select-intrinsics.ll (+2-2)
``````````diff
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index fac207287e0bcc7..064e916d6d81718 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -523,7 +523,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPWidenGEPSC:
case VPRecipeBase::VPWidenIntrinsicSC:
case VPRecipeBase::VPWidenSC:
- case VPRecipeBase::VPWidenEVLSC:
case VPRecipeBase::VPWidenSelectSC:
case VPRecipeBase::VPBlendSC:
case VPRecipeBase::VPPredInstPHISC:
@@ -710,7 +709,6 @@ class VPRecipeWithIRFlags : public VPSingleDefRecipe {
static inline bool classof(const VPRecipeBase *R) {
return R->getVPDefID() == VPRecipeBase::VPInstructionSC ||
R->getVPDefID() == VPRecipeBase::VPWidenSC ||
- R->getVPDefID() == VPRecipeBase::VPWidenEVLSC ||
R->getVPDefID() == VPRecipeBase::VPWidenGEPSC ||
R->getVPDefID() == VPRecipeBase::VPWidenCastSC ||
R->getVPDefID() == VPRecipeBase::VPWidenIntrinsicSC ||
@@ -1116,8 +1114,7 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {
}
static inline bool classof(const VPRecipeBase *R) {
- return R->getVPDefID() == VPRecipeBase::VPWidenSC ||
- R->getVPDefID() == VPRecipeBase::VPWidenEVLSC;
+ return R->getVPDefID() == VPRecipeBase::VPWidenSC;
}
static inline bool classof(const VPUser *U) {
@@ -1142,54 +1139,6 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {
#endif
};
-/// A recipe for widening operations with vector-predication intrinsics with
-/// explicit vector length (EVL).
-class VPWidenEVLRecipe : public VPWidenRecipe {
- using VPRecipeWithIRFlags::transferFlags;
-
-public:
- template <typename IterT>
- VPWidenEVLRecipe(Instruction &I, iterator_range<IterT> Operands, VPValue &EVL)
- : VPWidenRecipe(VPDef::VPWidenEVLSC, I, Operands) {
- addOperand(&EVL);
- }
- VPWidenEVLRecipe(VPWidenRecipe &W, VPValue &EVL)
- : VPWidenEVLRecipe(*W.getUnderlyingInstr(), W.operands(), EVL) {
- transferFlags(W);
- }
-
- ~VPWidenEVLRecipe() override = default;
-
- VPWidenRecipe *clone() override final {
- llvm_unreachable("VPWidenEVLRecipe cannot be cloned");
- return nullptr;
- }
-
- VP_CLASSOF_IMPL(VPDef::VPWidenEVLSC);
-
- VPValue *getEVL() { return getOperand(getNumOperands() - 1); }
- const VPValue *getEVL() const { return getOperand(getNumOperands() - 1); }
-
- /// Produce a vp-intrinsic using the opcode and operands of the recipe,
- /// processing EVL elements.
- void execute(VPTransformState &State) override final;
-
- /// Returns true if the recipe only uses the first lane of operand \p Op.
- bool onlyFirstLaneUsed(const VPValue *Op) const override {
- assert(is_contained(operands(), Op) &&
- "Op must be an operand of the recipe");
- // EVL in that recipe is always the last operand, thus any use before means
- // the VPValue should be vectorized.
- return getEVL() == Op;
- }
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- /// Print the recipe.
- void print(raw_ostream &O, const Twine &Indent,
- VPSlotTracker &SlotTracker) const override final;
-#endif
-};
-
/// VPWidenCastRecipe is a recipe to create vector cast instructions.
class VPWidenCastRecipe : public VPRecipeWithIRFlags {
/// Cast instruction opcode.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 71fb6d42116cfe4..dcd8eeecdc15b06 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -245,9 +245,8 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
VPPartialReductionRecipe>([this](const VPRecipeBase *R) {
return inferScalarType(R->getOperand(0));
})
- .Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPWidenEVLRecipe,
- VPReplicateRecipe, VPWidenCallRecipe, VPWidenMemoryRecipe,
- VPWidenSelectRecipe>(
+ .Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPReplicateRecipe,
+ VPWidenCallRecipe, VPWidenMemoryRecipe, VPWidenSelectRecipe>(
[this](const auto *R) { return inferScalarTypeForRecipe(R); })
.Case<VPWidenIntrinsicRecipe>([](const VPWidenIntrinsicRecipe *R) {
return R->getResultType();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c84a93d7398f73b..83c764b87953d9d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -85,7 +85,6 @@ bool VPRecipeBase::mayWriteToMemory() const {
case VPWidenLoadSC:
case VPWidenPHISC:
case VPWidenSC:
- case VPWidenEVLSC:
case VPWidenSelectSC: {
const Instruction *I =
dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -131,7 +130,6 @@ bool VPRecipeBase::mayReadFromMemory() const {
case VPWidenIntOrFpInductionSC:
case VPWidenPHISC:
case VPWidenSC:
- case VPWidenEVLSC:
case VPWidenSelectSC: {
const Instruction *I =
dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -172,7 +170,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
case VPWidenPHISC:
case VPWidenPointerInductionSC:
case VPWidenSC:
- case VPWidenEVLSC:
case VPWidenSelectSC: {
const Instruction *I =
dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -1550,42 +1547,6 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
}
}
-void VPWidenEVLRecipe::execute(VPTransformState &State) {
- unsigned Opcode = getOpcode();
- // TODO: Support other opcodes
- if (!Instruction::isBinaryOp(Opcode) && !Instruction::isUnaryOp(Opcode))
- llvm_unreachable("Unsupported opcode in VPWidenEVLRecipe::execute");
-
- State.setDebugLocFrom(getDebugLoc());
-
- assert(State.get(getOperand(0))->getType()->isVectorTy() &&
- "VPWidenEVLRecipe should not be used for scalars");
-
- VPValue *EVL = getEVL();
- Value *EVLArg = State.get(EVL, /*NeedsScalar=*/true);
- IRBuilderBase &BuilderIR = State.Builder;
- VectorBuilder Builder(BuilderIR);
- Value *Mask = BuilderIR.CreateVectorSplat(State.VF, BuilderIR.getTrue());
-
- SmallVector<Value *, 4> Ops;
- for (unsigned I = 0, E = getNumOperands() - 1; I < E; ++I) {
- VPValue *VPOp = getOperand(I);
- Ops.push_back(State.get(VPOp));
- }
-
- Builder.setMask(Mask).setEVL(EVLArg);
- Value *VPInst =
- Builder.createVectorInstruction(Opcode, Ops[0]->getType(), Ops, "vp.op");
- // Currently vp-intrinsics only accept FMF flags.
- // TODO: Enable other flags when support is added.
- if (isa<FPMathOperator>(VPInst))
- setFlags(cast<Instruction>(VPInst));
-
- State.set(this, VPInst);
- State.addMetadata(VPInst,
- dyn_cast_or_null<Instruction>(getUnderlyingValue()));
-}
-
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPWidenRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
@@ -1595,15 +1556,6 @@ void VPWidenRecipe::print(raw_ostream &O, const Twine &Indent,
printFlags(O);
printOperands(O, SlotTracker);
}
-
-void VPWidenEVLRecipe::print(raw_ostream &O, const Twine &Indent,
- VPSlotTracker &SlotTracker) const {
- O << Indent << "WIDEN ";
- printAsOperand(O, SlotTracker);
- O << " = vp." << Instruction::getOpcodeName(getOpcode());
- printFlags(O);
- printOperands(O, SlotTracker);
-}
#endif
void VPWidenCastRecipe::execute(VPTransformState &State) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 7e9ef46133936d7..18323a078f1ee0c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1665,40 +1665,10 @@ static VPRecipeBase *createEVLRecipe(VPValue *HeaderMask,
VPValue *NewMask = GetNewMask(S->getMask());
return new VPWidenStoreEVLRecipe(*S, EVL, NewMask);
})
- .Case<VPWidenRecipe>([&](VPWidenRecipe *W) -> VPRecipeBase * {
- unsigned Opcode = W->getOpcode();
- if (!Instruction::isBinaryOp(Opcode) && !Instruction::isUnaryOp(Opcode))
- return nullptr;
- return new VPWidenEVLRecipe(*W, EVL);
- })
.Case<VPReductionRecipe>([&](VPReductionRecipe *Red) {
VPValue *NewMask = GetNewMask(Red->getCondOp());
return new VPReductionEVLRecipe(*Red, EVL, NewMask);
})
- .Case<VPWidenIntrinsicRecipe, VPWidenCastRecipe>(
- [&](auto *CR) -> VPRecipeBase * {
- Intrinsic::ID VPID;
- if (auto *CallR = dyn_cast<VPWidenIntrinsicRecipe>(CR)) {
- VPID =
- VPIntrinsic::getForIntrinsic(CallR->getVectorIntrinsicID());
- } else {
- auto *CastR = cast<VPWidenCastRecipe>(CR);
- VPID = VPIntrinsic::getForOpcode(CastR->getOpcode());
- }
-
- // Not all intrinsics have a corresponding VP intrinsic.
- if (VPID == Intrinsic::not_intrinsic)
- return nullptr;
- assert(VPIntrinsic::getMaskParamPos(VPID) &&
- VPIntrinsic::getVectorLengthParamPos(VPID) &&
- "Expected VP intrinsic to have mask and EVL");
-
- SmallVector<VPValue *> Ops(CR->operands());
- Ops.push_back(&AllOneMask);
- Ops.push_back(&EVL);
- return new VPWidenIntrinsicRecipe(
- VPID, Ops, TypeInfo.inferScalarType(CR), CR->getDebugLoc());
- })
.Case<VPWidenSelectRecipe>([&](VPWidenSelectRecipe *Sel) {
SmallVector<VPValue *> Ops(Sel->operands());
Ops.push_back(&EVL);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index aabc4ab571e7a91..a058b2a121d59f3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -351,7 +351,6 @@ class VPDef {
VPWidenStoreEVLSC,
VPWidenStoreSC,
VPWidenSC,
- VPWidenEVLSC,
VPWidenSelectSC,
VPBlendSC,
VPHistogramSC,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 96156de444f8813..a4b309d6dcd9fe3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -145,10 +145,6 @@ bool VPlanVerifier::verifyEVLRecipe(const VPInstruction &EVL) const {
[&](const VPRecipeBase *S) { return VerifyEVLUse(*S, 2); })
.Case<VPWidenLoadEVLRecipe, VPReverseVectorPointerRecipe>(
[&](const VPRecipeBase *R) { return VerifyEVLUse(*R, 1); })
- .Case<VPWidenEVLRecipe>([&](const VPWidenEVLRecipe *W) {
- return VerifyEVLUse(*W,
- Instruction::isUnaryOp(W->getOpcode()) ? 1 : 2);
- })
.Case<VPScalarCastRecipe>(
[&](const VPScalarCastRecipe *S) { return VerifyEVLUse(*S, 0); })
.Case<VPInstruction>([&](const VPInstruction *I) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
index 14818199072c285..b96a44a546a141a 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
@@ -143,8 +143,8 @@ define i32 @add_i16_i32(ptr nocapture readonly %x, i32 %n) {
; IF-EVL-OUTLOOP-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i32 [[TMP6]]
; IF-EVL-OUTLOOP-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[TMP7]], i32 0
; IF-EVL-OUTLOOP-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 4 x i16> @llvm.vp.load.nxv4i16.p0(ptr align 2 [[TMP8]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
-; IF-EVL-OUTLOOP-NEXT: [[TMP9:%.*]] = call <vscale x 4 x i32> @llvm.vp.sext.nxv4i32.nxv4i16(<vscale x 4 x i16> [[VP_OP_LOAD]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
-; IF-EVL-OUTLOOP-NEXT: [[VP_OP:%.*]] = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> [[TMP9]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
+; IF-EVL-OUTLOOP-NEXT: [[TMP9:%.*]] = sext <vscale x 4 x i16> [[VP_OP_LOAD]] to <vscale x 4 x i32>
+; IF-EVL-OUTLOOP-NEXT: [[VP_OP:%.*]] = add <vscale x 4 x i32> [[VEC_PHI]], [[TMP9]]
; IF-EVL-OUTLOOP-NEXT: [[TMP10]] = call <vscale x 4 x i32> @llvm.vp.merge.nxv4i32(<vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> [[VP_OP]], <vscale x 4 x i32> [[VEC_PHI]], i32 [[TMP5]])
; IF-EVL-OUTLOOP-NEXT: [[INDEX_EVL_NEXT]] = add nuw i32 [[TMP5]], [[EVL_BASED_IV]]
; IF-EVL-OUTLOOP-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], [[TMP4]]
@@ -200,7 +200,7 @@ define i32 @add_i16_i32(ptr nocapture readonly %x, i32 %n) {
; IF-EVL-INLOOP-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i32 [[TMP7]]
; IF-EVL-INLOOP-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, ptr [[TMP8]], i32 0
; IF-EVL-INLOOP-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 8 x i16> @llvm.vp.load.nxv8i16.p0(ptr align 2 [[TMP9]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP6]])
-; IF-EVL-INLOOP-NEXT: [[TMP14:%.*]] = call <vscale x 8 x i32> @llvm.vp.sext.nxv8i32.nxv8i16(<vscale x 8 x i16> [[VP_OP_LOAD]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-INLOOP-NEXT: [[TMP14:%.*]] = sext <vscale x 8 x i16> [[VP_OP_LOAD]] to <vscale x 8 x i32>
; IF-EVL-INLOOP-NEXT: [[TMP10:%.*]] = call i32 @llvm.vp.reduce.add.nxv8i32(i32 0, <vscale x 8 x i32> [[TMP14]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP6]])
; IF-EVL-INLOOP-NEXT: [[TMP11]] = add i32 [[TMP10]], [[VEC_PHI]]
; IF-EVL-INLOOP-NEXT: [[INDEX_EVL_NEXT]] = add nuw i32 [[TMP6]], [[EVL_BASED_IV]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
index 68b36f23de4b053..ba7158eb02d904a 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
@@ -27,10 +27,10 @@ define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr %src) {
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[TMP5]], i32 0
; CHECK-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr align 1 [[TMP6]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i16> @llvm.vp.zext.nxv1i16.nxv1i8(<vscale x 1 x i8> [[VP_OP_LOAD]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT: [[VP_OP:%.*]] = call <vscale x 1 x i16> @llvm.vp.mul.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> [[TMP7]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT: [[VP_OP1:%.*]] = call <vscale x 1 x i16> @llvm.vp.lshr.nxv1i16(<vscale x 1 x i16> [[VP_OP]], <vscale x 1 x i16> trunc (<vscale x 1 x i32> splat (i32 1) to <vscale x 1 x i16>), <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 1 x i8> @llvm.vp.trunc.nxv1i8.nxv1i16(<vscale x 1 x i16> [[VP_OP1]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
+; CHECK-NEXT: [[TMP7:%.*]] = zext <vscale x 1 x i8> [[VP_OP_LOAD]] to <vscale x 1 x i16>
+; CHECK-NEXT: [[TMP12:%.*]] = mul <vscale x 1 x i16> zeroinitializer, [[TMP7]]
+; CHECK-NEXT: [[VP_OP1:%.*]] = lshr <vscale x 1 x i16> [[TMP12]], trunc (<vscale x 1 x i32> splat (i32 1) to <vscale x 1 x i16>)
+; CHECK-NEXT: [[TMP8:%.*]] = trunc <vscale x 1 x i16> [[VP_OP1]] to <vscale x 1 x i8>
; CHECK-NEXT: call void @llvm.vp.scatter.nxv1i8.nxv1p0(<vscale x 1 x i8> [[TMP8]], <vscale x 1 x ptr> align 1 zeroinitializer, <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
; CHECK-NEXT: [[TMP9:%.*]] = zext i32 [[TMP3]] to i64
; CHECK-NEXT: [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP9]], [[EVL_BASED_IV]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
index 48b73c7f1a4deef..03e4b661a941bdc 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
@@ -44,15 +44,15 @@ define void @type_info_cache_clobber(ptr %dstv, ptr %src, i64 %wide.trip.count)
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP12]]
; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i8, ptr [[TMP13]], i32 0
; CHECK-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 8 x i8> @llvm.vp.load.nxv8i8.p0(ptr align 1 [[TMP14]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META0:![0-9]+]]
-; CHECK-NEXT: [[TMP15:%.*]] = call <vscale x 8 x i32> @llvm.vp.zext.nxv8i32.nxv8i8(<vscale x 8 x i8> [[VP_OP_LOAD]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT: [[VP_OP:%.*]] = call <vscale x 8 x i32> @llvm.vp.mul.nxv8i32(<vscale x 8 x i32> [[TMP15]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT: [[VP_OP2:%.*]] = call <vscale x 8 x i32> @llvm.vp.ashr.nxv8i32(<vscale x 8 x i32> [[TMP15]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT: [[VP_OP3:%.*]] = call <vscale x 8 x i32> @llvm.vp.or.nxv8i32(<vscale x 8 x i32> [[VP_OP2]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[TMP15:%.*]] = zext <vscale x 8 x i8> [[VP_OP_LOAD]] to <vscale x 8 x i32>
+; CHECK-NEXT: [[VP_OP:%.*]] = mul <vscale x 8 x i32> [[TMP15]], zeroinitializer
+; CHECK-NEXT: [[TMP23:%.*]] = ashr <vscale x 8 x i32> [[TMP15]], zeroinitializer
+; CHECK-NEXT: [[VP_OP3:%.*]] = or <vscale x 8 x i32> [[TMP23]], zeroinitializer
; CHECK-NEXT: [[TMP16:%.*]] = icmp ult <vscale x 8 x i32> [[TMP15]], zeroinitializer
; CHECK-NEXT: [[TMP17:%.*]] = call <vscale x 8 x i32> @llvm.vp.select.nxv8i32(<vscale x 8 x i1> [[TMP16]], <vscale x 8 x i32> [[VP_OP3]], <vscale x 8 x i32> zeroinitializer, i32 [[TMP11]])
-; CHECK-NEXT: [[TMP18:%.*]] = call <vscale x 8 x i8> @llvm.vp.trunc.nxv8i8.nxv8i32(<vscale x 8 x i32> [[TMP17]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT: call void @llvm.vp.scatter.nxv8i8.nxv8p0(<vscale x 8 x i8> [[TMP18]], <vscale x 8 x ptr> align 1 [[BROADCAST_SPLAT]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
-; CHECK-NEXT: [[TMP19:%.*]] = call <vscale x 8 x i16> @llvm.vp.trunc.nxv8i16.nxv8i32(<vscale x 8 x i32> [[VP_OP]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT: [[TMP24:%.*]] = trunc <vscale x 8 x i32> [[TMP17]] to <vscale x 8 x i8>
+; CHECK-NEXT: call void @llvm.vp.scatter.nxv8i8.nxv8p0(<vscale x 8 x i8> [[TMP24]], <vscale x 8 x ptr> align 1 [[BROADCAST_SPLAT]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
+; CHECK-NEXT: [[TMP19:%.*]] = trunc <vscale x 8 x i32> [[VP_OP]] to <vscale x 8 x i16>
; CHECK-NEXT: call void @llvm.vp.scatter.nxv8i16.nxv8p0(<vscale x 8 x i16> [[TMP19]], <vscale x 8 x ptr> align 2 zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
; CHECK-NEXT: [[TMP20:%.*]] = zext i32 [[TMP11]] to i64
; CHECK-NEXT: [[INDEX_EVL_NEXT]] = add i64 [[TMP20]], [[EVL_BASED_IV]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
index df9ca218aad7031..2c111ff674eaea8 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
@@ -42,7 +42,7 @@ define void @test_and(ptr nocapture %a, ptr nocapture readonly %b) {
; IF-EVL-NEXT: [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[TMP12]]
; IF-EVL-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[TMP13]], i32 0
; IF-EV...
[truncated]
``````````
https://github.com/llvm/llvm-project/pull/126177