[llvm] [RegAlloc][RISCV] Increase the spill weight by target factor (PR #113675)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 25 03:45:12 PDT 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-risc-v
Author: Pengcheng Wang (wangpc-pp)
<details>
<summary>Changes</summary>
Currently, the spill weight is only determined by isDef/isUse and
block frequency. However, for registers with different register
classes, the costs of spilling them are different.
For example, for `LMUL>1` registers (in which, several physical regsiter
compound a bigger logical register), the costs are larger than
`LMUL=1` case (in which, there is only one physical register).
To sovle this problem, a new target hook `getSpillWeightFactor` is
added. Targets can override the default factor (which is 1) according
to the register classes.
For RISC-V, the factors are set to the `RegClassWeight` which is
used to track regsiter pressure. The values of `RegClassWeight`
are the number of register units.
I believe all the targets can benefit from this change, but I will
shrink the range of tests to RISC-V only.
Partially fixes #<!-- -->113489.
---
Patch is 870.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113675.diff
78 Files Affected:
- (modified) llvm/include/llvm/CodeGen/LiveIntervals.h (+2-2)
- (modified) llvm/include/llvm/CodeGen/TargetRegisterInfo.h (+3)
- (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+7-5)
- (modified) llvm/lib/CodeGen/LiveIntervals.cpp (+4-4)
- (modified) llvm/lib/CodeGen/TargetRegisterInfo.cpp (+5)
- (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (+5)
- (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.h (+2)
- (modified) llvm/test/CodeGen/RISCV/rvv/abs-vp.ll (+6-29)
- (modified) llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll (+67-87)
- (modified) llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll (+36-56)
- (modified) llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll (+44-130)
- (modified) llvm/test/CodeGen/RISCV/rvv/compressstore.ll (+7-30)
- (modified) llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll (+57-157)
- (modified) llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll (+74-225)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitreverse-vp.ll (+58-78)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap-vp.ll (+37-57)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll (-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz-vp.ll (+166-472)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctpop-vp.ll (+80-181)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz-vp.ll (+146-452)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-floor-vp.ll (-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+361-358)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-store-fp.ll (+8-96)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-store-int.ll (+8-96)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+76-224)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-rint-vp.ll (-13)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-round-vp.ll (-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundeven-vp.ll (-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundtozero-vp.ll (-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-fp-vp.ll (+20-40)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-int-vp.ll (+18-38)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-trunc-vp.ll (+94-91)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vcopysign-vp.ll (+5-15)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmax-vp.ll (+5-15)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmin-vp.ll (+5-15)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect-vp.ll (+15-25)
- (modified) llvm/test/CodeGen/RISCV/rvv/floor-vp.ll (+44-130)
- (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-sdnode.ll (+4-30)
- (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-vp.ll (+231-282)
- (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-sdnode.ll (+4-30)
- (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-vp.ll (+231-282)
- (modified) llvm/test/CodeGen/RISCV/rvv/fshr-fshl-vp.ll (+67-201)
- (modified) llvm/test/CodeGen/RISCV/rvv/mscatter-sdnode.ll (+9-19)
- (modified) llvm/test/CodeGen/RISCV/rvv/nearbyint-vp.ll (+36-92)
- (modified) llvm/test/CodeGen/RISCV/rvv/rint-vp.ll (+44-123)
- (modified) llvm/test/CodeGen/RISCV/rvv/round-vp.ll (+44-130)
- (modified) llvm/test/CodeGen/RISCV/rvv/roundeven-vp.ll (+44-130)
- (modified) llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll (+44-130)
- (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll (+84-230)
- (modified) llvm/test/CodeGen/RISCV/rvv/setcc-int-vp.ll (+20-40)
- (modified) llvm/test/CodeGen/RISCV/rvv/strided-vpstore.ll (+6-20)
- (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll (+17-19)
- (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll (+40-22)
- (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll (+7-29)
- (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll (+48-96)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfadd-vp.ll (+74-90)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfdiv-vp.ll (+74-90)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll (+918-978)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfmadd-constrained-sdnode.ll (+94-68)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfmadd-sdnode.ll (+147-106)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfmul-vp.ll (+37-45)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfmuladd-vp.ll (+4-4)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll (+60-54)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll (+55-64)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfptosi-vp.ll (+1-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfptoui-vp.ll (+1-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfptrunc-vp.ll (+5-19)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfsub-vp.ll (+74-90)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfwmacc-vp.ll (+5-17)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmacc-vp.ll (+5-17)
- (modified) llvm/test/CodeGen/RISCV/rvv/vfwnmsac-vp.ll (+5-17)
- (modified) llvm/test/CodeGen/RISCV/rvv/vpscatter-sdnode.ll (+14-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/vpstore.ll (+6-20)
- (modified) llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll (-1)
- (modified) llvm/test/CodeGen/RISCV/rvv/vselect-vp.ll (+27-101)
- (modified) llvm/test/CodeGen/RISCV/rvv/vsitofp-vp.ll (+1-14)
- (modified) llvm/test/CodeGen/RISCV/rvv/vtrunc-vp.ll (+4-5)
- (modified) llvm/test/CodeGen/RISCV/rvv/vuitofp-vp.ll (+1-14)
``````````diff
diff --git a/llvm/include/llvm/CodeGen/LiveIntervals.h b/llvm/include/llvm/CodeGen/LiveIntervals.h
index 161bb247a0e968..a58ba178ac8484 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervals.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervals.h
@@ -117,14 +117,14 @@ class LiveIntervals {
/// If \p PSI is provided the calculation is altered for optsize functions.
static float getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
- const MachineInstr &MI,
+ const MachineInstr &MI, unsigned Factor = 1,
ProfileSummaryInfo *PSI = nullptr);
/// Calculate the spill weight to assign to a single instruction.
/// If \p PSI is provided the calculation is altered for optsize functions.
static float getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
- const MachineBasicBlock *MBB,
+ const MachineBasicBlock *MBB, unsigned Factor = 1,
ProfileSummaryInfo *PSI = nullptr);
LiveInterval &getInterval(Register Reg) {
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 292fa3c94969be..8726d2e33dbc83 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -926,6 +926,9 @@ class TargetRegisterInfo : public MCRegisterInfo {
/// Returns a -1 terminated array of pressure set IDs.
virtual const int *getRegUnitPressureSets(unsigned RegUnit) const = 0;
+ /// Get the factor of spill weight for this register class.
+ virtual unsigned getSpillWeightFactor(const TargetRegisterClass *RC) const;
+
/// Get a list of 'hint' registers that the register allocator should try
/// first when allocating a physical register for the virtual register
/// VirtReg. These registers are effectively moved to the front of the
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index f361c956092e88..8c3ab0d1e43a89 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -189,6 +189,7 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// Do not update future local split artifacts.
bool ShouldUpdateLI = !IsLocalSplitArtifact;
+ unsigned Factor = TRI.getSpillWeightFactor(MRI.getRegClass(LI.reg()));
if (IsLocalSplitArtifact) {
MachineBasicBlock *LocalMBB = LIS.getMBBFromIndex(*End);
assert(LocalMBB == LIS.getMBBFromIndex(*Start) &&
@@ -199,10 +200,10 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// localLI = COPY other
// ...
// other = COPY localLI
- TotalWeight +=
- LiveIntervals::getSpillWeight(true, false, &MBFI, LocalMBB, PSI);
- TotalWeight +=
- LiveIntervals::getSpillWeight(false, true, &MBFI, LocalMBB, PSI);
+ TotalWeight += LiveIntervals::getSpillWeight(true, false, &MBFI, LocalMBB,
+ Factor, PSI);
+ TotalWeight += LiveIntervals::getSpillWeight(false, true, &MBFI, LocalMBB,
+ Factor, PSI);
NumInstr += 2;
}
@@ -274,7 +275,8 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// Calculate instr weight.
bool Reads, Writes;
std::tie(Reads, Writes) = MI->readsWritesVirtualRegister(LI.reg());
- Weight = LiveIntervals::getSpillWeight(Writes, Reads, &MBFI, *MI, PSI);
+ Weight =
+ LiveIntervals::getSpillWeight(Writes, Reads, &MBFI, *MI, Factor, PSI);
// Give extra weight to what looks like a loop induction variable update.
if (Writes && IsExiting && LIS.isLiveOutOfMBB(LI, MBB))
diff --git a/llvm/lib/CodeGen/LiveIntervals.cpp b/llvm/lib/CodeGen/LiveIntervals.cpp
index 21a316cf99a217..48f4538122be3e 100644
--- a/llvm/lib/CodeGen/LiveIntervals.cpp
+++ b/llvm/lib/CodeGen/LiveIntervals.cpp
@@ -877,15 +877,15 @@ LiveIntervals::hasPHIKill(const LiveInterval &LI, const VNInfo *VNI) const {
float LiveIntervals::getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
- const MachineInstr &MI,
+ const MachineInstr &MI, unsigned Factor,
ProfileSummaryInfo *PSI) {
- return getSpillWeight(isDef, isUse, MBFI, MI.getParent(), PSI);
+ return getSpillWeight(isDef, isUse, MBFI, MI.getParent(), Factor, PSI);
}
float LiveIntervals::getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
const MachineBasicBlock *MBB,
- ProfileSummaryInfo *PSI) {
+ unsigned Factor, ProfileSummaryInfo *PSI) {
float Weight = isDef + isUse;
const auto *MF = MBB->getParent();
// When optimizing for size we only consider the codesize impact of spilling
@@ -893,7 +893,7 @@ float LiveIntervals::getSpillWeight(bool isDef, bool isUse,
if (PSI && (MF->getFunction().hasOptSize() ||
llvm::shouldOptimizeForSize(MF, PSI, MBFI)))
return Weight;
- return Weight * MBFI->getBlockFreqRelativeToEntryBlock(MBB);
+ return Weight * MBFI->getBlockFreqRelativeToEntryBlock(MBB) * Factor;
}
LiveRange::Segment
diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index ac9a3d6f0d1a60..d1f02489db62cb 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -415,6 +415,11 @@ bool TargetRegisterInfo::shouldRewriteCopySrc(const TargetRegisterClass *DefRC,
return shareSameRegisterFile(*this, DefRC, DefSubReg, SrcRC, SrcSubReg);
}
+unsigned
+TargetRegisterInfo::getSpillWeightFactor(const TargetRegisterClass *RC) const {
+ return 1;
+}
+
// Compute target-independent register allocator hints to help eliminate copies.
bool TargetRegisterInfo::getRegAllocationHints(
Register VirtReg, ArrayRef<MCPhysReg> Order,
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index 26195ef721db39..884a62c3e70679 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -803,6 +803,11 @@ RISCVRegisterInfo::getRegisterCostTableIndex(const MachineFunction &MF) const {
: 0;
}
+unsigned
+RISCVRegisterInfo::getSpillWeightFactor(const TargetRegisterClass *RC) const {
+ return getRegClassWeight(RC).RegWeight;
+}
+
// Add two address hints to improve chances of being able to use a compressed
// instruction.
bool RISCVRegisterInfo::getRegAllocationHints(
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.h b/llvm/lib/Target/RISCV/RISCVRegisterInfo.h
index 6ddb1eb9c14d5e..51e7f9d3b0cc1d 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.h
@@ -127,6 +127,8 @@ struct RISCVRegisterInfo : public RISCVGenRegisterInfo {
unsigned getRegisterCostTableIndex(const MachineFunction &MF) const override;
+ unsigned getSpillWeightFactor(const TargetRegisterClass *RC) const override;
+
bool getRegAllocationHints(Register VirtReg, ArrayRef<MCPhysReg> Order,
SmallVectorImpl<MCPhysReg> &Hints,
const MachineFunction &MF, const VirtRegMap *VRM,
diff --git a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
index cd2208e31eb6d3..b37454b3b24434 100644
--- a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
@@ -561,18 +561,7 @@ declare <vscale x 16 x i64> @llvm.vp.abs.nxv16i64(<vscale x 16 x i64>, i1 immarg
define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_abs_nxv16i64:
; CHECK: # %bb.0:
-; CHECK-NEXT: addi sp, sp, -16
-; CHECK-NEXT: .cfi_def_cfa_offset 16
-; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 4
-; CHECK-NEXT: sub sp, sp, a1
-; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT: vmv1r.v v24, v0
-; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 3
-; CHECK-NEXT: add a1, sp, a1
-; CHECK-NEXT: addi a1, a1, 16
-; CHECK-NEXT: vs8r.v v8, (a1) # Unknown-size Folded Spill
+; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: srli a2, a1, 3
; CHECK-NEXT: vsetvli a3, zero, e8, mf4, ta, ma
@@ -582,28 +571,16 @@ define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 1
; CHECK-NEXT: addi a3, a3, -1
; CHECK-NEXT: and a2, a3, a2
; CHECK-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v16, 0, v0.t
-; CHECK-NEXT: vmax.vv v8, v16, v8, v0.t
-; CHECK-NEXT: addi a2, sp, 16
-; CHECK-NEXT: vs8r.v v8, (a2) # Unknown-size Folded Spill
+; CHECK-NEXT: vrsub.vi v24, v16, 0, v0.t
+; CHECK-NEXT: vmax.vv v16, v16, v24, v0.t
; CHECK-NEXT: bltu a0, a1, .LBB46_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: mv a0, a1
; CHECK-NEXT: .LBB46_2:
-; CHECK-NEXT: vmv1r.v v0, v24
-; CHECK-NEXT: slli a1, a1, 3
-; CHECK-NEXT: add a1, sp, a1
-; CHECK-NEXT: addi a1, a1, 16
-; CHECK-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
+; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; CHECK-NEXT: vrsub.vi v16, v8, 0, v0.t
-; CHECK-NEXT: vmax.vv v8, v8, v16, v0.t
-; CHECK-NEXT: addi a0, sp, 16
-; CHECK-NEXT: vl8r.v v16, (a0) # Unknown-size Folded Reload
-; CHECK-NEXT: csrr a0, vlenb
-; CHECK-NEXT: slli a0, a0, 4
-; CHECK-NEXT: add sp, sp, a0
-; CHECK-NEXT: addi sp, sp, 16
+; CHECK-NEXT: vrsub.vi v24, v8, 0, v0.t
+; CHECK-NEXT: vmax.vv v8, v8, v24, v0.t
; CHECK-NEXT: ret
%v = call <vscale x 16 x i64> @llvm.vp.abs.nxv16i64(<vscale x 16 x i64> %va, i1 false, <vscale x 16 x i1> %m, i32 %evl)
ret <vscale x 16 x i64> %v
diff --git a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
index afce04d107e728..94bc2851a6bf40 100644
--- a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
@@ -2307,7 +2307,7 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vsll.vx v24, v24, a3, v0.t
; RV32-NEXT: vor.vv v16, v16, v24, v0.t
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 4
+; RV32-NEXT: slli a4, a4, 3
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
@@ -2315,28 +2315,30 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vsetvli a5, zero, e64, m8, ta, ma
; RV32-NEXT: vlse64.v v16, (a4), zero
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 3
+; RV32-NEXT: slli a4, a4, 4
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
; RV32-NEXT: lui a4, 4080
; RV32-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT: vand.vx v24, v8, a4, v0.t
-; RV32-NEXT: vsll.vi v24, v24, 24, v0.t
-; RV32-NEXT: addi a5, sp, 16
-; RV32-NEXT: vs8r.v v24, (a5) # Unknown-size Folded Spill
-; RV32-NEXT: vand.vv v24, v8, v16, v0.t
-; RV32-NEXT: vsll.vi v16, v24, 8, v0.t
-; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
-; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: vand.vx v16, v8, a4, v0.t
+; RV32-NEXT: vsll.vi v24, v16, 24, v0.t
; RV32-NEXT: csrr a5, vlenb
; RV32-NEXT: slli a5, a5, 4
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
+; RV32-NEXT: vl8r.v v16, (a5) # Unknown-size Folded Reload
+; RV32-NEXT: vand.vv v16, v8, v16, v0.t
+; RV32-NEXT: vsll.vi v16, v16, 8, v0.t
+; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: csrr a5, vlenb
+; RV32-NEXT: slli a5, a5, 3
+; RV32-NEXT: add a5, sp, a5
+; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v16, v24, v16, v0.t
; RV32-NEXT: csrr a5, vlenb
-; RV32-NEXT: slli a5, a5, 4
+; RV32-NEXT: slli a5, a5, 3
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vs8r.v v16, (a5) # Unknown-size Folded Spill
@@ -2350,7 +2352,7 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vand.vx v24, v24, a4, v0.t
; RV32-NEXT: vsrl.vi v8, v8, 8, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 3
+; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -2360,7 +2362,7 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v8, v8, v16, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 4
+; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -2668,7 +2670,7 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vsll.vx v24, v24, a3, v0.t
; RV32-NEXT: vor.vv v16, v16, v24, v0.t
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 4
+; RV32-NEXT: slli a4, a4, 3
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
@@ -2676,28 +2678,30 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vsetvli a5, zero, e64, m8, ta, ma
; RV32-NEXT: vlse64.v v16, (a4), zero
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 3
+; RV32-NEXT: slli a4, a4, 4
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
; RV32-NEXT: lui a4, 4080
; RV32-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT: vand.vx v24, v8, a4, v0.t
-; RV32-NEXT: vsll.vi v24, v24, 24, v0.t
-; RV32-NEXT: addi a5, sp, 16
-; RV32-NEXT: vs8r.v v24, (a5) # Unknown-size Folded Spill
-; RV32-NEXT: vand.vv v24, v8, v16, v0.t
-; RV32-NEXT: vsll.vi v16, v24, 8, v0.t
-; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
-; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: vand.vx v16, v8, a4, v0.t
+; RV32-NEXT: vsll.vi v24, v16, 24, v0.t
; RV32-NEXT: csrr a5, vlenb
; RV32-NEXT: slli a5, a5, 4
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
+; RV32-NEXT: vl8r.v v16, (a5) # Unknown-size Folded Reload
+; RV32-NEXT: vand.vv v16, v8, v16, v0.t
+; RV32-NEXT: vsll.vi v16, v16, 8, v0.t
+; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: csrr a5, vlenb
+; RV32-NEXT: slli a5, a5, 3
+; RV32-NEXT: add a5, sp, a5
+; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v16, v24, v16, v0.t
; RV32-NEXT: csrr a5, vlenb
-; RV32-NEXT: slli a5, a5, 4
+; RV32-NEXT: slli a5, a5, 3
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vs8r.v v16, (a5) # Unknown-size Folded Spill
@@ -2711,7 +2715,7 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vand.vx v24, v24, a4, v0.t
; RV32-NEXT: vsrl.vi v8, v8, 8, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 3
+; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -2721,7 +2725,7 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v8, v8, v16, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 4
+; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -3010,89 +3014,65 @@ declare <vscale x 64 x i16> @llvm.vp.bitreverse.nxv64i16(<vscale x 64 x i16>, <v
define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vscale x 64 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_bitreverse_nxv64i16:
; CHECK: # %bb.0:
-; CHECK-NEXT: addi sp, sp, -16
-; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 4
-; CHECK-NEXT: sub sp, sp, a1
-; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT: vmv1r.v v24, v0
-; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 3
-; CHECK-NEXT: add a1, sp, a1
-; CHECK-NEXT: addi a1, a1, 16
-; CHECK-NEXT: vs8r.v v8, (a1) # Unknown-size Folded Spill
-; CHECK-NEXT: csrr a2, vlenb
-; CHECK-NEXT: srli a1, a2, 1
+; CHECK-NEXT: srli a2, a1, 1
; CHECK-NEXT: vsetvli a3, zero, e8, m1, ta, ma
-; CHECK-NEXT: vslidedown.vx v0, v0, a1
-; CHECK-NEXT: slli a2, a2, 2
-; CHECK-NEXT: sub a1, a0, a2
-; CHECK-NEXT: sltu a3, a0, a1
+; CHECK-NEXT: vslidedown.vx v0, v0, a2
+; CHECK-NEXT: slli a1, a1, 2
+; CHECK-NEXT: sub a2, a0, a1
+; CHECK-NEXT: sltu a3, a0, a2
; CHECK-NEXT: addi a3, a3, -1
-; CHECK-NEXT: and a1, a3, a1
-; CHECK-NEXT: vsetvli zero, a1, e16, m8, ta, ma
-; CHECK-NEXT: vsrl.vi v8, v16, 8, v0.t
+; CHECK-NEXT: and a2, a3, a2
+; CHECK-NEXT: vsetvli zero, a2, e16, m8, ta, ma
+; CHECK-NEXT: vsrl.vi v24, v16, 8, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 8, v0.t
-; CHECK-NEXT: vor.vv v16, v16, v8, v0.t
-; CHECK-NEXT: vsrl.vi v8, v16, 4, v0.t
-; CHECK-NEXT: lui a1, 1
-; CHECK-NEXT: addi a1, a1, -241
-; CHECK-NEXT: vand.vx v8, v8, a1, v0.t
-; CHECK-NEXT: vand.vx v16, v16, a1, v0.t
+; CHECK-NEXT: vor.vv v16, v16, v24, v0.t
+; CHECK-NEXT: vsrl.vi v24, v16, 4, v0.t
+; CHECK-NEXT: lui a2, 1
+; CHECK-NEXT: addi a2, a2, -241
+; CHECK-NEXT: vand.vx v24, v24, a2, v0.t
+; CHECK-NEXT: vand.vx v16, v16, a2, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 4, v0.t
-; CHECK-NEXT: vor.vv v16, v8, v16, v0.t
-; CHECK-NEXT: vsrl.vi v8, v16, 2, v0.t
+; CHECK-NEXT: vor.vv v16, v24, v16, v0.t
+; CHECK-NEXT: vsrl.vi v24, v16, 2, v0.t
; CHECK-NEXT: lui a3, 3
; CHECK-NEXT: addi a3, a3, 819
-; CHECK-NEXT: vand.vx v8, v8, a3, v0.t
+; CHECK-NEXT: vand.vx v24, v24, a3, v0.t
; CHECK-NEXT: vand.vx v16, v16, a3, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 2, v0.t
-; CHECK-NEXT: vor.vv v16, v8, v16, v0.t
-; CHECK-NEXT: vsrl.vi v8, v16, 1, v0.t
+; CHECK-NEXT: vor.vv v16, v24, v16, v0.t
+; CHECK-NEXT: vsrl.vi v24, v16, 1, v0.t
; CHECK-NEXT: lui a4, 5
; CHECK-NEXT: addi a4, a4, 1365
-; CHECK-NEXT: vand.vx v8, v8, a4, v0.t
+; CHECK-NEXT: vand.vx v24, v24, a4, v0.t
; CHECK-NEXT: vand.vx v16, v16, a4, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 1, v0.t
-; CHECK-NEXT: vor.vv v8, v8, v16, v0.t
-; CHECK-NEXT: addi a5, sp, 16
-; CHECK-NEXT: vs8r.v v8, (a5) # Unknown-size Folded Spill
-; CHECK-NEXT: bltu a0, a2, .LBB46_2
+; CHECK-NEXT: vor.vv v16, v24, v16, v0.t
+; CHECK-NEXT: bltu a0, a1, .LBB46_2
; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: mv a0, a2
+; CHECK-NEXT: mv a0, a1
; CHECK-NEXT: .LBB46_2:
-; CHECK-NEXT: vmv1r.v v0, v24
-; CHECK-NEXT: csrr a2, vlenb
-; CHECK-NEXT: slli a2, a2, 3
-; CHECK-NEXT: add a2, sp, a2
-; CHECK-NEXT: addi a2, a2, 16
-; CHECK-NEXT: vl8r.v v8, (a2) # Unknown-size Folded Reload
+; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vsetvli zero, a0, e16, m8, ta, ma
-; CHECK-NEXT: vsrl.vi v16, v8, 8, v0.t
+; CHECK-NEXT: vsrl.vi v24, v8, 8, v0.t
; CHECK-NEXT: vsll.vi v8, v8, 8, v0.t
-; CHECK-NEXT: vor.vv v8, v8, v16, v0.t
-; CHECK-NEXT: vsrl.vi v16, v8, 4, v0.t...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/113675
More information about the llvm-commits
mailing list