[llvm] [RISCV] Collect shuffle mask for the lane not by createSequentialMask (PR #129830)
Jim Lin via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 4 21:20:42 PST 2025
https://github.com/tclin914 created https://github.com/llvm/llvm-project/pull/129830
Given the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4, the per-lane shuffle masks should be <1, 2> for lane 0, <u, u> for lane 1, and so on. Because we use createSequentialMask to build each lane's mask, the mask for lane 1 instead becomes <u, 0> (derived from <u, u+1>). This leads to poor code generation.
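The difference between the two ways of building a lane mask can be sketched outside of LLVM. The following is a minimal standalone model, not the code in RISCVISelLowering.cpp: `Undef`, `sequentialMask`, and `laneMask` are hypothetical names introduced for illustration, with -1 standing in for an undefined mask element. `sequentialMask` only imitates the effect of calling createSequentialMask with a start index and no trailing undefs, and `laneMask` mirrors the strided collection loop the patch adds.

```cpp
#include <cassert>
#include <vector>

// -1 models an undefined ("u") shuffle mask element (assumption for this sketch).
constexpr int Undef = -1;

// Hypothetical stand-in for the old behaviour: build Start, Start+1, ...
// from the first mask element of the lane only.
std::vector<int> sequentialMask(int Start, unsigned NumElts) {
  std::vector<int> Result;
  for (unsigned J = 0; J < NumElts; ++J)
    Result.push_back(Start + static_cast<int>(J));
  return Result;
}

// Hypothetical stand-in for the new behaviour: pick every Factor-th element
// of the interleaved mask, starting at index Lane.
std::vector<int> laneMask(const std::vector<int> &Mask, unsigned Factor,
                          unsigned Lane) {
  assert(Factor != 0 && Mask.size() % Factor == 0 && "mask must split evenly");
  assert(Lane < Factor && "lane index out of range");
  std::vector<int> Result;
  for (unsigned J = 0; J < Mask.size() / Factor; ++J)
    Result.push_back(Mask[Lane + Factor * J]);
  return Result;
}
```

For the mask <1, u, u, u, 2, u, u, u> with factor 4, `laneMask` yields <1, 2> for lane 0 and <u, u> for lane 1, whereas `sequentialMask` seeded with the undefined first element of lane 1 yields <u, 0>, which makes the lane look partially defined.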
From cd71fa5952808c86a8ef70745df90c96672a7621 Mon Sep 17 00:00:00 2001
From: Jim Lin <jim at andestech.com>
Date: Tue, 4 Mar 2025 19:12:48 +0800
Subject: [PATCH] [RISCV] Collect shuffle mask for the lane not by
createSequentialMask
Given the shuffle mask <1, u, u, u, 2, u, u, u> with factor 4, the per-lane
shuffle masks should be <1, 2> for lane 0, <u, u> for lane 1, and so on.
Because we use createSequentialMask to build each lane's mask, the mask for
lane 1 instead becomes <u, 0> (derived from <u, u+1>). This leads to poor
code generation.
---
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 9 ++++++++-
.../RISCV/rvv/fixed-vectors-interleaved-access.ll | 8 ++------
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4e6b3a224b79b..54206aba01e05 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -23056,12 +23056,19 @@ bool RISCVTargetLowering::lowerInterleavedStore(StoreInst *SI,
{VTy, SI->getPointerOperandType(), XLenTy});
SmallVector<Value *, 10> Ops;
+ SmallVector<int, 16> NewShuffleMask;
for (unsigned i = 0; i < Factor; i++) {
+ // Collect shuffle mask for this lane.
+ for (unsigned j = 0; j < VTy->getNumElements(); j++)
+ NewShuffleMask.push_back(Mask[i + Factor * j]);
+
Value *Shuffle = Builder.CreateShuffleVector(
SVI->getOperand(0), SVI->getOperand(1),
- createSequentialMask(Mask[i], VTy->getNumElements(), 0));
+ NewShuffleMask);
Ops.push_back(Shuffle);
+
+ NewShuffleMask.clear();
}
// This VL should be OK (should be executable in one vsseg instruction,
// potentially under larger LMULs) because we checked that the fixed vector
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
index 4200837227899..7cc8c0c3f2d89 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
@@ -1394,16 +1394,12 @@ define void @store_factor4_one_active_fullwidth(ptr %ptr, <16 x i32> %v) {
ret void
}
-; TODO: This could be a vslidedown followed by a strided store
define void @store_factor4_one_active_slidedown(ptr %ptr, <4 x i32> %v) {
; CHECK-LABEL: store_factor4_one_active_slidedown:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT: vslidedown.vi v9, v8, 1
-; CHECK-NEXT: vslideup.vi v10, v8, 1
-; CHECK-NEXT: vmv.v.v v11, v10
-; CHECK-NEXT: vmv.v.v v12, v10
-; CHECK-NEXT: vsseg4e32.v v9, (a0)
+; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: vsseg4e32.v v8, (a0)
; CHECK-NEXT: ret
%v0 = shufflevector <4 x i32> %v, <4 x i32> poison, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 2, i32 undef, i32 undef, i32 undef, i32 3, i32 undef, i32 undef, i32 undef, i32 4, i32 undef, i32 undef, i32 undef>
store <16 x i32> %v0, ptr %ptr