[llvm] [IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (PR #174693)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 6 23:52:00 PST 2026
https://github.com/lukel97 updated https://github.com/llvm/llvm-project/pull/174693
From 95d5b9a1bf18600216aa1c58e73dfc9798bc03d3 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Tue, 6 Jan 2026 21:25:48 +0800
Subject: [PATCH 1/3] Allow non-constant splice offsets
---
llvm/docs/LangRef.rst | 44 +++++----
.../llvm/Analysis/TargetTransformInfo.h | 1 -
llvm/include/llvm/CodeGen/BasicTTIImpl.h | 5 +-
llvm/include/llvm/CodeGen/ISDOpcodes.h | 9 +-
llvm/include/llvm/IR/Intrinsics.td | 4 +-
llvm/lib/Analysis/InstructionSimplify.cpp | 24 +++++
.../SelectionDAG/SelectionDAGBuilder.cpp | 17 ++--
.../CodeGen/SelectionDAG/TargetLowering.cpp | 42 ++++-----
llvm/lib/IR/Verifier.cpp | 27 ------
.../Target/AArch64/AArch64ISelLowering.cpp | 2 +
.../test/Analysis/CostModel/AArch64/splice.ll | 7 +-
.../CostModel/AArch64/sve-intrinsics.ll | 11 ++-
llvm/test/Analysis/CostModel/RISCV/splice.ll | 9 +-
.../AArch64/named-vector-shuffles-neon.ll | 37 ++++++++
.../AArch64/named-vector-shuffles-sve.ll | 87 +++++++++++++----
llvm/test/CodeGen/RISCV/rvv/vector-splice.ll | 80 ++++++++++++++++
.../Transforms/InstSimplify/vector-splice.ll | 94 +++++++++++++++++++
llvm/test/Verifier/invalid-splice.ll | 37 --------
18 files changed, 390 insertions(+), 147 deletions(-)
create mode 100644 llvm/test/Transforms/InstSimplify/vector-splice.ll
delete mode 100644 llvm/test/Verifier/invalid-splice.ll
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 041a526b6729f..2b95163f96eb6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20822,20 +20822,20 @@ This is an overloaded intrinsic.
::
- declare <2 x double> @llvm.vector.splice.left.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
- declare <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+ declare <2 x double> @llvm.vector.splice.left.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %offset)
+ declare <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %offset)
Overview:
"""""""""
The '``llvm.vector.splice.left.*``' intrinsics construct a vector by
-concatenating two vectors together, shifting the elements left by ``imm``, and
-extracting the lower half.
+concatenating two vectors together, shifting the elements left by ``offset``,
+and extracting the lower half.
These intrinsics work for both fixed and scalable vectors. While these
intrinsics support all vector types, the recommended way to express this operation for
-fixed-width vectors is still to use a shufflevector, as that may allow for more
-optimization opportunities.
+fixed-width vectors with an immediate offset is still to use a shufflevector, as
+that may allow for more optimization opportunities.
For example:
@@ -20849,11 +20849,13 @@ For example:
Arguments:
""""""""""
+The first two operands are vectors with the same type. ``offset`` is an unsigned
+scalar i32 that determines how many elements to shift left by.
-The first two operands are vectors with the same type. For a fixed-width vector
-<N x eltty>, imm is an unsigned integer constant in the range 0 <= imm < N. For
-a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
-the range 0 <= imm < X where X=vscale_range_min * N.
+Semantics:
+""""""""""
+For a vector type with a runtime element count of N, if ``offset`` > N then the
+result is a :ref:`poison value <poisonvalues>`.
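The left-splice semantics documented above can be sketched as a small reference model (illustrative Python, not part of the patch; `"POISON"` is a stand-in marker, whereas in IR the whole result value becomes poison):

```python
# Reference model of @llvm.vector.splice.left for fixed-width vectors.
def splice_left(a, b, offset):
    n = len(a)
    assert len(b) == n
    if offset > n:
        return "POISON"     # offset out of range -> poison result
    concat = a + b          # CONCAT_VECTORS(a, b), 2*n elements
    # Shift left by `offset` and keep the lower half (first n elements).
    return concat[offset:offset + n]

# splice.left(<1,2,3,4>, <5,6,7,8>, 2) == <3,4,5,6>
print(splice_left([1, 2, 3, 4], [5, 6, 7, 8], 2))  # [3, 4, 5, 6]
```

Note that `offset == 0` yields the first operand unchanged, which is exactly the fold added to InstSimplify in this patch.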
'``llvm.vector.splice.right``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -20864,20 +20866,20 @@ This is an overloaded intrinsic.
::
- declare <2 x double> @llvm.vector.splice.right.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
- declare <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+ declare <2 x double> @llvm.vector.splice.right.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %offset)
+ declare <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %offset)
Overview:
"""""""""
The '``llvm.vector.splice.right.*``' intrinsics construct a vector by
-concatenating two vectors together, shifting the elements right by ``imm``, and
-extracting the upper half.
+concatenating two vectors together, shifting the elements right by ``offset``,
+and extracting the upper half.
These intrinsics work for both fixed and scalable vectors. While these
intrinsics support all vector types, the recommended way to express this operation for
-fixed-width vectors is still to use a shufflevector, as that may allow for more
-optimization opportunities.
+fixed-width vectors with an immediate offset is still to use a shufflevector, as
+that may allow for more optimization opportunities.
For example:
@@ -20891,11 +20893,13 @@ For example:
Arguments:
""""""""""
+The first two operands are vectors with the same type. ``offset`` is an unsigned
+scalar i32 that determines how many elements to shift right by.
-The first two operands are vectors with the same type. For a fixed-width vector
-<N x eltty>, imm is an unsigned integer constant in the range 0 <= imm <= N. For
-a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
-the range 0 <= imm <= X where X=vscale_range_min * N.
+Semantics:
+""""""""""
+For a vector type with a runtime element count of N, if ``offset`` > N then the
+result is a :ref:`poison value <poisonvalues>`.
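The right-splice semantics mirror the left variant but keep the upper half (illustrative Python model, not part of the patch; `"POISON"` again stands in for a poison result):

```python
# Reference model of @llvm.vector.splice.right for fixed-width vectors.
def splice_right(a, b, offset):
    n = len(a)
    assert len(b) == n
    if offset > n:
        return "POISON"     # offset out of range -> poison result
    concat = a + b          # CONCAT_VECTORS(a, b), 2*n elements
    # Shift right by `offset` and keep the upper half (last n elements).
    return concat[n - offset:2 * n - offset]

# splice.right(<1,2,3,4>, <5,6,7,8>, 1) == <4,5,6,7>
print(splice_right([1, 2, 3, 4], [5, 6, 7, 8], 1))  # [4, 5, 6, 7]
```

Here `offset == 0` yields the second operand, matching the `splice.right(a, b, 0) --> b` fold below.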
'``llvm.stepvector``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 8b06b4aae26ce..5a4eb8daf0af6 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1210,7 +1210,6 @@ class TargetTransformInfo {
///< with any shuffle mask.
SK_PermuteSingleSrc, ///< Shuffle elements of single source vector with any
///< shuffle mask.
- // TODO: Split into SK_SpliceLeft + SK_SpliceRight
SK_Splice ///< Concatenates elements from the first input vector
///< with elements of the second input vector. Returning
///< a vector of the same type as the input vectors.
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index ef91c845ce9e7..c430e11168f73 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2001,7 +2001,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
case Intrinsic::vector_splice_left:
case Intrinsic::vector_splice_right: {
- unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
+ auto *COffset = dyn_cast<ConstantInt>(Args[2]);
+ if (!COffset)
+ break;
+ unsigned Index = COffset->getZExtValue();
return thisT()->getShuffleCost(
TTI::SK_Splice, cast<VectorType>(RetTy),
cast<VectorType>(Args[0]->getType()), {}, CostKind,
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index ea7b21b6f6448..a7325f500f0ef 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -643,11 +643,12 @@ enum NodeType {
/// in terms of the element size of VEC1/VEC2, not in terms of bytes.
VECTOR_SHUFFLE,
- /// VECTOR_SPLICE_LEFT(VEC1, VEC2, IMM) - Shifts CONCAT_VECTORS(VEC1, VEC2)
- /// left by IMM elements and returns the lower half.
+ /// VECTOR_SPLICE_LEFT(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1, VEC2)
+ /// left by OFFSET elements and returns the lower half.
VECTOR_SPLICE_LEFT,
- /// VECTOR_SPLICE_RIGHT(VEC1, VEC2, IMM) - Shifts CONCAT_VECTORS(VEC1, VEC2)
- /// right by IMM elements and returns the upper half.
+ /// VECTOR_SPLICE_RIGHT(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1,
+ /// VEC2) right by OFFSET elements and returns the upper half.
VECTOR_SPLICE_RIGHT,
/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index c56b0185b4f1e..c5be0c4e0ebf8 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2832,12 +2832,12 @@ def int_vector_reverse : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
def int_vector_splice_left
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
- [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+ [IntrNoMem, IntrSpeculatable]>;
def int_vector_splice_right
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
- [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+ [IntrNoMem, IntrSpeculatable]>;
//===---------- Intrinsics to query properties of scalable vectors --------===//
def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty],
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 5af4b299cb60c..bff666ce236a5 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -7258,6 +7258,30 @@ static Value *simplifyIntrinsic(CallBase *Call, Value *Callee,
return nullptr;
}
+ case Intrinsic::vector_splice_left:
+ case Intrinsic::vector_splice_right: {
+ Value *Offset = Args[2];
+ auto *Ty = cast<VectorType>(F->getReturnType());
+ if (Q.isUndefValue(Offset))
+ return PoisonValue::get(Ty);
+
+ unsigned BitWidth = Offset->getType()->getScalarSizeInBits();
+ ConstantRange NumElts(
+ APInt(BitWidth, Ty->getElementCount().getKnownMinValue()));
+ if (Ty->isScalableTy())
+ NumElts = NumElts.multiply(getVScaleRange(Call->getFunction(), BitWidth));
+
+ // If we know Offset > NumElts, simplify to poison.
+ ConstantRange CR = computeConstantRangeIncludingKnownBits(Offset, false, Q);
+ if (CR.getUnsignedMin().ugt(NumElts.getUnsignedMax()))
+ return PoisonValue::get(Ty);
+
+ // splice.left(a, b, 0) --> a, splice.right(a, b, 0) --> b
+ if (CR.isSingleElement() && CR.getSingleElement()->isZero())
+ return IID == Intrinsic::vector_splice_left ? Args[0] : Args[1];
+
+ return nullptr;
+ }
case Intrinsic::experimental_constrained_fadd: {
auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
return simplifyFAddInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
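The InstSimplify hunk above amounts to three rules, which can be sketched as follows (hypothetical Python model; the real code reasons with `ConstantRange` and known bits over the possible offset values, not exact integers):

```python
POISON = "poison"

def simplify_splice(is_left, a, b, offset, max_elts):
    # Rule 1: an undef/poison offset makes the whole result poison.
    if offset is None:
        return POISON
    # Rule 2: an offset known to exceed the maximum possible element
    # count (vscale_range max * min elements, for scalable types)
    # makes the result poison.
    if offset > max_elts:
        return POISON
    # Rule 3: splice.left(a, b, 0) --> a, splice.right(a, b, 0) --> b.
    if offset == 0:
        return a if is_left else b
    return None  # no simplification applies
```

Rule 3 follows directly from the concatenate-shift-extract definition: shifting by zero leaves the lower half equal to the first operand and the upper half equal to the second.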
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index bff4799963fc2..745626488ccf0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -12850,19 +12850,18 @@ void SelectionDAGBuilder::visitVectorSplice(const CallInst &I) {
SDLoc DL = getCurSDLoc();
SDValue V1 = getValue(I.getOperand(0));
SDValue V2 = getValue(I.getOperand(1));
- uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getZExtValue();
const bool IsLeft = I.getIntrinsicID() == Intrinsic::vector_splice_left;
- // VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
- if (VT.isScalableVector()) {
- setValue(
- &I,
- DAG.getNode(
- IsLeft ? ISD::VECTOR_SPLICE_LEFT : ISD::VECTOR_SPLICE_RIGHT, DL, VT,
- V1, V2,
- DAG.getConstant(Imm, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))));
+ // VECTOR_SHUFFLE doesn't support a scalable or non-constant mask.
+ if (VT.isScalableVector() || !isa<ConstantInt>(I.getOperand(2))) {
+ SDValue Offset = DAG.getZExtOrTrunc(
+ getValue(I.getOperand(2)), DL, TLI.getVectorIdxTy(DAG.getDataLayout()));
+ setValue(&I, DAG.getNode(IsLeft ? ISD::VECTOR_SPLICE_LEFT
+ : ISD::VECTOR_SPLICE_RIGHT,
+ DL, VT, V1, V2, Offset));
return;
}
+ uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getZExtValue();
unsigned NumElts = VT.getVectorNumElements();
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 8968907dcf5af..cf0a13473bc56 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -11992,24 +11992,25 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
assert((Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT ||
Node->getOpcode() == ISD::VECTOR_SPLICE_RIGHT) &&
"Unexpected opcode!");
- assert(Node->getValueType(0).isScalableVector() &&
- "Fixed length vector types expected to use SHUFFLE_VECTOR!");
+ assert((Node->getValueType(0).isScalableVector() ||
+ !isa<ConstantSDNode>(Node->getOperand(2))) &&
+ "Fixed length vector types with constant offsets expected to use "
+ "SHUFFLE_VECTOR!");
EVT VT = Node->getValueType(0);
SDValue V1 = Node->getOperand(0);
SDValue V2 = Node->getOperand(1);
- uint64_t Imm = Node->getConstantOperandVal(2);
+ SDValue Offset = Node->getOperand(2);
SDLoc DL(Node);
// Expand through memory thusly:
// Alloca CONCAT_VECTORS_TYPES(V1, V2) Ptr
// Store V1, Ptr
// Store V2, Ptr + sizeof(V1)
- // If (Imm < 0)
- // TrailingElts = -Imm
- // Ptr = Ptr + sizeof(V1) - (TrailingElts * sizeof(VT.Elt))
+ // if (VECTOR_SPLICE_LEFT)
+ // Ptr = Ptr + (Offset * sizeof(VT.Elt))
// else
- // Ptr = Ptr + (Imm * sizeof(VT.Elt))
+ //   Ptr = Ptr + sizeof(V1) - (Offset * sizeof(VT.Elt))
// Res = Load Ptr
Align Alignment = DAG.getReducedAlign(VT, /*UseABI=*/false);
@@ -12029,27 +12030,20 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
SDValue StackPtr2 = DAG.getNode(ISD::ADD, DL, PtrVT, StackPtr, VTBytes);
SDValue StoreV2 = DAG.getStore(StoreV1, DL, V2, StackPtr2, PtrInfo);
- if (Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT) {
- // Load back the required element. getVectorElementPointer takes care of
- // clamping the index if it's out-of-bounds.
- StackPtr = getVectorElementPointer(DAG, StackPtr, VT, Node->getOperand(2));
- // Load the spliced result
- return DAG.getLoad(VT, DL, StoreV2, StackPtr,
- MachinePointerInfo::getUnknownStack(MF));
- }
-
- // NOTE: TrailingElts must be clamped so as not to read outside of V1:V2.
- TypeSize EltByteSize = VT.getVectorElementType().getStoreSize();
- SDValue TrailingBytes = DAG.getConstant(Imm * EltByteSize, DL, PtrVT);
+ // NOTE: TrailingBytes must be clamped so as not to read outside of V1:V2.
+ SDValue EltByteSize =
+ DAG.getTypeSize(DL, PtrVT, VT.getVectorElementType().getStoreSize());
+ SDValue TrailingBytes = DAG.getNode(ISD::MUL, DL, PtrVT, Offset, EltByteSize);
- if (Imm > VT.getVectorMinNumElements())
- TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
+ TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
- // Calculate the start address of the spliced result.
- StackPtr2 = DAG.getNode(ISD::SUB, DL, PtrVT, StackPtr2, TrailingBytes);
+ if (Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT)
+ StackPtr = DAG.getMemBasePlusOffset(StackPtr, TrailingBytes, DL);
+ else
+ StackPtr = DAG.getNode(ISD::SUB, DL, PtrVT, StackPtr2, TrailingBytes);
// Load the spliced result
- return DAG.getLoad(VT, DL, StoreV2, StackPtr2,
+ return DAG.getLoad(VT, DL, StoreV2, StackPtr,
MachinePointerInfo::getUnknownStack(MF));
}
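The stack-slot expansion above can be modeled at the element level (illustrative Python, not part of the patch; the real code computes byte offsets and clamps with `ISD::UMIN` so an out-of-range offset still reads inside the V1:V2 slot rather than off the end of the alloca):

```python
# Element-level sketch of TargetLowering::expandVectorSplice.
def expand_splice(is_left, v1, v2, offset):
    n = len(v1)
    stack = v1 + v2          # Store V1, then V2 immediately after it.
    off = min(offset, n)     # UMIN clamp: never read outside V1:V2.
    # Left splice loads from Ptr + off; right splice loads from
    # (Ptr + sizeof(V1)) - off.
    start = off if is_left else n - off
    return stack[start:start + n]

print(expand_splice(True, [1, 2, 3, 4], [5, 6, 7, 8], 2))  # [3, 4, 5, 6]
```

Because the load start is clamped to `[0, n]`, the loaded value is always some in-bounds window of the stored pair; an out-of-range offset produces an arbitrary (but memory-safe) result, which is permitted since the IR result is poison in that case.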
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index a2ad95eb5abc4..1aa6152d55499 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6572,33 +6572,6 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
break;
}
- case Intrinsic::vector_splice_left:
- case Intrinsic::vector_splice_right: {
- VectorType *VecTy = cast<VectorType>(Call.getType());
- uint64_t Idx = cast<ConstantInt>(Call.getArgOperand(2))->getZExtValue();
- uint64_t KnownMinNumElements = VecTy->getElementCount().getKnownMinValue();
- if (VecTy->isScalableTy() && Call.getParent() &&
- Call.getParent()->getParent()) {
- AttributeList Attrs = Call.getParent()->getParent()->getAttributes();
- if (Attrs.hasFnAttr(Attribute::VScaleRange))
- KnownMinNumElements *= Attrs.getFnAttrs().getVScaleRangeMin();
- }
- if (ID == Intrinsic::vector_splice_left)
- Check(Idx < KnownMinNumElements,
- "The splice index exceeds the range [0, VL-1] where VL is the "
- "known minimum number of elements in the vector. For scalable "
- "vectors the minimum number of elements is determined from "
- "vscale_range.",
- &Call);
- else
- Check(Idx <= KnownMinNumElements,
- "The splice index exceeds the range [0, VL] where VL is the "
- "known minimum number of elements in the vector. For scalable "
- "vectors the minimum number of elements is determined from "
- "vscale_range.",
- &Call);
- break;
- }
case Intrinsic::stepvector: {
VectorType *VecTy = dyn_cast<VectorType>(Call.getType());
Check(VecTy && VecTy->getScalarType()->isIntegerTy() &&
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e9396ae76776b..b1851bee7e860 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -12311,6 +12311,8 @@ SDValue AArch64TargetLowering::LowerSELECT_CC(
SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
SelectionDAG &DAG) const {
EVT Ty = Op.getValueType();
+ if (!isa<ConstantSDNode>(Op.getOperand(2)))
+ return SDValue();
auto Idx = Op.getConstantOperandAPInt(2);
int64_t IdxVal = Idx.getSExtValue();
assert(Ty.isScalableVector() &&
diff --git a/llvm/test/Analysis/CostModel/AArch64/splice.ll b/llvm/test/Analysis/CostModel/AArch64/splice.ll
index 1d3154ad82299..bb787d0928310 100644
--- a/llvm/test/Analysis/CostModel/AArch64/splice.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/splice.ll
@@ -3,7 +3,7 @@
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
-define void @vector_splice() #0 {
+define void @vector_splice(i32 %offset) #0 {
; CHECK-LABEL: 'vector_splice'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call <16 x i8> @llvm.vector.splice.left.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <32 x i8> @llvm.vector.splice.left.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
@@ -33,6 +33,8 @@ define void @vector_splice() #0 {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %26 = call <4 x i1> @llvm.vector.splice.left.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %27 = call <2 x i1> @llvm.vector.splice.left.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %28 = call <2 x i128> @llvm.vector.splice.left.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %left.variable = call <4 x i32> @llvm.vector.splice.left.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %right.variable = call <4 x i32> @llvm.vector.splice.right.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%splice.v16i8 = call <16 x i8> @llvm.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
@@ -63,6 +65,9 @@ define void @vector_splice() #0 {
%splice.v4i1 = call <4 x i1> @llvm.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
%splice.v2i1 = call <2 x i1> @llvm.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
%splice.v2i128 = call <2 x i128> @llvm.vector.splice.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
+
+ %left.variable = call <4 x i32> @llvm.vector.splice.left(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
+ %right.variable = call <4 x i32> @llvm.vector.splice.right(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
ret void
}
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
index d503918ce6f78..6ed3a90438c34 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
@@ -615,7 +615,7 @@ declare <vscale x 4 x float> @llvm.log.nxv4f32(<vscale x 4 x float>)
declare <vscale x 4 x float> @llvm.log2.nxv4f32(<vscale x 4 x float>)
declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)
-define void @vector_splice() #0 {
+define void @vector_splice(i32 %offset) #0 {
; CHECK-VSCALE-1-LABEL: 'vector_splice'
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
@@ -678,6 +678,8 @@ define void @vector_splice() #0 {
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; CHECK-VSCALE-2-LABEL: 'vector_splice'
@@ -742,6 +744,8 @@ define void @vector_splice() #0 {
; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; TYPE_BASED_ONLY-LABEL: 'vector_splice'
@@ -806,6 +810,8 @@ define void @vector_splice() #0 {
; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
@@ -871,6 +877,9 @@ define void @vector_splice() #0 {
%splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
%splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
%splice_nxv1i1_neg = call <vscale x 1 x i1> @llvm.vector.splice.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 -1)
+
+ %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+ %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
ret void
}
diff --git a/llvm/test/Analysis/CostModel/RISCV/splice.ll b/llvm/test/Analysis/CostModel/RISCV/splice.ll
index e388a99be423b..5250c3dc1171a 100644
--- a/llvm/test/Analysis/CostModel/RISCV/splice.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/splice.ll
@@ -4,7 +4,7 @@
; RUN: opt < %s -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -S -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfh | FileCheck %s --check-prefix=SIZE
; RUN: opt < %s -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -S -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfhmin | FileCheck %s --check-prefix=SIZE
-define void @vector_splice() {
+define void @vector_splice(i32 zeroext %offset) {
; CHECK-LABEL: 'vector_splice'
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.right.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.right.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
@@ -62,6 +62,8 @@ define void @vector_splice() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.right.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.right.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.right.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Invalid cost for instruction: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; CHECK-NEXT: Cost Model: Invalid cost for instruction: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'vector_splice'
@@ -121,6 +123,8 @@ define void @vector_splice() {
; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.right.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.right.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.right.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
%splice.nxv1i8 = call <vscale x 1 x i8> @llvm.vector.splice.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 -1)
@@ -187,5 +191,8 @@ define void @vector_splice() {
%splice.nxv32f64 = call <vscale x 32 x double> @llvm.vector.splice.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 -1)
%splice.nxv64f64 = call <vscale x 64 x double> @llvm.vector.splice.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 -1)
+ %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+ %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+
ret void
}
diff --git a/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll b/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
index e20fe07a443e4..dc7c3cbf9459c 100644
--- a/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
+++ b/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
@@ -127,6 +127,43 @@ define <16 x i8> @splice_right_0(<16 x i8> %a, <16 x i8> %b) #0 {
ret <16 x i8> %res
}
+define <4 x i32> @splice_left_v4i32_variable_offset(<4 x i32> %a, <4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_left_v4i32_variable_offset:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: ubfiz x8, x0, #2, #32
+; CHECK-NEXT: mov w9, #16 // =0x10
+; CHECK-NEXT: cmp x8, #16
+; CHECK-NEXT: stp q0, q1, [sp, #-32]!
+; CHECK-NEXT: csel x8, x8, x9, lo
+; CHECK-NEXT: mov x9, sp
+; CHECK-NEXT: ldr q0, [x9, x8]
+; CHECK-NEXT: add sp, sp, #32
+; CHECK-NEXT: ret
+ %res = call <4 x i32> @llvm.vector.splice.left(<4 x i32> %a, <4 x i32> %b, i32 %offset)
+ ret <4 x i32> %res
+}
+
+define <4 x i32> @splice_right_v4i32_variable_offset(<4 x i32> %a, <4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_right_v4i32_variable_offset:
+; CHECK: // %bb.0:
+; CHECK-NEXT: sub sp, sp, #32
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: ubfiz x8, x0, #2, #32
+; CHECK-NEXT: mov w9, #16 // =0x10
+; CHECK-NEXT: mov x10, sp
+; CHECK-NEXT: stp q0, q1, [sp]
+; CHECK-NEXT: cmp x8, #16
+; CHECK-NEXT: csel x8, x8, x9, lo
+; CHECK-NEXT: add x9, x10, #16
+; CHECK-NEXT: sub x8, x9, x8
+; CHECK-NEXT: ldr q0, [x8]
+; CHECK-NEXT: add sp, sp, #32
+; CHECK-NEXT: ret
+ %res = call <4 x i32> @llvm.vector.splice.right(<4 x i32> %a, <4 x i32> %b, i32 %offset)
+ ret <4 x i32> %res
+}
+
declare <2 x i8> @llvm.vector.splice.v2i8(<2 x i8>, <2 x i8>, i32)
declare <16 x i8> @llvm.vector.splice.v16i8(<16 x i8>, <16 x i8>, i32)
declare <8 x i32> @llvm.vector.splice.v8i32(<8 x i32>, <8 x i32>, i32)
diff --git a/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll b/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
index 2aef74a91c056..d84c7658c7b1f 100644
--- a/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
+++ b/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
@@ -464,12 +464,16 @@ define <vscale x 8 x i32> @splice_nxv8i32_idx(<vscale x 8 x i32> %a, <vscale x 8
; CHECK: // %bb.0:
; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
; CHECK-NEXT: addvl sp, sp, #-4
-; CHECK-NEXT: mov x8, sp
+; CHECK-NEXT: rdvl x8, #2
+; CHECK-NEXT: mov w9, #8 // =0x8
; CHECK-NEXT: str z1, [sp, #1, mul vl]
+; CHECK-NEXT: cmp x8, #8
; CHECK-NEXT: str z0, [sp]
-; CHECK-NEXT: orr x8, x8, #0x8
+; CHECK-NEXT: csel x8, x8, x9, lo
+; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: str z3, [sp, #3, mul vl]
; CHECK-NEXT: str z2, [sp, #2, mul vl]
+; CHECK-NEXT: orr x8, x9, x8
; CHECK-NEXT: ldr z0, [x8]
; CHECK-NEXT: ldr z1, [x8, #1, mul vl]
; CHECK-NEXT: addvl sp, sp, #4
@@ -485,26 +489,25 @@ define <vscale x 16 x float> @splice_nxv16f32_16(<vscale x 16 x float> %a, <vsca
; CHECK: // %bb.0:
; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
; CHECK-NEXT: addvl sp, sp, #-8
-; CHECK-NEXT: rdvl x8, #1
-; CHECK-NEXT: mov w9, #16 // =0x10
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: sub x8, x8, #1
+; CHECK-NEXT: rdvl x8, #4
+; CHECK-NEXT: mov w9, #64 // =0x40
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: cmp x8, #64
; CHECK-NEXT: str z3, [sp, #3, mul vl]
-; CHECK-NEXT: cmp x8, #16
; CHECK-NEXT: str z2, [sp, #2, mul vl]
; CHECK-NEXT: csel x8, x8, x9, lo
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: str z1, [sp, #1, mul vl]
-; CHECK-NEXT: add x10, x9, x8, lsl #2
; CHECK-NEXT: str z0, [sp]
; CHECK-NEXT: str z7, [sp, #7, mul vl]
; CHECK-NEXT: str z4, [sp, #4, mul vl]
; CHECK-NEXT: str z5, [sp, #5, mul vl]
; CHECK-NEXT: str z6, [sp, #6, mul vl]
-; CHECK-NEXT: ld1w { z0.s }, p0/z, [x9, x8, lsl #2]
-; CHECK-NEXT: ldr z1, [x10, #1, mul vl]
-; CHECK-NEXT: ldr z2, [x10, #2, mul vl]
-; CHECK-NEXT: ldr z3, [x10, #3, mul vl]
+; CHECK-NEXT: ld1b { z0.b }, p0/z, [x9, x8]
+; CHECK-NEXT: add x8, x9, x8
+; CHECK-NEXT: ldr z1, [x8, #1, mul vl]
+; CHECK-NEXT: ldr z2, [x8, #2, mul vl]
+; CHECK-NEXT: ldr z3, [x8, #3, mul vl]
; CHECK-NEXT: addvl sp, sp, #8
; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
@@ -1063,17 +1066,18 @@ define <vscale x 8 x i32> @splice_nxv8i32(<vscale x 8 x i32> %a, <vscale x 8 x i
; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
; CHECK-NEXT: addvl sp, sp, #-4
; CHECK-NEXT: rdvl x8, #2
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: add x8, x9, x8
+; CHECK-NEXT: mov w9, #32 // =0x20
+; CHECK-NEXT: mov x10, sp
+; CHECK-NEXT: cmp x8, #32
; CHECK-NEXT: str z1, [sp, #1, mul vl]
-; CHECK-NEXT: mov x9, #-8 // =0xfffffffffffffff8
+; CHECK-NEXT: csel x9, x8, x9, lo
+; CHECK-NEXT: add x8, x10, x8
; CHECK-NEXT: str z0, [sp]
-; CHECK-NEXT: sub x10, x8, #32
; CHECK-NEXT: str z3, [sp, #3, mul vl]
+; CHECK-NEXT: sub x8, x8, x9
; CHECK-NEXT: str z2, [sp, #2, mul vl]
-; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8, x9, lsl #2]
-; CHECK-NEXT: ldr z1, [x10, #1, mul vl]
+; CHECK-NEXT: ldr z0, [x8]
+; CHECK-NEXT: ldr z1, [x8, #1, mul vl]
; CHECK-NEXT: addvl sp, sp, #4
; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
@@ -1113,6 +1117,51 @@ define <vscale x 16 x float> @splice_nxv16f32_neg17(<vscale x 16 x float> %a, <v
ret <vscale x 16 x float> %res
}
+define <vscale x 4 x i32> @splice_left_nxv4i32_variable_offset(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_left_nxv4i32_variable_offset:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: addvl sp, sp, #-2
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: ubfiz x8, x0, #2, #32
+; CHECK-NEXT: rdvl x9, #1
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: str z0, [sp]
+; CHECK-NEXT: cmp x8, x9
+; CHECK-NEXT: str z1, [sp, #1, mul vl]
+; CHECK-NEXT: csel x8, x8, x9, lo
+; CHECK-NEXT: mov x9, sp
+; CHECK-NEXT: ld1b { z0.b }, p0/z, [x9, x8]
+; CHECK-NEXT: addvl sp, sp, #2
+; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %res = call <vscale x 4 x i32> @llvm.vector.splice.left(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 %offset)
+ ret <vscale x 4 x i32> %res
+}
+
+define <vscale x 4 x i32> @splice_right_nxv4i32_variable_offset(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_right_nxv4i32_variable_offset:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: addvl sp, sp, #-2
+; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: ubfiz x8, x0, #2, #32
+; CHECK-NEXT: rdvl x9, #1
+; CHECK-NEXT: mov x10, sp
+; CHECK-NEXT: str z0, [sp]
+; CHECK-NEXT: cmp x8, x9
+; CHECK-NEXT: str z1, [sp, #1, mul vl]
+; CHECK-NEXT: csel x8, x8, x9, lo
+; CHECK-NEXT: add x9, x10, x9
+; CHECK-NEXT: sub x8, x9, x8
+; CHECK-NEXT: ldr z0, [x8]
+; CHECK-NEXT: addvl sp, sp, #2
+; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+ %res = call <vscale x 4 x i32> @llvm.vector.splice.right(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 %offset)
+ ret <vscale x 4 x i32> %res
+}
+
declare <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1>, <vscale x 2 x i1>, i32)
declare <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1>, <vscale x 4 x i1>, i32)
declare <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1>, <vscale x 8 x i1>, i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll b/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
index cc389236df3ff..9fb9b508d76b0 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
@@ -4208,3 +4208,83 @@ define <vscale x 2 x i32> @splice_nxv2i32_slideup_undef(<vscale x 2 x i32> %a) #
}
attributes #0 = { vscale_range(2,0) }
+
+define <vscale x 2 x i32> @splice_left_nxv2i32_variable_offset(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 zeroext %offset) {
+; NOVLDEP-LABEL: splice_left_nxv2i32_variable_offset:
+; NOVLDEP: # %bb.0:
+; NOVLDEP-NEXT: vsetvli a1, zero, e32, m1, ta, ma
+; NOVLDEP-NEXT: vslidedown.vx v8, v8, a0
+; NOVLDEP-NEXT: csrr a1, vlenb
+; NOVLDEP-NEXT: srli a1, a1, 2
+; NOVLDEP-NEXT: sub a1, a1, a0
+; NOVLDEP-NEXT: vslideup.vx v8, v9, a1
+; NOVLDEP-NEXT: ret
+;
+; VLDEP-LABEL: splice_left_nxv2i32_variable_offset:
+; VLDEP: # %bb.0:
+; VLDEP-NEXT: csrr a1, vlenb
+; VLDEP-NEXT: srli a1, a1, 2
+; VLDEP-NEXT: sub a1, a1, a0
+; VLDEP-NEXT: vsetvli zero, a1, e32, m1, ta, ma
+; VLDEP-NEXT: vslidedown.vx v8, v8, a0
+; VLDEP-NEXT: vsetvli a0, zero, e32, m1, ta, ma
+; VLDEP-NEXT: vslideup.vx v8, v9, a1
+; VLDEP-NEXT: ret
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 %offset)
+ ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @splice_right_nxv2i32_variable_offset(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 zeroext %offset) {
+; NOVLDEP-LABEL: splice_right_nxv2i32_variable_offset:
+; NOVLDEP: # %bb.0:
+; NOVLDEP-NEXT: csrr a1, vlenb
+; NOVLDEP-NEXT: srli a1, a1, 2
+; NOVLDEP-NEXT: sub a1, a1, a0
+; NOVLDEP-NEXT: vsetvli a2, zero, e32, m1, ta, ma
+; NOVLDEP-NEXT: vslidedown.vx v8, v8, a1
+; NOVLDEP-NEXT: vslideup.vx v8, v9, a0
+; NOVLDEP-NEXT: ret
+;
+; VLDEP-LABEL: splice_right_nxv2i32_variable_offset:
+; VLDEP: # %bb.0:
+; VLDEP-NEXT: csrr a1, vlenb
+; VLDEP-NEXT: srli a1, a1, 2
+; VLDEP-NEXT: sub a1, a1, a0
+; VLDEP-NEXT: vsetvli zero, a0, e32, m1, ta, ma
+; VLDEP-NEXT: vslidedown.vx v8, v8, a1
+; VLDEP-NEXT: vsetvli a1, zero, e32, m1, ta, ma
+; VLDEP-NEXT: vslideup.vx v8, v9, a0
+; VLDEP-NEXT: ret
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 %offset)
+ ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @splice_left_nxv2i32_variable_offset_slidedown(<vscale x 2 x i32> %a, i32 zeroext %offset) {
+; NOVLDEP-LABEL: splice_left_nxv2i32_variable_offset_slidedown:
+; NOVLDEP: # %bb.0:
+; NOVLDEP-NEXT: vsetvli a1, zero, e32, m1, ta, ma
+; NOVLDEP-NEXT: vslidedown.vx v8, v8, a0
+; NOVLDEP-NEXT: ret
+;
+; VLDEP-LABEL: splice_left_nxv2i32_variable_offset_slidedown:
+; VLDEP: # %bb.0:
+; VLDEP-NEXT: csrr a1, vlenb
+; VLDEP-NEXT: srli a1, a1, 2
+; VLDEP-NEXT: sub a1, a1, a0
+; VLDEP-NEXT: vsetvli zero, a1, e32, m1, ta, ma
+; VLDEP-NEXT: vslidedown.vx v8, v8, a0
+; VLDEP-NEXT: ret
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> poison, i32 %offset)
+ ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @splice_right_nxv2i32_variable_offset_slideup(<vscale x 2 x i32> %a, i32 zeroext %offset) {
+; CHECK-LABEL: splice_right_nxv2i32_variable_offset_slideup:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli a1, zero, e32, m1, ta, ma
+; CHECK-NEXT: vslideup.vx v9, v8, a0
+; CHECK-NEXT: vmv.v.v v8, v9
+; CHECK-NEXT: ret
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> poison, <vscale x 2 x i32> %a, i32 %offset)
+ ret <vscale x 2 x i32> %res
+}
diff --git a/llvm/test/Transforms/InstSimplify/vector-splice.ll b/llvm/test/Transforms/InstSimplify/vector-splice.ll
new file mode 100644
index 0000000000000..827e5c9ec4838
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/vector-splice.ll
@@ -0,0 +1,94 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -p instsimplify -S | FileCheck %s
+
+define <2 x i32> @left_undef_offset(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @left_undef_offset(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: ret <2 x i32> poison
+;
+ %res = call <2 x i32> @llvm.vector.splice.left(<2 x i32> %a, <2 x i32> %b, i32 undef)
+ ret <2 x i32> %res
+}
+
+define <2 x i32> @right_undef_offset(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @right_undef_offset(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: ret <2 x i32> poison
+;
+ %res = call <2 x i32> @llvm.vector.splice.right(<2 x i32> %a, <2 x i32> %b, i32 undef)
+ ret <2 x i32> %res
+}
+
+define <2 x i32> @left_out_of_bounds(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @left_out_of_bounds(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: ret <2 x i32> poison
+;
+ %res = call <2 x i32> @llvm.vector.splice.left(<2 x i32> %a, <2 x i32> %b, i32 3)
+ ret <2 x i32> %res
+}
+
+define <2 x i32> @right_out_of_bounds(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @right_out_of_bounds(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: ret <2 x i32> poison
+;
+ %res = call <2 x i32> @llvm.vector.splice.right(<2 x i32> %a, <2 x i32> %b, i32 3)
+ ret <2 x i32> %res
+}
+
+define <vscale x 2 x i32> @left_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) vscale_range(1, 1) {
+; CHECK-LABEL: define <vscale x 2 x i32> @left_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: ret <vscale x 2 x i32> poison
+;
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+ ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @left_not_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) {
+; CHECK-LABEL: define <vscale x 2 x i32> @left_not_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: [[RES:%.*]] = call <vscale x 2 x i32> @llvm.vector.splice.left.nxv2i32(<vscale x 2 x i32> [[A]], <vscale x 2 x i32> [[B]], i32 3)
+; CHECK-NEXT: ret <vscale x 2 x i32> [[RES]]
+;
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+ ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @right_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) vscale_range(1, 1) {
+; CHECK-LABEL: define <vscale x 2 x i32> @right_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: ret <vscale x 2 x i32> poison
+;
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+ ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @right_not_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) {
+; CHECK-LABEL: define <vscale x 2 x i32> @right_not_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: [[RES:%.*]] = call <vscale x 2 x i32> @llvm.vector.splice.right.nxv2i32(<vscale x 2 x i32> [[A]], <vscale x 2 x i32> [[B]], i32 3)
+; CHECK-NEXT: ret <vscale x 2 x i32> [[RES]]
+;
+ %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+ ret <vscale x 2 x i32> %res
+}
+
+define <2 x i32> @left_offset_0(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @left_offset_0(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: ret <2 x i32> [[A]]
+;
+ %res = call <2 x i32> @llvm.vector.splice.left(<2 x i32> %a, <2 x i32> %b, i32 0)
+ ret <2 x i32> %res
+}
+
+define <2 x i32> @right_offset_0(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @right_offset_0(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT: ret <2 x i32> [[B]]
+;
+ %res = call <2 x i32> @llvm.vector.splice.right(<2 x i32> %a, <2 x i32> %b, i32 0)
+ ret <2 x i32> %res
+}
diff --git a/llvm/test/Verifier/invalid-splice.ll b/llvm/test/Verifier/invalid-splice.ll
deleted file mode 100644
index d921e4a5c7a78..0000000000000
--- a/llvm/test/Verifier/invalid-splice.ll
+++ /dev/null
@@ -1,37 +0,0 @@
-; RUN: not opt -passes=verify -S < %s 2>&1 >/dev/null | FileCheck %s
-
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
-define <2 x double> @splice_v2f64_idx_neg3(<2 x double> %a, <2 x double> %b) #0 {
- %res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 -3)
- ret <2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
-define <vscale x 2 x double> @splice_nxv2f64_idx_neg3_vscale_min1(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
- %res = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 -3)
- ret <vscale x 2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
-define <vscale x 2 x double> @splice_nxv2f64_idx_neg5_vscale_min2(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1 {
- %res = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 -5)
- ret <vscale x 2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL-1] where VL is the known minimum number of elements in the vector
-define <2 x double> @splice_v2f64_idx2(<2 x double> %a, <2 x double> %b) #0 {
- %res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 2)
- ret <2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL-1] where VL is the known minimum number of elements in the vector
-define <2 x double> @splice_v2f64_idx3(<2 x double> %a, <2 x double> %b) #1 {
- %res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 3)
- ret <2 x double> %res
-}
-
-attributes #0 = { vscale_range(1,16) }
-attributes #1 = { vscale_range(2,16) }
-
-declare <2 x double> @llvm.vector.splice.v2f64(<2 x double>, <2 x double>, i32)
-declare <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, i32)
>From b1bdeb120a53159771f5ae50df9031dd3eb9db4a Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Wed, 7 Jan 2026 14:39:07 +0800
Subject: [PATCH 2/3] Update autoupgrader test
---
llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
index dd571021efa44..590abb1792431 100644
--- a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
+++ b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
@@ -26,7 +26,7 @@ define <vscale x 8 x half> @splice_scalable(<vscale x 8 x half> %a, <vscale x 8
}
declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32 immarg)
-; CHECK: declare <8 x half> @llvm.vector.splice.left.v8f16(<8 x half>, <8 x half>, i32 immarg)
+; CHECK: declare <8 x half> @llvm.vector.splice.left.v8f16(<8 x half>, <8 x half>, i32)
declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
-; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
+; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32)
>From b5c7edd4afc505a27f38070d3c40742709aa2e0c Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Wed, 7 Jan 2026 15:51:05 +0800
Subject: [PATCH 3/3] Extend Offset to ptr type
---
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index cf0a13473bc56..be358e9d6cef1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -12033,6 +12033,7 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
// NOTE: TrailingBytes must be clamped so as not to read outside of V1:V2.
SDValue EltByteSize =
DAG.getTypeSize(DL, PtrVT, VT.getVectorElementType().getStoreSize());
+ Offset = DAG.getZExtOrTrunc(Offset, DL, PtrVT);
SDValue TrailingBytes = DAG.getNode(ISD::MUL, DL, PtrVT, Offset, EltByteSize);
TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
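For readers skimming the patch: the variable-offset semantics being lowered above can be modeled with a short Python sketch. This is an illustrative model only, not part of the patch; the function names and the list-based representation are assumptions. It follows the LangRef semantics the patch updates: `splice.left` takes VL elements of the concatenation `v1:v2` starting at `offset`, `splice.right` ends the window `offset` elements before the end of `v1:v2`, and an offset greater than VL yields poison (here modeled as an assertion).

```python
def splice_left(v1, v2, offset):
    """Model of @llvm.vector.splice.left: VL elements of v1:v2
    starting at `offset`. offset > VL is poison in IR."""
    vl = len(v1)
    assert 0 <= offset <= vl, "out-of-bounds offset is poison"
    cat = v1 + v2
    return cat[offset:offset + vl]

def splice_right(v1, v2, offset):
    """Model of @llvm.vector.splice.right: the last `offset`
    elements of v1 followed by the first VL - offset of v2."""
    vl = len(v1)
    assert 0 <= offset <= vl, "out-of-bounds offset is poison"
    cat = v1 + v2
    return cat[vl - offset:2 * vl - offset]
```

This also makes the InstSimplify folds in the new test file easy to check by hand: `splice_left(a, b, 0)` returns `a` and `splice_right(a, b, 0)` returns `b`.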