[llvm] [IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (PR #174693)

Luke Lau via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 6 23:52:00 PST 2026


https://github.com/lukel97 updated https://github.com/llvm/llvm-project/pull/174693

From 95d5b9a1bf18600216aa1c58e73dfc9798bc03d3 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Tue, 6 Jan 2026 21:25:48 +0800
Subject: [PATCH 1/3] Allow non-constant splice offsets

---
 llvm/docs/LangRef.rst                         | 44 +++++----
 .../llvm/Analysis/TargetTransformInfo.h       |  1 -
 llvm/include/llvm/CodeGen/BasicTTIImpl.h      |  5 +-
 llvm/include/llvm/CodeGen/ISDOpcodes.h        |  9 +-
 llvm/include/llvm/IR/Intrinsics.td            |  4 +-
 llvm/lib/Analysis/InstructionSimplify.cpp     | 24 +++++
 .../SelectionDAG/SelectionDAGBuilder.cpp      | 17 ++--
 .../CodeGen/SelectionDAG/TargetLowering.cpp   | 42 ++++-----
 llvm/lib/IR/Verifier.cpp                      | 27 ------
 .../Target/AArch64/AArch64ISelLowering.cpp    |  2 +
 .../test/Analysis/CostModel/AArch64/splice.ll |  7 +-
 .../CostModel/AArch64/sve-intrinsics.ll       | 11 ++-
 llvm/test/Analysis/CostModel/RISCV/splice.ll  |  9 +-
 .../AArch64/named-vector-shuffles-neon.ll     | 37 ++++++++
 .../AArch64/named-vector-shuffles-sve.ll      | 87 +++++++++++++----
 llvm/test/CodeGen/RISCV/rvv/vector-splice.ll  | 80 ++++++++++++++++
 .../Transforms/InstSimplify/vector-splice.ll  | 94 +++++++++++++++++++
 llvm/test/Verifier/invalid-splice.ll          | 37 --------
 18 files changed, 390 insertions(+), 147 deletions(-)
 create mode 100644 llvm/test/Transforms/InstSimplify/vector-splice.ll
 delete mode 100644 llvm/test/Verifier/invalid-splice.ll

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 041a526b6729f..2b95163f96eb6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20822,20 +20822,20 @@ This is an overloaded intrinsic.
 
 ::
 
-      declare <2 x double> @llvm.vector.splice.left.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
-      declare <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+      declare <2 x double> @llvm.vector.splice.left.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %offset)
+      declare <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %offset)
 
 Overview:
 """""""""
 
 The '``llvm.vector.splice.left.*``' intrinsics construct a vector by
-concatenating two vectors together, shifting the elements left by ``imm``, and
-extracting the lower half.
+concatenating two vectors together, shifting the elements left by ``offset``,
+and extracting the lower half.
 
 These intrinsics work for both fixed and scalable vectors. While this intrinsic
 supports all vector types the recommended way to express this operation for
-fixed-width vectors is still to use a shufflevector, as that may allow for more
-optimization opportunities.
+fixed-width vectors with an immediate offset is still to use a shufflevector, as
+that may allow for more optimization opportunities.
 
 For example:
 
@@ -20849,11 +20849,13 @@ For example:
 
 Arguments:
 """"""""""
+The first two operands are vectors with the same type. ``offset`` is an i32,
+interpreted as unsigned, that specifies how many elements to shift left by.
 
-The first two operands are vectors with the same type. For a fixed-width vector
-<N x eltty>, imm is an unsigned integer constant in the range 0 <= imm < N. For
-a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
-the range 0 <= imm < X where X=vscale_range_min * N.
+Semantics:
+""""""""""
+For a vector type with a runtime element count of N, if ``offset`` > N then the
+result is a :ref:`poison value <poisonvalues>`.
 
 '``llvm.vector.splice.right``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -20864,20 +20866,20 @@ This is an overloaded intrinsic.
 
 ::
 
-      declare <2 x double> @llvm.vector.splice.right.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
-      declare <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+      declare <2 x double> @llvm.vector.splice.right.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %offset)
+      declare <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %offset)
 
 Overview:
 """""""""
 
 The '``llvm.vector.splice.right.*``' intrinsics construct a vector by
-concatenating two vectors together, shifting the elements right by ``imm``, and
-extracting the upper half.
+concatenating two vectors together, shifting the elements right by ``offset``,
+and extracting the upper half.
 
 These intrinsics work for both fixed and scalable vectors. While this intrinsic
 supports all vector types the recommended way to express this operation for
-fixed-width vectors is still to use a shufflevector, as that may allow for more
-optimization opportunities.
+fixed-width vectors with an immediate offset is still to use a shufflevector, as
+that may allow for more optimization opportunities.
 
 For example:
 
@@ -20891,11 +20893,13 @@ For example:
 
 Arguments:
 """"""""""
+The first two operands are vectors with the same type. ``offset`` is an i32,
+interpreted as unsigned, that specifies how many elements to shift right by.
 
-The first two operands are vectors with the same type. For a fixed-width vector
-<N x eltty>, imm is an unsigned integer constant in the range 0 <= imm <= N. For
-a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
-the range 0 <= imm <= X where X=vscale_range_min * N.
+Semantics:
+""""""""""
+For a vector type with a runtime element count of N, if ``offset`` > N then the
+result is a :ref:`poison value <poisonvalues>`.
 
 '``llvm.stepvector``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 8b06b4aae26ce..5a4eb8daf0af6 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1210,7 +1210,6 @@ class TargetTransformInfo {
                          ///< with any shuffle mask.
     SK_PermuteSingleSrc, ///< Shuffle elements of single source vector with any
                          ///< shuffle mask.
-    // TODO: Split into SK_SpliceLeft + SK_SpliceRight
     SK_Splice            ///< Concatenates elements from the first input vector
                          ///< with elements of the second input vector. Returning
                          ///< a vector of the same type as the input vectors.
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index ef91c845ce9e7..c430e11168f73 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2001,7 +2001,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     }
     case Intrinsic::vector_splice_left:
     case Intrinsic::vector_splice_right: {
-      unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
+      auto *COffset = dyn_cast<ConstantInt>(Args[2]);
+      if (!COffset)
+        break;
+      unsigned Index = COffset->getZExtValue();
       return thisT()->getShuffleCost(
           TTI::SK_Splice, cast<VectorType>(RetTy),
           cast<VectorType>(Args[0]->getType()), {}, CostKind,
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index ea7b21b6f6448..a7325f500f0ef 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -643,11 +643,12 @@ enum NodeType {
   /// in terms of the element size of VEC1/VEC2, not in terms of bytes.
   VECTOR_SHUFFLE,
 
-  /// VECTOR_SPLICE_LEFT(VEC1, VEC2, IMM) - Shifts CONCAT_VECTORS(VEC1, VEC2)
-  /// left by IMM elements and returns the lower half.
+  /// VECTOR_SPLICE_LEFT(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1, VEC2)
+  /// left by OFFSET elements and returns the lower half.
   VECTOR_SPLICE_LEFT,
-  /// VECTOR_SPLICE_RIGHT(VEC1, VEC2, IMM) - Shifts CONCAT_VECTORS(VEC1, VEC2)
-  /// right by IMM elements and returns the upper half.
+  /// VECTOR_SPLICE_RIGHT(VEC1, VEC2, OFFSET) - Shifts
+  /// CONCAT_VECTORS(VEC1, VEC2) right by OFFSET elements and returns the
+  /// upper half.
   VECTOR_SPLICE_RIGHT,
 
   /// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index c56b0185b4f1e..c5be0c4e0ebf8 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2832,12 +2832,12 @@ def int_vector_reverse : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
 def int_vector_splice_left
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
-                            [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+                            [IntrNoMem, IntrSpeculatable]>;
 
 def int_vector_splice_right
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
-                            [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+                            [IntrNoMem, IntrSpeculatable]>;
 
 //===---------- Intrinsics to query properties of scalable vectors --------===//
 def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty],
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 5af4b299cb60c..bff666ce236a5 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -7258,6 +7258,30 @@ static Value *simplifyIntrinsic(CallBase *Call, Value *Callee,
 
     return nullptr;
   }
+  case Intrinsic::vector_splice_left:
+  case Intrinsic::vector_splice_right: {
+    Value *Offset = Args[2];
+    auto *Ty = cast<VectorType>(F->getReturnType());
+    if (Q.isUndefValue(Offset))
+      return PoisonValue::get(Ty);
+
+    unsigned BitWidth = Offset->getType()->getScalarSizeInBits();
+    ConstantRange NumElts(
+        APInt(BitWidth, Ty->getElementCount().getKnownMinValue()));
+    if (Ty->isScalableTy())
+      NumElts = NumElts.multiply(getVScaleRange(Call->getFunction(), BitWidth));
+
+    // If we know Offset > NumElts, simplify to poison.
+    ConstantRange CR = computeConstantRangeIncludingKnownBits(Offset, false, Q);
+    if (CR.getUnsignedMin().ugt(NumElts.getUnsignedMax()))
+      return PoisonValue::get(Ty);
+
+    // splice.left(a, b, 0) --> a, splice.right(a, b, 0) --> b
+    if (CR.isSingleElement() && CR.getSingleElement()->isZero())
+      return IID == Intrinsic::vector_splice_left ? Args[0] : Args[1];
+
+    return nullptr;
+  }
   case Intrinsic::experimental_constrained_fadd: {
     auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
     return simplifyFAddInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index bff4799963fc2..745626488ccf0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -12850,19 +12850,18 @@ void SelectionDAGBuilder::visitVectorSplice(const CallInst &I) {
   SDLoc DL = getCurSDLoc();
   SDValue V1 = getValue(I.getOperand(0));
   SDValue V2 = getValue(I.getOperand(1));
-  uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getZExtValue();
   const bool IsLeft = I.getIntrinsicID() == Intrinsic::vector_splice_left;
 
-  // VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
-  if (VT.isScalableVector()) {
-    setValue(
-        &I,
-        DAG.getNode(
-            IsLeft ? ISD::VECTOR_SPLICE_LEFT : ISD::VECTOR_SPLICE_RIGHT, DL, VT,
-            V1, V2,
-            DAG.getConstant(Imm, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))));
+  // VECTOR_SHUFFLE doesn't support a scalable or non-constant mask.
+  if (VT.isScalableVector() || !isa<ConstantInt>(I.getOperand(2))) {
+    SDValue Offset = DAG.getZExtOrTrunc(
+        getValue(I.getOperand(2)), DL, TLI.getVectorIdxTy(DAG.getDataLayout()));
+    setValue(&I, DAG.getNode(IsLeft ? ISD::VECTOR_SPLICE_LEFT
+                                    : ISD::VECTOR_SPLICE_RIGHT,
+                             DL, VT, V1, V2, Offset));
     return;
   }
+  uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getZExtValue();
 
   unsigned NumElts = VT.getVectorNumElements();
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 8968907dcf5af..cf0a13473bc56 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -11992,24 +11992,25 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
   assert((Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT ||
           Node->getOpcode() == ISD::VECTOR_SPLICE_RIGHT) &&
          "Unexpected opcode!");
-  assert(Node->getValueType(0).isScalableVector() &&
-         "Fixed length vector types expected to use SHUFFLE_VECTOR!");
+  assert((Node->getValueType(0).isScalableVector() ||
+          !isa<ConstantSDNode>(Node->getOperand(2))) &&
+         "Fixed length vector types with constant offsets expected to use "
+         "SHUFFLE_VECTOR!");
 
   EVT VT = Node->getValueType(0);
   SDValue V1 = Node->getOperand(0);
   SDValue V2 = Node->getOperand(1);
-  uint64_t Imm = Node->getConstantOperandVal(2);
+  SDValue Offset = Node->getOperand(2);
   SDLoc DL(Node);
 
   // Expand through memory thusly:
   //  Alloca CONCAT_VECTORS_TYPES(V1, V2) Ptr
   //  Store V1, Ptr
   //  Store V2, Ptr + sizeof(V1)
-  //  If (Imm < 0)
-  //    TrailingElts = -Imm
-  //    Ptr = Ptr + sizeof(V1) - (TrailingElts * sizeof(VT.Elt))
+  //  if (VECTOR_SPLICE_LEFT)
+  //    Ptr = Ptr + (Offset * sizeof(VT.Elt))
   //  else
-  //    Ptr = Ptr + (Imm * sizeof(VT.Elt))
+  //    Ptr = Ptr + sizeof(V1) - (Offset * sizeof(VT.Elt))
   //  Res = Load Ptr
 
   Align Alignment = DAG.getReducedAlign(VT, /*UseABI=*/false);
@@ -12029,27 +12030,20 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
   SDValue StackPtr2 = DAG.getNode(ISD::ADD, DL, PtrVT, StackPtr, VTBytes);
   SDValue StoreV2 = DAG.getStore(StoreV1, DL, V2, StackPtr2, PtrInfo);
 
-  if (Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT) {
-    // Load back the required element. getVectorElementPointer takes care of
-    // clamping the index if it's out-of-bounds.
-    StackPtr = getVectorElementPointer(DAG, StackPtr, VT, Node->getOperand(2));
-    // Load the spliced result
-    return DAG.getLoad(VT, DL, StoreV2, StackPtr,
-                       MachinePointerInfo::getUnknownStack(MF));
-  }
-
-  // NOTE: TrailingElts must be clamped so as not to read outside of V1:V2.
-  TypeSize EltByteSize = VT.getVectorElementType().getStoreSize();
-  SDValue TrailingBytes = DAG.getConstant(Imm * EltByteSize, DL, PtrVT);
+  // NOTE: TrailingBytes must be clamped so as not to read outside of V1:V2.
+  SDValue EltByteSize =
+      DAG.getTypeSize(DL, PtrVT, VT.getVectorElementType().getStoreSize());
+  SDValue TrailingBytes = DAG.getNode(ISD::MUL, DL, PtrVT, Offset, EltByteSize);
 
-  if (Imm > VT.getVectorMinNumElements())
-    TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
+  TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
 
-  // Calculate the start address of the spliced result.
-  StackPtr2 = DAG.getNode(ISD::SUB, DL, PtrVT, StackPtr2, TrailingBytes);
+  if (Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT)
+    StackPtr = DAG.getMemBasePlusOffset(StackPtr, TrailingBytes, DL);
+  else
+    StackPtr = DAG.getNode(ISD::SUB, DL, PtrVT, StackPtr2, TrailingBytes);
 
   // Load the spliced result
-  return DAG.getLoad(VT, DL, StoreV2, StackPtr2,
+  return DAG.getLoad(VT, DL, StoreV2, StackPtr,
                      MachinePointerInfo::getUnknownStack(MF));
 }
 
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index a2ad95eb5abc4..1aa6152d55499 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6572,33 +6572,6 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
 
     break;
   }
-  case Intrinsic::vector_splice_left:
-  case Intrinsic::vector_splice_right: {
-    VectorType *VecTy = cast<VectorType>(Call.getType());
-    uint64_t Idx = cast<ConstantInt>(Call.getArgOperand(2))->getZExtValue();
-    uint64_t KnownMinNumElements = VecTy->getElementCount().getKnownMinValue();
-    if (VecTy->isScalableTy() && Call.getParent() &&
-        Call.getParent()->getParent()) {
-      AttributeList Attrs = Call.getParent()->getParent()->getAttributes();
-      if (Attrs.hasFnAttr(Attribute::VScaleRange))
-        KnownMinNumElements *= Attrs.getFnAttrs().getVScaleRangeMin();
-    }
-    if (ID == Intrinsic::vector_splice_left)
-      Check(Idx < KnownMinNumElements,
-            "The splice index exceeds the range [0, VL-1] where VL is the "
-            "known minimum number of elements in the vector. For scalable "
-            "vectors the minimum number of elements is determined from "
-            "vscale_range.",
-            &Call);
-    else
-      Check(Idx <= KnownMinNumElements,
-            "The splice index exceeds the range [0, VL] where VL is the "
-            "known minimum number of elements in the vector. For scalable "
-            "vectors the minimum number of elements is determined from "
-            "vscale_range.",
-            &Call);
-    break;
-  }
   case Intrinsic::stepvector: {
     VectorType *VecTy = dyn_cast<VectorType>(Call.getType());
     Check(VecTy && VecTy->getScalarType()->isIntegerTy() &&
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e9396ae76776b..b1851bee7e860 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -12311,6 +12311,8 @@ SDValue AArch64TargetLowering::LowerSELECT_CC(
 SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
                                                   SelectionDAG &DAG) const {
   EVT Ty = Op.getValueType();
+  if (!isa<ConstantSDNode>(Op.getOperand(2)))
+    return SDValue();
   auto Idx = Op.getConstantOperandAPInt(2);
   int64_t IdxVal = Idx.getSExtValue();
   assert(Ty.isScalableVector() &&
diff --git a/llvm/test/Analysis/CostModel/AArch64/splice.ll b/llvm/test/Analysis/CostModel/AArch64/splice.ll
index 1d3154ad82299..bb787d0928310 100644
--- a/llvm/test/Analysis/CostModel/AArch64/splice.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/splice.ll
@@ -3,7 +3,7 @@
 
 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
-define void @vector_splice() #0 {
+define void @vector_splice(i32 %offset) #0 {
 ; CHECK-LABEL: 'vector_splice'
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = call <16 x i8> @llvm.vector.splice.left.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = call <32 x i8> @llvm.vector.splice.left.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
@@ -33,6 +33,8 @@ define void @vector_splice() #0 {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %26 = call <4 x i1> @llvm.vector.splice.left.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %27 = call <2 x i1> @llvm.vector.splice.left.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %28 = call <2 x i128> @llvm.vector.splice.left.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %left.variable = call <4 x i32> @llvm.vector.splice.left.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %right.variable = call <4 x i32> @llvm.vector.splice.right.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
   %splice.v16i8 = call <16 x i8> @llvm.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
@@ -63,6 +65,9 @@ define void @vector_splice() #0 {
   %splice.v4i1 = call <4 x i1> @llvm.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
   %splice.v2i1 = call <2 x i1> @llvm.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
   %splice.v2i128 = call <2 x i128> @llvm.vector.splice.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
+
+  %left.variable = call <4 x i32> @llvm.vector.splice.left(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
+  %right.variable = call <4 x i32> @llvm.vector.splice.right(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 %offset)
   ret void
 }
 
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
index d503918ce6f78..6ed3a90438c34 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
@@ -615,7 +615,7 @@ declare <vscale x 4 x float> @llvm.log.nxv4f32(<vscale x 4 x float>)
 declare <vscale x 4 x float> @llvm.log2.nxv4f32(<vscale x 4 x float>)
 declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)
 
-define void @vector_splice() #0 {
+define void @vector_splice(i32 %offset) #0 {
 ; CHECK-VSCALE-1-LABEL: 'vector_splice'
 ; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
 ; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
@@ -678,6 +678,8 @@ define void @vector_splice() #0 {
 ; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
 ; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
 ; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of Invalid for: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of Invalid for: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
 ; CHECK-VSCALE-1-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
 ; CHECK-VSCALE-2-LABEL: 'vector_splice'
@@ -742,6 +744,8 @@ define void @vector_splice() #0 {
 ; CHECK-VSCALE-2-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
 ; CHECK-VSCALE-2-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
 ; CHECK-VSCALE-2-NEXT:  Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT:  Cost Model: Found costs of Invalid for: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; CHECK-VSCALE-2-NEXT:  Cost Model: Found costs of Invalid for: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
 ; CHECK-VSCALE-2-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
 ; TYPE_BASED_ONLY-LABEL: 'vector_splice'
@@ -806,6 +810,8 @@ define void @vector_splice() #0 {
 ; TYPE_BASED_ONLY-NEXT:  Cost Model: Found costs of Invalid for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
 ; TYPE_BASED_ONLY-NEXT:  Cost Model: Found costs of Invalid for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
 ; TYPE_BASED_ONLY-NEXT:  Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT:  Cost Model: Found costs of Invalid for: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; TYPE_BASED_ONLY-NEXT:  Cost Model: Found costs of Invalid for: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
 ; TYPE_BASED_ONLY-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
 
@@ -871,6 +877,9 @@ define void @vector_splice() #0 {
   %splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
   %splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
   %splice_nxv1i1_neg = call <vscale x 1 x i1> @llvm.vector.splice.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 -1)
+
+  %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+  %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
   ret void
 }
 
diff --git a/llvm/test/Analysis/CostModel/RISCV/splice.ll b/llvm/test/Analysis/CostModel/RISCV/splice.ll
index e388a99be423b..5250c3dc1171a 100644
--- a/llvm/test/Analysis/CostModel/RISCV/splice.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/splice.ll
@@ -4,7 +4,7 @@
 ; RUN: opt < %s -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -S -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfh | FileCheck %s --check-prefix=SIZE
 ; RUN: opt < %s -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output -S -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfhmin | FileCheck %s --check-prefix=SIZE
 
-define void @vector_splice() {
+define void @vector_splice(i32 zeroext %offset) {
 ; CHECK-LABEL: 'vector_splice'
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.right.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.right.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
@@ -62,6 +62,8 @@ define void @vector_splice() {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.right.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.right.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.right.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
+; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'vector_splice'
@@ -121,6 +123,8 @@ define void @vector_splice() {
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.right.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.right.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.right.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
+; SIZE-NEXT:  Cost Model: Invalid cost for instruction: %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+; SIZE-NEXT:  Cost Model: Invalid cost for instruction: %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %splice.nxv1i8 = call <vscale x 1 x i8> @llvm.vector.splice.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 -1)
@@ -187,5 +191,8 @@ define void @vector_splice() {
   %splice.nxv32f64 = call <vscale x 32 x double> @llvm.vector.splice.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 -1)
   %splice.nxv64f64 = call <vscale x 64 x double> @llvm.vector.splice.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 -1)
 
+  %left.variable = call <vscale x 4 x i32> @llvm.vector.splice.left(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+  %right.variable = call <vscale x 4 x i32> @llvm.vector.splice.right(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 %offset)
+
   ret void
 }
diff --git a/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll b/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
index e20fe07a443e4..dc7c3cbf9459c 100644
--- a/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
+++ b/llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
@@ -127,6 +127,43 @@ define <16 x i8> @splice_right_0(<16 x i8> %a, <16 x i8> %b) #0 {
   ret <16 x i8> %res
 }
 
+define <4 x i32> @splice_left_v4i32_variable_offset(<4 x i32> %a, <4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_left_v4i32_variable_offset:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    ubfiz x8, x0, #2, #32
+; CHECK-NEXT:    mov w9, #16 // =0x10
+; CHECK-NEXT:    cmp x8, #16
+; CHECK-NEXT:    stp q0, q1, [sp, #-32]!
+; CHECK-NEXT:    csel x8, x8, x9, lo
+; CHECK-NEXT:    mov x9, sp
+; CHECK-NEXT:    ldr q0, [x9, x8]
+; CHECK-NEXT:    add sp, sp, #32
+; CHECK-NEXT:    ret
+  %res = call <4 x i32> @llvm.vector.splice.left(<4 x i32> %a, <4 x i32> %b, i32 %offset)
+  ret <4 x i32> %res
+}
+
+define <4 x i32> @splice_right_v4i32_variable_offset(<4 x i32> %a, <4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_right_v4i32_variable_offset:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #32
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    ubfiz x8, x0, #2, #32
+; CHECK-NEXT:    mov w9, #16 // =0x10
+; CHECK-NEXT:    mov x10, sp
+; CHECK-NEXT:    stp q0, q1, [sp]
+; CHECK-NEXT:    cmp x8, #16
+; CHECK-NEXT:    csel x8, x8, x9, lo
+; CHECK-NEXT:    add x9, x10, #16
+; CHECK-NEXT:    sub x8, x9, x8
+; CHECK-NEXT:    ldr q0, [x8]
+; CHECK-NEXT:    add sp, sp, #32
+; CHECK-NEXT:    ret
+  %res = call <4 x i32> @llvm.vector.splice.right(<4 x i32> %a, <4 x i32> %b, i32 %offset)
+  ret <4 x i32> %res
+}
+
 declare <2 x i8> @llvm.vector.splice.v2i8(<2 x i8>, <2 x i8>, i32)
 declare <16 x i8> @llvm.vector.splice.v16i8(<16 x i8>, <16 x i8>, i32)
 declare <8 x i32> @llvm.vector.splice.v8i32(<8 x i32>, <8 x i32>, i32)
diff --git a/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll b/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
index 2aef74a91c056..d84c7658c7b1f 100644
--- a/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
+++ b/llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
@@ -464,12 +464,16 @@ define <vscale x 8 x i32> @splice_nxv8i32_idx(<vscale x 8 x i32> %a, <vscale x 8
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    addvl sp, sp, #-4
-; CHECK-NEXT:    mov x8, sp
+; CHECK-NEXT:    rdvl x8, #2
+; CHECK-NEXT:    mov w9, #8 // =0x8
 ; CHECK-NEXT:    str z1, [sp, #1, mul vl]
+; CHECK-NEXT:    cmp x8, #8
 ; CHECK-NEXT:    str z0, [sp]
-; CHECK-NEXT:    orr x8, x8, #0x8
+; CHECK-NEXT:    csel x8, x8, x9, lo
+; CHECK-NEXT:    mov x9, sp
 ; CHECK-NEXT:    str z3, [sp, #3, mul vl]
 ; CHECK-NEXT:    str z2, [sp, #2, mul vl]
+; CHECK-NEXT:    orr x8, x9, x8
 ; CHECK-NEXT:    ldr z0, [x8]
 ; CHECK-NEXT:    ldr z1, [x8, #1, mul vl]
 ; CHECK-NEXT:    addvl sp, sp, #4
@@ -485,26 +489,25 @@ define <vscale x 16 x float> @splice_nxv16f32_16(<vscale x 16 x float> %a, <vsca
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    addvl sp, sp, #-8
-; CHECK-NEXT:    rdvl x8, #1
-; CHECK-NEXT:    mov w9, #16 // =0x10
-; CHECK-NEXT:    ptrue p0.s
-; CHECK-NEXT:    sub x8, x8, #1
+; CHECK-NEXT:    rdvl x8, #4
+; CHECK-NEXT:    mov w9, #64 // =0x40
+; CHECK-NEXT:    ptrue p0.b
+; CHECK-NEXT:    cmp x8, #64
 ; CHECK-NEXT:    str z3, [sp, #3, mul vl]
-; CHECK-NEXT:    cmp x8, #16
 ; CHECK-NEXT:    str z2, [sp, #2, mul vl]
 ; CHECK-NEXT:    csel x8, x8, x9, lo
 ; CHECK-NEXT:    mov x9, sp
 ; CHECK-NEXT:    str z1, [sp, #1, mul vl]
-; CHECK-NEXT:    add x10, x9, x8, lsl #2
 ; CHECK-NEXT:    str z0, [sp]
 ; CHECK-NEXT:    str z7, [sp, #7, mul vl]
 ; CHECK-NEXT:    str z4, [sp, #4, mul vl]
 ; CHECK-NEXT:    str z5, [sp, #5, mul vl]
 ; CHECK-NEXT:    str z6, [sp, #6, mul vl]
-; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x9, x8, lsl #2]
-; CHECK-NEXT:    ldr z1, [x10, #1, mul vl]
-; CHECK-NEXT:    ldr z2, [x10, #2, mul vl]
-; CHECK-NEXT:    ldr z3, [x10, #3, mul vl]
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x9, x8]
+; CHECK-NEXT:    add x8, x9, x8
+; CHECK-NEXT:    ldr z1, [x8, #1, mul vl]
+; CHECK-NEXT:    ldr z2, [x8, #2, mul vl]
+; CHECK-NEXT:    ldr z3, [x8, #3, mul vl]
 ; CHECK-NEXT:    addvl sp, sp, #8
 ; CHECK-NEXT:    ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT:    ret
@@ -1063,17 +1066,18 @@ define <vscale x 8 x i32> @splice_nxv8i32(<vscale x 8 x i32> %a, <vscale x 8 x i
 ; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    addvl sp, sp, #-4
 ; CHECK-NEXT:    rdvl x8, #2
-; CHECK-NEXT:    mov x9, sp
-; CHECK-NEXT:    ptrue p0.s
-; CHECK-NEXT:    add x8, x9, x8
+; CHECK-NEXT:    mov w9, #32 // =0x20
+; CHECK-NEXT:    mov x10, sp
+; CHECK-NEXT:    cmp x8, #32
 ; CHECK-NEXT:    str z1, [sp, #1, mul vl]
-; CHECK-NEXT:    mov x9, #-8 // =0xfffffffffffffff8
+; CHECK-NEXT:    csel x9, x8, x9, lo
+; CHECK-NEXT:    add x8, x10, x8
 ; CHECK-NEXT:    str z0, [sp]
-; CHECK-NEXT:    sub x10, x8, #32
 ; CHECK-NEXT:    str z3, [sp, #3, mul vl]
+; CHECK-NEXT:    sub x8, x8, x9
 ; CHECK-NEXT:    str z2, [sp, #2, mul vl]
-; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x8, x9, lsl #2]
-; CHECK-NEXT:    ldr z1, [x10, #1, mul vl]
+; CHECK-NEXT:    ldr z0, [x8]
+; CHECK-NEXT:    ldr z1, [x8, #1, mul vl]
 ; CHECK-NEXT:    addvl sp, sp, #4
 ; CHECK-NEXT:    ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT:    ret
@@ -1113,6 +1117,51 @@ define <vscale x 16 x float> @splice_nxv16f32_neg17(<vscale x 16 x float> %a, <v
   ret <vscale x 16 x float> %res
 }
 
+define <vscale x 4 x i32> @splice_left_nxv4i32_variable_offset(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_left_nxv4i32_variable_offset:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:    addvl sp, sp, #-2
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    ubfiz x8, x0, #2, #32
+; CHECK-NEXT:    rdvl x9, #1
+; CHECK-NEXT:    ptrue p0.b
+; CHECK-NEXT:    str z0, [sp]
+; CHECK-NEXT:    cmp x8, x9
+; CHECK-NEXT:    str z1, [sp, #1, mul vl]
+; CHECK-NEXT:    csel x8, x8, x9, lo
+; CHECK-NEXT:    mov x9, sp
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x9, x8]
+; CHECK-NEXT:    addvl sp, sp, #2
+; CHECK-NEXT:    ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+  %res = call <vscale x 4 x i32> @llvm.vector.splice.left(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 %offset)
+  ret <vscale x 4 x i32> %res
+}
+
+define <vscale x 4 x i32> @splice_right_nxv4i32_variable_offset(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 zeroext %offset) #0 {
+; CHECK-LABEL: splice_right_nxv4i32_variable_offset:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:    addvl sp, sp, #-2
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    ubfiz x8, x0, #2, #32
+; CHECK-NEXT:    rdvl x9, #1
+; CHECK-NEXT:    mov x10, sp
+; CHECK-NEXT:    str z0, [sp]
+; CHECK-NEXT:    cmp x8, x9
+; CHECK-NEXT:    str z1, [sp, #1, mul vl]
+; CHECK-NEXT:    csel x8, x8, x9, lo
+; CHECK-NEXT:    add x9, x10, x9
+; CHECK-NEXT:    sub x8, x9, x8
+; CHECK-NEXT:    ldr z0, [x8]
+; CHECK-NEXT:    addvl sp, sp, #2
+; CHECK-NEXT:    ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+  %res = call <vscale x 4 x i32> @llvm.vector.splice.right(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 %offset)
+  ret <vscale x 4 x i32> %res
+}
+
 declare <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1>, <vscale x 2 x i1>, i32)
 declare <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1>, <vscale x 4 x i1>, i32)
 declare <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1>, <vscale x 8 x i1>, i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll b/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
index cc389236df3ff..9fb9b508d76b0 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
@@ -4208,3 +4208,83 @@ define <vscale x 2 x i32> @splice_nxv2i32_slideup_undef(<vscale x 2 x i32> %a) #
 }
 
 attributes #0 = { vscale_range(2,0) }
+
+define <vscale x 2 x i32> @splice_left_nxv2i32_variable_offset(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 zeroext %offset) {
+; NOVLDEP-LABEL: splice_left_nxv2i32_variable_offset:
+; NOVLDEP:       # %bb.0:
+; NOVLDEP-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
+; NOVLDEP-NEXT:    vslidedown.vx v8, v8, a0
+; NOVLDEP-NEXT:    csrr a1, vlenb
+; NOVLDEP-NEXT:    srli a1, a1, 2
+; NOVLDEP-NEXT:    sub a1, a1, a0
+; NOVLDEP-NEXT:    vslideup.vx v8, v9, a1
+; NOVLDEP-NEXT:    ret
+;
+; VLDEP-LABEL: splice_left_nxv2i32_variable_offset:
+; VLDEP:       # %bb.0:
+; VLDEP-NEXT:    csrr a1, vlenb
+; VLDEP-NEXT:    srli a1, a1, 2
+; VLDEP-NEXT:    sub a1, a1, a0
+; VLDEP-NEXT:    vsetvli zero, a1, e32, m1, ta, ma
+; VLDEP-NEXT:    vslidedown.vx v8, v8, a0
+; VLDEP-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; VLDEP-NEXT:    vslideup.vx v8, v9, a1
+; VLDEP-NEXT:    ret
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 %offset)
+  ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @splice_right_nxv2i32_variable_offset(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 zeroext %offset) {
+; NOVLDEP-LABEL: splice_right_nxv2i32_variable_offset:
+; NOVLDEP:       # %bb.0:
+; NOVLDEP-NEXT:    csrr a1, vlenb
+; NOVLDEP-NEXT:    srli a1, a1, 2
+; NOVLDEP-NEXT:    sub a1, a1, a0
+; NOVLDEP-NEXT:    vsetvli a2, zero, e32, m1, ta, ma
+; NOVLDEP-NEXT:    vslidedown.vx v8, v8, a1
+; NOVLDEP-NEXT:    vslideup.vx v8, v9, a0
+; NOVLDEP-NEXT:    ret
+;
+; VLDEP-LABEL: splice_right_nxv2i32_variable_offset:
+; VLDEP:       # %bb.0:
+; VLDEP-NEXT:    csrr a1, vlenb
+; VLDEP-NEXT:    srli a1, a1, 2
+; VLDEP-NEXT:    sub a1, a1, a0
+; VLDEP-NEXT:    vsetvli zero, a0, e32, m1, ta, ma
+; VLDEP-NEXT:    vslidedown.vx v8, v8, a1
+; VLDEP-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
+; VLDEP-NEXT:    vslideup.vx v8, v9, a0
+; VLDEP-NEXT:    ret
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 %offset)
+  ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @splice_left_nxv2i32_variable_offset_slidedown(<vscale x 2 x i32> %a, i32 zeroext %offset) {
+; NOVLDEP-LABEL: splice_left_nxv2i32_variable_offset_slidedown:
+; NOVLDEP:       # %bb.0:
+; NOVLDEP-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
+; NOVLDEP-NEXT:    vslidedown.vx v8, v8, a0
+; NOVLDEP-NEXT:    ret
+;
+; VLDEP-LABEL: splice_left_nxv2i32_variable_offset_slidedown:
+; VLDEP:       # %bb.0:
+; VLDEP-NEXT:    csrr a1, vlenb
+; VLDEP-NEXT:    srli a1, a1, 2
+; VLDEP-NEXT:    sub a1, a1, a0
+; VLDEP-NEXT:    vsetvli zero, a1, e32, m1, ta, ma
+; VLDEP-NEXT:    vslidedown.vx v8, v8, a0
+; VLDEP-NEXT:    ret
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> poison, i32 %offset)
+  ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @splice_right_nxv2i32_variable_offset_slideup(<vscale x 2 x i32> %a, i32 zeroext %offset) {
+; CHECK-LABEL: splice_right_nxv2i32_variable_offset_slideup:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vslideup.vx v9, v8, a0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> poison, <vscale x 2 x i32> %a, i32 %offset)
+  ret <vscale x 2 x i32> %res
+}
diff --git a/llvm/test/Transforms/InstSimplify/vector-splice.ll b/llvm/test/Transforms/InstSimplify/vector-splice.ll
new file mode 100644
index 0000000000000..827e5c9ec4838
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/vector-splice.ll
@@ -0,0 +1,94 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -p instsimplify -S | FileCheck %s
+
+define <2 x i32> @left_undef_offset(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @left_undef_offset(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    ret <2 x i32> poison
+;
+  %res = call <2 x i32> @llvm.vector.splice.left(<2 x i32> %a, <2 x i32> %b, i32 undef)
+  ret <2 x i32> %res
+}
+
+define <2 x i32> @right_undef_offset(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @right_undef_offset(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    ret <2 x i32> poison
+;
+  %res = call <2 x i32> @llvm.vector.splice.right(<2 x i32> %a, <2 x i32> %b, i32 undef)
+  ret <2 x i32> %res
+}
+
+define <2 x i32> @left_out_of_bounds(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @left_out_of_bounds(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    ret <2 x i32> poison
+;
+  %res = call <2 x i32> @llvm.vector.splice.left(<2 x i32> %a, <2 x i32> %b, i32 3)
+  ret <2 x i32> %res
+}
+
+define <2 x i32> @right_out_of_bounds(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @right_out_of_bounds(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    ret <2 x i32> poison
+;
+  %res = call <2 x i32> @llvm.vector.splice.right(<2 x i32> %a, <2 x i32> %b, i32 3)
+  ret <2 x i32> %res
+}
+
+define <vscale x 2 x i32> @left_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) vscale_range(1, 1) {
+; CHECK-LABEL: define <vscale x 2 x i32> @left_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    ret <vscale x 2 x i32> poison
+;
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+  ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @left_not_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) {
+; CHECK-LABEL: define <vscale x 2 x i32> @left_not_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    [[RES:%.*]] = call <vscale x 2 x i32> @llvm.vector.splice.left.nxv2i32(<vscale x 2 x i32> [[A]], <vscale x 2 x i32> [[B]], i32 3)
+; CHECK-NEXT:    ret <vscale x 2 x i32> [[RES]]
+;
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.left(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+  ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @right_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) vscale_range(1, 1) {
+; CHECK-LABEL: define <vscale x 2 x i32> @right_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret <vscale x 2 x i32> poison
+;
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+  ret <vscale x 2 x i32> %res
+}
+
+define <vscale x 2 x i32> @right_not_out_of_bounds_scalable(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) {
+; CHECK-LABEL: define <vscale x 2 x i32> @right_not_out_of_bounds_scalable(
+; CHECK-SAME: <vscale x 2 x i32> [[A:%.*]], <vscale x 2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    [[RES:%.*]] = call <vscale x 2 x i32> @llvm.vector.splice.right.nxv2i32(<vscale x 2 x i32> [[A]], <vscale x 2 x i32> [[B]], i32 3)
+; CHECK-NEXT:    ret <vscale x 2 x i32> [[RES]]
+;
+  %res = call <vscale x 2 x i32> @llvm.vector.splice.right(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b, i32 3)
+  ret <vscale x 2 x i32> %res
+}
+
+define <2 x i32> @left_offset_0(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @left_offset_0(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    ret <2 x i32> [[A]]
+;
+  %res = call <2 x i32> @llvm.vector.splice.left(<2 x i32> %a, <2 x i32> %b, i32 0)
+  ret <2 x i32> %res
+}
+
+define <2 x i32> @right_offset_0(<2 x i32> %a, <2 x i32> %b) {
+; CHECK-LABEL: define <2 x i32> @right_offset_0(
+; CHECK-SAME: <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]]) {
+; CHECK-NEXT:    ret <2 x i32> [[B]]
+;
+  %res = call <2 x i32> @llvm.vector.splice.right(<2 x i32> %a, <2 x i32> %b, i32 0)
+  ret <2 x i32> %res
+}
diff --git a/llvm/test/Verifier/invalid-splice.ll b/llvm/test/Verifier/invalid-splice.ll
deleted file mode 100644
index d921e4a5c7a78..0000000000000
--- a/llvm/test/Verifier/invalid-splice.ll
+++ /dev/null
@@ -1,37 +0,0 @@
-; RUN: not opt -passes=verify -S < %s 2>&1 >/dev/null | FileCheck %s
-
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
-define <2 x double> @splice_v2f64_idx_neg3(<2 x double> %a, <2 x double> %b) #0 {
-  %res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 -3)
-  ret <2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
-define <vscale x 2 x double> @splice_nxv2f64_idx_neg3_vscale_min1(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
-  %res = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 -3)
-  ret <vscale x 2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
-define <vscale x 2 x double> @splice_nxv2f64_idx_neg5_vscale_min2(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1 {
-  %res = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 -5)
-  ret <vscale x 2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL-1] where VL is the known minimum number of elements in the vector
-define <2 x double> @splice_v2f64_idx2(<2 x double> %a, <2 x double> %b) #0 {
-  %res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 2)
-  ret <2 x double> %res
-}
-
-; CHECK: The splice index exceeds the range [0, VL-1] where VL is the known minimum number of elements in the vector
-define <2 x double> @splice_v2f64_idx3(<2 x double> %a, <2 x double> %b) #1 {
-  %res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 3)
-  ret <2 x double> %res
-}
-
-attributes #0 = { vscale_range(1,16) }
-attributes #1 = { vscale_range(2,16) }
-
-declare <2 x double> @llvm.vector.splice.v2f64(<2 x double>, <2 x double>, i32)
-declare <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, i32)

>From b1bdeb120a53159771f5ae50df9031dd3eb9db4a Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Wed, 7 Jan 2026 14:39:07 +0800
Subject: [PATCH 2/3] Update autoupgrader test

---
 llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
index dd571021efa44..590abb1792431 100644
--- a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
+++ b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
@@ -26,7 +26,7 @@ define <vscale x 8 x half> @splice_scalable(<vscale x 8 x half> %a, <vscale x 8
 }
 
 declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32 immarg)
-; CHECK: declare <8 x half> @llvm.vector.splice.left.v8f16(<8 x half>, <8 x half>, i32 immarg)
+; CHECK: declare <8 x half> @llvm.vector.splice.left.v8f16(<8 x half>, <8 x half>, i32)
 
 declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
-; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
+; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32)

>From b5c7edd4afc505a27f38070d3c40742709aa2e0c Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Wed, 7 Jan 2026 15:51:05 +0800
Subject: [PATCH 3/3] Extend Offset to ptr type

---
 llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index cf0a13473bc56..be358e9d6cef1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -12033,6 +12033,7 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
   // NOTE: TrailingBytes must be clamped so as not to read outside of V1:V2.
   SDValue EltByteSize =
       DAG.getTypeSize(DL, PtrVT, VT.getVectorElementType().getStoreSize());
+  Offset = DAG.getZExtOrTrunc(Offset, DL, PtrVT);
   SDValue TrailingBytes = DAG.getNode(ISD::MUL, DL, PtrVT, Offset, EltByteSize);
 
   TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
