[llvm] [IR] Split vector.splice into vector.splice.left and vector.splice.right (PR #170796)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 07:31:26 PST 2025
https://github.com/lukel97 updated https://github.com/llvm/llvm-project/pull/170796
>From 9567976529f230892be50a6459dd215da6e4acef Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Fri, 5 Dec 2025 10:14:37 +0800
Subject: [PATCH 01/12] [IR] Split vector.splice into vector.splice.down and
vector.splice.up
---
llvm/docs/LangRef.rst | 91 ++++-
.../llvm/Analysis/TargetTransformInfo.h | 1 +
llvm/include/llvm/CodeGen/BasicTTIImpl.h | 11 +-
llvm/include/llvm/CodeGen/ISDOpcodes.h | 21 +-
llvm/include/llvm/IR/Intrinsics.td | 16 +-
.../include/llvm/Target/TargetSelectionDAG.td | 3 +-
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | 8 +-
.../SelectionDAG/LegalizeIntegerTypes.cpp | 8 +-
.../SelectionDAG/LegalizeVectorTypes.cpp | 3 +-
.../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 9 +-
.../SelectionDAG/SelectionDAGBuilder.cpp | 17 +-
.../SelectionDAG/SelectionDAGDumper.cpp | 3 +-
.../CodeGen/SelectionDAG/TargetLowering.cpp | 14 +-
llvm/lib/CodeGen/TargetLoweringBase.cpp | 3 +-
llvm/lib/IR/AutoUpgrade.cpp | 22 +-
llvm/lib/IR/IRBuilder.cpp | 8 +-
llvm/lib/IR/Verifier.cpp | 12 +-
.../Target/AArch64/AArch64ISelLowering.cpp | 37 +-
.../lib/Target/AArch64/AArch64SVEInstrInfo.td | 31 +-
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 32 +-
.../test/Analysis/CostModel/AArch64/splice.ll | 56 +--
.../CostModel/AArch64/sve-intrinsics.ll | 366 +++++++++---------
.../Analysis/CostModel/RISCV/rvv-shuffle.ll | 84 ++--
llvm/test/Analysis/CostModel/RISCV/splice.ll | 224 +++++------
.../test/Assembler/auto_upgrade_intrinsics.ll | 11 +
.../AArch64/first-order-recurrence.ll | 6 +-
.../AArch64/reduction-recurrence-costs-sve.ll | 12 +-
.../AArch64/sve-interleaved-accesses.ll | 2 +-
.../AArch64/sve-tail-folding-option.ll | 14 +-
.../tail-folding-fixed-order-recurrence.ll | 16 +-
.../first-order-recurrence-scalable-vf1.ll | 4 +-
.../scalable-first-order-recurrence.ll | 24 +-
32 files changed, 647 insertions(+), 522 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 656b930e1f5ca..2f38971c1e39c 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20724,7 +20724,7 @@ Arguments:
All arguments must be vectors of the same type whereby their logical
concatenation matches the result type.
-'``llvm.vector.splice``' Intrinsic
+'``llvm.vector.splice.down``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -20733,21 +20733,70 @@ This is an overloaded intrinsic.
::
- declare <2 x double> @llvm.vector.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
- declare <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+ declare <2 x double> @llvm.vector.splice.down.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
+ declare <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
Overview:
"""""""""
-The '``llvm.vector.splice.*``' intrinsics construct a vector by
-concatenating elements from the first input vector with elements of the second
-input vector, returning a vector of the same type as the input vectors. The
-signed immediate, modulo the number of elements in the vector, is the index
-into the first vector from which to extract the result value. This means
-conceptually that for a positive immediate, a vector is extracted from
-``concat(%vec1, %vec2)`` starting at index ``imm``, whereas for a negative
-immediate, it extracts ``-imm`` trailing elements from the first vector, and
-the remaining elements from ``%vec2``.
+The '``llvm.vector.splice.down.*``' intrinsics construct a vector by
+concatenating two vectors together, shifting the elements down by ``imm``, and
+extracting the lower half.
+
+This is equivalent to :ref:`llvm.fshr.* <int_fshr>`, but operating on elements
+instead of bits.
+
+These intrinsics work for both fixed and scalable vectors. While this intrinsic
+supports all vector types the recommended way to express this operation for
+fixed-width vectors is still to use a shufflevector, as that may allow for more
+optimization opportunities.
+
+For example:
+
+.. code-block:: text
+
+ llvm.vector.splice.down(<A,B,C,D>, <E,F,G,H>, 1);
+ ==> <A,B,C,D,E,F,G,H>
+ ==> <B,C,D,E,F,G,H,_>
+ ==> <B,C,D,E>
+
+
+Arguments:
+""""""""""
+
+The first two operands are vectors with the same type. The start index is imm
+modulo the runtime number of elements in the source vector. For a fixed-width
+vector <N x eltty>, imm is an unsigned integer constant in the range
+0 <= imm <= N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
+integer constant in the range 0 <= imm <= X where X=vscale_range_min * N.
+
+Semantics:
+""""""""""
+
+For a scalable vector, if the value of ``imm`` exceeds the runtime length of the
+source vector type, the result is a :ref:`poison value <poisonvalues>`.
+
+'``llvm.vector.splice.up``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <2 x double> @llvm.vector.splice.up.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
+ declare <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+
+Overview:
+"""""""""
+
+The '``llvm.vector.splice.up.*``' intrinsics construct a vector by
+concatenating two vectors together, shifting the elements up by ``imm``, and
+extracting the upper half.
+
+This is equivalent to :ref:`llvm.fshr.* <int_fshl>`, but operating on elements instead
+of bits.
These intrinsics work for both fixed and scalable vectors. While this intrinsic
supports all vector types the recommended way to express this operation for
@@ -20758,8 +20807,10 @@ For example:
.. code-block:: text
- llvm.vector.splice(<A,B,C,D>, <E,F,G,H>, 1); ==> <B, C, D, E> index
- llvm.vector.splice(<A,B,C,D>, <E,F,G,H>, -3); ==> <B, C, D, E> trailing elements
+ llvm.vector.splice.up(<A,B,C,D>, <E,F,G,H>, 1);
+ ==> <A,B,C,D,E,F,G,H>
+ ==> <_,A,B,C,D,E,F,G>
+ ==> <D,E,F,G>
Arguments:
@@ -20767,9 +20818,15 @@ Arguments:
The first two operands are vectors with the same type. The start index is imm
modulo the runtime number of elements in the source vector. For a fixed-width
-vector <N x eltty>, imm is a signed integer constant in the range
--N <= imm < N. For a scalable vector <vscale x N x eltty>, imm is a signed
-integer constant in the range -X <= imm < X where X=vscale_range_min * N.
+vector <N x eltty>, imm is an unsigned integer constant in the range
+0 <= imm <= N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
+integer constant in the range 0 <= imm <= X where X=vscale_range_min * N.
+
+Semantics:
+""""""""""
+
+For a scalable vector, if the value of ``imm`` exceeds the runtime length of the
+source vector type, the result is a :ref:`poison value <poisonvalues>`.
'``llvm.stepvector``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 99525607f744a..4be5ce9c3e653 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1211,6 +1211,7 @@ class TargetTransformInfo {
///< with any shuffle mask.
SK_PermuteSingleSrc, ///< Shuffle elements of single source vector with any
///< shuffle mask.
+ // TODO: Split into SK_SpliceDown + SK_SpliceUp
SK_Splice ///< Concatenates elements from the first input vector
///< with elements of the second input vector. Returning
///< a vector of the same type as the input vectors.
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 494199835a19c..43f9008edf6e1 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2003,11 +2003,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
cast<VectorType>(Args[0]->getType()), {}, CostKind, Index,
cast<VectorType>(Args[1]->getType()));
}
- case Intrinsic::vector_splice: {
+ case Intrinsic::vector_splice_down:
+ case Intrinsic::vector_splice_up: {
unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
- return thisT()->getShuffleCost(TTI::SK_Splice, cast<VectorType>(RetTy),
- cast<VectorType>(Args[0]->getType()), {},
- CostKind, Index, cast<VectorType>(RetTy));
+ return thisT()->getShuffleCost(
+ TTI::SK_Splice, cast<VectorType>(RetTy),
+ cast<VectorType>(Args[0]->getType()), {}, CostKind,
+ IID == Intrinsic::vector_splice_down ? Index : -Index,
+ cast<VectorType>(RetTy));
}
case Intrinsic::vector_reduce_add:
case Intrinsic::vector_reduce_mul:
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index b32f3dacbb3a4..922d9fa79ceed 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -641,17 +641,16 @@ enum NodeType {
/// in terms of the element size of VEC1/VEC2, not in terms of bytes.
VECTOR_SHUFFLE,
- /// VECTOR_SPLICE(VEC1, VEC2, IMM) - Returns a subvector of the same type as
- /// VEC1/VEC2 from CONCAT_VECTORS(VEC1, VEC2), based on the IMM in two ways.
- /// Let the result type be T, if IMM is positive it represents the starting
- /// element number (an index) from which a subvector of type T is extracted
- /// from CONCAT_VECTORS(VEC1, VEC2). If IMM is negative it represents a count
- /// specifying the number of trailing elements to extract from VEC1, where the
- /// elements of T are selected using the following algorithm:
- /// RESULT[i] = CONCAT_VECTORS(VEC1,VEC2)[VEC1.ElementCount - ABS(IMM) + i]
- /// If IMM is not in the range [-VL, VL-1] the result vector is undefined. IMM
- /// is a constant integer.
- VECTOR_SPLICE,
+ /// VECTOR_SPLICE_DOWN(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1, VEC2)
+ /// down by OFFSET elements and returns the lower half. If OFFSET is greater
+ /// than the runtime number of elements in the result type the result is
+ /// poison.
+ VECTOR_SPLICE_DOWN,
+ /// VECTOR_SPLICE_UP(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1, VEC2)
+ /// up by OFFSET elements and returns the upper half. If OFFSET is greater
+ /// than the runtime number of elements in the result type the result is
+ /// poison.
+ VECTOR_SPLICE_UP,
/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a
/// scalar value into element 0 of the resultant vector type. The top
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index c3c4718c3548f..1aaa41464b577 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2808,13 +2808,15 @@ def int_vector_reverse : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[IntrNoMem,
IntrSpeculatable]>;
-def int_vector_splice : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
- [LLVMMatchType<0>,
- LLVMMatchType<0>,
- llvm_i32_ty],
- [IntrNoMem,
- IntrSpeculatable,
- ImmArg<ArgIndex<2>>]>;
+def int_vector_splice_down
+ : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
+ [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
+
+def int_vector_splice_up
+ : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
+ [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
//===---------- Intrinsics to query properties of scalable vectors --------===//
def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty],
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index a9750a5ab03f9..abd6d1435d8f6 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -832,7 +832,8 @@ def ist : SDNode<"ISD::STORE" , SDTIStore,
def vector_shuffle : SDNode<"ISD::VECTOR_SHUFFLE", SDTVecShuffle, []>;
def vector_reverse : SDNode<"ISD::VECTOR_REVERSE", SDTVecReverse>;
-def vector_splice : SDNode<"ISD::VECTOR_SPLICE", SDTVecSlice, []>;
+def vector_splice_down : SDNode<"ISD::VECTOR_SPLICE_DOWN", SDTVecSlice, []>;
+def vector_splice_up : SDNode<"ISD::VECTOR_SPLICE_UP", SDTVecSlice, []>;
def build_vector : SDNode<"ISD::BUILD_VECTOR", SDTypeProfile<1, -1, []>, []>;
def splat_vector : SDNode<"ISD::SPLAT_VECTOR", SDTypeProfile<1, 1, []>, []>;
def step_vector : SDNode<"ISD::STEP_VECTOR", SDTypeProfile<1, 1,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index e739659d68561..8a1e59715334a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3706,7 +3706,8 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Tmp1);
break;
}
- case ISD::VECTOR_SPLICE: {
+ case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_UP: {
Results.push_back(TLI.expandVectorSplice(Node, DAG));
break;
}
@@ -5640,10 +5641,11 @@ void SelectionDAGLegalize::PromoteNode(SDNode *Node) {
Results.push_back(Tmp1);
break;
}
- case ISD::VECTOR_SPLICE: {
+ case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_UP: {
Tmp1 = DAG.getNode(ISD::ANY_EXTEND, dl, NVT, Node->getOperand(0));
Tmp2 = DAG.getNode(ISD::ANY_EXTEND, dl, NVT, Node->getOperand(1));
- Tmp3 = DAG.getNode(ISD::VECTOR_SPLICE, dl, NVT, Tmp1, Tmp2,
+ Tmp3 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2,
Node->getOperand(2));
Results.push_back(DAG.getNode(ISD::TRUNCATE, dl, OVT, Tmp3));
break;
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 17933ab8a81f6..17d968db270b8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -132,8 +132,10 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VECTOR_REVERSE(N); break;
case ISD::VECTOR_SHUFFLE:
Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;
- case ISD::VECTOR_SPLICE:
- Res = PromoteIntRes_VECTOR_SPLICE(N); break;
+ case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_UP:
+ Res = PromoteIntRes_VECTOR_SPLICE(N);
+ break;
case ISD::VECTOR_INTERLEAVE:
case ISD::VECTOR_DEINTERLEAVE:
Res = PromoteIntRes_VECTOR_INTERLEAVE_DEINTERLEAVE(N);
@@ -6001,7 +6003,7 @@ SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SPLICE(SDNode *N) {
SDValue V1 = GetPromotedInteger(N->getOperand(1));
EVT OutVT = V0.getValueType();
- return DAG.getNode(ISD::VECTOR_SPLICE, dl, OutVT, V0, V1, N->getOperand(2));
+ return DAG.getNode(N->getOpcode(), dl, OutVT, V0, V1, N->getOperand(2));
}
SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_INTERLEAVE_DEINTERLEAVE(SDNode *N) {
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 6e1e02f38113e..8a3d965ccce72 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1258,7 +1258,8 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::VECTOR_SHUFFLE:
SplitVecRes_VECTOR_SHUFFLE(cast<ShuffleVectorSDNode>(N), Lo, Hi);
break;
- case ISD::VECTOR_SPLICE:
+ case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_UP:
SplitVecRes_VECTOR_SPLICE(N, Lo, Hi);
break;
case ISD::VECTOR_DEINTERLEAVE:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index b009e6a3d5f5f..0abab934f6392 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8179,11 +8179,14 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
break;
case ISD::VECTOR_SHUFFLE:
llvm_unreachable("should use getVectorShuffle constructor!");
- case ISD::VECTOR_SPLICE: {
- if (cast<ConstantSDNode>(N3)->isZero())
+ case ISD::VECTOR_SPLICE_DOWN:
+ if (isNullConstant(N3))
return N1;
break;
- }
+ case ISD::VECTOR_SPLICE_UP:
+ if (isNullConstant(N3))
+ return N2;
+ break;
case ISD::INSERT_VECTOR_ELT: {
assert(VT.isVector() && VT == N1.getValueType() &&
"INSERT_VECTOR_ELT vector type mismatch");
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 09a0673bfe1bb..025a1ce33ce67 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8355,7 +8355,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::vector_reverse:
visitVectorReverse(I);
return;
- case Intrinsic::vector_splice:
+ case Intrinsic::vector_splice_down:
+ case Intrinsic::vector_splice_up:
visitVectorSplice(I);
return;
case Intrinsic::callbr_landingpad:
@@ -12891,20 +12892,22 @@ void SelectionDAGBuilder::visitVectorSplice(const CallInst &I) {
SDLoc DL = getCurSDLoc();
SDValue V1 = getValue(I.getOperand(0));
SDValue V2 = getValue(I.getOperand(1));
- int64_t Imm = cast<ConstantInt>(I.getOperand(2))->getSExtValue();
+ uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getSExtValue();
+ const bool IsDown = I.getIntrinsicID() == Intrinsic::vector_splice_down;
// VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
if (VT.isScalableVector()) {
- setValue(
- &I, DAG.getNode(ISD::VECTOR_SPLICE, DL, VT, V1, V2,
- DAG.getSignedConstant(
- Imm, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))));
+ setValue(&I, DAG.getNode(
+ IsDown ? ISD::VECTOR_SPLICE_DOWN : ISD::VECTOR_SPLICE_UP,
+ DL, VT, V1, V2,
+ DAG.getConstant(Imm, DL,
+ TLI.getVectorIdxTy(DAG.getDataLayout()))));
return;
}
unsigned NumElts = VT.getVectorNumElements();
- uint64_t Idx = (NumElts + Imm) % NumElts;
+ uint64_t Idx = (NumElts + (IsDown ? Imm : -Imm)) % NumElts;
// Use VECTOR_SHUFFLE to maintain original behaviour for fixed-length vectors.
SmallVector<int, 8> Mask;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index ec5edd5f13978..8d7f202c41947 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -348,7 +348,8 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::VECTOR_INTERLEAVE: return "vector_interleave";
case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";
case ISD::VECTOR_SHUFFLE: return "vector_shuffle";
- case ISD::VECTOR_SPLICE: return "vector_splice";
+ case ISD::VECTOR_SPLICE_DOWN: return "vector_splice_down";
+ case ISD::VECTOR_SPLICE_UP: return "vector_splice_up";
case ISD::SPLAT_VECTOR: return "splat_vector";
case ISD::SPLAT_VECTOR_PARTS: return "splat_vector_parts";
case ISD::VECTOR_REVERSE: return "vector_reverse";
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 1183f562c274d..062682c5a68ef 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -11903,14 +11903,16 @@ SDValue TargetLowering::expandFP_ROUND(SDNode *Node, SelectionDAG &DAG) const {
SDValue TargetLowering::expandVectorSplice(SDNode *Node,
SelectionDAG &DAG) const {
- assert(Node->getOpcode() == ISD::VECTOR_SPLICE && "Unexpected opcode!");
+ assert((Node->getOpcode() == ISD::VECTOR_SPLICE_DOWN ||
+ Node->getOpcode() == ISD::VECTOR_SPLICE_UP) &&
+ "Unexpected opcode!");
assert(Node->getValueType(0).isScalableVector() &&
"Fixed length vector types expected to use SHUFFLE_VECTOR!");
EVT VT = Node->getValueType(0);
SDValue V1 = Node->getOperand(0);
SDValue V2 = Node->getOperand(1);
- int64_t Imm = cast<ConstantSDNode>(Node->getOperand(2))->getSExtValue();
+ uint64_t Imm = Node->getConstantOperandVal(2);
SDLoc DL(Node);
// Expand through memory thusly:
@@ -11941,7 +11943,7 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
SDValue StackPtr2 = DAG.getNode(ISD::ADD, DL, PtrVT, StackPtr, VTBytes);
SDValue StoreV2 = DAG.getStore(StoreV1, DL, V2, StackPtr2, PtrInfo);
- if (Imm >= 0) {
+ if (Node->getOpcode() == ISD::VECTOR_SPLICE_DOWN) {
// Load back the required element. getVectorElementPointer takes care of
// clamping the index if it's out-of-bounds.
StackPtr = getVectorElementPointer(DAG, StackPtr, VT, Node->getOperand(2));
@@ -11950,14 +11952,12 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
MachinePointerInfo::getUnknownStack(MF));
}
- uint64_t TrailingElts = -Imm;
-
// NOTE: TrailingElts must be clamped so as not to read outside of V1:V2.
TypeSize EltByteSize = VT.getVectorElementType().getStoreSize();
SDValue TrailingBytes =
- DAG.getConstant(TrailingElts * EltByteSize, DL, PtrVT);
+ DAG.getConstant(Imm * EltByteSize, DL, PtrVT);
- if (TrailingElts > VT.getVectorMinNumElements())
+ if (Imm > VT.getVectorMinNumElements())
TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
// Calculate the start address of the spliced result.
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 1d674b283db15..1c0150848e97d 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -957,7 +957,8 @@ void TargetLoweringBase::initActions() {
VT, Expand);
// Named vector shuffles default to expand.
- setOperationAction(ISD::VECTOR_SPLICE, VT, Expand);
+ setOperationAction({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP}, VT,
+ Expand);
// Only some target support this vector operation. Most need to expand it.
setOperationAction(ISD::VECTOR_COMPRESS, VT, Expand);
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 487db134b0df3..a22ea555cbcc5 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1299,7 +1299,6 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
.StartsWith("extract.last.active.", Intrinsic::not_intrinsic)
.StartsWith("extract.", Intrinsic::vector_extract)
.StartsWith("insert.", Intrinsic::vector_insert)
- .StartsWith("splice.", Intrinsic::vector_splice)
.StartsWith("reverse.", Intrinsic::vector_reverse)
.StartsWith("interleave2.", Intrinsic::vector_interleave2)
.StartsWith("deinterleave2.", Intrinsic::vector_deinterleave2)
@@ -1666,6 +1665,11 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
{{F->arg_begin()->getType(), F->getArg(1)->getType()}});
return true;
}
+ if (Name.consume_front("vector.splice")) {
+ if (Name.starts_with(".down") || Name.starts_with(".up"))
+ break;
+ return true;
+ }
break;
}
@@ -4673,6 +4677,18 @@ static void upgradeDbgIntrinsicToDbgRecord(StringRef Name, CallBase *CI) {
CI->getParent()->insertDbgRecordBefore(DR, CI->getIterator());
}
+static Value *upgradeVectorSplice(CallBase *CI, IRBuilder<> &Builder) {
+ auto *Offset = dyn_cast<ConstantInt>(CI->getArgOperand(2));
+ if (!Offset)
+ reportFatalUsageError("Invalid llvm.vector.splice offset argument");
+ int64_t OffsetVal = Offset->getSExtValue();
+ return Builder.CreateIntrinsic(OffsetVal >= 0 ? Intrinsic::vector_splice_down
+ : Intrinsic::vector_splice_up,
+ CI->getType(),
+ {CI->getArgOperand(0), CI->getArgOperand(1),
+ Builder.getInt32(std::abs(OffsetVal))});
+}
+
/// Upgrade a call to an old intrinsic. All argument and return casting must be
/// provided to seamlessly integrate with existing context.
void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
@@ -4700,6 +4716,8 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
bool IsARM = Name.consume_front("arm.");
bool IsAMDGCN = Name.consume_front("amdgcn.");
bool IsDbg = Name.consume_front("dbg.");
+ bool IsOldSplice = Name.consume_front("vector.splice") &&
+ !(Name.starts_with(".down") || Name.starts_with(".up"));
Value *Rep = nullptr;
if (!IsX86 && Name == "stackprotectorcheck") {
@@ -4716,6 +4734,8 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
Rep = upgradeAMDGCNIntrinsicCall(Name, CI, F, Builder);
} else if (IsDbg) {
upgradeDbgIntrinsicToDbgRecord(Name, CI);
+ } else if (IsOldSplice) {
+ Rep = upgradeVectorSplice(CI, Builder);
} else {
llvm_unreachable("Unknown function for CallBase upgrade.");
}
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 8e1707ac98a51..5faf1f4fdfe14 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -1102,10 +1102,12 @@ Value *IRBuilderBase::CreateVectorSplice(Value *V1, Value *V2, int64_t Imm,
if (auto *VTy = dyn_cast<ScalableVectorType>(V1->getType())) {
Module *M = BB->getParent()->getParent();
- Function *F =
- Intrinsic::getOrInsertDeclaration(M, Intrinsic::vector_splice, VTy);
+ Function *F = Intrinsic::getOrInsertDeclaration(
+ M,
+ Imm >= 0 ? Intrinsic::vector_splice_down : Intrinsic::vector_splice_up,
+ VTy);
- Value *Ops[] = {V1, V2, getInt32(Imm)};
+ Value *Ops[] = {V1, V2, getInt32(std::abs(Imm))};
return Insert(CallInst::Create(F, Ops), Name);
}
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 439b3859fd3ac..3993616fb66eb 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6566,18 +6566,18 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
break;
}
- case Intrinsic::vector_splice: {
+ case Intrinsic::vector_splice_down:
+ case Intrinsic::vector_splice_up: {
VectorType *VecTy = cast<VectorType>(Call.getType());
- int64_t Idx = cast<ConstantInt>(Call.getArgOperand(2))->getSExtValue();
- int64_t KnownMinNumElements = VecTy->getElementCount().getKnownMinValue();
+ uint64_t Idx = cast<ConstantInt>(Call.getArgOperand(2))->getZExtValue();
+ uint64_t KnownMinNumElements = VecTy->getElementCount().getKnownMinValue();
if (Call.getParent() && Call.getParent()->getParent()) {
AttributeList Attrs = Call.getParent()->getParent()->getAttributes();
if (Attrs.hasFnAttr(Attribute::VScaleRange))
KnownMinNumElements *= Attrs.getFnAttrs().getVScaleRangeMin();
}
- Check((Idx < 0 && std::abs(Idx) <= KnownMinNumElements) ||
- (Idx >= 0 && Idx < KnownMinNumElements),
- "The splice index exceeds the range [-VL, VL-1] where VL is the "
+ Check(Idx <= KnownMinNumElements,
+ "The splice index exceeds the range [0, VL] where VL is the "
"known minimum number of elements in the vector. For scalable "
"vectors the minimum number of elements is determined from "
"vscale_range.",
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7a15d7b75f1b9..2df2705aa4197 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1543,7 +1543,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::MULHS, VT, Custom);
setOperationAction(ISD::MULHU, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Legal);
- setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::SETCC, VT, Custom);
setOperationAction(ISD::SDIV, VT, Custom);
@@ -1731,7 +1732,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECREDUCE_FMAXIMUM, VT, Custom);
setOperationAction(ISD::VECREDUCE_FMINIMUM, VT, Custom);
setOperationAction(ISD::VECREDUCE_FMUL, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Custom);
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
@@ -1785,7 +1787,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SPLAT_VECTOR, VT, Legal);
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Custom);
}
if (Subtarget->hasSVEB16B16() &&
@@ -1911,10 +1914,14 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
}
- setOperationPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv2i1, MVT::nxv2i64);
- setOperationPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv4i1, MVT::nxv4i32);
- setOperationPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv8i1, MVT::nxv8i16);
- setOperationPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv16i1, MVT::nxv16i8);
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ MVT::nxv2i1, MVT::nxv2i64);
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ MVT::nxv4i1, MVT::nxv4i32);
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ MVT::nxv8i1, MVT::nxv8i16);
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ MVT::nxv16i1, MVT::nxv16i8);
setOperationAction(ISD::VSCALE, MVT::i32, Custom);
@@ -2421,7 +2428,8 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
setOperationAction(ISD::VECREDUCE_UMIN, VT, Default);
setOperationAction(ISD::VECREDUCE_XOR, VT, Default);
setOperationAction(ISD::VECTOR_SHUFFLE, VT, Default);
- setOperationAction(ISD::VECTOR_SPLICE, VT, Default);
+ setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Default);
+ setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Default);
setOperationAction(ISD::VSELECT, VT, Default);
setOperationAction(ISD::XOR, VT, Default);
setOperationAction(ISD::ZERO_EXTEND, VT, Default);
@@ -8080,7 +8088,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);
case ISD::CTTZ:
return LowerCTTZ(Op, DAG);
- case ISD::VECTOR_SPLICE:
+ case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_UP:
return LowerVECTOR_SPLICE(Op, DAG);
case ISD::VECTOR_DEINTERLEAVE:
return LowerVECTOR_DEINTERLEAVE(Op, DAG);
@@ -12280,8 +12289,8 @@ SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
// there are enough elements in the vector, hence we check the index <= min
// number of elements.
std::optional<unsigned> PredPattern;
- if (Ty.isScalableVector() && IdxVal < 0 &&
- (PredPattern = getSVEPredPatternFromNumElements(std::abs(IdxVal))) !=
+ if (Ty.isScalableVector() && Op.getOpcode() == ISD::VECTOR_SPLICE_UP &&
+ (PredPattern = getSVEPredPatternFromNumElements(IdxVal)) !=
std::nullopt) {
SDLoc DL(Op);
@@ -12297,7 +12306,8 @@ SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
// We can select to an EXT instruction when indexing the first 256 bytes.
unsigned BlockSize = AArch64::SVEBitsPerBlock / Ty.getVectorMinNumElements();
- if (IdxVal >= 0 && (IdxVal * BlockSize / 8) < 256)
+ if (Op.getOpcode() == ISD::VECTOR_SPLICE_DOWN &&
+ (IdxVal * BlockSize / 8) < 256)
return Op;
return SDValue();
@@ -16398,7 +16408,8 @@ SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,
assert(InVT.isScalableVector() && "Unexpected vector type!");
// Move requested subvector to the start of the vector and try again.
- SDValue Splice = DAG.getNode(ISD::VECTOR_SPLICE, DL, InVT, Vec, Vec, Idx);
+ SDValue Splice =
+ DAG.getNode(ISD::VECTOR_SPLICE_DOWN, DL, InVT, Vec, Vec, Idx);
return convertFromScalableVector(DAG, VT, Splice);
}
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index bfa4ce6da212b..6908d0cb476fe 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -2152,50 +2152,51 @@ let Predicates = [HasSVE_or_SME] in {
def : Pat<(nxv8bf16 (concat_vectors nxv4bf16:$v1, nxv4bf16:$v2)),
(UZP1_ZZZ_H $v1, $v2)>;
- // Splice with lane equal to -1
- def : Pat<(nxv16i8 (vector_splice nxv16i8:$Z1, nxv16i8:$Z2, (i64 -1))),
+ // Splice up with offset equal to 1
+ def : Pat<(nxv16i8 (vector_splice_up nxv16i8:$Z1, nxv16i8:$Z2, (i64 1))),
(INSR_ZV_B ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_B (PTRUE_B 31), ZPR:$Z1), bsub))>;
- def : Pat<(nxv8i16 (vector_splice nxv8i16:$Z1, nxv8i16:$Z2, (i64 -1))),
+ def : Pat<(nxv8i16 (vector_splice_up nxv8i16:$Z1, nxv8i16:$Z2, (i64 1))),
(INSR_ZV_H ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_H (PTRUE_H 31), ZPR:$Z1), hsub))>;
- def : Pat<(nxv4i32 (vector_splice nxv4i32:$Z1, nxv4i32:$Z2, (i64 -1))),
+ def : Pat<(nxv4i32 (vector_splice_up nxv4i32:$Z1, nxv4i32:$Z2, (i64 1))),
(INSR_ZV_S ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_S (PTRUE_S 31), ZPR:$Z1), ssub))>;
- def : Pat<(nxv2i64 (vector_splice nxv2i64:$Z1, nxv2i64:$Z2, (i64 -1))),
+ def : Pat<(nxv2i64 (vector_splice_up nxv2i64:$Z1, nxv2i64:$Z2, (i64 1))),
(INSR_ZV_D ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_D (PTRUE_D 31), ZPR:$Z1), dsub))>;
- // Splice with lane bigger or equal to 0
+ // Splice down
foreach VT = [nxv16i8] in {
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index)))),
+ def : Pat<(VT(vector_splice_down VT:$Z1, VT:$Z2,
+ (i64(sve_ext_imm_0_255 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z1, (i64 (sve_ext_imm_0_255 i32:$index)))),
- (EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
+ def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_255 i32:$index)))),
+ (EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
foreach VT = [nxv8i16, nxv8f16, nxv8bf16] in {
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_127 i32:$index)))),
+ def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_127 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z1, (i64 (sve_ext_imm_0_127 i32:$index)))),
+ def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_127 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
foreach VT = [nxv4i32, nxv4f16, nxv4f32, nxv4bf16] in {
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_63 i32:$index)))),
+ def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_63 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z1, (i64 (sve_ext_imm_0_63 i32:$index)))),
+ def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_63 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
foreach VT = [nxv2i64, nxv2f16, nxv2f32, nxv2f64, nxv2bf16] in {
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_31 i32:$index)))),
+ def : Pat<(VT( vector_splice_down VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_31 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice VT:$Z1, VT:$Z1, (i64 (sve_ext_imm_0_31 i32:$index)))),
+ def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_31 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index b187187f83eb3..2c168c724682c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -910,7 +910,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::EXPERIMENTAL_VP_SPLAT, VT, Custom);
setOperationPromotedToType(
- ISD::VECTOR_SPLICE, VT,
+ {ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP}, VT,
MVT::getVectorVT(MVT::i8, VT.getVectorElementCount()));
}
@@ -998,8 +998,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- // Splice
- setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);
+ setOperationAction({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP}, VT,
+ Custom);
if (Subtarget.hasStdExtZvkb()) {
setOperationAction(ISD::BSWAP, VT, Legal);
@@ -1169,7 +1169,9 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- setOperationAction({ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE}, VT, Custom);
+ setOperationAction(
+ {ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_REVERSE, VT, Custom);
@@ -1215,8 +1217,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction({ISD::INSERT_VECTOR_ELT, ISD::CONCAT_VECTORS,
ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR,
ISD::VECTOR_DEINTERLEAVE, ISD::VECTOR_INTERLEAVE,
- ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE,
- ISD::VECTOR_COMPRESS},
+ ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_DOWN,
+ ISD::VECTOR_SPLICE_UP, ISD::VECTOR_COMPRESS},
VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_REVERSE, VT, Custom);
@@ -1271,7 +1273,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
ISD::CONCAT_VECTORS, ISD::INSERT_SUBVECTOR,
ISD::EXTRACT_SUBVECTOR, ISD::VECTOR_DEINTERLEAVE,
ISD::VECTOR_INTERLEAVE, ISD::VECTOR_REVERSE,
- ISD::VECTOR_SPLICE, ISD::VECTOR_COMPRESS},
+ ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP,
+ ISD::VECTOR_COMPRESS},
VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_REVERSE, VT, Custom);
@@ -8249,7 +8252,8 @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
return lowerSTEP_VECTOR(Op, DAG);
case ISD::VECTOR_REVERSE:
return lowerVECTOR_REVERSE(Op, DAG);
- case ISD::VECTOR_SPLICE:
+ case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_UP:
return lowerVECTOR_SPLICE(Op, DAG);
case ISD::BUILD_VECTOR: {
MVT VT = Op.getSimpleValueType();
@@ -13026,23 +13030,23 @@ SDValue RISCVTargetLowering::lowerVECTOR_SPLICE(SDValue Op,
SDLoc DL(Op);
SDValue V1 = Op.getOperand(0);
SDValue V2 = Op.getOperand(1);
+ SDValue Offset = Op.getOperand(2);
MVT XLenVT = Subtarget.getXLenVT();
MVT VecVT = Op.getSimpleValueType();
SDValue VLMax = computeVLMax(VecVT, DL, DAG);
- int64_t ImmValue = cast<ConstantSDNode>(Op.getOperand(2))->getSExtValue();
SDValue DownOffset, UpOffset;
- if (ImmValue >= 0) {
+ if (Op.getOpcode() == ISD::VECTOR_SPLICE_DOWN) {
// The operand is a TargetConstant, we need to rebuild it as a regular
// constant.
- DownOffset = DAG.getConstant(ImmValue, DL, XLenVT);
- UpOffset = DAG.getNode(ISD::SUB, DL, XLenVT, VLMax, DownOffset);
+ DownOffset = Offset;
+ UpOffset = DAG.getNode(ISD::SUB, DL, XLenVT, VLMax, Offset);
} else {
// The operand is a TargetConstant, we need to rebuild it as a regular
// constant rather than negating the original operand.
- UpOffset = DAG.getConstant(-ImmValue, DL, XLenVT);
- DownOffset = DAG.getNode(ISD::SUB, DL, XLenVT, VLMax, UpOffset);
+ UpOffset = Offset;
+ DownOffset = DAG.getNode(ISD::SUB, DL, XLenVT, VLMax, Offset);
}
SDValue TrueMask = getAllOnesMask(VecVT, VLMax, DL, DAG);
diff --git a/llvm/test/Analysis/CostModel/AArch64/splice.ll b/llvm/test/Analysis/CostModel/AArch64/splice.ll
index 1d76a4838cee5..1667a6c91965c 100644
--- a/llvm/test/Analysis/CostModel/AArch64/splice.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/splice.ll
@@ -5,34 +5,34 @@ target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
define void @vector_splice() #0 {
; CHECK-LABEL: 'vector_splice'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v16i8 = call <16 x i8> @llvm.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v32i8 = call <32 x i8> @llvm.vector.splice.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2i16 = call <2 x i16> @llvm.vector.splice.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v4i16 = call <4 x i16> @llvm.vector.splice.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v8i16 = call <8 x i16> @llvm.vector.splice.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v16i16 = call <16 x i16> @llvm.vector.splice.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v4i32 = call <4 x i32> @llvm.vector.splice.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v8i32 = call <8 x i32> @llvm.vector.splice.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2i64 = call <2 x i64> @llvm.vector.splice.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v4i64 = call <4 x i64> @llvm.vector.splice.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2f16 = call <2 x half> @llvm.vector.splice.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v4f16 = call <4 x half> @llvm.vector.splice.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v8f16 = call <8 x half> @llvm.vector.splice.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v16f16 = call <16 x half> @llvm.vector.splice.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2f32 = call <2 x float> @llvm.vector.splice.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v4f32 = call <4 x float> @llvm.vector.splice.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v8f32 = call <8 x float> @llvm.vector.splice.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2f64 = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v4f64 = call <4 x double> @llvm.vector.splice.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2bf16 = call <2 x bfloat> @llvm.vector.splice.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v4bf16 = call <4 x bfloat> @llvm.vector.splice.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v8bf16 = call <8 x bfloat> @llvm.vector.splice.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.v16bf16 = call <16 x bfloat> @llvm.vector.splice.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v16i1 = call <16 x i1> @llvm.vector.splice.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v8i1 = call <8 x i1> @llvm.vector.splice.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v4i1 = call <4 x i1> @llvm.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice.v2i1 = call <2 x i1> @llvm.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %splice.v2i128 = call <2 x i128> @llvm.vector.splice.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call <16 x i8> @llvm.vector.splice.down.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <32 x i8> @llvm.vector.splice.down.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = call <2 x i16> @llvm.vector.splice.down.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = call <4 x i16> @llvm.vector.splice.down.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = call <8 x i16> @llvm.vector.splice.down.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <16 x i16> @llvm.vector.splice.down.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <4 x i32> @llvm.vector.splice.down.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <8 x i32> @llvm.vector.splice.down.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = call <2 x i64> @llvm.vector.splice.down.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <4 x i64> @llvm.vector.splice.down.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = call <2 x half> @llvm.vector.splice.down.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %12 = call <4 x half> @llvm.vector.splice.down.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <8 x half> @llvm.vector.splice.down.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <16 x half> @llvm.vector.splice.down.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %15 = call <2 x float> @llvm.vector.splice.down.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %16 = call <4 x float> @llvm.vector.splice.down.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <8 x float> @llvm.vector.splice.down.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = call <2 x double> @llvm.vector.splice.down.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <4 x double> @llvm.vector.splice.down.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %20 = call <2 x bfloat> @llvm.vector.splice.down.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = call <4 x bfloat> @llvm.vector.splice.down.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %22 = call <8 x bfloat> @llvm.vector.splice.down.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <16 x bfloat> @llvm.vector.splice.down.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %24 = call <16 x i1> @llvm.vector.splice.down.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %25 = call <8 x i1> @llvm.vector.splice.down.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %26 = call <4 x i1> @llvm.vector.splice.down.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %27 = call <2 x i1> @llvm.vector.splice.down.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %28 = call <2 x i128> @llvm.vector.splice.down.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%splice.v16i8 = call <16 x i8> @llvm.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
index ac654ddaf3808..e222399fa9cb7 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
@@ -617,195 +617,195 @@ declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)
define void @vector_splice() #0 {
; CHECK-VSCALE-1-LABEL: 'vector_splice'
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2f16 = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4f16 = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv8f16 = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv16f16 = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2f32 = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4f32 = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv8f32 = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2f64 = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv4f64 = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i16_neg = call <vscale x 1 x i16> @llvm.vector.splice.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i64_neg = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f16_neg = call <vscale x 1 x half> @llvm.vector.splice.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f32_neg = call <vscale x 1 x float> @llvm.vector.splice.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f64_neg = call <vscale x 1 x double> @llvm.vector.splice.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1bf16_neg = call <vscale x 1 x bfloat> @llvm.vector.splice.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i1_neg = call <vscale x 1 x i1> @llvm.vector.splice.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 -1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %11 = call <vscale x 2 x half> @llvm.vector.splice.down.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %12 = call <vscale x 4 x half> @llvm.vector.splice.down.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %13 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %14 = call <vscale x 16 x half> @llvm.vector.splice.down.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %15 = call <vscale x 2 x float> @llvm.vector.splice.down.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %16 = call <vscale x 4 x float> @llvm.vector.splice.down.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %17 = call <vscale x 8 x float> @llvm.vector.splice.down.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %18 = call <vscale x 2 x double> @llvm.vector.splice.down.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %19 = call <vscale x 4 x double> @llvm.vector.splice.down.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.down.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.down.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.down.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.down.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %41 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %42 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %43 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %44 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %46 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %47 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %48 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %50 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %51 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.up.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.up.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.up.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.up.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.up.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; CHECK-VSCALE-2-LABEL: 'vector_splice'
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2f16 = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4f16 = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv8f16 = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv16f16 = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2f32 = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4f32 = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv8f32 = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2f64 = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv4f64 = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i16_neg = call <vscale x 1 x i16> @llvm.vector.splice.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i64_neg = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f16_neg = call <vscale x 1 x half> @llvm.vector.splice.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f32_neg = call <vscale x 1 x float> @llvm.vector.splice.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f64_neg = call <vscale x 1 x double> @llvm.vector.splice.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1bf16_neg = call <vscale x 1 x bfloat> @llvm.vector.splice.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i1_neg = call <vscale x 1 x i1> @llvm.vector.splice.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 -1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %11 = call <vscale x 2 x half> @llvm.vector.splice.down.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %12 = call <vscale x 4 x half> @llvm.vector.splice.down.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %13 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %14 = call <vscale x 16 x half> @llvm.vector.splice.down.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %15 = call <vscale x 2 x float> @llvm.vector.splice.down.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %16 = call <vscale x 4 x float> @llvm.vector.splice.down.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %17 = call <vscale x 8 x float> @llvm.vector.splice.down.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %18 = call <vscale x 2 x double> @llvm.vector.splice.down.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %19 = call <vscale x 4 x double> @llvm.vector.splice.down.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.down.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.down.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.down.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.down.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %41 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %42 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %43 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %44 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %46 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %47 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %48 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %50 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %51 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.up.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.up.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.up.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.up.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.up.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; TYPE_BASED_ONLY-LABEL: 'vector_splice'
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2f16 = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4f16 = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8f16 = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16f16 = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2f32 = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4f32 = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8f32 = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2f64 = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4f64 = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i16_neg = call <vscale x 1 x i16> @llvm.vector.splice.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i64_neg = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f16_neg = call <vscale x 1 x half> @llvm.vector.splice.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f32_neg = call <vscale x 1 x float> @llvm.vector.splice.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1f64_neg = call <vscale x 1 x double> @llvm.vector.splice.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1bf16_neg = call <vscale x 1 x bfloat> @llvm.vector.splice.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %splice_nxv1i1_neg = call <vscale x 1 x i1> @llvm.vector.splice.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 -1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %11 = call <vscale x 2 x half> @llvm.vector.splice.down.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %12 = call <vscale x 4 x half> @llvm.vector.splice.down.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %13 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %14 = call <vscale x 16 x half> @llvm.vector.splice.down.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %15 = call <vscale x 2 x float> @llvm.vector.splice.down.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %16 = call <vscale x 4 x float> @llvm.vector.splice.down.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %17 = call <vscale x 8 x float> @llvm.vector.splice.down.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %18 = call <vscale x 2 x double> @llvm.vector.splice.down.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %19 = call <vscale x 4 x double> @llvm.vector.splice.down.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.down.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.down.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.down.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.down.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %41 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %42 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %43 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %44 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %46 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %47 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %48 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %50 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %51 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.up.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.up.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.up.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.up.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.up.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
index e3305c0c6ffb8..8a2f8e18df805 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
@@ -149,54 +149,54 @@ define void @vector_reverse() {
define void @vector_splice() {
; ARGBASED-LABEL: 'vector_splice'
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; ARGBASED-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; TYPEBASED-LABEL: 'vector_splice'
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'vector_splice'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
%splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
diff --git a/llvm/test/Analysis/CostModel/RISCV/splice.ll b/llvm/test/Analysis/CostModel/RISCV/splice.ll
index ddfaa8c13d425..f40fb65a29144 100644
--- a/llvm/test/Analysis/CostModel/RISCV/splice.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/splice.ll
@@ -6,121 +6,121 @@
define void @vector_splice() {
; CHECK-LABEL: 'vector_splice'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i8 = call <vscale x 1 x i8> @llvm.vector.splice.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i8 = call <vscale x 2 x i8> @llvm.vector.splice.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4i8 = call <vscale x 4 x i8> @llvm.vector.splice.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8i8 = call <vscale x 8 x i8> @llvm.vector.splice.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv64i8 = call <vscale x 64 x i8> @llvm.vector.splice.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i16 = call <vscale x 1 x i16> @llvm.vector.splice.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv32i16 = call <vscale x 32 x i16> @llvm.vector.splice.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv64i16 = call <vscale x 64 x i16> @llvm.vector.splice.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i32 = call <vscale x 1 x i32> @llvm.vector.splice.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i32 = call <vscale x 2 x i32> @llvm.vector.splice.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv16i32 = call <vscale x 16 x i32> @llvm.vector.splice.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv32i32 = call <vscale x 32 x i32> @llvm.vector.splice.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %splice.nxv64i32 = call <vscale x 64 x i32> @llvm.vector.splice.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i64 = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv8i64 = call <vscale x 8 x i64> @llvm.vector.splice.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv16i64 = call <vscale x 16 x i64> @llvm.vector.splice.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %splice.nxv32i64 = call <vscale x 32 x i64> @llvm.vector.splice.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %splice.nxv64i64 = call <vscale x 64 x i64> @llvm.vector.splice.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1bf16 = call <vscale x 1 x bfloat> @llvm.vector.splice.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2bf16 = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4bf16 = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv8bf16 = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv16bf16 = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv32bf16 = call <vscale x 32 x bfloat> @llvm.vector.splice.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv64bf16 = call <vscale x 64 x bfloat> @llvm.vector.splice.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1f16 = call <vscale x 1 x half> @llvm.vector.splice.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2f16 = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4f16 = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv8f16 = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv16f16 = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv32f16 = call <vscale x 32 x half> @llvm.vector.splice.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv64f16 = call <vscale x 64 x half> @llvm.vector.splice.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1f32 = call <vscale x 1 x float> @llvm.vector.splice.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2f32 = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv4f32 = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv8f32 = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv16f32 = call <vscale x 16 x float> @llvm.vector.splice.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv32f32 = call <vscale x 32 x float> @llvm.vector.splice.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %splice.nxv64f32 = call <vscale x 64 x float> @llvm.vector.splice.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1f64 = call <vscale x 1 x double> @llvm.vector.splice.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv2f64 = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv4f64 = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv8f64 = call <vscale x 8 x double> @llvm.vector.splice.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %splice.nxv16f64 = call <vscale x 16 x double> @llvm.vector.splice.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %splice.nxv32f64 = call <vscale x 32 x double> @llvm.vector.splice.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 -1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %splice.nxv64f64 = call <vscale x 64 x double> @llvm.vector.splice.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 -1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.up.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.up.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 4 x i8> @llvm.vector.splice.up.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 8 x i8> @llvm.vector.splice.up.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %6 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %7 = call <vscale x 64 x i8> @llvm.vector.splice.up.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %12 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %13 = call <vscale x 32 x i16> @llvm.vector.splice.up.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %14 = call <vscale x 64 x i16> @llvm.vector.splice.up.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <vscale x 1 x i32> @llvm.vector.splice.up.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <vscale x 2 x i32> @llvm.vector.splice.up.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %17 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %18 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %19 = call <vscale x 16 x i32> @llvm.vector.splice.up.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %20 = call <vscale x 32 x i32> @llvm.vector.splice.up.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %21 = call <vscale x 64 x i32> @llvm.vector.splice.up.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %23 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %24 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %25 = call <vscale x 8 x i64> @llvm.vector.splice.up.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %26 = call <vscale x 16 x i64> @llvm.vector.splice.up.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %27 = call <vscale x 32 x i64> @llvm.vector.splice.up.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %28 = call <vscale x 64 x i64> @llvm.vector.splice.up.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %30 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %31 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %32 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %33 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %34 = call <vscale x 32 x bfloat> @llvm.vector.splice.up.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %35 = call <vscale x 64 x bfloat> @llvm.vector.splice.up.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %36 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %37 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %38 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %39 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %40 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %41 = call <vscale x 32 x half> @llvm.vector.splice.up.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %42 = call <vscale x 64 x half> @llvm.vector.splice.up.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %43 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %45 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %46 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %47 = call <vscale x 16 x float> @llvm.vector.splice.up.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %48 = call <vscale x 32 x float> @llvm.vector.splice.up.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %49 = call <vscale x 64 x float> @llvm.vector.splice.up.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %50 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %51 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %52 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %53 = call <vscale x 8 x double> @llvm.vector.splice.up.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.up.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.up.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.up.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'vector_splice'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i8 = call <vscale x 1 x i8> @llvm.vector.splice.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i8 = call <vscale x 2 x i8> @llvm.vector.splice.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4i8 = call <vscale x 4 x i8> @llvm.vector.splice.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8i8 = call <vscale x 8 x i8> @llvm.vector.splice.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv32i8 = call <vscale x 32 x i8> @llvm.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv64i8 = call <vscale x 64 x i8> @llvm.vector.splice.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i16 = call <vscale x 1 x i16> @llvm.vector.splice.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i16 = call <vscale x 2 x i16> @llvm.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4i16 = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8i16 = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv16i16 = call <vscale x 16 x i16> @llvm.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv32i16 = call <vscale x 32 x i16> @llvm.vector.splice.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv64i16 = call <vscale x 64 x i16> @llvm.vector.splice.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i32 = call <vscale x 1 x i32> @llvm.vector.splice.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i32 = call <vscale x 2 x i32> @llvm.vector.splice.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4i32 = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8i32 = call <vscale x 8 x i32> @llvm.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv16i32 = call <vscale x 16 x i32> @llvm.vector.splice.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv32i32 = call <vscale x 32 x i32> @llvm.vector.splice.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv64i32 = call <vscale x 64 x i32> @llvm.vector.splice.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1i64 = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2i64 = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4i64 = call <vscale x 4 x i64> @llvm.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8i64 = call <vscale x 8 x i64> @llvm.vector.splice.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv16i64 = call <vscale x 16 x i64> @llvm.vector.splice.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv32i64 = call <vscale x 32 x i64> @llvm.vector.splice.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv64i64 = call <vscale x 64 x i64> @llvm.vector.splice.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv1bf16 = call <vscale x 1 x bfloat> @llvm.vector.splice.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv2bf16 = call <vscale x 2 x bfloat> @llvm.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv4bf16 = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv8bf16 = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv16bf16 = call <vscale x 16 x bfloat> @llvm.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv32bf16 = call <vscale x 32 x bfloat> @llvm.vector.splice.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %splice.nxv64bf16 = call <vscale x 64 x bfloat> @llvm.vector.splice.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1f16 = call <vscale x 1 x half> @llvm.vector.splice.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2f16 = call <vscale x 2 x half> @llvm.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4f16 = call <vscale x 4 x half> @llvm.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8f16 = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv16f16 = call <vscale x 16 x half> @llvm.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv32f16 = call <vscale x 32 x half> @llvm.vector.splice.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv64f16 = call <vscale x 64 x half> @llvm.vector.splice.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1f32 = call <vscale x 1 x float> @llvm.vector.splice.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2f32 = call <vscale x 2 x float> @llvm.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4f32 = call <vscale x 4 x float> @llvm.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8f32 = call <vscale x 8 x float> @llvm.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv16f32 = call <vscale x 16 x float> @llvm.vector.splice.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv32f32 = call <vscale x 32 x float> @llvm.vector.splice.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv64f32 = call <vscale x 64 x float> @llvm.vector.splice.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv1f64 = call <vscale x 1 x double> @llvm.vector.splice.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv2f64 = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv4f64 = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice.nxv8f64 = call <vscale x 8 x double> @llvm.vector.splice.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %splice.nxv16f64 = call <vscale x 16 x double> @llvm.vector.splice.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %splice.nxv32f64 = call <vscale x 32 x double> @llvm.vector.splice.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 -1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %splice.nxv64f64 = call <vscale x 64 x double> @llvm.vector.splice.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 -1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.up.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.up.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 4 x i8> @llvm.vector.splice.up.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 8 x i8> @llvm.vector.splice.up.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <vscale x 64 x i8> @llvm.vector.splice.up.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 32 x i16> @llvm.vector.splice.up.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %14 = call <vscale x 64 x i16> @llvm.vector.splice.up.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <vscale x 1 x i32> @llvm.vector.splice.up.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <vscale x 2 x i32> @llvm.vector.splice.up.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %18 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <vscale x 16 x i32> @llvm.vector.splice.up.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %20 = call <vscale x 32 x i32> @llvm.vector.splice.up.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %21 = call <vscale x 64 x i32> @llvm.vector.splice.up.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %24 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %25 = call <vscale x 8 x i64> @llvm.vector.splice.up.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %26 = call <vscale x 16 x i64> @llvm.vector.splice.up.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %27 = call <vscale x 32 x i64> @llvm.vector.splice.up.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %28 = call <vscale x 64 x i64> @llvm.vector.splice.up.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %29 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %30 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %31 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %32 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %33 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %34 = call <vscale x 32 x bfloat> @llvm.vector.splice.up.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %35 = call <vscale x 64 x bfloat> @llvm.vector.splice.up.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %36 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %37 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %38 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %39 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %40 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %41 = call <vscale x 32 x half> @llvm.vector.splice.up.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %42 = call <vscale x 64 x half> @llvm.vector.splice.up.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %43 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %45 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %46 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %47 = call <vscale x 16 x float> @llvm.vector.splice.up.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %48 = call <vscale x 32 x float> @llvm.vector.splice.up.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %49 = call <vscale x 64 x float> @llvm.vector.splice.up.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %50 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %51 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %52 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %53 = call <vscale x 8 x double> @llvm.vector.splice.up.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.up.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.up.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.up.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
%splice.nxv1i8 = call <vscale x 1 x i8> @llvm.vector.splice.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 -1)
diff --git a/llvm/test/Assembler/auto_upgrade_intrinsics.ll b/llvm/test/Assembler/auto_upgrade_intrinsics.ll
index 64d4a3ba7c802..d990447b434fc 100644
--- a/llvm/test/Assembler/auto_upgrade_intrinsics.ll
+++ b/llvm/test/Assembler/auto_upgrade_intrinsics.ll
@@ -216,6 +216,17 @@ define void @test.prefetch.unnamed(ptr %ptr) {
ret void
}
+define void @test.vector.splice(<4 x i32> %a, <4 x i32> %b) {
+; CHECK-LABEL: @test.vector.splice
+; CHECK: @llvm.vector.splice.up.v4i32(<4 x i32> %a, <4 x i32> %b, i32 3)
+ call <4 x i32> @llvm.vector.splice(<4 x i32> %a, <4 x i32> %b, i32 3)
+; CHECK: @llvm.vector.splice.down.v4i32(<4 x i32> %a, <4 x i32> %b, i32 2)
+ call <4 x i32> @llvm.vector.splice(<4 x i32> %a, <4 x i32> %b, i32 -2)
+; CHECK: @llvm.vector.splice.up.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
+ call <4 x i32> @llvm.vector.splice.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
+ ret void
+}
+
; This is part of @test.objectsize(), since llvm.objectsize declaration gets
; emitted at the end.
; CHECK: declare i32 @llvm.objectsize.i32.p0
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
index 16e9d410e4aa7..a08d6d1513989 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
@@ -22,8 +22,8 @@ define i32 @PR33613(ptr %b, double %j, i32 %d) #0 {
; CHECK-VF4UF2-LABEL: @PR33613
; CHECK-VF4UF2: vector.body
; CHECK-VF4UF2: %[[VEC_RECUR:.*]] = phi <vscale x 4 x double> [ {{.*}}, %vector.ph ], [ {{.*}}, %vector.body ]
-; CHECK-VF4UF2: %[[SPLICE1:.*]] = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> %[[VEC_RECUR]], <vscale x 4 x double> {{.*}}, i32 -1)
-; CHECK-VF4UF2-NEXT: %[[SPLICE2:.*]] = call <vscale x 4 x double> @llvm.vector.splice.nxv4f64(<vscale x 4 x double> %{{.*}}, <vscale x 4 x double> %{{.*}}, i32 -1)
+; CHECK-VF4UF2: %[[SPLICE1:.*]] = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> %[[VEC_RECUR]], <vscale x 4 x double> {{.*}}, i32 1)
+; CHECK-VF4UF2-NEXT: %[[SPLICE2:.*]] = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> %{{.*}}, <vscale x 4 x double> %{{.*}}, i32 1)
; CHECK-VF4UF2-NOT: insertelement <vscale x 4 x double>
; CHECK-VF4UF2: middle.block
entry:
@@ -71,7 +71,7 @@ define void @PR34711(ptr %a, ptr %b, ptr %c, i64 %n) #0 {
; CHECK-VF4UF1: vector.body
; CHECK-VF4UF1: %[[VEC_RECUR:.*]] = phi <vscale x 4 x i16> [ %vector.recur.init, %vector.ph ], [ %[[MGATHER:.*]], %vector.body ]
; CHECK-VF4UF1: %[[MGATHER]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> {{.*}}, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison)
-; CHECK-VF4UF1-NEXT: %[[SPLICE:.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> %[[VEC_RECUR]], <vscale x 4 x i16> %[[MGATHER]], i32 -1)
+; CHECK-VF4UF1-NEXT: %[[SPLICE:.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> %[[VEC_RECUR]], <vscale x 4 x i16> %[[MGATHER]], i32 1)
; CHECK-VF4UF1-NEXT: %[[SXT1:.*]] = sext <vscale x 4 x i16> %[[SPLICE]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: %[[SXT2:.*]] = sext <vscale x 4 x i16> %[[MGATHER]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: mul nsw <vscale x 4 x i32> %[[SXT2]], %[[SXT1]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll b/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
index f2c0ca30a6c18..b66018ef04c48 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
@@ -91,10 +91,10 @@ define i32 @chained_recurrences(i32 %x, i64 %y, ptr %src.1, i32 %z, ptr %src.2)
; VSCALEFORTUNING2-NEXT: [[TMP24:%.*]] = load i32, ptr [[TMP8]], align 4
; VSCALEFORTUNING2-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP24]], i64 0
; VSCALEFORTUNING2-NEXT: [[BROADCAST_SPLAT7]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT6]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; VSCALEFORTUNING2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 -1)
-; VSCALEFORTUNING2-NEXT: [[TMP26]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT7]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 -1)
-; VSCALEFORTUNING2-NEXT: [[TMP27:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP25]], i32 -1)
-; VSCALEFORTUNING2-NEXT: [[TMP28:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[TMP25]], <vscale x 4 x i32> [[TMP26]], i32 -1)
+; VSCALEFORTUNING2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP26]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT7]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP27:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP25]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP28:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[TMP25]], <vscale x 4 x i32> [[TMP26]], i32 1)
; VSCALEFORTUNING2-NEXT: [[TMP29:%.*]] = or <vscale x 4 x i32> [[TMP27]], [[BROADCAST_SPLAT]]
; VSCALEFORTUNING2-NEXT: [[TMP30:%.*]] = or <vscale x 4 x i32> [[TMP28]], [[BROADCAST_SPLAT]]
; VSCALEFORTUNING2-NEXT: [[TMP31:%.*]] = shl <vscale x 4 x i32> [[TMP29]], splat (i32 1)
@@ -218,8 +218,8 @@ define i32 @chained_recurrences(i32 %x, i64 %y, ptr %src.1, i32 %z, ptr %src.2)
; PRED-NEXT: [[TMP28:%.*]] = load i32, ptr [[TMP12]], align 4
; PRED-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP28]], i64 0
; PRED-NEXT: [[BROADCAST_SPLAT6]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; PRED-NEXT: [[TMP29]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT6]], i32 -1)
-; PRED-NEXT: [[TMP30:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP29]], i32 -1)
+; PRED-NEXT: [[TMP29]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT6]], i32 1)
+; PRED-NEXT: [[TMP30:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP29]], i32 1)
; PRED-NEXT: [[TMP31:%.*]] = or <vscale x 4 x i32> [[TMP30]], [[BROADCAST_SPLAT]]
; PRED-NEXT: [[TMP32:%.*]] = shl <vscale x 4 x i32> [[TMP31]], splat (i32 1)
; PRED-NEXT: [[TMP33:%.*]] = or <vscale x 4 x i32> [[TMP32]], splat (i32 2)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
index 8935010e71676..6ba0eb23e485c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
@@ -1293,7 +1293,7 @@ define void @PR34743(ptr %a, ptr %b, i64 %n) #1 {
; CHECK-NEXT: [[TMP21:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[A]], <vscale x 4 x i64> [[TMP19]]
; CHECK-NEXT: [[WIDE_MASKED_GATHER4]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> align 4 [[TMP22]], <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison), !alias.scope [[META34]]
-; CHECK-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]], i32 -1)
+; CHECK-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]], i32 1)
; CHECK-NEXT: [[TMP24:%.*]] = sext <vscale x 4 x i16> [[TMP23]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP25:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP26:%.*]] = mul nsw <vscale x 4 x i32> [[TMP24]], [[TMP21]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
index 7dd0f0c0ad8e0..3ebc38679e203 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
@@ -178,7 +178,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-NOTF-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-NOTF: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-NOTF: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-NOTF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-NOTF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-NOTF: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-NOTF: store <vscale x 4 x i32> %[[ADD]]
@@ -191,7 +191,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-NORED: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1>
; CHECK-TF-NORED: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-NORED: %[[LOAD]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0({{.*}} %[[ACTIVE_LANE_MASK]]
-; CHECK-TF-NORED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-TF-NORED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-NORED: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-NORED: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %[[ADD]], {{.*}} <vscale x 4 x i1> %[[ACTIVE_LANE_MASK]])
@@ -204,7 +204,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-NOREC-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-TF-NOREC: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-NOREC: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-TF-NOREC: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-TF-NOREC: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-NOREC: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-NOREC: store <vscale x 4 x i32> %[[ADD]]
@@ -217,7 +217,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-NOREV: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1>
; CHECK-TF-NOREV: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-NOREV: %[[LOAD]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0({{.*}} %[[ACTIVE_LANE_MASK]]
-; CHECK-TF-NOREV: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-TF-NOREV: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-NOREV: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-NOREV: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %[[ADD]], {{.*}} <vscale x 4 x i1> %[[ACTIVE_LANE_MASK]])
@@ -230,7 +230,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1>
; CHECK-TF: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF: %[[LOAD]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0({{.*}} %[[ACTIVE_LANE_MASK]]
-; CHECK-TF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-TF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %[[ADD]], {{.*}} <vscale x 4 x i1> %[[ACTIVE_LANE_MASK]])
@@ -243,7 +243,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-ONLYRED-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-TF-ONLYRED: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-ONLYRED: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-TF-ONLYRED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-TF-ONLYRED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-ONLYRED: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-ONLYRED: store <vscale x 4 x i32> %[[ADD]]
@@ -256,7 +256,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-NEOVERSE-V1-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-NEOVERSE-V1: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-NEOVERSE-V1: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-NEOVERSE-V1: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 -1)
+; CHECK-NEOVERSE-V1: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-NEOVERSE-V1: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-NEOVERSE-V1: store <vscale x 4 x i32> %[[ADD]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll
index b95691f6e7c04..65cf3f161df93 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll
@@ -67,7 +67,7 @@ define void @first_order_recurrence(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
-; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
+; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; NO-VP-NEXT: [[TMP13:%.*]] = add nsw <vscale x 4 x i32> [[TMP12]], [[WIDE_LOAD]]
; NO-VP-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 4 x i32> [[TMP13]], ptr [[TMP14]], align 4
@@ -187,8 +187,8 @@ define void @second_order_recurrence(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR2:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT1]], %[[VECTOR_PH]] ], [ [[TMP15:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP13:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP13]], align 4
-; NO-VP-NEXT: [[TMP15]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
-; NO-VP-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP15]], i32 -1)
+; NO-VP-NEXT: [[TMP15]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; NO-VP-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP15]], i32 1)
; NO-VP-NEXT: [[TMP17:%.*]] = add nsw <vscale x 4 x i32> [[TMP15]], [[TMP16]]
; NO-VP-NEXT: [[TMP18:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 4 x i32> [[TMP17]], ptr [[TMP18]], align 4
@@ -327,9 +327,9 @@ define void @third_order_recurrence(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR4:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT3]], %[[VECTOR_PH]] ], [ [[TMP19:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP16:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP16]], align 4
-; NO-VP-NEXT: [[TMP18]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
-; NO-VP-NEXT: [[TMP19]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP18]], i32 -1)
-; NO-VP-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP19]], i32 -1)
+; NO-VP-NEXT: [[TMP18]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; NO-VP-NEXT: [[TMP19]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP18]], i32 1)
+; NO-VP-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP19]], i32 1)
; NO-VP-NEXT: [[TMP21:%.*]] = add nsw <vscale x 4 x i32> [[TMP19]], [[TMP20]]
; NO-VP-NEXT: [[TMP22:%.*]] = add <vscale x 4 x i32> [[TMP21]], [[TMP18]]
; NO-VP-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
@@ -467,7 +467,7 @@ define i32 @FOR_reduction(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
-; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
+; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; NO-VP-NEXT: [[TMP13:%.*]] = add nsw <vscale x 4 x i32> [[TMP12]], [[WIDE_LOAD]]
; NO-VP-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 4 x i32> [[TMP13]], ptr [[TMP14]], align 4
@@ -591,7 +591,7 @@ define void @first_order_recurrence_indvar(ptr noalias %A, i64 %TC) {
; NO-VP-NEXT: [[VEC_IND:%.*]] = phi <vscale x 2 x i64> [ [[INDUCTION]], %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 2 x i64> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[TMP12:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP12]] = add <vscale x 2 x i64> [[VEC_IND]], splat (i64 42)
-; NO-VP-NEXT: [[TMP13:%.*]] = call <vscale x 2 x i64> @llvm.vector.splice.nxv2i64(<vscale x 2 x i64> [[VECTOR_RECUR]], <vscale x 2 x i64> [[TMP12]], i32 -1)
+; NO-VP-NEXT: [[TMP13:%.*]] = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> [[VECTOR_RECUR]], <vscale x 2 x i64> [[TMP12]], i32 1)
; NO-VP-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw i64, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 2 x i64> [[TMP13]], ptr [[TMP11]], align 8
; NO-VP-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP3]]
diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll
index 40587c0c8b68c..8b43c8554cc86 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll
@@ -24,7 +24,7 @@ define i64 @pr97452_scalable_vf1_for_live_out(ptr %src) {
; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 1 x i64> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD]] = load <vscale x 1 x i64>, ptr [[TMP5]], align 8
-; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 -1)
+; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 1)
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
@@ -91,7 +91,7 @@ define void @pr97452_scalable_vf1_for_no_live_out(ptr %src, ptr noalias %dst) {
; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 1 x i64> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD]] = load <vscale x 1 x i64>, ptr [[TMP5]], align 8
-; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 -1)
+; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 1)
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[DST]], i64 [[INDEX]]
; CHECK-NEXT: store <vscale x 1 x i64> [[TMP7]], ptr [[TMP8]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
diff --git a/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
index 1216bc1dc33cc..01cb4ffe9debd 100644
--- a/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
@@ -47,7 +47,7 @@ define i32 @recurrence_1(ptr nocapture readonly %a, ptr nocapture %b, i32 %n) {
; CHECK-VF4UF1-NEXT: [[TMP17:%.*]] = add nuw nsw i64 [[INDEX]], 1
; CHECK-VF4UF1-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP17]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP18]], align 4
-; CHECK-VF4UF1-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
+; CHECK-VF4UF1-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]]
; CHECK-VF4UF1-NEXT: [[TMP22:%.*]] = add <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP20]]
; CHECK-VF4UF1-NEXT: store <vscale x 4 x i32> [[TMP22]], ptr [[TMP21]], align 4
@@ -114,8 +114,8 @@ define i32 @recurrence_1(ptr nocapture readonly %a, ptr nocapture %b, i32 %n) {
; CHECK-VF4UF2-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, ptr [[TMP18]], i64 [[TMP21]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP18]], align 4
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD3]] = load <vscale x 4 x i32>, ptr [[TMP22]], align 4
-; CHECK-VF4UF2-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
-; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD3]], i32 -1)
+; CHECK-VF4UF2-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD3]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]]
; CHECK-VF4UF2-NEXT: [[TMP26:%.*]] = add <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP23]]
; CHECK-VF4UF2-NEXT: [[TMP27:%.*]] = add <vscale x 4 x i32> [[WIDE_LOAD3]], [[TMP24]]
@@ -206,7 +206,7 @@ define i32 @recurrence_2(ptr nocapture readonly %a, i32 %n) {
; CHECK-VF4UF1-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %[[VECTOR_PH]] ], [ [[TMP17:%.*]], %[[VECTOR_BODY]] ]
; CHECK-VF4UF1-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDEX]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
-; CHECK-VF4UF1-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
+; CHECK-VF4UF1-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP13:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP12]]
; CHECK-VF4UF1-NEXT: [[TMP14:%.*]] = icmp sgt <vscale x 4 x i32> [[TMP13]], zeroinitializer
; CHECK-VF4UF1-NEXT: [[TMP15:%.*]] = select <vscale x 4 x i1> [[TMP14]], <vscale x 4 x i32> [[TMP13]], <vscale x 4 x i32> zeroinitializer
@@ -270,8 +270,8 @@ define i32 @recurrence_2(ptr nocapture readonly %a, i32 %n) {
; CHECK-VF4UF2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[TMP10]], i64 [[TMP13]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD2]] = load <vscale x 4 x i32>, ptr [[TMP14]], align 4
-; CHECK-VF4UF2-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 -1)
-; CHECK-VF4UF2-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD2]], i32 -1)
+; CHECK-VF4UF2-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD2]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP17:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP15]]
; CHECK-VF4UF2-NEXT: [[TMP18:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_LOAD2]], [[TMP16]]
; CHECK-VF4UF2-NEXT: [[TMP19:%.*]] = icmp sgt <vscale x 4 x i32> [[TMP17]], zeroinitializer
@@ -392,7 +392,7 @@ define void @recurrence_3(ptr nocapture readonly %a, ptr nocapture %b, i32 %n, f
; CHECK-VF4UF1-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
; CHECK-VF4UF1-NEXT: [[TMP19:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[OFFSET_IDX]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i16>, ptr [[TMP19]], align 2, !alias.scope [[META6:![0-9]+]]
-; CHECK-VF4UF1-NEXT: [[TMP21:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 -1)
+; CHECK-VF4UF1-NEXT: [[TMP21:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP22:%.*]] = sitofp <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x double>
; CHECK-VF4UF1-NEXT: [[TMP23:%.*]] = sitofp <vscale x 4 x i16> [[TMP21]] to <vscale x 4 x double>
; CHECK-VF4UF1-NEXT: [[TMP24:%.*]] = fmul fast <vscale x 4 x double> [[TMP23]], [[BROADCAST_SPLAT]]
@@ -472,8 +472,8 @@ define void @recurrence_3(ptr nocapture readonly %a, ptr nocapture %b, i32 %n, f
; CHECK-VF4UF2-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[TMP19]], i64 [[TMP22]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i16>, ptr [[TMP19]], align 2, !alias.scope [[META6:![0-9]+]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD4]] = load <vscale x 4 x i16>, ptr [[TMP23]], align 2, !alias.scope [[META6]]
-; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 -1)
-; CHECK-VF4UF2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD4]], i32 -1)
+; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD4]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP26:%.*]] = sitofp <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x double>
; CHECK-VF4UF2-NEXT: [[TMP27:%.*]] = sitofp <vscale x 4 x i16> [[WIDE_LOAD4]] to <vscale x 4 x double>
; CHECK-VF4UF2-NEXT: [[TMP28:%.*]] = sitofp <vscale x 4 x i16> [[TMP24]] to <vscale x 4 x double>
@@ -766,7 +766,7 @@ define void @sink_after(ptr %a, ptr %b, i64 %n) {
; CHECK-VF4UF1-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[INDEX]], 1
; CHECK-VF4UF1-NEXT: [[TMP13:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[TMP12]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i16>, ptr [[TMP13]], align 2, !alias.scope [[META17:![0-9]+]]
-; CHECK-VF4UF1-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 -1)
+; CHECK-VF4UF1-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP16:%.*]] = sext <vscale x 4 x i16> [[TMP15]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: [[TMP17:%.*]] = sext <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: [[TMP18:%.*]] = mul nsw <vscale x 4 x i32> [[TMP17]], [[TMP16]]
@@ -827,8 +827,8 @@ define void @sink_after(ptr %a, ptr %b, i64 %n) {
; CHECK-VF4UF2-NEXT: [[TMP17:%.*]] = getelementptr inbounds i16, ptr [[TMP13]], i64 [[TMP16]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i16>, ptr [[TMP13]], align 2, !alias.scope [[META17:![0-9]+]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD3]] = load <vscale x 4 x i16>, ptr [[TMP17]], align 2, !alias.scope [[META17]]
-; CHECK-VF4UF2-NEXT: [[TMP18:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 -1)
-; CHECK-VF4UF2-NEXT: [[TMP19:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD3]], i32 -1)
+; CHECK-VF4UF2-NEXT: [[TMP18:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP19:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD3]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP20:%.*]] = sext <vscale x 4 x i16> [[TMP18]] to <vscale x 4 x i32>
; CHECK-VF4UF2-NEXT: [[TMP21:%.*]] = sext <vscale x 4 x i16> [[TMP19]] to <vscale x 4 x i32>
; CHECK-VF4UF2-NEXT: [[TMP22:%.*]] = sext <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x i32>
>From 89315b4fb22addbdf5323809ea6ddb8f7e09761a Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Fri, 5 Dec 2025 15:30:37 +0800
Subject: [PATCH 02/12] Fix some tests
- Restore experimental.vector.splice autoupgrade
- Fix verifier scaling min known elements by vscale even for fixed vectors
---
llvm/lib/IR/AutoUpgrade.cpp | 6 +++++-
llvm/lib/IR/Verifier.cpp | 3 ++-
llvm/test/Assembler/auto_upgrade_intrinsics.ll | 6 +++---
llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll | 8 ++++----
llvm/test/Verifier/invalid-splice.ll | 10 +++++-----
5 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index a22ea555cbcc5..4e5249cc7d99e 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1361,6 +1361,9 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
}
break; // No other 'expermental.vector.reduce.*'.
}
+
+ if (Name.consume_front("splice"))
+ return true;
break; // No other 'experimental.vector.*'.
}
if (Name.consume_front("experimental.stepvector.")) {
@@ -4716,7 +4719,8 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
bool IsARM = Name.consume_front("arm.");
bool IsAMDGCN = Name.consume_front("amdgcn.");
bool IsDbg = Name.consume_front("dbg.");
- bool IsOldSplice = Name.consume_front("vector.splice") &&
+ bool IsOldSplice = (Name.consume_front("experimental.vector.splice") ||
+ Name.consume_front("vector.splice")) &&
!(Name.starts_with(".down") || Name.starts_with(".up"));
Value *Rep = nullptr;
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 3993616fb66eb..aa6d2c58a610e 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6571,7 +6571,8 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
VectorType *VecTy = cast<VectorType>(Call.getType());
uint64_t Idx = cast<ConstantInt>(Call.getArgOperand(2))->getZExtValue();
uint64_t KnownMinNumElements = VecTy->getElementCount().getKnownMinValue();
- if (Call.getParent() && Call.getParent()->getParent()) {
+ if (VecTy->isScalableTy() && Call.getParent() &&
+ Call.getParent()->getParent()) {
AttributeList Attrs = Call.getParent()->getParent()->getAttributes();
if (Attrs.hasFnAttr(Attribute::VScaleRange))
KnownMinNumElements *= Attrs.getFnAttrs().getVScaleRangeMin();
diff --git a/llvm/test/Assembler/auto_upgrade_intrinsics.ll b/llvm/test/Assembler/auto_upgrade_intrinsics.ll
index d990447b434fc..a8eb644a0d03a 100644
--- a/llvm/test/Assembler/auto_upgrade_intrinsics.ll
+++ b/llvm/test/Assembler/auto_upgrade_intrinsics.ll
@@ -218,11 +218,11 @@ define void @test.prefetch.unnamed(ptr %ptr) {
define void @test.vector.splice(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @test.vector.splice
-; CHECK: @llvm.vector.splice.up.v4i32(<4 x i32> %a, <4 x i32> %b, i32 3)
+; CHECK: @llvm.vector.splice.down.v4i32(<4 x i32> %a, <4 x i32> %b, i32 3)
call <4 x i32> @llvm.vector.splice(<4 x i32> %a, <4 x i32> %b, i32 3)
-; CHECK: @llvm.vector.splice.down.v4i32(<4 x i32> %a, <4 x i32> %b, i32 2)
+; CHECK: @llvm.vector.splice.up.v4i32(<4 x i32> %a, <4 x i32> %b, i32 2)
call <4 x i32> @llvm.vector.splice(<4 x i32> %a, <4 x i32> %b, i32 -2)
-; CHECK: @llvm.vector.splice.up.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
+; CHECK: @llvm.vector.splice.down.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
call <4 x i32> @llvm.vector.splice.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
ret void
}
diff --git a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
index 1b55da21ecd2a..844e54f32ad06 100644
--- a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
+++ b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
@@ -3,7 +3,7 @@
define <8 x half> @splice_fixed(<8 x half> %a, <8 x half> %b) {
; CHECK-LABEL: @splice_fixed
-; CHECK: %res = call <8 x half> @llvm.vector.splice.v8f16(<8 x half> %a, <8 x half> %b, i32 2)
+; CHECK: %1 = call <8 x half> @llvm.vector.splice.down.v8f16(<8 x half> %a, <8 x half> %b, i32 2)
%res = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> %a, <8 x half> %b, i32 2)
ret <8 x half> %res
@@ -11,14 +11,14 @@ define <8 x half> @splice_fixed(<8 x half> %a, <8 x half> %b) {
define <vscale x 8 x half> @splice_scalable(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
; CHECK-LABEL: @splice_scalable
-; CHECK: %res = call <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 2)
+; CHECK: %1 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 2)
%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 2)
ret <vscale x 8 x half> %res
}
declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32 immarg)
-; CHECK: declare <8 x half> @llvm.vector.splice.v8f16(<8 x half>, <8 x half>, i32 immarg)
+; CHECK: declare <8 x half> @llvm.vector.splice.down.v8f16(<8 x half>, <8 x half>, i32 immarg)
declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
-; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
+; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
diff --git a/llvm/test/Verifier/invalid-splice.ll b/llvm/test/Verifier/invalid-splice.ll
index 2239386df562f..f54c958f18410 100644
--- a/llvm/test/Verifier/invalid-splice.ll
+++ b/llvm/test/Verifier/invalid-splice.ll
@@ -1,30 +1,30 @@
; RUN: not opt -passes=verify -S < %s 2>&1 >/dev/null | FileCheck %s
-; CHECK: The splice index exceeds the range [-VL, VL-1] where VL is the known minimum number of elements in the vector
+; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
define <2 x double> @splice_v2f64_idx_neg3(<2 x double> %a, <2 x double> %b) #0 {
%res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 -3)
ret <2 x double> %res
}
-; CHECK: The splice index exceeds the range [-VL, VL-1] where VL is the known minimum number of elements in the vector
+; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
define <vscale x 2 x double> @splice_nxv2f64_idx_neg3_vscale_min1(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
%res = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 -3)
ret <vscale x 2 x double> %res
}
-; CHECK: The splice index exceeds the range [-VL, VL-1] where VL is the known minimum number of elements in the vector
+; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
define <vscale x 2 x double> @splice_nxv2f64_idx_neg5_vscale_min2(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1 {
%res = call <vscale x 2 x double> @llvm.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 -5)
ret <vscale x 2 x double> %res
}
-; CHECK: The splice index exceeds the range [-VL, VL-1] where VL is the known minimum number of elements in the vector
+; CHECK-NOT: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
define <2 x double> @splice_v2f64_idx2(<2 x double> %a, <2 x double> %b) #0 {
%res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 2)
ret <2 x double> %res
}
-; CHECK: The splice index exceeds the range [-VL, VL-1] where VL is the known minimum number of elements in the vector
+; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
define <2 x double> @splice_v2f64_idx3(<2 x double> %a, <2 x double> %b) #1 {
%res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 4)
ret <2 x double> %res
>From 9415e04b5113dbbf796daea68f50442cc4472f6a Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Fri, 5 Dec 2025 17:19:10 +0800
Subject: [PATCH 03/12] clang-format
---
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index c88c8f300a23f..54440eb249327 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -11956,8 +11956,7 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
// NOTE: TrailingElts must be clamped so as not to read outside of V1:V2.
TypeSize EltByteSize = VT.getVectorElementType().getStoreSize();
- SDValue TrailingBytes =
- DAG.getConstant(Imm * EltByteSize, DL, PtrVT);
+ SDValue TrailingBytes = DAG.getConstant(Imm * EltByteSize, DL, PtrVT);
if (Imm > VT.getVectorMinNumElements())
TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VTBytes);
>From 260e69e9fa867ab3681723198f9df296bd90b3eb Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Fri, 5 Dec 2025 18:15:01 +0800
Subject: [PATCH 04/12] Fixup carats, typo in fshl
---
llvm/docs/LangRef.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 445b772a41087..3e790973e1742 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20730,7 +20730,7 @@ All arguments must be vectors of the same type whereby their logical
concatenation matches the result type.
'``llvm.vector.splice.down``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
@@ -20782,7 +20782,7 @@ For a scalable vector, if the value of ``imm`` exceeds the runtime length of the
source vector type, the result is a :ref:`poison value <poisonvalues>`.
'``llvm.vector.splice.up``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
@@ -20800,7 +20800,7 @@ The '``llvm.vector.splice.up.*``' intrinsics construct a vector by
concatenating two vectors together, shifting the elements up by ``imm``, and
extracting the upper half.
-This is equivalent to :ref:`llvm.fshr.* <int_fshl>`, but operating on elements instead
+This is equivalent to :ref:`llvm.fshl.* <int_fshl>`, but operating on elements instead
of bits.
These intrinsics work for both fixed and scalable vectors. While this intrinsic
>From bca832b14442607550addf0ec5d753af740c527a Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Mon, 8 Dec 2025 17:28:17 +0800
Subject: [PATCH 05/12] Restore VL-1 imm restriction for llvm.splice.down
---
llvm/docs/LangRef.rst | 4 ++--
llvm/lib/IR/Verifier.cpp | 20 ++++++++++++++------
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 3e790973e1742..2a193499429d0 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20772,8 +20772,8 @@ Arguments:
The first two operands are vectors with the same type. The start index is imm
modulo the runtime number of elements in the source vector. For a fixed-width
vector <N x eltty>, imm is an unsigned integer constant in the range
-0 <= imm <= N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
-integer constant in the range 0 <= imm <= X where X=vscale_range_min * N.
+0 <= imm < N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
+integer constant in the range 0 <= imm < X where X=vscale_range_min * N.
Semantics:
""""""""""
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index aa6d2c58a610e..198a699bd8fed 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6577,12 +6577,20 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
if (Attrs.hasFnAttr(Attribute::VScaleRange))
KnownMinNumElements *= Attrs.getFnAttrs().getVScaleRangeMin();
}
- Check(Idx <= KnownMinNumElements,
- "The splice index exceeds the range [0, VL] where VL is the "
- "known minimum number of elements in the vector. For scalable "
- "vectors the minimum number of elements is determined from "
- "vscale_range.",
- &Call);
+ if (ID == Intrinsic::vector_splice_down)
+ Check(Idx < KnownMinNumElements,
+ "The splice index exceeds the range [0, VL-1] where VL is the "
+ "known minimum number of elements in the vector. For scalable "
+ "vectors the minimum number of elements is determined from "
+ "vscale_range.",
+ &Call);
+ else
+ Check(Idx <= KnownMinNumElements,
+ "The splice index exceeds the range [0, VL] where VL is the "
+ "known minimum number of elements in the vector. For scalable "
+ "vectors the minimum number of elements is determined from "
+ "vscale_range.",
+ &Call);
break;
}
case Intrinsic::stepvector: {
>From 949c932d21c016addab24169d598357bbafcae39 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Mon, 8 Dec 2025 17:29:42 +0800
Subject: [PATCH 06/12] Drop runtime unknown imm out of bounds semantics
This isn't possible given the Imm < vscale_range_min requirement
---
llvm/docs/LangRef.rst | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2a193499429d0..ce2fdc2135102 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20775,12 +20775,6 @@ vector <N x eltty>, imm is an unsigned integer constant in the range
0 <= imm < N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
integer constant in the range 0 <= imm < X where X=vscale_range_min * N.
-Semantics:
-""""""""""
-
-For a scalable vector, if the value of ``imm`` exceeds the runtime length of the
-source vector type, the result is a :ref:`poison value <poisonvalues>`.
-
'``llvm.vector.splice.up``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -20827,12 +20821,6 @@ vector <N x eltty>, imm is an unsigned integer constant in the range
0 <= imm <= N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
integer constant in the range 0 <= imm <= X where X=vscale_range_min * N.
-Semantics:
-""""""""""
-
-For a scalable vector, if the value of ``imm`` exceeds the runtime length of the
-source vector type, the result is a :ref:`poison value <poisonvalues>`.
-
'``llvm.stepvector``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>From 8430837ad6e42061ff68148558e54369b7746910 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Mon, 8 Dec 2025 17:42:34 +0800
Subject: [PATCH 07/12] Remove sentence on "start index is Imm modulo", since
we don't have the notion of a start index anymore
---
llvm/docs/LangRef.rst | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ce2fdc2135102..3aa232f903518 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20769,11 +20769,10 @@ For example:
Arguments:
""""""""""
-The first two operands are vectors with the same type. The start index is imm
-modulo the runtime number of elements in the source vector. For a fixed-width
-vector <N x eltty>, imm is an unsigned integer constant in the range
-0 <= imm < N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
-integer constant in the range 0 <= imm < X where X=vscale_range_min * N.
+The first two operands are vectors with the same type. For a fixed-width vector
+<N x eltty>, imm is an unsigned integer constant in the range 0 <= imm < N. For
+a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
+the range 0 <= imm < X where X=vscale_range_min * N.
'``llvm.vector.splice.up``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -20815,11 +20814,10 @@ For example:
Arguments:
""""""""""
-The first two operands are vectors with the same type. The start index is imm
-modulo the runtime number of elements in the source vector. For a fixed-width
-vector <N x eltty>, imm is an unsigned integer constant in the range
-0 <= imm <= N. For a scalable vector <vscale x N x eltty>, imm is an unsigned
-integer constant in the range 0 <= imm <= X where X=vscale_range_min * N.
+The first two operands are vectors with the same type. For a fixed-width vector
+<N x eltty>, imm is an unsigned integer constant in the range 0 <= imm <= N. For
+a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
+the range 0 <= imm <= X where X=vscale_range_min * N.
'``llvm.stepvector``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>From c669b92aabf99f837e529953a8b7f88729734c84 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Mon, 8 Dec 2025 17:43:30 +0800
Subject: [PATCH 08/12] Remove mention of llvm.fshr/llvm.fshl
---
llvm/docs/LangRef.rst | 6 ------
1 file changed, 6 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 3aa232f903518..446603e287a9e 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20748,9 +20748,6 @@ The '``llvm.vector.splice.down.*``' intrinsics construct a vector by
concatenating two vectors together, shifting the elements down by ``imm``, and
extracting the lower half.
-This is equivalent to :ref:`llvm.fshr.* <int_fshr>`, but operating on elements
-instead of bits.
-
These intrinsics work for both fixed and scalable vectors. While this intrinsic
supports all vector types the recommended way to express this operation for
fixed-width vectors is still to use a shufflevector, as that may allow for more
@@ -20793,9 +20790,6 @@ The '``llvm.vector.splice.up.*``' intrinsics construct a vector by
concatenating two vectors together, shifting the elements up by ``imm``, and
extracting the upper half.
-This is equivalent to :ref:`llvm.fshl.* <int_fshl>`, but operating on elements instead
-of bits.
-
These intrinsics work for both fixed and scalable vectors. While this intrinsic
supports all vector types the recommended way to express this operation for
fixed-width vectors is still to use a shufflevector, as that may allow for more
>From 86d01ce6123299d1299fe29f75bc3e1572321d61 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Mon, 8 Dec 2025 18:13:33 +0800
Subject: [PATCH 09/12] Rename down/up -> left/right
---
llvm/docs/LangRef.rst | 22 +-
llvm/include/llvm/CodeGen/BasicTTIImpl.h | 6 +-
llvm/include/llvm/CodeGen/ISDOpcodes.h | 16 +-
llvm/include/llvm/IR/Intrinsics.td | 4 +-
.../include/llvm/Target/TargetSelectionDAG.td | 4 +-
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | 8 +-
.../SelectionDAG/LegalizeIntegerTypes.cpp | 4 +-
.../SelectionDAG/LegalizeVectorTypes.cpp | 4 +-
.../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 4 +-
.../SelectionDAG/SelectionDAGBuilder.cpp | 19 +-
.../SelectionDAG/SelectionDAGDumper.cpp | 4 +-
.../CodeGen/SelectionDAG/TargetLowering.cpp | 6 +-
llvm/lib/CodeGen/TargetLoweringBase.cpp | 2 +-
llvm/lib/IR/AutoUpgrade.cpp | 14 +-
llvm/lib/IR/IRBuilder.cpp | 3 +-
llvm/lib/IR/Verifier.cpp | 6 +-
.../Target/AArch64/AArch64ISelLowering.cpp | 34 +-
.../lib/Target/AArch64/AArch64SVEInstrInfo.td | 24 +-
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 18 +-
.../test/Analysis/CostModel/AArch64/splice.ll | 56 +--
.../CostModel/AArch64/sve-intrinsics.ll | 366 +++++++++---------
.../Analysis/CostModel/RISCV/rvv-shuffle.ll | 84 ++--
llvm/test/Analysis/CostModel/RISCV/splice.ll | 224 +++++------
.../test/Assembler/auto_upgrade_intrinsics.ll | 6 +-
.../upgrade-vector-splice-intrinsic.ll | 8 +-
.../AArch64/first-order-recurrence.ll | 6 +-
.../AArch64/reduction-recurrence-costs-sve.ll | 12 +-
.../AArch64/sve-interleaved-accesses.ll | 2 +-
.../AArch64/sve-tail-folding-option.ll | 14 +-
.../tail-folding-fixed-order-recurrence.ll | 16 +-
.../first-order-recurrence-scalable-vf1.ll | 4 +-
.../scalable-first-order-recurrence.ll | 24 +-
llvm/test/Verifier/invalid-splice.ll | 4 +-
33 files changed, 514 insertions(+), 514 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 446603e287a9e..1af53f374c178 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20729,7 +20729,7 @@ Arguments:
All arguments must be vectors of the same type whereby their logical
concatenation matches the result type.
-'``llvm.vector.splice.down``' Intrinsic
+'``llvm.vector.splice.left``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -20738,14 +20738,14 @@ This is an overloaded intrinsic.
::
- declare <2 x double> @llvm.vector.splice.down.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
- declare <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+ declare <2 x double> @llvm.vector.splice.left.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
+ declare <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
Overview:
"""""""""
-The '``llvm.vector.splice.down.*``' intrinsics construct a vector by
-concatenating two vectors together, shifting the elements down by ``imm``, and
+The '``llvm.vector.splice.left.*``' intrinsics construct a vector by
+concatenating two vectors together, shifting the elements left by ``imm``, and
extracting the lower half.
These intrinsics work for both fixed and scalable vectors. While this intrinsic
@@ -20757,7 +20757,7 @@ For example:
.. code-block:: text
- llvm.vector.splice.down(<A,B,C,D>, <E,F,G,H>, 1);
+ llvm.vector.splice.left(<A,B,C,D>, <E,F,G,H>, 1);
==> <A,B,C,D,E,F,G,H>
==> <B,C,D,E,F,G,H,_>
==> <B,C,D,E>
@@ -20780,14 +20780,14 @@ This is an overloaded intrinsic.
::
- declare <2 x double> @llvm.vector.splice.up.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
- declare <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+ declare <2 x double> @llvm.vector.splice.right.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
+ declare <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
Overview:
"""""""""
-The '``llvm.vector.splice.up.*``' intrinsics construct a vector by
-concatenating two vectors together, shifting the elements up by ``imm``, and
+The '``llvm.vector.splice.right.*``' intrinsics construct a vector by
+concatenating two vectors together, shifting the elements right by ``imm``, and
extracting the upper half.
These intrinsics work for both fixed and scalable vectors. While this intrinsic
@@ -20799,7 +20799,7 @@ For example:
.. code-block:: text
- llvm.vector.splice.up(<A,B,C,D>, <E,F,G,H>, 1);
+ llvm.vector.splice.right(<A,B,C,D>, <E,F,G,H>, 1);
==> <A,B,C,D,E,F,G,H>
==> <_,A,B,C,D,E,F,G>
==> <D,E,F,G>
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 43f9008edf6e1..8a6b48e3fe902 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2003,13 +2003,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
cast<VectorType>(Args[0]->getType()), {}, CostKind, Index,
cast<VectorType>(Args[1]->getType()));
}
- case Intrinsic::vector_splice_down:
- case Intrinsic::vector_splice_up: {
+ case Intrinsic::vector_splice_left:
+ case Intrinsic::vector_splice_right: {
unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
return thisT()->getShuffleCost(
TTI::SK_Splice, cast<VectorType>(RetTy),
cast<VectorType>(Args[0]->getType()), {}, CostKind,
- IID == Intrinsic::vector_splice_down ? Index : -Index,
+ IID == Intrinsic::vector_splice_left ? Index : -Index,
cast<VectorType>(RetTy));
}
case Intrinsic::vector_reduce_add:
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 922d9fa79ceed..78acb85a7773f 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -641,16 +641,12 @@ enum NodeType {
/// in terms of the element size of VEC1/VEC2, not in terms of bytes.
VECTOR_SHUFFLE,
- /// VECTOR_SPLICE_DOWN(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1, VEC2)
- /// down by OFFSET elements and returns the lower half. If OFFSET is greater
- /// than the runtime number of elements in the result type the result is
- /// poison.
- VECTOR_SPLICE_DOWN,
- /// VECTOR_SPLICE_UP(VEC1, VEC2, OFFSET) - Shifts CONCAT_VECTORS(VEC1, VEC2)
- /// up by OFFSET elements and returns the upper half. If OFFSET is greater
- /// than the runtime number of elements in the result type the result is
- /// poison.
- VECTOR_SPLICE_UP,
+ /// VECTOR_SPLICE_LEFT(VEC1, VEC2, IMM) - Shifts CONCAT_VECTORS(VEC1, VEC2)
+ /// left by IMM elements and returns the lower half.
+ VECTOR_SPLICE_LEFT,
+ /// VECTOR_SPLICE_RIGHT(VEC1, VEC2, IMM) - Shifts CONCAT_VECTORS(VEC1, VEC2)
+ /// right by IMM elements and returns the upper half.
+ VECTOR_SPLICE_RIGHT,
/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a
/// scalar value into element 0 of the resultant vector type. The top
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 1aaa41464b577..99f96f5e6e0fe 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2808,12 +2808,12 @@ def int_vector_reverse : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[IntrNoMem,
IntrSpeculatable]>;
-def int_vector_splice_down
+def int_vector_splice_left
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
-def int_vector_splice_up
+def int_vector_splice_right
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index abd6d1435d8f6..6b063cbd776c2 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -832,8 +832,8 @@ def ist : SDNode<"ISD::STORE" , SDTIStore,
def vector_shuffle : SDNode<"ISD::VECTOR_SHUFFLE", SDTVecShuffle, []>;
def vector_reverse : SDNode<"ISD::VECTOR_REVERSE", SDTVecReverse>;
-def vector_splice_down : SDNode<"ISD::VECTOR_SPLICE_DOWN", SDTVecSlice, []>;
-def vector_splice_up : SDNode<"ISD::VECTOR_SPLICE_UP", SDTVecSlice, []>;
+def vector_splice_left : SDNode<"ISD::VECTOR_SPLICE_LEFT", SDTVecSlice, []>;
+def vector_splice_right : SDNode<"ISD::VECTOR_SPLICE_RIGHT", SDTVecSlice, []>;
def build_vector : SDNode<"ISD::BUILD_VECTOR", SDTypeProfile<1, -1, []>, []>;
def splat_vector : SDNode<"ISD::SPLAT_VECTOR", SDTypeProfile<1, 1, []>, []>;
def step_vector : SDNode<"ISD::STEP_VECTOR", SDTypeProfile<1, 1,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 9ef076976dcb1..494ca598cf2d2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3706,8 +3706,8 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Tmp1);
break;
}
- case ISD::VECTOR_SPLICE_DOWN:
- case ISD::VECTOR_SPLICE_UP: {
+ case ISD::VECTOR_SPLICE_LEFT:
+ case ISD::VECTOR_SPLICE_RIGHT: {
Results.push_back(TLI.expandVectorSplice(Node, DAG));
break;
}
@@ -5641,8 +5641,8 @@ void SelectionDAGLegalize::PromoteNode(SDNode *Node) {
Results.push_back(Tmp1);
break;
}
- case ISD::VECTOR_SPLICE_DOWN:
- case ISD::VECTOR_SPLICE_UP: {
+ case ISD::VECTOR_SPLICE_LEFT:
+ case ISD::VECTOR_SPLICE_RIGHT: {
Tmp1 = DAG.getNode(ISD::ANY_EXTEND, dl, NVT, Node->getOperand(0));
Tmp2 = DAG.getNode(ISD::ANY_EXTEND, dl, NVT, Node->getOperand(1));
Tmp3 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index c1b0b5ee71d73..da3d9f7a7eded 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -132,8 +132,8 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VECTOR_REVERSE(N); break;
case ISD::VECTOR_SHUFFLE:
Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;
- case ISD::VECTOR_SPLICE_DOWN:
- case ISD::VECTOR_SPLICE_UP:
+ case ISD::VECTOR_SPLICE_LEFT:
+ case ISD::VECTOR_SPLICE_RIGHT:
Res = PromoteIntRes_VECTOR_SPLICE(N);
break;
case ISD::VECTOR_INTERLEAVE:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 8a3d965ccce72..88927ea89fc9d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1258,8 +1258,8 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::VECTOR_SHUFFLE:
SplitVecRes_VECTOR_SHUFFLE(cast<ShuffleVectorSDNode>(N), Lo, Hi);
break;
- case ISD::VECTOR_SPLICE_DOWN:
- case ISD::VECTOR_SPLICE_UP:
+ case ISD::VECTOR_SPLICE_LEFT:
+ case ISD::VECTOR_SPLICE_RIGHT:
SplitVecRes_VECTOR_SPLICE(N, Lo, Hi);
break;
case ISD::VECTOR_DEINTERLEAVE:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 5c6becd72dea4..cac559f308613 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8181,11 +8181,11 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
break;
case ISD::VECTOR_SHUFFLE:
llvm_unreachable("should use getVectorShuffle constructor!");
- case ISD::VECTOR_SPLICE_DOWN:
+ case ISD::VECTOR_SPLICE_LEFT:
if (isNullConstant(N3))
return N1;
break;
- case ISD::VECTOR_SPLICE_UP:
+ case ISD::VECTOR_SPLICE_RIGHT:
if (isNullConstant(N3))
return N2;
break;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 025a1ce33ce67..e4a3e60bf3156 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8355,8 +8355,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::vector_reverse:
visitVectorReverse(I);
return;
- case Intrinsic::vector_splice_down:
- case Intrinsic::vector_splice_up:
+ case Intrinsic::vector_splice_left:
+ case Intrinsic::vector_splice_right:
visitVectorSplice(I);
return;
case Intrinsic::callbr_landingpad:
@@ -12893,21 +12893,22 @@ void SelectionDAGBuilder::visitVectorSplice(const CallInst &I) {
SDValue V1 = getValue(I.getOperand(0));
SDValue V2 = getValue(I.getOperand(1));
uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getSExtValue();
- const bool IsDown = I.getIntrinsicID() == Intrinsic::vector_splice_down;
+ const bool IsLeft = I.getIntrinsicID() == Intrinsic::vector_splice_left;
// VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
if (VT.isScalableVector()) {
- setValue(&I, DAG.getNode(
- IsDown ? ISD::VECTOR_SPLICE_DOWN : ISD::VECTOR_SPLICE_UP,
- DL, VT, V1, V2,
- DAG.getConstant(Imm, DL,
- TLI.getVectorIdxTy(DAG.getDataLayout()))));
+ setValue(
+ &I,
+ DAG.getNode(
+ IsLeft ? ISD::VECTOR_SPLICE_LEFT : ISD::VECTOR_SPLICE_RIGHT, DL, VT,
+ V1, V2,
+ DAG.getConstant(Imm, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))));
return;
}
unsigned NumElts = VT.getVectorNumElements();
- uint64_t Idx = (NumElts + (IsDown ? Imm : -Imm)) % NumElts;
+ uint64_t Idx = (NumElts + (IsLeft ? Imm : -Imm)) % NumElts;
// Use VECTOR_SHUFFLE to maintain original behaviour for fixed-length vectors.
SmallVector<int, 8> Mask;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 8d7f202c41947..bb65ecd34c684 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -348,8 +348,8 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::VECTOR_INTERLEAVE: return "vector_interleave";
case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";
case ISD::VECTOR_SHUFFLE: return "vector_shuffle";
- case ISD::VECTOR_SPLICE_DOWN: return "vector_splice_down";
- case ISD::VECTOR_SPLICE_UP: return "vector_splice_up";
+ case ISD::VECTOR_SPLICE_LEFT: return "vector_splice_left";
+ case ISD::VECTOR_SPLICE_RIGHT: return "vector_splice_right";
case ISD::SPLAT_VECTOR: return "splat_vector";
case ISD::SPLAT_VECTOR_PARTS: return "splat_vector_parts";
case ISD::VECTOR_REVERSE: return "vector_reverse";
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 54440eb249327..c7b3f0cbb5ee9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -11905,8 +11905,8 @@ SDValue TargetLowering::expandFP_ROUND(SDNode *Node, SelectionDAG &DAG) const {
SDValue TargetLowering::expandVectorSplice(SDNode *Node,
SelectionDAG &DAG) const {
- assert((Node->getOpcode() == ISD::VECTOR_SPLICE_DOWN ||
- Node->getOpcode() == ISD::VECTOR_SPLICE_UP) &&
+ assert((Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT ||
+ Node->getOpcode() == ISD::VECTOR_SPLICE_RIGHT) &&
"Unexpected opcode!");
assert(Node->getValueType(0).isScalableVector() &&
"Fixed length vector types expected to use SHUFFLE_VECTOR!");
@@ -11945,7 +11945,7 @@ SDValue TargetLowering::expandVectorSplice(SDNode *Node,
SDValue StackPtr2 = DAG.getNode(ISD::ADD, DL, PtrVT, StackPtr, VTBytes);
SDValue StoreV2 = DAG.getStore(StoreV1, DL, V2, StackPtr2, PtrInfo);
- if (Node->getOpcode() == ISD::VECTOR_SPLICE_DOWN) {
+ if (Node->getOpcode() == ISD::VECTOR_SPLICE_LEFT) {
// Load back the required element. getVectorElementPointer takes care of
// clamping the index if it's out-of-bounds.
StackPtr = getVectorElementPointer(DAG, StackPtr, VT, Node->getOperand(2));
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 7894fc058687c..1ff7aa968724c 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1134,7 +1134,7 @@ void TargetLoweringBase::initActions() {
VT, Expand);
// Named vector shuffles default to expand.
- setOperationAction({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP}, VT,
+ setOperationAction({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, VT,
Expand);
// Only some target support this vector operation. Most need to expand it.
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 4e5249cc7d99e..33b07222bb255 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1669,7 +1669,7 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
return true;
}
if (Name.consume_front("vector.splice")) {
- if (Name.starts_with(".down") || Name.starts_with(".up"))
+ if (Name.starts_with(".left") || Name.starts_with(".right"))
break;
return true;
}
@@ -4685,8 +4685,9 @@ static Value *upgradeVectorSplice(CallBase *CI, IRBuilder<> &Builder) {
if (!Offset)
reportFatalUsageError("Invalid llvm.vector.splice offset argument");
int64_t OffsetVal = Offset->getSExtValue();
- return Builder.CreateIntrinsic(OffsetVal >= 0 ? Intrinsic::vector_splice_down
- : Intrinsic::vector_splice_up,
+ return Builder.CreateIntrinsic(OffsetVal >= 0
+ ? Intrinsic::vector_splice_left
+ : Intrinsic::vector_splice_right,
CI->getType(),
{CI->getArgOperand(0), CI->getArgOperand(1),
Builder.getInt32(std::abs(OffsetVal))});
@@ -4719,9 +4720,10 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
bool IsARM = Name.consume_front("arm.");
bool IsAMDGCN = Name.consume_front("amdgcn.");
bool IsDbg = Name.consume_front("dbg.");
- bool IsOldSplice = (Name.consume_front("experimental.vector.splice") ||
- Name.consume_front("vector.splice")) &&
- !(Name.starts_with(".down") || Name.starts_with(".up"));
+ bool IsOldSplice =
+ (Name.consume_front("experimental.vector.splice") ||
+ Name.consume_front("vector.splice")) &&
+ !(Name.starts_with(".left") || Name.starts_with(".right"));
Value *Rep = nullptr;
if (!IsX86 && Name == "stackprotectorcheck") {
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 5faf1f4fdfe14..a21a89898450a 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -1104,7 +1104,8 @@ Value *IRBuilderBase::CreateVectorSplice(Value *V1, Value *V2, int64_t Imm,
Module *M = BB->getParent()->getParent();
Function *F = Intrinsic::getOrInsertDeclaration(
M,
- Imm >= 0 ? Intrinsic::vector_splice_down : Intrinsic::vector_splice_up,
+ Imm >= 0 ? Intrinsic::vector_splice_left
+ : Intrinsic::vector_splice_right,
VTy);
Value *Ops[] = {V1, V2, getInt32(std::abs(Imm))};
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 198a699bd8fed..0e1a0d75e9856 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6566,8 +6566,8 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
break;
}
- case Intrinsic::vector_splice_down:
- case Intrinsic::vector_splice_up: {
+ case Intrinsic::vector_splice_left:
+ case Intrinsic::vector_splice_right: {
VectorType *VecTy = cast<VectorType>(Call.getType());
uint64_t Idx = cast<ConstantInt>(Call.getArgOperand(2))->getZExtValue();
uint64_t KnownMinNumElements = VecTy->getElementCount().getKnownMinValue();
@@ -6577,7 +6577,7 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
if (Attrs.hasFnAttr(Attribute::VScaleRange))
KnownMinNumElements *= Attrs.getFnAttrs().getVScaleRangeMin();
}
- if (ID == Intrinsic::vector_splice_down)
+ if (ID == Intrinsic::vector_splice_left)
Check(Idx < KnownMinNumElements,
"The splice index exceeds the range [0, VL-1] where VL is the "
"known minimum number of elements in the vector. For scalable "
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2df2705aa4197..38886648a3e91 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1543,8 +1543,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::MULHS, VT, Custom);
setOperationAction(ISD::MULHU, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Legal);
- setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_LEFT, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_RIGHT, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::SETCC, VT, Custom);
setOperationAction(ISD::SDIV, VT, Custom);
@@ -1732,8 +1732,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECREDUCE_FMAXIMUM, VT, Custom);
setOperationAction(ISD::VECREDUCE_FMINIMUM, VT, Custom);
setOperationAction(ISD::VECREDUCE_FMUL, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_LEFT, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_RIGHT, VT, Custom);
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
@@ -1787,8 +1787,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SPLAT_VECTOR, VT, Legal);
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Custom);
- setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_LEFT, VT, Custom);
+ setOperationAction(ISD::VECTOR_SPLICE_RIGHT, VT, Custom);
}
if (Subtarget->hasSVEB16B16() &&
@@ -1914,13 +1914,13 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
}
- setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
MVT::nxv2i1, MVT::nxv2i64);
- setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
MVT::nxv4i1, MVT::nxv4i32);
- setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
MVT::nxv8i1, MVT::nxv8i16);
- setOperationPromotedToType({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
MVT::nxv16i1, MVT::nxv16i8);
setOperationAction(ISD::VSCALE, MVT::i32, Custom);
@@ -2428,8 +2428,8 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
setOperationAction(ISD::VECREDUCE_UMIN, VT, Default);
setOperationAction(ISD::VECREDUCE_XOR, VT, Default);
setOperationAction(ISD::VECTOR_SHUFFLE, VT, Default);
- setOperationAction(ISD::VECTOR_SPLICE_DOWN, VT, Default);
- setOperationAction(ISD::VECTOR_SPLICE_UP, VT, Default);
+ setOperationAction(ISD::VECTOR_SPLICE_LEFT, VT, Default);
+ setOperationAction(ISD::VECTOR_SPLICE_RIGHT, VT, Default);
setOperationAction(ISD::VSELECT, VT, Default);
setOperationAction(ISD::XOR, VT, Default);
setOperationAction(ISD::ZERO_EXTEND, VT, Default);
@@ -8088,8 +8088,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);
case ISD::CTTZ:
return LowerCTTZ(Op, DAG);
- case ISD::VECTOR_SPLICE_DOWN:
- case ISD::VECTOR_SPLICE_UP:
+ case ISD::VECTOR_SPLICE_LEFT:
+ case ISD::VECTOR_SPLICE_RIGHT:
return LowerVECTOR_SPLICE(Op, DAG);
case ISD::VECTOR_DEINTERLEAVE:
return LowerVECTOR_DEINTERLEAVE(Op, DAG);
@@ -12289,7 +12289,7 @@ SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
// there are enough elements in the vector, hence we check the index <= min
// number of elements.
std::optional<unsigned> PredPattern;
- if (Ty.isScalableVector() && Op.getOpcode() == ISD::VECTOR_SPLICE_UP &&
+ if (Ty.isScalableVector() && Op.getOpcode() == ISD::VECTOR_SPLICE_RIGHT &&
(PredPattern = getSVEPredPatternFromNumElements(IdxVal)) !=
std::nullopt) {
SDLoc DL(Op);
@@ -12306,7 +12306,7 @@ SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
// We can select to an EXT instruction when indexing the first 256 bytes.
unsigned BlockSize = AArch64::SVEBitsPerBlock / Ty.getVectorMinNumElements();
- if (Op.getOpcode() == ISD::VECTOR_SPLICE_DOWN &&
+ if (Op.getOpcode() == ISD::VECTOR_SPLICE_LEFT &&
(IdxVal * BlockSize / 8) < 256)
return Op;
@@ -16409,7 +16409,7 @@ SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,
assert(InVT.isScalableVector() && "Unexpected vector type!");
// Move requested subvector to the start of the vector and try again.
SDValue Splice =
- DAG.getNode(ISD::VECTOR_SPLICE_DOWN, DL, InVT, Vec, Vec, Idx);
+ DAG.getNode(ISD::VECTOR_SPLICE_LEFT, DL, InVT, Vec, Vec, Idx);
return convertFromScalableVector(DAG, VT, Splice);
}
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 6908d0cb476fe..6829fc8b8bdbd 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -2153,50 +2153,50 @@ let Predicates = [HasSVE_or_SME] in {
(UZP1_ZZZ_H $v1, $v2)>;
// Splice up with offset equal to 1
- def : Pat<(nxv16i8 (vector_splice_up nxv16i8:$Z1, nxv16i8:$Z2, (i64 1))),
+ def : Pat<(nxv16i8 (vector_splice_right nxv16i8:$Z1, nxv16i8:$Z2, (i64 1))),
(INSR_ZV_B ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_B (PTRUE_B 31), ZPR:$Z1), bsub))>;
- def : Pat<(nxv8i16 (vector_splice_up nxv8i16:$Z1, nxv8i16:$Z2, (i64 1))),
+ def : Pat<(nxv8i16 (vector_splice_right nxv8i16:$Z1, nxv8i16:$Z2, (i64 1))),
(INSR_ZV_H ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_H (PTRUE_H 31), ZPR:$Z1), hsub))>;
- def : Pat<(nxv4i32 (vector_splice_up nxv4i32:$Z1, nxv4i32:$Z2, (i64 1))),
+ def : Pat<(nxv4i32 (vector_splice_right nxv4i32:$Z1, nxv4i32:$Z2, (i64 1))),
(INSR_ZV_S ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_S (PTRUE_S 31), ZPR:$Z1), ssub))>;
- def : Pat<(nxv2i64 (vector_splice_up nxv2i64:$Z1, nxv2i64:$Z2, (i64 1))),
+ def : Pat<(nxv2i64 (vector_splice_right nxv2i64:$Z1, nxv2i64:$Z2, (i64 1))),
(INSR_ZV_D ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_D (PTRUE_D 31), ZPR:$Z1), dsub))>;
// Splice down
foreach VT = [nxv16i8] in {
- def : Pat<(VT(vector_splice_down VT:$Z1, VT:$Z2,
+ def : Pat<(VT(vector_splice_left VT:$Z1, VT:$Z2,
(i64(sve_ext_imm_0_255 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_255 i32:$index)))),
+ def : Pat<(VT (vector_splice_left VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_255 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
foreach VT = [nxv8i16, nxv8f16, nxv8bf16] in {
- def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_127 i32:$index)))),
+ def : Pat<(VT (vector_splice_left VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_127 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_127 i32:$index)))),
+ def : Pat<(VT (vector_splice_left VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_127 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
foreach VT = [nxv4i32, nxv4f16, nxv4f32, nxv4bf16] in {
- def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_63 i32:$index)))),
+ def : Pat<(VT (vector_splice_left VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_63 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_63 i32:$index)))),
+ def : Pat<(VT (vector_splice_left VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_63 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
foreach VT = [nxv2i64, nxv2f16, nxv2f32, nxv2f64, nxv2bf16] in {
- def : Pat<(VT( vector_splice_down VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_31 i32:$index)))),
+ def : Pat<(VT( vector_splice_left VT:$Z1, VT:$Z2, (i64(sve_ext_imm_0_31 i32:$index)))),
(EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
let AddedComplexity = 1 in
- def : Pat<(VT (vector_splice_down VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_31 i32:$index)))),
+ def : Pat<(VT (vector_splice_left VT:$Z1, VT:$Z1, (i64(sve_ext_imm_0_31 i32:$index)))),
(EXT_ZZI_CONSTRUCTIVE ZPR:$Z1, imm0_255:$index)>;
}
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 9c4906001fb82..901ffc30c4c94 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -912,7 +912,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::EXPERIMENTAL_VP_SPLAT, VT, Custom);
setOperationPromotedToType(
- {ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP}, VT,
+ {ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, VT,
MVT::getVectorVT(MVT::i8, VT.getVectorElementCount()));
}
@@ -1000,7 +1000,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- setOperationAction({ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP}, VT,
+ setOperationAction({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, VT,
Custom);
if (Subtarget.hasStdExtZvkb()) {
@@ -1202,7 +1202,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
setOperationAction(
- {ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP},
+ {ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_REVERSE, VT, Custom);
@@ -1249,8 +1249,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction({ISD::INSERT_VECTOR_ELT, ISD::CONCAT_VECTORS,
ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR,
ISD::VECTOR_DEINTERLEAVE, ISD::VECTOR_INTERLEAVE,
- ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_DOWN,
- ISD::VECTOR_SPLICE_UP, ISD::VECTOR_COMPRESS},
+ ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_LEFT,
+ ISD::VECTOR_SPLICE_RIGHT, ISD::VECTOR_COMPRESS},
VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_REVERSE, VT, Custom);
@@ -1305,7 +1305,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
ISD::CONCAT_VECTORS, ISD::INSERT_SUBVECTOR,
ISD::EXTRACT_SUBVECTOR, ISD::VECTOR_DEINTERLEAVE,
ISD::VECTOR_INTERLEAVE, ISD::VECTOR_REVERSE,
- ISD::VECTOR_SPLICE_DOWN, ISD::VECTOR_SPLICE_UP,
+ ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT,
ISD::VECTOR_COMPRESS},
VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
@@ -8288,8 +8288,8 @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
return lowerSTEP_VECTOR(Op, DAG);
case ISD::VECTOR_REVERSE:
return lowerVECTOR_REVERSE(Op, DAG);
- case ISD::VECTOR_SPLICE_DOWN:
- case ISD::VECTOR_SPLICE_UP:
+ case ISD::VECTOR_SPLICE_LEFT:
+ case ISD::VECTOR_SPLICE_RIGHT:
return lowerVECTOR_SPLICE(Op, DAG);
case ISD::BUILD_VECTOR: {
MVT VT = Op.getSimpleValueType();
@@ -13085,7 +13085,7 @@ SDValue RISCVTargetLowering::lowerVECTOR_SPLICE(SDValue Op,
SDValue VLMax = computeVLMax(VecVT, DL, DAG);
SDValue DownOffset, UpOffset;
- if (Op.getOpcode() == ISD::VECTOR_SPLICE_DOWN) {
+ if (Op.getOpcode() == ISD::VECTOR_SPLICE_LEFT) {
// The operand is a TargetConstant, we need to rebuild it as a regular
// constant.
DownOffset = Offset;
diff --git a/llvm/test/Analysis/CostModel/AArch64/splice.ll b/llvm/test/Analysis/CostModel/AArch64/splice.ll
index 1667a6c91965c..1d3154ad82299 100644
--- a/llvm/test/Analysis/CostModel/AArch64/splice.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/splice.ll
@@ -5,34 +5,34 @@ target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
define void @vector_splice() #0 {
; CHECK-LABEL: 'vector_splice'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call <16 x i8> @llvm.vector.splice.down.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <32 x i8> @llvm.vector.splice.down.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = call <2 x i16> @llvm.vector.splice.down.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = call <4 x i16> @llvm.vector.splice.down.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = call <8 x i16> @llvm.vector.splice.down.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <16 x i16> @llvm.vector.splice.down.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <4 x i32> @llvm.vector.splice.down.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <8 x i32> @llvm.vector.splice.down.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = call <2 x i64> @llvm.vector.splice.down.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <4 x i64> @llvm.vector.splice.down.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = call <2 x half> @llvm.vector.splice.down.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %12 = call <4 x half> @llvm.vector.splice.down.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <8 x half> @llvm.vector.splice.down.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <16 x half> @llvm.vector.splice.down.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %15 = call <2 x float> @llvm.vector.splice.down.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %16 = call <4 x float> @llvm.vector.splice.down.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <8 x float> @llvm.vector.splice.down.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = call <2 x double> @llvm.vector.splice.down.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <4 x double> @llvm.vector.splice.down.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %20 = call <2 x bfloat> @llvm.vector.splice.down.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = call <4 x bfloat> @llvm.vector.splice.down.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %22 = call <8 x bfloat> @llvm.vector.splice.down.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <16 x bfloat> @llvm.vector.splice.down.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %24 = call <16 x i1> @llvm.vector.splice.down.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %25 = call <8 x i1> @llvm.vector.splice.down.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %26 = call <4 x i1> @llvm.vector.splice.down.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %27 = call <2 x i1> @llvm.vector.splice.down.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %28 = call <2 x i128> @llvm.vector.splice.down.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call <16 x i8> @llvm.vector.splice.left.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <32 x i8> @llvm.vector.splice.left.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = call <2 x i16> @llvm.vector.splice.left.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = call <4 x i16> @llvm.vector.splice.left.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = call <8 x i16> @llvm.vector.splice.left.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <16 x i16> @llvm.vector.splice.left.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <4 x i32> @llvm.vector.splice.left.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <8 x i32> @llvm.vector.splice.left.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = call <2 x i64> @llvm.vector.splice.left.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <4 x i64> @llvm.vector.splice.left.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = call <2 x half> @llvm.vector.splice.left.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %12 = call <4 x half> @llvm.vector.splice.left.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <8 x half> @llvm.vector.splice.left.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <16 x half> @llvm.vector.splice.left.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %15 = call <2 x float> @llvm.vector.splice.left.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %16 = call <4 x float> @llvm.vector.splice.left.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <8 x float> @llvm.vector.splice.left.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = call <2 x double> @llvm.vector.splice.left.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <4 x double> @llvm.vector.splice.left.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %20 = call <2 x bfloat> @llvm.vector.splice.left.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = call <4 x bfloat> @llvm.vector.splice.left.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %22 = call <8 x bfloat> @llvm.vector.splice.left.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <16 x bfloat> @llvm.vector.splice.left.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %24 = call <16 x i1> @llvm.vector.splice.left.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %25 = call <8 x i1> @llvm.vector.splice.left.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %26 = call <4 x i1> @llvm.vector.splice.left.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %27 = call <2 x i1> @llvm.vector.splice.left.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %28 = call <2 x i128> @llvm.vector.splice.left.v2i128(<2 x i128> zeroinitializer, <2 x i128> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%splice.v16i8 = call <16 x i8> @llvm.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
index e222399fa9cb7..d503918ce6f78 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
@@ -617,195 +617,195 @@ declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)
define void @vector_splice() #0 {
; CHECK-VSCALE-1-LABEL: 'vector_splice'
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %11 = call <vscale x 2 x half> @llvm.vector.splice.down.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %12 = call <vscale x 4 x half> @llvm.vector.splice.down.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %13 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %14 = call <vscale x 16 x half> @llvm.vector.splice.down.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %15 = call <vscale x 2 x float> @llvm.vector.splice.down.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %16 = call <vscale x 4 x float> @llvm.vector.splice.down.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %17 = call <vscale x 8 x float> @llvm.vector.splice.down.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %18 = call <vscale x 2 x double> @llvm.vector.splice.down.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %19 = call <vscale x 4 x double> @llvm.vector.splice.down.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.down.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.down.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.down.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.down.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %41 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %42 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %43 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %44 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %46 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %47 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %48 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %50 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %51 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.up.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.up.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.up.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.up.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.up.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.left.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.left.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.left.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.left.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.left.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.left.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.left.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %11 = call <vscale x 2 x half> @llvm.vector.splice.left.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %12 = call <vscale x 4 x half> @llvm.vector.splice.left.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %13 = call <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %14 = call <vscale x 16 x half> @llvm.vector.splice.left.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %15 = call <vscale x 2 x float> @llvm.vector.splice.left.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %16 = call <vscale x 4 x float> @llvm.vector.splice.left.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %17 = call <vscale x 8 x float> @llvm.vector.splice.left.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %18 = call <vscale x 2 x double> @llvm.vector.splice.left.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %19 = call <vscale x 4 x double> @llvm.vector.splice.left.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.left.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.left.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 1 for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.left.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 2 for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.left.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.left.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.left.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.left.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.left.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.right.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.right.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.right.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.right.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.right.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.right.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.right.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.right.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.right.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.right.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %41 = call <vscale x 2 x half> @llvm.vector.splice.right.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %42 = call <vscale x 4 x half> @llvm.vector.splice.right.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %43 = call <vscale x 8 x half> @llvm.vector.splice.right.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %44 = call <vscale x 16 x half> @llvm.vector.splice.right.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.right.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %46 = call <vscale x 2 x float> @llvm.vector.splice.right.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %47 = call <vscale x 4 x float> @llvm.vector.splice.right.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %48 = call <vscale x 8 x float> @llvm.vector.splice.right.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.right.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %50 = call <vscale x 2 x double> @llvm.vector.splice.right.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %51 = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.right.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.right.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.right.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 3 for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.right.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of 6 for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.right.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.right.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.right.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-1-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; CHECK-VSCALE-2-LABEL: 'vector_splice'
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %11 = call <vscale x 2 x half> @llvm.vector.splice.down.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %12 = call <vscale x 4 x half> @llvm.vector.splice.down.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %13 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %14 = call <vscale x 16 x half> @llvm.vector.splice.down.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %15 = call <vscale x 2 x float> @llvm.vector.splice.down.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %16 = call <vscale x 4 x float> @llvm.vector.splice.down.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %17 = call <vscale x 8 x float> @llvm.vector.splice.down.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %18 = call <vscale x 2 x double> @llvm.vector.splice.down.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %19 = call <vscale x 4 x double> @llvm.vector.splice.down.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.down.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.down.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.down.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.down.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %41 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %42 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %43 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %44 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %46 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %47 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %48 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %50 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %51 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.up.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.up.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.up.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.up.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.up.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.left.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.left.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.left.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.left.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.left.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.left.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.left.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %11 = call <vscale x 2 x half> @llvm.vector.splice.left.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %12 = call <vscale x 4 x half> @llvm.vector.splice.left.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %13 = call <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %14 = call <vscale x 16 x half> @llvm.vector.splice.left.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %15 = call <vscale x 2 x float> @llvm.vector.splice.left.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %16 = call <vscale x 4 x float> @llvm.vector.splice.left.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %17 = call <vscale x 8 x float> @llvm.vector.splice.left.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %18 = call <vscale x 2 x double> @llvm.vector.splice.left.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %19 = call <vscale x 4 x double> @llvm.vector.splice.left.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.left.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.left.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 1 for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.left.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 2 for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.left.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.left.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.left.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.left.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:3 SizeLat:3 for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.left.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.right.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.right.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.right.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.right.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.right.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.right.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.right.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.right.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.right.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.right.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %41 = call <vscale x 2 x half> @llvm.vector.splice.right.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %42 = call <vscale x 4 x half> @llvm.vector.splice.right.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %43 = call <vscale x 8 x half> @llvm.vector.splice.right.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %44 = call <vscale x 16 x half> @llvm.vector.splice.right.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.right.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %46 = call <vscale x 2 x float> @llvm.vector.splice.right.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %47 = call <vscale x 4 x float> @llvm.vector.splice.right.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %48 = call <vscale x 8 x float> @llvm.vector.splice.right.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.right.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %50 = call <vscale x 2 x double> @llvm.vector.splice.right.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %51 = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.right.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.right.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.right.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 3 for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.right.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of 6 for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.right.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.right.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.right.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:6 CodeSize:5 Lat:5 SizeLat:5 for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
; CHECK-VSCALE-2-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; TYPE_BASED_ONLY-LABEL: 'vector_splice'
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %11 = call <vscale x 2 x half> @llvm.vector.splice.down.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %12 = call <vscale x 4 x half> @llvm.vector.splice.down.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %13 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %14 = call <vscale x 16 x half> @llvm.vector.splice.down.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %15 = call <vscale x 2 x float> @llvm.vector.splice.down.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %16 = call <vscale x 4 x float> @llvm.vector.splice.down.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %17 = call <vscale x 8 x float> @llvm.vector.splice.down.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %18 = call <vscale x 2 x double> @llvm.vector.splice.down.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %19 = call <vscale x 4 x double> @llvm.vector.splice.down.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.down.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.down.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.down.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.down.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %41 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %42 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %43 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %44 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %46 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %47 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %48 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %50 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %51 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.up.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.up.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.up.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.up.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
-; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.up.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %3 = call <vscale x 2 x i16> @llvm.vector.splice.left.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %4 = call <vscale x 4 x i16> @llvm.vector.splice.left.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %5 = call <vscale x 8 x i16> @llvm.vector.splice.left.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %6 = call <vscale x 16 x i16> @llvm.vector.splice.left.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %7 = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %8 = call <vscale x 8 x i32> @llvm.vector.splice.left.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %9 = call <vscale x 2 x i64> @llvm.vector.splice.left.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %10 = call <vscale x 4 x i64> @llvm.vector.splice.left.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %11 = call <vscale x 2 x half> @llvm.vector.splice.left.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %12 = call <vscale x 4 x half> @llvm.vector.splice.left.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %13 = call <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %14 = call <vscale x 16 x half> @llvm.vector.splice.left.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %15 = call <vscale x 2 x float> @llvm.vector.splice.left.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %16 = call <vscale x 4 x float> @llvm.vector.splice.left.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %17 = call <vscale x 8 x float> @llvm.vector.splice.left.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %18 = call <vscale x 2 x double> @llvm.vector.splice.left.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %19 = call <vscale x 4 x double> @llvm.vector.splice.left.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %20 = call <vscale x 2 x bfloat> @llvm.vector.splice.left.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %21 = call <vscale x 4 x bfloat> @llvm.vector.splice.left.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %22 = call <vscale x 8 x bfloat> @llvm.vector.splice.left.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %23 = call <vscale x 16 x bfloat> @llvm.vector.splice.left.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %24 = call <vscale x 16 x i1> @llvm.vector.splice.left.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %25 = call <vscale x 8 x i1> @llvm.vector.splice.left.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %26 = call <vscale x 4 x i1> @llvm.vector.splice.left.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %27 = call <vscale x 2 x i1> @llvm.vector.splice.left.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %28 = call <vscale x 16 x i8> @llvm.vector.splice.right.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %29 = call <vscale x 32 x i8> @llvm.vector.splice.right.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %30 = call <vscale x 1 x i16> @llvm.vector.splice.right.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %31 = call <vscale x 2 x i16> @llvm.vector.splice.right.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %32 = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %33 = call <vscale x 8 x i16> @llvm.vector.splice.right.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %34 = call <vscale x 16 x i16> @llvm.vector.splice.right.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %35 = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %36 = call <vscale x 8 x i32> @llvm.vector.splice.right.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %37 = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %38 = call <vscale x 2 x i64> @llvm.vector.splice.right.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %39 = call <vscale x 4 x i64> @llvm.vector.splice.right.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %40 = call <vscale x 1 x half> @llvm.vector.splice.right.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %41 = call <vscale x 2 x half> @llvm.vector.splice.right.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %42 = call <vscale x 4 x half> @llvm.vector.splice.right.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %43 = call <vscale x 8 x half> @llvm.vector.splice.right.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %44 = call <vscale x 16 x half> @llvm.vector.splice.right.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %45 = call <vscale x 1 x float> @llvm.vector.splice.right.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %46 = call <vscale x 2 x float> @llvm.vector.splice.right.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %47 = call <vscale x 4 x float> @llvm.vector.splice.right.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %48 = call <vscale x 8 x float> @llvm.vector.splice.right.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %49 = call <vscale x 1 x double> @llvm.vector.splice.right.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %50 = call <vscale x 2 x double> @llvm.vector.splice.right.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %51 = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %52 = call <vscale x 1 x bfloat> @llvm.vector.splice.right.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %53 = call <vscale x 2 x bfloat> @llvm.vector.splice.right.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %54 = call <vscale x 4 x bfloat> @llvm.vector.splice.right.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %55 = call <vscale x 8 x bfloat> @llvm.vector.splice.right.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %56 = call <vscale x 16 x bfloat> @llvm.vector.splice.right.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %57 = call <vscale x 16 x i1> @llvm.vector.splice.right.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %58 = call <vscale x 8 x i1> @llvm.vector.splice.right.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %59 = call <vscale x 4 x i1> @llvm.vector.splice.right.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %60 = call <vscale x 2 x i1> @llvm.vector.splice.right.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of Invalid for: %61 = call <vscale x 1 x i1> @llvm.vector.splice.right.nxv1i1(<vscale x 1 x i1> zeroinitializer, <vscale x 1 x i1> zeroinitializer, i32 1)
; TYPE_BASED_ONLY-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
index 8a2f8e18df805..56b8aa343310b 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
@@ -149,54 +149,54 @@ define void @vector_reverse() {
define void @vector_splice() {
; ARGBASED-LABEL: 'vector_splice'
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.left.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.left.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.left.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.left.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.left.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.left.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.left.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.left.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.left.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.left.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; ARGBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.left.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; ARGBASED-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; TYPEBASED-LABEL: 'vector_splice'
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.left.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.left.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.left.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.left.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.left.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.left.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.left.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.left.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.left.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.left.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; TYPEBASED-NEXT: Cost Model: Invalid cost for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.left.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'vector_splice'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.down.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.down.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.down.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.down.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.down.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.down.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.down.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.down.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.down.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.down.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.down.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.down.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.down.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 16 x i8> @llvm.vector.splice.left.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 32 x i8> @llvm.vector.splice.left.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 2 x i16> @llvm.vector.splice.left.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 4 x i16> @llvm.vector.splice.left.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <vscale x 8 x i16> @llvm.vector.splice.left.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <vscale x 16 x i16> @llvm.vector.splice.left.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <vscale x 4 x i32> @llvm.vector.splice.left.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 8 x i32> @llvm.vector.splice.left.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i64> @llvm.vector.splice.left.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i64> @llvm.vector.splice.left.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 16 x i1> @llvm.vector.splice.left.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 8 x i1> @llvm.vector.splice.left.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 4 x i1> @llvm.vector.splice.left.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <vscale x 2 x i1> @llvm.vector.splice.left.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
%splice_nxv16i8 = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
diff --git a/llvm/test/Analysis/CostModel/RISCV/splice.ll b/llvm/test/Analysis/CostModel/RISCV/splice.ll
index f40fb65a29144..e388a99be423b 100644
--- a/llvm/test/Analysis/CostModel/RISCV/splice.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/splice.ll
@@ -6,121 +6,121 @@
define void @vector_splice() {
; CHECK-LABEL: 'vector_splice'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.up.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.up.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 4 x i8> @llvm.vector.splice.up.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 8 x i8> @llvm.vector.splice.up.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %6 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %7 = call <vscale x 64 x i8> @llvm.vector.splice.up.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %12 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %13 = call <vscale x 32 x i16> @llvm.vector.splice.up.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %14 = call <vscale x 64 x i16> @llvm.vector.splice.up.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <vscale x 1 x i32> @llvm.vector.splice.up.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <vscale x 2 x i32> @llvm.vector.splice.up.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %17 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %18 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %19 = call <vscale x 16 x i32> @llvm.vector.splice.up.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %20 = call <vscale x 32 x i32> @llvm.vector.splice.up.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %21 = call <vscale x 64 x i32> @llvm.vector.splice.up.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %23 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %24 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %25 = call <vscale x 8 x i64> @llvm.vector.splice.up.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %26 = call <vscale x 16 x i64> @llvm.vector.splice.up.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %27 = call <vscale x 32 x i64> @llvm.vector.splice.up.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %28 = call <vscale x 64 x i64> @llvm.vector.splice.up.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %30 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %31 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %32 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %33 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %34 = call <vscale x 32 x bfloat> @llvm.vector.splice.up.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %35 = call <vscale x 64 x bfloat> @llvm.vector.splice.up.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %36 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %37 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %38 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %39 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %40 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %41 = call <vscale x 32 x half> @llvm.vector.splice.up.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %42 = call <vscale x 64 x half> @llvm.vector.splice.up.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %43 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %45 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %46 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %47 = call <vscale x 16 x float> @llvm.vector.splice.up.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %48 = call <vscale x 32 x float> @llvm.vector.splice.up.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %49 = call <vscale x 64 x float> @llvm.vector.splice.up.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %50 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %51 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %52 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %53 = call <vscale x 8 x double> @llvm.vector.splice.up.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.up.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.up.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.up.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.right.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.right.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 4 x i8> @llvm.vector.splice.right.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 8 x i8> @llvm.vector.splice.right.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 16 x i8> @llvm.vector.splice.right.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %6 = call <vscale x 32 x i8> @llvm.vector.splice.right.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %7 = call <vscale x 64 x i8> @llvm.vector.splice.right.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 1 x i16> @llvm.vector.splice.right.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i16> @llvm.vector.splice.right.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 8 x i16> @llvm.vector.splice.right.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %12 = call <vscale x 16 x i16> @llvm.vector.splice.right.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %13 = call <vscale x 32 x i16> @llvm.vector.splice.right.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %14 = call <vscale x 64 x i16> @llvm.vector.splice.right.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <vscale x 1 x i32> @llvm.vector.splice.right.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <vscale x 2 x i32> @llvm.vector.splice.right.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %17 = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %18 = call <vscale x 8 x i32> @llvm.vector.splice.right.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %19 = call <vscale x 16 x i32> @llvm.vector.splice.right.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %20 = call <vscale x 32 x i32> @llvm.vector.splice.right.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %21 = call <vscale x 64 x i32> @llvm.vector.splice.right.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %23 = call <vscale x 2 x i64> @llvm.vector.splice.right.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %24 = call <vscale x 4 x i64> @llvm.vector.splice.right.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %25 = call <vscale x 8 x i64> @llvm.vector.splice.right.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %26 = call <vscale x 16 x i64> @llvm.vector.splice.right.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %27 = call <vscale x 32 x i64> @llvm.vector.splice.right.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %28 = call <vscale x 64 x i64> @llvm.vector.splice.right.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 1 x bfloat> @llvm.vector.splice.right.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %30 = call <vscale x 2 x bfloat> @llvm.vector.splice.right.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %31 = call <vscale x 4 x bfloat> @llvm.vector.splice.right.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %32 = call <vscale x 8 x bfloat> @llvm.vector.splice.right.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %33 = call <vscale x 16 x bfloat> @llvm.vector.splice.right.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %34 = call <vscale x 32 x bfloat> @llvm.vector.splice.right.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %35 = call <vscale x 64 x bfloat> @llvm.vector.splice.right.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %36 = call <vscale x 1 x half> @llvm.vector.splice.right.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %37 = call <vscale x 2 x half> @llvm.vector.splice.right.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %38 = call <vscale x 4 x half> @llvm.vector.splice.right.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %39 = call <vscale x 8 x half> @llvm.vector.splice.right.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %40 = call <vscale x 16 x half> @llvm.vector.splice.right.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %41 = call <vscale x 32 x half> @llvm.vector.splice.right.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %42 = call <vscale x 64 x half> @llvm.vector.splice.right.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %43 = call <vscale x 1 x float> @llvm.vector.splice.right.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x float> @llvm.vector.splice.right.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %45 = call <vscale x 4 x float> @llvm.vector.splice.right.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %46 = call <vscale x 8 x float> @llvm.vector.splice.right.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %47 = call <vscale x 16 x float> @llvm.vector.splice.right.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %48 = call <vscale x 32 x float> @llvm.vector.splice.right.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %49 = call <vscale x 64 x float> @llvm.vector.splice.right.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %50 = call <vscale x 1 x double> @llvm.vector.splice.right.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %51 = call <vscale x 2 x double> @llvm.vector.splice.right.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %52 = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %53 = call <vscale x 8 x double> @llvm.vector.splice.right.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.right.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.right.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.right.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'vector_splice'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.up.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.up.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 4 x i8> @llvm.vector.splice.up.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 8 x i8> @llvm.vector.splice.up.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <vscale x 16 x i8> @llvm.vector.splice.up.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <vscale x 32 x i8> @llvm.vector.splice.up.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <vscale x 64 x i8> @llvm.vector.splice.up.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 1 x i16> @llvm.vector.splice.up.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i16> @llvm.vector.splice.up.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 8 x i16> @llvm.vector.splice.up.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 16 x i16> @llvm.vector.splice.up.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 32 x i16> @llvm.vector.splice.up.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %14 = call <vscale x 64 x i16> @llvm.vector.splice.up.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <vscale x 1 x i32> @llvm.vector.splice.up.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <vscale x 2 x i32> @llvm.vector.splice.up.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %18 = call <vscale x 8 x i32> @llvm.vector.splice.up.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <vscale x 16 x i32> @llvm.vector.splice.up.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %20 = call <vscale x 32 x i32> @llvm.vector.splice.up.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %21 = call <vscale x 64 x i32> @llvm.vector.splice.up.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %24 = call <vscale x 4 x i64> @llvm.vector.splice.up.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %25 = call <vscale x 8 x i64> @llvm.vector.splice.up.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %26 = call <vscale x 16 x i64> @llvm.vector.splice.up.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %27 = call <vscale x 32 x i64> @llvm.vector.splice.up.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %28 = call <vscale x 64 x i64> @llvm.vector.splice.up.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %29 = call <vscale x 1 x bfloat> @llvm.vector.splice.up.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %30 = call <vscale x 2 x bfloat> @llvm.vector.splice.up.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %31 = call <vscale x 4 x bfloat> @llvm.vector.splice.up.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %32 = call <vscale x 8 x bfloat> @llvm.vector.splice.up.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %33 = call <vscale x 16 x bfloat> @llvm.vector.splice.up.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %34 = call <vscale x 32 x bfloat> @llvm.vector.splice.up.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Invalid cost for instruction: %35 = call <vscale x 64 x bfloat> @llvm.vector.splice.up.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %36 = call <vscale x 1 x half> @llvm.vector.splice.up.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %37 = call <vscale x 2 x half> @llvm.vector.splice.up.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %38 = call <vscale x 4 x half> @llvm.vector.splice.up.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %39 = call <vscale x 8 x half> @llvm.vector.splice.up.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %40 = call <vscale x 16 x half> @llvm.vector.splice.up.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %41 = call <vscale x 32 x half> @llvm.vector.splice.up.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %42 = call <vscale x 64 x half> @llvm.vector.splice.up.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %43 = call <vscale x 1 x float> @llvm.vector.splice.up.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x float> @llvm.vector.splice.up.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %45 = call <vscale x 4 x float> @llvm.vector.splice.up.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %46 = call <vscale x 8 x float> @llvm.vector.splice.up.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %47 = call <vscale x 16 x float> @llvm.vector.splice.up.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %48 = call <vscale x 32 x float> @llvm.vector.splice.up.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %49 = call <vscale x 64 x float> @llvm.vector.splice.up.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %50 = call <vscale x 1 x double> @llvm.vector.splice.up.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %51 = call <vscale x 2 x double> @llvm.vector.splice.up.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %52 = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %53 = call <vscale x 8 x double> @llvm.vector.splice.up.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.up.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.up.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.up.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <vscale x 1 x i8> @llvm.vector.splice.right.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <vscale x 2 x i8> @llvm.vector.splice.right.nxv2i8(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <vscale x 4 x i8> @llvm.vector.splice.right.nxv4i8(<vscale x 4 x i8> zeroinitializer, <vscale x 4 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <vscale x 8 x i8> @llvm.vector.splice.right.nxv8i8(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <vscale x 16 x i8> @llvm.vector.splice.right.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <vscale x 32 x i8> @llvm.vector.splice.right.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <vscale x 64 x i8> @llvm.vector.splice.right.nxv64i8(<vscale x 64 x i8> zeroinitializer, <vscale x 64 x i8> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <vscale x 1 x i16> @llvm.vector.splice.right.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <vscale x 2 x i16> @llvm.vector.splice.right.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <vscale x 8 x i16> @llvm.vector.splice.right.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <vscale x 16 x i16> @llvm.vector.splice.right.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <vscale x 32 x i16> @llvm.vector.splice.right.nxv32i16(<vscale x 32 x i16> zeroinitializer, <vscale x 32 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %14 = call <vscale x 64 x i16> @llvm.vector.splice.right.nxv64i16(<vscale x 64 x i16> zeroinitializer, <vscale x 64 x i16> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <vscale x 1 x i32> @llvm.vector.splice.right.nxv1i32(<vscale x 1 x i32> zeroinitializer, <vscale x 1 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <vscale x 2 x i32> @llvm.vector.splice.right.nxv2i32(<vscale x 2 x i32> zeroinitializer, <vscale x 2 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %18 = call <vscale x 8 x i32> @llvm.vector.splice.right.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <vscale x 16 x i32> @llvm.vector.splice.right.nxv16i32(<vscale x 16 x i32> zeroinitializer, <vscale x 16 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %20 = call <vscale x 32 x i32> @llvm.vector.splice.right.nxv32i32(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %21 = call <vscale x 64 x i32> @llvm.vector.splice.right.nxv64i32(<vscale x 64 x i32> zeroinitializer, <vscale x 64 x i32> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> zeroinitializer, <vscale x 1 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 2 x i64> @llvm.vector.splice.right.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %24 = call <vscale x 4 x i64> @llvm.vector.splice.right.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %25 = call <vscale x 8 x i64> @llvm.vector.splice.right.nxv8i64(<vscale x 8 x i64> zeroinitializer, <vscale x 8 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %26 = call <vscale x 16 x i64> @llvm.vector.splice.right.nxv16i64(<vscale x 16 x i64> zeroinitializer, <vscale x 16 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %27 = call <vscale x 32 x i64> @llvm.vector.splice.right.nxv32i64(<vscale x 32 x i64> zeroinitializer, <vscale x 32 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %28 = call <vscale x 64 x i64> @llvm.vector.splice.right.nxv64i64(<vscale x 64 x i64> zeroinitializer, <vscale x 64 x i64> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %29 = call <vscale x 1 x bfloat> @llvm.vector.splice.right.nxv1bf16(<vscale x 1 x bfloat> zeroinitializer, <vscale x 1 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %30 = call <vscale x 2 x bfloat> @llvm.vector.splice.right.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %31 = call <vscale x 4 x bfloat> @llvm.vector.splice.right.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %32 = call <vscale x 8 x bfloat> @llvm.vector.splice.right.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %33 = call <vscale x 16 x bfloat> @llvm.vector.splice.right.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %34 = call <vscale x 32 x bfloat> @llvm.vector.splice.right.nxv32bf16(<vscale x 32 x bfloat> zeroinitializer, <vscale x 32 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Invalid cost for instruction: %35 = call <vscale x 64 x bfloat> @llvm.vector.splice.right.nxv64bf16(<vscale x 64 x bfloat> zeroinitializer, <vscale x 64 x bfloat> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %36 = call <vscale x 1 x half> @llvm.vector.splice.right.nxv1f16(<vscale x 1 x half> zeroinitializer, <vscale x 1 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %37 = call <vscale x 2 x half> @llvm.vector.splice.right.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %38 = call <vscale x 4 x half> @llvm.vector.splice.right.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %39 = call <vscale x 8 x half> @llvm.vector.splice.right.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %40 = call <vscale x 16 x half> @llvm.vector.splice.right.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %41 = call <vscale x 32 x half> @llvm.vector.splice.right.nxv32f16(<vscale x 32 x half> zeroinitializer, <vscale x 32 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %42 = call <vscale x 64 x half> @llvm.vector.splice.right.nxv64f16(<vscale x 64 x half> zeroinitializer, <vscale x 64 x half> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %43 = call <vscale x 1 x float> @llvm.vector.splice.right.nxv1f32(<vscale x 1 x float> zeroinitializer, <vscale x 1 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %44 = call <vscale x 2 x float> @llvm.vector.splice.right.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %45 = call <vscale x 4 x float> @llvm.vector.splice.right.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %46 = call <vscale x 8 x float> @llvm.vector.splice.right.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %47 = call <vscale x 16 x float> @llvm.vector.splice.right.nxv16f32(<vscale x 16 x float> zeroinitializer, <vscale x 16 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %48 = call <vscale x 32 x float> @llvm.vector.splice.right.nxv32f32(<vscale x 32 x float> zeroinitializer, <vscale x 32 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %49 = call <vscale x 64 x float> @llvm.vector.splice.right.nxv64f32(<vscale x 64 x float> zeroinitializer, <vscale x 64 x float> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %50 = call <vscale x 1 x double> @llvm.vector.splice.right.nxv1f64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %51 = call <vscale x 2 x double> @llvm.vector.splice.right.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %52 = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %53 = call <vscale x 8 x double> @llvm.vector.splice.right.nxv8f64(<vscale x 8 x double> zeroinitializer, <vscale x 8 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %54 = call <vscale x 16 x double> @llvm.vector.splice.right.nxv16f64(<vscale x 16 x double> zeroinitializer, <vscale x 16 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %55 = call <vscale x 32 x double> @llvm.vector.splice.right.nxv32f64(<vscale x 32 x double> zeroinitializer, <vscale x 32 x double> zeroinitializer, i32 1)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %56 = call <vscale x 64 x double> @llvm.vector.splice.right.nxv64f64(<vscale x 64 x double> zeroinitializer, <vscale x 64 x double> zeroinitializer, i32 1)
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
%splice.nxv1i8 = call <vscale x 1 x i8> @llvm.vector.splice.nxv1i8(<vscale x 1 x i8> zeroinitializer, <vscale x 1 x i8> zeroinitializer, i32 -1)
diff --git a/llvm/test/Assembler/auto_upgrade_intrinsics.ll b/llvm/test/Assembler/auto_upgrade_intrinsics.ll
index a8eb644a0d03a..922b0fe832f79 100644
--- a/llvm/test/Assembler/auto_upgrade_intrinsics.ll
+++ b/llvm/test/Assembler/auto_upgrade_intrinsics.ll
@@ -218,11 +218,11 @@ define void @test.prefetch.unnamed(ptr %ptr) {
define void @test.vector.splice(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @test.vector.splice
-; CHECK: @llvm.vector.splice.down.v4i32(<4 x i32> %a, <4 x i32> %b, i32 3)
+; CHECK: @llvm.vector.splice.left.v4i32(<4 x i32> %a, <4 x i32> %b, i32 3)
call <4 x i32> @llvm.vector.splice(<4 x i32> %a, <4 x i32> %b, i32 3)
-; CHECK: @llvm.vector.splice.up.v4i32(<4 x i32> %a, <4 x i32> %b, i32 2)
+; CHECK: @llvm.vector.splice.right.v4i32(<4 x i32> %a, <4 x i32> %b, i32 2)
call <4 x i32> @llvm.vector.splice(<4 x i32> %a, <4 x i32> %b, i32 -2)
-; CHECK: @llvm.vector.splice.down.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
+; CHECK: @llvm.vector.splice.left.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
call <4 x i32> @llvm.vector.splice.v4i32(<4 x i32> %a, <4 x i32> %b, i32 1)
ret void
}
diff --git a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
index 844e54f32ad06..ba6cd7e672822 100644
--- a/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
+++ b/llvm/test/Bitcode/upgrade-vector-splice-intrinsic.ll
@@ -3,7 +3,7 @@
define <8 x half> @splice_fixed(<8 x half> %a, <8 x half> %b) {
; CHECK-LABEL: @splice_fixed
-; CHECK: %1 = call <8 x half> @llvm.vector.splice.down.v8f16(<8 x half> %a, <8 x half> %b, i32 2)
+; CHECK: %1 = call <8 x half> @llvm.vector.splice.left.v8f16(<8 x half> %a, <8 x half> %b, i32 2)
%res = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> %a, <8 x half> %b, i32 2)
ret <8 x half> %res
@@ -11,14 +11,14 @@ define <8 x half> @splice_fixed(<8 x half> %a, <8 x half> %b) {
define <vscale x 8 x half> @splice_scalable(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
; CHECK-LABEL: @splice_scalable
-; CHECK: %1 = call <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 2)
+; CHECK: %1 = call <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 2)
%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 2)
ret <vscale x 8 x half> %res
}
declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32 immarg)
-; CHECK: declare <8 x half> @llvm.vector.splice.down.v8f16(<8 x half>, <8 x half>, i32 immarg)
+; CHECK: declare <8 x half> @llvm.vector.splice.left.v8f16(<8 x half>, <8 x half>, i32 immarg)
declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
-; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.down.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
+; CHECK: declare <vscale x 8 x half> @llvm.vector.splice.left.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32 immarg)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
index a08d6d1513989..9a09724a592eb 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
@@ -22,8 +22,8 @@ define i32 @PR33613(ptr %b, double %j, i32 %d) #0 {
; CHECK-VF4UF2-LABEL: @PR33613
; CHECK-VF4UF2: vector.body
; CHECK-VF4UF2: %[[VEC_RECUR:.*]] = phi <vscale x 4 x double> [ {{.*}}, %vector.ph ], [ {{.*}}, %vector.body ]
-; CHECK-VF4UF2: %[[SPLICE1:.*]] = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> %[[VEC_RECUR]], <vscale x 4 x double> {{.*}}, i32 1)
-; CHECK-VF4UF2-NEXT: %[[SPLICE2:.*]] = call <vscale x 4 x double> @llvm.vector.splice.up.nxv4f64(<vscale x 4 x double> %{{.*}}, <vscale x 4 x double> %{{.*}}, i32 1)
+; CHECK-VF4UF2: %[[SPLICE1:.*]] = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> %[[VEC_RECUR]], <vscale x 4 x double> {{.*}}, i32 1)
+; CHECK-VF4UF2-NEXT: %[[SPLICE2:.*]] = call <vscale x 4 x double> @llvm.vector.splice.right.nxv4f64(<vscale x 4 x double> %{{.*}}, <vscale x 4 x double> %{{.*}}, i32 1)
; CHECK-VF4UF2-NOT: insertelement <vscale x 4 x double>
; CHECK-VF4UF2: middle.block
entry:
@@ -71,7 +71,7 @@ define void @PR34711(ptr %a, ptr %b, ptr %c, i64 %n) #0 {
; CHECK-VF4UF1: vector.body
; CHECK-VF4UF1: %[[VEC_RECUR:.*]] = phi <vscale x 4 x i16> [ %vector.recur.init, %vector.ph ], [ %[[MGATHER:.*]], %vector.body ]
; CHECK-VF4UF1: %[[MGATHER]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> {{.*}}, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison)
-; CHECK-VF4UF1-NEXT: %[[SPLICE:.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> %[[VEC_RECUR]], <vscale x 4 x i16> %[[MGATHER]], i32 1)
+; CHECK-VF4UF1-NEXT: %[[SPLICE:.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> %[[VEC_RECUR]], <vscale x 4 x i16> %[[MGATHER]], i32 1)
; CHECK-VF4UF1-NEXT: %[[SXT1:.*]] = sext <vscale x 4 x i16> %[[SPLICE]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: %[[SXT2:.*]] = sext <vscale x 4 x i16> %[[MGATHER]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: mul nsw <vscale x 4 x i32> %[[SXT2]], %[[SXT1]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll b/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
index b66018ef04c48..fd1cc2502c97b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll
@@ -91,10 +91,10 @@ define i32 @chained_recurrences(i32 %x, i64 %y, ptr %src.1, i32 %z, ptr %src.2)
; VSCALEFORTUNING2-NEXT: [[TMP24:%.*]] = load i32, ptr [[TMP8]], align 4
; VSCALEFORTUNING2-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP24]], i64 0
; VSCALEFORTUNING2-NEXT: [[BROADCAST_SPLAT7]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT6]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; VSCALEFORTUNING2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 1)
-; VSCALEFORTUNING2-NEXT: [[TMP26]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT7]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 1)
-; VSCALEFORTUNING2-NEXT: [[TMP27:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP25]], i32 1)
-; VSCALEFORTUNING2-NEXT: [[TMP28:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[TMP25]], <vscale x 4 x i32> [[TMP26]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP26]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT7]], <vscale x 4 x i32> [[BROADCAST_SPLAT7]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP27:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP25]], i32 1)
+; VSCALEFORTUNING2-NEXT: [[TMP28:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[TMP25]], <vscale x 4 x i32> [[TMP26]], i32 1)
; VSCALEFORTUNING2-NEXT: [[TMP29:%.*]] = or <vscale x 4 x i32> [[TMP27]], [[BROADCAST_SPLAT]]
; VSCALEFORTUNING2-NEXT: [[TMP30:%.*]] = or <vscale x 4 x i32> [[TMP28]], [[BROADCAST_SPLAT]]
; VSCALEFORTUNING2-NEXT: [[TMP31:%.*]] = shl <vscale x 4 x i32> [[TMP29]], splat (i32 1)
@@ -218,8 +218,8 @@ define i32 @chained_recurrences(i32 %x, i64 %y, ptr %src.1, i32 %z, ptr %src.2)
; PRED-NEXT: [[TMP28:%.*]] = load i32, ptr [[TMP12]], align 4
; PRED-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP28]], i64 0
; PRED-NEXT: [[BROADCAST_SPLAT6]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; PRED-NEXT: [[TMP29]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT6]], i32 1)
-; PRED-NEXT: [[TMP30:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP29]], i32 1)
+; PRED-NEXT: [[TMP29]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[BROADCAST_SPLAT6]], i32 1)
+; PRED-NEXT: [[TMP30:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP29]], i32 1)
; PRED-NEXT: [[TMP31:%.*]] = or <vscale x 4 x i32> [[TMP30]], [[BROADCAST_SPLAT]]
; PRED-NEXT: [[TMP32:%.*]] = shl <vscale x 4 x i32> [[TMP31]], splat (i32 1)
; PRED-NEXT: [[TMP33:%.*]] = or <vscale x 4 x i32> [[TMP32]], splat (i32 2)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
index 6ba0eb23e485c..d92c0e3a9a3f9 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
@@ -1293,7 +1293,7 @@ define void @PR34743(ptr %a, ptr %b, i64 %n) #1 {
; CHECK-NEXT: [[TMP21:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[A]], <vscale x 4 x i64> [[TMP19]]
; CHECK-NEXT: [[WIDE_MASKED_GATHER4]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> align 4 [[TMP22]], <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison), !alias.scope [[META34]]
-; CHECK-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]], i32 1)
+; CHECK-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]], i32 1)
; CHECK-NEXT: [[TMP24:%.*]] = sext <vscale x 4 x i16> [[TMP23]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP25:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP26:%.*]] = mul nsw <vscale x 4 x i32> [[TMP24]], [[TMP21]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
index 3ebc38679e203..25b3d44b48cd6 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
@@ -178,7 +178,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-NOTF-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-NOTF: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-NOTF: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-NOTF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-NOTF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-NOTF: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-NOTF: store <vscale x 4 x i32> %[[ADD]]
@@ -191,7 +191,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-NORED: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1>
; CHECK-TF-NORED: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-NORED: %[[LOAD]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0({{.*}} %[[ACTIVE_LANE_MASK]]
-; CHECK-TF-NORED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-TF-NORED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-NORED: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-NORED: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %[[ADD]], {{.*}} <vscale x 4 x i1> %[[ACTIVE_LANE_MASK]])
@@ -204,7 +204,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-NOREC-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-TF-NOREC: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-NOREC: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-TF-NOREC: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-TF-NOREC: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-NOREC: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-NOREC: store <vscale x 4 x i32> %[[ADD]]
@@ -217,7 +217,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-NOREV: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1>
; CHECK-TF-NOREV: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-NOREV: %[[LOAD]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0({{.*}} %[[ACTIVE_LANE_MASK]]
-; CHECK-TF-NOREV: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-TF-NOREV: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-NOREV: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-NOREV: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %[[ADD]], {{.*}} <vscale x 4 x i1> %[[ACTIVE_LANE_MASK]])
@@ -230,7 +230,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF: %[[ACTIVE_LANE_MASK:.*]] = phi <vscale x 4 x i1>
; CHECK-TF: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF: %[[LOAD]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0({{.*}} %[[ACTIVE_LANE_MASK]]
-; CHECK-TF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-TF: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %[[ADD]], {{.*}} <vscale x 4 x i1> %[[ACTIVE_LANE_MASK]])
@@ -243,7 +243,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-TF-ONLYRED-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-TF-ONLYRED: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-TF-ONLYRED: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-TF-ONLYRED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-TF-ONLYRED: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-TF-ONLYRED: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-TF-ONLYRED: store <vscale x 4 x i32> %[[ADD]]
@@ -256,7 +256,7 @@ define void @add_recur(ptr noalias %dst, ptr noalias %src, i64 %n) #0 {
; CHECK-NEOVERSE-V1-NOT: %{{.*}} = phi <vscale x 4 x i1>
; CHECK-NEOVERSE-V1: %[[VECTOR_RECUR:.*]] = phi <vscale x 4 x i32> [ %[[RECUR_INIT]], %vector.ph ], [ %[[LOAD:.*]], %vector.body ]
; CHECK-NEOVERSE-V1: %[[LOAD]] = load <vscale x 4 x i32>
-; CHECK-NEOVERSE-V1: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
+; CHECK-NEOVERSE-V1: %[[SPLICE:.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> %[[VECTOR_RECUR]], <vscale x 4 x i32> %[[LOAD]], i32 1)
; CHECK-NEOVERSE-V1: %[[ADD:.*]] = add nsw <vscale x 4 x i32> %[[LOAD]], %[[SPLICE]]
; CHECK-NEOVERSE-V1: store <vscale x 4 x i32> %[[ADD]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll
index 65cf3f161df93..a0b95db2f3552 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll
@@ -67,7 +67,7 @@ define void @first_order_recurrence(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
-; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; NO-VP-NEXT: [[TMP13:%.*]] = add nsw <vscale x 4 x i32> [[TMP12]], [[WIDE_LOAD]]
; NO-VP-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 4 x i32> [[TMP13]], ptr [[TMP14]], align 4
@@ -187,8 +187,8 @@ define void @second_order_recurrence(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR2:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT1]], %[[VECTOR_PH]] ], [ [[TMP15:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP13:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP13]], align 4
-; NO-VP-NEXT: [[TMP15]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
-; NO-VP-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP15]], i32 1)
+; NO-VP-NEXT: [[TMP15]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; NO-VP-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP15]], i32 1)
; NO-VP-NEXT: [[TMP17:%.*]] = add nsw <vscale x 4 x i32> [[TMP15]], [[TMP16]]
; NO-VP-NEXT: [[TMP18:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 4 x i32> [[TMP17]], ptr [[TMP18]], align 4
@@ -327,9 +327,9 @@ define void @third_order_recurrence(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR4:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT3]], %[[VECTOR_PH]] ], [ [[TMP19:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP16:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP16]], align 4
-; NO-VP-NEXT: [[TMP18]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
-; NO-VP-NEXT: [[TMP19]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP18]], i32 1)
-; NO-VP-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP19]], i32 1)
+; NO-VP-NEXT: [[TMP18]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; NO-VP-NEXT: [[TMP19]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR2]], <vscale x 4 x i32> [[TMP18]], i32 1)
+; NO-VP-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR4]], <vscale x 4 x i32> [[TMP19]], i32 1)
; NO-VP-NEXT: [[TMP21:%.*]] = add nsw <vscale x 4 x i32> [[TMP19]], [[TMP20]]
; NO-VP-NEXT: [[TMP22:%.*]] = add <vscale x 4 x i32> [[TMP21]], [[TMP18]]
; NO-VP-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
@@ -467,7 +467,7 @@ define i32 @FOR_reduction(ptr noalias %A, ptr noalias %B, i64 %TC) {
; NO-VP-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 4 x i32> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
-; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; NO-VP-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; NO-VP-NEXT: [[TMP13:%.*]] = add nsw <vscale x 4 x i32> [[TMP12]], [[WIDE_LOAD]]
; NO-VP-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw i32, ptr [[B]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 4 x i32> [[TMP13]], ptr [[TMP14]], align 4
@@ -591,7 +591,7 @@ define void @first_order_recurrence_indvar(ptr noalias %A, i64 %TC) {
; NO-VP-NEXT: [[VEC_IND:%.*]] = phi <vscale x 2 x i64> [ [[INDUCTION]], %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 2 x i64> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[TMP12:%.*]], %[[VECTOR_BODY]] ]
; NO-VP-NEXT: [[TMP12]] = add <vscale x 2 x i64> [[VEC_IND]], splat (i64 42)
-; NO-VP-NEXT: [[TMP13:%.*]] = call <vscale x 2 x i64> @llvm.vector.splice.up.nxv2i64(<vscale x 2 x i64> [[VECTOR_RECUR]], <vscale x 2 x i64> [[TMP12]], i32 1)
+; NO-VP-NEXT: [[TMP13:%.*]] = call <vscale x 2 x i64> @llvm.vector.splice.right.nxv2i64(<vscale x 2 x i64> [[VECTOR_RECUR]], <vscale x 2 x i64> [[TMP12]], i32 1)
; NO-VP-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw i64, ptr [[A]], i64 [[INDEX]]
; NO-VP-NEXT: store <vscale x 2 x i64> [[TMP13]], ptr [[TMP11]], align 8
; NO-VP-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP3]]
diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll
index 8b43c8554cc86..3ae58cfb0fb01 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll
@@ -24,7 +24,7 @@ define i64 @pr97452_scalable_vf1_for_live_out(ptr %src) {
; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 1 x i64> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD]] = load <vscale x 1 x i64>, ptr [[TMP5]], align 8
-; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 1)
+; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 1)
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
@@ -91,7 +91,7 @@ define void @pr97452_scalable_vf1_for_no_live_out(ptr %src, ptr noalias %dst) {
; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <vscale x 1 x i64> [ [[VECTOR_RECUR_INIT]], %[[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD]] = load <vscale x 1 x i64>, ptr [[TMP5]], align 8
-; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.up.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 1)
+; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.right.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 1)
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[DST]], i64 [[INDEX]]
; CHECK-NEXT: store <vscale x 1 x i64> [[TMP7]], ptr [[TMP8]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
diff --git a/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
index 01cb4ffe9debd..7c7de7458e984 100644
--- a/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
@@ -47,7 +47,7 @@ define i32 @recurrence_1(ptr nocapture readonly %a, ptr nocapture %b, i32 %n) {
; CHECK-VF4UF1-NEXT: [[TMP17:%.*]] = add nuw nsw i64 [[INDEX]], 1
; CHECK-VF4UF1-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP17]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP18]], align 4
-; CHECK-VF4UF1-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF1-NEXT: [[TMP20:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]]
; CHECK-VF4UF1-NEXT: [[TMP22:%.*]] = add <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP20]]
; CHECK-VF4UF1-NEXT: store <vscale x 4 x i32> [[TMP22]], ptr [[TMP21]], align 4
@@ -114,8 +114,8 @@ define i32 @recurrence_1(ptr nocapture readonly %a, ptr nocapture %b, i32 %n) {
; CHECK-VF4UF2-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, ptr [[TMP18]], i64 [[TMP21]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP18]], align 4
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD3]] = load <vscale x 4 x i32>, ptr [[TMP22]], align 4
-; CHECK-VF4UF2-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
-; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD3]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD3]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]]
; CHECK-VF4UF2-NEXT: [[TMP26:%.*]] = add <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP23]]
; CHECK-VF4UF2-NEXT: [[TMP27:%.*]] = add <vscale x 4 x i32> [[WIDE_LOAD3]], [[TMP24]]
@@ -206,7 +206,7 @@ define i32 @recurrence_2(ptr nocapture readonly %a, i32 %n) {
; CHECK-VF4UF1-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %[[VECTOR_PH]] ], [ [[TMP17:%.*]], %[[VECTOR_BODY]] ]
; CHECK-VF4UF1-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDEX]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
-; CHECK-VF4UF1-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF1-NEXT: [[TMP12:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP13:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP12]]
; CHECK-VF4UF1-NEXT: [[TMP14:%.*]] = icmp sgt <vscale x 4 x i32> [[TMP13]], zeroinitializer
; CHECK-VF4UF1-NEXT: [[TMP15:%.*]] = select <vscale x 4 x i1> [[TMP14]], <vscale x 4 x i32> [[TMP13]], <vscale x 4 x i32> zeroinitializer
@@ -270,8 +270,8 @@ define i32 @recurrence_2(ptr nocapture readonly %a, i32 %n) {
; CHECK-VF4UF2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[TMP10]], i64 [[TMP13]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP10]], align 4
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD2]] = load <vscale x 4 x i32>, ptr [[TMP14]], align 4
-; CHECK-VF4UF2-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
-; CHECK-VF4UF2-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.up.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD2]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[VECTOR_RECUR]], <vscale x 4 x i32> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP16:%.*]] = call <vscale x 4 x i32> @llvm.vector.splice.right.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]], <vscale x 4 x i32> [[WIDE_LOAD2]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP17:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_LOAD]], [[TMP15]]
; CHECK-VF4UF2-NEXT: [[TMP18:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_LOAD2]], [[TMP16]]
; CHECK-VF4UF2-NEXT: [[TMP19:%.*]] = icmp sgt <vscale x 4 x i32> [[TMP17]], zeroinitializer
@@ -392,7 +392,7 @@ define void @recurrence_3(ptr nocapture readonly %a, ptr nocapture %b, i32 %n, f
; CHECK-VF4UF1-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
; CHECK-VF4UF1-NEXT: [[TMP19:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[OFFSET_IDX]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i16>, ptr [[TMP19]], align 2, !alias.scope [[META6:![0-9]+]]
-; CHECK-VF4UF1-NEXT: [[TMP21:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF1-NEXT: [[TMP21:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP22:%.*]] = sitofp <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x double>
; CHECK-VF4UF1-NEXT: [[TMP23:%.*]] = sitofp <vscale x 4 x i16> [[TMP21]] to <vscale x 4 x double>
; CHECK-VF4UF1-NEXT: [[TMP24:%.*]] = fmul fast <vscale x 4 x double> [[TMP23]], [[BROADCAST_SPLAT]]
@@ -472,8 +472,8 @@ define void @recurrence_3(ptr nocapture readonly %a, ptr nocapture %b, i32 %n, f
; CHECK-VF4UF2-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[TMP19]], i64 [[TMP22]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i16>, ptr [[TMP19]], align 2, !alias.scope [[META6:![0-9]+]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD4]] = load <vscale x 4 x i16>, ptr [[TMP23]], align 2, !alias.scope [[META6]]
-; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
-; CHECK-VF4UF2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD4]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP24:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP25:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD4]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP26:%.*]] = sitofp <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x double>
; CHECK-VF4UF2-NEXT: [[TMP27:%.*]] = sitofp <vscale x 4 x i16> [[WIDE_LOAD4]] to <vscale x 4 x double>
; CHECK-VF4UF2-NEXT: [[TMP28:%.*]] = sitofp <vscale x 4 x i16> [[TMP24]] to <vscale x 4 x double>
@@ -766,7 +766,7 @@ define void @sink_after(ptr %a, ptr %b, i64 %n) {
; CHECK-VF4UF1-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[INDEX]], 1
; CHECK-VF4UF1-NEXT: [[TMP13:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[TMP12]]
; CHECK-VF4UF1-NEXT: [[WIDE_LOAD]] = load <vscale x 4 x i16>, ptr [[TMP13]], align 2, !alias.scope [[META17:![0-9]+]]
-; CHECK-VF4UF1-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF1-NEXT: [[TMP15:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
; CHECK-VF4UF1-NEXT: [[TMP16:%.*]] = sext <vscale x 4 x i16> [[TMP15]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: [[TMP17:%.*]] = sext <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x i32>
; CHECK-VF4UF1-NEXT: [[TMP18:%.*]] = mul nsw <vscale x 4 x i32> [[TMP17]], [[TMP16]]
@@ -827,8 +827,8 @@ define void @sink_after(ptr %a, ptr %b, i64 %n) {
; CHECK-VF4UF2-NEXT: [[TMP17:%.*]] = getelementptr inbounds i16, ptr [[TMP13]], i64 [[TMP16]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i16>, ptr [[TMP13]], align 2, !alias.scope [[META17:![0-9]+]]
; CHECK-VF4UF2-NEXT: [[WIDE_LOAD3]] = load <vscale x 4 x i16>, ptr [[TMP17]], align 2, !alias.scope [[META17]]
-; CHECK-VF4UF2-NEXT: [[TMP18:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
-; CHECK-VF4UF2-NEXT: [[TMP19:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.up.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD3]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP18:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_LOAD]], i32 1)
+; CHECK-VF4UF2-NEXT: [[TMP19:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.right.nxv4i16(<vscale x 4 x i16> [[WIDE_LOAD]], <vscale x 4 x i16> [[WIDE_LOAD3]], i32 1)
; CHECK-VF4UF2-NEXT: [[TMP20:%.*]] = sext <vscale x 4 x i16> [[TMP18]] to <vscale x 4 x i32>
; CHECK-VF4UF2-NEXT: [[TMP21:%.*]] = sext <vscale x 4 x i16> [[TMP19]] to <vscale x 4 x i32>
; CHECK-VF4UF2-NEXT: [[TMP22:%.*]] = sext <vscale x 4 x i16> [[WIDE_LOAD]] to <vscale x 4 x i32>
diff --git a/llvm/test/Verifier/invalid-splice.ll b/llvm/test/Verifier/invalid-splice.ll
index 818508461f74c..d921e4a5c7a78 100644
--- a/llvm/test/Verifier/invalid-splice.ll
+++ b/llvm/test/Verifier/invalid-splice.ll
@@ -18,13 +18,13 @@ define <vscale x 2 x double> @splice_nxv2f64_idx_neg5_vscale_min2(<vscale x 2 x
ret <vscale x 2 x double> %res
}
-; CHECK-NOT: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
+; CHECK: The splice index exceeds the range [0, VL-1] where VL is the known minimum number of elements in the vector
define <2 x double> @splice_v2f64_idx2(<2 x double> %a, <2 x double> %b) #0 {
%res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 2)
ret <2 x double> %res
}
-; CHECK: The splice index exceeds the range [0, VL] where VL is the known minimum number of elements in the vector
+; CHECK: The splice index exceeds the range [0, VL-1] where VL is the known minimum number of elements in the vector
define <2 x double> @splice_v2f64_idx3(<2 x double> %a, <2 x double> %b) #1 {
%res = call <2 x double> @llvm.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 3)
ret <2 x double> %res
>From 82d2670d14ab1290aa6c9a52001accfb1318dc5e Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Mon, 8 Dec 2025 18:22:11 +0800
Subject: [PATCH 10/12] clang-format
---
.../llvm/Analysis/TargetTransformInfo.h | 2 +-
.../Target/AArch64/AArch64ISelLowering.cpp | 20 +++++++++++--------
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 10 +++++-----
3 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 4be5ce9c3e653..e66c4481adccf 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1211,7 +1211,7 @@ class TargetTransformInfo {
///< with any shuffle mask.
SK_PermuteSingleSrc, ///< Shuffle elements of single source vector with any
///< shuffle mask.
- // TODO: Split into SK_SpliceDown + SK_SpliceUp
+ // TODO: Split into SK_SpliceLeft + SK_SpliceRight
SK_Splice ///< Concatenates elements from the first input vector
///< with elements of the second input vector. Returning
///< a vector of the same type as the input vectors.
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 38886648a3e91..af84eccbc2549 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1914,14 +1914,18 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
}
- setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
- MVT::nxv2i1, MVT::nxv2i64);
- setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
- MVT::nxv4i1, MVT::nxv4i32);
- setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
- MVT::nxv8i1, MVT::nxv8i16);
- setOperationPromotedToType({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
- MVT::nxv16i1, MVT::nxv16i8);
+ setOperationPromotedToType(
+ {ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, MVT::nxv2i1,
+ MVT::nxv2i64);
+ setOperationPromotedToType(
+ {ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, MVT::nxv4i1,
+ MVT::nxv4i32);
+ setOperationPromotedToType(
+ {ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, MVT::nxv8i1,
+ MVT::nxv8i16);
+ setOperationPromotedToType(
+ {ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, MVT::nxv16i1,
+ MVT::nxv16i8);
setOperationAction(ISD::VSCALE, MVT::i32, Custom);
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 901ffc30c4c94..b4cc06394a1e2 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1000,8 +1000,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- setOperationAction({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT}, VT,
- Custom);
+ setOperationAction({ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
+ VT, Custom);
if (Subtarget.hasStdExtZvkb()) {
setOperationAction(ISD::BSWAP, VT, Legal);
@@ -1201,9 +1201,9 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
- setOperationAction(
- {ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_LEFT, ISD::VECTOR_SPLICE_RIGHT},
- VT, Custom);
+ setOperationAction({ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE_LEFT,
+ ISD::VECTOR_SPLICE_RIGHT},
+ VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_SPLICE, VT, Custom);
setOperationAction(ISD::EXPERIMENTAL_VP_REVERSE, VT, Custom);
>From 4c8cf714df8604e774e37f079725407864fa0edc Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Fri, 12 Dec 2025 23:30:34 +0800
Subject: [PATCH 11/12] Fix old references to up/down
---
llvm/docs/LangRef.rst | 10 +++++-----
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | 4 ++--
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 1af53f374c178..5a889acc57820 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20771,8 +20771,8 @@ The first two operands are vectors with the same type. For a fixed-width vector
a scalable vector <vscale x N x eltty>, imm is an unsigned integer constant in
the range 0 <= imm < X where X=vscale_range_min * N.
-'``llvm.vector.splice.up``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.vector.splice.right``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
@@ -20800,9 +20800,9 @@ For example:
.. code-block:: text
llvm.vector.splice.right(<A,B,C,D>, <E,F,G,H>, 1);
- ==> <A,B,C,D,E,F,G,H>
- ==> <_,A,B,C,D,E,F,G>
- ==> <D,E,F,G>
+ ==> <A,B,C,D,E,F,G,H>
+ ==> <_,A,B,C,D,E,F,G>
+ ==> <D,E,F,G>
Arguments:
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 6829fc8b8bdbd..1f3854ad5f5b5 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -2152,7 +2152,7 @@ let Predicates = [HasSVE_or_SME] in {
def : Pat<(nxv8bf16 (concat_vectors nxv4bf16:$v1, nxv4bf16:$v2)),
(UZP1_ZZZ_H $v1, $v2)>;
- // Splice up with offset equal to 1
+ // Splice right with offset equal to 1
def : Pat<(nxv16i8 (vector_splice_right nxv16i8:$Z1, nxv16i8:$Z2, (i64 1))),
(INSR_ZV_B ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_B (PTRUE_B 31), ZPR:$Z1), bsub))>;
@@ -2166,7 +2166,7 @@ let Predicates = [HasSVE_or_SME] in {
(INSR_ZV_D ZPR:$Z2, (INSERT_SUBREG (IMPLICIT_DEF),
(LASTB_VPZ_D (PTRUE_D 31), ZPR:$Z1), dsub))>;
- // Splice down
+ // Splice left
foreach VT = [nxv16i8] in {
def : Pat<(VT(vector_splice_left VT:$Z1, VT:$Z2,
(i64(sve_ext_imm_0_255 i32:$index)))),
>From d42db33fdb81303ca876a8e8fd896c4188eece3e Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Fri, 12 Dec 2025 23:30:48 +0800
Subject: [PATCH 12/12] sext->zext
---
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index e4a3e60bf3156..3d0062196c91b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -12892,7 +12892,7 @@ void SelectionDAGBuilder::visitVectorSplice(const CallInst &I) {
SDLoc DL = getCurSDLoc();
SDValue V1 = getValue(I.getOperand(0));
SDValue V2 = getValue(I.getOperand(1));
- uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getSExtValue();
+ uint64_t Imm = cast<ConstantInt>(I.getOperand(2))->getZExtValue();
const bool IsLeft = I.getIntrinsicID() == Intrinsic::vector_splice_left;
// VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
More information about the llvm-commits
mailing list