[llvm] [VP][RISCV] Introduce llvm.experimental.vp.compress/expand and RISC-V support. (PR #74815)

via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 8 00:01:12 PST 2023


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-ir

Author: Yeting Kuo (yetingk)

<details>
<summary>Changes</summary>

This patch introduces llvm.experimental.vp.compress/expand, which are similar to llvm.masked.compressstore/expandload but take an explicit vector length (evl) operand and read from/write to vectors instead of memory.
This patch does not add dedicated DAG nodes for llvm.experimental.vp.compress/expand, since it may be impossible for SelectionDAG to split and expand them.
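
A minimal IR-level sketch of the intended usage (the value names below are illustrative placeholders, not taken from the patch's tests):

```llvm
; Pack the elements of %v selected by %mask within the first %evl lanes
; into the low lanes of the result.
%c = call <8 x i32> @llvm.experimental.vp.compress.v8i32(<8 x i32> %v, <8 x i1> %mask, i32 %evl)

; Inverse operation: read adjacent low elements of %v and scatter them
; into the result lanes enabled by %mask; disabled lanes are poison.
%e = call <8 x i32> @llvm.experimental.vp.expand.v8i32(<8 x i32> %v, <8 x i1> %mask, i32 %evl)
```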

---

Patch is 94.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74815.diff


15 Files Affected:

- (modified) llvm/docs/LangRef.rst (+83) 
- (modified) llvm/include/llvm/IR/Intrinsics.td (+14) 
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+4) 
- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+60) 
- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+2) 
- (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+13) 
- (added) llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll (+119) 
- (added) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-float.ll (+117) 
- (added) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-int.ll (+274) 
- (added) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-float.ll (+135) 
- (added) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-int.ll (+298) 
- (added) llvm/test/CodeGen/RISCV/rvv/vp-compress-float.ll (+106) 
- (added) llvm/test/CodeGen/RISCV/rvv/vp-compress-int.ll (+202) 
- (added) llvm/test/CodeGen/RISCV/rvv/vp-expand-float.ll (+122) 
- (added) llvm/test/CodeGen/RISCV/rvv/vp-expand-int.ll (+216) 


``````````diff
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index cf9b33a30eab52..8744502aff70b7 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -21777,6 +21777,89 @@ This intrinsic reverses the order of the first ``evl`` elements in a vector.
 The lanes in the result vector disabled by ``mask`` are ``poison``. The
 elements past ``evl`` are poison.
 
+
+.. _int_experimental_vp_compress:
+
+
+'``llvm.experimental.vp.compress``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. A number of scalar values of integer, floating
+point or pointer data type are collected from an input vector. A mask defines
+which elements to collect from the vector.
+
+::
+
+      declare <8 x i32> @llvm.experimental.vp.compress.v8i32(<8 x i32> <value>, <8 x i1> <mask>, i32 <evl>)
+      declare <16 x float> @llvm.experimental.vp.compress.v16f32(<16 x float> <value>, <16 x i1> <mask>, i32 <evl>)
+
+Overview:
+"""""""""
+
+Predicated version of :ref:`llvm.masked.compressstore <int_compressstore>` that
+writes to a vector instead of memory.
+
+Arguments:
+""""""""""
+
+The first operand is the input vector, from which elements are collected. The
+second operand is the mask, a vector of boolean values. The mask and the input
+vector must have the same number of vector elements. The third operand is the
+explicit vector length of the operation.
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vp.compress``' intrinsic is designed for compressing
+data. It collects elements from possibly non-adjacent lanes of a vector into
+adjacent lanes in one IR operation. It is useful for targets that support
+compressing operations. The result after the first '``evl``' lanes is a
+:ref:`poison value <poisonvalues>`.
+
+
+.. _int_experimental_vp_expand:
+
+
+'``llvm.experimental.vp.expand``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. Several values of integer, floating point or
+pointer data type are read from the input vector and stored into the elements
+of the result vector according to the mask. The third operand is the explicit
+vector length of the operation.
+
+::
+
+      declare <8 x i32> @llvm.experimental.vp.expand.v8i32(<8 x i32> <value>, <8 x i1> <mask>, i32 <evl>)
+      declare <16 x float> @llvm.experimental.vp.expand.v16f32(<16 x float> <value>, <16 x i1> <mask>, i32 <evl>)
+
+Overview:
+"""""""""
+
+Predicated version of :ref:`llvm.masked.expandload <int_expandload>` that reads
+from a vector instead of memory.
+
+Arguments:
+""""""""""
+
+The first operand has the same type as the result. The second operand is the
+mask, a vector of boolean values. The mask and the input vector must have the
+same number of vector elements. The third operand is the explicit vector
+length of the operation.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vp.expand``' intrinsic is designed for expanding
+data. It reads multiple scalar values from adjacent vector lanes and writes
+them into possibly non-adjacent lanes of the result in one IR operation. It is
+useful for targets that support vector expanding. The result on disabled lanes
+is a :ref:`poison value <poisonvalues>`.
+
+
 .. _int_vp_load:
 
 '``llvm.vp.load``' Intrinsic
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 060e964f77bf71..20d616cfda8edd 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2209,6 +2209,20 @@ def int_experimental_vp_reverse:
                          llvm_i32_ty],
                         [IntrNoMem]>;
 
+def int_experimental_vp_compress:
+  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                        [LLVMMatchType<0>,
+                         LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                         llvm_i32_ty],
+            [IntrNoMem, IntrNoSync, IntrWillReturn]>;
+
+def int_experimental_vp_expand:
+  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                        [LLVMMatchType<0>,
+                         LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                         llvm_i32_ty],
+            [IntrNoMem, IntrNoSync, IntrWillReturn]>;
+
 def int_vp_is_fpclass:
       DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
                               [ llvm_anyvector_ty,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index ed1c96a873748f..855502f0b14511 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6548,6 +6548,10 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
 #include "llvm/IR/VPIntrinsics.def"
     visitVectorPredicationIntrinsic(cast<VPIntrinsic>(I));
     return;
+  case Intrinsic::experimental_vp_compress:
+  case Intrinsic::experimental_vp_expand:
+    visitTargetIntrinsic(I, Intrinsic);
+    return;
   case Intrinsic::fptrunc_round: {
     // Get the last argument, the metadata and convert it to an integer in the
     // call
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 72de6d18079892..c34b1d4611f6ca 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8401,6 +8401,10 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   }
   case Intrinsic::experimental_get_vector_length:
     return lowerGetVectorLength(Op.getNode(), DAG, Subtarget);
+  case Intrinsic::experimental_vp_compress:
+    return lowerVPCompressExperimental(Op, DAG);
+  case Intrinsic::experimental_vp_expand:
+    return lowerVPExpandExperimental(Op, DAG);
   case Intrinsic::riscv_vmv_x_s: {
     SDValue Res = DAG.getNode(RISCVISD::VMV_X_S, DL, XLenVT, Op.getOperand(1));
     return DAG.getNode(ISD::TRUNCATE, DL, Op.getValueType(), Res);
@@ -19751,6 +19755,62 @@ bool RISCVTargetLowering::lowerInterleaveIntrinsicToStore(IntrinsicInst *II,
   return true;
 }
 
+SDValue
+RISCVTargetLowering::lowerVPCompressExperimental(SDValue N,
+                                                 SelectionDAG &DAG) const {
+  SDLoc DL(N);
+  MVT VT = N.getSimpleValueType();
+  MVT XLenVT = Subtarget.getXLenVT();
+  SDValue Op = N.getOperand(1);
+  SDValue Mask = N.getOperand(2);
+  SDValue VL = DAG.getNode(ISD::ZERO_EXTEND, DL, XLenVT, N.getOperand(3));
+
+  MVT ContainerVT = VT;
+  if (VT.isFixedLengthVector()) {
+    ContainerVT = getContainerForFixedLengthVector(VT);
+    Op = convertToScalableVector(ContainerVT, Op, DAG, Subtarget);
+    MVT MaskContainerVT = ContainerVT.changeVectorElementType(MVT::i1);
+    Mask = convertToScalableVector(MaskContainerVT, Mask, DAG, Subtarget);
+  }
+  SDValue Res =
+      DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, ContainerVT,
+                  DAG.getConstant(Intrinsic::riscv_vcompress, DL, XLenVT),
+                  DAG.getUNDEF(ContainerVT), Op, Mask, VL);
+  if (!VT.isFixedLengthVector())
+    return Res;
+  return convertFromScalableVector(VT, Res, DAG, Subtarget);
+}
+
+SDValue
+RISCVTargetLowering::lowerVPExpandExperimental(SDValue N,
+                                               SelectionDAG &DAG) const {
+  SDLoc DL(N);
+  MVT VT = N.getSimpleValueType();
+  MVT XLenVT = Subtarget.getXLenVT();
+  SDValue Op = N.getOperand(1);
+  SDValue Mask = N.getOperand(2);
+  SDValue VL = DAG.getNode(ISD::ZERO_EXTEND, DL, XLenVT, N.getOperand(3));
+
+  MVT ContainerVT = VT;
+  if (VT.isFixedLengthVector()) {
+    ContainerVT = getContainerForFixedLengthVector(VT);
+    Op = convertToScalableVector(ContainerVT, Op, DAG, Subtarget);
+    MVT MaskContainerVT = ContainerVT.changeVectorElementType(MVT::i1);
+    Mask = convertToScalableVector(MaskContainerVT, Mask, DAG, Subtarget);
+  }
+
+  MVT IndexVT = ContainerVT.changeVectorElementType(MVT::i16);
+  SDValue Index =
+      DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, IndexVT,
+                  DAG.getConstant(Intrinsic::riscv_viota, DL, XLenVT),
+                  DAG.getUNDEF(IndexVT), Mask, VL);
+  SDValue Res = DAG.getNode(RISCVISD::VRGATHEREI16_VV_VL, DL, ContainerVT, Op,
+                            Index, DAG.getUNDEF(ContainerVT), Mask, VL);
+  if (!VT.isFixedLengthVector())
+    return Res;
+  return convertFromScalableVector(VT, Res, DAG, Subtarget);
+}
+
 MachineInstr *
 RISCVTargetLowering::EmitKCFICheck(MachineBasicBlock &MBB,
                                    MachineBasicBlock::instr_iterator &MBBI,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index ae798cc47bf833..c50f813930f5e4 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -905,6 +905,8 @@ class RISCVTargetLowering : public TargetLowering {
   SDValue lowerVPExtMaskOp(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPSetCCMaskOp(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPReverseExperimental(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVPCompressExperimental(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVPExpandExperimental(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPFPIntConvOp(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPStridedLoad(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPStridedStore(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 3a2f2f39cd1c9b..06673e71d70a9b 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1157,6 +1157,19 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
       return Cost * LT.first;
     break;
   }
+  case Intrinsic::experimental_vp_compress: {
+    if (!isTypeLegal(RetTy))
+      return InstructionCost::getInvalid();
+    return 1;
+  }
+  case Intrinsic::experimental_vp_expand: {
+    // The codegen of vp.expand is viota.m + vrgatherei16.vv, so there will be
+    // an i16 index vector with the same element count as RetTy.
+    IntegerType *HalfType = Type::getInt16Ty(RetTy->getContext());
+    if (!isTypeLegal(RetTy) || !isTypeLegal(RetTy->getWithNewType(HalfType)))
+      return InstructionCost::getInvalid();
+    return 4;
+  }
   }
 
   if (ST->hasVInstructions() && RetTy->isVectorTy()) {
diff --git a/llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll b/llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll
new file mode 100644
index 00000000000000..586e4f14bccf46
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll
@@ -0,0 +1,119 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -S -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfh | FileCheck %s
+
+define void @vp_compress() {
+; CHECK-LABEL: 'vp_compress'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %0 = call <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = call <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %2 = call <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %3 = call <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = call <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %5 = call <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %6 = call <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %7 = call <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %8 = call <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %9 = call <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %10 = call <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %11 = call <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %12 = call <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = call <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = call <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = call <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8> undef, <vscale x 64 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %16 = call <vscale x 128 x i8> @llvm.experimental.vp.compress.nxv128i8(<vscale x 128 x i8> undef, <vscale x 128 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+entry:
+  %0 = call <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64> undef,<vscale x 1 x i1> undef,i32 undef)
+  %1 = call <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32> undef,<vscale x 2 x i1> undef,i32 undef)
+  %2 = call <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16> undef,<vscale x 4 x i1> undef,i32 undef)
+  %3 = call <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8> undef,<vscale x 8 x i1> undef,i32 undef)
+  %4 = call <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64> undef,<vscale x 2 x i1> undef,i32 undef)
+  %5 = call <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32> undef,<vscale x 4 x i1> undef,i32 undef)
+  %6 = call <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16> undef,<vscale x 8 x i1> undef,i32 undef)
+  %7 = call <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8> undef,<vscale x 16 x i1> undef,i32 undef)
+  %8 = call <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64> undef,<vscale x 4 x i1> undef,i32 undef)
+  %9 = call <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32> undef,<vscale x 8 x i1> undef,i32 undef)
+  %10 = call <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16> undef,<vscale x 16 x i1> undef,i32 undef)
+  %11 = call <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8> undef,<vscale x 32 x i1> undef,i32 undef)
+  %12 = call <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64> undef,<vscale x 8 x i1> undef,i32 undef)
+  %13 = call <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32> undef,<vscale x 16 x i1> undef,i32 undef)
+  %14 = call <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16> undef,<vscale x 32 x i1> undef,i32 undef)
+  %15 = call <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8> undef,<vscale x 64 x i1> undef,i32 undef)
+  %16 = call <vscale x 128 x i8> @llvm.experimental.vp.compress.nxv128i8(<vscale x 128 x i8> undef,<vscale x 128 x i1> undef,i32 undef)
+  ret void
+}
+
+define void @vp_expand() {
+; CHECK-LABEL: 'vp_expand'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %0 = call <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %1 = call <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %2 = call <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<vscale x 4 x i16> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %3 = call <vscale x 8 x i8> @llvm.experimental.vp.expand.nxv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %4 = call <vscale x 2 x i64> @llvm.experimental.vp.expand.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 4 x i32> @llvm.experimental.vp.expand.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = call <vscale x 8 x i16> @llvm.experimental.vp.expand.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %7 = call <vscale x 16 x i8> @llvm.experimental.vp.expand.nxv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %8 = call <vscale x 4 x i64> @llvm.experimental.vp.expand.nxv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %9 = call <vscale x 8 x i32> @llvm.experimental.vp.expand.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %10 = call <vscale x 16 x i16> @llvm.experimental.vp.expand.nxv16i16(<vscale x 16 x i16> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 32 x i8> @llvm.experimental.vp.expand.nxv32i8(<vscale x 32 x i8> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %12 = call <vscale x 8 x i64> @llvm.experimental.vp.expand.nxv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %13 = call <vscale x 16 x i32> @llvm.experimental.vp.expand.nxv16i32(<vscale x 16 x i32> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %14 = call <vscale x 32 x i16> @llvm.experimental.vp.expand.nxv32i16(<vscale x 32 x i16> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %15 = call <vscale x 64 x i8> @llvm.experimental.vp.expand.nxv64i8(<vscale x 64 x i8> undef, <vscale x 64 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+entry:
+  %0 = call <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64> undef,<vscale x 1 x i1> undef,i32 undef)
+  %1 = call <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32> undef,<vscale x 2 x i1> undef,i32 undef)
+  %2 = call <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<v...
[truncated]

``````````
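
To make the semantics concrete, a hand-worked example (my reading of the LangRef text above; `%v` and `%m` are placeholder values, not from the patch's tests):

```llvm
; compress: %v = <1, 2, 3, 4>, %m = <1, 0, 1, 1>, evl = 3
;   the active elements among the first 3 lanes (1 and 3) are packed into
;   the low lanes; the remaining lanes are poison:
;   %c = <1, 3, poison, poison>
%c = call <4 x i32> @llvm.experimental.vp.compress.v4i32(<4 x i32> %v, <4 x i1> %m, i32 3)

; expand: same inputs; the enabled lanes within evl (lanes 0 and 2)
; receive consecutive low elements of %v (1 and 2); disabled lanes are
; poison:
;   %e = <1, poison, 2, poison>
%e = call <4 x i32> @llvm.experimental.vp.expand.v4i32(<4 x i32> %v, <4 x i1> %m, i32 3)
```

On RISC-V, the patch lowers compress to a single `vcompress.vm`, and expand to `viota.m` (computing per-lane source indices from the mask) followed by a masked `vrgatherei16.vv`; the cost model's estimates of 1 and 4 reflect these sequences.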

</details>


https://github.com/llvm/llvm-project/pull/74815


More information about the llvm-commits mailing list