[llvm] [VP][RISCV] Introduce llvm.experimental.vp.compress/expand and RISC-V support. (PR #74815)

Yeting Kuo via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 8 00:00:42 PST 2023


https://github.com/yetingk created https://github.com/llvm/llvm-project/pull/74815

This patch introduces llvm.experimental.vp.compress/expand, which are similar to llvm.masked.compressstore/expandload but take an evl operand and read from/write to vectors instead of memory.
This patch does not introduce dedicated DAG nodes for llvm.experimental.vp.compress/expand, since it may be impossible for the DAG legalizer to split and expand them.
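
For illustration, a minimal IR sketch of how the two intrinsics are used (the
element type and vector width are arbitrary; %v, %m and %evl are placeholders):

  ; Pack the elements of %v selected by %m into the leading lanes of the result,
  ; considering only the first %evl lanes.
  %c = call <8 x i32> @llvm.experimental.vp.compress.v8i32(<8 x i32> %v, <8 x i1> %m, i32 %evl)
  ; Write the leading elements of %v into the result lanes enabled by %m.
  %e = call <8 x i32> @llvm.experimental.vp.expand.v8i32(<8 x i32> %v, <8 x i1> %m, i32 %evl)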

>From 623c6fafa4657bdc5e997a53aea76012e13441a2 Mon Sep 17 00:00:00 2001
From: Yeting Kuo <yeting.kuo at sifive.com>
Date: Tue, 5 Dec 2023 01:24:51 -0800
Subject: [PATCH] [VP][RISCV] Introduce llvm.experimental.vp.compress/expand
 and RISC-V support.

This patch introduces llvm.experimental.vp.compress/expand, which are
similar to llvm.masked.compressstore/expandload but take an evl operand
and read from/write to vectors instead of memory.
This patch does not introduce dedicated DAG nodes for these intrinsics,
since it may be impossible for the DAG legalizer to split and expand them.
---
 llvm/docs/LangRef.rst                         |  83 +++++
 llvm/include/llvm/IR/Intrinsics.td            |  14 +
 .../SelectionDAG/SelectionDAGBuilder.cpp      |   4 +
 llvm/lib/Target/RISCV/RISCVISelLowering.cpp   |  60 ++++
 llvm/lib/Target/RISCV/RISCVISelLowering.h     |   2 +
 .../Target/RISCV/RISCVTargetTransformInfo.cpp |  13 +
 .../CostModel/RISCV/vp-cmpress-expand.ll      | 119 +++++++
 .../rvv/fixed-vectors-vp-compress-float.ll    | 117 +++++++
 .../rvv/fixed-vectors-vp-compress-int.ll      | 274 ++++++++++++++++
 .../rvv/fixed-vectors-vp-expand-float.ll      | 135 ++++++++
 .../RISCV/rvv/fixed-vectors-vp-expand-int.ll  | 298 ++++++++++++++++++
 .../CodeGen/RISCV/rvv/vp-compress-float.ll    | 106 +++++++
 .../test/CodeGen/RISCV/rvv/vp-compress-int.ll | 202 ++++++++++++
 .../test/CodeGen/RISCV/rvv/vp-expand-float.ll | 122 +++++++
 llvm/test/CodeGen/RISCV/rvv/vp-expand-int.ll  | 216 +++++++++++++
 15 files changed, 1765 insertions(+)
 create mode 100644 llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-float.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-int.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-float.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-int.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/vp-compress-float.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/vp-compress-int.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/vp-expand-float.ll
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/vp-expand-int.ll

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index cf9b33a30eab5..8744502aff70b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -21777,6 +21777,89 @@ This intrinsic reverses the order of the first ``evl`` elements in a vector.
 The lanes in the result vector disabled by ``mask`` are ``poison``. The
 elements past ``evl`` are poison.
 
+
+.. _int_experimental_vp_compress:
+
+
+'``llvm.experimental.vp.compress``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. A number of scalar values of integer, floating
+point or pointer data type are collected from an input vector. A mask defines
+which elements to collect from the vector.
+
+::
+
+      declare <8 x i32> @llvm.experimental.vp.compress.v8i32(<8 x i32> <value>, <8 x i1> <mask>, i32 <evl>)
+      declare <16 x float> @llvm.experimental.vp.compress.v16f32(<16 x float> <value>, <16 x i1> <mask>, i32 <evl>)
+
+Overview:
+"""""""""
+
+Predicated version of :ref:`llvm.masked.compressstore <int_compressstore>` that
+writes to a vector instead of memory.
+
+Arguments:
+""""""""""
+
+The first operand is the input vector, from which elements are collected. The
+second operand is the mask, a vector of boolean values. The mask and the input
+vector must have the same number of vector elements. The third operand is the
+explicit vector length of the operation.
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vp.compress``' intrinsic is designed for compressing
+data. It collects elements from possibly non-adjacent lanes of a vector in one
+IR operation. It is useful for targets that support compressing operations. The
+result after the first '``evl``' lanes is a :ref:`poison value <poisonvalues>`.
+
+
+.. _int_experimental_vp_expand:
+
+
+'``llvm.experimental.vp.expand``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. Several values of integer, floating point or
+pointer data type are read from consecutive lanes of a vector and stored into
+the elements of the result vector according to the mask. The third operand is
+the explicit vector length of the operation.
+
+::
+
+      declare <8 x i32> @llvm.experimental.vp.expand.v8i32(<8 x i32> <value>, <8 x i1> <mask>, i32 <evl>)
+      declare <16 x float> @llvm.experimental.vp.expand.v16f32(<16 x float> <value>, <16 x i1> <mask>, i32 <evl>)
+
+Overview:
+"""""""""
+
+Predicated version of :ref:`llvm.masked.expandload <int_expandload>` that reads
+from a vector instead of memory.
+
+Arguments:
+""""""""""
+
+The first operand has the same type as the result. The second operand is the
+mask, a vector of boolean values. The mask and the input vector must have the
+same number of vector elements. The third operand is the explicit vector
+length of the operation.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vp.expand``' intrinsic is designed for reading multiple
+scalar values from adjacent vector lanes and writing them to possibly
+non-adjacent vector lanes. It is useful for targets that support expanding
+operations. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
+
+
 .. _int_vp_load:
 
 '``llvm.vp.load``' Intrinsic
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 060e964f77bf7..20d616cfda8ed 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2209,6 +2209,20 @@ def int_experimental_vp_reverse:
                          llvm_i32_ty],
                         [IntrNoMem]>;
 
+def int_experimental_vp_compress:
+  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                        [LLVMMatchType<0>,
+                         LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                         llvm_i32_ty],
+            [IntrNoMem, IntrNoSync, IntrWillReturn]>;
+
+def int_experimental_vp_expand:
+  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                        [LLVMMatchType<0>,
+                         LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                         llvm_i32_ty],
+            [IntrNoMem, IntrNoSync, IntrWillReturn]>;
+
 def int_vp_is_fpclass:
       DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
                               [ llvm_anyvector_ty,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index ed1c96a873748..855502f0b1451 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6548,6 +6548,10 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
 #include "llvm/IR/VPIntrinsics.def"
     visitVectorPredicationIntrinsic(cast<VPIntrinsic>(I));
     return;
+  case Intrinsic::experimental_vp_compress:
+  case Intrinsic::experimental_vp_expand:
+    visitTargetIntrinsic(I, Intrinsic);
+    return;
   case Intrinsic::fptrunc_round: {
     // Get the last argument, the metadata and convert it to an integer in the
     // call
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 72de6d1807989..c34b1d4611f6c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8401,6 +8401,10 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   }
   case Intrinsic::experimental_get_vector_length:
     return lowerGetVectorLength(Op.getNode(), DAG, Subtarget);
+  case Intrinsic::experimental_vp_compress:
+    return lowerVPCompressExperimental(Op, DAG);
+  case Intrinsic::experimental_vp_expand:
+    return lowerVPExpandExperimental(Op, DAG);
   case Intrinsic::riscv_vmv_x_s: {
     SDValue Res = DAG.getNode(RISCVISD::VMV_X_S, DL, XLenVT, Op.getOperand(1));
     return DAG.getNode(ISD::TRUNCATE, DL, Op.getValueType(), Res);
@@ -19751,6 +19755,62 @@ bool RISCVTargetLowering::lowerInterleaveIntrinsicToStore(IntrinsicInst *II,
   return true;
 }
 
+SDValue
+RISCVTargetLowering::lowerVPCompressExperimental(SDValue N,
+                                                 SelectionDAG &DAG) const {
+  SDLoc DL(N);
+  MVT VT = N.getSimpleValueType();
+  MVT XLenVT = Subtarget.getXLenVT();
+  SDValue Op = N.getOperand(1);
+  SDValue Mask = N.getOperand(2);
+  SDValue VL = DAG.getNode(ISD::ZERO_EXTEND, DL, XLenVT, N.getOperand(3));
+
+  MVT ContainerVT = VT;
+  if (VT.isFixedLengthVector()) {
+    ContainerVT = getContainerForFixedLengthVector(VT);
+    Op = convertToScalableVector(ContainerVT, Op, DAG, Subtarget);
+    MVT MaskContainerVT = ContainerVT.changeVectorElementType(MVT::i1);
+    Mask = convertToScalableVector(MaskContainerVT, Mask, DAG, Subtarget);
+  }
+  SDValue Res =
+      DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, ContainerVT,
+                  DAG.getConstant(Intrinsic::riscv_vcompress, DL, XLenVT),
+                  DAG.getUNDEF(ContainerVT), Op, Mask, VL);
+  if (!VT.isFixedLengthVector())
+    return Res;
+  return convertFromScalableVector(VT, Res, DAG, Subtarget);
+}
+
+SDValue
+RISCVTargetLowering::lowerVPExpandExperimental(SDValue N,
+                                               SelectionDAG &DAG) const {
+  SDLoc DL(N);
+  MVT VT = N.getSimpleValueType();
+  MVT XLenVT = Subtarget.getXLenVT();
+  SDValue Op = N.getOperand(1);
+  SDValue Mask = N.getOperand(2);
+  SDValue VL = DAG.getNode(ISD::ZERO_EXTEND, DL, XLenVT, N.getOperand(3));
+
+  MVT ContainerVT = VT;
+  if (VT.isFixedLengthVector()) {
+    ContainerVT = getContainerForFixedLengthVector(VT);
+    Op = convertToScalableVector(ContainerVT, Op, DAG, Subtarget);
+    MVT MaskContainerVT = ContainerVT.changeVectorElementType(MVT::i1);
+    Mask = convertToScalableVector(MaskContainerVT, Mask, DAG, Subtarget);
+  }
+
+  MVT IndexVT = ContainerVT.changeVectorElementType(MVT::i16);
+  SDValue Index =
+      DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, IndexVT,
+                  DAG.getConstant(Intrinsic::riscv_viota, DL, XLenVT),
+                  DAG.getUNDEF(IndexVT), Mask, VL);
+  SDValue Res = DAG.getNode(RISCVISD::VRGATHEREI16_VV_VL, DL, ContainerVT, Op,
+                            Index, DAG.getUNDEF(ContainerVT), Mask, VL);
+  if (!VT.isFixedLengthVector())
+    return Res;
+  return convertFromScalableVector(VT, Res, DAG, Subtarget);
+}
+
 MachineInstr *
 RISCVTargetLowering::EmitKCFICheck(MachineBasicBlock &MBB,
                                    MachineBasicBlock::instr_iterator &MBBI,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index ae798cc47bf83..c50f813930f5e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -905,6 +905,8 @@ class RISCVTargetLowering : public TargetLowering {
   SDValue lowerVPExtMaskOp(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPSetCCMaskOp(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPReverseExperimental(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVPCompressExperimental(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVPExpandExperimental(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPFPIntConvOp(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPStridedLoad(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPStridedStore(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 3a2f2f39cd1c9..06673e71d70a9 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1157,6 +1157,19 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
       return Cost * LT.first;
     break;
   }
+  case Intrinsic::experimental_vp_compress: {
+    if (!isTypeLegal(RetTy))
+      return InstructionCost::getInvalid();
+    return 1;
+  }
+  case Intrinsic::experimental_vp_expand: {
+    // The codegen of vp.expand is viota.m + vrgatherei16.vv, so there will be
+    // an i16 vector whose element count is the same as that of RetTy.
+    IntegerType *HalfType = Type::getInt16Ty(RetTy->getContext());
+    if (!isTypeLegal(RetTy) || !isTypeLegal(RetTy->getWithNewType(HalfType)))
+      return InstructionCost::getInvalid();
+    return 4;
+  }
   }
 
   if (ST->hasVInstructions() && RetTy->isVectorTy()) {
diff --git a/llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll b/llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll
new file mode 100644
index 0000000000000..586e4f14bccf4
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/RISCV/vp-cmpress-expand.ll
@@ -0,0 +1,119 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -S -mtriple=riscv64 -mattr=+v,+f,+d,+zfh,+zvfh | FileCheck %s
+
+define void @vp_compress() {
+; CHECK-LABEL: 'vp_compress'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %0 = call <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = call <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %2 = call <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %3 = call <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = call <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %5 = call <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %6 = call <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %7 = call <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %8 = call <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %9 = call <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %10 = call <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %11 = call <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %12 = call <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %13 = call <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %14 = call <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %15 = call <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8> undef, <vscale x 64 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %16 = call <vscale x 128 x i8> @llvm.experimental.vp.compress.nxv128i8(<vscale x 128 x i8> undef, <vscale x 128 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+entry:
+  %0 = call <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64> undef,<vscale x 1 x i1> undef,i32 undef)
+  %1 = call <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32> undef,<vscale x 2 x i1> undef,i32 undef)
+  %2 = call <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16> undef,<vscale x 4 x i1> undef,i32 undef)
+  %3 = call <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8> undef,<vscale x 8 x i1> undef,i32 undef)
+  %4 = call <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64> undef,<vscale x 2 x i1> undef,i32 undef)
+  %5 = call <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32> undef,<vscale x 4 x i1> undef,i32 undef)
+  %6 = call <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16> undef,<vscale x 8 x i1> undef,i32 undef)
+  %7 = call <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8> undef,<vscale x 16 x i1> undef,i32 undef)
+  %8 = call <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64> undef,<vscale x 4 x i1> undef,i32 undef)
+  %9 = call <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32> undef,<vscale x 8 x i1> undef,i32 undef)
+  %10 = call <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16> undef,<vscale x 16 x i1> undef,i32 undef)
+  %11 = call <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8> undef,<vscale x 32 x i1> undef,i32 undef)
+  %12 = call <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64> undef,<vscale x 8 x i1> undef,i32 undef)
+  %13 = call <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32> undef,<vscale x 16 x i1> undef,i32 undef)
+  %14 = call <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16> undef,<vscale x 32 x i1> undef,i32 undef)
+  %15 = call <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8> undef,<vscale x 64 x i1> undef,i32 undef)
+  %16 = call <vscale x 128 x i8> @llvm.experimental.vp.compress.nxv128i8(<vscale x 128 x i8> undef,<vscale x 128 x i1> undef,i32 undef)
+  ret void
+}
+
+define void @vp_expand() {
+; CHECK-LABEL: 'vp_expand'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %0 = call <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %1 = call <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %2 = call <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<vscale x 4 x i16> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %3 = call <vscale x 8 x i8> @llvm.experimental.vp.expand.nxv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %4 = call <vscale x 2 x i64> @llvm.experimental.vp.expand.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %5 = call <vscale x 4 x i32> @llvm.experimental.vp.expand.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %6 = call <vscale x 8 x i16> @llvm.experimental.vp.expand.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %7 = call <vscale x 16 x i8> @llvm.experimental.vp.expand.nxv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %8 = call <vscale x 4 x i64> @llvm.experimental.vp.expand.nxv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %9 = call <vscale x 8 x i32> @llvm.experimental.vp.expand.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %10 = call <vscale x 16 x i16> @llvm.experimental.vp.expand.nxv16i16(<vscale x 16 x i16> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %11 = call <vscale x 32 x i8> @llvm.experimental.vp.expand.nxv32i8(<vscale x 32 x i8> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %12 = call <vscale x 8 x i64> @llvm.experimental.vp.expand.nxv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %13 = call <vscale x 16 x i32> @llvm.experimental.vp.expand.nxv16i32(<vscale x 16 x i32> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %14 = call <vscale x 32 x i16> @llvm.experimental.vp.expand.nxv32i16(<vscale x 32 x i16> undef, <vscale x 32 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %15 = call <vscale x 64 x i8> @llvm.experimental.vp.expand.nxv64i8(<vscale x 64 x i8> undef, <vscale x 64 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+entry:
+  %0 = call <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64> undef,<vscale x 1 x i1> undef,i32 undef)
+  %1 = call <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32> undef,<vscale x 2 x i1> undef,i32 undef)
+  %2 = call <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<vscale x 4 x i16> undef,<vscale x 4 x i1> undef,i32 undef)
+  %3 = call <vscale x 8 x i8> @llvm.experimental.vp.expand.nxv8i8(<vscale x 8 x i8> undef,<vscale x 8 x i1> undef,i32 undef)
+  %4 = call <vscale x 2 x i64> @llvm.experimental.vp.expand.nxv2i64(<vscale x 2 x i64> undef,<vscale x 2 x i1> undef,i32 undef)
+  %5 = call <vscale x 4 x i32> @llvm.experimental.vp.expand.nxv4i32(<vscale x 4 x i32> undef,<vscale x 4 x i1> undef,i32 undef)
+  %6 = call <vscale x 8 x i16> @llvm.experimental.vp.expand.nxv8i16(<vscale x 8 x i16> undef,<vscale x 8 x i1> undef,i32 undef)
+  %7 = call <vscale x 16 x i8> @llvm.experimental.vp.expand.nxv16i8(<vscale x 16 x i8> undef,<vscale x 16 x i1> undef,i32 undef)
+  %8 = call <vscale x 4 x i64> @llvm.experimental.vp.expand.nxv4i64(<vscale x 4 x i64> undef,<vscale x 4 x i1> undef,i32 undef)
+  %9 = call <vscale x 8 x i32> @llvm.experimental.vp.expand.nxv8i32(<vscale x 8 x i32> undef,<vscale x 8 x i1> undef,i32 undef)
+  %10 = call <vscale x 16 x i16> @llvm.experimental.vp.expand.nxv16i16(<vscale x 16 x i16> undef,<vscale x 16 x i1> undef,i32 undef)
+  %11 = call <vscale x 32 x i8> @llvm.experimental.vp.expand.nxv32i8(<vscale x 32 x i8> undef,<vscale x 32 x i1> undef,i32 undef)
+  %12 = call <vscale x 8 x i64> @llvm.experimental.vp.expand.nxv8i64(<vscale x 8 x i64> undef,<vscale x 8 x i1> undef,i32 undef)
+  %13 = call <vscale x 16 x i32> @llvm.experimental.vp.expand.nxv16i32(<vscale x 16 x i32> undef,<vscale x 16 x i1> undef,i32 undef)
+  %14 = call <vscale x 32 x i16> @llvm.experimental.vp.expand.nxv32i16(<vscale x 32 x i16> undef,<vscale x 32 x i1> undef,i32 undef)
+  %15 = call <vscale x 64 x i8> @llvm.experimental.vp.expand.nxv64i8(<vscale x 64 x i8> undef,<vscale x 64 x i1> undef,i32 undef)
+  ret void
+}
+
+declare <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64>,<vscale x 1 x i1>,i32)
+declare <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8>,<vscale x 8 x i1>,i32)
+declare <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8>,<vscale x 16 x i1>,i32)
+declare <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8>,<vscale x 32 x i1>,i32)
+declare <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16>,<vscale x 32 x i1>,i32)
+declare <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8>,<vscale x 64 x i1>,i32)
+declare <vscale x 128 x i8> @llvm.experimental.vp.compress.nxv128i8(<vscale x 128 x i8>,<vscale x 128 x i1>,i32)
+
+declare <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64>,<vscale x 1 x i1>,i32)
+declare <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<vscale x 4 x i16>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i8> @llvm.experimental.vp.expand.nxv8i8(<vscale x 8 x i8>,<vscale x 8 x i1>,i32)
+declare <vscale x 2 x i64> @llvm.experimental.vp.expand.nxv2i64(<vscale x 2 x i64>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i32> @llvm.experimental.vp.expand.nxv4i32(<vscale x 4 x i32>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i16> @llvm.experimental.vp.expand.nxv8i16(<vscale x 8 x i16>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i8> @llvm.experimental.vp.expand.nxv16i8(<vscale x 16 x i8>,<vscale x 16 x i1>,i32)
+declare <vscale x 4 x i64> @llvm.experimental.vp.expand.nxv4i64(<vscale x 4 x i64>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i32> @llvm.experimental.vp.expand.nxv8i32(<vscale x 8 x i32>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i16> @llvm.experimental.vp.expand.nxv16i16(<vscale x 16 x i16>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i8> @llvm.experimental.vp.expand.nxv32i8(<vscale x 32 x i8>,<vscale x 32 x i1>,i32)
+declare <vscale x 8 x i64> @llvm.experimental.vp.expand.nxv8i64(<vscale x 8 x i64>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i32> @llvm.experimental.vp.expand.nxv16i32(<vscale x 16 x i32>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i16> @llvm.experimental.vp.expand.nxv32i16(<vscale x 32 x i16>,<vscale x 32 x i1>,i32)
+declare <vscale x 64 x i8> @llvm.experimental.vp.expand.nxv64i8(<vscale x 64 x i8>,<vscale x 64 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-float.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-float.ll
new file mode 100644
index 0000000000000..a816c89cb6969
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-float.ll
@@ -0,0 +1,117 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+d -verify-machineinstrs < %s | FileCheck %s
+
+define <2 x double> @test_vp_compress_v2f64_masked(<2 x double> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v2f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x double> @llvm.experimental.vp.compress.v2f64(<2 x double> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x double> %dst
+}
+
+define <2 x float> @test_vp_compress_v2f32_masked(<2 x float> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v2f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, mf2, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x float> @llvm.experimental.vp.compress.v2f32(<2 x float> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x float> %dst
+}
+
+define <4 x float> @test_vp_compress_v4f32_masked(<4 x float> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v4f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x float> @llvm.experimental.vp.compress.v4f32(<4 x float> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x float> %dst
+}
+define <4 x double> @test_vp_compress_v4f64_masked(<4 x double> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v4f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <4 x double> @llvm.experimental.vp.compress.v4f64(<4 x double> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x double> %dst
+}
+
+define <8 x float> @test_vp_compress_v8f32_masked(<8 x float> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v8f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <8 x float> @llvm.experimental.vp.compress.v8f32(<8 x float> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x float> %dst
+}
+
+define <8 x double> @test_vp_compress_v8f64_masked(<8 x double> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v8f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <8 x double> @llvm.experimental.vp.compress.v8f64(<8 x double> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x double> %dst
+}
+
+define <16 x float> @test_vp_compress_v16f32_masked(<16 x float> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v16f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <16 x float> @llvm.experimental.vp.compress.v16f32(<16 x float> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x float> %dst
+}
+
+define <16 x double> @test_vp_compress_v16f64_masked(<16 x double> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v16f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <16 x double> @llvm.experimental.vp.compress.v16f64(<16 x double> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x double> %dst
+}
+
+define <32 x float> @test_vp_compress_v32f32_masked(<32 x float> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v32f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <32 x float> @llvm.experimental.vp.compress.v32f32(<32 x float> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x float> %dst
+}
+
+; LMUL = 1
+declare <2 x double> @llvm.experimental.vp.compress.v2f64(<2 x double>,<2 x i1>,i32)
+declare <2 x float> @llvm.experimental.vp.compress.v2f32(<2 x float>,<2 x i1>,i32)
+declare <4 x float> @llvm.experimental.vp.compress.v4f32(<4 x float>,<4 x i1>,i32)
+
+; LMUL = 2
+declare <4 x double> @llvm.experimental.vp.compress.v4f64(<4 x double>,<4 x i1>,i32)
+declare <8 x float> @llvm.experimental.vp.compress.v8f32(<8 x float>,<8 x i1>,i32)
+
+; LMUL = 4
+declare <8 x double> @llvm.experimental.vp.compress.v8f64(<8 x double>,<8 x i1>,i32)
+declare <16 x float> @llvm.experimental.vp.compress.v16f32(<16 x float>,<16 x i1>,i32)
+
+; LMUL = 8
+declare <16 x double> @llvm.experimental.vp.compress.v16f64(<16 x double>,<16 x i1>,i32)
+declare <32 x float> @llvm.experimental.vp.compress.v32f32(<32 x float>,<32 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-int.ll
new file mode 100644
index 0000000000000..f9cbdf8d850a3
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-compress-int.ll
@@ -0,0 +1,274 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <2 x i64> @test_vp_compress_v2i64_masked(<2 x i64> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v2i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i64> @llvm.experimental.vp.compress.v2i64(<2 x i64> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i64> %dst
+}
+
+define <2 x i32> @test_vp_compress_v2i32_masked(<2 x i32> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v2i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, mf2, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i32> @llvm.experimental.vp.compress.v2i32(<2 x i32> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i32> %dst
+}
+
+define <4 x i32> @test_vp_compress_v4i32_masked(<4 x i32> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v4i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x i32> @llvm.experimental.vp.compress.v4i32(<4 x i32> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i32> %dst
+}
+
+define <2 x i16> @test_vp_compress_v2i16_masked(<2 x i16> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v2i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i16> @llvm.experimental.vp.compress.v2i16(<2 x i16> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i16> %dst
+}
+
+define <4 x i16> @test_vp_compress_v4i16_masked(<4 x i16> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v4i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x i16> @llvm.experimental.vp.compress.v4i16(<4 x i16> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i16> %dst
+}
+
+define <8 x i16> @test_vp_compress_v8i16_masked(<8 x i16> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v8i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <8 x i16> @llvm.experimental.vp.compress.v8i16(<8 x i16> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i16> %dst
+}
+
+define <2 x i8> @test_vp_compress_v2i8_masked(<2 x i8> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v2i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, mf8, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i8> @llvm.experimental.vp.compress.v2i8(<2 x i8> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i8> %dst
+}
+
+define <4 x i8> @test_vp_compress_v4i8_masked(<4 x i8> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v4i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, mf4, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x i8> @llvm.experimental.vp.compress.v4i8(<4 x i8> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i8> %dst
+}
+
+define <8 x i8> @test_vp_compress_v8i8_masked(<8 x i8> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v8i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, mf2, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <8 x i8> @llvm.experimental.vp.compress.v8i8(<8 x i8> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i8> %dst
+}
+
+define <16 x i8> @test_vp_compress_v16i8_masked(<16 x i8> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v16i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <16 x i8> @llvm.experimental.vp.compress.v16i8(<16 x i8> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i8> %dst
+}
+
+define <4 x i64> @test_vp_compress_v4i64_masked(<4 x i64> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v4i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <4 x i64> @llvm.experimental.vp.compress.v4i64(<4 x i64> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i64> %dst
+}
+
+define <8 x i32> @test_vp_compress_v8i32_masked(<8 x i32> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v8i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <8 x i32> @llvm.experimental.vp.compress.v8i32(<8 x i32> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i32> %dst
+}
+
+define <16 x i16> @test_vp_compress_v16i16_masked(<16 x i16> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v16i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <16 x i16> @llvm.experimental.vp.compress.v16i16(<16 x i16> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i16> %dst
+}
+
+define <32 x i8> @test_vp_compress_v32i8_masked(<32 x i8> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v32i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <32 x i8> @llvm.experimental.vp.compress.v32i8(<32 x i8> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x i8> %dst
+}
+
+define <8 x i64> @test_vp_compress_v8i64_masked(<8 x i64> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v8i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <8 x i64> @llvm.experimental.vp.compress.v8i64(<8 x i64> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i64> %dst
+}
+
+define <16 x i32> @test_vp_compress_v16i32_masked(<16 x i32> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v16i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <16 x i32> @llvm.experimental.vp.compress.v16i32(<16 x i32> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i32> %dst
+}
+
+define <32 x i16> @test_vp_compress_v32i16_masked(<32 x i16> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v32i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <32 x i16> @llvm.experimental.vp.compress.v32i16(<32 x i16> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x i16> %dst
+}
+
+define <64 x i8> @test_vp_compress_v64i8_masked(<64 x i8> %src, <64 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v64i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <64 x i8> @llvm.experimental.vp.compress.v64i8(<64 x i8> %src, <64 x i1> %mask, i32 %evl)
+  ret <64 x i8> %dst
+}
+
+define <16 x i64> @test_vp_compress_v16i64_masked(<16 x i64> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v16i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <16 x i64> @llvm.experimental.vp.compress.v16i64(<16 x i64> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i64> %dst
+}
+
+define <32 x i32> @test_vp_compress_v32i32_masked(<32 x i32> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v32i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <32 x i32> @llvm.experimental.vp.compress.v32i32(<32 x i32> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x i32> %dst
+}
+
+define <64 x i16> @test_vp_compress_v64i16_masked(<64 x i16> %src, <64 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v64i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <64 x i16> @llvm.experimental.vp.compress.v64i16(<64 x i16> %src, <64 x i1> %mask, i32 %evl)
+  ret <64 x i16> %dst
+}
+
+define <128 x i8> @test_vp_compress_v128i8_masked(<128 x i8> %src, <128 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_v128i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <128 x i8> @llvm.experimental.vp.compress.v128i8(<128 x i8> %src, <128 x i1> %mask, i32 %evl)
+  ret <128 x i8> %dst
+}
+
+; LMUL = 1
+declare <2 x i64> @llvm.experimental.vp.compress.v2i64(<2 x i64>,<2 x i1>,i32)
+declare <2 x i32> @llvm.experimental.vp.compress.v2i32(<2 x i32>,<2 x i1>,i32)
+declare <4 x i32> @llvm.experimental.vp.compress.v4i32(<4 x i32>,<4 x i1>,i32)
+declare <2 x i16> @llvm.experimental.vp.compress.v2i16(<2 x i16>,<2 x i1>,i32)
+declare <4 x i16> @llvm.experimental.vp.compress.v4i16(<4 x i16>,<4 x i1>,i32)
+declare <8 x i16> @llvm.experimental.vp.compress.v8i16(<8 x i16>,<8 x i1>,i32)
+declare <2 x i8> @llvm.experimental.vp.compress.v2i8(<2 x i8>,<2 x i1>,i32)
+declare <4 x i8> @llvm.experimental.vp.compress.v4i8(<4 x i8>,<4 x i1>,i32)
+declare <8 x i8> @llvm.experimental.vp.compress.v8i8(<8 x i8>,<8 x i1>,i32)
+declare <16 x i8> @llvm.experimental.vp.compress.v16i8(<16 x i8>,<16 x i1>,i32)
+
+; LMUL = 2
+declare <4 x i64> @llvm.experimental.vp.compress.v4i64(<4 x i64>,<4 x i1>,i32)
+declare <8 x i32> @llvm.experimental.vp.compress.v8i32(<8 x i32>,<8 x i1>,i32)
+declare <16 x i16> @llvm.experimental.vp.compress.v16i16(<16 x i16>,<16 x i1>,i32)
+declare <32 x i8> @llvm.experimental.vp.compress.v32i8(<32 x i8>,<32 x i1>,i32)
+
+; LMUL = 4
+declare <8 x i64> @llvm.experimental.vp.compress.v8i64(<8 x i64>,<8 x i1>,i32)
+declare <16 x i32> @llvm.experimental.vp.compress.v16i32(<16 x i32>,<16 x i1>,i32)
+declare <32 x i16> @llvm.experimental.vp.compress.v32i16(<32 x i16>,<32 x i1>,i32)
+declare <64 x i8> @llvm.experimental.vp.compress.v64i8(<64 x i8>,<64 x i1>,i32)
+
+; LMUL = 8
+declare <16 x i64> @llvm.experimental.vp.compress.v16i64(<16 x i64>,<16 x i1>,i32)
+declare <32 x i32> @llvm.experimental.vp.compress.v32i32(<32 x i32>,<32 x i1>,i32)
+declare <64 x i16> @llvm.experimental.vp.compress.v64i16(<64 x i16>,<64 x i1>,i32)
+declare <128 x i8> @llvm.experimental.vp.compress.v128i8(<128 x i8>,<128 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-float.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-float.ll
new file mode 100644
index 0000000000000..2d4ab32effa7f
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-float.ll
@@ -0,0 +1,135 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+d -verify-machineinstrs < %s | FileCheck %s
+
+define <2 x double> @test_vp_expand_v2f64_masked(<2 x double> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v2f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x double> @llvm.experimental.vp.expand.v2f64(<2 x double> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x double> %dst
+}
+
+define <2 x float> @test_vp_expand_v2f32_masked(<2 x float> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v2f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x float> @llvm.experimental.vp.expand.v2f32(<2 x float> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x float> %dst
+}
+
+define <4 x float> @test_vp_expand_v4f32_masked(<4 x float> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v4f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x float> @llvm.experimental.vp.expand.v4f32(<4 x float> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x float> %dst
+}
+define <4 x double> @test_vp_expand_v4f64_masked(<4 x double> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v4f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <4 x double> @llvm.experimental.vp.expand.v4f64(<4 x double> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x double> %dst
+}
+
+define <8 x float> @test_vp_expand_v8f32_masked(<8 x float> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v8f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <8 x float> @llvm.experimental.vp.expand.v8f32(<8 x float> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x float> %dst
+}
+
+define <8 x double> @test_vp_expand_v8f64_masked(<8 x double> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v8f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <8 x double> @llvm.experimental.vp.expand.v8f64(<8 x double> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x double> %dst
+}
+
+define <16 x float> @test_vp_expand_v16f32_masked(<16 x float> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v16f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <16 x float> @llvm.experimental.vp.expand.v16f32(<16 x float> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x float> %dst
+}
+
+define <16 x double> @test_vp_expand_v16f64_masked(<16 x double> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v16f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <16 x double> @llvm.experimental.vp.expand.v16f64(<16 x double> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x double> %dst
+}
+
+define <32 x float> @test_vp_expand_v32f32_masked(<32 x float> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v32f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <32 x float> @llvm.experimental.vp.expand.v32f32(<32 x float> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x float> %dst
+}
+
+; LMUL = 1
+declare <2 x double> @llvm.experimental.vp.expand.v2f64(<2 x double>,<2 x i1>,i32)
+declare <2 x float> @llvm.experimental.vp.expand.v2f32(<2 x float>,<2 x i1>,i32)
+declare <4 x float> @llvm.experimental.vp.expand.v4f32(<4 x float>,<4 x i1>,i32)
+
+; LMUL = 2
+declare <4 x double> @llvm.experimental.vp.expand.v4f64(<4 x double>,<4 x i1>,i32)
+declare <8 x float> @llvm.experimental.vp.expand.v8f32(<8 x float>,<8 x i1>,i32)
+
+; LMUL = 4
+declare <8 x double> @llvm.experimental.vp.expand.v8f64(<8 x double>,<8 x i1>,i32)
+declare <16 x float> @llvm.experimental.vp.expand.v16f32(<16 x float>,<16 x i1>,i32)
+
+; LMUL = 8
+declare <16 x double> @llvm.experimental.vp.expand.v16f64(<16 x double>,<16 x i1>,i32)
+declare <32 x float> @llvm.experimental.vp.expand.v32f32(<32 x float>,<32 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-int.ll
new file mode 100644
index 0000000000000..d85746e88f368
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-expand-int.ll
@@ -0,0 +1,298 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <2 x i64> @test_vp_expand_v2i64_masked(<2 x i64> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v2i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i64> @llvm.experimental.vp.expand.v2i64(<2 x i64> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i64> %dst
+}
+
+define <2 x i32> @test_vp_expand_v2i32_masked(<2 x i32> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v2i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i32> @llvm.experimental.vp.expand.v2i32(<2 x i32> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i32> %dst
+}
+
+define <4 x i32> @test_vp_expand_v4i32_masked(<4 x i32> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v4i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x i32> @llvm.experimental.vp.expand.v4i32(<4 x i32> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i32> %dst
+}
+
+define <2 x i16> @test_vp_expand_v2i16_masked(<2 x i16> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v2i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i16> @llvm.experimental.vp.expand.v2i16(<2 x i16> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i16> %dst
+}
+
+define <4 x i16> @test_vp_expand_v4i16_masked(<4 x i16> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v4i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x i16> @llvm.experimental.vp.expand.v4i16(<4 x i16> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i16> %dst
+}
+
+define <8 x i16> @test_vp_expand_v8i16_masked(<8 x i16> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v8i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <8 x i16> @llvm.experimental.vp.expand.v8i16(<8 x i16> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i16> %dst
+}
+
+define <2 x i8> @test_vp_expand_v2i8_masked(<2 x i8> %src, <2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v2i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <2 x i8> @llvm.experimental.vp.expand.v2i8(<2 x i8> %src, <2 x i1> %mask, i32 %evl)
+  ret <2 x i8> %dst
+}
+
+define <4 x i8> @test_vp_expand_v4i8_masked(<4 x i8> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v4i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <4 x i8> @llvm.experimental.vp.expand.v4i8(<4 x i8> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i8> %dst
+}
+
+define <8 x i8> @test_vp_expand_v8i8_masked(<8 x i8> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v8i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <8 x i8> @llvm.experimental.vp.expand.v8i8(<8 x i8> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i8> %dst
+}
+
+define <16 x i8> @test_vp_expand_v16i8_masked(<16 x i8> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v16i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <16 x i8> @llvm.experimental.vp.expand.v16i8(<16 x i8> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i8> %dst
+}
+
+define <4 x i64> @test_vp_expand_v4i64_masked(<4 x i64> %src, <4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v4i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <4 x i64> @llvm.experimental.vp.expand.v4i64(<4 x i64> %src, <4 x i1> %mask, i32 %evl)
+  ret <4 x i64> %dst
+}
+
+define <8 x i32> @test_vp_expand_v8i32_masked(<8 x i32> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v8i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <8 x i32> @llvm.experimental.vp.expand.v8i32(<8 x i32> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i32> %dst
+}
+
+define <16 x i16> @test_vp_expand_v16i16_masked(<16 x i16> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v16i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <16 x i16> @llvm.experimental.vp.expand.v16i16(<16 x i16> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i16> %dst
+}
+
+define <32 x i8> @test_vp_expand_v32i8_masked(<32 x i8> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v32i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <32 x i8> @llvm.experimental.vp.expand.v32i8(<32 x i8> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x i8> %dst
+}
+
+define <8 x i64> @test_vp_expand_v8i64_masked(<8 x i64> %src, <8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v8i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <8 x i64> @llvm.experimental.vp.expand.v8i64(<8 x i64> %src, <8 x i1> %mask, i32 %evl)
+  ret <8 x i64> %dst
+}
+
+define <16 x i32> @test_vp_expand_v16i32_masked(<16 x i32> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v16i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <16 x i32> @llvm.experimental.vp.expand.v16i32(<16 x i32> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i32> %dst
+}
+
+define <32 x i16> @test_vp_expand_v32i16_masked(<32 x i16> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v32i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <32 x i16> @llvm.experimental.vp.expand.v32i16(<32 x i16> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x i16> %dst
+}
+
+define <64 x i8> @test_vp_expand_v64i8_masked(<64 x i8> %src, <64 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v64i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m8, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <64 x i8> @llvm.experimental.vp.expand.v64i8(<64 x i8> %src, <64 x i1> %mask, i32 %evl)
+  ret <64 x i8> %dst
+}
+
+define <16 x i64> @test_vp_expand_v16i64_masked(<16 x i64> %src, <16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v16i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <16 x i64> @llvm.experimental.vp.expand.v16i64(<16 x i64> %src, <16 x i1> %mask, i32 %evl)
+  ret <16 x i64> %dst
+}
+
+define <32 x i32> @test_vp_expand_v32i32_masked(<32 x i32> %src, <32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v32i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <32 x i32> @llvm.experimental.vp.expand.v32i32(<32 x i32> %src, <32 x i1> %mask, i32 %evl)
+  ret <32 x i32> %dst
+}
+
+define <64 x i16> @test_vp_expand_v64i16_masked(<64 x i16> %src, <64 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_v64i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m8, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <64 x i16> @llvm.experimental.vp.expand.v64i16(<64 x i16> %src, <64 x i1> %mask, i32 %evl)
+  ret <64 x i16> %dst
+}
+
+; LMUL = 1
+declare <2 x i64> @llvm.experimental.vp.expand.v2i64(<2 x i64>,<2 x i1>,i32)
+declare <2 x i32> @llvm.experimental.vp.expand.v2i32(<2 x i32>,<2 x i1>,i32)
+declare <4 x i32> @llvm.experimental.vp.expand.v4i32(<4 x i32>,<4 x i1>,i32)
+declare <2 x i16> @llvm.experimental.vp.expand.v2i16(<2 x i16>,<2 x i1>,i32)
+declare <4 x i16> @llvm.experimental.vp.expand.v4i16(<4 x i16>,<4 x i1>,i32)
+declare <8 x i16> @llvm.experimental.vp.expand.v8i16(<8 x i16>,<8 x i1>,i32)
+declare <2 x i8> @llvm.experimental.vp.expand.v2i8(<2 x i8>,<2 x i1>,i32)
+declare <4 x i8> @llvm.experimental.vp.expand.v4i8(<4 x i8>,<4 x i1>,i32)
+declare <8 x i8> @llvm.experimental.vp.expand.v8i8(<8 x i8>,<8 x i1>,i32)
+declare <16 x i8> @llvm.experimental.vp.expand.v16i8(<16 x i8>,<16 x i1>,i32)
+
+; LMUL = 2
+declare <4 x i64> @llvm.experimental.vp.expand.v4i64(<4 x i64>,<4 x i1>,i32)
+declare <8 x i32> @llvm.experimental.vp.expand.v8i32(<8 x i32>,<8 x i1>,i32)
+declare <16 x i16> @llvm.experimental.vp.expand.v16i16(<16 x i16>,<16 x i1>,i32)
+declare <32 x i8> @llvm.experimental.vp.expand.v32i8(<32 x i8>,<32 x i1>,i32)
+
+; LMUL = 4
+declare <8 x i64> @llvm.experimental.vp.expand.v8i64(<8 x i64>,<8 x i1>,i32)
+declare <16 x i32> @llvm.experimental.vp.expand.v16i32(<16 x i32>,<16 x i1>,i32)
+declare <32 x i16> @llvm.experimental.vp.expand.v32i16(<32 x i16>,<32 x i1>,i32)
+declare <64 x i8> @llvm.experimental.vp.expand.v64i8(<64 x i8>,<64 x i1>,i32)
+
+; LMUL = 8
+declare <16 x i64> @llvm.experimental.vp.expand.v16i64(<16 x i64>,<16 x i1>,i32)
+declare <32 x i32> @llvm.experimental.vp.expand.v32i32(<32 x i32>,<32 x i1>,i32)
+declare <64 x i16> @llvm.experimental.vp.expand.v64i16(<64 x i16>,<64 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vp-compress-float.ll b/llvm/test/CodeGen/RISCV/rvv/vp-compress-float.ll
new file mode 100644
index 0000000000000..4d009f37af7c9
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vp-compress-float.ll
@@ -0,0 +1,106 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+f,+d,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <vscale x 1 x double> @test_vp_compress_nxv1f64_masked(<vscale x 1 x double> %src, <vscale x 1 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv1f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 1 x double> @llvm.experimental.vp.compress.nxv1f64(<vscale x 1 x double> %src, <vscale x 1 x i1> %mask, i32 %evl)
+  ret <vscale x 1 x double> %dst
+}
+
+define <vscale x 2 x float> @test_vp_compress_nxv2f32_masked(<vscale x 2 x float> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv2f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x float> @llvm.experimental.vp.compress.nxv2f32(<vscale x 2 x float> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x float> %dst
+}
+
+define <vscale x 2 x double> @test_vp_compress_nxv2f64_masked(<vscale x 2 x double> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv2f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x double> @llvm.experimental.vp.compress.nxv2f64(<vscale x 2 x double> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x double> %dst
+}
+
+define <vscale x 4 x float> @test_vp_compress_nxv4f32_masked(<vscale x 4 x float> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv4f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x float> @llvm.experimental.vp.compress.nxv4f32(<vscale x 4 x float> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x float> %dst
+}
+
+define <vscale x 4 x double> @test_vp_compress_nxv4f64_masked(<vscale x 4 x double> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv4f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x double> @llvm.experimental.vp.compress.nxv4f64(<vscale x 4 x double> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x double> %dst
+}
+
+define <vscale x 8 x float> @test_vp_compress_nxv8f32_masked(<vscale x 8 x float> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv8f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x float> @llvm.experimental.vp.compress.nxv8f32(<vscale x 8 x float> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x float> %dst
+}
+
+define <vscale x 8 x double> @test_vp_compress_nxv8f64_masked(<vscale x 8 x double> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv8f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x double> @llvm.experimental.vp.compress.nxv8f64(<vscale x 8 x double> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x double> %dst
+}
+
+define <vscale x 16 x float> @test_vp_compress_nxv16f32_masked(<vscale x 16 x float> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv16f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x float> @llvm.experimental.vp.compress.nxv16f32(<vscale x 16 x float> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x float> %dst
+}
+
+; LMUL = 1
+declare <vscale x 1 x double> @llvm.experimental.vp.compress.nxv1f64(<vscale x 1 x double>,<vscale x 1 x i1>,i32)
+declare <vscale x 2 x float> @llvm.experimental.vp.compress.nxv2f32(<vscale x 2 x float>,<vscale x 2 x i1>,i32)
+
+; LMUL = 2
+declare <vscale x 2 x double> @llvm.experimental.vp.compress.nxv2f64(<vscale x 2 x double>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x float> @llvm.experimental.vp.compress.nxv4f32(<vscale x 4 x float>,<vscale x 4 x i1>,i32)
+
+; LMUL = 4
+declare <vscale x 4 x double> @llvm.experimental.vp.compress.nxv4f64(<vscale x 4 x double>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x float> @llvm.experimental.vp.compress.nxv8f32(<vscale x 8 x float>,<vscale x 8 x i1>,i32)
+
+; LMUL = 8
+declare <vscale x 8 x double> @llvm.experimental.vp.compress.nxv8f64(<vscale x 8 x double>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x float> @llvm.experimental.vp.compress.nxv16f32(<vscale x 16 x float>,<vscale x 16 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vp-compress-int.ll b/llvm/test/CodeGen/RISCV/rvv/vp-compress-int.ll
new file mode 100644
index 0000000000000..4452386a7a5eb
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vp-compress-int.ll
@@ -0,0 +1,202 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <vscale x 1 x i64> @test_vp_compress_nxv1i64_masked(<vscale x 1 x i64> %src, <vscale x 1 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv1i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64> %src, <vscale x 1 x i1> %mask, i32 %evl)
+  ret <vscale x 1 x i64> %dst
+}
+
+define <vscale x 2 x i32> @test_vp_compress_nxv2i32_masked(<vscale x 2 x i32> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv2i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x i32> %dst
+}
+
+define <vscale x 4 x i16> @test_vp_compress_nxv4i16_masked(<vscale x 4 x i16> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv4i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x i16> %dst
+}
+
+define <vscale x 8 x i8> @test_vp_compress_nxv8i8_masked(<vscale x 8 x i8> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv8i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m1, ta, ma
+; CHECK-NEXT:    vcompress.vm v9, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i8> %dst
+}
+
+define <vscale x 2 x i64> @test_vp_compress_nxv2i64_masked(<vscale x 2 x i64> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv2i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x i64> %dst
+}
+
+define <vscale x 4 x i32> @test_vp_compress_nxv4i32_masked(<vscale x 4 x i32> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv4i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x i32> %dst
+}
+
+define <vscale x 8 x i16> @test_vp_compress_nxv8i16_masked(<vscale x 8 x i16> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv8i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i16> %dst
+}
+
+define <vscale x 16 x i8> @test_vp_compress_nxv16i8_masked(<vscale x 16 x i8> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv16i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m2, ta, ma
+; CHECK-NEXT:    vcompress.vm v10, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x i8> %dst
+}
+
+define <vscale x 4 x i64> @test_vp_compress_nxv4i64_masked(<vscale x 4 x i64> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv4i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x i64> %dst
+}
+
+define <vscale x 8 x i32> @test_vp_compress_nxv8i32_masked(<vscale x 8 x i32> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv8i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i32> %dst
+}
+
+define <vscale x 16 x i16> @test_vp_compress_nxv16i16_masked(<vscale x 16 x i16> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv16i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x i16> %dst
+}
+
+define <vscale x 32 x i8> @test_vp_compress_nxv32i8_masked(<vscale x 32 x i8> %src, <vscale x 32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv32i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m4, ta, ma
+; CHECK-NEXT:    vcompress.vm v12, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8> %src, <vscale x 32 x i1> %mask, i32 %evl)
+  ret <vscale x 32 x i8> %dst
+}
+
+define <vscale x 8 x i64> @test_vp_compress_nxv8i64_masked(<vscale x 8 x i64> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv8i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i64> %dst
+}
+
+define <vscale x 16 x i32> @test_vp_compress_nxv16i32_masked(<vscale x 16 x i32> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv16i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x i32> %dst
+}
+
+define <vscale x 32 x i16> @test_vp_compress_nxv32i16_masked(<vscale x 32 x i16> %src, <vscale x 32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv32i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16> %src, <vscale x 32 x i1> %mask, i32 %evl)
+  ret <vscale x 32 x i16> %dst
+}
+
+define <vscale x 64 x i8> @test_vp_compress_nxv64i8_masked(<vscale x 64 x i8> %src, <vscale x 64 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_compress_nxv64i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT:    vcompress.vm v16, v8, v0
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8> %src, <vscale x 64 x i1> %mask, i32 %evl)
+  ret <vscale x 64 x i8> %dst
+}
+
+; LMUL = 1
+declare <vscale x 1 x i64> @llvm.experimental.vp.compress.nxv1i64(<vscale x 1 x i64>,<vscale x 1 x i1>,i32)
+declare <vscale x 2 x i32> @llvm.experimental.vp.compress.nxv2i32(<vscale x 2 x i32>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i16> @llvm.experimental.vp.compress.nxv4i16(<vscale x 4 x i16>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i8> @llvm.experimental.vp.compress.nxv8i8(<vscale x 8 x i8>,<vscale x 8 x i1>,i32)
+
+; LMUL = 2
+declare <vscale x 2 x i64> @llvm.experimental.vp.compress.nxv2i64(<vscale x 2 x i64>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i32> @llvm.experimental.vp.compress.nxv4i32(<vscale x 4 x i32>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i16> @llvm.experimental.vp.compress.nxv8i16(<vscale x 8 x i16>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i8> @llvm.experimental.vp.compress.nxv16i8(<vscale x 16 x i8>,<vscale x 16 x i1>,i32)
+
+; LMUL = 4
+declare <vscale x 4 x i64> @llvm.experimental.vp.compress.nxv4i64(<vscale x 4 x i64>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i32> @llvm.experimental.vp.compress.nxv8i32(<vscale x 8 x i32>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i16> @llvm.experimental.vp.compress.nxv16i16(<vscale x 16 x i16>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i8> @llvm.experimental.vp.compress.nxv32i8(<vscale x 32 x i8>,<vscale x 32 x i1>,i32)
+
+; LMUL = 8
+declare <vscale x 8 x i64> @llvm.experimental.vp.compress.nxv8i64(<vscale x 8 x i64>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i32> @llvm.experimental.vp.compress.nxv16i32(<vscale x 16 x i32>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i16> @llvm.experimental.vp.compress.nxv32i16(<vscale x 32 x i16>,<vscale x 32 x i1>,i32)
+declare <vscale x 64 x i8> @llvm.experimental.vp.compress.nxv64i8(<vscale x 64 x i8>,<vscale x 64 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vp-expand-float.ll b/llvm/test/CodeGen/RISCV/rvv/vp-expand-float.ll
new file mode 100644
index 0000000000000..87de11075b55e
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vp-expand-float.ll
@@ -0,0 +1,122 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+f,+d,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <vscale x 1 x double> @test_vp_expand_nxv1f64_masked(<vscale x 1 x double> %src, <vscale x 1 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv1f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 1 x double> @llvm.experimental.vp.expand.nxv1f64(<vscale x 1 x double> %src, <vscale x 1 x i1> %mask, i32 %evl)
+  ret <vscale x 1 x double> %dst
+}
+
+define <vscale x 2 x float> @test_vp_expand_nxv2f32_masked(<vscale x 2 x float> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv2f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x float> @llvm.experimental.vp.expand.nxv2f32(<vscale x 2 x float> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x float> %dst
+}
+
+define <vscale x 2 x double> @test_vp_expand_nxv2f64_masked(<vscale x 2 x double> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv2f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x double> @llvm.experimental.vp.expand.nxv2f64(<vscale x 2 x double> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x double> %dst
+}
+
+define <vscale x 4 x float> @test_vp_expand_nxv4f32_masked(<vscale x 4 x float> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv4f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x float> @llvm.experimental.vp.expand.nxv4f32(<vscale x 4 x float> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x float> %dst
+}
+
+define <vscale x 4 x double> @test_vp_expand_nxv4f64_masked(<vscale x 4 x double> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv4f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x double> @llvm.experimental.vp.expand.nxv4f64(<vscale x 4 x double> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x double> %dst
+}
+
+define <vscale x 8 x float> @test_vp_expand_nxv8f32_masked(<vscale x 8 x float> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv8f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x float> @llvm.experimental.vp.expand.nxv8f32(<vscale x 8 x float> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x float> %dst
+}
+
+define <vscale x 8 x double> @test_vp_expand_nxv8f64_masked(<vscale x 8 x double> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv8f64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x double> @llvm.experimental.vp.expand.nxv8f64(<vscale x 8 x double> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x double> %dst
+}
+
+define <vscale x 16 x float> @test_vp_expand_nxv16f32_masked(<vscale x 16 x float> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv16f32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x float> @llvm.experimental.vp.expand.nxv16f32(<vscale x 16 x float> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x float> %dst
+}
+
+; LMUL = 1
+declare <vscale x 1 x double> @llvm.experimental.vp.expand.nxv1f64(<vscale x 1 x double>,<vscale x 1 x i1>,i32)
+declare <vscale x 2 x float> @llvm.experimental.vp.expand.nxv2f32(<vscale x 2 x float>,<vscale x 2 x i1>,i32)
+
+; LMUL = 2
+declare <vscale x 2 x double> @llvm.experimental.vp.expand.nxv2f64(<vscale x 2 x double>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x float> @llvm.experimental.vp.expand.nxv4f32(<vscale x 4 x float>,<vscale x 4 x i1>,i32)
+
+; LMUL = 4
+declare <vscale x 4 x double> @llvm.experimental.vp.expand.nxv4f64(<vscale x 4 x double>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x float> @llvm.experimental.vp.expand.nxv8f32(<vscale x 8 x float>,<vscale x 8 x i1>,i32)
+
+; LMUL = 8
+declare <vscale x 8 x double> @llvm.experimental.vp.expand.nxv8f64(<vscale x 8 x double>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x float> @llvm.experimental.vp.expand.nxv16f32(<vscale x 16 x float>,<vscale x 16 x i1>,i32)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vp-expand-int.ll b/llvm/test/CodeGen/RISCV/rvv/vp-expand-int.ll
new file mode 100644
index 0000000000000..74dee12d4a4bf
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vp-expand-int.ll
@@ -0,0 +1,216 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <vscale x 1 x i64> @test_vp_expand_nxv1i64_masked(<vscale x 1 x i64> %src, <vscale x 1 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv1i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64> %src, <vscale x 1 x i1> %mask, i32 %evl)
+  ret <vscale x 1 x i64> %dst
+}
+
+define <vscale x 2 x i32> @test_vp_expand_nxv2i32_masked(<vscale x 2 x i32> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv2i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x i32> %dst
+}
+
+define <vscale x 4 x i16> @test_vp_expand_nxv4i16_masked(<vscale x 4 x i16> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv4i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<vscale x 4 x i16> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x i16> %dst
+}
+
+define <vscale x 8 x i8> @test_vp_expand_nxv8i8_masked(<vscale x 8 x i8> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv8i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v10, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, m1, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v9, v8, v10, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i8> @llvm.experimental.vp.expand.nxv8i8(<vscale x 8 x i8> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i8> %dst
+}
+
+define <vscale x 2 x i64> @test_vp_expand_nxv2i64_masked(<vscale x 2 x i64> %src, <vscale x 2 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv2i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 2 x i64> @llvm.experimental.vp.expand.nxv2i64(<vscale x 2 x i64> %src, <vscale x 2 x i1> %mask, i32 %evl)
+  ret <vscale x 2 x i64> %dst
+}
+
+define <vscale x 4 x i32> @test_vp_expand_nxv4i32_masked(<vscale x 4 x i32> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv4i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x i32> @llvm.experimental.vp.expand.nxv4i32(<vscale x 4 x i32> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x i32> %dst
+}
+
+define <vscale x 8 x i16> @test_vp_expand_nxv8i16_masked(<vscale x 8 x i16> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv8i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i16> @llvm.experimental.vp.expand.nxv8i16(<vscale x 8 x i16> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i16> %dst
+}
+
+define <vscale x 16 x i8> @test_vp_expand_nxv16i8_masked(<vscale x 16 x i8> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv16i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, m2, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v10, v8, v12, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x i8> @llvm.experimental.vp.expand.nxv16i8(<vscale x 16 x i8> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x i8> %dst
+}
+
+define <vscale x 4 x i64> @test_vp_expand_nxv4i64_masked(<vscale x 4 x i64> %src, <vscale x 4 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv4i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m1, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 4 x i64> @llvm.experimental.vp.expand.nxv4i64(<vscale x 4 x i64> %src, <vscale x 4 x i1> %mask, i32 %evl)
+  ret <vscale x 4 x i64> %dst
+}
+
+define <vscale x 8 x i32> @test_vp_expand_nxv8i32_masked(<vscale x 8 x i32> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv8i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i32> @llvm.experimental.vp.expand.nxv8i32(<vscale x 8 x i32> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i32> %dst
+}
+
+define <vscale x 16 x i16> @test_vp_expand_nxv16i16_masked(<vscale x 16 x i16> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv16i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x i16> @llvm.experimental.vp.expand.nxv16i16(<vscale x 16 x i16> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x i16> %dst
+}
+
+define <vscale x 32 x i8> @test_vp_expand_nxv32i8_masked(<vscale x 32 x i8> %src, <vscale x 32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv32i8_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m8, ta, ma
+; CHECK-NEXT:    viota.m v16, v0
+; CHECK-NEXT:    vsetvli zero, zero, e8, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v12, v8, v16, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v12
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 32 x i8> @llvm.experimental.vp.expand.nxv32i8(<vscale x 32 x i8> %src, <vscale x 32 x i1> %mask, i32 %evl)
+  ret <vscale x 32 x i8> %dst
+}
+
+define <vscale x 8 x i64> @test_vp_expand_nxv8i64_masked(<vscale x 8 x i64> %src, <vscale x 8 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv8i64_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m2, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 8 x i64> @llvm.experimental.vp.expand.nxv8i64(<vscale x 8 x i64> %src, <vscale x 8 x i1> %mask, i32 %evl)
+  ret <vscale x 8 x i64> %dst
+}
+
+define <vscale x 16 x i32> @test_vp_expand_nxv16i32_masked(<vscale x 16 x i32> %src, <vscale x 16 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv16i32_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 16 x i32> @llvm.experimental.vp.expand.nxv16i32(<vscale x 16 x i32> %src, <vscale x 16 x i1> %mask, i32 %evl)
+  ret <vscale x 16 x i32> %dst
+}
+
+define <vscale x 32 x i16> @test_vp_expand_nxv32i16_masked(<vscale x 32 x i16> %src, <vscale x 32 x i1> %mask, i32 zeroext %evl) {
+; CHECK-LABEL: test_vp_expand_nxv32i16_masked:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e16, m8, ta, ma
+; CHECK-NEXT:    viota.m v24, v0
+; CHECK-NEXT:    vrgatherei16.vv v16, v8, v24, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %dst = call <vscale x 32 x i16> @llvm.experimental.vp.expand.nxv32i16(<vscale x 32 x i16> %src, <vscale x 32 x i1> %mask, i32 %evl)
+  ret <vscale x 32 x i16> %dst
+}
+
+; LMUL = 1
+declare <vscale x 1 x i64> @llvm.experimental.vp.expand.nxv1i64(<vscale x 1 x i64>,<vscale x 1 x i1>,i32)
+declare <vscale x 2 x i32> @llvm.experimental.vp.expand.nxv2i32(<vscale x 2 x i32>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i16> @llvm.experimental.vp.expand.nxv4i16(<vscale x 4 x i16>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i8> @llvm.experimental.vp.expand.nxv8i8(<vscale x 8 x i8>,<vscale x 8 x i1>,i32)
+
+; LMUL = 2
+declare <vscale x 2 x i64> @llvm.experimental.vp.expand.nxv2i64(<vscale x 2 x i64>,<vscale x 2 x i1>,i32)
+declare <vscale x 4 x i32> @llvm.experimental.vp.expand.nxv4i32(<vscale x 4 x i32>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i16> @llvm.experimental.vp.expand.nxv8i16(<vscale x 8 x i16>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i8> @llvm.experimental.vp.expand.nxv16i8(<vscale x 16 x i8>,<vscale x 16 x i1>,i32)
+
+; LMUL = 4
+declare <vscale x 4 x i64> @llvm.experimental.vp.expand.nxv4i64(<vscale x 4 x i64>,<vscale x 4 x i1>,i32)
+declare <vscale x 8 x i32> @llvm.experimental.vp.expand.nxv8i32(<vscale x 8 x i32>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i16> @llvm.experimental.vp.expand.nxv16i16(<vscale x 16 x i16>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i8> @llvm.experimental.vp.expand.nxv32i8(<vscale x 32 x i8>,<vscale x 32 x i1>,i32)
+
+; LMUL = 8
+declare <vscale x 8 x i64> @llvm.experimental.vp.expand.nxv8i64(<vscale x 8 x i64>,<vscale x 8 x i1>,i32)
+declare <vscale x 16 x i32> @llvm.experimental.vp.expand.nxv16i32(<vscale x 16 x i32>,<vscale x 16 x i1>,i32)
+declare <vscale x 32 x i16> @llvm.experimental.vp.expand.nxv32i16(<vscale x 32 x i16>,<vscale x 32 x i1>,i32)
