[llvm] [RISCV] custom scmp(x,0) and scmp(0,x) lowering for RVV [draft] (PR #151753)
Olaf Bernstein via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 1 12:20:40 PDT 2025
https://github.com/camel-cdr created https://github.com/llvm/llvm-project/pull/151753
I noticed that the current codegen for scmp(x,0) and scmp(0,x), also known as sign(x) and -sign(x), isn't optimal for RVV.
It produces a sequence of four instructions
```
vmsgt.vi + vmslt.vi + vmerge.vim + vmerge.vim
```
for SEW<=32, and three instructions for SEW=64:
```
scmp(0,x): vmsgt.vi + vsra.vx + vor.vi
scmp(x,0): vmsgt.vi + vsrl.vx + vmerge.vim
```
This patch introduces a new lowering, expressed in SelectionDAG nodes, that covers all values of SEW.
This maps to two arithmetic instructions and a vector register move:
```
scmp(0,x): vmv.v.i/v + vmsgt.vi + masked vsrl.vi/vx
scmp(x,0): vmv.v.i/v + vmsgt.vi + masked vsra.vi/vx
```
These clobber v0, need a destination register different from the input, and require an additional GPR for SEW=64.
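
For reference, here is a scalar model of the per-element identities these sequences implement (not part of the patch; the function names are made up, and it assumes an arithmetic right shift for signed values, shown for 32-bit elements):
```
#include <cstdint>

// scmp(x, 0) == sign(x): select the constant 1 where x > 0; otherwise
// the arithmetic shift gives 0 for x == 0 and -1 for x < 0.
int32_t scmp_x0(int32_t x) {
  return x > 0 ? 1 : (x >> 31);
}

// scmp(0, x) == -sign(x): select the constant -1 where x > 0; otherwise
// the logical shift gives 0 for x == 0 and 1 for x < 0.
int32_t scmp_0x(int32_t x) {
  return x > 0 ? -1 : (int32_t)((uint32_t)x >> 31);
}
```
The vector lowering applies the same identity lane-wise, with the compare producing the mask in v0.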
For the SEW<=32 scmp(x,0) case, a slightly different lowering was chosen:
```
scmp(x,0): vmin.vx + vsra.vi + vor.vv
```
This doesn't clobber v0, but uses a single GPR.
I deemed using a single GPR slightly better than clobbering v0 (for SEW<=32), but using two GPRs worse than using one GPR and clobbering v0. I haven't done any empirical tests, though.
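
For completeness, a scalar model of the branchless SEW<=32 variant as well (again just an illustration, not part of the patch; it assumes an arithmetic right shift for signed values):
```
#include <cstdint>

// scmp(x, 0) == sign(x) without a mask:
//   x <  0: (x >> 31) == -1, so the OR yields -1
//   x == 0: 0 | min(0, 1) == 0
//   x >  0: 0 | min(x, 1) == 1
int32_t scmp_x0_branchless(int32_t x) {
  int32_t sra = x >> 31;        // sign mask: -1 or 0
  int32_t min = x < 1 ? x : 1;  // min(x, 1)
  return sra | min;
}
```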
---
I'm not sure why the fixed-vectors tests are so messed up, so I marked this as a draft for now.
This type of lowering is also advantageous for SVE and AVX512 and could be generically implemented in `TargetLowering::expandCMP`, but you would need to know whether integer shift instructions of the element size are available.
Here are the alive2 transforms:
* [scmp(x,0) SEW<=32 variant](https://alive2.llvm.org/ce/z/_NZgiz)
* [scmp(x,0) SEW>32 variant](https://alive2.llvm.org/ce/z/9Nhhtk)
* [scmp(0,x)](https://alive2.llvm.org/ce/z/BrxiTs)
From 3cb30b9c72004eb56a98ac942ffade5117754967 Mon Sep 17 00:00:00 2001
From: Olaf Bernstein <camel-cdr at protonmail.com>
Date: Fri, 1 Aug 2025 20:44:17 +0200
Subject: [PATCH] [RISCV] custom scmp(x,0) and scmp(0,x) lowering for RVV
The current codegen for scmp(x,0) and scmp(0,x), also known as sign(x)
and -sign(x), isn't optimal for RVV.
It produces a sequence of four instructions
vmsgt.vi + vmslt.vi + vmerge.vim + vmerge.vim
for SEW<=32, and three instructions for SEW=64.
scmp(0,x): vmsgt.vi + vsra.vx + vor.vi
scmp(x,0): vmsgt.vi + vsrl.vx + vmerge.vim
This patch introduces a new lowering, expressed in SelectionDAG nodes,
that covers all values of SEW.
This maps to two arithmetic instructions and a vector register move:
scmp(0,x): vmv.v.i/v + vmsgt.vi + masked vsrl.vi/vx
scmp(x,0): vmv.v.i/v + vmsgt.vi + masked vsra.vi/vx
These clobber v0, need a destination register different from the
input, and require an additional GPR for SEW=64.
For the SEW<=32 scmp(x,0) case, a slightly different
lowering was chosen:
scmp(x,0): vmin.vx + vsra.vi + vor.vv
This doesn't clobber v0, but uses a single GPR.
We deemed using a single GPR slightly better than clobbering v0
(SEW<=32), but using two GPRs worse than using one GPR and
clobbering v0.
---
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 31 +
.../CodeGen/RISCV/rvv/fixed-vectors-scmp.ll | 596 ++++++++++++++++++
llvm/test/CodeGen/RISCV/rvv/scmp.ll | 200 ++++++
3 files changed, 827 insertions(+)
create mode 100644 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-scmp.ll
create mode 100644 llvm/test/CodeGen/RISCV/rvv/scmp.ll
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index adbfbeb4669e7..f36f134fff452 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -880,6 +880,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction({ISD::SMIN, ISD::SMAX, ISD::UMIN, ISD::UMAX}, VT,
Legal);
+ setOperationAction(ISD::SCMP, VT, Custom);
setOperationAction({ISD::ABDS, ISD::ABDU}, VT, Custom);
// Custom-lower extensions and truncations from/to mask types.
@@ -1361,6 +1362,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(
{ISD::SMIN, ISD::SMAX, ISD::UMIN, ISD::UMAX, ISD::ABS}, VT, Custom);
+ setOperationAction(ISD::SCMP, VT, Custom);
setOperationAction({ISD::ABDS, ISD::ABDU}, VT, Custom);
// vXi64 MULHS/MULHU requires the V extension instead of Zve64*.
@@ -8223,6 +8225,35 @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
case ISD::SADDSAT:
case ISD::SSUBSAT:
return lowerToScalableOp(Op, DAG);
+ case ISD::SCMP: {
+ SDLoc DL(Op);
+ EVT VT = Op->getValueType(0);
+ SDValue LHS = DAG.getFreeze(Op->getOperand(0));
+ SDValue RHS = DAG.getFreeze(Op->getOperand(1));
+ unsigned SEW = VT.getScalarSizeInBits();
+
+ SDValue Shift = DAG.getConstant(SEW-1, DL, VT);
+ SDValue Zero = DAG.getConstant(0, DL, VT);
+ SDValue One = DAG.getConstant(1, DL, VT);
+ SDValue MinusOne = DAG.getAllOnesConstant(DL, VT);
+
+ if (ISD::isConstantSplatVectorAllZeros(RHS.getNode())) {
+ SDValue Sra = DAG.getNode(ISD::SRA, DL, VT, LHS, Shift);
+ if (SEW <= 32) {
+ // scmp(lhs, 0) -> vor.vv(vsra.vi(lhs,SEW-1), vmin.vx(lhs,1))
+ SDValue Min = DAG.getNode(ISD::SMIN, DL, VT, LHS, One);
+ return DAG.getNode(ISD::OR, DL, VT, Sra, Min);
+ }
+    // scmp(lhs, 0) -> vmerge.vi(vmsgt.vi(lhs,0), vsra.vx(lhs,SEW-1), 1)
+    return DAG.getSelectCC(DL, LHS, Zero, One, Sra, ISD::SETGT);
+ } else if (ISD::isConstantSplatVectorAllZeros(LHS.getNode())) {
+ // scmp(0, rhs) -> vmerge.vi(vmsgt.vi(rhs,0), vsrl.vi/vx(rhs,SEW-1), -1)
+ SDValue Srl = DAG.getNode(ISD::SRL, DL, VT, RHS, Shift);
+    return DAG.getSelectCC(DL, RHS, Zero, MinusOne, Srl, ISD::SETGT);
+ }
+
+ return SDValue();
+ }
case ISD::ABDS:
case ISD::ABDU: {
SDLoc dl(Op);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-scmp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-scmp.ll
new file mode 100644
index 0000000000000..444d3a08216c9
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-scmp.ll
@@ -0,0 +1,596 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV64
+
+define <16 x i8> @scmp_i8i8(<16 x i8> %a, <16 x i8> %b) {
+; CHECK-LABEL: scmp_i8i8:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v9, v8
+; CHECK-NEXT: vmv.v.i v10, 0
+; CHECK-NEXT: vmerge.vim v10, v10, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v9
+; CHECK-NEXT: vmerge.vim v8, v10, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <16 x i8> @llvm.scmp(<16 x i8> %a, <16 x i8> %b)
+ ret <16 x i8> %c
+}
+
+define <16 x i8> @scmp_z8i8(<16 x i8> %a) {
+; RV32-LABEL: scmp_z8i8:
+; RV32: # %bb.0: # %entry
+; RV32-NEXT: addi sp, sp, -16
+; RV32-NEXT: .cfi_def_cfa_offset 16
+; RV32-NEXT: sw s0, 12(sp) # 4-byte Folded Spill
+; RV32-NEXT: .cfi_offset s0, -4
+; RV32-NEXT: vsetivli zero, 16, e8, m1, ta, ma
+; RV32-NEXT: vslidedown.vi v10, v8, 9
+; RV32-NEXT: vsrl.vi v9, v8, 7
+; RV32-NEXT: vslidedown.vi v11, v8, 8
+; RV32-NEXT: vslidedown.vi v12, v8, 10
+; RV32-NEXT: vmv.x.s a0, v10
+; RV32-NEXT: vslidedown.vi v10, v8, 11
+; RV32-NEXT: vmv.x.s a1, v11
+; RV32-NEXT: vslidedown.vi v11, v8, 12
+; RV32-NEXT: vmv.x.s a2, v12
+; RV32-NEXT: vslidedown.vi v12, v8, 13
+; RV32-NEXT: vmv.x.s a3, v10
+; RV32-NEXT: vslidedown.vi v10, v8, 14
+; RV32-NEXT: vmv.x.s a4, v11
+; RV32-NEXT: vslidedown.vi v11, v8, 15
+; RV32-NEXT: vmv.x.s a5, v12
+; RV32-NEXT: vslidedown.vi v12, v8, 1
+; RV32-NEXT: vmv.x.s t4, v8
+; RV32-NEXT: vmv.x.s a6, v10
+; RV32-NEXT: vslidedown.vi v10, v8, 2
+; RV32-NEXT: vmv.x.s a7, v11
+; RV32-NEXT: vslidedown.vi v11, v8, 3
+; RV32-NEXT: vmv.x.s t0, v12
+; RV32-NEXT: vslidedown.vi v12, v8, 4
+; RV32-NEXT: vmv.x.s t1, v10
+; RV32-NEXT: vslidedown.vi v10, v8, 5
+; RV32-NEXT: vmv.x.s t2, v11
+; RV32-NEXT: vslidedown.vi v11, v8, 6
+; RV32-NEXT: vslidedown.vi v8, v8, 7
+; RV32-NEXT: li t6, 255
+; RV32-NEXT: vmv.x.s t3, v12
+; RV32-NEXT: vslidedown.vi v12, v9, 9
+; RV32-NEXT: vmv.x.s t5, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 8
+; RV32-NEXT: sgtz s0, t4
+; RV32-NEXT: vmv.x.s t4, v11
+; RV32-NEXT: vsetvli zero, zero, e16, m2, ta, ma
+; RV32-NEXT: vmv.s.x v0, t6
+; RV32-NEXT: vsetvli zero, zero, e8, m1, ta, mu
+; RV32-NEXT: vmv.x.s t6, v9
+; RV32-NEXT: addi s0, s0, -1
+; RV32-NEXT: or t6, s0, t6
+; RV32-NEXT: vmv.x.s s0, v12
+; RV32-NEXT: vslidedown.vi v11, v9, 10
+; RV32-NEXT: sgtz a0, a0
+; RV32-NEXT: addi a0, a0, -1
+; RV32-NEXT: or a0, a0, s0
+; RV32-NEXT: vmv.x.s s0, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 11
+; RV32-NEXT: sgtz a1, a1
+; RV32-NEXT: addi a1, a1, -1
+; RV32-NEXT: or a1, a1, s0
+; RV32-NEXT: vmv.x.s s0, v11
+; RV32-NEXT: vslidedown.vi v11, v9, 12
+; RV32-NEXT: sgtz a2, a2
+; RV32-NEXT: addi a2, a2, -1
+; RV32-NEXT: or a2, a2, s0
+; RV32-NEXT: vmv.x.s s0, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 13
+; RV32-NEXT: sgtz a3, a3
+; RV32-NEXT: addi a3, a3, -1
+; RV32-NEXT: or a3, a3, s0
+; RV32-NEXT: vmv.x.s s0, v11
+; RV32-NEXT: vslidedown.vi v11, v9, 14
+; RV32-NEXT: sgtz a4, a4
+; RV32-NEXT: addi a4, a4, -1
+; RV32-NEXT: or a4, a4, s0
+; RV32-NEXT: vmv.x.s s0, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 15
+; RV32-NEXT: sgtz a5, a5
+; RV32-NEXT: addi a5, a5, -1
+; RV32-NEXT: or a5, a5, s0
+; RV32-NEXT: vmv.x.s s0, v11
+; RV32-NEXT: vslidedown.vi v11, v9, 1
+; RV32-NEXT: sgtz a6, a6
+; RV32-NEXT: addi a6, a6, -1
+; RV32-NEXT: or a6, a6, s0
+; RV32-NEXT: vmv.x.s s0, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 2
+; RV32-NEXT: sgtz a7, a7
+; RV32-NEXT: addi a7, a7, -1
+; RV32-NEXT: or a7, a7, s0
+; RV32-NEXT: vmv.x.s s0, v11
+; RV32-NEXT: vslidedown.vi v11, v9, 3
+; RV32-NEXT: sgtz t0, t0
+; RV32-NEXT: addi t0, t0, -1
+; RV32-NEXT: or t0, t0, s0
+; RV32-NEXT: vmv.x.s s0, v8
+; RV32-NEXT: vmv.v.x v8, t6
+; RV32-NEXT: vmv.x.s t6, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 4
+; RV32-NEXT: sgtz t1, t1
+; RV32-NEXT: addi t1, t1, -1
+; RV32-NEXT: or t1, t1, t6
+; RV32-NEXT: vmv.x.s t6, v11
+; RV32-NEXT: vslidedown.vi v11, v9, 5
+; RV32-NEXT: sgtz t2, t2
+; RV32-NEXT: addi t2, t2, -1
+; RV32-NEXT: or t2, t2, t6
+; RV32-NEXT: vmv.x.s t6, v10
+; RV32-NEXT: vslidedown.vi v10, v9, 6
+; RV32-NEXT: vslidedown.vi v9, v9, 7
+; RV32-NEXT: sgtz t3, t3
+; RV32-NEXT: sgtz t5, t5
+; RV32-NEXT: addi t3, t3, -1
+; RV32-NEXT: or t3, t3, t6
+; RV32-NEXT: vmv.x.s t6, v11
+; RV32-NEXT: sgtz t4, t4
+; RV32-NEXT: addi t5, t5, -1
+; RV32-NEXT: or t5, t5, t6
+; RV32-NEXT: vmv.x.s t6, v10
+; RV32-NEXT: sgtz s0, s0
+; RV32-NEXT: addi t4, t4, -1
+; RV32-NEXT: or t4, t4, t6
+; RV32-NEXT: vmv.x.s t6, v9
+; RV32-NEXT: addi s0, s0, -1
+; RV32-NEXT: or t6, s0, t6
+; RV32-NEXT: vmv.v.x v9, a1
+; RV32-NEXT: vslide1down.vx v8, v8, t0
+; RV32-NEXT: vslide1down.vx v9, v9, a0
+; RV32-NEXT: vslide1down.vx v8, v8, t1
+; RV32-NEXT: vslide1down.vx v9, v9, a2
+; RV32-NEXT: vslide1down.vx v8, v8, t2
+; RV32-NEXT: vslide1down.vx v9, v9, a3
+; RV32-NEXT: vslide1down.vx v8, v8, t3
+; RV32-NEXT: vslide1down.vx v9, v9, a4
+; RV32-NEXT: vslide1down.vx v8, v8, t5
+; RV32-NEXT: vslide1down.vx v9, v9, a5
+; RV32-NEXT: vslide1down.vx v10, v8, t4
+; RV32-NEXT: vslide1down.vx v8, v9, a6
+; RV32-NEXT: vslide1down.vx v8, v8, a7
+; RV32-NEXT: vslide1down.vx v9, v10, t6
+; RV32-NEXT: vslidedown.vi v8, v9, 8, v0.t
+; RV32-NEXT: lw s0, 12(sp) # 4-byte Folded Reload
+; RV32-NEXT: .cfi_restore s0
+; RV32-NEXT: addi sp, sp, 16
+; RV32-NEXT: .cfi_def_cfa_offset 0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: scmp_z8i8:
+; RV64: # %bb.0: # %entry
+; RV64-NEXT: addi sp, sp, -16
+; RV64-NEXT: .cfi_def_cfa_offset 16
+; RV64-NEXT: sd s0, 8(sp) # 8-byte Folded Spill
+; RV64-NEXT: .cfi_offset s0, -8
+; RV64-NEXT: vsetivli zero, 16, e8, m1, ta, ma
+; RV64-NEXT: vslidedown.vi v10, v8, 9
+; RV64-NEXT: vsrl.vi v9, v8, 7
+; RV64-NEXT: vslidedown.vi v11, v8, 8
+; RV64-NEXT: vslidedown.vi v12, v8, 10
+; RV64-NEXT: vmv.x.s a0, v10
+; RV64-NEXT: vslidedown.vi v10, v8, 11
+; RV64-NEXT: vmv.x.s a1, v11
+; RV64-NEXT: vslidedown.vi v11, v8, 12
+; RV64-NEXT: vmv.x.s a2, v12
+; RV64-NEXT: vslidedown.vi v12, v8, 13
+; RV64-NEXT: vmv.x.s a3, v10
+; RV64-NEXT: vslidedown.vi v10, v8, 14
+; RV64-NEXT: vmv.x.s a4, v11
+; RV64-NEXT: vslidedown.vi v11, v8, 15
+; RV64-NEXT: vmv.x.s a5, v12
+; RV64-NEXT: vslidedown.vi v12, v8, 1
+; RV64-NEXT: vmv.x.s t4, v8
+; RV64-NEXT: vmv.x.s a6, v10
+; RV64-NEXT: vslidedown.vi v10, v8, 2
+; RV64-NEXT: vmv.x.s a7, v11
+; RV64-NEXT: vslidedown.vi v11, v8, 3
+; RV64-NEXT: vmv.x.s t0, v12
+; RV64-NEXT: vslidedown.vi v12, v8, 4
+; RV64-NEXT: vmv.x.s t1, v10
+; RV64-NEXT: vslidedown.vi v10, v8, 5
+; RV64-NEXT: vmv.x.s t2, v11
+; RV64-NEXT: vslidedown.vi v11, v8, 6
+; RV64-NEXT: vslidedown.vi v8, v8, 7
+; RV64-NEXT: li t6, 255
+; RV64-NEXT: vmv.x.s t3, v12
+; RV64-NEXT: vslidedown.vi v12, v9, 9
+; RV64-NEXT: vmv.x.s t5, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 8
+; RV64-NEXT: sgtz s0, t4
+; RV64-NEXT: vmv.x.s t4, v11
+; RV64-NEXT: vsetvli zero, zero, e16, m2, ta, ma
+; RV64-NEXT: vmv.s.x v0, t6
+; RV64-NEXT: vsetvli zero, zero, e8, m1, ta, mu
+; RV64-NEXT: vmv.x.s t6, v9
+; RV64-NEXT: addi s0, s0, -1
+; RV64-NEXT: or t6, s0, t6
+; RV64-NEXT: vmv.x.s s0, v12
+; RV64-NEXT: vslidedown.vi v11, v9, 10
+; RV64-NEXT: sgtz a0, a0
+; RV64-NEXT: addi a0, a0, -1
+; RV64-NEXT: or a0, a0, s0
+; RV64-NEXT: vmv.x.s s0, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 11
+; RV64-NEXT: sgtz a1, a1
+; RV64-NEXT: addi a1, a1, -1
+; RV64-NEXT: or a1, a1, s0
+; RV64-NEXT: vmv.x.s s0, v11
+; RV64-NEXT: vslidedown.vi v11, v9, 12
+; RV64-NEXT: sgtz a2, a2
+; RV64-NEXT: addi a2, a2, -1
+; RV64-NEXT: or a2, a2, s0
+; RV64-NEXT: vmv.x.s s0, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 13
+; RV64-NEXT: sgtz a3, a3
+; RV64-NEXT: addi a3, a3, -1
+; RV64-NEXT: or a3, a3, s0
+; RV64-NEXT: vmv.x.s s0, v11
+; RV64-NEXT: vslidedown.vi v11, v9, 14
+; RV64-NEXT: sgtz a4, a4
+; RV64-NEXT: addi a4, a4, -1
+; RV64-NEXT: or a4, a4, s0
+; RV64-NEXT: vmv.x.s s0, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 15
+; RV64-NEXT: sgtz a5, a5
+; RV64-NEXT: addi a5, a5, -1
+; RV64-NEXT: or a5, a5, s0
+; RV64-NEXT: vmv.x.s s0, v11
+; RV64-NEXT: vslidedown.vi v11, v9, 1
+; RV64-NEXT: sgtz a6, a6
+; RV64-NEXT: addi a6, a6, -1
+; RV64-NEXT: or a6, a6, s0
+; RV64-NEXT: vmv.x.s s0, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 2
+; RV64-NEXT: sgtz a7, a7
+; RV64-NEXT: addi a7, a7, -1
+; RV64-NEXT: or a7, a7, s0
+; RV64-NEXT: vmv.x.s s0, v11
+; RV64-NEXT: vslidedown.vi v11, v9, 3
+; RV64-NEXT: sgtz t0, t0
+; RV64-NEXT: addi t0, t0, -1
+; RV64-NEXT: or t0, t0, s0
+; RV64-NEXT: vmv.x.s s0, v8
+; RV64-NEXT: vmv.v.x v8, t6
+; RV64-NEXT: vmv.x.s t6, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 4
+; RV64-NEXT: sgtz t1, t1
+; RV64-NEXT: addi t1, t1, -1
+; RV64-NEXT: or t1, t1, t6
+; RV64-NEXT: vmv.x.s t6, v11
+; RV64-NEXT: vslidedown.vi v11, v9, 5
+; RV64-NEXT: sgtz t2, t2
+; RV64-NEXT: addi t2, t2, -1
+; RV64-NEXT: or t2, t2, t6
+; RV64-NEXT: vmv.x.s t6, v10
+; RV64-NEXT: vslidedown.vi v10, v9, 6
+; RV64-NEXT: vslidedown.vi v9, v9, 7
+; RV64-NEXT: sgtz t3, t3
+; RV64-NEXT: sgtz t5, t5
+; RV64-NEXT: addi t3, t3, -1
+; RV64-NEXT: or t3, t3, t6
+; RV64-NEXT: vmv.x.s t6, v11
+; RV64-NEXT: sgtz t4, t4
+; RV64-NEXT: addi t5, t5, -1
+; RV64-NEXT: or t5, t5, t6
+; RV64-NEXT: vmv.x.s t6, v10
+; RV64-NEXT: sgtz s0, s0
+; RV64-NEXT: addi t4, t4, -1
+; RV64-NEXT: or t4, t4, t6
+; RV64-NEXT: vmv.x.s t6, v9
+; RV64-NEXT: addi s0, s0, -1
+; RV64-NEXT: or t6, s0, t6
+; RV64-NEXT: vmv.v.x v9, a1
+; RV64-NEXT: vslide1down.vx v8, v8, t0
+; RV64-NEXT: vslide1down.vx v9, v9, a0
+; RV64-NEXT: vslide1down.vx v8, v8, t1
+; RV64-NEXT: vslide1down.vx v9, v9, a2
+; RV64-NEXT: vslide1down.vx v8, v8, t2
+; RV64-NEXT: vslide1down.vx v9, v9, a3
+; RV64-NEXT: vslide1down.vx v8, v8, t3
+; RV64-NEXT: vslide1down.vx v9, v9, a4
+; RV64-NEXT: vslide1down.vx v8, v8, t5
+; RV64-NEXT: vslide1down.vx v9, v9, a5
+; RV64-NEXT: vslide1down.vx v10, v8, t4
+; RV64-NEXT: vslide1down.vx v8, v9, a6
+; RV64-NEXT: vslide1down.vx v8, v8, a7
+; RV64-NEXT: vslide1down.vx v9, v10, t6
+; RV64-NEXT: vslidedown.vi v8, v9, 8, v0.t
+; RV64-NEXT: ld s0, 8(sp) # 8-byte Folded Reload
+; RV64-NEXT: .cfi_restore s0
+; RV64-NEXT: addi sp, sp, 16
+; RV64-NEXT: .cfi_def_cfa_offset 0
+; RV64-NEXT: ret
+entry:
+ %c = call <16 x i8> @llvm.scmp(<16 x i8> zeroinitializer, <16 x i8> %a)
+ ret <16 x i8> %c
+}
+
+define <16 x i8> @scmp_i8z8(<16 x i8> %a) {
+; CHECK-LABEL: scmp_i8z8:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 1
+; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
+; CHECK-NEXT: vmin.vx v9, v8, a0
+; CHECK-NEXT: vsra.vi v8, v8, 7
+; CHECK-NEXT: vor.vv v8, v8, v9
+; CHECK-NEXT: ret
+entry:
+ %c = call <16 x i8> @llvm.scmp(<16 x i8> %a, <16 x i8> zeroinitializer)
+ ret <16 x i8> %c
+}
+
+
+define <8 x i16> @scmp_i16i16(<8 x i16> %a, <8 x i16> %b) {
+; CHECK-LABEL: scmp_i16i16:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v9, v8
+; CHECK-NEXT: vmv.v.i v10, 0
+; CHECK-NEXT: vmerge.vim v10, v10, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v9
+; CHECK-NEXT: vmerge.vim v8, v10, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <8 x i16> @llvm.scmp(<8 x i16> %a, <8 x i16> %b)
+ ret <8 x i16> %c
+}
+
+define <8 x i16> @scmp_z16i16(<8 x i16> %a) {
+; CHECK-LABEL: scmp_z16i16:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, mu
+; CHECK-NEXT: vslidedown.vi v10, v8, 5
+; CHECK-NEXT: vsrl.vi v9, v8, 15
+; CHECK-NEXT: vslidedown.vi v11, v8, 4
+; CHECK-NEXT: vslidedown.vi v12, v8, 6
+; CHECK-NEXT: vmv.x.s a0, v10
+; CHECK-NEXT: vslidedown.vi v10, v8, 7
+; CHECK-NEXT: vmv.x.s a1, v11
+; CHECK-NEXT: vslidedown.vi v11, v8, 1
+; CHECK-NEXT: vmv.x.s a2, v8
+; CHECK-NEXT: vmv.x.s a3, v12
+; CHECK-NEXT: vslidedown.vi v12, v8, 2
+; CHECK-NEXT: vslidedown.vi v8, v8, 3
+; CHECK-NEXT: vmv.x.s a4, v10
+; CHECK-NEXT: vslidedown.vi v10, v9, 5
+; CHECK-NEXT: vmv.x.s a5, v11
+; CHECK-NEXT: vslidedown.vi v11, v9, 4
+; CHECK-NEXT: vmv.x.s a6, v12
+; CHECK-NEXT: vslidedown.vi v12, v9, 6
+; CHECK-NEXT: sgtz a2, a2
+; CHECK-NEXT: vmv.x.s a7, v9
+; CHECK-NEXT: addi a2, a2, -1
+; CHECK-NEXT: or a2, a2, a7
+; CHECK-NEXT: vmv.x.s a7, v10
+; CHECK-NEXT: vslidedown.vi v10, v9, 7
+; CHECK-NEXT: sgtz a0, a0
+; CHECK-NEXT: addi a0, a0, -1
+; CHECK-NEXT: or a0, a0, a7
+; CHECK-NEXT: vmv.x.s a7, v11
+; CHECK-NEXT: vslidedown.vi v11, v9, 1
+; CHECK-NEXT: sgtz a1, a1
+; CHECK-NEXT: addi a1, a1, -1
+; CHECK-NEXT: or a1, a1, a7
+; CHECK-NEXT: vmv.x.s a7, v12
+; CHECK-NEXT: vslidedown.vi v12, v9, 2
+; CHECK-NEXT: sgtz a3, a3
+; CHECK-NEXT: sgtz a4, a4
+; CHECK-NEXT: addi a3, a3, -1
+; CHECK-NEXT: or a3, a3, a7
+; CHECK-NEXT: vmv.x.s a7, v10
+; CHECK-NEXT: sgtz a5, a5
+; CHECK-NEXT: addi a4, a4, -1
+; CHECK-NEXT: or a4, a4, a7
+; CHECK-NEXT: vmv.x.s a7, v11
+; CHECK-NEXT: addi a5, a5, -1
+; CHECK-NEXT: or a5, a5, a7
+; CHECK-NEXT: vmv.x.s a7, v8
+; CHECK-NEXT: vslidedown.vi v8, v9, 3
+; CHECK-NEXT: sgtz a6, a6
+; CHECK-NEXT: vmv.v.x v9, a2
+; CHECK-NEXT: vmv.x.s a2, v12
+; CHECK-NEXT: sgtz a7, a7
+; CHECK-NEXT: addi a6, a6, -1
+; CHECK-NEXT: or a2, a6, a2
+; CHECK-NEXT: vmv.x.s a6, v8
+; CHECK-NEXT: addi a7, a7, -1
+; CHECK-NEXT: or a6, a7, a6
+; CHECK-NEXT: vmv.v.x v8, a1
+; CHECK-NEXT: vslide1down.vx v9, v9, a5
+; CHECK-NEXT: vslide1down.vx v8, v8, a0
+; CHECK-NEXT: vslide1down.vx v9, v9, a2
+; CHECK-NEXT: vmv.v.i v0, 15
+; CHECK-NEXT: vslide1down.vx v8, v8, a3
+; CHECK-NEXT: vslide1down.vx v8, v8, a4
+; CHECK-NEXT: vslide1down.vx v9, v9, a6
+; CHECK-NEXT: vslidedown.vi v8, v9, 4, v0.t
+; CHECK-NEXT: ret
+entry:
+ %c = call <8 x i16> @llvm.scmp(<8 x i16> zeroinitializer, <8 x i16> %a)
+ ret <8 x i16> %c
+}
+
+define <8 x i16> @scmp_i16z16(<8 x i16> %a) {
+; CHECK-LABEL: scmp_i16z16:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 1
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vmin.vx v9, v8, a0
+; CHECK-NEXT: vsra.vi v8, v8, 15
+; CHECK-NEXT: vor.vv v8, v8, v9
+; CHECK-NEXT: ret
+entry:
+ %c = call <8 x i16> @llvm.scmp(<8 x i16> %a, <8 x i16> zeroinitializer)
+ ret <8 x i16> %c
+}
+
+
+define <4 x i32> @scmp_i32i32(<4 x i32> %a, <4 x i32> %b) {
+; CHECK-LABEL: scmp_i32i32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v9, v8
+; CHECK-NEXT: vmv.v.i v10, 0
+; CHECK-NEXT: vmerge.vim v10, v10, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v9
+; CHECK-NEXT: vmerge.vim v8, v10, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <4 x i32> @llvm.scmp(<4 x i32> %a, <4 x i32> %b)
+ ret <4 x i32> %c
+}
+
+define <4 x i32> @scmp_z32i32(<4 x i32> %a) {
+; CHECK-LABEL: scmp_z32i32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vslidedown.vi v9, v8, 1
+; CHECK-NEXT: vsrl.vi v10, v8, 31
+; CHECK-NEXT: vmv.x.s a0, v8
+; CHECK-NEXT: vslidedown.vi v11, v8, 2
+; CHECK-NEXT: vslidedown.vi v8, v8, 3
+; CHECK-NEXT: vmv.x.s a1, v9
+; CHECK-NEXT: vslidedown.vi v9, v10, 1
+; CHECK-NEXT: sgtz a0, a0
+; CHECK-NEXT: vmv.x.s a2, v10
+; CHECK-NEXT: sgtz a1, a1
+; CHECK-NEXT: addi a0, a0, -1
+; CHECK-NEXT: or a0, a0, a2
+; CHECK-NEXT: vmv.x.s a2, v9
+; CHECK-NEXT: addi a1, a1, -1
+; CHECK-NEXT: or a1, a1, a2
+; CHECK-NEXT: vmv.x.s a2, v11
+; CHECK-NEXT: vslidedown.vi v9, v10, 2
+; CHECK-NEXT: sgtz a2, a2
+; CHECK-NEXT: vmv.v.x v11, a0
+; CHECK-NEXT: vmv.x.s a0, v9
+; CHECK-NEXT: addi a2, a2, -1
+; CHECK-NEXT: or a0, a2, a0
+; CHECK-NEXT: vmv.x.s a2, v8
+; CHECK-NEXT: vslidedown.vi v8, v10, 3
+; CHECK-NEXT: sgtz a2, a2
+; CHECK-NEXT: vslide1down.vx v9, v11, a1
+; CHECK-NEXT: vmv.x.s a1, v8
+; CHECK-NEXT: addi a2, a2, -1
+; CHECK-NEXT: vslide1down.vx v8, v9, a0
+; CHECK-NEXT: or a1, a2, a1
+; CHECK-NEXT: vslide1down.vx v8, v8, a1
+; CHECK-NEXT: ret
+entry:
+ %c = call <4 x i32> @llvm.scmp(<4 x i32> zeroinitializer, <4 x i32> %a)
+ ret <4 x i32> %c
+}
+
+define <4 x i32> @scmp_i32z32(<4 x i32> %a) {
+; CHECK-LABEL: scmp_i32z32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 1
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vmin.vx v9, v8, a0
+; CHECK-NEXT: vsra.vi v8, v8, 31
+; CHECK-NEXT: vor.vv v8, v8, v9
+; CHECK-NEXT: ret
+entry:
+ %c = call <4 x i32> @llvm.scmp(<4 x i32> %a, <4 x i32> zeroinitializer)
+ ret <4 x i32> %c
+}
+
+
+define <2 x i64> @scmp_i64i64(<2 x i64> %a, <2 x i64> %b) {
+; CHECK-LABEL: scmp_i64i64:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v9, v8
+; CHECK-NEXT: vmv.v.i v10, 0
+; CHECK-NEXT: vmerge.vim v10, v10, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v9
+; CHECK-NEXT: vmerge.vim v8, v10, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <2 x i64> @llvm.scmp(<2 x i64> %a, <2 x i64> %b)
+ ret <2 x i64> %c
+}
+
+define <2 x i64> @scmp_z64i64(<2 x i64> %a) {
+; RV32-LABEL: scmp_z64i64:
+; RV32: # %bb.0: # %entry
+; RV32-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RV32-NEXT: vmsle.vi v0, v8, -1
+; RV32-NEXT: vmv.v.i v9, 0
+; RV32-NEXT: vmerge.vim v9, v9, 1, v0
+; RV32-NEXT: vmsgt.vi v0, v8, 0
+; RV32-NEXT: vmerge.vim v8, v9, -1, v0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: scmp_z64i64:
+; RV64: # %bb.0: # %entry
+; RV64-NEXT: li a0, 63
+; RV64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT: vslidedown.vi v9, v8, 1
+; RV64-NEXT: vmv.x.s a1, v8
+; RV64-NEXT: vsrl.vx v8, v8, a0
+; RV64-NEXT: vmv.x.s a0, v9
+; RV64-NEXT: sgtz a1, a1
+; RV64-NEXT: vslidedown.vi v9, v8, 1
+; RV64-NEXT: sgtz a0, a0
+; RV64-NEXT: vmv.x.s a2, v8
+; RV64-NEXT: addi a1, a1, -1
+; RV64-NEXT: or a1, a1, a2
+; RV64-NEXT: vmv.x.s a2, v9
+; RV64-NEXT: addi a0, a0, -1
+; RV64-NEXT: or a0, a0, a2
+; RV64-NEXT: vmv.v.x v8, a1
+; RV64-NEXT: vslide1down.vx v8, v8, a0
+; RV64-NEXT: ret
+entry:
+ %c = call <2 x i64> @llvm.scmp(<2 x i64> zeroinitializer, <2 x i64> %a)
+ ret <2 x i64> %c
+}
+
+define <2 x i64> @scmp_i64z64(<2 x i64> %a) {
+; RV32-LABEL: scmp_i64z64:
+; RV32: # %bb.0: # %entry
+; RV32-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RV32-NEXT: vmsgt.vi v0, v8, 0
+; RV32-NEXT: vmv.v.i v9, 0
+; RV32-NEXT: vmerge.vim v9, v9, 1, v0
+; RV32-NEXT: vmsle.vi v0, v8, -1
+; RV32-NEXT: vmerge.vim v8, v9, -1, v0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: scmp_i64z64:
+; RV64: # %bb.0: # %entry
+; RV64-NEXT: li a0, 63
+; RV64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT: vslidedown.vi v10, v8, 1
+; RV64-NEXT: vsra.vx v9, v8, a0
+; RV64-NEXT: vmv.x.s a1, v10
+; RV64-NEXT: li a0, 1
+; RV64-NEXT: bgtz a1, .LBB11_2
+; RV64-NEXT: # %bb.1: # %entry
+; RV64-NEXT: li a1, 1
+; RV64-NEXT: vmv.x.s a2, v8
+; RV64-NEXT: bgtz a2, .LBB11_3
+; RV64-NEXT: j .LBB11_4
+; RV64-NEXT: .LBB11_2:
+; RV64-NEXT: vslidedown.vi v10, v9, 1
+; RV64-NEXT: vmv.x.s a1, v10
+; RV64-NEXT: vmv.x.s a2, v8
+; RV64-NEXT: blez a2, .LBB11_4
+; RV64-NEXT: .LBB11_3:
+; RV64-NEXT: vmv.x.s a0, v9
+; RV64-NEXT: .LBB11_4: # %entry
+; RV64-NEXT: vmv.v.x v8, a0
+; RV64-NEXT: vslide1down.vx v8, v8, a1
+; RV64-NEXT: ret
+entry:
+ %c = call <2 x i64> @llvm.scmp(<2 x i64> %a, <2 x i64> zeroinitializer)
+ ret <2 x i64> %c
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/scmp.ll b/llvm/test/CodeGen/RISCV/rvv/scmp.ll
new file mode 100644
index 0000000000000..7ce424156eeeb
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/scmp.ll
@@ -0,0 +1,200 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV64
+
+define <vscale x 16 x i8> @scmp_i8i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; CHECK-LABEL: scmp_i8i8:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v10, v8
+; CHECK-NEXT: vmv.v.i v12, 0
+; CHECK-NEXT: vmerge.vim v12, v12, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v10
+; CHECK-NEXT: vmerge.vim v8, v12, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 16 x i8> @llvm.scmp(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
+ ret <vscale x 16 x i8> %c
+}
+
+define <vscale x 16 x i8> @scmp_z8i8(<vscale x 16 x i8> %a) {
+; CHECK-LABEL: scmp_z8i8:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, mu
+; CHECK-NEXT: vmsgt.vi v0, v8, 0
+; CHECK-NEXT: vmv.v.i v10, -1
+; CHECK-NEXT: vsrl.vi v10, v8, 7, v0.t
+; CHECK-NEXT: vmv.v.v v8, v10
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 16 x i8> @llvm.scmp(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> %a)
+ ret <vscale x 16 x i8> %c
+}
+
+define <vscale x 16 x i8> @scmp_i8z8(<vscale x 16 x i8> %a) {
+; CHECK-LABEL: scmp_i8z8:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 1
+; CHECK-NEXT: vsetvli a1, zero, e8, m2, ta, ma
+; CHECK-NEXT: vmin.vx v10, v8, a0
+; CHECK-NEXT: vsra.vi v8, v8, 7
+; CHECK-NEXT: vor.vv v8, v8, v10
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 16 x i8> @llvm.scmp(<vscale x 16 x i8> %a, <vscale x 16 x i8> zeroinitializer)
+ ret <vscale x 16 x i8> %c
+}
+
+
+define <vscale x 8 x i16> @scmp_i16i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; CHECK-LABEL: scmp_i16i16:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v10, v8
+; CHECK-NEXT: vmv.v.i v12, 0
+; CHECK-NEXT: vmerge.vim v12, v12, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v10
+; CHECK-NEXT: vmerge.vim v8, v12, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 8 x i16> @llvm.scmp(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
+ ret <vscale x 8 x i16> %c
+}
+
+define <vscale x 8 x i16> @scmp_z16i16(<vscale x 8 x i16> %a) {
+; CHECK-LABEL: scmp_z16i16:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, mu
+; CHECK-NEXT: vmsgt.vi v0, v8, 0
+; CHECK-NEXT: vmv.v.i v10, -1
+; CHECK-NEXT: vsrl.vi v10, v8, 15, v0.t
+; CHECK-NEXT: vmv.v.v v8, v10
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 8 x i16> @llvm.scmp(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> %a)
+ ret <vscale x 8 x i16> %c
+}
+
+define <vscale x 8 x i16> @scmp_i16z16(<vscale x 8 x i16> %a) {
+; CHECK-LABEL: scmp_i16z16:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 1
+; CHECK-NEXT: vsetvli a1, zero, e16, m2, ta, ma
+; CHECK-NEXT: vmin.vx v10, v8, a0
+; CHECK-NEXT: vsra.vi v8, v8, 15
+; CHECK-NEXT: vor.vv v8, v8, v10
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 8 x i16> @llvm.scmp(<vscale x 8 x i16> %a, <vscale x 8 x i16> zeroinitializer)
+ ret <vscale x 8 x i16> %c
+}
+
+
+define <vscale x 4 x i32> @scmp_i32i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: scmp_i32i32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v10, v8
+; CHECK-NEXT: vmv.v.i v12, 0
+; CHECK-NEXT: vmerge.vim v12, v12, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v10
+; CHECK-NEXT: vmerge.vim v8, v12, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 4 x i32> @llvm.scmp(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
+ ret <vscale x 4 x i32> %c
+}
+
+define <vscale x 4 x i32> @scmp_z32i32(<vscale x 4 x i32> %a) {
+; CHECK-LABEL: scmp_z32i32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, mu
+; CHECK-NEXT: vmsgt.vi v0, v8, 0
+; CHECK-NEXT: vmv.v.i v10, -1
+; CHECK-NEXT: vsrl.vi v10, v8, 31, v0.t
+; CHECK-NEXT: vmv.v.v v8, v10
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 4 x i32> @llvm.scmp(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> %a)
+ ret <vscale x 4 x i32> %c
+}
+
+define <vscale x 4 x i32> @scmp_i32z32(<vscale x 4 x i32> %a) {
+; CHECK-LABEL: scmp_i32z32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 1
+; CHECK-NEXT: vsetvli a1, zero, e32, m2, ta, ma
+; CHECK-NEXT: vmin.vx v10, v8, a0
+; CHECK-NEXT: vsra.vi v8, v8, 31
+; CHECK-NEXT: vor.vv v8, v8, v10
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 4 x i32> @llvm.scmp(<vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer)
+ ret <vscale x 4 x i32> %c
+}
+
+
+define <vscale x 2 x i64> @scmp_i64i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
+; CHECK-LABEL: scmp_i64i64:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetvli a0, zero, e64, m2, ta, ma
+; CHECK-NEXT: vmslt.vv v0, v10, v8
+; CHECK-NEXT: vmv.v.i v12, 0
+; CHECK-NEXT: vmerge.vim v12, v12, 1, v0
+; CHECK-NEXT: vmslt.vv v0, v8, v10
+; CHECK-NEXT: vmerge.vim v8, v12, -1, v0
+; CHECK-NEXT: ret
+entry:
+ %c = call <vscale x 2 x i64> @llvm.scmp(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
+ ret <vscale x 2 x i64> %c
+}
+
+define <vscale x 2 x i64> @scmp_z64i64(<vscale x 2 x i64> %a) {
+; RV32-LABEL: scmp_z64i64:
+; RV32: # %bb.0: # %entry
+; RV32-NEXT: vsetvli a0, zero, e64, m2, ta, ma
+; RV32-NEXT: vmsle.vi v0, v8, -1
+; RV32-NEXT: vmv.v.i v10, 0
+; RV32-NEXT: vmerge.vim v10, v10, 1, v0
+; RV32-NEXT: vmsgt.vi v0, v8, 0
+; RV32-NEXT: vmerge.vim v8, v10, -1, v0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: scmp_z64i64:
+; RV64: # %bb.0: # %entry
+; RV64-NEXT: li a0, 63
+; RV64-NEXT: vsetvli a1, zero, e64, m2, ta, mu
+; RV64-NEXT: vmsgt.vi v0, v8, 0
+; RV64-NEXT: vmv.v.i v10, -1
+; RV64-NEXT: vsrl.vx v10, v8, a0, v0.t
+; RV64-NEXT: vmv.v.v v8, v10
+; RV64-NEXT: ret
+entry:
+ %c = call <vscale x 2 x i64> @llvm.scmp(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> %a)
+ ret <vscale x 2 x i64> %c
+}
+
+define <vscale x 2 x i64> @scmp_i64z64(<vscale x 2 x i64> %a) {
+; RV32-LABEL: scmp_i64z64:
+; RV32: # %bb.0: # %entry
+; RV32-NEXT: vsetvli a0, zero, e64, m2, ta, ma
+; RV32-NEXT: vmsgt.vi v0, v8, 0
+; RV32-NEXT: vmv.v.i v10, 0
+; RV32-NEXT: vmerge.vim v10, v10, 1, v0
+; RV32-NEXT: vmsle.vi v0, v8, -1
+; RV32-NEXT: vmerge.vim v8, v10, -1, v0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: scmp_i64z64:
+; RV64: # %bb.0: # %entry
+; RV64-NEXT: li a0, 63
+; RV64-NEXT: vsetvli a1, zero, e64, m2, ta, mu
+; RV64-NEXT: vmsgt.vi v0, v8, 0
+; RV64-NEXT: vmv.v.i v10, 1
+; RV64-NEXT: vsra.vx v10, v8, a0, v0.t
+; RV64-NEXT: vmv.v.v v8, v10
+; RV64-NEXT: ret
+entry:
+ %c = call <vscale x 2 x i64> @llvm.scmp(<vscale x 2 x i64> %a, <vscale x 2 x i64> zeroinitializer)
+ ret <vscale x 2 x i64> %c
+}