[llvm] [RISCV] Add test for copy propagation issue with VMV0. NFC (PR #75347)

Wed Dec 13 07:19:22 PST 2023

https://github.com/lukel97 created https://github.com/llvm/llvm-project/pull/75347

I am currently looking into selecting mask registers as virtual registers
instead of physical registers (i.e. copies into v0), in order to simplify
masked instructions to:

PseudoVADD_VV_M1_MASK %x, %y, %mask:vmv0

instead of the current approach of copying the mask into $v0:

$v0 = COPY %mask:vr
PseudoVADD_VV_M1_MASK %x, %y, $v0

One issue I've run into with this approach is that register allocation can fail
on vector compare instructions, due to an interaction with MachineCSE and how
we model register overlap constraints.

We currently model vector register overlap constraints with early-clobber. For
instructions like PseudoVMSEQ_VV_M2_MASK this is more restrictive than needed,
since the mask operand is allowed to be the same register as the destination.
However, LLVM has no way to make an early-clobber def conflict with only a
subset of the inputs [1], so the instruction is modeled as:

early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %pt:vr(tied-def 0), ..., %mask:vmv0, ...

The issue arises if the passthru operand is a copy of the mask operand, e.g.:

%mask:vmv0 = ...
%pt:vr = COPY %mask

MachineCSE performs trivial copy propagation and will coalesce the copy of
%mask into the passthru operand, removing the COPY:

early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %mask:vmv0(tied-def 0), ..., %mask:vmv0, ...
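To make the register-class effect of this step concrete, here is a minimal Python sketch of trivial copy propagation on a toy instruction list. The `(def, opcode, uses)` layout, `reg_class`, and `propagate_trivial_copies` are hypothetical illustrations, not MachineCSE's implementation, and the sketch only forwards single-level virtual-to-virtual copies.

```python
# Hypothetical toy model, not LLVM code: each instruction is (def, opcode, uses)
# and reg_class maps virtual registers to their register classes.
reg_class = {"%mask": "vmv0", "%pt": "vr", "%res": "vr"}
block = [
    ("%mask", "COPY", ["$v0"]),
    ("%pt", "COPY", ["%mask"]),
    # uses[0] is the passthru (tied to the def), uses[1] is the mask.
    ("%res", "PseudoVMSEQ_VV_M2_MASK", ["%pt", "%mask"]),
]


def propagate_trivial_copies(block):
    """Forward virtual-to-virtual COPYs into their users, then drop dead COPYs."""
    # Only copies whose source is a virtual register are forwarded.
    copy_src = {d: uses[0] for d, op, uses in block
                if op == "COPY" and uses[0].startswith("%")}
    rewritten = [(d, op, [copy_src.get(r, r) for r in uses])
                 for d, op, uses in block]
    still_used = {r for _, _, uses in rewritten for r in uses}
    return [(d, op, uses) for d, op, uses in rewritten
            if not (op == "COPY" and d in copy_src and d not in still_used)]


for d, op, uses in propagate_trivial_copies(block):
    print(d, "=", op, ", ".join(f"{r}:{reg_class.get(r, 'phys')}" for r in uses))
```

After propagation both operands of the compare are %mask, so the tied passthru now carries the single-register class vmv0 instead of vr.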

The two-address instruction pass then sees the tied operand and constrains the
def's register class to vmv0:

%mask:vmv0 = ...
%res:vmv0 = COPY %mask
early-clobber %res:vmv0 = PseudoVMSEQ_VV_M2_MASK %res:vmv0, ..., %mask:vmv0
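A minimal sketch of why the def's class shrinks, under the simplifying assumption that a register class is just the set of physical registers it allows: a tied def must share a register with its tied use, so it is narrowed to the intersection of the two classes. `VR`, `VMV0`, and `constrain_tied_def` are hypothetical names; this is only roughly analogous to how `MachineRegisterInfo::constrainRegClass` picks a common subclass.

```python
# Hypothetical toy model, not LLVM code: a register class is modeled as the set
# of physical registers it allows. VR is abbreviated to v0-v7 for brevity.
VR = frozenset(f"v{i}" for i in range(8))
VMV0 = frozenset({"v0"})


def constrain_tied_def(def_class, tied_use_class):
    """Return the class a tied def must be narrowed to.

    A tied def has to end up in the same register as its tied use, so it may
    only use registers that both classes allow.
    """
    common = def_class & tied_use_class
    if not common:
        raise ValueError("tied operands have no register in common")
    return common


# Before propagation the tied use is %pt (vr), so %res keeps all of VR.
print(constrain_tied_def(VR, VR) == VR)
# After propagation the tied use is %mask (vmv0), so %res can only be v0.
print(constrain_tied_def(VR, VMV0) == VMV0)
```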

Because of the early-clobber constraint, %mask and %res need to be assigned
separate registers, but vmv0 contains only one register (v0), so register
allocation fails.
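The failure can be reproduced with a brute-force toy allocator. This is a hypothetical sketch, not LLVM's register allocator: classes are lists of physical registers (VR abbreviated to v0-v7), and the only constraint modeled is the early-clobber rule that %res may not share a register with the untied mask use.

```python
from itertools import product

# Hypothetical toy model, not LLVM's register allocator. Classes are lists of
# physical registers; VR is abbreviated to v0-v7 for brevity.
VR = [f"v{i}" for i in range(8)]
VMV0 = ["v0"]


def early_clobber(assignment):
    # The early-clobber def %res is written before the uses are read, so it may
    # not share a register with the (untied) mask use %mask.
    return assignment["%res"] != assignment["%mask"]


def allocate(classes, constraints):
    """Exhaustively try every assignment; return the first legal one or None."""
    names = list(classes)
    for regs in product(*(classes[n] for n in names)):
        assignment = dict(zip(names, regs))
        if all(check(assignment) for check in constraints):
            return assignment
    return None


# After MachineCSE and the two-address pass: both %res and %mask are vmv0.
print(allocate({"%res": VMV0, "%mask": VMV0}, [early_clobber]))  # None
# If %res had stayed in VR, the same constraint is satisfiable.
print(allocate({"%res": VR, "%mask": VMV0}, [early_clobber]))  # {'%res': 'v1', '%mask': 'v0'}
```

With both operands restricted to vmv0 there is no legal assignment, whereas keeping the def in VR leaves plenty of choices.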

This doesn't occur today because we explicitly copy the mask into $v0 first, so
the passthru is never coalesced with a vmv0 virtual register in the first place.

I will post a separate PR with one possible fix: teaching MachineCSE to avoid
coalescing in this case.

[1] https://discourse.llvm.org/t/earlyclobber-but-for-a-subset-of-the-inputs/55240


From 2509f7d9fa39b7c1fca29e21c0a4dfc2942679d6 Mon Sep 17 00:00:00 2001
From: Luke Lau <luke at igalia.com>
Date: Wed, 13 Dec 2023 23:29:40 +0900
Subject: [PATCH] [RISCV] Add test for copy propagation issue with VMV0. NFC

---
 .../RISCV/rvv/machine-cse-early-clobber.mir   | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 llvm/test/CodeGen/RISCV/rvv/machine-cse-early-clobber.mir

diff --git a/llvm/test/CodeGen/RISCV/rvv/machine-cse-early-clobber.mir b/llvm/test/CodeGen/RISCV/rvv/machine-cse-early-clobber.mir
new file mode 100644
index 00000000000000..7399ee902c57f4
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/machine-cse-early-clobber.mir
@@ -0,0 +1,29 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs -run-pass=machine-cse -o - %s | FileCheck %s
+
+# FIXME: MachineCSE propagates the copy of %mask into %pt. Since %pt is tied
+# to the def, the def is then constrained to vmv0. This results in an
+# unallocatable instruction: the early-clobber requires two distinct vmv0
+# registers for %mask and %res, but vmv0 only has one register (v0):
+#
+# %mask:vmv0 = COPY $v0
+# %res:vmv0 = COPY %mask
+# early-clobber %res:vmv0 = PseudoVMSEQ_VV_M2_MASK %res, $noreg, $noreg, %mask, $noreg, 3 /* e8 */, implicit $vl, implicit $vtype
+
+---
+name: early_clobber_tied_def
+body: |
+  bb.0:
+    liveins: $v0
+    ; CHECK-LABEL: name: early_clobber_tied_def
+    ; CHECK: liveins: $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %mask:vmv0 = COPY $v0
+    ; CHECK-NEXT: early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %mask, $noreg, $noreg, %mask, $noreg, 3 /* e8 */, implicit $vl, implicit $vtype
+    ; CHECK-NEXT: $v0 = COPY %res
+    ; CHECK-NEXT: PseudoRET implicit $v0
+    %mask:vmv0 = COPY $v0
+    %pt:vr = COPY %mask
+    %res:vr = PseudoVMSEQ_VV_M2_MASK %pt:vr, $noreg, $noreg, %mask:vmv0, $noreg, 3, implicit $vl, implicit $vtype
+    $v0 = COPY %res:vr
+    PseudoRET implicit $v0