[llvm] [RISCV] Use C.ADD when OR is not compressible due to register restriction (PR #156044)

Fri Aug 29 08:04:24 PDT 2025

https://github.com/preames created https://github.com/llvm/llvm-project/pull/156044

c.or requires that all of its operands be in the GPRC register class, but c.add does not. As a result, we can use c.add for a disjoint or to allow additional compression.
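
To illustrate why the substitution is sound: when two values share no set bits, no bit position can produce a carry, so OR and ADD give the same result. The sketch below is a standalone illustration and is not code from the patch. The helper names (`isDisjoint`, `packWithOr`, `packWithAdd`) are hypothetical, and the byte-packing shape is only loosely modeled on the shift-then-or sequences in the affected tests.

```cpp
#include <cstdint>

// Two values are "disjoint" when no bit position is set in both of them.
constexpr bool isDisjoint(std::uint64_t a, std::uint64_t b) {
  return (a & b) == 0;
}

// Pack two bytes by shifting one into place and combining with OR.
constexpr std::uint64_t packWithOr(std::uint8_t hi, std::uint8_t lo) {
  return (static_cast<std::uint64_t>(hi) << 8) | lo;
}

// Same packing, but combining with ADD. The shifted byte and the low byte
// occupy different bit ranges, so no carry can occur.
constexpr std::uint64_t packWithAdd(std::uint8_t hi, std::uint8_t lo) {
  return (static_cast<std::uint64_t>(hi) << 8) + lo;
}

static_assert(isDisjoint(0xAB00, 0x00CD), "operands share no set bits");
static_assert(packWithOr(0xAB, 0xCD) == packWithAdd(0xAB, 0xCD),
              "disjoint OR and ADD agree");

// Counterexample: with overlapping bits the two operations differ, which is
// why the rewrite is only valid when the disjoint flag is accurate.
static_assert(!isDisjoint(0x3, 0x1), "operands overlap in bit 0");
static_assert((0x3 | 0x1) != (0x3 + 0x1), "overlapping OR and ADD differ");
```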

This patch does the transform extremely late (when converting to MCInst) so that we only emit an OR as an ADD if the difference actually reduces code size.

I haven't touched the register allocator hint mechanism (yet), so this is only catching cases which naturally end up reusing one of the source registers.
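
The condition can be sketched as a standalone predicate. This is a simplified model, not LLVM API: registers are plain numbers x0..x31, GPRC is taken to be x8..x15 (s0, s1, a0-a5), and the names `isGPRC` and `shouldEmitAdd` are hypothetical. It mirrors the check in the patch: the destination must already match one of the sources, and at least one operand must fall outside GPRC (otherwise c.or already applies).

```cpp
// Hypothetical model: GPRC is the compressed register subset x8..x15.
constexpr bool isGPRC(unsigned Reg) { return Reg >= 8 && Reg <= 15; }

// Returns true when a disjoint OR should be emitted as ADD so that it can
// later be compressed to c.add. Mirrors the condition in the patch.
constexpr bool shouldEmitAdd(bool HasZca, bool IsDisjoint, unsigned Rd,
                             unsigned Rs1, unsigned Rs2) {
  return HasZca && IsDisjoint &&
         // c.add is a two-address form, so rd must reuse a source register.
         (Rd == Rs1 || Rd == Rs2) &&
         // If every operand is in GPRC, c.or already works; leave the OR.
         !(isGPRC(Rd) && isGPRC(Rs1) && isGPRC(Rs2));
}

// or a6, a6, a2  (a6 = x16 is outside GPRC): rewritten to add.
static_assert(shouldEmitAdd(true, true, 16, 16, 12), "a6 case");
// or a4, t1, a4  (t1 = x6 is outside GPRC, rd matches rs2): rewritten.
static_assert(shouldEmitAdd(true, true, 14, 6, 14), "t1 case");
// or t2, a4, a3  (rd matches neither source): left as or.
static_assert(!shouldEmitAdd(true, true, 7, 14, 13), "no reuse");
// or a0, a0, a1  (all operands in GPRC): left as or, c.or applies.
static_assert(!shouldEmitAdd(true, true, 10, 10, 11), "already compressible");
```

The four cases correspond to instructions that appear in the test diff below. The second one shows why the updated CHECK line swaps its operands: the ADD is printed in its compressed two-address form.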

This is a (likely much better) alternative to https://github.com/llvm/llvm-project/pull/155669. When I first wrote that, I hadn't realized that we already propagate the disjoint flag onto the RISCV::OR MachineInstr.

Note there is a small correctness risk with this change: if we forget to drop the disjoint flag somewhere, this could cause miscompiles, and I don't think we have another use of the flag this late in the backend.

>From a08de2dd61d4eccce12022c4f9dfb67ad1821095 Mon Sep 17 00:00:00 2001
From: Philip Reames <preames at rivosinc.com>
Date: Thu, 28 Aug 2025 17:44:52 -0700
Subject: [PATCH] [RISCV] Use C.ADD when OR is not compressible due to register
 restrictions

c.or requires that all of its operands be in the GPRC register class, but
c.add does not. As a result, we can use c.add for a disjoint or to allow
additional compression.

This patch does the transform extremely late (when converting to MCInst)
so that we only emit an OR as an ADD if the difference actually reduces
code size.

I haven't touched the register allocator hint mechanism (yet), so this
is only catching cases which naturally end up reusing one of the source
registers.

This is a (likely much better) alternative to https://github.com/llvm/llvm-project/pull/155669. When I first wrote that, I hadn't realized that we already propagate the disjoint flag onto the RISCV::OR MachineInstr.

Note there is a small correctness risk with this change: if we forget to
drop the disjoint flag somewhere, this could cause miscompiles, and I don't
think we have another use of the flag this late in the backend.
---
 llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp        | 16 +++++++++++++++-
 .../RISCV/rvv/fixed-vectors-int-buildvec.ll      | 10 +++++-----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp b/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
index 83566b1c57782..f10d2a1ccba58 100644
--- a/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
+++ b/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
@@ -1178,7 +1178,21 @@ bool RISCVAsmPrinter::lowerToMCInst(const MachineInstr *MI, MCInst &OutMI) {
   if (lowerRISCVVMachineInstrToMCInst(MI, OutMI, STI))
     return false;
 
-  OutMI.setOpcode(MI->getOpcode());
+  unsigned Opcode = MI->getOpcode();
+  // If we have a disjoint OR which isn't compressible as a c.or, we can
+  // convert it to a c.add which doesn't have the gprc register restriction.
+  if (STI->hasStdExtZca() && Opcode == RISCV::OR &&
+      MI->getFlag(MachineInstr::Disjoint)) {
+    Register Rd = MI->getOperand(0).getReg();
+    Register Rs1 = MI->getOperand(1).getReg();
+    Register Rs2 = MI->getOperand(2).getReg();
+    if ((Rd == Rs1 || Rd == Rs2) &&
+        !(RISCV::GPRCRegClass.contains(Rd) &&
+          RISCV::GPRCRegClass.contains(Rs1) &&
+          RISCV::GPRCRegClass.contains(Rs2)))
+      Opcode = RISCV::ADD;
+  }
+  OutMI.setOpcode(Opcode);
 
   for (const MachineOperand &MO : MI->operands()) {
     MCOperand MCOp;
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index d9bb007a10f71..157ab0bd30d70 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -1532,7 +1532,7 @@ define <16 x i8> @buildvec_v16i8_loads_contigous(ptr %p) {
 ; RVA22U64-NEXT:    slli a4, a4, 24
 ; RVA22U64-NEXT:    slli a5, a5, 32
 ; RVA22U64-NEXT:    slli a1, a1, 40
-; RVA22U64-NEXT:    or a6, a6, a2
+; RVA22U64-NEXT:    add a6, a6, a2
 ; RVA22U64-NEXT:    or t2, a4, a3
 ; RVA22U64-NEXT:    or t1, a1, a5
 ; RVA22U64-NEXT:    lbu a4, 8(a0)
@@ -1907,7 +1907,7 @@ define <16 x i8> @buildvec_v16i8_loads_gather(ptr %p) {
 ; RVA22U64-NEXT:    slli a4, a4, 24
 ; RVA22U64-NEXT:    slli a5, a5, 32
 ; RVA22U64-NEXT:    slli a1, a1, 40
-; RVA22U64-NEXT:    or a7, a7, a2
+; RVA22U64-NEXT:    add a7, a7, a2
 ; RVA22U64-NEXT:    or t3, a4, a3
 ; RVA22U64-NEXT:    or t2, a1, a5
 ; RVA22U64-NEXT:    lbu a4, 93(a0)
@@ -1918,13 +1918,13 @@ define <16 x i8> @buildvec_v16i8_loads_gather(ptr %p) {
 ; RVA22U64-NEXT:    slli t0, t0, 56
 ; RVA22U64-NEXT:    slli a4, a4, 8
 ; RVA22U64-NEXT:    or a3, t0, a6
-; RVA22U64-NEXT:    or a4, t1, a4
+; RVA22U64-NEXT:    add a4, a4, t1
 ; RVA22U64-NEXT:    lbu a5, 161(a0)
 ; RVA22U64-NEXT:    lbu a1, 154(a0)
 ; RVA22U64-NEXT:    lbu a0, 163(a0)
 ; RVA22U64-NEXT:    slli t4, t4, 16
 ; RVA22U64-NEXT:    slli a5, a5, 24
-; RVA22U64-NEXT:    or a5, a5, t4
+; RVA22U64-NEXT:    add a5, a5, t4
 ; RVA22U64-NEXT:    slli a2, a2, 32
 ; RVA22U64-NEXT:    slli a0, a0, 40
 ; RVA22U64-NEXT:    or a0, a0, a2
@@ -3083,7 +3083,7 @@ define <8 x i8> @buildvec_v8i8_pack(i8 %e1, i8 %e2, i8 %e3, i8 %e4, i8 %e5, i8 %
 ; RVA22U64-NEXT:    slli a2, a2, 16
 ; RVA22U64-NEXT:    slli a3, a3, 24
 ; RVA22U64-NEXT:    slli a1, a1, 8
-; RVA22U64-NEXT:    or a5, a5, t0
+; RVA22U64-NEXT:    add a5, a5, t0
 ; RVA22U64-NEXT:    or a4, a7, a4
 ; RVA22U64-NEXT:    or a2, a2, a3
 ; RVA22U64-NEXT:    or a0, a0, a1