[llvm] [RISCV][LSR] Account for temporary register for base addition (PR #92296)

Wed May 15 10:28:58 PDT 2024

https://github.com/preames created https://github.com/llvm/llvm-project/pull/92296

An LSR formula may require the addition of multiple base or scale registers; this sum reduction requires a temporary register to perform.  Since the formulas are independent, we only need one temporary, regardless of the number of unique formulas.  Each formula can reuse the same temporary.  A later CSE pass may come along and combine sub-expressions, but then the register pressure would be that pass's problem to consider.

This change fixes up the costing in a RISCV-specific way, but this is really a generic LSR problem.  I just didn't feel like fighting with LSR and dealing with all the various targets swinging slightly in hard-to-reason-about ways.  This problem is more pronounced on RISCV than any other target due to our lack of addressing modes.

This change is not hugely important on its own, but I have an upcoming change to add support for shNadd in LSR which biases us fairly strongly towards adding more "base adds".  Without this change, we see a net regression due to the increase in register pressure which is not accounted for.
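
As a rough illustration of the costing described above, here is a minimal standalone sketch.  `LSRCostSketch`, `effectiveRegs`, and `isLSRCostLessSketch` are hypothetical stand-ins written for this example, not LLVM APIs; the struct only mirrors the fields that the comparison in the patch reads.

```cpp
#include <tuple>

// Hypothetical stand-in for TargetTransformInfo::LSRCost, reduced to the
// fields used by the comparison. Not the real LLVM type.
struct LSRCostSketch {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned NumIVMuls;
  unsigned NumBaseAdds;
  unsigned ImmCost;
  unsigned SetupCost;
  unsigned ScaleCost;
};

// Charge exactly one extra register if any base adds are needed. The count
// of base adds does not matter because every formula can reuse the same
// temporary.
unsigned effectiveRegs(const LSRCostSketch &C) {
  return C.NumRegs + (C.NumBaseAdds != 0 ? 1u : 0u);
}

// Lexicographic comparison with instruction count as the first priority,
// then the adjusted register count, then the remaining cost components.
bool isLSRCostLessSketch(const LSRCostSketch &C1, const LSRCostSketch &C2) {
  unsigned C1NumRegs = effectiveRegs(C1);
  unsigned C2NumRegs = effectiveRegs(C2);
  return std::tie(C1.Insns, C1NumRegs, C1.AddRecCost, C1.NumIVMuls,
                  C1.NumBaseAdds, C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
         std::tie(C2.Insns, C2NumRegs, C2.AddRecCost, C2.NumIVMuls,
                  C2.NumBaseAdds, C2.ScaleCost, C2.ImmCost, C2.SetupCost);
}
```

With equal instruction counts, a solution using 3 registers plus base adds no longer beats one using 4 registers and no base adds: both are charged 4 registers, and the tie is then broken by the later fields.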

From fa434b1d74fed5cda8b8a429c2e96a777125f1bf Mon Sep 17 00:00:00 2001
From: Philip Reames <preames at rivosinc.com>
Date: Wed, 15 May 2024 09:21:58 -0700
Subject: [PATCH] [RISCV][LSR] Account for temporary register for base addition

An LSR formula may require the addition of multiple base or scale registers;
this sum reduction requires a temporary register to perform.  Since the
formulas are independent, we only need one temporary, regardless of the number
of unique formulas.  Each formula can reuse the same temporary.  A later CSE
pass may come along and combine sub-expressions, but then the register
pressure would be that pass's problem to consider.

This change fixes up the costing in a RISCV-specific way, but this is
really a generic LSR problem.  I just didn't feel like fighting with LSR
and dealing with all the various targets swinging slightly in
hard-to-reason-about ways.  This problem is more pronounced on RISCV than
any other target due to our lack of addressing modes.

This change is not hugely important on its own, but I have an upcoming
change to add support for shNadd in LSR which biases us fairly strongly
towards adding more "base adds".  Without this change, we see a net regression
due to the increase in register pressure which is not accounted for.
---
 llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp |  8 ++++++--
 .../RISCV/loop-strength-reduce-loop-invar.ll       | 14 ++++++--------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 4d2479fc233f5..6aa3175a1cd81 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1859,10 +1859,14 @@ unsigned RISCVTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
 bool RISCVTTIImpl::isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                                  const TargetTransformInfo::LSRCost &C2) {
   // RISC-V specific here are "instruction number 1st priority".
-  return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost,
+  // If we need to emit adds inside the loop to add up base registers, then
+  // we need at least one extra temporary register.
+  unsigned C1NumRegs = C1.NumRegs + (C1.NumBaseAdds != 0);
+  unsigned C2NumRegs = C2.NumRegs + (C2.NumBaseAdds != 0);
+  return std::tie(C1.Insns, C1NumRegs, C1.AddRecCost,
                   C1.NumIVMuls, C1.NumBaseAdds,
                   C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
-         std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost,
+         std::tie(C2.Insns, C2NumRegs, C2.AddRecCost,
                   C2.NumIVMuls, C2.NumBaseAdds,
                   C2.ScaleCost, C2.ImmCost, C2.SetupCost);
 }
diff --git a/llvm/test/CodeGen/RISCV/loop-strength-reduce-loop-invar.ll b/llvm/test/CodeGen/RISCV/loop-strength-reduce-loop-invar.ll
index 8b22046cb6243..8693283e83712 100644
--- a/llvm/test/CodeGen/RISCV/loop-strength-reduce-loop-invar.ll
+++ b/llvm/test/CodeGen/RISCV/loop-strength-reduce-loop-invar.ll
@@ -53,26 +53,24 @@ define void @test(i32 signext %row, i32 signext %N.in) nounwind {
 ; RV64:       # %bb.0: # %entry
 ; RV64-NEXT:    blez a1, .LBB0_3
 ; RV64-NEXT:  # %bb.1: # %cond_true.preheader
-; RV64-NEXT:    negw a1, a1
 ; RV64-NEXT:    slli a0, a0, 6
 ; RV64-NEXT:    lui a2, %hi(A)
 ; RV64-NEXT:    addi a2, a2, %lo(A)
 ; RV64-NEXT:    add a0, a0, a2
 ; RV64-NEXT:    addi a2, a0, 4
+; RV64-NEXT:    addiw a1, a1, 2
 ; RV64-NEXT:    li a3, 2
 ; RV64-NEXT:    li a4, 4
 ; RV64-NEXT:    li a5, 5
-; RV64-NEXT:    li a6, 2
 ; RV64-NEXT:  .LBB0_2: # %cond_true
 ; RV64-NEXT:    # =>This Inner Loop Header: Depth=1
 ; RV64-NEXT:    sw a4, 0(a2)
-; RV64-NEXT:    slli a7, a6, 2
-; RV64-NEXT:    add a7, a0, a7
-; RV64-NEXT:    sw a5, 0(a7)
-; RV64-NEXT:    addiw a6, a6, 1
-; RV64-NEXT:    addw a7, a1, a6
+; RV64-NEXT:    slli a6, a3, 2
+; RV64-NEXT:    add a6, a0, a6
+; RV64-NEXT:    sw a5, 0(a6)
+; RV64-NEXT:    addiw a3, a3, 1
 ; RV64-NEXT:    addi a2, a2, 4
-; RV64-NEXT:    bne a7, a3, .LBB0_2
+; RV64-NEXT:    bne a3, a1, .LBB0_2
 ; RV64-NEXT:  .LBB0_3: # %return
 ; RV64-NEXT:    ret
 entry: