[llvm] [RISCV][POC] Recursive search for mul expansion (PR #96327)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 21 09:44:40 PDT 2024
llvmbot wrote:
@llvm/pr-subscribers-backend-risc-v
Author: Philip Reames (preames)
(This is very much a POC - code quality is poor, and some bugs are unfixed.)
I want to gather opinions on what we should do next (if anything) for multiply strength reduction. As a strawman, this patch implements a recursive search strategy which is applied during ISEL. This is exponential in search depth, and has a branching factor of ~17 if I counted correctly.
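To make the search concrete, here is an illustrative sketch (not taken from the patch) of the path I believe the recursion would find for a multiply by 11 with zba available, using the Step encoding from the diff below; the field values are my own reading of the code:

```cpp
// Illustrative only: a plausible Step path for MulAmt = 11 with zba.
// Fields follow the Step struct in the patch: {Opcode, A, B, Shift}.
SmallVector<Step> Path = {
    {RISCVISD::SHL_ADD, /*A=*/5, /*B=*/1, /*Shift=*/1}, // 11 = (5 << 1) + 1
    {ISD::MUL, /*A=*/1, /*B=*/5},                       // 5 = 1 * 5, lowered via sh2add
};
// expandMulPath walks the path in reverse, so the emitted sequence is roughly:
//   sh2add t0, x, x    // t0 = 5*x
//   sh1add a0, t0, x   // a0 = 11*x
```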
For the range [0,199] inclusive:
On ToT, we cover all but 123 values.
At depth=3, we cover all but 43 values.
At depth=4, we cover all values in that range.
An older gcc covers all but 94.
For the range [-1000,999] inclusive:
On ToT, we miss 1853 values. (0.9s)
At depth=3, we miss 1667 values. (1.14s)
At depth=4, we miss 1130 values. (2.5s)
At depth=5, we miss 741 values. (12.25s)
An older gcc misses 1545 values. (1.3s, but for full compile, not just llc)
The time in parentheses on each line is the llc time for the given search depth.
There are things we could do to improve time here. The biggest one is likely memoizing results - in particular, remembering that a result can't be found in the budget would likely help a lot.
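For what it's worth, a minimal sketch of what that negative-result memoization could look like, wrapping the search routine from this patch; the cache and wrapper name are hypothetical, and it assumes HasShlAdd is fixed for a given compilation (otherwise it would need to be part of the key):

```cpp
// Hypothetical sketch only, not part of the patch: remember (value, budget)
// pairs for which no expansion exists so repeated subtrees are pruned.
static DenseMap<std::pair<uint64_t, unsigned>, bool> FailedSearch;

static bool findMulExpansionMemo(uint64_t MulAmt, bool HasShlAdd,
                                 SmallVector<Step> &Path) {
  if (Path.size() > MulExpansionDepth)
    return false;
  unsigned Budget = MulExpansionDepth - Path.size();
  auto Key = std::make_pair(MulAmt, Budget);
  // A prior search already proved this (value, budget) pair is infeasible.
  if (FailedSearch.lookup(Key))
    return false;
  if (findMulExpansionRecursive(MulAmt, HasShlAdd, Path))
    return true;
  FailedSearch[Key] = true;
  return false;
}
```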
Another approach might be to pre-generate lookup tables for "common" numbers. The search strategy here could be run offline and the results cached in a static data table. The size of the table of course depends on the range chosen, but for e.g. -128..127 we could probably get away with 3 bytes per entry, or 768 bytes of static storage. One slight complication is that the tables differ by ISA - at the moment we'd only need two (zba vs no-zba), but we could reasonably want to exploit other extensions in the future.
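As a rough illustration of the table idea, here is a hypothetical packed encoding (the struct names and field layout are invented, not from the patch) that hits the 3-bytes-per-entry figure for -128..127, assuming it lives in the same file as the search:

```cpp
// Hypothetical encoding only: 3 one-byte steps per entry, 256 entries,
// i.e. 768 bytes of static storage per ISA variant (zba vs no-zba).
struct PackedStep {
  uint8_t Kind : 3; // e.g. shl, add, sub, shNadd, done
  uint8_t Imm : 5;  // shift amount or small constant; meaning depends on Kind
};
struct MulTableEntry {
  PackedStep Steps[3]; // trailing unused steps use a "done" Kind
};
// One table per ISA variant, generated offline by running the search.
static const MulTableEntry MulTableZba[256] = {/* generated offline */};
static const MulTableEntry MulTableNoZba[256] = {/* generated offline */};
```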
We could of course also choose to do nothing.
Thoughts, ideas?
---
Patch is 103.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96327.diff
17 Files Affected:
- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+226-118)
- (modified) llvm/test/CodeGen/RISCV/addimm-mulimm.ll (+81-77)
- (modified) llvm/test/CodeGen/RISCV/div-by-constant.ll (+38-18)
- (modified) llvm/test/CodeGen/RISCV/mul.ll (+94-90)
- (modified) llvm/test/CodeGen/RISCV/rv32xtheadba.ll (+35-27)
- (modified) llvm/test/CodeGen/RISCV/rv32zba.ll (+46-35)
- (modified) llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll (+76-59)
- (modified) llvm/test/CodeGen/RISCV/rv64xtheadba.ll (+39-31)
- (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+196-133)
- (modified) llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll (+53-49)
- (modified) llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll (+8-6)
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+26-20)
- (modified) llvm/test/CodeGen/RISCV/rvv/mscatter-combine.ll (+4-4)
- (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll (+7-7)
- (modified) llvm/test/CodeGen/RISCV/rvv/stepvector.ll (+5-5)
- (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+15-15)
- (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+6-6)
``````````diff
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 57817832c9b42..b3ad67423b550 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -80,6 +80,12 @@ static cl::opt<bool>
RV64LegalI32("riscv-experimental-rv64-legal-i32", cl::ReallyHidden,
cl::desc("Make i32 a legal type for SelectionDAG on RV64."));
+static cl::opt<unsigned> MulExpansionDepth(
+ "riscv-mul-expansion-depth", cl::Hidden,
+ cl::desc("Maximum depth to search when expanding a mul by constant"),
+ cl::init(3));
+
+
RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
const RISCVSubtarget &STI)
: TargetLowering(TM), Subtarget(STI) {
@@ -13689,6 +13695,221 @@ static SDValue performXORCombine(SDNode *N, SelectionDAG &DAG,
return combineSelectAndUseCommutative(N, DAG, /*AllOnes*/ false, Subtarget);
}
+struct Step {
+ unsigned Opcode;
+ uint64_t A;
+ uint64_t B;
+ unsigned Shift = 0;
+};
+
+static bool findMulExpansionRecursive(uint64_t MulAmt, bool HasShlAdd,
+ SmallVector<Step> &Path) {
+ // Maximum sequence size is 3, avoid anything beyond that.
+ // In 0-199 inclusive, with max depth = X, all but:
+ // 3: 43 unique values
+ // 4: none, all covered
+
+ if (Path.size() > MulExpansionDepth)
+ return false;
+
+ // Base case
+ if (MulAmt == 1)
+ return true;
+
+ // Rework the recursion bit here...
+ SmallVector<Step> TmpPath = Path;
+ std::optional<SmallVector<Step>> BestPath;
+
+ auto recurseAndScore = [&](uint64_t MulAmt2, Step S) {
+ unsigned Len = TmpPath.size();
+ TmpPath.push_back(S);
+ if (findMulExpansionRecursive(MulAmt2, HasShlAdd, TmpPath))
+ if (!BestPath || BestPath->size() > TmpPath.size())
+ BestPath = TmpPath;
+ TmpPath.resize(Len);
+ };
+
+ // We only recurse on the first operand, so the second step must
+ // be complete without further search.
+ auto recurseAndScore2 = [&](uint64_t MulAmt2, Step S1, Step S2) {
+ unsigned Len = TmpPath.size();
+ TmpPath.push_back(S1);
+ TmpPath.push_back(S2);
+ if (findMulExpansionRecursive(MulAmt2, HasShlAdd, TmpPath))
+ if (!BestPath || BestPath->size() > TmpPath.size())
+ BestPath = TmpPath;
+ TmpPath.resize(Len);
+ };
+
+
+ // Only the base case (MulAmt is a power of two) is required, but we prefer to
+ // expand the shl last, so add that to our solution set eagerly so the cost
+ // will be equal and the inverse factoring excluded.
+ if (unsigned TZ = llvm::countr_zero(MulAmt); TZ) {
+ uint64_t MulAmt2 = MulAmt >> TZ;
+ recurseAndScore(MulAmt2, {ISD::SHL, MulAmt2, TZ});
+ }
+
+ // TODO: Should we factor out the MUL node entirely in favor of
+ // the RISCVISD::SHL_ADD one?
+ if (HasShlAdd) {
+ // {3,5,9}*W -> shNadd W, W
+ for (uint64_t Divisor : {3, 5, 9}) {
+ if (MulAmt % Divisor != 0)
+ continue;
+ uint64_t MulAmt2 = MulAmt / Divisor;
+ recurseAndScore(MulAmt2, {ISD::MUL, MulAmt2, Divisor});
+ }
+
+ // 2^(1,2,3) * W + 1 -> (shNadd W, x)
+ unsigned TZ = llvm::countr_zero(MulAmt - 1);
+ if (TZ == 1 || TZ == 2 || TZ == 3) {
+ uint64_t MulAmt2 = (MulAmt - 1) >> TZ;
+ recurseAndScore(MulAmt2, {RISCVISD::SHL_ADD, MulAmt2, 1, TZ});
+ }
+
+ // W + [2,4,8] -> shNadd x, W
+ for (uint64_t Offset : {2, 4, 8}) {
+ uint64_t MulAmt2 = MulAmt - Offset;
+ recurseAndScore(MulAmt2, {ISD::ADD, MulAmt2, Offset});
+ }
+ }
+
+ {
+ uint64_t MulAmt2 = MulAmt - 1;
+ recurseAndScore(MulAmt2, {ISD::ADD, MulAmt2, 1});
+ }
+
+ {
+ uint64_t MulAmt2 = MulAmt + 1;
+ recurseAndScore(MulAmt2, {ISD::SUB, MulAmt2, 1});
+ }
+
+
+ // Add +/- 3,5,9 cases, needs two instructions each even using
+ // shNadd
+ if (HasShlAdd) {
+ for (uint64_t Offset : {3, 5, 9}) {
+ uint64_t MulAmt2 = (MulAmt - Offset);
+ recurseAndScore2(MulAmt2,
+ {ISD::ADD, MulAmt2, Offset},
+ {ISD::MUL, 1, Offset});
+ }
+
+ for (uint64_t Offset : {3, 5, 9}) {
+ uint64_t MulAmt2 = (MulAmt + Offset);
+ recurseAndScore2(MulAmt2,
+ {ISD::SUB, MulAmt2, Offset},
+ {ISD::MUL, 1, Offset});
+ }
+ }
+
+ // Isolate the last set bit, and recurse on the remaining
+ {
+ uint64_t MulAmtLowBit = MulAmt & (-MulAmt);
+ uint64_t MulAmt2 = MulAmt - MulAmtLowBit;
+ recurseAndScore2(MulAmt2,
+ {ISD::ADD, MulAmt2, MulAmtLowBit},
+ {ISD::SHL, 1, Log2_64(MulAmtLowBit)});
+ }
+
+ // TODO: Add subtracting last zero bit..
+
+
+ if (!BestPath)
+ return false;
+
+ Path = *BestPath;
+ return true;
+}
+
+static SDValue expandMulPath(SelectionDAG &DAG, SDNode *N,
+ const bool HasShlAdd,
+ SmallVector<Step> &Path, SDValue X) {
+ assert(!Path.empty());
+ EVT VT = N->getValueType(0);
+ SDLoc DL(N);
+
+ uint64_t MulAmt = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
+ //dbgs() << "Found path for " << MulAmt << "\n";
+
+ DenseMap<uint64_t, SDValue> Expansions;
+ Expansions[1] = X;
+ for (Step &S : llvm::reverse(Path)) {
+ switch (S.Opcode) {
+ default:
+ llvm_unreachable("");
+ case ISD::SHL: {
+ //dbgs() << "Expanding " << S.A << " << " << S.B << "\n";
+ assert(Expansions.contains(S.A));
+ SDValue A = Expansions[S.A];
+ SDValue Res = DAG.getNode(ISD::SHL, DL, VT, A,
+ DAG.getConstant(S.B, DL, VT));
+ Expansions[S.A << S.B] = Res;
+ break;
+ }
+ case ISD::MUL: {
+ //dbgs() << "Expanding " << S.A << " * " << S.B << "\n";
+ assert(Expansions.contains(S.A));
+ SDValue A = Expansions[S.A];
+ assert(S.B == 3 || S.B == 5 || S.B == 9);
+ SDValue Res = DAG.getNode(RISCVISD::SHL_ADD, DL, VT, A,
+ DAG.getConstant(Log2_64(S.B - 1), DL, VT),
+ A);
+ Expansions[S.A * S.B] = Res;
+ break;
+ }
+ case ISD::ADD: {
+ //dbgs() << "Expanding " << S.A << " + " << S.B << "\n";
+ assert(Expansions.contains(S.A));
+ SDValue A = Expansions[S.A];
+ SDValue Res;
+ if (HasShlAdd && (S.B == 2 || S.B == 4 || S.B == 8))
+ // TODO: Refactor this out
+ Res = DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
+ DAG.getConstant(Log2_64(S.B), DL, VT),
+ A);
+ else {
+ assert(Expansions.contains(S.B));
+ SDValue B = Expansions[S.B];
+ Res = DAG.getNode(ISD::ADD, DL, VT, A, B);
+ }
+ assert(Res);
+ Expansions[S.A + S.B] = Res;
+ break;
+ }
+ case ISD::SUB: {
+ //dbgs() << "Expanding " << S.A << " - " << S.B << "\n";
+ assert(Expansions.contains(S.A));
+ SDValue A = Expansions[S.A];
+ assert(Expansions.contains(S.B));
+ SDValue B = Expansions[S.B];
+ SDValue Res = DAG.getNode(ISD::SUB, DL, VT, A, B);
+ Expansions[S.A - S.B] = Res;
+ break;
+ }
+ case RISCVISD::SHL_ADD: {
+ //dbgs() << "Expanding " << S.A << " << " << S.Shift << " + " << S.B << "\n";
+ assert(HasShlAdd);
+ assert(Expansions.contains(S.A));
+ assert(Expansions.contains(S.B));
+ SDValue A = Expansions[S.A];
+ SDValue B = Expansions[S.B];
+ SDValue Res = DAG.getNode(RISCVISD::SHL_ADD, DL, VT, A,
+ DAG.getConstant(S.Shift, DL, VT),
+ B);
+ Expansions[(S.A << S.Shift) + S.B] = Res;
+ break;
+ }
+ };
+ }
+
+
+ assert(Expansions.contains(MulAmt));
+ return Expansions[MulAmt];
+}
+
+
// Try to expand a scalar multiply to a faster sequence.
static SDValue expandMul(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,
@@ -13720,124 +13941,11 @@ static SDValue expandMul(SDNode *N, SelectionDAG &DAG,
// other target properly freezes X in these cases either.
SDValue X = N->getOperand(0);
- if (HasShlAdd) {
- for (uint64_t Divisor : {3, 5, 9}) {
- if (MulAmt % Divisor != 0)
- continue;
- uint64_t MulAmt2 = MulAmt / Divisor;
- // 3/5/9 * 2^N -> shl (shXadd X, X), N
- if (isPowerOf2_64(MulAmt2)) {
- SDLoc DL(N);
- SDValue X = N->getOperand(0);
- // Put the shift first if we can fold a zext into the
- // shift forming a slli.uw.
- if (X.getOpcode() == ISD::AND && isa<ConstantSDNode>(X.getOperand(1)) &&
- X.getConstantOperandVal(1) == UINT64_C(0xffffffff)) {
- SDValue Shl = DAG.getNode(ISD::SHL, DL, VT, X,
- DAG.getConstant(Log2_64(MulAmt2), DL, VT));
- return DAG.getNode(RISCVISD::SHL_ADD, DL, VT, Shl,
- DAG.getConstant(Log2_64(Divisor - 1), DL, VT),
- Shl);
- }
- // Otherwise, put rhe shl second so that it can fold with following
- // instructions (e.g. sext or add).
- SDValue Mul359 =
- DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
- DAG.getConstant(Log2_64(Divisor - 1), DL, VT), X);
- return DAG.getNode(ISD::SHL, DL, VT, Mul359,
- DAG.getConstant(Log2_64(MulAmt2), DL, VT));
- }
-
- // 3/5/9 * 3/5/9 -> shXadd (shYadd X, X), (shYadd X, X)
- if (MulAmt2 == 3 || MulAmt2 == 5 || MulAmt2 == 9) {
- SDLoc DL(N);
- SDValue Mul359 =
- DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
- DAG.getConstant(Log2_64(Divisor - 1), DL, VT), X);
- return DAG.getNode(RISCVISD::SHL_ADD, DL, VT, Mul359,
- DAG.getConstant(Log2_64(MulAmt2 - 1), DL, VT),
- Mul359);
- }
- }
-
- // If this is a power 2 + 2/4/8, we can use a shift followed by a single
- // shXadd. First check if this a sum of two power of 2s because that's
- // easy. Then count how many zeros are up to the first bit.
- if (isPowerOf2_64(MulAmt & (MulAmt - 1))) {
- unsigned ScaleShift = llvm::countr_zero(MulAmt);
- if (ScaleShift >= 1 && ScaleShift < 4) {
- unsigned ShiftAmt = Log2_64((MulAmt & (MulAmt - 1)));
- SDLoc DL(N);
- SDValue Shift1 =
- DAG.getNode(ISD::SHL, DL, VT, X, DAG.getConstant(ShiftAmt, DL, VT));
- return DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
- DAG.getConstant(ScaleShift, DL, VT), Shift1);
- }
- }
-
- // 2^(1,2,3) * 3,5,9 + 1 -> (shXadd (shYadd x, x), x)
- // This is the two instruction form, there are also three instruction
- // variants we could implement. e.g.
- // (2^(1,2,3) * 3,5,9 + 1) << C2
- // 2^(C1>3) * 3,5,9 +/- 1
- for (uint64_t Divisor : {3, 5, 9}) {
- uint64_t C = MulAmt - 1;
- if (C <= Divisor)
- continue;
- unsigned TZ = llvm::countr_zero(C);
- if ((C >> TZ) == Divisor && (TZ == 1 || TZ == 2 || TZ == 3)) {
- SDLoc DL(N);
- SDValue Mul359 =
- DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
- DAG.getConstant(Log2_64(Divisor - 1), DL, VT), X);
- return DAG.getNode(RISCVISD::SHL_ADD, DL, VT, Mul359,
- DAG.getConstant(TZ, DL, VT), X);
- }
- }
-
- // 2^n + 2/4/8 + 1 -> (add (shl X, C1), (shXadd X, X))
- if (MulAmt > 2 && isPowerOf2_64((MulAmt - 1) & (MulAmt - 2))) {
- unsigned ScaleShift = llvm::countr_zero(MulAmt - 1);
- if (ScaleShift >= 1 && ScaleShift < 4) {
- unsigned ShiftAmt = Log2_64(((MulAmt - 1) & (MulAmt - 2)));
- SDLoc DL(N);
- SDValue Shift1 =
- DAG.getNode(ISD::SHL, DL, VT, X, DAG.getConstant(ShiftAmt, DL, VT));
- return DAG.getNode(ISD::ADD, DL, VT, Shift1,
- DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
- DAG.getConstant(ScaleShift, DL, VT), X));
- }
- }
-
- // 2^N - 3/5/9 --> (sub (shl X, C1), (shXadd X, x))
- for (uint64_t Offset : {3, 5, 9}) {
- if (isPowerOf2_64(MulAmt + Offset)) {
- SDLoc DL(N);
- SDValue Shift1 =
- DAG.getNode(ISD::SHL, DL, VT, X,
- DAG.getConstant(Log2_64(MulAmt + Offset), DL, VT));
- SDValue Mul359 =
- DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
- DAG.getConstant(Log2_64(Offset - 1), DL, VT), X);
- return DAG.getNode(ISD::SUB, DL, VT, Shift1, Mul359);
- }
- }
- }
-
- // 2^N - 2^M -> (sub (shl X, C1), (shl X, C2))
- uint64_t MulAmtLowBit = MulAmt & (-MulAmt);
- if (isPowerOf2_64(MulAmt + MulAmtLowBit)) {
- uint64_t ShiftAmt1 = MulAmt + MulAmtLowBit;
- SDLoc DL(N);
- SDValue Shift1 = DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
- DAG.getConstant(Log2_64(ShiftAmt1), DL, VT));
- SDValue Shift2 =
- DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),
- DAG.getConstant(Log2_64(MulAmtLowBit), DL, VT));
- return DAG.getNode(ISD::SUB, DL, VT, Shift1, Shift2);
- }
-
- return SDValue();
+ SmallVector<Step> Path;
+ if (!findMulExpansionRecursive(MulAmt, HasShlAdd, Path))
+ return SDValue();
+ assert(!Path.empty());
+ return expandMulPath(DAG, N, HasShlAdd, Path, X);
}
// Combine vXi32 (mul (and (lshr X, 15), 0x10001), 0xffff) ->
diff --git a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
index a18526718461e..570712be40b4b 100644
--- a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
+++ b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
@@ -11,16 +11,16 @@ define i32 @add_mul_combine_accept_a1(i32 %x) {
; RV32IMB-LABEL: add_mul_combine_accept_a1:
; RV32IMB: # %bb.0:
; RV32IMB-NEXT: sh1add a1, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a0, a0, a1
+; RV32IMB-NEXT: sh1add a1, a1, a0
+; RV32IMB-NEXT: sh2add a0, a1, a0
; RV32IMB-NEXT: addi a0, a0, 1073
; RV32IMB-NEXT: ret
;
; RV64IMB-LABEL: add_mul_combine_accept_a1:
; RV64IMB: # %bb.0:
; RV64IMB-NEXT: sh1add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: subw a0, a0, a1
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh2add a0, a1, a0
; RV64IMB-NEXT: addiw a0, a0, 1073
; RV64IMB-NEXT: ret
%tmp0 = add i32 %x, 37
@@ -32,16 +32,16 @@ define signext i32 @add_mul_combine_accept_a2(i32 signext %x) {
; RV32IMB-LABEL: add_mul_combine_accept_a2:
; RV32IMB: # %bb.0:
; RV32IMB-NEXT: sh1add a1, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a0, a0, a1
+; RV32IMB-NEXT: sh1add a1, a1, a0
+; RV32IMB-NEXT: sh2add a0, a1, a0
; RV32IMB-NEXT: addi a0, a0, 1073
; RV32IMB-NEXT: ret
;
; RV64IMB-LABEL: add_mul_combine_accept_a2:
; RV64IMB: # %bb.0:
; RV64IMB-NEXT: sh1add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: subw a0, a0, a1
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh2add a0, a1, a0
; RV64IMB-NEXT: addiw a0, a0, 1073
; RV64IMB-NEXT: ret
%tmp0 = add i32 %x, 37
@@ -55,12 +55,12 @@ define i64 @add_mul_combine_accept_a3(i64 %x) {
; RV32IMB-NEXT: li a2, 29
; RV32IMB-NEXT: mulhu a2, a0, a2
; RV32IMB-NEXT: sh1add a3, a1, a1
-; RV32IMB-NEXT: slli a1, a1, 5
-; RV32IMB-NEXT: sub a1, a1, a3
+; RV32IMB-NEXT: sh1add a3, a3, a1
+; RV32IMB-NEXT: sh2add a1, a3, a1
; RV32IMB-NEXT: add a1, a2, a1
; RV32IMB-NEXT: sh1add a2, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a2, a0, a2
+; RV32IMB-NEXT: sh1add a2, a2, a0
+; RV32IMB-NEXT: sh2add a2, a2, a0
; RV32IMB-NEXT: addi a0, a2, 1073
; RV32IMB-NEXT: sltu a2, a0, a2
; RV32IMB-NEXT: add a1, a1, a2
@@ -69,8 +69,8 @@ define i64 @add_mul_combine_accept_a3(i64 %x) {
; RV64IMB-LABEL: add_mul_combine_accept_a3:
; RV64IMB: # %bb.0:
; RV64IMB-NEXT: sh1add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: sub a0, a0, a1
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh2add a0, a1, a0
; RV64IMB-NEXT: addi a0, a0, 1073
; RV64IMB-NEXT: ret
%tmp0 = add i64 %x, 37
@@ -81,9 +81,9 @@ define i64 @add_mul_combine_accept_a3(i64 %x) {
define i32 @add_mul_combine_accept_b1(i32 %x) {
; RV32IMB-LABEL: add_mul_combine_accept_b1:
; RV32IMB: # %bb.0:
-; RV32IMB-NEXT: sh3add a1, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a0, a0, a1
+; RV32IMB-NEXT: sh2add a1, a0, a0
+; RV32IMB-NEXT: sh1add a1, a1, a0
+; RV32IMB-NEXT: sh1add a0, a1, a0
; RV32IMB-NEXT: lui a1, 50
; RV32IMB-NEXT: addi a1, a1, 1119
; RV32IMB-NEXT: add a0, a0, a1
@@ -91,9 +91,9 @@ define i32 @add_mul_combine_accept_b1(i32 %x) {
;
; RV64IMB-LABEL: add_mul_combine_accept_b1:
; RV64IMB: # %bb.0:
-; RV64IMB-NEXT: sh3add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: subw a0, a0, a1
+; RV64IMB-NEXT: sh2add a1, a0, a0
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh1add a0, a1, a0
; RV64IMB-NEXT: lui a1, 50
; RV64IMB-NEXT: addi a1, a1, 1119
; RV64IMB-NEXT: addw a0, a0, a1
@@ -106,9 +106,9 @@ define i32 @add_mul_combine_accept_b1(i32 %x) {
define signext i32 @add_mul_combine_accept_b2(i32 signext %x) {
; RV32IMB-LABEL: add_mul_combine_accept_b2:
; RV32IMB: # %bb.0:
-; RV32IMB-NEXT: sh3add a1, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a0, a0, a1
+; RV32IMB-NEXT: sh2add a1, a0, a0
+; RV32IMB-NEXT: sh1add a1, a1, a0
+; RV32IMB-NEXT: sh1add a0, a1, a0
; RV32IMB-NEXT: lui a1, 50
; RV32IMB-NEXT: addi a1, a1, 1119
; RV32IMB-NEXT: add a0, a0, a1
@@ -116,9 +116,9 @@ define signext i32 @add_mul_combine_accept_b2(i32 signext %x) {
;
; RV64IMB-LABEL: add_mul_combine_accept_b2:
; RV64IMB: # %bb.0:
-; RV64IMB-NEXT: sh3add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: subw a0, a0, a1
+; RV64IMB-NEXT: sh2add a1, a0, a0
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh1add a0, a1, a0
; RV64IMB-NEXT: lui a1, 50
; RV64IMB-NEXT: addi a1, a1, 1119
; RV64IMB-NEXT: addw a0, a0, a1
@@ -133,13 +133,13 @@ define i64 @add_mul_combine_accept_b3(i64 %x) {
; RV32IMB: # %bb.0:
; RV32IMB-NEXT: li a2, 23
; RV32IMB-NEXT: mulhu a2, a0, a2
-; RV32IMB-NEXT: sh3add a3, a1, a1
-; RV32IMB-NEXT: slli a1, a1, 5
-; RV32IMB-NEXT: sub a1, a1, a3
+; RV32IMB-NEXT: sh2add a3, a1, a1
+; RV32IMB-NEXT: sh1add a3, a3, a1
+; RV32IMB-NEXT: sh1add a1, a3, a1
; RV32IMB-NEXT: add a1, a2, a1
-; RV32IMB-NEXT: sh3add a2, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a2, a0, a2
+; RV32IMB-NEXT: sh2add a2, a0, a0
+; RV32IMB-NEXT: sh1add a2, a2, a0
+; RV32IMB-NEXT: sh1add a2, a2, a0
; RV32IMB-NEXT: lui a0, 50
; RV32IMB-NEXT: addi a0, a0, 1119
; RV32IMB-NEXT: add a0, a2, a0
@@ -149,9 +149,9 @@ define i64 @add_mul_combine_accept_b3(i64 %x) {
;
; RV64IMB-LABEL: add_mul_combine_accept_b3:
; RV64IMB: # %bb.0:
-; RV64IMB-NEXT: sh3add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: sub a0, a0, a1
+; RV64IMB-NEXT: sh2add a1, a0, a0
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh1add a0, a1, a0
; RV64IMB-NEXT: lui a1, 50
; RV64IMB-NEXT: addiw a1, a1, 1119
; RV64IMB-NEXT: add a0, a0, a1
@@ -166,16 +166,17 @@ define i32 @add_mul_combine_reject_a1(i32 %x) {
; RV32IMB: # %bb.0:
; RV32IMB-NEXT: addi a0, a0, 1971
; RV32IMB-NEXT: sh1add a1, a0, a0
-; RV32IMB-NEXT: slli a0, a0, 5
-; RV32IMB-NEXT: sub a0, a0, a1
+; RV32IMB-NEXT: sh1add a1, a1, a0
+; RV32IMB-NEXT: sh2add a0, a1, a0
; RV32IMB-NEXT: ret
;
; RV64IMB-LABEL: add_mul_combine_reject_a1:
; RV64IMB: # %bb.0:
; RV64IMB-NEXT: addi a0, a0, 1971
; RV64IMB-NEXT: sh1add a1, a0, a0
-; RV64IMB-NEXT: slli a0, a0, 5
-; RV64IMB-NEXT: subw a0, a0, a1
+; RV64IMB-NEXT: sh1add a1, a1, a0
+; RV64IMB-NEXT: sh2add a0, a1, a0
+; RV64IMB-NEXT: sext.w a0, a0
; RV64IMB-NEXT: ret
%tmp0 = add i32 %x, 1971
%tmp1 = mul i32 %tmp0, 29
@@ -187,16 +188,17 @@ define signext i32 @add_mul_combine_reject_a2(i32 signext %x) {
; RV32IMB: # %bb.0:
; RV32IMB-NEXT: addi a0, a0, 1971
; RV32IMB-NEXT: sh1add a1, a0, a0
-; RV32IMB-N...
[truncated]
``````````
https://github.com/llvm/llvm-project/pull/96327