[llvm] [RISCV][WIP] Fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (PR #85734)

Mon Mar 18 21:53:25 PDT 2024

https://github.com/topperc created https://github.com/llvm/llvm-project/pull/85734

This gives a 0.5% reduction in dynamic instruction count for 531.deepsjeng_r from spec2017. Using 2 sh3add matches what gcc generates.

This pattern appears when indexing arrays like `uint64_t fillUpAttacks[64][8]`. The first index needs to multiplied by 64 bytes and the second index needs to be multiplied by 8 bytes. Then both multiplied indices need to be added to the start of the array to calculate the full address. Alternatively, you can multiply the first index by 8 add it to the second index, then multiply the sum by 8 before adding the base pointer.

What we currently generate is a direct result of how GEPs are expanded in SelectionDAGBuilder.

This patch is a proof of concept I hacked together to do measurements and not necessarily how I think it should be implemented. There other variations of this pattern with different shift amounts that I did not handle and did not look for.

We do a lot of work during isel to find shXadd and slli instructions that are obscured. In the motivating example, Y in `(slli Y, 6)` is `(srli W, 58)`. What becomes `(slli (srli W, 58), 6)` is `(and (srl W, 52), -64)` when we start isel. We convert to a shift pair during selection of the `and`. Unless we repeat all of that cleverness to find all the variations of this pattern, we need to do this optimization sometime after isel.

There could be deeper versions of this pattern with more indices too.

X86 misses the opportunity to use 2 LEAs on the same code.

Posting this patch for discussion.

>From dbd3f2e1775a57e63c40afbdbc35897d5672484f Mon Sep 17 00:00:00 2001
From: Craig Topper <craig.topper at sifive.com>
Date: Mon, 18 Mar 2024 21:31:45 -0700
Subject: [PATCH] [RISCV][WIP] Fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add
 (sh3add Y, Z), X).

This gives a 0.5% reduction in dynamic instruction count for
531.deepsjeng_r from spec2017. This matches what gcc generates.

This pattern appears when indexing arrays like `uint64_t fillUpAttacks[64][8]`.
The first index needs to multipled by 64 and the second index needs
to be multiplied by 8. The both multiplied indices need to be added
to the start of the array to calculate the full address. Alternatively,
you can multiply the first index by 8 add it to the second index, then
multiply the sum by 8 before adding the base pointer.

This patch is a proof of concept and not how I think it should be
implemented.

We do a lot of work during isel to find shXadd and slli instructions
that are obscured. In the motivating example, Y in (slli Y, 6) is
(srli W, 58). The pattern is (slli (srli W, 58), 6) is written as
(and (srli W, 52), -64) when we start isel. We detect the shift pair
during selection of the and. Unless we repeat all of that cleverness,
we need to do this optimization sometime after isel.

X86 misses the opportunity to use 2 LEAs on the same code.

Posting this patch for discussion.
---
 llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp | 40 +++++++++++++++++++++
 llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h   |  1 +
 2 files changed, 41 insertions(+)

diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 1b8c1434c9f2d9..42eb4104ae538e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -155,6 +155,8 @@ void RISCVDAGToDAGISel::PostprocessISelDAG() {
     // know how to handle masked true inputs.  Once that has been moved
     // to post-ISEL, this can be deleted as well.
     MadeChange |= doPeepholeMaskedRVV(cast<MachineSDNode>(N));
+
+    MadeChange |= doPeepholeSHXADD(N);
   }
 
   CurDAG->setRoot(Dummy.getValue());
@@ -3320,6 +3322,44 @@ bool RISCVDAGToDAGISel::selectRVVSimm5(SDValue N, unsigned Width,
   return false;
 }
 
+// Fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X).
+// TODO: There is a more general form of this.
+// TODO: There also .uw forms of this.
+// TODO: Not sure a post-isel peephole makes sense.
+bool RISCVDAGToDAGISel::doPeepholeSHXADD(SDNode *N) {
+  if (N->getMachineOpcode() != RISCV::SH3ADD)
+    return false;
+
+  SDValue N1 = N->getOperand(1);
+  if (!N1.isMachineOpcode() || N1.getMachineOpcode() != RISCV::ADD ||
+      !N1.hasOneUse())
+    return false;
+
+  SDValue N10 = N1.getOperand(0);
+  SDValue N11 = N1.getOperand(1);
+
+  if (!N11.isMachineOpcode() || N11.getMachineOpcode() != RISCV::SLLI)
+    std::swap(N10, N11);
+  if (!N11.isMachineOpcode() || N11.getMachineOpcode() != RISCV::SLLI ||
+      !N11.hasOneUse())
+    return false;
+
+  if (!isa<ConstantSDNode>(N11.getOperand(1)) ||
+      cast<ConstantSDNode>(N11.getOperand(1))->getZExtValue() != 6)
+    return false;
+
+  SDValue X = N1.getOperand(0);
+  SDValue Y = N11.getOperand(0);
+  SDValue Z = N->getOperand(0);
+
+  SDNode *SH3ADD1 = CurDAG->getMachineNode(RISCV::SH3ADD, SDLoc(N), N->getValueType(0),
+                                          Y, Z);
+  SDNode *SH3ADD2 = CurDAG->getMachineNode(RISCV::SH3ADD, SDLoc(N), N->getValueType(0),
+                                           SDValue(SH3ADD1, 0), X);
+  ReplaceUses(N, SH3ADD2);
+  return true;
+}
+
 // Try to remove sext.w if the input is a W instruction or can be made into
 // a W instruction cheaply.
 bool RISCVDAGToDAGISel::doPeepholeSExtW(SDNode *N) {
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
index 92f818b0dc4891..49326f0b6c2584 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
@@ -189,6 +189,7 @@ class RISCVDAGToDAGISel : public SelectionDAGISel {
 
 private:
   bool doPeepholeSExtW(SDNode *Node);
+  bool doPeepholeSHXADD(SDNode *Node);
   bool doPeepholeMaskedRVV(MachineSDNode *Node);
   bool doPeepholeMergeVVMFold();
   bool doPeepholeNoRegPassThru();