[PATCH] D133768: [DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are used, and MUL_LOHI is available

Tue Sep 13 04:11:26 PDT 2022

jmmartinez created this revision.
jmmartinez added projects: AMDGPU, LLVM.
Herald added subscribers: kosarev, ecnelises, kerbowa, hiraditya, jvesely.
Herald added a project: All.
jmmartinez requested review of this revision.
Herald added a subscriber: llvm-commits.

Folding sra(mul) / srl(mul) into a mulh introduces an extra multiplication to compute the high half of the product
when the low half is also used; in that case it is more profitable to compute both the high and low halves with a single mul_lohi.
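
As a rough illustration of the arithmetic involved (a standalone sketch, not code from this patch), the snippet below models the two lowerings for 32-bit unsigned operands. The names `LoHi`, `umul_lohi`, `mulhu` and `mul_lo` are hypothetical helpers chosen for this example; they only mirror the semantics of the corresponding ISD nodes and are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit UMUL_LOHI result: both halves of one product.
struct LoHi {
  uint32_t lo;
  uint32_t hi;
};

// One widening multiply yields both halves at once.
constexpr LoHi umul_lohi(uint32_t a, uint32_t b) {
  return {static_cast<uint32_t>(static_cast<uint64_t>(a) * b),
          static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32)};
}

// The folded form needs two separate operations when the low half is
// also live: a MULHU-like op for the high half ...
constexpr uint32_t mulhu(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// ... and a plain MUL for the low half (wraps modulo 2^32).
constexpr uint32_t mul_lo(uint32_t a, uint32_t b) {
  return a * b;
}

// Both forms agree on the values; they differ only in the number of
// multiplications. (2^32 - 1)^2 = 0xFFFFFFFE00000001.
static_assert(umul_lohi(0xFFFFFFFFu, 0xFFFFFFFFu).hi == 0xFFFFFFFEu,
              "high half of the product");
static_assert(umul_lohi(0xFFFFFFFFu, 0xFFFFFFFFu).lo == 0x00000001u,
              "low half of the product");
static_assert(mulhu(0xFFFFFFFFu, 0xFFFFFFFFu) ==
                  umul_lohi(0xFFFFFFFFu, 0xFFFFFFFFu).hi,
              "mulhu matches the high half of mul_lohi");
```

When only the high half is consumed, the mulh fold remains the cheaper choice; the sketch only covers the case where both halves are live.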


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D133768

Files:
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/test/CodeGen/AMDGPU/mul_lohi.ll


Index: llvm/test/CodeGen/AMDGPU/mul_lohi.ll
===================================================================

--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/mul_lohi.ll
@@ -0,0 +1,16 @@
+; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefix=GCN %s
+
+define i32 @kernel(i32 %0, i32 %1, i32* %2) {
+  ; GCN-LABEL: kernel:
+  ; GCN:       ; %bb.0:
+  ; GCN-NOT:   v_mul_{{lo|hi}}
+  ; GCN:       v_mad_u64_u32
+  %4 = zext i32 %0 to i64
+  %5 = zext i32 %1 to i64
+  %6 = mul nuw i64 %5, %4
+  %7 = lshr i64 %6, 32
+  %8 = trunc i64 %7 to i32
+  store i32 %8, i32* %2, align 4
+  %9 = trunc i64 %6 to i32
+  ret i32 %9
+}
Index: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
===================================================================
--- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -9244,6 +9244,28 @@
   EVT NarrowVT = LeftOp.getOperand(0).getValueType();
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
 
+  // Return true if U may use the lower bits of its operands.
+  auto UserOfLowerBits = [NarrowVTSize](SDNode *U) {
+    if (U->getOpcode() != ISD::SRL && U->getOpcode() != ISD::SRA) {
+      return true;
+    }
+    ConstantSDNode *UShiftAmtSrc = isConstOrConstSplat(U->getOperand(1));
+    if (!UShiftAmtSrc) {
+      return true;
+    }
+    unsigned UShiftAmt = UShiftAmtSrc->getZExtValue();
+    return UShiftAmt < NarrowVTSize;
+  };
+
+  // If the lower part of the MUL is also used and MUL_LOHI is supported,
+  // do not introduce the MULH; prefer MUL_LOHI instead.
+  unsigned MulLoHiOp = IsSignExt ? ISD::SMUL_LOHI : ISD::UMUL_LOHI;
+  if (ShiftOperand->use_size() > 1 &&
+      TLI.isOperationLegalOrCustom(MulLoHiOp, NarrowVT) &&
+      llvm::any_of(ShiftOperand->uses(), UserOfLowerBits)) {
+    return SDValue();
+  }
+
   SDValue MulhRightOp;
   if (ConstantSDNode *Constant = isConstOrConstSplat(RightOp)) {
     unsigned ActiveBits = IsSignExt

