[llvm] [SystemZ] Enable MachineCombiner for FP reassociation (PR #83546)

via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 1 01:27:42 PST 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-systemz

Author: Jonas Paulsson (JonPsson1)

<details>
<summary>Changes</summary>

This patch is still slightly unfinished/experimental. It has the option of using either PPC or alternative new patterns. Some feedback would be nice at this point (see below).

Main points:

- Enable MachineCombiner
- Disable reg/mem folding during isel as MachineCombiner can only work with reg/reg instructions.
- Implement optimizeLoadInstr() so that PeepholeOptimizer can fold loads into reg/mem instructions after
  MachineCombiner.
- Run a new simple pass in SystemZ backend that converts any remaining reg/reg pseudos. It also does some
  limited reg/mem folding across MBB boundaries which gives a nice improvement on imagick (13%!).

I have experimented with figuring out which pattern combinations would give the best improvement, measured by the decrease of maximum depth of any instructions in the MBB as reported by MachineTraceMetrics. Just the common-code binary patterns gives a total static improvement of -10294. Adding different fma patterns gave some further improvements:

```
Only common code binary patterns:          -10294
Additional decrease, relative to -10294:               With reassoc-additions: static (relative to -10294) / performance
PowerPC patterns (fma2add, fma3-ch):        -3211
fma1add, fma2, addfma                    :  -6269      -6812 / possibly better
fma1add, fma2                            :  -5760      
All experimental patterns                :  -6237      -6865 / about same
```

Looking at these numbers, it seemed like the patterns of {fma+fma, fma+add, add+fma} was the most promising combination, and that the PPC patterns {fma+fma+fma, fma+fma+add} were only half of that.

However, even though these different settings of FMA patterns give further static improvements, there doesn't seem to be any further performance improvement on SPEC for any of them. With the experimental reassoc-additions pass (right colum) a bit more depth-improvement resulted, but not even with this is there any noticeable performance win.

A bit surprisingly, I saw instead a 13% improvement imagick *without* the MachineCombiner, just because of the global reg/mem folding I tried. Then with the MachineCombiner using only common-code patterns for binops, lbm is improved ~20%, and cactus ~2%. 

My conclusions are that the MachineCombiner is worth enabling and even though there is no noticeable performance improvement on SPEC it seems good to have some basic FMA patterns, either like -z-fma or -ppc-fma . The simpler solution is probably to reuse the 2 PPC patterns, while the z-fma ones give better maxdepth improvement which may or may not be valuable sometimes. Compile time seems to be the same.

The experiments and results per above have included a minor fix in MachineCombiner (https://github.com/llvm/llvm-project/pull/82025), which hopefully can be committed but is probably not vital.

The full experimental patch: 
[MachineComb.patch.tar.gz](https://github.com/llvm/llvm-project/files/14459050/MachineComb.patch.tar.gz)




---

Patch is 116.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83546.diff


27 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineCombinerPattern.h (+9) 
- (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+1-1) 
- (modified) llvm/lib/CodeGen/MachineCombiner.cpp (+8-3) 
- (modified) llvm/lib/Target/SystemZ/CMakeLists.txt (+1) 
- (modified) llvm/lib/Target/SystemZ/SystemZ.h (+2) 
- (added) llvm/lib/Target/SystemZ/SystemZFinalizeReassociation.cpp (+127) 
- (modified) llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp (+7) 
- (modified) llvm/lib/Target/SystemZ/SystemZInstrFP.td (+16-8) 
- (modified) llvm/lib/Target/SystemZ/SystemZInstrFormats.td (+30) 
- (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp (+554-14) 
- (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.h (+39) 
- (modified) llvm/lib/Target/SystemZ/SystemZInstrVector.td (+33-33) 
- (modified) llvm/lib/Target/SystemZ/SystemZOperators.td (+29) 
- (modified) llvm/lib/Target/SystemZ/SystemZScheduleZ13.td (+3-3) 
- (modified) llvm/lib/Target/SystemZ/SystemZScheduleZ14.td (+5-5) 
- (modified) llvm/lib/Target/SystemZ/SystemZScheduleZ15.td (+5-5) 
- (modified) llvm/lib/Target/SystemZ/SystemZScheduleZ16.td (+5-5) 
- (modified) llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp (+10) 
- (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+1-1) 
- (modified) llvm/lib/Target/X86/X86InstrInfo.h (+1-1) 
- (modified) llvm/test/CodeGen/SystemZ/fp-add-02.ll (+14) 
- (modified) llvm/test/CodeGen/SystemZ/fp-mul-02.ll (+11-1) 
- (added) llvm/test/CodeGen/SystemZ/machine-combiner-reassoc-fp-01.ll (+690) 
- (added) llvm/test/CodeGen/SystemZ/machine-combiner-reassoc-fp-03.ll (+90) 
- (added) llvm/test/CodeGen/SystemZ/machine-combiner-reassoc-fp-04.ll (+123) 
- (added) llvm/test/CodeGen/SystemZ/machine-combiner-reassoc-fp-08.ll (+115) 
- (added) llvm/test/CodeGen/SystemZ/machine-combiner-reassoc-fp-09.ll (+177) 


``````````diff
diff --git a/llvm/include/llvm/CodeGen/MachineCombinerPattern.h b/llvm/include/llvm/CodeGen/MachineCombinerPattern.h
index 89eed7463bd783..c0b3927450cb69 100644
--- a/llvm/include/llvm/CodeGen/MachineCombinerPattern.h
+++ b/llvm/include/llvm/CodeGen/MachineCombinerPattern.h
@@ -176,6 +176,15 @@ enum class MachineCombinerPattern {
   FMSUB,
   FNMSUB,
 
+  // SystemZ patterns.  (EXPERIMENTAL)
+  FMA2_P1P0,
+  FMA2_P0P1,
+  FMA2,
+  FMA1_Add_L,
+  FMA1_Add_R,
+  FMA3,     // These are inspired by PPC
+  FMA2_Add, //
+
   // X86 VNNI
   DPWSSD,
 
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index e7787aafb98e2d..dc48056c3d9bcd 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -1698,7 +1698,7 @@ class TargetInstrInfo : public MCInstrInfo {
   /// instruction that defines FoldAsLoadDefReg, and the function returns
   /// the machine instruction generated due to folding.
   virtual MachineInstr *optimizeLoadInstr(MachineInstr &MI,
-                                          const MachineRegisterInfo *MRI,
+                                          MachineRegisterInfo *MRI,
                                           Register &FoldAsLoadDefReg,
                                           MachineInstr *&DefMI) const {
     return nullptr;
diff --git a/llvm/lib/CodeGen/MachineCombiner.cpp b/llvm/lib/CodeGen/MachineCombiner.cpp
index c65937935ed820..577331441eef28 100644
--- a/llvm/lib/CodeGen/MachineCombiner.cpp
+++ b/llvm/lib/CodeGen/MachineCombiner.cpp
@@ -155,9 +155,7 @@ MachineCombiner::getOperandDef(const MachineOperand &MO) {
   // We need a virtual register definition.
   if (MO.isReg() && MO.getReg().isVirtual())
     DefInstr = MRI->getUniqueVRegDef(MO.getReg());
-  // PHI's have no depth etc.
-  if (DefInstr && DefInstr->isPHI())
-    DefInstr = nullptr;
+  // (PATCH PROPOSED for PHIs: https://github.com/llvm/llvm-project/pull/82025)
   return DefInstr;
 }
 
@@ -317,6 +315,13 @@ static CombinerObjective getCombinerObjective(MachineCombinerPattern P) {
   case MachineCombinerPattern::FMADD_XA:
   case MachineCombinerPattern::FMSUB:
   case MachineCombinerPattern::FNMSUB:
+  case MachineCombinerPattern::FMA2_P1P0:
+  case MachineCombinerPattern::FMA2_P0P1:
+  case MachineCombinerPattern::FMA2:
+  case MachineCombinerPattern::FMA1_Add_L:
+  case MachineCombinerPattern::FMA1_Add_R:
+  case MachineCombinerPattern::FMA3:
+  case MachineCombinerPattern::FMA2_Add:
     return CombinerObjective::MustReduceDepth;
   case MachineCombinerPattern::REASSOC_XY_BCA:
   case MachineCombinerPattern::REASSOC_XY_BAC:
diff --git a/llvm/lib/Target/SystemZ/CMakeLists.txt b/llvm/lib/Target/SystemZ/CMakeLists.txt
index 0614e07bde8ac1..235df220729447 100644
--- a/llvm/lib/Target/SystemZ/CMakeLists.txt
+++ b/llvm/lib/Target/SystemZ/CMakeLists.txt
@@ -20,6 +20,7 @@ add_llvm_target(SystemZCodeGen
   SystemZConstantPoolValue.cpp
   SystemZCopyPhysRegs.cpp
   SystemZElimCompare.cpp
+  SystemZFinalizeReassociation.cpp
   SystemZFrameLowering.cpp
   SystemZHazardRecognizer.cpp
   SystemZISelDAGToDAG.cpp
diff --git a/llvm/lib/Target/SystemZ/SystemZ.h b/llvm/lib/Target/SystemZ/SystemZ.h
index d7aa9e4e18cbbb..49a200babfff57 100644
--- a/llvm/lib/Target/SystemZ/SystemZ.h
+++ b/llvm/lib/Target/SystemZ/SystemZ.h
@@ -195,12 +195,14 @@ FunctionPass *createSystemZShortenInstPass(SystemZTargetMachine &TM);
 FunctionPass *createSystemZLongBranchPass(SystemZTargetMachine &TM);
 FunctionPass *createSystemZLDCleanupPass(SystemZTargetMachine &TM);
 FunctionPass *createSystemZCopyPhysRegsPass(SystemZTargetMachine &TM);
+FunctionPass *createSystemZFinalizeReassociationPass(SystemZTargetMachine &TM);
 FunctionPass *createSystemZPostRewritePass(SystemZTargetMachine &TM);
 FunctionPass *createSystemZTDCPass();
 
 void initializeSystemZCopyPhysRegsPass(PassRegistry &);
 void initializeSystemZDAGToDAGISelPass(PassRegistry &);
 void initializeSystemZElimComparePass(PassRegistry &);
+void initializeSystemZFinalizeReassociationPass(PassRegistry &);
 void initializeSystemZLDCleanupPass(PassRegistry &);
 void initializeSystemZLongBranchPass(PassRegistry &);
 void initializeSystemZPostRewritePass(PassRegistry &);
diff --git a/llvm/lib/Target/SystemZ/SystemZFinalizeReassociation.cpp b/llvm/lib/Target/SystemZ/SystemZFinalizeReassociation.cpp
new file mode 100644
index 00000000000000..d441b8bbc87381
--- /dev/null
+++ b/llvm/lib/Target/SystemZ/SystemZFinalizeReassociation.cpp
@@ -0,0 +1,127 @@
+//===---- SystemZFinalizeReassociation.cpp - Finalize FP reassociation ----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass is the last step of the process of enabling reassociation with
+// the MachineCombiner. These are the steps involved:
+//
+// 1. Instruction selection: Disable reg/mem folding for any operations that
+//    are reassociable since MachineCombiner will not succeed
+//    otherwise. Instead select a reg/reg pseudo that pretends to clobber CC.
+//
+// 2. MachineCombiner: Performs reassociation with the reg/reg instructions.
+//
+// 3. PeepholeOptimizer: fold loads into reg/mem instructions after
+//    reassociation. The reg/mem opcode sets CC which is why the special
+//    reg/reg pseudo is needed.
+//
+// 4. Convert any remaining pseudos into the target opcodes that do not
+//    clobber CC (this pass).
+//
+//===----------------------------------------------------------------------===//
+
+#include "SystemZMachineFunctionInfo.h"
+#include "SystemZTargetMachine.h"
+#include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Target/TargetMachine.h"
+
+using namespace llvm;
+
+namespace {
+
+class SystemZFinalizeReassociation : public MachineFunctionPass {
+public:
+  static char ID;
+  SystemZFinalizeReassociation()
+    : MachineFunctionPass(ID), TII(nullptr), MRI(nullptr) {
+    initializeSystemZFinalizeReassociationPass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+private:
+
+  bool visitMBB(MachineBasicBlock &MBB);
+
+  const SystemZInstrInfo *TII;
+  MachineRegisterInfo *MRI;
+};
+
+char SystemZFinalizeReassociation::ID = 0;
+
+} // end anonymous namespace
+
+INITIALIZE_PASS(SystemZFinalizeReassociation, "systemz-finalize-reassoc",
+                "SystemZ Finalize Reassociation", false, false)
+
+FunctionPass *llvm::
+createSystemZFinalizeReassociationPass(SystemZTargetMachine &TM) {
+  return new SystemZFinalizeReassociation();
+}
+
+void SystemZFinalizeReassociation::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.setPreservesCFG();
+  MachineFunctionPass::getAnalysisUsage(AU);
+}
+
+bool SystemZFinalizeReassociation::visitMBB(MachineBasicBlock &MBB) {
+  bool Changed = false;
+  for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
+    unsigned PseudoOpcode = MI.getOpcode();
+    unsigned TargetOpcode =
+      PseudoOpcode == SystemZ::WFADB_CCPseudo    ? SystemZ::WFADB
+      : PseudoOpcode == SystemZ::WFASB_CCPseudo  ? SystemZ::WFASB
+      : PseudoOpcode == SystemZ::WFSDB_CCPseudo  ? SystemZ::WFSDB
+      : PseudoOpcode == SystemZ::WFSSB_CCPseudo  ? SystemZ::WFSSB
+      : PseudoOpcode == SystemZ::WFMDB_CCPseudo  ? SystemZ::WFMDB
+      : PseudoOpcode == SystemZ::WFMSB_CCPseudo  ? SystemZ::WFMSB
+      : PseudoOpcode == SystemZ::WFMADB_CCPseudo ? SystemZ::WFMADB
+      : PseudoOpcode == SystemZ::WFMASB_CCPseudo ? SystemZ::WFMASB
+      : 0;
+    if (TargetOpcode) {
+      // PeepholeOptimizer will not fold any loads across basic blocks, which
+      // however seems beneficial, so do it here:
+      bool Folded = false;
+      for (unsigned Op = 1; Op <= 2; ++Op) {
+        Register Reg = MI.getOperand(Op).getReg();
+        if (MachineInstr *DefMI = MRI->getVRegDef(Reg))
+          if (TII->optimizeLoadInstr(MI, MRI, Reg, DefMI)) {
+            MI.eraseFromParent();
+            DefMI->eraseFromParent();
+            MRI->markUsesInDebugValueAsUndef(Reg);
+            Folded = true;
+            break;
+          }
+      }
+
+      if (!Folded) {
+        MI.setDesc(TII->get(TargetOpcode));
+        int CCIdx = MI.findRegisterDefOperandIdx(SystemZ::CC);
+        MI.removeOperand(CCIdx);
+      }
+      Changed = true;
+    }
+  }
+  return Changed;
+}
+
+bool SystemZFinalizeReassociation::runOnMachineFunction(MachineFunction &F) {
+  TII = F.getSubtarget<SystemZSubtarget>().getInstrInfo();
+  MRI = &F.getRegInfo();
+
+  bool Modified = false;
+  for (auto &MBB : F)
+    Modified |= visitMBB(MBB);
+
+  return Modified;
+}
diff --git a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
index 815eca1240d827..652980d3c2c0df 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
@@ -347,6 +347,13 @@ class SystemZDAGToDAGISel : public SelectionDAGISel {
   // Try to expand a boolean SELECT_CCMASK using an IPM sequence.
   SDValue expandSelectBoolean(SDNode *Node);
 
+  // Return true if the flags of N and the subtarget allows for reassociation.
+  bool isReassociable(SDNode *N) const {
+    return N->getFlags().hasAllowReassociation() &&
+           N->getFlags().hasNoSignedZeros() &&
+           Subtarget->hasVector();
+  }
+
 public:
   static char ID;
 
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrFP.td b/llvm/lib/Target/SystemZ/SystemZInstrFP.td
index 6e67425c1e788b..5b383b3fa6ae88 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrFP.td
+++ b/llvm/lib/Target/SystemZ/SystemZInstrFP.td
@@ -430,8 +430,10 @@ let Uses = [FPC], mayRaiseFPException = 1,
     def ADBR : BinaryRRE<"adbr", 0xB31A, any_fadd, FP64,  FP64>;
     def AXBR : BinaryRRE<"axbr", 0xB34A, any_fadd, FP128, FP128>;
   }
-  defm AEB : BinaryRXEAndPseudo<"aeb", 0xED0A, any_fadd, FP32, load, 4>;
-  defm ADB : BinaryRXEAndPseudo<"adb", 0xED1A, any_fadd, FP64, load, 8>;
+  defm AEB : BinaryRXEAndPseudo<"aeb", 0xED0A, z_any_fadd_noreassoc, FP32,
+                                load, 4>;
+  defm ADB : BinaryRXEAndPseudo<"adb", 0xED1A, z_any_fadd_noreassoc, FP64,
+                                load, 8>;
 }
 
 // Subtraction.
@@ -441,8 +443,10 @@ let Uses = [FPC], mayRaiseFPException = 1,
   def SDBR : BinaryRRE<"sdbr", 0xB31B, any_fsub, FP64,  FP64>;
   def SXBR : BinaryRRE<"sxbr", 0xB34B, any_fsub, FP128, FP128>;
 
-  defm SEB : BinaryRXEAndPseudo<"seb",  0xED0B, any_fsub, FP32, load, 4>;
-  defm SDB : BinaryRXEAndPseudo<"sdb",  0xED1B, any_fsub, FP64, load, 8>;
+  defm SEB : BinaryRXEAndPseudo<"seb",  0xED0B, z_any_fsub_noreassoc, FP32,
+                                load, 4>;
+  defm SDB : BinaryRXEAndPseudo<"sdb",  0xED1B, z_any_fsub_noreassoc, FP64,
+                                load, 8>;
 }
 
 // Multiplication.
@@ -452,8 +456,10 @@ let Uses = [FPC], mayRaiseFPException = 1 in {
     def MDBR  : BinaryRRE<"mdbr",  0xB31C, any_fmul, FP64,  FP64>;
     def MXBR  : BinaryRRE<"mxbr",  0xB34C, any_fmul, FP128, FP128>;
   }
-  defm MEEB : BinaryRXEAndPseudo<"meeb", 0xED17, any_fmul, FP32, load, 4>;
-  defm MDB  : BinaryRXEAndPseudo<"mdb",  0xED1C, any_fmul, FP64, load, 8>;
+  defm MEEB : BinaryRXEAndPseudo<"meeb", 0xED17, z_any_fmul_noreassoc, FP32,
+                                 load, 4>;
+  defm MDB  : BinaryRXEAndPseudo<"mdb",  0xED1C, z_any_fmul_noreassoc, FP64,
+                                 load, 8>;
 }
 
 // f64 multiplication of two FP32 registers.
@@ -495,8 +501,10 @@ let Uses = [FPC], mayRaiseFPException = 1 in {
   def MAEBR : TernaryRRD<"maebr", 0xB30E, z_any_fma, FP32, FP32>;
   def MADBR : TernaryRRD<"madbr", 0xB31E, z_any_fma, FP64, FP64>;
 
-  defm MAEB : TernaryRXFAndPseudo<"maeb", 0xED0E, z_any_fma, FP32, FP32, load, 4>;
-  defm MADB : TernaryRXFAndPseudo<"madb", 0xED1E, z_any_fma, FP64, FP64, load, 8>;
+  defm MAEB : TernaryRXFAndPseudo<"maeb", 0xED0E, z_any_fma_noreassoc, FP32,
+                                  FP32, load, 4>;
+  defm MADB : TernaryRXFAndPseudo<"madb", 0xED1E, z_any_fma_noreassoc, FP64,
+                                  FP64, load, 8>;
 }
 
 // Fused multiply-subtract.
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrFormats.td b/llvm/lib/Target/SystemZ/SystemZInstrFormats.td
index bb9fa0fc33ffa0..9fdda0a5d60031 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrFormats.td
+++ b/llvm/lib/Target/SystemZ/SystemZInstrFormats.td
@@ -5536,3 +5536,33 @@ multiclass StringRRE<string mnemonic, bits<16> opcode,
                       [(set GR64:$end, (operator GR64:$start1, GR64:$start2,
                                                  GR32:$char))]>;
 }
+
+multiclass BinaryVRRcAndCCPseudo<string mnemonic, bits<16> opcode,
+                                 SDPatternOperator operator,
+                                 SDPatternOperator reassoc_operator,
+                                 TypedReg tr1, TypedReg tr2, bits<4> type = 0,
+                                 bits<4> m5 = 0, bits<4> m6 = 0,
+                                 string fp_mnemonic = ""> {
+  def "" : BinaryVRRc<mnemonic, opcode, operator, tr1, tr2, type, m5, m6,
+                      fp_mnemonic>;
+  let Defs = [CC], AddedComplexity = 1 in // Win over "".
+    def _CCPseudo : Pseudo<(outs tr1.op:$V1), (ins tr2.op:$V2, tr2.op:$V3),
+                           [(set (tr1.vt tr1.op:$V1),
+                                 (reassoc_operator (tr2.vt tr2.op:$V2),
+                                                   (tr2.vt tr2.op:$V3)))]>;
+}
+
+multiclass TernaryVRReAndCCPseudo<string mnemonic, bits<16> opcode,
+                                  SDPatternOperator operator,
+                                  SDPatternOperator reassoc_operator,
+                                  TypedReg tr1, TypedReg tr2, bits<4> m5 = 0,
+                                  bits<4> type = 0, string fp_mnemonic = ""> {
+  def "" : TernaryVRRe<mnemonic, opcode, operator, tr1, tr2, m5, type,
+                       fp_mnemonic>;
+  let Defs = [CC], AddedComplexity = 1 in // Win over "".
+    def _CCPseudo : Pseudo<(outs tr1.op:$V1),
+                           (ins tr2.op:$V2, tr2.op:$V3, tr1.op:$V4),
+            [(set (tr1.vt tr1.op:$V1), (reassoc_operator (tr2.vt tr2.op:$V2),
+                                                         (tr2.vt tr2.op:$V3),
+                                                         (tr1.vt tr1.op:$V4)))]>;
+}
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
index 046a12208467b4..1d8db507e30f14 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
@@ -21,6 +21,7 @@
 #include "llvm/CodeGen/LivePhysRegs.h"
 #include "llvm/CodeGen/LiveVariables.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineCombinerPattern.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineInstr.h"
@@ -610,6 +611,112 @@ void SystemZInstrInfo::insertSelect(MachineBasicBlock &MBB,
     .addImm(CCValid).addImm(CCMask);
 }
 
+static void transferDeadCC(MachineInstr *OldMI, MachineInstr *NewMI) {
+  if (OldMI->registerDefIsDead(SystemZ::CC)) {
+    MachineOperand *CCDef = NewMI->findRegisterDefOperand(SystemZ::CC);
+    if (CCDef != nullptr)
+      CCDef->setIsDead(true);
+  }
+}
+
+void SystemZInstrInfo::transferMIFlag(MachineInstr *OldMI, MachineInstr *NewMI,
+                                      MachineInstr::MIFlag Flag) const {
+  if (OldMI->getFlag(Flag))
+    NewMI->setFlag(Flag);
+}
+
+static cl::opt<unsigned> MaxUsersGlobalFold("maxusersglobfold", cl::init(7));
+
+MachineInstr *SystemZInstrInfo::optimizeLoadInstr(MachineInstr &MI,
+                                                  MachineRegisterInfo *MRI,
+                                                  Register &FoldAsLoadDefReg,
+                                                  MachineInstr *&DefMI) const {
+  const TargetRegisterInfo *TRI = MRI->getTargetRegisterInfo();
+
+  // Check whether we can move the DefMI load, and that it only has one use.
+  DefMI = MRI->getVRegDef(FoldAsLoadDefReg);
+  assert(DefMI);
+  bool SawStore = false;
+  if (!DefMI->isSafeToMove(nullptr, SawStore) ||
+      !MRI->hasOneNonDBGUse(FoldAsLoadDefReg))
+    return nullptr;
+
+  // For reassociable FP operations, any loads have been purposefully left
+  // unfolded so that MachineCombiner can do its work on reg/reg
+  // opcodes. After that, as many loads as possible are now folded.
+  unsigned LoadOpcode = 0;
+  unsigned RegMemOpcode = 0;
+  const TargetRegisterClass *FPRC = nullptr;
+  RegMemOpcode = MI.getOpcode() == SystemZ::WFADB_CCPseudo    ? SystemZ::ADB
+                 : MI.getOpcode() == SystemZ::WFSDB_CCPseudo  ? SystemZ::SDB
+                 : MI.getOpcode() == SystemZ::WFMDB_CCPseudo  ? SystemZ::MDB
+                 : MI.getOpcode() == SystemZ::WFMADB_CCPseudo ? SystemZ::MADB
+                                                              : 0;
+  if (RegMemOpcode) {
+    LoadOpcode = SystemZ::VL64;
+    FPRC = &SystemZ::FP64BitRegClass;
+  } else {
+    RegMemOpcode = MI.getOpcode() == SystemZ::WFASB_CCPseudo    ? SystemZ::AEB
+                   : MI.getOpcode() == SystemZ::WFSSB_CCPseudo  ? SystemZ::SEB
+                   : MI.getOpcode() == SystemZ::WFMSB_CCPseudo  ? SystemZ::MEEB
+                   : MI.getOpcode() == SystemZ::WFMASB_CCPseudo ? SystemZ::MAEB
+                                                                : 0;
+    if (RegMemOpcode) {
+      LoadOpcode = SystemZ::VL32;
+      FPRC = &SystemZ::FP32BitRegClass;
+    }
+  }
+  if (!RegMemOpcode || DefMI->getOpcode() != LoadOpcode)
+    return nullptr;
+
+  DebugLoc DL = MI.getDebugLoc();
+  Register DstReg = MI.getOperand(0).getReg();
+  MachineOperand LHS = MI.getOperand(1);
+  MachineOperand RHS = MI.getOperand(2);
+  bool IsTernary =
+    (RegMemOpcode == SystemZ::MADB || RegMemOpcode == SystemZ::MAEB);
+  MachineOperand *AccMO = IsTernary ? &MI.getOperand(3) : nullptr;
+  if ((RegMemOpcode == SystemZ::SDB || RegMemOpcode == SystemZ::SEB) &&
+      FoldAsLoadDefReg != RHS.getReg())
+    return nullptr;
+  if (IsTernary && FoldAsLoadDefReg == AccMO->getReg())
+    return nullptr;
+  MachineOperand &RegMO = RHS.getReg() == FoldAsLoadDefReg ? LHS : RHS;
+  // It seems that folding globally into 2-address with too many users of reg
+  // is detrimental (resulting in reg copys before each use).
+  if (!MRI->hasOneNonDBGUse(RegMO.getReg()) && !RegMO.isKill() &&
+      DefMI->getParent() != MI.getParent()) {
+    unsigned NumUsers = 0;
+    for (MachineInstr &UI : MRI->use_nodbg_instructions(RegMO.getReg())) {
+      NumUsers++; (void)UI;
+    }
+    if (NumUsers > MaxUsersGlobalFold)
+      return nullptr;
+  }
+
+  MachineOperand &Base = DefMI->getOperand(1);
+  MachineOperand &Disp = DefMI->getOperand(2);
+  MachineOperand &Indx = DefMI->getOperand(3);
+  if (Base.isReg())  // Could be a FrameIndex.
+    Base.setIsKill(false);
+  Indx.setIsKill(false);
+
+  MachineInstrBuilder MIB =
+    BuildMI(*MI.getParent(), MI, DL, get(RegMemOpcode), DstReg);
+  if (IsTernary)
+    MIB.add(*AccMO);
+  MIB.add(RegMO);
+  MIB.add(Base).add(Disp).add(Indx);
+  MIB.addMemOperand(*DefMI->memoperands_begin());
+  transferMIFlag(&MI, MIB, MachineInstr::NoFPExcept);
+  MIB->addRegisterDead(SystemZ::CC, TRI);
+  MRI->setRegClass(DstReg, FPRC);
+  if (IsTernary)
+    MRI->setRegClass(AccMO->getReg(), FPRC);
+  MRI->setRegClass(RegMO.getReg(), FPRC);
+  return MIB;
+}
+
 bool SystemZInstrInfo::foldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
                                      Register Reg,
                                      MachineRegisterInfo *MRI) const {
@@ -937,20 +1044,6 @@ static LogicOp interpretAndImmediate(unsigned Opcode) {
   }
 }
 
-static void transferDeadCC(MachineInstr *OldMI, MachineInstr *NewMI) {
-  if (OldMI->registerDefIsDead(SystemZ::CC)) {
-    MachineOperand *CCDef = NewMI->findRegisterDefOperand(SystemZ::CC);
-    ...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/83546


More information about the llvm-commits mailing list