[PATCH] D23814: AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler

Tue Aug 23 14:45:36 PDT 2016

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:499
@@ -498,1 +498,3 @@
   addPass(&DeadMachineInstructionElimID);
+  if (getOptLevel() > CodeGenOpt::None) {
+    // Don't do this with no optimizations since it throws away debug info by
----------------
This isn't needed here since addMachineSSAOptimization isn't called with None anyway

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:547-552
@@ -541,13 +546,8 @@
 
   if (getOptLevel() > CodeGenOpt::None) {
-    // Don't do this with no optimizations since it throws away debug info by
-    // merging nonadjacent loads.
-
-    // This should be run after scheduling, but before register allocation. It
-    // also need extra copies to the address operand to be eliminated.
-
-    // FIXME: Move pre-RA and remove extra reg coalescer run.
-    insertPass(&MachineSchedulerID, &SILoadStoreOptimizerID);
+    // FIXME: Remove extra reg coalescer run.  This was required when
+    // we ran the SILoadStoreOptimizer here, but removing it blocks
+    // some of the optimizations in the SIShrinkInstructionsPass.
     insertPass(&MachineSchedulerID, &RegisterCoalescerID);
   }
 
----------------
The extra run should be removed now

================
Comment at: lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:78
@@ -76,1 +77,3 @@
+    unsigned EltSize,
+    const std::vector<MachineInstr*> &InstsToMove);
 
----------------
ArrayRef

================
Comment at: lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:84
@@ -81,1 +83,3 @@
+    unsigned EltSize,
+    const std::vector<MachineInstr*> &InstsToMove);
 
----------------
ArrayRef

================
Comment at: lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:107
@@ -102,6 +106,3 @@
     AU.setPreservesCFG();
-    AU.addPreserved<SlotIndexes>();
-    AU.addPreserved<LiveIntervals>();
-    AU.addPreserved<LiveVariables>();
-    AU.addRequired<LiveIntervals>();
+    AU.addRequired<AAResultsWrapperPass>();
 
----------------
Does this need to be required? If it's not required is it available in the current pass pipeline? I think there's a subtarget hook we needed to enable to use this

================
Comment at: lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:198
@@ -164,25 +197,3 @@
 
-  if (MBBI == MBB.end() || MBBI->getOpcode() != I->getOpcode())
-    return E;
-
-  // Don't merge volatiles.
-  if (MBBI->hasOrderedMemoryRef())
-    return E;
-
-  int AddrIdx = AMDGPU::getNamedOperandIdx(I->getOpcode(), AMDGPU::OpName::addr);
-  const MachineOperand &AddrReg0 = I->getOperand(AddrIdx);
-  const MachineOperand &AddrReg1 = MBBI->getOperand(AddrIdx);
-
-  // Check same base pointer. Be careful of subregisters, which can occur with
-  // vectors of pointers.
-  if (AddrReg0.getReg() == AddrReg1.getReg() &&
-      AddrReg0.getSubReg() == AddrReg1.getSubReg()) {
-    int OffsetIdx = AMDGPU::getNamedOperandIdx(I->getOpcode(),
-                                               AMDGPU::OpName::offset);
-    unsigned Offset0 = I->getOperand(OffsetIdx).getImm() & 0xffff;
-    unsigned Offset1 = MBBI->getOperand(OffsetIdx).getImm() & 0xffff;
-
-    // Check both offsets fit in the reduced range.
-    if (offsetsCanBeCombined(Offset0, Offset1, EltSize))
-      return MBBI;
-  }
+  std::vector<const MachineOperand *> DefsToMove;
+  addDefsToList(*I, DefsToMove);
----------------
Should be a SmallVector

================
Comment at: lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:226-249
@@ -189,1 +225,26 @@
+
+      // When we match I with another DS instruction we will be moving I down
+      // to the location of the matched instruction any uses of I will need to
+      // be moved down as well.
+      for (const MachineOperand *Def : DefsToMove) {
+        std::pair<bool, bool> ReadWrite =
+            MBBI->readsWritesVirtualRegister(Def->getReg());
+        if (ReadWrite.second)
+          // This means Def and MBBI both write to the same super register.
+          // FIXME: This is probably OK, but we may need some additional logic
+          // to make sure the sub-indices of the writes don't overlap.
+          return E;
+
+        // If ReadWrite.first is true, then there is a use of Def between I
+        // and the instruction that I will potentially be merged with, we
+        // will need to move this instruction after the merged instructions.
+        if (ReadWrite.first) {
+          InstsToMove.push_back(&*MBBI);
+          addDefsToList(*MBBI, DefsToMove);
+          break;
+        }
+      }
+      continue;
+    }
 
+    // Don't merge volatiles.
----------------
It is not possible to have a sub register def in SSA so I don't think you need any of this


https://reviews.llvm.org/D23814