[PATCH] D49671: [SchedModel] Propagate read advance cycles to implicit operands outside instruction descriptor

Mon Jul 23 07:58:55 PDT 2018

jonpa created this revision.
jonpa added reviewers: uweigand, MatzeB, Florob, javed.absar, hfinkel.
Herald added a subscriber: JDevlieghere.

The SchedModel allows the addition of ReadAdvances to express that certain operands of the instructions is needed at a later point than the others. On SystemZ, this amounts to the register operand of a reg/mem instruction, given that the memory operand must first be loaded.

I discovered that in ~ 5% of the cases of expected latency adjustment, this was in effect not achieved. This problem involves the extra use operands added by regalloc for the full register, in case of a subregister usage, like:

  After Coalescer:
  
  1920B     %70.subreg_l32:addr64bit = MSR %70.subreg_l32:addr64bit, %70.subreg_l32:addr64bit
  1952B     %70.subreg_l32:addr64bit = MSY %70.subreg_l32:addr64bit, %92:addr64bit, -12, $noreg :: (load 4 from %ir.scevgep18)
  
  After RA:
  
  2136B     renamable $r4l = MSR renamable $r4l, renamable $r4l, implicit killed $r4d, implicit-def $r4d
  2144B     renamable $r4l = MSY renamable $r4l, renamable $r2d, -12, $noreg, implicit killed $r4d, implicit-def $r4d :: (load 4 from %ir.scevgep18)
  
  Post-RA machine scheduler DAG:
  
  SU(20):   renamable $r4l = MSR renamable $r4l, renamable $r4l, implicit $r4d, implicit-def $r4d
  # preds left       : 3
  # succs left       : 3
  # rdefs left       : 0
  Latency            : 6
  Depth              : 2
  Height             : 31
  Predecessors:
  SU(4): Out  Latency=0
  SU(4): Data Latency=1 Reg=$r4l
  SU(4): Data Latency=1 Reg=$r4d
  Successors:
  SU(21): Out  Latency=0
  SU(21): Data Latency=2 Reg=$r4l
  SU(21): Data Latency=6 Reg=$r4d
  SU(21):   renamable $r4l = MSY renamable $r4l, renamable $r2d, -12, $noreg, implicit $r4d, implicit-def $r4d :: (load 4 from %ir.scevgep18)
  # preds left       : 3
  # succs left       : 3
  # rdefs left       : 0
  Latency            : 10
  Depth              : 8
  Height             : 25
  Predecessors:
  SU(20): Out  Latency=0
  SU(20): Data Latency=2 Reg=$r4l
  SU(20): Data Latency=6 Reg=$r4d

SU(20) has instruction latency 6, and MSY has a ReadAdvance on the first use operand of 4 cycles ($r4l). However, the $r4d operand is not covered by this, so the effective latency between the nodes is still 6!

It seems to me that this is a target independent problem. I am not really sure how to best handle this situation, but it seems that the patch I made here solves the problem on SystemZ.

I thought about a simpler version like "If a register use is not part of the instruction descriptor, set latency to 0, in case a subreg has a read advance". I did not dare to do this however, since I found some rare cases (not involving ReadAdvance:s), where it was actually the super reg that had a latency value, while its subreg had a latency of 0. I am guessing this is another situation involving super/sub regs not quite the same as the more common one seen above.

As before, I am not really aware of the true necessities of these extra register allocator operands, but I trust they are needed somehow (explanations welcome). Given this, I suspect there may be some simpler way of achieving this result?

Have not yet looked at any test impact.


https://reviews.llvm.org/D49671

Files:
  lib/CodeGen/TargetSchedule.cpp


Index: lib/CodeGen/TargetSchedule.cpp
===================================================================

--- lib/CodeGen/TargetSchedule.cpp
+++ lib/CodeGen/TargetSchedule.cpp
@@ -232,6 +232,37 @@
       return Latency;
     unsigned UseIdx = findUseIdx(UseMI, UseOperIdx);
     int Advance = STI->getReadAdvanceCycles(UseDesc, UseIdx, WriteID);
+
+    // Take extra care with implicit ops that are not part of the instruction
+    // descriptor (typically those added by RegAlloc for the full register).
+    const MCInstrDesc *UseIDesc = &UseMI->getDesc();
+    const MachineOperand &UseMO = UseMI->getOperand(UseOperIdx);
+    const TargetRegisterInfo *TRI = UseMI->getMF()->getSubtarget().getRegisterInfo();
+    if (!Advance && UseMO.isImplicit() &&
+        !UseIDesc->hasImplicitUseOfPhysReg(UseMO.getReg())) {
+      UseIdx = 0;
+      int SubAdv = INT_MAX;
+      for (unsigned MOIdx = 0; MOIdx < UseMI->getNumOperands(); MOIdx++) {
+        const MachineOperand &MO = UseMI->getOperand(MOIdx);
+        // Only consider operands part of UseDesc.
+        if (MOIdx >= UseIDesc->getNumOperands() &&
+            ((MO.isDef() && !UseIDesc->hasImplicitDefOfPhysReg(MO.getReg())) ||
+             (MO.isUse() && !UseIDesc->hasImplicitUseOfPhysReg(MO.getReg()))))
+            continue;
+
+        // Get the smallest advance for any subregister of UseMO.
+        if (MO.isReg() && MO.readsReg() && !MO.isDef()) {
+          if (TRI->isSubRegister(UseMO.getReg(), MO.getReg()))
+            SubAdv = std::min(SubAdv, STI->getReadAdvanceCycles(UseDesc, UseIdx, WriteID));
+          UseIdx++;
+        }
+        if (SubAdv == 0)
+          break;
+      }
+      if (SubAdv < INT_MAX && SubAdv != 0)
+        Advance = SubAdv;
+    }
+
     if (Advance > 0 && (unsigned)Advance > Latency) // unsigned wrap
       return 0;
     return Latency - Advance;


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D49671.156788.patch
Type: text/x-patch
Size: 1860 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180723/403f0b99/attachment.bin>