[PATCH] D49671: [SchedModel] Propagate read advance cycles to implicit operands outside instruction descriptor
Jonas Paulsson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 23 07:58:55 PDT 2018
jonpa created this revision.
jonpa added reviewers: uweigand, MatzeB, Florob, javed.absar, hfinkel.
Herald added a subscriber: JDevlieghere.
The SchedModel allows the addition of ReadAdvances to express that certain operands of the instructions is needed at a later point than the others. On SystemZ, this amounts to the register operand of a reg/mem instruction, given that the memory operand must first be loaded.
I discovered that in ~ 5% of the cases of expected latency adjustment, this was in effect not achieved. This problem involves the extra use operands added by regalloc for the full register, in case of a subregister usage, like:
After Coalescer:
1920B %70.subreg_l32:addr64bit = MSR %70.subreg_l32:addr64bit, %70.subreg_l32:addr64bit
1952B %70.subreg_l32:addr64bit = MSY %70.subreg_l32:addr64bit, %92:addr64bit, -12, $noreg :: (load 4 from %ir.scevgep18)
After RA:
2136B renamable $r4l = MSR renamable $r4l, renamable $r4l, implicit killed $r4d, implicit-def $r4d
2144B renamable $r4l = MSY renamable $r4l, renamable $r2d, -12, $noreg, implicit killed $r4d, implicit-def $r4d :: (load 4 from %ir.scevgep18)
Post-RA machine scheduler DAG:
SU(20): renamable $r4l = MSR renamable $r4l, renamable $r4l, implicit $r4d, implicit-def $r4d
# preds left : 3
# succs left : 3
# rdefs left : 0
Latency : 6
Depth : 2
Height : 31
Predecessors:
SU(4): Out Latency=0
SU(4): Data Latency=1 Reg=$r4l
SU(4): Data Latency=1 Reg=$r4d
Successors:
SU(21): Out Latency=0
SU(21): Data Latency=2 Reg=$r4l
SU(21): Data Latency=6 Reg=$r4d
SU(21): renamable $r4l = MSY renamable $r4l, renamable $r2d, -12, $noreg, implicit $r4d, implicit-def $r4d :: (load 4 from %ir.scevgep18)
# preds left : 3
# succs left : 3
# rdefs left : 0
Latency : 10
Depth : 8
Height : 25
Predecessors:
SU(20): Out Latency=0
SU(20): Data Latency=2 Reg=$r4l
SU(20): Data Latency=6 Reg=$r4d
SU(20) has instruction latency 6, and MSY has a ReadAdvance on the first use operand of 4 cycles ($r4l). However, the $r4d operand is not covered by this, so the effective latency between the nodes is still 6!
It seems to me that this is a target independent problem. I am not really sure how to best handle this situation, but it seems that the patch I made here solves the problem on SystemZ.
I thought about a simpler version like "If a register use is not part of the instruction descriptor, set latency to 0, in case a subreg has a read advance". I did not dare to do this however, since I found some rare cases (not involving ReadAdvance:s), where it was actually the super reg that had a latency value, while its subreg had a latency of 0. I am guessing this is another situation involving super/sub regs not quite the same as the more common one seen above.
As before, I am not really aware of the true necessities of these extra register allocator operands, but I trust they are needed somehow (explanations welcome). Given this, I suspect there may be some simpler way of achieving this result?
Have not yet looked at any test impact.
https://reviews.llvm.org/D49671
Files:
lib/CodeGen/TargetSchedule.cpp
Index: lib/CodeGen/TargetSchedule.cpp
===================================================================
--- lib/CodeGen/TargetSchedule.cpp
+++ lib/CodeGen/TargetSchedule.cpp
@@ -232,6 +232,37 @@
return Latency;
unsigned UseIdx = findUseIdx(UseMI, UseOperIdx);
int Advance = STI->getReadAdvanceCycles(UseDesc, UseIdx, WriteID);
+
+ // Take extra care with implicit ops that are not part of the instruction
+ // descriptor (typically those added by RegAlloc for the full register).
+ const MCInstrDesc *UseIDesc = &UseMI->getDesc();
+ const MachineOperand &UseMO = UseMI->getOperand(UseOperIdx);
+ const TargetRegisterInfo *TRI = UseMI->getMF()->getSubtarget().getRegisterInfo();
+ if (!Advance && UseMO.isImplicit() &&
+ !UseIDesc->hasImplicitUseOfPhysReg(UseMO.getReg())) {
+ UseIdx = 0;
+ int SubAdv = INT_MAX;
+ for (unsigned MOIdx = 0; MOIdx < UseMI->getNumOperands(); MOIdx++) {
+ const MachineOperand &MO = UseMI->getOperand(MOIdx);
+ // Only consider operands part of UseDesc.
+ if (MOIdx >= UseIDesc->getNumOperands() &&
+ ((MO.isDef() && !UseIDesc->hasImplicitDefOfPhysReg(MO.getReg())) ||
+ (MO.isUse() && !UseIDesc->hasImplicitUseOfPhysReg(MO.getReg()))))
+ continue;
+
+ // Get the smallest advance for any subregister of UseMO.
+ if (MO.isReg() && MO.readsReg() && !MO.isDef()) {
+ if (TRI->isSubRegister(UseMO.getReg(), MO.getReg()))
+ SubAdv = std::min(SubAdv, STI->getReadAdvanceCycles(UseDesc, UseIdx, WriteID));
+ UseIdx++;
+ }
+ if (SubAdv == 0)
+ break;
+ }
+ if (SubAdv < INT_MAX && SubAdv != 0)
+ Advance = SubAdv;
+ }
+
if (Advance > 0 && (unsigned)Advance > Latency) // unsigned wrap
return 0;
return Latency - Advance;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D49671.156788.patch
Type: text/x-patch
Size: 1860 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180723/403f0b99/attachment.bin>
More information about the llvm-commits
mailing list