[PATCH] D60441: [X86] Make _Int instructions the preferred instructon for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX.
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 9 09:48:13 PDT 2019
andreadb added a comment.
The reason why there is a regression is because this patch adds an extra input operand to the following instructions:
cvtsi2ssl %ecx, %xmm0
cvtsi2sdl %ecx, %xmm0
For example:
cvtsi2ssl %ecx, %xmm0 # <MCInst #775 CVTSI2SSrr_Int
# <MCOperand Reg:142>
# <MCOperand Reg:142>
# <MCOperand Reg:25>>
XMM0 is now an in/out operand. Before this patch, XMM0 was only an output register.
This is equivalent to introducing a false dependency on the output register. That is what causes the extra latency from those two mca tests.
There is a way to workaround the problem introduced by the presence of that extra register read.
We can force that extra read to always have zero-latency by introducing a special "ReadAdvance" definition.
--- X86Schedule.td (revision 357997)
+++ X86Schedule.td (working copy)
@@ -22,6 +22,7 @@
// This SchedRead describes a bypass delay caused by data being moved from the
// integer unit to the floating point unit.
def ReadInt2Fpu : SchedRead;
+def ReadAfterInt2Fpu : SchedRead;
For BdVer2 `ReadAfterInt2Fpu ` would be defined as:
def : ReadAdvance<ReadAfterInt2Fpu, 13>;
For BtVer2 it would be defined as follows:
def : ReadAdvance<ReadAfterInt2Fpu, 7>;
The last step is to add `ReadAfterInt2Fpu` to the schedule read/write list of SSE CVTSI*_Int definitions.
In my experimental change, I had to change multiclass `sse12_cvt_sint_3addr` by adding an extra param (see below):
Index: X86InstrSSE.td
===================================================================
--- X86InstrSSE.td (revision 357997)
+++ X86InstrSSE.td (working copy)
@@ -1026,6 +989,7 @@
multiclass sse12_cvt_sint_3addr<bits<8> opc, RegisterClass SrcRC,
RegisterClass DstRC, X86MemOperand x86memop,
string asm, X86FoldableSchedWrite sched,
+ SchedRead ReadAdv = ReadDefault,
bit Is2Addr = 1> {
That required the following changes too:
+let Predicates = [UseAVX] in {
+defm VCVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+ i32mem, "cvtsi2ss{l}", WriteCvtI2SS, ReadDefault, 0>, XS, VEX_4V;
+defm VCVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+ i64mem, "cvtsi2ss{q}", WriteCvtI2SS, ReadDefault, 0>, XS, VEX_4V, VEX_W;
+defm VCVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+ i32mem, "cvtsi2sd{l}", WriteCvtI2SD, ReadDefault, 0>, XD, VEX_4V;
+defm VCVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+ i64mem, "cvtsi2sd{q}", WriteCvtI2SD, ReadDefault, 0>, XD, VEX_4V, VEX_W;
+}
+let Constraints = "$src1 = $dst" in {
+ defm CVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+ i32mem, "cvtsi2ss{l}", WriteCvtI2SS, ReadAfterInt2Fpu>, XS;
+ defm CVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+ i64mem, "cvtsi2ss{q}", WriteCvtI2SS, ReadAfterInt2Fpu>, XS, REX_W;
+ defm CVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+ i32mem, "cvtsi2sd{l}", WriteCvtI2SD, ReadAfterInt2Fpu>, XD;
+ defm CVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+ i64mem, "cvtsi2sd{q}", WriteCvtI2SD, ReadAfterInt2Fpu>, XD, REX_W;
+}
This was enough to avoid the regression on BtVer2.
There is one last change required for BdVer2:
===================================================================
--- X86ScheduleBdVer2.td (revision 357997)
+++ X86ScheduleBdVer2.td (working copy)
@@ -898,7 +899,8 @@
let Latency = 13;
let NumMicroOps = 2;
}
-def : InstRW<[PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr], (instrs CVTSI642SDrr, CVTSI642SSrr, CVTSI2SDrr, CVTSI2SSrr)>;
+def : InstRW<[PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr, ReadAfterInt2Fpu], (instrs CVTSI642SDrr, CVTSI642SSrr, CVTSI2SDrr, CVTSI2SSrr,
+ CVTSI642SDrr_Int, CVTSI642SSrr_Int, CVTSI2SDrr_Int, CVTSI2SSrr_Int)>;
Basically, we need to explicitly pass `ReadAfterInt2Fpu` to that InstRW. Otherwise the regression would not go away.
Tbh. I don't know if there is another way to fix that regression.
If we want to fix it, then this may be a way to do it.
I hope it helps.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D60441/new/
https://reviews.llvm.org/D60441
More information about the llvm-commits
mailing list