[PATCH] D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 7 07:04:39 PDT 2017
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:22
+ // in-flight in the 64-macro-op in-flight window that the integer retire control unit provides.
+ let MicroOpBufferSize = 64; // Integer Retire Control Unit
let LoadLatency = 5; // FPU latency (worse case cf Integer 3 cycle latency)
----------------
It is still the Retire Control Unit; it's just that the FPU can only touch 44 of the entries.
```
let MicroOpBufferSize = 64; // Retire Control Unit
```
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:25
let PostRAScheduler = 1;
-
// FIXME: SSE4/AVX is unimplemented. This flag is set to allow
----------------
Don't remove whitespace.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:94
int Lat> {
+
// Register variant is using a single cycle on ExePort.
----------------
Undo this whitespace change.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:176
defm : JWriteResFpuPair<WriteFShuffle256, JFPU01, 1>;
-
def : WriteRes<WriteFSqrt, [JFPU1, JLAGU, JFPM]> {
----------------
Don't remove whitespace.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:370
+
+def WriteVMULPD: SchedWriteRes<[JFPU1]> {
+ let Latency = 4;
----------------
WriteVMULYPD
For all these defs, please include the 'Y' to make it clear that it's just the 256-bit case.
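For example, the def quoted above might be renamed along these lines (a sketch only; it shows just the field visible in the quoted diff, and the rest of the body is assumed unchanged):
```
// 256-bit (YMM) variant, hence the 'Y' in the name.
def WriteVMULYPD : SchedWriteRes<[JFPU1]> {
  let Latency = 4;
  // ... remaining fields as in the patch ...
}
```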
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:442
+
+// FIXME: We don't need 'Ld' version for AVX11 because deafult ResourceCycles == 1
+// TODO: How to use ResourceCycles from non-folding version like we do it for Latency?
----------------
What is AVX11?
Spelling: deafult -> default
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:450
+ "VBROADCASTF128", "VBROADCASTSSrr", "VINSERTF128rr",
+ "VMOVAP(D|S)rm", "VMOVDDUPYrr", "VMOVS(H|L)DUPYrr", "VMOVUP(D|S)Yrm",
+ "VORP(S|D)Yrr", "VPERMILP(D|S)Yri", "VSHUFP(D|S)Yrri", "VUNPCK(H|L)P(D|S)rr",
----------------
"VMOVAP(D|S)rm" etc. are memory loads - they should be in the Ld version
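As a rough sketch of the split (WriteVFoo and WriteVFooLd are placeholder names, not the defs from the patch, and only a few of the quoted patterns are shown):
```
// Register-register ("rr") forms stay in the non-folding InstRW.
def : InstRW<[WriteVFoo], (instregex "VMOVDDUPYrr", "VORP(S|D)Yrr")>;
// Memory-operand ("rm") forms move to the load-folding InstRW.
def : InstRW<[WriteVFooLd], (instregex "VMOVAP(D|S)rm", "VMOVUP(D|S)Yrm")>;
```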
================
Comment at: test/CodeGen/X86/avx-vzeroupper.ll:163
+; NO-VZ-NEXT: popq %rbx
+; NO-VZ-NEXT: retq
entry:
----------------
What is causing this?
================
Comment at: test/CodeGen/X86/recip-fastmath.ll:344
; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
+; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [1:1.00]
; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
----------------
Latency should be 5cy; this vmovaps is a constant-pool load, so presumably the LoadLatency of 5 should apply.
https://reviews.llvm.org/D33099