[PATCH] D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 11 06:41:35 PDT 2017
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86Schedule.td:48
def WriteLEA : SchedWrite; // LEA instructions can't fold loads.
+def WriteLEA3 : SchedWrite; // Complex LEA instructions can't fold loads.
----------------
The LEA3 changes should be in their own patch.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:137
+ let Latency = 1;
+ // FIXME: We need 0.5 but it's list<int>?
+ let ResourceCycles = [1];
----------------
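Isn't this already handled by JALU01, which groups JALU0 + JALU1 together? The instruction has a choice of 2 pipes and occupies whichever one it goes down for 1cy, so the 0.5 throughput should come from the group rather than from a fractional ResourceCycles entry.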
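To illustrate the arithmetic behind this comment, here is a minimal sketch that treats a resource group as a pool of identical pipes and derives the reciprocal throughput from the integer cycle count. The function name and the pool sizes are assumptions made for illustration. This is not LLVM's API, only a simplified model of how an integer ResourceCycles value can still yield 0.5.

```python
from fractions import Fraction


def reciprocal_throughput(resource_cycles, num_units):
    """Cycles per instruction when any of `num_units` identical pipes
    can accept the op and each use holds its pipe for `resource_cycles`.

    Simplified model (assumption): the pool sustains
    num_units / resource_cycles instructions per cycle.
    """
    if resource_cycles <= 0 or num_units <= 0:
        raise ValueError("cycles and unit count must be positive")
    return Fraction(resource_cycles, num_units)


# One pipe, 1 cycle per use: one instruction per cycle.
single_pipe = reciprocal_throughput(1, 1)

# A group of two pipes (as with JALU0 + JALU1), still 1 cycle per use:
# two instructions per cycle, i.e. a reciprocal throughput of 1/2.
two_pipe_group = reciprocal_throughput(1, 2)

print(float(single_pipe), float(two_pipe_group))  # prints: 1.0 0.5
```

Under this model the integer cycle count stays at 1 and the halving comes entirely from the number of pipes in the group.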
================
Comment at: test/CodeGen/X86/sse2-schedule.ll:6022
; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:0.50]
-; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [6:1.00]
+; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [6:0.50]
; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
----------------
Jaguar can issue at most 1 load per cycle, so the throughput of the folded-load form should still be 1.00.
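A rough sketch of why the memory form stays at 1.00: if an instruction consumes several resources, the scarcest one sets the throughput. The resource counts below (two FP pipes, one load unit) and the function name are assumptions chosen to match the numbers in this comment, not values read from the scheduler model.

```python
def reciprocal_throughput(uses):
    """Return the reciprocal throughput for an instruction.

    `uses` is a list of (cycles, num_units) pairs, one per resource the
    instruction consumes. Simplified model (assumption): the bottleneck
    is the resource with the largest cycles-per-unit ratio.
    """
    if not uses:
        raise ValueError("instruction must consume at least one resource")
    return max(cycles / units for cycles, units in uses)


FP_PIPES = (1, 2)   # hypothetical: 1 cycle on a group of two FP pipes
LOAD_UNIT = (1, 1)  # hypothetical: 1 cycle on the single load unit

# Register form: only the FP pipe group is used.
reg_form = reciprocal_throughput([FP_PIPES])

# Memory form: the folded load also needs the single load unit,
# which becomes the bottleneck.
mem_form = reciprocal_throughput([FP_PIPES, LOAD_UNIT])

print(reg_form, mem_form)  # prints: 0.5 1.0
```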
https://reviews.llvm.org/D33099