[PATCH] D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions

Thu May 11 06:41:35 PDT 2017

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86Schedule.td:48
 def  WriteLEA  : SchedWrite;        // LEA instructions can't fold loads.
+def  WriteLEA3 : SchedWrite;        // Complex LEA instructions can't fold loads.

----------------
The LEA3 changes should be in their own patch.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:137
+  let Latency = 1;
+  // FIXME: We need 0.5 but it's list<int>?
+  let ResourceCycles = [1];
----------------
Isn't this already handled by JALU01, which groups JALU0 + JALU1 together? The instruction has a choice of 2 pipes and occupies whichever one it goes down for 1cy, so the group already gives the 0.5 throughput without needing a fractional ResourceCycles.
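
As a rough illustration of the point about the grouped pipes, here is a minimal sketch of the arithmetic, assuming reciprocal throughput for one resource is simply the cycles held divided by the number of units that can accept the instruction. The function name and the unit counts are illustrative assumptions, not LLVM's actual code.

```python
from fractions import Fraction

def resource_rthroughput(units, cycles):
    # Reciprocal throughput contributed by one resource: cycles the
    # instruction holds a unit, divided by how many units can take it.
    return Fraction(cycles, units)

# Assumed shape: JALU0 alone is 1 unit; JALU01 (JALU0 + JALU1) is 2 units.
single_pipe = resource_rthroughput(units=1, cycles=1)
grouped_pipes = resource_rthroughput(units=2, cycles=1)

print(float(single_pipe))    # 1.0
print(float(grouped_pipes))  # 0.5
```

Under this model an integer ResourceCycles of 1 on a two-unit group already yields 0.5, so no fractional cycle count is needed.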

================
Comment at: test/CodeGen/X86/sse2-schedule.ll:6022
 ; BTVER2-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:0.50]
-; BTVER2-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [6:1.00]
+; BTVER2-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [6:0.50]
 ; BTVER2-NEXT:    vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
----------------
Jaguar has a max of 1 load/cycle, so the tp of the folded-load form should still be 1.00, not 0.50.
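
To make the expected numbers concrete, a minimal sketch follows, assuming an instruction's reciprocal throughput is bounded by its most contended resource. The (units, cycles) values are assumptions chosen to mirror this comment (a 2-pipe FPU group and a single load unit), and the helper is hypothetical rather than anything in LLVM.

```python
from fractions import Fraction

def instr_rthroughput(resources):
    # resources: list of (units, cycles); the slowest one is the bottleneck.
    return max(Fraction(cycles, units) for units, cycles in resources)

# Assumed values: a 2-pipe FPU group and a single load unit, 1 cycle each.
FPU_GROUP = (2, 1)
LOAD_UNIT = (1, 1)

reg_form = instr_rthroughput([FPU_GROUP])
mem_form = instr_rthroughput([FPU_GROUP, LOAD_UNIT])
print(f"{float(reg_form):.2f}")  # 0.50
print(f"{float(mem_form):.2f}")  # 1.00
```

The register form stays at 0.50, while the memory form is limited by the single load unit and comes out at 1.00.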

https://reviews.llvm.org/D33099