[PATCH] D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions

Tue May 16 14:27:27 PDT 2017

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:71
+def JFPIntCluster : ProcResGroup<[JVALU0, JVALU1, JSTC]>;
+
 // Integer loads are 3 cycles, so ReadAfterLd registers needn't be available until 3
----------------
I don't think adding these Cluster groups is necessary. TBH most of the ProcResource defs appear to be superfluous: most aren't used at all, and we're just using the JFPU0/JFPU1/JFPU01 defs, with a few others for the longer op chain instructions.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:349
+
+def WriteFAddYY: SchedWriteRes<[JFPA]> {
+  let Latency = 3;
----------------
Better off using JFPU0 as that's what is actually bound to the buffer. Same for the others below.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:357
+  let Latency = 8;
+  let ResourceCycles = [2];
+}
----------------
Shouldn't this def be something like the below, to show it will consume the AGU for a cycle? Same for the other loads.
```
def WriteFAddYMLd: SchedWriteRes<[JLAGU,JFPU0]> {
  let Latency = 8;
  let ResourceCycles = [1,2];
}
```
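
To make the point about the AGU concrete, here is a rough toy model in Python of what the two resource lists express. This is only a sketch and not how LLVM's scheduler computes anything: it assumes one unit per named resource and treats each ResourceCycles entry as the number of cycles that unit is held. The `FP` key is a placeholder, since the resource name for the def at line 357 isn't visible in the quoted hunk, and the helper names are made up for illustration.

```python
# Toy model (an assumption, not LLVM's scheduler): every named resource has a
# single unit, and an instruction holds each resource for the listed cycles.

def reciprocal_throughput(resource_cycles):
    """Cycles between back-to-back issues of the same instruction."""
    return max(resource_cycles.values())


def busy_fraction(resource_cycles, resource):
    """Fraction of time `resource` is held in a steady stream of this instruction."""
    return resource_cycles.get(resource, 0) / reciprocal_throughput(resource_cycles)


# One FP resource held for 2 cycles, as in `ResourceCycles = [2]`.
# "FP" is a placeholder name; the hunk does not show the actual resource.
patch_style = {"FP": 2}

# The suggested def: SchedWriteRes<[JLAGU,JFPU0]> with ResourceCycles = [1,2].
suggested = {"JLAGU": 1, "JFPU0": 2}

for name, rc in (("patch", patch_style), ("suggested", suggested)):
    print(name,
          "rthroughput:", reciprocal_throughput(rc),
          "JLAGU busy:", busy_fraction(rc, "JLAGU"))
```

In this simplified view both forms are limited to one instruction every 2 cycles, but only the second records that the load AGU is held for a cycle, which matters once other loads compete for it.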

================
Comment at: test/CodeGen/X86/slow-unaligned-mem.ll:89
 ; FAST:       # BB#0:
-; FAST-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; FAST:         movl {{[0-9]+}}(%esp), %eax
 ; FAST-NOT:     movl
----------------
???? Why was this changed from FAST-NEXT to FAST?

https://reviews.llvm.org/D33099