[PATCH] D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573)

Sat Oct 14 07:07:31 PDT 2017

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:357
+  let Latency = 8;
+  let ResourceCycles = [2];
+}
----------------
avt77 wrote:
> RKSimon wrote:
> > Shouldn't this def be something like the below, to show it will consume the AGU for a cycle? Same for the other loads.
> > ```
> > def WriteFAddYMLd: SchedWriteRes<[JLAGU,JFPU0]>  {
> >   let Latency = 8;
> >   let ResourceCycles = [1,2];
> > }
> > ```
> > 
> I thought about it, but the Software Optimization Guide does not show it (I mean, it mentions the AGU but does not include the additional cycle in its tables). Should I update the model?
> 
This is a load, so the AGU should be the first pipe:
```
def WriteDPPSLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
```

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:398
+
+def WriteDPPDLd: SchedWriteRes<[JFPU0, JFPU1, JLAGU]> {
+  let Latency = 14;
----------------
```
def WriteDPPDLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
```

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:404
+def : InstRW<[WriteDPPDLd], (instregex "(V)?DPPDrmi")>;
+
+////////////////////////////////////////////////////////////////////////////////
----------------
Missing VTEST instructions

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:413
+}
+def : InstRW<[WriteCVTPS2PH], (instregex "VCVTPS2PHrr", "VCVTPH2PSrr")>;
+
----------------
Latency is 3 according to AMD64_16h_InstrLatency_1.1.xlsx

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:416
+def WriteCVTPS2PHSt: SchedWriteRes<[JFPU1, JLAGU]> {
+  let Latency = 9;
+  let ResourceCycles = [1, 1];
----------------
You should probably just use a latency of 3 here, as it's a convert+store.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:419
+}
+def : InstRW<[WriteCVTPS2PHSt], (instregex "VCVTPS2PHmr", "VCVTPH2PSmr")>;
+
----------------
There's no such instruction as VCVTPH2PSmr
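Folding the two comments above together, the store entry might end up roughly like this. This is only a sketch: the resource list and ResourceCycles are kept exactly as in the patch, and only the latency and the instregex list change.

```
// Convert + store: latency lowered to 3 as suggested above.
def WriteCVTPS2PHSt: SchedWriteRes<[JFPU1, JLAGU]> {
  let Latency = 3;
  let ResourceCycles = [1, 1];
}
// VCVTPH2PS has no mr (store) form, so only VCVTPS2PHmr is mapped here.
def : InstRW<[WriteCVTPS2PHSt], (instregex "VCVTPS2PHmr")>;
```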

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:428
+
+def WriteCVTPS2PHYLd: SchedWriteRes<[JFPU0, JFPU1, JLAGU]> {
+  let Latency = 11;
----------------
This should be named WriteCVTPS2PHYSt, since it models the store form.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:519
+  let Latency = 12;
+  let ResourceCycles = [6, 6];
+}
----------------
let NumMicroOps = 10;

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:525
+  let Latency = 17;
+  let ResourceCycles = [1, 6, 6];
+}
----------------
let NumMicroOps = 11;

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:533
+}
+def : InstRW<[WriteVCVT], (instregex "VCVTDQ2P(S|D)Yrr", "VMOVNTP(S|D)Ymr", "VROUNDYP(S|D)r")>;
+
----------------
Give the MOVNT and ROUND instructions their own entries
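A rough sketch of the split, assuming the conversions keep WriteVCVT. WriteVMOVNTY and WriteVROUNDY are hypothetical names; each would still need its own SchedWriteRes with values taken from AMD's latency tables, which are not filled in here.

```
// Conversions keep the existing WriteVCVT entry.
def : InstRW<[WriteVCVT], (instregex "VCVTDQ2P(S|D)Yrr")>;
// Hypothetical write types (not defined in the patch); each needs its
// own SchedWriteRes before this will build.
def : InstRW<[WriteVMOVNTY], (instregex "VMOVNTP(S|D)Ymr")>;
def : InstRW<[WriteVROUNDY], (instregex "VROUNDYP(S|D)r")>;
```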

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:659
+}
+def : InstRW<[WriteVTESTY], (instregex "VTESTP(S|D)Yrr")>;
+
----------------
VPTESTD?
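If the intent is for the 256-bit VPTEST to be covered here as well, one way would be to extend the instregex list. This sketch assumes (not checked against AMD's tables) that VPTESTYrr has the same cost as VTESTPSYrr/VTESTPDYrr; if it does not, it would need its own SchedWriteRes instead.

```
// Assumption: VPTESTYrr shares WriteVTESTY's latency and resources.
def : InstRW<[WriteVTESTY], (instregex "VTESTP(S|D)Yrr", "VPTESTYrr")>;
```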

https://reviews.llvm.org/D33099