[PATCH] D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573)
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 14 07:07:31 PDT 2017
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:357
+ let Latency = 8;
+ let ResourceCycles = [2];
+}
----------------
avt77 wrote:
> RKSimon wrote:
> > Shouldn't this def be something like the below, to show it will consume the AGU for a cycle? Same for the other loads.
> > ```
> > def WriteFAddYMLd: SchedWriteRes<[JLAGU,JFPU0]> {
> > let Latency = 8;
> > let ResourceCycles = [1,2];
> > }
> > ```
> >
> I thought about it, but the Software Optimization Guide does not show it (it mentions the AGU, but it does not include the additional cycle in its tables). Should I update the model?
>
This is a load, so the AGU should be the first pipe:
```
def WriteDPPSLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
```
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:398
+
+def WriteDPPDLd: SchedWriteRes<[JFPU0, JFPU1, JLAGU]> {
+ let Latency = 14;
----------------
def WriteDPPDLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
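Spelled out, the reordered def might look like the sketch below. The latency is the one from the patch; the ResourceCycles values are not visible in this hunk, so they are left as a comment rather than guessed. If that list is set, its entries need to be permuted to match the new resource order.
```
def WriteDPPDLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
  let Latency = 14;
  // ResourceCycles, if present, takes one entry per resource in the
  // order listed above, i.e. [<JLAGU>, <JFPU0>, <JFPU1>].
}
```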
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:404
+def : InstRW<[WriteDPPDLd], (instregex "(V)?DPPDrmi")>;
+
+////////////////////////////////////////////////////////////////////////////////
----------------
Missing VTEST instructions
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:413
+}
+def : InstRW<[WriteCVTPS2PH], (instregex "VCVTPS2PHrr", "VCVTPH2PSrr")>;
+
----------------
Latency is 3 according to AMD64_16h_InstrLatency_1.1.xlsx
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:416
+def WriteCVTPS2PHSt: SchedWriteRes<[JFPU1, JLAGU]> {
+ let Latency = 9;
+ let ResourceCycles = [1, 1];
----------------
You should probably just use a latency of 3 here, as it's a convert+store.
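Something like the below, as a sketch only: the resources and ResourceCycles are kept exactly as in the patch, and only the latency changes.
```
def WriteCVTPS2PHSt: SchedWriteRes<[JFPU1, JLAGU]> {
  let Latency = 3;             // was 9 in the patch
  let ResourceCycles = [1, 1]; // unchanged from the patch
}
```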
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:419
+}
+def : InstRW<[WriteCVTPS2PHSt], (instregex "VCVTPS2PHmr", "VCVTPH2PSmr")>;
+
----------------
There's no such instruction as VCVTPH2PSmr
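So the mapping would presumably reduce to the store form of VCVTPS2PH alone, e.g.:
```
// VCVTPH2PSmr dropped from the regex; only the PS2PH store form remains.
def : InstRW<[WriteCVTPS2PHSt], (instregex "VCVTPS2PHmr")>;
```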
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:428
+
+def WriteCVTPS2PHYLd: SchedWriteRes<[JFPU0, JFPU1, JLAGU]> {
+ let Latency = 11;
----------------
WriteCVTPS2PHYSt
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:519
+ let Latency = 12;
+ let ResourceCycles = [6, 6];
+}
----------------
let NumMicroOps = 10;
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:525
+ let Latency = 17;
+ let ResourceCycles = [1, 6, 6];
+}
----------------
let NumMicroOps = 11;
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:533
+}
+def : InstRW<[WriteVCVT], (instregex "VCVTDQ2P(S|D)Yrr", "VMOVNTP(S|D)Ymr", "VROUNDYP(S|D)r")>;
+
----------------
Give the MOVNT and ROUND instructions their own entries
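Roughly like the sketch below. WriteVMOVNTPY and WriteVROUNDY are hypothetical names used only for illustration; each would need its own SchedWriteRes def with the appropriate resources and latency, which is not shown here.
```
def : InstRW<[WriteVCVT], (instregex "VCVTDQ2P(S|D)Yrr")>;
// Hypothetical write classes, one per instruction group:
def : InstRW<[WriteVMOVNTPY], (instregex "VMOVNTP(S|D)Ymr")>;
def : InstRW<[WriteVROUNDY], (instregex "VROUNDYP(S|D)r")>;
```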
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:659
+}
+def : InstRW<[WriteVTESTY], (instregex "VTESTP(S|D)Yrr")>;
+
----------------
VPTESTD?
https://reviews.llvm.org/D33099