[PATCH] D47374: [RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca.

Fri May 25 08:53:44 PDT 2018

andreadb added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:550
+////////////////////////////////////////////////////////////////////////////////
+//  MCInstPredicate definitions used by variant scheduling classes.
+////////////////////////////////////////////////////////////////////////////////
----------------
RKSimon wrote:
> Add a TODO saying this may go into X86Schedule.td in the future?
Will do.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:575
+// Reference: Section 10.8 of the "Software Optimization Guide for AMD Family
+// 15h Processors".
+
----------------
RKSimon wrote:
> Reference Agner's microarchitecture doc as well - he explicity says this works for Jaguar - AMD 16h SOG unfortunately misses it and just having a reference to a different cpu is confusing.
Okay. I will add that reference.

================
Comment at: test/CodeGen/X86/sse-schedule.ll:6084
 ; BTVER2-SSE:       # %bb.0:
-; BTVER2-SSE-NEXT:    xorps %xmm1, %xmm0 # sched: [1:0.50]
+; BTVER2-SSE-NEXT:    xorps %xmm1, %xmm0 # sched: [?:?]
 ; BTVER2-SSE-NEXT:    xorps (%rdi), %xmm0 # sched: [6:1.00]
----------------
RKSimon wrote:
> This is unfortunate - ideally we'd keep [1:0.50] here and zero idioms would get [0:0.50]
I should have mentioned this in the summary. The existing functionality used to obtain the latency from a MCInst doesn't know how to resolve variant scheduling classes. So, it accepts a machine opcode ID, and it bails out if the scheduling class identifier from the opcode descriptor refers to a variant class.
The idea is to fix it. However, it would be a follow-up patch. I can add a FIXME comment there.

================
Comment at: test/tools/llvm-mca/X86/BtVer2/zero-idioms.s:32
+# CHECK-NEXT:  1      0      -                          pxor	%xmm2, %xmm2
+# CHECK-NEXT:  1      0      -                          vpxor	%xmm3, %xmm3, %xmm3
+
----------------
RKSimon wrote:
> Should the RThroughput still be limited to by issue width?
It doesn't take into account the issue width. In this case, the MCSchedInfo method returned an invalid RThroughput (an Optional<double>).

On the other hand, the Block RThroughput correctly reports 3.0 for the sequence of 6 instructions (for a total of 6 uOps).

================
Comment at: tools/llvm-mca/InstrBuilder.cpp:385
+    SchedClassID =
+        STI.resolveVariantSchedClass(SchedClassID, &MCI, SM.getProcessorID());
+  }
----------------
RKSimon wrote:
> Drop braces - also is it worth pulling out SM.getProcessorID()? It might get this down to a single line for tidiness.
Okay. I will fix it.

https://reviews.llvm.org/D47374