[PATCH] D47374: [RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca.
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 25 08:53:44 PDT 2018
andreadb added inline comments.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:550
+////////////////////////////////////////////////////////////////////////////////
+// MCInstPredicate definitions used by variant scheduling classes.
+////////////////////////////////////////////////////////////////////////////////
----------------
RKSimon wrote:
> Add a TODO saying this may go into X86Schedule.td in the future?
Will do.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:575
+// Reference: Section 10.8 of the "Software Optimization Guide for AMD Family
+// 15h Processors".
+
----------------
RKSimon wrote:
> Reference Agner's microarchitecture doc as well - he explicity says this works for Jaguar - AMD 16h SOG unfortunately misses it and just having a reference to a different cpu is confusing.
Okay. I will add that reference.
================
Comment at: test/CodeGen/X86/sse-schedule.ll:6084
; BTVER2-SSE: # %bb.0:
-; BTVER2-SSE-NEXT: xorps %xmm1, %xmm0 # sched: [1:0.50]
+; BTVER2-SSE-NEXT: xorps %xmm1, %xmm0 # sched: [?:?]
; BTVER2-SSE-NEXT: xorps (%rdi), %xmm0 # sched: [6:1.00]
----------------
RKSimon wrote:
> This is unfortunate - ideally we'd keep [1:0.50] here and zero idioms would get [0:0.50]
I should have mentioned this in the summary. The existing functionality used to obtain the latency from a MCInst doesn't know how to resolve variant scheduling classes. So, it accepts a machine opcode ID, and it bails out if the scheduling class identifier from the opcode descriptor refers to a variant class.
The idea is to fix it. However, it would be a follow-up patch. I can add a FIXME comment there.
================
Comment at: test/tools/llvm-mca/X86/BtVer2/zero-idioms.s:32
+# CHECK-NEXT: 1 0 - pxor %xmm2, %xmm2
+# CHECK-NEXT: 1 0 - vpxor %xmm3, %xmm3, %xmm3
+
----------------
RKSimon wrote:
> Should the RThroughput still be limited to by issue width?
It doesn't take into account the issue width. In this case, the MCSchedInfo method returned an invalid RThroughput (an Optional<double>).
On the other hand, the Block RThroughput correctly reports 3.0 for the sequence of 6 instructions (for a total of 6 uOps).
================
Comment at: tools/llvm-mca/InstrBuilder.cpp:385
+ SchedClassID =
+ STI.resolveVariantSchedClass(SchedClassID, &MCI, SM.getProcessorID());
+ }
----------------
RKSimon wrote:
> Drop braces - also is it worth pulling out SM.getProcessorID()? It might get this down to a single line for tidiness.
Okay. I will fix it.
https://reviews.llvm.org/D47374
More information about the llvm-commits
mailing list