[PATCH] D28152: Cortex-A57 scheduling model for ARM backend (AArch32)
Javed Absar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 22 09:08:21 PST 2017
javed.absar added a comment.
Hi.
Some inline comments. Also, could you please share some performance results that you have for this sched-model.
Best Regards
Javed
================
Comment at: lib/Target/ARM/ARMScheduleA57.td:498
+
+def A57WriteLdrAm3PostX3 : SchedWriteVariant<[
+ SchedVar<IsLdrAm3RegOffPredX3, [A57Write_4cyc_1I_1L]>,
----------------
Do we need sched-variant here since outcome is identical?
================
Comment at: lib/Target/ARM/ARMScheduleA57.td:1214
+// ASIMD reverse
+def : InstRW<[A57Write_3cyc_1V], (instregex "VREV16", "VREV32", "VREV64")>;
+
----------------
You may want to combine VREV* VSWP* VTBL .. into one instregex comma separated def, to make definitions shorter.
Similarly in other places too, where possible without compromising clarity
================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:19
+// e.g. A57Write_6cyc_1I_6S_4V means the total latency is 6 and there are
+// 11 micro-ops to be issued down one I pipe, six S pipes and four V pipes.
+//
----------------
You may want to rephrase "issued down one " to "issued as follows: one to I pipe, six to S pipe and ...", for better clarity.
================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:55
+def A57Write_3cyc_1X : SchedWriteRes<[A57UnitX]> { let Latency = 3; }
+def A57Write_3cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 3; }
+def A57Write_4cyc_1L : SchedWriteRes<[A57UnitL]> { let Latency = 4; }
----------------
You could shorten the following using someitng like -
foreach Lat = 3-20 in {
def A57Write_#Lat#cyc_1L : SchedWriteRes<[A57UnitL]> {
let Latency = Lat;
}
================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:74
+
+def A57Write_4cyc_1S : SchedWriteRes<[A57UnitS]> { let Latency = 4; }
+def A57Write_5cyc_1S : SchedWriteRes<[A57UnitS]> { let Latency = 5; }
----------------
Same here. You could shorten the following -
foreach Lat = 5-16 in {
def A57Write_#Lat#cyc_1S : SchedWriteRes<[A57UnitS]> {
let Latency = Lat;
}
================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:243
+}
+def A57Write_4cyc_1L_1I : SchedWriteRes<[A57UnitL,
+ A57UnitI]> {
----------------
Same here. You could shorten with the following -
foreach Lat = 1-20 in {
def A57Write_#Lat#cyc_1L_1I : SchedWriteRes<[A57UnitL, A57UnitI]>
{
let Latency = Lat; let NumMicroOps = 2;
}
================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:754
+
+def A57Write_6cyc_6S_4V : SchedWriteRes<[A57UnitS, A57UnitS, A57UnitS,
+ A57UnitS, A57UnitS, A57UnitS,
----------------
Something seems amiss as its defined earlier there is only 1 processor-unit of type A57UnitS.
================
Comment at: test/CodeGen/ARM/cortex-a57-misched-alu.ll:30
+
+; CHECK: ** ScheduleDAGMILive::schedule picking next node
+; Skipping COPY
----------------
This part below is not required I think as you have already checked the latencies here against the model.
https://reviews.llvm.org/D28152
More information about the llvm-commits
mailing list