[PATCH] D28152: Cortex-A57 scheduling model for ARM backend (AArch32)

Sun Jan 22 09:08:21 PST 2017

javed.absar added a comment.

Hi. 
Some inline comments. Also, could you please share some performance results that you have for this sched-model.
Best Regards
Javed

================
Comment at: lib/Target/ARM/ARMScheduleA57.td:498
+
+def A57WriteLdrAm3PostX3 : SchedWriteVariant<[
+  SchedVar<IsLdrAm3RegOffPredX3, [A57Write_4cyc_1I_1L]>,
----------------
Do we need sched-variant here since outcome is identical?

================
Comment at: lib/Target/ARM/ARMScheduleA57.td:1214
+// ASIMD reverse
+def : InstRW<[A57Write_3cyc_1V], (instregex "VREV16", "VREV32", "VREV64")>;
+
----------------
You may want to combine VREV* VSWP*  VTBL .. into one instregex comma separated def, to make definitions shorter.
Similarly in other places too, where possible without compromising clarity

================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:19
+// e.g. A57Write_6cyc_1I_6S_4V means the total latency is 6 and there are
+//      11 micro-ops to be issued down one I pipe, six S pipes and four V pipes.
+//
----------------
You may want to rephrase "issued down one " to "issued as follows: one to I pipe, six to S pipe and ...", for better clarity.

================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:55
+def A57Write_3cyc_1X  : SchedWriteRes<[A57UnitX]> { let Latency = 3;  }
+def A57Write_3cyc_1L  : SchedWriteRes<[A57UnitL]> { let Latency = 3;  }
+def A57Write_4cyc_1L  : SchedWriteRes<[A57UnitL]> { let Latency = 4;  }
----------------
You could shorten the following using someitng like - 

foreach Lat = 3-20  in {
  def A57Write_#Lat#cyc_1L : SchedWriteRes<[A57UnitL]> {   
     let Latency = Lat;
}

================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:74
+
+def A57Write_4cyc_1S  : SchedWriteRes<[A57UnitS]> { let Latency = 4;  }
+def A57Write_5cyc_1S  : SchedWriteRes<[A57UnitS]> { let Latency = 5;  }
----------------
Same here. You could shorten the following  -
foreach Lat = 5-16  in {
def A57Write_#Lat#cyc_1S : SchedWriteRes<[A57UnitS]> {   
   let Latency = Lat;
}

================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:243
+}
+def A57Write_4cyc_1L_1I  : SchedWriteRes<[A57UnitL,
+                                          A57UnitI]> {
----------------
Same here. You could shorten with the following -
foreach Lat = 1-20 in {
def A57Write_#Lat#cyc_1L_1I : SchedWriteRes<[A57UnitL, A57UnitI]> 
{
let Latency = Lat; let NumMicroOps = 2;
}

================
Comment at: lib/Target/ARM/ARMScheduleA57WriteRes.td:754
+
+def A57Write_6cyc_6S_4V : SchedWriteRes<[A57UnitS, A57UnitS, A57UnitS,
+                                         A57UnitS, A57UnitS, A57UnitS,
----------------
Something seems amiss as its defined earlier there is only 1 processor-unit of type A57UnitS. 

================
Comment at: test/CodeGen/ARM/cortex-a57-misched-alu.ll:30
+
+; CHECK:      ** ScheduleDAGMILive::schedule picking next node
+; Skipping COPY
----------------
This part below is not required I think as you have already checked the latencies here against the model.

https://reviews.llvm.org/D28152