[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Mon Nov 13 08:44:27 PST 2017

kristof.beyls added a comment.

I haven't gone through the whole patch line by line again, but I believe this is almost in a reasonable shape.
Could you do the compilation time benchmarking to ensure this doesn't increase it for target not needing the rewriting?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:80
+
+	// Instruction representd by OrigOpc is replaced by instructions in ReplOpc.
+  struct InstReplInfo {
----------------
s/representd/represented/

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:87-92
+#define Rule1(OpcOrg, OpcR0, OpcR1, OpcR2, RC) \
+  {OpcOrg, {OpcR0, OpcR1, OpcR2}, RC}
+#define Rule2(OpcOrg, OpcR0, OpcR1, OpcR2, OpcR3, OpcR4, OpcR5, OpcR6, \
+	OpcR7, OpcR8, OpcR9, RC) \
+  {OpcOrg, {OpcR0, OpcR1, OpcR2, OpcR3, OpcR4, OpcR5, OpcR6, OpcR7, \
+   OpcR8, OpcR9}, RC}
----------------
Maybe "RuleST2" instead of "Rule1" and "RuleST4" instead of "Rule2" is a little bit more descriptive?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:290-294
+  case Interleave:
+    for (auto &I : IRT) {
+      OriginalMCID = &TII->get(I.OrigOpc);
+      for (auto &Repl : I.ReplOpc)
+        ReplInstrMCID.push_back(&TII->get(Repl));
----------------
This still seems to be doing a bit of work, even though the information could already be cached from processing an earlier function.
I guess this could be improved by caching per Subpass PS, rather than per MCInstrDesc*?
I'm not sure if this is needed. Could you benchmark the compile-time impact as is on e.g. CTMark both when targeting Exynos (i.e. when shouldExitEarly returns false), and for another target (i.e. when shouldExitEarly returns true).

https://reviews.llvm.org/D38196