[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos
Kristof Beyls via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 13 08:44:27 PST 2017
kristof.beyls added a comment.
I haven't gone through the whole patch line by line again, but I believe this is almost in a reasonable shape.
Could you do the compilation time benchmarking to ensure this doesn't increase it for target not needing the rewriting?
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:80
+
+ // Instruction representd by OrigOpc is replaced by instructions in ReplOpc.
+ struct InstReplInfo {
----------------
s/representd/represented/
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:87-92
+#define Rule1(OpcOrg, OpcR0, OpcR1, OpcR2, RC) \
+ {OpcOrg, {OpcR0, OpcR1, OpcR2}, RC}
+#define Rule2(OpcOrg, OpcR0, OpcR1, OpcR2, OpcR3, OpcR4, OpcR5, OpcR6, \
+ OpcR7, OpcR8, OpcR9, RC) \
+ {OpcOrg, {OpcR0, OpcR1, OpcR2, OpcR3, OpcR4, OpcR5, OpcR6, OpcR7, \
+ OpcR8, OpcR9}, RC}
----------------
Maybe "RuleST2" instead of "Rule1" and "RuleST4" instead of "Rule2" is a little bit more descriptive?
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:290-294
+ case Interleave:
+ for (auto &I : IRT) {
+ OriginalMCID = &TII->get(I.OrigOpc);
+ for (auto &Repl : I.ReplOpc)
+ ReplInstrMCID.push_back(&TII->get(Repl));
----------------
This still seems to be doing a bit of work, even though the information could already be cached from processing an earlier function.
I guess this could be improved by caching per Subpass PS, rather than per MCInstrDesc*?
I'm not sure if this is needed. Could you benchmark the compile-time impact as is on e.g. CTMark both when targeting Exynos (i.e. when shouldExitEarly returns false), and for another target (i.e. when shouldExitEarly returns true).
https://reviews.llvm.org/D38196
More information about the llvm-commits
mailing list