[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Mon Dec 11 16:51:48 PST 2017

az added a comment.

Thank you for providing several very useful feedback to improve this patch. In the future (not very soon), I am planning to make this pass more complete by 1) implementing the existing vector element sub-pass in the same way as optimize interleaved store instructions sub-pass for consistency reason (put all rewrite rules and not just one, caching, instruction replacement table, etc.) 2) Addressing the ST3 instruction inefficiency assuming I can find an efficient combination of instructions to replace ST3. I hope I can put you as the reviewer.

I would like also to get your feedback on a couple of things:

1. Is it worth replacing how code generation (i.e. building new instructions) is implemented in this pass? It is currently hard-coded but we can replace it by a table driven approach similar to the one used in the analysis phase (see IRT Table). In particular, I need to find a simple way to express through a table or other means how the new instructions are built out of the old one and more specifically how the operands of new instructions related to old ones. This adds complexity to the implementation and understanding of the code generation but makes adding more rewrite rules quite simple. It tried this before but the patch was not accepted but may be I should bring it again now that we added several new rewrite pattern in this patch and expect to add more for ST3.

2. I still need your investigation about the latency for st4/st2 on A57. Based on my unit test case (which is posted somewhere in the comments of this patch), st4 is inefficient for cortex-a57 and the replacement instructions in this patch perform better. If you or somebody from ARM can verify on that, then this means that the latency information in A57 .td file is not accurate and we need to update it. My test for st2 is not conclusive related to whether st2 is better than the replacement instructions or not. Note that, we have few benchmarks that are compiled with cortex-a57 and we are not supposed to change that flag. It would be nice if the A57 .td file has accurate information.

https://reviews.llvm.org/D38196