[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Fri Oct 13 08:03:03 PDT 2017

az marked 7 inline comments as done.
az added a comment.

> But my main question at this point is: did you benchmark this? What are the results of that? On which cores/benchmarks?

As llvm is now getting better at generating interleaved memory access instructions, we are seeing a major degradation in a proprietary benchmark for Exynos because latest llvm generates st2 instructions in a hot loop. I can refer you to the llvm file lib/Target/AArch64/AArch64SchedM1.td for Exynos latencies (example: st2 takes 8 cycles, while zip1, zip2, and stp each take 1 cycle).

Note that other targets that we tested did not in general have this problem. However, I noticed through unit testing that ARM A57 had this problem with st4 only (and not with st2) even though A57 latencies in AArch64SchedA57.td do not indicate any problem and hence this pass does not affect A57 (somebody from ARM may need to double check on those latency).

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:1
 //=- AArch64VectorByElementOpt.cpp - AArch64 vector by element inst opt pass =//
 //
----------------
kristof.beyls wrote:
> I feel like the name of this pass/file should be changed, as the name no longer covers what it does.
> It seems the pass now is becoming about replacing certain SIMD patterns with longer but more efficient SIMD patterns.
> AArch64SIMDOptimizer might fit also with some of the file names of other optimizer passes already in the AArch64 target?
I was planning to clean the naming afterward in a separate patch but I just did it now after seeing this Comment. I Changed classes names, pass name, variable name, etc. (as a result, more files with minor changes are included in the review related to changing the pass name). Note that I did not change the file name in here but will do when I commit the code. I chose AArch64SIMDInstrOpt instead of  AArch64SIMDOptimizer based on teh assumption that SIMDOptimizer may be confuse people into thinking that we may be doing some vectorization related work. SIMDInstr may convey the idea that we are doing optimization on the generated SIMD instructions. Let me know if you have other ideas about this.

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:74
+    Interleave,
+    unsupportedSub
+  } Subpass;
----------------
kristof.beyls wrote:
> s/unsupportedSub/UnsupportedSub/.
> Do you need the UnsupportedSub value?
I used to have an assert statement to look for enum value error at the top of the function (see original review). Then, I moved that error checking into the default case of the switch statement (with llvm_unreachable()) based on my understanding of one of your suggestions. That resulted in a compiler warning during the build related to the no need for the default case as all values are covered. So, I added the Unsupported in enum. In this new review, I changed the names of the extra enum value just in case it looks less confusing. While I prefer that, I am perfectly fine with removing them along with the default case and any enum related error checking.

https://reviews.llvm.org/D38196