[llvm] [AArch64] Disable consecutive store merging when Neon is unavailable (PR #111519)

Wed Oct 9 01:09:33 PDT 2024

================
@@ -27924,6 +27924,24 @@ bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeList Attr) const {
   return OptSize && !VT.isVector();
 }
 
+bool AArch64TargetLowering::canMergeStoresTo(unsigned AddressSpace, EVT MemVT,
+                                             const MachineFunction &MF) const {
+  // Avoid merging stores into fixed-length vectors when Neon is unavailable.
+  // In future, we could allow this when SVE is available, but currently,
+  // the SVE lowerings for BUILD_VECTOR are limited to a few specific cases (and
+  // the general lowering may introduce stack spills/reloads).
----------------
sdesmalen-arm wrote:

[Just thinking out loud here] My understanding is that, for the example in the test, we don't want this optimisation because we can use `stp` instructions instead: merging the stores has no upside, and a possible downside is that the insert operation is expensive. At the moment it is expensive because we use a spill/reload, but for streaming[-compatible] SVE we could implement the operation with the SVE `INSR` instruction, which may be no less efficient than the NEON operation if the value being inserted is also in an FPR/SIMD register. Given the lack of upside, disabling store merging avoids this complexity altogether, which is understandably the route chosen here.

I guess the question is: for which cases is merging stores beneficial when NEON is available? And for those cases, can we implement them efficiently using Streaming SVE?
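
For reference, the policy described in the patch comment can be modelled in isolation. This is only a rough sketch: `MockVT`, `MockSubtarget` and `canMergeStoresToSketch` are hypothetical stand-ins rather than LLVM types or APIs, and the sketch covers only the condition stated in the quoted comment, not whatever else the real hook checks.

```cpp
// Hypothetical stand-ins for illustration only; these are NOT LLVM types.
struct MockVT {
  bool IsFixedLengthVector; // true if the merged store would be a fixed-length vector
};

struct MockSubtarget {
  bool NeonAvailable; // false e.g. in streaming[-compatible] functions
};

// Models only the condition stated in the patch comment: refuse to merge
// consecutive stores into a fixed-length vector when Neon is unavailable.
// Every other case is allowed here, which is an assumption of this sketch.
constexpr bool canMergeStoresToSketch(MockVT MemVT, MockSubtarget ST) {
  return !(MemVT.IsFixedLengthVector && !ST.NeonAvailable);
}
```

Under this model, relaxing the restriction for Streaming SVE would mean adding a further condition for the cases where an efficient lowering (such as one based on `INSR`) exists.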

https://github.com/llvm/llvm-project/pull/111519