[llvm] [AArch64] Disable consecutive store merging when Neon is unavailable (PR #111519)

Wed Oct 9 02:23:43 PDT 2024

================
@@ -27924,6 +27924,24 @@ bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeList Attr) const {
   return OptSize && !VT.isVector();
 }
 
+bool AArch64TargetLowering::canMergeStoresTo(unsigned AddressSpace, EVT MemVT,
+                                             const MachineFunction &MF) const {
+  // Avoid merging stores into fixed-length vectors when Neon is unavailable.
+  // In future, we could allow this when SVE is available, but currently,
+  // the SVE lowerings for BUILD_VECTOR are limited to a few specific cases (and
+  // the general lowering may introduce stack spills/reloads).
----------------
MacDue wrote:

I think there are two (slightly) independent cases here. The first is unwanted store merging in non-streaming functions, where we could just use an `stp` instead. E.g.

```
mov v0.s[1], v1.s[0]
str d0, [x0]
```
->

```
stp s0, s1, [x0]
```

That's not fixed in this PR.

Then there's streaming-mode store merging, which results in stack spills due to the BUILD_VECTOR lowering. Disabling store merging means that, in some cases, we use a preferable `stp` in streaming mode, but that's a secondary goal here; the main aim is to avoid the stack spills.

As for a streaming-mode/SVE BUILD_VECTOR lowering, I think there are a few options, but likely not as efficient as NEON (though maybe others have better ideas :smile:).

E.g. for <4 x float>:

You could make a chain of `INSR`:
```
insr    z3.s, s2
insr    z3.s, s1
insr    z3.s, s0
str     q3, [x0]
```

But `INSR` has a higher latency than a `MOV`. Also, there is a dependency chain here, as each `INSR` depends on the previous one.

Another option is a chain of `ZIP1`:

```
zip1    z2.s, z2.s, z3.s
zip1    z0.s, z0.s, z1.s
zip1    z0.d, z0.d, z2.d
str     q0, [x0]
```

This seems like it may be more efficient than `INSR`, and it also allows for a shorter dependency chain (log n), but it is still likely not as efficient as just `MOV`s.
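
To make the lane ordering and the dependency depth of the two sequences above easier to check, here is a toy C++ model. It is only a sketch, not LLVM code: `Vec4`, `scalarReg`, `insr`, `zip1_s`, `zip1_d` and the other helper names are made up for illustration, only the low 128 bits of each Z register are modelled, and each scalar `sN` is assumed to sit in lane 0 of `zN` with the remaining lanes zeroed.

```cpp
#include <array>
#include <cassert>

// Toy model of the low 128 bits of a Z register: four 32-bit lanes, lane 0 first.
using Vec4 = std::array<float, 4>;

// Assumption: scalar sN occupies lane 0 of zN; the other lanes are modelled as zero.
Vec4 scalarReg(float s) { return {s, 0.0f, 0.0f, 0.0f}; }

// Models `insr zd.s, sN`: shift every lane up by one, put the scalar in lane 0.
Vec4 insr(const Vec4 &v, float scalar) { return {scalar, v[0], v[1], v[2]}; }

// Models `zip1 zd.s, zn.s, zm.s`: interleave the low 32-bit elements of a and b.
Vec4 zip1_s(const Vec4 &a, const Vec4 &b) { return {a[0], b[0], a[1], b[1]}; }

// Models `zip1 zd.d, zn.d, zm.d`: interleave the low 64-bit elements (lane pairs).
Vec4 zip1_d(const Vec4 &a, const Vec4 &b) { return {a[0], a[1], b[0], b[1]}; }

// The INSR sequence: three dependent inserts into z3.
Vec4 buildWithInsr(float s0, float s1, float s2, float s3) {
  Vec4 z3 = scalarReg(s3);
  z3 = insr(z3, s2);
  z3 = insr(z3, s1);
  z3 = insr(z3, s0);
  return z3;
}

// The ZIP1 sequence: the two .s zips are independent, the .d zip joins them.
Vec4 buildWithZip1(float s0, float s1, float s2, float s3) {
  Vec4 z0 = scalarReg(s0), z1 = scalarReg(s1);
  Vec4 z2 = scalarReg(s2), z3 = scalarReg(s3);
  z2 = zip1_s(z2, z3);
  z0 = zip1_s(z0, z1);
  z0 = zip1_d(z0, z2);
  return z0;
}

// Depth of the dependent-instruction chain for n lanes (n >= 1, power of two):
// INSR is a serial chain, ZIP1 forms a binary tree.
unsigned insrDepth(unsigned n) { return n - 1; }
unsigned zip1Depth(unsigned n) {
  unsigned depth = 0;
  while ((1u << depth) < n)
    ++depth;
  return depth;
}

// Both sequences should produce <s0, s1, s2, s3>.
void checkLowerings() {
  const Vec4 expected = {1.0f, 2.0f, 3.0f, 4.0f};
  assert(buildWithInsr(1.0f, 2.0f, 3.0f, 4.0f) == expected);
  assert(buildWithZip1(1.0f, 2.0f, 3.0f, 4.0f) == expected);
}
```

For `<4 x float>` this gives a depth of 3 for the `INSR` chain versus 2 for the `ZIP1` tree. The model says nothing about per-instruction latency or throughput, which is what would decide whether either sequence is actually worthwhile.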

https://github.com/llvm/llvm-project/pull/111519