[llvm] [AArch64] Generalize integer FPR lane stores for all types (PR #134117)
Benjamin Maxwell via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 10 06:08:22 PDT 2025
================
@@ -263,10 +262,10 @@ define void @v3i16(ptr %p1, ptr %p2) {
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: ldr d0, [x0]
; CHECK-SD-NEXT: ldr d1, [x1]
-; CHECK-SD-NEXT: add x8, x0, #4
; CHECK-SD-NEXT: add v0.4h, v0.4h, v1.4h
-; CHECK-SD-NEXT: st1 { v0.h }[2], [x8]
+; CHECK-SD-NEXT: mov v1.h[0], v0.h[2]
; CHECK-SD-NEXT: str s0, [x0]
+; CHECK-SD-NEXT: str h1, [x0, #4]
----------------
MacDue wrote:
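I believe it could be an improvement, as a lane `st1` is basically a `mov` + `str` (the lane `mov` in the new output is an alias of `ins`). Looking at the Neoverse V1 optimization guide, I get the following for the old sequence: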
```
add => latency of 1
st1 (b/h/s) => latency of 4
```
vs the new sequence:
```
ins => latency of 2
str (b/h/s/d) => latency of 2
```
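As a rough sanity check of the figures above, the two sequences can be compared by summing the quoted latencies. This is only a sketch: it assumes each pair of instructions forms a serial dependency chain and that latencies simply add, which ignores throughput, pipeline availability, and out-of-order overlap. The `LATENCY` table and `chain_latency` helper are hypothetical names used for illustration, and the cycle counts are the ones quoted in this comment, not re-checked against the guide.

```python
# Latencies in cycles, as quoted in the comment above (Neoverse V1 guide).
LATENCY = {
    "add": 1,  # address computation: add x8, x0, #4
    "st1": 4,  # lane store: st1 { v0.h }[2], [x8]
    "ins": 2,  # lane move: mov v1.h[0], v0.h[2] (alias of ins)
    "str": 2,  # FPR store: str h1, [x0, #4]
}


def chain_latency(ops):
    """Sum latencies, assuming the ops form one serial dependency chain."""
    return sum(LATENCY[op] for op in ops)


old_total = chain_latency(["add", "st1"])  # sequence before the patch
new_total = chain_latency(["ins", "str"])  # sequence after the patch

print(f"old: {old_total} cycles, new: {new_total} cycles")
```

Under that simplified model the old sequence costs 5 cycles and the new one 4, which is consistent with the claim that the change could be an improvement rather than a regression.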
https://github.com/llvm/llvm-project/pull/134117