[llvm] [NFC] Remove reverse restore from epilogue for SVE registers (PR #79623)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 30 06:09:02 PST 2024
================
@@ -219,34 +219,34 @@ define <vscale x 2 x double> @streaming_compatible_with_scalable_vectors(<vscale
; CHECK-NEXT: ldr z1, [sp] // 16-byte Folded Reload
; CHECK-NEXT: fadd z0.d, z1.d, z0.d
; CHECK-NEXT: addvl sp, sp, #2
-; CHECK-NEXT: ldr p15, [sp, #4, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z23, [sp, #2, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z22, [sp, #3, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p14, [sp, #5, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z21, [sp, #4, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z20, [sp, #5, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p13, [sp, #6, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z19, [sp, #6, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z18, [sp, #7, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p12, [sp, #7, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z17, [sp, #8, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z16, [sp, #9, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p11, [sp, #8, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z15, [sp, #10, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z14, [sp, #11, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p10, [sp, #9, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z13, [sp, #12, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z12, [sp, #13, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p9, [sp, #10, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z11, [sp, #14, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr z10, [sp, #15, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p8, [sp, #11, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr z9, [sp, #16, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: ldr z8, [sp, #17, mul vl] // 16-byte Folded Reload
-; CHECK-NEXT: ldr p7, [sp, #12, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr p6, [sp, #13, mul vl] // 2-byte Folded Reload
-; CHECK-NEXT: ldr p5, [sp, #14, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr z9, [sp, #16, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z10, [sp, #15, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z11, [sp, #14, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z12, [sp, #13, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z13, [sp, #12, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z14, [sp, #11, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z15, [sp, #10, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z16, [sp, #9, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z17, [sp, #8, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z18, [sp, #7, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z19, [sp, #6, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z20, [sp, #5, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z21, [sp, #4, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z22, [sp, #3, mul vl] // 16-byte Folded Reload
+; CHECK-NEXT: ldr z23, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: ldr p4, [sp, #15, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p5, [sp, #14, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p6, [sp, #13, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p7, [sp, #12, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p8, [sp, #11, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p9, [sp, #10, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p10, [sp, #9, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p11, [sp, #8, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p12, [sp, #7, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p13, [sp, #6, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p14, [sp, #5, mul vl] // 2-byte Folded Reload
+; CHECK-NEXT: ldr p15, [sp, #4, mul vl] // 2-byte Folded Reload
----------------
sdesmalen-arm wrote:
I would probably recommend keeping incrementing addresses for performance reasons, but you could split the order of things such that you end up with:
```
ldr z23, [sp, #2, mul vl] // 16-byte Folded Reload
ldr z22, [sp, #3, mul vl] // 16-byte Folded Reload
...
ldr z8, [sp, #17, mul vl] // 16-byte Folded Reload
ldr p15, [sp, #4, mul vl] // 2-byte Folded Reload
ldr p14, [sp, #5, mul vl] // 2-byte Folded Reload
...
ldr p4, [sp, #15, mul vl] // 2-byte Folded Reload
```
You could do that with a `stable_sort` of the RegPairs (rather than a regular sort), using a comparison that only swaps two regpairs when one is a P register and the other is a Z register. A `stable_sort` should retain the original order within the P spill/reload sequence and within the Z spill/reload sequence.
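As a rough illustration of that idea (not the actual frame-lowering code), here is a minimal standalone sketch. `RegPairSketch`, `RegKind`, `groupZBeforeP` and `exampleInterleaved` are hypothetical names made up for this example; the real RegPairs structure in the AArch64 backend has different fields. The comparator only looks at the register kind, so the stable sort moves Z reloads ahead of P reloads and leaves the order within each kind untouched:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the backend's register-pair record.
enum class RegKind { ZPR, PPR };

struct RegPairSketch {
  std::string Name; // e.g. "z23" or "p15"
  RegKind Kind;
  int Offset;       // slot index as in [sp, #Offset, mul vl]
};

// Place all Z reloads before all P reloads. Two entries of the same kind
// compare as equivalent, so std::stable_sort keeps their incoming order
// (here: incrementing addresses within each kind).
inline void groupZBeforeP(std::vector<RegPairSketch> &Pairs) {
  std::stable_sort(Pairs.begin(), Pairs.end(),
                   [](const RegPairSketch &A, const RegPairSketch &B) {
                     return A.Kind == RegKind::ZPR && B.Kind == RegKind::PPR;
                   });
}

// The first few reloads in the interleaved order shown in the removed
// CHECK lines of the diff.
inline std::vector<RegPairSketch> exampleInterleaved() {
  return {{"p15", RegKind::PPR, 4}, {"z23", RegKind::ZPR, 2},
          {"z22", RegKind::ZPR, 3}, {"p14", RegKind::PPR, 5},
          {"z21", RegKind::ZPR, 4}, {"z20", RegKind::ZPR, 5},
          {"p13", RegKind::PPR, 6}};
}
```

Calling `groupZBeforeP` on the result of `exampleInterleaved()` would give z23, z22, z21, z20 followed by p15, p14, p13, with the offsets still incrementing inside each group.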
https://github.com/llvm/llvm-project/pull/79623