[llvm] [AArch64][SVE] Lower unpredicated loads/stores as fixed LDR/STR with -msve-vector-bits=128. (PR #127500)
Ricardo Jesus via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 19 05:12:13 PST 2025
rj-jesus wrote:
Hi @paulwalker-arm, thanks very much for your feedback. I'll address the code changes, but first I thought I'd reply to your main comments.
> I have the following thoughts/questions:
>
> * Where do the scalable vector operations come from?
> * Is this scalable vector ACLE code that wants to benefit from knowing the exact vector length?
Yes, it's scalable ACLE code compiled with `-msve-vector-bits=128` (see example below).
> * If the predicate is the main concern, does emitting the SVE fill/spill instructions improve performance?
>   * Which could be achieved during isel.
> * If this works then perhaps AArch64LoadStoreOpt could be taught to pair SVE spill/fill instructions when the vector length is known to be 128-bit?
I think this should work too. I have a patch that adds patterns for emitting SVE LDR/STR instead of PTRUE LD1/ST1. I can put it up for review if you'd like to try going this route instead. What do you think?
> * If this is only the start of taking advantage of knowing the exact vector length then would it be better to have these transformations as a dedicated IR pass? Then the other optimisers can improve things further and the code generator should just work.
Whilst having a dedicated pass could be useful, we are not aiming to pursue this "generally" for all instructions. We care about going from scalable to fixed-length vectors for loads and stores mainly to benefit from LDP/STP folds.
For example, given (https://godbolt.org/z/orbeMTon3):
```cpp
#include <arm_sve.h>
svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x) + svld1_f64(pg, x + svcntd());
}
```
We currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0]
        ld1d    { z1.d }, p0/z, [x0, x8, lsl #3]
        fadd    z0.d, z0.d, z1.d
        ret
```
With this patch, we would instead have:
```gas
foo:
        ldp     q0, q1, [x0]
        fadd    v0.2d, v0.2d, v1.2d
        ret
```
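To spell out why the two loads become a pair candidate: with `-msve-vector-bits=128`, `svcntd()` is 2 (hence the `mov x8, #2`), so the second load sits at `x0 + (2 << 3)`, i.e. 16 bytes past the first, which is exactly one Q register further on. The sketch below only models that address arithmetic in portable C++. The helper names are hypothetical, made up for illustration, and are not part of the patch or of LLVM.

```cpp
#include <cstddef>

// Hypothetical helper: number of 64-bit lanes in a vector of `vector_bits`
// bits. This models what svcntd() returns once the vector length is fixed.
constexpr std::size_t lanes_f64(std::size_t vector_bits) {
  return vector_bits / 64;
}

// Byte offset of the second load in the example, i.e. of `x + svcntd()`.
// This mirrors the `[x0, x8, lsl #3]` addressing: lanes * 8 bytes.
constexpr std::size_t second_load_offset_bytes(std::size_t vector_bits) {
  return lanes_f64(vector_bits) * sizeof(double);
}

// Simplified assumption for this sketch: the two loads can be expressed as
// `ldp q0, q1, [x0]` only when each vector is exactly 128 bits wide and the
// second load starts 16 bytes after the first.
constexpr bool is_q_pair_candidate(std::size_t vector_bits) {
  return vector_bits == 128 && second_load_offset_bytes(vector_bits) == 16;
}

static_assert(lanes_f64(128) == 2, "svcntd() is 2 at VL=128");
static_assert(is_q_pair_candidate(128), "adjacent 16-byte loads at VL=128");
static_assert(!is_q_pair_candidate(256), "wider vectors do not fit Q registers");
```

This ignores everything else the real fold has to check (alignment, offset range, intervening memory operations), so it is only meant to show where the 16-byte adjacency comes from.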
Please let me know what you think. I'm happy to try the route via AArch64LoadStoreOpt you suggested if you think that's a better strategy!
https://github.com/llvm/llvm-project/pull/127500