[llvm] [AArch64][SVE] Lower unpredicated loads/stores as fixed LDR/STR with -msve-vector-bits=128. (PR #127500)

Ricardo Jesus via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 19 05:12:13 PST 2025


rj-jesus wrote:

Hi @paulwalker-arm, thanks very much for your feedback. I'll address the code changes, but first I thought I'd reply to your main comments.

> I have the following thoughts/questions:
> 
>     * Where do the scalable vector operations come from?
>       * Is this scalable vector ACLE code that wants to benefit from knowing the exact vector length?

Yes, it's scalable ACLE code compiled with `-msve-vector-bits=128` (see example below).

>     * If the predicate is the main concern, does emitting the SVE fill/spill instructions improve performance?
>       * Which could be achieved during isel.
>       * If this works then perhaps AArch64LoadStoreOpt could be taught to pair SVE spill/fill instructions when the vector length is known to be 128-bit?

I think this should work too. I have a patch that adds patterns for emitting SVE LDR/STR instead of PTRUE LD1/ST1. I can put it up for review if you'd like to try going this route instead. What do you think?

>     * If this is only the start of taking advantage of knowing the exact vector length then would it be better to have these transformations as a dedicated IR pass?  Then the other optimisers can improve things further and the code generator should just work.

Whilst having a dedicated pass could be useful, we are not aiming to pursue this transformation generally for all instructions. Our main goal is to lower scalable loads and stores to fixed-length ones so that they can benefit from LDP/STP folding.

For example, given (https://godbolt.org/z/orbeMTon3):
```cpp
#include <arm_sve.h>

svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x) + svld1_f64(pg, x+svcntd());
}
```
We currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0]
        ld1d    { z1.d }, p0/z, [x0, x8, lsl #3]
        fadd    z0.d, z0.d, z1.d
        ret
```
With this patch, we would instead have:
```gas
foo:
        ldp     q0, q1, [x0]
        fadd    v0.2d, v0.2d, v1.2d
        ret
```
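For completeness, the same reasoning applies on the store side. The snippet below is a hypothetical analogue of the `foo` example above (the function name `bar` is mine, not from the patch): with `-msve-vector-bits=128`, the two unpredicated `svst1_f64` calls could lower to fixed-length STRs and then be paired into a single STP by AArch64LoadStoreOpt.

```cpp
#include <arm_sve.h>

// Hypothetical store-side analogue: two adjacent unpredicated SVE stores
// that, at a known 128-bit vector length, could become "stp q0, q1, [x0]".
void bar(double *dst, svfloat64_t a, svfloat64_t b) {
  svbool_t pg = svptrue_b64();       // all-true predicate (unpredicated store)
  svst1_f64(pg, dst, a);             // store first 128-bit vector
  svst1_f64(pg, dst + svcntd(), b);  // store second, svcntd() == 2 doubles here
}
```

This is only a sketch of the kind of code that would benefit; I haven't measured the store case separately.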

Please let me know what you think. I'm happy to try the route via AArch64LoadStoreOpt you suggested if you think that's a better strategy!

https://github.com/llvm/llvm-project/pull/127500