[llvm] [AArch64][SVE] Lower unpredicated loads/stores as fixed LDR/STR with -msve-vector-bits=128. (PR #127500)

Wed Feb 19 04:09:48 PST 2025

https://github.com/paulwalker-arm commented:

Whilst not unreasonable, this PR makes me slightly uneasy, because we have considerable code within the code generator that goes the other way (i.e. from fixed-length to scalable vectors) for:
* fixed-length vectors wider than NEON.
* operations unsupported by NEON.
* almost all fixed-length operations when in Streaming SVE mode.

I worry about how far this could travel. If it's worth removing the predicate for loads and stores, then presumably the same is true for many other instructions, and we could end up in a situation where legalisation and combines conflict.

I have the following thoughts/questions:
* Where do the scalable vector operations come from?
    * Is this because the cost model has chosen scalable auto-vectorisation when fixed-length would have been better?
        * I'd rather not complicate code generation just because LoopVectorize has made the wrong call.
    * Is this scalable vector ACLE code that wants to benefit from knowing the exact vector length?
* If the predicate is the main concern, does emitting the SVE fill/spill instructions improve performance?
    * This could be achieved during isel.
    * If this works, then perhaps AArch64LoadStoreOpt could be taught to pair SVE spill/fill instructions when the vector length is known to be 128 bits?
* If this is only the start of taking advantage of knowing the exact vector length, would it be better to have these transformations as a dedicated IR pass? Then the other optimisers can improve things further, and the code generator should just work.

As I say, I'm not against the PR, but it would be good to understand the direction of travel early, to avoid tying instruction selection in knots.

https://github.com/llvm/llvm-project/pull/127500