[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
Kinoshita Kotaro via cfe-commits
cfe-commits at lists.llvm.org
Thu Sep 4 02:11:36 PDT 2025
kinoshita-fj wrote:
Hi all,
Sorry to comment on this merged PR, but this lowering causes a performance regression on A64FX.
This is because on A64FX, LDR/STR instructions are optimized for latency, since they are intended for register spill/fill, and they do not have the same throughput as LD1/ST1.
Other CPUs might also have a performance difference between LDR/STR and LD1/ST1.
Also, I understand this lowering can remove a `ptrue` instruction in a function, like in the first comment example.
However, in practice, functions are often inlined, so a single `ptrue` instruction can be shared across multiple loads and stores.
So, I am not sure if this lowering gives a big performance improvement.
Could you please share an example where this lowering improves performance?
If there isn't a clear benefit, perhaps we should consider reverting this PR to avoid the regression on some CPUs.
Given these points, I'd like to proceed with one of the following options. What do you think we should do?
1. Disable this lowering by default and enable it with an `-mcpu` option.
2. Keep it enabled by default and add an `-mcpu` option to disable it.
3. Revert this PR entirely.
Thank you.
https://github.com/llvm/llvm-project/pull/127837