[llvm] [LV] Add on extra cost for scalarising math calls in vector loops (PR #158611)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 15 23:51:13 PDT 2025
https://github.com/davemgreen commented:
This seems like quite a high cost to add. Could we consider just fixing the code generation instead?
There is an example in https://godbolt.org/z/j7xrPnx1K. If you take out the noise from the last functions, it should be something like
```
mov s8, v0.s[1]
mov s9, v0.s[2]
mov s10, v0.s[3]
bl expf
fmov s11, s0
fmov s0, s8
bl expf
fmov s8, s0
fmov s0, s9
bl expf
fmov s9, s0
fmov s0, s10
bl expf
mov v11.s[1], v8.s[0]
mov v11.s[2], v9.s[0]
mov v11.s[3], v0.s[0]
```
The cost of calls is always a bit difficult to model. The fmovs are not exactly free, but are sometimes close to it, and the rest is not far away from the existing 4*call + scalarization overhead.
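To make the "4*call + scalarization overhead" arithmetic concrete, here is a minimal sketch of how such a cost could be added up. The function name and the unit costs are hypothetical and chosen only for illustration; they are not values or APIs from LLVM's cost model.

```cpp
// Illustrative only: a hypothetical helper, not part of LLVM's TTI interface.
// Cost of scalarizing one vector math call with `Lanes` elements:
//   Lanes scalar calls, plus one lane extract and one lane insert per element.
constexpr unsigned scalarizedCallCost(unsigned Lanes, unsigned CallCost,
                                      unsigned ExtractCost,
                                      unsigned InsertCost) {
  return Lanes * CallCost + Lanes * (ExtractCost + InsertCost);
}

// Assumed unit costs (made up for this example): call = 10, extract = 1,
// insert = 1. With 4 lanes this gives 4*10 + 4*(1+1).
static_assert(scalarizedCallCost(4, 10, 1, 1) == 48,
              "4 calls plus per-lane extract/insert overhead");
```

In the sequence above, the lane extracts correspond to the `mov s8, v0.s[1]` style moves and the inserts to the `mov v11.s[1], v8.s[0]` style moves; the extra `fmov`s are the part not captured by this simple sum.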
I guess the general problem is that any call without a vector calling convention will cause spilling of v/z vector registers if they need to be live across it.
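As a rough illustration of the spilling concern, consider a loop of the following shape. The function name is hypothetical, and whether the vectorizer actually picks a scalarized vector form here depends on the target, flags, and available vector math libraries. Under the standard AArch64 procedure call standard only the low 64 bits of v8-v15 are preserved across a call, so a full 128-bit vector value that is needed after the scalar calls cannot simply sit in a register.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example loop. If this is vectorized with the exp call
// scalarized, the vector holding `x` is needed both as the call input
// (lane by lane) and again afterwards for the add, so it is live across
// every scalar call and may have to be spilled and reloaded.
void squareThenAddExp(float *out, const float *in, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    float x = in[i] * in[i];   // computed before the call
    out[i] = x + std::exp(x);  // x is used again after the call returns
  }
}
```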
https://github.com/llvm/llvm-project/pull/158611