[llvm] [LV] Add on extra cost for scalarising math calls in vector loops (PR #158611)

Mon Sep 15 23:51:13 PDT 2025

https://github.com/davemgreen commented:

This seems like quite a high cost to add. Could we consider just fixing the code generation instead?

There is an example in https://godbolt.org/z/j7xrPnx1K. If you take out the noise from the last functions, it should be something like:
```
        mov     s8, v0.s[1]
        mov     s9, v0.s[2]
        mov     s10, v0.s[3]
        bl      expf
        fmov    s11, s0
        fmov    s0, s8
        bl      expf
        fmov    s8, s0
        fmov    s0, s9
        bl      expf
        fmov    s9, s0
        fmov    s0, s10
        bl      expf
        mov     v11.s[1], v8.s[0]
        mov     v11.s[2], v9.s[0]
        mov     v11.s[3], v0.s[0]
```
The cost of calls is always a bit difficult to model. The fmovs are not exactly free, but are sometimes close to it, and the rest is not far away from the existing 4*call+scalarization overhead.
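
As a rough illustration of that accounting, here is a minimal Python sketch that tallies the instructions in the listing above against a "VF calls plus scalarization overhead" estimate. The function names and the unit costs are hypothetical placeholders chosen for the example; they are not LLVM's cost-model API or real TTI values.

```python
def scalarized_call_cost(vf, call_cost, extract_cost, insert_cost):
    """Assumed shape of the existing estimate: VF scalar calls plus
    one extract and one insert per lane (hypothetical unit costs)."""
    return vf * call_cost + vf * (extract_cost + insert_cost)


def listing_cost(vf, call_cost, lane_move_cost, fmov_cost):
    """Cost of the ideal sequence in the listing above.

    Lane 0 is already in s0 and the last result is inserted straight
    from s0, so there are (vf - 1) lane extracts and (vf - 1) lane
    inserts. Between each pair of consecutive calls there are two
    fmovs: one saving the result, one loading the next argument.
    """
    lane_moves = 2 * (vf - 1)
    fmovs = 2 * (vf - 1)
    return vf * call_cost + lane_moves * lane_move_cost + fmovs * fmov_cost


if __name__ == "__main__":
    # Placeholder numbers: call = 10, lane move = 1, fmov = 0 or 1.
    print(scalarized_call_cost(4, 10, 1, 1))
    print(listing_cost(4, 10, 1, 0))
    print(listing_cost(4, 10, 1, 1))
```

With these placeholder costs the two estimates differ only by a few units, and the gap comes down to how the fmovs are priced.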

I guess the general problem is that any call without a vector calling convention will cause spilling of v/z vector registers if they need to be live across it.
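
To make the spilling point concrete, here is a small sketch of the base AAPCS64 rule as I understand it: only the low 64 bits of v8-v15 are preserved across a call, and every other vector register is caller-saved. The helper names are hypothetical, and the model deliberately ignores the vector and SVE calling-convention variants.

```python
def must_spill(reg_index, live_bits):
    """Return True if a value in v<reg_index> with `live_bits` live bits
    has to be spilled around a call under the base AAPCS64 rule
    (assumption: only the low 64 bits of v8-v15 are callee-saved)."""
    callee_saved = 8 <= reg_index <= 15
    return not (callee_saved and live_bits <= 64)


def spills_across_call(live_values):
    """Count spills for a list of (reg_index, live_bits) pairs."""
    return sum(1 for reg, bits in live_values if must_spill(reg, bits))


if __name__ == "__main__":
    # 32-bit scalars parked in s8-s10, as in the listing: no spills.
    print(spills_across_call([(8, 32), (9, 32), (10, 32)]))
    # The same registers holding full 128-bit vectors: every one spills.
    print(spills_across_call([(8, 128), (9, 128), (10, 128)]))
```

This is why the listing can keep the scalar lanes in s8-s11 across the expf calls, while a full vector that stays live across the same calls would have to go through the stack.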

https://github.com/llvm/llvm-project/pull/158611