[llvm] [RISCV][TTI] Reduce cost of a build_vector pattern (PR #108419)

Wed Sep 18 07:50:26 PDT 2024

preames wrote:

> I'm seeing a 0.88% regression on 511.povray_r on the BPI F3 after applying this, and it's fairly reproducible (< 0.1% stddev). Looking through the codegen changes, it looks like we're now avoiding partial vectorization in some places where we e.g. exploded a vector to do a bunch of exp intrinsic calls, spilling the vector registers exactly as you described.

Just to confirm, you're looking at cycle count, right? What routine are you seeing this in? I'm looking at an LTO build of povray and not seeing any heavy use of the @exp routine, except indirectly through a function pointer table. Is your build -Ofast -flto=auto, or something else?

> I would have thought that avoiding these vector spills would have been more performant; I'm not sure why it's turning out to be slower in the scalar form. Do we need to discount build_vectors a bit more to get it to partially vectorize these parts again?

Either that, or we're exposing some other issue. If you can show me the relevant areas, I'll try to take a look.

> To clarify, I think the changes in this PR are the right thing to do, I just want to point out the interaction with SLP. I'm running the other benchmarks now to see if they're also affected.

Ack.  

https://github.com/llvm/llvm-project/pull/108419