[llvm] [ARM][SLP] Fix cost function for SLP Vectorization of ZExt/SExt (PR #122713)

Wed Jan 22 07:08:20 PST 2025

https://github.com/davemgreen commented:

It is usually the input type that matters for MVE, and an extending vadd reduction from v4i32 to v4i64 is still a single vaddlv. The comment, IIRC, was talking about things like v8i32->i64 reductions, which would need to be split into two v4i32 reductions; that is difficult under MVE due to the way predicates and extensions work.
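For reference, here is a rough scalar model of the two cases above. It only illustrates the arithmetic (sign-extend each i32 lane to i64, then add) and says nothing about MVE instruction selection, predication, or the cost model. The function names are hypothetical and are not LLVM or MVE APIs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model, not an LLVM or MVE API: sign-extend each i32
// lane to i64 and accumulate. With count == 4 this is the arithmetic of the
// v4i32 -> i64 extending add reduction discussed above.
std::int64_t extendingAddReduce(const std::int32_t *lanes, std::size_t count) {
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < count; ++i)
    sum += static_cast<std::int64_t>(lanes[i]); // extend first, then add
  return sum;
}

// The v8i32 -> i64 case modelled as two v4i32 -> i64 reductions whose
// results are added together. This shows the split only; it does not model
// why the split is awkward under MVE.
std::int64_t extendingAddReduceV8(const std::int32_t (&lanes)[8]) {
  return extendingAddReduce(lanes, 4) + extendingAddReduce(lanes + 4, 4);
}
```

Because each lane is extended before the add, the sum cannot wrap at 32 bits, which is why the result type is wider than the input lanes.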

There are some Loop-vectorize tests that would need to be updated, but I believe we would still want to generate an extending reduction pattern. There might be something more complex going on, where a mixture of MVE and scalar code doesn't perform as well as sticking with scalar. Or maybe the SLP vectorizer isn't handling multiple uses properly, or the scalar cost should be a little lower for SMLAL DSP instructions.

https://github.com/llvm/llvm-project/pull/122713