[llvm] [SLP] Vectorize non-power-of-2 ops with padding. (PR #77790)

Tue Jan 16 12:49:34 PST 2024

fhahn wrote:

> As I said before, landing this alone causes perf regressions. I'm working on this and need to land a couple of patches to avoid them. I've been working on this feature for 3 years already (or more), and most of my previous patches are related to non-power-of-2 support.

I ran a perf evaluation of this change on SPEC2017 and some large internal benchmarks on AArch64. The only regressions I was able to spot were due to AArch64 cost-model issues, where the cost of non-power-of-2 vector ops wasn't estimated accurately (one instance was fixed by #78181).

With the initial level of support, do you think there are fundamental issues in SLPVectorizer that aren't related to target-specific cost models not providing accurate costs? To iterate on the latter, having limited support early would be helpful, as people can fix their cost models early and then iteratively as more features are enabled for non-power-of-2 vectors.

https://github.com/llvm/llvm-project/pull/77790