[llvm] [SLP] Vectorize non-power-of-2 ops with padding. (PR #77790)

Thu Jan 11 09:09:15 PST 2024

fhahn wrote:

> Thanks for the patch, but this is too early to land. I'm working on non-power-of-2 vectorization (it is WIP and still requires a couple more patches to go). Implemented without some extra patches, it leads to perf regressions in some cases. Need to handle all this stuff correctly.

Do you have any more info about where those perf regressions are showing up (like architecture, benchmark, code pattern)? I'm curious whether they are mostly related to reordering, shuffling, or improved gathering. This patch only supports full gather, which should hopefully reflect the cost quite accurately (modulo cost-model gaps).

> Would be good if you could land new tests separately, may help with the perf gains/regressions later.

Done! Some of the tests are for profitability, but many of them are also to cover crashes.

https://github.com/llvm/llvm-project/pull/77790