[llvm] [SLP] Vectorize non-power-of-2 ops with padding. (PR #77790)

Thu Jan 11 09:23:52 PST 2024

alexey-bataev wrote:

> > Thanks for the patch, but this is too early to land. I'm working on non-power-of-2 vectorization (it is WIP and still requires a couple more patches). Implemented without some extra patches, it leads to perf regressions in some cases. We need to handle all of this correctly.
> 
> Do you have any more info about where those perf regressions are showing up (e.g. architecture, benchmark, code pattern)? Curious whether they are mostly related to reordering/shuffling/improved gathering? This patch only supports full gathers, which should hopefully reflect the cost quite accurately (modulo cost-model gaps).

I investigated SPEC + some other benchmarks; most of the regressions are caused by SLP vectorization running too early in LTO scenarios. To fix this, we need to implement a couple more patches (vectorization of gathered loads + reordering) and add some limitations.
The reordering is also a problem, but not the biggest one.
This new node will be a burden to support; later, the generic vectorization will be extended to support non-power-of-2 nodes directly.

> 
> > It would be good if you could land the new tests separately; they may help with the perf gains/regressions later.
> 
> Done! Some of the tests are for profitability, but many of them are also to cover crashes.

https://github.com/llvm/llvm-project/pull/77790