[llvm] [SLP]Initial non-power-of-2 support (but still whole register) for reductions (PR #112361)

Mon Oct 21 05:43:42 PDT 2024

davemgreen wrote:

Hi - Sorry but I had to revert this again. It caused regressions in the Phase Ordering test because the order of shuffle elements became a lot worse than it was ([0,2,4,6] is much better than [0,3,4,7]). It is a fairly gnarly SLP vectorization issue, where the order of elements matters a lot for performance and is tricky to get right. That was fixed in #100653, but the second patch here (7f2e937469a8cec3fe977bf41ad2dfb9b4ce648a) undid it again, I think. That might be necessary if we can't figure out how to do it nicely within the compile-time budget that we have, but it should happen properly in a reviewed patch, not in a fixup to something unrelated.

The test case is quite subtle, and the whole thing feels a bit delicate at the moment. There are some adjustments made in VectorCombiner that help code like this that uses select-shuffles, and that was at least partially done there so that it didn't have to be run on every SLP case. I know there are other cases where we could be doing better if we could get the ordering a bit nicer, but it's tough to do that quickly for every bit of potentially SLP vectorizable code.

https://github.com/llvm/llvm-project/pull/112361