[llvm] [VectorCombine] Add foldShuffleToIdentity (PR #88693)

Tue Apr 16 06:11:56 PDT 2024

davemgreen wrote:

> Funnily enough, I have started work on a VectorCombine::foldShuffleOfShuffles fold that will sort of do this (and catch many of these test cases):
> 
> `shuffle(shuffle(x,u,m1),shuffle(y,u,m2)) -> shuffle(x,y,m3)`
> 
> But it doesn't recurse, so every fold has to be cost beneficial - are you seeing cases where we need the recursion capability?
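Here is a sketch of the mask arithmetic behind the non-recursive fold quoted above. It assumes `x` and `y` have the same number of lanes, `u` is undef/poison, and masks are `std::vector<int>` using -1 for poison lanes, as LLVM shuffle masks do. `composeShuffleOfShuffles` and `kPoison` are hypothetical names, not the VectorCombine API. The sketch also leaves out the cost check that decides whether the fold is worth doing.

```cpp
#include <cassert>
#include <vector>

// Poison lane marker, following LLVM's convention of -1 in shuffle masks.
constexpr int kPoison = -1;

// Hypothetical helper. Given
//   A = shuffle(x, u, m1), B = shuffle(y, u, m2), R = shuffle(A, B, m0)
// where x and y both have `srcWidth` lanes, return m3 such that
//   R == shuffle(x, y, m3).
// Lanes that read u (index >= srcWidth) or poison become poison.
// Masks are assumed to be well-formed (in range for their operands).
std::vector<int> composeShuffleOfShuffles(const std::vector<int>& m0,
                                          const std::vector<int>& m1,
                                          const std::vector<int>& m2,
                                          int srcWidth) {
  assert(m1.size() == m2.size() && "A and B must have the same type");
  const int innerWidth = static_cast<int>(m1.size());
  std::vector<int> m3;
  m3.reserve(m0.size());
  for (int idx : m0) {
    if (idx < 0) {  // outer lane is already poison
      m3.push_back(kPoison);
      continue;
    }
    const bool fromB = idx >= innerWidth;
    const int inner = fromB ? m2[idx - innerWidth] : m1[idx];
    if (inner < 0 || inner >= srcWidth) {  // inner lane is poison or reads u
      m3.push_back(kPoison);
      continue;
    }
    // Lanes of y sit after the lanes of x in the new two-operand shuffle.
    m3.push_back(fromB ? inner + srcWidth : inner);
  }
  return m3;
}
```

For example, with 4-lane `x` and `y`, `m1 = <1,0,3,2>`, `m2 = <3,2,1,0>` and `m0 = <0,4,1,5>` compose to `m3 = <1,7,0,6>`.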

Hi - I get the feeling that X86 generally has better shuffle support than other architectures. Even on AArch64, which generally does OK, there can be a big difference between a low-cost shuffle and a bad one. For something more constrained like MVE, the differences can be a lot larger, often falling back to scalarization where it can't do much better.

It means we can't really take local steps and get to the optimal solution, because each step on its own is worse for performance; it's only taken together that we end up with something better. It is not the motivating case, but one that can come up a fair amount: consider the LD4/ST4 vectorization that we do for interleaved loads/stores. The load side is several shuffle(load)s and the store side is a store(shuffle(shuffle, shuffle)). If the intervening instructions are all the same, the whole thing can be simplified away to contiguous loads/stores, but moving only a single shuffle just breaks the pattern for LD4/ST4. I think the v8f64interleave test shows something similar, but it involves splatted vectors and wider interleaving groups.
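To make the "only taken together" point concrete, here is a sketch that records, for every lane, which lane of the wide load it came from. It assumes a factor-4 interleave with `k` lanes per field and models each shuffle on plain `std::vector<int>` masks, with -1 for poison as in LLVM. The names `shuffle`, `seq`, `ld4ThenSt4` and `isIdentity` are hypothetical and do not reflect how VectorCombine represents anything. Each shuffle on its own is a strided or concatenating permutation. Composing the load-side and store-side masks, however, maps every lane back to itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each lane records which lane of the original wide load it holds (-1 = poison).
// The same lane-wise op applied to every field would not change this mapping.
using Lanes = std::vector<int>;

// Models shufflevector(a, b, mask) on lane-provenance vectors.
Lanes shuffle(const Lanes& a, const Lanes& b, const std::vector<int>& mask) {
  assert(a.size() == b.size() && "shufflevector operands share one type");
  const int n = static_cast<int>(a.size());
  Lanes out;
  for (int m : mask) out.push_back(m < 0 ? -1 : (m < n ? a[m] : b[m - n]));
  return out;
}

// Range mask <0, 1, ..., count-1>, used to concatenate two operands.
std::vector<int> seq(int count) {
  std::vector<int> mask(count);
  for (int i = 0; i < count; ++i) mask[i] = i;
  return mask;
}

// LD4-style deinterleave of a 4*k-lane load into four k-lane fields, followed
// by the ST4-style store(shuffle(shuffle(f0, f1), shuffle(f2, f3))).
Lanes ld4ThenSt4(int k) {
  const Lanes load = seq(4 * k);  // lane i holds load lane i
  const Lanes poison(4 * k, -1);
  Lanes field[4];
  for (int j = 0; j < 4; ++j) {  // field j = <j, j+4, j+8, ...>
    std::vector<int> mask;
    for (int e = 0; e < k; ++e) mask.push_back(j + 4 * e);
    field[j] = shuffle(load, poison, mask);
  }
  const Lanes c01 = shuffle(field[0], field[1], seq(2 * k));
  const Lanes c23 = shuffle(field[2], field[3], seq(2 * k));
  std::vector<int> interleave;  // lane i takes element i/4 of field i%4
  for (int i = 0; i < 4 * k; ++i) interleave.push_back((i % 4) * k + i / 4);
  return shuffle(c01, c23, interleave);
}

// True if lane i holds load lane i for every lane.
bool isIdentity(const Lanes& v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] != static_cast<int>(i)) return false;
  return true;
}
```

With `k = 4`, `isIdentity(ld4ThenSt4(4))` holds. Each deinterleaved field, by contrast, is a strided, non-identity selection of the load.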

I wanted to add a recursion limit to this code, to make sure it doesn't go too wrong. I will also try to look into why #88743 causes regressions.
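Here is a minimal sketch of what a recursion limit means for this kind of walk. It assumes a toy IR containing only source vectors and two-operand shuffles. `Node`, `traceLane` and `foldsToIdentity` are hypothetical names and this is not the PR's code. Each lane is traced back through shuffles with a budget that drops by one per level, and the fold gives up once the budget runs out. Treating poison lanes as failure is a conservative choice made only for this sketch.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy IR: either a source vector or a two-operand shuffle.
struct Node {
  enum Kind { Source, Shuffle } kind;
  int width;                  // lanes produced by this node
  const Node* lhs = nullptr;  // first shuffle operand
  const Node* rhs = nullptr;  // second shuffle operand
  std::vector<int> mask;      // shuffle mask, -1 = poison
};

using LaneOrigin = std::pair<const Node*, int>;  // (source node, lane in it)

// Follows one lane back to the source it reads from. Returns nullopt on a
// poison lane or when the recursion budget is exhausted.
std::optional<LaneOrigin> traceLane(const Node* n, int lane, unsigned budget) {
  if (budget == 0) return std::nullopt;  // recursion limit reached
  if (n->kind == Node::Source) return LaneOrigin{n, lane};
  const int m = n->mask[lane];
  if (m < 0) return std::nullopt;  // poison lane: no single origin
  const int opWidth = n->lhs->width;
  return m < opWidth ? traceLane(n->lhs, m, budget - 1)
                     : traceLane(n->rhs, m - opWidth, budget - 1);
}

// True if every lane i of `root` is lane i of one source of the same width,
// i.e. the whole shuffle tree could be replaced by that source.
bool foldsToIdentity(const Node* root, unsigned budget) {
  const Node* src = nullptr;
  for (int i = 0; i < root->width; ++i) {
    const auto origin = traceLane(root, i, budget);
    if (!origin || origin->second != i) return false;
    if (src != nullptr && src != origin->first) return false;
    src = origin->first;
  }
  return src != nullptr && src->width == root->width;
}
```

Applying the swap mask `<1,0,3,2>` to `x` twice gives back `x`. `foldsToIdentity` finds this with a budget of 3 but not with a budget of 2, because the walk has to visit the outer shuffle, the inner shuffle and then `x`.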

https://github.com/llvm/llvm-project/pull/88693