[llvm] [AArch64][GlobalISel] Combine G_EXTRACT_VECTOR_ELT and G_BUILD_VECTOR sequences into G_SHUFFLE_VECTOR (PR #110545)

Fri Oct 11 06:24:13 PDT 2024

ValentijnvdBeek wrote:

> If we decided to instead directly pattern match into a vector concat for example, we're skipping the canonicalization step and going straight to the instruction we want to emit. There's no ambiguity here about what's the best final output, in this case it's always a G_CONCAT_VECTOR. So going directly has benefits in 1) not having to spend time going through the intermediate step, and 2) having the guaranteed output generated.

Thanks for your clear explanation of GISel MIR; I appreciate the time you took, and I understand it a lot better now. I have updated the code so that it runs the analysis directly rather than going through an intermediate step. That has the benefit that we can still share code without actually having to turn the sequence into a shufflevector.
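To illustrate what running the analysis directly could look like, here is a minimal standalone sketch. It is not the code in the PR and uses no LLVM APIs: `ExtractedLane`, `toShuffleMask`, and `isConcatMask` are hypothetical names, and the model assumes exactly two source vectors of equal length. It describes each lane of the built vector by the source and index it was extracted from, computes the equivalent mask using the shufflevector convention (indices into the concatenation of the two sources), and checks whether that mask is the identity, which is the concat idiom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one lane of a build_vector: the lane was produced by
// extracting element `Index` from source vector number `Source` (0 or 1).
struct ExtractedLane {
  int Source;
  int Index;
};

// Compute the equivalent shuffle mask. As with shufflevector, mask values
// index into the concatenation of the two sources, so an element of source 1
// is offset by the source length.
std::vector<int> toShuffleMask(const std::vector<ExtractedLane> &Lanes,
                               int SrcElts) {
  std::vector<int> Mask;
  Mask.reserve(Lanes.size());
  for (const ExtractedLane &L : Lanes) {
    // Assumption of this sketch: two sources, all indices in range.
    assert(L.Source >= 0 && L.Source < 2);
    assert(L.Index >= 0 && L.Index < SrcElts);
    Mask.push_back(L.Source * SrcElts + L.Index);
  }
  return Mask;
}

// The sequence is a plain concat when the mask covers both sources in order,
// i.e. it is the identity mask 0, 1, ..., 2 * SrcElts - 1.
bool isConcatMask(const std::vector<int> &Mask, int SrcElts) {
  if (Mask.size() != static_cast<std::size_t>(2 * SrcElts))
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != static_cast<int>(I))
      return false;
  return true;
}
```

The point of the split is that the mask computation can be shared: a caller can test for a known idiom such as concat first and only fall back to a general shuffle when no idiom matches.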

> FWIW the DAG does try to do this: https://github.com/llvm/llvm-project/blob/f0ed31ce4b63a5530fd1de875c0d1467d4d2c6ea/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L24096
>
> I can see this pattern appearing if some operations require some degree of vector splitting, and others do not

> Yeah but I don't think it's a good transformation to do if we can express the shuffle in a simpler & equally compact way. I can see that being helpful for long, complex sequences of extracts/inserts but if we know the idiom (e.g. concat) we should go directly.

I mean it does both. The PR aims to simplify the long, complex sequences that occur in matrix operations, like the ones that were common in the backend I used to work on (as mentioned in the PR description), with the side benefit of the code improvements that the shufflevector optimizations bring. Since this is not the backend I worked on and I don't know how common it is in other backends, it isn't my focus at the moment.

But both of you have forgotten more than I will ever know, so I will eagerly await the result of this discussion. One suggestion: in line with the comment from @tschuett, I could use specific combiners for each size. Smaller sizes would be optimized directly, while larger sequences (say >32) would be turned into a shufflevector. Does that sound like a workable idea?

https://github.com/llvm/llvm-project/pull/110545