[PATCH] D123911: [VectorCombine] Fold shuffle select pattern
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 3 07:21:57 PDT 2022
dmgreen marked an inline comment as done.
dmgreen added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:1300-1302
+ // For each of the output shuffles, we try to sort the first vector elements
+ // to the beginning, and the second array elements to the end to allow us to
+ // only use half of each of the binops. We compute the ReconstructMask mask
----------------
samtebbs wrote:
> I'm not exactly sure what is meant by "we try to sort the first vector elements to the beginning, and the second array elements to the end". Does it mean sorting e.g. `shuffle <9, 4, 11, 12, 3>` to `shuffle<3, 4, 9, 11, 12>`?
>
> How does that then allow us to only use half of the binops? The number of binary operations in the output seems to remain the same. This is just for my own understanding, not because I think it's wrong.
Yeah, I don't feel like that was written very well. I've tried to update it a little.
The idea is to take a shuffle of the form `shuffle A, B <0, 8, 2, 3, 12, 13, 6, 15>` and turn it into a shuffle that only uses the first 4 lanes of A (0,1,2,3) and the first 4 lanes of B (8,9,10,11). We still need to recreate the original lane order, so we create a reconstruction mask of `shuffle A', B' <0, 8, 1, 2, 9, 10, 3, 11>`, where A' and B' are A and B with their used lanes packed to the front. The shuffles feeding A and B are altered to keep the lanes valid, and the whole thing is costed to make sure the total cost of the new shuffles is lower than that of the originals.
If A and B are <8 x i32>, for example, then we only need the first <4 x i32> from each. On a target with 128-bit vector registers, where an <8 x i32> binop is split in two, that cuts the number of operations from 2 to 1 for each of the binops. Depending on the cost of the shuffles, this can be better overall.
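To make the mask arithmetic concrete, here is a minimal standalone sketch of how the reconstruction mask in the example can be derived. It is not the code from D123911: `PackedShuffle`, `packLane` and `packShuffleMask` are hypothetical names, lanes are packed in order of first use (an assumption, the patch may order them differently), and undef mask elements are not handled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type, not part of VectorCombine: records which lanes of
// each operand a two-input shuffle mask reads, plus the mask that restores the
// original lane order from the packed operands A' and B'.
struct PackedShuffle {
  std::vector<int> LanesA;      // lanes of A in use, in order of first use
  std::vector<int> LanesB;      // lanes of B in use (B-relative), same order
  std::vector<int> Reconstruct; // mask over the packed A' and B'
};

// Returns the position of Lane in Lanes, appending it if not yet present.
static int packLane(std::vector<int> &Lanes, int Lane) {
  for (int I = 0, E = static_cast<int>(Lanes.size()); I != E; ++I)
    if (Lanes[I] == Lane)
      return I;
  Lanes.push_back(Lane);
  return static_cast<int>(Lanes.size()) - 1;
}

// Mask indices in [0, NumElts) select from A, [NumElts, 2 * NumElts) from B.
PackedShuffle packShuffleMask(const std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  PackedShuffle Result;
  for (int M : Mask) {
    assert(M >= 0 && M < 2 * NumElts && "undef lanes are not handled here");
    if (M < NumElts)
      Result.Reconstruct.push_back(packLane(Result.LanesA, M));
    else
      Result.Reconstruct.push_back(NumElts +
                                   packLane(Result.LanesB, M - NumElts));
  }
  return Result;
}
```

Calling `packShuffleMask({0, 8, 2, 3, 12, 13, 6, 15})` gives `LanesA = {0, 2, 3, 6}`, `LanesB = {0, 4, 5, 7}` and `Reconstruct = <0, 8, 1, 2, 9, 10, 3, 11>`, so every index in the reconstruction mask falls in the low four lanes of its operand. Whether the rewrite is actually profitable still depends on the cost comparison described above, which this sketch does not model.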
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123911/new/
https://reviews.llvm.org/D123911