[PATCH] D94467: [PowerPC] Use mtvsrdd+vpku instructions to optimize build_vector
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 13 16:58:15 PST 2021
nemanjai added a comment.
If all the values are in GPRs, the code produced with this patch:
mtvsrdd 34, 4, 3
mtvsrdd 35, 6, 5
vpkudum 2, 3, 2
mtvsrdd 35, 8, 7
mtvsrdd 36, 10, 9
vpkudum 3, 4, 3
vpkuwum 2, 3, 2
is certainly better than the naive code we currently produce. But I don't think we should be doing the merging/packing in the vector domain because (at least on P9) we get half the dispatch width and the permute operations potentially have a higher latency. Furthermore, this approach can increase vector register pressure, which is probably not ideal. I think that for the basic case (where all values are in GPRs) we should simply add a pattern in the .td file that does something like this (similar to what we did for the wider elements):
rlwimi 3, 4, ... # merge r3 and r4
rlwimi 5, 6, ... # merge r5 and r6
rlwimi 7, 8, ... # merge r7 and r8
rlwimi 9, 10, ... # merge r9 and r10
rldimi 3, 5, ... # merge r3, r4, r5, r6
rldimi 7, 9, ... # merge r7, r8, r9, r10
mtvsrdd 34, 3, 7
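To make the data flow of the GPR sequence above concrete, here is a minimal Python model of the bit packing, not the actual .td pattern. It assumes eight 16-bit elements, that each `rlwimi` inserts one halfword next to another to form a word, that each `rldimi` inserts one word next to another to form a doubleword, and that the first operand of each pair lands in the more significant half. The rotate/mask operands elided as `...` above determine the real element order, so the lane order here is an assumption, and the helper names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def merge_halfwords(hi, lo):
    """Model of an rlwimi-style insert: pack two 16-bit values into a 32-bit word.

    Assumption: `hi` ends up in the upper halfword.
    """
    return ((hi & MASK16) << 16) | (lo & MASK16)


def merge_words(hi, lo):
    """Model of an rldimi-style insert: pack two 32-bit words into a doubleword.

    Assumption: `hi` ends up in the upper word.
    """
    return ((hi & MASK32) << 32) | (lo & MASK32)


def build_v8i16(elems):
    """Pack eight halfword elements into a 128-bit value using only GPR-style merges.

    The final step models mtvsrdd XT, RA, RB, which places RA in doubleword 0
    (the most significant doubleword here) and RB in doubleword 1.
    """
    if len(elems) != 8:
        raise ValueError("expected exactly 8 elements")
    # Four word merges (the four rlwimi instructions).
    words = [merge_halfwords(elems[i], elems[i + 1]) for i in range(0, 8, 2)]
    # Two doubleword merges (the two rldimi instructions).
    dw0 = merge_words(words[0], words[1])
    dw1 = merge_words(words[2], words[3])
    # One GPR-to-VSR move (mtvsrdd).
    return (dw0 << 64) | dw1


if __name__ == "__main__":
    print(hex(build_v8i16([1, 2, 3, 4, 5, 6, 7, 8])))
```

The model uses six merges in the scalar domain followed by one move into a VSR, matching the instruction count of the sequence above, and it never touches a second vector register.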
For 32-bit mode, we can't really do the merging to doublewords in GPRs, but I think the words can be moved to VSRs after the word merges and then merged with a single `vpkuwum`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94467/new/
https://reviews.llvm.org/D94467