[PATCH] D94467: [PowerPC] Use mtvsrdd+vpku instructions to optimize build_vector

Wed Jan 13 16:58:15 PST 2021

nemanjai added a comment.

If all the values are in GPRs, the code produced with this patch:

  mtvsrdd 34, 4, 3
  mtvsrdd 35, 6, 5
  vpkudum 2, 3, 2
  mtvsrdd 35, 8, 7
  mtvsrdd 36, 10, 9
  vpkudum 3, 4, 3
  vpkuwum 2, 3, 2

is certainly better than the naive code we currently produce. But I don't think we should be doing the merging/packing in the vector domain because (at least on P9) we get half the dispatch width and the permute operations potentially have a higher latency. Furthermore, this approach can increase vector register pressure, which is probably not ideal. I think that for the basic case (where all values are in GPRs) we should simply add a pattern in the .td file that does something like this (similar to what we did for the wider elements):

  rlwimi 3, 4, ...  # merge r3 and r4
  rlwimi 5, 6, ...  # merge r5 and r6
  rlwimi 7, 8, ...  # merge r7 and r8
  rlwimi 9, 10, ... # merge r9 and r10
  rldimi 3, 5, ...  # merge r3, r4, r5, r6
  rldimi 7, 9, ...  # merge r7, r8, r9, r10
  mtvsrdd 34, 3, 7
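
As a rough illustration of the GPR-side merging sketched above, the following Python model packs eight 16-bit values into two doublewords and then combines them the way `mtvsrdd` does (RA into doubleword 0, RB into doubleword 1). The rotate/mask operands are elided in the sequence above, so which input lands in the high half of each merge is an assumption here, and all helper names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def merge_halfwords(hi, lo):
    """Model of one word merge (the rlwimi step).

    Assumption: only the low 16 bits of each input matter, and `hi`
    ends up in the upper halfword of the 32-bit result.
    """
    return ((hi & MASK16) << 16) | (lo & MASK16)


def merge_words(hi, lo):
    """Model of one doubleword merge (the rldimi step), same assumption."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def mtvsrdd(ra, rb):
    """RA goes to doubleword 0 (most significant), RB to doubleword 1."""
    return ((ra & MASK64) << 64) | (rb & MASK64)


def build_v8i16_in_gprs(r3, r4, r5, r6, r7, r8, r9, r10):
    """Replay the seven-instruction sequence on plain integers."""
    r3 = merge_halfwords(r3, r4)    # rlwimi 3, 4, ...
    r5 = merge_halfwords(r5, r6)    # rlwimi 5, 6, ...
    r7 = merge_halfwords(r7, r8)    # rlwimi 7, 8, ...
    r9 = merge_halfwords(r9, r10)   # rlwimi 9, 10, ...
    r3 = merge_words(r3, r5)        # rldimi 3, 5, ...
    r7 = merge_words(r7, r9)        # rldimi 7, 9, ...
    return mtvsrdd(r3, r7)          # mtvsrdd 34, 3, 7


if __name__ == "__main__":
    print(hex(build_v8i16_in_gprs(1, 2, 3, 4, 5, 6, 7, 8)))
```

Both sequences are seven instructions long; the difference is that here six of them execute in the GPR domain and only the final `mtvsrdd` touches a VSR.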

For 32-bit mode, we can't really do the merging to doublewords in GPRs, but I think the values can be moved to VSRs after the word merges and then merged with a single `vpkuwum`.
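
For reference, the pack instructions mentioned here can be modeled in a few lines of Python. This is a sketch of the unsigned-modulo pack semantics as I read them from the ISA (the source elements are those of VRA followed by those of VRB, and the low half of each is kept), with elements numbered big-endian so that element 0 is the most significant. The helper names are hypothetical, and the replay function simply mirrors the vector-domain sequence quoted at the top of this comment.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def split(vec, width):
    """Split a 128-bit integer into elements of `width` bits, element 0 first."""
    count = 128 // width
    mask = (1 << width) - 1
    return [(vec >> (width * (count - 1 - i))) & mask for i in range(count)]


def join(elems, width):
    """Inverse of split: concatenate elements, element 0 most significant."""
    out = 0
    for e in elems:
        out = (out << width) | e
    return out


def pack_modulo(vra, vrb, src_width):
    """Keep the low half of every element of VRA, then of VRB."""
    half = src_width // 2
    mask = (1 << half) - 1
    src = split(vra, src_width) + split(vrb, src_width)
    return join([e & mask for e in src], half)


def vpkudum(vra, vrb):
    return pack_modulo(vra, vrb, 64)   # doublewords -> words


def vpkuwum(vra, vrb):
    return pack_modulo(vra, vrb, 32)   # words -> halfwords


def mtvsrdd(ra, rb):
    return ((ra & MASK64) << 64) | (rb & MASK64)


def patch_sequence(r3, r4, r5, r6, r7, r8, r9, r10):
    """Replay the mtvsrdd + vpku sequence (VSR 34/35/36 are v2/v3/v4)."""
    v2 = mtvsrdd(r4, r3)     # mtvsrdd 34, 4, 3
    v3 = mtvsrdd(r6, r5)     # mtvsrdd 35, 6, 5
    v2 = vpkudum(v3, v2)     # vpkudum 2, 3, 2
    v3 = mtvsrdd(r8, r7)     # mtvsrdd 35, 8, 7
    v4 = mtvsrdd(r10, r9)    # mtvsrdd 36, 10, 9
    v3 = vpkudum(v4, v3)     # vpkudum 3, 4, 3
    return vpkuwum(v3, v2)   # vpkuwum 2, 3, 2


if __name__ == "__main__":
    print(hex(patch_sequence(1, 2, 3, 4, 5, 6, 7, 8)))
```

In this model the value from r3 ends up in the least significant halfword, which corresponds to element 0 under little-endian element numbering.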

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94467