[PATCH] D134463: [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)

Tue Sep 27 12:26:16 PDT 2022

jrbyrnes added a comment.

In D134463#3817469 <https://reviews.llvm.org/D134463#3817469>, @foad wrote:

> Can't you use v_alignbit for all the cases where you need the upper 16 bits of one register and the lower 16 bits of the other? It should be smaller than v_perm because the shift amount (16) is an inline constant.

Hey, thanks for the good suggestion! I think this will only work for the case where we want V[1].low : V[0].hi.

In the case where we want V[1].hi : V[0].low, we can't lower to `V_ALIGNBIT_B32 $V0, $V1, 16` because that would incorrectly put the bits from $V0 in the MSBs of the dest. On the other hand, `V_ALIGNBIT_B32 $V1, $V0, 16` correctly puts bits from $V1 in the MSBs, but they are the lower 16 bits of $V1 (paired with the upper 16 bits of $V0 in the LSBs), not the upper 16 bits we want.
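To make the two cases concrete, here is a rough Python model of the instruction. It assumes `V_ALIGNBIT_B32 dst, src0, src1, src2` computes the low 32 bits of `({src0, src1} >> src2[4:0])`; the helper names (`v_alignbit_b32`, `pack`, `hi`, `lo`) and the register values are made up for illustration and are not part of the patch.

```python
MASK32 = 0xFFFFFFFF

def v_alignbit_b32(src0, src1, shift):
    """Assumed semantics: low 32 bits of ({src0, src1} >> shift[4:0])."""
    combined = ((src0 & MASK32) << 32) | (src1 & MASK32)
    return (combined >> (shift & 31)) & MASK32

def hi(x):
    """Upper 16 bits of a 32-bit value."""
    return (x >> 16) & 0xFFFF

def lo(x):
    """Lower 16 bits of a 32-bit value."""
    return x & 0xFFFF

def pack(msb16, lsb16):
    """Build a 32-bit value as msb16 : lsb16."""
    return ((msb16 & 0xFFFF) << 16) | (lsb16 & 0xFFFF)

# Hypothetical register contents, chosen so each half is recognizable.
v0 = 0xAAAABBBB
v1 = 0xCCCCDDDD

# Operand order ($V1, $V0): yields V[1].low : V[0].hi.
alignbit_v1_v0 = v_alignbit_b32(v1, v0, 16)

# Operand order ($V0, $V1): yields V[0].low : V[1].hi, so $V0 lands in the MSBs.
alignbit_v0_v1 = v_alignbit_b32(v0, v1, 16)

# The remaining buildvector case, V[1].hi : V[0].low.
wanted = pack(hi(v1), lo(v0))

print(hex(alignbit_v1_v0), hex(alignbit_v0_v1), hex(wanted))
```

Under that assumed model, neither operand order with a shift of 16 produces V[1].hi : V[0].low, so that case still needs a different lowering.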

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134463/new/

https://reviews.llvm.org/D134463