[PATCH] D134463: [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)

Mon Sep 26 07:52:41 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:2785-2794
+// Take the lower 16 bits from each VGPR_32 and concat them
 def : GCNPat <
-  (v2i16 (UniformBinFrag<build_vector> (i16 (trunc (srl_oneuse SReg_32:$src0, (i32 16)))),
-                       (i16 (trunc (srl_oneuse SReg_32:$src1, (i32 16)))))),
-  (S_PACK_HH_B32_B16 SReg_32:$src0, SReg_32:$src1)
+  (v2f16 (DivergentBinFrag<build_vector> (f16 VGPR_32:$a), (f16 VGPR_32:$b))),
+  (V_PERM_B32_e64 VGPR_32:$b, VGPR_32:$a, (S_MOV_B32 (i32 0x05040100)))
 >;

+// Take the lower 16 bits from each VGPR_32 and concat them
----------------
You can use a class or a foreach over the types to avoid repeating the same pattern twice.
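
A rough sketch of the class-based variant, not tested against the patch. It assumes the second (truncated) pattern is the v2i16/i16 counterpart and differs from the v2f16 one only in its types. The class name VPermBuildVectorPat is made up for illustration:

```
// Hypothetical helper class: one build_vector -> V_PERM_B32 pattern,
// parameterized over the vector type and its 16-bit element type.
// The selector 0x05040100 is copied unchanged from the patch.
class VPermBuildVectorPat<ValueType VecTy, ValueType EltTy> : GCNPat <
  (VecTy (DivergentBinFrag<build_vector> (EltTy VGPR_32:$a), (EltTy VGPR_32:$b))),
  (V_PERM_B32_e64 VGPR_32:$b, VGPR_32:$a, (S_MOV_B32 (i32 0x05040100)))
>;

// Assumed instantiations: the v2f16 case shown in the hunk, plus a v2i16 twin.
def : VPermBuildVectorPat<v2i16, i16>;
def : VPermBuildVectorPat<v2f16, f16>;
```

A foreach over the type pairs would work equally well. The class form just keeps the pattern body in one place and makes each instantiation a single line.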

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134463/new/

https://reviews.llvm.org/D134463