[PATCH] D134463: [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 26 07:52:41 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:2785-2794
+// Take the lower 16 bits from each VGPR_32 and concat them
def : GCNPat <
- (v2i16 (UniformBinFrag<build_vector> (i16 (trunc (srl_oneuse SReg_32:$src0, (i32 16)))),
- (i16 (trunc (srl_oneuse SReg_32:$src1, (i32 16)))))),
- (S_PACK_HH_B32_B16 SReg_32:$src0, SReg_32:$src1)
+ (v2f16 (DivergentBinFrag<build_vector> (f16 VGPR_32:$a), (f16 VGPR_32:$b))),
+ (V_PERM_B32_e64 VGPR_32:$b, VGPR_32:$a, (S_MOV_B32 (i32 0x05040100)))
>;
+// Take the lower 16 bits from each VGPR_32 and concat them
----------------
You can use a class or a foreach over the types to avoid repeating the same pattern twice.
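A rough sketch of the class-based variant, to illustrate the suggestion. This is a fragment meant to sit in SIInstructions.td next to the existing patterns, not a standalone file. The class name BuildVectorPermPat is hypothetical, and the sketch assumes the second pattern (not shown in full in the quoted diff) is the v2i16/i16 counterpart of the v2f16/f16 one, differing only in its types.

```tablegen
// Hypothetical helper class; the name is illustrative and not taken from the patch.
// Takes the lower 16 bits from each VGPR_32 and concatenates them,
// parameterized over the vector type and its element type.
class BuildVectorPermPat<ValueType VecTy, ValueType EltTy> : GCNPat <
  (VecTy (DivergentBinFrag<build_vector> (EltTy VGPR_32:$a), (EltTy VGPR_32:$b))),
  (V_PERM_B32_e64 VGPR_32:$b, VGPR_32:$a, (S_MOV_B32 (i32 0x05040100)))
>;

// Assumption: the two repeated patterns differ only in these types.
def : BuildVectorPermPat<v2f16, f16>;
def : BuildVectorPermPat<v2i16, i16>;
```

The source and result dags are copied unchanged from the quoted pattern; only the types are factored out, so adding another 16-bit element type later would take one more def line.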
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134463/new/
https://reviews.llvm.org/D134463