[llvm] [AMDGPU][True16][CodeGen] true16 codegen for bswap (PR #122849)
Brox Chen via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 16 07:24:13 PST 2025
================
@@ -3039,6 +3041,19 @@ def : GCNPat <
(i32 (zext (bswap i16:$a))),
(V_PERM_B32_e64 (i32 0), VSrc_b32:$a, (S_MOV_B32 (i32 0x0c0c0001)))
>;
+}
+
+let True16Predicate = UseRealTrue16Insts in {
+def : GCNPat <
+ (i16 (bswap i16:$a)),
+ (EXTRACT_SUBREG (V_PERM_B32_e64 (i32 0), (COPY VGPR_16:$a), (S_MOV_B32 (i32 0x0c0c0001))), lo16)
+>;
+
+def : GCNPat <
+ (i32 (zext (bswap i16:$a))),
----------------
broxigarchen wrote:
I see. I think we can have both zext and anyext patterns here, as in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIInstructions.td#L2355-L2356
But if there is only one, it should use zext. The reason is that
zext -> anyext costs nothing, but anyext -> zext adds an additional AND operation to zero out the top bits.
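
To illustrate the point, here is a rough Python sketch of the byte permute used in the patterns above. It is a simplified model and not the ISA definition: it assumes selector bytes 0-7 pick a byte from the 64-bit value {src0, src1} (with src1 as the low half) and that 0x0c produces a zero byte; other selector values are left unmodelled. The helper names are made up for this example.

```python
def v_perm_b32(src0: int, src1: int, sel: int) -> int:
    """Simplified model of a byte permute over {src0, src1} (assumed semantics)."""
    combined = ((src0 & 0xFFFFFFFF) << 32) | (src1 & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        s = (sel >> (8 * i)) & 0xFF
        if s <= 7:
            byte = (combined >> (8 * s)) & 0xFF  # pick byte s of {src0, src1}
        elif s == 0x0C:
            byte = 0x00                          # constant zero byte
        else:
            raise NotImplementedError("selector value not modelled in this sketch")
        result |= byte << (8 * i)
    return result


BSWAP16_SEL = 0x0C0C0001  # selector constant taken from the patterns in the diff


def bswap16_zext(a: int) -> int:
    """bswap of the low 16 bits; the upper 16 bits of the result are zero."""
    return v_perm_b32(0, a, BSWAP16_SEL)


def zext16_from_anyext(x: int) -> int:
    """Extra AND needed when the upper bits are undefined (anyext) but zext is wanted."""
    return x & 0xFFFF


# The permute already zeroes the top half, whatever the input's upper bits were.
print(hex(bswap16_zext(0x1234)))      # 0x3412
print(hex(bswap16_zext(0xDEAD1234)))  # 0x3412
```

Under these assumptions the permute result is already zero-extended, so a zext pattern also serves any user that only needs anyext. A pattern that promised only anyext would force a zext user to apply something like `zext16_from_anyext` on top.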
https://github.com/llvm/llvm-project/pull/122849