[PATCH] D74568: AMDGPU/GlobalISel: Handle G_BSWAP
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 14 01:45:47 PST 2020
foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.
LGTM. See inline for some very minor possible improvements.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:18
+; GFX8: ; %bb.0:
+; GFX8-NEXT: v_mov_b32_e32 v0, s0
+; GFX8-NEXT: s_mov_b32 s0, 0x10203
----------------
Just curious: why is this v_mov needed? Can't v_perm read this value directly from s0?
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:384
+; GFX7-NEXT: v_alignbit_b32 v1, v0, v0, 8
+; GFX7-NEXT: v_alignbit_b32 v0, v0, v0, 24
+; GFX7-NEXT: s_mov_b32 s4, 0xff00ff
----------------
This would work out slightly better using a non-AMDGPU-specific lowering to something like `x >> 8 | (x & 0xff) << 8`.
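The arithmetic behind this suggestion can be sketched in Python. The snippet suggests that the GFX7 i16 case is currently expanded as a full 32-bit byte swap (two rotates plus a 0x00ff00ff byte select) followed by a right shift by 16. That is an assumption here, and the generic two-shift form gives the same 16-bit result. The function names are illustrative only. This checks equivalence, not instruction counts or the actual LLVM lowering code.

```python
def rotr32(x: int, n: int) -> int:
    """32-bit rotate right, i.e. v_alignbit_b32 vD, vX, vX, n (assumed)."""
    x &= 0xFFFFFFFF
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


def bswap32_rotate(x: int) -> int:
    # One common bswap32 expansion without a byte-permute instruction:
    # rotate by 8 and by 24, then pick alternating bytes with a mask.
    return (rotr32(x, 8) & 0xFF00FF00) | (rotr32(x, 24) & 0x00FF00FF)


def bswap16_via_bswap32(x: int) -> int:
    # Assumed shape of the current i16 output: 32-bit bswap, then >> 16.
    return bswap32_rotate(x & 0xFFFF) >> 16


def bswap16_generic(x: int) -> int:
    # Generic lowering from the comment: x >> 8 | (x & 0xff) << 8,
    # applied to a value whose upper 16 bits are zero.
    x &= 0xFFFF
    return (x >> 8) | ((x & 0xFF) << 8)


print(hex(bswap16_generic(0x1234)))  # 0x3412
```

The generic form only needs shifts, an AND and an OR on the low 16 bits. It never has to build or undo a 32-bit result.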
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:393
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: s_mov_b32 s4, 0x10203
+; GFX8-NEXT: v_perm_b32 v0, 0, v0, s4
----------------
Could do a single v_perm with mask 03020001 to avoid the shift. (Or mask 0C0C0001 if you really want to guarantee the upper bits get zeroed.)
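A small Python model of the byte selection can be used to sanity-check these masks. It makes the following assumptions:

- The V_PERM_B32 semantics are those described in the GCN3/Vega ISA documents. Selector values 0-3 pick bytes of the second source, 4-7 pick bytes of the first source, 8-11 replicate a sign bit, 0x0C yields 0x00 and larger values yield 0xFF.
- The shift mentioned above is a logical right shift by 16, applied after the 0x10203 permute.

The helper names are hypothetical.

```python
def v_perm_b32(src0: int, src1: int, sel: int) -> int:
    """Illustrative model of V_PERM_B32 (assumed GCN semantics).

    The 8-byte pool is {src0, src1}: bytes 0-3 come from src1,
    bytes 4-7 from src0. Each byte of sel chooses one result byte.
    """
    pool = ((src0 & 0xFFFFFFFF) << 32) | (src1 & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        s = (sel >> (8 * i)) & 0xFF
        if s <= 7:
            byte = (pool >> (8 * s)) & 0xFF
        elif s <= 11:
            # 8..11 replicate the top bit of pool bytes 1, 3, 5, 7.
            byte = 0xFF if (pool >> (16 * (s - 8) + 15)) & 1 else 0x00
        elif s == 12:
            byte = 0x00
        else:
            byte = 0xFF
        result |= byte << (8 * i)
    return result


def bswap16_current(x: int) -> int:
    # Current output: reverse all four bytes (mask 0x10203), then >> 16.
    return v_perm_b32(0, x, 0x00010203) >> 16


def bswap16_keep_hi(x: int) -> int:
    # Mask 03020001: swap bytes 0 and 1, pass bytes 2 and 3 through.
    return v_perm_b32(0, x, 0x03020001)


def bswap16_zero_hi(x: int) -> int:
    # Mask 0C0C0001: same swap, selector 0x0C forces bytes 2 and 3 to zero.
    return v_perm_b32(0, x, 0x0C0C0001)


x = 0xAABB1234
print(hex(bswap16_current(x)))  # 0x3412
print(hex(bswap16_keep_hi(x)))  # 0xaabb3412
print(hex(bswap16_zero_hi(x)))  # 0x3412
```

Both suggested masks put the swapped bytes in the low half in one instruction. They differ only in what ends up in the upper 16 bits.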
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:497
+; GFX9-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GFX9-NEXT: s_mov_b32 s4, 0x10203
+; GFX9-NEXT: v_perm_b32 v1, 0, v1, s4
----------------
If you care about v2i16, this whole sequence could be done with a single v_perm with mask 02030001.
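Under an assumed V_PERM_B32 model, mask 02030001 can be checked against a per-lane reference byte swap. The model is repeated in partial form so this sketch stands alone. The GFX9 sequence quoted above is only partly shown, so the sketch compares against the intended v2i16 result rather than modelling each instruction. All names are hypothetical.

```python
def v_perm_b32(src0: int, src1: int, sel: int) -> int:
    """Partial model of V_PERM_B32 (assumed semantics): selectors 0-7 and 0x0C only."""
    pool = ((src0 & 0xFFFFFFFF) << 32) | (src1 & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        s = (sel >> (8 * i)) & 0xFF
        if s <= 7:
            byte = (pool >> (8 * s)) & 0xFF  # bytes 0-3 from src1, 4-7 from src0
        elif s == 0x0C:
            byte = 0x00
        else:
            raise NotImplementedError(f"selector {s:#x} not modelled")
        result |= byte << (8 * i)
    return result


def bswap16(h: int) -> int:
    """Reference byte swap of one 16-bit lane."""
    return ((h & 0xFF) << 8) | ((h >> 8) & 0xFF)


def bswap_v2i16_ref(x: int) -> int:
    # Swap the bytes of each 16-bit half independently.
    return (bswap16(x >> 16) << 16) | bswap16(x & 0xFFFF)


def bswap_v2i16_perm(x: int) -> int:
    # Mask 02030001: result bytes 3,2,1,0 take source bytes 2,3,0,1.
    return v_perm_b32(0, x, 0x02030001)


print(hex(bswap_v2i16_perm(0xAABB1234)))  # 0xbbaa3412
```

Because every result byte comes from the source register, no zero or 0x0C selectors are needed for the packed case.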
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D74568/new/
https://reviews.llvm.org/D74568