[PATCH] D74568: AMDGPU/GlobalISel: Handle G_BSWAP

Fri Feb 14 01:45:47 PST 2020

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

LGTM. See inline for some very minor possible improvements.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:18
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    v_mov_b32_e32 v0, s0
+; GFX8-NEXT:    s_mov_b32 s0, 0x10203
----------------
Just curious: why is this v_mov needed? Can't v_perm read this value directly from s0?

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:384
+; GFX7-NEXT:    v_alignbit_b32 v1, v0, v0, 8
+; GFX7-NEXT:    v_alignbit_b32 v0, v0, v0, 24
+; GFX7-NEXT:    s_mov_b32 s4, 0xff00ff
----------------
This would work out slightly better using a non-AMDGPU-specific lowering to something like `x >> 8 | (x & 0xff) << 8`.
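
A quick sanity check of the suggested shift-based lowering, as a sketch rather than the actual lowering code. It assumes the i16 value sits zero-extended in a 32-bit register, and compares the formula against a full 32-bit byte swap followed by a 16-bit right shift (an assumed stand-in for what the GFX7 sequence above appears to compute). The function names are hypothetical.

```python
def bswap16_shift(x: int) -> int:
    """Generic lowering from the comment: x >> 8 | (x & 0xff) << 8."""
    x &= 0xFFFF  # assumption: the upper 16 bits of the register are zero
    return ((x >> 8) | ((x & 0xFF) << 8)) & 0xFFFF


def bswap16_via_bswap32(x: int) -> int:
    """Reference: byte-swap all 32 bits, then shift the result down by 16."""
    x &= 0xFFFF
    swapped32 = int.from_bytes(x.to_bytes(4, "little"), "big")
    return swapped32 >> 16


# Exhaustive comparison over every 16-bit input.
ALL_MATCH = all(bswap16_shift(v) == bswap16_via_bswap32(v) for v in range(0x10000))
```

If the upper bits are not known to be zero, `x >> 8` would drag them into the result, so the explicit masking step matters in that case.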

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:393
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    s_mov_b32 s4, 0x10203
+; GFX8-NEXT:    v_perm_b32 v0, 0, v0, s4
----------------
This could be done with a single v_perm with mask 0x03020001 to avoid the shift (or mask 0x0C0C0001 if you really want to guarantee the upper bits get zeroed).

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll:497
+; GFX9-NEXT:    v_lshrrev_b32_e32 v1, 16, v0
+; GFX9-NEXT:    s_mov_b32 s4, 0x10203
+; GFX9-NEXT:    v_perm_b32 v1, 0, v1, s4
----------------
If you care about v2i16, this whole sequence could be done with a single v_perm with mask 0x02030001.
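
To see what the selector masks mentioned in the comments above do, here is a simplified software model of v_perm_b32. It is a sketch under assumptions, not the hardware definition: selector bytes 0-3 are taken to pick bytes of the second source, 4-7 bytes of the first source, 8-11 replicate a sign bit, 12 yields 0x00, and 13 or above yields 0xFF, which is my reading of the ISA manual. The Python function and the `MASKS` table are hypothetical helpers for illustration.

```python
def v_perm_b32(s0: int, s1: int, sel: int) -> int:
    """Simplified model of `v_perm_b32 dst, s0, s1, sel` (assumed semantics)."""
    # Treat {s0, s1} as one 64-bit value: data[0..3] come from s1, data[4..7] from s0.
    src = ((s0 & 0xFFFFFFFF) << 32) | (s1 & 0xFFFFFFFF)
    data = src.to_bytes(8, "little")
    out = 0
    for i in range(4):
        s = (sel >> (8 * i)) & 0xFF  # selector byte for result byte i
        if s >= 13:
            b = 0xFF
        elif s == 12:
            b = 0x00
        elif s >= 8:
            # Selectors 8..11 replicate the top bit of bytes 1, 3, 5, 7.
            b = 0xFF if data[2 * (s - 8) + 1] & 0x80 else 0x00
        else:
            b = data[s]
        out |= b << (8 * i)
    return out


# Selector masks from the test output and from the review comments.
MASKS = {
    "bswap i32": 0x00010203,                # mask used in the current output
    "bswap i16, keep upper": 0x03020001,
    "bswap i16, zero upper": 0x0C0C0001,
    "bswap v2i16": 0x02030001,
}

RESULTS = {name: v_perm_b32(0, 0x11223344, m) for name, m in MASKS.items()}
```

With input 0x11223344 the model gives 0x44332211, 0x11224433, 0x00004433 and 0x22114433 for the four masks, in the order listed.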

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74568/new/

https://reviews.llvm.org/D74568