[PATCH] D107187: [amdgpu] Add an enhanced conversion from i64 to f32.

Wed Aug 4 15:23:30 PDT 2021

hliao added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/sint_to_fp.i64.ll:233-239
+; GFX8-NEXT:    v_xor_b32_e32 v0, v1, v2
+; GFX8-NEXT:    v_cmp_lt_i32_e32 vcc, -1, v0
+; GFX8-NEXT:    v_ffbh_i32_e32 v5, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, 32, 33, vcc
+; GFX8-NEXT:    v_cmp_ne_u32_e32 vcc, -1, v5
+; GFX8-NEXT:    v_cndmask_b32_e32 v5, v0, v5, vcc
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, -1, v5
----------------
foad wrote:
> I think you might be able to shave off another instruction from this sequence with something like:
> ```
> v_alignbit v0, v1, v2, 31 ; extract bits 62..31
> v_ashrrev v3, 31, v2 ; duplicate sign bit 32 times
> v_xor v0, v0, v3 ; mask is 0 where bits 62..31 match sign bit
> v_ffbh_u32 v0, v0 ; count how many of the high bits from 62..31 match the sign bit
> v_min_u32 v0, 32, v0 ; clamp to 32
> ```
> Now v0 is the shift amount for the v_lshlrev_b64.
See D107507 for a further enhancement. I chose a different sequence with fewer instructions, because v_alignbit is not available on pre-GCN targets or on the SALU. The final 32-bit integer conversion is also revised, inspired by D107474. Overall, it reduces uitofp by 1 instruction (or 2 with D107474) and sitofp by 2 instructions.
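
For reference, here is a minimal Python sketch of what the quoted five-instruction sequence computes. It is a model, not the code in either patch. The helper names (`ffbh_u32`, `shift_amount`, `reference_shift`) are made up for illustration, and it assumes `v_ffbh_u32` returns 0xFFFFFFFF when the input has no bits set.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def ffbh_u32(v):
    """Model of v_ffbh_u32: count leading zeros of a 32-bit value.

    Assumption: the instruction yields 0xFFFFFFFF when no bit is set.
    """
    if v == 0:
        return MASK32
    return 32 - v.bit_length()


def shift_amount(x):
    """Shift amount for normalizing a signed 64-bit integer, clamped to 32."""
    u = x & MASK64                            # two's-complement bit pattern
    bits_62_31 = (u >> 31) & MASK32           # v_alignbit ..., 31
    sign = MASK32 if (u >> 63) & 1 else 0     # v_ashrrev 31 on the high word
    mask = bits_62_31 ^ sign                  # 0 where a bit matches the sign
    return min(32, ffbh_u32(mask))            # v_ffbh_u32 then v_min_u32 with 32


def reference_shift(x):
    """Brute-force count of bits below bit 63 that repeat the sign bit."""
    u = x & MASK64
    sign = (u >> 63) & 1
    n = 0
    for bit in range(62, -1, -1):
        if (u >> bit) & 1 != sign:
            break
        n += 1
    return min(32, n)
```

With this model, `shift_amount(1 << 31)` is 31 and `shift_amount(1 << 30)` is 32, which illustrates why the clamp is needed once all of bits 62..31 match the sign bit.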

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107187/new/

https://reviews.llvm.org/D107187