[PATCH] D107187: [amdgpu] Add an enhanced conversion from i64 to f32.

Michael Liao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 3 07:32:18 PDT 2021


hliao marked 3 inline comments as done.
hliao added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll:1085
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    s_movk_i32 s6, 0xff
-; SI-NEXT:    v_and_b32_e32 v0, s6, v0
-; SI-NEXT:    v_add_i32_e32 v0, vcc, 0, v0
-; SI-NEXT:    v_ffbh_u32_e32 v2, v0
-; SI-NEXT:    v_addc_u32_e64 v1, s[4:5], 0, 0, vcc
-; SI-NEXT:    v_add_i32_e32 v2, vcc, 32, v2
-; SI-NEXT:    v_ffbh_u32_e32 v3, v1
-; SI-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
-; SI-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
-; SI-NEXT:    v_mov_b32_e32 v3, 0xbe
-; SI-NEXT:    v_sub_i32_e32 v4, vcc, v3, v2
-; SI-NEXT:    v_lshl_b64 v[2:3], v[0:1], v2
-; SI-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[0:1]
-; SI-NEXT:    v_and_b32_e32 v1, 0x7fffffff, v3
-; SI-NEXT:    v_cndmask_b32_e32 v0, 0, v4, vcc
-; SI-NEXT:    s_mov_b32 s4, 0
-; SI-NEXT:    v_and_b32_e32 v3, s6, v3
-; SI-NEXT:    s_movk_i32 s5, 0x80
-; SI-NEXT:    v_lshrrev_b32_e32 v1, 8, v1
-; SI-NEXT:    v_lshlrev_b32_e32 v0, 23, v0
-; SI-NEXT:    v_or_b32_e32 v0, v0, v1
-; SI-NEXT:    v_cmp_eq_u64_e32 vcc, s[4:5], v[2:3]
-; SI-NEXT:    v_and_b32_e32 v1, 1, v0
-; SI-NEXT:    v_cndmask_b32_e32 v1, 0, v1, vcc
-; SI-NEXT:    v_cmp_lt_u64_e32 vcc, s[4:5], v[2:3]
-; SI-NEXT:    v_cndmask_b32_e64 v1, v1, 1, vcc
-; SI-NEXT:    v_add_i32_e32 v0, vcc, v0, v1
+; SI-NEXT:    v_ffbh_i32_e32 v2, 0
+; SI-NEXT:    v_cmp_ne_u32_e32 vcc, -1, v2
----------------
arsenm wrote:
> foad wrote:
> > Not related to your patch, but we should generate v_cvt_f32_ubyte0 here, shouldn't we?
> Yes, but nothing is trying to reduce the bitwidth of anything right now
The constant folding on those intrinsics is not supported yet. You may notice that the unsigned byte conversion is already much simpler thanks to the general SDNode used; the only missing part is folding LDEXP(v, 0). Once we teach the combiner to understand ffbh_i32 and ldexp, we should get the expected result. I also found other related minor issues and will prepare patches to improve them soon.
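To illustrate why folding LDEXP(v, 0) matters here, below is a minimal C sketch of an unsigned i64 to f32 expansion in the normalize / sticky-bit / ldexp style. It is modeled on the general technique, not copied from the patch's actual lowering, and the diff above shows the signed variant (v_ffbh_i32) rather than this unsigned one. The helper `ffbh_u32` is a hypothetical software model of v_ffbh_u32. Like that instruction, it returns 0xFFFFFFFF for a zero input and is clamped to 32 with umin. For a zero-extended byte the high half is 0, so the shift is 32 and the final step is `ldexpf(r, 0)`, a no-op. Once the combiner folds that step and knows ffbh of 0, only a plain 32-bit (ultimately ubyte) conversion would remain.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical software model of v_ffbh_u32: number of leading zero
   bits, or 0xFFFFFFFF when the input is zero. */
static uint32_t ffbh_u32(uint32_t v) {
    if (v == 0)
        return 0xFFFFFFFFu;
    uint32_t n = 0;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Sketch (assumption, not the patch's code): unsigned i64 -> f32 via
   normalize, sticky bit, 32-bit convert, then exponent fix-up. */
static float u64_to_f32(uint64_t x) {
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t sh = ffbh_u32(hi);
    if (sh > 32)                  /* umin(ffbh, 32): hi == 0 gives 32 */
        sh = 32;
    uint64_t norm = x << sh;      /* sh <= 32, so the shift is defined */
    /* When hi != 0, normalization must set the top bit. */
    assert(hi == 0 || (norm >> 63) == 1);
    /* Fold any discarded low bits into bit 0 so round-to-nearest-even
       on the 32-bit conversion still sees them as "sticky". */
    uint32_t sticky = (uint32_t)norm != 0;
    float r = (float)((uint32_t)(norm >> 32) | sticky);
    /* Undo the normalization; for hi == 0 this is ldexpf(r, 0). */
    return ldexpf(r, 32 - (int)sh);
}
```

The sticky bit only needs to land below the rounding position: a normalized 32-bit value keeps 24 significant bits in f32, so bit 0 is always in the sticky region, and the power-of-two rescale by `ldexpf` is exact.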


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107187/new/

https://reviews.llvm.org/D107187


