[PATCH] D107474: [AMDGPU] Better legalization of ctlz/cttz

Thu Aug 5 06:44:52 PDT 2021

foad added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll:1126-1136
+; SI-NEXT:    v_ffbh_u32_e32 v2, 0
 ; SI-NEXT:    v_and_b32_e32 v0, 0xff, v0
-; SI-NEXT:    v_cvt_f32_ubyte0_e32 v0, v0
-; SI-NEXT:    v_ldexp_f32_e64 v0, v0, 0
+; SI-NEXT:    v_mov_b32_e32 v1, 0
+; SI-NEXT:    v_min_u32_e32 v2, 32, v2
+; SI-NEXT:    v_lshl_b64 v[0:1], v[0:1], v2
+; SI-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v0
+; SI-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
----------------
This is an unfortunate regression: the old output used `v_cvt_f32_ubyte0_e32`, and the new output does not. The problem is that when CTLZ is expanded using FFBH, `AMDGPUPostLegalizerCombinerHelper::matchUCharToFloat` can no longer see that the CTLZ of the high half of `%masked = and i64 %arg0, 255` is known to be 32 (the high half is known to be zero). It seems we would need a number of extra constant folds and/or known-bits logic to make this work again.
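To make the lost information concrete, here is a small standalone C++ sketch (not LLVM code). It assumes that V_FFBH_U32 returns 0xFFFFFFFF for a zero input, which is why the expansion clamps the result with `v_min_u32_e32 v2, 32, v2` in the check lines. The names `ffbh_u32`, `ctlz_via_ffbh`, `known_leading_zeros` and `high_half_known_zero` are hypothetical and only model the reasoning; they are not LLVM's `KnownBits` API.

```cpp
#include <cstdint>

// Assumed model of the AMDGPU V_FFBH_U32 instruction: number of leading
// zero bits, or 0xFFFFFFFF when no bit is set.
constexpr std::uint32_t ffbh_u32(std::uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  std::uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// CTLZ expanded via FFBH: the umin with 32 maps the zero-input result
// (0xFFFFFFFF) to 32, mirroring the v_min_u32 in the new output.
constexpr std::uint32_t ctlz_via_ffbh(std::uint32_t x) {
  std::uint32_t r = ffbh_u32(x);
  return r < 32u ? r : 32u;
}

// Hypothetical known-bits helper: given a mask of bits known to be zero,
// count how many of the leading bits are known to be zero.
constexpr std::uint32_t known_leading_zeros(std::uint32_t knownZero) {
  std::uint32_t n = 0;
  for (int bit = 31; bit >= 0 && ((knownZero >> bit) & 1u) != 0; --bit)
    ++n;
  return n;
}

// Bits known to be zero in the high half of `and i64 %x, mask`.
constexpr std::uint32_t high_half_known_zero(std::uint64_t mask) {
  return static_cast<std::uint32_t>(~mask >> 32);
}

// For `%masked = and i64 %arg0, 255` the high half is known to be zero,
// so its CTLZ is 32, but a combine only sees that if it also knows that
// ffbh(0) == 0xFFFFFFFF and folds the following umin.
static_assert(known_leading_zeros(high_half_known_zero(255)) == 32, "");
static_assert(ctlz_via_ffbh(0) == 32, "");
```

With a plain CTLZ node the "high half is zero, so CTLZ is 32" fact falls out of a single known-bits query. After the FFBH expansion, the same fact needs two steps, folding FFBH of a known-zero value and then folding the umin, which is the extra constant-folding and known-bits work described above.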

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D107474