[PATCH] D107474: [AMDGPU] Better legalization of ctlz/cttz
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 5 06:44:52 PDT 2021
foad added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll:1126-1136
+; SI-NEXT: v_ffbh_u32_e32 v2, 0
; SI-NEXT: v_and_b32_e32 v0, 0xff, v0
-; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0
-; SI-NEXT: v_ldexp_f32_e64 v0, v0, 0
+; SI-NEXT: v_mov_b32_e32 v1, 0
+; SI-NEXT: v_min_u32_e32 v2, 32, v2
+; SI-NEXT: v_lshl_b64 v[0:1], v[0:1], v2
+; SI-NEXT: v_cmp_ne_u32_e32 vcc, 0, v0
+; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
----------------
This is unfortunate: we lose the `v_cvt_f32_ubyte0` match here. The problem is that when CTLZ is expanded using FFBH, `AMDGPUPostLegalizerCombinerHelper::matchUCharToFloat` can no longer see that CTLZ of the high half of `%masked = and i64 %arg0, 255` is known to be 32. We would probably need a number of extra constant folds and/or known-bits logic to make this work again.
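
To illustrate the value the combiner used to see, here is a rough Python model of the expansion. It is a sketch and not the LLVM code: the helper names are made up for illustration, and `ffbh_u32` assumes the usual ISA behaviour of `v_ffbh_u32` returning 0xFFFFFFFF for a zero input, which is why the expansion clamps with a `min` against 32 (the `v_min_u32` in the diff above).

```python
MASK32 = 0xFFFFFFFF

def ffbh_u32(x):
    """Hypothetical model of v_ffbh_u32: leading-zero count of a 32-bit
    value, or 0xFFFFFFFF when the input is zero (assumed ISA semantics)."""
    x &= MASK32
    if x == 0:
        return MASK32
    return 32 - x.bit_length()

def ctlz32_via_ffbh(x):
    """ctlz expanded as umin(ffbh(x), 32), mirroring v_ffbh_u32 + v_min_u32."""
    return min(ffbh_u32(x), 32)

def high_half(x64):
    """Upper 32 bits of a 64-bit value."""
    return (x64 >> 32) & MASK32

# %masked = and i64 %arg0, 255 always has a zero high half, so the
# expanded ctlz of that half is 32 regardless of %arg0.
samples = [0, 1, 0xFF, 0x1234_5678_9ABC_DEF0, (1 << 64) - 1]
results = [ctlz32_via_ffbh(high_half(arg0 & 255)) for arg0 in samples]
print(results)
```

Every sample yields 32, which is the constant that `v_ffbh_u32_e32 v2, 0` followed by `v_min_u32_e32 v2, 32, v2` computes at run time instead of it being folded away.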
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107474/new/
https://reviews.llvm.org/D107474