[llvm] [AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (PR #88512)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Wed May 8 07:13:41 PDT 2024
================
@@ -1701,12 +1695,11 @@ define amdgpu_kernel void @v_ctlz_zero_undef_i8_sel_eq_neg1(ptr addrspace(1) noa
; GFX9-GISEL-NEXT: v_add_co_u32_e32 v0, vcc, v1, v0
; GFX9-GISEL-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v3, vcc
; GFX9-GISEL-NEXT: global_load_ubyte v0, v[0:1], off
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0)
-; GFX9-GISEL-NEXT: v_ffbh_u32_e32 v1, v0
-; GFX9-GISEL-NEXT: v_subrev_u32_e32 v1, 24, v1
-; GFX9-GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, -1, vcc
; GFX9-GISEL-NEXT: v_mov_b32_e32 v1, 0
+; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0)
+; GFX9-GISEL-NEXT: v_ffbh_u32_sdwa v2, v0
+; GFX9-GISEL-NEXT: v_cmp_eq_u32_sdwa s[2:3], v0, v1
+; GFX9-GISEL-NEXT: v_cndmask_b32_e64 v0, v2, -1, s[2:3]
; GFX9-GISEL-NEXT: global_store_byte v1, v0, s[0:1]
; GFX9-GISEL-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
----------------
arsenm wrote:
Added some in b5afda8d760998641cf08a6d229252924b0ad146
https://github.com/llvm/llvm-project/pull/88512
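
For readers following the hunk above: the removed lines compute the 8-bit count as a 32-bit leading-zero count followed by a subtract of 24 (v_ffbh_u32 then v_subrev_u32), and the PR title describes using a shift instead. The sketch below only illustrates the arithmetic identity behind that idea. It is not the backend code, and the helper names (ctlz32, ctlz8_sub, ctlz8_shl) are hypothetical. It assumes a nonzero input, since the result of ctlz_zero_undef for zero is unspecified.

```python
def ctlz32(x: int) -> int:
    """Leading-zero count of a nonzero 32-bit value (stand-in for v_ffbh_u32)."""
    assert 0 < x <= 0xFFFFFFFF
    return 32 - x.bit_length()


def ctlz8_sub(x: int) -> int:
    """Old form: count on the zero-extended byte, then subtract the 24 padding bits."""
    return ctlz32(x & 0xFF) - 24


def ctlz8_shl(x: int) -> int:
    """Shift form: move the byte into the top 8 bits so no correction is needed."""
    return ctlz32((x & 0xFF) << 24)


# Both forms agree for every nonzero 8-bit input.
for value in range(1, 256):
    assert ctlz8_sub(value) == ctlz8_shl(value)
```

The same reasoning should carry over to i16 with a shift of 16 in place of 24.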