[llvm] [AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (PR #88512)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Wed May 8 06:34:06 PDT 2024
================
@@ -1701,12 +1695,11 @@ define amdgpu_kernel void @v_ctlz_zero_undef_i8_sel_eq_neg1(ptr addrspace(1) noa
; GFX9-GISEL-NEXT: v_add_co_u32_e32 v0, vcc, v1, v0
; GFX9-GISEL-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v3, vcc
; GFX9-GISEL-NEXT: global_load_ubyte v0, v[0:1], off
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0)
-; GFX9-GISEL-NEXT: v_ffbh_u32_e32 v1, v0
-; GFX9-GISEL-NEXT: v_subrev_u32_e32 v1, 24, v1
-; GFX9-GISEL-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, -1, vcc
; GFX9-GISEL-NEXT: v_mov_b32_e32 v1, 0
+; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0)
+; GFX9-GISEL-NEXT: v_ffbh_u32_sdwa v2, v0
+; GFX9-GISEL-NEXT: v_cmp_eq_u32_sdwa s[2:3], v0, v1
+; GFX9-GISEL-NEXT: v_cndmask_b32_e64 v0, v2, -1, s[2:3]
; GFX9-GISEL-NEXT: global_store_byte v1, v0, s[0:1]
; GFX9-GISEL-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
----------------
arsenm wrote:
This test is missing some exotic bit width tests
https://github.com/llvm/llvm-project/pull/88512
More information about the llvm-commits
mailing list