[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors (PR #108596)

via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 25 21:47:02 PDT 2024


================
@@ -1391,24 +1419,28 @@ define amdgpu_ps float @test_control_flow_0(<8 x i32> inreg %rsrc, <4 x i32> inr
 ; GFX10-W32:       ; %bb.0: ; %main_body
 ; GFX10-W32-NEXT:    s_mov_b32 s12, exec_lo
 ; GFX10-W32-NEXT:    s_wqm_b32 exec_lo, exec_lo
-; GFX10-W32-NEXT:    s_mov_b32 s13, exec_lo
-; GFX10-W32-NEXT:    v_cmpx_ne_u32_e32 0, v1
-; GFX10-W32-NEXT:    s_xor_b32 s13, exec_lo, s13
-; GFX10-W32-NEXT:    s_cbranch_execz .LBB27_2
+; GFX10-W32-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v1
+; GFX10-W32-NEXT:    s_xor_b32 s14, vcc_lo, exec_lo
+; GFX10-W32-NEXT:    s_cmp_lg_u32 vcc_lo, 0
+; GFX10-W32-NEXT:    s_cmov_b32 exec_lo, vcc_lo
+; GFX10-W32-NEXT:    s_cbranch_scc0 .LBB27_2
 ; GFX10-W32-NEXT:  ; %bb.1: ; %ELSE
-; GFX10-W32-NEXT:    s_and_saveexec_b32 s14, s12
+; GFX10-W32-NEXT:    s_and_saveexec_b32 s13, s12
 ; GFX10-W32-NEXT:    buffer_store_dword v2, v0, s[0:3], 0 idxen
 ; GFX10-W32-NEXT:    ; implicit-def: $vgpr0
-; GFX10-W32-NEXT:    s_mov_b32 exec_lo, s14
+; GFX10-W32-NEXT:    s_mov_b32 exec_lo, s13
+; GFX10-W32-NEXT:    s_or_b32 exec_lo, exec_lo, s14
 ; GFX10-W32-NEXT:  .LBB27_2: ; %Flow
-; GFX10-W32-NEXT:    s_andn2_saveexec_b32 s13, s13
-; GFX10-W32-NEXT:    s_cbranch_execz .LBB27_4
+; GFX10-W32-NEXT:    s_xor_b32 s13, s14, exec_lo
+; GFX10-W32-NEXT:    s_cmp_lg_u32 s14, 0
+; GFX10-W32-NEXT:    s_cmov_b32 exec_lo, s14
----------------
ruiling wrote:

We are safe to use (no kill in the else region):
s_and_saveexec s13, s14
s_cselect s14, s13ruil
s_cbranch_scc0

https://github.com/llvm/llvm-project/pull/108596


More information about the llvm-commits mailing list