[llvm] AMDGPU/GlobalISel: Fix inst-selection of ballot (PR #109986)
Petar Avramovic via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 10 04:09:53 PDT 2024
================
@@ -419,3 +424,80 @@ true:
false:
ret i32 33
}
+
+; Input that is neither a constant nor the direct result of a compare.
+; Tests that inactive lanes are set to 0.
+define amdgpu_ps void @non_cst_non_compare_input(ptr addrspace(1) %out, i32 %tid, i32 %cond) {
+; GFX10-LABEL: non_cst_non_compare_input:
+; GFX10: ; %bb.0: ; %entry
+; GFX10-NEXT: s_and_b32 s0, 1, s0
+; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v3
+; GFX10-NEXT: v_cmp_ne_u32_e64 s0, 0, s0
+; GFX10-NEXT: s_and_saveexec_b32 s1, vcc_lo
+; GFX10-NEXT: s_xor_b32 s1, exec_lo, s1
+; GFX10-NEXT: ; %bb.1: ; %B
+; GFX10-NEXT: v_cmp_gt_u32_e32 vcc_lo, 2, v2
+; GFX10-NEXT: s_andn2_b32 s0, s0, exec_lo
+; GFX10-NEXT: ; implicit-def: $vgpr2
+; GFX10-NEXT: s_and_b32 s2, exec_lo, vcc_lo
+; GFX10-NEXT: s_or_b32 s0, s0, s2
+; GFX10-NEXT: ; %bb.2: ; %Flow
+; GFX10-NEXT: s_andn2_saveexec_b32 s1, s1
+; GFX10-NEXT: ; %bb.3: ; %A
+; GFX10-NEXT: v_cmp_le_u32_e32 vcc_lo, 1, v2
+; GFX10-NEXT: s_andn2_b32 s0, s0, exec_lo
+; GFX10-NEXT: s_and_b32 s2, exec_lo, vcc_lo
+; GFX10-NEXT: s_or_b32 s0, s0, s2
+; GFX10-NEXT: ; %bb.4: ; %exit
+; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s1
+; GFX10-NEXT: s_and_b32 s0, s0, exec_lo
+; GFX10-NEXT: v_mov_b32_e32 v2, s0
+; GFX10-NEXT: global_store_dword v[0:1], v2, off
+; GFX10-NEXT: s_endpgm
+;
+; GFX11-LABEL: non_cst_non_compare_input:
+; GFX11: ; %bb.0: ; %entry
+; GFX11-NEXT: s_and_b32 s0, 1, s0
+; GFX11-NEXT: s_mov_b32 s1, exec_lo
+; GFX11-NEXT: v_cmp_ne_u32_e64 s0, 0, s0
+; GFX11-NEXT: v_cmpx_ne_u32_e32 0, v3
+; GFX11-NEXT: s_xor_b32 s1, exec_lo, s1
+; GFX11-NEXT: ; %bb.1: ; %B
+; GFX11-NEXT: v_cmp_gt_u32_e32 vcc_lo, 2, v2
+; GFX11-NEXT: s_and_not1_b32 s0, s0, exec_lo
+; GFX11-NEXT: ; implicit-def: $vgpr2
+; GFX11-NEXT: s_and_b32 s2, exec_lo, vcc_lo
+; GFX11-NEXT: s_or_b32 s0, s0, s2
+; GFX11-NEXT: ; %bb.2: ; %Flow
+; GFX11-NEXT: s_and_not1_saveexec_b32 s1, s1
+; GFX11-NEXT: ; %bb.3: ; %A
+; GFX11-NEXT: v_cmp_le_u32_e32 vcc_lo, 1, v2
+; GFX11-NEXT: s_and_not1_b32 s0, s0, exec_lo
+; GFX11-NEXT: s_and_b32 s2, exec_lo, vcc_lo
+; GFX11-NEXT: s_or_b32 s0, s0, s2
+; GFX11-NEXT: ; %bb.4: ; %exit
+; GFX11-NEXT: s_or_b32 exec_lo, exec_lo, s1
+; GFX11-NEXT: s_and_b32 s0, s0, exec_lo
+; GFX11-NEXT: v_mov_b32_e32 v2, s0
+; GFX11-NEXT: global_store_b32 v[0:1], v2, off
----------------
petar-avramovic wrote:
This is a ballot whose source is not a compare (the source is a phi). Compare with what SDAG does:
https://github.com/llvm/llvm-project/pull/109986/commits/0c0e21bf407bb4616e7283befec8ac0aec361ee3#diff-8048303f4c9f4c844e010c047a71b7bfcfb6c612f0faf5d9a89104acb11ee39fR571-R576
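To make the lane-mask handling in the GFX10 output easier to follow, here is a rough Python model of it. This is an illustrative sketch, not code from the PR. It assumes wave32, with each mask held as a 32-bit int where bit i is lane i. It also assumes v2 holds %tid and v3 holds %cond. That assignment is inferred from the argument order, because the IR body is not part of the quoted hunk. merge_lane_mask mirrors the s_andn2_b32 / s_and_b32 / s_or_b32 sequence in %B and %A: active lanes take the new condition and inactive lanes keep their old bit. ballot mirrors the final s_and_b32 s0, s0, exec_lo, which is what forces inactive lanes to 0. All function names are hypothetical.

```python
# Illustrative model of the GFX10 lane-mask code above (not from the PR).
# Assumption: wave32, each mask is a 32-bit int, bit i = lane i.
WAVE32 = (1 << 32) - 1


def merge_lane_mask(prev, exec_mask, cond):
    # s_andn2_b32 s0, s0, exec_lo ; s_and_b32 s2, exec_lo, vcc_lo ; s_or_b32 s0, s0, s2
    # Active lanes take the new condition; inactive lanes keep the old bit.
    return ((prev & ~exec_mask) | (exec_mask & cond)) & WAVE32


def ballot(mask, exec_mask):
    # s_and_b32 s0, s0, exec_lo: inactive lanes contribute 0 to the result.
    return mask & exec_mask


def lanes(pred, values):
    # Build a lane mask from a per-lane predicate (lane i = values[i]).
    m = 0
    for i, v in enumerate(values):
        if pred(v):
            m |= 1 << i
    return m


def simulate(tid, cond, exec_mask, initial=0):
    # Hypothetical reading of the CFG: %B runs for active lanes with
    # cond != 0 (v_cmp_ne_u32 0, v3), %A for the remaining active lanes.
    exec_b = exec_mask & lanes(lambda c: c != 0, cond)
    exec_a = exec_mask & ~exec_b
    mask = initial
    # %B: v_cmp_gt_u32 vcc_lo, 2, v2  ->  2 > tid
    mask = merge_lane_mask(mask, exec_b, lanes(lambda t: t < 2, tid))
    # %A: v_cmp_le_u32 vcc_lo, 1, v2  ->  1 <= tid
    mask = merge_lane_mask(mask, exec_a, lanes(lambda t: t >= 1, tid))
    # %exit: exec is restored, then the ballot masks off inactive lanes.
    return ballot(mask, exec_mask)
```

For example, simulate([0, 1, 2, 3], [1, 0, 1, 0], 0b1111, initial=WAVE32) gives 0b1011. Before the final ballot() AND, the merged mask still has lanes 4 to 31 set from the initial value. This shows why the AND with exec is needed when the source is a phi rather than a compare.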
https://github.com/llvm/llvm-project/pull/109986