[llvm] AMDGPU/GlobalISel: Fix inst-selection of ballot (PR #109986)

Petar Avramovic via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 25 08:15:29 PDT 2024


================
@@ -522,3 +522,76 @@ true:
 false:
   ret i32 33
 }
+
+; Input that is not constant or direct result of a compare.
+; Tests setting 0 to inactive lanes.
+define amdgpu_ps void @non_cst_non_compare_input(ptr addrspace(1) %out, i32 %tid, i32 %cond) {
+; GFX10-LABEL: non_cst_non_compare_input:
+; GFX10:       ; %bb.0: ; %entry
+; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v3
+; GFX10-NEXT:    ; implicit-def: $sgpr0
+; GFX10-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX10-NEXT:    s_xor_b32 s1, exec_lo, s1
+; GFX10-NEXT:  ; %bb.1: ; %B
+; GFX10-NEXT:    v_cmp_gt_u32_e32 vcc_lo, 2, v2
+; GFX10-NEXT:    ; implicit-def: $vgpr2
+; GFX10-NEXT:    s_and_b32 s0, vcc_lo, exec_lo
+; GFX10-NEXT:  ; %bb.2: ; %Flow
+; GFX10-NEXT:    s_andn2_saveexec_b32 s1, s1
+; GFX10-NEXT:  ; %bb.3: ; %A
+; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v2
+; GFX10-NEXT:    s_andn2_b32 s0, s0, exec_lo
+; GFX10-NEXT:    s_and_b32 s2, vcc_lo, exec_lo
+; GFX10-NEXT:    s_or_b32 s0, s0, s2
+; GFX10-NEXT:  ; %bb.4: ; %exit
+; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s1
+; GFX10-NEXT:    v_cndmask_b32_e64 v2, 0, 1, s0
----------------
petar-avramovic wrote:

For context, this is what SelectionDAG (sdag) does.

A non-constant ballot:
tsrc:    i1 = ...
tballot: i32 = llvm.amdgcn.ballot tsrc

is selected by default (when no combine kicks in) as:
tzext: i32 = zero_extend tsrc
tballot: i32 = SETCC tzext, Constant:i32<0>, setne:ch

Note the uppercase SETCC and the i32 result type; a regular compare is "i1 = setcc".

The most common case is a regular compare input, which gets folded (*):
tsrc:    i1 = setcc t_op0, t_op1
tballot: i32 = llvm.amdgcn.ballot tsrc

becomes:
tballot: i32 = SETCC t_op0, t_op1

The case that is missing in GlobalISel is when ballot needs to zero the inactive lanes.
SDag does this by selecting tzext: i32 = zero_extend tsrc into a select
and then comparing the result with zero; the compare result in vcc is set to 0 for inactive lanes.
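
To make the inactive-lane behaviour concrete, here is a rough lane-level model in Python. It is only a sketch for wave32: the helper names (`v_cndmask`, `v_cmp_ne_zero`, `ballot_select_compare`, `ballot_and_exec`) are made up for illustration and are not LLVM or ISA APIs. The model assumes a VALU select leaves inactive lanes of the destination untouched, and that a VALU compare writes 0 for inactive lanes, as described above.

```python
WAVE_SIZE = 32
MASK = (1 << WAVE_SIZE) - 1


def v_cndmask(src_mask, exec_mask, old_vgpr):
    """Model of zero_extend selected as a per-lane select of 0 or 1.

    Active lanes get the src bit; inactive lanes keep whatever was in the
    destination VGPR before (assumption of this sketch).
    """
    out = list(old_vgpr)
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            out[lane] = (src_mask >> lane) & 1
    return out


def v_cmp_ne_zero(vgpr, exec_mask):
    """Model of a VALU compare with 0: inactive lanes produce a 0 bit."""
    result = 0
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1 and vgpr[lane] != 0:
            result |= 1 << lane
    return result


def ballot_select_compare(src_mask, exec_mask, old_vgpr):
    """sdag-style lowering: select 0/1, then compare with zero."""
    return v_cmp_ne_zero(v_cndmask(src_mask, exec_mask, old_vgpr), exec_mask)


def ballot_and_exec(src_mask, exec_mask):
    """Alternative lowering: mask the lane mask with exec directly."""
    return src_mask & exec_mask & MASK


# src has bits set in lanes that are not active; both lowerings clear them.
src = 0xFFFF00FF
exec_mask = 0x0000FFFF
garbage = [7] * WAVE_SIZE  # stale non-zero VGPR contents in every lane
print(hex(ballot_select_compare(src, exec_mask, garbage)))
print(hex(ballot_and_exec(src, exec_mask)))
```

In this model both lowerings agree for any input, which is why an 'AND with exec' can stand in for the select-plus-compare sequence.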

(*) In theory this fold might be wrong if the folded compare came from another block with a different exec mask, but that might not be possible since sdag works block by block?

GlobalISel could avoid adding the 'AND with exec' when the input is a compare result that was calculated with the same exec mask as the current exec used by the ballot.
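
A small Python model of why the exec mask matters here. This is a sketch only: `v_cmp_lt_u32` is a hypothetical helper, not a real API, and it assumes a compare writes 0 for lanes that are inactive at the point where the compare executes.

```python
WAVE_SIZE = 32


def v_cmp_lt_u32(values, bound, exec_mask):
    """Lane mask of values[lane] < bound; inactive lanes produce a 0 bit."""
    result = 0
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1 and values[lane] < bound:
            result |= 1 << lane
    return result


values = list(range(WAVE_SIZE))   # lane i holds the value i
ballot_exec = 0x0000FFFE          # exec at the ballot: lane 0 is inactive
wider_exec = 0xFFFFFFFF           # exec in some other block: all lanes active

# Compare computed under the same exec as the ballot: AND with exec is a no-op.
same = v_cmp_lt_u32(values, 2, ballot_exec)
print(same & ballot_exec == same)

# Compare computed under a different (wider) exec: lane 0 leaks through
# unless the result is masked with the current exec.
other = v_cmp_lt_u32(values, 2, wider_exec)
print(other & ballot_exec == other)
```

The first check holds and the second does not, so the 'AND with exec' is only redundant when the compare is known to have run under the same exec mask as the ballot.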

https://github.com/llvm/llvm-project/pull/109986