[llvm] [AMDGPU] (x or y) xor -1 -> x nor y (PR #130264)
Ana Mihajlovic via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 11 07:47:42 PDT 2025
mihajlovicana wrote:
> Can you remodel the testing style after something like test/CodeGen/AMDGPU/GlobalISel/xnor.ll?
>
> Notably: wave32 and wave64 should be the same file. There is no IR difference between them.
>
> The tested type combinations should be in the name. The wavesize is only significant to the output, not the input.
>
> Explicit tests for the negative one use case, with a name to reflect that
>
> Avoid memory instructions, they are not relevant to the pattern and add a lot of extra noise to the output
The problem is that if I use ballot, I need separate tests for wave32 and wave64. I tried using cmp instead, but SelectionDAG optimizes the IR and I get this:
`define amdgpu_ps void @foo(i32 %x, i32 %y, ptr addrspace(1) %out, i32 %a, i32 %b) {
; W64-LABEL: foo:
; W64: ; %bb.0:
; W64-NEXT: v_cmp_gt_u32_e32 vcc, v0, v4
; W64-NEXT: v_cmp_ne_u32_e64 s[0:1], v1, v5
; W64-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
; W64-NEXT: s_and_b64 s[0:1], s[0:1], vcc
; W64-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[0:1]
; W64-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
; W64-NEXT: v_cmp_ne_u32_e64 s[0:1], 0, v0
; W64-NEXT: s_wait_alu 0xf1ff
; W64-NEXT: v_mov_b32_e32 v0, s0
; W64-NEXT: s_delay_alu instid0(VALU_DEP_2)
; W64-NEXT: v_mov_b32_e32 v1, s1
; W64-NEXT: global_store_b64 v[2:3], v[0:1], off
; W64-NEXT: s_endpgm
%y.b = icmp ule i32 %x, %a
%x.b = icmp eq i32 %y, %b
%t = or i1 %x.b, %y.b
%t.1 = xor i1 %t, -1
%z = call i64 @llvm.amdgcn.ballot.i64(i1 %t.1)
store i64 %z, ptr addrspace(1) %out
ret void
}`
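So I can't really use this to test whether the patch works: the or and the xor are folded away into inverted compares (v_cmp_gt_u32, v_cmp_ne_u32) combined with s_and_b64, so there is no nor pattern left to select.
I added a store in the last case to avoid an illegal VGPR-to-SGPR copy.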
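For reference, the fold visible in the output above is De Morgan's law applied to the two compares: not (ule or eq) becomes (ugt and ne). The following is a minimal hand-written sketch of IR that should be equivalent to @foo after that fold. The function name @foo_folded and the value names are hypothetical, and this is an illustration, not the output of any pass.

```llvm
; Hypothetical illustration: hand-written equivalent of @foo after the fold.
declare i64 @llvm.amdgcn.ballot.i64(i1)

define amdgpu_ps void @foo_folded(i32 %x, i32 %y, ptr addrspace(1) %out, i32 %a, i32 %b) {
  %y.b.not = icmp ugt i32 %x, %a        ; not (x ule a)
  %x.b.not = icmp ne i32 %y, %b         ; not (y eq b)
  %t.1 = and i1 %x.b.not, %y.b.not      ; not (x.b or y.b), by De Morgan
  %z = call i64 @llvm.amdgcn.ballot.i64(i1 %t.1)
  store i64 %z, ptr addrspace(1) %out
  ret void
}
```

This lines up with the v_cmp_gt_u32 / v_cmp_ne_u32 / s_and_b64 sequence in the checks. It suggests that an or feeding an xor with -1 only survives to instruction selection when its operands are not single-use compares that can be inverted.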
https://github.com/llvm/llvm-project/pull/130264