[PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
Mirko Brkusanin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 10 04:58:12 PDT 2020
mbrkusanin added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1053-1054
+
+ Optional<ValueAndVReg> Arg =
+ getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true);
+
----------------
arsenm wrote:
> I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through
Unfortunately regular version fails to produce the value.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:11-12
+; CHECK: ; %bb.0:
+; CHECK-NEXT: s_mov_b32 s0, 0
+; CHECK-NEXT: s_mov_b32 s1, 0
+; CHECK-NEXT: ; return to shader part epilog
----------------
arsenm wrote:
> This can be one s_mov_b64
It can, but SIFoldOperands will not let that happen.
From:
%10:sreg_64 = S_MOV_B64 0
%3:sreg_32 = COPY %10.sub0:sreg_64
%4:sreg_32 = COPY %10.sub1:sreg_64
plus some instructions that use %3, %4 but will eventually be removed.
SIFoldOperands will produce:
%10:sreg_64 = S_MOV_B64 0
%3:sreg_32 = S_MOV_B32 0
%4:sreg_32 = S_MOV_B32 0
...
which makes the first instruction dead and in the end we're left with two S_MOV_B32.
For example bellow with exec, AMDGPU::sub0_sub1 seems to do the trick but I don't see anything similar for immediate opreands.
Alternatively we can produce
v_cmp_ne_u32_e64 s[0:1], 0, 0
if for whatever reason that is more preferable then
s_mov_b32 s0, 0
s_mov_b32 s1, 0
Anyway, this is not an issue with selecting ballot. Following example has the same issue:
```
define amdgpu_cs i64 @si_fold_constants_i64() {
%x = add i64 0, 0
ret i64 %x
}
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83214/new/
https://reviews.llvm.org/D83214
More information about the llvm-commits
mailing list