[PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot

Fri Jul 10 04:58:12 PDT 2020

mbrkusanin added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1053-1054
+
+  Optional<ValueAndVReg> Arg =
+      getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true);
+
----------------
arsenm wrote:
> I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through
Unfortunately, the regular version fails to produce the value.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:11-12
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s1, 0
+; CHECK-NEXT:    ; return to shader part epilog
----------------
arsenm wrote:
> This can be one s_mov_b64
It can, but SIFoldOperands will not let that happen.

From:
  %10:sreg_64 = S_MOV_B64 0
  %3:sreg_32 = COPY %10.sub0:sreg_64
  %4:sreg_32 = COPY %10.sub1:sreg_64
  plus some instructions that use %3, %4 but will eventually be removed.

SIFoldOperands will produce:
  %10:sreg_64 = S_MOV_B64 0
  %3:sreg_32 = S_MOV_B32 0
  %4:sreg_32 = S_MOV_B32 0
  ...

which makes the first instruction dead, and in the end we're left with two S_MOV_B32 instructions.
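
To make the sequence above concrete, here is a minimal toy model in C++. It does not use any LLVM API: ToyInst, ToyBlock, foldImmediateCopies, removeDeadDefs and buildExample are hypothetical names invented for illustration, and the RETURN pseudo-instruction is an assumption standing in for whatever keeps %3 and %4 alive. It only sketches the fold-then-dead-code-removal interaction described above, not what SIFoldOperands actually implements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not LLVM's MachineInstr).
struct ToyInst {
  std::string Opcode;    // "S_MOV_B64", "S_MOV_B32", "COPY" or "RETURN"
  int Def;               // virtual register defined, or -1 if none
  std::vector<int> Uses; // virtual registers read
  uint64_t Imm;          // immediate operand (only meaningful for S_MOV_*)
  int SubIdx;            // for COPY: 0 reads sub0, 1 reads sub1
};

using ToyBlock = std::vector<ToyInst>;

// Returns the index of the instruction defining Reg, or -1 if there is none.
static int findDef(const ToyBlock &B, int Reg) {
  for (std::size_t Idx = 0; Idx < B.size(); ++Idx)
    if (B[Idx].Def == Reg)
      return static_cast<int>(Idx);
  return -1;
}

// Rewrites each subregister COPY of a 64-bit immediate move into a 32-bit
// immediate move of the matching half. The 64-bit move itself is left alone.
void foldImmediateCopies(ToyBlock &B) {
  for (ToyInst &I : B) {
    if (I.Opcode != "COPY" || I.Uses.size() != 1)
      continue;
    int DefIdx = findDef(B, I.Uses[0]);
    if (DefIdx < 0 || B[DefIdx].Opcode != "S_MOV_B64")
      continue;
    assert(I.SubIdx == 0 || I.SubIdx == 1);
    uint64_t Wide = B[DefIdx].Imm;
    I.Imm = (I.SubIdx == 0) ? (Wide & 0xffffffffu) : (Wide >> 32);
    I.Opcode = "S_MOV_B32";
    I.Uses.clear(); // the copy no longer reads the 64-bit register
  }
}

// Repeatedly erases instructions whose defined register has no remaining use.
// Instructions without a def (here: RETURN) are treated as having side effects.
void removeDeadDefs(ToyBlock &B) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t Idx = 0; Idx < B.size(); ++Idx) {
      if (B[Idx].Def < 0)
        continue;
      bool Used = false;
      for (const ToyInst &J : B)
        for (int R : J.Uses)
          if (R == B[Idx].Def)
            Used = true;
      if (!Used) {
        B.erase(B.begin() + static_cast<std::ptrdiff_t>(Idx));
        Changed = true;
        break; // restart the scan after modifying the block
      }
    }
  }
}

// The input from the comment: one S_MOV_B64 0 and two subregister copies.
ToyBlock buildExample() {
  return {
      {"S_MOV_B64", 10, {}, 0, 0},
      {"COPY", 3, {10}, 0, 0},
      {"COPY", 4, {10}, 0, 1},
      {"RETURN", -1, {3, 4}, 0, 0},
  };
}
```

Running foldImmediateCopies and then removeDeadDefs on buildExample() leaves the two S_MOV_B32 instructions and the RETURN. The S_MOV_B64 is erased because nothing reads %10 any more.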

For the example below with exec, AMDGPU::sub0_sub1 seems to do the trick, but I don't see anything similar for immediate operands.
Alternatively, we can produce
  v_cmp_ne_u32_e64 s[0:1], 0, 0
if, for whatever reason, that is preferable to
  s_mov_b32 s0, 0
  s_mov_b32 s1, 0

Anyway, this is not an issue with selecting ballot. The following example has the same issue:

```
define amdgpu_cs i64 @si_fold_constants_i64() {
  %x = add i64 0, 0
  ret i64 %x
}
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83214/new/

https://reviews.llvm.org/D83214