[PATCH] D83214: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 10 10:34:41 PDT 2020


arsenm accepted this revision.
arsenm added inline comments.
This revision is now accepted and ready to land.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:1053-1054
+
+  Optional<ValueAndVReg> Arg =
+      getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true);
+
----------------
mbrkusanin wrote:
> arsenm wrote:
> > I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through
> Unfortunately regular version fails to produce the value.
>  
I think something probably went wrong here due to us not trying to do anything resembling optimization during/after RegBankSelect. When we do that, we can probably remove a lot of these


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll:11-12
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s1, 0
+; CHECK-NEXT:    ; return to shader part epilog
----------------
mbrkusanin wrote:
> arsenm wrote:
> > This can be one s_mov_b64
> It can, but SIFoldOperands will not let that happen.
> 
> From:
>   %10:sreg_64 = S_MOV_B64 0
>   %3:sreg_32 = COPY %10.sub0:sreg_64
>   %4:sreg_32 = COPY %10.sub1:sreg_64
>   plus some instructions that use %3, %4 but will eventually be removed.
> 
> SIFoldOperands will produce:
>   %10:sreg_64 = S_MOV_B64 0
>   %3:sreg_32 = S_MOV_B32 0
>   %4:sreg_32 = S_MOV_B32 0
>   ...
> 
> which makes the first instruction dead and in the end we're left with two S_MOV_B32.
> 
> For example bellow with exec, AMDGPU::sub0_sub1 seems to do the trick but I don't see anything similar for immediate opreands.
> Alternatively we can produce 
>   v_cmp_ne_u32_e64 s[0:1], 0, 0 
> if for whatever reason that is more preferable then
>   s_mov_b32 s0, 0
>   s_mov_b32 s1, 0
> 
> Anyway, this is not an issue with selecting ballot. Following example has the same issue:
> 
> ```
> define amdgpu_cs i64 @si_fold_constants_i64() {
>   %x = add i64 0, 0
>   ret i64 %x
> }
> ```
I guess this is another bug to solve. Can you file that somewhere? We shouldn't be trying to workaround it in the selector


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83214/new/

https://reviews.llvm.org/D83214





More information about the llvm-commits mailing list