[PATCH] D94746: [AMDGPU] Move kill lowering to WQM pass and add live mask tracking

Fri Jan 15 18:22:54 PST 2021

critson added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll:87
+; GCN: v_cmp_lg_f32
 define amdgpu_gs void @oeq(float %a) {
   %c1 = fcmp oeq float %a, 0.0
----------------
piotr wrote:
> The generated code for this test (and a few others) is slightly unexpected (all three patches combined):
> 
> Before:
>         v_cmpx_lt_f32_e32 vcc, 0, v0
> 
> After:
>         v_cmp_gt_f32_e32 vcc, 0, v0
>         s_andn2_b64 exec, exec, vcc
>         s_andn2_b64 exec, exec, vcc
What is happening is that the live mask update and the exec update end up using the same register, and the shader is marked GS.

Post WQM:
```
// live mask generated:
%3:sreg_64 = COPY $exec

// kill: 
%0:vgpr_32 = COPY $vgpr0
V_CMP_GT_F32_e32 0, %0:vgpr_32, implicit-def $vcc, implicit $mode, implicit $exec

// live mask update:
dead %3:sreg_64 = S_ANDN2_B64 %3:sreg_64, $vcc, implicit-def $scc
SI_EARLY_TERMINATE_SCC0 implicit $exec, implicit $scc

// kill implemented:
$exec = S_ANDN2_B64 $exec, $vcc, implicit-def $scc
```

Here SI_EARLY_TERMINATE_SCC0 generates no code because the test shader is marked amdgpu_gs.  I think the test shaders are too trivial to be representative of real code generation.  I added the sendmsg because otherwise these shaders optimise away to nothing.  It could still be reasonable to add a peephole to clean up these redundant updates.
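
To illustrate why the second update is redundant, here is a minimal sketch in Python that models the post-WQM sequence as plain 64-bit mask arithmetic. It is not LLVM code: `s_andn2_b64` and `lower_kill` are hypothetical helper names, and the model assumes the live mask is only ever a copy of exec, as in this trivial test.

```python
MASK64 = (1 << 64) - 1

def s_andn2_b64(a, b):
    """Model S_ANDN2_B64: returns (a & ~b, scc), with scc set when the result is non-zero."""
    result = a & ~b & MASK64
    return result, result != 0

def lower_kill(exec_mask, vcc):
    """Model the sequence above; lanes set in vcc are the ones being killed."""
    live_mask = exec_mask                          # %3 = COPY $exec
    live_mask, scc = s_andn2_b64(live_mask, vcc)   # live mask update
    # SI_EARLY_TERMINATE_SCC0 would test scc here; it emits no code for amdgpu_gs.
    exec_mask, _ = s_andn2_b64(exec_mask, vcc)     # kill implemented
    return live_mask, exec_mask

live, ex = lower_kill(0b1111, 0b0101)
# Both updates compute the same value when the live mask is just a copy of exec.
print(live == ex)
# Once both values share the exec register, the second S_ANDN2 changes nothing.
print(s_andn2_b64(ex, 0b0101)[0] == ex)
```

Under that assumption the two S_ANDN2_B64 results are identical, and once they share a register the second one is a no-op, which is the pattern a peephole could remove.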

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94746/new/

https://reviews.llvm.org/D94746