[PATCH] D94746: [AMDGPU] Move kill lowering to WQM pass and add live mask tracking

Mon Jan 18 05:58:49 PST 2021

piotr added a comment.

LGTM with a few more nits.

Being able to query live lanes at any point in the shader makes sense to me and I really like the removal of SI_KILL_CLEANUP.

================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:862
+    LIS->createAndComputeVirtRegInterval(TmpReg);
+  if (LiveMaskWQM)
+    LIS->createAndComputeVirtRegInterval(LiveMaskWQM);
----------------
LiveMaskWQM unused.

================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1043

-void SIWholeQuadMode::processBlock(MachineBasicBlock &MBB, unsigned LiveMaskReg,
-                                   bool isEntry) {
+void SIWholeQuadMode::processBlock(MachineBasicBlock &MBB, bool isEntry) {
   auto BII = Blocks.find(&MBB);
----------------
Nit: I tend to agree with the clang-tidy warning. 's/isEntry/IsEntry/' for consistency?

================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1329

-    lowerLiveMaskQueries(LiveMaskReg);
+  // Shader is simple, only needs WQM or WWM
+  if (!(GlobalFlags & (StateWQM | StateWWM)) && LowerToCopyInstrs.empty() &&
----------------
Is the comment up-to-date? Did you mean: "does not need WQM nor WWM"?

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.kill.ll:87
+; GCN: v_cmp_lg_f32
 define amdgpu_gs void @oeq(float %a) {
   %c1 = fcmp oeq float %a, 0.0
----------------
critson wrote:
> piotr wrote:
> > The generated code for this test (and a few others) is slightly unexpected (all three patches combined):
> > 
> > Before:
> >         v_cmpx_lt_f32_e32 vcc, 0, v0
> > 
> > After:
> >         v_cmp_gt_f32_e32 vcc, 0, v0
> >         s_andn2_b64 exec, exec, vcc
> >         s_andn2_b64 exec, exec, vcc
> So what is happening is the mask update and the exec update use the same register, and shader is marked GS.
> 
> Post WQM:
> ```
> // live mask generated:
> %3:sreg_64 = COPY $exec
> 
> // kill: 
> %0:vgpr_32 = COPY $vgpr0
> V_CMP_GT_F32_e32 0, %0:vgpr_32, implicit-def $vcc, implicit $mode, implicit $exec
> 
> // live mask update:
> dead %3:sreg_64 = S_ANDN2_B64 %3:sreg_64, $vcc, implicit-def $scc
> SI_EARLY_TERMINATE_SCC0 implicit $exec, implicit $scc
> 
> // kill implemented:
> $exec = S_ANDN2_B64 $exec, $vcc, implicit-def $scc
> ```
> 
> Here the SI_EARLY_TERMINATE_SCC0 generates nothing because the test shader is marked amdgpu_gs.  I think the test shaders are too trivial to be representative of real code generation.  I added the sendmsg otherwise these shaders optimise away to nothing.  It still could be reasonable to add a peephole to clean these up.
> 
Fair enough. Thanks for checking.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94746/new/

https://reviews.llvm.org/D94746