[PATCH] D105192: [AMDGPU] Optimize VGPR LiveRange in waterfall loops

Ruiling, Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 5 18:23:46 PDT 2021


ruiling added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:422
 
+void SILowerControlFlow::emitWaterfallLoop(MachineInstr &MI) {
+  MachineBasicBlock &MBB = *MI.getParent();
----------------
You can move the SI_WATERFALL_LOOP changes into a separate patch.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:479
+    if (Pred == Loop)
+      PHI.addReg(UndefReg, RegState::Undef | RegState::Kill).addMBB(Pred);
+    else
----------------
the `RegState::Kill` is unnecessary. we don't need to track Undef register.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:494-496
+  LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);
+  LiveVariables::VarInfo &NewVarInfo = LV->getVarInfo(NewReg);
+  LiveVariables::VarInfo &UndefVarInfo = LV->getVarInfo(UndefReg);
----------------
should be:
```
  LiveVariables::VarInfo &NewVarInfo = LV->getVarInfo(NewReg);
  LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);
```
the getVarInfo(NewReg) may trigger memory reallocation of `VirtRegInfo` which invalidate `OldVarInfo`. You can extract the test_indirect_call_vgpr_ptr_arg_and_reuse into a separate file and run the llc command to reproduce the issue.
The `UndefVarInfo` is not needed.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:500-514
+  NewVarInfo.AliveBlocks.set(Loop->getNumber());
+
+  UndefVarInfo.Kills.push_back(PHI);
+
+  // Transfer the possible Kills in the Loop from Reg to NewReg
+  auto I = OldVarInfo.Kills.begin();
+  while (I != OldVarInfo.Kills.end()) {
----------------
I think `NewVarInfo.AliveBlocks.set(Loop->getNumber());` is incorrect. the NewReg is not alive through the whole range of the waterfall loop, it is defined at the PHI instruction and Killed at the last use in the waterfall loop.
`OldVarInfo.AliveBlocks.reset(Loop->getNumber());` only works for cases excluding Case 1 and Case 2. For these two cases I listed before, there should not be "liveness hole" after the transformation.
I would suggest you not to do this optimization if you find the register def-use match these two cases as currently it does not bring any benefit, you can check if the register `isLivein` the successor of waterfall-loop-block.
I am sorry I disabled the verifier after this pass for some other issue. you can enable it to help you catch wrong LiveVariable info.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:504
+
+  // Transfer the possible Kills in the Loop from Reg to NewReg
+  auto I = OldVarInfo.Kills.begin();
----------------
Did you see such case happen? generally if a value is defined before entering a loop, the use inside the loop will not be treated as kill because later loop iterations still need to access the value. I guess the code is just useless?


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:507
+  // The optimized Reg is not alive through the Loop block anymore.
+  OldVarInfo.AliveBlocks.reset(Loop->getNumber());
+  NewVarInfo.AliveBlocks.set(Loop->getNumber());
----------------
sebastian-ne wrote:
> ruiling wrote:
> > As you just replace the uses of OldReg within the waterfall loop, the OldReg may still be possibly alive in the waterfall loop. Consider the cases:
> > 1. OldReg defined outside of a outer_loop that encloses waterfall loop.
> > 
> > ```
> > def(OldReg)
> > 
> > outer_loop:
> >   use(OldReg)   // a use here require us to keep the OldReg alive through the whole range of outer_loop.
> >   br waterfall_loop
> > 
> > waterfall_loop:
> >   ...
> >   brcond waterfall_loop, outer_loop_end_bb
> > 
> > outer_loop_end_bb:
> >   ...
> >   brcond outer_loop
> > ```
> > 
> > 2. the OldReg is still used after the waterfall loop:
> > 
> > ```
> >   def(OldReg)
> >   ...
> > 
> > waterfall_loop:
> >   ...
> >   brcond waterfall_loop
> > 
> > after_waterfall_loop_bb:
> >   use(OldReg)
> > ```
> > 
> > In the above two listed cases, the OldReg should still be alive through the waterfall loop.
> > I think we cannot reuse the physical register of OldReg in the waterfall loop. Or did I miss something?
> > 
> Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
> (And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)
> Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
> (And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)

I think the remaining COPY inside the loop cannot be solved in this pass, we need to solve it using some other way at later steps. You can add a TODO in the test saying that the copy inside the waterfall loop could be removed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105192/new/

https://reviews.llvm.org/D105192



More information about the llvm-commits mailing list