[PATCH] D105192: [AMDGPU] Optimize VGPR LiveRange in waterfall loops
Ruiling, Song via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 5 18:23:46 PDT 2021
ruiling added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:422
+void SILowerControlFlow::emitWaterfallLoop(MachineInstr &MI) {
+ MachineBasicBlock &MBB = *MI.getParent();
----------------
You can move the SI_WATERFALL_LOOP changes into a separate patch.
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:479
+ if (Pred == Loop)
+ PHI.addReg(UndefReg, RegState::Undef | RegState::Kill).addMBB(Pred);
+ else
----------------
the `RegState::Kill` is unnecessary. we don't need to track Undef register.
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:494-496
+ LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);
+ LiveVariables::VarInfo &NewVarInfo = LV->getVarInfo(NewReg);
+ LiveVariables::VarInfo &UndefVarInfo = LV->getVarInfo(UndefReg);
----------------
should be:
```
LiveVariables::VarInfo &NewVarInfo = LV->getVarInfo(NewReg);
LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);
```
the getVarInfo(NewReg) may trigger memory reallocation of `VirtRegInfo` which invalidate `OldVarInfo`. You can extract the test_indirect_call_vgpr_ptr_arg_and_reuse into a separate file and run the llc command to reproduce the issue.
The `UndefVarInfo` is not needed.
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:500-514
+ NewVarInfo.AliveBlocks.set(Loop->getNumber());
+
+ UndefVarInfo.Kills.push_back(PHI);
+
+ // Transfer the possible Kills in the Loop from Reg to NewReg
+ auto I = OldVarInfo.Kills.begin();
+ while (I != OldVarInfo.Kills.end()) {
----------------
I think `NewVarInfo.AliveBlocks.set(Loop->getNumber());` is incorrect. the NewReg is not alive through the whole range of the waterfall loop, it is defined at the PHI instruction and Killed at the last use in the waterfall loop.
`OldVarInfo.AliveBlocks.reset(Loop->getNumber());` only works for cases excluding Case 1 and Case 2. For these two cases I listed before, there should not be "liveness hole" after the transformation.
I would suggest you not to do this optimization if you find the register def-use match these two cases as currently it does not bring any benefit, you can check if the register `isLivein` the successor of waterfall-loop-block.
I am sorry I disabled the verifier after this pass for some other issue. you can enable it to help you catch wrong LiveVariable info.
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:504
+
+ // Transfer the possible Kills in the Loop from Reg to NewReg
+ auto I = OldVarInfo.Kills.begin();
----------------
Did you see such case happen? generally if a value is defined before entering a loop, the use inside the loop will not be treated as kill because later loop iterations still need to access the value. I guess the code is just useless?
================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:507
+ // The optimized Reg is not alive through the Loop block anymore.
+ OldVarInfo.AliveBlocks.reset(Loop->getNumber());
+ NewVarInfo.AliveBlocks.set(Loop->getNumber());
----------------
sebastian-ne wrote:
> ruiling wrote:
> > As you just replace the uses of OldReg within the waterfall loop, the OldReg may still be possibly alive in the waterfall loop. Consider the cases:
> > 1. OldReg defined outside of a outer_loop that encloses waterfall loop.
> >
> > ```
> > def(OldReg)
> >
> > outer_loop:
> > use(OldReg) // a use here require us to keep the OldReg alive through the whole range of outer_loop.
> > br waterfall_loop
> >
> > waterfall_loop:
> > ...
> > brcond waterfall_loop, outer_loop_end_bb
> >
> > outer_loop_end_bb:
> > ...
> > brcond outer_loop
> > ```
> >
> > 2. the OldReg is still used after the waterfall loop:
> >
> > ```
> > def(OldReg)
> > ...
> >
> > waterfall_loop:
> > ...
> > brcond waterfall_loop
> >
> > after_waterfall_loop_bb:
> > use(OldReg)
> > ```
> >
> > In the above two listed cases, the OldReg should still be alive through the waterfall loop.
> > I think we cannot reuse the physical register of OldReg in the waterfall loop. Or did I miss something?
> >
> Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
> (And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)
> Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
> (And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)
I think the remaining COPY inside the loop cannot be solved in this pass, we need to solve it using some other way at later steps. You can add a TODO in the test saying that the copy inside the waterfall loop could be removed.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105192/new/
https://reviews.llvm.org/D105192
More information about the llvm-commits
mailing list