[PATCH] D105192: [AMDGPU] Optimize VGPR LiveRange in waterfall loops

Sebastian Neubauer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 6 02:30:15 PDT 2021


sebastian-ne added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:507
+  // The optimized Reg is not alive through the Loop block anymore.
+  OldVarInfo.AliveBlocks.reset(Loop->getNumber());
+  NewVarInfo.AliveBlocks.set(Loop->getNumber());
----------------
ruiling wrote:
> sebastian-ne wrote:
> > ruiling wrote:
> > > As you just replace the uses of OldReg within the waterfall loop, the OldReg may still be possibly alive in the waterfall loop. Consider the cases:
> > > 1. OldReg defined outside of a outer_loop that encloses waterfall loop.
> > > 
> > > ```
> > > def(OldReg)
> > > 
> > > outer_loop:
> > >   use(OldReg)   // a use here require us to keep the OldReg alive through the whole range of outer_loop.
> > >   br waterfall_loop
> > > 
> > > waterfall_loop:
> > >   ...
> > >   brcond waterfall_loop, outer_loop_end_bb
> > > 
> > > outer_loop_end_bb:
> > >   ...
> > >   brcond outer_loop
> > > ```
> > > 
> > > 2. the OldReg is still used after the waterfall loop:
> > > 
> > > ```
> > >   def(OldReg)
> > >   ...
> > > 
> > > waterfall_loop:
> > >   ...
> > >   brcond waterfall_loop
> > > 
> > > after_waterfall_loop_bb:
> > >   use(OldReg)
> > > ```
> > > 
> > > In the above two listed cases, the OldReg should still be alive through the waterfall loop.
> > > I think we cannot reuse the physical register of OldReg in the waterfall loop. Or did I miss something?
> > > 
> > Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
> > (And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)
> > Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
> > (And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)
> 
> I think the remaining COPY inside the loop cannot be solved in this pass, we need to solve it using some other way at later steps. You can add a TODO in the test saying that the copy inside the waterfall loop could be removed.
> I think the remaining COPY inside the loop cannot be solved in this pass, we need to solve it using some other way at later steps. You can add a TODO in the test saying that the copy inside the waterfall loop could be removed.

I agree.
I misunderstood my own test, `@test_indirect_call_vgpr_ptr_arg_and_reuse` shouldn’t have an unnecessary COPY, I added a new one (`@test_indirect_call_vgpr_ptr_arg_and_return`) that has an unnecessary COPY.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105192/new/

https://reviews.llvm.org/D105192



More information about the llvm-commits mailing list