[PATCH] D105192: [AMDGPU] Optimize VGPR LiveRange in waterfall loops

Sebastian Neubauer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 1 02:24:12 PDT 2021


sebastian-ne added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:309
+      if (MO.readsReg() && IndRegs.count(MOReg) == 0) {
+        // Do not collect the induction variable
+        if (MI.getOpcode() == AMDGPU::V_READFIRSTLANE_B32) {
----------------
ruiling wrote:
> why the source operand of v_readfirstlane cannot be optimized?
You’re right, it can be optimized. I confused myself when looking at the test cases and at the readfirstlane, but all lanes that could have changed that physical register are not active anymore in the next iteration, so the readfirstlane will still return the original value.


================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:507
+  // The optimized Reg is not alive through the Loop block anymore.
+  OldVarInfo.AliveBlocks.reset(Loop->getNumber());
+  NewVarInfo.AliveBlocks.set(Loop->getNumber());
----------------
ruiling wrote:
> As you just replace the uses of OldReg within the waterfall loop, the OldReg may still be possibly alive in the waterfall loop. Consider the cases:
> 1. OldReg defined outside of a outer_loop that encloses waterfall loop.
> 
> ```
> def(OldReg)
> 
> outer_loop:
>   use(OldReg)   // a use here require us to keep the OldReg alive through the whole range of outer_loop.
>   br waterfall_loop
> 
> waterfall_loop:
>   ...
>   brcond waterfall_loop, outer_loop_end_bb
> 
> outer_loop_end_bb:
>   ...
>   brcond outer_loop
> ```
> 
> 2. the OldReg is still used after the waterfall loop:
> 
> ```
>   def(OldReg)
>   ...
> 
> waterfall_loop:
>   ...
>   brcond waterfall_loop
> 
> after_waterfall_loop_bb:
>   use(OldReg)
> ```
> 
> In the above two listed cases, the OldReg should still be alive through the waterfall loop.
> I think we cannot reuse the physical register of OldReg in the waterfall loop. Or did I miss something?
> 
Case 2 is what `@test_indirect_call_vgpr_ptr_arg_and_reuse` tests. The generated assembly doesn’t change with or without this patch, because the RegisterCoalescer merges the OldReg and NewReg.
(And I think it can only get better if they are not merged; then the COPY inside the loop should be omitted like for the cases where OldReg is unused after the waterfall loop.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105192/new/

https://reviews.llvm.org/D105192



More information about the llvm-commits mailing list