[PATCH] D119475: [AMDGPU] Add scheduler pass to rematerialize trivial defs
    Vang Thao via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Tue Feb 15 14:22:10 PST 2022
    
    
  
vangthao added a comment.
In D119475#3321057 <https://reviews.llvm.org/D119475#3321057>, @rampitec wrote:
> Actually you also need to update RegionBegin and RegionEnd for the block where you are erasing def in case if def was the first or the last instruction (do you have a test like this?)
Will add a test.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:799
+  // occupancy. Keep the modified live-in set from second check.
+  if (!ImprovedAfterSinkingLiveThrus) {
+    // Keep a list of newly rematerialized instructions so that we can easily
----------------
rampitec wrote:
> vangthao wrote:
> > rampitec wrote:
> > > vangthao wrote:
> > > > rampitec wrote:
> > > > > vangthao wrote:
> > > > > > rampitec wrote:
> > > > > > > vangthao wrote:
> > > > > > > > rampitec wrote:
> > > > > > > > > I do not think you really need a whole block above dealing with live-through estimations. First it is not guaranteed to help the RP if you sink a live-through. Second it is essentially the same as this block actually performing rematerialization, just less precise. Keep just rematerialization part.
> > > > > > > > If the live-through is increasing RP in this block then by rematerializing don't we decrease RP by the live-throughs that we rematerialized?
> > > > > > > I do not think LICM will hoist anything like that. Then even if so the code below will handle it as well.
> > > > > > The code below only checks for trivially rematerializable defs used in the high RP block. If the def is live-through and used in a low RP block then it will not be checked. This checks for that and sinks those defs to lower RP blocks so we can decrease live-through RP.
> > > > > I see 2 situations where something sinkable can be live-through:
> > > > > 
> > > > > 1)
> > > > > ```
> > > > > Def
> > > > > loop:
> > > > >   ...
> > > > > cbr loop
> > > > > Use
> > > > > ```
> > > > > There is absolutely no reason for any pass to hoist that Def high, most likely it will be sunk much earlier.
> > > > > 
> > > > > 2)
> > > > > ```
> > > > > Def
> > > > > loop1:
> > > > >   loop2:
> > > > >     Use
> > > > >   cbr loop2
> > > > > cbr loop1
> > > > > ```
> > > > > Assuming you have a high live-through pressure in loop1 sinking Def to Use into loop2 will unlikely help it as the highest pressure is likely in the loop2 anyway.
> > > > What about this situation? The def is live throughout the whole loop since it is needed in each iteration.
> > > > 
> > > > ```
> > > > Def
> > > > loop1:
> > > >   ...
> > > >   ... (high rp block)
> > > >   ...
> > > >   use (lower rp block)
> > > >   cbr loop1
> > > > ```
> > > > 
> > > This probably can happen, I doubt it is a common situation though. But then I still do not understand why code below will not handle it? You are only collecting instructions where defined register is in the live-in pressure of the high RP block, so it will be collected. Then below you are sinking it to the use. I.e. to me it will do the same thing automatically.
> > We can handle both situations there but re-calculating RP is expensive so I split up the checks into two parts where we first check live-throughs then check defs that have a use in the high RP block.
> > 
> > For the first live-through check, it is simple to re-calculate RP since we can deduct RP from the cached pressure while the second check needs a RPTracker to go over the whole region each time we sink a def. If we can improve occupancy by only sinking live-throughs then that will reduce the amount of time we have to use RPTracker.
> Besides unnecessary code complication I think you will need to recalculate RP anyway. You are going to reschedule the block to which you are sinking the instruction and since you may have several other blocks on the way this recalculation will be needed.
> 
> Speaking of which... It looks like if you have blocks in between this code will leave cached LiveIns, MBBLiveIns and Pressure for these blocks in an incorrect state.
> 
> I really like to keep it simple, at least until there is a proof a complication is necessary. Then even if it is needed it is easier to implement, debug and review it in a small steps.
If this is the last pass, do we still want to fix the cached values for those blocks we are not re-scheduling?
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119475/new/
https://reviews.llvm.org/D119475
    
    
More information about the llvm-commits
mailing list