[PATCH] D99269: [AMDGPU] Unify spill code

Thu Apr 8 09:36:23 PDT 2021

scott.linder added a comment.

Thank you, I have one remaining nit concerning `RS` and `LiveRegs`, but otherwise everything LGTM.

================
Comment at: llvm/lib/Target/AMDGPU/SIFrameLowering.cpp:118
 // use.
-static void buildPrologSpill(const GCNSubtarget &ST, LivePhysRegs &LiveRegs,
-                             MachineBasicBlock &MBB,
----------------
sebastian-ne wrote:
> scott.linder wrote:
> > This seems to regress in terms of readability below, where instead of a call to `buildPrologSpill` we have a call to `buildPrologEpilogSpill(..., true)`, and the same for `buildEpilogReload`.
> > 
> > Can we name the new function `buildPrologSpillEpilogReloadImpl` and leave the old methods interface the same:
> > 
> > ```
> > static void buildPrologSpill(...) { return buildPrologSpillEpilogReloadImpl(..., true); }
> > static void buildEpilogReload(...) { return buildPrologSpillEpilogReloadImpl(..., false); }
> > ```
> > 
> > An argument against doing this may be that we already have `buildScratchExecCopy(..., true)` instead of `buildPrologScratchExecCopy` and `buildScratchExecCopy(..., false)` instead of `buildEpilogScratchExecCopy`, but I'd vote to do the same for those too.
> There are some other functions (`SIRegisterInfo::buildSpillLoadStore` and `SIRegisterInfo::buildSGPRSpillLoadStore`) that also take a boolean argument to switch between load and store. Now that I look at buildPrologEpilogSpill again, it’s quite short and the load/spill parts are different enough, so it makes sense to split them again.
> 
> I added comments to `buildScratchExecCopy`, does that look ok or should I add wrappers for prolog and epilog?
> There are some other functions (SIRegisterInfo::buildSpillLoadStore and SIRegisterInfo::buildSGPRSpillLoadStore) that also take a boolean argument to switch between load and store.

I think some amount of this is pragmatic and reasonable, I just start to find it harder to read when it gets used broadly enough. Also in the case of `buildSpillLoadStore` the boolean is capturing whether to `kill` the operand, although it seems like that would always be equivalent to whether `Opcode` is a load or a store?

> Now that I look at buildPrologEpilogSpill again, it’s quite short and the load/spill parts are different enough, so it makes sense to split them again.

This LGTM, thank you! The differences between the two are more obvious to me now than in the conditional, combined version.

> I added comments to buildScratchExecCopy, does that look ok or should I add wrappers for prolog and epilog?

Sorry, I only meant to justify my comments on your code, I didn't mean to ask for a change as part of this patch. The comment LGTM.

================
Comment at: llvm/lib/Target/AMDGPU/SIFrameLowering.cpp:831-832

-  if (!IsProlog)
-    LiveRegs.removeReg(ScratchExecCopy);

----------------
sebastian-ne wrote:
> scott.linder wrote:
> > This change seems unrelated; is this the bug fix? Can it be in a separate patch after the cleanup? I'd prefer the cleanup be as close to NFC as is reasonable, although I assume there is some change in behavior because we no longer call `findScratchNonCalleeSaveRegister`, we do whatever `buildSpillLoadRestore` does.
> > 
> > I also might need some help understanding what is happening here to be able to review it. I'm confused about why `LivePhysRegs::init` gets called with different arguments for the prolog case vs the epilog case, and why we `stepBackward` on an (ostensibly arbitrary?) instruction after adding LiveOuts in the epilog case.
> I split out the fix into D100098.
> I think the difference is, in the prolog we start in the beginning and track register liveness to the current insertion point (MBBI). In the epilog we start at the end and track liveness to the current insertion point. The result should be the same, but the epilog is probably at the end of the basic block while the prolog is at the beginning, so it’s cheaper to start from the end/start.
LGTM, thank you!

I think I understand now, and removing some of the redundant code helps.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:853-862
+    if (RS) {
       SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);
+    } else if (LiveRegs) {
+      for (MCRegister Reg : AMDGPU::SGPR_32RegClass) {
+        if (LiveRegs->available(MF->getRegInfo(), Reg)) {
+          SOffset = Reg;
+          break;
----------------
sebastian-ne wrote:
> scott.linder wrote:
> > Does `RS` already know about `LiveRegs`, or something equivalent to it? Or maybe `RS` cannot fail and return `Register()`?
> > 
> > If not, it seems like we should continue to try alternative ways to pick an `SOffset` until we either find one or run out of ways to try.
> Usually, `buildSpillLoadStore` is called to lower spill pseudos. At that point, we get a RegScavenger. During frame lowering (which did not call this function before), we do not have access to a RegScavenger as far as I know, so we use LiveRegs to find an unused register.
> So, when calling this function, either `RS` should be set (during frame index elimination), or `LiveRegs` (during frame lowering), but never both, as they do the same thing.
> 
> `RS` can fail if all SGPRs are live. Usually it would then try to spill a register to the emergency spill slot, but that doesn’t make sense if we are currently lowering a spill, so we forbid that with the `false` last argument.
Makes sense, but in that case could you add an `assert(!RS != !LiveRegs && "expect either RS or LiveRegs but not both")`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99269/new/

https://reviews.llvm.org/D99269