[PATCH] D130784: [AMDGPU] Support LDS spilling

Fri Aug 5 09:03:30 PDT 2022

piotr marked 6 inline comments as done.
piotr added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:893
+
+  ProgInfo.LDSSize = MFI->getLDSSize() + LDSSpillSize;
+
----------------
arsenm wrote:
> Does this need alignment padding up to 4?
Not sure (and the total size is a multiply of 4 by construction).

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:799-800
+unsigned GCNSubtarget::getLdsSpillLimitDwords(const MachineFunction &MF) const {
+  const Function &F = MF.getFunction();
+  return AMDGPU::getIntegerAttribute(F, "amdgpu-lds-spill-limit-dwords", 0);
+}
----------------
arsenm wrote:
> I don't see why we would need a new limit for this and just rely on the remaining LDS in the occupancy budget
For pixel shaders we would also need to make room for pixel parameters as they also reside in the same CU. 

================
Comment at: llvm/lib/Target/AMDGPU/SIFrameLowering.cpp:406
+// m0 for LDS spilling if possible.
+void SIFrameLowering::setupLDSSpilling(MachineFunction &MF,
+                                       MachineBasicBlock &MBB,
----------------
arsenm wrote:
> Could we do this in a post-RA pass before LiveIntervals is discarded? I was thinking we should copy what SC does and reserve more registers, and try to reallocate them in such a pass. The same place could have smarter management of m0
Doing it in a separate pass may work, but will need to explore it - need to check if I would have access to everything I need here. 

Good point about extensibility - I did not intend to do the smart m0 thing in the first implementation (left a FIXME), but it is true that to do that properly we would need kind of a data flow analysis so a separate pass would make sense in the long run.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:2565
+      auto OrigArg = Fn.getArg(Arg.getOrigArgIndex());
+      if (OrigArg->getName().equals("MultiDispatchInfo")) {
+        CCValAssign &VA = ArgLocs[i];
----------------
arsenm wrote:
> piotr wrote:
> > rampitec wrote:
> > > Matching argument by name is bad, especially if such name can be used by an user.
> > This is controlled by the front-end - graphics compute shaders do not have inputs that could be specified by a shader writer and end up landing here.
> > 
> > Having said that, I am not 100% happy with the way this is handled here either. I guess a better idea could be to rely on a function attribute set by the front-end that would say - "the user sgpr you want to use for multi dispatch info is at nth location". 
> > 
> > Just to note, input sgprs are handled differently between graphics and compute. I need to "just" preload WorkGroupInfo, so that I could later use that sgpr in the spill code. In kernels, it is treated as a system sgpr (allocateSystemSGPRs), so preloading it should not be a problem. However, in graphics we treat it as a user sgpr and pass in the list of arguments (with the name "MultiDispatchInfo"). 
> > 
> > 
> Argument names are totally meaningless. opt -strip will now break this, so you need something that doesn't rely on the name
Added an extra attr instead of matching by name.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130784/new/

https://reviews.llvm.org/D130784