[PATCH] D130784: [AMDGPU] Support LDS spilling

Thu Aug 4 19:16:27 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:890

-  ProgInfo.LDSSize = MFI->getLDSSize();
+  unsigned WorkGroupSize = STM.getFlatWorkGroupSizes(F).second;
+  unsigned LDSSpillSize = MFI->getLdsSpill().TotalSize * WorkGroupSize;
----------------
Rename to MaxWorkGroupSize

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:893
+
+  ProgInfo.LDSSize = MFI->getLDSSize() + LDSSpillSize;
+
----------------
Does this need alignment padding up to 4?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:799-800
+unsigned GCNSubtarget::getLdsSpillLimitDwords(const MachineFunction &MF) const {
+  const Function &F = MF.getFunction();
+  return AMDGPU::getIntegerAttribute(F, "amdgpu-lds-spill-limit-dwords", 0);
+}
----------------
I don't see why we would need a new limit for this and just rely on the remaining LDS in the occupancy budget

================
Comment at: llvm/lib/Target/AMDGPU/SIFrameLowering.cpp:406
+// m0 for LDS spilling if possible.
+void SIFrameLowering::setupLDSSpilling(MachineFunction &MF,
+                                       MachineBasicBlock &MBB,
----------------
Could we do this in a post-RA pass before LiveIntervals is discarded? I was thinking we should copy what SC does and reserve more registers, and try to reallocate them in such a pass. The same place could have smarter management of m0

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:2565
+      auto OrigArg = Fn.getArg(Arg.getOrigArgIndex());
+      if (OrigArg->getName().equals("MultiDispatchInfo")) {
+        CCValAssign &VA = ArgLocs[i];
----------------
piotr wrote:
> rampitec wrote:
> > Matching argument by name is bad, especially if such name can be used by an user.
> This is controlled by the front-end - graphics compute shaders do not have inputs that could be specified by a shader writer and end up landing here.
> 
> Having said that, I am not 100% happy with the way this is handled here either. I guess a better idea could be to rely on a function attribute set by the front-end that would say - "the user sgpr you want to use for multi dispatch info is at nth location". 
> 
> Just to note, input sgprs are handled differently between graphics and compute. I need to "just" preload WorkGroupInfo, so that I could later use that sgpr in the spill code. In kernels, it is treated as a system sgpr (allocateSystemSGPRs), so preloading it should not be a problem. However, in graphics we treat it as a user sgpr and pass in the list of arguments (with the name "MultiDispatchInfo"). 
> 
> 
Argument names are totally meaningless. opt -strip will now break this, so you need something that doesn't rely on the name

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130784/new/

https://reviews.llvm.org/D130784