[PATCH] D80282: [AMDGPU] Make SGPR spills exec mask agnostic

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 27 16:23:15 PDT 2020


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:870
+
+  Register ExecReg = AMDGPU::NoRegister;
+
----------------
critson wrote:
> arsenm wrote:
> > We're trying to eliminate use of NoRegister in favor of using the default Register constructor, so you don't need the initializer here (or explicit checks for it later in the function)
> I can get rid of the initializer, but I definitely need the explicit tests, as getMatchingSuperReg returns NoRegister, which must be handled correctly.
These are the same thing; getMatchingSuperReg should return MCRegister, and all of the unsigneds referring to registers should use Register.
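
Roughly what I mean, as a sketch (ValueReg, SubIdx and RC are just placeholders for whatever the surrounding code already has):

  // Default-construct rather than writing "= AMDGPU::NoRegister".
  Register ExecReg;

  // Register converts to bool, so the "nothing found" result from
  // getMatchingSuperReg can be tested directly instead of being compared
  // against NoRegister.
  Register SuperReg = getMatchingSuperReg(ValueReg, SubIdx, RC);
  if (!SuperReg) {
    // handle the no-super-register case
  }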


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:907-908
+  if (OnlyExecLo) {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::EXEC_LO)
+        .addImm(VGPRLanes);
+  } else {
----------------
critson wrote:
> arsenm wrote:
> > Could this just turn on all lanes for the memory spill? mov -1 is always free for code size, but some other bitmask may not be
> As mentioned above, the idea is that we try to only enable lanes that are used, in case there is some potential benefit.
> The only lane combinations which require additional encoding space are S256, S512 and S1024.
> Note that S1024 will not occur on Wave32, as we avoid filling all but half the lanes in the VGPR before calling this function.
> I would argue that the encoding space is probably not an issue, as the most common spills are S32, S64 and S128.
> Of course I am working without any tangible assessment of the benefit of disabling lanes; it might be that for >=S256 we should just set the mask to -1 anyway.
But is there an actual benefit? I don't think the hardware saves anything by having fewer lanes enabled, and I would rather not add complexity to handle a hypothetical case that doesn't provide a real benefit.
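
Concretely, I am suggesting something like this sketch (the OnlyExecLo split and the wave64 opcode are assumed from the surrounding code):

  // Turn on every lane for the memory spill. -1 is an inline constant on both
  // wave sizes, so it never costs an extra literal dword the way an arbitrary
  // lane mask can.
  if (OnlyExecLo) {
    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::EXEC_LO)
        .addImm(-1);
  } else {
    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B64), AMDGPU::EXEC)
        .addImm(-1);
  }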


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:916
+  MachineFrameInfo &FrameInfo = MF->getFrameInfo();
+  Align Alignment = FrameInfo.getObjectAlign(Index);
+  MachinePointerInfo PtrInfo =
----------------
critson wrote:
> arsenm wrote:
> > Should assert this isn't an SGPRSpill stack ID. You can't reuse the SGPR spill frame index for the real stack spill index
> Sorry, I am not totally clear on what you are saying. This will only get here if it is not an SGPR to VGPR spill slot (tested in spillSGPR / restoreSGPR).
It's hard to follow how all of the SGPR->VGPR spills work, and it doesn't hurt to be sure this is definitely not an SGPR frame index.
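
For example, a single assert along these lines would make the assumption explicit (FrameInfo and Index as in the surrounding code; the message wording is arbitrary):

  // We must not be reusing an SGPR->VGPR spill frame index as a real memory
  // spill slot.
  assert(FrameInfo.getStackID(Index) != TargetStackID::SGPRSpill &&
         "not expecting an SGPR spill frame index here");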


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:920-921
+  MachineMemOperand *MMO = MF->getMachineMemOperand(
+      PtrInfo, IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore,
+      EltSize, commonAlignment(Alignment, EltSize * Offset));
+
----------------
critson wrote:
> arsenm wrote:
> > I think the correct thing to do is to use the base alignment here and apply the offset with PtrInfo.getWithOffset
> Again, I am not clear on this. Can you explain further?
I mean the usage of the MachineMemOperand isn't correct with respect to alignment. It's supposed to be expressed as a base MMO alignment with an offset applied, whereas here you're computing the total alignment yourself. There are helpers for getting a memory operand with an offset applied.
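
In other words, something like this sketch, which keeps the base alignment on the MMO and folds the element offset into the pointer info (there is also a getMachineMemOperand overload taking an existing MMO plus an offset, which adjusts the alignment for you):

  MachineMemOperand *MMO = MF->getMachineMemOperand(
      PtrInfo.getWithOffset(EltSize * Offset),
      IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore, EltSize,
      Alignment);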


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80282/new/

https://reviews.llvm.org/D80282




