[PATCH] D80282: [AMDGPU] Make SGPR spills exec mask agnostic

Carl Ritson via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Thu May 21 04:50:16 PDT 2020


critson marked 14 inline comments as done.
critson added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:858-859
+      SuperReg == AMDGPU::EXEC || SuperReg == AMDGPU::EXEC_LO;
+  const unsigned ExecLoLane = SuperRegIsExec ? 0 : (isWave32 ? 16 : 32);
+  const unsigned ExecHiLane = SuperRegIsExec ? 1 : (isWave32 ? 17 : 33);
+
----------------
arsenm wrote:
> I don't understand these lane numbers; it picks the one past the midpoint lane?
Correct, because lanes before the midpoint may be used to store the SGPRs.
I have added a comment.
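
To illustrate, my understanding of the layout (a paraphrase of the comment, not its exact text):

    // Lanes [0 .. WaveSize/2) of the spill VGPR hold the SGPR data.
    // The saved copy of EXEC_LO/EXEC_HI lives just past the midpoint
    // (lanes 16/17 on Wave32, 32/33 on Wave64) so it cannot clash with
    // that data.  When EXEC itself is being spilled, lanes 0/1 are used
    // instead, since the spill data and the saved exec value coincide.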


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:867
+  // On Wave32 only handle EXEC_LO.
+  // On Wave64 preserve EXEC_HI if there is sufficient space for a copy.
+  bool OnlyExecLo = isWave32 || NumSubRegs == 1;
----------------
arsenm wrote:
> I'm not sure I understand only sometimes saving exec_hi
Made the comment clearer.
The idea is that we try to turn off all unused lanes for the load/store on the assumption that this /may/ have some benefit (performance or power).  However, if we do not have space to save a copy of EXEC_HI, it is safe to simply leave the upper lanes in their current state and only adjust EXEC_LO.
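
Roughly, the reasoning behind the flag (a sketch, not the exact comment in the patch):

    // Only save/restore EXEC_LO when:
    //  - running Wave32, where EXEC_HI does not exist, or
    //  - there is no room in the VGPR to keep a copy of EXEC_HI; in that
    //    case the upper lanes are simply left in their current state.
    bool OnlyExecLo = isWave32 || NumSubRegs == 1;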


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:870
+
+  Register ExecReg = AMDGPU::NoRegister;
+
----------------
arsenm wrote:
> We're trying to eliminate use of NoRegister in favor of using the default Register constructor, so you don't need the initializer here (or explicit checks for it later in the function)
I can get rid of the initializer, but I definitely need the explicit tests, as getMatchingSuperReg returns NoRegister, which must be handled correctly.
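
Concretely, the shape I have in mind (register class and variable names here are purely illustrative):

    Register SavedExecReg;   // default-constructed; no NoRegister initializer needed
    SavedExecReg = getMatchingSuperReg(SomeSGPR, AMDGPU::sub0,
                                       &AMDGPU::SReg_64RegClass);
    if (!SavedExecReg) {
      // getMatchingSuperReg found nothing; this still has to be handled.
    }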


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:906-912
+  if (OnlyExecLo) {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::EXEC_LO)
+        .addImm(VGPRLanes);
+  } else {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B64), AMDGPU::EXEC)
+        .addImm(VGPRLanes);
+  }
----------------
arsenm wrote:
> What other places do is have ExecMovOpc and ExecReg set from isWave32 and unify the BuildMI calls
I have simplified the code along these lines.
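
That is, the two BuildMI calls collapse into roughly (a sketch):

    const unsigned ExecMovOpc = OnlyExecLo ? AMDGPU::S_MOV_B32
                                           : AMDGPU::S_MOV_B64;
    const Register ExecReg = OnlyExecLo ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
    BuildMI(*MBB, MI, DL, TII->get(ExecMovOpc), ExecReg).addImm(VGPRLanes);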


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:907-908
+  if (OnlyExecLo) {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::EXEC_LO)
+        .addImm(VGPRLanes);
+  } else {
----------------
arsenm wrote:
> Could this just turn on all lanes for the memory spill? mov -1 is always free for code size, but some other bitmask may not be
As mentioned above, the idea is that we try to enable only the lanes that are used, in case there is some potential benefit.
The only lane combinations which require additional encoding space are S256, S512 and S1024.
Note that S1024 will not occur on Wave32, as we avoid filling more than half the lanes of the VGPR before calling this function.
I would argue that the encoding space is probably not an issue, as the most common spills are S32, S64 and S128.
Of course, I am working without any tangible assessment of the benefit of disabling lanes; it might be that for >=S256 we should just set the mask to -1 anyway.
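
For reference, assuming VGPRLanes is simply the low NumSubRegs bits set (one lane per 32-bit sub-register), the masks in question are:

    //   S32  -> 0x1    S64  -> 0x3     S128  -> 0xF         (inline constants)
    //   S256 -> 0xFF   S512 -> 0xFFFF  S1024 -> 0xFFFFFFFF  (need a 32-bit literal)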


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:916
+  MachineFrameInfo &FrameInfo = MF->getFrameInfo();
+  Align Alignment = FrameInfo.getObjectAlign(Index);
+  MachinePointerInfo PtrInfo =
----------------
arsenm wrote:
> Should assert this isn't an SGPRSpill stack ID. You can't reuse the SGPR spill frame index for the real stack spill index
Sorry, I am not totally clear on what you are saying.  This path is only reached when the slot is not an SGPR-to-VGPR spill slot (that is checked in spillSGPR / restoreSGPR).
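
If the intent is an assert along these lines, I am happy to add one (sketch):

    assert(FrameInfo.getStackID(Index) != TargetStackID::SGPRSpill &&
           "should not see an SGPR-to-VGPR spill slot here");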


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:920-921
+  MachineMemOperand *MMO = MF->getMachineMemOperand(
+      PtrInfo, IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore,
+      EltSize, commonAlignment(Alignment, EltSize * Offset));
+
----------------
arsenm wrote:
> I think the correct thing to do is use use the base alignment here and add the offset to PtrInfo.getWithOffset
Again, I am not clear on this.  Can you explain further?
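
Is the suggestion something like this (sketch, reusing the names from the surrounding code)?

    MachineMemOperand *MMO = MF->getMachineMemOperand(
        PtrInfo.getWithOffset(EltSize * Offset),
        IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore,
        EltSize, Alignment);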


================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:924-930
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::SI_SPILL_V32_RESTORE), VGPR)
+        .addFrameIndex(Index)                // vaddr
+        .addReg(MFI->getScratchRSrcReg())    // srsrc
+        .addReg(MFI->getStackPtrOffsetReg()) // soffset
+        .addImm(Offset * EltSize)            // offset
+        .addMemOperand(MMO)
+        .addReg(VGPR, RegState::Implicit);
----------------
arsenm wrote:
> I know this is what this was doing before, but do we really need to use the intermediate spill pseudo here? As a follow on change could we directly emit the stack operation here?
Agreed, I will work on a follow-up change to directly emit the VMEM operations.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80282/new/

https://reviews.llvm.org/D80282




