[PATCH] D80282: [AMDGPU] Make SGPR spills exec mask agnostic

Wed May 20 07:35:22 PDT 2020

arsenm added a comment.

Missing new tests / checks. This could probably use a MIR test

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:838

+void SIRegisterInfo::buildSGPRSpillLoadStore(MachineBasicBlock::iterator MI,
+                                             int Index, int Offset,
----------------
Needs a comment explaining the point of this function

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:858-859
+      SuperReg == AMDGPU::EXEC || SuperReg == AMDGPU::EXEC_LO;
+  const unsigned ExecLoLane = SuperRegIsExec ? 0 : (isWave32 ? 16 : 32);
+  const unsigned ExecHiLane = SuperRegIsExec ? 1 : (isWave32 ? 17 : 33);
+
----------------
I don't understand these lane numbers; it picks the one past the midpoint lane?

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:867
+  // On Wave32 only handle EXEC_LO.
+  // On Wave64 preserve EXEC_HI if there is sufficent space for a copy.
+  bool OnlyExecLo = isWave32 || NumSubRegs == 1;
----------------
I'm not sure I understand only sometimes saving exec_hi

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:870
+
+  Register ExecReg = AMDGPU::NoRegister;
+
----------------
We're trying to eliminate use of NoRegister in favor of using the default Register constructor, so you don't need the initializer here (or explicit checks for it later in the function)

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:880
+        .addImm(ExecLoLane)
+        .addReg(VGPR, IsLoad ? RegState::Undef : 0);
+
----------------
getUndefRegState

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:896
+                                  AMDGPU::sub0, &AMDGPU::SGPR_64RegClass);
+    // If src/dst is a odd size it is possible subreg0 is not aligned.
+    if (ExecReg == AMDGPU::NoRegister && NumSubRegs > 2)
----------------
Grammar, a odd size->an odd size

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:906-912
+  if (OnlyExecLo) {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::EXEC_LO)
+        .addImm(VGPRLanes);
+  } else {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B64), AMDGPU::EXEC)
+        .addImm(VGPRLanes);
+  }
----------------
What other places do is have ExecMovOpc and ExecReg set from isWave32 and unify the BuildMI calls

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:907-908
+  if (OnlyExecLo) {
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::EXEC_LO)
+        .addImm(VGPRLanes);
+  } else {
----------------
Could this just turn on all lanes for the memory spill? mov -1 is always free for code size, but some other bitmask may not be

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:916
+  MachineFrameInfo &FrameInfo = MF->getFrameInfo();
+  Align Alignment = FrameInfo.getObjectAlign(Index);
+  MachinePointerInfo PtrInfo =
----------------
Should assert this isn't an SGPRSpill stack ID. You can't reuse the SGPR spill frame index for the real stack spill index

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:920-921
+  MachineMemOperand *MMO = MF->getMachineMemOperand(
+      PtrInfo, IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore,
+      EltSize, commonAlignment(Alignment, EltSize * Offset));
+
----------------
I think the correct thing to do is use use the base alignment here and add the offset to PtrInfo.getWithOffset

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:924-930
+    BuildMI(*MBB, MI, DL, TII->get(AMDGPU::SI_SPILL_V32_RESTORE), VGPR)
+        .addFrameIndex(Index)                // vaddr
+        .addReg(MFI->getScratchRSrcReg())    // srsrc
+        .addReg(MFI->getStackPtrOffsetReg()) // soffset
+        .addImm(Offset * EltSize)            // offset
+        .addMemOperand(MMO)
+        .addReg(VGPR, RegState::Implicit);
----------------
I know this is what this was doing before, but do we really need to use the intermediate spill pseudo here? As a follow on change could we directly emit the stack operation here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80282/new/

https://reviews.llvm.org/D80282