[PATCH] D154083: [AMDGPU] Rematerialize scalar loads
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 29 09:01:36 PDT 2023
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:108
bool SIInstrInfo::isReallyTriviallyReMaterializable(
const MachineInstr &MI) const {
----------------
piotr wrote:
> This is a pre-existing issue, but I am not very comfortable with the fact that we override `isReallyTriviallyReMaterializable` the way we do.
>
> It works because we rely on extra checks being performed elsewhere (for example, the call to `LiveRangeEdit::allUsesAvailableAt()`). However, the comment in the base class says `isReallyTriviallyReMaterializable` "must return false if (..) or if it requires any address registers that are not always available."
>
> Having said that, I am not sure there is a practical solution to this (apart from softening the comment in the base class).
Yes, I don't like this either. The way the rematerializable hooks are structured is confusing, and that is part of why this never got done before.
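For reference, the caller-side check we implicitly rely on looks roughly like this (paraphrased from LiveRangeEdit, names approximate, not verbatim):

  // LiveRangeEdit::canRematerializeAt() re-verifies that every register the
  // def reads still holds the same value at the rematerialization point,
  // which is what makes the loose target hook workable in practice:
  if (!allUsesAvailableAt(RM.OrigMI, DefIdx, UseIdx))
    return false;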
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:111
+ if (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isSDWA(MI) || isSALU(MI) ||
+ (isSMRD(MI) && MI.isDereferenceableInvariantLoad())) {
// Normally VALU use of exec would block the rematerialization, but that
----------------
piotr wrote:
> Ironically, adding the `isDereferenceableInvariantLoad` check means the examples that motivated this work are no longer handled, as the S_LOADs are not currently marked `dereferenceable` (but that is a separate issue).
We don't really need the isSMRD check together with isDereferenceableInvariantLoad. We should also be able to handle any load, not just SMRD.
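Something like this (untested sketch), dropping the redundant isSMRD guard:

  if (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isSDWA(MI) || isSALU(MI) ||
      MI.isDereferenceableInvariantLoad()) {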
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2316
+ unsigned SubIdx, const MachineInstr &Orig,
+ const TargetRegisterInfo &TRI) const {
+
----------------
Don't name the TRI parameter; use the RI member here instead.
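i.e., something like this (sketch, assuming the usual TargetInstrInfo::reMaterialize override signature):

  void SIInstrInfo::reMaterialize(MachineBasicBlock &MBB,
                                  MachineBasicBlock::iterator I,
                                  Register DestReg, unsigned SubIdx,
                                  const MachineInstr &Orig,
                                  const TargetRegisterInfo &/*TRI*/) const {
    // ... refer to the RI member instead of the parameter in the body ...
  }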
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2322-2323
+ switch (Opcode) {
+
+ // TODO: Handle more cases
+ case AMDGPU::S_LOAD_DWORDX16_IMM: {
----------------
Fix the weird line breaks, and put the comment before the switch.
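i.e.:

  // TODO: Handle more cases.
  switch (Opcode) {
  case AMDGPU::S_LOAD_DWORDX16_IMM: {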
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2328
+
+ if (I == MBB.end() || I->getOpcode() == AMDGPU::COPY)
+ break;
----------------
I'd hope we have access to a live lane mask that we could check here. Failing that, do you really need the COPY opcode exception?
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2331
+
+ auto MO = I->findRegisterUseOperand(Orig.getOperand(0).getReg());
+ if (!MO || MO->getSubReg() == AMDGPU::NoSubRegister)
----------------
don't use auto here, or at least use a const *
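e.g.:

  const MachineOperand *MO =
      I->findRegisterUseOperand(Orig.getOperand(0).getReg());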
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2335
+
+ unsigned Offset = TRI.getSubRegIdxOffset(MO->getSubReg());
+ unsigned SubregSize = TRI.getSubRegIdxSize(MO->getSubReg());
----------------
s/TRI/RI/
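i.e.:

  unsigned Offset = RI.getSubRegIdxOffset(MO->getSubReg());
  unsigned SubregSize = RI.getSubRegIdxSize(MO->getSubReg());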
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2342
+
+ if (OrigSize == SubregSize)
+ break;
----------------
Don't think this can happen
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2354
+ MI->getOperand(0).setIsUndef(false);
+ MRI.setRegClass(DestReg, &AMDGPU::SGPR_256RegClass);
+
----------------
Can you go through getSubRegisterClass (possibly with getMatchingSuperRegClass and getSubClassWithSubReg) to avoid hardcoding this?
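A rough, untested sketch of what I have in mind, deriving the class from the original def's register class and the use's subregister index instead of hardcoding SGPR_256:

  const TargetRegisterClass *OrigRC =
      MRI.getRegClass(Orig.getOperand(0).getReg());
  if (const TargetRegisterClass *SubRC =
          RI.getSubRegisterClass(OrigRC, MO->getSubReg()))
    MRI.setRegClass(DestReg, SubRC);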
================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2362
+ getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();
+ unsigned FinalOffset = OrigOffset + Offset;
+ getNamedOperand(*MI, AMDGPU::OpName::offset)->setImm(FinalOffset);
----------------
keep this all as int64_t
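i.e. (with Offset widened to int64_t as well):

  int64_t OrigOffset = getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();
  int64_t FinalOffset = OrigOffset + Offset;
  getNamedOperand(*MI, AMDGPU::OpName::offset)->setImm(FinalOffset);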
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154083/new/
https://reviews.llvm.org/D154083
More information about the llvm-commits mailing list