[PATCH] D144099: [AMDGPU] Fold more AGPR copies/PHIs in SIFoldOperands

Mon Feb 20 12:53:40 PST 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:1697
+          getRegOpRC(*MRI, *TRI, Copy->getOperand(1));
+      if (ARC && ARC != CopyInRC)
+        return false;
----------------
Direct class equality checks are usually the wrong thing to do. Prefer something like hasSubClassEq, or constrain to a compatible subclass. I don't think there's any practical difference in this case, though.
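
To illustrate the difference between the exact-equality check and a subclass-or-equal check, here is a minimal, self-contained sketch. It uses a toy `ToyRegClass` type, not `llvm::TargetRegisterClass`; the helper names `acceptExact` and `acceptSubClassEq` are hypothetical, and the direction of the subclass test (the copy input's class must be `ARC` or one of its subclasses) is an assumption rather than something the review specifies.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a register class. This is NOT llvm::TargetRegisterClass;
// it only models "class X has Y as a subclass" for illustration.
struct ToyRegClass {
  std::string Name;
  std::set<const ToyRegClass *> SubClasses; // strict subclasses only

  // True if RC is this class itself or one of its subclasses.
  bool hasSubClassEq(const ToyRegClass *RC) const {
    return RC == this || SubClasses.count(RC) != 0;
  }
};

// Mirrors the shape of the check in the patch: reject unless the two
// classes are the very same object (or ARC is unknown).
bool acceptExact(const ToyRegClass *ARC, const ToyRegClass *CopyInRC) {
  return !ARC || ARC == CopyInRC;
}

// Relaxed variant (assumed direction): also accept a copy input whose
// class is a subclass of ARC.
bool acceptSubClassEq(const ToyRegClass *ARC, const ToyRegClass *CopyInRC) {
  return !ARC || ARC->hasSubClassEq(CopyInRC);
}
```

With a class `Wide` that lists `Narrow` as a subclass, `acceptExact(&Wide, &Narrow)` rejects the pair while `acceptSubClassEq(&Wide, &Narrow)` accepts it. Both helpers agree whenever the two classes are identical.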

================
Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:1847
+  // Look at all AGPR Phis and collect the register + subregister used.
+  DenseMap<std::pair<Register, unsigned>, std::vector<MachineOperand *>>
+      RegToMO;
----------------
I don't see why you need to build this map/vector. You can just insert the new instructions after all the PHIs as you process each one.
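
A rough sketch of the single-pass shape being suggested, using a toy block model rather than `MachineBasicBlock`: find the first non-PHI position once, then emit each new instruction there while walking the PHIs. The types `ToyInstr` and `ToyBlock` and the function `insertCopiesAfterPhis` are hypothetical. The sketch also ignores any deduplication the map in the patch may have been providing.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy instruction: just a PHI flag and a label. Not an LLVM type.
struct ToyInstr {
  bool IsPhi;
  std::string Text;
};
using ToyBlock = std::list<ToyInstr>;

// Walk the leading PHIs once and insert one new (non-PHI) instruction per
// PHI directly after the PHI group. Returns the number of insertions.
unsigned insertCopiesAfterPhis(ToyBlock &BB) {
  // Insertion point: the first non-PHI instruction (or end of block).
  ToyBlock::iterator InsertPt = BB.begin();
  while (InsertPt != BB.end() && InsertPt->IsPhi)
    ++InsertPt;

  unsigned NumInserted = 0;
  // Inserted instructions are non-PHIs, so the walk stops when it reaches
  // the first one of them.
  for (ToyBlock::iterator I = BB.begin(); I != BB.end() && I->IsPhi; ++I) {
    BB.insert(InsertPt, ToyInstr{false, "copy-for " + I->Text});
    ++NumInserted;
  }
  return NumInserted;
}
```

Because `std::list::insert` places the new element before `InsertPt` and does not invalidate other iterators, the new instructions end up after the last PHI in the order the PHIs were visited, with no intermediate container.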

================
Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:1631

-// Try to hoist an AGPR to VGPR copy out of the loop across a LCSSA PHI.
+static Register tryFindExistingCopy(MachineRegisterInfo &MRI,
+                                    MachineInstr &Begin, Register FromReg,
----------------
Pierre-vh wrote:
> arsenm wrote:
> > If SIFoldOperands worked like PeepholeOpt, you would have a map of these already. I'd rather avoid a linear scan backwards for every copy
> Do you mean that it's fine to store these in a map, or that we can't do that here?
> I'd also like to avoid the backwards scan if possible
I mean that I don't like the use iteration SIFoldOperands currently relies on; it would be better if the pass were rewritten to collect a map of known foldable defs as it walked. I don't think you can simply introduce such a map without redoing the whole pass structure.
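
As a minimal sketch of "collect a map as it walks", the toy below records each copy by its (source register, subregister) key during one forward pass, so a later equivalent copy is found with a lookup instead of a backward scan. `ToyCopy` and `findReusableCopies` are hypothetical names, registers are plain integers, and the sketch assumes each register is defined once, so no entry ever needs invalidating. It does not reflect how SIFoldOperands or PeepholeOpt is actually structured.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy copy instruction: Dst = COPY Src.SubReg (SubReg 0 means "no subreg").
struct ToyCopy {
  unsigned Dst;
  unsigned Src;
  unsigned SubReg;
};

// One forward walk over the copies. For each copy, report the Dst of an
// earlier copy with the same (Src, SubReg), or 0 if there is none.
std::vector<unsigned> findReusableCopies(const std::vector<ToyCopy> &Copies) {
  std::map<std::pair<unsigned, unsigned>, unsigned> KnownCopies;
  std::vector<unsigned> Reuse;
  Reuse.reserve(Copies.size());
  for (const ToyCopy &C : Copies) {
    std::pair<unsigned, unsigned> Key = std::make_pair(C.Src, C.SubReg);
    // insert() leaves the map untouched if the key is already present.
    auto Res = KnownCopies.insert(std::make_pair(Key, C.Dst));
    Reuse.push_back(Res.second ? 0u : Res.first->second);
  }
  return Reuse;
}
```

Each copy costs one map lookup here, whereas a backward scan per copy is linear in the distance scanned.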

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144099/new/

https://reviews.llvm.org/D144099