[PATCH] D80545: [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes

Thu May 28 03:13:03 PDT 2020

foad added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2044
   const MachineOperand *BaseOp;
-  unsigned Width;
   if (!getMemOperandWithOffsetWidth(LdSt, BaseOp, Offset, OffsetIsScalable,
                                     Width, TRI))
----------------
It's a bit silly that this function has "Width" in its name, and getMemOperandsWithOffset doesn't, but both of them take same Width argument.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:298-299
+    const TargetRegisterInfo *TRI) const {
+  const MachineOperand *DstOp = getDstMemOperand(LdSt);
+  Width = DstOp ? getDstOpSizeInBytes(LdSt, DstOp) : 0;
+
----------------
Move this below the mayLoadOrStore check. Then, does getDstMemOperand ever fail?

================
Comment at: llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp:2972
   OffsetIsScalable = false;
+  Width = !LdSt.memoperands_empty() ? LdSt.memoperands().front()->getSize() : 0;
   const MachineOperand *BaseOp = getBaseAndOffset(LdSt, Offset, AccessSize);
----------------
Doesn't AccessSize give you the width already?

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:3700-3701
   OffsetIsScalable = false;
+  Width =
+      !MemOp.memoperands_empty() ? MemOp.memoperands().front()->getSize() : 0;
   BaseOps.push_back(BaseOp);
----------------
Perhaps MemOp.hasOneMemOperand() would be more appropriate? (Or what should the behaviour be if there is more than one?)

Please at least add a FIXME to get the width without relying on memoperands, or get an opinion from the X86 maintainers if this is OK.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80545/new/

https://reviews.llvm.org/D80545