[PATCH] D80545: [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes

Wed May 27 09:11:40 PDT 2020

hsmhsm marked 4 inline comments as done.
hsmhsm added inline comments.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:1576-1577
+                                      TRI)) {
+      unsigned Width =
+          !MI.memoperands_empty() ? MI.memoperands().front()->getSize() : 0;
+      MemOpRecords.push_back(MemOpInfo(SU, BaseOps, Offset, Width));
----------------
arsenm wrote:
> This won't correctly handle multiple mem operands
Taken care of.
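
For illustration, one way to handle several memory operands is to sum their sizes instead of reading only the first one. The sketch below uses stand-in types (`MemOperand`, `Instr`) and a hypothetical helper name; it is not the real LLVM API and not necessarily what the updated patch does:

```cpp
#include <cstdint>
#include <vector>

// Stand-in for llvm::MachineMemOperand; only the access size matters here.
struct MemOperand {
  uint64_t Size; // bytes accessed by this operand
};

// Stand-in for llvm::MachineInstr: an instruction with zero or more
// memory operands attached.
struct Instr {
  std::vector<MemOperand> MemOperands;
};

// Hypothetical helper: total number of bytes accessed across all memory
// operands of MI, or 0 if there are none.
unsigned getTotalMemOperandSize(const Instr &MI) {
  uint64_t Width = 0;
  for (const MemOperand &MMO : MI.MemOperands)
    Width += MMO.Size;
  return static_cast<unsigned>(Width);
}
```

With this shape, an instruction carrying a 4-byte and an 8-byte operand contributes 12 bytes to the cluster, where `front()->getSize()` would report only 4.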

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:159-166
+        unsigned WidthA = CI.Last
+                              ? !CI.Last->memoperands_empty()
+                                    ? CI.Last->memoperands().front()->getSize()
+                                    : 0
+                              : 0;
+        unsigned WidthB =
+            !MI.memoperands_empty() ? MI.memoperands().front()->getSize() : 0;
----------------
foad wrote:
> arsenm wrote:
> > hsmhsm wrote:
> > > arsenm wrote:
> > > > It would be better not to depend on the memory operands here, but either way this belongs in a helper function of some kind (and it can also sink down to the use)
> > > Hi @arsenm 
> > > 
> > > Did you mean a helper function like the one below?
> > > 
> > > ```
> > > unsigned getDstMemOperandSize(const MachineInstr *MI) const {
> > >   if (!MI || MI->memoperands_empty())
> > >     return 0;
> > > 
> > >   return MI->memoperands().front()->getSize();
> > > }
> > > ```
> > > 
> > > And then use that helper function as below?
> > > 
> > > 
> > > ```
> > > unsigned WidthA = getDstMemOperandSize(CI.Last);
> > > unsigned WidthB = getDstMemOperandSize(&MI);
> > > ```
> > > 
> > Yes, but relying on the memory operands here is worse than getting this from the instruction opcode / operand
> Right, `getMemOperandsWithOffset` could be extended to return the width. Some targets already have an internal function `getMemOperandsWithOffsetWidth` which does that.
Taken care of.
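
To make the suggestion concrete, here is a rough sketch of a query that derives the width from the instruction itself instead of from its memory operands. The opcodes, the `Instr` struct, and the function name `getMemOffsetWidth` are all hypothetical stand-ins, not the real `getMemOperandsWithOffset` signature:

```cpp
#include <cstdint>

// Hypothetical opcodes standing in for a target's instructions.
enum class Opcode { LoadDword, LoadDwordX2, LoadDwordX4, Add };

// Stand-in instruction: an opcode plus an immediate byte offset.
struct Instr {
  Opcode Op;
  int64_t Offset;
};

// Sketch of an "offset and width" query. Returns false for instructions
// that do not access memory; otherwise fills Offset and Width (in bytes)
// purely from the opcode and operands.
bool getMemOffsetWidth(const Instr &MI, int64_t &Offset, unsigned &Width) {
  switch (MI.Op) {
  case Opcode::LoadDword:
    Width = 4;
    break;
  case Opcode::LoadDwordX2:
    Width = 8;
    break;
  case Opcode::LoadDwordX4:
    Width = 16;
    break;
  default:
    return false; // not a memory instruction in this sketch
  }
  Offset = MI.Offset;
  return true;
}
```

Because the width comes from the opcode, the result does not change if memory operands are missing or have been dropped by an earlier pass.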

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:175-176
               // allocation so there is no need for that kind of limit.
-              !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2)))) {
+              !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2,
+                                        WidthA + WidthB)))) {
           // Finish the current clause.
----------------
foad wrote:
> The comment explains that we don't really want to limit the size of the cluster here, so it's probably best to pass in a small dummy value like 2 instead of WidthA + WidthB.
Taken care of.
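
As a sketch of why a small dummy value works here: if the clustering query rejects a cluster once either the op count or the byte count exceeds a limit, a caller that does not want a size limit can pass a byte count that is always below the threshold. The limits and the function below are assumptions made up for illustration, not the values or the signature used by `shouldClusterMemOps`:

```cpp
// Hypothetical limits; not taken from the patch.
constexpr unsigned MaxClusteredOps = 4;
constexpr unsigned MaxClusteredBytes = 32;

// Sketch of a clustering query that considers both the number of
// clustered ops and the number of clustered bytes.
bool shouldCluster(unsigned NumOps, unsigned NumBytes) {
  return NumOps <= MaxClusteredOps && NumBytes <= MaxClusteredBytes;
}
```

Under these assumed limits, `shouldCluster(2, 2)` always passes the byte check, while `shouldCluster(2, WidthA + WidthB)` could fail for two wide accesses even though the hard-clause pass has no reason to limit size.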

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80545/new/

https://reviews.llvm.org/D80545