[PATCH] D80545: [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes

Wed May 27 02:07:55 PDT 2020

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:159-166
+        unsigned WidthA = CI.Last
+                              ? !CI.Last->memoperands_empty()
+                                    ? CI.Last->memoperands().front()->getSize()
+                                    : 0
+                              : 0;
+        unsigned WidthB =
+            !MI.memoperands_empty() ? MI.memoperands().front()->getSize() : 0;
----------------
arsenm wrote:
> hsmhsm wrote:
> > arsenm wrote:
> > > It would be better to not depend on the memory operands here, but this belongs in a helper function some kind of not (and this can also sink down to the use)
> > Hi @arsenm 
> > 
> > Did you mean here the helper function as a kind of below?
> > 
> > ```
> > unsigned getDstMemOperandSize(const MachineInstr *MI) const {
> >   if (!MI || MI->memoperands_empty())
> >     return 0;
> > 
> >   return MI->memoperands().front()->getSize();
> > }
> > ```
> > 
> > And, use above helper function as below?
> > 
> > 
> > ```
> > unsigned WidthA = getDstMemOperandSize(CI.Last);
> > unsigned WidthB = getDstMemOperandSize(&MI);
> > ```
> > 
> Yes, but it's worse to rely on the memory operands here than getting this from the instruction opcode / operand
Right, `getMemOperandsWithOffset` could be extended to return the width. Some targets already have an internal function `getMemOperandsWithOffsetWidth` which does that.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:175-176
               // allocation so there is no need for that kind of limit.
-              !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2)))) {
+              !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2,
+                                        WidthA + WidthB)))) {
           // Finish the current clause.
----------------
The comment explains that we don't really want to limit the size of the cluster here, so it's probably best to pass in a small dummy value like 2 instead of WidthA + WidthB.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80545/new/

https://reviews.llvm.org/D80545