[PATCH] D80545: [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Mahesha S via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 27 09:11:40 PDT 2020
hsmhsm marked 4 inline comments as done.
hsmhsm added inline comments.
================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:1576-1577
+ TRI)) {
+ unsigned Width =
+ !MI.memoperands_empty() ? MI.memoperands().front()->getSize() : 0;
+ MemOpRecords.push_back(MemOpInfo(SU, BaseOps, Offset, Width));
----------------
arsenm wrote:
> This won't correctly handle multiple mem operands
Taken care
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:159-166
+ unsigned WidthA = CI.Last
+ ? !CI.Last->memoperands_empty()
+ ? CI.Last->memoperands().front()->getSize()
+ : 0
+ : 0;
+ unsigned WidthB =
+ !MI.memoperands_empty() ? MI.memoperands().front()->getSize() : 0;
----------------
foad wrote:
> arsenm wrote:
> > hsmhsm wrote:
> > > arsenm wrote:
> > > > It would be better to not depend on the memory operands here, but this belongs in a helper function some kind of not (and this can also sink down to the use)
> > > Hi @arsenm
> > >
> > > Did you mean here the helper function as a kind of below?
> > >
> > > ```
> > > unsigned getDstMemOperandSize(const MachineInstr *MI) const {
> > > if (!MI || MI->memoperands_empty())
> > > return 0;
> > >
> > > return MI->memoperands().front()->getSize();
> > > }
> > > ```
> > >
> > > And, use above helper function as below?
> > >
> > >
> > > ```
> > > unsigned WidthA = getDstMemOperandSize(CI.Last);
> > > unsigned WidthB = getDstMemOperandSize(&MI);
> > > ```
> > >
> > Yes, but it's worse to rely on the memory operands here than getting this from the instruction opcode / operand
> Right, `getMemOperandsWithOffset` could be extended to return the width. Some targets already have an internal function `getMemOperandsWithOffsetWidth` which does that.
Taken care
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:159-166
+ unsigned WidthA = CI.Last
+ ? !CI.Last->memoperands_empty()
+ ? CI.Last->memoperands().front()->getSize()
+ : 0
+ : 0;
+ unsigned WidthB =
+ !MI.memoperands_empty() ? MI.memoperands().front()->getSize() : 0;
----------------
hsmhsm wrote:
> foad wrote:
> > arsenm wrote:
> > > hsmhsm wrote:
> > > > arsenm wrote:
> > > > > It would be better to not depend on the memory operands here, but this belongs in a helper function some kind of not (and this can also sink down to the use)
> > > > Hi @arsenm
> > > >
> > > > Did you mean here the helper function as a kind of below?
> > > >
> > > > ```
> > > > unsigned getDstMemOperandSize(const MachineInstr *MI) const {
> > > > if (!MI || MI->memoperands_empty())
> > > > return 0;
> > > >
> > > > return MI->memoperands().front()->getSize();
> > > > }
> > > > ```
> > > >
> > > > And, use above helper function as below?
> > > >
> > > >
> > > > ```
> > > > unsigned WidthA = getDstMemOperandSize(CI.Last);
> > > > unsigned WidthB = getDstMemOperandSize(&MI);
> > > > ```
> > > >
> > > Yes, but it's worse to rely on the memory operands here than getting this from the instruction opcode / operand
> > Right, `getMemOperandsWithOffset` could be extended to return the width. Some targets already have an internal function `getMemOperandsWithOffsetWidth` which does that.
> Taken care
Taken care
================
Comment at: llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:175-176
// allocation so there is no need for that kind of limit.
- !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2)))) {
+ !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2,
+ WidthA + WidthB)))) {
// Finish the current clause.
----------------
foad wrote:
> The comment explains that we don't really want to limit the size of the cluster here, so it's probably best to pass in a small dummy value like 2 instead of WidthA + WidthB.
Taken care
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80545/new/
https://reviews.llvm.org/D80545
More information about the llvm-commits
mailing list