[PATCH] D73292: [AMDGPU] Correct NumLoads in clustering

Fri Jan 24 01:32:55 PST 2020

rampitec added a comment.

In D73292#1838140 <https://reviews.llvm.org/D73292#1838140>, @foad wrote:

> In D73292#1837231 <https://reviews.llvm.org/D73292#1837231>, @rampitec wrote:
>
> > In D73292#1837215 <https://reviews.llvm.org/D73292#1837215>, @foad wrote:
> >
> > > I tried something similar in D72325 <https://reviews.llvm.org/D72325>.
> >
> >
> > Comments there argue about how much should we cluster, but regardless I do not think we should use a wrong data. If we want more clustering we need to increase thresholds, but still rely on a correct input.
>
>
> I agree. I also think we should fix this properly in MachineScheduler:
>
>   --- a/llvm/lib/CodeGen/MachineScheduler.cpp
>   +++ b/llvm/lib/CodeGen/MachineScheduler.cpp
>   @@ -1584,7 +1584,7 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps(
>        SUnit *SUb = MemOpRecords[Idx+1].SU;
>        if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps,
>                                     MemOpRecords[Idx + 1].BaseOps,
>   -                                 ClusterLength)) {
>   +                                 ClusterLength + 1)) {
>          if (SUa->NodeNum > SUb->NodeNum)
>            std::swap(SUa, SUb);
>          if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) {
>
>
> ... and adjust any other target implementations of shouldClusterMemOps accordingly.

In D73292#1838140 <https://reviews.llvm.org/D73292#1838140>, @foad wrote:

> In D73292#1837231 <https://reviews.llvm.org/D73292#1837231>, @rampitec wrote:
>
> > In D73292#1837215 <https://reviews.llvm.org/D73292#1837215>, @foad wrote:
> >
> > > I tried something similar in D72325 <https://reviews.llvm.org/D72325>.
> >
> >
> > Comments there argue about how much should we cluster, but regardless I do not think we should use a wrong data. If we want more clustering we need to increase thresholds, but still rely on a correct input.
>
>
> I agree. I also think we should fix this properly in MachineScheduler:
>
>   --- a/llvm/lib/CodeGen/MachineScheduler.cpp
>   +++ b/llvm/lib/CodeGen/MachineScheduler.cpp
>   @@ -1584,7 +1584,7 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps(
>        SUnit *SUb = MemOpRecords[Idx+1].SU;
>        if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps,
>                                     MemOpRecords[Idx + 1].BaseOps,
>   -                                 ClusterLength)) {
>   +                                 ClusterLength + 1)) {
>          if (SUa->NodeNum > SUb->NodeNum)
>            std::swap(SUa, SUb);
>          if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) {
>
>
> ... and adjust any other target implementations of shouldClusterMemOps accordingly.

That I agree too. I just cannot deal with any single performance regression in all of the targets, including out of tree targets. This bug is here at least for 6 years when I first noticed it. I belive every in-tree target need to do the same as in this patch and then we can collectively switch away from this bug. Sounds like the only discussion is about the process how to get there.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73292/new/

https://reviews.llvm.org/D73292