[PATCH] D73292: [AMDGPU] Correct NumLoads in clustering

Fri Jan 24 01:15:10 PST 2020

foad added a comment.

In D73292#1837231 <https://reviews.llvm.org/D73292#1837231>, @rampitec wrote:

> In D73292#1837215 <https://reviews.llvm.org/D73292#1837215>, @foad wrote:
>
> > I tried something similar in D72325 <https://reviews.llvm.org/D72325>.
>
>
> Comments there argue about how much we should cluster, but regardless I do not think we should use wrong data. If we want more clustering we need to increase the thresholds, but still rely on correct input.

I agree. I also think we should fix this properly in MachineScheduler:

  --- a/llvm/lib/CodeGen/MachineScheduler.cpp
  +++ b/llvm/lib/CodeGen/MachineScheduler.cpp
  @@ -1584,7 +1584,7 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps(
       SUnit *SUb = MemOpRecords[Idx+1].SU;
       if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps,
                                    MemOpRecords[Idx + 1].BaseOps,
  -                                 ClusterLength)) {
  +                                 ClusterLength + 1)) {
         if (SUa->NodeNum > SUb->NodeNum)
           std::swap(SUa, SUb);
         if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) {

... and adjust any other target implementations of shouldClusterMemOps accordingly.
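
To illustrate the off-by-one, here is a minimal standalone sketch, not the actual scheduler code. It assumes ClusterLength counts the ops already in the current cluster, so ClusterLength + 1 is the size the cluster would have after adding the candidate. The names shouldCluster, largestCluster, MaxLoads and Extra are hypothetical and exist only for this example; the real hook takes base operands and has target-specific heuristics.

```cpp
#include <cassert>

// Hypothetical stand-in for a target hook: permit clusters of at most
// MaxLoads memory ops. NumLoads is whatever count the caller passes in.
static bool shouldCluster(unsigned NumLoads, unsigned MaxLoads) {
  return NumLoads <= MaxLoads;
}

// Toy model of the neighbor-clustering loop over NumOps adjacent mem ops.
// Extra == 0 models passing ClusterLength, Extra == 1 models passing
// ClusterLength + 1. Returns the size of the largest cluster formed.
static unsigned largestCluster(unsigned NumOps, unsigned MaxLoads,
                               unsigned Extra) {
  unsigned ClusterLength = 1; // the first op of a cluster counts as one
  unsigned Largest = 1;
  for (unsigned Idx = 0; Idx + 1 < NumOps; ++Idx) {
    if (shouldCluster(ClusterLength + Extra, MaxLoads)) {
      ++ClusterLength; // the candidate joins the current cluster
      if (ClusterLength > Largest)
        Largest = ClusterLength;
    } else {
      ClusterLength = 1; // start a new cluster at the next op
    }
  }
  return Largest;
}

// Sanity check of the model with 8 adjacent ops and a limit of 4.
static void checkModel() {
  // Passing the pre-add count lets one extra op slip past the limit.
  assert(largestCluster(8, 4, 0) == 5);
  // Passing the post-add count respects the limit exactly.
  assert(largestCluster(8, 4, 1) == 4);
}
```

In this model, a target that compares the count it receives against its threshold only gets the intended limit when the caller passes the post-add size. That is why other shouldClusterMemOps implementations would need their thresholds rechecked after such a change.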

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73292/new/

https://reviews.llvm.org/D73292