[PATCH] D72487: [AMDGPU] Fix bundle scheduling

Fri Jan 10 09:23:32 PST 2020

rampitec added a comment.

In D72487#1813828 <https://reviews.llvm.org/D72487#1813828>, @foad wrote:

> This is only relevant for post-ra scheduling because we don't have any bundles when we do pre-ra scheduling, right?

In fact we have much more bundles coming to pre-RA scheduler when we have xnack and form memory clauses. However, I have tried to use the same mutation in the pre-RA scheduler and did not notice any scheduling difference at all, even with memory clause tests.

> Doing it as a DAG mutation seems a bit late. One problem is that you set the latency for the bundle's succ, but not for the corresponding pred, so you see odd mismatches like this:
> 
>   SU(5):   BUNDLE implicit-def $sgpr6_sgpr7, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $scc
>     # preds left       : 2
>     # succs left       : 1
>     # rdefs left       : 0
>     Latency            : 3
>     Depth              : 0
>     Height             : 1
>     Predecessors:
>       SU(0): Anti Latency=0
>       SU(0): Anti Latency=0
>     Successors:
>       SU(11): Data Latency=1 Reg=$sgpr6_sgpr7
>   ...
>   SU(11):   S_NOP 0, implicit killed $sgpr6_sgpr7, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3
>     # preds left       : 6
>     # succs left       : 0
>     # rdefs left       : 0
>     Latency            : 1
>     Depth              : 2
>     Height             : 0
>     Predecessors:
>       SU(10): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
>       SU(9): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
>       SU(8): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
>       SU(7): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
>       SU(6): Data Latency=0 Reg=$sgpr4
>       SU(5): Data Latency=0 Reg=$sgpr6_sgpr7
> 
> 
> (Latency for succ of SU(5) is 1, but latency for pred of SU(11) is 0.)
> 
> Instead how about doing this and implementing it in `adjustSchedDependency` for AMDGPU?
> 
>   diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
>   index 96a1f86c3e0..ef5926e4f8f 100644
>   --- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
>   +++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
>   @@ -269,9 +269,9 @@ void ScheduleDAGInstrs::addPhysRegDataDeps(SUnit *SU, unsigned OperIdx) {
>          if (!ImplicitPseudoDef && !ImplicitPseudoUse) {
>            Dep.setLatency(SchedModel.computeOperandLatency(SU->getInstr(), OperIdx,
>                                                            RegUse, UseOp));
>   -        ST.adjustSchedDependency(SU, UseSU, Dep);
>          } else
>            Dep.setLatency(0);
>   +      ST.adjustSchedDependency(SU, UseSU, Dep);
>    
>          UseSU->addPred(Dep);
>        }

You are right, it seem I've got correct schedilng with this change, but preds are still wrong. I can either extend the mutation or try this adjustment. I will explore it.

> Also, just curious, why did you update so many tests? I only found 4 that were failing with the patch:
> 
>   Failing Tests (4):
>       LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier.ll
>       LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll
>       LLVM :: CodeGen/AMDGPU/misched-killflags.mir
>       LLVM :: CodeGen/AMDGPU/mul24-pass-ordering.ll

Yeah, this is a split from another change. I am preparing the patch which will send memory operation clusters to the post-RA scheduler instead of current DAG mutation, so it looks like some tests from that patch have sneaked here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72487/new/

https://reviews.llvm.org/D72487