[PATCH] D72487: [AMDGPU] Fix bundle scheduling
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 10 09:23:32 PST 2020
rampitec added a comment.
In D72487#1813828 <https://reviews.llvm.org/D72487#1813828>, @foad wrote:
> This is only relevant for post-ra scheduling because we don't have any bundles when we do pre-ra scheduling, right?
In fact we have much more bundles coming to pre-RA scheduler when we have xnack and form memory clauses. However, I have tried to use the same mutation in the pre-RA scheduler and did not notice any scheduling difference at all, even with memory clause tests.
> Doing it as a DAG mutation seems a bit late. One problem is that you set the latency for the bundle's succ, but not for the corresponding pred, so you see odd mismatches like this:
>
> SU(5): BUNDLE implicit-def $sgpr6_sgpr7, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $scc
> # preds left : 2
> # succs left : 1
> # rdefs left : 0
> Latency : 3
> Depth : 0
> Height : 1
> Predecessors:
> SU(0): Anti Latency=0
> SU(0): Anti Latency=0
> Successors:
> SU(11): Data Latency=1 Reg=$sgpr6_sgpr7
> ...
> SU(11): S_NOP 0, implicit killed $sgpr6_sgpr7, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3
> # preds left : 6
> # succs left : 0
> # rdefs left : 0
> Latency : 1
> Depth : 2
> Height : 0
> Predecessors:
> SU(10): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
> SU(9): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
> SU(8): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
> SU(7): Data Latency=0 Reg=$vgpr0_vgpr1_vgpr2_vgpr3
> SU(6): Data Latency=0 Reg=$sgpr4
> SU(5): Data Latency=0 Reg=$sgpr6_sgpr7
>
>
> (Latency for succ of SU(5) is 1, but latency for pred of SU(11) is 0.)
>
> Instead how about doing this and implementing it in `adjustSchedDependency` for AMDGPU?
>
> diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
> index 96a1f86c3e0..ef5926e4f8f 100644
> --- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
> +++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
> @@ -269,9 +269,9 @@ void ScheduleDAGInstrs::addPhysRegDataDeps(SUnit *SU, unsigned OperIdx) {
> if (!ImplicitPseudoDef && !ImplicitPseudoUse) {
> Dep.setLatency(SchedModel.computeOperandLatency(SU->getInstr(), OperIdx,
> RegUse, UseOp));
> - ST.adjustSchedDependency(SU, UseSU, Dep);
> } else
> Dep.setLatency(0);
> + ST.adjustSchedDependency(SU, UseSU, Dep);
>
> UseSU->addPred(Dep);
> }
You are right, it seem I've got correct schedilng with this change, but preds are still wrong. I can either extend the mutation or try this adjustment. I will explore it.
> Also, just curious, why did you update so many tests? I only found 4 that were failing with the patch:
>
> Failing Tests (4):
> LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.barrier.ll
> LLVM :: CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll
> LLVM :: CodeGen/AMDGPU/misched-killflags.mir
> LLVM :: CodeGen/AMDGPU/mul24-pass-ordering.ll
Yeah, this is a split from another change. I am preparing the patch which will send memory operation clusters to the post-RA scheduler instead of current DAG mutation, so it looks like some tests from that patch have sneaked here.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D72487/new/
https://reviews.llvm.org/D72487
More information about the llvm-commits
mailing list