[PATCH] D73509: [MachineScheduler] relax successor chain on clustering
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 28 11:10:11 PST 2020
rampitec added a comment.
In D73509#1845136 <https://reviews.llvm.org/D73509#1845136>, @rampitec wrote:
> In D73509#1844150 <https://reviews.llvm.org/D73509#1844150>, @foad wrote:
>
> > This doesn't fix the problem that inspired D71717 <https://reviews.llvm.org/D71717>. Consider the first test case in `memory_clause.ll`. With baseline llvm I get:
> >
> > $ bin/llc -march=amdgcn -mcpu=gfx902 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr -o /dev/null ~/git/llvm-project/llvm/test/CodeGen/AMDGPU/memory_clause.ll -debug-only=machine-scheduler |& egrep "^Cluster|Machine code for function"
> > # Machine code for function vector_clause: NoPHIs, TracksLiveness
> > Cluster ld/st SU(2) - SU(3)
> > Cluster ld/st SU(6) - SU(7)
> > Cluster ld/st SU(8) - SU(9)
> > Cluster ld/st SU(11) - SU(13)
> > # Machine code for function vector_clause: NoPHIs, TracksLiveness
>
>
> My problem is I cannot reproduce it. All I have is
>
> # Machine code for function vector_clause: NoPHIs, TracksLiveness
> Cluster ld/st SU(2) - SU(3)
> # Machine code for function vector_clause: NoPHIs, TracksLiveness
>
>
> It just does not try to cluster all global loads and stores at all! It also does not do it even with D71717 <https://reviews.llvm.org/D71717>.
> I will debug it...
I likely need to have some of your changes too, but I have reproduced it without -amdgpu-enable-global-sgpr-addr.
With master and my change:
$ llc -march=amdgcn -mcpu=gfx902 -verify-machineinstrs < vector_clause.ll -debug-only=machine-scheduler |& egrep "^Cluster|^SU\(.*GLOBAL_"
Cluster ld/st SU(2) - SU(3)
Cluster ld/st SU(8) - SU(12)
Cluster ld/st SU(13) - SU(14)
Cluster ld/st SU(16) - SU(18)
SU(8): %12:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 0, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp3, addrspace 1)
SU(12): %15:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 16, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp72, addrspace 1)
SU(13): %17:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 32, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp116, addrspace 1)
SU(14): %19:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 48, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp1510, addrspace 1)
SU(15): GLOBAL_STORE_DWORDX4 %28:vreg_64, %12:vreg_128, 0, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp5, addrspace 1)
SU(16): GLOBAL_STORE_DWORDX4 %28:vreg_64, %15:vreg_128, 16, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp94, addrspace 1)
SU(17): GLOBAL_STORE_DWORDX4 %28:vreg_64, %17:vreg_128, 32, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp138, addrspace 1)
SU(18): GLOBAL_STORE_DWORDX4 %28:vreg_64, %19:vreg_128, 48, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp1712, addrspace 1)
With D71717 <https://reviews.llvm.org/D71717>:
Cluster ld/st SU(2) - SU(3)
Cluster ld/st SU(8) - SU(12)
Cluster ld/st SU(13) - SU(14)
Cluster ld/st SU(15) - SU(16)
Cluster ld/st SU(17) - SU(18)
SU(8): %12:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 0, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp3, addrspace 1)
SU(12): %15:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 16, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp72, addrspace 1)
SU(13): %17:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 32, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp116, addrspace 1)
SU(14): %19:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 48, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp1510, addrspace 1)
SU(15): GLOBAL_STORE_DWORDX4 %28:vreg_64, %12:vreg_128, 0, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp5, addrspace 1)
SU(16): GLOBAL_STORE_DWORDX4 %28:vreg_64, %15:vreg_128, 16, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp94, addrspace 1)
SU(17): GLOBAL_STORE_DWORDX4 %28:vreg_64, %17:vreg_128, 32, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp138, addrspace 1)
SU(18): GLOBAL_STORE_DWORDX4 %28:vreg_64, %19:vreg_128, 48, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp1712, addrspace 1)
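Even with D71717 <https://reviews.llvm.org/D71717> it does not do the desired clustering: the four loads still form two separate pairs, and the stores end up in pairs at best. So neither change is sufficient on its own.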
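To make the comparison between the two logs easier to check, here is a small sketch of a helper that merges the "Cluster ld/st SU(a) - SU(b)" pairs sharing an SU into groups. The helper name `cluster_groups` and the reading of "desired clustering" as one group covering all four loads (and one covering all four stores) are my assumptions, not something the scheduler or either patch defines. It only parses the debug lines quoted above.

```python
import re
from collections import defaultdict

# Matches the debug lines quoted above, e.g. "Cluster ld/st SU(8) - SU(12)".
CLUSTER_RE = re.compile(r"^Cluster ld/st SU\((\d+)\) - SU\((\d+)\)")


def cluster_groups(log_text):
    """Merge cluster pairs that share an SU into sorted groups (hypothetical helper)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in log_text.splitlines():
        m = CLUSTER_RE.match(line.strip())
        if m:
            a, b = int(m.group(1)), int(m.group(2))
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for su in list(parent):
        groups[find(su)].append(su)
    return sorted(sorted(g) for g in groups.values())


# Cluster lines copied from the two runs above.
WITH_D73509 = """\
Cluster ld/st SU(2) - SU(3)
Cluster ld/st SU(8) - SU(12)
Cluster ld/st SU(13) - SU(14)
Cluster ld/st SU(16) - SU(18)
"""

WITH_D71717 = """\
Cluster ld/st SU(2) - SU(3)
Cluster ld/st SU(8) - SU(12)
Cluster ld/st SU(13) - SU(14)
Cluster ld/st SU(15) - SU(16)
Cluster ld/st SU(17) - SU(18)
"""

if __name__ == "__main__":
    for name, log in (("D73509", WITH_D73509), ("D71717", WITH_D71717)):
        print(name, cluster_groups(log))
```

In both runs no pair shares an SU with another pair, so every group has exactly two members; under the assumption above, the loads SU(8), SU(12), SU(13), SU(14) would need to land in a single group.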
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73509/new/
https://reviews.llvm.org/D73509