[PATCH] D73509: [MachineScheduler] relax successor chain on clustering

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 28 11:10:11 PST 2020


rampitec added a comment.

In D73509#1845136 <https://reviews.llvm.org/D73509#1845136>, @rampitec wrote:

> In D73509#1844150 <https://reviews.llvm.org/D73509#1844150>, @foad wrote:
>
> > This doesn't fix the problem that inspired D71717 <https://reviews.llvm.org/D71717>. Consider the first test case in `memory_clause.ll`. With baseline llvm I get:
> >
> >   $ bin/llc -march=amdgcn -mcpu=gfx902 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr -o /dev/null ~/git/llvm-project/llvm/test/CodeGen/AMDGPU/memory_clause.ll -debug-only=machine-scheduler |& egrep "^Cluster|Machine code for function"
> >   # Machine code for function vector_clause: NoPHIs, TracksLiveness
> >   Cluster ld/st SU(2) - SU(3)
> >   Cluster ld/st SU(6) - SU(7)
> >   Cluster ld/st SU(8) - SU(9)
> >   Cluster ld/st SU(11) - SU(13)
> >   # Machine code for function vector_clause: NoPHIs, TracksLiveness
>
>
> My problem is I cannot reproduce it. All I have is
>
>   # Machine code for function vector_clause: NoPHIs, TracksLiveness
>   Cluster ld/st SU(2) - SU(3)
>   # Machine code for function vector_clause: NoPHIs, TracksLiveness
>
>
> It just does not try to cluster all global loads and stores at all! It also does not do it even with D71717 <https://reviews.llvm.org/D71717>.
>  I will debug it...


I probably need some of your changes as well, but I was able to reproduce it without -amdgpu-enable-global-sgpr-addr.

With master and my change:

  $ llc -march=amdgcn -mcpu=gfx902 -verify-machineinstrs < vector_clause.ll -debug-only=machine-scheduler |& egrep "^Cluster|^SU\(.*GLOBAL_"

  Cluster ld/st SU(2) - SU(3)
  Cluster ld/st SU(8) - SU(12)
  Cluster ld/st SU(13) - SU(14)
  Cluster ld/st SU(16) - SU(18)
  SU(8):   %12:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 0, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp3, addrspace 1)
  SU(12):   %15:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 16, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp72, addrspace 1)
  SU(13):   %17:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 32, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp116, addrspace 1)
  SU(14):   %19:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 48, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp1510, addrspace 1)
  SU(15):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %12:vreg_128, 0, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp5, addrspace 1)
  SU(16):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %15:vreg_128, 16, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp94, addrspace 1)
  SU(17):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %17:vreg_128, 32, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp138, addrspace 1)
  SU(18):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %19:vreg_128, 48, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp1712, addrspace 1)

With D71717 <https://reviews.llvm.org/D71717>:

  Cluster ld/st SU(2) - SU(3)
  Cluster ld/st SU(8) - SU(12)
  Cluster ld/st SU(13) - SU(14)
  Cluster ld/st SU(15) - SU(16)
  Cluster ld/st SU(17) - SU(18)
  SU(8):   %12:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 0, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp3, addrspace 1)
  SU(12):   %15:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 16, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp72, addrspace 1)
  SU(13):   %17:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 32, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp116, addrspace 1)
  SU(14):   %19:vreg_128 = GLOBAL_LOAD_DWORDX4 %38:vreg_64, 48, 0, 0, 0, implicit $exec :: (load 16 from %ir.tmp1510, addrspace 1)
  SU(15):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %12:vreg_128, 0, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp5, addrspace 1)
  SU(16):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %15:vreg_128, 16, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp94, addrspace 1)
  SU(17):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %17:vreg_128, 32, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp138, addrspace 1)
  SU(18):   GLOBAL_STORE_DWORDX4 %28:vreg_64, %19:vreg_128, 48, 0, 0, 0, implicit $exec :: (store 16 into %ir.tmp1712, addrspace 1)

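Even with D71717 <https://reviews.llvm.org/D71717> it does not produce the desired clustering: the four loads and the four stores are still paired up rather than each forming a single cluster. So both changes are insufficient.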
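
To compare the two runs mechanically, something like the following could be used. This is only a sketch and not part of either patch: the names `cluster_pairs`, `LOG_D73509` and `LOG_D71717` are made up for illustration, and the log text is copied from the "Cluster ld/st" lines above. It extracts the SU pairs from each log and reports the pairs that appear in only one of them.

```python
import re

# Matches lines such as "Cluster ld/st SU(8) - SU(12)" from
# -debug-only=machine-scheduler output.
CLUSTER_RE = re.compile(r"^Cluster ld/st SU\((\d+)\) - SU\((\d+)\)")


def cluster_pairs(log_text):
    """Return the set of (a, b) SU pairs named in 'Cluster ld/st' lines."""
    pairs = set()
    for line in log_text.splitlines():
        match = CLUSTER_RE.match(line.strip())
        if match:
            pairs.add((int(match.group(1)), int(match.group(2))))
    return pairs


# Cluster lines copied from the run with master plus D73509.
LOG_D73509 = """\
Cluster ld/st SU(2) - SU(3)
Cluster ld/st SU(8) - SU(12)
Cluster ld/st SU(13) - SU(14)
Cluster ld/st SU(16) - SU(18)
"""

# Cluster lines copied from the run with D71717.
LOG_D71717 = """\
Cluster ld/st SU(2) - SU(3)
Cluster ld/st SU(8) - SU(12)
Cluster ld/st SU(13) - SU(14)
Cluster ld/st SU(15) - SU(16)
Cluster ld/st SU(17) - SU(18)
"""

pairs_d73509 = cluster_pairs(LOG_D73509)
pairs_d71717 = cluster_pairs(LOG_D71717)

common = sorted(pairs_d73509 & pairs_d71717)
only_d73509 = sorted(pairs_d73509 - pairs_d71717)
only_d71717 = sorted(pairs_d71717 - pairs_d73509)

print("common:     ", common)
print("only D73509:", only_d73509)
print("only D71717:", only_d71717)
```

The load pairs are identical in both runs; the difference is only in how the stores SU(15)..SU(18) are paired.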


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73509/new/

https://reviews.llvm.org/D73509