[PATCH] D85517: [Scheduling] Implement a new way to cluster loads/stores
Qing Shan Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 18 03:14:00 PDT 2020
steven.zhang added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/max.i16.ll:148
; GFX9-NEXT: s_waitcnt lgkmcnt(0)
-; GFX9-NEXT: global_load_short_d16 v2, v0, s[6:7] offset:4
; GFX9-NEXT: global_load_short_d16 v1, v0, s[0:1] offset:4
+; GFX9-NEXT: global_load_dword v3, v0, s[0:1]
----------------
According to the scheduler log, this is an improvement.
The old implementation clusters 3 ld/st pairs:
```
Cluster ld/st SU(2) - SU(3)
Copy Succ SU(14)
Copy Succ SU(13)
Copy Succ SU(8)
Copy Succ SU(7)
Curr cluster length: 2, Curr cluster bytes: 24
Cluster ld/st SU(8) - SU(10)
Copy Succ SU(11)
Copy Succ SU(14)
Copy Succ SU(13)
Curr cluster length: 2, Curr cluster bytes: 8
Num BaseOps: 2, Offset: 4, OffsetIsScalable: 0, Width: 4
Num BaseOps: 2, Offset: 4, OffsetIsScalable: 0, Width: 4
Num BaseOps: 2, Offset: 0, OffsetIsScalable: 0, Width: 4
Cluster ld/st SU(13) - SU(14)
Copy Pred SU(11)
Copy Pred SU(10)
Copy Pred SU(9)
Copy Pred SU(8)
Copy Pred SU(7)
Copy Pred SU(4)
Copy Pred SU(2)
Copy Pred SU(3)
Curr cluster length: 2, Curr cluster bytes: 8
Final:
SU(0): %1:sgpr_64(p4) = COPY $sgpr0_sgpr1
SU(1): %0:vgpr_32(s32) = COPY $vgpr0
SU(2): %4:sgpr_128 = S_LOAD_DWORDX4_IMM %1:sgpr_64(p4), 36, 0, 0 :: (dereferenceable invariant load 16 from %ir.1, align 4, addrspace 4)
SU(3): %14:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 52, 0, 0 :: (dereferenceable invariant load 8 from %ir.1 + 16, align 4, addrspace 4)
SU(4): %16:vgpr_32 = V_LSHLREV_B32_e32 3, %0:vgpr_32(s32), implicit $exec
SU(5): %20:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
SU(6): %18:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
SU(7): %18:vgpr_32 = GLOBAL_LOAD_SHORT_D16_SADDR %4.sub2_sub3:sgpr_128, %16:vgpr_32, 4, 0, 0, 0, %18:vgpr_32(tied-def 0), implicit $exec :: (load 2 from %ir.gep0 + 4, align 4, addrspace 1)
SU(9): %20:vgpr_32 = GLOBAL_LOAD_SHORT_D16_SADDR %14:sreg_64_xexec, %16:vgpr_32, 4, 0, 0, 0, %20:vgpr_32(tied-def 0), implicit $exec :: (load 2 from %ir.gep1 + 4, align 4, addrspace 1)
SU(8): %19:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %4.sub2_sub3:sgpr_128, %16:vgpr_32, 0, 0, 0, 0, implicit $exec :: (load 4 from %ir.gep0, addrspace 1)
SU(10): %21:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %14:sreg_64_xexec, %16:vgpr_32, 0, 0, 0, 0, implicit $exec :: (load 4 from %ir.gep1, addrspace 1)
SU(11): %22:vgpr_32 = V_PK_MAX_I16 8, %19:vgpr_32, 8, %21:vgpr_32, 0, 0, 0, 0, 0, implicit $exec
SU(12): %23:vgpr_32 = V_PK_MAX_I16 8, %18:vgpr_32, 8, %20:vgpr_32, 0, 0, 0, 0, 0, implicit $exec
SU(13): GLOBAL_STORE_SHORT_SADDR %16:vgpr_32, %23:vgpr_32, %4.sub0_sub1:sgpr_128, 4, 0, 0, 0, implicit $exec :: (store 2 into %ir.outgep + 4, align 4, addrspace 1)
SU(14): GLOBAL_STORE_DWORD_SADDR %16:vgpr_32, %22:vgpr_32, %4.sub0_sub1:sgpr_128, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.outgep, addrspace 1)
```
The new implementation clusters 5 pairs:
```
Cluster ld/st SU(2) - SU(3)
Copy Succ SU(14)
Copy Succ SU(13)
Copy Succ SU(8)
Copy Succ SU(7)
Curr cluster length: 2, Curr cluster bytes: 24
Cluster ld/st SU(7) - SU(8)
Copy Succ SU(12)
Copy Succ SU(14)
Copy Succ SU(13)
Curr cluster length: 2, Curr cluster bytes: 8
Cluster ld/st SU(7) - SU(10)
Copy Succ SU(12)
Copy Succ SU(14)
Copy Succ SU(13)
Copy Succ SU(8)
Curr cluster length: 3, Curr cluster bytes: 12
Cluster ld/st SU(9) - SU(10)
Copy Succ SU(12)
Copy Succ SU(14)
Copy Succ SU(13)
Curr cluster length: 4, Curr cluster bytes: 16
Num BaseOps: 2, Offset: 4, OffsetIsScalable: 0, Width: 4
Num BaseOps: 2, Offset: 0, OffsetIsScalable: 0, Width: 4
Cluster ld/st SU(13) - SU(14)
Copy Pred SU(11)
Copy Pred SU(10)
Copy Pred SU(9)
Copy Pred SU(8)
Copy Pred SU(7)
Copy Pred SU(4)
Copy Pred SU(2)
Copy Pred SU(3)
Curr cluster length: 2, Curr cluster bytes: 8
Final:
*** Final schedule for %bb.0 ***
SU(0): %1:sgpr_64(p4) = COPY $sgpr0_sgpr1
SU(1): %0:vgpr_32(s32) = COPY $vgpr0
SU(2): %4:sgpr_128 = S_LOAD_DWORDX4_IMM %1:sgpr_64(p4), 36, 0, 0 :: (dereferenceable invariant load 16 from %ir.1, align 4, addrspace 4)
SU(3): %14:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 52, 0, 0 :: (dereferenceable invariant load 8 from %ir.1 + 16, align 4, addrspace 4)
SU(4): %16:vgpr_32 = V_LSHLREV_B32_e32 3, %0:vgpr_32(s32), implicit $exec
SU(5): %20:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
SU(6): %18:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
SU(9): %20:vgpr_32 = GLOBAL_LOAD_SHORT_D16_SADDR %14:sreg_64_xexec, %16:vgpr_32, 4, 0, 0, 0, %20:vgpr_32(tied-def 0), implicit $exec :: (load 2 from %ir.gep1 + 4, align 4, addrspace 1)
SU(10): %21:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %14:sreg_64_xexec, %16:vgpr_32, 0, 0, 0, 0, implicit $exec :: (load 4 from %ir.gep1, addrspace 1)
SU(7): %18:vgpr_32 = GLOBAL_LOAD_SHORT_D16_SADDR %4.sub2_sub3:sgpr_128, %16:vgpr_32, 4, 0, 0, 0, %18:vgpr_32(tied-def 0), implicit $exec :: (load 2 from %ir.gep0 + 4, align 4, addrspace 1)
SU(8): %19:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %4.sub2_sub3:sgpr_128, %16:vgpr_32, 0, 0, 0, 0, implicit $exec :: (load 4 from %ir.gep0, addrspace 1)
SU(11): %22:vgpr_32 = V_PK_MAX_I16 8, %19:vgpr_32, 8, %21:vgpr_32, 0, 0, 0, 0, 0, implicit $exec
SU(12): %23:vgpr_32 = V_PK_MAX_I16 8, %18:vgpr_32, 8, %20:vgpr_32, 0, 0, 0, 0, 0, implicit $exec
SU(13): GLOBAL_STORE_SHORT_SADDR %16:vgpr_32, %23:vgpr_32, %4.sub0_sub1:sgpr_128, 4, 0, 0, 0, implicit $exec :: (store 2 into %ir.outgep + 4, align 4, addrspace 1)
SU(14): GLOBAL_STORE_DWORD_SADDR %16:vgpr_32, %22:vgpr_32, %4.sub0_sub1:sgpr_128, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.outgep, addrspace 1)
```
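For anyone who wants to double-check the pair counts, a small script along these lines can tally the `Cluster ld/st` lines in each log. This is only a sketch: `cluster_pairs` and `CLUSTER_RE` are hypothetical names, the line format is assumed to match the logs quoted above, and the two embedded excerpts keep only the `Cluster ld/st` and `Curr cluster` lines from those logs.

```python
import re

# Assumed line format, taken from the logs above: "Cluster ld/st SU(2) - SU(3)".
CLUSTER_RE = re.compile(r"^\s*Cluster ld/st SU\((\d+)\) - SU\((\d+)\)")


def cluster_pairs(log_text):
    """Return the (a, b) SU index pairs reported as clustered, in log order."""
    pairs = []
    for line in log_text.splitlines():
        match = CLUSTER_RE.match(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


# Abbreviated excerpt of the old implementation's log.
OLD_LOG = """\
Cluster ld/st SU(2) - SU(3)
Curr cluster length: 2, Curr cluster bytes: 24
Cluster ld/st SU(8) - SU(10)
Curr cluster length: 2, Curr cluster bytes: 8
Cluster ld/st SU(13) - SU(14)
Curr cluster length: 2, Curr cluster bytes: 8
"""

# Abbreviated excerpt of the new implementation's log.
NEW_LOG = """\
Cluster ld/st SU(2) - SU(3)
Curr cluster length: 2, Curr cluster bytes: 24
Cluster ld/st SU(7) - SU(8)
Curr cluster length: 2, Curr cluster bytes: 8
Cluster ld/st SU(7) - SU(10)
Curr cluster length: 3, Curr cluster bytes: 12
Cluster ld/st SU(9) - SU(10)
Curr cluster length: 4, Curr cluster bytes: 16
Cluster ld/st SU(13) - SU(14)
Curr cluster length: 2, Curr cluster bytes: 8
"""

old_pairs = cluster_pairs(OLD_LOG)
new_pairs = cluster_pairs(NEW_LOG)
print("old:", len(old_pairs), old_pairs)
print("new:", len(new_pairs), new_pairs)
```

On these excerpts the script reports 3 pairs for the old log and 5 for the new one, matching the counts given above.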
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85517/new/
https://reviews.llvm.org/D85517