[PATCH] D85517: [Scheduling] Implement a new way to cluster loads/stores

Fri Aug 7 08:48:19 PDT 2020

foad added a comment.

Thanks for the new detailed summary.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:1620-1621
     SUnit *SUb = MemOpb.SU;
     if (SUa->NodeNum > SUb->NodeNum)
       std::swap(SUa, SUb);

----------------
This is supposed to prefer to keep loads/stores in their original code order. From the AMDGPU test case diffs (e.g. test/CodeGen/AMDGPU/global-saddr.ll) it looks like a lot of the clusters have been reordered. Do you have any idea why?

================
Comment at: llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll:627-630
+; FIXEDABI: buffer_store_dword [[K0]], off, s[0:3], 0 offset:4{{$}}
 ; FIXEDABI: s_movk_i32 s32, 0x400{{$}}
 ; FIXEDABI: v_mov_b32_e32 [[K1:v[0-9]+]], 0x140
 ; FIXEDABI: buffer_store_dword [[K1]], off, s[0:3], s32{{$}}
----------------
This looks like a regression because the stores on lines 627 and 630 are no longer clustered. BUT see D85530: I don't think there is any reason for AMDGPU to try to cluster stores, so this may become a non-issue.

================
Comment at: llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll:672-674
 ; FIXEDABI: buffer_store_dword [[K0]], off, s[0:3], s33{{$}}
+; FIXEDABI: v_mov_b32_e32 [[K1:v[0-9]+]], 0x140{{$}}
 ; FIXEDABI: buffer_store_dword [[K1]], off, s[0:3], s32{{$}}
----------------
As above, these stores are no longer clustered.

================
Comment at: llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll:165-166
 ; SI:       ; %bb.0:
-; SI-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x11
-; SI-NEXT:    s_mov_b32 s11, 0xf000
-; SI-NEXT:    s_mov_b32 s6, 0
-; SI-NEXT:    s_mov_b32 s7, s11
+; SI-NEXT:    s_load_dwordx8 s[4:11], s[0:1], 0x9
+; SI-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x11
+; SI-NEXT:    s_mov_b32 s15, 0xf000
----------------
This looks good!

================
Comment at: llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll:197-198
 ; VI:       ; %bb.0:
-; VI-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x44
+; VI-NEXT:    s_load_dwordx8 s[4:11], s[0:1], 0x24
+; VI-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x44
 ; VI-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
----------------
This looks good!

================
Comment at: llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll:759-761
+; SI-NEXT:    buffer_store_dword v7, off, s[4:7], 0 offset:24
 ; SI-NEXT:    buffer_store_dwordx2 v[4:5], off, s[4:7], 0 offset:16
 ; SI-NEXT:    buffer_store_dwordx4 v[0:3], off, s[4:7], 0
----------------
This looks like an improvement modulo D85530.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll:349-350
 ; CI:       ; %bb.0:
-; CI-NEXT:    s_load_dwordx8 s[4:11], s[0:1], 0x11
-; CI-NEXT:    s_brev_b32 s12, -2
+; CI-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x9
+; CI-NEXT:    s_load_dwordx8 s[8:15], s[0:1], 0x11
+; CI-NEXT:    s_brev_b32 s2, -2
----------------
This looks good!

================
Comment at: llvm/test/CodeGen/AMDGPU/stack-realign.ll:167-171
+; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024
 ; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34
 ; GCN-NEXT: s_add_u32 s32, s32, 0x30000

 ; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32
----------------
The stores are no longer clustered.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85517/new/

https://reviews.llvm.org/D85517