[PATCH] D85517: [Scheduling] Implement a new way to cluster loads/stores
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 7 08:48:19 PDT 2020
foad added a comment.
Thanks for the new detailed summary.
================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:1620-1621
SUnit *SUb = MemOpb.SU;
if (SUa->NodeNum > SUb->NodeNum)
std::swap(SUa, SUb);
----------------
This is supposed to prefer to keep loads/stores in their original code order. From the AMDGPU test case diffs (e.g. test/CodeGen/AMDGPU/global-saddr.ll) it looks like a lot of the clusters have been reordered. Do you have any idea why?
================
Comment at: llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll:627-630
+; FIXEDABI: buffer_store_dword [[K0]], off, s[0:3], 0 offset:4{{$}}
; FIXEDABI: s_movk_i32 s32, 0x400{{$}}
; FIXEDABI: v_mov_b32_e32 [[K1:v[0-9]+]], 0x140
; FIXEDABI: buffer_store_dword [[K1]], off, s[0:3], s32{{$}}
----------------
This looks like a regression because the stores on lines 627 and 630 are no longer clustered. BUT see D85530: I don't think there is any reason for AMDGPU to try to cluster stores, so this may become a non-issue.
================
Comment at: llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll:672-674
; FIXEDABI: buffer_store_dword [[K0]], off, s[0:3], s33{{$}}
+; FIXEDABI: v_mov_b32_e32 [[K1:v[0-9]+]], 0x140{{$}}
; FIXEDABI: buffer_store_dword [[K1]], off, s[0:3], s32{{$}}
----------------
As above, these stores are no longer clustered.
================
Comment at: llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll:165-166
; SI: ; %bb.0:
-; SI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x11
-; SI-NEXT: s_mov_b32 s11, 0xf000
-; SI-NEXT: s_mov_b32 s6, 0
-; SI-NEXT: s_mov_b32 s7, s11
+; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9
+; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x11
+; SI-NEXT: s_mov_b32 s15, 0xf000
----------------
This looks good!
================
Comment at: llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll:197-198
; VI: ; %bb.0:
-; VI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x44
+; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24
+; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x44
; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0
----------------
This looks good!
================
Comment at: llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll:759-761
+; SI-NEXT: buffer_store_dword v7, off, s[4:7], 0 offset:24
; SI-NEXT: buffer_store_dwordx2 v[4:5], off, s[4:7], 0 offset:16
; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
----------------
This looks like an improvement modulo D85530.
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll:349-350
; CI: ; %bb.0:
-; CI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x11
-; CI-NEXT: s_brev_b32 s12, -2
+; CI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
+; CI-NEXT: s_load_dwordx8 s[8:15], s[0:1], 0x11
+; CI-NEXT: s_brev_b32 s2, -2
----------------
This looks good!
================
Comment at: llvm/test/CodeGen/AMDGPU/stack-realign.ll:167-171
+; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024
; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34
; GCN-NEXT: s_add_u32 s32, s32, 0x30000
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32
----------------
The stores are no longer clustered.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85517/new/
https://reviews.llvm.org/D85517
More information about the llvm-commits
mailing list