[PATCH] D72737: [AMDGPU] Bundle loads before post-RA scheduler

Wed Jan 15 01:42:17 PST 2020

foad added a comment.

There are nice changes in a bunch of tests, where we're preserving clusters instead of breaking them apart.

But there are also strange changes in some other tests, where the clustering hasn't changed but some instructions that use the result of a load have moved around. Does this mean we're getting the latency of the load wrong now? (Or were we getting it wrong before?) Examples of affected tests:
insert_vector_elt
llvm.maxnum.f16.ll
saddo.ll
sign_extend.ll
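
To make "have moved around" concrete, one could measure how many instructions sit between each load and the first instruction that reads its result, before and after the patch. The following is only a rough sketch: the helper names (`expand_regs`, `load_to_use_distances`) are hypothetical, the parsing is a simplification that only understands `sN`/`vN` registers and ranges, and the instruction sequences in the usage note are made up for illustration rather than taken from the tests above.

```python
import re

# Matches s4, v0, s[4:7], v[0:1]. Simplification: ignores vcc, exec, m0, etc.
REG = re.compile(r"\b([sv])(?:\[(\d+):(\d+)\]|(\d+))")


def expand_regs(text):
    """Expand register operands into single registers, e.g. s[4:5] -> {s4, s5}."""
    found = set()
    for kind, lo, hi, single in REG.findall(text):
        if single:
            found.add(kind + single)
        else:
            found.update(f"{kind}{n}" for n in range(int(lo), int(hi) + 1))
    return found


def load_to_use_distances(asm_lines):
    """For each load, count the instructions between it and its first reader.

    Assumption: the first operand is the destination, except for stores,
    where every operand is treated as a source. Returns a list of
    (mnemonic, destination, distance) with distance None if no reader is found.
    """
    insts = []
    for line in asm_lines:
        parts = line.strip().split(None, 1)
        if parts:
            operands = parts[1].split(",") if len(parts) > 1 else []
            insts.append((parts[0], operands))

    results = []
    for i, (mnemonic, operands) in enumerate(insts):
        if "_load_" not in mnemonic or not operands:
            continue
        dest = expand_regs(operands[0])
        distance = None
        for j in range(i + 1, len(insts)):
            later_mnemonic, later_ops = insts[j]
            sources = later_ops if "_store_" in later_mnemonic else later_ops[1:]
            read = set()
            for op in sources:
                read |= expand_regs(op)
            if dest & read:
                distance = j - i - 1  # instructions strictly in between
                break
        results.append((mnemonic, operands[0].strip(), distance))
    return results
```

For a made-up sequence `s_load_dword s2, s[0:1], 0x0` / `v_mov_b32_e32 v1, 0` / `s_waitcnt lgkmcnt(0)` / `v_add_u32_e32 v0, s2, v0`, this reports a distance of 2 for the load of `s2`. If the `v_add` were scheduled directly after the `s_waitcnt`, the distance would drop to 1, which is the kind of shift that would point at a changed latency estimate.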

================
Comment at: llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll:288
+; GFX1064-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x24
 ; GFX1064-NEXT:    s_load_dword s0, s[0:1], 0x2c
 ; GFX1064-NEXT:    ; implicit-def: $vgpr1
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll:188
+; GCN-NEXT:    global_store_dword v[0:1], v0, off
+; GCN-NEXT:    s_endpgm
 ; GCN-NEXT:  BB4_2: ; %if.else
----------------
What happened here? Has some cost estimate changed because of the bundling? Can we fix it?

================
Comment at: llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll:278
+; SI-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x9
+; SI-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0xb
 ; SI-NEXT:    s_waitcnt lgkmcnt(0)
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/idot2.ll:2677
 ; GFX7-NEXT:    s_load_dwordx4 s[4:7], s[0:1], 0x9
+; GFX7-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0xd
 ; GFX7-NEXT:    s_mov_b32 s3, 0xf000
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/idot2.ll:2797
 ; GFX10-DL-NEXT:    global_load_ushort v0, v[0:1], off
+; GFX10-DL-NEXT:    s_load_dword s2, s[0:1], 0x0
 ; GFX10-DL-NEXT:    s_waitcnt vmcnt(1)
----------------
Is this just coincidence, or are we actually trying to cluster a FLAT load with an SMEM load?
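
A quick way to look for other places where this happens would be to scan the generated checks for adjacent memory instructions of different kinds. The sketch below is an assumption-laden helper, not part of the patch: `memory_kind` and `mixed_kind_pairs` are hypothetical names, and the classification goes purely by mnemonic prefix. Adjacency in the output does not prove that the scheduler clustered the two instructions; it only flags candidates worth a closer look.

```python
# Mnemonic prefix -> memory instruction kind. Longer, more specific
# prefixes come first so that "s_buffer_load_" is not shadowed.
PREFIX_KINDS = [
    ("s_buffer_load_", "SMEM"),
    ("s_load_", "SMEM"),
    ("flat_", "FLAT"),
    ("global_", "FLAT"),   # global_* uses the FLAT encoding
    ("scratch_", "FLAT"),  # scratch_* uses the FLAT encoding
    ("tbuffer_", "MTBUF"),
    ("buffer_", "MUBUF"),
    ("ds_", "DS"),
    ("image_", "MIMG"),
]


def memory_kind(mnemonic):
    """Return the memory kind for a mnemonic, or None if it is not a memory op."""
    for prefix, kind in PREFIX_KINDS:
        if mnemonic.startswith(prefix):
            return kind
    return None


def mixed_kind_pairs(asm_lines):
    """Return (first, second) mnemonic pairs for adjacent memory instructions
    whose kinds differ, e.g. a FLAT load directly followed by an SMEM load."""
    mnemonics = [line.split()[0] for line in asm_lines if line.strip()]
    pairs = []
    for first, second in zip(mnemonics, mnemonics[1:]):
        k1, k2 = memory_kind(first), memory_kind(second)
        if k1 and k2 and k1 != k2:
            pairs.append((first, second))
    return pairs
```

Run on the three instructions quoted in this hunk, it flags the `global_load_ushort` / `s_load_dword` pair; run on the GFX7 hunk from the same file (two `s_load_*` followed by `s_mov_b32`), it flags nothing.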

================
Comment at: llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll:582
 ; GFX9-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x0
-; GFX9-NEXT:    v_lshlrev_b32_e32 v2, 2, v0
 ; GFX9-NEXT:    s_load_dword s4, s[4:5], 0x10
+; GFX9-NEXT:    v_lshlrev_b32_e32 v2, 2, v0
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll:148
 ; GFX9-NEXT:    s_load_dwordx4 s[4:7], s[0:1], 0x24
-; GFX9-NEXT:    v_lshlrev_b32_e32 v2, 2, v0
 ; GFX9-NEXT:    s_load_dword s0, s[0:1], 0x34
+; GFX9-NEXT:    v_lshlrev_b32_e32 v2, 2, v0
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/memory_clause.ll:8
 ; GCN-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; GCN-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x2c
 ; GCN-NEXT:    v_mov_b32_e32 v17, 0
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/shl.ll:166
 ; GCN-NEXT:    s_load_dwordx4 s[4:7], s[0:1], 0x9
+; GCN-NEXT:    s_load_dword s8, s[0:1], 0xd
 ; GCN-NEXT:    s_mov_b32 s3, 0xf000
----------------
Nice.

================
Comment at: llvm/test/CodeGen/AMDGPU/shl.v2i16.ll:149
 ; GFX9-NEXT:    s_load_dwordx4 s[4:7], s[0:1], 0x24
-; GFX9-NEXT:    v_lshlrev_b32_e32 v2, 2, v0
 ; GFX9-NEXT:    s_load_dword s0, s[0:1], 0x34
+; GFX9-NEXT:    v_lshlrev_b32_e32 v2, 2, v0
----------------
Nice.

https://reviews.llvm.org/D72737