[PATCH] D72737: [AMDGPU] Bundle loads before post-RA scheduler

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 15 08:45:50 PST 2020


rampitec marked 2 inline comments as done.
rampitec added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll:188
+; GCN-NEXT:    global_store_dword v[0:1], v0, off
+; GCN-NEXT:    s_endpgm
 ; GCN-NEXT:  BB4_2: ; %if.else
----------------
foad wrote:
> What happened here? Has some cost estimate changed because of the bundling? Can we fix it?
The block was duplicated by Branch Probability Basic Block Placement immediately after the post-RA scheduler.
It is now duplicated because the default value of -tail-dup-placement-threshold is 2. With a threshold of 3 it is duplicated even without bundling.
That is because TailDuplicator::shouldTailDuplicate() simply counts instructions and compares the count against the threshold: https://llvm.org/doxygen/TailDuplicator_8cpp_source.html#l00622

This can be fixed in a separate follow-up patch that adds the bundle's size to the count when an instruction is a bundle. I am not sure whether that would affect other targets.
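
To illustrate the counting difference, here is a minimal self-contained sketch. It is a toy model, not the TailDuplicator code: ToyInstr, countTopLevel, countWithBundleSize and fitsThreshold are hypothetical names, and the block contents are illustrative rather than taken from the test. It only assumes that a block qualifies when its instruction count does not exceed the threshold.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for one top-level entry in a basic block (not an LLVM type).
// BundledCount is 1 for a plain instruction and N for a BUNDLE that wraps
// N real instructions.
struct ToyInstr {
  std::size_t BundledCount;
};

// Counts every top-level entry as one instruction, so a whole bundle
// contributes 1 regardless of how many instructions it contains.
std::size_t countTopLevel(const std::vector<ToyInstr> &Block) {
  return Block.size();
}

// Hypothetical follow-up behaviour: a bundle contributes its size.
std::size_t countWithBundleSize(const std::vector<ToyInstr> &Block) {
  std::size_t Count = 0;
  for (const ToyInstr &I : Block)
    Count += I.BundledCount;
  return Count;
}

// Assumed threshold rule: the block qualifies when the count does not
// exceed the threshold.
bool fitsThreshold(std::size_t Count, std::size_t Threshold) {
  return Count <= Threshold;
}
```

For a block holding one bundle of two loads plus one plain instruction, the top-level count is 2 and fits a threshold of 2, while the bundle-aware count is 3 and only fits a threshold of 3.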


================
Comment at: llvm/test/CodeGen/AMDGPU/idot2.ll:2797
 ; GFX10-DL-NEXT:    global_load_ushort v0, v[0:1], off
+; GFX10-DL-NEXT:    s_load_dword s2, s[0:1], 0x0
 ; GFX10-DL-NEXT:    s_waitcnt vmcnt(1)
----------------
foad wrote:
> Is this just coincidence, or are we actually trying to cluster a FLAT load with an SMEM load?
No, we do not:

  BUNDLE implicit-def $vgpr2, implicit-def $vgpr0, implicit killed $vgpr2_vgpr3, implicit $exec, implicit killed $vgpr0_vgpr1 {
    renamable $vgpr2 = GLOBAL_LOAD_USHORT killed renamable $vgpr2_vgpr3, 0, 0, 0, 0, implicit $exec :: (load 2 from %ir.2, addrspace 1)
    renamable $vgpr0 = GLOBAL_LOAD_USHORT killed renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec :: (load 2 from %ir.3, addrspace 1)
  }
  renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr4_sgpr5, 0, 0, 0 :: (load 4 from %ir.4, addrspace 1)

I guess that is because of the removed mutation again. Anyway, this is a better schedule because global loads take longer than SMRD.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72737/new/

https://reviews.llvm.org/D72737




