[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 20 14:31:42 PDT 2020
rampitec added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:103
+ $vgpr3 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.flat4)
+ $vgpr4 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
+ $vgpr0 = V_MOV_B32_e32 $vgpr3, implicit $exec
----------------
t-tye wrote:
> rampitec wrote:
> > Can you keep just load from flat here? The other load obscures the result.
> Add the extra BB3 you suggested.
>
> The waitcnts being generated seem correct from my inspection.
They seem to be correct, but with two loads per block it is hard to tell which of the loads actually caused the wait. If you want to keep it this way, add one more block, bb.4, containing only a load from flat.
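
For illustration only, such a block might look roughly like the sketch below. It reuses the flat load and the V_MOV from the lines quoted above; the exact registers and block placement are assumptions and are not taken from the patch.

```
bb.4:
  ; Hypothetical block: a single load from flat, so any wait inserted
  ; before the use of $vgpr3 can only be attributed to this load.
  $vgpr3 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.flat4)
  $vgpr0 = V_MOV_B32_e32 $vgpr3, implicit $exec
```

With only one load in the block, the expected waitcnt before the V_MOV can be checked without having to work out which memory operation triggered it.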
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D89618/new/
https://reviews.llvm.org/D89618