[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations

Tue Oct 20 14:31:42 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:103
+    $vgpr3 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.flat4)
+    $vgpr4 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
+    $vgpr0 = V_MOV_B32_e32 $vgpr3, implicit $exec
----------------
t-tye wrote:
> rampitec wrote:
> > Can you keep just the load from flat here? The other load obscures the result.
> Added the extra BB3 you suggested.
> 
> The waitcnts being generated seem correct from my inspection.
They do seem correct, but with two loads per block it is hard to tell which of the loads actually caused the wait. If you want to keep it this way, please add one more block, bb.4, containing only a load from flat.
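
A rough sketch of what such a block could look like, reusing the operands from the lines quoted above. The block label and its placement are assumptions, the fragment is not a complete function (a terminator and successor lists would still be needed), and the expected waitcnt output is deliberately left out:

```
  ; Hypothetical bb.4: a single load from flat, so any wait inserted before
  ; the use of $vgpr3 can only be attributed to this one load.
  bb.4:
    $vgpr3 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.flat4)
    $vgpr0 = V_MOV_B32_e32 $vgpr3, implicit $exec
```

With only one memory operation in the block, the check lines for it show directly what the flat case alone requires.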

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89618/new/

https://reviews.llvm.org/D89618