[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations

Tue Oct 20 14:24:31 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:103
+    $vgpr3 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.flat4)
+    $vgpr4 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
+    $vgpr0 = V_MOV_B32_e32 $vgpr3, implicit $exec
----------------
Can you keep just the load from flat here? The other load obscures the result.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89618/new/

https://reviews.llvm.org/D89618