[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations

Mon Oct 19 23:33:34 PDT 2020

rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

The patch clearly ignores the existence of flat pointers, as the failing test shows.

================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:66
 # CHECK: FLAT_LOAD_DWORD
-# GFX89: S_WAITCNT 112
+# GFX89: S_WAITCNT 3952
 # CHECK: FLAT_LOAD_DWORDX4
----------------
t-tye wrote:
> rampitec wrote:
> > That one was not supposed to change? The pointer is flat here.
> Yes. Previously it was "s_waitcnt vmcnt(0) lgkmcnt(0)". Now it is "s_waitcnt vmcnt(0)" as the address space of global16 is 1 which is GLOBAL. Therefore there is no need to wait on LGKM.
It is not global; it is flat:

```
<4 x i32>* %flat16
...
    $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.flat16)
```
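
For reference, the two immediates in the test diff above can be decoded into their counter fields. This is a minimal sketch, assuming the pre-GFX10 S_WAITCNT layout (vmcnt in bits 3:0, plus bits 15:14 on GFX9; expcnt in bits 6:4; lgkmcnt in bits 11:8). The helper name `decode_waitcnt` is hypothetical and not part of LLVM.

```python
def decode_waitcnt(imm):
    """Split a pre-GFX10 S_WAITCNT immediate into its counter fields.

    Assumed layout (GFX8/GFX9): vmcnt[3:0], expcnt[6:4], lgkmcnt[11:8],
    and on GFX9 two extra high vmcnt bits at [15:14].
    """
    vmcnt_lo = imm & 0xF           # bits 3:0
    vmcnt_hi = (imm >> 14) & 0x3   # bits 15:14 (zero for both values below)
    expcnt = (imm >> 4) & 0x7      # bits 6:4
    lgkmcnt = (imm >> 8) & 0xF     # bits 11:8
    return {
        "vmcnt": vmcnt_lo | (vmcnt_hi << 4),
        "expcnt": expcnt,
        "lgkmcnt": lgkmcnt,
    }


# The old and new CHECK values from waitcnt.mir:66.
print(decode_waitcnt(112))   # old expectation
print(decode_waitcnt(3952))  # new expectation
```

A field left at its maximum value (expcnt 7, lgkmcnt 15) means no wait on that counter. So 112 corresponds to "s_waitcnt vmcnt(0) lgkmcnt(0)", while 3952 corresponds to "s_waitcnt vmcnt(0)" with the LGKM wait dropped.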

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89618