[PATCH] D149332: [AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader

Thu Apr 27 07:24:10 PDT 2023

bsaleil added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1724
     for (MachineInstr &MI : *MBB) {
-      if (SIInstrInfo::isVMEM(MI)) {
+      if (updateVMCntOnly(MI)) {
         if (MI.mayLoad())
----------------
foad wrote:
> My only slight concern is whether we should also accept FLAT instructions here? They update vmcnt but not //only// vmcnt. I'm not sure what the answer is.
I think it is still better to flush vmcnt in the preheader in this case.
With a flat load, we would have:

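```
v0 = flat_load(...)
s_waitcnt vmcnt(0)      // flushed once, in the preheader
loop {
  ...
  s_waitcnt lgkmcnt(0)  // FLAT also increments lgkmcnt, so this wait stays in the loop
  use(v0)
  ...
  store(...)
  ...
}
```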

This is better than having an s_waitcnt vmcnt inside the loop, where it would be executed on every iteration. If the store is also a flat store, it may be worth flushing lgkmcnt too, but I don't know whether this case is common.
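To make the trade-off concrete, here is a toy C++ model of which waits remain inside the loop before `use(v0)`, depending on the kind of load that produced `v0` and on whether vmcnt is flushed in the preheader. All names (`LoadKind`, `Pending`, `issueLoad`, `waitsInLoop`) are hypothetical and are not LLVM's API. The model assumes, as discussed above, that VMEM, global and scratch loads increment only vmcnt while a generic FLAT load also increments lgkmcnt, and it conservatively treats any in-loop vmcnt wait as `vmcnt(0)`. It is a sketch of the reasoning, not of what SIInsertWaitcnts actually computes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (hypothetical names, not LLVM's API) of which s_waitcnt
// instructions must stay inside the loop before use(v0), where v0 is
// loaded before the loop.
enum class LoadKind { VMEM, Global, Scratch, Flat };

// Counters still pending for the load when the loop is entered.
struct Pending {
  bool vmcnt = false;
  bool lgkmcnt = false;
};

// Assumption taken from the discussion: VMEM, global and scratch loads
// increment only vmcnt; a generic FLAT load may access LDS, so it also
// increments lgkmcnt.
Pending issueLoad(LoadKind kind) {
  Pending p;
  p.vmcnt = true;
  p.lgkmcnt = (kind == LoadKind::Flat);
  return p;
}

// Returns the waits needed inside the loop before use(v0). Without a
// preheader flush, the in-loop vmcnt wait is modeled conservatively as
// vmcnt(0): on targets where stores are tracked by vmcnt, the loop's own
// store also counts against it.
std::vector<std::string> waitsInLoop(LoadKind kind, bool flushVmcntInPreheader) {
  Pending p = issueLoad(kind);
  assert(p.vmcnt && "every load kind in this model increments vmcnt");
  if (flushVmcntInPreheader)
    p.vmcnt = false; // s_waitcnt vmcnt(0) was executed once, before the loop
  std::vector<std::string> waits;
  if (p.vmcnt)
    waits.push_back("s_waitcnt vmcnt(0)");
  if (p.lgkmcnt)
    waits.push_back("s_waitcnt lgkmcnt(0)");
  return waits;
}
```

In this model, `waitsInLoop(LoadKind::Flat, true)` leaves only `s_waitcnt lgkmcnt(0)` in the loop, whereas `waitsInLoop(LoadKind::Flat, false)` leaves both `s_waitcnt vmcnt(0)` and `s_waitcnt lgkmcnt(0)`. For a global load, flushing removes the in-loop wait entirely.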
Anyway, we should add test cases for that. @rochauha, can you extend `waitcnt-vmcnt-loop.mir` with a minimal test case for global and flat instructions?

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D149332