[PATCH] D149332: [AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader

Thu Apr 27 07:30:58 PDT 2023

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1724
     for (MachineInstr &MI : *MBB) {
-      if (SIInstrInfo::isVMEM(MI)) {
+      if (updateVMCntOnly(MI)) {
         if (MI.mayLoad())
----------------
bsaleil wrote:
> foad wrote:
> > My only slight concern is whether we should also accept FLAT instructions here? They update vmcnt but not //only// vmcnt. I'm not sure what the answer is.
> I think it is still better to flush vmcnt in this case.
> With a flat load, we would have:
> 
> ```
> v0 = flat_load(...)
> s_waitcnt vmcnt(0)
> loop {
>   ...
>   s_waitcnt lgkmcnt(0)
>   use(v0)
>   ...
>   store(...)
>   ...
> }
> ```
> 
> Which is better than having an s_waitcnt vmcnt in the loop. If the store is also a flat store, it may be worth flushing lgkmcnt too, but I don't know whether this case is common.
> Anyway, we should add test cases for that. @rochauha, can you extend `waitcnt-vmcnt-loop.mir` with a minimal test case for global and flat instructions?
> I think it is still better to flush vmcnt in this case.
OK, so maybe we should test `isVMEM || isFLAT` here?
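
To make the difference between the two predicates concrete, here is a rough standalone sketch. It is a toy model, not the LLVM code: `MemKind`, `updatesVMCntOnly` and `isVMEMOrFLAT` are hypothetical names, and the classification assumes that global and scratch accesses update only vmcnt while generic flat accesses may update lgkmcnt as well.

```cpp
#include <cassert>

// Toy stand-ins, not LLVM types: the real code inspects a MachineInstr
// through SIInstrInfo predicates.
enum class MemKind { Buffer, Global, Scratch, Flat, LDS };

// Assumed reading of the patch: buffer-style VMEM plus global and scratch
// accesses update vmcnt and no other counter.
constexpr bool updatesVMCntOnly(MemKind K) {
  return K == MemKind::Buffer || K == MemKind::Global ||
         K == MemKind::Scratch;
}

// Assumed reading of the broader `isVMEM || isFLAT` test: generic flat
// accesses are accepted too, even though they may also update lgkmcnt.
constexpr bool isVMEMOrFLAT(MemKind K) {
  return updatesVMCntOnly(K) || K == MemKind::Flat;
}

// The two predicates differ only on generic flat accesses.
static_assert(!updatesVMCntOnly(MemKind::Flat), "flat is not vmcnt-only");
static_assert(isVMEMOrFLAT(MemKind::Flat), "flat accepted by broader test");
```

Under these assumptions the only behavioural difference is the generic flat case, which is exactly the case the new test in `waitcnt-vmcnt-loop.mir` would need to cover.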

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149332/new/

https://reviews.llvm.org/D149332