[PATCH] D149332: [AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader

Thu Apr 27 08:14:11 PDT 2023

kerbowa added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1724
     for (MachineInstr &MI : *MBB) {
-      if (SIInstrInfo::isVMEM(MI)) {
+      if (updateVMCntOnly(MI)) {
         if (MI.mayLoad())
----------------
foad wrote:
> bsaleil wrote:
> > foad wrote:
> > > My only slight concern is whether we should also accept FLAT instructions here? They update vmcnt but not //only// vmcnt. I'm not sure what the answer is.
> > I think it is still better to flush vmcnt in this case.
> > With a flat load, we would have:
> > 
> > ```v0 = flat_load(...)
> > s_waitcnt vmcnt(0)
> > loop {
> >   ...
> >   s_waitcnt lgkmcnt(0)
> >   use(v0)
> >   ...
> >   store(...)
> >   ...
> > }```
> > 
> > Which is better than having a s_waitcnt vmcnt in the loop. If the store is also a flat store, it may be worth flushing lgkmcnt too, but I don't know if this case is common or not.
> > Anyway, we should add tests cases for that. @rochauha, can you extend `waitcnt-vmcnt-loop.mir` with a minimal test case for global and flat instructions ?
> > I think it is still better to flush vmcnt in this case.
> OK, so maybe we should test `isVMEM || isFLAT` here?
@rochauha You could also try isFLAT+mayAccessVMEMThroughFlat

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149332/new/

https://reviews.llvm.org/D149332