[llvm] [AMDGPU][InsertWaitCnts] Optimize loadcnt insertion at function boundaries (PR #169647)

Pankaj Dwivedi via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 26 23:01:42 PST 2025


================
@@ -715,6 +717,16 @@ class WaitcntBrackets {
     PendingEvents |= Context->WaitEventMaskForInst[STORE_CNT];
   }
 
+  // Returns true if there are pending VGPR-writing loads for counter type T.
+  // This is used to optimize waitcnt insertion at function boundaries when the
+  // only pending LOAD_CNT events are from instructions that don't write to
+  // VGPRs (e.g., GLOBAL_INV). We check for VMEM_READ_ACCESS or VMEM_ACCESS
+  // events, which correspond to actual VGPR-writing loads.
+  bool hasPendingVGPRWait(InstCounterType T) const {
+    assert(T == LOAD_CNT && "Only LOAD_CNT is supported");
+    return hasPendingEvent(VMEM_READ_ACCESS) || hasPendingEvent(VMEM_ACCESS);
----------------
PankajDwivedi-25 wrote:

I mean that here we only need to check for VMEM_READ_ACCESS, not VMEM_ACCESS.
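
To make the difference concrete, here is a minimal standalone sketch of the two variants. It is not the code in SIInsertWaitcnts.cpp: `PendingEventsModel`, `setPending`, the two `hasPendingVGPRWait*` names, and the `HYPOTHETICAL_GLOBAL_INV` enumerator are illustrative assumptions, and the enum values are arbitrary. Only `hasPendingEvent`, `VMEM_ACCESS`, and `VMEM_READ_ACCESS` are taken from the hunk above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the event kinds named in the hunk. The real
// enumerators and their values are defined in the pass and are not
// reproduced here.
enum WaitEvent : unsigned {
  VMEM_ACCESS,
  VMEM_READ_ACCESS,
  HYPOTHETICAL_GLOBAL_INV // assumed: a LOAD_CNT event that writes no VGPR
};

// Toy model of a pending-event bitmask, assuming one bit per event kind.
struct PendingEventsModel {
  uint32_t PendingEvents = 0;

  void setPending(WaitEvent E) { PendingEvents |= (1u << E); }

  bool hasPendingEvent(WaitEvent E) const {
    return (PendingEvents & (1u << E)) != 0;
  }

  // The check as posted in the hunk: either event kind counts.
  bool hasPendingVGPRWaitAsPosted() const {
    return hasPendingEvent(VMEM_READ_ACCESS) || hasPendingEvent(VMEM_ACCESS);
  }

  // The variant suggested in this comment: VMEM_READ_ACCESS only.
  bool hasPendingVGPRWaitSuggested() const {
    return hasPendingEvent(VMEM_READ_ACCESS);
  }
};
```

In this model the two variants disagree only when VMEM_ACCESS is pending and VMEM_READ_ACCESS is not. Whether that state is reachable for LOAD_CNT in the pass is the open question here.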

https://github.com/llvm/llvm-project/pull/169647

