[llvm] [AMDGPU][InsertWaitCnts] Optimize loadcnt insertion at function boundaries (PR #169647)
Pankaj Dwivedi via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 26 06:56:15 PST 2025
================
@@ -715,6 +715,22 @@ class WaitcntBrackets {
PendingEvents |= Context->WaitEventMaskForInst[STORE_CNT];
}
+ // Returns true if any VGPR has a pending load (score > lower bound for T).
+ // This is used to optimize waitcnt insertion at function boundaries when the
+ // only pending LOAD_CNT events are from instructions that don't write to
+ // VGPRs (e.g., GLOBAL_INV).
+ bool hasPendingVGPRWait(InstCounterType T) const {
+ unsigned LB = getScoreLB(T);
+ // If VgprUB is -1, no VGPRs have been touched
+ if (VgprUB < 0)
+ return false;
+ for (int RegNo = 0; RegNo <= VgprUB; ++RegNo) {
----------------
PankajDwivedi-25 wrote:
> By default this would cause SIInsertWaitcnts to pessimistically assume that GLOBAL_INV can complete out-of-order with other instructions that use LOADcnt. But we could either fix that in `counterOutOfOrder`, or ignore it if it does not seem to cause any bad effects in practice.
Updated `counterOutOfOrder()` to handle GLOBAL_INV_ACCESS correctly.
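
For readers following the hunk above, here is a minimal standalone sketch of the idea behind `hasPendingVGPRWait`: a counter can have outstanding events while no VGPR is waiting on it. This is a toy model, not the code from the PR. `ToyBrackets`, `recordVgprLoad`, `recordNonVgprEvent`, and `hasPendingEvents` are hypothetical names invented for illustration, and the model tracks a single counter rather than indexing by `InstCounterType`.

```cpp
#include <vector>

// Toy model of one wait counter (assumption: not LLVM's WaitcntBrackets).
struct ToyBrackets {
  unsigned ScoreLB = 0;             // events with score <= ScoreLB are complete
  unsigned ScoreUB = 0;             // score of the most recently issued event
  int VgprUB = -1;                  // highest VGPR index touched, -1 if none
  std::vector<unsigned> VgprScores; // last pending score recorded per VGPR

  // A load that writes a VGPR: bump the counter and tag the register.
  void recordVgprLoad(int RegNo) {
    ++ScoreUB;
    if (RegNo >= static_cast<int>(VgprScores.size()))
      VgprScores.resize(RegNo + 1, 0);
    VgprScores[RegNo] = ScoreUB;
    if (RegNo > VgprUB)
      VgprUB = RegNo;
  }

  // An event that uses the counter but writes no VGPR (GLOBAL_INV-like).
  void recordNonVgprEvent() { ++ScoreUB; }

  // True if anything at all is still outstanding on the counter.
  bool hasPendingEvents() const { return ScoreUB > ScoreLB; }

  // True only if some VGPR has a score above the lower bound.
  bool hasPendingVGPRWait() const {
    if (VgprUB < 0)
      return false; // no VGPR has been touched
    for (int RegNo = 0; RegNo <= VgprUB; ++RegNo)
      if (VgprScores[RegNo] > ScoreLB)
        return true;
    return false;
  }
};
```

In this model, calling `recordNonVgprEvent()` alone leaves `hasPendingEvents()` true while `hasPendingVGPRWait()` stays false, which is the case the comment in the hunk describes for function boundaries.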
https://github.com/llvm/llvm-project/pull/169647