[llvm] [AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (PR #72830)

Pierre van Houtryve via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 23 00:30:23 PST 2023


Juan Manuel MARTINEZ CAAMAÑO <juamarti at amd.com>,
pvanhout <pierre.vanhoutryve at amd.com>
In-Reply-To: <llvm.org/llvm/llvm-project/pull/72830 at github.com>


================
@@ -45,6 +45,7 @@ define void @back_off_barrier_no_fence(ptr %in, ptr %out) #0 {
 ; GFX11-BACKOFF-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-BACKOFF-NEXT:    flat_load_b32 v0, v[0:1]
 ; GFX11-BACKOFF-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX11-BACKOFF-NEXT:    s_waitcnt_vscnt null, 0x0
----------------
Pierre-vh wrote:

> In this patch I would prefer that setNonKernelFunctionInitialState only sets vscnt to unknown, and leaves the other counters as known to be zero. Does that change affect the tests?

With
```
    setScoreUB(VS_CNT, getWaitCountMax(VS_CNT));
    PendingEvents |= WaitEventMaskForInst[VS_CNT];
```

only `CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll` changes; everything else stays the same.
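
As a rough illustration of what marking only vscnt as unknown means, here is a toy model of the per-counter state. It is a sketch, not the actual pass code: `ToyBrackets`, `setUnknown`, `needsWait`, `applyWaitZero` and `nonKernelEntryState` are hypothetical names, and the counter maximum of 63 is a placeholder rather than a value taken from any subtarget.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, simplified stand-in for the pass's counter kinds.
enum CounterKind { VM_CNT, LGKM_CNT, EXP_CNT, VS_CNT, NUM_COUNTERS };

// Toy scoreboard: a counter may have outstanding operations only if its
// event bit is pending and its upper bound is above its lower bound.
struct ToyBrackets {
  std::array<uint32_t, NUM_COUNTERS> ScoreLB{}; // all zero initially
  std::array<uint32_t, NUM_COUNTERS> ScoreUB{}; // all zero initially
  uint32_t PendingEvents = 0;

  static uint32_t eventMask(CounterKind K) { return 1u << K; }

  // Placeholder maximum, chosen for illustration only.
  static uint32_t waitCountMax(CounterKind) { return 63; }

  // Treat the counter as "could be anything up to its maximum".
  void setUnknown(CounterKind K) {
    ScoreUB[K] = ScoreLB[K] + waitCountMax(K);
    PendingEvents |= eventMask(K);
  }

  // A wait on K is required only if something may still be outstanding.
  bool needsWait(CounterKind K) const {
    return (PendingEvents & eventMask(K)) != 0 && ScoreUB[K] > ScoreLB[K];
  }

  // Model the effect of waiting for the counter to reach zero.
  void applyWaitZero(CounterKind K) {
    ScoreLB[K] = ScoreUB[K];
    PendingEvents &= ~eventMask(K);
  }
};

// Entry state for a non-kernel function under the suggestion quoted above:
// only VS_CNT is unknown, the other counters are known to be zero.
ToyBrackets nonKernelEntryState() {
  ToyBrackets B;
  B.setUnknown(VS_CNT);
  return B;
}
```

In this model a wait on vscnt cannot be dropped until one has actually been applied, while waits on the other counters are redundant from function entry.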

I have no strong preference on whether to finish this patch or the other one first. I just picked it up and I'm still learning about SIInsertWaitcnts myself. That said, if this patch brings many more improvements than regressions, I think we should land it first and add a TODO for the remaining bad cases.

https://github.com/llvm/llvm-project/pull/72830

