[llvm] [AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (PR #72830)
Pierre van Houtryve via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 23 00:30:23 PST 2023
Juan Manuel MARTINEZ CAAMAÑO <juamarti at amd.com>,
pvanhout <pierre.vanhoutryve at amd.com>
In-Reply-To: <llvm.org/llvm/llvm-project/pull/72830 at github.com>
================
@@ -45,6 +45,7 @@ define void @back_off_barrier_no_fence(ptr %in, ptr %out) #0 {
; GFX11-BACKOFF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-BACKOFF-NEXT: flat_load_b32 v0, v[0:1]
; GFX11-BACKOFF-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX11-BACKOFF-NEXT: s_waitcnt_vscnt null, 0x0
----------------
Pierre-vh wrote:
> In this patch I would prefer that setNonKernelFunctionInitialState only sets vscnt to unknown, and leaves the other counters as known to be zero. Does that change affect the tests?
With
```
setScoreUB(VS_CNT, getWaitCountMax(VS_CNT));
PendingEvents |= WaitEventMaskForInst[VS_CNT];
```
only `CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll` changes; everything else stays the same.
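To illustrate why marking only `VS_CNT` as unknown affects so few tests, here is a minimal toy model of the idea. It is a sketch, not the actual `WaitcntBrackets` code: `ToyBrackets`, `setUnknown`, `needsWaitZero`, `applyWaitZero`, `nonKernelInitialState` and the counter maxima are all hypothetical names and values made up for illustration. It only shows that a counter whose upper bound is raised (with its event marked pending) requires a wait, while counters left at zero do not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified model of per-counter wait state.
// The names loosely mirror the snippet above, but this is NOT LLVM's
// WaitcntBrackets class.
enum CounterKind { VM_CNT, LGKM_CNT, EXP_CNT, VS_CNT, NUM_COUNTERS };

struct ToyBrackets {
  uint32_t ScoreLB[NUM_COUNTERS] = {}; // lower bound: everything <= LB is done
  uint32_t ScoreUB[NUM_COUNTERS] = {}; // upper bound: newest possible event
  uint32_t PendingEvents = 0;          // one bit per counter in this toy model

  static uint32_t eventMask(CounterKind K) { return 1u << K; }

  // Assumed maxima, chosen arbitrarily for the sketch.
  static uint32_t waitCountMax(CounterKind K) { return K == VS_CNT ? 63 : 15; }

  // Treat the counter as "unknown": assume the maximum number of
  // outstanding operations and mark its event as pending.
  void setUnknown(CounterKind K) {
    ScoreUB[K] = ScoreLB[K] + waitCountMax(K);
    PendingEvents |= eventMask(K);
  }

  // After a wait-for-zero on K, nothing is outstanding on that counter.
  void applyWaitZero(CounterKind K) {
    ScoreLB[K] = ScoreUB[K];
    PendingEvents &= ~eventMask(K);
  }
};

// A wait-for-zero is only needed if something may still be outstanding.
inline bool needsWaitZero(const ToyBrackets &B, CounterKind K) {
  return B.ScoreUB[K] > B.ScoreLB[K];
}

// Initial state for a non-kernel function in this sketch: only VS_CNT is
// unknown; the other counters are treated as known to be zero.
inline ToyBrackets nonKernelInitialState() {
  ToyBrackets B;
  B.setUnknown(VS_CNT);
  return B;
}
```

In this model, a function entry state built by `nonKernelInitialState()` would force a wait on `VS_CNT` only, which matches the single extra `s_waitcnt_vscnt null, 0x0` line in the hunk above.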
I have no strong preference on whether to finish this patch or the other one first. I just picked this up and I'm still learning about SIInsertWaitcnts myself. That said, if this patch brings many more improvements than regressions, I think we should land it first and add a TODO for the remaining bad cases.
https://github.com/llvm/llvm-project/pull/72830