[llvm] [AMDGPU] Add scheduling DAG mutation for hazard latencies (PR #170075)
Carl Ritson via llvm-commits
llvm-commits at lists.llvm.org
Sun Dec 7 22:48:28 PST 2025
perlfu wrote:
> > Specifically this helps with the case of V_CMP output feeding V_CNDMASK instructions.
>
> Can you explain more and give an example? V_CMP feeding V_CNDMASK is a fast-forward case so it should be fine to schedule them adjacent.
The issue is that in (tight) loops `V_CNDMASK` taints the SGPRs used for the mask, so they require a `S_WAITCNT_DEPCTR`.
e.g.
```
MBB:
...
$sgpr = V_CMP
$vgpr = V_CNDMASK ..., $sgpr
...
S_CBRANCH %MBB
```
So the fast-forward case you mention ends up requiring a VALU pipeline wait/stall.
This is particularly painful if there are multiple `V_CMP` to `V_CNDMASK` in the loop body.
With this mutation the scheduler is biased to perform SGPR writes (from VALUs), schedule other instructions, then schedule SGPR reads.
Typically this minimizes the impact to a single `S_WAITCNT_DEPCTR` per iteration, with some latency hiding.
I could restrict this mutation to loops, although I am not sure whether that analysis is available within the scheduler, as it works per-MBB.
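To make the biasing idea concrete, here is a rough standalone sketch of what raising the latency on VALU SGPR write to VALU SGPR read edges could look like. It is a toy model, not the code in the PR: `Node`, `Dep`, `applyHazardLatency` and `HypotheticalHazardLatency` are hypothetical names, and the latency value is an assumption chosen only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are NOT LLVM's SUnit/SDep.
struct Dep {
  std::size_t Pred;  // index of the producing node in the DAG vector
  unsigned Latency;  // cycles the scheduler assumes between Pred and this node
  bool IsSGPRData;   // true if the dependency carries an SGPR value
};

struct Node {
  std::string Name;
  bool IsVALU;
  std::vector<Dep> Preds;
};

// Assumed value for illustration; a real number would depend on the target.
constexpr unsigned HypotheticalHazardLatency = 4;

// Raise the latency on every SGPR data edge from a VALU writer to a VALU
// reader, so that a latency-aware list scheduler prefers to place independent
// work between the two. Returns the number of edges that were changed.
inline std::size_t applyHazardLatency(std::vector<Node> &DAG) {
  std::size_t Changed = 0;
  for (Node &N : DAG) {
    if (!N.IsVALU)
      continue;
    for (Dep &D : N.Preds) {
      if (!D.IsSGPRData || !DAG[D.Pred].IsVALU)
        continue;
      if (D.Latency < HypotheticalHazardLatency) {
        D.Latency = HypotheticalHazardLatency;
        ++Changed;
      }
    }
  }
  return Changed;
}
```

In this model only the `V_CMP` to `V_CNDMASK` edge would be lengthened; VGPR dependencies and edges from non-VALU producers keep their original latency.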
https://github.com/llvm/llvm-project/pull/170075