[llvm] [AMDGPU] Fix code sequence for barrier start in GFX10+ CU Mode (PR #160501)

Wed Sep 24 08:17:39 PDT 2025

nhaehnle wrote:

> Beyond the scope of this change... It would be nice if we could define where `vm_vsrc(0)` would be sufficient, and be able to apply that as an optimization. My suspicion is that it is sufficient in the majority of graphics scenarios.

I'm not so sure about that, unfortunately. The example you showed offline exposed a problem: wave A has a workgroup-scope release fence, and then wave B has an agent-scope release fence that should push out the data from A as well.

Since we can't do long-distance static analysis to understand what happens in other waves, we can only reduce a workgroup-scope release fence to `vm_vsrc(0)` if we change the code sequence for agent-scope release fences in a way that establishes this guarantee. I'm not convinced that the hardware we have guarantees that. We can follow up offline.

https://github.com/llvm/llvm-project/pull/160501