[llvm] [AMDGPU] Add DS loop wait optimization infrastructure (1/4) (PR #171942)

Fri Dec 12 03:18:23 PST 2025

jayfoad wrote:

I do have some high-level concerns about the whole series:
1. It's highly specific to your use case. I don't see why we can't do the same optimization for _all_ wait types in _all_ loops (or at least all inner loops). Why only DS? Why only loops with lots of WMMA? Etc.
2. It adds a lot of new code that is not integrated into the existing flow. For example we already have FlushVmCnt which does something pretty similar, but the implementation is completely separate.

https://github.com/llvm/llvm-project/pull/171942