[llvm] [AMDGPU] Implement Waitcnt Expansion for Profiling (PR #169345)

Pankaj Dwivedi via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 26 05:28:57 PST 2025


PankajDwivedi-25 wrote:

> Memory operations to different addresses can complete in any order, even if the waitcnt is decremented in a fixed order. I'm not sure how useful this is in practice.

You're correct that if operations complete internally out of order, the last one to complete might not be the slowest. However, the information is still useful: it tells you which group of operations (in program order) is still pending, and it is more granular than a single waitcnt(0), which provides no insight.

The feature is primarily targeted at scenarios with multiple operations of the same type (common in memory-intensive kernels with many loads/stores), where the in-order completion guarantee makes the profiling data actionable.
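
To make the granularity argument concrete, here is a minimal sketch of the idea. It is not the code from the PR. It assumes the expansion replaces one wait with a descending sequence of waits on the same counter, and the helper names `expand_waitcnt` and `oldest_pending_op` are hypothetical. It relies only on the usual meaning of `s_waitcnt vmcnt(N)`: wait until at most N operations of that type are outstanding.

```python
def expand_waitcnt(outstanding, target=0, counter="vmcnt"):
    """Return the waits that would replace a single `s_waitcnt counter(target)`.

    `outstanding` is the number of operations of this type in flight at the
    wait. Hypothetical helper: the descending sequence is an assumption about
    how the expansion looks, not the PR's implementation.
    """
    if outstanding <= target:
        # Nothing to split: the original wait is already satisfied.
        return [f"s_waitcnt {counter}({target})"]
    # One wait per outstanding operation, counting down to the target.
    return [f"s_waitcnt {counter}({n})"
            for n in range(outstanding - 1, target - 1, -1)]


def oldest_pending_op(outstanding, wait_value):
    """1-based program-order index of the operation a stalled wait points at.

    A wave stalled on `counter(wait_value)` still has more than `wait_value`
    operations outstanding, so with a counter that decrements in program
    order the oldest pending one is number `outstanding - wait_value`.
    """
    if not 0 <= wait_value < outstanding:
        raise ValueError("wait_value must be in [0, outstanding)")
    return outstanding - wait_value


if __name__ == "__main__":
    # Three loads in flight, originally followed by s_waitcnt vmcnt(0).
    for wait in expand_waitcnt(3):
        print(wait)
```

With three loads in flight, a profiler sample landing on `s_waitcnt vmcnt(2)` points at the first load and a sample on `s_waitcnt vmcnt(0)` points at the third, whereas the unexpanded `vmcnt(0)` cannot distinguish them. As the quoted comment notes, this identifies the pending position in program order, not necessarily the slowest operation.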

https://github.com/llvm/llvm-project/pull/169345

