[clang] [compiler-rt] [llvm] [PGO][AMDGPU] Add offload profiling with uniformity-aware optimization (PR #177665)
Joseph Huber via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 18 12:52:19 PDT 2026
jhuber6 wrote:
> The uniformity counter tracks how many times a block was executed with all lanes in the wave active. We use it at compile time to decide spill placement: if a block is always uniform (the uniformity count equals the main counter divided by the wave size), we know spills there won't cause cross-lane issues. This is different from average occupancy; a block with 95% average occupancy might still have some partial-wave executions where spill placement matters. So ideally the function would atomically increment the uniformity counter only when the lane mask equals the full-wave mask, something like:
>
> ```c
> if (__gpu_is_first_in_lane(mask) && uniform_counter &&
>     mask == ((__gpu_num_lanes() == 64) ? ~0ULL : 0xFFFFFFFFULL))
>   __scoped_atomic_fetch_add(uniform_counter, step * __builtin_popcountg(mask), ...);
> ```
>
> This can coexist with whatever occupancy tracking you'd like to add.
I see. We can always add it later, but we should do that if it's a use case you explicitly have in mind.
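A host-side sketch of the bookkeeping discussed above may help separate "always uniform" from "high average occupancy". It mirrors the quoted device-side update with plain integers; the GPU-side atomics and first-lane election are left out. Everything here is hypothetical: `block_counters`, `record_execution`, `always_uniform`, `average_occupancy` and the `wave_count` field are illustrative names, not the PR's profile format. The sketch keeps both the main and the uniformity counter in lane units (`step * popcount(mask)`), as the quoted snippet does for the uniformity counter, so the always-uniform test is plain equality. If the uniformity counter instead counted whole-wave executions, the equivalent test would be `uniform == main / wave_size`. It uses `__builtin_popcountll` (GCC and Clang) rather than `__builtin_popcountg` for portability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block counters; names are illustrative only. */
typedef struct {
  uint64_t lane_count;    /* step * popcount(mask) for every execution   */
  uint64_t uniform_count; /* step * popcount(mask) for full waves only    */
  uint64_t wave_count;    /* step per wave execution (for occupancy)      */
} block_counters;

/* Full-wave mask for a 32- or 64-lane wave, as in the quoted snippet. */
uint64_t full_wave_mask(unsigned wave_size) {
  return wave_size == 64 ? ~0ULL : 0xFFFFFFFFULL;
}

/* Host-side stand-in for the device update: one call per wave execution,
   where `mask` holds the active lanes. On the GPU only the first active
   lane would perform these adds, atomically. */
void record_execution(block_counters *c, uint64_t mask, unsigned wave_size,
                      uint64_t step) {
  uint64_t lanes = step * (uint64_t)__builtin_popcountll(mask);
  c->lane_count += lanes;
  c->wave_count += step;
  if (mask == full_wave_mask(wave_size))
    c->uniform_count += lanes;
}

/* True if the block ran at least once and every recorded lane execution
   came from a full wave. Both counters are in lane units, so this is an
   equality check. */
bool always_uniform(const block_counters *c) {
  return c->lane_count > 0 && c->uniform_count == c->lane_count;
}

/* Average occupancy in [0, 1]: active lanes per wave execution divided by
   the wave size. Returns 0 for a block that never ran. */
double average_occupancy(const block_counters *c, unsigned wave_size) {
  if (c->wave_count == 0)
    return 0.0;
  return (double)c->lane_count / ((double)c->wave_count * wave_size);
}
```

For example, with a 64-lane wave, nine full-wave executions followed by one execution with only the low 32 lanes active (`mask == 0xFFFFFFFFULL`) give an average occupancy of 608 / 640 = 0.95, yet `always_uniform` returns false because 32 of the 608 lane executions came from a partial wave.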
https://github.com/llvm/llvm-project/pull/177665