[clang] [compiler-rt] [llvm] [PGO][AMDGPU] Add offload profiling with uniformity-aware optimization (PR #177665)
Yaxun Liu via cfe-commits
cfe-commits at lists.llvm.org
Mon Mar 9 17:06:30 PDT 2026
yxsamliu wrote:
> This is a structural question, but what's stopping us from building an actual library for the GPU portion? This PR seems to code the warp-aggregate increment in-line while it could probably be ~10 lines of portable C code using `gpuintrin.h`. Now that HIP is on the new offloading driver it should be trivial to just link in when we pass this to the linker-wrapper.
>
> Something like this:
>
> ```c
> #include <gpuintrin.h>
> #include "InstrProfiling.h"
>
> void __llvm_profile_instrument_gpu(uint64_t *counter, uint64_t step) {
>   uint64_t mask = __gpu_lane_mask();
>   if (__gpu_is_first_in_lane(mask))
>     __scoped_atomic_fetch_add(counter, step * __builtin_popcountg(mask),
>                               __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
> }
> ```
>
> The global loading / accessing could probably be abstracted further, and I'm also wondering if we shouldn't make the OpenMP handling do this as well.
>
> I could try experimenting with building whichever `InstrProfiling*` files already work with the existing build, if that would help.
I think this is a great idea. Can you open a PR for it? I can rebase my patch on that PR. Thanks.
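
For readers following along, the aggregation in the quoted snippet can be modeled on the host. The sketch below is only an illustration, not code from this PR or from compiler-rt. The function names `per_lane_increment` and `aggregated_increment` are hypothetical, a 64-bit integer stands in for the active-lane mask, and plain additions stand in for the device-scope atomics. It assumes a GCC- or Clang-compatible compiler for `__builtin_popcountll`.

```c
#include <stdint.h>

/* Host-side model: each set bit in `mask` represents one active lane.
 * `num_updates` counts how many counter updates (atomics on a real GPU)
 * would be issued. All names here are hypothetical. */

/* Baseline: every active lane performs its own increment. */
uint64_t per_lane_increment(uint64_t counter, uint64_t mask, uint64_t step,
                            unsigned *num_updates) {
  for (unsigned lane = 0; lane < 64; ++lane) {
    if (mask & (UINT64_C(1) << lane)) {
      counter += step;
      ++*num_updates;
    }
  }
  return counter;
}

/* Aggregated: a single lane adds step * (number of active lanes) once. */
uint64_t aggregated_increment(uint64_t counter, uint64_t mask, uint64_t step,
                              unsigned *num_updates) {
  if (mask == 0)
    return counter; /* no active lanes, nothing to record */
  counter += step * (uint64_t)__builtin_popcountll(mask);
  ++*num_updates;
  return counter;
}
```

Both functions produce the same counter value for a given mask, but the aggregated form issues one update per wavefront instead of one per active lane, which is the reduction in atomic traffic the suggestion is after.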
https://github.com/llvm/llvm-project/pull/177665