[clang] [compiler-rt] [llvm] [PGO][AMDGPU] Add offload profiling with uniformity-aware optimization (PR #177665)
Yaxun Liu via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 9 07:43:44 PDT 2026
================
@@ -354,6 +355,11 @@ static cl::opt<unsigned> MemprofGenerateRandomHotnessSeed(
cl::sub(MergeSubcommand),
cl::desc("Random hotness seed to use (0 to generate new seed)"));
+static cl::opt<unsigned> OffloadDeviceWaveSize(
----------------
yxsamliu wrote:
The wave size is needed during `llvm-profdata merge`, which runs offline on the host and processes raw profile files that were collected from a device. At merge time there is no GPU context or dispatch packet to query; we only have the profile data files. The wave size affects how we interpret the uniformity counter ratios when computing block uniformity bits. A better long-term solution would be to embed the wave size in the raw profile header during collection, so that `llvm-profdata merge` can read it directly without requiring a command-line option. For now, the `--wave-size` flag is a workaround; it defaults to 32 (gfx10/gfx11/gfx12), which covers newer hardware.
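To make the long-term idea concrete, here is a minimal standalone sketch of how the merge step could pick the wave size once the header carries it. This is not code from the PR: `RawOffloadProfileHeader`, its `WaveSize` field, `isValidWaveSize`, and `resolveWaveSize` are hypothetical names, and the sketch assumes that a zero field means "not recorded" and that only wave sizes 32 and 64 are accepted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption for this sketch: only wave32 and wave64 are accepted.
constexpr bool isValidWaveSize(uint32_t WaveSize) {
  return WaveSize == 32 || WaveSize == 64;
}

// Hypothetical header layout, not the actual raw profile format.
// WaveSize == 0 stands for "the collector did not record a wave size".
struct RawOffloadProfileHeader {
  uint32_t WaveSize = 0;
};

// Prefer the wave size recorded at collection time; otherwise fall back to
// the value given on the command line (32 by default, matching --wave-size).
// Returns std::nullopt when the chosen value is not a valid wave size.
std::optional<uint32_t>
resolveWaveSize(const RawOffloadProfileHeader &Header,
                uint32_t CommandLineWaveSize = 32) {
  const uint32_t Candidate =
      Header.WaveSize != 0 ? Header.WaveSize : CommandLineWaveSize;
  if (!isValidWaveSize(Candidate))
    return std::nullopt;
  return Candidate;
}
```

With something along these lines, the flag would only matter for profiles written before the header field existed, and a mismatched or corrupt value would be rejected instead of silently skewing the uniformity bits.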
https://github.com/llvm/llvm-project/pull/177665