[all-commits] [llvm/llvm-project] 7c0a3a: [PGO][HIP] Fix HIP device profile collection and s...
Yaxun (Sam) Liu via All-commits
all-commits at lists.llvm.org
Fri Jun 12 06:24:28 PDT 2026
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 7c0a3a52cf967da9c41d009fb92453b272d0d04a
https://github.com/llvm/llvm-project/commit/7c0a3a52cf967da9c41d009fb92453b272d0d04a
Author: Yaxun (Sam) Liu <yaxun.liu at amd.com>
Date: 2026-06-12 (Fri, 12 Jun 2026)
Changed paths:
M clang/lib/CodeGen/CGCUDANV.cpp
M clang/lib/Driver/ToolChains/Linux.cpp
M clang/lib/Driver/ToolChains/MSVC.cpp
M clang/test/CodeGenHIP/offload-pgo-sections.hip
M clang/test/Driver/hip-profile-rocm-runtime.hip
M compiler-rt/lib/profile/InstrProfilingFile.c
M compiler-rt/lib/profile/InstrProfilingPlatformROCm.cpp
M llvm/include/llvm/ProfileData/InstrProf.h
M llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
Log Message:
-----------
[PGO][HIP] Fix HIP device profile collection and sections emission (#202095)
Several related HIP device-PGO fixes:
Windows device collection. HIP rejects a hipMemcpy that reads past the bounds
of a symbol registered with __hipRegisterVar, but device data/counters/names
live in merged linker sections. Register a separate shadow for each device
data, counters, and names symbol and copy each one by its exact
hipGetSymbolSize size; this also lets static TUs with several kernels keep all
their profile data. Open the device profile file in binary mode and pass the
device names to the correct lprofWriteDataImpl arguments so llvm-profdata can
read the raw profile. Open the versioned amdhip64_7.dll first, falling back to
amdhip64.dll.
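The per-symbol copy described above can be illustrated with a small host-only
model. This is a sketch under stated assumptions, not the runtime's code:
FakeSymbol, copyFromSymbol, and collectPerSymbol are hypothetical names, the
bounds check only stands in for the rejection the message attributes to
hipMemcpy, and the vector's size plays the role of the value hipGetSymbolSize
would report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for one registered device symbol (data, counters,
// or names) and the bytes that belong to it.
struct FakeSymbol {
  std::string Name;
  std::vector<unsigned char> Bytes;
};

// Models a copy that is rejected when it would read past the symbol's
// bounds. Returns true on success.
bool copyFromSymbol(const FakeSymbol &Sym, void *Dst, std::size_t Size) {
  assert(Dst != nullptr || Size == 0);
  if (Size > Sym.Bytes.size())
    return false; // out-of-bounds read: rejected
  if (Size != 0)
    std::memcpy(Dst, Sym.Bytes.data(), Size);
  return true;
}

// Copies every symbol by its own exact size and concatenates the results,
// instead of issuing one copy that spans a merged section.
bool collectPerSymbol(const std::vector<FakeSymbol> &Syms,
                      std::vector<unsigned char> &Out) {
  for (const FakeSymbol &Sym : Syms) {
    std::size_t Size = Sym.Bytes.size(); // models the per-symbol size query
    std::size_t Old = Out.size();
    Out.resize(Old + Size);
    if (!copyFromSymbol(Sym, Out.data() + Old, Size))
      return false;
  }
  return true;
}
```

In this model a single copy sized for the whole merged range fails on the
first symbol, while the per-symbol loop succeeds and keeps every symbol's
bytes.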
Per-TU sections struct. Clang CodeGen emitted the
__llvm_profile_sections_<CUID> struct (and its section start/stop references)
for any profiling-enabled device TU. A TU with no instrumented device functions
then referenced sections nothing populates, so the RDC device link failed under
--no-undefined (and duplicated __llvm_prf_nm before per-CUID naming). Move the
struct emission from CGCUDANV into the InstrProfiling pass, which emits it only
when the TU has profile data; clang emits only the per-TU names-postfix marker,
also making names unique per TU so RDC builds do not clash.
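The "emit only when the TU has profile data" rule can be sketched as a small
decision function. This is an illustration only, not the pass's code:
TUProfileInfo and sectionsStructName are hypothetical names, and the count of
instrumented functions is an assumed proxy for "has profile data". Only the
__llvm_profile_sections_ prefix and the per-CUID suffix come from the message.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of what one device TU contributes to the profile.
struct TUProfileInfo {
  std::string CUID;              // per-TU unique ID used as a name suffix
  unsigned NumInstrumentedFuncs; // device functions that carry counters
};

// Returns the name of the per-TU sections struct, or an empty string when
// the TU has no profile data. Emitting nothing in that case avoids creating
// section start/stop references that no section would satisfy.
std::string sectionsStructName(const TUProfileInfo &TU) {
  assert(!TU.CUID.empty() && "per-TU naming needs a CUID");
  if (TU.NumInstrumentedFuncs == 0)
    return std::string();
  return "__llvm_profile_sections_" + TU.CUID;
}
```

Because the CUID is part of the name, two TUs in the same RDC link get
distinct struct names in this sketch.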
Dynamic-module interceptors. The hipModuleLoad* interceptors live in a
constructor-only object in clang_rt.profile_rocm that nothing references, so
the linker drops it and dynamic-module programs collect no device profile. When
linking clang_rt.profile_rocm, emit a force-link reference (-u on ELF,
-include: on COFF); the constructor self-skips when the program does not use
hipModuleLoad.
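How the force-link reference differs by object format can be sketched as
follows. This is a minimal illustration, not the driver's code: ObjectFormat
and forceLinkArg are hypothetical names, and any symbol passed in is a
placeholder, since the message does not name the symbol the driver references.
Only the -u and -include: spellings are taken from the message.

```cpp
#include <cassert>
#include <string>

// Hypothetical selector for the output object format.
enum class ObjectFormat { ELF, COFF };

// Builds a linker argument that marks Symbol as undefined, which makes the
// linker pull in the archive member that defines it even though no other
// object references that member.
std::string forceLinkArg(ObjectFormat Format, const std::string &Symbol) {
  assert(!Symbol.empty());
  switch (Format) {
  case ObjectFormat::ELF:
    return "-u" + Symbol;
  case ObjectFormat::COFF:
    return "-include:" + Symbol;
  }
  return std::string(); // not reached for valid enum values
}
```

For example, forceLinkArg(ObjectFormat::ELF, "placeholder_anchor") yields
"-uplaceholder_anchor", where placeholder_anchor is an invented name.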
Multi-device profile collection. On Linux, static profile collection used to
try reading profile data from every visible HIP device. This could fault when a
device was visible but had not launched the instrumented kernel. Track HIP
devices that successfully launch kernels, and skip unused devices during static
profile collection. If tracking is not available, keep the old collect-all
behavior.
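The device-selection policy described above can be modeled in a few lines.
This is a sketch under assumptions, not the runtime's implementation:
DeviceTracker, noteLaunch, and devicesToCollect are hypothetical names, and
the Available flag stands in for whatever mechanism determines whether launch
tracking works.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical tracker for devices that successfully launched a kernel.
class DeviceTracker {
public:
  explicit DeviceTracker(bool Available) : Available(Available) {}

  // Record a successful kernel launch on Device (no-op without tracking).
  void noteLaunch(int Device) {
    assert(Device >= 0);
    if (Available)
      Used.insert(Device);
  }

  // Devices to read profile data from, out of NumVisible visible devices.
  // With tracking, only devices that launched a kernel are returned;
  // without it, every visible device is returned (collect-all fallback).
  std::vector<int> devicesToCollect(int NumVisible) const {
    std::vector<int> Result;
    for (int D = 0; D < NumVisible; ++D)
      if (!Available || Used.count(D) != 0)
        Result.push_back(D);
    return Result;
  }

private:
  bool Available;
  std::set<int> Used;
};
```

With four visible devices and a single launch on device 2, the tracked case
collects only from device 2, while the untracked case still visits all four.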
Depends on #201607 (reland HIP offload PGO compiler support and link the
device-profile runtime); that PR must land first.