[clang] [compiler-rt] [llvm] [PGO][AMDGPU] Add offload profiling with uniformity-aware optimization (PR #177665)
Yaxun Liu via cfe-commits
cfe-commits at lists.llvm.org
Mon Mar 9 07:43:43 PDT 2026
================
@@ -303,6 +303,21 @@ HIPAMDToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
   const OptTable &Opts = getDriver().getOpts();
   for (Arg *A : Args) {
+    // Handle device-side profile data file for PGO
+    if (A->getOption().matches(options::OPT_fprofile_use_EQ)) {
+      StringRef ProfileFile = A->getValue();
+      auto [Base, Ext] = ProfileFile.rsplit('.');
+      std::string DeviceProfileFile;
+      if (!Ext.empty()) {
+        DeviceProfileFile = (Base + "." + getTriple().str() + "." + Ext).str();
+      } else {
+        DeviceProfileFile = (ProfileFile + "." + getTriple().str()).str();
+      }
+      DAL->AddJoinedArg(A, Opts.getOption(options::OPT_fprofile_instr_use_EQ),
+                        DeviceProfileFile);
+      A->claim();
+      continue;
+    }
----------------
yxsamliu wrote:
Good point. I've removed the auto-suffix logic from HIPAMD.cpp. HIP now uses the same explicit `-Xarch_host`/`-Xarch_device` approach as OpenMP:
```bash
clang++ -x hip app.hip -o app_optimized \
    -Xarch_host -fprofile-use=host.profdata \
    -Xarch_device -fprofile-use=device.profdata
```
This is more flexible since users control the profile file naming, and it's consistent with OpenMP behavior.
I've also added documentation in `clang/docs/HIPSupport.rst` covering the PGO workflow.
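
For reference, here is a rough standalone sketch of what the removed auto-suffix logic computed, which may help show why explicit naming is simpler. It uses `std::string` instead of `StringRef`, and the helper name `deviceProfileName` is hypothetical (it does not exist in the PR). The triple is passed in as a plain string; `amdgcn-amd-amdhsa` in the examples below is only an illustrative value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the removed hunk: insert the target triple
// before the last '.'-separated component, or append it if there is none.
std::string deviceProfileName(const std::string &ProfileFile,
                              const std::string &Triple) {
  assert(!Triple.empty() && "expected a non-empty target triple");

  // Split at the last '.', as StringRef::rsplit('.') does in the hunk.
  std::size_t Dot = ProfileFile.rfind('.');
  std::string Ext =
      Dot == std::string::npos ? std::string() : ProfileFile.substr(Dot + 1);

  if (!Ext.empty())
    return ProfileFile.substr(0, Dot) + "." + Triple + "." + Ext;

  // No '.' at all, or a trailing '.': append the triple to the full name.
  return ProfileFile + "." + Triple;
}
```

With this sketch, `deviceProfileName("app.profdata", "amdgcn-amd-amdhsa")` gives `app.amdgcn-amd-amdhsa.profdata`. A path whose only dot is in a directory name, such as `dir.v1/profile`, is split at that dot and becomes `dir.amdgcn-amd-amdhsa.v1/profile`, which is the kind of surprise the explicit `-Xarch_device` form avoids.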
https://github.com/llvm/llvm-project/pull/177665