[clang] [compiler-rt] [llvm] [PGO][AMDGPU] Add offload profiling with uniformity-aware optimization (PR #177665)

Sun Feb 1 23:23:22 PST 2026

================
@@ -303,6 +303,21 @@ HIPAMDToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
   const OptTable &Opts = getDriver().getOpts();
 
   for (Arg *A : Args) {
+    // Handle device-side profile data file for PGO
+    if (A->getOption().matches(options::OPT_fprofile_use_EQ)) {
+      StringRef ProfileFile = A->getValue();
+      auto [Base, Ext] = ProfileFile.rsplit('.');
+      std::string DeviceProfileFile;
+      if (!Ext.empty()) {
+        DeviceProfileFile = (Base + "." + getTriple().str() + "." + Ext).str();
+      } else {
+        DeviceProfileFile = (ProfileFile + "." + getTriple().str()).str();
+      }
+      DAL->AddJoinedArg(A, Opts.getOption(options::OPT_fprofile_instr_use_EQ),
+                        DeviceProfileFile);
+      A->claim();
+      continue;
+    }
----------------
EthanLuisMcDonough wrote:

The current PGO offloading infrastructure uses `-Xarch_device` and `-Xarch_host` to supply target-specific profiling flags:

```
clang ... -Xarch_device -fprofile-use=device.profdata -Xarch_host -fprofile-use=host.profdata
```

This is certainly more verbose than automatically deriving the device's profile filename, but it is more explicit and gives the end user greater flexibility. @jhuber6 originally suggested using these flags instead of introducing new flags for GPU PGO (see #94268); perhaps he could weigh in here as well?
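
For comparison, here is a rough standalone sketch of the filename derivation that the hunk above performs implicitly. It uses `std::string` in place of `llvm::StringRef`, and `deriveDeviceProfileName` is a hypothetical helper name used only for illustration; it is not part of the patch or of any LLVM API.

```cpp
#include <string>

// Hypothetical helper (not from the patch): mirrors the hunk's logic of
// splitting on the last '.' and inserting the target triple before the
// extension. Assumes the triple is passed in as a plain string.
std::string deriveDeviceProfileName(const std::string &ProfileFile,
                                    const std::string &Triple) {
  const std::size_t Dot = ProfileFile.rfind('.');
  // No '.' at all, or nothing after the last '.': append the triple,
  // matching the hunk's "Ext is empty" branch.
  if (Dot == std::string::npos || Dot + 1 == ProfileFile.size())
    return ProfileFile + "." + Triple;
  // Otherwise produce <base>.<triple>.<ext>.
  return ProfileFile.substr(0, Dot) + "." + Triple + ProfileFile.substr(Dot);
}
```

Under these assumptions, `app.profdata` with the triple `amdgcn-amd-amdhsa` maps to `app.amdgcn-amd-amdhsa.profdata`. Because the split is on the last `.` in the whole string, a path whose only dot is in a directory name (for example `out.d/app`) would have the triple inserted into the directory component, which is one more case where passing the device file explicitly is less surprising.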

https://github.com/llvm/llvm-project/pull/177665