[clang] Pass -offload-lto instead of -lto for cuda/hip kernels (PR #125243)

Mon Feb 3 07:27:15 PST 2025

================
@@ -498,12 +498,16 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   };
 
   // Forward all of the `--offload-opt` and similar options to the device.
-  CmdArgs.push_back("-flto");
   for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
     CmdArgs.append(
         {"-Xlinker",
          Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue()))});
 
+  if (Triple.isNVPTX() || Triple.isAMDGPU())
+    CmdArgs.push_back("-foffload-lto");
+  else
+    CmdArgs.push_back("-flto");
----------------
jhuber6 wrote:

Yeah, that would be greatly appreciated. I would recommend using `-v` and `-save-temps` to get the files that go into the embedded `clang --target=nvptx64-nvidia-cuda` job.

https://github.com/llvm/llvm-project/pull/125243