[clang] Pass -offload-lto instead of -lto for cuda/hip kernels (PR #125243)

Mon Feb 3 07:14:03 PST 2025

================
@@ -498,12 +498,16 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   };
 
   // Forward all of the `--offload-opt` and similar options to the device.
-  CmdArgs.push_back("-flto");
   for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
     CmdArgs.append(
         {"-Xlinker",
          Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue()))});
 
+  if (Triple.isNVPTX() || Triple.isAMDGPU())
+    CmdArgs.push_back("-foffload-lto");
+  else
+    CmdArgs.push_back("-flto");
----------------
omarahmed1111 wrote:

The unconditional `-flto` was breaking some tests in which one of the pipeline commands used clang to compile LLVM IR to PTX, but the output of that step was just the input LLVM IR (I think that means something has failed). After investigation, `-flto` appeared to be the cause of this breakage.

I don't have a strong opinion on using `-foffload-lto`, but if `-flto` is not needed, we could avoid passing it entirely for CUDA/AMD kernels.
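
To make the two options concrete, here is a minimal standalone sketch of the flag choice. It is not the actual linker-wrapper code: `isGpuTriple`, `ltoFlagInPatch`, and `ltoFlagIfDropped` are hypothetical names, and the triple is matched as a plain string (on the architecture names `nvptx`, `nvptx64`, `amdgcn`, and `r600`) rather than through `llvm::Triple`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when the architecture component of a target
// triple string names an NVPTX or AMDGPU target. This is a string-based
// stand-in for Triple.isNVPTX() || Triple.isAMDGPU() in the hunk above.
bool isGpuTriple(const std::string &triple) {
  assert(!triple.empty() && "expected a non-empty target triple");
  // The architecture is everything before the first '-'; if there is no
  // '-', substr returns the whole string.
  const std::string arch = triple.substr(0, triple.find('-'));
  return arch == "nvptx" || arch == "nvptx64" || arch == "amdgcn" ||
         arch == "r600";
}

// Option 1 (what the hunk above does): GPU targets get -foffload-lto,
// everything else keeps -flto.
std::string ltoFlagInPatch(const std::string &triple) {
  return isGpuTriple(triple) ? "-foffload-lto" : "-flto";
}

// Option 2 (the alternative raised here): GPU targets get no LTO flag at
// all. An empty string means "append nothing to the command line".
std::string ltoFlagIfDropped(const std::string &triple) {
  return isGpuTriple(triple) ? "" : "-flto";
}
```

With either option a host triple such as `x86_64-unknown-linux-gnu` still gets `-flto`; the only difference is whether `nvptx64-nvidia-cuda` or `amdgcn-amd-amdhsa` gets `-foffload-lto` or nothing.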

https://github.com/llvm/llvm-project/pull/125243