[clang] Pass -offload-lto instead of -lto for cuda/hip kernels (PR #125243)

Tue Feb 4 08:07:56 PST 2025

================
@@ -498,12 +498,16 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   };
 
   // Forward all of the `--offload-opt` and similar options to the device.
-  CmdArgs.push_back("-flto");
   for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
     CmdArgs.append(
         {"-Xlinker",
          Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue()))});
 
+  if (Triple.isNVPTX() || Triple.isAMDGPU())
+    CmdArgs.push_back("-foffload-lto");
+  else
+    CmdArgs.push_back("-flto");
----------------
jhuber6 wrote:

Clang 19 is already released and can't be modified. Does this happen with 20 or `main`? Also, this example uses the `ptx_kernel` calling convention, which I think was only introduced after the 19 release. It works with my installation on `main`, so my guess is that you're just using an older version of `clang` or your fork is missing something.

```console
> clang test.ll --target=nvptx64-nvidia-cuda -march=sm_50 -O2 -flto
> llvm-readelf -h a.out
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 33 07 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1
  OS/ABI:                            NVIDIA - CUDA
  ABI Version:                       7
  Type:                              EXEC (Executable file)
  Machine:                           NVIDIA CUDA architecture
  Version:                           0x7E
  Entry point address:               0x0
  Start of program headers:          1888 (bytes into file)
  Start of section headers:          1248 (bytes into file)
  Flags:                             0x320532, sm_50
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         10
  Section header string table index: 1
```
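
For reference, the flag selection in the hunk above can be sketched as a standalone function. This is only an illustration under stated assumptions: `ltoFlagFor` is a hypothetical name, and it matches on the arch component of a triple string instead of calling `Triple.isNVPTX()` / `Triple.isAMDGPU()` the way the patch does.

```cpp
#include <string_view>

// Hypothetical helper, not part of the patch: choose the LTO flag to forward
// based on the arch component (text before the first '-') of a target triple.
constexpr std::string_view ltoFlagFor(std::string_view triple) {
  // If there is no '-', find() returns npos and substr() keeps the whole string.
  std::string_view arch = triple.substr(0, triple.find('-'));

  // Simplified stand-ins for llvm::Triple::isNVPTX() and isAMDGPU().
  bool isNVPTX = arch == "nvptx" || arch == "nvptx64";
  bool isAMDGPU = arch == "amdgcn" || arch == "r600";

  // GPU device targets get -foffload-lto; everything else keeps -flto.
  return (isNVPTX || isAMDGPU) ? "-foffload-lto" : "-flto";
}
```

With this sketch, `ltoFlagFor("nvptx64-nvidia-cuda")` selects `-foffload-lto`, while a host triple such as `x86_64-unknown-linux-gnu` still selects `-flto`.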

https://github.com/llvm/llvm-project/pull/125243