[clang] Pass -offload-lto instead of -lto for cuda/hip kernels (PR #125243)
Joseph Huber via cfe-commits
cfe-commits at lists.llvm.org
Tue Feb 4 08:07:56 PST 2025
================
@@ -498,12 +498,16 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   };
   // Forward all of the `--offload-opt` and similar options to the device.
-  CmdArgs.push_back("-flto");
   for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
     CmdArgs.append(
         {"-Xlinker",
          Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue()))});
+  if (Triple.isNVPTX() || Triple.isAMDGPU())
+    CmdArgs.push_back("-foffload-lto");
+  else
+    CmdArgs.push_back("-flto");
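The flag selection in the hunk above can be sketched in isolation. This is only an illustrative, standalone approximation, not the linker-wrapper code: `pickLtoFlag`, `isNVPTXTriple`, and `isAMDGPUTriple` are hypothetical names, and the prefix checks on the triple string are an assumed stand-in for `Triple.isNVPTX()` and `Triple.isAMDGPU()`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for Triple.isNVPTX(): assumes the architecture
// component of the triple string starts with "nvptx" (nvptx or nvptx64).
constexpr bool isNVPTXTriple(std::string_view Triple) {
  return Triple.substr(0, 5) == "nvptx";
}

// Hypothetical stand-in for Triple.isAMDGPU(): assumes the architecture
// component starts with "amdgcn" or "r600".
constexpr bool isAMDGPUTriple(std::string_view Triple) {
  return Triple.substr(0, 6) == "amdgcn" || Triple.substr(0, 4) == "r600";
}

// Mirrors the branch in the hunk: GPU device targets get -foffload-lto,
// every other target keeps -flto.
constexpr std::string_view pickLtoFlag(std::string_view Triple) {
  return (isNVPTXTriple(Triple) || isAMDGPUTriple(Triple)) ? "-foffload-lto"
                                                           : "-flto";
}

// Small usage example covering one triple per branch.
inline void checkPickLtoFlag() {
  assert(pickLtoFlag("nvptx64-nvidia-cuda") == "-foffload-lto");
  assert(pickLtoFlag("amdgcn-amd-amdhsa") == "-foffload-lto");
  assert(pickLtoFlag("x86_64-unknown-linux-gnu") == "-flto");
}
```

In the actual patch the decision is made on the parsed triple object rather than on a raw string; the string prefixes here only keep the sketch free of LLVM headers.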
----------------
jhuber6 wrote:
Clang 19 has already been released and can't be modified. Does this happen with 20 or `main`? Also, this example uses the `ptx_kernel` calling convention, which I think was only introduced after the 19 release. It works with my installation on `main`, so my guess is that you're using an older version of `clang` or your fork is missing something.
```console
> clang test.ll --target=nvptx64-nvidia-cuda -march=sm_50 -O2 -flto
> llvm-readelf -h a.out
ELF Header:
Magic: 7f 45 4c 46 02 01 01 33 07 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1
OS/ABI: NVIDIA - CUDA
ABI Version: 7
Type: EXEC (Executable file)
Machine: NVIDIA CUDA architecture
Version: 0x7E
Entry point address: 0x0
Start of program headers: 1888 (bytes into file)
Start of section headers: 1248 (bytes into file)
Flags: 0x320532, sm_50
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 10
Section header string table index: 1
```
https://github.com/llvm/llvm-project/pull/125243