[lld] [lldb] [libcxx] [compiler-rt] [clang-tools-extra] [llvm] [libc] [clang] [flang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

Thu Jan 25 13:19:45 PST 2024

jhuber6 wrote:

> > This method of compilation is not like CUDA, so we can't target all the GPUs at the same time.
> 
> Can you clarify for me -- what are you compiling where it's impossible to target multiple GPUs in the binary? I'm confused because Art is understanding that it's not CUDA, but we're modifying the CUDA driver here?

The idea is to compile C / C++ code directly for NVPTX rather than going through an offloading language like CUDA or OpenMP. This is more or less cross-compiling: we pass `--target=nvptx64-nvidia-cuda`, which instructs the compiler to cross-compile the C / C++ for NVPTX. By design, the resulting workflow is very close to compiling a standard executable. This is mostly related to my work on the LLVM C library for GPUs, [which I gave a talk on that goes into more detail](https://www.youtube.com/watch?v=_LLGc48GYHc).

Right now, with the LLVM `libc` infrastructure I can do the following on my AMD GPU.

```
#include <stdio.h>
int main() { puts("Hello World!"); }
```
I can then compile and run it, more or less like this:
```
$ clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc crt1.o
$ amdhsa_loader a.out
Hello World!
```
This works with AMD currently, and I want it to work for NVPTX so I can remove some ugly, annoying code in the `libc` project. This is how I run the unit tests targeting the GPU in that project, and those tests need to run on whatever GPU the user has. I'd rather just use `-march=native` than detect the architecture manually in CMake.
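
For illustration, the manual detection that `-march=native` would replace might look roughly like the sketch below. It assumes the `nvptx-arch` query tool from the LLVM tree is on `PATH` and prints one architecture name per line; `first_gpu_arch` is a hypothetical helper, and the commented NVPTX compile line is an assumption modeled on the AMD command above, not something taken from the patch.

```shell
# Hypothetical helper: read architecture names (one per line) on stdin
# and print the first non-empty one, e.g. "sm_89".
first_gpu_arch() {
  awk 'NF { print $1; exit }'
}

# Assumed usage (needs an NVIDIA GPU and `nvptx-arch` on PATH):
#   arch=$(nvptx-arch | first_gpu_arch)
#   clang hello.c --target=nvptx64-nvidia-cuda -march="$arch" -flto -lc crt1.o
```

With `-march=native` the driver would do this lookup itself, so the build system no longer has to.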

https://github.com/llvm/llvm-project/pull/79373