[flang-commits] [clang] [clang-tools-extra] [lldb] [libc] [libcxx] [lld] [llvm] [flang] [compiler-rt] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

Thu Jan 25 12:51:17 PST 2024

jhuber6 wrote:

> > I think the semantics of native on other architectures are clear enough here.
> 
> I don't think we have the same idea about that. Let's spell it out, so there's no confusion.
> 
> [GCC manual](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-march-16) says:
> 
> > Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines)
> 
> The way I read it, "all instruction subsets supported by the local machine" would be what the all-GPUs strategy would do. The binary is expected to run on all GPU architecture variants available on the machine.
> 
> Granted, gcc was not written with GPUs in mind, but it's a good baseline for establishing existing conventions for the meaning of `-march=native`.

This more or less depends on how you define "local machine" for a system augmented with GPUs. The wording "**the** local machine" assumes that there is only one, which I personally find consistent with just selecting the first GPU found on the system. How we should treat multiple GPUs is ambiguous, but that's what the warning message is for: it informs the user that the "native" architecture is ambiguous and that the first one was selected.
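
To make the "first GPU plus a warning" policy concrete, here is a minimal sketch in Python. It is not the driver code from this PR: `select_native_arch` is a hypothetical helper, the input list is assumed to come from some detection step in device order, and warning only when the architectures differ is my assumption for illustration.

```python
from typing import List, Optional, Tuple


def select_native_arch(detected: List[str]) -> Tuple[str, Optional[str]]:
    """Resolve 'native' to the first detected GPU architecture.

    `detected` lists one architecture per GPU, in device order.
    Returns (arch, warning); warning is None when the choice is unambiguous.
    Hypothetical helper, not the actual clang driver logic.
    """
    if not detected:
        raise ValueError("no GPU detected; cannot resolve 'native'")

    chosen = detected[0]

    # Assumption: only differing architectures make 'native' ambiguous.
    distinct = sorted(set(detected))
    if len(distinct) > 1:
        warning = (
            "multiple GPU architectures found ("
            + ", ".join(distinct)
            + "); using the first one: "
            + chosen
        )
        return chosen, warning
    return chosen, None


if __name__ == "__main__":
    arch, warning = select_native_arch(["sm_70", "sm_89"])
    if warning:
        print("warning:", warning)
    print("selected:", arch)
```

With a single GPU the function returns that architecture and no warning, which matches the usual meaning of `-march=native`; the warning only shows up in the ambiguous case.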

Further, our current default makes sense because it corresponds to device ID zero in CUDA. Unless you change the environment via `CUDA_VISIBLE_DEVICES` or something similar, the result will work on the default device.
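
For reference, a rough model of why device zero is the natural default and how `CUDA_VISIBLE_DEVICES` can change it. This is a simplified simulation, not a call into the CUDA runtime: `visible_devices` is a hypothetical helper, it handles integer indices only (no UUIDs), and it assumes `physical` is already in the runtime's default enumeration order.

```python
from typing import List, Optional


def visible_devices(physical: List[str], cuda_visible_devices: Optional[str]) -> List[str]:
    """Simulate which devices a CUDA application sees, in order.

    `physical` holds one architecture per GPU in default enumeration order.
    `cuda_visible_devices` is the value of the environment variable, or None
    when it is unset. Simplified model: integer indices only.
    """
    if cuda_visible_devices is None:
        return list(physical)

    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        # An invalid entry hides itself and everything listed after it.
        if not token.isdigit() or int(token) >= len(physical):
            break
        visible.append(physical[int(token)])
    return visible


if __name__ == "__main__":
    gpus = ["sm_70", "sm_89"]
    print(visible_devices(gpus, None))   # variable unset: default order
    print(visible_devices(gpus, "1,0"))  # user reorders the devices
```

In this model, a binary built for the first architecture in `physical` runs on device zero unless the user reorders or hides devices through the environment.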

So, when there is one device, the behavior is consistent with `-march=native`. When there are two, we make an implicit decision to target the first GPU and inform the user. This method of compilation is not like CUDA, so we can't target all the GPUs at the same time. It will be useful in cases where we want to write code that simply targets a GPU that will "work". We already have CMake code around LLVM that does this, so it would be nice to get rid of it.

https://github.com/llvm/llvm-project/pull/79373