[clang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

Wed Jan 24 20:53:16 PST 2024

jlebar wrote:

I think I'm with Art on this one.

>> Problem #2 [...] The arch=native will create a working configuration, but would build more than necessary.
>
> It will target the first GPU it finds. We could maybe change the behavior to detect the newest, but the idea is just to target the user's system.

OK, but I think this is worse.

Now it's basically always incorrect to ship a build system which uses arch=native, because the people running the build might very reasonably have multiple GPUs in their system, and which GPU clang picks is unspecified.

But we all know people are going to do it anyway.

Given that this feature cannot correctly be used with a build system, and given that 99.99% of invocations of clang are from a build system that the user running the build did not write, it seems to me that we should not add a feature that is such a footgun when used with a build system.

(A non-CUDA C++ file compiled with -march=native will almost surely run on your computer, whereas a CUDA file compiled this way might not: whether it runs depends on the order in which the NVIDIA driver returns GPUs, which is unpredictable. So there's no good analogy here.)
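To make the order dependence concrete, here is a minimal sketch of a "target the first GPU found" policy. It is a toy model, not clang's detection code: `pick_first` and the two architecture lists are hypothetical, and `sm_61` stands in for whatever second GPU happens to be in the machine.

```python
def pick_first(detected):
    """Toy model of the 'first GPU found' policy (hypothetical helper)."""
    return detected[0]


# Same machine, same two GPUs; only the assumed enumeration order differs.
order_a = ["sm_80", "sm_61"]
order_b = ["sm_61", "sm_80"]

print(pick_first(order_a))  # sm_80
print(pick_first(order_b))  # sm_61
```

The build targets a different architecture in each case even though the hardware is identical, which is why a build system cannot rely on the result.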

If we were going to add this, I think we should compile for all the GPUs in your system, like Art had assumed. I think that's better, but it has other problems: builds get slower, and your graphics GPU is likely less powerful than your compute GPU, so compilation will fail if you're e.g. using tensor cores and one of the targets is a GPU that doesn't have them. So again you can't really use arch=native in a build system, even if you say "requires an sm80 GPU", because really the requirement is "has an sm80 GPU and no others in the machine".
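The "compile for every GPU" variant can be sketched the same way. Again this is only a toy model under stated assumptions: `arch_number` and `builds_for_all` are hypothetical helpers, comparing architecture numbers is a rough stand-in for "has the required feature", and `sm_61` is an assumed display GPU.

```python
def arch_number(arch):
    """Turn a plain name such as 'sm_80' into the integer 80 (hypothetical helper)."""
    return int(arch.split("_")[1])


def builds_for_all(detected, min_required):
    """True only if every detected GPU meets the minimum architecture."""
    needed = arch_number(min_required)
    return all(arch_number(a) >= needed for a in detected)


compute_only = ["sm_80"]
compute_plus_display = ["sm_80", "sm_61"]  # assumed older display GPU

print(builds_for_all(compute_only, "sm_80"))          # True
print(builds_for_all(compute_plus_display, "sm_80"))  # False
```

The second machine does have an sm_80 GPU, yet the build still fails in this model, because the older GPU is compiled for as well.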

https://github.com/llvm/llvm-project/pull/79373