[clang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

Thu Jan 25 10:41:36 PST 2024

Artem-B wrote:

> It's not unspecified per se; it just picks the device the CUDA driver assigned to ID zero, so it will correspond to what a layman gets when using the default device in CUDA.

The default "fastest card first" order is also somewhat flaky. First, the "default" enumeration order is affected by the environment (the `CUDA_DEVICE_ORDER` variable selects ordering either by PCI bus ID or "fastest first"), which adds another external parameter the user may or may not be aware of. Second, the "fastest first" heuristic is known to pick the wrong device. E.g. on my machine the CUDA runtime was picking a puny newer card I used for graphics over a compute card that was two orders of magnitude faster.
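To make the first point concrete, here is a small Python model of how the device that ends up as ID 0 depends on the ordering policy. It is a sketch under stated assumptions: `Gpu`, `current_order`, and `device_zero` are hypothetical names, the two-GPU machine is invented to mirror the anecdote above, and the "fastest first" ranking used here (newest compute capability wins) is a made-up stand-in, not the CUDA runtime's actual heuristic. Only the `CUDA_DEVICE_ORDER` variable and its `PCI_BUS_ID` / `FASTEST_FIRST` values are real.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical description of one installed GPU (illustrative fields only)."""
    name: str
    pci_bus_id: int
    compute_capability: tuple  # (major, minor)
    relative_throughput: float  # hypothetical compute speed, arbitrary units


def current_order():
    """Ordering policy taken from the environment; CUDA defaults to FASTEST_FIRST."""
    return os.environ.get("CUDA_DEVICE_ORDER", "FASTEST_FIRST")


def device_zero(gpus, order):
    """Return the GPU that would be enumerated as ID 0 under the given ordering.

    "PCI_BUS_ID" sorts by PCI bus ID. For "FASTEST_FIRST" this sketch uses a
    HYPOTHETICAL ranking (newest compute capability first); CUDA's real
    heuristic is not modeled. Note the ranking never looks at throughput.
    """
    if order == "PCI_BUS_ID":
        key = lambda g: g.pci_bus_id
    elif order == "FASTEST_FIRST":
        key = lambda g: tuple(-x for x in g.compute_capability)
    else:
        raise ValueError(f"unknown CUDA_DEVICE_ORDER: {order}")
    return sorted(gpus, key=key)[0]


# Hypothetical machine: an older, much faster compute card on a lower PCI bus,
# and a newer but far slower card used for graphics.
machine = [
    Gpu("compute card", pci_bus_id=0x03, compute_capability=(7, 0), relative_throughput=100.0),
    Gpu("graphics card", pci_bus_id=0x65, compute_capability=(8, 6), relative_throughput=1.0),
]

for order in ("PCI_BUS_ID", "FASTEST_FIRST"):
    print(order, "->", device_zero(machine, order).name)
# PCI_BUS_ID -> compute card
# FASTEST_FIRST -> graphics card
```

The same machine yields a different "device 0", and therefore a different target for an ID-0-based `-march=native`, depending solely on an environment variable the user may never have set.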

> I think that it's much less intuitive currently where we'll just have it default to sm_52

That would fall under the "any default choice for GPU will be wrong" rule, with the implication that it's up to the user to explicitly provide the correct set of GPUs to target.

On the other hand, I'd be OK with `--offload-arch=native` translating into "compile for *all* GPU variants present on the machine", with the option to further adjust the selected set with the usual `--no-offload-arch=foo` if the user needs to. This would at least produce code that runs on the machine where it's built, be somewhat consistent, and still be adjustable by the user when the default choice is inevitably wrong.
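The proposed "all present GPUs" semantics amount to a small set computation. The following is a hypothetical Python model, not clang driver code: `sm_name` and `resolve_native_archs` are illustrative names, the detected compute capabilities are hard-coded rather than queried from the CUDA driver, and the `excluded` set stands in for values the user would pass with clang's `--no-offload-arch=` flag.

```python
def sm_name(cc):
    """Map a (major, minor) compute capability to an NVPTX arch name, e.g. (8, 6) -> "sm_86"."""
    major, minor = cc
    return f"sm_{major}{minor}"


def resolve_native_archs(detected_ccs, excluded=()):
    """Hypothetical model of `--offload-arch=native` meaning "every GPU present".

    detected_ccs: compute capabilities of all GPUs found on the build machine.
    excluded: arch names the user removed (stand-in for --no-offload-arch=...).
    Returns de-duplicated arch names sorted by compute capability, so the
    result does not depend on the order in which devices were enumerated.
    """
    excluded = set(excluded)
    archs = []
    for cc in sorted(set(detected_ccs)):  # de-duplicate, then sort numerically
        name = sm_name(cc)
        if name not in excluded:
            archs.append(name)
    return archs


# Three GPUs, two of which share an architecture.
detected = [(8, 6), (7, 0), (8, 6)]
print(resolve_native_archs(detected))                     # ['sm_70', 'sm_86']
print(resolve_native_archs(detected, excluded={"sm_86"}))  # ['sm_70']
```

Sorting the `(major, minor)` tuples rather than the `sm_XY` strings keeps the order numeric, so a hypothetical `sm_100` would not sort ahead of `sm_70`.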

https://github.com/llvm/llvm-project/pull/79373