[clang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

Wed Jan 24 17:35:42 PST 2024

Artem-B wrote:

This option may not work as well as one would hope.

Problem #1 is that it will drastically slow down compilation for some users. NVIDIA GPU drivers are loaded on demand, and the process takes a while (on the order of a second, depending on the kind and number of GPUs). If you build on a headless machine, the drivers get loaded during the GPU probing step and unloaded right after it, for each compilation. This will also affect folks who use AMD GPUs to run graphics but NVIDIA GPUs for compute (my current machine is set up that way). It can be worked around by enabling driver persistence, but there would be no obvious cues telling the user that they need to do so.
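To put a rough number on Problem #1, here is a back-of-the-envelope sketch. It assumes every compiler invocation pays the full driver load cost (no persistence enabled) and takes the "about a second" figure above at face value. The function name, the build size and the job count are made up for illustration, and dividing by the job count is an idealization, since driver loads may not overlap cleanly.

```python
from typing import Tuple


def probing_overhead_seconds(num_compilations: int,
                             probe_seconds: float = 1.0,
                             parallel_jobs: int = 1) -> Tuple[float, float]:
    """Return (summed_seconds, approx_wall_seconds) spent probing GPUs.

    Assumes each compilation reloads the driver from scratch,
    i.e. driver persistence is not enabled.
    """
    if num_compilations < 0 or probe_seconds < 0 or parallel_jobs < 1:
        raise ValueError("invalid input")
    summed = num_compilations * probe_seconds
    # Idealized: probes are spread evenly across the parallel jobs.
    wall = summed / parallel_jobs
    return summed, wall


if __name__ == "__main__":
    # Hypothetical build: 2000 translation units, 16 parallel jobs.
    summed, wall = probing_overhead_seconds(2000, probe_seconds=1.0,
                                            parallel_jobs=16)
    print(f"{summed:.0f} s summed, roughly {wall:.0f} s of wall-clock time")
```

Even under these optimistic assumptions the overhead scales linearly with the number of compilations, which is why it is easy to miss on a small test build and painful on a large one.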

Problem #2 is that it will likely result in unnecessary compilation for a nontrivial subset of users who have separate GPUs dedicated to compute and do not care to compile for the separate GPU they use for graphics. `-march=native` will create a working configuration, but would build more than necessary. Again, the end user would not be aware of that.
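A small sketch of the difference in Problem #2, assuming native detection expands to every distinct architecture found on the machine. The helper names and the example machine (two compute GPUs at compute capability 8.0, one graphics GPU at 6.1) are hypothetical; only the `sm_XY` naming for compute capability X.Y is taken as given.

```python
from typing import Iterable, List


def compute_capability_to_arch(cc: str) -> str:
    """Map a compute capability such as "8.6" to an arch name such as "sm_86"."""
    major, minor = cc.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"


def unique_archs(detected: Iterable[str]) -> List[str]:
    """Every distinct arch on the machine, in detection order (the 'native' view)."""
    result: List[str] = []
    for cc in detected:
        arch = compute_capability_to_arch(cc)
        if arch not in result:
            result.append(arch)
    return result


def archs_to_build(detected: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Only the detected archs the user actually cares about."""
    wanted_set = set(wanted)
    return [arch for arch in unique_archs(detected) if arch in wanted_set]


if __name__ == "__main__":
    gpus = ["8.0", "8.0", "6.1"]  # hypothetical: two compute GPUs, one for graphics
    print("native would target:", unique_archs(gpus))
    print("user wants only:    ", archs_to_build(gpus, ["sm_80"]))
```

With an explicit architecture list the extra `sm_61` build never happens, and the choice is visible in the build configuration rather than implied by whatever hardware was plugged in.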

Problem #3 -- it adds an extra step to the reproducibility/debugging process. If/when someone reports an issue with a compilation done with `-march=native`, we'll inevitably have to start with clarifying questions, beginning with what exactly the hardware configuration was on the machine where the compilation was done.

With my "GPU support dude for a nontrivial number of users" hat on, I personally would really like not to open this can of worms. It's not a very big deal, but my gut is telling me that I will see all three cases once the option makes it into the tribal knowledge (hi, reddit & stack overflow!).

So, in short, the source code changes are OK, but I'm not a huge fan of `-march=native` in principle (both CPU and GPU variants).
If others find it useful, I'm OK with adding the option, but it should probably come with documented caveats so affected users have a chance to find the answer if/when they run into trouble.

https://github.com/llvm/llvm-project/pull/79373