[llvm] [NVPTX] Add errors for incorrect CUDA addrpaces (PR #138706)

Thu May 8 11:16:47 PDT 2025

================
@@ -1399,19 +1399,27 @@ void NVPTXAsmPrinter::emitFunctionParamList(const Function *F, raw_ostream &O) {
       if (PTy) {
         O << "\t.param .u" << PTySizeInBits << " .ptr";
 
+        bool IsCUDA = static_cast<NVPTXTargetMachine &>(TM).getDrvInterface() ==
+                      NVPTX::CUDA;
         switch (PTy->getAddressSpace()) {
         default:
           break;
         case ADDRESS_SPACE_GLOBAL:
           O << " .global";
           break;
         case ADDRESS_SPACE_SHARED:
+          if (IsCUDA)
+            report_fatal_error(".shared ptr kernel args unsupported in CUDA.");
           O << " .shared";
           break;
         case ADDRESS_SPACE_CONST:
+          if (IsCUDA)
+            report_fatal_error(".const ptr kernel args unsupported in CUDA.");
----------------
Artem-B wrote:

> quality of life improvement for developers of non-cuda frontends (openmp/-acc offloading, c++ parallel algorithms, fortran do-concurrent...) that support a variety of backends (cuda ptx, opencl, ...).

Then it makes little sense to apply those error checks to the *CUDA* front-end only. They change nothing for the non-CUDA front-ends, because those are not CUDA; if anything, they should not be using the `-cuda` triple, since they follow some other calling convention. The checks also do nothing for the CUDA front-end itself, because it never generates such pointer arguments. So, what are we fixing here? Real, specific examples would help. So far, it looks like a case of someone misusing the CUDA-specific triple, and now we're fixing corner cases in misuse scenarios instead of properly fixing the way those front-ends interact with LLVM.

> EDIT2: IIRC, this also caused churn for those frontend devs as ptxas was fixed to detect these better, because some of those frontends generate the ptx ahead of time, but 

If the front-ends generate PTX, then this change will also be irrelevant, as it applies to IR only. Whatever the user may have put into inline asm is opaque to us, and if it's just a text blob passed to ptxas, then LLVM does not touch it at all.

So, we're left with use cases where IR uses non-generic kernel pointer arguments. Diagnosing those may have some merit, but it should be clearly documented what we're doing and why. E.g. if we're appealing to the CUDA front-end as the source of those restrictions, then we should ban all non-generic pointers. If it's ptxas that determines which pointer variants are acceptable, I'd like to see PTX documentation saying that. Right now the patch is neither here nor there.
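
To make the inconsistency concrete, here is a minimal standalone sketch contrasting what the patch rejects with what a rule derived from the CUDA front-end would reject. It is not LLVM code: both helper names are hypothetical, and the address-space numbers are assumed to follow the usual NVPTX numbering.

```cpp
// Illustrative sketch only; not part of LLVM or of the patch.
// Assumption: NVPTX numbering (generic = 0, global = 1, shared = 3, const = 4).
enum AddressSpace : unsigned {
  Generic = 0,
  Global = 1,
  Shared = 3,
  Const = 4,
};

// Hypothetical helper: what the patch under review rejects when the driver
// interface is CUDA (only .shared and .const kernel pointer arguments).
constexpr bool rejectedByPatch(AddressSpace AS, bool IsCUDA) {
  return IsCUDA && (AS == Shared || AS == Const);
}

// Hypothetical helper: the stricter rule that would follow if the CUDA
// front-end were the source of the restriction (any non-generic pointer).
constexpr bool rejectedByFrontEndRule(AddressSpace AS, bool IsCUDA) {
  return IsCUDA && AS != Generic;
}

// The two rules disagree on .global; neither affects non-CUDA interfaces.
static_assert(!rejectedByPatch(Global, true), "patch accepts .global");
static_assert(rejectedByFrontEndRule(Global, true), "strict rule rejects .global");
static_assert(!rejectedByPatch(Shared, false), "non-CUDA is unaffected");
```

The two predicates differ only on `.global`, which is the case the patch leaves undiagnosed without explaining why.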


https://github.com/llvm/llvm-project/pull/138706