[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

Victor Lomuller via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 25 07:39:58 PST 2024


================
@@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV,
 bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
   return false;
 }
+
+llvm::Constant *
+NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM,
+                                       llvm::PointerType *PT,
+                                       QualType QT) const {
+  auto &Ctx = CGM.getContext();
+  if (PT->getAddressSpace() != Ctx.getTargetAddressSpace(LangAS::opencl_local))
+    return llvm::ConstantPointerNull::get(PT);
+
+  auto NPT = llvm::PointerType::get(
+      PT->getContext(), Ctx.getTargetAddressSpace(LangAS::opencl_generic));
+  return llvm::ConstantExpr::getAddrSpaceCast(
+      llvm::ConstantPointerNull::get(NPT), PT);
+}
----------------
Naghasan wrote:

Hi @Artem-B 

I'm shimming in at @mmoadeli's request. I advised him on the resolution of his issue.

> I don't quite understand what's going on here.

So it is a similar story as for the AMDGPU backend. `0` as a pointer to shared memory is a valid one and points to the root of the shared memory, so that's means we cannot use this value as `nullptr`. AMDGPU uses -1 (all bits set) for this, but we couldn't find anything equivalent in the CUDA/PTX documentation. After a few investigation, we found out the most stable way to do this is simply by inserting this expression.

Note that `0` as a pointer to the generic address space the expected value for `nullptr`

> Why are we ASC'ing all null pointers to LangAS::opencl_generic ?

The patch isn't doing this, if the pointer type *is* to the cuda shared address space (opencl's local address space) then we do ASC. Otherwise this emits the simple `llvm::ConstantPointerNull`.
We used `LangAS::opencl_generic` as a way to emphasis there is a generic to shared address space cast going on. The other solution here would be to use `LangAS::Default` to retrieve the target address space, but `Default` doesn't sound right to me as you have to know this maps to NVPTX's generic target address space. Either way, we don't have a strong opinion on what to use. But a comment is probably needed regardless.

> Will it work for CUDA (as in the CUDA language)? I think this code should be restricted to apply the ASC only for OpenCL and leave CUDA/HIP with the dafault.

So yes and no. To the `Will it work for CUDA ?` part, yes it will because you actually cannot hit this return. CUDA doesn't expose address spaces so you can't have that nullptr as an address in the cuda shared address space, so the `if` above will always evaluate to true in CUDA.

For the `leave CUDA/HIP with the dafault` part, you could force things and use target address spaces like it is done in the clang headers for CUDA and this change would capture that. However, as explained before, `0` in the address space 3 (NVPTX backend) is a valid address and it is very easy to highlight in SASS.

https://github.com/llvm/llvm-project/pull/78759


More information about the cfe-commits mailing list