[libc-commits] [libc] [libc] Use the NVIDIA device allocator for GPU malloc (PR #124277)

Fri Jan 24 06:38:00 PST 2025

https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/124277

Summary:
This is a blocker on another patch in the OpenMP runtime. The problem is
that NVIDIA does not handle RPC-based allocations well. It cannot
reliably update the MMU while a kernel is running, and an allocation
issued from a separate thread will usually deadlock due to internal use
of TLS.

This patch removes the definitions of `malloc` and `free` for NVPTX.
As a result the symbols are left undefined, which is the cue for the
`nvlink` linker to define them for us. So, as far as `libc` is concerned,
it still implements `malloc`.
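
The guard pattern described above can be sketched on the host as follows.
This is a minimal illustration, not the patch itself: `DEMO_TARGET_IS_NVPTX`,
`demo_malloc`, `demo_free`, and `demo_roundtrip` are hypothetical names, and
`std::malloc`/`std::free` stand in for `gpu::allocate`/`gpu::deallocate`.

```cpp
#include <cstddef>
#include <cstdlib>

// DEMO_TARGET_IS_NVPTX is a hypothetical stand-in for
// LIBC_TARGET_ARCH_IS_NVPTX. Compile with -DDEMO_TARGET_IS_NVPTX to mimic
// the NVPTX configuration, where no in-tree definition is emitted.

namespace demo {

// The declarations are always visible, so callers compile on every target.
void *demo_malloc(std::size_t size);
void demo_free(void *ptr);

#ifndef DEMO_TARGET_IS_NVPTX
// In-tree definitions, used on every target except the guarded one.
// std::malloc and std::free stand in for the real GPU allocator here.
void *demo_malloc(std::size_t size) { return std::malloc(size); }
void demo_free(void *ptr) { std::free(ptr); }
constexpr bool kHasInTreeDefinition = true;
#else
// No definitions are emitted. The symbols stay undefined in this
// translation unit and something else must supply them at link time.
constexpr bool kHasInTreeDefinition = false;
#endif

} // namespace demo

// Allocates a small buffer, writes to it, reads it back, and frees it.
bool demo_roundtrip() {
  int *p = static_cast<int *>(demo::demo_malloc(4 * sizeof(int)));
  if (p == nullptr)
    return false;
  p[3] = 42;
  bool ok = (p[3] == 42);
  demo::demo_free(p);
  return ok;
}
```

With the macro defined, a host build of this sketch would fail to link
unless another object provides `demo_malloc` and `demo_free`. On NVPTX that
role is played by `nvlink`, as described above.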


From 8c15b0b8b2b58d9686e5bf87d0779da28fccfda2 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Fri, 24 Jan 2025 08:32:58 -0600
Subject: [PATCH] [libc] Use the NVIDIA device allocator for GPU malloc

Summary:
This is a blocker on another patch in the OpenMP runtime. The problem is
that NVIDIA does not handle RPC-based allocations well. It cannot
reliably update the MMU while a kernel is running, and an allocation
issued from a separate thread will usually deadlock due to internal use
of TLS.

This patch removes the definitions of `malloc` and `free` for NVPTX.
As a result the symbols are left undefined, which is the cue for the
`nvlink` linker to define them for us. So, as far as `libc` is concerned,
it still implements `malloc`.
---
 libc/src/stdlib/gpu/free.cpp        | 4 ++++
 libc/src/stdlib/gpu/malloc.cpp      | 4 ++++
 libc/test/src/stdlib/CMakeLists.txt | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/libc/src/stdlib/gpu/free.cpp b/libc/src/stdlib/gpu/free.cpp
index 1f0e9ec7359740..6ef9d718315a5c 100644
--- a/libc/src/stdlib/gpu/free.cpp
+++ b/libc/src/stdlib/gpu/free.cpp
@@ -14,6 +14,10 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
+// FIXME: For now we just default to the NVIDIA device allocator which is
+// always available on NVPTX targets. This will be implemented fully later.
+#ifndef LIBC_TARGET_ARCH_IS_NVPTX
 LLVM_LIBC_FUNCTION(void, free, (void *ptr)) { gpu::deallocate(ptr); }
+#endif
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdlib/gpu/malloc.cpp b/libc/src/stdlib/gpu/malloc.cpp
index 54f2d8843996ee..b5909cb9cb4d02 100644
--- a/libc/src/stdlib/gpu/malloc.cpp
+++ b/libc/src/stdlib/gpu/malloc.cpp
@@ -14,8 +14,12 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
+// FIXME: For now we just default to the NVIDIA device allocator which is
+// always available on NVPTX targets. This will be implemented fully later.
+#ifndef LIBC_TARGET_ARCH_IS_NVPTX
 LLVM_LIBC_FUNCTION(void *, malloc, (size_t size)) {
   return gpu::allocate(size);
 }
+#endif
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/stdlib/CMakeLists.txt b/libc/test/src/stdlib/CMakeLists.txt
index 8cc0428632ba39..aba76833be9d41 100644
--- a/libc/test/src/stdlib/CMakeLists.txt
+++ b/libc/test/src/stdlib/CMakeLists.txt
@@ -420,7 +420,8 @@ if(LLVM_LIBC_FULL_BUILD)
   )
 
   # Only baremetal and GPU has an in-tree 'malloc' implementation.
-  if(LIBC_TARGET_OS_IS_BAREMETAL OR LIBC_TARGET_OS_IS_GPU)
+  if((LIBC_TARGET_OS_IS_BAREMETAL OR LIBC_TARGET_OS_IS_GPU) AND
+      NOT LIBC_TARGET_ARCHITECTURE_IS_NVPTX)
     add_libc_test(
       malloc_test
       HERMETIC_TEST_ONLY