[libc-commits] [libc] [libc] Sleep to maximize slab re-use in GPU malloc (PR #143609)

Joseph Huber via libc-commits libc-commits at lists.llvm.org
Tue Jun 10 14:55:36 PDT 2025


https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/143609

Summary:
This patch adds a long sleep after we drop the reference counter to zero.
The hope is that this suspends the thread long enough for another thread
to come along and revive the counter, so the slab gets re-used instead of
being deallocated and then needing a new slab to be created.
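
For context, the pattern the patch targets can be summarized in a small,
self-contained sketch. This is not the allocator's real code: GuardPtr, Slab,
and the RPC deallocation path are elided, the INVALID value and helper names
here are simplified stand-ins, and try_revive() is a hypothetical reader-side
counterpart included only to illustrate why the CAS below can fail.

  #include <atomic>
  #include <cstdint>

  static constexpr uint64_t INVALID = uint64_t(1) << 63;
  static std::atomic<uint64_t> counter{1};

  // Stand-in for the target-specific pause the patch adds (nanosleep.u32 on
  // sm_70+ NVPTX, s_sleep on AMDGPU); a no-op keeps this sketch portable.
  static inline void sleep_extensively() {}

  // Hypothetical reader-side path: bump the count back above zero unless the
  // slab has already been marked INVALID. A successful bump "revives" it.
  bool try_revive() {
    uint64_t old = counter.load(std::memory_order_relaxed);
    while (old != INVALID)
      if (counter.compare_exchange_weak(old, old + 1,
                                        std::memory_order_relaxed))
        return true;
    return false;
  }

  // Release n references. Returns true only if the caller must free the slab.
  bool release(uint64_t n) {
    // Only the thread that observes the count hitting zero may try to
    // invalidate and deallocate the slab.
    if (counter.fetch_sub(n, std::memory_order_relaxed) == n) {
      // The sleep added by this patch: give a concurrent try_revive() a
      // chance to raise the count above zero before we tear the slab down.
      sleep_extensively();

      uint64_t expected = 0;
      // If this CAS fails, someone revived the slab while we slept; skip the
      // deallocation and let the slab be re-used.
      return counter.compare_exchange_strong(expected, INVALID,
                                             std::memory_order_relaxed,
                                             std::memory_order_relaxed);
    }
    return false;
  }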


From d52b03ddf2bbc04fc97ec20a74d961b9e0e19816 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Tue, 10 Jun 2025 16:53:35 -0500
Subject: [PATCH] [libc] Sleep to maximize slab re-use in GPU malloc

Summary:
This patch adds a long sleep after we drop the reference counter to zero.
The hope is that this suspends the thread long enough for another thread
to come along and revive the counter, so the slab gets re-used instead of
being deallocated and then needing a new slab to be created.
---
 libc/src/__support/GPU/allocator.cpp | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/libc/src/__support/GPU/allocator.cpp b/libc/src/__support/GPU/allocator.cpp
index 135ced3df704c..f664ffe180069 100644
--- a/libc/src/__support/GPU/allocator.cpp
+++ b/libc/src/__support/GPU/allocator.cpp
@@ -129,6 +129,16 @@ static inline constexpr T round_up(const T x) {
   return (x + N) & ~(N - 1);
 }
 
+// Sleep for an extended period of time to allow other threads to progress.
+static inline void sleep_extensively() {
+#if defined(LIBC_TARGET_ARCH_IS_NVPTX)
+  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
+    LIBC_INLINE_ASM("nanosleep.u32 8192;" ::: "memory");
+#elif defined(LIBC_TARGET_ARCH_IS_AMDGPU)
+  __builtin_amdgcn_s_sleep(128);
+#endif
+}
+
 } // namespace impl
 
 /// A slab allocator used to hand out identically sized slabs of memory.
@@ -313,6 +323,10 @@ template <typename T> struct GuardPtr {
       // another thread resurrected the counter and we quit, or a parallel read
       // helped us invalidating it. For the latter, claim that flag and return.
       if (counter.fetch_sub(n, cpp::MemoryOrder::RELAXED) == n) {
+        // Yield this thread here to maximize the chance that this CAS fails and
+        // we re-use the slab instead of deallocating it.
+        impl::sleep_extensively();
+
         uint64_t expected = 0;
         if (counter.compare_exchange_strong(expected, INVALID,
                                             cpp::MemoryOrder::RELAXED,
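
For reference on the two primitives used by the new helper: PTX's
nanosleep.u32 requires sm_70 or newer, which is why the NVPTX path is guarded
by the __nvvm_reflect("__CUDA_ARCH") >= 700 check, and it suspends the calling
thread for approximately the requested number of nanoseconds. On AMDGPU,
__builtin_amdgcn_s_sleep lowers to the s_sleep instruction, which idles the
issuing wave for an implementation-defined duration derived from its argument.
Either way the added delay only lengthens the slow path where the last
reference was just dropped, which is exactly where keeping the slab alive can
save a full deallocation and reallocation.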


