[libc-commits] [libc] ce52f9c - [libc] Search empty bits after failed allocation (#149910)

via libc-commits libc-commits at lists.llvm.org
Wed Jul 23 11:19:47 PDT 2025


Author: Joseph Huber
Date: 2025-07-23T13:19:43-05:00
New Revision: ce52f9cdc4c6873d1d5b3a970393d4b6aff12e70

URL: https://github.com/llvm/llvm-project/commit/ce52f9cdc4c6873d1d5b3a970393d4b6aff12e70
DIFF: https://github.com/llvm/llvm-project/commit/ce52f9cdc4c6873d1d5b3a970393d4b6aff12e70.diff

LOG: [libc] Search empty bits after failed allocation (#149910)

Summary:
The scheme we use to find a free bit is a random walk. This works very
well until you start to completely saturate the bitfield. Because
fetch_or returns the previous value, we can search that value and jump
to any known empty bit as our next guess. Since the distribution is
random, this effectively increases our likelihood of finding a match
within two tries by 32x.

This *massively* improves performance when a lot of memory is allocated
without being freed, as it no longer takes a one-in-a-million shot to
fill that last bit. A future change could improve this further by only
*mostly* filling the slab, leaving 1% free at all times.

Added: 
    

Modified: 
    libc/src/__support/GPU/allocator.cpp

Removed: 
    


################################################################################
diff --git a/libc/src/__support/GPU/allocator.cpp b/libc/src/__support/GPU/allocator.cpp
index 14b0d06e664fe..866aea7b69d4e 100644
--- a/libc/src/__support/GPU/allocator.cpp
+++ b/libc/src/__support/GPU/allocator.cpp
@@ -256,12 +256,18 @@ struct Slab {
     // The uniform mask represents which lanes contain a uniform target pointer.
     // We attempt to place these next to each other.
     void *result = nullptr;
+    uint32_t after = ~0u;
+    uint32_t old_index = 0;
     for (uint64_t mask = lane_mask; mask;
          mask = gpu::ballot(lane_mask, !result)) {
       if (result)
         continue;
 
-      uint32_t start = gpu::broadcast_value(lane_mask, impl::xorshift32(state));
+      // We try using any known empty bits from the previous attempt first.
+      uint32_t start = gpu::shuffle(mask, cpp::countr_zero(uniform & mask),
+                                    ~after ? (old_index & ~(BITS_IN_WORD - 1)) +
+                                                 cpp::countr_zero(~after)
+                                           : impl::xorshift32(state));
 
       uint32_t id = impl::lane_count(uniform & mask);
       uint32_t index = (start + id) % usable_bits(chunk_size);
@@ -271,8 +277,9 @@ struct Slab {
       // Get the mask of bits destined for the same slot and coalesce it.
       uint64_t match = uniform & gpu::match_any(mask, slot);
       uint32_t length = cpp::popcount(match);
-      uint32_t bitmask = static_cast<uint32_t>((uint64_t(1) << length) - 1)
-                         << bit;
+      uint32_t bitmask = gpu::shuffle(
+          mask, cpp::countr_zero(match),
+          static_cast<uint32_t>((uint64_t(1) << length) - 1) << bit);
 
       uint32_t before = 0;
       if (gpu::get_lane_id() == static_cast<uint32_t>(cpp::countr_zero(match)))
@@ -283,6 +290,9 @@ struct Slab {
         result = ptr_from_index(index, chunk_size);
       else
         sleep_briefly();
+
+      after = before | bitmask;
+      old_index = index;
     }
 
     cpp::atomic_thread_fence(cpp::MemoryOrder::ACQUIRE);


        


More information about the libc-commits mailing list