[libc-commits] [libc] ce52f9c - [libc] Search empty bits after failed allocation (#149910)
via libc-commits
libc-commits at lists.llvm.org
Wed Jul 23 11:19:47 PDT 2025
Author: Joseph Huber
Date: 2025-07-23T13:19:43-05:00
New Revision: ce52f9cdc4c6873d1d5b3a970393d4b6aff12e70
URL: https://github.com/llvm/llvm-project/commit/ce52f9cdc4c6873d1d5b3a970393d4b6aff12e70
DIFF: https://github.com/llvm/llvm-project/commit/ce52f9cdc4c6873d1d5b3a970393d4b6aff12e70.diff
LOG: [libc] Search empty bits after failed allocation (#149910)
Summary:
The scheme we use to find a free bit is a simple random walk. This works
very well until the bitfield becomes almost completely saturated. Because
fetch_or returns the previous value of the word, we can search that value
for any known empty bits and use one as our next guess (see the sketch
after this summary). This effectively increases our likelihood of finding
a match within two tries by 32x, since the distribution is random.
This *massively* improves performance when a lot of memory is allocated
without being freed, as we no longer need a one-in-a-million shot to fill
that last bit. A follow-up change could improve this further by only
*mostly* filling the slab, keeping 1% of the bits free at all times.
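As a minimal standalone sketch of the idea (not the libc GPU allocator itself): the loop below uses plain std::atomic and a single 32-bit word instead of the GPU primitives in allocator.cpp, and the helper names and xorshift constants are illustrative only. The point is that a failed fetch_or already hands back a snapshot of the whole word, so the next guess can jump straight to a bit that was clear in that snapshot rather than re-rolling at random.

    // Hypothetical sketch: claim one bit in a 32-bit occupancy word.
    #include <atomic>
    #include <bit>
    #include <cstdint>
    #include <cstdio>

    static uint32_t xorshift32(uint32_t &state) {
      state ^= state << 13;
      state ^= state >> 17;
      state ^= state << 5;
      return state;
    }

    // Returns the index of the claimed bit, or -1 if the word was full.
    static int try_allocate_bit(std::atomic<uint32_t> &word, uint32_t &rng) {
      // The first guess is purely random, as in the original random walk.
      uint32_t guess = xorshift32(rng) % 32;
      for (;;) {
        uint32_t before = word.fetch_or(1u << guess, std::memory_order_relaxed);
        if (!(before & (1u << guess)))
          return static_cast<int>(guess); // The bit was clear; we now own it.

        // The failed fetch_or still returned the word's previous contents.
        // If any bit was clear at that moment, target it directly instead of
        // picking another random index.
        if (before == ~0u)
          return -1; // Every slot was occupied when we looked.
        guess = static_cast<uint32_t>(std::countr_zero(static_cast<uint32_t>(~before)));
      }
    }

    int main() {
      std::atomic<uint32_t> word{0};
      uint32_t rng = 0x9e3779b9u;
      for (int i = 0; i < 33; ++i)
        std::printf("claimed bit %d\n", try_allocate_bit(word, rng));
    }

The real patch applies the same idea per wavefront: it stores the observed occupancy in `after` and the previous `old_index`, then shuffles `countr_zero(~after)` across lanes to seed the next attempt.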
Added:
Modified:
libc/src/__support/GPU/allocator.cpp
Removed:
################################################################################
diff --git a/libc/src/__support/GPU/allocator.cpp b/libc/src/__support/GPU/allocator.cpp
index 14b0d06e664fe..866aea7b69d4e 100644
--- a/libc/src/__support/GPU/allocator.cpp
+++ b/libc/src/__support/GPU/allocator.cpp
@@ -256,12 +256,18 @@ struct Slab {
// The uniform mask represents which lanes contain a uniform target pointer.
// We attempt to place these next to each other.
void *result = nullptr;
+ uint32_t after = ~0u;
+ uint32_t old_index = 0;
for (uint64_t mask = lane_mask; mask;
mask = gpu::ballot(lane_mask, !result)) {
if (result)
continue;
- uint32_t start = gpu::broadcast_value(lane_mask, impl::xorshift32(state));
+ // We try using any known empty bits from the previous attempt first.
+ uint32_t start = gpu::shuffle(mask, cpp::countr_zero(uniform & mask),
+ ~after ? (old_index & ~(BITS_IN_WORD - 1)) +
+ cpp::countr_zero(~after)
+ : impl::xorshift32(state));
uint32_t id = impl::lane_count(uniform & mask);
uint32_t index = (start + id) % usable_bits(chunk_size);
@@ -271,8 +277,9 @@ struct Slab {
// Get the mask of bits destined for the same slot and coalesce it.
uint64_t match = uniform & gpu::match_any(mask, slot);
uint32_t length = cpp::popcount(match);
- uint32_t bitmask = static_cast<uint32_t>((uint64_t(1) << length) - 1)
- << bit;
+ uint32_t bitmask = gpu::shuffle(
+ mask, cpp::countr_zero(match),
+ static_cast<uint32_t>((uint64_t(1) << length) - 1) << bit);
uint32_t before = 0;
if (gpu::get_lane_id() == static_cast<uint32_t>(cpp::countr_zero(match)))
@@ -283,6 +290,9 @@ struct Slab {
result = ptr_from_index(index, chunk_size);
else
sleep_briefly();
+
+ after = before | bitmask;
+ old_index = index;
}
cpp::atomic_thread_fence(cpp::MemoryOrder::ACQUIRE);