[compiler-rt] r334410 - [scudo] Improve the scalability of the shared TSD model
Kostya Kortchinsky via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 11 07:50:31 PDT 2018
Author: cryptoad
Date: Mon Jun 11 07:50:31 2018
New Revision: 334410
URL: http://llvm.org/viewvc/llvm-project?rev=334410&view=rev
Log:
[scudo] Improve the scalability of the shared TSD model
Summary:
The shared TSD model in its current form doesn't scale. Here is an example of
rpc2-benchmark (with default parameters, which is threading heavy) on a 72-core
machine (defaulting to a `CompactSizeClassMap` and no Quarantine):
- with tcmalloc: 337K reqs/sec, peak RSS of 338MB;
- with scudo (exclusive): 321K reqs/sec, peak RSS of 637MB;
- with scudo (shared): 241K reqs/sec, peak RSS of 324MB.
This isn't great, since the exclusive model uses a lot of memory, while the
shared model doesn't even come close to being competitive.
This is mostly due to the fact that we are consistently scanning the TSD pool
starting at index 0 for an available TSD, which can result in a lot of failed
lock attempts, and in touching memory that need not be touched.
This CL attempts to make things better in most situations:
- first, use a thread local variable on Linux (instead of pthread APIs) to store
the current TSD in the shared model;
- move the locking boolean out of the TSD: this allows the compiler to use a
register and potentially optimize out a branch instead of reading it from the
TSD every time (we also save a tiny bit of memory per TSD);
- 64-bit atomic operations on 32-bit ARM platforms happen to be expensive: so
store the `Precedence` in a `uptr` instead of a `u64`. We lose some
nanoseconds of precision and we'll wrap around at some point, but the benefit
is worth it;
- change a `CHECK` to a `DCHECK`: this should never happen, but if something is
ever terribly wrong, we'll crash on a near null AV if the TSD happens to be
null;
- based on an idea by dvyukov@, we are implementing a bounded random scan for
an available TSD. This requires computing the coprimes for the number of TSDs,
and attempting to lock up to 4 TSDs in a random order before falling back to
the current one. This is obviously slightly more expensive when we have just
2 TSDs (barely noticeable) but is otherwise beneficial. The `Precedence` still
basically corresponds to the moment of the first contention on a TSD. To seed
our random choice, we use the precedence of the current TSD since it is very
likely to be non-zero (since we are in the slow path after a failed `tryLock`).
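As a rough illustration of the bounded random scan described in the last item,
here is a minimal standalone sketch. It is not the code from this revision: the
names `gcdU32`, `coprimesOf` and `scanOrder` are hypothetical, and the random
value `r` is passed in directly rather than derived from a `Precedence`. It only
shows why a stride that is coprime to the pool size visits distinct slots, so a
scan capped at 4 attempts never probes the same TSD twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Euclid's algorithm, same shape as the loop added to initOnce().
static uint32_t gcdU32(uint32_t a, uint32_t b) {
  while (b != 0) {
    const uint32_t t = a;
    a = b;
    b = t % b;
  }
  return a;
}

// Hypothetical helper: all strides in [1, n] that are coprime to n.
static std::vector<uint32_t> coprimesOf(uint32_t n) {
  assert(n > 0);
  std::vector<uint32_t> result;
  for (uint32_t i = 1; i <= n; i++)
    if (gcdU32(i, n) == 1)
      result.push_back(i);
  return result;
}

// Hypothetical helper: the slot indices that a scan seeded with the random
// value r would probe, limited to maxAttempts probes.
static std::vector<uint32_t> scanOrder(uint32_t n, uint32_t r,
                                       uint32_t maxAttempts) {
  const std::vector<uint32_t> coprimes = coprimesOf(n);
  const uint32_t inc = coprimes[r % coprimes.size()];
  uint32_t index = r % n;
  std::vector<uint32_t> order;
  for (uint32_t i = 0; i < std::min(maxAttempts, n); i++) {
    order.push_back(index);
    // index < n and inc <= n, so one subtraction is enough to wrap around.
    index += inc;
    if (index >= n)
      index -= n;
  }
  return order;
}
```

With 8 slots and r = 13, the stride is 3 and the start index is 5, so
`scanOrder(8, 13, 4)` probes slots 5, 0, 3, 6; lifting the cap to 8 attempts
would cover every slot exactly once (5, 0, 3, 6, 1, 4, 7, 2).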
With those modifications, the benchmark yields:
- with scudo (shared): 330K reqs/sec, peak RSS of 327MB.
So the shared model for this specific situation not only becomes competitive but
outperforms the exclusive model. I experimented with some values greater than 4
for the number of TSDs to attempt to lock and it yielded a decrease in QPS. Just
sticking with the current TSD is also a tad slower. Numbers on platforms with
fewer cores (e.g. Android) remain similar.
Reviewers: alekseyshl, dvyukov, javed.absar
Reviewed By: alekseyshl, dvyukov
Subscribers: srhines, kristof.beyls, delcypher, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D47289
Modified:
compiler-rt/trunk/lib/scudo/scudo_allocator.cpp
compiler-rt/trunk/lib/scudo/scudo_tsd.h
compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.cpp
compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.inc
compiler-rt/trunk/lib/scudo/scudo_tsd_shared.cpp
compiler-rt/trunk/lib/scudo/scudo_tsd_shared.inc
Modified: compiler-rt/trunk/lib/scudo/scudo_allocator.cpp
URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/scudo/scudo_allocator.cpp?rev=334410&r1=334409&r2=334410&view=diff
==============================================================================
--- compiler-rt/trunk/lib/scudo/scudo_allocator.cpp (original)
+++ compiler-rt/trunk/lib/scudo/scudo_allocator.cpp Mon Jun 11 07:50:31 2018
@@ -388,9 +388,11 @@ struct ScudoAllocator {
if (PrimaryAllocator::CanAllocate(AlignedSize, MinAlignment)) {
BackendSize = AlignedSize;
ClassId = SizeClassMap::ClassID(BackendSize);
- ScudoTSD *TSD = getTSDAndLock();
+ bool UnlockRequired;
+ ScudoTSD *TSD = getTSDAndLock(&UnlockRequired);
BackendPtr = BackendAllocator.allocatePrimary(&TSD->Cache, ClassId);
- TSD->unlock();
+ if (UnlockRequired)
+ TSD->unlock();
} else {
BackendSize = NeededSize;
ClassId = 0;
@@ -447,10 +449,12 @@ struct ScudoAllocator {
Chunk::eraseHeader(Ptr);
void *BackendPtr = Chunk::getBackendPtr(Ptr, Header);
if (Header->ClassId) {
- ScudoTSD *TSD = getTSDAndLock();
+ bool UnlockRequired;
+ ScudoTSD *TSD = getTSDAndLock(&UnlockRequired);
getBackendAllocator().deallocatePrimary(&TSD->Cache, BackendPtr,
Header->ClassId);
- TSD->unlock();
+ if (UnlockRequired)
+ TSD->unlock();
} else {
getBackendAllocator().deallocateSecondary(BackendPtr);
}
@@ -464,11 +468,13 @@ struct ScudoAllocator {
UnpackedHeader NewHeader = *Header;
NewHeader.State = ChunkQuarantine;
Chunk::compareExchangeHeader(Ptr, &NewHeader, Header);
- ScudoTSD *TSD = getTSDAndLock();
+ bool UnlockRequired;
+ ScudoTSD *TSD = getTSDAndLock(&UnlockRequired);
AllocatorQuarantine.Put(getQuarantineCache(TSD),
QuarantineCallback(&TSD->Cache), Ptr,
EstimatedSize);
- TSD->unlock();
+ if (UnlockRequired)
+ TSD->unlock();
}
}
@@ -612,8 +618,7 @@ void initScudo() {
Instance.init();
}
-void ScudoTSD::init(bool Shared) {
- UnlockRequired = Shared;
+void ScudoTSD::init() {
getBackendAllocator().initCache(&Cache);
memset(QuarantineCachePlaceHolder, 0, sizeof(QuarantineCachePlaceHolder));
}
Modified: compiler-rt/trunk/lib/scudo/scudo_tsd.h
URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/scudo/scudo_tsd.h?rev=334410&r1=334409&r2=334410&view=diff
==============================================================================
--- compiler-rt/trunk/lib/scudo/scudo_tsd.h (original)
+++ compiler-rt/trunk/lib/scudo/scudo_tsd.h Mon Jun 11 07:50:31 2018
@@ -23,11 +23,11 @@
namespace __scudo {
-struct ALIGNED(64) ScudoTSD {
+struct ALIGNED(SANITIZER_CACHE_LINE_SIZE) ScudoTSD {
AllocatorCache Cache;
uptr QuarantineCachePlaceHolder[4];
- void init(bool Shared);
+ void init();
void commitBack();
INLINE bool tryLock() {
@@ -36,29 +36,23 @@ struct ALIGNED(64) ScudoTSD {
return true;
}
if (atomic_load_relaxed(&Precedence) == 0)
- atomic_store_relaxed(&Precedence, MonotonicNanoTime());
+ atomic_store_relaxed(&Precedence, static_cast<uptr>(
+ MonotonicNanoTime() >> FIRST_32_SECOND_64(16, 0)));
return false;
}
INLINE void lock() {
- Mutex.Lock();
atomic_store_relaxed(&Precedence, 0);
+ Mutex.Lock();
}
- INLINE void unlock() {
- if (!UnlockRequired)
- return;
- Mutex.Unlock();
- }
+ INLINE void unlock() { Mutex.Unlock(); }
- INLINE u64 getPrecedence() {
- return atomic_load_relaxed(&Precedence);
- }
+ INLINE uptr getPrecedence() { return atomic_load_relaxed(&Precedence); }
private:
- bool UnlockRequired;
StaticSpinMutex Mutex;
- atomic_uint64_t Precedence;
+ atomic_uintptr_t Precedence;
};
void initThread(bool MinimalInit);
Modified: compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.cpp
URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.cpp?rev=334410&r1=334409&r2=334410&view=diff
==============================================================================
--- compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.cpp (original)
+++ compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.cpp Mon Jun 11 07:50:31 2018
@@ -50,7 +50,7 @@ static void teardownThread(void *Ptr) {
static void initOnce() {
CHECK_EQ(pthread_key_create(&PThreadKey, teardownThread), 0);
initScudo();
- FallbackTSD.init(/*Shared=*/true);
+ FallbackTSD.init();
}
void initThread(bool MinimalInit) {
@@ -59,7 +59,7 @@ void initThread(bool MinimalInit) {
return;
CHECK_EQ(pthread_setspecific(PThreadKey, reinterpret_cast<void *>(
GetPthreadDestructorIterations())), 0);
- TSD.init(/*Shared=*/false);
+ TSD.init();
ScudoThreadState = ThreadInitialized;
}
Modified: compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.inc
URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.inc?rev=334410&r1=334409&r2=334410&view=diff
==============================================================================
--- compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.inc (original)
+++ compiler-rt/trunk/lib/scudo/scudo_tsd_exclusive.inc Mon Jun 11 07:50:31 2018
@@ -35,11 +35,13 @@ ALWAYS_INLINE void initThreadMaybe(bool
initThread(MinimalInit);
}
-ALWAYS_INLINE ScudoTSD *getTSDAndLock() {
+ALWAYS_INLINE ScudoTSD *getTSDAndLock(bool *UnlockRequired) {
if (UNLIKELY(ScudoThreadState != ThreadInitialized)) {
FallbackTSD.lock();
+ *UnlockRequired = true;
return &FallbackTSD;
}
+ *UnlockRequired = false;
return &TSD;
}
Modified: compiler-rt/trunk/lib/scudo/scudo_tsd_shared.cpp
URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/scudo/scudo_tsd_shared.cpp?rev=334410&r1=334409&r2=334410&view=diff
==============================================================================
--- compiler-rt/trunk/lib/scudo/scudo_tsd_shared.cpp (original)
+++ compiler-rt/trunk/lib/scudo/scudo_tsd_shared.cpp Mon Jun 11 07:50:31 2018
@@ -23,6 +23,13 @@ pthread_key_t PThreadKey;
static atomic_uint32_t CurrentIndex;
static ScudoTSD *TSDs;
static u32 NumberOfTSDs;
+static u32 CoPrimes[SCUDO_SHARED_TSD_POOL_SIZE];
+static u32 NumberOfCoPrimes = 0;
+
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+__attribute__((tls_model("initial-exec")))
+THREADLOCAL ScudoTSD *CurrentTSD;
+#endif
static void initOnce() {
CHECK_EQ(pthread_key_create(&PThreadKey, NULL), 0);
@@ -31,13 +38,21 @@ static void initOnce() {
static_cast<u32>(SCUDO_SHARED_TSD_POOL_SIZE));
TSDs = reinterpret_cast<ScudoTSD *>(
MmapOrDie(sizeof(ScudoTSD) * NumberOfTSDs, "ScudoTSDs"));
- for (u32 i = 0; i < NumberOfTSDs; i++)
- TSDs[i].init(/*Shared=*/true);
+ for (u32 I = 0; I < NumberOfTSDs; I++) {
+ TSDs[I].init();
+ u32 A = I + 1;
+ u32 B = NumberOfTSDs;
+ while (B != 0) { const u32 T = A; A = B; B = T % B; }
+ if (A == 1)
+ CoPrimes[NumberOfCoPrimes++] = I + 1;
+ }
}
ALWAYS_INLINE void setCurrentTSD(ScudoTSD *TSD) {
#if SANITIZER_ANDROID
*get_android_tls_ptr() = reinterpret_cast<uptr>(TSD);
+#elif SANITIZER_LINUX
+ CurrentTSD = TSD;
#else
CHECK_EQ(pthread_setspecific(PThreadKey, reinterpret_cast<void *>(TSD)), 0);
#endif // SANITIZER_ANDROID
@@ -50,34 +65,42 @@ void initThread(bool MinimalInit) {
setCurrentTSD(&TSDs[Index % NumberOfTSDs]);
}
-ScudoTSD *getTSDAndLockSlow() {
- ScudoTSD *TSD;
+ScudoTSD *getTSDAndLockSlow(ScudoTSD *TSD) {
if (NumberOfTSDs > 1) {
- // Go through all the contexts and find the first unlocked one.
- for (u32 i = 0; i < NumberOfTSDs; i++) {
- TSD = &TSDs[i];
- if (TSD->tryLock()) {
- setCurrentTSD(TSD);
- return TSD;
+ // Use the Precedence of the current TSD as our random seed. Since we are in
+ // the slow path, it means that tryLock failed, and as a result it's very
+ // likely that said Precedence is non-zero.
+ u32 RandState = static_cast<u32>(TSD->getPrecedence());
+ const u32 R = Rand(&RandState);
+ const u32 Inc = CoPrimes[R % NumberOfCoPrimes];
+ u32 Index = R % NumberOfTSDs;
+ uptr LowestPrecedence = UINTPTR_MAX;
+ ScudoTSD *CandidateTSD = nullptr;
+ // Go randomly through at most 4 contexts and find a candidate.
+ for (u32 I = 0; I < Min(4U, NumberOfTSDs); I++) {
+ if (TSDs[Index].tryLock()) {
+ setCurrentTSD(&TSDs[Index]);
+ return &TSDs[Index];
}
- }
- // No luck, find the one with the lowest Precedence, and slow lock it.
- u64 LowestPrecedence = UINT64_MAX;
- for (u32 i = 0; i < NumberOfTSDs; i++) {
- u64 Precedence = TSDs[i].getPrecedence();
- if (Precedence && Precedence < LowestPrecedence) {
- TSD = &TSDs[i];
+ const uptr Precedence = TSDs[Index].getPrecedence();
+ // A 0 precedence here means another thread just locked this TSD.
+ if (UNLIKELY(Precedence == 0))
+ continue;
+ if (Precedence < LowestPrecedence) {
+ CandidateTSD = &TSDs[Index];
LowestPrecedence = Precedence;
}
+ Index += Inc;
+ if (Index >= NumberOfTSDs)
+ Index -= NumberOfTSDs;
}
- if (LIKELY(LowestPrecedence != UINT64_MAX)) {
- TSD->lock();
- setCurrentTSD(TSD);
- return TSD;
+ if (CandidateTSD) {
+ CandidateTSD->lock();
+ setCurrentTSD(CandidateTSD);
+ return CandidateTSD;
}
}
// Last resort, stick with the current one.
- TSD = getCurrentTSD();
TSD->lock();
return TSD;
}
Modified: compiler-rt/trunk/lib/scudo/scudo_tsd_shared.inc
URL: http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/scudo/scudo_tsd_shared.inc?rev=334410&r1=334409&r2=334410&view=diff
==============================================================================
--- compiler-rt/trunk/lib/scudo/scudo_tsd_shared.inc (original)
+++ compiler-rt/trunk/lib/scudo/scudo_tsd_shared.inc Mon Jun 11 07:50:31 2018
@@ -19,9 +19,16 @@
extern pthread_key_t PThreadKey;
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+__attribute__((tls_model("initial-exec")))
+extern THREADLOCAL ScudoTSD *CurrentTSD;
+#endif
+
ALWAYS_INLINE ScudoTSD* getCurrentTSD() {
#if SANITIZER_ANDROID
return reinterpret_cast<ScudoTSD *>(*get_android_tls_ptr());
+#elif SANITIZER_LINUX
+ return CurrentTSD;
#else
return reinterpret_cast<ScudoTSD *>(pthread_getspecific(PThreadKey));
#endif // SANITIZER_ANDROID
@@ -33,16 +40,17 @@ ALWAYS_INLINE void initThreadMaybe(bool
initThread(MinimalInit);
}
-ScudoTSD *getTSDAndLockSlow();
+ScudoTSD *getTSDAndLockSlow(ScudoTSD *TSD);
-ALWAYS_INLINE ScudoTSD *getTSDAndLock() {
+ALWAYS_INLINE ScudoTSD *getTSDAndLock(bool *UnlockRequired) {
ScudoTSD *TSD = getCurrentTSD();
- CHECK(TSD && "No TSD associated with the current thread!");
+ DCHECK(TSD && "No TSD associated with the current thread!");
+ *UnlockRequired = true;
// Try to lock the currently associated context.
if (TSD->tryLock())
return TSD;
// If it failed, go the slow path.
- return getTSDAndLockSlow();
+ return getTSDAndLockSlow(TSD);
}
#endif // !SCUDO_TSD_EXCLUSIVE