[PATCH] D47289: [scudo] Improve the scalability of the shared TSD model
Dmitry Vyukov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat May 26 01:28:09 PDT 2018
dvyukov added inline comments.
================
Comment at: lib/scudo/scudo_tsd_shared.cpp:71
if (NumberOfTSDs > 1) {
- // Go through all the contexts and find the first unlocked one.
- for (u32 i = 0; i < NumberOfTSDs; i++) {
- TSD = &TSDs[i];
- if (TSD->tryLock()) {
- setCurrentTSD(TSD);
- return TSD;
+ u32 RandState = static_cast<u32>(TSD->getPrecedence());
+ const u32 R = Rand(&RandState);
----------------
The rationale from the change description for why precedence is used as the rand state should be duplicated as a comment here; without it, this is confusing.
================
Comment at: lib/scudo/scudo_tsd_shared.cpp:73
+ const u32 R = Rand(&RandState);
+ const u32 Inc = CoPrimes[R % NumberOfCoPrimes];
+ u32 Index = R % NumberOfTSDs;
----------------
Do you see getTSDAndLockSlow in profiles? There are two divisions here, and divisions are expensive. If they show up in profiles, we could replace them with a multiplication and a shift.
================
Comment at: lib/scudo/scudo_tsd_shared.cpp:78
+ // Go randomly through at most 4 contexts and find a candidate.
+ for (u32 I = 0; I < Min(4U, NumberOfTSDs); I++) {
+ if (&TSDs[Index] == TSD)
----------------
Have you tried dropping precedence entirely and just going over all TSDs pseudo-randomly once (or maybe twice), tryLock'ing each of them? If all tryLock's fail, we could then lock the old one, or maybe a random one.
That would shave a few cycles off the malloc/free fast paths. The benefit of the precedence scheme is still unproven, right?
Also, if we have a hundred cores/caches, giving up after just 4 can lead to unnecessary blocking (there could still be dozens of available caches).
Repository:
rCRT Compiler Runtime
https://reviews.llvm.org/D47289