[PATCH] D47289: [scudo] Improve the scalability of the shared TSD model
Kostya Kortchinsky via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 7 11:16:02 PDT 2018
cryptoad added a comment.
Here are some answers to Dmitry's requests:
- Regarding `getTSDAndLockSlow` and the division: pprof shows no significant time spent in the function outside of the `tryLock` & `lock` so I think we are good here;
- Regarding the precedence: I tested a version where I dropped it entirely, and the results are mixed:
  - For Android's "improved" `memory_replay`: it is faster in all cases, but we only have 2 caches for that specific platform (due to memory constraints compared to the default allocator);
  - For `rpc2-benchmark`: mostly similar numbers;
  - For `t-test1`: the version with precedence shows better performance in almost all situations; this benchmark also demonstrates a slowdown as the number of TSDs scanned in the slow path grows, e.g. scanning 4 and slow locking if they all failed to `tryLock` performs better overall than scanning 32. And this can be a significant slowdown: for example with `t-test1 800 40 800000 100000`, it's 900s spent in allocation functions vs 1150s. The caveat here is that this benchmark only does {de}allocations (& memset) and as such isn't very representative of "real" programs, but it does exercise the most contention on the caches.
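For readers following along, here is a minimal sketch of the slow path being compared above. It is not the code from the CL: the `Tsd` type, the `ScanLimit` parameter, the linear walk, and the reading of "precedence" as a timestamp recorded on the first failed `tryLock` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a shared TSD; not the actual scudo type.
struct Tsd {
  std::atomic<bool> Locked{false};
  // Assumption: 0 means "no contention seen"; otherwise the time (in ns)
  // at which a tryLock first failed on this TSD.
  std::atomic<std::uint64_t> Precedence{0};

  static std::uint64_t nowNanos() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
            .count());
  }

  bool tryLock() {
    if (!Locked.exchange(true, std::memory_order_acquire)) {
      Precedence.store(0, std::memory_order_relaxed);
      return true;
    }
    // Remember when contention started, but only the first time.
    if (Precedence.load(std::memory_order_relaxed) == 0)
      Precedence.store(nowNanos(), std::memory_order_relaxed);
    return false;
  }

  void lock() {
    // Simple spin with yield; a real allocator would use a proper mutex.
    while (Locked.exchange(true, std::memory_order_acquire))
      std::this_thread::yield();
    Precedence.store(0, std::memory_order_relaxed);
  }

  void unlock() { Locked.store(false, std::memory_order_release); }
};

// Scan up to ScanLimit TSDs starting at Start. Return the first one whose
// tryLock succeeds; if all fail, block on the scanned TSD with the lowest
// non-zero precedence (i.e. the one contended for the longest).
template <std::size_t N>
Tsd *getTsdAndLockSlow(std::array<Tsd, N> &Pool, std::size_t Start,
                       std::size_t ScanLimit) {
  static_assert(N > 0, "pool must not be empty");
  assert(ScanLimit > 0);
  Tsd *Candidate = nullptr;
  std::uint64_t Lowest = UINT64_MAX;
  const std::size_t Count = std::min(ScanLimit, N);
  for (std::size_t I = 0; I < Count; ++I) {
    Tsd &T = Pool[(Start + I) % N];
    if (T.tryLock())
      return &T;
    const std::uint64_t P = T.Precedence.load(std::memory_order_relaxed);
    if (P != 0 && P < Lowest) {
      Candidate = &T;
      Lowest = P;
    }
  }
  if (!Candidate)
    Candidate = &Pool[Start % N];
  Candidate->lock();  // Slow lock: blocks until the TSD is released.
  return Candidate;
}
```

In this sketch, the "scanning 4 vs scanning 32" comparison corresponds to varying `ScanLimit`, and the "without precedence" variant would skip the candidate tracking and always block on a fixed TSD.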
I can't seem to get a definitive answer overall, as both variants (with and without precedence) win in some situations and lose in others.
The only sure thing so far is that both are better than the current version.
I am open to suggestions or potential improvements; otherwise I'd keep the current version of the CL (and will address the review comments).
Repository:
rCRT Compiler Runtime
https://reviews.llvm.org/D47289