[PATCH] D47289: [scudo] Improve the scalability of the shared TSD model

Thu Jun 7 14:41:43 PDT 2018

cryptoad added a comment.

Here are more detailed numbers for t-test1.

The machine has 72 cores. We are using the shared TSD version with 32 caches (to exercise some contention).
The numbers are the total time spent in allocation functions only (averaged over 3 consecutive runs, then rounded), with 40, then 80 concurrent threads:

- current upstream: 960s, 3315s
- with precedence, max 4 caches scanned, lock current: 810s, 3200s (the currently proposed CL)
- with precedence, max 4 caches scanned, lock random:  815s, 3125s
- with precedence, all caches scanned, lock current:   880s, 3940s
- with precedence, all caches scanned, lock random:    890s, 3755s
- no precedence, max 4 caches scanned, lock current:   900s, 3365s
- no precedence, max 4 caches scanned, lock random:    840s, 3300s
- no precedence, all caches scanned, lock current:    1025s, 3600s
- no precedence, all caches scanned, lock random:      890s, 3785s
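
To make the list above easier to compare, the relative change of each variant against current upstream can be computed with a small helper. This is only an illustrative sketch: `Row`, `Rows`, `percentChange` and `printChanges` are hypothetical names, and the values are copied from the list above.

```cpp
#include <cassert>
#include <cstdio>

// One measured configuration: total time in seconds with 40 and 80 threads.
struct Row {
  const char *Name;
  double T40;
  double T80;
};

// Values copied from the list above; the first row is the baseline.
static const Row Rows[] = {
    {"current upstream", 960, 3315},
    {"precedence, max 4, lock current", 810, 3200},
    {"precedence, max 4, lock random", 815, 3125},
    {"precedence, all, lock current", 880, 3940},
    {"precedence, all, lock random", 890, 3755},
    {"no precedence, max 4, lock current", 900, 3365},
    {"no precedence, max 4, lock random", 840, 3300},
    {"no precedence, all, lock current", 1025, 3600},
    {"no precedence, all, lock random", 890, 3785},
};

// Percentage change of Value relative to Baseline (negative means faster).
double percentChange(double Baseline, double Value) {
  assert(Baseline > 0.0);
  return (Value - Baseline) / Baseline * 100.0;
}

// Prints one line per configuration with the change against upstream.
void printChanges() {
  const Row &Base = Rows[0];
  for (const Row &R : Rows)
    std::printf("%-36s %+6.1f%% %+6.1f%%\n", R.Name,
                percentChange(Base.T40, R.T40),
                percentChange(Base.T80, R.T80));
}
```

Calling `printChanges()` gives, for the currently proposed CL, roughly -15.6% with 40 threads and -3.5% with 80 threads.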

Locking a random cache seems to be beneficial under heavier contention, but not necessarily under lighter contention.
I am more interested in striking a middle ground than in optimizing for heavily contended applications. From that angle, it looks like the precedence matters, as does limiting the scan to 4 caches rather than scanning all of them.
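
For readers unfamiliar with the slow path being tuned, here is a minimal sketch of the variants compared above. It is not the code from the CL: `Cache`, `Pick`, `scanCaches` and `lockCacheSlow` are hypothetical names. It assumes that precedence is a per-cache value where 0 means no recorded contention and the lowest non-zero value marks the preferred cache to block on. `Fallback` stands for either the thread's current cache ("lock current") or a randomly chosen one ("lock random").

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical stand-in for one shared cache (not the actual scudo type).
struct Cache {
  std::atomic<bool> Busy{false};
  // Assumption: 0 means "no recorded contention"; among non-zero values,
  // the lowest one marks the preferred cache to block on.
  std::atomic<std::uint64_t> Precedence{0};

  bool tryLock() { return !Busy.exchange(true, std::memory_order_acquire); }
  void lock() {
    while (!tryLock())
      std::this_thread::yield();
  }
  void unlock() { Busy.store(false, std::memory_order_release); }
};

struct Pick {
  std::size_t Index; // cache selected by the scan
  bool Locked;       // true if the scan already acquired it
};

// Scans at most MaxScanned caches starting at Start. Returns the first
// cache that could be locked without blocking; otherwise returns the
// scanned cache with the lowest non-zero precedence (when UsePrecedence
// is set), or Fallback if no such cache was seen.
Pick scanCaches(std::vector<Cache> &Caches, std::size_t Start,
                std::size_t Fallback, std::size_t MaxScanned,
                bool UsePrecedence) {
  assert(!Caches.empty() && Fallback < Caches.size());
  const std::size_t N = Caches.size();
  const std::size_t Limit = std::min(MaxScanned, N);
  std::size_t Best = Fallback;
  std::uint64_t Lowest = std::numeric_limits<std::uint64_t>::max();
  for (std::size_t I = 0; I < Limit; ++I) {
    const std::size_t Index = (Start + I) % N;
    if (Caches[Index].tryLock())
      return {Index, true};
    if (!UsePrecedence)
      continue;
    const std::uint64_t P =
        Caches[Index].Precedence.load(std::memory_order_relaxed);
    if (P != 0 && P < Lowest) {
      Lowest = P;
      Best = Index;
    }
  }
  return {Best, false};
}

// Slow path: scan first, then block on the selected cache if needed.
std::size_t lockCacheSlow(std::vector<Cache> &Caches, std::size_t Start,
                          std::size_t Fallback, std::size_t MaxScanned,
                          bool UsePrecedence) {
  const Pick P = scanCaches(Caches, Start, Fallback, MaxScanned, UsePrecedence);
  if (!P.Locked)
    Caches[P.Index].lock();
  return P.Index;
}
```

In this sketch, "max 4 caches scanned" corresponds to `MaxScanned = 4` and "all caches scanned" to `MaxScanned = Caches.size()`. Splitting the scan from the blocking lock is only a convenience for exercising the selection logic on its own.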

Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D47289