[PATCH] D147493: [ELF] Cap parallel::strategy to 8 cores when --threads= is unspecified

Andrew Ng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 4 10:01:45 PDT 2023


andrewng added a comment.

These are the link times for a UE4-based link, using an LLVM 16 `libc++` self-hosted release build of LLD with the `rpmalloc` allocator, on Windows 10 with an AMD Ryzen 3900X (12C/24T), a Samsung 970 EVO NVMe SSD and 64GB RAM:

"Regular" link:

| **Threads** | **Time** | **% gain vs -j4** |
| ----------- | -------- | ----------------- |
| -j4         | 3.508    | -                 |
| -j8         | 2.761    | 21.29             |
| -j12        | 2.647    | 24.53             |
| -j16        | 2.602    | 25.82             |
| -j20        | 2.630    | 25.04             |
| -j24        | 2.663    | 24.08             |

"GC + ICF" link:

| **Threads** | **Time** | **% gain vs -j4** |
| ----------- | -------- | ----------------- |
| -j4         | 4.033    | -                 |
| -j8         | 3.273    | 18.84             |
| -j12        | 3.128    | 22.45             |
| -j16        | 3.067    | 23.96             |
| -j20        | 3.092    | 23.33             |
| -j24        | 3.121    | 22.63             |
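Timings like these can be gathered simply by re-running the same link at each thread count and, for example, taking the best wall-clock time of a few runs; the "% gain" column is relative to the `-j4` baseline, e.g. (3.508 − 2.761) / 3.508 ≈ 21.29%. A minimal stand-alone sketch of such a driver is below; the `ld.lld @link.rsp` command and response file are placeholders, not the actual UE4 link used for the tables above:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>

int main() {
  // Placeholder link command; substitute the real response file and output.
  const std::string cmd = "ld.lld @link.rsp -o a.out --threads=";
  const int runs = 3; // best-of-3 to reduce run-to-run noise
  for (int threads : {4, 8, 12, 16, 20, 24}) {
    double best = 1e30;
    for (int i = 0; i < runs; ++i) {
      auto start = std::chrono::steady_clock::now();
      if (std::system((cmd + std::to_string(threads)).c_str()) != 0)
        return 1; // bail out if the link itself fails
      std::chrono::duration<double> elapsed =
          std::chrono::steady_clock::now() - start;
      best = std::min(best, elapsed.count());
    }
    std::printf("-j%-2d  %.3f\n", threads, best);
  }
  return 0;
}
```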

So there is improvement all the way up to `-j16`, although the additional benefit of `-j16` over `-j12` is quite small, whereas the gain of `-j12` over `-j8` is not insignificant. This example therefore already shows that capping the thread count at 8 could be detrimental to performance, even if the absolute difference might not be considered significant. It also shows that `-j20` and above reduce performance. These results are also likely to vary depending on the toolchain, runtime and allocator in use, as well as on the test system itself.

I was rather hoping that the "GC + ICF" link would show better scaling, but it's quite similar (perhaps there is some scope for improvement there?). At the end of the day, it's tricky to make threading scale well, and tricky to determine the right balance too, particularly when there are so many factors that can affect the outcome. However, it's also very clear that too many threads can be very bad. I think the key thing is to give the user easy and sufficient control so that they can get "good" performance. What the default behaviour should be is less obvious, although from the evidence so far some form of "cap" would make sense.
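For illustration only, the kind of default cap under discussion might look roughly like the sketch below. It uses `llvm::parallel::strategy` and `llvm::hardware_concurrency()` from LLVM's Support library, but the helper itself, the cap value of 8 and the option handling around it are assumptions for this sketch, not the code in this patch:

```cpp
#include "llvm/Support/Parallel.h"
#include "llvm/Support/Threading.h"

#include <algorithm>

// Hypothetical helper, not the actual patch: choose the concurrency used by
// the linker's parallel algorithms.
static void setLinkConcurrency(bool userPassedThreads, unsigned requested) {
  constexpr unsigned defaultCap = 8; // assumed cap under discussion

  if (userPassedThreads) {
    // Explicit --threads=N: honour the request exactly, however large.
    llvm::parallel::strategy = llvm::hardware_concurrency(requested);
    return;
  }
  // No --threads=: use every hardware thread only up to the cap, so very wide
  // machines (e.g. the 24T system above) do not default to too many threads.
  unsigned all = llvm::hardware_concurrency().compute_thread_count();
  llvm::parallel::strategy =
      llvm::hardware_concurrency(std::min(all, defaultCap));
}
```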

> do you have any way to profile what's going on in your case?

I could, but I suspect it's just a case of slightly better scaling in this particular link and/or setup.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147493/new/

https://reviews.llvm.org/D147493


