[PATCH] D71775: [ThreadPool] On Windows, extend usage to all CPU sockets and all NUMA groups

Alexandre Ganea via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Dec 21 08:28:49 PST 2019


aganea added a comment.

In D71775#1793767 <https://reviews.llvm.org/D71775#1793767>, @mehdi_amini wrote:

> > Would it make sense to say "I don't want hyper-threads"?
>
> I'm not sure I remember correctly, but I believe one motivation behind avoiding "hyper-threads" and other virtual cores was that while they slightly improve performance, they also increase peak memory requirements: using heavyweight_hardware_concurrency() seemed like a good default tradeoff for most end-users.


It all makes sense. After this patch, memory consumption doubles when using both CPU sockets. There is then also the question of memory bandwidth, which doesn't scale in my case when using both sockets (an AMD Epyc could possibly fare better, because it has more memory channels). This is also why enabling the second socket only marginally decreases the timings.

F11115603: 6140_two_sockets.PNG <https://reviews.llvm.org/F11115603>
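
For reference, the tradeoff above comes down to how a tool sizes its thread pool. Below is a minimal, hypothetical sketch (not code from the patch) using LLVM's Support APIs; the exact ThreadPool constructor signature differs before and after this change, but the call site reads the same:

```
// Hypothetical standalone example, not taken from the patch.
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"

int main() {
  // heavyweight_hardware_concurrency() aims at one thread per physical
  // core; hardware_concurrency() would also count hyper-threads, which
  // raises peak memory with little throughput gain for heavy tasks.
  llvm::ThreadPool Pool(llvm::heavyweight_hardware_concurrency());
  for (int I = 0; I < 16; ++I)
    Pool.async([I] { (void)I; /* a heavyweight task, e.g. one ThinLTO backend */ });
  Pool.wait();
  return 0;
}
```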

In Ubisoft's case, time (both compute and human) is immensely more valuable than memory sticks. Historically we didn't really use LTO on game productions because it was very slow and often introduced undesirable bugs or side effects. The graphs in D71786 <https://reviews.llvm.org/D71786> are for Rainbow 6: Siege, which is a "smaller" codebase. For larger games, LTO link time is more in the range of 1h 20min, both for MSVC and for previous versions of Clang. If there's an LTO-specific bug in a final build, it is very hard to iterate with link times like that. In addition, the build system runs hundreds of builds every day, and we want to keep all of its cores busy. This is why both build and link times are important to us.

In D71775#1793768 <https://reviews.llvm.org/D71775#1793768>, @mehdi_amini wrote:

> Also: using heavyweight_hardware_concurrency() in the linker while having multiple linker jobs scheduled by the build system was another reason (I think LLVM's CMake defaults to 2 parallel link jobs when using ThinLTO, for instance).
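
For reference, that default lives in LLVM's own CMake and only takes effect with the Ninja generator; it can be overridden at configure time through the LLVM_PARALLEL_LINK_JOBS cache variable, e.g.:

```
# Example configure line; caps concurrent link steps to 1 (Ninja only).
cmake -G Ninja -DLLVM_ENABLE_LTO=Thin -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm
```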


Understood. If one sets the CPU affinity when starting the application, i.e. `start /affinity XXX lld-link.exe ...`, then this patch disables dispatching to other "processor groups", even if they are available. However, there doesn't seem to be a way to //restrict// the application to a single "processor group".
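
To illustrate the "processor group" mechanics discussed here, a small hypothetical sketch (illustrative only, not code from the patch) using the Win32 APIs involved:

```
// Enumerate Windows processor groups and show which group(s) this
// process is allowed to run in.
#include <windows.h>
#include <stdio.h>

int main() {
  WORD GroupCount = GetActiveProcessorGroupCount();
  for (WORD G = 0; G < GroupCount; ++G)
    printf("Group %u: %lu logical processors\n", (unsigned)G,
           (unsigned long)GetActiveProcessorCount(G));

  // A process launched with `start /affinity` starts confined to a single
  // group; GetProcessGroupAffinity reports which group(s) it belongs to.
  USHORT Groups[16];
  USHORT Count = 16;
  if (GetProcessGroupAffinity(GetCurrentProcess(), &Count, Groups))
    for (USHORT I = 0; I < Count; ++I)
      printf("Process affinity includes group %u\n", (unsigned)Groups[I]);
  return 0;
}
```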


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71775/new/

https://reviews.llvm.org/D71775




