[llvm-dev] RFC: Replacing the default CRT allocator on Windows
Alexandre Ganea via llvm-dev
llvm-dev at lists.llvm.org
Wed Jul 1 21:20:20 PDT 2020
I'd like to know how folks feel about replacing the default Windows CRT allocator in Clang, LLD and possibly other LLVM tools.
The CRT heap allocator on Windows doesn't scale well on machines with a large core count. Any multi-threaded workload in LLVM that allocates often is impacted by this. As a result, link times with ThinLTO are extremely slow on Windows. We're observing performance inversely proportional to the number of cores: the more cores the machine has, the slower ThinLTO linking gets.
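To get a rough feel for this kind of contention, a micro-benchmark along the following lines can be used. This is only a minimal sketch and not the measurement behind the numbers in this post: the helper name `timeAllocations` and its parameters are made up for illustration, and a tight malloc/free loop is far simpler than a real ThinLTO link.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper (not part of LLVM): each of `threads` workers performs
// `iterations` malloc/free pairs of `size` bytes. Returns the elapsed
// wall-clock time in seconds for all workers to finish.
double timeAllocations(unsigned threads, std::size_t iterations,
                       std::size_t size) {
  assert(threads > 0 && size > 0);
  auto start = std::chrono::steady_clock::now();

  std::vector<std::thread> workers;
  workers.reserve(threads);
  for (unsigned t = 0; t < threads; ++t) {
    workers.emplace_back([=] {
      for (std::size_t i = 0; i < iterations; ++i) {
        void *p = std::malloc(size);
        // Touch the block so the allocation is not trivially optimized away.
        if (p)
          static_cast<volatile char *>(p)[0] = 1;
        std::free(p);
      }
    });
  }
  for (auto &w : workers)
    w.join();

  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling `timeAllocations(n, 1000000, 64)` for increasing `n` and comparing the results gives a first indication: with an allocator that scales, the elapsed time should stay roughly flat as long as `n` does not exceed the core count, whereas a contended allocator shows it growing with `n`.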
We've replaced the CRT heap allocator with modern lock-free, thread-caching allocators such as rpmalloc (Unlicense), mimalloc (MIT licence) or snmalloc (MIT licence). Runtime performance improves by more than an order of magnitude.
Time to link clang.exe with LLD and -flto on a 36-core machine:
Windows CRT heap allocator: 38 min 47 sec
mimalloc: 2 min 22 sec
rpmalloc: 2 min 15 sec
snmalloc: 2 min 19 sec
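The ratios implied by the timings above can be worked out with a few lines of arithmetic. The helper names `toSeconds` and `speedup` below are illustrative only; the inputs are the figures from the list.

```cpp
#include <cassert>

// Convert an "M min S sec" timing to seconds.
constexpr int toSeconds(int minutes, int seconds) {
  return minutes * 60 + seconds;
}

// Speedup of a candidate run relative to the baseline run.
constexpr double speedup(int baselineSec, int candidateSec) {
  return static_cast<double>(baselineSec) / candidateSec;
}

// Timings from the list above.
constexpr int kCrt = toSeconds(38, 47);     // 2327 s
constexpr int kMimalloc = toSeconds(2, 22); // 142 s
constexpr int kRpmalloc = toSeconds(2, 15); // 135 s
constexpr int kSnmalloc = toSeconds(2, 19); // 139 s

// Every replacement allocator is more than 10x faster than the CRT heap.
static_assert(speedup(kCrt, kMimalloc) > 10.0, "mimalloc speedup");
static_assert(speedup(kCrt, kRpmalloc) > 10.0, "rpmalloc speedup");
static_assert(speedup(kCrt, kSnmalloc) > 10.0, "snmalloc speedup");
```

This works out to roughly 16.4x for mimalloc, 17.2x for rpmalloc and 16.7x for snmalloc. The three allocators are within a few percent of each other, so the choice between them is unlikely to be decided by link time alone.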
We've been running a downstream fork of LLVM with rpmalloc in production for more than a year. However, when cross-compiling for some specific game platforms, we use other downstream forks of LLVM that we can't change.
Two questions arise:
1. Licensing: should we embed one of these allocators into the LLVM tree, or keep them separate and out-of-tree?
2. If we do embed one, given the tremendous speedup, should it be enabled in Clang/LLD builds by default (on Windows only)? Note that Windows doesn't have an LD_PRELOAD mechanism.
Please see demo patch here: https://reviews.llvm.org/D71786
Thank you in advance for the feedback!