[all-commits] [llvm/llvm-project] a6a37a: [Support] On Windows, add optional support for {rp...
Alexandre Ganea via All-commits
all-commits at lists.llvm.org
Thu Aug 27 08:10:10 PDT 2020
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: a6a37a2fcd2a8048a75bd0d8280497ed89d73224
https://github.com/llvm/llvm-project/commit/a6a37a2fcd2a8048a75bd0d8280497ed89d73224
Author: Alexandre Ganea <alexandre.ganea at ubisoft.com>
Date: 2020-08-27 (Thu, 27 Aug 2020)
Changed paths:
M llvm/CMakeLists.txt
M llvm/docs/CMake.rst
M llvm/lib/Support/CMakeLists.txt
M llvm/tools/llvm-shlib/CMakeLists.txt
M llvm/tools/remarks-shlib/CMakeLists.txt
M llvm/unittests/Support/DynamicLibrary/CMakeLists.txt
Log Message:
-----------
[Support] On Windows, add optional support for {rpmalloc|snmalloc|mimalloc}
This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree.
To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc.
When enabled, the allocator will be embeded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future.
When enabled, this changes the memory stack from:
new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc
to:
new/delete -> {rpmalloc|snmalloc|mimalloc} -> VirtualAlloc
The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration.
Differential Revision: https://reviews.llvm.org/D71786
More information about the All-commits
mailing list