[all-commits] [llvm/llvm-project] a6a37a: [Support] On Windows, add optional support for {rp...

Thu Aug 27 08:10:10 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: a6a37a2fcd2a8048a75bd0d8280497ed89d73224
      https://github.com/llvm/llvm-project/commit/a6a37a2fcd2a8048a75bd0d8280497ed89d73224
  Author: Alexandre Ganea <alexandre.ganea at ubisoft.com>
  Date:   2020-08-27 (Thu, 27 Aug 2020)

  Changed paths:
    M llvm/CMakeLists.txt
    M llvm/docs/CMake.rst
    M llvm/lib/Support/CMakeLists.txt
    M llvm/tools/llvm-shlib/CMakeLists.txt
    M llvm/tools/remarks-shlib/CMakeLists.txt
    M llvm/unittests/Support/DynamicLibrary/CMakeLists.txt

  Log Message:
  -----------
  [Support] On Windows, add optional support for {rpmalloc|snmalloc|mimalloc}

This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree.

To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc.

When enabled, the allocator will be embeded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future.

When enabled, this changes the memory stack from:
  new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc
to:
  new/delete -> {rpmalloc|snmalloc|mimalloc} -> VirtualAlloc

The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration.

Differential Revision: https://reviews.llvm.org/D71786