[PATCH] D71786: RFC: [Support] On Windows, add optional support for rpmalloc

Alexandre Ganea via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 3 08:59:19 PST 2020


aganea added a comment.

> In D71786#1813866 <https://reviews.llvm.org/D71786#1813866>, @russell.gallop wrote:
>  With all respect to @maniccoder, there are other replacement memory allocators that could be considered as alternatives (e.g.):
>  tcmalloc from https://github.com/gperftools/gperftools (as @rnk mentioned above)
>  https://github.com/microsoft/mimalloc

I've tried **mimalloc** -- it's a tad faster than rpmalloc. Maybe this has to do with mimalloc mapping //fewer// memory pages than rpmalloc.
I used mimalloc as a .lib, not compiled as part of LLVM, because of mismatches between Clang headers and Windows headers (see here <https://github.com/microsoft/mimalloc/issues/200>).

Figures below are for linking `clang.exe` with the cmake flags [3] mentioned in the Summary. The LTO cache is disabled; the Windows file system cache is warm, and figures were taken after several runs.
This uses a two-stage `lld-link.exe` built with the same cmake flags, with mimalloc/rpmalloc used in both stages.
D71775 <https://reviews.llvm.org/D71775> is also applied.
The 36-thread test uses ThinLTO with one thread per core, across all CPU sockets. The 72-thread test uses all hardware threads, on all cores, on all CPU sockets.
F11286761: mimalloc_rpmalloc_compare_clang_thinlto.png <https://reviews.llvm.org/F11286761>

Memory usage:

|                       | Working Set | Private Working Set | Commit  | Virtual Size |
| rpmalloc - 36 threads | 25.1 GB     | 16.5 GB             | 19.9 GB | 37.4 GB      |
| mimalloc - 36 threads | 25.6 GB     | 16.3 GB             | 18.3 GB | 33.3 GB      |
|                       |             |                     |         |              |
| rpmalloc - 72 threads | 33.6 GB     | 25.1 GB             | 28.5 GB | 46 GB        |
| mimalloc - 72 threads | 30.5 GB     | 21.2 GB             | 23.4 GB | 38.4 GB      |
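(For anyone wanting to sample these counters programmatically rather than reading them off Task Manager / Process Explorer, here's a minimal sketch using the Win32 `GetProcessMemoryInfo` API. Mapping the Commit column to `PrivateUsage` is my assumption; Virtual Size isn't exposed by this API and is omitted.)

  // Minimal sketch: sample a process' memory counters on Windows.
  // Link with psapi.lib. "Virtual Size" is not available through this API.
  #include <windows.h>
  #include <psapi.h>
  #include <cstdio>

  int main() {
    PROCESS_MEMORY_COUNTERS_EX C = {};
    C.cb = sizeof(C);
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             (PROCESS_MEMORY_COUNTERS *)&C, sizeof(C))) {
      std::printf("Working Set: %llu MB\n",
                  (unsigned long long)(C.WorkingSetSize >> 20));
      std::printf("Commit:      %llu MB\n",
                  (unsigned long long)(C.PrivateUsage >> 20));
    }
    return 0;
  }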

It's quite neat to see lld-link remaining at 99% CPU almost all the time during the link :-D
F11287066: link_thinkLTO_99percent.PNG <https://reviews.llvm.org/F11287066>

I've also tried **tcmalloc**, but it performs significantly worse than both mimalloc and rpmalloc, mainly because of its SpinLock implementation, which downgrades to a kernel `Sleep` after a few spins. That is an absolute no-go on many-core systems when a lot of allocations are involved, as with ThinLTO.
I also quickly tested **ptmalloc3**, but out-of-the-box it doesn't pass the LLVM tests. I don't expect any miracles there either: a quick browse through the code shows the same spinlock behavior as tcmalloc, and so does **Hoard**. Possibly glibc 2.26+ would perform better with its per-thread cache, but the mimalloc bench <https://github.com/microsoft/mimalloc#benchmark-results-on-36-core-intel> uses glibc 2.7 and shows worse performance in general.
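To illustrate the pattern (a simplified sketch, not tcmalloc's actual code): the lock spins a bounded number of times, then falls back to a kernel-level sleep, and on a many-core ThinLTO link the wake-up latency of that sleep dominates the cost of the allocation it was guarding.

  // Simplified sketch of the spin-then-sleep pattern described above;
  // NOT tcmalloc's actual code.
  #include <atomic>
  #include <chrono>
  #include <thread>

  class SpinThenSleepLock {
    std::atomic_flag Locked = ATOMIC_FLAG_INIT;

  public:
    void lock() {
      int Spins = 0;
      while (Locked.test_and_set(std::memory_order_acquire)) {
        if (++Spins < 100)
          continue; // cheap busy-wait while the holder finishes quickly
        // Contended case: yield to the kernel (think Sleep(1) on Windows).
        // The thread won't run again for at least a scheduler tick, which
        // serializes allocation-heavy workloads across 72 threads.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        Spins = 0;
      }
    }
    void unlock() { Locked.clear(std::memory_order_release); }
  };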

Please let me know if you'd like more testing on any of the above, or any other allocator.

I think both mimalloc & rpmalloc are already near the asymptote in terms of performance. They both account for about 3.5% of CPU time on the 36-core run, which includes the calls to malloc & free and the kernel calls to VirtualAlloc.
At this point, any further improvements will have to come from reducing allocations in the parts of LLVM that allocate a lot:

F11287181: thinlto_callers_of_mimalloc.PNG <https://reviews.llvm.org/F11287181>

F11287185: thinlto_callers_of_mimalloc_2.PNG <https://reviews.llvm.org/F11287185>
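To make the "reduce allocations" point concrete: the usual remedy for hot spots like these is to batch many small allocations into slabs, along the lines of LLVM's BumpPtrAllocator. A simplified, self-contained sketch of the idea (not the actual LLVM class):

  // Sketch of the idea behind LLVM's BumpPtrAllocator: carve objects out of
  // large slabs so a hot path pays one pointer bump instead of one malloc
  // per object. Assumes individual allocations are smaller than a slab.
  #include <cstddef>
  #include <cstdint>
  #include <memory>
  #include <vector>

  class BumpAllocator {
    static constexpr std::size_t SlabSize = 1 << 20; // 1 MiB slabs
    std::vector<std::unique_ptr<char[]>> Slabs;
    char *Cur = nullptr;
    char *End = nullptr;

  public:
    void *allocate(std::size_t Size,
                   std::size_t Align = alignof(std::max_align_t)) {
      uintptr_t P = (uintptr_t(Cur) + Align - 1) & ~uintptr_t(Align - 1);
      if (!Cur || P + Size > uintptr_t(End)) { // start a fresh slab
        Slabs.push_back(std::make_unique<char[]>(SlabSize));
        Cur = Slabs.back().get();
        End = Cur + SlabSize;
        P = (uintptr_t(Cur) + Align - 1) & ~uintptr_t(Align - 1);
      }
      Cur = reinterpret_cast<char *>(P + Size);
      return reinterpret_cast<void *>(P);
    }
    // No per-object free: all slabs are released when the allocator dies.
  };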

In D71786#1813866 <https://reviews.llvm.org/D71786#1813866>, @russell.gallop wrote:

> Including the entire rpmalloc source does add quite a bit of source code in one go. Would it be better to keep the rpmalloc source outside LLVM and just include hooks?


I think integrating it into LLVM by default would ease things and reduce the points of failure when enabling it (from a toolchain provider's POV).

In D71786#1813866 <https://reviews.llvm.org/D71786#1813866>, @russell.gallop wrote:

> I wouldn't be happy with unconditionally replacing the C library malloc (without a cmake option). LLVM should certainly continue to be tested, and work, with the default allocator as we wouldn't want to unintentionally develop a dependence on an alternative.


Agreed. What about compiling it in by default, but enabling it at runtime with an environment variable or a command-line flag? That would be best, as it wouldn't require you (Sony) or any other LLVM toolchain provider (Nintendo) to ship a different set of binaries. Customers could selectively enable the alternate allocator, at the expense of higher memory usage. We (Ubisoft) would very much like first-party LLVM toolchain providers to include this (or an) alternate allocator as part of their toolchains for Windows. Our games need to ship built with those toolchains; we can't use our own downstream forks (at least not at the moment).
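Something along these lines -- a once-at-startup dispatch keyed off an environment variable. This is just a sketch of the shape, not the actual patch; the variable name `LLVM_ENABLE_RPMALLOC` and the `llvmAlloc`/`llvmFree` entry points are hypothetical.

  // Minimal sketch: compile the alternate allocator in, but select it once
  // at startup from an environment variable, so one binary serves both
  // configurations. Names below are hypothetical.
  #include <cstdlib>
  #include <cstring>

  namespace {

  using AllocFn = void *(*)(std::size_t);
  using FreeFn = void (*)(void *);

  // Default to the CRT allocator.
  AllocFn GAlloc = std::malloc;
  FreeFn GFree = std::free;

  void selectAllocator() {
    const char *Env = std::getenv("LLVM_ENABLE_RPMALLOC"); // hypothetical
    if (Env && std::strcmp(Env, "1") == 0) {
      // When built with the alternate allocator, rpmalloc's entry points
      // (rpmalloc_initialize, rpmalloc, rpfree) would be wired here:
      // GAlloc = rpmalloc; GFree = rpfree;
    }
  }

  } // namespace

  void *llvmAlloc(std::size_t Size) { return GAlloc(Size); }
  void llvmFree(void *Ptr) { GFree(Ptr); }

  int main() {
    selectAllocator(); // must run before anything allocates through llvmAlloc
    void *P = llvmAlloc(128);
    llvmFree(P);
    return 0;
  }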

As for code size,

  D:\Utils>cloc-1.80.exe d:\llvm-project\llvm\lib\Support\rpmalloc\
         4 text files.
         4 unique files.
         2 files ignored.
  
  github.com/AlDanial/cloc v 1.80  T=0.50 s (6.0 files/s, 6472.0 lines/s)
  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  C                                2            280            341           2352
  C/C++ Header                     1             30            110            123
  -------------------------------------------------------------------------------
  SUM:                             3            310            451           2475
  -------------------------------------------------------------------------------

2475 LoC across 3 files seems like a pretty reasonable addition to me.

And:

  D:\Utils>cloc-1.80.exe d:\git\mimalloc\src
        15 text files.
        15 unique files.
         1 file ignored.
  
  github.com/AlDanial/cloc v 1.80  T=0.50 s (30.0 files/s, 13760.0 lines/s)
  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  C                               15            889           1064           4927
  -------------------------------------------------------------------------------
  SUM:                            15            889           1064           4927
  -------------------------------------------------------------------------------
  
  D:\Utils>cloc-1.80.exe d:\git\mimalloc\include
         6 text files.
         6 unique files.
         1 file ignored.
  
  github.com/AlDanial/cloc v 1.80  T=0.50 s (12.0 files/s, 3194.0 lines/s)
  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  C/C++ Header                     6            261            308           1028
  -------------------------------------------------------------------------------
  SUM:                             6            261            308           1028
  -------------------------------------------------------------------------------

mimalloc comes with about 2x the code size, and more .c files to potentially integrate into LLVM.

---

At this point, if rpmalloc could be improved to the same level of memory usage as mimalloc, I'd be willing to wait a few weeks/months before moving forward. @maniccoder WDYT? I can provide a full step-by-step repro.
Otherwise, I'd take mimalloc as an external .lib, but that is not my preferred solution.

Please advise:
--------------

1. rpmalloc or mimalloc?
2. .lib or in-tree?
3. Compiled by default into LLVM, but disabled by default, and optionally enabled by the user at runtime?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71786/new/

https://reviews.llvm.org/D71786




