[cfe-dev] RFC: Replacing the default CRT allocator on Windows

James Y Knight via cfe-dev cfe-dev at lists.llvm.org
Thu Jul 2 15:08:27 PDT 2020


Have you tried Microsoft's new "segment heap" implementation? Only apps
that opt in get it at the moment. Reportedly Edge and Chromium are getting
large memory savings from switching, but I haven't seen performance
comparisons.

If the performance is good, that seems like it might be the simplest choice.

https://docs.microsoft.com/en-us/windows/win32/sbscs/application-manifests#heaptype

https://www.blackhat.com/docs/us-16/materials/us-16-Yason-Windows-10-Segment-Heap-Internals.pdf
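
Since I haven't seen performance comparisons, here is a minimal sketch of how
one might measure allocator scaling before and after opting in. It is only an
illustration: the function name `time_alloc_churn` and the allocation sizes
are made up for this example, and a malloc/free churn loop is a much cruder
workload than a real ThinLTO link.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper (not from LLVM): each of `threads` workers performs
// `iterations` malloc/free pairs of small, varying sizes.
// Returns the elapsed wall-clock time in seconds.
double time_alloc_churn(unsigned threads, std::size_t iterations) {
  auto worker = [iterations] {
    for (std::size_t i = 0; i < iterations; ++i) {
      void* p = std::malloc(16 + (i % 64) * 8);
      if (p) {
        // Touch the block through a volatile pointer so the write is kept.
        static_cast<volatile char*>(p)[0] = 1;
        std::free(p);
      }
    }
  };

  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> pool;
  pool.reserve(threads);
  for (unsigned t = 0; t < threads; ++t)
    pool.emplace_back(worker);
  for (auto& th : pool)
    th.join();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling it with 1, 2, 4, ... threads and a fixed per-thread iteration count
should show whether the time stays roughly flat (the allocator scales) or
grows with the thread count (the allocator serializes).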

On Thu, Jul 2, 2020, 12:20 AM Alexandre Ganea via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hello,
>
>
>
> I was wondering how folks feel about replacing the default Windows
> CRT allocator in Clang, LLD, and possibly other LLVM tools.
>
>
>
> The CRT heap allocator on Windows doesn’t scale well on machines with a
> large core count. Any multi-threaded workload in LLVM that allocates often
> is impacted by this. As a result, link times with ThinLTO are extremely
> slow on Windows. We’re observing performance inversely proportional to the
> number of cores: the more cores the machine has, the slower ThinLTO
> linking gets.
>
>
>
> We’ve replaced the CRT heap allocator with modern lock-free thread-cache
> allocators such as rpmalloc (Unlicense), mimalloc (MIT licence) or snmalloc
> (MIT licence). Link time improves by more than an order of magnitude.
>
>
>
> Time to link clang.exe with LLD and -flto on a 36-core machine:
>
>   Windows CRT heap allocator: 38 min 47 sec
>   mimalloc: 2 min 22 sec
>   rpmalloc: 2 min 15 sec
>   snmalloc: 2 min 19 sec
>
>
>
> We’ve been running in production with a downstream fork of LLVM + rpmalloc
> for more than a year. However, when cross-compiling for some specific game
> platforms, we use other downstream forks of LLVM that we can’t change.
>
>
>
> Two questions arise:
>
>    1. Licensing: should we embed one of these allocators into the
>    LLVM tree, or keep them separate, out of tree?
>    2. If the answer to the question above is to embed one, then given the
>    tremendous speedup, should we build it into Clang/LLD by default (on
>    Windows only)? Bear in mind that Windows doesn’t have an LD_PRELOAD
>    mechanism.
>
>
>
> Please see demo patch here: https://reviews.llvm.org/D71786
>
>
>
> Thank you in advance for the feedback!
>
> Alex.
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
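
On the LD_PRELOAD point: without a preload mechanism, a replacement allocator
has to be linked into the binary. As a rough illustration of one standard way
to do that in C++, here is a sketch that replaces the global `operator new`
and `operator delete`. This is not taken from D71786 and I am not claiming it
is how the patch works. `backend_alloc`, `backend_free` and `g_new_calls` are
hypothetical names; the backend just forwards to `malloc`/`free` where a real
build would call into rpmalloc, mimalloc or snmalloc.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, for illustration only: counts calls that reach the
// replaced operator new.
std::atomic<std::size_t> g_new_calls{0};

// Hypothetical stand-ins for a replacement allocator's entry points.
static void* backend_alloc(std::size_t n) { return std::malloc(n ? n : 1); }
static void backend_free(void* p) { std::free(p); }

// Replacing these global functions reroutes new-expressions in the whole
// program, because the definitions are linked into the executable itself.
void* operator new(std::size_t n) {
  g_new_calls.fetch_add(1, std::memory_order_relaxed);
  if (void* p = backend_alloc(n))
    return p;
  throw std::bad_alloc();
}

void operator delete(void* p) noexcept { backend_free(p); }
void operator delete(void* p, std::size_t) noexcept { backend_free(p); }
```

Note that this only covers allocations made through `operator new`. Code that
calls `malloc` directly is not affected, so a full replacement needs to
override the CRT entry points as well.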