[llvm-dev] (Thin)LTO llvm build

Sat Sep 17 20:45:08 PDT 2016

As Mehdi mentioned, ThinLTO backend processes use very little memory, so you
may get away without any additional flags (neither
-Wl,--plugin-opt=jobs=.., nor -Dxxx for cmake to limit link parallelism) if
your build machine has enough memory. Here is some build time data for
linking 52 binaries in parallel (with ThinLTO) in a clang build (link
parallelism equals ninja parallelism). The machine has 32 logical cores and
64GB of memory.
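
To put a rough number on "enough memory", here is a small Python sketch that
takes the per-link figures Mehdi quoted below for linking clang and checks how
many links of that size would fit in 64GB at once. It assumes, pessimistically,
that every concurrent link is as large as the clang link and that nothing else
on the machine uses memory; the function name and the budget constant are
illustrative only.

```python
# Peak memory Mehdi reported for one ThinLTO link of clang, in GB,
# keyed by the number of backend threads (figures from the quoted mail).
CLANG_LINK_GB = {
    "no debug info": {8: 0.6, 16: 0.9, 32: 1.4},
    "full debug info": {8: 2.3, 16: 3.5, 32: 6.5},
}

MACHINE_GB = 64  # memory of the build machine described above


def max_concurrent_links(per_link_gb, budget_gb=MACHINE_GB):
    """Number of links of this size that fit in the budget at once.

    Assumption: every link is as big as the clang link, and compile
    jobs and the OS use no memory. Both are simplifications.
    """
    return int(budget_gb // per_link_gb)


for config, by_threads in CLANG_LINK_GB.items():
    for threads, gb in sorted(by_threads.items()):
        fits = max_concurrent_links(gb)
        print(f"{config}, {threads} threads: {gb} GB per link, "
              f"at most {fits} clang-sized links in {MACHINE_GB} GB")
```

Most of the 52 binaries are smaller than clang, so this is an upper bound on
memory pressure rather than a prediction.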

1) Using the default ninja parallelism, the peak 1min load-average is 537.
The total elapsed time is 9m43s
2) ninja -j16 : the peak load is 411. The elapsed time is 8m26s
3) ninja -j8 : elapsed time is 8m34s
4) ninja -j4 : elapsed time is 8m50s
5) ninja -j2 : elapsed time is 9m54s
6) ninja -j1 : elapsed time is 12m3s

As you can see, doing serial ThinLTO linking across multiple binaries does
not give you the best performance. The build performance peaked at -j16 in
this configuration. You may need to find your best
LLVM_PARALLEL_LINK_JOBS value.
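
For anyone who wants to do the same comparison on their own machine, here is
a minimal Python sketch that converts the elapsed times above to seconds and
reports the speedup over -j1. The helper name to_seconds is just for
illustration, and the default-parallelism run is left out because its -j
value is not stated.

```python
# Elapsed times reported above, keyed by the ninja -j value.
# The default-parallelism run (9m43s) is omitted: its -j value is not stated.
ELAPSED = {16: "8m26s", 8: "8m34s", 4: "8m50s", 2: "9m54s", 1: "12m3s"}


def to_seconds(text):
    """Convert a string like '8m26s' into a number of seconds."""
    minutes, rest = text.split("m")
    return int(minutes) * 60 + int(rest.rstrip("s"))


seconds = {jobs: to_seconds(t) for jobs, t in ELAPSED.items()}
serial = seconds[1]                      # the -j1 run is the baseline
best = min(seconds, key=seconds.get)     # -j value with the shortest time

for jobs in sorted(seconds):
    speedup = serial / seconds[jobs]
    print(f"-j{jobs:<2} {seconds[jobs]:4d}s  speedup over -j1: {speedup:.2f}x")
print(f"fastest: -j{best}")
```

Replace the ELAPSED table with your own measurements to see where your
configuration peaks.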

Having said that, there is definitely room for ThinLTO usability
improvement, so that the ThinLTO parallel backend can coordinate well with
the build system's parallelism and the user does not need to figure out the
sweet spot.

thanks,

David

On Sat, Sep 17, 2016 at 4:03 PM, Carsten Mattner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Sun, Sep 18, 2016 at 12:32 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> >
> >> On Sep 17, 2016, at 3:19 PM, Carsten Mattner <carstenmattner at gmail.com> wrote:
> >>
> >> So, when I embark on the next ThinLTO try build, probably this Sunday,
> >> should I append -Wl,-plugin-opt,jobs=NUM_PHYS_CORES to LDFLAGS
> >> and run ninja without -j or -jNUM_PHYS_CORES?
> >
> >
> > ThinLTO is fairly lean on memory: It should not consume more memory per
> > thread than if you launch the same number of clang processes in parallel
> > to process C++ files.
> >
> > For example when linking the clang binary itself, without debug info it
> > consumes 0.6GB with 8 threads, 0.9GB with 16 threads, and 1.4GB with 32
> > threads.
> > With full debug info, we still have room for improvement, right now it
> > consumes 2.3GB with 8 threads, 3.5GB with 16 threads, and 6.5GB with 32
> > threads.
> >
> > So I believe that configuring with -DLLVM_PARALLEL_LINK_JOBS=1 should
> > be enough without other constraints, but your mileage may vary.
>
> Sure, I'll try that to not introduce too many variables into the
> configure changes, though I have to ask if using lld would make it
> possible to have a common -Wl that works across platforms, without
> having to care whether the linker is binutils.
>
> If I really wanted to pass that to cmake, overriding LDFLAGS would work,
> right?