[lld] r287946 - Parallelize uncompress() and splitIntoPieces().

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 5 09:23:35 PST 2016


What is your machine spec by the way?

On Mon, Dec 5, 2016 at 9:22 AM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:

>
> Thanks!
>
> I didn't have access to my workstation last week. Now that I do, I
> measure 1.15226905502x faster for firefox and 1.27814295845x faster for
> scylla, the two programs with debug info in the tests I normally run.
>
> Cheers,
> Rafael
>
> Rui Ueyama via llvm-commits <llvm-commits at lists.llvm.org> writes:
>
> > Author: ruiu
> > Date: Fri Nov 25 14:05:08 2016
> > New Revision: 287946
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=287946&view=rev
> > Log:
> > Parallelize uncompress() and splitIntoPieces().
> >
> > Uncompressing section contents and splitting mergeable section contents
> > into smaller chunks are heavy tasks. They scan entire section contents
> > and do CPU-intensive tasks such as uncompressing zlib-compressed data
> > or computing a hash value for each section piece.
> >
> > Luckily, these tasks are independent of each other, so we can do them
> > in parallel_for_each. The number of input sections is large (as opposed
> > to the number of output sections), so there is a lot of parallelism here.
> >
> > Actually the current design to call uncompress() and splitIntoPieces()
> > in batch was chosen with doing this in mind. Basically what we need to
> > do here is to replace `for` with `parallel_for_each`.
> >
> > It seems this patch improves latency significantly if linked programs
> > contain debug info (which in turn contains lots of mergeable strings).
> > For example, the latency to link Clang (debug build) improved by 20% on
> > my machine as shown below. Note that ld.gold took 19.2 seconds to do
> > the same thing.
> >
> > Before:
> >     30801.782712 task-clock (msec)         #    3.652 CPUs utilized            ( +-  2.59% )
> >          104,084 context-switches          #    0.003 M/sec                    ( +-  1.02% )
> >            5,063 cpu-migrations            #    0.164 K/sec                    ( +- 13.66% )
> >        2,528,130 page-faults               #    0.082 M/sec                    ( +-  0.47% )
> >   85,317,809,130 cycles                    #    2.770 GHz                      ( +-  2.62% )
> >   67,352,463,373 stalled-cycles-frontend   #   78.94% frontend cycles idle     ( +-  3.06% )
> >  <not supported> stalled-cycles-backend
> >   44,295,945,493 instructions              #    0.52  insns per cycle
> >                                            #    1.52  stalled cycles per insn  ( +-  0.44% )
> >    8,572,384,877 branches                  #  278.308 M/sec                    ( +-  0.66% )
> >      141,806,726 branch-misses             #    1.65% of all branches          ( +-  0.13% )
> >
> >      8.433424003 seconds time elapsed                                          ( +-  1.20% )
> >
> > After:
> >     35523.764575 task-clock (msec)         #    5.265 CPUs utilized            ( +-  2.67% )
> >          159,107 context-switches          #    0.004 M/sec                    ( +-  0.48% )
> >            8,123 cpu-migrations            #    0.229 K/sec                    ( +- 23.34% )
> >        2,372,483 page-faults               #    0.067 M/sec                    ( +-  0.36% )
> >   98,395,342,152 cycles                    #    2.770 GHz                      ( +-  2.62% )
> >   79,294,670,125 stalled-cycles-frontend   #   80.59% frontend cycles idle     ( +-  3.03% )
> >  <not supported> stalled-cycles-backend
> >   46,274,151,813 instructions              #    0.47  insns per cycle
> >                                            #    1.71  stalled cycles per insn  ( +-  0.47% )
> >    8,987,621,670 branches                  #  253.003 M/sec                    ( +-  0.60% )
> >      148,900,624 branch-misses             #    1.66% of all branches          ( +-  0.27% )
> >
> >      6.747548004 seconds time elapsed                                          ( +-  0.40% )
> >
> > Modified:
> >     lld/trunk/ELF/Driver.cpp
> >     lld/trunk/ELF/InputSection.cpp
> >
> > Modified: lld/trunk/ELF/Driver.cpp
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Driver.cpp?rev=287946&r1=287945&r2=287946&view=diff
> > ==============================================================================
> > --- lld/trunk/ELF/Driver.cpp (original)
> > +++ lld/trunk/ELF/Driver.cpp Fri Nov 25 14:05:08 2016
> > @@ -20,6 +20,7 @@
> >  #include "Target.h"
> >  #include "Writer.h"
> >  #include "lld/Config/Version.h"
> > +#include "lld/Core/Parallel.h"
> >  #include "lld/Driver/Driver.h"
> >  #include "llvm/ADT/StringExtras.h"
> >  #include "llvm/ADT/StringSwitch.h"
> > @@ -800,14 +801,15 @@ template <class ELFT> void LinkerDriver:
> >
> >    // MergeInputSection::splitIntoPieces needs to be called before
> >    // any call of MergeInputSection::getOffset. Do that.
> > -  for (InputSectionBase<ELFT> *S : Symtab.Sections) {
> > -    if (!S->Live)
> > -      continue;
> > -    if (S->Compressed)
> > -      S->uncompress();
> > -    if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
> > -      MS->splitIntoPieces();
> > -  }
> > +  parallel_for_each(Symtab.Sections.begin(), Symtab.Sections.end(),
> > +                    [](InputSectionBase<ELFT> *S) {
> > +                      if (!S->Live)
> > +                        return;
> > +                      if (S->Compressed)
> > +                        S->uncompress();
> > +                      if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
> > +                        MS->splitIntoPieces();
> > +                    });
> >
> >    // Write the result to the file.
> >    writeResult<ELFT>();
> >
> > Modified: lld/trunk/ELF/InputSection.cpp
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/InputSection.cpp?rev=287946&r1=287945&r2=287946&view=diff
> > ==============================================================================
> > --- lld/trunk/ELF/InputSection.cpp (original)
> > +++ lld/trunk/ELF/InputSection.cpp Fri Nov 25 14:05:08 2016
> > @@ -22,6 +22,7 @@
> >
> >  #include "llvm/Support/Compression.h"
> >  #include "llvm/Support/Endian.h"
> > +#include <mutex>
> >
> >  using namespace llvm;
> >  using namespace llvm::ELF;
> > @@ -160,6 +161,8 @@ InputSectionBase<ELFT>::getRawCompressed
> >    return {Data.slice(sizeof(*Hdr)), read64be(Hdr->Size)};
> >  }
> >
> > +// Uncompress section contents. Note that this function is called
> > +// from parallel_for_each, so it must be thread-safe.
> >  template <class ELFT> void InputSectionBase<ELFT>::uncompress() {
> >    if (!zlib::isAvailable())
> >      fatal(toString(this) +
> > @@ -179,7 +182,12 @@ template <class ELFT> void InputSectionB
> >      std::tie(Buf, Size) = getRawCompressedData(Data);
> >
> >    // Uncompress Buf.
> > -  char *OutputBuf = BAlloc.Allocate<char>(Size);
> > +  char *OutputBuf;
> > +  {
> > +    static std::mutex Mu;
> > +    std::lock_guard<std::mutex> Lock(Mu);
> > +    OutputBuf = BAlloc.Allocate<char>(Size);
> > +  }
> >    if (zlib::uncompress(toStringRef(Buf), OutputBuf, Size) != zlib::StatusOK)
> >      fatal(toString(this) + ": error while uncompressing section");
> >    Data = ArrayRef<uint8_t>((uint8_t *)OutputBuf, Size);
> > @@ -746,6 +754,12 @@ MergeInputSection<ELFT>::MergeInputSecti
> >                                             StringRef Name)
> >      : InputSectionBase<ELFT>(F, Header, Name, InputSectionBase<ELFT>::Merge) {}
> >
> > +// This function is called after we obtain a complete list of input sections
> > +// that need to be linked. This is responsible to split section contents
> > +// into small chunks for further processing.
> > +//
> > +// Note that this function is called from parallel_for_each. This must be
> > +// thread-safe (i.e. no memory allocation from the pools).
> >  template <class ELFT> void MergeInputSection<ELFT>::splitIntoPieces() {
> >    ArrayRef<uint8_t> Data = this->Data;
> >    uintX_t EntSize = this->Entsize;
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
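
For readers without the lld tree at hand, the pattern in the quoted patch can be sketched in standalone C++: a serial loop over sections becomes a parallel for-each, and the one shared resource (the allocator) is guarded by a function-local static mutex. This is only an illustration under stated assumptions. `parallelForEachSketch`, `ArenaSketch`, `SectionSketch` and `processAllSketch` are hypothetical names, not lld APIs, the chunking strategy is not necessarily what lld's parallel_for_each does, and a plain byte copy stands in for zlib::uncompress.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for parallel_for_each: split [Begin, End) into
// contiguous chunks and run Fn over each chunk on its own thread.
// Requires random-access iterators.
template <class Iter, class Func>
void parallelForEachSketch(Iter Begin, Iter End, Func Fn) {
  using Diff = typename std::iterator_traits<Iter>::difference_type;
  std::size_t N = static_cast<std::size_t>(std::distance(Begin, End));
  if (N == 0)
    return;
  // hardware_concurrency() may return 0, so clamp to at least one thread.
  std::size_t NumThreads =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  NumThreads = std::min(NumThreads, N);
  std::size_t Chunk = (N + NumThreads - 1) / NumThreads;

  std::vector<std::thread> Workers;
  for (std::size_t Start = 0; Start < N; Start += Chunk) {
    std::size_t Stop = std::min(N, Start + Chunk);
    Workers.emplace_back([=] {
      std::for_each(Begin + static_cast<Diff>(Start),
                    Begin + static_cast<Diff>(Stop), Fn);
    });
  }
  for (std::thread &T : Workers)
    T.join();
}

// Hypothetical allocator that, like a bump allocator, is not thread-safe.
struct ArenaSketch {
  std::vector<std::vector<char>> Blocks;
  char *allocate(std::size_t Size) {
    Blocks.emplace_back(Size);
    return Blocks.back().data();
  }
};

// Hypothetical section type with just the fields this sketch needs.
struct SectionSketch {
  bool Live = true;
  bool Compressed = false;
  std::string Raw;      // stored bytes (assumed input)
  char *Data = nullptr; // output buffer owned by the arena
  std::size_t Size = 0;

  // Called from several threads at once, so only the shared allocation
  // is serialized; the expensive work runs outside the lock.
  void uncompress(ArenaSketch &Arena) {
    char *Out;
    {
      static std::mutex Mu;
      std::lock_guard<std::mutex> Lock(Mu);
      Out = Arena.allocate(Raw.size());
    }
    std::copy(Raw.begin(), Raw.end(), Out); // placeholder for decompression
    Data = Out;
    Size = Raw.size();
  }
};

// Processes all live, compressed sections in parallel and returns how
// many sections ended up with an output buffer.
std::size_t processAllSketch(std::vector<SectionSketch> &Sections,
                             ArenaSketch &Arena) {
  parallelForEachSketch(Sections.begin(), Sections.end(),
                        [&Arena](SectionSketch &S) {
                          if (!S.Live)
                            return; // `continue` in the serial loop
                          if (S.Compressed)
                            S.uncompress(Arena);
                        });
  return static_cast<std::size_t>(
      std::count_if(Sections.begin(), Sections.end(),
                    [](const SectionSketch &S) { return S.Data != nullptr; }));
}
```

Keeping the critical section down to the allocation alone matters here: if the lock also covered the copy (or, in the real patch, the zlib call), the threads would serialize on the expensive part and most of the gain would be lost.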