[lld] r287946 - Parallelize uncompress() and splitIntoPieces().
Rui Ueyama via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 5 09:23:35 PST 2016
What is your machine spec by the way?
On Mon, Dec 5, 2016 at 9:22 AM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:
>
> Thanks!
>
> I didn't have access to my workstation last week. Now that I do, I
> measure 1.15226905502x faster for firefox and 1.27814295845x faster for
> scylla, the two programs with debug info in the tests I normally run.
>
> Cheers,
> Rafael
>
> Rui Ueyama via llvm-commits <llvm-commits at lists.llvm.org> writes:
>
> > Author: ruiu
> > Date: Fri Nov 25 14:05:08 2016
> > New Revision: 287946
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=287946&view=rev
> > Log:
> > Parallelize uncompress() and splitIntoPieces().
> >
> > Uncompressing section contents and splitting mergeable section contents
> > into smaller chunks are heavy tasks. They scan entire section contents
> > and do CPU-intensive tasks such as uncompressing zlib-compressed data
> > or computing a hash value for each section piece.
> >
> > Luckily, these tasks are independent of each other, so we can run them
> > with parallel_for_each. The number of input sections is large (as opposed
> > to the number of output sections), so there is plenty of parallelism here.
> >
> > In fact, the current design, which calls uncompress() and
> > splitIntoPieces() in a batch, was chosen with this in mind. Basically,
> > all we need to do here is replace `for` with `parallel_for_each`.
> >
> > This patch appears to improve link time significantly when the linked
> > programs contain debug info (which in turn contains lots of mergeable
> > strings). For example, the time to link Clang (debug build) improved by
> > 20% on my machine, as shown below. Note that ld.gold took 19.2 seconds
> > to do the same thing.
> >
> > Before:
> >     30801.782712 task-clock (msec)        # 3.652 CPUs utilized          ( +- 2.59% )
> >          104,084 context-switches         # 0.003 M/sec                  ( +- 1.02% )
> >            5,063 cpu-migrations           # 0.164 K/sec                  ( +- 13.66% )
> >        2,528,130 page-faults              # 0.082 M/sec                  ( +- 0.47% )
> >   85,317,809,130 cycles                   # 2.770 GHz                    ( +- 2.62% )
> >   67,352,463,373 stalled-cycles-frontend  # 78.94% frontend cycles idle  ( +- 3.06% )
> >  <not supported> stalled-cycles-backend
> >   44,295,945,493 instructions             # 0.52 insns per cycle
> >                                           # 1.52 stalled cycles per insn ( +- 0.44% )
> >    8,572,384,877 branches                 # 278.308 M/sec                ( +- 0.66% )
> >      141,806,726 branch-misses            # 1.65% of all branches        ( +- 0.13% )
> >
> >      8.433424003 seconds time elapsed                                    ( +- 1.20% )
> >
> > After:
> >     35523.764575 task-clock (msec)        # 5.265 CPUs utilized          ( +- 2.67% )
> >          159,107 context-switches         # 0.004 M/sec                  ( +- 0.48% )
> >            8,123 cpu-migrations           # 0.229 K/sec                  ( +- 23.34% )
> >        2,372,483 page-faults              # 0.067 M/sec                  ( +- 0.36% )
> >   98,395,342,152 cycles                   # 2.770 GHz                    ( +- 2.62% )
> >   79,294,670,125 stalled-cycles-frontend  # 80.59% frontend cycles idle  ( +- 3.03% )
> >  <not supported> stalled-cycles-backend
> >   46,274,151,813 instructions             # 0.47 insns per cycle
> >                                           # 1.71 stalled cycles per insn ( +- 0.47% )
> >    8,987,621,670 branches                 # 253.003 M/sec                ( +- 0.60% )
> >      148,900,624 branch-misses            # 1.66% of all branches        ( +- 0.27% )
> >
> >      6.747548004 seconds time elapsed                                    ( +- 0.40% )
> >
> > Modified:
> > lld/trunk/ELF/Driver.cpp
> > lld/trunk/ELF/InputSection.cpp
> >
> > Modified: lld/trunk/ELF/Driver.cpp
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Driver.cpp?rev=287946&r1=287945&r2=287946&view=diff
> > ==============================================================================
> > --- lld/trunk/ELF/Driver.cpp (original)
> > +++ lld/trunk/ELF/Driver.cpp Fri Nov 25 14:05:08 2016
> > @@ -20,6 +20,7 @@
> > #include "Target.h"
> > #include "Writer.h"
> > #include "lld/Config/Version.h"
> > +#include "lld/Core/Parallel.h"
> > #include "lld/Driver/Driver.h"
> > #include "llvm/ADT/StringExtras.h"
> > #include "llvm/ADT/StringSwitch.h"
> > @@ -800,14 +801,15 @@ template <class ELFT> void LinkerDriver:
> >
> > // MergeInputSection::splitIntoPieces needs to be called before
> > // any call of MergeInputSection::getOffset. Do that.
> > - for (InputSectionBase<ELFT> *S : Symtab.Sections) {
> > - if (!S->Live)
> > - continue;
> > - if (S->Compressed)
> > - S->uncompress();
> > - if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
> > - MS->splitIntoPieces();
> > - }
> > + parallel_for_each(Symtab.Sections.begin(), Symtab.Sections.end(),
> > + [](InputSectionBase<ELFT> *S) {
> > + if (!S->Live)
> > + return;
> > + if (S->Compressed)
> > + S->uncompress();
> > + if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
> > + MS->splitIntoPieces();
> > + });
> >
> > // Write the result to the file.
> > writeResult<ELFT>();
> >
> > Modified: lld/trunk/ELF/InputSection.cpp
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/InputSection.cpp?rev=287946&r1=287945&r2=287946&view=diff
> > ==============================================================================
> > --- lld/trunk/ELF/InputSection.cpp (original)
> > +++ lld/trunk/ELF/InputSection.cpp Fri Nov 25 14:05:08 2016
> > @@ -22,6 +22,7 @@
> >
> > #include "llvm/Support/Compression.h"
> > #include "llvm/Support/Endian.h"
> > +#include <mutex>
> >
> > using namespace llvm;
> > using namespace llvm::ELF;
> > @@ -160,6 +161,8 @@ InputSectionBase<ELFT>::getRawCompressed
> > return {Data.slice(sizeof(*Hdr)), read64be(Hdr->Size)};
> > }
> >
> > +// Uncompress section contents. Note that this function is called
> > +// from parallel_for_each, so it must be thread-safe.
> > template <class ELFT> void InputSectionBase<ELFT>::uncompress() {
> > if (!zlib::isAvailable())
> > fatal(toString(this) +
> > @@ -179,7 +182,12 @@ template <class ELFT> void InputSectionB
> > std::tie(Buf, Size) = getRawCompressedData(Data);
> >
> > // Uncompress Buf.
> > - char *OutputBuf = BAlloc.Allocate<char>(Size);
> > + char *OutputBuf;
> > + {
> > + static std::mutex Mu;
> > + std::lock_guard<std::mutex> Lock(Mu);
> > + OutputBuf = BAlloc.Allocate<char>(Size);
> > + }
> > if (zlib::uncompress(toStringRef(Buf), OutputBuf, Size) != zlib::StatusOK)
> > fatal(toString(this) + ": error while uncompressing section");
> > Data = ArrayRef<uint8_t>((uint8_t *)OutputBuf, Size);
> > @@ -746,6 +754,12 @@ MergeInputSection<ELFT>::MergeInputSecti
> > StringRef Name)
> > : InputSectionBase<ELFT>(F, Header, Name, InputSectionBase<ELFT>::Merge) {}
> >
> > +// This function is called after we obtain a complete list of input sections
> > +// that need to be linked. This is responsible to split section contents
> > +// into small chunks for further processing.
> > +//
> > +// Note that this function is called from parallel_for_each. This must be
> > +// thread-safe (i.e. no memory allocation from the pools).
> > template <class ELFT> void MergeInputSection<ELFT>::splitIntoPieces() {
> > ArrayRef<uint8_t> Data = this->Data;
> > uintX_t EntSize = this->Entsize;
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
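
For readers skimming the quoted patch, the pattern it applies (a parallel loop over input sections, with a mutex held only around the shared allocator call) can be sketched in standalone C++ roughly as follows. This is an illustration under stated assumptions, not lld code: `Pool`, `Section`, `expand`, `parallelForEach`, and `processAll` are hypothetical stand-ins for `BAlloc`, `InputSectionBase`, `uncompress()`, and `parallel_for_each` from lld/Core/Parallel.h, and a plain `memcpy` stands in for zlib decompression.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared allocator that is NOT thread-safe.
class Pool {
public:
  char *allocate(std::size_t Size) {
    Blocks.emplace_back(new char[Size]);
    return Blocks.back().get();
  }
  std::size_t numBlocks() const { return Blocks.size(); }

private:
  std::vector<std::unique_ptr<char[]>> Blocks;
};

// Hypothetical stand-in for an input section.
struct Section {
  bool Live = true;
  std::string Raw;            // pretend this is the compressed input
  const char *Data = nullptr; // set once the section has been expanded
  std::size_t Size = 0;
};

Pool SharedPool;

// Callable from several threads at once: only the allocation is serialized,
// the expensive part (here just a copy) runs outside the lock.
void expand(Section &S) {
  assert(S.Live && "dead sections are skipped by the caller");
  char *Out;
  {
    static std::mutex Mu;
    std::lock_guard<std::mutex> Lock(Mu);
    Out = SharedPool.allocate(S.Raw.size());
  }
  std::memcpy(Out, S.Raw.data(), S.Raw.size());
  S.Data = Out;
  S.Size = S.Raw.size();
}

// Minimal hand-rolled parallel loop (an assumption, not lld's helper):
// split the range into contiguous chunks and run one thread per chunk.
template <class Iter, class Fn>
void parallelForEach(Iter Begin, Iter End, Fn F) {
  std::size_t N = static_cast<std::size_t>(End - Begin);
  std::size_t NumThreads =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  NumThreads = std::min(NumThreads, std::max<std::size_t>(N, 1));
  std::size_t Chunk = (N + NumThreads - 1) / NumThreads;
  std::vector<std::thread> Workers;
  for (std::size_t Start = 0; Start < N; Start += Chunk) {
    std::size_t Stop = std::min(N, Start + Chunk);
    Workers.emplace_back([=] {
      for (Iter I = Begin + Start; I != Begin + Stop; ++I)
        F(*I);
    });
  }
  for (std::thread &T : Workers)
    T.join();
}

// Mirrors the shape of the rewritten loop: `continue` becomes `return`
// because the loop body is now a callback.
std::size_t processAll(std::vector<Section> &Sections) {
  parallelForEach(Sections.begin(), Sections.end(), [](Section &S) {
    if (!S.Live)
      return;
    expand(S);
  });
  std::size_t Done = 0;
  for (const Section &S : Sections)
    if (S.Data)
      ++Done;
  return Done;
}
```

Each worker writes only to its own `Section`, so the allocator is the single piece of shared mutable state that needs the lock. On some toolchains this needs `-pthread` when building.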