[lld] r287946 - Parallelize uncompress() and splitIntoPieces().
David Blaikie via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 28 09:21:49 PST 2016
Tangentially related to compressed sections: currently, I take it, lld
decompresses all compressed input sections into memory before producing
output, yes? Is there any chance in the future that lld might use a more
streaming approach to reduce memory overhead? (i.e. defer decompressing
until output, and decompress/write out (possibly recompressing) in chunks,
rather than necessarily whole sections or all sections)
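For what it's worth, here is a rough sketch of what I mean by a chunked
approach, using zlib's streaming inflate API directly. writeChunkToOutput is
a hypothetical callback standing in for whatever would copy bytes into the
mapped output file; none of this is lld's actual code.

    #include <zlib.h>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical hook that copies one decompressed chunk to the output
    // file at the section's current offset.
    void writeChunkToOutput(const uint8_t *Chunk, size_t Size);

    // Decompress a zlib-compressed section in fixed-size chunks instead of
    // materializing the whole uncompressed section in memory first.
    bool streamUncompress(const uint8_t *In, size_t InSize) {
      z_stream Strm = {};
      if (inflateInit(&Strm) != Z_OK)
        return false;
      Strm.next_in = const_cast<Bytef *>(reinterpret_cast<const Bytef *>(In));
      Strm.avail_in = static_cast<uInt>(InSize);

      std::vector<uint8_t> Buf(64 * 1024); // 64 KiB scratch buffer
      int Ret;
      do {
        Strm.next_out = Buf.data();
        Strm.avail_out = static_cast<uInt>(Buf.size());
        Ret = inflate(&Strm, Z_NO_FLUSH);
        if (Ret != Z_OK && Ret != Z_STREAM_END) {
          inflateEnd(&Strm);
          return false;
        }
        writeChunkToOutput(Buf.data(), Buf.size() - Strm.avail_out);
      } while (Ret != Z_STREAM_END);
      inflateEnd(&Strm);
      return true;
    }

The obvious catch is relocation and string merging: the final bytes are only
known at output time, so presumably the decompressed data has to stay
resident while it is patched, which may be why it all lives in memory today.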
On Fri, Nov 25, 2016 at 12:15 PM Rui Ueyama via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
> Author: ruiu
> Date: Fri Nov 25 14:05:08 2016
> New Revision: 287946
>
> URL: http://llvm.org/viewvc/llvm-project?rev=287946&view=rev
> Log:
> Parallelize uncompress() and splitIntoPieces().
>
> Uncompressing section contents and splitting mergeable section contents
> into smaller chunks are heavy tasks. They scan entire section contents
> and do CPU-intensive work such as uncompressing zlib-compressed data
> or computing a hash value for each section piece.
>
> Luckily, these tasks are independent of each other, so we can do them
> in parallel_for_each. The number of input sections is large (as opposed
> to the number of output sections), so there is a lot of parallelism here.
>
> In fact, the current design of calling uncompress() and splitIntoPieces()
> in batch was chosen with this in mind. Basically, all we need to do here
> is replace `for` with `parallel_for_each`.
>
> It seems this patch improves latency significantly if the linked program
> contains debug info (which in turn contains lots of mergeable strings).
> For example, the latency to link Clang (a debug build) improved by 20% on
> my machine, as shown below. Note that ld.gold took 19.2 seconds to do
> the same thing.
>
> Before:
>     30801.782712 task-clock (msec)        #  3.652 CPUs utilized            ( +-  2.59% )
>          104,084 context-switches         #  0.003 M/sec                    ( +-  1.02% )
>            5,063 cpu-migrations           #  0.164 K/sec                    ( +- 13.66% )
>        2,528,130 page-faults              #  0.082 M/sec                    ( +-  0.47% )
>   85,317,809,130 cycles                   #  2.770 GHz                      ( +-  2.62% )
>   67,352,463,373 stalled-cycles-frontend  # 78.94% frontend cycles idle     ( +-  3.06% )
>  <not supported> stalled-cycles-backend
>   44,295,945,493 instructions             #  0.52  insns per cycle
>                                           #  1.52  stalled cycles per insn  ( +-  0.44% )
>    8,572,384,877 branches                 # 278.308 M/sec                   ( +-  0.66% )
>      141,806,726 branch-misses            #  1.65% of all branches          ( +-  0.13% )
>
>      8.433424003 seconds time elapsed                                       ( +-  1.20% )
>
> After:
>     35523.764575 task-clock (msec)        #  5.265 CPUs utilized            ( +-  2.67% )
>          159,107 context-switches         #  0.004 M/sec                    ( +-  0.48% )
>            8,123 cpu-migrations           #  0.229 K/sec                    ( +- 23.34% )
>        2,372,483 page-faults              #  0.067 M/sec                    ( +-  0.36% )
>   98,395,342,152 cycles                   #  2.770 GHz                      ( +-  2.62% )
>   79,294,670,125 stalled-cycles-frontend  # 80.59% frontend cycles idle     ( +-  3.03% )
>  <not supported> stalled-cycles-backend
>   46,274,151,813 instructions             #  0.47  insns per cycle
>                                           #  1.71  stalled cycles per insn  ( +-  0.47% )
>    8,987,621,670 branches                 # 253.003 M/sec                   ( +-  0.60% )
>      148,900,624 branch-misses            #  1.66% of all branches          ( +-  0.27% )
>
>      6.747548004 seconds time elapsed                                       ( +-  0.40% )
>
> Modified:
> lld/trunk/ELF/Driver.cpp
> lld/trunk/ELF/InputSection.cpp
>
> Modified: lld/trunk/ELF/Driver.cpp
> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Driver.cpp?rev=287946&r1=287945&r2=287946&view=diff
>
> ==============================================================================
> --- lld/trunk/ELF/Driver.cpp (original)
> +++ lld/trunk/ELF/Driver.cpp Fri Nov 25 14:05:08 2016
> @@ -20,6 +20,7 @@
> #include "Target.h"
> #include "Writer.h"
> #include "lld/Config/Version.h"
> +#include "lld/Core/Parallel.h"
> #include "lld/Driver/Driver.h"
> #include "llvm/ADT/StringExtras.h"
> #include "llvm/ADT/StringSwitch.h"
> @@ -800,14 +801,15 @@ template <class ELFT> void LinkerDriver:
>
> // MergeInputSection::splitIntoPieces needs to be called before
> // any call of MergeInputSection::getOffset. Do that.
> - for (InputSectionBase<ELFT> *S : Symtab.Sections) {
> - if (!S->Live)
> - continue;
> - if (S->Compressed)
> - S->uncompress();
> - if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
> - MS->splitIntoPieces();
> - }
> + parallel_for_each(Symtab.Sections.begin(), Symtab.Sections.end(),
> + [](InputSectionBase<ELFT> *S) {
> + if (!S->Live)
> + return;
> + if (S->Compressed)
> + S->uncompress();
> + if (auto *MS = dyn_cast<MergeInputSection<ELFT>>(S))
> + MS->splitIntoPieces();
> + });
>
> // Write the result to the file.
> writeResult<ELFT>();
>
> Modified: lld/trunk/ELF/InputSection.cpp
> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/InputSection.cpp?rev=287946&r1=287945&r2=287946&view=diff
>
> ==============================================================================
> --- lld/trunk/ELF/InputSection.cpp (original)
> +++ lld/trunk/ELF/InputSection.cpp Fri Nov 25 14:05:08 2016
> @@ -22,6 +22,7 @@
>
> #include "llvm/Support/Compression.h"
> #include "llvm/Support/Endian.h"
> +#include <mutex>
>
> using namespace llvm;
> using namespace llvm::ELF;
> @@ -160,6 +161,8 @@ InputSectionBase<ELFT>::getRawCompressed
> return {Data.slice(sizeof(*Hdr)), read64be(Hdr->Size)};
> }
>
> +// Uncompress section contents. Note that this function is called
> +// from parallel_for_each, so it must be thread-safe.
> template <class ELFT> void InputSectionBase<ELFT>::uncompress() {
> if (!zlib::isAvailable())
> fatal(toString(this) +
> @@ -179,7 +182,12 @@ template <class ELFT> void InputSectionB
> std::tie(Buf, Size) = getRawCompressedData(Data);
>
> // Uncompress Buf.
> - char *OutputBuf = BAlloc.Allocate<char>(Size);
> + char *OutputBuf;
> + {
> + static std::mutex Mu;
> + std::lock_guard<std::mutex> Lock(Mu);
> + OutputBuf = BAlloc.Allocate<char>(Size);
> + }
> if (zlib::uncompress(toStringRef(Buf), OutputBuf, Size) != zlib::StatusOK)
> fatal(toString(this) + ": error while uncompressing section");
> Data = ArrayRef<uint8_t>((uint8_t *)OutputBuf, Size);
> @@ -746,6 +754,12 @@ MergeInputSection<ELFT>::MergeInputSecti
> StringRef Name)
> : InputSectionBase<ELFT>(F, Header, Name,
> InputSectionBase<ELFT>::Merge) {}
>
> +// This function is called after we obtain a complete list of input sections
> +// that need to be linked. This is responsible to split section contents
> +// into small chunks for further processing.
> +//
> +// Note that this function is called from parallel_for_each. This must be
> +// thread-safe (i.e. no memory allocation from the pools).
> template <class ELFT> void MergeInputSection<ELFT>::splitIntoPieces() {
> ArrayRef<uint8_t> Data = this->Data;
> uintX_t EntSize = this->Entsize;
>
>
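The essence of the change above: per-section work runs in parallel_for_each,
and the only shared mutable state, the bump allocator, is serialized behind a
mutex because it is not thread-safe on its own. Below is a minimal sketch of
that pattern, with invented stand-ins (Section, BumpAllocator,
uncompressSection) and C++17's std::for_each with std::execution::par in
place of lld's parallel_for_each.

    #include <algorithm>
    #include <cstddef>
    #include <execution>
    #include <mutex>
    #include <vector>

    // Simplified stand-in for lld's bump allocator; Allocate() is not
    // thread-safe, just like the real one.
    struct BumpAllocator {
      std::vector<std::vector<char>> Slabs;
      char *Allocate(size_t Size) {
        Slabs.emplace_back(Size);
        return Slabs.back().data();
      }
    };

    // Simplified stand-in for an input section.
    struct Section {
      bool Live = false;
      bool Compressed = false;
      size_t UncompressedSize = 0;
      char *Buf = nullptr;
    };

    BumpAllocator BAlloc;
    std::mutex BAllocMu;

    // Per-section work; called concurrently, so everything it touches must
    // be thread-safe. Allocation is the only shared-state access, so it is
    // the only part we serialize.
    void uncompressSection(Section &S) {
      if (!S.Live || !S.Compressed)
        return;
      {
        std::lock_guard<std::mutex> Lock(BAllocMu);
        S.Buf = BAlloc.Allocate(S.UncompressedSize);
      }
      // ... decompress into S.Buf here, without holding the lock ...
    }

    void uncompressAll(std::vector<Section> &Sections) {
      // Stand-in for lld's parallel_for_each over Symtab.Sections.
      std::for_each(std::execution::par, Sections.begin(), Sections.end(),
                    uncompressSection);
    }

Keeping the critical section down to the allocation itself is what keeps the
loop scalable; the actual decompression and hashing run outside the lock.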