[lld] r232460 - [ELF] Use parallel_for_each for writing.

Tue Mar 17 13:35:39 PDT 2015

On Tue, Mar 17, 2015 at 12:30 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Mon, Mar 16, 2015 at 8:29 PM, Shankar Easwaran <shankare at codeaurora.org
> > wrote:
>
>> Author: shankare
>> Date: Mon Mar 16 22:29:32 2015
>> New Revision: 232460
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=232460&view=rev
>> Log:
>> [ELF] Use parallel_for_each for writing.
>>
>> This changes improves performance of lld, when self-hosting lld, when
>> compared
>> with the bfd linker. BFD linker on average takes 8 seconds in elapsed
>> time.
>> lld takes 3 seconds elapased time average. Without this change, lld takes
>> ~5
>> seconds average. The runtime comparisons were done on a release build and
>> measured by running linking thrice.
>>
>> lld self-host without the change
>> ----------------------------------
>> real    0m3.196s
>> user    0m4.580s
>> sys     0m0.832s
>>
>> lld self-host with lld
>> -----------------------
>> user    0m3.024s
>> user    0m3.252s
>> sys     0m0.796s
>>
>
> The above results don't look real output of "time" command.
>
> If it's real, it's too good to be true, assuming the first line of the
> second result is "real" instead of "user".
>
> "real" is wall clock time from process start to process exit. "user" is
> CPU time consumed by the process in user mode (if a process is
> multi-threaded, it can be larger than real).
>
> Your result shows significant improvement in user time. Which means you
> have significantly reduced the amount of processing time to do the same
> thing compared to before. However, because this change didn't change
> algorithm, but just execute them in parallel, it couldn't happen.
>
> Something's not correct.
>

I also noticed a similar thing in Davide's flame graphs in `[LLVMdev] On
LLD performance`. It's not impossible though. E.g. doing it multithreaded
allows utilizing the caches of multiple cores, which could result in a
significant reduction in the time the CPU is waiting on memory and thus
improves user CPU time.

But I agree that it does make me skeptical/want a closer investigation.

-- Sean Silva

>
> I appreciate your effort to make LLD faster, but we need to be careful
> about benchmark results. If we don't measure improvements accurately, it's
> easy to make an "optimization" that makes things slower.
>
> Another important thing is to disbelieve what you do when you optimize
> something and measure its effect. It sometimes happen that I believe
> something is going to improve performance 100% sure but it actually
> wouldn't.
>
> time taken to build lld with bfd
>> --------------------------------
>> real    0m8.419s
>> user    0m7.748s
>> sys     0m0.632s
>>
>> Modified:
>>     lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h
>>     lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h
>>
>> Modified: lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff
>>
>> ==============================================================================
>> --- lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h (original)
>> +++ lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h Mon Mar 16 22:29:32
>> 2015
>> @@ -586,8 +586,10 @@ std::error_code OutputELFWriter<ELFT>::w
>>    _elfHeader->write(this, _layout, *buffer);
>>    _programHeader->write(this, _layout, *buffer);
>>
>> -  for (auto section : _layout.sections())
>> -    section->write(this, _layout, *buffer);
>> +  auto sections = _layout.sections();
>> +  parallel_for_each(
>> +      sections.begin(), sections.end(),
>> +      [&](Chunk<ELFT> *section) { section->write(this, _layout,
>> *buffer); });
>>    writeTask.end();
>>
>>    ScopedTask commitTask(getDefaultDomain(), "ELF Writer commit to disk");
>>
>> Modified: lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff
>>
>> ==============================================================================
>> --- lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h (original)
>> +++ lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h Mon Mar 16 22:29:32
>> 2015
>> @@ -234,17 +234,17 @@ public:
>>    /// routine gets called after the linker fixes up the virtual address
>>    /// of the section
>>    virtual void assignVirtualAddress(uint64_t addr) override {
>> -    for (auto &ai : _atoms) {
>> +    parallel_for_each(_atoms.begin(), _atoms.end(), [&](AtomLayout *ai) {
>>        ai->_virtualAddr = addr + ai->_fileOffset;
>> -    }
>> +    });
>>    }
>>
>>    /// \brief Set the file offset of each Atom in the section. This
>> routine
>>    /// gets called after the linker fixes up the section offset
>>    void assignFileOffsets(uint64_t offset) override {
>> -    for (auto &ai : _atoms) {
>> +    parallel_for_each(_atoms.begin(), _atoms.end(), [&](AtomLayout *ai) {
>>        ai->_fileOffset = offset + ai->_fileOffset;
>> -    }
>> +    });
>>    }
>>
>>    /// \brief Find the Atom address given a name, this is needed to
>> properly
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150317/201737d8/attachment.html>