<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
I measured this again with 4 tries, just to make sure, and I see a
few results identical to what I measured before:<br>
<br>
<u><b>Raw data below:</b></u><br>
<br>
LLD Try With Patch #1<br>
4.16user 0.80system 0:03.06elapsed 162%CPU (0avgtext+0avgdata
7174160maxresident)k<br>
LLD Try Without Patch #1<br>
4.49user 0.92system 0:03.32elapsed 162%CPU (0avgtext+0avgdata
7179984maxresident)k<br>
BFD Try #1<br>
7.81user 0.68system 0:08.53elapsed 99%CPU (0avgtext+0avgdata
3230416maxresident)k<br>
LLD Try With Patch #2<br>
3.94user 0.86system 0:02.93elapsed 163%CPU (0avgtext+0avgdata
7175808maxresident)k<br>
LLD Try Without Patch #2<br>
4.12user 0.83system 0:03.22elapsed 154%CPU (0avgtext+0avgdata
7172704maxresident)k<br>
BFD Try #2<br>
7.78user 0.75system 0:08.57elapsed 99%CPU (0avgtext+0avgdata
3230416maxresident)k<br>
LLD Try With Patch #3<br>
4.36user 1.05system 0:03.08elapsed 175%CPU (0avgtext+0avgdata
7176320maxresident)k<br>
LLD Try Without Patch #3<br>
4.38user 0.90system 0:03.14elapsed 168%CPU (0avgtext+0avgdata
7175600maxresident)k<br>
BFD Try #3<br>
7.78user 0.64system 0:08.46elapsed 99%CPU (0avgtext+0avgdata
3230416maxresident)k<br>
LLD Try With Patch #4<br>
4.17user 0.72system 0:02.93elapsed 166%CPU (0avgtext+0avgdata
7175120maxresident)k<br>
LLD Try Without Patch #4<br>
4.20user 0.79system 0:03.08elapsed 161%CPU (0avgtext+0avgdata
7174864maxresident)k<br>
BFD Try #4<br>
7.77user 0.66system 0:08.46elapsed 99%CPU (0avgtext+0avgdata
3230416maxresident)k<br>
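<br>
To compare the configurations at a glance, the four runs of each can be
averaged. The following is only a minimal sketch: the numbers are copied
from the raw data above, while the <tt>Runs</tt> struct and the
<tt>mean</tt> and <tt>measuredRuns</tt> helpers are illustrative names,
not part of lld.<br>
<pre>
```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// One linker configuration and its four measurements, copied from the
// raw "time" output above (all values in seconds).
struct Runs {
    std::string name;
    std::vector<double> user;
    std::vector<double> sys;
    std::vector<double> elapsed;
};

// Arithmetic mean of a non-empty list of samples.
double mean(const std::vector<double> &samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// The three configurations measured above, in the order
// "LLD with patch", "LLD without patch", "BFD".
std::vector<Runs> measuredRuns() {
    return {
        {"LLD with patch",
         {4.16, 3.94, 4.36, 4.17},
         {0.80, 0.86, 1.05, 0.72},
         {3.06, 2.93, 3.08, 2.93}},
        {"LLD without patch",
         {4.49, 4.12, 4.38, 4.20},
         {0.92, 0.83, 0.90, 0.79},
         {3.32, 3.22, 3.14, 3.08}},
        {"BFD",
         {7.81, 7.78, 7.78, 7.77},
         {0.68, 0.75, 0.64, 0.66},
         {8.53, 8.57, 8.46, 8.46}},
    };
}
```
</pre>
Calling <tt>mean</tt> on each column gives roughly 3.00s elapsed and
4.16s user with the patch, 3.19s elapsed and 4.30s user without it, and
8.51s elapsed and 7.79s user for BFD.<br>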
<br>
<u><b>Questions:</b></u><br>
<br>
As Rui mentions, I don't know why the user time is higher without the
patch. Are there any methods to verify this?<br>
Could this be because of user threads instead of kernel threads?<br>
<br>
Shankar Easwaran<br>
<br>
On 3/17/2015 3:35 PM, Shankar Easwaran wrote:<br>
</div>
<blockquote cite="mid:5508902F.9090900@codeaurora.org" type="cite">Yes,
this is true. There were logs of several runs in the same file
that I read into the commit, and manually removing them resulted in
two user lines.
<br>
<br>
But the result is true regardless. I can re-measure the time
taken, though.
<br>
<br>
Shankar Easwaran
<br>
<br>
On 3/17/2015 2:30 PM, Rui Ueyama wrote:
<br>
<blockquote type="cite">On Mon, Mar 16, 2015 at 8:29 PM, Shankar
Easwaran <a class="moz-txt-link-rfc2396E" href="mailto:shankare@codeaurora.org"><shankare@codeaurora.org></a>
<br>
wrote:
<br>
<br>
<blockquote type="cite">Author: shankare
<br>
Date: Mon Mar 16 22:29:32 2015
<br>
New Revision: 232460
<br>
<br>
URL:
<a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project?rev=232460&view=rev">http://llvm.org/viewvc/llvm-project?rev=232460&view=rev</a>
<br>
Log:
<br>
[ELF] Use parallel_for_each for writing.
<br>
<br>
This change improves the performance of lld when self-hosting lld,
compared with the bfd linker.
<br>
The BFD linker takes 8 seconds of elapsed time on average.
<br>
lld takes 3 seconds of elapsed time on average. Without this change,
lld takes ~5 seconds on average.
<br>
The runtime comparisons were done on a release build and measured by
running the link three times.
<br>
<br>
lld self-host without the change
<br>
----------------------------------
<br>
real 0m3.196s
<br>
user 0m4.580s
<br>
sys 0m0.832s
<br>
<br>
lld self-host with lld
<br>
-----------------------
<br>
user 0m3.024s
<br>
user 0m3.252s
<br>
sys 0m0.796s
<br>
<br>
</blockquote>
The above results don't look like real output of the "time" command.
<br>
<br>
If it's real, it's too good to be true, assuming the first line
of the
<br>
second result is "real" instead of "user".
<br>
<br>
"real" is wall clock time from process start to process exit.
"user" is CPU
<br>
time consumed by the process in user mode (if a process is
multi-threaded,
<br>
it can be larger than real).
<br>
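<br>
The difference between "real" and "user" can be checked from inside a
program as well. The sketch below assumes a POSIX system: it uses
<tt>getrusage</tt> for CPU time and <tt>std::chrono::steady_clock</tt>
for wall-clock time. The <tt>Timing</tt>, <tt>measure</tt> and
<tt>spin</tt> names are hypothetical helpers made up for illustration.<br>
<pre>
```cpp
#include <sys/resource.h>  // getrusage (POSIX)

#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Wall-clock ("real") and CPU ("user", "sys") seconds spent in one call.
struct Timing {
    double real;
    double user;
    double sys;
};

// Convert a timeval to seconds.
static double toSeconds(const timeval &tv) {
    return static_cast<double>(tv.tv_sec) +
           static_cast<double>(tv.tv_usec) / 1e6;
}

// Run `work` and report how much wall-clock and CPU time it consumed.
// RUSAGE_SELF accounts for all threads of the calling process.
Timing measure(const std::function<void()> &work) {
    assert(work && "work must be callable");
    rusage before{}, after{};
    getrusage(RUSAGE_SELF, &before);
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    getrusage(RUSAGE_SELF, &after);
    return {std::chrono::duration<double>(stop - start).count(),
            toSeconds(after.ru_utime) - toSeconds(before.ru_utime),
            toSeconds(after.ru_stime) - toSeconds(before.ru_stime)};
}

// Hypothetical CPU-bound workload: `threads` threads each perform
// `iterations` additions on a volatile counter.
void spin(unsigned threads, std::uint64_t iterations) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([iterations] {
            volatile std::uint64_t sink = 0;
            for (std::uint64_t i = 0; i < iterations; ++i)
                sink = sink + i;
        });
    for (auto &th : pool)
        th.join();
}
```
</pre>
On a multi-core machine, something like
<tt>measure([] { spin(4, 100000000); })</tt> would typically report a
"user" value larger than "real", whereas running the same total work on
one thread should give a similar "user" value but a longer "real".<br>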
<br>
Your result shows a significant improvement in user time, which means
<br>
you have significantly reduced the amount of processing time needed to do
<br>
the same thing compared to before. However, because this change didn't
<br>
change the algorithm, but just executes it in parallel, that couldn't
<br>
happen.
<br>
<br>
Something's not correct.
<br>
<br>
I appreciate your effort to make LLD faster, but we need to be
careful
<br>
about benchmark results. If we don't measure improvements
accurately, it's
<br>
easy to make an "optimization" that makes things slower.
<br>
<br>
Another important thing is to be skeptical of your own work when you
<br>
optimize something and measure its effect. It sometimes happens that I am
<br>
100% sure something is going to improve performance, but it actually
<br>
doesn't.
<br>
<br>
time taken to build lld with bfd
<br>
<blockquote type="cite">--------------------------------
<br>
real 0m8.419s
<br>
user 0m7.748s
<br>
sys 0m0.632s
<br>
<br>
Modified:
<br>
lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h
<br>
lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h
<br>
<br>
Modified: lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h
<br>
URL:
<br>
<a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff</a>
<br>
<br>
==============================================================================
<br>
--- lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h
(original)
<br>
+++ lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h Mon Mar
16 22:29:32
<br>
2015
<br>
@@ -586,8 +586,10 @@ std::error_code
OutputELFWriter<ELFT>::w
<br>
_elfHeader->write(this, _layout, *buffer);
<br>
_programHeader->write(this, _layout, *buffer);
<br>
<br>
- for (auto section : _layout.sections())
<br>
- section->write(this, _layout, *buffer);
<br>
+ auto sections = _layout.sections();
<br>
+ parallel_for_each(
<br>
+ sections.begin(), sections.end(),
<br>
+ [&](Chunk<ELFT> *section) {
section->write(this, _layout, *buffer);
<br>
});
<br>
writeTask.end();
<br>
<br>
ScopedTask commitTask(getDefaultDomain(), "ELF Writer
commit to disk");
<br>
<br>
Modified: lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h
<br>
URL:
<br>
<a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff</a>
<br>
<br>
==============================================================================
<br>
--- lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h (original)
<br>
+++ lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h Mon Mar 16
22:29:32 2015
<br>
@@ -234,17 +234,17 @@ public:
<br>
/// routine gets called after the linker fixes up the
virtual address
<br>
/// of the section
<br>
virtual void assignVirtualAddress(uint64_t addr) override
{
<br>
- for (auto &ai : _atoms) {
<br>
+ parallel_for_each(_atoms.begin(), _atoms.end(),
[&](AtomLayout *ai) {
<br>
ai->_virtualAddr = addr + ai->_fileOffset;
<br>
- }
<br>
+ });
<br>
}
<br>
<br>
/// \brief Set the file offset of each Atom in the
section. This routine
<br>
/// gets called after the linker fixes up the section
offset
<br>
void assignFileOffsets(uint64_t offset) override {
<br>
- for (auto &ai : _atoms) {
<br>
+ parallel_for_each(_atoms.begin(), _atoms.end(),
[&](AtomLayout *ai) {
<br>
ai->_fileOffset = offset + ai->_fileOffset;
<br>
- }
<br>
+ });
<br>
}
<br>
<br>
/// \brief Find the Atom address given a name, this is
needed to
<br>
properly
<br>
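<br>
The pattern in the diff above, replacing a sequential loop with a
parallel one, can be sketched outside of lld roughly as follows. This is
a simplified stand-in and not lld's actual <tt>parallel_for_each</tt>:
<tt>parallelForEachSketch</tt>, <tt>AtomLayoutSketch</tt> and the default
of 4 threads are assumptions made for illustration only.<br>
<pre>
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Simplified stand-in for a parallel for-each: split [begin, end) into
// contiguous slices and run `fn` on every element, one thread per slice.
// Only safe when `fn` touches disjoint data for different elements.
template <typename Iterator, typename Func>
void parallelForEachSketch(Iterator begin, Iterator end, Func fn,
                           unsigned threadCount = 4) {
    assert(threadCount > 0);
    const std::ptrdiff_t total = end - begin;
    if (total <= 0)
        return;
    const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(threadCount);
    const std::ptrdiff_t slice = (total + count - 1) / count;

    std::vector<std::thread> workers;
    for (std::ptrdiff_t first = 0; first < total; first += slice) {
        const std::ptrdiff_t last = std::min(first + slice, total);
        workers.emplace_back([=] {
            std::for_each(begin + first, begin + last, fn);
        });
    }
    for (auto &w : workers)
        w.join();
}

// Hypothetical, trimmed-down version of the two fields used in the diff.
struct AtomLayoutSketch {
    unsigned long long fileOffset;
    unsigned long long virtualAddr;
};

// Mirrors assignVirtualAddress from the diff: every element is updated
// independently, so the loop can run in parallel without locking.
void assignVirtualAddress(std::vector<AtomLayoutSketch> &atoms,
                          unsigned long long addr) {
    parallelForEachSketch(atoms.begin(), atoms.end(),
                          [addr](AtomLayoutSketch &a) {
                              a.virtualAddr = addr + a.fileOffset;
                          });
}
```
</pre>
Note that this only spreads the same amount of work over more cores, so
it can shorten "real" time but should not be expected to reduce "user"
time.<br>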
<br>
<br>
_______________________________________________
<br>
llvm-commits mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>
<br>
<a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>
<br>
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation</pre>
</body>
</html>