<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Dec 1, 2017 at 2:07 PM, Brian Cain <span dir="ltr"><<a href="mailto:brian.cain@gmail.com" target="_blank">brian.cain@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Fri, Dec 1, 2017 at 3:55 PM, Rui Ueyama via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="m_-5675966006092816668gmail-h5">On Fri, Dec 1, 2017 at 1:26 PM, Rafael Avila de Espindola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
I got curious how the lld-produced gnu hash tables compare to gold's. To<br>
test that, I timed "perf record ninja check-llvm" (just the lit run) in a<br>
BUILD_SHARED_LIBS build.<br>
<br>
The performance was almost identical, so I decided to try sysv versus<br>
gnu (both produced by lld). The results are interesting:<br>
<br>
% grep -v '^#' perf-gnu/perf.report-by-dso-sym | head<br>
38.77% ld-2.24.so [.] do_lookup_x<br>
8.08% ld-2.24.so [.] strcmp<br>
2.66% ld-2.24.so [.] _dl_relocate_object<br>
2.58% ld-2.24.so [.] _dl_lookup_symbol_x<br>
1.85% ld-2.24.so [.] _dl_name_match_p<br>
1.46% [kernel.kallsyms] [k] copy_page<br>
1.38% ld-2.24.so [.] _dl_map_object<br>
1.30% [kernel.kallsyms] [k] unmap_page_range<br>
1.28% [kernel.kallsyms] [k] filemap_map_pages<br>
1.26% libLLVMSupport.so.6.0.0svn [.] sstep<br>
% grep -v '^#' perf-sysv/perf.report-by-dso-sym | head<br>
42.18% ld-2.24.so [.] do_lookup_x<br>
17.73% ld-2.24.so [.] check_match<br>
14.41% ld-2.24.so [.] strcmp<br>
1.22% ld-2.24.so [.] _dl_relocate_object<br>
1.13% ld-2.24.so [.] _dl_lookup_symbol_x<br>
0.91% ld-2.24.so [.] _dl_name_match_p<br>
0.67% ld-2.24.so [.] _dl_map_object<br>
0.65% [kernel.kallsyms] [k] unmap_page_range<br>
0.63% [kernel.kallsyms] [k] copy_page<br>
0.59% libLLVMSupport.so.6.0.0svn [.] sstep<br>
<br>
So the gnu hash table helps a lot, but BUILD_SHARED_LIBS is still crazy<br>
inefficient.</blockquote><div><br></div></div></div><div>What is "100%" in these numbers? If 100% means all execution time, ld-2.24.so takes more than 70% of execution time in the sysv run (the listed ld-2.24.so entries alone add up to about 78%). Is this real?</div></div></div></div>
<br><br></blockquote><div><br></div><div><br></div></div></div><div>perf usually samples cycles (e.g. "CPU_CLK_UNHALTED" on Intel Core/Xeon), so the percentages are shares of CPU cycles, not of wall-clock time. This is a critical distinction when the thing being measured involves delays, synchronization, or disk/network I/O.</div><div><br></div><div>Also, this report looks like it might be broken down by some other attribute (per DSO?), which would affect what "100%" means.</div><div><br></div><div>Running perf on "ninja check-llvm" would also measure cycles contributed by many things other than lld; it's worth ruling out whether the profile is dominated by them. Doesn't the testing itself likely spend more cycles than the linking being done here?<br></div></div>
</div></div>
</blockquote></div><br></div><div class="gmail_extra">He is measuring the performance of the dynamic linker/loader (ld.so), not of lld itself, to see whether lld-generated dynamic symbol tables and their corresponding .hash or .gnu.hash tables are efficient. So this is a correct way of testing it.</div></div>