<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Mar 13, 2015 at 12:35 PM, Shankar Easwaran <span dir="ltr"><<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 3/13/2015 1:59 PM, Rafael Espíndola wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I will do a run with --merge-strings. This should probably be the<br>
default to match other ELF linkers.<br>
</blockquote>
Trying --merge-strings with today's trunk I got<br>
<br>
* comment got 77 797 bytes smaller.<br>
* rodata got 9 394 257 bytes smaller.<br>
</blockquote></span>
We can significantly improve merge-string performance by delaying string merging until after the sections/atoms have been garbage collected. Currently we do it very early, in the reader.<br></blockquote><div><br></div><div>In my experience, it is very hard to predict whether a change will significantly improve performance just by reading the code and reasoning about it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Are you using oprofile to get these stats?<div class="HOEnZb"><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Comparing with gold, comment now has the same size and rodata is<br>
55 021 bytes bigger.<br>
<br>
Amusingly, merging strings seems to make lld a bit faster. With<br>
today's files I got:<br>
<br>
lld:<br>
---------------------------------------------------------------------------<br>
<br>
1985.256427 task-clock (msec) # 0.999 CPUs utilized ( +- 0.07% )<br>
1,152 context-switches # 0.580 K/sec<br>
0 cpu-migrations # 0.000 K/sec ( +-100.00% )<br>
199,309 page-faults # 0.100 M/sec<br>
5,970,383,833 cycles # 3.007 GHz ( +- 0.07% )<br>
3,413,740,580 stalled-cycles-frontend # 57.18% frontend cycles idle ( +- 0.12% )<br>
&lt;not supported&gt; stalled-cycles-backend<br>
6,240,156,987 instructions # 1.05 insns per cycle<br>
# 0.55 stalled cycles per insn ( +- 0.01% )<br>
1,293,186,347 branches # 651.395 M/sec ( +- 0.01% )<br>
26,687,288 branch-misses # 2.06% of all branches ( +- 0.00% )<br>
<br>
1.987125976 seconds time elapsed ( +- 0.07% )<br>
-----------------------------------------------------------------------------------<br>
lld --merge-strings:<br>
<br>
------------------------------------------------------------------------------<br>
1912.735291 task-clock (msec) # 0.999 CPUs utilized ( +- 0.10% )<br>
1,152 context-switches # 0.602 K/sec<br>
0 cpu-migrations # 0.000 K/sec ( +-100.00% )<br>
187,916 page-faults # 0.098 M/sec ( +- 0.00% )<br>
5,749,920,058 cycles # 3.006 GHz ( +- 0.04% )<br>
3,250,485,516 stalled-cycles-frontend # 56.53% frontend cycles idle ( +- 0.07% )<br>
&lt;not supported&gt; stalled-cycles-backend<br>
5,987,870,976 instructions # 1.04 insns per cycle<br>
# 0.54 stalled cycles per insn ( +- 0.00% )<br>
1,250,773,036 branches # 653.919 M/sec ( +- 0.00% )<br>
27,922,489 branch-misses # 2.23% of all branches ( +- 0.00% )<br>
<br>
1.914565005 seconds time elapsed ( +- 0.10% )<br>
----------------------------------------------------------------------------<br>
<br>
<br>
gold:<br>
<br>
-------------------------------------------------------------------------------<br>
1000.132594 task-clock (msec) # 0.999 CPUs utilized ( +- 0.01% )<br>
0 context-switches # 0.000 K/sec<br>
0 cpu-migrations # 0.000 K/sec<br>
77,836 page-faults # 0.078 M/sec<br>
3,002,431,314 cycles # 3.002 GHz ( +- 0.01% )<br>
1,404,393,569 stalled-cycles-frontend # 46.78% frontend cycles idle ( +- 0.02% )<br>
&lt;not supported&gt; stalled-cycles-backend<br>
4,110,576,101 instructions # 1.37 insns per cycle<br>
# 0.34 stalled cycles per insn ( +- 0.00% )<br>
869,160,761 branches # 869.046 M/sec ( +- 0.00% )<br>
15,691,670 branch-misses # 1.81% of all branches ( +- 0.00% )<br>
<br>
1.001044905 seconds time elapsed ( +- 0.01% )<br>
-------------------------------------------------------------------------------<br>
<br>
I have attached the run.sh script I used to collect the numbers.<br>
<br>
Cheers,<br>
Rafael<br>
</blockquote>
<br>
<br></div></div><div class="HOEnZb"><div class="h5">
-- <br>
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation<br>
<br>
</div></div></blockquote></div><br></div></div>