<div dir="ltr">Rafael,<div><br></div><div>Thank you for testing it!</div><div><br></div><div>I'm not sure if we discussed this too, but what I had in mind is a more radical approach than that: remove mergeable string sections from ELF files entirely and instead use the symbol table as storage for mergeable strings. Here's the idea.</div><div><br></div><div>Let's say we have three mergeable strings, "foo", "bar" and "baz", and a few relocations referring to these strings. Currently, for such a file, we'd create</div><div><br></div><div> - a .rodata.str section containing "foo\0bar\0baz\0",</div><div> - symbols .L.str.1, .L.str.2 and .L.str.3 that point to offsets 0, 4 and 8 in .rodata.str, respectively, and</div><div> - relocations pointing to .rodata.str+&lt;offset&gt; or .L.str.&lt;num&gt;+&lt;offset&gt;</div><div><br></div><div>With this layout, the linker needs an extra pass to merge strings, and relocation targets have to be computed in a special manner.</div><div><br></div><div>But what if we turn mergeable strings into symbol names? Then we'd have</div><div><br></div><div> - symbols .merge.foo, .merge.bar and .merge.baz (where ".merge" is a prefix identifying mergeable symbols, and the symbols don't point to any specific data), and</div><div> - relocations referring to .merge.foo+&lt;offset&gt;, .merge.bar+&lt;offset&gt; or .merge.baz+&lt;offset&gt;</div><div><br></div><div>We no longer need a .rodata.str section for mergeable strings in this layout, because the symbols themselves contain the data.</div><div><br></div><div>Now the mergeable strings are merged automatically by the linker, because symbols are naturally uniquified by name. 
What the linker has to do, after resolving all symbols, is iterate over all symbols, find those whose names start with ".merge", and put their contents into a .rodata section.</div><div><br></div><div>We no longer need any special offset calculation for relocations referring to mergeable strings.</div><div><br></div><div>Does this make sense to you?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 24, 2016 at 3:59 PM, Rafael Espíndola <span dir="ltr">&lt;<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">During BSDCan, Rui suggested an idea for improving the performance of<br>
offset computation: Instead of having relocations use section+offset,<br>
always use a symbol. That would give us a place to cache the value.<br>
<br>
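A simplified sketch of the trade-off, assuming a merged string section is represented as a sorted list of pieces (the names Piece, MergeSection and Symbol are hypothetical and are not lld's actual classes): resolving section+offset needs a binary search over the pieces for every relocation, while routing the relocation through a symbol gives a place to compute the result once and cache it.<br>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical model: one input section of mergeable strings, split into
// pieces (one per string) after merging.
struct Piece {
  uint64_t inputOff;   // where the string started in the input section
  uint64_t outputOff;  // where its deduplicated copy lives in the output
};

struct MergeSection {
  std::vector<Piece> pieces;  // sorted by inputOff; pieces[0].inputOff == 0

  // section+offset: binary search for the piece containing `off`, then
  // carry the offset within that piece over to the output.
  uint64_t getOffset(uint64_t off) const {
    auto it = std::upper_bound(
        pieces.begin(), pieces.end(), off,
        [](uint64_t o, const Piece &p) { return o < p.inputOff; });
    const Piece &p = *std::prev(it);
    return p.outputOff + (off - p.inputOff);
  }
};

// symbol+addend: the symbol is a natural place to remember the result,
// so the search runs at most once per symbol instead of once per relocation.
struct Symbol {
  const MergeSection *sec;
  uint64_t value;                     // offset of the symbol within sec
  std::optional<uint64_t> cachedOff;  // filled in on first use

  uint64_t getOffset() {
    if (!cachedOff)
      cachedOff = sec->getOffset(value);
    return *cachedOff;
  }
};
```

The cache only pays off if many relocations share a symbol; as reported below, in practice the extra symbols made the object files larger and the link slower.<br>
<br>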
The attached patches implement that. Unfortunately, the .o files produced are<br>
substantially larger, and in the end there is a performance regression.<br>
I didn't investigate it much, but it looks like the net impact is<br>
just to move cache misses from getOffset to getVA, and since we now use<br>
more memory, there are more misses.<br>
<br>
I have uploaded two reproducers of a debug build with clang. One<br>
uses section symbols; the other does not.<br>
<br>
<a href="https://drive.google.com/open?id=0B7iRtublysV6U244RG1BYkE3dDg" rel="noreferrer" target="_blank">https://drive.google.com/open?<wbr>id=<wbr>0B7iRtublysV6U244RG1BYkE3dDg</a><br>
<br>
<a href="https://drive.google.com/open?id=0B7iRtublysV6aktKWWViTzhCbDg" rel="noreferrer" target="_blank">https://drive.google.com/open?<wbr>id=<wbr>0B7iRtublysV6aktKWWViTzhCbDg</a><br>
<br>
Another thing I tried was using a complete binary tree instead of a<br>
binary search. While that shows promise, it ran into LLVM code-quality<br>
issues (PR30771, PR30772). A very hackish patch that also includes a<br>
benchmark is attached too.<br>
<br>
My current understanding is that the local algorithms we have for string<br>
merging and offset computation are pretty efficient. What might still<br>
be possible is:<br>
<br>
* Reducing memory consumption somewhere else to improve cache hits.<br>
* Not copying as much debug info (dwo, ideally single-file dwo).<br>
<br>
Cheers,<br>
Rafael<br>
</blockquote></div><br></div>
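<div>The symbol-name scheme proposed at the top of this message can be modeled in a few lines. This is a hypothetical sketch, not lld code: the ".merge." prefix, the Linker, ObjectFile and Reloc types, and the use of std::map as the symbol table are assumptions for illustration. Inserting symbols by name deduplicates the strings, one pass over the resolved symbol table lays out .rodata, and a relocation target is then just the symbol's address plus its addend.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical encoding: a mergeable string "foo" becomes symbol ".merge.foo".
const std::string kMergePrefix = ".merge.";

struct Reloc {
  std::string symName;
  uint64_t addend;
};

struct ObjectFile {
  std::vector<std::string> symbols;
  std::vector<Reloc> relocs;
};

struct Linker {
  std::map<std::string, uint64_t> symtab;  // name -> address, unique by name
  std::string rodata;
  uint64_t rodataVA = 0;

  // Symbol resolution: inserting by name deduplicates strings for free.
  void addFile(const ObjectFile &f) {
    for (const std::string &s : f.symbols)
      symtab.emplace(s, 0);
  }

  // After resolution: copy the contents of every ".merge." symbol into
  // .rodata and assign its address. std::map iterates in name order, so
  // the layout is deterministic. Other symbols are left untouched here.
  void layoutMergeStrings(uint64_t va) {
    rodataVA = va;
    for (auto &[name, addr] : symtab) {
      if (name.compare(0, kMergePrefix.size(), kMergePrefix) != 0)
        continue;
      addr = rodataVA + rodata.size();
      rodata += name.substr(kMergePrefix.size());
      rodata += '\0';
    }
  }

  // No per-section offset table: the target is symbol address + addend.
  uint64_t relocTarget(const Reloc &r) const {
    return symtab.at(r.symName) + r.addend;
  }
};
```

<div>The sketch takes one constraint for granted: ELF symbol names are NUL-terminated, so this encoding can only represent strings without embedded NUL bytes.</div>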