Experiment on speeding up offset computation in SHF_MERGE

Wed Oct 26 17:53:44 PDT 2016

Rafael,

Thank you for testing it!

I'm not sure if we discussed this too, but what I had in mind is a more
radical approach than that: remove the mergeable string section from ELF
entirely and instead use the symbol table as "storage" for mergeable
strings. Here's the idea.

Let's say we have three mergeable strings, "foo", "bar" and "baz", and a
few relocations referring to these strings. Currently, we'd create

 - a .rodata.str section containing "foo\0bar\0baz\0",
 - symbols .L.str.1, .L.str.2, .L.str.3 that point to offsets 0, 4, and 8 in
.rodata.str, respectively, and
 - relocations pointing to .rodata.str+<offset> or .L.str.<num>+<offset>

for such a file. With this layout, the linker needs an extra pass to merge
strings, and relocation targets have to be mapped to output offsets in a
special way.
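To make the extra pass and the special offset computation concrete, here is a minimal Python model of the current scheme, assuming each input section is a sequence of NUL-terminated strings. All names (split_pieces, MergedStrings, InputSection, get_offset) are hypothetical and are not lld's actual classes; the model also ignores alignment and suffix (tail) merging. Each input section is split into pieces, identical pieces share one output offset, and a section+offset relocation is resolved by binary-searching for the piece that contains the offset.

```python
import bisect


def split_pieces(data: bytes) -> list[tuple[int, bytes]]:
    """Split a NUL-terminated string section into (input_offset, string)
    pieces; each piece keeps its terminating NUL."""
    pieces = []
    start = 0
    while start < len(data):
        end = data.index(b"\0", start) + 1
        pieces.append((start, data[start:end]))
        start = end
    return pieces


class MergedStrings:
    """Hypothetical model of the merged output string section."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.offsets: dict[bytes, int] = {}  # string -> output offset

    def add(self, s: bytes) -> int:
        # Identical strings from any input section share one output offset.
        if s not in self.offsets:
            self.offsets[s] = len(self.data)
            self.data += s
        return self.offsets[s]


class InputSection:
    """Hypothetical model of one input SHF_MERGE string section."""

    def __init__(self, data: bytes, out: MergedStrings) -> None:
        pieces = split_pieces(data)
        self.starts = [off for off, _ in pieces]            # sorted input offsets
        self.out_offsets = [out.add(s) for _, s in pieces]  # parallel output offsets

    def get_offset(self, offset: int) -> int:
        """Map a section+offset relocation target to an output offset."""
        # Binary search for the last piece starting at or before `offset`.
        i = bisect.bisect_right(self.starts, offset) - 1
        return self.out_offsets[i] + (offset - self.starts[i])
```

For example, with a = "foo\0bar\0baz\0" and b = "baz\0qux\0" fed into the same MergedStrings, a.get_offset(4) gives 4 ("bar"), b.get_offset(0) gives 8 (the "baz" shared with a), and b.get_offset(5) gives 13 (one byte into "qux"). The per-relocation binary search is the cost this thread is trying to reduce.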

But, what if we make mergeable strings symbol names? Then we'd have

 - symbols .merge.foo, .merge.bar, .merge.baz (where ".merge" is a prefix
to identify mergeable symbols, and the symbols don't point to any specific
data), and
 - relocations referring to .merge.foo+<offset>, .merge.bar+<offset> or
.merge.baz+<offset>

We no longer need a .rodata.str section for mergeable strings in this
layout because the symbols themselves contain the data (in their names).

Mergeable strings are then merged automatically by the linker, because
symbols are naturally uniquified by name. After resolving all symbols, all
the linker has to do is iterate over the symbols, find the ones starting
with ".merge", and put their strings into a .rodata section.

We no longer need any special offset calculation for relocations referring
to mergeable strings.
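The proposed scheme can be modeled the same way. This is a toy sketch under stated assumptions, not a real linker: an object file is reduced to a list of (symbol name, addend) relocations, the prefix is taken to be ".merge.", base is a hypothetical .rodata address, and strings are laid out in sorted order only to make the output deterministic. Deduplication falls out of keying symbols by name, and each relocation resolves to a plain symbol address plus addend.

```python
PREFIX = ".merge."  # hypothetical prefix marking mergeable-string symbols


class ObjectFile:
    """Toy object file under the proposed layout: no .rodata.str section,
    only relocations of the form (symbol name, addend)."""

    def __init__(self, relocs: list[tuple[str, int]]) -> None:
        self.relocs = relocs
        self.symbols = {name for name, _ in relocs}


def link(files: list[ObjectFile], base: int = 0x1000):
    # Symbol resolution: a set of names uniquifies symbols, so identical
    # strings from different files collapse into one symbol.
    names: set[str] = set()
    for f in files:
        names |= f.symbols

    # After resolution, put the string of every .merge.* symbol into .rodata.
    # Sorting only makes the output deterministic; any order would work.
    address: dict[str, int] = {}
    rodata = bytearray()
    for name in sorted(n for n in names if n.startswith(PREFIX)):
        address[name] = base + len(rodata)
        rodata += name[len(PREFIX):].encode() + b"\0"

    # Relocations are plain symbol + addend: no per-piece offset lookup.
    resolved = [[address[name] + addend for name, addend in f.relocs]
                for f in files]
    return bytes(rodata), resolved
```

With f1 referring to .merge.foo+0 and .merge.bar+1, and f2 referring to .merge.bar+0 and .merge.baz+2, link([f1, f2]) emits "bar\0baz\0foo\0" with "bar" stored once, and resolves the relocations to [[0x1008, 0x1001], [0x1000, 0x1006]].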

Does this make sense to you?

On Mon, Oct 24, 2016 at 3:59 PM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:

> During BSDCan Rui suggested an idea for improving the performance of
> offset computation: Instead of having relocations use section+offset,
> always use a symbol. That would give us a place to cache the value.
>
> The attached patches implement that. Unfortunately the .o produced are
> substantially larger and in the end there is a performance regression.
> I didn't investigate it too much, but it looks like the net impact is
> to just move cache misses from getOffset to getVA, and as we now use
> more memory, there are more misses.
>
> I have uploaded two reproducibles of a debug build with clang. One
> uses section symbols, the other does not.
>
> https://drive.google.com/open?id=0B7iRtublysV6U244RG1BYkE3dDg
>
> https://drive.google.com/open?id=0B7iRtublysV6aktKWWViTzhCbDg
>
> Another thing I tried was using a complete binary tree instead of a
> binary search. While that shows promise, it hit code quality issues in
> llvm pr30771, pr30772. A very hackish patch that also includes a
> benchmark is attached too.
>
> My current understanding is that the local algorithms we use for string
> merging and offset computation are pretty efficient. What might still
> be possible is
>
> * Reducing memory consumption somewhere else to improve cache hits.
> * Not copying as much debug info (dwo, ideally single file dwo).
>
> Cheers,
> Rafael
>
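Rafael mentions trying a complete binary tree instead of a binary search for the offset lookup. His patch is not reproduced here, so the following is only an assumed reading of the idea: the sorted (input offset, output offset) pairs are stored in BFS order of a complete binary tree (often called the Eytzinger layout), and the lookup walks down that tree to find the largest key not above the target offset. In a compiled implementation the hoped-for benefit is that the top levels of the tree sit in a few adjacent cache lines; this Python sketch, with hypothetical names build_eytzinger and find_le, only shows the layout and the search logic, not the performance.

```python
from typing import Optional

Pair = tuple[int, int]  # (input offset, output offset)


def build_eytzinger(pairs: list[Pair]) -> list[Optional[Pair]]:
    """Lay out sorted pairs as a complete binary tree in BFS order:
    node k has children 2k and 2k+1, and index 0 is unused."""
    n = len(pairs)
    tree: list[Optional[Pair]] = [None] * (n + 1)
    it = iter(pairs)

    def fill(k: int) -> None:
        # An in-order walk of the implicit tree visits slots in sorted order.
        if k <= n:
            fill(2 * k)
            tree[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return tree


def find_le(tree: list[Optional[Pair]], x: int) -> Optional[Pair]:
    """Return the pair with the largest key <= x, or None if there is none."""
    best = None
    k = 1
    while k < len(tree):
        key, _ = tree[k]
        if key <= x:
            best = tree[k]  # candidate; a larger key can only be to the right
            k = 2 * k + 1
        else:
            k = 2 * k
    return best
```

For pieces starting at input offsets 0, 4, 8, 12 and 16, the tree stores them in the order 12, 4, 16, 0, 8, and a lookup for offset 9 visits 12, 4 and 8 before returning the piece at 8.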