Experiment on speeding up offset computation in SHF_MERGE

Thu Oct 27 05:06:25 PDT 2016

On 26 October 2016 at 20:53, Rui Ueyama <ruiu at google.com> wrote:
> Rafael,
>
> Thank you for testing it!
>
> I'm not sure if we discussed this too, but what I had in mind is a more
> radical approach than that, which is to remove the mergeable string section
> from ELF entirely and instead use the symbol table as "storage" for
> mergeable strings. Here's the idea.
>
> Let's say we have three mergeable strings, "foo", "bar" and "baz", and a few
> relocations referring to these strings. Currently, we'd create
>
>  - a .rodata.str section with "foo\0bar\0baz\0",
>  - symbols .L.str.1, .L.str.2, .L.str.3 that point to offsets 0, 4, and 8 in
> .rodata.str, respectively, and
>  - relocations pointing to .rodata.str+<offset> or .L.str.<num>+<offset>
>
> for such a file. In this way, the linker needs to have an extra path to merge
> strings, and the relocation targets are computed in a special manner.
>
> But, what if we make mergeable strings symbol names? Then we'd have
>
>  - symbols .merge.foo, .merge.bar, .merge.baz (where ".merge" is a prefix to
> identify mergeable symbols, and the symbols don't point to any specific
> data), and
>  - relocations referring to .merge.foo+<offset>, .merge.bar+<offset> or
> .merge.baz+<offset>
>
> We no longer have to have a .rodata.str section for mergeable strings in this
> layout because the symbols themselves contain the data.
>
> Now, the mergeable strings are merged automatically by the linker because
> symbols are naturally uniquified by name. What the linker has to do, after
> resolving all symbols, is to iterate over all symbols to find symbols
> starting with ".merge" and to put them in an .rodata section.
>
> We no longer need any special offset calculation for relocations referring
> to mergeable strings.
>
> Does this make sense to you?
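
To make the "special manner" above concrete, here is a rough Python sketch of
the bookkeeping that the current scheme implies: split each input section into
NUL-terminated pieces, uniquify the pieces into one output section, and
translate every input offset to an output offset. This is not lld's code. The
names split_pieces, MergedStrings and map_offset are made up for illustration,
and the sketch assumes every piece is NUL-terminated and ignores tail merging
and alignment.

```python
import bisect


def split_pieces(data: bytes):
    """Return (input_offset, piece) for each NUL-terminated string in data.

    Raises ValueError if the last string is not NUL-terminated.
    """
    pieces = []
    start = 0
    while start < len(data):
        end = data.index(b"\0", start) + 1  # keep the terminator in the piece
        pieces.append((start, data[start:end]))
        start = end
    return pieces


class MergedStrings:
    """Hypothetical output section that stores each distinct piece once."""

    def __init__(self):
        self.offset_of = {}  # piece bytes -> offset in the output section
        self.size = 0

    def add(self, piece: bytes) -> int:
        if piece not in self.offset_of:
            self.offset_of[piece] = self.size
            self.size += len(piece)
        return self.offset_of[piece]


def map_offset(pieces, out_offsets, in_offset):
    """Translate an input-section offset into an output-section offset.

    pieces is the result of split_pieces, out_offsets[i] is the output
    offset assigned to pieces[i]. The piece containing in_offset is found
    by binary search over the piece start offsets.
    """
    starts = [start for start, _ in pieces]
    i = bisect.bisect_right(starts, in_offset) - 1
    return out_offsets[i] + (in_offset - starts[i])


merged = MergedStrings()
file_a = split_pieces(b"foo\0bar\0baz\0")
file_b = split_pieces(b"bar\0qux\0")
a_out = [merged.add(piece) for _, piece in file_a]
b_out = [merged.add(piece) for _, piece in file_b]

# A relocation against offset 5 of file_b's section (the "ux" in "qux").
print(map_offset(file_b, b_out, 5))
```

The second file's "bar" reuses the offset already assigned from the first
file, and a relocation into the middle of "qux" keeps its distance from the
start of that string. Under the symbol-name proposal this per-relocation
lookup would go away, because uniquing would happen in ordinary symbol
resolution.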

I am not sure it would help. A symbol is basically a pair of two
offsets, st_name and st_value. In your proposal we would make st_value
0 and st_name would point to the actual content. Right now we make
st_name 0 and st_value points to the content.
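
As a rough illustration of the two layouts, here is a small Python sketch that
packs an Elf64_Sym record both ways for the string "bar". The field order is
the standard little-endian ELF64 one; the helper names pack_sym and
build_strtab, the section index 1 and the idea of storing ".merge.bar" in the
string table are assumptions made only for this example.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")


def pack_sym(st_name, st_value, st_shndx=0, st_info=0, st_other=0, st_size=0):
    """Pack one symbol table entry (hypothetical helper)."""
    return ELF64_SYM.pack(st_name, st_info, st_other, st_shndx, st_value, st_size)


def build_strtab(names):
    """Return (table_bytes, {name: offset}); offset 0 is the empty name."""
    table = b"\0"
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name + b"\0"
    return table, offsets


# Current setup: the content lives in .rodata.str, the symbol is unnamed
# (st_name == 0) and st_value is the offset of "bar" in that section.
rodata_str = b"foo\0bar\0baz\0"
current = pack_sym(st_name=0, st_value=rodata_str.index(b"bar"), st_shndx=1)

# Proposed setup: the content lives in the string table as the symbol
# name, st_name is its offset there and st_value is 0.
strtab, name_offset = build_strtab([b".merge.foo", b".merge.bar", b".merge.baz"])
proposed = pack_sym(st_name=name_offset[b".merge.bar"], st_value=0)

print(ELF64_SYM.size, ELF64_SYM.unpack(current), ELF64_SYM.unpack(proposed))
```

Either way each symbol costs one fixed-size record plus the bytes of the
string; what changes is which table holds those bytes.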

I see a few advantages in the current setup:

* Two smaller tables should be faster than one big one.
* Merging is still optional.
* With section symbols we end up with far fewer symbols.

Cheers,
Rafael