[llvm-dev] Proposal for address-significance tables for --icf=safe

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Wed May 23 12:06:52 PDT 2018


On Wed, May 23, 2018 at 8:15 AM, James Y Knight <jyknight at google.com> wrote:

> On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> Context: ld.gold has an --icf=safe flag which is intended to apply ICF
>> only to sections which can be safely merged according to the guarantees
>> provided by the language. It works using a set of heuristics (symbol name
>> matching and relocation scanning). That's not only imprecise, but it also
>> only works with certain languages and is slow due to the need to demangle
>> symbols and scan relocations. It's also redundant with the
>> (local_)unnamed_addr analysis already performed by LLVM.
>>
>> I implemented an alternative to this approach in clang and lld. It works
>> by adding a section to each object file containing the indexes of the
>> symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).
>>
>> I used this implementation to link clang with release+asserts with each
>> of --icf={none,safe,all}. The binary sizes were:
>>
>> none: 109407184
>> safe: 108534736 (-0.8%)
>> all: 107281360 (-2%)
>>
>> I measured the object file overhead of these sections in my clang build
>> at 0.08%. That's almost nothing, and I think it's small enough that we can
>> turn it on by default.
>>
>> I've uploaded a patch series for this feature here:
>> https://github.com/pcc/llvm-project/tree/llvm-addrsig
>> I intend to start sending it for review soon.
>>
>
> This sounds like a nice idea, but it'd be great to put in some effort to
> see if we can get this done in a cross-toolchain collaborative manner,
> instead of llvm-specific. The need is clearly generic, after all.
>
>
Peter Smith has suggested making a proposal to generic-abi and that
certainly seems reasonable, but there seems to be a practical problem
there: the generic-abi is unmaintained and there doesn't seem to be an
authority responsible for assigning section numbers (see the recent
SHT_RELR thread for example). Since this proposal requires a new section
number I would suggest that we make the proposal to generic-abi,
incorporate any design feedback from there and proceed with a section
number in the LLVM namespace until the generic-abi gets a maintainer, at
which point we can change lld to accept both section numbers or maybe just
the generic one (since reading the section is optional).

> I'm a bit worried about the scheme of emitting symbol indexes into a
> section. AFAIK there is nothing else in ELF which puts symbol indexes in
> data at the moment (only in relocations and section headers). In
> particular, if anyone were to use a tool which rewrites the symbol table,
> it'll break things, unless that tool knows about this special section.
>

The design accounts for this :)

To begin with, in practice I don't think we can get this right for every
conceivable tool, because the tool could put something in the object file
that would invalidate the metadata by making a symbol address-significant.
For example, I can use ld -r to combine a metadata-containing object file
defining a function foo with a non-metadata-containing object file defining
a function bar that returns the address of foo, which would invalidate the
metadata for foo. So I think the best that we can hope for is to arrange
for most tools to "naturally" invalidate the metadata.

There turns out to be a way to do this: most tools will reset the sh_link
field of an unrecognized section to zero if that sh_link field points to
the .symtab section. GNU objcopy, ld.bfd -r, ld.gold -r and ld.lld -r all
do this. (It looks like llvm-objcopy will preserve the sh_link, but we can
fix that.) So what we can do is make the sh_link in our section point to
.symtab and use sh_link=0 as a signal that a tool has operated on the
object file, and therefore ignore the section. This resetting of sh_link
for unrecognized sections doesn't appear to be required by the generic-abi,
so this is probably something that we'd want to bring up there in addition
to the section itself. Also, this should hopefully become a non-problem
once this proposal makes its way into either the generic ABI or GNU-gABI
and tools learn about the section.
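
As a rough illustration of the consumer side of the sh_link check described above, here is a minimal Python sketch. It is a toy model of a section header table, not lld's actual code: the Section class, the usable_addrsig helper and the SHT_ADDRSIG value are all hypothetical names chosen for the example (the proposal does not fix a section number here). Only SHT_SYMTAB = 2 comes from the ELF specification.

```python
from dataclasses import dataclass
from typing import List, Optional

SHT_SYMTAB = 2            # standard ELF section type for the static symbol table
SHT_ADDRSIG = 0x6FFF0000  # hypothetical placeholder in the OS-specific range


@dataclass
class Section:
    """Toy stand-in for an ELF section header plus its contents."""
    sh_type: int
    sh_link: int = 0
    payload: bytes = b""


def usable_addrsig(sections: List[Section]) -> Optional[Section]:
    """Return the address-significance section if it can still be trusted.

    The section is trusted only when its sh_link still points at a symbol
    table. sh_link == 0 is read as "some tool rewrote this file", in which
    case the metadata is ignored.
    """
    for sec in sections:
        if sec.sh_type != SHT_ADDRSIG:
            continue
        link = sec.sh_link
        if link == 0 or link >= len(sections):
            return None
        if sections[link].sh_type != SHT_SYMTAB:
            return None
        return sec
    return None  # no metadata at all: treat every symbol as address-significant
```

Returning None in all of the doubtful cases is the conservative choice: the linker then falls back to treating every symbol as address-significant, which can only cost size, not correctness.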


> I wonder if it's possible to put the data in the symbol table.
> Unfortunately, there's not a whole lot of available space there...
>
> The "st_other" field has some space available -- only the bottom 2 bits
> are currently used in general for visibility, so one could imagine adding a
> flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this
> field are widely used by a variety of architecture-specific things, which
> makes that rather more complicated. MIPS uses almost all the remaining bits
> in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the
> local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what
> I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for
> additional function-type and linkage annotations. m68k uses bits 7-8 for
> identifying "far" functions and interrupt handlers.
>
> So...that might be viable -- just barely --  but even after suggesting
> that, I'd not really want to argue for it. =)
>
>
I thought about using st_other for this, but I came to the same conclusion
that it would probably be too hard to stake out a bit with the
architecture-specific things going on there. It looks like MIPS is
(somewhat gratuitously in the case of STO_MIPS_MIPS16) using all of the
"unused" bits:
http://llvm-cs.pcc.me.uk/include/llvm/BinaryFormat/ELF.h#554
So if we did something with st_other we'd probably need to exclude MIPS and
maybe other architectures and get tools to do the right thing (which seems
harder than with the sh_link trick).
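
To make the st_other bit-budget argument concrete, here is a small Python sketch. The STO_MIPS_* values are the ones in LLVM's BinaryFormat/ELF.h as I recall them, so treat them as an assumption and check them against the header. CANDIDATE_FLAG is the hypothetical 0x04 bit floated above, and free_bits and conflicts are helper names made up for this example.

```python
STV_MASK = 0x03        # low two bits of st_other hold the symbol visibility
CANDIDATE_FLAG = 0x04  # hypothetical address-significance bit from this thread

# MIPS-specific st_other flags (assumed to match LLVM's ELF.h).
STO_MIPS = {
    "STO_MIPS_OPTIONAL": 0x04,
    "STO_MIPS_PLT": 0x08,
    "STO_MIPS_PIC": 0x20,
    "STO_MIPS_MICROMIPS": 0x80,
    "STO_MIPS_MIPS16": 0xF0,
}


def free_bits(arch_flags):
    """Return the mask of st_other bits claimed by neither visibility nor the arch."""
    used = STV_MASK
    for value in arch_flags.values():
        used |= value
    return ~used & 0xFF


def conflicts(flag, arch_flags):
    """Return the names of arch-specific flags that overlap the given bit."""
    return sorted(name for name, value in arch_flags.items() if value & flag)


print(f"free st_other bits on MIPS: {free_bits(STO_MIPS):#04x}")
print("0x04 collides with:", conflicts(CANDIDATE_FLAG, STO_MIPS))
```

Under those assumed values no bit is left over on MIPS, and the 0x04 candidate lands on STO_MIPS_OPTIONAL.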

Thanks,
-- 
Peter