[llvm-dev] Proposal for address-significance tables for --icf=safe

James Y Knight via llvm-dev llvm-dev at lists.llvm.org
Wed May 23 08:15:25 PDT 2018


On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF
> only to sections which can be safely merged according to the guarantees
> provided by the language. It works using a set of heuristics (symbol name
> matching and relocation scanning). That's not only imprecise but it only
> works with certain languages and is slow due to the need to demangle
> symbols and scan relocations. It's also redundant with the
> (local_)unnamed_addr analysis already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works
> by adding a section to each object file containing the indexes of the
> symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can
> turn it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>

This sounds like a nice idea, but it'd be great to put in some effort to
see if we can get this done in a collaborative, cross-toolchain manner
rather than an LLVM-specific one. The need is clearly generic, after all.

I'm a bit worried about the scheme of emitting symbol indexes into a
section. AFAIK nothing else in ELF currently puts symbol indexes in
data (only in relocations and section headers). In particular, if anyone
were to use a tool which rewrites the symbol table, it would break
things unless that tool knows about this special section.
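
To illustrate the concern, here is a minimal sketch in Python. It is a toy
model, not real ELF handling: the symbol table is a plain list of names, the
address-significance data is a list of indexes into it, and all symbol and
function names are hypothetical. It shows how a tool that removes a symbol
without knowing about the index section silently changes which symbols the
section refers to, and what an aware tool would have to do instead.

```python
# Toy model (assumption): a symbol table is a list of names, with the
# conventional null symbol at index 0, and the address-significance
# section is just a list of indexes into that table.
symtab = ["", "main", "helper", "callback", "unused_tail"]
addrsig = [1, 3]  # refers to "main" and "callback"


def names(table, indexes):
    """Resolve a list of symbol indexes to symbol names."""
    return [table[i] for i in indexes]


def strip_symbol(table, name):
    """Naive rewrite: drop one symbol, shifting every later index down."""
    return [s for s in table if s != name]


def strip_symbol_aware(table, indexes, name):
    """Rewrite that also remaps the index section to match the new table."""
    removed = table.index(name)
    new_table = table[:removed] + table[removed + 1:]
    new_indexes = [i - 1 if i > removed else i
                   for i in indexes if i != removed]
    return new_table, new_indexes


# The unaware tool leaves addrsig untouched, so index 3 now names a
# different symbol than it did before the rewrite.
stale_table = strip_symbol(symtab, "helper")
stale_names = names(stale_table, addrsig)

# The aware tool keeps the section pointing at the same symbols.
fixed_table, fixed_addrsig = strip_symbol_aware(symtab, addrsig, "helper")
fixed_names = names(fixed_table, fixed_addrsig)
```

In this model the stale lookup does not fail; it resolves to the wrong
symbol, which is the kind of breakage that would be hard to notice.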

I wonder if it's possible to put the data in the symbol table.
Unfortunately, there's not a whole lot of available space there...

The "st_other" field has some space available -- only the bottom 2 bits are
currently used in general for visibility, so one could imagine adding a
flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this
field are widely used by a variety of architecture-specific things, which
makes that rather more complicated. MIPS uses almost all the remaining bits
in that field, but seemingly not bit 3 (0x04, counting bits from 1). PowerPC
uses bits 5-8 for the local-call optimization, leaving bits 3-4. Alpha uses
bits 4 and 8 for what
I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for
additional function-type and linkage annotations. m68k uses bits 7-8 for
identifying "far" functions and interrupt handlers.
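
The st_other idea can be sketched as follows, again in Python. The
visibility mask 0x03 and the value STV_HIDDEN = 2 come from the generic ELF
specification; STO_ADDRSIG and the helper functions are hypothetical names
invented for this illustration. The per-architecture bit usage is copied
from the list above (bits counted from 1) and has not been checked against
the individual psABI documents. MIPS is left out because the exact bits it
uses are not stated here.

```python
STV_MASK = 0x03      # low two bits: symbol visibility (generic ELF)
STV_HIDDEN = 2       # one of the standard visibility values
STO_ADDRSIG = 0x04   # HYPOTHETICAL flag: "address-significant"


def visibility(st_other):
    """Extract the visibility value, as ELF32_ST_VISIBILITY does."""
    return st_other & STV_MASK


def mark_address_significant(st_other):
    """Set the hypothetical flag without disturbing the other bits."""
    return st_other | STO_ADDRSIG


def is_address_significant(st_other):
    return bool(st_other & STO_ADDRSIG)


# Architecture-specific st_other usage as described in this message,
# using 1-based bit numbers (assumption: not verified against psABIs).
ARCH_USED_BITS = {
    "PowerPC": [5, 6, 7, 8],
    "Alpha": [4, 8],
    "IA64 OpenVMS": [5, 6, 7, 8],
    "m68k": [7, 8],
}


def bit_mask(bits):
    """Convert 1-based bit numbers into a mask."""
    mask = 0
    for b in bits:
        mask |= 1 << (b - 1)
    return mask


def conflicts(flag):
    """List the architectures whose reserved bits overlap the flag."""
    return [arch for arch, bits in ARCH_USED_BITS.items()
            if bit_mask(bits) & flag]


hidden_and_significant = mark_address_significant(STV_HIDDEN)
```

Under these assumptions, conflicts(0x04) is empty while conflicts(0x08)
already reports Alpha, which is why 0x04 is the only candidate considered.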

So... that might be viable, if only just barely, but even after suggesting
it, I wouldn't really want to argue for it. =)