Support dead-stripping in ELF objects

Nick Kledzik kledzik at apple.com
Tue Apr 2 18:49:05 PDT 2013


On Apr 2, 2013, at 6:23 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> 
> This is a good idea, but I think we need more for an effective and portable implementation.
I very much agree!

> 
> We need a detailed specification of what guarantees the compiler is making for the linker. I've thought about this in terms of both deadstripping and how to make atomization of ELF files faster and have the following:
> 
> Section flags:
> * SHF_VALID_SIZE
>   * There shall be a symbol with st_value of 0.
>   * For each symbol, st_value + st_size shall be the st_value of another symbol, or sh_size (the end of the section).
>   * For each symbol, the range [st_value, st_value + st_size) shall be non-overlapping.
So these say that the whole section is "covered" by the set of symbol ranges.  Is that too strict?  What if the compiler 16-byte aligned the start of each function?  Should the padding at the end be included in each st_size?  If the padding is not in the st_size, then that would allow linker optimizations like replacing the padding with small functions...
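
A rough sketch of the strict reading of the three rules above, in which the symbols must tile the section exactly. SHF_VALID_SIZE is only the flag proposed in this thread; SymRange and symbols_cover_section are hypothetical names, and the struct stands in for just the st_value/st_size fields of a real ELF symbol entry. It also assumes st_value is a section-relative offset, as in a relocatable object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two symbol fields the rules talk about. */
typedef struct {
    uint64_t st_value; /* offset of the symbol within its section */
    uint64_t st_size;  /* size of the symbol in bytes */
} SymRange;

static int cmp_by_value(const void *a, const void *b) {
    const SymRange *x = a, *y = b;
    return (x->st_value > y->st_value) - (x->st_value < y->st_value);
}

/* Returns true if the symbol ranges tile [0, sh_size) exactly:
 * the first range starts at 0, each range ends where the next one
 * begins, and the last range ends at sh_size. Any gap (for example
 * alignment padding not counted in st_size) or overlap returns false. */
bool symbols_cover_section(const SymRange *syms, size_t n, uint64_t sh_size) {
    assert(syms != NULL || n == 0);
    if (n == 0) return false;

    /* Work on a sorted copy so the caller's symbol order is untouched. */
    SymRange *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) return false;
    memcpy(sorted, syms, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_by_value);

    bool ok = (sorted[0].st_value == 0);
    for (size_t i = 0; ok && i < n; i++) {
        uint64_t end  = sorted[i].st_value + sorted[i].st_size;
        uint64_t next = (i + 1 < n) ? sorted[i + 1].st_value : sh_size;
        ok = (end == next);
    }
    free(sorted);
    return ok;
}
```

Under this reading, two functions at offsets 0 and 16 with sizes 10 and 12 in a 32-byte section fail the check, because the padding bytes belong to no symbol. The same section passes only if each st_size is rounded up to include its padding.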


>   * If a relocation in this section has a target symbol which is in a section with this flag set, r_addend must be less than st_size.
I assume you mean the st_size of the target symbol.  But that would disallow valid C code.  You can write:

       char* x = &foo + 100;
or
       char* y = &foo - 50;

where 100 > sizeof(foo).  Supporting this sort of stuff is why "scattered relocations" were created for Mach-O long ago.  You see, Mach-O does not have RELA relocations (that is, relocations with addends).  Scattered relocations give you the final value and the symbol.  You can then subtract the two to get the addend.  Anyway, with addends, you don't need this st_size check.
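
To make the subtraction concrete, here is a minimal sketch. ScatteredFixup and recover_addend are hypothetical names, the struct is not the real Mach-O relocation record layout, and the addresses in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what a scattered relocation tells the linker. */
typedef struct {
    uint32_t symbol_address; /* address of the referenced symbol in the object file */
    uint32_t stored_value;   /* value found at the fixup site: symbol address + addend */
} ScatteredFixup;

/* Recovers the addend as (stored value - symbol address).
 * The subtraction is done in unsigned arithmetic, which wraps, and the
 * result is then reinterpreted as signed so negative addends come out
 * right on the usual two's-complement targets. */
int32_t recover_addend(const ScatteredFixup *f) {
    assert(f != NULL);
    return (int32_t)(f->stored_value - f->symbol_address);
}
```

If foo sits at 0x1000, then &foo + 100 is stored as 0x1064 and the recovered addend is 100, while &foo - 50 is stored as 0xFCE and the recovered addend is -50. Neither result depends on the size of foo.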

Which reminds me: this additional information is not just about dead stripping; it also enables ordering.  That is, when you link, you can pass the linker an ordered list of "hot" functions/data and the linker can rearrange the atoms to group the hot atoms together.  The end result is less paging and fewer dirty pages.  And that is how "scattered relocation" got its name (yes, "scattered" sounds like increasing entropy, whereas order files attempt to better organize the code, but in someone's mind moving the functions at all was "scattering" them).
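
As an illustration of the ordering idea only, here is a small sketch that moves the atoms named in a hot list to the front, in list order, and leaves the remaining atoms in their original relative order. apply_order_list is a hypothetical helper, atoms are modeled as bare symbol names, and a real linker would be moving the underlying bytes and fixing up references as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reorders `atoms` (n symbol names) in place so that every name that
 * appears in `hot` comes first, in the order given by `hot`. Names in
 * `hot` that match no atom are ignored. Atoms not named in `hot` keep
 * their original relative order after the hot group. */
void apply_order_list(const char **atoms, size_t n,
                      const char *const *hot, size_t n_hot) {
    assert(atoms != NULL || n == 0);
    size_t placed = 0; /* number of hot atoms already moved to the front */
    for (size_t h = 0; h < n_hot; h++) {
        for (size_t i = placed; i < n; i++) {
            if (strcmp(atoms[i], hot[h]) == 0) {
                const char *found = atoms[i];
                /* Slide atoms[placed..i-1] up one slot, keeping their order. */
                memmove(&atoms[placed + 1], &atoms[placed],
                        (i - placed) * sizeof atoms[0]);
                atoms[placed++] = found;
                break;
            }
        }
    }
}
```

Given the atoms init, cold1, hot_b, cold2, hot_a and the hot list hot_a, hot_b, the result is hot_a, hot_b, init, cold1, cold2.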

-Nick


> 
>   This is separate from SHF_SUBSECTION_VIA_SYMBOLS because it makes atomization faster even in cases where symbol data cannot be moved. This is important as it allows the assembler to perform more relaxation to get smaller output while still allowing for this optimization.
> 
> * SHF_SUBSECTION_VIA_SYMBOLS - SHF_VALID_SIZE must also be set.
>   * The range [st_value, st_value + st_size) of each symbol in the section may be moved to a different relative location.
> 
> * SHF_DEADSTRIP - This section may be removed if it is dead. If SHF_SUBSECTION_VIA_SYMBOLS is also set, symbols and their data may be removed if dead.
> 
>   This is separate because the assembler may have performed relaxation within a section, but you would still like the entire section to be removed if it is not referenced. This also provides a way to limit the effect of --gc-sections without hard-coding the names of sections not to remove (such as .init and .fini).
> 
> Symbol flags (which we'll have to find a place to embed, probably st_other which has 6 free bits):
> * STF_NO_DEADSTRIP - Never remove this symbol or symbols it references even if it is in a SHF_SUBSECTION_VIA_SYMBOLS or SHF_DEADSTRIP section.
> 
> 
> == Deadstripping ==
> 
> The deadstrip roots are the set of all symbols defined in sections not marked SHF_DEADSTRIP, symbols with STF_NO_DEADSTRIP, and any implementation-defined symbols.
> 
> A symbol is live if it is a deadstrip root, or it is referenced from a deadstrip root or another live symbol, or it is in a section not marked SHF_SUBSECTION_VIA_SYMBOLS that has other live symbols.
> 
> A symbol X is referenced by Z if:
> * If Z and X are not in a section marked SHF_VALID_SIZE.
>   * There is a relocation in the section containing Z with a resolved symbol which comes from the same file and has the same st_shndx as X (the section containing Z references the section containing X).
> * If Z is not in a section marked SHF_VALID_SIZE but X is.
>   * There is a relocation in the section containing Z with the resolved symbol X.
> * If both Z and X are in sections marked SHF_VALID_SIZE.
>   * There is a relocation with an r_offset within the range [st_value, st_value + st_size) of Z with the resolved symbol X.
> 
> Any symbol that is not live may be removed if allowed by section flags.
> 
> - Michael Spencer
>  

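The liveness rules in the quoted Deadstripping section amount to a reachability walk from the deadstrip roots. Below is a minimal sketch of that walk for the simplest case only, where every symbol is in a section with both of the proposed SHF_VALID_SIZE and SHF_SUBSECTION_VIA_SYMBOLS flags, so references are per-symbol edges. The section-level fallbacks and the "same section as another live symbol" rule are left out. SymNode, mark_live, and the fixed array limits are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_SYMS = 64, MAX_REFS = 4 };

/* Hypothetical in-memory view of one symbol after relocations are resolved. */
typedef struct {
    bool   is_root;        /* in a section without SHF_DEADSTRIP, or has STF_NO_DEADSTRIP */
    size_t n_refs;         /* number of valid entries in refs[] */
    size_t refs[MAX_REFS]; /* indices of symbols targeted by relocations whose
                              r_offset falls inside this symbol's range */
} SymNode;

/* Sets live[i] to true for every symbol reachable from a root, and to
 * false for all others. Uses an explicit stack; each symbol is pushed
 * at most once, so the stack never holds more than n entries. */
void mark_live(const SymNode *syms, size_t n, bool *live) {
    size_t stack[MAX_SYMS];
    size_t top = 0;
    assert(n <= MAX_SYMS);

    /* Seed the walk with the deadstrip roots. */
    for (size_t i = 0; i < n; i++) {
        live[i] = syms[i].is_root;
        if (live[i]) stack[top++] = i;
    }
    /* Anything referenced from a live symbol becomes live. */
    while (top > 0) {
        const SymNode *s = &syms[stack[--top]];
        for (size_t r = 0; r < s->n_refs; r++) {
            size_t target = s->refs[r];
            if (!live[target]) {
                live[target] = true;
                stack[top++] = target;
            }
        }
    }
}
```

Symbols left with live[i] == false are the candidates for removal. Note that a dead symbol referencing a live one does not keep itself alive, since edges are only followed outward from live symbols.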