Support dead-stripping in ELF objects

Robinson, Paul Paul_Robinson at playstation.sony.com
Wed Apr 10 11:06:45 PDT 2013


From: Nick Kledzik [mailto:kledzik at apple.com]
Sent: Monday, April 08, 2013 3:04 PM
> On Apr 8, 2013, at 2:42 PM, "Robinson, Paul"
> <Paul_Robinson at playstation.sony.com> wrote:
> 
> > Okay, just to restate the specification all in one place.
> >
> > We define three new ELF section-header flags:
> >
> > * SHF_ATOMIZED
> > The symbols defined for this section (other than the STT_SECTION symbol)
> > have the following properties.
> > - The range [st_value, st_value + st_size) for each symbol does not overlap
> >  the range for any other symbol.
> > - For all symbols, st_value + st_size <= sh_size (the end of the section).
> > - All relocations targeting this section shall resolve to values within the
> >  range for some symbol.
> I'm still worried this restriction may prevent some compiler
> optimizations.  For instance:
> 
> int attrs[26];
> 
> int attr_of_lower_char(char c) {
> 	return attrs[c-'a'];
> }
> 
> This function assumes a character is in the range of a lowercase letter.
> The compiler could optimize this to:
>      mov reg1, &attrs  -  0x61
>      load reg2, (reg1 + reg0)
> Basically, merge the minus 0x61 into the address calculation, so as to
> not need an instruction to subtract 0x61.  But that means a relocation
> that points outside (before) the target symbol.
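
To make the concern concrete, here is a minimal C sketch (not taken from
either toolchain) that computes where such a folded relocation would land.
The struct and function names are hypothetical.  The addend is shown scaled
by the element size, assuming 4-byte ints, which the pseudo-assembly above
glosses over.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one symbol's extent in its section. */
struct atom_range {
    uint64_t st_value; /* offset of the symbol within the section */
    uint64_t st_size;  /* size of the symbol in bytes */
};

/* Section-relative offset that "symbol + addend" resolves to. */
static int64_t reloc_target(const struct atom_range *sym, int64_t addend)
{
    return (int64_t)sym->st_value + addend;
}

/* True if the target lies in [st_value, st_value + st_size). */
static bool target_in_atom(const struct atom_range *sym, int64_t addend)
{
    int64_t t = reloc_target(sym, addend);
    return t >= (int64_t)sym->st_value &&
           t < (int64_t)(sym->st_value + sym->st_size);
}

/* Addend a compiler might fold in for attrs[c - 'a'] (assumes 4-byte ints). */
static int64_t folded_addend(void)
{
    return -(int64_t)'a' * (int64_t)sizeof(int32_t);
}
```

With attrs placed at section offset 0x200 and sized 26 * 4 bytes, the folded
addend puts the relocation target before the start of the atom, so a strict
"target must be inside some symbol" rule would reject it.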

Interesting.  For my toolchain, my linker guy says (without having actually
tried it) that attrs would be considered referenced even though the addend
points outside the atom.  That is, given an object file that is marked
as prepped-for-deadstripping, the actual addends aren't considered.
If the target of the relocation ends up outside the atom in a way that
causes Bad Stuff(tm) to happen, that's a compiler bug, because the compiler
promised that the atoms were okay.

(And I note in passing that PC-relative relocations, e.g. function calls,
seem to use symbol-4 on x86_64... so negative addends occur all the time.)
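
As a worked example of why the -4 shows up: the x86-64 psABI defines
R_X86_64_PC32 as S + A - P, and a call's 32-bit displacement is applied
relative to the end of the displacement field, i.e. P + 4.  The helper names
and addresses below are made up for illustration.

```c
#include <stdint.h>

/* Value stored in the 32-bit field for R_X86_64_PC32: S + A - P. */
static int32_t pc32_field(uint64_t S, int64_t A, uint64_t P)
{
    return (int32_t)((int64_t)S + A - (int64_t)P);
}

/* Where a call lands: the displacement is added to the address just past
   the 4-byte field located at P. */
static uint64_t call_destination(uint64_t P, int32_t field)
{
    return P + 4 + (uint64_t)(int64_t)field;
}
```

With A = -4 the call lands exactly on S, even though S + A on its own names
an address four bytes before the symbol.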

How's this for a reasonable third requirement for SHF_ATOMIZED:
- All references to this section shall be to addresses within the range
  of some symbol.  (If a relocation resolves to an address outside the
  range of all symbols, the compiler must arrange that the final
  reference is still within range.)
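
The first two SHF_ATOMIZED properties can be checked mechanically by a
linker or a verifier tool; the third is a promise by the compiler and is
not checked here.  A rough sketch, assuming a simplified symbol record
(sym_extent and section_is_atomized are hypothetical names, not an existing
API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: just the two symbol fields the rules mention. */
struct sym_extent {
    uint64_t st_value;
    uint64_t st_size;
};

/* Returns true if every symbol ends at or before sh_size and no two
   non-empty symbol ranges overlap.  Gaps between atoms are allowed. */
static bool section_is_atomized(const struct sym_extent *syms, size_t n,
                                uint64_t sh_size)
{
    /* Pass 1: st_value + st_size <= sh_size, written to avoid overflow. */
    for (size_t i = 0; i < n; i++) {
        if (syms[i].st_value > sh_size ||
            syms[i].st_size > sh_size - syms[i].st_value)
            return false;
    }
    /* Pass 2: pairwise overlap test on half-open ranges. */
    for (size_t i = 0; i < n; i++) {
        uint64_t a0 = syms[i].st_value, a1 = a0 + syms[i].st_size;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t b0 = syms[j].st_value, b1 = b0 + syms[j].st_size;
            if (a0 < a1 && b0 < b1 && a0 < b1 && b0 < a1)
                return false;
        }
    }
    return true;
}
```

The quadratic overlap test keeps the sketch short; sorting by st_value first
would be the obvious choice for large sections.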

> >
> > The range for each symbol is called an "atom."  There are no relocations
> > addressing any gaps between atoms.
> >
> > Note in particular this does not disallow using .section+N relocations,
> > as long as the target address is in the range of some atom.
> >
> > (Michael, is it really required to have a symbol with st_value = 0, if
> > we have all the other requirements?  Typically we would, but strictly
> > speaking a "gap" at the start of the section should not be a problem?)
> >
> > * SHF_SUBSECTIONS_VIA_SYMBOLS
> > SHF_ATOMIZED must be set if SHF_SUBSECTIONS_VIA_SYMBOLS is set.
> > In addition, the following properties hold for the entire object file.
> > - No relocation uses the STT_SECTION symbol for this section.
> > - If any relocation uses a symbol defined in this section, the addend
> >  must be less than st_size for that symbol.

Per the discussion above, this second property should probably be restated:
- Any reference to an atom in this section (from outside that atom)
  must use a relocation based on the symbol for that atom.

Again not requiring addends be in range, but requiring that all references
are based on a relocation using the correct symbol.
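
Under that rule, liveness depends only on which symbol each relocation
names, never on the addend.  A minimal sketch of the marking pass, assuming
atoms are numbered and each relocation records the atom it lives in and the
atom whose symbol it uses (atom_reloc and mark_live are hypothetical names):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified relocation record. */
struct atom_reloc {
    size_t from_atom; /* atom containing the relocated field */
    size_t to_atom;   /* atom whose symbol the relocation uses */
    long   addend;    /* deliberately ignored for liveness */
};

/* Propagates liveness from the roots already set in live[] and returns
   the number of live atoms.  Anything left unmarked may be stripped. */
static size_t mark_live(bool *live, size_t n_atoms,
                        const struct atom_reloc *relocs, size_t n_relocs)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n_relocs; i++) {
            const struct atom_reloc *r = &relocs[i];
            if (r->from_atom < n_atoms && r->to_atom < n_atoms &&
                live[r->from_atom] && !live[r->to_atom]) {
                live[r->to_atom] = true;
                changed = true;
            }
        }
    }
    size_t count = 0;
    for (size_t i = 0; i < n_atoms; i++)
        if (live[i])
            count++;
    return count;
}
```

A reference with an out-of-range addend (such as the attrs case above) still
keeps its target atom alive, because only to_atom is consulted.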

> > - The range [st_value, st_value + st_size) of any symbol in the section
> >  may be moved to a different relative location.
> >
> > Each "atom" in this section is also called a "subsection."  Any relocation
> > using the symbol for a subsection must resolve to an address within the
> > subsection.  There are no location dependencies between subsections, other
> > than those expressed by relocations.
> 
> How does this interact with weak definitions?  That is, how do weak copies
> of inline header defined functions work today with ELF?  Are all copies
> left in the final linked image but only one is used?  Or does the linker
> actually remove unused weak functions from the middle of a section?  Or are
> all weak definitions always in their own section?

In my experiments, inline header defined functions were emitted in their
own sections, apparently using comdat groups.  At least, if they weren't
being optimized away entirely.

> Are there any restrictions on how these new flags interact with group
> comdat?

I don't know how any existing flags interact with group comdat...
I would guess that these new flags should be and-ed together, or required
to be the same.  I don't know what a linker would prefer (my toolchain's
linker obviously isn't using these flags).

--paulr

> 
> -Nick
> 
> >
> > * SHF_DEADSTRIP
> > This entire section may be omitted from the output file, if it is dead
> > (i.e., there are no references to any symbol defined in the section).
> > If SHF_SUBSECTIONS_VIA_SYMBOLS is also set, individual subsections
> > may be omitted from the output file, if they are dead.
> >
> >
> > We define one new ELF symbol flag:
> >
> > * STF_NO_DEADSTRIP
> > The linker may not remove this symbol from the output file.
> > This symbol flag takes precedence over the SHF_DEADSTRIP section flag.
> > If the symbol defines a subsection, the subsection must be considered live.
> > If the symbol does not define a subsection, the symbol's entire section
> > must be considered live.
> >
> >
> > How's that?
> > --paulr
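
For reference, the flag dependency and the STF_NO_DEADSTRIP rule from the
restated spec can be sketched as below.  The bit values are placeholders
(the thread does not assign numbers to these flags), and the sketch treats
"the symbol defines a subsection" as equivalent to its section having
SHF_SUBSECTIONS_VIA_SYMBOLS set, which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, for illustration only. */
#define SHF_ATOMIZED                (UINT64_C(1) << 0)
#define SHF_SUBSECTIONS_VIA_SYMBOLS (UINT64_C(1) << 1)
#define SHF_DEADSTRIP               (UINT64_C(1) << 2)

enum keep_scope { KEEP_NOTHING_EXTRA, KEEP_SUBSECTION, KEEP_WHOLE_SECTION };

/* SHF_SUBSECTIONS_VIA_SYMBOLS requires SHF_ATOMIZED. */
static bool flags_consistent(uint64_t sh_flags)
{
    return !(sh_flags & SHF_SUBSECTIONS_VIA_SYMBOLS) ||
           (sh_flags & SHF_ATOMIZED) != 0;
}

/* What a symbol forces the linker to keep, regardless of SHF_DEADSTRIP. */
static enum keep_scope forced_keep(uint64_t sh_flags, bool sym_no_deadstrip)
{
    if (!sym_no_deadstrip)
        return KEEP_NOTHING_EXTRA;
    return (sh_flags & SHF_SUBSECTIONS_VIA_SYMBOLS) ? KEEP_SUBSECTION
                                                    : KEEP_WHOLE_SECTION;
}
```

The kept subsection or section would then serve as a root for whatever
liveness marking the linker performs.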





