[llvm-commits] Atom alignment

Thu Oct 18 18:32:59 PDT 2012

On 10/18/2012 6:38 PM, Nick Kledzik wrote:
> On Oct 18, 2012, at 3:19 PM, Evandro Menezes wrote:
>> I wonder how much of this is because we're breaking up the indivisible unit in ELF, sections, into symbol atoms.  AFAIK, because we cannot rely on the assembler programmer being diligent about specifying symbol sizes, the ELF reader gobbles up everything between symbols, even if that is more than the symbol's indicated size.
>>
>> The reason for this gluttony is that there is code out there that an assembler accepts that should also be accepted by the linker.  For example, jump tables.  Though I don't advocate it being written this way, an assembler is typically fine with it:
>>
>> foo:
>> 	...
>> 	jmp (.L123 + %eax * 8)
>> 	ret
>>
>> 	.size foo, . - foo
>>
>> .L123:
>> 	.long bar
>> 	.long goo
>> 	.long ...
>> 	...
>> 	.long car
>> 	.long ...
>>
>> bar:
>> 	...
>> 	jmp (bar - 16 + %eax * 8)
>> 	ret
>>
>> 	.size bar, . - bar
>>
>> In this case, .L123 doesn't have a size.  Even if a compiler would be careful enough to specify the jump-table size, a sloppy assembler programmer might not be as careful.  And if an assembler doesn't complain about it and generates a valid ELF file, the linker should take it as is.
>>
>> Now, assume that foo's atom is not referenced and thus discarded. Consequently, so is .L123's atom, including the data in it that the function bar relies on and refers to only indirectly.  Then, the code in the function bar is broken.
>>
>> So, please, bear with me: I wonder if the ELF model fits neatly into the atom model.  And, if not, how could the atom model be improved to accommodate it and perhaps other section-based file formats?
> The mach-o file format has the same problem as ELF, but we've been successfully using the atom model in the darwin linker for 7+ years now.
>
> The trick is that we have an opt-in directive in assembly files (.subsections_via_symbols).  The name is historic, but what it means is that the linker can assume the file follows some rules.  The compiler always follows the rules and always uses that directive.  Hand-written assembly can (if the author wants) follow the rules and use the directive.
It's a nice trick, but I'm not sure all assemblers would be able to
follow this rule.
>
> From the linker's perspective, if the directive was not used, it has to be more conservative in what it can do with the file.  In particular, it adds a "follow-on" reference from each atom in a section to the next one.  The follow-on references constrain the layout engine so that particular atoms must be laid out right after one another.  So, if an order file is used to move one atom, there may be a whole train of atoms that move with it.
When sections have different alignment requirements, wouldn't it be
really hard to fix up the alignment when atoms get removed by garbage
collection?
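
To make the follow-on constraint concrete, here is a minimal Python
sketch of how a train of follow-on atoms could be kept together. The
function names and data layout are made up for illustration and are not
lld or ld64 API. It also assumes that reaching any atom of a train keeps
the whole train alive, which is one reading of the conservative mode
rather than something stated above.

```python
# Hypothetical sketch; names and data layout are illustrative only.

def follow_on_trains(order, follow_on):
    """Split atoms (in section order) into trains linked by follow-on refs.

    order:     atom names in their original section order
    follow_on: set of atom names that have a follow-on to the next atom
    """
    trains, current = [], []
    for name in order:
        current.append(name)
        if name not in follow_on:   # chain ends here
            trains.append(current)
            current = []
    if current:
        trains.append(current)
    return trains


def live_atoms(roots, refs, trains):
    """Mark live atoms; touching one atom keeps its whole train (assumption)."""
    train_of = {name: train for train in trains for name in train}
    live, work = set(), list(roots)
    while work:
        name = work.pop()
        if name in live:
            continue
        for member in train_of[name]:
            if member not in live:
                live.add(member)
                work.extend(refs.get(member, []))
    return live


# The example from earlier in the thread: bar is the only root and reaches
# the jump table .L123 only indirectly, so there is no explicit reference.
order = ["foo", ".L123", "bar"]
refs = {"foo": [".L123"]}

# Without the directive: every atom has a follow-on to the next one.
conservative = follow_on_trains(order, {"foo", ".L123"})
# With the directive: no follow-ons, every atom stands alone.
independent = follow_on_trains(order, set())

live_conservative = live_atoms(["bar"], refs, conservative)
live_independent = live_atoms(["bar"], refs, independent)
```

In the conservative case the whole train survives, so bar still finds
its table; in the independent case only bar is live and .L123 is
dropped, which is the breakage described earlier in the thread.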

The additional problems are:

1) The Atom model forgets the relationship between sections after
parsing the input file.
2) We don't have a way to lay out the atoms and fix their offsets/order
after reading the files, the way traditional linkers that work with ELF
files do.
3) Certain aspects of the ELF file are not completely represented by
Atoms:
      a) Merging constants in sections that have the
SHF_MERGE/SHF_STRINGS flags set
      b) Handling section groups (SHT_GROUP) for C++ comdat resolution
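
As an illustration of why 3a needs section-level knowledge, here is a
rough Python sketch of merging sections flagged SHF_MERGE|SHF_STRINGS.
The function name and the remap structure are hypothetical; it assumes
single-byte, NUL-terminated strings and ignores sh_entsize and
tail merging, which a real linker would have to consider.

```python
# Hypothetical sketch; not how any particular linker implements it.

def merge_string_sections(sections):
    """Deduplicate NUL-terminated strings across mergeable sections.

    sections: list of bytes objects, each a run of NUL-terminated strings
    Returns (merged_bytes, remaps), where remaps[i] maps an old offset in
    sections[i] to the offset of the same string in the merged output.
    """
    merged = bytearray()
    offset_of = {}              # string (including NUL) -> merged offset
    remaps = []
    for data in sections:
        remap, pos = {}, 0
        while pos < len(data):
            # Raises ValueError if a string lacks its terminator.
            end = data.index(b"\0", pos) + 1
            s = data[pos:end]
            if s not in offset_of:
                offset_of[s] = len(merged)
                merged += s
            remap[pos] = offset_of[s]
            pos = end
        remaps.append(remap)
    return bytes(merged), remaps


merged, remaps = merge_string_sections([b"foo\0bar\0", b"bar\0baz\0"])
```

Every reference into one of the input sections has to be rewritten
through the remap table, so the reader needs to know which section each
target offset came from.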

The way to fix these would be to:

1) Represent sections as Atoms in the lld design.
2) Handle section groups by grouping their sections (as Atoms) into
AtomSets/AtomCollections.
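
A rough Python sketch of what comdat resolution over such atom sets
might look like. The function name, the input shape, and the
"first group seen wins" policy keyed on the group signature are
assumptions made for illustration; AtomSet/AtomCollection does not
exist in lld today.

```python
# Hypothetical sketch; the data layout is invented for this example.

def resolve_comdat_groups(files):
    """Keep the first group seen for each signature, discard later ones.

    files: list of (file_name, [(signature, [atom_name, ...]), ...])
           in link order
    Returns (kept, discarded) lists of "file:atom" names.
    """
    seen = set()                # signatures that already have a winner
    kept, discarded = [], []
    for file_name, groups in files:
        for signature, atoms in groups:
            qualified = [f"{file_name}:{a}" for a in atoms]
            if signature not in seen:
                seen.add(signature)
                kept.extend(qualified)
            else:
                # The whole set goes together, never individual atoms.
                discarded.extend(qualified)
    return kept, discarded


kept, discarded = resolve_comdat_groups([
    ("a.o", [("_ZN1XC1Ev", [".text._ZN1XC1Ev", ".eh_frame.X"])]),
    ("b.o", [("_ZN1XC1Ev", [".text._ZN1XC1Ev", ".eh_frame.X"]),
             ("_ZN1YC1Ev", [".text._ZN1YC1Ev"])]),
])
```

The point of the set is that members are kept or dropped as a unit,
which individual atoms cannot express on their own.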

What do you think?

Shankar Easwaran