[LLVMdev] [lld] ARM/Thumb atom forming

Wed Dec 24 01:09:34 PST 2014

Hi guys,

I'm working on ARM architecture support for lld.
I have run into the problem with ARM/Thumb symbols described below.

The ARM ELF specification states that a symbol addressing a Thumb
instruction has bit zero of its st_value field set (see 4.5.3).
The generic ELF specification says that in relocatable files st_value holds
the symbol's offset from the beginning of its section, while in executable
files and shared objects it holds a virtual address
(see Chapter 4, "Symbol Values").
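To make the encoding concrete, here is a minimal Python sketch (not lld code) that splits a relocatable file's st_value into a section offset and a Thumb flag. The helper name decode_arm_symbol is hypothetical, and the sketch assumes, per my reading of 4.5.3, that the bit-zero rule applies to STT_FUNC symbols only.

```python
from typing import Tuple

STT_OBJECT = 1  # generic ELF symbol type: data object
STT_FUNC = 2    # generic ELF symbol type: function

def decode_arm_symbol(st_value: int, st_type: int) -> Tuple[int, bool]:
    # Assumption: only function symbols use bit 0 as the Thumb marker
    is_thumb = st_type == STT_FUNC and (st_value & 1) == 1
    # Clearing bit 0 recovers the real section offset of the first instruction
    offset = st_value & ~1 if is_thumb else st_value
    return offset, is_thumb

print(decode_arm_symbol(0x11, STT_FUNC))  # (16, True)
print(decode_arm_symbol(0x20, STT_FUNC))  # (32, False)
```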

When atoms are created in ELFFile::createAtoms, their addresses, content
sizes, and content data are all derived from st_value.
Since st_value has bit zero set for symbols addressing Thumb instructions,
the addresses of the corresponding atoms are always one byte ahead of the
real values.
Content size, and therefore content data, may also be wrong for both ARM
and Thumb symbols, depending on their order (see ELFFile::symbolContentSize):
content size is calculated as the difference between the st_value fields
of two adjacent symbols, so if one of them is Thumb and the other is not,
the result is one byte smaller or one byte larger than expected.
The atom's content data is then malformed as well, since it is extracted
using this miscalculated size.
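A simplified model of the size calculation (my own sketch, not the actual ELFFile::symbolContentSize code) shows the off-by-one. It assumes all symbols are function symbols in one section, sorted by offset; the function names and the section layout are hypothetical.

```python
from typing import List, Tuple

Symbol = Tuple[str, int]  # (name, st_value) as read from a relocatable file

def naive_sizes(symbols: List[Symbol], section_size: int) -> List[Tuple[str, int]]:
    # Size = distance from this symbol's raw st_value to the next one's,
    # i.e. the Thumb bit is left in place
    sizes = []
    for i, (name, value) in enumerate(symbols):
        end = symbols[i + 1][1] if i + 1 < len(symbols) else section_size
        sizes.append((name, end - value))
    return sizes

def masked_sizes(symbols: List[Symbol], section_size: int) -> List[Tuple[str, int]]:
    # Same walk, but bit 0 is cleared at both ends before subtracting
    sizes = []
    for i, (name, value) in enumerate(symbols):
        start = value & ~1
        end = symbols[i + 1][1] & ~1 if i + 1 < len(symbols) else section_size
        sizes.append((name, end - start))
    return sizes

# 8-byte ARM function at 0x0, 4-byte Thumb function at 0x8 (st_value 0x9),
# 4-byte ARM function at 0xC; the section is 0x10 bytes long
syms = [("arm_func", 0x0), ("thumb_func", 0x9), ("arm_tail", 0xC)]
print(naive_sizes(syms, 0x10))   # [('arm_func', 9), ('thumb_func', 3), ('arm_tail', 4)]
print(masked_sizes(syms, 0x10))  # [('arm_func', 8), ('thumb_func', 4), ('arm_tail', 4)]
```

The ARM atom preceding the Thumb symbol comes out one byte too large and the Thumb atom one byte too small; clearing bit 0 on both ends restores 8 and 4.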

This behavior results in:
- cases where the first byte of an atom's very first instruction is zero
(if there is a gap between the previous atom and the current one, the
first byte of the initial instruction is skipped)
- cases where the very first instruction is split between two atoms
(the atom that should hold the instruction, and the preceding one,
which "stole" the first byte of that instruction)
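The second case can be reproduced by slicing a section's bytes at the raw st_value offsets. This is a hedged illustration only: slice_atoms is a hypothetical helper, and the two-instruction section below is made up for the example (an ARM "bx lr" followed by a Thumb "bx lr").

```python
from typing import Dict, List, Tuple

def slice_atoms(section: bytes, symbols: List[Tuple[str, int]],
                mask_thumb_bit: bool) -> Dict[str, bytes]:
    def pos(value: int) -> int:
        # Optionally clear the Thumb bit to get the real section offset
        return value & ~1 if mask_thumb_bit else value

    atoms = {}
    for i, (name, value) in enumerate(symbols):
        end = pos(symbols[i + 1][1]) if i + 1 < len(symbols) else len(section)
        atoms[name] = section[pos(value):end]
    return atoms

# ARM "bx lr" (0xE12FFF1E) followed by Thumb "bx lr" (0x4770), little-endian
text = bytes([0x1E, 0xFF, 0x2F, 0xE1, 0x70, 0x47])
syms = [("arm_func", 0x0), ("thumb_func", 0x5)]  # Thumb symbol has bit 0 set

for masked in (False, True):
    atoms = slice_atoms(text, syms, masked)
    print(masked, {n: b.hex() for n, b in atoms.items()})
```

Without masking, the ARM atom ends with the byte 0x70 and the Thumb atom holds only 0x47, so the Thumb instruction is split across the two atoms; with bit 0 cleared, each atom holds exactly its own instruction.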

Is there a way to override this behavior so that both ARM and Thumb atoms
are formed correctly, and so that I can still distinguish between them in
later stages for proper relocation calculations?

Regards!