Trying to understand some relocation handling on Mach-O X86-64

Thu Feb 6 13:52:05 PST 2014

On 6 February 2014 13:37, Nick Kledzik <kledzik at apple.com> wrote:
> Rafael,
>
> 99.9% of the time, you don’t need symbols on literals. But it is that last 0.1%
> that drives this. On i386 and armv7, mach-o uses “scattered relocations”
> to handle these rare cases, but x86_64 (and arm64) do not have scattered
> relocations.
>
> An example is:
>
>         .section __TEXT,__cstring
> L1:
>         .asciz "f"
> L2:
>         .asciz "b"
>
>         .data
>         .quad L1+2
>         .quad L2-2
>
>         .text
>         leaq      L1+2(%rip), %rax
>         leaq      L2-2(%rip), %rax
>
> If you use local relocations, the values for the .quad and the leaq are evaluated
> by the assembler, and the relocation just tells the linker that there is something
> at that address that might need fixing up.  But as you can see, the linker will
> think each points to the wrong string.
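To make Nick's failure case concrete, here is a small Python model of what the linker sees for a local (section-relative) relocation. It is a hypothetical sketch, not ld64 code: it assumes the __cstring section is laid out as "f\0b\0", that the linker splits it into one atom per NUL-terminated string, and that a local relocation gives it only the section offset the assembler already computed. The helper names `split_cstrings` and `atom_for_offset` are made up for illustration.

```python
def split_cstrings(section: bytes) -> list[tuple[int, bytes]]:
    """Split a __cstring-style section into (offset, string) atoms, one per NUL-terminated string."""
    atoms = []
    start = 0
    while start < len(section):
        end = section.index(b"\0", start)
        atoms.append((start, section[start:end + 1]))
        start = end + 1
    return atoms


def atom_for_offset(atoms, offset):
    """Return (string, offset within that string) for the atom containing `offset`."""
    for start, data in atoms:
        if start <= offset < start + len(data):
            return data, offset - start
    raise ValueError(f"offset {offset} is outside the section")


section = b"f\0b\0"      # L1 is at offset 0, L2 at offset 2
atoms = split_cstrings(section)
L1, L2 = 0, 2

# With a local relocation the assembler has already folded the +2/-2 into
# the stored value, so the linker only sees a plain section offset.
for name, value in [("L1+2", L1 + 2), ("L2-2", L2 - 2)]:
    target, delta = atom_for_offset(atoms, value)
    print(f"{name}: linker attributes it to {target!r} + {delta}")
```

In this model L1+2 is attributed to the start of "b" and L2-2 to the start of "f", so if coalescing moves either string, the fixed-up pointer follows the wrong one.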

What is the expected meaning on Mach-O? That the relocation always
points 2 bytes after the start of a given string (for L1+2)?

> I think this is not a problem for ELF for two reasons:
> 1) the expression L1+2 could be encoded with a RELA relocation where
> the quad/lea points to L1 and the RELA addend contains the +2.
> 2) By default the ELF linker does not merge cstrings, so the “f” and “b” will
> always be next to each other in the final output.  The darwin linker always
> coalesces the __cstring section, so the “f” in this .o file might move to
> replace an “f” from another .o file.

Not quite. For ELF, whether merging happens depends on the section
flags. The sections we use for non-unnamed_addr strings are always
mergeable, so from clang's point of view the situation is the same as
on Darwin.

REL or RELA is not the issue. On 32-bit x86 we use REL; on x86-64 we use RELA.
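Why REL versus RELA does not matter can be seen from the record layouts in the ELF specification: both carry a symbol index in r_info, and the addend lives either in an explicit r_addend field (Elf64_Rela) or in the bytes being relocated (Elf32_Rel). The Python sketch below packs an L1+2 reference both ways with the standard `struct` module; the symbol index 3 and the offset 0x10 are made-up values, and the helper functions are hypothetical.

```python
import struct

R_386_32 = 1      # i386: absolute 32-bit relocation (S + A)
R_X86_64_64 = 1   # x86-64: absolute 64-bit relocation (S + A)


def elf32_rel(offset: int, sym: int, rtype: int) -> bytes:
    """Pack an Elf32_Rel: r_offset, r_info = (sym << 8) | type; no addend field."""
    return struct.pack("<II", offset, (sym << 8) | rtype)


def elf64_rela(offset: int, sym: int, rtype: int, addend: int) -> bytes:
    """Pack an Elf64_Rela: r_offset, r_info = (sym << 32) | type, signed r_addend."""
    return struct.pack("<QQq", offset, (sym << 32) | rtype, addend)


def decode_rel(record: bytes, field: bytes) -> tuple[int, int]:
    """REL: symbol from r_info, addend read from the bytes being relocated."""
    _, r_info = struct.unpack("<II", record)
    (addend,) = struct.unpack("<i", field)
    return r_info >> 8, addend


def decode_rela(record: bytes) -> tuple[int, int]:
    """RELA: symbol from r_info, addend taken from r_addend."""
    _, r_info, r_addend = struct.unpack("<QQq", record)
    return r_info >> 32, r_addend


SYM_L1 = 3        # hypothetical symbol-table index of L1
OFFSET = 0x10     # hypothetical offset of the pointer in .data

# i386 (REL): the addend 2 is stored in the .data contents themselves.
rel = elf32_rel(OFFSET, SYM_L1, R_386_32)
rel_field = struct.pack("<i", 2)

# x86-64 (RELA): the addend travels in the relocation record.
rela = elf64_rela(OFFSET, SYM_L1, R_X86_64_64, 2)

print(decode_rel(rel, rel_field), decode_rela(rela))
```

Either way the linker recovers "symbol 3 plus 2"; what decides whether the string association survives merging is that the relocation names the symbol, not where the addend is stored.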

The logic MC uses when handling this is that any symbol used in a
relocation is kept in the symbol table. It looks like gas does the
same. This means that we do get the semantics above (the reference
points X bytes after the start of a given string).

For example, with the attached test.s I get:

$ llvm-mc -filetype=obj test.s -o test.o
$ llvm-nm test.o
00000008 r .L3
00000000 D D
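The semantics described above, where the reference keeps its symbol and the addend is applied after merging, can be modeled with a toy linker. This Python sketch is hypothetical and is not how MC, gas, or ld64 are implemented: it coalesces identical strings across two made-up object files in first-seen order, starting at an assumed output address of 0x1000, and resolves each (symbol, addend) pair against the symbol's final address.

```python
def merge_strings(objects):
    """Toy string coalescing across object files.

    `objects` maps an object name to an ordered list of (symbol, string) pairs.
    Identical strings share one output copy, laid out in first-seen order.
    Returns ({(obj, symbol): address}, {string: address}).
    """
    placed = {}           # string bytes -> output address
    next_addr = 0x1000    # hypothetical start of the merged output section
    sym_addr = {}
    for obj, strings in objects.items():
        for sym, s in strings:
            if s not in placed:
                placed[s] = next_addr
                next_addr += len(s)
            sym_addr[(obj, sym)] = placed[s]
    return sym_addr, placed


def resolve(sym_addr, obj, sym, addend):
    """Symbol-based relocation: final address of the symbol plus the addend."""
    return sym_addr[(obj, sym)] + addend


# other.o is processed first, so "b" is placed before "f" and the two
# strings from test.o are no longer adjacent in the output.
objects = {
    "other.o": [("s", b"b\0"), ("t", b"zz\0")],
    "test.o": [("L1", b"f\0"), ("L2", b"b\0")],
}
sym_addr, placed = merge_strings(objects)

for sym, addend in [("L1", 2), ("L2", -2)]:
    addr = resolve(sym_addr, "test.o", sym, addend)
    print(f"{sym}{addend:+d} -> {addr:#x}, string at {sym_addr[('test.o', sym)]:#x}")
```

Even though "f" and "b" are no longer neighbours, L1+2 still lands two bytes past wherever "f" was placed, which is the symbol-plus-addend information the thread is about preserving.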

Now, this is in no way specific to C strings. In the attached test2.s
the pointer in D always ends up pointing 4 bytes past the number 42.

Why, then, the special case for C strings on Mach-O? Couldn't we use the
same logic as ELF? The advantages would be:

* Relocations pointing X bytes past known mergeable data would also work
for UTF-16 strings, integers, floats, etc.
* doesSectionRequireSymbols would go away.
* isSectionAtomizable would also go away, except that it could instead be
modified a bit: it should say whether a section is symbol-atomizable
instead of known-datatype-atomizable. With that change it would be used
to avoid putting an L symbol in a symbol-atomizable section like
__TEXT,__const.
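The first advantage holds because nothing in symbol-plus-addend resolution depends on the data being a C string; only the way the section is split into mergeable pieces changes. As a hypothetical sketch (not LLVM's isSectionAtomizable logic), here is toy merging for fixed-size 4-byte entries, in the spirit of an ELF SHF_MERGE section with sh_entsize 4. The object names, symbols, and values are made up.

```python
import struct


def split_fixed(section: bytes, entsize: int) -> list[bytes]:
    """Split a mergeable section into fixed-size entries (no NUL terminators)."""
    if len(section) % entsize:
        raise ValueError("section size is not a multiple of entsize")
    return [section[i:i + entsize] for i in range(0, len(section), entsize)]


def merge_fixed(inputs, entsize):
    """Coalesce identical fixed-size entries across object files.

    `inputs` maps an object name to (section bytes, {symbol: offset}).
    Returns ({(obj, symbol): output offset}, merged section bytes).
    """
    out = bytearray()
    where = {}        # entry bytes -> output offset of its single copy
    sym_addr = {}
    for obj, (section, symbols) in inputs.items():
        entry_addr = []
        for entry in split_fixed(section, entsize):
            if entry not in where:
                where[entry] = len(out)
                out += entry
            entry_addr.append(where[entry])
        for sym, off in symbols.items():
            # A symbol keeps its position inside its entry (normally 0).
            sym_addr[(obj, sym)] = entry_addr[off // entsize] + off % entsize
    return sym_addr, bytes(out)


# Two hypothetical objects holding 4-byte little-endian integers.
inputs = {
    "a.o": (struct.pack("<2I", 1, 2), {"A1": 0, "A2": 4}),
    "b.o": (struct.pack("<2I", 3, 2), {"B3": 0, "B2": 4}),
}
sym_addr, merged = merge_fixed(inputs, 4)

# "B2+4" resolves relative to the single merged copy of the integer 2.
print(struct.unpack("<3I", merged), sym_addr[("b.o", "B2")] + 4)
```

The resolution step is unchanged from the string case; a UTF-16 or floating-point section would differ only in how `split_fixed` (or an equivalent) carves it into entries.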

BTW, in PR18743, is there a section we could put the constant in and use
an L prefix, or do all sections with an unknown datatype get atomized
using symbols?

> So, for darwin, by leaving the symbol in the .o file, the linker gets the full
> expression information (symbol + addend) and can properly link this
> crazy code.
>
> -Nick

Thanks,
Rafael
-------------- attachments (scrubbed by the archive) --------------
test.s (341 bytes): <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140206/075e8c6c/attachment.obj>
test2.s (187 bytes): <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140206/075e8c6c/attachment-0001.obj>