Trying to understand some relocation handling on Mach-O X86-64

Nick Kledzik kledzik at apple.com
Thu Feb 6 10:37:42 PST 2014


Rafael,

99.9% of the time, you don’t need symbols on literals. But it is that last 0.1%
that drives this. On i386 and armv7, Mach-O uses “scattered relocations”
to handle these rare cases, but x86_64 (and arm64) do not have scattered
relocations.

An example is:

        .section __TEXT,__cstring
L1:
        .asciz "f"
L2:
        .asciz "b"

        .data
        .quad L1+2
        .quad L2-2

        .text
        leaq    L1+2(%rip), %rax
        leaq    L2-2(%rip), %rax

If you use local relocations, the values for the .quad and the leaq are evaluated
by the assembler, and the relocation just tells the linker that there is something
at that address that might need fixing up.  But as you can see, the linker will
think each points to the wrong string: “f” plus its NUL terminator is two bytes,
so L1+2 is the address of “b” and L2-2 is the address of “f”.
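
To make the mix-up concrete, here is a small Python model of it. It is only a
sketch: the names (CSTRING, LABELS, atom_at, local_reloc_target) are made up
for illustration and are not taken from the assembler or the linker. It
assumes the linker finds the target atom purely from the address the
assembler already computed.

```python
# Hypothetical model of this .o file's __cstring section:
# "f\0" at offset 0 and "b\0" at offset 2.
CSTRING = b"f\0b\0"
LABELS = {"L1": 0, "L2": 2}  # label -> offset within the section


def atom_at(section: bytes, offset: int) -> str:
    """Return the NUL-terminated string atom that contains `offset`."""
    start = section.rfind(b"\0", 0, offset) + 1
    end = section.index(b"\0", start)
    return section[start:end].decode()


def local_reloc_target(label: str, addend: int) -> str:
    """Model a local relocation: label + addend is folded into one address."""
    address = LABELS[label] + addend  # the label itself is lost here
    return atom_at(CSTRING, address)


print(local_reloc_target("L1", 2))   # b  (the expression named L1, i.e. "f")
print(local_reloc_target("L2", -2))  # f  (the expression named L2, i.e. "b")
```

In this model each expression lands on the neighbouring string, which is the
misidentification described above.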

I think this is not a problem for ELF for two reasons:
1) the expression L1+2 could be encoded with a RELA relocation where
the quad/lea points to L1 and the RELA addend contains the +2.
2) By default the ELF linker does not merge cstrings, so the “f” and “b” will
always be next to each other in the final output.  The darwin linker always
coalesces the __cstring section, so the “f” in this .o file might move to 
replace an “f” from another .o file.

So, for darwin, by leaving the symbol in the .o file, the linker gets the full
expression information (symbol + addend) and can properly link this
crazy code.
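
A rough Python sketch of why keeping the symbol helps once strings are
coalesced. Again the names (coalesce, resolve_external, SYMBOLS) and the
contents of the "other" object file are assumptions for illustration, not the
real linker's data structures or algorithm.

```python
def coalesce(sections):
    """Merge cstring sections, keeping a single copy of each string.

    Returns the merged bytes and a map from string to its new offset.
    """
    merged = bytearray()
    where = {}
    for sec in sections:
        for s in sec.split(b"\0")[:-1]:  # drop the empty tail after last NUL
            if s not in where:
                where[s] = len(merged)
                merged += s + b"\0"
    return bytes(merged), where


# Symbols kept in our .o file: each label names a whole string atom.
SYMBOLS = {"L1": b"f", "L2": b"b"}


def resolve_external(label: str, addend: int, where) -> int:
    """Model an external relocation: new address of the symbol, plus addend."""
    return where[SYMBOLS[label]] + addend


other_object = b"f\0zz\0"  # hypothetical second .o that already has an "f"
this_object = b"f\0b\0"
merged, where = coalesce([other_object, this_object])

print(merged)                               # b'f\x00zz\x00b\x00'
print(resolve_external("L1", 2, where))     # 2
print(resolve_external("L2", -2, where))    # 3
```

After merging, "f" and "b" are no longer adjacent (offsets 0 and 5 here), so
only the symbol + addend form still yields the address the source expression
asked for.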

-Nick


On Feb 5, 2014, at 10:27 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> It all started with pr17976. In it we avoided producing IR that would
> in turn produce a MachO that was incompatible with having atoms
> defined by symbols in __TEXT,__const.
> 
> I reported the more basic issue in pr18743: the IR was valid. We
> should figure out a way to produce a valid Mach-O from it or reject it
> loudly way before it gets to ld.
> 
> From there I noticed that L symbols in __TEXT,__cstring are output,
> but L symbols in similar looking sections like __TEXT,__ustring are
> not and reported pr18748 about it.
> 
> The comment in the code that decides to output the L symbols says:
> 
>    // Temporary labels in the string literals sections require symbols. The
>    // issue is that the x86_64 relocation format does not allow symbol +
>    // offset, and so the linker does not have enough information to resolve the
>    // access to the appropriate atom unless an external relocation is used.
> 
> Now, why would one use a symbol+offset? I tried the attached patch
> and testcase. For both __TEXT,__literal8 and __TEXT,__cstring it has
> an external symbol, a private (L) symbol and they are both referred
> from another section (.data).
> 
> The output for __TEXT,__literal8 is the same as we have now. The patch
> changes the output for the cstring to be analogous to the __literal8.
> Why wouldn't this work? With it we produce the attached Mach-O. It has
> two very similar relocations. One points to __literal8 and the other
> to __cstring. Neither is external. Since the sections contain a known
> datatype, the offset tells us all that there is to know about what the
> relocations point to (the number 42 and the c string "bar") and the
> linker should be able to merge them.
> 
> With this patch I was able to do a 3 stage bootstrap on OS X x86-64.
> 
> Cheers,
> Rafael
> <patch><test.s>




