[LLVMdev] MachO non-external X86_64_RELOC_UNSIGNED

Keno Fischer kfischer at college.harvard.edu
Thu Jun 12 15:10:10 PDT 2014


I just realized that my original reply accidentally didn't go to the mailing
list (see below). I'm also still pondering whether, for JIT purposes, setting
->addr or ->offset would be better.


On Mon, Jun 9, 2014 at 9:46 PM, Keno Fischer <kfischer at college.harvard.edu>
wrote:

>
>
>
> On Mon, Jun 9, 2014 at 9:30 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>>
>> On Jun 9, 2014, at 6:01 PM, Keno Fischer <kfischer at college.harvard.edu>
>> wrote:
>>
>> Also, may I ask what the semantics for X86_64_RELOC_SIGNED are with an
>> r_extern=0 relocation?
>>
>> That is only used for 32-bit fixups such as in RIP-relative instructions.
>>  The r_extern=0 case might occur if the instruction references something in
>> a section that has no symbols.  The JIT would need to do an analogous
>> update of adding to the fixup location the (32-bit signed) difference
>> between the final runtime address minus the object file address of the
>> start of the section containing the thing being referenced by the RIP
>> relative instruction.
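
A rough C sketch of the RIP-relative update described above, for reference. The function name and parameters are hypothetical (not an LLVM or ld64 API). The fixup_slide term is my own assumption, added for the case where the section holding the instruction also moves; passing 0 for it gives exactly the update as described. The range check is there because the result has to fit back into a signed 32-bit displacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adjust a 32-bit RIP-relative displacement after sections have moved.
   target_slide: final minus object-file address of the referenced section.
   fixup_slide:  same for the section containing the instruction
                 (assumption of this sketch; use 0 if it did not move).
   Returns false if the new displacement does not fit in 32 signed bits. */
static bool slide_rip_disp32(int32_t disp, int64_t target_slide,
                             int64_t fixup_slide, int32_t *out) {
    assert(out != NULL);
    int64_t v = (int64_t)disp + target_slide - fixup_slide;
    if (v < INT32_MIN || v > INT32_MAX)
        return false;
    *out = (int32_t)v;
    return true;
}
```

For example, a displacement of 0x100 with a target slide of 0x2000 and no fixup slide becomes 0x2100; a target slide of 4 GiB is rejected.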
>>
>>
> Ok, a non-external X86_64_RELOC_SIGNED doesn't make sense then since the
> address would always be positive so the unsigned variant could just be
> used.
>
>
>>
>> On Mon, Jun 9, 2014 at 8:50 PM, Keno Fischer <
>> kfischer at college.harvard.edu> wrote:
>>
>>> Thank you for the explanation. Does that mean r_symbolnum is basically
>>> redundant in that case?
>>>
>> It usually is not needed.  The r_symbolnum (which is the section index
>> when r_extern=0) is needed when the target of the relocation is the start
>> or end of a section.  For instance if section __foo ends at address 0x300
>> and section __bar starts at address 0x300 and the fixup location content
>> points to 0x300, you don’t know which section it is pointing to without
>> that r_symbolnum.  The sections may be split apart in the final execution
>> layout, so which section it is referencing is important in that edge case.
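
The boundary case can be sketched in C as follows. Everything here is illustrative: struct sect is a made-up loader record rather than a Mach-O structure, the 0x200 start of __foo is an assumed value, and I am treating r_symbolnum as a 1-based section ordinal when r_extern=0, which is my reading of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-section record kept by a loader; not a Mach-O struct. */
struct sect {
    uint64_t obj_addr;   /* address in the object file */
    uint64_t size;       /* size in bytes */
    uint64_t final_addr; /* address chosen for execution */
};

/* Count sections whose closed range [start, end] contains addr.
   A result above 1 means an address lookup alone is ambiguous. */
static size_t candidates(const struct sect *sects, size_t nsects,
                         uint64_t addr) {
    size_t n = 0;
    for (size_t i = 0; i < nsects; i++)
        if (addr >= sects[i].obj_addr &&
            addr <= sects[i].obj_addr + sects[i].size)
            n++;
    return n;
}

/* Slide a stored address using the section named by r_symbolnum
   (assumed 1-based) instead of guessing from the address. */
static uint64_t slide_by_ordinal(const struct sect *sects, size_t nsects,
                                 uint32_t r_symbolnum, uint64_t stored) {
    assert(r_symbolnum >= 1 && r_symbolnum <= nsects);
    const struct sect *s = &sects[r_symbolnum - 1];
    return stored + (s->final_addr - s->obj_addr);
}
```

With __foo at 0x200..0x300 moved to 0x1000 and __bar at 0x300..0x400 moved to 0x8000, the stored value 0x300 matches both sections by address. Ordinal 1 resolves it to 0x1100 (the end of __foo) and ordinal 2 to 0x8000 (the start of __bar).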
>>
>>
>>
> Ah, hadn't considered that edge case, thanks!
>
>>  Also, let me ask you how to handle the following use case which is
>>> somewhat related. Currently in MCJIT for MachO we are relocating all the
>>> debug sections. Eventually (as ELF does), it would be good to avoid this.
>>> However, this means that the debugger would have to handle relocations (as
>>> lldb currently does for ELF). With this scheme it seems impossible to me to
>>> adjust the vaddr of one section without adjusting the relocations that
>>> point at it. Is my interpretation of that correct? I guess the best we can
>>> do then is to do the relocations inline in the original copy of the object
>>> file.
>>>
>> In darwin tools, we leave the debug info in the .o file.  lldb can find
>> it there if it needs it.  To aid that, the linker generates “debug notes”
>> in the final linked image which contain the paths of the original .o files.
>>  These are STABS N_OSO symbol table entries.   Can you just ignore (not
>> copy to execution space) the DWARF debug sections in MCJIT for darwin?
>>
>>
> The way this works in ELF is that the vaddr in the object header is
> adjusted to the vaddr of the relocated section. I mirrored this approach in
> my pending patch to add MachO support (i.e. adjusting
> (section_(64))->addr). This means that if we don't relocate the debug
> section (i.e. don't copy it) then we'll have lost the information where the
> section used to be. I am now wondering if there is a better approach. Maybe
> by modifying (section_(64))->offset instead?
>
>
>> -Nick
>>
>>
>>
>>> Also, I'm not sure who at Apple does documentation, but would it be
>>> possible to include the gist of your response in the reference
>>> documentation? It's basically impossible to discern the semantics just from
>>> what's written there.
>>>
>>>
>>> On Mon, Jun 9, 2014 at 7:19 PM, Nick Kledzik <kledzik at apple.com> wrote:
>>>
>>>>
>>>> On Jun 8, 2014, at 8:59 PM, Keno Fischer <kfischer at college.harvard.edu>
>>>> wrote:
>>>>
>>>> > Hello everybody,
>>>> >
>>>> > I would like some insights on the semantics of the
>>>> X86_64_RELOC_UNSIGNED relocation type. When r_extern=1, the semantics seem
>>>> pretty clear:
>>>> >
>>>> > Let x be a pointer to r_offset of appropriate size given by r_size,
>>>> then
>>>> > *x += addr_of_symbol(r_symbolnum)
>>>> >
>>>> > However, when r_extern=0 the correct behavior is not clear. By
>>>> analogy with the above, I would have expected
>>>> >
>>>> > *x += addr_of_section(r_symbolnum)
>>>> >
>>>> > but what LLVM implements is different. In RTDyld it implements
>>>> >
>>>> > *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)
>>>> >
>>>> > or equivalently
>>>> >
>>>> > *x = *x
>>>> In ld64 relocations are parsed into “Fixups”.  A Fixup is a location to
>>>> fix up and a value/expression of what to set it to.  All sections are
>>>> parsed up into “atoms”.  A location is an atom and an offset (within the
>>>> atom).  The expression for a fixup is a target atom and optional addend
>>>> (e.g. &foo + 10).
>>>>
>>>> For X86_64_RELOC_UNSIGNED when r_extern=1, the location is the atom
>>>> containing the r_address (offset in the section), and the expression is the
>>>> atom corresponding to r_symbolnum plus the addend that is the current
>>>> content of the location.  In the JIT case where you are trying to prepare an
>>>> object file for execution, that boils down to adding the final address of
>>>> the r_symbolnum atom to the current content (addend) in the fixup location.
>>>>
>>>> For X86_64_RELOC_UNSIGNED when r_extern=0, the fixup location is the
>>>> atom containing the r_address (offset in the section), and the expression
>>>> is whatever atom+offset the current contents of location points to in that
>>>> object file.  In the JIT case, that boils down to adjusting the location by
>>>> the amount the target atom slid from its address in the object file to its
>>>> final address for execution.  For instance, if the location contains
>>>> 0x00000218 which points into section __DATA,__data (0x200 thru 0x280) and
>>>> the __data section winds up at address 0x100001000 at runtime, then the
>>>> location needs to have 0x100000E00 added to it (0x100001000 - 0x200).
>>>>
>>>> -Nick
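
To make the two cases concrete, here is a minimal C sketch of the JIT-side arithmetic as I understand it from the explanation above. The function names and the byte-width parameter are made up for illustration (they are not RuntimeDyld or ld64 APIs), and the in-place patch assumes a little-endian host, as on x86_64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* r_extern=1: the location holds an addend; add the symbol's final address. */
static uint64_t unsigned_extern(uint64_t addend, uint64_t symbol_final_addr) {
    return addend + symbol_final_addr;
}

/* r_extern=0: the location holds an object-file address; add the slide of
   the section it points into (final address minus object-file address). */
static uint64_t unsigned_local(uint64_t stored, uint64_t sect_obj_addr,
                               uint64_t sect_final_addr) {
    return stored + (sect_final_addr - sect_obj_addr);
}

/* Patch a 4- or 8-byte fixup location in place for the r_extern=0 case.
   Assumes a little-endian host, so the low `width` bytes of v hold the value. */
static void patch_unsigned_local(uint8_t *loc, unsigned width,
                                 uint64_t sect_obj_addr,
                                 uint64_t sect_final_addr) {
    assert(width == 4 || width == 8);
    uint64_t v = 0;
    memcpy(&v, loc, width);
    v = unsigned_local(v, sect_obj_addr, sect_final_addr);
    memcpy(loc, &v, width);
}
```

With the numbers from the example, unsigned_local(0x218, 0x200, 0x100001000) gives 0x100001018, which is 0x218 plus the 0x100000E00 slide.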
>>>>
>>>>
>>>> >
>>>> > i.e. a noop. This works because llvm codegen also emits the absolute
>>>> value of the address. I am unsure what is intended and would appreciate
>>>> some clarification. A couple of points to consider:
>>>> >
>>>> > 1. I checked ld64 and as far as I can tell it doesn't consider
>>>> non-external X86_64_RELOC_UNSIGNED but does *x +=
>>>> addr_of_symbol(r_symbolnum) regardless. That seems like a bug in ld64 to me
>>>> because other relocations in the same switch statement do check r_extern.
>>>> >
>>>> > 2. I implemented *x += addr_of_section(r_symbolnum) in LLVM and all
>>>> tests pass just fine
>>>> >
>>>> > 3. If the current implementation is correct, r_symbolnum (and
>>>> potentially the entire relocation) is basically meaningless, which could of
>>>> course be correct, but which is what originally caused me to look at this.
>>>> If so I'd appreciate an explanation as to why we need to have the
>>>> relocation in the first place.
>>>> >
>>>> > That's all I could find on the subject. I hope somebody else knows
>>>> more than I.
>>>> >
>>>> > Thanks,
>>>> > Keno
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > LLVM Developers mailing list
>>>> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>>
>>>
>>
>>
>