[PATCH] RuntimeDyldMachO should handle RIP-relative relocations correctly, and the corresponding tests should have coverage for this

Sat May 18 09:27:03 PDT 2013

On May 17, 2013, at 12:49 PM, Jim Grosbach <grosbach at apple.com> wrote:

> I’m really not sure that’s the right place to be making the PCRel adjustment. Since the value is target (and likely platform) dependent, why not handle it there?

It's also dependent on whether this is a local relocation, or an external one; something that resolveX86_64Relocation() and friends don't know.

> For example, this is done now in RuntimeDyldMachO::resolveARMRelocation().

No, this is *un*done in resolveARMRelocation(), and its friends for other hardware.  The point is that processRelocationRef() will leave an unadjusted Addend in some cases (the isExtern case when RelType != macho::RIT_X86_64_GOT) and will do some of the adjustment in other cases.

Once we have a RelocationEntry created by processRelocationRef(), the rest of the code expects there to be an adjustment by 4, or 8, bytes (depending on platform) for PC-relative relocations.

The bug that this patch fixes is that we were *not* adding that adjustment in processRelocationRef() in the case of local pc-relative relocations, and then resolveX86_64Relocation() and friends would undo an adjustment that hadn't been done, resulting in garbage.

> Won’t this result in the adjustment happening twice?

It's not happening twice.  It's happening and then unhappening.

> Likewise, there’s already code in resolveX86_64Relocation() to subtract 4 for PCRel values. What’s getting missed by these adjustments?

The current protocol by which processRelocationRef() and resolveBlahRelocation() involve the adjustment-by-4, but processRelocationRef() was forgetting to add that adjustment in one place.  This patch fixes that one place.

-Filip

> 
> On May 16, 2013, at 12:39 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> 
>> Guys,
>> 
>> I've rolled a new set of patches that addresses the differences on ARM, and I've made the small code model tests more resilient by giving a hard guarantee that all code and data will be "close enough".  I do this by creating a FixedPoolMemoryManager just for testing.
>> 
>> Here are the new patches:
>> 
>> <Mail Attachment>
>> <Mail Attachment>
>> 
>> -Filip
>> 
>> 
>> On May 15, 2013, at 10:21 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>> 
>>> I think that you’ve just not come across the right combination to expose the problem with small code model.  I’ve seen problems, so I know they’re out there.  See, for example, the current thread about TLS with MCJIT.
>> 
>> I understand.  But I don't like the idea of having LLVM generate *worse* code than our current JITs.
>> 
>> Also, these patches fix a bug in PC-relative relocations that is unrelated to whether or not the small code model works for WebKit.  Do you buy that the patches are right, or do you think that I'm actually regressing PC-relative behavior?
>> 
>>>  
>>> The x86-64 ABI says this about small code model:
>>>  
>>> “The virtual address of code executed is known at link time.  Additionally all symbols are known to be located in the virtual addresses in the range from 0 to 2^31 - 2^10 - 1.  This allows the compiler to encode symbolic references with offsets in the range from -2^31 to 2^10 directly in the sign extended immediate operands, with offsets in the range from 0 to 2^31 +2^10 in the zero extended immediate operands and use instruction pointer relative addressing for the symbols with offsets in the range -2^10 to 2^10.”
>>>  
>>> It makes my head hurt trying to parse that, but what I can tell you is that when you use small code model with static relocation model the MC code generator for x86-64 + ELF sometimes generates R_X86_64_32 relocations that resolve to 64-bit values needing to be written into 32-bit slots when the code isn’t loaded in the lower 2GB.
>> 
>> If LLVM doesn't currently have a way of being told that all code and data in a module will be within 2GB of each other, then we should fix it so that it does have that.  If all it takes is having a way of saying that we want small code model but without the assumption that everything ends up in the low 32-bit space, then I'd be happy to look into adding that.
>> 
>>>  
>>> -Andy
>>>  
>>> From: Filip Pizlo [mailto:fpizlo at apple.com] 
>>> Sent: Tuesday, May 14, 2013 5:21 PM
>>> To: Kaylor, Andrew
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [PATCH] RuntimeDyldMachO should handle RIP-relative relocations correctly, and the corresponding tests should have coverage for this
>>>  
>>>  
>>> On May 14, 2013, at 5:13 PM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>>> 
>>> 
>>> Hi Filip,
>>> 
>>> You've got a cut-and-paste error in the ArchSupportsMCJITWithSmallCodeModel function.  It's searching the old data structure.
>>>  
>>> Ooops!  I will fix.
>>> 
>>> 
>>> 
>>> More importantly, I would have said that MCJIT doesn't exactly support small code on x86_64.  Strictly speaking, MCJIT does support it, but it relies on the memory manager to allocate memory for code and data in the lower 2GB of user address space.  At least on Linux and FreeBSD, that doesn't usually happen unless you've done something special to make it happen.
>>>  
>>> I don't think that's true.  I'm never seeing it allocate anything in the low 2GB, and yet it works.  I think it just relies on code and data being close enough to each other.  Currently, SectionMemoryManager appears to make this be true.
>>>  
>>> WebKit is definitely going to use the small memory model on X86_64: we ensure that all code on X86_64 is always allocated within a common 1GB pool of virtual memory.  We currently don't have the notion of "data" in WebKit - at least not what LLVM calls data - since global variables in JavaScript will be GC-allocated by us, and the LLVM IR will refer to them directly (i.e. a pointer immediate in the IR itself).  But we will add support for this to ensure that constant pools work right - though this will require some other work, since currently RuntimeDyldMachO thinks that constant pools are code (since they are __text sections).  Once this is fixed though, we will then ensure that data sections we allocate for LLVM are in an adjacent 1GB pool.
>>>  
>>> This is what I'm trying to get right.  The current WebKit JITs benefit from things like never having to use anything bigger than a 32-bit offset to access anything JIT-related.  It would be a regression if LLVM couldn't do it; and it appears that LLVM's MCJIT can do it with the small code model already.
>>>  
>>> If you think it's better, I would be happy to hack up a RTDyldMemoryManager subclass that provides a strong guarantee that code and data are within 2GB of each other, and then use that for my tests.  The point is, I'd like to make sure that behavior that WebKit relies on, that LLVM happens to provide, is covered by tests inside LLVM.
>>>  
>>> -Filip
>>> 
>>> 
>>> 
>>> -Andy
>>> 
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Filip Pizlo
>>> Sent: Sunday, May 12, 2013 6:04 PM
>>> To: llvm-commits at cs.uiuc.edu
>>> Subject: [PATCH] RuntimeDyldMachO should handle RIP-relative relocations correctly, and the corresponding tests should have coverage for this
>>> 
>>> The small code model on X86_64 involves using RIP-relative relocations, particularly for the constant pool.
>>> 
>>> The RuntimeDyldMachO doesn't understand these correctly.  In particular, it doesn't understand what offset immediate will be used in a RIP-relative reference to a local symbol as in:
>>> 
>>>             .section            __TEXT,__literal8,8byte_literals
>>>             .align   3
>>> LABEL:
>>>     .quad <blah>
>>>             .section            __TEXT,__text,regular,pure_instructions
>>> # .... bunch of things things
>>> addsd LABEL(%rip), %xmm0
>>> 
>>> In MachO, the "LABEL(%rip)" will be transformed into an offset from the addsd instruction to where the compiler thought the __literal8 section will be placed (plus the offset of the immediate within the __literal8 section, but that offset is zero, in this example).  For example, the compiler may place the __literal8 section right after the __text section, in which case the offset immediate in the addsd instruction will correspond to the size of the __text section, minus the offset of the addsd instruction, and then plus whatever padding/alignment for the __literal8 section.
>>> 
>>> So, for example, if we then allocated the __literal8 section exactly after the __text section in the MCJIT, then we wouldn't want to change the offset in the addsd instruction.  More broadly, we would want to:
>>> 
>>> 1) subtract out the offset to the __literal8 section according to the MachO file (i.e. the offset from the referencing instruction - the addsd in this example - to the __literal8 section)
>>> 2) add in the true offset to the __literal8 section after both __text and __literal8 have addresses in target memory.
>>> 
>>> Note that after (1) we will have an offset from the start of  __literal8 to the constant that the code is actually referencing. Then after (2) we have the correct offset that the code should actually be using.
>>> 
>>> This allows for the __literal8 section to be placed anywhere we want, and also allows the compiler to convey specific offsets inside of constant pools.
>>> 
>>> This is different from other relocations; for example if we had used the large code model instead, then we would have a load immediate (for example "movabsq $LABEL, %rax"), where the $LABEL that is encoded in the MachO file will be an offset of the __literal8 section relative to the *start* of the module, rather than relative to the instruction that had the reference.
>>> 
>>> The RuntimeDyldMachO now only gets this part-way right.  It knows how to handle non-PC-relative relocations just fine.  For PC-relative ones, it knows how to do (2) above, but it doesn't understand how to do (1): it assumes that the code has an offset from the start of the module, rather than an offset from the referencing instruction.
>>> 
>>> The included patch fixes this bug.
>>> 
>>> I also include a separate patch that does some stuff to the tests, including adding coverage for small code model on x86_64.  It makes sense for a lot of JIT clients to use the small code model, and I believe it's good for LLVM to have test coverage for this.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130518/2cce863a/attachment.html>