[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

Kuperstein, Michael M michael.m.kuperstein at intel.com
Tue Mar 12 23:53:16 PDT 2013


You're right, it's x86.
While clang is the system compiler, I'm not sure the integrated assembler is the system assembler (and I don't have an Apple machine next to me right now to check).
Also - and this is pure conjecture, based only on the fact that this hasn't been caught yet - it's possible that clang itself doesn't normally generate anything with unordered symbols. I ran into the problem while trying to use clang's integrated assembler to assemble files generated by a different compiler.

Michael

-----Original Message-----
From: David Fang [mailto:fang at csl.cornell.edu] 
Sent: Tuesday, March 12, 2013 20:01
To: Kuperstein, Michael M
Cc: Daniel Dunbar; llvm-commits at cs.uiuc.edu
Subject: RE: [PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

Hi Michael,
 	No worries, I'm just glad I'm not alone on this issue -- increases the likelihood of it getting fixed!  Are you running on x86 or ARM + mach-o?  (assumed x86 b/c of Intel)  I'm wondering why this problem hasn't been seen earlier, seeing that clang has been the system compiler for Apple for quite some time.  I was working on the PPC/mach-o backend when I ran into this.  I'm a little afraid to hack the pieces that are common to all architectures in MachObjectWriter, as I assume they've been working for !PPC, and that the problems I run into are PPC-specific.
 	Anyways, I think I understand the gist of Daniel's suggestion for the right fix.  I could take a crack at it in my spare time, but I'd much prefer that someone who already has intimate knowledge of the code write it, and I'd be happy to test.  :D

Fang

> Hi David,
>
> Sorry, I kind of dropped the ball on this one, got sucked into a different project, and didn't have time to touch it.
> And no, there's no bug reference, unfortunately... thought I was going 
> to fix this on the spot. :-\
>
> Anyway, yes, it looks related. IndirectSymBase gets built incorrectly when not all symbols of the same type are in a sequence in the input. My bug is one instance of this, yours seems to be another.
>
> Michael
>
> -----Original Message-----
> From: David Fang [mailto:fang at csl.cornell.edu]
> Sent: Tuesday, March 12, 2013 10:14
> To: Daniel Dunbar
> Cc: Kuperstein, Michael M; llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH] [MachO] MachOWriter generates bad indirect symbol 
> tables when sections are split
>
> Hi Michael and Daniel,
> 	I may be a little late to the party, but is this the same issue I'm encountering in:
> http://llvm.org/bugs/show_bug.cgi?id=14636
> comments 26 to 34?  I only found this thread just now.
> (Is there a bug reference for this in the database?)
>
> Fang
>
>> Ok, here is what I think the right fix is. Instead of creating the 
>> IndirectSymBase mapping we use to associate sections with their 
>> indirect offset start, in BindIndirectSymbols() we should:
>>
>> 1. Add a simple container struct (let's say 
>> MachSectionIndirectSymbols) for tracking the per-section list of 
>> indirect symbols. It will keep the list of symbols in the section and 
>> the index of the first indirect symbol in the section.
>>
>> 2. Keep a mapping from sections to the above type.
>>
>> 3. Add a SetVector to record the order of the sections that have 
>> indirect symbols.
>>
>> 4. During BindIndirectSymbols() we maintain the above information 
>> (populating the MachSectionIndirectSymbols per-section symbol arrays).
>>
>> 5. Once we have scanned all the symbols we make another pass over the 
>> sections (in the order seen via indirect symbols) and assign the 
>> start indices.
>>
>> 6. Update writing of the indirect symbol table to write in the same 
>> order as traversed in #5.
>>
>> Does that make sense? It's more work than your patch but it (a) 
>> should preserve binary compatibility with 'as' in situations where 
>> the indirect symbols don't appear out of order w.r.t. the sections, 
>> (b) it makes somewhat more explicit the relationship between 
>> sections and their list of symbols being in contiguous order.
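
The six steps above can be sketched in a small model. This is only an illustration in Python, not the MachObjectWriter code: SectionIndirectSymbols, bind_indirect_symbols and write_indirect_symbol_table are hypothetical names standing in for the proposed MachSectionIndirectSymbols struct and BindIndirectSymbols(), and an insertion-ordered dict is assumed to play the role of both the section mapping and the SetVector.

```python
from dataclasses import dataclass, field


@dataclass
class SectionIndirectSymbols:
    """Step 1 (hypothetical name): per-section symbols and first index."""
    symbols: list = field(default_factory=list)
    first_index: int = 0


def bind_indirect_symbols(indirect_symbols):
    """Take (section, symbol) pairs in source order; return per-section records."""
    # Steps 2 and 3: a dict keeps sections in the order they were first seen.
    per_section = {}
    # Step 4: populate the per-section symbol lists.
    for section, symbol in indirect_symbols:
        per_section.setdefault(section, SectionIndirectSymbols()).symbols.append(symbol)
    # Step 5: second pass over the sections to assign the start indices.
    next_index = 0
    for record in per_section.values():
        record.first_index = next_index
        next_index += len(record.symbols)
    return per_section


def write_indirect_symbol_table(per_section):
    """Step 6: emit the symbols section by section, in the step 5 order."""
    return [sym for record in per_section.values() for sym in record.symbols]


# Split-section input from the original report: foo, func, bar.
split = [("non_lazy", "foo"), ("stubs", "func"), ("non_lazy", "bar")]
fixed_layout = bind_indirect_symbols(split)
fixed_table = write_indirect_symbol_table(fixed_layout)
print(fixed_table)                          # ['foo', 'bar', 'func']
print(fixed_layout["non_lazy"].first_index)  # 0
print(fixed_layout["stubs"].first_index)     # 2

# Input that is already grouped by section keeps its source order (point (a)).
ordered = [("non_lazy", "foo"), ("non_lazy", "bar"), ("stubs", "func")]
ordered_table = write_indirect_symbol_table(bind_indirect_symbols(ordered))
print(ordered_table)                        # ['foo', 'bar', 'func']
```

With this layout each section's symbols are contiguous in the table, so the start index plus the section's entry count always selects the right symbols.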
>>
>> - Daniel
>>
>>
>>
>> On Tue, Feb 5, 2013 at 5:55 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>>
>>> Hi Michael,
>>>
>>> I'll try and take a look at this tomorrow. Unfortunately I've paged 
>>> out the details of the indirect symbol handling so it might take me 
>>> a bit to figure out the right answer. The kooky nature of that code 
>>> comes from trying to keep exact binary compatibility with 'as' for 
>>> testing purposes, but we are long past the point where that is 
>>> necessary. It may be the right fix is to reorganize the code to do 
>>> what is natural. However, I'll take a closer look and see if anything else comes to mind.
>>>
>>>  - Daniel
>>>
>>>
>>> On Mon, Feb 4, 2013 at 7:49 AM, Kuperstein, Michael M < 
>>> michael.m.kuperstein at intel.com> wrote:
>>>
>>>> Ping?
>>>>
>>>> -----Original Message-----
>>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Kuperstein, Michael M
>>>> Sent: Wednesday, January 30, 2013 16:18
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Subject: [PATCH] [MachO] MachOWriter generates bad indirect symbol 
>>>> tables when sections are split
>>>>
>>>> Hi,
>>>>
>>>> I just ran into a bug in the MachOWriter, and I'm not 100% certain 
>>>> what the fix should look like.
>>>> The basic scenario is an .s file which has a split 
>>>> .non_lazy_symbol_pointer section, e.g. something like this:
>>>>
>>>> .non_lazy_symbol_pointer
>>>> L_foo$non_lazy_ptr:
>>>>     .indirect_symbol _foo
>>>>     .long 0
>>>> .section __IMPORT,__jump_table,symbol_stubs,self_modifying_code,5
>>>> L_func:
>>>>     .indirect_symbol _func
>>>>     .byte 0x00, 0x00, 0x00, 0x00, 0x00
>>>> ## Switch back, so the non-lazy pointer section is split in two.
>>>> .non_lazy_symbol_pointer
>>>> L_bar$non_lazy_ptr:
>>>>     .indirect_symbol _bar
>>>>     .long 0
>>>>
>>>> The three symbols are collected into the IndirectSymbols list of 
>>>> the MCAssembler, in their original order (foo, func, bar).
>>>> When a MachO object file is generated, two sections are created, 
>>>> one of type S_NON_LAZY_SYMBOL_POINTERS and one of type S_SYMBOL_STUBS.
>>>> Each of the section descriptors contains an index into the indirect 
>>>> symbol table which signifies the start of the (sequential!) list of 
>>>> symbols that belong to this section. This index is determined based 
>>>> on the first (in IndirectSymbols) symbol that belongs to this 
>>>> section. The symbols are then emitted into the object in their 
>>>> order in IndirectSymbols. So for the snippet above, the result will be:
>>>>
>>>> Indirect symbol table:
>>>> [foo, func, bar]
>>>>
>>>> S_NON_LAZY_SYMBOL_POINTERS: Symbols start at index 0, and it has 2 
>>>> symbols. (So the symbols for this section are "foo, func" instead 
>>>> of "foo, bar".)
>>>> S_SYMBOL_STUBS: Symbols start at index 1, and it has 1 symbol. (So 
>>>> the only symbol for this section is - correctly - "func".)
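
The mismatch described above can be reproduced with a toy model. This is a sketch in Python, not the MachOWriter code: first_occurrence_layout and symbols_read_by_consumer are hypothetical helpers, and the per-section symbol count is passed around directly, whereas a real object file would derive it from the section size.

```python
NON_LAZY = "S_NON_LAZY_SYMBOL_POINTERS"
STUBS = "S_SYMBOL_STUBS"

# IndirectSymbols in source order for the snippet above: foo, func, bar.
indirect_symbols = [(NON_LAZY, "foo"), (STUBS, "func"), (NON_LAZY, "bar")]


def first_occurrence_layout(symbols):
    """Emit the table in source order; start index = first symbol seen."""
    table = [name for _, name in symbols]
    start, count = {}, {}
    for index, (section, _) in enumerate(symbols):
        start.setdefault(section, index)  # only the first occurrence sticks
        count[section] = count.get(section, 0) + 1
    return table, {s: (start[s], count[s]) for s in start}


def symbols_read_by_consumer(table, layout):
    """Slice the table as a reader of the section descriptors would."""
    return {s: table[begin:begin + n] for s, (begin, n) in layout.items()}


table, layout = first_occurrence_layout(indirect_symbols)
seen = symbols_read_by_consumer(table, layout)
print(table)           # ['foo', 'func', 'bar']
print(seen[NON_LAZY])  # ['foo', 'func'], not the intended ['foo', 'bar']
print(seen[STUBS])     # ['func']
```

The start indices are individually plausible; the problem is that the table order does not keep each section's symbols contiguous.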
>>>>
>>>> I'm attaching a patch that generates the right output, but is 
>>>> clearly not the right thing to do, design-wise.
>>>> Can anyone familiar with MC/MachO advise regarding a better fix?
>>>>
>>>> (This patch also breaks some MachO tests that expect a very 
>>>> specific structure from the output object files; I'll fix them with 
>>>> the final patch.)
>>>>
>>>> Thanks,
>>>>    Michael
>>>> ---------------------------------------------------------------------
>>>> Intel Israel (74) Limited
>>>>
>>>> This e-mail and any attachments may contain confidential material 
>>>> for the sole use of the intended recipient(s). Any review or 
>>>> distribution by others is strictly prohibited. If you are not the 
>>>> intended recipient, please contact the sender and delete all copies.
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>
>
> --
> David Fang
> http://www.csl.cornell.edu/~fang/
>

--
David Fang
http://www.csl.cornell.edu/~fang/