[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

Kuperstein, Michael M michael.m.kuperstein at intel.com
Tue Mar 12 04:42:37 PDT 2013


Hi David,

Sorry, I kind of dropped the ball on this one, got sucked into a different project, and didn't have time to touch it.
And no, there's no bug reference, unfortunately... thought I was going to fix this on the spot. :-\

Anyway, yes, it looks related. IndirectSymBase gets built incorrectly when the symbols of the same type are not contiguous in the input. My bug is one instance of this; yours seems to be another.

Michael

-----Original Message-----
From: David Fang [mailto:fang at csl.cornell.edu] 
Sent: Tuesday, March 12, 2013 10:14
To: Daniel Dunbar
Cc: Kuperstein, Michael M; llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

Hi Michael and Daniel,
I may be a little late to the party, but is this the same issue I'm encountering in:
http://llvm.org/bugs/show_bug.cgi?id=14636
comments 26 to 34?  I only found this thread just now.
(Is there a bug reference for this in the database?)

Fang

> Ok, here is what I think the right fix is. Instead of creating the 
> IndirectSymBase mapping we use to associate sections with their 
> indirect offset start, in BindIndirectSymbols() we should:
>
> 1. Add a simple container struct (let's say MachSectionIndirectSymbols) 
> for tracking the per-section list of indirect symbols. It will keep 
> the list of symbols in the section and the index of the first indirect 
> symbol in the section.
>
> 2. Keep a mapping from sections to the above type.
>
> 3. Add a SetVector to record the order of the sections that have 
> indirect symbols.
>
> 4. During BindIndirectSymbols() we maintain the above information 
> (populating the MachSectionIndirectSymbols per-section symbol arrays).
>
> 5. Once we have scanned all the symbols we make another pass over the 
> sections (in the order seen via indirect symbols) and assign the start 
> indices.
>
> 6. Update writing of the indirect symbol table to write in the same 
> order as traversed in #5.
>
> Does that make sense? It's more work than your patch but it (a) should 
> preserve binary compatibility with 'as' in situations where the 
> indirect symbols don't appear out of order w.r.t. the sections, and (b) 
> makes somewhat more explicit the relationship between sections and 
> their lists of symbols being in contiguous order.
>
> - Daniel
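
The six steps above could be sketched roughly as follows. This is a standalone illustration using standard containers, not a patch against the MachO writer: a `std::map` plus a `std::vector` stands in for the SetVector, sections are modeled as plain strings, and every name other than MachSectionIndirectSymbols and BindIndirectSymbols is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the assembler's IndirectSymbols list.
struct IndirectSymbol {
  std::string Name;
  std::string Section;
};

// Step 1: per-section container (name taken from the proposal).
struct MachSectionIndirectSymbols {
  std::vector<std::string> Symbols; // symbols of this section, in input order
  std::size_t FirstIndex = 0;       // index of the section's first table entry
};

// Hypothetical result type bundling the data from steps 2, 3 and 6.
struct IndirectSymbolLayout {
  std::map<std::string, MachSectionIndirectSymbols> BySection; // step 2
  std::vector<std::string> SectionOrder; // step 3: first-seen section order
  std::vector<std::string> Table;        // step 6: table as it would be written
};

IndirectSymbolLayout
bindIndirectSymbols(const std::vector<IndirectSymbol> &Input) {
  IndirectSymbolLayout L;

  // Step 4: scan all symbols and populate the per-section arrays.
  for (const IndirectSymbol &S : Input) {
    auto Inserted =
        L.BySection.emplace(S.Section, MachSectionIndirectSymbols{});
    if (Inserted.second) // first time this section is seen
      L.SectionOrder.push_back(S.Section);
    Inserted.first->second.Symbols.push_back(S.Name);
  }

  // Steps 5 and 6: walk sections in first-seen order, assign each start
  // index, and append that section's symbols contiguously to the table.
  for (const std::string &Name : L.SectionOrder) {
    MachSectionIndirectSymbols &Sec = L.BySection[Name];
    Sec.FirstIndex = L.Table.size();
    L.Table.insert(L.Table.end(), Sec.Symbols.begin(), Sec.Symbols.end());
  }
  assert(L.Table.size() == Input.size());
  return L;
}
```

For the input order (foo, func, bar) this yields the table [foo, bar, func], with the non-lazy pointer section starting at index 0 and the stubs section at index 2. When the input already groups symbols by section, the table equals the input order, which is what point (a) relies on.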
>
>
>
> On Tue, Feb 5, 2013 at 5:55 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>
>> Hi Michael,
>>
>> I'll try and take a look at this tomorrow. Unfortunately I've paged 
>> out the details of the indirect symbol handling so it might take me a 
>> bit to figure out the right answer. The kooky nature of that code 
>> comes from trying to keep exact binary compatibility with 'as' for 
>> testing purposes, but we are long past the point where that is 
>> necessary. It may be that the right fix is to reorganize the code to 
>> do what is natural. However, I'll take a closer look and see if 
>> anything else comes to mind.
>>
>>  - Daniel
>>
>>
>> On Mon, Feb 4, 2013 at 7:49 AM, Kuperstein, Michael M < 
>> michael.m.kuperstein at intel.com> wrote:
>>
>>> Ping?
>>>
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:
>>> llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Kuperstein, Michael M
>>> Sent: Wednesday, January 30, 2013 16:18
>>> To: llvm-commits at cs.uiuc.edu
>>> Subject: [PATCH] [MachO] MachOWriter generates bad indirect symbol 
>>> tables when sections are split
>>>
>>> Hi,
>>>
>>> I just ran into a bug in the MachOWriter, and I'm not 100% certain 
>>> what the fix should look like.
>>> The basic scenario is an .s file which has a split 
>>> .non_lazy_symbol_pointer section, e.g. something like this:
>>>
>>> .non_lazy_symbol_pointer
>>> L_foo$non_lazy_ptr:
>>>     .indirect_symbol _foo
>>>     .long 0
>>> .section __IMPORT,__jump_table, symbol_stubs,self_modifying_code,5
>>> L_func:
>>>     .indirect_symbol _func
>>>     .byte 0x00, 0x00, 0x00, 0x00, 0x00
>>> .non_lazy_symbol_pointer
>>> L_bar$non_lazy_ptr:
>>>     .indirect_symbol _bar
>>>     .long 0
>>>
>>> The three symbols are collected into the IndirectSymbols list of the 
>>> MCAssembler, in their original order (foo, func, bar).
>>> When a MachO object file is generated, two sections are created, one 
>>> of type S_NON_LAZY_SYMBOL_POINTERS and one of type S_SYMBOL_STUBS.
>>> Each of the section descriptors contains an index into the indirect 
>>> symbol table which signifies the start of the (sequential!) list of 
>>> symbols that belong to this section. This index is determined based 
>>> on the first (in IndirectSymbols) symbol that belongs to this 
>>> section. The symbols are then emitted into the object in their order 
>>> in IndirectSymbols. So for the snippet above, the result will be:
>>>
>>> Indirect symbol table:
>>> [foo, func, bar]
>>>
>>> S_NON_LAZY_SYMBOL_POINTERS: Symbols start at index 0, and it has 2 
>>> symbols. (So the symbols for this section are "foo, func" instead of 
>>> "foo, bar".)
>>> S_SYMBOL_STUBS: Symbols start at index 1, and it has 1 symbol. (So 
>>> the only symbol for this section is, correctly, "func".)
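
To make the failure concrete, here is a minimal standalone C++ sketch of the indexing scheme described above. It is not the actual writer code: `IndirectSymbol`, `firstSeenStartIndices` and `symbolsSeenByReader` are hypothetical names, and sections are modeled as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the assembler's IndirectSymbols list.
struct IndirectSymbol {
  std::string Name;
  std::string Section;
};

// Models the scheme described above: a section's start index is the position
// of its first symbol in the global list.
std::map<std::string, std::size_t>
firstSeenStartIndices(const std::vector<IndirectSymbol> &Symbols) {
  std::map<std::string, std::size_t> Start;
  for (std::size_t I = 0; I != Symbols.size(); ++I)
    Start.emplace(Symbols[I].Section, I); // emplace keeps the first index seen
  return Start;
}

// Returns the names a consumer of the object file would attribute to a
// section: Count consecutive table entries beginning at Start. The table is
// assumed to be emitted in the same order as Symbols.
std::vector<std::string>
symbolsSeenByReader(const std::vector<IndirectSymbol> &Symbols,
                    std::size_t Start, std::size_t Count) {
  assert(Start + Count <= Symbols.size());
  std::vector<std::string> Names;
  for (std::size_t I = Start; I != Start + Count; ++I)
    Names.push_back(Symbols[I].Name);
  return Names;
}
```

With the input order (foo, func, bar), the non-lazy pointer section gets start index 0 and holds 2 symbols, so a reader walks entries 0 and 1 and attributes func to it rather than bar.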
>>>
>>> I'm attaching a patch that generates the right output, but is 
>>> clearly not the right thing to do, design-wise.
>>> Can anyone familiar with MC/MachO advise regarding a better fix?
>>>
>>> (This patch also breaks some MachO tests that expect a very specific 
>>> structure from the output object files, I'll fix them with the final 
>>> patch)
>>>
>>> Thanks,
>>>    Michael
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material 
>>> for the sole use of the intended recipient(s). Any review or 
>>> distribution by others is strictly prohibited. If you are not the 
>>> intended recipient, please contact the sender and delete all copies.
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>
>>
>

--
David Fang
http://www.csl.cornell.edu/~fang/





