[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

David Fang fang at csl.cornell.edu
Tue Mar 12 01:14:19 PDT 2013


Hi Michael and Daniel,
 	I may be a little late to the party, but is this the same issue 
I'm encountering in:
http://llvm.org/bugs/show_bug.cgi?id=14636
comments 26 to 34?  I only found this thread just now.
(Is there a bug reference for this in the database?)

Fang

> Ok, here is what I think the right fix is. Instead of creating the
> IndirectSymBase mapping we use to associate sections with their indirect
> offset start, in BindIndirectSymbols() we should:
>
> 1. Add a simple container struct (let's say MachSectionIndirectSymbols) for
> tracking the per-section list of indirect symbols. It will keep the list of
> symbols in the section and the index of the first indirect symbol in the
> section.
>
> 2. Keep a mapping from sections to the above type.
>
> 3. Add a SetVector to record the order of the sections that have indirect
> symbols.
>
> 4. During BindIndirectSymbols() we maintain the above information
> (populating the MachSectionIndirectSymbols per-section symbol arrays).
>
> 5. Once we have scanned all the symbols we make another pass over the
> sections (in the order seen via indirect symbols) and assign the start
> indices.
>
> 6. Update writing of the indirect symbol table to write in the same order
> as traversed in #5.
>
> Does that make sense? It's more work than your patch, but (a) it should
> preserve binary compatibility with 'as' in situations where the indirect
> symbols don't appear out of order w.r.t. the sections, and (b) it makes
> the relationship between sections and their contiguous lists of symbols
> more explicit.
>
> - Daniel
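The six steps above can be sketched in standalone C++ as follows. This is a hypothetical model, not the actual MachObjectWriter change: sections are identified by name strings, a plain std::vector guarded by the map's insertion result stands in for llvm::SetVector, and IndirectSym, IndirectSymbolLayout and layoutIndirectSymbols are illustrative names. Only the MachSectionIndirectSymbols name comes from the proposal itself.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the assembler's IndirectSymbols
// list: the symbol name and the section it was declared in.
struct IndirectSym {
  std::string Symbol;
  std::string Section;
};

// Step 1: per-section list of indirect symbols plus the index of the
// section's first entry in the final indirect symbol table.
struct MachSectionIndirectSymbols {
  std::vector<std::string> Symbols;
  std::size_t StartIndex = 0;
};

struct IndirectSymbolLayout {
  // Step 2: section -> its indirect symbols.
  std::map<std::string, MachSectionIndirectSymbols> PerSection;
  // Step 3: sections in the order they first received an indirect symbol
  // (a plain vector standing in for llvm::SetVector).
  std::vector<std::string> SectionOrder;
  // Step 6: the indirect symbol table as it would be written out.
  std::vector<std::string> Table;
};

IndirectSymbolLayout
layoutIndirectSymbols(const std::vector<IndirectSym> &Syms) {
  IndirectSymbolLayout L;
  // Step 4: a single pass over the symbols, bucketing them by section and
  // recording each section the first time it is seen.
  for (const IndirectSym &S : Syms) {
    auto Inserted =
        L.PerSection.emplace(S.Section, MachSectionIndirectSymbols());
    if (Inserted.second)
      L.SectionOrder.push_back(S.Section);
    Inserted.first->second.Symbols.push_back(S.Symbol);
  }
  // Step 5: assign start indices, walking sections in first-seen order.
  std::size_t Next = 0;
  for (const std::string &Sec : L.SectionOrder) {
    MachSectionIndirectSymbols &Entry = L.PerSection[Sec];
    Entry.StartIndex = Next;
    Next += Entry.Symbols.size();
  }
  // Step 6: emit the table in that same order, so every section's symbols
  // form one contiguous run starting at its StartIndex.
  for (const std::string &Sec : L.SectionOrder)
    for (const std::string &Name : L.PerSection[Sec].Symbols)
      L.Table.push_back(Name);
  return L;
}
```

With the symbols from the original report (_foo, _func, _bar, where _foo and _bar share the non-lazy pointer section), the table comes out as _foo, _bar, _func with start indices 0 and 2. When the input is already grouped by section, the output order equals the input order, which is what point (a) relies on for compatibility with 'as'.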
>
>
>
> On Tue, Feb 5, 2013 at 5:55 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>
>> Hi Michael,
>>
>> I'll try and take a look at this tomorrow. Unfortunately I've paged out
>> the details of the indirect symbol handling so it might take me a bit to
>> figure out the right answer. The kooky nature of that code comes from
>> trying to keep exact binary compatibility with 'as' for testing purposes,
>> but we are long past the point where that is necessary. It may be that the
>> right fix is to reorganize the code to do what is natural. However, I'll take a
>> closer look and see if anything else comes to mind.
>>
>>  - Daniel
>>
>>
>> On Mon, Feb 4, 2013 at 7:49 AM, Kuperstein, Michael M <
>> michael.m.kuperstein at intel.com> wrote:
>>
>>> Ping?
>>>
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:
>>> llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Kuperstein, Michael M
>>> Sent: Wednesday, January 30, 2013 16:18
>>> To: llvm-commits at cs.uiuc.edu
>>> Subject: [PATCH] [MachO] MachOWriter generates bad indirect symbol tables
>>> when sections are split
>>>
>>> Hi,
>>>
>>> I just ran into a bug in the MachOWriter, and I'm not 100% certain what
>>> the fix should look like.
>>> The basic scenario is an .s file which has a split
>>> .non_lazy_symbol_pointer section, e.g. something like this:
>>>
>>> .non_lazy_symbol_pointer
>>> L_foo$non_lazy_ptr:
>>>     .indirect_symbol _foo
>>>     .long 0
>>> .section __IMPORT,__jump_table,symbol_stubs,self_modifying_code,5
>>> L_func:
>>>     .indirect_symbol _func
>>>     .byte 0x00, 0x00, 0x00, 0x00, 0x00
>>> .non_lazy_symbol_pointer
>>> L_bar$non_lazy_ptr:
>>>     .indirect_symbol _bar
>>>     .long 0
>>>
>>> The three symbols are collected into the IndirectSymbols list of the
>>> MCAssembler, in their original order (foo, func, bar).
>>> When a MachO object file is generated, two sections are created, one of
>>> type S_NON_LAZY_SYMBOL_POINTERS and one of type S_SYMBOL_STUBS.
>>> Each of the section descriptors contains an index into the indirect
>>> symbol table which marks the start of the (sequential!) list of symbols
>>> that belong to this section. This index is the position, in
>>> IndirectSymbols, of the first symbol that belongs to this section. The
>>> symbols are then emitted into the object in their IndirectSymbols order.
>>> So for the snippet above, the result will be:
>>>
>>> Indirect symbol table:
>>> [foo, func, bar]
>>>
>>> S_NON_LAZY_SYMBOL_POINTERS: Symbols start at index 0, and it has 2
>>> symbols. (So the symbols for this section are "foo, func" instead of "foo,
>>> bar")
>>> S_SYMBOL_STUBS: Symbols start at index 1, and it has 1 symbol. (So the
>>> only symbol for this section is - correctly - "func")
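As an illustration only, the scheme described above can be modelled in a few lines of standalone C++. The types and functions here (IndirectSym, SectionRange, computeRangesNaive, symbolsSeenBy) are hypothetical and not taken from MachObjectWriter, and the section names __nl_symbol_ptr and __jump_table are assumed labels for the two sections in the snippet. The sketch assumes, as the message describes, that each section's start index is the position of its first indirect symbol and that the table is written in IndirectSymbols order; the per-section count is simply tallied, standing in for the count a consumer derives from the section's size.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an entry of the assembler's IndirectSymbols
// list: the indirect symbol's name plus the section it was declared in.
struct IndirectSym {
  std::string Symbol;
  std::string Section;
};

// What a consumer of the object file sees for one section: the start
// index into the indirect symbol table and the number of entries.
struct SectionRange {
  std::size_t Start;
  std::size_t Count;
};

// Models the current scheme: a section's start index is the position of
// its *first* symbol in IndirectSymbols, and the table is emitted in
// IndirectSymbols order. Sections are only described correctly if their
// symbols happen to be contiguous in that list.
std::map<std::string, SectionRange>
computeRangesNaive(const std::vector<IndirectSym> &Syms) {
  std::map<std::string, SectionRange> Ranges;
  for (std::size_t I = 0; I != Syms.size(); ++I) {
    auto It = Ranges.find(Syms[I].Section);
    if (It == Ranges.end())
      It = Ranges.emplace(Syms[I].Section, SectionRange{I, 0}).first;
    ++It->second.Count;
  }
  return Ranges;
}

// Reads back Count entries starting at Start from a table that was written
// in IndirectSymbols order, i.e. the symbols a consumer attributes to the
// section.
std::vector<std::string>
symbolsSeenBy(const std::vector<IndirectSym> &Syms, const SectionRange &R) {
  std::vector<std::string> Out;
  for (std::size_t I = R.Start; I != R.Start + R.Count; ++I)
    Out.push_back(Syms[I].Symbol);
  return Out;
}
```

Feeding it the three symbols in the order foo, func, bar (with foo and bar in the non-lazy pointer section) gives that section Start 0 and Count 2, so symbolsSeenBy returns _foo, _func instead of _foo, _bar, while the jump table section correctly sees only _func.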
>>>
>>> I'm attaching a patch that generates the right output, but is clearly not
>>> the right thing to do, design-wise.
>>> Can anyone familiar with MC/MachO advise regarding a better fix?
>>>
>>> (This patch also breaks some MachO tests that expect a very specific
>>> structure from the output object files; I'll fix them with the final patch.)
>>>
>>> Thanks,
>>>    Michael
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>>
>>
>>
>

-- 
David Fang
http://www.csl.cornell.edu/~fang/