[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

David Fang fang at csl.cornell.edu
Tue Mar 12 11:01:11 PDT 2013


Hi Michael,
 	No worries, I'm just glad I'm not alone on this issue -- it increases 
the likelihood of it getting fixed!  Are you running on x86 or ARM 
Mach-O?  (I assumed x86 because of Intel.)  I wonder why this problem 
hasn't been seen earlier, given that clang has been Apple's system compiler 
for quite some time.  I was working on the PPC/Mach-O backend 
when I ran into this.  I'm a little afraid to hack the pieces of 
MachObjectWriter that are common to all architectures, as I assume they've 
been working for !PPC, and that the problems I run into are PPC-specific.
 	Anyway, I think I understand the gist of Daniel's suggestion for 
the right fix.  I could take a crack at it in my spare time, but I'd much 
rather that someone who already has intimate knowledge of the code write it, 
and I'd be happy to test.  :D

Fang

> Hi David,
>
> Sorry, I kind of dropped the ball on this one, got sucked into a different project, and didn't have time to touch it.
> And no, there's no bug reference, unfortunately... thought I was going to fix this on the spot. :-\
>
> Anyway, yes, it looks related. IndirectSymBase gets built incorrectly when symbols of the same section type do not appear consecutively in the input. My bug is one instance of this; yours seems to be another.
>
> Michael
>
> -----Original Message-----
> From: David Fang [mailto:fang at csl.cornell.edu]
> Sent: Tuesday, March 12, 2013 10:14
> To: Daniel Dunbar
> Cc: Kuperstein, Michael M; llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split
>
> Hi Michael and Daniel,
> 	I may be a little late to the party, but is this the same issue I'm encountering in:
> http://llvm.org/bugs/show_bug.cgi?id=14636
> comments 26 to 34?  I only found this thread just now.
> (Is there a bug reference for this in the database?)
>
> Fang
>
>> Ok, here is what I think the right fix is. Instead of creating the
>> IndirectSymBase mapping we use to associate sections with their
>> indirect offset start, in BindIndirectSymbols() we should:
>>
>> 1. Add a simple container struct (let's say MachSectionIndirectSymbols)
>> for tracking the per-section list of indirect symbols. It will keep
>> the list of symbols in the section and the index of the first indirect
>> symbol in the section.
>>
>> 2. Keep a mapping from sections to the above type.
>>
>> 3. Add a SetVector to record the order of the sections that have
>> indirect symbols.
>>
>> 4. During BindIndirectSymbols() we maintain the above information
>> (populating the MachSectionIndirectSymbols per-section symbol arrays).
>>
>> 5. Once we have scanned all the symbols we make another pass over the
>> sections (in the order seen via indirect symbols) and assign the start
>> indices.
>>
>> 6. Update writing of the indirect symbol table to write in the same
>> order as traversed in #5.
>>
>> Does that make sense? It's more work than your patch, but it (a) should
>> preserve binary compatibility with 'as' in situations where the
>> indirect symbols don't appear out of order w.r.t. the sections, and (b)
>> makes somewhat more explicit the relationship between sections and
>> their lists of symbols being in contiguous order.
>>
>> - Daniel
>>
>>
>>
>> On Tue, Feb 5, 2013 at 5:55 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>>
>>> Hi Michael,
>>>
>>> I'll try and take a look at this tomorrow. Unfortunately I've paged
>>> out the details of the indirect symbol handling so it might take me a
>>> bit to figure out the right answer. The kooky nature of that code
>>> comes from trying to keep exact binary compatibility with 'as' for
>>> testing purposes, but we are long past the point where that is
>>> necessary. It may be the right fix is to reorganize the code to do
>>> what is natural. However, I'll take a closer look and see if anything else comes to mind.
>>>
>>>  - Daniel
>>>
>>>
>>> On Mon, Feb 4, 2013 at 7:49 AM, Kuperstein, Michael M <
>>> michael.m.kuperstein at intel.com> wrote:
>>>
>>>> Ping?
>>>>
>>>> -----Original Message-----
>>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:
>>>> llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Kuperstein, Michael M
>>>> Sent: Wednesday, January 30, 2013 16:18
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Subject: [PATCH] [MachO] MachOWriter generates bad indirect symbol
>>>> tables when sections are split
>>>>
>>>> Hi,
>>>>
>>>> I just ran into a bug in the MachOWriter, and I'm not 100% certain
>>>> what the fix should look like.
>>>> The basic scenario is an .s file which has a split
>>>> .non_lazy_symbol_pointer section, e.g. something like this:
>>>>
>>>> .non_lazy_symbol_pointer
>>>> L_foo$non_lazy_ptr:
>>>>      .indirect_symbol _foo
>>>>      .long 0
>>>> .section __IMPORT,__jump_table, symbol_stubs,self_modifying_code,5
>>>> L_func:
>>>>     .indirect_symbol _func
>>>>     .byte 0x00, 0x00, 0x00, 0x00, 0x00
>>>> .non_lazy_symbol_pointer
>>>> L_bar$non_lazy_ptr:
>>>>      .indirect_symbol _bar
>>>>      .long 0
>>>>
>>>> The three symbols are collected into the IndirectSymbols list of the
>>>> MCAssembler, in their original order (foo, func, bar).
>>>> When a MachO object file is generated, two sections are created, one
>>>> of type S_NON_LAZY_SYMBOL_POINTERS and one of type S_SYMBOL_STUBS.
>>>> Each of the section descriptors contains an index into the indirect
>>>> symbol table which signifies the start of the (sequential!) list of
>>>> symbols that belong to this section. This index is determined based
>>>> on the first (in IndirectSymbols) symbol that belongs to this
>>>> section. The symbols are then emitted into the object in their order
>>>> in IndirectSymbols. So for the snippet above, the result will be:
>>>>
>>>> Indirect symbol table:
>>>> [foo, func, bar]
>>>>
>>>> S_NON_LAZY_SYMBOL_POINTERS: Symbols start at index 0, and it has 2
>>>> symbols. (So the symbols for this section are "foo, func" instead of
>>>> "foo, bar")
>>>> S_SYMBOL_STUBS: Symbols start at index 1, and it has 1 symbol. (So
>>>> the only symbol for this section is - correctly - "func")
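
The mismatch described above can be reproduced with a toy model. This is a Python sketch, not LLVM code: the section names are plain labels, and the per-section symbol count is tracked directly, whereas a real reader would derive it from the section size.

```python
# Toy model of the behaviour described above; not LLVM code.
# Each entry is (section, symbol) in .indirect_symbol order.
indirect_symbols = [
    ("non_lazy_symbol_pointers", "_foo"),
    ("symbol_stubs", "_func"),
    ("non_lazy_symbol_pointers", "_bar"),
]


def buggy_layout(entries):
    """Start index = position of the section's first symbol in the list;
    the table itself is emitted in list order."""
    table = [symbol for _, symbol in entries]
    start = {}
    count = {}
    for index, (section, _) in enumerate(entries):
        start.setdefault(section, index)  # only the first occurrence counts
        count[section] = count.get(section, 0) + 1
    # A reader takes `count` consecutive entries beginning at `start`.
    return {s: table[start[s]:start[s] + count[s]] for s in start}


layout = buggy_layout(indirect_symbols)
print(layout["non_lazy_symbol_pointers"])  # ['_foo', '_func']
print(layout["symbol_stubs"])              # ['_func']
```
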
>>>>
>>>> I'm attaching a patch that generates the right output, but is
>>>> clearly not the right thing to do, design-wise.
>>>> Can anyone familiar with MC/MachO advise regarding a better fix?
>>>>
>>>> (This patch also breaks some MachO tests that expect a very specific
>>>> structure from the output object files; I'll fix them with the final
>>>> patch.)
>>>>
>>>> Thanks,
>>>>    Michael
>>>> ---------------------------------------------------------------------
>>>> Intel Israel (74) Limited
>>>>
>>>> This e-mail and any attachments may contain confidential material
>>>> for the sole use of the intended recipient(s). Any review or
>>>> distribution by others is strictly prohibited. If you are not the
>>>> intended recipient, please contact the sender and delete all copies.
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>
>
> --
> David Fang
> http://www.csl.cornell.edu/~fang/
>

-- 
David Fang
http://www.csl.cornell.edu/~fang/



