[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

David Fang fang at csl.cornell.edu
Tue Mar 19 02:27:42 PDT 2013


Hi Daniel,
 	Thanks for reviewing the patch.  I thought I tried suggestion #1 
before this patch in http://llvm.org/bugs/show_bug.cgi?id=14636#c33, which 
only reordeed IndirectSymBase without reordering WriteObject, and didn't 
quite produce the desired result.  I may revisit it again.
 	If I reorder IndirectSymBase, I need to also reorder in 
WriteObject, so am I allowed to move the IndirectSymbol (mapped by 
section) and IndirectSymbolSection (order of first appearance) structures 
out of MCAssembler into MachObjectWriter?  I need the ordering information 
from the first pass of BindIndirectSymbols to persist until WriteObject, 
without having to sort twice.  This transformation is intended for X86,PPC 
(probably ARM as well) +mach-o back-ends after all.

Fang

> Hi David,
>
> Ok, here are my main high level comments on the patch. I have some specific
> nits too but I'll save them for later:
>
> #1. I would rather this patch not require modifying the MCAssembler level
> interface. Instead, what I would like is for the new data structures
> (IndirectSymbols and IndirectSymbolSections) to get built up inside
> BindIndirectSymbols().
>
> #2. I don't think this patch should actually cause test cases to change.
> Most of those test cases were generated based on matching existing 'as'
> output for x86, and the way I envisioned this patch working I wouldn't have
> expected it to change the order of symbols except in the case where the
> indirect symbols would be out of order.
>
> I believe that if you fix #1 then #2 will fall out, because the indirect
> symbol lists will get built up in the same order as they are now (based on
> iterating the indirect symbol list twice).
>
> - Daniel
>
>
> On Thu, Mar 14, 2013 at 10:26 AM, David Fang <fang at csl.cornell.edu> wrote:
>
>> Sure, Daniel, you seem pretty active, I can ping again in a week.
>>
>> patch notes: The patch is written so that changing a single #if in
>> MCAssembler.h at the top will effectively revert it (for ease of testing).
>> A few typedefs for the new structures in MCAssembler are made public just
>> for convenience.  It looks like I'm deep in FIXME territory, judging by the
>> comments in the touched files.  Feel free to pass a revised patch back to
>> me as you see fit.
>>
>> Fang
>>
>>  Hi David,
>>>
>>> Thanks for working on a patch, I'll take a look but it may take me a
>>> couple
>>> days to get to it.
>>>
>>> - Daniel
>>>
>>>
>>> On Wed, Mar 13, 2013 at 1:48 PM, David Fang <fang at csl.cornell.edu> wrote:
>>>
>>>  Daniel,
>>>>         The resulting LLVM test diffs with my patch are shown here:
>>>> http://llvm.org/bugs/show_bug.****cgi?id=14636#c38<http://llvm.org/bugs/show_bug.**cgi?id=14636#c38>
>>>> <http://**llvm.org/bugs/show_bug.cgi?id=**14636#c38<http://llvm.org/bugs/show_bug.cgi?id=14636#c38>
>>>>>
>>>>
>>>>
>>>> 2 tests "failed" in MC/MachO: indirect-symbols.s and symbol-indirect.s
>>>> No other LLVM/clang tests regressed (Targets: PPC,X86,ARM).
>>>>
>>>> The differences are that symbols are re-ordered, which accomplishes the
>>>> objective of the patch. There are FIXME comments that require a
>>>> maintainer's scrutiny.
>>>> Could you review these differences that result from my patch?
>>>> The patch as-is now is in proof-of-concept form.  I'd be happy to revise
>>>> as you suggest.
>>>> If this patch benefits all Mach-O architectures (getting closer to usable
>>>> integrated assembly), I'd like to get this into trunk.
>>>>
>>>> Fang
>>>>
>>>>
>>>>  Michael, Daniel,
>>>>
>>>>>         The attached patch I wrote works for me so far on my small test
>>>>> cases: llvm/clang's object file matches the indirect symbol ordering of
>>>>> the
>>>>> /usr/bin/as-assembled object's (tested on my powerpc-darwin8 branch).
>>>>> Michael, could you try it out on your own test cases?
>>>>> Daniel, could you kindly comment and review?  Just the technical merits,
>>>>> not style.
>>>>>          I've also attached the same patch to bug 14636.
>>>>>
>>>>> Thanks for taking a look.
>>>>>
>>>>> Fang
>>>>>
>>>>>   Daniel,
>>>>>
>>>>>>    Some quick questions about your proposed recipe:
>>>>>>
>>>>>>    Ok, here is what I think the right fix is. Instead of creating the
>>>>>>>>   IndirectSymBase mapping we use to associate sections with their
>>>>>>>>   indirect offset start, in BindIndirectSymbols() we should:
>>>>>>>>
>>>>>>>>>   1. Add a simple container struct (lets say > >
>>>>>>>>>>
>>>>>>>>> MachSectionIndirectSymbols)
>>>>>>
>>>>>>>   for tracking the per-section list of indirect symbols. It will keep
>>>>>>>>   the list of symbols in the section and the index of the first > >
>>>>>>>>
>>>>>>>   indirect
>>>>>>
>>>>>>>   symbol in the section.
>>>>>>>>
>>>>>>>
>>>>>>  Like a map<section*, vector<indirect_symbol> >?
>>>>>>
>>>>>>  Instead of keeping track of the first indirect symbol index, could we
>>>>>> just
>>>>>>  as easily compute the index offsets later during the second pass:
>>>>>>
>>>>>>  offset = 0;
>>>>>>  for each section S (in order)
>>>>>>    record offset for section S
>>>>>>    [optional] write vector of indirect symbols for S
>>>>>>    offset += S.num_indirect_symbols();
>>>>>>  end for
>>>>>>
>>>>>>  I was able to fix the reserve1 field by doing something like this (bug
>>>>>>  14636, comments 30-33), but I only later discovered that the indirect
>>>>>>  symbols needed to be reordered.
>>>>>>
>>>>>>    2. Keep a mapping from sections to the above type.
>>>>>>>>   3. Add a SetVector to record the order of the sections that have
>>>>>>>>   indirect symbols.
>>>>>>>>
>>>>>>>
>>>>>>  This I understand, I found SetVector and SmallSetVector in ADT.
>>>>>>
>>>>>>    4. During BindIndirectSymbols() we maintain the above information
>>>>>>>>   (populating the MachSectionIndirectSymbols per-section symbol > >
>>>>>>>>
>>>>>>>   arrays).
>>>>>>
>>>>>>  BindIndirectSymbols looks like it need not change, because it's
>>>>>> already
>>>>>>  operating on the assumption of ordered indirect symbols, right?
>>>>>>
>>>>>>    5. Once we have scanned all the symbols we make another pass over
>>>>>>>>
>>>>>>> the
>>>>>>
>>>>>>>   sections (in the order seen via indirect symbols) and assign the >
>>>>>>>>
>>>>>>>   start
>>>>>>>
>>>>>>>>   indices.
>>>>>>>>
>>>>>>>
>>>>>>  as I outlined above (offset = start indices).
>>>>>>
>>>>>>    6. Update writing of the indirect symbol table to write in the same
>>>>>>>>   order as traversed in #5.
>>>>>>>>
>>>>>>>>>   Does that make sense? It's more work than your patch but it
>>>>>>>>>>
>>>>>>>>> (a) > >   should
>>>>>>
>>>>>>>   preserve binary compatibility with 'as' in situations where the
>>>>>>>>   indirect symbols don't appear out of order w.r.t. the sections,
>>>>>>>>
>>>>>>> (b) > >   it
>>>>>>
>>>>>>>   makes somewhere more explicit the relationship between sections and
>>>>>>>>   their list of symbols being in contiguous order.
>>>>>>>>
>>>>>>>>>   - Daniel
>>>>>>>>>>
>>>>>>>>>
>>>>>>  Will this re-ordering be acceptable to all Mach-O back-ends?  x86,
>>>>>> ARM,
>>>>>>  PPC?
>>>>>>
>>>>>>  Fang
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>  --
>>>> David Fang
>>>> http://www.csl.cornell.edu/~****fang/<http://www.csl.cornell.edu/~**fang/><
>>>> http://www.csl.cornell.edu/~**fang/ <http://www.csl.cornell.edu/~fang/>>
>>>>
>>>>
>>>>
>>>
>> --
>> David Fang
>> http://www.csl.cornell.edu/~**fang/ <http://www.csl.cornell.edu/~fang/>
>>
>>
>

-- 
David Fang
http://www.csl.cornell.edu/~fang/




More information about the llvm-commits mailing list