[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split
David Fang
fang at csl.cornell.edu
Tue Mar 19 12:52:35 PDT 2013
Daniel and all,
Attached is take 2 of the patch, just moved the edits from
MCAssembler to MachObjectWriter. The two indirect symbol tests still get
their indirect symbols re-ordered, this patch includes those diffs. Is
this closer to what you have in mind?
On my handful of test cases, this still produces ordering that is
consistent with /usr/bin/as (vs. integrated-as).
Fang
> On Tue, Mar 19, 2013 at 2:27 AM, David Fang <fang at csl.cornell.edu> wrote:
>
>> Hi Daniel,
>> Thanks for reviewing the patch. I thought I tried suggestion #1
>> before this patch in http://llvm.org/bugs/show_bug.**cgi?id=14636#c33<http://llvm.org/bugs/show_bug.cgi?id=14636#c33>,
>> which only reordeed IndirectSymBase without reordering WriteObject, and
>> didn't quite produce the desired result. I may revisit it again.
>> If I reorder IndirectSymBase, I need to also reorder in
>> WriteObject, so am I allowed to move the IndirectSymbol (mapped by section)
>> and IndirectSymbolSection (order of first appearance) structures out of
>> MCAssembler into MachObjectWriter? I need the ordering information from
>> the first pass of BindIndirectSymbols to persist until WriteObject, without
>> having to sort twice. This transformation is intended for X86,PPC
>> (probably ARM as well) +mach-o back-ends after all.
>>
>
> I don't understand what the issue is here, you can of course add extra data
> structures into the MachObjectWriter class and compute them during
> BindIndirectSymbols() and reuse them during WriteObject(). You shouldn't
> need to change anything in the MCAssembler.
>
> Basically I am just suggesting moving the building up of the Mach-O
> specific data structures into BindIndirectSymbols() and the
> MachObjectWriter class.
>
> - Daniel
>
>
>>
>> Fang
>>
>> Hi David,
>>>
>>> Ok, here are my main high level comments on the patch. I have some
>>> specific
>>> nits too but I'll save them for later:
>>>
>>> #1. I would rather this patch not require modifying the MCAssembler level
>>> interface. Instead, what I would like is for the new data structures
>>> (IndirectSymbols and IndirectSymbolSections) to get built up inside
>>> BindIndirectSymbols().
>>>
>>> #2. I don't think this patch should actually cause test cases to change.
>>> Most of those test cases were generated based on matching existing 'as'
>>> output for x86, and the way I envisioned this patch working I wouldn't
>>> have
>>> expected it to change the order of symbols except in the case where the
>>> indirect symbols would be out of order.
>>>
>>> I believe that if you fix #1 then #2 will fall out, because the indirect
>>> symbol lists will get built up in the same order as they are now (based on
>>> iterating the indirect symbol list twice).
>>>
>>> - Daniel
>>>
>>>
>>> On Thu, Mar 14, 2013 at 10:26 AM, David Fang <fang at csl.cornell.edu>
>>> wrote:
>>>
>>> Sure, Daniel, you seem pretty active, I can ping again in a week.
>>>>
>>>> patch notes: The patch is written so that changing a single #if in
>>>> MCAssembler.h at the top will effectively revert it (for ease of
>>>> testing).
>>>> A few typedefs for the new structures in MCAssembler are made public just
>>>> for convenience. It looks like I'm deep in FIXME territory, judging by
>>>> the
>>>> comments in the touched files. Feel free to pass a revised patch back to
>>>> me as you see fit.
>>>>
>>>> Fang
>>>>
>>>> Hi David,
>>>>
>>>>>
>>>>> Thanks for working on a patch, I'll take a look but it may take me a
>>>>> couple
>>>>> days to get to it.
>>>>>
>>>>> - Daniel
>>>>>
>>>>>
>>>>> On Wed, Mar 13, 2013 at 1:48 PM, David Fang <fang at csl.cornell.edu>
>>>>> wrote:
>>>>>
>>>>> Daniel,
>>>>>
>>>>>> The resulting LLVM test diffs with my patch are shown here:
>>>>>> http://llvm.org/bugs/show_bug.******cgi?id=14636#c38<http://llvm.org/bugs/show_bug.****cgi?id=14636#c38>
>>>>>> <http://**llvm.org/bugs/show_bug.**cgi?**id=14636#c38<http://llvm.org/bugs/show_bug.**cgi?id=14636#c38>
>>>>>>>
>>>>>> <http://**llvm.org/bugs/show_**bug.cgi?id=**14636#c38<http://llvm.org/bugs/show_bug.cgi?id=**14636#c38>
>>>>>> <http://**llvm.org/bugs/show_bug.cgi?id=**14636#c38<http://llvm.org/bugs/show_bug.cgi?id=14636#c38>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>
>>>>>> 2 tests "failed" in MC/MachO: indirect-symbols.s and symbol-indirect.s
>>>>>> No other LLVM/clang tests regressed (Targets: PPC,X86,ARM).
>>>>>>
>>>>>> The differences are that symbols are re-ordered, which accomplishes the
>>>>>> objective of the patch. There are FIXME comments that require a
>>>>>> maintainer's scrutiny.
>>>>>> Could you review these differences that result from my patch?
>>>>>> The patch as-is now is in proof-of-concept form. I'd be happy to
>>>>>> revise
>>>>>> as you suggest.
>>>>>> If this patch benefits all Mach-O architectures (getting closer to
>>>>>> usable
>>>>>> integrated assembly), I'd like to get this into trunk.
>>>>>>
>>>>>> Fang
>>>>>>
>>>>>>
>>>>>> Michael, Daniel,
>>>>>>
>>>>>> The attached patch I wrote works for me so far on my small
>>>>>>> test
>>>>>>> cases: llvm/clang's object file matches the indirect symbol ordering
>>>>>>> of
>>>>>>> the
>>>>>>> /usr/bin/as-assembled object's (tested on my powerpc-darwin8 branch).
>>>>>>> Michael, could you try it out on your own test cases?
>>>>>>> Daniel, could you kindly comment and review? Just the technical
>>>>>>> merits,
>>>>>>> not style.
>>>>>>> I've also attached the same patch to bug 14636.
>>>>>>>
>>>>>>> Thanks for taking a look.
>>>>>>>
>>>>>>> Fang
>>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> Some quick questions about your proposed recipe:
>>>>>>>>
>>>>>>>> Ok, here is what I think the right fix is. Instead of creating the
>>>>>>>>
>>>>>>>>> IndirectSymBase mapping we use to associate sections with their
>>>>>>>>>> indirect offset start, in BindIndirectSymbols() we should:
>>>>>>>>>>
>>>>>>>>>> 1. Add a simple container struct (lets say > >
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> MachSectionIndirectSymbols)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> for tracking the per-section list of indirect symbols. It will
>>>>>>>>> keep
>>>>>>>>>
>>>>>>>>>> the list of symbols in the section and the index of the first > >
>>>>>>>>>>
>>>>>>>>>> indirect
>>>>>>>>>
>>>>>>>>
>>>>>>>> symbol in the section.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Like a map<section*, vector<indirect_symbol> >?
>>>>>>>>
>>>>>>>> Instead of keeping track of the first indirect symbol index, could
>>>>>>>> we
>>>>>>>> just
>>>>>>>> as easily compute the index offsets later during the second pass:
>>>>>>>>
>>>>>>>> offset = 0;
>>>>>>>> for each section S (in order)
>>>>>>>> record offset for section S
>>>>>>>> [optional] write vector of indirect symbols for S
>>>>>>>> offset += S.num_indirect_symbols();
>>>>>>>> end for
>>>>>>>>
>>>>>>>> I was able to fix the reserve1 field by doing something like this
>>>>>>>> (bug
>>>>>>>> 14636, comments 30-33), but I only later discovered that the
>>>>>>>> indirect
>>>>>>>> symbols needed to be reordered.
>>>>>>>>
>>>>>>>> 2. Keep a mapping from sections to the above type.
>>>>>>>>
>>>>>>>>> 3. Add a SetVector to record the order of the sections that have
>>>>>>>>>> indirect symbols.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> This I understand, I found SetVector and SmallSetVector in ADT.
>>>>>>>>
>>>>>>>> 4. During BindIndirectSymbols() we maintain the above information
>>>>>>>>
>>>>>>>>> (populating the MachSectionIndirectSymbols per-section symbol > >
>>>>>>>>>>
>>>>>>>>>> arrays).
>>>>>>>>>
>>>>>>>>
>>>>>>>> BindIndirectSymbols looks like it need not change, because it's
>>>>>>>> already
>>>>>>>> operating on the assumption of ordered indirect symbols, right?
>>>>>>>>
>>>>>>>> 5. Once we have scanned all the symbols we make another pass over
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> the
>>>>>>>>>
>>>>>>>>
>>>>>>>> sections (in the order seen via indirect symbols) and assign the >
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> start
>>>>>>>>>
>>>>>>>>> indices.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> as I outlined above (offset = start indices).
>>>>>>>>
>>>>>>>> 6. Update writing of the indirect symbol table to write in the
>>>>>>>> same
>>>>>>>>
>>>>>>>>> order as traversed in #5.
>>>>>>>>>>
>>>>>>>>>> Does that make sense? It's more work than your patch but it
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (a) > > should
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> preserve binary compatibility with 'as' in situations where the
>>>>>>>>>
>>>>>>>>>> indirect symbols don't appear out of order w.r.t. the sections,
>>>>>>>>>>
>>>>>>>>>> (b) > > it
>>>>>>>>>
>>>>>>>>
>>>>>>>> makes somewhere more explicit the relationship between sections
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>> their list of symbols being in contiguous order.
>>>>>>>>>>
>>>>>>>>>> - Daniel
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Will this re-ordering be acceptable to all Mach-O back-ends?
>>>>>>>> x86,
>>>>>>>> ARM,
>>>>>>>> PPC?
>>>>>>>>
>>>>>>>> Fang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>> David Fang
>>>>>> http://www.csl.cornell.edu/~******fang/<http://www.csl.cornell.edu/~****fang/>
>>>>>> <http://www.csl.**cornell.edu/~**fang/<http://www.csl.cornell.edu/~**fang/>
>>>>>>> <
>>>>>> http://www.csl.cornell.edu/~****fang/<http://www.csl.cornell.edu/~**fang/><
>>>>>> http://www.csl.cornell.edu/~**fang/<http://www.csl.cornell.edu/~fang/>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>> David Fang
>>>> http://www.csl.cornell.edu/~****fang/<http://www.csl.cornell.edu/~**fang/><
>>>> http://www.csl.cornell.edu/~**fang/ <http://www.csl.cornell.edu/~fang/>>
>>>>
>>>>
>>>>
>>>
>> --
>> David Fang
>> http://www.csl.cornell.edu/~**fang/ <http://www.csl.cornell.edu/~fang/>
>>
>>
>
--
David Fang
http://www.csl.cornell.edu/~fang/
-------------- next part --------------
diff --git a/include/llvm/MC/MCMachObjectWriter.h b/include/llvm/MC/MCMachObjectWriter.h
index 3c9a588..1630049 100644
--- a/include/llvm/MC/MCMachObjectWriter.h
+++ b/include/llvm/MC/MCMachObjectWriter.h
@@ -17,6 +17,16 @@
#include "llvm/MC/MCObjectWriter.h"
#include "llvm/Object/MachOFormat.h"
#include "llvm/Support/DataTypes.h"
+
+/**
+ Mach-O needs indirect symbols grouped by section.
+ Goal: 1
+ */
+#define ORDER_INDIRECT_SYMBOLS_BY_SECTION 1
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+#include "llvm/ADT/SetVector.h"
+#endif
+
#include <vector>
namespace llvm {
@@ -112,6 +122,22 @@ class MachObjectWriter : public MCObjectWriter {
/// @}
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+ /// @name Indirect Symbol Table Data
+ /// @{
+
+ typedef std::vector<const MCSymbol*> IndirectSymbol_list_type;
+ typedef DenseMap<const MCSectionData*, IndirectSymbol_list_type>
+ IndirectSymbol_map_type;
+ // keep sections in order of appearance
+ typedef SetVector<const MCSectionData*> IndirectSymbolSection_set_type;
+
+ IndirectSymbol_map_type IndirectSymbolMap;
+ IndirectSymbolSection_set_type IndirectSymbolSections;
+
+ /// @}
+#endif
+
public:
MachObjectWriter(MCMachObjectTargetWriter *MOTW, raw_ostream &_OS,
bool _IsLittleEndian)
diff --git a/lib/MC/MachObjectWriter.cpp b/lib/MC/MachObjectWriter.cpp
index a5ba3c3..d3eaf19 100644
--- a/lib/MC/MachObjectWriter.cpp
+++ b/lib/MC/MachObjectWriter.cpp
@@ -428,6 +428,7 @@ void MachObjectWriter::BindIndirectSymbols(MCAssembler &Asm) {
//
// FIXME: Revisit this when the dust settles.
+#if !ORDER_INDIRECT_SYMBOLS_BY_SECTION
// Bind non lazy symbol pointers first.
unsigned IndirectIndex = 0;
for (MCAssembler::indirect_symbol_iterator it = Asm.indirect_symbol_begin(),
@@ -466,6 +467,46 @@ void MachObjectWriter::BindIndirectSymbols(MCAssembler &Asm) {
if (Created)
Entry.setFlags(Entry.getFlags() | 0x0001);
}
+#else // ORDER_INDIRECT_SYMBOLS_BY_SECTION
+ // sort indirect symbols by section
+ for (MCAssembler::indirect_symbol_iterator it = Asm.indirect_symbol_begin(),
+ ie = Asm.indirect_symbol_end(); it != ie; ++it) {
+ // track their sections by order of appearance
+ IndirectSymbolSections.insert(it->SectionData);
+ IndirectSymbolMap[it->SectionData].push_back(it->Symbol);
+ }
+ // process indirect symbols by section
+ unsigned offset = 0; // running total of indirect symbol index offset
+ IndirectSymbolSection_set_type::const_iterator
+ i(IndirectSymbolSections.begin()), e(IndirectSymbolSections.end());
+ for ( ; i!=e; ++i) {
+ const IndirectSymbol_list_type& b(IndirectSymbolMap.find(*i)->second);
+ IndirectSymbol_list_type::const_iterator bi(b.begin()), be(b.end());
+ const MCSectionMachO& Section(cast<MCSectionMachO>((*i)->getSection()));
+ switch (Section.getType()) {
+ default: break;
+ case MCSectionMachO::S_NON_LAZY_SYMBOL_POINTERS: {
+ for ( ; bi!=be; ++bi)
+ Asm.getOrCreateSymbolData(**bi);
+ break;
+ }
+ case MCSectionMachO::S_LAZY_SYMBOL_POINTERS: // fall-through
+ case MCSectionMachO::S_SYMBOL_STUBS: {
+ for ( ; bi!=be; ++bi) {
+ // Set the symbol type to undefined lazy, but only on construction.
+ // FIXME: Do not hardcode.
+ bool Created;
+ MCSymbolData &Entry(Asm.getOrCreateSymbolData(**bi, &Created));
+ if (Created)
+ Entry.setFlags(Entry.getFlags() | 0x0001);
+ } // end for
+ break;
+ }
+ } // end switch(sectionType)
+ IndirectSymBase.insert(std::make_pair(*i, offset));
+ offset += b.size();
+ }
+#endif // ORDER_INDIRECT_SYMBOLS_BY_SECTION
}
/// ComputeSymbolTable - Compute the symbol table data
@@ -895,26 +936,44 @@ void MachObjectWriter::WriteObject(MCAssembler &Asm,
// Write the symbol table data, if used.
if (NumSymbols) {
// Write the indirect symbol entries.
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+ for (IndirectSymbolSection_set_type::const_iterator
+ si(IndirectSymbolSections.begin()), se(IndirectSymbolSections.end());
+ si != se; ++si) {
+ const IndirectSymbol_list_type& l(IndirectSymbolMap.find(*si)->second);
+ for (IndirectSymbol_list_type::const_iterator
+ it(l.begin()), ie(l.end()); it != ie; ++it)
+#else
for (MCAssembler::const_indirect_symbol_iterator
it = Asm.indirect_symbol_begin(),
- ie = Asm.indirect_symbol_end(); it != ie; ++it) {
+ ie = Asm.indirect_symbol_end(); it != ie; ++it)
+#endif
+ {
// Indirect symbols in the non lazy symbol pointer section have some
// special handling.
const MCSectionMachO &Section =
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+ static_cast<const MCSectionMachO&>((*si)->getSection());
+ const MCSymbol* Sym = *it;
+#else
static_cast<const MCSectionMachO&>(it->SectionData->getSection());
+ const MCSymbol* Sym = it->Symbol;
+#endif
if (Section.getType() == MCSectionMachO::S_NON_LAZY_SYMBOL_POINTERS) {
// If this symbol is defined and internal, mark it as such.
- if (it->Symbol->isDefined() &&
- !Asm.getSymbolData(*it->Symbol).isExternal()) {
+ if (Sym->isDefined() &&
+ !Asm.getSymbolData(*Sym).isExternal()) {
uint32_t Flags = macho::ISF_Local;
- if (it->Symbol->isAbsolute())
+ if (Sym->isAbsolute())
Flags |= macho::ISF_Absolute;
Write32(Flags);
continue;
}
}
-
- Write32(Asm.getSymbolData(*it->Symbol).getIndex());
+ Write32(Asm.getSymbolData(*Sym).getIndex());
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+ }
+#endif
}
// FIXME: Check that offsets match computed ones.
diff --git a/test/MC/MachO/indirect-symbols.s b/test/MC/MachO/indirect-symbols.s
index 90fd231..deb9e7a 100644
--- a/test/MC/MachO/indirect-symbols.s
+++ b/test/MC/MachO/indirect-symbols.s
@@ -97,7 +97,7 @@ _e:
// CHECK: ('nsyms', 6)
// CHECK: ('stroff', 516)
// CHECK: ('strsize', 20)
-// CHECK: ('_string_data', '\x00_d\x00_a\x00_b\x00_c\x00_e\x00_f\x00\x00')
+// CHECK: ('_string_data', '\x00_a\x00_d\x00_b\x00_c\x00_e\x00_f\x00\x00')
// CHECK: ('_symbols', [
// CHECK: # Symbol 0
// CHECK: (('n_strx', 7)
@@ -132,7 +132,7 @@ _e:
// CHECK: ('_string', '_f')
// CHECK: ),
// CHECK: # Symbol 4
-// CHECK: (('n_strx', 4)
+// CHECK: (('n_strx', 1)
// CHECK: ('n_type', 0x1)
// CHECK: ('n_sect', 0)
// CHECK: ('n_desc', 1)
@@ -140,7 +140,7 @@ _e:
// CHECK: ('_string', '_a')
// CHECK: ),
// CHECK: # Symbol 5
-// CHECK: (('n_strx', 1)
+// CHECK: (('n_strx', 4)
// CHECK: ('n_type', 0x1)
// CHECK: ('n_sect', 0)
// CHECK: ('n_desc', 0)
diff --git a/test/MC/MachO/symbol-indirect.s b/test/MC/MachO/symbol-indirect.s
index 2412970..6f5c842 100644
--- a/test/MC/MachO/symbol-indirect.s
+++ b/test/MC/MachO/symbol-indirect.s
@@ -137,7 +137,7 @@ sym_nlp_G:
// CHECK: ('nsyms', 10)
// CHECK: ('stroff', 592)
// CHECK: ('strsize', 104)
-// CHECK: ('_string_data', '\x00sym_lsp_A\x00sym_lsp_G\x00sym_nlp_A\x00sym_nlp_G\x00sym_nlp_B\x00sym_nlp_E\x00sym_lsp_B\x00sym_lsp_E\x00sym_lsp_C\x00sym_nlp_C\x00\x00\x00\x00')
+// CHECK: ('_string_data', '\x00sym_lsp_A\x00sym_lsp_G\x00sym_nlp_A\x00sym_nlp_G\x00sym_lsp_B\x00sym_lsp_E\x00sym_nlp_B\x00sym_nlp_E\x00sym_lsp_C\x00sym_nlp_C\x00\x00\x00\x00')
// CHECK: ('_symbols', [
// CHECK: # Symbol 0
// CHECK: (('n_strx', 81)
@@ -180,7 +180,7 @@ sym_nlp_G:
// CHECK: ('_string', 'sym_lsp_A')
// CHECK: ),
// CHECK: # Symbol 5
-// CHECK: (('n_strx', 61)
+// CHECK: (('n_strx', 41)
// CHECK: ('n_type', 0x1)
// CHECK: ('n_sect', 0)
// CHECK: ('n_desc', 1)
@@ -188,7 +188,7 @@ sym_nlp_G:
// CHECK: ('_string', 'sym_lsp_B')
// CHECK: ),
// CHECK: # Symbol 6
-// CHECK: (('n_strx', 71)
+// CHECK: (('n_strx', 51)
// CHECK: ('n_type', 0x1)
// CHECK: ('n_sect', 0)
// CHECK: ('n_desc', 1)
@@ -204,7 +204,7 @@ sym_nlp_G:
// CHECK: ('_string', 'sym_nlp_A')
// CHECK: ),
// CHECK: # Symbol 8
-// CHECK: (('n_strx', 41)
+// CHECK: (('n_strx', 61)
// CHECK: ('n_type', 0x1)
// CHECK: ('n_sect', 0)
// CHECK: ('n_desc', 0)
@@ -212,7 +212,7 @@ sym_nlp_G:
// CHECK: ('_string', 'sym_nlp_B')
// CHECK: ),
// CHECK: # Symbol 9
-// CHECK: (('n_strx', 51)
+// CHECK: (('n_strx', 71)
// CHECK: ('n_type', 0x1)
// CHECK: ('n_sect', 0)
// CHECK: ('n_desc', 0)
More information about the llvm-commits
mailing list