[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

David Fang fang at csl.cornell.edu
Tue Mar 19 12:52:35 PDT 2013


Daniel and all,
 	Attached is take 2 of the patch, just moved the edits from 
MCAssembler to MachObjectWriter.  The two indirect symbol tests still get 
their indirect symbols re-ordered, this patch includes those diffs.  Is 
this closer to what you have in mind?
 	On my handful of test cases, this still produces ordering that is 
consistent with /usr/bin/as (vs. integrated-as).

Fang

> On Tue, Mar 19, 2013 at 2:27 AM, David Fang <fang at csl.cornell.edu> wrote:
>
>> Hi Daniel,
>>         Thanks for reviewing the patch.  I thought I tried suggestion #1
>> before this patch in http://llvm.org/bugs/show_bug.**cgi?id=14636#c33<http://llvm.org/bugs/show_bug.cgi?id=14636#c33>,
>> which only reordeed IndirectSymBase without reordering WriteObject, and
>> didn't quite produce the desired result.  I may revisit it again.
>>         If I reorder IndirectSymBase, I need to also reorder in
>> WriteObject, so am I allowed to move the IndirectSymbol (mapped by section)
>> and IndirectSymbolSection (order of first appearance) structures out of
>> MCAssembler into MachObjectWriter?  I need the ordering information from
>> the first pass of BindIndirectSymbols to persist until WriteObject, without
>> having to sort twice.  This transformation is intended for X86,PPC
>> (probably ARM as well) +mach-o back-ends after all.
>>
>
> I don't understand what the issue is here, you can of course add extra data
> structures into the MachObjectWriter class and compute them during
> BindIndirectSymbols() and reuse them during WriteObject(). You shouldn't
> need to change anything in the MCAssembler.
>
> Basically I am just suggesting moving the building up of the Mach-O
> specific data structures into BindIndirectSymbols() and the
> MachObjectWriter class.
>
> - Daniel
>
>
>>
>> Fang
>>
>>  Hi David,
>>>
>>> Ok, here are my main high level comments on the patch. I have some
>>> specific
>>> nits too but I'll save them for later:
>>>
>>> #1. I would rather this patch not require modifying the MCAssembler level
>>> interface. Instead, what I would like is for the new data structures
>>> (IndirectSymbols and IndirectSymbolSections) to get built up inside
>>> BindIndirectSymbols().
>>>
>>> #2. I don't think this patch should actually cause test cases to change.
>>> Most of those test cases were generated based on matching existing 'as'
>>> output for x86, and the way I envisioned this patch working I wouldn't
>>> have
>>> expected it to change the order of symbols except in the case where the
>>> indirect symbols would be out of order.
>>>
>>> I believe that if you fix #1 then #2 will fall out, because the indirect
>>> symbol lists will get built up in the same order as they are now (based on
>>> iterating the indirect symbol list twice).
>>>
>>> - Daniel
>>>
>>>
>>> On Thu, Mar 14, 2013 at 10:26 AM, David Fang <fang at csl.cornell.edu>
>>> wrote:
>>>
>>>  Sure, Daniel, you seem pretty active, I can ping again in a week.
>>>>
>>>> patch notes: The patch is written so that changing a single #if in
>>>> MCAssembler.h at the top will effectively revert it (for ease of
>>>> testing).
>>>> A few typedefs for the new structures in MCAssembler are made public just
>>>> for convenience.  It looks like I'm deep in FIXME territory, judging by
>>>> the
>>>> comments in the touched files.  Feel free to pass a revised patch back to
>>>> me as you see fit.
>>>>
>>>> Fang
>>>>
>>>>  Hi David,
>>>>
>>>>>
>>>>> Thanks for working on a patch, I'll take a look but it may take me a
>>>>> couple
>>>>> days to get to it.
>>>>>
>>>>> - Daniel
>>>>>
>>>>>
>>>>> On Wed, Mar 13, 2013 at 1:48 PM, David Fang <fang at csl.cornell.edu>
>>>>> wrote:
>>>>>
>>>>>  Daniel,
>>>>>
>>>>>>         The resulting LLVM test diffs with my patch are shown here:
>>>>>> http://llvm.org/bugs/show_bug.******cgi?id=14636#c38<http://llvm.org/bugs/show_bug.****cgi?id=14636#c38>
>>>>>> <http://**llvm.org/bugs/show_bug.**cgi?**id=14636#c38<http://llvm.org/bugs/show_bug.**cgi?id=14636#c38>
>>>>>>>
>>>>>> <http://**llvm.org/bugs/show_**bug.cgi?id=**14636#c38<http://llvm.org/bugs/show_bug.cgi?id=**14636#c38>
>>>>>> <http://**llvm.org/bugs/show_bug.cgi?id=**14636#c38<http://llvm.org/bugs/show_bug.cgi?id=14636#c38>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>
>>>>>> 2 tests "failed" in MC/MachO: indirect-symbols.s and symbol-indirect.s
>>>>>> No other LLVM/clang tests regressed (Targets: PPC,X86,ARM).
>>>>>>
>>>>>> The differences are that symbols are re-ordered, which accomplishes the
>>>>>> objective of the patch. There are FIXME comments that require a
>>>>>> maintainer's scrutiny.
>>>>>> Could you review these differences that result from my patch?
>>>>>> The patch as-is now is in proof-of-concept form.  I'd be happy to
>>>>>> revise
>>>>>> as you suggest.
>>>>>> If this patch benefits all Mach-O architectures (getting closer to
>>>>>> usable
>>>>>> integrated assembly), I'd like to get this into trunk.
>>>>>>
>>>>>> Fang
>>>>>>
>>>>>>
>>>>>>  Michael, Daniel,
>>>>>>
>>>>>>          The attached patch I wrote works for me so far on my small
>>>>>>> test
>>>>>>> cases: llvm/clang's object file matches the indirect symbol ordering
>>>>>>> of
>>>>>>> the
>>>>>>> /usr/bin/as-assembled object's (tested on my powerpc-darwin8 branch).
>>>>>>> Michael, could you try it out on your own test cases?
>>>>>>> Daniel, could you kindly comment and review?  Just the technical
>>>>>>> merits,
>>>>>>> not style.
>>>>>>>          I've also attached the same patch to bug 14636.
>>>>>>>
>>>>>>> Thanks for taking a look.
>>>>>>>
>>>>>>> Fang
>>>>>>>
>>>>>>>   Daniel,
>>>>>>>
>>>>>>>     Some quick questions about your proposed recipe:
>>>>>>>>
>>>>>>>>    Ok, here is what I think the right fix is. Instead of creating the
>>>>>>>>
>>>>>>>>>   IndirectSymBase mapping we use to associate sections with their
>>>>>>>>>>   indirect offset start, in BindIndirectSymbols() we should:
>>>>>>>>>>
>>>>>>>>>>    1. Add a simple container struct (lets say > >
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  MachSectionIndirectSymbols)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>    for tracking the per-section list of indirect symbols. It will
>>>>>>>>> keep
>>>>>>>>>
>>>>>>>>>>   the list of symbols in the section and the index of the first > >
>>>>>>>>>>
>>>>>>>>>>    indirect
>>>>>>>>>
>>>>>>>>
>>>>>>>>    symbol in the section.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>   Like a map<section*, vector<indirect_symbol> >?
>>>>>>>>
>>>>>>>>  Instead of keeping track of the first indirect symbol index, could
>>>>>>>> we
>>>>>>>> just
>>>>>>>>  as easily compute the index offsets later during the second pass:
>>>>>>>>
>>>>>>>>  offset = 0;
>>>>>>>>  for each section S (in order)
>>>>>>>>    record offset for section S
>>>>>>>>    [optional] write vector of indirect symbols for S
>>>>>>>>    offset += S.num_indirect_symbols();
>>>>>>>>  end for
>>>>>>>>
>>>>>>>>  I was able to fix the reserve1 field by doing something like this
>>>>>>>> (bug
>>>>>>>>  14636, comments 30-33), but I only later discovered that the
>>>>>>>> indirect
>>>>>>>>  symbols needed to be reordered.
>>>>>>>>
>>>>>>>>    2. Keep a mapping from sections to the above type.
>>>>>>>>
>>>>>>>>>   3. Add a SetVector to record the order of the sections that have
>>>>>>>>>>   indirect symbols.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>   This I understand, I found SetVector and SmallSetVector in ADT.
>>>>>>>>
>>>>>>>>    4. During BindIndirectSymbols() we maintain the above information
>>>>>>>>
>>>>>>>>>   (populating the MachSectionIndirectSymbols per-section symbol > >
>>>>>>>>>>
>>>>>>>>>>    arrays).
>>>>>>>>>
>>>>>>>>
>>>>>>>>  BindIndirectSymbols looks like it need not change, because it's
>>>>>>>> already
>>>>>>>>  operating on the assumption of ordered indirect symbols, right?
>>>>>>>>
>>>>>>>>    5. Once we have scanned all the symbols we make another pass over
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>  the
>>>>>>>>>
>>>>>>>>
>>>>>>>>    sections (in the order seen via indirect symbols) and assign the >
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    start
>>>>>>>>>
>>>>>>>>>    indices.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>   as I outlined above (offset = start indices).
>>>>>>>>
>>>>>>>>    6. Update writing of the indirect symbol table to write in the
>>>>>>>> same
>>>>>>>>
>>>>>>>>>   order as traversed in #5.
>>>>>>>>>>
>>>>>>>>>>    Does that make sense? It's more work than your patch but it
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  (a) > >   should
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>    preserve binary compatibility with 'as' in situations where the
>>>>>>>>>
>>>>>>>>>>   indirect symbols don't appear out of order w.r.t. the sections,
>>>>>>>>>>
>>>>>>>>>>  (b) > >   it
>>>>>>>>>
>>>>>>>>
>>>>>>>>    makes somewhere more explicit the relationship between sections
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>>   their list of symbols being in contiguous order.
>>>>>>>>>>
>>>>>>>>>>    - Daniel
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>   Will this re-ordering be acceptable to all Mach-O back-ends?
>>>>>>>>  x86,
>>>>>>>> ARM,
>>>>>>>>  PPC?
>>>>>>>>
>>>>>>>>  Fang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>  --
>>>>>>>
>>>>>> David Fang
>>>>>> http://www.csl.cornell.edu/~******fang/<http://www.csl.cornell.edu/~****fang/>
>>>>>> <http://www.csl.**cornell.edu/~**fang/<http://www.csl.cornell.edu/~**fang/>
>>>>>>> <
>>>>>> http://www.csl.cornell.edu/~****fang/<http://www.csl.cornell.edu/~**fang/><
>>>>>> http://www.csl.cornell.edu/~**fang/<http://www.csl.cornell.edu/~fang/>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>  --
>>>> David Fang
>>>> http://www.csl.cornell.edu/~****fang/<http://www.csl.cornell.edu/~**fang/><
>>>> http://www.csl.cornell.edu/~**fang/ <http://www.csl.cornell.edu/~fang/>>
>>>>
>>>>
>>>>
>>>
>> --
>> David Fang
>> http://www.csl.cornell.edu/~**fang/ <http://www.csl.cornell.edu/~fang/>
>>
>>
>

-- 
David Fang
http://www.csl.cornell.edu/~fang/
-------------- next part --------------
diff --git a/include/llvm/MC/MCMachObjectWriter.h b/include/llvm/MC/MCMachObjectWriter.h
index 3c9a588..1630049 100644
--- a/include/llvm/MC/MCMachObjectWriter.h
+++ b/include/llvm/MC/MCMachObjectWriter.h
@@ -17,6 +17,16 @@
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/Object/MachOFormat.h"
 #include "llvm/Support/DataTypes.h"
+
+/**
+	Mach-O needs indirect symbols grouped by section.
+	Goal: 1
+ */
+#define ORDER_INDIRECT_SYMBOLS_BY_SECTION       1
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+#include "llvm/ADT/SetVector.h"
+#endif
+
 #include <vector>
 
 namespace llvm {
@@ -112,6 +122,22 @@ class MachObjectWriter : public MCObjectWriter {
 
   /// @}
 
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+  /// @name Indirect Symbol Table Data
+  /// @{
+
+  typedef std::vector<const MCSymbol*>  IndirectSymbol_list_type;
+  typedef DenseMap<const MCSectionData*, IndirectSymbol_list_type>
+						IndirectSymbol_map_type;
+  // keep sections in order of appearance
+  typedef SetVector<const MCSectionData*>	IndirectSymbolSection_set_type;
+
+  IndirectSymbol_map_type			IndirectSymbolMap;
+  IndirectSymbolSection_set_type		IndirectSymbolSections;
+
+  /// @}
+#endif
+
 public:
   MachObjectWriter(MCMachObjectTargetWriter *MOTW, raw_ostream &_OS,
                    bool _IsLittleEndian)
diff --git a/lib/MC/MachObjectWriter.cpp b/lib/MC/MachObjectWriter.cpp
index a5ba3c3..d3eaf19 100644
--- a/lib/MC/MachObjectWriter.cpp
+++ b/lib/MC/MachObjectWriter.cpp
@@ -428,6 +428,7 @@ void MachObjectWriter::BindIndirectSymbols(MCAssembler &Asm) {
   //
   // FIXME: Revisit this when the dust settles.
 
+#if !ORDER_INDIRECT_SYMBOLS_BY_SECTION
   // Bind non lazy symbol pointers first.
   unsigned IndirectIndex = 0;
   for (MCAssembler::indirect_symbol_iterator it = Asm.indirect_symbol_begin(),
@@ -466,6 +467,46 @@ void MachObjectWriter::BindIndirectSymbols(MCAssembler &Asm) {
     if (Created)
       Entry.setFlags(Entry.getFlags() | 0x0001);
   }
+#else // ORDER_INDIRECT_SYMBOLS_BY_SECTION
+  // sort indirect symbols by section
+  for (MCAssembler::indirect_symbol_iterator it = Asm.indirect_symbol_begin(),
+         ie = Asm.indirect_symbol_end(); it != ie; ++it) {
+    // track their sections by order of appearance
+    IndirectSymbolSections.insert(it->SectionData);
+    IndirectSymbolMap[it->SectionData].push_back(it->Symbol);
+  }
+  // process indirect symbols by section
+  unsigned offset = 0;	// running total of indirect symbol index offset
+  IndirectSymbolSection_set_type::const_iterator
+	i(IndirectSymbolSections.begin()), e(IndirectSymbolSections.end());
+  for ( ; i!=e; ++i) {
+    const IndirectSymbol_list_type& b(IndirectSymbolMap.find(*i)->second);
+    IndirectSymbol_list_type::const_iterator bi(b.begin()), be(b.end());
+    const MCSectionMachO& Section(cast<MCSectionMachO>((*i)->getSection()));
+    switch (Section.getType()) {
+    default: break;
+    case MCSectionMachO::S_NON_LAZY_SYMBOL_POINTERS: {
+      for ( ; bi!=be; ++bi)
+        Asm.getOrCreateSymbolData(**bi);
+      break;
+    }
+    case MCSectionMachO::S_LAZY_SYMBOL_POINTERS: // fall-through
+    case MCSectionMachO::S_SYMBOL_STUBS: {
+      for ( ; bi!=be; ++bi) {
+    // Set the symbol type to undefined lazy, but only on construction.
+    // FIXME: Do not hardcode.
+        bool Created;
+        MCSymbolData &Entry(Asm.getOrCreateSymbolData(**bi, &Created));
+        if (Created)
+          Entry.setFlags(Entry.getFlags() | 0x0001);
+      }	// end for
+      break;
+    }
+    }	// end switch(sectionType)
+    IndirectSymBase.insert(std::make_pair(*i, offset));
+    offset += b.size();
+  }
+#endif	// ORDER_INDIRECT_SYMBOLS_BY_SECTION
 }
 
 /// ComputeSymbolTable - Compute the symbol table data
@@ -895,26 +936,44 @@ void MachObjectWriter::WriteObject(MCAssembler &Asm,
   // Write the symbol table data, if used.
   if (NumSymbols) {
     // Write the indirect symbol entries.
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+    for (IndirectSymbolSection_set_type::const_iterator
+           si(IndirectSymbolSections.begin()), se(IndirectSymbolSections.end());
+           si != se; ++si) {
+      const IndirectSymbol_list_type& l(IndirectSymbolMap.find(*si)->second);
+      for (IndirectSymbol_list_type::const_iterator
+           it(l.begin()), ie(l.end()); it != ie; ++it)
+#else
     for (MCAssembler::const_indirect_symbol_iterator
            it = Asm.indirect_symbol_begin(),
-           ie = Asm.indirect_symbol_end(); it != ie; ++it) {
+           ie = Asm.indirect_symbol_end(); it != ie; ++it)
+#endif
+    {
       // Indirect symbols in the non lazy symbol pointer section have some
       // special handling.
       const MCSectionMachO &Section =
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+        static_cast<const MCSectionMachO&>((*si)->getSection());
+        const MCSymbol* Sym = *it;
+#else
         static_cast<const MCSectionMachO&>(it->SectionData->getSection());
+        const MCSymbol* Sym = it->Symbol;
+#endif
       if (Section.getType() == MCSectionMachO::S_NON_LAZY_SYMBOL_POINTERS) {
         // If this symbol is defined and internal, mark it as such.
-        if (it->Symbol->isDefined() &&
-            !Asm.getSymbolData(*it->Symbol).isExternal()) {
+        if (Sym->isDefined() &&
+            !Asm.getSymbolData(*Sym).isExternal()) {
           uint32_t Flags = macho::ISF_Local;
-          if (it->Symbol->isAbsolute())
+          if (Sym->isAbsolute())
             Flags |= macho::ISF_Absolute;
           Write32(Flags);
           continue;
         }
       }
-
-      Write32(Asm.getSymbolData(*it->Symbol).getIndex());
+      Write32(Asm.getSymbolData(*Sym).getIndex());
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+      }
+#endif
     }
 
     // FIXME: Check that offsets match computed ones.
diff --git a/test/MC/MachO/indirect-symbols.s b/test/MC/MachO/indirect-symbols.s
index 90fd231..deb9e7a 100644
--- a/test/MC/MachO/indirect-symbols.s
+++ b/test/MC/MachO/indirect-symbols.s
@@ -97,7 +97,7 @@ _e:
 // CHECK:   ('nsyms', 6)
 // CHECK:   ('stroff', 516)
 // CHECK:   ('strsize', 20)
-// CHECK:   ('_string_data', '\x00_d\x00_a\x00_b\x00_c\x00_e\x00_f\x00\x00')
+// CHECK:   ('_string_data', '\x00_a\x00_d\x00_b\x00_c\x00_e\x00_f\x00\x00')
 // CHECK:   ('_symbols', [
 // CHECK:     # Symbol 0
 // CHECK:    (('n_strx', 7)
@@ -132,7 +132,7 @@ _e:
 // CHECK:     ('_string', '_f')
 // CHECK:    ),
 // CHECK:     # Symbol 4
-// CHECK:    (('n_strx', 4)
+// CHECK:    (('n_strx', 1)
 // CHECK:     ('n_type', 0x1)
 // CHECK:     ('n_sect', 0)
 // CHECK:     ('n_desc', 1)
@@ -140,7 +140,7 @@ _e:
 // CHECK:     ('_string', '_a')
 // CHECK:    ),
 // CHECK:     # Symbol 5
-// CHECK:    (('n_strx', 1)
+// CHECK:    (('n_strx', 4)
 // CHECK:     ('n_type', 0x1)
 // CHECK:     ('n_sect', 0)
 // CHECK:     ('n_desc', 0)
diff --git a/test/MC/MachO/symbol-indirect.s b/test/MC/MachO/symbol-indirect.s
index 2412970..6f5c842 100644
--- a/test/MC/MachO/symbol-indirect.s
+++ b/test/MC/MachO/symbol-indirect.s
@@ -137,7 +137,7 @@ sym_nlp_G:
 // CHECK:   ('nsyms', 10)
 // CHECK:   ('stroff', 592)
 // CHECK:   ('strsize', 104)
-// CHECK:   ('_string_data', '\x00sym_lsp_A\x00sym_lsp_G\x00sym_nlp_A\x00sym_nlp_G\x00sym_nlp_B\x00sym_nlp_E\x00sym_lsp_B\x00sym_lsp_E\x00sym_lsp_C\x00sym_nlp_C\x00\x00\x00\x00')
+// CHECK:   ('_string_data', '\x00sym_lsp_A\x00sym_lsp_G\x00sym_nlp_A\x00sym_nlp_G\x00sym_lsp_B\x00sym_lsp_E\x00sym_nlp_B\x00sym_nlp_E\x00sym_lsp_C\x00sym_nlp_C\x00\x00\x00\x00')
 // CHECK:   ('_symbols', [
 // CHECK:     # Symbol 0
 // CHECK:    (('n_strx', 81)
@@ -180,7 +180,7 @@ sym_nlp_G:
 // CHECK:     ('_string', 'sym_lsp_A')
 // CHECK:    ),
 // CHECK:     # Symbol 5
-// CHECK:    (('n_strx', 61)
+// CHECK:    (('n_strx', 41)
 // CHECK:     ('n_type', 0x1)
 // CHECK:     ('n_sect', 0)
 // CHECK:     ('n_desc', 1)
@@ -188,7 +188,7 @@ sym_nlp_G:
 // CHECK:     ('_string', 'sym_lsp_B')
 // CHECK:    ),
 // CHECK:     # Symbol 6
-// CHECK:    (('n_strx', 71)
+// CHECK:    (('n_strx', 51)
 // CHECK:     ('n_type', 0x1)
 // CHECK:     ('n_sect', 0)
 // CHECK:     ('n_desc', 1)
@@ -204,7 +204,7 @@ sym_nlp_G:
 // CHECK:     ('_string', 'sym_nlp_A')
 // CHECK:    ),
 // CHECK:     # Symbol 8
-// CHECK:    (('n_strx', 41)
+// CHECK:    (('n_strx', 61)
 // CHECK:     ('n_type', 0x1)
 // CHECK:     ('n_sect', 0)
 // CHECK:     ('n_desc', 0)
@@ -212,7 +212,7 @@ sym_nlp_G:
 // CHECK:     ('_string', 'sym_nlp_B')
 // CHECK:    ),
 // CHECK:     # Symbol 9
-// CHECK:    (('n_strx', 51)
+// CHECK:    (('n_strx', 71)
 // CHECK:     ('n_type', 0x1)
 // CHECK:     ('n_sect', 0)
 // CHECK:     ('n_desc', 0)


More information about the llvm-commits mailing list