[PATCH] [MachO] MachOWriter generates bad indirect symbol tables when sections are split

David Fang fang at csl.cornell.edu
Tue Mar 12 22:53:50 PDT 2013


Michael, Daniel,
 	The attached patch I wrote works for me so far on my small test 
cases: llvm/clang's object file matches the indirect symbol ordering of 
the /usr/bin/as-assembled object's (tested on my powerpc-darwin8 branch).
Michael, could you try it out on your own test cases?
Daniel, could you kindly comment and review?  Just the technical merits, 
not style.
 	I've also attached the same patch to bug 14636.

Thanks for taking a look.

Fang

> Daniel,
> 	 Some quick questions about your proposed recipe:
>
>> >  Ok, here is what I think the right fix is. Instead of creating the
>> >  IndirectSymBase mapping we use to associate sections with their
>> >  indirect offset start, in BindIndirectSymbols() we should:
>> > 
>> >  1. Add a simple container struct (lets say MachSectionIndirectSymbols)
>> >  for tracking the per-section list of indirect symbols. It will keep
>> >  the list of symbols in the section and the index of the first indirect
>> >  symbol in the section.
>
> Like a map<section*, vector<indirect_symbol> >?
>
> Instead of keeping track of the first indirect symbol index, could we just as 
> easily compute the index offsets later during the second pass:
>
> offset = 0;
> for each section S (in order)
> 	 record offset for section S
> 	 [optional] write vector of indirect symbols for S
> 	 offset += S.num_indirect_symbols();
> end for
>
> I was able to fix the reserve1 field by doing something like this (bug 14636, 
> comments 30-33), but I only later discovered that the indirect symbols needed 
> to be reordered.
>
>> >  2. Keep a mapping from sections to the above type.
>> >  3. Add a SetVector to record the order of the sections that have
>> >  indirect symbols.
>
> This I understand, I found SetVector and SmallSetVector in ADT.
>
>> >  4. During BindIndirectSymbols() we maintain the above information
>> >  (populating the MachSectionIndirectSymbols per-section symbol arrays).
>
> BindIndirectSymbols looks like it need not change, because it's already 
> operating on the assumption of ordered indirect symbols, right?
>
>> >  5. Once we have scanned all the symbols we make another pass over the
>> >  sections (in the order seen via indirect symbols) and assign the start
>> >  indices.
>
> as I outlined above (offset = start indices).
>
>> >  6. Update writing of the indirect symbol table to write in the same
>> >  order as traversed in #5.
>> > 
>> >  Does that make sense? It's more work than your patch but it (a) should
>> >  preserve binary compatibility with 'as' in situations where the
>> >  indirect symbols don't appear out of order w.r.t. the sections, (b) it
>> >  makes somewhere more explicit the relationship between sections and
>> >  their list of symbols being in contiguous order.
>> > 
>> >  - Daniel
>
> Will this re-ordering be acceptable to all Mach-O back-ends?  x86, ARM, PPC?
>
> Fang
>
>

-- 
David Fang
http://www.csl.cornell.edu/~fang/
-------------- next part --------------
diff --git a/include/llvm/MC/MCAssembler.h b/include/llvm/MC/MCAssembler.h
index 43fbdc9..2599edb 100644
--- a/include/llvm/MC/MCAssembler.h
+++ b/include/llvm/MC/MCAssembler.h
@@ -19,6 +19,15 @@
 #include "llvm/MC/MCInst.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/DataTypes.h"
+/**
+	Mach-O needs indirect symbols grouped by section.
+	Goal: 1
+ */
+#define	ORDER_INDIRECT_SYMBOLS_BY_SECTION	1
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+#include "llvm/ADT/SetVector.h"
+#endif
+
 #include <vector> // FIXME: Shouldn't be needed.
 
 namespace llvm {
@@ -781,14 +790,16 @@ public:
   void dump();
 };
 
+#if !ORDER_INDIRECT_SYMBOLS_BY_SECTION
 // FIXME: This really doesn't belong here. See comments below.
 struct IndirectSymbolData {
   MCSymbol *Symbol;
   MCSectionData *SectionData;
 };
+#endif
 
 // FIXME: Ditto this. Purely so the Streamer and the ObjectWriter can talk
 // to one another.
@@ -810,9 +821,11 @@ public:
   typedef SymbolDataListType::const_iterator const_symbol_iterator;
   typedef SymbolDataListType::iterator symbol_iterator;
 
+#if !ORDER_INDIRECT_SYMBOLS_BY_SECTION
   typedef std::vector<IndirectSymbolData>::const_iterator
     const_indirect_symbol_iterator;
   typedef std::vector<IndirectSymbolData>::iterator indirect_symbol_iterator;
+#endif
 
   typedef std::vector<DataRegionData>::const_iterator
     const_data_region_iterator;
@@ -846,7 +859,19 @@ private:
   // FIXME: Avoid this indirection?
   DenseMap<const MCSymbol*, MCSymbolData*> SymbolMap;
 
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+public:
+  typedef std::vector<const MCSymbol*>	IndirectSymbol_list_type;
+  typedef DenseMap<const MCSectionData*, IndirectSymbol_list_type>
+						IndirectSymbol_map_type;
+  // in order of appearance
+  typedef SetVector<const MCSectionData*>	IndirectSymbolSection_set_type;
+private:
+  IndirectSymbol_map_type		IndirectSymbols;
+  IndirectSymbolSection_set_type	IndirectSymbolSections;
+#else
   std::vector<IndirectSymbolData> IndirectSymbols;
+#endif
 
   std::vector<DataRegionData> DataRegions;
 
@@ -1050,6 +1075,25 @@ public:
   /// @name Indirect Symbol List Access
   /// @{
 
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+  const IndirectSymbol_map_type& getIndirectSymbols(void) const {
+    return IndirectSymbols;
+  }
+
+  const IndirectSymbolSection_set_type& getIndirectSymbolSections(void) const {
+    return IndirectSymbolSections;
+  }
+
+  void
+  push_indirect_symbol(const MCSectionData* s, const MCSymbol* d) {
+    // track indirect symbols by section,
+    // and their sections by order of appearance
+    IndirectSymbolSections.insert(s);
+    IndirectSymbols[s].push_back(d);
+  }
+
+  size_t indirect_symbol_size(void) const;	// accumulate size over buckets
+#else
   // FIXME: This is a total hack, this should not be here. Once things are
   // factored so that the streamer has direct access to the .o writer, it can
   // disappear.
@@ -1072,6 +1116,8 @@ public:
   }
 
   size_t indirect_symbol_size() const { return IndirectSymbols.size(); }
+#endif
+
 
   /// @}
   /// @name Linker Option List Access
diff --git a/lib/MC/MCAssembler.cpp b/lib/MC/MCAssembler.cpp
index 1829266..cbb4ffd 100644
--- a/lib/MC/MCAssembler.cpp
+++ b/lib/MC/MCAssembler.cpp
@@ -29,6 +29,10 @@
 #include "llvm/Support/TargetRegistry.h"
 #include "llvm/Support/raw_ostream.h"
 
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+#include <numeric>		// for std::accumulate
+#endif
+
 using namespace llvm;
 
 namespace {
@@ -277,6 +281,9 @@ void MCAssembler::reset() {
   SectionMap.clear();
   SymbolMap.clear();
   IndirectSymbols.clear();
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+  IndirectSymbolSections.clear();
+#endif
   DataRegions.clear();
   ThumbFuncs.clear();
   RelaxAll = false;
@@ -290,6 +297,19 @@ void MCAssembler::reset() {
   getWriter().reset();
 }
 
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+static size_t size_adder(const size_t a,
+	const MCAssembler::IndirectSymbol_map_type::value_type& b) {
+	return a +b.second.size();
+}
+
+size_t
+MCAssembler::indirect_symbol_size(void) const {
+  return std::accumulate(IndirectSymbols.begin(), IndirectSymbols.end(), 0,
+	&size_adder);
+}
+#endif
+
 bool MCAssembler::isSymbolLinkerVisible(const MCSymbol &Symbol) const {
   // Non-temporary labels should always be visible to the linker.
   if (!Symbol.isTemporary())
diff --git a/lib/MC/MCELFStreamer.cpp b/lib/MC/MCELFStreamer.cpp
index c1428d8..3f81ac0 100644
--- a/lib/MC/MCELFStreamer.cpp
+++ b/lib/MC/MCELFStreamer.cpp
@@ -133,10 +133,15 @@ void MCELFStreamer::EmitSymbolAttribute(MCSymbol *Symbol,
   if (Attribute == MCSA_IndirectSymbol) {
     // Note that we intentionally cannot use the symbol data here; this is
     // important for matching the string table that 'as' generates.
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+    // symbols grouped by section
+    getAssembler().push_indirect_symbol(getCurrentSectionData(), Symbol);
+#else
     IndirectSymbolData ISD;
     ISD.Symbol = Symbol;
     ISD.SectionData = getCurrentSectionData();
     getAssembler().getIndirectSymbols().push_back(ISD);
+#endif
     return;
   }
 
diff --git a/lib/MC/MCMachOStreamer.cpp b/lib/MC/MCMachOStreamer.cpp
index 7d08d0e..b90daf0 100644
--- a/lib/MC/MCMachOStreamer.cpp
+++ b/lib/MC/MCMachOStreamer.cpp
@@ -224,10 +224,15 @@ void MCMachOStreamer::EmitSymbolAttribute(MCSymbol *Symbol,
   if (Attribute == MCSA_IndirectSymbol) {
     // Note that we intentionally cannot use the symbol data here; this is
     // important for matching the string table that 'as' generates.
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+    // symbols grouped by section
+    getAssembler().push_indirect_symbol(getCurrentSectionData(), Symbol);
+#else
     IndirectSymbolData ISD;
     ISD.Symbol = Symbol;
     ISD.SectionData = getCurrentSectionData();
     getAssembler().getIndirectSymbols().push_back(ISD);
+#endif
     return;
   }
 
diff --git a/lib/MC/MachObjectWriter.cpp b/lib/MC/MachObjectWriter.cpp
index a5ba3c3..3686447 100644
--- a/lib/MC/MachObjectWriter.cpp
+++ b/lib/MC/MachObjectWriter.cpp
@@ -427,7 +428,8 @@ void MachObjectWriter::BindIndirectSymbols(MCAssembler &Asm) {
   // table much more complicated than it is worth.
   //
   // FIXME: Revisit this when the dust settles.
 
+#if !ORDER_INDIRECT_SYMBOLS_BY_SECTION
   // Bind non lazy symbol pointers first.
   unsigned IndirectIndex = 0;
   for (MCAssembler::indirect_symbol_iterator it = Asm.indirect_symbol_begin(),
@@ -466,6 +468,43 @@ void MachObjectWriter::BindIndirectSymbols(MCAssembler &Asm) {
     if (Created)
       Entry.setFlags(Entry.getFlags() | 0x0001);
   }
+#else // ORDER_INDIRECT_SYMBOLS_BY_SECTION
+  unsigned offset = 0;	// running total of offset
+  const MCAssembler::IndirectSymbolSection_set_type&
+    ISS(Asm.getIndirectSymbolSections());
+  const MCAssembler::IndirectSymbol_map_type&
+    ISyms(Asm.getIndirectSymbols());
+  MCAssembler::IndirectSymbolSection_set_type::const_iterator
+	i(ISS.begin()), e(ISS.end());
+  for ( ; i!=e; ++i) {
+    const MCAssembler::IndirectSymbol_list_type& b(ISyms.find(*i)->second);
+    MCAssembler::IndirectSymbol_list_type::const_iterator
+      bi(b.begin()), be(b.end());
+    const MCSectionMachO& Section(cast<MCSectionMachO>((*i)->getSection()));
+    switch (Section.getType()) {
+    default: break;
+    case MCSectionMachO::S_NON_LAZY_SYMBOL_POINTERS: {
+      for ( ; bi!=be; ++bi)
+        Asm.getOrCreateSymbolData(**bi);
+      break;
+    }
+    case MCSectionMachO::S_LAZY_SYMBOL_POINTERS: // fall-through
+    case MCSectionMachO::S_SYMBOL_STUBS: {
+      for ( ; bi!=be; ++bi) {
+    // Set the symbol type to undefined lazy, but only on construction.
+    // FIXME: Do not hardcode.
+        bool Created;
+        MCSymbolData &Entry(Asm.getOrCreateSymbolData(**bi, &Created));
+        if (Created)
+          Entry.setFlags(Entry.getFlags() | 0x0001);
+      }	// end for
+      break;
+    }
+    }	// end switch(sectionType)
+    IndirectSymBase.insert(std::make_pair(*i, offset));
+    offset += b.size();
+  }
+#endif
 }
 
 /// ComputeSymbolTable - Compute the symbol table data
@@ -895,26 +934,47 @@ void MachObjectWriter::WriteObject(MCAssembler &Asm,
   // Write the symbol table data, if used.
   if (NumSymbols) {
     // Write the indirect symbol entries.
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+    const MCAssembler::IndirectSymbol_map_type&
+      m(Asm.getIndirectSymbols());
+    const MCAssembler::IndirectSymbolSection_set_type&
+      ss(Asm.getIndirectSymbolSections());
+    for (MCAssembler::IndirectSymbolSection_set_type::const_iterator
+           si(ss.begin()), se(ss.end()); si != se; ++si) {
+      const MCAssembler::IndirectSymbol_list_type& l(m.find(*si)->second);
+      for (MCAssembler::IndirectSymbol_list_type::const_iterator
+           it(l.begin()), ie(l.end()); it != ie; ++it)
+#else
     for (MCAssembler::const_indirect_symbol_iterator
            it = Asm.indirect_symbol_begin(),
-           ie = Asm.indirect_symbol_end(); it != ie; ++it) {
+           ie = Asm.indirect_symbol_end(); it != ie; ++it)
+#endif
+    {
       // Indirect symbols in the non lazy symbol pointer section have some
       // special handling.
       const MCSectionMachO &Section =
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+        static_cast<const MCSectionMachO&>((*si)->getSection());
+        const MCSymbol* Sym = *it;
+#else
         static_cast<const MCSectionMachO&>(it->SectionData->getSection());
+        const MCSymbol* Sym = it->Symbol;
+#endif
       if (Section.getType() == MCSectionMachO::S_NON_LAZY_SYMBOL_POINTERS) {
         // If this symbol is defined and internal, mark it as such.
-        if (it->Symbol->isDefined() &&
-            !Asm.getSymbolData(*it->Symbol).isExternal()) {
+        if (Sym->isDefined() &&
+            !Asm.getSymbolData(*Sym).isExternal()) {
           uint32_t Flags = macho::ISF_Local;
-          if (it->Symbol->isAbsolute())
+          if (Sym->isAbsolute())
             Flags |= macho::ISF_Absolute;
           Write32(Flags);
           continue;
         }
       }
-
-      Write32(Asm.getSymbolData(*it->Symbol).getIndex());
+      Write32(Asm.getSymbolData(*Sym).getIndex());
+#if ORDER_INDIRECT_SYMBOLS_BY_SECTION
+      }
+#endif
     }
 
     // FIXME: Check that offsets match computed ones.


More information about the llvm-commits mailing list