[patch] prototype of storing MCSymbolData directly in the MCAssembler

David Blaikie dblaikie at gmail.com
Thu Apr 24 10:28:24 PDT 2014


TL,DR; Playing around with data structures, not high priority - read
on only if you're bored/curious.

So when looking at the MCSymbolData leak in MCAsmStreamer I went
looking at wyh MCSymbolData was an ilist_node and I found the iplist
in MCAssembler.

This data isn't polymorphic, particularly large, nor requires the
ability to insert/remove/iterate from bare MCSymbolData objects (ie:
doesn't need the intrusive property nor the indirection of pointers).

So I prototyped the semi-obvious change of replacing these:

iplist<MCSymbolData> Symbols;
DenseMap<const MCSymbol*, MCSymbol*> SymbolMap;

with this:

std::unordered_map<const MCSymbol *, MCSymbolData> SymbolMap;

(hmm, now that I reread the "FIXME: avoid this indirection" I guess my
patch doesn't address that comment, which is presumably about avoiding
the need for the map entirely - which is a fair question: why isn't
the MCSymbolData just stored in the MCSymbol?)

But the remaining problem is that clients of the list of symbols need
stable insertion/iteration order which unordered_map does not provide
(but was provided by the extra Symbols list in the original
implementation).

Essentially what's required here is a map with stable iteration over
values and non-invalidation of value references on insertion - is this
something worth having a generic utility for? It'd look something like
MapVector, except:

* using a deque or other non-invalidating structure instead of a vector
* using deque iterators (instead of indexes) in the map values

It looks like many uses of MapVector actually only need stable
iteration over their keys, but evidently they don't need the iterator
non-invalidation I've found a need for.

Should MapVector be more adaptable and provide both (or all 4*...)
these needs? In that case if some client needs stable iteration over
keys and values then all clients will incur the extra N*sizeof(Key* or
Key) cost that MapVector must have to provide that functionality.

* (ordered values X ordered keys and values) X (iterator invalidation
X iterator-non-invalidation)
-------------- next part --------------
commit 53ace1d3da2f5b2ac1929c5a3483b8fd34540e75
Author: David Blaikie <dblaikie at gmail.com>
Date:   Thu Apr 24 08:04:57 2014 -0700

    Prototype storing MCSymbolData by value in an unordered_map.

diff --git include/llvm/ADT/iterator.h include/llvm/ADT/iterator.h
index b1d29a8..e308a48 100644
--- include/llvm/ADT/iterator.h
+++ include/llvm/ADT/iterator.h
@@ -178,6 +178,18 @@ struct pointee_iterator
   T &operator*() const { return **this->I; }
 };
 
+template <typename WrappedIteratorT,
+          typename T = decltype(std::declval<WrappedIteratorT>()->second)>
+struct second_iterator
+    : iterator_adaptor_base<second_iterator<WrappedIteratorT>, WrappedIteratorT,
+                            T> {
+  second_iterator() {}
+  template <typename U>
+  second_iterator(U &&u)
+      : second_iterator::iterator_adaptor_base(std::forward<U &&>(u)) {}
+
+  T &operator*() const { return this->I->second; }
+};
 }
 
 #endif
diff --git include/llvm/MC/MCAssembler.h include/llvm/MC/MCAssembler.h
index 9c7733c..5aaab1b 100644
--- include/llvm/MC/MCAssembler.h
+++ include/llvm/MC/MCAssembler.h
@@ -22,8 +22,10 @@
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/DataTypes.h"
+#include "llvm/ADT/iterator.h"
 #include <algorithm>
 #include <vector> // FIXME: Shouldn't be needed.
+#include <unordered_map>
 
 namespace llvm {
 class raw_ostream;
@@ -683,7 +685,7 @@ public:
 };
 
 // FIXME: Same concerns as with SectionData.
-class MCSymbolData : public ilist_node<MCSymbolData> {
+class MCSymbolData {
 public:
   const MCSymbol *Symbol;
 
@@ -726,8 +728,8 @@ public:
 public:
   // Only for use as sentinel.
   MCSymbolData();
-  MCSymbolData(const MCSymbol &_Symbol, MCFragment *_Fragment, uint64_t _Offset,
-               MCAssembler *A = nullptr);
+  MCSymbolData(const MCSymbol &_Symbol, MCFragment *_Fragment,
+               uint64_t _Offset);
 
   /// @name Accessors
   /// @{
@@ -826,13 +828,15 @@ class MCAssembler {
 
 public:
   typedef iplist<MCSectionData> SectionDataListType;
-  typedef iplist<MCSymbolData> SymbolDataListType;
+  // Use std::unordered_map to maintain pointer validity to MCSymbolData over
+  // insertions into the map.
+  typedef std::unordered_map<const MCSymbol *, MCSymbolData> SymbolMapType;
 
   typedef SectionDataListType::const_iterator const_iterator;
   typedef SectionDataListType::iterator iterator;
 
-  typedef SymbolDataListType::const_iterator const_symbol_iterator;
-  typedef SymbolDataListType::iterator symbol_iterator;
+  typedef second_iterator<SymbolMapType::const_iterator> const_symbol_iterator;
+  typedef second_iterator<SymbolMapType::iterator> symbol_iterator;
 
   typedef iterator_range<symbol_iterator> symbol_range;
   typedef iterator_range<const_symbol_iterator> const_symbol_range;
@@ -873,17 +877,13 @@ private:
 
   iplist<MCSectionData> Sections;
 
-  iplist<MCSymbolData> Symbols;
-
   /// The map of sections to their associated assembler backend data.
   //
   // FIXME: Avoid this indirection?
   DenseMap<const MCSection*, MCSectionData*> SectionMap;
 
   /// The map of symbols to their associated assembler backend data.
-  //
-  // FIXME: Avoid this indirection?
-  DenseMap<const MCSymbol*, MCSymbolData*> SymbolMap;
+  SymbolMapType SymbolMap;
 
   std::vector<IndirectSymbolData> IndirectSymbols;
 
@@ -1093,19 +1093,20 @@ public:
   /// @name Symbol List Access
   /// @{
 
-  const SymbolDataListType &getSymbolList() const { return Symbols; }
-  SymbolDataListType &getSymbolList() { return Symbols; }
-
-  symbol_iterator symbol_begin() { return Symbols.begin(); }
-  const_symbol_iterator symbol_begin() const { return Symbols.begin(); }
+  symbol_iterator symbol_begin() { return symbol_iterator(SymbolMap.begin()); }
+  const_symbol_iterator symbol_begin() const {
+    return const_symbol_iterator(SymbolMap.begin());
+  }
 
-  symbol_iterator symbol_end() { return Symbols.end(); }
-  const_symbol_iterator symbol_end() const { return Symbols.end(); }
+  symbol_iterator symbol_end() { return symbol_iterator(SymbolMap.end()); }
+  const_symbol_iterator symbol_end() const {
+    return const_symbol_iterator(SymbolMap.end());
+  }
 
   symbol_range symbols() { return make_range(symbol_begin(), symbol_end()); }
   const_symbol_range symbols() const { return make_range(symbol_begin(), symbol_end()); }
 
-  size_t symbol_size() const { return Symbols.size(); }
+  size_t symbol_size() const { return SymbolMap.size(); }
 
   /// @}
   /// @name Indirect Symbol List Access
@@ -1204,30 +1205,33 @@ public:
   }
 
   bool hasSymbolData(const MCSymbol &Symbol) const {
-    return SymbolMap.lookup(&Symbol) != nullptr;
+    return SymbolMap.count(&Symbol);
   }
 
   MCSymbolData &getSymbolData(const MCSymbol &Symbol) {
-    MCSymbolData *Entry = SymbolMap.lookup(&Symbol);
-    assert(Entry && "Missing symbol data!");
-    return *Entry;
+    auto I = SymbolMap.find(&Symbol);
+    assert(I != SymbolMap.end() && "Missing symbol data!");
+    return I->second;
   }
 
   const MCSymbolData &getSymbolData(const MCSymbol &Symbol) const {
-    MCSymbolData *Entry = SymbolMap.lookup(&Symbol);
-    assert(Entry && "Missing symbol data!");
-    return *Entry;
+    auto I = SymbolMap.find(&Symbol);
+    assert(I != SymbolMap.end() && "Missing symbol data!");
+    return I->second;
   }
 
   MCSymbolData &getOrCreateSymbolData(const MCSymbol &Symbol,
                                       bool *Created = nullptr) {
-    MCSymbolData *&Entry = SymbolMap[&Symbol];
+    auto Iter = SymbolMap.find(&Symbol);
 
-    if (Created) *Created = !Entry;
-    if (!Entry)
-      Entry = new MCSymbolData(Symbol, nullptr, 0, this);
+    if (Created)
+      *Created = Iter != SymbolMap.end();
 
-    return *Entry;
+    if (Iter == SymbolMap.end())
+      Iter = SymbolMap.insert(
+          Iter, std::make_pair(&Symbol, MCSymbolData(Symbol, nullptr, 0)));
+
+    return Iter->second;
   }
 
   const_file_name_iterator file_names_begin() const {
diff --git lib/MC/MCAssembler.cpp lib/MC/MCAssembler.cpp
index b3b2bb3..e7c55b9 100644
--- lib/MC/MCAssembler.cpp
+++ lib/MC/MCAssembler.cpp
@@ -279,15 +279,10 @@ MCSectionData::getSubsectionInsertionPoint(unsigned Subsection) {
 MCSymbolData::MCSymbolData() : Symbol(nullptr) {}
 
 MCSymbolData::MCSymbolData(const MCSymbol &_Symbol, MCFragment *_Fragment,
-                           uint64_t _Offset, MCAssembler *A)
-  : Symbol(&_Symbol), Fragment(_Fragment), Offset(_Offset),
-    IsExternal(false), IsPrivateExtern(false),
-    CommonSize(0), SymbolSize(nullptr), CommonAlign(0),
-    Flags(0), Index(0)
-{
-  if (A)
-    A->getSymbolList().push_back(this);
-}
+                           uint64_t _Offset)
+    : Symbol(&_Symbol), Fragment(_Fragment), Offset(_Offset), IsExternal(false),
+      IsPrivateExtern(false), CommonSize(0), SymbolSize(nullptr),
+      CommonAlign(0), Flags(0), Index(0) {}
 
 /* *** */
 
@@ -305,7 +300,6 @@ MCAssembler::~MCAssembler() {
 
 void MCAssembler::reset() {
   Sections.clear();
-  Symbols.clear();
   SectionMap.clear();
   SymbolMap.clear();
   IndirectSymbols.clear();


More information about the llvm-commits mailing list