[llvm-dev] RFC: LLD symbol table redesign

Wed Apr 27 11:49:07 PDT 2016

Hi all,

This proposes a redesign of LLD’s symbol table in order to improve memory
locality by minimizing indirection when resolving relocations. The key idea
is that we perform symbol resolution by overwriting SymbolBodies, rather
than by updating pointers. This is based on some ideas mentioned by Rafael
on IRC.

Conceptually, we split Symbol into a non-polymorphic part and a polymorphic
part (a SymbolBody). The non-polymorphic part contains the flags that are
currently stored in Symbol, while the polymorphic part would be a derived
class of SymbolBody stored directly in the Symbol object without a level of
indirection. The class definitions would roughly look like this:

struct Symbol {
  uint8_t Binding;
  unsigned Visibility : 2;
  // ...
  char Body[Max]; // Max is the maximum size of a SymbolBody
  SymbolBody *body() { return reinterpret_cast<SymbolBody *>(Body); }
};

class SymbolBody {
  const unsigned SymbolKind : 8;
  // …

  Symbol &symbol() {
    return *reinterpret_cast<Symbol *>(reinterpret_cast<char *>(this) -
offsetof(Symbol, Body));
  }
};

class Defined : public SymbolBody {
 ...
};

As we load symbols from input files, the input file calls methods on the
symbol table that add symbols of specific types. These methods would look
up the symbol name and make a decision about whether to replace the
existing symbol. This would be similar to the decision currently being made
by SymbolBody::compare, but it would be made directly using symbol
information, without needing to create a SymbolBody for the symbol. If we
decide to replace the symbol, we placement-new a SymbolBody for the symbol
into Symbol::Body.

A sketch of how the symbol table might implement adding an undefined symbol:

Symbol *SymbolTable<ELFT>::addUndefined(StringRef Name, uint8_t Binding, …)
{
  std::pair<Symbol *, bool> P = insert(Name);
  if (!P.second) { // symbol already in symbol table
    if (auto *L = dyn_cast<Lazy>(P.first->body()))
      addFile(L);
    return P.first;
  }
  // symbol previously unseen
  new (P.first->Body) Undefined(Name, Binding, ...);
  return P.first;
}

Input files would have an associated symbol list for use in resolving
relocations. This symbol list would be a std::vector<Symbol *>. Symbols
retrieved from the symbol table are added to the symbol list. Relocations
would be resolved by following two fewer levels of indirection, from the
vector to the Symbol, rather than the current vector -> SymbolBody ->
Symbol -> SymbolBody.

This design has a disadvantage over the current design: input file loading
would no longer be thread safe. However, input file loading is already a
highly serialized process since the set of files we need to parse is
unknown until we have examined all input files, so I think that doesn’t
matter.

Next steps: I intend to implement a prototype of this idea and get
benchmark numbers. If the numbers look good, I'll send out a patch.

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160427/0013d2c3/attachment.html>