[PATCH] D27152: Merge strings using sharded hash tables.

Tue Dec 6 21:48:01 PST 2016

silvas added a comment.

When running with just a single core, can we still do a sharded visitation, but instead of doing NumShards passes through all the pieces (skipping ones not in the current shard), only do a single pass? We should be able to get identical layout in that case.

I.e., instead of

          if ((Hash % NumShards) != Idx)
            continue;
  ...
          auto P = OffsetMap[Idx].insert({{S, Hash}, Off});

instead do:

  auto P = OffsetMap[Hash % NumShards].insert({{S, Hash}, Off});

Maybe that can recover some of the single-threaded performance? If not, then we definitely need comments explaining what is causing the single-thread slowdown (which is equivalent to a throughput reduction; improving latency is good, but the throughput reduction seems to be about 20%+ which might be a problem (for example, for a buildbot))

================
Comment at: ELF/OutputSections.cpp:530

-template <class ELFT> void MergeOutputSection<ELFT>::finalizeNoTailMerge() {
-  // Add all string pieces to the string table builder to create section
-  // contents. Because we are not tail-optimizing, offsets of strings are
-  // fixed when they are added to the builder (string table builder contains
-  // a hash table from strings to offsets).
-  for (MergeInputSection<ELFT> *Sec : Sections)
-    for (size_t I = 0, E = Sec->Pieces.size(); I != E; ++I)
-      if (Sec->Pieces[I].Live)
-        Sec->Pieces[I].OutputOff = Builder.add(Sec->getData(I));
-
-  Builder.finalizeInOrder();
-  this->Size = Builder.getSize();
+static size_t align2(size_t Val, size_t Alignment) {
+  return (Val + Alignment - 1) & ~(Alignment - 1);
----------------
Do we already have a helper for this in libsupport?

================
Comment at: ELF/OutputSections.cpp:562
+  const int NumShards = 8;
+  DenseMap<CachedHashStringRef, size_t> OffsetMap[NumShards];
+  size_t ShardSize[NumShards];
----------------
Please pad this so that there isn't false sharing. DenseMap is smaller than a cacheline IIRC and so currently different threads will have false sharing.

https://reviews.llvm.org/D27152