[PATCH] D27155: Merge strings using concurrent hash map (3rd try!)

Sean Silva via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 27 21:28:10 PST 2016


silvas added a comment.

Wow, great work. I think I've convinced myself that this will be deterministic. The crucial observation is that with linear probing, the "streaks" (strings of consecutive occupied buckets) are always the same regardless of insertion order.

How much extra performance is there for sorting the streaks individually like you do in this patch? We could use a single call to std::remove_if to coalesce all the non-empty buckets and a single call to std::sort to get a deterministic order, which would be simpler. If that isn't too much of a performance cost, it would be simpler to do that.



================
Comment at: ELF/OutputSections.cpp:475
+// The internal hash table is an open-addressing one. It doesn't
+// support resizing. Once it becomes full, the behavior is undefined.
+//
----------------
This is extremely troubling as it implies that we need to *guarantee* that we have chosen the size large enough or else LLD will have undefined behavior.


================
Comment at: ELF/OutputSections.cpp:494
+// multiple threads claim buckets, claimed buckets are claimed, and
+// unclaimed buckets remain unclaimed. Therefore, by sortint buckets
+// for each streak, we can make the entire hash table deterministic.
----------------
This property depends critically on the linear probing. This would not hold if we used quadratic probing. It would be good to mention that.


================
Comment at: ELF/OutputSections.cpp:533
+
+  static_assert(sizeof(EntryTy) == 24, "EntryTy is too big");
+
----------------
Won't this end up smaller for 32-bit hosts?


================
Comment at: ELF/OutputSections.cpp:579
+
+    error("table is full");
+  }
----------------
This must be unreachable, or it must somehow signal this so that the calling code can retry the string table building with a bigger size. We can't have LLD fail to link because a user's object files don't have enough duplicate strings.


https://reviews.llvm.org/D27155





More information about the llvm-commits mailing list