[PATCH] D27155: Merge strings using concurrent hash map (3rd try!)

Mon Dec 5 22:59:03 PST 2016

ruiu added a comment.

I'm struggling to improve the single-core performance of this patch. It scales well, but its single-core performance is poor. The table below shows the link time of clang with debug info, in seconds. As you can see, you need at least 4 cores to take advantage of this patch.

`  # of cores   Before    After    Change
            1   13.462   17.048   +21.03%
            2    9.766   10.902   +10.42%
            4    7.697    6.935   -10.98%
            8    6.888    5.674   -21.39%
           12    7.073    5.812   -21.69%
           16    7.066    5.569   -26.88%
           20    6.846    5.226   -30.99%`

I tried to optimize it, but because it fundamentally does more work than the simple hash table approach, it is almost impossible for it to compete with the original algorithm (that said, I still think it is too slow).

We cannot make the linker use this algorithm only when it detects 4 or more cores, because the choice of algorithm affects the layout of mergeable output sections. We want deterministic output for the same input regardless of how many processors are available on the computer.

I'm starting to think that the second, sharded algorithm may be better than this one because, even though it doesn't scale as well as this algorithm, its single-core performance is not that bad. I'll update the patch with performance numbers.
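
For illustration only, here is a minimal sketch of what a sharded merge could look like. It is not the code from either patch. The names `NumShards`, `MergeResult`, `hashString` and `mergeStrings` are hypothetical, FNV-1a is an arbitrary choice of hash, and the shard loop is written sequentially for simplicity. The point it tries to show is that when the shard count is a fixed constant, the resulting offsets depend only on the input and not on how many cores are available.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fixed shard count. It must not depend on the number of
// cores, otherwise the output layout would vary between machines.
constexpr std::size_t NumShards = 4;

struct MergeResult {
  std::vector<std::size_t> Offsets; // output offset of each input string
  std::size_t Size = 0;             // total size of the merged section
};

// FNV-1a, chosen here only because it yields the same value everywhere.
inline std::uint64_t hashString(const std::string &S) {
  std::uint64_t H = 14695981039346656037ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

inline MergeResult mergeStrings(const std::vector<std::string> &Strings) {
  MergeResult R;
  R.Offsets.resize(Strings.size());
  std::vector<std::size_t> ShardOf(Strings.size());
  for (std::size_t I = 0; I < Strings.size(); ++I)
    ShardOf[I] = hashString(Strings[I]) % NumShards;

  // Each iteration touches only its own table and its own strings, so the
  // iterations could run on separate threads without locking.
  std::size_t ShardSize[NumShards] = {};
  for (std::size_t Shard = 0; Shard < NumShards; ++Shard) {
    std::unordered_map<std::string, std::size_t> Table;
    for (std::size_t I = 0; I < Strings.size(); ++I) {
      if (ShardOf[I] != Shard)
        continue;
      auto P = Table.emplace(Strings[I], ShardSize[Shard]);
      if (P.second)
        ShardSize[Shard] += Strings[I].size() + 1; // +1 for the NUL
      R.Offsets[I] = P.first->second; // shard-local offset for now
    }
  }

  // Lay the shards out back to back and rebase the shard-local offsets.
  std::size_t Base[NumShards];
  for (std::size_t Shard = 0; Shard < NumShards; ++Shard) {
    Base[Shard] = R.Size;
    R.Size += ShardSize[Shard];
  }
  for (std::size_t I = 0; I < Strings.size(); ++I)
    R.Offsets[I] += Base[ShardOf[I]];
  return R;
}
```

In this sketch, calling `mergeStrings` on the same input always yields the same offsets, and duplicate strings share one offset. The cost is that every shard scans the whole input, which is one reason a scheme like this would not be expected to scale as well as a concurrent hash map.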

I'm sorry for the back-and-forth.

https://reviews.llvm.org/D27155