[PATCH] D27155: Merge strings using concurrent hash map (3rd try!)
Rui Ueyama via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 5 22:59:03 PST 2016
ruiu added a comment.
I'm struggling to improve the single-core performance of this patch. It scales well, but its single-core performance sucks. The table below shows the time to link clang with debug info (in seconds). As you can see, you need at least 4 cores to benefit from this patch.
` # of cores   Before    After    Change (relative to After)
   1          13.462   17.048   +21.03%
   2           9.766   10.902   +10.42%
   4           7.697    6.935   -10.98%
   8           6.888    5.674   -21.39%
  12           7.073    5.812   -21.69%
  16           7.066    5.569   -26.88%
  20           6.846    5.226   -30.99%`
I tried to optimize it, but because it fundamentally does more work than the simple hash table approach, it is almost impossible to compete with the original algorithm on a single core (that said, I still think this is too slow).
We cannot make the linker use this algorithm only when it detects 4 or more cores, because the choice of algorithm affects the layout of mergeable output sections. We want to get deterministic output for the same input regardless of how many processors are available on a computer.
I started thinking that the second, sharded algorithm may be better than this one because, even though it doesn't scale like this algorithm, its single-core performance is not that bad. I'll update the patch with performance numbers.
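To make the determinism requirement concrete, here is a minimal sketch of what a sharded string merge could look like. It is not the code from either patch: `mergeSharded`, `numShards`, and `numThreads` are hypothetical names, and the use of `std::hash`, `std::unordered_map`, and `std::thread` is an assumption made only for illustration. The idea it tries to show is that each string is assigned to a shard by its hash, each shard is filled by exactly one thread in input order, and the shard sizes are then prefix-summed into final offsets, so the layout depends on the shard count but not on the number of threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the patch): deduplicates `strings` and returns
// the output-section offset assigned to each input string.
std::vector<std::size_t> mergeSharded(const std::vector<std::string> &strings,
                                      std::size_t numShards,
                                      std::size_t numThreads) {
  assert(numShards > 0 && numThreads > 0);

  // Pick a shard for every string up front, based only on its hash.
  std::vector<std::size_t> shardOf(strings.size());
  for (std::size_t i = 0; i < strings.size(); ++i)
    shardOf[i] = std::hash<std::string>{}(strings[i]) % numShards;

  // One independent hash table and running size per shard.
  std::vector<std::unordered_map<std::string, std::size_t>> tables(numShards);
  std::vector<std::size_t> shardSize(numShards, 0);

  // Worker `id` owns every shard s with s % numThreads == id, so no two
  // threads ever touch the same table. Each worker scans the whole input in
  // order, which keeps the offsets inside a shard independent of scheduling.
  auto work = [&](std::size_t id) {
    for (std::size_t i = 0; i < strings.size(); ++i) {
      std::size_t shard = shardOf[i];
      if (shard % numThreads != id)
        continue;
      bool isNew = tables[shard].emplace(strings[i], shardSize[shard]).second;
      if (isNew)
        shardSize[shard] += strings[i].size() + 1; // +1 for the NUL terminator
    }
  };

  std::vector<std::thread> workers;
  for (std::size_t id = 0; id < numThreads; ++id)
    workers.emplace_back(work, id);
  for (std::thread &t : workers)
    t.join();

  // Lay the shards out back to back: base[s] is where shard s starts.
  std::vector<std::size_t> base(numShards, 0);
  for (std::size_t s = 1; s < numShards; ++s)
    base[s] = base[s - 1] + shardSize[s - 1];

  // Final offset = start of the shard + offset within the shard.
  std::vector<std::size_t> offsets(strings.size());
  for (std::size_t i = 0; i < strings.size(); ++i)
    offsets[i] = base[shardOf[i]] + tables[shardOf[i]].at(strings[i]);
  return offsets;
}
```

With a fixed `numShards`, calling `mergeSharded(input, 4, 1)` and `mergeSharded(input, 4, 3)` should return the same offsets, which is the property needed for reproducible output. Note that in this sketch every worker still walks the entire input, so it illustrates the layout argument rather than the performance characteristics reported above.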
I'm sorry for the back-and-forth.
https://reviews.llvm.org/D27155