[PATCH] D27155: Merge strings using concurrent hash map (3rd try!)
Rafael Avila de Espindola via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 9 21:01:08 PST 2016
Rui Ueyama via Phabricator via llvm-commits
<llvm-commits at lists.llvm.org> writes:
> ruiu added a comment.
>
> I'm struggling to improve the single-core performance of this patch. It scales well, but its single-core performance sucks. This is a table of link times for clang with debug info (in seconds). As you can see, you need at least 4 cores to take advantage of this patch.
What is the number of cache misses?
Given
+  size_t NumPieces = 0;
+  for (MergeInputSection<ELFT> *Sec : Sections)
+    NumPieces += Sec->Pieces.size();
+  ParallelBuilder =
+      new ParallelStringTableBuilder(NumPieces / 2, StringAlignment);
I expect this table to be enormous. Also, why is it valid to divide by 2?
> We cannot make the linker use this algorithm only when it detects 4 or more cores, because the choice of algorithm affects the layout of mergeable output sections. We want to get deterministic output for the same input regardless of how many processors are available on a computer.
What would be the slowdown from fully sorting the table?
Out of curiosity, have you tried something that works by divide and
conquer? It should be interesting to try a parallel_sort followed by
std::unique, since there are already good implementations of that.
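
To make the suggestion concrete, here is a minimal sketch of the
sort-then-unique approach. It is not code from the patch: sortAndUnique
is a hypothetical helper, it works on std::string rather than the
linker's section pieces, and it uses the sequential std::sort as a
stand-in for whatever parallel sort would actually be used.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; not part of D27155.
// Deduplicates strings by sorting them and dropping adjacent duplicates.
std::vector<std::string> sortAndUnique(std::vector<std::string> Strings) {
  // Assumption: a parallel sort would replace this call. The sequential
  // std::sort is used here only to keep the sketch portable.
  std::sort(Strings.begin(), Strings.end());

  // After sorting, equal strings are adjacent, so std::unique can move the
  // distinct ones to the front; erase() then trims the leftover tail.
  Strings.erase(std::unique(Strings.begin(), Strings.end()), Strings.end());
  return Strings;
}
```

The result depends only on the set of input strings, not on how the
sort was scheduled, so the layout would be the same regardless of how
many cores are available.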
Last but not least, I would still suggest checking how many strings .dwo
avoids copying.
Cheers,
Rafael