[PATCH] D152162: DWP multithreading

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 30 10:52:43 PDT 2023


dblaikie added a comment.

In D152162#4463413 <https://reviews.llvm.org/D152162#4463413>, @zhuna8616 wrote:

> We investigated the performance of both LLVM's and gold's dwp, and found gold's performance is not as good, specifically 0.6 of llvm-dwp's when packing clang's dwp. They share a bottleneck in merging the debug_str.dwo section, which for llvm-dwp is the function writeStringsAndOffsets. This is because maintaining a hash set of strings is slow, and each string requires at least one lookup. Gold uses std::unordered_map, while LLVM uses DenseMap, which is faster than std::unordered_map and so makes gold slower than LLVM.

Huh, fascinating. How's the memory usage? (I'd expect the memory usage would be way higher, and the extra copying into/out of buffers would add up to cost more than the savings in string map lookups - but I haven't done detailed profiles, admittedly)

> We changed LLVM's DenseMap to StringMap, and achieved an improvement of at least 5%, and 17%~20% when packing for clang. This is low-hanging fruit, as @dblaikie put it. I believe we can at least make this improvement.

Yep, that sounds like a freebie - please send that as a separate review?
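
For reference, the per-string lookup being discussed has roughly the shape below. This is a minimal sketch using only the standard library, not the actual writeStringsAndOffsets code: `StringTable` and `add` are hypothetical names, and std::unordered_map stands in for whichever map (DenseMap or StringMap) the real tool uses.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for the merged debug_str.dwo contents.
struct StringTable {
  std::string Data;                                  // NUL-terminated strings, back to back
  std::unordered_map<std::string, uint32_t> Offsets; // string -> offset in Data

  // Returns the offset of S in the merged table, appending S the first
  // time it is seen. Every input string costs at least one map lookup.
  uint32_t add(std::string_view S) {
    auto [It, Inserted] = Offsets.try_emplace(
        std::string(S), static_cast<uint32_t>(Data.size()));
    if (Inserted) {
      Data.append(S);
      Data.push_back('\0');
    }
    return It->second;
  }
};
```

The map is on the hot path for every string in every input, which is why the choice of container shows up in the profile.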

> With all of the above said, adding multithreading for just the merging of the debug_str.dwo section would give an improvement of 17%~22% on average for our projects, with 16 threads. This would not be as messy as the present patch, which also adds multithreading in other places. If only the code concerning debug_str.dwo is modified, I believe it would be friendly to long-term engineering efforts.
>
> Furthermore, if we do not deduplicate the string tables produced by the worker threads after they are done with their assigned files, the improvement in performance can be 59%~190%, as observed on our projects. The size of the produced DWP file increases by less than 9% relative to the file produced by the original LLVM implementation.
>
> So we propose 3 options of changes:
>
> - Change DenseMap to StringMap.
> - Add multithreading for the merging of debug_str.dwo.
> - Add multithreading for the merging of debug_str.dwo and a command line option controlling whether the threads deduplicate the string table.
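
To make the non-deduplicating variant of the third option concrete: each worker's table is simply appended to the output, and offsets recorded by that worker are shifted by the table's base. This is a sketch under assumptions, not code from the patch; `concatTables` and its parameters are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Concatenates per-worker string tables without cross-worker deduplication.
// Returns the base offset of each worker's table in the combined section;
// an offset O recorded by worker I becomes Bases[I] + O in the output.
std::vector<uint32_t> concatTables(const std::vector<std::string> &Locals,
                                   std::string &Out) {
  std::vector<uint32_t> Bases;
  Bases.reserve(Locals.size());
  for (const std::string &Local : Locals) {
    Bases.push_back(static_cast<uint32_t>(Out.size()));
    Out += Local; // strings shared between workers are stored more than once
  }
  return Bases;
}
```

Strings that appear in more than one worker's table are duplicated in the output, which is where the reported size increase comes from.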

My concern with multithreaded string merging is determinism, I think? It's important that we produce the same output bits given the same inputs - are the multithreaded string merging approaches you have in mind still deterministic?
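
One way to keep the output independent of thread scheduling is sketched below: split the inputs into contiguous ranges, let each worker deduplicate its own range, then merge the per-worker results serially in range order. This is only an illustration under assumptions, not the approach taken in the patch or in lld; `mergeDeterministic` is a hypothetical name and the standard library stands in for LLVM's containers and thread pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_set>
#include <vector>

// Returns the unique strings of Inputs in first-seen order, computed with
// NumThreads workers (NumThreads must be at least 1).
std::vector<std::string>
mergeDeterministic(const std::vector<std::string> &Inputs,
                   unsigned NumThreads) {
  std::vector<std::vector<std::string>> Locals(NumThreads);
  std::vector<std::thread> Workers;
  const size_t Chunk = (Inputs.size() + NumThreads - 1) / NumThreads;
  for (unsigned T = 0; T < NumThreads; ++T) {
    Workers.emplace_back([&, T] {
      // Each worker owns one contiguous range and writes only Locals[T].
      const size_t Begin = std::min(Inputs.size(), T * Chunk);
      const size_t End = std::min(Inputs.size(), Begin + Chunk);
      std::unordered_set<std::string> Seen;
      for (size_t I = Begin; I < End; ++I)
        if (Seen.insert(Inputs[I]).second)
          Locals[T].push_back(Inputs[I]);
    });
  }
  for (std::thread &W : Workers)
    W.join();

  // Serial merge in range order: the result depends only on the input
  // order, not on which worker happened to finish first.
  std::vector<std::string> Out;
  std::unordered_set<std::string> Seen;
  for (const auto &Local : Locals)
    for (const std::string &S : Local)
      if (Seen.insert(S).second)
        Out.push_back(S);
  return Out;
}
```

Because the ranges are contiguous and merged in order, the result matches a single-threaded pass for any thread count. The serial merge still touches every locally unique string, so the speedup is bounded by how much duplication each range removes.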

It'd be great if we could reuse some of lld's string merging support, since they've already thought about these sorts of issues and, I believe, figured out ways to do it deterministically and fast/multithreaded.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152162/new/

https://reviews.llvm.org/D152162


