[PATCH] D153692: Fixed D147740 - [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map

William Junda Huang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 25 20:53:30 PDT 2023

huangjd added a comment.

In D153692#4447455 <https://reviews.llvm.org/D153692#4447455>, @wenlei wrote:

> Copying comments from the original patch for continuation..
>> In D147740#4443233 <https://reviews.llvm.org/D147740#4443233>, @huangjd wrote:
>> Actually do we really care about MD5 collision? ExtBinary format already ignored MD5 collision for regular string names (and therefore regular function profiles), as only one of two functions with colliding MD5 get written to the name table (and the other is therefore lost). If we are using CS profiles, since different CS profiles have different serializations, their > hashes are distributed as expected. The most important thing is that, even if we detect a hash collision, we can't do anything about it except logging it (using a multi-map makes the reader much slower), so I think the MD5 collision check should be marked as LLVM_DEBUG. This does reduce 0.5 second out of ~30 seconds (1.67%) over the 1 GB profile read .
> I don't think we care. Is the new type HashKeyMap and SampleProfileMap all for detecting and reporting collision? I'd avoid all that complexity and prefer a simple DenseMap + a SampleContext->hash_code converter and not even bother with debug prints for collision...

I will add a separate patch to remove the hash collision check. It would be noteworthy if users reports collision in actual use cases, and if that doesn't happen in a while the hash collision check can be removed. Even the probability of collision is very small for one single program but given that LLVM is used everywhere so we can't be so sure. 
The new wrapper type also serves another purpose that existing code can use emplace, find, erase, etc without changing in most cases, and I am planning to have CallTargetMap and FunctionSamplesMap using that as well.

  rG LLVM Github Monorepo



More information about the llvm-commits mailing list