[all-commits] [llvm/llvm-project] f4f85e: [llvm-profdata] Remove MD5 collision check in D147...
William Junda Huang via All-commits
all-commits at lists.llvm.org
Fri Sep 15 15:31:05 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: f4f85e0ab405c89e1b843401a055538bd26a0187
https://github.com/llvm/llvm-project/commit/f4f85e0ab405c89e1b843401a055538bd26a0187
Author: William Junda Huang <williamjhuang at google.com>
Date: 2023-09-15 (Fri, 15 Sep 2023)
Changed paths:
M llvm/include/llvm/ProfileData/SampleProf.h
M llvm/unittests/tools/llvm-profdata/CMakeLists.txt
R llvm/unittests/tools/llvm-profdata/MD5CollisionTest.cpp
Log Message:
-----------
[llvm-profdata] Remove MD5 collision check in D147740 (#66544)
This is the patch at https://reviews.llvm.org/D153692, migrated to
GitHub.
After testing D147740 on multiple industrial projects with ~10 million
FunctionSamples, no MD5 collision has been found. For a uniformly
distributed hash, the probability of at least one collision among N
symbols over K possible hash values is 1 - K!/((K-N)! * K^N). When N is
1 million and K is 2^64, the probability is about 3*10^-8; when N is 10
million, it is about 3*10^-6, so we are unlikely to encounter an actual
collision in real-world applications. (However, if K is 2^32, the
probability of collision is almost 1. This is a real problem if anyone
still uses a large profile on a 32-bit machine, because hash_code is
tied to size_t.) Furthermore, when a collision happens we cannot recover
from it except by using a multi-map, which is significantly slower and
defeats the purpose of optimizing the profile reader. Finally, we have
been using profiles with MD5 names, which have to come from non-MD5
sources, so any hash collision would already have happened when the
non-MD5 profile was converted to an MD5 one. There is therefore no point
in checking for collisions in the reader, and this feature can be
removed.
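
The collision probabilities quoted above can be reproduced with a short
script. This is only an illustrative sketch, not part of the patch: the
function name collision_probability is hypothetical, and it assumes a
uniformly distributed hash. Rather than evaluating the factorial
expression directly, it uses the standard birthday approximation
1 - exp(-N*(N-1)/(2*K)), which is accurate when N is much smaller than K.

```python
import math


def collision_probability(n: int, k: int) -> float:
    """Approximate P(at least one collision) for n uniform draws from k values.

    Hypothetical helper for illustration. It uses the birthday approximation
    1 - exp(-n*(n-1)/(2*k)) in place of the exact 1 - k!/((k-n)! * k**n).
    """
    exponent = n * (n - 1) / (2 * k)
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # The three (N, K) cases discussed in the commit message.
    for n, k in [(10**6, 2**64), (10**7, 2**64), (10**6, 2**32)]:
        p = collision_probability(n, k)
        print(f"N={n:.0e}, K=2^{k.bit_length() - 1}: p ~= {p:.2e}")
```

Under these assumptions the 64-bit cases come out at roughly 2.7*10^-8
and 2.7*10^-6, consistent with the rounded figures in the message, while
the 32-bit case is indistinguishable from 1 in double precision.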