[PATCH] D40736: [CodeView] Add support for type record content hashing

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 5 12:57:09 PST 2017


What is the reason to use different hash functions for these two cases? I
mean, if using SHA1 is faster than a non-cryptographic hash function with
content comparison, why don't you always use SHA1?

On Tue, Dec 5, 2017 at 7:03 AM, Zachary Turner <zturner at google.com> wrote:

> In the case of a LocallyHashedType, collision doesn’t matter at all
> because we fall back to a full record comparison when there is a collision.
> This is the method that is used today.
>
> In the PDB, we actually store CRC32s as hashes, which is even worse, but
> again it doesn’t matter because it’s just to get the bucket; probing will
> do a full equality check. So collision is not even a theoretical problem
> for a LocallyHashedType.
>
> For a GloballyHashedType, the hash is intended to be “as good as” the
> record, so instead of a full equality comparison we only compare the full
> 20 bytes of SHA1 hash. In this case, collision is a theoretical problem,
> but with probability O(10^-18) because a type stream can’t have more than
> 2^32 elements anyway.
>
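
The two comparison strategies described above can be sketched roughly as
follows. This is only an illustration built on standard-library containers:
`Record`, `Digest`, `LocalTable` and `GlobalTable` are hypothetical names, not
the types from TypeHashing.h, and the 20-byte digest is assumed to be computed
elsewhere (for example by SHA1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the LLVM types.
using Record = std::string;             // raw bytes of one type record
using Digest = std::array<uint8_t, 20>; // e.g. a 20-byte SHA1 digest

// Local scheme: a weak hash only selects the bucket; a full record
// comparison decides equality, so hash collisions cannot merge records.
class LocalTable {
  std::function<std::size_t(const Record &)> Hash;
  std::unordered_multimap<std::size_t, uint32_t> Buckets;
  std::vector<Record> Records;

public:
  explicit LocalTable(std::function<std::size_t(const Record &)> H)
      : Hash(std::move(H)) {}

  // Returns the index of R, inserting it if it is not already present.
  uint32_t insert(const Record &R) {
    std::size_t H = Hash(R);
    auto Range = Buckets.equal_range(H);
    for (auto It = Range.first; It != Range.second; ++It)
      if (Records[It->second] == R) // full equality check on collision
        return It->second;
    uint32_t Index = static_cast<uint32_t>(Records.size());
    Records.push_back(R);
    Buckets.emplace(H, Index);
    return Index;
  }
};

// Global scheme: the digest itself is the identity; record bytes are
// never compared, so equal digests are treated as equal records.
class GlobalTable {
  std::map<Digest, uint32_t> Indices;

public:
  uint32_t insert(const Digest &D) {
    uint32_t Next = static_cast<uint32_t>(Indices.size());
    return Indices.emplace(D, Next).first->second;
  }
};
```

With the local table, even a hash function that sends every record to the
same bucket still deduplicates correctly, only more slowly. With the global
table, two different records with the same digest would be silently merged,
which is why the digest has to be wide.
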
>
> On Mon, Dec 4, 2017 at 11:12 PM Rui Ueyama via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
>> ruiu added inline comments.
>>
>>
>> ================
>> Comment at: llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h:29
>> +struct LocallyHashedType {
>> +  hash_code Hash;
>> +  ArrayRef<uint8_t> RecordData;
>> ----------------
>> Is this used when an object file doesn't have type record hash values?
>>
>> If you use 64-bit values as unique keys and want to maintain a
>> probability of collision lower than 10^-9, for example, the maximum number
>> of type records you can have is 190,000, according to [1]. Is this enough?
>>
>> [1] https://en.wikipedia.org/wiki/Birthday_problem#Probability_table
>>
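
For reference, the birthday-bound figures in this thread can be checked with
the usual approximation p ~= 1 - exp(-n(n-1) / 2^(b+1)) for n uniformly
distributed b-bit hashes. A small sketch (the function name is made up for
illustration, and real hash outputs are only assumed to be uniform):

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of N uniformly random
// Bits-wide hash values collide (birthday bound).
double collisionProbability(double N, int Bits) {
  // Exponent = -N(N-1) / 2^(Bits+1); ldexp(1.0, k) computes 2^k.
  double Exponent = -N * (N - 1.0) / std::ldexp(1.0, Bits + 1);
  // expm1 keeps precision when the exponent is very close to zero.
  return -std::expm1(Exponent);
}
```

Under this approximation, 190,000 records with 64-bit hashes give roughly
10^-9, and 2^32 records with 160-bit hashes give roughly 6 * 10^-30, which is
well inside the 10^-18 figure quoted above.
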
>>
>> ================
>> Comment at: llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h:41
>> +/// global hashes of the types that B refers to), a global hash can uniquely
>> +/// identify identify that A occurs in another stream that has a completely
>> +/// different graph structure.  Although the hash itself is slower to compute,
>> ----------------
>> Typo: "identify" is repeated.
>>
>>
>> https://reviews.llvm.org/D40736