[PATCH] D40736: [CodeView] Add support for type record content hashing

Tue Dec 5 13:07:03 PST 2017

Also, there is still a big unknown of "what do we do when an object file
doesn't contain a .debug$H section?"  This is going to be the case for
system libraries, and anything not compiled with clang-cl, for example.

In the extreme case, 0% of linker inputs will have these hashes.  We need
to make sure performance is not regressed in that scenario.  The easiest
way to deal with all these unknowns while we're still in the experimental
phase is to just leave the default codepath untouched.

On Tue, Dec 5, 2017 at 1:05 PM Zachary Turner <zturner at google.com> wrote:

> There are two reasons.
>
> 1. If you simply replace our current hash function with SHA1, it is a 2x
> slowdown.  SHA1, by itself, is much slower to actually compute.  The
> advantages of SHA1 only kick in when combined with the "global" hash
> algorithm.  i.e. replacing TypeIndices (which are local to a particular
> object file) with previously computed hashes of the records they refer to.
> However, making this sweeping change across the board is invasive, and
> leads into my next point, which is.
>
> 2. I don't want to change *anything* about the current algorithm.  We will
> need to be able to iterate on, tune, and benchmark this method against the
> current method.  The current method is already very fast, so I don't want
> to do anything that could affect the performance adversely.  In particular,
> since SHA1 by itself is actually slower to compute, we would probably get a
> unacceptable performance regression.  I'd like to be able to later have a
> flag to LLD that is hidden and experimental, and when it is not used, it
> behaves exactly as it does today, since we already know that is very fast.
> And only after we determine that this new approach is sound and iron out
> all the details, then we can delete the old codepath.
>
>
> On Tue, Dec 5, 2017 at 12:57 PM Rui Ueyama <ruiu at google.com> wrote:
>
>> What is the reason to use different hash functions for these two cases? I
>> mean if using SHA1 is faster than a noncryptic hash function with content
>> comparison, why don't you always use SHA1?
>>
>> On Tue, Dec 5, 2017 at 7:03 AM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> In the case of a LocallyHashedType, collision doesn’t matter at all
>>> because we fall back to a full record comparison when there is a collision.
>>> This is the method that is used today.
>>>
>>> In the PDB, we actually store CRC32s as hashes, which is even worse, but
>>> again it doesn’t matter because it’s just to get the bucket, probing will
>>> do a full equality check. So collision is not even a theoretical problem
>>> for a LocallyHashedType.
>>>
>>> For a GloballyHashedType, the hash is intended to be “as good as” the
>>> record, so instead of a full equality comparison we only compare the full
>>> 20 bytes of SHA1 hash.  In this case, collision is a theoretical problem ,
>>> but with probability O(10^-18) because a type stream can’t have more than
>>> 2^32 elements anyway
>>>
>>>
>>> On Mon, Dec 4, 2017 at 11:12 PM Rui Ueyama via Phabricator <
>>> reviews at reviews.llvm.org> wrote:
>>>
>>>> ruiu added inline comments.
>>>>
>>>>
>>>> ================
>>>> Comment at: llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h:29
>>>> +struct LocallyHashedType {
>>>> +  hash_code Hash;
>>>> +  ArrayRef<uint8_t> RecordData;
>>>> ----------------
>>>> Is this used when an object file doesn't have type record hash values?
>>>>
>>>> If you use 64-bit values as unique keys and want to maintain a
>>>> probability of collision lower than 10^-9, for example, the maximum number
>>>> of type records you can have is 190,000, according to [1]. Is this enough?
>>>>
>>>> https://en.wikipedia.org/wiki/Birthday_problem#Probability_table
>>>>
>>>>
>>>> ================
>>>> Comment at: llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h:41
>>>> +/// global hashes of the types that B refers to), a global hash can
>>>> uniquely
>>>> +/// identify identify that A occurs in another stream that has a
>>>> completely
>>>> +/// different graph structure.  Although the hash itself is slower to
>>>> compute,
>>>> ----------------
>>>> identify
>>>>
>>>>
>>>> https://reviews.llvm.org/D40736
>>>>
>>>>
>>>>
>>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171205/d545649a/attachment.html>