[PATCH] D33488: [ELF] - Optimization for populating stringpool when building .gdb_index.
Rafael Avila de Espindola via llvm-commits
llvm-commits at lists.llvm.org
Thu May 25 12:58:13 PDT 2017
George Rimar <grimar at accesssoftek.com> writes:
>>The hash function being used is
>>
>>static uint32_t hash(StringRef Str) {
>> uint32_t R = 0;
>> for (uint8_t C : Str)
>> R = R * 67 + tolower(C) - 113;
>> return R;
>>}
>>
>>What happens we have the string "Foo" and "foo"? Do we have a guarantee
>>that that never happens?
>>
>>Cheers,
>>Rafael
>
> It is not a problem here. See llvm::StringTableBuilder implements both add() via CachedHashStringRef:
> size_t add(CachedHashStringRef S);
> size_t add(StringRef S) { return add(CachedHashStringRef(S)); }
>
> So collision of hashes can happen in both implementations actually I believe.
> But that is not harmfull, because internal implementation uses
> DenseMap<CachedHashStringRef, size_t> StringIndexMap to store strings.
>
> And 2 CachedHashStringRef are equal only when both their hash and values are equal:
> static bool isEqual(const CachedHashStringRef &LHS,
> const CachedHashStringRef &RHS) {
> return LHS.hash() == RHS.hash() &&
> DenseMapInfo<StringRef>::isEqual(LHS.val(), RHS.val());
> }
>
> So I mean that following code would produce 2 different entries anyways:
> llvm::StringTableBuilder Bar(StringTableBuilder::ELF, 1);
> Bar.add({StringRef("foo"), 2});
> Bar.add({StringRef("Foo"), 2});
Good point. But now that I think of it, any idea why the hash is defined
with a tolower? Should we be canonicalazing the strings in the set? If
so the hash could just assert that the characters are lowercase.
Trying to benchmark this myself I hit:
https://bugs.llvm.org/show_bug.cgi?id=33173.
Could you take a look?
Ignoring the error the difference I see when linking clang is in the
noise.
Cheers,
Rafael
More information about the llvm-commits
mailing list