[PATCH] D33488: [ELF] - Optimization for populating stringpool when building .gdb_index.

George Rimar via llvm-commits llvm-commits at lists.llvm.org
Thu May 25 01:15:39 PDT 2017


>The hash function being used is
>
>static uint32_t hash(StringRef Str) {
>  uint32_t R = 0;
>  for (uint8_t C : Str)
>    R = R * 67 + tolower(C) - 113;
>  return R;
>}
>
>What happens if we have the strings "Foo" and "foo"? Do we have a guarantee
>that this never happens?
>
>Cheers,
>Rafael

It is not a problem here. llvm::StringTableBuilder implements both add() overloads via CachedHashStringRef:
  size_t add(CachedHashStringRef S);
  size_t add(StringRef S) { return add(CachedHashStringRef(S)); }

So hash collisions can happen in both implementations, I believe.
But that is not harmful, because the internal implementation uses
DenseMap<CachedHashStringRef, size_t> StringIndexMap to store strings.

And two CachedHashStringRef values are equal only when both their hashes and their string values are equal:
  static bool isEqual(const CachedHashStringRef &LHS,
                      const CachedHashStringRef &RHS) {
    return LHS.hash() == RHS.hash() &&
           DenseMapInfo<StringRef>::isEqual(LHS.val(), RHS.val());
  }

So the following code would produce 2 different entries anyway:
  llvm::StringTableBuilder Bar(StringTableBuilder::ELF, 1);
  Bar.add({StringRef("foo"), 2});
  Bar.add({StringRef("Foo"), 2});
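
To make the argument concrete without pulling in LLVM, here is a minimal standard-library sketch of the same idea. CachedKey, CachedKeyHash, CachedKeyEq and ToyStringTable are hypothetical stand-ins for CachedHashStringRef, its DenseMapInfo and StringTableBuilder, not the real classes, and the offset bookkeeping is simplified (no leading empty string, no tail merging). gdbHash is the case-insensitive hash quoted above, adapted to std::string.

```cpp
#include <cassert> // used only by the usage checks described below
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Case-insensitive hash quoted in this thread, adapted to std::string.
static std::uint32_t gdbHash(const std::string &Str) {
  std::uint32_t R = 0;
  for (unsigned char C : Str)
    R = R * 67 + static_cast<std::uint32_t>(std::tolower(C)) - 113;
  return R;
}

// Hypothetical stand-in for llvm::CachedHashStringRef: a string plus a
// hash that the caller computed in advance.
struct CachedKey {
  std::string Val;
  std::uint32_t Hash;
};

// The map reuses the cached hash instead of rehashing the string.
struct CachedKeyHash {
  std::size_t operator()(const CachedKey &K) const { return K.Hash; }
};

// Same rule as the isEqual() shown above: hash AND value must match.
struct CachedKeyEq {
  bool operator()(const CachedKey &L, const CachedKey &R) const {
    return L.Hash == R.Hash && L.Val == R.Val;
  }
};

// Toy string table (an assumption for illustration): maps each distinct
// string to the offset it would get if strings were laid out back to
// back, each followed by a NUL byte.
class ToyStringTable {
  std::unordered_map<CachedKey, std::size_t, CachedKeyHash, CachedKeyEq> Map;
  std::size_t Size = 0;

public:
  std::size_t add(const CachedKey &K) {
    auto P = Map.emplace(K, Size);
    if (P.second) // newly inserted, so reserve space for it
      Size += K.Val.size() + 1;
    return P.first->second;
  }
  std::size_t entries() const { return Map.size(); }
};
```

With this sketch, gdbHash("foo") and gdbHash("Foo") are equal, yet adding {"foo", H} and {"Foo", H} to a ToyStringTable yields two entries at different offsets, and adding {"foo", H} a second time returns the first offset again. The colliding hash only puts both keys in the same bucket; the value comparison keeps them apart.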

George.

